
Sara Elizabeth Bury

Aberrant Network Behaviour
Indication and Analysis

BSc. Computer Science

22nd March 2007

I certify that the material contained in this dissertation is my own work, and does not
contain significant portions of unreferenced or unacknowledged material. I also warrant
that the above statement applies to the implementation of the project, and all associated
documentation.

Date: 22nd March 2007


The aim of this project is to explore the use of aberrance detection techniques for network
monitoring in a large network environment. This is an important area for research and
development, as today’s networks are expected to function twenty-four hours a day, seven
days a week, something which is impossible to guarantee by relying only on the vigilance
and investigative skill of network operators. The project can be broken down into three
main areas: research into current aberrant network behaviour detection methods and an
assessment of their suitability; eliciting the requirements of a large network operator; and
the production of a prototype system to illustrate the advantages of an aberrance detection
system within a network operations environment. The result would be a system which
indicates instances of aberrant behaviour as they occur and provides further information
to network operators, aiding their workflow and allowing them to make an initial
classification of each event.

Contents

1 Introduction
1.1 Project Aims
1.2 Motivation
1.2.1 DANTE and GÈANT2
1.3 Report Overview

2 Background and Related Work
2.1 Related Work
2.2 Sources of Network Data
2.2.1 Measurement of Metrics
2.2.2 Individual Packet Capture
2.2.3 SNMP
2.2.4 Network Flow
2.3 Existing Network Monitoring Solutions
2.3.1 TCPDump
2.3.2 Snort
2.3.3 RRDtool
2.3.4 Cacti
2.3.5 Flow-Tools
2.3.6 NfDump
2.3.7 NfSen
2.3.8 Overview
2.4 NfSen-HW
2.4.1 Architecture and Organisation
2.4.2 Holt-Winters Forecasting

3 Design
3.1 Requirements
3.1.1 Network Operator’s Workflow
3.1.2 Requirements list
3.2 Design Decisions
3.2.1 NfSen-HW
3.2.2 Java
3.2.3 MySQL and PHP
3.2.4 Debian GNU/Linux
3.3 System Architecture
3.3.1 NfSen-HW and NfDump
3.3.2 runSentinel.sh
3.3.3 Sentinel.jar
3.3.4 Sentinel Database
3.3.5 Sentinel Web Interface

4 Implementation
4.1 Method of Implementation
4.2 NfSen-HW
4.3 runSentinel.sh
4.4 Sentinel.jar
4.4.1 Implementation Overview
4.4.2 Problems with XML Parsing
4.4.3 Database Insertion
4.5 Sentinel Database
4.6 Sentinel Web Interface
4.6.1 Live Update
4.6.2 Details
4.6.3 Review

5 System Operation
5.1 Usage Scenario
5.1.1 Examining Live Update for Aberrant Behaviour
5.1.2 Filtering the results
5.1.3 Viewing further Details
5.1.4 Analysis and editing event details
5.1.5 Summary

6 Testing and evaluation
6.1 Testing
6.1.1 Defect and Component Testing
6.1.2 Functional and Integration Testing
6.2 User Interface Evaluation
6.3 Evaluation
6.3.1 Requirements List Review
6.3.2 Summary and Feedback from DANTE

7 Conclusion
7.1 Overview
7.2 Further work

A Acknowledgements
B Project Proposal
C JavaDoc
D NfDump(1) Manpage
E Holt-Winters Forecasting Examples

List of Figures

1.1 GÈANT2 Network Topology
1.2 GÈANT2 Global Connectivity
2.1 Section of an RRD exported to XML format
2.2 NfSen Architecture
2.3 NfSen-HW Architecture
3.1 Use Case diagram depicting the diagnosis of a network anomaly
3.2 Overview of Proposed System Architecture
3.3 Sentinel Java UML Diagram
3.4 Sentinel Database Entity Relationship Diagram
3.5 Simple foreign key linking example
3.6 Proposed Live Update Web Interface
3.7 Proposed Details Web Interface
3.8 Proposed Review Web Interface
4.1 Sentinel Java UML Class Diagram
4.2 Sentinel Database UML Diagram
4.3 Aberrant Marking Example
4.4 Subtracting 40 Minutes Example
5.1 Investigation Process Step 1
5.2 Investigation Process Step 2
5.3 Investigation Process Step 3
5.4 Investigation Process Step 4 - Editing
5.5 Investigation Process Step 4 - Inserting
5.6 Sequence Diagram of System Operation
6.1 General Defect Testing Model
6.2 Functional Testing Model
E.1 Aberrant Marking Example
E.2 Subtracting 40 Minutes Example 1
E.3 Subtracting 40 Minutes Example 2
E.4 Subtracting 40 Minutes Example 3

List of Tables

2.1 Consolidation functions within RRDtool for aberrant behaviour detection
3.1 Derived Requirements List for High Level Requirement A
3.2 Derived Requirements List for High Level Requirement B
3.3 Derived Requirements List for High Level Requirement C
3.4 Derived Requirements List for High Level Requirement D
3.5 Derived Requirements List for High Level Requirement E
3.6 Derived Requirements List for High Level Requirement F
3.7 Derived Requirements List for High Level Requirement G
3.8 Sentinel Database Tables
6.1 Sentinel.jar Testing - XML Parsing
6.2 Sentinel.jar Testing - Source and Profile Detection
6.3 Sentinel.jar Testing - Database Connectivity
6.4 runSentinel.sh Testing
6.5 Sentinel UI Functional Testing - Live Update
6.6 Sentinel UI Functional Testing - Details
6.7 Sentinel UI Functional Testing - Review


1 Introduction

1.1 Project Aims

The rationale behind this project was to gain an understanding of current research
surrounding aberrant network behaviour detection, and then to investigate the challenges
faced when creating a system which would detect aberrant behaviour and classify its type.
Leading on from this, the aim is to produce an application which illustrates how the work
of a network operator could be aided by indicating instances of aberrant behaviour,
providing any relevant information, and performing an initial classification of the type of
anomaly. Such an application should ease a network operator’s workflow when diagnosing
and fixing network problems by providing the necessary details with considerably less
manual intervention than might currently be required. It should also allow instances of
aberrant behaviour to be recorded, building a historical record against which future
anomalies can be compared and so further aiding the network operator in their work.

1.2 Motivation
Computer networks play an increasingly important role in today’s technological age. The
transfer of information between computers has become necessary for many day-to-day
activities, and this is especially true of the education and research sector. Universities and
research institutes rely on networks as communication links between scholars and
students across the globe. In many cases the research being done also concerns the
networks themselves: computing and communications research requires high-speed,
reliable links between sites in order to test new protocols and technologies accurately. It
is important that these networks are monitored carefully to ensure that potential issues
are caught and resolved.
People charged with maintaining computer networks face a constant battle to ensure that
they do not fail, but failure is not a black-and-white issue. Whilst one problem might be a
link breaking and removing the connection between machines, most everyday problems are
likely to be less obvious and to require investigation to solve. Services on the network may
become unusually busy or slow to respond; users might notice lag between transfers being
sent and acknowledged. Other issues, namely security problems, might affect network
traffic but remain unseen. Users might not notice if their data is being tampered with or
observed, but it is up to a network administrator to try to prevent attacks of that kind,
and to ensure they are rectified if they occur.


1.2.1 DANTE and GÈANT2

DANTE, standing for “Delivery of Advanced Network Technology to Europe”, is an
organisation part-owned by each of the European National Research and Education
Networks (NRENs), which has worked to plan, build and operate pan-European computer
networks for advanced research and education since it was established in 1993
[DANTE, 2007]. DANTE has played an important part in the previous four generations
of pan-European research network, and was responsible for the initial construction and
subsequent maintenance and management of its current incarnation, GÈANT2. This
network connects 30 NRENs serving 34 countries, providing network facilities for
approximately 30 million research and education users [GÈANT2, 2007].

Figure 1.1: GÈANT2 Network Topology


Figure 1.2: GÈANT2 Global Connectivity

DANTE and its network operations team are responsible for the day-to-day business of
running GÈANT2, ensuring the network is operating smoothly and that its end users are
happy with its performance. As Figure 1.1 and Figure 1.2 show, the GÈANT2 network is
exceptionally large and interacts with multiple research networks around the world.
Monitoring a network of this size is very difficult: it is a balance between wanting to know
about every network event, in order to be sure the network is operating correctly, and
having time only to deal with problems which are specifically reported by end users. As a
result, network anomalies which are not causing immediate problems for users are often
missed, and can cause problems further down the line because whatever caused the event
is not dealt with in the first instance. This problem is exacerbated by the sheer amount of
data involved: any metrics created for monitoring purposes are excessively large and
cannot be kept indefinitely. A network problem may therefore have occurred, but the data
pertaining to it has been deleted simply because the data for that time period has expired.

A network operator in this situation does not have time to monitor the network actively
for aberrant behaviour by hand. Most widely used network monitoring solutions provide an
overview of network activity in graphed form, which can be used to identify anomalies
visually. Unfortunately this is not an automated process, and requires someone to view the
graphs at the right time. Also, on a network the size of GÈANT2, an anomaly would have to
be quite large to show up on a graphed view. Network events can therefore be missed both
because they are too small to leave a visible footprint on the graphs and because they occur
at a time when no network operator has checked the monitoring software. In such a case
there is normally no record of the behaviour other than the graphs themselves; no separate
historical record is kept of network anomalies and their types. In some cases it is possible
to go back and examine the graphs for a given time period, but the data used to create
them may have expired, removing the potential for closer analysis, and the graphs
themselves often become averaged over time, causing smaller anomalies to be smoothed out
into the normal flow of data.

DANTE and their work with GÈANT2 provide an excellent example of why automated
aberrant network behaviour detection is a necessary area of research. This project aims to
use this scenario and, through discussion and liaison with network operators at DANTE,
to produce a concept network monitoring system which attempts to address the issues
highlighted above.

1.3 Report Overview

The remaining sections of this report will be as follows:

Chapter two provides some background information pertaining to this area of research
and examines existing work and applications which could aid the design and implemen-
tation of the project. Chapter three gives a breakdown and explanation of the major
design decisions made and describes the system architecture, interface designs and com-
munication structure. It also provides a list of documented requirements to be met by
the finished product. Chapter four describes how the application was implemented and
lists important sections of code. Chapter five shows the application in operation, how it
would be used by a network operator in their daily work with a walkthrough of typical
usage. Chapter six gives an overview of the testing undertaken, and how successfully the
application meets the specified requirements. Chapter seven draws conclusions based on
comparisons between the finished project and the initial aims and objectives. It analyses
the overall success of the project and indicates where further work could be undertaken.
2 Background and Related Work

2.1 Related Work

There is a lot of research surrounding network anomaly detection, and in every case the
first thing that must be defined is what specifically constitutes anomalous or aberrant
behaviour on a network. Some researchers have chosen to define aberrant events as any
network traffic caused by malicious intent [Kim et al. 2004], others simply as any
large-scale event on the network [Wagner & Plattner, 2005]. These definitions seem
simultaneously too broad and too specific: malicious traffic may make up a large part of
the anomalous traffic on a network, but other contributing factors, such as network
configuration issues, are not covered, whereas some large-scale network events may be
planned, or occur as part of general network usage. A more reasonable definition is
“circumstances when network operations deviate from normal network behaviour”
[Thottan & Ji, 2003, pg 2192]; in essence, when the traffic witnessed on the network differs
from what might be expected according to prior knowledge of how the network operates.
This obviously requires an in-depth knowledge of the network and how it is used. One way
of achieving this is to build a picture of normal network traffic and actively use it for
comparison to judge which traffic is abnormal. Jake Brutlag develops this idea further by
stating that if you have an accurate statistical model for a given time series of network
traffic data, then you can define aberrant behaviour as “behaviour that does not conform
to this model” [2000, p140]. In these cases aberrant or anomalous behaviour is not
necessarily of any given type or size, but merely something which would not normally have
occurred. This definition remains compatible with the earlier ones, as it can still be used to
identify malicious traffic or network-wide events specifically. Overall, it means that
identification is not restricted to network events which have been witnessed and identified
before, allowing new, undefined network problems to be flagged. Of course, for this to be a
useful definition there must first exist an exact specification of what constitutes normal
behaviour, and this is a major focus of much research in this area. Almost all the research
papers covered agreed that what is required is a statistical analysis of network traffic data
[Barford et al. 2002; Thottan & Ji, 2003; Kim et al. 2004; Brutlag, 2000]. The question then
is how that statistical analysis is performed and, from that, how aberrant or non-normal
behaviour is identified.

Barford et al. [2002] perform a signal analysis of network traffic data known as wavelet
analysis, which organises the data over time into a hierarchy of separate levels known as
strata. The different strata produce information of varying types, from “sophisticated
aggregations of the original data” at the lower levels to “fine grained details” at the higher
end [p74]. From this they show that the separated results can indicate different
characteristics: lower-level strata capture patterns over a long period of time, middle strata
produce information about daily variations, and higher strata indicate very short-term
variation, which in their opinion is not useful for network anomaly analysis. They make the
point that there can never be one single method for detecting network anomalies from this
information, due to the differing definitions of what a network anomaly is, but they suggest
a method for automating the identification of “irregularities in the measured data” [p75],
which they call a deviation score. Their results illustrated that wavelet analysis of network
data is “quite effective” at showing the details of network traffic, both during normal
network operation and during anomalous events. The network data used throughout this
analysis was both SNMP data from their network devices, mostly activity counts (i.e.
numbers of packets transmitted per node), and Network Flow data, which includes more
specific protocol-level information about end-to-end packet flows; together these provide a
“reasonably solid measurement foundation” [p71]. A comparison was made between the
details elicited from Network Flow and SNMP data, and they found that it is possible to
expose anomalies effectively using both. Network Flow and SNMP data are discussed
further later in this chapter.
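To make the idea of strata more concrete, the following is a minimal Python sketch of a multi-level decomposition using the Haar wavelet, the simplest wavelet, chosen here purely for illustration. It is not necessarily the wavelet family Barford et al. used, and their deviation score is not reproduced; the function names are hypothetical. Each level halves the resolution: the detail coefficients from the first level capture the shortest-term variation, later levels capture progressively longer periods, and the final approximation is the coarse aggregate of the whole series.

```python
import math


def haar_step(signal):
    """Apply one level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail), each half the length of the input:
    the approximation is a scaled pairwise sum (a smoothed version of the
    signal) and the detail is the scaled pairwise difference it removed.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    scale = 1 / math.sqrt(2)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def haar_strata(signal, levels):
    """Decompose a signal into `levels` detail strata plus a final approximation.

    strata[0] holds the finest (shortest-period) detail; each later stratum
    covers periods twice as long. The returned approximation is the coarse,
    long-term aggregate that remains after the last level.
    """
    strata = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        strata.append(detail)
    return strata, approx


if __name__ == "__main__":
    # Flat traffic counts with a single short spike at index 5.
    counts = [10.0] * 8
    counts[5] = 30.0
    strata, residual = haar_strata(counts, 3)
    for level, detail in enumerate(strata, start=1):
        print(f"stratum {level}: {[round(d, 2) for d in detail]}")
    print(f"residual approximation: {[round(a, 2) for a in residual]}")
```

A one-off spike in an otherwise flat series shows up as a large coefficient in the finest stratum, whereas a slower change in level would be expected to appear mainly in the coarser strata. Because the Haar transform is orthonormal, the total energy of the signal is preserved across the strata and the residual.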

Thottan & Ji [2003] provide an overview of what they consider to be the most popular
network anomaly detection methods: rule-based techniques, finite state machines, pattern
matching and, their main focus, statistical analysis, which can be used to “continuously
track the behaviour of the network” [p2194], unlike the other approaches, which often
require recalibration over time. The statistical analysis is performed using SNMP data
collected from network routers, and the method has been developed from the theory of
change detection: having defined a network anomaly as “correlated abrupt changes in
network data” [p2195], techniques for detecting changes in network data can also be used
to detect network anomalies. They define an abrupt change as “any change in the
parameters of a time series that occurs on the order of the sampling period of the
measurement” [p2195]. It is the correlated nature of these changes which distinguishes
them from the natural variability of normal network operation, but due to the nature of
SNMP data from various devices, even data of the same type from separate devices cannot
be treated in the same way. Each source of data must be tested independently, and
correlations between devices then found. To give a very general overview, abrupt changes
are detected by comparing the variation of statistics between two contiguous windows of
data using an auto-regressive model. They found that the use of fine-grained network data
greatly reduced the time taken for detection; one cause for concern was the possibility of
the clocks on the various polled network devices falling out of synchronisation. SNMP also
runs over the UDP transport protocol, meaning that there is no guarantee that queries and
responses will reach their intended target.
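The idea of comparing two contiguous windows can be illustrated with a much simplified Python sketch. Where Thottan & Ji fit an auto-regressive model to each window, this sketch simply compares the window means, scaled by the noise observed within the windows; the function name, the scoring rule and the `min_noise` floor are assumptions made for illustration, and no correlation across devices is attempted.

```python
import math
import statistics


def window_change_scores(series, w, min_noise=1e-6):
    """Score every boundary t by comparing series[t-w:t] with series[t:t+w].

    Simplified stand-in for an auto-regressive change test: the score is the
    absolute difference of the two window means divided by the average
    within-window standard deviation, so a large score means the level has
    shifted abruptly relative to the local noise.
    """
    if w < 2 or len(series) < 2 * w:
        raise ValueError("need w >= 2 and at least 2*w samples")
    scores = {}
    for t in range(w, len(series) - w + 1):
        left, right = series[t - w:t], series[t:t + w]
        noise = math.sqrt((statistics.pvariance(left) + statistics.pvariance(right)) / 2)
        shift = abs(statistics.fmean(right) - statistics.fmean(left))
        # min_noise avoids dividing by zero when both windows are perfectly flat.
        scores[t] = shift / max(noise, min_noise)
    return scores


if __name__ == "__main__":
    # A noisy level of about 11 that jumps abruptly to about 31 at index 8.
    series = [10.0, 12.0] * 4 + [30.0, 32.0] * 4
    scores = window_change_scores(series, 4)
    print(max(scores, key=scores.get), scores)
```

In practice each device's series would be scored separately, with a threshold on the score, and any correlation between devices examined afterwards, in line with the independent testing described above.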

Kim et al. [2004] propose a method for abnormal traffic detection based entirely on
Network Flow analysis. They divide the analytical process into two sections: flow header
detection and traffic pattern data generation. As a packet is received by their algorithm,
its header is checked and the transport protocol determined. From this, further checks can
be made on information such as the source/destination port number or the packet/flow
size. The traffic patterns can be used to detect further aberrant behaviour; for example, a
scanning attack would result in a large flow count per host, but small flow and packet
sizes. This is not strictly a statistical analysis of network data, more a record of previous
network traffic from a specific host/network in order to build better knowledge of its use
of the network. It suffers from the same pitfalls as most rule-based analysis, for example a
regular need for reconfiguration and an inability to detect new and undocumented
aberrant events, but it does produce some interesting information regarding particular
network anomalies and how they appear in Network Flow data. It is also notable that their
system suffered from false alarms due to the similarities (according to their model) between
attack traffic and normal peer-to-peer communication, which, according to their paper,
accounts for as much as 50% of current Internet traffic.
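The scanning example can be expressed as a simple rule over flow records. The Python sketch below is hypothetical: the `FlowRecord` fields and all three thresholds are illustrative assumptions rather than values taken from Kim et al. It groups flows by source host and flags hosts with many flows that are each small in packets and octets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical, minimal flow record: one unidirectional flow."""
    src: str
    dst: str
    dst_port: int
    packets: int
    octets: int


def suspected_scanners(flows, min_flows=50, max_mean_packets=3.0, max_mean_octets=200.0):
    """Return source hosts whose traffic looks like a scan.

    A host is flagged when it has at least `min_flows` flows and those flows
    are, on average, very small in both packets and octets. The thresholds
    are illustrative and would need tuning for a real network.
    """
    per_host = defaultdict(list)
    for flow in flows:
        per_host[flow.src].append(flow)
    suspects = []
    for host, host_flows in per_host.items():
        n = len(host_flows)
        mean_packets = sum(f.packets for f in host_flows) / n
        mean_octets = sum(f.octets for f in host_flows) / n
        if n >= min_flows and mean_packets <= max_mean_packets and mean_octets <= max_mean_octets:
            suspects.append(host)
    return sorted(suspects)
```

The sketch also shows why such rules need regular retuning and can raise false alarms: a busy peer-to-peer client opening many short connections could match exactly the same pattern.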

Jake Brutlag [2000] describes the statistical model from which aberrant behaviour is
determined as having to take into account a number of factors, mostly surrounding seasonal
cycles or variations that are considered normal network behaviour; for example, network
usage is higher during the day than at night, and higher still Monday to Friday than at
weekends. The model should be able to take this into account and not mistakenly judge
such trends as aberrant. It should also be capable of evolving over time with the network,
as the cycles and trends gradually adapt to new conditions [p140]. His emphasis is on the
use of such a model in a real-time monitoring context: complicated statistical modelling is
unlikely to be understood by network operators and may not perform at an adequate
speed. The model is broken down into three sections [p140]:

• An algorithm for predicting the values of a time series one time step into the future.

• A measure of deviation between the predicted values and the observed values.

• A mechanism to decide if and when an observed value or sequence of values is
‘too deviant’ from the predicted value(s).

His solution is an extension to the Holt-Winters forecasting algorithm, which builds upon
exponential smoothing. Exponential smoothing is a simple algorithm for predicting the next
value in a time series; it works on the premise that the most useful value for predicting the
next value is the current one, and that the usefulness of earlier values decays
exponentially. Aberrant behaviour is then detected through confidence bands, a measure of
how much deviation is allowed at a specific time within a seasonal cycle. A fuller
explanation of the Holt-Winters forecasting algorithm and how it works is given later in
this chapter. Jake Brutlag included this implementation in RRDtool, a data logging and
graphing application, and illustrated its use within a web-based network monitoring
solution called Cricket [Brutlag, 2000b; RRDtool, 2007; Cricket, 2007]. His conclusion is
that, whilst not an optimal solution, it is flexible, efficient and does effectively detect
aberrant behaviour.
This solution appears to be the most complete, if not the most formally specified. The
technique is mature enough for production use and is already deployed. The fact that it is
incorporated into RRDtool, one of the most commonly used logging and graphing tools
available, makes it a very attractive option. The RRDtool/Cricket solution is examined
more closely later in this chapter.
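The three-part model described above can be illustrated with a deliberately simplified Python sketch. It uses plain exponential smoothing as the predictor, in place of the full Holt-Winters method (there are no trend or seasonal terms), an exponentially smoothed absolute error as the deviation measure, and a confidence band of `delta` deviations around the prediction as the decision mechanism. Following the idea of judging a sequence of values rather than a single one, an aberration is only signalled when at least `threshold` of the last `window` observations fall outside the band; RRDtool's FAILURES archive applies a similar rule. The function name, parameter names and default values are illustrative assumptions, not Brutlag's or RRDtool's.

```python
def detect_aberrations(series, alpha=0.1, gamma=0.1, delta=2.0,
                       window=9, threshold=5, warmup=10):
    """Return the indices of observations at which an aberration is signalled.

    alpha:     smoothing weight for the prediction (0 < alpha <= 1)
    gamma:     smoothing weight for the deviation estimate (0 < gamma <= 1)
    delta:     half-width of the confidence band, in smoothed deviations
    window, threshold: signal when >= threshold of the last `window`
                       decisions fell outside the band
    warmup:    initial observations used only to train the model
    """
    if not series:
        return []
    prediction = float(series[0])  # state of the predictor (smoothed level)
    deviation = 0.0                # state of the deviation measure (smoothed |error|)
    violations = []                # decision history: 1 = outside the band
    flagged = []
    for t in range(1, len(series)):
        y = series[t]
        error = abs(y - prediction)
        if t >= warmup:
            # 3. Decision: compare with the band built from the previous state.
            violations.append(1 if error > delta * deviation else 0)
            if sum(violations[-window:]) >= threshold:
                flagged.append(t)
        # 2. Deviation measure: exponentially smoothed absolute error.
        deviation = gamma * error + (1 - gamma) * deviation
        # 1. Prediction for the next step: simple exponential smoothing.
        prediction = alpha * y + (1 - alpha) * prediction
    return flagged


if __name__ == "__main__":
    steady = [100.0] * 40
    level_shift = [100.0] * 30 + [200.0] * 10
    print(detect_aberrations(steady), detect_aberrations(level_shift))
```

The flags stop once the smoothed prediction catches up with the new level and the deviation estimate has widened the band: like the method it simplifies, this sketch gradually treats a sustained change as the new normal.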

Whilst most of these methods of anomaly detection and analysis have involved basic
counting metrics, there has been some investigation into other methods of analysis for
modelling traffic flow. One such approach involves estimating the entropy content of
traffic data and using that information to decide whether traffic is anomalous or aberrant
[Wagner & Plattner, 2005]. Entropy is defined as “a measure of how random a data-set is”
[p172], and the process they use to determine the entropy of network traffic data first
involves representing the data in a purely binary format, then compressing it. The size of
the compressed data then corresponds directly to the level of entropy present. Their results
revealed many interesting entropy patterns in normal and attack traffic; for example, in
regular network traffic the entropy of the source and destination port fields is almost
identical, whereas in attack traffic many of the answering flows do not exist, hence source
port entropy increases while destination port entropy decreases. They also found that this
method of analysis is not greatly affected by the use of sampled network traffic data.
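The compression-based estimate can be sketched in a few lines of Python. This sketch packs a sequence of 16-bit header fields (such as port numbers) into a binary string and uses the compression ratio from `zlib`, a compressor chosen here for illustration and not necessarily the one Wagner & Plattner used, as a proxy for entropy. The function name is hypothetical, and the ratio is only a proxy, not a Shannon entropy in bits.

```python
import random
import struct
import zlib


def compression_entropy(values):
    """Estimate the randomness of a sequence of 16-bit header fields.

    The values are packed big-endian into a binary string and compressed;
    the ratio compressed_size / raw_size serves as the entropy proxy.
    Ratios near 0 mean highly repetitive data; ratios near (or slightly
    above) 1 mean the data is close to random.
    """
    if not values:
        raise ValueError("need at least one value")
    raw = b"".join(struct.pack("!H", v) for v in values)
    return len(zlib.compress(raw, 9)) / len(raw)


if __name__ == "__main__":
    random.seed(1)
    fixed_ports = [80] * 1000  # every flow aimed at one service
    spread_ports = [random.randrange(1024, 65536) for _ in range(1000)]  # widely scattered ports
    print(f"fixed ports:  {compression_entropy(fixed_ports):.3f}")
    print(f"spread ports: {compression_entropy(spread_ports):.3f}")
```

Computing this ratio separately for the source and destination port fields of the same set of flows would give the kind of comparison described above, where the two values diverge during an attack.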

2.2 Sources of Network Data

Before looking at how to detect aberrant or anomalous network behaviour it is important
to examine possible sources of network traffic data and their strengths and weaknesses.
There are four main types of network data available to use and in this section each will
be examined.

2.2.1 Measurement of Metrics

This refers to obtaining network data by measuring certain metrics of network
performance, for example packet round trip times and packet loss. This is not necessarily
automated; command-line tools such as ping and traceroute can give results of this nature.
Findings obtained in this manner would not normally be incorporated into a network
monitoring solution, but are useful as a secondary source of information during the
investigation of a potential network problem. They present useful information about the
state of the network at a given time and how well it is currently operating, but cannot give
any indication of the type or nature of network traffic.
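As an illustration of the metrics involved, the following hypothetical Python helper summarises a series of ping-style probes into packet loss and round trip time statistics. It assumes the probe results have already been collected (by ping or a similar tool), with `None` marking a probe that received no reply; the names `ProbeSummary` and `summarise_probes` are illustrative.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeSummary:
    sent: int
    received: int
    loss: float                # fraction of probes with no reply, 0.0 to 1.0
    rtt_min: Optional[float]   # milliseconds; None if no reply was received
    rtt_mean: Optional[float]
    rtt_max: Optional[float]


def summarise_probes(rtts_ms: Sequence[Optional[float]]) -> ProbeSummary:
    """Summarise round trip times in ms, where None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("no probes to summarise")
    replies = [r for r in rtts_ms if r is not None]
    loss = 1 - len(replies) / len(rtts_ms)
    if not replies:
        return ProbeSummary(len(rtts_ms), 0, loss, None, None, None)
    return ProbeSummary(len(rtts_ms), len(replies), loss,
                        min(replies), statistics.fmean(replies), max(replies))
```

Such a summary describes how well a path is performing, but, as noted above, says nothing about what kind of traffic is crossing it.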

2.2.2 Individual Packet Capture

This method involves capturing each individual packet as it passes through the network
and processing it to extract useful information. Because it can look into the application
layer of network packets, this invasive approach provides highly detailed information
about the type and even the content of data traversing the network. Such in-depth network
traffic data creates the potential for very accurate and specific analysis of network
operation, based not just upon the protocols used or the source/destination, but also upon
the program or application the packet belongs to. In many cases this would be the ideal for
network traffic analysis, and would mean that all kinds of network anomalies could be
identified and very accurately classified; unfortunately, such high detail comes at a price.
Capturing individual packets as they pass through network devices is an extremely
intensive process, given the sheer number of packets traversing even a medium-sized
network. In a scenario such as that at DANTE, individual packet capture would be far too
heavy a load for any available server. Whilst the information would be highly desirable, it
would cause such a performance hit on the network itself that it is inappropriate for a
passive network monitoring application.
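To illustrate the level of detail available, the following is a minimal Python sketch that decodes the IPv4 and TCP headers of a single already-captured packet and exposes its application-layer payload. It assumes the raw bytes start at the IPv4 header (the capture itself, for example via libpcap, is out of scope); it honours the IHL and TCP data offset fields so that header options are skipped, but does not verify checksums or reassemble fragments. The function name is hypothetical.

```python
import socket
import struct


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Decode the IPv4 header and, for TCP, the ports and application payload."""
    if len(packet) < 20:
        raise ValueError("too short for an IPv4 header")
    (ver_ihl, _tos, total_length, _ident, _frag, ttl, protocol,
     _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (ver_ihl & 0x0F) * 4  # IP header length in bytes, including options
    info = {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "ttl": ttl,
        "length": total_length,
    }
    if protocol == 6:
        src_port, dst_port, _seq, _ack, offset_flags = struct.unpack(
            "!HHIIH", packet[ihl:ihl + 14])
        data_offset = (offset_flags >> 12) * 4  # TCP header length in bytes
        info["src_port"] = src_port
        info["dst_port"] = dst_port
        # Everything after the TCP header, up to the IP total length, is
        # application-layer data: the part that makes capture so revealing.
        info["payload"] = packet[ihl + data_offset:total_length]
    return info
```

Even this small amount of work has to be repeated for every packet, which hints at why full capture at backbone rates is so expensive.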

2.2.3 SNMP

SNMP stands for Simple Network Management Protocol [SNMP, 2007] and is an IETF
Internet standard in the application layer of the TCP/IP five layer model. It is used by
network monitoring systems to monitor and manage network connected devices using
Management Information Base (MIB) queries. Devices can be polled for numerous different
types of information. The first concerns the state of the device itself, giving information
about load and operational readiness: for example, how heavily loaded the processor within
a router is, which could indicate potential problems with the capability of that specific
device, or the possibility of unexpected network load in that area. A device can also
produce statistical information about the network data it is passing, such as cumulative
counts of packets transmitted; sampled at intervals, these give another indication of
bandwidth and network load. Another capability is providing network management systems
with alerts (known as traps) when certain events occur on the device, for example a large
number of failed login attempts to its management interface. One other use of the protocol
is to remotely manage the network devices, reconfiguring them for different circumstances,
for example blocking particular ports or disabling network interfaces. This is not
specifically connected to network monitoring and aberrant behaviour detection, but such a
capability would allow a network engineer to react to aberrant events which have been
detected and perhaps find a solution.

Information gathered via this method is quite coarse; there are few specifics about types
of packets or the regularity of their throughput other than plain statistical counts and
aggregations. This data could be very useful alongside a more in-depth source of network
data, but is probably not granular enough to be the sole data source in an aberrant
behaviour detection system.
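Because the packet and octet counts an SNMP agent reports are cumulative counters, a rate has to be derived from two successive polls. The Python sketch below shows this arithmetic for a 32-bit counter such as ifInOctets from the IF-MIB, which wraps back to zero after 2^32 − 1. It assumes at most one wrap between polls and uses hypothetical counter values; the SNMP polling itself is not shown.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps to zero after 2**32 - 1


def octet_rate_bps(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Average bits per second between two polls of a 32-bit octet counter."""
    if interval_s <= 0:
        raise ValueError("polling interval must be positive")
    # Modular subtraction gives the correct delta even if the counter wrapped once.
    delta_octets = (curr_count - prev_count) % COUNTER32_MODULUS
    return delta_octets * 8 / interval_s


if __name__ == "__main__":
    # Hypothetical polls five minutes apart, the second taken after a wrap.
    print(octet_rate_bps(4_294_000_000, 1_500_000, 300))
```

On a busy high speed link a 32-bit octet counter can wrap more than once between polls, which is why the IF-MIB also defines 64-bit counters such as ifHCInOctets; this sketch cannot detect multiple wraps.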

2.2.4 Network Flow

A Network Flow is a record of a unidirectional sequence of packets between two endpoints
over a defined period of time, containing certain information with which the flow can
be identified. This information consists of seven key fields: source IP address, destination
IP address, source port number, destination port number, protocol type, type of service
and router input interface. On receiving a packet, a flow capable router examines it for
the information to fill these seven fields and, based upon the results, decides whether
the packet is part of a pre-existing flow record or something new. If it is part of an
existing flow record, the traffic statistics of that flow record are increased accordingly;
otherwise a new flow record is created with statistics including the initially received
packet. A few standards exist for flow data, the most common being NetFlow, developed
by Cisco [NetFlow, 2007] and generally accepted as the industry standard; another is
sFlow, an alternative produced more recently. Both produce fairly similar data for
analysis, and for the purposes of this definition the focus will be on NetFlow as it is
currently the more commonly supported.
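The lookup-or-create behaviour described above can be sketched as a dictionary keyed on the seven fields. This is a simplified in-memory model in Python, not Cisco's implementation: it ignores the active and inactive timeouts on which a real router expires and exports flow records, and the field values used are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """The seven key fields that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    input_if: int


@dataclass
class FlowStats:
    packets: int = 0
    octets: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowCache:
    def __init__(self) -> None:
        self.flows: Dict[FlowKey, FlowStats] = {}

    def observe(self, key: FlowKey, size: int, timestamp: float) -> bool:
        """Account one packet; return True if it started a new flow record."""
        stats = self.flows.get(key)
        is_new = stats is None
        if is_new:
            stats = FlowStats(first_seen=timestamp, last_seen=timestamp)
            self.flows[key] = stats
        stats.packets += 1
        stats.octets += size
        stats.last_seen = timestamp
        return is_new


if __name__ == "__main__":
    cache = FlowCache()
    web = FlowKey("10.0.0.1", "10.0.0.2", 40000, 80, 6, 0, 1)
    cache.observe(web, 60, 0.0)
    cache.observe(web, 1500, 0.2)
    print(cache.flows[web])
```

Two packets sharing all seven fields update one record, which is why flow data is so much smaller than a full packet trace.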

A flow record does not contain any information pertaining to the application layer; it
is merely a traffic profiling tool. Flow level data is not as specific as full packet analysis,
but holds the advantage in large scale, heavily used networks because it is fast to produce.
NetFlow recording is nowhere near as intensive as individual packet capture and produces
a much smaller dataset for a given series of packets, due to the way it aggregates packets
into related flows. This can have a big impact on heavily used networks, as the sheer
amount of data created by each data source and the processing power required to perform
analysis can be prohibitive to producing any kind of useful network usage report,
especially when working in real time. Even with this reduction in the amount of data,
NetFlow data can still be difficult to manage without some kind of presentation
application, so in organisations where NetFlow is used it will most likely be sent to a
network monitoring system to produce clear reports about the analysis carried out.
NetFlow data can be used to gain an overview of traffic traversing the entire network at
a point in time. It holds enough detail to analyse and produce reports of trends in port
usage and bandwidth on a packets per second, flows per second or bits per second basis,
as well as giving indications of interesting network behaviour. A network can be analysed
using NetFlow data and characterised according to how it is normally used; from this, it
can be seen when network usage differs. This is all based on flow level information but
is usually enough to indicate areas of interest.

There are some potential problems with NetFlow as a main source of network data,
the first being the common use of packet sampling in order to create the flow records.
Even though recording Network Flow is much less intensive and quicker than individual
packet capture, it is still too much of an overhead for very large networks, such as
DANTE's operation of GÈANT2. The problem is twofold: firstly, not wanting to impact
network performance with the analysis load, and secondly, creating datasets too large to
deal with sensibly. In such cases there is nothing to be done but to enforce some scheme
of packet sampling upon the NetFlow enabled router. With sampling, not every packet is
examined to record flow data; a given ratio can be set, usually somewhere between the
extremes of 1 in 15 and 1 in 1000. A recent study by Braukhoff et al. [2006] examined
the impact of packet sampling on anomaly detection metrics, with interesting results. The
investigation used a record of flow data from the outbreak of the Blaster worm in 2003,
whose characteristics were well known, so that the anomaly detection could be replayed
at various levels of sampling to produce results which could be scientifically compared.
Firstly, they found that packet counts are barely disturbed by packet sampling, even at a
sampling rate of 1 in 1000, whereas flow counts are heavily disturbed, causing many
identifiable trends to simply vanish. They attribute this to the fact that flows containing
only a few packets are sampled with a lower probability than flows containing many
packets, hence in many cases the smaller flows disappear. Secondly, they examined how
volume and feature entropy metrics are affected by packet sampling; their conclusion was
that “though we see that packet sampling disturbs entropy metrics (the unsampled value
cannot easily be computed from the sampled value as for byte and packet counts), the
main traffic pattern is still visible in the sampled trace.” [p161]. This would have to be
taken into account when deciding upon analysis techniques using NetFlow data.
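The explanation for vanishing small flows can be made concrete with a little arithmetic. Assuming each packet is sampled independently with probability 1/N (a router may instead take every Nth packet, which is assumed here to behave similarly), a flow of k packets appears in the sampled data only if at least one of its packets is picked, which happens with probability 1 − (1 − 1/N)^k. Packet counts, by contrast, can simply be scaled back up by N. The Python sketch below evaluates both.

```python
def flow_visibility(packets_in_flow: int, n: int) -> float:
    """Probability that a flow leaves at least one sampled packet under 1-in-n sampling."""
    if n < 1 or packets_in_flow < 0:
        raise ValueError("n must be >= 1 and packets_in_flow >= 0")
    return 1.0 - (1.0 - 1.0 / n) ** packets_in_flow


def estimate_total_packets(sampled_packets: int, n: int) -> int:
    """Scale a sampled packet count back up to an estimate of the true count."""
    return sampled_packets * n


if __name__ == "__main__":
    for size in (1, 10, 100, 1000):
        print(f"{size:5d}-packet flow seen with probability {flow_visibility(size, 1000):.4f}")
```

At 1 in 1000, a single-packet flow, such as one connection attempt from a scanning worm, survives sampling only one time in a thousand, while a 1000-packet flow survives roughly 63% of the time, which is consistent with the disappearance of small flows reported above.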

2.3 Existing Network Monitoring Solutions

2.3.1 TCPDump

TCPdump [TCPdump, 2007] is a commonly used network debugging tool which enables
the user to intercept and view individual TCP/IP packets as they are transmitted over a
network. It is built upon the libpcap [libpcap, 2007] packet capture library and can write
the data obtained from captured packets to a file. This output can then be interpreted by
a statistical analysis program to produce reports of trends in network usage and to give
further information about the traffic traversing the network in question. The program
itself contains no form of alert or notification regarding network events, but this can be
achieved by combining it with a network monitoring solution and an in-depth analysis of
the recorded data.

This solution provides an overwhelming amount of data regarding usage of the network
and can be very useful in diagnosing network faults. However, as mentioned previously,
individual packet capture is a very intensive process, and on a network of any great size
there would simply not be the resources available to capture, process and store every
packet, or even a sampled subset of packets, in this fashion. It is a very useful tool to
have when actively working to solve an identified issue, but is not something which
should necessarily be used in a passive network monitoring context.

2.3.2 Snort

Snort [Roesch, 1999] is described by its creator as a “Lightweight Intrusion Detection
System” which operates in a passive fashion, “providing administrators with enough data
to make informed decisions” [p229]. Like TCPdump it is based upon the libpcap [libpcap,
2007] packet capture library, but it analyses individually captured packets with the
capability, which TCPdump lacks, to inspect packet payloads in the application layer
against its rules. Again, due to the nature of its operation, it does not scale successfully
to larger networks, and its creator states that it is intended for “small, lightly utilized
networks” [p229]. Its method of network traffic analysis is rule-based, with the rules
created by individual network administrators and tailored to their network. If Snort
witnesses traffic which is defined as aberrant according to those rules, it will perform a
set action, most commonly sending an email to alert the administrators to the possibility
of a network problem.

Again, this depends on individual packet capture, which is not suited to large, heavily
used networks, as even the creator of the program states. It does contain a form of alert
system but, as mentioned previously, rule based systems are not ideal for such analysis,
as it is very difficult to predict new network trends, whether naturally evolving ones or
ones caused by new network threats. Such a system might create large numbers of false
positives or, in the case of a new style of anomaly, miss the problem altogether.

2.3.3 RRDtool

RRDtool [RRDtool, 2007] stands for Round Robin Database tool and describes itself
as “the industry standard data logging and graphing application”. It provides a series
of tools capable of creating, updating and manipulating databases of time series data
with which to produce graphs for visualising results. Its data storage uses round robin
database principles, which means that the database files will never grow larger than
a custom set size. This is achieved through constantly averaging and generalising the
data held within over a set amount of time. This has two results: firstly, the size needed
to record the data for a particular source will always be constant, but secondly, over time
the older results held will lose their granularity as smaller differences are averaged out.
This makes it a good choice for a situation such as DANTE's, as the data size is known
in advance, and storing data in this compact format means that potential I/O constraints
are minimized. The loss of granularity can be arranged so that only data old enough to
be of no direct use is averaged out past a certain point.
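The fixed-size behaviour can be illustrated with a bounded buffer. The Python sketch below is only an in-memory analogy and assumes nothing about RRDtool's on-disk layout: it reserves a fixed number of slots and, once they are full, each new value overwrites the oldest, so storage never grows.

```python
from collections import deque
from typing import Deque, List


class RoundRobinArchive:
    """Fixed number of slots; once full, each update displaces the oldest value."""

    def __init__(self, rows: int) -> None:
        if rows < 1:
            raise ValueError("rows must be at least 1")
        self._slots: Deque[float] = deque(maxlen=rows)

    def update(self, value: float) -> None:
        self._slots.append(value)  # the deque drops its oldest entry when full

    def values(self) -> List[float]:
        """Stored values, oldest first."""
        return list(self._slots)


if __name__ == "__main__":
    rra = RoundRobinArchive(rows=3)
    for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
        rra.update(v)
    print(rra.values())  # prints [3.0, 4.0, 5.0]
```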


RRDtool provides a way of storing data in a logical, easily readable and updatable format.
It can generate graphs based upon the data values held within the RRD databases and
hold data at different resolutions depending on user configurable settings, and it also
contains a form of aberrant behaviour detection based on Holt-Winters forecasting. The
result of its aberrant behaviour detection is a boolean, yes or no, for a particular time
period. There is no built-in functionality for alerts or any interface to further information;
the behaviour is merely flagged when seen. It does not provide any front end interface to
these commands, nor is it specifically tailored to data sources such as SNMP queries or
Network Flow. The data must be organised and processed into the required format before
being input to RRDtool for archiving and graphing. There are quite a few front-ends and
extensions available for RRDtool, the most notable being Cacti and NfSen, which are
discussed later in this chapter.

<pdp_per_row>1</pdp_per_row> <!-- 300 seconds -->
<!-- 2007-01-20 16:45:00 GMT / 1169311500 -->

Figure 2.1: Section of an RRD exported to XML format


This is a brief overview of the architecture of an RRDtool database and how it operates,
including the aberrant behaviour detection capability added by Jake Brutlag. Firstly, an
RRDtool database (from this point on referred to as an RRD) is stored on disk as a binary
file specific to the architecture of the machine on which the version of RRDtool used to
create it was compiled. It can be exported to and imported from an XML format, in which
it is easy to see the constituent sections [See figure 2.1]; more importantly, this allows it
to be ported between machines with RRDtool compiled for different architectures. The
binary format minimizes the time taken for reads and writes performed by the application
itself.

RRDtool performs an operation on the RRD known as consolidation, which is essentially
a form of archiving based upon user specified rules. Consolidation occurs with every RRD
update: as new data is added, older data is consolidated so that the archive maintains
a specific size and the overall data result is as the user has defined; older data can be
reduced to an average, a minimum, a maximum and so on. There can be different
consolidation functions per RRD, and internally, data using the same consolidation rules
is divided into separate Round Robin Archives (RRAs), in which the required amount of
space is set aside ready for data values to fill it. The example given in the RRDtool
documentation is a need to store 1000 values at 5 minute intervals. Within the RRA,
space for 1000 values will be allocated, plus a header of a set size. As data values are
updated they are added to the allocated space in a round robin fashion, so newer values
appear to knock older values off the end of the 1000 recorded instances. This is where
the consolidation function is used to keep track of the previous data in the way the user
has specified.
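For the documentation's example, 1000 rows at 5 minute (300 second) intervals cover 300,000 seconds, or roughly three and a half days. The consolidation step itself groups consecutive primary data points and reduces each group with a consolidation function. AVERAGE, MIN, MAX and LAST are RRDtool's standard consolidation function names, but the Python sketch below is an illustration only: it ignores unknown values and RRDtool's xfiles factor, and only consolidates complete groups.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

CONSOLIDATION_FUNCTIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "AVERAGE": mean,
    "MIN": min,
    "MAX": max,
    "LAST": lambda group: group[-1],
}


def consolidate(pdps: Sequence[float], steps_per_row: int, cf: str = "AVERAGE") -> List[float]:
    """Reduce each complete group of `steps_per_row` primary data points to one row."""
    if steps_per_row < 1:
        raise ValueError("steps_per_row must be at least 1")
    reducer = CONSOLIDATION_FUNCTIONS[cf]
    complete = len(pdps) - len(pdps) % steps_per_row  # drop a trailing partial group
    return [reducer(pdps[i:i + steps_per_row]) for i in range(0, complete, steps_per_row)]


if __name__ == "__main__":
    five_minute_samples = [10, 20, 30, 40, 50, 60, 70]
    print(consolidate(five_minute_samples, 3))         # 15 minute averages
    print(consolidate(five_minute_samples, 3, "MAX"))  # 15 minute maxima
```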

The aberrant behaviour detection functionality within RRDtool is implemented using
the Holt-Winters forecasting method, which will be examined more closely later in this
chapter. The information is stored through the addition of five consolidation functions,
as shown in table 2.1 [Brutlag, 2000a p143].

HWPREDICT     An array of forecasts computed by the Holt-Winters algorithm,
              one per primary data point.
SEASONAL      An array of seasonal coefficients with length equal to the seasonal
              period. For each primary data point the seasonal coefficient that
              matches the index in the seasonal cycle is updated.
DEVPREDICT    An array of deviation predictions. Essentially copied from the
              DEVSEASONAL array to preserve a history; it does no processing
              of its own.
DEVSEASONAL   An array of seasonal deviations. For each primary data point the
              seasonal deviation that matches the index in the seasonal cycle
              is updated.
FAILURES      An array of boolean indicators, 1 indicating a failure. Each update
              removes the oldest value and inserts the new observation.
              On each update the number of violations is recomputed.

Table 2.1: Consolidation functions within RRDtool for aberrant behaviour detection

Once the calculations have been performed and the relevant RRAs updated, it is the
FAILURES array that actually indicates aberrant behaviour.
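The way the FAILURES array turns individual deviations into a boolean flag can be sketched as a sliding window. An observation counts as a violation when it falls outside the predicted value plus or minus a multiple of the predicted deviation, and a failure is flagged when the number of violations in the most recent window reaches a threshold. The default values below (a multiple of 2, and 7 violations within a window of 9) are intended to mirror the defaults described in the RRDtool documentation, but should be treated as tunable assumptions. The forecasts and deviations would come from the HWPREDICT and DEVPREDICT arrays; this Python sketch takes them as inputs rather than computing them.

```python
from collections import deque
from typing import Deque


class FailureWindow:
    """Count recent violations, in the style of RRDtool's FAILURES array."""

    def __init__(self, threshold: int = 7, window: int = 9,
                 delta_pos: float = 2.0, delta_neg: float = 2.0) -> None:
        if not 1 <= threshold <= window:
            raise ValueError("threshold must lie between 1 and window")
        self.threshold = threshold
        self.delta_pos = delta_pos
        self.delta_neg = delta_neg
        self._recent: Deque[bool] = deque(maxlen=window)

    def update(self, observed: float, predicted: float, deviation: float) -> bool:
        """Record one observation; return True if the window now signals a failure."""
        too_high = observed > predicted + self.delta_pos * deviation
        too_low = observed < predicted - self.delta_neg * deviation
        self._recent.append(too_high or too_low)  # oldest result drops out when full
        return sum(self._recent) >= self.threshold


if __name__ == "__main__":
    detector = FailureWindow(threshold=2, window=3)
    # Hypothetical (observed, predicted, deviation) triples.
    for obs, pred, dev in [(100, 100, 5), (130, 100, 5), (125, 100, 5)]:
        print(detector.update(obs, pred, dev))
```

Requiring several violations within a window, rather than flagging every single outlier, is what keeps isolated spikes from being reported as aberrant behaviour.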

2.3.4 Cacti

Cacti [Cacti, 2007] is described as the “complete front end to RRDtool”, providing
a web based framework for aggregating data sources and displaying graphs according to
user configuration. Much of the functionality is provided via RRDtool; what Cacti offers
above RRDtool alone is, in essence, exactly what that description says: a more easily
configurable interface which can be altered to a network administrator’s preference to
provide a coherent front-end display of available network usage information. It supports
graphing of data gathered by SNMP queries, and graphs can be drawn from any data
source that can be made to use the ‘create’ and ‘update’ functionality in RRDtool.

It does not have any inherent ability for analysing and processing network traffic data
other than that within RRDtool itself; it merely allows the data to be displayed coherently
so that visual analysis can be carried out. This means that aberrant behaviour detection
can be performed using RRDtool’s Holt-Winters implementation, but the network traffic
data must first be processed into a format compatible with RRDtool. Even then, there is
no interface to the aberrant behaviour detection results other than a presentation of the
data as a graph. This only gives a visible indication of when the aberrant event occurred,
and would require a network administrator to conduct further separate research using
alternative tools to diagnose the perceived problem.

2.3.5 Flow-Tools

Flow-Tools is “a software package for collecting and processing NetFlow data” created by
Mark Fulmer [Flow-Tools, 2007]. It can be used to collect raw Network Flow data from
routers/servers and then process it to create reports on network activity. NetFlow
collected using the flow-capture command is written to disk, with compression applied,
in files that each cover a user configurable time period. The files can be configured to
expire, either after a set amount of time has passed or when a certain amount of disk
space has been used; a rotation process ensures that older files are expired first. The
interface to the flow-tools files for querying and analysis purposes is largely at the
command line. There are commands to process the data in a way completely configurable
by the user, but also built-in commands to aid searching for set kinds of aberrant
behaviour, such as scanning traffic on the network.

Flow-Tools can be configured to produce reports and graphs (through RRDtool) about
behaviour it witnesses on the network, but any aberrant or anomalous behaviour detection
via this method would be rule based. It could be configured to send the appropriate
information to RRDtool to make use of the built-in aberrant behaviour detection, but
this is not available as standard. A statistical analysis could also be applied to flow-tools
archived files, but again this must be configured by the network administrators
individually; there are no built-in capabilities for this.

2.3.6 NfDump

NfDump is a command line application written principally by Peter Haag to collect,
process and produce analytical reports of Network Flow data [NfDump, 2007]. It is a
key part of the wider NfSen project, which is described later in this section. It has
a built-in NetFlow capture daemon, nfcapd, which runs as a background system process
collecting NetFlow data as it is exported from the router. The data is then stored
in five-minute timeslices in a proprietary format which can be accessed using the other
NfDump command line tools. It provides the facility for viewing archived NetFlow data
that matches defined filters; a simplified example taken from the nfdump manpage
[Haag, 2005b] would be:

nfdump -r nfcapd.200407110845 'inet6 and tcp and (src port > 1024 and dst port 80)'

This displays all IPv6 TCP connections from source ports above 1024 to port 80, that is,
to any web server, that occurred within the timeframe covered by the specified nfcapd
file. The same tools can examine a range of timestamped nfcapd files and produce
detailed statistics very quickly, for example the top 20 statistics during the two given
timeslices in the regular format [Haag, 2005b]:

nfdump -R nfcapd.200407110845:nfcapd.200407110945 -S -n 20

An example of the output format:

Date flow start Duration Proto Src IP Addr:Port Dst IP Addr:Port Pkts Bytes Flows
2004-07-11 08:59:52.338 0.001 UDP -> 1 404 1
2004-07-11 09:15:03.422 5.301 TCP -> 45 2340 2

NfDump provides a highly flexible and quick command line interface for viewing specific
information pertaining to network events. It can be configured to produce graphs via its
sister application NfSen, which uses RRDtool as a back end. There is no command line
facility for detecting aberrant behaviour other than examining the statistical information
by hand or processing the stored data files with some external statistical analysis
program, but, as with Flow-Tools, this is something that a network administrator would
have to create and configure using their personal knowledge of the network. One extra
possibility provided by NfDump is the command line tool nfprofile, which uses stored
filters (known as profiles within NfDump) to process specified traffic into either an
ASCII formatted, human readable report, or a binary formatted data file which can be
analysed again using the NfDump command line tools. This can be configured to occur
as the files are stored, or when the administrator initiates analysis. This could allow
traffic of a particular type, to a particular subnet or from a particular IP address to be
stored separately from normally stored data to ease analysis.
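A profile of this kind is essentially a stored filter applied to every incoming flow, with matching flows routed to their own store. The Python sketch below expresses the earlier example filter (IPv6, TCP, source port above 1024, destination port 80) as a predicate over hypothetical flow dictionaries; it does not use nfprofile or the nfdump filter syntax, and the field names are assumptions.

```python
import ipaddress
from typing import Any, Callable, Iterable, List, Mapping, Tuple

Flow = Mapping[str, Any]


def ipv6_web_profile(flow: Flow) -> bool:
    """Rough equivalent of: inet6 and tcp and (src port > 1024 and dst port 80)."""
    return (ipaddress.ip_address(flow["src_ip"]).version == 6
            and flow["proto"] == 6  # IP protocol number 6 is TCP
            and flow["src_port"] > 1024
            and flow["dst_port"] == 80)


def split_by_profile(flows: Iterable[Flow],
                     profile: Callable[[Flow], bool]) -> Tuple[List[Flow], List[Flow]]:
    """Separate flows matching the profile from the rest."""
    matched: List[Flow] = []
    others: List[Flow] = []
    for flow in flows:
        (matched if profile(flow) else others).append(flow)
    return matched, others


if __name__ == "__main__":
    sample = [
        {"src_ip": "2001:db8::1", "proto": 6, "src_port": 50000, "dst_port": 80},
        {"src_ip": "192.0.2.1", "proto": 6, "src_port": 50000, "dst_port": 80},
        {"src_ip": "2001:db8::2", "proto": 17, "src_port": 50000, "dst_port": 80},
    ]
    web, rest = split_by_profile(sample, ipv6_web_profile)
    print(len(web), len(rest))
```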

Overall, NfDump provides an effective solution for processing and organising collected
NetFlow into a format which can be analysed for aberrant behaviour, but it does not
provide any kind of aberrant behaviour detection itself. The only restriction on the
amount of data which can be held is disk space, and stored data is not compressed.
Even so, as files are rotated every 5 minutes and named with the datestamp they cover,
in cases such as DANTE, where the amount of NetFlow being stored is too much for the
machine to keep for longer than two weeks, the older files could easily be deleted
automatically.
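With one file every five minutes, nfcapd writes 288 files a day, so a two week retention period means keeping around 4032 files per source. Because each file name carries its timeslice (for example nfcapd.200407110845 for 08:45 on 11th July 2004), the expiry decision can be made from the name alone. The Python sketch below is a hypothetical helper rather than part of NfDump: it only selects the expired names, and actually deleting them (for example with os.remove) is left to the caller.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def expired_nfcapd_files(names: Iterable[str], now: datetime,
                         keep: timedelta = timedelta(days=14)) -> List[str]:
    """Return nfcapd.YYYYMMDDhhmm names whose timeslice is older than `keep`."""
    expired: List[str] = []
    for name in names:
        prefix, _, stamp = name.partition(".")
        if prefix != "nfcapd" or len(stamp) != 12 or not stamp.isdigit():
            continue  # skip names that do not follow the timestamped naming scheme
        timeslice = datetime.strptime(stamp, "%Y%m%d%H%M")
        if now - timeslice > keep:
            expired.append(name)
    return expired


if __name__ == "__main__":
    files = ["nfcapd.200703010000", "nfcapd.200703200000", "nfcapd.current"]
    print(expired_nfcapd_files(files, now=datetime(2007, 3, 22, 12, 0)))
```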

2.3.7 NfSen

NfSen, or NetFlow Sensor, is described as “a graphical web based front end for the nf-
dump netflow tools” [NfDump, 2007]. Combined with NfDump it makes up what the
author Peter Haag refers to as the NfSen Project. It provides an interface to the Nf-
Dump command line tools, as well as illustrating network usage via graphs using data
processed and stored using the nfcapd NetFlow capture daemon. Figure 2.2 illustrates
exactly how NfDump and NfSen interact [Kiss & Mohàcsi, 2006].

Figure 2.2: NfSen Architecture

By default it produces graphs based on the live NetFlow data being captured, displaying
current network traffic behaviour over various timeframes. It also offers the ability to
define further profiles, subsets of the data to be graphed separately from the live
display. These profiles make use of the profiling feature within NfDump, and their details
can be configured via the web front end, including the amount of disk space the RRD for
each profile will take up. This per-profile size configuration means that a reliable
estimate of disk space can be obtained before any data is captured, and allows more
detailed profiles to use more space and hence retain their granularity for a longer period.

NfSen uses NfDump as its back end for capturing and processing the data, which means it
is only capable of monitoring network traffic based upon Network Flow. Due to this focus
on one source of data, however, the analysis it is capable of is perhaps more detailed
than that of other similar monitoring solutions, and the presentation of the results is
specifically tailored to this kind of information. It also uses RRDtool as its back end for
producing a graphical display, so it would be possible to harness RRDtool's built-in
Holt-Winters aberrant behaviour detection, though NfSen has no inherent means of
displaying such information. As with any system using RRDtool for its data storage, the
disk space used by the RRDs is a known quantity, though, as mentioned previously,
NfDump has no compression facility for its data storage. There is a modular framework
for adding plugins to the system; one popular example is PortTracker, which monitors
connections to various ports graphically. There is also a facility for automatic alerting
via email according to given rules, but this has to be configured by a network
administrator with specific knowledge of the network in question and its uses.

2.3.8 Overview

There are quite a number of network monitoring solutions available, but few, if any,
support aberrant behaviour detection and indication. TCPdump and Snort are far too
intensive to use in any kind of passive monitoring environment, and the data they provide
depends heavily on the rules used. Their analysis is not adaptive and in most cases
requires a very good knowledge of the network environment being monitored. A side
factor is the legal implications of the data that is produced; with Snort, for example, it is
possible to view data held in the application layer of packets traversing the network. A
network administrator using such a tool to analyse network traffic may inadvertently find
traffic containing illegal material, and in such a case the law is not entirely clear
regarding the administrator’s position in having viewed it. Whilst it is important to ensure
that rules and restrictions on network use are in place, it is not necessarily the place of a
network administrator to enforce them, so whilst diagnosing network faults the potential
for such inadvertent discoveries is best avoided. This is especially true for administrators
in a situation like DANTE's, whose responsibility is simply for the links between separate
service providers; it is those institutions whose place it is to enforce such network usage
restrictions.

RRDtool is the most promising service, offering the capability of aberrant behaviour
detection using the Holt-Winters algorithm on supplied data. It is based upon round robin
principles and hence uses a static amount of disk space to store its data, as well as being
able to produce graphical representations of any kind of information, provided it is input
in the correct format. Unfortunately this is where RRDtool falls short: it has no capability
for analysing or processing data, dealing merely with pre-processed values submitted with
the correct flags. Further applications are necessary to give RRDtool its full potential.

Cacti provides a fully functional interface to RRDtool, with the ability to create and
view graphs of various data sources, including the facility to carry out SNMP queries.
Whilst this solves some of the initial interface problems faced when using RRDtool alone,
it still leaves a need for some form of pre-processing of any network data other than
SNMP before it can be input to RRDtool. Cacti also has no interface specifically designed
for the results that might be produced by RRDtool’s aberrant behaviour detection, so this
would also need to be created. Flow-Tools might meet the need for preparatory processing
and analysis: it automatically captures and stores NetFlow data in a compressed format
and can be configured to output to RRDtool depending on the data presentation
required. This still leaves a need for a front end presentation of the data, both regular
and aberrant behaviour related.

Finally there are NfDump and NfSen, two closely linked applications. One supplies a
NetFlow capturing and storing facility with processing and profiling capability, the other
a web based interface to RRDtool. Within the context of this project, this is the most
complete package overall. It is lacking in a number of areas, however: there is no inherent
ability for NfSen to provide any aberrant behaviour detection or indication, and the web
interface only allows the creation of basic data profiles, not the ability to set up or view
data being processed by the Holt-Winters algorithm. There is no predesigned facility for
indicating network events when they occur other than viewing the specific graphs at the
right time. This appears to be the best on offer, but requires more development to be an
ideal solution.

2.4 NfSen-HW
NfSen-HW is an extension to NfSen currently being developed in an attempt to make
full use of the Holt-Winters aberrant behaviour detection capabilities within RRDtool
[NfSen-HW, 2007]. It was initially presented to JRA2, the GÈANT2 security team, in
September 2006 as a project being undertaken by network administrators at HUNGARNET,
the Hungarian research and education network [Kiss & Mohàcsi, 2006]. The aim was to
aid Computer Security Incident Response Teams (CSIRTs) in their usual work process,
to “find abnormal behaviour, report and coordinate incidents”, the goal being to “help
visually detect abnormal behaviour” [Kiss & Mohàcsi, 2006, Slide 2].

Whilst it is at the very cutting edge of development, it does provide a combination of
everything previously listed as a requirement: NfDump for the underlying data processing
and profiling, and a customised NfSen interface to create and view data sources,
including instances of aberrant behaviour detected using the Holt-Winters functionality
within RRDtool.

2.4.1 Architecture and Organisation

The architecture of NfSen-HW is much like that of NfSen, the main differences being the
extra processing done as part of RRDtool and the redesign of the front end. As a
comparison of figures 2.2 and 2.3 shows, there are no alterations to the actual framework
of NfDump and NfSen, merely the addition of the Holt-Winters forecasting within
RRDtool [Kiss & Mohàcsi, 2006].

Figure 2.3: NfSen-HW Architecture

The forecasting algorithm reads from and updates the individual RRD files, adding data
to the Holt-Winters specific RRAs. When an observed value deviates too far from its
forecast, this is marked within the RRD files so that it is displayed on the web front end
after the next scheduled processing run.

The plugin architecture within NfSen allows Perl modules of a particular format to be
included as processing scheduled to run at every update, that is, every five minutes. These
plugins are held within a rigid framework and can provide information to a front end
plugin, which is simply a PHP page included in the front end plugin directory. Using this
method, extra processing tailored to a specific network or need can be performed. In the
case of NfSen-HW, however, the plugin architecture has not been used to implement the
extra processing and changes required to update RRDtool correctly for Holt-Winters
forecasting. Gabor Kiss said this is due to the organisation of NfSen’s modular structure:
to achieve what he has within a plugin, he would have had to repeat large pieces of the
underlying code base within the plugin itself. Because of this he chose simply to modify
the source code, and has submitted suggestions to Peter Haag as to how the modular
framework could be improved. 1

In conclusion, NfSen-HW provides a very useful platform for detecting aberrant behaviour,
but it does not fulfil all of the criteria laid down for use within DANTE. In DANTE's case
the amount of network data available is very large, and even in graphical form it can be
too much to take in visually. With NfSen-HW there is no immediate way of indicating
network anomalies without an administrator examining the correct graph at the right
time. This might not seem a major issue at first, but because of the volume of NetFlow
data captured each day, DANTE can only retain a limited amount of it, and there would
not be space to hold RRD files of unlimited size for every profile. If an RRD can only
ever be a certain size, that size might only cover one day’s worth of aberrant behaviour
indications, in which case after 24 hours the indication of aberrant behaviour for that
profile is lost.

2.4.2 Holt-Winters Forecasting

The mathematical process behind Holt-Winters forecasting can be explained in three stages.
The first is:

Single Exponential Smoothing

This is a simple algorithm for predicting the next value in a time series, and it is only
suitable for time series that show no trend. A weighted average is taken of all previous
values in the series, weighted so that the most recently recorded values count the most,
since the most recent values are likely to be the most relevant to the next one. This is
achieved by assigning geometrically declining weights to previous values, which decrease
by a constant ratio the further back they go. The forecast can be updated using only two
pieces of information: the latest observed value and the previously calculated forecast.
For this to work well, the smoothing constant must be chosen carefully. High values
(0.8 to 0.9) place a heavy emphasis on the newest values in the time series, whereas low
values (0.1 to 0.2) spread the weight further back, giving more prominence to values in
the past. A smoothing constant value of 1 would result in the forecast value being equal
to the previously observed value.

¹ This explanation was given during a telephone conference on 24th January 2007 involving
myself, Maurizio Molina (Network Engineer, DANTE), János Mohácsi and Gabor Kiss
(NfSen-HW Developers).
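The update rule described above can be sketched in a few lines of Java. This is a minimal illustration, not code from NfSen-HW or RRDtool; the class and method names are hypothetical, and the forecast is seeded with the first observation, which is only one of several common choices.

```java
import java.util.List;

/** Minimal sketch of single exponential smoothing (no trend, no seasonality). */
final class SingleExponentialSmoothing {
    private SingleExponentialSmoothing() {}

    /**
     * One update step: next forecast = alpha * latest observation + (1 - alpha) * previous forecast.
     * Unrolling this recursion gives weights alpha, alpha(1-alpha), alpha(1-alpha)^2, ...
     * on successively older observations, i.e. geometrically declining weights.
     */
    static double update(double alpha, double observed, double previousForecast) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must lie in [0, 1]");
        }
        return alpha * observed + (1.0 - alpha) * previousForecast;
    }

    /** Forecasts the value after the last element, seeding the forecast with the first value (an assumption). */
    static double forecastNext(double alpha, List<Double> series) {
        double forecast = series.get(0);
        for (double observed : series) {
            forecast = update(alpha, observed, forecast);
        }
        return forecast;
    }
}
```

With `alpha = 1.0` the previous forecast receives zero weight, so `update` returns the latest observation, matching the remark about a smoothing constant of 1.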

Holt’s Method

The second stage is known as Holt’s method. It introduces the possibility of a trend in the
values of a time series and takes this into account when forecasting the next result. This
is done by creating a second variable, the slope, which keeps track of the direction in which
the trend is heading. The slope is also updated using exponential smoothing, so there are
two smoothing constants to choose. Initially these must be given values, usually in the
region of

$$0.02 < \alpha_0,\ \alpha_1 < 0.2$$

where $\alpha_0$ and $\alpha_1$ are the two smoothing constants.
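A minimal Java sketch of Holt’s method follows. The text does not say which constant smooths which variable, so this sketch assumes `alpha0` smooths the level (baseline) and `alpha1` smooths the slope; the class name and the explicit initial level and slope are illustrative.

```java
/**
 * Minimal sketch of Holt's linear method: a level and a slope, each smoothed exponentially.
 * Assumption: alpha0 smooths the level and alpha1 smooths the slope.
 */
final class HoltLinear {
    private final double alpha0;
    private final double alpha1;
    private double level;
    private double slope;

    HoltLinear(double alpha0, double alpha1, double initialLevel, double initialSlope) {
        this.alpha0 = alpha0;
        this.alpha1 = alpha1;
        this.level = initialLevel;
        this.slope = initialSlope;
    }

    /** Forecast for the next time step: current level plus one step of trend. */
    double forecast() {
        return level + slope;
    }

    /** Absorbs one observation and returns the updated one-step-ahead forecast. */
    double update(double observed) {
        double previousLevel = level;
        // New level: blend the observation with where the old level and slope said we would be.
        level = alpha0 * observed + (1.0 - alpha0) * (previousLevel + slope);
        // New slope: blend the latest change in level with the old slope.
        slope = alpha1 * (level - previousLevel) + (1.0 - alpha1) * slope;
        return forecast();
    }
}
```

On a perfectly linear series (10, 12, 14, 16, ...) started with level 10 and slope 2, each forecast continues the line, so after observing 16 the forecast is 18.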

Holt-Winters Forecasting

The third and most important stage is the forecasting algorithm itself. It extends Holt’s
method to take into account not only the possibility of a trend in the time series values,
but also the potential for seasonal variation over different time periods, for example daily,
monthly or yearly patterns. The observed time series is broken down into three components,
each of which can be calculated to forecast further values:

• The Baseline (or Intercept)

• The Linear Trend (or Slope as it was referred to previously)

• The Seasonal Trend

Each component is still updated using exponential smoothing, but with its own smoothing
constant. In the case of the seasonal trend, since the current position within the season is
known, the last known value for the same position in the season can be referenced and given
the most relevance in calculating a prediction.
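The three components can be sketched together as additive Holt-Winters, in the form used by Brutlag for network monitoring. This is a minimal illustration under stated assumptions, not RRDtool’s implementation: the constant names `alpha`, `beta` and `gamma` (baseline, slope, seasonal) are conventional rather than taken from the text, and the initial baseline, slope and seasonal coefficients are supplied by the caller.

```java
/**
 * Minimal sketch of additive Holt-Winters forecasting.
 * baseline (intercept), slope (linear trend) and one seasonal coefficient per position in the season.
 */
final class HoltWintersAdditive {
    private final double alpha; // smoothing constant for the baseline
    private final double beta;  // smoothing constant for the slope
    private final double gamma; // smoothing constant for the seasonal coefficients
    private final double[] seasonal;
    private double baseline;
    private double slope;
    private int position; // position of the next observation within the season

    HoltWintersAdditive(double alpha, double beta, double gamma,
                        double initialBaseline, double initialSlope, double[] initialSeasonal) {
        if (initialSeasonal.length == 0) {
            throw new IllegalArgumentException("season length must be at least 1");
        }
        this.alpha = alpha;
        this.beta = beta;
        this.gamma = gamma;
        this.baseline = initialBaseline;
        this.slope = initialSlope;
        this.seasonal = initialSeasonal.clone();
    }

    /** Forecast for the next observation, using the coefficient last updated one season ago. */
    double predict() {
        return baseline + slope + seasonal[position];
    }

    /** Absorbs one observation and returns the forecast that had been made for it. */
    double update(double observed) {
        double forecast = predict();
        double seasonalLastSeason = seasonal[position];
        double previousBaseline = baseline;
        baseline = alpha * (observed - seasonalLastSeason) + (1.0 - alpha) * (previousBaseline + slope);
        slope = beta * (baseline - previousBaseline) + (1.0 - beta) * slope;
        seasonal[position] = gamma * (observed - baseline) + (1.0 - gamma) * seasonalLastSeason;
        position = (position + 1) % seasonal.length;
        return forecast;
    }
}
```

For a series that repeats exactly (for example 100, 110, 100, 90 with matching initial coefficients), every forecast equals the observation, so the prediction error that the confidence bands measure stays at zero.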

Aberrant behaviour detection is then performed using confidence bands. Since a reasonably
accurate prediction can be made of the next value in a series, it is also possible to define
limits between which the value can confidently be expected to fall. In other words, if the
actual next value is not exactly the predicted value, the bands define how far it may deviate
while still being considered to follow the current trend and seasonal variations. If the
actual value falls outside these limits, either above or below, then, depending on the
magnitude of the error, it can be classified as aberrant compared with previously known
values.
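A sketch of the confidence band test, in the style of RRDtool’s FAILURES array, is shown below. It assumes Brutlag’s formulation: a seasonal deviation `d` smoothed with `gamma`, a band of `predicted ± delta * d` using the deviation from one season ago, and an aberration flagged when at least `threshold` band violations occur within the last `window` observations. The class name and parameter values are illustrative, not NfSen-HW’s configuration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Sketch of confidence-band violation counting over a moving window. */
final class ConfidenceBandDetector {
    private final double gamma;       // smoothing constant for the deviation
    private final double delta;       // band width as a multiple of the deviation
    private final double[] deviation; // seasonal deviation, one slot per position in the season
    private final int threshold;      // violations needed within the window to flag a failure
    private final int window;         // number of most recent observations considered
    private final Deque<Boolean> recent = new ArrayDeque<>();
    private int position;
    private int violationsInWindow;

    ConfidenceBandDetector(int seasonLength, double gamma, double delta,
                           int threshold, int window, double initialDeviation) {
        this.gamma = gamma;
        this.delta = delta;
        this.threshold = threshold;
        this.window = window;
        this.deviation = new double[seasonLength];
        Arrays.fill(deviation, initialDeviation);
    }

    /** Returns true when the current window holds at least `threshold` band violations. */
    boolean observe(double observed, double predicted) {
        double error = Math.abs(observed - predicted);
        double previousDeviation = deviation[position];
        // Outside [predicted - delta*d, predicted + delta*d], i.e. above or below the band.
        boolean violation = error > delta * previousDeviation;
        deviation[position] = gamma * error + (1.0 - gamma) * previousDeviation;
        position = (position + 1) % deviation.length;

        recent.addLast(violation);
        if (violation) {
            violationsInWindow++;
        }
        // Slide the window: forget the oldest observation once the window is full.
        if (recent.size() > window && recent.removeFirst()) {
            violationsInWindow--;
        }
        return violationsInWindow >= threshold;
    }
}
```

Requiring several violations inside a window, rather than flagging every single excursion, is what stops one noisy five-minute sample from being reported as aberrant.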

This is a very simplified explanation of the Holt-Winters forecasting process, with reduced
emphasis on the mathematical formulae involved. It is based on information given by Jake
Brutlag and on Chatfield and Yar’s investigation into the practical issues of Holt-Winters
forecasting; more detailed explanations of the algorithm can be found in these sources
[Brutlag, 2000a; Chatfield & Yar, 1988].

3.1 Requirements

3.1.1 Network Operator’s Workflow

Creating requirements for this project requires an understanding of the situation in which
the system will be used. A network operator has a considerable number of day-to-day
responsibilities other than identifying and rectifying network problems. In some cases,
detecting issues on the network will be less a proactive part of their work and more
something triggered by a user reporting a specific problem they are facing.
The result is that network issues often go unnoticed and unattended until they become
enough of a problem for an end user to complain. In a situation like DANTE’s, the size of
the network being monitored and the amount of data that traverses it mean that, even with
a network monitoring application showing graphs and trends of network activity, it is very
easy to miss a network event which only affects a small portion of the network or a small
number of traffic sources.

Figure 3.1 gives an example of the actions taken by a network operator when a problem
is detected or reported. With the current infrastructure, most of that process involves
tracking down the problem and then using separate applications and tools to gain a better
understanding of it. There is no facility for easily seeing other affected sites without
going through the same process multiple times. To discover whether other network
operators have previously investigated or dealt with the problems identified, an operator
must access a separate ticketing system and specifically identify the sources and time
periods in question. This means that if a problem has already been analysed and explained,
a second operator may have to repeat the same process. Finally, as mentioned previously,
the amount of data being monitored means that network problems can be missed. If someone
reports a problem which has been ongoing for longer than a certain period, the original
NetFlow data covering the start of the event will probably have been deleted; in DANTE’s
network monitoring setup that window is two weeks. All analysis and diagnosis would then
have to be based on the data held in the RRD files and graphs which, due to the round
robin nature of RRDtool, become less accurate as time passes.

3.1.2 Requirements list

Based on this understanding of the situation, a list of requirements has been derived, each
of which should be met for the solution to be considered a success. This list was completed
after a series of discussions with a network operator at DANTE.

Figure 3.1: Use Case diagram depicting the diagnosis of a network anomaly

Overall Outcome

There is a single overall outcome to be achieved by meeting each of the individual
requirements; it was the starting point for deriving more specific needs.

To assist a network operator in the identification and diagnosis of network problems and
illustrate how the inclusion of automated aberrant behaviour detection could improve large
network monitoring.

High Level Requirements

Working from this overall end aim has produced a short list of high level, slightly more
focussed requirements:

A Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.
B Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.
C Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.
D Indicate possible links between indicated aberrant network behaviour instances.
E Keep a historical record of aberrant network behaviour instances and basic analytical details.
F Provide a flexible interface to past aberrant network behaviour information.
G Provide a means of indicating that aberrant network behaviour instances have been investigated.

Fully Derived Requirements List

Finally, based on the high level requirements, a fully derived requirements list can be
created. These are broken down into separate tables relevant to the high level requirement
they satisfy. These numbered requirements will be reviewed at the end of the project as
part of the Testing and Evaluation chapter of this report.

A Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.
A.1 Aberrant network behaviour instances should be displayed together on
one page organised by the time they occurred.
A.2 Only the most relevant information for each aberrant behaviour instance
should be displayed.
A.3 Aberrant network behaviour instances should be aggregated to display one
event per continuously flagged period.
A.4 This display should automatically update as new aberrant behaviour is
detected on the network.
A.5 The display should be accessible from machines other than the machine it
is installed on.
A.6 Each aberrant network behaviour event should be displayed in an identical
style so quick comparisons of information can be made.

Table 3.1: Derived Requirements List for High Level Requirement A


B Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.
B.1 The information displayed as part of the live update can be filtered
to show only instances which match particular conditions.
B.2 The default update should contain information the network operator
believes to be the most relevant in the first instance.

Table 3.2: Derived Requirements List for High Level Requirement B

C Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.
C.1 Further information should be available for each aberrant network
behaviour instance on request.
C.2 This information should include, at the least, a graph of the time frame in
question and a brief statistical synopsis for the given period and traffic
C.3 This information should be persistent beyond deletion of the actual NetFlow
records for that aberrant network behaviour event.
C.4 It should be made obvious if a particular aberrant network behaviour event has
been flagged as a false positive when examining further details.

Table 3.3: Derived Requirements List for High Level Requirement C

D Indicate possible links between indicated aberrant network behaviour instances.
D.1 If further information about an aberrant network behaviour event is
requested then a display should also be provided of possible associated events.
D.2 Further information pertaining to these associated aberrant network
behaviour events should be available on request.

Table 3.4: Derived Requirements List for High Level Requirement D

E Keep a historical record of aberrant network behaviour instances and basic analytical details.
E.1 Detected aberrant network behaviour events should be recorded in some form
of persistent database.
E.2 The database should be reliable, quick to query, and scale well to holding
potentially very large data sets.

Table 3.5: Derived Requirements List for High Level Requirement E


F Provide a flexible interface to past aberrant network behaviour information.
F.1 It should be possible to view past aberrant network behaviour event details
based upon a number of criteria:
F.2 Exact Start time and End time.
F.3 Start time somewhere between two given dates and times.
F.4 End time somewhere between two given dates and times.
F.5 Alongside queries based upon the starting and end times results should be
chosen according to further specific information; type/source/profile etc.
F.6 When results have been found it should be possible to view further information
about an event in the same way it would be possible for a live event.

Table 3.6: Derived Requirements List for High Level Requirement F

G Provide a means of indicating that aberrant network behaviour instances have been investigated.
G.1 Aberrant network behaviour events stored in the system should be able to
be flagged as acknowledged when they have been dealt with.
G.2 Aberrant network behaviour events stored in the system should be able to
be flagged as a false positive if they have been identified as such.
G.3 Operators who have dealt with a particular aberrant network behaviour event
should be able to leave some comment regarding their findings for the benefit
of later users.

Table 3.7: Derived Requirements List for High Level Requirement G


3.2 Design Decisions

This section gives a brief justification of the tools and systems used within the Sentinel system design.

3.2.1 NfSen-HW

This system has been chosen as the basis for the network traffic data analysis and for the
aberrant behaviour detection, for several reasons. Firstly, it is the most complete package
available in this area of network monitoring: it provides a reliable platform, built on a
well-established mathematical method, for detecting network anomalies, packaged so that
installation and configuration are not arduous. Secondly, the network operators at DANTE
already have good working experience of NfSen, the version of this software without
aberrant behaviour detection. Because of this, the exchange of GÉANT2 data for Sentinel
development and testing should be more straightforward, as the flows can be transferred
as files already in a compatible format.

3.2.2 Java

Java 1.5 will be used to scan the RRD files produced by NfSen-HW for aberrant behaviour
marks. This was intended to be done using a Java RRD library, allowing Java to interface
directly with the RRD files; some examples of such libraries are compared in the JRA1
perfSONAR wiki [RRD Java Libraries]. Unfortunately, because of the version of RRDtool
required for use with NfSen-HW, these libraries will not read the RRD files it produces.
JRobin, the most complete library, requires RRD files to be passed through a converter
before use, and although JRobin itself would produce the results required, the converter
did not support the version of the RRD files being used and so could not convert them
[JRobin, 2006]. Instead, a tool provided by RRDtool itself, rrdtool dump, will be used. This
was mentioned earlier in the Background and Related Work section and produces a full
XML representation of the contents of an RRD file. The Java standard library includes
flexible XML parsing APIs, so once the RRD files have been exported to XML it should be
possible to read in the appropriate aberrant behaviour results.

Java also provides APIs for connecting to, querying and altering SQL-compliant databases,
and these will be used to insert the collated aberrant behaviour events into the database
used by the front end. Using Java in this way should mean that the finished application is
portable to any system on which NfSen-HW has been installed, regardless of architecture,
unlike RRDtool. Provided a Java runtime is installed, the application should run in any
NfSen-HW environment.

3.2.3 MySQL and PHP

The database will be stored using MySQL 5.0 and the web front end written in PHP5
[MySQL, 2007; PHP, 2007]. MySQL is an open source database very widely used in web
based applications, and PHP is a server side embedded scripting language whose output
is rendered as a web page. These are two highly flexible and frequently integrated pieces
of software which should provide an excellent platform for storing and presenting the
aberrant indication history. Between them they provide the tools needed for this project,
including complex database queries and, within PHP, functions for service and IP address
lookups.

3.2.4 Debian GNU/Linux

Debian GNU/Linux will be the operating system platform for Sentinel. This is first and
foremost because the installation of NfSen-HW requires a well maintained and compliant
Linux distribution, but also because of my familiarity with Debian’s system architecture
and its excellent package management system, apt [Debian GNU/Linux, 2007]. This should
mean that installing necessary software, such as Java and PHP, will be a simple process,
leaving more time for development and testing. Linux distributions also generally come
with a number of useful applications which will be required for this project, the most
important being Bash (the ‘Bourne-Again SHell’) and Cron. Bash is the command line
interpreter which comes as standard with GNU operating systems [Bash, 2007]. It provides
a text based user interface for executing commands, and also allows commands to be
collected in files known as Bash scripts; a Bash script is what will be used to run the
Sentinel Java process on specific RRD files. Cron, or more specifically Vixie Cron, is a
background process, or daemon, which executes scheduled commands at specific times.
Using formatted configuration files known as crontabs, Cron can be set to run a particular
command or script at set points every minute, hour, day or month. Cron will be used to
ensure that the runSentinel.sh script is executed every five minutes, matching the NfSen
update interval.

3.3 System Architecture

As described in the Design Decisions section, the system is made up of many smaller
components which interact with NfSen-HW, NfDump and RRDtool.

Figure 3.2: Overview of Proposed System Architecture

Figure 3.2 illustrates how the various sections and systems interface with each other.
As can be seen, NfSen-HW is an important part of the back end: Sentinel bases its
aberrance detection on the events identified by RRDtool’s Holt-Winters forecasting.
The rest of this section gives a more detailed description of what each of the individual
components does.

3.3.1 NfSen-HW and NfDump

The operation of these two applications has already been covered in previous chapters
of this report, but here is an overview of their use within the wider Sentinel indication
system. The NetFlow data from all sources is captured, as shown in Figure 3.2, by
individual instances of the nfcapd capture daemon. It is then analysed and organised by
NfDump according to the specified profile filters. This information is then processed by
the front end system, NfSen-HW, and the relevant parameters are passed to RRDtool to
create RRD files for each source/profile. The Holt-Winters forecasting takes place as part
of this process within RRDtool itself, and the resulting aberrance indication data is written
to each individual RRD in the RRA sections specific to aberrant behaviour detection. Once
this information has been stored within the RRD files, NfSen-HW plays no further part in
the aberrance indication process. This happens once every five minutes, when the nfcapd
NetFlow data files are rotated and new network traffic data becomes available for analysis.

3.3.2 runSentinel.sh

runSentinel.sh is a Bash script which is executed once every five minutes by Cron. It
traverses the directory structure that holds the NfSen-HW RRD files, uses the rrdtool
dump command to export each file to XML and runs Sentinel.jar on each result to extract
the aberrant behaviour. Its purpose is simply to keep the aberrant event database updated
every five minutes, in step with the RRD files themselves, so that no aberrant events are
missed.
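The script itself is Bash; to keep the examples in one language, the same two steps (find every RRD file below the profile directory, then export each with `rrdtool dump`) are sketched here in Java. The class name and the `name.rrd.xml` output naming are hypothetical; `rrdtool dump file.rrd` writing XML to standard output is standard RRDtool behaviour. The sketch uses post-1.5 Java features (try-with-resources, streams, lambdas).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical Java equivalent of the traversal performed by runSentinel.sh. */
final class RrdDumpWalker {
    private RrdDumpWalker() {}

    /** Recursively collects every *.rrd file below the given directory, in a stable order. */
    static List<Path> findRrdFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".rrd"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    /** The command line for exporting one RRD file to XML on standard output. */
    static List<String> dumpCommand(Path rrd) {
        return Arrays.asList("rrdtool", "dump", rrd.toString());
    }

    /** Runs rrdtool dump and writes the XML next to the RRD file (hypothetical naming: name.rrd.xml). */
    static Path dumpToXml(Path rrd) throws IOException, InterruptedException {
        Path xml = rrd.resolveSibling(rrd.getFileName().toString() + ".xml");
        Process process = new ProcessBuilder(dumpCommand(rrd))
                .redirectOutput(xml.toFile())
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        if (process.waitFor() != 0) {
            throw new IOException("rrdtool dump failed for " + rrd);
        }
        return xml;
    }
}
```

Sorting the paths is not required for correctness, but it makes each five-minute run process files in the same order, which simplifies comparing logs between runs.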

3.3.3 Sentinel.jar

This is the Java component responsible for interpreting the contents of the RRD files and
then inserting that information into the Sentinel database. It does this by parsing the XML
version of each RRD file, produced by runSentinel.sh, using Java’s built-in SAX XML parsing
libraries. The default XML handler provided by the library is extended to create a handler
which only looks for the specific sections of XML required to retrieve the aberrant
behaviour data. Information about each flagged event is extracted and placed in an
AberrantBehaviour object, and once the XML file has been parsed to pick up every instance
of aberrant behaviour, the collection of AberrantBehaviour objects is inserted into the
Sentinel database. This uses Java’s JDBC libraries for connecting to and manipulating data
within databases, in this case with the MySQL connector.
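A minimal sketch of such a handler follows, extending SAX’s `DefaultHandler` so that only the `FAILURES` archive of an `rrdtool dump` document is read. It is not the Sentinel source. It assumes a single FAILURES RRA, that a value of 1 marks a flagged interval, that rows are listed oldest first, and that the newest row ends at the last update rounded down to the row interval. It computes timestamps this way because `rrdtool dump` writes row times only inside XML comments, which a plain `DefaultHandler` never receives.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch of a SAX handler that keeps only the FAILURES RRA values of an rrdtool dump document. */
final class FailuresHandler extends DefaultHandler {
    /** One flagged interval: which data source (column) and the row's timestamp. */
    static final class Flag {
        final int dsIndex;
        final long time;
        Flag(int dsIndex, long time) { this.dsIndex = dsIndex; this.time = time; }
    }

    private final StringBuilder text = new StringBuilder();
    private final List<int[]> flaggedCells = new ArrayList<>(); // {rowIndex, dsIndex}
    private long step, lastUpdate, pdpPerRow = 1;
    private boolean inRra, inFailures, inDatabase;
    private int rowIndex = -1;
    private int dsIndex;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0);
        if (qName.equals("rra")) {
            inRra = true;
        } else if (qName.equals("database")) {
            inDatabase = true;
        } else if (qName.equals("row") && inFailures && inDatabase) {
            rowIndex++;
            dsIndex = 0;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        switch (qName) {
            case "step": step = Long.parseLong(value); break;
            case "lastupdate": lastUpdate = Long.parseLong(value); break;
            case "cf": inFailures = inRra && value.equals("FAILURES"); break;
            case "pdp_per_row": if (inFailures) pdpPerRow = Long.parseLong(value); break;
            case "v":
                if (inFailures && inDatabase) {
                    if (isOne(value)) flaggedCells.add(new int[] {rowIndex, dsIndex});
                    dsIndex++; // each <v> in a row is one data source
                }
                break;
            case "database": inDatabase = false; break;
            case "rra": inRra = false; inFailures = false; break;
            default: break;
        }
    }

    private static boolean isOne(String value) {
        try {
            return Double.parseDouble(value) == 1.0; // "NaN" parses to NaN and is not flagged
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Converts flagged cells to timestamps, assuming the newest row ends at the last update. */
    List<Flag> flags() {
        long interval = step * pdpPerRow;
        if (interval <= 0) throw new IllegalStateException("no <step> element seen");
        long lastRowTime = (lastUpdate / interval) * interval;
        int rowCount = rowIndex + 1;
        List<Flag> flags = new ArrayList<>();
        for (int[] cell : flaggedCells) {
            flags.add(new Flag(cell[1], lastRowTime - (long) (rowCount - 1 - cell[0]) * interval));
        }
        return flags;
    }

    static List<Flag> parse(String xml) throws Exception {
        FailuresHandler handler = new FailuresHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.flags();
    }
}
```

Because SAX streams the document, memory use stays flat even for large RRD dumps; only the flagged cells are kept.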

Figure 3.3 depicts a high level UML diagram of the classes within the Sentinel Java
component. As the diagram shows, most of the complexity lies within the RRDDatabase
class; the XML parser simply extracts the relevant information. It should be noted that
Sentinel offers two forms of parsing. The first, the default, scans for any aberrant
behaviour indicated in the five minutes before the last update time. This is the form run
by the runSentinel.sh script every five minutes, and it ensures that only the latest
information is pulled into the database as it is updated. The second is a full scan,
triggered by a command line argument, which parses an XML file for every aberrant event
it contains. This is designed to be run the first time the system is put into operation, to
load the backlog of aberrant events into the database for historical purposes.

Figure 3.3: Sentinel Java UML Diagram
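The difference between the default scan and the full scan reduces to a filter on the flagged timestamps. The sketch below assumes flags are already available as Unix timestamps and that the step is 300 seconds; the `--full` argument name and the class name are hypothetical, since the text does not give the actual command line option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of choosing between the default (latest interval) and full (backlog) scans. */
final class ScanSelector {
    /** Hypothetical name for the command line argument that requests a full scan. */
    static final String FULL_SCAN_FLAG = "--full";

    private ScanSelector() {}

    static boolean isFullScan(String[] args) {
        return Arrays.asList(args).contains(FULL_SCAN_FLAG);
    }

    /**
     * Default scan: keep only flags from the last step (five minutes) before the last update.
     * Full scan: keep every flag, e.g. to load the backlog when the system is first installed.
     */
    static List<Long> select(List<Long> flaggedTimes, long lastUpdate, long step, boolean fullScan) {
        List<Long> selected = new ArrayList<>();
        for (long time : flaggedTimes) {
            if (fullScan || time > lastUpdate - step) {
                selected.add(time);
            }
        }
        return selected;
    }
}
```

With a last update at 1000 and a 300 second step, the default scan keeps a flag at 900 but drops one at 600; a full scan keeps both.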


3.3.4 Sentinel Database

This MySQL database stores all information about aberrant network events, including
their type, source, profile and a basic amount of statistical information. Figure 3.4 shows
the entity relationship diagram for the database schema:

Figure 3.4: Sentinel Database Entity Relationship Diagram

As the diagram shows, each aberrant event has exactly one type, profile and source, but
each type, profile or source may apply to many events. The contents and responsibilities
of each table within the database schema are summarised below.
events Holds information relevant to an aberrant network event, including the type,
source and profile via foreign key links to other tables. A start time and
end time are held per event, as well as a comment and flags indicating
acknowledged and false positive status. Brief statistics are also held,
taken from nfdump and a lookup of port/hostname.
types A simple table containing all possible types of network data and an id number
for linking purposes.
sources Contains all the sources seen so far with an id number for linking and a
description field to store further brief information about each source.
profiles Contains all the profiles seen so far with an id number for linking and a
description field to store further brief information about each profile.

Table 3.8: Sentinel Database Tables
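Inserting one event into the events table through JDBC might look like the sketch below. The column names (`start_time`, `type_id` and so on) are hypothetical stand-ins for the schema in Table 3.8, the comment and statistics fields are omitted, and new events are assumed to start unacknowledged and not marked as false positives.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Sketch of inserting one aggregated aberrant event over JDBC. Column names are hypothetical. */
final class EventInserter {
    static final String INSERT_EVENT =
            "INSERT INTO events (start_time, end_time, type_id, source_id, profile_id,"
            + " acknowledged, false_positive) VALUES (?, ?, ?, ?, ?, FALSE, FALSE)";

    private EventInserter() {}

    /** Inserts a new, unacknowledged event; times are Unix epoch seconds as used by RRDtool. */
    static void insert(Connection db, long startEpoch, long endEpoch,
                       int typeId, int sourceId, int profileId) throws SQLException {
        try (PreparedStatement statement = db.prepareStatement(INSERT_EVENT)) {
            statement.setTimestamp(1, new Timestamp(startEpoch * 1000L));
            statement.setTimestamp(2, new Timestamp(endEpoch * 1000L));
            statement.setInt(3, typeId);
            statement.setInt(4, sourceId);
            statement.setInt(5, profileId);
            statement.executeUpdate();
        }
    }
}
```

A `Connection` would normally come from `DriverManager.getConnection("jdbc:mysql://localhost/sentinel", user, password)` with MySQL Connector/J on the classpath; the database name `sentinel` is an assumption. Binding values through a `PreparedStatement` rather than concatenating them into the SQL avoids quoting problems with source and profile names.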

This final diagram illustrates how the tables will link together and the connections that
will be made using foreign keys.

Figure 3.5: Simple foreign key linking example

These diagrams give a precise description of the contents of the tables and the
relationships between them, but it is also important to understand how the data within
the tables will be used by the other parts of the Sentinel system.

Firstly, and most importantly, there is the events table, which links together all the
relevant information about a particular aberrant event. The table uses a unique integer
event id as its primary key; keeping this key separate from the stored event data should
make indexing of the table quicker and improve lookup times. Next are two columns relating
to the time at which the event took place. The way aberrant behaviour detection is
implemented within NfSen-HW and RRDtool means that a single network event is flagged
within the RRD as a continuous series of 5 minute segments. Since it is obvious from the
graphs produced that each individual 5 minute segment is not an aberrant event in its own
right, this design stores single events by holding the start time and end time of each
event: in terms of the RRD, the first 5 minute segment in which the aberrant behaviour was
indicated and the last 5 minute segment in which it was indicated. On a live updated page,
the end time is the last time the aberrant event was seen as active since, without seeing
the next segment, we cannot predict when a series of aberrant markers will end. The events
table then holds three foreign keys, linking to tables containing information about the
type, source and profile of an aberrant event. Next is a comment field where network
operators can comment on an event, leaving messages about any research they have
undertaken to solve a problem. After that there are two boolean flags: an acknowledged
field, where network operators may mark events as having been dealt with, and a false
positive field, which can be used if NfSen-HW has incorrectly identified a period of time as
an aberrant event. These fields are present purely for filtering purposes; when using the
system, a network operator does not want to be presented with events that have already
been identified as false positives.

The final two fields within this table are text fields containing more detailed information
about the flows of the relevant network protocol which occurred during the identified time
frame. While the system is being used as a live update, this information will most likely
be retrieved directly from NfDump, but once an event has been marked as ended and time
has passed without it becoming active again, the information will be stored in the
database for two reasons. Firstly, this will speed up the front end considerably: once an
event has finished, no new flow data will be added to it, so the information gathered from
flow statistics and hostname lookups will not change, and removing the need to requery
the stored flows should save time. Secondly, in cases such as DANTE, where NetFlow data
is only held for a restricted amount of time, this keeps at least a basic level of information
connected to an event so that it can be examined at a later date. Without it, the NfDump
lookup for a past event could not be performed, because the NetFlow data would no longer
be present on the system.
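Collapsing a continuous series of flagged 5 minute segments into one event with a start and end time, as described above, can be sketched as follows. The sketch assumes the flagged segment timestamps for one type/source/profile arrive sorted in ascending order and that consecutive segments are exactly one step (300 seconds) apart; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of merging consecutive flagged segments into single events. */
final class EventAggregator {
    static final class Event {
        final long start;
        long end; // extended while the flagged run continues
        Event(long start, long end) { this.start = start; this.end = end; }
    }

    private EventAggregator() {}

    /** flaggedSlots must be sorted ascending; a gap larger than one step starts a new event. */
    static List<Event> aggregate(List<Long> flaggedSlots, long step) {
        List<Event> events = new ArrayList<>();
        Event current = null;
        for (long t : flaggedSlots) {
            if (current != null && t - current.end <= step) {
                current.end = t; // the next segment continues the same event
            } else {
                current = new Event(t, t);
                events.add(current);
            }
        }
        return events;
    }
}
```

For segments at 0, 300, 600, 1500 and 1800 seconds this yields two events, 0 to 600 and 1500 to 1800. In a live update, a newly flagged segment one step after an open event’s end time would extend that event rather than create a new row.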

The other tables in the database schema are simpler. The types table contains a numeric
primary key and a corresponding network traffic type. NfSen-HW specifies 15 types of
network traffic data, which remain fixed throughout the rest of the system. They fall into
three groups, ‘flows’, ‘packets’ and ‘traffic’, and each group has 5 subcategories: all traffic
within that classification, all TCP traffic, all UDP traffic, all ICMP traffic, and finally
‘other’, which catches every other network protocol (for example, PIM or OSPF).
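The 15 types are simply the three groups crossed with the five subcategories, which a short sketch can enumerate to seed the types table. The exact names (`flows`, `flows_tcp`, ...) are an assumption about how NfSen-HW labels its data sources and should be checked against the actual RRD files; the list position is shown as a possible type id.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the 15 traffic types: 3 groups x 5 subcategories. Names are assumed. */
final class TrafficTypes {
    static final String[] GROUPS = {"flows", "packets", "traffic"};
    static final String[] PROTOCOLS = {"", "tcp", "udp", "icmp", "other"}; // "" means all traffic

    private TrafficTypes() {}

    /** Returns type names in a fixed order; the list index could serve as the type id. */
    static List<String> all() {
        List<String> names = new ArrayList<>();
        for (String group : GROUPS) {
            for (String protocol : PROTOCOLS) {
                names.add(protocol.isEmpty() ? group : group + "_" + protocol);
            }
        }
        return names;
    }
}
```

Because the set never changes, it can be inserted once when the database is created rather than discovered at run time.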

The profiles and sources tables are practically identical apart from their content: one
holds information about the data sources being used, the other about the profiles that
have been configured. Both contain a numeric primary key and the name of the source or
profile being stored. The final, optional field is a description, a place for further
information about a source or profile. This might be used to clarify what a source or
profile refers to when that is not immediately apparent from its short name.

3.3.5 Sentinel Web Interface

The Sentinel web interface should provide three different views of the same data. The first
is a live update screen showing all the aberrant behaviour identified by the system during
a configured period, for example the last 24 hours. The second is a more detailed view of a
specific aberrant event, with further information to help identify the source of the
problem. The third is an interface for searching the database of stored aberrant events by
when they occurred, what kind of traffic was involved and which sources were affected.
Each of these interfaces is discussed in turn, with a prototype of the final design.

Live Update

Figure 3.6: Proposed Live Update Web Interface

This interface is designed to be simple and easy to read at a glance. The aberrant events
which have occurred within the specified time frame are displayed in a table containing
only the information immediately necessary to gain an initial understanding of what has
happened. They are ordered by end time, so the events which were active most recently
are near the top. The events displayed can be filtered further, for instance to group
together events affecting a particular data source or traffic type. This is done using the
filter interface at the top of the screen; on clicking submit, the page is refreshed showing
only the data relevant to the options selected. By default, any events which have been
marked as a false positive or as acknowledged are not displayed. This stems from an
understanding of a network operator’s workflow: in most cases, if an event has been dealt
with or is being dealt with, it should not be listed in the Live Update as requiring
attention. An operator may still need to compare a currently unacknowledged event with
previous events regardless of their acknowledged or false positive status; in this case the
filters can be temporarily altered through the filter interface to display all events within
the given timeframe, regardless of the flags applied.
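The default Live Update selection described above (events ending inside the time window, acknowledged and false positive events hidden unless requested, optional source filter, most recent end time first) is sketched below in Java for consistency with the other examples. The real front end is PHP querying MySQL, so this shows only the selection logic; the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of the Live Update selection logic. Names are hypothetical. */
final class LiveUpdateFilter {
    static final class Event {
        final String source;
        final long end; // Unix epoch seconds
        final boolean acknowledged;
        final boolean falsePositive;
        Event(String source, long end, boolean acknowledged, boolean falsePositive) {
            this.source = source;
            this.end = end;
            this.acknowledged = acknowledged;
            this.falsePositive = falsePositive;
        }
    }

    private LiveUpdateFilter() {}

    /** sourceFilter == null means all sources; showFlagged overrides the default hiding. */
    static List<Event> select(List<Event> events, long now, long windowSeconds,
                              boolean showFlagged, String sourceFilter) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (e.end < now - windowSeconds) continue;                          // outside the live window
            if (!showFlagged && (e.acknowledged || e.falsePositive)) continue;  // hidden by default
            if (sourceFilter != null && !sourceFilter.equals(e.source)) continue;
            out.add(e);
        }
        out.sort(Comparator.comparingLong((Event e) -> e.end).reversed());     // most recent first
        return out;
    }
}
```

Filtering on the end time rather than the start time keeps a long-running event visible for as long as it remains active within the window.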


Figure 3.7: Proposed Details Web Interface

The Details interface is designed to gather as much information about the event as
possible in one place. The stored information about the event is presented first, in text
form, at the top of the page; this is the longer form of the information, containing
comments, flags and descriptions. This information can be edited from the page using a
small entry form further down, which is pre-filled with the information currently stored
about the event so that it may be edited or removed as appropriate. The graph covers a
time period relevant to the event, and underneath it is a brief statistical synopsis from
NfDump based on the start and end times and the classification of traffic that was
indicated as aberrant. The page also presents the top few flows, with their port numbers
and hostnames looked up. The aim is to give the operator as much information as possible
in one place, so they are not required to use separate applications to perform the
necessary analysis. Finally, the page displays events that have been identified as
associated with this one, primarily by when the events occurred: if two events start and
end at identical times, they are likely to be related in some way. This gives a network
operator a better feeling for how widespread the problem is.
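The association rule described above, events with identical start and end times, is sketched below. It is deliberately the exact-match rule from the text; a real deployment might prefer a tolerance of one or two segments, which would be a design choice beyond what is described here. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of finding events associated with a target by identical start and end times. */
final class AssociatedEvents {
    static final class Event {
        final int id;
        final long start;
        final long end;
        Event(int id, long start, long end) { this.id = id; this.start = start; this.end = end; }
    }

    private AssociatedEvents() {}

    /** Returns every other event whose start and end times both match the target's. */
    static List<Event> associatedWith(Event target, List<Event> all) {
        List<Event> related = new ArrayList<>();
        for (Event e : all) {
            if (e.id != target.id && e.start == target.start && e.end == target.end) {
                related.add(e);
            }
        }
        return related;
    }
}
```

In the database this corresponds to selecting rows from the events table with the same start and end time as the event being viewed, excluding the event itself.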


Figure 3.8: Proposed Review Web Interface


The Review section is primarily for accessing data about events that fall outside the Live
Update time period. Events can be searched for first by their start and end date/time, and
then against filters like those on the Live Update page. Further details of the events found
are displayed through the same Details page described previously. As with the Live Update
section, the Review page is concerned with presenting the essential pieces of information
clearly and concisely; if an operator is interested in a particular event, further information
can be found by clicking on it.

4.1 Method of Implementation

The system was implemented over a number of weeks. Initially the focus was on gaining a full understanding of NfSen-HW, its operation alongside RRDtool, and how to use it. Once this had been achieved, the focus moved to the creation of the Java package to parse the XML, using small RRD files dumped to XML format for testing as the package was produced. A small difficulty was encountered with the parsing due to the organisation and content of the XML file; this is discussed in more depth in the Sentinel.jar section of this chapter. Once the package was operating correctly, the database tables were created and development in Java continued to ensure the correct insertion of data. Next the Bash script runSentinel.sh was created, after which the back end of the system could be put into proper operation. Lastly, the web interface was produced, taking data from the Sentinel database.

The implementation of each of these sections will be discussed in more detail under
the headings which follow.

4.2 NfSen-HW
This was not technically implemented as part of the system, but its installation and use did cause some initial problems for development. NfSen-HW is based on the 20060412 snapshot of NfSen, a non-stable version which appears to have some bugs. The biggest problem was with the creation of NfSen-HW instances with previous data: the application would work perfectly if, on creation, each source was specified and there was no earlier data to be imported. Unfortunately, due to the way data was transferred from DANTE, the only data available for use was historical data from a large number of sources, which needed to be imported before graphs or aberrance detection could be displayed. After a large amount of experimentation it became apparent that if all past data is present at the very moment the NfSen-HW instance is created for the first time, then on the initial start-up it will go through every stored NetFlow file and create appropriately dated RRD files. If past data is added at a later time, even initiating a rebuild of the RRD files will not allow the creation of graphs based on this new data. This is because when NfSen-HW creates the RRD files it has to give a starting time for the data they contain; any later addition of previous data failed to change the starting date, and so any data earlier than that date was ignored.
Another problem was the inability to add new sources of data once an NfSen-HW instance had been initialised. The configuration options can be altered, but on rebuild, even though the correct RRD files would be constructed, no data was added to them. This resulted in graphs claiming to contain data from new sources but never actually displaying any content. The data received from DANTE contained over twenty separate sources of NetFlow data, and was not delivered to any installation of NfSen-HW ‘live’, that is to say, as it was produced by the routers. What was received was a backlog of nfcapd archive files accumulated since the previous update, transferred to my machine using rsync over ssh. The only solution to this and the previously mentioned problem was, with every fresh instalment of DANTE NetFlow data, to reinstall NfSen-HW and rebuild the RRD files, which was time consuming. Due to the amount of data being received, a rebuild of RRD files after a reinstall could take over three hours: the data amounted to approximately 20GB per week, with an initial download of 83GB, and nearing the end of the project the space required to hold the NetFlow data has surpassed 400GB.

The implications of this were a little further reaching. As the data received from DANTE was never live, it was impossible to test the system using that data in a live situation, with aberrant network behaviour events being updated at five-minute intervals. Initially I thought that having the RRDs based on the old data would allow me to do a full scan and collect the aberrant network behaviour events which occurred throughout the period the data covered. Unfortunately, it appeared that the RRD files only hold the aberrant behaviour markers for 24 hours, after which point they are removed. This caused me to think about how RRDs work: they archive information as a number of averages, so over time the results lose their granularity and trends become more vague. In the case of Holt-Winters forecast results, they are held within the RRD structure as a binary 1 or 0 marker. Binary data like this cannot meaningfully be averaged; a 1 or 0 result makes no sense if it becomes translated into 0.8 at some point in the future, and so it became obvious that RRD files must only hold their Holt-Winters marks for a set period of time. Through discussions with a network operator at DANTE and a telephone conference with Gabor Kiss and Jànos Mohàcsi, the developers of NfSen-HW, it emerged that Gabor, when creating the system, had never set it up himself without having old RRD files which he wanted to reimport. He had run a Perl script called Holt Winters Reapply to take data from the RRD files, run Holt-Winters forecasting on it, and create new HW-capable RRDs for use with his system. Whilst in my case old data was being imported into NfSen-HW, it was not in RRD format but was in fact NfDump-archived NetFlow data. This meant that the entire process of creating RRD files was done via NfSen-HW, with the default time period parameters hard coded. The solution was to run the Holt Winters Reapply script once any NetFlow data had been incorporated into a new NfSen-HW install, which added more time upon each new NetFlow instalment arriving. Even having done this, the RRD files will only hold their Holt-Winters marks for two weeks. This made the historical perspective functionality of my system all the more critical.
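As a toy illustration of the averaging argument above (not a reproduction of RRDtool's own consolidation code), the Java sketch below average-consolidates five five-minute failure flags into a single value. The class and method names are hypothetical.

```java
import java.util.Arrays;

/**
 * Toy illustration: averaging binary Holt-Winters failure flags produces
 * values that are neither "aberrant" (1) nor "normal" (0).
 */
final class FailureConsolidation {

    /** Averages each consecutive group of rowsPerGroup values into one value. */
    static double[] averageConsolidate(double[] flags, int rowsPerGroup) {
        int groups = flags.length / rowsPerGroup; // a trailing partial group is dropped
        double[] consolidated = new double[groups];
        for (int g = 0; g < groups; g++) {
            double sum = 0;
            for (int i = 0; i < rowsPerGroup; i++) {
                sum += flags[g * rowsPerGroup + i];
            }
            consolidated[g] = sum / rowsPerGroup;
        }
        return consolidated;
    }

    public static void main(String[] args) {
        // Five five-minute intervals: four flagged as aberrant, one not.
        double[] flags = {1, 1, 1, 1, 0};
        System.out.println(Arrays.toString(averageConsolidate(flags, 5)));
    }
}
```

Running it prints `[0.8]`: a value that answers neither "was this interval aberrant?" nor "was it normal?".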

In order to ensure the Sentinel system was working correctly when collecting its information from a live data source, another installation of NfSen-HW was performed, this time running on data exported from a router in my home on a small-scale testing network. Whilst the data size is not nearly as large as that from DANTE, it worked as expected, detecting aberrant network behaviour events of various kinds. The majority of the development work was performed using this installation.

4.3 runSentinel.sh
The Bash script holds everything together: it navigates the directory structure, converts each RRD file to its XML equivalent, runs the Java XML parser over the XML file with the correct parameters, and finally deletes the XML file so as not to interfere with further conversions.
Understanding how the script works requires an understanding of the directory structure used by NfSen-HW.


The profiles directory, /home/nfsen-hw/profiles, is the root directory for all RRD data. RRD files created using no filters or profiles are stored within a profile known as live. It can generally be assumed that every installation of NfSen-HW which actively captures NetFlow data will have a live profile, but runSentinel.sh does not make that assumption.

sara@fairlop: profiles$ ls
live/ profile1/ profile2/

Asking for a directory listing of the profiles directory yields results similar to this, where each entry listed is actually a directory containing all data related to that profile. Looking inside a profile directory shows the actual images which RRDtool generates:
sara@fairlop: live$ ls
flows-day.gif DataSource1/ packets-week.gif traffic-month.gif
flows-month.gif DataSource1.rrd packets-year.gif traffic-week.gif
flows-week.gif packets-day.gif profile.dat traffic-year.gif
flows-year.gif packets-month.gif traffic-day.gif

The two important entries here are DataSource1.rrd, the RRD file containing all the data for this profile and data source, and the directory DataSource1/, which contains all of the nfcapd-archived NetFlow files. This is repeated for every named profile directory within /home/nfsen-hw/profiles.
The Bash script therefore works by changing directory into /home/nfsen-hw/profiles and reading in the directory listing as a list of files. For every ‘file’ found in profiles, it changes to that directory and reads in a list of all files that end in .rrd. This should give a list of all RRD files, and hence data sources, for that profile. Knowing this and its current working directory, it then executes the command rrdtool dump on each RRD in turn, creating the XML-formatted file, and then runs Sentinel.jar, passing in the correct directory paths as parameters so that it can read the XML file.
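The traversal just described can be sketched in Java. This is not the author's runSentinel.sh, which is a Bash script. It is an illustrative equivalent that assumes every sub-directory of the profiles root is a profile and every *.rrd file directly inside it is a data source. The class name ProfileWalker is hypothetical, and the sketch only builds the rrdtool dump command for each file rather than running it.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ProfileWalker {

    /** Returns every *.rrd file found directly inside each profile directory under profilesRoot. */
    static List<Path> findRrdFiles(Path profilesRoot) throws IOException {
        List<Path> rrdFiles = new ArrayList<>();
        try (DirectoryStream<Path> profiles = Files.newDirectoryStream(profilesRoot)) {
            for (Path profile : profiles) {
                if (!Files.isDirectory(profile)) {
                    continue; // only directories are profiles; no profile name is assumed
                }
                try (DirectoryStream<Path> rrds = Files.newDirectoryStream(profile, "*.rrd")) {
                    for (Path rrd : rrds) {
                        if (Files.isRegularFile(rrd)) {
                            rrdFiles.add(rrd.toAbsolutePath()); // full path keeps profile/source info
                        }
                    }
                }
            }
        }
        Collections.sort(rrdFiles); // deterministic processing order
        return rrdFiles;
    }

    /** Command that dumps one RRD file as XML on standard output. */
    static List<String> dumpCommand(Path rrd) {
        return List.of("rrdtool", "dump", rrd.toString());
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : "/home/nfsen-hw/profiles");
        for (Path rrd : findRrdFiles(root)) {
            System.out.println(String.join(" ", dumpCommand(rrd)));
        }
    }
}
```

Passing absolute paths mirrors the point made in the next section: the RRD file itself does not record its profile or source, so the path must carry that information.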

4.4 Sentinel.jar
The Java XML parser was completed mostly to the specification given in the Design
chapter. Here is a more specific UML diagram of the component classes.

Figure 4.1: Sentinel Java UML Class Diagram


There is one additional class which was not present in the original design, the AberrantMark class. It was added because of some unforeseen problems with parsing the XML files, which are described in more detail later in this section.

4.4.1 Implementation Overview

Sentinel.jar is executed via a call from the runSentinel.sh Bash script and is passed the parameters it needs to locate the RRD file that is to be processed. It is important that the full path to the chosen RRD file is given, because the RRD files themselves contain no reference to the profile or source they correspond to; that information can only be retrieved from the directory structure and filename. When Sentinel.jar is run, the first thing that happens is that the parameter passed in is broken down into its component parts and the source and profile names are stored. An instance of RRDDatabase is created and the profile and source are set within it. This is where the information about the RRD file being parsed is saved, including a list of AberrantBehaviour objects, one for each aberrant network behaviour event that is retrieved. When the XML parsing has finished, the Main driver class gets the Vector of AberrantBehaviour objects from the RRDDatabase using the getAberrantBehaviour() method. This Vector is then iterated through and, based upon each object's start and end time, the information is inserted into the database.
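The path breakdown described above might look like the following Java sketch. It assumes that the path has the form `.../<profile>/<source>.rrd`, as in the NfSen-HW layout shown earlier. The record name RrdLocation is hypothetical and is not one of Sentinel's classes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Profile and source names recovered from an RRD file's location. */
record RrdLocation(String profile, String source) {

    /** Expects .../<profile>/<source>.rrd, e.g. /home/nfsen-hw/profiles/live/DataSource1.rrd. */
    static RrdLocation fromPath(String rrdPath) {
        Path path = Paths.get(rrdPath);
        Path file = path.getFileName();
        Path profileDir = path.getParent() == null ? null : path.getParent().getFileName();
        if (file == null || profileDir == null || !file.toString().endsWith(".rrd")) {
            throw new IllegalArgumentException("expected <profile>/<source>.rrd: " + rrdPath);
        }
        String name = file.toString();
        String source = name.substring(0, name.length() - ".rrd".length()); // strip extension
        return new RrdLocation(profileDir.toString(), source);
    }

    public static void main(String[] args) {
        System.out.println(fromPath("/home/nfsen-hw/profiles/live/DataSource1.rrd"));
    }
}
```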

4.4.2 Problems with XML Parsing

Originally I had assumed that parsing the XML file would be as straightforward as looking for the entries within the FAILURES section which were marked as 1.0000000000e+00 rather than 0.0000000000e+00, and retrieving the timestamp for that FAILURES entry. Unfortunately, on closer examination of the RRD structure, the individual entries in the FAILURES section do not contain a timestamp. The only timestamp available within the file is the one marking the instant it was last updated. In order to solve this problem in a generic and portable way, the time of every aberrant network behaviour marker had to be calculated from its position in the file, working backwards from the last entry, which logically corresponds to the last update time. An AberrantMark class was therefore written, and an instance of it is created whenever an aberrant mark is found. As the file is parsed, every entry which would have occurred with a time update is counted, and when an aberrant marker is located, the number of the row it was found in is stored within the AberrantMark object. When the file has been fully processed, the exact number of entries is known and the timestamp for each event can be worked out with simple arithmetic, since each row is one update interval earlier than the row after it.
Secondly, when an AberrantMark is located, it is important to note which field or fields the mark occurred in. For each update there are multiple types of traffic being graphed, and the type of traffic the aberrant behaviour occurs in determines which field the mark appears in. This field number was stored inside the AberrantMark instance for each event and translated back into a human-readable name in the Main driver class.
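The row-position approach can be sketched with the JDK's DOM parser. The sketch assumes a simplified dump containing `<step>`, `<lastupdate>` and an `<rra>` whose `<cf>` is FAILURES, with rows ordered oldest first, one row per step, and the last row corresponding to the last update time, as the text assumes. The class and record names are hypothetical. With DOM the row count is available immediately, whereas a streaming parse, like the counting approach described above, only knows it at the end of the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class FailureParser {

    /** row = index within the FAILURES archive; field = data-source column (traffic type). */
    record AberrantMark(int row, int field, long timestamp) {}

    static List<AberrantMark> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        long step = Long.parseLong(firstText(root, "step"));
        long lastUpdate = Long.parseLong(firstText(root, "lastupdate"));

        NodeList archives = root.getElementsByTagName("rra");
        for (int a = 0; a < archives.getLength(); a++) {
            Element rra = (Element) archives.item(a);
            if (!"FAILURES".equals(firstText(rra, "cf"))) {
                continue; // other archives (AVERAGE etc.) are not of interest here
            }
            NodeList rows = rra.getElementsByTagName("row");
            int totalRows = rows.getLength();
            List<AberrantMark> marks = new ArrayList<>();
            for (int r = 0; r < totalRows; r++) {
                NodeList values = ((Element) rows.item(r)).getElementsByTagName("v");
                for (int f = 0; f < values.getLength(); f++) {
                    // Compare numerically: "1.0000000000e+00" marks aberrance; "NaN" parses safely.
                    double v = Double.parseDouble(values.item(f).getTextContent().trim());
                    if (v == 1.0) {
                        // Work backwards from the last row, which corresponds to lastupdate.
                        long time = lastUpdate - (long) (totalRows - 1 - r) * step;
                        marks.add(new AberrantMark(r, f, time));
                    }
                }
            }
            return marks;
        }
        return List.of(); // no FAILURES archive in this file
    }

    private static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            throw new IllegalArgumentException("missing <" + tag + ">");
        }
        return nodes.item(0).getTextContent().trim();
    }
}
```

The field index `f` plays the role of the stored field number, which the Main driver class would translate into a traffic type name.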

4.4.3 Database Insertion

Sentinel.jar uses the JDBC libraries to connect to the Sentinel MySQL database, but some checks are made before the data is inserted. The database design is such that each aberrant instance cannot simply be inserted into the events table, as any number of aberrant network behaviour marks may be aggregated into one event if they are part of a series with the same type, source and profile. The first information to be updated is the profile and source data, so that their identifiers can be retrieved and inserted as foreign keys in the events table. A check is made to see whether the profile and source are already present in the database; if not, they are stored, and the id numbers are saved. Once this has taken place, the actual aberrant network behaviour information can be checked. For every event the type id is retrieved, and then a query is performed to see whether an entry exists with exactly the same information apart from an end timestamp five minutes earlier. If so, the end time in the database is updated to the end time of the new aberrant instance and the rest of the information is left as it was. If there are no prior entries in the table fitting that description, a new entry is created with that information, and the process continues until there are no more AberrantBehaviour objects left.
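The insert-or-extend rule can be illustrated in memory, without a database. In the following Java sketch a plain list stands in for the events table. A single mark is assumed to start and end at its own timestamp, and the duplicate guard, which skips a mark already covered by a stored event if the same mark is seen again on a later run, is an addition not described in the text. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EventAggregator {
    static final long STEP_SECONDS = 300; // five-minute NfSen-HW update interval

    enum Outcome { DUPLICATE, EXTENDED, CREATED }

    static final class Event {
        private final String profile, source, type;
        private final long start;
        private long end;

        Event(String profile, String source, String type, long start) {
            this.profile = profile;
            this.source = source;
            this.type = type;
            this.start = start;
            this.end = start; // assumption: a single mark starts and ends at its own time
        }

        long start() { return start; }
        long end() { return end; }

        boolean sameSeries(String p, String s, String t) {
            return profile.equals(p) && source.equals(s) && type.equals(t);
        }
    }

    private final List<Event> events = new ArrayList<>(); // stands in for the events table

    Outcome add(String profile, String source, String type, long time) {
        // Extra guard (not in the text): ignore a mark already inside a stored event.
        for (Event e : events) {
            if (e.sameSeries(profile, source, type) && e.start <= time && time <= e.end) {
                return Outcome.DUPLICATE;
            }
        }
        // Same profile, source and type, ending exactly one step earlier: extend it.
        for (Event e : events) {
            if (e.sameSeries(profile, source, type) && e.end == time - STEP_SECONDS) {
                e.end = time; // corresponds to an UPDATE of the end time
                return Outcome.EXTENDED;
            }
        }
        events.add(new Event(profile, source, type, time)); // corresponds to an INSERT
        return Outcome.CREATED;
    }

    List<Event> events() { return Collections.unmodifiableList(events); }
}
```

Three consecutive TCP marks at 0, 300 and 600 seconds therefore become one event spanning 0 to 600, while a mark at 1200 starts a new one.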

4.5 Sentinel Database

The database was created exactly as laid out in the design; here is a more detailed UML diagram of the datatypes and interactions between tables.

Figure 4.2: Sentinel Database UML Diagram


4.6 Sentinel Web Interface

The web interface was also implemented as illustrated in the Design chapter, written in PHP and divided into three separate sections. Here is a brief overview of each page and how it was implemented.

4.6.1 Live Update

The Live Update page initially operates using a default SQL query. It makes a connection to the Sentinel database and retrieves all events whose end timestamp falls within the last 24 hours. As part of the default view, it filters the results so as not to show events which have been acknowledged or marked as false positives. In addition, three smaller queries retrieve a current list of all profiles, sources and types being used within the system. Along with the option to show acknowledged and false positive events, this information is used to create the filter functionality. Operators can choose the information they would like to see by ticking checkboxes. When the submit button is clicked, the selected values are submitted back to the same page, which detects that selections have been made; the choices are retrieved from the POST array and assembled into appropriate SQL queries. Here it was important to ensure that the SQL logic was correct, using brackets to separate the parts of each query. The assembled queries are performed and the results displayed in the same style as those of the default query. In addition, the page auto-refreshes every five minutes to ensure the displayed results are as up to date as possible.
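The bracketed query assembly can be sketched in Java with JDBC-style `?` placeholders. The original page is PHP, and its SQL is not reproduced here. The table and column names (`events`, `end_time`, `source_id`, `type_id`, `acknowledged`, `false_positive`) are hypothetical. The flag logic follows the behaviour expected in test cases 301 to 307 later in this report: unflagged events are always shown, and an event flagged as both acknowledged and false positive appears only when both boxes are ticked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A query string plus the values to bind to its placeholders, in order. */
record LiveUpdateQuery(String sql, List<Object> params) {

    static LiveUpdateQuery build(long since, List<Integer> sourceIds, List<Integer> typeIds,
                                 boolean showAcknowledged, boolean showFalsePositive) {
        StringBuilder sql = new StringBuilder("SELECT * FROM events WHERE end_time >= ?");
        List<Object> params = new ArrayList<>();
        params.add(since);
        appendIn(sql, params, "source_id", sourceIds);
        appendIn(sql, params, "type_id", typeIds);

        // Flag combinations that may be shown; unflagged events are always included.
        List<String> flags = new ArrayList<>();
        flags.add("(acknowledged = 0 AND false_positive = 0)");
        if (showAcknowledged) flags.add("(acknowledged = 1 AND false_positive = 0)");
        if (showFalsePositive) flags.add("(acknowledged = 0 AND false_positive = 1)");
        if (showAcknowledged && showFalsePositive) flags.add("(acknowledged = 1 AND false_positive = 1)");
        // Brackets stop the ORs from binding with the preceding AND conditions.
        sql.append(" AND (").append(String.join(" OR ", flags)).append(')');
        sql.append(" ORDER BY end_time DESC");
        return new LiveUpdateQuery(sql.toString(), List.copyOf(params));
    }

    private static void appendIn(StringBuilder sql, List<Object> params, String column, List<Integer> ids) {
        if (ids.isEmpty()) {
            return; // assumption: nothing ticked for this filter means no restriction on it
        }
        sql.append(" AND ").append(column).append(" IN (")
           .append(String.join(", ", Collections.nCopies(ids.size(), "?")))
           .append(')');
        params.addAll(ids);
    }
}
```

Binding the ticked values as parameters, for example through a JDBC `PreparedStatement`, rather than concatenating them into the string also keeps user input out of the SQL text.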

4.6.2 Details

The Details page is opened by a user clicking on an event for more information, which passes the event id via GET to the Details page. For security reasons it is important, when using the GET method in this situation, to validate the value that has been passed; in Sentinel’s case the page checks that the value passed is numeric, which prevents malicious users from injecting SQL commands into queries on the Sentinel database. The graph is drawn by sending the appropriate values as part of a GET request to rrdgraph.php, a part of NfSen-HW. The appropriate values are:

• profile name
• ‘:’ separated list of sources
• proto type: ‘any’, ‘TCP’, ‘UDP’, ‘ICMP’, ‘other’
• ‘flows’, ‘packets’, ‘traffic’
• profile start time - UNIX format
• start time - UNIX format
• end time - UNIX format
• left time of marker - UNIX format; 0 is no marker
• right time of marker - UNIX format; 0 is no marker
• width of graph
• height of graph
• light version ( small graphs ) - no title or footer
• linear or log y-Axis
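A strict version of the id check on the Details page might look like the following Java sketch. It accepts only a plain run of decimal digits. If the PHP page relies on `is_numeric()`, note that this function also accepts forms such as "1e3" and "-1", so a digits-only check is stricter. The class name EventIdValidator is hypothetical.

```java
import java.util.OptionalLong;
import java.util.regex.Pattern;

/** Accepts only a plain run of decimal digits as an event id (at most 18, so it fits in a long). */
final class EventIdValidator {
    private static final Pattern DIGITS = Pattern.compile("[0-9]{1,18}");

    static OptionalLong parseEventId(String raw) {
        if (raw == null || !DIGITS.matcher(raw).matches()) {
            return OptionalLong.empty(); // reject: show an error instead of running a query
        }
        return OptionalLong.of(Long.parseLong(raw));
    }

    public static void main(String[] args) {
        System.out.println(parseEventId("42"));        // OptionalLong[42]
        System.out.println(parseEventId("42 OR 1=1")); // OptionalLong.empty
    }
}
```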

Using this it is possible to draw a graph for any period, with specific markers depending on the options chosen. The Details page draws graphs starting a number of hours before the actual start time of the aberrant network event. An amount of time is also added to the end time, to give a better view of the traffic after the aberrant marker was removed: either one hour is added, or all traffic up until the current last update time is shown, whichever is smaller. Having done this, the time period covered by the event itself is marked, and appears highlighted in green on the graph.
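The window calculation can be sketched as follows. The number of hours shown before the event is not given in the text, so `leadHours` is a hypothetical parameter. The record name GraphWindow is also hypothetical, and the marker values correspond to the "left/right time of marker" arguments listed above.

```java
/** Time window (UNIX seconds) for the Details page graph, plus the highlighted event span. */
record GraphWindow(long start, long end, long markerLeft, long markerRight) {
    static final long ONE_HOUR = 3_600L;

    static GraphWindow compute(long eventStart, long eventEnd, long lastUpdate, int leadHours) {
        long start = eventStart - leadHours * ONE_HOUR;        // context before the event
        long end = Math.min(eventEnd + ONE_HOUR, lastUpdate);  // one hour after, but never past the last update
        return new GraphWindow(start, end, eventStart, eventEnd); // markers highlight the event itself
    }

    public static void main(String[] args) {
        System.out.println(compute(10_000L, 20_000L, 100_000L, 2));
    }
}
```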

The next stage displays some statistical analysis of the flows during that time period, either by retrieving the already performed nfdump query from the database or, if the event is still ongoing, by querying nfdump directly via PHP exec() and displaying the results on the web page. This was not as straightforward as passing in the start and end times, due to the way Holt-Winters forecasting detects aberrant results. An aberrant mark is set when the next value in a time series is mathematically “too deviant” from what was expected; because of this, aberrance cannot be assessed at the exact time the network event begins to happen. It is only recognised some time afterwards, and marked from then onwards. I found that if analysis was performed based upon the exact start and end times, the results would quite often not cover the period of aberrance. To get around this I conducted some experimentation into the average amount of time that passes between the aberrant event starting and the aberrant mark being set. Figure 4.3 illustrates the difference between the start of the aberrant event and the initial marker being placed. I found that in most cases, unless the aberrant event was exceptionally out of the ordinary, if 40 minutes was subtracted from the aberrant event marker’s start time then the start of the actual network activity was included in the statistics. Figure 4.4 gives an example of this, and further examples are available in the appendix, section E.
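The 40-minute adjustment can be sketched in Java as a time-window string for nfdump. This sketch assumes that the installed nfdump accepts a `-t` window of the form `yyyy/MM/dd.hh:mm:ss-yyyy/MM/dd.hh:mm:ss` and interprets it in the collector's local time zone. Both assumptions should be checked against the installed version's manual page. The class name NfdumpWindow is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class NfdumpWindow {
    /** Empirical lag between an event starting and its first aberrant mark (from the text). */
    static final long DETECTION_LAG_SECONDS = 40 * 60;

    private static final DateTimeFormatter NFDUMP_TIME =
            DateTimeFormatter.ofPattern("yyyy/MM/dd.HH:mm:ss");

    /** Moves the start 40 minutes earlier; the end stays at the marker's end time. */
    static String timeWindow(long markStart, long markEnd, ZoneId zone) {
        String from = NFDUMP_TIME.format(Instant.ofEpochSecond(markStart - DETECTION_LAG_SECONDS).atZone(zone));
        String to = NFDUMP_TIME.format(Instant.ofEpochSecond(markEnd).atZone(zone));
        return from + "-" + to;
    }

    public static void main(String[] args) {
        // Marker from 2007-03-22 12:00:00 UTC to 13:00:00 UTC.
        System.out.println(timeWindow(1174564800L, 1174568400L, ZoneOffset.UTC));
    }
}
```

For the example marker this prints `2007/03/22.11:20:00-2007/03/22.13:00:00`, so the statistics start 40 minutes before the first aberrant mark.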

Figure 4.3: Aberrant Marking Example

Figure 4.4: Subtracting 40 Minutes Example


Depending on the aberrant network event, different filters are applied to the nfdump query to produce the most appropriate results, for example only showing TCP traffic. A second version of the query is also performed to retrieve only the top four flow statistics of that kind, with the result requested in machine-readable format. This produces a similar result, but there are no human-readable labels and each field is separated by the pipe symbol. From this I extracted the source and destination IP addresses and port numbers, which are looked up using two PHP functions, ‘getServByPort()’ and ‘getHostByAddr()’. The results are then displayed on the page in a similar format to the nfdump output.
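Extracting endpoints from a pipe-separated record can be sketched in Java. The positions of the address and port columns depend on the nfdump version and output mode and are not given in the text, so they are hypothetical parameters here. Some nfdump versions may also print addresses in this mode as integers, which would need converting before any hostname lookup. The class and record names are hypothetical.

```java
/** Extracts endpoints from one pipe-separated record; column positions are caller-supplied. */
final class PipeRecordParser {

    record FlowEndpoints(String srcAddr, int srcPort, String dstAddr, int dstPort) {}

    static FlowEndpoints parse(String line, int srcAddrCol, int srcPortCol, int dstAddrCol, int dstPortCol) {
        String[] fields = line.trim().split("\\|", -1); // split on a literal '|', keep empty fields
        int needed = Math.max(Math.max(srcAddrCol, srcPortCol), Math.max(dstAddrCol, dstPortCol));
        if (fields.length <= needed) {
            throw new IllegalArgumentException("too few fields in: " + line);
        }
        return new FlowEndpoints(
                fields[srcAddrCol].trim(), Integer.parseInt(fields[srcPortCol].trim()),
                fields[dstAddrCol].trim(), Integer.parseInt(fields[dstPortCol].trim()));
    }

    public static void main(String[] args) {
        // Made-up record laid out as: protocol|src addr|src port|dst addr|dst port
        System.out.println(parse("6|192.0.2.10|80|198.51.100.7|51234", 1, 2, 3, 4));
    }
}
```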

The last two sections of the Details page deal with editing information about a particular stored event and showing potential links between it and other stored events. The details which can be edited are those immediately specific to that event, namely the comment and the acknowledged/false positive flags. Source and profile descriptions are not specific to an event, so they are edited elsewhere. This is implemented as a simple form which submits the details filled in, via the POST array, to another page where they are inserted into the database. Associated events are displayed in a similar style to Live Update, but only if they match the strict similarity criterion of starting and ending at the same time as the currently viewed event. Details of these events can be viewed in the same way as from the Live Update page.
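The association criterion is simple enough to express directly. This Java sketch works over an in-memory list, whereas Sentinel queries the database. The record fields are hypothetical stand-ins for the stored event columns.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AssociatedEvents {

    record Event(long id, String profile, String source, String type, long start, long end) {}

    /** Events, other than the target itself, that start and end at exactly the same times. */
    static List<Event> associatedWith(Event target, List<Event> candidates) {
        return candidates.stream()
                .filter(e -> e.id() != target.id())
                .filter(e -> e.start() == target.start() && e.end() == target.end())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Event a = new Event(1, "live", "DataSource1", "TCP", 100, 400);
        Event b = new Event(2, "External", "DataSource2", "UDP", 100, 400);
        System.out.println(associatedWith(a, List.of(a, b)));
    }
}
```

Exact matching keeps false associations low, at the cost of missing events that start one interval apart.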

4.6.3 Review

The Review page is quite straightforward in comparison: past data can be queried using a form that allows searching by exact start and end time, by start time between two dates, and by end time within two dates. Other filters can be applied, such as a specific source, profile or type. Acknowledged and false positive events can be excluded or included, and the results are displayed, again, similarly to the Live Update page, complete with a link to view further details.
5 System Operation

To illustrate the system’s operation, this section contains a walkthrough of the workflow as experienced by a network operator, and concludes with a comparison of this and the previous process defined in the Design chapter.

5.1 Usage Scenario

A network operator wishes to know if there are any current problems with external connectivity (i.e. connections to the wider Internet) on the network. There is already a profile set up in NfSen-HW, known as ‘External’, to monitor traffic passing to and from the Internet. The process of investigation follows these steps:

1. Examining Live Update for indications of Aberrant Behaviour.

2. Filtering the results to only show relevant information.
3. Viewing further details of a specific event.
4. Analysing the results and editing the event details to reflect the results.

5.1.1 Examining Live Update for Aberrant Behaviour

Figure 5.1: Investigation Process Step 1


This page indicates all currently or recently active aberrant events as detected by the system. As can be seen, it contains data from both the live profile and the External profile. The view can be tailored to show only the External sources.

5.1.2 Filtering the results

Figure 5.2: Investigation Process Step 2

Only aberrant behaviour involving data being received from or passed to external hosts is now shown in the summary. From this, more information can be requested.

5.1.3 Viewing further Details

The Details page contains more information about the event, including a graph of an appropriate time period with the actual event period marked in green, the top flows ordered by the traffic type (in the case of the screenshot, any traffic type), and a lookup of the most relevant hostnames and port numbers. From this display an operator could easily identify that the activity here is nothing of concern and then, using the edit section, add a comment to this effect.

Figure 5.3: Investigation Process Step 3


5.1.4 Analysis and editing event details

Adding conclusions of the findings is very simple, and this information is then stored in
the database for other operators to see.

Figure 5.4: Investigation Process Step 4 - Editing

Figure 5.5: Investigation Process Step 4 - Inserting

5.1.5 Summary

A comparison between this network operator’s workflow and the original example given in the Design chapter shows a number of improvements. Firstly, there is one single location for finding aberrant behaviour instances: the operator does not need to view individual graphs of profiles and sources, as the database picks up all relevant information. This information can be displayed in the manner the operator chooses, so initial assessments of problem scope can be made. Leading on from this, the biggest improvement over the previous process is the ability to see a large amount of relevant information in one place. The Details section provides basic information about the duration and location of the problem, suggests possible causes via NetFlow statistics, and finally indicates likely explanations of the issue by showing the hostnames and services in use. This is information that the operator would previously have had to find out by hand. The Details section also provides an assessment of possibly associated events, which can be viewed in more detail. This reduces the time the operator might have needed to find other areas and services affected by the event.

Figure 5.6 shows a final sequence diagram depicting the system in operation during
the preceding usage scenario.

Figure 5.6: Sequence Diagram of System Operation

6 Testing and Evaluation

6.1 Testing
In order to test the system thoroughly I used a number of different testing strategies, each of which is covered in detail in this chapter.

6.1.1 Defect and Component Testing

According to Ian Sommerville, “the goal of defect testing is to expose defects in a software system before the system is delivered” [2004 p442]. He provides a graphical example of a general model of the defect testing process [2004 p443], shown in Figure 6.1.

Figure 6.1: General Defect Testing Model

His suggestion for testing system usage and operational features is to meet the following criteria [2004 p443]:

1. All system functions that are accessed through menus should be tested.
2. Combinations of functions that are accessed through the same menu should be tested.
3. Where user input is provided, all functions must be tested with both correct and incorrect input.

Test cases have been identified to meet these criteria. For this stage of the testing cycle I have separated out the components to test their individual correctness. Once this testing has been completed there will be further testing to ensure that the integrated system works as it should. The conclusions reached by carrying out this testing are discussed afterwards.


Firstly, some tests to ensure that Sentinel.jar parses the XML file and inserts the data into the database correctly. For these tests the Java application is treated as a separate component, reading the XML output of a small RRD file. The results are displayed at the command line rather than inserted into the database, apart from the test cases which involve testing database connectivity and correctness. Each test case is defined by a number, a description of the test, the expected outcome and the result. The sections are broken into separate tables for ease of viewing.

Sentinel.jar - XML Parsing

No. | Test Description | Expected Outcome | Result
100 | Parse XML file for the last update time. | The correct last update time is printed to screen. | PASS
101 | Parse XML file for Aberrant Marks. | Seven Aberrant Marks are found and printed to screen. | PASS
102 | Correctly specify the times that the Aberrant Marks occurred. | The times the Aberrant Marks occurred are correct. | PASS
103 | Parse the XML file for the traffic types in use. | The correct list of traffic types is printed to screen. | PASS
104 | Correctly identify the traffic type in use for each Aberrant Mark. | The traffic type of each Aberrant Mark is correct. | PASS

Table 6.1: Sentinel.jar Testing - XML Parsing

Sentinel.jar - Source and Profile detection

No. | Test Description | Expected Outcome | Result
105 | The path to the RRD file specified as a command line argument can be read in by the system. | The correct path as specified at the command line is printed to screen. | PASS
106 | The path can be broken down into the correct profile and source. | The correct profile and source are found and printed to screen. | PASS

Table 6.2: Sentinel.jar Testing - Source and Profile Detection

From the tests specified in Tables 6.1, 6.2 and 6.3 it can be seen that the Java portion of the Sentinel system is working correctly, both in its XML parsing and in its database connectivity. It is able to retrieve the source and profile names from the path it is supplied. It can also retrieve all the necessary information from the RRD files in XML form, including the correct date and time for each Aberrant Mark, which was a concern at the Implementation stage. It is capable of querying the database for results already held and, based upon that knowledge, can insert or update currently held event information.

Sentinel.jar Database Connectivity

No. | Test Description | Expected Outcome | Result
107 | Check for the presence of the traffic types in the database. | All of the data types are present in the database. | PASS
108 | Retrieve the ID numbers of each traffic type from the database. | The correct ID numbers and traffic types are printed to screen. | PASS
109 | Check for the presence of the found data sources in the database. | Of the two detected data sources, one is present in the database and one is not. | PASS
110 | Check for the presence of the found data profiles in the database. | Of the two detected data profiles, one is present in the database and one is not. | PASS
111 | Insert the data source not already present into the database. | The data source not present should be inserted into the database. | PASS
112 | Insert the data profile not already present into the database. | The data profile not present should be inserted into the database. | PASS
113 | Retrieve the ID numbers for each of the data sources from the database. | The correct ID number and source should be printed to screen. | PASS
114 | Retrieve the ID numbers for each of the data profiles from the database. | The correct ID number and profile should be printed to screen. | PASS
115 | Check for the existence of an Aberrant Event in the database with the same start time, profile, type and source as each Aberrant Mark found, but with an end time five minutes earlier. | Two of the detected Aberrant Marks have equivalent entries in the database; the rest do not. | PASS
116 | Insert the full details of each detected Aberrant Event which does not have an equivalent Aberrant Event already present in the database. | The five Aberrant Marks without equivalent entries should be inserted into the database. | PASS
117 | Update the end time of each of the equivalent Aberrant Events in the database to be the same as the found Aberrant Mark it matches. | The two Aberrant Events in the database should have their end time updated to be the same as the two Aberrant Marks. | PASS

Table 6.3: Sentinel.jar Testing - Database Connectivity


Next, runSentinel.sh must be tested to ensure its correct operation. This is carried out using a mock directory structure containing two profiles and two data sources. The script is modified, first, not to delete the XML output it creates and, secondly, to print to the screen the command it would use to execute Sentinel.jar instead of running it. This ensures that the script itself is behaving correctly; further testing ensures that the two components work together in the proper way.

No. | Test Description | Expected Outcome | Result
200 | XML files are created for every valid RRD file within the directory structure. | After the script has been run, appropriate XML files should exist. | PASS
201 | For every XML file created, the script should create a valid set of arguments to run Sentinel.jar correctly. | The correct command to run Sentinel.jar should be printed to screen as the script meets each XML file to be parsed. | PASS

Table 6.4: runSentinel.sh Testing

As these tests show, runSentinel.sh passes in every case that can be tested while each component is dealt with independently.

6.1.2 Functional and Integration Testing

Functional testing, sometimes known as Black Box testing, is, according to Ian Sommerville, “an approach to testing where the tests are derived from the program or component specification”; the system is a Black Box and its behaviour can only be determined by “studying its inputs and the related outputs” [2004 p443].

Figure 6.2: Functional Testing Model


In the case of Sentinel, this is also a form of integration testing, as all the individual components have to work together in order to meet the specification goals. Sommerville provides a graphical example of functional testing which illustrates how to view the system when conducting the tests; this is shown in Figure 6.2. Testing in this area is mostly centred around the use of the user interface, in Sentinel’s case the web front end. Inputs are chosen as test cases and the outputs recorded. This section uses a database which contains a certain amount of test data, including two data sources, two profiles and 25 events of multiple traffic types. Some of the events are older than 24 hours, which in the test case is the period of data shown in the Live Update. One of the events is marked as a False Positive, one as Acknowledged, and one as both; all three have a comment stored. Three of the events share the same start and end times. During the test I simulated some aberrant behaviour by pinging a machine on the network at a very high rate for approximately 15 minutes. The test cases are shown in the tables that follow.

The results of these test cases show that the components of Sentinel have integrated
successfully and that the system works as intended.

Table 6.5: Sentinel UI Functional Testing - Live Update

Sentinel UI - Live Update

No. | Test Description | Expected Outcome | Result
301 | Open a web browser and load the Sentinel Live Update page. | The page should display only a set period of Aberrant Events and a complete list of types, sources and profiles, excluding events flagged as Acknowledged or False Positive. | PASS
302 | Filter Aberrant Events for only one data source; try this with every data source listed. | In every case, only Aberrant Events involving that data source should be shown. | PASS
303 | Filter Aberrant Events for only one traffic type; try this with every traffic type listed. | In every case, only Aberrant Events involving that traffic type should be shown. | PASS
304 | Filter to display events which have been flagged as Acknowledged. | The event flagged as Acknowledged should be shown alongside the normal results, but not the event marked as both Acknowledged and False Positive. | PASS
305 | Filter to display events which have been flagged as False Positive. | The event flagged as False Positive should be shown alongside the normal results, but not the event marked as both Acknowledged and False Positive. | PASS
306 | Filter to display events which have been flagged as False Positive or Acknowledged. | The events flagged as False Positive or Acknowledged should be shown alongside the normal results, as well as the event flagged as both Acknowledged and False Positive. | PASS
307 | Display every Aberrant Event within the set timeframe by selecting every possible filter at once. | The default list of Aberrant Events should be displayed, plus the events which had been flagged as Acknowledged or False Positive. | PASS
308 | Leave the Live Update page open for approximately an hour (during this time, aberrant network traffic will be created). | The table should refresh every five minutes. At some point during the hour, the new aberrant behaviour should be detected and displayed. | PASS
309 | Click on the Details link of a particular Aberrant Event. | The Details page should be displayed with information pertaining to that event. | PASS

Table 6.6: Sentinel UI Functional Testing - Details

Sentinel UI - Details
No. | Test Description | Expected Outcome | Result
310 | Leading on from test 309, load the Details page by clicking on the Details link of an Aberrant Event. | The Details page should be loaded with information about that event. A graph covering the related time period should be shown, with the exact times of the event highlighted in green. Statistical details based on the flows should be shown, along with a lookup of the top IP addresses and port numbers. | PASS
311 | Check for associated Aberrant Events. | These should be displayed at the bottom of the page. | PASS
312 | Click on the Details link of one of the associated Aberrant Events. | A similar Details page should be loaded with details relevant to the new event. | PASS
313 | Edit the details of an event by changing the acknowledgement status and false positive status to yes, and adding or altering a comment. | The UI should redirect to a different page indicating the success of the alteration, and the new details should be inserted into the database. | PASS
314 | Return to the Live Update page and filter the results to show events which have been marked as false positive and acknowledged. | The newly edited event should not have been displayed initially, but should appear when the filter is applied. | PASS

Table 6.7: Sentinel UI Functional Testing - Review

Sentinel UI - Review
No. | Test Description | Expected Outcome | Result
315 | Load the Review page of the web interface. | No events should be shown, only a filter interface for searching. | PASS
316 | Search for events with a specific start time and end time, with no other filters. | Using a start and end time known to be in the database, four events should be found: one for each combination of profile and source. | PASS
317 | Search for the same start and end times as test 316, but filter to show only sourceA. | The two results shown previously that are connected to that source should be displayed. | PASS
317 | Search for the same start and end times as test 316, but filter to show only profileA. | The two results shown previously that are connected to that profile should be displayed. | PASS
318 | Search for events starting between two dates. | All the events which start in that period should be displayed, and no others. | PASS
319 | Search for events ending between two dates. | All the events which end in that period should be displayed, and no others. | PASS
320 | Search for events ending between the same dates as test 319, restricted to each listed traffic type in turn. | In each case, only the events of that network traffic type should be shown; if there are none, nothing should be displayed. | PASS
321 | Search for events starting between the same dates as test 318, restricted to each listed traffic type in turn. | In each case, only the events of that network traffic type should be shown; if there are none, nothing should be displayed. | PASS
321 | Search for events starting and ending at the same times as test 316, restricted to each listed traffic type in turn. | In each case, only the events of that network traffic type should be shown; if there are none, nothing should be displayed. | PASS

6.2 User Interface Evaluation

The user interfaces within Sentinel are intended to be simple but functional, and their use
should be fairly straightforward. One of the key considerations when designing these interfaces
was the situation in which they would be used. When diagnosing a problem, a network operator
does not want the information to be spread across multiple pages; it should be presented
coherently and clearly, in as little time as possible. For this evaluation I shall assess each
section of the user interface in turn.

The idea behind the Live Update page was twofold: it should be functional enough for an
operator to use on their personal machine, but it should also work in an office environment
as a network monitoring tool. Displayed on a larger screen, the page would give anyone
concerned an instant overview of any strange behaviour on the network, which they could then
investigate more thoroughly using the same interface at their personal machine. The colour
scheme is very basic; colour is not important as long as the information is clear. The filters
are clear and simple, and it should be fairly obvious how they are intended to be used. Their
aim is to provide a quick way of narrowing down on a particular problem: on a network the size
of GÈANT2 there could be a large amount of aberrant behaviour occurring at any one time, and it
is important that operators can see exactly what they require.

The Details page was designed with similar aims. It is not intended as an office-wide
monitoring display, so there is no need for the information to be as stripped down. Its
purpose is to give the network operator more information so that they can perform some
analysis and, ideally, make an initial suggestion as to the cause of the anomaly. Three pieces
of information are particularly important. The first is the graph of the time period, which
gives an instant view of what was happening on the network as the aberrant event was
triggered. As mentioned previously, it covers a longer period than the event itself lasted in
order to give a better overview: operators can see what was happening in the build-up to an
event and, for events which have ended, what happened at the end to cause the event to finish.
The second important section is the statistical analysis. This shows in detail what the graph
only indicates, broken down into what was causing the most traffic (be that flows, packets or
bytes) and which protocols it was using. The IP addresses and port numbers are then looked up
to provide an extra level of information. The final important section shows potentially
associated events. Its aim is to give the operator a better indication of the scope of the
problem, and to provide easy links for investigating other problematic areas. It is displayed
in the same style as the Live Update page, since full details are not required unless the
operator is specifically interested in them, in which case the Details link can be clicked.

The Details page also provides the ability to edit the details of an event via a form. The
current details of the event are filled into the form automatically, so an operator who
chooses to change something knows what the current values are before they start. The form is
fairly simple, but again it was designed to be quick to use: an operator does not want to be
delayed in their work by a complicated interface.

The Review page is simply an interface to the historical information, a way of building
queries for the database. The most important factor was to provide a flexible filter system:
operators can search for events either by specifying the exact start and end times, or by
requesting any events which started or finished between two dates. The results can then be
filtered further, in a similar style to the Live Update page, by selecting the pieces of
information which should be present in the results. The style is common across all sections
of the interface, so once the technique of filtering is understood no further learning is
required. Results are again presented in the same style as on the Live Update page, with
further details available on request.

Overall the user interface serves its purpose: it is clean, clear and simple, which are the
most important qualities given how it is intended to be used. The design could have been more
polished and the filters organised more flexibly, but because of the nature of the data, the
number of sources and profiles cannot be known before the system is run. The filters are
intended to be arranged in blocks of five per line in case more than five are listed, which
keeps them together sensibly without overfilling the web page. Other than that, the design is
a successful interpretation of the requirements and needs of a large network support office.

6.3 Evaluation
This is broken down into two sections: a comparison of the original derived requirements list
with the finished system, followed by an overview containing feedback from the network
operator at DANTE who has been my liaison for the project.

6.3.1 Requirements List Review

To evaluate the success of the system, I will go back over each of the derived requirements
from the Design chapter and assess how well it has been met.

Aberrant network behaviour instances should be displayed together on one
page organised by the time they occurred.

This is fully realised in the Live Update section. Aberrant network events are displayed in a
table ordered by end time, so the most recently active events appear nearest the top of the
list. The reasoning behind this was that the Live Update page could be used as an office-wide
network monitoring screen on which every event could be seen easily. I feel this has been
achieved successfully.

Only the most relevant information for each aberrant behaviour instance
should be displayed.

The interface was designed so that the Live Update page would be merely a list of events with
very basic information. I decided that the most important information was the start and end
times, since these determine the order of the events. Next come the traffic type, profile and
source, as these identify where on the network the behaviour is occurring. Then the two flags
are listed, showing whether the event has been acknowledged or marked as a false positive.
This is not necessarily vital information about the event, but it aids understanding of the
interface, since a filter removing events marked with those flags is applied at the start. The
last entry for each event is a simple link to the Details page, where more information can be
found. I believe this requirement has been met successfully.

Aberrant network behaviour instances should be aggregated to display one
event per continuously flagged period.

This functionality is provided by the database and the Java component. When an event is added
to the database, the database is checked for an existing event with identical details other
than the end time. If one exists, only its end time is updated rather than a new entry being
made. This was of crucial importance to the design, as it made the potentially large amounts
of stored data much more manageable and made it very easy to identify associated events.
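A minimal Java sketch of this insert-or-extend rule follows. It is hypothetical: events are held in memory rather than in MySQL, and the `AberrantEvent` and `EventStore` classes are illustrative names. It assumes the matching details are the source, profile, traffic type and start time, so that re-parsing an ongoing flagged period yields the same start time with a later end time.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one row of the events table.
final class AberrantEvent {
    final String source;
    final String profile;
    final String trafficType;
    final LocalDateTime start;
    LocalDateTime end; // the only field that changes once an event is stored

    AberrantEvent(String source, String profile, String trafficType,
                  LocalDateTime start, LocalDateTime end) {
        this.source = source;
        this.profile = profile;
        this.trafficType = trafficType;
        this.start = start;
        this.end = end;
    }

    // True when every detail other than the end time is identical.
    boolean sameExceptEnd(AberrantEvent other) {
        return source.equals(other.source)
                && profile.equals(other.profile)
                && trafficType.equals(other.trafficType)
                && start.equals(other.start);
    }
}

// Hypothetical store applying the insert-or-extend rule described above.
final class EventStore {
    private final List<AberrantEvent> events = new ArrayList<>();

    // Returns true if an existing event was extended, false if a new one was added.
    boolean record(AberrantEvent incoming) {
        for (AberrantEvent stored : events) {
            if (stored.sameExceptEnd(incoming)) {
                stored.end = incoming.end; // only the end time is updated
                return true;
            }
        }
        events.add(incoming);
        return false;
    }

    List<AberrantEvent> all() {
        return events;
    }
}
```

Keeping one row per continuous period is what keeps the table small. In a database, the linear scan here would be replaced by an indexed lookup on the matching columns.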

This display should automatically update as new aberrant behaviour is
detected on the network.

This has also been achieved: the Live Update page refreshes automatically (every five minutes,
as checked in test 308) and retrieves any new aberrant network event data when it does so.

The display should be accessible from machines other than the machine it is
installed on.

This requirement is met because the interface is entirely web based and communication with the
database occurs over the network, so provided the server can be reached, the web interface can
be too.

Each aberrant network behaviour event should be displayed in an identical style so that quick
comparisons of information can be made.

The same information is retrieved for each aberrant network event and displayed as a list in a
table. The table is ordered by time, so it should be easy to scan the list for the events you
are looking for. This style is carried throughout the system and used on other pages, so once
an operator is familiar with the layout and presentation, it will aid their work.

The information displayed as part of the live update can be filtered to show
only instances which match particular conditions.

All of the results displayed on the Live Update page can be filtered to show only specific
information. This is flexible: any number of inclusion filters can be combined. The filtered
display is only temporary, however; functionality to persist filters across updates of the
aberrant event display might have been useful. The implemented system does achieve what was
stated, but could perhaps have been more usable with a slightly different implementation.

The default update should contain information the network operator
believes to be the most relevant in the first instance.

This is connected to the previous requirement. Sentinel is implemented so that events of any
traffic type, source and profile are displayed as a normal Live Update, but the display is
filtered by default so that it does not show events which have been marked as acknowledged or
as a false positive. This was implemented after discussions with both the network operator at
DANTE and a network specialist based at Lancaster University. The reasoning is that if an
event has been dealt with, it no longer needs to be shown as a current event. If another event
is connected to an acknowledged one, it will be shown as associated within the Details
section. This is simply for speed of viewing on the main update page, and it has been
implemented successfully.
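The flag behaviour exercised by tests 301 to 307 can be summarised as a single predicate. The sketch below is hypothetical: the Sentinel front end is written in PHP, and `LiveUpdateFilter` is an illustrative name. Its flag logic follows the test table, so an event is shown only if every flag it carries has been explicitly allowed. Treating an empty source or type selection as "no restriction" is an assumption.

```java
import java.util.Set;

// Hypothetical Live Update filter; the real front end is written in PHP.
final class LiveUpdateFilter {
    // An event is visible only if each flag it carries has been explicitly allowed.
    // With both boxes unticked (the default), flagged events are hidden (test 301);
    // an event flagged as both needs both boxes ticked (tests 304 to 306).
    static boolean flagsAllowed(boolean acknowledged, boolean falsePositive,
                                boolean showAcknowledged, boolean showFalsePositive) {
        return (!acknowledged || showAcknowledged) && (!falsePositive || showFalsePositive);
    }

    // Assumption: an empty selection means "no restriction" for sources, profiles or types.
    static boolean selected(String value, Set<String> chosen) {
        return chosen.isEmpty() || chosen.contains(value);
    }
}
```

Expressing the flags as "each flag must be allowed" rather than "any flag allowed" is what keeps the doubly flagged event hidden in tests 304 and 305.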

Further information should be available for each aberrant network
behaviour instance on request.

This requirement is met via the Details section, every event displayed as part of the Live
Update also supplies a link to view further details if it is required.

This information should include, at the least, a graph of the time frame in
question and a brief statistical synopsis for the given period and traffic type.

The Details page shows all the required information and also gives further details by
performing a hostname lookup on the top four IP addresses, and a service lookup on
their respective source and destination ports.
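The ranking behind the top four addresses can be sketched as a sum-and-sort over flow records. The `TopTalkers` class and its `Flow` type are hypothetical, and ranking by bytes is an assumption; flows or packets could be ranked in the same way. The nfdump `-s` option, documented in the appendix, can produce such Top N statistics directly. A reverse lookup of each address could then use `java.net.InetAddress.getByName(ip).getCanonicalHostName()`, which is omitted here because it needs network access.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopTalkers {
    // Hypothetical flow summary: the source address and byte count of one flow.
    static final class Flow {
        final String srcIp;
        final long bytes;

        Flow(String srcIp, long bytes) {
            this.srcIp = srcIp;
            this.bytes = bytes;
        }
    }

    // Sums bytes per address and returns up to n addresses, largest total first.
    static List<String> topN(List<Flow> flows, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Flow f : flows) {
            totals.merge(f.srcIp, f.bytes, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        // Ties are broken by address so that the order is deterministic.
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Calling `topN(flows, 4)` would give the four addresses to look up.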

This information should be persistent beyond deletion of the actual
NetFlow records for that aberrant network behaviour event.

The statistical analysis and the service and hostname lookup information are held as a text
record as part of each event in the database. The graph, however, is only available for as
long as the RRD files are scheduled to last, and it loses accuracy over time. Even so, the
statistical analysis provides more information than would normally be available in such a
situation, so I feel this requirement has been met.

It should be made obvious if a particular aberrant network behaviour event
has been flagged as a false positive when examining further details.

On the Details page, if the specified event has been marked as a false positive, this is
displayed in large writing above the graph, so that a network operator does not waste time
re-analysing something which has already been assessed and found to be an error.

If further information about an aberrant network behaviour event is
requested, then a display of possible associated events should also be provided.

The Details page provides this as a table in a similar style to the Live Update display. The
results in this table do not follow the same restrictions as the Live Update page, and the list
includes events which have been marked as acknowledged or false positive. This is because the
events are related regardless of whether they have been dealt with or determined to be
inaccurate.
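The paragraph above does not spell out how associated events are chosen, so the following Java sketch assumes one plausible criterion: any other event whose time span overlaps the selected one. The `Association` class and its `Span` type are hypothetical. In keeping with the paragraph above, acknowledged and false-positive events are not filtered out.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

final class Association {
    // Hypothetical minimal view of a stored event: its id and time span.
    static final class Span {
        final int id;
        final LocalDateTime start;
        final LocalDateTime end;

        Span(int id, LocalDateTime start, LocalDateTime end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }
    }

    // Closed intervals overlap when each one starts no later than the other ends.
    static boolean overlaps(Span a, Span b) {
        return !a.start.isAfter(b.end) && !b.start.isAfter(a.end);
    }

    // Ids of every other event overlapping the selected one. Acknowledged and
    // false-positive events are deliberately not excluded.
    static List<Integer> associated(Span selected, List<Span> all) {
        List<Integer> ids = new ArrayList<>();
        for (Span s : all) {
            if (s.id != selected.id && overlaps(selected, s)) {
                ids.add(s.id);
            }
        }
        return ids;
    }
}
```

Because each continuous period is stored as a single row, this comparison needs only one start and one end time per event.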

Further information pertaining to these associated aberrant network
behaviour events should be available on request.

As on the Live Update page, each associated event that has been identified provides a link to
its own Details page for more information.

Detected aberrant network behaviour events should be recorded in some
form of persistent database.

This is one of the most basic requirements and it has been achieved admirably; without the
database the system would not function.

The database should be reliable, quick to query, and scale well to holding
potentially very large data sets.

The database server used is MySQL, which is commonly used in industry for far more
time-critical applications than the regular five-minute updates that Sentinel works from. The
database has been designed to avoid data repetition, and most searches and queries are based
on ID numbers, which are the easiest values to index and generally the quickest to query on.
Because the database holds each period of continuous aberrant behaviour as one entry rather
than as a series of individual marks, the data sets involved are considerably smaller, and
searches based on date and time are easier to perform.

It should be possible to view past aberrant network behaviour event details
based upon a number of criteria:

This requirement is met via the Review page; the next few requirements specify the criteria in
more detail.

Exact Start time and End time.

This is possible using the query form on the Review page. Only events which start and
end at that exact time specified will be displayed.

Start time somewhere between two given dates and times.

This is also possible using a different section of the query form on the Review page.

End time somewhere between two given dates and times.

Lastly, this is also possible, queried in a similar way to F.3.
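The three date criteria can be expressed as simple predicates over an event's start and end times. This Java sketch is illustrative only, and `ReviewQuery` is a hypothetical name. Inclusive bounds are an assumption, chosen to mirror MySQL's `BETWEEN`, which includes both endpoints.

```java
import java.time.LocalDateTime;

// Hypothetical predicates behind the Review page's three date criteria.
final class ReviewQuery {
    // Exact start time and end time: both must match.
    static boolean exactTimes(LocalDateTime start, LocalDateTime end,
                              LocalDateTime wantedStart, LocalDateTime wantedEnd) {
        return start.equals(wantedStart) && end.equals(wantedEnd);
    }

    // Inclusive range check (assumption); apply it to the start time for the
    // "started between" query and to the end time for the "ended between" query.
    static boolean between(LocalDateTime t, LocalDateTime from, LocalDateTime to) {
        return !t.isBefore(from) && !t.isAfter(to);
    }
}
```

The type, source and profile filters described next would simply be combined with these predicates using a logical AND.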

Alongside queries based upon the starting and end times results should be
chosen according to further specific information; type/source/profile etc.

This functionality is provided alongside the date-based filters: other options can be selected
to show only events which match those details. It might have been useful to be able to specify
queries such as “everything except profileX” more easily than by ticking every profile apart
from profileX, but queries of that kind are still possible, so the requirement is adequately
met.

When results have been found, it should be possible to view further
information about an event in the same way as for a live event.

The results displayed on the Review page are similar in style to those on the Live Update page.
Each has a basic amount of information, with a link to much more detailed results via the
Details page.

Aberrant network behaviour events stored in the system should be able to
be flagged as acknowledged when they have been dealt with.

This feature is built into the database and is possible via the Details interface.

Aberrant network behaviour events stored in the system should be able to
be flagged as a false positive if they have been identified as such.

This is also provided in the database design and again is accessed via the Details interface.

Operators who have dealt with a particular aberrant network behaviour
event should be able to leave some comment regarding their findings for the
benefit of later users.

This is provided by a comment field in the database for each event. The Details page offers an
interface to leave a comment, or to alter a comment that someone else has made. This
information is not displayed on the Live Update screen, but is shown when more details are
requested about a specific event.

6.3.2 Summary and Feedback from DANTE

The overall aim of this project was:

To assist a network operator in the identification and diagnosis of network problems and
illustrate how the inclusion of automated aberrant behaviour detection could improve
large network monitoring.

From this, a number of lower-level requirements were derived, all of which I have shown to have
been met. However, I want to revisit the overall statement and assess how far this project has
gone towards achieving it. The Sentinel system is functional and does exactly what it set out
to do: it assists a network operator by aiding their workflow and providing the relevant
information in one place. That, however, is a feature of many network monitoring systems;
where Sentinel differs is its automated aberrance detection, and it is this which makes it an
interesting prospect. When I had finished the implementation, I contacted Maurizio Molina, the
network operator at DANTE with whom I have been liaising throughout the project. He very kindly
offered to review my project and give me feedback on how well it met his requirements. Overall
he was very pleased: it is a functional, self-contained project which achieves what it set out
to do, and the initial requirements were created from conversations with him about his
workflow and the way he used the NfSen data in his work. His initial expectation had been that
I would use NfSen-HW to examine whether it was useful for detecting aberrance, rather than
build a fully functional solution for viewing the aberrant events it detected, so in some ways
my project went beyond his expectations. His only criticism was that he would have appreciated
more research into NfSen-HW and how successful its aberrance detection was on the GÈANT2 data.
Nevertheless, the implementation has indicated that NfSen-HW does indeed do what it claims, and
that Holt-Winters forecasting is generally successful within its limits. More research would
have been carried out on the GÈANT2 data had it not been for the data source issues present in
NfSen-HW.

Most importantly, I believe that the project indicates the usefulness of aberrant behaviour
detection as part of a wider network monitoring strategy for large network providers. It is an
additional source of information for diagnosing problems which has not yet been widely taken
advantage of. With a little more development of its anomaly detection methods, the prototype
I have created could be extremely informative; this is something I will discuss in my
conclusion.

7.1 Overview
Looking back over the project as a whole, I am extremely happy with the outcome. Firstly, I
have gained an in-depth understanding of network monitoring techniques, aberrance detection
methods, a selection of Linux tools and configuration options, and the experience of dealing
with a real-world network operations centre in DANTE. This is all knowledge I have developed
whilst working on the Sentinel system, and it has been a very worthwhile learning experience.
Secondly, the system that has been developed works as intended, and it provides an interesting
angle on network monitoring which, according to my research, has not been widely utilised. The
feedback I received from Maurizio helped prove to me that the project has been a success and
that this is an area where much work remains to be done; I return to this in the Further Work
section.

There are some areas where I think further development could have been undertaken. The
database was more or less exactly what the data required, so if I were to redevelop the system
from scratch, I would keep the same database design. One area I would look into changing is
the method of reading data from the RRD files. Whilst Java provides a functional, working
application, it is probably not the language best suited to the task. Unfortunately, I spent so
much time researching NfSen, NfDump, NfSen-HW and RRDtool that when I discovered the Java RRD
libraries were not compatible, there was not enough time to change the development plan. Hence
the XML parsing solution was implemented, successfully, but with hindsight I would have looked
into developing that part of the project in a different language, perhaps Perl, for which a
working RRD interface library is much more likely to exist.

The second thing I would reconsider is the development of the web interface as an independent
entity. Having used a Perl RRD interface, it might have been possible to tie the web interface
into NfSen-HW as a plugin. I investigated implementing the web interface as a stand-alone front
end plugin, that is, one without a corresponding back end plugin, but unfortunately the NfSen-HW
snapshot does not seem to recognise front end plugins without a corresponding Perl module back
end. I do not think the Sentinel web interface lost anything by being an independent system,
however; it just might have been a nice touch to tie everything together.

I also believe that the web interface could have been made more sophisticated. I am not a web
developer by choice, and the interface, whilst meeting all of the desired criteria, was very
plain and simple. Another development language might have made a richer interface easier to
implement, but PHP met all the needs and satisfied all of the requirements. This is again
something I would examine if I were to either continue the project or start again. One feature
that would need to be added to move the system from a prototype to a live service is user
authentication and permissions. This would require changes to the database and front end, but
should not be too difficult. It does not cause problems for the prototype, and security has
been considered in its development, but it would be a required feature in any real-world
network operations centre.

One of the biggest problems was in fact the software I was interfacing with, NfSen-HW. It is a
project very much in beta development, and it is based on an old version of NfSen which, at the
time of the snapshot, was still having major bugs ironed out. The problems cannot be rectified
in NfSen-HW as it stands because of the amount that has been added to the default codebase, so
the development team from Hungarnet would have to start again from the latest version. Had I
not encountered so much difficulty adding the GÈANT2 data into NfSen-HW, there would have been
more research into the accuracy of the results and the implications for fine tuning. This would
have been a highly useful addition to the report; sadly, it was not possible in the time
available.

7.2 Further work

There is a lot of scope for this project to be taken further. What I have produced is an
indication of how useful aberrance detection could be in real-world network monitoring
environments, but it bases all of its aberrance detection on one method and one technology. As
I identified in the Background and Related Work section, many different aberrance detection
methods are being researched, the most interesting of which I believe to be the use of entropy
to detect changes in network use. If this project were to be continued, I would like to see an
investigation into the use of entropy to detect anomalous events, and a comparison of those
results with a further study of the accuracy of NfSen-HW, as mentioned in the previous section.
Whilst there is a great deal of further research to be done on this topic, I feel that for
NfSen-HW to be regarded as a sound platform for development, some time should be spent bringing
it up to date. The latest versions of NfSen use an entirely different RRD structure from the
snapshot NfSen-HW is built on, which means that their results are incompatible. The RRD
structure in the newer version is more distributed and logical: each traffic type is divided
out into its own RRD file rather than everything being stored in one source RRD. Allowing time
for development could also mean that Peter Haag has had the chance to implement the suggestions
Gabor Kiss made regarding the plugin functionality, which could result in NfSen-HW being
implemented simply as a plugin for NfSen. This would be ideal, as it would allow the version of
NfSen always to be the most up to date and the least likely to cause problems in development.
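As a starting point for such an investigation, the Shannon entropy of a traffic feature can be computed as below. The example uses the destination ports seen in one five-minute interval. This is a hypothetical sketch, not part of Sentinel or NfSen-HW. Traffic concentrated on a single port gives an entropy of zero, while traffic spread evenly over many ports, as a port scan might produce, gives a high value. A sharp change between intervals could therefore be flagged as anomalous.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PortEntropy {
    // Shannon entropy, in bits, of the empirical distribution of the given values:
    // H = -sum(p * log2(p)) over each distinct value with relative frequency p.
    static double entropy(List<Integer> ports) {
        if (ports.isEmpty()) {
            return 0.0;
        }
        Map<Integer, Integer> counts = new HashMap<>();
        for (int port : ports) {
            counts.merge(port, 1, Integer::sum);
        }
        double total = ports.size();
        double h = 0.0;
        for (int count : counts.values()) {
            double p = count / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }
}
```

For example, four flows to four distinct ports give 2 bits, while four flows to the same port give 0. The threshold for what counts as a "sharp" change would itself need the kind of tuning discussed above.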

The following individuals helped in the development and design of this project:

Maurizio Molina, DANTE Network Operator
For large amounts of help and advice regarding the process and systems in place at
DANTE, and for some very delicious Italian food.
Gabor Kiss and Janos Mohàcsi
For taking the time to answer my questions regarding NfSen-HW and RRDtool.

Project Proposal

See the pages following.


See the pages following.

NfDump(1) Manpage

nfdump(1) nfdump(1)

nfdump - netflow display and analyze program

nfdump [options] [filter]

nfdump is the netflow display and analyzing program of the nfdump tool set. It reads the
netflow data from files stored by nfcapd and processes the flows according to the options
given. The filter syntax is comparable to tcpdump and extended for netflow data. Nfdump can
also display many different top N flow and flow element statistics.

-r inputfile
Read input data from inputfile. Default is read from stdin.

-R expr
Read input from a sequence of files in the same directory. expr may
be one of:
/any/dir Read all files in directory dir.
/dir/file Read all files beginning with file.
/dir/file1:file2 Read all files from file1 to file2.

-M expr
Read input from multiple directories. expr looks like:
/any/path/to/dir1:dir2:dir3 etc. and will be expanded to the directories:
/any/path/to/dir1, /any/path/to/dir2 and /any/path/to/dir3
Any number of colon separated directories may be given. The files to
read are specified by -r or -R and are expected to exist in all the
given directories. The options -r and -R must not contain any
directory part when used in conjunction with -M.

-m Sort the netflow records according to the date first seen. This option
is usually only useful in conjunction with -M, when netflow records
are read from different sources, which are not necessarily sorted.

-w outputfile
If specified writes binary netflow records to outputfile ready to be
processed again with nfdump. The default output is ASCII on stdout.

-f filterfile
Reads the filter syntax from filterfile. Note: Any filter specified
directly on the command line takes precedence over -f.


-t timewin
Process only flows which fall in the time window timewin, where
timewin is YYYY/MM/dd.hh:mm:ss[-YYYY/MM/dd.hh:mm:ss]. Any parts of
the time spec may be omitted, e.g. YYYY/MM/dd expands to
YYYY/MM/dd.00:00:00-YYYY/MM/dd.23:59:59 and processes all flows from
a given day. The time window may also be specified as +/- n. In this
case it is relative to the beginning or end of all flows: +10 means
the first 10 seconds of all flows, -10 means the last 10 seconds of
all flows.
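The expansion of a bare date into a whole-day window can be sketched in Python as below. This is an illustration under stated assumptions, not nfdump's parser: it only handles the full form YYYY/MM/dd.hh:mm:ss and the date-only form YYYY/MM/dd, it returns None as the end of a window given as a single full timestamp (an assumption), and it does not handle the relative +/- n form. The names parse_timewin and _parse are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

FULL = "%Y/%m/%d.%H:%M:%S"
DAY = "%Y/%m/%d"


def _parse(part: str, is_end: bool) -> Tuple[datetime, bool]:
    """Parse one side of the window and report whether it was a bare date."""
    try:
        return datetime.strptime(part, FULL), False
    except ValueError:
        day = datetime.strptime(part, DAY)
        if is_end:
            # A bare end date covers the whole day, up to 23:59:59.
            day += timedelta(hours=23, minutes=59, seconds=59)
        return day, True


def parse_timewin(spec: str) -> Tuple[datetime, Optional[datetime]]:
    start_text, sep, end_text = spec.partition("-")
    start, start_is_day = _parse(start_text, is_end=False)
    if sep:
        return start, _parse(end_text, is_end=True)[0]
    if start_is_day:
        # YYYY/MM/dd alone expands to YYYY/MM/dd.00:00:00-YYYY/MM/dd.23:59:59.
        return start, _parse(start_text, is_end=True)[0]
    return start, None  # assumption: open-ended window


print(parse_timewin("2004/07/11"))
```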

-c num
Limit number of records to process to the first num flows.

-a Aggregate netflow data. Aggregation is done at connection level.

-A fields[/netmask]
Aggregate netflow data using the specified fields, where fields is a
comma-separated list out of srcip, dstip, srcport and dstport. The default
is to use all fields: srcip,dstip,srcport,dstport. An additional
netmask may be given, in which case flows from the same subnets are
aggregated. For proper aggregation, the IP version to which the mask
applies matters, so the IP protocol version must be given in the form
srcip4/24 for IPv4 or srcip6/64 for IPv6 address aggregation, and
likewise for dstip. Only flows of the same IP protocol (tcp, udp,
icmp etc.) are aggregated.
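The effect of a netmask on aggregation, for example -A srcip4/24, can be illustrated with Python's standard ipaddress module. The sketch below is hypothetical and is not nfdump code: it assumes each flow is a dictionary with proto, srcip, packets and bytes keys, masks the source address to a /24 and sums packets and bytes per (protocol, subnet) pair, so that only flows of the same protocol are merged.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_src_net(flows, prefix_len=24):
    """Sum flows, packets and bytes per (protocol, IPv4 source subnet)."""
    totals = defaultdict(lambda: {"flows": 0, "packets": 0, "bytes": 0})
    for flow in flows:
        # strict=False masks off the host bits, e.g. 10.0.0.200 -> 10.0.0.0/24.
        net = ipaddress.ip_network(f"{flow['srcip']}/{prefix_len}", strict=False)
        entry = totals[(flow["proto"], str(net))]
        entry["flows"] += 1
        entry["packets"] += flow["packets"]
        entry["bytes"] += flow["bytes"]
    return dict(totals)


flows = [
    {"proto": "TCP", "srcip": "10.0.0.1", "packets": 2, "bytes": 100},
    {"proto": "TCP", "srcip": "10.0.0.200", "packets": 3, "bytes": 50},
    {"proto": "UDP", "srcip": "10.0.0.5", "packets": 1, "bytes": 10},
]
print(aggregate_by_src_net(flows))
```

The two TCP flows fall into the same /24 and are merged, while the UDP flow from that subnet stays separate.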

-I Print flow statistics from the file specified by -r, or the timeslot
specified by -R/-M. The printed information corresponds to pre nfdump
1.5 nfcapd stat files.

-S Compatibility option with pre 1.4 nfdump. Is equal to -s


-s statistic[:p][/orderby]
Generate the Top N flow or flow element statistic. statistic can be:
record Statistic about aggregated netflow records.
srcip Statistic about source IP addresses
dstip Statistic about destination IP addresses
ip Statistic about any (source or destination) IP addresses
srcport Statistic about source ports
dstport Statistic about destination ports
port Statistic about any (source or destination) ports
srcas Statistic about source AS numbers
dstas Statistic about destination AS numbers
as Statistic about any (source or destination) AS numbers
inif Statistic about input interface
outif Statistic about output interface
proto Statistic about IP protocols
By adding :p to the statistic name, the resulting statistic is
split up into transport layer protocols. Default is transport
protocol independent statistics.
orderby is optional and specifies the order by which the statistics
are ordered, and can be flows, packets, bytes, pps, bps or bpp. You
may specify more than one orderby, which results in the same statistic
ordered differently. If no orderby is given, statistics are
ordered by flows. Any number of -s flow element statistics may be
given on the command line for the same run.
Example: -s srcip -s ip/flows -s dstport/pps/packets/bytes -s
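A Top N statistic of the kind produced by -s srcip/bytes -n 10 amounts to summing a metric per flow element and keeping the largest entries. The Python sketch below shows that idea with collections.Counter. The function top_n and the flow dictionary layout are assumptions for illustration; only orderby values that are plain record fields (flows, packets, bytes) are handled, and ties may not be ordered the way nfdump orders them.

```python
from collections import Counter


def top_n(flows, element="srcip", orderby="flows", n=10):
    """Rank values of one flow element by flow count or a summed metric."""
    totals = Counter()
    for flow in flows:
        # orderby "flows" counts records; otherwise sum e.g. packets or bytes.
        totals[flow[element]] += 1 if orderby == "flows" else flow[orderby]
    # As with -n 0, a limit of 0 returns every entry.
    return totals.most_common(n if n > 0 else None)


flows = [
    {"srcip": "10.0.0.1", "bytes": 10},
    {"srcip": "10.0.0.2", "bytes": 50},
    {"srcip": "10.0.0.1", "bytes": 5},
]
print(top_n(flows, orderby="bytes", n=1))  # [('10.0.0.2', 50)]
```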

-O orderby
Specifies the default orderby for flow element statistics -s, which
applies when no orderby is given at -s. orderby can be flows, packets,
bytes, pps, bps or bpp. Defaults to flows.

-l [+/-]packet_num
Limit statistics output to those records above or below the
packet_num limit. packet_num accepts positive or negative numbers
followed by K, M or G (10^3, 10^6 or 10^9 packets respectively).
See also the note at -L.

-L [+/-]byte_num
Limit statistics output to those records above or below the byte_num
limit. byte_num accepts positive or negative numbers followed by K,
M or G (10^3, 10^6 or 10^9 bytes respectively). Note: these limits
only apply to the statistics and aggregated outputs generated
with -a, -s or -S. To filter netflow records by packets and bytes,
use the filter syntax 'packets' and 'bytes' described below.

-n num
Define the number for the Top N statistics. Defaults to 10. If 0 is
specified the number is unlimited.

-o format
Selects the output format to print flows or flow record statistics
(-s record). The following formats are available:
raw Print each flow record on multiple lines.
line Print each flow on one line. Default format.
long Print each flow on one line with more details
extended Print each flow on one line with even more details.
pipe Machine readable format: Print all fields ’|’ separated.
fmt:format User defined output format.
For each defined output format except -o fmt:<format>, an IPv6 long
output format exists: line6, long6 and extended6. See the output formats
below for more information.

-K key
Anonymize all IP addresses using the CryptoPAn (Cryptography-based
Prefix-preserving Anonymization) module. The key is used to initialize
the Rijndael cipher. key is either a 32 character string, or a
64 hex digit string starting with 0x. Anonymizing takes place after
applying the flow filter, but before printing the flow or writing
the flow to a file.

See http://www.cc.gatech.edu/computing/Telecomm/cryptopan/ for more
information about CryptoPAn.

-q Suppress the header line and the statistics at the bottom.

-z Zero flows. Do not dump flows into the output file, but only the
statistics record.

-Z Check filter syntax and exit. Sets the return value accordingly.

-X Compiles the filter syntax and dumps the filter engine table to
stdout. This is for debugging purposes only.

-V Print nfdump version and exit.

-h Print help text on stdout with all options and exit.

Return values:
0 No error.
255 Initialization failed.
254 Error in filter syntax.
250 Internal error.

The output format raw prints each flow record on multiple lines,
including all information available in the record. This is the most
detailed view on a flow.

Other output formats print each flow on a single line. Predefined output
formats are line, long and extended. The output format line is the
default output format when no format is specified. It limits the
information to the connection details as well as the number of packets,
bytes and flows.

The output format long is identical to the format line, and includes
additional information such as TCP flags and Type of Service.

The output format extended is identical to the format long, and
includes additional computed information such as pps, bps and bpp.


Date flow start: Time the flow was first seen, in ISO 8601 format
including milliseconds.

Duration: Duration of the flow in seconds and milliseconds. If flows
are aggregated, duration is the time span over the entire period of
time from first seen to last seen.

Proto: Protocol used in the connection.

Src IP Addr:Port: Source IP address and source port.


Dst IP Addr:Port: Destination IP address and destination port.

Flags: TCP flags of the connection, ORed together.

Tos: Type of service.

Packets: The number of packets in this flow. If flows are aggregated,
the packets are summed up.

Bytes: The number of bytes in this flow. If flows are aggregated,
the bytes are summed up.

pps: The calculated packets per second: number of packets / duration.
If flows are aggregated, this results in the average pps during this
period of time.

bps: The calculated bits per second: 8 * number of bytes / duration.
If flows are aggregated, this results in the average bps during this
period of time.

Bpp: The calculated bytes per packet: number of bytes / number of
packets. If flows are aggregated, this results in the average bpp
during this period of time.

Flows: Number of flows. If flows are listed only, this number is
always 1. If flows are aggregated, this shows the number of flows
aggregated into one record.
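The three calculated fields follow directly from the definitions above. For example, a flow of 100 packets and 150000 bytes lasting 2 seconds gives pps = 100 / 2 = 50, bps = 8 * 150000 / 2 = 600000 and bpp = 150000 / 100 = 1500. A minimal Python version is shown below; the function name derived_rates is hypothetical, and returning 0 when the duration or packet count is zero is an assumption of this sketch, not documented nfdump behaviour.

```python
def derived_rates(packets: int, nbytes: int, duration_s: float):
    """Return (pps, bps, bpp) as defined in the output format description."""
    pps = packets / duration_s if duration_s > 0 else 0.0
    bps = 8 * nbytes / duration_s if duration_s > 0 else 0.0
    bpp = nbytes / packets if packets > 0 else 0.0
    return pps, bps, bpp


print(derived_rates(100, 150000, 2.0))  # (50.0, 600000.0, 1500.0)
```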

Numbers larger than 1048576 (1024*1024) are scaled to 4 digits and one
decimal digit, including the scaling factor M, G or T, for cleaner
output, e.g. 923.4 M.

To make the output more readable, IPv6 addresses are shortened to
16 characters: the seven leading and seven trailing characters, joined
by two dots (..), are displayed in all normal output formats. To display
the full IPv6 address, use the appropriate long format, which is the
format name followed by a 6.

Example: -o line displays an IPv6 address as 2001:23..80:d01e, whereas
the format -o line6 displays the IPv6 address in full length:
2001:234:aabb::211:24ff:fe80:d01e. The combination of -o line -6 is
equivalent to -o line6.
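The shortening rule can be expressed as simple string slicing. This Python sketch works on the address text as printed; shorten_ipv6 is a hypothetical name and the code illustrates the rule rather than reproducing nfdump's implementation. It yields the 2001:23..80:d01e example above.

```python
def shorten_ipv6(text: str, width: int = 16) -> str:
    """Keep the first and last seven characters, joined by '..'."""
    if len(text) <= width:
        return text
    keep = (width - 2) // 2  # 7 characters on each side of the two dots
    return f"{text[:keep]}..{text[-keep:]}"


print(shorten_ipv6("2001:234:aabb::211:24ff:fe80:d01e"))  # 2001:23..80:d01e
```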

The pipe output format is intended to be read by another program for
further processing. Values are separated by a |. IP addresses are
printed as 4 consecutive 32bit numbers. Output sequence:

Address family PF_INET or PF_INET6
Time first seen UNIX time seconds
msec first seen Milliseconds first seen
Time last seen UNIX time seconds
msec last seen Milliseconds last seen
Protocol Protocol
Src address Src address as 4 consecutive 32bit numbers.
Src port Src port
Dst address Dst address as 4 consecutive 32bit numbers.
Dst port Dst port
Src AS Src AS number
Dst AS Dst AS number
Input IF Input Interface
Output IF Output Interface
TCP Flags TCP Flags
000001 FIN
000010 SYN
000100 RESET
001000 PUSH
010000 ACK
100000 URGENT
e.g. 6 => SYN + RESET
Tos Type of Service
Packets Packets
Bytes Bytes

For IPv4 addresses only the last 32bit integer is used. All others are
set to zero.
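Because the pipe format has a fixed field order, a consumer can be written in a few lines. The Python sketch below follows the output sequence listed above. The function names and the sample line are made up for illustration. The sketch assumes every field is printed as a decimal integer, and it treats an address as IPv4 when its first three 32bit words are zero, which would misclassify IPv4-compatible IPv6 addresses. Checking the address family field would be more robust, but the numeric value of PF_INET6 differs between platforms.

```python
import ipaddress

# Bit values from the TCP Flags table above.
TCP_FLAG_BITS = [(1, "FIN"), (2, "SYN"), (4, "RESET"),
                 (8, "PUSH"), (16, "ACK"), (32, "URGENT")]


def decode_flags(value: int) -> list:
    """Return the names of the TCP flags set in value, e.g. 6 -> SYN, RESET."""
    return [name for bit, name in TCP_FLAG_BITS if value & bit]


def to_ip(words):
    """Build an address from 4 consecutive 32bit numbers."""
    a, b, c, d = words
    if a == b == c == 0:
        # Assumption: IPv4 uses only the last word, the others are zero.
        return ipaddress.IPv4Address(d)
    return ipaddress.IPv6Address((a << 96) | (b << 64) | (c << 32) | d)


def parse_pipe_line(line: str) -> dict:
    v = [int(field) for field in line.strip().split("|")[:24]]
    return {
        "af": v[0],
        "first": v[1] + v[2] / 1000,  # UNIX seconds plus milliseconds
        "last": v[3] + v[4] / 1000,
        "proto": v[5],
        "src": to_ip(v[6:10]), "srcport": v[10],
        "dst": to_ip(v[11:15]), "dstport": v[15],
        "srcas": v[16], "dstas": v[17],
        "inif": v[18], "outif": v[19],
        "flags": decode_flags(v[20]), "tos": v[21],
        "packets": v[22], "bytes": v[23],
    }


line = ("2|1089527100|0|1089527160|500|6|0|0|0|3232235777|1025|"
        "0|0|0|167772161|80|0|0|5|7|18|0|10|1500")
record = parse_pipe_line(line)
print(record["src"], record["dst"], record["flags"])
```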

The output format fmt:<format> allows you to define your own output
format. A format description consists of a single line containing
arbitrary strings and format specifiers as described below:

%ts Start Time - first seen
%te End Time - last seen
%td Duration
%pr Protocol
%sa Source Address
%da Destination Address
%sap Source Address:Port
%dap Destination Address:Port
%sp Source Port
%dp Destination Port
%sas Source AS
%das Destination AS
%in Input Interface num
%out Output Interface num
%pkt Packets
%byt Bytes
%fl Flows
%flg TCP Flags
%tos Tos
%bps bps - bits per second
%pps pps - packets per second
%bpp bpp - Bytes per packet

For example, the standard output format long can be created as:

-o "fmt:%ts %td %pr %sap -> %dap %flg %tos %pkt %byt %fl"
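The fmt: specifiers overlap: %sa, for instance, is a prefix of %sap and %sas, so any expansion has to try the longer specifiers first. The Python sketch below illustrates this with a regular expression. The function render and the record dictionary are hypothetical, and this is not how nfdump.c implements its formats.

```python
import re


def render(fmt: str, record: dict) -> str:
    """Replace %xx specifiers whose names are keys of record."""
    # Longest names first, so that %sap is not consumed as %sa followed by "p".
    names = sorted(record, key=len, reverse=True)
    pattern = re.compile("%(" + "|".join(map(re.escape, names)) + ")")
    return pattern.sub(lambda m: str(record[m.group(1)]), fmt)


record = {"pr": "TCP", "sa": "10.0.0.1", "sap": "10.0.0.1:1025",
          "dap": "10.0.0.2:80", "pkt": 10}
print(render("%pr %sap -> %dap %pkt", record))  # TCP 10.0.0.1:1025 -> 10.0.0.2:80 10
```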

You may also define your own output format and have it compiled into
nfdump. See nfdump.c around line 100 for more details.

The filter syntax is similar to that of the well-known pcap library used by
tcpdump. The filter can be specified either on the command line after
all options or in a separate file. It can span several lines. Anything
after a # is treated as a comment and ignored to the end of the line.
There is virtually no limit to the length of the filter expression. All
keywords are case-insensitive.

Any filter consists of one or more expressions expr. Any number of expr
can be linked together:

expr and expr, expr or expr, not expr and ( expr ).

Expr can be one of the following filter primitives:

protocol version
inet for IPv4 and inet6 for IPv6

proto <protocol> where protocol can be any known protocol such as
TCP, UDP, ICMP, ICMP6, GRE, ESP, AH, or a valid protocol number.

IP address
[SourceDestination] IP <ipaddr> or
[SourceDestination] HOST <ipaddr> with <ipaddr> as any valid IPv4
or IPv6 address. SourceDestination may be omitted.

SourceDestination defines the IP address to be selected and can be SRC,
DST or any combination of SRC and|or DST. Omitting SourceDestination is
equivalent to SRC or DST.

inout defines the interface to be selected and can be IN or OUT.

[SourceDestination] NET a.b.c.d m.n.r.s for IPv4 with m.n.r.s as netmask, or
[SourceDestination] NET <net> / num with <net> as a valid IPv4 or
IPv6 network and num as maskbits. The number of mask bits must
match the appropriate address family, IPv4 or IPv6. Networks may be
abbreviated, such as 172.16/16, if they are unambiguous.
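How an abbreviated network such as 172.16/16 might be expanded and matched can be sketched with Python's ipaddress module. This sketch assumes, which the text above does not spell out, that missing trailing octets are zero. expand_net and in_net are hypothetical helpers, not nfdump functions, and only IPv4 is handled.

```python
import ipaddress


def expand_net(spec: str) -> ipaddress.IPv4Network:
    """Expand e.g. '172.16/16' to 172.16.0.0/16 by zero-padding octets."""
    addr, _, bits = spec.partition("/")
    octets = addr.split(".")
    octets += ["0"] * (4 - len(octets))
    return ipaddress.IPv4Network(f"{'.'.join(octets)}/{bits}")


def in_net(ip: str, spec: str) -> bool:
    return ipaddress.ip_address(ip) in expand_net(spec)


print(expand_net("172.16/16"), in_net("172.16.5.9", "172.16/16"))
```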

[SourceDestination] PORT [comp] num with num as a valid port
number. If comp is omitted, = is assumed.

[inout] IF num with num as an interface number.

flags tcpflags with tcpflags as a combination of:
A ACK.
S SYN.
F FIN.
R Reset.
P Push.
U Urgent.
X All flags on.
The ordering of the flags is not relevant. Flags not mentioned are
treated as don't care. In order to get those flows with only the SYN
flag set, use the syntax 'flags S and not flags AFRPU'.

TOS Type of service: tos value with value 0..255.

packets [comp] num [scale] to specify the packet count in the
netflow record.

bytes [comp] num [scale] to specify the byte count in the netflow
record.
Packets per second: Calculated value.

pps [comp] num [scale] to specify the pps of the flow.

Duration: Calculated value.

duration [comp] num to specify the duration in milliseconds of the
flow.
Bits per second: Calculated value.

bps [comp] num [scale] to specify the bps of the flow.

Bytes per packet: Calculated value.

bpp [comp] num [scale] to specify the bpp of the flow.

AS [SourceDestination] AS num with num as a valid AS number.

scale Scaling factor. May be k, m or g. Factor is 1024.

comp The following comparators are supported:
=, ==, >, <, EQ, LT, GT. If comp is omitted, = is assumed.
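A numeric primitive such as bps > 10k combines a comparator with an optionally scaled number. The Python sketch below evaluates one such comparison using the comparators listed above and the stated factor of 1024 for k, m and g. The function matches is hypothetical; it assumes the scale letter is written directly after the number, as in the examples, and it does not apply the default = when comp is omitted.

```python
import operator

SCALE = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}  # "Factor is 1024"
COMPARATORS = {
    "=": operator.eq, "==": operator.eq, "eq": operator.eq,
    ">": operator.gt, "gt": operator.gt,
    "<": operator.lt, "lt": operator.lt,
}


def matches(value: float, comp: str, num: str) -> bool:
    """Evaluate e.g. matches(bps, '>', '10k'); keywords are case independent."""
    num = num.lower()
    factor = SCALE.get(num[-1], 1)
    digits = num[:-1] if num[-1] in SCALE else num
    return COMPARATORS[comp.lower()](value, int(digits) * factor)


print(matches(12000, ">", "10k"))  # True, since 10k = 10240
```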

nfdump -r /and/dir/nfcapd.200407110845 -c 100 'tcp and ( src ip or dst ip )'
Dumps the first 100 netflow records which match the given filter.

nfdump -R /and/dir/nfcapd.200407110845:nfcapd.200407110945 'host'
Dumps all netflow records of host from July 11, 08:45 - 09:45.

nfdump -M /to/and/dir1:dir2 -R nfcapd.200407110845:nfcapd.200407110945 -S -n 20
Generates the Top 20 statistics from 08:45 to 09:45 from 3

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 -o extended
Generates the Top 20 statistics, extended output format.

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 'in if 5 and bps > 10k'
Generates the Top 20 statistics from flows coming from interface 5
with more than 10k bps.

nfdump -r /and/dir/nfcapd.200407110845 'inet6 and tcp and ( src port > 1024 and dst port 80 )'
Dumps all port 80 IPv6 connections to any web server.

Generating the statistics for data files of a few hundred MB is no
problem. However, be careful if you want to create statistics of several
GB of data. This may consume a lot of memory and can take a while.
Also, anonymizing IP addresses is time-consuming and uses a lot of CPU
power, which reduces the number of flows processed per second. Therefore
anonymizing takes place only when flow records are printed or written
to files. Any internal flow processing takes place using the original
IP addresses.

See also: nfcapd(1), nfprofile(1), nfreplay(1)

There is still the famous last bug. Please report them - all the last
bugs - back to me.

2005-08-19 nfdump(1)
Holt-Winters Forecasting Examples

Figure E.1: Aberrant Marking Example


Figure E.2: Subtracting 40 Minutes Example 1

Figure E.3: Subtracting 40 Minutes Example 2

Figure E.4: Subtracting 40 Minutes Example 3


[Working Documents] Available from: <http://www.lancs.ac.uk/~burys1/fyp>

[Barford & Plonka 2001] Barford, P. & Plonka, D. (2001) Characteristics of Network Traffic
Flow Anomalies. In: IMW ’01: Proceedings of the 1st ACM SIGCOMM Workshop on
Internet Measurement, San Francisco, California, USA. ACM Press, New York, NY, USA.

[Barford et al. 2002] Barford, P., Kline, J., Plonka, D. & Ron, A. (2002) A Signal Analysis of
Network Traffic Anomalies. In: IMW ’02: Proceedings of the 2nd ACM SIGCOMM
Workshop on Internet Measurement, Marseille, France. ACM Press, New York, NY, USA.

[Bash, 2007] Bash (2007) The Bash Reference Manual [Internet]. Available from:
<http://www.gnu.org/software/bash/manual/bashref.html>[Accessed 20th February 2007].

[Braukhoff et al. 2006] Brauckhoff, D., Tellenbach, B., Wagner, A., May, M. & Lakhina, A.
(2006) Impact of Packet Sampling on Anomaly Detection Metrics. In: IMC ’06: Proceedings
of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil. ACM
Press, New York, NY, USA. pp159-164.

[Brutlag, 2000a] Brutlag, J. (2000) Aberrant Behaviour Detection in Time Series for Network
Monitoring. In: LISA ’00: Proceedings of the 14th USENIX Conference on System
Administration, New Orleans, Louisiana. USENIX Association, Berkeley, CA, USA. pp139-146.

[Brutlag, 2000b] Brutlag, J. (2000) Notes on RRDTOOL implementation of Aberrant Behavior
Detection [Internet], Microsoft WebTV, Mountain View, California, USA. Available from:
<http://cricket.sourceforge.net/aberrant/rrd_hw.htm/>[Accessed 20th February 2007].

[Cacti, 2007] Cacti (2007) Cacti - The complete rrd based graphing solution [Internet]. Available
from: <http://cacti.net/features.php/>[Accessed 25th February 2007].

[Chatfield & Yar, 1988] Chatfield, C. & Yar, M. (1988) Holt-Winters Forecasting: Some Practical
Issues. The Statistician, Vol. 37, No. 2, Special Issue: Statistical Forecasting and
Decision-Making. 1988, pp. 129-140.

[Cricket, 2007] Cricket (2007) Cricket [Internet]. Available from:
<http://cricket.sourceforge.net/>[Accessed 20th February 2007].

[DANTE, 2007] DANTE (2007) Delivery of Advanced Network Technology to Europe
[Internet], Cambridge, UK. Available from: <http://www.dante.net/>[Accessed 20th February
2007].

[Debian GNU/Linux, 2007] Debian GNU/Linux (2007) Debian GNU/Linux [Internet]. Avail-
able from: <http://www.debian.org/>[Accessed 21st February 2007].


[Flow-Tools, 2007] Flow-Tools (2007) Flow-Tools - A toolset for working with NetFlow data
[Internet]. Available from: <http://www.splintered.net/sw/flow-tools/>[Accessed 23rd
February 2007].

[GÈANT2, 2007] GÈANT2 (2007) GÈANT2 [Internet], Cambridge, UK. Available from:
<http://www.geant2.net/>[Accessed 20th February 2007].

[Haag, 2005a] Haag, P. (2005) Watch your flows with NfSen and NfDump [Internet],
Presented at 50th RIPE Meeting, Stockholm, Sweden, May 3rd 2005. Available from:
<http://www.ripe.net/ripe/meetings/ripe-50/presentations/ripe50-plenary-tue-nfsen-nfdump.pdf>[Accessed 10th March 2007].

[Haag, 2005b] Haag, P. (2005) NfDump(1) Manpage. Installed with the NfDump application.
Available as an appendix of this report [Appendix C].

[JRobin, 2006] JRobin (2006) JRobin - A Java port of RRDtool by Sasa Markovic [Internet].
Available from: <http://www.jrobin.org/index.php/Main_Page>[Accessed 10th March
2007].

[Kim et al. 2004] Kim, M.-S., Kang, H.-J., Hung, S.-C., Chung, S.-H. & Hong, J. W. (2004)
A Flow-based Method for Abnormal Network Traffic Detection. In: Proceedings of the
IEEE/IFIP Network Operations and Management Symposium, Seoul, April 2004.

[Kiss & Mohàcsi, 2006] Kiss, G. & Mohàcsi, J. (2006) Anomaly detection for
NFSen/nfdump netflow engine - with Holt-Winters algorithm. Presented at
19th TF-CSIRT Meeting, Espoo, Finland, 21st September 2006. Available from:
<http://bakacsin.ki.iif.hu/~kissg/project/nfsen-hw/JRA2-meeting-at-Espoo_slides.pdf>[Accessed 10th March 2007].

[Korzyk, 1998] Korzyk, A. D. Sr, (1998) A Forecasting Model for Internet Security Attacks. In:
NISSC ’98. Proceedings of the National Information System Security Conference, Crystal
City, Virginia, USA, October 6th-9th 1998.

[libpcap, 2007] libpcap (2007) libpcap - Packet Capture Library [Internet]. Available from:
<http://www.tcpdump.org/>[Accessed 21st February 2007].

[MySQL, 2007] MySQL (2007) MySQL - The world's most popular open source database
[Internet]. Available from: <http://www.mysql.org/>[Accessed 21st February 2007].

[NetFlow, 2007] NetFlow (2007) Cisco IOS NetFlow [Internet]. Available from:
<http://www.cisco.com/go/netflow/>[Accessed 21st February 2007].

[NfDump, 2007] NfDump (2007) NfDump - NetFlow Dump [Internet]. Available from:
<http://nfdump.sourceforge.net/>[Accessed 10th March 2007].

[NfSen, 2007] NfSen (2007) NfSen - NetFlow Sensor [Internet]. Available from:
<http://nfsen.sourceforge.net/>[Accessed 10th March 2007].

[NfSen-HW, 2007] NfSen-HW (2007) NfSen - Holt Winters [Internet]. Available from:
<http://bakacsin.ki.iif.hu/~kissg/project/nfsen-hw/>[Accessed 10th March 2007].

[PHPL, 2007] PHP (2007) PHP: Hypertext Preprocessor [Internet]. Available from:
<http://www.php.net>[Accessed 21st February 2007].

[Roesch, 1999] Roesch, M. (1999) Snort - Lightweight Intrusion Detection for Networks.
In: LISA ’99: Proceedings of the 13th USENIX Conference on System Administration,
Seattle, Washington, USA. USENIX Association, Berkeley, CA, USA. pp229-238.

[RRDtool, 2007] RRDtool (2007) RRDtool - logging and graphing [Internet]. Available from:
<http://oss.oetiker.ch/rrdtool/>[Accessed 21st February 2007].

[RRD Java Libraries] RRD Java Libraries (2007) RRD Libraries for Java [Internet]. Available
from: <http://monstera.man.poznan.pl/wiki/index.php/RRD Java libraries>[Accessed
10th March 2007].

[sFlow, 2007] sFlow (2007) sFlow End User Forum [Internet]. Available from:
<http://www.sflow.org/index.php>[Accessed 22nd February 2007].

[SNMP, 2007] SNMP (2007) Information about Simple Network Management
Protocol and Management Information Base [Internet]. Available from:
<http://www.snmplink.org/>[Accessed 22nd February 2007].

[Sommerville, 2004] Sommerville, I. (2004) Software Engineering. Seventh Ed. Harlow, Pearson
Education Limited.

[TCPdump, 2007] TCPdump (2007) TCPdump - Network debugging tool [Internet]. Available
from: <http://www.tcpdump.org/>[Accessed 21st February 2007].

[Thottan & Ji, 2003] Thottan, M. & Ji, C. (2003) Anomaly Detection in IP Networks. IEEE
Transactions On Signal Processing. Vol. 51, No. 8, August 2003. pp2191-2204.

[Wagner & Plattner, 2005] Wagner, A. & Plattner, B. (2005) Entropy Based Worm and
Anomaly Detection in Fast IP Networks. In: WETICE ’05: Proceedings of the 14th IEEE
International Workshops on Enabling Technologies: Infrastructure for Collaborative
Enterprise, Linköping University, Sweden, June 13-15 2005. IEEE Computer Society,
Washington, DC, USA. pp172-177.