TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING
A
FINAL YEAR PROJECT REPORT
ON
INTELLIGENT NETWORK INTRUSION DETECTION SYSTEM
By:
Puneet Khanal
Rajiv Shrestha
Raju KC
LALITPUR, NEPAL
MARCH, 2010
LETTER OF APPROVAL
The undersigned certify that they have read, and recommend to the Institute of
Engineering for acceptance, a project report entitled "Intelligent Network Intrusion
Detection System" submitted by Puneet Khanal, Rajiv Shrestha and Raju KC in partial
fulfillment of the requirements for the degree of Bachelor of Computer Engineering.
______________________________
______________________________
Project Supervisor
Project Supervisor
Manoj Ghimire
Assistant Professor
Lecturer
Engineering
Computer Engineering
______________________________
______________________________
Internal Examiner
External Examiner
Purushottam Sigdel
Director
Senior Engineer
Nepal Telecom
________________________________
COPYRIGHT
The author has agreed that the Library, Department of Electronics and Computer
Engineering, Pulchowk Campus, Institute of Engineering may make this report freely
available for inspection. Moreover, the author has agreed that permission for extensive
copying of this project report for scholarly purpose may be granted by the supervisors who
supervised the project work recorded herein or, in their absence, by the Head of the
Department wherein the project report was done. It is understood that the recognition will
be given to the author of this report and to the Department of Electronics and Computer
Engineering, Pulchowk Campus, Institute of Engineering in any use of the material of this
project report. Copying, publication, or any other use of this report for financial gain
without the approval of the Department of Electronics and Computer Engineering,
Pulchowk Campus, Institute of Engineering and the authors' written permission is prohibited.
Request for permission to copy or to make any other use of the material in this report in
whole or in part should be addressed to:
Head
Department of Electronics and Computer Engineering
Pulchowk Campus, Institute of Engineering
Lalitpur, Kathmandu
Nepal
ACKNOWLEDGEMENT
We are sincerely thankful to the Department of Electronics and Computer Engineering for
providing the opportunity to do this project.
We are indebted to our supervisors, Mr. Babu Ram Dawadi and Mr. Manoj Ghimire, for their
valuable suggestions and constant guidance in the accomplishment of the project. We are
also thankful to the Project Coordinator, Mr. Surendra Shrestha, for assisting and
guiding us in the project.
Last but not least, we are thankful to our friends and teachers who supported us
throughout the course of the project.
ABSTRACT
Network Intrusion Detection Systems (NIDS) aim at preventing network attacks and
unauthorized remote use of computers. Depending on the kind of attack it targets, an NIDS
can be oriented to detect misuse (by defining all possible attacks) or anomalies (by
modeling legitimate behavior and flagging traffic that does not fit that model). Since the
knowledge of a misuse detector is restricted to the attacks it defines, it fails to notice
anomalies, and an anomaly detector in turn has no knowledge of specific attacks. To address
this, we present the Intelligent Network Intrusion Detection System (INIDS), a combined
misuse and anomaly detection system based on a Naive Bayes classifier, trained with
KDDCup99 dataset traffic, that analyzes network packets completely, together with a
strategy for creating a consistent knowledge model that integrates misuse-based and
anomaly-based knowledge.
Finally, we evaluate INIDS against well-known and new attacks, showing how it outperforms a
well-established industrial NIDS.
TABLE OF CONTENTS
PAGE OF APPROVAL
COPYRIGHT
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS AND ABBREVIATIONS
1 INTRODUCTION
1.1 What is an IDS?
1.2 What is not an IDS?
1.3 Attack Types
1.4 Existing System
1.5 Problem Statement
1.6 Objectives
1.7 Scope of the Project
2 LITERATURE REVIEW
2.1 The TCP/IP Reference Model
2.1.1 Internet Protocol (IP)
2.1.2 Internet Control Message Protocol (ICMP)
2.1.3 User Datagram Protocol (UDP)
2.1.4 Transmission Control Protocol (TCP)
2.2 Naive Bayes Classifier
2.3 Some Well-Known Attacks
2.3.1 DoS
2.3.2 Probe
2.4 jNetPcap
2.5 jSMILE
3 SYSTEM DESIGN
3.1 System Block Diagram
3.2 Data Flow Diagrams (DFDs)
3.3 Unified Modeling Language (UML)
4 METHODOLOGY
5 IMPLEMENTATION
5.1 Object-Oriented Design
6 TESTING
6.1 Level of Testing
6.2 Software Testing Strategies
7 RESULT
7.1 Screenshots
7.2 Comparison with Other Existing System
REFERENCES
APPENDIX A: RFCs
APPENDIX D: CD Contents
LIST OF FIGURES
LIST OF TABLES
Product
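ACK        Acknowledgment
API        Application Programming Interface
DFDs       Data Flow Diagrams
DNS        Domain Name System
DoS        Denial-of-Service
DS         Dataset
DSCP       Differentiated Services Code Point
GUI        Graphical User Interface
HIDS       Host-based Intrusion Detection System
ICMP       Internet Control Message Protocol
IDS        Intrusion Detection System
INIDS      Intelligent Network Intrusion Detection System
IP         Internet Protocol
NIDS       Network Intrusion Detection System
OS         Operating System
TCP        Transmission Control Protocol
TCP/IP     Transmission Control Protocol/Internet Protocol
TOS        Type of Service
TTL        Time to Live
UDP        User Datagram Protocol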
1. INTRODUCTION
Nowadays, as more people make use of the Internet, their computers and the valuable data
stored on them become a more attractive target for intruders. Attackers scan the Internet
constantly, searching for potential vulnerabilities in the machines that are connected to
the network. Intruders aim to gain control of a machine and insert malicious code into it.
Later on, using these enslaved machines (also called zombies), an intruder may launch
attacks such as worm attacks, Denial-of-Service (DoS) attacks and probing attacks.
An intrusion is any set of actions that threatens the integrity, availability, or
confidentiality of a network resource. An intrusion detection system (IDS) monitors network
traffic for suspicious activity and alerts the system or network administrator. In some
cases the IDS may also respond to anomalous or malicious traffic by taking action such as
blocking the user or source IP address from accessing the network.
IDS come in a variety of flavors and approach the goal of detecting suspicious traffic in
different ways. There are network based (NIDS) and host based (HIDS) intrusion detection
systems.
b) HIDS: Host-based intrusion detection system (HIDS) monitors and analyzes the
internals of a computing system rather than the network packets on its external interfaces.
Some IDS detect threats by looking for specific signatures of known threats, similar to
the way antivirus software typically detects and protects against malware, while others
detect threats by comparing traffic patterns against a baseline and looking for
anomalies.
a) Signature Based: A signature based IDS will monitor packets on the network and
compare them against a database of signatures or attributes from known malicious threats.
This is similar to the way most antivirus software detects malware. The issue is that there
will be a lag between a new threat being discovered in the wild and the signature for
detecting that threat being applied to the IDS. During that lag time, the IDS would be
unable to detect the new threat. The limitation of this approach lies in its dependence on
frequent updates of the signature database and its inability to generalize and detect novel or
unknown intrusions.
b) Anomaly Based: An anomaly-based IDS monitors network traffic and compares it against
an established baseline. The baseline identifies what is normal for that network (what
sort of bandwidth is generally used, what protocols are used, what ports and devices
generally connect to each other) and the IDS alerts the administrator or user when it
detects traffic that is anomalous, that is, significantly different from the baseline.
However, statistical anomaly detection is not based on an adaptive intelligent model and
cannot learn from normal and malicious traffic patterns.
There are IDS that simply monitor and alert and there are IDS that perform an action or
actions in response to a detected threat.
a) Passive IDS: A passive IDS simply detects and alerts. When suspicious or malicious
traffic is detected an alert is generated and sent to the administrator or user and it is up to
them to take action to block the activity or respond in some way.
b) Reactive IDS: A reactive IDS not only detects suspicious or malicious traffic and
alerts the administrator, but also takes pre-defined actions to respond to the threat.
Typically this means blocking any further network traffic from the source IP address or
user.
Intrusion detection systems help network administrators prepare for and deal with network
security attacks. These systems collect information from a variety of systems and network
sources, and analyze them for signs of intrusion and misuse. A variety of techniques have
been employed for analysis ranging from traditional statistical methods to new machine
learning approaches.
b) Exploits: These attacks take advantage of a known bug or design flaw in the system.
One of the best-known and most widely used intrusion detection systems is the open
source, freely available Snort. It is available for a number of platforms and operating
systems, including both Linux and Windows. Snort has a large and loyal following, and
there are many resources available on the Internet from which we can acquire signatures
for detecting the latest threats.
1.6. Objectives
Increased network complexity, greater access, and a growing emphasis on the Internet have
made network security a major concern for organizations. The number of computer
security breaches has risen significantly in the last three years. In February 2000, several
major web sites including Yahoo, Amazon, eBay, Datek, and E*Trade were shut down
due to denial-of-service attacks on their web servers.
2. LITERATURE REVIEW
The TCP/IP model is a multi-layered architecture. This means that one piece of
functionality runs at one layer, another at the next layer, and so forth. We can add new
functionality to the application layer, for example, without having to reimplement the
whole TCP/IP stack or include a complete TCP/IP stack in the actual application.
b) Transport layer
Manages end-to-end communications between hosts.
The two main transport-layer protocols are TCP and UDP.
c) Network layer
Gets data from source to destination.
d) Link layer
Manages data transfer to and from physical medium.
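[Figure: Data encapsulation between a web browser and a web server. The application
stream is carried as TCP segments between the TCP layers, as IP datagrams between the IP
layers, and as Ethernet frames between the Ethernet drivers.]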
IP Header (each row is 32 bits wide)
| VER (4 bits) | HLEN (4 bits) | Service (8 bits) | Total Length (16 bits) |
| Identification (16 bits) | Flags (3 bits) | Fragmentation Offset (13 bits) |
| TTL (8 bits) | Protocol (8 bits) | Header Checksum (16 bits) |
| Source Address (32 bits) |
| Destination Address (32 bits) |
| Options | Padding |
Version (VER): This four-bit field gives the version of the IP protocol. For IPv4 its
value is 0100 in binary.
Header Length (HLEN): This four-bit field defines the total length of the datagram
header in four-byte words. This field is needed because the length of the header is variable
(between 20 and 60 bytes). When there are no options, the header length is 20 bytes, and
the value of this field is five (5 x 4 = 20). When the option field is at its maximum size, the
value of this field is 15 (15 x 4 = 60).
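The HLEN arithmetic above can be illustrated with a minimal Python sketch. It is only an illustration, not part of the INIDS code: the function name is hypothetical, and it assumes the first byte of a raw IPv4 header is already available as an integer.

```python
def parse_version_and_header_length(first_byte):
    """Split the first IPv4 header byte into (version, header length in bytes)."""
    version = first_byte >> 4        # upper four bits: VER
    hlen_words = first_byte & 0x0F   # lower four bits: HLEN, in four-byte words
    return version, hlen_words * 4


# 0x45 is the usual first byte of an IPv4 header without options.
print(parse_version_and_header_length(0x45))  # (4, 20)
# 0x4F carries the maximum HLEN value of 15.
print(parse_version_and_header_length(0x4F))  # (4, 60)
```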
a) Service Type
In this interpretation, the first three bits are called precedence bits. The next four bits are
called type of service (TOS) bits, and the last bit is not used.
TOS Bits    Description
0000        Normal (default)
0001        Minimize cost
0010        Maximize reliability
0100        Maximize throughput
1000        Minimize delay
b) Differentiated Services
According to this standard, bits [0-5] form the Differentiated Services Code Point (DSCP)
and the remaining two bits [6-7] were left unused (they were later assigned to Explicit
Congestion Notification).
Total Length: This field defines the total length (header plus data) of the IPv4 datagram in
bytes. The maximum size is 65535 octets, or bytes, for a single packet.
Flags: This field is used in fragmentation. The first bit is reserved, not used, and
must be set to zero. The second bit is set to zero if the packet may be fragmented and to
one if it may not be fragmented. The third and last bit is set to zero if this is the last
fragment and to one if more fragments of the same packet follow.
Fragmentation Offset: The fragmentation offset field tells where in the original datagram
this fragment belongs. The offset is measured in units of 64 bits (8 bytes), and the first
fragment has offset zero.
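The flag bits and the 8-byte offset unit described above can be decoded as in the following minimal Python sketch. The function name is hypothetical, and the input is assumed to be the 16-bit word that holds the three flag bits followed by the 13-bit fragmentation offset.

```python
def decode_fragment_field(field16):
    """Decode the 16-bit flags + fragmentation offset word of an IPv4 header."""
    dont_fragment = bool(field16 & 0x4000)   # second flag bit: may not be fragmented
    more_fragments = bool(field16 & 0x2000)  # third flag bit: more fragments follow
    offset_bytes = (field16 & 0x1FFF) * 8    # 13-bit offset, counted in 8-byte units
    return dont_fragment, more_fragments, offset_bytes


# An unfragmented packet with the "don't fragment" bit set.
print(decode_fragment_field(0x4000))  # (True, False, 0)
# A middle fragment: more fragments follow, offset 185 * 8 = 1480 bytes.
print(decode_fragment_field(0x20B9))  # (False, True, 1480)
```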
Time to Live: The TTL field defines how long the packet may live, or rather how many
"hops" it may take over the Internet. After processing the datagram, each router
decrements this number by one. If this value, after being decremented, is zero, the router
discards the datagram.
Protocol: This field indicates the higher-level protocol carried in the datagram. This can
be TCP, UDP or ICMP.
Option: If the Header Length is greater than five, it means that the Options field is present
and must be considered. The options field contains different optional settings such as
Internet timestamp, record route or source route options.
Padding: This field is used to make the header end at a 32-bit boundary. The field
must always be set to zeroes straight through to the end.
The Internet Control Message Protocol (ICMP) gives important information about the
health of the network.
Types of Messages
The error-reporting messages report problems that a router or a host (destination) may
encounter when it processes an IP packet. Five types of errors are handled: destination
unreachable, source quench, time exceeded, parameter problems, and redirection. The
query messages, which occur in pairs, help a host or a network manager get specific
information from a router or another host. For example, nodes can discover their
neighbors. Also, hosts can discover and learn about routers on their network, and routers
can help a node redirect its messages. The four types of query messages are echo request and
reply, timestamp request and reply, address-mask request and reply, and router solicitation
and advertisement.
ICMP Header
| Type (8 bits) | Code (8 bits) | Checksum (16 bits) |
Type: The type field contains the ICMP type of the packet. Each kind of ICMP message has
its own type value.
Code: All ICMP types can contain different codes as well. Some types only have a single
code, while others have several codes that they can use.
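As an illustration of the header layout above, the following Python sketch reads the type, code and checksum fields and verifies the checksum. It is a sketch under stated assumptions: the function names are hypothetical, the input is a raw ICMP message as bytes, and the checksum is assumed to be the standard 16-bit one's-complement Internet checksum over the whole ICMP message.

```python
import struct


def internet_checksum(data):
    """16-bit one's-complement checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_icmp_header(message):
    """Return the ICMP type, code and checksum, plus a validity flag."""
    icmp_type, code, checksum = struct.unpack("!BBH", message[:4])
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        # A message that includes a correct checksum sums to zero.
        "checksum_ok": internet_checksum(message) == 0,
    }


# Build an echo request (type 8, code 0) with a valid checksum and parse it.
body = struct.pack("!BBHHH", 8, 0, 0, 1, 1) + b"ping"
message = struct.pack("!BBHHH", 8, 0, internet_checksum(body), 1, 1) + b"ping"
print(parse_icmp_header(message))
```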
If UDP is so powerless, why would a process want to use it? With the disadvantages come
some advantages. UDP is a very simple protocol using a minimum of overhead. If a
process wants to send a small message and does not care much about reliability, it can use
UDP.
UDP Header
The UDP header can be seen as a very basic and simplified TCP header. It contains the
source port, destination port, total length and checksum fields, as shown below.
| Source Port (16 bits) | Destination Port (16 bits) |
| Total Length (16 bits) | Checksum (16 bits) |
Source Port: This field indicates the port number used by the process running on the
source host. It is 16-bits long. The port number can range from 0 to 65,535.
Destination Port: This field indicates the port number used by the process running on the
destination host. It is also 16-bits long.
Total Length: The length field specifies the length of the whole packet (header and data
portions).
Checksum: This field is used to detect errors over the entire user datagram (header plus
data).
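The four UDP header fields described above can be read with a few lines of Python. This is a minimal sketch, assuming the raw UDP datagram is available as bytes; the function and dictionary key names are hypothetical.

```python
import struct


def parse_udp_header(datagram):
    """Unpack the four 16-bit UDP header fields and slice out the payload."""
    source_port, destination_port, total_length, checksum = struct.unpack(
        "!HHHH", datagram[:8]
    )
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "total_length": total_length,          # header plus data, in bytes
        "checksum": checksum,
        "payload": datagram[8:total_length],   # data portion after the 8-byte header
    }


# A 12-byte datagram: 8-byte header followed by 4 bytes of data.
example = struct.pack("!HHHH", 5353, 53, 12, 0) + b"abcd"
print(parse_udp_header(example))
```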
TCP Header (each row is 32 bits wide)
| Source Port Address (16 bits) | Destination Port Address (16 bits) |
| Sequence Number (32 bits) |
| Acknowledge Number (32 bits) |
| HLEN (4 bits) | Reserved (6 bits) | URG | ACK | PSH | RST | SYN | FIN | Window Size (16 bits) |
| Checksum (16 bits) | Urgent Pointer (16 bits) |
| Options and Padding |
Source Port: This field indicates the source port of the packet. The source port is directly
bound to the process on the sending system.
Destination Port: This field indicates the destination port of the TCP packet. Just as with
the source port, this port is directly bound to the process on the receiving system.
Sequence Number: This field is used to set a number on each TCP packet so that the TCP
stream can be properly sequenced. The Sequence number is then returned in the ACK field
to acknowledge that the packet was properly received.
Header Length: This four bits field indicates the number of four byte words in the TCP
header. The length of the header can be between 20 and 60 bytes. Therefore, the value of
this field can be between five (5 x 4 = 20) and 15 (15 x 4 = 60).
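Flag    Description
URG     The urgent pointer field is valid
ACK     The acknowledgment number field is valid
PSH     Push the data to the receiving process
RST     Reset the connection
SYN     Synchronize sequence numbers during connection setup
FIN     Terminate the connection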
Window: This field is used by the receiving host to tell the sender how much data the
receiver permits at the moment. This is done by sending an ACK back, which contains
the Sequence number that we want to acknowledge, and the Window field then contains
the maximum accepted sequence numbers that the sending host can use before it receives
the next ACK packet. The next ACK packet will update the accepted Window which the
sender may use.
Checksum: This field contains a checksum computed over the whole TCP segment (header
and data). The checksum also covers a 96-bit pseudo header containing the source address,
destination address, protocol, and TCP length, which helps detect misdelivered segments.
Urgent Pointer: This field contains a pointer that points to the end of the data which is
considered urgent. If the connection has important data that should be processed as soon as
possible by the receiving end, the sender can set the URG flag and set the Urgent pointer to
indicate where the urgent data ends.
Option: The Option field is a variable length field and contains optional headers that we
may want to use.
Padding: This padding field pads the TCP header until the whole header ends at a 32-bit
boundary. This ensures that the data part of the packet begins on a 32-bit boundary, and no
data is lost in the packet. The padding always consists of only zeros.
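The fixed 20-byte part of the TCP header described above can be decoded as in the following Python sketch. It is an illustration only, assuming a raw TCP segment as bytes; the function name and the flag table constant are hypothetical.

```python
import struct

# Bit masks of the six control flags in the low bits of the 16-bit HLEN/flags word.
TCP_FLAG_BITS = (
    ("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
    ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01),
)


def parse_tcp_header(segment):
    """Decode the fixed part of a TCP header into a dictionary."""
    (source_port, destination_port, sequence, acknowledgment,
     hlen_and_flags, window, checksum, urgent_pointer) = struct.unpack(
        "!HHIIHHHH", segment[:20]
    )
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "sequence": sequence,
        "acknowledgment": acknowledgment,
        "header_length": (hlen_and_flags >> 12) * 4,  # HLEN counts four-byte words
        "flags": {name for name, bit in TCP_FLAG_BITS if hlen_and_flags & bit},
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent_pointer,
    }


# A SYN+ACK segment with a 20-byte header (HLEN = 5).
example = struct.pack("!HHIIHHHH", 1234, 80, 100, 0, (5 << 12) | 0x12, 8192, 0, 0)
print(parse_tcp_header(example))
```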
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a
particular feature of a class is unrelated to the presence (or absence) of any other feature.
Depending on the precise nature of the probability model, naive Bayes classifiers can be
trained very efficiently in a supervised learning setting. In spite of their naive design and
apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in
many complex real-world situations.
An advantage of the naive Bayes classifier is that it requires a small amount of training
data to estimate the parameters (means and variances of the variables) necessary for
classification. Because independent variables are assumed, only the variances of the
variables for each class need to be determined and not the entire covariance matrix. The
Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales
linearly with the number of predictors and rows. The build process for Naive Bayes is
parallelized. Naive Bayes can be used for both binary and multiclass classification
problems.
The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a
formula that calculates a probability by counting the frequency of values and combinations
of values in the historical data.
Bayes' Theorem
Bayes' Theorem finds the probability of an event occurring given the probability of another
event that has already occurred. If B represents the dependent event and A represents the
prior event, Bayes' theorem can be stated as follows.
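$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \qquad (2.1)$$
$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m} \qquad (2.2)$$
where:
n = the number of training examples for which v = v_j
n_c = the number of examples for which v = v_j and a = a_i
p = the a priori estimate for P(a_i | v_j)
m = the equivalent sample size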
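The following Python sketch shows how a naive Bayes classifier over discrete features can use the estimate (n_c + m p) / (n + m) defined by the quantities above. It is a minimal illustration, not the jSMILE-based implementation used in this project: the class and method names are hypothetical, and p is assumed to be a uniform prior over the values seen for each feature.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over discrete features with an m-estimate of P(a_i | v_j)."""

    def __init__(self, m=1.0):
        self.m = m                                # equivalent sample size
        self.class_counts = Counter()             # n for each class v_j
        self.value_counts = defaultdict(Counter)  # (feature, class) -> value counts
        self.feature_values = defaultdict(set)    # values seen for each feature

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.feature_values[i].add(value)
        return self

    def conditional(self, i, value, label):
        n = self.class_counts[label]
        n_c = self.value_counts[(i, label)][value]
        p = 1.0 / max(len(self.feature_values[i]), 1)  # assumed uniform prior
        return (n_c + self.m * p) / (n + self.m)

    def posterior(self, row):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = count / total                 # class prior P(v_j)
            for i, value in enumerate(row):
                score *= self.conditional(i, value, label)
            scores[label] = score
        norm = sum(scores.values())
        return {label: score / norm for label, score in scores.items()}


# Toy data: (protocol, flag) pairs labeled as attack or normal.
rows = [("tcp", "syn"), ("tcp", "syn"), ("tcp", "ack"), ("udp", "none")]
labels = ["attack", "attack", "normal", "normal"]
model = NaiveBayes(m=1.0).fit(rows, labels)
print(model.posterior(("tcp", "syn")))
```

The m-estimate keeps the probability of an attribute value that was never seen with a class above zero, so a single unseen value does not force the whole product to zero.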
2.3.1. DoS
One common method of attack involves saturating the target (victim) machine with
external communications requests, such that it cannot respond to legitimate traffic, or
responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks
are implemented by either forcing the targeted computer(s) to reset, or consuming its
resources so that it can no longer provide its intended service or obstructing the
communication media between the intended users and the victim so that they can no longer
communicate adequately.
Denial-of-service attacks are considered violations of the IAB's Internet proper use policy,
and also violate the acceptable use policies of virtually all Internet Service Providers. They
also commonly constitute violations of the laws of individual nations.
There are many varieties of denial of service (or DoS) attacks. Some DoS attacks (like a
mailbomb, neptune, or smurf attack) abuse a perfectly legitimate feature. Others (teardrop,
Ping of Death) create malformed packets that confuse the TCP/IP stack of the machine that
is trying to reconstruct the packet. Still others (apache2, back, syslogd) take advantage of
bugs in a particular network daemon.
a) Smurf
b) Teardrop
c) Neptune
d) Pod
e) Land
f) Nuke
Smurf
The smurf attack is a way of generating significant computer network traffic on a victim
network. This is a type of denial-of-service attack that floods a target system via spoofed
broadcast ping messages.
In the "smurf" attack, attackers use ICMP echo request packets directed to IP broadcast
addresses from remote locations to create a denial-of-service attack. There are three parties
in these attacks: the attacker, the intermediary, and the victim (note that the intermediary
can also be a victim). The attacker sends ICMP echo request packets to the broadcast
address (xxx.xxx.xxx.255) of many subnets with the source address spoofed to be that of
the intended victim. Any machines that are listening on these subnets will respond by
sending ICMP echo reply packets to the victim. The smurf attack is effective because the
attacker is able to use broadcast addresses to amplify what would otherwise be a rather
innocuous ping flood. In the best case (from an attacker's point of view), the attacker can
flood a victim with a volume of packets 255 times as great in magnitude as the attacker
would be able to achieve without such amplification. This amplification effect is illustrated
by Figure 2.6. The attacking machine sends a single spoofed packet to the broadcast
address of some network, and every machine that is located on that network responds by
sending a packet to the victim machine. Because there can be as many as 255 machines on
an Ethernet segment, the attacker can use this amplification to generate a flood of ping
packets 255 times as great in size as would otherwise be possible. This figure is a
simplification of the smurf attack. In an actual attack, the attacker sends a stream of ICMP
echo requests to the broadcast addresses of many subnets, resulting in a large,
continuous stream of echo replies that flood the victim.
[Figure 2.6: Smurf attack. The attacker sends one echo request across the Internet to a
broadcast address; hundreds of echo replies then flood the victim.]
Teardrop
A teardrop attack is a denial of service attack. The teardrop attack uses IP to create packet
reassembly problems so the target computer crashes. The teardrop attack uses erroneous
packet header information indicating overlapping fragments of packets so some data in
some packets must overwrite data in other packets to re-assemble the packet. Attempts to
re-assemble these packets with overlapping data can cause the computer to crash if the
software is not prepared to handle erroneous packet header information.
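A receiver or detector can recognize the overlapping fragments described above by comparing where each fragment starts with where the previous data ended. The sketch below is a minimal illustration in Python; the function name is hypothetical, and each fragment is assumed to be given as an (offset in bytes, payload length) pair belonging to the same datagram.

```python
def has_overlapping_fragments(fragments):
    """Return True if any fragment starts before the previously covered data ends."""
    covered_end = 0
    for offset, length in sorted(fragments):
        if offset < covered_end:       # this fragment would overwrite earlier data
            return True
        covered_end = max(covered_end, offset + length)
    return False


# Two well-formed fragments that follow each other.
print(has_overlapping_fragments([(0, 1480), (1480, 1480)]))  # False
# A second fragment that starts inside the first one.
print(has_overlapping_fragments([(0, 36), (24, 4)]))         # True
```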
Neptune
Neptune (SYN Flood) is a denial of service attack to which every TCP/IP implementation
is vulnerable (to some degree). To detect a Neptune attack, network traffic is
monitored for a large number of simultaneous SYN packets destined for a particular machine.
The source address of these packets is usually spoofed and unreachable.
Each half-open TCP connection made to a machine causes the tcpd server to add a
record to the data structure that stores information describing all pending connections. This
data structure is of finite size, and it can be made to overflow by intentionally creating too
many partially-open connections. The half-open connections data structure on the victim
server system will eventually fill and the system will be unable to accept any new
incoming connections until the table is emptied out. Normally there is a timeout associated
with a pending connection, so the half-open connections will eventually expire and the
victim server system will recover. However, the attacking system can simply continue
sending IP-spoofed packets requesting new connections faster than the victim system can
expire the pending connections. In some cases, the system may exhaust memory, crash, or
be rendered otherwise inoperative.
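The monitoring idea described for Neptune, counting SYN packets destined for one machine, can be sketched with a sliding time window. This Python sketch is an assumption-laden illustration and not the detection logic of INIDS: the class name, the threshold and the window length are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque


class SynFloodDetector:
    """Flag a destination that receives too many connection-opening SYNs in a window."""

    def __init__(self, threshold=100, window_seconds=1.0):
        self.threshold = threshold          # hypothetical alert threshold
        self.window = window_seconds        # hypothetical window length
        self.syn_times = defaultdict(deque)

    def observe(self, timestamp, destination_ip, syn, ack):
        """Record one TCP packet; return True if the destination looks flooded."""
        if not syn or ack:
            return False                    # only bare SYNs open new connections
        times = self.syn_times[destination_ip]
        times.append(timestamp)
        while times and timestamp - times[0] > self.window:
            times.popleft()                 # forget SYNs older than the window
        return len(times) > self.threshold


detector = SynFloodDetector(threshold=3, window_seconds=1.0)
for t in (0.0, 0.1, 0.2, 0.3):
    print(t, detector.observe(t, "10.0.0.5", syn=True, ack=False))
```

In this example the fourth SYN within one second exceeds the threshold of three and is reported.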
POD
Traditionally, this bug has been relatively easy to exploit. Generally, sending a 65,536 byte
ping packet is illegal according to networking protocol, but a packet of such a size can be
sent if it is fragmented; when the target computer reassembles the packet, a buffer overflow
can occur, which often causes a system crash.
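A fragment that would make the reassembled packet exceed the maximum IP datagram size of 65,535 bytes can be spotted from its header fields alone. The following Python sketch illustrates the check; the function name is hypothetical, and the inputs are assumed to be the fragment offset (in 8-byte units), the total length of the fragment and its header length in bytes.

```python
MAX_IP_DATAGRAM = 65535  # largest value of the 16-bit Total Length field


def is_oversized_fragment(fragment_offset_units, total_length, header_length=20):
    """Return True if reassembly up to this fragment would exceed 65,535 bytes."""
    payload = total_length - header_length
    reassembled_size = header_length + fragment_offset_units * 8 + payload
    return reassembled_size > MAX_IP_DATAGRAM


# A normal second fragment of a 3000-byte packet.
print(is_oversized_fragment(185, 1500))   # False
# A last fragment placed near the end of the offset range, as in a ping of death.
print(is_oversized_fragment(8189, 1500))  # True
```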
This exploit has affected a wide variety of systems, including Unix, Linux, Mac, Windows,
printers, and routers. However, most systems since 1997-1998 have been fixed, so this bug
is mostly historical.
In recent years, a different kind of ping attack has become widespread: ping flooding
simply floods the victim with so much ping traffic that normal traffic fails to reach the
system (a basic denial-of-service attack).
Land
The Land attack occurs when an attacker sends a spoofed SYN packet in which the source
address is the same as the destination address. A Land attack works because it causes the
machine to reply to itself continuously. Directed against vulnerable systems, this attack
causes them to lock up or become unstable.
Nuke
Nuke is an old DoS attack against computer networks consisting of fragmented or otherwise
invalid ICMP packets sent to the target. It is carried out by using a modified ping utility
to repeatedly send the corrupt data, thus slowing down the affected computer until it comes
to a complete stop.
2.3.2. Probe
Nmap
Nmap can be used for black hat hacking, or attempting to gain unauthorized access to
computer systems. It would typically be used to discover open ports which are likely to be
running vulnerable services, in preparation for attacking those services with another
program.
System administrators often use Nmap to search for unauthorized servers on their network,
or for computers which don't meet the organization's minimum level of security.
Satan
Scans in light and normal mode simply check for smaller subsets of these vulnerabilities.
Ipsweep
Portsweep
Port Sweep is a network testing tool that lets an attacker learn a lot about a network
and its functionality. It combines several applications to obtain results more efficiently
and more easily. An attacker can gather information about a computer and other computers
that are connected to the Internet. The tool can be used to find information (location,
network type) about a particular target (IP, server, email). An attacker can sweep a
network to see whether there are any open ports waiting to be hacked, what data is sent,
and so on.
2.4. jNetPcap
jNetPcap is a Java wrapper around the libpcap and WinPcap native libraries found on various
Unix and Windows platforms. jNetPcap exposes their functionality as a Java programming
interface (API), which helps in capturing packets on the network.
The main classes which implement libpcap and WinPcap functionality are:
org.jnetpcap.Pcap class: core libpcap methods available on all platforms
org.jnetpcap.winpcap.WinPcap class: extensions based on the WinPcap library, typically
only available on Windows-based systems
2.5. jSMILE
3. SYSTEM DESIGN
Our aim is to design and develop an Intelligent Network Intrusion Detection System
(INIDS) that is accurate, low in false alarms, not easily fooled by small variations
in patterns, adaptive, and capable of real-time detection.
Attributes Used
For our INIDS, we have extracted 18 features from tcpdump files which can identify
packet characteristics. The features are:
protocol type,
ip length,
don't fragment flag(df),
more fragment flag(mf),
fragmentation offset,
syn flood,
urgent pointer,
tcp flags(urg, ack, psh, rst, syn, fin),
tcp window size,
udp checksum,
icmp flood,
icmp checksum, and
type (packet is normal or attack)
[Figure: System block diagram. Components shown: Network, Sniffer, Captured File,
Training DataSet, Knowledge Based Engine, Trained System and Detector, with the Detector
output labeled Normal or Attack.]
The 'Context Diagram' or level-0 DFD is an overall, simplified view of the target
system, which contains only one process box and the primary inputs and outputs.
The level-1 DFD shows all processes at the first level of numbering, data stores, external
entities and the data flows between them. The purpose of this level is to show the major
high-level processes of the system and their interrelation.
UML is now the most widely used graphical representation scheme for modeling
object-oriented systems. An attractive feature of the UML is its flexibility. The UML is
extensible and is independent of any particular OOAD process. We have created a use case
diagram to model the interactions of network administrators and crackers with their use
cases.
[Figure: Use case diagram of INIDS. Actors: Network Admin and Cracker. Use cases: Train
Dataset, Test Dataset, Add to Dataset, Run System and Attack System.]
4. METHODOLOGY
To develop our system, we have adopted the traditional waterfall model. The waterfall
model is a sequential software development process, in which progress is seen as flowing
steadily downwards like a waterfall through the phases of conception, analysis, design,
construction, testing and maintenance. To follow the waterfall model, one proceeds from
one phase to the next in a sequential manner. For example, when the requirements are fully
completed, one proceeds to design. When the design is fully completed, an implementation
of that design is made by coders. Towards the later stages of this implementation phase,
separate software components produced are combined to introduce new functionality and
reduced risk through the removal of errors. Thus the waterfall model maintains that one
should move to a phase only when its preceding phase is completed and perfected.
For the training and testing of our INIDS, we have used the 1998 DARPA dataset
provided by MIT Lincoln Laboratory. It is a widely used dataset for training and testing
intrusion detection systems. It provides around 4 gigabytes of compressed tcpdump data
covering 7 weeks of network traffic. Each week has five days, and each day has its own
tcpdump data. It also provides a tcpdump list file, which labels every flow as attack or
not. Every entry consists of the flow identifier number, the date, the time when the first
packet of the flow arrived, the duration, service name, source port number, destination port
number, source IP address, destination IP address, attack score, and the name of the attack.
With this file, we are able to recognize which flow is an attack and to extract the
corresponding data from the tcpdump data.
The first and second weeks of training data consist of normal traffic, and the other weeks
consist of a mixed dataset, i.e. normal traffic and attack traffic. For the purpose of
training our intrusion detection system, we have extracted normal traffic from the outside
tcpdump of Wednesday and Thursday of the second week. Similarly, we have extracted attack
traffic from the other weeks' traffic. We have used the editcap tool to split the huge
tcpdump files and Wireshark to filter the desired packets.
For our INIDS, we extracted 18 features from the tcpdump files which can identify packet
characteristics. The features have to be preprocessed to be suitable for the naive Bayes
algorithm, because the naive Bayes classifier we use operates on discrete values and cannot
handle continuous values directly. So, while making the dataset, the continuous features
are discretized. This dataset is then used to train the naive Bayes classifier. During
inference, we again extract all the features for each packet and feed them to the naive
Bayes classifier, which calculates the probability that the packet is normal. Based on a
threshold on this probability, the packet is classified as normal or attack.
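The preprocessing and classification steps above can be sketched in plain Java. This is an illustrative sketch only, not the project's implementation (which uses the JSmile API): it assumes equal-width binning for discretization, Laplace (add-one) smoothing for the probability estimates, and two hypothetical features instead of the 18 used in the project. The class name DiscreteNaiveBayes and all values in main are assumptions.

```java
/** Minimal two-class naive Bayes over discretized features (0 = normal, 1 = attack). */
final class DiscreteNaiveBayes {
    private final int[] binsPerFeature;
    private final double[][][] counts;                   // [class][feature][bin]
    private final double[] classCounts = new double[2];

    DiscreteNaiveBayes(int[] binsPerFeature) {
        this.binsPerFeature = binsPerFeature.clone();
        counts = new double[2][binsPerFeature.length][];
        for (int c = 0; c < 2; c++) {
            for (int f = 0; f < binsPerFeature.length; f++) {
                counts[c][f] = new double[binsPerFeature[f]];
            }
        }
    }

    /** Equal-width discretization: maps a value in [min, max] to a bin index 0..bins-1. */
    static int discretize(double value, double min, double max, int bins) {
        if (value <= min) return 0;
        if (value >= max) return bins - 1;
        return (int) ((value - min) / (max - min) * bins);
    }

    /** Adds one labelled, already discretized sample to the counts. */
    void train(int[] x, int label) {
        classCounts[label]++;
        for (int f = 0; f < x.length; f++) {
            counts[label][f][x[f]]++;
        }
    }

    /** Posterior probability that the sample is normal, with add-one smoothing. */
    double probNormal(int[] x) {
        double total = classCounts[0] + classCounts[1];
        double[] logP = new double[2];
        for (int c = 0; c < 2; c++) {
            logP[c] = Math.log((classCounts[c] + 1) / (total + 2));
            for (int f = 0; f < x.length; f++) {
                logP[c] += Math.log((counts[c][f][x[f]] + 1)
                        / (classCounts[c] + binsPerFeature[f]));
            }
        }
        // Normalise in log space to avoid underflow when many features are used.
        double m = Math.max(logP[0], logP[1]);
        double p0 = Math.exp(logP[0] - m);
        double p1 = Math.exp(logP[1] - m);
        return p0 / (p0 + p1);
    }

    /** Classifies as normal when the posterior reaches the given threshold. */
    boolean isNormal(int[] x, double threshold) {
        return probNormal(x) >= threshold;
    }

    public static void main(String[] args) {
        // Two hypothetical features, each discretized into 4 bins.
        DiscreteNaiveBayes nb = new DiscreteNaiveBayes(new int[] {4, 4});
        int[][] normal = {{0, 0}, {0, 1}, {1, 0}};
        int[][] attack = {{3, 3}, {3, 2}, {2, 3}};
        for (int[] x : normal) nb.train(x, 0);
        for (int[] x : attack) nb.train(x, 1);

        // Discretize a continuous value (a hypothetical port number) before classifying.
        int portBin = discretize(40000, 0, 65535, 4);
        int[] packet = {3, portBin};
        System.out.println("P(normal) = " + nb.probNormal(packet));
        System.out.println("normal?   = " + nb.isNormal(packet, 0.5));
    }
}
```

The threshold passed to isNormal plays the role of the threshold mentioned above: raising it makes the classifier flag more packets as attacks, at the cost of more false positives.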
5. IMPLEMENTATION
In this technique, various objects that occur in the problem domain and the solution
domain are first identified and different kinds of relationships that exist among these
objects are identified. This object structure is further refined to obtain the detailed design.
This approach has several advantages, such as less development effort and time, and better
maintainability.
Tools Used:
Netbeans 6.5 IDE
API Used:
JSmile API
JNetPcap
Language Used:
Java
6. TESTING
While implementing our system, we went through various levels of testing, which are as
follows:
a) Unit Testing: The purpose of unit testing is to determine the correct working of the
individual modules.
b) Integration Testing: During this phase the different modules are integrated in a
planned manner. The different modules making up a system are never integrated in a single
shot. Integration is normally carried out through a number of steps. During each integration
step, the partially integrated system is tested.
c) System Testing: Finally when all the modules have been successfully integrated and
tested, system testing is carried out.
Two of the most prevalent strategies that we performed are black-box testing and white-box testing.
a) Black-Box testing: Demonstrates that software functions are operational, that the input
is properly accepted and that the output is correctly produced.
b) White-Box testing: Examines the fundamental aspect of the system with complete
information and access to the internal logical structure, code and algorithms.
Many features are still to be added to our project, and several limitations remain to be
corrected. Before releasing the final version of the software, alpha testing, beta
testing and acceptance testing can additionally be done.
7. RESULT
7.1. Screenshots
Our INIDS can be compared with an existing IDS such as Snort, which is widely regarded as
a standard intrusion detection system. Snort is signature-based, whereas our system is
machine learning-based. For known attacks, Snort is better, whereas for unknown attacks,
our system is better. Snort is configured from the command line, whereas our system
provides a GUI for configuration. As a result, our system is easier to use.
[Figure: charts comparing INIDS and Snort on a Low/High scale]
8.1. Conclusions
We completed a project on the detection of network intrusions based on the Naive Bayes
algorithm. Because it uses learning techniques, the completed system can detect novel
attacks that are not detected by the existing system, Snort. Although Snort provides high
accuracy, it is more time consuming because it requires regular updates. Our system can
detect intrusions more efficiently and is less time consuming.
By completing this project we learned to work as a team, to divide tasks and to cooperate
on them. The successful work not only made us feel proud but also made us good
companions. In this way we completed our project successfully.
Our system works only for IPv4 networks. In future, it can be extended to IPv6 networks.
We have analyzed only packet headers, so our system cannot detect exploit-based
intrusions. Payload analysis features could be added to our system in future.
A naive Bayesian network is a restricted network that has only two layers and assumes
complete independence between the information nodes. This poses a limitation to this
research work. To alleviate this problem and reduce the false positives, active platform
or event-based classification using a Bayesian network may be considered. We continue our
work in this direction in order to build an efficient intrusion detection model.
APPENDIX A: RFCs
Protocol                RFC
ARP, RARP               826, 903, 925, 1027, 1293, 1329, 1433, 1868, 1931, 2390
BGP                     1092, 1105, 1163, 1265, 1266, 1267, 1364, 1392, 1403, 1565,
                        1654, 1655, 1665, 1771, 1772, 1745, 1774, 2283
CIDR
DHCP                    951, 1048, 1084, 1395, 1497, 1531, 1532, 1533, 1534, 1541
DNS                     799, 811, 819, 830, 881, 882, 883, 897, 920, 921, 1034, 1035,
                        1386, 1480, 1535, 1536, 1537, 1591, 1637, 1664, 1706, 1712,
                        1713, 1982, 2065, 2137, 2317, 2535, 2671
FTP                     114, 133, 141, 163, 171, 172, 238, 242, 250, 256, 264, 269, 281,
                        291, 354, 385, 412, 414, 418, 430, 438, 448, 463, 468, 478, 486,
                        505, 506, 542, 553, 624, 630, 640, 691, 765, 913, 959, 1635, 1785,
                        2228, 2577
HTML                    1866
HTTP                    2068, 2109
ICMP
IGMP                    988, 1054, 1112, 1301, 1458, 1469, 1768, 2236, 2357, 2365, 2502,
                        2588
IMAP
IP                      760, 781, 791, 815, 1025, 1063, 1071, 1141, 1190, 1191, 1624,
                        2113
IPv6                    1365, 1550, 1678, 1680, 1682, 1683, 1686, 1688, 1726, 1752,
                        1826, 1883, 1884, 1886, 1887, 1955, 2080, 2373, 2452, 2463
MIB
MIME
Multicast Routing
NAT
OSPF                    1131, 1245, 1246, 1247, 1370, 1583, 1584, 1585, 1586, 1587,
                        2178, 2328, 2329, 2370
POP
RARP
RIP                     1131, 1245, 1246, 1247, 1370, 1583, 1584, 1585, 1586, 1587,
                        1722, 1723, 2082, 2453
SCTP                    2960, 3257, 3284, 3285, 3286, 3309, 3436, 3554, 3708, 3758
SMI
SMTP, MIME, POP, IMAP   196, 221, 224, 278, 524, 539, 753, 772, 780, 806, 821, 934, 974,
                        1047, 1081, 1082, 1225, 1460, 1496, 1426, 1427, 1652, 1653,
                        1711, 1725, 1734, 1740, 1741, 1767, 1869, 1870, 2045, 2046,
                        2047, 2048, 2177, 2180, 2192, 2193, 2221, 2342, 2359, 2449
TCP                     675, 700, 721, 761, 793, 879, 896, 1078, 1106, 1110, 1144, 1145,
                        1146, 1263, 1323, 1337, 1379, 1644, 1693, 1901, 1905, 2001
TELNET                  137, 340, 393, 426, 435, 452, 466, 495, 513, 529, 562, 595, 596,
                        599, 669, 679, 701, 702, 703, 728, 764, 782, 818, 854, 855, 1184,
                        1205, 2355
TFTP
UDP                     768
VPN                     2547, 2637, 2685
WWW
Port Number   UDP/TCP    Protocol
7             TCP        ECHO
13            UDP/TCP    DAYTIME
19
20            TCP        FTP-DATA
21            TCP        FTP-CONTROL
23            TCP        TELNET
25            TCP        SMTP
37            UDP/TCP    TIME
67            UDP        BOOTP-SERVER
68            UDP        BOOTP-CLIENT
69            UDP        TFTP
70            TCP        GOPHER
79            TCP        FINGER
80            TCP        HTTP
109           TCP        POP-2
110           TCP        POP-3
111           UDP/TCP    RPC
161           UDP        SNMP
162           UDP        SNMP-TRAP
179           TCP        BGP
520           UDP        RIP
Type                                     Code Description
0 - Echo Reply
1 and 2                                  Reserved
3 - Destination Unreachable
4 - Source Quench
5 - Redirect Message
8 - Echo Request                         Echo request
9 - Router Advertisement                 Router Advertisement
10 - Router Solicitation                 Router discovery/selection/solicitation
11 - Time Exceeded
12 - Parameter Problem: Bad IP header    Bad length
13 - Timestamp                           Timestamp
14 - Timestamp Reply                     Timestamp reply
15 - Information Request                 Information Request
16 - Information Reply                   Information Reply
19
20 through 29
30 - Traceroute                          Information Request
31
32
33
34
35
36
37
38
39
40
41
42 through 255                           Reserved
APPENDIX D: CD Contents
a) Source Codes
b) Readme