
TRIBHUVAN UNIVERSITY
INSTITUTE OF ENGINEERING
PULCHOWK CAMPUS
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING
A
FINAL YEAR PROJECT REPORT
ON
INTELLIGENT NETWORK INTRUSION DETECTION SYSTEM
By:
PUNEET KHANAL (062BCT527)

RAJIV SHRESTHA (062BCT529)
RAJU KC (062BCT530)
LALITPUR, NEPAL
MARCH, 2010
A PROJECT SUBMITTED TO THE DEPARTMENT OF ELECTRONICS AND COMPUTER
ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE
BACHELOR'S DEGREE IN ELECTRONICS & COMMUNICATION / COMPUTER
ENGINEERING
DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

LALITPUR, NEPAL
March, 2010
LETTER OF APPROVAL
The undersigned certify that they have read, and recommend to the Institute of
Engineering for acceptance, a project report entitled "Intelligent Network Intrusion
Detection System" submitted by Puneet Khanal, Rajiv Shrestha and Raju KC in partial
fulfillment of the requirements for the degree Bachelor of Computer Engineering.
______________________________
Project Supervisor
Babu Ram Dawadi
Assistant Professor
Department of Electronics and Computer Engineering

______________________________
Project Supervisor
Manoj Ghimire
Lecturer
Department of Electronics and Computer Engineering

______________________________
Internal Examiner
Purushottam Sigdel
Director
Center for Information Technology

______________________________
External Examiner
Krishna Prasad Bhandari
Senior Engineer
Nepal Telecom

________________________________
Project Coordinator and Deputy Head

Surendra Shrestha, Ph.D.
Department of Electronics and Computer Engineering
Institute of Engineering
DATE OF APPROVAL: 17th March, 2010

COPYRIGHT
The author has agreed that the Library, Department of Electronics and Computer
Engineering, Pulchowk Campus, Institute of Engineering may make this report freely
available for inspection. Moreover, the author has agreed that permission for extensive
copying of this project report for scholarly purpose may be granted by the supervisors who
supervised the project work recorded herein or, in their absence, by the Head of the
Department wherein the project report was done. It is understood that the recognition will
be given to the author of this report and to the Department of Electronics and Computer
Engineering, Pulchowk Campus, Institute of Engineering in any use of the material of this
project report. Copying, publication or other use of this report for financial gain
without the approval of the Department of Electronics and Computer Engineering,
Pulchowk Campus, Institute of Engineering and the authors' written permission is prohibited.
Request for permission to copy or to make any other use of the material in this report in
whole or in part should be addressed to:
Head
Department of Electronics and Computer Engineering
Pulchowk Campus, Institute of Engineering
Lalitpur, Kathmandu
Nepal
ACKNOWLEDGEMENT
We are sincerely thankful to the Department of Electronics and Computer Engineering for
providing the opportunity to do this project.
We are indebted to our supervisors Mr. Babu Ram Dawadi and Mr. Manoj Ghimire for their
valuable suggestions and constant guidance for the accomplishment of the project. Besides,
we are also thankful to the Project Coordinator Mr. Surendra Shrestha for assisting and
guiding us in the project.
Last but not least, we are thankful to our friends as well as teachers who
supported us all the way in the course of the project.
Puneet Khanal (062BCT527)

Rajiv Shrestha (062BCT529)
Raju KC (062BCT530)
ABSTRACT
Network Intrusion Detection Systems (NIDS) aim at preventing network attacks and
unauthorized remote use of computers. Depending on the kind of attack it
targets, an NIDS can be oriented to detect misuse (by defining all possible attacks) or
anomalies (by modeling legitimate behavior and detecting traffic that does not fit that
model). Because each approach is restricted to its own kind of knowledge, misuse
detection fails to notice anomalies and vice versa. To address this, we present the Intelligent
Network Intrusion Detection System (INIDS), a misuse and anomaly detection system
based on a Naive Bayes classifier trained on KDDCup99 dataset traffic, which analyzes
complete network packets, together with a strategy for creating a consistent knowledge
model that integrates misuse-based and anomaly-based knowledge.
Finally, we evaluate the system against well-known and new attacks and show how it
outperforms a well-established industrial NIDS.
Keywords: Network Attacks, Misuse Detection, Anomaly Detection, Network Packets,
Naive Bayes Classifier
TABLE OF CONTENTS
PAGE OF APPROVAL
COPYRIGHT
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS AND ABBREVIATIONS
1 INTRODUCTION
1.1 What is an IDS?
1.2 What is not an IDS?
1.3 Attack Types
1.4 Existing System
1.5 Problem Statement
1.6 Objectives
1.7 Scope of the Project
2 LITERATURE REVIEW
2.1 The TCP/IP Reference Model
2.1.1 Internet Protocol (IP)
2.1.2 Internet Control Message Protocol (ICMP)
2.1.3 User Datagram Protocol (UDP)
2.1.4 Transmission Control Protocol (TCP)
2.2 Naive Bayes Classifier
2.3 Some Well-Known Attacks
2.3.1 DoS
2.3.2 Probe
2.4 jNetPcap
2.5 jSMILE
3 SYSTEM DESIGN
3.1 System Block Diagram
3.2 Data Flow Diagrams (DFDs)
3.3 Unified Modeling Language (UML)
4 METHODOLOGY
5 IMPLEMENTATION
5.1 Object-Oriented Design
6 TESTING
6.1 Level of Testing
6.2 Software Testing Strategies
7 RESULT
7.1 Screenshots
7.2 Comparison with Other Existing System
8 CONCLUSIONS AND FURTHER WORK
8.1 Conclusions
8.2 Further Work
REFERENCES
APPENDIX A: RFCs
APPENDIX B: UDP and TCP Ports
APPENDIX C: ICMP Messages
APPENDIX D: CD Contents
LIST OF FIGURES
Figure 2.1 TCP/IP Internet Model
Figure 2.2 IP Header Format
Figure 2.3 ICMP Header Format
Figure 2.4 UDP Header Format
Figure 2.5 TCP Header Format
Figure 2.6 Smurf attack
Figure 3.1 System Block Diagram
Figure 3.2 Level-0 DFD
Figure 3.5 Use Case Diagram
Figure 7.1 Naive Bayes Classifier
Figure 7.2 GUI Layout
Figure 7.3 Detection of normal packets only
Figure 7.4 Detection of anomalous packets only
Figure 7.5 Detection of both normal and anomalous packets
Figure 7.6 Accuracy of known attack
Figure 7.7 Accuracy of unknown attack
Figure 7.8 Ease of Use
LIST OF TABLES
Table 2.1 Types of Service
Table 2.2 Description of flags in the control field
Table A.1 RFCs for each protocol
Table B.1 List of UDP and TCP ports
Table C.1 List of permitted ICMP messages
LIST OF SYMBOLS AND ABBREVIATIONS
∏         Product
ACK       Acknowledgment
API       Application Programming Interface
DFDs      Data Flow Diagrams
DNS       Domain Name System
DoS       Denial-of-Service
DS        Dataset
DSCP      Differentiated Services Code Point
GUI       Graphical User Interface
HIDS      Host-based Intrusion Detection System
ICMP      Internet Control Message Protocol
IDS       Intrusion Detection System
INIDS     Intelligent Network Intrusion Detection System
IP        Internet Protocol
NIDS      Network Intrusion Detection System
OS        Operating System
TCP       Transmission Control Protocol
TCP/IP    Transmission Control Protocol / Internetworking Protocol
TOS       Type of Service
TTL       Time to Live
UDP       User Datagram Protocol
1. INTRODUCTION
Nowadays, as more people make use of the Internet, their computers and the valuable data in
their computer systems become a more interesting target for intruders. Attackers scan
the Internet constantly, searching for potential vulnerabilities in the machines that are
connected to the network. Intruders aim to gain control of a machine and to insert
malicious code into it. Later on, using these enslaved machines (also called zombies), an
intruder may initiate attacks such as worm attacks, Denial-of-Service (DoS) attacks and
probing attacks.
1.1. What is an IDS?
An intrusion is any set of actions that threatens the integrity, availability, or confidentiality of a
network resource. An intrusion detection system (IDS) monitors network traffic for
suspicious activity and alerts the system or network administrator. In some
cases the IDS may also respond to anomalous or malicious traffic by taking action such as
blocking the user or source IP address from accessing the network.
IDSs come in a variety of flavors and approach the goal of detecting suspicious traffic in
different ways. There are network-based (NIDS) and host-based (HIDS) intrusion detection
systems.
a) NIDS: Network Intrusion Detection Systems (NIDS) are a subset of security
management systems that are used to discover inappropriate, incorrect, or anomalous
activities within networks.
b) HIDS: A host-based intrusion detection system (HIDS) monitors and analyzes the
internals of a computing system rather than the network packets on its external interfaces.
Some IDS detect threats by looking for specific signatures of known threats, similar
to the way antivirus software typically detects and protects against malware, and other
IDS detect threats by comparing traffic patterns against a baseline and looking for
anomalies.
a) Signature Based: A signature based IDS will monitor packets on the network and
compare them against a database of signatures or attributes from known malicious threats.
This is similar to the way most antivirus software detects malware. The issue is that there
will be a lag between a new threat being discovered in the wild and the signature for
detecting that threat being applied to the IDS. During that lag time, the IDS would be
unable to detect the new threat. The limitation of this approach lies in its dependence on
frequent updates of the signature database and its inability to generalize and detect novel or
unknown intrusions.
b) Anomaly Based: An anomaly-based IDS monitors network traffic and
compares it against an established baseline. The baseline identifies what is normal for
that network (what sort of bandwidth is generally used, what protocols are used, what ports
and devices generally connect to each other), and the IDS alerts the administrator or user when
it detects traffic that is anomalous, or significantly different from the baseline.
However, statistical anomaly detection is not based on an adaptive intelligent model and
cannot learn from normal and malicious traffic patterns.
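As a rough illustration of comparing traffic against a baseline, the sketch below flags a measurement that lies far from the mean of previously observed values. The function names, the packets-per-second feature and the threshold of three standard deviations are assumptions chosen for illustration; they are not taken from this project.

```python
from statistics import mean, pstdev

def build_baseline(samples):
    """Summarize normal traffic as (mean, standard deviation)."""
    return mean(samples), pstdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Return True when value lies more than `threshold` standard
    deviations away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly constant baseline: any different value is anomalous.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical packets-per-second measurements taken during normal operation.
normal_pps = [100, 110, 95, 105, 90, 100]
baseline = build_baseline(normal_pps)

print(is_anomalous(102, baseline))
print(is_anomalous(1000, baseline))
```

A fixed statistical threshold like this one cannot adapt to new traffic patterns, which is the limitation noted above.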
There are IDS that simply monitor and alert and there are IDS that perform an action or
actions in response to a detected threat.
a) Passive IDS: A passive IDS simply detects and alerts. When suspicious or malicious
traffic is detected an alert is generated and sent to the administrator or user and it is up to
them to take action to block the activity or respond in some way.
b) Reactive IDS: Reactive IDS will not only detect suspicious or malicious traffic and
alert the administrator, but will take pre-defined proactive actions to respond to the threat.
Typically this means blocking any further network traffic from the source IP address or
user.
Intrusion detection systems help network administrators prepare for and deal with network
security attacks. These systems collect information from a variety of systems and network
sources, and analyze them for signs of intrusion and misuse. A variety of techniques have
been employed for analysis ranging from traditional statistical methods to new machine
learning approaches.
1.2. What is not an IDS?

Contrary to popular marketing belief and the terminology employed in the literature on
intrusion detection systems, not everything falls into this category. In particular, the
following security devices are not IDS:
a) Network logging systems, for example network traffic monitoring systems.
b) Anti-virus products designed to detect malicious software such as viruses, trojan
horses, worms and logic bombs.
c) Firewalls.
d) Security/cryptographic systems, for example VPN, SSL, S/MIME, Kerberos,
RADIUS etc.
1.3. Attack Types
Attacks can be classified into three types. They are as follows:
a) Reconnaissance: These attacks involve the gathering of information about a system in
order to find its weaknesses. Examples are port sweeps, ping sweeps, port scans, and Domain
Name System (DNS) zone transfers.
b) Exploits: These attacks take advantage of a known bug or design flaw in the system.
c) Denial-of-Service (DoS): These attacks disrupt or deny access to a service or resource.
1.4. Existing System
One of the best known and most widely used intrusion detection systems is the open
source, freely available Snort. It is available for a number of platforms and operating
systems including both Linux and Windows. Snort has a large and loyal following, and
there are many resources available on the Internet from which signatures for detecting
the latest threats can be acquired.
1.5. Problem Statement
The classical signature-based approach:

Cannot detect unknown or new intrusions.
Patches and regular updates are required.
The statistical anomaly-based approach:

Not based on an adaptive intelligent model.
Cannot learn from normal and malicious traffic patterns.
An alternative approach based on machine learning must be developed.
1.6. Objectives
To implement an intrusion detection system using a Naive Bayes classifier,
To protect the secure information of an organization from outside and inside intruders,
To detect novel or unknown intrusions in real time.
1.7. Scope of the Project
Increased network complexity, greater access, and a growing emphasis on the Internet have
made network security a major concern for organizations. The number of computer
security breaches has risen significantly in the last three years. In February 2000, several
major web sites including Yahoo, Amazon, E-Bay, Datek, and E-Trade were shut down
due to denial-of-service attacks on their web servers.
Today, a large amount of sensitive information is processed through computer networks,
so it is increasingly important to make information systems, especially those used for
critical functions in the military and commercial sectors, resistant and tolerant to network
intrusions. Hence intrusion detection has become an integral part of the information
security process.
2. LITERATURE REVIEW
2.1. The TCP/IP Reference Model
The TCP/IP model is a multi-layered architecture. This means that one piece of
functionality runs at one layer, another at the next layer, and so forth. We can
add new functionality to the application layer, for example, without having to reimplement
the whole TCP/IP stack or to include a complete TCP/IP stack in the
actual application.
The following four layers comprise the TCP/IP Internet model:

a) Application layer
Handles implementation of user applications.
b) Transport layer
Manages end-to-end communications between hosts.
The two transport-layer protocols are TCP and UDP.
c) Network layer
Gets data from source to destination.
d) Link layer
Manages data transfer to and from physical medium.
[Figure: a web browser and a web server exchange a stream; their TCP layers exchange TCP segments, their IP layers exchange IP datagrams, and their Ethernet drivers exchange Ethernet frames.]
Figure 2.1 TCP/IP Internet Model
2.1.1. Internet Protocol (IP)
The Internet Protocol (IP) resides in the network layer. It is an unreliable and connectionless
datagram protocol that provides a best-effort delivery service. The term best-effort means that IPv4
provides no error control or flow control (except for error detection on the header). IPv4
assumes the unreliability of the underlying layers and does its best to get a transmission
through to its destination, but with no guarantees. If reliability is important, IPv4 must be
paired with a reliable protocol such as TCP.
IP Header
A datagram is a variable-length packet consisting of two parts: header and data.
The header is 20 to 60 bytes in length and contains information essential to routing and
delivery. It has a 20-byte fixed part and a variable-length optional part of
at most 40 bytes. The header format is shown in Figure 2.2.
Each row is 32 bits wide:
| VER (4 bits) | HLEN (4 bits) | Service (8 bits) | Total Length (16 bits) |
| Identification (16 bits) | Flags (3 bits) | Fragmentation Offset (13 bits) |
| TTL (8 bits) | Protocol (8 bits) | Header Checksum (16 bits) |
| Source Address (32 bits) |
| Destination Address (32 bits) |
| Options | Padding |
Figure 2.2 IP Header Format
IP Header Field Description
Version (VER): This four-bit field gives the version of the IP protocol. For IPv4 the
value is 4 (binary 0100).
Header Length (HLEN): This four-bit field defines the total length of the datagram
header in four-byte words. This field is needed because the length of the header is variable
(between 20 and 60 bytes). When there are no options, the header length is 20 bytes, and
the value of this field is five (5 x 4 = 20). When the option field is at its maximum size, the
value of this field is 15 (15 x 4 = 60).
Service: This has two interpretations. They are:
a) Service Type
In this interpretation, the first three bits are called precedence bits. The next four bits are
called type of service (TOS) bits, and the last bit is not used.
Table 2.1 Types of Service
TOS Bits    Description
0000        Normal (default)
0001        Minimize cost
0010        Maximize reliability
0100        Maximize throughput
1000        Minimize delay
b) Differentiated Services
In this interpretation, bits [0-5] form the Differentiated Services Code Point (DSCP). The
remaining two bits [6-7] were left unused by the original definition and were later assigned to
Explicit Congestion Notification (ECN).
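The two interpretations of the Service byte can be sketched with simple bit operations. This is a minimal illustration, assuming the byte has already been read from the header as an integer; the function names and the `TOS_NAMES` dictionary are hypothetical helpers that mirror Table 2.1.

```python
# TOS bit patterns and their meanings, as listed in Table 2.1.
TOS_NAMES = {
    0b0000: "Normal (default)",
    0b0001: "Minimize cost",
    0b0010: "Maximize reliability",
    0b0100: "Maximize throughput",
    0b1000: "Minimize delay",
}

def decode_service_type(service):
    """Service Type view: 3 precedence bits, 4 TOS bits, 1 unused bit."""
    precedence = (service >> 5) & 0b111
    tos = (service >> 1) & 0b1111
    return precedence, tos

def decode_dscp(service):
    """Differentiated Services view: the first six bits are the DSCP."""
    return (service >> 2) & 0b111111

precedence, tos = decode_service_type(0x10)
print(precedence, TOS_NAMES.get(tos, "unknown"))
print(decode_dscp(0xB8))
```

Both functions read the same byte; which interpretation applies depends on how the network is configured.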
Total Length: This field defines the total length (header plus data) of the IPv4 datagram in
bytes. The maximum size is 65535 octets, or bytes, for a single packet.
Identification: This field is used in reassembly of fragmented packets.
Flags: This three-bit field is used in fragmentation. The first bit is reserved and
must be set to zero. The second bit (do not fragment) is zero if the packet may be fragmented and
one if it may not be fragmented. The third bit (more fragments) is zero if this is the last
fragment and one if more fragments of the same packet follow.
Fragmentation Offset: This field tells where in the original datagram this
fragment belongs. The offset is measured in units of 64 bits (8 bytes), and the first fragment has offset
zero.
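The Flags and Fragmentation Offset fields share one 16-bit word, so they can be separated with shifts and masks. The sketch below assumes that word has already been extracted as an integer; the function name is a hypothetical helper.

```python
def decode_fragment_field(value):
    """Split the 16-bit flags/offset word of an IPv4 header.

    Returns (reserved, dont_fragment, more_fragments, offset_in_bytes).
    """
    reserved = (value >> 15) & 1
    dont_fragment = (value >> 14) & 1
    more_fragments = (value >> 13) & 1
    # The 13-bit offset counts 8-byte (64-bit) units.
    offset_in_bytes = (value & 0x1FFF) * 8
    return reserved, dont_fragment, more_fragments, offset_in_bytes

# A packet that must not be fragmented.
print(decode_fragment_field(0x4000))
# A middle fragment: more fragments follow, offset 185 units = 1480 bytes.
print(decode_fragment_field(0x2000 | 185))
```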
Time to Live: The TTL field defines how long the packet may live, or rather how many
"hops" it may take over the Internet. After processing the datagram, each router
decrements this number by one. If this value, after being decremented, is zero, the router
discards the datagram.
Protocol: This field indicates the protocol of the next level layer. This can be TCP, UDP
or ICMP.
Checksum: This field is used for error detection.
Source Address: This field contains the source address.
Destination Address: This field contains the destination address.
Option: If the Header Length is greater than five, the Options field is present
and must be considered. The Options field contains optional settings such as
Internet timestamp, source route or record route options.
Padding: This field is used to make the header end at an even 32 bit boundary. The field
must always be set to zeroes straight through to the end.
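The field layout described above can be decoded with Python's standard `struct` module. This is a minimal sketch, not the parser used in this project (which relies on jNetPcap); the function name and the dictionary keys are illustrative choices, and the sample bytes are a hand-made header.

```python
import socket
import struct

def parse_ipv4_header(packet):
    """Decode the fixed 20-byte part of an IPv4 header into a dict."""
    (ver_hlen, service, total_length, identification, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", packet[:20])
    hlen_words = ver_hlen & 0x0F          # header length in 4-byte words
    return {
        "version": ver_hlen >> 4,
        "header_bytes": hlen_words * 4,
        "service": service,
        "total_length": total_length,
        "identification": identification,
        "flags_offset": flags_offset,
        "ttl": ttl,
        "protocol": protocol,             # 1 = ICMP, 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "options": packet[20:hlen_words * 4],
    }

# Hand-made sample header: a UDP datagram from 192.168.0.1 to 192.168.0.199.
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
header = parse_ipv4_header(sample)
print(header["version"], header["header_bytes"], header["protocol"])
print(header["source"], "->", header["destination"])
```

The `!` prefix in the format string selects network (big-endian) byte order, which is how all fields of the header are transmitted.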
2.1.2. Internet Control Message Protocol (ICMP)
The Internet Control Message Protocol (ICMP) gives important information about the
health of the network.
Types of Messages
ICMP messages are divided into two broad categories:

a) error-reporting messages, and
b) query messages.
The error-reporting messages report problems that a router or a host (destination) may
encounter when it processes an IP packet. Five types of errors are handled: destination
unreachable, source quench, time exceeded, parameter problems, and redirection. The
query messages, which occur in pairs, help a host or a network manager get specific
information from a router or another host. For example, nodes can discover their
neighbors. Also, hosts can discover and learn about routers on their network, and routers
can help a node redirect its messages. The four types of query messages are echo request and
reply, timestamp request and reply, address-mask request and reply, and router solicitation
and advertisement.
ICMP Header
| Type (8 bits) | Code (8 bits) | Checksum (16 bits) |
| Rest of the header |
| Data section |
Figure 2.3 ICMP Header Format
ICMP Header Field Description
Type: The type field identifies the kind of ICMP message. Each kind of message has its
own type value.
Code: All ICMP types can contain different codes as well. Some types only have a single
code, while others have several codes that they can use.
Checksum: This field is used for error detection.
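The three header fields can be read and checked as sketched below. The type numbers are the standard ICMP values; the function names and the returned dictionary are illustrative assumptions. The checksum routine is the usual 16-bit one's-complement Internet checksum, and a message is intact when the checksum computed over the whole message (including its checksum field) comes out as zero.

```python
import struct

ERROR_TYPES = {3: "destination unreachable", 4: "source quench",
               5: "redirection", 11: "time exceeded",
               12: "parameter problem"}
QUERY_TYPES = {8: "echo request", 0: "echo reply",
               13: "timestamp request", 14: "timestamp reply",
               17: "address-mask request", 18: "address-mask reply",
               10: "router solicitation", 9: "router advertisement"}

def internet_checksum(data):
    """16-bit one's-complement checksum over data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def parse_icmp(message):
    """Decode type, code and checksum and classify the message."""
    icmp_type, code, checksum = struct.unpack("!BBH", message[:4])
    if icmp_type in ERROR_TYPES:
        category, name = "error-reporting", ERROR_TYPES[icmp_type]
    elif icmp_type in QUERY_TYPES:
        category, name = "query", QUERY_TYPES[icmp_type]
    else:
        category, name = "unknown", "unknown"
    return {"type": icmp_type, "code": code, "checksum": checksum,
            "category": category, "name": name,
            "checksum_ok": internet_checksum(message) == 0}

# Hand-made echo request with identifier 0, sequence number 0 and no data.
echo_request = bytes.fromhex("0800f7ff00000000")
info = parse_icmp(echo_request)
print(info["name"], info["category"], info["checksum_ok"])
```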
2.1.3. User Datagram Protocol (UDP)
The User Datagram Protocol (UDP) is a connectionless, unreliable transport
protocol. It does not add anything to the services of IP except to provide process-to-process
communication instead of host-to-host communication. Also, it performs very
limited error checking.
If UDP is so powerless, why would a process want to use it? With the disadvantages come
some advantages. UDP is a very simple protocol using a minimum of overhead. If a
process wants to send a small message and does not care much about reliability, it can use
UDP.
UDP Header
The UDP header can be seen as a very basic and simplified TCP header. It contains
the source port, destination port, total length and a checksum, as shown in Figure 2.4.
| Source Port (16 bits) | Destination Port (16 bits) |
| Total Length (16 bits) | Checksum (16 bits) |
Figure 2.4 UDP Header Format
UDP Header Field Description
Source Port: This field indicates the port number used by the process running on the
source host. It is 16-bits long. The port number can range from 0 to 65,535.
Destination Port: This field indicates the port number used by the process running on the
destination host. It is also 16-bits long.
Total Length: The length field specifies the length of the whole packet (header and data
portions).
Checksum: This field is used to detect errors over the entire user datagram (header plus
data).
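Because the UDP header is only four 16-bit fields, decoding it is short. The sketch below assumes the user datagram is available as a byte string; the function name, the dictionary keys and the sample port numbers are illustrative.

```python
import struct

def parse_udp_header(datagram):
    """Decode the 8-byte UDP header and return it with the payload."""
    source_port, destination_port, total_length, checksum = struct.unpack(
        "!HHHH", datagram[:8])
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "total_length": total_length,        # header plus data, in bytes
        "checksum": checksum,
        "data": datagram[8:total_length],
    }

# Hand-made datagram: 8-byte header followed by a 4-byte payload.
sample = struct.pack("!HHHH", 53000, 53, 12, 0) + b"test"
udp = parse_udp_header(sample)
print(udp["source_port"], udp["destination_port"], udp["data"])
```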
2.1.4. Transmission Control Protocol (TCP)
TCP, like UDP, is a process-to-process (program-to-program) protocol. TCP, therefore,
like UDP, uses port numbers. Unlike UDP, TCP is a connection-oriented protocol; it
creates a virtual connection between two TCPs to send data. In addition, TCP uses flow
and error control mechanisms at the transport level. In brief, TCP is a connection-oriented,
reliable transport protocol. It adds connection-oriented and reliability features to
the services of IP.
TCP Header
Each row is 32 bits wide:
| Source Port Address (16 bits) | Destination Port Address (16 bits) |
| Sequence Number (32 bits) |
| Acknowledgment Number (32 bits) |
| HLEN (4 bits) | Reserved (6 bits) | URG | ACK | PSH | RST | SYN | FIN | Window Size (16 bits) |
| Checksum (16 bits) | Urgent Pointer (16 bits) |
| Options and Padding |
Figure 2.5 TCP Header Format
TCP Header Field Description
Source Port: This field indicates the source port of the packet. The source port is directly
bound to the process on the sending system.
Destination Port: This field indicates the destination port of the TCP packet. Just as with
the source port, this port is directly bound to the process on the receiving system.
Sequence Number: This field numbers the data carried in each TCP segment so that the TCP
stream can be properly sequenced. The receiver uses it to work out the Acknowledgment
number that confirms the segment was properly received.
Acknowledgement Number: This field is used to acknowledge the data a host
has received. For example, when we receive a segment with a given Sequence number and
everything is okay with it, we reply with an ACK segment whose
Acknowledgment number is set to the Sequence number of the next byte we expect, that is,
the received Sequence number plus the number of data bytes received.
Header Length: This four bits field indicates the number of four byte words in the TCP
header. The length of the header can be between 20 and 60 bytes. Therefore, the value of
this field can be between five (5 x 4 = 20) and 15 (15 x 4 = 60).
Reserved: This is a six bits field reserved for future usage.
Control: This field defines six different control flags as:
Table 2.2 Description of flags in the control field
Flag    Description
URG     The value of the urgent pointer field is valid.
ACK     The value of the acknowledgment field is valid.
PSH     Push the data.
RST     Reset the connection.
SYN     Synchronize sequence numbers during connection.
FIN     Terminate the connection.
Window: This field is used by the receiving host to tell the sender how much data the
receiver is willing to accept at the moment. The receiver sends back an ACK that contains
the Acknowledgment number together with a Window field giving
the number of bytes, starting from that number, that the sending host may send before it receives
the next ACK. The next ACK updates the window that the
sender may use.
Checksum: This field contains a checksum computed over the TCP header and data. The checksum
also covers a 96-bit pseudo header containing the source address, destination address,
protocol, and TCP length, which helps detect segments delivered to the wrong host.
Urgent Pointer: This field contains a pointer that points to the end of the data which is
considered urgent. If the connection has important data that should be processed as soon as
possible by the receiving end, the sender can set the URG flag and set the Urgent pointer to
indicate where the urgent data ends.
Option: The Option field is a variable length field and contains optional headers that we
may want to use.
Padding: This padding field pads the TCP header until the whole header ends at a 32-bit
boundary. This ensures that the data part of the packet begins on a 32-bit boundary, and no
data is lost in the packet. The padding always consists of only zeros.
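The TCP header fields, including the six control flags of Table 2.2, can be decoded as sketched below. This is an illustrative helper rather than the project's jNetPcap-based code; the function name, dictionary keys and sample values are assumptions.

```python
import struct

# Control flags from the least significant bit upwards.
FLAG_NAMES = ("FIN", "SYN", "RST", "PSH", "ACK", "URG")

def parse_tcp_header(segment):
    """Decode the fixed 20-byte part of a TCP header into a dict."""
    (source_port, destination_port, sequence, acknowledgment,
     offset_flags, window, checksum, urgent_pointer) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_bytes = (offset_flags >> 12) * 4   # HLEN counts 4-byte words
    flags = {name for bit, name in enumerate(FLAG_NAMES)
             if offset_flags & (1 << bit)}
    return {
        "source_port": source_port,
        "destination_port": destination_port,
        "sequence": sequence,
        "acknowledgment": acknowledgment,
        "header_bytes": header_bytes,
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent_pointer,
        "options": segment[20:header_bytes],
    }

# Hand-made SYN segment from port 40000 to port 80 with no options.
syn = struct.pack("!HHIIHHHH", 40000, 80, 1000, 0, (5 << 12) | 0x02,
                  65535, 0, 0)
tcp = parse_tcp_header(syn)
print(tcp["destination_port"], tcp["header_bytes"], sorted(tcp["flags"]))
```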
2.2. Naive Bayes Classifier
A Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem
with strong (naive) independence assumptions. A more descriptive term for the underlying
probability model would be "independent feature model".
In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a
particular feature of a class is unrelated to the presence (or absence) of any other feature.
Depending on the precise nature of the probability model, naive Bayes classifiers can be
trained very efficiently in a supervised learning setting. In spite of their naive design and
apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in
many complex real-world situations.
An advantage of the naive Bayes classifier is that it requires a small amount of training
data to estimate the parameters (means and variances of the variables) necessary for
classification. Because independent variables are assumed, only the variances of the
variables for each class need to be determined and not the entire covariance matrix. The
Naive Bayes algorithm affords fast, highly scalable model building and scoring. It scales
linearly with the number of predictors and rows. The build process for Naive Bayes is
parallelized. Naive Bayes can be used for both binary and multiclass classification
problems.
The Naive Bayes algorithm is based on conditional probabilities. It uses Bayes' Theorem, a
formula that calculates a probability by counting the frequency of values and combinations
of values in the historical data.
Bayes' Theorem
Bayes' Theorem finds the probability of an event occurring given the probability of another
event that has already occurred. If B represents the dependent event and A represents the
prior event, Bayes' theorem can be stated as follows.
Prob(B given A) = Prob(A and B)/Prob(A)

To calculate the probability of B given A, the algorithm counts the number of cases where
A and B occur together and divides it by the number of all cases where A occurs, with or without B.
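The counting procedure can be sketched directly. In the example below, A is "the protocol is ICMP" and B is "the record is labeled attack"; the records are made-up toy data, not taken from the KDDCup99 dataset, and the function name is a hypothetical helper.

```python
def conditional_probability(records, event_a, event_b):
    """Estimate Prob(B given A) = count(A and B) / count(A)."""
    count_a = sum(1 for record in records if event_a(record))
    count_a_and_b = sum(1 for record in records
                        if event_a(record) and event_b(record))
    return count_a_and_b / count_a if count_a else 0.0

# Toy (protocol, label) records.
records = [("icmp", "attack"), ("icmp", "attack"), ("icmp", "normal"),
           ("tcp", "normal"), ("tcp", "normal")]

p_attack_given_icmp = conditional_probability(
    records,
    event_a=lambda record: record[0] == "icmp",
    event_b=lambda record: record[1] == "attack")
print(p_attack_given_icmp)
```

Of the three ICMP records, two are attacks, so the estimate is 2/3.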
Naive Bayes Algorithm

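Let X be a set of instances x_i = (a_1, a_2, ..., a_n)
Let V be a set of classifications v_j
Naive Bayes assumption:
\[ P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_{i} P(a_i \mid v_j) \tag{2.1} \]
This leads to the following algorithm:
Naive_Bayes_Learn ( examples )
    for each target value vj
        estimate P ( vj )
        for each attribute value ai of each attribute a
            estimate P ( ai | vj )
Classify_New_Instance ( x )
    return the vj in V that maximizes P ( vj ) * product over ai in x of P ( ai | vj )
We generally estimate P ( ai | vj ) using m-estimates:
\[ P(a_i \mid v_j) = \frac{n_c + m p}{n + m} \tag{2.2} \]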
where:
n = the number of training examples for which v = vj
nc = number of examples for which v = vj and a = ai
p = a priori estimate for P ( ai | vj )
m = the equivalent sample size
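The learning and classification steps with m-estimates can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's jSMILE-based implementation: attributes are treated as discrete, the a priori estimate p is taken as uniform over the values seen for each attribute, m defaults to 1, and the training examples are made-up toy records.

```python
from collections import Counter, defaultdict

def naive_bayes_learn(examples, m=1.0):
    """examples: list of (attribute_tuple, label) pairs.

    Returns the class priors P(vj) and a function giving the
    m-estimate of P(ai | vj).
    """
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)   # (label, attribute index) -> counts
    values_seen = defaultdict(set)        # attribute index -> distinct values
    for attributes, label in examples:
        for i, value in enumerate(attributes):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)
    total = len(examples)
    priors = {label: n / total for label, n in class_counts.items()}

    def likelihood(i, value, label):
        n = class_counts[label]                  # examples with v = vj
        nc = value_counts[(label, i)][value]     # ... that also have a = ai
        p = 1.0 / max(len(values_seen[i]), 1)    # assumed uniform prior
        return (nc + m * p) / (n + m)

    return priors, likelihood

def classify_new_instance(x, priors, likelihood):
    """Return the label maximizing P(vj) * product of P(ai | vj)."""
    best_label, best_score = None, -1.0
    for label, prior in priors.items():
        score = prior
        for i, value in enumerate(x):
            score *= likelihood(i, value, label)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy (protocol, service) records with labels.
training = ([(("icmp", "ecr_i"), "smurf")] * 3
            + [(("tcp", "http"), "normal")] * 3
            + [(("icmp", "eco_i"), "normal")])
priors, likelihood = naive_bayes_learn(training)
print(classify_new_instance(("icmp", "ecr_i"), priors, likelihood))
print(classify_new_instance(("tcp", "http"), priors, likelihood))
```

Because of the m-estimate, an attribute value never seen with a class still gets a small non-zero probability, so a single unseen value does not force the whole product to zero.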
2.3. Some Well-Known Attacks
2.3.1. DoS
A denial-of-service (DoS) attack, or distributed denial-of-service (DDoS) attack, is an
attempt to make a computer resource unavailable to its intended users. Perpetrators of DoS
attacks typically target sites or services hosted on high-profile web servers such as those of banks
and credit card payment gateways. The term is generally used with regard to computer
networks, but is not limited to this field; for example, it is also used in reference to CPU
resource management.
One common method of attack involves saturating the target (victim) machine with
external communications requests, such that it cannot respond to legitimate traffic, or
responds so slowly as to be rendered effectively unavailable. In general terms, DoS attacks
are implemented by either forcing the targeted computer(s) to reset, or consuming its
resources so that it can no longer provide its intended service or obstructing the
communication media between the intended users and the victim so that they can no longer
communicate adequately.
Denial-of-service attacks are considered violations of the IAB's Internet proper use policy,
and also violate the acceptable use policies of virtually all Internet Service Providers. They
also commonly constitute violations of the laws of individual nations.
There are many varieties of denial of service (or DoS) attacks. Some DoS attacks (like a
mailbomb, neptune, or smurf attack) abuse a perfectly legitimate feature. Others (teardrop,
Ping of Death) create malformed packets that confuse the TCP/IP stack of the machine that
is trying to reconstruct the packet. Still others (apache2, back, syslogd) take advantage of
bugs in a particular network daemon.
Some Captured DoS attacks are as follows:

a) Smurf
b) Neptune
c) Teardrop
d) Pod
e) Land
f) Nuke
Smurf
The smurf attack is a way of generating significant computer network traffic on a victim
network. This is a type of denial-of-service attack that floods a target system via spoofed
broadcast ping messages.
In the "smurf" attack, attackers use ICMP echo request packets directed to IP broadcast
addresses from remote locations to create a denial-of-service attack. There are three parties
in these attacks: the attacker, the intermediary, and the victim (note that the intermediary
can also be a victim). The attacker sends ICMP echo request packets to the broadcast
address (xxx.xxx.xxx.255) of many subnets with the source address spoofed to be that of
the intended victim. Any machines that are listening on these subnets will respond by
sending ICMP echo reply packets to the victim. The smurf attack is effective because the
attacker is able to use broadcast addresses to amplify what would otherwise be a rather
innocuous ping flood. In the best case (from an attacker's point of view), the attacker can
flood a victim with a volume of packets 255 times as great in magnitude as the attacker
would be able to achieve without such amplification. This amplification effect is illustrated
by Figure 2.6. The attacking machine sends a single spoofed packet to the broadcast
address of some network, and every machine that is located on that network responds by
sending a packet to the victim machine. Because there can be as many as 255 machines on
an Ethernet segment, the attacker can use this amplification to generate a flood of ping
packets 255 times as great in size as would otherwise be possible. This figure is a
simplification of the smurf attack. In an actual attack, the attacker sends a stream of icmp
ECHO requests to the broadcast address of many subnets, resulting in a large,
continuous stream of ECHO replies that flood the victim.
[Diagram: the attacker sends one echo request to a broadcast address across the Internet;
hundreds of echo replies, for example from 192.168.0.20, flood the victim.]
Figure 2.6 Smurf attack
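The defining trait described above, an echo request addressed to a broadcast address, can be checked from header fields alone. The following is a minimal sketch and not the project's actual code: the class name SmurfCheck, its method names, and the assumption that the subnet prefix length is known to the detector are all hypothetical.

```java
/** Illustrative sketch only: flags ICMP echo requests sent to a directed broadcast address. */
final class SmurfCheck {

    /** Parses a dotted-quad IPv4 address into a 32-bit integer. */
    static int toInt(String dottedQuad) {
        String[] parts = dottedQuad.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + dottedQuad);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if every host bit of dst is 1, i.e. dst is the subnet's directed broadcast address. */
    static boolean isDirectedBroadcast(String dst, int prefixLength) {
        if (prefixLength < 1 || prefixLength > 30) {
            throw new IllegalArgumentException("prefix length must be 1..30");
        }
        int hostMask = (int) ((1L << (32 - prefixLength)) - 1);
        return (toInt(dst) & hostMask) == hostMask;
    }

    /** ICMP type 8 is an echo request; sent to a broadcast address it may belong to a smurf attack. */
    static boolean looksLikeSmurfRequest(int icmpType, String dst, int prefixLength) {
        return icmpType == 8 && isDirectedBroadcast(dst, prefixLength);
    }
}
```

For a /24 network, an echo request to 192.168.0.255 would be flagged, while one to 192.168.0.20 would not. A real detector would also have to watch the victim side, where the symptom is a burst of echo replies that no host requested.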
Teardrop
A teardrop attack is a denial-of-service attack that exploits IP fragment reassembly to
crash the target computer. The attacker sends fragments whose headers carry erroneous
offset information, so that the fragments overlap and some data must overwrite other data
when the packet is re-assembled. Attempts to re-assemble such overlapping fragments can
cause the computer to crash if its software is not prepared to handle erroneous packet
header information.
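The overlap condition can be stated precisely: after sorting the fragments of one datagram by offset, a fragment overlaps if it starts before the data already covered ends. The sketch below illustrates this under stated assumptions and is not the project's implementation. The class name FragmentOverlapCheck is hypothetical, and each fragment is assumed to be given as {offset in bytes, payload length}, with the caller having already multiplied the 13-bit header offset by 8.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch only: detects overlapping IP fragments of a single datagram. */
final class FragmentOverlapCheck {

    /**
     * Each fragment is {offsetBytes, payloadLength}.
     * Returns true if any fragment starts inside data covered by an earlier one.
     * Note that an exact duplicate of a fragment is also reported as an overlap.
     */
    static boolean hasOverlap(int[][] fragments) {
        int[][] sorted = fragments.clone(); // keep the caller's ordering untouched
        Arrays.sort(sorted, new Comparator<int[]>() {
            @Override
            public int compare(int[] a, int[] b) {
                return Integer.compare(a[0], b[0]);
            }
        });
        int coveredUpTo = 0; // first byte position not yet covered
        for (int[] fragment : sorted) {
            if (fragment[0] < coveredUpTo) {
                return true;
            }
            coveredUpTo = Math.max(coveredUpTo, fragment[0] + fragment[1]);
        }
        return false;
    }
}
```

A pair such as {0, 36} followed by {24, 4} is reported as overlapping, whereas two back-to-back fragments {0, 1480} and {1480, 1480} are not.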
Neptune
Neptune (SYN flood) is a denial-of-service attack to which every TCP/IP implementation
is vulnerable (to some degree). To detect a Neptune attack, network traffic is monitored
for a large number of simultaneous SYN packets destined for a particular machine. The host
that appears to send these packets is usually unreachable.
Each half-open TCP connection made to a machine causes the tcpd server to add a
record to the data structure that stores information describing all pending connections. This
data structure is of finite size, and it can be made to overflow by intentionally creating too
many partially open connections. The half-open connections data structure on the victim
server system will eventually fill, and the system will be unable to accept any new
incoming connections until the table is emptied out. Normally there is a timeout associated
with a pending connection, so the half-open connections will eventually expire and the
victim server system will recover. However, the attacking system can simply continue
sending IP-spoofed packets requesting new connections faster than the victim system can
expire the pending connections. In some cases, the system may exhaust memory, crash, or
be rendered otherwise inoperative.
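One simple way to turn the monitoring described above into a number is to count the SYN packets seen for each destination within a sliding time window. The sketch below is a hypothetical illustration, not the way the project necessarily computes its "syn flood" feature; the class name SynFloodCounter, the window length and the threshold are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: counts SYN packets per destination in a sliding time window. */
final class SynFloodCounter {
    private final long windowMs;
    private final int threshold;
    private final Map<String, Deque<Long>> synTimes = new HashMap<String, Deque<Long>>();

    SynFloodCounter(long windowMs, int threshold) {
        this.windowMs = windowMs;
        this.threshold = threshold;
    }

    /**
     * Records a SYN packet for dstIp observed at timeMs (timestamps must not decrease).
     * Returns true once at least threshold SYNs fall inside the window for that destination.
     */
    boolean onSyn(String dstIp, long timeMs) {
        Deque<Long> times = synTimes.get(dstIp);
        if (times == null) {
            times = new ArrayDeque<Long>();
            synTimes.put(dstIp, times);
        }
        times.addLast(timeMs);
        // Drop SYNs that are older than the window.
        while (!times.isEmpty() && timeMs - times.peekFirst() > windowMs) {
            times.removeFirst();
        }
        return times.size() >= threshold;
    }
}
```

With a window of 1000 ms and a threshold of 3, the third SYN arriving within one second for the same destination raises the flag, and the flag clears again once the earlier SYNs fall out of the window.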
POD
A ping of death (abbreviated "POD") is a type of attack on a computer that involves
sending a malformed or otherwise malicious ping to it. A ping is normally 64 bytes in
size (or 84 bytes when the IP header is considered); many computer systems cannot handle
a ping larger than the maximum IP packet size, which is 65,535 bytes. Sending a ping
larger than this can crash the target computer.
Traditionally, this bug has been relatively easy to exploit. Sending a 65,536-byte ping
packet violates the Internet Protocol, but a packet of such a size can be sent if it is
fragmented; when the target computer reassembles the packet, a buffer overflow can occur,
which often causes a system crash.
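Because the oversized packet only exists after reassembly, the check has to be made per fragment: the fragment offset plus the fragment's payload must not push the datagram past 65,535 bytes. The following sketch shows that arithmetic. It is an illustration under assumptions, not the project's code; the class and parameter names are hypothetical, and the offset is taken in the 8-byte units used by the IP header.

```java
/** Illustrative sketch only: checks whether a fragment would exceed the maximum IP datagram size. */
final class PingOfDeathCheck {
    static final int MAX_IP_DATAGRAM = 65535;

    /**
     * fragmentOffsetUnits is the 13-bit IP fragment offset field (units of 8 bytes),
     * totalLength is the IP total length of this fragment, and headerLength is its
     * IP header length in bytes.
     */
    static boolean exceedsMaxDatagram(int fragmentOffsetUnits, int totalLength, int headerLength) {
        int payloadLength = totalLength - headerLength;
        int endOfData = fragmentOffsetUnits * 8 + payloadLength;
        return headerLength + endOfData > MAX_IP_DATAGRAM;
    }
}
```

An ordinary 84-byte ping (offset 0) passes the check, whereas a final fragment at offset 8189 (65,512 bytes) that carries 1,480 bytes of payload would reassemble to 67,012 bytes and is flagged.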
This exploit has affected a wide variety of systems, including Unix, Linux, Mac, Windows,
printers, and routers. However, most systems since 1997-1998 have been fixed, so this bug
is mostly historical.
In recent years, a different kind of ping attack has become widespread: ping flooding
simply floods the victim with so much ping traffic that normal traffic fails to reach the
system (a basic denial-of-service attack).
Land
The Land attack occurs when an attacker sends a spoofed SYN packet in which the source
address is the same as the destination address. The attack works because it causes the
machine to reply to itself continuously. Directed against vulnerable systems, this attack
causes them to lock up or become unstable.
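The Land condition is a direct comparison of header fields, which the following minimal sketch illustrates. The class name LandCheck and its method signature are hypothetical and are not taken from the project.

```java
/** Illustrative sketch only: a Land packet is a SYN whose source and destination addresses match. */
final class LandCheck {
    static boolean isLandPacket(String srcIp, String dstIp, boolean synFlag) {
        return synFlag && srcIp.equals(dstIp);
    }
}
```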
Nuke
Nuke is an old DoS attack against computer networks. It consists of fragmented or
otherwise invalid ICMP packets sent to the target, typically by a modified ping utility
that repeatedly sends the corrupt data, slowing down the affected computer until it comes
to a complete stop.
2.3.2. Probe
Probing is a class of attacks in which an attacker scans a network of computers to collect
information or find known vulnerabilities. An intruder with a map of the machines and
services that are available on a network can use this information to look for exploits. There
are different types of probing: some of them abuse the computer's legitimate features;
others use social engineering techniques. This class of attacks is the most common and
requires very little technical expertise. Examples are Ipsweep, Mscan, Nmap,
Saint, Satan, Pingsweep and Portsweep attacks.
The probe attacks captured for this project are as follows:

a) Satan
b) Ipsweep
c) Portsweep
d) Nmap
Nmap
Nmap is a "Network Mapper", used to discover computers and services on a computer
network, thus creating a "map" of the network. Just like many simple port scanners, Nmap
is capable of discovering passive services on a network despite the fact that such services
aren't advertising themselves with a service discovery protocol. In addition Nmap may be
able to determine various details about the remote computers. These include operating
system, device type, uptime, software product used to run a service, exact version number
of that product, presence of some firewall techniques and, on a local area network, even
vendor of the remote network card.
Nmap can be used for black hat hacking, or attempting to gain unauthorized access to
computer systems. It would typically be used to discover open ports which are likely to be
running vulnerable services, in preparation for attacking those services with another
program.
System administrators often use Nmap to search for unauthorized servers on their network,
or for computers which don't meet the organization's minimum level of security.
Satan
Satan is a probing intrusion that automatically scans a network of computers to gather
information or find known vulnerabilities.
SATAN is an early predecessor of the SAINT scanning program. While SAINT and SATAN are
quite similar in purpose and design, the particular vulnerabilities that each tool checks
for are slightly different. Like SAINT, SATAN is distributed as a collection of Perl and C
programs that can be run either from within a web browser or from the UNIX command prompt.
SATAN supports three levels of scanning: light, normal, and heavy. The vulnerabilities
that SATAN checks for in heavy mode are:
NFS export to unprivileged programs
NFS export via portmapper
NIS password file access
REXD access
tftp file access
remote shell access
unrestricted NFS export
unrestricted X Server access
write-able ftp home directory
several Sendmail vulnerabilities
several ftp vulnerabilities
Scans in light and normal mode simply check for smaller subsets of these vulnerabilities.
Ipsweep
An Ipsweep attack is a surveillance sweep to determine which hosts are listening on a
network. This information is useful to an attacker in staging attacks and searching for
vulnerable machines. There are many methods an attacker can use to perform an Ipsweep
attack. The most common method, and the method used within the simulation, is to send
ICMP ping packets to every possible address within a subnet and wait to see which
machines respond.
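Seen from the network, such a sweep is one source pinging many different destinations. A simple detector can therefore track the set of distinct addresses each source has pinged. The sketch below is illustrative only and is not the project's implementation; the class name IpSweepDetector and the threshold value are assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: flags a source that pings many distinct destinations. */
final class IpSweepDetector {
    private final int threshold;
    private final Map<String, Set<String>> targetsBySource = new HashMap<String, Set<String>>();

    IpSweepDetector(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records an ICMP echo request from srcIp to dstIp.
     * Returns true once srcIp has pinged at least threshold distinct destinations.
     */
    boolean onEchoRequest(String srcIp, String dstIp) {
        Set<String> targets = targetsBySource.get(srcIp);
        if (targets == null) {
            targets = new HashSet<String>();
            targetsBySource.put(srcIp, targets);
        }
        targets.add(dstIp); // repeated pings to the same host are counted once
        return targets.size() >= threshold;
    }
}
```

A practical version would also expire old entries, since this sketch keeps every source it has ever seen in memory.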
Portsweep
Port Sweep is a network testing tool that lets an attacker learn a great deal about hosts
on the Internet. It combines several utilities in order to obtain results more
efficiently. An attacker can use it to gather information (location, network type) about
a given target (IP address, server, e-mail address) and about other computers connected
to the Internet. An attacker can also sweep a network to see whether any open ports are
waiting to be exploited and what data is being sent.
2.4. jNetPcap
jNetPcap is a Java wrapper around the libpcap and WinPcap native libraries found on
various Unix and Windows platforms. jNetPcap exposes their functionality as a Java
programming interface (API), which helps in capturing packets on the network.
The main classes that implement libpcap and WinPcap functionality are:
org.jnetpcap.Pcap class - core libpcap methods available on all platforms
org.jnetpcap.winpcap.WinPcap class - extensions based on the WinPcap library,
typically only available on Windows-based systems
The core libpcap implementation of jNetPcap provides methods for the following
functions:
Find a complete list of the network interfaces the system has
Open either a network interface or a PCAP capture file for reading packets
Apply a packet filter
Dump packets into a PCAP capture file
Transmit raw link-layer packets over a network interface
Gather statistics on a network interface and report counters
2.5. jSMILE
jSMILE is a platform-independent library of Java classes for reasoning in graphical
probabilistic models, such as Bayesian networks and influence diagrams. It can be
embedded in programs that use graphical probabilistic models as their reasoning engines.
jSMILE requires only an installed JRE, so it can be used to create stand-alone
applications, applets, and servlets. Model building and inference are under the full
control of the application program, as the jSMILE library serves merely as a set of tools
and structures that facilitate them.
3. SYSTEM DESIGN
Our aim is to design and develop an Intelligent Network Intrusion Detection System
(INIDS) that is accurate, low in false alarms, not easily fooled by small variations
in patterns, adaptive, and capable of real-time detection.
Attributes Used
For our INIDS, we extracted 18 features from tcpdump files to characterise each packet
(the six TCP flags are counted as separate features). The features are:
protocol type,
IP length,
don't fragment flag (DF),
more fragments flag (MF),
fragmentation offset,
SYN flood,
urgent pointer,
TCP flags (URG, ACK, PSH, RST, SYN, FIN),
TCP window size,
UDP checksum,
ICMP flood,
ICMP checksum, and
type (whether the packet is normal or attack)
3.1. System Block Diagram
[Block diagram; component labels: Network, Sniffer, Captured File, File System, Training
DataSet, Trained Knowledge Based Engine, Detector, Normal, Attack]
Figure 3.1 System Block Diagram
3.2. Data Flow Diagrams (DFDs)
A DFD is a structured, diagrammatic technique for showing the functions performed by a
system and the data flowing into, out of, and within it.
The context diagram, or level-0 DFD, is an overall, simplified view of the target
system; it contains only one process box and the primary inputs and outputs.
Figure 3.2 Level-0 DFD
The level-1 DFD shows all processes at the first level of numbering, the data stores, the
external entities and the data flows between them. The purpose of this level is to show
the major high-level processes of the system and their interrelation.
The level-2 DFD is a decomposition of a process shown in the level-1 diagram. Here we
have decomposed the inference engine process.
3.3. Unified Modeling Language (UML)
UML is now the most widely used graphical representation scheme for modeling
object-oriented systems. An attractive feature of the UML is its flexibility: it is
extensible and independent of any particular OOAD process. We created a use case diagram
to model the interactions of network administrators and crackers with their use cases.
[Use case diagram of the INIDS; actors: Network Admin, Cracker; use cases: Train Dataset,
Test Dataset, Add to Dataset, Run System, Attack System]
Figure 3.5 Use Case Diagram
4. METHODOLOGY
To develop our system, we adopted the traditional waterfall model. The waterfall
model is a sequential software development process in which progress is seen as flowing
steadily downwards, like a waterfall, through the phases of conception, analysis, design,
construction, testing and maintenance. One proceeds from one phase to the next in a
strictly sequential manner. For example, when the requirements are fully completed, one
proceeds to design. When the design is fully completed, coders implement it. Towards the
later stages of the implementation phase, the separate software components are combined
to introduce new functionality and to reduce risk through the removal of errors. Thus the
waterfall model maintains that one should move to a phase only when its preceding phase
is completed and perfected.
As this is a knowledge-based project, a sizeable proportion of time was spent
researching strategies for implementation. To achieve our goal, we consulted several
books and websites and drew on valuable suggestions from friends and seniors. We studied
existing systems applied in several fields and identified their characteristics,
applicability and limitations. In this regard, the existing intrusion detection system
Snort was our inspiration: it is signature-based, so it relies on signatures extracted by
human experts and fails to detect unknown intrusions.
A learning algorithm is good if it produces accurate predictions for the classification
of unseen examples. We first train our model with a training dataset and then test it
with a test dataset, so we adopted the following methodology:
Collect a large set of examples.
Divide it into two disjoint sets: the training set and the test set.
Apply the learning algorithm to the training set.
Measure the percentage of examples in the test set that are correctly classified.
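The split and measurement steps listed above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the class name HoldoutEvaluation, the random shuffle with a fixed seed, and the encoding of labels as integers are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative sketch only: disjoint train/test split and percentage of correct classifications. */
final class HoldoutEvaluation {

    /**
     * Shuffles the example indices 0..n-1 reproducibly and returns two disjoint lists:
     * element 0 holds the training indices, element 1 the test indices.
     */
    static List<List<Integer>> split(int n, double trainFraction, long seed) {
        if (n < 0 || trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("invalid size or fraction");
        }
        List<Integer> indices = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int cut = (int) Math.round(n * trainFraction);
        List<List<Integer>> parts = new ArrayList<List<Integer>>();
        parts.add(new ArrayList<Integer>(indices.subList(0, cut)));
        parts.add(new ArrayList<Integer>(indices.subList(cut, n)));
        return parts;
    }

    /** Percentage of test examples whose predicted label equals the actual label. */
    static double percentCorrect(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("label arrays must be non-empty and equal in length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++;
            }
        }
        return 100.0 * correct / predicted.length;
    }
}
```

Keeping the two index lists disjoint is what makes the measured percentage an estimate of performance on unseen examples.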
For the training and testing of our INIDS, we used the 1998 DARPA dataset provided by
MIT Lincoln Laboratory, which is widely used to train and test intrusion detection
systems. It provides around 4 gigabytes of compressed tcpdump data covering 7 weeks of
network traffic. Each week has five days, and each day has its own tcpdump data. The
dataset also provides a tcpdump list file, which labels every flow as attack or not.
Every entry consists of the flow identifier number, the date, the time at which the first
packet of the flow arrived, the duration, the service name, the source port number, the
destination port number, the source IP address, the destination IP address, the attack
score, and the name of the attack. With this file, we are able to recognize which flows
are attacks and to extract the corresponding packets from the tcpdump data.
The first and second weeks of training data consist of normal traffic, and the other
weeks consist of a mixed dataset, i.e. normal traffic and attack traffic. To train our
intrusion detection system, we extracted normal traffic from the outside tcpdump of
Wednesday and Thursday of the second week. Similarly, we extracted attack traffic from
the other weeks' traffic. We used the editcap tool to split the huge tcpdump files and
Wireshark to filter the desired packets.
For our INIDS, we extracted 18 features from the tcpdump files to characterise each
packet. The features have to be preprocessed to suit the naive Bayes algorithm, because
the discrete naive Bayes classifier we use cannot handle continuous values directly. So,
while building the dataset, the continuous features are discretized. This dataset is then
used to train the naive Bayes classifier. At inference time, we extract all the features
of each packet and feed them to the naive Bayes classifier, which calculates the
probability that the packet is normal; based on a threshold, the packet is classified as
normal or attack.
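The pipeline above (discretize, train, compute the probability that a packet is normal, apply a threshold) can be illustrated with a small self-contained classifier. The sketch below is not the project's implementation, which relies on jSMILE. The class name NaiveBayesSketch, the equal-width binning, the add-one (Laplace) smoothing and the label encoding (0 = normal, 1 = attack) are all assumptions made for illustration.

```java
/** Illustrative sketch only: discrete naive Bayes over binned packet features. */
final class NaiveBayesSketch {
    private final int numFeatures;
    private final int numBins;
    // counts[c][f][v]: training packets of class c whose feature f fell into bin v
    private final int[][][] counts;
    private final int[] classCounts = new int[2]; // 0 = normal, 1 = attack

    NaiveBayesSketch(int numFeatures, int numBins) {
        this.numFeatures = numFeatures;
        this.numBins = numBins;
        this.counts = new int[2][numFeatures][numBins];
    }

    /** Equal-width discretization of a continuous value into bins 0..numBins-1. */
    static int discretize(double value, double min, double max, int numBins) {
        if (value <= min) {
            return 0;
        }
        if (value >= max) {
            return numBins - 1;
        }
        int bin = (int) ((value - min) / (max - min) * numBins);
        return Math.min(bin, numBins - 1);
    }

    /** Adds one labelled, already discretized packet to the counts. */
    void train(int[] bins, int label) {
        classCounts[label]++;
        for (int f = 0; f < numFeatures; f++) {
            counts[label][f][bins[f]]++;
        }
    }

    /** Posterior probability that the packet is normal, with add-one smoothing. */
    double probabilityNormal(int[] bins) {
        double[] logScore = new double[2];
        int total = classCounts[0] + classCounts[1];
        for (int c = 0; c < 2; c++) {
            logScore[c] = Math.log((classCounts[c] + 1.0) / (total + 2.0));
            for (int f = 0; f < numFeatures; f++) {
                logScore[c] += Math.log((counts[c][f][bins[f]] + 1.0) / (classCounts[c] + numBins));
            }
        }
        // Normalise in log space to avoid underflow when there are many features.
        double max = Math.max(logScore[0], logScore[1]);
        double normal = Math.exp(logScore[0] - max);
        double attack = Math.exp(logScore[1] - max);
        return normal / (normal + attack);
    }

    /** The packet is classified as normal when the posterior reaches the threshold. */
    boolean isNormal(int[] bins, double threshold) {
        return probabilityNormal(bins) >= threshold;
    }
}
```

Raising the threshold makes the detector stricter: more packets are reported as attacks, at the cost of more false alarms.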
5. IMPLEMENTATION
5.1. Object-Oriented Design
In object-oriented design, the objects that occur in the problem domain and the solution
domain are first identified, together with the different kinds of relationships that
exist among them. This object structure is then refined to obtain the detailed design.
The approach has several advantages, such as less development effort and time, and better
maintainability.
During the implementation phase, each component of the design is implemented as a
program module, and each of these modules is unit tested, debugged and documented.
Tools Used:
NetBeans 6.5 IDE
API Used:
jSMILE API
jNetPcap
Language Used:
Java
System Installation Requirement:
Operating System - Windows XP, Windows Vista, Windows 7
CPU - 500 MHz (or above)
Memory - 128 MB (or above)
6. TESTING
Testing is necessary to determine whether the modules and the system as a whole work
properly.
6.1. Level of Testing
While implementing our system, we went through various levels of testing, which are as
follows:
a) Unit Testing: The purpose of unit testing is to determine the correct working of the
individual modules.
b) Integration Testing: During this phase the different modules are integrated in a
planned manner. The modules making up a system are never integrated in a single shot;
integration is normally carried out through a number of steps. During each integration
step, the partially integrated system is tested.
c) System Testing: Finally, when all the modules have been successfully integrated and
tested, system testing is carried out.
6.2. Software Testing Strategies
The two most prevalent strategies that we applied are black-box testing and white-box
testing.
a) Black-Box testing: Demonstrates that software functions are operational, that input
is properly accepted, and that output is correctly produced.
b) White-Box testing: Examines the fundamental aspects of the system with complete
information about and access to the internal logical structure, code and algorithms.
Many features are still to be added to our project, and many limitations are still to be
corrected. Before the final version of the software is released, alpha testing, beta
testing and acceptance testing can additionally be done.
7. RESULT
7.1. Screenshots
Figure 7.1 Naive Bayes Classifier
Figure 7.2 GUI Layout
Figure 7.3 Detection of normal packets only
Figure 7.4 Detection of anomalous packets only
Figure 7.5 Detection of both normal and anomalous packets
7.2. Comparison with Other Existing System
Our INIDS can be compared with an existing IDS such as Snort, which is regarded as a
standard intrusion detection system. Snort is signature-based, whereas our system is
machine-learning-based. For known attacks, Snort performs better, whereas for unknown
attacks, our system performs better. Snort is configured from the command line, whereas
our system provides a GUI for configuration, which makes our system easier to use.
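Figure 7.6 Accuracy of known attack (bar chart comparing INIDS and SNORT on a Low to High scale)
Figure 7.7 Accuracy of unknown attack (bar chart comparing INIDS and SNORT on a Low to High scale)
Figure 7.8 Ease of Use (bar chart comparing SNORT and INIDS on a Low to High scale)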
8. CONCLUSIONS AND FURTHER WORK
8.1. Conclusions
We completed a project on the detection of network intrusions based on the naive Bayes
algorithm. Using learning techniques, the completed system can detect novel attacks that
are not detected by the existing system, Snort. Although Snort provides high accuracy, it
is more time-consuming to maintain because it requires regular updates. Our system can
detect intrusions more efficiently and with less time spent on such updates.
Through this project we learned to work as a team, to divide tasks and to cooperate on
them. The successful work not only made us feel proud but also made us good companions.
In this way we completed our project successfully.
8.2. Further Work
Our system works only on IPv4 networks. In future, it can be extended to IPv6 networks.
We have analyzed only the packet header, so our system cannot detect exploit-based
intrusions. Payload analysis features could be added to our system in future.
A naive Bayesian network is a restricted network that has only two layers and assumes
complete independence between the information nodes. This poses a limitation to this
research work. To alleviate this problem and so reduce the false positives, active
platform or event-based classification using a Bayesian network may be considered. We
will continue our work in this direction in order to build an efficient intrusion
detection model.
REFERENCES
[1] Bace R. G., Intrusion Detection, Technical Publishing, ISBN 1-57870-185-6, 2002
[2] Lunt T., Detecting intruders in computer systems, Conference on auditing and
computer technology, 1993
[3] Krister Johansen, Stephen Lee, Bayesian Network Intrusion Detection, 2003
[4] MIT Lincoln Laboratory, 1999 DARPA intrusion detection evaluation design and
procedure, DARPA Technical report, Feb 2001
[5] Weijie Chai, Li Li, Anomaly Detection Using TCP Header Information, April 26th,
2004
[6] Peyman Kabiri, Gholam Reza Zargar, Category-Based Selection of Effective
Parameters for Intrusion Detection, September 2009, VOL.9
[7] Matthew V. Mahoney, Philip K. Chan, Packet Header Anomaly Detection for
Identifying Hostile Network Traffic, 2001
[8] Christopher Kruegel, Darren Mutz, William Robertson, Bayesian Event Classification
for Intrusion Detection, 2003
[9] Mrutyunjaya Panda, Manas Ranjan Patra, Network Intrusion Detection Using
Naive Bayes, December 2007, VOL.7
[10] Roland Kwitt, A Statistical Anomaly Detection Approach for Detecting Network
Attacks, December 14th, 2004
[11] Kevin P. Murphy, Naive Bayes classifiers, October 24, 2006
[12] Salem Benferhat, Abdelhamid Boudjelida, Habiba Drias, An Intrusion Detection
Approach Based on Tree Augmented Naive Bayes and Expert Knowledge
[13] Daniel Barbara, Ningning Wu, Sushil Jajodia, Detecting Novel Network Intrusions
Using Bayes Estimators, 2001
[14] Kristopher Kendall, A Database of Computer Attacks for the Evaluation of Intrusion
Detection Systems, June 1999
[15] Markus Lang, Implementation of Naive Bayesian Classifiers in Java
Some Relevant Websites
http://jnetpcap.com/docs/javadoc/jnetpcap-javadoc/index.html
http://genie.sis.pitt.edu/wiki/Introduction_to_jSMILE
http://en.wikipedia.org/wiki/Naive_Bayes_classifier
http://en.wikipedia.org/wiki/Denial-of-service_attack
http://www.irchelp.org/irchelp/nuke/
http://www.autonlab.org/tutorials/naive.html
http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
http://www.topbits.com/network-attacks.html
APPENDIX A: RFCs
Table A.1: RFCs for each protocol
Protocol | RFC
ARP and RARP | 826, 903, 925, 1027, 1293, 1329, 1433, 1868, 1931, 2390
BGP | 1092, 1105, 1163, 1265, 1266, 1267, 1364, 1392, 1403, 1565, 1654, 1655, 1665, 1771, 1772, 1745, 1774, 2283
BOOTP and DHCP | 951, 1048, 1084, 1395, 1497, 1531, 1532, 1533, 1534, 1541, 1542, 2131, 2132
CIDR | 1322, 1478, 1479, 1517, 1817
DHCP | See BOOTP and DHCP
DNS | 799, 811, 819, 830, 881, 882, 883, 897, 920, 921, 1034, 1035, 1386, 1480, 1535, 1536, 1537, 1591, 1637, 1664, 1706, 1712, 1713, 1982, 2065, 2137, 2317, 2535, 2671
FTP | 114, 133, 141, 163, 171, 172, 238, 242, 250, 256, 264, 269, 281, 291, 354, 385, 412, 414, 418, 430, 438, 448, 463, 468, 478, 486, 505, 506, 542, 553, 624, 630, 640, 691, 765, 913, 959, 1635, 1785, 2228, 2577
HTML | 1866
HTTP | 2068, 2109
ICMP | 777, 792, 1016, 1018, 1256, 1788, 2521
IGMP | 988, 1054, 1112, 1301, 1458, 1469, 1768, 2236, 2357, 2365, 2502, 2588
IMAP | See SMTP, MIME, POP, IMAP
IP | 760, 781, 791, 815, 1025, 1063, 1071, 1141, 1190, 1191, 1624, 2113
IPv6 | 1365, 1550, 1678, 1680, 1682, 1683, 1686, 1688, 1726, 1752, 1826, 1883, 1884, 1886, 1887, 1955, 2080, 2373, 2452, 2463
MIB | See SNMP, MIB, SMI
MIME | See SMTP, MIME, POP, IMAP
Multicast Routing | 1584, 1585, 2117, 2362
NAT | 1361, 2663, 2694
OSPF | 1131, 1245, 1246, 1247, 1370, 1583, 1584, 1585, 1586, 1587, 2178, 2328, 2329, 2370
POP | See SMTP, MIME, POP, IMAP
RARP | See ARP and RARP
RIP | 1131, 1245, 1246, 1247, 1370, 1583, 1584, 1585, 1586, 1587, 1722, 1723, 2082, 2453
SCTP | 2960, 3257, 3284, 3285, 3286, 3309, 3436, 3554, 3708, 3758
SMI | See SNMP, MIB, SMI
SMTP, MIME, POP, IMAP | 196, 221, 224, 278, 524, 539, 753, 772, 780, 806, 821, 934, 974, 1047, 1081, 1082, 1225, 1460, 1496, 1426, 1427, 1652, 1653, 1711, 1725, 1734, 1740, 1741, 1767, 1869, 1870, 2045, 2046, 2047, 2048, 2177, 2180, 2192, 2193, 2221, 2342, 2359, 2449
TCP | 675, 700, 721, 761, 793, 879, 896, 1078, 1106, 1110, 1144, 1145, 1146, 1263, 1323, 1337, 1379, 1644, 1693, 1901, 1905, 2001
TELNET | 137, 340, 393, 426, 435, 452, 466, 495, 513, 529, 562, 595, 596, 599, 669, 679, 701, 702, 703, 728, 764, 782, 818, 854, 855, 1184, 1205, 2355
TFTP | 1350, 1782, 1783, 1784
UDP | 768
VPN | 2547, 2637, 2685
WWW | 1614, 1630, 1737, 1738
APPENDIX B: UDP and TCP Ports
Table B.1: List of UDP and TCP ports
Port Number | UDP/TCP | Protocol
7 | TCP | ECHO
13 | UDP/TCP | DAYTIME
19 | UDP/TCP | CHARACTER GENERATOR
20 | TCP | FTP-DATA
21 | TCP | FTP-CONTROL
23 | TCP | TELNET
25 | TCP | SMTP
37 | UDP/TCP | TIME
67 | UDP | BOOTP-SERVER
68 | UDP | BOOTP-CLIENT
69 | UDP | TFTP
70 | TCP | GOPHER
79 | TCP | FINGER
80 | TCP | HTTP
109 | TCP | POP-2
110 | TCP | POP-3
111 | UDP/TCP | RPC
161 | UDP | SNMP
162 | UDP | SNMP-TRAP
179 | TCP | BGP
520 | UDP | RIP
APPENDIX C: ICMP Messages
Table C.1: List of permitted ICMP messages
Type | Code | Description
0 - Echo Reply | 0 | Echo reply (used to ping)
1 and 2 | | Reserved
3 - Destination Unreachable | 0 | Destination network unreachable
 | 1 | Destination host unreachable
 | 2 | Destination protocol unreachable
 | 3 | Destination port unreachable
 | 4 | Fragmentation required, and DF flag set
 | 5 | Source route failed
 | 6 | Destination network unknown
 | 7 | Destination host unknown
 | 8 | Source host isolated
 | 9 | Network administratively prohibited
 | 10 | Host administratively prohibited
 | 11 | Network unreachable for TOS
 | 12 | Host unreachable for TOS
 | 13 | Communication administratively prohibited
4 - Source Quench | 0 | Source quench (congestion control)
5 - Redirect Message | 0 | Redirect Datagram for the Network
 | 1 | Redirect Datagram for the Host
 | 2 | Redirect Datagram for the TOS & network
 | 3 | Redirect Datagram for the TOS & host
6 | | Alternate Host Address
7 | | Reserved
8 - Echo Request | 0 | Echo request
9 - Router Advertisement | 0 | Router Advertisement
10 - Router Solicitation | 0 | Router discovery/selection/solicitation
11 - Time Exceeded | 0 | TTL expired in transit
 | 1 | Fragment reassembly time exceeded
12 - Parameter Problem: Bad IP header | 0 | Pointer indicates the error
 | 1 | Missing a required option
 | 2 | Bad length
13 - Timestamp | 0 | Timestamp
14 - Timestamp Reply | 0 | Timestamp reply
15 - Information Request | 0 | Information Request
16 - Information Reply | 0 | Information Reply
17 - Address Mask Request | 0 | Address Mask Request
18 - Address Mask Reply | 0 | Address Mask Reply
19 | | Reserved for security
20 through 29 | | Reserved for robustness experiment
30 - Traceroute | 0 | Information Request
31 | | Datagram Conversion Error
32 | | Mobile Host Redirect
33 | | Where-Are-You (originally meant for IPv6)
34 | | Here-I-Am (originally meant for IPv6)
35 | | Mobile Registration Request
36 | | Mobile Registration Reply
37 | | Domain Name Request
38 | | Domain Name Reply
39 | | SKIP Algorithm Discovery Protocol, Simple Key-Management for Internet Protocol
40 | | Photuris, Security failures
41 | | ICMP for experimental mobility protocols such as Seamoby [RFC4065]
42 through 255 | | Reserved
APPENDIX D: CD Contents
a) Source Codes
b) Readme