
Detecting Worms and Abnormal Activities with NetFlow, Part 1

Yiming Gong 2004-08-16


Editor's note: a French translation of this article (PDF) is also available.

Enterprise networks face ever-increasing security threats from worms, port scans, DDoS attacks,
and network misuse, so effective monitoring approaches that can quickly detect these
activities are greatly needed. Firewalls and intrusion detection systems (IDS) are the most
common ways to detect these activities, but additional technology such as NetFlow can be a
valuable enhancement.

1. NetFlow overview

NetFlow is a traffic profile monitoring technology developed by Darren Kerr and Barry Bruins
at Cisco Systems back in 1996. A de facto industry standard, NetFlow defines a method for a
router to export statistics about routed socket pairs, and it is now a built-in feature of most
Cisco routers, as well as of routers and switches from Juniper, Extreme and some other
vendors.

When a network administrator enables NetFlow export on a router interface, packets received
on that interface are aggregated into "flows", and the traffic statistics of each flow are stored
in a dynamic flow cache.

1.1 What is "flow"?

A flow is defined as a unidirectional sequence of packets between two endpoints (which means
there are two flows for each connection session: one from the client to the server, and one
from the server to the client). A flow is identified by seven key fields: source IP address,
destination IP address, source port number, destination port number, protocol type, type of
service (ToS), and the router input interface. Each time the router receives a packet, it
examines these seven fields and makes a decision: if the packet belongs to an existing flow,
the traffic statistics of that flow are updated; otherwise, a new flow entry is created.
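To make the seven-field key concrete, here is a minimal Python sketch of a flow cache. It assumes packets arrive as dictionaries with hypothetical field names (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `protocol`, `tos`, `input_if`, `length`, `tcp_flags`). It only illustrates the lookup-or-create logic described above; it is not how a router actually implements its cache.

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # cumulative OR of every TCP flag seen in this flow


def flow_key(pkt):
    """The seven key fields that identify a unidirectional flow."""
    return (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"],
            pkt["protocol"], pkt["tos"], pkt["input_if"])


def update_cache(cache, pkt):
    """Account one packet in the cache; return True if it created a new flow entry."""
    key = flow_key(pkt)
    created = key not in cache
    if created:
        cache[key] = FlowStats()
    stats = cache[key]
    stats.packets += 1
    stats.octets += pkt["length"]
    stats.tcp_flags |= pkt.get("tcp_flags", 0)
    return created
```

Because the key includes the source and destination separately, the reply packets of a connection land in a second flow entry, which is exactly the "two flows per session" behavior described above.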

According to Cisco, as new flows are continuously created, expired flow records are exported
in UDP packets to a user-specified monitoring station. A flow record expires when any of the
following conditions occurs:

• The transport protocol indicates that the connection is completed (TCP FIN), and there
is a small delay to allow for the completion of the FIN acknowledgment handshaking.
• Traffic inactivity exceeds 15 seconds.
• For flows that remain continuously active, flow cache entries expire every 30 minutes
to ensure periodic reporting of active flows.
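The three expiry conditions can be sketched as a single check. This assumes per-flow timestamps in seconds; the one-second `FIN_GRACE` value is a hypothetical stand-in for the router's "small delay", and real routers have their own configurable timers.

```python
INACTIVE_TIMEOUT = 15      # seconds of inactivity before a flow expires
ACTIVE_TIMEOUT = 30 * 60   # long-lived flows are exported every 30 minutes
FIN_GRACE = 1              # hypothetical delay to let the FIN handshake complete
TCP_FIN = 0x01


def expiry_reason(first_seen, last_seen, tcp_flags, now):
    """Return why a cache entry should be exported at time `now`, or None."""
    if tcp_flags & TCP_FIN and now - last_seen >= FIN_GRACE:
        return "fin"
    if now - last_seen > INACTIVE_TIMEOUT:
        return "inactive"
    if now - first_seen >= ACTIVE_TIMEOUT:
        return "active"
    return None
```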

A number of network hardware vendors have implemented their own versions of NetFlow, but
Version 5 is now the most common. Each V5 UDP datagram contains one flow header and up to
thirty flow records. Every flow record is made up of several fields, including: the source and
destination IP addresses, the next-hop address, the input and output interface numbers, the
number of packets in the flow, the total bytes in the flow, the source and destination ports,
the protocol, the ToS, the source and destination AS numbers, and the TCP flags (the
cumulative OR of all TCP flags seen in the flow).
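For readers who want to look at raw export packets, the following is a sketch of a V5 datagram decoder using Python's `struct` module. It assumes the standard V5 layout (a 24-byte header followed by 48-byte records); the names in `RECORD_FIELDS` are my own labels, and only the version and record count are taken from the header.

```python
import socket
import struct

V5_HEADER = struct.Struct("!HHIIIIBBH")                # 24-byte export header
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")  # 48-byte flow record

RECORD_FIELDS = ("srcaddr", "dstaddr", "nexthop", "input", "output",
                 "packets", "octets", "first", "last", "srcport", "dstport",
                 "pad1", "tcp_flags", "prot", "tos", "src_as", "dst_as",
                 "src_mask", "dst_mask", "pad2")


def parse_v5(datagram):
    """Decode one NetFlow v5 export datagram into a list of flow-record dicts."""
    version, count = V5_HEADER.unpack_from(datagram, 0)[:2]
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version {version})")
    if not 1 <= count <= 30:
        raise ValueError(f"unexpected record count {count}")
    if len(datagram) < V5_HEADER.size + count * V5_RECORD.size:
        raise ValueError("truncated datagram")
    records = []
    for i in range(count):
        offset = V5_HEADER.size + i * V5_RECORD.size
        rec = dict(zip(RECORD_FIELDS, V5_RECORD.unpack_from(datagram, offset)))
        for field in ("srcaddr", "dstaddr", "nexthop"):
            rec[field] = socket.inet_ntoa(rec[field])  # 4 raw bytes -> dotted quad
        records.append(rec)
    return records
```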

On the collection station, a flow file analyzer is needed to process the exported flow data in
real time. It can be either commercial software/hardware or a station created with open
source tools.

1.2 NetFlow versus intrusion detection systems

Looking through a flow record, you will find that it contains no packet payload information.
This is one of the major differences between NetFlow and a traditional IDS. A flow record
contains no higher-layer information, only a traffic profile. NetFlow therefore cannot dig
deeply into packets or perform any payload analysis, yet there is still enough information to
draw valuable conclusions from the data. The advantage of this approach is speed: ignoring
packet payloads greatly reduces processing overhead and makes NetFlow an extraordinarily
good fit for busy, high-speed network environments. In addition, because it does not depend
on known payload signatures, NetFlow can be very useful for detecting zero-day or "mutant"
attacks in cases where signature-based intrusion detection systems would fail.

Because flow data comes directly from the routers, which are core elements of any large
network, NetFlow can provide a unique view of the entire traffic of a network at the
infrastructure level. It also enables proactive detection of security events affecting the
network infrastructure.

If analyzed properly, NetFlow records are well suited to the early detection of worms and other
abnormal network activity in large enterprise networks and service provider networks. In this
paper, I will discuss several flow-based analysis methods for network security.

2. Flow-based analysis methods

2.1 Top N and Baseline

A baseline is a model describing what 'normal' network activity is according to some historical
traffic pattern; all traffic that falls outside the scope of this established traffic pattern will be
flagged as anomalous.

Trend and baseline analysis reports, commonly referred to as Top N and Baseline Analysis, are
the most common and basic method of flow-based analysis. With this approach, attention is
paid to flow records that have "special high volume" characteristics, especially flow field
values that deviate significantly from an established historical baseline.
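One simple way to express "deviates significantly from an established baseline" is a mean-plus-k-standard-deviations test. The article does not prescribe a particular statistic, so treat the choice of `k`, the `min_sigma` floor, and the function name below as illustrative assumptions.

```python
from statistics import mean, pstdev


def deviates_from_baseline(history, current, k=3.0, min_sigma=1.0):
    """Return True if `current` exceeds the historical mean by more than k deviations.

    history   -- past measurements of one metric (e.g. flows per hour from one host)
    min_sigma -- floor on the deviation so a perfectly flat history still tolerates noise
    """
    if len(history) < 2:
        return False  # not enough history to form a baseline
    threshold = mean(history) + k * max(pstdev(history), min_sigma)
    return current > threshold
```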

Normally there are two ways to make use of Top N and Baseline methods: Top N sessions and
Top N data.
2.1.1 Top N session

A Top N session occurs when a single host produces an abnormally high volume of connection
requests to a single destination or a block of destinations, and that volume departs from the
established baseline. The most likely causes of such activity are new worms, DoS/DDoS
attacks, network scans, or certain kinds of network abuse.

Normal clients connecting to the Internet keep a relatively steady connection frequency to the
outside. A worm-infected host behaves very differently: it continually launches a large
number of connection requests to the outside in its attempts to infect the next batch of
victims, so the number of connection requests it sends out is significantly higher.

For the same reason, when a lesser-skilled "script kiddie" scans a large block of addresses for
certain vulnerable services, we will see an especially high volume of sessions sent out from
that single IP address.
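A Top N session ranking can be sketched by counting flows per source address. This assumes flow records have already been decoded into dictionaries with hypothetical `srcaddr` and `dstaddr` keys; counting distinct destinations alongside the flow count is my own addition, intended to help tell a scanner (many targets) apart from a busy client (few targets).

```python
from collections import Counter, defaultdict


def top_n_sessions(flows, n=10):
    """Rank source hosts by the number of flows they originate.

    Returns (source, flow_count, distinct_destinations) tuples, busiest first.
    """
    sessions = Counter(f["srcaddr"] for f in flows)
    targets = defaultdict(set)
    for f in flows:
        targets[f["srcaddr"]].add(f["dstaddr"])
    return [(src, count, len(targets[src]))
            for src, count in sessions.most_common(n)]
```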

We can also use Top N session methods to detect many kinds of network abuse, for example by
counting, in real time, the port 25 connection requests sent out by every host. If, over a given
interval, a host's port 25 request count is above a 'normal' value, the host may be a spammer
or may be infected with some kind of email worm. It would be better for the Internet as a
whole if service providers started using this technique and shut down spammers upon
detection.
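The port 25 check described above might look like the following sketch, assuming decoded flow dictionaries (hypothetical keys `srcaddr`, `prot`, `dstport`) that cover one time interval, and a threshold chosen from the local baseline.

```python
from collections import Counter

SMTP_PORT = 25
TCP = 6


def smtp_suspects(flows, threshold):
    """Return {source: count} for hosts whose port 25 flows in this interval exceed threshold."""
    counts = Counter(f["srcaddr"] for f in flows
                     if f["prot"] == TCP and f["dstport"] == SMTP_PORT)
    return {src: c for src, c in counts.items() if c > threshold}
```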

2.1.2 Top N data

A second way of using Top N and Baseline methods is Top N data. This can be defined as a
consistently large amount of network data transferred during a certain period of time between
two network nodes, or from a single node to a block of addresses.

In an enterprise, the Top N hosts that transfer data to or from the outside usually form a
relatively fixed group. If this pattern changes and a new host suddenly appears in the Top N
list, an alert should be triggered.
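The Top N data idea can be sketched in two parts: rank hosts by octets for the current interval, then report any host that is not in the usual Top N group. The flow layout (`srcaddr`, `octets`) and the way the usual group is stored are assumptions for illustration; only the octets a host sends are counted, which matches egress-only collection.

```python
from collections import Counter


def top_talkers(flows, n=20):
    """Return the n source hosts that sent the most octets, largest first."""
    octets = Counter()
    for f in flows:
        octets[f["srcaddr"]] += f["octets"]
    return [host for host, _ in octets.most_common(n)]


def new_entrants(current_top, usual_top):
    """Hosts in this interval's Top N that are absent from the usual Top N group."""
    usual = set(usual_top)
    return [h for h in current_top if h not in usual]
```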

Here is an example of how Top N data methods were used to track down a network security
problem. One day, one of our customers reported a bandwidth usage and congestion problem.
We quickly enabled NetFlow on their upstream router's interface to collect egress traffic from
their network, and had the flow data sent to our monitoring station. A few minutes later, a
flow file had been created. We analyzed the file with flow-tools to generate a usage report for
the top 20 hosts, sorted by octets. When the results were displayed on the console, we noticed
that the host in first place had an abnormally high octet count. Further examination of its flow
records showed that the host was sending out a huge number of requests to destination port
1434, which gave us the answer: the host was infected with the SQL Slammer worm, which had
eaten up almost all of the customer's available bandwidth. After the customer patched the
vulnerable machine, their network connection recovered.
2.2 Pattern Matching

Pattern matching is another method we can use to identify abnormal network activities when
doing flow-based analysis. With this method, the flow records will be searched and those hosts
associated with flow fields that seem "suspicious" based on our criteria will be flagged.

All the flow fields in a flow record can be used to do a pattern match, but the source and
destination IP addresses, and the source and destination port numbers, are the most
commonly used.

2.2.1 Port matching

Generally speaking, almost every attack must target a specific, functional port. For example,
the SQL Slammer worm works on port 1434, and the Netbus Trojan works on port 12345.
Administrators can select all flow records whose destination port matches such a port in order
to find the corresponding attacks. This method is very easy to implement and can be used in
most cases, although it may also produce false positives, because legitimate traffic can use the
same ports.
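Port matching reduces to a lookup table. In this sketch the watchlist is keyed by (protocol, destination port); pairing SQL Slammer with UDP and Netbus with TCP is my own addition, and the flow keys `prot` and `dstport` are hypothetical.

```python
TCP, UDP = 6, 17

# (protocol, destination port) -> label for the matching attack
SUSPICIOUS_PORTS = {
    (UDP, 1434): "SQL Slammer",
    (TCP, 12345): "Netbus",
}


def match_ports(flows, watchlist=SUSPICIOUS_PORTS):
    """Yield (flow, label) for every flow whose (protocol, destination port) is watched."""
    for f in flows:
        label = watchlist.get((f["prot"], f["dstport"]))
        if label is not None:
            yield f, label
```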

2.2.2 IP address matching

IP address matching is another method that can be used for security purposes with NetFlow
analysis. There are several ways to make an IP address match, such as the following:
(A) Match IANA reserved addresses

The IANA has reserved large blocks of Internet address space which should not be
used for global routing. If we find any flow record containing IANA reserved addresses,
an alert should be triggered.

An important fact that the administrator must realize when performing IANA reserved
address matches is that, if the sending host is using spoofed IP addresses, the source
address in the flow record cannot be used to trace it back. In that case another flow field,
the input interface index (Ifindex), should be used: checking the router Ifindex number in
the flow records reveals the actual router interface the flow came from.
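A sketch of reserved-address matching with Python's `ipaddress` module follows. The block list is a short, illustrative subset rather than the IANA registry, and it includes private ranges, so it only makes sense on an Internet-facing interface. Results are grouped by the input interface (a hypothetical `input` key holding the Ifindex), as suggested above.

```python
import ipaddress
from collections import Counter

# Illustrative, incomplete list of blocks that should never appear as Internet sources.
RESERVED_BLOCKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4", "240.0.0.0/4")]


def is_reserved(addr):
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RESERVED_BLOCKS)


def reserved_by_interface(flows):
    """Count flows with a reserved source address per router input interface.

    The spoofed source cannot be traced, but the Ifindex shows where the flow entered.
    """
    return Counter(f["input"] for f in flows if is_reserved(f["srcaddr"]))
```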

I once experienced an interesting case in which one of our customers' NetFlow records looked
strange: they showed a large number of connections whose source ports were all 80, whose
source addresses were all 127.0.0.1, and whose TCP flags were all RST/ACK.

The following is an example of flow-tools output:

    Sif  SrcIPaddress  DIf  DstIPaddress     Pr  SrcP  DstP  Pkts  Octets  StartTime          EndTime            Active  B/Pk  Ts  Fl
    0059 127.0.0.1     005b 219.140.194.174  06  50    4f3   1     40      0721.21:58:00.593  0721.21:58:00.593  0.000   40    00  14
    0059 127.0.0.1     005b 219.148.205.228  06  50    6ef   1     40      0721.21:57:56.533  0721.21:57:56.533  0.000   40    00  14

The source port (SrcP) is 50 in hex, which equals 80 in decimal. The TCP flags field (Fl) is 14
in hex, or 010100 in binary, which means the RST and ACK bits are set. Since the source IP
address (SrcIPaddress) is a spoofed 127.0.0.1, where is the attacker coming from?

Using the router Ifindex (Sif) field in the flow records, we quickly identified the router
interface these packets came from. I informed the administrator in charge of the network on
that interface, and after a little while he responded with the answer: a PC in his domain had
been compromised and had a DoS program installed on it. The program was designed to
launch TCP port 80 DoS attacks with spoofed source IP addresses against a security website
located in Guangdong, China, but the DNS A record of the website had been changed to
127.0.0.1. As a result, the attack packets were received by the PC itself, which answered each
one with a RST/ACK sent to the spoofed source IP addresses.
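The hex decoding in this case study can be reproduced with a couple of helper functions. The flag bit values are the standard TCP header bits; the function names are hypothetical.

```python
# Standard TCP header flag bits, highest first.
TCP_FLAG_NAMES = [(0x20, "URG"), (0x10, "ACK"), (0x08, "PSH"),
                  (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN")]


def hex_field(text):
    """flow-tools prints some fields (SrcP, DstP, Fl) in hex, e.g. '50' or '14'."""
    return int(text, 16)


def decode_tcp_flags(flags):
    """Turn the cumulative TCP flags byte of a flow record into flag names."""
    return [name for bit, name in TCP_FLAG_NAMES if flags & bit]
```

For the records above, `hex_field("50")` gives source port 80 and `decode_tcp_flags(hex_field("14"))` gives ACK and RST.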

(B) Match a special IP or IP list

There are always some default rules that any enterprise or ISP can apply when performing
flow-based anomaly detection. Some of those rules are based on:

o Outbound traffic

For an enterprise or ISP, any outbound flow record whose source IP address is not
part of its own network address space should be considered abnormal.

o Inbound traffic

For an enterprise or ISP, any inbound flow record whose source IP address is part
of its own network address space should be considered abnormal.

o Fixed addresses

Some kinds of abnormal activity always contact one or more fixed IP addresses.
For example, when the W32/Netsky.c worm spreads, it sends DNS queries to the
following DNS servers:

145.253.2.171, 151.189.13.35, 193.141.40.42, 193.189.244.205,
193.193.144.12, 193.193.158.10, 194.25.2.129, 194.25.2.130,
194.25.2.131, 194.25.2.132, 194.25.2.133, 194.25.2.134,
195.185.185.195, 195.20.224.234, 212.185.252.136, 212.185.252.73,
212.185.253.70, 212.44.160.8, 212.7.128.162, 212.7.128.165,
213.191.74.19, 217.5.97.137, 62.155.255.16

Therefore, any flow record whose destination address is in this list and whose
destination port is UDP 53 should raise an alert, and further analysis is then
needed.
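The three rules above can be combined into one per-flow check. This is a sketch under several assumptions: the prefixes in `OUR_NETWORKS` are documentation placeholders for your own address space, the Netsky list is copied from the one above, and the caller is assumed to know whether a flow is outbound or inbound (for example, from the interface it was exported on).

```python
import ipaddress

# Hypothetical address space of "our" network; replace with the real prefixes.
OUR_NETWORKS = [ipaddress.ip_network("192.0.2.0/24"),
                ipaddress.ip_network("198.51.100.0/24")]

NETSKY_DNS = {
    "145.253.2.171", "151.189.13.35", "193.141.40.42", "193.189.244.205",
    "193.193.144.12", "193.193.158.10", "194.25.2.129", "194.25.2.130",
    "194.25.2.131", "194.25.2.132", "194.25.2.133", "194.25.2.134",
    "195.185.185.195", "195.20.224.234", "212.185.252.136", "212.185.252.73",
    "212.185.253.70", "212.44.160.8", "212.7.128.162", "212.7.128.165",
    "213.191.74.19", "217.5.97.137", "62.155.255.16",
}


def is_ours(addr):
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in OUR_NETWORKS)


def check_flow(flow, direction):
    """Return the names of the rules a flow violates; direction is 'outbound' or 'inbound'."""
    alerts = []
    if direction == "outbound" and not is_ours(flow["srcaddr"]):
        alerts.append("spoofed-outbound-source")
    if direction == "inbound" and is_ours(flow["srcaddr"]):
        alerts.append("spoofed-inbound-source")
    if flow["prot"] == 17 and flow["dstport"] == 53 and flow["dstaddr"] in NETSKY_DNS:
        alerts.append("netsky-dns-query")
    return alerts
```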

3.0 Concluding part one

This concludes the first part of our two-part series. Check back in two weeks' time, when we'll
continue the discussion of NetFlow. In part two, we'll look at how to filter our flow results via
TCP flags, discuss some ICMP issues, and then discuss some of the tools that exist to help
implement and analyze a NetFlow solution. Stay tuned.

Detecting Worms and Abnormal Activities with NetFlow, Part 2


Yiming Gong 2004-09-23
1. NetFlow review

In the first part of this article series, we looked at what NetFlow is and how it can be used in
the early detection of worms, spammers, and other abnormal network activity for large
enterprise networks and Internet service providers. The article discussed some of the most
common methods of flow-based analysis: Top N, Baseline and Pattern Matching techniques.

In this second and final part of the article, we'll look at three additional methods of analyzing
flow data: filtering our flow results via TCP flags to get a more granular view of network
abnormalities, examining ICMP messages, and monitoring special zones in the enterprise.
We'll then look at some of the tools that exist to help implement and analyze a NetFlow
solution. Let's get started.

2. TCP flags for NetFlow

One difficult task in flow-based analysis is that the administrator must evaluate a very large
number of flow records. If he relies only on the Top N, baseline and pattern matching
methods, he will get only a coarse view of network abnormalities. We have often seen
moderately active worms and other abnormal activities that are nearly invisible amid the
immense amount of legitimate traffic typically found in a large enterprise network. Those
malicious hosts will not show up in the Top N lists, nor will we know in advance which key
fields and values to 'grep' for, yet they are still malicious hosts that must be dealt with.

To identify abnormalities more effectively and accurately, we need a better way to narrow
down the flow records to be analyzed. Fortunately, for most types of TCP-based worms and
other abnormalities, another field in the flow records is very useful: the TCP flags.

Worms, by their replicating nature, are programmed to find as many victims as possible.
Typically they send out hundreds or even thousands of probes to large blocks of IP addresses
in a very short period of time. If a worm is designed to spread via TCP (as most are), it will
send out a large number of TCP SYN packets during propagation as it looks for vulnerable
services on other hosts.

2.1 A typical worm's SYN scan process

On the path a worm takes as it makes its way out of the corporate network, there are three
possible results of its SYN scan:

1. The first possibility is that the destination host is alive, and the corresponding
vulnerable service that was targeted is running.

As we all know, the three-way handshake that starts a normal TCP connection
involves:

o First, the client sends a SYN packet to the destination host.
o The destination host replies with a SYN/ACK packet.
o The client acknowledges the SYN/ACK with an ACK packet.
o The connection is established.

Figure 1, below, illustrates this handshake.

Figure 1: destination host is living and the TCP port is open

When a SYN packet arrives at an open port on the destination host, the host responds
to the worm's SYN request, regardless of whether the service running on that port is
vulnerable. The standard TCP three-way handshake is therefore completed, and
subsequent packets carrying other TCP flags such as PUSH and ACK follow. In the first
part of this article we mentioned that a V5 NetFlow record contains the cumulative OR
of the TCP flags seen in the flow. Therefore, with the NetFlow approach we should
expect to see a combination of TCP flags such as ACK/PUSH/SYN/FIN or ACK/SYN/FIN
in the flow records for both directions of the connection.

2. The second possible result of the worm's SYN scan is that the destination host it
attempts to connect to is not alive, as shown below in Figure 2.
Figure 2: connect to a 'dead' destination host

Because the destination host is not alive, the SYN requests sent out by the worm
receive no response. From the NetFlow perspective, we will see a flow record from the
worm host to the destination host in which only the SYN bit is set.

3. The third possible result is that the destination host is alive but the port the worm is
targeting is closed, as shown in Figure 3.

Figure 3: destination host with closed TCP port

This means the destination host is alive, but the port the worm is trying to connect to is
closed. As we know, to accept a TCP connection a server must be listening on that
particular port. If a client connects to a port on which the server is not listening, the
server sends back a RST/ACK packet. Under normal TCP implementations, the client
abandons the connection attempt as soon as it receives the RST. From the NetFlow
angle, the flow record from the worm host to the destination host will show only the
SYN flag.
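The three outcomes can be summarized as a small classifier over cumulative TCP flags. This sketch assumes you can see both the outbound flow and, where one exists, the matching reply flow; finding the reply flow (by swapping the address and port fields) is omitted, and the return strings are illustrative.

```python
SYN, RST, ACK = 0x02, 0x04, 0x10


def classify_probe(out_flags, reply_flags=None):
    """Classify a TCP probe from the cumulative flags of its outbound flow and
    of the matching reply flow (None when no reply flow was observed)."""
    if out_flags & SYN and out_flags & ACK:
        return "case 1: port open, handshake completed"
    if out_flags == SYN and reply_flags is None:
        return "case 2: no reply, destination not alive"
    if out_flags == SYN and reply_flags is not None and reply_flags & RST:
        return "case 3: port closed, RST/ACK returned"
    return "unclassified"
```

Cases 2 and 3 both leave a SYN-only outbound flow, which is why the next section can look at SYN-only records without needing the reply direction at all.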

So far I have discussed three possible outcomes when a worm-infected host SYN-scans the
network. There is one important fact to remember about worms: when they try to propagate,
the destination addresses are typically generated at random, so a large proportion of the
targets will be hosts that are not alive or ports that are closed. Therefore, in the outgoing
traffic we should expect to see a large number of flow records with only the SYN bit set
associated with the worm-infected host. This characteristic is very useful and can serve as the
key to flow-based detection of worms and other TCP-based abnormalities. In section 2.2,
below, we will discuss this detection process.

2.2 Three steps to process a flow file for TCP flags

When doing flow-based analysis, a captured flow file can be processed by the following steps:

1. The first step is to search through the flow file, select all flow records that have only
the SYN bit set, extract the source IP address of each record, count the occurrences of
every unique IP, and finally sort the addresses by count. This gives us a list of
potentially malicious hosts. The administrator can set a threshold depending on the
network size and traffic volume: hosts whose counts are above the threshold should be
considered potentially malicious, and those below it should be considered benign.
2. The second step is to search through the flow file again and extract all flow records
whose source IP address is in the "potentially malicious" list generated in step 1. This
second pass gives us a detailed connection table for every potential host. The results
will be used in the third and final step, and will help us further identify the behavior of
our suspicious hosts.
3. The third step gives us some very meaningful data about the worm-infected hosts on
our network. Read the output of step 2 and, for each host, count the number of
appearances of every unique destination port. Sort by the number of occurrences, and
we get a table of each IP address and its most active destination ports. The following
is example output from a small shell script, written by the author, that performs this
task.

    potential host1: 61.236.123.225
    ------
    84 times probe on dstport 1025
    76 times probe on dstport 80
    72 times probe on dstport 2745
    64 times probe on dstport 3127
    48 times probe on dstport 6129

For a malicious host that keeps trying to connect to one or several specific ports, we can
identify the cause by checking which services are registered on the most active destination
ports in this table. In the example above, host 61.236.123.225 was infected with the
W32.gaobot.sa worm: it was scanning for the MyDoom backdoor on port 3127, the Bagle
backdoor on port 2745, and the Dameware port 6129, which are clearly characteristic of
W32.gaobot.sa.
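The three steps can be sketched in Python as follows. This is not the author's shell script; it assumes decoded flow dictionaries with hypothetical keys (`srcaddr`, `prot`, `tcp_flags`, `dstport`) and treats a flow as SYN-only when its cumulative flags byte equals 0x02. The `report` function only mimics the output format shown above.

```python
from collections import Counter, defaultdict

SYN_ONLY = 0x02  # cumulative flags byte with only the SYN bit set
TCP = 6


def find_scanners(flows, threshold):
    # Step 1: count SYN-only flows per source; keep hosts above the threshold.
    syn_counts = Counter(f["srcaddr"] for f in flows
                         if f["prot"] == TCP and f["tcp_flags"] == SYN_ONLY)
    suspects = {src for src, c in syn_counts.items() if c > threshold}

    # Step 2: a second pass collects every flow sourced by a suspect.
    table = defaultdict(list)
    for f in flows:
        if f["srcaddr"] in suspects:
            table[f["srcaddr"]].append(f)

    # Step 3: count destination ports per suspect, most frequent first.
    return {src: Counter(f["dstport"] for f in host_flows).most_common()
            for src, host_flows in table.items()}


def report(port_table):
    for i, (src, ports) in enumerate(port_table.items(), start=1):
        print(f"potential host{i}: {src}")
        print("------")
        for port, n in ports:
            print(f"{n} times probe on dstport {port}")
```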

If our goal is to evaluate worm activity using the traffic coming from outside our network into
the inside, we should look at inbound flow records that have the TCP flags RST/ACK set. We
know that a closed port sends back a RST/ACK in response to a TCP connection request, as
was shown in Figure 3. If an infected internal host is scanning a large block of living hosts for a
certain vulnerability, the hosts with closed ports will send back RST/ACKs. So for ingress
traffic, if a destination (not source!) host in the flow records receives too many RST/ACK
responses, the administrator should investigate that destination IP carefully, as it is very likely
infected with a worm.
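A sketch of the ingress RST/ACK check, assuming inbound flow dictionaries with hypothetical keys `dstaddr`, `prot`, and `tcp_flags`, and a locally chosen threshold:

```python
from collections import Counter

RST_ACK = 0x14  # ACK (0x10) | RST (0x04)
TCP = 6


def rst_ack_targets(ingress_flows, threshold):
    """Internal hosts receiving more than `threshold` RST/ACK-only flows, busiest first."""
    counts = Counter(f["dstaddr"] for f in ingress_flows
                     if f["prot"] == TCP and f["tcp_flags"] == RST_ACK)
    return [host for host, c in counts.most_common() if c > threshold]
```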

3. ICMP issues

One of the purposes of ICMP is to provide feedback about problems in the communication
environment of a network. The ICMP type and code in flow records can sometimes also help
us locate potentially malicious hosts.

The first thing to note is that, as the field list in part 1 of this article series implies, no flow
field is directly named ICMP type or ICMP code. Some people have therefore suggested that
the type and code of ICMP messages cannot be identified in captured flow data because that
information is not recorded. Does that mean we cannot use ICMP type/code for flow-based
analysis? No. ICMP type and code are in fact recorded in the NetFlow data; they are simply
stored in the destination port field of the flow record.

With flow-tools, to obtain the ICMP type and code numbers we just need to check the
destination port field (dstPort). The value holds the type in its high byte and the code in its
low byte, so it is easiest to read in hex: if it is displayed in decimal, convert it to hex first.

The following is example output of flow-tools in which the dstPort is in decimal.

    srcIP            dstIP            prot  srcPort  dstPort  octets  packets
    135.169.9.116    137.54.111.144   1     0        2048     28      1
    135.169.9.116    137.62.249.241   1     0        2048     28      1
    135.230.255.66   136.129.9.27     1     0        769      112     2
    135.32.252.50    136.129.9.27     1     0        769      56      1

We can see that the protocol field (prot) is 1, which means ICMP. The destination port 2048 is
800 in hex: the 8 is ICMP type 8, and the 00 is the code, 0. So we can conclude that 800 is an
ICMP echo request. In the same way, 769 is 301 in hex, which is ICMP type 3, code 1: ICMP
host unreachable.
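The conversion is just a split of the 16-bit field into its high and low bytes. The helper names and the small name table below are illustrative:

```python
ICMP_NAMES = {(0, 0): "echo reply", (3, 0): "network unreachable",
              (3, 1): "host unreachable", (3, 3): "port unreachable",
              (8, 0): "echo request"}


def icmp_type_code(dstport):
    """Split the destination-port field of an ICMP flow into (type, code)."""
    return dstport >> 8, dstport & 0xFF


def describe_icmp(dstport):
    icmp_type, code = icmp_type_code(dstport)
    name = ICMP_NAMES.get((icmp_type, code), "other")
    return f"type {icmp_type} code {code} ({name})"
```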

There are two interesting types of ICMP packets that can be used for flow-based abnormal
detection when analyzing a network's ingress traffic. It is also possible to use pattern
matching methods to do ICMP flow analysis, as we will demonstrate.

3.1 ICMP destination unreachable

According to the ICMP specification (RFC 792), if the destination network or destination host is
unreachable, the gateway may send a destination unreachable message to the source host, as
shown below in Figure 4.

Figure 4: destination unreachable

3.2 ICMP port unreachable

For UDP requests, hosts with closed ports may send back ICMP port unreachable messages to
the source host. A worm spreading over UDP may therefore trigger many ICMP port
unreachable messages in the returned traffic, which appear as flow records. This is shown
below in Figure 5.
Figure 5: destination host with closed UDP port

If the flow records show that a host receives an abnormal volume of ICMP port, host, or
network unreachable messages, that may indicate that the host is behaving abnormally.
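Counting unreachable messages per recipient might look like the sketch below. It assumes inbound flow dictionaries (hypothetical keys `prot`, `dstport`, `dstaddr`, `packets`), decodes type and code from the destination port field as described earlier, and counts ICMP type 3 codes 0 (network), 1 (host) and 3 (port). The threshold is left to the administrator.

```python
from collections import Counter

ICMP = 1
DEST_UNREACHABLE = 3
UNREACHABLE_CODES = {0: "network", 1: "host", 3: "port"}


def unreachable_recipients(ingress_flows, threshold):
    """Hosts that received more than `threshold` ICMP unreachable packets."""
    counts = Counter()
    for f in ingress_flows:
        if f["prot"] != ICMP:
            continue
        icmp_type, code = f["dstport"] >> 8, f["dstport"] & 0xFF
        if icmp_type == DEST_UNREACHABLE and code in UNREACHABLE_CODES:
            counts[f["dstaddr"]] += f["packets"]
    return {host: c for host, c in counts.items() if c > threshold}
```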

3.3 Pattern matching methods

Another ICMP-based flow analysis method is pattern matching. Some worms and network
attacks use ICMP, as we saw with the W32.Nachi.worm. A host infected with this worm sends
ICMP echo requests to the outside with a fixed length of 92 bytes. So we simply need to select
the flow records with ICMP type 8 whose packets are 92 bytes long, and hosts infected with
this worm will be caught.
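The Nachi filter can be sketched as follows. Because a flow record only stores total octets and packets, this sketch assumes every packet in a matching flow is 92 bytes and tests `octets == 92 * packets`; the flow keys are hypothetical.

```python
ICMP = 1
ECHO_REQUEST = 8 << 8  # type 8, code 0, as encoded in the destination-port field
NACHI_PACKET_SIZE = 92


def nachi_suspects(flows):
    """Sources of ICMP echo-request flows whose packets are all 92 bytes long."""
    hosts = set()
    for f in flows:
        if (f["prot"] == ICMP and f["dstport"] == ECHO_REQUEST
                and f["packets"] > 0
                and f["octets"] == NACHI_PACKET_SIZE * f["packets"]):
            hosts.add(f["srcaddr"])
    return hosts
```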

4. Special zones in the Enterprise

In enterprise networks there are always some servers that are protected with firewalls.
Normally these servers should be hardened and have only fixed ports open to outside. Except
for those fixed ports, any connection established between the server and outside should be
prohibited.

We could use this characteristic to monitor the security of the servers using NetFlow.

4.1 Ingress traffic

If we find any flow record whose destination IP is a server IP, whose destination port is not in
the server's list of functional ports, and whose TCP flags contain ACK (but are not RST/ACK),
an alert should be triggered.

Such a flow record may indicate two things. First, something is wrong with the firewall in
front of the server, since it has allowed a connection that should have been prohibited to be
established. One exception is a packet from outside that incorrectly carries only an ACK flag;
even so, such traffic should not have appeared. Second, the flow record may indicate that the
server has an unexpected port open to the outside!

4.2 Egress traffic

When we see any flow record whose source IP is a server IP, whose source port is not in the
server's list of functional ports, and whose TCP flags are not RST/ACK, an alert should be
triggered.

If, in addition, such a flow is carrying data, a red alert should be raised immediately! It is quite
possible that the server has been broken into: perhaps a backdoor has been activated, or a
new service has been enabled.
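Both zone rules can be expressed as one check per TCP flow. The server inventory in `SERVER_PORTS` is a hypothetical example using documentation addresses, and "not RST/ACK" is interpreted here as "the cumulative flags are not exactly RST/ACK". The red-alert escalation for flows that carry data is left out of this sketch.

```python
ACK, RST = 0x10, 0x04
RST_ACK = RST | ACK
TCP = 6

# Hypothetical inventory: server address -> TCP ports it is supposed to serve.
SERVER_PORTS = {"192.0.2.25": {25}, "192.0.2.80": {80, 443}}


def zone_alert(flow, direction):
    """Apply the ingress (4.1) and egress (4.2) rules to one TCP flow record."""
    if flow["prot"] != TCP:
        return None
    flags = flow["tcp_flags"]
    if direction == "ingress" and flow["dstaddr"] in SERVER_PORTS:
        allowed = SERVER_PORTS[flow["dstaddr"]]
        if flow["dstport"] not in allowed and flags & ACK and flags != RST_ACK:
            return "ingress: connection to unlisted port"
    if direction == "egress" and flow["srcaddr"] in SERVER_PORTS:
        allowed = SERVER_PORTS[flow["srcaddr"]]
        if flow["srcport"] not in allowed and flags != RST_ACK:
            return "egress: traffic from unlisted port"
    return None
```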

5. Implementation guidelines

So far we have discussed several methods for flow-based detection of worms and other
abnormal network activities, but no detailed implementation instructions have been provided.
In practice, if you follow the main points of the methods discussed in this article series,
implementation is fairly easy.

For instructions on how to enable NetFlow on a specific router, readers can check the
corresponding manufacturer's website. Some example NetFlow configurations for popular
Cisco and Juniper routers can be found at:
http://www.splintered.net/sw/flow-tools/docs/flow-tools-examples.html

Although there are both commercial and open source solutions for flow file analysis, the
author prefers the open source solutions. Commercial products normally have built-in, fixed
functions that are difficult to extend. Most importantly, commercial flow analysis tools don't
offer the flexibility of the existing open source options.

For open source solutions we have many choices, such as cflowd, SiLK, and flow-tools. All of
these work quite well on many different UNIX platforms.

cflowd
cflowd is the classic traffic flow analysis tool. It can be found at
http://www.caida.org/tools/measurement/cflowd; note, however, that it is no longer
supported by CAIDA, so consider one of the other tools below.

SiLK
SiLK is a collection of NetFlow tools developed by CERT/AC to facilitate security
analysis in large networks. It consists of a packing system and an analysis suite. SiLK
gives administrators great flexibility in processing flow data, but in my opinion it still
needs some revisions and enhancements to run more smoothly. SiLK can be found at
http://silktools.sourceforge.net.

Flow-tools
Flow-tools is a powerful and helpful program for NetFlow-related work. Several add-ons are
available, and overall it provides greater flexibility and control than many of the other tools.
Flow-tools can be found at http://www.splintered.net/sw/flow-tools/.

In addition to these, there are some other programs such as FlowScan and CUFlow which can
be used for flow-based analysis work. All of these can be considered to be valuable tools.

6. Summary

This article series has discussed the flow-based detection of worms and abnormal activities.
Part one introduced the basic concepts of NetFlow and presented the first two of five
flow-based analysis methods. Part two discussed the remaining three. In summary, the five
methods are Top N and Baseline, Pattern Matching, TCP flags, ICMP issues, and special zones
for large enterprises. With these methods, network administrators can detect network-wide
abnormalities much more effectively.

There is no silver bullet for security detection on large network infrastructure, but with
NetFlow we may attain further insight into the traffic crossing our entire network -- and make
it run better.

About the author

Yiming Gong has worked for China Telecom for more than 5 years as a senior system
administrator, and now he works as a Technical Manager in China Telecom System Integration
Co.Ltd. He also has a personal homepage focusing on network/system security.