
Chapter 10

DNS-Based Botnet Detection

Introduction
This chapter discusses the detection of bots and botnets using the Domain Name System (DNS).[1] The first section below provides some essential background on aspects of the DNS protocol relevant to botnet detection. The subsequent section discusses how to design botnet detection heuristics using DNS, and presents selected case studies and tools.

Background
The Domain Name System (DNS) is a world-wide distributed database that stores information about named Internet resources. Although DNS holds many types of information about domains (for example, mappings between domain names and IP addresses, name servers, mail servers, and canonical names), just a few details are relevant to botnet detection.

Here, we present a simplified overview of DNS, with a focus on recursive queries for A and CNAME records. Readers looking for a more detailed analysis of DNS are directed to RFC 1034, RFC 1035, and the numerous books on DNS.[2]


DNS Overview
As Don Marti once observed, DNS is a consensus reality.[3] The mappings between any particular domain name and IP address depend on which server is queried, when, and whether it performs caching, forwarding, or is an authority server. Network caches and application-layer caching (either through an OS stub resolver or a user application) can also affect mappings. Because of local variations in caching behavior, it is entirely likely that different hosts will receive different answer sets for the same domain. The DNS infrastructure, and resolvers like BIND,[4] seek to minimize this; however, botmasters have crafted their networks to leverage this potential.



[1] See P. Mockapetris, "RFC 1034: Domain Names - Concepts and Facilities," Nov. 1987, http://www.faqs.org/rfcs/rfc1034.html, and P. Mockapetris, "RFC 1035: Domain Names - Implementation and Specification," Nov. 1987, http://www.faqs.org/rfcs/rfc1035.html

[2] Cricket Liu & Paul Albitz, DNS and BIND, 5th Ed., O'Reilly, 2006; Ron Aitchison, Pro DNS and BIND, Apress, New York, 2005.

[3] Don Marti, "[linux-elitists] ICANN frenzy!", March 2001, http://zgp.org/pipermail/linux-elitists/2001-March/001716.html

[4] Internet Systems Consortium, Berkeley Internet Name Domain (BIND), 2006, http://www.isc.org/index.pl?/sw/bind/
Figure 10.1: Typical propagation of a botnet, and resulting DNS usage.




To illustrate key principles of DNS relevant to botnet detection, let's consider the following scenario. A botmaster releases a virus, which spreads randomly. The virus forces each victim to rally by joining a command-and-control (C&C) service, hosted at the domain evil.example.com.[5] From there, the botmaster may make use of the victims for other purposes, e.g., spamming, phishing, identity theft, or DDoS.

Figure 10.1 illustrates the propagation of the malware, written by the botmaster and designated as VX in the diagram. Each victim in turn infects others, creating a victim cloud. Since the virus also forces victims to contact the C&C server (perhaps an IRC server, a web server, or a P2P network), infected individuals must perform DNS lookups of evil.example.com. In Figure 10.1, this is depicted for a single victim, with a dashed line showing an A-record query. The botmaster, who owns or has license to use the domain, can control the DNS resolution at the authority server. Thus, if the C&C service is taken down, the botmaster merely has to update the DNS mapping to point to a new C&C server. Likewise, if network administrators block access to the IP address of the C&C site, the botmaster merely has to migrate or renumber the C&C's IP address.

Figure 10.1 therefore represents the general pattern of infection seen in many botnets. (There are of course variations that use a different cycle of infection.) Note that Figure 10.1 shows only a simplified view of the DNS traffic. During the growth of a botnet, there are several distinct phases to the botnet's DNS traffic. Figure 10.2 shows a more detailed view of DNS resolution, while still omitting several possible scenarios. First, the host performing a lookup may consult the stub resolver, which may have a local cache. (This scenario is discussed in detail below.) If we presume the host cache does not contain the mapping for evil.example.com, the host then sends a DNS request (here, we presume an A-record query) to a recursive server. (Again, we omit the possibility of an iterative, or non-recursive, query.)


[5] The example.com, example.net, and example.org domains are reserved under RFC 2606 for use in documentation. In this chapter, we'll use the fictitious third-level domain evil.example.com as an example of a domain associated with botnet activity. See D. Eastlake & A. Panitz, "RFC 2606: Reserved Top Level DNS Names," June 1999, http://www.faqs.org/rfcs/rfc2606.html

Caching resolvers perform lookups, and store the results for a prescribed period of time, the TTL period.[6] If we presume the caching resolver does not have a cached answer (and further presume that, at the time the host's request arrives, it has only cached the addresses of the root servers), then the caching server sends a request to the root servers.[7] Since the domain (evil.example.com) is not part of the root zone (.), and the com. zone has been delegated to another DNS server, the root servers cannot reply with an answer, and instead give the address of other name servers: the com. TLD servers.[8] The recursive server then sends a query to the TLD servers. Since the query is for a host in the example.com zone, which has been further delegated, the TLD server returns the address of the example.com name server. The example.com server is, in this hypothetical example, the start of authority (SOA) for the zone, and provides the requested record to the recursive server. This answer is then sent in reply to the host's request, and cached by both the recursive server and the stub resolver. Additionally, all of the intermediary answers (e.g., the address mappings of the TLD and SOA servers) are cached as well.

One can also examine many of these steps by using the dig utility, executed in trace mode. For example, the command

dig maps.google.com +trace

will run dig in trace mode, in which dig performs the iterative lookups itself and prints each intermediate step. For example, the trace will show the steps executed in finding the com. servers, finding the google.com zone servers, and then ultimately locating the appropriate A-record or CNAME response.
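
For readers who prefer to script this walk, the following rough sketch uses the dnspython package to mimic what dig +trace does: start at a root server and follow referrals until an answer appears. The package, the hard-coded root server address, and the simplifications (no retries, no EDNS0, and an assumption that glue A records accompany each referral) are illustrative choices, not part of the text above.

# A rough sketch (Python, dnspython package assumed): walk the delegation
# chain the way dig +trace does, starting from a root server.
import dns.message
import dns.query
import dns.rdatatype

def trace(qname, server="198.41.0.4"):       # a.root-servers.net
    while True:
        query = dns.message.make_query(qname, dns.rdatatype.A)
        response = dns.query.udp(query, server, timeout=5)
        if response.answer:                   # final A or CNAME answer
            for rrset in response.answer:
                print(rrset)
            return
        # Otherwise this is a referral: collect glue addresses, then pick a
        # name server from the authority section that has glue.
        glue = {}
        for rrset in response.additional:
            if rrset.rdtype == dns.rdatatype.A:
                for rdata in rrset:
                    glue[rrset.name.to_text().lower()] = rdata.address
        next_server = None
        for rrset in response.authority:
            if rrset.rdtype == dns.rdatatype.NS:
                for ns in rrset:
                    name = ns.target.to_text().lower()
                    if name in glue:
                        print("referral -> %s (%s)" % (name, glue[name]))
                        next_server = glue[name]
                        break
            if next_server:
                break
        if next_server is None:
            print("no usable glue; giving up")
            return
        server = next_server

trace("maps.google.com")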


[6] The TTL period of a DNS cache is different from the hop-count lifetime or TTL field found in routing. The TTL period is prescribed by the authority name server for the zone. Caching servers generally, but not always, follow the recommended caching time.

[7] The alert reader might spot the possible chicken-and-egg problem in this setup. A freshly booted caching server would consult a root zone hints file, often a static file distributed with a DNS server, to learn the addresses of the root servers. One can obtain a copy of the hints file at http://www.internic.net/zones/named.root

[8] Separately, one can inspect the root's zone files by sending an AXFR request to a root server. For example, the command dig @f.root-servers.net . axfr will list all zone entries at the root. BIND 8 previously shipped with a useful script, $BIND8/contrib/misc/normalize_zone.pl, to format this output; however, this was removed in BIND 9.

Figure 10.3: Distributed victim networks, some in diurnal low phases, and the impact of their recursive servers on
authority DNS servers.


Interacting with dig will further show how Figure 10.2 greatly simplifies matters. The diagram does not consider EDNS0[9] responses, truncation, and other DNS traffic and scenarios that routinely occur. But this general view is useful to understand how world-wide epidemics of infections drive DNS resolution patterns, and affect caching refreshes from different servers.

Consider Figure 10.3, which shows the world-wide spread of our hypothetical botnet to various different networks. Each network has a different recursive resolver (depicted in Figure 10.3 on the edge of each network cloud). Victims within each network drive patterns of lookups directed at the caching resolver. When the caching resolver's local cache fails (through timeout), the network's DNS servers will, eventually, consult the start of authority (SOA) for a given domain. Note that many of the victims are located in different time zones, and might therefore generate less activity at night.[10] Since botnets usually have victims scattered around the world, recursive timeouts and authority refresh lookups arrive in rolling waves, depending on the number of victims in local areas. This observation has resulted in models that describe botnet growth patterns, based on time zones.[11]


Stub Caching

Most operating systems provide a minimal DNS resolution service for use by applications. In most cases, both negative (i.e., NXDOMAIN)[12] and positive results are stored. For example, on Windows, the dnsrslvr.dll and dnsapi.dll libraries are used by most applications to resolve domain names. Previously resolved domains (both successful and unsuccessful) are stored by the host OS. This improves performance, since the host does not need to use the network to look up recently resolved domains.

[9] Paul Vixie, "Extension Mechanisms for DNS (EDNS0)," Aug. 1999, http://www.faqs.org/rfcs/rfc2671.html

[10] In many countries, electricity costs and local customs are such that machines are powered down at night. Upon reboot, the victims require new DNS resolutions, and refresh the local recursive server's cache entries.

[11] See David Dagon, Cliff Zou & Wenke Lee, "Modeling Botnet Propagation Using Time Zones," in Proceedings of the 13th Annual Network and Distributed System Security Symposium, 2006.

[12] M. Andrews, "Negative Caching of DNS Queries (DNS NCACHE)," March 1998, http://www.faqs.org/rfcs/rfc2308.html

In some cases, particular applications operate their own DNS cache, on top of the host's stub resolver. Most prominently, Microsoft Internet Explorer used to cache domains[13] for 24 hours (in IE 3.x), and more recently does so for 30 minutes (in IE 4.x, 5.x, and 7.x).[14] Similarly, Firefox and Mozilla-based browsers cached DNS answers for 15 minutes (and more recently, since 2004, for 1 minute), regardless of the TTL value.

In many cases, researchers may need to control the caching behavior of the stub resolver and user applications. On Windows, this is typically done by running

ipconfig /flushdns

after using ipconfig /displaydns to confirm the local DNS cache contents. On Mac OS X, one can simply use lookupd -flushcache. Other Unixes, by default, do not cache DNS answers obtained by gethostbyname(3), gethostbyname_r(3), or other <netdb.h> functions. A restart generally flushes the various caching utilities and daemons, e.g., nscd, named.
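
As a convenience, the commands above can be wrapped in a small helper so that a measurement script always starts from a cold stub cache. This is a sketch assuming Python; it simply shells out to the platform commands named in the text.

# Flush the OS stub-resolver cache before a resolution experiment (sketch).
import platform
import subprocess

def flush_stub_cache():
    system = platform.system()
    if system == "Windows":
        subprocess.run(["ipconfig", "/flushdns"], check=True)
    elif system == "Darwin":
        # Per the text above; applies to lookupd-based versions of Mac OS X.
        subprocess.run(["lookupd", "-flushcache"], check=True)
    # Most other Unixes do not cache gethostbyname() results in the stub;
    # restart nscd or named if such a daemon is in use.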

Windows provides a variety of registry keys to control host-based DNS caching, including those for IE.[15] Table 10.1 provides a listing of many relevant registry keys that affect stub resolver behavior. For Firefox and Mozilla-class browsers, one merely browses to about:config and selects New > Integer, creating a property called network.dnsCacheExpiration with an integer value of 0 (zero).[16]


Researchers are not alone in their need to occasionally flush local DNS caches. Most bots include a primitive capability to flush stub DNS caches, either through a forked execution of ipconfig /flushdns, or by using the DnsFlushResolverCache* functions in the dnsapi.dll library. Botmasters often flush the stub resolver's cache to increase their bots' network agility. A typical implementation appears in Code Listing 1, sampled from a common rBot source tree.

This particular code block originated in the rBot family, but is now common to hundreds of bots. It essentially locates the address of the DnsFlushResolverCache() family of functions, and uses them to clear the cache. This side-steps the need to adjust the registry, and avoids forking a secondary process to invoke ipconfig. In many cases, this ability to flush the stub resolver's cache is exposed in the bot's instruction API. Thus, with a single command, a botmaster can remove any stale DNS entries in the victim host stubs.

Caching Resolvers

With many exceptions, recursive servers provide DNS services to local networks. With the exception of open recursive servers,[17] the clients populating a recursive server's cache lines should tend to be those found within a local network (e.g., clients who obtain a DHCP lease, and have their /etc/resolv.conf settings provided by the DHCP daemon). Thus, it is often the case that recursive caches reflect the resolution behavior of the local user population.

[13] Note that the browser's DNS cache is completely different from its local cache of web site content.

[14] Microsoft, Inc., "How Internet Explorer uses the cache for DNS host entries," Nov. 2004, http://support.microsoft.com/kb/263558 Internet Explorer 6.x only cached DNS CNAME responses, not A-records. Prior to XP SP2, if a resolution was pending, each click by a user would add another 2 minutes to the TTL period of a cached record, potentially creating an infinite cache.

[15] Microsoft, Inc., "How to Disable Client-Side DNS Caching in Windows XP and Windows Server 2003," Dec. 2005, http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B318803

[16] Gordon Sheridan, "Network.dnsCacheExpiration," June 2001, http://kb.mozillazine.org/Network.dnsCacheExpiration One can similarly add user_pref("network.dnsCacheExpiration", 0); to the user's prefs.js file.

[17] John Kristof, "DNS - Open Recursive Name Server Probing," 2006, http://condor.depaul.edu/~jkristof/orns/




Table 10.1: DNS Cache Registry Settings

DWORD: MaxNegativeCacheTtl
Value: 0 (default 900 sec; 15 min.)
Comment: Time an NXDOMAIN response is cached; 0 eliminates negative caching.

DWORD: MaxCacheEntryTtlLimit
Value: 0 (default 86400 sec; 1 day)
Comment: Maximum DNS cache time.

DWORD: NetFailureCacheTime
Value: 0 (default 30 sec)
Comment: How long a DNS client stops sending queries when the network is down.

DWORD: NegativeSOACacheTime
Value: 0 (default 120 sec)
Comment: The time an NXDOMAIN response from an SOA is cached.

Settings for the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters, and how they affect local caching behavior. Researchers may need to adjust these keys when investigating DNS problems, or configuring honeypots. Note that many bots also adjust these settings from their defaults to improve bot performance.
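
As an illustration only, the following sketch (assuming Python's standard winreg module and administrative rights) sets two of the values from Table 10.1 to zero, as one might when configuring a honeypot; the choice of values and the restart note are assumptions, not a prescription from the table.

# Disable positive and negative stub caching on Windows (sketch).
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Dnscache\Parameters"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                    winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "MaxCacheEntryTtlLimit", 0, winreg.REG_DWORD, 0)
    winreg.SetValueEx(key, "MaxNegativeCacheTtl", 0, winreg.REG_DWORD, 0)

# The DNS Client (Dnscache) service generally must be restarted, or the host
# rebooted, before the new values take effect.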


Code Listing 1: DNS cache flushing in rBot

/* loaddlls.cpp */
// Dynamically load dnsapi.dll and resolve the cache-flush entry points.
// (The DFRC/DFRCEA function-pointer typedefs and the fDns* globals are
// declared elsewhere in the bot source.)

HMODULE dnsapi_dll = LoadLibrary("dnsapi.dll");

if (dnsapi_dll) {
    fDnsFlushResolverCache = (DFRC)GetProcAddress(dnsapi_dll,
        "DnsFlushResolverCache");
    fDnsFlushResolverCacheEntry_A = (DFRCEA)GetProcAddress(dnsapi_dll,
        "DnsFlushResolverCacheEntry_A");
    if (!fDnsFlushResolverCache || !fDnsFlushResolverCacheEntry_A)
        nodnsapi = TRUE;
} else {
    nodnsapierr = GetLastError();
    nodnsapi = TRUE;
}

//...

/* netutils.cpp */
// Flush the Windows stub resolver cache via the DnsFlushResolverCache()
// export, if it was located above.
BOOL FlushDNSCache(void)
{
    BOOL bRet = FALSE;
    if (fDnsFlushResolverCache)
        bRet = fDnsFlushResolverCache();
    return (bRet);
}



Figure 10.4: Botnet broken down by country of origin, over time. A clear diurnal pattern appears.


As noted above, botnets tend to have diverse victim populations, spread across different time zones. Figure 10.4 illustrates this point. Victims from a 350,000-member botnet were tracked for activity (e.g., connections to the C&C server), plotted along the Y-axis as SYN rates in one-minute epochs. Since the time line covers several days, a clear diurnal pattern appears. Closer inspection of a few countries shows that those in different time zones are phase shifted by an amount appropriate for their difference in time zones. (Compare, for example, England, in the GMT+0, or Zulu, time zone, with countries in Eastern Europe.) This illustrates how botnet activity is structured in waves of connections, divided by time zones.

Because clients are in diurnal low phases at different times, depending on their country of origin, the DNS activity of these hosts similarly varies. Unless cached by the stub resolver or programmed to do otherwise, bots will have to preface their TCP connections to the C&C server with a (network) DNS lookup. In the context of botnets, recursive servers are less likely to refresh cache entries for malicious domains during off-peak hours. This property will help us design detection heuristics, discussed below.

DNS-Based Botnet Detection
Passive DNS Replication
A weakness in DNS routinely exploited by botnets is the lack of consistency and history. DNS servers merely show the current address of a domain (at least according to the resolver's cache), and not its prior history. While BIND and other DNS tools all offer extensive output capabilities, logging is primarily used in debugging and not in production environments. Similarly, zone transfers are usually permitted only between trusted servers and secondaries, and are generally not available to the larger Internet community.

Thus, even DNS operators often lack information about what records were cached. In many cases, only the zone maintainer is in a position to know the complete history of a domain's mappings.



Figure 10.5: Conceptual diagram of passive DNS sensor deployment, adapted from Weimer's FIRST paper. A sensor witnesses all mappings for a command-and-control server.



Botmasters are keenly aware of this, and routinely move C&C locations. With a large number of C&C servers, botmasters minimize the chance that the network can be disrupted through simple remediation. In some cases, botmasters turn botnets on and off to simulate remediation. For example, a malicious domain's address might be set to a non-routable address (e.g., an RFC 1918 address[18]) for a few hours. When more victims are needed, the domain address is set back to the C&C server. This gives the impression that the domain is remediated (and non-routed). The cycling of IP mappings for a domain complicates take-down efforts, and makes remediation (e.g., simple firewall rule creation) far more difficult.

One way to overcome this technique is to consult a passive DNS replication service.[19] Passive DNS replication was created by Florian Weimer, and constructs partial zone files by observing DNS traffic. The intuitive idea is fairly simple: one merely observes DNS traffic, and stores all completed resolutions. Over time, this approximates a zone file (or more precisely, the relevant portions of a zone file that users required). Since the replication is placed in a database, one can further query not only the current resolution of a domain, but every address seen in an answer set for a domain.
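
The record-keeping half of such a sensor can be quite small. The sketch below assumes Python with its built-in sqlite3 module, and assumes some capture component (not shown) supplies (domain, record type, rdata) tuples parsed from observed answers; the schema and function names are illustrative, not Weimer's implementation.

# Minimal passive-DNS record store (sketch).
import sqlite3
import time

db = sqlite3.connect("pdns.sqlite")
db.execute("""CREATE TABLE IF NOT EXISTS records (
                 domain TEXT, rtype TEXT, rdata TEXT,
                 first_seen INTEGER, last_seen INTEGER,
                 PRIMARY KEY (domain, rtype, rdata))""")

def record_answer(domain, rtype, rdata):
    """Store a resolution, keeping first-seen and last-seen timestamps."""
    now = int(time.time())
    db.execute("""INSERT INTO records VALUES (?, ?, ?, ?, ?)
                  ON CONFLICT(domain, rtype, rdata)
                  DO UPDATE SET last_seen = excluded.last_seen""",
               (domain, rtype, rdata, now, now))
    db.commit()

def history(domain):
    """Every record ever seen in an answer set for the domain."""
    return db.execute("SELECT rtype, rdata, first_seen, last_seen "
                      "FROM records WHERE domain = ?", (domain,)).fetchall()

# Example: after capture has run for a while, list every C&C address
# observed for a suspect domain, e.g., to build firewall rules.
for row in history("evil.example.com"):
    print(row)

Querying history("evil.example.com") then returns every address the sensor has ever seen for the C&C domain, which is exactly the view that defeats the whack-a-mole behavior described next.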

Figure 10.5, adapted from Weimer's paper, shows a conceptual diagram of how passive DNS helps investigators track botnets. In order to reach the command-and-control server, infected hosts perform a DNS lookup of the domain. If we assume a low caching time for the domain, the recursive server for their network eventually contacts the SOA for the domain. (Alternatively, the victims' recursive server could be a forwarder, or otherwise consult another caching server. This is conceptually the same case.) As shown in Figure 10.5, the botmaster has a variety of C&C servers at the ready. Mitigation of one does not stop the botnet, since the botmaster also has the ability to update the DNS entry with the authority server. (This is popularly referred to as the whack-a-mole strategy of survival, after the name of a popular carnival amusement game.)

[18] See IANA, "Special-Use IPv4 Addresses," Sept. 2002, http://www.faqs.org/rfcs/rfc3330.html, and Y. Rekhter, B. Moskowitz, D. Karrenberg, G. J. de Groot & E. Lear, "Address Allocation for Private Internets," Feb. 1996, http://www.faqs.org/rfcs/rfc1918.html

[19] Florian Weimer, "Passive DNS Replication," Apr. 2005, http://www.enyo.de/fw/software/dnslogger/first2005-paper.pdf

A passive DNS sensor, shown in Figure 10.5, is deployed so that it observes the answers provided by the
authority server. The sensor stores all answers from the authority, and not just the most current mapping.
Migration of the C&C server is therefore observable. One can also use the passive DNS database to study the
migration pattern and history of all domains (or at least those domains that users below the recursive server
consulted). Legitimate Internet servers have very good reasons to stay put, and change their IP addresses very
infrequently. In contrast, fraud-oriented servers, such as botnet C&Cs, phishing sites, and drop sites, have every
motivation to change their IP address. Passive DNS gives us a window into this behavior, and is designed to track
changes in DNS mappings.

Investigators can therefore use passive DNS to discover the history of a C&C domain. A key variable is
the amount of DNS traffic generated by the victims, relative to the tendency of the C&C server to migrate. If the
C&C server migrates frequently, one needs enough victims to force a new cache discovery of the changed IP.
Such sensors are most useful when deployed at locations where sufficiently high volumes of traffic are expected.

The University of Stuttgart provides a web interface to a passive DNS service, at:

http://cert.uni-stuttgart.de/stats/dns-replication.php

Users can make a rate-limited number of queries. When investigating a malicious domain, their web interface lets
one query the passive DNS logger database.

At first, it may seem incongruous that a DNS trace from a few networks (many in Germany) would
provide clues to, say, a security investigation in a New Zealand network, or in some other distant part of the
world. But since botnets do not differentiate between their victims, it is very likely that passive DNS sensors on
remote networks will provide useful clues to local investigations. Victims in other networks may provide clues
useful to your own network.

To further illustrate the utility of passive DNS, consider the following approach, taken in response to a botnet alert on a local network.

Assume one discovers an infected host, H, attempting to contact a C&C server, called C1.

o One can remediate host H, but how do you stop other potential victims in the network from reaching the command-and-control site?

o One can of course block access to C1, assuming the domain has no other legitimate uses. But the botmaster can update the C&C site to a new address, C2. And once you discover the new site, they can move the C&C site to any other address, Ci.

How can one discover the other C&C sites, without having your local machines first become victims? The local network administrator often has difficult choices:

o One can discover the binary, and run it in a honeypot to track the botnet. This is difficult, since binaries are often not easily discovered in the early hours of an infection. Further, having this technical capability can be expensive for small networks, and may violate local policies about handling live malware.

o One can instead leave the infected host H connected to the botnet at host C1, and watch what other C&C machines it later reaches. This approach of course risks local assets, and is usually not an option for networks that have taken the trouble to draft security incident response policies.

One can instead consult a passive DNS service, and ask what other mappings were seen for the C&C domain. For a given domain, a passive DNS service can tell what other IP addresses are associated. A firewall rule can block access to the servers at C1, C2, ..., Ci, even if your local victim H has only contacted the first command-and-control site, C1. Similarly, the list of associated IPs for a botnet C&C domain may help one expand a local investigation.

In effect, victims in remote networks become canaries in the mine, and indicate what other IPs are associated with a botnet outbreak. This lets administrators rapidly deploy comprehensive firewall blocking rules. Further, passive DNS logs help remediation. For example, one can locate other victims in the local network by consulting flow logs, to see who has recently contacted any of the C&C's possible IP addresses.

Network administrators battling botnets are encouraged to run passive DNS servers of their own. In most
cases, local network privacy rules would not prevent the sharing of information about remote, third-party
domains. (Passive DNS only shares what answer came from a DNS server, not the time, or the user who requested
the information.) And even if privacy rules prevent the sharing of this information, the additional information
may help local investigations.

Heuristics
The preceding section discussed the use of sensor tools to investigate botnets. Using particular aspects of
the DNS traffic, one can further design additional detection heuristics.

Here, we have a needle-in-the-haystack problem. Many recursive servers handle hundreds of thousands of
packets per second, for thousands of different domains. Differentiating the legitimate domains from the botnet
C&C domains is a complex research problem. There are far too many domains for a human to do this by hand.

In the following sections, we discuss monitoring techniques that help one classify domains as either
benign or suspect. Note that these factors are not decisive, and would not be suitable for an automated response
system. Rather, they are heuristics that help a human expedite a review of suspect domains.

TTL Monitoring

DNS responses optionally have a TTL field, which suggests how long (in seconds) a server and application should cache the address.[20] Note that caching is itself optional (though recommended), and the time given to a cache is also optional. The TTL is carried in a 32-bit field; usable values range from 0 (meaning do not cache) to 2^31 - 1, since RFC 2181 restricts effective TTLs to 31 bits. On authority servers such as BIND, the value can be set in the zone file, where the zone has a default TTL period set by the $TTL directive,[21] and individual records may override it.
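
An illustrative zone-file fragment (all names and values are invented, documentation-style examples) shows the zone-wide default set by $TTL and a per-record override:

$TTL 86400    ; default TTL for records in this zone (one day)
@        IN  SOA  ns1.example.com. hostmaster.example.com. (
                  2006010101 3600 900 604800 300 )
         IN  NS   ns1.example.com.
www  300 IN  A    192.0.2.10   ; explicit 300-second TTL overrides the default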


Many legitimate services use long caching times, e.g., 86400 seconds or more. This relieves the load on authority servers, and generally provides resolvers with shorter network paths to a cache and faster responses. Lengthy cache times are appropriate and often used for legitimate servers because they seldom change IP addresses. This is not universally true, of course. Some legitimate servers (most famously cnn.com, which tends to use 5-minute cache times) opt for a shorter cache time. This provides them flexibility in handling large spikes of exponentially arriving traffic. But these situations tend to be the minority; most legitimate sites use longer TTL periods.

[20] R. Elz & R. Bush, "Clarifications to the DNS Specification," July 1997, http://www.faqs.org/rfcs/rfc2181.html

[21] M. Andrews, "Negative Caching of DNS Queries (DNS NCACHE)," March 1998, http://www.faqs.org/rfcs/rfc2308.html


As discussed above, botmasters tend to migrate C&C servers, to avoid remediation and frustrate take-down efforts. To maximize the number of victims that migrate to a new C&C server, botmasters tend to favor shorter TTL periods for domains. This of course is not universally the case, but botnets that use lengthy TTL periods must keep C&C servers up for at least that length of time, or suffer a loss in victim population with each server migration.

Figure 10.6: (a) A histogram of 443 sampled botnet C&C TTLs.

Figure 10.6(b): A CDF of botnet C&C TTLs. Note the majority of the population is under a few hours.


Thus, while not universally the case, there is nonetheless a subclass of botnets that favor low TTL periods. This is demonstrated in Figure 10.6(a), which shows a distribution of botnet C&C TTLs. The sampling started with 443 active botnets. Botnets already flagged for abuse at the SOA were excluded; these typically had a TTL > 86400. Many of the domains have very short TTL periods for the life of the botnet.

Figure 10.6(b) shows the same population in a cumulative distribution function (CDF) graph. In general, a CDF shows what fraction of an overall distribution falls below a particular threshold. In Figure 10.6(b), we see that 50% of the population had a TTL below 2 hours, and 85% of the population used less than 3 hours. With noted exceptions (e.g., Akamai, cnn.com), TTL periods for legitimate sites are often set for days. Short TTL periods are therefore an indication (but not proof) that a domain is suspicious. At the very least, they help one rank domains, so that an analyst is more productive in a manual review.[22]


Caution should be used when focusing exclusively on short TTL values. First, this parameter is easily manipulated by botmasters. (For example, most dynamic DNS services provide a simple interface to let domain owners adjust TTL values.) As such, botnet detection based solely on TTL values is extremely brittle. Second, it is quite common for legitimate domain owners to shorten TTL values in advance of IP renumberings or server migration. That is, many legitimate domains with long TTL values (e.g., TTL = 86400) will shorten TTLs in advance of network maintenance that results in IP changes, all to minimize disrupting clients.




[22] Machine learning models are also possible, but are beyond the scope of this chapter.




Network administrators observing DNS traffic at their network edge can use short TTLs to prioritize
suspicious domains, and assist investigations. Similarly, researchers who encounter domains gathered from
honeypots and binary analysis should note short TTL periods, since they are a hallmark of suspicious activities.
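
As a concrete starting point, the following sketch (assuming the dnspython package, version 2.x for dns.resolver.resolve) queries the local recursive server and flags answers whose remaining TTL falls under a chosen threshold. Note that an answer served from cache shows the remaining TTL rather than the authority's configured value, so a production ranker would query the authority directly or observe traffic passively; the threshold and domain list here are illustrative.

# Flag domains whose A-record TTL is short (sketch).
import dns.exception
import dns.resolver

SHORT_TTL = 3 * 3600   # three hours, roughly the 85th percentile in Figure 10.6(b)

def advertised_ttl(domain):
    answer = dns.resolver.resolve(domain, "A")   # dnspython >= 2.0
    return answer.rrset.ttl

for domain in ["example.com", "example.net"]:    # illustrative list
    try:
        ttl = advertised_ttl(domain)
    except dns.exception.DNSException as err:
        print("%s: lookup failed (%s)" % (domain, err))
        continue
    label = "SUSPECT (short TTL)" if ttl < SHORT_TTL else "ok"
    print("%s: ttl=%d %s" % (domain, ttl, label))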

Request Rates

As noted above, client populations distributed around the world tend to fall into different time zones, with
different diurnal patterns. Each hour of a day, a new population of victims potentially comes online. This in turn
drives large spikes in recursive traffic. As illustrated in Figure 10.4, these recursive lookups ultimately result in
traffic directed towards an authority server.

To help identify suspicious domains, we can rank and prioritize domains based on their associated request
rates. The theory is that malicious domains (with large numbers of victims) should tend to have a larger volume of
recursive and SOA refreshes. The problem of course is that some legitimate domains also have very high request
rates.

To address this problem, we can look at patterns of resolutions associated with different levels of a domain. We can classify DNS requests as either second-level domain (SLD) requests, such as example.com, or third-level subdomain (3LD) requests, such as foo.example.com. To avoid increased costs and additional risks, botmasters tend to create botnets within 3LDs, all under a common SLD. For example, a botmaster may purchase the string example.com from a registrar, and then also arrange for DNS service for the 3LDs botnet1.example.com, botnet2.example.com, and so on. The botmasters use subdomains in order to avoid creating a new domain with a different SLD for each new botnet, e.g., example1.com, example2.com.

Each transaction to create such a domain involves risk. The seller may be recording the originating IP for the transaction, requiring the botmaster to use numerous stepping stones or proxies. Some registrars are careful about screening and validating the whois contact information provided by the domain purchaser. Some dynamic DNS registrars require phone numbers and other identification. If the purchase is performed with stolen user accounts, there is a further risk of being caught. Since many DNS providers offer subdomain packages (e.g., a few free subdomains with DNS service), this allows the botmaster to reuse their purchased domain, and minimize both their costs and risk.

Botmasters see another advantage in using subdomains. Even if service to a 3LD is suspended, service to other 3LDs within the same SLD is usually not disrupted. So, if botnet1.example.com is blocked, traffic to normaluser.example.com and botnet2.example.com is not disrupted. This lets botmasters create multiple, redundant DDNS services for their networks, all using the same SLD.

Figure 10.7: Comparison of Canonical DNS Request Rates

By comparison, most normal users usually do not employ subdomains when adding subcategories to an existing site. For example, if a legitimate company owns example.com, and wants to add subcategories of pages on their web site, they are more likely to expand the URL (e.g., example.com/products) instead of using a 3LD subdomain (e.g., products.example.com). This lets novice web developers create new content cheaply and quickly, without the need to perform complicated DNS updates (and implement virtual host checking in the web server) following each change to a web site.

This is, of course, essentially a sociological observation about how botmasters and normal users behave
when creating subdomains and domain content. There will be exceptions, and the behavior of both groups can
also change. But the motivating factors (risk, cost, and convenience) should persist. We therefore assume that, in
the large, this observation may hold for a class of botnets (but certainly not all).

This fact helps us design a simple detection system. We can score domains based on the number of sibling and child domain lookups that occur. Thus, we can penalize the ranking of domains using the traffic volumes sent to sister domains. For example, if one observes large amounts of legitimate traffic to google.com, and large volumes of botnet traffic to botnet1.example.com and botnet2.example.com, we can sift out the botnets by scoring the parent zone, example.com, based on the traffic directed at its children. One can think of this as ranking families of domains, based on the amount of traffic sent to the parent zone's subtree. Similarly, it mirrors some of the analysis provided by dig when run in trace mode, as discussed above. Logically, this must start at the SLD level.
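
A minimal sketch of this family-scoring idea follows, assuming Python; the SLD extraction is deliberately crude (it ignores country-code TLD conventions), and the ratio-based score is an illustrative stand-in for whatever weighting a production system would use.

# Group query names by SLD and score each family by its subdomain traffic (sketch).
from collections import defaultdict

def sld_of(name):
    """Crude SLD extraction: last two labels of the name."""
    labels = name.lower().rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else name

def score_families(queries):
    """queries: iterable of fully qualified names seen in DNS requests."""
    total = defaultdict(int)   # all lookups under the SLD
    deep = defaultdict(int)    # lookups for 3LD-or-deeper names
    for name in queries:
        sld = sld_of(name)
        total[sld] += 1
        if name.lower().rstrip(".").count(".") >= 2:   # e.g. foo.example.com
            deep[sld] += 1
    # Rank families by the share of traffic going to subdomains of the SLD.
    return sorted(((deep[s] / total[s], total[s], s) for s in total), reverse=True)

sample = ["www.example.com", "botnet1.example.com", "botnet2.example.com",
          "example.com", "www.google.com"]
for ratio, volume, sld in score_families(sample):
    print("%-15s subdomain-ratio=%.2f volume=%d" % (sld, ratio, volume))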

Figure 10.7 shows an application of this technique. After monitoring DNS traffic at a busy service
provider for several weeks, approximately 1.28 million DNS requests were sampled. Fig. 10.7 shows the average
lookup rate for normal hosts, in requests per hour.

When SLD domain traffic is placed into a canonical form (based on the volume of traffic directly to subdomains), it becomes much easier to distinguish the normal and bot traffic. Since the botnet traffic tends to favor a family of related domains (e.g., botnet1.example.com, botnet2.example.com), ranking the domains based on the traffic to a particular subtree helps separate the signal (bot traffic) from the noise (normal traffic).

Once again, it's important to note that this heuristic is not by itself a complete classifier. Further, one must appreciate that botmasters are human adversaries, and always have a chance to respond to any detection system. This particular heuristic is based more on risk factors that influence a botmaster's decision process, rather than empirical (and potentially brittle) observations. As such, it may prove more useful and resilient than other detection strategies.

Conclusion
This chapter has discussed key properties of DNS, how botnets affect DNS traffic, and what DNS-based
tools and heuristics are useful in detecting botnets.

Botnets generate large waves of DNS traffic. This is dampened by the impact of caching, both at the host
application/stub level, and at the recursive level. The discussion above noted how researchers may need to adjust
local caching behavior, and how botnets already do the same.

The detection and remediation of botnets is assisted by DNS sensors as well. Passive DNS in particular
provides researchers an opportunity to provide more comprehensive responses. By logging all addresses
associated with a domain, passive DNS lets administrators expand investigations, and implement more complete
remediations.

This chapter also discussed how some heuristics can be used to identify suspicious domains. For example, low TTL values and weighted traffic volumes can help priority-rank domain traffic. While not applicable to all botnets, these approaches have some demonstrated utility. The reader is urged to follow the example used in designing these heuristics. Researchers should consider properties that are inherent in a botnet's behavior, and less likely to change over time.

Solutions Fast Track

How do Botnets Use DNS?

o Botmasters frequently create multiple, redundant command-and-control centers. By manipulating the DNS entry for the C&C domain, botmasters can migrate victims between different C&C centers.

o Because caching delays the propagation of a new IP for a C&C center, botmasters seek to minimize cache times. Bots frequently flush the stub and application caches.

o Further, botmasters minimize recursive DNS server cache times by setting a low TTL period for the C&C domain.

o Because botnets often have victims all over the world, victims in different time zones generate traffic in waves. This includes DNS resolutions, which can be reduced by caching behavior.

Using DNS to Assist Botnet Response

o To detect the multiple IP addresses often associated with a botnet domain, one can use a passive DNS service. Passive DNS stores all resolutions associated with a domain. This lets one learn not only where a C&C is located, but where it used to be located.

o Local administrators may also run their own passive DNS collection logger.

o Because of the large numbers of domains found in most DNS traces, heuristics are needed to rank-order or prioritize suspicious domains.

Using DNS to Detect Botnets

o To increase the network agility of a botnet, botmasters favor short TTL periods, which limit the time a domain is cached by a host or caching DNS server. Empirical evidence suggests the vast majority of bot-oriented domains are cached for only a few hours. With a few noted exceptions, most legitimate domains are cached for a period of days.

o Since botnets often have many victims, large volumes of DNS traffic are associated with botnets, particularly at authority DNS servers. Many legitimate domains also experience large volumes of traffic.

o To help distinguish the two, one can score the traffic for subdomains. For example, traffic to a domain can be weighted by the traffic directed to sibling domains.

o These detection techniques are merely heuristics. Botmasters are human adversaries, and can respond to detection strategies.
Frequently Asked Questions
Q: What is DNS caching?
A: Answers from DNS servers are often stored in caching DNS servers, in stub resolver (host OS) cache lines, or
by particular applications, such as IE and Firefox. When a host performs a DNS lookup, it consults any
relevant local application caches, host caches, and finally any cache associated with a recursive server. This
improves DNS performance, but also affects the type of traffic a researcher may find when investigating a
botnet.

Q: Why do bots flush local DNS caches?
A: Botnets are mobile, and often use multiple C&C sites. To shift victim populations between C&C servers,
botmasters merely have to change the DNS entries. Local caching of previous C&C locations delays or
prevents victims from reaching a new C&C. Thus, bots often flush the local DNS cache, using various
utilities or host API functions.

Q: Where can I access a passive DNS service?
A: One is available from http://cert.uni-stuttgart.de/stats/dns-replication.php

Q: What will passive DNS show me?
A: For a given domain, you can obtain every previous resolution of the domain observed by the network. This
includes mail, name server, CNAME, A-records, and others.

Q: What DNS properties are useful for botnet detection?
A: Botnets often, but not always, use low TTL periods for C&C domains. In other words, the DNS entry for a botnet domain is often cached only briefly, typically for under a few hours. This contrasts with legitimate domains, with a few noted exceptions. Because botnets have large numbers of victims, they often create large spikes of DNS lookups at authority servers, and at recursive servers.

Q: Can't botmasters evade this sort of DNS-based detection?
A: Of course. Botmasters get a turn to respond to any detection regime. Factors such as weighted subdomain
request volume and low TTL periods, however, are likely to remain valid for a class of botnets. There are
practical reasons for botmasters to continue using low TTL values, and difficulties in adjusting volumes of
traffic before victim rallying has completed. These are heuristics, and have a shelf life.
