A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Ranveer Chandra
January 2006
© 2006 Ranveer Chandra
ABSTRACT

This doctoral dissertation describes the design and applications of a new virtualization architecture for wireless network cards, called MultiNet. MultiNet virtualizes a single wireless card to appear as multiple virtual wireless cards to the user. Each virtual card can then be configured separately on a physically different network. The goal of MultiNet is to give the user the illusion of simultaneous connectivity on all the virtual cards although the network card is on a single network at any instant. MultiNet achieves
this transparency using intelligent buffering and switching algorithms. The switching
and buffering mechanisms are implemented as a kernel driver, while the policies are
implemented as a user-level service. The MultiNet system has been implemented over
Windows XP and has been operational for over two years. It is agnostic of the upper
layer protocols, and works well over popular IEEE 802.11 wireless LAN cards. Further,
MultiNet enables a new class of applications, which were earlier only possible with
multiple wireless cards in the device. This dissertation describes two such applications: SSCH and Client Conduit.
SSCH is a new channel hopping protocol that works over MultiNet, and utilizes frequency diversity to increase the capacity of IEEE 802.11 wireless networks. Each node
using SSCH switches across channels in such a manner that nodes desiring to communi-
cate overlap, while disjoint communications do not overlap, and hence do not interfere
with each other. To achieve this, SSCH uses a novel scheme for distributed rendezvous
and synchronization. Simulation results show that SSCH significantly increases network capacity.
Client Conduit is a fault diagnosis mechanism that takes advantage of the beaconing and probing mechanisms of IEEE 802.11 to ensure that connected clients do not pay unnecessary overheads while helping disconnected clients. Client Conduit has been implemented over Windows XP as part of an architecture for fault diagnosis in infrastructure wireless networks.
BIOGRAPHICAL SKETCH

Ranveer was born in Jamshedpur, an industrial town in Eastern India, on August 27, 1976, the third of four children. He lived in Jamshedpur for the first 18 years
of his life and decided to appear for the IIT exam after finishing high school. Ranveer
secured a good rank in the IIT qualifying exam and decided to go to IIT Kharagpur,
which was within 100 miles of Jamshedpur. IIT Kharagpur provided an ideal setting
for Ranveer to complete his undergraduate education in an environment that had good
professors, extraordinary peers, little distraction, and still a lot of fun. Ranveer majored in Computer Science and developed a keen interest in computer networking, which motivated him to study further. He applied to a few schools in the United States,
and decided to go to Cornell University in Ithaca, NY for his PhD in Computer Science.
Over the six years at Cornell University Ranveer worked with a number of people at
Cornell. He also spent three summers in Microsoft Research and one at AT&T Labs -
Research, and enjoyed working in industrial research labs. After completing his PhD,
Ranveer is headed towards the North-West, where he has accepted an offer from Microsoft Research.
ACKNOWLEDGEMENTS
First, I want to thank my advisor, Ken Birman, for his constant support and guidance
during my six years of PhD study at Cornell University. He kept me motivated and
provided the right direction that enabled me to finish these challenging years of work.
His sharp intellect and great comments were always the guiding feature in my PhD.
Further, his towering figure in the field of Computer Science has been and will always be an inspiration to me.
Secondly, I am grateful to Victor Bahl for bringing out in me what I really wanted
to do in research. Interactions with him during the three internships made me realize
the open problems in wireless networking, and what I needed to do to make an impact
in this field. Victor has also been a constant source of encouragement and unbridled enthusiasm.
I am also grateful to my other committee members, Eva Tardos, Zygmunt Haas and
Robbert VanRenesse, who have been supportive of my research in every step of my PhD.
Their comments have been very valuable in rewriting the final draft of this dissertation.
In particular, Atul Adya has been a great influence during my PhD. His views and ideas have
influenced the way I write, present and do my research. Lili Qiu has shown me how perseverance, patience and good work always pay off. Finally, John Dunagan has been
of great help in reviewing my work, and showing me the right direction. In addition I
would also like to thank Alec Wolman and Jitu Padhye for great research conversations.
I am also deeply grateful to my family and friends for keeping me motivated to finish my PhD. My parents have shown their belief
in me and supported me in every possible way. My sister and brother-in-law have always been with me through the troubled phases of my PhD. I would also like to thank Meenakshi, Biswanath, Ben, Rimon, Indranil and Rama for making my six years of stay at Cornell memorable.
TABLE OF CONTENTS
1 Introduction 1
1.1 Problems with Existing Wireless Networks . . . . . . . . . . . . . . . 1
1.2 Thesis and Its Contributions . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Limitations of this Dissertation . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Roadmap of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . 5
2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.9 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5 Conclusion 158
References 160
LIST OF TABLES
2.1 The Switching Delays between IS and AH networks for IEEE 802.11
cards with and without the optimization of trapping media connect and
disconnect messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 The average throughput in the ad hoc and infrastructure networks using
both strategies of MultiNet and two radios . . . . . . . . . . . . . . . 45
2.3 The average packet delay in infrastructure mode for the various strategies 46
2.4 The average packet delay in infrastructure mode on varying the number
of MultiNet connected networks . . . . . . . . . . . . . . . . . . . . . 50
4.1 Different fault diagnosis mechanisms and entities that can diagnose
them; the last column indicates if the solution can be supported using
legacy APs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2 Times for different operations: U means time measured from user-level
code; rest are times taken for the corresponding ioctl to complete . . . 140
LIST OF FIGURES
2.1 The MultiNet Layer maintains virtual interfaces for networks 1, 2 and
3, and switches the physical card across all these networks. It gives the
illusion of connectivity on all networks although the card is on network
2 at this instant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 The steps of Spoofed Buffering when a node uses MultiNet to connect
to two networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Two nodes in communication range and using MultiNet that fail to
overlap in the ad hoc network and hence experience a logical partitioning. 26
2.4 The Network Stack with MultiNet . . . . . . . . . . . . . . . . . . . . 29
2.5 Time taken to complete a 47 MB FTP transfer on an ad hoc and infras-
tructure network using different switching strategies . . . . . . . . . . 36
2.6 Variation of the activity period for two networks with time. The activity
period of a network is directly proportional to the relative traffic on it. . 37
2.7 TCP Performance with and without Spoofed Buffering. . . . . . . . . 39
2.8 Effect on UDP flows when a node uses Slotted Synchronization to join
an ad hoc network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.9 MultiNet in a Mobile Scenario . . . . . . . . . . . . . . . . . . . . . . 42
2.10 Packet trace for the web browsing application over the infrastructure
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.11 Packet trace for the presentation and chat workloads over the ad hoc
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.12 Comparison of total energy usage when using MultiNet versus two radios 47
2.13 Energy usage when using MultiNet and two radios with IEEE 802.11
Power Saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1 Only one of the three packets can be transmitted when all the nodes are
on the same channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Channel hopping schedules for two nodes with 3 channels and 2 slots.
Node A always overlaps with Node B in slot 1 and the parity slot. The
field of the channel schedule that determines the channel during each
slot is shown in bold. . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3 The problem with a naive synchronization scheme. Node A has two
slots, with (channel, seed) pairs represented by A1 and A2 ; nodes B
and C are similarly depicted. At time t1 , node A synchronizes with
node B. Node B synchronizes with node C at time t2 , after which A
and B are no longer synchronized. . . . . . . . . . . . . . . . . . . . . 72
3.4 Need for De-synchronization: All nodes converge to the same channel
without de-synchronization. . . . . . . . . . . . . . . . . . . . . . . . 74
3.5 Switching and Synchronizing Overhead: Node 1 starts a maximum rate
UDP flow to Node 2. We show the throughput for both SSCH and IEEE
802.11a. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.6 Overhead of an Absent Node: Node 1 is sending a maximum rate UDP
stream to Node 2. Node 1 then attempts to send a packet to a non-
existent node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.7 Overhead of a Parallel Session: Node 1 is sending a maximum rate
UDP stream to Node 2. Node 1 then starts a second stream to Node 3. . 83
3.8 Overhead of Mobility: Node 1 is sending a maximum rate UDP stream
to Node 2. Node 1 starts another maximum rate UDP session to Node
3. Node 3 moves out of range at 30 seconds, while Node 1 continues to
attempt to send until 43 seconds. . . . . . . . . . . . . . . . . . . . . . 84
3.9 Overhead of Clock Skew: Throughput between two nodes using SSCH
as a function of clock skew. . . . . . . . . . . . . . . . . . . . . . . . 85
3.10 Disjoint Flows: The throughput of each flow on increasing the number
of flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.11 Disjoint Flows: The system throughput on increasing the number of
flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.12 Non-disjoint Flows: The average throughput of each flow on increasing
the number of flows. There is a flow from every node in the network. . 88
3.13 Non-disjoint Flows: The system throughput on increasing the number
of flows. There is a flow from every node in the network. . . . . . . . . 89
3.14 Effect of Flow Duration: Ratio of SSCH average throughput to IEEE
802.11a average throughput for flows having different durations. . . . . 90
3.15 TCP over SSCH: Steady-state TCP throughput when varying the num-
ber of non-disjoint flows. . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.16 Multihop Chain Network: Variation in throughput as chain length in-
creases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.17 Multihop Mesh Network of 100 Nodes: Average flow throughput on
varying the number of flows in the network. . . . . . . . . . . . . . . . 94
3.18 Impact of SSCH on Unmodified MANET Routing Protocols: The av-
erage time to discover a route and the average route length for 10 ran-
domly chosen routes in a 100 node network using DSR over SSCH. . . 95
3.19 Dense Multihop Mobile Network: The per-flow throughput and the av-
erage route length for 10 flows in a 100 node network in a 200m×200m
area, using DSR over both SSCH and IEEE 802.11a. . . . . . . . . . . 97
3.20 Sparse Multihop Mobile Network: The per-flow throughput and the
average route length for 10 flows in a 100 node network in a 300m ×
300m area, using DSR over both SSCH and IEEE 802.11a. . . . . . . 98
4.6 CPU usage in Promiscuous mode (1 GHz machine) . . . . . . . . . . . 141
4.7 Breakdown of costs for Client Conduit. The protocol steps are executed
from the bottom entry in the legend to the topmost, i.e., starting at “Set
channel”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.8 Time taken by a disconnected client to transfer data via MultiNet . . . . 144
4.9 Median error in locating disconnected clients. The lower and upper
bounds of error bars correspond to min and max error. E(i) denotes
that the ith connected client’s location contains error. . . . . . . . . . 145
4.10 EDEN’s accuracy of estimating the delay at a client . . . . . . . . . . . 148
4.11 Breakdown of delay at the client, AP, and the medium as estimated by
EDEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.12 Overlapping channels on which an AP is overheard . . . . . . . . . . . 151
4.13 Overlapping channels heard relative to distance . . . . . . . . . . . . . 152
4.14 The maximum idle time duration available during every 5-minute pe-
riod at different times of the day . . . . . . . . . . . . . . . . . . . . . 152
CHAPTER 1
INTRODUCTION
There has been a recent interest in using multiple wireless cards in a device [9,64,87,95,
115, 119]. This dissertation provides a cheaper and more energy-efficient scheme to get
the functionality of multiple wireless cards while using only a single physical network
interface. This approach is called MultiNet, which is a new architecture for virtualizing
wireless cards. MultiNet is very useful in solving some of the key problems in wireless networks, described below.

1.1 Problems with Existing Wireless Networks

Wireless technology has an increasing presence in our lives, from cellular phones, wireless
LANs, Bluetooth headphones, cordless phones, location systems, to smart homes, and
many more. This trend will grow with an increasing deployment of sensor networks [61,
88], mesh networks [50, 93], and the recent WiMAX initiative [63, 133]. Although they
are increasingly common, wireless networks are still relatively fragile and underutilized.
• Manageability: Wireless networks incur high manageability costs [5, 7, 39, 103, 131]. The state of the art will be significantly enhanced by a system that automatically diagnoses problems with minimum human intervention and informs the user of ways to recover from them.
• Capacity: Wireless capacity is still a bottleneck for many applications [40, 65, 95]. Any scheme that increases wireless capacity, through advanced antennas [34, 42] or smarter protocols [16, 114], will greatly impact the wireless performance of a number of applications.
• Power: Limited battery power is the Achilles heel for wireless applications [72].
Applications and protocols for mobile computing should prolong battery life by using schemes such as maximizing the sleep durations of wireless cards [71].
1.2 Thesis and Its Contributions

This doctoral dissertation contributes towards solving these problems for IEEE 802.11 networks.
MultiNet virtualizes a single wireless card to make it appear as multiple wireless cards
to the user. The user can configure each virtual card separately to be on a physically
different network. For example, when using an IEEE 802.11 card the user can connect
one virtual card on an infrastructure network, and the other virtual card on an ad hoc
network, although the network card is on a single physical network at any instant. The user nevertheless gets the illusion of simultaneous connectivity on both wireless networks. MultiNet achieves this transparency using intelligent buffering and
switching algorithms. MultiNet has been implemented over Windows XP and is avail-
able for download. In addition to describing this architecture, this thesis also explores
three ways in which MultiNet alleviates the above problems of wireless networks.
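The virtualization idea described above can be sketched as a small multiplexer: one physical card, several virtual interfaces, with traffic for the inactive interfaces buffered until the card switches over. This is only an illustrative model of the mechanism; the class and method names are invented for this sketch and are not MultiNet's actual interfaces.

```python
from collections import deque

class VirtualInterface:
    """One virtual card: per-network state plus a queue of packets
    that must wait while the physical card is on another network."""
    def __init__(self, ssid):
        self.ssid = ssid
        self.pending = deque()

class MultiNetCard:
    """Multiplexes one physical card across several virtual interfaces.
    Only the active interface transmits; sends on inactive interfaces
    are buffered and drained on the next switch."""
    def __init__(self):
        self.interfaces = {}
        self.active = None

    def add_network(self, ssid):
        self.interfaces[ssid] = VirtualInterface(ssid)
        if self.active is None:
            self.active = ssid

    def send(self, ssid, packet):
        if ssid == self.active:
            return ("tx", ssid, packet)            # goes out immediately
        self.interfaces[ssid].pending.append(packet)
        return ("buffered", ssid, packet)          # held until we switch

    def switch_to(self, ssid):
        """Re-associate the physical card, then flush buffered packets."""
        self.active = ssid
        drained = list(self.interfaces[ssid].pending)
        self.interfaces[ssid].pending.clear()
        return drained
```

A node connected to an infrastructure and an ad hoc network would call `send` on either virtual interface; only the active one transmits at that instant, matching the single-network constraint in the text.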
Firstly, MultiNet consumes less energy than the alternative of using multiple wireless cards; this matters most for small devices, where the fixed energy cost of multiple physical interfaces is not feasible. MultiNet also enables a new power saving mechanism by allowing nodes to function as relays using only one wireless card: nodes with low battery power can send their traffic to the Access Point through such a relay.
Secondly, MultiNet facilitates a way to increase the capacity of wireless ad hoc networks, which are known to scale poorly with the number of communicating nodes [67]. When multiple neighboring node pairs want to communicate using IEEE 802.11, only one pair can be active at a time. However, other nodes can talk simultaneously if they are on orthogonal frequency channels, since traffic on orthogonal channels does not interfere. But this breaks connectivity: nodes on different channels cannot communicate. MultiNet helps to solve this problem. This dissertation proposes
a new scheduling algorithm, called Slotted Seeded Channel Hopping (SSCH), which
works with MultiNet to improve network capacity. The goal of SSCH is to have com-
municating nodes on the same channel and other nodes on randomly different channels
at any instant, while ensuring that any two neighboring nodes overlap within a fixed
period. SSCH achieves this goal by introducing the technique of partial synchroniza-
tion and also makes use of existing techniques such as pseudo-random generators. It is
shown mathematically that SSCH has the desired synchronization properties. Using simulations, SSCH is shown to significantly increase the capacity of IEEE 802.11.
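The rendezvous idea behind SSCH, developed fully in Chapter 3, can be illustrated with a per-slot channel update rule. The sketch below assumes 13 orthogonal channels and the update channel ← (channel + seed) mod 13; the constant and function names are illustrative, not the protocol's actual definitions. Two nodes that agree on a (channel, seed) pair compute identical channels in that slot forever, while nodes with different seeds still coincide periodically because the channel count is prime.

```python
NUM_CHANNELS = 13  # illustrative number of orthogonal channels

def next_channel(channel, seed):
    """Per-slot update: channel <- (channel + seed) mod NUM_CHANNELS.
    Nodes sharing a (channel, seed) pair keep computing the same
    channel, so they rendezvous in that slot indefinitely."""
    return (channel + seed) % NUM_CHANNELS

def schedule(channel, seed, slots):
    """Channels visited by one (channel, seed) slot over `slots` rounds."""
    visited = []
    for _ in range(slots):
        visited.append(channel)
        channel = next_channel(channel, seed)
    return visited

# Since NUM_CHANNELS is prime, two slots with different seeds land on
# the same channel exactly once every NUM_CHANNELS iterations, which
# bounds the time until any two neighbors overlap.
```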
Thirdly, this dissertation describes a fault diagnosis mechanism for disconnected machines, called Client Conduit, which is used to diagnose faults in infrastructure wireless networks. The growing deployment of wireless networks [37] entails a pressing need for wireless network management tools,
similar to wired networks [56, 94]. Network administrators want to know why users are
suffering from poor performance and frequent disconnections. They are interested in lo-
cating security breaches, for example an unauthorized (rogue) access point plugged into
an enterprise’s Ethernet jack that jeopardizes its resources. In our architecture, Client
Conduit allows disconnected clients to transfer diagnostic messages to and from a back-
end server. It is implemented using MultiNet, since it allows connected clients to stay on
the infrastructure network using one virtual interface, and form an ad hoc network with
the disconnected client on another virtual interface. This thesis presents a lightweight
mechanism to implement Client Conduit, where virtual interfaces are added dynami-
cally and a connected client suffers no penalty in the common case. It also proposes
algorithms to detect rogue access points, locate disconnected clients, and diagnose poor
wireless performance. This architecture has been prototyped over Windows XP using MultiNet.
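The helper side of Client Conduit, as described above, can be sketched as follows: a connected client that overhears a distress request dynamically adds a second virtual (ad hoc) interface and ferries the diagnostic payload to a back-end server, so the common case pays nothing. All names here are invented for illustration; the real protocol (Chapter 4) works through IEEE 802.11 beacons and probes.

```python
class DiagnosisServer:
    """Stand-in for the back-end diagnosis server."""
    def __init__(self):
        self.reports = {}

    def upload(self, client_id, diagnostics):
        self.reports[client_id] = diagnostics
        return client_id

class ConnectedClient:
    """A healthy client: normally it keeps only its infrastructure
    interface, and adds an ad hoc virtual interface (via MultiNet)
    only when a disconnected client asks for help."""
    def __init__(self, server):
        self.server = server
        self.virtual_ifaces = {"infra"}

    def on_scan(self, beacons):
        helped = []
        for b in beacons:
            if b.get("distress"):
                # spin up the conduit interface on demand
                self.virtual_ifaces.add("adhoc-conduit")
                helped.append(self.server.upload(b["client_id"],
                                                 b["diagnostics"]))
        return helped
```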
1.3 Limitations of this Dissertation

Although MultiNet has been implemented over Windows XP, it has not been tested in all cases and in large deployments. Consequently, simulation results were used to show the feasibility of MultiNet. Further, the inability of available hardware to quickly switch across frequency channels limited all results on SSCH to simulations in QualNet [62]. The simulation study of SSCH was done to show that SSCH will significantly improve the capacity of wireless networks when the required hardware is available. MultiNet, SSCH and our fault diagnosis architecture are described in Chapters 2, 3 and 4 respectively.
1.4 Roadmap of this Dissertation

Chapter 2 describes the MultiNet architecture in detail. It also shows that MultiNet
consumes less energy than an alternative approach of using multiple wireless cards.
Chapter 3 describes the SSCH protocol and its properties, and analyzes the performance
of SSCH. Chapter 4 then presents our fault diagnosis architecture, and describes and
evaluates the design of Client Conduit. Finally, Chapter 5 concludes this dissertation.
Most of the contents of Chapters 2, 3 and 4 are adapted from previously written independent papers, in particular [30], [16] and [3] respectively. The contributions of the coauthors of each of these papers are listed in the last paragraph of each chapter.
CHAPTER 2
2.1 Introduction
Systems research over the last two decades has revealed a number of benefits of virtual-
izing different systems components, such as virtual machines [20, 49, 126, 130], virtual
storage [55, 81] and virtual memory [23] among others. However, the benefits of vir-
tualizing a wireless card have not been explored. This chapter describes MultiNet, a
new virtualization architecture that abstracts a single wireless LAN (WLAN) [60] card as multiple virtual cards. Virtualizing a wireless card has several benefits: increased connectivity for end users; increased range of the wireless network; bridging between infrastructure and ad hoc wireless networks; and painless secure access to sensitive resources. We
discuss these in detail in Section 2.2. To explore this problem space with current tech-
nology, one would have to use a single WLAN card for each desired network [64, 115].
Doing so is costly, cumbersome, and consumes energy resources that are often limited.
Virtualizing a wireless card poses several research challenges. Firstly, a virtual wire-
less card should appear as a real (physical) wireless card to the user. Secondly, the user
should get an illusion of simultaneous connectivity on all virtual cards, although the
physical wireless card can only be on one network at any instant [58]. Thirdly, the
system should be deployable and compatible with nodes not using virtualization. More-
over, the virtualization software should not require modifications to existing backbone infrastructure, such as APs and routers.
MultiNet solves the above problems by creating a new virtual interface for each network to which connectivity is desired. The virtual interface exports itself as a new physical device to the network layer. It also maintains the state of the physical card required
for connecting to the wireless network corresponding to this virtual interface. Multi-
Net achieves the illusion of simultaneous connectivity over all networks by switching
the physical network card across the desired networks and activating the correspond-
ing virtual interface. Further, MultiNet is deployable as it does not require changes to
APs and routers. This is achieved by a new protocol called Spoofed Buffering, which
leverages the Power Save Mode of the IEEE 802.11 [58] standard, and is described in
Section 2.5.4.
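The essence of Spoofed Buffering can be sketched in a few lines: before switching away, the node tells the AP it is entering Power Save Mode even though it is really leaving for another network, so the unmodified AP buffers its frames; on returning, the node clears the PSM flag and drains the buffer. This is a toy model for intuition, not the driver implementation.

```python
class AccessPoint:
    """Standard PSM behaviour at an unmodified AP: buffer frames for
    stations that announced sleep, release them when they return."""
    def __init__(self):
        self.asleep = set()
        self.buffered = {}

    def set_psm(self, station, sleeping):
        if sleeping:
            self.asleep.add(station)
        else:
            self.asleep.discard(station)

    def deliver(self, station, frame):
        if station in self.asleep:
            self.buffered.setdefault(station, []).append(frame)
            return None                    # held at the AP
        return frame                       # delivered immediately

    def drain(self, station):
        return self.buffered.pop(station, [])

def switch_away(ap, station):
    """Spoofed Buffering: claim to sleep (PSM) before leaving the network."""
    ap.set_psm(station, True)

def switch_back(ap, station):
    """On return, wake up and collect everything the AP buffered."""
    ap.set_psm(station, False)
    return ap.drain(station)
```

The point of the spoof is that the AP runs completely standard PSM logic; only the client's reason for "sleeping" differs.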
This chapter makes the following contributions:
• It presents the design of MultiNet, a virtualization architecture for IEEE 802.11 WLAN cards. As part of the design it describes the state that needs to be stored for
every virtual wireless card. It also describes in detail the implementation of Multi-
Net over Windows XP. The implementation works with modest modifications to
the Operating System kernel, and without any modifications to the wireless card
drivers.
• It proposes a new protocol, called Spoofed Buffering, which delivers packets sent
to a node using MultiNet when it is on another network. APs buffer packets for
the nodes that have switched to another network, and deliver them when nodes
switch back to their network. Spoofed Buffering achieves this functionality without requiring any changes to APs. This protocol has also been used in recent related work.
• It evaluates the performance of MultiNet over commercial IEEE 802.11 wireless cards, and shows that MultiNet is suitable for most applications. It also describes the energy consumption of MultiNet, and shows that MultiNet consumes less battery power than the alternative approach of using multiple wireless cards.
As of this writing, MultiNet has been operational for over two years. During this
time, we have refined the protocols and analyzed them in greater detail. Many of the
results we present in this chapter are based on real working systems that include current
and next generation IEEE 802.11 wireless cards. For cases where it is not possible to
study the property of the system without large scale deployment and optimized hard-
ware, we carry out simulation based studies. Most of our simulations are driven by
traffic traces that represent ‘typical traffic’. For IEEE 802.11, our study shows that
MultiNet nodes can save up to 50% of the energy consumed by nodes with two cards, while providing similar functionality. We also quantify the delay versus energy tradeoff of MultiNet.
The rest of this chapter is organized as follows. Section 2.2 presents some scenarios
and applications that motivate the need for MultiNet and for which MultiNet is currently
being used. Section 2.3 presents some related research and Section 2.4 provides the
background needed for the rest of the chapter. The MultiNet architecture is presented
in Section 2.5, and its implementation is described in Section 2.6. Performance and
feasibility are discussed in Sections 2.7 and 2.8. Future work is presented in Section 2.9, and Section 2.10 summarizes the chapter.

2.2 Scenarios and Applications

MultiNet enables several new applications that were earlier not possible using a single wireless card:
• Extended Range: Border nodes use MultiNet to function as relays for authorized nodes that are outside the range of the Access Point (AP). We implemented this functionality over MultiNet.
• Gateway Node: A node that is part of a wireless ad hoc network and close to
an AP, connected to the Internet, can use MultiNet to stay connected on both
networks, and become a gateway node for the ad hoc network [26].
• Secure Access: A privileged user, who has permission to access different networks, can use MultiNet to connect to all of them simultaneously.
• Increased Capacity: MultiNet can split a wireless card into as many instances as the number of orthogonal channels, and simultaneously operate on all of them to increase capacity.
• Virtual Machines: Existing Virtual Machine architectures (for example, [28, 126,
130]) restrict all virtual machine instances to stay connected on the same wireless network; MultiNet removes this restriction.
• Seamless Handoff: A node using MultiNet can connect to an AP without disconnecting from its previous one. This technique has been used to reduce handoff delays.
All the above scenarios require nodes to stay connected on more than one wireless network, and MultiNet achieves this functionality with only one wireless card.
2.3 Related Work

Virtualization has been studied extensively for abstracting a single system resource as
multiple available resources to the user. For example, Virtual Machine architectures,
such as VMWare [126], Denali [130], Xen [20], Terra [49], etc., virtualize a single com-
puter to give an illusion of many smaller virtual machines, each running its own oper-
ating system. Storage Virtualization systems, such as Facade [81] and Stonehenge [55],
virtualize a storage device into multiple logical storage devices. Similarly, Virtual Mem-
ory [23,41] presents an illusion of larger memory to user programs than is actually avail-
able. MultiNet is similar to the above systems in abstracting a single resource, in this
case a wireless card, as multiple wireless cards to the user. However, to the best of our knowledge, MultiNet is the first system to virtualize a wireless card.
Prior work has looked at virtualizing the wired network interface on a machine. The
Virtual Machine architectures discussed above [20, 28, 49, 126, 130] virtualize all hard-
ware resources, including the network interface [120]. Other systems for low latency
communication, such as U-Net [128] and VIA [29, 38], virtualize the network interface
to multiple local virtual interfaces, one for each process. The physical network interface
is multiplexed across the virtual interfaces to send packets sent by a process. Network
Cloning [138] brings up multiple network stacks for a single physical interface. Similar
to these systems, MultiNet abstracts the wireless interface as multiple virtual interfaces,
and multiplexes the physical card across the virtual instances. However, it faces different
challenges that do not arise in the case of wired networks. Firstly, each virtual wireless interface is on a physically different network. In contrast to the above systems, only one virtual instance is physically on the network at
any time. Secondly, switching to a different network takes a few hundred milliseconds,
as we show in Section 2.7. So, the approach used by the above systems, where packets
from different virtual interfaces are serviced by the wired interface in the order in which
they arrive, might incur a network switch overhead for every packet. This scheme may
not be suitable for virtualizing wireless cards. MultiNet uses different switching and buffering strategies, described in Section 2.5.
Another set of related work looked at smart channel hopping schemes over a single
wireless radio [66, 89, 114]. The idea is to distribute interfering traffic on different fre-
quency channels to increase the capacity of wireless networks. MultiNet differs from
these systems in two aspects. Firstly, MultiNet has to switch across multiple networks
instead of channels, and consequently MultiNet has to store more state for each network.
Secondly, all the above protocols have only been evaluated in simulation; we are not aware of any implementation over commodity wireless cards.
As part of MultiNet’s design goals, which we will describe in Section 2.5.1, any two
neighboring nodes in an ad hoc network should overlap on the same frequency channel
within a definite period. Our solution to this problem, described in Section 2.5.6, relies on the clock synchronization mechanism of IEEE 802.11 [58]. This algorithm and its variants [54, 74, 110] are based on an algorithm proposed by Lamport [75], which shows that given the clock accuracy, link delay and network diameter, and assuming that a timestamp is sent successfully along each link periodically, the clocks of all nodes can be synchronized to within an established bound. A previous work [54] has shown that these algorithms
work reasonably well when there are no Byzantine failures [76] in the network. For our
algorithms to work with such failures, we would need clock synchronization algorithms
with stronger guarantees [116, 125]. However, handling these failures is out of scope of
this dissertation.
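The flavour of these synchronization algorithms can be conveyed with the simple adoption rule used in IEEE 802.11 ad hoc (IBSS) timer synchronization: a station that hears a beacon carrying a timestamp ahead of its own timer adopts the faster value, so all timers converge toward the fastest clock. The one-round sketch below ignores propagation delay and clock drift.

```python
def tsf_step(timers, beacons_heard):
    """One round of IBSS-style timer synchronization.
    timers: dict mapping node -> timer value (e.g., microseconds).
    beacons_heard: (sender, receiver) pairs heard this round.
    A receiver adopts the sender's timestamp only if it is ahead,
    so timers never move backwards."""
    for sender, receiver in beacons_heard:
        if timers[sender] > timers[receiver]:
            timers[receiver] = timers[sender]
    return timers
```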
The problem of staying connected to multiple wireless networks has not been studied before in the context of wireless LANs. A related
problem was considered for scatternet formation in Bluetooth [92] networks [77, 78].
Bluetooth networks comprise basic units, called piconets, that can have at most 7 nodes.
Piconets are used to form bigger networks, called scatternets, by having some nodes on multiple piconets. However, the problem of having nodes listen to multiple piconets is significantly different from the problem of allowing nodes to connect to multiple IEEE 802.11 networks, since Bluetooth uses a frequency hopping scheme for communication between multiple nodes on the network. A node can be
on two networks simultaneously if it knows the correct hopping sequence of the two
networks and hops fast enough. IEEE 802.11 networks, on the other hand, have no such hopping scheme. An alternative to virtualization is to use multiple radios, and this approach has been commonly used in commercial products [64, 115, 119] and
wireless networking research [9, 87, 95]. However, as we show in Section 2.7, using
multiple radios consumes more power, which is a scarce resource in battery operated
devices. Further, a recent result shows that the performance of multi-radio systems is
significantly degraded by the self interference among the radios on the device [106]. In
Section 2.7.8, we show that MultiNet solves these problems of multi-radio systems at a much lower cost.
2.4 Background
This section first discusses the limitations of IEEE 802.11 and describes why maintaining connectivity to multiple networks is difficult. It then briefly describes the Power Save Mode (PSM) [58] feature of IEEE 802.11, which is used in the Spoofed Buffering Protocol described in Section 2.5.4. Finally, it discusses the next generation of wireless cards, called Native WiFi cards.
Popular wireless networks, such as IEEE 802.11, work by association. Once associated with a network, the
wireless card can receive and send traffic only on that network. The card cannot inter-
act with nodes in another network if the nodes are operating on a different frequency
channel. Further, a node in an ad hoc network cannot interact with a node in the infras-
tructure network even when they are operating on the same channel. This is because the
IEEE 802.11 standard defines different protocols for communication in the two modes
and it does not address the difficult issue of synchronization between different networks. Moreover, most existing wireless cards undergo a firmware reset each time the mode is changed from infrastructure to ad hoc or vice versa.
The IEEE 802.11 standard defines Power Save Mode (PSM) for infrastructure wireless
networks as a means to save battery power. When a node wants to use PSM, it sends a
message to the AP and sets its wireless interface to sleep mode. The message to the AP
also contains the duration for which the node wants to sleep. This duration is called the
Listen Interval. When the AP receives a packet destined for the sleeping node, it buffers
the packet. After a Listen Interval period, the node using PSM wakes up, and receives
the packets buffered at the AP. Usually, the Listen Interval is set to be a multiple of
the Beacon Period, where the Beacon Period is the interval at which an AP broadcasts
its beacon. The Beacon Period is a parameter of the AP, while the Listen Interval is a parameter chosen by the client.
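The timing relationship between the two parameters can be made concrete with a tiny helper: if the Listen Interval is k Beacon Periods, a PSM station wakes at every k-th beacon to pick up buffered frames. The numbers used below are illustrative, not values from the standard.

```python
def wake_times(beacon_period_ms, listen_multiple, horizon_ms):
    """Instants (in ms) at which a PSM station wakes to drain the AP's
    buffer, given Listen Interval = listen_multiple beacon periods."""
    interval_ms = listen_multiple * beacon_period_ms
    return list(range(interval_ms, horizon_ms + 1, interval_ms))
```

For example, with a 100 ms Beacon Period and a Listen Interval of three beacon periods, the station wakes every 300 ms; longer intervals save more energy but increase the delay of buffered packets, the trade-off quantified later in this chapter.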
In order to reduce the cost and commoditize wireless cards, IEEE 802.11 WLAN card
vendors [11, 102] are minimizing the functionality of the code residing in the micro-
controller of their cards. This next generation of wireless cards, which we refer to as
Native WiFi cards, implement just the basic time-critical MAC functions, while leaving
their control and configuration to the operating system. More importantly, these cards
allow the operating system to maintain state and do not undergo a firmware reset on
changing the mode of the wireless card. This is in contrast to the existing cards, which undergo a firmware reset whenever the mode is changed.
2.5 MultiNet
This section first formulates the MultiNet problem and enumerates its design goals. It then presents the MultiNet architecture and its protocols. The design makes the following assumptions:
• Each node maintains a timer and uses a distributed Timer Synchronization Function (TSF) [58] to synchronize the timers at all nodes in a network to within 224 µs [60]. TSF, or its modifications ATSP [54, 74] or ASCP [110], can be used to achieve the required synchronization.
• APs implement Power Save Mode (PSM), and have enough buffer space to sup-
port all nodes using PSM in the network. This feature is defined in the IEEE
802.11 standard [58], and is implemented in some existing WLAN products [35,
121, 122].
• The Switching Delay of a wireless card is small. This comprises the time to switch to another channel and associate to the network. As we discuss in Section 2.7, this overhead is a few hundred milliseconds for most commercial wireless cards. MultiNet will give better performance when this delay is small.
• Applications running over MultiNet are resilient to small network disconnections and delays. Some sample applications supported by MultiNet are browsing, file transfers and web downloads. The reason why other applications are not supported is discussed later in this section.
• The device driver of a wireless card sends a disconnect message to the network
layer when it disconnects from a network, and a connect message when it success-
fully connects to one. On modern operating systems, such as Linux and Windows
XP, these messages are passed up to the user level and are used to display the
current status of the physical interface. In Windows XP, the device driver sends media disconnect and media connect messages to indicate disconnection and connection respectively. In Linux, the device driver calls netif_carrier_off and netif_carrier_on for the same purpose.
• A user knows if MultiNet is being used by more than one machine in an ad hoc network.
MultiNet virtualizes a single wireless card into multiple virtual interfaces, where each virtual interface corresponds to a physically different wireless network. Further, MultiNet also strives to achieve the following design goals when virtualizing a wireless card:
• Transparency: The user should be able to connect different virtual cards to different wireless networks, although the physical card is only on one network at any instant. The architecture should ensure that packets sent to and from a virtual interface are not discarded if the physical wireless card is not on the corresponding network at that instant. Further, when a machine is mobile, the virtual interface should appear disconnected when the machine moves out of range of the network. However, it should appear connected whenever the physical card is able to connect to its network.
The user should also be able to prioritize different virtual interfaces, so that packets on a more important network are sent and received with less delay.
• Interoperability: MultiNet should not require modifications to any other node in the network. It should work over the commonly used IEEE 802.11 standard, and with commercial wireless cards. Further, it should not require changes to the wireless card drivers or the network infrastructure. Nearly all of the modifications should be confined to the node using MultiNet.
In addition to the above design goals, there are a few plausible goals that Multi-
Net does not achieve. Firstly, it does not aim to support real-time applications over the
network, such as Voice over IP (VoIP) [127] or streaming video. This constraint arises
from the few hundred milliseconds overhead when switching from one network to an-
other. Unless this overhead is reduced, MultiNet will be unable to provide response
time guarantees of less than a few hundred milliseconds on all networks. Secondly,
MultiNet does not handle Byzantine failures in the network. Handling these failures
would require changes to our buffering and synchronization protocols described in Sec-
tions 2.5.4 and 2.5.6 respectively, and is outside the scope of this dissertation. Thirdly, we defer
the discussion of using MultiNet in multi-hop ad hoc networks to Chapter 3. In the rest
of this chapter, we limit our discussion to using MultiNet in single hop ad hoc networks,
where all nodes are in communication range of each other, and in infrastructure wireless
networks. Finally, the current implementation of MultiNet allows a node to stay connected on only one ad hoc network in which multiple nodes use MultiNet. Enabling a node to use MultiNet for maintaining connections to more than one such ad hoc network is left to future work.
MultiNet achieves the above design goals by introducing functionality in a new layer,
between the network and physical layers of the network stack, as shown in Figure 2.1.
This layer, called the MultiNet Layer, initializes and maintains a new virtual network
interface for every new network on which the user wants to stay connected. The IEEE
802.11 parameters [58] of the physical wireless card are duplicated at each virtual interface. So, each virtual interface has its own Service Set Identifier (SSID) and operational mode.
All virtual interfaces appear as connected to the network layer, even though the
physical card is connected to only one wireless network at any instant. This is shown in
Figure 2.1 where IP sees virtual interfaces 1, 2 and 3 as connected to networks 1, 2 and
3 respectively, although the physical card is connected to Network 2. Since all virtual
interfaces appear as connected, the user might send packets on any of them. Packets
sent to a virtual interface, when the physical card is not on its corresponding wireless
network, are buffered in a packet buffer maintained at each virtual interface. Packets are
sent over the network without any delay if the physical card is on the network.
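The per-interface buffering just described can be sketched in a few lines; the class, its fields, and the callback name are assumptions of this sketch, not the actual driver structures.

```python
from collections import deque

class VirtualInterface:
    """Sketch of a MultiNet virtual interface with its packet buffer."""

    def __init__(self, ssid):
        self.ssid = ssid
        self.buffer = deque()  # packets held while the card is elsewhere
        self.sent = []         # packets handed to the physical card

    def send(self, packet, active_ssid):
        # Transmit immediately if the physical card is on this network;
        # otherwise queue the packet in this interface's buffer.
        if active_ssid == self.ssid:
            self.sent.append(packet)
        else:
            self.buffer.append(packet)

    def on_switched_in(self):
        # Called when the physical card connects to this network:
        # flush everything buffered while the card was away.
        while self.buffer:
            self.sent.append(self.buffer.popleft())
```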
This illusion of simultaneous connectivity is achieved by time-multiplexing the physical wireless card across all virtual interfaces. The physical card stays
connected on a network long enough to send and receive one or more packets on the cor-
responding virtual interface. The MultiNet Layer then switches the physical card to a
network corresponding to another virtual interface. The information about the network
is retrieved from the state stored in the virtual interface. After switching the physi-
cal card to another network, MultiNet waits for a media connect message from the
lower layers. This message is sent only if the physical card successfully switches to
another network. On receiving this message, MultiNet sends the packets buffered on
(Figure: the network stack, with applications at user level and the Transport (TCP, UDP), IP, and MultiNet layers in the kernel.)
Figure 2.1: The MultiNet Layer maintains virtual interfaces for networks 1, 2 and 3, and switches the physical card across all these networks. It gives the illusion of connectivity on all three networks.
the virtual interface, and stays connected to this network for some time. This cycle then repeats over the other virtual interfaces.
Before describing the architecture further, we briefly define some terms we use in
the rest of this chapter. The period of time for which a card stays on a network after
successfully connecting to it is called the Activity Period for the network. The time to
switch to another network, from the time switching is initiated to the time the card is
associated to the wireless network, is called the Switching Delay for the network. The
Activity Period is the useful time when a card sends and receives packets, while the
Switching Delay is an overhead when the card is not on any network. The performance
of MultiNet is better when the Switching Delays are small. The sum of the Activity
Periods and Switching Delays over all connected networks is called the Switching Cycle.
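These definitions can be captured directly (a trivial sketch; the function name is illustrative):

```python
def switching_cycle(activity_periods_ms, switching_delays_ms):
    """Switching Cycle = sum of the Activity Periods and Switching
    Delays over all connected networks (all values in milliseconds)."""
    return sum(activity_periods_ms) + sum(switching_delays_ms)
```

For example, the evaluation in Section 2.7.3 uses a total Activity Period of 1 second with Switching Delays of 500 ms and 300 ms, giving a 1.8 second Switching Cycle.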
Switching from one network to another requires the physical card to disconnect from
one network and connect to the other. Correspondingly, as described in Section 2.5.1,
the physical layer sends disconnect and connect messages to the upper layers. These
messages change the connectivity status of the virtual network interface, and as a result
only one virtual interface appears as connected at any time. This is a problem for Multi-
Net since the operating system drops packets sent on a disconnected network interface.
MultiNet solves this problem by trapping the disconnect message sent by the physi-
cal layer immediately after a disconnection. This message is received at the MultiNet
Layer and is prevented from going up the network stack. Consequently, layers above
the MultiNet Layer see all the virtual interfaces as connected, although the physical card is on only one network at any instant.
MultiNet also manages the state of a virtual interface when a network disconnection
is caused by factors such as mobility or weak signal strength. The virtual interface is
made to appear disconnected when the physical card is unable to connect to its network,
and is made to appear connected when the physical card regains connectivity to the
network. MultiNet achieves this functionality by not trapping the disconnect message
when it is caused by any source other than MultiNet. As a result the virtual card appears
disconnected whenever the physical wireless card is unable to connect to its network.
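This selective trapping can be modeled as follows, assuming a flag that marks MultiNet-initiated switches; the class and method names are illustrative, not the driver's own.

```python
class MultiNetLayer:
    """Sketch of selective trapping of media status messages."""

    def __init__(self):
        self.switching = False  # True while MultiNet moves the card itself
        self.delivered = []     # messages passed up to the network layer

    def begin_switch(self):
        self.switching = True

    def end_switch(self):
        self.switching = False

    def on_media_status(self, message):
        # Trap connect/disconnect messages caused by MultiNet's own
        # switching; messages from any other source (e.g. a real loss
        # of connectivity due to mobility) are passed up unchanged.
        if self.switching and message in ("media disconnect", "media connect"):
            return
        self.delivered.append(message)
```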
Further, MultiNet attempts to connect to all networks in its Switching Cycle, even if its
previous attempt to connect was unsuccessful. When the physical wireless card success-
fully connects to a network, the connect message is passed up the network stack, and the corresponding virtual interface appears connected again.
This design of MultiNet poses two interesting questions. Firstly, how are packets
delivered to a virtual interface if the card has switched to another network? Secondly,
how long should the card stay on a network? We first answer these questions for the
scenario when only one machine in any ad hoc network uses MultiNet. We then develop
our approach to handle the case when MultiNet is used by more than one node in an
ad hoc network. An important question we defer to future work, in Section 2.9, is the
In this section, we present a buffering protocol that prevents packets sent to a virtual
interface from being discarded when the physical card is not on the corresponding net-
work. As part of the protocol, we describe a new approach that allows MultiNet to work
The buffering protocol works differently for ad hoc and infrastructure networks. For
ad hoc networks, just before switching out of the network, a node broadcasts a packet
that informs all other nodes in the network of its unavailability and when it will be back
in the network. On switching back to the ad hoc network, the node broadcasts another
packet announcing its availability. Packets destined for this node are buffered by other
nodes in the ad hoc network, until either of the following two conditions hold: the
broadcast announcing availability of the node is received, or the time by which the node
was expected to be back in the network has elapsed. If the node is available, then the
buffered packets are sent to it. Otherwise, if the timer has elapsed, then the buffered
packets are discarded. This protocol requires modifications at all nodes in the ad hoc
network, even if they do not use MultiNet to connect to multiple networks. This should be acceptable in practice, since the change is a small software modification at each node.
MultiNet could use a similar protocol for infrastructure networks. However, APs
would need to be modified to buffer packets destined for nodes using MultiNet on its
network. This conflicts with the interoperability goal described in Section 2.5.2. MultiNet solves this problem by proposing a new protocol, called Spoofed
Buffering. Spoofed Buffering buffers packets at the APs without requiring modifications
to them.
Spoofed Buffering works as follows. MultiNet spoofs sleep mode to the AP just
before switching out of an infrastructure network. It sends a special IEEE 802.11 packet
to the AP, which informs the AP that it is using IEEE 802.11 PSM to go to sleep mode,
and the time for which it will sleep. While the AP believes the node is sleeping,
MultiNet switches the physical card to another network. As described in Section 2.4.2,
PSM requires APs to buffer packets for nodes that are sleeping in its network, and to
send the buffered packets when the nodes wake up. So, packets destined for the MultiNet node are buffered at the AP until the node switches back to the infrastructure network.
The node then informs the AP that it is awake by sending another IEEE 802.11 packet.
On receiving this packet, the AP sends all the buffered packets, which are received by the node and passed up to the corresponding virtual interface.
Figure 2.2 illustrates the steps of Spoofed Buffering when a node uses MultiNet to
connect to two wireless networks. Before switching out of network 1, the node informs
the AP that it is going to sleep for a certain time. It then switches to network 2, where
it announces that it is awake. The AP in network 2 then sends the buffered packets to
the node, which forwards them up to the corresponding virtual interface. The virtual
interface also sends its buffered packets to the AP. The node then stays on network 2 for
the Activity Period. It then sends a message to the AP of network 2 announcing that it
is going to sleep, and switches to network 1 and informs the AP of network 1 that it is
awake. These steps continue as long as the node requires connectivity on both wireless
networks.
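The steps of Spoofed Buffering can be sketched as a small simulation; the AP model and method names below are assumptions of this sketch, standing in for the IEEE 802.11 PSM frame exchange.

```python
class AccessPoint:
    """Sketch of an AP honoring IEEE 802.11 PSM: it buffers frames
    for stations it believes to be asleep."""

    def __init__(self):
        self.asleep = set()
        self.held = {}  # station -> frames buffered while it sleeps

    def psm_sleep(self, station):
        # Station announced "I am going to sleep".
        self.asleep.add(station)
        self.held.setdefault(station, [])

    def deliver(self, station, frame):
        # Frames for sleeping stations are buffered, not transmitted.
        if station in self.asleep:
            self.held[station].append(frame)
            return None
        return frame

    def psm_wake(self, station):
        # Station announced "I am awake": release the buffered frames.
        self.asleep.discard(station)
        return self.held.pop(station, [])

# Spoofed Buffering: the node tells AP 1 it is asleep, switches to
# network 2, and later returns to collect what AP 1 buffered.
ap1 = AccessPoint()
ap1.psm_sleep("node")
ap1.deliver("node", "frame-a")   # buffered while the node is away
flushed = ap1.psm_wake("node")   # returned when the node is back
```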
Figure 2.2: The steps of Spoofed Buffering when a node uses MultiNet to connect
to two networks.
We note that despite our buffering protocol, packets might still be lost due to other
reasons, such as mobility, wireless signal fade or interference. Further, buffering might
not be possible at other nodes in the network, due to lack of cooperation from nodes in
the ad hoc network or PSM support at the APs. In such scenarios, MultiNet relies on
higher layer protocols, such as TCP, to recover the lost packets. We compare MultiNet
with and without buffering support in Section 2.7.5, and show that although MultiNet
performs much better when the buffering protocols are implemented, its performance is still reasonable without them.
The Activity Period is the duration for which a wireless card stays connected on a net-
work. MultiNet supports three schemes for determining this duration, each of which is described below.
• Fixed Slot Duration: In this approach, MultiNet divides time into slots of fixed
duration. Every time the physical card switches to a network, it stays on that
network for one slot. The slot duration includes the Switching Delay. This scheme
between multiple nodes using MultiNet in an ad hoc network. We use it for our
• User Defined Priority: This scheme requires the user to prioritize all his net-
works, and define the Total Activity Period. The Total Activity Period is the sum
of Activity Periods of all networks, which is equal to the difference between the
Switching Cycle and the sum of Switching Delays across all networks. Multi-
Net then calculates the Activity Period for each network based on its priority.
So, if a user requires connectivity to a set of wireless networks, and has given each network a priority, each network receives an Activity Period proportional to its priority. This scheme is useful when there exists a predefined priority across all networks. For example, the Client Conduit
Protocol, described in Chapter 4, uses user defined priorities to limit the duration for which a connected client helps disconnected clients.
• Adaptive Schemes: This approach does not require any intervention from the user. MultiNet prioritizes each network based on the amount of traffic seen on it, and uses these priorities to calculate the Activity Period for each network.
Consequently, a network that sends and receives more packets has a longer Ac-
tivity Period as compared to a less active one. So, if MultiNet has to switch
across different networks, and network i has seen P_i packets in its last Activity Period ATP_i, then the node stays in network j for an Activity Period given by (P_j/ATP_j) · (1/Σ_k (P_k/ATP_k)) · (Σ_k ATP_k). The first term gives the net-
work utilization of network j, the second gives the utilization across all networks,
and the final term is the total amount of time the node is active across all networks.
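A direct reading of this formula can be sketched as follows; the function and variable names are illustrative, and the equal split under zero traffic is an assumption consistent with the behavior reported later in Section 2.7.3.

```python
def adaptive_activity_periods(packets, prev_periods):
    """New Activity Period for each network j, following
    (P_j/ATP_j) * (1 / sum_k(P_k/ATP_k)) * sum_k(ATP_k)."""
    utilization = [p / atp for p, atp in zip(packets, prev_periods)]
    total_util = sum(utilization)
    total_active = sum(prev_periods)
    if total_util == 0:
        # No traffic anywhere: split the active time equally.
        return [total_active / len(packets)] * len(packets)
    return [u / total_util * total_active for u in utilization]
```

With equal previous periods and packet counts of 10 and 30, the busier network receives three quarters of the active time, and the total active time is preserved.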
This approach is useful in scenarios where the user wants to get the best perfor-
mance on multiple networks, without worrying about the traffic patterns on each
network. We use this strategy to provide true zero configuration over MultiNet, using a module that maintains a history of packets sent and received on all virtual interfaces over a certain number of Switching Cycles. It then uses this history to prioritize across the networks.
Supporting multiple nodes to use MultiNet in an ad hoc network poses a new problem.
Any two nodes using MultiNet might not overlap in the ad hoc network for a signif-
icant period of time. Consequently, these nodes will be unable to communicate with
each other for long durations even though they are in communication range of each
other. This significantly affects the performance of MultiNet on the ad hoc network.
Figure 2.3 illustrates this problem when two nodes A and B are in communication range
of each other and use MultiNet with Fixed Slot Duration to connect to two networks:
Infrastructure Network 1 and Ad Hoc Network 2. In this scenario, nodes A and B do not
overlap in the ad hoc network, and cannot communicate in this network. However, note
that this problem is specific to ad hoc networks, as these nodes can communicate in the
infrastructure network using Spoofed Buffering to buffer packets at the APs. Further,
this problem also arises for other switching protocols described in Section 2.5.5, as two
nodes might overlap for a very small period of time, which is too small to send even a
single packet.
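The failure in Figure 2.3 is easy to reproduce with two out-of-phase fixed-slot schedules (a minimal sketch; the list encoding of schedules is an assumption of this sketch):

```python
def overlap_slots(schedule_a, schedule_b, network):
    """Slots in which both nodes are on the given network."""
    return [slot for slot, (a, b) in enumerate(zip(schedule_a, schedule_b))
            if a == b == network]

# Machines A and B alternate between infrastructure network 1 and
# ad hoc network 2, but out of phase: they never meet on network 2.
a = [1, 2, 1, 2, 1, 2]
b = [2, 1, 2, 1, 2, 1]
missed = overlap_slots(a, b, network=2)  # empty: no ad hoc overlap
```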
Figure 2.3: Two nodes in communication range and using MultiNet that fail to overlap in the ad hoc network.
We use a simple scheme, called Slotted Synchronization, to synchronize an overlap between any two nodes using MultiNet in a single hop ad hoc network.
We discuss SSCH, which is a more sophisticated and efficient approach for multihop networks, in Chapter 3. As noted earlier, this scheme allows a node to stay connected to only one ad hoc network in which multiple nodes use MultiNet. Extending this approach to allow nodes to stay connected in many ad hoc networks with multiple MultiNet nodes is left to future work.
Slotted Synchronization uses what we term the “Fixed Slot Duration switching scheme”,
in which time is divided into slots and nodes switch to a network at the beginning of a
slot. All nodes use the same slot duration, and clocks at all nodes in a network are synchronized. The algorithm also makes the assumption, as described in Section 2.5.1, that the node starting an ad hoc network knows if more than one node in its network is going to use MultiNet.
In that case, the initiator defines a recurrence period for the network. The recurrence period is the periodicity, in slots, with which every node returns to the ad hoc network. As described in Section 2.6.4, the SSID field of the IEEE 802.11 Beacon [58] can be modified to carry
the information about the recurrence period of the network and offset within the slot
when the Beacon is transmitted. When a node uses MultiNet to join this network, it uses
this information to synchronize the start time of its slots to that of the ad hoc network.
Then, after every recurrence period slots, MultiNet switches the physical card of this
node to the ad hoc network. Over the remaining slots, MultiNet switches the physical card across the other networks to which the user wants to stay connected.
This algorithm ensures that all nodes in the ad hoc network overlap for one slot
every recurrence period slots, even when some nodes use MultiNet to stay connected on other networks. This is because joining the network synchronizes the slots at all nodes in the network to the parameters specified by the initiator. Further, slot
synchronization occurs only at the time of joining the network and so this algorithm is
not affected by mobility in the network. Note that this algorithm might not work if a
node uses it to synchronize slots to multiple networks, since the initiator’s slots of these
disjoint networks might not be synchronized. Therefore, we limit a node to use MultiNet
to stay connected on only one ad hoc network in which multiple nodes use MultiNet.
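Slotted Synchronization, together with the Beacon encoding described later in Section 2.6.4, can be sketched as follows. The byte layout (1 byte of recurrence period, 2 bytes of slot offset, up to 29 characters of user SSID) follows the text; the function names, the big-endian packing, and the millisecond unit for the offset are assumptions of this sketch.

```python
def pack_ssid(recurrence_period, offset_ms, user_ssid):
    """Embed the recurrence period (1 byte) and the Beacon's offset
    within the slot (2 bytes) ahead of the user's SSID (at most 29
    characters), filling the 32-byte IEEE 802.11 SSID field."""
    assert 1 <= recurrence_period <= 255
    assert 0 <= offset_ms < 2 ** 16
    assert len(user_ssid) <= 29
    return (bytes([recurrence_period])
            + offset_ms.to_bytes(2, "big")
            + user_ssid.encode("ascii"))

def unpack_ssid(field):
    """Recover (recurrence period, offset, user SSID) from the field."""
    return (field[0],
            int.from_bytes(field[1:3], "big"),
            field[3:].decode("ascii"))

def is_adhoc_slot(slot, recurrence_period):
    """A synchronized node returns to the ad hoc network once every
    recurrence_period slots; the other slots serve its other networks."""
    return slot % recurrence_period == 0
```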
However, it can connect to many infrastructure networks and ad hoc networks in which it is the only node using MultiNet.
2.6 Implementation
MultiNet is implemented over Windows XP as a kernel driver and a user level service. The mechanisms for storing network state, and for switching and
buffering across networks are implemented in the kernel, while the respective policies
are implemented in the service. The kernel driver is an NDIS intermediate driver,¹ which exists as a layer between the network device drivers and IP. MultiNet performs best
when APs implement PSM and other nodes in an ad hoc network buffer packets for
nodes using MultiNet. However, no changes are required in the wired nodes for Multi-
Net to work. The rest of this section describes the details of our implementation.
The MultiNet driver provides all the mechanisms required by the MultiNet architecture.
It initializes and maintains the virtual interfaces, and provides support to switch a wire-
less card from one network to another and to buffer packets at the virtual interfaces if the
physical card is not on the wireless network. This driver also sends the buffered packets when the card switches to the corresponding network.
NDIS requires the lower binding of a network protocol, such as IP, to be a network
miniport driver,² such as the driver of a network interface. Similarly, NDIS requires the upper binding of a network interface driver to be a protocol driver. We accommodate
¹ Network Driver Interface Specification (NDIS) is a Windows construct that provides transport independence for the network card vendors. All networking protocols used by Windows call the NDIS interface to access the network.
² A miniport driver directly manages a network interface card (NIC) and provides an interface to higher-level drivers.
this requirement in the design of the MultiNet Driver, which includes two components:
the MultiNet Protocol Driver (MPD), which provides an upper binding to the network
card miniport driver, and the MultiNet Miniport Driver (MMD), which provides a lower
binding to the network protocols, such as TCP/IP. The modified stack is illustrated in
Figure 2.4.
(Figure 2.4: the modified stack, with applications and the MultiNet Service at user level, above the MultiNet Protocol Driver (MPD), the NDIS WLAN extensions, and the hardware.)
The MPD manages multiple virtual interfaces over one wireless card. It switches
the association of the underlying card across different networks, and buffers packets if
the SSID of the associated network is different from the SSID of the sending virtual
interface. MPD also buffers packets on the instruction of the MultiNet Service, as we
describe later in Section 2.6.2. Further, the MPD handles packets received by the wire-
less card. A packet received on the wireless card is sent to the virtual interface associated with the network on which it was received.
The MMD manages a virtual interface of a wireless card. It maintains the state for
each virtual interface, which includes the SSID and operational mode of the wireless
network. It is also responsible for handling query and set operations directed to the virtual interface.
The MultiNet service implements the algorithms for switching across networks and
buffering packets, described in Sections 2.5.5 and 2.5.4 respectively. This service is
a user level daemon that uses I/O Control Codes (ioctls) to interact with the MultiNet
Driver. It also broadcasts packets to interact with the service running at other nodes.
These messages coordinate the buffering protocol for ad hoc networks, described in
Section 2.5.4. Further, all the switching algorithms discussed in Section 2.5.5 are im-
plemented in the MultiNet service. The service determines the duration of the Activity
Period, and sends a signal to MPD when the Activity Period expires. This signal initiates
the switching mechanism implemented in MPD. Finally, the service also coordinates the
synchronization protocol described in Section 2.5.6. It embeds the recurrence period and offset in the IEEE 802.11 Beacon frame, and uses this information to synchronize the slots across nodes in the ad hoc network.
Spoofed Buffering, described in Section 2.5.4, buffers packets for MultiNet over infras-
tructure networks using IEEE 802.11 PSM. We successfully implemented this scheme
over Native WiFi cards, which were described in Section 2.4.3. For non-Native WiFi
(legacy) cards, we were constrained by the proprietary software on the card drivers.
Their software does not expose any APIs in Windows to programmatically set the res-
olution of power save mode. Therefore, we were unable to implement the buffering
algorithm for these WLAN cards. However, for prototyping Spoofed Buffering, we
buffer packets at the end points of infrastructure networks, using a scheme similar to the
one described for ad hoc networks in Section 2.5.4. The MultiNet service keeps track of
the end points of all on-going sessions, and buffers packets if the destination is currently
in another network.
The synchronization protocol of Section 2.5.6 requires an ad hoc network with multiple MultiNet nodes to have two parameters, in addition to the ones
specified by IEEE 802.11. In particular, the initiator of such an ad hoc network has
to specify the recurrence period and the offset within the slot when the IEEE 802.11
Beacon is sent. Any node joining this network has to learn both these parameters for synchronization. One approach is to modify the format of IEEE 802.11 packets to carry more information. However, this requires modifications to
the wireless card driver, and might reduce the interoperability of MultiNet, as discussed
in Section 2.5.2.
We use an alternative approach to solve this problem. The two parameters are em-
bedded in the SSID field of an IEEE 802.11 Beacon, which is broadcast once every
fixed interval.³ The SSID field of the Beacon frame is 32 bytes in length. The recurrence period is measured in slots, and its maximum value is the number of networks to which a user can connect. We limit this to 255, so 1 byte is sufficient to carry this information. Further, the offset within the slot is measured in milliseconds, and we limit the maximum slot duration to 5 seconds. So, 2 bytes are enough to embed the value of the offset. Therefore, the user can still use a 29-character SSID for such ad hoc networks. Based on experience, we believe that this does not significantly inconvenience users.
We studied the performance of MultiNet using a real implementation and a custom sim-
ulator. The implementation was used to study the throughput behavior with different
switching algorithms. We then simulated MultiNet with realistic parameters, and com-
pared it with the alternative approach of using multiple radios to connect to multiple
networks. We compare the two approaches with respect to energy consumption and the
average delay encountered by the packets. The results presented in this section confirm the benefits of the MultiNet approach.
MultiNet has been deployed and tested over a dozen commercial IEEE 802.11 wireless
cards. The results in this section were derived over an IEEE 802.11b network [60].
The wireless cards used were the Cisco 340 series, Compaq WLAN 200, Orinoco Gold,
³ The IEEE 802.11 protocol for joining an ad hoc network requires the joining node to use the information in the Beacons of that network.
Netgear WAG 511 and the Native WiFi cards from AMD [11] and Realtek [102]. All
these cards have a maximum data rate of 11 Mbps. The APs used were the Cisco 340
Series, EZConnect 2656, DLink DI-614+ and Native WiFi APs. IEEE 802.11 PSM was
implemented only in the Native WiFi APs. Most of our results were consistent across the different cards and APs.
Good performance of MultiNet depends on a short delay when switching across net-
works. However, legacy IEEE 802.11b cards perform the entire association procedure
every time they switch to a network. We carried out a detailed analysis of the time to
associate to an IEEE 802.11 network. The results showed significant overhead when
switching from one network to another. In fact, a delay as large as 3.9 seconds was observed from the time the card started associating to an ad hoc network, after leaving an infrastructure network.
Table 2.1: The Switching Delays between IS and AH networks for IEEE 802.11
cards with and without the optimization of trapping media connect and disconnect
messages.
Switching    Without optimization    With optimization    Native WiFi
IS to AH     3.9 s                   170 ms               25 ms
AH to IS     2.8 s                   300 ms               30 ms
Our investigations revealed that the cause of this delay is the media disconnect and
media connect notifications to the IP stack. The IP stack damps the media disconnect
and connect for a few seconds to protect itself and its clients from spurious signals. The
spurious connects and disconnects can be generated by network interface cards due to a
variety of reasons ranging from buggy implementations of the card or switch firmware
to the card/switch resetting itself to flush out old state. Windows was designed to damp
the media disconnect and connect notifications for some time before rechecking the
connectivity state of the adapter and taking the action commensurate with that state.
To avoid this damping delay, switching must be hidden from higher protocols, such as IP and the applications. We hide switching by
having MPD trap the media disconnect and media connect messages when it switches
between networks. Since the MPD is placed below IP, it can prevent the network layer
from receiving these messages. This minor modification significantly improves the
Switching Delay as shown in Table 2.1. Using the above optimization, we were able
to reduce the switching delay from 2.8 seconds to 300 ms when switching from an ad
hoc network to an infrastructure network and from 3.9 seconds to 170 ms when switch-
ing from an infrastructure network to an ad hoc network. These numbers are further
reduced to as low as 30 ms and 25 ms respectively, when Native WiFi cards are used.
We believe that this overhead is extraneous for the purposes of MultiNet, and in Section 2.8 we discuss ways to reduce it further.
A nice consequence of masking the media connect and media disconnect messages
is that all virtual adapters are visible to IP as connected, as our architecture requires.
We implemented three switching strategies described in Section 2.5.5, i.e. User Defined
Priority, Adaptive Buffer, and Adaptive Traffic. The test environment comprised a node
that used MultiNet to stay connected to an infrastructure and an ad hoc network. The
Switching Delays from the ad hoc to the infrastructure network and vice versa were
overestimated at 500 ms and 300 ms respectively.⁴ The total time available for switching between networks was 1 sec. We evaluated the switching strategies when simultane-
ously transferring a file of size 47 MB using FTP from the MultiNet node to two nodes
on the different networks. An independent transfer of the file over the ad hoc network
took 80.25 seconds, while it took 54.12 seconds over the infrastructure network.
Figure 2.5 shows the time taken to simultaneously transfer this file over MultiNet
using different switching strategies for legacy cards. We evaluated 3 different User
Defined Priority switching schemes. In the ‘50%IS 50%AH’ strategy the node stays on
each network for 500 ms. In the ‘75%IS 25%AH’ scheme it stays on the infrastructure
network for 750 ms and on the ad hoc network for 250 ms, and in the ‘25%IS 75%AH’
scheme the node stays on the infrastructure network for 250 ms and the ad hoc network
for 750 ms. For the Adaptive Traffic algorithm we used a window of 3 switching cycles
to estimate the Activity Periods. In this case the window is 3 × 1.8 = 5.4 seconds, since the Switching Cycle is 1.8 seconds (1 second of total Activity Period plus the 0.5 s and 0.3 s Switching Delays).
Different switching strategies show different behavior and each of them might be
useful for different scenarios. For the User Defined Priority strategies, the network with
higher priority gets a larger slot to remain connected. Therefore, the network with a
higher priority takes lesser time to complete the FTP transfer. The results of the adap-
tive algorithms are similar. The Adaptive Buffer algorithm adjusts the time it stays on
a network based on the number of packets buffered for that network. Since the maximum achievable throughput is higher on the infrastructure network than on the ad hoc network,⁵ the number of packets buffered for the infrastructure network is more. There-
⁴ This overprovisioning helped to evenly compare all the switching schemes by fixing the duration of the Switching Cycle.
⁵ Separate experiments revealed that the average throughput on a wireless network with commercial APs and wireless cards is 5.8 Mbps for an isolated infrastructure network.
Figure 2.5: Time taken to complete a 47 MB FTP transfer on an ad hoc and an infrastructure network, using different switching strategies.
fore the FTP transfer completes faster over the infrastructure network as compared to
the ‘50%IS 50%AH’ case. For a similar reason the FTP transfer over the infrastructure
network completes faster when using Adaptive Traffic switching. MultiNet sees much
more traffic sent over the infrastructure network and proportionally gives more time to
it. Overall, the adaptive strategies work by giving more time to faster networks if there
is maximum activity over all the networks. However, if some networks are more active
than the others, then the active networks get more time. We expect these adaptive strate-
gies to give the best performance if the user has no priority and wants to achieve the best throughput across all the networks.
Figure 2.6: Variation of the activity period for two networks with time. The activity
The adaptability of MultiNet is demonstrated in Figure 2.6. The Adaptive Traffic switch-
ing strategy is evaluated by running our system for two networks, an ad hoc and an
infrastructure network, for 150 seconds. The plots at the top of Figure 2.6 show the
traffic seen on both the wireless networks, and the ones at the bottom of this figure show
the corresponding effect on the Activity Period of each network. The adaptive switch-
ing strategy causes the Activity Period of the networks to vary according to the traffic
seen on them. Initially when there is no traffic on either network, MultiNet gives equal
time to both networks. After 20 seconds there is more traffic on the ad hoc network,
and so MultiNet allocates more time to it. The traffic on the infrastructure network is
greater than the traffic on the ad hoc network after around 110 seconds. Consequently,
the infrastructure network is allocated more time. This correspondence between relative traffic levels and allocated time demonstrates the adaptability of the switching strategy.
MultiNet, when used with adaptive switching schemes, provides true zero configuration. Prior schemes, such as Wireless Zero Configuration (WZC), require users to specify a list of preferred networks, and WZC only connects to the most preferred available wireless network. With the adaptive switching strategies, a user still specifies a list of preferred networks, but the card connects to all of them, giving time to each network in proportion to the traffic seen on it.
MultiNet buffers packets for inactive networks at APs and at ad hoc peer cards using IEEE 802.11 PSM. However, many commercial APs do not implement PSM. Further, the ad hoc network buffering protocol, described in Section 2.5.4, relies on broadcast packets, which are less reliable than unicast packets [91]. These broadcast packets might get lost, in which case packets destined to MultiNet's virtual interface are dropped. The worst case occurs when no packets are buffered, due either to lost broadcast packets or to lack of PSM support at commercial APs. Figure 2.7 compares this worst case to the scenarios in which MultiNet implements buffering. In our test scenario, packets were sent, using ntttcp, over the infrastructure network from the MultiNet
node to another node in the network. Ntttcp, which is a port of ttcp [118] to Windows,
works by establishing a TCP session between two nodes and sending the packets at the
maximum rate. The Activity Period for both networks was fixed at 500 ms. We present
results for three scenarios in Figure 2.7. ‘NoMultiNet’ corresponds to the case when
the sender and receiver are connected to just one network, ‘MultiNetNoBuffer’ is when
the sender is connected to two networks using MultiNet and the AP does not implement
Spoofed Buffering, and the APs implement Spoofed Buffering in ‘MultiNetBuffer’. Re-
Figure 2.7: TCP sequence number progress over time for the three scenarios (NoMultiNet, MultiNetNoBuffer, and MultiNetBuffer).
sults show that the performance drops by a factor of four when using MultiNet with
Spoofed Buffering and drops further when the AP does not buffer packets. When APs
buffer packets, the MultiNet node can achieve a throughput proportional to the duration
of its Activity Period, which is around a fourth of the Switching Cycle. Without buffering, the throughput of the system in this case goes down to a seventh of the maximum achievable throughput.
We set up a three node network. The first machine always stays on the infrastructure network. Both the other machines use MultiNet. Before we start this experiment, the second machine uses MultiNet to connect to both the infrastructure and the ad hoc network; it is initially the only node in the ad hoc network. The third node, which we
Figure 2.8: Effect on UDP flows when a node uses Slotted Synchronization to join
an ad hoc network
also use as our test machine, is initially connected to only the infrastructure network.
We start a UDP flow between the test machine and the first machine, which is only on
the infrastructure network. We use Fixed Slot Duration switching, and set the duration
of each slot to 800 ms. This duration contains the Switching Delay. IPerf [1] was used to
initiate UDP flows of 1 Mbps with 512 byte packets. The MPD was also instrumented to report the total number of successful packets sent and received in every slot; this gives the instantaneous throughput plotted in Figure 2.8.
Figure 2.8 illustrates the instantaneous throughput, measured once per Switching
Cycle, achieved by UDP flows when the test machine joins an ad hoc network that has
more than one MultiNet node. Initially, when the test machine is only in the infras-
tructure network, there is no Switching Delay, and consequently the UDP throughput
is around 1 Mbps. After 13 seconds, the test machine uses MultiNet to connect to the
ad hoc network, which already has one MultiNet node. The test machine takes around
15 seconds to initialize another virtual interface, build up its state, synchronize the slots
to the MultiNet node in the ad hoc network and get a DHCP address for the virtual
interface. After this time, the UDP flow between the test machine and the infrastruc-
ture network node resumes. We immediately start another UDP flow between the two
MultiNet nodes in the ad hoc network. As we see in the figure, UDP throughput in
the infrastructure network drops to around half the initial throughput. This is because
the infrastructure network gets one of two slots in Fixed Slot Duration Switching since
MultiNet connects to two networks. The Switching Delay does not reduce the through-
put further, because MultiNet is able to send the buffered packets over the Activity
Period at the network’s bandwidth, which is greater than the IPerf flow rate of 1 Mbps.
Further, the flow over the ad hoc network roughly achieves the same throughput as over the infrastructure network, which implies that Slotted Synchronization maintains a good overlap between the slots of the two MultiNet nodes.
MultiNet does not aim to hide mobility from the user. As discussed in Section 2.5.2,
MultiNet’s virtual interfaces should behave as physical wireless cards when nodes are
mobile. To illustrate this behavior, the same experimental setup of Section 2.7.6 was
used. However, in this case, we focused on the throughput in the ad hoc network. After
around 28 seconds, the test machine was moved away from the other MultiNet node
in the ad hoc network. As we see in Figure 2.9, the IPerf throughput over the ad hoc
network keeps falling as the machine moves away from the other node in the ad hoc
network. With an increase in distance between the two nodes, the signal strength de-
creases, which increases the loss rate and reduces the throughput. After some time the
Figure 2.9: Effect of mobility on the IPerf throughput over the ad hoc network.
connection over the ad hoc network is lost. This state is propagated to the application layer, which halts IPerf. However, MultiNet keeps trying to reconnect to the ad hoc network. When the test machine moves back within range, the connection is re-established, and an IPerf flow is started immediately between the two nodes. As we see in the figure, the two nodes using MultiNet achieve the same throughput after reconnection as they had before the connection was lost. This shows that there is a significant overlap between the two nodes, and the performance of Slotted Synchronization is not significantly affected by mobility. The test machine was again moved at around 70 seconds, and we see a similar drop in throughput.
MultiNet is one way of staying connected to multiple wireless networks. The alternative approach is to use multiple wireless cards, with each card connected to a different network. We simulated this approach, and compared it with the MultiNet scheme with respect to the energy consumed and the average delay of packets over the different networks. We first present our simulation environment, and then compare the results of the MultiNet scheme to the multiple radio approach.
Simulation Environment
We simulated both approaches for a sample scenario of people wanting to share and
discuss a presentation over an ad hoc network and browse the web over the infrastruc-
ture network at the same time. This feature is extremely useful in many scenarios. For
example, consider the case where a company, say Kisco’s, employees conduct a busi-
ness meeting with another company, say Macrosoft’s, employees at Macrosoft’s head-
quarters. With MultiNet and a single wireless network card, Kisco employees can share
documents, presentations, and data with Macrosoft’s employees over an ad hoc network.
Macrosoft's employees can stay connected to their internal network via the access points. Meanwhile, Macrosoft does not have to give Kisco's employees access to its internal network in order for the two groups to share data.
We model traffic over the two networks, and analyze the packet trace using our sim-
ulator. Traffic over the infrastructure network is considered to be mostly web browsing.
We used Surge [18] to model http requests according to the behavior of an Internet
user. Surge is a tool that generates web requests with statistical properties similar to
measured Internet data. The generated sequence of URL requests exhibits representative distributions for requested document size, temporal locality, spatial locality, user off times, document popularity, and embedded document count. For our purposes, Surge
was used to generate a web trace for a 1 hour 50 minute duration, and this web trace
Figure 2.10: Packet trace for the web browsing application over the infrastructure
network
was then broken down into a sample packet trace for this period; the distribution of the resulting packets over time is shown in Figure 2.10.
The ad hoc network is used for two purposes: sharing a presentation, and supporting discussions using a sample chat application. Three presentations are shared in our scenario; each is downloaded to the target machine using an FTP session over the ad hoc network. They are downloaded in the 1st minute, the 38th minute, and the 75th minute. Further, the
user also chats continuously with other people in the presentation room, discussing the
presentation and other relevant topics. Packet traces for both the applications, FTP and
chat, were obtained by sniffing the network, using Ethereal [45], while running the re-
spective applications. MSN messenger was used for a sample chat trace for a 30 minute
duration. The Packet traces for FTP and chat were then extended over the duration of
In our simulations we assume that wireless networks operate at their maximum TCP
throughput of 4.4 and 5.8 Mbps for an ad hoc and infrastructure network respectively.
We then analyze the packet traces for independent networks, and generate another trace
for MultiNet. We use a ‘75%IS 25%AH’ switching strategy presented in Section 2.5.5
with a switching cycle time of 400 ms. The switching delay is set to 1 ms, and we explain the reason for choosing this value in Section 2.8.1. Further, the power consumed during switching is assumed to be negligible. We do not expect these assumptions to greatly affect the results of our experiments. We analyze packet traces for
the two radio and MultiNet case and compute the total power consumed and the average
delay encountered by the packets. All the cards are assumed to be Cisco AIR-PCM350,
and their corresponding power consumption numbers are used from [111]. Specifically,
the card consumes 45 mW of power in sleep mode, 1.08W in idle mode, 1.3W in re-
ceive mode, and 1.875W in transmit mode. Further, in PSM, the energy consumed by
the Cisco AIR-PCM 350 in one power save cycle is given by: 0.045 ∗ n ∗ t + 24200
milliJoules, where n is the Listen Interval and t is the Beacon Period of the AP.
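The quoted power states and the PSM formula translate directly into a small energy calculator (a sketch; the constant 24200 and the 'milliJoules' unit are reproduced exactly as stated above):

```python
# Energy calculator built from the Cisco AIR-PCM350 numbers quoted above.
POWER_W = {"sleep": 0.045, "idle": 1.08, "receive": 1.3, "transmit": 1.875}

def psm_cycle_energy_mj(listen_interval, beacon_period_ms):
    """Energy per power save cycle (milliJoules): 0.045 * n * t + 24200,
    with n the Listen Interval and t the AP's Beacon Period."""
    return 0.045 * listen_interval * beacon_period_ms + 24200

def awake_energy_j(seconds_by_state):
    """Energy (Joules) of a card that never sleeps, as in the two radio
    case: time spent in each power state times that state's draw."""
    return sum(POWER_W[s] * secs for s, secs in seconds_by_state.items())
```

With the Listen Interval of 4 and Beacon Period of 100 ms used later in this section, `psm_cycle_energy_mj(4, 100)` evaluates to 24218 milliJoules per cycle.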
Table 2.2: The average throughput in the ad hoc and infrastructure networks using MultiNet and two radios

Network          MultiNet    Two radios
Infrastructure   4.35 Mbps   5.8 Mbps
Ad hoc           1.1 Mbps    4.4 Mbps
Despite the performance advantages seen in Table 2.2, using multiple radios consumes more power. Each radio is always on, and therefore keeps transmitting and receiving over its network. Even when a radio is not transmitting or receiving, it is in idle mode and drains
Figure 2.11: Packet trace for the presentation and chat workloads over the ad hoc
network
a significant amount of power. Figure 2.12 shows the amount of energy consumed by
the MultiNet scheme and the two radio scheme for the above application. Two radios
consume almost double the power consumed by the single MultiNet radio.
Table 2.3: The average packet delay in infrastructure mode for the various strategies

Strategy      Average packet delay
MultiNet      0.157
MultiNet PS   0.167
Figure 2.12: Comparison of total energy usage when using MultiNet versus two
radios
The multiple radio approach can be modified to consume less power by allowing the
network card in infrastructure mode to use PSM. Figure 2.13 shows the energy usage
when the infrastructure radio uses PSM for our application. The Beacon Period is set to
100 ms, and the Listen Interval is 4. The amount of energy consumed in the two radio
case using PSM is very close to the consumption of MultiNet without PSM. However,
this saving comes at a price. It is no longer possible to achieve the high throughput for
infrastructure networks if the cards are in PSM. Simulated results in Table 2.3 show that
the average packet delay over the infrastructure network with PSM is now close to the
average packet delay for MultiNet. Therefore, using two radios with PSM does not give any significant advantage over MultiNet in this respect.
We analyze the two schemes of connecting to multiple networks with respect to the
performance on the network and the amount of power consumed. In our simulated
scenario, each of the radios gives the best achievable throughput on both the networks.
As shown in Table 2.2, the average throughput of MultiNet in the infrastructure mode is
4.35 Mbps compared to 5.8 Mbps in the two radio case. The average throughput in the ad
hoc network is 1.1 Mbps in MultiNet and 4.4 Mbps when using two radios. Switching across networks means that MultiNet is active on each network for only a smaller time period. Consequently, the scheme of using multiple cards gives much better throughput. However, MultiNet can be made more power efficient by using the power save mode for infrastructure networks as described in Section 2.4.2. In our
experiment we chose the Switching Cycle to be 400 ms, with ‘75%IS 25%AH’ switch-
ing. For consistency in comparison, the Listen Interval is set to 4 and the Beacon Period
to 100 ms. Consequently, every time the card switches to infrastructure mode, it listens
for the traffic indication map from the AP. After it has processed all its packets it goes
to sleep and wakes up after 300 ms. It then stays in the ad hoc network for 100 ms,
and then switches back to the infrastructure network. The modified algorithm results in
greater energy savings as shown in Figure 2.13. The average delay per packet over the
infrastructure network is not seriously affected, while the energy consumed is reduced
by more than a factor of 3. We conclude that MultiNet is superior to the use of multiple wireless cards with respect to power efficiency.
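As a sanity check, the per-cycle arithmetic behind this factor can be reproduced from the power numbers quoted earlier (assuming, pessimistically, that the 100 ms ad hoc slot is spent in idle mode, and ignoring the brief TIM reception after waking):

```python
# Per-cycle energy of the modified schedule versus an always-idle card,
# using the Cisco AIR-PCM350 figures. Assumptions: the 100 ms ad hoc
# slot is spent in idle mode (pessimistic), TIM reception is ignored.
SLEEP_W, IDLE_W = 0.045, 1.08

def cycle_energy_j(sleep_s, awake_s, awake_w=IDLE_W):
    return SLEEP_W * sleep_s + awake_w * awake_s

modified = cycle_energy_j(0.3, 0.1)     # sleep 300 ms, ad hoc slot 100 ms
always_idle = cycle_energy_j(0.0, 0.4)  # idle for the whole 400 ms cycle
ratio = always_idle / modified          # roughly 3.6x less energy
```

The resulting ratio of about 3.6 is consistent with the "more than a factor of 3" reduction reported above.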
Note that we do not evaluate power saving in ad hoc mode because we are unaware
of any commercial cards that implement this feature. As a result we were unable to get
performance numbers when using PSM in ad hoc mode. However, we believe that if such a feature were available, MultiNet could achieve similar energy savings in ad hoc mode as well.
Figure 2.13: Energy usage when using MultiNet and two radios with IEEE 802.11
Power Saving
The average packet delay increases on increasing the number of connected networks. Table 2.4 presents the average delay seen by packets over the infrastructure network on varying the number of MultiNet networks from 2 to 6. We used a Fixed Priority switching strategy with equal priorities for all the networks. Increasing the number of networks reduces the Activity Period for each connected network when using Fixed Priority Switching. As a result, more packets are buffered and the average delay encountered by the packets on a network increases.
Table 2.4: The average packet delay in infrastructure mode on varying the number of connected networks

Number of networks   Average packet delay
2                    0.191
3                    0.261
4                    0.332
5                    0.410
6                    0.485
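The near-linear growth is easy to check from the tabulated values themselves; the per-network increments all lie within about 8 ms of each other:

```python
# Average packet delays from Table 2.4, keyed by the number of networks.
delays = {2: 0.191, 3: 0.261, 4: 0.332, 5: 0.410, 6: 0.485}

# Per-network increments: each extra network adds roughly 0.07-0.08 to
# the delay, i.e. the growth is close to linear.
increments = [round(delays[n + 1] - delays[n], 3) for n in range(2, 6)]
```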
2.7.10 Summary
• The adaptive switching strategies are best when no network preference is indicated. Both Adaptive Buffer and Adaptive Traffic allocate more time to networks carrying more traffic.
• For the applications studied, MultiNet consumes 50% less energy than a two card
solution.
• As expected, the average packet delay with MultiNet varies linearly with an in-
crease in the number of connected networks when all the networks are given equal
activity periods.
• MultiNet works even without Spoofed Buffering, although the performance goes down by a factor of 4.
• Next generation Native WiFi cards enable a significant reduction in the switching overhead. The switching delay for legacy cards is reduced to around 300 ms, while this number goes down to 30 ms for Native WiFi cards.
• Existing zero configuration schemes require the user to prioritize the preferred network. With MultiNet based zero configuration, the user need not specify any network preference.
• In mobile scenarios, MultiNet exposes the same connectivity status as real wireless cards.
This section discusses ways in which the performance of MultiNet can be improved. In particular, it focuses on reducing the switching overhead, enabling 802.1X [57] authentication, and addressing other open issues.
Good performance of MultiNet depends on low switching delays. The main cause of
the switching overhead in current generation wireless cards is the 802.11 network asso-
ciation and authentication protocol [58], which is executed every time the card switches
to a network. Further, these cards do not store state for more than one network in the
firmware, and worse still, many card vendors force a firmware reset when changing the mode of the card.
Most of these problems are fixed in the next generation Native WiFi cards. These
cards do not incur a firmware reset on changing their mode. Moreover, since switching
is forced by MultiNet, Native WiFi cards do not explicitly disconnect from the network
when switching. However, they still carry out the association procedure, which causes the bulk of the remaining switching delay. By storing the state of each network and automatically initializing them, this delay can be made negligible. The only overhead on switching is then the synchronization with the wireless network.
Using the above optimizations, the switching time of a WLAN card is limited only by the speed with which the card can switch to a different channel and the speed with which a network's state can be loaded from a flash card. Recent research has shown that the time to switch
to a different channel is less than 100 µsec for an IEEE 802.11a wireless card [51].
Further, as the network state to load is around 100 bytes, and data transfer speeds for flash cards are 8 Mbps [13], we expect the switching overhead to be less than 1 ms.
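That estimate is simple arithmetic over the two components (treating the 8 Mbps flash figure as a raw bit rate):

```python
# Back-of-the-envelope switching overhead from the figures above
# (assumption: the 8 Mbps flash transfer speed is a raw bit rate).
channel_switch_s = 100e-6              # channel switch time, < 100 us [51]
state_bits = 100 * 8                   # ~100 bytes of per-network state
flash_rate_bps = 8e6                   # flash card transfer speed [13]

state_load_s = state_bits / flash_rate_bps   # 0.1 ms
total_s = channel_switch_s + state_load_s    # 0.2 ms, well under 1 ms
```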
IEEE 802.1X is a port-based authentication protocol that is becoming popular for enterprise wireless LANs. For MultiNet to be useful in all environments it has to support IEEE 802.1X. The IEEE 802.1X state machine is currently implemented in the Wireless Zero Configuration Service (WZC) for Windows XP, and we had to turn off WZC for MultiNet to work. Only minor changes are needed in WZC for it to work
with MultiNet. However, achieving good performance with IEEE 802.1X is difficult.
We measured the overhead of the IEEE 802.1X authentication protocol and found it to
be approximately 600 ms. It is clear that we need to prevent the card from going through
a complete authentication procedure every time it switches across IEEE 802.1X enabled
networks. We can eliminate the authentication cycles by storing the IEEE 802.1X state
in the MPD and using this state instead of redoing the authentication procedure. Further, we can take advantage of preauthentication support at the APs. Preauthentication works by having the APs maintain a list of authenticated nodes. When implemented, this optimization will eliminate the authentication overhead when switching across IEEE 802.1X enabled networks.
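The effect of storing the IEEE 802.1X state can be sketched as a per-network cache that pays the measured ~600 ms handshake only on the first visit (hypothetical class and names; in MultiNet the state would be held by the MPD):

```python
# Sketch of caching IEEE 802.1X state per network so that switching back
# to a network skips the full handshake (hypothetical class and names;
# in MultiNet the state would be held by the MPD).
FULL_AUTH_MS = 600  # measured overhead of a full IEEE 802.1X authentication

class AuthStateCache:
    def __init__(self):
        self._state = {}  # network name -> opaque 802.1X session state

    def switch_to(self, ssid):
        """Return the authentication cost (ms) paid for this switch."""
        if ssid in self._state:
            return 0  # reuse the stored state; no re-authentication
        self._state[ssid] = object()  # stand-in for the negotiated state
        return FULL_AUTH_MS
```

Only the first switch to each IEEE 802.1X network pays the 600 ms cost; every subsequent switch in the same session is free.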
Can MultiNet be implemented in the firmware of the wireless card? The simple answer is yes; however, we strongly believe that the right place to implement
MultiNet is as a kernel driver. Buffering imposes memory requirements that are best
taken care of by the operating system, and the policy driven behavior can bloat the
firmware. Additionally, by moving the intelligence into a general purpose PC, the cost of
the wireless hardware can be reduced further, which is the trend for the next generation of wireless cards.
The switching behavior of MultiNet augurs badly for TCP performance. MultiNet is
implemented below IP, and so TCP sees fluctuating behavior for packets sent by it. It
receives immediate acknowledgements for packets sent when the network is active, and
delayed acknowledgements for buffered packets. The above behavior affects the way
TCP adjusts the RTT for the session, and from the way it is calculated, the RTT will be larger than required. This inflated RTT does ensure that packets are not lost. However, a larger than required RTT has other conse-
quences with respect to flow control, and congestion response. This problem is generally
relevant for networks that have periodic connectivity. A solution to this problem has to
mask the delay encountered by the buffered packets. We are currently exploring ways to do so.
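The RTT inflation can be illustrated with TCP's standard smoothed-RTT estimator (the EWMA of RFC 6298 with alpha = 1/8); the sample values below are illustrative, standing for an ACK received while the network is active (~10 ms) versus one delayed by a full inactive slot (~510 ms):

```python
# TCP's smoothed RTT estimator (standard EWMA with alpha = 1/8, as in
# RFC 6298). Alternating fast ACKs with ACKs delayed by buffering drives
# the estimate far above the true path RTT.
def srtt_ms(samples, alpha=0.125):
    srtt = samples[0]
    for r in samples[1:]:
        srtt = (1 - alpha) * srtt + alpha * r
    return srtt

steady = srtt_ms([10] * 100)           # stays at the 10 ms path RTT
inflated = srtt_ms([10, 510] * 100)    # settles far above 10 ms
```

The inflated estimate in turn inflates TCP's retransmission timeout, which is why the buffering delay should be masked from TCP rather than fed into its RTT measurements.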
A further complication arises when a MultiNet node connects to more than one ad hoc network that has multiple nodes using MultiNet. Solving this problem
requires MultiNet to synchronize its slots to initiators of multiple ad hoc networks, and
those initiators’ slots might not be synchronized. We are looking at ways to handle this
scenario by allowing all nodes, including the initiator, to resynchronize their slots.
Further, Slotted Synchronization does not yet extend to multi-hop ad hoc networks. A scheme that supports multi-hop networks has to handle partitioning
issues of the ad hoc network, and ways to resynchronize it. SSCH, described in Chap-
ter 3, is a step towards making MultiNet work in multi-hop networks. We hope to build on it in future work.
2.10 Summary
• This chapter presents the design of MultiNet. Several compelling real-life scenarios are described that motivate the need for such an architecture. To the best of our knowledge, MultiNet is the first to articulate this problem and propose a solution for IEEE 802.11 hardware.
• The chapter presents Spoofed Buffering, which leverages IEEE 802.11 PSM to buffer packets at the APs without modifying them. Three switching algorithms are presented that are useful in different applications of MultiNet. It also presents Slotted Synchronization for maintaining overlapping slots among MultiNet nodes in an ad hoc network.
• MultiNet is shown to be more power efficient than an alternative of using multiple wireless cards in the device.
• MultiNet has been implemented in Windows XP over commercial wireless cards. Finally, the performance of MultiNet has
been studied in detail, and is shown to give good performance in most scenarios. The
MultiNet software is available for free download, and more information can be found
at: http://www.cs.cornell.edu/people/ranveer/MultiNet/.
The contents of this chapter have benefitted from several helpful suggestions and
comments. In particular, Victor Bahl and Pradeep Bahl were involved in discussions that
helped develop the MultiNet architecture. Slotted Synchronization and the performance
results were revised after inputs from Ken Birman. Further, some of MultiNet's design decisions were refined through these discussions.
3.1 Introduction
The problem of supporting multiple senders and receivers in wireless networks has re-
ceived significant attention in the past decade. One domain where this communication
pattern naturally arises is fixed wireless multi-hop networks, such as community net-
works [21, 70, 107, 109]. Increasing the capacity of such wireless networks has been
the focus of much recent research (e.g., [40, 65, 95]). An obvious way to increase the capacity of a wireless network is to use multiple frequency channels. Channelization was added to the IEEE 802.11 standard to increase the capacity of infrastructure networks: nearby access points can be placed on orthogonal channels so that traffic to and from these access points does not interfere [4]. Non-infrastructure (i.e., ad-
hoc) networks have thus far been unable to exploit the benefits of channelization. The
current practice in ad-hoc networks is for all nodes to use the same channel, irrespective
of whether the nodes are within communication range of each other [107, 109].
Among its constructions, this thesis proposes a new protocol, Slotted Seeded Chan-
nel Hopping (SSCH), which extends the benefits of channelization to ad-hoc networks.
Logically, SSCH operates at the link layer, but it can be implemented in software over
an IEEE 802.11-compliant wireless Network Interface Card (NIC). The SSCH layer
in a node handles three aspects of channel hopping: (i) implementing the node's channel hopping schedule and scheduling packets within each channel, (ii) transmitting the channel hopping schedule to neighboring nodes, and (iii) updating the node's channel hopping schedule based on observed traffic. SSCH thus contains a distributed scheme for coordinating channel switching decisions, but one that only sends a single type of
message, a broadcast packet containing that node’s current channel hopping schedule.
The simulation results show that SSCH yields a significant capacity improvement in both single-hop and multi-hop wireless networks. The main contributions of SSCH are:
• It is a new protocol that increases the capacity of IEEE 802.11 ad-hoc networks by utilizing frequency diversity. The protocol is suitable for a multi-hop environment, does not require changes to the IEEE 802.11 standard, and does not require multiple radios.
• The control traffic in SSCH is distributed across all channels, and thus avoids control channel saturation, a bottleneck in some prior approaches.
• SSCH introduces a second novel technique to achieve good performance for multi-hop flows: it allows a forwarding node to partially synchronize with a source node and partially synchronize with a destination node. This synchronization pattern allows the load for a multi-hop flow to be spread over multiple channels.
In this section, the discussion will be limited to the widely-deployed IEEE 802.11 Dis-
tributed Coordination Function (DCF) protocol [60]. We begin by reviewing some rel-
evant details of this protocol. IEEE 802.11 recommends the use of a Request To Send
(RTS) and Clear To Send (CTS) mechanism to control access to the medium. A sender
desiring to transmit a packet must first sense the medium free for a DCF interframe space
(DIFS). The sender then broadcasts an RTS packet seeking to reserve the medium. If
the intended receiver hears the RTS packet, the receiver sends a CTS packet. The CTS
reserves the medium in the neighborhood of the receiver, and neighbors do not attempt
to send a packet for the duration of the reservation. In the event of a collision or failed
RTS, the node performs an exponential backoff. For additional details, the reader is
referred to [60].
The IEEE 802.11 standard divides the available frequency into orthogonal (non-
overlapping) channels. IEEE 802.11b specifies 11 channels in the 2.4 GHz spectrum,
3 of which are orthogonal, and IEEE 802.11a specifies 13 orthogonal channels in the 5 GHz spectrum. Transmissions on orthogonal channels do not interfere, provided that the communicating nodes on them are reasonably separated (at least 12 inches apart for current hardware).
Using only a single channel limits the capacity of a wireless network. For example,
consider the scenario in Figure 3.1 where 6 nodes are within communication range of
each other, all nodes are on the same channel, and 3 of them have packets to send
to distinct receivers. Due to interference on the single channel, only one of them, in
this case node 3, can be active. In contrast, if all 3 orthogonal channels are used, all
transmissions can take place simultaneously on distinct channels. SSCH captures this potential capacity improvement. The design of SSCH is guided by the following goals:
• SSCH should require only a single radio per node. Some of the previous work
on exploiting frequency diversity has proposed that each node be equipped with
multiple radios [4, 135]. Multiple radios draw more power, and energy consumption is an important concern for many wireless devices. Moreover, by requiring only a single standards-compliant NIC per node, SSCH faces fewer deployment barriers.

Figure 3.1: Only one of the three packets can be transmitted when all the nodes are on the same channel.
• SSCH should not cause logical partitions; any two nodes in communication range
should be able to communicate with each other despite channel hopping. Because
SSCH switches each NIC across frequency channels, different NICs may be on
different channels most of the time. Despite this, any two nodes in communication
range will overlap on a channel with moderate frequency (e.g., at least 10 ms out of every cycle). As we will show in Section 3.5.3, the mathematical properties of the SSCH protocol
guarantee that this overlap always occurs, even in the absence of synchronization.
SSCH exploits frequency diversity using an approach that we term optimistic syn-
chronization. This design makes the common case be that nodes are aware of each
other’s channel hopping schedules. However, SSCH also allows any node to change its
channel hopping schedule at any time. If node A has traffic to send to another node B,
and A knows B’s hopping schedule, A will probably be able to quickly send to B by
changing its own schedule. In the uncommon case that A does not know B’s sched-
ule, or A has out-of-date information about B, then the traffic incurs a latency penalty
while A discovers B’s new schedule. The SSCH design achieves this good common case
behavior when SSCH is used with a workload where traffic patterns change (i.e., new
flows are started) with lower frequency than hopping schedule updates are propagated.
Because hopping schedule update propagation requires only tens of milliseconds, this is
a good workload assumption for many wireless networking scenarios. Section 3.6 gives simulation results supporting this assumption.

A further design challenge is to allow one node, say B, to follow a channel hopping schedule that overlaps half the time with another node A, and half the time with a third node C; this is necessary for node B to forward traffic between A and C. While it is straightforward for node B to have a channel hopping schedule that is an interleaving of A's and C's schedules, this leaves open how B will schedule itself when a fourth node desires to synchronize with B. The channel hopping design described in Section 3.5.2 resolves this issue.
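Why seeded pseudo-random schedules avoid logical partitions can be seen with a small sketch (an illustrative hopping rule in the spirit of the construction in Section 3.5.2, not the exact SSCH schedule): with a prime number of channels, two nodes iterating x ← (x + a) mod 13 with different seeds a must land on the same channel in exactly one slot out of every 13.

```python
# Illustrative seeded hopping over the 13 orthogonal IEEE 802.11a
# channels (a sketch, not the exact SSCH schedule): each node keeps a
# (channel, seed) pair and hops x <- (x + seed) mod 13 in every slot.
N_CHANNELS = 13  # prime, so unequal seeds always yield a rendezvous slot

def schedule(x0, seed, n_slots):
    return [(x0 + i * seed) % N_CHANNELS for i in range(n_slots)]

def overlaps(sched_a, sched_b):
    return sum(1 for a, b in zip(sched_a, sched_b) if a == b)

node_a = schedule(0, 1, 130)  # start channel 0, seed 1
node_b = schedule(5, 2, 130)  # start channel 5, seed 2
# With distinct seeds, the per-slot channel difference hits zero exactly
# once every 13 slots (10 times in 130 slots), so hopping never
# logically partitions the two nodes. A node with traffic for B can
# instead adopt B's (channel, seed) pair and overlap in every slot.
```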
We assume that all nodes are using IEEE 802.11a. SSCH could also be used with other MACs in the IEEE 802.11 family, but evaluation of such options is beyond the scope of this dissertation. We expect wireless cards to be capable of switching across channels. The clocks at all nodes
are assumed to be synchronized to within 1 ms of each other using the Timer Synchro-
nization Function of IEEE 802.11 [58] or its modifications proposed in the literature,
such as ATSP [54, 74] or ASCP [110]. We justified this assumption in Section 2.5.1. As
we discuss in more detail at the beginning of Section 3.6, recent work has reduced this
switching delay to approximately 80 µs [51, 84]. We assume that each wireless card is capable of switching channels at this speed.
We require that NICs with a buffered packet wait after switching for the maximum
length of a packet transmission before attempting to reserve the medium. This prevents
hidden terminal problems from occurring just after switching. This hardware require-
ment is not necessary if the NIC packet buffer can be cleared whenever the channel is
switched.
We divide prior work relevant to SSCH into two categories: prior uses of pseudo-random number generators in wireless protocols, and prior work on exploiting frequency diversity. In the first category, we find that pseudo-random number generators have been used for a variety of tasks in wireless networking. For example, the SEEDEX protocol uses pseudo-random seeds to schedule medium access in a wireless network. Nodes build a schedule for sending and listening on a network, and pub-
lish their seeds to all the neighbors. A node attempts a transmission only when all its
neighbors (including the receiver) are in a listening state. Assuming relatively constant
wireless transmission ranges, this protocol also helps in overcoming the hidden and ex-
posed terminal problem caused by the RTS/CTS approach. The TSMA protocol [31,32]
is a channel access scheme proposed as an alternative to ALOHA and TDMA, for time-
slotted multihop wireless networks. TSMA aims to achieve the guarantees of TDMA while avoiding its requirement of global slot assignment.
Each node is bootstrapped with a fixed seed that determines its transmission sched-
ule. The schedules are constructed using polynomials over Galois fields (which have
pseudo-random properties), and the construction guarantees that each node will overlap
with only a single other node within a certain time frame. The length of the schedule is fixed. Porting this construction to channel hopping remains an open problem, and even such a porting would not meet the SSCH goal of supporting traffic-driven overlap. Redi et al. [33] use a pseudo-random gener-
ator to derive listening schedules for battery-constrained devices. Each device’s seed
is known to a base station, which can then schedule transmissions for the infrequent intervals when the device is listening. Although pseudo-random number generators have been used for a number of tasks (as this survey of the literature makes clear), to the best of our knowledge, SSCH is the first protocol to use a pseudo-random schedule for channel hopping.

The second category consists of prior work on exploiting frequency diversity. This is a significant body of research. The first division we make in this body of work is between approaches that assume a single NIC capable of
communicating on a single channel at any given instance in time, and those that assume
more powerful radio technology, such as multiple NICs [4, 112] or NICs capable of lis-
tening on many channels simultaneously [66,89], even if they can only communicate on
one. Our work falls into the former category; the SSCH architecture can be deployed over a single commodity IEEE 802.11 NIC. Dynamic Channel Assignment (DCA) [112] and the Multi-radio Unification Protocol (MUP) [4] are both technologies that use multiple radios (in both cases, two radios) to
take advantage of multiple orthogonal channels. DCA uses one radio on a control chan-
nel, and the other radio switches across all the other channels sending data. Arbitration
for channels is embedded in the RTS and CTS messages, and is executed on the control
channel. Although this scheme may fully utilize the data channel, it does so at the cost
of using an entire radio just for control. MUP uses both radios for data and control trans-
missions. Radios are assigned to orthogonal channels, and a packet is sent on the radio
with better channel characteristics. This scheme gives good performance in many sce-
narios. However, it still only allows the use of as many channels as there are radios on
each physical node. From our perspective, the key drawback to both DCA and MUP is
simply that they require the use of multiple radios. Recently, commercial products have
appeared that support multiple radios on a single NIC [44]. It is not known whether
these products will achieve as many radios on a NIC as there are available channels, nor at what cost. One way to understand the difference between SSCH and a true multiple radio design is to consider two distinct sources of bottleneck in a single-radio, single-channel system: the saturation of the channel, and the saturation of any individual radio. SSCH alleviates only the former, increasing the aggregate bandwidth available in the network without increasing the bandwidth of any individual radio. In contrast, a true multiple
radio design increases both. A specific example of this difference is that a node using
MUP (a true multiple radio design) can simultaneously send and receive packets on
separate channels, while a node using SSCH can only perform one of these operations
at a time.
We next turn our attention to work assuming more powerful radio technology than commodity IEEE 802.11 NICs. One line of work assumes frequency hopping spread spectrum (FHSS) wireless cards.
to a small fraction of the time required to send a packet, and the wireless NIC is on a
different frequency during each slot. All nodes are required to maintain synchronized
clocks, where the synchronization is at the granularity of slot times that are much shorter
than the duration of a packet. Each slot is subdivided into four segments of time, one of which (the DATA interval) carries the packet data; the other three segments of time are assumed to be small in comparison with the amount of time spent sending a segment of the packet during the DATA time interval. To the best of our knowledge, an FHSS wireless card that supports this type of MAC protocol at high data rates is not currently available.
Another line of related work assumes technology by which nodes can concurrently
listen on all channels. For example, Nasipuri et al. [89] and Jain et al. [66] assume
wireless NICs that can receive packets on all channels simultaneously, and where the
channel for transmission can be chosen arbitrarily. In these schemes, nodes maintain a
list of free channels, and either the sending or receiving node chooses a channel with the
least interference for its data transfer. Wireless NICs do not currently support listening
on arbitrarily many channels, and we do not assume the availability of such technology.
We finally consider prior work that only assumes the presence of a single NIC with a
single half-duplex transceiver. The only other approach that we are aware of to exploit-
ing frequency diversity under this assumption is Multichannel MAC (MMAC) [114].
Like SSCH, MMAC attempts to improve capacity by arranging for nodes to simultaneously use multiple orthogonal channels. Nodes using MMAC periodically switch to a common control channel, negotiate their channel selections, and then switch to the negotiated channel, where they contend for the
channel as in IEEE 802.11. This scheme raises several concerns that SSCH attempts to address. A first concern is that MMAC requires tight clock synchronization; to the extent that these timing requirements are relaxed, MMAC must spend more time on the common control channel, and tight clock synchronization is known to be difficult in multi-hop wireless networks [54]. In contrast, SSCH does not require tight clock synchronization because SSCH does not have a common control channel or a dedicated control phase. A second concern is that control traffic may constitute a significant fraction of the system traffic, and the common synchronization channel can become a bottleneck; SSCH avoids this by distributing synchronization and control traffic across all the available channels. A third concern with MMAC is that it assumes wireless NICs are capable of switching across channels with negligible delay; we believe that a non-trivial switching time better reflects the current state of the art in wireless NIC design, and
SSCH performs well with this assumption. A fourth concern with MMAC is that it may
not efficiently support multi-hop flows because forwarding nodes may not predictably
split their time between their sending and receiving neighbors. SSCH addresses this by allowing a forwarding node to devote some of its slots to receiving and others to sending (Section 3.5.2).
Although this survey does not cover all related work, it does characterize the current
state of the field. At the level of detail in this section, prior work such as CHMA [124]
is similar to HRMA [137], and MAC-SCC [80] and the MAC protocols implicit in the
work of Li et al. [79] and Fitzek et al. [46] are similar to DCA [135]. However, a final
related channel hopping technology that is worth mentioning is the definition of FHSS
channels in the IEEE 802.11 [60] specification. At first glance, it may seem redun-
dant that SSCH does channel hopping across logical channels, each one of which (per
the IEEE 802.11 specification) may be employing frequency hopping across distinct
frequencies at the physical layer. The IEEE 802.11 specification justifies this physi-
cal layer frequency hopping with the scenario of providing support for multiple Basic
Service Sets (BSS’s) that can coincide geographically without coinciding on the same
logical channel. In contrast, SSCH does channel hopping so that any two nodes can
coincide as much or as little of the time as they desire. This is also at the heart of the
difference between SSCH and past work on channel-hopping protocols where nodes
overlap a fixed fraction of the time [32] – in SSCH, the degree of overlap between any two nodes is driven by their traffic demands.
3.5 SSCH
SSCH switches each radio across multiple channels and distributes flows within inter-
fering range of each other on orthogonal channels. This results in significantly increased
network capacity when the network traffic pattern consists of such flows.
SSCH is fully distributed; it does not require synchronization or leader election. Nodes do attempt to synchronize the start and end times of their slots, but SSCH is robust to imperfect slot synchronization (Section 3.6).
SSCH is designed to work with MultiNet, where a slot is defined to be the time a node remains on a single channel before hopping (10 ms in our parameter settings). A longer slot duration would have further decreased the overhead of channel switching, but would have increased the
delay that packets encounter during some forwarding operations. The channel schedule
is the list of channels that the node plans to switch to in subsequent slots and the time at
which it plans to make each switch. Each node maintains a list of the channel schedules
for all other nodes it is aware of – this information is allowed to be out-of-date, but
the common case will be that it is accurate. The good performance exhibited by SSCH does not depend on this information always being accurate. We first describe the packet scheduling decisions that are made by each node within a slot, and we refer to this as the packet schedule
(Section 3.5.1). Next, we define the policy for updating the channel schedule and for
propagating the channel schedule to other nodes (Section 3.5.2). We then describe the
mathematical properties that guided SSCH's design (Section 3.5.3). Finally, we discuss how flows behave under these scheduling policies.
3.5.1 Packet Scheduling

SSCH maintains packets in per-neighbor FIFO queues. These queues maintain standard
higher-layer assumptions about in-order delivery. The per-neighbor FIFO queues are, in addition, useful for limiting the number of packets sent to nodes that are unreachable. It works as follows. At the start of each slot, packets are drawn in round-robin order from all flows. If a packet transmission to a particular neighbor fails, the corresponding flow
is reduced in priority until a period of time equal to one half of a slot duration has
elapsed – this limits the bandwidth wasted on flows targeted at nodes that are currently
on a different channel to at most two packets per slot whenever a flow to a reachable
node also exists. Packets are only drawn from the flows that have not been reduced in priority, whenever such flows exist.
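A minimal sketch of this packet scheduling policy is given below (illustrative Python; the class and method names are ours, not from the implementation, and full round-robin bookkeeping across calls is omitted for brevity):

```python
from collections import defaultdict, deque

SLOT_MS = 10.0  # slot duration from the text

class PacketScheduler:
    """Sketch of SSCH packet scheduling: per-neighbor FIFO queues, and
    temporary demotion (for half a slot) of flows whose last send failed."""

    def __init__(self):
        self.queues = defaultdict(deque)   # neighbor -> FIFO of packets
        self.demoted_until = {}            # neighbor -> time (ms) demotion ends

    def enqueue(self, neighbor, packet):
        self.queues[neighbor].append(packet)

    def on_failure(self, neighbor, now_ms):
        # Reduce the flow's priority for one half of a slot duration.
        self.demoted_until[neighbor] = now_ms + SLOT_MS / 2

    def next_packet(self, now_ms):
        """Return (neighbor, packet), preferring flows that have not been
        reduced in priority; fall back to demoted flows only when no other
        packets exist (bounding the bandwidth wasted on absent nodes)."""
        for allow_demoted in (False, True):
            for neighbor, q in self.queues.items():
                demoted = self.demoted_until.get(neighbor, 0.0) > now_ms
                if q and demoted == allow_demoted:
                    return neighbor, q.popleft()
        return None
```

A failed transmission to one neighbor thus diverts service to other reachable flows for half a slot, after which the demoted flow becomes eligible again.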
Because nodes using SSCH will often be on different channels, broadcast packets
transmitted in any one slot are likely to reach only some of the nodes within physical
communication range. The SSCH layer handles this issue through repeated link-layer retransmission: each broadcast packet is transmitted in a number of consecutive slots. Although broadcast packets sent this way may reach a different set of nodes than if all nodes had been on
the same channel, we have not found this to present a difficulty to protocols employing
broadcast packets — in Section 3.6 we show that as few as 6 transmissions allows DSR
(a protocol that relies heavily on broadcasts) to function well. This behavior is not sur-
prising because broadcast packets are known to be less reliable than unicast packets, and
so protocols employing them are already robust to their occasional loss. However, the
SSCH retransmission strategy may not be compatible with all uses of broadcast, such
as its use for synchronization [43]. Also, deploying SSCH in an environment with a large amount of broadcast traffic would reduce its benefits. Despite the cost of retransmitting broadcast packets, SSCH still delivers significant capacity improvement in the traffic scenarios we study.
An SSCH node with a packet to send may discover that a neighbor is not present on the current channel. However, the node may very well be present on another channel, in which case SSCH should still
deliver the packet. To handle this, we initially retain the packet in the packet queue.
Packets are dropped only when SSCH gives up on all packets to a given destination, and
this dropping of an entire flow occurs only when we have failed to transmit a packet to
the destination node for an entire cycle through the channel schedule. We will explain
the meaning of a cycle through the channel schedule in Section 3.5.2, but with our cho-
sen parameter settings the timeout is 530 ms. After a flow has been garbage collected,
new packets with the same destination inserted in the queue are assigned to a new flow.
This packet scheduling policy is simple to implement, and yields good performance
in the common case where node schedules are known, and information about node avail-
ability is accurate. A potential drawback is that a node crash (or other failure events)
can lead to a number of wasted RTSs to the failed node. When summed across channels,
the number may exceed the IEEE 802.11 suggested value of 7 retransmission attempts
for RTS packets. In Section 3.6, we quantify the cost of such failures and show that it is
small.
3.5.2 Channel Scheduling

We begin our description of channel scheduling by describing the data structure used to
represent the channel schedule. We then describe the policy nodes use to act on their own
channel schedule, the mechanism to communicate channel schedules to other nodes, and
finally the policy nodes implement for updating or changing their own channel schedule.
The channel schedule must capture a given node’s plans for channel hopping in the
future, and there is obvious overhead to representing this as a very long list. Instead, we
compactly represent the channel schedule as a current channel and a rule for updating
the channel – in particular, as a set of 4 (channel, seed) pairs. Our experimental results
show that 4 pairs suffice to give good performance (Section 3.6). We represent the
(channel, seed) pair as (xi, ai). The channel xi is represented as an integer in the range [0, 12] (13 possibilities), and the seed ai is represented as an integer in the range [1, 12].
Each node iterates through all of the channels in the current schedule, switching to the
channel designated in the schedule in each new slot. The node then increments each of its channels after the corresponding slot, using its seed:

xi ← (xi + ai) mod 13
We introduce one additional slot to prevent logical partitions. After the node has
iterated through every channel on each of its 4 slots, it switches to a parity slot whose
channel assignment is given by xparity = a1. The term parity slot is derived from the
analogy to the parity bits appended at the end of a string in some error correcting codes.
The mathematical justification for this design is given in Section 3.5.3. We use the term
cycle to refer to the 530 ms iteration through all the slots, including the parity slot.
In Figure 3.2, we illustrate possible channel schedules for two nodes in the case of 2
slots and 3 channels. In the Figure, node A and node B are synchronized in one of their
two slots (they have identical (channel, seed) pairs), and they also overlap during the
parity slot. The field of the channel schedule that determines the channel during each
slot is shown in bold. Each time a slot reappears, the channel is updated using the seed.
For example, node A’s slot 1 initially has (channel, seed) = (1,2). The next time slot 1
is entered, the channel is updated by adding the seed to it mod 3 (mod 3 because in this example there are 3 channels).
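The iteration and update rule can be made concrete with a short sketch (illustrative Python, not the dissertation's implementation; the function name `cycle` is ours). With 4 slots, 13 channels, and 10 ms slots, one cycle lasts (4 × 13 + 1) × 10 ms = 530 ms, including the parity slot.

```python
def cycle(schedule, num_channels):
    """Yield the channel visited in each slot of one full cycle.

    `schedule` is a list of [channel, seed] pairs, one per slot; it is
    mutated as each slot's channel is advanced by its seed."""
    for _ in range(num_channels):      # each slot steps through all channels
        for pair in schedule:
            yield pair[0]              # hop to this slot's current channel
            pair[0] = (pair[0] + pair[1]) % num_channels
    yield schedule[0][1]               # parity slot: channel = seed of slot 1
```

Replaying node A of Figure 3.2 ((channel, seed) pairs (1, 2) and (2, 1) with 3 channels) yields the channel sequence 1, 2, 0, 0, 2, 1 followed by parity channel 2, matching the figure.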
Nodes switch from one slot to the next according to a fixed schedule (every 10 ms
in our current parameter settings). However, the decision to switch channels may occur
while a node is transmitting or receiving a packet. In this case we delay the switch until
after the transmission and ACK (or lack thereof) have occurred.
Nodes learn each other’s schedules by periodically broadcasting their seeds and the
offset within this cycle through the channel schedule.

A: 1 2 0 0 2 1 2 1 (slots: 1 2 1 2 1 2 parity 1)
B: 1 0 0 1 2 2 2 1 (slots: 1 2 1 2 1 2 parity 1)
Figure 3.2: Channel hopping schedules for two nodes with 3 channels and 2 slots. Node A always overlaps with Node B in slot 1 and the parity slot. The field of the channel schedule that determines the channel during each slot is shown in bold.

We use the IEEE 802.11 Long Control Frame Header format to embed both the schedule and the node's current offset –
this is discussed in more detail in Section 3.6.4. The SSCH layer at each node schedules this announcement once per slot.
Nodes also update their knowledge of other nodes’ schedules by trying to communi-
cate and failing. Whenever a node sends an RTS to another node, and that node fails to
respond even though it was believed to be in this slot, the node sending the RTS updates
the channel schedule for the other node to reflect that it does not currently know the other node's schedule.
We now turn to the question of how a given node changes its own schedule. Schedules are updated in two ways: each node attempts to ensure that its slots start and stop
at roughly the same time as other nodes, and that its channel schedule overlaps with the schedules of nodes for which it has packets to send. We embed the information needed for this synchronization within the Long Control Frame Header as well. Using this information, a simple averaging scheme such as described by Elson et al. [43] can be applied to achieve the loose synchronization required for good performance (Section 3.6 shows that a 100 µs clock skew is well tolerated).
At a high level, each node achieves overlap with nodes for which it has traffic
straightforwardly, by changing part of its own schedule to match that of the other nodes.
However, a number of minor decisions must be made correctly in order to achieve this behavior. In the example of Figure 3.3, node A has two slots, with (channel, seed) pairs represented by A1 and A2; nodes B and C are similarly configured.
Nodes recompute their channel schedule right before they enqueue the packet an-
nouncing this schedule in the NIC (and so at least once per slot). In a naive approach,
this node could examine its packet queue, and select the (channel, seed) pairs which
lead to the best opportunity to send the largest number of packets. However, this ignores
the interest this node has in receiving packets, and in avoiding congested channels. An
example of the kind of problem that might arise if one ignores the interest in receiving
packets is given in Figure 3.3. Here, A synchronized with B, and then B synchronized
with C in such a way that A was no longer synchronized with B. This could have been
avoided if B had used its other slot to synchronize with C, as it would have if it had considered its interest in receiving packets.
To account for this node’s interest in receiving packets, we maintain per-slot counters
for the number of packets received during the previous time the slot was active (ignoring
broadcast packets). Any slot that received more than 10 packets during the previous
iteration through that slot is labeled a receiving slot; if all slots are receiving slots, any
one is allowed to be changed. If some slots are receiving slots and some are not, only
the (channel, seed) pair on a non-receiving slot is allowed to be changed for the purpose of synchronizing with additional nodes.
SSCH has to avoid the scenario where all nodes in a network converge on the same
(channel, seed) pair value. This situation could arise in a number of scenarios. For
example, if a node, say A, initiates a flow to another node, say B, and then node C
initiates a flow to node A, then A, B and C will synchronize to the same (channel,
seed) value. Moreover, if these were the only nodes in the network, they would never
change their (channel, seed) value. This situation is a problem for SSCH since all nodes
will hop to the same channel in every slot, and therefore all flows will be on the same
channel. Hence, the benefits of channelization are lost, and SSCH becomes equivalent
to a single-channel MAC.
A node compares the (channel, seed) pairs of all nodes from which it received packets
in a given slot, with the list of (channel, seed) pairs of all the other nodes in its list of
channel schedules. If the number of nodes synchronized to the same (channel, seed)
pair is more than twice the number that this node communicated with in the previous
occurrence of the slot, we attempt to de-synchronize it from these other nodes. De-
synchronization just involves choosing a new (channel, seed) pair for this slot.
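The de-synchronization test described above can be sketched as follows (hypothetical Python helper; the name and argument layout are ours, not taken from the implementation):

```python
def should_desynchronize(my_pair, neighbor_pairs, num_communicated):
    """Decide whether to pick a new (channel, seed) pair for one slot.

    my_pair          -- this node's (channel, seed) pair for the slot
    neighbor_pairs   -- (channel, seed) pairs of all other known nodes
    num_communicated -- nodes this node exchanged packets with during the
                        previous occurrence of this slot
    """
    synchronized = sum(1 for p in neighbor_pairs if p == my_pair)
    # Far more nodes share this pair than we actually talk to: de-synchronize.
    return synchronized > 2 * num_communicated
```

The factor of two provides hysteresis: a node tolerates some surplus synchronization before giving up a slot assignment that its active flows may still depend on.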
Figure 3.4: Need for De-synchronization: All nodes converge to the same channel
without de-synchronization.
The need for de-synchronization is illustrated in Figure 3.4. Our protocol is simu-
lated for 10 stationary nodes, and one of them is randomly picked as a test node. All
nodes are within communication range of each other, the slot duration is 10 ms, and
each node has 4 (channel, seed) pairs. We consider IEEE 802.11a [59], which has 13
orthogonal channels. Initially, every node starts a flow to a randomly chosen destination
node for a random duration between 1 and 500 ms. At the end of a flow, a node starts
a different flow with a randomly picked destination and duration. Figure 3.4 plots the
number of neighbors of the test node that have the same (channel, seed) pair in a slot as
the test node. Without de-synchronization, the number of nodes with the same (channel,
seed) pair increases monotonically over time for each of the 4 (channel, seed) values.
After around 370 slots, which is 370*10 ms = 3.7 seconds, all 9 neighbors of the test
node converge to the same (channel, seed) pair on all slots. Consequently, all nodes
always switch to the same channel all the time, and SSCH becomes equivalent to a single-channel MAC.
The final constraints we add moderate the pace of change in schedule information.
Each node only considers updating the (channel, seed) pair for the next slot, never for
slots further in the future. If the previous set of criteria suggest updating some slot other
than the next slot, we delay that decision. Given these constraints, picking the best pos-
sible (channel, seed) pair simply requires considering the choice that synchronizes with
the set of nodes for which we have the largest number of queued packets. Additionally,
the (channel, seed) pair for the first slot is only allowed to be updated during the par-
ity slot – this helps to prevent logical partition, as will be explained in more detail in
Section 3.5.3.
Under these policies, a source node will find that it can assign all of its slots to support sends. A sink node will
find that it rarely changes its slot assignment, and hence nodes sending to it can easily
stay synchronized. A forwarding node will find that some of its slots are used primarily
for receiving; after re-assigning the channel and seed in a slot to support sending, the
slots that did not change are more likely to receive packets, and hence to stabilize on
their current channel and seed as receiving slots for the duration of the current traffic
patterns. Our simulation results (Section 3.6) support this conclusion. We refer to the resulting behavior as traffic-driven overlap.
3.5.3 Mathematical Properties

Our discussion of the mathematical properties of SSCH will initially focus on the static
case. The behavior of SSCH when channel schedules are not changing assures us that in
a steady-state flow setting, nodes will rendezvous appropriately, in a sense that we make
precise below. We will then expand our discussion to include the dynamics of channel
scheduling in an environment where flows are starting and stopping. In our discussion,
we assume that all nodes use IEEE 802.11 to synchronize their clocks within 1 ms of
each other, and there are no Byzantine failures in the network. A node never sends false channel schedule information.
The channel scheduling mechanism has three simultaneous design goals: allowing
nodes to be synchronized in a slot, infrequent overlap between nodes that do not have
data to send to each other, and ensuring that all nodes come into contact occasionally (to
avoid a logical partition). To achieve these goals, we rely on a very simple mathematical property of addition modulo a prime.
Consider two nodes that want to be synchronized in a given slot. If they have iden-
tical (channel, seed) pairs for this slot, then clearly they will remain synchronized in
future iterations (using the static assumption). Now consider two nodes that are not syn-
chronized because they have different seeds. A simple calculation shows that these two
nodes will overlap exactly one out of every 13 iterations in this slot (recall that 13 is
the number of channels). This is the behavior we want from these nodes: they overlap
regularly enough that they can exchange their channel schedules, but they are mostly on different channels.
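The once-per-13-iterations property holds because the channel difference between the two nodes advances by the (nonzero) seed difference modulo the prime 13, visiting every residue exactly once before repeating. A small sketch (illustrative Python; the helper is our own) makes this easy to check:

```python
def overlaps_per_cycle(x1, a1, x2, a2, p=13):
    """Count how many of p consecutive iterations of one slot put two nodes
    on the same channel, under the update x <- (x + a) mod p."""
    count = 0
    for _ in range(p):
        count += (x1 == x2)
        x1, x2 = (x1 + a1) % p, (x2 + a2) % p
    return count
```

Any two distinct seeds overlap exactly once per 13 iterations; identical (channel, seed) pairs overlap in all 13; identical seeds with different channels never overlap in regular slots, which is the lock-step case the parity slot exists to repair.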
Now consider the rare case that two nodes share identical seeds in every slot, but
different channels accompany each seed – this has at most a 1 in 13^4 ≈ 28,000 chance
of occurring for randomly chosen (channel, seed) pairs. In this case, the nodes will
march in lock-step through the same set of channels in each slot, never overlapping.
This would be problematic, and it is this situation that the parity slot prevents. To justify
this claim, we consider two distinct situations. If both nodes enter their parity slot at
the same time, then they overlap there because the parity channel is equal to the seed
for the first slot for both nodes. With our chosen parameter settings of 10 ms per slot,
4 slots, and 13 channels, this overlap occurs once every 530 ms and lasts for 10 ms. If
their parity slots do not occur at the same time, then the first node’s parity slot offers a
fixed target for the slot in which the second node is changing channels, and again, the
two nodes will overlap. This overlap occurs once every 7 seconds. Although both these
cases will be rare, the SSCH time synchronization mechanism allows us to ignore the
second case entirely – a relative clock skew of 5 ms or less is sufficient to guarantee that the two nodes' parity slots overlap in time.
Now considering the dynamic case (and assuming clock synchronization to within
5 ms), we note that nodes are not permitted to change the seed for the first of their four
slots except during a parity slot. Therefore they will always overlap in either the first slot
or the parity slot, and hence will always be able to exchange channel schedules within a single cycle.
The use of addition modulo a prime to construct channel hopping schedules does
not restrict SSCH to scenarios where the number of channels is a prime number. If one
desired to use SSCH with a wireless technology where the number of channels is not a
prime, one could straightforwardly use a larger prime as the range of xi , and then map
down to the actual number of channels using a modulus reduction. Though the mapping
would have some bias to certain channels, the bias could be made arbitrarily small by choosing a sufficiently large prime.
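The bias of such a modulus reduction can be quantified directly (illustrative Python; `reduction_bias` is our own helper, not part of SSCH):

```python
from collections import Counter

def reduction_bias(prime, num_channels):
    """Ratio of the most- to least-frequent channel when hop values
    0..prime-1 are mapped down via x mod num_channels."""
    counts = Counter(x % num_channels for x in range(prime))
    return max(counts.values()) / min(counts.values())
```

For example, with 12 channels a prime of 13 makes channel 0 twice as likely as the others (ratio 2.0), while a prime of 1201 shrinks the ratio to 101/100 = 1.01.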
A final point about the use of addition modulo a prime is that SSCH can be modified
to require fewer bits to represent a node's schedule by reducing the number of choices for a seed. The only penalty to this reduction is increasing the protocol's reliance on the parity slot.

3.6 Simulation Results

This section presents the simulation results of SSCH in QualNet and compares its performance with the commonly used single-channel IEEE 802.11a protocol. Subsection 3.6.1 presents microbenchmarks of SSCH's overheads, and later subsections extend the evaluation to encompass mobility and multihop routing. Our results show that SSCH incurs very
low overhead, and significantly outperforms IEEE 802.11a in a multiple flow environ-
ment.
Nodes are placed within a 200m area. All nodes in a single simulation run use the same MAC, either SSCH or
IEEE 802.11a. All nodes are set to operate at the same raw data rate, 54 Mbps. We
assume 13 usable channels in the 5 GHz band. SSCH is configured to use 4 seeds,
and each slot duration is 10 ms. All seeds are randomly chosen at the beginning of
each simulation run. The macrobenchmarks in subsections 3.6.2 and 3.6.3 are averages
from 5 independent simulation runs, while the microbenchmarks in subsection 3.6.1 are taken from a single representative run.
We primarily measure throughput under a traffic load of maximum rate UDP flows.
In particular, we use Constant Bit Rate (CBR) flows of 512 byte packets sent every 50
µs. This data rate is more than the sustainable throughput of IEEE 802.11a operating at
54 Mbps.
For all our simulations, we modified QualNet to use a channel switch delay of 80
µs. This choice was informed by recent work in solid state electronics on reducing
the settling time of the Voltage Control Oscillator (VCO) [85]. Switching the channel
of a wireless card requires changing the input voltage of the VCO, which operates in
a Phase Locked Loop (PLL) to achieve the desired output frequency. The delay in
channel switching is due to this settling time. The specification of Maxim IEEE 802.11b
Transceivers [84] shows this delay to be 150 µs. More recent work [51] shows that this delay can be reduced substantially, informing our choice of 80 µs.
3.6.1 Microbenchmarks
We first measure the overhead during the successful initiation of a CBR stream, and of failing to initiate a parallel CBR stream. We then measure the overhead of continuing to attempt transmissions to a mobile node that has moved out of range.
These scenarios cover many of the different dynamic events that a MAC must appro-
priately handle: a flow starting while a node is present, a flow starting while a node is
absent, simultaneous flows where both nodes are present, simultaneous flows where one
node moves out of range, etc. Finally, the last scenario (Section 3.6.1) measures the robustness of SSCH to clock skew.

Switching and Synchronizing Overhead

We begin by measuring the overhead of initiating a maximum rate CBR stream between two nodes within communication range of each other. The first node initiates the stream just after the parity slot. This incurs a worst-case delay in synchronization,
because the first of the four slots will not be synchronized until 530 ms later.
In Figure 3.5, we graph the instantaneous throughput at the receiver node. The
sender quickly synchronizes with the receiver on three of the four slots, as it should, and
on the fourth slot after 530 ms. The figure shows the throughput while synchronizing
(oscillating around 3/4 of the raw bandwidth), and the time required to synchronize.
After synchronizing, the channel switching and other protocol overheads of SSCH lead
to only a 400 Kbps penalty in the steady-state throughput relative to IEEE 802.11a.
This penalty conforms to our intuition about the overheads in SSCH: a node spends 80
µs every 10 ms switching channels (80 µs/10 ms = .008), and then typically forgoes roughly one of the approximately 35 packet transmission opportunities per slot while settling on the new channel (1 packet/35 packets = .028). Adding these two overheads together leads to an expected cumulative overhead of 3.6%, which is in close agreement with the measured penalty.
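The arithmetic behind this estimate can be checked directly (the per-slot packet count of roughly 35 is taken from the figures above):

```python
# Back-of-the-envelope check of the SSCH overhead estimate from the text.
switch = 80e-6 / 10e-3   # 80 µs of switching per 10 ms slot -> 0.008
settle = 1 / 35          # roughly 1 of ~35 packets per slot -> ~0.028
total = switch + settle  # ~0.036, i.e. about 3.6% cumulative overhead
```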
Note that the throughput of the session reaches a maximum of only 13 Mbps, al-
though the raw data rate is 54 Mbps. This low utilization can be explained by the IEEE
802.11a requirement that the RTS/CTS packets be sent at the lowest supported data rate (6 Mbps), which consumes a large fraction of each exchange's airtime.
Overhead of an Absent Node

SSCH requires more re-transmissions than IEEE 802.11 in order to prevent logical partitions. These retransmissions waste bandwidth that could have been dedicated to a node
that was present on the channel. To quantify this overhead, we initiated a CBR stream
between two nodes, allowed the system to quiesce, and then initiated a send from the
first node to a non-existent node.

Figure 3.5: Switching and Synchronizing Overhead: Node 1 starts a maximum rate UDP flow to Node 2. We show the throughput for both SSCH and IEEE 802.11a.

We present a moving average of the throughput over 80 ms in Figure 3.6. It shows that the sender takes 530 ms to timeout on the non-existent
node. During this time the session throughput drops by 550 Kbps, which is a small penalty.
Next, we quantify the ability of SSCH to fairly share bandwidth between two flows, and
to quickly achieve this fair sharing. We start with Node 1 sending a maximum rate UDP
stream to Node 2. At 21.5 seconds, Node 1 starts a second maximum rate UDP stream
to Node 3.
Figure 3.6: Overhead of an Absent Node: Node 1 is sending a maximum rate UDP stream to Node 2, and also attempts a send to an absent node.

Figure 3.7 presents a moving average of the throughput achieved by both nodes over a period of 140 ms. It illustrates the instantaneous throughput achieved at Nodes 2 and 3 (the receivers). The bandwidth is split between the receivers nearly perfectly (and with little variance).
Overhead of Mobility
Ideally, SSCH should be able to detect a link breakage due to movement of a node, and
subsequently re-synchronize to other neighbors. We show that SSCH can indeed handle
this scenario with an experiment comprising 3 nodes and 2 sessions, and in Figure 3.8
we present a moving average of each session throughput, averaged over a period of 280
ms.
Node 1 is initially sending a maximum rate UDP stream to Node 2. Node 1 initiates
a second UDP stream to Node 3 at around 20.5 seconds. This bandwidth is then shared
between both the sessions (as in the experiment of Section 3.6.1) until 30 seconds, when
Node 3 moves out of the communication range of Node 1. Our experiment configures
Node 1 to continue to attempt to send to Node 3 until 43 seconds, and during this time Node 2's throughput suffers from the bandwidth wasted on the absent node (Section 3.6.1 measured the overhead of enqueueing a single packet to an absent node). When the stream to Node 3 finally stops, Node 2's received throughput increases back to its earlier level.
Figure 3.8: Overhead of Mobility: Node 1 is sending a maximum rate UDP stream to Node 2. Node 1 starts another maximum rate UDP session to Node 3. Node 3 moves out of range at 30 seconds, while Node 1 continues to attempt to send until 43 seconds.

Overhead of Clock Skew

As we described in Section 3.5.2, SSCH tries to synchronize slot begin and end times, though it is also designed to be robust to clock skew. In this experiment, we quantify the robustness of SSCH to moderate clock skew. We measure the throughput between two nodes after artificially introducing a clock skew between them, and disabling the SSCH synchronization scheme for slot begin and end times. We vary the clock skew from 1 ns
(10^-6 ms) to 1 ms such that the sender is always ahead of the receiver by this value, and
present the results in Figure 3.9. Note the log scale on the x-axis.
The throughput achieved between the two nodes is not significantly affected by a clock skew of less than 10 µs. The drop in throughput is larger for greater clock skews, although the throughput is still acceptable at 10.5 Mbps even when the skew is an entire millisecond.
These results provide justification for the design choice we made not to require nodes
to switch synchronously across slots, as described in Section 3.5.2. For example, a node
will delay switching to receive an ACK, or to send a data packet if its channel reservation is successful. In the 100 node experiment described in Section 3.6.3, we measured the resulting skew between nodes and found it to remain small.

Figure 3.9: Overhead of Clock Skew: Throughput between two nodes using SSCH as the clock skew between them is varied.
We now present simulation results showing SSCH’s ability to achieve and sustain a
consistently high throughput for a traffic pattern consisting of multiple flows. We first
evaluate this using steady state UDP flows. We then extend our evaluation to consider
a dynamic traffic scenario where UDP flows both start and stop. Finally, we study the performance of TCP flows.
Disjoint Flows
We first look at the number of disjoint flows that can be supported by SSCH. All nodes
in this experiment are in communication range of each other, and therefore two flows
are considered disjoint if they do not share either endpoint. Ideally, SSCH should utilize
the available bandwidth on all the channels as the number of disjoint flows in the system increases. We evaluate this by varying the number of nodes in the network from 2 to
30 and introducing a flow between disjoint pairs of nodes — the number of flows varies
from 1 to 15.
Figure 3.10: Disjoint Flows: The throughput of each flow on increasing the number
of flows.
Figure 3.10 shows the average per-flow throughput, and Figure 3.11 shows the total
utilized system throughput. IEEE 802.11a performs marginally better when there is just one flow in the network. When there is more than one flow, SSCH significantly outperforms IEEE 802.11a.
An increase in the number of flows decreases the per-flow throughput for both SSCH
and IEEE 802.11a. However, the drop for IEEE 802.11a is much more significant. The
drop for IEEE 802.11a is easily explained by Figure 3.11, which shows that the overall system throughput of IEEE 802.11a remains nearly flat as the number of flows increases.
Figure 3.11: Disjoint Flows: The system throughput on increasing the number of
flows.
It may seem surprising that the SSCH system throughput has not stabilized at 13 times the throughput of a single flow by the time there are 13 flows. However, this can be explained by the fact that the random channel choices do not lead to a perfectly balanced allocation, and therefore there is still unused spectrum even when there are 13 flows in the system, as shown by the continuing increase in system throughput in Figure 3.11.
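This imbalance is essentially a balls-in-bins effect. The following sketch is ours, not the dissertation's simulation code; it assumes 13 orthogonal channels (consistent with the factor-of-13 ceiling above) and simplifies each flow to a single uniformly random channel choice:

```python
import random

def idle_channel_fraction(channels=13, flows=13, trials=20000):
    """Estimate the expected fraction of channels that no flow picks
    when each flow chooses a channel uniformly at random."""
    idle = 0
    for _ in range(trials):
        used = {random.randrange(channels) for _ in range(flows)}
        idle += channels - len(used)
    return idle / (trials * channels)

# Analytically, a given channel is idle with probability (1 - 1/13)^13,
# about 0.35, so roughly a third of the spectrum goes unused.
print(idle_channel_fraction())
```

This is why system throughput keeps climbing past 13 flows: additional flows land on previously idle channels with non-trivial probability.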
Non-disjoint Flows
We now consider the case when the flows in the network are not disjoint – nodes par-
ticipate as both sources and sinks, and in multiple flows. This scenario stresses SSCH’s
ability to efficiently support sharing among simultaneous flows that have a common
endpoint. Each node in the network starts a maximum rate UDP flow with one other
randomly chosen node in the network. We vary the number of nodes (and thus flows)
from 2 to 20. As in the previous experiment, all nodes are within communication range
of each other. We present the per-flow and system throughput for SSCH and IEEE
802.11a in Figures 3.12 and 3.13 respectively. The curves are not monotonic because
variation in the random choices leads to some receivers being recipients in multiple
flows (and hence bottlenecks). This lack of monotonicity persisted even after averag-
ing over 5 simulation runs. As in the disjoint flow experiment, SSCH performs slightly
worse in the case of a single flow, but much better in the case of a large number of flows.
Figure 3.12: Non-disjoint Flows: The average throughput of each flow on increas-
ing the number of flows. There is a flow from every node in the network.
Effect of Flow Duration

SSCH introduces a delay when flows start because nodes must synchronize. This over-
head is more significant for shorter flows. We evaluate this overhead for maximum rate
UDP flows with different flow lengths.

Figure 3.13: Non-disjoint Flows: The system throughput on increasing the number of flows.

In the first experiment the flow duration is chosen randomly between 20 and 30 ms, while for the second experiment it is between 0.5
and 1 second. In both the experiments, each node starts a flow with a randomly selected
node, discards all packets at the end of the designated sending window, pauses for a
second at the end of the flow, and then starts another flow with a new randomly selected
node. This process continues for 30 seconds. We run these experiments for both SSCH
and IEEE 802.11a, and vary the number of nodes from 2 to 16. We present the ratio
of the average throughput achieved by SSCH to that achieved by the flows when using IEEE 802.11a in Figure 3.14.
For small numbers of sufficiently short-lived flows, IEEE 802.11a offers superior throughput because SSCH pays a synchronization overhead at flow startup. However, as soon as there are more than 4 simultaneous flows in the network,
the ability of SSCH to spread transmissions across multiple channels leads to a higher
total throughput than IEEE 802.11a in both the short and long flow scenarios.
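The dynamic traffic pattern above can be sketched as a per-node schedule generator (a simplified illustration with parameter names of our choosing, not the simulator's code):

```python
import random

def flow_schedule(total=30.0, flow_min=0.5, flow_max=1.0, pause=1.0):
    """Return (start, stop) times for one node's flows: each flow's
    duration is drawn uniformly from [flow_min, flow_max], followed
    by a one-second pause before the next flow begins."""
    t, events = 0.0, []
    while t < total:
        length = random.uniform(flow_min, flow_max)
        events.append((t, min(t + length, total)))  # clip at the end of the run
        t += length + pause
    return events

schedule = flow_schedule()  # the 0.5 to 1 second flow scenario
```

The 20-30 ms scenario corresponds to `flow_min=0.02, flow_max=0.03`; each flow would be directed at a freshly chosen random peer.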
Figure 3.14: Effect of Flow Duration: Ratio of SSCH average throughput to IEEE 802.11a average throughput, when varying the number of nodes.
TCP over SSCH

We now study the behavior of TCP over SSCH. SSCH allows a node to stay synchro-
nized to multiple nodes over different slots. However, this might cause significant jitter
in packet delivery times, which could adversely affect TCP. To evaluate this concern
quantitatively, we run an experiment where we vary the number of nodes in the network
from 2 to 9, such that all nodes are in communication range of one another. We then start
an infinite-size file transfer over FTP from each node to a randomly selected other node.
This choice to use non-disjoint flows is designed to stress the SSCH implementation by requiring nodes to stay synchronized with multiple other nodes. In Figure 3.15 we present the resulting cumulative steady-state TCP throughput.
Figure 3.15 shows that the TCP throughput for a small number of flows is lower
Figure 3.15: TCP over SSCH: Steady-state TCP throughput when varying the number of flows.
for SSCH than the throughput over IEEE 802.11a. However, as the number of flows
increases, SSCH does achieve a higher system throughput. Although TCP over SSCH
does provide higher aggregate throughput than over IEEE 802.11a, the performance
improvement is not nearly as good as for UDP flows. This shows that jitter due to
SSCH does have an impact on the performance of TCP. A more detailed analysis of
the interaction between TCP and SSCH, and modifications to support better interactions
between TCP and SSCH, is a subject we plan to address in our future work.
We now evaluate SSCH’s performance when combined with multihop flows and mobile
nodes. We first analyze the behavior of SSCH in a multihop chain network. We then
consider large scale multihop networks, both with and without mobility. As part of this
analysis, we study the interaction between SSCH and MANET routing protocols.
The throughput limitations of multihop chains over a single channel are well documented in prior work [136]. For example, if all nodes are on the same channel, the RTS/CTS mechanism allows at most one hop in an A–B–C–D chain to be active at any given time. SSCH
reduces the throughput drop due to this behavior by allowing nodes to communicate on
different channels. To examine this, we evaluate both SSCH and IEEE 802.11a in a multihop chain.

Figure 3.16: Multihop Chain: The maximum throughput of a single flow as the number of nodes in the chain increases.
We vary the number of nodes, which are all in communication range, from 2 to 18.
We initiate a single flow that encounters every node in the network. Although more
than 4 nodes transmitting within interference range of each other would be unlikely
to arise from multihop routing of a single flow, it could easily arise in a more general
distributed application. Figure 3.16 shows the maximum throughput as the number of
nodes in the chain is varied. We see that there is not much difference between SSCH
and IEEE 802.11a for flows with few hops. As the number of hops increases, SSCH
performs much better than IEEE 802.11a since it distributes the communication on each hop across different channels.
We now analyze the performance of SSCH in a large scale multihop network without
mobility. We place 100 nodes uniformly in a 200 × 200 m area, and set each node to
transmit with a power of 21 dBm. The Dynamic Source Routing (DSR) [68] protocol
is used to discover the source route between different source-destination pairs. These
source routes are then fed into a static variant of DSR that does not perform discovery
or maintain routes. We vary the number of maximum rate UDP flows from 10 to 50. We
generate source and destination pairs by choosing randomly, and rejecting pairs that are
We present the average flow throughput in Figure 3.17. Increasing the number of
flows leads to greater contention, and the average throughput of both SSCH and IEEE
802.11a drops. For every considered number of flows, SSCH provides significantly
higher throughput than IEEE 802.11a. For 50 flows, the inefficiencies of sharing a
single channel are sufficiently pronounced that SSCH yields more than a factor of 15
capacity improvement.
Figure 3.17: Multihop Mesh Network of 100 Nodes: Average flow throughput on increasing the number of flows.
Previous work on multi-channel MACs has often overlooked the effect of channel switch-
ing on routing protocols. Most of the proposed protocols for MANETs, such as DSR [68],
and AODV [97] rely heavily on broadcasts. However, neighbors using a multi-channel
MAC could be on different channels, which could cause broadcasts to reach signifi-
cantly fewer neighbors than in a single-channel MAC. SSCH addresses this concern by transmitting each broadcast packet in multiple slots.
We study the behavior of DSR [68] over SSCH in the same experimental setup used
in Section 3.6.3, with 100 nodes in a 200 m×200 m area. However, we reduce the trans-
mission power of each node to 16 dBm to force routes to increase in length (and hence
to stress DSR over SSCH). We select 10 source-destination pairs at random, and we use
DSR to discover routes between them. In Figure 3.18 we compare the performance of
DSR over SSCH, when varying the SSCH broadcast transmission count parameter (the number of slots in which each broadcast packet is transmitted).

Figure 3.18: The average time to discover a route and the average route length for 10 randomly selected source-destination pairs, when varying the broadcast transmission count.
Figure 3.18 shows that the performance of DSR over SSCH improves with an in-
crease in the broadcast transmission count. The DSR Route Request packets see more
neighbors when SSCH broadcasts them over a greater number of slots. This increases
the likelihood of discovering shorter routes, and the speed with which routes are dis-
covered. However, there seems to be little additional benefit to increasing the broadcast
parameter to a value greater than 6. The slight bumpiness in the curves can be attributed to randomness across simulation runs.
Comparing SSCH to IEEE 802.11a, we see that SSCH discovers routes that are
comparable in length. However, the average route discovery time for SSCH is much
higher than for IEEE 802.11a. Because each slot is 10 ms in length, broadcasts are only
retransmitted once every 10 ms, and this leads to a significantly longer time to discover a
route to a given destination node. We believe that this latency is a fundamental difficulty
in using a reactive protocol such as DSR with SSCH. We plan to explore the interaction
of other proactive and hybrid routing protocols with SSCH in the future.
We now present the impact of mobility in a network using DSR over IEEE 802.11a
and SSCH. In this experiment, we place 100 nodes randomly in a square and select 10
flows. Each node transmits packets at 21 dBm. Node movement is determined using
the Random Waypoint model. In this model, each node has a predefined minimum and
maximum speed. Nodes select a random point in the simulation area, and move towards
it with a speed chosen randomly from the interval. After reaching its destination, a node
rests for a period chosen from a uniform distribution between 0 and 10 seconds. It then
chooses a new destination and repeats the procedure. In our experiments, we fix the
minimum speed at 0.01 m/s and vary the maximum speed from 0.2 to 1.0 m/s. Although
we have studied SSCH at higher speeds, the results are not significantly different. We
performed this experiment using two different areas for the nodes, a 200m × 200m area
and a 300m × 300m area. We refer to the smaller area as the dense network, and the
larger area as the sparse network – the average path is 0.5 hops longer in the sparse
network. For all these experiments, we set the SSCH broadcast transmission count
parameter to 6.
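The Random Waypoint model described above can be sketched as follows (an illustrative rendering with our own parameter names, not the simulator's implementation):

```python
import math
import random

def random_waypoint(side=200.0, v_min=0.01, v_max=1.0,
                    rest_max=10.0, duration=120.0, dt=0.1):
    """Trace one node under the Random Waypoint model: pick a random
    destination in the square, move toward it at a speed drawn from
    [v_min, v_max], rest for a uniform time in [0, rest_max], repeat."""
    x, y = random.uniform(0, side), random.uniform(0, side)
    t, trace = 0.0, []
    while t < duration:
        gx, gy = random.uniform(0, side), random.uniform(0, side)
        speed = random.uniform(v_min, v_max)
        steps = max(1, int(math.hypot(gx - x, gy - y) / (speed * dt)))
        for i in range(1, steps + 1):
            # linear interpolation toward the waypoint, one sample per dt
            trace.append((t, x + (gx - x) * i / steps,
                             y + (gy - y) * i / steps))
            t += dt
        x, y = gx, gy
        t += random.uniform(0.0, rest_max)  # pause at the waypoint
    return trace

trace = random_waypoint()
```

The sparse-network runs would use `side=300.0`, and the speed sweep corresponds to varying `v_max` from 0.2 to 1.0 m/s.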
Figure 3.19: Dense Multihop Mobile Network: The per-flow throughput and the average route length for 10 flows in a 100 node network in a 200m × 200m area, when varying the maximum node speed.

Figure 3.19 shows that in a dense network, SSCH yields much greater throughput than IEEE 802.11a even when there is mobility. Although DSR discovers shorter
routes over IEEE 802.11a, the ability of SSCH to distribute traffic on a greater number
of channels leads to much higher overall throughput. Figure 3.20 evaluates the same
benchmarks in a sparse network. The results show that the per-flow throughput de-
creases in a sparse network for both SSCH and IEEE 802.11a. This is because the route
lengths are greater, and it takes more time to repair routes. However, the same qualitative comparison continues to hold: SSCH causes DSR to discover longer routes, but still yields much higher throughput.
DSR discovers longer routes over SSCH than over IEEE 802.11a because broadcast
packets sent over SSCH may not reach a node’s entire neighbor set. Furthermore, some
optimizations of DSR, such as promiscuous mode operation of nodes, are not as effective
in a multi-channel MAC such as SSCH.

Figure 3.20: Sparse Multihop Mobile Network: The per-flow throughput and the average route length for 10 flows in a 100 node network in a 300m × 300m area, when varying the maximum node speed.

Thus, although the throughput of mobile nodes using DSR over SSCH is much better than their throughput over IEEE 802.11a, we
conclude that a routing protocol that takes the channel switching behavior of SSCH into account could achieve even better performance.
When simulating SSCH in QualNet [62], we made two technical choices that seem to
be relatively uncommon based on our reading of the literature. The first technical choice
relates to how we added SSCH to an existing system, and the second relates to a little-used packet format in the IEEE 802.11 standard.
In order to implement SSCH, we had to implement new packet queuing and retrans-
mission strategies. To avoid requiring modifications to the hardware (in QualNet, the
hardware model) or the network stack, SSCH buffers packets below the network layer,
but above the NIC device driver. To maintain control over transmission attempts, we
configure the NIC to buffer at most one packet at a time, and to attempt exactly one RTS
for each packet before returning to the SSCH layer. By observing NIC-level counters
before and after every attempted packet transmission, we are able to determine whether
a CTS was heard for the packet, and if so, whether the packet was successfully trans-
mitted and acknowledged. All the necessary parameters to do this are exposed by the QualNet NIC model.
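The per-packet control loop can be sketched as follows; the NIC interface and counter names here are our illustrative assumptions, not QualNet's actual API:

```python
class FakeNic:
    """Stand-in for the simulator's NIC model: buffers one packet and
    makes exactly one RTS attempt per transmit() call (illustrative)."""
    def __init__(self, cts=True, ack=True):
        self.cts_heard = 0          # CTS frames heard for our RTS attempts
        self.acks_received = 0      # ACKs received for our data packets
        self._cts, self._ack = cts, ack

    def transmit(self, packet):
        if self._cts:
            self.cts_heard += 1
            if self._ack:
                self.acks_received += 1

def send_one_packet(nic, packet):
    """Classify one transmission attempt by comparing NIC counters
    before and after, as the SSCH layer does."""
    before = (nic.cts_heard, nic.acks_received)
    nic.transmit(packet)
    if nic.cts_heard == before[0]:
        return "no-cts"             # no CTS: channel busy or receiver away
    if nic.acks_received == before[1]:
        return "lost"               # CTS heard but data frame not ACKed
    return "delivered"

print(send_one_packet(FakeNic(), "data"))  # delivered
```

Keeping at most one packet in the NIC at a time is what lets the SSCH layer retain full control over retransmission and queuing policy.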
For efficiency reasons, we choose to use the IEEE 802.11 Long Control Frame
Header format to broadcast channel schedules and current offsets, rather than using a full
broadcast data packet. The most common control frames in IEEE 802.11 (RTS, CTS,
and ACK) use the alternative short format. The long format was included in the IEEE
802.11 standard to support inter-operability with legacy 1-Mbps and 2-Mbps DSSS sys-
tems [60]. The format contains 6 unused bytes; we use 4 to embed the 4 (channel, seed)
pairs, and another 2 to embed the offset within the cycle (i.e., how far the node has progressed through its current schedule).
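A sketch of that encoding follows; the nibble-level layout and big-endian offset are our assumptions, since the text fixes only the 4-byte/2-byte split:

```python
import struct

def pack_schedule(pairs, offset):
    """Pack four (channel, seed) pairs plus a 16-bit cycle offset into
    the 6 unused header bytes: one byte per pair, with the channel in
    the high nibble and the seed in the low nibble."""
    assert len(pairs) == 4
    body = bytes((ch << 4) | seed for ch, seed in pairs)
    return body + struct.pack(">H", offset)

def unpack_schedule(blob):
    """Inverse of pack_schedule: recover the pairs and the offset."""
    pairs = [(b >> 4, b & 0x0F) for b in blob[:4]]
    (offset,) = struct.unpack(">H", blob[4:6])
    return pairs, offset

blob = pack_schedule([(3, 7), (0, 1), (12, 5), (9, 2)], 417)
assert len(blob) == 6
assert unpack_schedule(blob) == ([(3, 7), (0, 1), (12, 5), (9, 2)], 417)
```

Four bits per field suffice because both the channel index and the seed are drawn from a set of 13 values.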
Lastly, we comment that the beaconing mechanism used in IEEE 802.11 ad-hoc
mode for associating with a Basic Service Set (BSS) works unchanged in the presence
of SSCH. A newly-arrived node can associate to a BSS as soon as it overlaps on the same channel with a node already in the BSS.
This section discusses alternative designs for SSCH within the constraints that were outlined earlier.
SSCH distributes the rendezvous and control traffic across all the channels. One
straightforward alternative scheme, which still only requires one radio, is to use one of
the channels as a control channel, and all the other channels as data channels (e.g., [66]).
Each node must then somehow split its time between the control channel and the data
channels.
Such a scheme will have difficulty in preventing the control channel from becoming
a bottleneck. Suppose that two nodes exchange RTS/CTS on the control channel, and
then switch to a data channel to do transmission. Unless all other nodes were also on the
control channel during the RTS/CTS exchange, these two nodes will still need to do an
RTS/CTS on this channel in order to avoid the hidden terminal problem. The two nodes
should wait to even do the RTS/CTS until after an entire packet transmission interval
has elapsed, because another pair of nodes might have also switched to this channel,
orchestrating that decision on the control channel during a time that the first pair of
nodes were not on the control channel. In order to amortize this startup cost, the nodes
should have several packets to send to each other. However, while any one node remains
on a data channel, any other node that desires to send it a packet must remain idle on
the control channel waiting for the node it desires to reach to re-appear. If the idle node
on the control channel chooses not to wait, and instead switches to a data channel with
another node for which it has traffic, it may repeatedly fail to rendezvous with the first node.

The problems with a dedicated control channel may be solvable, but it is clear that a dedicated control channel introduces significant complications. If one instead tried to synchronize rendezvous on the control channel, the control channel could again become a bottleneck simply because many nodes simultaneously desire to rendezvous.
3.8 Future Work

SSCH is a promising technology. In our future work, we plan to investigate how SSCH
will perform when implemented over actual hardware, and subjected to normal environmental conditions such as fluctuating signal strength. As part of this implementation effort, we also plan to evaluate how metrics
reflecting environmental conditions, such as ETX [40], can be integrated into SSCH.
Our results in Section 3.6.3 show that existing routing protocols do not give the best
performance over SSCH. In particular, we find that the time to discover a route can be
quite large in a reactive routing protocol being run over SSCH. In the future, we plan
to more thoroughly evaluate routing over SSCH (as opposed to classical single channel
routing), and to explore a wider variety of proactive and hybrid routing protocols over
SSCH.
There are at least four additional topics that would also need to be addressed be-
fore SSCH can be deployed. One is interoperability with nodes that are not running
SSCH. Another is the evaluation of power consumption under this scheme. We have
not attempted to evaluate the energy cost of switching channels, nor have we attempted
to enable a power-saving strategy such as in the IEEE 802.11 specification for access-
point mode. A third topic of investigation is the evaluation of SSCH in conjunction with
3.9 Summary
We have presented SSCH, a new protocol that extends the benefits of channelization
to ad-hoc networks. This protocol is compatible with the IEEE 802.11 standard, and is
suitable for a multi-hop environment. SSCH achieves these gains using a novel approach to distributed rendezvous and synchronization.
We have shown through extensive simulation that SSCH yields significant capacity improvements over IEEE 802.11a. We look forward to exploring SSCH in more detail using an implementation over actual hardware. More information about SSCH and the QualNet simulation code can be
Work on SSCH was done jointly with people at Microsoft Research. The SSCH
protocol was co-developed with John Dunagan. Victor Bahl was involved in the entire
research project and made sure that we proceeded in the right direction. Finally, this
4.1 Introduction
The convenience of wireless networking has led to a wide-scale adoption of IEEE 802.11
networks [58]. Corporations, universities, homes, and public places are deploying these networks at a rapid pace. Unfortunately, wireless networks also create new problems for end-users and network administrators. Users experience a number of problems, such as intermittent connectivity, degraded performance, and authentication failures. These problems occur due to a variety of reasons such as poor access point layout, device misconfiguration, hardware and software errors, and the nature of the wireless medium itself.
Figure 4.1 shows the number of such wireless-related complaints logged by the Information Technology (IT) department of a major US corporation over a period of six months. The company has a large deployment of IEEE 802.11 networks with several
thousand Access Points (APs) spread over more than forty buildings. Each complaint is
an indication of end-user frustration and loss of productivity for the corporation. Furthermore, each complaint imposes a cost on the IT department; our research revealed that this cost is several tens of dollars, and this does not include the cost due to the loss of end-user productivity.
To resolve complaints quickly and efficiently, network administrators need tools for
detecting, isolating, diagnosing, and correcting faults. To the best of our knowledge,
there is no previous research that addresses fault diagnostic problems in IEEE 802.11
infrastructure networks. However, as discussed in Section 4.3, there has been consid-
erable prior work on fault diagnosis in other settings, which we can leverage here. The
Figure 4.1: The number of wireless-related complaints logged per month at a major US corporation over a six-month period.
importance of diagnosing these problems in the “real-world” is apparent from the num-
ber of companies that offer solutions in this space [5, 7, 39, 103, 131]. These products
do a reasonable job of presenting statistical data from the network; however, they lack a systematic approach for gathering and analyzing the data to establish the possible causes of a problem. Furthermore,
most products only gather data from the APs and neglect the client-side view of the
network. Some products that monitor the network from the client’s perspective require
hardware sensors, which can be expensive to deploy and maintain. Also, current solu-
tions do not provide any support for disconnected clients even though these are the ones
that need the most help. We discuss these products in more detail in Section 4.3.
This chapter presents a flexible architecture for detecting and diagnosing faults in
infrastructure wireless networks. We instrument wireless clients and (if possible) access
points to monitor the wireless medium and devices that are nearby. Our architecture
supports both proactive and reactive fault diagnosis. We use this monitoring framework
to address some of the problems plaguing wireless users. We present a novel technique
called Client Conduit that enables disconnected clients to diagnose their problems with
the help of nearby clients. This technique takes advantage of the beaconing and probing
mechanisms of IEEE 802.11 to ensure that connected clients do not pay unnecessary
overheads for detecting disconnected clients. We also present a simple technique for locating disconnected clients, and a mechanism that uses nearby wireless clients for diagnosing wireless network performance problems.
Finally, we show how our monitoring architecture naturally lends itself to detecting Rogue APs. We have implemented and evaluated the basic architectural framework, Client Conduit, and Rogue AP
detection on the Windows operating system using off-the-shelf IEEE 802.11 network
cards; we have evaluated our other mechanisms using tools such as AiroPeek [132] and
WinDump [134]. Our results show that our techniques are effective; furthermore, they impose low overheads.
• We believe ours is the first work to identify fault diagnosis in IEEE 802.11 infrastructure networks as an important problem. Our approach is novel in the wireless context since we use clients (and if possible, infrastructure APs) to monitor the network.
• We describe a simple and efficient technique called Client Conduit that allows disconnected clients to diagnose their problems with the help of nearby connected clients.
• We present novel solutions that use our architecture for detecting and diagnosing a number of common wireless faults.
Our work is just a first step in the direction of self-healing wireless networks and
there are a number of issues that still need to be addressed. From the vast number of
wireless problems faced by end-users and network administrators everyday, we have fo-
cused only on a subset of those problems; our selection was based on conversations
with network administrators [24] along with the high-priority problems observed in
user-complaint logs. Even though some of our techniques are applicable to other de-
ployments (e.g., hotspots, homes), our main emphasis has been diagnosing faults in en-
terprise wireless networks. We ensure that our techniques do not introduce new security
attacks but we do not focus on denial-of-service and greedy MAC attacks [101].
The rest of the chapter is organized as follows: In Section 4.2, we discuss the most
important problems that users and network administrators complain about with respect
to wireless LAN deployment. Section 4.4 describes the components of our client-based
architecture. Section 4.5 presents the Client Conduit protocol. Section 4.6 focuses on lo-
cating disconnected clients, performance isolation, and Rogue AP detection. Section 4.7
describes the implementation of our system and Section 4.8 presents an evaluation of
our techniques. Section 4.3 discusses related work. Finally, we discuss future work in
We enumerate the most important problems that users and network administrators face
when using and maintaining corporate wireless networks. Our list has been derived from the user-complaint logs of a major corporation, which has over 4,400 IEEE 802.11 APs distributed over forty buildings in the company.
Connectivity problems: Users complain about poor or missing connectivity in certain areas of a building. Such “dead spots” or “RF holes” can occur due to weak coverage in parts of the building. Once network administrators locate an RF hole, they can then resolve the problem by either relocating APs or increasing the density of
APs in the problem area or by adjusting the power settings on nearby APs for better
coverage.
Performance problems: This category includes all the situations where a client ob-
serves degraded performance, e.g., low throughput or high latency. There could be a
number of reasons why the performance problem exists, e.g., traffic slow-down due to congestion or interference in the wireless medium, or due to a poorly configured client/AP. Performance problems can also occur as a result of prob-
lems in the non-wireless part of the network, e.g., due to a slow server or proxy. It is
therefore necessary that the diagnostic tool be able to determine whether the problem is
in the wireless network or elsewhere. Furthermore, identifying the cause in the wireless
part is important for allowing network administrators to better provision the system and resolve such problems quickly.
Network security: Large enterprises often use solutions such as IEEE 802.1x [57] to
secure their networks. However, a nightmare scenario for IT managers occurs when employees connect unauthorized APs to the corporate network; this situation is commonly referred to as the “Rogue AP Problem” [5, 7, 36]. These Rogue APs are one of the
most common and serious breaches of wireless network security. Due to the presence of
such APs, external users are allowed access to resources on the corporate network; these
users can leak information or cause other damage. Furthermore, Rogue APs can cause
interference with other access points in the vicinity. Detecting Rogue APs in a large network is thus an important and challenging task.

Authentication problems: Many user complaints are related to users’ inability to authenticate themselves to the network. In wire-
less networks secured by technologies such as IEEE 802.1x [57], authentication failures
are typically due to missing or expired certificates. Thus, detecting such authentication failures is important. In this chapter, we focus on a subset of these problems, including detecting Rogue APs, and helping a client to recover from an authentication problem via Client Conduit. As part of our future work, we will investigate diagnosis of authentication problems in more depth.
4.3 Related Work

To the best of our knowledge, there has been no previous research on fault diagnosis in IEEE 802.11 infrastructure networks. However, there are a number of commercial products that provide varying degrees of support for network management tasks, e.g.,
AirWave [7], Network Systems and Management (NSM) [39], Wireless Security Advi-
sor [103], AirDefense [5], SpectraMon/SpectraGuard [131], AirMagnet [6], and Sym-
bol [123]. Due to their proprietary nature, the available descriptions typically cover the
feature-set and not the techniques; the comparison below is based on our understanding
of their brochures.
The emphasis in most of these products is more towards managing wireless networks
rather than diagnosing faults. These tools allow network administrators to obtain and vi-
sualize data from access points, upgrade firmware, manage security policies, etc. Some
of them also provide real-time WLAN performance monitoring through IEEE 802.11 statistics such as the signal strength and the number of clients associated with the AP, etc. Even though these low-level statistics are useful for network administra-
tors, it is more desirable to provide higher level fault detection and diagnosis, e.g., our
approach detects network performance problems and pinpoints the components that are
problematic.
Many of these products (e.g., AirWave, Unicenter) operate from the AP or the server
side only, i.e., clients are not instrumented. Given the asymmetry and variability of the
wireless medium, observing data from the client-side is important for fault diagnosis,
e.g., since conditions such as interference near the client can be drastically different
than the conditions near the AP, client-side information is needed to do a detailed performance diagnosis. In contrast, our client-based architecture allows us to assist disconnected clients via Client Conduit, locate Rogue APs and disconnected clients, and isolate performance problems.
Some products like AirMagnet and AirDefense obtain the complete view of the network by deploying dedicated hardware sensors, which capture packets and pass all the packets to the server for analysis. Anecdotal evidence from talking to various network administrators suggests that products that use sensor-based monitoring are expensive to deploy and maintain; often very few sensors are deployed due to the cost and the network traffic they generate. Our approach instead uses regular wireless clients and access points instrumented with software “sensors”. One limitation of our approach is that we rely on the presence of nearby clients for diagnosing some of the wireless faults; however, the increasing usage of wireless clients in organizations is making this requirement easier to satisfy.
Since Rogue APs are a serious security problem, all the products listed above per-
form Rogue AP detection. Unlike our solution, most of these products achieve this goal
either by using other APs [7, 39] or by using specialized sensors [5, 6, 131]; as discussed
above, these approaches have deployment and fault-detection limitations. Our technique
of using both clients and APs for detecting Rogue APs is similar to the Symbol tech-
nique [123]. However, unlike their approach, our technique can also detect Rogue APs
that use MAC address spoofing of real APs; furthermore, we leverage our client and AP monitoring infrastructure to do so.
None of the above products provide solutions for assisting disconnected clients even
though they need the most help. Our Client Conduit mechanism allows live and reactive
diagnosis to be performed for such clients that are unable to access the infrastructure
wireless network.
The notion of making wireless clients snoop the environment for ensuring secure and
correct routing has been suggested for ad hoc networks. In [83], the authors propose a watchdog mechanism for detecting misbehaving nodes. The basic idea is to have watchdog nodes observe their neighbors and determine
if they are forwarding traffic as expected; this approach for detecting routing anomalies
has been further refined by others as well [15,27]. Inspired by the watchdog mechanism,
we also use nearby clients to monitor the RF conditions and traffic flow around them; in
our architecture, the watchdog mechanism is used for fault detection (e.g., Rogue APs)
and fault diagnosis (e.g., Client Conduit, locating disconnected clients, performance
isolation). Recent work [101] has used snooping wireless clients for detecting greedy
and malicious behavior in hotspot environments; these techniques are orthogonal to our work. There has also been prior work on diagnosing performance problems in the Internet. For example, Barford et al. [19] use traffic traces at the end points and clas-
sify delays as occurring due to a slow server, a slow client, or the network. While EDEN
has similar goals over a wireless network, it does so without requiring tracing support
from both end points. Tulip [82] is another approach for diagnosing delays over Internet
paths. The client sends ICMP packets and uses their responses from different compo-
nents to determine the cause, such as lost packets, packet reordering, or queueing delay.
EDEN also uses ICMP packets. However, the broadcast nature of the wireless medium enables EDEN to use a novel approach of snooping these packets as a mechanism for fault diagnosis.
In this section, we first discuss the requirements and then describe the components that constitute our diagnosis architecture.
Before we describe the system components, we enumerate the requirements for our
system:
• We require that the software on clients be augmented for monitoring. In our sys-
tem, software modifications on APs are needed only for better scalability and for
analyzing an AP’s performance (Section 4.6.2). Since our approach does not re-
quire hardware modifications, “the bar” for deploying our system is lower.
• For some of our mechanisms, we need the ability to control beacons and probes.
We also require that clients have the capability of starting an infrastructure net-
work (i.e., become an AP) or an ad hoc network on their own; this ability is sup-
ported by many wireless cards, e.g., Atheros [14] and Native WiFi [86].
• We rely on the availability of a database that keeps track of the location of all
the access points; such location databases are typically maintained by network
administrators.
• Some of our techniques require the presence of nearby clients or access points.
With the increasing deployment of access points and the widespread use of wireless
laptops, this requirement is easy to satisfy in these environments. In fact, based on SNMP data collected from
APs over a period of two days, we observed the presence of 13-15 associated
wireless clients on our floor (approximately 2500 sq. meters) during working
hours of the day; thus, with such client densities, there is a high likelihood that a
nearby client is available to assist in diagnosis.
Compared with the existing products that require deploying special wireless sen-
sors throughout the enterprise, our approach takes advantage of nearby clients and
access points instrumented with software “sensors”, thereby imposing a lower de-
ployment cost.
Our system consists of the following components: a Diagnostic Client (DC) that runs
on every wireless client, a Diagnostic Access Point module (DAP) that runs on each Access
Point, and a Diagnostic Server (DS) that runs on a backend server of the organization.
Diagnostic Client module or DC: The Diagnostic Client module monitors the RF en-
vironment and the traffic flow from neighboring clients and APs. Note that during nor-
mal activity, the client’s wireless card is not placed in promiscuous mode. The DC
uses the collected data to perform local fault diagnosis. Depending on the individual
fault detection mechanism, the DC sends data to the DAP or the DS
at regular intervals, e.g., for Rogue AP detection, the DC in our prototype sends MAC
and channel information of nearby APs every 30 seconds. In addition, the DC is geared
to accept commands from the DAP or the DS to perform on-demand data gathering,
e.g., switching to promiscuous mode and analyzing a nearby client’s performance prob-
lems. In case the wireless client becomes disconnected, the DC logs data to a local
database/file. This data can be analyzed by the DAP or DS at some future time when
the client regains connectivity.
Diagnostic Access Point module or DAP: The Diagnostic AP’s main function is to ac-
cept diagnostic messages from DCs, merge them along with its own measurements and
send a summary report to the DS. The Diagnostic AP is not a fundamental requirement
of our architecture; it is primarily needed for offloading work from the DS. Most of our
techniques can work in an environment with a mixture of legacy APs and DAPs: if an
AP is a legacy AP, its monitoring functions are performed by the DCs and its summa-
rizing functions and checks are performed at the DS. In the rest of the chapter, for ease
of exposition, we do not distinguish between these two cases unless necessary.
Diagnostic Server module or DS: The Diagnostic Server accepts data from DCs and
DAPs and performs the appropriate analysis to detect and diagnose different faults. The
[Figure 4.2: Schematic of the fault diagnosis system. The Diagnostic Server (DS) obtains authorization information from the RADIUS, Kerberos, and DHCP servers; DCs send monitoring information to DAPs, which forward diagnostic messages and actions to the DS (legacy APs are bypassed); a disconnected peer outside an AP's coverage area reaches the DS through a nearby DC via Client Conduit.]
DS also has access to a database that stores each AP’s location. Network administrators
may deploy multiple DSs in the system to balance the load, e.g., each AP’s MAC address
could be hashed to a particular DS. In the rest of the chapter, we present our mechanisms
assuming a single DS.
Figure 4.2 gives a schematic view of our fault diagnosis system. As shown, the
Diagnostic Server interacts with other network servers e.g., the RADIUS [105] and Ker-
beros [90] servers, to get client authorization and user information. Our architecture
allows disconnected clients to communicate with the DS via a nearby connected client
using the Client Conduit protocol; this mechanism is presented in Section 4.5.
Our system supports both reactive and proactive monitoring. In proactive moni-
toring, DCs and DAPs monitor the system continuously: if an anomaly is detected by
a DC, DAP, or DS, an alarm is raised for a network administrator to investigate. The
reactive monitoring mode is used when a member of the support staff wants to diagnose a user
complaint. The staff member can issue a directive to a DC from one of the DSs to collect
and analyze the data for diagnosing the problem. We believe that it is acceptable to
increase the network and CPU load (on the DCs, DAPs, DSs) by a small amount during
reactive monitoring; of course, in the proactive case, these overheads must be kept low.
Our architecture itself imposes negligible overheads with respect to power management
and resource wastage. Both the proactive and reactive techniques presented later in this chapter con-
sume very little bandwidth, CPU, or disk resources; as a result, they should have negli-
gible impact on battery consumption. Only during data transfer in Client Conduit does
a connected client incur noticeable overhead. To ensure
that the helping client's applications (or battery) are not affected significantly, it is of-
fered a knob to control the amount of resources it wants to devote for this transfer (see
Section 4.5.2).
Table 4.1 shows the various problems diagnosed in this chapter, the entities (DCs,
DAPs, and DSs) involved in the diagnosis, and whether the solution can be used with
legacy APs.
We have designed our system to scale with the number of clients and APs.
The two shared resources in our system are DSs and DAPs. To prevent a single Di-
agnostic Server from becoming a potential bottleneck in our system, the design allows
more DSs to be added as the system load increases.
Table 4.1: Different fault diagnosis mechanisms and entities that can diagnose
them; the last column indicates if the solution can be supported using legacy APs
Furthermore, we offload work from
each individual DS by sharing the diagnosis burden with the DCs and the DAPs. The
DS is used only when the DCs and DAPs are unable to diagnose the problem and the
analysis requires a global perspective and additional data, e.g., signal strength informa-
tion obtained from multiple DAPs may be needed for locating a disconnected client. As
stated earlier, the presence of legacy APs degrades scalability since the work usually
offloaded to a DAP must instead be performed by the DCs and the DS.
Similarly, since the DAP is a shared resource, making it do extra work can potentially
hurt the performance of all its associated clients. To reduce the load on a DAP, different
fault diagnosis mechanisms can use a simple technique that we refer to as Busy AP
Optimization: with this optimization, an AP does not perform active scanning if any
client is associated with it; the associated clients perform these operations as needed.
The AP continues to perform passive monitoring activities that have a negligible effect
on its performance. If there is no client associated, the AP is idle and it can perform these
monitoring operations. This approach ensures that most of the physical area around the
AP is monitored without degrading the AP's performance.
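The Busy AP Optimization amounts to a simple guard; a minimal sketch (the function and parameter names are ours, not the dissertation's API):

```python
def ap_should_active_scan(num_associated_clients: int) -> bool:
    # An AP performs active scanning only when idle. When clients are
    # associated, they perform the scans as needed, and the AP restricts
    # itself to passive monitoring, which has negligible performance impact.
    return num_associated_clients == 0
```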
The interactions between the DC, DAP, and DS are secured using EAP-TLS [2] certifi-
cates issued over IEEE 802.1x. An authorized certification authority (CA) issues certifi-
cates to DCs, DAPs and DSs; we use these certificates to ensure that all communication
among these entities is mutually authenticated. Researchers have developed techniques for detecting greedy and malicious behavior for
hotspot environments [101]; others have suggested techniques to handle problems due
to false information sent by malicious clients to central entities such as the DS [99].
These approaches are complementary to our design and could be included in our sys-
tem.
This section presents a novel mechanism called Client Conduit that allows disconnected
clients to be diagnosed with the help of nearby connected clients.
If a wireless client cannot connect to the network, the DC logs the problem in its
database. When the client is connected later (e.g., via a wired connection), this log is
uploaded to the DS, which performs the diagnosis to determine the cause of the problem.
However, sometimes it is possible that this client is in the range of other connected
clients; this client may be disconnected since it is just outside the range of any AP, or
due to some other fault. In such cases, the client could perform
diagnosis with the DS immediately and, if possible, rectify the problem. We now focus
on this scenario.
At first thought, one may ask: why not have the disconnected node simply send a
message to its connected neighbor? Unfortunately, this approach does not work because
IEEE 802.11 does not allow a client to be connected to two networks at the same time.
Since the connected node has already associated to an infrastructure network, it cannot
simultaneously participate in an ad hoc network. If the connected node
wants to receive a message from D, it first has to disconnect and then join the ad hoc
network started by the disconnected client.
One can imagine solving this problem using multiple radios on the connected client
(one dedicated on an ad hoc network for diagnosis), or using MultiNet (which allows a
client to multiplex a single wireless card such that it is present on multiple networks), or
by making a connected client periodically scan all channels. All these approaches have
drawbacks. Dedicating extra hardware on every client is an expensive solution for a
problem that is expected to occur infrequently.
the wireless card across channels or networks can cause packet drops at the connected
client. In the MultiNet case, the wireless card will periodically spend time on the ad
hoc network, and will thus consume bandwidth on the connected client. On the other
hand, our Client Conduit approach imposes no overheads in the common case when no
client needs help.
We now discuss our Client Conduit protocol that allows a disconnected client to be di-
agnosed by a DS via one of the connected clients. Client Conduit achieves its efficiency
(of not penalizing connected clients) by exploiting two operational facts about the IEEE
802.11 protocol. First, even when a client is associated to an AP, it continues to re-
ceive beacons from neighboring APs or ad hoc networks at regular intervals. Second,
a connected client can send directed or broadcast Probe Requests without disconnect-
ing from the infrastructure network. We now present the Client Conduit protocol for a
disconnected client D and a connected client C (see
Figure 4.3). In the following description, we refer to the first 4 steps of the protocol as
the Connection Setup phase and the last step as the Data Transfer phase.
Figure 4.3: Client Conduit Mechanism (Steps 1 through 5 are described below)
1. The disconnected client D first places its wireless card in promiscuous mode. It
scans all channels to determine if any nearby client is connected, and then becomes
an AP on the channel on which it detected the client's packets. For the reasons
discussed in Section 4.4.1, and for efficiency, D takes on the AP role itself.
2. This newly-formed AP at D now broadcasts its beacon like a regular AP, with an
SSID of the form "SOS HELP <num>", where num is a 32-bit random number that
distinguishes D from other clients requesting help.
3. The DC on the connected client C detects the SOS beacon of this new AP. At
this point, C needs to inform D that its request has been heard and that it can stop
beaconing. To do so, C sends a Probe Request of the form "SOS ACK <num>" to D. Note that the
Probe Request is sent with a different SSID than the one being advertised by
the AP running on D. This approach prevents some other nearby client that is
not involved in the Client Conduit protocol from inadvertently sending a Probe
Request to D (as part of that client’s regular tests of detecting new APs in its
environment).
4. When D hears this Probe Request (and perhaps other requests as well), it stops
being an AP, and becomes a station again. Note that in response to the Probe
Request, a Probe Response is sent out by D; client C now knows that it does
not need to send more Probe Requests (it would have stopped anyway when D's
beacons ceased). D answers exactly one of the helpers, indicating that it would
like to use client C as a hop for exchanging diagnostic messages with the DS.
This response mechanism ensures that if multiple connected clients try to help D,
only one of them is chosen by D for setting up the conduit with the DS.
5. Now D starts an ad hoc network and C joins this network via MultiNet [30]. At
this point, C becomes a conduit for D's messages, and D can exchange diagnostic
messages with the DS.
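The SSID handshake of steps 2 through 4 can be sketched as follows; this is an illustrative sketch, and the exact SSID formatting (separator, decimal encoding of <num>) and function names are our assumptions, not part of the protocol description:

```python
import random

def make_sos_ssid() -> str:
    # Step 2: the disconnected client D beacons "SOS HELP <num>",
    # where <num> is a 32-bit random number.
    return f"SOS HELP {random.getrandbits(32)}"

def make_ack_ssid(sos_ssid: str) -> str:
    # Step 3: a helper C answers with a Probe Request carrying
    # "SOS ACK <num>". The ACK SSID deliberately differs from the one D
    # advertises, so other clients' routine probes for new APs are not
    # mistaken for acknowledgments.
    num = sos_ssid.rsplit(" ", 1)[1]
    return f"SOS ACK {num}"

def is_matching_ack(sos_ssid: str, probe_ssid: str) -> bool:
    # Step 4: D accepts only an acknowledgment tagged with its own number.
    return probe_ssid == make_ack_ssid(sos_ssid)
```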
The key advantage of the Client Conduit protocol is that connected clients do not
experience unnecessary overheads during normal operation. Their overheads during the
Connection Setup and Data Transfer phases are also small and controllable.
It is important to note that the Client Conduit mechanism can also be used for boot-
strapping clients. For example, suppose that a client tries to access a wireless network
for the first time and does not have EAP-TLS certificates, but has other credentials, such
as a valid password; Client Conduit can then be used to authenticate
with the backend Radius/Kerberos servers. New certificates can then be installed on the
client machine; similarly, a client's expired certificates can also be refreshed without
requiring a wired connection. This is particularly useful given the prevalence
of IEEE 802.1x authentication problems [24]. Client Conduit can be used if a connected
client is in range as well. If there is no such client, one can dynamically configure the
AP to allow D’s diagnostic messages to the back end DS (or to the RADIUS servers
who can forward to the DS) via the uncontrolled port [57].
We must ensure that the Client Conduit protocol does not introduce any new security
leaks or opportunities for denial-of-service attacks in the system. To ensure that a mali-
cious/unauthorized client does not obtain arbitrary access to the network, the connected
client allows a disconnected client's packets to be exchanged only with the DS or back-
end authentication servers.
We now discuss two potential abuses of Client Conduit: hurting the performance of
connected clients and mounting denial-of-service attacks.
When a connected client C helps a disconnected client via Client Conduit, we need to
ensure that C's applications' performance is not adversely affected. During the Con-
nection Setup part of Client Conduit, the connected client C need only process
the beacon message and send/receive probe messages; no messages are forwarded
by C on the disconnected client’s behalf. These steps not only consume negligible re-
sources on C but they also do not result in any security leak or compromise on C; of
course, C can further rate-limit or stop performing these steps if it discovers that the
mechanism is being abused.
We now consider the Data Transfer part of the protocol for possible security and
performance problems. During this phase, C operates in MultiNet mode, which consumes resources on the
connected client [30]. There are two problems that need to be addressed. First, a mali-
cious client should not be allowed to waste a connected client C’s resources by making
it enter MultiNet mode unnecessarily. Second, even when helping a legitimate client, C
should be able to control the amount of resources that it wants to allocate for the discon-
nected client D during the MultiNet transfer. The second problem can be addressed by
providing a knob to the client that allows it to limit the percentage of time that it spends
on the ad hoc network relative to the infrastructure network; client C may also limit
this usage to save battery power. Section 4.8.2 characterizes the disconnected client's
throughput for different settings of this knob.
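One plausible realization of this knob, assuming MultiNet time-slices the card over a fixed switching period (the period, default values, and names below are our assumptions):

```python
def ad_hoc_budget_ms(fraction: float, period_ms: int = 100) -> float:
    # The helper C devotes at most `fraction` of each switching period to
    # the ad hoc (diagnosis) network; the remainder stays on the
    # infrastructure network, bounding the impact on C's own traffic
    # and battery.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return fraction * period_ms
```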
To prevent the first problem due to malicious clients, we add the following authen-
tication step before Data Transfer to ensure that only legitimate clients are allowed to
After the Connection Setup phase, client C switches to MultiNet mode for perform-
ing an authentication step with D. To prevent a malicious client from forcing C into
MultiNet mode repeatedly, C can limit the number of times per minute that it performs
such an authentication step. As part of the authentication step, client C obtains the
EAP-TLS machine certificate from the disconnected client and validates it (for ensuring
mutual authentication, client D can perform these steps as well). If the disconnected
client has no certificates or its certificates have expired, client C acts as an intermediary
for running the desired authentication protocol, e.g., C could help D perform Kerberos
authentication from the back end Kerberos servers and obtain the relevant tickets. If
the disconnected client D still cannot authenticate, C asks D to send the last (say) 10
KBytes of its diagnosis log to C and C forwards this log to the DS. To prevent a possible
DoS attack in which a malicious client tries to send this unauthenticated log repeatedly
(e.g., while spoofing different MAC addresses), the connected client can limit the total
amount of unauthenticated data that it sends in a fixed time period; e.g., C could cap the
total unauthenticated bytes it forwards per hour. Finally, a Rogue AP that wants to
remain undetected may try to exploit the properties of Client Conduit. The attacker's
AP can be set up to beacon with an SOS SSID; our Rogue AP detection mechanism
(Section 4.6.3) will assume that this beaconing device is actually a disconnected client
and not declare it as a Rogue AP. Thus, we need to distinguish between the cases where
the beaconing device is a legitimate client and where it is actually a Rogue AP.
The key observation is that when a legitimate disconnected client starts an infrastructure
network during the Connection Setup and starts beaconing, it does not send or receive
any data packets. Thus, if a DC ever detects an AP (or a node in ad hoc mode) that is
beaconing the SOS SSID and sending/receiving data packets, the DC can immediately
flag it as a Rogue device. There is another test that can be used to detect such a Rogue
device: when the helping client hears the Probe Response in step 4 of the Client Conduit
protocol, it knows that the disconnected client no longer needs to beacon. Thus, if the
helping client continues to hear the SOS beacons after a few seconds, it can flag the
beaconing device as a Rogue AP.
This section discusses our techniques for detecting and diagnosing faults in an IEEE
802.11 wireless network. Section 4.6.1 describes a simple technique for locating dis-
connected clients. Section 4.6.2 presents our mechanisms for isolating performance
problems and Section 4.6.3 describes how we detect rogue access points.
Locating disconnected clients is useful for identifying areas with connectivity problems,
e.g., poor coverage or high interference (locating RF holes), or for locating possibly
faulty APs. A disconnected client can determine that it is in an RF hole if it does not hear
beacons from any AP (as opposed to being disconnected due to some other reason, such
as an authentication failure; locating such clients can thus also
help in locating RF holes). We now discuss a technique called Double Indirection for
Approximate Location (DIAL) for locating disconnected clients. To determine the
location of this client, nearby connected clients hear D's beacons and record the sig-
nal strength (RSSI) of these packets. They inform the DS that client D is disconnected
and send the collected RSSI data. At this point, the DS executes the first step of DIAL
to determine the location of the connected clients: this can be done using any known
location-determination technique in the literature [17, 73]. In the next step of DIAL, the
DS uses the locations of the connected clients as “anchor points” and the disconnected
client’s RSSI data to estimate its approximate location. This step can be performed us-
ing any scheme that uses RSSI values from multiple clients for determining a machine’s
location [17, 25, 73]. Locating the connected clients introduces some error, and using
these anchor points to locate the disconnected client increases the error further.
In Section 4.8.3, we show that this error is approximately 10 to 12 meters, which
is small enough for an administrator to physically locate the client.
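As a concrete illustration of DIAL's second step (not the specific scheme evaluated in the dissertation; any of the RSSI-based schemes cited above could be substituted), one can compute a centroid of the anchor clients' estimated positions, weighted by the RSSI each observed:

```python
def weighted_centroid(anchors):
    """anchors: list of ((x, y), rssi_dbm) for connected clients that
    heard the disconnected client's beacons. Converting dBm to linear
    power makes anchors with stronger signals dominate the estimate,
    since they are likely closer to the disconnected client."""
    weights = [10 ** (rssi / 10.0) for _, rssi in anchors]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(anchors, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(anchors, weights)) / total
    return (x, y)
```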
Our design for diagnosing network performance problems comprises two lightweight
components: a monitoring component and a diagnosing com-
ponent. The monitoring component runs in the background at the DC and informs the
diagnosing component when it suspects a performance problem. At
this point, the diagnosing component analyzes the connections and outputs a report that
gives a breakdown of the delays, i.e., the extent of the delays in the wired and the wire-
less part, and for the latter, a further breakdown into delays at the client, AP, and the
medium. Note that the monitoring component can be conservative in declaring that
network problems are being encountered; a false alarm simply invokes our diagnosing
component. Since this component has low overheads, invoking it has a small impact
on the performance of clients and APs. These components have not been implemented
yet in our current prototype, but we have evaluated the effectiveness of some of these
mechanisms experimentally.
We focus on diagnosing performance problems for TCP connections since TCP is the
most widely used transport protocol in the Internet. For a TCP connection, we can
passively diagnose performance problems by leveraging the connection’s data and ac-
knowledgment (ACK) packets. For other transport protocols, we can determine end-
to-end loss-rate and round-trip times using either active probing or performance reports
from the receiver. Performance problems typically manifest themselves as
low throughput, high loss rate, or high delay. We do not use throughput as a metric for
detecting a problem since it is dependent on the workload (i.e., the client’s application
may not need a high throughput) and on specific parameters of the transport protocol
(e.g., initial window size, sender and receiver window sizes in TCP). Instead, we use
packet loss rate and round-trip time for detecting performance problems.
Estimating the round trip time (RTT) in a TCP connection is simple: if the client is
a sender, it already keeps track of the RTT; if the client is a receiver, it can apply the
same estimation algorithm to the data and ACK packets that it observes.
To estimate the loss rate, we use heuristics suggested in [47] and [10] on the client
side. We compute different loss rates for packets sent and received by the client. For data
packets sent by the client, the loss rate is estimated as the ratio of retransmitted packets
to the packets sent over the last L RTTs [10]. This estimation mechanism assumes that
the TCP implementation uses Selective ACKs so that the loss rate is not overestimated un-
necessarily; most modern TCP implementations
support this option by default, e.g., Windows, Linux, Solaris. As shown in [10], this
estimate can be higher than the actual loss rate when timeouts occur in a TCP connec-
tion. For our purposes, this inaccuracy is acceptable for two reasons: first, if a TCP
connection experiences timeouts, it is probably worth
diagnosing; second, the only consequence of a mistake is to trigger our diagnosis com-
ponent, which incurs low overhead. If more accurate analysis is needed, the LEAST
algorithm can be used.
For the data packets received by the client, we use an approach similar to the one
suggested in [47] to estimate the number of losses: if a packet is received such that
its starting sequence number is not the next expected sequence number, the missing
segment is considered lost. The loss rate is estimated as the ratio of lost packets to the
total number of expected packets in the last L RTTs. Note that the expected number
of bytes is calculated as the maximum observed sequence number minus the minimum
observed sequence number during the last L RTTs; we apply the idea in [139] to estimate maximum segment size
(MSS), and estimate the number of packets by dividing the number of bytes by MSS.
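The two loss-rate estimates above can be sketched as follows; bookkeeping for the sliding window of the last L RTTs is elided, and the sample lists are assumed to be collected elsewhere:

```python
def sender_loss_rate(packets_sent: int, packets_retransmitted: int) -> float:
    # Sender side [10]: ratio of retransmitted packets to packets sent
    # over the last L RTTs (assumes SACK, so retransmissions track losses).
    return packets_retransmitted / packets_sent if packets_sent else 0.0

def receiver_loss_rate(segments, mss: int) -> float:
    """Receiver side [47]: segments is a list of (start_seq, length)
    observed in the last L RTTs. A gap before a segment's start counts
    as lost; expected packets = (max_seq - min_seq) rounded up to MSS."""
    segs = sorted(segments)
    base = segs[0][0]
    expected_next, max_end, lost = base, base, 0
    for seq, length in segs:
        if seq > expected_next:  # missing bytes -> whole lost segments
            lost += (seq - expected_next + mss - 1) // mss
        expected_next = max(expected_next, seq + length)
        max_end = max(max_end, seq + length)
    total = (max_end - base + mss - 1) // mss
    return lost / total if total else 0.0
```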
Our assumption is that segments are rarely delivered out-of-order in a TCP connection.
We flag a packet as experiencing high delay
if its RTT is more than 250 msec or is higher than twice the current
TCP RTT [140]. To avoid invoking our diagnosis algorithm for high delays that occur
temporarily, we flag a connection only when D or more packets experience a high delay.
A connection is classified as lossy if its loss rate (for transmitted or received packets) is
above a threshold.
Both D and L are configurable parameters and each represents a tradeoff between
responsiveness and the overhead of invoking the diagnosis
component. That is, with a low value of D or L, any change in delays/losses will be
detected quickly but it may also result in invoking the diagnosis component unnecessar-
ily. For high values, apart from slow responsiveness, another problem occurs: the TCP
connection may end before a sufficient number of samples has been collected. Such a sit-
uation can occur with short Web transfers. We can alleviate this problem by aggregating
loss rate and delay information between the client and remote hosts across TCP con-
nections. We are currently exploring such techniques along with choosing appropriate
values of D and L.
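Putting the two triggers together, the monitoring component's decision can be sketched as below; the parameter names D and L follow the text, but the default values are our assumptions (the dissertation leaves them open):

```python
def connection_flagged(rtt_samples_ms, srtt_ms, loss_rate,
                       d_packets=5, loss_threshold=0.05):
    # Delay trigger: D or more packets with RTT above max(250 ms, 2*srtt).
    delayed = sum(1 for r in rtt_samples_ms
                  if r > max(250.0, 2.0 * srtt_ms)) >= d_packets
    # Loss trigger: estimated loss rate above a configurable threshold.
    lossy = loss_rate > loss_threshold
    return delayed or lossy
```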
When the DC at a client detects a network performance problem for a TCP connection, it
communicates with its associated DAP to differentiate between the delays on the wired
and wireless parts of the path. The DAP then starts monitoring the TCP data and ACK
packets for that client’s connection. If the DC is the sender in the TCP connection,
the DAP computes the difference between the received time of a data packet from the
client to the remote host and the corresponding TCP ACK packet; this time difference
is an estimate of the delay incurred in the wired network. To ensure that the roundtrip
time estimate is reasonable, various heuristics used by TCP need to be applied to these
roundtrip measurements as well, e.g., Karn’s algorithm [117]. The DAP sends this
estimate to the DC who can now determine the wireless part of the delay by subtracting
this estimate from the TCP roundtrip time. A similar approach can be used to compute
this breakdown when the client is a receiver: the DAP determines the wireless delay by
monitoring the data packets from the remote host to the client and the corresponding
ACK packets. Note that the amount of state maintained at the DAP is small since it
corresponds to the number of unacknowledged TCP packets; this can be reduced further
by sampling.
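A sketch of the DAP's bookkeeping for a client that is the TCP sender follows; the class and method names are ours, and the Karn-style handling of retransmissions is simplified to dropping ambiguous samples:

```python
class WiredDelayEstimator:
    """Records when each data segment passes the DAP toward the remote
    host and, on seeing the covering ACK, returns the elapsed time as an
    estimate of the wired portion of the RTT. A segment seen twice
    (a retransmission) is dropped, in the spirit of Karn's algorithm,
    since its ACK cannot be attributed to one transmission."""

    def __init__(self):
        self._pending = {}   # end_seq -> timestamp when data was seen
        self._seen = set()

    def data_seen(self, end_seq: int, t) -> None:
        if end_seq in self._seen:
            self._pending.pop(end_seq, None)  # retransmission: ambiguous
        else:
            self._seen.add(end_seq)
            self._pending[end_seq] = t

    def ack_seen(self, ack_seq: int, t):
        covered = [s for s in self._pending if s <= ack_seq]
        if not covered:
            return None
        sent = max(self._pending.pop(s) for s in covered)
        return t - sent  # wired-path delay for the newest covered segment
```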
A client may experience poor wireless performance due to a number of reasons, such
as RF interference, contention in the medium, or
some driver or other kernel issues at either the AP or the client. We quantify the effect
of these problems by observing their impact on packet delay in the wireless network
path. We group these performance problems into three categories: packet delay at the
client, packet delay at the AP, and packet delay in the wireless medium. In this sec-
tion, we present a protocol based on snooping by
neighbors, EDEN, which leverages the presence of other clients to quantify the delay
experienced in each of the above categories. Since electromagnetic waves travel at the
speed of light, we can safely assume that RF propagation delays are negligible relative
to these delays. The protocol begins with client D broad-
casting packets asking for diagnosis help from nearby clients. All clients who hear
these packets switch to promiscuous mode and ask the DAP to start the diagnosis (Sec-
tion 4.8.1 shows that the CPU overheads of entering promiscuous mode are low on
modern processors). Security mechanisms similar to the ones discussed in Section 4.5.2
can be used to prevent attacks on these clients. Note that we use multiple snooping
clients in EDEN primarily for robustness: multiple clients increase the likelihood that at
least one client hears the EDEN protocol requests and responses discussed below.
EDEN proceeds in two phases. In the first phase, the DAP to which client D is as-
sociated estimates the delay at D. The DAP periodically (say every 2 seconds) sends
Snoop request packets to client D. When D receives a Snoop request packet, it imme-
diately replies with a Snoop response message. The eavesdropping clients log the time
when they hear a Snoop request and the first attempt by D to send the corresponding
Snoop response packet, i.e., we only record the times of response packets for which the
corresponding request was also heard; if an eavesdropping client misses either packet,
it ignores the timing values for that request/response pair. The difference between the
recorded times is the client delay, i.e., application and OS delays experienced by D after
receiving the request packet. For robustness, Snoop requests are sent a number of times
(say 20); the client and AP delays are averaged over all these instances.
In the second phase, a similar technique is used to measure the AP delay, i.e., client D
sends the Snoop request packets and the AP sends the responses. Client D also records
the round trip times to the AP for these Snoop requests and responses along with the
number of request packets for which it did not receive a response (e.g., the request or
the response may have been lost).
Strictly speaking, this client and AP delay also includes the delay due to contention
experienced in the wireless medium. In Section 4.8.4, we discuss the extent of inaccu-
racy this introduces.
At the end of the protocol, all the eavesdropping clients send the AP and client
delay times to the client D. The difference between the round trip time reported by D,
and the sum of the delays at the client and the AP, approximates the sum of the delay
experienced by the packet in the forward and backward wireless link. The client can then
report the client/AP/medium breakdown to the network administrator; it can also report
this information to the DS for further analysis.
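The final arithmetic of EDEN can be sketched as follows; a snooped (request, response) timestamp pair where either packet was missed is represented as None, matching the rule above that such pairs are ignored:

```python
def snooped_delay_ms(pairs):
    # Average delay over pairs for which the eavesdropper heard both the
    # Snoop request and the first attempt at the corresponding response.
    vals = []
    for pair in pairs:
        if pair is not None:
            req, resp = pair
            vals.append(resp - req)
    return sum(vals) / len(vals) if vals else None

def medium_delay_ms(round_trip_ms, client_delay_ms, ap_delay_ms):
    # RTT minus the client's and the AP's processing delays approximates
    # the time spent in the forward and backward wireless links.
    return max(round_trip_ms - client_delay_ms - ap_delay_ms, 0.0)
```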
As discussed in Section 4.2, Rogue APs are unauthorized APs that have been connected
to an Ethernet tap in an enterprise or university network; such APs can result in security
holes, and unwanted RF and network load. Rogue APs are considered a major security
threat by network administrators.
Our architectural framework of using clients and (if possible) APs to monitor the
environment around them naturally lends itself for detecting Rogue APs. Our basic
approach is to make clients and DAPs collect information about nearby access points
and send it to the DS. When the DS receives information about an AP X, it checks the
AP location database and ensures that X is a registered AP in the expected location and
channel.
Assumptions
We assume that all Rogue APs and the corresponding connected "rogue" clients use
IEEE 802.11-compliant hardware. This assumption "raises the
bar" such that non-compliant APs with low-level modifications are needed to defeat our
scheme: to avoid detection, an attacker must modify the Rogue AP to not beacon and
not respond to probe requests. Of course, an attacker can simply use a proprietary access
point or one with different technology, e.g., HIPERLAN. Detecting such intruders re-
quires special hardware and is not our goal. We simply want a low-cost mechanism that
addresses the (common case) Rogue AP problem being faced in current deployments:
for many network administrators, the main goal is to detect APs inadvertently installed
by well-meaning employees. In the future, we
will investigate the detection of non-compliant Rogue access points and clients as well.
If two companies have neighboring wireless networks, our mechanisms will clas-
sify the other companies’ access points as Rogue APs. If this classification is unaccept-
able, the network administrators of the respective companies can share their AP location
databases.
In our system, each DC monitors the packets in its vicinity (non-promiscuous mode),
and for each AP that it detects, it sends a 4-tuple < MAC address, SSID, channel, RSSI
> to the DS. Essentially, the 4-tuple uniquely identifies an AP in a particular location
and channel. To get this information, a DC needs to determine the MAC addresses of
nearby APs. One option is to put the wireless card in promiscuous mode
and observe data packets (it can use the FromDS and ToDS bits in the packet to deter-
mine which address belongs to the AP). However, we can achieve the same effect using
a simpler approach: since IEEE 802.11 requires all APs to broadcast beacons at regular
intervals, the DC can obtain the MAC addresses from the APs’ beacons from all the
APs that it can hear. In Section 4.8.5, we show that a DC not only hears beacons on its
channel, but it may also hear beacons from overlapping channels; this property
increases each DC's monitoring coverage.
To ensure that we do not miss a Rogue AP even if no client is present on any chan-
nel overlapping with the AP, we use the Active Scanning mechanism of the IEEE 802.11
protocol: when a client wants to find out what APs are nearby, the client goes to each
of the 11 channels (in 802.11b), sends Probe Requests and waits for Probe Responses
from all APs that hear those Probe Requests; from these responses, the DC can obtain
the APs’ MAC addresses. Every IEEE 802.11-compliant AP must respond to such re-
quests and in some chipsets [86], no controls are provided to disable this functionality.
Consistent with our framework, we use the Busy AP Optimization (see Section 4.4.3) so
that active scans in an AP's vicinity are performed by the AP only when it has no client
associated with it.
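A sketch of the active-scan loop for 802.11b follows; the `card` driver interface (`set_channel`, `send_probe_request`, `collect_responses`) is hypothetical, standing in for whatever primitives the wireless driver exposes:

```python
def active_scan(card, num_channels=11, dwell_ms=30):
    """Visit every channel, send a broadcast Probe Request, and record
    the Probe Responses. Compliant APs must answer such requests, so
    this reveals the MAC addresses even of APs that suppress SSIDs in
    their beacons."""
    found = {}
    for channel in range(1, num_channels + 1):
        card.set_channel(channel)
        card.send_probe_request(ssid="")           # broadcast probe
        for resp in card.collect_responses(dwell_ms):
            found[resp.mac] = (resp.ssid, channel, resp.rssi)
    return found
```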
Analysis at the DS
When the DS receives information for an AP from various clients, it uses DIAL to esti-
mate the AP's approximate location based on these clients' locations and the RSSI
readings they report for the AP. The DS declares an AP to be rogue if its report does not match a registered
AP in the DS's AP location database, i.e., if the MAC address is not present in the
database, or if the AP is not in the expected location, or the SSID does not correspond
to the expected SSID(s) in the organization. Note that if an AP’s SSID corresponds
to an SOS SSID, the DS skips further analysis since this AP actually corresponds to a
disconnected client that is executing the Connection Setup phase of the Client Conduit
protocol. The channel information is used in a slightly different way. As stated above,
the DS compares the channel on which an AP is reported
with the one on which it is expected. Note that if the channel on an AP is changed, the
DAP can ask the DS to update its AP location database (recall that the communication
between the DAP and the DS is authenticated; if the AP is a legacy AP, the administrator
can update the AP location database when the AP’s channel is changed). The checks
A Rogue AP, say R, may try MAC address spoofing to avoid being detected, i.e.,
send packets using the MAC address of an authorized AP, say G. However, the DS can
still detect R as it will reside in a different location or channel than G (if it is on the
same channel and location, G would immediately detect it). Our approach also detects a
Rogue AP that does not broadcast an SSID in its beacons since a DC can still obtain the
AP’s MAC address. Of course, we can detect such unauthorized APs in an even simpler
manner.
[Flowchart of the Rogue AP checks at the DS: Start → Is the MAC registered? → Is the AP in its expected location? → Is the AP on the expected channel? → Is the AP advertising the expected SSID? A "No" at any check means a Rogue AP is detected; otherwise no Rogue AP is flagged.]
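As a sketch, the decision logic in the flowchart can be written as follows; the record format and location-database schema here are illustrative, not the actual implementation:

```python
# Illustrative sketch of the DS checks for Rogue AP detection.
# ap is an observed record; ap_db maps registered MAC addresses to their
# expected location and channel; org_ssids is the set of authorized SSIDs.

def is_rogue(ap, ap_db, org_ssids, sos_ssid="SOS"):
    """Return True if the observed AP fails any of the DS's checks."""
    if ap["ssid"] == sos_ssid:
        return False                  # disconnected client running Client Conduit
    known = ap_db.get(ap["mac"])
    if known is None:
        return True                   # MAC address not registered
    if ap["location"] != known["location"]:
        return True                   # not at its expected location
    if ap["channel"] != known["channel"]:
        return True                   # not on its expected channel
    if ap["ssid"] not in org_ssids:
        return True                   # not advertising an expected SSID
    return False
```

A spoofing AP that copies a registered MAC address still trips the location or channel check, matching the argument made below.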
Thus, given the above strategy, an unauthorized AP may stay undetected for a short
time by spoofing an existing AP X near X’s location and beaconing a valid SSID in the
organization.
However, when a nearby client performs an active scan, the Rogue AP will be detected;
as we show in Section 4.8.5, a DC can easily perform such a scan every 5 minutes.
4.7 Implementation
We now describe the details of our fault diagnosis implementation. We have imple-
mented the basic architecture consisting of the DC, DAP and DS daemons; the authenti-
cation and logging mechanisms have not been implemented. We have also implemented
the Client Conduit protocol and the Rogue AP detection mechanism. The support for
Our system has been implemented on the Windows operating system with Netgear
MA 521 802.11b cards. On the DS, we simply run a daemon process that accepts
information from DAPs. The DS reads the list of legitimate APs from a file; support
for reading this information from a database can be easily added. The structure of the
code on the DC or DAP consists of a user-level daemon and kernel level drivers (see
Figure 4.5). These pieces are structured such that code is added to the kernel drivers only
when essential, with most of the protocol logic driven from the user level.
[Figure 4.5: layered driver architecture: Diagnostics IM Module, Native WiFi IM Driver, NDIS.]
Kernel drivers: There are two drivers in our system — a miniport driver and an inter-
mediate driver (IM driver) called the Native WiFi driver [86].
The miniport driver communicates directly with the hardware, and provides suffi-
ciently rich interfaces such that functions like association, authentication, etc. can be handled
in the IM driver.
The IM driver supports a number of interfaces (exposed via ioctls) for querying various
parameters such as the current channel, transmission level, power management mode,
SSID, etc. In addition to allowing the parameters to be set, it allows the user-level code
to request active scans, associate with a particular SSID, capture packets, etc. In
general, it provides a significant amount of flexibility and control to the user-level code.
Even though some of the required operations were already present in the IM driver,
we modified both drivers to support our mechanisms and to
improve performance of our protocols. The miniport driver was changed to expose
certain packet types to the IM driver. In the IM driver, we added the following support:
• Capturing packet headers and packets: We allow filters to be set such that only
certain packets or packet headers are captured, e.g., filters based on specific MAC
addresses, packet types, packet subtypes (such as management and beacon pack-
ets), etc.
• Storing the RSSI values from received packets: We obtain the RSSI value of ev-
ery received packet and maintain a table called the NeighborInfo table that keeps
track of the RSSI value from each neighbor (indexed on the MAC address). We
maintain an exponentially weighted average with the new value given a weight-
ing factor of 0.25. The RSSI information is needed for estimating the location of
disconnected clients.
• Tracking channel information: We record the channels on which packets were heard
from a particular MAC address or SSID.
• Kernel event support for protocol efficiency: We added an event that is shared
between the kernel and user-level code. The kernel triggers this event when an
event of interest occurs, making the protocol interrupt-driven rather than polling-based.
Currently, the kernel sets this event whenever it hears
an SOS beacon from a disconnected client during Client Conduit, thereby result-
ing in a quick response to the client.
• Additional ioctls: We added a number of ioctls to get and clear the information discussed above.
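For illustration, the NeighborInfo bookkeeping described above might look as follows in user-level pseudocode (the actual implementation lives in the IM driver; the class shape here is an assumption):

```python
# Sketch of the NeighborInfo table with the exponentially weighted RSSI
# average described above: new samples get a weight of 0.25.

NEW_SAMPLE_WEIGHT = 0.25

class NeighborInfo:
    def __init__(self):
        self.rssi = {}  # indexed on MAC address

    def update(self, mac, rssi_sample):
        """Fold a new per-packet RSSI sample into the running average."""
        old = self.rssi.get(mac)
        if old is None:
            self.rssi[mac] = float(rssi_sample)
        else:
            self.rssi[mac] = (NEW_SAMPLE_WEIGHT * rssi_sample
                              + (1 - NEW_SAMPLE_WEIGHT) * old)
```

The small weight on new samples smooths out per-packet RSSI fluctuations while still tracking slow changes, which is what the location estimation in Section 4.6.1 needs.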
Fault Diagnostic daemon: This daemon gathers information and implements various
mechanisms discussed in this chapter, e.g., collecting MAC addresses of APs for Rogue
AP detection, performing Client Conduit, etc. If the device is an AP, it communicates diag-
nostic information with the DS and the DCs; if the device is just a DC, it communicates
with its DAP.
The Diagnostic daemon on the DC obtains the current NeighborInfo table from the
kernel every 30 seconds. If any new node has been discovered or if the existing data has
changed significantly (e.g., RSSI value of a client has changed by more than a factor
of 2), it is sent to the DAP. The DAP also maintains a similar table indexed on MAC
addresses. However, it only sends information about disconnected clients and APs to
the DS; otherwise, the DS would end up getting updates for every client in the system,
making it less scalable. The DAP sends new or changed information about APs to the
DS periodically (30 seconds in our current prototype). Furthermore, if the DAP has any
new information about disconnected clients, it forwards it to the DS immediately.
All messages from the DC to the DAP and DAP to the DS are sent as XML mes-
sages. A sample message format from the DC is shown below (timestamps have been
removed):
<DiagPacket TStamp="...">
<Clients TStamp="...">
</Clients>
<Real-APs TStamp="...">
</Real-APs>
<Disconnected-Clients TStamp="...">
</Disconnected-Clients>
</DiagPacket>
As the sample message shows, the DC sends information about other connected
clients, APs, and disconnected clients. For each such class of entities, it sends the MAC
address of a machine along with the RSSI, SSID, and a channel bitmap which indicates the
channels on which that machine was heard.
We now evaluate our mechanisms and show that they are not only effective but also
impose low overheads. For the evaluation of the basic architecture, Client Conduit,
and DIAL, we use a combination of tools such as AiroPeek [132] and WinDump [134].
Section 4.8.1 presents the timings for individual operations that are used by our pro-
tocols. Section 4.8.2 presents the breakdown of the costs involved in the Client Conduit
mechanism and shows that it can be used to help disconnected clients in a timely manner.
Section 4.8.3 shows the effectiveness of our DIAL technique for locating disconnected
clients. In Section 4.8.4, we evaluate the effectiveness of the EDEN technique to iso-
late performance problems. Section 4.8.5 shows that the scanning requirements of our
Rogue AP detection mechanism are modest. In
Section 4.8.6, we discuss scalability issues with respect to the Client Conduit protocol.
To better understand the cost of various operations involved in our detection and di-
agnosis mechanisms, we measured the times for the individual operations; we
believe that these numbers are valuable for other researchers for modeling purposes as
well. Table 4.2 shows the results. Note that the cost of changing a machine from AP to
Station mode is less than 2 seconds (731 msecs for the actual change, followed by a
wait before the card can be used in the new mode).

Table 4.2: Times for different operations: U means time measured from user-level
code; the rest are the times taken for the corresponding ioctl to complete.

We also ran an experiment to determine if placing a machine in promiscuous mode has any effect on the machine’s
incoming/outgoing bandwidth. We set up the machines such that machine A did a TCP
transfer to C at full blast and B performed a full blast TCP transfer to D. The experiment
was performed three times; in each case, machine C was placed in normal mode first
and then in promiscuous mode. We observed that C’s throughput was largely unaffected
(standard deviation of 63.7 KB/sec) in the normal mode case, with a bandwidth of 252.3
KB/sec when placed in promiscuous mode. We also measured the CPU overhead when a machine
is placed in promiscuous mode: we ran a full blast TCP transfer between two machines
A and B; during this process, we first placed a third machine M in normal mode and then in
promiscuous mode. Figure 4.6 shows the CPU overhead for machine M (a 1 GHz Pen-
tium III machine). Even for such a relatively old machine, the CPU overhead of placing
it in promiscuous mode is quite low, mostly staying below 10%.
Thus, these results show that the CPU overheads on a machine due to promiscuous
mode operation are low.
To measure the performance of the Client Conduit protocol, we set up an experiment with
one AP, one connected client C and a disconnected client D. The connected client is
a 1 GHz Pentium III machine and the disconnected machine is an 800 MHz Pentium III
machine. Both machines have 512 MB of memory and Netgear MA521 802.11b cards.
[Figure 4.6: CPU usage (%) of machine M over time (secs), shown for normal and promiscuous modes.]
Figure 4.7 shows the total time taken along with a breakdown of the Connection
Setup part of the protocol. “User time” indicates the end-to-end time taken by our user-
level implementation whereas “Kernel time” indicates the time taken by the relevant
ioctls for the same functionality. The costs in both cases are similar thereby justifying
our approach of implementing only the essential mechanisms at the kernel level and
driving most of the protocol from the user-level (for ease of debugging). In the first
two bars, the user-level daemon at the connected client shares an event with the kernel,
which immediately informs the daemon when a disconnected client’s beacon is detected
(See Section 4.7). Thus, the disconnected client needs to wait only a short time before it
hears the Probe Request message from the connected client C indicating that C is ready
to help (see the “Get ACK” times). This delay would be much higher if the daemon
obtained the disconnected machine information from the kernel periodically instead of
being interrupt-driven. The third bar shows the delay breakdown for an implementation
where the daemon polls the kernel for this information every 10 seconds.
We now clarify a couple of details about our experiment. First, the initial step of
setting the channel and checking for available clients takes approximately 190 msecs.
In the worst case, the disconnected client may have to scan all channels and check for
connected clients; in that case, this step may take 2-3 seconds. Second, the steps
in which we set the AP/Station mode of the machine take approximately 730 msec;
however, the hardware specifications require that the operating system must wait for a
few hundred milliseconds before using the card in the new mode. For robustness, we
added a one second delay after such a mode change; the figure includes these delays.
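The Connection Setup steps broken down in Figure 4.7 can be sketched as follows; driver is a hypothetical wrapper around the ioctls of Section 4.7, and the one-second settle time is the robustness delay discussed above:

```python
import time

# Sketch of Connection Setup at the disconnected client, following the
# bottom-to-top order of the Figure 4.7 legend. The driver object and its
# method names are illustrative placeholders for the real ioctls.

def connection_setup(driver, channel, sos_ssid,
                     beacon_period_msec=100, settle_secs=1.0):
    driver.set_channel(channel)
    driver.become_ap()                         # advertise the SOS network
    time.sleep(settle_secs)                    # let the card settle after mode change
    driver.set_ssid(sos_ssid)
    driver.set_beacon_period(beacon_period_msec)
    driver.wait_for_ack()                      # Probe Request from a helping client
    driver.become_station()
    time.sleep(settle_secs)
    driver.associate_adhoc()                   # join the helper in ad hoc mode
```

The two explicit sleeps correspond to the "Sleep 1 second" entries in the figure and dominate a noticeable fraction of the roughly 5-second setup time.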
From the figure, one can see that the Connection Setup and association time for the
disconnected client is quite reasonable: it takes less than 5 seconds to run the setup and
another 1.9 seconds to associate with a connected client C in ad-hoc mode so that the
diagnosis can begin.
After MultiNet starts running on the connected client, the disconnected client can
interact with the DS to diagnose its problems, e.g., transfer certificates or log files to
the DS. To evaluate the time taken to perform these transfers via MultiNet, we ran
an experiment in which a machine D sent files of different sizes (100KB, 500KB and
1MB) to the DS through connected client C. Figure 4.8 shows the time taken when
[Stacked-bar plot, Y-axis: Time (msec), 0 to 14,000. Legend (bottom to top): Set channel, Become AP, Sleep 1 second, Set SSID, Set Beacon Period, Get Ack, Become STA, Sleep 1 second, Adhoc-mode association. Bars: User time (ms), Kernel time (ms), User time with polling (ms).]
Figure 4.7: Breakdown of costs for Client Conduit. The protocol steps are executed
from the bottom entry in the legend to the topmost, i.e., starting at “Set channel”.
the connected client C allows 17-50% of its time to be used for ad hoc mode; client C
stays on the infrastructure network for 500 msecs, and the time on the ad-hoc network is
varied between 100 to 500 msecs. In our experiment, the time to switch from ad-hoc to
infrastructure mode is 500 msecs and from infrastructure to ad hoc mode is 300 msecs.
As expected, the results show that the file transfer speed is a direct function of the
time a connected client stays in the ad hoc network. We expect that as the switching
delay overhead reduces (as in newer cards) the transfer speeds will improve.
Thus, our results show that Client Conduit allows a disconnected client’s problem to
be reported (and even be resolved, e.g., updating expired certificates) in a few seconds.
We now evaluate the accuracy of locating disconnected clients (or Rogue APs) using our
DIAL scheme described in Section 4.6.1.

[Figure 4.8 plot: transfer time vs. the fraction of time (0-0.6) the connected node spends on the ad hoc network.]
Figure 4.8: Time taken by a disconnected client to transfer data via MultiNet

Unlike previous work on location determina-
tion, the location calculated by DIAL incurs extra error since the locations of the reference
points, i.e., the connected clients, are themselves only estimates.
We evaluated DIAL using RADAR [17] for locating the disconnected clients from
the anchor points; we chose RADAR due to its simplicity, and more sophisticated RSSI-based schemes such as
the one suggested in [73] can be used to reduce the errors of DIAL even further.
In our experiment, we placed 3 connected clients in 3 offices on the same floor of our
building. We obtained the floor map, and applied the Cohen-Sutherland line-clipping
algorithm [48] to compute the number of walls between each of the three connected
clients and the other rooms. We placed a disconnected client at 7 different locations
while it sent out broadcast packets. We used AiroPeek [132] to measure the RSSI of
the disconnected client’s packets received at the connected machines. We then applied
the equation specified in [17] to compute the wall attenuation factor (WAF). Based on
the WAF, we inferred that the disconnected client is in location X if the predicted signal
strength at X is closest to the observed signal strength at the three connected clients.
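The inference step can be sketched as follows. The path-loss formula is the wall attenuation factor model from RADAR [17]; the constants and the least-squares matching rule here are illustrative:

```python
import math

def predicted_rssi(p_d0, n, d, walls, waf, max_walls=4, d0=1.0):
    """RADAR-style wall attenuation factor model:
    P(d) = P(d0) - 10*n*log10(d/d0) - min(walls, max_walls) * WAF."""
    return p_d0 - 10 * n * math.log10(d / d0) - min(walls, max_walls) * waf

def locate(observed, candidates):
    """Pick the candidate location whose predicted signal strengths at the
    anchors are closest (least squares) to the observed strengths."""
    def err(cand):
        return sum((p - o) ** 2 for p, o in zip(cand["predicted"], observed))
    return min(candidates, key=err)
```

Here each candidate location carries the signal strengths predicted at the three connected clients; the disconnected client is inferred to be at the candidate minimizing the squared error, as described above.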
We ran the RADAR algorithm on the collected RSSI data for locating the discon-
nected client D using the precise location of the connected clients. We computed the
error in D’s predicted location with respect to its actual location; the “No Error” bar in
Figure 4.9(a) shows this error.
[Figure panels: Y-axis: Median Location Error (metres), 0-30; X-axis cases: No Error, E(1), E(2), E(1,2), E(3), E(1,3), E(2,3), E(1,2,3). Panel (a): estimated location of connected client is one room off from its true location. Panel (b): estimated location of connected client is two rooms off from its true location.]
Figure 4.9: Median error in locating disconnected clients. The lower and upper
bounds of error bars correspond to min and max error. E(i) denotes that the ith
connected client’s estimated location is in error.
Then, we ran the algorithm again by assuming that there was an error in estimating
the location of one connected client by a distance of 3.3 meters; this distance corre-
sponds to the average width of a room in our building. For example, if connected client
A’s estimated location is one room off, that incorrect location is used
when using A as an anchor point in RADAR. The second bar in Figure 4.9(a) shows this
error when such a situation occurs. The rest of the bars show the error in locating the
disconnected client when the location of either one, two or three connected clients is es-
timated incorrectly by one room; Figure 4.9(b) shows the error when the estimated
locations are off by two rooms.
The results show that when there is no error in the known location of the connected
clients, the median error is 9.7 meters. This error increases to at most 12 meters when the
estimated location of one or more clients is one or two rooms off from its true location.
Of course, when the estimated locations of the connected clients are off by two rooms,
the maximum error is substantially higher, e.g., 33 meters for the case when the location
of all three clients is incorrect. This case occurs when the estimated locations of the
connected clients are off in different directions, e.g., client A’s location is off towards
one side of the building while the others are off towards the opposite side.
Note that the error in the location of the anchor points (i.e., connected clients) can
be kept low (less than one room off) by using mechanisms such as Cricket [98] and
Active Badges [129] for locating connected clients. With accurate location of anchor
points, DIAL’s error would be similar to that of the best-known RSSI-based location
mechanism. Note that even an error of 10-12 metres (for our experimental setup using
RADAR) is sufficient to narrow down the search area for disconnected
clients or Rogue APs. Thus, based on our results, we can say that DIAL is a practi-
cal approach for helping network administrators estimate the approximate location of
problematic areas.
In Section 4.6.2, we presented the EDEN scheme that uses nearby clients to measure the
delay encountered by a wireless station or an AP. We now show that EDEN can estimate
these delays with reasonable accuracy.
The EDEN technique measures the time spent on a client (or an AP) by measuring
the times of the Snoop request and response packets at nearby clients. However, this
measurement includes the delay at the machine due to medium contention. To under-
stand the extent of this congestion delay, we set up a simple experiment with 4 machines:
machine A performed a full blast TCP transfer to machine B, thereby creating traffic congestion
in the medium. Then we associated client
C with the Native WiFi AP machine D. The Native WiFi AP then sent 20 ping packets
to the associated client, which in turn sent ping reply packets. We ran the experiment
twice: once with no extra client delays and next when an extra 40 msec were added at
the client between the ping request and replies. Using a fifth machine running AiroPeek,
we observed that EDEN over-estimated the client delay by approximately 3 msec. When
examining scenarios where the client or the AP is the bottleneck, such inaccuracies
may be acceptable. However, when these entities are not bottlenecks or when EDEN is
examining a scenario with low delays, or when contention is even worse (e.g., the con-
tention delay can be more than 20 msec in 802.11b), a better estimation technique is needed.
We next evaluated EDEN’s accuracy in estimating the delay
at an endpoint. In this setup, a client machine was associated with another machine
running as an access point; both machines had Netgear MA521 802.11b cards and the
corresponding Native WiFi drivers. We then injected delays in the path of all packets at
the client (varying from 30 to 300 msecs). To emulate the EDEN protocol, the AP sent
[Figure 4.10 plot: estimated delay vs. delay introduced at the client (30-300 msec).]
20 ping packets to the client; the ping packets and replies emulate the Snoop request
and response messages in EDEN. A third machine running AiroPeek was used to snoop
on these ping packets; this machine effectively emulates the eavesdropping client in
EDEN. The collected AiroPeek data was then analyzed to estimate the delays at the
client. Figure 4.10 shows that EDEN is reasonably accurate in estimating the delays at
an endpoint: EDEN can estimate client delays with an error less than 5% of the actual
introduced delay.
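The analysis of the snooped trace can be sketched as follows; the baseline term, which accounts for medium access and transmission time, is an assumed calibration constant (the roughly 3 msec over-estimate measured above suggests its magnitude):

```python
# Sketch of endpoint-delay estimation from an eavesdropper's trace: the gap
# between a snooped Snoop-request and its response, minus a baseline for
# medium access and transmission time (assumed calibration value).

def estimate_endpoint_delay(req_times, resp_times, baseline=0.003):
    """Average (response - request) gap minus baseline, in seconds."""
    gaps = [resp - req for req, resp in zip(req_times, resp_times)]
    return sum(gaps) / len(gaps) - baseline
```

Averaging over the 20 request/response pairs used in the experiments smooths out per-packet contention jitter; when contention is heavy, the fixed baseline is the main source of error, which is why the text above calls for a better estimator in that regime.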
Finally, we studied EDEN’s effectiveness in classifying delays at the client, AP, and
the medium. We used a 3-machine setup similar to the one in the previous experiment;
in this case, to estimate delays at the AP, the client also sent ping packets to the AP.
To introduce delays in the medium, we increased the distance between the client and
the AP. The medium delay increased relative to the case when the AP and client were
nearby because there were more retries [2]. For better accuracy, we ran these experiments
at night when the wireless traffic was expected to be low (since the corporate LAN is
actively used by employees during the day, we did not want traffic interference to affect
our measurements).

[2] The increased distance resulted in an increase in the number of walls between the
two machines, thereby weakening the received signal.
[Bar chart: estimated delay (msec, 0-120) for the scenarios 40-40-near, 40-40-far, and 0-0-far.]
Figure 4.11: Breakdown of delay at the client, AP, and the medium as estimated by
EDEN
Figure 4.11 shows EDEN’s breakdown for three different scenarios. The 40-40-near
bar corresponds to the scenario when the AP and client were placed near each other,
and we added a 40 msec delay to all packets at both machines. The 40-40-far scenario
is similar except that the client and the AP were placed far from each other. Finally, the
0-0-far case is one in which we did not introduce any delays at the client or the AP, but
the machines were placed far apart.
In the 40-40-near case, EDEN estimates approximately equal delays for the client and
the AP. With an increase in the distance (the 40-40-far and 0-0-far cases), the medium
delays increase and EDEN is able to estimate this change as well. Note that the client
and the AP delays increased in the latter two cases by a few milliseconds because
the wireless cards transmitted the packets at a lower transmission rate (1 Mbps) in order
to decrease the error rate. These results show that EDEN is an effective mechanism for
isolating delays at the client, the AP, and the medium.
In this section, we explore two issues related to Rogue AP detection. Section 4.8.5
shows that overlapping channels help in quicker detection of Rogue APs that are hiding
on channels where no AP or client is present. Section 4.8.5 shows that even if Rogue
APs are not overheard on overlapping channels, there is ample opportunity for clients to
perform active scanning without hurting their performance. To check the effectiveness
of our detection mechanism in practice, we ran it on our
floor and were able to detect all “known” Rogue APs (these were experimental APs
that we had set up).
Overlapping Channels
It is known that overlapping channels in IEEE 802.11 not only interfere with one another,
but it is sometimes possible for a NIC on one channel to decode packets from another,
overlapping channel. Thus, if a DC
is present on a channel that overlaps with a Rogue AP’s channel, it may detect the AP’s
beacons. To quantify this effect, we ran an experiment in which an AP
was placed on a given channel and a nearby client checked for the AP’s beacons on all 11
channels and documented where it could be heard. In one run, the client lingered on each channel for
1 second and in the second run, it stayed for 5 seconds. Figure 4.12 plots the channels on
which the AP is heard (Y-axis) when it is placed on a specific channel (X-axis). Clearly,
the overlap across various channels is non-negligible and is helpful for detection of
Rogue APs. Furthermore, given sufficient time (see the 5-second run), there is an even
higher likelihood that some packet from a Rogue AP leaks through to a monitoring DC.
[Figure 4.12 plot: channels on which the AP is heard (Y-axis, 0-12) vs. channel on which the AP beacons (X-axis, 0-12).]
In the above experiments, the AP and the client were placed 5 feet apart with one
obstacle between them. We wanted to study the change in leakage across overlapping
channels on increasing the distance between the AP and the client. For this, we
placed an AP machine at 10 different locations on our floor in various rooms and re-
peated the above experiment. Figure 4.13 shows that as the distance between the AP
and the monitoring client increases, the AP is heard on fewer channels (the decrease is
gradual with distance).
The above results show that even though one cannot rely on overlap as a guaranteed
mechanism for detecting Rogue APs, it does reduce the need of performing frequent ac-
tive scans. This observation also implies that there are more opportunities for detecting
Rogue APs: for a Rogue AP to go undetected, it must be far away from any client that
is on an overlapping channel.
As shown in Section 4.8.2, active scans can take up to 2 seconds. Our current imple-
mentation performs an active scan every 5 minutes; we refer to this period as the Active
Scan Period. Even though 2 seconds out of 300 seconds is a small fraction of the time, it
[Figure 4.13 plot: number of channels on which the AP is heard vs. distance in feet (0-120) between the AP and the monitoring client.]
is important for clients to perform these scans at appropriate times; otherwise, network
traffic on a client may get disrupted: packets sent to this client may be dropped and
TCP connections may back off.
[Figure 4.14 plot: maximum idle time (secs, 0-250) in each 5-minute interval vs. time of day (0-24 hours).]
Figure 4.14: The maximum idle time duration available during every 5-minute
interval over a 24-hour period
Ideally, these scans should be done when the node is idle and has no ongoing net-
work transfers. To determine whether such idle times exist in current usage, we used
Ethereal [45] to obtain traces from 3 desktop machines of our colleagues over multiple
days. Note that even though these traces are from desktops attached to wired networks,
they still give us a reasonable estimate of network traffic generated by users; as users
start using laptops as their primary machines, it is likely that the network usage and idle
time patterns will remain similar.
We divided the traces into 5-minute periods (the Active Scan Period) and for each
period, we determined the maximum period of time for which the network was idle.
Figure 4.14 presents the maximum idle period in every 5-minute interval during a 24-
hour period. Each point in the graph (e.g., for 12:00 pm to 12:05 pm) is obtained by
averaging the maximum idle time value across multiple days and multiple machines for
the same 5-minute period. The figure shows that there are large chunks of idle periods
available for performing active scans: the smallest idle period available in a 5-minute
interval was 118 seconds and typically, idle periods of more than 2.5 to 3 minutes were
easily available. Thus, a large window of opportunity is available to our Rogue AP
detection mechanism.
Given the availability of such opportunities, one can use any heuristic to predict idle
times for launching an active scan (which takes 2 seconds). We studied the effectiveness
of a simple history-based heuristic: if the network has been idle for X seconds, it predicts
that the network will be idle for the next 2 seconds. Thus, after every 5 minutes, the Rogue
AP detection module can perform an active scan whenever it observes that the network
interface has been idle for X seconds. We evaluated the effectiveness of this heuristic
over our 3-machine traces with two different values of X: 5 and 10 seconds. With both
values of X, we observed that the active scan would complete within the idle period
for more than 95% of the cases. The effectiveness of this heuristic shows that wireless
clients can perform active scans for Rogue AP detection without hurting performance.
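The heuristic can be sketched as follows; the trace representation, a sorted list of packet timestamps within one 5-minute period, is an assumption of this sketch:

```python
SCAN_SECS = 2.0  # an active scan takes up to 2 seconds

def scan_start_time(packet_times, period_start, period_end, x):
    """Return the time at which the heuristic would launch the scan, or None
    if the network is never idle for x consecutive seconds in the period."""
    last = period_start
    for t in sorted(packet_times) + [period_end]:
        if t - last >= x:
            return last + x       # interface idle for x secs: launch the scan
        last = max(last, t)
    return None

def scan_succeeds(packet_times, period_start, period_end, x):
    """True if the launched 2-second scan finishes before the next packet."""
    start = scan_start_time(packet_times, period_start, period_end, x)
    if start is None:
        return False
    return all(not (start <= t < start + SCAN_SECS) for t in packet_times)
```

Replaying a trace through scan_succeeds with X = 5 or 10 seconds mirrors the evaluation above, where the scan completed within the idle period in more than 95% of the cases.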
As discussed in Section 4.4.3, our architecture is designed to scale with the number
of access points and clients in the system. We now discuss why our proactive and
reactive techniques maintain the scalability property. We also argue why our reactive
mechanisms impose low network overhead even if a number of clients are experiencing
faults simultaneously.
As discussed in Section 4.7, each DC pro-actively sends the RSSI, SSID, and MAC
address information about nearby devices to the DAP every 30 seconds; this information is
necessary for Rogue AP detection. The DAP filters this data and sends information
about APs every 30 seconds. To understand the network bandwidth consumed on the
wireless link, we set up an experiment with a single DC, DAP and DS for 4 hours. We
observed that the bandwidth consumption by the DC was less than 0.2 Kbps and the
DAP’s bandwidth requirements were less than 0.01 Kbps. This result implies that even
if a large number of clients were present, the bandwidth usage would still be low, e.g., 20 Kbps
for 100 DCs. Thus, for pro-active monitoring, our techniques have negligible
bandwidth requirements.
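The scaling arithmetic above can be checked with a one-line computation; the per-DC rate is the measured upper bound, and the client count is illustrative:

```python
PER_DC_KBPS = 0.2  # measured upper bound on a single DC's report traffic

def aggregate_kbps(num_clients):
    """Total proactive-report bandwidth if every client reports at the measured rate."""
    return PER_DC_KBPS * num_clients
```

Since reports are periodic and independent, aggregate load grows linearly with the number of clients, which is why 100 DCs stay around 20 Kbps.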
We now analyze the bandwidth overheads of our reactive diagnosis mechanisms, i.e.,
Client Conduit and EDEN; we do not discuss DIAL’s overheads separately since DIAL’s beaconing
messages are part of Client Conduit and the overheads of sending the RSSI information
are covered by the proactive monitoring analysis above.
The bandwidth requirements of EDEN and the Connection Setup part (beacons and
probe messages) of Client Conduit are low since these protocols send small broadcast
or beacon packets at a low frequency, e.g., every 100 msecs in Client Conduit and ev-
ery 2 seconds in EDEN. The bandwidth consumption while using MultiNet can also
be controlled: as stated in Section 4.5.2, the connected client can limit the amount of
bandwidth that it allocates to the disconnected client. Thus, if a single client needs help,
the overheads on the network are low.
We now analyze the overheads when a large number of clients (say 50) in an area
have wireless faults and are utilizing our reactive mechanisms to diagnose their prob-
lems. Our basic idea for ensuring that the performance of the network does not deteri-
orate is to rate-limit our mechanisms; we have not implemented these protocol extensions
in our current prototype. In Client Conduit, when a disconnected client overhears the
SOS beacons of N other disconnected clients, instead of beaconing every 100
msec, it sends out a beacon every K msecs where K is a random number between 0
and 100*N msecs. This self-regulation ensures that the network is not swamped by
Client Conduit beacons if a sudden loss of coverage occurs in an area. A similar self-
regulatory mechanism is used to limit the rate at which the initial broadcast packets are
sent in EDEN. Furthermore, to limit the overheads on a connected client C (and possibly
reduce the reactive scheme’s load on the DAP and DS), we can use a policy such that
C helps only one client at any given point. Thus, with these policy decisions, we can
ensure that Client Conduit and EDEN impose low bandwidth overheads even when a
large number of clients need help simultaneously.
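The self-regulation rule can be sketched as follows (the function name is illustrative):

```python
import random

BASE_PERIOD_MSEC = 100  # normal Client Conduit SOS beacon period

def beacon_period_msec(num_other_sos_clients):
    """Randomized beacon period K: uniform in (0, 100*N] msec when N other
    disconnected clients are overheard; the normal 100 msec otherwise."""
    n = num_other_sos_clients
    if n <= 0:
        return BASE_PERIOD_MSEC
    return random.uniform(0, BASE_PERIOD_MSEC * n)
```

With N clients each beaconing at an average period of about 100*N/2 msec, the aggregate beacon rate stays roughly constant instead of growing with N, which is the point of the rate limit.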
There are a number of additional problems in wireless fault diagnosis that require further
investigation:
• An interesting problem is to detect Rogue Ad-hoc Networks. Such networks are created when
a user connected to the corporate network (e.g., via a wired network) sets up
an IEEE 802.11 ad-hoc network with an unauthenticated client. Thus, like the
Rogue AP scenario, such a network can compromise the security of the corporate
network.
was not discussed in this chapter. For example, the system could analyze the IEEE
• In Section 4.6.1, we show how the location of disconnected clients can be deter-
mined when a few connected clients are present nearby. The question remains:
what should be done when there are no connected clients in the neighborhood?
One approach may be to have the client log its last known location where connec-
tivity was available.
• The next logical step after diagnosis is recovery. Once a fault has been detected,
one needs to determine what automatic steps the system should take to resolve the
fault.
4.10 Summary
The rising popularity of IEEE 802.11 networks has made fault detection and diagno-
sis an important problem for IT managers responsible for maintaining these networks.
Interestingly, the wireless research community has overlooked these problems, perhaps
because maintenance issues surface only after large deployments are in place, which has
happened only recently.
In this chapter, we presented novel solutions for detecting and diagnosing a variety of faults.
Our initial results show that our mechanisms of locating RF holes, detecting Rogue
APs, and diagnosing performance problems are effective and impose low overheads.
Furthermore, we show that a novel mechanism called Client Conduit can be used for
diagnosing the problems of disconnected clients. These mechanisms are part of a
general architecture that uses clients, APs, and backend servers together for diagnosing
faults.
The general problem space of effective network management for IEEE 802.11 net-
works is large. Our fault diagnosis architecture is a first attempt at addressing some of
the problems that arise in real deployments. It is our hope that this work will stimulate other researchers to investigate
such problems further and propose solutions that will eventually result in the smooth
operation of wireless networks.
The contents of this chapter were developed in joint work with Atul Adya, Victor
Bahl and Lili Qiu. The idea to work on this problem was conceived by Victor Bahl.
He also helped define the problem space. I designed the fault diagnosis architecture,
described in Section 4.4, the Client Conduit Protocol and the Rogue AP algorithm, along
with Atul Adya. I implemented the Client Conduit protocol, and Atul implemented the
Rogue AP algorithm of Section 4.6.3. I also designed the performance isolation and
location determination algorithms, presented in Sections 4.6.1 and 4.6.2, in joint work
CONCLUSION
To the best of our knowledge, this dissertation is the first to look at the problem of
virtualizing a single wireless card to connect to multiple networks simultaneously.
The MultiNet solution is a new virtualization architecture for wireless network cards;
it has been released by Microsoft Research as part of its Mesh Networking Academic
Resource Toolkit [104]. In addition to describing MultiNet, this dissertation also presents
two of its applications: SSCH and Client Conduit.
SSCH is a channel hopping protocol for increasing the capacity of wireless ad hoc
networks. SSCH can be implemented in the link layer of the network stack and works
over the IEEE 802.11 standard. It is the first multi-channel protocol we are aware of
that works over a single wireless card without requiring a dedicated control channel.
We show that SSCH significantly increases the capacity of wireless ad hoc networks.
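The core idea behind SSCH's distributed rendezvous can be sketched as follows: each node advances its channel by a pseudo-random seed in every slot, so two nodes that adopt the same (channel, seed) pair follow identical schedules and overlap in every slot, while nodes with different seeds rarely contend for the same channel. The constants and update rule below are illustrative assumptions for this sketch, not the exact protocol.

```python
# Illustrative sketch of SSCH-style seeded channel hopping. The channel
# count and update rule are assumptions, not the exact SSCH parameters.
NUM_CHANNELS = 13  # e.g., the number of IEEE 802.11b channels

def schedule(channel, seed, num_slots):
    """Return the channel visited in each slot for one (channel, seed) pair.

    After each slot the channel advances by the seed, modulo the number of
    channels, so two nodes sharing the same pair hop identically and stay
    overlapped in every slot.
    """
    slots = []
    for _ in range(num_slots):
        slots.append(channel)
        channel = (channel + seed) % NUM_CHANNELS
    return slots

# Two nodes that have learned the same (channel, seed) pair overlap everywhere:
a = schedule(channel=3, seed=5, num_slots=8)
b = schedule(channel=3, seed=5, num_slots=8)
assert a == b

# Nodes with a different seed mostly land on different channels per slot,
# so disjoint flows rarely interfere with each other.
c = schedule(channel=3, seed=7, num_slots=8)
overlap = sum(1 for x, y in zip(a, c) if x == y)
```

With a prime channel count, two schedules that start on the same channel but use different seeds coincide only once per cycle, which is the property that lets disjoint communications proceed largely in parallel.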
Client Conduit is a key component of our fault diagnosis architecture. It uses MultiNet
to help disconnected clients while consuming little of the bandwidth of connected
machines. Client Conduit has been implemented on Windows XP.
In addition to SSCH and Client Conduit, MultiNet enables the design of a whole
new class of applications. System designers are no longer constrained by the number
of wireless cards they can fit into a system. They are free to design systems and
applications that can connect to many wireless networks at the same time. We believe
that MultiNet will enable many such applications.
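The virtualization idea summarized above can be illustrated with a minimal sketch: one physical card is time-multiplexed across several virtual adapters, and packets destined for an inactive network are buffered until the card switches to that network. The class and method names here are hypothetical, chosen for illustration, and do not correspond to the actual driver interface.

```python
# Minimal sketch of MultiNet-style virtualization: a single physical card
# serves several virtual adapters in turn, buffering traffic for networks
# that are currently inactive. Names are illustrative, not the driver API.
from collections import deque

class VirtualAdapter:
    def __init__(self, ssid):
        self.ssid = ssid
        self.buffer = deque()  # packets queued while this network is inactive

    def send(self, packet):
        self.buffer.append(packet)

class MultiplexedCard:
    def __init__(self, adapters):
        self.adapters = adapters
        self.active = 0        # index of the currently connected network

    def switch_and_drain(self):
        """Associate with the next network and flush its buffered packets."""
        self.active = (self.active + 1) % len(self.adapters)
        adapter = self.adapters[self.active]
        sent = []
        while adapter.buffer:
            sent.append(adapter.buffer.popleft())  # transmit on the real card
        return adapter.ssid, sent

# Applications see two independent "cards"; the driver switches beneath them.
infra = VirtualAdapter("corp-infrastructure")
adhoc = VirtualAdapter("rescue-adhoc")
card = MultiplexedCard([infra, adhoc])
adhoc.send("hello")
ssid, drained = card.switch_and_drain()  # now on "rescue-adhoc"
```

The switching policy (how long to stay on each network) is where the interesting trade-offs lie; in MultiNet that policy lives in a user-level service while the buffering and switching mechanisms stay in the kernel driver.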
Through its constructions, this dissertation contributes towards solving some of the
key problems in existing wireless networks, in particular power, capacity and manage-
ability. MultiNet saves battery power by not requiring multiple wireless cards to stay
powered on at all times. SSCH improves capacity by utilizing frequency diversity across
channels. Finally, this dissertation presents a new client-centric fault diagnosis
architecture for infrastructure wireless networks.
[2] B. Aboba and D. Simon. PPP EAP TLS Authentication Protocol. In RFC 2716,
October 1999.
[3] A. Adya, P. Bahl, R. Chandra, and L. Qiu. Architecture and Techniques for
Diagnosing Faults in IEEE 802.11 Infrastructure Networks. In Proc. of ACM
MobiCom, Philadelphia, PA, September 2004.
[4] Atul Adya, Paramvir Bahl, Jitendra Padhye, Alec Wolman, and Lidong Zhou. A
Multi-Radio Unification Protocol for IEEE 802.11 Wireless Networks. Technical
Report MSR-TR-2003-44, Microsoft Research, July 2003.
[9] M. Alicherry, R. Bhatia, and L. Li. Joint Channel Assignment and Routing for
Throughput Optimization in Multi-radio Wireless Mesh Networks. In MobiCom,
August 2005.
[10] M. Allman, W. Eddy, and S. Ostermann. Estimating Loss Rates With TCP. In
ACM Perf. Evaluation Review 31(3), Dec 2003.
[16] P. Bahl, R. Chandra, and J. Dunagan. SSCH: Slotted Seeded Channel Hopping
for Capacity Improvement in IEEE 802.11 Ad-Hoc Wireless Networks. In Proc.
of ACM MobiCom, Philadelphia, PA, September 2004.
[18] P. Barford and M. Crovella. Generating Representative Web Workloads for Net-
work and Server Performance Evaluation. In ACM SIGMETRICS 1998, pages
151–160, July 1998.
[19] P. Barford and M. Crovella. Critical Path Analysis of TCP Transactions. In Proc.
of ACM SIGCOMM, Stockholm, Sweden, Aug 2000.
[22] J. Bellardo and S. Savage. Measuring Packet Reordering. In Proc. of ACM Inter-
net Measurement Workshop, Marseille France, Nov 2002.
[30] R. Chandra, P. Bahl, and P. Bahl. MultiNet: Connecting to Multiple IEEE 802.11
Networks Using a Single Wireless Card. In Proc. of IEEE INFOCOM, Hong
Kong, Mar 2004.
[38] Intel, Compaq, and Microsoft Corporations. Virtual Interface Specification, Ver-
sion 1.0, December 1997.
[42] T. ElBatt and B. Ryu. On the Channel Reservation Schemes for Ad-hoc Net-
works: Utilizing Directional Antennas. In IEEE International Symposium on
Wireless Personal Multimedia Communications, October 2002.
[50] Motorola Government and Enterprise. Motorola’s Mobile Mesh Networks Tech-
nology. http://www.motorola.com/governmentandenterprise/.
[53] Hung-Yun Hsieh, Kyu-Han Kim, Yujie Zhu, and Raghupathy Sivakumar. A
receiver-centric transport protocol for mobile hosts with heterogeneous wireless
interfaces. In Proceedings of the 9th annual international conference on Mobile
computing and networking, pages 1–15. ACM Press, 2003.
[54] L. Huang and T. Lai. On the scalability of IEEE 802.11 ad hoc networks. In Pro-
ceedings of the 3rd ACM international symposium on Mobile Ad Hoc Networking
& Computing, MobiHoc, pages 173–182. ACM Press, 2002.
[57] IEEE. IEEE 802.1x-2001 IEEE Standards for Local and Metropolitan Area Net-
works: Port-Based Network Access Control, 1999.
[58] IEEE Computer Society. Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications. IEEE Standard 802.11, 1999.
[61] Crossbow Technology Inc. Motes, Smart Dust Sensors, Wireless Sensor Net-
works. http://www.xbow.com/Products/Wireless_Sensor_Networks.htm.
[66] N. Jain and S. R. Das. A Multichannel CSMA MAC Protocol with Receiver-
Based Channel Selection for Multihop Wireless Networks. In International Con-
ference on Computer Communications and Networks (IC3N), October 2001.
[67] Jinyang Li, Charles Blake, Douglas S. J. De Couto, Hu Imm Lee, and Robert
Morris. Capacity of Ad Hoc Wireless Networks. In Mobile Computing and
Networking, pages 61–69, 2001.
[68] D. Johnson, D. Maltz, and J. Broch. DSR: The Dynamic Source Routing Proto-
col for Multihop Wireless Ad Hoc Networks. In C.E. Perkins, editor, Ad Hoc
Networking, chapter 5, pages 139–172. Addison-Wesley, 2001.
[69] E. Jung and N. Vaidya. An Energy Efficient MAC Protocol for Wireless LANs.
In IEEE INFOCOM 2002, June 2002.
[71] R. Krashinsky and H. Balakrishnan. Minimizing Energy for Wireless Web Access
with Bounded Slowdown. In ACM MobiCom 2002, pages 119–130, September
2002.
[72] R. Kravets and R. Krishnan. Power Management Techniques for Mobile Com-
munications. In ACM MobiCom 1998, October 1998.
[74] T. H. Lai and D. Zhou. Efficient and Scalable IEEE 802.11 Ad-Hoc Mode Timing
Synchronization Function. In Proc. of International Conference on Advanced
Information Networking and Applications, March 2003.
[75] L. Lamport. Time, Clocks and the Ordering of Events in Distributed Systems. In
Communications of the ACM, volume 21, pages 558–565, 1978.
[76] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM
TOPLAS, 4(3):382–401, July 1982.
[77] C. Law, A. K. Mehta, and K. Siu. A New Bluetooth Scatternet Formation Proto-
col. To appear in ACM Mobile Networks and Applications Journal, 2002.
[78] C. Law and K. Siu. A Bluetooth Scatternet Formation Algorithm. In IEEE Sym-
posium on Ad Hoc Wireless Networks 2001, November 2001.
[80] Y. Li, H. Wu, D. Perkins, N. Tzeng, and M. Bayoumi. MAC-SCC: Medium Ac-
cess Control with a Separate Control Channel for Multihop Wireless Networks.
In 23rd International Conference on Distributed Computing Systems Workshops
(ICDCSW), 2003.
[86] Microsoft Corp. Native 802.11 Framework for IEEE 802.11 Networks.
http://www.microsoft.com.
[89] A. Nasipuri and S. R. Das. Multichannel CSMA with Signal Power-Based Chan-
nel Selection for Multihop Wireless Networks. In IEEE Vehicular Technology
Conference (VTC), September 2000.
[91] S. Ni, Y. Tseng, Y. Chen, and J. Sheu. The Broadcast Storm Problem in a Mobile
Ad Hoc Network. In ACM MobiCom, August 1999.
[92] L. Nord and J. Haartsen. The Bluetooth Radio Specification and The Bluetooth
Baseband Specification. Bluetooth, 1999-2000.
[99] L. Qiu, P. Bahl, A. Rao, and L. Zhou. Fault Detection, Isolation, and Diagnosis
in Multihop Wireless Networks. Technical Report MSR-TR-2004-11, Microsoft
Research, Redmond, WA, Dec 2003.
[100] I. Ramani and S. Savage. SyncScan: Practical Fast Handoff for 802.11 Infras-
tructure Networks. In Proc. of IEEE Infocom, Miami, FL, March 2005.
[101] M. Raya, J. P. Hubaux, and I. Aad. DOMINO: A System to Detect Greedy Be-
havior in IEEE 802.11 Hotspots. In Proc. of MobiSys, Boston, MA, June 2004.
[108] R. Rozovsky and P. Kumar. SEEDEX: A MAC Protocol for Ad Hoc Networks.
In ACM MobiHoc, 2001.
[110] J. Sheu, C. Chao, and C. Sun. A Clock Synchronization Algorithm for Multi-
Hop Wireless Ad Hoc Networks. In Proc. of IEEE International Conference on
Distributed Computing Systems, ICDCS, Tokyo, March 2004.
[111] E. Shih, P. Bahl, and M. Sinclair. Wake On Wireless: An Event Driven Energy
Saving Strategy for Battery Operated Devices. In MOBICOM, September 2002.
[112] E. Shih, P. Bahl, and M. Sinclair. Wake on Wireless: An event driven power
saving strategy for battery operated devices. In ACM MobiCom 2002, September
2002.
[113] M. Shin, A. Mishra, and W. Arbaugh. Improving the Latency of 802.11 Handoffs
Using Neighbor Graphs. In Proc. of MobiSys, Boston, MA, June 2004.
[117] R. Stevens. TCP/IP Illustrated (Vol. 1): The Protocols. Addison Wesley, 1994.
[118] R. Stine. FYI on a Network Management Tool: Catalog Tools for Monitoring and
Debugging TCP/IP Internets and Interconnected Devices. In IETF RFC 1147,
April 1990.
[121] SuperPass. Wireless LAN PCI card for 2.4 GHz. http://www.superpass.com/SP-
PCI-01.html.
[125] P. Verissimo and L. Rodrigues. A Posteriori Agreement for Fault Tolerant Clock
Synchronization on Broadcast Networks. In Proc. of International Symposium
on Fault-Tolerant Computing (FTCS), page 85, July 1992.
[128] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-Level Network
Interface for Parallel and Distributed Computing. In Proc. of ACM SOSP, New
York, December 1995.
[129] R. Want, A. Hopper, V. Falcao, and J. Gibbons. The Active Badge Location
System. ACM Transactions on Information Systems, 10(1), January 1992.
[130] A. Whitaker, M. Shaw, and S. D. Gribble. Scale and Performance in the Denali
Isolation Kernel. In Fifth Symposium on Operating Systems Design and Imple-
mentation, December 2002.
[135] S.-L. Wu, C.-Y. Lin, Y.-C. Tseng, and J.-P. Sheu. A New Multi-Channel MAC
Protocol with On-Demand Channel Assignment for Mobile Ad Hoc Networks.
In International Symposium on Parallel Architectures, Algorithms and Networks
(I-SPAN), 2000.
[136] S. Xu and T. Saadawi. Does the IEEE 802.11 MAC Protocol Work Well in Mul-
tihop Wireless Ad Hoc Networks? IEEE Communications Magazine, pages 130–137, June
2001.