A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Ranveer Chandra
January 2006
© 2006 Ranveer Chandra
ABSTRACT

This doctoral dissertation describes the design and applications of a new virtualization architecture for wireless network cards, called MultiNet. MultiNet virtualizes a single wireless card to appear as multiple virtual wireless cards to the user. Each virtual card can then be configured separately on a physically different network. The goal of MultiNet is to give the user the illusion of simultaneous connectivity on all the virtual cards although the network card is on a single network at any instant. MultiNet achieves
this transparency using intelligent buffering and switching algorithms. The switching
and buffering mechanisms are implemented as a kernel driver, while the policies are
implemented as a user-level service. The MultiNet system has been implemented over
Windows XP and has been operational for over two years. It is agnostic of the upper
layer protocols, and works well over popular IEEE 802.11 wireless LAN cards. Further,
MultiNet enables a new class of applications, which were earlier only possible with
multiple wireless cards in the device. This dissertation describes two such applications: SSCH and Client Conduit.
SSCH is a new channel hopping protocol that works over MultiNet, and utilizes frequency diversity to increase the capacity of IEEE 802.11 wireless networks. Each node
using SSCH switches across channels in such a manner that nodes desiring to communi-
cate overlap, while disjoint communications do not overlap, and hence do not interfere
with each other. To achieve this, SSCH uses a novel scheme for distributed rendezvous
and synchronization. Simulation results show that SSCH significantly increases network capacity.
Client Conduit is a fault diagnosis mechanism that takes advantage of the beaconing and probing mechanisms of IEEE 802.11 to ensure that connected clients do not pay unnecessary overheads while helping disconnected clients. Client Conduit has been implemented over Windows XP as part of an architecture for fault diagnosis in infrastructure wireless networks.
BIOGRAPHICAL SKETCH

Ranveer was born in Jamshedpur, an industrial town in Eastern India, on August 27, 1976, the third of four children. He lived in Jamshedpur for the first 18 years
of his life and decided to appear for the IIT exam after finishing high school. Ranveer
secured a good rank in the IIT qualifying exam and decided to go to IIT Kharagpur,
which was within 100 miles of Jamshedpur. IIT Kharagpur provided an ideal setting
for Ranveer to complete his undergraduate education in an environment that had good
professors, extraordinary peers, little distraction, and still a lot of fun. Ranveer majored in Computer Science and developed a keen interest in computer networking, which motivated him to study further. He applied to a few schools in the United States,
and decided to go to Cornell University in Ithaca, NY for his PhD in Computer Science.
Over the six years at Cornell University Ranveer worked with a number of people at
Cornell. He also spent three summers in Microsoft Research and one at AT&T Labs -
Research, and enjoyed working in industrial research labs. After completing his PhD,
Ranveer is headed towards the North-West, where he has accepted an offer from Microsoft Research.
ACKNOWLEDGEMENTS
First, I want to thank my advisor, Ken Birman, for his constant support and guidance
during my six years of PhD study at Cornell University. He kept me motivated and
provided the right direction that enabled me to finish these challenging years of work.
His sharp intellect and great comments were always the guiding feature in my PhD.
Further, his towering figure in the field of Computer Science has been and will always be an inspiration to me.
Secondly, I am grateful to Victor Bahl for bringing out in me what I really wanted
to do in research. Interactions with him during the three internships made me realize
the open problems in wireless networking, and what I needed to do to make an impact
in this field. Victor has also been a constant source of encouragement and unbridled enthusiasm.
I am also grateful to my other committee members, Eva Tardos, Zygmunt Haas and
Robbert VanRenesse, who have been supportive of my research in every step of my PhD.
Their comments have been very valuable in rewriting the final draft of this dissertation.
In particular, Atul Adya has been a great influence during my PhD. His views and ideas have
influenced the way I write, present and do my research. Lili Qiu has shown me how perseverance, patience and good work always pay off. Finally, John Dunagan has been
of great help in reviewing my work, and showing me the right direction. In addition I
would also like to thank Alec Wolman and Jitu Padhye for great research conversations.
I am also deeply grateful to my family and friends for keeping me motivated to finish my PhD. My parents have shown their belief
in me and supported me in every possible way. My sister and brother-in-law have always been with me through the troubled phases of my PhD. I would also like to thank Meenakshi, Biswanath, Ben, Rimon, Indranil and Rama for making my six years of stay at Cornell memorable.
TABLE OF CONTENTS
1 Introduction 1
1.1 Problems with Existing Wireless Networks . . . . . . . . . . . . . . . 1
1.2 Thesis and Its Contributions . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Limitations of this Dissertation . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Roadmap of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . 5
2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.9 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5 Conclusion 158
References 160
LIST OF TABLES
2.1 The Switching Delays between IS and AH networks for IEEE 802.11
cards with and without the optimization of trapping media connect and
disconnect messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 The average throughput in the ad hoc and infrastructure networks using
both strategies of MultiNet and two radios . . . . . . . . . . . . . . . 45
2.3 The average packet delay in infrastructure mode for the various strategies 46
2.4 The average packet delay in infrastructure mode on varying the number
of MultiNet connected networks . . . . . . . . . . . . . . . . . . . . . 50
4.1 Different fault diagnosis mechanisms and entities that can diagnose
them; the last column indicates if the solution can be supported using
legacy APs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2 Times for different operations: U means time measured from user-level
code; rest are times taken for the corresponding ioctl to complete . . . 140
LIST OF FIGURES
2.1 The MultiNet Layer maintains virtual interfaces for networks 1, 2 and
3, and switches the physical card across all these networks. It gives the
illusion of connectivity on all networks although the card is on network
2 at this instant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 The steps of Spoofed Buffering when a node uses MultiNet to connect
to two networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Two nodes in communication range and using MultiNet that fail to
overlap in the ad hoc network and hence experience a logical partitioning. 26
2.4 The Network Stack with MultiNet . . . . . . . . . . . . . . . . . . . . 29
2.5 Time taken to complete a 47 MB FTP transfer on an ad hoc and infras-
tructure network using different switching strategies . . . . . . . . . . 36
2.6 Variation of the activity period for two networks with time. The activity
period of a network is directly proportional to the relative traffic on it. . 37
2.7 TCP Performance with and without Spoofed Buffering. . . . . . . . . 39
2.8 Effect on UDP flows when a node uses Slotted Synchronization to join
an ad hoc network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.9 MultiNet in a Mobile Scenario . . . . . . . . . . . . . . . . . . . . . . 42
2.10 Packet trace for the web browsing application over the infrastructure
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.11 Packet trace for the presentation and chat workloads over the ad hoc
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.12 Comparison of total energy usage when using MultiNet versus two radios 47
2.13 Energy usage when using MultiNet and two radios with IEEE 802.11
Power Saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1 Only one of the three packets can be transmitted when all the nodes are
on the same channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Channel hopping schedules for two nodes with 3 channels and 2 slots.
Node A always overlaps with Node B in slot 1 and the parity slot. The
field of the channel schedule that determines the channel during each
slot is shown in bold. . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3 The problem with a naive synchronization scheme. Node A has two
slots, with (channel, seed) pairs represented by A1 and A2 ; nodes B
and C are similarly depicted. At time t1 , node A synchronizes with
node B. Node B synchronizes with node C at time t2 , after which A
and B are no longer synchronized. . . . . . . . . . . . . . . . . . . . . 72
3.4 Need for De-synchronization: All nodes converge to the same channel
without de-synchronization. . . . . . . . . . . . . . . . . . . . . . . . 74
3.5 Switching and Synchronizing Overhead: Node 1 starts a maximum rate
UDP flow to Node 2. We show the throughput for both SSCH and IEEE
802.11a. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.6 Overhead of an Absent Node: Node 1 is sending a maximum rate UDP
stream to Node 2. Node 1 then attempts to send a packet to a non-
existent node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.7 Overhead of a Parallel Session: Node 1 is sending a maximum rate
UDP stream to Node 2. Node 1 then starts a second stream to Node 3. . 83
3.8 Overhead of Mobility: Node 1 is sending a maximum rate UDP stream
to Node 2. Node 1 starts another maximum rate UDP session to Node
3. Node 3 moves out of range at 30 seconds, while Node 1 continues to
attempt to send until 43 seconds. . . . . . . . . . . . . . . . . . . . . . 84
3.9 Overhead of Clock Skew: Throughput between two nodes using SSCH
as a function of clock skew. . . . . . . . . . . . . . . . . . . . . . . . 85
3.10 Disjoint Flows: The throughput of each flow on increasing the number
of flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.11 Disjoint Flows: The system throughput on increasing the number of
flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.12 Non-disjoint Flows: The average throughput of each flow on increasing
the number of flows. There is a flow from every node in the network. . 88
3.13 Non-disjoint Flows: The system throughput on increasing the number
of flows. There is a flow from every node in the network. . . . . . . . . 89
3.14 Effect of Flow Duration: Ratio of SSCH average throughput to IEEE
802.11a average throughput for flows having different durations. . . . . 90
3.15 TCP over SSCH: Steady-state TCP throughput when varying the num-
ber of non-disjoint flows. . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.16 Multihop Chain Network: Variation in throughput as chain length in-
creases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.17 Multihop Mesh Network of 100 Nodes: Average flow throughput on
varying the number of flows in the network. . . . . . . . . . . . . . . . 94
3.18 Impact of SSCH on Unmodified MANET Routing Protocols: The av-
erage time to discover a route and the average route length for 10 ran-
domly chosen routes in a 100 node network using DSR over SSCH. . . 95
3.19 Dense Multihop Mobile Network: The per-flow throughput and the av-
erage route length for 10 flows in a 100 node network in a 200m×200m
area, using DSR over both SSCH and IEEE 802.11a. . . . . . . . . . . 97
3.20 Sparse Multihop Mobile Network: The per-flow throughput and the
average route length for 10 flows in a 100 node network in a 300m ×
300m area, using DSR over both SSCH and IEEE 802.11a. . . . . . . 98
4.6 CPU usage in Promiscuous mode (1 GHz machine) . . . . . . . . . . . 141
4.7 Breakdown of costs for Client Conduit. The protocol steps are executed
from the bottom entry in the legend to the topmost, i.e., starting at “Set
channel”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.8 Time taken by a disconnected client to transfer data via MultiNet . . . . 144
4.9 Median error in locating disconnected clients. The lower and upper
bounds of error bars correspond to min and max error. E(i) denotes
that the ith connected client’s location contains error. . . . . . . . . . 145
4.10 EDEN’s accuracy of estimating the delay at a client . . . . . . . . . . . 148
4.11 Breakdown of delay at the client, AP, and the medium as estimated by
EDEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.12 Overlapping channels on which an AP is overheard . . . . . . . . . . . 151
4.13 Overlapping channels heard relative to distance . . . . . . . . . . . . . 152
4.14 The maximum idle time duration available during every 5-minute pe-
riod at different times of the day . . . . . . . . . . . . . . . . . . . . . 152
CHAPTER 1
INTRODUCTION
There has been a recent interest in using multiple wireless cards in a device [9,64,87,95,
115, 119]. This dissertation provides a cheaper and more energy-efficient scheme to get
the functionality of multiple wireless cards while using only a single physical network
interface. This approach is called MultiNet, which is a new architecture for virtualizing
wireless cards. MultiNet is very useful in solving some of the key problems in wireless networks, described below.

1.1 Problems with Existing Wireless Networks

Wireless technology has an increasing presence in our lives, from cellular phones, wireless
LANs, Bluetooth headphones, cordless phones, location systems, to smart homes, and
many more. This trend will grow with an increasing deployment of sensor networks [61,
88], mesh networks [50, 93], and the recent WiMAX initiative [63, 133]. Although they
are increasingly common, wireless networks are still relatively fragile and underutilized.
• Manageability: Wireless networks incur high manageability costs [5, 7, 39, 103, 131]. The state of the art will be significantly enhanced by a system that automatically diagnoses problems with minimum human intervention and informs the user of ways to recover from them.
• Capacity: Wireless capacity is still a bottleneck for many applications [40, 65, 95]. Any scheme that increases wireless capacity, through advanced antennas [34, 42] or smarter protocols [16, 114], will greatly impact the wireless performance of a number of applications.
• Power: Limited battery power is the Achilles heel for wireless applications [72].
Applications and protocols for mobile computing should prolong battery life by using schemes such as maximizing the sleep durations of wireless cards [71].
1.2 Thesis and Its Contributions

This doctoral dissertation contributes towards solving these problems for IEEE 802.11 networks.
MultiNet virtualizes a single wireless card to make it appear as multiple wireless cards
to the user. The user can configure each virtual card separately to be on a physically
different network. For example, when using an IEEE 802.11 card the user can connect
one virtual card on an infrastructure network, and the other virtual card on an ad hoc
network, although the network card is on a single physical network at any instant. The user nevertheless gets the illusion of simultaneous connectivity on both wireless networks. MultiNet achieves this transparency using intelligent buffering and
switching algorithms. MultiNet has been implemented over Windows XP and is avail-
able for download. In addition to describing this architecture, this thesis also explores
three ways in which MultiNet alleviates the above problems of wireless networks.
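The virtualization idea described above can be sketched as a small multiplexer: one physical card, several virtual interfaces, with traffic for the inactive interfaces buffered until the card switches over. This is only an illustrative model of the mechanism; the class and method names are invented for this sketch and are not MultiNet's actual interfaces.

```python
from collections import deque

class VirtualInterface:
    """One virtual card: per-network state plus a queue of packets
    that must wait while the physical card is on another network."""
    def __init__(self, ssid):
        self.ssid = ssid
        self.pending = deque()

class MultiNetCard:
    """Multiplexes one physical card across several virtual interfaces.
    Only the active interface transmits; sends on inactive interfaces
    are buffered and drained on the next switch."""
    def __init__(self):
        self.interfaces = {}
        self.active = None

    def add_network(self, ssid):
        self.interfaces[ssid] = VirtualInterface(ssid)
        if self.active is None:
            self.active = ssid

    def send(self, ssid, packet):
        if ssid == self.active:
            return ("tx", ssid, packet)            # goes out immediately
        self.interfaces[ssid].pending.append(packet)
        return ("buffered", ssid, packet)          # held until we switch

    def switch_to(self, ssid):
        """Re-associate the physical card, then flush buffered packets."""
        self.active = ssid
        drained = list(self.interfaces[ssid].pending)
        self.interfaces[ssid].pending.clear()
        return drained
```

A node connected to an infrastructure and an ad hoc network would call `send` on either virtual interface; only the active one transmits at that instant, matching the single-network constraint in the text.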
Firstly, MultiNet consumes less energy than the alternative of using multiple wireless cards; this matters most for small devices, where the fixed energy cost of multiple physical interfaces is not feasible. MultiNet also enables a new power saving mechanism by allowing nodes to function as relays using only one wireless card: nodes with low battery power can send their traffic to the Access Point through such a relay.
Secondly, MultiNet facilitates a way to increase the capacity of wireless ad hoc networks, which are known to scale poorly with the number of communicating nodes [67]. When multiple neighboring node pairs want to communicate using IEEE 802.11, only one pair can be active at a time. However, other nodes can talk simultaneously if they are on orthogonal frequency channels, since traffic on orthogonal channels does not interfere. But this breaks connectivity: nodes on different channels cannot communicate. MultiNet helps to solve this problem. This dissertation proposes
a new scheduling algorithm, called Slotted Seeded Channel Hopping (SSCH), which
works with MultiNet to improve network capacity. The goal of SSCH is to have com-
municating nodes on the same channel and other nodes on randomly different channels
at any instant, while ensuring that any two neighboring nodes overlap within a fixed
period. SSCH achieves this goal by introducing the technique of partial synchroniza-
tion and also makes use of existing techniques such as pseudo-random generators. It is
shown mathematically that SSCH has the desired synchronization properties. Using simulations, SSCH is shown to significantly increase the capacity of IEEE 802.11.
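The rendezvous idea behind SSCH, developed fully in Chapter 3, can be illustrated with a per-slot channel update rule. The sketch below assumes 13 orthogonal channels and the update channel ← (channel + seed) mod 13; the constant and function names are illustrative, not the protocol's actual definitions. Two nodes that agree on a (channel, seed) pair compute identical channels in that slot forever, while nodes with different seeds still coincide periodically because the channel count is prime.

```python
NUM_CHANNELS = 13  # illustrative number of orthogonal channels

def next_channel(channel, seed):
    """Per-slot update: channel <- (channel + seed) mod NUM_CHANNELS.
    Nodes sharing a (channel, seed) pair keep computing the same
    channel, so they rendezvous in that slot indefinitely."""
    return (channel + seed) % NUM_CHANNELS

def schedule(channel, seed, slots):
    """Channels visited by one (channel, seed) slot over `slots` rounds."""
    visited = []
    for _ in range(slots):
        visited.append(channel)
        channel = next_channel(channel, seed)
    return visited

# Since NUM_CHANNELS is prime, two slots with different seeds land on
# the same channel exactly once every NUM_CHANNELS iterations, which
# bounds the time until any two neighbors overlap.
```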
Thirdly, this dissertation describes a fault diagnosis mechanism for disconnected machines, called Client Conduit, which is used to diagnose faults in infrastructure wireless networks. The growing deployment of wireless networks [37] entails a pressing need for wireless network management tools,
similar to wired networks [56, 94]. Network administrators want to know why users are
suffering from poor performance and frequent disconnections. They are interested in lo-
cating security breaches, for example an unauthorized (rogue) access point plugged into
an enterprise’s Ethernet jack that jeopardizes its resources. In our architecture, Client
Conduit allows disconnected clients to transfer diagnostic messages to and from a back-
end server. It is implemented using MultiNet, since it allows connected clients to stay on
the infrastructure network using one virtual interface, and form an ad hoc network with
the disconnected client on another virtual interface. This thesis presents a lightweight
mechanism to implement Client Conduit, where virtual interfaces are added dynami-
cally and a connected client suffers no penalty in the common case. It also proposes
algorithms to detect rogue access points, locate disconnected clients, and diagnose poor
wireless performance. This architecture has been prototyped over Windows XP using MultiNet.
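The helper side of Client Conduit, as described above, can be sketched as follows: a connected client that overhears a distress request dynamically adds a second virtual (ad hoc) interface and ferries the diagnostic payload to a back-end server, so the common case pays nothing. All names here are invented for illustration; the real protocol (Chapter 4) works through IEEE 802.11 beacons and probes.

```python
class DiagnosisServer:
    """Stand-in for the back-end diagnosis server."""
    def __init__(self):
        self.reports = {}

    def upload(self, client_id, diagnostics):
        self.reports[client_id] = diagnostics
        return client_id

class ConnectedClient:
    """A healthy client: normally it keeps only its infrastructure
    interface, and adds an ad hoc virtual interface (via MultiNet)
    only when a disconnected client asks for help."""
    def __init__(self, server):
        self.server = server
        self.virtual_ifaces = {"infra"}

    def on_scan(self, beacons):
        helped = []
        for b in beacons:
            if b.get("distress"):
                # spin up the conduit interface on demand
                self.virtual_ifaces.add("adhoc-conduit")
                helped.append(self.server.upload(b["client_id"],
                                                 b["diagnostics"]))
        return helped
```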
1.3 Limitations of this Dissertation

Although MultiNet has been implemented over Windows XP, it has not been tested in all cases and in large deployments. Consequently, simulation results were used to show the feasibility of MultiNet. Further, the inability of available hardware to quickly switch across frequency channels limited all results on SSCH to simulations in QualNet [62]. The simulation study of SSCH was done to show that SSCH will significantly improve the capacity of wireless networks when the required hardware is available. MultiNet, SSCH and our fault diagnosis architecture are described in Chapters 2, 3 and 4 respectively.
1.4 Roadmap of this Dissertation

Chapter 2 describes the MultiNet architecture in detail. It also shows that MultiNet
consumes less energy than an alternative approach of using multiple wireless cards.
Chapter 3 describes the SSCH protocol and its properties, and analyzes the performance
of SSCH. Chapter 4 then presents our fault diagnosis architecture, and describes and
evaluates the design of Client Conduit. Finally, Chapter 5 concludes this dissertation.
Most of the contents of Chapters 2, 3 and 4 are adapted from previously written independent papers, in particular [30], [16] and [3] respectively. The contributions of the coauthors of each of these papers are listed in the last paragraph of each chapter.
CHAPTER 2
2.1 Introduction
Systems research over the last two decades has revealed a number of benefits of virtual-
izing different systems components, such as virtual machines [20, 49, 126, 130], virtual
storage [55, 81] and virtual memory [23] among others. However, the benefits of vir-
tualizing a wireless card have not been explored. This chapter describes MultiNet, a
new virtualization architecture that abstracts a single wireless LAN (WLAN) [60] card as multiple virtual cards. Virtualizing a wireless card has several benefits: increased connectivity for end users; increased range of the wireless network; bridging between infrastructure and ad hoc wireless networks; and painless secure access to sensitive resources. We
discuss these in detail in Section 2.2. To explore this problem space with current tech-
nology, one would have to use a single WLAN card for each desired network [64, 115].
Doing so is costly, cumbersome, and consumes energy resources that are often limited.
Virtualizing a wireless card poses several research challenges. Firstly, a virtual wire-
less card should appear as a real (physical) wireless card to the user. Secondly, the user
should get an illusion of simultaneous connectivity on all virtual cards, although the
physical wireless card can only be on one network at any instant [58]. Thirdly, the
system should be deployable and compatible with nodes not using virtualization. More-
over, the virtualization software should not require modifications to existing backbone infrastructure, such as APs and routers.
MultiNet solves the above problems by creating a new virtual interface for each network to which connectivity is desired. The virtual interface exports itself as a new physical device to the network layer. It also maintains the state of the physical card required
for connecting to the wireless network corresponding to this virtual interface. Multi-
Net achieves the illusion of simultaneous connectivity over all networks by switching
the physical network card across the desired networks and activating the correspond-
ing virtual interface. Further, MultiNet is deployable as it does not require changes to
APs and routers. This is achieved by a new protocol called Spoofed Buffering, which
leverages the Power Save Mode of the IEEE 802.11 [58] standard, and is described in
Section 2.5.4.
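The essence of Spoofed Buffering can be sketched in a few lines: before switching away, the node tells the AP it is entering Power Save Mode even though it is really leaving for another network, so the unmodified AP buffers its frames; on returning, the node clears the PSM flag and drains the buffer. This is a toy model for intuition, not the driver implementation.

```python
class AccessPoint:
    """Standard PSM behaviour at an unmodified AP: buffer frames for
    stations that announced sleep, release them when they return."""
    def __init__(self):
        self.asleep = set()
        self.buffered = {}

    def set_psm(self, station, sleeping):
        if sleeping:
            self.asleep.add(station)
        else:
            self.asleep.discard(station)

    def deliver(self, station, frame):
        if station in self.asleep:
            self.buffered.setdefault(station, []).append(frame)
            return None                    # held at the AP
        return frame                       # delivered immediately

    def drain(self, station):
        return self.buffered.pop(station, [])

def switch_away(ap, station):
    """Spoofed Buffering: claim to sleep (PSM) before leaving the network."""
    ap.set_psm(station, True)

def switch_back(ap, station):
    """On return, wake up and collect everything the AP buffered."""
    ap.set_psm(station, False)
    return ap.drain(station)
```

The point of the spoof is that the AP runs completely standard PSM logic; only the client's reason for "sleeping" differs.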
This chapter makes the following contributions:
• It presents the design of MultiNet, a virtualization architecture for IEEE 802.11 WLAN cards. As part of the design it describes the state that needs to be stored for
every virtual wireless card. It also describes in detail the implementation of Multi-
Net over Windows XP. The implementation works with modest modifications to
the Operating System kernel, and without any modifications to the wireless card
drivers.
• It proposes a new protocol, called Spoofed Buffering, which delivers packets sent
to a node using MultiNet when it is on another network. APs buffer packets for
the nodes that have switched to another network, and deliver them when nodes
switch back to their network. Spoofed Buffering achieves this functionality without requiring any changes to APs. This protocol has also been used in recent related work.
• It evaluates the performance of MultiNet over commercial IEEE 802.11 wireless cards, and shows that MultiNet is suitable for most applications. It also describes the energy consumption of MultiNet, and shows that MultiNet consumes less battery power than the alternative approach of using multiple wireless cards.
As of this writing, MultiNet has been operational for over two years. During this
time, we have refined the protocols and analyzed them in greater detail. Many of the
results we present in this chapter are based on real working systems that include current
and next generation IEEE 802.11 wireless cards. For cases where it is not possible to
study the property of the system without large scale deployment and optimized hard-
ware, we carry out simulation based studies. Most of our simulations are driven by
traffic traces that represent ‘typical traffic’. For IEEE 802.11, our study shows that
MultiNet nodes can save up to 50% of the energy consumed by nodes with two cards, while providing similar functionality. We also quantify the delay versus energy tradeoff of MultiNet.
The rest of this chapter is organized as follows. Section 2.2 presents some scenarios
and applications that motivate the need for MultiNet and for which MultiNet is currently
being used. Section 2.3 presents some related research and Section 2.4 provides the
background needed for the rest of the chapter. The MultiNet architecture is presented
in Section 2.5, and its implementation is described in Section 2.6. Performance and
feasibility are discussed in Sections 2.7 and 2.8. Future work is presented in Section 2.9, and Section 2.10 summarizes the chapter.

2.2 Scenarios and Applications

MultiNet enables several new applications that were earlier not possible using a single wireless card:
• Extended Range: Border nodes use MultiNet to function as relays for authorized nodes that are outside the range of the Access Point (AP). We implemented this functionality over MultiNet.
• Gateway Node: A node that is part of a wireless ad hoc network and close to
an AP, connected to the Internet, can use MultiNet to stay connected on both
networks, and become a gateway node for the ad hoc network [26].
• Secure Access: A privileged user, who has permission to access different networks, can use MultiNet to connect to all of them simultaneously.
• Increased Capacity: MultiNet can split a wireless card into as many instances as the number of orthogonal channels, and simultaneously operate on all of them to increase capacity.
• Virtual Machines: Existing Virtual Machine architectures (for example, [28, 126,
130]) restrict all virtual machine instances to stay connected on the same wireless network; MultiNet removes this restriction.
• Seamless Handoff: A node using MultiNet can connect to an AP without disconnecting from its previous one. This technique has been used to reduce handoff delays.
All the above scenarios require nodes to stay connected on more than one wireless network, and MultiNet achieves this functionality with only one wireless card.
2.3 Related Work

Virtualization has been studied extensively for abstracting a single system resource as
multiple available resources to the user. For example, Virtual Machine architectures,
such as VMWare [126], Denali [130], Xen [20], Terra [49], etc., virtualize a single com-
puter to give an illusion of many smaller virtual machines, each running its own oper-
ating system. Storage Virtualization systems, such as Facade [81] and Stonehenge [55],
virtualize a storage device into multiple logical storage devices. Similarly, Virtual Mem-
ory [23,41] presents an illusion of larger memory to user programs than is actually avail-
able. MultiNet is similar to the above systems in abstracting a single resource, in this
case a wireless card, as multiple wireless cards to the user. However, to the best of our knowledge, MultiNet is the first system to virtualize a wireless card.
Prior work has looked at virtualizing the wired network interface on a machine. The
Virtual Machine architectures discussed above [20, 28, 49, 126, 130] virtualize all hard-
ware resources, including the network interface [120]. Other systems for low latency
communication, such as U-Net [128] and VIA [29, 38], virtualize the network interface
to multiple local virtual interfaces, one for each process. The physical network interface
is multiplexed across the virtual interfaces to send packets sent by a process. Network
Cloning [138] brings up multiple network stacks for a single physical interface. Similar
to these systems, MultiNet abstracts the wireless interface as multiple virtual interfaces,
and multiplexes the physical card across the virtual instances. However, it faces different
challenges that do not arise in the case of wired networks. Firstly, each virtual wireless interface is on a physically different network. In contrast to the above systems, only one virtual instance is physically on the network at
any time. Secondly, switching to a different network takes a few hundred milliseconds,
as we show in Section 2.7. So, the approach used by the above systems, where packets
from different virtual interfaces are serviced by the wired interface in the order in which
they arrive, might incur a network switch overhead for every packet. This scheme may
not be suitable for virtualizing wireless cards. MultiNet uses different switching and buffering strategies, described in Section 2.5.
Another set of related work looked at smart channel hopping schemes over a single
wireless radio [66, 89, 114]. The idea is to distribute interfering traffic on different fre-
quency channels to increase the capacity of wireless networks. MultiNet differs from
these systems in two aspects. Firstly, MultiNet has to switch across multiple networks
instead of channels, and consequently MultiNet has to store more state for each network.
Secondly, all the above protocols have only been evaluated in simulation; we are not aware of any implementation over commodity wireless cards.
As part of MultiNet’s design goals, which we will describe in Section 2.5.1, any two
neighboring nodes in an ad hoc network should overlap on the same frequency channel
within a definite period. Our solution to this problem, described in Section 2.5.6, relies on the clock synchronization mechanism of IEEE 802.11 [58]. This algorithm and its variants [54, 74, 110] are based on an algorithm proposed by Lamport [75], which shows that given the clock accuracy, link delay and network diameter, and assuming that a timestamp is sent successfully along each link periodically, the clocks of all nodes can be synchronized to within an established bound. A previous work [54] has shown that these algorithms
work reasonably well when there are no Byzantine failures [76] in the network. For our
algorithms to work with such failures, we would need clock synchronization algorithms
with stronger guarantees [116, 125]. However, handling these failures is out of scope of
this dissertation.
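The flavour of these synchronization algorithms can be conveyed with the simple adoption rule used in IEEE 802.11 ad hoc (IBSS) timer synchronization: a station that hears a beacon carrying a timestamp ahead of its own timer adopts the faster value, so all timers converge toward the fastest clock. The one-round sketch below ignores propagation delay and clock drift.

```python
def tsf_step(timers, beacons_heard):
    """One round of IBSS-style timer synchronization.
    timers: dict mapping node -> timer value (e.g., microseconds).
    beacons_heard: (sender, receiver) pairs heard this round.
    A receiver adopts the sender's timestamp only if it is ahead,
    so timers never move backwards."""
    for sender, receiver in beacons_heard:
        if timers[sender] > timers[receiver]:
            timers[receiver] = timers[sender]
    return timers
```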
The problem of staying connected to multiple wireless networks has not been studied before in the context of wireless LANs. A related
problem was considered for scatternet formation in Bluetooth [92] networks [77, 78].
Bluetooth networks comprise basic units, called piconets, that can have at most 7 nodes.
Piconets are used to form bigger networks, called scatternets, by having some nodes on multiple piconets. However, the problem of having nodes listen to multiple piconets is significantly different from the problem of allowing nodes to connect to multiple IEEE 802.11 networks, since Bluetooth uses a frequency hopping scheme for communication between multiple nodes on the network. A node can be
on two networks simultaneously if it knows the correct hopping sequence of the two
networks and hops fast enough. IEEE 802.11 networks, on the other hand, have no such hopping scheme. An alternative to virtualization is to use multiple radios, and this approach has been commonly used in commercial products [64, 115, 119] and
wireless networking research [9, 87, 95]. However, as we show in Section 2.7, using
multiple radios consumes more power, which is a scarce resource in battery operated
devices. Further, a recent result shows that the performance of multi-radio systems is
significantly degraded by the self interference among the radios on the device [106]. In
Section 2.7.8, we show that MultiNet solves these problems of multi-radio systems at a much lower cost.
2.4 Background
This section first discusses the limitations of IEEE 802.11 and describes why maintaining connectivity to multiple networks is difficult. It then briefly describes the Power Save Mode (PSM) [58] feature of IEEE 802.11, which is used in the Spoofed Buffering Protocol described in Section 2.5.4. Finally, it discusses the next generation of wireless cards, called Native WiFi cards.
Popular wireless networks, such as IEEE 802.11, work by association. Once associated with a network, the
wireless card can receive and send traffic only on that network. The card cannot inter-
act with nodes in another network if the nodes are operating on a different frequency
channel. Further, a node in an ad hoc network cannot interact with a node in the infras-
tructure network even when they are operating on the same channel. This is because the
IEEE 802.11 standard defines different protocols for communication in the two modes
and it does not address the difficult issue of synchronization between different networks. Moreover, most existing wireless cards undergo a firmware reset each time the mode is changed from infrastructure to ad hoc or vice versa.
The IEEE 802.11 standard defines Power Save Mode (PSM) for infrastructure wireless
networks as a means to save battery power. When a node wants to use PSM, it sends a
message to the AP and sets its wireless interface to sleep mode. The message to the AP
also contains the duration for which the node wants to sleep. This duration is called the
Listen Interval. When the AP receives a packet destined for the sleeping node, it buffers
the packet. After a Listen Interval period, the node using PSM wakes up, and receives
the packets buffered at the AP. Usually, the Listen Interval is set to be a multiple of
the Beacon Period, where the Beacon Period is the interval at which an AP broadcasts
its beacon. The Beacon Period is a parameter of the AP, while the Listen Interval is a parameter chosen by the client.
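The timing relationship between the two parameters can be made concrete with a tiny helper: if the Listen Interval is k Beacon Periods, a PSM station wakes at every k-th beacon to pick up buffered frames. The numbers used below are illustrative, not values from the standard.

```python
def wake_times(beacon_period_ms, listen_multiple, horizon_ms):
    """Instants (in ms) at which a PSM station wakes to drain the AP's
    buffer, given Listen Interval = listen_multiple beacon periods."""
    interval_ms = listen_multiple * beacon_period_ms
    return list(range(interval_ms, horizon_ms + 1, interval_ms))
```

For example, with a 100 ms Beacon Period and a Listen Interval of three beacon periods, the station wakes every 300 ms; longer intervals save more energy but increase the delay of buffered packets, the trade-off quantified later in this chapter.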
In order to reduce the cost and commoditize wireless cards, IEEE 802.11 WLAN card
vendors [11, 102] are minimizing the functionality of the code residing in the micro-
controller of their cards. This next generation of wireless cards, which we refer to as
Native WiFi cards, implement just the basic time-critical MAC functions, while leaving
their control and configuration to the operating system. More importantly, these cards
allow the operating system to maintain state and do not undergo a firmware reset on
changing the mode of the wireless card. This is in contrast to the existing cards, which undergo a firmware reset whenever the mode is changed.
2.5 MultiNet
This section first formulates the MultiNet problem and enumerates its design goals. It then presents the MultiNet architecture and its protocols. The design makes the following assumptions:
• Each node maintains a timer and uses a distributed Timer Synchronization Function (TSF) [58] to synchronize the timers at all nodes in a network to within 224 µs [60]. TSF, or its modifications ATSP [54, 74] or ASCP [110], can be used to achieve the required synchronization.
• APs implement Power Save Mode (PSM), and have enough buffer space to sup-
port all nodes using PSM in the network. This feature is defined in the IEEE
802.11 standard [58], and is implemented in some existing WLAN products [35,
121, 122].
• The Switching Delay of a wireless card is small. This comprises the time to switch to another channel and associate to the network. As we discuss in Section 2.7, this overhead is a few hundred milliseconds for most commercial wireless cards. MultiNet will give better performance when this delay is small.
• Applications running over MultiNet are resilient to small network disconnections and delays. Some sample applications supported by MultiNet are browsing, file transfers and web downloads. The reason why other applications are not supported is discussed later in this section.
• The device driver of a wireless card sends a disconnect message to the network
layer when it disconnects from a network, and a connect message when it success-
fully connects to one. On modern operating systems, such as Linux and Windows
XP, these messages are passed up to the user level and are used to display the
current status of the physical interface. In Windows XP, the device driver sends media disconnect and media connect messages to indicate disconnection and connection respectively. In Linux, the device driver calls netif_carrier_off and netif_carrier_on for the same purpose.
• A user knows if MultiNet is being used by more than one machine in an ad hoc network.
MultiNet virtualizes a single wireless card into multiple virtual interfaces, where each virtual interface corresponds to a physically different wireless network. Further, MultiNet also strives to achieve the following design goals when virtualizing a wireless card:
• Transparency: The user should be able to connect different virtual cards to different wireless networks, although the physical card is only on one network at any instant. The architecture should ensure that packets sent to and from a virtual interface are not discarded if the physical wireless card is not on the corresponding network at that instant. Further, when a machine is mobile, the virtual interface should appear disconnected when the machine moves out of range of the network. However, it should appear connected whenever the physical card is able to connect to its network.
The user should also be able to prioritize different virtual interfaces, so that packets on a more important network are sent and received with less delay.
• Interoperability: MultiNet should not require modifications to any other node in the network. It should work over the commonly used IEEE 802.11 standard, and with commercial wireless cards. Further, it should not require changes to the wireless card drivers or the network infrastructure. Nearly all of the modifications should be confined to the node using MultiNet.
In addition to the above design goals, there are a few plausible goals that Multi-
Net does not achieve. Firstly, it does not aim to support real-time applications over the
network, such as Voice over IP (VoIP) [127] or streaming video. This constraint arises
from the few hundred milliseconds overhead when switching from one network to an-
other. Unless this overhead is reduced, MultiNet will be unable to provide response
time guarantees of less than a few hundred milliseconds on all networks. Secondly,
MultiNet does not handle Byzantine failures in the network. Handling these failures
would require changes to our buffering and synchronization protocols described in Sec-
tions 2.5.4 and 2.5.6 respectively, and is outside the scope of this dissertation. Thirdly, we defer
the discussion of using MultiNet in multi-hop ad hoc networks to Chapter 3. In the rest
of this chapter, we limit our discussion to using MultiNet in single hop ad hoc networks,
where all nodes are in communication range of each other, and in infrastructure wireless
networks. Finally, the current implementation of MultiNet allows a node to stay connected on only one ad hoc network in which multiple nodes use MultiNet. Enabling a node to use MultiNet for maintaining connections to more than one such ad hoc network is left to future work.
MultiNet achieves the above design goals by introducing functionality in a new layer,
between the network and physical layers of the network stack, as shown in Figure 2.1.
This layer, called the MultiNet Layer, initializes and maintains a new virtual network
interface for every new network on which the user wants to stay connected. The IEEE
802.11 parameters [58] of the physical wireless card are duplicated at each virtual interface. So, each virtual interface has its own Service Set Identifier (SSID) and operational mode.
All virtual interfaces appear as connected to the network layer, even though the
physical card is connected to only one wireless network at any instant. This is shown in
Figure 2.1 where IP sees virtual interfaces 1, 2 and 3 as connected to networks 1, 2 and
3 respectively, although the physical card is connected to Network 2. Since all virtual
interfaces appear as connected, the user might send packets on any of them. Packets
sent to a virtual interface, when the physical card is not on its corresponding wireless
network, are buffered in a packet buffer maintained at each virtual interface. Packets are
sent over the network without any delay if the physical card is on the network.
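The per-interface buffering just described can be sketched in a few lines; the class, its fields, and the callback name are assumptions of this sketch, not the actual driver structures.

```python
from collections import deque

class VirtualInterface:
    """Sketch of a MultiNet virtual interface with its packet buffer."""

    def __init__(self, ssid):
        self.ssid = ssid
        self.buffer = deque()  # packets held while the card is elsewhere
        self.sent = []         # packets handed to the physical card

    def send(self, packet, active_ssid):
        # Transmit immediately if the physical card is on this network;
        # otherwise queue the packet in this interface's buffer.
        if active_ssid == self.ssid:
            self.sent.append(packet)
        else:
            self.buffer.append(packet)

    def on_switched_in(self):
        # Called when the physical card connects to this network:
        # flush everything buffered while the card was away.
        while self.buffer:
            self.sent.append(self.buffer.popleft())
```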
This illusion of simultaneous connectivity is achieved by time-multiplexing the physical wireless card across all virtual interfaces. The physical card stays
connected on a network long enough to send and receive one or more packets on the cor-
responding virtual interface. The MultiNet Layer then switches the physical card to a
network corresponding to another virtual interface. The information about the network
is retrieved from the state stored in the virtual interface. After switching the physi-
cal card to another network, MultiNet waits for a media connect message from the
lower layers. This message is sent only if the physical card successfully switches to
another network. On receiving this message, MultiNet sends the packets buffered on
(Figure: the network stack, with applications at user level and the Transport (TCP, UDP), IP, and MultiNet layers in the kernel.)
Figure 2.1: The MultiNet Layer maintains virtual interfaces for networks 1, 2 and 3, and switches the physical card across all these networks. It gives the illusion of connectivity on all three networks.
the virtual interface, and stays connected to this network for some time. This cycle then repeats over the other virtual interfaces.
Before describing the architecture further, we briefly define some terms we use in
the rest of this chapter. The period of time for which a card stays on a network after
successfully connecting to it is called the Activity Period for the network. The time to
switch to another network, from the time switching is initiated to the time the card is
associated to the wireless network, is called the Switching Delay for the network. The
Activity Period is the useful time when a card sends and receives packets, while the
Switching Delay is an overhead when the card is not on any network. The performance
of MultiNet is better when the Switching Delays are small. The sum of the Activity
Periods and Switching Delays over all connected networks is called the Switching Cycle.
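These definitions can be captured directly (a trivial sketch; the function name is illustrative):

```python
def switching_cycle(activity_periods_ms, switching_delays_ms):
    """Switching Cycle = sum of the Activity Periods and Switching
    Delays over all connected networks (all values in milliseconds)."""
    return sum(activity_periods_ms) + sum(switching_delays_ms)
```

For example, the evaluation in Section 2.7.3 uses a total Activity Period of 1 second with Switching Delays of 500 ms and 300 ms, giving a 1.8 second Switching Cycle.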
Switching from one network to another requires the physical card to disconnect from
one network and connect to the other. Correspondingly, as described in Section 2.5.1,
the physical layer sends disconnect and connect messages to the upper layers. These
messages change the connectivity status of the virtual network interface, and as a result
only one virtual interface appears as connected at any time. This is a problem for Multi-
Net since the operating system drops packets sent on a disconnected network interface.
MultiNet solves this problem by trapping the disconnect message sent by the physi-
cal layer immediately after a disconnection. This message is received at the MultiNet
Layer and is prevented from going up the network stack. Consequently, layers above
the MultiNet Layer see all the virtual interfaces as connected, although the physical card is on only one network at any instant.
MultiNet also manages the state of a virtual interface when a network disconnection
is caused by factors such as mobility or weak signal strength. The virtual interface is
made to appear disconnected when the physical card is unable to connect to its network,
and is made to appear connected when the physical card regains connectivity to the
network. MultiNet achieves this functionality by not trapping the disconnect message
when it is caused by any source other than MultiNet. As a result the virtual card appears
disconnected whenever the physical wireless card is unable to connect to its network.
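This selective trapping can be modeled as follows, assuming a flag that marks MultiNet-initiated switches; the class and method names are illustrative, not the driver's own.

```python
class MultiNetLayer:
    """Sketch of selective trapping of media status messages."""

    def __init__(self):
        self.switching = False  # True while MultiNet moves the card itself
        self.delivered = []     # messages passed up to the network layer

    def begin_switch(self):
        self.switching = True

    def end_switch(self):
        self.switching = False

    def on_media_status(self, message):
        # Trap connect/disconnect messages caused by MultiNet's own
        # switching; messages from any other source (e.g. a real loss
        # of connectivity due to mobility) are passed up unchanged.
        if self.switching and message in ("media disconnect", "media connect"):
            return
        self.delivered.append(message)
```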
Further, MultiNet attempts to connect to all networks in its Switching Cycle, even if its
previous attempt to connect was unsuccessful. When the physical wireless card success-
fully connects to a network, the connect message is passed up the network stack, and the corresponding virtual interface appears connected again.
This design of MultiNet poses two interesting questions. Firstly, how are packets
delivered to a virtual interface if the card has switched to another network? Secondly,
how long should the card stay on a network? We first answer these questions for the
scenario when only one machine in any ad hoc network uses MultiNet. We then develop
our approach to handle the case when MultiNet is used by more than one node in an
ad hoc network. An important question we defer to future work, in Section 2.9, is the
In this section, we present a buffering protocol that prevents packets sent to a virtual
interface from being discarded when the physical card is not on the corresponding net-
work. As part of the protocol, we describe a new approach that allows MultiNet to work
The buffering protocol works differently for ad hoc and infrastructure networks. For
ad hoc networks, just before switching out of the network, a node broadcasts a packet
that informs all other nodes in the network of its unavailability and when it will be back
in the network. On switching back to the ad hoc network, the node broadcasts another
packet announcing its availability. Packets destined for this node are buffered by other
nodes in the ad hoc network, until either of the following two conditions hold: the
broadcast announcing availability of the node is received, or the time by which the node
was expected to be back in the network has elapsed. If the node is available, then the
buffered packets are sent to it. Otherwise, if the timer has elapsed, then the buffered
packets are discarded. This protocol requires modifications at all nodes in the ad hoc
network, even if they do not use MultiNet to connect to multiple networks. This should be acceptable in practice, since the change is a small software modification at each node.
MultiNet could use a similar protocol for infrastructure networks. However, APs
would need to be modified to buffer packets destined for nodes using MultiNet on its
network. This conflicts with the interoperability goal described in Section 2.5.2. MultiNet solves this problem by proposing a new protocol, called Spoofed
Buffering. Spoofed Buffering buffers packets at the APs without requiring modifications
to them.
Spoofed Buffering works as follows. MultiNet spoofs sleep mode to the AP just
before switching out of an infrastructure network. It sends a special IEEE 802.11 packet
to the AP, which informs the AP that it is using IEEE 802.11 PSM to go to sleep mode,
and the time for which it will sleep. While the AP believes the node is sleeping,
MultiNet switches the physical card to another network. As described in Section 2.4.2,
PSM requires APs to buffer packets for nodes that are sleeping in its network, and to
send the buffered packets when the nodes wake up. So, packets destined for the MultiNet node are buffered at the AP until the node switches back to the infrastructure network.
The node then informs the AP that it is awake by sending another IEEE 802.11 packet.
On receiving this packet, the AP sends all the buffered packets, which are received by the node and passed up to the corresponding virtual interface.
Figure 2.2 illustrates the steps of Spoofed Buffering when a node uses MultiNet to
connect to two wireless networks. Before switching out of network 1, the node informs
the AP that it is going to sleep for a certain time. It then switches to network 2, where
it announces that it is awake. The AP in network 2 then sends the buffered packets to
the node, which forwards them up to the corresponding virtual interface. The virtual
interface also sends its buffered packets to the AP. The node then stays on network 2 for
the Activity Period. It then sends a message to the AP of network 2 announcing that it
is going to sleep, and switches to network 1 and informs the AP of network 1 that it is
awake. These steps continue as long as the node requires connectivity on both wireless
networks.
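The steps of Spoofed Buffering can be sketched as a small simulation; the AP model and method names below are assumptions of this sketch, standing in for the IEEE 802.11 PSM frame exchange.

```python
class AccessPoint:
    """Sketch of an AP honoring IEEE 802.11 PSM: it buffers frames
    for stations it believes to be asleep."""

    def __init__(self):
        self.asleep = set()
        self.held = {}  # station -> frames buffered while it sleeps

    def psm_sleep(self, station):
        # Station announced "I am going to sleep".
        self.asleep.add(station)
        self.held.setdefault(station, [])

    def deliver(self, station, frame):
        # Frames for sleeping stations are buffered, not transmitted.
        if station in self.asleep:
            self.held[station].append(frame)
            return None
        return frame

    def psm_wake(self, station):
        # Station announced "I am awake": release the buffered frames.
        self.asleep.discard(station)
        return self.held.pop(station, [])

# Spoofed Buffering: the node tells AP 1 it is asleep, switches to
# network 2, and later returns to collect what AP 1 buffered.
ap1 = AccessPoint()
ap1.psm_sleep("node")
ap1.deliver("node", "frame-a")   # buffered while the node is away
flushed = ap1.psm_wake("node")   # returned when the node is back
```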
Figure 2.2: The steps of Spoofed Buffering when a node uses MultiNet to connect
to two networks.
We note that despite our buffering protocol, packets might still be lost due to other
reasons, such as mobility, wireless signal fade or interference. Further, buffering might
not be possible at other nodes in the network, due to lack of cooperation from nodes in
the ad hoc network or PSM support at the APs. In such scenarios, MultiNet relies on
higher layer protocols, such as TCP, to recover the lost packets. We compare MultiNet
with and without buffering support in Section 2.7.5, and show that although MultiNet
performs much better when the buffering protocols are implemented, its performance is still reasonable without them.
The Activity Period is the duration for which a wireless card stays connected on a net-
work. MultiNet supports three schemes for determining this duration, each of which is described below.
• Fixed Slot Duration: In this approach, MultiNet divides time into slots of fixed
duration. Every time the physical card switches to a network, it stays on that
network for one slot. The slot duration includes the Switching Delay. This scheme
between multiple nodes using MultiNet in an ad hoc network. We use it for our
• User Defined Priority: This scheme requires the user to prioritize all his net-
works, and define the Total Activity Period. The Total Activity Period is the sum
of Activity Periods of all networks, which is equal to the difference between the
Switching Cycle and the sum of Switching Delays across all networks. Multi-
Net then calculates the Activity Period for each network based on its priority.
So, if a user requires connectivity to a set of wireless networks, and has given each network a priority, each network receives an Activity Period proportional to its priority. This scheme is useful when there exists a predefined priority across all networks. For example, the Client Conduit
Protocol, described in Chapter 4, uses user defined priorities to limit the duration for which a connected client helps disconnected clients.
• Adaptive Schemes: This approach does not require any intervention from the user. MultiNet prioritizes each network based on the amount of traffic seen on it, and uses these priorities to calculate the Activity Period for each network.
Consequently, a network that sends and receives more packets has a longer Ac-
tivity Period as compared to a less active one. So, if MultiNet has to switch
across different networks, and network i has seen P_i packets in its last Activity Period ATP_i, then the node stays in network j for an Activity Period given by (P_j/ATP_j) · (1/Σ_k (P_k/ATP_k)) · (Σ_k ATP_k). The first term gives the net-
work utilization of network j, the second gives the utilization across all networks,
and the final term is the total amount of time the node is active across all networks.
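A direct reading of this formula can be sketched as follows; the function and variable names are illustrative, and the equal split under zero traffic is an assumption consistent with the behavior reported later in Section 2.7.3.

```python
def adaptive_activity_periods(packets, prev_periods):
    """New Activity Period for each network j, following
    (P_j/ATP_j) * (1 / sum_k(P_k/ATP_k)) * sum_k(ATP_k)."""
    utilization = [p / atp for p, atp in zip(packets, prev_periods)]
    total_util = sum(utilization)
    total_active = sum(prev_periods)
    if total_util == 0:
        # No traffic anywhere: split the active time equally.
        return [total_active / len(packets)] * len(packets)
    return [u / total_util * total_active for u in utilization]
```

With equal previous periods and packet counts of 10 and 30, the busier network receives three quarters of the active time, and the total active time is preserved.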
This approach is useful in scenarios where the user wants to get the best perfor-
mance on multiple networks, without worrying about the traffic patterns on each
network. We use this strategy to provide true zero configuration over MultiNet, using a module that maintains a history of packets sent and received on all virtual interfaces over a certain number of Switching Cycles. It then uses this history to prioritize across the networks.
Supporting multiple nodes to use MultiNet in an ad hoc network poses a new problem.
Any two nodes using MultiNet might not overlap in the ad hoc network for a signif-
icant period of time. Consequently, these nodes will be unable to communicate with
each other for long durations even though they are in communication range of each
other. This significantly affects the performance of MultiNet on the ad hoc network.
Figure 2.3 illustrates this problem when two nodes A and B are in communication range
of each other and use MultiNet with Fixed Slot Duration to connect to two networks:
Infrastructure Network 1 and Ad Hoc Network 2. In this scenario, nodes A and B do not
overlap in the ad hoc network, and cannot communicate in this network. However, note
that this problem is specific to ad hoc networks, as these nodes can communicate in the
infrastructure network using Spoofed Buffering to buffer packets at the APs. Further,
this problem also arises for other switching protocols described in Section 2.5.5, as two
nodes might overlap for a very small period of time, which is too small to send even a
single packet.
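The failure in Figure 2.3 is easy to reproduce with two out-of-phase fixed-slot schedules (a minimal sketch; the list encoding of schedules is an assumption of this sketch):

```python
def overlap_slots(schedule_a, schedule_b, network):
    """Slots in which both nodes are on the given network."""
    return [slot for slot, (a, b) in enumerate(zip(schedule_a, schedule_b))
            if a == b == network]

# Machines A and B alternate between infrastructure network 1 and
# ad hoc network 2, but out of phase: they never meet on network 2.
a = [1, 2, 1, 2, 1, 2]
b = [2, 1, 2, 1, 2, 1]
missed = overlap_slots(a, b, network=2)  # empty: no ad hoc overlap
```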
Figure 2.3: Two nodes in communication range and using MultiNet that fail to overlap in the ad hoc network.
We use a simple scheme, called Slotted Synchronization, to synchronize an overlap between any two nodes using MultiNet in a single hop ad hoc network.
We discuss SSCH, which is a more sophisticated and efficient approach for multihop networks, in Chapter 3. As noted earlier, this scheme allows a node to stay connected to only one ad hoc network in which multiple nodes use MultiNet. Extending this approach to allow nodes to stay connected in many ad hoc networks with multiple MultiNet nodes is left to future work.
Slotted Synchronization uses what we term the “Fixed Slot Duration switching scheme”,
in which time is divided into slots and nodes switch to a network at the beginning of a
slot. All nodes use the same slot duration, and clocks at all nodes in a network are synchronized. The algorithm also makes the assumption, as described in Section 2.5.1, that the node starting an ad hoc network knows if more than one node in its network is going to use MultiNet.
In that case, the initiator defines a recurrence period for the network. The recurrence period is the periodicity, in slots, with which every node returns to the ad hoc network. As described in Section 2.6.4, the SSID field of the IEEE 802.11 Beacon [58] can be modified to carry
the information about the recurrence period of the network and offset within the slot
when the Beacon is transmitted. When a node uses MultiNet to join this network, it uses
this information to synchronize the start time of its slots to that of the ad hoc network.
Then, after every recurrence period slots, MultiNet switches the physical card of this
node to the ad hoc network. Over the remaining slots, MultiNet switches the physical card across the other networks to which the user wants to stay connected.
This algorithm ensures that all nodes in the ad hoc network overlap for one slot
every recurrence period slots, even when some nodes use MultiNet to stay connected on other networks. This is because joining the network synchronizes the slots at all nodes in the network to the parameters specified by the initiator. Further, slot
synchronization occurs only at the time of joining the network and so this algorithm is
not affected by mobility in the network. Note that this algorithm might not work if a
node uses it to synchronize slots to multiple networks, since the initiator’s slots of these
disjoint networks might not be synchronized. Therefore, we limit a node to use MultiNet
to stay connected on only one ad hoc network in which multiple nodes use MultiNet.
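Slotted Synchronization, together with the Beacon encoding described later in Section 2.6.4, can be sketched as follows. The byte layout (1 byte of recurrence period, 2 bytes of slot offset, up to 29 characters of user SSID) follows the text; the function names, the big-endian packing, and the millisecond unit for the offset are assumptions of this sketch.

```python
def pack_ssid(recurrence_period, offset_ms, user_ssid):
    """Embed the recurrence period (1 byte) and the Beacon's offset
    within the slot (2 bytes) ahead of the user's SSID (at most 29
    characters), filling the 32-byte IEEE 802.11 SSID field."""
    assert 1 <= recurrence_period <= 255
    assert 0 <= offset_ms < 2 ** 16
    assert len(user_ssid) <= 29
    return (bytes([recurrence_period])
            + offset_ms.to_bytes(2, "big")
            + user_ssid.encode("ascii"))

def unpack_ssid(field):
    """Recover (recurrence period, offset, user SSID) from the field."""
    return (field[0],
            int.from_bytes(field[1:3], "big"),
            field[3:].decode("ascii"))

def is_adhoc_slot(slot, recurrence_period):
    """A synchronized node returns to the ad hoc network once every
    recurrence_period slots; the other slots serve its other networks."""
    return slot % recurrence_period == 0
```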
However, it can connect to many infrastructure networks and ad hoc networks in which it is the only node using MultiNet.
2.6 Implementation
MultiNet is implemented over Windows XP as a kernel driver and a user level service. The mechanisms for storing network state, and for switching and
buffering across networks are implemented in the kernel, while the respective policies
are implemented in the service. The kernel driver is an NDIS intermediate driver,¹ which exists as a layer between the network device drivers and IP. MultiNet performs best
when APs implement PSM and other nodes in an ad hoc network buffer packets for
nodes using MultiNet. However, no changes are required in the wired nodes for Multi-
Net to work. The rest of this section describes the details of our implementation.
The MultiNet driver provides all the mechanisms required by the MultiNet architecture.
It initializes and maintains the virtual interfaces, and provides support to switch a wire-
less card from one network to another and to buffer packets at the virtual interfaces if the
physical card is not on the wireless network. This driver also sends the buffered packets when the card switches to the corresponding network.
NDIS requires the lower binding of a network protocol, such as IP, to be a network
miniport driver,² such as the driver of a network interface. Similarly, NDIS requires the upper binding of a network interface driver to be a protocol driver. We accommodate
¹ Network Driver Interface Specification (NDIS) is a Windows construct that provides transport independence for the network card vendors. All networking protocols used by Windows call the NDIS interface to access the network.
² A miniport driver directly manages a network interface card (NIC) and provides an interface to higher-level drivers.
this requirement in the design of the MultiNet Driver, which includes two components:
the MultiNet Protocol Driver (MPD), which provides an upper binding to the network
card miniport driver, and the MultiNet Miniport Driver (MMD), which provides a lower
binding to the network protocols, such as TCP/IP. The modified stack is illustrated in
Figure 2.4.
(Figure 2.4: the modified stack, with applications and the MultiNet Service at user level, above the MultiNet Protocol Driver (MPD), the NDIS WLAN extensions, and the hardware.)
The MPD manages multiple virtual interfaces over one wireless card. It switches
the association of the underlying card across different networks, and buffers packets if
the SSID of the associated network is different from the SSID of the sending virtual
interface. MPD also buffers packets on the instruction of the MultiNet Service, as we
describe later in Section 2.6.2. Further, the MPD handles packets received by the wire-
less card. A packet received on the wireless card is sent to the virtual interface associated with the network on which it was received.
The MMD manages a virtual interface of a wireless card. It maintains the state for
each virtual interface, which includes the SSID and operational mode of the wireless
network. It is also responsible for handling query and set operations directed to the virtual interface.
The MultiNet service implements the algorithms for switching across networks and
buffering packets, described in Sections 2.5.5 and 2.5.4 respectively. This service is
a user level daemon that uses I/O Control Codes (ioctls) to interact with the MultiNet
Driver. It also broadcasts packets to interact with the service running at other nodes.
These messages coordinate the buffering protocol for ad hoc networks, described in
Section 2.5.4. Further, all the switching algorithms discussed in Section 2.5.5 are im-
plemented in the MultiNet service. The service determines the duration of the Activity
Period, and sends a signal to MPD when the Activity Period expires. This signal initiates
the switching mechanism implemented in MPD. Finally, the service also coordinates the
synchronization protocol described in Section 2.5.6. It embeds the recurrence period and offset in the IEEE 802.11 Beacon frame, and uses this information to synchronize the slots across nodes in the ad hoc network.
Spoofed Buffering, described in Section 2.5.4, buffers packets for MultiNet over infras-
tructure networks using IEEE 802.11 PSM. We successfully implemented this scheme
over Native WiFi cards, which were described in Section 2.4.3. For non-Native WiFi
(legacy) cards, we were constrained by the proprietary software on the card drivers.
Their software does not expose any APIs in Windows to programmatically set the res-
olution of power save mode. Therefore, we were unable to implement the buffering
algorithm for these WLAN cards. However, for prototyping Spoofed Buffering, we
buffer packets at the end points of infrastructure networks, using a scheme similar to the
one described for ad hoc networks in Section 2.5.4. The MultiNet service keeps track of
the end points of all on-going sessions, and buffers packets if the destination is currently
in another network.
The synchronization protocol of Section 2.5.6 requires an ad hoc network with multiple MultiNet nodes to have two parameters, in addition to the ones
specified by IEEE 802.11. In particular, the initiator of such an ad hoc network has
to specify the recurrence period and the offset within the slot when the IEEE 802.11
Beacon is sent. Any node joining this network has to learn both these parameters for synchronization. One approach is to modify the format of IEEE 802.11 packets to carry more information. However, this requires modifications to
the wireless card driver, and might reduce the interoperability of MultiNet, as discussed
in Section 2.5.2.
We use an alternative approach to solve this problem. The two parameters are em-
bedded in the SSID field of an IEEE 802.11 Beacon, which is broadcast once every
fixed interval.³ The SSID field of the Beacon frame is 32 bytes in length. The recurrence period is measured in slots, and its maximum value is the number of networks to which a user can connect. We limit this to 255, so 1 byte is sufficient to carry this information. Further, the offset within the slot is measured in milliseconds, and we limit the maximum slot duration to 5 seconds. So, 2 bytes are enough to embed the value of the offset. Therefore, the user can still use a 29-character SSID for such ad hoc networks. Based on experience, we believe that this does not significantly inconvenience users.
We studied the performance of MultiNet using a real implementation and a custom sim-
ulator. The implementation was used to study the throughput behavior with different
switching algorithms. We then simulated MultiNet with realistic parameters, and com-
pared it with the alternative approach of using multiple radios to connect to multiple
networks. We compare the two approaches with respect to energy consumption and the
average delay encountered by the packets. The results presented in this section confirm the benefits of the MultiNet approach.
MultiNet has been deployed and tested over a dozen commercial IEEE 802.11 wireless
cards. The results in this section were derived over an IEEE 802.11b network [60].
The wireless cards used were the Cisco 340 series, Compaq WLAN 200, Orinoco Gold,
³ The IEEE 802.11 protocol for joining an ad hoc network requires the joining node to use the information in the Beacons of that network.
Netgear WAG 511 and the Native WiFi cards from AMD [11] and Realtek [102]. All
these cards have a maximum data rate of 11 Mbps. The APs used were the Cisco 340
Series, EZConnect 2656, DLink DI-614+ and Native WiFi APs. IEEE 802.11 PSM was
implemented only in the Native WiFi APs. Most of our results were consistent across the different cards and APs.
Good performance of MultiNet depends on a short delay when switching across net-
works. However, legacy IEEE 802.11b cards perform the entire association procedure
every time they switch to a network. We carried out a detailed analysis of the time to
associate to an IEEE 802.11 network. The results showed significant overhead when
switching from one network to another. In fact, a delay as large as 3.9 seconds was observed from the time the card started associating to an ad hoc network, after leaving an infrastructure network.
Table 2.1: The Switching Delays between IS and AH networks for IEEE 802.11
cards with and without the optimization of trapping media connect and disconnect
messages.
Switching    Without optimization    With optimization    Native WiFi
IS to AH     3.9 s                   170 ms               25 ms
AH to IS     2.8 s                   300 ms               30 ms
Our investigations revealed that the cause of this delay is the media disconnect and
media connect notifications to the IP stack. The IP stack damps the media disconnect
and connect for a few seconds to protect itself and its clients from spurious signals. The
spurious connects and disconnects can be generated by network interface cards due to a
variety of reasons ranging from buggy implementations of the card or switch firmware
to the card/switch resetting itself to flush out old state. Windows was designed to damp
the media disconnect and connect notifications for some time before rechecking the
connectivity state of the adapter and taking the action commensurate with that state.
To avoid this damping delay, switching must be hidden from higher protocols, such as IP and the applications. We hide switching by
having MPD trap the media disconnect and media connect messages when it switches
between networks. Since the MPD is placed below IP, it can prevent the network layer
from receiving these messages. This minor modification significantly improves the
Switching Delay as shown in Table 2.1. Using the above optimization, we were able
to reduce the switching delay from 2.8 seconds to 300 ms when switching from an ad
hoc network to an infrastructure network and from 3.9 seconds to 170 ms when switch-
ing from an infrastructure network to an ad hoc network. These numbers are further
reduced to as low as 30 ms and 25 ms respectively, when Native WiFi cards are used.
We believe that this overhead is extraneous for the purposes of MultiNet, and in Section 2.8 we discuss ways to reduce it further.
A nice consequence of masking the media connect and media disconnect messages
is that all virtual adapters are visible to IP as connected, as our architecture requires.
We implemented three switching strategies described in Section 2.5.5, i.e. User Defined
Priority, Adaptive Buffer, and Adaptive Traffic. The test environment comprised a node
that used MultiNet to stay connected to an infrastructure and an ad hoc network. The
Switching Delays from the ad hoc to the infrastructure network and vice versa were
overestimated at 500 ms and 300 ms respectively.⁴ The total time available for switching between networks was 1 sec. We evaluated the switching strategies when simultane-
ously transferring a file of size 47 MB using FTP from the MultiNet node to two nodes
on the different networks. An independent transfer of the file over the ad hoc network
took 80.25 seconds, while it took 54.12 seconds over the infrastructure network.
Figure 2.5 shows the time taken to simultaneously transfer this file over MultiNet
using different switching strategies for legacy cards. We evaluated 3 different User
Defined Priority switching schemes. In the ‘50%IS 50%AH’ strategy the node stays on
each network for 500 ms. In the ‘75%IS 25%AH’ scheme it stays on the infrastructure
network for 750 ms and on the ad hoc network for 250 ms, and in the ‘25%IS 75%AH’
scheme the node stays on the infrastructure network for 250 ms and the ad hoc network
for 750 ms. For the Adaptive Traffic algorithm we used a window of 3 switching cycles
to estimate the Activity Periods. In this case the window is 3 × 1.8 = 5.4 seconds, since the Switching Cycle is 1.8 seconds (1 second of total Activity Period plus the 0.5 s and 0.3 s Switching Delays).
Different switching strategies show different behavior and each of them might be
useful for different scenarios. For the User Defined Priority strategies, the network with
higher priority gets a larger slot to remain connected. Therefore, the network with a
higher priority takes lesser time to complete the FTP transfer. The results of the adap-
tive algorithms are similar. The Adaptive Buffer algorithm adjusts the time it stays on
a network based on the number of packets buffered for that network. Since the maximum achievable throughput is higher on the infrastructure network than on the ad hoc network,⁵ the number of packets buffered for the infrastructure network is more. There-
⁴ This overprovisioning helped to evenly compare all the switching schemes by fixing the duration of the Switching Cycle.
⁵ Separate experiments revealed that the average throughput on a wireless network with commercial APs and wireless cards is 5.8 Mbps for an isolated infrastructure network.
Figure 2.5: Time taken to complete a 47 MB FTP transfer on an ad hoc and an infrastructure network, using different switching strategies.
fore the FTP transfer completes faster over the infrastructure network as compared to
the ‘50%IS 50%AH’ case. For a similar reason the FTP transfer over the infrastructure
network completes faster when using Adaptive Traffic switching. MultiNet sees much
more traffic sent over the infrastructure network and proportionally gives more time to
it. Overall, the adaptive strategies work by giving more time to faster networks if there
is maximum activity over all the networks. However, if some networks are more active
than the others, then the active networks get more time. We expect these adaptive strate-
gies to give the best performance if the user has no priority and wants to achieve the best throughput across all the networks.
Figure 2.6: Variation of the activity period for two networks with time. The activity
The adaptability of MultiNet is demonstrated in Figure 2.6. The Adaptive Traffic switch-
ing strategy is evaluated by running our system for two networks, an ad hoc and an
infrastructure network, for 150 seconds. The plots at the top of Figure 2.6 show the
traffic seen on both the wireless networks, and the ones at the bottom of this figure show
the corresponding effect on the Activity Period of each network. The adaptive switch-
ing strategy causes the Activity Period of the networks to vary according to the traffic
seen on them. Initially when there is no traffic on either network, MultiNet gives equal
time to both networks. After 20 seconds there is more traffic on the ad hoc network,
and so MultiNet allocates more time to it. The traffic on the infrastructure network is
greater than the traffic on the ad hoc network after around 110 seconds. Consequently,
the infrastructure network is allocated more time. This correspondence between relative traffic levels and allocated time demonstrates the adaptability of the switching strategy.
MultiNet, when used with adaptive switching schemes, provides true zero configuration. Prior schemes, such as Wireless Zero Configuration (WZC), require users to specify a list of preferred networks, and WZC only connects to the most preferred available wireless network. With the adaptive switching strategies, a user still specifies a list of preferred networks, but the card connects to all of them, giving time to each network in proportion to the traffic seen on it.
MultiNet buffers packets for inactive networks at APs and at ad hoc peer cards using IEEE 802.11 PSM. However, many commercial APs do not implement PSM. Further, the ad hoc network buffering protocol, described in Section 2.5.4, relies on broadcast packets, which are less reliable than unicast packets [91]. These broadcast packets might get lost, in which case packets destined to MultiNet's virtual interface are dropped. The worst case occurs when no packets are buffered, due either to lost broadcast packets or to lack of PSM support at commercial APs. Figure 2.7 compares this worst case to the scenarios in which MultiNet implements buffering. In our test scenario, packets were sent, using ntttcp, over the infrastructure network from the MultiNet
node to another node in the network. Ntttcp, which is a port of ttcp [118] to Windows,
works by establishing a TCP session between two nodes and sending the packets at the
maximum rate. The Activity Period for both networks was fixed at 500 ms. We present
results for three scenarios in Figure 2.7. ‘NoMultiNet’ corresponds to the case when
the sender and receiver are connected to just one network, ‘MultiNetNoBuffer’ is when
the sender is connected to two networks using MultiNet and the AP does not implement
Spoofed Buffering, and the APs implement Spoofed Buffering in ‘MultiNetBuffer’. Re-
Figure 2.7: TCP sequence number progress over time for the three scenarios (NoMultiNet, MultiNetNoBuffer, and MultiNetBuffer).
sults show that the performance drops by a factor of four when using MultiNet with
Spoofed Buffering and drops further when the AP does not buffer packets. When APs
buffer packets, the MultiNet node can achieve a throughput proportional to the duration
of its Activity Period, which is around a fourth of the Switching Cycle. Without buffering, the throughput of the system in this case goes down to a seventh of the maximum achievable throughput.
We set up a three node network. The first machine always stays on the infrastructure network. Both the other machines use MultiNet. Before we start this experiment, the second machine uses MultiNet to connect to both the infrastructure and the ad hoc network; it is initially the only node in the ad hoc network. The third node, which we
Figure 2.8: Effect on UDP flows when a node uses Slotted Synchronization to join
an ad hoc network
also use as our test machine, is initially connected to only the infrastructure network.
We start a UDP flow between the test machine and the first machine, which is only on
the infrastructure network. We use Fixed Slot Duration switching, and set the duration
of each slot to 800 ms. This duration contains the Switching Delay. IPerf [1] was used to
initiate UDP flows of 1 Mbps with 512 byte packets. The MPD was also instrumented to report the total number of successful packets sent and received in every slot; this gives the instantaneous throughput plotted in Figure 2.8.
Figure 2.8 illustrates the instantaneous throughput, measured once per Switching
Cycle, achieved by UDP flows when the test machine joins an ad hoc network that has
more than one MultiNet node. Initially, when the test machine is only in the infras-
tructure network, there is no Switching Delay, and consequently the UDP throughput
is around 1 Mbps. After 13 seconds, the test machine uses MultiNet to connect to the
ad hoc network, which already has one MultiNet node. The test machine takes around
15 seconds to initialize another virtual interface, build up its state, synchronize the slots
to the MultiNet node in the ad hoc network and get a DHCP address for the virtual
interface. After this time, the UDP flow between the test machine and the infrastruc-
ture network node resumes. We immediately start another UDP flow between the two
MultiNet nodes in the ad hoc network. As we see in the figure, UDP throughput in
the infrastructure network drops to around half the initial throughput. This is because
the infrastructure network gets one of two slots in Fixed Slot Duration Switching since
MultiNet connects to two networks. The Switching Delay does not reduce the through-
put further, because MultiNet is able to send the buffered packets over the Activity
Period at the network’s bandwidth, which is greater than the IPerf flow rate of 1 Mbps.
Further, the flow over the ad hoc network roughly achieves the same throughput as over the infrastructure network, which implies that Slotted Synchronization maintains a good overlap between the slots of the two MultiNet nodes.
MultiNet does not aim to hide mobility from the user. As discussed in Section 2.5.2,
MultiNet’s virtual interfaces should behave as physical wireless cards when nodes are
mobile. To illustrate this behavior, the same experimental setup of Section 2.7.6 was
used. However, in this case, we focused on the throughput in the ad hoc network. After
around 28 seconds, the test machine was moved away from the other MultiNet node
in the ad hoc network. As we see in Figure 2.9, the IPerf throughput over the ad hoc
network keeps falling as the machine moves away from the other node in the ad hoc
network. With an increase in distance between the two nodes, the signal strength de-
creases, which increases the loss rate and reduces the throughput. After some time the
Figure 2.9: Effect of mobility on the IPerf throughput over the ad hoc network.
connection over the ad hoc network is lost. This state is propagated to the application layer, which halts IPerf. However, MultiNet keeps trying to reconnect to the ad hoc network. When the test machine moves back within range, the connection is re-established, and an IPerf flow is started immediately between the two nodes. As we see in the figure, the two nodes using MultiNet achieve the same throughput after reconnection as they had before the connection was lost. This shows that there is a significant overlap between the two nodes, and the performance of Slotted Synchronization is not significantly affected by mobility. The test machine was again moved at around 70 seconds, and we see a similar drop in throughput.
MultiNet is one way of staying connected to multiple wireless networks. The alternative approach is to use multiple wireless cards, with each card connected to a different network. We simulated this approach, and compared it with the MultiNet scheme with respect to the energy consumed and the average delay of packets over the different networks. We first present our simulation environment, and then compare the results of the MultiNet scheme to the multiple radio approach.
Simulation Environment
We simulated both approaches for a sample scenario of people wanting to share and
discuss a presentation over an ad hoc network and browse the web over the infrastruc-
ture network at the same time. This feature is extremely useful in many scenarios. For
example, consider the case where a company, say Kisco’s, employees conduct a busi-
ness meeting with another company, say Macrosoft’s, employees at Macrosoft’s head-
quarters. With MultiNet and a single wireless network card, Kisco employees can share
documents, presentations, and data with Macrosoft’s employees over an ad hoc network.
Macrosoft's employees can stay connected to their internal network via the access points. Meanwhile, Macrosoft does not have to give Kisco's employees access to its internal network in order for the two groups to share data.
We model traffic over the two networks, and analyze the packet trace using our sim-
ulator. Traffic over the infrastructure network is considered to be mostly web browsing.
We used Surge [18] to model http requests according to the behavior of an Internet
user. Surge is a tool that generates web requests with statistical properties similar to
measured Internet data. The generated sequence of URL requests exhibits representative distributions for requested document size, temporal locality, spatial locality, user off times, document popularity, and embedded document count. For our purposes, Surge
was used to generate a web trace for a 1 hour 50 minute duration, and this web trace
Figure 2.10: Packet trace for the web browsing application over the infrastructure
network
was then broken down into a sample packet trace for this period; the distribution of the resulting packets over time is shown in Figure 2.10.
The ad hoc network is used for two purposes: sharing a presentation, and supporting discussions using a sample chat application. Three presentations are shared in our scenario; each is downloaded to the target machine using an FTP session over the ad hoc network. They are downloaded in the 1st minute, the 38th minute, and the 75th minute. Further, the
user also chats continuously with other people in the presentation room, discussing the
presentation and other relevant topics. Packet traces for both the applications, FTP and
chat, were obtained by sniffing the network, using Ethereal [45], while running the re-
spective applications. MSN messenger was used for a sample chat trace for a 30 minute
duration. The Packet traces for FTP and chat were then extended over the duration of
In our simulations we assume that wireless networks operate at their maximum TCP
throughput of 4.4 and 5.8 Mbps for an ad hoc and infrastructure network respectively.
We then analyze the packet traces for independent networks, and generate another trace
for MultiNet. We use a ‘75%IS 25%AH’ switching strategy presented in Section 2.5.5
with a switching cycle time of 400 ms. The switching delay is set to 1 ms, and we explain the reason for choosing this value in Section 2.8.1. Further, the power consumed during switching is assumed to be negligible. We do not expect these assumptions to greatly affect the results of our experiments. We analyze packet traces for
the two radio and MultiNet case and compute the total power consumed and the average
delay encountered by the packets. All the cards are assumed to be Cisco AIR-PCM350,
and their corresponding power consumption numbers are used from [111]. Specifically,
the card consumes 45 mW of power in sleep mode, 1.08W in idle mode, 1.3W in re-
ceive mode, and 1.875W in transmit mode. Further, in PSM, the energy consumed by
the Cisco AIR-PCM 350 in one power save cycle is given by: 0.045 ∗ n ∗ t + 24200
milliJoules, where n is the Listen Interval and t is the Beacon Period of the AP.
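The quoted power states and the PSM formula translate directly into a small energy calculator (a sketch; the constant 24200 and the 'milliJoules' unit are reproduced exactly as stated above):

```python
# Energy calculator built from the Cisco AIR-PCM350 numbers quoted above.
POWER_W = {"sleep": 0.045, "idle": 1.08, "receive": 1.3, "transmit": 1.875}

def psm_cycle_energy_mj(listen_interval, beacon_period_ms):
    """Energy per power save cycle (milliJoules): 0.045 * n * t + 24200,
    with n the Listen Interval and t the AP's Beacon Period."""
    return 0.045 * listen_interval * beacon_period_ms + 24200

def awake_energy_j(seconds_by_state):
    """Energy (Joules) of a card that never sleeps, as in the two radio
    case: time spent in each power state times that state's draw."""
    return sum(POWER_W[s] * secs for s, secs in seconds_by_state.items())
```

With the Listen Interval of 4 and Beacon Period of 100 ms used later in this section, `psm_cycle_energy_mj(4, 100)` evaluates to 24218 milliJoules per cycle.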
Table 2.2: The average throughput in the ad hoc and infrastructure networks using MultiNet and two radios

Network          MultiNet    Two radios
Infrastructure   4.35 Mbps   5.8 Mbps
Ad hoc           1.1 Mbps    4.4 Mbps
Despite the performance advantages seen in Table 2.2, using multiple radios consumes more power. Each radio is always on, and therefore keeps transmitting and receiving over its network. Even when a radio is not transmitting or receiving, it is in idle mode and drains
Figure 2.11: Packet trace for the presentation and chat workloads over the ad hoc
network
a significant amount of power. Figure 2.12 shows the amount of energy consumed by
the MultiNet scheme and the two radio scheme for the above application. Two radios
consume almost double the power consumed by the single MultiNet radio.
Table 2.3: The average packet delay in infrastructure mode for the various strategies

Strategy      Average packet delay
MultiNet      0.157
MultiNet PS   0.167
Figure 2.12: Comparison of total energy usage when using MultiNet versus two
radios
The multiple radio approach can be modified to consume less power by allowing the
network card in infrastructure mode to use PSM. Figure 2.13 shows the energy usage
when the infrastructure radio uses PSM for our application. The Beacon Period is set to
100 ms, and the Listen Interval is 4. The amount of energy consumed in the two radio
case using PSM is very close to the consumption of MultiNet without PSM. However,
this saving comes at a price. It is no longer possible to achieve the high throughput for
infrastructure networks if the cards are in PSM. Simulated results in Table 2.3 show that
the average packet delay over the infrastructure network with PSM is now close to the
average packet delay for MultiNet. Therefore, using two radios with PSM does not give any significant advantage over MultiNet in this respect.
We analyze the two schemes of connecting to multiple networks with respect to the
performance on the network and the amount of power consumed. In our simulated
scenario, each of the radios gives the best achievable throughput on both the networks.
As shown in Table 2.2, the average throughput of MultiNet in the infrastructure mode is
4.35 Mbps compared to 5.8 Mbps in the two radio case. The average throughput in the ad
hoc network is 1.1 Mbps in MultiNet and 4.4 Mbps when using two radios. Switching across networks means that MultiNet is active on each network for only a smaller time period. Consequently, the scheme of using multiple cards gives much better throughput. However, MultiNet can be made more power efficient by using the power save mode for infrastructure networks as described in Section 2.4.2. In our
experiment we chose the Switching Cycle to be 400 ms, with ‘75%IS 25%AH’ switch-
ing. For consistency in comparison, the Listen Interval is set to 4 and the Beacon Period
to 100 ms. Consequently, every time the card switches to infrastructure mode, it listens
for the traffic indication map from the AP. After it has processed all its packets it goes
to sleep and wakes up after 300 ms. It then stays in the ad hoc network for 100 ms,
and then switches back to the infrastructure network. The modified algorithm results in
greater energy savings as shown in Figure 2.13. The average delay per packet over the
infrastructure network is not seriously affected, while the energy consumed is reduced
by more than a factor of 3. We conclude that MultiNet is superior to the use of multiple wireless cards with respect to power efficiency.
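As a sanity check, the per-cycle arithmetic behind this factor can be reproduced from the power numbers quoted earlier (assuming, pessimistically, that the 100 ms ad hoc slot is spent in idle mode, and ignoring the brief TIM reception after waking):

```python
# Per-cycle energy of the modified schedule versus an always-idle card,
# using the Cisco AIR-PCM350 figures. Assumptions: the 100 ms ad hoc
# slot is spent in idle mode (pessimistic), TIM reception is ignored.
SLEEP_W, IDLE_W = 0.045, 1.08

def cycle_energy_j(sleep_s, awake_s, awake_w=IDLE_W):
    return SLEEP_W * sleep_s + awake_w * awake_s

modified = cycle_energy_j(0.3, 0.1)     # sleep 300 ms, ad hoc slot 100 ms
always_idle = cycle_energy_j(0.0, 0.4)  # idle for the whole 400 ms cycle
ratio = always_idle / modified          # roughly 3.6x less energy
```

The resulting ratio of about 3.6 is consistent with the "more than a factor of 3" reduction reported above.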
Note that we do not evaluate power saving in ad hoc mode because we are unaware
of any commercial cards that implement this feature. As a result we were unable to get
performance numbers when using PSM in ad hoc mode. However, we believe that if such a feature were available, MultiNet could achieve similar energy savings in ad hoc mode as well.
Figure 2.13: Energy usage when using MultiNet and two radios with IEEE 802.11
Power Saving
The average packet delay increases on increasing the number of connected networks. Table 2.4 presents the average delay seen by packets over the infrastructure network on varying the number of MultiNet networks from 2 to 6. We used a Fixed Priority switching strategy with equal priorities for all the networks. Increasing the number of networks reduces the Activity Period for each connected network when using Fixed Priority Switching. As a result, more packets are buffered and the average delay encountered by the packets on a network increases.
Table 2.4: The average packet delay in infrastructure mode on varying the number of connected networks

Number of networks   Average packet delay
2                    0.191
3                    0.261
4                    0.332
5                    0.410
6                    0.485
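The near-linear growth is easy to check from the tabulated values themselves; the per-network increments all lie within about 8 ms of each other:

```python
# Average packet delays from Table 2.4, keyed by the number of networks.
delays = {2: 0.191, 3: 0.261, 4: 0.332, 5: 0.410, 6: 0.485}

# Per-network increments: each extra network adds roughly 0.07-0.08 to
# the delay, i.e. the growth is close to linear.
increments = [round(delays[n + 1] - delays[n], 3) for n in range(2, 6)]
```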
2.7.10 Summary
• The adaptive switching strategies are best when no network preference is indicated. Both Adaptive Buffer and Adaptive Traffic allocate more time to networks carrying more traffic.
• For the applications studied, MultiNet consumes 50% less energy than a two card
solution.
• As expected, the average packet delay with MultiNet varies linearly with an in-
crease in the number of connected networks when all the networks are given equal
activity periods.
• MultiNet works even without Spoofed Buffering, although the performance goes down by a factor of 4.
• Next generation Native WiFi cards enable a significant reduction in the switching overhead. The switching delay for legacy cards is reduced to around 300 ms, while this number goes down to 30 ms for Native WiFi cards.
• Existing zero configuration schemes require the user to prioritize the preferred network. With MultiNet based zero configuration, the user need not specify any network preference.
• In mobile scenarios, MultiNet exposes the same connectivity status as real wireless cards.
This section discusses ways in which the performance of MultiNet can be improved. In particular, it focuses on reducing the switching overhead, enabling 802.1X [57] authentication, and addressing other open issues.
Good performance of MultiNet depends on low switching delays. The main cause of
the switching overhead in current generation wireless cards is the 802.11 network asso-
ciation and authentication protocol [58], which is executed every time the card switches
to a network. Further, these cards do not store state for more than one network in the
firmware, and worse still, many card vendors force a firmware reset when changing the mode of the card.
Most of these problems are fixed in the next generation Native WiFi cards. These
cards do not incur a firmware reset on changing their mode. Moreover, since switching
is forced by MultiNet, Native WiFi cards do not explicitly disconnect from the network
when switching. However, they still carry out the association procedure, which causes the bulk of the remaining switching delay. By storing the state of each network and automatically initializing them, this delay can be made negligible. The only overhead on switching is then the synchronization with the wireless network.
Using the above optimizations, the switching time of a WLAN card is limited only by the speed with which the card can switch to a different channel and the speed with which a network's state can be loaded from a flash card. Recent research has shown that the time to switch
to a different channel is less than 100 µsec for an IEEE 802.11a wireless card [51].
Further, as the network state to load is around 100 bytes, and data transfer speeds for flash cards are 8 Mbps [13], we expect the switching overhead to be less than 1 ms.
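That estimate is simple arithmetic over the two components (treating the 8 Mbps flash figure as a raw bit rate):

```python
# Back-of-the-envelope switching overhead from the figures above
# (assumption: the 8 Mbps flash transfer speed is a raw bit rate).
channel_switch_s = 100e-6              # channel switch time, < 100 us [51]
state_bits = 100 * 8                   # ~100 bytes of per-network state
flash_rate_bps = 8e6                   # flash card transfer speed [13]

state_load_s = state_bits / flash_rate_bps   # 0.1 ms
total_s = channel_switch_s + state_load_s    # 0.2 ms, well under 1 ms
```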
IEEE 802.1X is a port-based authentication protocol that is becoming popular for enterprise wireless LANs. For MultiNet to be useful in all environments it has to support IEEE 802.1X. The IEEE 802.1X state machine is currently implemented in the Wireless Zero Configuration Service (WZC) for Windows XP, and we had to turn off WZC for MultiNet to work. Only minor changes are needed in WZC for it to work
with MultiNet. However, achieving good performance with IEEE 802.1X is difficult.
We measured the overhead of the IEEE 802.1X authentication protocol and found it to
be approximately 600 ms. It is clear that we need to prevent the card from going through
a complete authentication procedure every time it switches across IEEE 802.1X enabled
networks. We can eliminate the authentication cycles by storing the IEEE 802.1X state
in the MPD and using this state instead of redoing the authentication procedure. Further, we can take advantage of preauthentication support at the APs. Preauthentication works by having the APs maintain a list of authenticated nodes. When implemented, this optimization will eliminate the authentication overhead when switching across IEEE 802.1X enabled networks.
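The effect of storing the IEEE 802.1X state can be sketched as a per-network cache that pays the measured ~600 ms handshake only on the first visit (hypothetical class and names; in MultiNet the state would be held by the MPD):

```python
# Sketch of caching IEEE 802.1X state per network so that switching back
# to a network skips the full handshake (hypothetical class and names;
# in MultiNet the state would be held by the MPD).
FULL_AUTH_MS = 600  # measured overhead of a full IEEE 802.1X authentication

class AuthStateCache:
    def __init__(self):
        self._state = {}  # network name -> opaque 802.1X session state

    def switch_to(self, ssid):
        """Return the authentication cost (ms) paid for this switch."""
        if ssid in self._state:
            return 0  # reuse the stored state; no re-authentication
        self._state[ssid] = object()  # stand-in for the negotiated state
        return FULL_AUTH_MS
```

Only the first switch to each IEEE 802.1X network pays the 600 ms cost; every subsequent switch in the same session is free.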
Can MultiNet be implemented in the firmware of the wireless card? The simple answer is yes; however, we strongly believe that the right place to implement
MultiNet is as a kernel driver. Buffering imposes memory requirements that are best
taken care of by the operating system, and the policy driven behavior can bloat the
firmware. Additionally, by moving the intelligence into a general purpose PC, the cost of
the wireless hardware can be reduced further, which is the trend for the next generation of wireless cards.
The switching behavior of MultiNet augurs badly for TCP performance. MultiNet is
implemented below IP, and so TCP sees fluctuating behavior for packets sent by it. It
receives immediate acknowledgements for packets sent when the network is active, and
delayed acknowledgements for buffered packets. The above behavior affects the way
TCP adjusts the RTT for the session, and from the way it is calculated, the RTT will be larger than required. This inflated RTT does ensure that packets are not lost. However, a larger than required RTT has other conse-
quences with respect to flow control, and congestion response. This problem is generally
relevant for networks that have periodic connectivity. A solution to this problem has to
mask the delay encountered by the buffered packets. We are currently exploring ways to do so.
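The RTT inflation can be illustrated with TCP's standard smoothed-RTT estimator (the EWMA of RFC 6298 with alpha = 1/8); the sample values below are illustrative, standing for an ACK received while the network is active (~10 ms) versus one delayed by a full inactive slot (~510 ms):

```python
# TCP's smoothed RTT estimator (standard EWMA with alpha = 1/8, as in
# RFC 6298). Alternating fast ACKs with ACKs delayed by buffering drives
# the estimate far above the true path RTT.
def srtt_ms(samples, alpha=0.125):
    srtt = samples[0]
    for r in samples[1:]:
        srtt = (1 - alpha) * srtt + alpha * r
    return srtt

steady = srtt_ms([10] * 100)           # stays at the 10 ms path RTT
inflated = srtt_ms([10, 510] * 100)    # settles far above 10 ms
```

The inflated estimate in turn inflates TCP's retransmission timeout, which is why the buffering delay should be masked from TCP rather than fed into its RTT measurements.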
A further complication arises when a MultiNet node connects to more than one ad hoc network that has multiple nodes using MultiNet. Solving this problem
requires MultiNet to synchronize its slots to initiators of multiple ad hoc networks, and
those initiators’ slots might not be synchronized. We are looking at ways to handle this
scenario by allowing all nodes, including the initiator, to resynchronize their slots.
Further, Slotted Synchronization does not yet extend to multi-hop ad hoc networks. A scheme that supports multi-hop networks has to handle partitioning
issues of the ad hoc network, and ways to resynchronize it. SSCH, described in Chap-
ter 3, is a step towards making MultiNet work in multi-hop networks. We hope to build on it in future work.
2.10 Summary
• This chapter presents the design of MultiNet. Several compelling real-life scenarios are described that motivate the need for such an architecture. To the best of our knowledge, MultiNet is the first to articulate this problem and propose a solution for IEEE 802.11 hardware.
• The chapter presents Spoofed Buffering, which leverages IEEE 802.11 PSM to buffer packets at the APs without modifying them. Three switching algorithms are presented that are useful in different applications of MultiNet. It also presents Slotted Synchronization for maintaining overlapping slots among MultiNet nodes in an ad hoc network.
• MultiNet is shown to be more power efficient than an alternative of using multiple wireless cards in the device.
• MultiNet has been implemented in Windows XP over commercial wireless cards. Finally, the performance of MultiNet has
been studied in detail, and is shown to give good performance in most scenarios. The
MultiNet software is available for free download, and more information can be found
at: http://www.cs.cornell.edu/people/ranveer/MultiNet/.
The contents of this chapter have benefitted from several helpful suggestions and
comments. In particular, Victor Bahl and Pradeep Bahl were involved in discussions that
helped develop the MultiNet architecture. Slotted Synchronization and the performance
results were revised after inputs from Ken Birman. Further, some of MultiNet's design decisions were refined through these discussions.
3.1 Introduction
The problem of supporting multiple senders and receivers in wireless networks has re-
ceived significant attention in the past decade. One domain where this communication
pattern naturally arises is fixed wireless multi-hop networks, such as community net-
works [21, 70, 107, 109]. Increasing the capacity of such wireless networks has been
the focus of much recent research (e.g., [40, 65, 95]). An obvious way to increase the capacity of a wireless network is to use multiple frequency channels. Channelization was added to the IEEE 802.11 standard to increase the capacity of infrastructure networks: nearby access points can be placed on orthogonal channels so that traffic to and from these access points does not interfere [4]. Non-infrastructure (i.e., ad-
hoc) networks have thus far been unable to exploit the benefits of channelization. The
current practice in ad-hoc networks is for all nodes to use the same channel, irrespective
of whether the nodes are within communication range of each other [107, 109].
Among its constructions, this thesis proposes a new protocol, Slotted Seeded Chan-
nel Hopping (SSCH), which extends the benefits of channelization to ad-hoc networks.
Logically, SSCH operates at the link layer, but it can be implemented in software over
an IEEE 802.11-compliant wireless Network Interface Card (NIC). The SSCH layer
in a node handles three aspects of channel hopping: (i) implementing the node's channel hopping schedule and scheduling packets within each channel, (ii) transmitting the channel hopping schedule to neighboring nodes, and (iii) updating the node's channel hopping schedule based on observed traffic. SSCH thus contains a distributed scheme for coordinating channel switching decisions, but one that only sends a single type of
message, a broadcast packet containing that node’s current channel hopping schedule.
The simulation results show that SSCH yields a significant capacity improvement in both single-hop and multi-hop wireless networks. The main contributions of SSCH are:
• It is a new protocol that increases the capacity of IEEE 802.11 ad-hoc networks by utilizing frequency diversity. The protocol is suitable for a multi-hop environment, does not require changes to the IEEE 802.11 standard, and does not require multiple radios.
• The control traffic in SSCH is distributed across all channels, and thus avoids control channel saturation, a bottleneck in some prior approaches.
• SSCH introduces a second novel technique to achieve good performance for multi-hop flows: it allows a forwarding node to partially synchronize with a source node and partially synchronize with a destination node. This synchronization pattern allows the load for a multi-hop flow to be spread over multiple channels.
In this section, the discussion will be limited to the widely-deployed IEEE 802.11 Dis-
tributed Coordination Function (DCF) protocol [60]. We begin by reviewing some rel-
evant details of this protocol. IEEE 802.11 recommends the use of a Request To Send
(RTS) and Clear To Send (CTS) mechanism to control access to the medium. A sender
desiring to transmit a packet must first sense the medium free for a DCF interframe space
(DIFS). The sender then broadcasts an RTS packet seeking to reserve the medium. If
the intended receiver hears the RTS packet, the receiver sends a CTS packet. The CTS
reserves the medium in the neighborhood of the receiver, and neighbors do not attempt
to send a packet for the duration of the reservation. In the event of a collision or failed
RTS, the node performs an exponential backoff. For additional details, the reader is
referred to [60].
The IEEE 802.11 standard divides the available frequency into orthogonal (non-
overlapping) channels. IEEE 802.11b specifies 11 channels in the 2.4 GHz spectrum,
3 of which are orthogonal, and IEEE 802.11a specifies 13 orthogonal channels in the 5 GHz spectrum. Transmissions on orthogonal channels do not interfere, provided that the communicating nodes on them are reasonably separated (at least 12 inches apart for current hardware).
Using only a single channel limits the capacity of a wireless network. For example,
consider the scenario in Figure 3.1 where 6 nodes are within communication range of
each other, all nodes are on the same channel, and 3 of them have packets to send
to distinct receivers. Due to interference on the single channel, only one of them, in
this case node 3, can be active. In contrast, if all 3 orthogonal channels are used, all
transmissions can take place simultaneously on distinct channels. SSCH captures this potential capacity improvement. The design of SSCH is guided by the following goals:
• SSCH should require only a single radio per node. Some of the previous work
on exploiting frequency diversity has proposed that each node be equipped with
multiple radios [4, 135]. Multiple radios draw more power, and energy consumption is an important concern for many wireless devices. Moreover, by requiring only a single standards-compliant NIC per node, SSCH faces fewer deployment barriers.

Figure 3.1: Only one of the three packets can be transmitted when all the nodes are on the same channel.
• SSCH should not cause logical partitions; any two nodes in communication range
should be able to communicate with each other despite channel hopping. Because
SSCH switches each NIC across frequency channels, different NICs may be on
different channels most of the time. Despite this, any two nodes in communication
range will overlap on a channel with moderate frequency (e.g., at least 10 ms out of every cycle). As we will show in Section 3.5.3, the mathematical properties of the SSCH protocol
guarantee that this overlap always occurs, even in the absence of synchronization.
SSCH exploits frequency diversity using an approach that we term optimistic syn-
chronization. This design makes the common case be that nodes are aware of each
other’s channel hopping schedules. However, SSCH also allows any node to change its
channel hopping schedule at any time. If node A has traffic to send to another node B,
and A knows B’s hopping schedule, A will probably be able to quickly send to B by
changing its own schedule. In the uncommon case that A does not know B’s sched-
ule, or A has out-of-date information about B, then the traffic incurs a latency penalty
while A discovers B’s new schedule. The SSCH design achieves this good common case
behavior when SSCH is used with a workload where traffic patterns change (i.e., new
flows are started) with lower frequency than hopping schedule updates are propagated.
Because hopping schedule update propagation requires only tens of milliseconds, this is
a good workload assumption for many wireless networking scenarios. Section 3.6 gives simulation results supporting this assumption.

A further design challenge is to allow one node, say B, to follow a channel hopping schedule that overlaps half the time with another node A, and half the time with a third node C; this is necessary for node B to forward traffic between A and C. While it is straightforward for node B to have a channel hopping schedule that is an interleaving of A's and C's schedules, this leaves open how B will schedule itself when a fourth node desires to synchronize with B. The channel hopping design described in Section 3.5.2 resolves this issue.
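Why seeded pseudo-random schedules avoid logical partitions can be seen with a small sketch (an illustrative hopping rule in the spirit of the construction in Section 3.5.2, not the exact SSCH schedule): with a prime number of channels, two nodes iterating x ← (x + a) mod 13 with different seeds a must land on the same channel in exactly one slot out of every 13.

```python
# Illustrative seeded hopping over the 13 orthogonal IEEE 802.11a
# channels (a sketch, not the exact SSCH schedule): each node keeps a
# (channel, seed) pair and hops x <- (x + seed) mod 13 in every slot.
N_CHANNELS = 13  # prime, so unequal seeds always yield a rendezvous slot

def schedule(x0, seed, n_slots):
    return [(x0 + i * seed) % N_CHANNELS for i in range(n_slots)]

def overlaps(sched_a, sched_b):
    return sum(1 for a, b in zip(sched_a, sched_b) if a == b)

node_a = schedule(0, 1, 130)  # start channel 0, seed 1
node_b = schedule(5, 2, 130)  # start channel 5, seed 2
# With distinct seeds, the per-slot channel difference hits zero exactly
# once every 13 slots (10 times in 130 slots), so hopping never
# logically partitions the two nodes. A node with traffic for B can
# instead adopt B's (channel, seed) pair and overlap in every slot.
```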
We assume that all nodes are using IEEE 802.11a. SSCH could also be used with other MACs in the IEEE 802.11 family, but evaluation of such options is beyond the scope of this dissertation. We expect wireless cards to be capable of switching across channels. The clocks at all nodes
are assumed to be synchronized to within 1 ms of each other using the Timer Synchro-
nization Function of IEEE 802.11 [58] or its modifications proposed in the literature,
such as ATSP [54, 74] or ASCP [110]. We justified this assumption in Section 2.5.1. As
we discuss in more detail at the beginning of Section 3.6, recent work has reduced this
switching delay to approximately 80 µs [51, 84]. We assume that each wireless card is capable of switching channels at this speed.
We require that NICs with a buffered packet wait after switching for the maximum
length of a packet transmission before attempting to reserve the medium. This prevents
hidden terminal problems from occurring just after switching. This hardware require-
ment is not necessary if the NIC packet buffer can be cleared whenever the channel is
switched.
We divide prior work relevant to SSCH into two categories: prior uses of pseudo-random number generators in wireless protocols, and prior work on exploiting frequency diversity. In the first category, we find that pseudo-random number generators have been used for a variety of tasks in wireless networking. For example, the SEEDEX protocol uses pseudo-random seeds to schedule medium access in a wireless network. Nodes build a schedule for sending and listening on a network, and pub-
lish their seeds to all the neighbors. A node attempts a transmission only when all its
neighbors (including the receiver) are in a listening state. Assuming relatively constant
wireless transmission ranges, this protocol also helps in overcoming the hidden and ex-
posed terminal problem caused by the RTS/CTS approach. The TSMA protocol [31,32]
is a channel access scheme proposed as an alternative to ALOHA and TDMA, for time-
slotted multihop wireless networks. TSMA aims to achieve the guarantees of TDMA while avoiding its requirement of global slot assignment.
Each node is bootstrapped with a fixed seed that determines its transmission sched-
ule. The schedules are constructed using polynomials over Galois fields (which have
pseudo-random properties), and the construction guarantees that each node will overlap
with only a single other node within a certain time frame. The length of the schedule is fixed. Porting this construction to channel hopping remains an open problem, and even such a porting would not meet the SSCH goal of supporting traffic-driven overlap. Redi et al. [33] use a pseudo-random gener-
ator to derive listening schedules for battery-constrained devices. Each device’s seed
is known to a base station, which can then schedule transmissions for the infrequent intervals when the device is listening. Although pseudo-random number generators have been used for a number of tasks (as this survey of the literature makes clear), to the best of our knowledge, SSCH is the first protocol to use a pseudo-random schedule for channel hopping.

The second category consists of prior work on exploiting frequency diversity. This is a significant body of research. The first division we make in this body of work is between approaches that assume a single NIC capable of
communicating on a single channel at any given instance in time, and those that assume
more powerful radio technology, such as multiple NICs [4, 112] or NICs capable of lis-
tening on many channels simultaneously [66,89], even if they can only communicate on
one. Our work falls into the former category; the SSCH architecture can be deployed over a single commodity IEEE 802.11 NIC. Dynamic Channel Assignment (DCA) [112] and the Multi-radio Unification Protocol (MUP) [4] are both technologies that use multiple radios (in both cases, two radios) to
take advantage of multiple orthogonal channels. DCA uses one radio on a control chan-
nel, and the other radio switches across all the other channels sending data. Arbitration
for channels is embedded in the RTS and CTS messages, and is executed on the control
channel. Although this scheme may fully utilize the data channel, it does so at the cost
of using an entire radio just for control. MUP uses both radios for data and control trans-
missions. Radios are assigned to orthogonal channels, and a packet is sent on the radio
with better channel characteristics. This scheme gives good performance in many sce-
narios. However, it still only allows the use of as many channels as there are radios on
each physical node. From our perspective, the key drawback to both DCA and MUP is
simply that they require the use of multiple radios. Recently, commercial products have
appeared that support multiple radios on a single NIC [44]. It is not known whether
these products will achieve as many radios on a NIC as there are available channels, nor at what cost. One way to understand the difference between SSCH and a true multiple radio design is to consider two distinct sources of bottleneck in a single-radio, single-channel system: the saturation of the channel, and the saturation of any individual radio. SSCH alleviates only the former, increasing the aggregate bandwidth available in the network without increasing the bandwidth of any individual radio. In contrast, a true multiple
radio design increases both. A specific example of this difference is that a node using
MUP (a true multiple radio design) can simultaneously send and receive packets on
separate channels, while a node using SSCH can only perform one of these operations
at a time.
We next turn our attention to work assuming more powerful radio technology than commodity IEEE 802.11 NICs. One line of work assumes frequency hopping spread spectrum (FHSS) wireless cards.
to a small fraction of the time required to send a packet, and the wireless NIC is on a
different frequency during each slot. All nodes are required to maintain synchronized
clocks, where the synchronization is at the granularity of slot times that are much shorter
than the duration of a packet. Each slot is subdivided into four segments of time, one of which (the DATA interval) carries the packet data; the other three segments of time are assumed to be small in comparison with the amount of time spent sending a segment of the packet during the DATA time interval. To the best of our knowledge, an FHSS wireless card that supports this type of MAC protocol at high data rates is not currently available.
Another line of related work assumes technology by which nodes can concurrently
listen on all channels. For example, Nasipuri et al. [89] and Jain et al. [66] assume
wireless NICs that can receive packets on all channels simultaneously, and where the
channel for transmission can be chosen arbitrarily. In these schemes, nodes maintain a
list of free channels, and either the sending or receiving node chooses a channel with the
least interference for its data transfer. Wireless NICs do not currently support listening
on arbitrarily many channels, and we do not assume the availability of such technology.
We finally consider prior work that only assumes the presence of a single NIC with a
single half-duplex transceiver. The only other approach that we are aware of to exploit-
ing frequency diversity under this assumption is Multichannel MAC (MMAC) [114].
Like SSCH, MMAC attempts to improve capacity by arranging for nodes to simultaneously use multiple orthogonal channels. Nodes using MMAC periodically switch to a common control channel, negotiate their channel selections, and then switch to the negotiated channel, where they contend for the
channel as in IEEE 802.11. This scheme raises several concerns that SSCH attempts to address. A first concern is that MMAC requires tight clock synchronization; to the extent that these timing requirements are relaxed, MMAC must spend more time on the common control channel, and tight clock synchronization is known to be difficult in multi-hop wireless networks [54]. In contrast, SSCH does not require tight clock synchronization because SSCH does not have a common control channel or a dedicated control phase. A second concern is that control traffic may constitute a significant fraction of the system traffic, and the common synchronization channel can become a bottleneck; SSCH avoids this by distributing synchronization and control traffic across all the available channels. A third concern with MMAC is that it assumes wireless NICs are capable of switching across channels with negligible delay; we believe that a non-trivial switching time better reflects the current state of the art in wireless NIC design, and
SSCH performs well with this assumption. A fourth concern with MMAC is that it may
not efficiently support multi-hop flows because forwarding nodes may not predictably
split their time between their sending and receiving neighbors. SSCH addresses this by allowing a forwarding node to devote some of its slots to receiving and others to sending (Section 3.5.2).
Although this survey does not cover all related work, it does characterize the current
state of the field. At the level of detail in this section, prior work such as CHMA [124]
is similar to HRMA [137], and MAC-SCC [80] and the MAC protocols implicit in the
work of Li et al. [79] and Fitzek et al. [46] are similar to DCA [135]. However, a final
related channel hopping technology that is worth mentioning is the definition of FHSS
channels in the IEEE 802.11 [60] specification. At first glance, it may seem redun-
dant that SSCH does channel hopping across logical channels, each one of which (per
the IEEE 802.11 specification) may be employing frequency hopping across distinct
frequencies at the physical layer. The IEEE 802.11 specification justifies this physi-
cal layer frequency hopping with the scenario of providing support for multiple Basic
Service Sets (BSS’s) that can coincide geographically without coinciding on the same
logical channel. In contrast, SSCH does channel hopping so that any two nodes can
coincide as much or as little of the time as they desire. This is also at the heart of the
difference between SSCH and past work on channel-hopping protocols where nodes
overlap a fixed fraction of the time [32] – in SSCH, the degree of overlap between any two nodes is driven by their traffic demands.
3.5 SSCH
SSCH switches each radio across multiple channels and distributes flows within inter-
fering range of each other on orthogonal channels. This results in significantly increased
network capacity when the network traffic pattern consists of such flows.
SSCH is fully distributed; it does not require synchronization or leader election. Nodes do attempt to synchronize the start and end times of their slots, but SSCH is robust to imperfect slot synchronization (Section 3.6).
SSCH is designed to work with MultiNet, where a slot is defined to be the time a node remains on a single channel before hopping (10 ms in our parameter settings). A longer slot duration would have further decreased the overhead of channel switching, but would have increased the
delay that packets encounter during some forwarding operations. The channel schedule
is the list of channels that the node plans to switch to in subsequent slots and the time at
which it plans to make each switch. Each node maintains a list of the channel schedules
for all other nodes it is aware of – this information is allowed to be out-of-date, but
the common case will be that it is accurate. The good performance exhibited by SSCH does not depend on this information always being accurate. We first describe the packet scheduling decisions that are made by each node within a slot, and we refer to this as the packet schedule
(Section 3.5.1). Next, we define the policy for updating the channel schedule and for
propagating the channel schedule to other nodes (Section 3.5.2). We then describe the
mathematical properties that guided SSCH's design (Section 3.5.3). Finally, we discuss how flows behave under these scheduling policies.
3.5.1 Packet Scheduling

SSCH maintains packets in per-neighbor FIFO queues. These queues maintain standard
higher-layer assumptions about in-order delivery. The per-neighbor FIFO queues are, in addition, useful for limiting the number of packets sent to nodes that are unreachable. It works as follows. At the start of each slot, packets are drawn in round-robin order from all flows. If a packet transmission to a particular neighbor fails, the corresponding flow
is reduced in priority until a period of time equal to one half of a slot duration has
elapsed – this limits the bandwidth wasted on flows targeted at nodes that are currently
on a different channel to at most two packets per slot whenever a flow to a reachable
node also exists. Packets are only drawn from the flows that have not been reduced in priority, whenever such flows exist.
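A minimal sketch of this packet scheduling policy is given below (illustrative Python; the class and method names are ours, not from the implementation, and full round-robin bookkeeping across calls is omitted for brevity):

```python
from collections import defaultdict, deque

SLOT_MS = 10.0  # slot duration from the text

class PacketScheduler:
    """Sketch of SSCH packet scheduling: per-neighbor FIFO queues, and
    temporary demotion (for half a slot) of flows whose last send failed."""

    def __init__(self):
        self.queues = defaultdict(deque)   # neighbor -> FIFO of packets
        self.demoted_until = {}            # neighbor -> time (ms) demotion ends

    def enqueue(self, neighbor, packet):
        self.queues[neighbor].append(packet)

    def on_failure(self, neighbor, now_ms):
        # Reduce the flow's priority for one half of a slot duration.
        self.demoted_until[neighbor] = now_ms + SLOT_MS / 2

    def next_packet(self, now_ms):
        """Return (neighbor, packet), preferring flows that have not been
        reduced in priority; fall back to demoted flows only when no other
        packets exist (bounding the bandwidth wasted on absent nodes)."""
        for allow_demoted in (False, True):
            for neighbor, q in self.queues.items():
                demoted = self.demoted_until.get(neighbor, 0.0) > now_ms
                if q and demoted == allow_demoted:
                    return neighbor, q.popleft()
        return None
```

A failed transmission to one neighbor thus diverts service to other reachable flows for half a slot, after which the demoted flow becomes eligible again.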
Because nodes using SSCH will often be on different channels, broadcast packets
transmitted in any one slot are likely to reach only some of the nodes within physical
communication range. The SSCH layer handles this issue through repeated link-layer retransmission: each broadcast packet is transmitted in a number of consecutive slots. Although broadcast packets sent this way may reach a different set of nodes than if all nodes had been on
the same channel, we have not found this to present a difficulty to protocols employing
broadcast packets — in Section 3.6 we show that as few as 6 transmissions allows DSR
(a protocol that relies heavily on broadcasts) to function well. This behavior is not sur-
prising because broadcast packets are known to be less reliable than unicast packets, and
so protocols employing them are already robust to their occasional loss. However, the
SSCH retransmission strategy may not be compatible with all uses of broadcast, such
as its use for synchronization [43]. Also, deploying SSCH in an environment with a large amount of broadcast traffic would reduce its benefits. Despite the cost of retransmitting broadcast packets, SSCH still delivers significant capacity improvement in the traffic scenarios we study.
An SSCH node with a packet to send may discover that a neighbor is not present on the current channel. However, the node may very well be present on another channel, in which case SSCH should still
deliver the packet. To handle this, we initially retain the packet in the packet queue.
Packets are dropped only when SSCH gives up on all packets to a given destination, and
this dropping of an entire flow occurs only when we have failed to transmit a packet to
the destination node for an entire cycle through the channel schedule. We will explain
the meaning of a cycle through the channel schedule in Section 3.5.2, but with our cho-
sen parameter settings the timeout is 530 ms. After a flow has been garbage collected,
new packets with the same destination inserted in the queue are assigned to a new flow.
This packet scheduling policy is simple to implement, and yields good performance
in the common case where node schedules are known, and information about node avail-
ability is accurate. A potential drawback is that a node crash (or other failure events)
can lead to a number of wasted RTSs to the failed node. When summed across channels,
the number may exceed the IEEE 802.11 suggested value of 7 retransmission attempts
for RTS packets. In Section 3.6, we quantify the cost of such failures and show that it is
small.
3.5.2 Channel Scheduling

We begin our description of channel scheduling by describing the data structure used to
represent the channel schedule. We then describe the policy nodes use to act on their own
channel schedule, the mechanism to communicate channel schedules to other nodes, and
finally the policy nodes implement for updating or changing their own channel schedule.
The channel schedule must capture a given node’s plans for channel hopping in the
future, and there is obvious overhead to representing this as a very long list. Instead, we
compactly represent the channel schedule as a current channel and a rule for updating
the channel – in particular, as a set of 4 (channel, seed) pairs. Our experimental results
show that 4 pairs suffice to give good performance (Section 3.6). We represent the
(channel, seed) pair as (xi, ai). The channel xi is represented as an integer in the range [0, 12] (13 possibilities), and the seed ai is represented as an integer in the range [1, 12].
Each node iterates through all of the channels in the current schedule, switching to the
channel designated in the schedule in each new slot. The node then increments each of its channels after the corresponding slot, using its seed:

xi ← (xi + ai) mod 13
We introduce one additional slot to prevent logical partitions. After the node has
iterated through every channel on each of its 4 slots, it switches to a parity slot whose
channel assignment is given by xparity = a1. The term parity slot is derived from the
analogy to the parity bits appended at the end of a string in some error correcting codes.
The mathematical justification for this design is given in Section 3.5.3. We use the term
cycle to refer to the 530 ms iteration through all the slots, including the parity slot.
In Figure 3.2, we illustrate possible channel schedules for two nodes in the case of 2
slots and 3 channels. In the Figure, node A and node B are synchronized in one of their
two slots (they have identical (channel, seed) pairs), and they also overlap during the
parity slot. The field of the channel schedule that determines the channel during each
slot is shown in bold. Each time a slot reappears, the channel is updated using the seed.
For example, node A’s slot 1 initially has (channel, seed) = (1,2). The next time slot 1
is entered, the channel is updated by adding the seed to it mod 3 (mod 3 because in this example there are 3 channels).
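The iteration and update rule can be made concrete with a short sketch (illustrative Python, not the dissertation's implementation; the function name `cycle` is ours). With 4 slots, 13 channels, and 10 ms slots, one cycle lasts (4 × 13 + 1) × 10 ms = 530 ms, including the parity slot.

```python
def cycle(schedule, num_channels):
    """Yield the channel visited in each slot of one full cycle.

    `schedule` is a list of [channel, seed] pairs, one per slot; it is
    mutated as each slot's channel is advanced by its seed."""
    for _ in range(num_channels):      # each slot steps through all channels
        for pair in schedule:
            yield pair[0]              # hop to this slot's current channel
            pair[0] = (pair[0] + pair[1]) % num_channels
    yield schedule[0][1]               # parity slot: channel = seed of slot 1
```

Replaying node A of Figure 3.2 ((channel, seed) pairs (1, 2) and (2, 1) with 3 channels) yields the channel sequence 1, 2, 0, 0, 2, 1 followed by parity channel 2, matching the figure.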
Nodes switch from one slot to the next according to a fixed schedule (every 10 ms
in our current parameter settings). However, the decision to switch channels may occur
while a node is transmitting or receiving a packet. In this case we delay the switch until
after the transmission and ACK (or lack thereof) have occurred.
Nodes learn each other’s schedules by periodically broadcasting their seeds and the
offset within this cycle through the channel schedule.

A: 1 2 0 0 2 1 2 1 (slots: 1 2 1 2 1 2 parity 1)
B: 1 0 0 1 2 2 2 1 (slots: 1 2 1 2 1 2 parity 1)
Figure 3.2: Channel hopping schedules for two nodes with 3 channels and 2 slots. Node A always overlaps with Node B in slot 1 and the parity slot. The field of the channel schedule that determines the channel during each slot is shown in bold.

We use the IEEE 802.11 Long Control Frame Header format to embed both the schedule and the node's current offset –
this is discussed in more detail in Section 3.6.4. The SSCH layer at each node schedules this announcement once per slot.
Nodes also update their knowledge of other nodes’ schedules by trying to communi-
cate and failing. Whenever a node sends an RTS to another node, and that node fails to
respond even though it was believed to be in this slot, the node sending the RTS updates
the channel schedule for the other node to reflect that it does not currently know the other node's schedule.
We now turn to the question of how a given node changes its own schedule. Schedules are updated in two ways: each node attempts to ensure that its slots start and stop
at roughly the same time as other nodes, and that its channel schedule overlaps with the schedules of nodes for which it has packets to send. We embed the information needed for this synchronization within the Long Control Frame Header as well. Using this information, a simple averaging scheme such as described by Elson et al. [43] can be applied to achieve the loose synchronization required for good performance (Section 3.6 shows that a 100 µs clock skew is well tolerated).
At a high level, each node achieves overlap with nodes for which it has traffic
straightforwardly, by changing part of its own schedule to match that of the other nodes.
However, a number of minor decisions must be made correctly in order to achieve this behavior. In the example of Figure 3.3, node A has two slots, with (channel, seed) pairs represented by A1 and A2; nodes B and C are similarly configured.
Nodes recompute their channel schedule right before they enqueue the packet an-
nouncing this schedule in the NIC (and so at least once per slot). In a naive approach,
this node could examine its packet queue, and select the (channel, seed) pairs which
lead to the best opportunity to send the largest number of packets. However, this ignores
the interest this node has in receiving packets, and in avoiding congested channels. An
example of the kind of problem that might arise if one ignores the interest in receiving
packets is given in Figure 3.3. Here, A synchronized with B, and then B synchronized
with C in such a way that A was no longer synchronized with B. This could have been
avoided if B had used its other slot to synchronize with C, as it would have if it had considered its interest in receiving packets.
To account for this node’s interest in receiving packets, we maintain per-slot counters
for the number of packets received during the previous time the slot was active (ignoring
broadcast packets). Any slot that received more than 10 packets during the previous
iteration through that slot is labeled a receiving slot; if all slots are receiving slots, any
one is allowed to be changed. If some slots are receiving slots and some are not, only
the (channel, seed) pair on a non-receiving slot is allowed to be changed for the purpose of synchronizing with additional nodes.
SSCH has to avoid the scenario where all nodes in a network converge on the same
(channel, seed) pair value. This situation could arise in a number of scenarios. For
example, if a node, say A, initiates a flow to another node, say B, and then node C
initiates a flow to node A, then A, B and C will synchronize to the same (channel,
seed) value. Moreover, if these were the only nodes in the network, they would never
change their (channel, seed) value. This situation is a problem for SSCH since all nodes
will hop to the same channel in every slot, and therefore all flows will be on the same
channel. Hence, the benefits of channelization are lost, and SSCH becomes equivalent
to a single-channel MAC.
A node compares the (channel, seed) pairs of all nodes from which it received packets
in a given slot, with the list of (channel, seed) pairs of all the other nodes in its list of
channel schedules. If the number of nodes synchronized to the same (channel, seed)
pair is more than twice the number that this node communicated with in the previous
occurrence of the slot, we attempt to de-synchronize it from these other nodes. De-
synchronization just involves choosing a new (channel, seed) pair for this slot.
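The de-synchronization test described above can be sketched as follows (hypothetical Python helper; the name and argument layout are ours, not taken from the implementation):

```python
def should_desynchronize(my_pair, neighbor_pairs, num_communicated):
    """Decide whether to pick a new (channel, seed) pair for one slot.

    my_pair          -- this node's (channel, seed) pair for the slot
    neighbor_pairs   -- (channel, seed) pairs of all other known nodes
    num_communicated -- nodes this node exchanged packets with during the
                        previous occurrence of this slot
    """
    synchronized = sum(1 for p in neighbor_pairs if p == my_pair)
    # Far more nodes share this pair than we actually talk to: de-synchronize.
    return synchronized > 2 * num_communicated
```

The factor of two provides hysteresis: a node tolerates some surplus synchronization before giving up a slot assignment that its active flows may still depend on.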
Figure 3.4: Need for De-synchronization: All nodes converge to the same channel
without de-synchronization.
The need for de-synchronization is illustrated in Figure 3.4. Our protocol is simu-
lated for 10 stationary nodes, and one of them is randomly picked as a test node. All
nodes are within communication range of each other, the slot duration is 10 ms, and
each node has 4 (channel, seed) pairs. We consider IEEE 802.11a [59], which has 13
orthogonal channels. Initially, every node starts a flow to a randomly chosen destination
node for a random duration between 1 and 500 ms. At the end of a flow, a node starts
a different flow with a randomly picked destination and duration. Figure 3.4 plots the
number of neighbors of the test node that have the same (channel, seed) pair in a slot as
the test node. Without de-synchronization, the number of nodes with the same (channel,
seed) pair increases monotonically over time for each of the 4 (channel, seed) values.
After around 370 slots, which is 370*10 ms = 3.7 seconds, all 9 neighbors of the test
node converge to the same (channel, seed) pair on all slots. Consequently, all nodes
always switch to the same channel all the time, and SSCH becomes equivalent to a single-channel MAC.
The final constraints we add moderate the pace of change in schedule information.
Each node only considers updating the (channel, seed) pair for the next slot, never for
slots further in the future. If the previous set of criteria suggest updating some slot other
than the next slot, we delay that decision. Given these constraints, picking the best pos-
sible (channel, seed) pair simply requires considering the choice that synchronizes with
the set of nodes for which we have the largest number of queued packets. Additionally,
the (channel, seed) pair for the first slot is only allowed to be updated during the par-
ity slot – this helps to prevent logical partition, as will be explained in more detail in
Section 3.5.3.
Under these policies, a source node will find that it can assign all of its slots to support sends. A sink node will
find that it rarely changes its slot assignment, and hence nodes sending to it can easily
stay synchronized. A forwarding node will find that some of its slots are used primarily
for receiving; after re-assigning the channel and seed in a slot to support sending, the
slots that did not change are more likely to receive packets, and hence to stabilize on
their current channel and seed as receiving slots for the duration of the current traffic
patterns. Our simulation results (Section 3.6) support this conclusion. We refer to the resulting behavior as traffic-driven overlap.
3.5.3 Mathematical Properties

Our discussion of the mathematical properties of SSCH will initially focus on the static
case. The behavior of SSCH when channel schedules are not changing assures us that in
a steady-state flow setting, nodes will rendezvous appropriately, in a sense that we make
precise below. We will then expand our discussion to include the dynamics of channel
scheduling in an environment where flows are starting and stopping. In our discussion,
we assume that all nodes use IEEE 802.11 to synchronize their clocks within 1 ms of
each other, and there are no Byzantine failures in the network. A node never sends false channel schedule information.
The channel scheduling mechanism has three simultaneous design goals: allowing
nodes to be synchronized in a slot, infrequent overlap between nodes that do not have
data to send to each other, and ensuring that all nodes come into contact occasionally (to
avoid a logical partition). To achieve these goals, we rely on a very simple mathematical property of addition modulo a prime.
Consider two nodes that want to be synchronized in a given slot. If they have iden-
tical (channel, seed) pairs for this slot, then clearly they will remain synchronized in
future iterations (using the static assumption). Now consider two nodes that are not syn-
chronized because they have different seeds. A simple calculation shows that these two
nodes will overlap exactly one out of every 13 iterations in this slot (recall that 13 is
the number of channels). This is the behavior we want from these nodes: they overlap
regularly enough that they can exchange their channel schedules, but they are mostly on different channels.
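The once-per-13-iterations property holds because the channel difference between the two nodes advances by the (nonzero) seed difference modulo the prime 13, visiting every residue exactly once before repeating. A small sketch (illustrative Python; the helper is our own) makes this easy to check:

```python
def overlaps_per_cycle(x1, a1, x2, a2, p=13):
    """Count how many of p consecutive iterations of one slot put two nodes
    on the same channel, under the update x <- (x + a) mod p."""
    count = 0
    for _ in range(p):
        count += (x1 == x2)
        x1, x2 = (x1 + a1) % p, (x2 + a2) % p
    return count
```

Any two distinct seeds overlap exactly once per 13 iterations; identical (channel, seed) pairs overlap in all 13; identical seeds with different channels never overlap in regular slots, which is the lock-step case the parity slot exists to repair.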
Now consider the rare case that two nodes share identical seeds in every slot, but
different channels accompany each seed – this has at most a 1 in 13^4 ≈ 28,000 chance
of occurring for randomly chosen (channel, seed) pairs. In this case, the nodes will
march in lock-step through the same set of channels in each slot, never overlapping.
This would be problematic, and it is this situation that the parity slot prevents. To justify
this claim, we consider two distinct situations. If both nodes enter their parity slot at
the same time, then they overlap there because the parity channel is equal to the seed
for the first slot for both nodes. With our chosen parameter settings of 10 ms per slot,
4 slots, and 13 channels, this overlap occurs once every 530 ms and lasts for 10 ms. If
their parity slots do not occur at the same time, then the first node’s parity slot offers a
fixed target for the slot in which the second node is changing channels, and again, the
two nodes will overlap. This overlap occurs once every 7 seconds. Although both these
cases will be rare, the SSCH time synchronization mechanism allows us to ignore the
second case entirely – a relative clock skew of 5 ms or less is sufficient to guarantee that the two nodes' parity slots overlap in time.
Now considering the dynamic case (and assuming clock synchronization to within
5 ms), we note that nodes are not permitted to change the seed for the first of their four
slots except during a parity slot. Therefore they will always overlap in either the first slot
or the parity slot, and hence will always be able to exchange channel schedules within a single cycle.
The use of addition modulo a prime to construct channel hopping schedules does
not restrict SSCH to scenarios where the number of channels is a prime number. If one
desired to use SSCH with a wireless technology where the number of channels is not a
prime, one could straightforwardly use a larger prime as the range of xi , and then map
down to the actual number of channels using a modulus reduction. Though the mapping
would have some bias to certain channels, the bias could be made arbitrarily small by choosing a sufficiently large prime.
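The bias of such a modulus reduction can be quantified directly (illustrative Python; `reduction_bias` is our own helper, not part of SSCH):

```python
from collections import Counter

def reduction_bias(prime, num_channels):
    """Ratio of the most- to least-frequent channel when hop values
    0..prime-1 are mapped down via x mod num_channels."""
    counts = Counter(x % num_channels for x in range(prime))
    return max(counts.values()) / min(counts.values())
```

For example, with 12 channels a prime of 13 makes channel 0 twice as likely as the others (ratio 2.0), while a prime of 1201 shrinks the ratio to 101/100 = 1.01.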
A final point about the use of addition modulo a prime is that SSCH can be modified
to require fewer bits to represent a node's schedule by reducing the number of choices for a seed. The only penalty to this reduction is increasing the protocol's reliance on the parity slot.

3.6 Simulation Results

This section presents the simulation results of SSCH in QualNet and compares its performance with the commonly used single-channel IEEE 802.11a protocol. Subsection 3.6.1 presents microbenchmarks of SSCH's overheads, and later subsections extend the evaluation to encompass mobility and multihop routing. Our results show that SSCH incurs very
low overhead, and significantly outperforms IEEE 802.11a in a multiple flow environ-
ment.
Nodes are placed within a 200m area. All nodes in a single simulation run use the same MAC, either SSCH or
IEEE 802.11a. All nodes are set to operate at the same raw data rate, 54 Mbps. We
assume 13 usable channels in the 5 GHz band. SSCH is configured to use 4 seeds,
and each slot duration is 10 ms. All seeds are randomly chosen at the beginning of
each simulation run. The macrobenchmarks in subsections 3.6.2 and 3.6.3 are averages
from 5 independent simulation runs, while the microbenchmarks in subsection 3.6.1 are taken from a single representative run.
We primarily measure throughput under a traffic load of maximum rate UDP flows.
In particular, we use Constant Bit Rate (CBR) flows of 512 byte packets sent every 50
µs. This data rate is more than the sustainable throughput of IEEE 802.11a operating at
54 Mbps.
For all our simulations, we modified QualNet to use a channel switch delay of 80
µs. This choice was informed by recent work in solid state electronics on reducing
the settling time of the Voltage Control Oscillator (VCO) [85]. Switching the channel
of a wireless card requires changing the input voltage of the VCO, which operates in
a Phase Locked Loop (PLL) to achieve the desired output frequency. The delay in
channel switching is due to this settling time. The specification of Maxim IEEE 802.11b
Transceivers [84] shows this delay to be 150 µs. More recent work [51] shows that this delay can be reduced substantially, informing our choice of 80 µs.
3.6.1 Microbenchmarks
We first measure the overhead during the successful initiation of a CBR stream, and of failing to initiate a parallel CBR stream. We then measure the overhead of continuing to attempt transmissions to a mobile node that has moved out of range.
These scenarios cover many of the different dynamic events that a MAC must appro-
priately handle: a flow starting while a node is present, a flow starting while a node is
absent, simultaneous flows where both nodes are present, simultaneous flows where one
node moves out of range, etc. Finally, the last scenario (Section 3.6.1) measures the robustness of SSCH to clock skew.

Switching and Synchronizing Overhead

We begin by measuring the overhead of initiating a maximum rate CBR stream between two nodes within communication range of each other. The first node initiates the stream just after the parity slot. This incurs a worst-case delay in synchronization,
because the first of the four slots will not be synchronized until 530 ms later.
In Figure 3.5, we graph the instantaneous throughput at the receiver node. The
sender quickly synchronizes with the receiver on three of the four slots, as it should, and
on the fourth slot after 530 ms. The figure shows the throughput while synchronizing
(oscillating around 3/4 of the raw bandwidth), and the time required to synchronize.
After synchronizing, the channel switching and other protocol overheads of SSCH lead
to only a 400 Kbps penalty in the steady-state throughput relative to IEEE 802.11a.
This penalty conforms to our intuition about the overheads in SSCH: a node spends 80
µs every 10 ms switching channels (80 µs/10 ms = .008), and then typically forgoes roughly one of the approximately 35 packet transmission opportunities per slot while settling on the new channel (1 packet/35 packets = .028). Adding these two overheads together leads to an expected cumulative overhead of 3.6%, which is in close agreement with the measured penalty.
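The arithmetic behind this estimate can be checked directly (the per-slot packet count of roughly 35 is taken from the figures above):

```python
# Back-of-the-envelope check of the SSCH overhead estimate from the text.
switch = 80e-6 / 10e-3   # 80 µs of switching per 10 ms slot -> 0.008
settle = 1 / 35          # roughly 1 of ~35 packets per slot -> ~0.028
total = switch + settle  # ~0.036, i.e. about 3.6% cumulative overhead
```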
Note that the throughput of the session reaches a maximum of only 13 Mbps, al-
though the raw data rate is 54 Mbps. This low utilization can be explained by the IEEE
802.11a requirement that the RTS/CTS packets be sent at the lowest supported data rate (6 Mbps), which consumes a large fraction of each exchange's airtime.
Overhead of an Absent Node

SSCH requires more re-transmissions than IEEE 802.11 in order to prevent logical partitions. These retransmissions waste bandwidth that could have been dedicated to a node
that was present on the channel. To quantify this overhead, we initiated a CBR stream
between two nodes, allowed the system to quiesce, and then initiated a send from the
first node to a non-existent node.

Figure 3.5: Switching and Synchronizing Overhead: Node 1 starts a maximum rate UDP flow to Node 2. We show the throughput for both SSCH and IEEE 802.11a.

We present a moving average of the throughput over 80 ms in Figure 3.6. It shows that the sender takes 530 ms to timeout on the non-existent
node. During this time the session throughput drops by 550 Kbps, which is a small penalty.
Next, we quantify the ability of SSCH to fairly share bandwidth between two flows, and
to quickly achieve this fair sharing. We start with Node 1 sending a maximum rate UDP
stream to Node 2. At 21.5 seconds, Node 1 starts a second maximum rate UDP stream
to Node 3.
Figure 3.6: Overhead of an Absent Node: Node 1 is sending a maximum rate UDP stream to Node 2, and also attempts a send to an absent node.

Figure 3.7 presents a moving average of the throughput achieved by both nodes over a period of 140 ms. It illustrates the instantaneous throughput achieved at Nodes 2 and 3 (the receivers). The bandwidth is split between the receivers nearly perfectly (and with little variance).
Overhead of Mobility
Ideally, SSCH should be able to detect a link breakage due to movement of a node, and
subsequently re-synchronize to other neighbors. We show that SSCH can indeed handle
this scenario with an experiment comprising 3 nodes and 2 sessions, and in Figure 3.8
we present a moving average of each session throughput, averaged over a period of 280
ms.
Node 1 is initially sending a maximum rate UDP stream to Node 2. Node 1 initiates
a second UDP stream to Node 3 at around 20.5 seconds. This bandwidth is then shared
between both the sessions (as in the experiment of Section 3.6.1) until 30 seconds, when
Node 3 moves out of the communication range of Node 1. Our experiment configures
Node 1 to continue to attempt to send to Node 3 until 43 seconds, and during this time Node 2's throughput suffers from the bandwidth wasted on the absent node (Section 3.6.1 measured the overhead of enqueueing a single packet to an absent node). When the stream to Node 3 finally stops, Node 2's received throughput increases back to its earlier level.
Figure 3.8: Overhead of Mobility: Node 1 is sending a maximum rate UDP stream to Node 2. Node 1 starts another maximum rate UDP session to Node 3. Node 3 moves out of range at 30 seconds, while Node 1 continues to attempt to send until 43 seconds.

Overhead of Clock Skew

As we described in Section 3.5.2, SSCH tries to synchronize slot begin and end times, though it is also designed to be robust to clock skew. In this experiment, we quantify the robustness of SSCH to moderate clock skew. We measure the throughput between two nodes after artificially introducing a clock skew between them, and disabling the SSCH synchronization scheme for slot begin and end times. We vary the clock skew from 1 ns
(10^-6 ms) to 1 ms such that the sender is always ahead of the receiver by this value, and
present the results in Figure 3.9. Note the log scale on the x-axis.
The throughput achieved between the two nodes is not significantly affected by a clock skew of less than 10 µs. The drop in throughput is larger for greater clock skews, although the throughput is still acceptable at 10.5 Mbps even when the skew is an entire millisecond.
These results provide justification for the design choice we made not to require nodes
to switch synchronously across slots, as described in Section 3.5.2. For example, a node
will delay switching to receive an ACK, or to send a data packet if its channel reservation is successful. In the 100 node experiment described in Section 3.6.3, we measured the resulting skew between nodes and found it to remain small.

Figure 3.9: Overhead of Clock Skew: Throughput between two nodes using SSCH as the clock skew between them is varied.
We now present simulation results showing SSCH’s ability to achieve and sustain a
consistently high throughput for a traffic pattern consisting of multiple flows. We first
evaluate this using steady state UDP flows. We then extend our evaluation to consider
a dynamic traffic scenario where UDP flows both start and stop. Finally, we study the performance of TCP flows.
Disjoint Flows
We first look at the number of disjoint flows that can be supported by SSCH. All nodes
in this experiment are in communication range of each other, and therefore two flows
are considered disjoint if they do not share either endpoint. Ideally, SSCH should utilize
the available bandwidth on all the channels as the number of disjoint flows in the system increases. We evaluate this by varying the number of nodes in the network from 2 to
30 and introducing a flow between disjoint pairs of nodes — the number of flows varies
from 1 to 15.
Figure 3.10: Disjoint Flows: The throughput of each flow on increasing the number
of flows.
Figure 3.10 shows the average per-flow throughput, and Figure 3.11 shows the total
utilized system throughput. IEEE 802.11a performs marginally better when there is just one flow in the network. When there is more than one flow, SSCH significantly outperforms IEEE 802.11a.
An increase in the number of flows decreases the per-flow throughput for both SSCH
and IEEE 802.11a. However, the drop for IEEE 802.11a is much more significant. The
drop for IEEE 802.11a is easily explained by Figure 3.11, which shows that the overall system throughput of IEEE 802.11a remains nearly flat as the number of flows increases.
Figure 3.11: Disjoint Flows: The system throughput on increasing the number of
flows.
It may seem surprising that the SSCH system throughput has not stabilized at 13 times the throughput of a single flow by the time there are 13 flows. However, this can be explained by the fact that the random channel choices do not lead to a perfectly balanced allocation, and therefore there is still unused spectrum even when there are 13 flows in the system, as shown by the continuing increase in system throughput in Figure 3.11.
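This imbalance is essentially a balls-in-bins effect. The following sketch is ours, not the dissertation's simulation code; it assumes 13 orthogonal channels (consistent with the factor-of-13 ceiling above) and simplifies each flow to a single uniformly random channel choice:

```python
import random

def idle_channel_fraction(channels=13, flows=13, trials=20000):
    """Estimate the expected fraction of channels that no flow picks
    when each flow chooses a channel uniformly at random."""
    idle = 0
    for _ in range(trials):
        used = {random.randrange(channels) for _ in range(flows)}
        idle += channels - len(used)
    return idle / (trials * channels)

# Analytically, a given channel is idle with probability (1 - 1/13)^13,
# about 0.35, so roughly a third of the spectrum goes unused.
print(idle_channel_fraction())
```

This is why system throughput keeps climbing past 13 flows: additional flows land on previously idle channels with non-trivial probability.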
Non-disjoint Flows
We now consider the case when the flows in the network are not disjoint – nodes par-
ticipate as both sources and sinks, and in multiple flows. This scenario stresses SSCH’s
ability to efficiently support sharing among simultaneous flows that have a common
endpoint. Each node in the network starts a maximum rate UDP flow with one other
randomly chosen node in the network. We vary the number of nodes (and thus flows)
from 2 to 20. As in the previous experiment, all nodes are within communication range
of each other. We present the per-flow and system throughput for SSCH and IEEE
802.11a in Figures 3.12 and 3.13 respectively. The curves are not monotonic because
variation in the random choices leads to some receivers being recipients in multiple
flows (and hence bottlenecks). This lack of monotonicity persisted even after averag-
ing over 5 simulation runs. As in the disjoint flow experiment, SSCH performs slightly
worse in the case of a single flow, but much better in the case of a large number of flows.
Figure 3.12: Non-disjoint Flows: The average throughput of each flow on increas-
ing the number of flows. There is a flow from every node in the network.
Effect of Flow Duration

SSCH introduces a delay when flows start because nodes must synchronize. This over-
head is more significant for shorter flows. We evaluate this overhead for maximum rate
UDP flows with different flow lengths.

Figure 3.13: Non-disjoint Flows: The system throughput on increasing the number of flows.

In the first experiment the flow duration is chosen randomly between 20 and 30 ms, while for the second experiment it is between 0.5
and 1 second. In both the experiments, each node starts a flow with a randomly selected
node, discards all packets at the end of the designated sending window, pauses for a
second at the end of the flow, and then starts another flow with a new randomly selected
node. This process continues for 30 seconds. We run these experiments for both SSCH
and IEEE 802.11a, and vary the number of nodes from 2 to 16. We present the ratio
of the average throughput achieved by SSCH to that achieved by the flows when using IEEE 802.11a in Figure 3.14.
For small numbers of sufficiently short-lived flows, IEEE 802.11a offers superior throughput because SSCH pays a synchronization overhead at flow startup. However, as soon as there are more than 4 simultaneous flows in the network,
the ability of SSCH to spread transmissions across multiple channels leads to a higher
total throughput than IEEE 802.11a in both the short and long flow scenarios.
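The dynamic traffic pattern above can be sketched as a per-node schedule generator (a simplified illustration with parameter names of our choosing, not the simulator's code):

```python
import random

def flow_schedule(total=30.0, flow_min=0.5, flow_max=1.0, pause=1.0):
    """Return (start, stop) times for one node's flows: each flow's
    duration is drawn uniformly from [flow_min, flow_max], followed
    by a one-second pause before the next flow begins."""
    t, events = 0.0, []
    while t < total:
        length = random.uniform(flow_min, flow_max)
        events.append((t, min(t + length, total)))  # clip at the end of the run
        t += length + pause
    return events

schedule = flow_schedule()  # the 0.5 to 1 second flow scenario
```

The 20-30 ms scenario corresponds to `flow_min=0.02, flow_max=0.03`; each flow would be directed at a freshly chosen random peer.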
Figure 3.14: Effect of Flow Duration: Ratio of SSCH average throughput to IEEE 802.11a average throughput, when varying the number of nodes.
TCP over SSCH

We now study the behavior of TCP over SSCH. SSCH allows a node to stay synchro-
nized to multiple nodes over different slots. However, this might cause significant jitter
in packet delivery times, which could adversely affect TCP. To evaluate this concern
quantitatively, we run an experiment where we vary the number of nodes in the network
from 2 to 9, such that all nodes are in communication range of one another. We then start
an infinite-size file transfer over FTP from each node to a randomly selected other node.
This choice to use non-disjoint flows is designed to stress the SSCH implementation by requiring nodes to stay synchronized with multiple other nodes. In Figure 3.15 we present the resulting cumulative steady-state TCP throughput.
Figure 3.15 shows that the TCP throughput for a small number of flows is lower
Figure 3.15: TCP over SSCH: Steady-state TCP throughput when varying the number of flows.
for SSCH than the throughput over IEEE 802.11a. However, as the number of flows
increases, SSCH does achieve a higher system throughput. Although TCP over SSCH
does provide higher aggregate throughput than over IEEE 802.11a, the performance
improvement is not nearly as good as for UDP flows. This shows that jitter due to
SSCH does have an impact on the performance of TCP. A more detailed analysis of
the interaction between TCP and SSCH, and modifications to support better interactions
between TCP and SSCH, is a subject we plan to address in our future work.
We now evaluate SSCH’s performance when combined with multihop flows and mobile
nodes. We first analyze the behavior of SSCH in a multihop chain network. We then
consider large scale multihop networks, both with and without mobility. As part of this
analysis, we study the interaction between SSCH and MANET routing protocols.
The throughput limitations of multihop chains over a single channel are well documented in prior work [136]. For example, if all nodes are on the same channel, the RTS/CTS mechanism allows at most one hop in an A–B–C–D chain to be active at any given time. SSCH
reduces the throughput drop due to this behavior by allowing nodes to communicate on
different channels. To examine this, we evaluate both SSCH and IEEE 802.11a in a multihop chain.

Figure 3.16: Multihop Chain: The maximum throughput of a single flow as the number of nodes in the chain increases.
We vary the number of nodes, which are all in communication range, from 2 to 18.
We initiate a single flow that encounters every node in the network. Although more
than 4 nodes transmitting within interference range of each other would be unlikely
to arise from multihop routing of a single flow, it could easily arise in a more general
distributed application. Figure 3.16 shows the maximum throughput as the number of
nodes in the chain is varied. We see that there is not much difference between SSCH
and IEEE 802.11a for flows with few hops. As the number of hops increases, SSCH
performs much better than IEEE 802.11a since it distributes the communication on each hop across different channels.
We now analyze the performance of SSCH in a large scale multihop network without
mobility. We place 100 nodes uniformly in a 200 × 200 m area, and set each node to
transmit with a power of 21 dBm. The Dynamic Source Routing (DSR) [68] protocol
is used to discover the source route between different source-destination pairs. These
source routes are then fed into a static variant of DSR that does not perform discovery
or maintain routes. We vary the number of maximum rate UDP flows from 10 to 50. We
generate source and destination pairs by choosing randomly, and rejecting pairs that are
We present the average flow throughput in Figure 3.17. Increasing the number of
flows leads to greater contention, and the average throughput of both SSCH and IEEE
802.11a drops. For every considered number of flows, SSCH provides significantly
higher throughput than IEEE 802.11a. For 50 flows, the inefficiencies of sharing a
single channel are sufficiently pronounced that SSCH yields more than a factor of 15
capacity improvement.
Figure 3.17: Multihop Mesh Network of 100 Nodes: Average flow throughput on increasing the number of flows.
Previous work on multi-channel MACs has often overlooked the effect of channel switch-
ing on routing protocols. Most of the proposed protocols for MANETs, such as DSR [68],
and AODV [97] rely heavily on broadcasts. However, neighbors using a multi-channel
MAC could be on different channels, which could cause broadcasts to reach signifi-
cantly fewer neighbors than in a single-channel MAC. SSCH addresses this concern by transmitting each broadcast packet in multiple slots.
We study the behavior of DSR [68] over SSCH in the same experimental setup used
in Section 3.6.3, with 100 nodes in a 200 m×200 m area. However, we reduce the trans-
mission power of each node to 16 dBm to force routes to increase in length (and hence
to stress DSR over SSCH). We select 10 source-destination pairs at random, and we use
DSR to discover routes between them. In Figure 3.18 we compare the performance of
DSR over SSCH, when varying the SSCH broadcast transmission count parameter (the number of slots in which each broadcast packet is transmitted).

Figure 3.18: The average time to discover a route and the average route length for 10 randomly selected source-destination pairs, when varying the broadcast transmission count.
Figure 3.18 shows that the performance of DSR over SSCH improves with an in-
crease in the broadcast transmission count. The DSR Route Request packets see more
neighbors when SSCH broadcasts them over a greater number of slots. This increases
the likelihood of discovering shorter routes, and the speed with which routes are dis-
covered. However, there seems to be little additional benefit to increasing the broadcast
parameter to a value greater than 6. The slight bumpiness in the curves can be attributed to randomness across simulation runs.
Comparing SSCH to IEEE 802.11a, we see that SSCH discovers routes that are
comparable in length. However, the average route discovery time for SSCH is much
higher than for IEEE 802.11a. Because each slot is 10 ms in length, broadcasts are only
retransmitted once every 10 ms, and this leads to a significantly longer time to discover a
route to a given destination node. We believe that this latency is a fundamental difficulty
in using a reactive protocol such as DSR with SSCH. We plan to explore the interaction
of other proactive and hybrid routing protocols with SSCH in the future.
We now present the impact of mobility in a network using DSR over IEEE 802.11a
and SSCH. In this experiment, we place 100 nodes randomly in a square and select 10
flows. Each node transmits packets at 21 dBm. Node movement is determined using
the Random Waypoint model. In this model, each node has a predefined minimum and
maximum speed. Nodes select a random point in the simulation area, and move towards
it with a speed chosen randomly from the interval. After reaching its destination, a node
rests for a period chosen from a uniform distribution between 0 and 10 seconds. It then
chooses a new destination and repeats the procedure. In our experiments, we fix the
minimum speed at 0.01 m/s and vary the maximum speed from 0.2 to 1.0 m/s. Although
we have studied SSCH at higher speeds, the results are not significantly different. We
performed this experiment using two different areas for the nodes, a 200m × 200m area
and a 300m × 300m area. We refer to the smaller area as the dense network, and the
larger area as the sparse network – the average path is 0.5 hops longer in the sparse
network. For all these experiments, we set the SSCH broadcast transmission count
parameter to 6.
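The Random Waypoint model described above can be sketched as follows (an illustrative rendering with our own parameter names, not the simulator's implementation):

```python
import math
import random

def random_waypoint(side=200.0, v_min=0.01, v_max=1.0,
                    rest_max=10.0, duration=120.0, dt=0.1):
    """Trace one node under the Random Waypoint model: pick a random
    destination in the square, move toward it at a speed drawn from
    [v_min, v_max], rest for a uniform time in [0, rest_max], repeat."""
    x, y = random.uniform(0, side), random.uniform(0, side)
    t, trace = 0.0, []
    while t < duration:
        gx, gy = random.uniform(0, side), random.uniform(0, side)
        speed = random.uniform(v_min, v_max)
        steps = max(1, int(math.hypot(gx - x, gy - y) / (speed * dt)))
        for i in range(1, steps + 1):
            # linear interpolation toward the waypoint, one sample per dt
            trace.append((t, x + (gx - x) * i / steps,
                             y + (gy - y) * i / steps))
            t += dt
        x, y = gx, gy
        t += random.uniform(0.0, rest_max)  # pause at the waypoint
    return trace

trace = random_waypoint()
```

The sparse-network runs would use `side=300.0`, and the speed sweep corresponds to varying `v_max` from 0.2 to 1.0 m/s.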
Figure 3.19: Dense Multihop Mobile Network: The per-flow throughput and the average route length for 10 flows in a 100 node network in a 200m × 200m area, when varying the maximum node speed.

Figure 3.19 shows that in a dense network, SSCH yields much greater throughput than IEEE 802.11a even when there is mobility. Although DSR discovers shorter
routes over IEEE 802.11a, the ability of SSCH to distribute traffic on a greater number
of channels leads to much higher overall throughput. Figure 3.20 evaluates the same
benchmarks in a sparse network. The results show that the per-flow throughput de-
creases in a sparse network for both SSCH and IEEE 802.11a. This is because the route
lengths are greater, and it takes more time to repair routes. However, the same qualitative comparison continues to hold: SSCH causes DSR to discover longer routes, but still yields much higher throughput.
DSR discovers longer routes over SSCH than over IEEE 802.11a because broadcast
packets sent over SSCH may not reach a node’s entire neighbor set. Furthermore, some
optimizations of DSR, such as promiscuous mode operation of nodes, are not as effective
in a multi-channel MAC such as SSCH.

Figure 3.20: Sparse Multihop Mobile Network: The per-flow throughput and the average route length for 10 flows in a 100 node network in a 300m × 300m area, when varying the maximum node speed.

Thus, although the throughput of mobile nodes using DSR over SSCH is much better than their throughput over IEEE 802.11a, we
conclude that a routing protocol that takes the channel switching behavior of SSCH into account could achieve even better performance.
When simulating SSCH in QualNet [62], we made two technical choices that seem to
be relatively uncommon based on our reading of the literature. The first technical choice
relates to how we added SSCH to an existing system, and the second relates to a little-used packet format in the IEEE 802.11 standard.
In order to implement SSCH, we had to implement new packet queuing and retrans-
mission strategies. To avoid requiring modifications to the hardware (in QualNet, the
hardware model) or the network stack, SSCH buffers packets below the network layer,
but above the NIC device driver. To maintain control over transmission attempts, we
configure the NIC to buffer at most one packet at a time, and to attempt exactly one RTS
for each packet before returning to the SSCH layer. By observing NIC-level counters
before and after every attempted packet transmission, we are able to determine whether
a CTS was heard for the packet, and if so, whether the packet was successfully trans-
mitted and acknowledged. All the necessary parameters to do this are exposed by the QualNet NIC model.
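The per-packet control loop can be sketched as follows; the NIC interface and counter names here are our illustrative assumptions, not QualNet's actual API:

```python
class FakeNic:
    """Stand-in for the simulator's NIC model: buffers one packet and
    makes exactly one RTS attempt per transmit() call (illustrative)."""
    def __init__(self, cts=True, ack=True):
        self.cts_heard = 0          # CTS frames heard for our RTS attempts
        self.acks_received = 0      # ACKs received for our data packets
        self._cts, self._ack = cts, ack

    def transmit(self, packet):
        if self._cts:
            self.cts_heard += 1
            if self._ack:
                self.acks_received += 1

def send_one_packet(nic, packet):
    """Classify one transmission attempt by comparing NIC counters
    before and after, as the SSCH layer does."""
    before = (nic.cts_heard, nic.acks_received)
    nic.transmit(packet)
    if nic.cts_heard == before[0]:
        return "no-cts"             # no CTS: channel busy or receiver away
    if nic.acks_received == before[1]:
        return "lost"               # CTS heard but data frame not ACKed
    return "delivered"

print(send_one_packet(FakeNic(), "data"))  # delivered
```

Keeping at most one packet in the NIC at a time is what lets the SSCH layer retain full control over retransmission and queuing policy.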
For efficiency reasons, we choose to use the IEEE 802.11 Long Control Frame
Header format to broadcast channel schedules and current offsets, rather than using a full
broadcast data packet. The most common control frames in IEEE 802.11 (RTS, CTS,
and ACK) use the alternative short format. The long format was included in the IEEE
802.11 standard to support inter-operability with legacy 1-Mbps and 2-Mbps DSSS sys-
tems [60]. The format contains 6 unused bytes; we use 4 to embed the 4 (channel, seed)
pairs, and another 2 to embed the offset within the cycle (i.e., how far the node has progressed through its current schedule).
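A sketch of that encoding follows; the nibble-level layout and big-endian offset are our assumptions, since the text fixes only the 4-byte/2-byte split:

```python
import struct

def pack_schedule(pairs, offset):
    """Pack four (channel, seed) pairs plus a 16-bit cycle offset into
    the 6 unused header bytes: one byte per pair, with the channel in
    the high nibble and the seed in the low nibble."""
    assert len(pairs) == 4
    body = bytes((ch << 4) | seed for ch, seed in pairs)
    return body + struct.pack(">H", offset)

def unpack_schedule(blob):
    """Inverse of pack_schedule: recover the pairs and the offset."""
    pairs = [(b >> 4, b & 0x0F) for b in blob[:4]]
    (offset,) = struct.unpack(">H", blob[4:6])
    return pairs, offset

blob = pack_schedule([(3, 7), (0, 1), (12, 5), (9, 2)], 417)
assert len(blob) == 6
assert unpack_schedule(blob) == ([(3, 7), (0, 1), (12, 5), (9, 2)], 417)
```

Four bits per field suffice because both the channel index and the seed are drawn from a set of 13 values.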
Lastly, we comment that the beaconing mechanism used in IEEE 802.11 ad-hoc
mode for associating with a Basic Service Set (BSS) works unchanged in the presence
of SSCH. A newly-arrived node can associate to a BSS as soon as it overlaps on the same channel with a node already in the BSS.
This section discusses alternative designs for SSCH within the constraints that were outlined earlier.
SSCH distributes the rendezvous and control traffic across all the channels. One
straightforward alternative scheme, which still only requires one radio, is to use one of
the channels as a control channel, and all the other channels as data channels (e.g., [66]).
Each node must then somehow split its time between the control channel and the data
channels.
Such a scheme will have difficulty in preventing the control channel from becoming
a bottleneck. Suppose that two nodes exchange RTS/CTS on the control channel, and
then switch to a data channel to do transmission. Unless all other nodes were also on the
control channel during the RTS/CTS exchange, these two nodes will still need to do an
RTS/CTS on this channel in order to avoid the hidden terminal problem. The two nodes
should wait to even do the RTS/CTS until after an entire packet transmission interval
has elapsed, because another pair of nodes might have also switched to this channel,
orchestrating that decision on the control channel during a time that the first pair of
nodes were not on the control channel. In order to amortize this startup cost, the nodes
should have several packets to send to each other. However, while any one node remains
on a data channel, any other node that desires to send it a packet must remain idle on
the control channel waiting for the node it desires to reach to re-appear. If the idle node
on the control channel chooses not to wait, and instead switches to a data channel with
another node for which it has traffic, it may repeatedly fail to rendezvous with the first node.

The problems with a dedicated control channel may be solvable, but it is clear that a dedicated control channel introduces significant complications. If one instead tried to synchronize rendezvous on the control channel, the control channel could again become a bottleneck simply because many nodes simultaneously desire to rendezvous.
3.8 Future Work

SSCH is a promising technology. In our future work, we plan to investigate how SSCH
will perform when implemented over actual hardware, and subjected to normal environmental conditions such as fluctuating signal strength. As part of this implementation effort, we also plan to evaluate how metrics
reflecting environmental conditions, such as ETX [40], can be integrated into SSCH.
Our results in Section 3.6.3 show that existing routing protocols do not give the best
performance over SSCH. In particular, we find that the time to discover a route can be
quite large in a reactive routing protocol being run over SSCH. In the future, we plan
to more thoroughly evaluate routing over SSCH (as opposed to classical single channel
routing), and to explore a wider variety of proactive and hybrid routing protocols over
SSCH.
There are at least four additional topics that would also need to be addressed be-
fore SSCH can be deployed. One is interoperability with nodes that are not running
SSCH. Another is the evaluation of power consumption under this scheme. We have
not attempted to evaluate the energy cost of switching channels, nor have we attempted
to enable a power-saving strategy such as in the IEEE 802.11 specification for access-
point mode. A third topic of investigation is the evaluation of SSCH in conjunction with
3.9 Summary
We have presented SSCH, a new protocol that extends the benefits of channelization
to ad-hoc networks. This protocol is compatible with the IEEE 802.11 standard, and is
suitable for a multi-hop environment. SSCH achieves these gains using a novel approach to distributed rendezvous and synchronization.
We have shown through extensive simulation that SSCH yields significant capacity improvements over IEEE 802.11a. We look forward to exploring SSCH in more detail using an implementation over actual hardware. More information about SSCH and the QualNet simulation code can be
Work on SSCH was done jointly with people at Microsoft Research. The SSCH
protocol was co-developed with John Dunagan. Victor Bahl was involved in the entire
research project and made sure that we proceeded in the right direction. Finally, this
4.1 Introduction
The convenience of wireless networking has led to a wide-scale adoption of IEEE 802.11
networks [58]. Corporations, universities, homes, and public places are deploying these networks at a rapid pace. Unfortunately, wireless networks also create new problems for end-users and network administrators. Users experience a number of problems, such as intermittent connectivity, degraded performance, and authentication failures. These problems occur due to a variety of reasons such as poor access point layout, device misconfiguration, hardware and software errors, and the nature of the wireless medium itself.
Figure 4.1 shows the number of such wireless-related complaints logged by the Information Technology (IT) department of a major US corporation over a period of six months. The company has a large deployment of IEEE 802.11 networks with several
thousand Access Points (APs) spread over more than forty buildings. Each complaint is
an indication of end-user frustration and loss of productivity for the corporation. Furthermore, each complaint imposes a cost on the IT department; our research revealed that this cost is several tens of dollars, and this does not include the cost due to the loss of end-user productivity.
To resolve complaints quickly and efficiently, network administrators need tools for
detecting, isolating, diagnosing, and correcting faults. To the best of our knowledge,
there is no previous research that addresses fault diagnostic problems in IEEE 802.11
infrastructure networks. However, as discussed in Section 4.3, there has been consid-
erable prior work on fault diagnosis in other settings, which we can leverage here. The
Figure 4.1: The number of wireless-related complaints logged per month at a major US corporation over a six-month period.
importance of diagnosing these problems in the “real-world” is apparent from the num-
ber of companies that offer solutions in this space [5, 7, 39, 103, 131]. These products
do a reasonable job of presenting statistical data from the network; however, they lack a systematic approach for gathering and analyzing the data to establish the possible causes of a problem. Furthermore,
most products only gather data from the APs and neglect the client-side view of the
network. Some products that monitor the network from the client’s perspective require
hardware sensors, which can be expensive to deploy and maintain. Also, current solu-
tions do not provide any support for disconnected clients even though these are the ones
that need the most help. We discuss these products in more detail in Section 4.3.
This chapter presents a flexible architecture for detecting and diagnosing faults in
infrastructure wireless networks. We instrument wireless clients and (if possible) access
points to monitor the wireless medium and devices that are nearby. Our architecture
supports both proactive and reactive fault diagnosis. We use this monitoring framework
to address some of the problems plaguing wireless users. We present a novel technique
called Client Conduit that enables disconnected clients to diagnose their problems with
the help of nearby clients. This technique takes advantage of the beaconing and probing
mechanisms of IEEE 802.11 to ensure that connected clients do not pay unnecessary
overheads for detecting disconnected clients. We also present a simple technique for locating disconnected clients, and a mechanism that uses nearby wireless clients for diagnosing wireless network performance problems.
Finally, we show how our monitoring architecture naturally lends itself to detecting Rogue APs. We have implemented and evaluated the basic architectural framework, Client Conduit, and Rogue AP
detection on the Windows operating system using off-the-shelf IEEE 802.11 network
cards; we have evaluated our other mechanisms using tools such as AiroPeek [132] and
WinDump [134]. Our results show that our techniques are effective; furthermore, they impose low overheads.
• We believe ours is the first work to identify fault diagnosis in IEEE 802.11 infrastructure networks as an important problem. Our approach is novel in the wireless context since we use clients (and if possible, infrastructure APs) to monitor the network.
• We describe a simple and efficient technique called Client Conduit that allows disconnected clients to diagnose their problems with the help of nearby connected clients.
• We present novel solutions that use our architecture for detecting and diagnosing a number of common wireless faults.
Our work is just a first step in the direction of self-healing wireless networks and
there are a number of issues that still need to be addressed. From the vast number of
wireless problems faced by end-users and network administrators everyday, we have fo-
cused only on a subset of those problems; our selection was based on conversations
with network administrators [24] along with the high-priority problems observed in
user-complaint logs. Even though some of our techniques are applicable to other de-
ployments (e.g., hotspots, homes), our main emphasis has been diagnosing faults in en-
terprise wireless networks. We ensure that our techniques do not introduce new security
attacks but we do not focus on denial-of-service and greedy MAC attacks [101].
The rest of the chapter is organized as follows: In Section 4.2, we discuss the most
important problems that users and network administrators complain about with respect
to wireless LAN deployment. Section 4.4 describes the components of our client-based
architecture. Section 4.5 presents the Client Conduit protocol. Section 4.6 focuses on lo-
cating disconnected clients, performance isolation, and Rogue AP detection. Section 4.7
describes the implementation of our system and Section 4.8 presents an evaluation of
our techniques. Section 4.3 discusses related work. Finally, we discuss future work in
We enumerate the most important problems that users and network administrators face
when using and maintaining corporate wireless networks. Our list has been derived from the user-complaint logs of a major corporation, which has over 4,400 IEEE 802.11 APs distributed over forty buildings in the company.
Connectivity problems: Users complain about poor or missing connectivity in certain areas of a building. Such “dead spots” or “RF holes” can occur due to weak coverage in parts of the building. Once network administrators locate an RF hole, they can then resolve the problem by either relocating APs or increasing the density of
APs in the problem area or by adjusting the power settings on nearby APs for better
coverage.
Performance problems: This category includes all the situations where a client ob-
serves degraded performance, e.g., low throughput or high latency. There could be a
number of reasons why the performance problem exists, e.g., traffic slow-down due to congestion or interference in the wireless medium, or due to a poorly configured client/AP. Performance problems can also occur as a result of prob-
lems in the non-wireless part of the network, e.g., due to a slow server or proxy. It is
therefore necessary that the diagnostic tool be able to determine whether the problem is
in the wireless network or elsewhere. Furthermore, identifying the cause in the wireless
part is important for allowing network administrators to better provision the system and resolve such problems quickly.
Network security: Large enterprises often use solutions such as IEEE 802.1x [57] to
secure their networks. However, a nightmare scenario for IT managers occurs when employees connect unauthorized APs to the corporate network; this situation is commonly referred to as the “Rogue AP Problem” [5, 7, 36]. These Rogue APs are one of the
most common and serious breaches of wireless network security. Due to the presence of
such APs, external users are allowed access to resources on the corporate network; these
users can leak information or cause other damage. Furthermore, Rogue APs can cause
interference with other access points in the vicinity. Detecting Rogue APs in a large network is thus an important and challenging task.

Authentication problems: Many user complaints are related to users’ inability to authenticate themselves to the network. In wire-
less networks secured by technologies such as IEEE 802.1x [57], authentication failures
are typically due to missing or expired certificates. Thus, detecting such authentication failures is important. In this chapter, we focus on a subset of these problems, including detecting Rogue APs, and helping a client to recover from an authentication problem via Client Conduit. As part of our future work, we will investigate diagnosis of authentication problems in more depth.
4.3 Related Work

To the best of our knowledge, there has been no previous research on fault diagnosis in IEEE 802.11 infrastructure networks. However, there are a number of commercial products that provide varying degrees of support for network management tasks, e.g.,
AirWave [7], Network Systems and Management (NSM) [39], Wireless Security Advi-
sor [103], AirDefense [5], SpectraMon/SpectraGuard [131], AirMagnet [6], and Sym-
bol [123]. Due to their proprietary nature, the available descriptions typically cover the
feature-set and not the techniques; the comparison below is based on our understanding
of their brochures.
The emphasis in most of these products is more towards managing wireless networks
rather than diagnosing faults. These tools allow network administrators to obtain and vi-
sualize data from access points, upgrade firmware, manage security policies, etc. Some
of them also provide real-time WLAN performance monitoring through IEEE 802.11 statistics such as the signal strength and the number of clients associated with the AP, etc. Even though these low-level statistics are useful for network administra-
tors, it is more desirable to provide higher level fault detection and diagnosis, e.g., our
approach detects network performance problems and pinpoints the components that are
problematic.
Many of these products (e.g., AirWave, Unicenter) operate from the AP or the server
side only, i.e., clients are not instrumented. Given the asymmetry and variability of the
wireless medium, observing data from the client-side is important for fault diagnosis,
e.g., since conditions such as interference near the client can be drastically different
than the conditions near the AP, client-side information is needed to do a detailed performance diagnosis. In contrast, our client-based architecture allows us to assist disconnected clients via Client Conduit, locate Rogue APs and disconnected clients, and isolate performance problems.
Some products like AirMagnet and AirDefense obtain the complete view of the network by deploying dedicated hardware sensors, which capture packets and pass all the packets to the server for analysis. Anecdotal evidence from talking to various network administrators suggests that products that use sensor-based monitoring are expensive to deploy and maintain; often very few sensors are deployed due to the cost and the network traffic they generate. Our approach instead uses regular wireless clients and access points instrumented with software “sensors”. One limitation of our approach is that we rely on the presence of nearby clients for diagnosing some of the wireless faults; however, the increasing usage of wireless clients in organizations is making this requirement easier to satisfy.
Since Rogue APs are a serious security problem, all the products listed above per-
form Rogue AP detection. Unlike our solution, most of these products achieve this goal
either by using other APs [7, 39] or by using specialized sensors [5, 6, 131]; as discussed
above, these approaches have deployment and fault-detection limitations. Our technique
of using both clients and APs for detecting Rogue APs is similar to the Symbol tech-
nique [123]. However, unlike their approach, our technique can also detect Rogue APs
that use MAC address spoofing of real APs; furthermore, we leverage our client and AP monitoring infrastructure to do so.
None of the above products provide solutions for assisting disconnected clients even
though they need the most help. Our Client Conduit mechanism allows live and reactive
diagnosis to be performed for such clients that are unable to access the infrastructure
wireless network.
The notion of making wireless clients snoop the environment for ensuring secure and
correct routing has been suggested for ad hoc networks. In [83], the authors propose a watchdog mechanism for detecting misbehaving nodes. The basic idea is to have watchdog nodes observe their neighbors and determine
if they are forwarding traffic as expected; this approach for detecting routing anomalies
has been further refined by others as well [15,27]. Inspired by the watchdog mechanism,
we also use nearby clients to monitor the RF conditions and traffic flow around them; in
our architecture, the watchdog mechanism is used for fault detection (e.g., Rogue APs)
and fault diagnosis (e.g., Client Conduit, locating disconnected clients, performance
isolation). Recent work [101] has used snooping wireless clients for detecting greedy
and malicious behavior in hotspot environments; these techniques are orthogonal to our work. There has also been prior work on diagnosing performance problems in the Internet. For example, Barford et al. [19] use traffic traces at the end points and clas-
sify delays as occurring due to a slow server, a slow client, or the network. While EDEN
has similar goals over a wireless network, it does so without requiring tracing support
from both end points. Tulip [82] is another approach for diagnosing delays over Internet
paths. The client sends ICMP packets and uses their responses from different compo-
nents to determine the cause, such as lost packets, packet reordering, or queueing delay.
EDEN also uses ICMP packets. However, the broadcast nature of the wireless medium enables EDEN to use a novel approach of snooping these packets as a mechanism for fault diagnosis.
In this section, we first discuss the requirements and then describe the components that constitute our diagnosis architecture.
Before we describe the system components, we enumerate the requirements for our
system:
• We require that the software on clients be augmented for monitoring. In our sys-
tem, software modifications on APs are needed only for better scalability and for
analyzing an AP’s performance (Section 4.6.2). Since our approach does not re-
quire hardware modifications, “the bar” for deploying our system is lower.
• For some of our mechanisms, we need the ability to control beacons and probes.
We also require that clients have the capability of starting an infrastructure net-
work (i.e., become an AP) or an ad hoc network on their own; this ability is sup-
ported by many wireless cards, e.g., Atheros [14] and Native WiFi [86].
• We rely on the availability of a database that keeps track of the location of all
the access points; such location databases are typically maintained by network
administrators.
• Some of our techniques require the presence of nearby clients or access points.
With the increasing deployment of access points and the widespread use of wireless
laptops, this requirement is easy to satisfy in these environments. In fact, based on SNMP data collected from
APs over a period of two days, we observed the presence of 13-15 associated
wireless clients on our floor (approximately 2500 sq. meters) during working
hours of the day; thus, with such client densities, there is a high likelihood that a
nearby client is available to assist in diagnosis.
Compared with the existing products that require deploying special wireless sen-
sors throughout the enterprise, our approach takes advantage of nearby clients and
access points instrumented with software “sensors”, thereby imposing a lower de-
ployment cost.
Our system consists of the following components: a Diagnostic Client (DC) that runs
on every wireless client, a Diagnostic Access Point module (DAP) that runs on each Access
Point, and a Diagnostic Server (DS) that runs on a backend server of the organization.
Diagnostic Client module or DC: The Diagnostic Client module monitors the RF en-
vironment and the traffic flow from neighboring clients and APs. Note that during nor-
mal activity, the client’s wireless card is not placed in promiscuous mode. The DC
uses the collected data to perform local fault diagnosis. Depending on the individual
fault detection mechanism, the DC sends data to the DAP or the DS
at regular intervals, e.g., for Rogue AP detection, the DC in our prototype sends MAC
and channel information of nearby APs every 30 seconds. In addition, the DC is geared
to accept commands from the DAP or the DS to perform on-demand data gathering,
e.g., switching to promiscuous mode and analyzing a nearby client’s performance prob-
lems. In case the wireless client becomes disconnected, the DC logs data to a local
database/file. This data can be analyzed by the DAP or DS at some future time when
the client regains connectivity.
Diagnostic Access Point module or DAP: The Diagnostic AP’s main function is to ac-
cept diagnostic messages from DCs, merge them along with its own measurements and
send a summary report to the DS. The Diagnostic AP is not a fundamental requirement
of our architecture; it is primarily needed for offloading work from the DS. Most of our
techniques can work in an environment with a mixture of legacy APs and DAPs: if an
AP is a legacy AP, its monitoring functions are performed by the DCs and its summa-
rizing functions and checks are performed at the DS. In the rest of the chapter, for ease
of exposition, we do not distinguish between these two cases unless necessary.
Diagnostic Server module or DS: The Diagnostic Server accepts data from DCs and
DAPs and performs the appropriate analysis to detect and diagnose different faults. The
[Figure 4.2: Schematic of the fault diagnosis system. The Diagnostic Server (DS) obtains authorization information from the RADIUS, Kerberos, and DHCP servers; DCs send monitoring information to DAPs, which forward diagnostic messages and actions to the DS (legacy APs are bypassed); a disconnected peer outside an AP's coverage area reaches the DS through a nearby DC via Client Conduit.]
DS also has access to a database that stores each AP’s location. Network administrators
may deploy multiple DSs in the system to balance the load, e.g., each AP’s MAC address
could be hashed to a particular DS. In the rest of the chapter, we present our mechanisms
assuming a single DS.
Figure 4.2 gives a schematic view of our fault diagnosis system. As shown, the
Diagnostic Server interacts with other network servers e.g., the RADIUS [105] and Ker-
beros [90] servers, to get client authorization and user information. Our architecture
allows disconnected clients to communicate with the DS via a nearby connected client
using the Client Conduit protocol; this mechanism is presented in Section 4.5.
Our system supports both reactive and proactive monitoring. In proactive moni-
toring, DCs and DAPs monitor the system continuously: if an anomaly is detected by
a DC, DAP, or DS, an alarm is raised for a network administrator to investigate. The
reactive monitoring mode is used when a member of the support staff wants to diagnose a user
complaint. The staff member can issue a directive to a DC from one of the DSs to collect
and analyze the data for diagnosing the problem. We believe that it is acceptable to
increase the network and CPU load (on the DCs, DAPs, DSs) by a small amount during
reactive monitoring; of course, in the proactive case, these overheads must be kept low.
Our architecture itself imposes negligible overheads with respect to power management
and resource wastage. Both the proactive and reactive techniques presented later in this chapter con-
sume very little bandwidth, CPU, or disk resources; as a result, they should have negli-
gible impact on battery consumption. Only during data transfer in Client Conduit does
a connected client incur noticeable overhead. To ensure
that the helping client's applications (or battery) are not affected significantly, it is of-
fered a knob to control the amount of resources it wants to devote for this transfer (see
Section 4.5.2).
Table 4.1 shows the various problems diagnosed in this chapter, the entities (DCs,
DAPs, and DSs) involved in the diagnosis, and whether the solution can be used with
legacy APs.
We have designed our system to scale with the number of clients and APs.
The two shared resources in our system are DSs and DAPs. To prevent a single Di-
agnostic Server from becoming a potential bottleneck in our system, the design allows
more DSs to be added as the system load increases.
Table 4.1: Different fault diagnosis mechanisms and entities that can diagnose
them; the last column indicates if the solution can be supported using legacy APs
Furthermore, we offload work from
each individual DS by sharing the diagnosis burden with the DCs and the DAPs. The
DS is used only when the DCs and DAPs are unable to diagnose the problem and the
analysis requires a global perspective and additional data, e.g., signal strength informa-
tion obtained from multiple DAPs may be needed for locating a disconnected client. As
stated earlier, the presence of legacy APs degrades scalability since the work usually
offloaded to a DAP must instead be performed by the DCs and the DS.
Similarly, since the DAP is a shared resource, making it do extra work can potentially
hurt the performance of all its associated clients. To reduce the load on a DAP, different
fault diagnosis mechanisms can use a simple technique that we refer to as Busy AP
Optimization: with this optimization, an AP does not perform active scanning if any
client is associated with it; the associated clients perform these operations as needed.
The AP continues to perform passive monitoring activities that have a negligible effect
on its performance. If there is no client associated, the AP is idle and it can perform these
monitoring operations. This approach ensures that most of the physical area around the
AP is monitored without degrading the AP's performance.
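The Busy AP Optimization amounts to a simple guard; a minimal sketch (the function and parameter names are ours, not the dissertation's API):

```python
def ap_should_active_scan(num_associated_clients: int) -> bool:
    # An AP performs active scanning only when idle. When clients are
    # associated, they perform the scans as needed, and the AP restricts
    # itself to passive monitoring, which has negligible performance impact.
    return num_associated_clients == 0
```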
The interactions between the DC, DAP, and DS are secured using EAP-TLS [2] certifi-
cates issued over IEEE 802.1x. An authorized certification authority (CA) issues certifi-
cates to DCs, DAPs and DSs; we use these certificates to ensure that all communication
among these entities is mutually authenticated. Researchers have developed techniques for detecting greedy and malicious behavior for
hotspot environments [101]; others have suggested techniques to handle problems due
to false information sent by malicious clients to central entities such as the DS [99].
These approaches are complementary to our design and could be included in our sys-
tem.
This section presents a novel mechanism called Client Conduit that allows disconnected
clients to be diagnosed with the help of nearby connected clients.
If a wireless client cannot connect to the network, the DC logs the problem in its
database. When the client is connected later (e.g., via a wired connection), this log is
uploaded to the DS, which performs the diagnosis to determine the cause of the problem.
However, sometimes it is possible that this client is in the range of other connected
clients; this client may be disconnected since it is just outside the range of any AP, or
due to some other fault. In such cases, the client could perform
diagnosis with the DS immediately and, if possible, rectify the problem. We now focus
on this scenario.
At first thought, one may ask: why not have the disconnected node simply send a
message to its connected neighbor? Unfortunately, this approach does not work because
IEEE 802.11 does not allow a client to be connected to two networks at the same time.
Since the connected node has already associated to an infrastructure network, it cannot
simultaneously participate in an ad hoc network. If the connected node
wants to receive a message from D, it first has to disconnect and then join the ad hoc
network started by the disconnected client.
One can imagine solving this problem using multiple radios on the connected client
(one dedicated on an ad hoc network for diagnosis), or using MultiNet (which allows a
client to multiplex a single wireless card such that it is present on multiple networks), or
by making a connected client periodically scan all channels. All these approaches have
drawbacks. Dedicating extra hardware on every client is an expensive solution for a
problem that is expected to occur infrequently.
the wireless card across channels or networks can cause packet drops at the connected
client. In the MultiNet case, the wireless card will periodically spend time on the ad
hoc network, and will thus consume bandwidth on the connected client. On the other
hand, our Client Conduit approach imposes no overheads in the common case when no
client needs help.
We now discuss our Client Conduit protocol that allows a disconnected client to be di-
agnosed by a DS via one of the connected clients. Client Conduit achieves its efficiency
(of not penalizing connected clients) by exploiting two operational facts about the IEEE
802.11 protocol. First, even when a client is associated to an AP, it continues to re-
ceive beacons from neighboring APs or ad hoc networks at regular intervals. Second,
a connected client can send directed or broadcast Probe Requests without disconnect-
ing from the infrastructure network. We now present the Client Conduit protocol for a
disconnected client D and a connected client C (see
Figure 4.3). In the following description, we refer to the first 4 steps of the protocol as
the Connection Setup phase and the last step as the Data Transfer phase.
Figure 4.3: Client Conduit Mechanism (Steps 1 through 5 are described below)
1. The disconnected client D first places its wireless card in promiscuous mode. It
scans all channels to determine if any nearby client is connected, and then becomes
an AP on the channel on which it detected the client's packets. For the reasons
discussed in Section 4.4.1, and for efficiency, D takes on the AP role itself.
2. This newly-formed AP at D now broadcasts its beacon like a regular AP, with an
SSID of the form "SOS HELP <num>", where num is a 32-bit random number that
distinguishes D from other clients requesting help.
3. The DC on the connected client C detects the SOS beacon of this new AP. At
this point, C needs to inform D that its request has been heard and that it can stop
beaconing. To do so, C sends a Probe Request of the form "SOS ACK <num>" to D. Note that the
Probe Request is sent with a different SSID than the one being advertised by
the AP running on D. This approach prevents some other nearby client that is
not involved in the Client Conduit protocol from inadvertently sending a Probe
Request to D (as part of that client’s regular tests of detecting new APs in its
environment).
4. When D hears this Probe Request (and perhaps other requests as well), it stops
being an AP, and becomes a station again. Note that in response to the Probe
Request, a Probe Response is sent out by D; client C now knows that it does
not need to send more Probe Requests (it would have stopped anyway when D's
beacons ceased). D answers exactly one of the helpers, indicating that it would
like to use client C as a hop for exchanging diagnostic messages with the DS.
This response mechanism ensures that if multiple connected clients try to help D,
only one of them is chosen by D for setting up the conduit with the DS.
5. Now D starts an ad hoc network and C joins this network via MultiNet [30]. At
this point, C becomes a conduit for D's messages, and D can exchange diagnostic
messages with the DS.
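The SSID handshake of steps 2 through 4 can be sketched as follows; this is an illustrative sketch, and the exact SSID formatting (separator, decimal encoding of <num>) and function names are our assumptions, not part of the protocol description:

```python
import random

def make_sos_ssid() -> str:
    # Step 2: the disconnected client D beacons "SOS HELP <num>",
    # where <num> is a 32-bit random number.
    return f"SOS HELP {random.getrandbits(32)}"

def make_ack_ssid(sos_ssid: str) -> str:
    # Step 3: a helper C answers with a Probe Request carrying
    # "SOS ACK <num>". The ACK SSID deliberately differs from the one D
    # advertises, so other clients' routine probes for new APs are not
    # mistaken for acknowledgments.
    num = sos_ssid.rsplit(" ", 1)[1]
    return f"SOS ACK {num}"

def is_matching_ack(sos_ssid: str, probe_ssid: str) -> bool:
    # Step 4: D accepts only an acknowledgment tagged with its own number.
    return probe_ssid == make_ack_ssid(sos_ssid)
```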
The key advantage of the Client Conduit protocol is that connected clients do not
experience unnecessary overheads during normal operation. Their overheads during the
Connection Setup and Data Transfer phases are also small and controllable.
It is important to note that the Client Conduit mechanism can also be used for boot-
strapping clients. For example, suppose that a client tries to access a wireless network
for the first time and does not have EAP-TLS certificates, but has other credentials, such
as a valid password; Client Conduit can then be used to authenticate
with the backend Radius/Kerberos servers. New certificates can then be installed on the
client machine; similarly, a client's expired certificates can also be refreshed without
requiring a wired connection. This is particularly useful given the prevalence
of IEEE 802.1x authentication problems [24]. Client Conduit can be used if a connected
client is in range as well. If there is no such client, one can dynamically configure the
AP to allow D’s diagnostic messages to the back end DS (or to the RADIUS servers
who can forward to the DS) via the uncontrolled port [57].
We must ensure that the Client Conduit protocol does not introduce any new security
leaks or opportunities for denial-of-service attacks in the system. To ensure that a mali-
cious/unauthorized client does not obtain arbitrary access to the network, the connected
client allows a disconnected client's packets to be exchanged only with the DS or back-
end authentication servers.
We now discuss two potential abuses of Client Conduit: hurting the performance of
connected clients and mounting denial-of-service attacks.
When a connected client C helps a disconnected client via Client Conduit, we need to
ensure that C's applications' performance is not adversely affected. During the Con-
nection Setup part of Client Conduit, the connected client C need only process
the beacon message and send/receive probe messages; no messages are forwarded
by C on the disconnected client’s behalf. These steps not only consume negligible re-
sources on C but they also do not result in any security leak or compromise on C; of
course, C can further rate-limit or stop performing these steps if it discovers that the
mechanism is being abused.
We now consider the Data Transfer part of the protocol for possible security and
performance problems. During this phase, C operates in MultiNet mode, which consumes resources on the
connected client [30]. There are two problems that need to be addressed. First, a mali-
cious client should not be allowed to waste a connected client C’s resources by making
it enter MultiNet mode unnecessarily. Second, even when helping a legitimate client, C
should be able to control the amount of resources that it wants to allocate for the discon-
nected client D during the MultiNet transfer. The second problem can be addressed by
providing a knob to the client that allows it to limit the percentage of time that it spends
on the ad hoc network relative to the infrastructure network; client C may also limit
this usage to save battery power. Section 4.8.2 characterizes the disconnected client's
throughput for different settings of this knob.
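One plausible realization of this knob, assuming MultiNet time-slices the card over a fixed switching period (the period, default values, and names below are our assumptions):

```python
def ad_hoc_budget_ms(fraction: float, period_ms: int = 100) -> float:
    # The helper C devotes at most `fraction` of each switching period to
    # the ad hoc (diagnosis) network; the remainder stays on the
    # infrastructure network, bounding the impact on C's own traffic
    # and battery.
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    return fraction * period_ms
```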
To prevent the first problem due to malicious clients, we add the following authen-
tication step before Data Transfer to ensure that only legitimate clients are allowed to
After the Connection Setup phase, client C switches to MultiNet mode for perform-
ing an authentication step with D. To prevent a malicious client from forcing C into
MultiNet mode repeatedly, C can limit the number of times per minute that it performs
such an authentication step. As part of the authentication step, client C obtains the
EAP-TLS machine certificate from the disconnected client and validates it (for ensuring
mutual authentication, client D can perform these steps as well). If the disconnected
client has no certificates or its certificates have expired, client C acts as an intermediary
for running the desired authentication protocol, e.g., C could help D perform Kerberos
authentication from the back end Kerberos servers and obtain the relevant tickets. If
the disconnected client D still cannot authenticate, C asks D to send the last (say) 10
KBytes of its diagnosis log to C and C forwards this log to the DS. To prevent a possible
DoS attack in which a malicious client tries to send this unauthenticated log repeatedly
(e.g., while spoofing different MAC addresses), the connected client can limit the total
amount of unauthenticated data that it sends in a fixed time period; e.g., C could cap the
total unauthenticated bytes it forwards per hour. Finally, a Rogue AP that wants to
remain undetected may try to exploit the properties of Client Conduit. The attacker's
AP can be set up to beacon with an SOS SSID; our Rogue AP detection mechanism
(Section 4.6.3) will assume that this beaconing device is actually a disconnected client
and not declare it as a Rogue AP. Thus, we need to distinguish between the cases where
the beaconing device is a legitimate client and where it is actually a Rogue AP.
The key observation is that when a legitimate disconnected client starts an infrastructure
network during the Connection Setup and starts beaconing, it does not send or receive
any data packets. Thus, if a DC ever detects an AP (or a node in ad hoc mode) that is
beaconing the SOS SSID and sending/receiving data packets, the DC can immediately
flag it as a Rogue device. There is another test that can be used to detect such a Rogue
device: when the helping client hears the Probe Response in step 4 of the Client Conduit
protocol, it knows that the disconnected client no longer needs to beacon. Thus, if the
helping client continues to hear the SOS beacons after a few seconds, it can flag the
beaconing device as a Rogue AP.
This section discusses our techniques for detecting and diagnosing faults in an IEEE
802.11 wireless network. Section 4.6.1 describes a simple technique for locating dis-
connected clients. Section 4.6.2 presents our mechanisms for isolating performance
problems and Section 4.6.3 describes how we detect rogue access points.
Locating disconnected clients is useful for identifying areas with connectivity problems,
e.g., poor coverage or high interference (locating RF holes), or for locating possibly
faulty APs. A disconnected client can determine that it is in an RF hole if it does not hear
beacons from any AP (as opposed to being disconnected due to some other reason, such
as an authentication failure; locating such clients can thus also
help in locating RF holes). We now discuss a technique called Double Indirection for
Approximate Location (DIAL) for locating disconnected clients. To determine the
location of this client, nearby connected clients hear D's beacons and record the sig-
nal strength (RSSI) of these packets. They inform the DS that client D is disconnected
and send the collected RSSI data. At this point, the DS executes the first step of DIAL
to determine the location of the connected clients: this can be done using any known
location-determination technique in the literature [17, 73]. In the next step of DIAL, the
DS uses the locations of the connected clients as “anchor points” and the disconnected
client’s RSSI data to estimate its approximate location. This step can be performed us-
ing any scheme that uses RSSI values from multiple clients for determining a machine’s
location [17, 25, 73]. Locating the connected clients introduces some error, and using
these anchor points to locate the disconnected client increases the error further.
In Section 4.8.3, we show that this error is approximately 10 to 12 meters, which
is small enough for an administrator to physically locate the client.
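As a concrete illustration of DIAL's second step (not the specific scheme evaluated in the dissertation; any of the RSSI-based schemes cited above could be substituted), one can compute a centroid of the anchor clients' estimated positions, weighted by the RSSI each observed:

```python
def weighted_centroid(anchors):
    """anchors: list of ((x, y), rssi_dbm) for connected clients that
    heard the disconnected client's beacons. Converting dBm to linear
    power makes anchors with stronger signals dominate the estimate,
    since they are likely closer to the disconnected client."""
    weights = [10 ** (rssi / 10.0) for _, rssi in anchors]
    total = sum(weights)
    x = sum(w * pos[0] for (pos, _), w in zip(anchors, weights)) / total
    y = sum(w * pos[1] for (pos, _), w in zip(anchors, weights)) / total
    return (x, y)
```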
Our design for diagnosing network performance problems comprises two lightweight
components: a monitoring component and a diagnosing com-
ponent. The monitoring component runs in the background at the DC and informs the
diagnosing component when it suspects a performance problem. At
this point, the diagnosing component analyzes the connections and outputs a report that
gives a breakdown of the delays, i.e., the extent of the delays in the wired and the wire-
less part, and for the latter, a further breakdown into delays at the client, AP, and the
medium. Note that the monitoring component can be conservative in declaring that
network problems are being encountered; a false alarm simply invokes our diagnosing
component. Since this component has low overheads, invoking it has a small impact
on the performance of clients and APs. These components have not been implemented
yet in our current prototype, but we have evaluated the effectiveness of some of these
mechanisms experimentally.
We focus on diagnosing performance problems for TCP connections since TCP is the
most widely used transport protocol in the Internet. For a TCP connection, we can
passively diagnose performance problems by leveraging the connection’s data and ac-
knowledgment (ACK) packets. For other transport protocols, we can determine end-
to-end loss-rate and round-trip times using either active probing or performance reports
from the receiver. Performance problems typically manifest themselves as
low throughput, high loss rate, or high delay. We do not use throughput as a metric for
detecting a problem since it is dependent on the workload (i.e., the client’s application
may not need a high throughput) and on specific parameters of the transport protocol
(e.g., initial window size, sender and receiver window sizes in TCP). Instead, we use
packet loss rate and round-trip time for detecting performance problems.
Estimating the round trip time (RTT) in a TCP connection is simple: if the client is
a sender, it already keeps track of the RTT; if the client is a receiver, it can apply the
same estimation algorithm to the data and ACK packets that it observes.
To estimate the loss rate, we use heuristics suggested in [47] and [10] on the client
side. We compute different loss rates for packets sent and received by the client. For data
packets sent by the client, the loss rate is estimated as the ratio of retransmitted packets
to the packets sent over the last L RTTs [10]. This estimation mechanism assumes that
the TCP implementation uses Selective ACKs so that the loss rate is not overestimated un-
necessarily; most modern TCP implementations
support this option by default, e.g., Windows, Linux, Solaris. As shown in [10], this
estimate can be higher than the actual loss rate when timeouts occur in a TCP connec-
tion. For our purposes, this inaccuracy is acceptable for two reasons: first, if a TCP
connection experiences timeouts, it is probably worth
diagnosing; second, the only consequence of a mistake is to trigger our diagnosis com-
ponent, which incurs low overhead. If more accurate analysis is needed, the LEAST
algorithm can be used.
For the data packets received by the client, we use an approach similar to the one
suggested in [47] to estimate the number of losses: if a packet is received such that
its starting sequence number is not the next expected sequence number, the missing
segment is considered lost. The loss rate is estimated as the ratio of lost packets to the
total number of expected packets in the last L RTTs. Note that the expected number
of bytes is calculated as the maximum observed sequence number minus the minimum
observed sequence number during the last L RTTs; we apply the idea in [139] to estimate maximum segment size
(MSS), and estimate the number of packets by dividing the number of bytes by MSS.
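The two loss-rate estimates above can be sketched as follows; bookkeeping for the sliding window of the last L RTTs is elided, and the sample lists are assumed to be collected elsewhere:

```python
def sender_loss_rate(packets_sent: int, packets_retransmitted: int) -> float:
    # Sender side [10]: ratio of retransmitted packets to packets sent
    # over the last L RTTs (assumes SACK, so retransmissions track losses).
    return packets_retransmitted / packets_sent if packets_sent else 0.0

def receiver_loss_rate(segments, mss: int) -> float:
    """Receiver side [47]: segments is a list of (start_seq, length)
    observed in the last L RTTs. A gap before a segment's start counts
    as lost; expected packets = (max_seq - min_seq) rounded up to MSS."""
    segs = sorted(segments)
    base = segs[0][0]
    expected_next, max_end, lost = base, base, 0
    for seq, length in segs:
        if seq > expected_next:  # missing bytes -> whole lost segments
            lost += (seq - expected_next + mss - 1) // mss
        expected_next = max(expected_next, seq + length)
        max_end = max(max_end, seq + length)
    total = (max_end - base + mss - 1) // mss
    return lost / total if total else 0.0
```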
Our assumption is that segments are rarely delivered out-of-order in a TCP connection.
We flag a packet as experiencing high delay
if its RTT is more than 250 msec or is higher than twice the current
TCP RTT [140]. To avoid invoking our diagnosis algorithm for high delays that occur
temporarily, we flag a connection only when D or more packets experience a high delay.
A connection is classified as lossy if its loss rate (for transmitted or received packets) is
above a threshold.
Both D and L are configurable parameters and each represents a tradeoff between
responsiveness and the overhead of invoking the diagnosis
component. That is, with a low value of D or L, any change in delays/losses will be
detected quickly but it may also result in invoking the diagnosis component unnecessar-
ily. For high values, apart from slow responsiveness, another problem occurs: the TCP
connection may end before a sufficient number of samples has been collected. Such a sit-
uation can occur with short Web transfers. We can alleviate this problem by aggregating
loss rate and delay information between the client and remote hosts across TCP con-
nections. We are currently exploring such techniques along with choosing appropriate
values of D and L.
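Putting the two triggers together, the monitoring component's decision can be sketched as below; the parameter names D and L follow the text, but the default values are our assumptions (the dissertation leaves them open):

```python
def connection_flagged(rtt_samples_ms, srtt_ms, loss_rate,
                       d_packets=5, loss_threshold=0.05):
    # Delay trigger: D or more packets with RTT above max(250 ms, 2*srtt).
    delayed = sum(1 for r in rtt_samples_ms
                  if r > max(250.0, 2.0 * srtt_ms)) >= d_packets
    # Loss trigger: estimated loss rate above a configurable threshold.
    lossy = loss_rate > loss_threshold
    return delayed or lossy
```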
When the DC at a client detects a network performance problem for a TCP connection, it
communicates with its associated DAP to differentiate between the delays on the wired
and wireless parts of the path. The DAP then starts monitoring the TCP data and ACK
packets for that client’s connection. If the DC is the sender in the TCP connection,
the DAP computes the difference between the received time of a data packet from the
client to the remote host and the corresponding TCP ACK packet; this time difference
is an estimate of the delay incurred in the wired network. To ensure that the roundtrip
time estimate is reasonable, various heuristics used by TCP need to be applied to these
roundtrip measurements as well, e.g., Karn’s algorithm [117]. The DAP sends this
estimate to the DC who can now determine the wireless part of the delay by subtracting
this estimate from the TCP roundtrip time. A similar approach can be used to compute
this breakdown when the client is a receiver: the DAP determines the wireless delay by
monitoring the data packets from the remote host to the client and the corresponding
ACK packets. Note that the amount of state maintained at the DAP is small since it
corresponds to the number of unacknowledged TCP packets; this can be reduced further
by sampling.
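A sketch of the DAP's bookkeeping for a client that is the TCP sender follows; the class and method names are ours, and the Karn-style handling of retransmissions is simplified to dropping ambiguous samples:

```python
class WiredDelayEstimator:
    """Records when each data segment passes the DAP toward the remote
    host and, on seeing the covering ACK, returns the elapsed time as an
    estimate of the wired portion of the RTT. A segment seen twice
    (a retransmission) is dropped, in the spirit of Karn's algorithm,
    since its ACK cannot be attributed to one transmission."""

    def __init__(self):
        self._pending = {}   # end_seq -> timestamp when data was seen
        self._seen = set()

    def data_seen(self, end_seq: int, t) -> None:
        if end_seq in self._seen:
            self._pending.pop(end_seq, None)  # retransmission: ambiguous
        else:
            self._seen.add(end_seq)
            self._pending[end_seq] = t

    def ack_seen(self, ack_seq: int, t):
        covered = [s for s in self._pending if s <= ack_seq]
        if not covered:
            return None
        sent = max(self._pending.pop(s) for s in covered)
        return t - sent  # wired-path delay for the newest covered segment
```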
A client may experience poor wireless performance due to a number of reasons, such
as RF interference, contention in the medium, or
some driver or other kernel issues at either the AP or the client. We quantify the effect
of these problems by observing their impact on packet delay in the wireless network
path. We group these performance problems into three categories: packet delay at the
client, packet delay at the AP, and packet delay in the wireless medium. In this sec-
tion, we present a protocol based on snooping by
neighbors, EDEN, which leverages the presence of other clients to quantify the delay
experienced in each of the above categories. Since electromagnetic waves travel at the
speed of light, we can safely assume that RF propagation delays are negligible relative
to these delays. The protocol begins with client D broad-
casting packets asking for diagnosis help from nearby clients. All clients who hear
these packets switch to promiscuous mode and ask the DAP to start the diagnosis (Sec-
tion 4.8.1 shows that the CPU overheads of entering promiscuous mode are low on
modern processors). Security mechanisms similar to the ones discussed in Section 4.5.2
can be used to prevent attacks on these clients. Note that we use multiple snooping
clients in EDEN primarily for robustness: multiple clients increase the likelihood that at
least one client hears the EDEN protocol requests and responses discussed below.
EDEN proceeds in two phases. In the first phase, the DAP to which client D is as-
sociated estimates the delay at D. The DAP periodically (say every 2 seconds) sends
Snoop request packets to client D. When D receives a Snoop request packet, it imme-
diately replies with a Snoop response message. The eavesdropping clients log the time
when they hear a Snoop request and the first attempt by D to send the corresponding
Snoop response packet, i.e., we only record the times of response packets for which the
corresponding request was also heard; if an eavesdropping client misses either packet,
it ignores the timing values for that request/response pair. The difference between the
recorded times is the client delay, i.e., application and OS delays experienced by D after
receiving the request packet. For robustness, Snoop requests are sent a number of times
(say 20); the client and AP delays are averaged over all these instances.
In the second phase, a similar technique is used to measure the AP delay, i.e., client D
sends the Snoop request packets and the AP sends the responses. Client D also records
the round trip times to the AP for these Snoop requests and responses along with the
number of request packets for which it did not receive a response (e.g., the request or
the response may have been lost).
Strictly speaking, this client and AP delay also includes the delay due to contention
experienced in the wireless medium. In Section 4.8.4, we discuss the extent of inaccu-
racy this introduces.
At the end of the protocol, all the eavesdropping clients send the AP and client
delay times to the client D. The difference between the round trip time reported by D,
and the sum of the delays at the client and the AP, approximates the sum of the delay
experienced by the packet in the forward and backward wireless link. The client can then
report the client/AP/medium breakdown to the network administrator; it can also report
this information to the DS for further analysis.
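The final arithmetic of EDEN can be sketched as follows; a snooped (request, response) timestamp pair where either packet was missed is represented as None, matching the rule above that such pairs are ignored:

```python
def snooped_delay_ms(pairs):
    # Average delay over pairs for which the eavesdropper heard both the
    # Snoop request and the first attempt at the corresponding response.
    vals = []
    for pair in pairs:
        if pair is not None:
            req, resp = pair
            vals.append(resp - req)
    return sum(vals) / len(vals) if vals else None

def medium_delay_ms(round_trip_ms, client_delay_ms, ap_delay_ms):
    # RTT minus the client's and the AP's processing delays approximates
    # the time spent in the forward and backward wireless links.
    return max(round_trip_ms - client_delay_ms - ap_delay_ms, 0.0)
```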
As discussed in Section 4.2, Rogue APs are unauthorized APs that have been connected
to an Ethernet tap in an enterprise or university network; such APs can result in security
holes, and unwanted RF and network load. Rogue APs are considered a major security
threat by network administrators.
Our architectural framework of using clients and (if possible) APs to monitor the
environment around them naturally lends itself for detecting Rogue APs. Our basic
approach is to make clients and DAPs collect information about nearby access points
and send it to the DS. When the DS receives information about an AP X, it checks the
AP location database and ensures that X is a registered AP in the expected location and
channel.
Assumptions
We assume that all Rogue APs and the corresponding connected "rogue" clients use
IEEE 802.11-compliant hardware. This assumption "raises the
bar" such that non-compliant APs with low-level modifications are needed to defeat our
scheme: to avoid detection, an attacker must modify the Rogue AP to not beacon and
not respond to probe requests. Of course, an attacker can simply use a proprietary access
point or one with different technology, e.g., HIPERLAN. Detecting such intruders re-
quires special hardware and is not our goal. We simply want a low-cost mechanism that
addresses the (common case) Rogue AP problem being faced in current deployments:
for many network administrators, the main goal is to detect APs inadvertently installed
by well-meaning employees. In the future, we
will investigate the detection of non-compliant Rogue access points and clients as well.
If two companies have neighboring wireless networks, our mechanisms will clas-
sify the other companies’ access points as Rogue APs. If this classification is unaccept-
able, the network administrators of the respective companies can share their AP location
databases.
In our system, each DC monitors the packets in its vicinity (non-promiscuous mode),
and for each AP that it detects, it sends a 4-tuple < MAC address, SSID, channel, RSSI
> to the DS. Essentially, the 4-tuple uniquely identifies an AP in a particular location
and channel. To get this information, a DC needs to determine the MAC addresses of
nearby APs. One option is to put the wireless card in promiscuous mode
and observe data packets (it can use the FromDS and ToDS bits in the packet to deter-
mine which address belongs to the AP). However, we can achieve the same effect using
a simpler approach: since IEEE 802.11 requires all APs to broadcast beacons at regular
intervals, the DC can obtain the MAC addresses from the APs’ beacons from all the
APs that it can hear. In Section 4.8.5, we show that a DC not only hears beacons on its
channel, but it may also hear beacons from overlapping channels; this property
increases each DC's monitoring coverage.
To ensure that we do not miss a Rogue AP even if no client is present on any chan-
nel overlapping with the AP, we use the Active Scanning mechanism of the IEEE 802.11
protocol: when a client wants to find out what APs are nearby, the client goes to each
of the 11 channels (in 802.11b), sends Probe Requests and waits for Probe Responses
from all APs that hear those Probe Requests; from these responses, the DC can obtain
the APs’ MAC addresses. Every IEEE 802.11-compliant AP must respond to such re-
quests and in some chipsets [86], no controls are provided to disable this functionality.
Consistent with our framework, we use the Busy AP Optimization (see Section 4.4.3) so
that active scans in an AP's vicinity are performed by the AP only when it has no client
associated with it.
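A sketch of the active-scan loop for 802.11b follows; the `card` driver interface (`set_channel`, `send_probe_request`, `collect_responses`) is hypothetical, standing in for whatever primitives the wireless driver exposes:

```python
def active_scan(card, num_channels=11, dwell_ms=30):
    """Visit every channel, send a broadcast Probe Request, and record
    the Probe Responses. Compliant APs must answer such requests, so
    this reveals the MAC addresses even of APs that suppress SSIDs in
    their beacons."""
    found = {}
    for channel in range(1, num_channels + 1):
        card.set_channel(channel)
        card.send_probe_request(ssid="")           # broadcast probe
        for resp in card.collect_responses(dwell_ms):
            found[resp.mac] = (resp.ssid, channel, resp.rssi)
    return found
```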
Analysis at the DS
When the DS receives information for an AP from various clients, it uses DIAL to esti-
mate the AP's approximate location based on these clients' locations and the RSSI
readings they report for the AP. The DS declares an AP to be rogue if its report does not match a registered
AP in the DS's AP location database, i.e., if the MAC address is not present in the
database, or if the AP is not in the expected location, or the SSID does not correspond
to the expected SSID(s) in the organization. Note that if an AP’s SSID corresponds
to an SOS SSID, the DS skips further analysis since this AP actually corresponds to a
disconnected client that is executing the Connection Setup phase of the Client Conduit
protocol. The channel information is used in a slightly different way. As stated above,
the DS compares the channel on which an AP is reported
with the one on which it is expected. Note that if the channel on an AP is changed, the
DAP can ask the DS to update its AP location database (recall that the communication
between the DAP and the DS is authenticated; if the AP is a legacy AP, the administrator
can update the AP location database when the AP’s channel is changed). The checks
A Rogue AP, say R, may try MAC address spoofing to avoid being detected, i.e.,
send packets using the MAC address of an authorized AP, say G. However, the DS can
still detect R as it will reside in a different location or channel than G (if it is on the
same channel and location, G would immediately detect it). Our approach also detects a
Rogue AP that does not broadcast an SSID in its beacons since a DC can still obtain the
AP’s MAC address. Of course, we can detect such unauthorized APs in an even simpler
manner.
[Flowchart of the Rogue AP checks at the DS: Start → Is the MAC registered? → Is the AP in its expected location? → Is the AP on the expected channel? → Is the AP advertising the expected SSID? A "No" at any check means a Rogue AP is detected; otherwise no Rogue AP is flagged.]
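As a sketch, the decision logic in the flowchart can be written as follows; the record format and location-database schema here are illustrative, not the actual implementation:

```python
# Illustrative sketch of the DS checks for Rogue AP detection.
# ap is an observed record; ap_db maps registered MAC addresses to their
# expected location and channel; org_ssids is the set of authorized SSIDs.

def is_rogue(ap, ap_db, org_ssids, sos_ssid="SOS"):
    """Return True if the observed AP fails any of the DS's checks."""
    if ap["ssid"] == sos_ssid:
        return False                  # disconnected client running Client Conduit
    known = ap_db.get(ap["mac"])
    if known is None:
        return True                   # MAC address not registered
    if ap["location"] != known["location"]:
        return True                   # not at its expected location
    if ap["channel"] != known["channel"]:
        return True                   # not on its expected channel
    if ap["ssid"] not in org_ssids:
        return True                   # not advertising an expected SSID
    return False
```

A spoofing AP that copies a registered MAC address still trips the location or channel check, matching the argument made below.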
Thus, given the above strategy, an unauthorized AP may stay undetected for a short
time by spoofing an existing AP X near X’s location and beaconing a valid SSID in the
organization.
However, when a nearby client performs an active scan, the Rogue AP will be detected;
as we show in Section 4.8.5, a DC can easily perform such a scan every 5 minutes.
4.7 Implementation
We now describe the details of our fault diagnosis implementation. We have imple-
mented the basic architecture consisting of the DC, DAP and DS daemons; the authenti-
cation and logging mechanisms have not been implemented. We have also implemented
the Client Conduit protocol and the Rogue AP detection mechanism. The support for
Our system has been implemented on the Windows operating system with Netgear
MA 521 802.11b cards. On the DS, we simply run a daemon process that accepts
information from DAPs. The DS reads the list of legitimate APs from a file; support
for reading this information from a database can be easily added. The structure of the
code on the DC or DAP consists of a user-level daemon and kernel level drivers (see
Figure 4.5). These pieces are structured such that code is added to the kernel drivers only
when essential, with most of the protocol logic driven from the user level.
[Figure 4.5: layered driver architecture: Diagnostics IM Module, Native WiFi IM Driver, NDIS.]
Kernel drivers: There are two drivers in our system — a miniport driver and an inter-
mediate driver (IM driver) called the Native WiFi driver [86].
The miniport driver communicates directly with the hardware, and provides suffi-
ciently rich interfaces such that functions like association, authentication, etc. can be handled
in the IM driver.
The IM driver supports a number of interfaces (exposed via ioctls) for querying various
parameters such as the current channel, transmission level, power management mode,
SSID, etc. In addition to allowing the parameters to be set, it allows the user-level code
to request active scans, associate with a particular SSID, capture packets, etc. In
general, it provides a significant amount of flexibility and control to the user-level code.
Even though some of the required operations were already present in the IM driver,
we modified both drivers to support our mechanisms and to
improve performance of our protocols. The miniport driver was changed to expose
certain packet types to the IM driver. In the IM driver, we added the following support:
• Capturing packet headers and packets: We allow filters to be set such that only
certain packets or packet headers are captured, e.g., filters based on specific MAC
addresses, packet types, packet subtypes (such as management and beacon pack-
ets), etc.
• Storing the RSSI values from received packets: We obtain the RSSI value of ev-
ery received packet and maintain a table called the NeighborInfo table that keeps
track of the RSSI value from each neighbor (indexed on the MAC address). We
maintain an exponentially weighted average with the new value given a weight-
ing factor of 0.25. The RSSI information is needed for estimating the location of
disconnected clients.
• Tracking channel information: We record the channels on which packets were heard
from a particular MAC address or SSID.
• Kernel event support for protocol efficiency: We added an event that is shared
between the kernel and user-level code. The kernel triggers this event when an
event of interest occurs, making the protocol interrupt-driven rather than polling-based.
Currently, the kernel sets this event whenever it hears
an SOS beacon from a disconnected client during Client Conduit, thereby result-
ing in a quick response to the client.
• Additional ioctls: We added a number of ioctls to get and clear the information discussed above.
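For illustration, the NeighborInfo bookkeeping described above might look as follows in user-level pseudocode (the actual implementation lives in the IM driver; the class shape here is an assumption):

```python
# Sketch of the NeighborInfo table with the exponentially weighted RSSI
# average described above: new samples get a weight of 0.25.

NEW_SAMPLE_WEIGHT = 0.25

class NeighborInfo:
    def __init__(self):
        self.rssi = {}  # indexed on MAC address

    def update(self, mac, rssi_sample):
        """Fold a new per-packet RSSI sample into the running average."""
        old = self.rssi.get(mac)
        if old is None:
            self.rssi[mac] = float(rssi_sample)
        else:
            self.rssi[mac] = (NEW_SAMPLE_WEIGHT * rssi_sample
                              + (1 - NEW_SAMPLE_WEIGHT) * old)
```

The small weight on new samples smooths out per-packet RSSI fluctuations while still tracking slow changes, which is what the location estimation in Section 4.6.1 needs.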
Fault Diagnostic daemon: This daemon gathers information and implements various
mechanisms discussed in this chapter, e.g., collecting MAC addresses of APs for Rogue
AP detection, performing Client Conduit, etc. If the device is an AP, it communicates diag-
nostic information with the DS and the DCs; if the device is just a DC, it communicates
with its DAP.
The Diagnostic daemon on the DC obtains the current NeighborInfo table from the
kernel every 30 seconds. If any new node has been discovered or if the existing data has
changed significantly (e.g., RSSI value of a client has changed by more than a factor
of 2), it is sent to the DAP. The DAP also maintains a similar table indexed on MAC
addresses. However, it only sends information about disconnected clients and APs to
the DS; otherwise, the DS would end up getting updates for every client in the system,
making it less scalable. The DAP sends new or changed information about APs to the
DS periodically (30 seconds in our current prototype). Furthermore, if the DAP has any
new information about disconnected clients, it forwards it to the DS immediately.
All messages from the DC to the DAP and DAP to the DS are sent as XML mes-
sages. A sample message format from the DC is shown below (timestamps have been
removed):
<DiagPacket TStamp="...">
<Clients TStamp="...">
</Clients>
<Real-APs TStamp="...">
</Real-APs>
<Disconnected-Clients TStamp="...">
</Disconnected-Clients>
</DiagPacket>
As the sample message shows, the DC sends information about other connected
clients, APs, and disconnected clients. For each such class of entities, it sends the MAC
address of a machine along with the RSSI, SSID, and a channel bitmap which indicates the
channels on which that machine was heard.
We now evaluate our mechanisms and show that they are not only effective but also
impose low overheads. For the evaluation of the basic architecture, Client Conduit,
and DIAL, we use a combination of tools such as AiroPeek [132] and WinDump [134].
Section 4.8.1 presents the timings for individual operations that are used by our pro-
tocols. Section 4.8.2 presents the breakdown of the costs involved in the Client Conduit
mechanism and shows that it can be used to help disconnected clients in a timely manner.
Section 4.8.3 shows the effectiveness of our DIAL technique for locating disconnected
clients. In Section 4.8.4, we evaluate the effectiveness of the EDEN technique to iso-
late performance problems. Section 4.8.5 shows that the scanning requirements of our
Rogue AP detection mechanism are modest. In
Section 4.8.6, we discuss scalability issues with respect to the Client Conduit protocol.
To better understand the cost of various operations involved in our detection and di-
agnosis mechanisms, we measured the times for the individual operations; we
believe that these numbers are valuable for other researchers for modeling purposes as
well. Table 4.2 shows the results. Note that the cost of changing a machine from AP to
Station mode is less than 2 seconds (731 msecs for the actual change, followed by a
wait before the card can be used in the new mode).

Table 4.2: Times for different operations: U means time measured from user-level
code; the rest are the times taken for the corresponding ioctl to complete.

We also ran an experiment to determine if placing a machine in promiscuous mode has any effect on the machine’s
incoming/outgoing bandwidth. We set up the machines such that machine A did a TCP
transfer to C at full blast and B performed a full blast TCP transfer to D. The experiment
was performed three times; in each case, machine C was placed in normal mode first
and then in promiscuous mode. We observed that C’s throughput was largely unaffected
(standard deviation of 63.7 KB/sec) in the normal mode case, with a bandwidth of 252.3
KB/sec when placed in promiscuous mode. We also measured the CPU overhead when a machine
is placed in promiscuous mode: we ran a full blast TCP transfer between two machines
A and B; during this process, we first placed a third machine M in normal mode and then in
promiscuous mode. Figure 4.6 shows the CPU overhead for machine M (a 1 GHz Pen-
tium III machine). Even for such a relatively old machine, the CPU overhead of placing
it in promiscuous mode is quite low, mostly staying below 10%.
Thus, these results show that the CPU overheads on a machine due to promiscuous
mode operation are low.
To measure the performance of the Client Conduit protocol, we set up an experiment with
one AP, one connected client C and a disconnected client D. The connected client is
a 1 GHz Pentium III machine and the disconnected machine is an 800 MHz Pentium III
machine. Both machines have 512 MB of memory and Netgear MA521 802.11b cards.
[Figure 4.6: CPU usage (%) of machine M over time (secs), shown for normal and promiscuous modes.]
Figure 4.7 shows the total time taken along with a breakdown of the Connection
Setup part of the protocol. “User time” indicates the end-to-end time taken by our user-
level implementation whereas “Kernel time” indicates the time taken by the relevant
ioctls for the same functionality. The costs in both cases are similar thereby justifying
our approach of implementing only the essential mechanisms at the kernel level and
driving most of the protocol from the user-level (for ease of debugging). In the first
two bars, the user-level daemon at the connected client shares an event with the kernel,
which immediately informs the daemon when a disconnected client’s beacon is detected
(See Section 4.7). Thus, the disconnected client needs to wait only a short time before it
hears the Probe Request message from the connected client C indicating that C is ready
to help (see the “Get ACK” times). This delay would be much higher if the daemon
obtained the disconnected machine information from the kernel periodically instead of
being interrupt-driven. The third bar shows the delay breakdown for an implementation
where the daemon polls the kernel for this information every 10 seconds.
We now clarify a couple of details about our experiment. First, the initial step of
setting the channel and checking for available clients takes approximately 190 msecs.
In the worst case, the disconnected client may have to scan all channels and check for
connected clients; in that case, this step may take 2-3 seconds. Second, the steps
in which we set the AP/Station mode of the machine take approximately 730 msec;
however, the hardware specifications require that the operating system must wait for a
few hundred milliseconds before using the card in the new mode. For robustness, we
added a one second delay after such a mode change; the figure includes these delays.
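The Connection Setup steps broken down in Figure 4.7 can be sketched as follows; driver is a hypothetical wrapper around the ioctls of Section 4.7, and the one-second settle time is the robustness delay discussed above:

```python
import time

# Sketch of Connection Setup at the disconnected client, following the
# bottom-to-top order of the Figure 4.7 legend. The driver object and its
# method names are illustrative placeholders for the real ioctls.

def connection_setup(driver, channel, sos_ssid,
                     beacon_period_msec=100, settle_secs=1.0):
    driver.set_channel(channel)
    driver.become_ap()                         # advertise the SOS network
    time.sleep(settle_secs)                    # let the card settle after mode change
    driver.set_ssid(sos_ssid)
    driver.set_beacon_period(beacon_period_msec)
    driver.wait_for_ack()                      # Probe Request from a helping client
    driver.become_station()
    time.sleep(settle_secs)
    driver.associate_adhoc()                   # join the helper in ad hoc mode
```

The two explicit sleeps correspond to the "Sleep 1 second" entries in the figure and dominate a noticeable fraction of the roughly 5-second setup time.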
From the figure, one can see that the Connection Setup and association time for the
disconnected client is quite reasonable: it takes less than 5 seconds to run the setup and
another 1.9 seconds to associate with a connected client C in ad-hoc mode so that the
diagnosis can begin.
After MultiNet starts running on the connected client, the disconnected client can
interact with the DS to diagnose its problems, e.g., transfer certificates or log files to
the DS. To evaluate the time taken to perform these transfers via MultiNet, we ran
an experiment in which a machine D sent files of different sizes (100KB, 500KB and
1MB) to the DS through connected client C. Figure 4.8 shows the time taken when
[Stacked-bar plot, Y-axis: Time (msec), 0 to 14,000. Legend (bottom to top): Set channel, Become AP, Sleep 1 second, Set SSID, Set Beacon Period, Get Ack, Become STA, Sleep 1 second, Adhoc-mode association. Bars: User time (ms), Kernel time (ms), User time with polling (ms).]
Figure 4.7: Breakdown of costs for Client Conduit. The protocol steps are executed
from the bottom entry in the legend to the topmost, i.e., starting at “Set channel”.
the connected client C allows 17-50% of its time to be used for ad hoc mode; client C
stays on the infrastructure network for 500 msecs, and the time on the ad-hoc network is
varied between 100 to 500 msecs. In our experiment, the time to switch from ad-hoc to
infrastructure mode is 500 msecs and from infrastructure to ad hoc mode is 300 msecs.
As expected, the results show that the file transfer speed is a direct function of the
time a connected client stays in the ad hoc network. We expect that as the switching
delay overhead reduces (as in newer cards) the transfer speeds will improve.
Thus, our results show that Client Conduit allows a disconnected client’s problem to
be reported (and even be resolved, e.g., updating expired certificates) in a few seconds.
We now evaluate the accuracy of locating disconnected clients (or Rogue APs) using our
DIAL scheme described in Section 4.6.1.

[Figure 4.8 plot: transfer time vs. the fraction of time (0-0.6) the connected node spends on the ad hoc network.]
Figure 4.8: Time taken by a disconnected client to transfer data via MultiNet

Unlike previous work on location determina-
tion, the location calculated by DIAL incurs extra error since the locations of the reference
points, i.e., the connected clients, are themselves only estimates.
We evaluated DIAL using RADAR [17] for locating the disconnected clients from
the anchor points; we chose RADAR due to its simplicity, and more sophisticated RSSI-based schemes such as
the one suggested in [73] can be used to reduce the errors of DIAL even further.
In our experiment, we placed 3 connected clients in 3 offices on the same floor of our
building. We obtained the floor map, and applied the Cohen-Sutherland line-clipping
algorithm [48] to compute the number of walls between each of the three connected
clients and the other rooms. We placed a disconnected client at 7 different locations
while it sent out broadcast packets. We used AiroPeek [132] to measure the RSSI of
the disconnected client’s packets received at the connected machines. We then applied
the equation specified in [17] to compute the wall attenuation factor (WAF). Based on
the WAF, we inferred that the disconnected client is in location X if the predicted signal
strength at X is closest to the observed signal strength at the three connected clients.
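The inference step can be sketched as follows. The path-loss formula is the wall attenuation factor model from RADAR [17]; the constants and the least-squares matching rule here are illustrative:

```python
import math

def predicted_rssi(p_d0, n, d, walls, waf, max_walls=4, d0=1.0):
    """RADAR-style wall attenuation factor model:
    P(d) = P(d0) - 10*n*log10(d/d0) - min(walls, max_walls) * WAF."""
    return p_d0 - 10 * n * math.log10(d / d0) - min(walls, max_walls) * waf

def locate(observed, candidates):
    """Pick the candidate location whose predicted signal strengths at the
    anchors are closest (least squares) to the observed strengths."""
    def err(cand):
        return sum((p - o) ** 2 for p, o in zip(cand["predicted"], observed))
    return min(candidates, key=err)
```

Here each candidate location carries the signal strengths predicted at the three connected clients; the disconnected client is inferred to be at the candidate minimizing the squared error, as described above.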
We ran the RADAR algorithm on the collected RSSI data for locating the discon-
nected client D using the precise location of the connected clients. We computed the
error in D’s predicted location with respect to its actual location; the “No Error” bar in
Figure 4.9(a) shows this error.
[Figure panels: Y-axis: Median Location Error (metres), 0-30; X-axis cases: No Error, E(1), E(2), E(1,2), E(3), E(1,3), E(2,3), E(1,2,3). Panel (a): estimated location of connected client is one room off from its true location. Panel (b): estimated location of connected client is two rooms off from its true location.]
Figure 4.9: Median error in locating disconnected clients. The lower and upper
bounds of error bars correspond to min and max error. E(i) denotes that the ith
connected client’s estimated location is in error.
Then, we ran the algorithm again by assuming that there was an error in estimating
the location of one connected client by a distance of 3.3 meters; this distance corre-
sponds to the average width of a room in our building. For example, if connected client
A’s estimated location is one room off, that incorrect location is used
when using A as an anchor point in RADAR. The second bar in Figure 4.9(a) shows this
error when such a situation occurs. The rest of the bars show the error in locating the
disconnected client when the location of either one, two or three connected clients is es-
timated incorrectly by one room; Figure 4.9(b) shows the error when the estimated
locations are off by two rooms.
The results show that when there is no error in the known location of the connected
clients, the median error is 9.7 meters. This error increases to at most 12 meters when the
estimated location of one or more clients is one or two rooms off from its true location.
Of course, when the estimated locations of the connected clients are off by two rooms,
the maximum error is substantially higher, e.g., 33 meters for the case when the location
of all three clients is incorrect. This case occurs when the estimated locations of the
connected clients are off in different directions, e.g., client A’s location is off towards
one side of the building while the others are off towards the opposite side.
Note that the error in the location of the anchor points (i.e., connected clients) can
be kept low (less than one room off) by using mechanisms such as Cricket [98] and
Active Badges [129] for locating connected clients. With accurate location of anchor
points, DIAL’s error would be similar to that of the best-known RSSI-based location
mechanism. Note that even an error of 10-12 metres (for our experimental setup using
RADAR) is sufficient to narrow down the search area for disconnected
clients or Rogue APs. Thus, based on our results, we can say that DIAL is a practi-
cal approach for helping network administrators estimate the approximate location of
problematic areas.
In Section 4.6.2, we presented the EDEN scheme that uses nearby clients to measure the
delay encountered by a wireless station or an AP. We now show that EDEN can estimate
these delays with reasonable accuracy.
The EDEN technique measures the time spent on a client (or an AP) by measuring
the times of the Snoop request and response packets at nearby clients. However, this
measurement includes the delay at the machine due to medium contention. To under-
stand the extent of this congestion delay, we set up a simple experiment with 4 machines:
machine A performed a full blast TCP transfer to machine B, thereby creating traffic congestion
in the medium. Then we associated client
C with the Native WiFi AP machine D. The Native WiFi AP then sent 20 ping packets
to the associated client, which in turn sent ping reply packets. We ran the experiment
twice: once with no extra client delays and next when an extra 40 msec were added at
the client between the ping request and replies. Using a fifth machine running AiroPeek,
we observed that EDEN over-estimated the client delay by approximately 3 msec. When
examining scenarios where the client or the AP is the bottleneck, such inaccuracies
may be acceptable. However, when these entities are not bottlenecks or when EDEN is
examining a scenario with low delays, or when contention is even worse (e.g., the con-
tention delay can be more than 20 msec in 802.11b), a better estimation technique is needed.
We next evaluated EDEN’s accuracy in estimating the delay
at an endpoint. In this setup, a client machine was associated with another machine
running as an access point; both machines had Netgear MA521 802.11b cards and the
corresponding Native WiFi drivers. We then injected delays in the path of all packets at
the client (varying from 30 to 300 msecs). To emulate the EDEN protocol, the AP sent
[Figure 4.10 plot: estimated delay vs. delay introduced at the client (30-300 msec).]
20 ping packets to the client; the ping packets and replies emulate the Snoop request
and response messages in EDEN. A third machine running AiroPeek was used to snoop
on these ping packets; this machine effectively emulates the eavesdropping client in
EDEN. The collected AiroPeek data was then analyzed to estimate the delays at the
client. Figure 4.10 shows that EDEN is reasonably accurate in estimating the delays at
an endpoint: EDEN can estimate client delays with an error less than 5% of the actual
introduced delay.
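The analysis of the snooped trace can be sketched as follows; the baseline term, which accounts for medium access and transmission time, is an assumed calibration constant (the roughly 3 msec over-estimate measured above suggests its magnitude):

```python
# Sketch of endpoint-delay estimation from an eavesdropper's trace: the gap
# between a snooped Snoop-request and its response, minus a baseline for
# medium access and transmission time (assumed calibration value).

def estimate_endpoint_delay(req_times, resp_times, baseline=0.003):
    """Average (response - request) gap minus baseline, in seconds."""
    gaps = [resp - req for req, resp in zip(req_times, resp_times)]
    return sum(gaps) / len(gaps) - baseline
```

Averaging over the 20 request/response pairs used in the experiments smooths out per-packet contention jitter; when contention is heavy, the fixed baseline is the main source of error, which is why the text above calls for a better estimator in that regime.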
Finally, we studied EDEN’s effectiveness in classifying delays at the client, AP, and
the medium. We used a 3-machine setup similar to the one in the previous experiment;
in this case, to estimate delays at the AP, the client also sent ping packets to the AP.
To introduce delays in the medium, we increased the distance between the client and
the AP. The medium delay increased relative to the case when the AP and client were
nearby because there were more retries [2]. For better accuracy, we ran these experiments
at night when the wireless traffic was expected to be low (since the corporate LAN is
actively used by employees during the day, we did not want traffic interference to affect
our measurements).

[2] The increased distance resulted in an increase in the number of walls between the
two machines, thereby weakening the received signal.
[Bar chart: estimated delay (msec, 0-120) for the scenarios 40-40-near, 40-40-far, and 0-0-far.]
Figure 4.11: Breakdown of delay at the client, AP, and the medium as estimated by
EDEN
Figure 4.11 shows EDEN’s breakdown for three different scenarios. The 40-40-near
bar corresponds to the scenario when the AP and client were placed near each other,
and we added a 40 msec delay to all packets at both machines. The 40-40-far scenario
is similar except that the client and the AP were placed far from each other. Finally, the
0-0-far case is one in which we did not introduce any delays at the client or the AP, but
the machines were placed far apart.
In the 40-40-near case, EDEN estimates approximately equal delays for the client and
the AP. With an increase in the distance (the 40-40-far and 0-0-far cases), the medium
delays increase and EDEN is able to estimate this change as well. Note that the client
and the AP delays increased in the latter two cases by a few milliseconds because
the wireless cards transmitted the packets at a lower transmission rate (1 Mbps) in order
to decrease the error rate. These results show that EDEN is an effective mechanism for
isolating delays at the client, the AP, and the medium.
In this section, we explore two issues related to Rogue AP detection. Section 4.8.5
shows that overlapping channels help in quicker detection of Rogue APs that are hiding
on channels where no AP or client is present. Section 4.8.5 shows that even if Rogue
APs are not overheard on overlapping channels, there is ample opportunity for clients to
perform active scanning without hurting their performance. To check the effectiveness
of our detection mechanism in practice, we ran it on our
floor and were able to detect all “known” Rogue APs (these were experimental APs
that we had set up).
Overlapping Channels
It is known that overlapping channels in IEEE 802.11 not only interfere with one another,
but it is sometimes possible for a NIC on one channel to decode packets from another,
overlapping channel. Thus, if a DC
is present on a channel that overlaps with a Rogue AP’s channel, it may detect the AP’s
beacons. To quantify this effect, we ran an experiment in which an AP
was placed on a given channel and a nearby client checked for the AP’s beacons on all 11
channels and documented where it could be heard. In one run, the client lingered on each channel for
1 second and in the second run, it stayed for 5 seconds. Figure 4.12 plots the channels on
which the AP is heard (Y-axis) when it is placed on a specific channel (X-axis). Clearly,
the overlap across various channels is non-negligible and is helpful for detection of
Rogue APs. Furthermore, given sufficient time (see the 5-second run), there is an even
higher likelihood that some packet from a Rogue AP leaks through to a monitoring DC.
[Figure 4.12 plot: channels on which the AP is heard (Y-axis, 0-12) vs. channel on which the AP beacons (X-axis, 0-12).]
In the above experiments, the AP and the client were placed 5 feet apart with one
obstacle between them. We wanted to study the change in leakage across overlapping
channels on increasing the distance between the AP and the client. For this, we
placed an AP machine at 10 different locations on our floor in various rooms and re-
peated the above experiment. Figure 4.13 shows that as the distance between the AP
and the monitoring client increases, the AP is heard on fewer channels (the decrease is
gradual with distance).
The above results show that even though one cannot rely on overlap as a guaranteed
mechanism for detecting Rogue APs, it does reduce the need of performing frequent ac-
tive scans. This observation also implies that there are more opportunities for detecting
Rogue APs: for a Rogue AP to go undetected, it must be far away from any client that
is on an overlapping channel.
As shown in Section 4.8.2, active scans can take up to 2 seconds. Our current imple-
mentation performs an active scan every 5 minutes; we refer to this period as the Active
Scan Period. Even though 2 seconds out of 300 seconds is a small fraction of the time, it
[Figure 4.13 plot: number of channels on which the AP is heard vs. distance in feet (0-120) between the AP and the monitoring client.]
is important for clients to perform these scans at appropriate times; otherwise, network
traffic on a client may get disrupted: packets sent to this client may be dropped and
TCP connections may back off.
[Figure 4.14 plot: maximum idle time (secs, 0-250) in each 5-minute interval vs. time of day (0-24 hours).]
Figure 4.14: The maximum idle time duration available during every 5-minute
interval over a 24-hour period
Ideally, these scans should be done when the node is idle and has no ongoing net-
work transfers. To determine whether such idle times exist in current usage, we used
Ethereal [45] to obtain traces from 3 desktop machines of our colleagues over multiple
days. Note that even though these traces are from desktops attached to wired networks,
they still give us a reasonable estimate of network traffic generated by users; as users
start using laptops as their primary machines, it is likely that the network usage and idle
time patterns will remain similar.
We divided the traces into 5-minute periods (the Active Scan Period) and for each
period, we determined the maximum period of time for which the network was idle.
Figure 4.14 presents the maximum idle period in every 5-minute interval during a 24-
hour period. Each point in the graph (e.g., for 12:00 pm to 12:05 pm) is obtained by
averaging the maximum idle time value across multiple days and multiple machines for
the same 5-minute period. The figure shows that there are large chunks of idle periods
available for performing active scans: the smallest idle period available in a 5-minute
interval was 118 seconds and typically, idle periods of more than 2.5 to 3 minutes were
easily available. Thus, a large window of opportunity is available to our Rogue AP
detection mechanism.
Given the availability of such opportunities, one can use any heuristic to predict idle
times for launching an active scan (which takes 2 seconds). We studied the effectiveness
of a simple history-based heuristic: if the network has been idle for X seconds, it predicts
that the network will be idle for the next 2 seconds. Thus, after every 5 minutes, the Rogue
AP detection module can perform an active scan whenever it observes that the network
interface has been idle for X seconds. We evaluated the effectiveness of this heuristic
over our 3-machine traces with two different values of X: 5 and 10 seconds. With both
values of X, we observed that the active scan would complete within the idle period
for more than 95% of the cases. The effectiveness of this heuristic shows that wireless
clients can perform active scans for Rogue AP detection without hurting performance.
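The heuristic can be sketched as follows; the trace representation, a sorted list of packet timestamps within one 5-minute period, is an assumption of this sketch:

```python
SCAN_SECS = 2.0  # an active scan takes up to 2 seconds

def scan_start_time(packet_times, period_start, period_end, x):
    """Return the time at which the heuristic would launch the scan, or None
    if the network is never idle for x consecutive seconds in the period."""
    last = period_start
    for t in sorted(packet_times) + [period_end]:
        if t - last >= x:
            return last + x       # interface idle for x secs: launch the scan
        last = max(last, t)
    return None

def scan_succeeds(packet_times, period_start, period_end, x):
    """True if the launched 2-second scan finishes before the next packet."""
    start = scan_start_time(packet_times, period_start, period_end, x)
    if start is None:
        return False
    return all(not (start <= t < start + SCAN_SECS) for t in packet_times)
```

Replaying a trace through scan_succeeds with X = 5 or 10 seconds mirrors the evaluation above, where the scan completed within the idle period in more than 95% of the cases.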
As discussed in Section 4.4.3, our architecture is designed to scale with the number
of access points and clients in the system. We now discuss why our proactive and
reactive techniques maintain the scalability property. We also argue why our reactive
mechanisms impose low network overhead even if a number of clients are experiencing
faults simultaneously.
As discussed in Section 4.7, each DC pro-actively sends the RSSI, SSID, and MAC
address information about nearby devices to the DAP every 30 seconds; this information is
necessary for Rogue AP detection. The DAP filters this data and sends information
about APs every 30 seconds. To understand the network bandwidth consumed on the
wireless link, we set up an experiment with a single DC, DAP and DS for 4 hours. We
observed that the bandwidth consumption by the DC was less than 0.2 Kbps and the
DAP’s bandwidth requirements were less than 0.01 Kbps. This result implies that even
if a large number of clients were present, the bandwidth usage would still be low, e.g., 20 Kbps
for 100 DCs. Thus, for pro-active monitoring, our techniques have negligible
bandwidth requirements.
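The scaling arithmetic above can be checked with a one-line computation; the per-DC rate is the measured upper bound, and the client count is illustrative:

```python
PER_DC_KBPS = 0.2  # measured upper bound on a single DC's report traffic

def aggregate_kbps(num_clients):
    """Total proactive-report bandwidth if every client reports at the measured rate."""
    return PER_DC_KBPS * num_clients
```

Since reports are periodic and independent, aggregate load grows linearly with the number of clients, which is why 100 DCs stay around 20 Kbps.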
We now analyze the bandwidth overheads of our reactive diagnosis mechanisms, i.e.,
Client Conduit and EDEN; we do not discuss DIAL’s overheads separately since DIAL’s beaconing
messages are part of Client Conduit and the overheads of sending the RSSI information
are covered by the proactive monitoring analysis above.
The bandwidth requirements of EDEN and the Connection Setup part (beacons and
probe messages) of Client Conduit are low since these protocols send small broadcast
or beacon packets at a low frequency, e.g., every 100 msecs in Client Conduit and ev-
ery 2 seconds in EDEN. The bandwidth consumption while using MultiNet can also
be controlled: as stated in Section 4.5.2, the connected client can limit the amount of
bandwidth that it allocates to the disconnected client. Thus, if a single client needs help,
the overheads on the network are low.
We now analyze the overheads when a large number of clients (say 50) in an area
have wireless faults and are utilizing our reactive mechanisms to diagnose their prob-
lems. Our basic idea for ensuring that the performance of the network does not deteri-
orate is to rate-limit our mechanisms; we have not implemented these protocol extensions
in our current prototype. In Client Conduit, when a disconnected client overhears the
SOS beacons of N other disconnected clients, instead of beaconing every 100
msec, it sends out a beacon every K msecs where K is a random number between 0
and 100*N msecs. This self-regulation ensures that the network is not swamped by
Client Conduit beacons if a sudden loss of coverage occurs in an area. A similar self-
regulatory mechanism is used to limit the rate at which the initial broadcast packets are
sent in EDEN. Furthermore, to limit the overheads on a connected client C (and possibly
reduce the reactive scheme’s load on the DAP and DS), we can use a policy such that
C helps only one client at any given point. Thus, with these policy decisions, we can
ensure that Client Conduit and EDEN impose low bandwidth overheads even when a
large number of clients need help simultaneously.
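The self-regulation rule can be sketched as follows (the function name is illustrative):

```python
import random

BASE_PERIOD_MSEC = 100  # normal Client Conduit SOS beacon period

def beacon_period_msec(num_other_sos_clients):
    """Randomized beacon period K: uniform in (0, 100*N] msec when N other
    disconnected clients are overheard; the normal 100 msec otherwise."""
    n = num_other_sos_clients
    if n <= 0:
        return BASE_PERIOD_MSEC
    return random.uniform(0, BASE_PERIOD_MSEC * n)
```

With N clients each beaconing at an average period of about 100*N/2 msec, the aggregate beacon rate stays roughly constant instead of growing with N, which is the point of the rate limit.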
There are a number of additional problems in wireless fault diagnosis that require further
investigation:
• An interesting problem is to detect Rogue Ad-hoc Networks. Such networks are created when
a user connected to the corporate network (e.g., via a wired network) sets up
an IEEE 802.11 ad-hoc network with an unauthenticated client. Thus, like the
Rogue AP scenario, such a network can compromise the security of the corporate
network.
was not discussed in this chapter. For example, the system could analyze the IEEE
• In Section 4.6.1, we show how the location of disconnected clients can be deter-
mined when a few connected clients are present nearby. The question remains:
what should be done when there are no connected clients in the neighborhood?
One approach may be to have the client log its last known location where connec-
tivity was available.
• The next logical step after diagnosis is recovery. Once a fault has been detected,
one needs to determine what automatic steps the system should take to resolve the
fault.
4.10 Summary
The rising popularity of IEEE 802.11 networks has made fault detection and diagno-
sis an important problem for IT managers responsible for maintaining these networks.
Interestingly, the wireless research community has overlooked these problems, perhaps
because maintenance issues surface only after large deployments are in place, which has
happened only recently.
In this chapter, we presented novel solutions for detecting and diagnosing a variety of faults.
Our initial results show that our mechanisms of locating RF holes, detecting Rogue
APs, and diagnosing performance problems are effective and impose low overheads.
Furthermore, we show that a novel mechanism called Client Conduit can be used for
diagnosing the problems of disconnected clients. These mechanisms are part of a
general architecture that uses clients, APs, and backend servers together for diagnosing
faults.
The general problem space of effective network management for IEEE 802.11 net-
works is large. Our fault diagnosis architecture is a first attempt at addressing some of
the problems that arise in real deployments. It is our hope that this work will stimulate other researchers to investigate
such problems further and propose solutions that will eventually result in the smooth
operation of wireless networks.
The contents of this chapter were developed in joint work with Atul Adya, Victor
Bahl and Lili Qiu. The idea to work on this problem was conceived by Victor Bahl.
He also helped define the problem space. I designed the fault diagnosis architecture,
described in Section 4.4, the Client Conduit Protocol and the Rogue AP algorithm, along
with Atul Adya. I implemented the Client Conduit protocol, and Atul implemented the
Rogue AP algorithm of Section 4.6.3. I also designed the performance isolation and
location determination algorithms, presented in Sections 4.6.1 and 4.6.2, in joint work
CONCLUSION
To the best of our knowledge, this dissertation is the first to look at the problem of
virtualizing a single wireless card to connect to multiple networks simultaneously.
The MultiNet solution is a new virtualization architecture for wireless network cards;
it has been released by Microsoft Research as part of its Mesh Networking Academic
Resource Toolkit [104]. In addition to describing MultiNet, this dissertation also presents
two of its applications: SSCH and Client Conduit.
SSCH is a channel hopping protocol for increasing the capacity of wireless ad hoc
networks. SSCH can be implemented in the link layer of the network stack and works
over the IEEE 802.11 standard. It is the first multi-channel protocol we are aware of
that works over a single wireless card without requiring a dedicated control channel.
We show that SSCH significantly increases the capacity of wireless ad hoc networks.
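The core idea behind SSCH's distributed rendezvous can be sketched as follows: each node advances its channel by a pseudo-random seed in every slot, so two nodes that adopt the same (channel, seed) pair follow identical schedules and overlap in every slot, while nodes with different seeds rarely contend for the same channel. The constants and update rule below are illustrative assumptions for this sketch, not the exact protocol.

```python
# Illustrative sketch of SSCH-style seeded channel hopping. The channel
# count and update rule are assumptions, not the exact SSCH parameters.
NUM_CHANNELS = 13  # e.g., the number of IEEE 802.11b channels

def schedule(channel, seed, num_slots):
    """Return the channel visited in each slot for one (channel, seed) pair.

    After each slot the channel advances by the seed, modulo the number of
    channels, so two nodes sharing the same pair hop identically and stay
    overlapped in every slot.
    """
    slots = []
    for _ in range(num_slots):
        slots.append(channel)
        channel = (channel + seed) % NUM_CHANNELS
    return slots

# Two nodes that have learned the same (channel, seed) pair overlap everywhere:
a = schedule(channel=3, seed=5, num_slots=8)
b = schedule(channel=3, seed=5, num_slots=8)
assert a == b

# Nodes with a different seed mostly land on different channels per slot,
# so disjoint flows rarely interfere with each other.
c = schedule(channel=3, seed=7, num_slots=8)
overlap = sum(1 for x, y in zip(a, c) if x == y)
```

With a prime channel count, two schedules that start on the same channel but use different seeds coincide only once per cycle, which is the property that lets disjoint communications proceed largely in parallel.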
Client Conduit is a key component of our fault diagnosis architecture. It uses MultiNet
to help disconnected clients while consuming little of the bandwidth of connected
machines. Client Conduit has been implemented on Windows XP.
In addition to SSCH and Client Conduit, MultiNet enables the design of a whole
new class of applications. System designers are no longer constrained by the number
of wireless cards they can fit into a system. They are free to design systems and
applications that can connect to many wireless networks at the same time. We believe
that MultiNet will enable many such applications.
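The virtualization idea summarized above can be illustrated with a minimal sketch: one physical card is time-multiplexed across several virtual adapters, and packets destined for an inactive network are buffered until the card switches to that network. The class and method names here are hypothetical, chosen for illustration, and do not correspond to the actual driver interface.

```python
# Minimal sketch of MultiNet-style virtualization: a single physical card
# serves several virtual adapters in turn, buffering traffic for networks
# that are currently inactive. Names are illustrative, not the driver API.
from collections import deque

class VirtualAdapter:
    def __init__(self, ssid):
        self.ssid = ssid
        self.buffer = deque()  # packets queued while this network is inactive

    def send(self, packet):
        self.buffer.append(packet)

class MultiplexedCard:
    def __init__(self, adapters):
        self.adapters = adapters
        self.active = 0        # index of the currently connected network

    def switch_and_drain(self):
        """Associate with the next network and flush its buffered packets."""
        self.active = (self.active + 1) % len(self.adapters)
        adapter = self.adapters[self.active]
        sent = []
        while adapter.buffer:
            sent.append(adapter.buffer.popleft())  # transmit on the real card
        return adapter.ssid, sent

# Applications see two independent "cards"; the driver switches beneath them.
infra = VirtualAdapter("corp-infrastructure")
adhoc = VirtualAdapter("rescue-adhoc")
card = MultiplexedCard([infra, adhoc])
adhoc.send("hello")
ssid, drained = card.switch_and_drain()  # now on "rescue-adhoc"
```

The switching policy (how long to stay on each network) is where the interesting trade-offs lie; in MultiNet that policy lives in a user-level service while the buffering and switching mechanisms stay in the kernel driver.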
Through its constructions, this dissertation contributes towards solving some of the
key problems in existing wireless networks, in particular power, capacity and manage-
ability. MultiNet saves battery power by not requiring multiple wireless cards to stay
powered on at all times. SSCH improves capacity by utilizing frequency diversity across
channels. Finally, this dissertation presents a new client-centric fault diagnosis
architecture for infrastructure wireless networks.
[2] B. Aboba and D. Simon. PPP EAP TLS Authentication Protocol. In RFC 2716,
October 1999.
[3] A. Adya, P. Bahl, R. Chandra, and L. Qiu. Architecture and Techniques for
Diagnosing Faults in IEEE 802.11 Infrastructure Networks. In Proc. of ACM
MobiCom, Philadelphia, PA, September 2004.
[4] Atul Adya, Paramvir Bahl, Jitendra Padhye, Alec Wolman, and Lidong Zhou. A
Multi-Radio Unification Protocol for IEEE 802.11 Wireless Networks. Technical
Report MSR-TR-2003-44, Microsoft Research, July 2003.
[9] M. Alicherry, R. Bhatia, and L. Li. Joint Channel Assignment and Routing for
Throughput Optimization in Multi-radio Wireless Mesh Networks. In MobiCom,
August 2005.
[10] M. Allman, W. Eddy, and S. Ostermann. Estimating Loss Rates With TCP. In
ACM Perf. Evaluation Review 31(3), Dec 2003.
[16] P. Bahl, R. Chandra, and J. Dunagan. SSCH: Slotted Seeded Channel Hopping
for Capacity Improvement in IEEE 802.11 Ad-Hoc Wireless Networks. In Proc.
of ACM MobiCom, Philadelphia, PA, September 2004.
[18] P. Barford and M. Crovella. Generating Representative Web Workloads for Net-
work and Server Performance Evaluation. In ACM SIGMETRICS 1998, pages
151–160, July 1998.
[19] P. Barford and M. Crovella. Critical Path Analysis of TCP Transactions. In Proc.
of ACM SIGCOMM, Stockholm, Sweden, Aug 2000.
[22] J. Bellardo and S. Savage. Measuring Packet Reordering. In Proc. of ACM Inter-
net Measurement Workshop, Marseille France, Nov 2002.
[30] R. Chandra, P. Bahl, and P. Bahl. MultiNet: Connecting to Multiple IEEE 802.11
Networks Using a Single Wireless Card. In Proc. of IEEE INFOCOM, Hong
Kong, Mar 2004.
[38] Intel, Compaq, and Microsoft Corporations. Virtual Interface Specification, Ver-
sion 1.0, December 1997.
[42] T. ElBatt and B. Ryu. On the Channel Reservation Schemes for Ad-hoc Net-
works: Utilizing Directional Antennas. In IEEE International Symposium on
Wireless Personal Multimedia Communications, October 2002.
[50] Motorola Government and Enterprise. Motorola’s Mobile Mesh Networks Tech-
nology. http://www.motorola.com/governmentandenterprise/.
[53] Hung-Yun Hsieh, Kyu-Han Kim, Yujie Zhu, and Raghupathy Sivakumar. A
receiver-centric transport protocol for mobile hosts with heterogeneous wireless
interfaces. In Proceedings of the 9th annual international conference on Mobile
computing and networking, pages 1–15. ACM Press, 2003.
[54] L. Huang and T. Lai. On the scalability of IEEE 802.11 ad hoc networks. In Pro-
ceedings of the 3rd ACM international symposium on Mobile Ad Hoc Networking
& Computing, MobiHoc, pages 173–182. ACM Press, 2002.
[57] IEEE. IEEE 802.1x-2001 IEEE Standards for Local and Metropolitan Area Net-
works: Port-Based Network Access Control, 1999.
[58] IEEE Computer Society. Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications. IEEE Standard 802.11, 1999.
[61] Crossbow Technology Inc. Motes, Smart Dust Sensors, Wireless Sensor Net-
works. http://www.xbow.com/Products/Wireless_Sensor_Networks.htm.
[66] N. Jain and S. R. Das. A Multichannel CSMA MAC Protocol with Receiver-
Based Channel Selection for Multihop Wireless Networks. In International Con-
ference on Computer Communications and Networks (IC3N), October 2001.
[67] Jinyang Li, Charles Blake, Douglas S. J. De Couto, Hu Imm Lee, and Robert
Morris. Capacity of Ad Hoc Wireless Networks. In Mobile Computing and
Networking, pages 61–69, 2001.
[68] D. Johnson, D. Maltz, and J. Broch. DSR: The Dynamic Source Routing Proto-
col for Multihop Wireless Ad Hoc Networks. In C.E. Perkins, editor, Ad Hoc
Networking, chapter 5, pages 139–172. Addison-Wesley, 2001.
[69] E. Jung and N. Vaidya. An Energy Efficient MAC Protocol for Wireless LANs.
In IEEE INFOCOM 2002, June 2002.
[71] R. Krashinsky and H. Balakrishnan. Minimizing Energy for Wireless Web Access
with Bounded Slowdown. In ACM MobiCom 2002, pages 119–130, September
2002.
[72] R. Kravets and R. Krishnan. Power Management Techniques for Mobile Com-
munications. In ACM MobiCom 1998, October 1998.
[74] T. H. Lai and D. Zhou. Efficient and Scalable IEEE 802.11 Ad-Hoc Mode Timing
Synchronization Function. In Proc. of International Conference on Advanced
Information Networking and Applications, March 2003.
[75] L. Lamport. Time, Clocks and the Ordering of Events in Distributed Systems. In
Communications of the ACM, volume 21, pages 558–565, 1978.
[76] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM
TOPLAS, 4(3):382–401, July 1982.
[77] C. Law, A. K. Mehta, and K. Siu. A New Bluetooth Scatternet Formation Proto-
col. To appear in ACM Mobile Networks and Applications Journal, 2002.
[78] C. Law and K. Siu. A Bluetooth Scatternet Formation Algorithm. In IEEE Sym-
posium on Ad Hoc Wireless Networks 2001, November 2001.
[80] Y. Li, H. Wu, D. Perkins, N. Tzeng, and M. Bayoumi. MAC-SCC: Medium Ac-
cess Control with a Separate Control Channel for Multihop Wireless Networks.
In 23rd International Conference on Distributed Computing Systems Workshops
(ICDCSW), 2003.
[86] Microsoft Corp. Native 802.11 Framework for IEEE 802.11 Networks.
http://www.microsoft.com.
[89] A. Nasipuri and S. R. Das. Multichannel CSMA with Signal Power-Based Chan-
nel Selection for Multihop Wireless Networks. In IEEE Vehicular Technology
Conference (VTC), September 2000.
[91] S. Ni, Y. Tseng, Y. Chen, and J. Sheu. The Broadcast Storm Problem in a Mobile
Ad Hoc Network. In ACM MobiCom, August 1999.
[92] L. Nord and J. Haartsen. The Bluetooth Radio Specification and The Bluetooth
Baseband Specification. Bluetooth, 1999-2000.
[99] L. Qiu, P. Bahl, A. Rao, and L. Zhou. Fault Detection, Isolation, and Diagnosis
in Multihop Wireless Networks. Technical Report MSR-TR-2004-11, Microsoft
Research, Redmond, WA, Dec 2003.
[100] I. Ramani and S. Savage. SyncScan: Practical Fast Handoff for 802.11 Infras-
tructure Networks. In Proc. of IEEE Infocom, Miami, FL, March 2005.
[101] M. Raya, J. P. Hubaux, and I. Aad. DOMINO: A System to Detect Greedy Be-
havior in IEEE 802.11 Hotspots. In Proc. of MobiSys, Boston, MA, June 2004.
[108] R. Rozovsky and P. Kumar. SEEDEX: A MAC Protocol for Ad Hoc Networks.
In ACM MobiHoc, 2001.
[110] J. Sheu, C. Chao, and C. Sun. A Clock Synchronization Algorithm for Multi-
Hop Wireless Ad Hoc Networks. In Proc. of IEEE International Conference on
Distributed Computing Systems, ICDCS, Tokyo, March 2004.
[111] E. Shih, P. Bahl, and M. Sinclair. Wake On Wireless: An Event Driven Energy
Saving Strategy for Battery Operated Devices. In MOBICOM, September 2002.
[112] E. Shih, P. Bahl, and M. Sinclair. Wake on Wireless: An event driven power
saving strategy for battery operated devices. In ACM MobiCom 2002, September
2002.
[113] M. Shin, A. Mishra, and W. Arbaugh. Improving the Latency of 802.11 Handoffs
Using Neighbor Graphs. In Proc. of MobiSys, Boston, MA, June 2004.
[117] R. Stevens. TCP/IP Illustrated (Vol. 1): The Protocols. Addison Wesley, 1994.
[118] R. Stine. FYI on a Network Management Tool: Catalog Tools for Monitoring and
Debugging TCP/IP Internets and Interconnected Devices. In IETF RFC 1147,
April 1990.
[121] SuperPass. Wireless LAN PCI card for 2.4 GHz. http://www.superpass.com/SP-
PCI-01.html.
[125] P. Verissimo and L. Rodrigues. A Posteriori Agreement for Fault Tolerant Clock
Synchronization on Broadcast Networks. In Proc. of International Symposium
on Fault-Tolerant Computing (FTCS), page 85, July 1992.
[128] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-Level Network
Interface for Parallel and Distributed Computing. In Proc. of ACM SOSP, New
York, December 1995.
[129] R. Want, A. Hopper, V. Falcao, and J. Gibbons. The Active Badge Location
System. ACM Transactions on Information Systems, 10(1), January 1992.
[130] A. Whitaker, M. Shaw, and S. D. Gribble. Scale and Performance in the Denali
Isolation Kernel. In Fifth Symposium on Operating Systems Design and Imple-
mentation, December 2002.
[135] S.-L. Wu, C.-Y. Lin, Y.-C. Tseng, and J.-P. Sheu. A New Multi-Channel MAC
Protocol with On-Demand Channel Assignment for Mobile Ad Hoc Networks.
In International Symposium on Parallel Architectures, Algorithms and Networks
(I-SPAN), 2000.
[136] S. Xu and T. Saadawi. Does the IEEE 802.11 MAC Protocol Work Well in Mul-
tihop Wireless Ad Hoc Networks? IEEE Communications Magazine, pages 130–137, June
2001.