
A VIRTUALIZATION ARCHITECTURE FOR

WIRELESS NETWORK CARDS

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Ranveer Chandra

January 2006

© 2006 Ranveer Chandra

ALL RIGHTS RESERVED


A VIRTUALIZATION ARCHITECTURE FOR WIRELESS NETWORK CARDS

Ranveer Chandra, Ph.D.

Cornell University 2006

This doctoral dissertation describes the design and applications of a new virtualization

architecture for wireless network cards, called MultiNet. MultiNet virtualizes a sin-

gle wireless card to appear as multiple virtual wireless cards to the user. Each virtual

card can then be configured separately on a physically different network. The goal of

MultiNet is to provide a user-level illusion of simultaneous connectivity on all virtual

cards although the network card is on a single network at any instant. MultiNet achieves

this transparency using intelligent buffering and switching algorithms. The switching

and buffering mechanisms are implemented as a kernel driver, while the policies are

implemented as a user-level service. The MultiNet system has been implemented over

Windows XP and has been operational for over two years. It is agnostic of the upper

layer protocols, and works well over popular IEEE 802.11 wireless LAN cards. Further,

MultiNet enables a new class of applications that were previously possible only with multiple wireless cards in the device. This dissertation describes two such applications:

Slotted Seeded Channel Hopping (SSCH) and Client Conduit.

SSCH is a new channel hopping protocol that works over MultiNet, and utilizes fre-

quency diversity to increase the capacity of IEEE 802.11 wireless networks. Each node

using SSCH switches across channels in such a manner that nodes desiring to communi-

cate overlap, while disjoint communications do not overlap, and hence do not interfere

with each other. To achieve this, SSCH uses a novel scheme for distributed rendezvous

and synchronization. Simulation results show that SSCH significantly increases network

capacity in several multihop and single hop wireless networking scenarios.

Client Conduit is a novel technique for providing connectivity to disconnected wire-


less clients with the help of nearby connected clients. It is based on MultiNet and takes

advantage of the beaconing and probing mechanisms of IEEE 802.11 to ensure that

connected clients do not pay unnecessary overheads while helping disconnected clients.

Client Conduit has been implemented over Windows XP as part of an architecture for

diagnosing faults in wireless networks.


BIOGRAPHICAL SKETCH

Ranveer was born in Jamshedpur, an industrial town in eastern India, on August 27, 1976, as the third of four children. He lived in Jamshedpur for the first 18 years

of his life and decided to appear for the IIT exam after finishing high school. Ranveer

secured a good rank in the IIT qualifying exam and decided to go to IIT Kharagpur,

which was within 100 miles of Jamshedpur. IIT Kharagpur provided an ideal setting

for Ranveer to complete his undergraduate education in an environment that had good

professors, extraordinary peers, little distraction, and still a lot of fun. Ranveer ma-

jored in Computer Science, and developed a keen interest in computer networking and

distributed systems. The opportunity of solving challenging problems in these fields

motivated Ranveer to study further. He applied to a few schools in the United States,

and decided to go to Cornell University in Ithaca, NY for his PhD in Computer Science.

During his six years at Cornell University, Ranveer worked with a number of people there. He also spent three summers at Microsoft Research and one at AT&T Labs - Research, and enjoyed working in industrial research labs. After completing his PhD,

Ranveer is headed to the Pacific Northwest, where he has accepted an offer from Microsoft Research in Redmond, WA.

ACKNOWLEDGEMENTS

First, I want to thank my advisor, Ken Birman, for his constant support and guidance

during my six years of PhD study at Cornell University. He kept me motivated and

provided the right direction that enabled me to finish these challenging years of work.

His sharp intellect and insightful comments were always a guiding force in my PhD. Further, his towering stature in the field of Computer Science has been, and will always be, a role model for what I want to achieve with my research.

Secondly, I am grateful to Victor Bahl for helping me discover what I really wanted to do in research. Interactions with him during my three internships made me aware of the open problems in wireless networking, and of what I needed to do to make an impact in this field. Victor has also been a constant source of encouragement. His unbridled enthusiasm on seeing results always motivated me to go further in my research.

I am also grateful to my other committee members, Eva Tardos, Zygmunt Haas and Robbert van Renesse, who have been supportive of my research at every step of my PhD. Their comments have been very valuable in revising the final draft of this dissertation.

I would also like to thank my other coauthors at Microsoft Research. In particu-

lar, Atul Adya has been a great influence during my PhD. His views and ideas have

influenced the way I write, present and do my research. Lili Qiu has shown me how perseverance, patience and good work always pay off. Finally, John Dunagan has been of great help in reviewing my work and showing me the right direction. In addition, I would also like to thank Alec Wolman and Jitu Padhye for great research conversations.

Finally, I would like to acknowledge the contribution of my family members and

friends for keeping me motivated to finish my PhD. My parents have shown their belief

in me and supported me in every possible way. My sister and brother-in-law have al-

ways been with me through the troubled phases of my PhD. I would also like to thank

Meenakshi, Biswanath, Ben, Rimon, Indranil and Rama for making the six years of stay

in Ithaca very enjoyable.

TABLE OF CONTENTS

1 Introduction 1
1.1 Problems with Existing Wireless Networks . . . . . . . . . . . . . . . 1
1.2 Thesis and Its Contributions . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Limitations of this Dissertation . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Roadmap of this Dissertation . . . . . . . . . . . . . . . . . . . . . . . 5

2 The MultiNet Virtualization Approach 6


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Motivating Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.1 Limitations in Existing Systems . . . . . . . . . . . . . . . . . 13
2.4.2 Power Save Mode (PSM) of IEEE 802.11 . . . . . . . . . . . . 13
2.4.3 Next Generation of IEEE 802.11 WLAN cards . . . . . . . . . 14
2.5 MultiNet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.1 Assumptions about the System . . . . . . . . . . . . . . . . . . 14
2.5.2 MultiNet Design Goals . . . . . . . . . . . . . . . . . . . . . . 16
2.5.3 The MultiNet Approach . . . . . . . . . . . . . . . . . . . . . 18
2.5.4 Delivering Packets to Virtual Interfaces . . . . . . . . . . . . . 21
2.5.5 Determining the Activity Period for a Network . . . . . . . . . 24
2.5.6 Handling Ad Hoc Networks with Multiple MultiNet Nodes . . . 25
2.6 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.1 MultiNet Driver . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6.2 MultiNet Service . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.6.3 Implementing Buffering . . . . . . . . . . . . . . . . . . . . . 31
2.6.4 Implementing Slotted Synchronization . . . . . . . . . . . . . 31
2.7 System Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7.1 Test Configuration . . . . . . . . . . . . . . . . . . . . . . . . 32
2.7.2 Reducing the Switching Delay . . . . . . . . . . . . . . . . . . 33
2.7.3 Comparing Different Switching Strategies . . . . . . . . . . . . 34
2.7.4 Adaptive Switching using MultiNet . . . . . . . . . . . . . . . 37
2.7.5 MultiNet with and without Buffering . . . . . . . . . . . . . . 38
2.7.6 MultiNet with Slotted Synchronization . . . . . . . . . . . . . 39
2.7.7 MultiNet on a Mobile Node . . . . . . . . . . . . . . . . . . . 41
2.7.8 MultiNet versus Multiple Radios . . . . . . . . . . . . . . . . . 42
2.7.9 Maximum Connectivity in MultiNet . . . . . . . . . . . . . . . 49
2.7.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.8 Discussion on the MultiNet Architecture . . . . . . . . . . . . . . . . . 51
2.8.1 Reducing the Switching Overhead . . . . . . . . . . . . . . . . 51
2.8.2 Network Port Based Authentication . . . . . . . . . . . . . . . 52
2.8.3 Can MultiNet be done in the Firmware? . . . . . . . . . . . . . 53
2.9 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3 SSCH: Capacity Improvement Using MultiNet 56


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.2 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . 57
3.3 Hardware and MAC Assumptions . . . . . . . . . . . . . . . . . . . . 61
3.4 Prior Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.5 SSCH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5.1 Packet Scheduling . . . . . . . . . . . . . . . . . . . . . . . . 67
3.5.2 Channel Scheduling . . . . . . . . . . . . . . . . . . . . . . . 69
3.5.3 Mathematical Properties of SSCH . . . . . . . . . . . . . . . . 76
3.6 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.6.1 Microbenchmarks . . . . . . . . . . . . . . . . . . . . . . . . 79
3.6.2 Macrobenchmarks: Single-hop Case . . . . . . . . . . . . . . . 85
3.6.3 Macrobenchmarks: Multihop and Mobility . . . . . . . . . . . 91
3.6.4 Implementation Considerations . . . . . . . . . . . . . . . . . 98
3.7 Alternatives to SSCH . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.8 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

4 Client Conduit and Fault Diagnosis in Wireless Networks 103


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.2 Faults in a Wireless Network . . . . . . . . . . . . . . . . . . . . . . . 106
4.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
4.4.1 System Requirements . . . . . . . . . . . . . . . . . . . . . . 111
4.4.2 System Components . . . . . . . . . . . . . . . . . . . . . . . 112
4.4.3 System Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.4.4 System Security . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.5 Client Conduit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.5.1 The Client Conduit Protocol . . . . . . . . . . . . . . . . . . . 118
4.5.2 Client Conduit Security and Attacks . . . . . . . . . . . . . . . 121
4.6 Fault Detection and Diagnosis . . . . . . . . . . . . . . . . . . . . . . 124
4.6.1 Locating Disconnected Clients . . . . . . . . . . . . . . . . . . 124
4.6.2 Network Performance Problems . . . . . . . . . . . . . . . . . 125
4.6.3 Rogue AP Detection . . . . . . . . . . . . . . . . . . . . . . . 130
4.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.8 System Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.8.1 Cost of Individual Operations . . . . . . . . . . . . . . . . . . 139
4.8.2 Client Conduit . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.8.3 Location Determination . . . . . . . . . . . . . . . . . . . . . 143
4.8.4 Estimating Wireless Delays . . . . . . . . . . . . . . . . . . . 147
4.8.5 Rogue AP Detection . . . . . . . . . . . . . . . . . . . . . . . 150
4.8.6 Scalability Analysis . . . . . . . . . . . . . . . . . . . . . . . 154

4.9 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

5 Conclusion 158

References 160

LIST OF TABLES

2.1 The Switching Delays between IS and AH networks for IEEE 802.11
cards with and without the optimization of trapping media connect and
disconnect messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 The average throughput in the ad hoc and infrastructure networks using
both strategies of MultiNet and two radios . . . . . . . . . . . . . . . 45
2.3 The average packet delay in infrastructure mode for the various strategies 46
2.4 The average packet delay in infrastructure mode on varying the number
of MultiNet connected networks . . . . . . . . . . . . . . . . . . . . . 50

4.1 Different fault diagnosis mechanisms and entities that can diagnose
them; the last column indicates if the solution can be supported using
legacy APs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
4.2 Times for different operations: U means time measured from user-level
code; rest are times taken for the corresponding ioctl to complete . . . 140

LIST OF FIGURES

2.1 The MultiNet Layer maintains virtual interfaces for networks 1, 2 and
3, and switches the physical card across all these networks. It gives the
illusion of connectivity on all networks although the card is on network
2 at this instant. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 The steps of Spoofed Buffering when a node uses MultiNet to connect
to two networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Two nodes in communication range and using MultiNet that fail to
overlap in the ad hoc network and hence experience a logical partitioning. 26
2.4 The Network Stack with MultiNet . . . . . . . . . . . . . . . . . . . . 29
2.5 Time taken to complete a 47 MB FTP transfer on an ad hoc and infras-
tructure network using different switching strategies . . . . . . . . . . 36
2.6 Variation of the activity period for two networks with time. The activity
period of a network is directly proportional to the relative traffic on it. . 37
2.7 TCP Performance with and without Spoofed Buffering. . . . . . . . . 39
2.8 Effect on UDP flows when a node uses Slotted Synchronization to join
an ad hoc network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.9 MultiNet in a Mobile Scenario . . . . . . . . . . . . . . . . . . . . . . 42
2.10 Packet trace for the web browsing application over the infrastructure
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.11 Packet trace for the presentation and chat workloads over the ad hoc
network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.12 Comparison of total energy usage when using MultiNet versus two radios 47
2.13 Energy usage when using MultiNet and two radios with IEEE 802.11
Power Saving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

3.1 Only one of the three packets can be transmitted when all the nodes are
on the same channel. . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Channel hopping schedules for two nodes with 3 channels and 2 slots.
Node A always overlaps with Node B in slot 1 and the parity slot. The
field of the channel schedule that determines the channel during each
slot is shown in bold. . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3 The problem with a naive synchronization scheme. Node A has two
slots, with (channel, seed) pairs represented by A1 and A2 ; nodes B
and C are similarly depicted. At time t1 , node A synchronizes with
node B. Node B synchronizes with node C at time t2 , after which A
and B are no longer synchronized. . . . . . . . . . . . . . . . . . . . . 72
3.4 Need for De-synchronization: All nodes converge to the same channel
without de-synchronization. . . . . . . . . . . . . . . . . . . . . . . . 74
3.5 Switching and Synchronizing Overhead: Node 1 starts a maximum rate
UDP flow to Node 2. We show the throughput for both SSCH and IEEE
802.11a. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

3.6 Overhead of an Absent Node: Node 1 is sending a maximum rate UDP
stream to Node 2. Node 1 then attempts to send a packet to a non-
existent node. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.7 Overhead of a Parallel Session: Node 1 is sending a maximum rate
UDP stream to Node 2. Node 1 then starts a second stream to Node 3. . 83
3.8 Overhead of Mobility: Node 1 is sending a maximum rate UDP stream
to Node 2. Node 1 starts another maximum rate UDP session to Node
3. Node 3 moves out of range at 30 seconds, while Node 1 continues to
attempt to send until 43 seconds. . . . . . . . . . . . . . . . . . . . . . 84
3.9 Overhead of Clock Skew: Throughput between two nodes using SSCH
as a function of clock skew. . . . . . . . . . . . . . . . . . . . . . . . 85
3.10 Disjoint Flows: The throughput of each flow on increasing the number
of flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.11 Disjoint Flows: The system throughput on increasing the number of
flows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.12 Non-disjoint Flows: The average throughput of each flow on increasing
the number of flows. There is a flow from every node in the network. . 88
3.13 Non-disjoint Flows: The system throughput on increasing the number
of flows. There is a flow from every node in the network. . . . . . . . . 89
3.14 Effect of Flow Duration: Ratio of SSCH average throughput to IEEE
802.11a average throughput for flows having different durations. . . . . 90
3.15 TCP over SSCH: Steady-state TCP throughput when varying the num-
ber of non-disjoint flows. . . . . . . . . . . . . . . . . . . . . . . . . . 91
3.16 Multihop Chain Network: Variation in throughput as chain length in-
creases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.17 Multihop Mesh Network of 100 Nodes: Average flow throughput on
varying the number of flows in the network. . . . . . . . . . . . . . . . 94
3.18 Impact of SSCH on Unmodified MANET Routing Protocols: The av-
erage time to discover a route and the average route length for 10 ran-
domly chosen routes in a 100 node network using DSR over SSCH. . . 95
3.19 Dense Multihop Mobile Network: The per-flow throughput and the av-
erage route length for 10 flows in a 100 node network in a 200m×200m
area, using DSR over both SSCH and IEEE 802.11a. . . . . . . . . . . 97
3.20 Sparse Multihop Mobile Network: The per-flow throughput and the
average route length for 10 flows in a 100 node network in a 300m ×
300m area, using DSR over both SSCH and IEEE 802.11a. . . . . . . 98

4.1 Number of wireless related complaints logged by the IT department of


a major US corporation . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.2 Fault Diagnosis Architecture . . . . . . . . . . . . . . . . . . . . . . . 114
4.3 Client Conduit Mechanism (Steps 1 through 5 are described below) . . 119
4.4 Decision steps taken by the DS to determine if an AP is a Rogue AP or
not . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.5 Components on DC and DAP . . . . . . . . . . . . . . . . . . . . . . 135

4.6 CPU usage in Promiscuous mode (1 GHz machine) . . . . . . . . . . . 141
4.7 Breakdown of costs for Client Conduit. The protocol steps are executed
from the bottom entry in the legend to the topmost, i.e., starting at “Set
channel”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.8 Time taken by a disconnected client to transfer data via Multinet . . . . 144
4.9 Median error in locating disconnected clients. The lower and upper
bounds of error bars correspond to min and max error. E(i) denotes
that the ith connected client’s location contains error. . . . . . . . . . 145
4.10 EDEN’s accuracy in estimating the delay at a client . . . . . . . . . . 148
4.11 Breakdown of delay at the client, AP, and the medium as estimated by
EDEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
4.12 Overlapping channels on which an AP is overheard . . . . . . . . . . . 151
4.13 Overlapping channels heard relative to distance . . . . . . . . . . . . . 152
4.14 The maximum idle time duration available during every 5-minute pe-
riod at different times of the day . . . . . . . . . . . . . . . . . . . . . 152

CHAPTER 1

INTRODUCTION

There has been recent interest in using multiple wireless cards in a device [9, 64, 87, 95,

115, 119]. This dissertation provides a cheaper and more energy-efficient scheme to get

the functionality of multiple wireless cards while using only a single physical network

interface. This approach, called MultiNet, is a new architecture for virtualizing wireless cards. MultiNet helps solve some of the key problems in wireless networks, and we explore it in greater detail in the rest of this chapter.

1.1 Problems with Existing Wireless Networks

Wireless technology has an increasing presence in our lives, from cellular phones, wireless LANs, Bluetooth headphones, cordless phones, and location systems to smart homes and many more. This trend will grow with the increasing deployment of sensor networks [61,

88], mesh networks [50, 93], and the recent WiMAX initiative [63, 133]. Although they

are increasingly common, wireless networks are still relatively fragile and underutilized.

In order to make wireless networks robust, we have to solve a number of important

problems, some of which can be grouped under the following categories:

• Manageability: Wireless networks are frustratingly opaque. This leads to long

delays in resolving performance and connectivity problems, as well as high man-

ageability costs [5, 7, 39, 103, 131]. The state of the art will be significantly en-

hanced by a management infrastructure for wireless networks that diagnoses prob-

lems with minimum human intervention and informs the user of ways to recover

from them [3, 8].


• Capacity: Although the bandwidth of wireless networks is steadily increasing,

capacity is still a bottleneck for many applications [40, 65, 95]. Any scheme that

increases wireless capacity, through advanced antennas [34, 42] or smarter protocols [16, 114], will greatly improve the wireless performance of a number of applications.

• Power: Limited battery power is the Achilles heel for wireless applications [72].

Applications and protocols for mobile computing should prolong battery life by

using schemes such as maximizing sleep durations of wireless cards [71], using

transmit power control [69], or avoiding multiple wireless interfaces [30].

1.2 Thesis and Its Contributions

This doctoral dissertation contributes towards solving these problems for IEEE 802.11

wireless networks [58] by proposing a new virtualization architecture called MultiNet.

MultiNet virtualizes a single wireless card to make it appear as multiple wireless cards

to the user. The user can configure each virtual card separately to be on a physically

different network. For example, when using an IEEE 802.11 card, the user can connect one virtual card to an infrastructure network and another virtual card to an ad hoc network, although the network card is on a single physical network at any instant. The

goal of MultiNet is to provide a user-level illusion of simultaneous connectivity on all

wireless networks. MultiNet achieves this transparency using intelligent buffering and

switching algorithms. MultiNet has been implemented over Windows XP and is avail-

able for download. In addition to describing this architecture, this thesis also explores

three ways in which MultiNet alleviates the above problems of wireless networks.

Firstly, MultiNet enables a number of techniques to reduce power consumption. For

example, it allows the functionality of multiple interfaces to be provided in situations



where the fixed energy cost of multiple physical interfaces is prohibitive. MultiNet also

enables a new power saving mechanism by allowing nodes to function as relays using

only one wireless card: nodes with low battery power can send their traffic to the Access

Point at a lower transmit power using intermediate relay nodes.

Secondly, MultiNet facilitates a way to increase the capacity of wireless ad hoc net-

works by exploiting channel diversity. The capacity of ad hoc networks is known to

scale poorly with the number of communicating nodes [67]. When multiple neighbor-

ing node pairs want to communicate using IEEE 802.11, only one pair can be active at a

time. However, other nodes can talk simultaneously if they are on orthogonal frequency

channels, since traffic on orthogonal channels does not interfere. But this breaks the se-

mantics of wireless networks: two neighboring nodes in a network might be on different channels and therefore unable to communicate. MultiNet helps solve this problem: a node creates as many virtual interfaces as there are orthogonal channels. This dissertation proposes

a new scheduling algorithm, called Slotted Seeded Channel Hopping (SSCH), which

works with MultiNet to improve network capacity. The goal of SSCH is to have com-

municating nodes on the same channel and other nodes on randomly different channels

at any instant, while ensuring that any two neighboring nodes overlap within a fixed

period. SSCH achieves this goal by introducing the technique of partial synchroniza-

tion and also makes use of existing techniques such as pseudo-random generators. It is

shown mathematically that SSCH has the desired synchronization properties. Using simulations in QualNet, it is shown that SSCH significantly improves the wireless capacity of IEEE 802.11.
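To give the flavor of the seeded rendezvous argument that Chapter 3 develops, the sketch below uses a deliberately simplified single-seed schedule of the form x ← (x + seed) mod P with P prime; the constants and names are illustrative only, not the full SSCH schedule.

```python
# Simplified illustration (not the full SSCH schedule): each node hops as
# x <- (x + seed) mod P, with P prime. Two nodes with different seeds are
# then guaranteed to share a channel in some slot of every window of P slots.
P = 13  # e.g., the number of orthogonal IEEE 802.11a channels

def hop_sequence(start, seed, slots):
    """Channels visited over `slots` consecutive slots."""
    x, seq = start, []
    for _ in range(slots):
        seq.append(x)
        x = (x + seed) % P
    return seq

a = hop_sequence(start=2, seed=3, slots=P)
b = hop_sequence(start=7, seed=5, slots=P)
print([t for t in range(P) if a[t] == b[t]])   # -> [4]: the two nodes overlap in slot 4
```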

Finally, MultiNet enables a novel communication mechanism for disconnected ma-

chines, called Client Conduit, which is used to diagnose faults in infrastructure wireless

networks. A recent surge in the deployment of large-scale enterprise and city-wide



wireless networks [37] entails a pressing need for wireless network management tools similar to those for wired networks [56, 94]. Network administrators want to know why users are

suffering from poor performance and frequent disconnections. They are interested in lo-

cating security breaches, for example an unauthorized (rogue) access point plugged into an enterprise’s Ethernet jack that jeopardizes its resources. In our architecture, Client

Conduit allows disconnected clients to transfer diagnostic messages to and from a back-

end server. It is implemented using MultiNet, which allows connected clients to stay on the infrastructure network using one virtual interface and form an ad hoc network with the disconnected client on another virtual interface. This thesis presents a lightweight

mechanism to implement Client Conduit, where virtual interfaces are added dynami-

cally and a connected client suffers no penalty in the common case. It also proposes

algorithms to detect rogue access points, locate disconnected clients, and diagnose poor

wireless performance. This architecture has been prototyped over Windows XP using

off-the-shelf wireless cards and access points.

1.3 Limitations of this Dissertation

Although MultiNet has been implemented over Windows XP, it has not been tested in all cases or in large deployments. Consequently, simulation results were used to show the feasibility of MultiNet. Further, the inability of available hardware to quickly switch

across frequency channels limited all results on SSCH to simulations in QualNet [62].

However, realistic simulation parameters were chosen and a mathematical analysis of

SSCH was done to show that SSCH will significantly improve the capacity of wireless

networks when the required hardware is available. MultiNet, SSCH and our fault diag-

nosis architecture have additional limitations, and we enumerate them in Chapters 2, 3

and 4 respectively.

1.4 Roadmap of this Dissertation

Chapter 2 describes the MultiNet architecture in detail. It also shows that MultiNet

consumes less energy than an alternative approach of using multiple wireless cards.

Chapter 3 describes the SSCH protocol and its properties, and analyzes the performance

of SSCH. Chapter 4 then presents our fault diagnosis architecture, and describes and

evaluates the design of Client Conduit. Finally, Chapter 5 concludes this dissertation.

Most of the contents of Chapters 2, 3 and 4 are adapted from previously written

independent papers, in particular [30], [16] and [3] respectively. The contributions of

the coauthors of each of these papers are listed in the last paragraph of each chapter.
CHAPTER 2

THE MULTINET VIRTUALIZATION APPROACH

2.1 Introduction

Systems research over the last two decades has revealed a number of benefits of virtual-

izing different system components, such as virtual machines [20, 49, 126, 130], virtual

storage [55, 81] and virtual memory [23] among others. However, the benefits of vir-

tualizing a wireless card have not been explored. This chapter describes MultiNet, a

new virtualization architecture that abstracts a single wireless LAN (WLAN) [60] card,

making it appear as multiple virtual cards to the user.

MultiNet enables several compelling scenarios. These include increased connectiv-

ity for end users; increased range of the wireless network; bridging between infrastruc-

ture and ad hoc wireless networks; and painless, secure access to sensitive resources. We discuss these in detail in Section 2.2. To explore this problem space with current technology, one would have to use a separate WLAN card for each desired network [64, 115].

Doing so is costly, cumbersome, and consumes energy resources that are often limited.

An alternative is to use the MultiNet virtualization approach.

Virtualizing a wireless card poses several research challenges. Firstly, a virtual wire-

less card should appear as a real (physical) wireless card to the user. Secondly, the user

should get an illusion of simultaneous connectivity on all virtual cards, although the

physical wireless card can only be on one network at any instant [58]. Thirdly, the

system should be deployable and compatible with nodes not using virtualization. More-

over, the virtualization software should not require modifications to existing backbone

infrastructure, such as Access Points (APs) [58] and routers.

MultiNet solves the above problems by creating a new virtual interface for each net-


work to which connectivity is desired. The virtual interface exports itself as a new phys-

ical device to the network layer. It also maintains the state of the physical card required

for connecting to the wireless network corresponding to this virtual interface. Multi-

Net achieves the illusion of simultaneous connectivity over all networks by switching

the physical network card across the desired networks and activating the correspond-

ing virtual interface. Further, MultiNet is deployable as it does not require changes to

APs and routers. This is achieved by a new protocol called Spoofed Buffering, which

leverages the Power Save Mode of the IEEE 802.11 [58] standard, and is described in

Section 2.5.4.

The main contributions of this chapter can be summarized as follows:

• It presents the design of MultiNet, which is a new architecture for virtualizing

WLAN cards. As part of the design it describes the state that needs to be stored for

every virtual wireless card. It also describes in detail the implementation of Multi-

Net over Windows XP. The implementation works with modest modifications to

the Operating System kernel, and without any modifications to the wireless card

drivers.

• It proposes a new protocol, called Spoofed Buffering, which delivers packets sent

to a node using MultiNet when it is on another network. APs buffer packets for

the nodes that have switched to another network, and deliver them when nodes

switch back to their network. Spoofed Buffering achieves this functionality with-

out requiring any changes to APs. This protocol has also been used in a recent

work for fast handoff in IEEE 802.11 networks [100].

• It analyzes the performance of MultiNet over a number of commercial WLAN

cards, and shows that MultiNet is suitable for most applications. It describes

a technique to reduce the overhead of switching a wireless card across networks,



and shows that MultiNet consumes less battery power than an alternative approach

of using multiple wireless cards in the device.

As of this writing, MultiNet has been operational for over two years. During this

time, we have refined the protocols and analyzed them in greater detail. Many of the

results we present in this chapter are based on real working systems that include current

and next generation IEEE 802.11 wireless cards. For cases where it is not possible to

study the properties of the system without large-scale deployment and optimized hardware, we carry out simulation-based studies. Most of our simulations are driven by traffic traces that represent ‘typical traffic’. For IEEE 802.11, our study shows that MultiNet nodes can save up to 50% of the energy consumed by nodes with two cards, while providing similar functionality. We also quantify the delay-versus-energy tradeoff of switching for performance-sensitive applications.

The rest of this chapter is organized as follows. Section 2.2 presents some scenarios

and applications that motivate the need for MultiNet and for which MultiNet is currently

being used. Section 2.3 presents some related research and Section 2.4 provides the

background needed for the rest of the chapter. The MultiNet architecture is presented

in Section 2.5, and its implementation is described in Section 2.6. Performance and

feasibility are discussed in Sections 2.7 and 2.8. Future work is presented in Section 2.9

and we conclude in Section 2.10.

2.2 Motivating Scenarios

MultiNet enables several new applications that were earlier not possible using a single

wireless card. A few examples include:

• Concurrent Connectivity: A user can connect to multiple wireless networks. He

specifies a list of networks, and MultiNet simultaneously connects to all of them.



• Network Elasticity: The range of an infrastructure network can be extended if

border nodes use MultiNet to function as relays for authorized nodes that are

outside the range of the Access Point (AP). We implemented this functionality as

part of our fault diagnosis architecture, and describe it in detail in Chapter 4.

• Gateway Node: A node that is part of a wireless ad hoc network and is close to an AP connected to the Internet can use MultiNet to stay connected on both networks, and become a gateway node for the ad hoc network [26].

• Network Security: Different groups (e.g., human resources personnel, secretaries, developers) within a company may be given different permissions to access

data servers. These servers could be on physically different networks. A privi-

leged user, who has permission to access different networks, can use MultiNet to

simultaneously connect to multiple networks.

• Increased Capacity: The capacity of ad hoc networks can be increased if nodes

within interference range communicate on orthogonal frequency channels [16,

114]. In Chapter 3, we describe SSCH, which uses MultiNet to virtualize a wire-

less card into as many instances as the number of orthogonal channels, and simul-

taneously connects on all of them.

• Virtual Machines: Existing Virtual Machine architectures (for example, [28, 126,

130]) restrict all virtual machine instances to stay connected on the same wireless

network. MultiNet allows users to connect different virtual machines to physically

different wireless networks using only a single wireless card.

• Seamless Roaming: The time to handoff from one AP to another is a significant

overhead in mobile wireless networks [113]. MultiNet allows a wireless card to



connect to an AP without disconnecting from its previous one. This technique has

been used in a recent work, called SyncScan [100].

All the above scenarios require nodes to stay connected on more than one wireless net-

work, and MultiNet achieves this functionality with only one wireless card.

2.3 Prior Work

Virtualization has been studied extensively for abstracting a single system resource as

multiple available resources to the user. For example, Virtual Machine architectures,

such as VMWare [126], Denali [130], Xen [20], Terra [49], etc., virtualize a single com-

puter to give an illusion of many smaller virtual machines, each running its own oper-

ating system. Storage Virtualization systems, such as Facade [81] and Stonehenge [55],

virtualize a storage device into multiple logical storage devices. Similarly, Virtual Mem-

ory [23, 41] presents user programs with an illusion of more memory than is actually available. MultiNet is similar to the above systems in abstracting a single resource, in this

case a wireless card, as multiple wireless cards to the user. However, to the best of our

knowledge, it is the first system that virtualizes wireless network cards.

Prior work has looked at virtualizing the wired network interface on a machine. The

Virtual Machine architectures discussed above [20, 28, 49, 126, 130] virtualize all hard-

ware resources, including the network interface [120]. Other systems for low latency

communication, such as U-Net [128] and VIA [29, 38], virtualize the network interface

to multiple local virtual interfaces, one for each process. The physical network interface

is multiplexed across the virtual interfaces to transmit the packets sent by each process. Network

Cloning [138] brings up multiple network stacks for a single physical interface. Similar

to these systems, MultiNet abstracts the wireless interface as multiple virtual interfaces,

and multiplexes the physical card across the virtual instances. However, it faces different

challenges that do not arise in the case of wired networks. Firstly, each virtual wireless

card might require connectivity to a physically different wireless network. Therefore, in contrast to the above systems, only one virtual instance is physically on the network at

any time. Secondly, switching to a different network takes a few hundred milliseconds,

as we show in Section 2.7. So, the approach used by the above systems, where packets

from different virtual interfaces are serviced by the wired interface in the order in which

they arrive, might incur a network switch overhead for every packet. This scheme may

not be suitable for virtualizing wireless cards. MultiNet uses different switching and

buffering algorithms, which are described in Section 2.5.3.

Another set of related work looked at smart channel hopping schemes over a single

wireless radio [66, 89, 114]. The idea is to distribute interfering traffic on different fre-

quency channels to increase the capacity of wireless networks. MultiNet differs from

these systems in two aspects. Firstly, MultiNet has to switch across multiple networks

instead of channels, and consequently MultiNet has to store more state for each network.

Secondly, all the above protocols have only been evaluated in simulation. We are not

aware of any prior implementation of these systems.

As part of MultiNet’s design goals, which we will describe in Section 2.5.1, any two

neighboring nodes in an ad hoc network should overlap on the same frequency channel

within a definite period. Our solution to this problem, described in Section 2.5.6, relies

on clock synchronization provided by the Timer Synchronization Function (TSF) of

IEEE 802.11 [58]. The algorithm or its variants [54, 74, 110] are based on an algorithm

proposed by Lamport [75], which shows that given the clock accuracy, link delay and

network diameter, and assuming that a timestamp is sent successfully along each link

at a constant frequency, the timing values of the entire network are guaranteed to be within an established bound. Previous work [54] has shown that these algorithms

work reasonably well when there are no Byzantine failures [76] in the network. For our

algorithms to work with such failures, we would need clock synchronization algorithms

with stronger guarantees [116, 125]. However, handling these failures is outside the scope of

this dissertation.

To the best of our knowledge, the idea of simultaneously connecting to multiple

wireless networks has not been studied before in the context of wireless LANs. A related

problem was considered for scatternet formation in Bluetooth [92] networks [77, 78].

Bluetooth networks comprise basic units, called piconets, that can have at most 7 nodes.

Piconets are used to form bigger networks, called scatternets, by having some nodes on

multiple piconets. However, the problem of enabling nodes in Bluetooth networks to

listen to multiple piconets is significantly different from the problem of allowing nodes

to connect to multiple IEEE 802.11 networks. Bluetooth uses a frequency hopping

scheme for communication between multiple nodes on the network. A node can be

on two networks simultaneously if it knows the correct hopping sequence of the two

networks and hops fast enough. IEEE 802.11 networks, on the other hand, have no such

scheme, as described next in Section 2.4.

An alternative to virtualizing wireless cards is to use multiple radios in the device,

and this approach has been commonly used in commercial products [64, 115, 119] and

wireless networking research [9, 87, 95]. However, as we show in Section 2.7, using

multiple radios consumes more power, which is a scarce resource in battery operated

devices. Further, a recent result shows that the performance of multi-radio systems is

significantly degraded by self-interference among the radios on the device [106]. In Section 2.7.8, we show that MultiNet solves these problems of multi-radio systems at the cost of reduced throughput.



2.4 Background

This section first discusses the limitations of IEEE 802.11 and describes why maintain-

ing simultaneous connections to multiple wireless networks is a non-trivial problem. It

then briefly describes the Power Save Mode (PSM) [58] feature of IEEE 802.11, which

is used in the Spoofed Buffering Protocol described in Section 2.5.4. Finally, it discusses

the next generation of WLAN cards, over which we evaluate MultiNet.

2.4.1 Limitations in Existing Systems

Popular wireless networks, such as IEEE 802.11, work by association. Once associated

to a particular network, either an AP-based (infrastructure) network or an ad hoc network, the

wireless card can receive and send traffic only on that network. The card cannot inter-

act with nodes in another network if the nodes are operating on a different frequency

channel. Further, a node in an ad hoc network cannot interact with a node in the infras-

tructure network even when they are operating on the same channel. This is because the

IEEE 802.11 standard defines different protocols for communication in the two modes

and it does not address the difficult issue of synchronization between different networks.

As a matter of practical concern, most commercially available WLAN cards trigger a

firmware reset each time the mode is changed from infrastructure to ad hoc or vice versa.

2.4.2 Power Save Mode (PSM) of IEEE 802.11

The IEEE 802.11 standard defines Power Save Mode (PSM) for infrastructure wireless

networks as a means to save battery power. When a node wants to use PSM, it sends a

message to the AP and sets its wireless interface to sleep mode. The message to the AP

also contains the duration for which the node wants to sleep. This duration is called the

Listen Interval. When the AP receives a packet destined for the sleeping node, it buffers

the packet. After a Listen Interval period, the node using PSM wakes up, and receives

the packets buffered at the AP. Usually, the Listen Interval is set to be a multiple of

the Beacon Period, where the Beacon Period is the interval at which an AP broadcasts

its beacon. The Beacon Period is a parameter of the AP, while the Listen Interval is a

parameter of the node using PSM.
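As a concrete, purely illustrative example of this timing relationship (the parameter values below are assumptions chosen for illustration, not values mandated by the standard):

```python
# Toy example of the PSM timing described above.
BEACON_PERIOD_MS = 100          # Beacon Period: set by the AP
LISTEN_INTERVAL_BEACONS = 5     # Listen Interval: set by the node, in beacon periods

def wakeup_time(sleep_start_ms):
    """Time at which the sleeping node wakes to collect packets buffered at the AP."""
    return sleep_start_ms + LISTEN_INTERVAL_BEACONS * BEACON_PERIOD_MS

print(wakeup_time(0))   # -> 500: wake at the fifth beacon and drain the AP's buffer
```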

2.4.3 Next Generation of IEEE 802.11 WLAN cards

In order to reduce the cost and commoditize wireless cards, IEEE 802.11 WLAN card

vendors [11, 102] are minimizing the functionality of the code residing in the micro-

controller of their cards. This next generation of wireless cards, which we refer to as

Native WiFi cards, implement just the basic time-critical MAC functions, while leaving

their control and configuration to the operating system. More importantly, these cards

allow the operating system to maintain state and do not undergo a firmware reset on

changing the mode of the wireless card. This is in contrast to the existing cards, which

we refer to as legacy wireless cards in the rest of this dissertation.

2.5 MultiNet

This section first formulates the MultiNet problem and enumerates its design goals. It

then describes the MultiNet system in detail.

2.5.1 Assumptions about the System

MultiNet is designed to work in a Wireless LAN environment, such as IEEE 802.11.

We make the following assumptions about such a network:



• All nodes in a network are synchronized to within a millisecond. IEEE 802.11

maintains a timer at each node and uses a distributed Timer Synchronization Function (TSF) [58]

to synchronize these timers at all nodes in a network. IEEE 802.11b synchronizes

the timers at all nodes in a network to within 224 µs [60]. TSF, or its modifica-

tions ATSP [54, 74] and ASCP [110], can be used to achieve the required synchro-

nization granularity even when broadcast packets are lost.

• APs implement Power Save Mode (PSM), and have enough buffer space to sup-

port all nodes using PSM in the network. This feature is defined in the IEEE

802.11 standard [58], and is implemented in some existing WLAN products [35,

121, 122].

• There is an overhead of switching a wireless card from one network to another.

This comprises the time to switch to another channel and associate to the network.

As we discuss in Section 2.7, this overhead is a few hundred milliseconds for most

commercial wireless cards. MultiNet will give better performance when this delay

is reduced, using the schemes we discuss in Section 2.8.1.

• The applications on machines running MultiNet can tolerate variable throughput

and delays. Some sample applications supported by MultiNet are browsing, file

transfers and web downloads. The reason why other applications are not sup-

ported is explained in Section 2.5.2.

• The device driver of a wireless card sends a disconnect message to the network

layer when it disconnects from a network, and a connect message when it success-

fully connects to one. On modern operating systems, such as Linux and Windows

XP, these messages are passed up to the user level and are used to display the

current status of the physical interface. In Windows XP, the device driver sends

a media disconnect and a media connect message on disconnection and connection, respectively. In Linux, the device driver calls the netif_carrier_off and netif_carrier_on functions.

• A user knows if MultiNet is being used by more than one machine in an ad hoc

network. Further, all nodes in an ad hoc network agree to install software to

support MultiNet. Since ad hoc networks are usually cooperative networks, we

expect this assumption to hold in most cases.

2.5.2 MultiNet Design Goals

A virtualized physical wireless card appears as multiple virtual network interfaces,

where each virtual interface corresponds to a physically different wireless network. Fur-

ther, MultiNet also strives to achieve the following design goals when virtualizing a

wireless card:

• Transparency: To reduce the learning curve in using the system, we require

virtual interfaces to appear as physical wireless cards to the user. He should be

able to connect different virtual cards to different wireless networks, although the

physical card is only on one network at any instant. The architecture should ensure

that packets sent to and from a virtual interface are not discarded if the physical

wireless card is not on the corresponding network at that instant. Further, when

a machine is mobile, the virtual interface should appear disconnected when the

machine moves out of range of the network. However, it should appear connected

when the machine moves back into the network’s range.

• Performance: The system should give the illusion of simultaneous connectivity

on all virtual interfaces. Packet delays on a virtual interface should be minimized.



The user should also be able to prioritize different virtual interfaces, so that pack-

ets on a more important network are sent and received with less delay.

• Deployability: The system should be easy to deploy in an existing wireless net-

work. It should work over the commonly used IEEE 802.11 standard, and with

commercial wireless cards. Further, it should not require changes to the wireless

card drivers or the network infrastructure. Nearly all of the modifications should

be on the user’s machines.

In addition to the above design goals, there are a few plausible goals that Multi-

Net does not achieve. Firstly, it does not aim to support real-time applications over the

network, such as Voice over IP (VoIP) [127] or streaming video. This constraint arises

from the few hundred milliseconds overhead when switching from one network to an-

other. Unless this overhead is reduced, MultiNet will be unable to provide response

time guarantees of less than a few hundred milliseconds on all networks. Secondly,

MultiNet does not handle Byzantine failures in the network. Handling these failures

would require changes to our buffering and synchronization protocols described in Sec-

tions 2.5.4 and 2.5.6 respectively, and is outside the scope of this dissertation. Thirdly, we defer

the discussion of using MultiNet in multi-hop ad hoc networks to Chapter 3. In the rest

of this chapter, we limit our discussion to using MultiNet in single hop ad hoc networks,

where all nodes are in communication range of each other, and in infrastructure wireless

networks. Finally, the current implementation of MultiNet allows a node to stay con-

nected on only one ad hoc network in which multiple nodes use MultiNet. Enabling a

node to use MultiNet for maintaining connections to more than one such ad hoc network

is part of future work.



2.5.3 The MultiNet Approach

MultiNet achieves the above design goals by introducing functionality in a new layer,

between the network and physical layers of the network stack, as shown in Figure 2.1.

This layer, called the MultiNet Layer, initializes and maintains a new virtual network

interface for every new network on which the user wants to stay connected. The IEEE

802.11 parameters [58] of the physical wireless card is duplicated at each virtual inter-

face. So, each virtual interface has its own Service Set Identifier (SSID) and Network

Mode and appears as a separate wireless card to the network layer.

All virtual interfaces appear as connected to the network layer, even though the

physical card is connected to only one wireless network at any instant. This is shown in

Figure 2.1 where IP sees virtual interfaces 1, 2 and 3 as connected to networks 1, 2 and

3 respectively, although the physical card is connected to Network 2. Since all virtual

interfaces appear as connected, the user might send packets on any of them. Packets

sent to a virtual interface, when the physical card is not on its corresponding wireless

network, are buffered in a packet buffer maintained at each virtual interface. Packets are

sent over the network without any delay if the physical card is on the network.

MultiNet provides an illusion of simultaneous connectivity on all networks by mul-

tiplexing the physical wireless card across all virtual interfaces. The physical card stays

connected on a network long enough to send and receive one or more packets on the cor-

responding virtual interface. The MultiNet Layer then switches the physical card to a

network corresponding to another virtual interface. The information about the network

is retrieved from the state stored in the virtual interface. After switching the physi-

cal card to another network, MultiNet waits for a media connect message from the

lower layers. This message is sent only if the physical card successfully switches to

another network. On receiving this message, MultiNet sends the packets buffered on

the virtual interface, and stays connected to this network for some time. This cycle continues in round-robin fashion across all virtual interfaces.

[Figure 2.1 depicts the network stack: applications, the transport layer (TCP, UDP) and IP sit above the MultiNet Layer, which exposes virtual interfaces for Networks 1, 2 and 3 over a single MAC/PHY; the physical card is on Network 2.]

Figure 2.1: The MultiNet Layer maintains virtual interfaces for networks 1, 2 and 3, and switches the physical card across all these networks. It gives the illusion of connectivity on all networks although the card is on network 2 at this instant.
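The following sketch models this round-robin cycle in simplified form. It is not the Windows XP driver code; the `card` object and its methods (`associate`, `send`) are hypothetical stand-ins for the operations described above.

```python
# Illustrative model of the MultiNet switching cycle (assumed names throughout).
import time
from collections import deque

class VirtualInterface:
    """Per-network state kept by the MultiNet Layer for one virtual card."""
    def __init__(self, ssid, mode, dwell_s):
        self.ssid = ssid                  # network identity (SSID)
        self.mode = mode                  # 'infrastructure' or 'adhoc'
        self.dwell_s = dwell_s            # time to stay on this network (the Activity Period, defined below)
        self.send_buffer = deque()        # packets queued while the card is elsewhere

def run_switching_cycle(card, virtual_interfaces):
    """One pass over all virtual interfaces in round-robin order."""
    for vif in virtual_interfaces:
        if not card.associate(vif.ssid, vif.mode):   # switching overhead is incurred here
            continue      # no media connect: this virtual interface stays disconnected for now
        while vif.send_buffer:                       # flush packets buffered for this network
            card.send(vif.send_buffer.popleft())
        time.sleep(vif.dwell_s)                      # exchange live traffic on this network
```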

Before describing the architecture further, we briefly define some terms we use in

the rest of this chapter. The period of time for which a card stays on a network after

successfully connecting to it is called the Activity Period for the network. The time to

switch to another network, from the time switching is initiated to the time the card is

associated to the wireless network, is called the Switching Delay for the network. The

Activity Period is the useful time when a card sends and receives packets, while the

Switching Delay is an overhead when the card is not on any network. The performance

of MultiNet is better when the Switching Delays are small. The sum of the Activity

Periods and Switching Delays over all connected networks is called the Switching Cycle.
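In symbols (notation introduced here for convenience), for $N$ connected networks:

$$T_{\mathrm{cycle}} = \sum_{i=1}^{N} \left( A_i + S_i \right),$$

where $A_i$ is the Activity Period and $S_i$ the Switching Delay of the $i$-th network.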

Switching from one network to another requires the physical card to disconnect from

one network and connect to the other. Correspondingly, as described in Section 2.5.1,

the physical layer sends disconnect and connect messages to the upper layers. These

messages change the connectivity status of the virtual network interface, and as a result

only one virtual interface appears as connected at any time. This is a problem for Multi-

Net since the operating system drops packets sent on a disconnected network interface.

MultiNet solves this problem by trapping the disconnect message sent by the physi-

cal layer immediately after a disconnection. This message is received at the MultiNet

Layer and is prevented from going up the network stack. Consequently, layers above

the MultiNet Layer see all the virtual interfaces as connected although the physical card

switches across different networks.

MultiNet also manages the state of a virtual interface when a network disconnection

is caused by factors such as mobility or weak signal strength. The virtual interface is

made to appear disconnected when the physical card is unable to connect to its network,

and is made to appear connected when the physical card regains connectivity to the

network. MultiNet achieves this functionality by not trapping the disconnect message

when it is caused by any source other than MultiNet. As a result the virtual card appears

disconnected whenever the physical wireless card is unable to connect to its network.

Further, MultiNet attempts to connect to all networks in its Switching Cycle, even if its

previous attempt to connect was unsuccessful. When the physical wireless card success-

fully connects to a network, the connect message is passed up the network stack, and the

corresponding virtual interface appears connected. We demonstrate this functionality in

a mobile scenario in Section 2.7.7.
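The trapping rule just described amounts to the following filter; this is a simplified sketch, and the event names and callback are placeholders for the actual driver indications rather than the real interfaces.

```python
# Sketch of MultiNet's media-event filtering (illustrative names only).
def on_media_event(event, caused_by_multinet, notify_network_layer):
    """Decide whether a connect/disconnect indication reaches the network layer."""
    if event == "disconnect" and caused_by_multinet:
        return False                 # trapped: upper layers keep seeing the interface as connected
    notify_network_layer(event)      # genuine events (e.g. mobility-induced) are passed up
    return event == "connect"        # True if the virtual interface should now appear connected
```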



This design of MultiNet poses two interesting questions. Firstly, how are packets

delivered to a virtual interface if the card has switched to another network? Secondly,

how long should the card stay on a network? We first answer these questions for the

scenario when only one machine in any ad hoc network uses MultiNet. We then develop

our approach to handle the case when MultiNet is used by more than one node in an

ad hoc network. An important question we defer to future work, in Section 2.9, is the

interaction of TCP with MultiNet.

2.5.4 Delivering Packets to Virtual Interfaces

In this section, we present a buffering protocol that prevents packets sent to a virtual

interface from being discarded when the physical card is not on the corresponding net-

work. As part of the protocol, we describe a new approach that allows MultiNet to work

in infrastructure networks without modifying the APs.

The buffering protocol works differently for ad hoc and infrastructure networks. For

ad hoc networks, just before switching out of the network, a node broadcasts a packet

that informs all other nodes in the network of its unavailability and when it will be back

in the network. On switching back to the ad hoc network, the node broadcasts another

packet announcing its availability. Packets destined for this node are buffered by other

nodes in the ad hoc network until one of two conditions holds: the broadcast announcing the node's availability is received, or the time by which the node was expected to be back in the network has elapsed. If the node is available, the buffered packets are sent to it. Otherwise, if the timer has elapsed, the buffered

packets are discarded. This protocol requires modifications at all nodes in the ad hoc

network, even if they do not use MultiNet to connect to multiple networks. This should

not be very difficult to achieve as was explained in Section 2.5.1.
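As an illustration of the cooperating-neighbor side of this protocol, the sketch below (Python used as pseudocode; the function and variable names are hypothetical) buffers packets for an absent node and releases or discards them under exactly the two conditions listed above.

    import time

    # Illustrative neighbor-side buffering for the ad hoc protocol described above.
    absent_until = {}   # node address -> time by which the node said it would be back
    buffers = {}        # node address -> packets held while the node is away

    def on_unavailability_broadcast(node, away_seconds):
        absent_until[node] = time.time() + away_seconds
        buffers.setdefault(node, [])

    def on_availability_broadcast(node, send):
        for pkt in buffers.pop(node, []):     # condition 1: the node announced it is back
            send(node, pkt)
        absent_until.pop(node, None)

    def on_packet_for(node, pkt, send):
        deadline = absent_until.get(node)
        if deadline is None:
            send(node, pkt)                   # the node is not known to be away
        elif time.time() <= deadline:
            buffers[node].append(pkt)         # hold the packet until the node returns
        else:
            buffers.pop(node, None)           # condition 2: the deadline has elapsed;
            absent_until.pop(node, None)      # buffered packets are discarded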



MultiNet could use a similar protocol for infrastructure networks. However, APs

would need to be modified to buffer packets destined for the MultiNet nodes on their networks. This significantly affects the deployability of MultiNet, as discussed in Sec-

tion 2.5.2. MultiNet solves this problem by proposing a new protocol, called Spoofed

Buffering. Spoofed Buffering buffers packets at the APs without requiring modifications

to them.

Spoofed Buffering works as follows. MultiNet spoofs sleep mode to the AP just

before switching out of an infrastructure network. It sends a special IEEE 802.11 packet

to the AP, informing the AP that the node is entering IEEE 802.11 PSM sleep mode and specifying how long it will sleep. While the AP believes the node is sleeping,

MultiNet switches the physical card to another network. As described in Section 2.4.2,

PSM requires an AP to buffer packets for nodes that are sleeping in its network, and to send the buffered packets when the nodes wake up. So, packets destined for a MultiNet node are buffered at the AP until the node switches back to the infrastructure network.

The node then informs the AP that it is awake by sending another IEEE 802.11 packet.

On receiving this packet, the AP sends all the buffered packets, which are received by

the corresponding virtual interface.
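The per-switch sequence amounts to the following sketch. The Card class is only a stand-in for the real card interface and every method name here is hypothetical; the actual signalling uses the IEEE 802.11 power-management frames summarized above.

    # Illustrative switch sequence for Spoofed Buffering (names are hypothetical).
    class Card:
        def send_power_save_frame(self, ap, asleep, duration_ms=0):
            state = "asleep for %d ms" % duration_ms if asleep else "awake"
            print("tell AP %s: station is %s" % (ap, state))

        def associate(self, ssid, channel):
            print("associate with %s on channel %d" % (ssid, channel))

        def flush_buffered_packets(self, ssid):
            print("flush packets buffered for virtual interface %s" % ssid)

    def switch_network(card, current_ap, next_ssid, next_channel, next_ap, sleep_ms):
        # 1. Spoof IEEE 802.11 PSM so the current AP buffers packets for this node.
        card.send_power_save_frame(current_ap, asleep=True, duration_ms=sleep_ms)
        # 2. Re-tune the physical card to the next network.
        card.associate(next_ssid, next_channel)
        # 3. Announce that the node is awake; the new AP releases its buffered packets.
        card.send_power_save_frame(next_ap, asleep=False)
        # 4. Send the packets this node buffered locally for the new network.
        card.flush_buffered_packets(next_ssid)

    switch_network(Card(), "AP-1", "office-wlan", 6, "AP-2", sleep_ms=500)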

Figure 2.2 illustrates the steps of Spoofed Buffering when a node uses MultiNet to

connect to two wireless networks. Before switching out of network 1, the node informs

the AP that it is going to sleep for a certain time. It then switches to network 2, where

it announces that it is awake. The AP in network 2 then sends the buffered packets to

the node, which forwards them up to the corresponding virtual interface. The virtual

interface also sends its buffered packets to the AP. The node then stays on network 2 for

the Activity Period. It then sends a message to the AP of network 2 announcing that it

is going to sleep, and switches to network 1 and informs the AP of network 1 that it is

awake. These steps continue as long as the node requires connectivity on both wireless

networks.

[Figure: a MultiNet node connected to networks 1 and 2; numbered arrows (1)-(5) show the "I am asleep" and "I am awake" announcements and the send/receive of buffered packets exchanged with the APs of the two networks.]

Figure 2.2: The steps of Spoofed Buffering when a node uses MultiNet to connect

to two networks.

We note that despite our buffering protocol, packets might still be lost due to other

reasons, such as mobility, wireless signal fade or interference. Further, buffering might

not be possible at other nodes in the network, due to lack of cooperation from nodes in

the ad hoc network or PSM support at the APs. In such scenarios, MultiNet relies on

higher layer protocols, such as TCP, to recover the lost packets. We compare MultiNet

with and without buffering support in Section 2.7.5, and show that although MultiNet

performs much better when the buffering protocols are implemented, its performance is

reasonable even in the worst case, when no packets are buffered.



2.5.5 Determining the Activity Period for a Network

The Activity Period is the duration for which a wireless card stays connected on a net-

work. MultiNet supports three schemes for determining this duration, each of which is

useful in different scenarios.

• Fixed Slot Duration: In this approach, MultiNet divides time into slots of fixed

duration. Every time the physical card switches to a network, it stays on that

network for one slot. The slot duration includes the Switching Delay. This scheme

is simple to implement, and is useful in cases where synchronization is required

between multiple nodes using MultiNet in an ad hoc network. We use it for our

algorithms in Section 2.5.6 and for SSCH described in Chapter 3.

• User Defined Priority: This scheme requires the user to prioritize all his net-

works, and define the Total Activity Period. The Total Activity Period is the sum

of Activity Periods of all networks, which is equal to the difference between the

Switching Cycle and the sum of Switching Delays across all networks. Multi-

Net then calculates the Activity Period for each network based on its priority.

So, if a user requires connectivity to a set of wireless networks, and has given

network $i$ a priority $x_i$, then the Activity Period of network $j$ is given by $(x_j / \sum_{k} x_k) \cdot \mathrm{TotalActivityPeriod}$ (see the sketch after this list). This scheme is useful when there

exists a predefined priority across all networks. For example, the Client Conduit

Protocol, described in Chapter 4, uses user defined priorities to limit the duration

for which a connected machine helps a disconnected wireless client.

• Adaptive Schemes: This approach does not require any intervention from the

user. It dynamically prioritizes networks based on the amount of traffic seen on each of them, and uses these priorities to calculate the Activity Period for each network.

Consequently, a network that sends and receives more packets has a longer Ac-

tivity Period as compared to a less active one. So, if MultiNet has to switch

across different networks, and network $i$ has seen $P_i$ packets in its last Activity Period $\mathit{ATP}_i$, then the node stays in network $j$ for an Activity Period given by $(P_j/\mathit{ATP}_j) \cdot \bigl(1/\sum_{k} P_k/\mathit{ATP}_k\bigr) \cdot \sum_{k} \mathit{ATP}_k$ (see the sketch after this list). The first term gives the network utilization of network $j$, the second normalizes it by the utilization summed across all networks, and the final term is the total amount of time the node is active across all networks.

This approach is useful in scenarios where the user wants to get the best perfor-

mance on multiple networks, without worrying about the traffic patterns on each

network. We use this strategy to provide true zero configuration over MultiNet,

as described in Section 2.7.4. We evaluate two adaptive strategies for MultiNet

in Section 2.7.3. Adaptive Buffer is a naive approach that prioritizes networks

based on the number of packets buffered by their corresponding virtual interfaces

during a Switching Cycle. Adaptive Traffic is a more sophisticated approach

that maintains a history of packets sent and received on all virtual interfaces over

a certain number of Switching Cycles. It then uses this history to prioritize across

networks, and adapt their Activity Periods.
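To make the two expressions above concrete, the sketch below (Python as pseudocode; the function names are invented for illustration) computes per-network Activity Periods for the User Defined Priority and adaptive cases by restating those formulas directly. The even split when no traffic has yet been seen mirrors the behavior reported in Section 2.7.4.

    # Illustrative Activity Period computation (restating the formulas above).
    def priority_activity_periods(priorities, total_activity_period):
        # User Defined Priority: AP_j = (x_j / sum_k x_k) * TotalActivityPeriod.
        total = sum(priorities.values())
        return {net: x / total * total_activity_period
                for net, x in priorities.items()}

    def adaptive_activity_periods(packets, last_periods):
        # Adaptive: AP_j = (P_j/ATP_j) / (sum_k P_k/ATP_k) * sum_k ATP_k.
        utilization = {net: packets[net] / last_periods[net] for net in packets}
        total_util = sum(utilization.values())
        total_active = sum(last_periods.values())
        if total_util == 0:                      # no traffic anywhere: split evenly
            return {net: total_active / len(packets) for net in packets}
        return {net: u / total_util * total_active
                for net, u in utilization.items()}

    # Example: two networks with a 3:1 priority over a 900 ms Total Activity Period.
    print(priority_activity_periods({"IS": 3, "AH": 1}, total_activity_period=900))
    # -> {'IS': 675.0, 'AH': 225.0}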

2.5.6 Handling Ad Hoc Networks with Multiple MultiNet Nodes

Supporting multiple MultiNet nodes in an ad hoc network poses a new problem.

Any two nodes using MultiNet might not overlap in the ad hoc network for a signif-

icant period of time. Consequently, these nodes will be unable to communicate with

each other for long durations even though they are in communication range of each

other. This significantly affects the performance of MultiNet on the ad hoc network.

Figure 2.3 illustrates this problem when two nodes A and B are in communication range

of each other and use MultiNet with Fixed Slot Duration to connect to two networks:

Infrastructure Network 1 and Ad Hoc Network 2. In this scenario, nodes A and B do not

overlap in the ad hoc network, and cannot communicate in this network. However, note

that this problem is specific to ad hoc networks, as these nodes can communicate in the

infrastructure network using Spoofed Buffering to buffer packets at the APs. Further,

this problem also arises for other switching protocols described in Section 2.5.5, as two

nodes might overlap for a very small period of time, which is too small to send even a

single packet.

[Figure: timelines for Machine A (IS Network 1, AH Network 2, IS Network 1, AH Network 2, ...) and Machine B (AH Network 2, IS Network 1, AH Network 2, IS Network 1, ...); their ad hoc slots never coincide.]

Figure 2.3: Two nodes in communication range and using MultiNet that fail to overlap in the ad hoc network and hence experience a logical partitioning.

This section presents a simple approach, called Slotted Synchronization, to synchro-

nize an overlap between any two nodes using MultiNet in a single hop ad hoc network.

We discuss SSCH, which is a more sophisticated and efficient approach for multihop

networks in Chapter 3. Slotted Synchronization has the limitation that a node can connect to only one ad hoc network in which multiple nodes use MultiNet. Extending

this approach to allow nodes to stay connected in many ad hoc networks with multiple

nodes using MultiNet is part of future work.

Slotted Synchronization uses what we term the “Fixed Slot Duration switching scheme”,

in which time is divided into slots and nodes switch to a network at the beginning of a

slot. All nodes use the same slot duration, and clocks at all nodes in a network are

synchronized to within a millisecond of each other. The slot duration is chosen to be

a few hundred milliseconds to accommodate the Switching Delay when switching to a

network. We quantify the Switching Delay in Section 2.7.2. Slotted Synchronization

makes the assumption, as described in Section 2.5.1, that the node starting an ad hoc

network knows if more than one node in its network is going to use MultiNet.

Slotted Synchronization works as follows. The initiator node of an ad hoc network

defines a recurrence period for the network. The recurrence period is the periodicity,

measured in slots, at which MultiNet connects to the ad hoc network. As we show in

Section 2.6.4, the SSID field of the IEEE 802.11 Beacon [58] can be modified to carry

the information about the recurrence period of the network and offset within the slot

when the Beacon is transmitted. When a node uses MultiNet to join this network, it uses

this information to synchronize the start time of its slots to that of the ad hoc network.

Then, after every recurrence period slots, MultiNet switches the physical card of this

node to the ad hoc network. Over the remaining slots, MultiNet switches the physical

card across all the other networks.
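A minimal sketch of the resulting slot bookkeeping follows (Python as pseudocode; all names are hypothetical). It shows how a joining node can recover the initiator's slot boundary from the Beacon's offset and then decide which network each of its own slots belongs to; the round-robin order of the remaining slots is an illustrative choice.

    # Illustrative Slotted Synchronization bookkeeping (all names are hypothetical).
    def aligned_slot_start(beacon_rx_time_ms, beacon_offset_ms):
        # The Beacon carries the offset within the initiator's slot at which it was
        # sent, so a joining node can recover when that slot began.
        return beacon_rx_time_ms - beacon_offset_ms

    def current_slot_index(now_ms, slot_start_ms, slot_ms=800):
        # Slot duration of a few hundred milliseconds, here 800 ms as in Section 2.7.6.
        return (now_ms - slot_start_ms) // slot_ms

    def network_for_slot(slot_index, recurrence_period, other_networks):
        # Every recurrence_period-th slot is spent in the synchronized ad hoc network;
        # the remaining slots are shared (round robin, an illustrative choice) by the
        # other networks the node is connected to.
        if slot_index % recurrence_period == 0:
            return "synchronized-adhoc"
        remaining = slot_index - slot_index // recurrence_period - 1
        return other_networks[remaining % len(other_networks)]

    # Example: recurrence period of 3 slots, two other networks.
    print([network_for_slot(i, 3, ["infrastructure-1", "adhoc-2"]) for i in range(6)])
    # -> ['synchronized-adhoc', 'infrastructure-1', 'adhoc-2',
    #     'synchronized-adhoc', 'infrastructure-1', 'adhoc-2']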

This algorithm ensures that all nodes in the ad hoc network overlap for one slot

every recurrence period slots, even when some nodes use MultiNet to stay connected on

other networks. Slotted Synchronization achieves this guarantee by synchronizing the

slots at all nodes in the network to the parameters specified by the initiator. Further, slot

synchronization occurs only at the time of joining the network and so this algorithm is

not affected by mobility in the network. Note that this algorithm might not work if a

node uses it to synchronize slots to multiple networks, since the initiator’s slots of these

disjoint networks might not be synchronized. Therefore, we limit a node to use MultiNet

to stay connected on only one ad hoc network in which multiple nodes use MultiNet.

However, it can connect to many infrastructure networks and ad hoc networks in which

it is the only node using MultiNet.

2.6 Implementation

We implemented MultiNet on Windows XP as a combination of a kernel driver with

a user level service. The mechanisms for storing network state, and for switching and

buffering across networks are implemented in the kernel, while the respective policies

are implemented in the service. The kernel driver is an NDIS intermediate driver, which exists as a layer between the network device drivers and IP. (The Network Driver Interface Specification, NDIS, is a Windows construct that provides transport independence for the network card vendors; all networking protocols used by Windows call the NDIS interface to access the network.) MultiNet performs best

when APs implement PSM and other nodes in an ad hoc network buffer packets for

nodes using MultiNet. However, no changes are required in the wired nodes for Multi-

Net to work. The rest of this section describes the details of our implementation.

2.6.1 MultiNet Driver

The MultiNet driver provides all the mechanisms required by the MultiNet architecture.

It initializes and maintains the virtual interfaces, and provides support to switch a wire-

less card from one network to another and to buffer packets at the virtual interfaces if the

physical card is not on the wireless network. This driver also sends the buffered packets

when it receives a media connect message after switching to another network.

The MultiNet driver is implemented entirely as a Windows NDIS Intermediate driver.

NDIS requires the lower binding of a network protocol, such as IP, to be a network miniport driver, such as the driver of a network interface. (A miniport driver directly manages a network interface card (NIC) and provides an interface to higher-level drivers.) Similarly, NDIS requires the upper binding of miniport drivers to be a network protocol driver. We accommodate

this requirement in the design of the MultiNet Driver, which includes two components:

the MultiNet Protocol Driver (MPD), which provides an upper binding to the network

card miniport driver, and the MultiNet Miniport Driver (MMD), which provides a lower

binding to the network protocols, such as TCP/IP. The modified stack is illustrated in

Figure 2.4.

[Figure: the modified network stack. At user level: applications, mobile-aware applications, the MultiNet Service, and WinSock 2.0. At kernel level: legacy protocols, TCP/IP and native media-aware protocols bind to virtual adapters Net 1 through Net N exposed by the MultiNet Miniport Driver (MMD); below it, the MultiNet Protocol Driver (MPD) binds over NDIS to the NDIS miniport and the NDIS WLAN miniport (with the NDIS WLAN extensions) and the hardware.]

Figure 2.4: The Network Stack with MultiNet

The MPD manages multiple virtual interfaces over one wireless card. It switches

the association of the underlying card across different networks, and buffers packets if

the SSID of the associated network is different from the SSID of the sending virtual interface. MPD also buffers packets on the instruction of the MultiNet Service, as we

describe later in Section 2.6.2. Further, the MPD handles packets received by the wire-

less card. A packet received on the wireless card is sent to the virtual interface associated

with the network on which the packet is received.
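Viewed as pseudocode, the MPD's data path reduces to the following sketch (Python; the class and method names are hypothetical): the send side compares the sending virtual interface's SSID with the SSID the card is currently associated with, and the receive side hands each packet to the virtual interface of the network it arrived on.

    # Illustrative MPD data path (all names are hypothetical).
    class VirtualInterface:
        def __init__(self, ssid):
            self.ssid = ssid
        def indicate_receive(self, packet):
            print("deliver %r to virtual interface %s" % (packet, self.ssid))

    class MPD:
        def __init__(self, ssids):
            self.vifs = {ssid: VirtualInterface(ssid) for ssid in ssids}
            self.current_ssid = None                  # network the physical card is on
            self.tx_buffers = {ssid: [] for ssid in ssids}

        def send(self, ssid, packet, radio_send):
            if ssid == self.current_ssid:
                radio_send(packet)                    # the card is on the right network
            else:
                self.tx_buffers[ssid].append(packet)  # hold until the card switches back

        def receive(self, packet, receiving_ssid):
            self.vifs[receiving_ssid].indicate_receive(packet)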

The MMD manages a virtual interface of a wireless card. It maintains the state for

each virtual interface, which includes the SSID and operational mode of the wireless

network. It is also responsible for handling query and set operations directed for the

underlying physical wireless interface.

2.6.2 MultiNet Service

The MultiNet service implements the algorithms for switching across networks and

buffering packets, described in Sections 2.5.5 and 2.5.4 respectively. This service is

a user level daemon that uses I/O Control Codes (ioctls) to interact with the MultiNet

Driver. It also broadcasts packets to interact with the service running at other nodes.

These messages coordinate the buffering protocol for ad hoc networks, described in

Section 2.5.4. Further, all the switching algorithms discussed in Section 2.5.5 are im-

plemented in the MultiNet service. The service determines the duration of the Activity

Period, and sends a signal to MPD when the Activity Period expires. This signal initiates

the switching mechanism implemented in MPD. Finally, the service also coordinates the

synchronization protocol described in Section 2.5.6. It embeds the recurrence period and

offset in the IEEE 802.11 Beacon frame, and uses this information to synchronize the

slot times of all nodes in the network.



2.6.3 Implementing Buffering

Spoofed Buffering, described in Section 2.5.4, buffers packets for MultiNet over infras-

tructure networks using IEEE 802.11 PSM. We successfully implemented this scheme

over Native WiFi cards, which were described in Section 2.4.3. For non-Native WiFi

(legacy) cards, we were constrained by the proprietary software on the card drivers.

Their software does not expose any APIs in Windows to programmatically set the res-

olution of power save mode. Therefore, we were unable to implement the buffering

algorithm for these WLAN cards. However, for prototyping Spoofed Buffering, we

buffer packets at the end points of infrastructure networks, using a scheme similar to the

one described for ad hoc networks in Section 2.5.4. The MultiNet service keeps track of

the end points of all on-going sessions, and buffers packets if the destination is currently

in another network.

2.6.4 Implementing Slotted Synchronization

The Slotted Synchronization Protocol, described in Section 2.5.6, requires an ad hoc

network with multiple MultiNet nodes to have two parameters, in addition to the ones

specified by IEEE 802.11. In particular, the initiator of such an ad hoc network has

to specify the recurrence period and the offset within the slot when the IEEE 802.11

Beacon is sent. Any node joining this network has to learn of both these parameters for

Slotted Synchronization to work. One way to implement this requirement is to modify

IEEE 802.11 packets to carry more information. However, this requires modifications to

the wireless card driver, and might reduce the interoperability of MultiNet, as discussed

in Section 2.5.2.

We use an alternative approach to solve this problem. The two parameters are em-

bedded in the SSID field of an IEEE 802.11 Beacon, which is broadcast once every
fixed interval. (The IEEE 802.11 protocol for joining an ad hoc network requires the joining node to use the information in the Beacons of that network.) The SSID field of the Beacon frame is 32 bytes in length. The recurrence period is measured in slots, and its maximum value is the number of networks to which a user can connect. We limit this to 255, and so 1 byte is sufficient to carry

this information. Further, the offset within the slot is measured in microseconds, and

we limit the maximum slot duration to 5 seconds. So, 2 bytes are enough to embed the

value of the offset. Therefore, the user is allowed to use an SSID of up to 29 characters for

such ad hoc networks. Based on experience, we believe that this does not significantly

reduce the usability of IEEE 802.11 networks.
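The layout described above can be sketched as follows (Python; the helper names are invented). One caveat: the text above measures the offset in microseconds, but the sketch stores it at millisecond granularity purely so that values up to the 5-second maximum slot duration fit into the two bytes reserved for it; the exact encoding used by the implementation may differ.

    # Illustrative packing of Slotted Synchronization parameters into the 32-byte
    # SSID field: 1 byte recurrence period + 2 bytes offset + up to 29 characters.
    # (Offset stored in milliseconds here; see the note above.)
    def pack_ssid(name, recurrence_period, offset_ms):
        assert 1 <= recurrence_period <= 255
        assert 0 <= offset_ms <= 5000             # slot duration capped at 5 seconds
        assert len(name) <= 29
        header = bytes([recurrence_period]) + offset_ms.to_bytes(2, "big")
        return header + name.encode("ascii")      # at most 32 bytes total

    def unpack_ssid(field):
        recurrence_period = field[0]
        offset_ms = int.from_bytes(field[1:3], "big")
        name = field[3:].decode("ascii")
        return recurrence_period, offset_ms, name

    print(unpack_ssid(pack_ssid("conference-room", recurrence_period=3, offset_ms=420)))
    # -> (3, 420, 'conference-room')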

2.7 System Evaluation

We studied the performance of MultiNet using a real implementation and a custom sim-

ulator. The implementation was used to study the throughput behavior with different

switching algorithms. We then simulated MultiNet with realistic parameters, and com-

pared it with the alternative approach of using multiple radios to connect to multiple

networks. We compare the two approaches with respect to energy consumption and the

average delay encountered by the packets. The results presented in this section con-

firm that MultiNet is a more energy-efficient way of achieving connectivity to multiple

networks as compared to using multiple radios.

2.7.1 Test Configuration

MultiNet has been deployed and tested over a dozen commercial IEEE 802.11 wireless

cards. The results in this section were derived over an IEEE 802.11b network [60].

The wireless cards used were the Cisco 340 series, Compaq WLAN 200, Orinoco Gold, Netgear WAG 511 and the Native WiFi cards from AMD [11] and Realtek [102]. All

these cards have a maximum data rate of 11 Mbps. The APs used were the Cisco 340

Series, EZConnect 2656, DLink DI-614+ and Native WiFi APs. IEEE 802.11 PSM was

implemented only in the Native WiFi APs. Most of our results were consistent across

all of this network equipment.

2.7.2 Reducing the Switching Delay

Good performance of MultiNet depends on a short delay when switching across net-

works. However, legacy IEEE 802.11b cards perform the entire association procedure

every time they switch to a network. We carried out a detailed analysis of the time to

associate to an IEEE 802.11 network. The results showed significant overhead when

switching from one network to another. In fact, an astronomical delay of 3.9 seconds

was observed from the time the card started associating to an ad hoc network, after

switching from an infrastructure network, to the time it started sending data.

Table 2.1: The Switching Delays between IS and AH networks for IEEE 802.11 cards with and without the optimization of trapping media connect and disconnect messages.

  Switching From   Unoptimized Legacy   Optimized Legacy   Optimized Native WiFi
  IS to AH         3.9 s                170 ms             25 ms
  AH to IS         2.8 s                300 ms             30 ms

Our investigations revealed that the cause of this delay is the media disconnect and

media connect notifications to the IP stack. The IP stack damps the media disconnect

and connect for a few seconds to protect itself and its clients from spurious signals. The

spurious connects and disconnects can be generated by network interface cards due to a

variety of reasons ranging from buggy implementations of the card or switch firmware

to the card/switch resetting itself to flush out old state. Windows was designed to damp

the media disconnect and connect notifications for some time before rechecking the

connectivity state of the adapter and taking the action commensurate with that state.

In the case of MultiNet, switching between networks is deliberate and meant to be

hidden from higher protocols, such as IP and the applications. We hide switching by

having MPD trap the media disconnect and media connect messages when it switches

between networks. Since the MPD is placed below IP, it can prevent the network layer

from receiving these messages. This minor modification significantly improves the

Switching Delay as shown in Table 2.1. Using the above optimization, we were able

to reduce the switching delay from 2.8 seconds to 300 ms when switching from an ad

hoc network to an infrastructure network and from 3.9 seconds to 170 ms when switch-

ing from an infrastructure network to an ad hoc network. These numbers are further

reduced to as low as 30 ms and 25 ms respectively, when Native WiFi cards are used.

We believe that this overhead is extraneous for purposes of MultiNet and in Section 2.8

we suggest additional ways to make this delay negligible.

A nice consequence of masking the media connect and media disconnect messages

is that all virtual adapters are visible to IP as connected, and our architecture of Section

2.5.3 is therefore consistent.

2.7.3 Comparing Different Switching Strategies

We implemented three switching strategies described in Section 2.5.5, i.e. User Defined

Priority, Adaptive Buffer, and Adaptive Traffic. The test environment comprised a node

that used MultiNet to stay connected to an infrastructure and an ad hoc network. The

Switching Delays from the ad hoc to the infrastructure network and vice versa were

overestimated at 500 ms and 300 ms, respectively. (This overprovisioning helped to evenly compare all the switching schemes by fixing the duration of the Switching Cycle.) The total time available for switch-

ing between networks was 1 sec. We evaluated the switching strategies when simultane-

ously transferring a file of size 47 MB using FTP from the MultiNet node to two nodes

on the different networks. An independent transfer of the file over the ad hoc network

took 80.25 seconds, while it took 54.12 seconds over the infrastructure network.

Figure 2.5 shows the time taken to simultaneously transfer this file over MultiNet

using different switching strategies for legacy cards. We evaluated 3 different User

Defined Priority switching schemes. In the ‘50%IS 50%AH’ strategy the node stays on

each network for 500 ms. In the ‘75%IS 25%AH’ scheme it stays on the infrastructure

network for 750 ms and on the ad hoc network for 250 ms, and in the ‘25%IS 75%AH’

scheme the node stays on the infrastructure network for 250 ms and the ad hoc network

for 750 ms. For the Adaptive Traffic algorithm we used a window of 3 switching cycles

to estimate the Activity Periods. In this case the window is 3*1.8 = 5.4 seconds since

the Switching Cycle is 500+300+1000 = 1800 ms.

Different switching strategies show different behavior and each of them might be

useful for different scenarios. For the User Defined Priority strategies, the network with

higher priority gets a larger slot to remain connected. Therefore, the network with a

higher priority takes lesser time to complete the FTP transfer. The results of the adap-

tive algorithms are similar. The Adaptive Buffer algorithm adjusts the time it stays on

a network based on the number of packets buffered for that network. Since the maxi-

mum throughput on an infrastructure network is more than the throughput of an ad hoc

network (separate experiments revealed that the average throughput on a wireless network with commercial APs and wireless cards is 5.8 Mbps for an isolated infrastructure network and 4.4 Mbps for an isolated two-node ad hoc network, consistent with [52]), the number of packets buffered for the infrastructure network is more. There-

[Figure: bar chart; y-axis is time in seconds (0-800); x-axis groups: 25%IS 75%AH, 50%IS 50%AH, 75%IS 25%AH, Adaptive Buffer, Adaptive Traffic; one bar each for the IS and AH transfers.]

Figure 2.5: Time taken to complete a 47 MB FTP transfer on an ad hoc and infras-

tructure network using different switching strategies

fore the FTP transfer completes faster over the infrastructure network as compared to

the ‘50%IS 50%AH’ case. For a similar reason the FTP transfer over the infrastructure

network completes faster when using Adaptive Traffic switching. MultiNet sees much

more traffic sent over the infrastructure network and proportionally gives more time to

it. Overall, the adaptive strategies work by giving more time to faster networks if there

is maximum activity over all the networks. However, if some networks are more active

than the others, then the active networks get more time. We expect these adaptive strate-

gies to give the best performance if the user has no priority and wants to achieve the best

performance over all the MultiNet networks.



[Figure: left y-axis Activity Period (in ms), right y-axis Traffic (in packets), x-axis Time (in seconds, 0-140); curves for the Ad hoc and Infrastructure activity periods and for TrafficAH and TrafficIS.]

Figure 2.6: Variation of the activity period for two networks with time. The activity

period of a network is directly proportional to the relative traffic on it.

2.7.4 Adaptive Switching using MultiNet

The adaptability of MultiNet is demonstrated in Figure 2.6. The Adaptive Traffic switch-

ing strategy is evaluated by running our system for two networks, an ad hoc and an

infrastructure network, for 150 seconds. The plots at the top of Figure 2.6 show the

traffic seen on both the wireless networks, and the ones at the bottom of this figure show

the corresponding effect on the Activity Period of each network. The adaptive switch-

ing strategy causes the Activity Period of the networks to vary according to the traffic

seen on them. Initially when there is no traffic on either network, MultiNet gives equal

time to both networks. After 20 seconds there is more traffic on the ad hoc network,

and so MultiNet allocates more time to it. The traffic on the infrastructure network is

greater than the traffic on the ad hoc network after around 110 seconds. Consequently,

the infrastructure network is allocated more time. This correspondence between relative

traffic on a network and its activity periods is evident in Figure 2.6.



MultiNet, when used with adaptive switching schemes, provides true zero config-

uration. Prior schemes, such as Wireless Zero Configuration (WZC), require users to

specify a list of preferred networks, and WZC only connects to the most preferred avail-

able wireless network. The adaptive switching strategies require a user to specify a list

of preferred networks, and the card connects to all of them, giving time to each network based on the amount of traffic on it.

2.7.5 MultiNet with and without Buffering

We have implemented Spoofed Buffering on infrastructure networks with Native WiFi

cards using IEEE 802.11 PSM. However, many commercial APs do not implement

PSM. Further, the ad hoc network buffering protocol, described in Section 2.5.4, re-

lies on broadcast packets, which are more unreliable than unicast packets [91]. These

packets might get lost, and packets destined to MultiNet’s virtual interface might get

dropped. The worst case occurs when no packets are buffered due to lost broadcast

packets or lack of PSM support from commercial APs. Figure 2.7 compares this worst

case to the scenarios when MultiNet implements buffering. In our test scenario, we

consider an infrastructure network with and without Spoofed Buffering.

Packets were sent, using ntttcp, over the infrastructure network from the MultiNet

node to another node in the network. Ntttcp, which is a port of ttcp [118] to Windows,

works by establishing a TCP session between two nodes and sending the packets at the

maximum rate. The Activity Period for both networks was fixed at 500 ms. We present

results for three scenarios in Figure 2.7. ‘NoMultiNet’ corresponds to the case when

the sender and receiver are connected to just one network, ‘MultiNetNoBuffer’ is when

the sender is connected to two networks using MultiNet and the AP does not implement

Spoofed Buffering, and the APs implement Spoofed Buffering in ‘MultiNetBuffer’. Re-
[Figure: TCP sequence number (0 to about 9,000) versus time (about 1.9 s to 9.6 s) for the NoMultiNet, MultiNetBuffer and MultiNetNoBuffer runs.]

Figure 2.7: TCP Performance with and without Spoofed Buffering.

sults show that the performance drops by a factor of four when using MultiNet with

Spoofed Buffering and drops further when the AP does not buffer packets. When APs

buffer packets, the MultiNet node can achieve a throughput proportional to the duration

of its Activity Period, which is around a fourth of the Switching Cycle. Without buffer-

ing, the throughput of the system in this case goes down to a seventh of the maximum

achievable throughput. Notice that although performance drops significantly, MultiNet

is still usable with a throughput of around 500 Kbps.

2.7.6 MultiNet with Slotted Synchronization

We now study the performance of Slotted Synchronization, described in Section 2.5.6.

We set up a three node network. The first machine always stays on the infrastructure

network. Both the other machines use MultiNet. Before we start this experiment, the

second node is connected to two networks, an ad hoc network and an infrastructure

network. It is initially the only node in the ad hoc network. The third node, which we
[Figure: throughput in Mbps (0-1.6) versus time in seconds (0-50); curves for the IS Network and the Ad Hoc Network, with the point where the node starts connecting to the AH network marked.]

Figure 2.8: Effect on UDP flows when a node uses Slotted Synchronization to join

an ad hoc network

also use as our test machine, is initially connected to only the infrastructure network.

We start a UDP flow between the test machine and the first machine, which is only on

the infrastructure network. We use Fixed Slot Duration switching, and set the duration

of each slot to 800 ms. This duration contains the Switching Delay. IPerf [1] was used to

initiate UDP flows of 1 Mbps with 512-byte packets. The MPD was also instrumented

to report the total number of successful packets sent and received in every slot. This

setup used Spoofed Buffering.

Figure 2.8 illustrates the instantaneous throughput, measured once per Switching

Cycle, achieved by UDP flows when the test machine joins an ad hoc network that has

more than one MultiNet node. Initially, when the test machine is only in the infras-

tructure network, there is no Switching Delay, and consequently the UDP throughput

is around 1 Mbps. After 13 seconds, the test machine uses MultiNet to connect to the

ad hoc network, which already has one MultiNet node. The test machine takes around

15 seconds to initialize another virtual interface, build up its state, synchronize the slots

to the MultiNet node in the ad hoc network and get a DHCP address for the virtual

interface. After this time, the UDP flow between the test machine and the infrastruc-

ture network node resumes. We immediately start another UDP flow between the two

MultiNet nodes in the ad hoc network. As we see in the figure, UDP throughput in

the infrastructure network drops to around half the initial throughput. This is because

the infrastructure network gets one of two slots in Fixed Slot Duration Switching since

MultiNet connects to two networks. The Switching Delay does not reduce the through-

put further, because MultiNet is able to send the buffered packets over the Activity

Period at the network’s bandwidth, which is greater than the IPerf flow rate of 1 Mbps.

Further, the flow over the ad hoc network roughly achieves the same throughput as over

the infrastructure network, which implies that Slotted Synchronization maintains a good

overlap between multiple nodes using MultiNet to connect to an ad hoc network.

2.7.7 MultiNet on a Mobile Node

MultiNet does not aim to hide mobility from the user. As discussed in Section 2.5.2,

MultiNet’s virtual interfaces should behave as physical wireless cards when nodes are

mobile. To illustrate this behavior, the same experimental setup of Section 2.7.6 was

used. However, in this case, we focused on the throughput in the ad hoc network. After

around 28 seconds, the test machine was moved away from the other MultiNet node

in the ad hoc network. As we see in Figure 2.9, the IPerf throughput over the ad hoc

network keeps falling as the machine moves away from the other node in the ad hoc

network. With an increase in distance between the two nodes, the signal strength de-

creases, which increases the loss rate and reduces the throughput. After some time the
[Figure: throughput in Mbps (0-0.9) versus time in seconds (0-80), with the points where motion starts, the connection is lost, and the connection is regained marked.]

Figure 2.9: MultiNet in a Mobile Scenario

connection over the ad hoc network is lost. This state is propagated to the application

layer, which halts IPerf. However, MultiNet keeps trying to reconnect to the ad hoc net-

work, as described in Section 2.5.3. It regains connectivity at around 52 seconds. The

IPerf flow is started immediately between the two nodes. As we see in the figure, the

two nodes using MultiNet achieve the same throughput after reconnection, as they had

before the connection was lost. This shows that there is a significant overlap between the

two nodes, and the performance of Slotted Synchronization is not significantly affected

with mobility. The test machine was again moved at around 70 seconds, and we see a

corresponding reduction in throughput.

2.7.8 MultiNet versus Multiple Radios

MultiNet is one way of staying connected to multiple wireless networks. The alter-

native approach is to use multiple wireless cards. Each card connects to a different

network, and the machine is therefore connected to multiple networks. We simulated



this approach, and compared it with the MultiNet scheme with respect to the energy

consumed and the average delay of packets over the different networks. We first present

our simulation environment, and then compare the results of the MultiNet scheme to the

alternative approach using multiple radios.

Simulation Environment

We simulated both approaches for a sample scenario of people wanting to share and

discuss a presentation over an ad hoc network and browse the web over the infrastruc-

ture network at the same time. This feature is extremely useful in many scenarios. For

example, consider the case where a company, say Kisco’s, employees conduct a busi-

ness meeting with another company, say Macrosoft’s, employees at Macrosoft’s head-

quarters. With MultiNet and a single wireless network card, Kisco employees can share

documents, presentations, and data with Macrosoft’s employees over an ad hoc network.

Macrosoft’s employees can stay connected to their internal network via the access point

infrastructure while sharing electronic information with Kisco’s employees. Macrosoft

does not have to give Kisco employees access in their internal network in order for the

two parties to communicate.

We model traffic over the two networks, and analyze the packet trace using our sim-

ulator. Traffic over the infrastructure network is considered to be mostly web browsing.

We used Surge [18] to model http requests according to the behavior of an Internet

user. Surge is a tool that generates web requests with statistical properties similar to

measured Internet data. The generated sequence of URL requests exhibits representa-

tive distributions for requested document size, temporal locality, spatial locality, user

off times, document popularity and embedded document count. For our purposes, Surge

was used to generate a web trace for a 1 hour 50 minute duration, and this web trace
[Figure: packet size in bytes (0-1600) versus time in seconds (0-7000) for the web browsing trace.]

Figure 2.10: Packet trace for the web browsing application over the infrastructure

network

was then broken down to a sample packet trace for this period. The distribution of the

packet sizes over the infrastructure network is illustrated in Figure 2.10.

The ad hoc network is used for two purposes: sharing a presentation, and support-

ing discussions using a sample chat application. Three presentations are shared in our

application over a 1 hour 50 minute period. Each presentation is a 2 MB file, and is

downloaded to the target machine using an FTP session over the ad hoc network. They

are downloaded in the 1st minute, the 38th minute, and the 75th minute. Further, the

user also chats continuously with other people in the presentation room, discussing the

presentation and other relevant topics. Packet traces for both the applications, FTP and

chat, were obtained by sniffing the network, using Ethereal [45], while running the re-

spective applications. MSN messenger was used for a sample chat trace for a 30 minute

duration. The packet traces for FTP and chat were then extended over the duration of

our application, and are illustrated in Figure 2.11.



In our simulations we assume that wireless networks operate at their maximum TCP

throughput of 4.4 and 5.8 Mbps for an ad hoc and infrastructure network respectively.

We then analyze the packet traces for independent networks, and generate another trace

for MultiNet. We use a ‘75%IS 25%AH’ switching strategy presented in Section 2.5.5

with a switching cycle time of 400 ms. The switching delay is set to 1 ms, and we ex-

plain the reason for choosing this value in Section 2.8.1. Further, the power consumed

when switching is assumed to be negligible. We do not expect these simplifying as-

sumptions to greatly affect the results of our experiments. We analyze packet traces for

the two radio and MultiNet case and compute the total power consumed and the average

delay encountered by the packets. All the cards are assumed to be Cisco AIR-PCM350,

and their corresponding power consumption numbers are used from [111]. Specifically,

the card consumes 45 mW of power in sleep mode, 1.08 W in idle mode, 1.3 W in receive mode, and 1.875 W in transmit mode. Further, in PSM, the energy consumed by the Cisco AIR-PCM 350 in one power save cycle is given by 0.045 ∗ n ∗ t + 24200 millijoules, where n is the Listen Interval and t is the Beacon Period of the AP. The

details of these numbers are presented in [111].
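The trace analysis then reduces to accumulating power times time per radio state. The sketch below is an illustration of that bookkeeping rather than the simulator itself; it takes the per-state powers and the PSM cycle expression exactly as quoted above, and the function names are invented.

    # Illustrative energy accounting for the trace-driven comparison.
    # Powers (in watts) for the Cisco AIR-PCM350, as quoted from [111].
    POWER_W = {"sleep": 0.045, "idle": 1.08, "receive": 1.3, "transmit": 1.875}

    def radio_energy_joules(state_durations_s):
        # state_durations_s: dict mapping radio state -> total seconds spent in it.
        return sum(POWER_W[state] * t for state, t in state_durations_s.items())

    def psm_cycle_energy_mj(listen_interval, beacon_period_ms):
        # PSM cycle energy as quoted above: 0.045 * n * t + 24200 millijoules.
        return 0.045 * listen_interval * beacon_period_ms + 24200

    # Example: a radio that transmits for 10 s, receives for 20 s and idles for 70 s.
    print(radio_energy_joules({"transmit": 10, "receive": 20, "idle": 70}))
    # -> about 120.35 joules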

Table 2.2: The average throughput in the ad hoc and infrastructure networks using both strategies of MultiNet and two radios

  Network          Two Radio   MultiNet
  Ad Hoc           4.4 Mbps    1.1 Mbps
  Infrastructure   5.8 Mbps    4.35 Mbps

Despite the performance advantages seen in Table 2.2, using multiple radios con-

sumes more power. Each radio is always on, and therefore keeps transmitting and re-

ceiving over all the networks. Even when it is not, the radio is in idle mode, and drains
[Figure: packet size in bytes (0-1600) versus time in seconds (0-7000) for the combined FTP and chat trace.]

Figure 2.11: Packet trace for the presentation and chat workloads over the ad hoc

network

a significant amount of power. Figure 2.12 shows the amount of energy consumed by

the MultiNet scheme and the two radio scheme for the above application. Two radios

consume almost twice the energy consumed by the single MultiNet radio.

Table 2.3: The average packet delay in infrastructure mode for the various strategies

  Scheme         Avg Delay (in Seconds)
  Two Radio      0.001
  MultiNet       0.157
  Two Radio PS   0.156
  MultiNet PS    0.167
[Figure: cumulative energy consumed in joules (0-16,000) versus time in seconds (0-6000) for the Two Radios and MultiNet schemes.]

Figure 2.12: Comparison of total energy usage when using MultiNet versus two

radios

With Power Save Mode

The multiple radio approach can be modified to consume less power by allowing the

network card in infrastructure mode to use PSM. Figure 2.13 shows the energy usage

when the infrastructure radio uses PSM for our application. The Beacon Period is set to

100 ms, and the Listen Interval is 4. The amount of energy consumed in the two radio

case using PSM is very close to the consumption of MultiNet without PSM. However,

this saving comes at a price. It is no longer possible to achieve the high throughput for

infrastructure networks if the cards are in PSM. Simulated results in Table 2.3 show that

the average packet delay over the infrastructure network with PSM is now close to the

average packet delay for MultiNet. Therefore, using two radios with PSM does not give

significant benefits as compared to MultiNet without PSM.



Without Power Save Mode

We analyze the two schemes of connecting to multiple networks with respect to the

performance on the network and the amount of power consumed. In our simulated

scenario, each of the radios gives the best achievable throughput on both the networks.

As shown in Table 2.2, the average throughput of MultiNet in the infrastructure mode is

4.35 Mbps compared to 5.8 Mbps in the two radio case. The average throughput in the ad

hoc network is 1.1 Mbps in MultiNet and 4.4 Mbps when using two radios. Switching

results in lesser throughput across individual networks, since it is on a network for a

smaller time period. Consequently, the scheme of using multiple cards gives much

better throughput as compared to MultiNet when connected to multiple networks.

The power consumption of MultiNet can be reduced further by allowing it to enter

the power save mode for infrastructure networks as described in Section 2.4.2. In our

experiment we chose the Switching Cycle to be 400 ms, with ‘75%IS 25%AH’ switch-

ing. For consistency in comparison, the Listen Interval is set to 4 and the Beacon Period

to 100 ms. Consequently, every time the card switches to infrastructure mode, it listens

for the traffic indication map from the AP. After it has processed all its packets it goes

to sleep and wakes up after 300 ms. It then stays in the ad hoc network for 100 ms,

and then switches back to the infrastructure network. The modified algorithm results in

greater energy savings as shown in Figure 2.13. The average delay per packet over the

infrastructure network is not seriously affected, while the energy consumed is reduced

by more than a factor of 3. We conclude that MultiNet is superior to the use of multiple

cards when connecting to multiple networks in applications seeking convenience and

power efficiency.

Note that we do not evaluate power saving in ad hoc mode because we are unaware

of any commercial cards that implement this feature. As a result we were unable to get

performance numbers when using PSM in ad hoc mode. However, we believe that if

such a scheme is implemented, we will be able to incorporate it in MultiNet, and further

reduce the power consumption.

[Figure: cumulative energy in joules (0-8000) versus time in seconds (0-6000) for Two Radio PS, MultiNet No PS and MultiNet PS.]

Figure 2.13: Energy usage when using MultiNet and two radios with IEEE 802.11

Power Saving

2.7.9 Maximum Connectivity in MultiNet

We use the simulation environment of Section 2.7.8 to evaluate how performance changes as the number of connected networks increases. Table 2.4 presents the average delay seen

by packets over the infrastructure network on varying the number of MultiNet networks

from 2 to 6. We used a Fixed Priority switching strategy with equal priorities to all

the networks. An increase in the number of connected networks results in a smaller

Activity Period for each connected network when using Fixed Priority Switching. As a

result, more packets are buffered and the average delay encountered by the packets on a

network increases. This is shown in Table 2.4.


Table 2.4: The average packet delay in infrastructure mode on varying the number of MultiNet connected networks

  Num Networks   Avg Delay (in Seconds)
  2              0.191
  3              0.261
  4              0.332
  5              0.410
  6              0.485

2.7.10 Summary

We summarize the conclusions of our performance analysis as follows:

• No single switching strategy is best under all circumstances. Adaptive strate-

gies are best when no network preference is indicated. Both Adaptive Buffer and

Adaptive Traffic give similar performance.

• For the applications studied, MultiNet consumes 50% less energy than a two card

solution.

• As expected, the average packet delay with MultiNet varies linearly with an in-

crease in the number of connected networks when all the networks are given equal

activity periods.

• Spoofed Buffering significantly improves the performance of MultiNet. However,

MultiNet works even without Spoofed Buffering, although the performance goes

down by a factor of 4.

• Masking ‘media connects’ and ‘media disconnects’ below IP leads to significant



reduction in the switching overhead. The switching delay for legacy cards is re-

duced to around 300 ms, while this number goes down to 30 ms for Native WiFi

cards.

• Adaptive Switching eliminates the current zero configuration requirement to pri-

oritize the preferred network. With MultiNet based zero configuration, the user

connects to all preferred networks.

• In mobile scenarios, MultiNet exposes the same connectivity status as a real card.

Further, Slotted Synchronization works well in ad hoc networks with commercial

wireless cards.

2.8 Discussion on the MultiNet Architecture

This section discusses ways in which the performance of MultiNet can be improved. In

particular, it focuses on reducing the switching overhead, enabling 802.1X [57] authen-

tication, and deployment.

2.8.1 Reducing the Switching Overhead

Good performance of MultiNet depends on low switching delays. The main cause of

the switching overhead in current generation wireless cards is the 802.11 network asso-

ciation and authentication protocol [58], which is executed every time the card switches

to a network. Further, these cards do not store state for more than one network in the

firmware, and worse still, many card vendors force a firmware reset when changing the

mode from ad hoc to infrastructure and vice versa.

Most of these problems are fixed in the next generation Native WiFi cards. These

cards do not incur a firmware reset on changing their mode. Moreover, since switching

is forced by MultiNet, Native WiFi cards do not explicitly disconnect from the network

when switching. However, they still carry out the association procedure that causes the

25 to 30 ms delay. By allowing upper layer software to control associations, instead of

automatically initializing them, this delay can be made negligible. The only overhead

on switching is then the synchronization with the wireless network. This can be done

reactively, if the card requests a synchronization beacon when it switches to a network.

Using the above optimizations, a WLAN card can switch to a network as fast as the

network card can switch to a different channel and the speed with which a network’s

state can be loaded into a flash card. Recent research has shown that the time to switch

to a different channel is less than 100 µsec for an IEEE 802.11a wireless card [51].

Further, as the network state to load is around 100 bytes, and data transfer speeds for flash cards are 8 Mbps [13], we expect the switching overhead to be less than 1 ms.
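This estimate follows directly from the quoted numbers: loading about 100 bytes of per-network state at 8 Mbps and retuning the radio each take only a fraction of a millisecond,

\[
t_{\text{state}} \approx \frac{100 \times 8\ \text{bits}}{8\ \text{Mbps}} = 0.1\ \text{ms},
\qquad
t_{\text{channel}} < 100\ \mu\text{s},
\qquad
t_{\text{state}} + t_{\text{channel}} < 0.2\ \text{ms}.
\]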

2.8.2 Network Port Based Authentication

The IEEE 802.1X is a port based authentication protocol that is becoming popular for

enterprise wireless LANs. For MultiNet to be useful in all environments it has to support

this authentication protocol. However, the supplicant 802.1X protocol is implemented

in the Wireless Zero Configuration Service (WZC) for Windows XP, and we had to turn

off WZC for MultiNet to work. Only minor changes are needed in WZC for it to work

with MultiNet. However, achieving good performance with IEEE 802.1X is difficult.

We measured the overhead of the IEEE 802.1X authentication protocol and found it to

be approximately 600 ms. It is clear that we need to prevent the card from going through

a complete authentication procedure every time it switches across IEEE 802.1X enabled

networks. We can eliminate the authentication cycles by storing the IEEE 802.1X state

in the MPD and using this state instead of redoing the authentication procedure. Further,

the IEEE 802.11 standard recommends an optimization called ‘Preauthentication’ for

the APs. Preauthentication works by having the APs maintain a list of authenticated

nodes. When implemented, this optimization will eliminate the authentication overhead

every time the wireless card switches to an 802.1X enabled network.

2.8.3 Can MultiNet be done in the Firmware?

The simple answer is yes; however, we strongly believe that the right place to implement

MultiNet is as a kernel driver. Buffering imposes memory requirements that are best

taken care of by the operating system, and the policy driven behavior can bloat the

firmware. Additionally, by moving the intelligence into a general purpose PC, the cost of

the wireless hardware can be reduced further, which is the trend for the next generation

of WLAN cards we described in Section 2.4.3.

2.9 Future Research

The switching behavior of MultiNet augurs badly for TCP performance. MultiNet is

implemented below IP, and so TCP sees fluctuating behavior for packets sent by it. It

receives immediate acknowledgements for packets sent when the network is active, and

delayed acknowledgements for buffered packets. The above behavior affects the way

TCP estimates the RTT for the session; from the way it is calculated, the estimated RTT will always be an upper bound on the true RTT. An overestimate of RTT results in larger timeout values to

ensure that packets are not lost. However, a larger than required RTT has other conse-

quences with respect to flow control, and congestion response. This problem is generally

relevant for networks that have periodic connectivity. A solution to this problem has to

mask the delay encountered by the buffered packets. We are currently exploring ways

to achieve this, and improve TCP performance.



Another open problem, as discussed in Section 2.5.6, is the synchronization of more

than one ad hoc network that has multiple nodes using MultiNet. Solving this problem

requires MultiNet to synchronize its slots to initiators of multiple ad hoc networks, and

those initiators’ slots might not be synchronized. We are looking at ways to handle this

scenario by allowing all nodes, including the initiator, to resynchronize their slots.

As stated previously, we do not consider scenarios when a MultiNet node is partici-

pating in a multi-hop ad hoc network. The synchronization problem is complicated for

such scenarios. A scheme that supports multi-hop networks has to handle partitioning

issues of the ad hoc network, and ways to resynchronize it. SSCH, described in Chap-

ter 3, is a step towards making MultiNet work in multi-hop networks. We hope to build

on SSCH, and implement a protocol that works in such scenarios.

2.10 Summary

The main contributions of this chapter can be summarized as follows:

• It describes a new virtualization architecture for wireless network cards, called

MultiNet. Several compelling real-life scenarios are described that motivate the

need for such an architecture. To the best of our knowledge, MultiNet is the first

to articulate this problem and propose a solution for IEEE 802.11 hardware.

• It proposes a deployable architecture for MultiNet. As part of the architecture, it

presents Spoofed Buffering, which leverages IEEE 802.11 PSM to buffer packets

at the APs without modifying them. Three switching algorithms are presented

that are useful in different applications of MultiNet. It also presents Slotted Syn-

chronization, which is a simple synchronization protocol that works in ad hoc

networks with multiple MultiNet nodes.



• It describes the implementation of MultiNet on Windows XP and over commercial

wireless network cards. MultiNet requires no modifications to the wireless card

drivers. This chapter also analyzes the performance of MultiNet in a number of

scenarios, such as mobility and without Spoofed Buffering. Further, MultiNet is

more power efficient than an alternative of using multiple wireless cards in the

device.

MultiNet achieves the design goals of transparency, deployability and performance.

Transparency is achieved by making virtual wireless cards appear as physical wireless

cards to the user. Its deployability has been demonstrated by an implementation on

Windows XP over commercial wireless cards. Finally, the performance of MultiNet has

been studied in detail, and is shown to give good performance in most scenarios. The

MultiNet software is available for free download, and more information can be found

at: http://www.cs.cornell.edu/people/ranveer/MultiNet/.

The contents of this chapter have benefitted from several helpful suggestions and

comments. In particular, Victor Bahl and Pradeep Bahl were involved in discussions that

helped develop the MultiNet architecture. Slotted Synchronization and the performance

results were revised after inputs from Ken Birman. Further, some of MultiNet’s design

goals were motivated by the requirements of MultiNet users.


CHAPTER 3

SSCH: CAPACITY IMPROVEMENT USING MULTINET

3.1 Introduction

The problem of supporting multiple senders and receivers in wireless networks has re-

ceived significant attention in the past decade. One domain where this communication

pattern naturally arises is fixed wireless multi-hop networks, such as community net-

works [21, 70, 107, 109]. Increasing the capacity of such wireless networks has been

the focus of much recent research (e.g., [40, 65, 95]). An obvious way to increase the

network capacity is to use frequency diversity [4,114]. Commodity wireless networking

hardware commonly supports a number of orthogonal channels, and distributing com-

munication across channels permits multiple simultaneous communication flows.

Channelization was added to the IEEE 802.11 standard to increase the capacity of

infrastructure networks — neighboring access points are tuned to different channels so

traffic to and from these access points does not interfere [4]. Non-infrastructure (i.e., ad-

hoc) networks have thus far been unable to exploit the benefits of channelization. The

current practice in ad-hoc networks is for all nodes to use the same channel, irrespective

of whether the nodes are within communication range of each other [107, 109].

Among its constructions, this thesis proposes a new protocol, Slotted Seeded Chan-

nel Hopping (SSCH), which extends the benefits of channelization to ad-hoc networks.

Logically, SSCH operates at the link layer, but it can be implemented in software over

an IEEE 802.11-compliant wireless Network Interface Card (NIC). The SSCH layer

in a node handles three aspects of channel hopping: (i) implementing the node’s chan-

nel hopping schedule and scheduling packets within each channel, (ii) transmitting the

channel hopping schedule to neighboring nodes, and (iii) updating the node’s channel


hopping schedule to adapt to changing traffic patterns. SSCH is a distributed protocol

for coordinating channel switching decisions, but one that only sends a single type of

message, a broadcast packet containing that node’s current channel hopping schedule.

The simulation results show that SSCH yields a significant capacity improvement in

ad-hoc wireless networks, including both single-hop and multi-hop scenarios.

The primary research contributions of SSCH can be summarized as follows:

• It is a new protocol that increases the capacity of IEEE 802.11 ad-hoc networks

by exploiting frequency diversity. This extends the benefits of channelization to

ad-hoc networks. The protocol is suitable for a multi-hop environment, does not

require changes to the IEEE 802.11 standard, and does not require multiple radios.

• SSCH introduces a novel technique, optimistic synchronization, for distributed

rendezvous and synchronization. This technique allows control traffic to be dis-

tributed across all channels, and thus avoids control channel saturation, a bottle-

neck identified in prior work on exploiting frequency diversity [114].

• SSCH introduces a second novel technique to achieve good performance for multi-

hop communication flows. The partial synchronization technique allows a for-

warding node to partially synchronize with a source node and partially synchro-

nize with a destination node. This synchronization pattern allows the load for a

single multi-hop flow to be distributed across multiple channels.

3.2 Background and Motivation

In this section, the discussion will be limited to the widely-deployed IEEE 802.11 Dis-

tributed Coordination Function (DCF) protocol [60]. We begin by reviewing some rel-

evant details of this protocol. IEEE 802.11 recommends the use of a Request To Send

(RTS) and Clear To Send (CTS) mechanism to control access to the medium. A sender

desiring to transmit a packet must first sense the medium free for a DCF interframe space

(DIFS). The sender then broadcasts an RTS packet seeking to reserve the medium. If

the intended receiver hears the RTS packet, the receiver sends a CTS packet. The CTS

reserves the medium in the neighborhood of the receiver, and neighbors do not attempt

to send a packet for the duration of the reservation. In the event of a collision or failed

RTS, the node performs an exponential backoff. For additional details, the reader is

referred to [60].

The IEEE 802.11 standard divides the available frequency into orthogonal (non-

overlapping) channels. IEEE 802.11b specifies 11 channels in the 2.4 GHz spectrum,

3 of which are orthogonal, and IEEE 802.11a specifies 13 orthogonal channels in the

5 GHz spectrum. Packet transmissions on these orthogonal channels do not interfere if

the communicating nodes on them are reasonably separated (at least 12 inches apart for

common hardware [4]).

Using only a single channel limits the capacity of a wireless network. For example,

consider the scenario in Figure 3.1 where 6 nodes are within communication range of

each other, all nodes are on the same channel, and 3 of them have packets to send

to distinct receivers. Due to interference on the single channel, only one of them, in

this case node 3, can be active. In contrast, if all 3 orthogonal channels are used, all

transmissions can take place simultaneously on distinct channels. SSCH captures the

additional capacity provided by these orthogonal channels.

There were three important constraints in the design of SSCH:

• SSCH should require only a single radio per node. Some of the previous work

on exploiting frequency diversity has proposed that each node be equipped with

multiple radios [4, 135]. Multiple radios draw more power, and energy consump-
[Figure omitted: six nodes, numbered 1–6, all within communication range of one another.]

Figure 3.1: Only one of the three packets can be transmitted when all the nodes are

on the same channel.

tion continues to be a significant constraint in mobile networking scenarios. By

requiring only a single standards-compliant NIC per node, SSCH faces fewer de-

ployability hurdles than schemes with additional hardware requirements.

• SSCH should use an unmodified IEEE 802.11 protocol (including RTS/CTS)

when not switching channels. Requiring standards-compliant hardware allows

for easier deployment of this technology.

• SSCH should not cause logical partitions; any two nodes in communication range

should be able to communicate with each other despite channel hopping. Because

SSCH switches each NIC across frequency channels, different NICs may be on

different channels most of the time. Despite this, any two nodes in communication

range will overlap on a channel with moderate frequency (e.g., at least 10 ms out

of every half second) and discovery is accomplished during this time. As we



will show in Section 3.5.3, the mathematical properties of the SSCH protocol

guarantee that this overlap always occurs, even in the absence of synchronization.

SSCH exploits frequency diversity using an approach that we term optimistic syn-

chronization. This design makes the common case be that nodes are aware of each

other’s channel hopping schedules. However, SSCH also allows any node to change its

channel hopping schedule at any time. If node A has traffic to send to another node B,

and A knows B’s hopping schedule, A will probably be able to quickly send to B by

changing its own schedule. In the uncommon case that A does not know B’s sched-

ule, or A has out-of-date information about B, then the traffic incurs a latency penalty

while A discovers B’s new schedule. The SSCH design achieves this good common case

behavior when SSCH is used with a workload where traffic patterns change (i.e., new

flows are started) with lower frequency than hopping schedule updates are propagated.

Because hopping schedule update propagation requires only tens of milliseconds, this is

a good workload assumption for many wireless networking scenarios. Section 3.6 gives

absolute numbers for these qualitative claims.

SSCH is designed to work in a single-hop or multi-hop environment, and therefore

must support multi-hop flows. We introduce a partial synchronization technique to allow

one node, say B, to follow a channel hopping schedule that overlaps half the time with

another node A, and half the time with a third node C; this is necessary for node B

to efficiently forward traffic from node A to node C. Although it is trivially possible

for node B to have a channel hopping schedule that is an interleaving of A and C’s

schedules, this leaves open how B will schedule itself when a fourth node desires to

synchronize with B. The channel hopping design described in Section 3.5.2 resolves

this issue.

3.3 Hardware and MAC Assumptions

We assume that all nodes are using IEEE 802.11a – SSCH could also be used with other

MACs in the IEEE 802.11 family, but evaluation of such options is beyond the scope

of this dissertation. IEEE 802.11a supports 13 orthogonal channels, and we assume no

co-channel interference, a reasonable assumption for physically separated nodes [4]. We

expect wireless cards to be capable of switching across channels. The clocks at all nodes

are assumed to be synchronized to within 1 ms of each other using the Timer Synchro-

nization Function of IEEE 802.11 [58] or its modifications proposed in the literature,

such as ATSP [54, 74] or ASCP [110]. We justified this assumption in Section 2.5.1. As

we discuss in more detail at the beginning of Section 3.6, recent work has reduced the channel

switching delay to approximately 80 µs [51, 84]. We assume that each wireless card

contains only a single half-duplex single-channel transceiver.

We require that NICs with a buffered packet wait after switching for the maximum

length of a packet transmission before attempting to reserve the medium. This prevents

hidden terminal problems from occurring just after switching. This hardware require-

ment is not necessary if the NIC packet buffer can be cleared whenever the channel is

switched.

3.4 Prior Work

We divide prior work relevant to SSCH into two categories: prior uses of pseudo-random

number generators in wireless networking, and alternative approaches to exploiting fre-

quency diversity. In the first category, we find that pseudo-random number generators

have been used for a variety of tasks in wireless networking. For example, the SEEDEX

protocol [108] uses pseudo-random generators to avoid RTS/CTS exchanges in a wire-



less network. Nodes build a schedule for sending and listening on a network, and pub-

lish their seeds to all the neighbors. A node attempts a transmission only when all its

neighbors (including the receiver) are in a listening state. Assuming relatively constant

wireless transmission ranges, this protocol also helps in overcoming the hidden and ex-

posed terminal problem caused by the RTS/CTS approach. The TSMA protocol [31,32]

is a channel access scheme proposed as an alternative to ALOHA and TDMA, for time-

slotted multihop wireless networks. TSMA aims to achieve the guarantees of TDMA

without incurring the overhead of transmitting large schedules in a mobile environment.

Each node is bootstrapped with a fixed seed that determines its transmission sched-

ule. The schedules are constructed using polynomials over Galois fields (which have

pseudo-random properties), and the construction guarantees that each node will overlap

with only a single other node within a certain time frame. The length of the schedule

depends on the number of nodes and the degree of the network.

Porting these schedules to a multichannel scenario, where the number of channels

is fixed, remains an open problem, and even such a porting would not meet the SSCH

goal of supporting traffic-driven overlap. Redi et al. [33] use a pseudo-random gener-

ator to derive listening schedules for battery-constrained devices. Each device’s seed

is known to a base station, which can then schedule transmissions for the infrequent

moments when the battery-constrained device is awake. Although pseudo-random gen-

erators have been used for a number of tasks (as this survey of the literature makes

clear), to the best of our knowledge, SSCH is the first protocol to use a pseudo-random

generator to construct a channel hopping schedule.

A second category of prior work focuses on increasing network capacity by ex-

ploiting frequency diversity. This is a significant body of research. The first division we

make in this body of work is between approaches that assume a single NIC capable of

communicating on a single channel at any given instance in time, and those that assume

more powerful radio technology, such as multiple NICs [4, 112] or NICs capable of lis-

tening on many channels simultaneously [66,89], even if they can only communicate on

one. Our work falls into the former category; the SSCH architecture can be deployed

over a single standards-compliant NIC supporting fast channel switching.

Dynamic Channel Assignment (DCA) [135] and Multi-radio Unification Protocol

(MUP) [4] are both technologies that use multiple radios (in both cases, two radios) to

take advantage of multiple orthogonal channels. DCA uses one radio on a control chan-

nel, and the other radio switches across all the other channels sending data. Arbitration

for channels is embedded in the RTS and CTS messages, and is executed on the control

channel. Although this scheme may fully utilize the data channel, it does so at the cost

of using an entire radio just for control. MUP uses both radios for data and control trans-

missions. Radios are assigned to orthogonal channels, and a packet is sent on the radio

with better channel characteristics. This scheme gives good performance in many sce-

narios. However, it still only allows the use of as many channels as there are radios on

each physical node. From our perspective, the key drawback to both DCA and MUP is

simply that they require the use of multiple radios. Recently, commercial products have

appeared that support multiple radios on a single NIC [44]. It is not known whether

these products will achieve as many radios on a NIC as there are available channels, nor

what their power consumption will be.

A straightforward way to view the different potential gains of SSCH compared to a

true multiple radio design is to consider two distinct sources of bottleneck in a single-

radio, single-channel system: the saturation of the channel, and the saturation of any

particular radio. Conceptually, SSCH significantly increases the channel bandwidth,

without increasing the bandwidth of any individual radio. In contrast, a true multiple

radio design increases both. A specific example of this difference is that a node using

MUP (a true multiple radio design) can simultaneously send and receive packets on

separate channels, while a node using SSCH can only perform one of these operations

at a time.

We next turn our attention to work assuming more powerful radio technology than

is currently technologically feasible. HRMA [137] is designed for frequency hopping

spread spectrum (FHSS) wireless cards. Time is divided into slots, each corresponding

to a small fraction of the time required to send a packet, and the wireless NIC is on a

different frequency during each slot. All nodes are required to maintain synchronized

clocks, where the synchronization is at the granularity of slot times that are much shorter

than the duration of a packet. Each slot is subdivided into four segments of time for

four different possible communications: HOP-RESERVED/RTS/CTS/DATA. The first

three segments of time are assumed to be small in comparison with the amount of time

spent sending a segment of the packet during the DATA time interval. To the best of our

knowledge, a FHSS wireless card that supports this type of MAC protocol at high data

rates is not commercially available.

Another line of related work assumes technology by which nodes can concurrently

listen on all channels. For example, Nasipuri et al [89] and Jain et al [66] assume

wireless NICs that can receive packets on all channels simultaneously, and where the

channel for transmission can be chosen arbitrarily. In these schemes, nodes maintain a

list of free channels, and either the sending or receiving node chooses a channel with the

least interference for its data transfer. Wireless NICs do not currently support listening

on arbitrarily many channels, and we do not assume the availability of such technology

in the design of SSCH.

We finally consider prior work that only assumes the presence of a single NIC with a

single half-duplex transceiver. The only other approach that we are aware of to exploit-

ing frequency diversity under this assumption is Multichannel MAC (MMAC) [114].

Like SSCH, MMAC attempts to improve capacity by arranging for nodes to simultane-

ously communicate on orthogonal channels. Briefly, MMAC operates as follows: nodes

using MMAC periodically switch to a common control channel, negotiate their chan-

nel selections, and then switch to the negotiated channel, where they contend for the

channel as in IEEE 802.11. This scheme raises several concerns that SSCH attempts

to overcome. First, MMAC has stringent clock synchronization requirements, and to

the extent that these are relaxed, MMAC must spend more time on the common control

channel doing discovery. Tight clock synchronization is particularly hard to provide in

multi-hop wireless networks [54]. In contrast, SSCH does not require tight clock syn-

chronization because SSCH does not have a common control channel or a dedicated

neighbor discovery interval. Secondly, synchronization traffic in MMAC can be a sig-

nificant fraction of the system traffic, and the common synchronization channel can

become a bottleneck on system throughput. SSCH addresses this concern by distribut-

ing synchronization and control traffic across all the available channels. A third concern

with MMAC is that it assumes wireless NICs are capable of switching across channels

in less than a microsecond. As we will see in the beginning of Section 3.6, an 80 µs

switching time better reflects the current state of the art in wireless NIC design, and

SSCH performs well with this assumption. A fourth concern with MMAC is that it may

not efficiently support multi-hop flows because forwarding nodes may not predictably

split their time between their sending and receiving neighbors. SSCH addresses this by

allowing nodes to achieve predictable partial synchronization with multiple neighbors.

Although this survey does not cover all related work, it does characterize the current

state of the field. At the level of detail in this section, prior work such as CHMA [124]

is similar to HRMA [137], and MAC-SCC [80] and the MAC protocols implicit in the

work of Li et al [79] and Fitzek et al [46] are similar to DCA [135]. However, a final

related channel hopping technology that is worth mentioning is the definition of FHSS

channels in the IEEE 802.11 [60] specification. At first glance, it may seem redun-

dant that SSCH does channel hopping across logical channels, each one of which (per

the IEEE 802.11 specification) may be employing frequency hopping across distinct

frequencies at the physical layer. The IEEE 802.11 specification justifies this physi-

cal layer frequency hopping with the scenario of providing support for multiple Basic

Service Sets (BSS’s) that can coincide geographically without coinciding on the same

logical channel. In contrast, SSCH does channel hopping so that any two nodes can

coincide as much or as little of the time as they desire. This is also at the heart of the

difference between SSCH and past work on channel-hopping protocols where nodes

overlap a fixed fraction of the time [32] – the degree of overlap between any two nodes

using SSCH is traffic-dependent.

3.5 SSCH

SSCH switches each radio across multiple channels and distributes flows within inter-

fering range of each other on orthogonal channels. This results in significantly increased

network capacity when the network traffic pattern consists of such flows.

SSCH is a distributed protocol, suitable for deployment in a multi-hop wireless net-

work. It does not require synchronization or leader election. Nodes do attempt to syn-

chronize, but lack of synchronization results in at most a mild reduction in throughput.

SSCH is designed to work with MultiNet, where a slot is defined to be the time

spent on a single channel. We choose a slot duration of 10 ms to amortize the overhead

of channel switching. At 54 Mbps (the maximum data rate in IEEE 802.11a), 10 ms is



equivalent to 35 maximum-length packet transmissions. A longer slot duration would

have further decreased the overhead of channel switching, but would have increased the

delay that packets encounter during some forwarding operations. The channel schedule

is the list of channels that the node plans to switch to in subsequent slots and the time at

which it plans to make each switch. Each node maintains a list of the channel schedules

for all other nodes it is aware of – this information is allowed to be out-of-date, but

the common case will be that it is accurate. The good performance exhibited by SSCH

(Section 3.6) validates this claim.

We develop the SSCH protocol by first describing packet transmission attempts

that are made by each node within a slot, and we refer to this as the packet schedule

(Section 3.5.1). Next, we define the policy for updating the channel schedule and for

propagating the channel schedule to other nodes (Section 3.5.2). We then describe the

mathematical properties that guided SSCH’s design (Section 3.5.3). Finally, we discuss

implementation considerations for SSCH (Section 3.6.4).

3.5.1 Packet Scheduling

SSCH maintains packets in per-neighbor FIFO queues. These queues maintain standard

higher-layer assumptions about in-order delivery. The per-neighbor FIFO queues are, in

turn, maintained in a priority queue ordered by perceived neighbor reachability.

The SSCH scheduling strategy aims to maximize bandwidth utilization by minimiz-

ing the number of packets sent to nodes that are unreachable. It works as follows. At the

beginning of a slot, packet transmissions are attempted in a round-robin manner among

all flows. If a packet transmission to a particular neighbor fails, the corresponding flow

is reduced in priority until a period of time equal to one half of a slot duration has

elapsed – this limits the bandwidth wasted on flows targeted at nodes that are currently

on a different channel to at most two packets per slot whenever a flow to a reachable

node also exists. Packets are only drawn from the flows that have not been reduced in

priority unless only reduced priority flows are available.
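As a rough illustration of this scheduling policy, the following minimal Python sketch models it (the class and method names are ours, not the actual SSCH code; only the half-slot deprioritization window and the 10 ms slot come from the text):

    import collections
    import time

    SLOT_DURATION = 0.010                    # 10 ms slot, as in SSCH
    DEPRIORITIZE_FOR = SLOT_DURATION / 2     # a failed flow is skipped for half a slot

    class PacketScheduler:
        """Per-neighbor FIFO queues drawn in round-robin order; a flow whose
        last transmission failed is temporarily reduced in priority."""

        def __init__(self):
            self.queues = collections.OrderedDict()   # neighbor -> FIFO of packets
            self.penalty_until = {}                   # neighbor -> time until which it is deprioritized

        def enqueue(self, neighbor, packet):
            self.queues.setdefault(neighbor, collections.deque()).append(packet)

        def record_failure(self, neighbor):
            # An RTS to this neighbor drew no CTS: deprioritize its flow briefly.
            self.penalty_until[neighbor] = time.monotonic() + DEPRIORITIZE_FOR

        def next_packet(self):
            now = time.monotonic()
            ready = [n for n, q in self.queues.items()
                     if q and self.penalty_until.get(n, 0.0) <= now]
            # Fall back to deprioritized flows only if no other flow has packets.
            candidates = ready or [n for n, q in self.queues.items() if q]
            if not candidates:
                return None
            neighbor = candidates[0]
            self.queues.move_to_end(neighbor)         # round-robin rotation
            return neighbor, self.queues[neighbor].popleft()

In this model, a flow to an unreachable neighbor can waste only a couple of transmission attempts per slot whenever a reachable neighbor also has queued packets, mirroring the two-packets-per-slot bound stated above.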

Because nodes using SSCH will often be on different channels, broadcast packets

transmitted in any one slot are likely to reach only some of the nodes within physical

communication range. The SSCH layer handles this issue through repeated link-layer

retransmission of broadcast packets enqueued by higher layers. Although broadcast

packets sent this way may reach a different set of nodes than if all nodes had been on

the same channel, we have not found this to present a difficulty to protocols employing

broadcast packets — in Section 3.6 we show that as few as 6 transmissions allow DSR

(a protocol that relies heavily on broadcasts) to function well. This behavior is not sur-

prising because broadcast packets are known to be less reliable than unicast packets, and

so protocols employing them are already robust to their occasional loss. However, the

SSCH retransmission strategy may not be compatible with all uses of broadcast, such

as its use for synchronization [43]. Also, deploying SSCH in an environment with a

different number of channels might require the choice of 6 transmissions to be revis-

ited. Finally, although retransmission increases the bandwidth consumed by broadcast

packets, SSCH still delivers significant capacity improvement in the traffic scenarios we

studied (Section 3.6).

An SSCH node with a packet to send may discover that a neighbor is not present on

a given channel when no CTS is received in response to a transmitted RTS. However,

the node may very well be present on another channel, in which case SSCH should still

deliver the packet. To handle this, we initially retain the packet in the packet queue.

Packets are dropped only when SSCH gives up on all packets to a given destination, and

this dropping of an entire flow occurs only when we have failed to transmit a packet to

the destination node for an entire cycle through the channel schedule. We will explain

the meaning of a cycle through the channel schedule in Section 3.5.2, but with our cho-

sen parameter settings the timeout is 530 ms. After a flow has been garbage collected,

new packets with the same destination inserted in the queue are assigned to a new flow,

and attempted in the normal manner.
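In effect, garbage collection is a per-destination timestamp check; a minimal sketch, assuming the node records the time of its last successful transmission to each destination (the 530 ms cycle length is the one quoted above):

    CYCLE_MS = 530   # one full cycle through the channel schedule

    def should_drop_flow(last_success_ms, now_ms):
        """Drop all queued packets for a destination once an entire cycle has
        passed without a single successful transmission to it."""
        return now_ms - last_success_ms >= CYCLE_MS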

This packet scheduling policy is simple to implement, and yields good performance

in the common case where node schedules are known, and information about node avail-

ability is accurate. A potential drawback is that a node crash (or other failure event)

can lead to a number of wasted RTSs to the failed node. When summed across channels,

the number may exceed the IEEE 802.11 suggested value of 7 retransmission attempts

for RTS packets. In Section 3.6, we quantify the cost of such failures and show that it is

small.

3.5.2 Channel Scheduling

We begin our description of channel scheduling by describing the data structure used to

represent the channel schedule. We then describe the policy nodes use to act on their own

channel schedule, the mechanism to communicate channel schedules to other nodes, and

finally the policy nodes implement for updating or changing their own channel schedule.

The channel schedule must capture a given node’s plans for channel hopping in the

future, and there is obvious overhead to representing this as a very long list. Instead, we

compactly represent the channel schedule as a current channel and a rule for updating

the channel – in particular, as a set of 4 (channel, seed) pairs. Our experimental results

show that 4 pairs suffice to give good performance (Section 3.6). We represent the

(channel, seed) pair as (xi, ai). The channel xi is represented as an integer in the range

[0, 12] (13 possibilities), and the seed ai is represented as an integer in the range [1, 12].

Each node iterates through all of the channels in the current schedule, switching to the

channel designated in the schedule in each new slot. The node then increments each of

the channels in its schedule using the seed,

xi ← (xi + ai) mod 13

and repeats the process.

We introduce one additional slot to prevent logical partitions. After the node has

iterated through every channel on each of its 4 slots, it switches to a parity slot whose

channel assignment is given by x_parity = a1. The term parity slot is derived from the

analogy to the parity bits appended at the end of a string in some error correcting codes.

The mathematical justification for this design is given in Section 3.5.3. We use the term

cycle to refer to the 530 ms iteration through all the slots, including the parity slot.
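For concreteness, the sketch below generates the channel visited in each slot of one cycle from four (channel, seed) pairs, using the update rule and parity slot just described; it is an illustrative model, not the implementation. With 4 pairs, 13 channels, and 10 ms slots, one cycle is (4 × 13 + 1) × 10 ms = 530 ms, the figure used above.

    NUM_CHANNELS = 13   # orthogonal IEEE 802.11a channels assumed by SSCH
    NUM_SLOTS = 4       # (channel, seed) pairs per node
    SLOT_MS = 10

    def one_cycle(schedule):
        """Return the channels visited during one cycle.

        `schedule` is a list of NUM_SLOTS (channel, seed) pairs and is updated
        in place, just as a node updates its own schedule as slots reappear."""
        visited = []
        for _ in range(NUM_CHANNELS):                # every channel appears once per slot...
            for i, (x, a) in enumerate(schedule):    # ...across the 4 slots
                visited.append(x)
                schedule[i] = ((x + a) % NUM_CHANNELS, a)   # xi <- (xi + ai) mod 13
        visited.append(schedule[0][1])               # parity slot: channel = seed of the first slot
        return visited

    cycle = one_cycle([(1, 2), (5, 3), (0, 7), (9, 11)])
    assert len(cycle) == NUM_CHANNELS * NUM_SLOTS + 1    # 53 slots
    print(len(cycle) * SLOT_MS, "ms per cycle")          # 530 ms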

In Figure 3.2, we illustrate possible channel schedules for two nodes in the case of 2

slots and 3 channels. In the Figure, node A and node B are synchronized in one of their

two slots (they have identical (channel, seed) pairs), and they also overlap during the

parity slot. The field of the channel schedule that determines the channel during each

slot is shown in bold. Each time a slot reappears, the channel is updated using the seed.

For example, node A’s slot 1 initially has (channel, seed) = (1,2). The next time slot 1

is entered, the channel is updated by adding the seed to it mod 3 (mod 3 because in this

example, there are 3 channels). The resulting channel is given by (1 + 2) mod 3 = 0.

Nodes switch from one slot to the next according to a fixed schedule (every 10 ms

in our current parameter settings). However, the decision to switch channels may occur

while a node is transmitting or receiving a packet. In this case we delay the switch until

after the transmission and ACK (or lack thereof) have occurred.

Nodes learn each other’s schedules by periodically broadcasting their seeds and the

offset within this cycle through the channel schedule. We use the IEEE 802.11 Long
[Figure omitted: the (channel, seed) pairs of nodes A and B in slots 1, 2, and the parity slot over one cycle; in each slot a node tunes to the channel of the pair shown in bold.]
Figure 3.2: Channel hopping schedules for two nodes with 3 channels and 2 slots.

Node A always overlaps with Node B in slot 1 and the parity slot. The field of the

channel schedule that determines the channel during each slot is shown in bold.

Control Frame Header format to embed both the schedule and the node’s current offset –

this is discussed in more detail in Section 3.6.4. The SSCH layer at each node schedules

one of these packets for broadcast once per slot.

Nodes also update their knowledge of other nodes’ schedules by trying to communi-

cate and failing. Whenever a node sends an RTS to another node, and that node fails to

respond even though it was believed to be in this slot, the node sending the RTS updates

the channel schedule for the other node to reflect that it does not currently know the

node’s schedule in this slot.
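This bookkeeping can be sketched in a couple of lines (the data structure name is ours):

    def on_rts_failure(neighbor_schedules, neighbor, slot):
        """An RTS sent in a slot where we believed `neighbor` to be present drew
        no CTS: forget its (channel, seed) pair for that slot."""
        neighbor_schedules[neighbor][slot] = None   # unknown until its next broadcast announcement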

We now turn to the question of how a given node changes its own schedule. Sched-

ules are updated in two ways: each node attempts to maintain that its slots start and stop

at roughly the same time as other nodes, and that its channel schedule overlaps with

nodes for which it has packets to send. We embed the information needed for this syn-

chronization within the Long Control Frame Header as well. Using this information, a

simple averaging scheme such as described by Elson et al [43] can be applied to achieve

the loose synchronization required for good performance (Section 3.6 shows that a 100

µs skew in clock times leads to less than a 2% decrease in capacity).

At a high level, each node achieves overlap with nodes for which it has traffic

straightforwardly, by changing part of its own schedule to match that of the other nodes.

However, a number of minor decisions must be made correctly in order to achieve this

high level goal.

[Figure omitted: the slot assignments of nodes A, B, and C over time; node A changes from (A1, A2) to (A1, B2) at time t1, and node B changes from (B1, B2) to (B1, C2) at time t2.]
Figure 3.3: The problem with a naive synchronization scheme. Node A has two

slots, with (channel, seed) pairs represented by A1 and A2; nodes B and C are sim-

ilarly depicted. At time t1, node A synchronizes with node B. Node B synchronizes

with node C at time t2, after which A and B are no longer synchronized.

Nodes recompute their channel schedule right before they enqueue the packet an-

nouncing this schedule in the NIC (and so at least once per slot). In a naive approach,

this node could examine its packet queue, and select the (channel, seed) pairs which

lead to the best opportunity to send the largest number of packets. However, this ignores

the interest this node has in receiving packets, and in avoiding congested channels. An

example of the kind of problem that might arise if one ignores the interest in receiving

packets is given in Figure 3.3. Here, A synchronized with B, and then B synchronized

with C in such a way that A was no longer synchronized with B. This could have been

avoided if B had used its other slot to synchronize with C, as it would have if it consid-

ered its interest in receiving packets.

To account for this node’s interest in receiving packets, we maintain per-slot counters

for the number of packets received during the previous time the slot was active (ignoring

broadcast packets). Any slot that received more than 10 packets during the previous

iteration through that slot is labeled a receiving slot; if all slots are receiving slots, any

one is allowed to be changed. If some slots are receiving slots and some are not, only

the (channel, seed) pair on a non-receiving slot is allowed to be changed for the purpose

of synchronizing with nodes it wants to send to.
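A minimal sketch of this receiving-slot test follows; the per-slot receive counters are assumed to be maintained elsewhere, and only the threshold of 10 packets comes from the text.

    RECEIVE_THRESHOLD = 10   # packets received in the slot's previous occurrence

    def changeable_slots(packets_received):
        """Return the indices of slots whose (channel, seed) pair may be
        re-assigned in order to synchronize with a desired receiver."""
        receiving = {i for i, n in enumerate(packets_received) if n > RECEIVE_THRESHOLD}
        if len(receiving) == len(packets_received):
            # Every slot is a receiving slot: any one of them may be changed.
            return list(range(len(packets_received)))
        # Otherwise, only non-receiving slots are candidates.
        return [i for i in range(len(packets_received)) if i not in receiving]

    # Example: slots 0 and 2 carried heavy incoming traffic last time around,
    # so only slots 1 and 3 may be re-assigned for sending.
    print(changeable_slots([40, 2, 17, 0]))   # -> [1, 3]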

SSCH has to avoid the scenario where all nodes in a network converge on the same

(channel, seed) pair value. This situation could arise in a number of scenarios. For

example, if a node, say A, initiates a flow to another node, say B, and then node C

initiates a flow to node A, then A, B and C will synchronize to the same (channel,

seed) value. Moreover, if these were the only nodes in the network, they would never

change their (channel, seed) value. This situation is a problem for SSCH since all nodes

will hop to the same channel in every slot, and therefore all flows will be on the same

channel. Hence, the benefits of channelization are lost, and SSCH becomes equivalent

to a single-channel MAC.

To account for this channel congestion, we propose a new de-synchronization scheme.

A node compares the (channel, seed) pairs of all nodes from which it received packets

in a given slot, with the list of (channel, seed) pairs of all the other nodes in its list of

channel schedules. If the number of nodes synchronized to the same (channel, seed)

pair is more than twice the number that this node communicated with in the previous

occurrence of the slot, we attempt to de-synchronize it from these other nodes. De-

synchronization just involves choosing a new (channel, seed) pair for this slot.
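The check itself can be sketched as follows; `known_schedules` and `talked_to` are hypothetical stand-ins for the node's list of channel schedules and its record of which neighbors it exchanged packets with during the slot's previous occurrence.

    import random

    NUM_CHANNELS = 13

    def maybe_desynchronize(my_pair, known_schedules, talked_to, slot):
        """Pick a fresh (channel, seed) pair for `slot` if too many neighbors
        share our current pair; otherwise keep it."""
        synced = [n for n, sched in known_schedules.items() if sched[slot] == my_pair]
        # De-synchronize when the synchronized set is more than twice the set we
        # actually communicated with (the max() guard against zero is our own assumption).
        if len(synced) > 2 * max(len(talked_to), 1):
            return (random.randrange(NUM_CHANNELS), random.randrange(1, NUM_CHANNELS))
        return my_pair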
[Plot omitted: number of synchronized neighbors vs. time (in slots), with and without de-synchronization.]

Figure 3.4: Need for De-synchronization: All nodes converge to the same channel

without de-synchronization.

The need for de-synchronization is illustrated in Figure 3.4. Our protocol is simu-

lated for 10 stationary nodes, and one of them is randomly picked as a test node. All

nodes are within communication range of each other, the slot duration is 10 ms, and

each node has 4 (channel, seed) pairs. We consider IEEE 802.11a [59], which has 13

orthogonal channels. Initially, every node starts a flow to a randomly chosen destination

node for a random duration between 1 and 500 ms. At the end of a flow, a node starts

a different flow with a randomly picked destination and duration. Figure 3.4 plots the

number of neighbors of the test node that have the same (channel, seed) pair in a slot as

the test node. Without de-synchronization, the number of nodes with the same (channel,

seed) pair increases monotonically over time for each of the 4 (channel, seed) values.

After around 370 slots, which is 370*10 ms = 3.7 seconds, all 9 neighbors of the test

node converge to the same (channel, seed) pair on all slots. Consequently, all nodes

always switch to the same channel all the time, and SSCH becomes equivalent to single

channel IEEE 802.11. This scenario is avoided by our de-synchronization mechanism.

In our experimental scenario, de-synchronization never allows more than 4 neighbors to

have the same (channel, seed) pair as the test node.

The final constraints we add moderate the pace of change in schedule information.

Each node only considers updating the (channel, seed) pair for the next slot, never for

slots further in the future. If the previous set of criteria suggest updating some slot other

than the next slot, we delay that decision. Given these constraints, picking the best pos-

sible (channel, seed) pair simply requires considering the choice that synchronizes with

the set of nodes for which we have the largest number of queued packets. Additionally,

the (channel, seed) pair for the first slot is only allowed to be updated during the par-

ity slot – this helps to prevent logical partition, as will be explained in more detail in

Section 3.5.3.

This strategy naturally supports nodes acting as sources, sinks, or forwarders. A

source node will find that it can assign all of its slots to support sends. A sink node will

find that it rarely changes its slot assignment, and hence nodes sending to it can easily

stay synchronized. A forwarding node will find that some of its slots are used primarily

for receiving; after re-assigning the channel and seed in a slot to support sending, the

slots that did not change are more likely to receive packets, and hence to stabilize on

their current channel and seed as receiving slots for the duration of the current traffic

patterns. Our simulation results (Section 3.6) support this conclusion. We refer to the

technique of enabling this synchronization pattern as partial synchronization.



3.5.3 Mathematical Properties of SSCH

Our discussion of the mathematical properties of SSCH will initially focus on the static

case. The behavior of SSCH when channel schedules are not changing assures us that in

a steady-state flow setting, nodes will rendezvous appropriately, in a sense that we make

precise below. We will then expand our discussion to include the dynamics of channel

scheduling in an environment where flows are starting and stopping. In our discussion,

we assume that all nodes use IEEE 802.11 to synchronize their clocks within 1 ms of

each other, and there are no Byzantine failures in the network. A node never sends false

information about its schedule.

The channel scheduling mechanism has three simultaneous design goals: allowing

nodes to be synchronized in a slot, infrequent overlap between nodes that do not have

data to send to each other, and ensuring that all nodes come into contact occasionally (to

avoid a logical partition). To achieve these goals, we rely on a very simple mathematical

technique, addition modulo a prime number [12].

Consider two nodes that want to be synchronized in a given slot. If they have iden-

tical (channel, seed) pairs for this slot, then clearly they will remain synchronized in

future iterations (using the static assumption). Now consider two nodes that are not syn-

chronized because they have different seeds. A simple calculation shows that these two

nodes will overlap exactly one out of every 13 iterations in this slot (recall that 13 is

the number of channels). This is the behavior we want from these nodes: they overlap

regularly enough that they can exchange their channel schedules, but they are mostly on

different channels, and so do not interfere with each other’s transmissions.
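This once-per-13 overlap follows because the difference between the two channel indices advances by (a − b) mod 13 each iteration; since 13 is prime and the seeds differ, that difference takes every value, including zero, exactly once over 13 iterations. A quick illustrative check:

    NUM_CHANNELS = 13   # prime

    def overlaps_per_13(x, a, y, b):
        """Count how often slots with pairs (x, a) and (y, b) land on the same
        channel over 13 iterations of the update rule."""
        count = 0
        for _ in range(NUM_CHANNELS):
            if x == y:
                count += 1
            x = (x + a) % NUM_CHANNELS
            y = (y + b) % NUM_CHANNELS
        return count

    assert overlaps_per_13(3, 5, 7, 2) == 1              # different seeds: exactly one overlap
    assert overlaps_per_13(3, 5, 3, 5) == NUM_CHANNELS   # identical pairs: always together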

Now consider the rare case that two nodes share identical seeds in every slot, but

different channels accompany each seed – this has at most a 1 in 13^4 ≈ 28,000 chance

of occurring for randomly chosen (channel, seed) pairs. In this case, the nodes will

march in lock-step through the same set of channels in each slot, never overlapping.

This would be problematic, and it is this situation that the parity slot prevents. To justify

this claim, we consider two distinct situations. If both nodes enter their parity slot at

the same time, then they overlap there because the parity channel is equal to the seed

for the first slot for both nodes. With our chosen parameter settings of 10 ms per slot,

4 slots, and 13 channels, this overlap occurs once every 530 ms and lasts for 10 ms. If

their parity slots do not occur at the same time, then the first node’s parity slot offers a

fixed target for the slot in which the second node is changing channels, and again, the

two nodes will overlap. This overlap occurs once every 7 seconds. Although both these

cases will be rare, the SSCH time synchronization mechanism allows us to ignore the

second case entirely – a relative clock skew of 5 ms or less is sufficient to guarantee that

two parity slots overlap in time.

Now considering the dynamic case (and assuming clock synchronization to within

5 ms), we note that nodes are not permitted to change the seed for the first of their four

slots except during a parity slot. Therefore they will always overlap in either the first slot

or the parity slot, and hence will always be able to exchange channel schedules within a

moderate time interval.

The use of addition modulo a prime to construct channel hopping schedules does

not restrict SSCH to scenarios where the number of channels is a prime number. If one

desired to use SSCH with a wireless technology where the number of channels is not a

prime, one could straightforwardly use a larger prime as the range of xi , and then map

down to the actual number of channels using a modulus reduction. Though the mapping

would have some bias to certain channels, the bias could be made arbitrarily small by

choosing a sufficiently large prime.
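For example, with a hypothetical radio offering 8 usable channels, the hopping arithmetic could remain modulo 13 and be folded down afterwards; in this sketch, channels 0 through 4 are visited slightly more often than channels 5 through 7.

    LARGER_PRIME = 13    # the hopping arithmetic stays modulo a prime
    REAL_CHANNELS = 8    # hypothetical non-prime channel count

    def physical_channel(x):
        """Map a logical channel in [0, LARGER_PRIME) onto a real channel."""
        return x % REAL_CHANNELS   # residues 0-4 occur twice in [0, 13), 5-7 once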

A final point about the use of addition modulo a prime is that SSCH can be modified

to require fewer bits to represent a node’s schedule by reducing the number of choices

for a seed. The only penalty to this reduction is increasing the protocol’s reliance on the

parity slot for avoiding logical partitions.

3.6 Performance Evaluation

This section presents the simulation results of SSCH in QualNet and compares its perfor-

mance with the commonly used single-channel IEEE 802.11a protocol. Subsection 3.6.1

presents microbenchmarks quantifying the different SSCH overheads. Subsection 3.6.2

presents macrobenchmarks on the performance of SSCH with a large number of nodes

in a single hop environment. Subsection 3.6.3 extends the macrobenchmark evaluation

to encompass mobility and multihop routing. Our results show that SSCH incurs very

low overhead, and significantly outperforms IEEE 802.11a in a multiple flow environ-

ment.

The simulation environment comprises a varying number of nodes in a 200m ×

200m area. All nodes in a single simulation run use the same MAC, either SSCH or

IEEE 802.11a. All nodes are set to operate at the same raw data rate, 54 Mbps. We

assume 13 usable channels in the 5 GHz band. SSCH is configured to use 4 seeds,

and each slot duration is 10 ms. All seeds are randomly chosen at the beginning of

each simulation run. The macrobenchmarks in subsections 3.6.2 and 3.6.3 are averages

from 5 independent simulation runs, while the microbenchmarks in subsection 3.6.1 are

drawn from a single simulation run.

We primarily measure throughput under a traffic load of maximum rate UDP flows.

In particular, we use Constant Bit Rate (CBR) flows of 512 byte packets sent every 50

µs. This data rate is more than the sustainable throughput of IEEE 802.11a operating at

54 Mbps.

For all our simulations, we modified QualNet to use a channel switch delay of 80

µs. This choice was informed by recent work in solid state electronics on reducing

the settling time of the Voltage Control Oscillator (VCO) [85]. Switching the channel

of a wireless card requires changing the input voltage of the VCO, which operates in

a Phase Locked Loop (PLL) to achieve the desired output frequency. The delay in

channel switching is due to this settling time. The specification of Maxim IEEE 802.11b

Transceivers [84] shows this delay to be 150 µs. More recent work [51] shows that this

delay can be reduced to 40-80 µs for IEEE 802.11a cards.

3.6.1 Microbenchmarks

We present microbenchmarks measuring the overhead of SSCH in several different sce-

narios. In Section 3.6.1, we measure the overhead during the successful initiation of

a CBR stream. In Section 3.6.1, we measure the overhead on an existing session of

failing to initiate a parallel CBR stream. In Section 3.6.1, we measure the overhead

of supporting two streams simultaneously. In Section 3.6.1, we measure the overhead

of continuing to attempt transmissions to a mobile node that has moved out of range.

These scenarios cover many of the different dynamic events that a MAC must appro-

priately handle: a flow starting while a node is present, a flow starting while a node is

absent, simultaneous flows where both nodes are present, simultaneous flows where one

node moves out of range, etc. Finally, the last scenario (Section 3.6.1) measures the

overhead of SSCH with respect to a different kind of event, clock skew.

Overhead of Switching and Synchronizing

In this experiment, we measured the overhead of successfully initiating a CBR stream

between two nodes within communication range of each other. The first node initiates

the stream just after the parity slot. This incurs a worst-case delay in synchronization,

because the first of the four slots will not be synchronized until 530 ms later.

In Figure 3.5, we graph the instantaneous throughput at the receiver node. The

sender quickly synchronizes with the receiver on three of the four slots, as it should, and

on the fourth slot after 530 ms. The figure shows the throughput while synchronizing

(oscillating around 3/4 of the raw bandwidth), and the time required to synchronize.

After synchronizing, the channel switching and other protocol overheads of SSCH lead

to only a 400 Kbps penalty in the steady-state throughput relative to IEEE 802.11a.

This penalty conforms to our intuition about the overheads in SSCH: a node spends 80

µs every 10 ms switching channels (80 µs/10 ms = .008), and then must wait for the

duration of a single packet to avoid colliding with pre-existing packet transmissions in

the new channel (1 packet/35 packets = .028). Adding these two overheads together

leads to an expected cumulative overhead of 3.6%, which is in close agreement with the

measured overhead of (400 Kbps/12 Mbps) = 3.3%.
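The arithmetic behind this estimate can be reproduced in two lines (all figures are the ones quoted above):

    switch_overhead = 80e-6 / 10e-3          # 80 µs of switching per 10 ms slot  = 0.008
    wait_overhead = 1 / 35                   # one packet-time of waiting per 35-packet slot ≈ 0.029
    print(switch_overhead + wait_overhead)   # ≈ 0.037, i.e. roughly 3.6% of the slot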

Note that the throughput of the session reaches a maximum of only 13 Mbps, al-

though the raw data rate is 54 Mbps. This low utilization can be explained by the IEEE

802.11a requirement that the RTS/CTS packets be sent at the lowest supported data rate,

6 Mbps, along with other overheads [52].

Overhead of an Absent Node

SSCH requires more re-transmissions than IEEE 802.11 in order to prevent logical par-

titions. These retransmissions waste bandwidth that could have been dedicated to a node

that was present on the channel. To quantify this overhead, we initiated a CBR stream

between two nodes, allowed the system to quiesce, and then initiated a send from the

first node to a non-existent node. We present a moving average of the throughput over 80
[Plot omitted: receiver throughput (Mbps) vs. time (s) for SSCH and 802.11a; an annotation marks the time to fully synchronize.]

Figure 3.5: Switching and Synchronizing Overhead: Node 1 starts a maximum rate

UDP flow to Node 2. We show the throughput for both SSCH and IEEE 802.11a.

ms in Figure 3.6. It shows that the sender takes 530 ms to timeout on the non-existent

node. During this time the session throughput drops by 550 Kbps, which is a small

fraction (4.6%) of the total throughput.

Overhead of a Parallel Session

Next, we quantify the ability of SSCH to fairly share bandwidth between two flows, and

to quickly achieve this fair sharing. We start with Node 1 sending a maximum rate UDP

stream to Node 2. At 21.5 seconds, Node 1 starts a second maximum rate UDP stream

to Node 3.

Figure 3.7 presents a moving average of the throughput achieved by both nodes over

a period of 140 ms. It illustrates the instantaneous throughput achieved at Nodes 2 and 3
[Plot omitted: throughput (Mbps) vs. time (s); an annotation marks the interval spent attempting to send to the absent node.]

Figure 3.6: Overhead of an Absent Node: Node 1 is sending a maximum rate UDP

stream to Node 2. Node 1 then attempts to send a packet to a non-existent node.

(the receivers). The bandwidth is split between the receivers nearly perfectly (and with

no decrease in net throughput) within 200 ms.

Overhead of Mobility

We now analyze the effect of mobility at a micro-level on the performance of SSCH.

Ideally, SSCH should be able to detect a link breakage due to movement of a node, and

subsequently re-synchronize to other neighbors. We show that SSCH can indeed handle

this scenario with an experiment comprising 3 nodes and 2 sessions, and in Figure 3.8

we present a moving average of each session throughput, averaged over a period of 280

ms.

Node 1 is initially sending a maximum rate UDP stream to Node 2. Node 1 initiates

a second UDP stream to Node 3 at around 20.5 seconds. This bandwidth is then shared

between both the sessions (as in the experiment of Section 3.6.1) until 30 seconds, when
[Plot omitted: throughput (Mbps) at Node 2 and Node 3 vs. time (s).]

Figure 3.7: Overhead of a Parallel Session: Node 1 is sending a maximum rate

UDP stream to Node 2. Node 1 then starts a second stream to Node 3.

Node 3 moves out of the communication range of Node 1. Our experiment configures

Node 1 to continue to attempt to send to Node 3 until 43 seconds, and during this time

it continues to consume a small amount of bandwidth. In contrast, the experiment in

Section 3.6.1 measured the overhead of enqueueing a single packet to an absent node.

When the stream to Node 3 finally stops, Node 2’s received throughput increases back

to its initial rate.

Overhead of Clock Drift

As we described in Section 3.5.2, SSCH tries to synchronize slot begin and end times,

though it is also designed to be robust to clock skew. In this experiment, we quantify the

robustness of SSCH to moderate clock skew. We measure the throughput between two

nodes after artificially introducing a clock skew between them, and disabling the SSCH
[Plot omitted: throughput (Mbps) at Node 2 and Node 3 vs. time (s).]

Figure 3.8: Overhead of Mobility: Node 1 is sending a maximum rate UDP stream

to Node 2. Node 1 starts another maximum rate UDP session to Node 3. Node 3

moves out of range at 30 seconds, while Node 1 continues to attempt to send until

43 seconds.

synchronization scheme for slot begin and end times. We vary the clock skew from 1 ns

(10^−6 ms) to 1 ms such that the sender is always ahead of the receiver by this value, and

present the results in Figure 3.9. Note the log scale on the x-axis.

The throughput achieved between the two nodes is not significantly affected by a

clock skew of less than 10 µs. The drop in throughput is more for larger clock skews,

although the throughput is still acceptable at 10.5 Mbps when the skew value is an

extremely high 1 ms.

These results provide justification for the design choice we made not to require nodes

to switch synchronously across slots, as described in Section 3.5.2. For example, a node

will delay switching to receive an ACK, or to send a data packet if its channel reservation
[Plot omitted: throughput (Mbps) vs. clock drift, from 1 ns to 1 ms on a log scale.]

Figure 3.9: Overhead of Clock Skew: Throughput between two nodes using SSCH

as a function of clock skew.

is successful. In the 100 node experiment described in Section 3.6.3, we measured the

skew in channel switching times for a traffic pattern of 50 flows to be approximately 20

µs. Figure 3.9 shows that this is a negligible amount.

3.6.2 Macrobenchmarks: Single-hop Case

We now present simulation results showing SSCH’s ability to achieve and sustain a

consistently high throughput for a traffic pattern consisting of multiple flows. We first

evaluate this using steady state UDP flows. We then extend our evaluation to consider

a dynamic traffic scenario where UDP flows both start and stop. Finally, we study the

performance of TCP over SSCH.

Disjoint Flows

We first look at the number of disjoint flows that can be supported by SSCH. All nodes

in this experiment are in communication range of each other, and therefore two flows

are considered disjoint if they do not share either endpoint. Ideally, SSCH should utilize

the available bandwidth on all the channels as the number of disjoint flows in the system

increases. We evaluate this by varying the number of nodes in the network from 2 to

30 and introducing a flow between disjoint pairs of nodes — the number of flows varies

from 1 to 15.

[Plot omitted: per-flow throughput (Mbps) vs. number of flows, for SSCH and 802.11a.]

Figure 3.10: Disjoint Flows: The throughput of each flow as the number of flows increases.

Figure 3.10 shows the average per-flow throughput, and Figure 3.11 shows the total

utilized system throughput. IEEE 802.11a performs marginally better when there is

just one flow in the network. When there is more than one flow, SSCH significantly

outperforms IEEE 802.11a.

An increase in the number of flows decreases the per-flow throughput for both SSCH

and IEEE 802.11a. However, the drop for IEEE 802.11a is much more significant. The

drop for IEEE 802.11a is easily explained by Figure 3.11, which shows that the overall

system throughput for IEEE 802.11a is approximately constant.


[Plot omitted: system throughput (Mbps) vs. number of flows, for SSCH and 802.11a.]

Figure 3.11: Disjoint Flows: The system throughput as the number of flows increases.

It may seem surprising that the SSCH system throughput has not stabilized at 13

times the throughput of a single flow by the time there are 13 flows. However, this can

be attributed to SSCH’s use of randomness to distribute flows across channels. These

random choices do not lead to a perfectly balanced allocation, and therefore there is still

unused spectrum even when there are 13 flows in the system, as shown by the continuing

positive slope of the curve in Figure 3.11.

Non-disjoint Flows

We now consider the case when the flows in the network are not disjoint – nodes par-

ticipate as both sources and sinks, and in multiple flows. This scenario stresses SSCH’s

ability to efficiently support sharing among simultaneous flows that have a common

endpoint. Each node in the network starts a maximum rate UDP flow with one other

randomly chosen node in the network. We vary the number of nodes (and thus flows)

from 2 to 20. As in the previous experiment, all nodes are within communication range

of each other. We present the per-flow and system throughput for SSCH and IEEE

802.11a in Figures 3.12 and 3.13 respectively. The curves are not monotonic because

variation in the random choices leads to some receivers being recipients in multiple

flows (and hence bottlenecks). This lack of monotonicity persisted even after averag-

ing over 5 simulation runs. As in the disjoint flow experiment, SSCH performs slightly

worse in the case of a single flow, but much better in the case of a large number of flows.

[Plot omitted: per-flow throughput (Mbps) vs. number of flows, for SSCH and 802.11a.]

Figure 3.12: Non-disjoint Flows: The average throughput of each flow as the number of flows increases. There is a flow from every node in the network.

Effect of Flow Duration

SSCH introduces a delay when flows start because nodes must synchronize. This over-

head is more significant for shorter flows. We evaluate this overhead for maximum rate

UDP flows with different flow lengths. In the first experiment the flow duration is cho-
[Plot omitted: system throughput (Mbps) vs. number of flows, for SSCH and 802.11a.]

Figure 3.13: Non-disjoint Flows: The system throughput as the number of flows increases. There is a flow from every node in the network.

sen randomly between 20 and 30 ms, while for the second experiment it is between 0.5

and 1 second. In both the experiments, each node starts a flow with a randomly selected

node, discards all packets at the end of the designated sending window, pauses for a

second at the end of the flow, and then starts another flow with a new randomly selected

node. This process continues for 30 seconds. We run these experiments for both SSCH

and IEEE 802.11a, and vary the number of nodes from 2 to 16. We present the ratio

of the average throughput achieved by SSCH to that achieved by the flows when using

IEEE 802.11a in Figure 3.14.

For small numbers of sufficiently short-lived flows, IEEE 802.11a offers superior

performance; short flows do indeed suffer from a more pronounced synchronization

overhead. However, as soon as there are more than 4 simultaneous flows in the network,

the ability of SSCH to spread transmissions across multiple channels leads to a higher

total throughput than IEEE 802.11a in both the short and long flow scenarios.
[Plot omitted: ratio of SSCH throughput to 802.11a throughput vs. number of nodes, for flow durations of 20–30 ms and 0.5–1 second.]

Figure 3.14: Effect of Flow Duration: Ratio of SSCH average throughput to IEEE

802.11a average throughput for flows having different durations.

TCP Performance over SSCH

We now study the behavior of TCP over SSCH. SSCH allows a node to stay synchro-

nized to multiple nodes over different slots. However, this might cause significant jitter

in packet delivery times, which could adversely affect TCP. To evaluate this concern

quantitatively, we run an experiment where we vary the number of nodes in the network

from 2 to 9, such that all nodes are in communication range of one another. We then start

an infinite-size file transfer over FTP from each node to a randomly selected other node.

This choice to use non-disjoint flows is designed to stress the SSCH implementation by

requiring nodes to be synchronized as either senders or receivers with multiple other

nodes. In Figure 3.15 we present the resulting cumulative steady-state TCP throughput

over all the flows in the network.

Figure 3.15 shows that the TCP throughput for a small number of flows is lower

[Plot: cumulative steady-state TCP throughput (Mbps) versus number of flows, with curves for SSCH and IEEE 802.11a]

Figure 3.15: TCP over SSCH: Steady-state TCP throughput when varying the

number of non-disjoint flows.

for SSCH than the throughput over IEEE 802.11a. However, as the number of flows

increases, SSCH does achieve a higher system throughput. Although TCP over SSCH

does provide higher aggregate throughput than over IEEE 802.11a, the performance

improvement is not nearly as good as for UDP flows. This shows that jitter due to

SSCH does have an impact on the performance of TCP. A more detailed analysis of

the interaction between TCP and SSCH, and modifications to support better interactions

between TCP and SSCH, is a subject we plan to address in our future work.

3.6.3 Macrobenchmarks: Multihop and Mobility

We now evaluate SSCH’s performance when combined with multihop flows and mobile

nodes. We first analyze the behavior of SSCH in a multihop chain network. We then

consider large scale multihop networks, both with and without mobility. As part of this

analysis, we study the interaction between SSCH and MANET routing protocols.

Performance in a Multihop Chain Network

IEEE 802.11 is known to encounter significant performance problems in a multihop net-

work [136]. For example, if all nodes are on the same channel, the RTS/CTS mechanism

allows at most one hop in an A-B-C-D chain to be active at any given time. SSCH

reduces the throughput drop due to this behavior by allowing nodes to communicate on

different channels. To examine this, we evaluate both SSCH and IEEE 802.11a in a

multihop chain network.

[Plot: throughput (Mbps) versus number of nodes in the chain, with curves for SSCH and IEEE 802.11a]

Figure 3.16: Multihop Chain Network: Variation in throughput as chain length

increases.

We vary the number of nodes, which are all in communication range, from 2 to 18.

We initiate a single flow that traverses every node in the network. Although a situation in

which more than 4 nodes transmit within interference range of each other would be unlikely

to arise from multihop routing of a single flow, it could easily arise in a more general

distributed application. Figure 3.16 shows the maximum throughput as the number of

nodes in the chain is varied. We see that there is not much difference between SSCH

and IEEE 802.11a for flows with few hops. As the number of hops increases, SSCH

performs much better than IEEE 802.11a since it distributes the communication on each

hop across all the available channels.

Performance in a Multihop Mesh Network

We now analyze the performance of SSCH in a large scale multihop network without

mobility. We place 100 nodes uniformly in a 200 × 200 m area, and set each node to

transmit with a power of 21 dBm. The Dynamic Source Routing (DSR) [68] protocol

is used to discover the source route between different source-destination pairs. These

source routes are then fed into a static variant of DSR that does not perform discovery

or maintain routes. We vary the number of maximum rate UDP flows from 10 to 50. We

generate source and destination pairs by choosing randomly, and rejecting pairs that are

within a single hop of each other.

We present the average flow throughput in Figure 3.17. Increasing the number of

flows leads to greater contention, and the average throughput of both SSCH and IEEE

802.11a drops. For every considered number of flows, SSCH provides significantly

higher throughput than IEEE 802.11a. For 50 flows, the inefficiencies of sharing a

single channel are sufficiently pronounced that SSCH yields more than a factor of 15

capacity improvement.

[Plot: average flow throughput (Mbps) versus number of flows, with curves for SSCH and IEEE 802.11a]

Figure 3.17: Multihop Mesh Network of 100 Nodes: Average flow throughput on

varying the number of flows in the network.

Impact of Channel Switching on MANET Routing Protocols

Previous work on multi-channel MACs has often overlooked the effect of channel switch-

ing on routing protocols. Most of the proposed protocols for MANETs, such as DSR [68],

and AODV [97] rely heavily on broadcasts. However, neighbors using a multi-channel

MAC could be on different channels, which could cause broadcasts to reach signifi-

cantly fewer neighbors than in a single-channel MAC. SSCH addresses this concern

using a broadcast retransmission strategy discussed in Section 3.5.1.

We study the behavior of DSR [68] over SSCH in the same experimental setup used

in Section 3.6.3, with 100 nodes in a 200 m×200 m area. However, we reduce the trans-

mission power of each node to 16 dBm to force routes to increase in length (and hence

to stress DSR over SSCH). We select 10 source-destination pairs at random, and we use

DSR to discover routes between them. In Figure 3.18 we compare the performance of

[Plot: route discovery time (s) and average route length (hops) versus broadcast transmission count, with reference lines for the IEEE 802.11 average route length and average route discovery time]

Figure 3.18: Impact of SSCH on Unmodified MANET Routing Protocols: The

average time to discover a route and the average route length for 10 randomly

chosen routes in a 100 node network using DSR over SSCH.

DSR over SSCH, when varying the SSCH broadcast transmission count parameter (the

number of consecutive slots in which each broadcast packet is sent once).

Figure 3.18 shows that the performance of DSR over SSCH improves with an in-

crease in the broadcast transmission count. The DSR Route Request packets see more

neighbors when SSCH broadcasts them over a greater number of slots. This increases

the likelihood of discovering shorter routes, and the speed with which routes are dis-

covered. However, there seems to be little additional benefit to increasing the broadcast

parameter to a value greater than 6. The slight bumpiness in the curves can be attributed

to the stochastic nature of DSR, and its reliance on broadcasts.

Comparing SSCH to IEEE 802.11a, we see that SSCH discovers routes that are

comparable in length. However, the average route discovery time for SSCH is much

higher than for IEEE 802.11a. Because each slot is 10 ms in length, broadcasts are only

retransmitted once every 10 ms, and this leads to a significantly longer time to discover a

route to a given destination node. We believe that this latency is a fundamental difficulty

in using a reactive protocol such as DSR with SSCH. We plan to explore the interaction

of other proactive and hybrid routing protocols with SSCH in the future.

Performance in Multihop Mobile Networks

We now present the impact of mobility in a network using DSR over IEEE 802.11a

and SSCH. In this experiment, we place 100 nodes randomly in a square and select 10

flows. Each node transmits packets at 21 dBm. Node movement is determined using

the Random Waypoint model. In this model, each node has a predefined minimum and

maximum speed. Nodes select a random point in the simulation area, and move towards

it with a speed chosen randomly from the interval. After reaching its destination, a node

rests for a period chosen from a uniform distribution between 0 and 10 seconds. It then

chooses a new destination and repeats the procedure. In our experiments, we fix the

minimum speed at 0.01 m/s and vary the maximum speed from 0.2 to 1.0 m/s. Although

we have studied SSCH at higher speeds, the results are not significantly different. We

performed this experiment using two different areas for the nodes, a 200m × 200m area

and a 300m × 300m area. We refer to the smaller area as the dense network, and the

larger area as the sparse network – the average path is 0.5 hops longer in the sparse

network. For all these experiments, we set the SSCH broadcast transmission count

parameter to 6.

Figure 3.19 shows that in a dense network, SSCH yields much greater through-

put than IEEE 802.11a even when there is mobility. Although DSR discovers shorter

[Plot: per-flow throughput (Mbps) and average route length (hops) versus maximum speed (m/s), with curves for IEEE 802.11a and SSCH]

Figure 3.19: Dense Multihop Mobile Network: The per-flow throughput and the

average route length for 10 flows in a 100 node network in a 200m × 200m area,

using DSR over both SSCH and IEEE 802.11a.

routes over IEEE 802.11a, the ability of SSCH to distribute traffic on a greater number

of channels leads to much higher overall throughput. Figure 3.20 evaluates the same

benchmarks in a sparse network. The results show that the per-flow throughput de-

creases in a sparse network for both SSCH and IEEE 802.11a. This is because the route

lengths are greater, and it takes more time to repair routes. However, the same quali-

tative comparison continues to hold: SSCH causes DSR to discover longer routes, but

still leads to an overall capacity improvement.

DSR discovers longer routes over SSCH than over IEEE 802.11a because broadcast

packets sent over SSCH may not reach a node’s entire neighbor set. Furthermore, some

optimizations of DSR, such as promiscuous mode operation of nodes, are not as effective

in a multi-channel MAC such as SSCH. Thus, although the throughput of mobile nodes

[Plot: per-flow throughput (Kbps) and average route length (hops) versus maximum speed (m/s), with curves for IEEE 802.11a and SSCH]

Figure 3.20: Sparse Multihop Mobile Network: The per-flow throughput and the

average route length for 10 flows in a 100 node network in a 300m × 300m area,

using DSR over both SSCH and IEEE 802.11a.

using DSR over SSCH is much better than their throughput over IEEE 802.11a, we

conclude that a routing protocol that takes the channel switching behavior of SSCH into

account will likely lead to even better performance.

3.6.4 Implementation Considerations

When simulating SSCH in QualNet [62], we made two technical choices that seem to

be relatively uncommon based on our reading of the literature. The first technical choice

relates to how we added SSCH to an existing system, and the second relates to a little-

utilized part of the IEEE 802.11 specification.

In order to implement SSCH, we had to implement new packet queuing and retrans-

mission strategies. To avoid requiring modifications to the hardware (in QualNet, the

hardware model) or the network stack, SSCH buffers packets below the network layer,

but above the NIC device driver. To maintain control over transmission attempts, we

configure the NIC to buffer at most one packet at a time, and to attempt exactly one RTS

for each packet before returning to the SSCH layer. By observing NIC-level counters

before and after every attempted packet transmission, we are able to determine whether

a CTS was heard for the packet, and if so, whether the packet was successfully trans-

mitted and acknowledged. All the necessary parameters to do this are exposed by the

hardware model we used in QualNet.
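
The per-packet control loop below is a minimal sketch of this strategy, not the dissertation's QualNet code: it assumes a hypothetical NIC interface (submit, wait_for_attempt, and read_counters returning RTS/CTS/ACK counters) that stands in for the counters exposed by the simulated hardware model.

    class Outcome:
        """Result of a single transmission attempt, as inferred from NIC counters."""
        NO_CTS, NO_ACK, DELIVERED = range(3)

    def send_one_packet(nic, packet):
        # Snapshot the (hypothetical) counters, hand the NIC exactly one packet,
        # let it make exactly one RTS attempt, then diff the counters.
        before = nic.read_counters()
        nic.submit(packet)
        nic.wait_for_attempt()
        after = nic.read_counters()
        if after.cts_received == before.cts_received:
            return Outcome.NO_CTS        # no CTS heard: receiver busy or on another channel
        if after.acks_received == before.acks_received:
            return Outcome.NO_ACK        # CTS heard but data frame not acknowledged
        return Outcome.DELIVERED

    def drain_queue(nic, queue, slot_has_time_left):
        # The queue lives above the driver so failed packets can be rescheduled
        # into a later slot that overlaps with the intended receiver.
        while queue and slot_has_time_left():
            if send_one_packet(nic, queue[0]) == Outcome.DELIVERED:
                queue.pop(0)
            else:
                break                    # defer the rest of the queue to a later slot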

For efficiency reasons, we choose to use the IEEE 802.11 Long Control Frame

Header format to broadcast channel schedules and current offsets, rather than using a full

broadcast data packet. The most common control frames in IEEE 802.11 (RTS, CTS,

and ACK) use the alternative short format. The long format was included in the IEEE

802.11 standard to support inter-operability with legacy 1-Mbps and 2-Mbps DSSS sys-

tems [60]. The format contains 6 unused bytes; we use 4 to embed the 4 (channel, seed)

pairs, and another 2 to embed the offset within the cycle (i.e., how far the node has

progressed through the 530 ms cycle).
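
The following sketch illustrates one way these 6 spare bytes could be laid out. The dissertation fixes only the byte budget (4 bytes for the four (channel, seed) pairs and 2 bytes for the offset), so packing each pair into two 4-bit fields, indexing channels from 0, and expressing the offset in milliseconds are assumptions made here purely for illustration.

    import struct

    def pack_schedule(pairs, offset_ms):
        # pairs: four (channel_index, seed) tuples, each value assumed to fit in 4 bits.
        assert len(pairs) == 4 and 0 <= offset_ms < 530
        pair_bytes = bytes(((ch & 0x0F) << 4) | (seed & 0x0F) for ch, seed in pairs)
        return pair_bytes + struct.pack("!H", offset_ms)       # 4 + 2 = 6 bytes total

    def unpack_schedule(blob):
        pairs = [((b >> 4) & 0x0F, b & 0x0F) for b in blob[:4]]
        (offset_ms,) = struct.unpack("!H", blob[4:6])
        return pairs, offset_ms

    # Example: four (channel index, seed) pairs, 120 ms into the 530 ms cycle.
    blob = pack_schedule([(0, 2), (5, 5), (10, 7), (3, 1)], 120)
    assert unpack_schedule(blob) == ([(0, 2), (5, 5), (10, 7), (3, 1)], 120)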

Lastly, we comment that the beaconing mechanism used in IEEE 802.11 ad-hoc

mode for associating with a Basic Service Set (BSS) works unchanged in the presence

of SSCH. A newly-arrived node can associate to a BSS as soon as it overlaps in the same

channel with any already-arrived node.

3.7 Alternatives to SSCH

This Section discusses alternative designs for SSCH within the constraints that were

enumerated in Section 3.2.

SSCH distributes the rendezvous and control traffic across all the channels. One

straightforward alternative scheme, which still only requires one radio, is to use one of

the channels as a control channel, and all the other channels as data channels (e.g., [66]).

Each node must then somehow split its time between the control channel and the data

channels.

Such a scheme will have difficulty in preventing the control channel from becoming

a bottleneck. Suppose that two nodes exchange RTS/CTS on the control channel, and

then switch to a data channel to do transmission. Unless all other nodes were also on the

control channel during the RTS/CTS exchange, these two nodes will still need to do an

RTS/CTS on this channel in order to avoid the hidden terminal problem. The two nodes

should wait to even do the RTS/CTS until after an entire packet transmission interval

has elapsed, because another pair of nodes might have also switched to this channel,

orchestrating that decision on the control channel during a time that the first pair of

nodes were not on the control channel. In order to amortize this startup cost, the nodes

should have several packets to send to each other. However, while any one node remains

on a data channel, any other node that desires to send it a packet must remain idle on

the control channel waiting for the node it desires to reach to re-appear. If the idle node

on the control channel chooses not to wait, and instead switches to a data channel with

another node for which it has traffic, it may repeatedly fail to rendezvous with the first

node, leading to a significant imbalance in throughput and possibly a logical partition.

The problems with a dedicated control channel may be solvable, but it is clear that a

straightforward approach with un-synchronized rendezvous presents several difficulties.

If one instead tried to synchronize rendezvous on the control channel, the control chan-

nel could again become a bottleneck simply because many nodes simultaneously desire

to schedule packets on that channel.



3.8 Future Research

SSCH is a promising technology. In our future work, we plan to investigate how SSCH

will perform when implemented over actual hardware, and subjected to the normal en-

vironmental vagaries of wireless networks, such as unpredictable variations in signal

strength. As part of this implementation effort, we also plan to evaluate how metrics

reflecting environmental conditions, such as ETX [40], can be integrated into SSCH.

Our results in Section 3.6.3 show that existing routing protocols do not give the best

performance over SSCH. In particular, we find that the time to discover a route can be

quite large in a reactive routing protocol being run over SSCH. In the future, we plan

to more thoroughly evaluate routing over SSCH (as opposed to classical single channel

routing), and to explore a wider variety of proactive and hybrid routing protocols over

SSCH.

There are at least four additional topics that would also need to be addressed be-

fore SSCH can be deployed. One is interoperability with nodes that are not running

SSCH. Another is the evaluation of power consumption under this scheme. We have

not attempted to evaluate the energy cost of switching channels, nor have we attempted

to enable a power-saving strategy such as in the IEEE 802.11 specification for access-

point mode. A third topic of investigation is the evaluation of SSCH in conjunction with

auto-rate adaptation mechanisms. A fourth topic is a more detailed evaluation of the

interplay between SSCH and TCP.

3.9 Summary

We have presented SSCH, a new protocol that extends the benefits of channelization

to ad-hoc networks. This protocol is compatible with the IEEE 802.11 standard, and is

suitable for a multi-hop environment. SSCH achieves these gains using a novel approach

called optimistic synchronization. We expect this approach to be useful in additional

settings beyond channel hopping.

We have shown through extensive simulation that SSCH yields significant capacity

improvement in a variety of single-hop and multi-hop wireless scenarios. In the future,

we look forward to exploring SSCH in more detail using an implementation over ac-

tual hardware. More information about SSCH and the QualNet simulation code can be

obtained from: http://www.cs.cornell.edu/people/ranveer/multinet/ssch.htm.

Work on SSCH was done jointly with people at Microsoft Research. The SSCH

protocol was co-developed with John Dunagan. Victor Bahl was involved in the entire

research project and made sure that we proceeded in the right direction. Finally, this

work benefitted greatly from Ken Birman’s insightful comments.


CHAPTER 4

CLIENT CONDUIT AND FAULT DIAGNOSIS IN WIRELESS NETWORKS

4.1 Introduction

The convenience of wireless networking has led to a wide-scale adoption of IEEE 802.11

networks [58]. Corporations, universities, homes, and public places are deploying these

networks at a remarkable rate. However, a significant number of “pain points” remain

for end-users and network administrators. Users experience a number of problems such

as intermittent connectivity, poor performance, lack of coverage, and authentication fail-

ures. These problems occur due to a variety of reasons such as poor access point lay-

out, device misconfiguration, hardware and software errors, the nature of the wireless

medium (e.g., interference, propagation), and traffic congestion.

Figure 4.1 shows the number of such wireless-related complaints logged by the In-

formation Technology (IT) department of Microsoft corporation over a period of six

months. The company has a large deployment of IEEE 802.11 networks with several

thousand Access Points (APs) spread over more than forty buildings. Each complaint is

an indication of end-user frustration and loss of productivity for the corporation. Fur-

thermore, resolution of each complaint results in additional support personnel costs to

the IT department; our research revealed that this cost is several tens of dollars and this

does not include the cost due to the loss of end-user productivity.

To resolve complaints quickly and efficiently, network administrators need tools for

detecting, isolating, diagnosing, and correcting faults. To the best of our knowledge,

there is no previous research that addresses fault diagnostic problems in IEEE 802.11

infrastructure networks. However, as discussed in Section 4.3, there has been consid-

erable prior work on fault diagnosis in other settings, which we can leverage here. The

[Bar chart: number of wireless-related complaints per month over a six-month period]

Figure 4.1: Number of wireless related complaints logged by the IT department of

a major US corporation

importance of diagnosing these problems in the “real-world” is apparent from the num-

ber of companies that offer solutions in this space [5, 7, 39, 103, 131]. These products

do a reasonable job of presenting statistical data from the network; however, they lack a

number of desirable features. Specifically, they do not do a comprehensive job of gath-

ering and analyzing the data to establish the possible causes of a problem. Furthermore,

most products only gather data from the APs and neglect the client-side view of the

network. Some products that monitor the network from the client’s perspective require

hardware sensors, which can be expensive to deploy and maintain. Also, current solu-

tions do not provide any support for disconnected clients even though these are the ones

that need the most help. We discuss these products in more detail in Section 4.3.

This chapter presents a flexible architecture for detecting and diagnosing faults in

infrastructure wireless networks. We instrument wireless clients and (if possible) access

points to monitor the wireless medium and devices that are nearby. Our architecture

supports both proactive and reactive fault diagnosis. We use this monitoring framework

to address some of the problems plaguing wireless users. We present a novel technique

called Client Conduit that enables disconnected clients to diagnose their problems with

the help of nearby clients. This technique takes advantage of the beaconing and probing

mechanisms of IEEE 802.11 to ensure that connected clients do not pay unnecessary

overheads for detecting disconnected clients. We also present a simple technique for

finding the approximate location of disconnected clients. We present a technique that

uses nearby wireless clients for diagnosing wireless network performance problems.

Finally, we show how our monitoring architecture naturally lends itself to detecting

rogue or unauthorized access points in enterprise wireless networks. We have imple-

mented and evaluated the basic architectural framework, Client Conduit, and Rogue AP

detection on the Windows operating system using off-the-shelf IEEE 802.11 network

cards; we have evaluated our other mechanisms using tools such as AiroPeek [132] and

WinDump [134]. Our results show that our techniques are effective; furthermore, they

impose negligible overheads when clients are not experiencing problems.

We summarize the primary contributions of our chapter as follows:

• We believe ours is the first work to identify fault diagnosis in IEEE 802.11 infras-

tructure networks as an important area of research. The identification of various

problems in such environments is an important contribution since wireless fault

diagnosis is an area that needs attention.

• We present a flexible client-based architecture for detecting and diagnosing faults

in an IEEE 802.11 infrastructure network. Our fault-diagnosis approach is unique

in the wireless context since we use clients (and if possible, infrastructure APs) to

monitor the network and the radio frequency (RF) environment.

• We describe a simple and efficient technique called Client Conduit that allows dis-

connected clients to communicate via nearby connected clients; this mechanism

can be used to bootstrap wireless clients and resolve connectivity problems.



• We present novel solutions that use our architecture for detecting and diagnosing a

variety of faults: locating disconnected clients, diagnosing performance problems,

and detecting Rogue APs.

Our work is just a first step in the direction of self-healing wireless networks and

there are a number of issues that still need to be addressed. From the vast number of

wireless problems faced by end-users and network administrators everyday, we have fo-

cused only on a subset of those problems; our selection was based on conversations

with network administrators [24] along with the high-priority problems observed in

user-complaint logs. Even though some of our techniques are applicable to other de-

ployments (e.g., hotspots, homes), our main emphasis has been diagnosing faults in en-

terprise wireless networks. We ensure that our techniques do not introduce new security

attacks but we do not focus on denial-of-service and greedy MAC attacks [101].

The rest of the chapter is organized as follows: In Section 4.2, we discuss the most

important problems that users and network administrators complain about with respect

to wireless LAN deployment. Section 4.4 describes the components of our client-based

architecture. Section 4.5 presents the Client Conduit protocol. Section 4.6 focuses on lo-

cating disconnected clients, performance isolation, and Rogue AP detection. Section 4.7

describes the implementation of our system and Section 4.8 presents an evaluation of

our techniques. Section 4.3 discusses related work. Finally, we discuss future work in

Section 4.9 and conclude in Section 4.10.

4.2 Faults in a Wireless Network

We enumerate the most important problems that users and network administrators face

when using and maintaining corporate wireless networks. Our list has been derived from

interviews and discussions we conducted with network administrators and operation



engineers of Microsoft’s IT department. These individuals are responsible for managing

over 4,400 IEEE 802.11 APs distributed over forty buildings in the company.

Connectivity problems: End-users complain about inconsistent or a lack of network

connectivity in certain areas of a building. Such “dead spots” or “RF holes” can oc-

cur due to a weak RF signal, lack of a signal, changing environmental conditions, or

obstructions. Locating an RF hole automatically is critical for wireless administrators;

they can then resolve the problem by either relocating APs or increasing the density of

APs in the problem area or by adjusting the power settings on nearby APs for better

coverage.

Performance problems: This category includes all the situations where a client ob-

serves degraded performance, e.g., low throughput or high latency. There could be a

number of reasons why the performance problem exists, e.g., traffic slow-down due

to congestion, RF interference due to a microwave oven or cordless phone, multi-path

interference, large co-channel interference due to poor network planning, or due to a

poorly configured client/AP. Performance problems can also occur as a result of prob-

lems in the non-wireless part of the network, e.g., due to a slow server or proxy. It is

therefore necessary that the diagnostic tool be able to determine whether the problem is

in the wireless network or elsewhere. Furthermore, identifying the cause in the wireless

part is important for allowing network administrators to better provision the system and

improve the experience for end-users.

Network security: Large enterprises often use solutions such as IEEE 802.1x [57] to

secure their networks. However, a nightmare scenario for IT managers occurs when em-

ployees unknowingly compromise the security of the network by connecting an unau-

thorized AP to an Ethernet tap of the corporate network. The problem is commonly



referred to as the “Rogue AP Problem” [5, 7, 36]. These Rogue APs are one of the

most common and serious breaches of wireless network security. Due to the presence of

such APs, external users are allowed access to resources on the corporate network; these

users can leak information or cause other damage. Furthermore, Rogue APs can cause

interference with other access points in the vicinity. Detecting Rogue APs in a large

network via a manual process is expensive and time-consuming; thus, it is important to

detect such APs proactively.

Authentication problems: According to the IT support group’s logs, a number of com-

plaints are related to users’ inability to authenticate themselves to the network. In wire-

less networks secured by technologies such as IEEE 802.1x [57], authentication failures

are typically due to missing or expired certificates. Thus, detecting such authentication

problems and helping clients to bootstrap with valid certificates is important.

In this chapter, we focus on detecting RF holes, diagnosing performance problems,

detecting Rogue APs, and helping a client to recover from an authentication problem

via Client Conduit. As part of our future work, we will investigate diagnosis of authen-

tication problems as well.

4.3 Related Work

To the best of our knowledge, there has been no previous research on fault diagnosis

in IEEE 802.11 infrastructure networks. However, there are a number of commercial

products that provide varying degrees of support for network management tasks, e.g.,

AirWave [7], Network Systems and Management (NSM) [39], Wireless Security Advi-

sor [103], AirDefense [5], SpectraMon/SpectraGuard [131], AirMagnet [6], and Sym-

bol [123]. Due to their proprietary nature, the available descriptions typically describe the

feature-set and not the techniques; the comparison below is based on our understanding

of their brochures.

The emphasis in most of these products is more towards managing wireless networks

rather than diagnosing faults. These tools allow network administrators to obtain and vi-

sualize data from access points, upgrade firmware, manage security policies, etc. Some

of them also provide real-time WLAN performance monitoring through IEEE 802.11

statistics such as packet throughput, number of retries, number of dropped packets at

the AP, etc. Even though these low-level statistics are useful for network administra-

tors, it is more desirable to provide higher level fault detection and diagnosis, e.g., our

approach detects network performance problems and pinpoints the components that are

problematic.

Many of these products (e.g., AirWave, Unicenter) operate from the AP or the server

side only, i.e., clients are not instrumented. Given the asymmetry and variability of the

wireless medium, observing data from the client-side is important for fault diagnosis,

e.g., since conditions such as interference near the client can be drastically different

than the conditions near the AP, client-side information is needed to do a detailed per-

formance breakdown. Furthermore, our approach of modifying clients allows us to help

disconnected clients via Client Conduit, locate Rogue APs and disconnected clients, and

obtain better coverage for detecting Rogue APs.

Some products like AirMagnet and AirDefense obtain the complete view of the

enterprise by deploying specialized sensors throughout the organization; these sensors

pass all the packets to the server for analysis. Anecdotal evidence from talking to vari-

ous network administrators suggests that products that use sensor-based monitoring are

expensive to deploy; furthermore, the traffic these sensors generate can degrade network

performance significantly even when only a few sensors are deployed. Our approach uses regular

wireless clients to avoid extra hardware deployment costs. Of course, a limitation of

our approach is that we rely on the presence of nearby clients for diagnosing some of

the wireless faults; however, the increasing usage of wireless clients in organizations is

making it easier to satisfy this requirement.

Since Rogue APs are a serious security problem, all the products listed above per-

form Rogue AP detection. Unlike our solution, most of these products achieve this goal

either by using other APs [7, 39] or by using specialized sensors [5, 6, 131]; as discussed

above, these approaches have deployment and fault-detection limitations. Our technique

of using both clients and APs for detecting Rogue APs is similar to the Symbol tech-

nique [123]. However, unlike their approach, our technique can also detect Rogue APs

that use MAC address spoofing of real APs; furthermore, we leverage our client and AP

instrumentation to approximately locate Rogue APs using DIAL.

None of the above products provide solutions for assisting disconnected clients even

though they need the most help. Our Client Conduit mechanism allows live and reactive

diagnosis to be performed for such clients that are unable to access the infrastructure

wireless network.

The notion of making wireless clients snoop the environment for ensuring secure and

correct routing has been suggested for ad hoc networks. In [83], the authors propose a

watchdog mechanism to detect network unreliability problems stemming from selfish

nodes. The basic idea is to have watchdog nodes observe their neighbors and determine

if they are forwarding traffic as expected; this approach for detecting routing anomalies

has been further refined by others as well [15,27]. Inspired by the watchdog mechanism,

we also use nearby clients to monitor the RF conditions and traffic flow around them; in

our architecture, the watchdog mechanism is used for fault detection (e.g., Rogue APs)

and fault diagnosis (e.g., Client Conduit, locating disconnected clients, performance

isolation). Recent work [101] has used snooping wireless clients for detecting greedy

and malicious behavior in hotspots environment; these techniques are orthogonal to our

work and can be incorporated in our framework as well.

Researchers have developed techniques for diagnosing performance problems over

the Internet. For example, Barford et al. [19] use traffic traces at the end points and clas-

sify delays as occurring due to a slow server, a slow client, or the network. While EDEN

has similar goals over a wireless network, it does so without requiring tracing support

from both end points. Tulip [82] is another approach for diagnosing delays over Internet

paths. The client sends ICMP packets and uses their responses from different compo-

nents to determine the cause, such as lost packets, packet reordering, or queueing delay.

EDEN also uses ICMP packets. However, the broadcast nature of the wireless medium

enables EDEN to use a novel approach of snooping these packets as a mechanism for

diagnosing component delays.

4.4 System Architecture

In this Section, we first discuss the requirements and then describe the components that

make up our fault detection and diagnosis architecture.

4.4.1 System Requirements

Before we describe the system components, we enumerate the requirements for our

system:

• We require that the software on clients be augmented for monitoring. In our sys-

tem, software modifications on APs are needed only for better scalability and for

analyzing an AP’s performance (Section 4.6.2). Since our approach does not re-

quire hardware modifications, “the bar” for deploying our system is lower.

• For some of our mechanisms, we need the ability to control beacons and probes.

We also require that clients have the capability of starting an infrastructure net-

work (i.e., become an AP) or an ad hoc network on their own; this ability is sup-

ported by many wireless cards, e.g., Atheros [14], Native WiFi [86]. Whenever

faced with a choice of starting an ad hoc or an infrastructure network, we prefer

the latter since infrastructure mode is better supported in current cards.

• We rely on the availability of a database that keeps track of the location of all

the access points; such location databases are typically maintained by network

administrators.

• Some of our techniques require the presence of nearby clients or access points.

With the increasing deployment of access points and the use of wireless laptops

and PDAs in enterprise wireless networks, this requirement is becoming relatively

easy to satisfy in these environments. In fact, based on SNMP data collected from

APs over a period of two days, we observed the presence of 13-15 associated

wireless clients on our floor (approximately 2500 sq. meters) during working

hours of the day; thus, with such client densities, there is a high likelihood that

our requirement will be satisfied.

Compared with the existing products that require deploying special wireless sen-

sors throughout the enterprise, our approach takes advantage of nearby clients and

access points instrumented with software “sensors”, thereby imposing a lower de-

ployment cost.

4.4.2 System Components

Our system consists of the following components — a Diagnostic Client (DC) that runs

on a wireless client machine, an optional Diagnostic AP (DAP) that runs on an Access



Point, and a Diagnostic Server (DS) that runs on a backend server of the organization

(see Figure 4.2). Below, we describe each of these in detail.

Diagnostic Client module or DC: The Diagnostic Client module monitors the RF en-

vironment and the traffic flow from neighboring clients and APs. Note that during nor-

mal activity, the client’s wireless card is not placed in promiscuous mode. The DC

uses the collected data to perform local fault diagnosis. Depending on the individual

fault-detection mechanism, a summary of this data is transmitted to the DAPs or DSs

at regular intervals, e.g., for Rogue AP detection, the DC in our prototype sends MAC

and channel information of nearby APs every 30 seconds. In addition, the DC is geared

to accept commands from the DAP or the DS to perform on-demand data gathering,

e.g., switching to promiscuous mode and analyzing a nearby client’s performance prob-

lems. In case the wireless client becomes disconnected, the DC logs data to a local

database/file. This data can be analyzed by the DAP or DS at some future time when

network connectivity is resumed.

Diagnostic Access Point module or DAP: The Diagnostic AP’s main function is to ac-

cept diagnostic messages from DCs, merge them along with its own measurements and

send a summary report to the DS. The Diagnostic AP is not a fundamental requirement

of our architecture; it is primarily needed for offloading work from the DS. Most of our

techniques can work in an environment with a mixture of legacy APs and DAPs: if an

AP is a legacy AP, its monitoring functions are performed by the DCs and its summa-

rizing functions and checks are performed at the DS. In the rest of the chapter, for the

ease of exposition, we assume the presence of DAPs.

Diagnostic Server module or DS: The Diagnostic Server accepts data from DCs and

DAPs and performs the appropriate analysis to detect and diagnose different faults. The

[Diagram: Diagnostic Clients (DCs), Diagnostic APs (DAPs), legacy APs, and the Diagnostic Server (DS), which interacts with backend RADIUS, Kerberos, and DHCP servers; a disconnected peer reaches the DS through a connected DC via Client Conduit]

Figure 4.2: Fault Diagnosis Architecture

DS also has access to a database that stores each AP’s location. Network administrators

may deploy multiple DSs in the system to balance the load, e.g., each AP’s MAC address

could be hashed to a particular DS. In the rest of the chapter, we present our mechanisms

as if one Diagnostic Server is present in the system.

Figure 4.2 gives a schematic view of our fault diagnosis system. As shown, the

Diagnostic Server interacts with other network servers e.g., the RADIUS [105] and Ker-

beros [90] servers, to get client authorization and user information. Our architecture

allows disconnected clients to communicate with the DS via a nearby connected client

using the Client Conduit protocol; this mechanism is presented in Section 4.5.

Our system supports both reactive and proactive monitoring. In proactive moni-

toring, DCs and DAPs monitor the system continuously: if an anomaly is detected by

a DC, DAP, or DS, an alarm is raised for a network administrator to investigate. The

reactive monitoring mode is used when a support personnel wants to diagnose a user

complaint. The personnel can issue a directive to a DC from one of the DSs to collect

and analyze the data for diagnosing the problem. We believe that it is acceptable to

increase the network and CPU load (on the DCs, DAPs, DSs) by a small amount during

reactive monitoring; of course, in the proactive case, these overheads must be kept low.

Our architecture itself imposes negligible overheads with respect to power man-

agement: the individual techniques have to be designed to prevent unnecessary battery

wastage. Both the proactive and reactive techniques presented later in this chapter con-

sume very little bandwidth, CPU, or disk resources; as a result, they should have negli-

gible impact on battery consumption. Only during data transfer in Client Conduit does

a connected client send/receive messages on behalf of a disconnected client. To ensure

that the helping client’s applications (or battery) are not affected significantly, it is of-

fered a knob to control the amount of resources it wants to devote for this transfer (see

Section 4.5.2).

Table 4.1 shows the various problems diagnosed in this chapter, the entities (DCs,

DAPs, and DSs) involved in the diagnosis, and whether the solution can be used with

legacy APs.

4.4.3 System Scaling

We have designed our system to scale with the number of clients and APs in the system.

The two shared resources in our system are DSs and DAPs. To prevent a single Di-

agnostic Server from becoming a potential bottleneck in our system, the design allows

Table 4.1: Different fault diagnosis mechanisms and entities that can diagnose

them; the last column indicates if the solution can be supported using legacy APs

Fault Diagnosis              Where performed    Support for legacy APs?
Help disconnected client     DC                 Yes
Locate disconnected client   DS                 Yes
Performance Isolation        DC and DAP         Partially
Detect Rogue APs             DS                 Yes

more DSs to be added as the system load increases. Furthermore, we offload work from

each individual DS by sharing the diagnosis burden with the DCs and the DAPs. The

DS is used only when the DCs and DAPs are unable to diagnose the problem and the

analysis requires a global perspective and additional data, e.g., signal strength informa-

tion obtained from multiple DAPs may be needed for locating a disconnected client. As

stated earlier, the presence of legacy APs degrades scalability since the work usually

performed by DAPs would need to be performed by the DSs.

Similarly, since the DAP is a shared resource, making it do extra work can potentially

hurt the performance of all its associated clients. To reduce the load on a DAP, different

fault diagnosis mechanisms can use a simple technique that we refer to as Busy AP

Optimization: with this optimization, an AP does not perform active scanning if any

client is associated with it; the associated clients perform these operations as needed.

The AP continues to perform passive monitoring activities that have a negligible effect

on its performance. If there is no client associated, the AP is idle and it can perform these

monitoring operations. This approach ensures that most of the physical area around the

AP is monitored without hurting the AP’s performance.
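
A minimal sketch of the Busy AP Optimization decision is shown below; the DAP methods (passive_monitor, delegate_active_scan, active_scan_all_channels) are hypothetical names standing in for the corresponding operations.

    def periodic_monitoring(dap):
        dap.passive_monitor()                      # cheap passive monitoring is always on
        clients = dap.associated_clients()
        if clients:
            dap.delegate_active_scan(clients)      # busy AP: associated clients scan instead
        else:
            dap.active_scan_all_channels()         # idle AP: safe to scan itself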



4.4.4 System Security

The interactions between the DC, DAP, and DS are secured using EAP-TLS [2] certifi-

cates issued over IEEE 802.1x. An authorized certification authority (CA) issues certifi-

cates to DCs, DAPs and DSs; we use these certificates to ensure that all communication

between these entities is mutually authenticated.

We do not address malicious behavior by legitimate users in our environment. Re-

searchers have developed techniques for detecting greedy and malicious behavior for

hotspot environments [101]; others have suggested techniques to handle problems due

to false information sent by malicious clients to central entities such as the DS [99].

These approaches are complementary to our design and could be included in our sys-

tem.

4.5 Client Conduit

This section presents a novel mechanism called Client Conduit that allows disconnected

wireless clients to convey information to network administrators and support personnel.

If a wireless client cannot connect to the network, the DC logs the problem in its

database. When the client is connected later (e.g., via a wired connection), this log is

uploaded to the DS, which performs the diagnosis to determine the cause of the problem.

However, sometimes it is possible that this client is in the range of other connected

clients; this client may be disconnected since it is just outside the range of any AP or

due to authentication problems. In this situation, it would be desirable to perform fault

diagnosis with the DS immediately and, if possible, rectify the problem. We now focus

on this scenario.

On first thought one may ask: why not have the disconnected node simply send a

message to its connected neighbor? Unfortunately, this approach does not work because

IEEE 802.11 does not allow a client to be connected to two networks at the same time.

Since the connected node has already associated to an infrastructure network, it cannot

simultaneously connect to an ad-hoc network with the disconnected client D — if it

wants to receive a message from D, it first has to disconnect and then join the ad-hoc

network started by D. This is inefficient and unfair to a normally-functioning connected

client.

One can imagine solving this problem using multiple radios on the connected client

(one dedicated on an ad hoc network for diagnosis), or using MultiNet (which allows a

client to multiplex a single wireless card such that it is present on multiple networks), or

by making a connected client periodically scan all channels. All these approaches have

the undesirable property of penalizing normal-case operation in order to deal with a

problem that is expected to occur infrequently. In the periodic scanning case, switching

the wireless card across channels or networks can cause packet drops at the connected

client. In the MultiNet case, the wireless card will periodically spend time on the ad

hoc network, and will thus consume bandwidth on the connected client. On the other

hand, our Client Conduit approach imposes no overheads in the common case when no

disconnected clients are present in the neighborhood.

4.5.1 The Client Conduit Protocol

We now discuss our Client Conduit protocol that allows a disconnected client to be di-

agnosed by a DS via one of the connected clients. Client Conduit achieves its efficiency

(of not penalizing connected clients) by exploiting two operational facts about the IEEE

802.11 protocol. First, even when a client is associated to an AP, it continues to re-

ceive beacons from neighboring APs or ad hoc networks at regular intervals. Second,

a connected client can send directed or broadcast Probe Requests without disconnect-

ing from the infrastructure network. We now present the Client Conduit protocol for a

scenario where a disconnected client D is in the vicinity of a connected client C (see

Figure 4.3). In the following description, we refer to the first 4 steps of the protocol as

the Connection Setup phase and the last step as the Data Transfer phase; a condensed sketch of the exchange appears after the numbered steps below.

Figure 4.3: Client Conduit Mechanism (Steps 1 through 5 are described below)

1. The DC on the disconnected client D configures the machine to operate in promis-

cuous mode. It scans all channels to determine if any nearby client is connected

to the infrastructure network. If it detects such a connected client on a channel,

it starts a new infrastructure or an ad hoc network on the channel on which it

detected the client’s packets. For the reasons discussed in Section 4.4.1, and for

the simplicity of exposition, we assume that client D switches mode to become an

AP and starts an infrastructure network.1

2. This newly-formed AP at D now broadcasts its beacon like a regular AP, with an

SSID of the form “SOS HELP <num>” where num is a 32-bit random number to

differentiate between multiple disconnected clients.


1 By examining the ToDS and FromDS fields of IEEE 802.11 data frames [58], client
D can determine whether the data packet is part of an infrastructure network and is being
sent to/from an AP.

3. The DC on the connected client C detects the SOS beacon of this new AP. At

this point, C needs to inform D that its request has been heard and it can stop

beaconing. If client C tries to connect to D, it would need to disconnect from

the infrastructure network, thereby hurting the performance of C’s applications.

Instead, we utilize the “active scanning” mechanism of IEEE 802.11 networks —

C sends a Probe Request of the form “SOS ACK <num>” to D. Note that the

Probe Request is sent with a different SSID than the one being advertised by

the AP running on D. This approach prevents some other nearby client that is

not involved in the Client Conduit protocol from inadvertently sending a Probe

Request to D (as part of that client's regular scans for new APs in its

environment).

4. When D hears this Probe Request (and perhaps other requests as well), it stops

being an AP, and becomes a station again. Note that in response to the Probe

Request, a Probe Response is sent out by D; client C now knows that it does

not need to send more Probe Requests (it would have stopped anyway when D’s

beacons stopped). More importantly, D’s Probe Response indicates if D would

like to use client C as a hop for exchanging diagnostic messages with the DS.

This response mechanism ensures that if multiple connected clients try to help D,

only one of them is chosen by D for setting up the conduit with the DS.

5. Now D starts an ad hoc network and C joins this network via MultiNet [30]. At

this point, C becomes a conduit for D’s messages and D can exchange diagnostic

messages with the DS through C.
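
The condensed sketch below captures both sides of this exchange. The card-control primitives (start_ap, wait_for_beacon, send_probe_request, and so on), the underscore-separated SSID formatting, and the ad hoc SSID used in step 5 are illustrative stand-ins for the driver and Native WiFi calls used by the prototype.

    import random

    def disconnected_client(card):
        card.set_promiscuous(True)
        channel = card.find_channel_with_infrastructure_traffic()     # step 1
        if channel is None:
            return None                                               # nobody nearby to help
        num = random.getrandbits(32)
        card.start_ap(ssid="SOS_HELP_%d" % num, channel=channel)      # step 2: beacon for help
        probe = card.wait_for_probe_request("SOS_ACK_%d" % num)       # a helper answered (step 3)
        card.stop_ap()                                                # step 4: stop beaconing
        card.send_probe_response(probe, accept=True)                  # choose this helper
        return card.start_adhoc("SOS_DIAG_%d" % num)                  # step 5: data-transfer network

    def connected_client(card, multinet):
        beacon = card.wait_for_beacon(prefix="SOS_HELP_")             # heard while still associated
        num = beacon.ssid.split("_")[-1]
        card.send_probe_request("SOS_ACK_%s" % num)                   # step 3: ack without disassociating
        if card.wait_for_probe_response().accept:
            multinet.join_adhoc("SOS_DIAG_%s" % num)                  # step 5: relay D's traffic to the DS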

The key advantage of the Client Conduit protocol is that connected clients do not

experience unnecessary overheads during normal operation. Their overheads during the

execution of the protocol are discussed later in this section.



It is important to note that the Client Conduit mechanism can also be used for boot-

strapping clients. For example, suppose that a client tries to access a wireless network

for the first time and does not have EAP-TLS certificates, but has other credentials such

as Kerberos credentials; Client Conduit can be used to authenticate the user/machine

with the backend Radius/Kerberos servers. New certificates can then be installed on the

client machine; similarly, a client’s expired certificates can also be refreshed without

requiring a wired connection.

It is possible that a client D is within the range of an AP and is disconnected because

of IEEE 802.1x authentication problems [24]. Client Conduit can be used if a connected

client is in range as well. If there is no such client, one can dynamically configure the

AP to allow D’s diagnostic messages to the back end DS (or to the RADIUS servers

who can forward to the DS) via the uncontrolled port [57].

4.5.2 Client Conduit Security and Attacks

We must ensure that the Client Conduit protocol does not introduce any new security

leaks or opportunities for denial-of-service attacks in the system. To ensure that a mali-

cious/unauthorized client does not obtain arbitrary access to the network, the connected

client allows a disconnected client’s packets to be exchanged only with the DS or back-

end authentication servers.

We now discuss two potential abuses of Client Conduit: hurting the performance of

helping clients and disguising a Rogue AP as a disconnected client.

Performance Degradation of Helping Clients

When a connected client C helps a disconnected client via Client Conduit, we need to

ensure that C’s application’s performance is not adversely affected. During the Con-

nection Setup part of Client Conduit, the connected client C only needs to process

the beacon message and send/receive probe messages; no messages are forwarded

by C on the disconnected client’s behalf. These steps not only consume negligible re-

sources on C but they also do not result in any security leak or compromise on C; of

course, C can further rate-limit or stop performing these steps if it discovers that the

disconnected client is making it perform these steps often.

We now consider the Data Transfer part of the protocol for possible security and

denial-of-service attacks. Switching to MultiNet mode can consume bandwidth at the

connected client [30]. There are two problems that need to be addressed. First, a mali-

cious client should not be allowed to waste a connected client C’s resources by making

it enter MultiNet mode unnecessarily. Second, even when helping a legitimate client, C

should be able to control the amount of resources that it wants to allocate for the discon-

nected client D during the MultiNet transfer. The second problem can be addressed by

providing a knob to the client that allows it to limit the percentage of time that it spends

on the ad hoc network relative to the infrastructure network; client C may also limit

this usage to save battery power. Section 4.8.2 characterizes the disconnected client’s

performance overheads due to this tradeoff.

To prevent the first problem due to malicious clients, we add the following authen-

tication step before Data Transfer to ensure that only legitimate clients are allowed to

connect via client C.

After the Connection Setup phase, client C switches to MultiNet mode for perform-

ing authentication. To prevent a denial-of-service (DoS) attack where C is forced into

MultiNet mode repeatedly, C can limit the number of times per minute that it performs

such an authentication step. As part of the authentication step, client C obtains the

EAP-TLS machine certificate from the disconnected client and validates it (for ensuring

mutual authentication, client D can perform these steps as well). If the disconnected

client has no certificates or its certificates have expired, client C acts as an intermediary

for running the desired authentication protocol, e.g., C could help D perform Kerberos

authentication from the back end Kerberos servers and obtain the relevant tickets. If

the disconnected client D still cannot authenticate, C asks D to send the last (say) 10

KBytes of its diagnosis log to C and C forwards this log to the DS. To prevent a possible

DoS attack in which a malicious client tries to send this unauthenticated log repeatedly

(e.g., while spoofing different MAC addresses), the connected client can limit the total

amount of unauthenticated data that it sends in a fixed time period, e.g., C could say that

it will send at most 10 KBytes of such data every 5 minutes.
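
A simple quota of this kind could be enforced as in the sketch below; the 10 KByte and 5 minute figures are the example values from the text, and the class itself is illustrative rather than part of the prototype.

    import time

    class UnauthenticatedForwardQuota:
        """Caps how much unauthenticated data a helping client forwards per window."""

        def __init__(self, max_bytes=10 * 1024, window_seconds=300):
            self.max_bytes = max_bytes
            self.window = window_seconds
            self.window_start = time.time()
            self.bytes_sent = 0

        def allow(self, nbytes):
            now = time.time()
            if now - self.window_start >= self.window:
                self.window_start, self.bytes_sent = now, 0    # start a fresh window
            if self.bytes_sent + nbytes > self.max_bytes:
                return False                                   # refuse to forward for now
            self.bytes_sent += nbytes
            return True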

Preventing Disguised Rogue APs

As discussed in Section 4.2, unauthorized APs are a serious security problem in an

enterprise wireless network. An attacker who wants to set up an unauthorized AP and

remain undetected may try to exploit the properties of Client Conduit. The attacker’s

AP can be set up to beacon with an SOS SSID; our Rogue AP detection mechanism

(Section 4.6.3) will assume that this beaconing device is actually a disconnected client

and not declare it as a Rogue AP. Thus, we need to distinguish between the cases where

the beaconing device is a legitimate client and where it is actually a Rogue AP.

In Client Conduit, when a disconnected client becomes an AP or starts an ad hoc

network during the Connection Setup and starts beaconing, it does not send or receive

any data packets. Thus, if a DC ever detects an AP (or a node in ad hoc mode) that is

beaconing the SOS SSID and sending/receiving data packets, the DC can immediately

flag it as a Rogue device. There is another test that can be used to detect such a Rogue

device: when the helping client hears the Probe Response in step 4 of the Client Conduit

protocol, it knows that the disconnected client no longer needs to beacon. Thus, if the

helping client continues to hear the SOS beacons after a few seconds, it can flag the

device as a disguised Rogue device.
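
The two checks can be summarized by the following sketch; the observation record and its field names are illustrative, and the grace period stands in for the "few seconds" mentioned above.

    from collections import namedtuple

    Observation = namedtuple(
        "Observation", "ssid data_frames_seen probe_response_heard_at last_beacon_at")

    SOS_PREFIX = "SOS_HELP"
    GRACE_SECONDS = 5.0          # assumed allowance after the Probe Response is heard

    def is_disguised_rogue(obs):
        if not obs.ssid.startswith(SOS_PREFIX):
            return False
        # Rule 1: a genuine disconnected client never exchanges data while beaconing SOS.
        if obs.data_frames_seen > 0:
            return True
        # Rule 2: once a helper's Probe Request has been answered, the SOS beacons must stop.
        if (obs.probe_response_heard_at is not None and
                obs.last_beacon_at > obs.probe_response_heard_at + GRACE_SECONDS):
            return True
        return False

    # A device that keeps sending data frames while beaconing an SOS SSID is flagged.
    print(is_disguised_rogue(Observation("SOS_HELP_1234", 17, None, 42.0)))   # True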

4.6 Fault Detection and Diagnosis

This section discusses our techniques for detecting and diagnosing faults in a IEEE

802.11 wireless network. Section 4.6.1 describes a simple technique for locating dis-

connected clients. Section 4.6.2 presents our mechanisms for isolating performance

problems and Section 4.6.3 describes how we detect rogue access points.

4.6.1 Locating Disconnected Clients

The ability to locate disconnected wireless clients automatically in a fault diagnosis

system is valuable for proactively determining problematic regions in a deployment,

e.g., poor coverage or high interference (locating RF holes) or for locating possibly

faulty APs. A disconnected client can determine that it is in an RF hole if it does not hear

beacons from any AP (as opposed to being disconnected due to some other reason such

as authentication failures). To approximately locate disconnected clients (and hence

help in locating RF holes), we now discuss a technique called Double Indirection for

Approximating Location or DIAL.

As discussed earlier, when a client D discovers that it is disconnected, it becomes

an AP or starts an ad hoc network and starts beaconing. To determine the approximate

location of this client, nearby connected clients hear D’s beacons and record the sig-

nal strength (RSSI) of these packets. They inform the DS that client D is disconnected

and send the collected RSSI data. At this point, the DS executes the first step of DIAL

to determine the location of the connected clients: this can be done using any known

location-determination technique in the literature [17, 73]. In the next step of DIAL, the

DS uses the locations of the connected clients as “anchor points” and the disconnected

client’s RSSI data to estimate its approximate location. This step can be performed us-

ing any scheme that uses RSSI values from multiple clients for determining a machine’s

location [17, 25, 73]. Since locating the connected clients results in some error, locating disconnected clients with these anchor points further increases the

error. In Section 4.8.3, we show that this error is approximately 10 to 12 meters, which

is acceptable for estimating the location of disconnected clients.
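
As a deliberately simple illustration of DIAL's second step, the sketch below estimates the disconnected client's position as an RSSI-weighted centroid of the anchor points. Our prototype evaluation uses RADAR for this step (Section 4.8.3), so the weighting function here is only an assumption, not the scheme we actually employ.

def dial_step2(anchors):
    """anchors: list of (x, y, rssi) tuples, where (x, y) is the
    location of a connected client estimated in step 1 of DIAL and
    rssi is the strength at which it heard the disconnected client's
    beacons.  Returns an (x, y) estimate for the disconnected client;
    stronger readings pull the estimate toward that anchor."""
    weights = [max(float(rssi), 1.0) for (_, _, rssi) in anchors]
    total = sum(weights)
    x = sum(w * ax for w, (ax, _, _) in zip(weights, anchors)) / total
    y = sum(w * ay for w, (_, ay, _) in zip(weights, anchors)) / total
    return (x, y)

# Three connected clients, located in step 1, report the RSSI at which
# they heard the disconnected client.
print(dial_step2([(0.0, 0.0, 60), (10.0, 0.0, 30), (0.0, 8.0, 30)]))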

4.6.2 Network Performance Problems

Our design for diagnosing network performance problems comprises two lightweight

components: a proactive/passive monitoring component and a reactive diagnosing com-

ponent. The monitoring component runs in the background at the DC and informs the

diagnosing component when it detects connections experiencing poor performance. At

this point, the diagnosing component analyzes the connections and outputs a report that

gives a breakdown of the delays, i.e., the extent of the delays in the wired and the wire-

less part, and for the latter, a further breakdown into delays at the client, AP, and the

medium. Note that the monitoring component can be conservative in declaring that

network problems are being encountered; a false alarm simply invokes our diagnosing

component. Since this component has low overheads, invoking it has a small impact

on the performance of clients and APs. These components have not been implemented

yet in our current prototype but we have evaluated the effectiveness of some of these

techniques using tools such as AiroPeek and WinDump.



Detecting Network Performance Problems

We focus on diagnosing performance problems for TCP connections since TCP is the

most widely used transport protocol in the Internet. For a TCP connection, we can

passively diagnose performance problems by leveraging the connection’s data and ac-

knowledgment (ACK) packets. For other transport protocols, we can determine end-

to-end loss-rate and round-trip times using either active probing or performance reports

(e.g., RTCP reports [53]).

Network performance problems can manifest themselves in different ways, such as

low throughput, high loss rate, and high delay. We do not use throughput as a metric for

detecting a problem since it is dependent on the workload (i.e., the client’s application

may not need a high throughput) and on specific parameters of the transport protocol

(e.g., initial window size, sender and receive window size in TCP). Instead, we use

packet loss rate and round-trip time for detecting performance problems.

Estimating the round trip time (RTT) in a TCP connection is simple: if the client is

a sender, it already keeps track of the RTT; if the client is a receiver, it can apply the

heuristic proposed in [139] to estimate the round-trip time.

To estimate the loss rate, we use heuristics suggested in [47] and [10] on the client

side. We compute different loss rates for packets sent and received by the client. For data

packets sent by the client, the loss rate is estimated as the ratio of retransmitted packets

to the packets sent over the last L RTTs [10]. This estimation mechanism assumes that

the TCP implementation uses Selective ACKs so that loss rate is not overestimated un-

necessarily; this is a reasonable assumption since a number of operating systems now

support this option by default, e.g., Windows, Linux, Solaris. As shown in [10], this

estimate can be higher than the actual loss rate when timeouts occur in a TCP connec-

tion. For our purposes, this inaccuracy is acceptable for two reasons: first, if a TCP

connection is experiencing timeouts, it is probably experiencing problems and is worth

diagnosing; second, the only consequence of a mistake is to trigger our diagnosis com-

ponent, which incurs low overhead. If more accurate analysis is needed, the LEAST

approach suggested in [10] can be used.

For the data packets received by the client, we use an approach similar to the one

suggested in [47] to estimate the number of losses: if a packet is received such that

its starting sequence number is not the next expected sequence number, the missing

segment is considered lost. The loss rate is estimated as the ratio of lost packets to the

total number of expected packets in the last L RTTs. Note that the expected number

of bytes is calculated as the maximum observed sequence number minus the minimum

during the last L RTTs; we apply the idea in [139] to estimate maximum segment size

(MSS), and estimate the number of packets by dividing the number of bytes by MSS.

Our assumption is that segments are rarely delivered out-of-order in a TCP connection

(which has been observed by others [22]).
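
The two loss-rate heuristics reduce to simple ratios over counters maintained for the last L RTTs. The sketch below is illustrative only; the function and argument names are ours.

def sender_loss_rate(retransmitted_pkts, sent_pkts):
    """Loss rate for data the client sends: retransmissions over packets
    sent in the last L RTTs (assumes SACK, so retransmissions roughly
    correspond to losses)."""
    if sent_pkts == 0:
        return 0.0
    return retransmitted_pkts / float(sent_pkts)

def receiver_loss_rate(missing_segments, max_seq, min_seq, mss):
    """Loss rate for data the client receives: segments whose starting
    sequence number was not the next expected one count as lost, and
    the expected packet count is the byte range observed over the last
    L RTTs divided by the estimated MSS."""
    expected_pkts = max(1, (max_seq - min_seq) // mss)
    return missing_segments / float(expected_pkts)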

Our detection component triggers the diagnosis component if a connection is very

lossy or it experiences high delay. A connection is detected as experiencing high delays

if the RTT of a particular packet is more than 250 msec or is higher than twice the current

TCP RTT [140]. To avoid invoking our diagnosis algorithm for high delays that occur

temporarily, we flag a connection only when D or more packets experience a high delay.

A connection is classified as lossy if its loss rate (for transmitted or received packets) is

higher than 5% [96, 140].
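
Combining the loss-rate estimates with the delay thresholds, the trigger itself can be sketched as follows; D and the 250 msec and 5% constants are the parameters discussed in the text, while the function name is ours.

HIGH_DELAY_MSEC = 250   # absolute per-packet RTT threshold
LOSS_THRESHOLD = 0.05   # a connection is lossy above 5%

def connection_needs_diagnosis(rtt_samples_msec, srtt_msec, loss_rate, D):
    """rtt_samples_msec: per-packet RTTs seen over the last L RTTs;
    srtt_msec: the connection's current smoothed TCP RTT.  Flags the
    connection if D or more packets saw a high delay, or if the
    sender- or receiver-side loss rate exceeds 5%."""
    high = [r for r in rtt_samples_msec
            if r > HIGH_DELAY_MSEC or r > 2 * srtt_msec]
    return len(high) >= D or loss_rate > LOSS_THRESHOLD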

Both D and L are configurable parameters and each represents a tradeoff between

responsiveness of the detection component and unnecessary invocation of the diagnosis

component. That is, with a low value of D or L, any change in delays/losses will be

detected quickly but it may also result in invoking the diagnosis component unnecessar-

ily. For high values, apart from slow responsiveness, another problem occurs: the TCP

connection may end before a sufficient number of samples have been collected. Such a sit-

uation can occur with short Web transfers. We can alleviate this problem by aggregating

loss rate and delay information between the client and remote hosts across TCP con-

nections. We are currently exploring such techniques along with choosing appropriate

values of D and L.

Isolating Wireless and Wired Problems

When the DC at a client detects a network performance problem for a TCP connection, it

communicates with its associated DAP to differentiate between the delays on the wired

and wireless parts of the path. The DAP then starts monitoring the TCP data and ACK

packets for that client’s connection. If the DC is the sender in the TCP connection,

the DAP computes the difference between the received time of a data packet from the

client to the remote host and the corresponding TCP ACK packet; this time difference

is an estimate of the delay incurred in the wired network. To ensure that the roundtrip

time estimate is reasonable, various heuristics used by TCP need to be applied to these

roundtrip measurements as well, e.g., Karn’s algorithm [117]. The DAP sends this

estimate to the DC who can now determine the wireless part of the delay by subtracting

this estimate from the TCP roundtrip time. A similar approach can be used to compute

this breakdown when the client is a receiver: the DAP determines the wireless delay by

monitoring the data packets from the remote host to the client and the corresponding

ACK packets. Note that the amount of state maintained at the DAP is small since it

corresponds to the number of unacknowledged TCP packets; this can be reduced further

by sampling.
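
The DAP-side bookkeeping for this breakdown can be sketched as follows, for the case in which the DC is the sender. The trace format and function names are hypothetical, and the Karn's-algorithm filtering mentioned above is omitted for brevity.

def wired_delay_samples(dap_trace):
    """dap_trace: (direction, num, timestamp) records captured at the
    DAP for one connection in which the DC is the TCP sender.  For data
    packets heading to the remote host, num is the sequence number the
    matching ACK will carry; for ACKs coming back, num is the
    acknowledgment number.  Samples from retransmitted segments should
    be discarded (Karn's algorithm); that filtering is omitted here."""
    outstanding = {}   # expected ack number -> time data passed the DAP
    samples = []
    for direction, num, ts in dap_trace:
        if direction == "to_remote":
            outstanding.setdefault(num, ts)
        elif direction == "from_remote" and num in outstanding:
            samples.append(ts - outstanding.pop(num))
    return samples

def wireless_delay(tcp_rtt, wired_estimate):
    # The DC subtracts the DAP's wired estimate from its own TCP RTT.
    return max(0.0, tcp_rtt - wired_estimate)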

Diagnosing Wireless Network Problems

A client may experience poor wireless performance due to a number of reasons, such

as an overloaded processor at the AP or the client, problems in the wireless medium,

some driver or other kernel issues at either the AP or the client. We quantify the effect

of these problems by observing their impact on packet delay in the wireless network

path. We group these performance problems into three categories: packet delay at the

client, packet delay at the AP, and packet delay in the wireless medium. In this sec-

tion, we describe a collaborative scheme, called Estimating Delay using Eavesdropping

Neighbors or EDEN, which leverages the presence of other clients to quantify the delay

experienced in each of the above categories. Since electromagnetic waves travel at the

speed of light, we can safely assume that RF propagation delays are negligible relative

to the client or AP delays.

When a client D’s performance diagnosis component is triggered, it starts broad-

casting packets asking for diagnosis help from nearby clients. All clients who hear

these packets switch to promiscuous mode and ask the DAP to start the diagnosis (Sec-

tion 4.8.1 shows that the CPU overheads of entering promiscuous mode are low on

modern processors). Security mechanisms similar to the ones discussed in Section 4.5.2

can be used to prevent attacks on these clients. Note that we use multiple snooping

clients in EDEN primarily for robustness: multiple clients increase the likelihood that at least one client hears the EDEN protocol requests and responses discussed below.

EDEN proceeds in two phases. In the first phase, the DAP to which client D is as-

sociated estimates the delay at D. The DAP periodically (say every 2 seconds) sends

Snoop request packets to client D. When D receives a Snoop request packet, it imme-

diately replies with a Snoop response message. The eavesdropping clients log the time

when they hear a Snoop request and the first attempt by D to send the corresponding

Snoop response packet, i.e., we only record the times of response packets for which the

retransmission bit is clear. If an eavesdropping client misses either of these packets,

it ignores the timing values for that request/response pair. The difference between the

recorded times is the client delay, i.e., application and OS delays experienced by D after

receiving the request packet. For robustness, Snoop requests are sent a number of times

(say 20); the client and AP delays are averaged over all these instances.

In the second phase, a similar technique is used to measure the AP delay, i.e., client D

sends the Snoop request packets and the AP sends the responses. Client D also records

the round trip times to the AP for these Snoop requests and responses along with the

number of request packets for which it did not receive a response, e.g., the request or

response was lost.

Strictly speaking, this client and AP delay also includes the delay due to contention

experienced in the wireless medium. In Section 4.8.4, we discuss the extent of inaccu-

racy introduced in EDEN's estimates due to traffic congestion.

At the end of the protocol, all the eavesdropping clients send the AP and client

delay times to the client D. The difference between the round trip time reported by D,

and the sum of the delays at the client and the AP, approximates the sum of the delay

experienced by the packet in the forward and backward wireless link. The client can then

report the client/AP/medium breakdown to the network administrator; it can also report

the percentage of unacknowledged request packets as an indicator of the network-level

loss rate on the wireless link.
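
The arithmetic at the end of the protocol is simple. The sketch below assumes the eavesdroppers have already discarded request/response pairs in which a packet was missed or the response carried the retry bit, as described above; the function and argument names are ours.

def _avg_delay(pairs):
    # pairs: (time request heard, time first response heard) tuples
    return sum(resp - req for req, resp in pairs) / float(len(pairs))

def eden_breakdown(rtt_at_d, client_pairs, ap_pairs):
    """rtt_at_d: average Snoop round-trip time measured by client D in
    the second phase.  client_pairs / ap_pairs: request/response timing
    pairs logged by an eavesdropper for the phases in which client D
    and the AP, respectively, were the responders."""
    client_delay = _avg_delay(client_pairs)
    ap_delay = _avg_delay(ap_pairs)
    # Whatever is left of D's round trip is attributed to the medium
    # (forward plus backward wireless link).
    medium_delay = max(0.0, rtt_at_d - client_delay - ap_delay)
    return client_delay, ap_delay, medium_delay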

4.6.3 Rogue AP Detection

As discussed in Section 4.2, Rogue APs are unauthorized APs that have been connected

to an Ethernet tap in an enterprise or university network; such APs can result in security

holes, and unwanted RF and network load. Rogue APs are considered a major security

issue for enterprise wireless LANs [5, 7, 36].

Our architectural framework of using clients and (if possible) APs to monitor the

environment around them naturally lends itself for detecting Rogue APs. Our basic

approach is to make clients and DAPs collect information about nearby access points

and send it to the DS. When the DS receives information about an AP X, it checks the

AP location database and ensures that X is a registered AP in the expected location and

channel.

Assumptions

We assume that all Rogue APs and the corresponding connected “rogue” clients use

off-the-shelf IEEE 802.11-compliant hardware. Our approach essentially “raises the

bar” such that non-compliant APs with low-level modifications are needed to defeat our

scheme: to avoid detection, an attacker must modify the Rogue AP to not beacon and

not respond to probe requests. Of course, an attacker can simply use a proprietary access

point or one with different technology, e.g. HIPERLAN. Detecting such intruders re-

quires special hardware and is not our goal. We simply want a low-cost mechanism that

addresses the (common case) Rogue AP problem being faced in current deployments:

for many network administrators, the main goal is to detect APs inadvertently installed

by employees for experimentation or convenience [24]. As part of future research, we

will investigate the detection of non-compliant Rogue access points and clients as well.

If two companies have neighboring wireless networks, our mechanisms will clas-

sify the other companies’ access points as Rogue APs. If this classification is unaccept-

able, the network administrators of the respective companies can share their AP location

databases.

Monitoring at clients and APs

In our system, each DC monitors the packets in its vicinity (non-promiscuous mode),

and for each AP that it detects, it sends a 4-tuple < MAC address, SSID, channel, RSSI

> to the DS. Essentially, the 4-tuple uniquely identifies an AP in a particular location

and channel. To get this information, a DC needs to determine the MAC addresses of

all APs around it.

The DC can obtain the MAC address of an AP by switching to promiscuous mode

and observing data packets (it can use the FromDS and ToDS bits in the packet to deter-

mine which address belongs to the AP). However, we can achieve the same effect using

a simpler approach: since IEEE 802.11 requires all APs to broadcast beacons at regular

intervals, the DC can obtain the MAC addresses from the APs’ beacons from all the

APs that it can hear. In Section 4.8.5, we show that a DC not only hears beacons on its

channel but it may also hear beacons from overlapping channels as well; this property

increases the likelihood of a Rogue AP being detected.

To ensure that we do not miss a Rogue AP even if no client is present on any chan-

nel overlapping with the AP, we use the Active Scanning mechanism of the IEEE 802.11

protocol: when a client wants to find out what APs are nearby, the client goes to each

of the 11 channels (in 802.11b), sends Probe Requests and waits for Probe Responses

from all APs that hear those Probe Requests; from these responses, the DC can obtain

the APs’ MAC addresses. Every IEEE 802.11-compliant AP must respond to such re-

quests and in some chipsets [86], no controls are provided to disable this functionality.

Consistent with our framework, we use the Busy AP Optimization (see Section 4.4.3) so

that active scans in an AP’s vicinity are performed by the AP only when it has no client

associated with it.



Analysis at the DS

When the DS receives information for an AP from various clients, it uses D IAL to esti-

mate the AP’s approximate location based on these clients’ locations and the AP’s RSSI

values from them.

The DS classifies an AP as rogue if a 4-tuple does not correspond to a known legal

AP in the DS’s AP location database, i.e., if the MAC address is not present in the

database, or if the AP is not in the expected location, or the SSID does not correspond

to the expected SSID(s) in the organization. Note that if an AP’s SSID corresponds

to an SOS SSID, the DS skips further analysis since this AP actually corresponds to a

disconnected client that is executing the Connection Setup phase of the Client Conduit

protocol. The channel information is used in a slightly different way. As stated above, if

an AP is on a certain channel, it is possible to be heard on overlapping channels. Thus,

an AP is classified as rogue only if it is reported on a channel that does not overlap

with the one on which it is expected. Note that if the channel on an AP is changed, the

DAP can ask the DS to update its AP location database (recall that the communication

between the DAP and the DS is authenticated; if the AP is a legacy AP, the administrator

can update the AP location database when the AP’s channel is changed). The checks

that the DS executes are summarized in Figure 4.4.

A Rogue AP, say R, may try MAC address spoofing to avoid being detected, i.e.,

send packets using the MAC address of an authorized AP, say G. However, the DS can

still detect R as it will reside in a different location or channel than G (if it is on the

same channel and location, G would immediately detect it). Our approach also detects a

Rogue AP that does not broadcast an SSID in its beacons since a DC can still obtain the

AP’s MAC address. Of course, we can detect such unauthorized APs in an even simpler

manner by disallowing APs that do not broadcast SSIDs in an enterprise LAN.


[Figure 4.4 is a flowchart of the checks described above: is the MAC registered, is the AP in the expected location, is it on the expected (or an overlapping) channel, and is it advertising the expected SSID; failing these checks leads to "Rogue AP detected".]

Figure 4.4: Decision steps taken by the DS to determine if an AP is a Rogue AP or not
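
The checks summarized in Figure 4.4 translate into a short classification routine. The sketch below is only illustrative: the database layout, location-error threshold, and channel-overlap helper are our assumptions, not the prototype's code.

def _distance(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def _overlapping(ch_a, ch_b, spread=4):
    # 802.11b channels within a few channel numbers of each other
    # overlap; the exact spread is hardware-dependent (an assumption).
    return abs(ch_a - ch_b) <= spread

def classify_ap(report, ap_db, sos_prefix="SOS_", max_loc_error=15.0):
    """report: (mac, ssid, channel, estimated_location) for an AP that
    was overheard; ap_db maps MAC -> (expected_location,
    expected_channel, valid_ssids).  Returns True if the AP is rogue."""
    mac, ssid, channel, location = report
    if ssid.startswith(sos_prefix):
        return False    # a disconnected client running Client Conduit
    if mac not in ap_db:
        return True     # unregistered MAC address
    expected_loc, expected_channel, valid_ssids = ap_db[mac]
    if _distance(location, expected_loc) > max_loc_error:
        return True     # registered MAC heard far from its AP
    if ssid not in valid_ssids:
        return True     # SSID not used by the organization
    if not _overlapping(channel, expected_channel):
        return True     # heard on a channel that cannot be leakage
    return False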

Thus, given the above strategy, an unauthorized AP may stay undetected for a short

time by spoofing an existing AP X near X’s location, beacon a valid SSID in the or-

ganization, and stay on a channel on which no DC or AP can overhear its beacons.

However, when a nearby client performs an active scan, the Rogue AP will be detected;

as we show in Section 4.8.5, a DC can easily perform such a scan every 5 minutes.

4.7 Implementation

We now describe the details of our fault diagnosis implementation. We have imple-

mented the basic architecture consisting of the DC, DAP and DS daemons; the authenti-

cation and logging mechanisms have not been implemented. We have also implemented

the Client Conduit protocol and the Rogue AP detection mechanism. The support for

D IAL and E DEN is currently being added.

Our system has been implemented on the Windows operating system with Netgear

MA 521 802.11b cards. On the DS, we simply run a daemon process that accepts

information from DAPs. The DS reads the list of legitimate APs from a file; support

for reading this information from a database can be easily added. The structure of the

code on the DC or DAP consists of a user-level daemon and kernel level drivers (see

Figure 4.5). These pieces are structured such that code is added to the kernel drivers only

if the functionality cannot be achieved in the user-level daemon or if the performance

penalty is too high.

[Figure 4.5 shows the software components on the DC and DAP: a user-mode Diagnostics Daemon, and in kernel mode the TCP/IP stack, a Diagnostics IM Module inside the Native WiFi IM Driver, NDIS, a Diagnostics Miniport Module inside the Native WiFi Miniport Driver, and the Native WiFi NIC.]

Figure 4.5: Components on DC and DAP



Kernel drivers: There are two drivers in our system — a miniport driver and an inter-

mediate driver (IM driver) called the Native WiFi driver [86].

The miniport driver communicates directly with the hardware and provides basic

functionalities such as sending/receiving packets, setting channels, etc. It exposes suffi-

cient interfaces such that functions like association, authentication, etc. can be handled

in the IM driver.

The IM driver supports a number of interfaces (exposed via ioctls) for querying various

parameters such as the current channel, transmission level, power management mode,

SSID, etc. In addition to allowing the parameters to be set, it allows the user-level code

to request active scans, associate with a particular SSID, capture packets, etc. In

general, it provides a significant amount of flexibility and control to the user-level code.

Even though some of the required operations were already present in the IM driver,

we still had to make significant modifications to expose certain functionalities and to

improve performance of our protocols. The miniport driver was changed to expose

certain packet types to the IM driver. In the IM driver, we added the following support:

• Capturing packet headers and packets: We allow filters to be set such that only

certain packets or packet headers are captured, e.g., filters based on specific MAC

addresses, packet types, packet subtypes (such as management and beacon pack-

ets), etc.

• Storing the RSSI values from received packets: We obtain the RSSI value of ev-

ery received packet and maintain a table called the NeighborInfo table that keeps

track of the RSSI value from each neighbor (indexed on the MAC address). We

maintain an exponentially weighted average with the new value given a weighting factor of 0.25 (a minimal sketch of this bookkeeping appears after this list). The RSSI information is needed for estimating the location of disconnected clients and APs using DIAL.



• Keeping track of AP information: In the NeighborInfo table, we keep track of

the channels on which packets were heard from a particular MAC address, SSID

information (from beacons), and whether the device is an AP or a station. This

information needs to be sent to the DAP/DS for Rogue AP detection.

• Kernel event support for protocol efficiency: We added an event that is shared

between the kernel and user-level code. The kernel triggers this event when an

“interesting” event occurs; this allows some of our protocols to be interrupt-driven

rather being polling-based. Currently, the kernel sets this event whenever it hears

an SOS beacon from a disconnected client during Client Conduit, thereby result-

ing in a lower protocol latency (see Section 4.8.2).

• We added a number of ioctls to get and clear the information discussed above.
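
Below is a minimal sketch of the NeighborInfo bookkeeping referred to in the list above. The dictionary layout and field names are ours; the 0.25 weighting factor is the one stated above.

NEW_SAMPLE_WEIGHT = 0.25        # EWMA weight given to the new sample

neighbor_info = {}              # MAC address -> per-neighbor record

def update_neighbor(mac, rssi, channel, ssid, is_ap):
    rec = neighbor_info.setdefault(mac, {
        "rssi": float(rssi),    # seeded with the first sample
        "channels": 0,          # bitmap of channels the MAC was heard on
        "ssid": ssid,
        "is_ap": is_ap,
    })
    # Exponentially weighted moving average of the RSSI.
    rec["rssi"] = (NEW_SAMPLE_WEIGHT * rssi
                   + (1 - NEW_SAMPLE_WEIGHT) * rec["rssi"])
    rec["channels"] |= 1 << (channel - 1)
    rec["ssid"] = ssid or rec["ssid"]
    rec["is_ap"] = is_ap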

Fault Diagnostic daemon: This daemon gathers information and implements various

mechanisms discussed in this chapter, e.g., collect MAC addresses of APs for Rogue

AP detection, perform Client Conduit, etc. If the device is an AP, it communicates diag-

nostic information with the DS and the DCs; if the device is just a DC, it communicates

with its associated AP to convey the diagnostic information.

The Diagnostic daemon on the DC obtains the current NeighborInfo table from the

kernel every 30 seconds. If any new node has been discovered or if the existing data has

changed significantly (e.g., RSSI value of a client has changed by more than a factor

of 2), it is sent to the DAP. The DAP also maintains a similar table indexed on MAC

addresses. However, it only sends information about disconnected clients and APs to

the DS; otherwise, the DS would end up getting updates for every client in the system,

making it less scalable. The DAP sends new or changed information about APs to the

DS periodically (30 seconds in our current prototype). Furthermore, if the DAP has any

pending information about a disconnected client D, it informs the DS immediately so

that D can be serviced in a timely fashion.

All messages from the DC to the DAP and DAP to the DS are sent as XML mes-

sages. A sample message format from the DC is shown below (timestamps have been

removed):

<DiagPacket Type="RSSIInfo" TStamp="...">

<Clients TStamp="...">

<MacInfo MAC="00:40:96:27:dd:cc" RSSI="23"

Channels ="19" SSID="" TStamp="..."/>

</Clients>

<Real-APs TStamp="...">

<MacInfo MAC="00:20:a6:4c:c7:85" RSSI="89"

Channels="12" SSID="UNIV_LAN" TStamp="..."/>

<MacInfo MAC="00:20:a6:4c:bb:ad" RSSI="7"

Channels="10" SSID="EXPER" TStamp="..."/>

</Real-APs>

<Disconnected-Clients TStamp="...">

<MacInfo MAC="00:40:96:33:34:3e" RSSI="57"

Channels="2048" SSID="SOS_764" TStamp="..."/>

</Disconnected-Clients>

</DiagPacket>

As the sample message shows, the DC sends information about other connected

clients, APs, and disconnected clients. For each such class of entities, it sends the MAC

address of a machine along with RSSI, SSID, and a channel bitmap which indicates the

channels on which the particular device was overheard.



4.8 System Evaluation

We now evaluate our mechanisms and show that they are not only effective but they

also impose low overheads. For the basic architecture evaluation, Client Conduit, and

Rogue AP detection, we use our prototype. To demonstrate the effectiveness of EDEN and DIAL, we use a combination of tools such as AiroPeek [132] and WinDump [134].

Section 4.8.1 presents the timings for individual operations that are used by our pro-

tocols. Section 4.8.2 presents the breakdown of the costs involved in the Client Conduit

mechanism and shows that it can be used to help disconnected clients in a timely manner.

Section 4.8.3 shows the effectiveness of our DIAL technique for locating disconnected clients. In Section 4.8.4, we evaluate the effectiveness of the EDEN technique to isolate performance problems. Section 4.8.5 shows that the scanning requirements of our Rogue AP detection mechanism impose low overheads on client machines. Finally, in Section 4.8.6, we discuss scalability issues with respect to the Client Conduit protocol, DIAL, EDEN, and Rogue AP detection mechanisms.

4.8.1 Cost of Individual Operations

To better understand the cost of various operations involved in our detection and di-

agnosis mechanisms (e.g., Client Conduit), we ran a series of micro-benchmarks. We

believe that these numbers are valuable for other researchers for modeling purposes as

well. Table 4.2 shows the results. Note that the cost of changing a machine from AP to

Station mode is less than 2 seconds (731 msecs for the actual change and then waiting

for a few hundred msecs as specified by the hardware specifications).

Additionally, we ran some experiments to understand the overheads of placing a card

in promiscuous mode. We first ran an experiment with 4 machines, A, B, C, and D to

determine if placing a machine in promiscuous mode has any effect on the machine’s

Table 4.2: Times for different operations: U means time measured from user-level

code; rest are times taken for the corresponding ioctl to complete

Operation Time (msecs) Std. dev

Mostly No-op Ioctl (U) 0.096 0.0008

RPC-based Ioctl (U) 5.72 0.29

Set channel 177.56 7.52

Set beacon period 71.43 7.73

Set AP/STA mode 731.77 232.53

Active Scan 1901.04 14.73

Set SSID 64.73 5.47

incoming/outgoing bandwidth. We set up the machines such that machine A did a TCP

transfer to C at full blast and B performed a full blast TCP transfer to D. The experiment

was performed three times; in each case, machine C was placed in normal mode first

and then in promiscuous mode. We observed that C’s throughput was largely unaffected

by being in promiscuous mode: C achieved an incoming bandwidth of 254.7 KB/sec

(standard deviation of 63.7 KB/sec) in the normal mode case and a bandwidth of 252.3

KB/sec (standard deviation of 21.7 KB/sec) in the promiscuous mode case.

We ran another experiment to determine a machine’s CPU utilization when it is

placed in promiscuous mode. We ran a full blast TCP transfer between two machines

A and B; during this process, we first placed machine M in normal mode and then in

promiscuous mode. Figure 4.6 shows the CPU overhead for machine M (a 1 GHz Pen-

tium III machine). Even for such a relatively old machine, the CPU overhead of placing

it in promiscuous mode is quite low, mostly staying below 10%; we also observed that

none of the packets were dropped.



Thus, these results show that the CPU overheads on a machine due to promiscuous

mode are reasonably low.

4.8.2 Client Conduit

To measure the performance of the Client Conduit protocol, we set up an experiment with

one AP, one connected client C and a disconnected client D. The connected client is

a 1 GHz Pentium III machine and the disconnected machine is a 800 MHz Pentium III

machine. Both machines have 512 MB of memory and Netgear MA521 802.11b cards.

[Figure 4.6 plots CPU usage (%) against time (secs) for the normal and promiscuous modes.]

Figure 4.6: CPU usage in Promiscuous mode (1 GHz machine)

Figure 4.7 shows the total time taken along with a breakdown of the Connection

Setup part of the protocol. “User time” indicates the end-to-end time taken by our user-

level implementation whereas “Kernel time” indicates the time taken by the relevant

ioctls for the same functionality. The costs in both cases are similar thereby justifying

our approach of implementing only the essential mechanisms at the kernel level and

driving most of the protocol from the user-level (for ease of debugging). In the first

two bars, the user-level daemon at the connected client shares an event with the kernel

which immediately informs the daemon when a disconnected client's beacon is detected

(See Section 4.7). Thus, the disconnected client needs to wait only a short time before it

hears the Probe Request message from the connected client C indicating that C is ready

to help (see the “Get ACK” times). This delay would be much higher if the daemon

obtained the disconnected machine information from the kernel periodically instead of

being interrupt-driven. The third bar shows the delay breakdown for an implementation

where the daemon client polls for this information every 10 seconds from the kernel

(from a disconnected client’s perspective, the “Get ACK” delay is higher).

We now clarify a couple of details about our experiment. First, the initial step of

setting the channel and checking for available clients takes approximately 190 msecs.

In the worst case, the disconnected client may have to scan all channels and check for

connected clients; in that case, this step may take 2-3 seconds. Second, the steps

in which we set the AP/Station mode of the machine take approximately 730 msec;

however, the hardware specifications require that the operating system must wait for a

few hundred milliseconds before using the card in the new mode. For robustness, we

added a one second delay after such a mode change; the figure includes these delays

after each mode change.

From the figure, one can see that the Connection Setup and association time for the

disconnected client is quite reasonable: it takes less than 5 seconds to run the setup and

another 1.9 seconds to associate with a connected client C in ad-hoc mode so that the

MultiNet protocol can be started on C.

After MultiNet starts running on the connected client, the disconnected client can

interact with the DS to diagnose its problems, e.g., transfer certificates or log files to

the DS. To evaluate the time taken to perform these transfers via MultiNet, we ran

an experiment in which a machine D sent files of different sizes (100KB, 500KB and

1MB) to the DS through connected client C. Figure 4.8 shows the time taken when
[Figure 4.7 is a stacked-bar chart of time (msec) for three cases: user time, kernel time, and user time with polling. The stacked protocol steps are Set channel, Become AP, Sleep 1 second, Set SSID, Set Beacon Period, Get Ack, Become STA, Sleep 1 second, and Adhoc-mode association.]

Figure 4.7: Breakdown of costs for Client Conduit. The protocol steps are executed from the bottom entry in the legend to the topmost, i.e., starting at "Set channel".

the connected client C allows 17-50% of its time to be used for ad hoc mode; client C

stays on the infrastructure network for 500 msecs, and the time on the ad-hoc network is

varied between 100 and 500 msecs. In our experiment, the time to switch from ad-hoc to

infrastructure mode is 500 msecs and from infrastructure to ad hoc mode is 300 msecs.

As expected, the results show that the file transfer speed is a direct function of the

time a connected client stays in the ad hoc network. We expect that as the switching

delay overhead reduces (as in newer cards) the transfer speeds will improve.

Thus, our results show that Client Conduit allows a disconnected client’s problem to

be reported (and even be resolved, e.g., updating expired certificates) in a few seconds.

4.8.3 Location Determination

We now evaluate the accuracy of locating disconnected clients (or Rogue APs) using our

DIAL scheme described in Section 4.6.1. Unlike previous work on location determina-
[Figure 4.8 plots the time (sec) to transfer 100KB, 500KB, and 1MB files against the fraction of time the connected node spends on the ad hoc network.]

Figure 4.8: Time taken by a disconnected client to transfer data via MultiNet

tion, the location calculated by DIAL incurs extra error since the location of reference

points themselves may not be known accurately.

We evaluated DIAL using RADAR [17], chosen for its simplicity, for locating the disconnected clients from the anchor points; more sophisticated RSSI-based schemes such as the one suggested in [73] can be used for reducing the errors of DIAL even further.

In our experiment, we placed 3 connected clients in 3 offices on the same floor of our

building. We obtained the floor map, and applied the Cohen-Sutherland line-clipping

algorithm [48] to compute the number of walls between each of the three connected

clients and the other rooms. We placed a disconnected client at 7 different locations

while it sent out broadcast packets. We used AiroPeek [132] to measure the RSSI of

the disconnected client’s packets received at the connected machines. We then applied

the equation specified in [17] to compute the wall attenuation factor (WAF). Based on

the WAF, we inferred that the disconnected client is in location X if the predicted signal

strength at X is closest to the observed signal strength at the three connected clients.
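
For reference, the prediction underlying this inference follows a log-distance path-loss law with a wall attenuation term; as we recall it from [17] (our experiment uses the exact equation from that paper), it has the general form

P(d) = P(d_0) - 10 n log(d / d_0) - min(nW, C) * WAF

where P(d) is the predicted signal strength at distance d from the transmitter, P(d_0) is the measured strength at a reference distance d_0, n is the path-loss exponent, nW is the number of intervening walls (computed here with the line-clipping step), and C caps the number of walls whose attenuation is counted.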

We ran the RADAR algorithm on the collected RSSI data for locating the discon-

nected client D using the precise location of the connected clients. We computed the

error in D’s predicted location with respect to its actual location; the “No Error” bar in

Figure 4.9(a) shows this error.

[Figure 4.9 contains two bar charts of the median location error (metres) for the cases No Error, E(1), E(2), E(1,2), E(3), E(1,3), E(2,3), and E(1,2,3).]

(a) Estimated location of connected client is one-room off from its true location

(b) Estimated location of connected client is two-rooms off from its true location

Figure 4.9: Median error in locating disconnected clients. The lower and upper bounds of error bars correspond to min and max error. E(i) denotes that the ith connected client's location contains error.

Then, we ran the algorithm again by assuming that there was an error in estimating

the location of one connected client by a distance of 3.3 meters; this distance corre-

sponds to the average width of a room in our building. For example, if connected client

A was placed in room X, we assume its estimated location to be a neighbouring room Y



when using it as an anchor point in RADAR. The second bar in Figure 4.9(a) shows this

error when such a situation occurs. The rest of the bars show the error in locating the

disconnected client when the location of either one, two or three connected clients is es-

timated incorrectly by one room; Figure 4.9(b) shows the error when estimated location

is off by a distance equivalent to that of two rooms.

The results show that when there is no error in the known location of the connected

clients, the median error is 9.7 meters. This error increases to at most 12 meters when the

estimated location of one or more clients is one or two rooms off from its true location.

Of course, when the estimated locations of the connected clients are off by two rooms,

the maximum error is substantially higher, e.g., 33 meters for the case when the location

of all three clients is incorrect. This case occurs when the estimated locations of the

connected clients are off in different directions, e.g., client A’s location is off towards

north and client B’s location is off towards south.

Note that the error in the location of the anchor points (i.e., connected clients) can

be kept low (less than one room off) by using mechanisms such as Cricket [98] and

Active Badges [129] for locating connected clients. With accurate location of anchor

points, D IAL’s error would be similar to that of the best-known RSSI-based location

mechanism. Note that even an error of 10-12 metres (for our experimental setup using

RADAR) is acceptable since the goal of D IAL is to approximately locate disconnected

clients or Rogue APs. Thus, based on our results, we can say that D IAL is a practi-

cal approach for helping network administrators estimate the approximate location of

problematic areas.

4.8.4 Estimating Wireless Delays

In Section 4.6.2, we presented the EDEN scheme that uses nearby clients to measure the delay encountered by a wireless station or an AP. We now show that EDEN can estimate

the delay encountered at these endpoints with reasonable accuracy.

The EDEN technique measures the time spent on a client (or an AP) by measuring

the times of the Snoop request and response packets at nearby clients. However, this

measurement includes the delay at the machine due to medium contention. To under-

stand the extent of this congestion delay, we set up a simple experiment with 4 machines

A, B, C and D on the same channel. Machine A performed a full-blast data transfer to

machine B, thereby creating traffic congestion in the medium. Then we associated client

C with the Native WiFi AP machine D. The Native WiFi AP then sent 20 ping packets

to the associated client, which in turn sent ping reply packets. We ran the experiment

twice: once with no extra client delays and next when an extra 40 msec were added at

the client between the ping request and replies. Using a fifth machine running AiroPeek, we observed that EDEN over-estimated the client delay by approximately 3 msec. When examining scenarios where the client or the AP are the bottlenecks, such inaccuracies may be acceptable. However, when these entities are not bottlenecks or when EDEN is

examining a scenario with low delays or when contention is even worse (e.g., the con-

tention delay can even be more than 20 msec in 802.11b), a better estimation is needed;

we are currently exploring mechanisms to reduce such inaccuracies.

Next, we ran an experiment to determine EDEN's accuracy in determining delays

at an endpoint. In this setup, a client machine was associated with another machine

running as an access point; both machines had Netgear MA521 802.11b cards and the

corresponding Native WiFi drivers. We then injected delays in the path of all packets at

the client (varying from 30 to 300 msecs). To emulate the EDEN protocol, the AP sent
[Figure 4.10 plots the error in the estimated delay (%) against the delay introduced at the client (30 to 300 msec).]

Figure 4.10: EDEN's accuracy of estimating the delay at a client

20 ping packets to the client; the ping packets and replies emulate the Snoop request

and response messages in EDEN. A third machine running AiroPeek was used to snoop on these ping packets; this machine effectively emulates the eavesdropping client in EDEN. The collected AiroPeek data was then analyzed to estimate the delays at the client. Figure 4.10 shows that EDEN is reasonably accurate in estimating the delays at an endpoint: EDEN can estimate client delays with an error less than 5% of the actual

introduced delay.

Finally, we studied EDEN's effectiveness in classifying delays at the client, AP, and

the medium. We used a 3-machine setup similar to the one in the previous experiment;

in this case, to estimate delays at the AP, the client also send ping packets to the AP.

To introduce delays in the medium, we increased the distance between the client and

the AP. The medium delay increased relative to the case when the AP and client were

nearby because there were more retries (the increased distance resulted in an increase in the number of walls between the two machines, thereby weakening the received signal). For better accuracy, we ran these experiments in the night when the wireless traffic was expected to be low (since the corporate LAN is

actively used by employees during the day, we did not want traffic interference to affect

our measurements).
[Figure 4.11 is a stacked-bar chart of the roundtrip delay (msec), broken into the medium, AP, and client delays, for the 40-40-near, 40-40-far, and 0-0-far scenarios.]

Figure 4.11: Breakdown of delay at the client, AP, and the medium as estimated by EDEN

Figure 4.11 shows EDEN's breakdown for three different scenarios. The 40-40-near

bar corresponds to the scenario when the AP and client were placed near each other,

and we added a 40 msec delay to all packets at both machines. The 40-40-far scenario

is similar except that client and the AP were placed far from each other. Finally, the

0-0-far case is one in which we did not introduce any delays at the client or the AP but

they were placed far from each other.

In the 40-40-near case, EDEN estimates approximately equal delays for the client and the AP. With an increase in the distance (the 40-40-far and 0-0-far cases), the medium delays increase and EDEN is able to estimate this change as well. Note that the client and the AP delays increased in the latter two cases by a few milliseconds because the wireless cards transmitted the packets at a lower transmission rate (1 Mbps) in order to decrease the error rate. These results show that EDEN is an effective mechanism for

obtaining a delay breakdown in a wireless setting.



4.8.5 Rogue AP Detection

In this section, we explore two issues related to Rogue AP detection. Section 4.8.5

shows that overlapping channels help in the quicker detection of Rogue APs that are hiding

on channels where no AP or client is present. Section 4.8.5 shows that even if Rogue

APs are not overheard on overlapping channels, there is ample opportunity for clients to

perform active scanning without hurting their performance. To check the effectiveness

of our implementation, we ran our Rogue AP detection mechanism on our building

floor and were able to detect all “known” Rogue APs (these were experimental APs

being used by our colleagues).

Overlapping Channels

It is known that overlapping channels in IEEE 802.11 not only interfere with one another

but it is sometimes possible for a NIC on one channel to decode packets from another

overlapping channel. This characteristic is helpful in detecting Rogue APs: if a client

is present on a channel that overlaps with a Rogue AP’s channel, it will detect the AP’s

presence if it is able to hear the AP’s beacons.

To verify the extent of this overlap, we performed an experiment in which an AP

was placed on channel 1 and a nearby client checked for the AP’s beacons on all 11

channels. We repeated this experiment by placing the AP in all channels from 2 to 11

and documented where it could be heard. In one run, the client lingered on each channel for

1 second and in the second run, it stayed for 5 seconds. Figure 4.12 plots the channels on

which the AP is heard (Y-axis) when it is placed on a specific channel (X-axis). Clearly,

the overlap across various channels is non-negligible and is helpful for detection of

Rogue APs. Furthermore, given sufficient time (see the 5-second run), there is an even

higher likelihood that some packet from a Rogue AP leaks through to a monitoring DC.
[Figure 4.12 plots the channels on which beacons are decoded correctly against the channel on which the AP beacons, distinguishing points detected in both the 1-second and 5-second runs from those detected only in the 5-second run.]

Figure 4.12: Overlapping channels on which an AP is overheard

In the above experiments, the AP and the client were placed 5 feet apart with one

obstacle between them. We wanted to study the change in leakage across overlapping

channels on increasing the distance between the AP and the client. For this we

placed an AP machine at 10 different locations on our floor in various rooms and re-

peated the above experiment. Figure 4.13 shows that as the distance between the AP

and the monitoring client increases, the AP is heard on fewer channels (the decrease is

not monotonic due to obstructions).

The above results show that even though one cannot rely on overlap as a guaranteed

mechanism for detecting Rogue APs, it does reduce the need of performing frequent ac-

tive scans. This observation also implies that there are more opportunities for detecting

Rogue APs: for a Rogue AP to go undetected, it must be far away from any client that

is on an overlapping channel.

Availability of Idle Times for Active Scans

As shown in Section 4.8.2, active scans can take up to 2 seconds. Our current imple-

mentation performs an active scan every 5 minutes; we refer to this period as the Active

Scan Period. Even though 2 seconds out of 300 seconds is a small fraction of the time, it
[Figure 4.13 plots the number of channels on which a monitoring client senses the AP's beacons against the distance (in feet), for APs placed on channels 1, 4, and 7.]

Figure 4.13: Overlapping channels heard relative to distance

is important for clients to perform these scans at appropriate times; otherwise, network

traffic on a client may get disrupted: packets sent to this client may be dropped, TCP

may timeout, etc.


[Figure 4.14 plots the maximum idle period (seconds) within every 5-minute interval against the time of day (hours).]

Figure 4.14: The maximum idle time duration available during every 5-minute period at different times of the day

Ideally, these scans should be done when the node is idle and has no ongoing net-

work transfers. To determine whether such idle times exist in current usage, we used

Ethereal [45] to obtain traces from 3 desktop machines of our colleagues over multiple

days. Note that even though these traces are from desktops attached to wired networks,

they still give us a reasonable estimate of network traffic generated by users; as users

start using laptops as their primary machines, it is likely that the network and idle time

behavior will be similar to that of desktop clients.

We divided the traces into 5-minute periods (the Active Scan Period) and for each

period, we determined the maximum period of time for which the network was idle.

Figure 4.14 presents the maximum idle period in every 5-minute interval during a 24-

hour period. Each point in the graph (e.g., for 12:00 pm to 12:05 pm) is obtained by

averaging the maximum idle time value across multiple days and multiple machines for

the same 5-minute period. The figure shows that there are large chunks of idle periods

available for performing active scans: the smallest idle period available in a 5-minute

interval was 118 seconds and typically, idle periods of more than 2.5 to 3 minutes were

easily available. Thus, a large window of opportunity is available to our rogue detection

scheme for performing active scans every 5 minutes.

Given the availability of such opportunities, one can use any heuristic to predict idle

times for launching an active scan (which takes 2 seconds). We studied the effectiveness

of a simple history-based heuristic: if the network has been idle for X seconds, it predicts

that the network will be idle for the next 2 seconds. Thus, after every 5 minutes, the Rogue

AP detection module can perform an active scan whenever it observes that the network

interface has been idle for X seconds. We evaluated the effectiveness of this heuristic

over our 3-machine traces with two different values of X: 5 and 10 seconds. With both

values of X, we observed that the active scan would complete within the idle period

for more than 95% of the cases. The effectiveness of this heuristic shows that wireless

clients can perform active scans for Rogue AP detection without hurting performance.
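
The history-based heuristic amounts to a few lines of bookkeeping. In the sketch below, the function name is ours and the constants are the values discussed above.

import time

ACTIVE_SCAN_PERIOD = 300   # seconds between active scans
IDLE_THRESHOLD_X = 5       # required idle time before scanning (5 or 10)

def maybe_scan(last_scan_time, last_traffic_time, do_active_scan):
    """Call periodically.  Triggers an active scan (which takes about
    2 seconds) at most once per Active Scan Period, and only once the
    interface has been idle for X seconds.  Returns the (possibly
    updated) time of the last scan."""
    now = time.time()
    if now - last_scan_time < ACTIVE_SCAN_PERIOD:
        return last_scan_time
    if now - last_traffic_time >= IDLE_THRESHOLD_X:
        do_active_scan()
        return now
    return last_scan_time      # not idle yet; try again next time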

4.8.6 Scalability Analysis

As discussed in Section 4.4.3, our architecture is designed to scale with the number

of access points and clients in the system. We now discuss why our proactive and

reactive techniques maintain the scalability property. We also argue why our reactive

mechanisms impose low network overhead even if a number of clients are experiencing

wireless problems in an area.

As discussed in Section 4.7, each DC pro-actively sends the RSSI, SSID, and MAC

address information about nearby devices to the DAP every 30 seconds; this information is

necessary for Rogue AP detection. The DAP filters this data and sends information

about APs every 30 seconds. To understand the network bandwidth consumed on the

wireless link, we set up an experiment with a single DC, DAP and DS for 4 hours. We

observed that the bandwidth consumption by the DC was less than 0.2 Kbps and the

DAP’s bandwidth requirements were less than 0.01 Kbps. This result implies that even

if a large number of clients were present, the bandwidth usage is still low, e.g., 20 Kbps

of aggregate DC traffic for 100 clients. Thus, for pro-active monitoring, our techniques have negligible

bandwidth requirements.

We now analyze the bandwidth overheads of our reactive diagnosis mechanisms, i.e.,

Client Conduit and EDEN; we do not discuss DIAL's overheads since DIAL's beaconing

messages are part of Client Conduit and the overheads of sending the RSSI information

to the DAP has already been discussed above.

The bandwidth requirements of EDEN and the Connection Setup part (beacons and probe messages) of Client Conduit are low since these protocols send small broadcast or beacon packets at a low frequency, e.g., every 100 msecs in Client Conduit and every 2 seconds in EDEN. The bandwidth consumption while using MultiNet can also

be controlled: as stated in Section 4.5.2, the connected client can limit the amount of

bandwidth that it allocates to the disconnected client. Thus, if a single client needs help,

our reactive mechanisms impose little overhead.

We now analyze the overheads when a large number of clients (say 50) in an area

have wireless faults and are utilizing our reactive mechanisms to diagnose their prob-

lems. Our basic idea for ensuring that the performance of the network does not deteriorate is to rate-limit our mechanisms; we have not implemented these protocol extensions

in our current prototype. In Client Conduit, when a disconnected client overhears the

beacons of N disconnected clients, instead of choosing a fixed beacon period of 100 msec, it sends out a beacon every K msecs where K is a random number between 0 and 100*N msecs. This self-regulation ensures that the network is not swamped by

Client Conduit beacons if a sudden loss of coverage occurs in an area. A similar self-

regulatory mechanism is used to limit the rate at which the initial broadcast packets are

sent in EDEN. Furthermore, to limit the overheads on a connected client C (and possibly

reduce the reactive scheme’s load on the DAP and DS), we can use a policy such that

C helps only one client at any given point. Thus, with these policy decisions, we can

ensure that Client Conduit and EDEN impose low bandwidth overheads even when a

large number of clients are experiencing problems.
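
The self-regulation rule for SOS beacons amounts to randomizing the beacon interval as a function of the number of disconnected neighbors; a minimal sketch, with names of our choosing, follows.

import random

def next_sos_beacon_interval_msec(num_disconnected_neighbors):
    """With N other disconnected clients overheard nearby, pick the next
    beacon interval uniformly at random in [0, 100 * N) msec instead of
    the fixed 100 msec, so that a sudden loss of coverage does not flood
    the channel with SOS beacons."""
    n = max(1, num_disconnected_neighbors)
    return random.uniform(0, 100 * n)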

4.9 Future Work

There are a number of additional problems in wireless fault diagnosis that require further

research. We plan to pursue these in the near future.

• We presented a technique for detecting Rogue APs in a deployment. A related

problem is to detect Rogue Ad-hoc Networks. Such networks are created when

a user connected to the corporate network (e.g., via a wired network) sets up

an IEEE 802.11 ad-hoc network with an unauthenticated client. Thus, like the

Rogue AP scenario, such a network can compromise the security of the corporate

network.

• The problem of performing root-cause analysis on client authentication problems

was not discussed in this chapter. For example, the system could analyze the IEEE

802.1x protocol messages to determine the point at which authentication failed.

• In Section 4.6.1, we show how the location of disconnected clients can be deter-

mined when a few connected clients are present nearby. The question remains,

what should be done when there are no connected clients in the neighborhood.

One approach may be to have the client log its last known location where connec-

tivity was available. Using heuristics, such as movement trajectory, it might be

possible to determine the approximate location of the dead spot.

• The next logical step after diagnosis is recovery. Once a fault has been detected,

one needs to determine what automatic steps the system should take to resolve the

situation without necessarily involving a network administrator.

4.10 Summary

The rising popularity of IEEE 802.11 networks has made fault detection and diagno-

sis an important problem for IT managers responsible for maintaining these networks.

Interestingly, the wireless research community has overlooked these problems, perhaps

because maintenance issues surface only after large deployments are in place, which is

a relatively recent phenomenon.

In this chapter, we presented novel solutions for detecting a variety of faults and

proposed approaches for analyzing performance problems experienced by end-users.

Our initial results show that our mechanisms of locating RF holes, detecting Rogue

APs, and diagnosing performance problems are effective and impose low overheads.

Furthermore, we show that a novel mechanism called Client Conduit can be used for

assisting disconnected clients in real-time. These techniques in conjunction with our

general architecture that uses clients, APs, and backend servers together for diagnosing

wireless networks make our system unique and practical.

The general problem space of effective network management for IEEE 802.11 net-

works is large. Our fault diagnosis architecture is a first attempt at addressing some of

the critical problems identified to us by network administrators managing a large 802.11

deployment. It is our hope that this work will stimulate other researchers to investigate

such problems further and propose solutions that will eventually result in the smooth

operation of IEEE 802.11 networks.

The contents of this chapter were developed in joint work with Atul Adya, Victor

Bahl and Lili Qiu. The idea to work on this problem was conceived by Victor Bahl.

He also helped define the problem space. I designed the fault diagnosis architecture,

described in Section 4.4, the Client Conduit Protocol and the Rogue AP algorithm, along

with Atul Adya. I implemented the Client Conduit protocol, and Atul implemented the

Rogue AP algorithm of Section 4.6.3. I also designed the performance isolation and

location determination algorithms, presented in Sections 4.6.1 and 4.6.2, in joint work

with Lili Qiu.


CHAPTER 5

CONCLUSION

To the best of our knowledge, this dissertation is the first to look at the problem of

simultaneously connecting to multiple wireless networks with a single wireless card.

The MultiNet solution is a new virtualization architecture for wireless network cards.

It has been implemented on Windows XP, and is distributed by Microsoft Research as

part of its Mesh Networking Academic Resource Toolkit [104]. In addition to describ-

ing MultiNet, this dissertation also presents two of its applications: SSCH and Client

Conduit.

SSCH is a channel hopping protocol for increasing the capacity of wireless ad hoc

networks. SSCH can be implemented in the link layer of the network stack and works

over the IEEE 802.11 standard. It is the first multi-channel protocol that we are aware of that works over a single wireless card without requiring a dedicated control channel.

We show that SSCH significantly increases the capacity of wireless ad hoc networks.

Client Conduit is a key component of our fault diagnosis architecture. It uses Multi-

Net to provide a thin pipe of communication to disconnected wireless clients by using

the bandwidth of connected machines. Client Conduit has been implemented on Win-

dows XP and is shown to be lightweight and secure.

In addition to SSCH and Client Conduit, MultiNet enables the design of a whole

new class of applications. System designers are no longer constrained by the number

of wireless cards they can fit into in a system. They are free to design systems and

applications that can connect to many wireless networks at the same time. We believe

that MultiNet is the first system to relax this physical constraint.

Through the systems it constructs, this dissertation contributes towards solving some of the

key problems in existing wireless networks, in particular power, capacity, and manageability.

MultiNet saves battery power by not requiring multiple wireless cards to stay

connected on multiple wireless networks. SSCH improves the capacity of wireless ad

hoc networks by distributing interfering flows on orthogonal frequency channels. Fi-

nally, this dissertation presents a new client-centric fault diagnosis architecture for in-

frastructure wireless networks.


REFERENCES

[1] NLANR/DAST: Iperf 1.7.0. TCP/UDP Bandwidth Measurement Tool. http://dast.nlanr.net/Projects/Iperf/.

[2] B. Aboba and D. Simon. PPP EAP TLS Authentication Protocol. In RFC 2716,
October 1999.

[3] A. Adya, P. Bahl, R. Chandra, and L. Qiu. Architecture and Techniques for
Diagnosing Faults in IEEE 802.11 Infrastructure Networks. In Proc. of ACM
MobiCom, Philadelphia, PA, September 2004.

[4] A. Adya, P. Bahl, J. Padhye, A. Wolman, and L. Zhou. A Multi-Radio Unification
Protocol for IEEE 802.11 Wireless Networks. Technical Report MSR-TR-2003-44,
Microsoft Research, July 2003.

[5] AirDefense. Wireless LAN Security. http://airdefense.net.

[6] AirMagnet. AirMagnet Distributed System. http://airmagnet.com.

[7] AirWave. AirWave Management Platform. http://airwave.com.

[8] A. Akella, G. Judd, S. Seshan, and P. Steenkiste. Self Management in Chaotic


Wireless Deployments. In ACM MobiCom 2005, August 2005.

[9] M. Alicherry, R. Bhatia, and L. Li. Joint Channel Assignment and Routing for
Throughput Optimization in Multi-radio Wireless Mesh Networks. In MobiCom,
August 2005.

[10] M. Allman, W. Eddy, and S. Ostermann. Estimating Loss Rates With TCP. In
ACM Perf. Evaluation Review 31(3), Dec 2003.

[11] AMD. Advanced Micro Devices. http://www.amd.com/.

[12] T. M. Apostol. Introduction to Analytic Number Theory. Springer-Verlag, NY,


1976.

[13] ATA. Flash Memory Cards. http://www.magicram.com/flshcrd.htm.

[14] Atheros Communications. http://www.atheros.com.

[15] B. Awerbuch, D. Holmer, and H. Rubens. Provably Secure Competitive Routing


against Proactive Byzantine Adversaries via Reinforcement Learning. In JHU
Tech Report Version 1, May 2003.

[16] P. Bahl, R. Chandra, and J. Dunagan. SSCH: Slotted Seeded Channel Hopping
for Capacity Improvement in IEEE 802.11 Ad-Hoc Wireless Networks. In Proc.
of ACM MobiCom, Philadelphia, PA, September 2004.


[17] P. Bahl and V. N. Padmanabhan. RADAR: An Inbuilding RF-based User Location


and Tracking System. In Proc. of IEEE INFOCOM, Tel-Aviv, Israel, March 2000.

[18] P. Barford and M. Crovella. Generating Representative Web Workloads for Net-
work and Server Performance Evaluation. In ACM SIGMETRICS 1998, pages
151–160, July 1998.

[19] P. Barford and M. Crovella. Critical Path Analysis of TCP Transactions. In Proc.
of ACM SIGCOMM, Stockholm, Sweden, Aug 2000.

[20] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer,


I. Pratt, and A. Warfield. Xen and the Art of Virtualization. In Proc. of ACM
SOSP, Bolton Landing, NY, October 2003.

[21] BAWUG. Bay Area Wireless Users Group. http://www.bawug.org.

[22] J. Bellardo and S. Savage. Measuring Packet Reordering. In Proc. of ACM Inter-
net Measurement Workshop, Marseille, France, Nov 2002.

[23] A. Bensoussan, C. T. Clingen, and R. C. Daley. The Multics Virtual Memory. In


Proc. of ACM SOSP, Princeton, NJ, October 1969.

[24] D. Berry and G. Breeze. Microsoft IT division. Private Communication, 2004.

[25] Bluetooth SIG. Location Working Group. http://bluetooth.org.

[26] J. Broch, D. A. Maltz, and D. B. Johnson. Supporting Hierarchy and Hetero-


geneous Interfaces in Multi-Hop Wireless Ad Hoc Networks. In Workshop on
Mobile Computing held in conjunction with the International Symposium on Par-
allel Architectures, June 1999.

[27] S. Buchegger and J. Le Boudec. The Effect of Rumor Spreading in Reputation


Systems for Mobile Ad-Hoc Networks. In Proc. of WiOpt, France, March 2003.

[28] E. Bugnion, S. Devine, and M. Rosenblum. Disco: Running Commodity Op-


erating Systems on Scalable Multiprocessors. In Sixteenth ACM Symposium on
Operating System Principles, October 1997.

[29] P. Buonadonna, A. Geweke, and D. Culler. An Implementation and Analysis of


the Virtual Interface Architecture. In Proc. of SC, November 1998.

[30] R. Chandra, P. Bahl, and P. Bahl. MultiNet: Connecting to Multiple IEEE 802.11
Networks Using a Single Wireless Card. In Proc. of IEEE INFOCOM, Hong
Kong, Mar 2004.

[31] I. Chlamtac and A. Farago. Making Transmission Schedules Immune to Topol-


ogy Changes in Multi-Hop Packet Radio Networks. IEEE/ACM Transactions on
Networking, 2(1):23–29, February 1994.

[32] I. Chlamtac and A. Farago. Time-Spread Multiple-Access (TSMA) Protocols


for Multihop Mobile Radio Networks. IEEE/ACM Transactions on Networking,
5(6):804–812, December 1997.

[33] I. Chlamtac, C. Petrioli, and J. Redi. Energy-Conserving Access Protocols for


Identification Networks. IEEE/ACM Transactions on Networking, 7(1):51–61,
February 1999.

[34] R. R. Choudhury, X. Yang, R. Ramanathan, and N. H. Vaidya. Using Directional


Antennas for Medium Access Control in Ad Hoc Networks. In Proc. of ACM
MobiCom, September 2002.

[35] Cisco. Cisco Aironet 350 series. http://www.cisco.com/warp/public/cc/pd/witc/ao350ap.

[36] Cisco. CiscoWorks Wireless LAN Solution Engine. http://cisco.com.

[37] Executive Committee. Wireless Philadelphia. http://www.phila.gov/wireless/.

[38] Intel, Compaq, and Microsoft Corporations. Virtual Interface Specification. Ver-
sion 1.0. December 1997.

[39] Computer Associates. Unicenter Solutions: Enabling a Successful Wireless En-


terprise. http://www.ca.com.

[40] D. De Couto, D. Aguayo, J. Bicket, and R. Morris. A High-Throughput Path


Metric for Multi-Hop Wireless Routing. In ACM MobiCom 2003, September
2003.

[41] P. J. Denning. Virtual Memory. In ACM Computing Surveys, volume 2, pages


153–189, September 1970.

[42] T. ElBatt and B. Ryu. On the Channel Reservation Schemes for Ad-hoc Net-
works: Utilizing Directional Antennas. In IEEE International Symposium on
Wireless Personal Multimedia Communications, October 2002.

[43] J. Elson, L. Girod, and D. Estrin. Fine-Grained Network Time Synchronization


using Reference Broadcast. In Operating Systems Design and Implementation
(OSDI 2002), December 2002.

[44] Engim. Intelligent, Wideband WLAN Chipsets with ASAP Functionality.


http://www.engim.com/.

[45] Ethereal. A network protocol analyzer. http://www.ethereal.com/.

[46] F. Fitzek, D. Angelini, G. Mazzini, and M. Zorzi. Design and performance of


an enhanced IEEE 802.11 MAC protocol for multihop coverage extension. IEEE
Wireless Communications, 10(6):30–39, December 2003.

[47] S. Floyd, M. Handley, J. Padhye, and J. Widmer. Equation-Based Congestion


Control for Unicast Applications. In Proc. of ACM SIGCOMM, Stockholm, Swe-
den, Aug 2000.

[48] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Computer Graphics


Principles and Practice (2nd Edition). Addison Wesley, 1990.

[49] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh. Terra: A Virtual


Machine-Based Platform for Trusted Computing. In Proc. of ACM SOSP, Bolton
Landing, NY, October 2003.

[50] Motorola Government and Enterprise. Motorola’s Mobile Mesh Networks Tech-
nology. http://www.motorola.com/governmentandenterprise/.

[51] F. Herzel, G. Fischer, and H. Gustat. An Integrated CMOS RF Synthesizer for


802.11a Wireless LAN. IEEE Journal of Solid-state Circuits, Vol. 38, No. 10,
October 2003.

[52] M. Heusse, F. Rousseau, G. Berger-Sabbatel, and A. Duda. Performance


Anomaly of 802.11b. In IEEE INFOCOM, 2003.

[53] H.-Y. Hsieh, K.-H. Kim, Y. Zhu, and R. Sivakumar. A receiver-centric transport
protocol for mobile hosts with heterogeneous wireless interfaces. In Proceedings of
the 9th annual international conference on Mobile computing and networking, pages
1–15. ACM Press, 2003.

[54] L. Huang and T. Lai. On the scalability of IEEE 802.11 ad hoc networks. In Pro-
ceedings of the 3rd ACM international symposium on Mobile Ad Hoc Networking
& Computing, MobiHoc, pages 173–182. ACM Press, 2002.

[55] L. Huang, G. Peng, and T. Chiueh. Multi-Dimensional Storage Virtualization. In


Proc. of ACM SIGMETRICS, New York, June 2004.

[56] IBM. Tivoli Software. http://www.ibm.com/software/tivoli/.

[57] IEEE. IEEE 802.1x-2001 IEEE Standards for Local and Metropolitan Area Net-
works: Port-Based Network Access Control, 1999.

[58] IEEE Computer Society. Wireless LAN Medium Access Control (MAC) and
Physical Layer (PHY) Specifications. IEEE Standard 802.11, 1999.

[59] IEEE802.11a. Wireless LAN Medium Access Control(MAC) and Physical


(PHY) Layer Specification: High Speed Physical Layer Extensions in the 5 GHz
Band. 1999.

[60] IEEE802.11b/D3.0. Wireless LAN Medium Access Control(MAC) and Physical


(PHY) Layer Specification: High Speed Physical Layer Extensions in the 2.4
GHz Band. 1999.

[61] Crossbow Technology Inc. Motes, Smart Dust Sensors, Wireless Sensor Net-
works. http://www.xbow.com/Products/Wireless_Sensor_Networks.htm.

[62] Scalable Networks Inc. The Qualnet Simulator. http://www.scalable-networks.com/.

[63] Intel. WiMAX - Broadband Wireless Access Technology. http://www.intel.com/


netcomms/technologies/wimax/.

[64] InterEpoch Technology Inc. IWE1100-T Series. http://www.interepoch.com.tw/products/IWE1100T.asp.

[65] K. Jain, J. Padhye, V. Padmanabhan, and L. Qiu. Impact of Interference on Multi-


hop Wireless Network Performance. In ACM MobiCom 2003, September 2003.

[66] N. Jain and S. R. Das. A Multichannel CSMA MAC Protocol with Receiver-
Based Channel Selection for Multihop Wireless Networks. In International Con-
ference on Computer Communications and Networks (IC3N), October 2001.

[67] J. Li, C. Blake, D. S. J. De Couto, H. I. Lee, and R. Morris. Capacity of Ad Hoc
Wireless Networks. In Mobile Computing and Networking, pages 61–69, 2001.

[68] D. Johnson, D. Maltz, and J. Broch. DSR: The Dynamic Source Routing Proto-
col for Multihop Wireless Ad Hoc Networks. In C.E. Perkins, editor, Ad Hoc
Networking, chapter 5, pages 139–172. Addison-Wesley, 2001.

[69] E. Jung and N. Vaidya. An Energy Efficient MAC Protocol for Wireless LANS.
In IEEE INFOCOM 2002, June 2002.

[70] R. Karrer, A. Sabharwal, and E. Knightly. Enabling Large-scale Wireless Broad-


band: The Case for TAPs. HotNets 2003.

[71] R. Krashinsky and H. Balakrishnan. Minimizing Energy for Wireless Web Access
with Bounded Slowdown. In ACM MobiCom 2002, pages 119–130, September
2002.

[72] R. Kravets and R. Krishnan. Power Management Techniques for Mobile Com-
munications. In ACM MobiCom 1998, October 1998.

[73] A. Ladd, K. Bekris, A. Rudys, G. Marceau, L. Kavraki, and D. Wallach.


Robotics-Based Location Sensing using Wireless Ethernet. In Proc. of ACM Mo-
biCom, Atlanta, GA, Sept 2002.

[74] T. H. Lai and D. Zhou. Efficient and Scalable IEEE 802.11 Ad-Hoc Mode Timing
Synchronization Function. In Proc. of International Conference on Advanced
Information Networking and Applications, March 2003.

[75] L. Lamport. Time, Clocks and the Ordering of Events in Distributed Systems. In
Communications of the ACM, volume 21, pages 558–565, 1978.

[76] L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM
TOPLAS, 4(3):382–401, July 1982.

[77] C. Law, A. K. Mehta, and K. Siu. A New Bluetooth Scatternet Formation Proto-
col. To appear in ACM Mobile Networks and Applications Journal, 2002.

[78] C. Law and K. Siu. A Bluetooth Scatternet Formation Algorithm. In IEEE Sym-
posium on Ad Hoc Wireless Networks 2001, November 2001.

[79] J. Li, Z. J. Haas, M. Sheng, and Y. Chen. Performance Evaluation of Modi-


fied IEEE 802.11 MAC for Multi-Channel Multi-Hop Ad Hoc Network. In In-
ternational Conference on Advanced Information Networking and Applications
(AINA), 2003.

[80] Y. Li, H. Wu, D. Perkins, N. Tzeng, and M. Bayoumi. MAC-SCC: Medium Ac-
cess Control with a Separate Control Channel for Multihop Wireless Networks.
In 23rd International Conference on Distributed Computing Systems Workshops
(ICDCSW), 2003.

[81] C. R. Lumb, A. Merchant, and G. A. Alvarez. Facade: Virtual Storage Devices


with Performance Guarantees. In Proc. of USENIX FAST, San Francisco, April
2003.

[82] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson. User-level Internet Path


Diagnosis. In Proc. of ACM SOSP, Bolton Landing, NY, October 2003.

[83] S. Marti, T. Giuli, K. Lai, and M. Baker. Mitigating Routing Misbehavior in


Mobile Ad Hoc Networks. In Proc. of ACM MobiCom, Boston, MA, August
2000.

[84] Maxim. Maxim 2.4GHz 802.11b Zero-IF Transceivers. http://pdfserv.maxim-ic.com/en/ds/MAX2820-MAX2821.pdf.

[85] Maxim. Tracking Advances in VCO Technology. http://pdfserv.maxim-ic.com/en/an/AN1768.pdf.

[86] Microsoft Corp. Native 802.11 Framework for IEEE 802.11 Networks. http://www.microsoft.com.

[87] A. Miu, H. Balakrishnan, and C. E. Koksal. Achieving Loss Resiliency through


Multi-Radio Diversity in Wireless Networks. In ACM MobiCom 2005, August
2005.

[88] Expert Monitoring. WiSNet Wireless Sensor Networks. http://www.expertmon.com/products.html.

[89] A. Nasipuri and S. R. Das. Multichannel CSMA with Signal Power-Based Chan-
nel Selection for Multihop Wireless Networks. In IEEE Vehicular Technology
Conference (VTC), September 2000.

[90] B. Neuman and T. Tso. An Authentication Service for Computer Networks. In


IEEE Communications, Karlsruhe, Germany, Sept 1996.

[91] S. Ni, Y. Tseng, Y. Chen, and J. Sheu. The Broadcast Storm Problem in a Mobile
Ad Hoc Network. In ACM MobiCom, August 1999.

[92] L. Nord and J. Haartsen. The Bluetooth Radio Specification and The Bluetooth
Baseband Specification. Bluetooth, 1999-2000.

[93] Nortel. Wireless Mesh Network Solution. http://www.nortel.com/.

[94] HP Openview. Management Solutions for Your Adaptive Enterprise. http://www.managementsoftware.hp.com/.

[95] J. Padhye, R. Draves, and B. Zill. Routing in Multi-radio, Multi-hop Wireless


Mesh Networks. In ACM MobiCom, 2004.

[96] J. Padhye, V. Firoiu, D. Towsley, and J. Kurose. Modeling TCP Throughput: a


Simple Model and its Empirical Validation. In Proc. of ACM SIGCOMM, Van-
couver, BC, September 1998.

[97] C. Perkins, E. Belding-Royer, and S. Das. Ad hoc On-Demand Distance Vector


(AODV) Routing. In IETF RFC 3561, July 2003.

[98] N. B. Priyantha, A. Chakraborty, and H. Balakrishnan. The Cricket Location-


Support System. In Proc. of ACM MobiCom, Boston, MA, August 2000.

[99] L. Qiu, P. Bahl, A. Rao, and L. Zhou. Fault Detection, Isolation, and Diagnosis
in Multihop Wireless Networks. Technical Report MSR-TR-2004-11, Microsoft
Research, Redmond, WA, Dec 2003.

[100] I. Ramani and S. Savage. SyncScan: Practical Fast Handoff for 802.11 Infras-
tructure Networks. In Proc. of IEEE Infocom, Miami, FL, March 2005.

[101] M. Raya, J. P. Hubaux, and I. Aad. DOMINO: A System to Detect Greedy Be-
havior in IEEE 802.11 Hotspots. In Proc. of MobiSys, Boston, MA, June 2004.

[102] Realtek. RTL8185L. http://www.realtek.com.tw/.

[103] IBM Security Research. Wireless Security Auditor (WSA). http://www.research.ibm.com/gsal/wsa.

[104] Microsoft Research. Mesh Networking Academic Resource Toolkit. http://research.microsoft.com/netres/software.aspx.

[105] C. Rigney, A. Rubens, W. Simpson, and S. Willens. Remote Authentication Dial


In User Service (RADIUS). In RFC 2138, IETF, April 1997.

[106] J. Robinson, K. Papagiannaki, C. Diot, X. Guo, and L. Krishnamurthy. Experi-


menting with a Multi-Radio Mesh Networking Testbed. In Workshop on Wireless
Network Measurements, April 2005.

[107] RoofNet. MIT RoofNet. http://www.pdos.lcs.mit.edu/roofnet/.

[108] R. Rozovsky and P. Kumar. SEEDEX: A MAC Protocol for Ad Hoc Networks.
In ACM MobiHoc, 2001.

[109] SeattleWireless. Seattle Wireless. http://www.seattlewireless.net/.

[110] J. Sheu, C. Chao, and C. Sun. A Clock Synchronization Algorithm for Multi-
Hop Wireless Ad Hoc Networks. In Proc. of IEEE International Conference on
Distributed Computing Systems, ICDCS, Tokyo, March 2004.

[111] E. Shih, P. Bahl, and M. Sinclair. Wake On Wireless: An Event Driven Energy
Saving Strategy for Battery Operated Devices. In MOBICOM, September 2002.

[112] E. Shih, P. Bahl, and M. Sinclair. Wake on Wireless: An event driven power
saving strategy for battery operated devices. In ACM MobiCom 2002, September
2002.

[113] M. Shin, A. Mishra, and W. Arbaugh. Improving the Latency of 802.11 Handoffs
Using Neighbor Graphs. In Proc. of MobiSys, Boston, MA, June 2004.

[114] J. So and N. H. Vaidya. A Multi-channel MAC Protocol for Ad Hoc Wireless


Networks. In UIUC Technical Report, also accepted to MobiHoc 2004, January
2003.

[115] Sputnik. Sputnik Managed Wi-Fi Networks. http://www.sputnik.com.

[116] T. K. Srikanth and S. Toueg. Optimal Clock Synchronization. Journal of the


ACM, 34(3):626–645, July 1987.

[117] R. Stevens. TCP/IP Illustrated (Vol. 1): The Protocols. Addison Wesley, 1994.

[118] R. Stine. FYI on a Network Management Tool: Catalog Tools for Monitoring and
Debugging TCP/IP Internets and Interconnected Devices. In IETF RFC 1147,
April 1990.

[119] Strix Systems. Networks without Wires. http://www.strixsystems.com.

[120] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O devices on VMware


workstation’s hosted virtual machine monitor. In Annual Usenix Technical Con-
ference, June 2001.

[121] SuperPass. Wireless LAN PCI card for 2.4 GHz. http://www.superpass.com/SP-PCI-01.html.

[122] Symbol. Spectrum42 4131 AccessPoint. http://www.symbol.com/products/wireless/ap4131.html.

[123] Symbol Technologies Inc. SpectrumSoft: Wireless Network Management System.


http://www.symbol.com.

[124] A. Tzamaloukas and J. J. Garcia-Luna-Aceves. Channel-hopping multiple access.


In IEEE International Communications Conference (ICC), 2000.

[125] P. Verissimo and L. Rodrigues. A Posteriori Agreement for Fault Tolerant Clock
Synchronization on Broadcast Networks. In Proc. of International Symposium
on Fault-Tolerant Computing (FTCS), page 85, July 1992.

[126] VMware. Enterprise-Class Virtualization Software. http://www.vmware.com/.

[127] VoIP. Voice Over Internet Protocol. http://www.fcc.gov/voip/.

[128] T. von Eicken, A. Basu, V. Buch, and W. Vogels. U-Net: A User-Level Network
Interface for Parallel and Distributed Computing. In Proc. of ACM SOSP, New
York, December 1995.

[129] R. Want, A. Hopper, V. Falcao, and J. Gibbons. The Active Badge Location
System. ACM Transactions on Information Systems, 10(1), January 1992.

[130] A. Whitaker, M. Shaw, and S. D. Gribble. Scale and Performance in the Denali
Isolation Kernel. In Fifth Symposium on Operating Systems Design and Imple-
mentation, December 2002.

[131] Wibhu Technologies Inc. SpectraMon. http://www.wibhu.com.

[132] WildPackets Incorporation. Airopeek Wireless LAN Analyzer. http://www.wildpackets.com.

[133] WiMax.com. WiMAX technology, news, training and conferences. http://www.wimax.com/.

[134] WinDump: Tcpdump for Windows. http://windump.polito.it.

[135] S.-L. Wu, C.-Y. Lin, Y.-C. Tseng, and J.-P. Sheu. A New Multi-Channel MAC
Protocol with On-Demand Channel Assignment for Mobile Ad Hoc Networks.
In International Symposium on Parallel Architectures, Algorithms and Networks
(I-SPAN), 2000.

[136] S. Xu and T. Saadawi. Does the IEEE 802.11 MAC Protocol Work Well in Mul-
tihop Wireless Ad Hoc Networks? IEEE Communications Magazine, pp. 130-137, June
2001.

[137] Z. Tang and J. Garcia-Luna-Aceves. Hop-reservation multiple access (HRMA)


for ad-hoc networks. In IEEE INFOCOM, 1999.

[138] M. Zec. Implementing a Clonable Network Stack in the FreeBSD Kernel. In


Proc. of USENIX Annual Technical Conference, June 2003.

[139] Y. Zhang, L. Breslau, V. Paxson, and S. Shenker. On the Characteristics and


Origins of Internet Flow Rates. In Proc. of ACM SIGCOMM, Pittsburgh, PA,
August 2002.

[140] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker. On the Constancy of Internet


Path Properties. In Proc. of ACM Internet Measurement Workshop, San Fran-
cisco, CA, Nov 2001.
