Chapter 6
Feng Zhao
Leonidas J. Guibas
Wireless Sensor Networks
Outline
Sensor Database Challenges
Querying the Physical Environment
Query Interfaces
High-Level Database Organization
In-Network Aggregation
Data-Centric Storage
Data Indices and Range Queries
Distributed Hierarchical Aggregation
Temporal Data
Summary
Sensor Network Abstraction
Characteristics: distributed, resource-constrained, failure-prone
CREATE LR_QUERY q2 AS
SELECT R1.dev.location()
FROM TempSensors R1, TempSensors R2
WHERE $every(10)
AND R1.dev.detectAbnormalTemperature()
AND R2.dev.detectAbnormalTemperature()
AND R1.dev > R2.dev;
Probabilistic Queries
Sensor data is subject to random errors.
Sensor data is normally distributed and
characterized by a Gaussian p.d.f.
GADT (Gaussian ADT)
An instance of the ADT corresponds to a
Gaussian p.d.f.
Each instance is represented by its mean μ and
standard deviation σ.
The Prob operator is used to pose queries.
Probabilistic Queries
“Retrieve from sensors all tuples whose temperature
is within 0.5 degrees of 68 degrees, with at least 60
percent probability”
Ex: SELECT *
FROM sensors
WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6
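A minimal sketch of how the Prob operator could be evaluated for a GADT instance, assuming the Gaussian c.d.f. Φ: the probability of the true value falling in [l, u] is Φ((u - μ)/σ) - Φ((l - μ)/σ). The class and method names below are illustrative, not the actual GADT API.

import math

class GaussianADT:
    """A sensor reading modeled as a Gaussian p.d.f. with mean mu and std dev sigma."""
    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def prob(self, low, high):
        # P(low <= X <= high) = Phi((high - mu)/sigma) - Phi((low - mu)/sigma)
        phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
        return phi((high - self.mu) / self.sigma) - phi((low - self.mu) / self.sigma)

# The query above: keep tuples whose temperature is within 0.5 degrees of 68
# with probability at least 0.6.
readings = [GaussianADT(68.1, 0.4), GaussianADT(66.0, 1.5)]
selected = [r for r in readings if r.prob(67.5, 68.5) >= 0.6]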
Centralized approach
Each sensor forwards its data to a central
server.
Disadvantages
The nodes near the access point become traffic
hot spots.
The sampling rate has to be set to the highest
rate required, burdening the network with unnecessary traffic.
In-network storage approach
Choose rendezvous points to store
data in the network.
Advantages
The overhead to store and access the data
is minimized.
The overall load is balanced across the
network.
Server-based approach
Requires a total of 16
message transmissions
In-Network Aggregation
Each sensor may
compute a partial state
record based on its data
and that of its children
Requires a total of 6
message transmissions
Aggregation Framework
• As in extensible databases, TinyDB supports any
aggregation function conforming to:
Aggn = {finit, fmerge, fevaluate}
finit: {a0} → <a0>
fmerge: {<a1>, <a2>} → <a12>  (partial state record)
fevaluate: {<a1>} → aggregate value
Example: Average
AVGinit: {v} → <v, 1>
AVGmerge: {<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVGevaluate: {<S, C>} → S/C
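The same framework can be written down directly; a small sketch of AVERAGE as the three functions, with the partial state record as a (sum, count) pair. Function names here are illustrative rather than TinyDB's actual code.

def avg_init(value):
    # f_init: a single reading becomes the partial state record <sum, count>.
    return (value, 1)

def avg_merge(psr1, psr2):
    # f_merge: combine two partial state records from different subtrees.
    return (psr1[0] + psr2[0], psr1[1] + psr2[1])

def avg_evaluate(psr):
    # f_evaluate: turn the final partial state record into the aggregate value.
    return psr[0] / psr[1]

# A parent merges its own reading with the records reported by its children.
merged = avg_init(21.0)
for child_psr in (avg_init(20.5), avg_init(22.0)):
    merged = avg_merge(merged, child_psr)
print(avg_evaluate(merged))   # 21.166...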
[Figure: bytes transmitted per epoch (all sensors) for the COUNT, MIN, HISTOGRAM, AVERAGE, and MEDIAN aggregates; aggregates and their efficiency in TAG.]
Performance Metrics
Network usage
Total usage and Hot spot usage
Preprocessing time
time taken to construct an index
Storage space requirement
Query time
time to process a query, assemble an answer, and
return this answer.
Throughput
Update and maintenance cost
Properties of Sensor Database
Persistence
Data stored in the system must remain available
to queries.
Consistency
A query must be routed correctly to a node where
the data are currently stored.
Controlled access to data
Scalability in network size
As the number of nodes increases, the
communication cost should not grow unduly.
Load balancing
Topological generality
The database architecture should work well on a
broad range of network topologies.
Query Processing Scheduling
TinyDB uses an epoch-based mechanism.
The epoch must be long enough for
data to travel from the leaves to the root.
Each epoch is divided into time intervals.
The number of intervals reflects the depth of
the routing tree.
Each node only needs to power up during its
scheduled interval.
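A hedged sketch of this schedule: if intervals are numbered from the deepest level down to 1, a node at depth d transmits in interval d, right after listening to its children in interval d+1. This follows one plausible reading of the TAG scheme; the exact interval assignment in TinyDB may differ.

def transmit_window(depth, epoch_length, num_intervals):
    """Return the (start, end) of the interval in which a node at the given
    routing-tree depth transmits its partial state record."""
    interval_length = epoch_length / num_intervals
    # Interval num_intervals begins at time 0; interval 1 ends at epoch_length.
    start = (num_intervals - depth) * interval_length
    return (start, start + interval_length)

# Example: a 5-level routing tree and a 10-second epoch.
for depth in range(1, 6):
    print("depth", depth, "->", transmit_window(depth, 10.0, 5))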
Schedule of In-Network Aggregation
SELECT COUNT(*) FROM sensors
[Figure sequence: a routing tree of five sensors evaluates COUNT(*) over one epoch. The epoch is divided into intervals numbered 4 down to 1; in each interval, the nodes scheduled for that interval transmit their partial counts to their parents, and by interval 1 the root holds the total count of 5. The sequence then repeats in the next epoch.]
Data-Centric Storage (DCS)
DCS is a method proposed to support queries
from any node in the network by providing a
rendezvous mechanism for data and queries.
Avoids flooding the entire network.
At the center of a DCS system are
rendezvous points.
DCS distributes the storage load across the
entire network.
Data-Centric Storage (DCS)
For example:
Geographic hash table (GHT) attempts to
distribute data evenly across the network.
GHT assumes each node knows its
geographic location. (by GPS or…)
A data object is associated with a key.
Each node is responsible for storing a
certain range of keys.
Geographic Hash Table (GHT)
Rendezvous
Events are named with keys
Storage and retrieval performed using these keys
A key is hashed to a geographic position
Geographic routing (GPSR) used to locate closest node to
this geographic position
This node serves as a rendezvous for storage and search
Costs
No flooding of queries
Aggregate storage cost is the same as for external storage
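A minimal sketch of the GHT rendezvous idea: hash an event name to an (x, y) position in the field and store or look up the data at the node closest to that position. Real GHT uses GPSR perimeter routing to reach this "home node"; the nearest-node search below simply stands in for that routing step.

import hashlib

def hash_to_position(key, width=100.0, height=100.0):
    # Hash the event name deterministically to a point inside the field.
    digest = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32 * width
    y = int.from_bytes(digest[4:8], "big") / 2**32 * height
    return (x, y)

def home_node(key, nodes):
    # The node closest to the hashed position acts as the rendezvous point.
    px, py = hash_to_position(key)
    return min(nodes, key=lambda n: (n[0] - px) ** 2 + (n[1] - py) ** 2)

# Producers and queriers of "elephant-sighting" data compute the same position,
# so they meet at the same home node without flooding the network.
nodes = [(10, 20), (55, 60), (80, 15), (30, 90)]
print(home_node("elephant-sighting", nodes))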
Structured Replication
Rendezvous points are replicated
Decreases storage communication cost
Increases query dissemination cost
[Figure: structured replication in GHT over the (0,0)-(100,100) field, with the root point marked.]
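A sketch of how the mirror images in structured replication could be computed: for a depth-d hierarchy the field is split into 2^d x 2^d sub-squares, and the root point's offset within its own sub-square is repeated in every sub-square, so an event can be stored at the nearest mirror instead of the distant root. The code is illustrative of this idea, not GHT's exact procedure.

def mirror_points(root, depth, width=100.0, height=100.0):
    """Mirror images of the root point for a depth-d structured-replication
    hierarchy: one per sub-square, at the same offset as the root."""
    cells = 2 ** depth
    cw, ch = width / cells, height / cells
    ox, oy = root[0] % cw, root[1] % ch        # root's offset inside a sub-square
    return [(i * cw + ox, j * ch + oy) for i in range(cells) for j in range(cells)]

# Depth-1 replication of a root point in a 100 x 100 field yields 4 mirrors.
print(mirror_points((75.0, 25.0), depth=1))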
Data-Centric Storage (DCS)
Reduce unnecessary network traffic
Hashing to locations should respect geographic
proximity.
Hash to regions rather than to locations to
avoid hot spots and increase robustness.
Trade-off
If the frequency of event generation is
high, then pushing data to arbitrary
rendezvous points may be too expensive.
Data indices and range queries
It is difficult to serve range queries well:
the TinyDB aggregation tree requires flooding the
entire network for each query.
Indices
Auxiliary data structures that facilitate and
speed up query execution.
Useful when the query rate is higher than the
update rate.
Indices
Key idea
Pre-store the answers to certain special queries,
then assemble the answer to an arbitrary range
query from these pre-stored answers.
Index structure
Hash table, k-d tree, quad-tree, R-tree, …
Trade-off
Between the number of pre-stored answers and
the speed of query execution.
One-Dimensional Indices
[Figure: canonical subsets of sensors along a road; a balanced binary tree with logical nodes u1-u7 over sensors s0-s7, where each internal node corresponds to a canonical subset.]
One-Dimensional Indices
We map logical node u_i to physical node s_(i-1).
Canonical subsets
The nodes with the pre-stored data: s0-s6.
u1 = s0 ⊕ s1
u2 = s0 ⊕ s1 ⊕ s2 ⊕ s3
u3 = s2 ⊕ s3
u4 = s0 ⊕ s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 ⊕ s7
u5 = s4 ⊕ s5
u6 = s4 ⊕ s5 ⊕ s6 ⊕ s7
u7 = s6 ⊕ s7   (⊕ denotes the aggregation operator)
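An arbitrary range can then be assembled from the pre-stored canonical subsets, segment-tree style: the query range is decomposed into O(log n) canonical nodes whose pre-aggregated values are combined with ⊕. The sketch below assumes ⊕ is addition and uses the same 8-sensor layout; it is an illustration of the idea rather than a deployed implementation.

def canonical_cover(lo, hi, qlo, qhi):
    """Decompose query range [qlo, qhi] over leaves [lo, hi] into the
    canonical ranges whose answers are pre-stored."""
    if qhi < lo or hi < qlo:
        return []                              # disjoint: contributes nothing
    if qlo <= lo and hi <= qhi:
        return [(lo, hi)]                      # fully covered: use this node
    mid = (lo + hi) // 2
    return canonical_cover(lo, mid, qlo, qhi) + canonical_cover(mid + 1, hi, qlo, qhi)

# Pre-stored aggregates: one value per canonical range (here, sums of readings).
readings = {i: 10 + i for i in range(8)}       # s0..s7
prestored = {}
def build(lo, hi):
    if lo == hi:
        prestored[(lo, hi)] = readings[lo]
    else:
        mid = (lo + hi) // 2
        prestored[(lo, hi)] = build(lo, mid) + build(mid + 1, hi)
    return prestored[(lo, hi)]
build(0, 7)

# A query over s1..s6 decomposes into four canonical pieces.
parts = canonical_cover(0, 7, 1, 6)
print(parts, sum(prestored[p] for p in parts))   # [(1,1), (2,3), (4,5), (6,6)] 81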
SELECT * FROM Nestion_Events
WHERE Temperature >= 50
AND Temperature <= 60
AND Light >= 5
AND Light <= 10
[Figure: the query range drawn as a rectangle in the plane with Temperature (0-70) on one axis and Light (0-50) on the other.]
A k-d tree partitions a plane
into rectangles
Drill down the k-d tree with query rectangle Q:
When reaching a node whose corresponding
rectangle is disjoint from Q, stop the
propagation.
When reaching a node whose corresponding
rectangle is fully contained in Q, incorporate
its count into the events of interest.
Otherwise, expand the node and continue
drilling into its children.
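A compact sketch of this drill-down for counting events in a rectangular query range Q, assuming each node stores its rectangle, its event count, and (at the leaves) the events themselves; the tree representation is illustrative.

def range_count(node, q):
    """Count events inside q = (xmin, ymin, xmax, ymax)."""
    r = node["rect"]
    # Case 1: the node's rectangle is disjoint from Q -> stop propagation.
    if r[2] < q[0] or q[2] < r[0] or r[3] < q[1] or q[3] < r[1]:
        return 0
    # Case 2: the rectangle is fully contained in Q -> use the stored count.
    if q[0] <= r[0] and q[1] <= r[1] and r[2] <= q[2] and r[3] <= q[3]:
        return node["count"]
    # Case 3: partial overlap -> expand the node and drill into its children.
    if node["children"]:
        return sum(range_count(c, q) for c in node["children"])
    # A partially overlapping leaf checks its own events.
    return sum(1 for (x, y) in node["events"]
               if q[0] <= x <= q[2] and q[1] <= y <= q[3])

leaf1 = {"rect": (0, 0, 50, 50), "count": 2, "children": [], "events": [(10, 10), (40, 30)]}
leaf2 = {"rect": (50, 0, 100, 50), "count": 1, "children": [], "events": [(70, 20)]}
root = {"rect": (0, 0, 100, 50), "count": 3, "children": [leaf1, leaf2]}
print(range_count(root, (30, 0, 80, 50)))   # 2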
A k-d tree partitions a plane
into rectangles
[Figure: the plane of Light versus Temperature partitioned into rectangles by the k-d tree.]
Non-orthogonal Range Searching
[Figure: a non-orthogonal query range over the partition; the query propagates only into cells that intersect the query range.]
Distributed Hierarchical Aggregation
Storage example: an event with "temperature" equal to 9 is generated at location (68,61).
Compute a geographically bounded hash at each level of the hierarchy:
"temperature:1:16" in (50,50)->(75,75)
"temperature:9:12" in (50,50)->(100,100)
"temperature:9:9" in (0,0)->(100,100)
Periodically propagate the data up the tree.
[Figure: the DIFS hierarchy; a root point with level-1 and level-2 children over a 100 x 100 field, annotated with the value ranges (1-4, 5-8, 9-12, 13-16, 1-16) indexed at each level.]
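A sketch of the geographically bounded hash used in the storage example above: keys covering wide value ranges are hashed into small bounding boxes, while keys covering narrow ranges are hashed over the whole field, so detailed indices stay local and coarse ones are spread out. The hash construction here is an assumption for illustration, not DIFS's exact function.

import hashlib

def bounded_hash(key, bbox):
    """Hash a key such as "temperature:9:12" to a point inside
    bbox = (xmin, ymin, xmax, ymax); the nearby index node stores it."""
    digest = hashlib.sha1(key.encode()).digest()
    fx = int.from_bytes(digest[:4], "big") / 2**32
    fy = int.from_bytes(digest[4:8], "big") / 2**32
    xmin, ymin, xmax, ymax = bbox
    return (xmin + fx * (xmax - xmin), ymin + fy * (ymax - ymin))

# The three index entries from the example event (temperature = 9 at (68,61)):
print(bounded_hash("temperature:1:16", (50, 50, 75, 75)))
print(bounded_hash("temperature:9:12", (50, 50, 100, 100)))
print(bounded_hash("temperature:9:9", (0, 0, 100, 100)))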
Fractional Cascading
[Figure: leaves of the quad-tree; a sensor p's view of the world.]
Locality-Preserving Hashing
Goal:
Map the attribute space to the plane so that
nearby points in attribute space correspond to
nearby locations in the plane.
DIM (distributed index for multidimensional data)
Data with values close to one another are hashed to
locations nearby
Zone code: a unique identifier for each zone
DIM - zone tree & zone code
[Figure: DIM zone tree and zone codes; the plane is recursively split into zones a-g, each identified by the binary code of its path in the zone tree (e.g., c = 00, f = 1110, g = 1111).]
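A hedged sketch of how a DIM-style zone code can be derived from a tuple: alternate between the attributes (normalized to [0,1)) and emit one bit per level of the zone tree, so tuples with similar values share a long code prefix and are therefore stored in nearby zones. The exact bit ordering is an assumption for illustration.

def zone_code(values, depth):
    """Binary zone code for a tuple of normalized attribute values in [0, 1),
    taking one bit per attribute in round-robin order."""
    code = ""
    lo = [0.0] * len(values)
    hi = [1.0] * len(values)
    for level in range(depth):
        i = level % len(values)               # alternate attributes per level
        mid = (lo[i] + hi[i]) / 2
        if values[i] < mid:
            code += "0"
            hi[i] = mid
        else:
            code += "1"
            lo[i] = mid
    return code

# Two similar readings share a long prefix, so they land in nearby zones.
print(zone_code([0.30, 0.80], 4))   # e.g. temperature = 0.30, light = 0.80 -> "0111"
print(zone_code([0.32, 0.78], 4))   # -> "0111"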
Temporal Data
Overall node storage is very limited
We might query about the past, the
present, or the future
Data Aging
An application-dependent schedule for discarding
data and data summaries.
Indexing Motion Data
A fixed index structure soon becomes obsolete
as objects move, incurring heavy update and
communication costs
Both the index construction and
updates can be quite expensive
Modify only when new objects are
inserted or deleted, or when the
trajectory of an object changes
KDS (Kinetic Data Structure)
Update only when certain critical events
occur
Drawback
It may lead to wasted processing during
periods of inactivity, when no queries are
present in the network, because the index
still requires updates as time goes on.
These updates need not be as frequent if the
motion predictions are accurate.
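A minimal sketch of the kinetic idea for objects with known linear motion: rather than re-indexing at every time step, compute the "critical event" time at which an object will cross into another grid cell and touch the index only then (or when its trajectory changes). This illustrates the KDS principle under a simple grid-index assumption, not any particular published structure.

import math

def next_cell_crossing(pos, vel, cell_size=10.0, now=0.0):
    """Earliest future time at which an object with position pos and constant
    velocity vel crosses a grid-cell boundary (the next critical event)."""
    times = []
    for p, v in zip(pos, vel):
        if v == 0:
            continue
        edge = (math.floor(p / cell_size) + (1 if v > 0 else 0)) * cell_size
        t = (edge - p) / v
        if t > 0:
            times.append(now + t)
    return min(times, default=math.inf)

# This object's index entry stays valid until t = 2.0; no update is needed
# before then unless its predicted trajectory changes.
print(next_cell_crossing(pos=(18.0, 5.0), vel=(1.0, -2.0)))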
Summary
This area is still in its infancy; much more
needs to be done.
As we remarked, integration of query
processing with the networking layer, the
mapping of index structures to the spatial
topology of the network, and distributed
index construction for motion data all remain
important topics for further investigation