
Sensor Network Databases

Chapter 6
Feng Zhao
Leonidas J. Guibas
Wireless Sensor Networks
Outline
• Sensor Database Challenges
• Querying the Physical Environment
• Query Interfaces
• High-Level Database Organization
• In-Network Aggregation
• Data-Centric Storage
• Data Indices and Range Queries
• Distributed Hierarchical Aggregation
• Temporal Data
• Summary
Sensor Network Abstraction
• Characteristics: distributed, resource-constrained, failure prone
• From a data storage point of view: think of a sensor net as a distributed database
Sensor Network Database Challenges
• The sensor network is highly volatile.
  • Nodes may be depleted, and links may go down.
• Relational tables are not static.
  • New data is continuously being sensed.
• High energy cost of communication.
  • In-network processing is needed during query execution.
• The rates at which input data arrive at a database operator can be highly variable.
Sensor Network Database Challenges
• Limited storage on sensor nodes.
  • Older data has to be discarded.
• Sensor tasking interacts in numerous ways with the sensor database system.
• Classical metrics of database system performance may have to be adjusted.
Differences in Sensor Network Databases
• Sensor network data inherently include errors
  • interference from other signals, device noise
• Range and probabilistic or approximate queries are more appropriate than exact queries.
• Additional operators are needed in the query language
  • to specify durations and sampling rates for the data
• Continuous, long-running queries
  • Ex: monitoring the average temperature in a room
• Operators for correlating and comparing readings across sensors
Querying the Physical Environment
• Aggregate queries
  • The query result is computed by integrating data from a set of sensors.
  • Data is delivered from distributed sensor nodes to a central node for computation.
  • Ex: average, or a join of sensor readings from different groups.
• Correlation queries
  • "Sound an alarm whenever two sensors within 10 meters of each other simultaneously detect an abnormal temperature."
Querying the Physical Environment
• Snapshot queries
  • "Retrieve the current rainfall level for all sensors in Southern California."
• Historical queries
  • "Display the average rainfall level at all sensors for the last three months of the previous year."
TinyDB Query Interfaces
• SQL-style querying → long-running monitoring queries
  • "For the next three hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches."

SELECT MAX(Rainfall_Level), county
FROM sensors
WHERE state = 'California'
GROUP BY county
HAVING MAX(Rainfall_Level) > 3.0 in
DURATION [now, now + 180 min]
SAMPLING PERIOD 10 min
Cougar Sensor Database
• Object-relational database
• SQL-type query interface
• Each type of sensor is associated with an abstract data type (ADT)
  • Device ADT methods represent device functions,
    e.g., getTemperature(); detectTempGreaterThan(90)
Examples of Long-running queries
CREATE LR_QUERY q1 AS
SELECT R.dev, R.dev.getTemperature()
FROM TempSensors R, NamedPlaces N
WHERE $every(30)
AND R.dev.location().inside(N.bbox)
AND N.name = “California”;

CREATE LR_QUERY q2 AS
SELECT R1.dev.location()
FROM TempSensors R1, TempSensors R2
WHERE $every(10)
AND R1.dev.detectAbnormalTemperature()
AND R2.dev.detectAbnormalTemperature()
AND R1.dev > R2.dev;
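Here q1 periodically ($every(30)) reports the temperature at every sensor whose location falls inside the bounding box of the named place "California", while q2 is a self-join that fires whenever two distinct sensors both detect an abnormal temperature; the predicate R1.dev > R2.dev keeps a device from being paired with itself and prevents each pair from being reported twice.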
Probabilistic Queries
• Sensor data is subject to random errors.
• Sensor data is modeled as normally distributed, characterized by a Gaussian p.d.f.
• GADT (Gaussian ADT)
  • An instance of the ADT corresponds to a Gaussian p.d.f.,
    represented by its mean μ and standard deviation σ.
  • A Prob operator is used to pose queries.
Probabilistic Queries
• "Retrieve from sensors all tuples whose temperature is within 0.5 degrees of 68 degrees, with at least 60 percent probability."
• Ex: SELECT *
     FROM sensors
     WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6
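The semantics of this predicate can be sketched directly from the Gaussian c.d.f. The sketch below is illustrative only — the class and method names are hypothetical, not Cougar's actual GADT interface — but it shows the computation a GADT must perform: integrate the p.d.f. over the query interval and compare against the probability threshold.

import math

class GaussianReading:
    """A reading modeled as X ~ N(mu, sigma^2), as in the GADT."""
    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def prob(self, lo, hi):
        """P(lo <= X <= hi), computed from the Gaussian c.d.f."""
        cdf = lambda x: 0.5 * (1.0 + math.erf((x - self.mu) / (self.sigma * math.sqrt(2.0))))
        return cdf(hi) - cdf(lo)

# WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6, for an assumed mu and sigma:
temp = GaussianReading(mu=68.0, sigma=0.4)
print(temp.prob(67.5, 68.5) >= 0.6)   # True: the interval holds ~79% of the mass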
Centralized approach
• Each sensor forwards its data to a central server.
• Disadvantages
  • The nodes near the access point become traffic hot spots.
  • Sampling rates have to be set to the highest rate any query might need,
    burdening the network with unnecessary traffic.
In-network storage approach
• Choose rendezvous points to store data inside the network.
• Advantages
  • The overhead to store and access the data is minimized.
  • The overall load is balanced across the network.
Server-based approach
• Requires a total of 16 message transmissions.
In-Network Aggregation
• Each sensor may compute a partial state record based on its own data and that of its children.
• Requires a total of 6 message transmissions.
Aggregation Framework
• As in extensible databases, TinyDB supports any aggregation function conforming to:
  Agg_n = {f_init, f_merge, f_evaluate}
  f_init{a0} → <a0>
  f_merge{<a1>, <a2>} → <a12>   (a partial state record)
  f_evaluate{<a>} → aggregate value

• Example: AVERAGE
  AVG_init{v} → <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} → S/C
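A minimal executable sketch of this decomposition, using AVERAGE: partial state records are (sum, count) pairs, so each node forwards one small record to its parent instead of a list of raw readings. The readings and tree shape below are hypothetical.

def avg_init(v):
    return (v, 1)                       # f_init: reading -> partial state record

def avg_merge(a, b):
    return (a[0] + b[0], a[1] + b[1])   # f_merge: combine two partial records

def avg_evaluate(a):
    return a[0] / a[1]                  # f_evaluate: partial record -> final value

# A parent merges its own reading with its children's partial state records:
record = avg_init(22.0)
for child in [avg_init(20.0), (47.0, 2)]:   # a leaf child and a 2-sensor subtree
    record = avg_merge(record, child)
print(avg_evaluate(record))                 # (22 + 20 + 47) / (1 + 1 + 2) = 22.25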
Aggregates and their efficiency in TAG
(figure: bytes transmitted per epoch, summed over all sensors, for the COUNT, MIN, HISTOGRAM, AVERAGE, and MEDIAN aggregates, on a scale of 20,000–80,000 bytes)
Performance Metrics
• Network usage
  • Total usage and hot-spot usage
• Preprocessing time
  • Time taken to construct an index
• Storage space requirement
• Query time
  • Time to process a query, assemble an answer, and return this answer
• Throughput
• Update and maintenance cost
Properties of a Sensor Database
• Persistence
  • Data stored in the system must remain available to queries.
• Consistency
  • A query must be routed correctly to a node where the data are currently stored.
• Controlled access to data
• Scalability in network size
  • As the number of nodes increases, the communication cost should not grow unduly.
• Load balancing
• Topological generality
  • The database architecture should work well on a broad range of network topologies.
Query Processing Scheduling
• TinyDB uses an epoch-based mechanism.
  • The epoch should be sufficiently long for data to travel from the leaves to the root.
• Each epoch is divided into time intervals.
  • The number of intervals reflects the depth of the routing tree.
• Each node only needs to power up during its scheduled interval.
Schedule of In-Network Aggregation
(figure: four snapshots of the epoch schedule for SELECT COUNT(*) FROM sensors over a five-node routing tree; intervals count down from 4 to 1, the deepest sensors transmit first, parents merge the partial counts they hear in each succeeding interval, the root obtains the total count of 5 in interval 1, and the next epoch then begins at interval 4)
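A minimal sketch of this schedule for the COUNT query, assuming a hypothetical five-node routing tree given as child → parent links: a node at depth d transmits its partial count during interval d, so intervals count down from the tree depth to 1 and each parent is awake exactly when its children transmit.

nodes = [1, 2, 3, 4, 5]
parent = {2: 1, 3: 1, 4: 2, 5: 4}            # node 1 is the root

def depth(n):
    return 1 if n not in parent else 1 + depth(parent[n])

count = {n: 1 for n in nodes}                # f_init: every node counts itself
for interval in range(max(depth(n) for n in nodes), 1, -1):
    for n in nodes:
        if depth(n) == interval:             # n's scheduled transmit interval
            count[parent[n]] += count[n]     # f_merge at the listening parent

print(count[1])                              # the root evaluates COUNT(*) = 5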
Data-Centric Storage (DCS)
• DCS is a method proposed to support queries from any node in the network by providing a rendezvous mechanism for data and queries.
  • Avoids flooding the entire network.
• At the center of a DCS system are rendezvous points.
• DCS distributes the storage load across the entire network.
Data-Centric Storage (DCS)
• For example:
  • The geographic hash table (GHT) attempts to distribute data evenly across the network.
  • GHT assumes each node knows its geographic location (by GPS or otherwise).
  • A data object is associated with a key.
  • Each node is responsible for storing a certain range of keys.
Geographic Hash Table (GHT)
• Rendezvous
  • Events are named with keys.
  • Storage and retrieval are performed using these keys.
  • A key is hashed to a geographic position.
  • Geographic routing (GPSR) is used to locate the node closest to this geographic position.
  • This node serves as a rendezvous for storage and search (see the sketch below).
• Costs
  • No flooding of queries.
  • Aggregate storage cost is the same as for external storage.
• Structured replication
  • Rendezvous points are replicated.
  • Decreases storage communication cost.
  • Increases query dissemination cost.
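A minimal sketch of the rendezvous idea. The field size, node coordinates, and use of SHA-1 are illustrative assumptions, and the nearest-node search stands in for GPSR routing, which is how a real GHT actually reaches the "home" node.

import hashlib

FIELD = (100.0, 100.0)                       # assumed field dimensions

def hash_to_position(key):
    """Hash a key to a geographic position in the field."""
    h = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(h[:4], "big") / 2**32 * FIELD[0]
    y = int.from_bytes(h[4:8], "big") / 2**32 * FIELD[1]
    return (x, y)

def home_node(key, nodes):
    """The node closest to the hashed position (a stand-in for GPSR)."""
    px, py = hash_to_position(key)
    return min(nodes, key=lambda n: (n[0] - px) ** 2 + (n[1] - py) ** 2)

nodes = [(10, 20), (45, 80), (70, 30), (90, 90)]     # hypothetical node positions
storage = {}

# Store an event under its key, then retrieve it from any node: the same
# hash leads both operations to the same rendezvous node, with no flooding.
storage.setdefault(home_node("elephant-sighting", nodes), []).append({"loc": (42, 17)})
print(storage.get(home_node("elephant-sighting", nodes), []))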
Structured replication in GHT
(figure: a 100×100 field showing the root point together with its level-1 and level-2 mirror points)
Data-Centric Storage (DCS)
• Reduce unnecessary network traffic
  • Hashing to locations should respect geographic proximity.
  • Hash to regions rather than to single locations, to avoid hot spots and increase robustness.
• Trade-off
  • If the frequency of event generation is high, then pushing data to arbitrary rendezvous points may be too expensive.
Data indices and range queries
• It is difficult to serve a range query well.
  • The TinyDB aggregation tree requires flooding the entire network for each query.
• Indices
  • Auxiliary data structures that facilitate and speed up query execution.
  • Useful when the query rate is higher than the update rate.
Indices
• Key idea
  • Pre-store the answers to certain special queries, then assemble the answer to an arbitrary range query from these pre-stored answers.
• Index structures
  • Hash table, k-d tree, quad-tree, R-tree, ...
• Trade-off
  • Between the number of pre-stored answers and the speed of query execution.
One-Dimensional Indices
(figure: canonical subsets of sensors s0–s7 along a road, organized as a balanced binary tree whose internal nodes are u1–u7)
One-Dimensional Indices
• We map logical node u_i to physical node s_{i-1}.
• Canonical subsets
  • The pre-stored partial aggregates, hosted at physical nodes s0–s6:
    u1: s0 ⊕ s1
    u2: s0 ⊕ s1 ⊕ s2 ⊕ s3
    u3: s2 ⊕ s3
    u4: s0 ⊕ s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 ⊕ s7
    u5: s4 ⊕ s5
    u6: s4 ⊕ s5 ⊕ s6 ⊕ s7
    u7: s6 ⊕ s7          (⊕ denotes the aggregation operator)
• Complexity: storage O(n); query O(log n)
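The canonical-subset scheme is exactly a segment tree: each u_i pre-stores the aggregate of its subset, and any contiguous range decomposes into O(log n) canonical subsets. A minimal sketch, with + standing in for the generic ⊕ and hypothetical readings:

readings = [3, 1, 4, 1, 5, 9, 2, 6]          # readings at s0..s7

n = len(readings)
tree = [0] * (2 * n)
tree[n:] = readings                           # leaves
for i in range(n - 1, 0, -1):
    tree[i] = tree[2 * i] + tree[2 * i + 1]   # canonical subset stored at u_i

def range_aggregate(lo, hi):
    """Aggregate over s_lo..s_hi (inclusive) from O(log n) canonical subsets."""
    lo += n; hi += n + 1; total = 0
    while lo < hi:
        if lo & 1: total += tree[lo]; lo += 1
        if hi & 1: hi -= 1; total += tree[hi]
        lo //= 2; hi //= 2
    return total

print(range_aggregate(2, 5))   # s2 + s3 + s4 + s5 = 19, using just u3 and u5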


Multidimensional Indices for Orthogonal Range Searching
• Orthogonal range query:

SELECT * FROM Detection_Events
WHERE Temperature >= 50
AND Temperature <= 60
AND Light >= 5
AND Light <= 10

(figure: the query rectangle in the Temperature–Light plane)
A k-d tree partitions a plane into rectangles
• Drill down the k-d tree with the query rectangle Q (see the sketch below):
  • On reaching a node whose rectangle is disjoint from Q, stop propagation.
  • On reaching a node whose rectangle is fully contained in Q, incorporate its pre-stored count into the tally of events of interest.
  • Otherwise, expand the node and continue drilling into its children.
(figure: a k-d tree partition of the Temperature–Light plane into rectangles)
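A minimal sketch of the drill-down, assuming every tree node pre-stores the count of events inside its rectangle and only leaves keep raw events; the rectangles and points are hypothetical. Only nodes whose rectangles straddle the boundary of Q are expanded.

class Node:
    def __init__(self, rect, children=(), points=()):
        self.rect = rect                     # (xlo, ylo, xhi, yhi)
        self.children = children
        self.points = list(points)           # only leaves hold raw events
        self.count = (sum(c.count for c in children) if children
                      else len(self.points)) # pre-stored count for this rectangle

def disjoint(a, b):
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

def contains(q, r):                          # is rectangle r fully inside q?
    return q[0] <= r[0] and q[1] <= r[1] and r[2] <= q[2] and r[3] <= q[3]

def range_count(node, q):
    if disjoint(node.rect, q):
        return 0                             # prune: stop propagation here
    if contains(q, node.rect):
        return node.count                    # use the pre-stored answer wholesale
    if node.children:
        return sum(range_count(c, q) for c in node.children)
    return sum(1 for (x, y) in node.points   # boundary leaf: scan its events
               if q[0] <= x <= q[2] and q[1] <= y <= q[3])

leaves = [Node((0, 0, 50, 50),     points=[(10, 10), (40, 30)]),
          Node((50, 0, 100, 50),   points=[(60, 20)]),
          Node((0, 50, 50, 100),   points=[(20, 80)]),
          Node((50, 50, 100, 100), points=[(70, 90), (55, 60)])]
root = Node((0, 0, 100, 100), children=leaves)
print(range_count(root, (5, 5, 60, 65)))     # 4 events fall inside Q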
Non-orthogonal Range Searching
(figure: a non-orthogonal query range overlaid on the rectangle partition; the query must keep propagating through each of the many rectangles that the range boundary crosses)
Distributed Hierarchical Aggregation
• Designing a distributed index
  • Load-balance the communication, processing, and storage across the nodes.
• Robustness considerations
  • Frequent failures of nodes and links.
  • Important for WSN databases, yet an issue that has not received the attention it deserves.
Multiresolution Summarization
• Wavelet transforms
  • One way to compress and summarize information for both temporal and spatial signals (see the sketch below).
• Data structure
  • Quad-tree
• Routing
  • GPSR + GHT
• Avoiding hot spots
  • Replication
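One step of a Haar wavelet transform illustrates the summarization idea: pairwise averages form a half-resolution summary, while the detail coefficients can be quantized or dropped to compress. This is a minimal sketch with hypothetical readings, not a full multiresolution pipeline.

def haar_step(signal):
    """One Haar level: half-resolution averages plus detail coefficients."""
    averages = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    details  = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return averages, details

readings = [21.0, 23.0, 22.0, 30.0, 29.0, 31.0, 25.0, 25.0]
summary, details = haar_step(readings)
print(summary)    # [22.0, 26.0, 30.0, 25.0] -- the half-resolution summary
# A parent in the hierarchy can store only the summary (applying further Haar
# steps for coarser levels) and still answer approximate queries; the details
# are kept, truncated, or discarded depending on the data-aging policy.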
Partitioning the Summaries
• Queries start at the root of the summarization tree.
• Partition the aggregation data in a meaningful way to lessen the load on nodes near the root of the hierarchy.
• Use a multi-rooted quad-tree to partition the spatial domain.
• System: DIFS
Quad Tree Approach
• Quaternary tree:
  • Each node has 4 children.
  • Each node has 4 histograms summarizing the data distribution in each child subtree.
• Queries only propagate into relevant parts of the tree (pruning).
Quad Tree: Issues
• Explicit child pointers are required.
• On storage of new data, the update must be propagated up the tree.
• Every query must originate at the tree root.
  • The root bears a greater burden!
DIFS
• DIFS stands for Distributed Index for Features in Sensor networks.
• Goals
  • Provide an efficient query mechanism for range searches over event attributes.
  • Extend network lifetime by amortizing the costs of communication and storage over as many nodes as possible,
    even at the expense of modest increases in overall cost.
GHT-based Quad Tree
• We add an index structure to structured replication.
  • A hierarchy of histograms summarizes the range of data within the children.
• Problem: the root is the bottleneck.
  • Every query goes through it.
  • Information from every event that is generated propagates to it.
(figure: the GHT-based quad tree, with a root point, level-1 children, and level-2 children)
The DIFS Tree
• Every node (except the root) has 1–4 parents.
• The wider the spatial extent an index node knows about, the more constrained the value range it covers.
(figure: a multi-rooted DIFS index over a 4×4 grid of nodes; index nodes covering single quadrants hold the full value range 1–16, while nodes covering wider extents hold narrower ranges such as 1–4, 5–8, 9–12, and 13–16)
Storage
• Example: an event with "temperature" equal to 9 is generated at location (68, 61).
• Compute geographically bounded hashes:
  • "temperature:1:16" hashed within (50,50)–(75,75)
  • "temperature:9:12" hashed within (50,50)–(100,100)
  • "temperature:9:9" hashed within (0,0)–(100,100)
• Periodically propagate the histograms up the tree.
(figure: the event at (68, 61) in a 100×100 field, with the three index nodes chosen by the bounded hashes)
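A minimal sketch of a geographically bounded hash, following the "attribute:lo:hi" key encoding of the example above. The rescaling construction, the SHA-1 hash, and the node placement are assumptions rather than DIFS's exact mechanism; the point is that the same key lands at different home nodes depending on the bounding box it is constrained to.

import hashlib

FIELD = (100.0, 100.0)

def bounded_hash(key, bbox):
    """Hash a key to a position constrained to lie inside bbox."""
    (xlo, ylo), (xhi, yhi) = bbox
    h = hashlib.sha1(key.encode()).digest()
    u = int.from_bytes(h[:4], "big") / 2**32     # two uniform values in [0, 1)
    v = int.from_bytes(h[4:8], "big") / 2**32
    return (xlo + u * (xhi - xlo), ylo + v * (yhi - ylo))

def closest_node(pos, nodes):                    # a stand-in for GPSR routing
    return min(nodes, key=lambda n: (n[0] - pos[0])**2 + (n[1] - pos[1])**2)

nodes = [(12, 88), (40, 40), (66, 15), (68, 61), (85, 75)]

# The event with temperature 9 at (68, 61) is inserted at three index levels:
for key, bbox in [("temperature:1:16", ((50, 50), (75, 75))),
                  ("temperature:9:12", ((50, 50), (100, 100))),
                  ("temperature:9:9",  ((0, 0),  (100, 100)))]:
    print(key, "->", closest_node(bounded_hash(key, bbox), nodes))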
DIFS Hierarchy
(figure: the DIFS hierarchy)
Fractional Cascading
(figure: the leaves of the quad-tree, and a sensor p's view of the world)
Locality-Preserving Hashing
• Goal:
  • A way to map the attribute space to the plane so that nearby locations in attribute space correspond to nearby locations in the plane.
• DIM (Distributed Index for Multidimensional data)
  • Data with values close to one another are hashed to nearby locations.
  • Zone code: a unique identifier for each zone.
DIM - zone tree & zone code
(figure: a zone tree and the corresponding zone codes; each zone's code is read off the 0/1 branches from the root, e.g. c = 00, a = 010, b = 011, d = 100, e = 101, f = 1110, g = 1111)
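A minimal sketch of how such a zone code can be computed: alternately bisect each attribute's (normalized) range, emitting a 0 or 1 per level, so that readings with close values share long code prefixes and therefore hash to nearby zones. The code length and split order are assumptions, not DIM's exact parameters.

def zone_code(values, bits=6):
    """values: readings normalized to [0, 1), one per dimension."""
    lo = [0.0] * len(values)
    hi = [1.0] * len(values)
    code = ""
    for i in range(bits):
        d = i % len(values)              # alternate dimensions level by level
        mid = (lo[d] + hi[d]) / 2
        if values[d] < mid:
            code += "0"; hi[d] = mid     # descend into the lower half
        else:
            code += "1"; lo[d] = mid     # descend into the upper half
    return code

# Nearby (temperature, light) readings receive the same or nearby zone codes:
print(zone_code([0.30, 0.70]))   # 011001
print(zone_code([0.32, 0.68]))   # 011001 -- same zone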
Temporal Data
• Overall node storage is very limited.
• We might query about the past, the present, or the future.
• Data aging
  • Application-dependent
  • A schedule for discarding data and data summaries
Indexing Motion Data
• A fixed index structure soon becomes obsolete as objects move, and keeping it current incurs heavy update and communication costs.
• Both the index construction and the updates can be quite expensive.
• Better: modify the index only when new objects are inserted or deleted, or when the trajectory of an object changes.
KDS (Kinetic Data Structures)
• Update the index only when certain critical events occur (see the sketch below).
• Drawback
  • It may lead to wasted processing during periods of inactivity, when no queries are present in the network, because the index still has to be updated as time goes on.
• These updates need not be frequent if the motion predictions are accurate.
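The kinetic idea can be sketched with the simplest possible certificate — the left-to-right order of two points moving linearly along a line. Nothing is recomputed while the certificate holds; work happens only at its failure time. The linear motion model x(t) = x0 + v*t and all values are illustrative.

def failure_time(a, b):
    """Time at which points a = (x0, v) and b = (x0, v) swap order, if ever."""
    (xa, va), (xb, vb) = a, b
    if va == vb:
        return None                     # same velocity: the order never changes
    t = (xb - xa) / (va - vb)
    return t if t > 0 else None         # only future events matter

a, b = (0.0, 2.0), (10.0, 1.0)          # a starts behind b but moves faster
order = ["a", "b"]                      # certificate: a is left of b
t_fail = failure_time(a, b)             # the certificate fails at t = 10
if t_fail is not None:
    order.reverse()                     # at t_fail, flip the order and schedule
                                        # a new certificate; no work before then
print(t_fail, order)                    # 10.0 ['b', 'a']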
Summary
• This area is still in its infancy; much more needs to be done.
• As we remarked, the integration of query processing with the networking layer, the mapping of index structures onto the spatial topology of the network, and distributed index construction for motion data all remain important topics for further investigation.
