
Sensor Network Databases

Chapter 6
Feng Zhao
Leonidas J. Guibas
Wireless Sensor Networks
Outline
• Sensor Database Challenges
• Querying the Physical Environment
• Query Interfaces
• High-Level Database Organization
• In-Network Aggregation
• Data-Centric Storage
• Data Indices and Range Queries
• Distributed Hierarchical Aggregation
• Temporal Data
• Summary
Sensor Network Abstraction
• Characteristics: distributed, resource-constrained, failure prone
• From a data storage point of view: think of a sensor net as a distributed database
Sensor Network Database Challenges
• The sensor network is highly volatile.
  • Nodes may be depleted, and links may go down.
• Relational tables are not static.
  • New data is continuously being sensed.
• High energy cost of communication.
  • In-network processing is needed during query execution.
• The rates at which input data arrive at a database operator can be highly variable.
Sensor Network Database Challenges
• Limited storage on sensor nodes.
  • Older data has to be discarded.
• Sensor tasking interacts in numerous ways with the sensor database system.
• Classical metrics of database system performance may have to be adjusted.
Differences in Sensor Network Databases
• Sensor network data inherently include errors
  • interference from other signals, device noise
• Range and probabilistic or approximate queries are more appropriate than exact queries.
• Additional operators are needed in the query language
  • to specify durations and sampling rates for the data
• Continuous, long-running queries
  • Ex: monitoring the average temperature in a room
• Operators for correlating and comparing readings across sensors
Querying the Physical Environment
• Aggregate queries
  • The query result is computed by integrating data from a set of sensors.
  • Data is delivered from distributed sensor nodes to a central node for computation.
  • Ex: average, or a join of sensor readings from different groups.
• Correlation queries
  • "Sound an alarm whenever two sensors within 10 meters of each other simultaneously detect an abnormal temperature."
Querying the Physical Environment
• Snapshot queries
  • "Retrieve the current rainfall level for all sensors in Southern California."
• Historical queries
  • "Display the average rainfall level at all sensors for the last three months of the previous year."
TinyDB Query Interfaces
• SQL-style querying → long-running monitoring queries
  • "For the next three hours, retrieve every 10 minutes the maximum rainfall level in each county in Southern California, if it is greater than 3.0 inches."

SELECT MAX(Rainfall_Level), county
FROM sensors
WHERE state = 'California'
GROUP BY county
HAVING MAX(Rainfall_Level) > 3.0 in
DURATION [now, now + 180 min]
SAMPLING PERIOD 10 min
Cougar Sensor Database
• Object-relational database
• SQL-type query interface
• Each type of sensor is associated with an abstract data type (ADT)
  • Device ADT methods represent device functions,
    e.g., getTemperature(); detectTempGreaterThan(90)
Examples of Long-running queries
CREATE LR_QUERY q1 AS
SELECT R.dev, R.dev.getTemperature()
FROM TempSensors R, NamedPlaces N
WHERE $every(30)
AND R.dev.location().inside(N.bbox)
AND N.name = “California”;

CREATE LR_QUERY q2 AS
SELECT R1.dev.location()
FROM TempSensors R1, TempSensors R2
WHERE $every(10)
AND R1.dev.detectAbnormalTemperature()
AND R2.dev.detectAbnormalTemperature()
AND R1.dev > R2.dev;
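Here q1 periodically ($every(30)) reports the temperature at every sensor whose location falls inside the bounding box of the named place "California", while q2 is a self-join that fires whenever two distinct sensors both detect an abnormal temperature; the predicate R1.dev > R2.dev keeps a device from being paired with itself and prevents each pair from being reported twice.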
Probabilistic Queries
• Sensor data is subject to random errors.
• Sensor data is modeled as normally distributed, characterized by a Gaussian p.d.f.
• GADT (Gaussian ADT)
  • An instance of the ADT corresponds to a Gaussian p.d.f.,
    represented by its mean μ and standard deviation σ.
  • A Prob operator is used to pose queries.
Probabilistic Queries
• "Retrieve from sensors all tuples whose temperature is within 0.5 degrees of 68 degrees, with at least 60 percent probability."
• Ex: SELECT *
     FROM sensors
     WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6
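The semantics of this predicate can be sketched directly from the Gaussian c.d.f. The sketch below is illustrative only — the class and method names are hypothetical, not Cougar's actual GADT interface — but it shows the computation a GADT must perform: integrate the p.d.f. over the query interval and compare against the probability threshold.

import math

class GaussianReading:
    """A reading modeled as X ~ N(mu, sigma^2), as in the GADT."""
    def __init__(self, mu, sigma):
        self.mu = mu
        self.sigma = sigma

    def prob(self, lo, hi):
        """P(lo <= X <= hi), computed from the Gaussian c.d.f."""
        cdf = lambda x: 0.5 * (1.0 + math.erf((x - self.mu) / (self.sigma * math.sqrt(2.0))))
        return cdf(hi) - cdf(lo)

# WHERE Sensor.Temp.Prob([67.5, 68.5]) >= 0.6, for an assumed mu and sigma:
temp = GaussianReading(mu=68.0, sigma=0.4)
print(temp.prob(67.5, 68.5) >= 0.6)   # True: the interval holds ~79% of the mass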
Centralized approach
• Each sensor forwards its data to a central server.
• Disadvantages
  • The nodes near the access point become traffic hot spots.
  • Sampling rates have to be set to the highest rate any query might need,
    burdening the network with unnecessary traffic.
In-network storage approach
• Choose rendezvous points to store data inside the network.
• Advantages
  • The overhead to store and access the data is minimized.
  • The overall load is balanced across the network.
Server-based approach
• Requires a total of 16 message transmissions.
In-Network Aggregation
• Each sensor may compute a partial state record based on its own data and that of its children.
• Requires a total of 6 message transmissions.
Aggregation Framework
• As in extensible databases, TinyDB supports any aggregation function conforming to:
  Agg_n = {f_init, f_merge, f_evaluate}
  f_init{a0} → <a0>
  f_merge{<a1>, <a2>} → <a12>   (a partial state record)
  f_evaluate{<a>} → aggregate value

• Example: AVERAGE
  AVG_init{v} → <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} → S/C
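A minimal executable sketch of this decomposition, using AVERAGE: partial state records are (sum, count) pairs, so each node forwards one small record to its parent instead of a list of raw readings. The readings and tree shape below are hypothetical.

def avg_init(v):
    return (v, 1)                       # f_init: reading -> partial state record

def avg_merge(a, b):
    return (a[0] + b[0], a[1] + b[1])   # f_merge: combine two partial records

def avg_evaluate(a):
    return a[0] / a[1]                  # f_evaluate: partial record -> final value

# A parent merges its own reading with its children's partial state records:
record = avg_init(22.0)
for child in [avg_init(20.0), (47.0, 2)]:   # a leaf child and a 2-sensor subtree
    record = avg_merge(record, child)
print(avg_evaluate(record))                 # (22 + 20 + 47) / (1 + 1 + 2) = 22.25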
Aggregates and their efficiency in TAG
(figure: bytes transmitted per epoch, summed over all sensors, for the COUNT, MIN, HISTOGRAM, AVERAGE, and MEDIAN aggregates, on a scale of 20,000–80,000 bytes)
Performance Metrics
• Network usage
  • Total usage and hot-spot usage
• Preprocessing time
  • Time taken to construct an index
• Storage space requirement
• Query time
  • Time to process a query, assemble an answer, and return this answer
• Throughput
• Update and maintenance cost
Properties of a Sensor Database
• Persistence
  • Data stored in the system must remain available to queries.
• Consistency
  • A query must be routed correctly to a node where the data are currently stored.
• Controlled access to data
• Scalability in network size
  • As the number of nodes increases, the communication cost should not grow unduly.
• Load balancing
• Topological generality
  • The database architecture should work well on a broad range of network topologies.
Query Processing Scheduling
• TinyDB uses an epoch-based mechanism.
  • The epoch should be sufficiently long for data to travel from the leaves to the root.
• Each epoch is divided into time intervals.
  • The number of intervals reflects the depth of the routing tree.
• Each node only needs to power up during its scheduled interval.
Schedule of In-Network Aggregation
(figure: four snapshots of the epoch schedule for SELECT COUNT(*) FROM sensors over a five-node routing tree; intervals count down from 4 to 1, the deepest sensors transmit first, parents merge the partial counts they hear in each succeeding interval, the root obtains the total count of 5 in interval 1, and the next epoch then begins at interval 4)
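A minimal sketch of this schedule for the COUNT query, assuming a hypothetical five-node routing tree given as child → parent links: a node at depth d transmits its partial count during interval d, so intervals count down from the tree depth to 1 and each parent is awake exactly when its children transmit.

nodes = [1, 2, 3, 4, 5]
parent = {2: 1, 3: 1, 4: 2, 5: 4}            # node 1 is the root

def depth(n):
    return 1 if n not in parent else 1 + depth(parent[n])

count = {n: 1 for n in nodes}                # f_init: every node counts itself
for interval in range(max(depth(n) for n in nodes), 1, -1):
    for n in nodes:
        if depth(n) == interval:             # n's scheduled transmit interval
            count[parent[n]] += count[n]     # f_merge at the listening parent

print(count[1])                              # the root evaluates COUNT(*) = 5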
Data-Centric Storage (DCS)
• DCS is a method proposed to support queries from any node in the network by providing a rendezvous mechanism for data and queries.
  • Avoids flooding the entire network.
• At the center of a DCS system are rendezvous points.
• DCS distributes the storage load across the entire network.
Data-Centric Storage (DCS)
• For example:
  • The geographic hash table (GHT) attempts to distribute data evenly across the network.
  • GHT assumes each node knows its geographic location (by GPS or otherwise).
  • A data object is associated with a key.
  • Each node is responsible for storing a certain range of keys.
Geographic Hash Table (GHT)
• Rendezvous
  • Events are named with keys.
  • Storage and retrieval are performed using these keys.
  • A key is hashed to a geographic position.
  • Geographic routing (GPSR) is used to locate the node closest to this geographic position.
  • This node serves as a rendezvous for storage and search (see the sketch below).
• Costs
  • No flooding of queries.
  • Aggregate storage cost is the same as for external storage.
• Structured replication
  • Rendezvous points are replicated.
  • Decreases storage communication cost.
  • Increases query dissemination cost.
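A minimal sketch of the rendezvous idea. The field size, node coordinates, and use of SHA-1 are illustrative assumptions, and the nearest-node search stands in for GPSR routing, which is how a real GHT actually reaches the "home" node.

import hashlib

FIELD = (100.0, 100.0)                       # assumed field dimensions

def hash_to_position(key):
    """Hash a key to a geographic position in the field."""
    h = hashlib.sha1(key.encode()).digest()
    x = int.from_bytes(h[:4], "big") / 2**32 * FIELD[0]
    y = int.from_bytes(h[4:8], "big") / 2**32 * FIELD[1]
    return (x, y)

def home_node(key, nodes):
    """The node closest to the hashed position (a stand-in for GPSR)."""
    px, py = hash_to_position(key)
    return min(nodes, key=lambda n: (n[0] - px) ** 2 + (n[1] - py) ** 2)

nodes = [(10, 20), (45, 80), (70, 30), (90, 90)]     # hypothetical node positions
storage = {}

# Store an event under its key, then retrieve it from any node: the same
# hash leads both operations to the same rendezvous node, with no flooding.
storage.setdefault(home_node("elephant-sighting", nodes), []).append({"loc": (42, 17)})
print(storage.get(home_node("elephant-sighting", nodes), []))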
Structured replication in GHT
(figure: a 100×100 field showing the root point together with its level-1 and level-2 mirror points)
Data-Centric Storage (DCS)
• Reduce unnecessary network traffic
  • Hashing to locations should respect geographic proximity.
  • Hash to regions rather than to single locations, to avoid hot spots and increase robustness.
• Trade-off
  • If the frequency of event generation is high, then pushing data to arbitrary rendezvous points may be too expensive.
Data indices and range queries
• It is difficult to serve a range query well.
  • The TinyDB aggregation tree requires flooding the entire network for each query.
• Indices
  • Auxiliary data structures that facilitate and speed up query execution.
  • Useful when the query rate is higher than the update rate.
Indices
• Key idea
  • Pre-store the answers to certain special queries, then assemble the answer to an arbitrary range query from these pre-stored answers.
• Index structures
  • Hash table, k-d tree, quad-tree, R-tree, ...
• Trade-off
  • Between the number of pre-stored answers and the speed of query execution.
One-Dimensional Indices
(figure: canonical subsets of sensors s0–s7 along a road, organized as a balanced binary tree whose internal nodes are u1–u7)
One-Dimensional Indices
• We map logical node u_i to physical node s_{i-1}.
• Canonical subsets
  • The pre-stored partial aggregates, hosted at physical nodes s0–s6:
    u1: s0 ⊕ s1
    u2: s0 ⊕ s1 ⊕ s2 ⊕ s3
    u3: s2 ⊕ s3
    u4: s0 ⊕ s1 ⊕ s2 ⊕ s3 ⊕ s4 ⊕ s5 ⊕ s6 ⊕ s7
    u5: s4 ⊕ s5
    u6: s4 ⊕ s5 ⊕ s6 ⊕ s7
    u7: s6 ⊕ s7          (⊕ denotes the aggregation operator)
• Complexity: storage O(n); query O(log n)
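The canonical-subset scheme is exactly a segment tree: each u_i pre-stores the aggregate of its subset, and any contiguous range decomposes into O(log n) canonical subsets. A minimal sketch, with + standing in for the generic ⊕ and hypothetical readings:

readings = [3, 1, 4, 1, 5, 9, 2, 6]          # readings at s0..s7

n = len(readings)
tree = [0] * (2 * n)
tree[n:] = readings                           # leaves
for i in range(n - 1, 0, -1):
    tree[i] = tree[2 * i] + tree[2 * i + 1]   # canonical subset stored at u_i

def range_aggregate(lo, hi):
    """Aggregate over s_lo..s_hi (inclusive) from O(log n) canonical subsets."""
    lo += n; hi += n + 1; total = 0
    while lo < hi:
        if lo & 1: total += tree[lo]; lo += 1
        if hi & 1: hi -= 1; total += tree[hi]
        lo //= 2; hi //= 2
    return total

print(range_aggregate(2, 5))   # s2 + s3 + s4 + s5 = 19, using just u3 and u5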


Multidimensional Indices for Orthogonal Range Searching
• Orthogonal range query:

SELECT * FROM Detection_Events
WHERE Temperature >= 50
AND Temperature <= 60
AND Light >= 5
AND Light <= 10

(figure: the query rectangle in the Temperature–Light plane)
A k-d tree partitions a plane into rectangles
• Drill down the k-d tree with the query rectangle Q (see the sketch below):
  • On reaching a node whose rectangle is disjoint from Q, stop propagation.
  • On reaching a node whose rectangle is fully contained in Q, incorporate its pre-stored count into the tally of events of interest.
  • Otherwise, expand the node and continue drilling into its children.
(figure: a k-d tree partition of the Temperature–Light plane into rectangles)
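A minimal sketch of the drill-down, assuming every tree node pre-stores the count of events inside its rectangle and only leaves keep raw events; the rectangles and points are hypothetical. Only nodes whose rectangles straddle the boundary of Q are expanded.

class Node:
    def __init__(self, rect, children=(), points=()):
        self.rect = rect                     # (xlo, ylo, xhi, yhi)
        self.children = children
        self.points = list(points)           # only leaves hold raw events
        self.count = (sum(c.count for c in children) if children
                      else len(self.points)) # pre-stored count for this rectangle

def disjoint(a, b):
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

def contains(q, r):                          # is rectangle r fully inside q?
    return q[0] <= r[0] and q[1] <= r[1] and r[2] <= q[2] and r[3] <= q[3]

def range_count(node, q):
    if disjoint(node.rect, q):
        return 0                             # prune: stop propagation here
    if contains(q, node.rect):
        return node.count                    # use the pre-stored answer wholesale
    if node.children:
        return sum(range_count(c, q) for c in node.children)
    return sum(1 for (x, y) in node.points   # boundary leaf: scan its events
               if q[0] <= x <= q[2] and q[1] <= y <= q[3])

leaves = [Node((0, 0, 50, 50),     points=[(10, 10), (40, 30)]),
          Node((50, 0, 100, 50),   points=[(60, 20)]),
          Node((0, 50, 50, 100),   points=[(20, 80)]),
          Node((50, 50, 100, 100), points=[(70, 90), (55, 60)])]
root = Node((0, 0, 100, 100), children=leaves)
print(range_count(root, (5, 5, 60, 65)))     # 4 events fall inside Q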
Non-orthogonal Range Searching
(figure: a non-orthogonal query range overlaid on the rectangle partition; the query must keep propagating through each of the many rectangles that the range boundary crosses)
Distributed Hierarchical Aggregation
• Designing a distributed index
  • Load-balance the communication, processing, and storage across the nodes.
• Robustness considerations
  • Frequent failures of nodes and links.
  • Important for WSN databases, yet an issue that has not received the attention it deserves.
Multiresolution Summarization
• Wavelet transforms
  • One way to compress and summarize information for both temporal and spatial signals (see the sketch below).
• Data structure
  • Quad-tree
• Routing
  • GPSR + GHT
• Avoiding hot spots
  • Replication
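One step of a Haar wavelet transform illustrates the summarization idea: pairwise averages form a half-resolution summary, while the detail coefficients can be quantized or dropped to compress. This is a minimal sketch with hypothetical readings, not a full multiresolution pipeline.

def haar_step(signal):
    """One Haar level: half-resolution averages plus detail coefficients."""
    averages = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    details  = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return averages, details

readings = [21.0, 23.0, 22.0, 30.0, 29.0, 31.0, 25.0, 25.0]
summary, details = haar_step(readings)
print(summary)    # [22.0, 26.0, 30.0, 25.0] -- the half-resolution summary
# A parent in the hierarchy can store only the summary (applying further Haar
# steps for coarser levels) and still answer approximate queries; the details
# are kept, truncated, or discarded depending on the data-aging policy.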
Partitioning the Summaries
• Queries start at the root of the summarization tree.
• Partition the aggregation data in a meaningful way to lessen the load on nodes near the root of the hierarchy.
• Use a multi-rooted quad-tree to partition the spatial domain.
• System: DIFS
Quad Tree Approach
• Quaternary tree:
  • Each node has 4 children.
  • Each node has 4 histograms summarizing the data distribution in each child subtree.
• Queries only propagate into relevant parts of the tree (pruning).
Quad Tree: Issues
• Explicit child pointers are required.
• On storage of new data, the update must be propagated up the tree.
• Every query must originate at the tree root.
  • The root bears a greater burden!
DIFS
• DIFS stands for Distributed Index for Features in Sensor networks.
• Goals
  • Provide an efficient query mechanism for range searches over event attributes.
  • Extend network lifetime by amortizing the costs of communication and storage over as many nodes as possible,
    even at the expense of modest increases in overall cost.
GHT-based Quad Tree
• We add an index structure to structured replication.
  • A hierarchy of histograms summarizes the range of data within the children.
• Problem: the root is the bottleneck.
  • Every query goes through it.
  • Information from every event that is generated propagates to it.
(figure: the GHT-based quad tree, with a root point, level-1 children, and level-2 children)
The DIFS Tree
• Every node (except the root) has 1–4 parents.
• The wider the spatial extent an index node knows about, the more constrained the value range it covers.
(figure: a multi-rooted DIFS index over a 4×4 grid of nodes; index nodes covering single quadrants hold the full value range 1–16, while nodes covering wider extents hold narrower ranges such as 1–4, 5–8, 9–12, and 13–16)
Storage
• Example: an event with "temperature" equal to 9 is generated at location (68, 61).
• Compute geographically bounded hashes:
  • "temperature:1:16" hashed within (50,50)–(75,75)
  • "temperature:9:12" hashed within (50,50)–(100,100)
  • "temperature:9:9" hashed within (0,0)–(100,100)
• Periodically propagate the histograms up the tree.
(figure: the event at (68, 61) in a 100×100 field, with the three index nodes chosen by the bounded hashes)
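A minimal sketch of a geographically bounded hash, following the "attribute:lo:hi" key encoding of the example above. The rescaling construction, the SHA-1 hash, and the node placement are assumptions rather than DIFS's exact mechanism; the point is that the same key lands at different home nodes depending on the bounding box it is constrained to.

import hashlib

FIELD = (100.0, 100.0)

def bounded_hash(key, bbox):
    """Hash a key to a position constrained to lie inside bbox."""
    (xlo, ylo), (xhi, yhi) = bbox
    h = hashlib.sha1(key.encode()).digest()
    u = int.from_bytes(h[:4], "big") / 2**32     # two uniform values in [0, 1)
    v = int.from_bytes(h[4:8], "big") / 2**32
    return (xlo + u * (xhi - xlo), ylo + v * (yhi - ylo))

def closest_node(pos, nodes):                    # a stand-in for GPSR routing
    return min(nodes, key=lambda n: (n[0] - pos[0])**2 + (n[1] - pos[1])**2)

nodes = [(12, 88), (40, 40), (66, 15), (68, 61), (85, 75)]

# The event with temperature 9 at (68, 61) is inserted at three index levels:
for key, bbox in [("temperature:1:16", ((50, 50), (75, 75))),
                  ("temperature:9:12", ((50, 50), (100, 100))),
                  ("temperature:9:9",  ((0, 0),  (100, 100)))]:
    print(key, "->", closest_node(bounded_hash(key, bbox), nodes))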
DIFS Hierarchy
(figure: the DIFS hierarchy)
Fractional Cascading
(figure: the leaves of the quad-tree, and a sensor p's view of the world)
Locality-Preserving Hashing
• Goal:
  • A way to map the attribute space to the plane so that nearby locations in attribute space correspond to nearby locations in the plane.
• DIM (Distributed Index for Multidimensional data)
  • Data with values close to one another are hashed to nearby locations.
  • Zone code: a unique identifier for each zone.
DIM - zone tree & zone code
(figure: a zone tree and the corresponding zone codes; each zone's code is read off the 0/1 branches from the root, e.g. c = 00, a = 010, b = 011, d = 100, e = 101, f = 1110, g = 1111)
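A minimal sketch of how such a zone code can be computed: alternately bisect each attribute's (normalized) range, emitting a 0 or 1 per level, so that readings with close values share long code prefixes and therefore hash to nearby zones. The code length and split order are assumptions, not DIM's exact parameters.

def zone_code(values, bits=6):
    """values: readings normalized to [0, 1), one per dimension."""
    lo = [0.0] * len(values)
    hi = [1.0] * len(values)
    code = ""
    for i in range(bits):
        d = i % len(values)              # alternate dimensions level by level
        mid = (lo[d] + hi[d]) / 2
        if values[d] < mid:
            code += "0"; hi[d] = mid     # descend into the lower half
        else:
            code += "1"; lo[d] = mid     # descend into the upper half
    return code

# Nearby (temperature, light) readings receive the same or nearby zone codes:
print(zone_code([0.30, 0.70]))   # 011001
print(zone_code([0.32, 0.68]))   # 011001 -- same zone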
Temporal Data
• Overall node storage is very limited.
• We might query about the past, the present, or the future.
• Data aging
  • Application-dependent
  • A schedule for discarding data and data summaries
Indexing Motion Data
• A fixed index structure soon becomes obsolete as objects move, and keeping it current incurs heavy update and communication costs.
• Both the index construction and the updates can be quite expensive.
• Better: modify the index only when new objects are inserted or deleted, or when the trajectory of an object changes.
KDS (Kinetic Data Structures)
• Update the index only when certain critical events occur (see the sketch below).
• Drawback
  • It may lead to wasted processing during periods of inactivity, when no queries are present in the network, because the index still has to be updated as time goes on.
• These updates need not be frequent if the motion predictions are accurate.
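The kinetic idea can be sketched with the simplest possible certificate — the left-to-right order of two points moving linearly along a line. Nothing is recomputed while the certificate holds; work happens only at its failure time. The linear motion model x(t) = x0 + v*t and all values are illustrative.

def failure_time(a, b):
    """Time at which points a = (x0, v) and b = (x0, v) swap order, if ever."""
    (xa, va), (xb, vb) = a, b
    if va == vb:
        return None                     # same velocity: the order never changes
    t = (xb - xa) / (va - vb)
    return t if t > 0 else None         # only future events matter

a, b = (0.0, 2.0), (10.0, 1.0)          # a starts behind b but moves faster
order = ["a", "b"]                      # certificate: a is left of b
t_fail = failure_time(a, b)             # the certificate fails at t = 10
if t_fail is not None:
    order.reverse()                     # at t_fail, flip the order and schedule
                                        # a new certificate; no work before then
print(t_fail, order)                    # 10.0 ['b', 'a']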
Summary
• This area is still in its infancy; much more needs to be done.
• As we remarked, the integration of query processing with the networking layer, the mapping of index structures onto the spatial topology of the network, and distributed index construction for motion data all remain important topics for further investigation.
