Académique Documents
Professionnel Documents
Culture Documents
ABSTRACT
Wireless sensor networks (WSNs) consist of many sensor nodes. These networks have huge application in habitat
monitoring, disaster management, security and military, etc. Wireless sensor nodes are very small in size and have limited
processing capability with very low battery power. This restriction of low battery power makes the sensor network prone to
failure. Data aggregation may be effective technique in this context because it reduces the number of packets to be sent to
sink by aggregating the similar packets .In this paper we put our attention into various data aggregation algorithms in
wireless sensor network. Data aggregation technique increases the lifetime of sensor network by decreasing the number of
packets to be sent to sink or base station. Here, we explore the data aggregation algorithms on the basis of network.
KEYWORDS: Wireless Sensor Networks, Data fusion and Data Aggregation, Energy Constrained Networks
INTRODUCTION
A Wireless Sensor Network (WSN) consists of several spatially distributed autonomous devices (sensor nodes)
with sensing and communication capabilities that cooperatively sense physical or environmental conditions, such as
temperature, sound, vibration, pressure, motion or pollutants at different locations & used in applications such as
environmental monitoring, homeland security, critical infrastructure systems, communications, manufacturing etc.
WSNs are data-driven networks that usually produce a large amount of information that needs to be routed across the
networks. As sensor nodes are energy-constrained devices and the energy consumption is generally associated with the
amount of gathered data. Since energy conservation is a key issue in WSNs, Data fusion and Data aggregation is exploited
in order to save energy[1][23]. A strategy to optimize the routing task for the available processing capacity can be provided
by the intermediate sensor nodes along the routing paths. Data aggregation is defined as the process of aggregating the data
from multiple sensors to eliminate redundant transmission and provide fused information to the base station. The main goal
of data aggregation algorithms is to gather and aggregate data in an efficient manner so that lifetime of the network
increases by decreasing the number of packets to be sent to sink or base station[24], intern reduces the communication
costs and energy consumption.
www.tjprc.org
editor@tjprc.org
22
Rashmi M. R
process. Hence, we need methods for combining data into high-quality information at the sensors or intermediate nodes
which can reduce the number of packets transmitted to the base station resulting in conservation of energy and bandwidth.
This can be accomplished by data aggregation. Data aggregation is defined as the process of aggregating the data from
multiple sensors to eliminate redundant transmission and provide fused information to the base station [2].
Data aggregation usually involves the fusion of data from multiple sensors at intermediate nodes and transmission
of the aggregated data to the base station (sink) so we can conclude that data gathering is to collect the data from neighbor
node to be sent to sink and data aggregation is process of removing redundancy among them.
Data aggregation can be categorized on the basis of network topology, network flow, quality of services and many
more. In this paper we have put our attention on network topology based data aggregation technique. We can divide the
data aggregation technique into parts: structure based and structure free.
Structure based data aggregation can be further divided into four parts flat network based, cluster based, tree
based and grid based.
Data Aggregation in Flat Networks
In flat networks, each sensor node plays the same role and is equipped with approximately the same battery
power. In such networks, data aggregation is accomplished by data centric routing where the sink usually transmits a query
message to the sensors, for example, via flooding and sensors which have data matching the query send response messages
back to the sink. The choice of a particular communication protocol depends on the specific application at hand.
In the rest of this subsection we describe these protocols and highlight their advantages and limitations.
Flooding and Gossiping
Flooding and gossiping [3] are two classical mechanisms to relay data in sensor networks without the need for any
routing algorithms and topology maintenance. In flooding, each sensor receiving a data packet broadcasts it to all of its
neighbors and this process continues until the packet arrives at the destination or the maximum number of hops for the
packet is reached. On the other hand, gossiping is a slightly enhanced version of flooding where the receiving node sends
the packet to a randomly selected neighbor, which picks another random neighbor to forward the packet to and so on.
Directed Diffusion
Directed diffusion (DD) [4] is a popular data aggregation paradigm for wireless sensor networks. It is a
data-centric and application aware paradigm, in the sense that all data generated by sensor nodes is named by attributevalue pairs. Such a scheme combines the data coming from different sources en route to the sink by eliminating
redundancy and minimizing the number of transmissions. In this way, it saves the energy consumption and increases the
network lifetime of WSNs. In directed diffusion, the base station requests data by broadcasting interests, which describes a
required task to be implemented by the network.
The interest is defined using a list of attribute value pairs such as name of objects, interval, duration and
geographical area. Each node receiving the interest can cache it for later use.
As the interest is broadcasted through the network hop-by-hop, gradients are setup to draw data satisfying the
query towards the requesting node. A gradient is a reply link to the neighbor from which the interest was received.
It contains the information derived from the received interests fields, such as the data rate, duration and expiration time.
Impact Factor (JCC): 1.6842
23
Each sensor that receives the interest sets up a gradient toward the sensor nodes from which it received the interest.
This process continues until gradients are setup from the sources all the way back to the base station.
In this way, several paths can be established, so that one of them is selected by reinforcement. The sink resends
the original interest message through the selected path with a smaller interval, hence reinforcing the source node on that
path to send data more frequently.
SPIN
SPIN [5] is among the early work to pursue data centric routing mechanism. The idea behind SPIN is to name the
data using high-level descriptors or meta-data. Before transmission, metadata are exchanged among sensors via a data
advertisement mechanism, which is the key feature of SPIN. Each node upon receiving new data, advertises it to its
neighbors and interested neighbors, i.e. those who do not have the data, retrieve the data by sending a request message.
SPINs meta-data negotiation solves the classic problems of flooding such as redundant information passing,
overlapping of sensing areas and resource blindness thus, achieving a lot of energy efficiency. There is no standard
meta-data format and it is assumed to be application specific, e.g. using an application level framing. There are three
messages defined in SPIN to exchange data between nodes. These are: ADV message to allow a sensor to advertise a
particular meta-data, REQ message to request the specific data and DATA message that carry the actual data.
Rumor Routing
Rumor routing [6] is another variation of Directed Diffusion and is mainly intended for contexts in which
geographic routing criteria are not applicable. Generally Directed Diffusion floods the query to the entire network when
there is no geographic criterion to diffuse tasks. However, in some cases there is only a little amount of data requested from
the nodes and thus the use of flooding is unnecessary. An alternative approach is to flood the events if number of events is
small and number of queries is large. Rumor routing is between event flooding and query flooding. The idea is to route the
queries to the nodes that have observed a particular event rather than flooding the entire network to retrieve information
about the occurring events. In order to flood events through the network, the rumor routing algorithm employs long-lived
packets, called agents. When a node detects an event, it adds such event to its local table and generates an agent.
Agents travel the network in order to propagate information about local events to distant nodes. When a node generates a
query for an event, the nodes that know the route, can respond to the query by referring its event table. Hence, the cost of
flooding the whole network is avoided. Rumor routing maintains only one path between source and destination as opposed
to Directed Diffusion where data can be sent through multiple paths at low rates.
Gradient-Based Routing
Gradient-Based Routing [7] is another version of directed diffusion, which aims to distribute traffic evenly
throughout the network in order to increase the network lifetime. The key idea is to memorize the number of hops when the
interest is diffused through the whole network. As such, each node can calculate a parameter called the height of the node,
which is the minimum number of hops required to reach the base station. The difference between a nodes height and that
of its neighbor is considered the gradient on that link. A packet is forwarded on a link with the largest gradient.
www.tjprc.org
editor@tjprc.org
24
Rashmi M. R
25
is almost the same as LEACH protocol, only makes communication mode from single hop to multi-hop between CHs and
BS.
LEACH-C
Wendi et al. [12] proposed LEACH-C protocol which uses a centralized algorithm. LEACH-C protocol can
produce better performance by dispersing the cluster heads throughout the network. During the set-up phase of LEACH-C,
each node sends information about its current location (possibly determined using GPS) and residual energy level to the
sink. In addition to determining good clusters, the sink needs to ensure that the energy load is evenly distributed among all
the nodes. To do this, sink computes the average node energy, and determines which nodes have energy below this
average. The steady-state phase of LEACH-C is identical to that of the LEACH protocol.
V-LEACH
In the LEACH protocol the CH is always on receiving data from cluster members, aggregates these data and then
sends it to BS that might be far away from the BS. The CH may die earlier than other nodes because of receiving,
sending and overhearing. When the CH dies cluster becomes useless because data sensed by sensor nodes will not be
aggregated and also will not be sent to BS. In V-LEACH [13] protocol, besides having a CH in the cluster, there is a
vice-CH that takes the role of the CH when the CH dies because the reasons we mentioned above. By doing this,
cluster nodes data will always reach the BS; no need to elect a new CH each time the CH dies. This will extend the overall
network life time.
Chain-Based Data Aggregation
In cluster-based sensor networks, sensors transmit data to the cluster head where data aggregation is performed.
However, if the cluster head is far away from the sensors, they might expend excessive energy in communication.
Further improvements in energy efficiency can be obtained if sensors transmit only to close neighbors. The key idea behind
chain-based data aggregation is that each sensor transmits only to its closest neighbor. Lindsey et al. [14] presented a
chain-based data-aggregation protocol called Power-Efficient Data Gathering Protocol for Sensor Information Systems
(PEGASIS). In PEGASIS, nodes are organized into a linear chain for data aggregation. The nodes can form a chain by
employing a greedy algorithm or the sink can determine the chain in a centralized manner. Greedy chain formation
assumes that all nodes have global knowledge of the network. The farthest node from the sink initiates chain formation
and, at each step, the closest neighbor of a node is selected as its successor in the chain. In each data-gathering round,
a node receives data from one of its neighbors, fuses the data with its own, and transmits the fused data to its other
neighbor along the chain. Eventually, the leader node which is similar to cluster head transmits the aggregated data to the
sink. Nodes take turns in transmitting to the sink. The greedy chain formation approach used in may result in some nodes
having relatively distant neighbors along the chain. This problem is alleviated by not allowing such nodes to become
leaders.
Tree-Based Data Aggregation
In a tree-based network, sensor nodes are organized into a tree where data aggregation is performed at
intermediate nodes along the tree and a concise representation of the data is transmitted to the root node. Tree-based data
aggregation is suitable for applications which involve in-network data aggregation. An example application is radiation
level monitoring in a nuclear plant where the maximum value provides the most useful information for the safety of the
www.tjprc.org
editor@tjprc.org
26
Rashmi M. R
plant. One of the main aspects of tree-based networks is the construction of an energy efficient data aggregation tree.
Various tree data aggregation algorithms have been proposed in the literature.
An Energy-Aware Data Aggregation Tree (EADAT) algorithm is proposed in [20]. The base station (root) sends a
broadcast control message periodically. Upon receiving this message for the first time, each node will start a timer.
The expiration time is inversely proportional to the nodes residual energy. The timer is refreshed when a node receives
this message during the timer count down.
TAG
The Tiny Aggregation (TAG) approach [20] is a data-centric protocol. It is based on aggregation trees and
specifically designed for monitoring applications. This means that all nodes should produce relevant information
periodically. Therefore, it is possible to classify TAG as a periodic per hop adjusted aggregation approach.
The implementation of the core TAG algorithm consists of two main phases:
The collection phase, where the aggregated sensor readings are routed up the aggregation tree.
For the distribution phase, TAG uses a tree based routing scheme rooted at the sink node. The sink broadcasts a
message asking nodes to organize into a routing tree and then sends its queries. Each message contains a field specifying
the level, or distance from the root, of the sending node. Whenever a node receives a message and it does not yet belong to
any level, it sets its own level to be the level of the message plus one. It also elects the node from which it receives the
message as its parent. The parent is the node that is used to route messages toward the sink. Each sensor then rebroadcasts
the received message adding its own identifier (ID) and level. This process continues until all nodes have been assigned an
ID and a parent. The routing messages are periodically broadcast by the sink in order to keep the tree structure updated.
After the construction of the tree, the queries are sent along the structure to all nodes in the network.TAG adopts the
selection and aggregation facilities of the database query languages (SQL). Accordingly, TAG queries have the following
form:
SELECT {agg (expr), attrs} from SENSOR
WHERE {selPreds}
GROUP BY {attrs}
HAVING {havingPreds}
EPOCH DURATION i
In practice, the sink sends a query, where it specifies the quantities that it wants to collect (attrs field), how these
must be aggregated (agg (expr)), and the sensors that should be involved in the data retrieval. An EPOCH DURATION
field specifies the time (in seconds) each device should wait before sending new sensor readings.
This means the readings used to compute an aggregate record all belong to the same time interval, or epoch.
During the data collection phase, due to the tree structure, each parent has to wait for data from all of its children before it
can send its aggregate up the tree. TAG may be inefficient for dynamic topologies or link/device failures. In addition,
as the topology changes, TAG has to reorganize the tree structure, which means high costs in terms of energy consumption
Impact Factor (JCC): 1.6842
27
and overhead. E-Span, [16] is an energy-aware spanning tree algorithm. In E-span, the source node which has the highest
residual energy is chosen as the root.
Other source nodes choose their corresponding parent node among their neighbors based on the information of the
residual energy and distance to the root. If there are multiple neighbors with equal distance, the node which has most
remaining energy is selected as parent. As Espan protocol considers distance as main parameter and remaining energy as
second, the network coverage is not high; because in some cases the nodes with low remaining energy are selected as
parent. After local aggregation and data transmission, the remaining energy of these nodes is finished quickly.
This causes the node failure and network cannot coverage region completely.
Grid-Based Data Aggregation
Vaidhyanathan et al. [18] have proposed two data-aggregation schemes which are based on dividing the region
monitored by a sensor network into several grids. They are: grid-based data aggregation and In-network data aggregation.
In grid-based data aggregation, a set of sensors is assigned as data aggregators in fixed regions of the sensor network. The
sensors in a particular grid transmit the data directly to the data aggregator of that grid. Hence, the sensors within a grid do
not communicate with each other.
In-network aggregation is similar to grid-based data aggregation with two major differences, namely, each sensor
within a grid communicates with its neighboring sensors. Any sensor node within a grid can assume the role of a data
aggregator in terms of rounds until the last node dies.
Multipath Approaches
In order to overcome the robustness problems of aggregation trees, a new approach was recently proposed [21],
[22]. Instead of having an aggregation tree where each node has to send the partial result of its aggregation to a single
parent, these solutions send data over multiple paths. The main idea is that each node can send the data to its multiple
neighbors by exploiting the broadcast characteristics of the wireless medium. Hence, data may flow from the sources to the
sinks along multiple paths, and aggregation may be performed by each node. In contrast to the tree-based schemes
discussed above, multipath approaches allow duplicates of the same information to be propagated. Clearly, such schemes
trade higher robustness for some extra overhead due to sending duplicate packet. An aggregation structure that fits well
with this methodology is called ring topology, where sensor nodes are divided into several levels according to the number
of hops separating them from the data sink. Data aggregation is performed over multiple paths as packets move level by
level toward the sink.
Synopsis Diffusion
In Synopsis Diffusion [21] Data aggregation is performed through a multipath approach. The underlying topology
for data dissemination is organized in concentric rings around the sink. Synopsis Diffusion consists of two phases:
www.tjprc.org
editor@tjprc.org
28
Rashmi M. R
29
CONCLUSIONS
We have presented a comprehensive survey of data-aggregation algorithms in wireless sensor networks.
Here we first present the taxonomy of data aggregation based on network topology. Then we present the comprehensive
study various data aggregation algorithm. All of them focus on optimizing important performance measures such as
network lifetime, data latency, data accuracy, and energy consumption. Efficient organization, routing, and
data-aggregation tree construction are the three main focus areas of data-aggregation algorithms. We have described the
main features, the advantages and disadvantages of various data aggregation algorithm.
Most of the existing work has mainly focused on the development of an efficient routing mechanism for data
aggregation. However, the performance of the data aggregation protocol is strongly coupled with the infrastructure of the
network. There has not been significant research on exploring the impact of heterogeneity and mode of communication
(single hop versus multi hop) on the performance of the data-aggregation protocols. Although, many of the
data-aggregation techniques discussed look promising, there is significant scope for future research. Combining aspects
such as security, data latency, and system lifetime in the context of data aggregation is worth exploring.
REFERENCES
1.
Vaibhav Pandey, Amarjeet kaur & Narottam Chand, A review on data aggregation techniques in wireless sensor
network, in Journal of Electronics & Electrical Engineering, ISSN:0976-8106 vol.1 issue 2010,pp 01-08.
2.
Ramesh Rajagopalan and Pramod K.Varsney (2009) Journal IEEE Transactions on Wireless Communications,
Volume 8 Issue 7, doi>10.1109/TWC.2009.060484.
3.
Hedetniemi S. and Liestman A. (1988) Networks: An International Journal Vol. 18 no. 4 pp. 319349.
4.
Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, and Fabio Silva
Directed Diffusion for Wireless Sensor Networking
5.
Heinzelman W., Kulik J., Balakrishnan H. (1999) Proceedings of the 5th Annual ACM/IEEE International
Conference on Mobile Computing and Networking (MobiCom_99), Seattle, WA, August 1999.
6.
Braginsky D., Estrin D. (2002) Proceedings of the First Workshop on Sensor Networks and Applications (WSNA),
Atlanta, GA.
7.
Schurgers C. and Srivastava M.B. (2001) Energy Efficient Routing in Wireless Sensor Networks. In MILCOM
Proceedings on Communications for Network-Centric Operations: Creating the Information Force Mc Lean, VA.
8.
Wendi Rabiner Heinzelman, Anantha Chandrakasan, and Hari Balakrishnan (2000) Energy-Efficient
Communication Protocol for Wireless Microsensor Networks Proceedings of the 33rd Hawaii International
Conference on System Sciences 2000.
9.
Fan Xiangning, Song Yulin (2007) International Conference on Sensor Technologies and Applications.
10. Loscri V., Morabito G., Marano S. (2005) Vehicular Technology Conference, VTC-2005, Volume: 3, 1809-1813.
11. Yuhua Liu, Yongfeng Zhao, Jingju Gao, (2009) International Joint Conference on Artificial Intelligence.
www.tjprc.org
editor@tjprc.org
30
Rashmi M. R
12. Wendi B. Heinzelman, Anantha P.Chandrakasan and Hari Balakrishnan (2002) IEEE Transactions on Wireless
Communications, Vol. 1, No. 4.
13. Bani Yassein M., Al-zou'bi A., Khamayseh Y., Mardini W. (2009) JDCTA: International Journal of Digital
Content Technology and its Applications, Vol. 3, No. 2, pp. 132-136.
14. Lindsey S., Raghavendra C. and Sivalingam K. M. (2002) IEEE Trans. Parallel and Distributed Systems, vol. 13,
no. 9, 92435.
15. Min Ding, Xiuzhen Cheng, Guoliang Xue, (2006) Proceeding Mobility '06 Proceedings of the 3rd international
conference
on
Mobile
technology,
applications
&
systems,
ISBN:1-59593-519-3,
doi>
10.1145/1292331.1292391
16. Marc Lee, Vincent W.S. Wong (2006) Computer Communications 29(13-14): 2506-2520.
17. Tan H. O. and Korpeoglu I. (2003) SIGMOD Record, vol. 32, no. 4, 6671.
18. Vaidhyanathan K. et al. (2004) Technical Report, OSU-CISRC-11/04- TR60, Ohio State University.
19. Kai-Wei Fan, Sha Liu, and Prasun Sinha (2007) IEEE Transaction on Mobile Computing, 6, 8.
20. S. Madden et al., TAG: a Tiny Aggregation Service for Ad Hoc Sensor Networks, OSDI 2002, Boston, MA,
Dec. 2002.
21. S. Nath et al., Synopsis Diffusion for Robust Aggregation in Sensor Networks, ACM Sen Sys 2004, Baltimore,
MD, Nov. 2004.
22. A. Manjhi, S. Nath, and P. B. Gibbons, Tributaries and Deltas: Efficient and Robust Aggregation in Sensor
Network Stream, ACM SIGMOD 2005, Baltimore, MD, June 2005.
23. Leandro Villas, Azzedine Boukerche, Heitor S. Ramos, A Lightweight and Reliable Routing Approach for inNetwork aggregation in WSN, in Jounal of IEEE Tranactions on computer networks, vol 38,Mar 2011,pp 393422.
24. Rajagopalan, Ramesh and Varshney, Pramod K., "Data aggregation techniques in sensor networks: A survey",
(2006), Electrical Engineering and Computer Science. Paper22.