
Proposal of K-MSTME Data Mining Clustering Method for Prolonging Life of Wireless Sensor Networks

Mladen Vukasinovic
University Mediterranean Podgorica
Faculty of Information Technology
Podgorica, Montenegro
mladen@ac.me
Abstract: Clustering is one of the most important and basic techniques in data mining. Because minimum spanning tree-based clustering algorithms can detect clusters with irregular boundaries, they have been widely used in practice. In such algorithms, the search for the nearest objects during construction of the minimum spanning tree is the main source of computation. In this paper I propose K-MSTME, a clustering algorithm for sensors based on a minimum spanning tree, the maximum energy resource, and the k-means algorithm. When assessed with evaluation metrics, K-MSTME clusters better than the known low-energy adaptive clustering hierarchy (LEACH) and base-station controlled dynamic clustering (BCDCP) protocols in wireless sensor networks. The method combines three algorithms in order to provide an optimal solution for extending the lifetime of wireless sensor networks.
Keywords: wireless sensor networks; data mining; clustering method; minimal spanning tree; maximum energy; k-means.

I. INTRODUCTION

A sensor network is composed of a large number of sensor nodes, which are densely deployed either inside the
phenomenon or very close to it. The position of sensor nodes
need not be engineered or pre-determined. This allows random
deployment in inaccessible terrains or disaster relief operations
[1]. On the other hand, this also means that sensor network
protocols and algorithms must possess self-organizing
capabilities. Another unique feature of sensor networks is the
cooperative effort of sensor nodes. Sensor nodes are fitted with
an on-board processor. Instead of sending the raw data to the
nodes responsible for the fusion, sensor nodes use their
processing abilities to locally carry out simple computations
and transmit only the required and partially processed data.
Wireless Sensor Network is a self-configuring network of
small sensor nodes communicating among themselves using
radio signals, and deployed in quantity to sense, monitor and
understand the physical world. Sensor nodes are inexpensive but equipped with limited battery power; because they are energy-constrained, maximizing network lifetime is one of the fundamental problems in wireless sensor networks. Network lifetime is defined as the time when the first node is unable to send its data to the sink. In a data gathering application,
each node sends its data to the sink. Data aggregation reduces data traffic and saves energy by combining multiple packets into a single packet when the sensed data are highly
correlated. Much research has been carried out to increase the lifetime of the network, and most of the existing protocols use a cluster-based approach, in which the whole network is divided into groups, each with a head. An important problem is
finding an energy efficient routing scheme for gathering all
data periodically at the sink so that the network lifetime is
prolonged as much as possible. The lifetime of the network can be expressed in terms of rounds, where a round is the time period between two sensing activities of the sensor nodes. Time is a critical issue in sensor networks and introduces the possibility of
temporal relations between sensors. These relations are
important in that they can help in predicting the sources of
future events. Several techniques can be used to extract these temporal relations; among them, data mining has recently received a great deal of attention. However, the stream nature of sensor data, along with the limited resources of wireless networks, brings new challenges to data mining techniques. Among these challenges are the type of knowledge to be extracted from the networks and the way to extract the required data to mine that knowledge. Wireless sensor networks extend the database concepts of traditional data mining systems. In
the literature, many protocols for WSNs have been proposed to reduce energy consumption and improve network lifetime. Those protocols can be categorized into three classes: routing protocols, sleep-and-awake scheduling protocols, and clustering protocols. Routing protocols [2] determine energy-efficient multi-hop paths from each node to the base station. In sleep-and-awake scheduling protocols [3], nodes sleep according to a schedule in order to minimize energy consumption. In clustering protocols [4], data
aggregation can be used for reducing energy consumption.
Data aggregation, also known as data fusion, can combine
multiple data packets received from different sensor nodes. It
reduces the size of the data packet by eliminating the
redundancy. Wireless communication cost is also decreased by
the reduction in the data packets [5]. Therefore, clustering
protocols reduce energy consumption and improve the network lifetime of the WSN.
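As a small illustration of the data aggregation idea described above, the sketch below fuses several correlated readings into a single value before forwarding; the function name and the averaging rule are illustrative assumptions, not a fusion scheme specified in this paper.

```python
# Minimal sketch of in-network data aggregation: a cluster head combines
# several highly correlated readings into one value (here, their mean)
# so that only a single packet is forwarded. Illustrative assumption only.
from statistics import mean

def aggregate_packets(readings):
    """Fuse correlated sensor readings into a single value."""
    return mean(readings)

# Example: three temperature readings are fused into one packet payload.
fused = aggregate_packets([21.0, 22.0, 21.5])
print(fused)  # 21.5
```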

II. CLUSTERING ALGORITHMS IN WIRELESS SENSOR NETWORKS

Good clustering algorithms in wireless sensor networks would generate clusters in each round satisfying the following five optimal rules:
1) Proximity: the shorter the average distance between neighboring sensors within a cluster, the better.
2) Same Number: the number of sensors in each cluster should be as equal as possible.
3) Maximum Energy: the higher the energy resource on CHs, the better; it should be at least above the average level.
4) Even Location: CHs are distributed evenly.
5) Dynamic Change: different clusters are generated dynamically in different rounds.
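For concreteness, the sketch below shows one way the Proximity and Same Number rules might be quantified per round. The paper does not give formal definitions for the EC (Closer) and ESN (Same Number) measures reported later, so the formulas and names here are assumptions for illustration only.

```python
# Hedged sketch: possible per-round measures for the Proximity and
# Same Number rules. The exact EC/ESN definitions are not stated in the
# paper; these are illustrative assumptions.
import math
from statistics import pstdev

def proximity(cluster, head):
    """Average Euclidean distance from member sensors to their cluster head."""
    return sum(math.dist(node, head) for node in cluster) / len(cluster)

def same_number(clusters):
    """Spread of cluster sizes; lower means more evenly sized clusters."""
    return pstdev([len(c) for c in clusters])

heads = [(20.0, 30.0), (70.0, 60.0)]
clusters = [[(18, 28), (25, 31), (22, 35)], [(68, 62), (75, 58)]]
print([round(proximity(c, h), 1) for c, h in zip(clusters, heads)])
print(same_number(clusters))  # 0.5 for cluster sizes 3 and 2
```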
III. THE K-MEANS ALGORITHM

The k-means algorithm is a simple iterative method to partition a given dataset into a user-specified number of
clusters, k. This algorithm has been discovered by several
researchers across different disciplines, most notably Lloyd (1957, 1982) [7], Forgy (1965), Friedman and Rubin (1967), and MacQueen (1967). A detailed history of k-means, along with descriptions of several variations, is given in [8]. Gray and
Neuhoff [9] provide a nice historical background for k-means
placed in the larger context of hill-climbing algorithms.
The algorithm operates on a set of d-dimensional vectors, $D = \{x_i \mid i = 1, \ldots, N\}$, where $x_i \in \mathbb{R}^d$ denotes the $i$th data point. The algorithm is initialized by picking $k$ points in $\mathbb{R}^d$ as the initial $k$ cluster representatives or centroids. Techniques for
selecting these initial seeds include sampling at random from
the dataset, setting them as the solution of clustering a small
subset of the data or perturbing the global mean of the data k
times. Then the algorithm iterates between two steps till
convergence:
Step 1: Data Assignment. Each data point is assigned to its
closest centroid, with ties broken arbitrarily. This results in a
partitioning of the data.
Step 2: Relocation of means. Each cluster representative is
relocated to the center (mean) of all data points assigned to it.
If the data points come with a probability measure (weights),
then the relocation is to the expectations (weighted mean) of
the data partitions.
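The following is a minimal sketch of the two steps just described (assignment, then relocation of means); the seeding by random sampling and the toy data are illustrative assumptions, not the experimental setup of this paper.

```python
# Minimal k-means sketch: Step 1 assigns points to the closest centroid,
# Step 2 relocates each centroid to the mean of its assigned points.
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # seed by sampling the dataset
    for _ in range(max_iters):
        # Step 1: assign each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Step 2: relocate each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:          # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

pts = [(1, 1), (1, 2), (8, 8), (9, 8), (0, 1), (9, 9)]
centers, groups = kmeans(pts, k=2)
print(centers)
```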
The algorithm converges when the assignments (and hence the $c_j$ values) no longer change. The algorithm execution is visually depicted in Fig. 1. Note that each iteration needs $N \times k$ comparisons, which determines the time complexity of one
iteration. The number of iterations required for convergence
varies and may depend on N, but as a first cut, this algorithm
can be considered linear in the dataset size.
One issue to resolve is how to quantify "closest" in the assignment step. The default measure of closeness is the Euclidean distance, in which case one can readily show that the non-negative cost function

$$\sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2$$

will decrease whenever there is a change in the assignment or the relocation steps, and hence convergence is guaranteed in a finite number of iterations. The greedy-descent nature of k-means on a non-convex cost also implies that the convergence is only to a local optimum, and indeed the algorithm is typically quite sensitive to the initial centroid locations. Figure 2 illustrates how a poorer result is obtained for the same dataset as in Fig. 1 for a different choice of the three initial centroids. The local minima problem can be countered to some extent by running the algorithm multiple times with different initial centroids, or by doing limited local search about the converged solution [10].

Figure 1. Changes in cluster representative locations (indicated by + signs) and data assignments (indicated by color) during an execution of the k-means algorithm

Figure 2. Effect of an inferior initialization on the k-means results

IV. K-MSTME CLUSTERING METHOD

The K-MSTME clustering method is based on the k-means clustering algorithm, a minimal spanning tree (MST), and the maximum energy resource on each sensor. The K-MSTME method performs well with respect to the five optimal rules of Section II.
The main idea of K-MSTME is as follows. First, the sensors whose energy resource is above the average level in the network are selected into a CH candidate set, S. Then all the candidates in S are connected by an MST built over their two-dimensional coordinates. Non-candidate sensors become supporters of their closest candidates. Finally, a given number, p, of edges are broken in the MST to divide S into p+1 subsets such that the number of supporters in each sub-MST (or subset) is nearly the same. In each subset, the candidate with the maximum energy resource is chosen as the CH. A k-means step then classifies the objects, based on their attributes/features, into K groups, where K is a positive integer, by minimizing the sum of squared distances between the data and the corresponding cluster centroids.
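A hedged sketch of the first two stages described above (selecting CH candidates with above-average energy and connecting them by an MST over their two-dimensional coordinates) is given below; the function names and data layout are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of the first two K-MSTME stages: CH-candidate selection by
# above-average energy, then an MST over the candidates' 2-D positions
# (Prim's algorithm). Names and data layout are illustrative assumptions.
import math

def select_candidates(sensors):
    """sensors: dict id -> (x, y, energy). Returns ids with above-average energy."""
    avg = sum(e for _, _, e in sensors.values()) / len(sensors)
    return [sid for sid, (_, _, e) in sensors.items() if e > avg]

def mst_edges(sensors, candidates):
    """Prim's algorithm over candidate positions; returns a list of (u, v, dist)."""
    pos = {sid: sensors[sid][:2] for sid in candidates}
    in_tree = {candidates[0]}
    edges = []
    while len(in_tree) < len(candidates):
        u, v = min(
            ((a, b) for a in in_tree for b in candidates if b not in in_tree),
            key=lambda e: math.dist(pos[e[0]], pos[e[1]]),
        )
        edges.append((u, v, math.dist(pos[u], pos[v])))
        in_tree.add(v)
    return edges

sensors = {1: (10, 10, 2.0), 2: (40, 15, 1.6), 3: (25, 60, 0.7), 4: (70, 40, 1.9)}
cands = select_candidates(sensors)     # sensors 1, 2 and 4
print(mst_edges(sensors, cands))
```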
V. K-MSTME ALGORITHM

In K-MSTME, sensors with energy above the average level are chosen as cluster-head (CH) candidates. An MST is created to describe the closeness of the CH candidates. Non-candidate sensors become supporters of their closest candidates. An expected number, NCH, of CHs is chosen by the K-MSTME algorithm so as to satisfy all five optimal principles. The K-MSTME algorithm is given in Table I.
TABLE I. Algorithm of the K-MSTME clustering method
Step 1: Sensors with more energy resource than the average level are selected into the CH candidate set, S.
Step 2: An MST, T, is used to connect all the items in S.
Step 3: Supporters of a CH candidate x are those non-candidate sensors that are nearest to x among all CH candidates. Compute the number of supporters for each CH candidate, including the candidate itself.
Step 4: Suppose supporters are just around their candidates, and thus the latter can delegate the former to decide which edge should be split.
Step 5 (Initialization): Let the number of already split edges nSplit = 0, T' = T and S' = S.
Step 6 (Loop): Find an edge whose removal breaks T' into two sub-MSTs T1 and T2 and at the same time groups the nodes in S' into two subsets S1 and S2 with the nearest possible number of supporters in both subsets. Then let nSplit = nSplit + 1.
Step 7 (Termination test): If nSplit ≥ NCH − 1 (so that S has been divided into NCH subsets), go to Step 8. Otherwise, go on splitting S1 and S2 in turn: if the number of supporters in S1 (or S2) is more than N/NCH, then let S' = S1 and T' = T1 (or S' = S2 and T' = T2). Go to Step 6.
Step 8: The CH candidate with the most energy resource among the CH candidates in each subset is chosen as the real CH. If more than one CH candidate has the maximum energy resource in its subset, then randomly choose one as the real CH.
Step 9 (Data assignment): Each data point is assigned to its closest centroid, with ties broken arbitrarily. This results in a partitioning of the data.
Step 10 (Relocation of means): Each cluster representative is relocated to the center of all data points assigned to it. If the data points come with a probability measure, then the relocation is to the expectations of the data partitions.
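The sketch below illustrates Steps 6 and 8 of Table I in a simplified form: it removes the single edge whose two sides have the most balanced supporter totals and then picks the highest-energy candidate of each resulting subset as the real CH. The paper's algorithm repeats the split until NCH subsets are obtained; the helper names and data layout here are assumptions for illustration.

```python
# Hedged sketch of Steps 6 and 8: split one sub-MST at the edge that best
# balances the supporter counts on both sides, then pick the highest-energy
# candidate of each component as the real CH. Illustrative assumptions only.
from collections import defaultdict

def components(nodes, edges):
    """Connected components of an undirected graph given as (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def best_split(nodes, edges, supporters):
    """Remove the edge whose two sides have the most even supporter totals (Step 6)."""
    def imbalance(edge):
        rest = [e for e in edges if e != edge]
        side_a, side_b = components(nodes, rest)
        return abs(sum(supporters[n] for n in side_a) -
                   sum(supporters[n] for n in side_b))
    return min(edges, key=imbalance)

def choose_heads(subsets, energy):
    """Highest-energy candidate in each subset becomes the real CH (Step 8)."""
    return [max(s, key=lambda n: energy[n]) for s in subsets]

nodes = [1, 2, 4]
tree = [(1, 2), (2, 4)]
supporters = {1: 3, 2: 2, 4: 4}     # each candidate counted with its supporters
print(best_split(nodes, tree, supporters))          # (2, 4): sides weigh 5 vs 4
print(choose_heads(components(nodes, [(1, 2)]), {1: 1.8, 2: 1.6, 4: 2.1}))  # [1, 4]
```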

Using the K-MSTME algorithm, the CH candidates would be placed at approximately the same distance from each other, and all of them would have nearly the same number of non-CH sensors.
Theorem 1. In wireless sensor networks, assume that all the sensors have the same initial energy resource and that sensors are deployed evenly and randomly in the network area; then non-candidate supporters are approximately around their CH candidates in each round.
Proof. With the same initial energy resource, all the sensors in
the networks should be connected by an MST, and then NCH
sub trees are formed by MST-based methods in Table 1.
Because all the sensors are distributed in the network area
evenly and randomly, CHs chosen by K-MSTME method are
distributed evenly and randomly in the first round. The
principle of nearly the same number of supporters in MSTME
ensures that each split sub tree covers nearly the same size of
network area. Also, according to Step 8 in Table 1, CHs are
randomly chosen. Therefore, CHs are distributed evenly and
randomly in the whole network area in the first round. This
provides basic conditions for the later rounds to choose CH
candidates. In the second round or later rounds, all the sensors
generally have different energy resources. Because CHs, and non-CH sensors that are far from CHs, consume more energy according to the energy dissipation model, those sensors nearer to the latest CHs would become CH candidates in
the current round. Therefore, in the local area of the networks,
non-candidate supporters are approximately around CH
candidates. MST is reasonable for clustering while evenly
distributing energy dissipation in wireless sensor networks.
First, MST connects all CH candidates with the minimal
edges. Wherever the edge is broken, the nodes in the same
split sub trees are closer. Because non-CH candidate sensors
support the nearest CH candidates, CH candidates may
delegate their supporters in terms of location. Thus, sensors in
the same clusters formed by K-MSTME are closer to each
other. Also, the approximately equal number of supporters in each CH candidate subset ensures that nearly the same number of sensors are closest to the CH in that subset. At the same time, because the sensors are evenly and randomly distributed in the network area, nearly equal numbers of sensors cover areas of the same size. Each area includes one CH, that is, CHs are distributed evenly in the whole network. Finally, all nodes in the MST are high-energy sensors among all sensors in the network. Therefore, K-MSTME does better overall with respect to the five optimal principles.
VI. COMPARISON OF SAMPLE CLUSTERS IN K-MSTME WITH LEACH AND BCDCP

A sample wireless sensor network with N = 100 nodes randomly deployed in an M × M (M = 100 m) area is adopted to
evaluate clusters generated by K-MSTME, LEACH, and
BCDCP. In fact, the three clustering methods do not depend
on the absolute distance of the wireless sensor networks and
also they do not require application-related parameters. These are their advantages compared with the clustering methods of traditional data mining systems.
Moreover, the three clustering methods in wireless sensor
networks satisfy optimal cluster rules of both Maximum
Energy and Dynamic Change. The first principle of choosing CHs is based on high-energy sensors in each clustering method, and this also ensures that the latest CHs cannot be chosen as CHs again in the current round. Some examples of clusters in three consecutive rounds are shown in Figs. 3–6, and they show that clusters differ considerably even in consecutive rounds. Therefore, clusters in them are analyzed mainly
from the remaining three optimal cluster rules of Proximity,
Same Number, and Even Location.
Also, distributed LEACH and center-controlled BCDCP are
selected to compare with center-controlled K-MSTME,
because LEACH is the most popular routing protocol in wireless sensor networks, and BCDCP, published in 2005, is one of the protocols that represent the most recent level of development of clustering methods in wireless sensor networks. Clusters in the 100th to 102nd rounds are chosen arbitrarily as examples to evaluate the performance of K-MSTME in both a single static round and dynamic changes over consecutive rounds. Actually, clusters in
arbitrary rounds have the same effectiveness. These arbitrarily
selected consecutive rounds are used to evaluate the above
three optimal rules statically in this section. However, to

evaluate the whole performance of the clustering methods, a series of consecutive rounds from the first round to the last round should be considered.

Figure 3. Six clusters in LEACH

Figure 4. Six clusters in K-MSTME

TABLE II. Numeric results for evaluating six clusters in two-hop networks
Clustering algorithm   Sample   Closer (EC)   Same Number (ESN)
LEACH                  (a)      647.0         7.7
LEACH                  (b)      903.6         9.7
LEACH                  (c)      681.4         10.3
K-MSTME                (a)      685.9         7.3
K-MSTME                (b)      488.4         6
K-MSTME                (c)      620.6         8

Figure 5. Nine clusters in BCDCP

Figure 6. Nine clusters in K-MSTME

TABLE III. Numeric results for evaluating nine clusters in multi-hop networks
Clustering algorithm   Sample   Closer (EC)   Same Number (ESN)
BCDCP                  (a)      750.3         1.9
BCDCP                  (b)      422.6         2.8
BCDCP                  (c)      376.9         3.9
K-MSTME                (a)      293.2         3.4
K-MSTME                (b)      300.6         2.3
K-MSTME                (c)      275.7         2.3

Figure 7. Average energy dissipation in two-hop WSN

Figure 8. Average energy dissipation in multi-hop WSN

Figure 7 plots how the average energy dissipation varies with the number of rounds in two-hop wireless sensor networks [11]. Figure 7 is also used to evaluate the closeness of clusters, because the energy dissipated in transmitting data from non-CH sensors is reduced considerably by using MSTME in place of LEACH. Figure 7 shows that the average energy dissipation in MSTME is less than in LEACH. Also, the average energy dissipation varying with the number of rounds in multi-hop wireless sensor networks is shown in Fig. 8 [12]. Figure 8 shows that MSTME outperforms BCDCP. MSTME performs better in the multi-hop topology than in the two-hop topology, because the average transmitting distance is reduced further by reducing the average length of the edges in the MST that connects the CHs in MSTME.
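To make the energy argument concrete, the sketch below uses the first-order radio model commonly associated with LEACH-style protocols (e.g. [11]), in which the amplifier cost grows with the square of the transmit distance; the constants are typical illustrative values, not parameters reported in this paper.

```python
# Hedged sketch of a first-order radio model often used with LEACH-style
# protocols: transmit energy has an electronics term plus a d^2 amplifier
# term. Constants are illustrative assumptions, not this paper's parameters.
E_ELEC = 50e-9      # J/bit spent by transmitter or receiver electronics
EPS_AMP = 100e-12   # J/bit/m^2 spent by the transmit amplifier

def tx_energy(bits, distance):
    """Energy to transmit `bits` over `distance` metres (free-space d^2 model)."""
    return E_ELEC * bits + EPS_AMP * bits * distance ** 2

def rx_energy(bits):
    """Energy to receive `bits`."""
    return E_ELEC * bits

# Halving the hop distance cuts the amplifier term to a quarter, which is why
# shorter MST edges between CHs lower the average dissipation per round.
print(tx_energy(2000, 80), tx_energy(2000, 40))
```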
VII. CONCLUSION

The most important advantage of K-MSTME is that it clusters sensors so as to reduce the average energy dissipation while choosing CHs with higher energy resources, and thus prolongs the network lifetime. K-MSTME offers a good solution for clustering sensors in wireless sensor networks near-optimally. Based on the MST, the members of any subset of the CH candidate set are close to each other. Also, because CH candidates have energy above the average level, the sensors with the most energy in any subset have high energy resources among the sensors in the whole network. The nearly equal numbers of supporters in each subset make the final clusters contain numbers of sensors as close as possible, and thus the energy dissipation of the CHs is approximately the same.
REFERENCES
[1] A. Boukerche and S. Samarah, "A Performance Evaluation of Distributed Framework for Mining Wireless Sensor Networks," ANSS '07, pp. 239–246, 2007.
[2] J.-H. Chang and L. Tassiulas, "Maximum lifetime routing in wireless sensor networks," IEEE/ACM Trans. Netw., vol. 12, no. 4, pp. 609–619, Aug. 2004.
[3] J. Deng, Y. S. Han, W. Heinzelman, and P. Varshney, "Balanced-energy sleep scheduling scheme for high density cluster-based sensor networks," in ASWN 2004: 4th Workshop on Applications and Services in Wireless Networks, pp. 99–108, 2004.
[4] C. Li, M. Ye, G. Chen, and J. Wu, "An energy-efficient unequal clustering mechanism for wireless sensor networks," in IEEE International Conference on Mobile Adhoc and Sensor Systems Conference, p. 604, 2005.
[5] B. Krishnamachari, D. Estrin, and S. B. Wicker, "The impact of data aggregation in wireless sensor networks," in ICDCSW '02: 22nd International Conference on Distributed Computing Systems, Washington, DC, USA: IEEE Computer Society, pp. 575–578, 2002.
[6] S.-G. Lee and D.-K. Yun, "Clustering categorical and numerical data: a new procedure using multidimensional scaling," International Journal of Information Technology & Decision Making, vol. 2, pp. 135–159, 2003.
[7] S. P. Lloyd, "Least squares quantization in PCM," unpublished Bell Laboratories technical note, 1957; portions presented at the Institute of Mathematical Statistics Meeting, Atlantic City, NJ, September 1957. Also in IEEE Trans. Inform. Theory (Special Issue on Quantization), vol. IT-28, pp. 129–137, March 1982.
[8] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[9] R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Trans. Inform. Theory, vol. 44, no. 6, pp. 2325–2384, 1998.
[10] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, 2006.
[11] W. B. Heinzelman, A. P. Chandrakasan, and H. Balakrishnan, "An application-specific protocol architecture for wireless microsensor networks," IEEE Transactions on Wireless Communications, vol. 1, pp. 660–670, 2002.
[12] S. D. Muruganathan, D. Ma, R. I. Bhasin, and A. O. Fapojuwo, "A centralized energy-efficient routing protocol for wireless sensor networks," IEEE Communications Magazine, vol. 43, pp. 8–13, 2005.
