
Differential Evolution Methods for Unsupervised Image Classification

Mahamed G. H. Omran 1, Andries P. Engelbrecht 2, Ayed Salman 3

1 Faculty of Computing & IT, Arab Open University, Kuwait, mjomran@engineer.com
2 Department of Computer Science, University of Pretoria, South Africa, engel@cs.up.ac.za
3 Computer Engineering Department, Kuwait University, Kuwait, ayed@eng.kuniv.edu.kw
Abstract- A clustering method that is based on Differential Evolution is developed in this paper. The algorithm finds the centroids of a user-specified number of clusters, where each cluster groups together similar patterns. The application of the proposed clustering algorithm to the problem of unsupervised classification and segmentation of images is investigated. To illustrate its wide applicability, the proposed algorithm is then applied to synthetic, MRI and satellite images. Experimental results show that the Differential Evolution clustering algorithm performs very well compared to other state-of-the-art clustering algorithms in all measured criteria. Additionally, the paper presents a different formulation of the multi-objective fitness function to eliminate the need to tune objective weights. A gbest DE is also proposed with encouraging results.

1 Introduction
Image clustering is the process of identifying groups of
similar image primitives [1]. These image primitives can
be pixels, regions, line elements and so on, depending on
the problem encountered. Many basic image processing
techniques such as quantization, segmentation and
coarsening can be viewed as different instances of the
clustering problem [1].
There are two main approaches to image
classification: supervised and unsupervised. In the
supervised approach, the number and the numerical
characteristics (e.g. mean and variance) of the classes in
the image are known in advance (by the analyst) and
used in the training step which is followed by the
classification step. There are several popular supervised
algorithms such as the minimum-distance-to-mean,
parallelepiped and the Gaussian maximum likelihood
classifiers [2]. In the unsupervised approach the classes
are unknown and the approach starts by partitioning the
image data into groups (or clusters), according to a
similarity measure, which can be compared with reference data by an analyst [2]. Therefore, unsupervised classification is also referred to as a clustering problem. In general, the unsupervised approach has several advantages over the supervised approach, namely [3]:

- For unsupervised approaches, there is no need for an analyst to specify in advance all the classes in the image data set. The clustering algorithm automatically finds distinct classes, which dramatically reduces the work of the analyst.
- The characteristics of the objects being classified can vary with time; the unsupervised approach is an excellent way to monitor these changes.
- Some characteristics of objects may not be known in advance. The unsupervised approach automatically flags these characteristics.

The focus of this paper is on the unsupervised approach. There are several algorithms that belong to this approach. These algorithms can be categorized into two groups: hierarchical and partitional [4, 5]. In hierarchical clustering, the output is "a tree showing a sequence of clusterings with each clustering being a partition of the data set" [5]. This type of algorithm has the following advantages:

- The number of classes need not be specified a priori, and
- they are independent of the initial conditions.

However, hierarchical clustering suffers from the following drawbacks:

- They are static, i.e. pixels assigned to a cluster cannot move to another cluster.
- They may fail to separate overlapping clusters due to a lack of information about the global shape or size of the clusters [4].

On the other hand, partitional clustering algorithms partition the data set into a specified number of clusters. These algorithms try to minimize certain criteria (e.g. a square error function); therefore, they can be treated as an optimization problem. The advantages of hierarchical algorithms are the disadvantages of partitional algorithms and vice versa.
The most widely used partitional algorithm is the iterative K-means approach. The K-means algorithm starts with K cluster centers, or centroids. Cluster centroids can be initialized to random values or can be derived from a priori information. Each pixel in the image is then assigned to the closest cluster (i.e. closest centroid). Finally, the centroids are recalculated according to the associated pixels. This process is repeated until convergence [6]. The K-means algorithm suffers from the following drawbacks:

- the algorithm is data-dependent;
- it is a greedy algorithm that depends on the initial conditions, which may cause the algorithm to converge to suboptimal solutions; and
- the user needs to specify the number of classes in advance [3].
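To make the preceding K-means description concrete, the following Python/NumPy sketch is an illustrative implementation of the basic iterative loop; the random initialization and the convergence test are assumptions for the example, not prescriptions from [6].

```python
import numpy as np

def kmeans(patterns, K, max_iter=100, tol=1e-6, rng=np.random.default_rng()):
    """Basic iterative K-means on an (Np, Nb) pattern matrix:
    assign each pixel to the closest centroid, then recompute each
    centroid as the mean of its assigned pixels, until convergence."""
    centroids = patterns[rng.choice(len(patterns), K, replace=False)].astype(float)
    for _ in range(max_iter):
        # assignment step: distance from every pattern to every centroid
        dists = np.linalg.norm(patterns[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute centroids from their members (keep empty clusters unchanged)
        new_centroids = np.array([patterns[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.linalg.norm(new_centroids - centroids) < tol:   # converged
            break
        centroids = new_centroids
    return centroids, labels
```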
ISODATA is an enhancement proposed by [7] that
operates on the same concept as the K-means algorithm
with the addition of the possibility of merging classes and
splitting elongated classes.
Another category of unsupervised partitional algorithms
is the class of non-iterative algorithms. The most widely
used non-iterative algorithm is MacQueen's K-means
algorithm [8]. This algorithm works in two phases as
follows: one phase to find the centroids of the classes and
the second to classify the image pixels. Competitive Learning (CL) updates the centroids sequentially by moving the closest centroid toward the pixel being classified [9]. Non-iterative algorithms suffer the drawback
of being dependent on the order in which the data points are
presented. To overcome this problem, the choice of data
points can be randomized [3]. Lillesand and Kiefer
presented a non-iterative approach to unsupervised
clustering with a strong dependence on the image texture
[2]. A window (e.g. a 3 × 3 window) is moved over the image
and the variance of the pixels within this window is
calculated. If the variance is less than a pre-specified
threshold then the mean of the pixels within this window is
considered as a new centroid. This process is repeated until
a pre-specified maximum number of classes is reached. The
closest centroids are then merged until the entire image is
analyzed. The final centroids resulting from this algorithm
are used to classify the image [2]. In general, iterative
algorithms are more effective than the non-iterative
algorithms, since iterative algorithms are less dependent on
the order in which data points are presented.
In this paper, an unsupervised image classification
approach is developed which uses a Differential Evolution
(DE) algorithm. The paper shows that the DE approach has
promise in unsupervised image classification. A gbest DE
is also proposed with encouraging results.
The rest of the paper is organized as follows: An
overview of DE is given in section 2. The new unsupervised
image classification algorithm is presented in section 3.
Section 4 presents experimental results to illustrate the
efficiency of the algorithm. Section 5 concludes the paper,
and outlines future research.

2 Differential Evolution
Differential evolution (DE) [10] is a population-based search strategy very similar to standard evolutionary algorithms. The main difference is in the reproduction step, where an offspring is created from three parents using an arithmetic crossover operator. DE is defined for floating-point representations of individuals.
Differential evolution does not make use of a mutation
operator that depends on some probability distribution
function, but introduces a new arithmetic operator which
depends on the differences between randomly selected pairs
of individuals.
For each parent, xi(t), of generation t, an offspring, x'i(t), is created in the following way: Randomly select three individuals from the current population, namely xi1(t), xi2(t) and xi3(t), with i1 ≠ i2 ≠ i3 ≠ i and i1, i2, i3 ~ U(1, ..., s), where s is the population size. Select a random number r ~ U(1, ..., Nd), where Nd is the number of genes (parameters) of a single chromosome. Then, for all parameters j = 1, ..., Nd, if U(0, 1) < Pr, or if j = r, let

x'i,j(t) = xi3,j(t) + F (xi1,j(t) - xi2,j(t))    (1)

otherwise, let

x'i,j(t) = xi,j(t)    (2)

In the above, Pr is the probability of reproduction (with Pr ∈ [0, 1]), F is a scaling factor with F ∈ (0, ∞), and x'i,j(t) and xi,j(t) indicate respectively the j-th parameter of the offspring and the parent.
Thus, each offspring consists of a linear combination of
three randomly chosen individuals when U(0, 1) < Pr;
otherwise the offspring inherits directly from the parent.
Even when Pr = 0, at least one of the parameters of the
offspring will differ from the parent (forced by the
condition j = r).
The mutation process above requires that the population
consists of more than three individuals.
After completion of the mutation process, the next step
is to select the new generation. For each parent of the
current population, the parent is replaced with its offspring
if the fitness of the offspring is better, otherwise the parent
is carried over to the next generation.
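To make the operator above concrete, the following Python/NumPy sketch performs one DE generation (mutation, crossover and greedy selection), assuming individuals are real-valued vectors and lower fitness is better; the function name de_generation and the default parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def de_generation(pop, fitness, Pr=0.9, F=0.7, rng=np.random.default_rng()):
    """One DE generation.

    pop     : (s, Nd) array of individuals
    fitness : callable mapping an individual to a scalar (lower is better)
    Pr      : probability of reproduction (crossover probability)
    F       : scaling factor applied to the difference vector
    """
    s, Nd = pop.shape                     # requires s > 3 individuals
    new_pop = pop.copy()
    for i in range(s):
        # choose three individuals i1, i2, i3, all different from i and from each other
        candidates = [idx for idx in range(s) if idx != i]
        i1, i2, i3 = rng.choice(candidates, size=3, replace=False)
        r = rng.integers(Nd)              # parameter that is always taken from the mutant
        offspring = pop[i].copy()
        for j in range(Nd):
            if rng.random() < Pr or j == r:           # equation (1), otherwise equation (2)
                offspring[j] = pop[i3, j] + F * (pop[i1, j] - pop[i2, j])
        # greedy selection: keep the offspring only if it is fitter than the parent
        if fitness(offspring) < fitness(pop[i]):
            new_pop[i] = offspring
    return new_pop
```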
DE is easy to implement, requires little parameter tuning [11], can find the global optimum regardless of the initial parameter values and exhibits fast convergence [12].

3 DE-Based Clustering Algorithm

This section defines the terminology used throughout the rest of the paper. A measure is given to quantify the quality of a clustering algorithm, after which the DE-based clustering algorithm is introduced.

Define the following symbols:
- Nb denotes the number of spectral bands of the image set
- Np denotes the number of image pixels
- Nc denotes the number of spectral classes (as provided by the user)
- zp denotes the Nb components of pixel p
- mj denotes the centroid (mean) of cluster j

3.1 Measure of Quality

Different measures can be used to express the quality of a clustering algorithm. The most general measure of performance is the quantization error, defined as

Je = [ Σ k=1..K ( Σ zp∈Ck d(zp, mk) / nk ) ] / K    (3)

where Ck is the kth cluster, and nk is the number of pixels in Ck.

3.2 DE-Based Clustering Algorithm

In the context of data clustering, a single individual represents the K cluster centroids. That is, each individual xi is constructed as xi = (mi,1, ..., mi,k, ..., mi,K), where mi,k refers to the kth cluster centroid vector of the ith individual. Therefore, a population represents a number of candidate data clusterings. The quality of each individual is measured using

f(xi, Zi) = w1 dmax(Zi, xi) + w2 (zmax - dmin(xi)) + w3 Je,i    (4)

where zmax is the maximum value in the data set (i.e. in the context of digital images, zmax = 2^s - 1 for an s-bit image); Zi is a matrix representing the assignment of patterns to the clusters of individual i. Each element zi,k,p indicates whether pattern zp belongs to cluster Ck of individual i. The constants w1, w2 and w3 are user-defined constants used to weigh the contribution of each of the sub-objectives. Also,

dmax(Zi, xi) = max k=1..K { Σ zp∈Ci,k d(zp, mi,k) / ni,k }    (5)

is the maximum average Euclidean distance of individuals to their associated clusters, and

dmin(xi) = min k,kk,k≠kk { d(mi,k, mi,kk) }    (6)

is the minimum Euclidean distance between any pair of clusters. In the above, ni,k is the number of patterns that belong to cluster Ci,k of individual i.

The fitness function in equation (4) has as objective to simultaneously minimize the intra-distance between patterns and their cluster centroids, as quantified by dmax(Zi, xi) and the quantization error, as quantified by Je, and to maximize the inter-distance between any pair of clusters, as quantified by dmin(xi). According to the definition of the fitness function, a small value of f(xi, Zi) suggests compact and well-separated clusters (i.e. a good clustering).

The fitness function is thus a multi-objective problem. Approaches to solve multi-objective problems have been developed mostly for evolutionary computation approaches [13]. Since our scope is to illustrate the applicability of DE to unsupervised image classification, and not multi-objective optimization, a simple weighted approach is used to cope with multiple objectives. Different priorities are assigned to the sub-objectives via appropriate initialization of the values of w1, w2 and w3.
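The quantities in equations (3)-(6) can be computed directly from a set of candidate centroids and the image patterns. The sketch below is an illustrative Python/NumPy implementation assuming Euclidean distance and an (Np, Nb) pattern matrix; the default weight values and the handling of empty clusters are assumptions for the example, not the paper's settings.

```python
import numpy as np

def clustering_fitness(centroids, patterns, w=(0.3, 0.3, 0.4), z_max=255.0):
    """Evaluate equations (3)-(6) for one individual.

    centroids : (K, Nb) array, the candidate cluster centroids m_k
    patterns  : (Np, Nb) array, the pixels z_p (one row per pixel)
    w         : the weights (w1, w2, w3); placeholder values only
    z_max     : maximum value in the data set, e.g. 2**8 - 1 for an 8-bit image
    """
    # Euclidean distance from every pattern to every centroid, shape (Np, K)
    dists = np.linalg.norm(patterns[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)            # assign each pixel to its closest centroid

    K = centroids.shape[0]
    avg_dist = []                            # average intra-cluster distance per cluster
    for k in range(K):
        members = dists[labels == k, k]
        avg_dist.append(members.mean() if members.size else 0.0)  # empty cluster -> 0

    J_e = np.mean(avg_dist)                  # quantization error, equation (3)
    d_max = max(avg_dist)                    # equation (5)
    d_min = min(np.linalg.norm(centroids[a] - centroids[b])
                for a in range(K) for b in range(a + 1, K))       # equation (6)

    w1, w2, w3 = w
    return w1 * d_max + w2 * (z_max - d_min) + w3 * J_e           # equation (4)
```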
The DE clustering algorithm is summarized in Figure 1.

1. Initialize each individual to contain K randomly selected cluster centroids
2. For t = 1 to tmax
   (a) For each individual i
       i. For each pattern zp
          - calculate d(zp, mi,k) for all clusters Ci,k
          - assign zp to Ci,k where d(zp, mi,k) = min k=1..K { d(zp, mi,k) }
       ii. Calculate the fitness, f(xi, Zi)
   (b) Apply DE as described in Section 2 and update the cluster centroids using equations (1) and (2)

Figure 1: The DE clustering algorithm

An advantage of using DE is that a parallel search for an optimal clustering is performed. This population-based search approach reduces the effect of the initial conditions, compared to K-means, especially for relatively large population sizes.
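A compact sketch of the loop in Figure 1 is given below; it reuses the de_generation and clustering_fitness helpers sketched earlier. Initializing each individual from K randomly chosen pixels, and the default population size and iteration count (50 and 100, as used in Section 4), are reasonable readings of the text, but the details are illustrative rather than the authors' code.

```python
import numpy as np
# assumes de_generation and clustering_fitness from the earlier sketches are in scope

def de_cluster(image, K, pop_size=50, t_max=100, rng=np.random.default_rng()):
    """DE-based clustering of an image, following the outline of Figure 1.

    image : (rows, cols) grey-scale or (rows, cols, Nb) multispectral array
    K     : number of clusters requested by the user
    """
    patterns = image.reshape(-1, 1) if image.ndim == 2 else image.reshape(-1, image.shape[-1])
    patterns = patterns.astype(float)
    Nb = patterns.shape[1]

    # step 1: each individual holds K randomly chosen pixels as initial centroids
    pop = np.stack([patterns[rng.choice(len(patterns), K, replace=False)].ravel()
                    for _ in range(pop_size)])

    def fitness(individual):
        return clustering_fitness(individual.reshape(K, Nb), patterns)

    # step 2: evolve the centroids with DE (equations (1) and (2))
    for t in range(t_max):
        pop = de_generation(pop, fitness)

    # classify the image with the best set of centroids found
    best = min(pop, key=fitness).reshape(K, Nb)
    labels = np.linalg.norm(patterns[:, None, :] - best[None, :, :], axis=2).argmin(axis=1)
    return best, labels.reshape(image.shape[:2])
```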

4 Experimental Results
The DE-based clustering algorithm has been applied to
three types of imagery data, namely synthetic, MRI and
LANDSAT 5 MSS (79 m GSD) images. These data sets
have been selected to test the algorithms, and to compare
them with other algorithms, on a range of problem types, as
listed below:
Synthetic Image: Figure 2(a) shows a 100 × 100 8-bit gray scale image created specifically to show that the DE algorithm does not get trapped in a local minimum. The image was created using two types of brushes, one brighter than the other.
MRI Image: Figure 2(c) shows a 300 × 300 8-bit gray scale image of a human brain, intentionally chosen for its importance in medical image processing.
Remotely Sensed Imagery Data: Figure 2(e) shows band 4 of the four-channel multispectral test image set of the Lake Tahoe region in the US. Each channel comprises a 300 × 300 image at 8 bits per pixel (remapped from the original 6 bits). The test data are one of the North American Landscape Characterization (NALC) Landsat multispectral scanner data sets obtained from the U.S. Geological Survey (USGS).
4.1 DE versus state-of-the-art clustering algorithms

This section compares the performance of the DE with K-means, FCM [14], KHM [15], H2 [16], a GA clustering algorithm and a PSO clustering algorithm [17]. In all cases, for DE, PSO and GA, 50 individuals were trained for 100 iterations; for the other algorithms 5000 iterations were used (i.e. all algorithms performed 5000 function evaluations). For K-means, FCM, KHM, H2, GA and PSO, the parameters were set as in [18]. For the DE, Pr was set to 0.9 as suggested by [19]. According to [19], the scaling factor F is generally in the range [0.5, 1]. Therefore, in order to free the user from specifying a value for F, in this paper F starts with a value of 1 and linearly decreases until it reaches 0.5. The results are summarized in Table I. These results are averages and standard deviations over 20 simulation runs.
Table I shows that DE generally outperformed K-means, FCM, KHM and H2 in dmin and dmax, while performing comparably with respect to Je (for the synthetic image, DE performs significantly better than K-means, FCM, KHM and H2 with respect to Je). The DE, PSO and GA showed similar performance, with no significant difference. However, due to the advantages of DE as shown in Section 2, DE is the best choice to use. These results confirm the results of [20] and [21]. The segmented images resulting from the DE-based clustering algorithm are shown in Figure 2.
These results show that the DE-based clustering algorithm is a viable alternative that merits further investigation.

4.2 A Non-parametric Fitness Function


The fitness function defined in equation (4) provides the user with the flexibility of prioritizing the fitness term of interest by modifying the corresponding weight. However, it requires the user to find the best combination of w1, w2 and w3 for each image, which is not an easy task. Therefore, a non-parametric fitness function without weights is defined as [18]

f(xi, Zi) = (dmax(Zi, xi) + Je,i) / dmin(xi)    (7)

The advantage of equation (7) is that it works with any data set without any user intervention. Table II is a repeat of Table I, but with the results of DE using the non-parametric fitness function. In general, the DE using the non-parametric fitness function performed better than K-means, FCM, KHM and H2 in terms of dmin and dmax, while performing comparably with respect to Je. In addition, the DE using the non-parametric fitness function performed comparably with GA and PSO using the parametric fitness function (equation (4)). Hence, the non-parametric fitness function (equation (7)) can be used instead of the parametric fitness function (equation (4)), thereby eliminating the need for tuning w1, w2 and w3.
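As a minimal sketch, equation (7) only changes how the three quality measures are combined; assuming dmax, dmin and Je have been computed as in the earlier fitness sketch, the weight-free combination is simply:

```python
def non_parametric_fitness(d_max, d_min, J_e):
    """Weight-free fitness of equation (7): smaller values correspond to compact
    clusters (small d_max and J_e) that are well separated (large d_min)."""
    return (d_max + J_e) / d_min
```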
4.3 A gbest DE

The idea of cooperation found in PSO, where individuals (called particles) are attracted to the best point found by all particles, is used to modify the DE such that equation (1) is replaced by

x'i,j(t) = xg,j(t) + F (xi1,j(t) - xi2,j(t))    (8)

where xg,j(t) is the global best individual in the population. The modified DE is called gbest DE.
Figure 3 shows that gbest DE performs relatively better than DE, especially for the MRI image (both DE and gbest DE use equation (7)).
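A sketch of the corresponding change to the earlier de_generation helper is shown below: the base vector xi3 of equation (1) is replaced by the population's current best individual, as in equation (8), so only two random individuals are needed for the difference vector. The rest of the procedure is unchanged, and the details remain illustrative.

```python
import numpy as np

def gbest_de_generation(pop, fitness, Pr=0.9, F=0.7, rng=np.random.default_rng()):
    """One generation of gbest DE: the base vector is the global best individual."""
    s, Nd = pop.shape
    g = min(range(s), key=lambda idx: fitness(pop[idx]))   # index of the global best
    new_pop = pop.copy()
    for i in range(s):
        candidates = [idx for idx in range(s) if idx != i]
        i1, i2 = rng.choice(candidates, size=2, replace=False)
        r = rng.integers(Nd)
        offspring = pop[i].copy()
        for j in range(Nd):
            if rng.random() < Pr or j == r:
                offspring[j] = pop[g, j] + F * (pop[i1, j] - pop[i2, j])  # equation (8)
        if fitness(offspring) < fitness(pop[i]):
            new_pop[i] = offspring
    return new_pop
```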

Table I: Comparison between K-means, FCM, KHM, H2, GA, PSO and DE for the fitness function defined in equation (4) (mean ± standard deviation over 20 runs)

Image      Algorithm   Je                     dmax                    dmin
Synthetic  K-means     20.21225 ± 0.937836    28.04049 ± 2.7779388    78.4975 ± 7.0628718
           FCM         20.731920 ± 0.650023   28.559214 ± 2.221067    82.434116 ± 4.404686
           KHM         20.168574 ± 0.0        23.362418 ± 0.0         86.307593 ± 0.000008
           H2          20.136423 ± 0.793973   26.686939 ± 3.011022    81.834143 ± 6.022036
           GA          17.004002 ± 0.035146   24.603018 ± 0.11527     93.492196 ± 0.2567
           PSO         16.988910 ± 0.023937   24.696055 ± 0.130334    93.632200 ± 0.248234
           DE          17.019477 ± 0.036177   24.695851 ± 0.119644    93.658673 ± 0.2085
MRI        K-means     7.3703 ± 0.042809      13.214369 ± 0.761599    9.93435 ± 7.308529
           FCM         7.205987 ± 0.166418    10.851742 ± 0.960273    19.517755 ± 2.014138
           KHM         7.53071 ± 0.129073     10.655988 ± 0.295526    24.270841 ± 2.04944
           H2          7.264114 ± 0.149919    10.926594 ± 0.737545    20.543530 ± 1.871984
           GA          7.038909 ± 0.508953    9.811888 ± 0.419176     25.954191 ± 2.993480
           PSO         7.594520 ± 0.449454    10.186097 ± 1.237529    26.705917 ± 3.008073
           DE          7.558624 ± 0.483753    10.917589 ± 0.758975    23.706708 ± 3.055167
Tahoe      K-means     3.280730 ± 0.095188    5.234911 ± 0.312988     9.402616 ± 2.823284
           FCM         3.164670 ± 0.000004    4.999294 ± 0.000009     10.970607 ± 0.000015
           KHM         3.830761 ± 0.000001    6.141770 ± 0.0          13.768387 ± 0.000002
           H2          3.197610 ± 0.000003    5.058015 ± 0.000007     11.052893 ± 0.000012
           GA          3.472897 ± 0.151868    4.645980 ± 0.105467     14.446860 ± 0.857770
           PSO         3.523967 ± 0.172424    4.681492 ± 0.110739     14.664859 ± 1.177861
           DE          3.761064 ± 0.179259    4.876556 ± 0.206333     15.615877 ± 0.927105

Table II: Comparison between K-means, FCM, KHM, H2, GA, PSO and DE for the fitness function defined in equation (7) (mean ± standard deviation over 20 runs)

Image      Algorithm   Je                     dmax                    dmin
Synthetic  K-means     20.21225 ± 0.937836    28.04049 ± 2.7779388    78.4975 ± 7.0628718
           FCM         20.731920 ± 0.650023   28.559214 ± 2.221067    82.434116 ± 4.404686
           KHM         20.168574 ± 0.0        23.362418 ± 0.0         86.307593 ± 0.000008
           H2          20.136423 ± 0.793973   26.686939 ± 3.011022    81.834143 ± 6.022036
           GA          17.004002 ± 0.035146   24.603018 ± 0.11527     93.492196 ± 0.2567
           PSO         17.284 ± 0.09          22.457 ± 0.414          90.06 ± 0.712
           DE          17.349039 ± 0.024415   22.208008 ± 0.045002    89.674503 ± 0.071472
MRI        K-means     7.3703 ± 0.042809      13.214369 ± 0.761599    9.93435 ± 7.308529
           FCM         7.205987 ± 0.166418    10.851742 ± 0.960273    19.517755 ± 2.014138
           KHM         7.53071 ± 0.129073     10.655988 ± 0.295526    24.270841 ± 2.04944
           H2          7.264114 ± 0.149919    10.926594 ± 0.737545    20.543530 ± 1.871984
           GA          7.038909 ± 0.508953    9.811888 ± 0.419176     25.954191 ± 2.993480
           PSO         7.839 ± 0.238          9.197 ± 0.56            29.45 ± 1.481
           DE          8.489362 ± 0.518571    11.193335 ± 0.620451    26.561583 ± 1.339439
Tahoe      K-means     3.280730 ± 0.095188    5.234911 ± 0.312988     9.402616 ± 2.823284
           FCM         3.164670 ± 0.000004    4.999294 ± 0.000009     10.970607 ± 0.000015
           KHM         3.830761 ± 0.000001    6.141770 ± 0.0          13.768387 ± 0.000002
           H2          3.197610 ± 0.000003    5.058015 ± 0.000007     11.052893 ± 0.000012
           GA          3.472897 ± 0.151868    4.645980 ± 0.105467     14.446860 ± 0.857770
           PSO         3.882 ± 0.274          5.036 ± 0.368           16.410 ± 1.231
           DE          4.190698 ± 0.302445    5.216843 ± 0.321865     16.906206 ± 1.089620

Figure 2: Segmented images resulting from the DE clustering algorithm. (a) Synthetic image; (b) the segmented synthetic image; (c) MRI image of a human brain; (d) the segmented MRI image; (e) band 4 of the Landsat MSS test image of Lake Tahoe; (f) the segmented Lake Tahoe image.

Figure 3: Comparison between DE and gbest DE for the fitness function defined in equation (7): (a) Je, (b) dmax, (c) dmin

5 Conclusions

This paper presented a clustering approach using DE. The DE clustering algorithm has as objective to simultaneously minimize the quantization error and intra-cluster distances, and to maximize the inter-cluster distances. The DE clustering algorithm was compared against K-means, FCM, KHM, H2, GA and PSO. In general, the DE algorithm performed very well with reference to inter- and intra-cluster distances, while having quantization errors comparable to the other algorithms. A non-parametric version of the proposed fitness function was tested with encouraging results. In addition, a gbest DE was proposed.

Future research will investigate the effect of Pr, F, s and tmax. In addition, the performance of gbest DE needs more investigation. Finally, using DE with other PSO topologies (e.g. von Neumann and ring) will be investigated.

Bibliography

[1] J. Puzicha, T. Hofmann and J. M. Buhmann, Histogram Clustering for Unsupervised Image Segmentation, IEEE Proceedings of the Computer Vision and Pattern Recognition 2, 602-608 (2000).
[2] T. Lillesand and R. Kiefer, Remote Sensing and Image Interpretation, John Wiley & Sons Publishing, N.Y., 1994.
[3] E. Davies, Machine Vision: Theory, Algorithms, Practicalities, Academic Press, 2nd Edition, 1997.
[4] H. Frigui and R. Krishnapuram, A Robust Competitive Clustering Algorithm with Applications in Computer Vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 [5], 450-465 (1999).
[5] Y. Leung, J. Zhang and Z. Xu, Clustering by Scale-Space Filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 [12], 1396-1410 (2000).
[6] E. Forgy, Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification, Biometrics 21, 768-769 (1965).
[7] G. Ball and D. Hall, A Clustering Technique for Summarizing Multivariate Data, Behavioral Science 12, 153-155 (1967).
[8] J. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability I, 281-297 (1967).
[9] P. Scheunders, A Genetic C-means Clustering Algorithm Applied to Image Quantization, Pattern Recognition 30 [6], 859-866 (1997).
[10] R. Storn and K. Price, Differential Evolution - a Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces, Technical Report TR-95-012, ICSI, 1995.
[11] S. Paterlini and T. Krink, High Performance Clustering with Differential Evolution, Congress on Evolutionary Computation (CEC 2004) 2, 2004-2011 (2004).
[12] D. Karaboga and S. Okdem, A Simple and Global Optimization Algorithm for Engineering Problems: Differential Evolution Algorithm, Turkish Journal of Electrical Engineering 12 [1], 53-60 (2004).
[13] C. A. Coello Coello, An Empirical Study of Evolutionary Techniques for Multiobjective Optimization in Engineering Design, PhD Thesis, Tulane University, 1996.
[14] J. Bezdek, A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence 2, 1-8 (1980).
[15] B. Zhang, Generalized K-Harmonic Means - Boosting in Unsupervised Learning, Technical Report HPL-2000-137, Hewlett-Packard Labs, 2000.
[16] G. Hamerly and C. Elkan, Alternatives to the K-means Algorithm that Find Better Clusterings, Proceedings of the ACM Conference on Information and Knowledge Management (CIKM-2002), 600-607 (2002).
[17] M. Omran, A. Engelbrecht and A. Salman, Particle Swarm Optimization Method for Image Clustering, International Journal of Pattern Recognition and Artificial Intelligence 19 [3], 297-322 (2005).
[18] M. Omran, Particle Swarm Optimization Methods for Pattern Recognition and Image Processing, PhD Thesis, University of Pretoria, 2005.
[19] Differential Evolution homepage, http://www.icsi.berkeley.edu/~storn/code.html#basi [access date April 6, 2005].
[20] J. Vesterstrøm and R. Thomsen, A Comparative Study of Differential Evolution, Particle Swarm Optimization, and Evolutionary Algorithms on Numerical Benchmark Problems, Proceedings of the 2004 Congress on Evolutionary Computation 2, 1980-1987 (2004).
[21] S. Paterlini and T. Krink, High Performance Clustering with Differential Evolution, Proceedings of the Sixth Congress on Evolutionary Computation (CEC-2004), IEEE Press, Piscataway NJ, 2, 2004-2011 (2004).
