



7.1 Introduction to Fuzzy clustering in image segmentation

In the hard clustering process, each data sample is assigned to exactly one cluster, and all clusters are regarded as disjoint subsets of the data set. In real life, however, there are many cases in which the clusters are not completely disjoint and a data sample cannot be classified as belonging to a single cluster. A crisp classification process cannot cater for such a situation. The separation of the clusters therefore becomes a fuzzy notion, and real data structures can be represented more accurately by fuzzy clustering methods, which describe the data structure in terms of fuzzy clusters. The fuzzy c-means (FCM) algorithm is the best known and most widely used technique. Its objective function, already described in Chapter 2, is well suited to image processing. The uncertainty and vagueness inherent in digital images can be handled well by the FCM algorithm. The use of membership values

provides more flexibility and makes the clustering results more useful in practical

applications. To take the image geometric properties into account, we can make

use of some spatial features. To do this, however, we need to solve a problem of

classification in a multi-dimensional feature space and the simple threshold method

can no longer be used. Now let us consider the FCM clustering method to segment
the image. At each pixel, we have two membership values, one representing the

degree of certainty of a pixel belonging to background and the other representing

the degree of certainty of a pixel belonging to foreground. Since the sum of two

membership values must be equal to one, they are not independent of each other.

In general, for any c-class problem, only c − 1 membership values for each input sample are independent.
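
The dependence between the membership values can be illustrated with a minimal Python sketch. It assumes the standard FCM membership update, in which memberships fall off with the distance to each cluster centre and are controlled by a fuzzifier m. The function name `fcm_memberships` and the example distances are illustrative and are not taken from the source.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM membership update for one sample (assumed form).

    distances: distances from the sample to each of the c cluster centres.
    m: fuzzifier (> 1). Returns c memberships that sum to 1.
    """
    # A sample lying exactly on a centre belongs fully to that cluster.
    zeros = [d == 0 for d in distances]
    if any(zeros):
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


# Two-class case: hypothetical distances to the background and foreground centres.
u_background, u_foreground = fcm_memberships([1.0, 3.0])
```

Because the two values sum to one, knowing the background membership fixes the foreground membership, which is the c − 1 independence noted above.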

7.2 Pap smear image segmentation using FCM algorithm

The literature offers many methods for Pap smear image segmentation and classification, ranging from conventional digital image processing techniques to hybrid intelligent methods, each chosen according to the requirements of the task. The first attempt to detect and segment cells in cervical microscopic images was based on image thresholding techniques [169]. Pixel classification [170] and morphological watersheds [104], [171] were among the other methods. The boundaries of the structural elements of the cells can be detected by

applying methods based on active contours [104], template fitting [172], genetic

algorithms [173], region growing with moving K-means [174], and edge detectors

[175]. The method proposed by Lezoray and Cardot [171] is based on pixel-classification techniques for the detection of the nuclei markers, in order to avoid the over-segmentation that the watershed algorithm may produce. In pixel-classification techniques, the choice of the number of classes to which the pixels belong plays a crucial role in the final segmentation result. Pap smear images exhibit

great complexity and the number of pixel classes is not obvious. The rough

assumption that all the pixels of the image are distributed into two classes, such as

nuclei pixels and other pixels, would produce noisy results.

Plissiti et al. [176] proposed a method for automated cell nuclei detection in Pap smear images using morphological reconstruction and two types of clustering techniques. Unsupervised FCM and a supervised Support Vector Machine (SVM) classifier are used in the study. The preprocessing of the images involves adaptive histogram equalization together with Otsu's method [47]. The result is refined with a 3×3 flat structuring element. Gray-scale morphological reconstruction and the h-minima transform are used to determine the centroids of the candidate nuclei. Reconstruction of the original image from a marker is obtained by subtracting a threshold from every pixel in the complement of the initial image with dimension DI, as follows [176]

Geodesic dilation is then performed n times, n ≥ 0, on the images. This process creates some regional minima. FCM and SVM then process these regional minima to identify the true nuclei and the points belonging to other regional


The process is fully automated. Morphological reconstruction and geodesic dilation yield good results, though at a high computational cost. The supervised SVM requires a training data set, or more precisely some a priori knowledge of the cell nuclei. An incomplete or unrealistic training set will greatly reduce the effectiveness of the SVM. A fixed structuring element of size 3×3 may cause problems for images of higher dimension. FCM suffers from the local minima problem.

Kim et al. [177] proposed a method for segmentation and classification of Pap smear images using the Hue-Saturation-Intensity (HSI) model and the FCM algorithm. The extracted characteristics of the cell nuclei are morphometric, densitometric, colorimetric, and textural features. The acquired images are preprocessed in three stages: conversion to a gray-scale image, removal of noise using brightness information, and application of fuzzy morphological operations. To process the converted images, they proposed a noise-removing function based on brightness information, given by the following equation [177]

and are the brightness values of the original image and z is . is the

sum of the lowest brightness value and 30% of the highest brightness value, and

is 80% of the highest brightness value. and are 0 and 255 respectively.

Next, fuzzy morphological operations are applied, which help to extract the region of interest (ROI) properly. A 5×5 structuring element, B, is used to refine the cell nuclei region. The fuzzy morphological operations are defined as [177]

[Equations: definitions of the fuzzy morphological operations, as given in [177].]

Final segmentation is achieved by applying iterative histogram thresholding. Features are then extracted from the segmented images; they include shape descriptors such as the area of the nuclei, and brightness information based mostly on the histogram. According to Kim et al. [177], texture is one of the most important features. The extracted features are processed with the FCM algorithm.

Only 20 Pap smear images were analyzed with the method, which is too few for a reliable evaluation given the wide variation among such images. Moreover, the conventional FCM algorithm with the Euclidean distance measure detects only spherical clusters, which leads to the omission of valid cluster elements.

Muhimmah et al. [178] proposed a method for detecting epithelial cells in Pap smear images using a combined framework of a distance metric and the FCM clustering algorithm. The Pap smear images are preprocessed by applying adaptive histogram equalization and global thresholding. Three binary images are created from each Pap smear image using Otsu's method [47]. A logical OR operation is used to obtain a binary mask from the binary images. Connected components with fewer than 500 pixels are removed. Morphological reconstruction is applied to detect the candidate nuclei. The extra nuclei markers are removed using an edge detection method. The Euclidean distance metric is used to refine the markers. The centroids of the nuclei are refined by applying the following rule [178]



[Rule: iterative refinement of the nuclei centroids, as given in [178].]

The rule refers to the area carrying a given label, the labeled area, and the gray level

image of the original image. Finally, FCM clustering algorithm is used to segment

the cell nuclei with the following conditions:

(i) Two clusters are defined, one for cytoplasm and one for nuclei.

(ii) Similarity is defined on the intensity value.

(iii) The positive class is the cluster whose average intensity is lower than that of the other.
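
The three conditions above can be sketched in Python as follows. This is a minimal illustration, not the implementation of [178]: it assumes the standard FCM update equations, a fuzzifier of 2, and a deterministic start with the centres at the darkest and brightest intensities. All function names and the sample intensities are hypothetical.

```python
def memberships(x, centres, m):
    """Standard FCM memberships of scalar x to each centre (they sum to 1)."""
    dists = [abs(x - v) for v in centres]
    if 0.0 in dists:  # x coincides with a centre: full membership there
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    weights = [d ** -power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def fcm_two_clusters(values, m=2.0, iterations=50):
    """Condition (i): two-cluster FCM on intensities; returns the two centres."""
    # Deterministic start at the extreme intensities (an assumption of this sketch).
    centres = [float(min(values)), float(max(values))]
    for _ in range(iterations):
        u = [memberships(x, centres, m) for x in values]
        # Centre update: mean of the intensities weighted by u^m.
        centres = [
            sum(row[i] ** m * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(2)
        ]
    return centres


def nuclei_mask(values, m=2.0):
    """True where a pixel belongs mainly to the darker (positive) cluster."""
    centres = fcm_two_clusters(values, m)
    positive = 0 if centres[0] <= centres[1] else 1  # condition (iii)
    return [memberships(x, centres, m)[positive] > 0.5 for x in values]


# Hypothetical gray levels: three dark (nucleus-like) and three bright pixels.
intensities = [20, 25, 30, 200, 210, 220]
mask = nuclei_mask(intensities)
```

Here similarity is measured on intensity alone, as in condition (ii), which is also why the approach is sensitive to any overlap between nucleus and cytoplasm intensities.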

The above-mentioned method has several shortcomings. The candidate nuclei detection is a very lengthy process. Euclidean distance is used to evaluate the intensity values, and FCM is then used again to refine the markers of the candidate nuclei. The use of the FCM algorithm with a fixed number of clusters reduces its efficiency and flexibility. An obvious drawback of the method is that only one criterion, intensity, is used as input to the FCM algorithm, which is too simplistic. The complexity of Pap smear images is not addressed to the level required of a robust automated method.

Ghafar et al. [179] devised a method for Pap smear image segmentation using stretching and clustering techniques. The proposed technique segments the Pap smear images into cell nuclei, cytoplasm and background. In the stretching step, the gray-level distribution of the Pap smear images is stretched, which is a simple way of enhancing the contrast of the images. The images are then segmented using two clustering techniques, FCM and the non-adaptive k-means algorithm. Ghafar et al. [179] compared the results of the two segmentation methods and found FCM to be superior to the k-means algorithm.
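
The stretching step can be illustrated with a short Python sketch. The source does not specify the stretching function used in [179], so a plain linear min-max stretch to the range 0 to 255 is assumed here; the function name `stretch` and the sample values are hypothetical.

```python
def stretch(values, lo=0, hi=255):
    """Linear contrast stretch of gray levels to [lo, hi] (assumed form)."""
    vmin, vmax = min(values), max(values)
    if vmin == vmax:  # flat image: nothing to stretch
        return list(values)
    scale = hi - lo
    return [lo + (v - vmin) * scale / (vmax - vmin) for v in values]


# Hypothetical low-contrast gray levels occupying only part of the range.
stretched = stretch([50, 100, 150])
```

After stretching, the gray levels span the full range, which tends to separate the intensity modes of nuclei, cytoplasm and background before clustering.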

The method is straightforward and makes use of conventional clustering algorithms. Both the non-adaptive k-means and FCM algorithms can detect only spherical clusters, which is a drawback of this method. As a whole, the method suffers from the typical limitations of the two clustering algorithms.

Cebron et al. [180] proposed a method for classifying cell assay images from the Pap smear test using the FCM clustering algorithm and learning vector quantization. The method involves unsupervised classification of the cell assay images supplemented by user-defined parameters. The cell assay images contain a large number of images of cervical cells. The images are segmented by a trained neural network followed by the region growing method suggested by Jones et al. [181]. Fourteen texture features are extracted based on the work of Haralick et al. [182], representing statistics of the co-occurrence matrix of the gray-level image. Four co-occurrence matrices from the horizontal, vertical, diagonal, and anti-diagonal

directions are averaged to achieve rotation invariance. These features provide

information about the smoothness, contrast, or randomness of the image or more

general statistics about the relative positions of the gray levels within the image.
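
The averaging of the four directional co-occurrence matrices can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of [180]: a pixel distance of 1, symmetric counting of pixel pairs, and an image of at least 2×2 pixels are assumed, and only one texture statistic (contrast) is shown. The function names and the tiny test images are hypothetical.

```python
def glcm(image, dx, dy, levels):
    """Symmetric, normalised co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count each pair in both orders
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def averaged_glcm(image, levels):
    """Average of the horizontal, vertical, diagonal and anti-diagonal matrices."""
    offsets = [(1, 0), (0, 1), (1, 1), (1, -1)]
    mats = [glcm(image, dx, dy, levels) for dx, dy in offsets]
    return [
        [sum(mat[i][j] for mat in mats) / len(mats) for j in range(levels)]
        for i in range(levels)
    ]


def contrast(p):
    """Contrast statistic: large when neighbouring gray levels differ strongly."""
    n = len(p)
    return sum((i - j) ** 2 * p[i][j] for i in range(n) for j in range(n))


flat = averaged_glcm([[1, 1], [1, 1]], levels=2)        # uniform patch
checker = averaged_glcm([[0, 1], [1, 0]], levels=2)     # alternating patch
```

A uniform patch gives zero contrast, while the alternating patch gives a positive value, which matches the role of these features as measures of smoothness and contrast.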

Cebron et al. [180] modified the FCM algorithm to cancel the effect of noise. They define the feature vectors and the objective function as follows. Let $T = \{x_i,\ i = 1, \ldots, |T|\}$ be a set of feature vectors for the data items that are to be clustered, and $W = \{w_k,\ k = 1, \ldots, c\}$ a set of c clusters. V is the matrix with coefficients $v_{i,k}$, where $v_{i,k}$ denotes the membership of $x_i$ to cluster k. Given a distance function d, the FCM algorithm with noise detection iteratively minimizes the following objective function with respect to v and w [180]

\[
J_m = \sum_{i=1}^{|T|} \sum_{k=1}^{c} v_{i,k}^{m}\, d(w_k, x_i)^2 + \delta^2 \sum_{i=1}^{|T|} \Bigl(1 - \sum_{k=1}^{c} v_{i,k}\Bigr)^{m}
\]

Here m is the fuzzification parameter, which indicates how much the clusters are allowed to overlap each other. The first term on the right-hand side of the above equation corresponds to the normal FCM objective function, whereas the second term arises from the noise cluster. $\delta$ is the distance from every data point to the noise cluster. This distance can either be fixed or be updated in each iteration according to the average inter-point distances. Objects that are not close to any of the cluster centers $w_k$ are therefore detected as having a high membership value to the noise cluster.
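
The effect of the noise cluster on the memberships can be sketched in Python. The source gives only the objective function, so the membership update below is an assumption: it is the usual update obtained for this kind of objective, in which the noise cluster behaves like an extra cluster at a constant distance delta from every point. The function name `noise_memberships` and the example distances are hypothetical.

```python
def noise_memberships(distances, delta, m=2.0):
    """Memberships to the regular clusters and to the noise cluster.

    distances: positive distances from one data point to each cluster centre.
    delta: constant distance from every data point to the noise cluster.
    Returns (regular_memberships, noise_membership); together they sum to 1.
    """
    power = 2.0 / (m - 1.0)
    weights = [d ** -power for d in distances]  # assumes all distances > 0
    noise_weight = delta ** -power
    total = sum(weights) + noise_weight
    return [w / total for w in weights], noise_weight / total


# A point far from both centres is absorbed mostly by the noise cluster.
far_regular, far_noise = noise_memberships([10.0, 10.0], delta=1.0)
# A point close to the first centre keeps a high regular membership.
near_regular, near_noise = noise_memberships([0.1, 10.0], delta=1.0)
```

Choosing delta controls how aggressively points are treated as noise: a small delta sends more points to the noise cluster, a large delta approaches ordinary FCM.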

An adaptive active classification based FCM clustering algorithm followed

the unsupervised FCM clustering. A user can adjust the parameters of the FCM

clustering as well as can label the samples and add them to a labeled prototype


The method is effective because FCM is sensitive to noise, and the noise removal mechanism of the proposed method enables FCM to avoid generating unwanted cluster centers. The interactive, user-dependent FCM clustering can give very good results provided the user is sufficiently expert and has good knowledge of the cell assay images. The unsupervised FCM clustering produces a noise cluster, which has both an advantage and a disadvantage: it enables efficient noise removal, but the noise points could instead have been discarded without assembling them into a cluster, which would reduce computational time. Euclidean distance in FCM has some limitations, as seen in the previously discussed methods. The user-defined FCM parameters and labeling can fail completely if the user lacks experience and knowledge.

7.3 Proposed method of Analysis of Pap smear images using FCM algorithm

7.3.1 Segmentation of Pap smear images using FCM clustering algorithm

Colour Pap smear images in the RGB colour space are considered for analysis. An image is a collection of pixels, each having a particular value for the three colour channels red (R), green (G) and blue (B). Hence, to find the different areas inside the image, let us consider the image as a set of data points having pixel values R, G and B. This dataset (x1, x2, ..., xn) is classified using the fuzzy c-means (FCM) clustering algorithm to distinguish the different regions inside the image, namely nucleus and cytoplasm. To initiate the clustering process, random numbers are generated corresponding to each of the R, G and B values. Two types of random number generator are used: a general random number generator and a random number generator based on chaos theory. These random numbers constitute the membership values (µ) of the pixels. The µ values of the pixels are compared with those of the cluster centers and the pixels are classified accordingly. The clustering process segments the images into three classes: cytoplasm, nucleus and the background.
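
The procedure described in this section can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: it assumes the standard FCM update equations, Euclidean distance in RGB space, and Python's general-purpose pseudo-random generator for the initial memberships (the chaos-based generator is not reproduced). The function name `fcm_rgb`, its parameters and the sample pixels are hypothetical.

```python
import random


def fcm_rgb(pixels, c=3, m=2.0, iterations=30, seed=0):
    """FCM on (R, G, B) tuples; returns (centres, memberships, labels)."""
    rng = random.Random(seed)
    # Random initial memberships, normalised so each pixel's row sums to 1.
    u = []
    for _ in pixels:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    power = 2.0 / (m - 1.0)
    centres = []
    for _ in range(iterations):
        # Centre update: mean colour weighted by membership^m.
        centres = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            total = sum(w)
            centres.append(tuple(
                sum(wk * p[ch] for wk, p in zip(w, pixels)) / total
                for ch in range(3)
            ))
        # Membership update from Euclidean distances to the centres.
        for k, p in enumerate(pixels):
            dists = [
                max(sum((p[ch] - v[ch]) ** 2 for ch in range(3)) ** 0.5, 1e-12)
                for v in centres
            ]
            inv = [d ** -power for d in dists]
            total = sum(inv)
            u[k] = [x / total for x in inv]
    # Each pixel is labelled with the cluster of highest membership.
    labels = [max(range(c), key=row.__getitem__) for row in u]
    return centres, u, labels


# Hypothetical pixels: three dark bluish (nucleus-like), three pale pink.
sample_pixels = [(40, 30, 90), (45, 35, 95), (42, 28, 88),
                 (220, 170, 190), (225, 175, 195), (218, 168, 188)]
sample_centres, sample_u, sample_labels = fcm_rgb(sample_pixels, c=2)
```

The sample run uses two clusters only to keep the example small; for the three classes described above, c would be set to 3.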


7.3.2 Cluster Validity for region segmentation of cervical cell

In practical applications, we need a cluster validity method to measure the quality of the clustering result. The efficiency of a clustering algorithm depends on various factors, such as the initialization process and the choice of the number of clusters, c. The method of initialization requires a good estimate of the clusters and is application dependent, so the cluster validity problem is reduced to the choice of an optimal number of classes c. Several cluster validity measures have been developed in the past. In this section, three of these measures are described: the partition coefficient, the partition entropy, and the compactness and separation validity function. The partition coefficient (PC) is defined as [34]

\[
F(U, c) = \frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{2}
\]

Suppose that $\Omega_c$ represents the clustering result; then the optimal choice of c is given by

\[
\max_{c} \Bigl\{ \max_{\Omega_c} F(U, c) \Bigr\}, \quad c = 2, \ldots, n - 1
\]

The partition coefficient measures the closeness of all input samples to their

corresponding cluster centers. If each sample is closely associated with only one

cluster, that is, if for each k, $\mu_{ik}$ is large for only one value of i, then the uncertainty of the data is small, which corresponds to a large F(U, c) value. The partition

entropy (PE) is defined as [34]

\[
H(U, c) = -\frac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik} \log(\mu_{ik})
\]

The optimal choice of c is given by

\[
\min_{c} \Bigl\{ \min_{\Omega_c} H(U, c) \Bigr\}, \quad c = 2, \ldots, n - 1
\]

When all $\mu_{ik}$'s have values close to 0.5, which represents a high degree of fuzziness of the clusters, H(U, c) is large and thus indicates a poor clustering result. On the other hand, if all $\mu_{ik}$'s have values close to 0 or 1, H(U, c) is small and indicates a good clustering result. The compactness and separation (SC)

validity function is defined as [35], [183]

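\[
S(U, c) = \frac{\dfrac{1}{n} \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{2} \, \lVert x_k - v_i \rVert^{2}}{\min_{i \neq j} \lVert v_i - v_j \rVert^{2}}
\]

The optimal choice of c is given by

\[
\min_{c} \Bigl\{ \min_{\Omega_c} S(U, c) \Bigr\}, \quad c = 2, \ldots, n - 1
\]

S(U, c) is the ratio between the average distance of input samples to their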

corresponding cluster centers and the minimum distance between cluster centers. A

good cluster procedure should make all input samples as close to their cluster

centers as possible and all cluster centers separated as far as possible. The

compactness and separation validity function seems to work better for image

segmentation problems.
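
The three validity measures can be computed with a short Python sketch. It follows the definitions above, with the membership matrix indexed as u[i][k] for cluster i and sample k. The natural logarithm, squared Euclidean distances in S(U, c), and the convention that zero memberships contribute nothing to the entropy are assumptions of this sketch; the function names and sample data are hypothetical.

```python
import math


def partition_coefficient(u):
    """F(U, c): mean over samples of the sum of squared memberships."""
    n = len(u[0])
    return sum(v * v for row in u for v in row) / n


def partition_entropy(u):
    """H(U, c): zero memberships are skipped (0 * log 0 taken as 0)."""
    n = len(u[0])
    return -sum(v * math.log(v) for row in u for v in row if v > 0) / n


def compactness_separation(u, samples, centres):
    """S(U, c): compactness divided by the smallest centre separation."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    n, c = len(samples), len(centres)
    compact = sum(
        u[i][k] ** 2 * sq_dist(samples[k], centres[i])
        for i in range(c) for k in range(n)
    ) / n
    separation = min(
        sq_dist(centres[i], centres[j])
        for i in range(c) for j in range(i + 1, c)
    )
    return compact / separation


# Hypothetical one-dimensional data with two tight, well separated groups.
samples = [(0.0,), (2.0,), (10.0,), (12.0,)]
centres = [(1.0,), (11.0,)]
crisp = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
fuzzy = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
```

For the crisp partition the sketch gives the best possible PC and PE, while the maximally fuzzy partition gives the worst values for two clusters, in line with the interpretation given above.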

7.3.3 Shape analysis of the cervical cells

Shape analysis characterizes the shape of an object. A simple direction code, or chain code, can be used to trace the contour of an object. The chain code is chosen among 8 selected points on the boundary of the object, as shown in Figure 7.1. The angle between the lines connecting any two consecutive pairs of points is 45°. Any two opposite points can be taken as starting points of the boundary.

Suppose point 1 and point 5 are taken as the starting point of the boundary and let

the respective boundaries be a and s. Mathematically it can be written as s

a , where is a real number.

Figure 7.1: Direction code with 8 points
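
An 8-direction chain code can be sketched in Python as follows. The numbering of the directions in Figure 7.1 is not recoverable here, so the sketch assumes the common convention of codes 0 to 7 in 45° steps, counter-clockwise from east, with the y axis pointing up. The function name `chain_code` and the example boundary are hypothetical.

```python
# Direction code -> (dx, dy), in 45 degree steps counter-clockwise from east.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code(boundary):
    """Encode a closed, 8-connected boundary as a list of direction codes.

    boundary: list of (x, y) points in tracing order; the last point is
    implicitly connected back to the first.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:] + boundary[:1]):
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes


# Hypothetical boundary: a unit square traced counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
code_from_first = chain_code(square)
# Starting from the opposite corner yields a cyclic shift of the same code.
code_from_opposite = chain_code(square[2:] + square[:2])
```

The cyclic-shift relation between the two codes shows that the choice of starting point changes only where the code begins, not the shape it describes.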

7.4 Results

The Pap smear images segmented with the FCM clustering algorithm are shown in Figure 7.2.

The chain-coded direction is used to trace the boundary of the cell nucleus. The tracing helps to distinguish the cell nucleus, that is, to visualize the region of interest for health personnel. Cell nuclei are traced with a white dotted line, as shown in Figure 7.3 (b) and (d).

The seven classes of Pap smear cell images segmented with the histogram and morphology based segmentation are segmented again with the FCM clustering algorithm. Table 7.1 shows the seven classes of Pap smear cell images with their corresponding segmented and traced images.


Figure 7.2: FCM segmentation: (a) and (c) Original Pap smear image; (b) and (d)

Segmented image.
Figure 7.3: Chain coded tracing: (a) and (c) Original image, (b) and (d) traced cell

nuclei with white dots.

Table 7.1: Seven classes of cervical cells: segmentation and cell nuclei tracing.

Cell class    Original image    Segmented image    Traced cell nucleus boundary

The clustering process is validated by three cluster validity measures. A validity measure gives an idea of the goodness of the clusters. In image segmentation problems, validating the segmentation result is essential for evaluating the correctness of the segmentation. The aim of segmentation is to isolate the areas of interest to facilitate the subsequent image processing tasks. The FCM segmentation is carried out both on images with clusters of cells and on images with a single cell. The cluster validity measure values are given below.

The cluster validity measures for the segmented images in Figure 7.2 (b) and (d) are given in Table 7.2. The value of the fuzzifier, m, is 1.2.

Table 7.2: Cluster validity measure values

Image         Number of clusters (c)   Partition coefficient (PC)   Partition entropy (PE)   Compactness and separation index (SC)
Figure 7.2    2                        0.59                         0.62                     1.28
Figure 7.2    2                        0.63                         0.56                     1.16

The cluster validity measure values are also calculated for the seven classes of cervical cell images. The number of clusters for these images is determined automatically by the FCM algorithm, based on the random numbers generated for FCM initialization as described in Section 7.3.1. The value of the fuzzifier, m, is kept the same as above, that is, 1.2. Table 7.3 shows the values of the three cluster validity measures for the segmented Pap smear images.

Table 7.3: Cluster validity measure values for seven classes of cervical cell image

Cell class   Segmented image   Number of clusters   Partition coefficient (PC)   Partition entropy (PE)   Compactness and separation index (SC)
                               5                    0.29                         0.68                     0.96
                               4                    0.33                         0.69                     0.88
                               3                    0.43                         0.71                     0.83
                               4                    0.48                         0.72                     0.79
                               4                    0.31                         0.75                     1.05
                               3                    0.49                         0.83                     1.12
                               4                    0.51                         0.89                     1.44

7.5 Discussion and Conclusion

The FCM clustering algorithm has several advantages over hard clustering and conventional non-fuzzy image segmentation methods. This can be seen from the results in Figure 7.2, Figure 7.3, Table 7.1, Table 7.2 and Table 7.3. The FCM segmentation method preserves the colour of the images, which is an essential requirement for medical image segmentation. The previous method, proposed in Chapter 6, works on a converted gray-level image, and the conversion may lead to some data loss. The FCM based method, in contrast, works on the colour image, so the chance of data loss is negligible.

RGB colour space has three components, R, G, and B. Thus, a pixel has

three values associated with it. Two types of random number generators are used to

form the membership matrix for each pixel. The use of the right random number generator is very important for the initialization of the FCM algorithm.

Cluster validation is an integral part of any clustering process. It gives an idea of the goodness of the clusters in terms of the dataset and the problem domain. In this study, three validity measures are used: partition coefficient (PC), partition entropy (PE), and compactness and separation function (SC). PC and PE are particularly useful in numerical data clustering. The compactness and separation validity function (SC) is more useful in practical applications such as image segmentation.

The physical interpretation of SC closely resembles the visual perception of digital images.


The shape analysis gives the contour of the cell nuclei, which is a critical criterion for cervical cancer diagnosis. The chain code is simple to apply, yet capable of producing reliable results. It exploits the pixel connectivity properties of digital images very efficiently.

The method produces reasonable results, but it suffers from some common drawbacks of FCM clustering and from the intricacies of high-dimensional image data. The obvious drawback of the FCM algorithm is that it can detect only spherical clusters, not clusters of arbitrary shape. In colour image segmentation, where very subtle and uncertain changes in pixel properties occur within a large data space, FCM fails to detect all the valid clusters. A closer inspection of the segmentation results presented in Figure 7.2 and Table 7.1 invites some reflection. In Figure 7.2 (b) and (d), it can be seen that not all the cell nuclei are segmented properly. In Table 7.1, some images are over-segmented and some are under-segmented. The high dimension of the image data is responsible for the local minima problem faced by the FCM clustering algorithm.

The tracing of the cell nuclei works well for images of low complexity, such as those with minimal colour variation, but fails to trace the region of interest (ROI) where the boundary is vague. In some of the Pap smear images, parts of the cytoplasm are traced.

Future enhancement tasks include enabling the FCM algorithm to detect arbitrarily shaped clusters, minimizing the effect of the high dimensionality of Pap smear images, and generalizing the shape analysis with some concrete and invariant shape measures. The next chapter proposes and describes in detail a method that attempts to address these problems.