
I. INTRODUCTION

Image segmentation is a fundamental task in computer vision with applications in many fields, including medical image analysis, object recognition, pedestrian detection, and automated surveillance. The goal is to decompose an image into meaningful regions. While the precise meaning of "meaningful" varies with the specific application, in general the objective is to produce regions that correspond to full, coherent objects. To this end, existing segmentation algorithms group pixels that have similar image characteristics (e.g., color, intensity, and/or texture), with the assumption that image regions corresponding to objects have similar features throughout. Supervised methods incorporate high-level object cues learned from a training set, while unsupervised methods assume no prior knowledge of what defines an object. For unsupervised methods, the number of segments (i.e., model selection) is often a parameter that the user must specify. That is, the user must specify how many objects (or object parts) exist in the image to produce the optimal segmentation. This is often an unrealistic and demanding assumption.

In this paper, I will use the Kolmogorov-Smirnov (K-S) test to perform unsupervised image segmentation. The proposed method does not require the user to specify the number of segments for each image. The K-S test is a non-parametric statistical test which can be used to determine whether a data sample is drawn from some underlying reference probability distribution (one-sample test) or whether two data samples are drawn from the same probability distribution (two-sample test). I will use the two-sample test in conjunction with agglomerative clustering. Specifically, after forming a complete binary merger tree based on feature similarities between regions, the test will be used to prune erroneous merges. The test will determine whether two regions belong to the same probability distribution and hence should remain merged.

In my experiments, I will perform image segmentation on the Microsoft Research Cambridge dataset and compare against a well-known unsupervised image segmentation technique called Normalized Cuts.

II. RELATED WORK

The K-S test has been applied previously to various computer vision applications, such as image comparison, category discovery, and image segmentation. In [4], the author uses the K-S test to test the hypothesis that two images have the same gray-scale intensity distributions. In the work on category discovery, the authors use the K-S test to distinguish images containing objects of different categories. They use agglomerative clustering to merge similar images to create a binary merger tree and use the K-S test to prune erroneous mergers. My method is most influenced by that approach; however, the proposed method will be used for image segmentation rather than for category clustering. The K-S test has also been used previously for image segmentation in [6].

The authors produce 1-dimensional histograms for each subspace of the data (e.g., each of the R, G, and B color dimensions) and directly cluster the histograms by locating local maxima and minima. They use the K-S test to simplify the clustering by identifying the simplest density function that fits the data, such that each pixel can be assigned to its nearest cluster center in the modified distribution. In contrast, the proposed method will not consider each 1-dimensional subspace of the data separately (since the optimal choice is image-dependent and cannot be intuitively determined). Instead, the proposed approach will map the multi-dimensional data to a 1-dimensional space. More importantly, I will use the K-S test to merge segments rather than to approximate a fit.

III. APPROACH

A. The Kolmogorov-Smirnov Test

In the two-sample test, the null hypothesis H0 is that the two samples are drawn from the same distribution and the alternative hypothesis Ha is that they are drawn from different distributions. Using standard notation, let f_{n_1}(x) and f_{n_2}(x) be two histograms (samples) of size n_1 and n_2 drawn from two continuous probability density functions, f_1(x) and f_2(x), respectively. The null hypothesis, H0, and alternative hypothesis, Ha, are:

    H_0: f_1(x) = f_2(x) \;\; \forall x, \qquad H_a: f_1(x) \neq f_2(x) \text{ for some } x.

We can compute the empirical cumulative distribution functions (ECDFs), F_{n_1}(x) and F_{n_2}(x), as:

    F_{n_1}(x) = \frac{1}{n_1} \sum_{i=1}^{n_1} \mathbb{1}[x_i \leq x], \qquad F_{n_2}(x) = \frac{1}{n_2} \sum_{j=1}^{n_2} \mathbb{1}[x_j \leq x],

where \mathbb{1}[\cdot] is the indicator function. The K-S test statistic, D, is defined as the maximum absolute distance between the two ECDFs:

    D = \sup_x \, | F_{n_1}(x) - F_{n_2}(x) |.

Kolmogorov and Smirnov showed that the two-sided p-value can be approximated as:

    p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 \lambda^2}, \qquad \lambda = D \sqrt{n_1 n_2 / (n_1 + n_2)}.

The null hypothesis is rejected if the test is significant at level α. The test is non-parametric in that no assumption is made concerning the distribution of the variables or the form of the two underlying density functions.
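
As a concrete illustration of the two-sample test described above (my own sketch, not part of the original paper), the following Python snippet computes the statistic D and the asymptotic two-sided p-value and checks the result against SciPy's ks_2samp; the sample data and the truncation of the infinite series are illustrative choices.

```python
import numpy as np
from scipy.stats import ks_2samp  # SciPy's reference two-sample K-S test


def ks_statistic(sample1, sample2):
    """D = sup_x |F_n1(x) - F_n2(x)|, the largest gap between the two ECDFs."""
    s1, s2 = np.sort(sample1), np.sort(sample2)
    grid = np.concatenate([s1, s2])                    # ECDFs only change at sample points
    ecdf1 = np.searchsorted(s1, grid, side="right") / len(s1)
    ecdf2 = np.searchsorted(s2, grid, side="right") / len(s2)
    return np.max(np.abs(ecdf1 - ecdf2))


def ks_p_value(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value: 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2)."""
    lam = d * np.sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.05:                                     # series converges slowly here; p is essentially 1
        return 1.0
    k = np.arange(1, terms + 1)
    p = 2.0 * np.sum((-1.0) ** (k - 1) * np.exp(-2.0 * k ** 2 * lam ** 2))
    return float(np.clip(p, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = rng.normal(0.0, 1.0, 500), rng.normal(0.3, 1.0, 600)
    d = ks_statistic(a, b)
    print(d, ks_p_value(d, len(a), len(b)))
    print(ks_2samp(a, b))                              # should agree closely
```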

B. Segmentation Algorithm

The proposed algorithm is summarized in Figure 1. First, I oversegment an image into small homogeneous regions called "superpixels." Superpixels are small homogeneous groups of pixels that preserve object boundaries. They are much more efficient to work with than pixels, since a typical image is comprised of hundreds of superpixels compared to thousands of pixels. The specific number of superpixels is a user-selected parameter of the algorithm.

For each superpixel, color and texture features are computed using the Lab color space pixel values and image filter responses, respectively. Hence, each superpixel can be represented by multi-dimensional color features or multi-dimensional texture features.

To compactly represent each region (i.e., a superpixel or a set of merged superpixels) in the image, I generate a codebook to map each multi-dimensional feature to a 1-dimensional histogram. Each index in the codebook represents a cluster center in the feature space, obtained using k-means on a random subset of the features from the entire image collection. Each n-dimensional feature (n = 3 for color, and n = the number of filters for texture) is mapped to the nearest cluster center in the codebook. The final representation of a region is a histogram with c bins, where c is the number of cluster centers in the codebook. Each histogram bin is a count of the pixels in the region that have been mapped to that codebook index. The histogram is normalized to sum to one.
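
A rough sketch of this codebook representation is shown below (my own illustration, not the paper's implementation; the use of scikit-learn's KMeans, the 10,000-feature subset, and the function names are assumptions, and features are assumed to be stacked as a NumPy array of shape (num_pixels, n)).

```python
import numpy as np
from sklearn.cluster import KMeans


def build_codebook(all_features, c, subset_size=10000, seed=0):
    """Run k-means on a random subset of features from the whole image collection;
    the c cluster centers are the codebook entries."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(all_features), size=min(subset_size, len(all_features)), replace=False)
    return KMeans(n_clusters=c, n_init=10, random_state=seed).fit(all_features[idx]).cluster_centers_


def region_histogram(region_features, codebook):
    """Map each n-dimensional feature to its nearest codebook entry and build
    a normalized c-bin histogram of the resulting indices."""
    d2 = ((region_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                              # codebook index per pixel
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()                                 # normalize to sum to one
```
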
To compare the similarity between two regions i and j, I use the chi-square (χ²) distance between their histograms, h_i and h_j:

    \chi^2(h_i, h_j) = \frac{1}{2} \sum_{k=1}^{c} \frac{(h_i(k) - h_j(k))^2}{h_i(k) + h_j(k)} \qquad (1)
Figure 2(a) shows a distance matrix of the computed χ² distances between all superpixels in an example image. Red indicates high distances while blue indicates low distances. Given the distance matrix, I sequentially merge regions using single-link agglomerative clustering. Specifically, at each step, I merge the two regions that have the smallest χ² distance given by Eqn. (1). Every time two regions are merged, the new region is represented by averaging their respective histograms. Once the complete binary merger tree is formed, the K-S test is used to prune erroneous merges top-down. As explained in Section III-A, the K-S test statistic computes the maximum absolute difference between the empirical cumulative distribution functions of two histograms to determine whether they are drawn from the same distribution (see Figure 2(b)). We can use this to determine whether two regions should remain merged. Each time the null hypothesis is rejected (i.e., it is determined that the two regions should not be merged), a region is split into two.
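
The merging stage described above might look roughly like the following sketch (my own illustration; it ignores any spatial adjacency constraints, uses a naive pair search for clarity, and `chi2_distance` is an assumed helper implementing Eqn. (1)).

```python
import numpy as np


def chi2_distance(h_i, h_j, eps=1e-12):
    """Chi-square distance between two normalized histograms (Eqn. (1))."""
    return 0.5 * np.sum((h_i - h_j) ** 2 / (h_i + h_j + eps))


def build_merger_tree(histograms):
    """Greedy agglomerative merging: repeatedly merge the pair of regions with the
    smallest chi-square distance; a merged region's histogram is the average of its
    children's. Returns the binary merger tree as nested tuples of superpixel indices."""
    regions = [(i, np.asarray(h, dtype=float)) for i, h in enumerate(histograms)]
    while len(regions) > 1:
        best = None
        for a in range(len(regions)):                  # naive O(n^2) search per merge
            for b in range(a + 1, len(regions)):
                d = chi2_distance(regions[a][1], regions[b][1])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        (tree_a, hist_a), (tree_b, hist_b) = regions[a], regions[b]
        merged = ((tree_a, tree_b), (hist_a + hist_b) / 2.0)   # averaged histogram
        regions = [r for k, r in enumerate(regions) if k not in (a, b)] + [merged]
    return regions[0][0]
```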

The reason that the K-S test is used to prune erroneous merges top-down instead of using it in the bottom-up merging stage is that the K-S test is more reliable over larger regions [5]. The pruning process ends when no null hypothesis is rejected. In this case, multiple hypothesis correction was not necessary since the regions represented at each level of the tree are independent. That is, while there is dependency in terms of image features between a parent region and its children regions, since they contain overlapping parts of the image, a split at the parent level should not influence (and is therefore independent of) whether the two children should remain merged.
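
A schematic of the top-down pruning step is given below (again my own sketch, not the author's code). The Node class, the tiny default alpha, and the use of each region's raw 1-D codebook-index sample with SciPy's ks_2samp, rather than the averaged histograms, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

import numpy as np
from scipy.stats import ks_2samp


@dataclass
class Node:
    """Node of the binary merger tree. 'sample' holds the region's pixels as
    1-D codebook indices (the quantity whose histogram represents the region)."""
    sample: np.ndarray
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune_top_down(node: Node, alpha: float, segments: List[np.ndarray]) -> None:
    """Walk the tree from the root; undo a merge whenever the two children fail
    the two-sample K-S test, otherwise keep the whole subtree as one segment."""
    if node.left is None or node.right is None:        # leaf superpixel
        segments.append(node.sample)
        return
    _, p = ks_2samp(node.left.sample, node.right.sample)
    if p < alpha:                                      # H0 rejected: regions differ, split
        prune_top_down(node.left, alpha, segments)
        prune_top_down(node.right, alpha, segments)
    else:                                              # children look alike: stop descending
        segments.append(node.sample)


# usage sketch: segments = []; prune_top_down(root, alpha=1e-6, segments=segments)
```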

IV. EXPERIMENTS

A. Dataset and Setup

I tested my method on the Microsoft Research Cambridge (MSRC) dataset [3], which comprises 591 color images. Each image has multiple objects belonging to a subset of 23 categories. Pixel-level ground-truth annotation is available, which makes pixel-based evaluation feasible.

I represent color features by their 3-dimensional Lab values, and use a filter bank
consisting of 12 oriented bar filters at three scales and two isotropic filters to
compute texture features. For the color features I quantize the feature space into 69
bins and for the texture features I quantize the feature space into 400 bins. These
numbers are chosen to provide (roughly) good coverage of the feature spaces. As
part of the experiments, I compare the tradeoff in accuracy for different feature
choices as well as different significance levels.

B. Evaluation Metrics

To evaluate my method, I treat the image segmentation problem as one of data clustering and use cluster evaluation techniques on the segmentation results. This is the approach taken in [10]. Given two partitions X and Y of a set S of n objects, consider every pair of distinct objects in S and define:

(i) f00 = the number of such pairs that fall in different clusters under both X and Y.
(ii) f01 = the number of such pairs that fall in the same cluster under Y but not under X.
(iii) f10 = the number of such pairs that fall in the same cluster under X but not under Y.
(iv) f11 = the number of such pairs that fall in the same cluster under both X and Y.

where f00 + f01 + f10 + f11 = n(n − 1)/2, and n is the total number of objects in S. Intuitively, one can think of f00 + f11 as the number of agreements between X and Y and f01 + f10 as the number of disagreements between X and Y.

In terms of these counts, the Rand and Jaccard indices can be written as distance measures:

    d_{Rand}(X, Y) = \frac{f_{01} + f_{10}}{f_{00} + f_{01} + f_{10} + f_{11}}, \qquad d_{Jaccard}(X, Y) = \frac{f_{01} + f_{10}}{f_{01} + f_{10} + f_{11}}.

The distance measures are in the domain [0, 1]; a value of 0 means maximum similarity, while a value of 1 means maximum dissimilarity. The Jaccard index does not give any weight to pairs of objects that belong to different clusters under both partitions (f00). Hence, its distance measure is generally higher than that of the Rand index.
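
For concreteness, a small sketch of the pair counting follows (my own illustration; it uses a contingency-table formulation instead of looping over all n(n − 1)/2 pairs, and the function names are assumptions).

```python
import numpy as np


def pair_counts(labels_x, labels_y):
    """Count object pairs by co-membership under partitions X and Y."""
    labels_x, labels_y = np.asarray(labels_x).ravel(), np.asarray(labels_y).ravel()
    n = labels_x.size
    _, xi = np.unique(labels_x, return_inverse=True)
    _, yi = np.unique(labels_y, return_inverse=True)
    cont = np.zeros((xi.max() + 1, yi.max() + 1), dtype=np.int64)
    np.add.at(cont, (xi, yi), 1)                     # contingency table of cluster overlaps
    comb2 = lambda m: m * (m - 1) // 2               # number of pairs within a group
    f11 = comb2(cont).sum()                          # same cluster under both X and Y
    f10 = comb2(cont.sum(axis=1)).sum() - f11        # same under X only
    f01 = comb2(cont.sum(axis=0)).sum() - f11        # same under Y only
    f00 = comb2(n) - f11 - f10 - f01                 # different under both
    return f00, f01, f10, f11


def rand_jaccard_distances(labels_x, labels_y):
    """Rand and Jaccard distances in [0, 1]; 0 means identical partitions."""
    f00, f01, f10, f11 = pair_counts(labels_x, labels_y)
    rand_dist = (f01 + f10) / (f00 + f01 + f10 + f11)
    jaccard_dist = (f01 + f10) / (f01 + f10 + f11)
    return rand_dist, jaccard_dist
```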

For my evaluation, I chose the two partitions X and Y to be the algorithm segmentation (i.e., the proposed method or Normalized Cuts) and the ground-truth segmentation, respectively. An image in the dataset is represented as S, and each pixel in it is represented as an object oi, where i = 1, . . . , n, and n is the number of pixels in the image.
C. Results

For each algorithm (my method and Normalized Cuts), I obtain Rand and Jaccard index values by comparing its segmentation to the ground-truth segmentation, and I evaluate the performance of each method by comparing these index values. For the Normalized Cuts algorithm, I used the code provided by the authors. The features used for the Normalized Cuts algorithm are gray-scale pixel values. Note that Normalized Cuts requires the user to specify the number of segments, K, as a parameter to the algorithm. To choose the "ideal" value for K, I set it equal to the number of segments specified in the ground-truth segmentation. While this gives an unfair advantage to Normalized Cuts, it is a good setting for testing how well my method works. Unlike Normalized Cuts, my method does not require user specification of K and instead automatically determines the number of segments for each image.

Table I shows the mean and standard deviation of the index values on the MSRC dataset images. Table I (left) shows results for my method when using color features and Table I (right) shows results for my method when using texture features. Each row in the tables shows results for a different choice of the significance level, α, and the number of superpixels, N. There are some interesting observations that can be made from the results.

First, for my method, color features produce better segmentations than texture features. This implies that most objects in the dataset can be uniquely defined by their color. Second, the number of superpixels, N, does not have a significant effect on the segmentation results. This is mainly because my method prunes merge errors top-down, so the size of the regions that are considered for pruning is (approximately) independent of the size of the original superpixels that the image started with.

Third, with increasing significance level, α, the segmentations become much worse. The main reason for this is the size of each sample (the number of pixels within each region), which is on the order of tens of thousands; a typical image in the MSRC dataset has size 320 x 213. Since we prune merges top-down after the binary merger tree is formed, the sample sizes of the considered regions can be quite large. Due to the large sample sizes, even very minor differences in the region feature distributions result in tiny p-values and hence rejection of the null hypothesis. Consequently, for most images, every merge is considered to be an error, which results in the final segmentations being the initial oversegmented superpixel images (where nothing is merged). Therefore, setting α ≈ 0 to account for the tiny p-values produced the most reasonable results. Figure 4 shows example segmentation results obtained using these settings versus those obtained by Normalized Cuts.

The fourth observation is that my method with feature type = color, N = 25, and α ≈ 0 (the best setting) performs better than Normalized Cuts in terms of the Jaccard index, but not in terms of the Rand index. This may be because the Rand index gives equal weight to pairs of pixels that were placed in different segments under both partitions, f00, as to pairs of pixels that were placed in the same segment under both partitions, f11. Since f00 only considers whether a pair of pixels was placed in different segments, but not in which different segments, the Rand index could (incorrectly) give a higher measure of similarity than the Jaccard index, which does not consider f00.

Figure 3 shows index values computed for the Normalized Cuts algorithm and my algorithm (with parameters feature type = color, N = 25, and α ≈ 0). Since the index values are distances, lower values are better. It is quite clear that in terms of the Jaccard index, my method outperforms Normalized Cuts, while in terms of the Rand index, the two methods are comparable.

To test the statistical significance of the differences in the index values obtained from the two methods, I used the paired Z-test. The null hypothesis is that there is no difference in the measured index values. Since my sample size is 591 (the number of images in the dataset), I can assume the differences in index values to be normally distributed by the Central Limit Theorem. I first computed the difference in index values for each sample, then the mean and standard deviation of the differences, and from these the Z-score and corresponding p-value. The p-values for the Rand index and the Jaccard index were 0.0136 and 3.148e-30, respectively. At α = 0.05, the null hypothesis is rejected and thus the differences are significant. With a lower α value such as 0.01, the difference in Rand index values would not be considered significant.
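
A minimal sketch of this paired Z-test follows (my own illustration; the per-image index values of the two methods are assumed to be given as two equal-length arrays).

```python
import numpy as np
from scipy.stats import norm


def paired_z_test(scores_a, scores_b):
    """Paired Z-test on per-image index values; H0: the mean difference is zero."""
    d = np.asarray(scores_a) - np.asarray(scores_b)    # per-image differences
    n = d.size
    z = d.mean() / (d.std(ddof=1) / np.sqrt(n))        # z-score of the mean difference
    p = 2 * norm.sf(abs(z))                            # two-sided p-value
    return z, p
```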

V. CONCLUSION

I proposed an unsupervised method for image segmentation. Unlike most unsupervised alternatives, the approach does not need user input to determine the number of segments; instead, it uses statistical testing to select it automatically. The algorithm starts by merging superpixels with hierarchical agglomerative clustering to form a binary merger tree, and it then prunes erroneous merges using the Kolmogorov-Smirnov test. The results indicate that the proposed method performs comparably to or better than the Normalized Cuts method. A limitation of the method is that the pruning of the binary merger tree down any path stops as soon as the null hypothesis is not rejected. If a merge between two very different regions produces a new region that is similar to another region in the image, then the larger regions could remain merged, and the better segmentation would not be produced. This is a tradeoff of pruning errors top-down rather than bottom-up. As future work, a combined top-down and bottom-up merging and error-pruning method could be employed to alleviate such effects.
