Journal of Electronic Imaging 13(1), 146–165 (January 2004).
Survey over image thresholding techniques and quantitative performance evaluation
Mehmet Sezgin Tu ¨ bı ˙ tak Marmara Research Center Information Technologies Research Institute Gebze, Kocaeli Turkey Email: sezgin@btae.mam.gov.tr
Bu ¨ lent Sankur Bog ˇ azic¸i University
ElectricElectronic Engineering Department
˙
Bebek, I stanbul
Turkey
Abstract. We conduct an exhaustive survey of image thresholding methods, categorize them, express their formulas under a uniform notation, and ﬁnally carry their performance comparison. The thresholding methods are categorized according to the information they are exploiting, such as histogram shape, measurement space clustering, entropy, object attributes, spatial correlation, and local graylevel surface. 40 selected thresholding methods from various categories are compared in the context of nondestructive testing applications as well as for document images. The comparison is based on the combined performance measures. We identify the thresholding algorithms that perform uniformly better over nonde structive testing and document image applications. _{©} _{2}_{0}_{0}_{4} _{S}_{P}_{I}_{E} _{a}_{n}_{d}
IS&T.
[DOI: 10.1117/1.1631316]
1 Introduction
In many applications of image processing, the gray levels
of pixels belonging to the object are substantially different from the gray levels of the pixels belonging to the back ground. Thresholding then becomes a simple but effective tool to separate objects from the background. Examples of thresholding applications are document image analysis,
where the goal is to extract printed characters, ^{1}^{,}^{2}
graphical content, or musical scores: map processing, where lines, legends, and characters are to be found:
scene
processing, where a target is to be detected: ^{4} and quality
where defective parts must be
inspection of materials,
delineated. Other applications can be listed as follows: cell
images ^{7}^{,}^{8} and knowledge representation, ^{9} segmentation of various image modalities for nondestructive testing NDT applications, such as ultrasonic images in Ref. 10, eddy
xray computed tomog
raphy CAT ,
current images, ^{1}^{1} thermal images,
endoscopic images, ^{1}^{4} laser scanning confo
logos,
3
5,6
12
13
Paper 02016 received Feb. 7, 2002; revised manuscript received Jan. 23, 2003; ac cepted for publication May 27, 2003. 10179909/2004/$15.00 © 2004 SPIE and IS&T.
cal microscopy,
tation in general, ^{1}^{6}^{,}^{1}^{7} spatiotemporal segmentation of video images, ^{1}^{8} etc. The output of the thresholding operation is a binary im age whose one state will indicate the foreground objects, that is, printed text, a legend, a target, defective part of a material, etc., while the complementary state will corre spond to the background. Depending on the application, the foreground can be represented by graylevel 0, that is, black as for text, and the background by the highest lumi nance for document paper, that is 255 in 8bit images, or conversely the foreground by white and the background by black. Various factors, such as nonstationary and correlated noise, ambient illumination, busyness of gray levels within the object and its background, inadequate contrast, and ob ject size not commensurate with the scene, complicate the thresholding operation. Finally, the lack of objective mea sures to assess the performance of various thresholding al gorithms, and the difﬁculty of extensive testing in a task oriented environment, have been other major handicaps. In this study we develop taxonomy of thresholding al gorithms based on the type of information used, and we assess their performance comparatively using a set of ob
13 extraction of edge ﬁeld, ^{1}^{5} image segmen
jective segmentation quality metrics. We distinguish six categories, namely, thresholding algorithms based on the exploitation of: 1. histogram shape information, 2. mea surement space clustering, 3. histogram entropy informa tion, 4. image attribute information, 5. spatial information, and 6. local characteristics. In this assessment study we envisage two major application areas of thresholding,
namely document binarization and segmentation of nonde structive testing NDT images. A document image analysis system includes several imageprocessing tasks, beginning with digitization of the document and ending with character recognition and natu ral language processing. The thresholding step can affect quite critically the performance of successive steps such as
146 / Journal of Electronic Imaging / January 2004 / Vol. 13(1)
Survey over image thresholding techniques
classiﬁcation of the document into text objects, and the correctness of the optical character recognition OCR . Im proper thresholding causes blotches, streaks, erasures on the document confounding segmentation, and recognition tasks. The merges, fractures, and other deformations in the character shapes as a consequence of incorrect thresholding are the main reasons of OCR performance deterioration. In NDT applications, the thresholding is again often the ﬁrst critical step in a series of processing operations such as morphological ﬁltering, measurement, and statistical as sessment. In contrast to document images, NDT images can derive from various modalities, with differing application goals. Furthermore, it is conjectured that the thresholding algorithms that apply well for document images are not necessarily the good ones for the NDT images, and vice versa, given the different nature of the document and NDT images. There have been a number of survey papers on thresh olding. Lee, Chung, and Park ^{1}^{9} conducted a comparative analysis of ﬁve global thresholding methods and advanced useful criteria for thresholding performance evaluation. In an earlier work, Weszka and Rosenfeld ^{2}^{0} also deﬁned sev eral evaluation criteria. Palumbo, Swaminathan and Srihari ^{2}^{1} addressed the issue of document binarization com paring three methods, while Trier and Jain ^{3} had the most extensive comparison basis 19 methods in the context of character segmentation from complex backgrounds. Sahoo et al. ^{2}^{2} surveyed nine thresholding algorithms and illus trated comparatively their performance. Glasbey ^{2}^{3} pointed out the relationships and performance differences between 11 histogrambased algorithms based on an extensive sta tistical study. This survey and evaluation, on the one hand, represents a timely effort, in that about 60% of the methods discussed and referenced are dating after the last surveys in this area. ^{1}^{9}^{,}^{2}^{3} We describe 40 thresholding algorithms with the idea underlying them, categorize them according to the in formation content used, and describe their thresholding functions in a streamlined fashion. We also measure and rank their performance comparatively in two different con texts, namely, document images and NDT images. The im age repertoire consists of printed circuit board PCB im ages, eddy current images, thermal images, microscope cell images, ultrasonic images, textile images, and reﬂective surfaces as in ceramics, microscope material images, as well as several document images. For an objective perfor mance comparison, we employ a combination of ﬁve crite ria of shape segmentation goodness. The outcome of this study is envisaged to be the formu lation of the large variety of algorithms under a uniﬁed notation, the identiﬁcation of the most appropriate types of binarization algorithms, and deduction of guidelines for novel algorithms. The structure of the work is as follows:
Notation and general formulations are given in Sec. 2. In Secs. 3 to 8, respectively, histogram shapebased, clusteringbased, entropybased, object attributebased, spatial informationbased, and ﬁnally locally adaptive thresholding methods are described. In Sec. 9 we present the comparison methodology and performance criteria. The evaluation results of image thresholding methods, sepa rately for nondestructive inspection and document process
ing applications, are given in Sec. 10. Finally, Sec. 11 draws the main conclusions.
2 Categories and Preliminaries
We categorize the thresholding methods in six groups ac cording to the information they are exploiting. These cat egories are:
1. histogram shapebased methods, where, for example, the peaks, valleys and curvatures of the smoothed histogram are analyzed
2. clusteringbased methods, where the graylevel samples are clustered in two parts as background and foreground object , or alternately are modeled as a mixture of two Gaussians
3. entropybased methods result in algorithms that use the entropy of the foreground and background re gions, the crossentropy between the original and bi narized image, etc.
4. object attributebased methods search a measure of similarity between the graylevel and the binarized images, such as fuzzy shape similarity, edge coinci dence, etc.
5. the spatial methods use higherorder probability dis tribution and/or correlation between pixels
6. local methods adapt the threshold value on each pixel to the local image characteristics.
In the sequel, we use the following notation. The histogram and the probability mass function PMF of the image are
indicated, respectively, by h(g) and by p(g), g 0
where G is the maximum luminance value in the image, typically 255 if 8bit quantization is assumed. If the gray value range is not explicitly indicated as _{} g _{m}_{i}_{n} , g _{m}_{a}_{x} ], it will be assumed to extend from 0 to G. The cumulative probability function is deﬁned as
G,
g
P _{} g _{}
i 0
p _{} i _{} .
It is assumed that the PMF is estimated from the histogram of the image by normalizing it to the total number of samples. In the context of document processing, the fore ground object becomes the set of pixels with luminance values less than T, while the background pixels have lumi nance value above this threshold. In NDT images, the fore ground area may consists of darker more absorbent, denser, etc. regions or conversely of shinier regions, for example, hotter, more reﬂective, less dense, etc., regions. In the latter contexts, where the object appears brighter than the background, obviously the set of pixels with luminance greater than T will be deﬁned as the foreground. The foreground object and background PMFs are ex
pressed as
p _{f} (g), 0 g T, and p _{b} (g), T 1 g G, re
spectively, where T is the threshold value. The foreground and background area probabilities are calculated as:
T
P _{f} _{} T P _{f} _{}
g 0
p _{} g _{} ,
P _{b} _{} T P _{b}
G
_{}
g T 1
p _{} g _{} .
1
Journal of Electronic Imaging / January 2004 / Vol. 13(1) / 147
Sezgin and Sankur
Table 1 Thresholding functions for the shapebased algorithms.
_{S}_{h}_{a}_{p}_{e} – _{R}_{o}_{s}_{e}_{n}_{f}_{e}_{l}_{d} _{2}_{4}
Shape _{–} Sezan ^{3}^{2}
_{S}_{h}_{a}_{p}_{e} – _{O}_{l}_{i}_{v}_{i}_{o}
Shape _{–} Ramesh ^{3}^{0}
Shape _{–} Guo ^{2}^{8}
T _{o}_{p}_{t} arg max _{}_{} p(g) Hull(g) _{}_{} by considering object attributes, such as busyness.
T _{o}_{p}_{t} ﬁrst terminating zero of p˜ (g) _{}_{} (1 ) ﬁrst initiating zero of p˜ (g) _{} 0 1, p˜ (g) d/dg _{} p(g)*smoothing _{–} kernel(g) _{} , where 1 and kernel size is 55
T _{o}_{p}_{t} T ^{0} , valley at the highest resolution T ^{1} , given valleys at k lower resolutions , k 3
T ^{k} ,
T
^{T} _{o}_{p}_{t} ^{} ^{m}^{i}^{n}
_{g} _{} _{0}
G
_{} b _{1} _{} T g _{} ^{2} _{} b _{2} T g ^{2} _{}
g
T 1
where b _{1} (T) m _{f} (T)/P(T), b _{2} (T) m _{b} (T) _{} 1 P(T) _{}
^{T} ^{o}^{p}^{t} ^{} ^{m}^{i}^{n}
g
1
1
p
i 1
a _{i} exp _{}_{} j2 g/256 _{}_{} ^{2} ^{,}
where _{} a _{i} _{}
p
i 1
the n’th order AR coefﬁcients
The Shannon entropy, parametrically dependent on the threshold value T for the foreground and background, is formulated as:
T
H _{f} _{} T _{}
g 0
p _{f} _{} g _{} log p _{f} _{} g _{} ,
H _{b} _{} T
G
_{}
g T 1
p _{b} _{} g _{} log p _{b} _{} g _{} .
2
The sum of these two is expressed as H(T) H _{f} (T) H _{b} (T). When the entropy is calculated over the input image distribution p(g) and not over the class distribu tions , then obviously it does not depend on the threshold T, and hence is expressed simply as H. For various other deﬁ nitions of the entropy in the context of thresholding, with some abuse of notation, we use the same symbols of H _{f} (T) and H _{b} (T). The fuzzy measures attributed to the background and foreground events, that is, the degree to which the gray level, g, belongs to the background and object, respectively, and are symbolized by _{f} (g) and _{b} (g). The mean and variance of the foreground and background as functions of the thresholding level T can be similarly denoted as:
T 
T 

_{f} _{} T _{} g 0 
gp _{} g _{} 
_{f} _{} T _{} 2 g 0 

m 

G 

m 
_{b} _{} T _{} 
gp _{} g _{} 

g T 1 

G 

^{} ^{2} b _{} T _{} 
_{} g m _{b} _{} T _{}_{} ^{2} p _{} g _{} . 
g T 1
_{} g m _{f} _{} T _{}_{} ^{2} p _{} g _{} , 
3 
4 
We refer to a speciﬁc thresholding method, which was pro grammed in the simulation analysis and whose formula ap pears in the table, with the descriptor, ‘‘category _{–} author.’’ For example, Shape _{–} Sezan and Cluster _{–} Otsu, refer, re spectively, to the shapebased thresholding method intro
duced in a paper by Sezan and to the clusteringbased thresholding method ﬁrst proposed by Otsu.
3 Histogram ShapeBased Thresholding Methods
This category of methods achieves thresholding based on the shape properties of the histogram see Table 1 . The shape properties come into play in different forms. The distance from the convex hull of the histogram is investi gated in Refs. 20, and 24–27, while the histogram is forced into a smoothed twopeaked representation via autoregres sive modeling in Refs. 28 and 29. A more crude rectangular approximation to the lobes of the histogram is given in Refs. 30 and 31. Other algorithms search explicitly for peaks and valleys, or implicitly for overlapping peaks via curvature analysis. ^{3}^{2}^{–}^{3}^{4}
method ^{2}^{4}
Shape _{–} Rosenfeld is based on analyzing the concavities of the histogram h(g) visa ` vis its convex hull, Hull g , that is
the set theoretic difference Hull(g) p(g) . When the convex hull of the histogram is calculated, the deepest con cavity points become candidates for a threshold. In case of competing concavities, some object attribute feedback, such as low busyness of the threshold image edges, could be used to select one of them. Other variations on the theme
can be found in Weszka and Rosenfeld, ^{2}^{0}^{,}^{2}^{5} and Halada and Osokov. ^{2}^{6} Sahasrabudhe and Gupta ^{2}^{7} have addressed the histogram valleyseeking problem. More recently, Whatmough ^{3}^{5} has improved on this method by considering the exponential hull of the histogram.
Convex
hull
thresholding. Rosenfeld’s
Sezan ^{3}^{2} Shape _{–} Sezan
carries out the peak analysis by convolving the histogram function with a smoothing and differencing kernel. By ad justing the smoothing aperture of the kernel and resorting to peak merging, the histogram is reduced to a twolobe function. The differencing operation in the kernel outputs the triplet of incipient, maximum, and terminating zero
crossings of the histogram lobe S (e _{i} ,m _{i} ,s _{i} ),i 2 1, .
The threshold sought should be somewhere between the ﬁrst terminating and second initiating zero crossing, that is:
Peakandvalley thresholding.
148 / Journal of Electronic Imaging / January 2004 / Vol. 13(1)
Survey over image thresholding techniques
T _{o}_{p}_{t} e _{1} (1 )s _{2} , 0 1. In our work, we have found that 1 yields good results. Variations on this
theme are provided in Boukharouba, Rebordao, and Wendel, ^{3}^{6} where the cumulative distribution of the image is ﬁrst expanded in terms of Tschebyshev functions, followed
by the curvature analysis. Tsai
gram via Gaussians, and the resulting histogram is investi gated for the presence of both valleys and sharp curvature points. We point out that the curvature analysis becomes effective when the histogram has lost its bimodality due to the excessive overlapping of class histograms.
Shape _{–} Olivio
consider the multiscale analysis of the PMF and interpret
its ﬁngerprints, that is, the course of its zero crossings and extrema over the scales. In Ref. 34 using a discrete dyadic wavelet transform, one obtains the multiresolution analysis
, p ^{0} (g) p(g) is the original normalized histogram. The threshold is deﬁned as the valley minimum point follow ing the ﬁrst peak in the smoothed histogram. This threshold position is successively reﬁned over the scales starting from
where
of the histogram, p ^{s} (g) p(g)* _{s} (g), s 1,2
37
obtains a smoothed histo
In a similar vein, Carlotto ^{3}^{3} and Olivo
34
the coarsest resolution. Thus starting with the valley point T ^{(}^{k}^{)} at the k’th coarse level, the position is backtracked to the corresponding extrema in the higher resolution histo
grams p ^{(}^{k} ^{} ^{1}^{)} (g)
ﬁning the sequence T ^{(}^{1}^{)}
used .
T ^{(}^{k}^{)} in our work k 3 was
^{(}^{0}^{)} (g), that is, T ^{(}^{0}^{)} is estimated by re
p
and
Sethi
mation to the PMF consisting of a twostep function. Thus the sum of squares between a bilevel function and the his
is obtained
by iterative search. Kampke and Kober ^{3}^{1} have generalized the shape approximation idea.
the authors have approximated the
spectrum as the power spectrum of multicomplex expo nential signals using Prony’s spectral analysis method. A similar allpole model was assumed in Guo ^{2}^{8} Shape _{–} Guo .
We have used a modiﬁed approach, where an autoregres
sive AR model is used to smooth the histogram. Here one begins by interpreting the PMF and its mirror reﬂection around g 0, p( g), as a noisy power spectral density, given by ˜p(g) p(g) for g 0, and p( g) for g 0. One then obtains the autocorrelation coefﬁcients at lags k
by the inverse discrete fourier transform IDFT of
the original histogram, that is, r(k) IDFT _{} ˜p(g) _{} . The au
tocorrelation coefﬁcients _{} r(k) _{} are then used to solve for the n’th order AR coefﬁcients _{} a _{i} _{} . In effect, one smoothes the histogram and forces it to a bimodal or twopeaked representation via the n’th order AR model (n 1,2,3,4,5,6). The threshold is established as the mini mum, resting between its two pole locations, of the result ing smoothed AR spectrum.
Shapemodeling
30
thresholding.
Ramesh,
Yoo,
Shape _{–} Ramesh use a simple functional approxi
opt
togram is minimized, and the solution for T
In Cai and Liu,
29
0
G,
4 ClusteringBased Thresholding Methods
In this class of algorithms, the graylevel data undergoes a clustering analysis, with the number of clusters being set always to two. Since the two clusters correspond to the two
lobes of a histogram assumed distinct , some authors search for the midpoint of the peaks. ^{3}^{8}^{–}^{4}^{1} In Refs. 42–45, the algorithm is based on the ﬁtting of the mixture of Gaus sians. Meansquare clustering is used in Ref. 46, while fuzzy clustering ideas have been applied in Refs. 30 and 47.
See Table 2 for these algorithms.
Iterative thresholding. Riddler ^{3}^{8} Cluster _{–} Riddler ad vanced one of the ﬁrst iterative schemes based on twoclass Gaussian mixture models. At iteration n, a new threshold T _{n} is established using the average of the foreground and
background class means. In practice, iterations terminate when the changes T _{n} T _{n} _{} _{1} become sufﬁciently small. Leung and Fam ^{3}^{9} and Trussel ^{4}^{0} realized two similar meth ods. In his method, Yanni and Horne ^{4}^{1} Cluster _{–} Yanni ini tializes the midpoint between the two assumed peaks of the histogram as g _{m}_{i}_{d} (g _{m}_{a}_{x} g _{m}_{i}_{n} )/2, where g _{m}_{a}_{x} is the high est nonzero gray level and g _{m}_{i}_{n} is the lowest one, so that (g _{m}_{a}_{x} g _{m}_{i}_{n} ) becomes the span of nonzero gray values in the histogram. This midpoint is updated using the mean of the two peaks on the right and left, that is, as g _{m}_{i}_{d}
*
^{} ^{(}^{g} peak1 ^{} ^{g} peak2 ^{)}^{/}^{2}^{.}
Clustering thresholding. Otsu ^{4}^{6} Cluster _{–} Otsu sug gested minimizing the weighted sum of withinclass vari ances of the foreground and background pixels to establish
an optimum threshold. Recall that minimization of within class variances is tantamount to the maximization of betweenclass scatter. This method gives satisfactory results when the numbers of pixels in each class are close to each other. The Otsu method still remains one of the most refer enced thresholding methods. In a similar study, threshold ing based on isodata clustering is given in Velasco. ^{4}^{8} Some limitations of the Otsu method are discussed in Lee and Park. ^{4}^{9} Liu and Li ^{5}^{0} generalized it to a 2D Otsu algorithm.
methods
assume that the image can be characterized by a mixture distribution of foreground and background pixels: p(g) P(T).p _{f} (g) 1 P(T) _{} .p _{b} (g). Lloyd ^{4}^{2} Cluster _{–} Lloyd considers equal variance Gaussian density functions, and minimizes the total misclassiﬁcation error via an iterative search. In contrast, Kittler and Illingworth ^{4}^{3}^{,}^{4}^{5} Cluster _{–} Kittler removes the equal vari ance assumption and, in essence, addresses a minimum error Gaussian densityﬁtting problem. Recently Cho, Haralick, and Yi ^{4}^{4} have suggested an improvement of this thresholding method by observing that the means and vari ances estimated from truncated distributions result in a bias. This bias becomes noticeable, however, only when ever the two histogram modes are not distinguishable.
Jawahar, Biswas, and
Ray
Cluster _{–} Jawahar _{–} 1 , and Ramesh Yoo, and Sethi ^{3}^{0}
assign fuzzy clustering memberships to pixels depending on their differences from the two class means. The cluster means and membership functions are calculated as:
Minimum
error
thresholding.
These
Fuzzy clustering thresholding.
47
m k ^{} g 0
G
^{} g 0
G
g.p _{} g _{}_{} _{k} ^{} _{} g _{}
p g _{k} ^{} g
,
k f,b,
Journal of Electronic Imaging / January 2004 / Vol. 13(1) / 149
Sezgin and Sankur
Table 2 Thresholding functions for clusteringbased algorithms.
Cluster _{–} Riddler ^{3}^{4}
Cluster _{–} Yanni ^{4}^{1}
Cluster _{–} Otsu ^{4}^{6}
Cluster _{–} Lloyd ^{2}^{4}
Cluster _{–} Kittler ^{3}^{2}
Cluster _{–} Jawahar _{–} 1 ^{4}^{7}
Cluster _{–} Jawahar _{–} 2 ^{4}^{7}
T _{o}_{p}_{t} lim
n→
m f T n m b T n
2
where
^{T} n
m f T n _{}
g 0
gp _{} g _{}
m _{b} _{} T _{n}
*
^{g} mid
T ^{o}^{p}^{t} g ^{m}^{a}^{x} g ^{m}^{i}^{n} ^{}
^{g}
^{} ^{g} min
p _{} g _{}
G
_{}
g T _{n} 1
gp _{} g _{}
^{T} ^{o}^{p}^{t} ^{} ^{a}^{r}^{g} ^{m}^{a}^{x} ^{} P T 1 P T m f T m b T 2
^{2}
_{} T _{f}
P
2
T 1 P T _{b} T
T opt arg min m f T m b T
^{2} is the variance of the whole image.
T _{o}_{p}_{t} arg min _{} P(T)log _{f} (T) 1 P(T) _{} log _{b} (T) P(T)log P(T) 1 P(T) _{} log _{} 1 P(T) _{}_{} where
_{}_{} _{f} (T), _{b} (T) _{}
standard deviations.
^{2}
_{m} b _{} _{T} _{} log ^{1} ^{} ^{P} ^{} ^{T} ^{}
_{} T _{}
P
and
background
2
are
_{f} _{} T
m
^{}
foreground
T _{o}_{p}_{t} arg 
equal 
_{}_{} _{f} ^{} (g) 

(g) _{} 

b 

g 

where 

T 

d _{} g,m _{k} _{} _{} g m _{k} ^{2} , k b,f 

g 
0 

T _{o}_{p}_{t} arg 
equal 
_{}_{} _{f} ^{} (g) 

(g) 
_{} 

b 

g 
where
_{} g,m _{k}
d
1 g m _{k}
2
2
k
log _{k} log _{k} , k b,f
^{} _{} g
f
1
1 _{}
d _{} g,m _{f} _{} d _{} g,m _{b}
_{2}_{/} _{}_{} _{1} ,
_{b} ^{} _{} g _{}_{} 1 _{f}
^{} _{} g _{} .
In these expressions, d(
tion between the grayvalue g and the class mean, while is the fuzziness index. Notice that for 1, one obtains the Kmeans clustering. In our experiments we used _{}_{} 2. In a second method proposed by Jawahar, Biswas, and Ray ^{4}^{7} Cluster _{–} Jawahar _{–} 2 , the distance function and the mem bership function are deﬁned as for k f ,b):
is the Euclidean distance func
)
d _{} g,m _{k}
1 g m _{k}
2
k
2
log _{k} log _{k} ,
_{k}
G
^{} g 0
p _{} g _{k}
^{} g
G
^{} g 0
p g ^{} _{f} g _{b} ^{} g ^{,}
_{2} ^{} g 0
^{} k
G
p _{} g _{}_{} _{k} ^{} _{} g _{}_{} g m _{k} _{} ^{2}
G
^{} g 0
_{k} ^{} _{} g _{} p _{} g _{}
^{.}
In either method, the threshold is established as the cross over point of membership functions, i.e., T _{o}_{p}_{t}
arg equal
g
_{f}
^{} (g) _{b} ^{} (g) _{} .
5 EntropyBased Thresholding Methods
This class of algorithms exploits the entropy of the distri bution of the gray levels in a scene. The maximization of the entropy of the thresholded image is interpreted as in
dicative of maximum information transfer.
thors try to minimize the crossentropy between the input
graylevel image and the output binary image as indicative
of preservation of information,
entropy.
Hashim ^{6}^{3} were the ﬁrst to study Shannon entropybased thresholding. See Table 3 for these algorithms.
Kapur, Sahoo, and Wong ^{5}^{3}
Entropy _{–} Kapur consider the image foreground and back ground as two different signal sources, so that when the sum of the two class entropies reaches its maximum, the image is said to be optimally thresholded. Following this idea, Yen, Chang, and Chang ^{5}^{4} Entropy _{–} Yen deﬁne the entropic correlation as
Other au
51–55
56–59
or a measure of fuzzy and Pal, King, and
60,61
Johannsen and Bille ^{6}^{2}
thresholding.
Entropic
TC _{} T C _{b} _{} T C _{f} _{} T
T
log _{}_{}
g 0
p
P
_{} g _{}
_{} _{} 2 _{} log _{}
_{} T
G
g T 1
p _{} g _{}
T 2
1 P _{}
and obtain the threshold that maximizes it. This method corresponds to the special case of the following method in
150 / Journal of Electronic Imaging / January 2004 / Vol. 13(1)
Survey over image thresholding techniques
Table 3 Thresholding functions for entropybased algorithms.
Entropy _{–} Kapur ^{5}^{3}
Entropy _{–} Yen ^{5}^{4}
Entropy _{–} Sahoo ^{5}^{5}
Entropy _{–} Pun _{–} a ^{5}^{1}
Entropy _{–} Pun _{–} b ^{5}^{2}
Entropy _{–} Li ^{5}^{6}^{,}^{5}^{7}
Entropy _{–} Brink ^{5}^{8}
Entropy _{–} Pal ^{5}^{9}
Entropy _{–} Shanbag ^{6}^{0}
Entropy _{–} Cheng ^{6}^{1}
T _{o}_{p}_{t} arg max _{} H _{f} (T) H _{b} (T) _{} with
T
_{f} _{} T _{}
H
g
0
_{} g _{} _{} T _{}
p
P
log
G
_{T} _{} and H _{b} _{} T _{}
P
g T 1
_{} g _{}
p
p _{} g _{} P _{} T _{}
log
_{} g _{} _{} T _{}
p
P
T _{o}_{p}_{t} (T) arg max _{} C _{b} (T) C _{f} (T) _{} with
T
C _{b} _{} T log _{}_{}
g 0
_{T} _{} _{} 2 _{} and C _{f} _{} T log _{}
_{} g _{}
p
P
g
G
T 1
2
_{} g _{}
p
1
P _{}
T
^{T} opt ^{} ^{T} _{} 1 _{} ^{} ^{P} T _{} _{1} _{} ^{}
1
_{4}
•w•B _{1} _{}_{} _{4} •T _{} _{2} _{} •w•B _{2} T _{} _{3} _{} • _{} 1 P _{T} _{} _{3} _{} _{4} •w•B _{3} _{}
1
1
^{T} k
where P _{} T _{} _{k} _{} _{}_{} _{}
B 1 ,B 2 ,B 3 _{}
_{}
_{}
g 0
1,2,1
0,1,3
3,1,0
p _{} g _{} , k 1,2,3, w P _{} T _{} _{3} _{} _{}_{} P _{} T _{} _{1} _{} _{} and
_{}
if
if
if
T _{} _{1} _{}
T _{} _{1} _{}
T _{} _{2} _{}
T _{} _{2} _{}
T _{} _{1} _{} T _{} _{2} _{}
5
5
5
5
and T _{} _{2} _{} T _{} _{3} _{} 5 and T _{} _{2} _{} T _{} _{3} _{} 5
and
T _{} _{2} _{}
T _{} _{3} _{}
or T _{} _{1} _{} T _{} _{2} _{} 5
T _{o}_{p}_{t} arg equal _{} H _{f} (T) H(T) _{} where
arg max _{}
log P _{} T log _{} max _{} p _{} 1 ,
,p _{} _{T} _{}_{}_{} 1
log _{} 1 P _{} T log _{} max _{} p _{} T 1 ,
p
G ^{}
^{T} _{o}_{p}_{t} ^{} ^{a}^{r}^{g}
optimizing histogram symmetry by tuning .
T
_{g} _{} _{0}
p _{} g 0.5 0.5 _{} _{}
T _{o}_{p}_{t} arg min _{}_{}
T
^{g} ^{} ^{0} gp _{} g _{} log
G
g
T ^{} ^{}
f
m
g
T 1
gp _{} g _{} log
g
b
m
T
where _{}
g
T g _{} T m _{f} _{} T _{} and _{} T g _{} T m _{b} _{} T _{} .
g
g
g
T _{o}_{p}_{t} arg min _{} H(T) _{} where
H(T) is
T
0
g
p _{} g _{} _{} m _{f} _{} T _{} log ^{m} ^{f} ^{} ^{T} ^{}
g
g log
G
T ^{} ^{} ^{}
g
f
m
T
1
p _{} g _{} _{} m _{b} _{} T _{} log ^{m} ^{b} ^{} ^{T} ^{}
g
g log
g
b
m
T
T _{o}_{p}_{t} arg max _{} H _{f} (T) H _{b} (T) _{}
T
where H _{f} _{} T _{}
g 0
_{} p _{f} _{} g _{} log
_{f} _{} g _{} _{f} _{} g _{}
p
q
q _{f} _{} g log ^{q} ^{f} ^{} ^{g} ^{}
_{f} _{} g _{}
p
G
H _{b} _{} T _{}
g
T 1
_{} p _{b} _{} g _{} log
_{b} _{} g _{} _{b} _{} g _{}
p
q
q _{b} _{} g log ^{q} ^{b} ^{} ^{g} ^{}
_{b} _{} g _{}
p
and q _{f} _{} g exp _{}_{} m _{f} _{} T ^{m} ^{f} ^{} ^{T} ^{} g , g 0,
g!
T,
q _{b} _{} g exp _{}_{} m _{b} _{} T ^{m} ^{b} ^{} ^{T} ^{} g , g T 1,
g!
T _{o}_{p}_{t} arg min _{}_{} H _{f} (T) H _{b} (T) where
^{T}
^{G}
H _{f} _{} T _{} _{P} _{} _{T} _{} log _{}_{} _{f} _{} g _{}_{} , H _{b} _{} T _{}
g
0
g
T 1
p _{} g _{}
p _{} g _{}
1 P _{} _{T} _{} ^{l}^{o}^{g} ^{}^{} ^{b} ^{} ^{g} ^{}^{}
_{} H _{} A, _{A}
_{l}_{o}_{g} _{2} _{} Q _{} A _{f} log Q _{} A _{f} Q _{} A _{b} log Q _{} A _{b} _{}
^{1}
max
a,b,c
where _{A} is Zadeh’s membership with parameters a,b,c, and
Q _{} A _{f} _{}
g _{}_{} A _{f}
p _{} g _{} , Q _{} A _{b} _{}
g _{}_{} A _{b}
_{} g _{}
p
G.
and T _{} _{2} _{} T _{} _{3} _{} 5
Ref. 55 Entropy _{–} Sahoo for the Renyi power 2. Sahoo, Wilkins, and Yeager ^{5}^{5} combine the results of three different threshold values. The Renyi entropy of the foreground and background sources for some parameter are deﬁned as:
^{H} ^{} f
T
^{l}^{n} ^{}^{}
1
1
g 0
p
P
_{} g _{}
_{} _{} _{}
_{} T
and
H ^{}
b
1
1 ^{l}^{n} ^{}
G
g T 1
p _{} g _{} 1 P _{} T ^{} ^{} ^{.}
They then ﬁnd three different threshold values, namely, T _{1} , T _{2} , and T _{3} , by maximizing the sum of the foreground and background Renyi entropies for the three ranges of , 0
Journal of Electronic Imaging / January 2004 / Vol. 13(1) / 151
Sezgin and Sankur
1, 1, and 1, respectively. For example T _{2} for 1 corresponds to the Kapur, Sahoo and Wong ^{5}^{3} thresh old value, while for 1, the threshold corresponds to that found in Yen, Chang, and Chang. ^{5}^{4} Denoting T _{} _{1} _{} , T _{} _{2} _{} ,
and T _{} _{3} _{} as the rank ordered T _{1} , T _{2} , and T _{3} values, ‘‘op timum’’ T is found by their weighted combination. Finally, although the two methods of Pun ^{5}^{1}^{,}^{5}^{2} have been superseded by other techniques, for historical reasons, we have included them Entropy _{–} Pun1, Entropy _{–} Pun2 . In Ref. 51, Pun considers the graylevel histogram as a Gsymbol source, where all the symbols are statistically independent. He considers the ratio of the a posteriori en tropy H _{} (T) P(T)log _{} P(T) _{}_{}_{} 1 P(T) _{} log _{} 1 P(T) _{} as
a 
function of the threshold T to that of the source entropy 

T 
G 

H 
_{} T _{} p _{} g _{} log _{} p _{} g _{} 
p _{} g _{} log _{} p _{} g _{}_{} . 
g 0
g T 1
This ratio is lower bounded by
H T
H
log P _{} T _{} log _{} max _{} p _{} 1 _{} ,
,p
T
1
log _{} 1 P _{} T _{}_{} log _{} max _{} p _{} T 1 _{} ,
G ^{} ^{.}
p
In a second method, the threshold ^{5}^{2} depends on the anisot ropy parameter , which depends on the histogram asym metry.
Crossentropic
Entropy _{–} Li formulate the thresholding as the minimiza tion of an information theoretic distance. This measure is the KullbackLeibler distance
thresholding.
Li,
Lee,
and
Tam ^{5}^{6}^{,}^{5}^{7}
D _{} q,p _{} q _{} g _{} log
g _{} g _{}
q
p
background regions, as in Pal ^{5}^{9} Entropy _{–} Pal . Using the maximum entropy principle in Shore and Johanson, ^{6}^{4} the corresponding PMFs are deﬁned in terms of class means:
q
_{f} _{} g exp _{}_{} m _{f} _{} T _{}_{} ^{m} ^{f} g! ^{} ^{T} ^{} g ,
g 0,
T,
q _{b} _{} g
exp _{}_{} m _{b} _{} T _{}_{} ^{m} ^{b} g! ^{} ^{T} ^{} g ,
g T 1,
G.
Wong and Sahoo ^{6}^{5} have also presented a former study of thresholding based on the maximum entropy principle.
Shanbag ^{6}^{0}
Entropy _{–} Shanbag considers the fuzzy memberships as an indication of how strongly a gray value belongs to the background or to the foreground. In fact, the farther away a gray value is from a presumed threshold the deeper in its region , the greater becomes its potential to belong to a speciﬁc class. Thus, for any foreground and background pixel, which is i levels below or i levels above a given threshold T, the membership values are determined by
Fuzzy
entropic
thresholding.
f T i 0.5 p _{} T
p _{} T 1 i p _{} T i _{}
2P _{} T _{}
^{,}
that is, its measure of belonging to the foreground, and by
b T i 0.5 p _{} T 1
p _{} T 1 i p _{} T i _{}
2 _{} 1 P _{} T _{}_{}
^{,}
respectively. Obviously on the gray value corresponding to the threshold, one should have the maximum uncertainty, such that _{f} (T) _{b} (T) 0.5. The optimum threshold is found as the T that minimizes the sum of the fuzzy entro pies,
of the distributions of the observed image p(g) and of the reconstructed image q(g). The KullbackLeibler measure
is minimized under the constraint that observed and recon
structed images have identical average intensity in their foreground and background, namely the condition
T
_{o}_{p}_{t} arg min
T
_{}_{} H _{0} T H _{1} T ,
H
T
_{0} _{} T _{}
g 0
p _{} g _{}
P _{} T _{}
log _{}_{} _{0} _{} g _{}_{} ,
^{G}
g
T g _{} T m _{f} _{} T _{}
g
and
_{} T g _{} T m _{b} _{} T _{} .
g
g
H _{1} _{} T _{}
Brink and Pendock ^{5}^{8} Entropy _{–} Brink suggest that a threshold be selected to minimize the crossentropy, deﬁned as
g T 1
T
H _{} T _{} q _{} g _{} log
g 0
g
q
p
_{} g
^{}
G
^{}
g T 1
p _{} g _{} log q ^{p} _{} ^{} g ^{g} ^{}
_{}
^{.}
The crossentropy is interpreted as a measure of data con sistency between the original and the binarized images. They show that this optimum threshold can also be found by maximizing an expression in terms of class means. A variation of this crossentropy approach is given by speciﬁ cally modeling the a posteriori PMF of the foreground and
Q _{} A _{f} _{}
g A _{f}
p _{} g _{}
_{1} _{} _{P} _{} _{T} _{} log _{}_{} _{1} _{} g _{}_{} ,
since one wants to get equal information for both the fore ground and background. Cheng, Chen, and Sun’s method ^{6}^{1}
Entropy _{–} Cheng relies on the maximization of fuzzy event entropies, namely, the foreground A _{f} and background A _{b} subevents. The membership function is assigned using Za deh’s S function in Ref. 66. The probability of the fore ground subevent Q(A _{f} ) is found by summing those gray value probabilities that map into the A _{f} subevent:
p _{} g _{}
152 / Journal of Electronic Imaging / January 2004 / Vol. 13(1)
Survey over image thresholding techniques
Table 4 Thresholding functions for the attributebased algorithms.
Attribute _{–} Tsai ^{7}^{0}
Attribute – Hertz _{6}_{7}
Attribute _{–} Huang ^{8}^{2}
_{A}_{t}_{t}_{r}_{i}_{b}_{u}_{t}_{e} – _{P}_{i}_{k}_{a}_{z} _{7}_{6}
Attribute _{–} Leung ^{8}^{1}
_{A}_{t}_{t}_{r}_{i}_{b}_{u}_{t}_{e} – _{P}_{a}_{l} 68
T _{o}_{p}_{t} arg equal _{} m _{1} b _{1} (T),m _{2} b _{2} (T),m _{1} b _{3} (T) _{}
G
where m _{k} _{}
g 0
p _{} g _{} g ^{k} and b _{k} P _{f} m _{f} ^{k} P _{b} m _{b}
k
T _{o}_{p}_{t} arg max _{} E _{g}_{r}_{a}_{y} E _{b}_{i}_{n}_{a}_{r}_{y} (T) _{} , where E _{g}_{r}_{a}_{y} : graylevel edge ﬁeld, E _{b}_{i}_{n}_{a}_{r}_{y} (T) binary edge ﬁeld
T opt arg min
1
N ^{2} log 2
G
g 0
_{}_{} _{f} _{} g,T _{} log _{}_{} _{f} _{} g,T _{}_{}
1 _{f} _{} g,T _{}_{} log _{} 1 _{f} _{} g,T _{}_{} p _{} g _{}_{}
where _{f} _{} I _{} i,j ,T _{}_{}
G
G I _{} i,j
Bien plus que des documents.
Découvrez tout ce que Scribd a à offrir, dont les livres et les livres audio des principaux éditeurs.
Annulez à tout moment.