DISSERTATION
ON
PERFORMANCE ANALYSIS OF TEXT IMPROVEMENT AND EXTRACTION ALGORITHMS

Master of Technology
in
Electronics Engineering

Department of Electronics Engineering
Kamla Nehru Institute of Technology
Sultanpur-228118 (U.P.)
Affiliated to
CONTENTS                                                PAGE NO.

Certificate
Acknowledgement                                               ii
Abstract                                                     iii
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES

S.NO   CHAPTERS
1.     INTRODUCTION
       1.1    Motivation
       1.2    Digital Image
       1.3
       1.4    Types of Noise
       1.5    Image Filtering Algorithms
       1.5.1  Trimmed Average Filter
       1.5.2  Adaptive Filter
       1.5.3  Wiener Filter
       1.5.4                                                  10
       1.5.5  Average Filter                                  11
       1.6                                                    12
       1.7                                                    13
       1.8                                                    14
       1.8.1  Global Thresholding                             14
       1.8.2  Local Thresholding                              15
       1.8.3  Hybrid Thresholding                             15
       1.9                                                    15
       1.10                                                   16
       1.11   Organization of Dissertation                    17
2.     REVIEW OF LITERATURE
       2.1    General                                         18
       2.2    Summary                                         28
3.     METHODOLOGY                                            29
4.
       4.2.3  Accuracy                                        48
       4.2.5  Precision                                       48
       4.2.6  F-Measure                                       48
       4.2.7  Specificity                                     49
5.     CONCLUSION
       5.1    Summary                                         92
       5.2    Conclusion                                      92
6.     REFERENCES                                             94
Electronics Engineering Department
Kamla Nehru Institute of Technology,
Sultanpur-228118 (U.P.)

Certificate

This is to certify that Mr. Digvijay Pandey (Roll No. 139404) has carried out the dissertation work presented in the report entitled "PERFORMANCE ANALYSIS OF TEXT IMPROVEMENT AND EXTRACTION ALGORITHMS", submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology in Electronics Engineering at Kamla Nehru Institute of Technology, Sultanpur-228118 (U.P.), under my supervision during the academic session 2015-16.

(Associate Professor)
Head of Department/Supervisor
Date: July, 2016
Place: Sultanpur (U.P.)
Acknowledgement

I wish to express my heartfelt gratitude to my project guide Prof. Yogesh Kumar Mishra, Department of Electronics Engg., K.N.I.T. Sultanpur-228118, U.P., for his active interest, constructive guidance, and advice during every stage of this work. His valuable guidance, coupled with an active and timely review of my work, provided the necessary motivation for me to work on and successfully complete this dissertation.

I am also thankful to Prof. Yogesh Kumar Mishra (HOD), Electronics Engg. Department, K.N.I.T. Sultanpur-228118, and all the faculty and staff members of the Electronics Engineering department for their kind support and help.

It is the contribution of many persons that makes a work successful. I wish to express my gratitude to the individuals who have contributed their ideas, time and energy to this work.

Words are insufficient to express my gratitude to my family members for their inspiration, blessings and support.

I thank God, who has supported me at every moment.

(Digvijay Pandey)
Roll No. (129406)
ii
Abstract

With the rapid growth of the Internet, the amount of image and video data is increasing exponentially. The text data present in images and videos is useful for the automatic annotation, indexing and structuring of images. Online image and video databases are growing enormously, and there is a need to fetch, explore and inspect the images and videos they hold. Text extraction plays a major role in finding vital and valuable information. Noise is an important factor that influences image quality; it is produced mainly during image acquisition and transmission. An image can be contaminated by noise such as salt-and-pepper noise, random-valued impulse noise, speckle noise and Gaussian noise.

For the removal of noise from images, filtering algorithms such as the adaptive filter, average filter, maximum filter, median filter, minimum filter, trimmed filter and Wiener filter are used. After removing noise from the input complex image, the text is extracted in binary form through the proposed algorithm. The proposed method uses the techniques of local contrast, local gradient, adaptive contrast map and Canny edge detection for the detection of text strokes, and the Otsu threshold for calculation of the threshold value. On the basis of the calculated threshold value, the pixels are classified into background and foreground. A comparative study of some popular existing filtering methods is done for text extraction from complex images. The proposed method is simulated in MATLAB to verify and validate the performance analysis.
iii
Chapter 1
Introduction

1.1 Motivation
With the rapid growth of the Internet, the amount of image and video data is increasing exponentially. In some image categories (e.g. natural scenes) and video categories (news, documentaries) there is often text information. This information can be used as a semantic feature, in addition to visual features such as color and shape, to improve the retrieval of relevant images and videos. The text data present in images and videos is useful for the automatic annotation, indexing and structuring of images Jung et al. (2011). Online image and video databases are growing enormously, and there is a need to fetch, explore and inspect the images and videos they hold. Text extraction plays a major role in finding vital and valuable information Sumathi et al. (2012). As most search engines are text based, manual keyword annotations have traditionally been used. However, this process is laborious and inconsistent, i.e. two users may choose different keywords for the same image or video. An alternative approach is to generate the keywords from the text that appears in the image. These keywords can be used as a semantic feature to improve the retrieval of relevant images and videos. Other applications of text extraction from images include sign translation, robotics, video skimming and navigation aids for the visually impaired. Therefore there is an increasing demand for text extraction from images and videos.

Although several methods have been proposed over the past years, text extraction from images is still an open problem because of the almost unconstrained appearance of text: text can vary drastically in font, color, size and alignment, and low image contrast and complex backgrounds make automatic text extraction extremely challenging Jung et al. (2011).
1.2 Digital Images
A digital image is composed of picture elements, or pixels, each a tiny dot of a particular color. By recording the color of an image at a large number of points, we can build a digital approximation of the image from which a replica of the original can be recreated. Pixels are like grain particles in a conventional photographic image: they are organized in a regular pattern of rows and columns, and each stores its data somewhat independently. A digital image is thus a rectangular arrangement of pixels, sometimes called a bitmap.
The study of various noise models and filtering techniques in image processing, noise reduction and image restoration Kamboj et al. (2013) is expected to improve the quality of images. The proposed algorithm also produces better results than a Standard Median Filter (SMF); its advantage lies in correcting only the noisy pixels, either by the median value or by the mean of the previously processed neighbouring pixel values.

Singh et al. (2013) proposed the removal of high-density salt-and-pepper noise from noisy color images using a projected median filter. The performance of the improved median filter is good at lower noise-density levels: it removes most of the noise effectively while preserving colored image details. The mean filter, by contrast, suppresses little noise and gives the worst results. The performance of the algorithm is analyzed in terms of Peak Signal-to-Noise Ratio (PSNR), Mean Square Error (MSE) and Image Enhancement Factor (IEF).
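These three metrics can be sketched as follows. The dissertation's experiments were run in MATLAB, so this Python version is only an illustrative sketch; the helper names are mine, not from the source.

```python
import math

def mse(original, restored):
    """Mean square error between two equal-sized gray-scale images (lists of rows)."""
    h, w = len(original), len(original[0])
    total = sum((original[i][j] - restored[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)

def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    err = mse(original, restored)
    return float('inf') if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def ief(original, noisy, restored):
    """Image enhancement factor: noise energy before filtering / after filtering.
    IEF > 1 means the filter reduced the noise energy."""
    return mse(original, noisy) / mse(original, restored)
```

An IEF of 4, for instance, means the filter reduced the mean squared noise energy by a factor of four.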
Luo et al. (2006) noted that images are often corrupted by noise known as salt-and-pepper noise. The standard median filter has been established as a reliable method to remove salt-and-pepper noise without harming edge features. However, the major problem of the standard Median Filter (MF) is that it is effective only at low noise densities.
1.3

A major portion of the information received by a human being is visual. Hence, processing visual information by computer has been drawing very significant attention from researchers over the last few decades. The process of receiving and analyzing visual information by the human species is referred to as sight, perception and understanding. Similarly, the process of receiving and analyzing visual information by a digital computer is called digital image processing Kamboj et al. (2013).
Image processing begins with an image acquisition process. Two elements are required to acquire digital images. The first is a sensor, a physical device that is sensitive to the energy radiated by the object to be imaged. The second is a digitizer, a device for converting the output of the sensing device into digital form. For example, in a digital camera the sensors yield an electrical output proportional to the light intensity, and the digitizer converts these outputs to digital data.
The aim of digital image processing is to improve the potential information for human interpretation, and to process image data for storage, transmission, and representation for autonomous machine perception Hemalatha (2014). The attributes of an image deteriorate due to contamination by various types of noise: additive white Gaussian noise, Rayleigh noise, impulse noise, etc. contaminate an image during acquisition, transmission, reception, storage and retrieval. For meaningful and useful processing such as image segmentation and object recognition, and for a good visible display in applications like television and photo-phone, the acquired image signal must be noise-free and deblurred Tripathi (2012). Image deblurring and image denoising are the two sub-areas of image restoration. In the present research work, attempts are made to propose efficient filters that suppress the noise and preserve the text edges and fine details of an image as far as possible over a wide range of noise densities.
1.4 Types of Noise

The principal origin of noise in digital images arises during image acquisition and/or transmission.

Fig. 1.2: Original image and image affected by Gaussian noise
Another type of noise that may degrade an image signal is speckle noise (SN). It is encountered in some biomedical applications like ultrasonic imaging and a few engineering applications like synthetic aperture radar imaging. SN is a signal-dependent noise: if the image pixel magnitude is high, then the noise is also high. The noise is multiplicative because a transmitting system first transmits a signal to the object and the reflected signal is recorded. When the signal is transmitted, it may get contaminated with additive noise in the channel. Due to the varying reflectance of the surface of the object, the reflected signal magnitude varies; the noise varies as well, since the noise is also reflected by the surface of the object. The noise magnitude is therefore higher when the signal magnitude is higher. Thus, speckle noise is multiplicative in nature.
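The difference between additive and multiplicative (signal-dependent) noise can be illustrated with a small sketch. This is illustrative Python, not the dissertation's MATLAB code, and the function names are mine:

```python
import random

def add_gaussian_noise(img, sigma=10.0, rng=random):
    """Additive noise: out = pixel + n, n ~ N(0, sigma^2); independent of the signal."""
    return [[min(255, max(0, p + rng.gauss(0, sigma))) for p in row] for row in img]

def add_speckle_noise(img, sigma=0.1, rng=random):
    """Multiplicative noise: out = pixel * (1 + n); the noise scales with the signal,
    so a zero-intensity pixel stays zero."""
    return [[min(255, max(0, p * (1 + rng.gauss(0, sigma)))) for p in row] for row in img]

def add_salt_pepper_noise(img, density=0.1, rng=random):
    """Impulse noise: each pixel becomes 0 (pepper) or 255 (salt) with probability `density`."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            if rng.random() < density:
                new_row.append(rng.choice([0, 255]))
            else:
                new_row.append(p)
        out.append(new_row)
    return out
```

Note that under speckle noise a dark pixel receives almost no noise while a bright pixel receives a lot, which is exactly the signal dependence described above.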
1.5 Image Filtering Algorithms
The basic problem in image processing is the enhancement and restoration of images in a noisy environment. For enhancing the quality of images, various filtering techniques available in image processing can be used. Various filters can remove noise from images while preserving image details and enhancing image quality. Filters are special tools designed to take an image as input, apply a mathematical algorithm to it, and return the image in a modified form. Earlier, linear filters were used for removing noise from images Tripathi et al. (2012), but linear filters perform poorly in the presence of noise that is not additive, as well as in systems where nonlinearities or non-Gaussian statistics are encountered. Linear filters have the advantage of fast processing but the disadvantage of not preserving edges. Conversely, non-linear filters have the advantage of preserving text edges but the disadvantage of slower processing Patidar et al. (2010).
1.5.1 Trimmed Average Filter

To compute the α-trimmed mean filter, the data are sorted from low to high and the central part of the ordered array is averaged. The number of input data values dropped from the average is controlled by the trimming parameter α. It is well known that the average filter suppresses additive white Gaussian noise better than the median filter, while the median filter is better at preserving edges and rejecting impulses Pitas et al. (1992). As the best choice taking advantage of both the average and the median filter, the α-trimmed mean filter was proposed Bednar et al. (1987). The α-trimmed mean filter rejects the smaller and the larger observations, depending on the value of α. To perform the analysis, different image metrics and complexities are considered.
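The trimming of one window can be sketched as follows (illustrative Python; here `alpha` counts how many samples are dropped from each end of the sorted window):

```python
def alpha_trimmed_mean(window, alpha):
    """α-trimmed mean of one neighbourhood: sort the samples, drop the `alpha`
    smallest and `alpha` largest, and average the central part of the array.
    alpha = 0 gives the plain mean; a full trim leaves only the median."""
    ordered = sorted(window)
    trimmed = ordered[alpha:len(ordered) - alpha]
    return sum(trimmed) / len(trimmed)
```

With alpha = 0 the filter behaves like the average filter (best against Gaussian noise); with the maximum trim it behaves like the median filter (best against impulses), which is exactly the compromise described above.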
Because the median is much less sensitive than the mean to extreme changes in pixel values, the median filter can remove salt-and-pepper noise without significantly reducing the sharpness of an image. Median filtering is a nonlinear operation used in image processing to reduce "salt and pepper" noise; being insensitive to outliers, it removes them without blurring the image. However, the median filter also smoothes edges and boundaries and may erase fine details of the image.
The mean filter replaces each pixel with the mean of its neighbouring pixel values, but it does not preserve image details; some details are removed by the mean filter Varghese (2014). In the median filter, by contrast, the pixel value is replaced not with the mean of the neighbouring pixel values but with their median. The median is calculated by first sorting all the pixel values from the surrounding neighbourhood into numerical order and then replacing the pixel under consideration with the middle value.
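The sort-and-take-the-middle procedure just described can be sketched as follows (illustrative Python; edge pixels are handled here by replicating the border, one common convention):

```python
def median_filter(img, k=3):
    """k x k median filter with edge replication: each pixel is replaced by the
    median of its sorted neighbourhood, which rejects impulse outliers."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            # Gather the neighbourhood, clamping indices at the image border.
            window = sorted(
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1) for dj in range(-r, r + 1))
            out[i][j] = window[len(window) // 2]
    return out
```

A single impulse in an otherwise uniform neighbourhood lands at the end of the sorted window and never reaches the middle position, so it is removed completely.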
Adaptive filters are commonly used in image processing to enhance or restore data by removing noise without significantly blurring the structures in the image Westin et al. (2000). The adaptive filter operates on the degraded image, which contains the original image plus noise. The mean and variance are the two local statistics on which an adaptive filter depends, computed within a defined m×n window. Adaptive filters can be thought of as self-adjusting digital filters. They find widespread use in countering the effects of "speckle" noise, which afflicts coherent imaging systems such as ultrasound: with these imaging techniques, scattered waves interfere with one another and contaminate the acquired image with multiplicative speckle noise.
Wiener theory, formulated by Norbert Wiener in 1940, forms the foundation of data-dependent linear least-square-error filters. Wiener filters play a central role in a wide range of applications such as linear prediction, echo cancellation, signal restoration, channel equalization and system identification. The main aim of this technique is to filter out the noise that has corrupted the signal; it is a statistical approach. To design this filter one should know the spectral properties of the original signal and the noise, and seek a linear time-invariant filter whose output is as close to the original as possible Kaur (2015). The Wiener filter minimizes the mean square error between the estimated random process and the desired process. It low-pass filters an intensity image that has been degraded by constant-power additive noise, using a pixel-wise adaptive method based on statistics estimated from a local neighbourhood of each pixel: the image I is filtered using neighbourhoods of size M-by-N to estimate the local image mean and standard deviation. If the [M N] argument is omitted, M and N default to 3.
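The pixel-wise adaptive scheme can be sketched as below. This is an illustration in Python of the idea behind MATLAB's wiener2, not the MATLAB routine itself; as described above, when no noise variance is supplied it falls back to the mean of the local variances.

```python
def wiener2_like(img, k=3, noise_var=None):
    """Pixel-wise adaptive Wiener filter: estimate a local mean and variance in a
    k x k neighbourhood, then shrink each pixel toward the local mean in
    proportion to how much of the local variance is attributed to noise."""
    h, w, r = len(img), len(img[0]), k // 2

    def window(i, j):
        # Neighbourhood values with border replication.
        return [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1) for dj in range(-r, r + 1)]

    means, variances = [], []
    for i in range(h):
        means.append([])
        variances.append([])
        for j in range(w):
            vals = window(i, j)
            m = sum(vals) / len(vals)
            v = sum((x - m) ** 2 for x in vals) / len(vals)
            means[i].append(m)
            variances[i].append(v)

    # When no noise variance is given, estimate it as the mean local variance.
    if noise_var is None:
        noise_var = sum(sum(row) for row in variances) / (h * w)

    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            m, v = means[i][j], variances[i][j]
            # In flat regions (v close to noise_var) the gain goes to 0 and the
            # output is the smoothed local mean; near strong structure the gain
            # approaches 1 and the pixel is left nearly untouched.
            gain = max(v - noise_var, 0.0) / v if v > 0 else 0.0
            out[i][j] = m + gain * (img[i][j] - m)
    return out
```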
For each pixel, the values of its neighbouring pixels are gathered into a list. From this list, the minimum or maximum value is found and stored as the corresponding resulting value. Finally, each pixel in the image is replaced by the resulting value generated for its associated neighbourhood. If the max and min filters are applied alternately, they can remove certain kinds of noise, such as salt-and-pepper noise, very efficiently Kaur (2015).
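The min/max procedure above can be sketched as follows (illustrative Python; the names are mine):

```python
def rank_filter(img, k, pick):
    """Generic neighbourhood rank filter: gather the k x k neighbourhood of each
    pixel (with border replication) and keep the value selected by `pick`."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [row[:] for row in img]
    for i in range(h):
        for j in range(w):
            out[i][j] = pick(
                img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
                for di in range(-r, r + 1) for dj in range(-r, r + 1))
    return out

def min_filter(img, k=3):
    """Removes salt (bright impulse) noise; erodes bright details."""
    return rank_filter(img, k, min)

def max_filter(img, k=3):
    """Removes pepper (dark impulse) noise; dilates bright details."""
    return rank_filter(img, k, max)
```

Applying the two alternately, as the text suggests, removes bright impulses with the min pass and dark impulses with the max pass.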
A single pixel with a very unrepresentative value can significantly affect the mean value of all the pixels in its neighbourhood. Moreover, when the filter neighbourhood straddles an edge, the filter will interpolate new values for pixels on the edge and so will blur that edge; this may be a problem if sharp edges are required in the output. Both of these problems are tackled by the median filter, which is often a better filter for reducing noise than the mean filter, although it takes longer to compute. In general the mean filter acts as a low-pass frequency filter and therefore reduces the spatial intensity derivatives present in the image.
1.6

Local-contrast-based techniques are very effective for text extraction from complex images and documents (historical and degraded), and have been used in many document image binarization techniques Su et al. (2010). In Bernsen's paper Bernsen et al. (1986), the local contrast is defined as follows:

    C(i, j) = Imax(i, j) - Imin(i, j)                                    (1)

where C(i, j) denotes the contrast of an image pixel (i, j), and Imax(i, j) and Imin(i, j) denote the maximum and minimum intensities within a local neighbourhood window of (i, j), respectively. If the local contrast C(i, j) is smaller than a threshold, the pixel is set as background directly; otherwise it is classified into text or background by comparison with the mean of Imax(i, j) and Imin(i, j). Bernsen's method is simple, but it does not cope well with complex backgrounds, so the local contrast can instead be normalized:

    C(i, j) = (Imax(i, j) - Imin(i, j)) / (Imax(i, j) + Imin(i, j) + ε)  (2)

where ε is a positive but infinitely small number that is added in case the local maximum is equal to 0. Compared with Bernsen's contrast, the new local image contrast introduces a normalization factor (the denominator) to compensate for the image variation within the document background.
The image gradient has been widely used for edge detection Ziou et al. (1988), and it can effectively detect the text stroke edges of document images that have a uniform background. On the other hand, it often detects many non-stroke edges in the background of a degraded document, which typically contains image variations due to noise, uneven lighting, bleed-through, etc. To extract only the stroke edges properly, the image gradient needs to be normalized to compensate for the image variation within the document background Su et al. (2010).
1.7

The adaptive contrast map, a combination of the local image contrast and the local image gradient, is applied to the input image. The image gradient alone detects many non-stroke edges in the background of an image that contains variations due to noise, uneven lighting, bleed-through, etc.; to extract only the stroke edges properly, the gradient needs to be normalized to compensate for the image variation within the document background. The purpose of the contrast image construction is to detect the stroke edge pixels of the document text properly Su et al. (2010). The adaptive contrast is computed as

    Ca(i, j) = α C(i, j) + (1 - α) G(i, j)                               (3)

where C(i, j) denotes the local contrast and G(i, j) represents the local gradient normalized to [0, 1]. The local window size is set to 3. α is the weight between local contrast and local gradient, controlled by the image statistical information:

    α = (Std / 128)^γ                                                    (4)

where Std is the document image intensity standard deviation and γ is a predefined parameter in [0, ∞).
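The local contrast and the adaptive contrast map described above can be sketched as follows (illustrative Python; ε guards against a zero denominator, and `alpha` is assumed to have been precomputed from the image standard deviation):

```python
def local_contrast(img, i, j, eps=1e-6, r=1):
    """Normalized local contrast: (max - min) / (max + min + eps) over a
    (2r+1) x (2r+1) window centred on pixel (i, j), with border replication."""
    h, w = len(img), len(img[0])
    vals = [img[min(max(i + di, 0), h - 1)][min(max(j + dj, 0), w - 1)]
            for di in range(-r, r + 1) for dj in range(-r, r + 1)]
    return (max(vals) - min(vals)) / (max(vals) + min(vals) + eps)

def adaptive_contrast(contrast, gradient, alpha):
    """Adaptive contrast map at one pixel: weighted sum of the local contrast
    and the local gradient (both assumed normalized to [0, 1])."""
    return alpha * contrast + (1 - alpha) * gradient
```

On a text stroke the window straddles dark ink and bright paper, so both terms are large; on flat background both are near zero, which is what makes the map useful for stroke detection.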
1.8

In Otsu's method, the threshold is selected so that the intra-class variance of the resulting background and foreground classes is minimal or, equivalently (since the sum of pairwise squared distances is constant), so that their inter-class variance is maximal Otsu (1979).
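This threshold selection can be sketched directly from a gray-level histogram (illustrative Python; an exhaustive search over candidate thresholds maximizes the between-class variance):

```python
def otsu_threshold(hist):
    """Otsu's method: choose the threshold t that maximizes the inter-class
    variance w0*w1*(mu0 - mu1)^2 of the two pixel classes split at t.
    `hist` is a histogram of gray levels 0 .. len(hist)-1; pixels with level
    <= t go to one class (e.g. background), the rest to the other."""
    total = sum(hist)
    sum_all = sum(g * c for g, c in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0      # pixel count of class 0 so far
    sum0 = 0    # intensity sum of class 0 so far
    for t in range(len(hist) - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; the split is degenerate
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

For a cleanly bimodal histogram the maximizer falls between the two modes, which is the threshold the binarization step then uses to split pixels into background and foreground.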
The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It has a good localization property: it can mark edges close to the real edge locations in the image Jagtap et al. (2015). In addition, the Canny edge detector uses two adaptive thresholds and is more tolerant to different imaging artifacts such as shading.
Over the years, different algorithms have been implemented for text extraction from documents and images, but perfection has not been achieved. These algorithms have undergone vast alterations and modifications: they have been refined by exploring the best methods for selecting threshold values, by applying better techniques for edge detection, and by integrating image gradient and image contrast enhancement to improve accuracy. However, the application of filtering techniques for removing noise from images at the initial stage of text extraction has not been much addressed or explored.
The dissertation is organized as follows. In chapter 2, a literature review of current research on text extraction from images, degraded document images and historical documents is presented; the chapter also surveys various binarization and filtering techniques and algorithms. Chapter 3 gives a brief overview of the simulation tools used in this dissertation, with which the overall simulation is performed, and describes the proposed methodology, which gives a complete overview of the work from start to end. After this, the proposed algorithm to study the effects of filters on text extraction from complex images through binarization is implemented.
Chapter 2
Review of Literature

2.1 General
The literature survey chapter should demonstrate a systematic knowledge of the area
and provide arguments to support the study focus. A literature survey helps to locate and
summarize the background study of a specific topic. Reliable sources such as IEEE, ACM
and books on binarization and filters were used for detailed literature review in order to
acquire relevant data.
Shaomin et al. (1995) proposed a work on image enhancement using α-trimmed mean filters. Image enhancement is one of the most important and challenging pre-processing steps for almost all applications of image processing. Various methods, such as the median filter and the α-trimmed mean filter, have been suggested. It was shown that the α-trimmed mean filter is a modification of the median and mean filters. The proposed algorithm showed excellent performance in suppressing noise.
Sauvola et al. (1999) proposed a method for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture. The method first performs a rapid classification of the local contents of a page into background, pictures and text. Two different approaches are then applied to define a threshold for each pixel: a soft decision method (SDM) for background and pictures, and a specialized text binarization method (TBM) for textual and line-drawing areas. The SDM includes noise filtering and signal tracking capabilities, while the TBM is used to separate text components from the background in bad conditions caused by uneven illumination or noise. Finally, the outcomes of these algorithms are combined.
Sobottka et al. (2000) noted that the automatic retrieval of indexing information from colored paper documents is a challenging problem. To build up bibliographic databases, editing by humans is usually necessary to provide information about title, authors and keywords, so for automating the indexing process the identification of text elements is essential. They proposed an approach to automatic text extraction from colored book and journal covers, with two methods developed for extracting text. The results of both methods are combined to robustly distinguish between text and non-text elements.
Yuan et al. (2001) presented a well-designed method that makes use of edge information to extract textual blocks from gray-scale document images. It aims at detecting textual regions in heavily noise-infected newspaper images and separating them from graphical regions. The algorithm traces the feature points in different entities and then groups the edge points of textual regions. Using line approximation and layout categorization, it can successfully retrieve directionally placed text blocks. Finally, feature-based connected-component merging is introduced to gather homogeneous textual regions together within the scope of their bounding rectangles. The proposed method was tested on a large group of newspaper images with multiple page layouts, and promising results proved its effectiveness.
Chen et al. (2001) presented a fast and robust algorithm to identify text in images or video frames with complex backgrounds and compression effects. The algorithm first extracts candidate text lines on the basis of edge analysis, baseline location and heuristic constraints. A Support Vector Machine is then used to identify text lines from the candidates in an edge-based distance-map feature space. Experiments based on a large number of images and video frames from different sources showed the advantages of this algorithm over conventional methods in both identification quality and computation time.
Tsai et al. (2002) presented a novel binarization algorithm for color document images. Conventional thresholding methods do not produce satisfactory binarization results for documents with close or mixed foreground and background colors. Initially, statistical image features are extracted from the luminance distribution. Then a decision-tree based binarization method is proposed, which selects various color features to binarize color document images. First, if the document image colors are concentrated within a limited range, saturation is employed. Second, if the image foreground colors are significant, luminance is adopted. Third, if the image background colors are concentrated within a limited range, luminance is also applied. Fourth, if the total number of pixels with low luminance (less than 60) is limited, saturation is applied; otherwise both luminance and saturation are employed. Their experiments include 519 color images, most of which are uniform invoice and name-card document images. The proposed binarization method generates better results than other available methods in shape and connected-component measurements, and obtains higher recognition accuracy in a commercial OCR system than other comparable methods.
Gllavata et al. (2003) observed that text detection in images or videos is an important step toward multimedia content retrieval. They presented an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The outputs of the algorithm are text boxes with a simplified background, ready to be fed into an OCR engine for subsequent character recognition. The proposal is robust with respect to different font sizes, font colors, languages and background complexities.
Gatos et al. (2006) presented a new adaptive approach for the binarization and enhancement of degraded documents. The proposed method does not require any parameter tuning by the user and can deal with degradations caused by shadows, non-uniform illumination, low contrast, large signal-dependent noise, smear and strain. Several distinct steps are followed: a pre-processing procedure using a low-pass Wiener filter, a rough estimation of foreground regions, a background surface calculation by interpolating neighboring background intensities, thresholding by combining the calculated background surface with the original image while incorporating image up-sampling, and finally a post-processing step to improve the quality of text regions and preserve stroke connectivity.
Badekas et al. (2007) presented a new method for the binarization of color document images. Initially, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, in each color plane a CC filtering procedure is applied, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or non-text. The identified text blocks are binarized using suitable binarization techniques, considering the rest of the pixels as background. The final result is a binary image which always contains black characters on a white background, independently of the original colors of each text block. The proposed document binarization approach can also be used for binarization of noisy color (or gray-scale) document images.
Nikholas et al. (2008) applied binarization methods to document images for discriminating the text from the background, based on pure thresholding and filtering combined with image processing algorithms. The proposed binarization procedure consists of five discrete steps in image processing, for different classes of document images; a refinement technique further enhances the image quality. Results on Byzantine historical manuscripts are discussed, and potential applications and further research are proposed. The main contribution of the paper is a simple and robust binarization procedure for pre-filtered historical manuscript images, with simulation results also presented.
Stathis et al. (2008) proposed a new technique for the validation of document binarization algorithms. The method is simple to implement and can be applied to any binarization algorithm, since it doesn't require anything more than the binarization stage. The technique is appropriate for document images that are difficult to evaluate by techniques based on segmentation or recognition of the text. A survey of binarization algorithm performance is presented, comparing algorithms ranging from the oldest to the newest, and some conclusions are drawn. Experiments are performed on artificial historical documents that imitate the common problems of historical documents, made using image mosaicing techniques and by combining old blank document pages with noise-free PDF documents. In this way, after the application of the binarization algorithms to the synthetic images, it is easy to evaluate the results by comparing the resulting image, pixel by pixel, with the original document.
Ebenezer et al. (2009) proposed a novel decision-based trimmed median filter algorithm for restoring highly corrupted gray-scale and color images. It replaces the noisy pixel by the trimmed median pixel value when pixel values other than 0 and 255 are present in the selected window; when all the pixel values are 0s and 255s, the noisy pixel is replaced by the mean value of all the elements present in the selected window. It can handle color as well as gray images.
Hedjam et al. (2010) presented a robust segmentation method for text extraction from historical document images. The method is based on Markovian-Bayesian clustering on local graphs at both pixel and regional scales, and consists of three steps. In the first step, an over-segmented map of the input image is created; the resulting map provides rich and accurate semi-mosaic fragments. In the second step, similar and adjoining sub-regions of this map are merged together to form accurate text shapes. The output of the second step, which contains accurate shapes, is processed in the final step, in which the segmentation is obtained using clustering with a fixed number of classes. The method makes significant use of the local and spatial correlation and coherence both within the image and between the stroke parts, and is therefore very robust with respect to degradation. The resulting segmented text is smooth, and weak connections and loops are preserved thanks to the robust nature of the method. The output can be used in succeeding skeletonization processes which require preservation of the text topology to achieve high performance. The method was tested on real degraded document images with promising results.
Wang et al. (2010) proposed a novel improved median filter algorithm for
images highly corrupted with salt-and-pepper noise. First, all pixels are classified into
signal pixels and noisy pixels using the Max-Min noise detector. The noisy pixels are then
separated into three classes (low-density, moderate-density, and high-density noise) based
on local statistical information. Finally, the weighted 8-neighbourhood similarity function
filter, the 5x5 median filter, and the 4-neighbourhood mean filter are applied to remove the
noise in the low, moderate, and high-density cases, respectively. The validation results show
that the proposed algorithm has better noise removal, adaptivity, and detail preservation,
and is especially effective when the images are extremely highly corrupted.
Liu et al. (2012) proposed an improved image filtering algorithm that combines
the median filtering algorithm with the medium filtering algorithm, exploiting the
simplicity of the former and the significant de-noising effect of the latter, and thus obtains a
better filtering effect. The simulation was performed in MATLAB and the objective
evaluation used the classical PSNR measure. Simulation results showed that the new
algorithm has a better de-noising effect than the medium filtering algorithm and reduces the
de-noising time as well; the improved algorithm is thus more practical.
Rongzhu et al. (2012) demonstrated the application of an improved median filter
to image processing. The median filter is the most common method of removing image
noise, and this work proposes an improved median filter algorithm for removing
salt-and-pepper noise. Exploiting the characteristics of salt-and-pepper noise, the algorithm
detects image noise and builds a noise-marked matrix, leaving pixels marked as signal
unprocessed. For pixels marked as noise, the filter window size and the weighted mean are
chosen according to the degree of noise pollution in the neighbourhood, with weights
determined from the local histogram around the noise points. MATLAB experiments show
that the improved median filter greatly reduces the time needed to clear image noise and
performs better than the standard median filter at noise reduction while retaining the edges
of an image.
Wang et al. (2012) carried out a comparative study of research work done in the field of
image filtering. Image filtering processes are applied to remove the different types of noise
that are either present in the image during capture or introduced during transmission.
Salt-and-pepper (impulse) noise is one type of noise that occurs during transmission of the
images, or due to bit errors or dead pixels in the image contents. Images are also blurred by
object movement or camera displacement during capture. This paper deals with removing
the impulse noise and the blurring simultaneously. The hybrid filter, a combination of the
Wiener filter, the median filter and the alpha-trimmed filter, has some potential benefits over
existing filters when reducing salt-and-pepper noise.
Kaur (2014) focused on the different image binarization techniques. Existing
research has shown that no technique is perfect for every case. Several researchers have
used image filters to reduce the noise in the image, but use of the guided filter (the best
edge-preserving filter) is not found; it may increase the accuracy of the present binarization
strategies. In most techniques the contrast enhancement is either done in a traditional way
or not done at all, so adaptive contrast enhancement is required. Most of the strategies have
also neglected the use of an edge map, which has the capability to map precise characters in
a proficient manner. This paper proposed a new technique with the ability to binarize
documents more efficiently. The proposed method integrates image gradients and image
contrast enhancement to improve the accuracy of document image binarization, and also
utilizes the guided image filter to improve the accuracy rate further. The comparative
analysis showed that the proposed algorithm provides quite significant improvement over
the available algorithms.
Kumari et al. (2014) surveyed filtering techniques for denoising images in
digital image processing. In image denoising, filtering algorithms are applied to remove the
different types of noise that are either present in the image during capture or injected during
transmission; certain image denoising filters are based on the median filter. The authors
explored a variety of methods to remove noise from digital images, such as Gaussian
filtering and Wiener filtering. Due to certain assumptions made about the frequency content
of the image, many of these algorithms remove fine details from images in addition to the
noise.
Sehad (2014) proposed estimating texture information based on Gabor filters
for ancient degraded documents. First, the dominant slant angle of the document image
script is computed using the Fourier transform. This dominant angle is then used within a
weighted sum of angles in a Gabor filter bank in order to capture the document image
foreground (text) more efficiently. This information, combined with the variance and the
mean extracted from the spatial and frequency domains respectively, is used to estimate the
binarization threshold. Three variants, based on Niblack's, Sauvola's, and Wolf's thresholds,
are used for evaluating the performance of the Gabor filter bank.
Qixiang et al. (2015) analyzed, compared, and contrasted the technical
challenges, methods, and performance of text detection and recognition research in colour
imagery. The paper summarizes the fundamental problems and enumerates factors that
should be considered when addressing them. Existing techniques are categorized as either
stepwise or integrated, and sub-problems are highlighted, including text localization,
verification, segmentation, and recognition. Special issues associated with the enhancement
of degraded text and the processing of video text, multi-oriented, perspectively distorted,
and multilingual text are also addressed. The categories and sub-categories of text are
illustrated, benchmark datasets are enumerated, and the performance of the most
representative approaches is compared.
Ranganathan et al. (2015) presented a simple and efficient binarization method
for degraded document images. The proposed technique tolerates the high inter- and
intra-intensity variation in the degraded document image. Document image binarization is
the process of converting the document image into a binary image containing text as
foreground and plain white as background, or vice versa. Characters should be extracted
from the binarized image in order to recognize them, so the performance of the character
recognition system depends completely on the binarization quality. The proposed method is
based on spatial-domain techniques, the Laplacian operator, the adaptive bilateral filter and
the Gaussian filter, and works well for degraded documents and palm-leaf manuscript
images.
2.2 Summary
It is inferred from the above literature that most researchers have contributed
greatly to the study of filtering algorithms and of text extraction from colour images and
degraded historical document images. Filters are used for denoising images, but it may
happen that, while removing noise, the filters are not able to preserve the edges of the text
present in the image. Therefore, it is necessary to analyze how different filters behave and
perform while extracting text from images. For this purpose, the main focus of this thesis is
to find out the performance of filters on colour images with different background colours,
intensity and illumination, and text with different font sizes, shapes and alignments. The
quality of the images is degraded and contaminated by noise such as Gaussian noise,
speckle noise and salt-and-pepper noise.
Chapter 3
METHODOLOGY
MATLAB's Image Processing Toolbox provides the algorithms used in this work for image
processing, analysis, transformations and intensity-based image registration methods.
3.2 Research Methodology
Research methodology simply means the methods I intend to use in my thesis. It
can be used for resolving problems and for better implementation of my research work.
The research activities consist of the following phases:
1. The first phase of my work starts with the selection of the dissertation topic. Text data
present in images and video contain useful information for automatic annotation,
indexing and structuring of images. Extraction of this information involves detection,
localization, tracking, extraction, enhancement and recognition of the text from the
degraded image. Text extraction from images is concerned with extracting the
relevant text data from a collection of images, and with how noise such as
salt-and-pepper noise and Gaussian noise present in those images affects the text
extraction.
2. In the second phase of my dissertation, I made a broad study of my topic. In the
literature review, existing text extraction algorithms applicable to digital image
processing were studied. Existing research has shown that no technique is perfect for
every case. After studying the literature, I concluded that I needed to design a highly
efficient algorithm to ensure text extraction from degraded documents or images by
removing the noise at the pre-processing stage.
3. In the third phase, I identified the research gaps, which include exploring the
concept of text extraction through binarization from complex images and studying
the performance of filters on images before applying binarization algorithms.
The concept of image gradients and image contrast enhancement is used to improve the
accuracy of document image binarization. After this, I installed MATLAB (version
R2013a) and learned how to use it, including its syntax.
4. The fourth phase of the dissertation starts with the design of the proposed model,
which includes the design and implementation of my algorithm in the MATLAB
toolbox.
5. In the fifth phase, I carried out the overall methods of my work: I implemented my
filtering and text extraction algorithms, then performed validation of my proposed
algorithm and compared the performance of the filters in terms of PSNR, MSE,
DRD, MPM, NRM, Recall and Accuracy.
6. In the final phase of my dissertation, I discussed the results and found out which
filter performs better for my proposed algorithms under different simulation
parameter values.
[Flowchart of the research methodology: identifying research gaps, data collection, data
synthesis, required tools installation, observation of existing solutions, and documented
evidence.]
3.3 Simulation Modeling
Simulation is a technique for solving problems by repeatedly performing observation and
analysis of a model.
3.4 Proposed Algorithm
[Block diagram of the proposed algorithm: the contrast image and the gradient image feed
the threshold estimation stage (Otsu threshold), which produces the binarized image.]
3.4.2
An image degraded by noise is modelled as

g(x, y) = f(x, y) + η(x, y)                                         (1)

where f(x, y) is the original image and η(x, y) is the additive noise term.
[Plots of the common noise probability density functions: Gaussian, Rayleigh, Erlang
(Gamma), Exponential, Uniform, and Impulse.]
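For illustration, the additive degradation model g(x, y) = f(x, y) + η(x, y) and the impulse model can be simulated as follows. This is a Python/NumPy stand-in for what MATLAB's imnoise provides; the sigma and density values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(f, sigma=20.0):
    """Degradation model g(x, y) = f(x, y) + eta(x, y) with Gaussian eta."""
    eta = rng.normal(0.0, sigma, f.shape)
    return np.clip(f.astype(np.float64) + eta, 0, 255).astype(np.uint8)

def add_salt_pepper_noise(f, density=0.05):
    """Impulse (salt-and-pepper) noise: a fraction of pixels forced to 0/255."""
    g = f.copy()
    mask = rng.random(f.shape)
    g[mask < density / 2] = 0            # pepper
    g[mask > 1 - density / 2] = 255      # salt
    return g
```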
The arithmetic mean filter is a very simple one and is calculated as follows:

f(x, y) = (1/mn) · Σ_{(s,t)∈S_xy} g(s, t)                           (2)
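The window average of Eq. (2) translates almost directly into code. A minimal Python/NumPy sketch (edge padding at the borders is an assumption; in MATLAB the same role is played by fspecial('average') with imfilter):

```python
import numpy as np

def arithmetic_mean_filter(g, m=3, n=3):
    """Eq. (2): replace each pixel by the average of g(s, t) over the
    m-by-n window S_xy centred on it."""
    pm, pn = m // 2, n // 2
    padded = np.pad(g.astype(np.float64), ((pm, pm), (pn, pn)), mode='edge')
    out = np.zeros_like(g, dtype=np.float64)
    for s in range(m):                    # sum the m*n shifted copies
        for t in range(n):
            out += padded[s:s + g.shape[0], t:t + g.shape[1]]
    return out / (m * n)
```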
To calculate the alpha-trimmed filter, the data are sorted from low to high
and the central part of the ordered array is summed. The number of input data values
dropped from the average is controlled by the trimming parameter d. It is well known that
the average filter suppresses additive white Gaussian noise better than the median filter,
while the median filter is better at preserving edges and rejecting impulses Pitas et al.
(1992). The alpha-trimmed mean filter Bednar et al. (1987) was proposed as the best
choice, taking advantage of both the average and the median filter. The alpha-trimmed
mean filter rejects the smallest and the largest observations, depending on the value of d.
In order to perform the analysis, different image and complexity metrics are considered.
f(x, y) = (1/(mn − d)) · Σ_{(s,t)∈S_xy} g_r(s, t)                   (3)

We delete the d/2 lowest and the d/2 highest grey levels, so g_r(s, t) represents the
remaining mn − d pixels.
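Eq. (3) can be sketched as follows (a Python/NumPy illustration, not the MATLAB code used in the experiments; edge padding and the default d = 4 are assumptions):

```python
import numpy as np

def alpha_trimmed_mean(g, m=3, n=3, d=4):
    """Eq. (3): sort the mn window values, drop the d/2 lowest and the
    d/2 highest, and average the remaining mn - d pixels g_r(s, t)."""
    assert d % 2 == 0 and d < m * n
    pm, pn = m // 2, n // 2
    padded = np.pad(g.astype(np.float64), ((pm, pm), (pn, pn)), mode='edge')
    out = np.empty_like(g, dtype=np.float64)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            win = np.sort(padded[i:i + m, j:j + n].ravel())
            out[i, j] = win[d // 2: m * n - d // 2].mean()
    return out
```

With d = 0 this reduces to the arithmetic mean filter, and with d = mn − 1 it approaches the median filter, which is exactly the trade-off described above.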
(37)
The mean filter replaces each pixel with the mean of the pixel values in its neighbourhood,
but it does not preserve image details; some details are removed with the mean filter
Varghese (2014). In the median filter, we do not replace the pixel value with the mean of
the neighbouring pixel values, but with the median of those values. The median is
calculated by first sorting all the pixel values from the surrounding neighbourhood into
numerical order and then replacing the pixel being considered with the middle pixel value:

f(x, y) = median_{(s,t)∈S_xy} {g(s, t)}                             (4)
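A direct Python/NumPy sketch of Eq. (4) (the MATLAB counterpart is medfilt2; edge padding is an assumption):

```python
import numpy as np

def median_filter(g, m=3, n=3):
    """Eq. (4): replace each pixel by the median of its m-by-n
    neighbourhood S_xy."""
    pm, pn = m // 2, n // 2
    padded = np.pad(g, ((pm, pm), (pn, pn)), mode='edge')
    out = np.empty_like(g, dtype=np.float64)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            out[i, j] = np.median(padded[i:i + m, j:j + n])
    return out
```

A lone impulse never survives, because it can occupy at most one of the nine sorted positions and therefore never sits in the middle.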
2000). The adaptive filter is applied to the degraded image, which contains the original
image plus noise. The local mean and variance within a defined m×n window are the two
statistical measures on which a local adaptive filter depends. Adaptive filters can be thought
of as self-adjusting digital filters. They find widespread use in countering the effects of
speckle noise, which afflicts coherent imaging systems such as ultrasound: scattered waves
interfere with one another and contaminate the acquired image with multiplicative speckle
noise.
Our objective, then, is to minimize the power of the last subfilter output
The Wiener filter minimizes the mean square error between the estimated random process
and the desired process. The Wiener filter low-pass filters an intensity image that has been
degraded by constant-power additive noise. It uses a pixel-wise adaptive method based on
statistics estimated from a local neighbourhood of each pixel: the image I is filtered using
neighbourhoods of size M-by-N to estimate the local image mean and standard deviation.
If the [M N] argument is omitted, M and N default to 3.

H(e^jω) = S_s(e^jω) / (S_s(e^jω) + (1/M) · S_v(e^jω))               (7)

The power spectra S_s and S_v of the signal s and the noise v are the Fourier transforms of
their correlation functions r(k).
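The pixel-wise adaptive behaviour described above can be sketched in Python/NumPy in the spirit of MATLAB's wiener2. The exact internals of wiener2 may differ, so treat this as an illustrative sketch of the local mean/variance shrinkage idea rather than a reimplementation:

```python
import numpy as np

def wiener2_like(g, m=3, n=3, noise_var=None):
    """Pixel-wise adaptive Wiener filtering sketch: estimate the local mean
    mu and variance var in an m-by-n neighbourhood, then compute
        f = mu + max(var - v, 0) / var * (g - mu)
    where v is the noise variance, taken as the average of all the local
    variances when it is not supplied."""
    g = g.astype(np.float64)
    pm, pn = m // 2, n // 2
    padded = np.pad(g, ((pm, pm), (pn, pn)), mode='edge')
    mu = np.zeros_like(g)
    sq = np.zeros_like(g)
    for s in range(m):                      # accumulate window sums
        for t in range(n):
            block = padded[s:s + g.shape[0], t:t + g.shape[1]]
            mu += block
            sq += block ** 2
    mu /= m * n
    var = sq / (m * n) - mu ** 2            # local variance
    v = var.mean() if noise_var is None else noise_var
    num = np.maximum(var - v, 0.0)          # flat areas shrink toward mu
    den = np.maximum(var, 1e-12)            # guard against division by zero
    return mu + num / den * (g - mu)
```

Where the local variance is no larger than the noise variance the output collapses to the local mean (strong smoothing); where it is much larger (edges) the pixel passes through nearly unchanged.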
pixel. From the list of neighbour pixels, the minimum or maximum value is found and
stored as the corresponding resulting value. Finally, each pixel in the image is replaced by
the resulting value generated for its associated neighbourhood. Applied alternately, the max
and min filters can remove certain kinds of noise, such as salt-and-pepper noise, very
effectively.
Max filter:

f(x, y) = max_{(s,t)∈S_xy} {g(s, t)}                                (8)

Min filter:

f(x, y) = min_{(s,t)∈S_xy} {g(s, t)}                                (9)

The max filter is good for pepper noise and the min filter is good for salt noise.
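Both order filters of Eqs. (8) and (9) fit one Python/NumPy sketch, parameterised by the order statistic (an illustrative implementation; edge padding is an assumption):

```python
import numpy as np

def order_filter(g, m=3, n=3, op=np.max):
    """Eqs. (8)/(9): replace each pixel by the max (pass op=np.max) or the
    min (pass op=np.min) of its m-by-n neighbourhood S_xy."""
    pm, pn = m // 2, n // 2
    padded = np.pad(g, ((pm, pm), (pn, pn)), mode='edge')
    out = np.empty_like(g, dtype=np.float64)
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            out[i, j] = op(padded[i:i + m, j:j + n])
    return out
```

A 0-valued pepper pixel disappears under the max filter, and a 255-valued salt pixel disappears under the min filter, as the text states.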
filter. Like other convolutions it is based around a kernel, which represents the shape and
size of the neighbourhood to be sampled when calculating the mean.
The result is not a significant improvement in noise reduction and, furthermore, the
image is now very blurred. There are two main problems with mean filtering:
A single pixel with a very unrepresentative value can significantly affect the mean
value of all the pixels in its neighbourhood.
When the filter neighbourhood straddles an edge, the filter will interpolate new
values for pixels on the edge and so will blur that edge. This may be a problem if
sharp edges are required in the output.
Both of these problems are tackled by the median filter, which is often a better filter
for reducing noise than the mean filter, but which takes longer to compute. In general the
mean filter acts as a low-pass frequency filter and therefore reduces the spatial intensity
derivatives present in the image.
f(x, y) = (1/mn) · Σ_{(s,t)∈S_xy} g(s, t)                           (10)
C(i, j) = (I_max(i, j) − I_min(i, j)) / (I_max(i, j) + I_min(i, j) + ε)   (12)

where ε is a positive but infinitely small number that is added in case the local maximum is
equal to 0, and I_max and I_min are the maximum and minimum intensities within a local
neighbourhood. Compared with Bernsen's contrast, this local image contrast introduces a
normalization factor (the denominator) to compensate for the image variation within the
document background.
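The normalized local contrast just described, of the form (max − min) / (max + min + eps) over a small neighbourhood, can be sketched as follows (a Python/NumPy illustration; the 3x3 window and the eps value are assumptions):

```python
import numpy as np

def local_contrast(img, w=3, eps=1e-8):
    """Normalized local image contrast: (max - min) / (max + min + eps)
    over a w-by-w neighbourhood; eps guards against a zero local maximum."""
    pad = w // 2
    padded = np.pad(img.astype(np.float64), pad, mode='edge')
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + w, j:j + w]
            out[i, j] = (win.max() - win.min()) / (win.max() + win.min() + eps)
    return out
```

Flat background regions score near 0 while text stroke boundaries score near 1, regardless of whether the background is dark or bright, which is the point of the normalizing denominator.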
The image gradient has been widely used for edge detection Ziou et al. (1988) and it
can be used to detect the text stroke edges of the document images effectively that have a
uniform document background. On the other hand, it often detects many non stroke edges
from the background of degraded document that often contains certain image variations due
to noise, uneven lighting, bleed-through, etc. To extract only the stroke edges properly, the
image gradient needs to be normalized to compensate the image variation within the
document background Su et al. (2010).
where C(i, j) denotes the local contrast of Eq. (12) and the second term represents the local
gradient normalized to [0, 1]. The local window size is set to 3. α is the weight between
local contrast and local gradient, and is controlled by the image statistical information:

α = γ · (Std/128)^γ                                                 (14)

where Std is the document image intensity standard deviation and γ is a predefined
parameter in [0, ∞).
3.4.6 Thresholding
Thresholding separates the text from the background based on the grey-level
distribution Vala et al. (2013). Based on the calculation of the threshold value, there are
three types of thresholding methods:
3.4.6.1 Global Thresholding
The global thresholding technique computes an optimal threshold for the entire
image Jagroop Kaur et al. (2014). It works well for the simple cases, but fails for the
images with complex background and uneven illuminations. They are generally based on
histogram analysis Kasar et al. (2007). They work well for images with separated
foreground and background classes.
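Since the proposed algorithm uses the Otsu threshold, the histogram analysis behind global thresholding is worth making concrete. The sketch below is a plain Python/NumPy version of Otsu's method (MATLAB exposes the same idea as graythresh); it maximises the between-class variance over all candidate thresholds:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: pick the threshold that maximises the between-class
    variance of the grey-level histogram of a uint8 image."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                       # grey-level probabilities
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()       # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

# binarization: for dark text on a light background, foreground pixels are
# those below the global threshold, e.g.  binary = img < otsu_threshold(img)
```

This is exactly why global thresholding works for well-separated foreground/background classes and fails when the histogram modes overlap under uneven illumination.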
of the background of the document image and keep only the part containing the
foreground. A second step aims to refine the image obtained from the previous step in
order to obtain a better result Kaur et al. (2014).
3.4.7 Canny Edge Detection
The Canny edge detector uses a multi-stage algorithm to detect a wide range of
edges in images. It has a good localization property, marking edges close to the real edge
locations in the image being processed Jagtap et al. (2015). In addition, the Canny edge
detector uses two adaptive thresholds and is more tolerant to different imaging artifacts
such as shading.
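The role of the two thresholds can be illustrated with a deliberately simplified Canny-style sketch in Python/NumPy: Sobel gradient magnitude, double thresholding, and hysteresis. The real Canny detector also applies Gaussian smoothing and non-maximum suppression, which are omitted here for brevity, and the low/high values are arbitrary:

```python
import numpy as np

def canny_like_edges(img, low=50, high=150):
    """Simplified Canny-style edge sketch: Sobel gradient magnitude, then
    double thresholding with hysteresis (weak edges kept only when they
    connect to a strong edge)."""
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    # Sobel responses on the interior pixels
    gx[1:-1, 1:-1] = (img[1:-1, 2:] - img[1:-1, :-2]) * 2 \
        + img[:-2, 2:] - img[:-2, :-2] + img[2:, 2:] - img[2:, :-2]
    gy[1:-1, 1:-1] = (img[2:, 1:-1] - img[:-2, 1:-1]) * 2 \
        + img[2:, 2:] - img[:-2, 2:] + img[2:, :-2] - img[:-2, :-2]
    mag = np.hypot(gx, gy)
    strong = mag >= high
    weak = (mag >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:        # hysteresis: grow strong edges into weak pixels
        grown = np.zeros_like(edges)
        grown[1:-1, 1:-1] = (edges[:-2, 1:-1] | edges[2:, 1:-1] |
                             edges[1:-1, :-2] | edges[1:-1, 2:] |
                             edges[:-2, :-2] | edges[:-2, 2:] |
                             edges[2:, :-2] | edges[2:, 2:])
        new_edges = edges | (grown & weak)
        changed = bool((new_edges != edges).any())
        edges = new_edges
    return edges
```

The low threshold alone would accept many spurious responses, while the high threshold alone would fragment strokes; keeping weak pixels only when they touch strong ones is what gives Canny its tolerance to shading.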
Chapter 4
4.1 General Discussion
This chapter includes the results obtained from the simulation. The proposed
algorithm is used to provide more clarity than the previous work. In this chapter the results
of all the intermediate steps of the proposed method are highlighted. Implementation is
done using the MATLAB simulation tool, and the experimental results of the intermediate
steps show the efficiency of the proposed approach.
4.2 Performance Metrics
In my study I want to check the performance of my proposed algorithm with
different filters. In the end, a comparison between the different filters is made on the basis
of the different performance metrics and the results are evaluated. These metrics are as
follows:
4.2.3 Accuracy
Accuracy is defined as the ratio of the correctly recognized characters to the sum of
correctly and incorrectly detected and recognized characters, false positives and false
negatives. It describes the closeness of a measurement to the true value; an accuracy of
100% means that the measured values are exactly the same as the given values.

Accuracy = (TP + TN) / (TP + TN + FP + FN)                          (3)
4.2.5 Precision
Precision in digital image retrieval is the fraction of the retrieved documents that are
relevant to the query.

Precision = TP / (TP + FP)                                          (5)

where TP is the true positive value and FP is the false positive value.
4.2.6 F-Measure
The F-Measure is the harmonic mean of the recall (R) and precision (P) values,
F = 2RP / (R + P). This metric measures how well the proposed algorithm retrieves the
desired pixels.
4.2.7 Specificity
Specificity is the number of true negative results divided by the sum of the numbers
of true negative and false positive results, TN / (TN + FP).
A true positive is correctly identified data in the image, a false positive is
incorrectly identified data, a true negative is correctly rejected data, and a false
negative is incorrectly rejected data.
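The confusion-matrix metrics above can be computed in one place given a binarization result and its ground truth. The following Python/NumPy helper is an illustration of the definitions (the thesis computes these in MATLAB; the dictionary layout is a choice made here):

```python
import numpy as np

def binarization_metrics(result, truth):
    """Confusion-matrix metrics used in this chapter; `result` and `truth`
    are boolean arrays in which True marks text (foreground) pixels."""
    tp = np.sum(result & truth)        # correctly identified text
    fp = np.sum(result & ~truth)       # incorrectly identified text
    fn = np.sum(~result & truth)       # incorrectly rejected text
    tn = np.sum(~result & ~truth)      # correctly rejected background
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'precision': precision,
        'recall': recall,
        'f_measure': f_measure,
        'specificity': tn / (tn + fp) if tn + fp else 0.0,
    }
```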
4.3
Following are the output images which are obtained by applying the different
filtering methods on the proposed algorithm for the input image.
Fig. 4.2: Text extraction from vehicle license plate using adaptive filter
Fig. 4.2 shows the text extracted from the input image by using adaptive filter method on
proposed algorithm.
Fig. 4.3: Text extraction from vehicle license plate using average filter
Fig. 4.3 shows the text extracted from the input image by using average filter method
on proposed algorithm.
Fig. 4.4: Text extraction from vehicle license plate using maximum filter
Fig. 4.4 shows the text extracted from the input image by using maximum filter
method on proposed algorithm.
Fig. 4.5: Text extraction from vehicle license plate after applying median filter
Fig. 4.5 shows the text extracted from the input image by using median filter method
on proposed algorithm.
Fig. 4.6: Text extraction from vehicle license plate using minimum filter
Fig. 4.6 shows the text extracted from the input image by using minimum filter
method on proposed algorithm.
Fig. 4.7: Text extraction from vehicle license plate using trimmed filter
Fig. 4.7 shows the text extracted from the input image by using trimmed filter
method on proposed algorithm.
Fig. 4.8: Text extraction from vehicle license plate using wiener filter
Fig. 4.8 shows the text extracted from the input image by using wiener filter method
on proposed algorithm.
Table 4.1: Summary of the results for the input image of a vehicle license plate

Filtering Methods | Accuracy | DRD     | F-Measure | NRM    | Precision | PSNR     | Specificity
Adaptive Filter   | 0.9572   | 13.4257 | 0.5143    | 0.4987 | 0.8056    | 13.6855  | 1.0000
Average Filter    | 0.9547   | 12.1345 | 0.2518    | 0.4994 | 0.3191    | 13.4358  | 0.9999
Maximum Filter    | 0.9540   | 11.2602 | 0.3804    | 0.4991 | 0.3966    | 13.3770  | 0.9999
Median Filter     | 0.9559   | 11.1076 | 0.1898    | 0.4995 | 0.7456    | 13.5521  | 1.0000
Minimum Filter    | 0.9618   | 11.2403 | 1.6104    | 0.4960 | 0.8454    | 14.1767  | 0.9999
Trimmed Filter    | 0.9522   | 12.7740 | 0.3974    | 0.4991 | 0.2717    | 13.2055  | 0.9997
Wiener Filter     | 0.9523   | 12.1693 | 0.1914    | 0.4996 | 0.2727    | 13.21114 | 0.9999
Table 4.1 shows the performance values calculated by multiple filters for given input
image of vehicle license plate. These values are further analyzed with the help of the graphs.
Fig. 4.9 shows that in terms of accuracy the minimum filter gives the best result and
the trimmed filter the worst result for the input image of the vehicle license plate. After the
minimum filter, the filter which shows the better output is the adaptive filter in comparison
to the other filters.
Fig. 4.10: Precision values of different filters for vehicle license plate
Fig. 4.10 shows that the precision values for the minimum filter are close to 1,
which implies that for the given input image of the vehicle license plate the minimum filter
is more efficient for text extraction in binary form in comparison to the other filters.
Fig. 4.11: Negative rate metric of different filters for vehicle license plate
Fig. 4.11 shows that for the input image of the vehicle license plate there is least
pixel mismatch for the minimum filter and maximum pixel mismatch for the Wiener filter.
Fig. 4.12: Distance reciprocal distortion metric of different filters for vehicle license plate
Fig. 4.12 shows that the median filter provides the best visual quality and maintains
good text stroke for the input image, while the adaptive filter gives the highest value of
DRD, which means that the adaptive filter is not able to detect text properly in the input
image.
Fig. 4.13: F-Measure values of different filters for vehicle license plate
Fig. 4.13 shows that the minimum filter possesses the highest value of the
F-Measure, indicating that the binarized image and the input image are equivalent. It also
implies that the precision and recall values of the binarized image are high.
Fig. 4.14: PSNR values of different filters for vehicle license plate
Fig. 4.14 shows that for the given input image of the vehicle license plate the
minimum filter gives a high PSNR value, which indicates good image quality and less error
introduced in the output image, whereas the trimmed filter gives the lowest PSNR value,
which means the quality of the input image is degraded in comparison to the other filters.
Fig. 4.15: Specificity values of different filters for vehicle license plate
Fig. 4.15 shows that for the adaptive filter and the median filter the proposed
algorithm is able to reject the pixels of the input image which do not contain text data more
accurately in comparison to the other filters.
Fig. 4.16: Input image is the logo of Western Union bank
Following are the output images which are obtained by applying the different
filtering methods on the proposed algorithm for the input image.
Fig. 4.17 Text extraction from logo of western union bank after applying adaptive filter
Fig. 4.17 shows the text extracted from the input image by using adaptive filter
method on proposed algorithm.
Fig. 4.18: Text extraction from input image after applying average filter
Fig. 4.18. shows the text extracted from the input image after using average filter
method on proposed algorithm.
Fig. 4.19: Text extraction from input image after applying maximum filter
Fig. 4.19 shows the text extracted from the input image after using maximum filter
method on proposed algorithm.
Fig. 4.20: Text extraction from logo of western union bank image after applying median filter
Fig. 4.20 shows the text extracted from the input image after using median filter
method on proposed algorithm.
Fig. 4.21: Text extraction from logo of western union bank image after applying minimum filter
Fig. 4.21 shows the text extracted from the input image after using minimum filter
method on proposed algorithm.
Fig. 4.22: Text extraction from logo of western union bank image after applying trimmed filter
Fig. 4.22 shows the text extracted from the input image after using trimmed filter
method on proposed algorithm.
Fig. 4.23: Text extraction from logo of western union bank image after applying wiener filter
Fig. 4.23 shows the text extracted from the input image after using wiener filter
method on proposed algorithm.
Table 4.2: Summary of the results for the input image of the logo of Western Union bank

Filtering Methods | Accuracy | DRD     | F-Measure | NRM    | Precision | PSNR    | Specificity
Adaptive Filter   | 0.9260   | 16.0210 | 0.0825    | 0.4995 | 0.5479    | 11.3098 | 1.0000
Average Filter    | 0.9190   | 19.5042 | 0.3233    | 0.4992 | 0.5758    | 10.4877 | 0.9999
Maximum Filter    | 0.9183   | 19.7458 | 0.8218    | 0.4980 | 0.5657    | 10.8766 | 0.9998
Median Filter     | 0.9272   | 16.9677 | 0.1256    | 0.4997 | 1.0000    | 11.3803 | 1.0000
Minimum Filter    | 0.9484   | 15.1737 | 1.7071    | 0.4815 | 1.0000    | 10.8774 | 1.0000
Trimmed Filter    | 0.9193   | 14.9850 | 0.5854    | 0.4837 | 0.4026    | 10.9826 | 0.9996
Wiener Filter     | 0.9273   | 16.5714 | 0.3765    | 0.4991 | 0.5000    | 11.3864 | 0.9999
Fig. 4.24: Accuracy of different filters for the logo of Western Union bank
Fig. 4.24 shows that in terms of accuracy the minimum filter gives the best result
and the maximum filter the worst result for the input image of the Western Union bank
logo. The minimum filter extracts the complete text present in the image, whereas with the
maximum filter most of the text is not extracted.
Fig. 4.25: Distance reciprocal distortion metric of different filters for the logo of Western Union bank
Fig. 4.25 shows that the trimmed filter provides the best visual quality and
maintains good text stroke for the input image, while the maximum filter gives the highest
value of DRD, which means that the maximum filter is not able to detect text properly in
the input image.
Fig. 4.26: F-Measure of different filters for the logo of Western Union bank
Fig. 4.26 shows that the minimum filter possesses the highest value of the
F-Measure, indicating that the binarized image and the input image are equivalent. It also
implies that the precision and recall values of the binarized image are high.
Fig. 4.27: Negative rate metric of different filters for the logo of Western Union bank
Fig. 4.27 shows that for the input image of the Western Union bank logo there is
least pixel mismatch for the minimum filter and maximum pixel mismatch for the median
filter.
Fig. 4.28: Precision of different filters for the logo of Western Union bank
Fig. 4.28 shows that the precision values for the minimum and median filters are
equal to 1, which implies that for the given input image the minimum and median filters are
more efficient for text extraction in binary form in comparison to the other filters.
Fig. 4.29: PSNR of different filters for the logo of Western Union bank
Fig. 4.29 shows that the Wiener filter gives the highest PSNR value, which indicates
good image quality and less error introduced in the output image, whereas the average filter
gives the lowest PSNR value, which means the quality of the input image is degraded in
comparison to the other filters.
Fig. 4.30: Specificity of different filters for the logo of Western Union bank
Fig. 4.30 shows that for the adaptive, median and minimum filters the proposed
algorithm is able to reject the pixels of the input image which do not contain text data more
accurately in comparison to the other filters.
The third input image is a DVD cover which contains a large amount of text information.
The image contains a background of different colours and text of different font sizes,
alignments and shapes. The input image is corrupted and its quality is degraded by
Gaussian noise. Firstly, most of the noise is removed by an effective filtering algorithm;
finally, the proposed algorithm searches for the text in the cover and extracts it.
Following are the output images of text extraction which are obtained by applying
the different filtering methods on the proposed algorithm for the input image.
Fig. 4.35 shows the text extracted from the input image by using median filter
method on proposed algorithm.
Fig. 4.37 shows the text extracted from the input image by using trimmed filter
method on proposed algorithm.
Fig. 4.38 shows the text extracted from the input image by using wiener filter
method on proposed algorithm.
Table 4.3: Summary of the results for the DVD cover having Gaussian noise

Filtering Methods | Accuracy | DRD      | F-Measure | NRM    | Precision | PSNR    | Specificity
Adaptive Filter   | 0.6967   | 20.4257  | 3.8455    | 0.5477 | 0.0423    | 11.0592 | 0.7309
Average Filter    | 0.9216   | 8.5363   | 0.0195    | 0.4999 | 0.2857    | 12.3968 | 1.0000
Maximum Filter    | 0.9424   | 8.5363   | 0.0265    | 0.4997 | 0.1642    | 12.3968 | 0.8997
Median Filter     | 0.9482   | 11.0706  | 0.1618    | 0.4997 | 0.1642    | 12.8593 | 0.9998
Minimum Filter    | 0.8952   | 18.7968  | 4.2457    | 0.5045 | 0.0532    | 9.7968  | 0.9557
Trimmed Filter    | 0.9434   | 12.1343  | 0.0538    | 0.4991 | 0.3077    | 12.4681 | 1.0000
Wiener Filter     | 0.9410   | 12.28860 | 0.1806    | 0.4997 | 0.1429    | 12.2891 | 0.9895
Table 4.3 shows the performance values calculated for the given input image of the DVD cover, contaminated by Gaussian noise, using the different filters. These values are analyzed further with the help of graphs.
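The pixel-classification metrics tabulated above can all be derived from pixel-wise counts of true/false positives and negatives between the binarized output and a ground-truth image. The following sketch shows one common way to compute them; treating text pixels (value 0) as the positive class is an assumption of this example, not necessarily the thesis's exact convention.

```python
def binarization_metrics(pred, truth):
    """Pixel-wise evaluation of a binarized image against ground truth
    (0 = text, 255 = background). Text pixels are the positive class."""
    tp = fp = tn = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if t == 0:                      # ground-truth text pixel
                if p == 0: tp += 1
                else:      fn += 1
            else:                           # ground-truth background pixel
                if p == 0: fp += 1
                else:      tn += 1
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp) if tp + fp else 0.0
    recall      = tp / (tp + fn) if tp + fn else 0.0
    f_measure   = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    # Negative rate metric: mean of the false-negative and false-positive rates.
    nrm = ((fn / (fn + tp) if fn + tp else 0.0) +
           (fp / (fp + tn) if fp + tn else 0.0)) / 2
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_measure": f_measure, "specificity": specificity, "nrm": nrm}

# Tiny example: one text pixel found, one missed, background fully correct.
truth = [[0, 0, 255, 255]]
pred  = [[0, 255, 255, 255]]
m = binarization_metrics(pred, truth)
```

With these definitions a lower NRM means fewer pixel mismatches, while higher accuracy, F-measure, precision and specificity are better, matching the way the graphs below are read.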
4.3.3.1 Analysis of Calculated Values
In the following figures the numbers 1, 2, 3, 4, 5, 6 and 7 represent the adaptive, average, maximum, median, minimum, trimmed and Wiener filters respectively.
Fig. 4.39: Accuracy of different filters for cover of DVD having Gaussian noise
Fig. 4.39 shows that the adaptive filter gives the worst accuracy for the input image of the DVD cover. After the minimum filter, the filter showing the better output is the trimmed filter in comparison with the other filters.
Fig. 4.40: Distance reciprocal distortion metric of different filters for cover of DVD having Gaussian noise
Fig. 4.40 shows that the maximum filter provides the best visual quality and detects as much text as possible in the input image, while the minimum filter gives the highest DRD value, which means it is not able to detect the text properly.
Fig. 4.41: F-Measure of different filters for cover of DVD having Gaussian noise
Fig. 4.41 shows that the minimum filter possesses the highest value of F-Measure, which indicates that its binarized image and the input image are close to equivalent; it also implies that the precision and recall of the binarized image are comparatively high.
Fig. 4.42: Negative rate metric of different filters for cover of DVD having Gaussian noise
Fig. 4.42 shows that for the input image of the DVD cover there is the least pixel mismatch for the median filter and the maximum pixel mismatch for the average filter.
Fig. 4.43: Precision of different filters for cover of DVD having Gaussian noise
Fig. 4.43 shows that the precision values for the trimmed filter are the highest. This implies that for the given input image the trimmed filter is more efficient for extracting text in binary form than the other filters.
Fig. 4.44: PSNR values of different filters for cover of DVD having Gaussian noise
Fig. 4.44 shows that for the given input image the median filter gives a high PSNR value, indicating good image quality; however, in the process of removing the noise the median filter was not able to preserve the edges of the text in the output image.
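PSNR itself follows directly from the mean squared error between the reference and the filtered image, PSNR = 10·log10(peak² / MSE). A small illustrative helper (not code from the thesis):

```python
from math import log10

def psnr(reference, filtered, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized
    grayscale images given as lists of rows of pixel values."""
    n = 0
    sq_err = 0
    for rrow, frow in zip(reference, filtered):
        for r, f in zip(rrow, frow):
            sq_err += (r - f) ** 2
            n += 1
    mse = sq_err / n
    if mse == 0:
        return float("inf")        # identical images
    return 10 * log10(peak ** 2 / mse)

# A uniform error of 10 grey levels gives MSE = 100 -> roughly 28.13 dB.
ref   = [[100] * 4 for _ in range(4)]
noisy = [[110] * 4 for _ in range(4)]
value = psnr(ref, noisy)
```

Because PSNR averages the error over all pixels, a filter can score well here while still blurring thin text strokes, which is exactly the median-filter behaviour noted above.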
Fig. 4.45: Specificity of different filters for cover of DVD having Gaussian noise
Fig. 4.45 shows that for the average and trimmed filters the proposed algorithm is able to reject the pixels of the input image that do not contain text data more accurately than the other filters.
The quality of the fourth input image is degraded by SN. First, most of the noise is removed by an effective filtering algorithm; the proposed algorithm then searches for the text in the image and extracts it. Fig. 4.46 shows the input image.
The following output images of text extraction are obtained by applying the different filtering methods within the proposed algorithm to the input image.
Fig. 4.47 shows the text extracted from the input image using the adaptive filter with the proposed algorithm.
Fig. 4.48: Text extraction from input image having SN using average filter
Fig. 4.48 shows the text extracted from the input image using the average filter.
Fig. 4.49 shows the text extracted from the input image using the maximum filter.
Fig. 4.50: Text extraction from input image having SN using median filter
Fig. 4.50 shows the text extracted from the input image using the median filter.
Fig. 4.51 shows the text extracted from the input image using the minimum filter.
Fig. 4.52 shows the text extracted from the input image using the trimmed filter.
Fig. 4.53 shows the text extracted from the input image using the Wiener filter.
Table 4.4: Summary of the result for input image having SN

Filtering Method   Accuracy   DRD       F-Measure   NRM      Precision   PSNR      Specificity
Adaptive Filter    0.7132     12.2243   0.0124      0.5161   0.6385      5.4237    0.7351
Average Filter     0.9308     12.4802   0.0110      0.5000   0.7178      11.6019   1.0000
Maximum Filter     0.9594     9.4423    0.0453      0.4997   0.9134      12.9623   1.0000
Median Filter      0.9583     9.4602    0.0183      0.5000   0.9032      13.7968   0.9998
Minimum Filter     0.9460     9.8490    0.0389      0.5234   0.5321      13.7968   0.9998
Trimmed Filter     0.9529     12.1343   0.0434      0.5000   0.9130      13.2699   1.0000
Wiener Filter      0.9580     9.8500    0.0307      0.4997   0.8645      13.0413   0.9999
Table 4.4 shows the performance values calculated for the given input image, whose quality is degraded by SN, using the different filters. These values are analyzed further with the help of graphs.
4.3.4.1 Analysis of Calculated Values
In the following figures the numbers 1, 2, 3, 4, 5, 6 and 7 represent the adaptive, average, maximum, median, minimum, trimmed and Wiener filters respectively.
Fig. 4.54: Accuracy of different filters for input image having SN
Fig. 4.54 shows that in terms of accuracy the maximum filter gives the best result in preserving the edges of the text and the adaptive filter the worst for the input image. After the maximum filter, the filter showing the better output is the trimmed filter in comparison with the other filters.
Fig. 4.55: Distance reciprocal distortion metric of different filters for input image having SN
Fig. 4.55 shows that the maximum filter provides the best visual quality and detects as much text as possible in the input image, while the adaptive filter gives the highest DRD value, which means it is not able to detect the text properly.
Fig. 4.56: F-Measure of different filters for input image having SN
Fig. 4.56 shows that the maximum filter possesses the highest value of F-Measure, which indicates that its binarized image and the input image are close to equivalent; it also implies that the precision and recall of the binarized image are comparatively high.
Fig. 4.57: Negative rate metric of different filters for input image having SN
Fig. 4.57 shows that for the input image there is the least pixel mismatch for the maximum filter and the maximum pixel mismatch for the minimum filter.
Fig. 4.58: Precision of different filters for input image having SN
Fig. 4.58 shows that the precision values for the maximum and trimmed filters are the highest. This implies that for the given input image the maximum and trimmed filters are more efficient for extracting text in binary form than the other filters.
Fig. 4.59: PSNR values of different filters for input image having SN
Fig. 4.59 shows that for the given input image the minimum filter gives a high PSNR value, indicating good image quality; however, in the process of removing the noise the minimum filter was not able to preserve the edges of the text in the output image.
Fig. 4.60: Specificity of different filters for input image having SN
Fig. 4.60 shows that for the average, maximum and trimmed filters the proposed algorithm is able to reject the pixels of the input image that do not contain text data more accurately than the other filters.
In the fifth input image the text is written over a background that contains several different colours. The input image is corrupted and its quality is degraded by salt-and-pepper noise (SPN). First, most of the noise is removed by an effective filtering algorithm; the proposed algorithm then searches for the text in the image and extracts it.
Fig. 4.61: The input image having salt and pepper noise
The following output images of text extraction are obtained by applying the different filtering methods within the proposed algorithm to the input image.
Fig. 4.62 shows the text extracted from the input image using the adaptive filter with the proposed algorithm.
Fig. 4.65 shows the text extracted from the input image using the median filter.
Fig. 4.68 shows the text extracted from the input image using the Wiener filter.
Table 4.5: Summary of the result for input image having SPN

Filtering Method   Accuracy   DRD       F-Measure   NRM      Precision   PSNR      Specificity
Adaptive Filter    0.5868     26.4197   0.0744      0.6215   0.0402      3.8387    0.6850
Average Filter     0.9293     10.0979   0.0108      0.5000   0.9543      11.5052   0.7689
Maximum Filter     0.8229     25.2479   0.0086      0.5000   0.7654      7.5176    0.8999
Median Filter      0.9649     9.1618    0.1572      0.4996   1.0000      13.0567   1.0000
Minimum Filter     0.7189     27.7578   0.0217      0.4999   0.6654      5.5108    0.4563
Trimmed Filter     0.9505     9.6301    0.0462      0.4999   0.9995      13.0567   1.0000
Wiener Filter      0.8313     17.1161   0.0045      0.5000   0.7655      7.7289    0.9992
Table 4.5 shows the performance values calculated for the given input image, whose quality is degraded by SPN, using the different filters. These values are analyzed further with the help of graphs.
4.3.5.1 Analysis of Calculated Values
In the following figures the numbers 1, 2, 3, 4, 5, 6 and 7 represent the adaptive, average, maximum, median, minimum, trimmed and Wiener filters respectively.
Fig. 4.69: Accuracy of different filters for input image having SPN
Fig. 4.69 shows that in terms of accuracy the median filter gives the best result in preserving the edges of the text and the minimum filter the worst for the input image. After the median filter, the filter showing the better output is the trimmed filter in comparison with the other filters.
Fig. 4.70: Distance reciprocal distortion metric of different filters for input image having SPN
Fig. 4.70 shows that the median filter provides the best visual quality and detects as much text as possible in the input image, while the minimum filter gives the highest DRD value, which means it is neither able to detect the text properly nor effective at removing the noise.
Fig. 4.71: F-Measure of different filters for input image having SPN
Fig. 4.71 shows that the median filter possesses the highest value of F-Measure, which indicates that its binarized image and the input image are nearly equal. The Wiener filter possesses the lowest value of F-Measure, which means the text is not clearly identified in its output.
Fig. 4.72: Negative rate metric of different filters for input image having salt and pepper noise
Fig. 4.72 shows that for the input image there is the least pixel mismatch for the median filter and the maximum pixel mismatch for the adaptive filter.
Fig. 4.73: PSNR values of different filters for input image having SPN
Fig. 4.73 shows that for the given input image the median filter gives a high PSNR value, indicating good image quality; however, in the process of removing the noise the median filter was not able to preserve and detect the edges of small text in the output image.
Fig. 4.74: Specificity of different filters for input image having SPN
Fig. 4.74 shows that for the median and trimmed filters the proposed algorithm is able to reject the pixels of the input image that do not contain text data more accurately than the other filters.
Fig. 4.75: Precision of different filters for input image having SPN
Fig. 4.75 shows that the precision values for the median and trimmed filters are the highest. This implies that for the given input image the median and trimmed filters are more efficient for extracting text in binary form than the other filters.
Chapter 5
Conclusion
5.1 Summary
This thesis considered various issues and challenges of digital image processing that prevent its widespread use in applications. Denoising and the extraction of text from images are among the main problems in digital image processing, so this study focused on overcoming these issues. To achieve this objective, various design issues were studied and the previous research in this direction was analyzed. A comparative study of filtering algorithms such as the Wiener, adaptive, average, minimum, maximum and median filters is also provided, which should be useful to other researchers working in the same field. Building on this analysis, an algorithm is proposed for extracting text from images: noise is first removed from the input image using different filters, and the text is then extracted from the filtered image. The output image contains the text in black on a white background. The proposed algorithm has been simulated in MATLAB; through various experiments the performance of the filters on the input images was analyzed and results were calculated according to the defined performance metrics.
5.2 Conclusion
In this thesis the performance of several filters on complex digital images has been analyzed for text extraction. The maximum filter shows the best result for images whose quality is degraded by SN, while for images with SPN the median filter shows the best performance in terms of the simulation parameters.
The proposed method retains the useful textual information more
accurately and thus, has a wider range of applications compared to other
conventional methods.
5.3 Future Scope
The results in this thesis provide a strong foundation for future work on hardware design. All of the analysis presented here involved exhaustive simulations; realizing the algorithm in hardware is left as future work. This work was limited to the Wiener, adaptive, average, minimum, maximum and median filters, although the set of candidate filters can be extended to other types, such as a hybrid filter combining the Wiener and median filters, or a bilateral filter. Special attention must also be paid to the threshold-calculating algorithm. A global threshold method was used in this research; further work can be done using local and hybrid threshold methods. Better text extraction can be obtained if the best methods are selected for both noise removal and threshold estimation.
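To make the global-versus-local distinction concrete, a mean-based local threshold can be sketched as follows. This is a hypothetical illustration of the suggested direction, not a method from this thesis: the window size (`radius`) and `bias` offset are invented parameters for the example.

```python
def local_threshold(img, radius=1, bias=0):
    """Mean-based local thresholding: each pixel is compared against the
    mean of its (2*radius+1)^2 neighbourhood (edges replicated).
    Pixels darker than the local mean minus 'bias' become text (0)."""
    h, w = len(img), len(img[0])
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-radius, radius + 1)
                    for dx in range(-radius, radius + 1)]
            if img[y][x] < sum(vals) / len(vals) - bias:
                out[y][x] = 0       # text pixel
    return out

# Uneven illumination: the left half is dark, the right half bright.
# A single global threshold cannot separate text in both halves,
# but a local threshold adapts to each neighbourhood.
shaded = [
    [ 60,  60,  60, 220, 220],
    [ 60,  10,  60, 220, 150],
    [ 60,  60,  60, 220, 220],
]
binary = local_threshold(shaded)
```

Here the dark stroke pixel (10) in the dim region and the relatively dark pixel (150) in the bright region are both classified as text; a global threshold of, say, 128 would have missed the 150 pixel entirely.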
MATLAB was used to simulate all results of the proposed algorithm. The same work can also be simulated in OpenCV-Python for further analysis and comparison.