
CHAPTER 1

INTRODUCTION

1.1 OVERVIEW
Paper has been the traditional medium for printed documents. With the advancement of
digital technology, however, paper documents are gradually being augmented by electronic
documents. Paper documents consist of information printed on paper media. Electronic
documents use predefined digital formats, in which information about both textual and
graphical document elements is recorded along with layout and stylistic data. Both paper and
electronic documents have their own advantages and disadvantages for the user. For example,
information on paper is easy to access but tedious to modify and difficult to store in large
quantities, while electronic documents are well suited to storing large amounts of data and
are easy to modify.
In order to gain the benefits of both media, the user needs to be able to port
information freely between the two formats. This need drives the development of computer
systems capable of accomplishing the inter-conversion, and Automatic Document Conversion
has therefore become increasingly important in many areas of academia, business and
industry. Automatic Document Conversion occurs in two directions: Document Formatting
and Document Image Analysis. The first automatically converts electronic documents to
paper documents; the second converts paper documents to their electronic counterparts.
Document Image Analysis is concerned with the problem of converting document
images into electronic format. This involves the automatic interpretation of text images in
printed documents such as books, reference papers and newspapers. Document Image
Analysis can be defined as the process that performs the overall interpretation of document
images. It is a key area of research for various applications in machine vision and media
processing, including page readers, content-based document retrieval and digital libraries.
There is a considerable amount of text occurring in video that is a useful source of
information, which can be used to improve the indexing of video. The presence of text in a
scene, to some extent, naturally describes its content. If this text information can be
harnessed, it can be used along with the temporal segmentation methods to provide a much
truer form of content-based access to the video data.

Figure 1.1 Example of a documented video image clip

1.2 STATEMENT OF PROBLEM


Text in images and video sequences provides highly condensed information about the
contents of the images or video sequences and can be used for video browsing in a large
video database. Text superimposed on video frames provides supplemental but important
information for video indexing and retrieval. Although text provides important information
about images or video sequences, it is not an easy problem to detect and segment it. The
main difficulties lie in the low resolution of the text and the complexity of the background.
Video frames have very low resolution and suffer from blurring effects due to lossy
compression. Additionally, the background of a video frame can be complex, with many
objects having text-like features. A further problem lies in handling the large amount of
text data in video clip images.

1.3 OBJECTIVE OF THE STUDY


The main objective of this project is to develop an efficient algorithm for the
localization of text data in video image sequences. The implemented project analyzes
existing wavelet transforms for their suitability for isolating text with multiple features,
applies morphological operations to the wavelet coefficients, and makes the isolated text
editable for further modification.

1.4 SCOPE OF STUDY


This project implements an efficient system for the localization of text in given
documented video clips for further applications. The implemented work finds use in video
image processing for enhancement and maintenance. It can be applied in areas of video
image enhancement such as cinematography and video presentation, and is very useful for
the maintenance of video databases in digital libraries.
The following are areas of application of text isolation in video images:
1. Digital library: for maintenance of documented video images in large databases.
2. Data modification: useful for modifying information in video images.
3. Cinematographic applications: for enhancing the document information in movie
video clips.
4. Instant documentation of news and reports: for documenting instant reports and
news matter on paper.

1.5 METHODOLOGY
Many efforts have been made to address the problems of text area detection, text
segmentation and text recognition. Current text localization approaches can be classified
into three categories:
The first category is the connected-component-based method, which can locate text
quickly but has difficulties when text is embedded in a complex background or touches other
graphical objects.
The second category is the texture-based method, which finds it hard to locate accurate
boundaries of text areas and usually yields many false alarms in "text-like" background
texture areas.
The third category is the edge-based method. Generally, analyzing the projection profiles of
edge intensity maps can decompose text regions and efficiently predict the text data in a
given video image clip.
Text regions usually have a special texture because they consist of identical character
components. These components contrast with the background and show a periodic horizontal
intensity variation due to the horizontal alignment of many characters. As a result, text
regions can be segmented using texture features.
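As a sketch of this idea, the horizontal projection profile of a crude edge map can flag rows containing text-like strokes. This is an illustrative toy, not the project's actual detector: the edge threshold and the nested-list grayscale frame are assumptions.

```python
def edge_map(img, thresh=30):
    """Crude edge map: mark pixels where the horizontal intensity jump exceeds thresh."""
    return [[1 if x + 1 < len(row) and abs(row[x + 1] - row[x]) > thresh else 0
             for x in range(len(row))]
            for row in img]

def horizontal_projection(edges):
    """Sum edge pixels along each row; rows of text show up as peaks in the profile."""
    return [sum(row) for row in edges]

# Toy 4x8 grayscale "frame": the middle two rows carry high-contrast, text-like strokes.
frame = [
    [10, 10, 10, 10, 10, 10, 10, 10],
    [10, 200, 10, 200, 10, 200, 10, 10],
    [10, 200, 10, 200, 10, 200, 10, 10],
    [10, 10, 10, 10, 10, 10, 10, 10],
]
profile = horizontal_projection(edge_map(frame))
text_rows = [i for i, v in enumerate(profile) if v > 0]  # rows flagged as text
```

Real systems smooth the profile and threshold it relative to its mean, but the peaks capture exactly the periodic intensity variation described above.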

1.5.1 DOCUMENT IMAGE SEGMENTATION
Document Image Segmentation is the act of partitioning a document image into
separated regions. These regions should ideally correspond to the image entities such as text
blocks and graphical images, which are present in the document image. These entities can
then be identified and processed as required by the subsequent steps of Automated Document
Conversion.
Various methods are described for processing Document Image Segmentation. They
include: Layout Analysis, Geometric Structure Detection/Analysis, Document Analysis,
Document Page Decomposition, Layout Segmentation, etc. Texts in images and video
sequences provide highly condensed information about the contents of the images or video
sequences and can be used for video browsing/retrieval in a large image database. Although
texts provide important information about images or video sequences, it is not easy to detect
and segment out the text data from the documented image.
The difficulty in text extraction is due to the following reasons:
1. The text properties vary randomly with non-uniform distribution.
2. Text present in an image or a video sequence may have different cluttered
backgrounds.
Text localization can be performed with component-based or texture-based methods.
In component-based text localization, text regions are detected by analyzing the edge
components of candidate regions or the homogeneous color/grayscale components that
contain the characters. The texture-based method instead uses texture properties, such as
the curviness of the characters and image, for text isolation. In texture-based document
image analysis, an M-band wavelet transformation is used, which decomposes the image into
M×M band-pass sub-channels so that text regions can be detected easily from the
documented image. The intensity of the candidate text edges is used to recognize the real
text regions in an M-sub-band image.

1.5.2 WAVELET TRANSFORMATION


A digital image is represented as a two-dimensional array of coefficients, each
coefficient representing the intensity level at the corresponding coordinate. Most natural
images have smooth color variations, with the fine details represented as sharp edges in
between the smooth variations. Technically, the smooth variations in color can be termed
low-frequency variations, and the sharp variations high-frequency variations.
The low frequency components (smooth variations) constitute the base of an image,
and the high frequency components (the edges which give the details) add upon them to
refine the image, thereby giving a detailed image. Hence, the smooth variations are more
important than the details.
Separating the smooth variations and details of the image can be performed in many
ways. One way is the decomposition of the image using the discrete wavelet transform.
Digital image compression is based on the ideas of sub-band decomposition or discrete
wavelet transforms. Wavelets, which refer to a set of basis functions, are defined recursively
from a set of scaling coefficients and scaling functions. The DWT is defined using these
scaling functions and can be used to analyze digital images with performance superior to
classical short-time Fourier-based techniques such as the DCT.

1.5.3 MORPHOLOGICAL OPERATION


Mathematical morphology is a tool for extracting image components that are useful
in the representation and description of region shape, such as boundaries, skeletons and the
convex hull. It defines two fundamental morphological operations, dilation and erosion, in
terms of the union or intersection of an image with a translated shape called a structuring
element.

1.6 LIMITATION OF STUDY


This project work implements a text isolation and recognition system for isolating
text characters from a given video sequence. The implementation has certain limitations.
The system gives less accuracy for video images with high-intensity backgrounds, and also
shows less accuracy in text extraction and recognition under occlusion. With highly variable
components in the video sequence, the system produces text isolation with noise.

CHAPTER 2
TEXT EXTRACTION AND RECOGNITION

2.1 INTRODUCTION
Document Image Segmentation is a crucial step in the process of converting paper
document images into electronic documents. Entities in a document image, such as text
blocks and figures, need to be separated before further document analysis and recognition
can occur. Many document segmentation algorithms are designed exclusively for a few
specific document types, utilizing highly specialized document models.
Basically, an independent segmenter does not assume specific document layout
models in its segmentation. The segmenter utilizes a minimal amount of image-domain
knowledge, and entities are extracted from the document images as non-overlapping
sub-images.
The advantages of document image analysis are:
1. Document size:
An ASCII representation of a document page can easily be stored in 2-3 KB, whereas
a typical scanned image of a page may require between 500 KB and 2 MB. If documents are
to be maintained in image form, an efficient compressed representation is essential for
both storage and transmission.
2. Providing efficient access to the compressed image:
Traditional compression techniques used for document images have been successful
in reducing storage requirements but do not provide efficient access to the compressed
data. It is desirable to use a compression method that makes use of a structured
representation of the data, so that it not only allows for rapid transmission but also allows
access to various document components and facilitates processing of documents without
the need for expensive decompression.
3. Readability:
Many lossy compression and progressive transmission techniques use resolution
reduction or texture-preserving methods that can render a document image unreadable. It
is desirable that a document be readable even at the highest levels of lossy compression
and at the start of a progressive transmission. The highly lossy representation can then be
augmented by subsequent transmissions for better rendition.

2.2 OVERVIEW TO DOCUMENT IMAGE PROCESSING
The following gives a summary of document image processing:
1. A paper document is scanned into digital form as a digital document image.
2. Preprocessing filters are applied to reduce noise and image distortion. Binarization of
the image is performed during this stage of processing.
3. Segmentation is performed on the digital document image by splitting it into
discrete graphic entities.
4. The segmented entities are classified into different entity types.
5. Non-text entities are analyzed and any text matter within them is also extracted.
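A minimal sketch of the binarization in step 2, using a global mean threshold. Production systems typically use Otsu's method or adaptive thresholding; the nested-list image representation here is an assumption for illustration.

```python
def binarize(img, thresh=None):
    """Global-threshold binarization: pixels at or above the threshold become 1,
    the rest 0. If no threshold is given, the mean intensity is used."""
    pixels = [p for row in img for p in row]
    if thresh is None:
        thresh = sum(pixels) / len(pixels)
    return [[1 if p >= thresh else 0 for p in row] for row in img]

page = [
    [250, 250, 40, 250],
    [250, 30, 30, 250],
]
binary = binarize(page)  # dark "ink" pixels map to 0, light background to 1
```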

Figure 2.1 shows the overall flow of document image analysis. The fundamental
steps involved are Preprocessing, Feature Extraction, Classification, Text Analysis,
Non-Text Analysis and Text Isolation; the isolated text is then processed further for
different applications.

[Figure 2.1 block diagram: Document Video Image -> Preprocessing -> Feature Extraction ->
Classification -> Text Analysis / Non-Text Analysis -> Text Isolation -> Isolated Text]
Figure 2.1 Fundamental Steps in document image analysis

2.3 VIDEO DOCUMENT TEXT EXTRACTION

In the age of multimedia, video is an increasingly important and common information
medium. However, most current video data is unstructured, i.e. stored and displayed only as
pixels. There is no additional content information such as year of production, starring actors,
director, producer, costume designer, places of shots, or the positions and types of scene
breaks. The usability of raw video is therefore limited, precluding effective and efficient
retrieval. Consider the thousands of MPEG-encoded films on the Internet: beyond the title
and a short description, rarely can any information be found about the content and structure
of these films, making it very difficult to find, for example, specific kinds of films or scenes.
Information on video content would therefore be highly desirable.
Usually, this information has to be generated manually, but manual annotation of
video is very time-consuming and costly. Content-based retrieval and browsing therefore
prompt a demand for automatic video content analysis tools. One important source of
information about videos is the text contained therein. Video images consist of multiple

video frames. A single video frame, as shown in Figure 2.2, can contain several text strings,
each having different features and orientations. This makes the segmentation of text in video
a challenging problem. The output of segmentation is a binary image of the text in each
bounding box, with the text pixels in black and the background pixels in white, as shown in
Figure 2.3.

Figure 2.2 An original Documented Video Image frame

Figure 2.3 Segmented binarized text image

The system realized for text extraction should be capable of binarizing both
artificial caption text and scene text occurring naturally in a video frame. To
accommodate scene text, the module should also be capable of segmenting low-contrast and
unevenly illuminated text, which is quite common in general-purpose video images.

2.4 DISCRETE WAVELET TRANSFORM


The discrete wavelet transform is a very useful tool for signal analysis and image
processing, especially in multi-resolution representation. In image processing, it is difficult to
analyze the information about an image directly from the gray-level intensity of image pixels.
The multi-resolution representation can provide a simple method for exploring the
information about images. The two-dimensional discrete wavelet transform can decompose
an image into four sub-bands at each resolution level: one average sub-band and three
detail-component sub-bands. The detail component sub-bands represent different features
of the image.
Wavelets ψ_{a,b}(x) are functions generated from a mother wavelet ψ by dilations and
translations:

ψ_{a,b}(x) = |a|^{-1/2} ψ((x − b) / a) ---------- (2.1)
The basic idea of the wavelet transform is to represent any function f as a superposition
of wavelets; using weighting coefficients, f can be decomposed as an integral of ψ_{a,b}(x)
over a range of a and b. In a multi-resolution analysis, a scaling function φ(x) is employed
to carry out the multi-resolution processing. The signal is decomposed into approximation
coefficients a_{m,n}(f) and detail coefficients c_{m,n}(f), obtained from a_{m−1,l} using a
low-pass and a high-pass filter in cascade. The two-dimensional decomposition is carried out
by combining two one-dimensional wavelet decompositions.

2.4.1 ONE-DIMENSIONAL DISCRETE WAVELET TRANSFORM


The one-dimensional discrete wavelet transform decomposes an input signal
x = {x_0, x_1, ..., x_{n−1}} into a low-pass sub-band a = {a_0, a_1, ..., a_{n/2−1}} and a
high-pass sub-band c = {c_0, c_1, ..., c_{n/2−1}}. The decomposed components can be
represented as

a_n = Σ_k h_{2n−k} x_k ---------- (2.2)

c_n = Σ_k g_{2n−k} x_k ---------- (2.3)

where h_n and g_n are the low-pass and high-pass filter coefficients respectively.
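For the Haar filter pair, Eqs. (2.2)-(2.3) reduce to pairwise averages and differences. A pure-Python sketch follows; filter indexing conventions vary between texts, so this illustrates the split rather than reproducing any particular implementation.

```python
import math

def haar_dwt_1d(x):
    """One level of the 1-D Haar DWT: each pair of samples yields one low-pass
    (average) coefficient a and one high-pass (difference) coefficient c."""
    s = 1 / math.sqrt(2)
    a = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    c = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return a, c

x = [4.0, 4.0, 2.0, 0.0]
a, c = haar_dwt_1d(x)
# a carries the smooth trend of the signal; c is non-zero only where x changes.
```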

[Figure 2.4 filter bank: Image -> g (high-pass) -> C (detail); Image -> h (low-pass) -> A (approximation)]

Figure 2.4 1-D DWT Decomposition

Figure 2.4 shows the realization of a 1-D DWT filter bank for the decomposition of
the original image into approximation and detail coefficients respectively. An example of
the one-dimensional decomposition of a video image clip is shown in Figure 2.5: the filter
bank extracts the rapidly varying and the slowly varying components of the given image.

Figure 2.5 One Dimensional Decomposition of a video image

2.4.2 TWO-DIMENSIONAL DISCRETE WAVELET TRANSFORM


The two-dimensional discrete wavelet transform can be achieved by two 1-D DWT
operations performed separately on the rows and columns. First, the row operation is
performed to obtain two sub-bands using the 1-D DWT: one low-pass sub-band (L) and one
high-pass sub-band (H), as shown in Figure 2.6.
The 1-D DWT image is transformed again to obtain four sub-bands by another 1-D
DWT operation. Figure 2.6 shows the filter bank realization for the decomposition process of
a 2-D DWT operation. The LL sub-band represents the approximate component of the image
and other three sub-bands (LH, HL and HH) represent the detail components. An example of
2-D DWT on a documented video image is shown in Figure 2.7.
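The rows-then-columns procedure can be sketched in pure Python with the Haar filters; the nested-list image and the power-of-two size are assumptions for illustration.

```python
import math

S = 1 / math.sqrt(2)

def haar_split(v):
    """1-D Haar step: pairwise averages (low-pass half) then differences (high-pass half)."""
    low = [(v[i] + v[i + 1]) * S for i in range(0, len(v) - 1, 2)]
    high = [(v[i] - v[i + 1]) * S for i in range(0, len(v) - 1, 2)]
    return low + high

def haar_dwt_2d(img):
    """One 2-D DWT level: filter every row, then every column of the result.
    LL lands in the top-left quadrant; LH, HL and HH fill the other three."""
    rows = [haar_split(r) for r in img]               # row operation -> L | H
    cols = [haar_split(list(c)) for c in zip(*rows)]  # column operation (on transpose)
    return [list(r) for r in zip(*cols)]              # transpose back

img = [
    [4, 4, 2, 2],
    [4, 4, 2, 2],
    [6, 6, 0, 0],
    [6, 6, 0, 0],
]
coeffs = haar_dwt_2d(img)
ll = [row[:2] for row in coeffs[:2]]  # approximation sub-band (2x each block mean here)
```

Because this toy image is constant on aligned 2×2 blocks, all detail coefficients vanish and the LL quadrant alone reproduces the structure.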

[Figure 2.6 filter bank: Image -> row operation (g, h, each downsampled by 2) -> H, L ->
column operation (g, h, each downsampled by 2) -> HH, HL, LH, LL]

Figure 2.6 2-Dimensional DWT Decomposition

Figure 2.7 2-D DWT decomposition of video image

2.5 WAVELET FAMILIES:

2.5.1 DISCRETE WAVELET TRANSFORM:


The transform of a signal is just another form of representing the signal; it does not
change the information content present in the signal. The Wavelet Transform provides a
time-frequency representation of the signal. It was developed to overcome the shortcomings
of the Short Time Fourier Transform (STFT), which can also be used to analyze
non-stationary signals. While the STFT gives a constant resolution at all frequencies, the
Wavelet Transform uses a multi-resolution technique by which different frequencies are
analyzed with different resolutions.
A wave is an oscillating function of time or space and is periodic. In contrast,
wavelets are localized waves. They have their energy concentrated in time or space and are
suited to analysis of transient signals. While Fourier Transform and STFT use waves
to analyze signals, the Wavelet Transform uses wavelets of finite energy.

Figure 2.8 Demonstrations of (a) a Wave and (b) a Wavelet.

Wavelet analysis is done in a manner similar to STFT analysis: the signal to be analyzed
is multiplied with a wavelet function, just as it is multiplied with a window function in the
STFT, and the transform is then computed for each segment generated. However, unlike the
STFT, in the Wavelet Transform the width of the wavelet function changes with each
spectral component. At high frequencies the Wavelet Transform gives good time resolution
and poor frequency resolution, while at low frequencies it gives good frequency resolution
and poor time resolution.

Figure 2.9 Wavelet families

2.5.2 INTRODUCTION TO WAVELET FAMILIES:


There are a number of basis functions that can be used as the mother wavelet for
Wavelet Transformations. Since the mother wavelet produces, through translation and
scaling, all the wavelet functions used in the transformation, it determines the characteristics
of the resulting Wavelet Transform. Therefore, the details of the particular application should
be taken into account and an appropriate mother wavelet chosen in order to use the Wavelet
Transform effectively.
Figure 2.9 illustrates the commonly used wavelet functions Haar and Daubechies.
The Haar wavelet is one of the oldest and simplest wavelets; therefore, any discussion of
wavelets starts with it. The Daubechies wavelets are the most popular wavelets. They
represent the foundations of wavelet signal processing and are used in numerous
applications. They are also called Maxflat wavelets, as their frequency responses have
maximum flatness at frequencies 0 and π, which is a very desirable property in some
applications. The Haar and Daubechies wavelets are compactly supported orthogonal
wavelets. Wavelets are chosen based on their shape and their ability to analyze the signal in
a particular application.

2.5.3 HAAR WAVELET:
The Haar wavelet is a certain sequence of functions and the first known wavelet. The
sequence was proposed in 1909 by Alfred Haar, who used these functions to give an
example of a countable orthonormal system for the space of square-integrable functions on
the real line. The study of wavelets, and even the term "wavelet", did not come until much
later. The Haar wavelet is also the simplest possible wavelet; its technical disadvantage is
that it is not continuous.

The Haar wavelet's mother wavelet function ψ(t) can be described as

ψ(t) = 1 for 0 ≤ t < 1/2,  −1 for 1/2 ≤ t < 1,  and 0 otherwise,

and its scaling function φ(t) can be described as

φ(t) = 1 for 0 ≤ t < 1, and 0 otherwise.
LIMITATIONS OF HAAR WAVELET:
• Noise removal requires an algorithm that eliminates the noise without disturbing the
rest of the signal.
• The Haar wavelet is discontinuous and does not approximate continuous signals very
well.
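The piecewise definitions of ψ(t) and φ(t) above, and the standard two-scale relation connecting them, can be checked numerically; a minimal sketch:

```python
def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0 <= t < 0.5:
        return 1
    if 0.5 <= t < 1:
        return -1
    return 0

def haar_phi(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1 if 0 <= t < 1 else 0

# The two-scale relation psi(t) = phi(2t) - phi(2t - 1) holds pointwise
# (sampling t = u/8 over [-1, 2) covers both supports and the exterior):
ok = all(haar_psi(u / 8) == haar_phi(u / 4) - haar_phi((u - 4) / 4) for u in range(-8, 16))
```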

2.5.4 DAUBECHIES WAVELET:


The Haar wavelet is the simplest special case of the Daubechies family. The Daubechies
wavelets are a family of orthogonal wavelets defining a discrete wavelet transform. With
each wavelet type of this class there is associated a scaling function (also called the father
wavelet) which generates an orthogonal multi-resolution analysis.
With scaling filter coefficients h_k, and wavelet filter coefficients
g_k = (−1)^k h_{N−1−k}, the two functions satisfy the standard two-scale relations:

Scaling function: φ(t) = √2 Σ_k h_k φ(2t − k)

Wavelet function: ψ(t) = √2 Σ_k g_k φ(2t − k)
In general the Daubechies wavelets are chosen to have the highest number A of
vanishing moments (this does not imply the best smoothness) for a given support width
N = 2A, and among the 2^(A−1) possible solutions the one whose scaling filter has extremal
phase is chosen. The wavelet transform is also easy to put into practice using the fast wavelet
transform. Daubechies wavelets are widely used in solving a broad range of problems, e.g.
self-similarity properties of a signal, fractal problems, and signal discontinuities. The
Daubechies wavelets are not defined in terms of the resulting scaling and wavelet functions;
they are specified by their filter coefficients.

2.5.5 SPLINE WAVELET:


Spline wavelets admit wavelet packets and the fast integral wavelet transform. Because
of the multi-resolution properties and the real-time capability of wavelet techniques, much
more information can be obtained from a signal's waveform than by using the traditional
Fourier-based method.
2.6 WAVELET DECOMPOSITION
There are several ways in which wavelet transforms can decompose a signal into various
sub-bands. These include uniform decomposition, octave-band decomposition, and adaptive
or wavelet-packet decomposition. Of these, octave-band decomposition is the most widely
used.
The decomposition of the signal into different frequency bands is simply obtained by
successive high pass and low pass filtering of the time domain signal. This filter pair is called
the analysis filter pair. First, the low pass filter is applied for each row of data, thereby
getting the low frequency components of the row. But since the low pass filter is a half band
filter, the output data contains frequencies only in the first half of the original frequency
range. They are down-sampled by two, so that the output data contains only half the original
number of samples. Now, the high pass filter is applied for the same row of data, and
similarly the high pass components are separated.

[Figure 2.10 filter tree: Image -> horizontal wavelet / horizontal scaling filters (each
downsampled by 2) -> vertical wavelet / vertical scaling filters (each downsampled by 2)
-> HH, HL, LH, LL sub-bands]

Figure 2.10 Pyramidal Decomposition of an image

This is a non-uniform band-splitting method that decomposes the lower-frequency part
into narrower bands, while the high-pass output at each level, termed the detail coefficients,
is left without any further decomposition. This procedure is done for all rows, and the
filtering is then done for each column of the intermediate data. The resulting
two-dimensional array of coefficients contains four bands of data, labeled LL (low-low),
HL (high-low), LH (low-high) and HH (high-high).

LL  HL
LH  HH

Figure 2.11 Illustration of 1 level decomposed coefficient


The LL band, called the approximation coefficients, can be decomposed once again in the
same manner, thereby producing even more sub-bands. This can be repeated up to log2 of
the image size.
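In one dimension the octave-band scheme looks like this: only the approximation half is split again at each level, and the detail halves are kept. A pure-Python sketch with Haar filters (the constant signal is a stand-in chosen so the results are easy to check):

```python
import math

def haar_step(v):
    """One Haar analysis step: return the (approximation, detail) halves of v."""
    s = 1 / math.sqrt(2)
    a = [(v[i] + v[i + 1]) * s for i in range(0, len(v) - 1, 2)]
    d = [(v[i] - v[i + 1]) * s for i in range(0, len(v) - 1, 2)]
    return a, d

def octave_decompose(signal, levels):
    """Octave-band decomposition: split only the approximation band further,
    keeping the detail coefficients of every level as-is."""
    details, approx = [], list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

x = [1.0] * 8
approx, details = octave_decompose(x, 3)  # 3 = log2(8), the maximum depth here
```

For this constant signal every detail band is zero and the whole energy collapses into the single remaining approximation coefficient.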
2.7 MATHEMATICAL MORPHOLOGY

The field of mathematical morphology contributes a wide range of operators to image
processing, all based around a few simple mathematical concepts from set theory. The
operators are particularly useful for the analysis of binary images and common usages
include edge detection, noise removal, image enhancement and image segmentation.

2.7.1 DILATION
Dilation is typically applied to binary images. Its basic effect on a binary image is to
enlarge the boundaries of regions of foreground pixels.
The dilation operator takes two pieces of data as input. The first is the image to be
dilated; the second is a set of coordinate points known as a structuring element. It is this
structuring element that determines the precise effect of the dilation on the input image.
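A pure-Python sketch of binary dilation with a centered structuring element; for the symmetric elements used here, the reflection of the element that appears in the formal definition makes no difference.

```python
def dilate(img, se):
    """Binary dilation: a pixel becomes foreground if the structuring element,
    centered on it, overlaps at least one foreground pixel of the input."""
    h, w = len(img), len(img[0])
    cy, cx = len(se) // 2, len(se[0]) // 2  # origin at the centre of the element
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = 1 if any(
                se[j][i]
                and 0 <= y + j - cy < h and 0 <= x + i - cx < w
                and img[y + j - cy][x + i - cx]
                for j in range(len(se)) for i in range(len(se[0]))) else 0
    return out

cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
dot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
grown = dilate(dot, cross)  # the single pixel grows into the shape of the element
```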

2.7.2 EROSION
Erosion is one of the two basic operators in the area of mathematical morphology. It
is typically applied to binary images. The basic effect of the operator on a binary image is to
erode away the boundaries of regions of foreground pixels. Thus areas of foreground pixels
shrink in size, and holes within those areas become larger.
The erosion operator also takes two pieces of data as inputs i.e., the input image and a
set of coordinate points known as a structuring element (also known as a kernel).
To compute the erosion of a binary input image by a given structuring element, each
of the foreground pixels in the input image is considered. If, for every pixel in the structuring
element, the corresponding pixel in the image underneath is a foreground pixel, then the
input pixel is left as it is; otherwise it is eroded to the background value.
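The rule just described translates directly into code; a minimal sketch, where a pixel whose neighborhood fails the test is eroded to background:

```python
def erode(img, se):
    """Binary erosion: a pixel stays foreground only if every foreground pixel of
    the structuring element, centered on it, lands on a foreground image pixel."""
    h, w = len(img), len(img[0])
    cy, cx = len(se) // 2, len(se[0]) // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = 1 if all(
                not se[j][i]
                or (0 <= y + j - cy < h and 0 <= x + i - cx < w
                    and img[y + j - cy][x + i - cx])
                for j in range(len(se)) for i in range(len(se[0]))) else 0
    return out

square = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
blob = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
shrunk = erode(blob, square)  # only the pixel with a full 3x3 neighborhood survives
```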

2.7.3 STRUCTURING ELEMENTS


The structuring element consists of a pattern specified as the coordinates of a number
of discrete points relative to some origin. Figure 2.12 shows a number of different structuring
elements of various sizes; in each case the origin is marked by a ring around that point. The
origin does not have to be in the center of the structuring element, but often it is. As seen
from the figure, structuring elements that fit into a 3×3 grid with the origin at the center are
the most commonly seen type.

When a morphological operation is carried out, the origin of the structuring element is
typically translated to each pixel position in the image in turn, and then the points within the
translated structuring element are compared with the underlying image pixel values. The
details of this comparison and the effect of the outcome depend on which morphological
operator is being used.

[Figure 2.12: square and cross-shaped structuring elements of various sizes, with each
member cell marked 1 and the origin ringed]

Figure 2.12 Structuring elements

CHAPTER 3
SYSTEM ANALYSIS

3.1 DOCUMENT IMAGE ANALYSIS
Document images are scans of documents which are in most cases pseudo-binary and
rich in textual content. Informally, we can define document images as images that contain
components that resemble the symbols of a language. Generally, a scanned document image
contains both text and graphics. A documented image tends to be highly structured in terms
of layout, with significant redundancy in the symbols that occur in it.
Document Image Segmentation is the act of partitioning a document image into
separated regions. These regions should ideally correspond to the image entities such as text
blocks and graphical images as shown in Figure 3.1, which are present in the document
image. These entities can then be identified and processed as required by the subsequent
steps of Automated Document Conversion.

Figure 3.1 Document image and possible segmentation.

Methods used include Segmentation, Layout Analysis, Geometric Structure
Detection/Analysis, Document Page Decomposition, Layout Segmentation, etc.
Text in documented images can be extracted by extracting the fundamental
components using wavelet transforms. Thresholding the fundamental components into two
levels makes neighboring pixels uniform and evens out the intensity flow in the decomposed
documented images.
The binarized images can then be operated on with morphological analysis and neural
networks for efficient extraction of text data from the documented image.
The higher variation of text pixels compared to graphics allows text pixels to be
predicted from the graphic image.

3.2 TEXT ISOLATION
Text localization in document images has been an active research area for some time.
However text recognition in broadcast quality digital video is a problem requiring different
approaches. Unlike document images, video frames tend to have text not in orderly columns
but in widely scattered areas, and fewer, separated lines. Also, video frames are typically
noisy, low-resolution and full-color, with interlace artifacts. The wavelet transform is an
effective tool for isolating text characters from a documented video image: edges in the
spatial domain can be located from the wavelet transform by identifying peaks at the
corresponding locations. Binarization of the documented image results in a two-level
isolation of the text image, which helps in the proper extraction of text data from a given
documented image. Morphological and logical operators enhance the extraction process,
and feature extraction and classification give the prediction of the text characters.

3.3 SYSTEM PARAMETERS


3.3.1 DECOMPOSITION LEVELS
The decomposition level affects the accuracy of text isolation in a given video image.
Higher-level decomposition provides better segmentation than lower-level decomposition,
and the benefit saturates as the decomposition level increases, at about half the total number
of decomposition levels. The implemented system saturates at four-level decomposition; the
obtained results show no variation in text isolation once the saturation level is reached.

3.3.2 FREQUENCY BANDS


The frequency bands of the wavelet transform, known as the LH, HL, HH and LL bands,
give the horizontal, vertical, diagonal and approximation coefficients respectively. The
directional information plays an important role in solving the segmentation task, and with
an increase in the number of frequency bands the level of accuracy improves. The higher
frequency bands are derived by decomposing the approximation coefficients obtained from
the bank of low-pass filters, i.e. the LL band. One of the factors affecting the level of
accuracy in text isolation is the selection of the wavelet function chosen for the
transformation.

3.3.3 WAVELET FUNCTIONS
Biorthogonal wavelet functions, namely the spline wavelet, show higher segmentation
accuracy than orthonormal wavelets such as the Haar and Daubechies wavelets. The
biorthogonal wavelet is found to be more suitable than the orthonormal wavelets because of
the non-energy-preserving nature of its filters and the non-orthogonality between its
filter coefficients.
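This distinction can be checked numerically. The sketch below (in Python rather than the project's MATLAB, purely for illustration) compares the energy of the orthonormal Haar analysis filters with that of a biorthogonal spline low-pass filter; the CDF 5/3 (LeGall) filter is used as a stand-in spline example, since the report does not list the exact spline coefficients it uses.

```python
# Sketch: the energy-preservation property that separates orthonormal
# wavelets (Haar) from biorthogonal spline wavelets.
import math

# Haar analysis filters (orthonormal)
haar_lo = [1 / math.sqrt(2), 1 / math.sqrt(2)]
haar_hi = [-1 / math.sqrt(2), 1 / math.sqrt(2)]

# CDF 5/3 biorthogonal spline analysis low-pass filter (illustrative stand-in)
bior_lo = [-1 / 8, 2 / 8, 6 / 8, 2 / 8, -1 / 8]

def energy(h):
    """Sum of squared filter coefficients."""
    return sum(c * c for c in h)

print(energy(haar_lo))  # ~1.0: orthonormal filters preserve energy
print(energy(bior_lo))  # 0.71875: biorthogonal filters do not
```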

3.3.4 TIME ANALYSIS


The implemented system is analyzed in terms of the accuracy level and the overall
processing time taken for text isolation from a given video image. The segmentation system
improves the speed of operation, with comparable segmentation results, because of the
reduction in the volume of data involved in the computation: before feature extraction the
data is reduced by a factor of two at each level of decomposition, since each level
operates on subsampled wavelet sub-bands rather than the full image. The speed of operation
is further improved by taking advantage of the edge-detection ability of the wavelet
transform.
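The data reduction described above can be sketched quickly: each decomposition level halves both dimensions of the sub-bands it works on (a 512×512 frame size is assumed here only for illustration).

```python
# Sketch: how the coefficient grid shrinks at each decomposition level,
# which is where the speed-up in the later processing stages comes from.
rows, cols = 512, 512  # assumed frame size, for illustration only
for level in range(1, 5):
    rows, cols = rows // 2, cols // 2
    print(f"level {level}: each sub-band is {rows}x{cols} "
          f"({rows * cols} coefficients per band)")
```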

CHAPTER 4
SYSTEM DESIGN

The proposed work analyzes the effect of wavelet transformations on text segmentation. It
implements a segmentation system which reads a documented image containing both text and
graphics, and carries out the transformation, followed by morphological operations, for the
isolation of the text data from the document image.

4.1 DESIGN FLOW

Document video sequence → Wavelet Decomposition → Binarization → Morphological Operation → Logical Operation → Isolated text

Figure 4.1 Block Diagram of the Implemented Design

4.1.1 DOCUMENTED VIDEO IMAGE
The documented video image sequences considered for implementation consist of graphics and
text. Different documented video image samples, having uniform and non-uniform text
distributions with different backgrounds, are fed to the system. Figure 4.2 shows a
documented video image sample containing text and graphics.

Figure 4.2 A documented video image consisting of text and graphics

4.1.2 TRANSFORMATIONS
The wavelet transform is a very useful tool for signal analysis and image processing,
especially for multi-resolution representation; it can decompose a signal into different
components in the frequency domain. The one-dimensional discrete wavelet transform (1-D
DWT) decomposes an input sequence into two components (the average component and the
detail component) by filtering with a low-pass filter and a high-pass filter. The
two-dimensional discrete wavelet transform (2-D DWT) decomposes an input image into four
sub-bands: one average component (LL) and three detail components (LH, HL, HH), as shown
in Figure 4.3.

LL HL

LH HH

Figure 4.3 A two-dimensional DWT decomposition
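The row-then-column decomposition into the four sub-bands of Figure 4.3 can be sketched as follows. This is a Python/NumPy rendering for illustration, not the project's MATLAB code, and the LH/HL naming convention varies between texts.

```python
# Sketch: single-level 2-D Haar DWT built from two separable 1-D transforms.
import numpy as np

def haar_1d(x):
    """One-level 1-D Haar analysis along the last axis (orthonormal scaling)."""
    avg = (x[..., 0::2] + x[..., 1::2]) / np.sqrt(2)
    det = (x[..., 0::2] - x[..., 1::2]) / np.sqrt(2)
    return avg, det

def haar_2d(img):
    """Decompose an even-sized grayscale image into LL, LH, HL, HH sub-bands."""
    lo, hi = haar_1d(img)       # transform the rows
    ll, lh = haar_1d(lo.T)      # transform the columns of the low band
    hl, hh = haar_1d(hi.T)      # transform the columns of the high band
    return ll.T, lh.T, hl.T, hh.T

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_2d(img)
print(ll.shape)  # each sub-band is half the size in both dimensions: (2, 2)
```

Because the Haar filters here are orthonormal, the total energy of the four sub-bands equals the energy of the input image, in line with the orthonormal/biorthogonal distinction discussed in Chapter 3.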

In image processing, the multi-resolution property of the 2-D DWT has been employed to
detect the edges of an original image. Traditional edge-detection filters can provide a
similar result; however, the 2-D DWT detects three kinds of edges at a time, while
traditional edge-detection filters require four mask operators to do so. The processing
time of the traditional edge-detection filters is therefore longer than that of the 2-D
DWT.

[Figure: four 3×3 line-detection masks, one per orientation (the horizontal mask has rows -1 -1 -1 / 2 2 2 / -1 -1 -1; the vertical, +45° and -45° masks are its rotations), producing the horizontal, vertical and diagonal edge maps.]

Figure 4.4 Traditional edge detection using mask operation
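As an illustration of the mask approach the figure describes, the snippet below applies the 3×3 horizontal line mask by direct 2-D correlation on a toy image (Python/NumPy sketch; each orientation would need its own pass, unlike the single 2-D DWT).

```python
# Sketch: applying the horizontal 3x3 line mask from Figure 4.4 by plain
# 'valid' 2-D correlation (no padding, no toolbox dependencies).
import numpy as np

mask_h = np.array([[-1, -1, -1],
                   [ 2,  2,  2],
                   [-1, -1, -1]])

def correlate2d_valid(img, k):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# Toy image: a bright horizontal line on a dark background
img = np.zeros((5, 7))
img[2, :] = 1.0
resp = correlate2d_valid(img, mask_h)
print(resp[1])  # strongest (positive) response along the line row
```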

The proposed work implements three wavelet transforms, namely the Haar and Daubechies
(orthogonal) wavelets and the spline wavelet (biorthogonal). The wavelet transform uses the
filter-bank realization shown in Figure 4.5 to decompose the documented image into three
detail coefficients and one approximation coefficient. The 2-D DWT is achieved by two
ordered 1-D DWT operations (row and column): the row operation is carried out first to
obtain the 1-D decomposition, which is then transformed by the column operation to give
the final 2-D DWT. The 2-D DWT thus decomposes a gray-level image into one
average-component sub-band and three detail-component sub-bands, as shown in Figure 4.5.

[Figure: two-channel analysis filter bank. The image rows are first filtered by the horizontal wavelet (high-pass) and scaling (low-pass) filters and downsampled by 2; each branch is then filtered along the columns by the vertical wavelet and scaling filters and downsampled by 2, yielding the HH, HL, LH and LL sub-bands.]

Figure 4.5 Filter Bank Implementation of Wavelet sub-band decomposition

Figure 4.6 shows the original image used for the decomposition, which is decomposed into
three detail coefficients, namely the horizontal, vertical and diagonal coefficients, and
one approximation coefficient, as shown in Figure 4.7.

Figure 4.6 Original Image

Figure 4.7 One-level scaled image of the original image

4.1.3 BINARIZATION
Binarization is carried out using thresholding, a simple technique for image segmentation
that distinguishes image regions as objects or background. Although the detected edges in
every detail-component sub-band consist of both text edges and non-text edges, the two can
be distinguished because the intensity of the text edges is higher than that of the
non-text edges. An appropriate threshold can therefore be selected to preliminarily remove
the non-text edges in the detail-component sub-bands.
A dynamic threshold value T is calculated for each detail sub-band. The target threshold
value is obtained by evaluating each pixel together with its neighboring pixels: two mask
operators are used to form the masking term and then calculate the threshold value for
each pixel in the three detail sub-bands. The dynamic thresholding method thus obtains a
different target threshold value for each sub-band image. Each detail-component sub-band
es is then compared with T to obtain a binary image (e).
The threshold T is determined by

T = ∑ ( es(i,j) × s(i,j) ) / ∑ s(i,j) ----------(4.1)

where
s(i,j)=Max( | g1 * * es(i,j) |,|g2 * * es(i,j)|) ----------(4.2)
and

g1 = [ -1 0 1 ], g2 = [ -1 0 1 ]^T ----------(4.3)

In Equation 4.2, “* *” denotes two-dimensional linear convolution.


The figure below shows an example of a 5×5 detail-component sub-band (es). The masked
matrix element S(P8) is calculated as given in Equation 4.4.

P1 P2 P3 P4 P5
P6 P7 P8 P9 P10
P11 P12 P13 P14 P15
P16 P17 P18 P19 P20
P21 P22 P23 P24 P25

Figure 4.8 5×5 detail component sub-band (es)

S(P8) = max(| P9 – P7|,|P13-P3|) ----------(4.4)


Applying similar operations to each pixel, all S(i, j) elements can be determined for each
detail-component sub-band. The threshold T can then be computed using Equation 4.1, and
the binary edge image (e) is given by
e(i,j) = 255 if es(i,j) > T, and 0 otherwise ------------(4.5)

The resulting binary image, shown in Figure 4.9, consists mostly of text edges with a few
non-text edges, binarized to two levels.
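The thresholding of Equations 4.1 to 4.5 can be sketched as below. This is a Python/NumPy rendering of the same steps the MATLAB listing in Chapter 5 performs; the toy 5×5 sub-band is invented for illustration.

```python
# Sketch of the dynamic thresholding of Equations 4.1-4.5 on one detail
# sub-band es: s(i,j) is the larger of the horizontal and vertical central
# differences (Eqs. 4.2-4.4), T is the s-weighted mean of |es| (Eq. 4.1),
# and the binary edge map keeps pixels above T (Eq. 4.5).
import numpy as np

def binarize_subband(es):
    s = np.zeros_like(es, dtype=float)
    # interior pixels only, mirroring the border handling in the hr.m listing
    s[1:-1, 1:-1] = np.maximum(
        np.abs(es[1:-1, 2:] - es[1:-1, :-2]),   # |g1 * * es|: horizontal diff
        np.abs(es[2:, 1:-1] - es[:-2, 1:-1]))   # |g2 * * es|: vertical diff
    T = np.sum(np.abs(es) * s) / np.sum(s)       # Eq. 4.1
    return np.where(es > T, 255, 0), T

# Invented toy sub-band: strong "text" responses on one row, weak noise below
es = np.array([[0., 0., 0., 0., 0.],
               [0., 9., 8., 9., 0.],
               [0., 0., 0., 0., 0.],
               [0., 1., 0., 1., 0.],
               [0., 0., 0., 0., 0.]])
e, T = binarize_subband(es)
print(T)  # 3.6 for this toy sub-band: keeps the 8s and 9s, drops the 1s
```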

Figure 4.9 Binary image of detail component sub-band

4.1.4 IMAGE DILATION


For text-region extraction, morphological operators and a logical operator are used to
further remove the non-text regions. In text regions, vertical, horizontal and diagonal
edges are mingled together, while in non-text regions they are distributed separately.
Since text regions are composed of all three kinds of edges, they can be determined to be
the regions where these edges are intermixed. Text edges are generally short and connected
with each other in different orientations, so morphological dilation and erosion operators
are used to connect isolated candidate text edges in each detail-component sub-band of the
binary image. Figure 4.10 shows the morphologically operated scaled image.

Figure 4.10 Morphological operated image of three binary regions

The Morphological operators for the three detail sub-bands are designed differently
so as to fit the text characteristics.
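A minimal sketch of such dilation with the rectangular structuring elements used in hr.m (3×5 for horizontal, 7×3 for vertical, 3×3 for diagonal edges) is given below, in Python/NumPy instead of the toolbox's imdilate/strel. Note that np.roll wraps at the image borders, which is acceptable for this illustration only.

```python
# Sketch: binary dilation by a rectangle of ones, built from plain shifts.
import numpy as np

def dilate(binary, se_rows, se_cols):
    """Dilate a 0/1 image by a se_rows x se_cols rectangle of ones."""
    out = np.zeros_like(binary)
    for dr in range(-(se_rows // 2), se_rows // 2 + 1):
        for dc in range(-(se_cols // 2), se_cols // 2 + 1):
            # shift the edge map and accumulate with logical OR
            shifted = np.roll(np.roll(binary, dr, axis=0), dc, axis=1)
            out |= shifted
    return out

edges = np.zeros((7, 9), dtype=int)
edges[3, 2] = edges[3, 6] = 1      # two isolated edge pixels on one row
joined = dilate(edges, 3, 5)       # 3x5 rectangle, as for horizontal edges
print(joined[3])                   # the gap between the two pixels is bridged
```

This is exactly why dilation helps before the AND stage: short, separated text edges grow into connected blobs that overlap across the three sub-bands.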

4.1.5 LOGICAL AND OPERATION


The logical AND operation is then carried out on the three kinds of edges (vertical,
horizontal and diagonal) after the morphological operation, to isolate the text data from
the scaled image. Figure 4.11 demonstrates the application of the logical AND operator for
the isolation of the text data in the operated documented image. The operator performs a
logical AND element by element across the three images and produces the final ANDed image.
The morphologically operated image has higher uniformity in the text regions than in the
graphic regions, which results in the elimination of the graphic regions when ANDed.

[Figure: the morphologically dilated horizontal, vertical and diagonal edge maps are combined by a logical AND operator to give the extracted text region.]

Figure 4.11 Text extraction by using the logical AND operator
Since the three kinds of edge regions are intermixed in the text regions, considerable
overlapping appears after the morphological operation due to the expansion of each single
edge. On the contrary, only one or two kinds of edge regions exist, separately, in the
non-text regions, so there is no overlapping even after the morphological operation. The
AND operator therefore helps in isolating the text regions, as shown in Figure 4.11.
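The AND stage itself reduces to an element-wise conjunction of the three dilated edge maps, as this toy sketch (Python/NumPy, with invented maps) shows: only pixels where all three orientations respond survive.

```python
# Sketch: the final AND stage. A candidate pixel is kept only where the
# dilated horizontal, vertical and diagonal edge maps all overlap, which is
# what discriminates text regions from isolated graphic edges.
import numpy as np

h = np.array([[1, 1, 1, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1]])
v = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 0]])
d = np.array([[1, 1, 1, 1],
              [0, 1, 1, 0],
              [0, 0, 0, 0]])

text_mask = h & v & d   # element-wise logical AND of the three edge maps
print(text_mask)
```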

CHAPTER 5
MATLAB

5.1 INTRODUCTION TO MATLAB
The past decade has seen a tremendous escalation in the use of computers, and there is a
wealth of software packages available for the analysis, design and manufacture of devices,
equipment, machinery and systems. Most engineers are by now familiar with computing
techniques and have an awareness of these software packages. MATLAB is one such software
package, serving as a vehicle for the analysis of the performance of a system.
MATLAB is a software package for high-performance numeric computation and data
visualization. Fundamentally, it is built upon a foundation of sophisticated matrix
software for analyzing linear systems of equations.

5.1.1 FEATURES OF MATLAB:


• A very strong programming environment, which is reasonably interactive.
• A high-level matrix language with control-flow statements, functions, data structures,
I/O and OOP features.
• Good plotting features and 3-D graphics-handling techniques used for image processing,
animation and visual interfacing.
• A rich set of libraries, such as the Application Program Interface (API), that allow you
to write C and FORTRAN programs that interact with MATLAB.
• A rich set of computational algorithms, from simple functions like sum and sine to
complex functions like the FFT and Bessel functions.
• Several ‘toolboxes’ available to users of MATLAB: collections of functions written for
special applications such as statistics, communications, control-system design, signal
processing and neural networks.
MATLAB Includes Tools For:
1. Data acquisition.
2. Data analysis and Exploration.
3. Visualization and Image Processing.
4. Algorithm Programming and Development.
5. Modeling and simulation.
6. Programming and Application Development.

These tools allow you to solve problems in applied mathematics, physics, chemistry,
engineering and finance: almost any application area that deals with complex numerical
calculations.
When MATLAB is invoked the following windows are present:
• Command Window: the main window, characterized by the prompt ‘>>’. All commands,
including those for running user-written programs, are typed here.
• Command History: all commands typed at the MATLAB prompt in the command window are
recorded, even across multiple sessions.
• Workspace: lists all the variables generated so far and shows their type and size.
• Edit Window: where we write, edit, create and save our own programs, in files called
‘M-files’.
MATLAB for Speech Recognition:
For implementing a speech recognition algorithm, MATLAB is a good choice. For this
application the following toolboxes are used:
1. Signal processing toolbox.
2. Symbolic math toolbox.
3. General MATLAB instructions.
Signal Processing Toolbox:
The signal processing toolbox is a specialized collection of M-files built specifically
for signal processing operations, from waveform generation to filter design and
implementation, parametric modeling, and spectral analysis.
The Toolbox provides 2 categories of tools:
Command Line Functions:
1. Analog and Digital filter analysis.
2. Digital filter implementation.
3. FIR and IIR digital filter design.
4. Analog filter design.
5. Filter Discretisation.

6. Spectral Windows transform.
7. Spectral analysis.
8. Statistical signal processing and Spectral analysis.
9. Parametric Modeling.
10. Linear prediction.
11. Waveform Generation.
Interactive Graphical User Interfaces for:
1. Filter design and analysis-FDA tool.
2. Filter visualization.
3. Signal plotting and analysis-WIN tool.
4. Window visualization.
5. Signal plotting and analysis, Spectral analysis and filtering signals-SP tool.

5.1.2 SYMBOLIC MATH TOOLBOX:


The symbolic math toolbox integrates powerful symbolic and variable-precision computing
into the MATLAB environment. This toolbox supplements MATLAB's numeric and graphic
facilities with several other types of mathematical computation.

5.1.3 IMAGE MATRIX:


MATLAB handles images as matrices: each pixel of an image becomes an element of a matrix.
MATLAB distinguishes between color and grayscale images, and therefore their resulting
image matrices differ slightly.

5.1.4 INTRODUCTION TO PROGRAMMING IN MATLAB:


In this worksheet we will continue to learn MATLAB programming. The main goal is that, by
completing the worksheet, you can write your own Runge-Kutta 4 ODE solver. The worksheet
centers on a few examples, and it is important to implement these examples, run them and
carefully compare the output with the code. The prerequisites for this worksheet are the
worksheets Introduction to Matlab and Introduction to programming in Matlab.

5.1.5 FUNCTIONS IN MATLAB:

You have already encountered functions that are built into MATLAB. sin() is a function
that takes an argument and returns an output. As you start to write your own programs in
MATLAB you will need to create your own functions that take one or more arguments and do
something with these arguments.
Example:
function y = function_name(argument_list)
commands

The code above must be written in a separate M-file. The name of the file should coincide
with the name of the function; remember to save the file with a .m extension after the
file name.

5.2 IMPLEMENTATION FUNCTION DESCRIPTION
For the implementation of the proposed design the following functions are realized:

FINAL : Gives the top-level user interface to the implemented modules.
CLOSE1 : Responds to the close button of the user interface and closes all the
active figure windows.
GUI : The second-level graphical user interface created for user interaction.
HR : Realizes the decomposition of a documented image using the Haar
wavelet transform.
HRRES : Displays the obtained result of the Haar wavelet decomposition.
PROCESS : The graphical user interface which calls back the Haar wavelet
processing.
WAVEFAST : Realizes the wavelet decomposition of the image sample.
WAVEFILTER : Defines the high-pass and low-pass wavelet filters.

5.3 SOURCE CODE
Final.m
global GParm
GParm.figure = figure('Color',[0 0 0], ...% black color
'MenuBar','none', ...
'Colormap',gray(256), ...
'Name','DOCUMENT IMAGE ANALYSIS INTERFACE', ... % figure window name
'Visible','on', 'Resize','off', ... % initially not drawn
'NumberTitle','off', ...
'ShareColor','off', ...
'RendererMode','manual','Renderer','painters', ...
'Units','pixels', ...
'Position',[5 5 790 570], ... % window position
'WindowStyle','normal', ...
'Pointer','arrow');
posbox = [30 310 720 180] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Background',[.9 .9 .9], ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','TEXT LOCALISATION USING DISCRETE WAVELET TRANSFORMS ',...
'Position',posbox) ;
posbox = [90 330 600 30] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Background',[.9 .7 .7], ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...

'String','BY -- MERUVA KARTHIK, N. RAVI KRISHNA, SRIKANTH KONALE',...
'Position',posbox) ;
posbox = [130 150 180 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','CONTINUE', ...
'Position',posbox, ...
'Callback','gui;' ...
);

posbox = [450 150 180 50] ;


GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','CLOSE', ...
'Position',posbox, ...
'Callback','close1;' ...
);

Gui.m
function gui1()
GParm.figure = figure('Color',[0 0 0], ...% black color
'MenuBar','none', ...
'Colormap',gray(256), ...
'Name','EMBEDDED ZERO-TREE WAVELET INTERFACE', ... % figure window name
'Visible','on', 'Resize','off', ... % initially not drawn
'NumberTitle','off', ...
'ShareColor','off', ...
'RendererMode','manual','Renderer','painters', ...
'Units','pixels', ...
'Position',[5 5 795 580], ... % window position
'WindowStyle','normal', ...
'Pointer','arrow');

%%% CLOSE BUTTON

posbox = [620 60 150 50] ;


GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','CLOSE', ...
'Position',posbox, ...
'Callback','close1' ...
);
posbox = [70 60 150 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...

'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','READ INPUT', ...
'Position',posbox, ...
'Callback','[img,H,W,S,c,s,map]=rdfrm;' ...
);
% 'Callback','[s,map]=read1;' ...
%[s,map]=read1
posbox = [260 60 160 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','PROCESS', ...
'Position',posbox, ...
'Callback','process(S,s,map);' ...
);

Hr.m
function[c0,s0,A,H,V,D,dilatedimage,onlytext,TSP2]=hr(S,s,map)
load a;
ot=cell(1,S);
s=size(a{1});
for irt=1:S
AQ=cputime;
as=waitbar(0,'PROCESSING TEXT ISOLATION-HARR');
s=a{irt};
f1=double(s);
as1=size(f1);

if length(as1)==3
f1=rgb2gray(f1);
end
w_name='haar';
[c0,s0]=wavefast(f1,2,w_name);
waitbar(0.3)
A = appcoef2(c0,s0,w_name,1);
[H,V,D] = detcoef2('all',c0,s0,1);
[s1,s2]=size(H);
sH=H;
for i=2:s1-1
for j=2:s2-1
ab1=abs(H(i,j+1)-H(i,j-1));
ab2=abs(H(i+1,j)-H(i-1,j));
sH(i,j)=max(ab1,ab2);
end
end
waitbar(0.5)
% pause(9)
kH=sum(sH,1);
kH1=sum(kH,2);
j1=abs(H).*sH;
sj1=sum(j1,1);
sumjH=sum(sj1,2);
TH=sumjH/kH1;
[s1,s2]=size(D);
sD=D;
for i=2:s1-1
for j=2:s2-1
ab1=abs(D(i,j+1)-D(i,j-1));
ab2=abs(D(i+1,j)-D(i-1,j));

sD(i,j)=max(ab1,ab2);
end
end
waitbar(0.6)
kD=sum(sD,1);
kD1=sum(kD,2);
j1=abs(D).*sD;
sj1=sum(j1,1);
sumjD=sum(sj1,2);
TD=sumjD/kD1;
[s1,s2]=size(V);
sV=V;
for i=2:s1-1
for j=2:s2-1
ab1=abs(V(i,j+1)-V(i,j-1));
ab2=abs(V(i+1,j)-V(i-1,j));
sV(i,j)=max(ab1,ab2);
end
end
kV=sum(sV,1);
kV1=sum(kV,2);
j1=abs(V).*sV;
sj1=sum(j1,1);
sumjV=sum(sj1,2);
TV=sumjV/kV1;
[s1,s2]=size(H);
eH=H;
for i=1:s1
for j=1:s2
if H(i,j)>TH
eH(i,j)=255;

else eH(i,j)=0;
end
end
end
waitbar(0.68)
[s1,s2]=size(V);
eV=V;
for i=1:s1
for j=1:s2
if V(i,j)>TV
eV(i,j)=255;
else eV(i,j)=0;
end
end
end
waitbar(0.8)
[s1,s2]=size(D);
eD=D;
for i=1:s1
for j=1:s2
if D(i,j)>TD
eD(i,j)=255;
else eD(i,j)=0;
end
end
end
waitbar(0.9)
seH=strel('rectangle',[3 5]);
di_eH=imdilate(eH,seH);
seV=strel('rectangle',[7 3]);
di_eV=imdilate(eV,seV);

seD=strel('rectangle',[3 3]);
di_eD=imdilate(eD,seD);
di_eD=f1;
sd=strel('diamond',4);
dilatedimage = imdilate(di_eD,sd);
se=strel('square',3);
erodedimage=imerode(di_eD,se);
waitbar(1)
onlytext{1,irt}= ~(dilatedimage&~erodedimage);
close(as);
TSP2=cputime-AQ;
end;

Hrres.m
function hrres(S,t)
global GParm
kl=[];
for i=1:S
bo=t{i};
zs=size(bo);
cx=bo(zs(1)/2:end,:);
kl=[kl;cx];
end
figure('position',[250 290 280 100]);
imshow(kl);
title('ISOLATED TEXT-HAAR WAVELET');

Process.m
function process(S,s,map)
GParm.figure = figure('Color',[0 0 0], ...% black color
'MenuBar','none', ...
'Colormap',gray(256), ...
'Name','DOCUMENT IMAGE ANALYSIS INTERFACE', ... % figure window name
'Visible','on', 'Resize','off', ... % initially not drawn
'NumberTitle','off', ...
'ShareColor','off', ...
'RendererMode','manual','Renderer','painters', ...
'Units','pixels', ...
'Position',[5 5 795 580], ... % window position
'WindowStyle','normal', ...
'Pointer','arrow');

%%% CLOSE BUTTON


posbox = [470 60 150 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','CLOSE', ...
'Position',posbox, ...
'Callback','close1' ...
);
posbox = [200 60 150 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...

'Units','pixel', ...
'String',' <<== BACK', ...
'Position',posbox, ...
'Callback','gui' ...
);
posbox = [400 400 150 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','HAAR-WAVELET', ...
'Position',posbox, ...
'Callback','[c03,s03,A3,H3,V3,D3,dilatedimage3,onlytext2,TPS2]=hr(S,s,map);' ...
);
posbox = [600 400 150 50] ;
GParm.close = uicontrol( ...
'Parent',GParm.figure, ...
'Style','Pushbutton', ...
'BusyAction','queue','Interruptible','off', ...
'Units','pixel', ...
'String','HAAR-RESULT', ...
'Position',posbox, ...
'Callback','hrres(S,onlytext2)' ...
);

Rdfrm.m
function [img,H,W,S,c,s,map]=rdfrm
% movcreat; % function that creates the avi movie

[fname,path] = uigetfile('*.avi','Open an avi movie file :') ;

[mov]=aviread(fname);
[c]=mov(1).colormap;
S=length(mov); % to find length of the movie
% map=colormap(hot);
a=cell(7);
% Cell array creation to store all the images converted from the movie frames
img=cell(size(mov));
figure;
for q=1:S
mmov=mov(q); % reading each frame from the movie data
mimg=frame2im(mmov); % converting each frame into an image
% iptsetpref('TruesizeWarning', 'off')
imshow(mimg,c); % to display each image
img{1,q}=getimage(gca); % to capture each image from the figure window
end
% figure('visible','off');imshow(img{1,S},'notruesize')
% figure('visible','off');
% [z,map]=capture;
%img
[H,W,S1]=size(mimg);
aaa=cell2mat(img);

% aa1=aaa(:,1:212,:);
aa1=aaa;
if length(c)==0
imwrite(aa1,'to.bmp');
else
imwrite(aa1,c,'to.bmp');
end
clc
[s,map]=imread('to.bmp');

Wavefast.m
function[c,s]=wavefast(x,n,w_name)
% error(nargchk(3,4,nargin));
%if nargin==3
% if ischar(varargin{1})
% [lp,hp]=wavefilter(varargin{1}, 'd');
% else
% error('missing wavelet name.');
% end
% else
% lp=varargin{1};
% hp=varargin{2};
% %end
[lp,hp,hr,lr]=wavefilter(w_name);
f1=length(lp);sx=size(x);
% if (ndims(x)~=2)|(min(sx)<2)|~isreal(x)|~isnumeric(x)
% error('x must be a real, numeric matrix.');
% end
% if (ndims(lp)~=2)|~isreal(lp)|~isnumeric(lp)|(ndims(hp)~=2)|~isreal(hp)|~isnumeric(hp)|(f1~=length(hp))|rem(f1,2)~=0
% error(['lp and hp must be even and equal length real numeric filter vectors.']);

% end
% if ~isreal(n)|~isnumeric(n)|(n<1)|(n>log2(max(sx)))
% error(['N must be a real scalar between 1 and log2(max(size(X))).']);
% end
c=[];
s=sx;
app=double(x);
for i=1:n
[app,keep]=symextend(app,f1);%%%% to reduce the border effect
rows=symconv(app,hp,'row',f1,keep);
coefs=symconv(rows,hp,'col',f1,keep);
c=[coefs(:)' c]; % coefs(:) converts the 2-D matrix into one column, stored in row format for transmission (DIAG)
s=[size(coefs);s]; % save the size of the coefs element above; the image dimension indicates the first-level decomposition matrix SIZE
coefs=symconv(rows,lp,'col',f1,keep);
c=[coefs(:)' c];%%%%% VERTICAL
rows=symconv(app,lp,'row',f1,keep); %%%LOWER CHAIN IMPL.
coefs=symconv(rows,hp,'col',f1,keep);
c=[coefs(:)' c];%%%%%HORI
app=symconv(rows,lp,'col',f1,keep);
end
c=[app(:)' c];%%%%% FINAL LEVEL APPROX
s=[size(app);s];%%%%GIVES THE FINAL APP DIMENSION

Wavefilter.m

function[ld,hd,hr,lr]=wavefilter(wname)
switch lower (wname)
case{'haar','db1'}
ld=[1 1]/sqrt(2);
hd=[-1 1]/sqrt(2);
lr=ld;
hr=-hd;
case 'db4'
ld=[-1.059740178499728e-002 3.288301166698295e-002 3.084138183598697e-002 ...
-1.870348117188811e-001 -2.798376941698385e-002 6.308807679295904e-001 ...
7.1484965705525415e-001 2.303778133088552e-001];
t= (0:7);
hd=ld;
hd(end:-1:1)=cos(pi*t).*ld;
lr=ld;
lr(end:-1:1)=ld;
hr=cos(pi*t).*ld;
case 'sym4'
ld=[-7.576571478927333e-002 -2.963552764599851e-002 4.976186676320155e-001 ...
8.037387518059161e-001 2.978577956052774e-001 -9.921954357684722e-002 ...
-1.260396726203783e-002 3.222310060404270e-002];
t=(0:7);
hd=ld;
hd(end:-1:1)=cos(pi*t).*ld;
lr=ld;
lr(end:-1:1)=ld;
hr=cos(pi*t).*ld;

case 'bior6.8'

ld=[0 1.908831736481291e-003 -1.914286129088767e-003 -1.699063986760234e-002 ...
1.193456527972926e-002 4.973290349094079e-002 -7.726317316720414e-002 ...
-9.405920349573646e-002 4.207962846098268e-001 8.2592299745840236e-001 ...
4.207962846098268e-001 -9.405920349573646e-002 -7.726317316720414e-002 ...
4.973290349094079e-002 1.193456527972926e-002 -1.699063986760234e-002 ...
-1.914286129088767e-003 1.908831736481291e-003];
hd=[0 0 0 1.442628250562444e-002 -1.44675489679015e-002 -7.872200106262882e-002 ...
4.036797903033992e-002 4.178491091502746e-001 -7.589077294536542e-001 ...
4.178491091502746e-001 4.036797903033992e-002 -7.872200106262882e-002 ...
-1.44675489679015e-002 1.442628250562444e-002 0 0 0];
t=(0:17);
lr=cos(pi*(t+1)).*hd;
hr=cos(pi*t).*ld;

case 'jpeg9.7'
ld=[0 0.02674875741080976 -0.01686411844287495 -0.07822326652898785 ...
0.2668641184428723 0.6029490182363579 0.2668641184428723 ...
-0.07822326652898785 -0.01686411844287495 0.02674875741080976];
hd=[0 -0.9127176311424948 0.05754352622849957 0.5912717631142470 ...
-1.115087052456994 0.5912717631142470 0.05754352622849957 ...
-0.9127176311424948 0 0];
t=(0:9);
lr=cos(pi*(t+1)).*hd;
hr=cos(pi*t).*ld;

otherwise
error('Unrecognizable wavelet name(wname)');
end

CHAPTER 6
RESULTS AND CONCLUSIONS

6.1 FIRST GRAPHICAL USER INTERFACE

Figure 6.1 Initial Graphical User Interface

The initial GUI shows the project title and other details regarding the project work. In
this window a CONTINUE button is used to go on to further processing, and a CLOSE button
terminates the application.

6.2 READING DOCUMENTED VIDEO FILE

Figure 6.2 Read documented video file

The screen above is created as the input interface, with a READ INPUT button that is used
to select the desired video file for processing from the selection window. The PROCESS
button moves further into the application, and the CLOSE button terminates the current
process.

6.3 DOCUMENTED VIDEO FILE

Figure 6.3 Documented video file

On selection of the READ INPUT button the selected input is displayed for processing. On
selection of TRAIN, the alphanumeric data stored in the database is trained for character
recognition.

6.4 TEXT ISOLATION USING HAAR WAVELET

Figure 6.4 Text isolation from video file using Haar wavelet

6.5 CONCLUSION
The project work realizes an efficient text segmentation algorithm and character
recognition for the isolated text data in a documented video image sequence. The text
isolation system implements three wavelet transformations, namely the Haar, Daubechies and
spline wavelets, and analyzes the effect of these wavelet transformations on the text
isolation process for a given video sequence. The system also analyzes the effect of the
decomposition level on a documented video image sequence. The obtained results show good
isolation of the text data from the image sequence for the biorthogonal spline wavelet,
with the best result at the fourth level of decomposition. From this multi-level and
multi-wavelet analysis the best-suited wavelet transform and the best level of
decomposition are obtained and used for text isolation and recognition. The spline wavelet
transformation gives more accurate isolation of text data than the Haar and Daubechies
wavelet transforms at higher levels of decomposition. The implemented design also realizes
a character recognition unit using a supervised learning process, and it is observed that
the system recognizes the isolated text data from the video sequence with high accuracy.

6.6 FUTURE SCOPE


The project work is implemented without considering the occlusion effect in a video
sequence; this work can be extended to consider occlusion of the text data. The project
work can also be enhanced for high-intensity video images with variable feature
components, such as moving text on a static image. A moving image with moving text can
also be considered as a further case of implementation.

REFERENCES
[1] Chung-Wei Liang and Po-Yueh Chen, “DWT Based Text Localization”, Int. J. Appl.
Sci. Eng., 2004. 2, 1.
[2] Jie Xi, Xian-Sheng Hua, Xiang-Rong Chen, Liu Wenyin, Hong-Jiang Zhang, “A
Video Text Detection And Recognition System”, Microsoft Research China 49
Zhichun Road, Beijing 100080, China.
[3] Xian-Sheng Hua, Pei Yin, Hong-Jiang Zhang. “Efficient Video Text Recognition
Using Multiple Frame Integration”, Microsoft Research Asia, 2.
[4] Céline Thillou and Bernard Gosselin, “Robust Thresholding Based On Wavelets
And Thinning Algorithms For Degraded Camera Images”, Faculté Polytechnique de
Mons, Avenue Copernic, 7000 Mons, Belgium.
[5] Céline Thillou, Bernard Gosselin, “Segmentation-Based Binarization For
Color-degraded Images”, Faculté Polytechnique de Mons, Avenue Copernic, 7000
Mons, Belgium.
[6] Maarten Jansen, Hyeokho Choi, Sridhar Lavu, Richard Baraniuk, “Multiscale Image
Processing Using Normal Triangulated Meshes”, Dept. of Electrical and Computer
Engineering Rice University Houston, TX 77005, USA.
[7] S. Antani D. Crandall R. Kasturi, “Robust Extraction of Text in Video”, Proceedings
of the International Conference on Pattern Recognition (ICPR'00) 2000 IEEE.
[8] Kobus Barnard and Nikhil V. Shirahatti, “A method for comparing content based
image retrieval methods”, Department of Computer Science, University of Arizona.
[9] Rainer Lienhart and Frank Stuber, “Automatic text recognition in digital videos”,
University of Mannheim, Praktische Informatik IV, 68131 Mannheim, Germany.
[10] Rainer Lienhart and Wolfgang Effelsberg, ”Automatic Text Segmentation and Text
Recognition for Video Indexing”, ACM/Springer Multimedia Systems Magazine.
[11] Jafar M. H. Ali Aboul Ella Hassanien, “An Iris Recognition System to Enhance E-
security Environment Based on Wavelet Theory”, AMO - Advanced Modeling and
Optimization, Volume 5, Number 2, 2003.
[12] Jovanka Malobabić, Noel O'Connor, Noel Murphy, Sean Marlow, “Automatic
Detection And Extraction Of Artificial Text In Video”, Adaptive Information
Cluster, Centre for Digital Video Processing, Dublin City University, Dublin,
Ireland.
[13] Chew Lim Tan, Ruini Cao, Qian Wang, Peiyi Shen, “Text Extraction from
Historical Handwritten Documents by Edge Detection”, School of Computing,
National University of Singapore,10 Kent Ridge Crescent, Singapore 119260.
[14] Aurelio Velázquez and Serguei Levachkine, “Text/Graphics Separation and
Recognition in Raster-scanned Color Cartographic Maps”, Centre for Computing
Research (CIC) - National Polytechnic Institute (IPN).
