DIGITAL IMAGE PROCESSING
A signal is an information-carrying function of time. Real-time signals can be audio or video (image) signals. A still video frame is called an image; a moving image is called a video. The difference between digital image processing (DIP) and signals and systems is that there is no time axis in DIP: the x and y coordinates in DIP are spatial coordinates, because a photograph does not change with time.
IMAGE: An image is defined as a two-dimensional function f(x, y), where x and y are spatial coordinates and the amplitude f at any point (x, y) is the intensity of the image at that point.
PIXEL: A pixel (short for picture element) is a single point in a graphic image. Each such information element is not really a dot, nor a square, but an abstract sample. In the matrix representation of a binary image, each element is a pixel, with dark = 0 and light = 1. A pixel with only 1 bit can represent a black-and-white image. If the number of bits is increased, the number of gray levels increases and better picture quality is achieved.
All naturally occurring images are analog in nature. The more pixels an image has, the greater its clarity. An image is represented as a matrix in DIP, whereas in DSP we use only row matrices. Naturally occurring images must be sampled and quantized to obtain a digital image. A good image has about 1024 × 1024 pixels, i.e. 1k × 1k = 1M pixels.
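The relation between bit depth and the number of gray levels can be shown with a short sketch. The project itself uses MATLAB; this Python version is purely illustrative:

```python
def gray_levels(bits):
    """Number of distinct gray levels available at a given bit depth."""
    return 2 ** bits

def quantize(value, bits):
    """Quantize an intensity in [0.0, 1.0] to one of 2**bits gray levels."""
    levels = gray_levels(bits)
    return min(int(value * levels), levels - 1)

print(gray_levels(1))    # 2 levels: a black-and-white image
print(gray_levels(8))    # 256 gray levels
print(quantize(0.5, 8))  # mid-gray maps to level 128
```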
FUNDAMENTAL STEPS IN DIP:
The wavelet transform gives time and frequency information simultaneously, hence giving a time-frequency representation of the signal. Although the time and frequency resolution problems are the result of a physical phenomenon (the Heisenberg uncertainty principle) and exist regardless of the transform used, it is possible to analyze any signal using an alternative approach called multiresolution analysis (MRA). MRA analyzes the signal at different frequencies with different resolutions. It is designed to give good time resolution and poor frequency resolution at high frequencies, and good frequency resolution and poor time resolution at low frequencies.
The main parts of image sensing are:
1. Sensor (converts optical energy to electrical energy)
2. Digitizer (converts the analog signal to a digital signal by sampling and quantization)
Light and the electromagnetic spectrum: Light is the part of the electromagnetic spectrum that can be seen and sensed by the human eye. Light travels at a speed of 3 × 10^8 m/s. Visible light can be split into VIBGYOR, ranging from violet (0.43 micrometres) to red (0.79 micrometres). A substance that absorbs all colours appears black, and one that absorbs none appears white; a substance that reflects only blue appears blue. Colour is the part of the light spectrum that an object reflects rather than absorbs. Light that is void of colour is called monochromatic or achromatic light; its only property is intensity, or gray level.
Properties of light:
1. Radiance: The total energy that flows from a light source. Its unit is the watt. Examples of sources are the sun and a bulb.
2. Luminance: The amount of energy that an observer perceives from the source, measured in lumens. For example, the sun viewed through dark glasses has the same radiance but a much lower perceived luminance.
3. Brightness: An attribute of visual perception in which a source appears to emit a given amount of light. It has no units, as it is practically impossible to measure.
IMAGE SENSING AND ACQUISITION:
In image sensing, light energy is converted into voltage. Image acquisition can be done using three principal sensor arrangements:
1. Single sensor
2. Line sensor/strip sensor
3. Array sensor
If something changes more than about 17 times per second, i.e. at a frequency greater than 17 Hz, the human eye cannot distinguish the individual changes.
All real-time signals are analog in nature, but we need a digital image, so we acquire an analog image and digitize it. For this we need an A/D converter. To convert naturally occurring images into digital form we must digitize both the coordinates and the amplitudes. Digitizing the coordinate values is called sampling, and digitizing the amplitude values is called quantization. Hence the quality of a digital image depends on the number of samples (sampling) and the number of gray levels (quantization): the more samples, the better the quality.
A 1M-pixel image requires 1024 rows and 1024 columns. If each pixel is represented using 8 bits (1 byte), the total memory required to store the image = 1k × 1k × 1 byte = 1 MB.
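The storage arithmetic above can be checked with a one-line computation; an illustrative Python sketch (the project itself uses MATLAB):

```python
def image_bytes(rows, cols, bits_per_pixel):
    """Memory in bytes for an uncompressed grayscale image."""
    return rows * cols * bits_per_pixel // 8

# A 1k x 1k image at 8 bits (1 byte) per pixel takes 1 MB.
print(image_bytes(1024, 1024, 8))  # 1048576 bytes = 1 MB
```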
RESOLUTION: Resolution is classified into two types:
1. Spatial
2. Graylevel
1. SPATIAL RESOLUTION:
It is the smallest discernible detail in the image. Consider vertical lines of width w with spaces between them, also of width w. A line pair consists of one such line and its adjacent space, so the width of a line pair is 2w and there are 1/(2w) line pairs per unit distance. Thus spatial resolution is the number of discernible line pairs per unit distance, for example 100 line pairs per unit distance.
2. GRAYLEVEL RESOLUTION:
It is the smallest discernible change in gray level, which is determined by the number of gray levels (i.e. the number of bits per pixel) used.
INTRODUCTION TO MATLAB
MATLAB Introduction
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment. MATLAB stands for matrix laboratory; it was originally written to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects.
Typical uses include:
1. Math and computation
2. Algorithm development
3. Data acquisition
Its key features include:
1. A large collection of predefined mathematical functions and the ability to define one's own functions.
2. A powerful, matrix/vector-oriented high-level programming language for individual applications.
MATLAB consists of the core programming language plus toolboxes for specific domains, including:
Signal processing
Image processing
Control systems
Neural networks
Communications
Robust control
Statistics
The MATLAB System
The MATLAB system consists of five main parts:
Development Environment.
This is the set of tools and facilities that help you use MATLAB functions and files. Many of
these tools are graphical user interfaces. It includes the MATLAB desktop and Command
Window, a command history, an editor and debugger, and browsers for viewing help, the
workspace, files, and the search path.
The MATLAB Mathematical Function Library.
This is a vast collection of computational algorithms ranging from elementary functions, like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB Language.
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick and dirty throw-away programs, and
"programming in the large" to create large and complex application programs.
Graphics.
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as
annotating and printing these graphs. It includes high-level functions for two-dimensional and
three-dimensional data visualization, image processing, animation, and presentation graphics.
It also includes low-level functions that allow you to fully customize the appearance of graphics as well as to build complete graphical user interfaces for your MATLAB applications.
The MATLAB Application Program Interface (API).
This is a library that allows you to write C and Fortran programs that interact with MATLAB.
Starting MATLAB
On Windows platforms, start MATLAB by double-clicking the MATLAB shortcut icon on your Windows desktop. On UNIX platforms, start MATLAB by typing matlab at the operating system prompt. You can customize MATLAB startup; for example, you can change the directory in which MATLAB starts, or automatically execute MATLAB statements from a script file named startup.m.
MATLAB Desktop
When you start MATLAB, the MATLAB desktop appears, containing tools (graphical user interfaces) for managing files, variables, and applications associated with MATLAB. The following illustration shows the default desktop. You can customize the arrangement of tools and documents to suit your needs; see the MATLAB documentation for more information about the desktop tools.
Implementations
1. Arithmetic operations
2. Matrix operations
3. Graphical Representation
SYSTEM OVERVIEW
SPECIFICATION:
This project is an offline application. Car images are taken with a digital or traditional camera, and a program written in MATLAB is used to identify the number plate. The processing stages are:
Plate Extraction
Character Segmentation
Character Recognition
Display Number
CONSTRAINTS:
1. All pictures of cars are taken from a fixed angle, parallel to the horizon.
2. The car is stationary when the image is taken.
3. Car number plates must conform to the Central Motor Vehicle Rules, 1989.
4. Images whose intensity is too high or too low are not dealt with.
SYSTEM DESIGN
IMAGE ACQUISITION:
We use the yellow license plate at the back of the car as the input image, and take two sets of photos. The photos are then resized to a resolution of 256 × 192 pixels. Although the program does not take long to execute, reducing the resolution reduces the computation time further.
PLATE EXTRACTION:
The stages are:
Binarization
Find Edges
Hough Transformation
Character Segmentation
Character Recognition
BINARIZATION:
We first need to change the colour image to a binary image: yellow is mapped to white and non-yellow to black. As the RGB colour space is greatly affected by lighting, we cannot use it directly to determine colours.
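The report does not name the colour space actually used; a common choice in this situation (assumed here) is HSV, whose hue component is far less sensitive to lighting than raw RGB. An illustrative Python sketch using the standard colorsys module; the hue/saturation/value thresholds below are assumptions, not the project's values:

```python
import colorsys

def is_yellow(r, g, b):
    """Classify an RGB pixel (0-255 channels) as yellow by its HSV hue.

    Yellow hue lies near 60 degrees; the saturation and value thresholds
    (assumed here) reject dark and washed-out pixels.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return 40 / 360 <= h <= 75 / 360 and s > 0.4 and v > 0.3

def binarize(rgb_image):
    """Map yellow pixels to 1 (white) and everything else to 0 (black)."""
    return [[1 if is_yellow(*px) else 0 for px in row] for row in rgb_image]

plate = [[(230, 200, 30), (20, 20, 20)]]  # one yellow pixel, one dark pixel
print(binarize(plate))  # [[1, 0]]
```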
FIND EDGES:
We can use a closing operator to outline the four edges of the license plate in the binary image:
1. Canny edge detection: detect gradient changes (from black to white and from white to black)
2. Dilation: add a border around the detected edges
3. Filling: fill in small holes
4. Erosion: remove most of the dilated (border) pixels; once the interior holes are filled they cannot be removed, so the original size of the license plate is unchanged
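Dilation and erosion, the two morphological operations in the steps above, can be sketched in a few lines. This illustrative Python version (the project uses MATLAB's image functions) assumes a 3x3 structuring element and zero padding at the borders:

```python
def neighbors(img, y, x):
    """3x3 neighbourhood of (y, x); out-of-range pixels count as 0."""
    h, w = len(img), len(img[0])
    return [img[j][i] if 0 <= j < h and 0 <= i < w else 0
            for j in range(y - 1, y + 2) for i in range(x - 1, x + 2)]

def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[1 if any(neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

def erode(img):
    """A pixel stays 1 only if its whole 3x3 neighbourhood is 1."""
    return [[1 if all(neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]

dot = [[0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
grown = dilate(dot)          # the point grows a one-pixel border
print(grown[0])              # [0, 1, 1, 1, 0]
print(erode(grown) == dot)   # True: erosion removes the dilated border
```

Dilation followed by filling and erosion removes the added border while keeping any filled interior holes, which is why the plate outline keeps its original size.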
HOUGH TRANSFORMATION:
Hough Transformation is used to detect lines. We use two domains to explain it. One
is spatial domain (Fig. 4) and the other is parameter domain (Fig. 5). A line in a spatial
domain is represented by a point in the parameter domain; a point in a spatial domain is
represented by a line in the parameter domain.
We can see that the line in Fig. 4 is represented by a point in Fig. 5. If we take 20
points on the line in Fig. 4, they are represented by 20 lines in Fig. 5. All the 20 lines pass
through one point. We set up an accumulator cell for each point in the parameter domain; for every line passing through that point, the cell is increased by one. The cell at the common intersection therefore adds up to 20.
Fig. 6 Accumulator Cells
By looking at the accumulator cells, we can see that there is a peak equal to 20. This is the position of the line in the spatial domain.
A program is written to set up an accumulator array A(r, theta). r is bounded by the longest line possible in the 256 × 192 pixel image, which is floor(sqrt(256^2 + 192^2)) = 320; theta ranges from 1 to 360 degrees.
All values of x and y in the image are searched for feature points, i.e. pixels with intensity 1 (white). For each (x, y) feature point, all the corresponding (r, theta) points are found. A negative r is not 'voted for'; otherwise, the accumulator cell is increased by one.
The maximum value of the array, A(rm, thetam), is found, along with the corresponding (rm, thetam) values.
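The accumulator procedure described above can be sketched as follows (illustrative Python; the project's implementation is in MATLAB). Only the accumulation rule from the text is encoded: theta runs from 1 to 360 degrees, r = x cos(theta) + y sin(theta), and negative r is not voted for:

```python
import math

R_MAX = math.floor(math.sqrt(256 ** 2 + 192 ** 2))  # longest line: 320

def hough(feature_points):
    """Vote in the accumulator A(r, theta) for each white feature point."""
    acc = {}
    for (x, y) in feature_points:
        for theta in range(1, 361):
            t = math.radians(theta)
            r = round(x * math.cos(t) + y * math.sin(t))
            if 0 <= r <= R_MAX:  # negative r is not voted for
                acc[(r, theta)] = acc.get((r, theta), 0) + 1
    return acc

# 20 collinear points on the horizontal line y = 5: they all vote for
# (r = 5, theta = 90), producing a peak of height 20.
acc = hough([(x, 5) for x in range(20)])
print(acc[(5, 90)])       # 20
print(max(acc.values()))  # 20
```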
CHARACTER SEGMENTATION:
The stages are:
Horizontal Projection
Vertical Projection
Character Segmentation
Fine Tuning
Character Recognition
3.3.1 Scaling
We use pixel counts to remove noise and segment characters. However, the characters in a photo taken nearer are larger and contain more pixels (Fig. 8), while the characters in a photo taken farther away are smaller and contain fewer pixels (Fig. 10). Therefore, we need to scale the extracted license plate to a pre-defined dimension so that every character has roughly the same height and width. After scaling, the size of the characters in Fig. 8 is roughly the same as in Fig. 10, and we can then use pixel counts.
Fig. 7 Photo taken nearer
HORIZONTAL PROJECTION:
There is always noise around the extracted license plate (Fig. 11), so we need to remove or reduce it before segmenting the characters. We can use a horizontal projection to see the distribution of the pixels (Fig. 12): the x-axis is the height of the license plate (from top to bottom) and the y-axis is the pixel count per row of the license plate [4].
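A horizontal projection (and the vertical projection used later) is simply a per-row or per-column pixel count of the binary image. A minimal illustrative Python sketch (the project itself uses MATLAB):

```python
def horizontal_projection(img):
    """Pixel count per row of a binary image (top to bottom)."""
    return [sum(row) for row in img]

def vertical_projection(img):
    """Pixel count per column of a binary image (left to right)."""
    return [sum(col) for col in zip(*img)]

plate = [[0, 0, 0, 0],  # a noise-free top row
         [1, 1, 0, 1],
         [1, 1, 0, 1]]
print(horizontal_projection(plate))  # [0, 3, 3]
print(vertical_projection(plate))    # [2, 2, 0, 2]
```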
Double-Line License Plates:
The noise above the characters usually amounts to fewer than 5 pixels per row, so we choose 5 as the threshold; in addition, there must be more than 5 consecutive rows with a pixel count greater than 5 for the rows to be considered useful.
The noise below the characters usually amounts to fewer than 15 pixels per row, so we choose 15 as the threshold there; again, there must be more than 5 consecutive rows with a pixel count greater than 15.
After removing the noise above and below the plate, we need to separate the top set of characters from the bottom set. Each set occupies about half of the plate (Fig. 14, Fig. 15).
To locate the bottom of the top set of characters, we search from the middle and trace towards the top; the first row with a pixel count above 5 is the bottom of the top set. To locate the top of the bottom set, we search from the middle and trace towards the bottom; the first row with a pixel count above 15 is the top of the bottom set.
After removing the noise and separating the license plate into two sets, the result is Fig. 16.
VERTICAL PROJECTION:
There is always noise around the characters, so we need to distinguish the characters from the noise. We can use a vertical projection to see the distribution of the pixels (Fig. 18): the x-axis is the width of the license plate (from left to right) and the y-axis is the pixel count per column of the license plate [4].
Ideal Case
When there is no noise between the characters (Fig. 17), there is a sharp valley where the pixel count = 0 (Fig. 18). This is the break point between characters. To be more precise, we choose 2 as the threshold: only columns with a pixel count greater than 2 are considered useful. By locating these valleys, the positions of the characters are found and the characters are segmented (Fig. 19).
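Segmenting at the valleys of the vertical projection with the threshold of 2 can be sketched as follows (illustrative Python; the data below are toy values, not a real projection):

```python
def segment_characters(col_counts, threshold=2):
    """Split a vertical projection into character column ranges.

    Columns with pixel count > threshold are 'useful'; each maximal run
    of useful columns is returned as a (start, end) index pair.
    """
    segments, start = [], None
    for i, count in enumerate(col_counts + [0]):  # sentinel ends last run
        useful = count > threshold
        if useful and start is None:
            start = i
        elif not useful and start is not None:
            segments.append((start, i - 1))
            start = None
    return segments

# Two characters separated by a sharp valley of near-zero columns.
counts = [0, 9, 8, 7, 0, 0, 6, 9, 8, 0]
print(segment_characters(counts))  # [(1, 3), (6, 8)]
```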
Connected Characters
When there is noise between the characters (Fig. 20), there is no sharp valley and it is difficult to find the break points. We therefore check the width of each separated character: if the width of a segmented character is greater than 20 pixels, it is composed of several characters and needs further segmentation.
Fig. 25 Isolated characters (GU8767)
FINE TUNING
After segmenting the characters, we need to further remove the noise and the white lines above and below the characters caused by their non-parallel arrangement in the photo. We also need to scale every character to the same dimension so that it can be compared with the templates stored in the database in the next procedure.
To remove the white lines above and below each character, we use a horizontal projection and set a threshold; if the pixel count of a row is less than the threshold, the row is discarded (characters '2' and '5' in Fig. 26, and characters '5', '8' and '4' in Fig. 28).
To remove the noise on the left and right of a character, we use a vertical projection and set a threshold; if the pixel count of a column is less than the threshold, the column is discarded (character 'J' in Fig. 26).
The character is then scaled to a pre-defined dimension (Fig. 27).
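Scaling every character to a pre-defined dimension can be done with nearest-neighbour sampling; the report does not state which interpolation it uses, so this Python sketch is only one plausible choice:

```python
def scale_nearest(img, new_h, new_w):
    """Scale a binary character image to a fixed size by nearest-neighbour
    sampling, so every character can be compared against the templates."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

char = [[1, 0],
        [0, 1]]
print(scale_nearest(char, 4, 4))  # each pixel becomes a 2x2 block
```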
CHARACTER RECOGNITION:
The recognition stage involves:
Component Labeling
Vertical Projection
Character Segmentation
Character Recognition
TEMPLATE MATCHING:
A list of templates is stored in two databases, one for alphabetic characters and the other for numeric characters. The templates are chosen from the isolated characters; there are two to three templates for each character, of high quality and with outstanding features.
The first two isolated characters are compared with the templates in the alphabet database; the last three to four isolated characters are compared with the templates in the numeric database.
We measure the degree of matching between the isolated character and each stored template and select the highest degree of match. The maximum value of the 2D correlation is 1.
Template characters
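The degree of match can be computed as a normalized 2D correlation (MATLAB's corr2 computes this quantity). Below is an illustrative pure-Python equivalent; the tiny 3x3 'T' and 'L' templates are hypothetical stand-ins for the project's real character templates:

```python
import math

def correlation2d(a, b):
    """Normalized 2D correlation of two same-sized binary images;
    identical images give the maximum value 1.0 (cf. MATLAB's corr2)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) *
                    sum((y - mb) ** 2 for y in fb))
    return num / den

def best_match(char, templates):
    """Pick the template label with the highest degree of match."""
    return max(templates,
               key=lambda label: correlation2d(char, templates[label]))

templates = {'T': [[1, 1, 1],
                   [0, 1, 0],
                   [0, 1, 0]],
             'L': [[1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]]}
char = [[1, 1, 1],
        [0, 1, 0],
        [0, 1, 0]]
print(best_match(char, templates))                         # T
print(round(correlation2d(char, templates['T']), 6))       # 1.0
```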
COMPONENT LABELING:
With template matching alone, some characters are often mixed up: '8' & '3', '8' & '6', '8' & '9', and '3' & '9'. Therefore, if a character is matched as '3', '6', '8' or '9' and its correlation value is less than 0.7, it is further recognized by component labeling.
Labeling method:
Character '3' has one stroke and no holes – labeled 1
Character '6' has one stroke and one hole – labeled 2
Character '8' has one stroke and two holes – labeled 3
Character '9' has one stroke and one hole – labeled 2
Characters '3' and '8' have different labels, so they can be distinguished. However, characters '6' and '9' have the same label, so they need to be further recognized by vertical projection.
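The hole count that drives the labels above can be computed by flood-filling the background from the image border: any background region not reached from the border is a hole. An illustrative Python sketch (the project's MATLAB code may compute the labels differently):

```python
def count_holes(img):
    """Count interior background regions (holes) in a binary character."""
    h, w = len(img), len(img[0])
    seen = set()

    def flood(y, x):
        """Flood-fill 4-connected background (value 0) from (y, x)."""
        stack = [(y, x)]
        while stack:
            j, i = stack.pop()
            if (0 <= j < h and 0 <= i < w and img[j][i] == 0
                    and (j, i) not in seen):
                seen.add((j, i))
                stack += [(j + 1, i), (j - 1, i), (j, i + 1), (j, i - 1)]

    for x in range(w):      # fill the outside background from the border
        flood(0, x)
        flood(h - 1, x)
    for y in range(h):
        flood(y, 0)
        flood(y, w - 1)

    holes = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] == 0 and (y, x) not in seen:
                flood(y, x)  # fill the whole hole, count it once
                holes += 1
    return holes

eight = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
print(count_holes(eight))  # 2 holes, so '8' gets label 3 (stroke + two holes)
```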
VERTICAL PROJECTION:
To distinguish between characters '6' and '9', we can use a vertical projection to see the distribution of the pixels (Fig. 30 and Fig. 31): the x-axis is the width of the character (from left to right) and the y-axis is the pixel count per column of the character.
We add up the pixel counts of the first four columns and of the last four columns. If the pixel count of the first four columns is greater than that of the last four, the character is '6' (Fig. 30); otherwise, it is '9' (Fig. 31).
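The '6'-versus-'9' rule above is a direct comparison of column sums; sketched in Python with a toy bitmap (not the project's actual character images):

```python
def six_or_nine(img):
    """Distinguish '6' from '9' by comparing the pixel count of the first
    four columns against the last four, as described above."""
    cols = [sum(col) for col in zip(*img)]
    return '6' if sum(cols[:4]) > sum(cols[-4:]) else '9'

# Toy glyph that is heavier on its left side, as a '6' is.
six_like = [[1, 1, 0, 0, 0, 0, 0, 0],
            [1, 0, 0, 0, 0, 0, 0, 0],
            [1, 1, 1, 0, 0, 0, 1, 1]]
print(six_or_nine(six_like))  # 6
```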
SNAPSHOTS
1. Loading car image
3. Noise removed
4. Binarization performed
5. Edges detected
7. Noise removed
FINAL OUTPUT:
10. Final result
RESULT:
The text characters are recognized and displayed to the user in WordPad.
EXPERIMENTAL RESULTS & IMPROVEMENTS
IMAGE ACQUISITION:
Photos with higher resolution (1024 × 768 pixels) give better results than photos with lower resolution (640 × 480 pixels). Photos taken nearer are better than photos taken farther away: distant photos are dimmer, so they have more dark areas on the license plate, which produce noise.
Improvement: apply histogram equalization to the image; it spreads out the pixel intensities.
CHARACTER ISOLATION:
A single-line license plate is handled better than a double-line plate, because the top and bottom sets of characters must additionally be separated.
CHARACTER RECOGNITION:
CONCLUSION:
Our project combines many methods to extract the license plate, segment the characters, and identify the characters. Further improvement should focus not only on the accuracy of identification, but also on the accuracy of license plate extraction and character segmentation.
A MATLAB program that goes through all the stages of license plate recognition has been built. It is helpful for understanding the procedures of license plate recognition step by step: image acquisition, plate extraction, character segmentation and character recognition.
BIBLIOGRAPHY:
http://www.mathworks.com/products/image/demos.jsp#
http://www.cat.csiro.au/cmst/AC/expertise/Expertise.php?ocr
Template Matching
http://en.wikipedia.org/wiki/Template_matching#cite_note-0
http://www-cs-students.stanford.edu/~robles/ee368/matching.html
Textbooks: