International Journal of Applied Engineering Research, ISSN 0973-4562, Vol. 9, No. 20, 2014
Research India Publications  http://www.ripublication.com/ijaer.htm

Combined histogram chain code feature extraction method to recognize handwritten digits with probabilistic neural network

N. Venkateswara Rao
Dept. of Computer Science & Engineering, R.V.R. & J.C. College of Engineering, Guntur, INDIA
vraonaramala@gmail.com

B. Raveendra Babu
VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, INDIA
rbhogapathi@yahoo.com

Abstract - This paper presents a novel approach for automatic recognition of handwritten digits using a histogram chain code feature extraction method with a probabilistic neural network classifier. Handwritten digit recognition is a vital research topic in the domain of pattern recognition. The features are extracted from the digit images using histogram chain codes. Four directional histogram chain codes are calculated: horizontal, vertical, left diagonal, and right diagonal. To increase the recognition rate, the chain code features are used to train a probabilistic neural network that classifies the digit images. The individual histogram chain code features give reasonably good accuracy, while the combined histogram chain code features give a recognition rate of 98.1%, which is better than any individual histogram chain code feature. The strength of this approach is the probabilistic neural network classification scheme, due to which we have been able to achieve a high recognition rate.

Keywords: handwritten digits, probabilistic neural network, chain code histogram, feature extraction.

INTRODUCTION
Handwritten digit recognition is an active research area in the fields of artificial intelligence, pattern recognition, and computer vision. It refers to the translation of scanned or photocopied images of handwritten text into machine-editable text. The recognition of handwritten characters is comparatively more difficult than the recognition of machine-printed characters [5], because people have different handwriting styles. Each handwritten digit image is unique in many ways, and if we can extract the unique features of the character image, we can train the computer on that particular character. Each character has a different set of features which can be used when comparing it with a test character; in this way we can make the computer recognize a character. Handwritten character recognition therefore remains an active area of research [1, 6]. The approach used in this system is a two-step process: 1) represent the handwritten digits as a histogram chain code feature vector [20, 21], and 2) classify the feature vector into a set of classes.

The recognition rate of handwritten digits depends on the selection of the feature extraction method and the classifier. To achieve a high recognition rate, we use a combined histogram chain code feature extraction method and a probabilistic neural network classifier. If the feature extraction method is robust, then classification of the features is easy, because similar sets of features are generated for a variety of samples of the same digit. Feature extraction methods for handwritten digits are broadly classified into two types: statistical features and structural features. The most common statistical features used for handwritten character recognition are: a) zoning, where the character image is divided into multiple zones and features are extracted from each zone; b) projections, such as horizontal and vertical projections; and c) crossings, which count the number of transitions from foreground to background pixels along horizontal and vertical lines, and distances [11]. Structural features are based on topological and geometrical properties of the character image [12]. In addition to the above methods, many different feature extraction techniques have been described in the literature [8, 9, 15, 16, 18, 19]. The selection of a neural network classification scheme is not an easy task, since the classifier depends on many factors, such as the available training samples, the number of hidden neurons, the type of training algorithm, the transfer function, etc. For many decades, different classification methods have been applied to handwritten digit recognition. These methods include statistical methods such as the Bayes decision rule, Artificial Neural Networks (ANNs) [11, 13, 17], kernel methods including the Support Vector Machine (SVM) [14], Hidden Markov Models (HMMs) [3, 4], and multiple classifier combination [10].

METHODOLOGY
The basic processes involved in any handwritten digit recognition system are pre-processing the input image data, extracting features from the input data, and classifying the extracted features. The steps involved in pre-processing are binarization, cropping, normalization, and thinning. In binarization, the given image is converted into a binary image by computing the average threshold value of the image. Normally the image data is not aligned to the center, so the images are cropped.

In normalization, all the sample images of the dataset are set to a predefined size, since the images may not all have the same size. In thinning, the normalized image is reduced to a single-pixel thickness. In the feature extraction process, the image pixels of a thinned image are summed along different directions into numeric values, which are represented as a histogram chain code. Classification is performed on the histogram chain code features of the different directions with a probabilistic neural network. The overall process of this methodology is shown in Figure 1.

[Figure 1: Block diagram of the overall method: pre-processing (binarization, normalization, thinning), histogram chain code feature extraction (horizontal, vertical, left diagonal, right diagonal), and classification (feature vector, neural network, class label).]

Pre-processing
Pre-processing is one of the preliminary steps in handwritten character recognition systems. Before the raw data is used for feature extraction, it has to undergo certain preliminary processes so that accurate results can be obtained. This step helps to correct deficiencies in the data which may have occurred due to the limitations of the sensor or of photocopied data. The input for a system may be captured under different environmental conditions; the same object may give different images when captured at different times and under different conditions. By pre-processing, we therefore obtain data that is easier for the system to operate on, thereby producing accurate results. Before the data is input to the network, the image is converted into a binary image. The binary image is then resized to 32 x 32 pixels. Afterwards, the image is thinned to a single-pixel thickness so that only the skeleton remains. The pre-processing of one sample image taken from the digit dataset is shown in Figure 2.

[Figure 2: Process of the pre-processing method for one sample image.]
Feature Extraction
In statistics, a histogram is a graphical representation showing a visual impression of the distribution of data. In image processing, the histogram of an image normally refers to the distribution of pixel intensity values: it plots the number of pixels at each intensity value found in the image. There are 256 possible intensities in a grayscale image, so its histogram graphically displays 256 numbers showing the distribution of the pixels among those grayscale values. Here, instead of intensity values, we represent the histogram of an image as the count of foreground pixels along each direction. Once the thinned image is obtained, the horizontal, vertical, left diagonal, and right diagonal histograms of the image are calculated and taken as feature vectors. All four directional histogram sequences are then combined into a single integer sequence, which forms the feature vector of the digit image sample. This combined integer sequence is represented as a chain code of the digit image; a classical chain code represents the directional information of an image. The histogram chain code features are used in the neural network classifier to recognize the digit images. Figure 3 shows the four directional chain code histograms computed for the sample image shown in Figure 2.

[Figure 3: Four directional chain code histograms: (a) horizontal, (b) vertical, (c) left diagonal, (d) right diagonal.]
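As a concrete sketch of this feature extraction step, the four directional pixel-count histograms of a 32 x 32 thinned image can be computed as below. The function name and the choice of which diagonal direction is called "left" versus "right" are assumptions; the paper fixes only the counts, 32 + 32 + 63 + 63 = 190 values, as reported in the Results section.

```python
import numpy as np

def histogram_chain_code(thinned):
    """Build the combined directional histogram chain code feature vector.

    thinned: 32 x 32 binary (0/1) skeleton image.
    Returns a 1-D integer vector of 32 + 32 + 63 + 63 = 190 pixel counts.
    """
    h, w = thinned.shape                      # 32 x 32
    horizontal = thinned.sum(axis=1)          # 32 row counts
    vertical = thinned.sum(axis=0)            # 32 column counts

    # A 32 x 32 image has 63 diagonals (offsets -31 .. +31) per direction.
    offsets = range(-(h - 1), w)
    left_diagonal = np.array([thinned.trace(offset=k) for k in offsets])
    right_diagonal = np.array([np.fliplr(thinned).trace(offset=k)
                               for k in offsets])

    # Combine the four directional histograms into one integer sequence.
    return np.concatenate([horizontal, vertical,
                           left_diagonal, right_diagonal])
```

Concatenating the four histograms yields the combined integer sequence that the paper calls the histogram chain code of the digit.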

Classification
A Probabilistic Neural Network (PNN) is a close relative of the back-propagation neural network. The PNN is motivated by the theory of Bayesian classification and by the classical estimation of probability density functions (PDFs). The architecture of the probabilistic neural network is feed-forward in nature, analogous to that of the back-propagation neural network, but the distinction between the two is the way the learning process occurs. The probabilistic neural network is a supervised learning mechanism, but it includes no adjustable weights in its hidden layer: each hidden layer node represents a pattern or feature vector, with the training example itself substituting for the weights of that hidden node. These hidden layer nodes are not adjusted at all. The sample architecture of the PNN is shown in Figure 4.

[Figure 4: Probabilistic Neural Network architecture, with input layer, hidden (pattern) layer, summation layer, and output layer.]

The architecture of the probabilistic neural network consists of four layers, i.e., the input layer, hidden (pattern) layer, summation layer, and output layer, as shown in Figure 4. The input layer is fully interconnected with the hidden layer, which consists of the example feature vectors of the training set; the example vectors serve as the weights applied to the input layer. The output layer represents each of the possible classes into which the input vector can be classified. Classification in the PNN is done by the output layer using the winner-takes-all method: the output class node with the largest activation represents the winning class. The hidden layer nodes, however, are not fully interconnected to the output layer nodes. The basic operation performed by the PNN is an estimation of the probability density function of the features of each class from the training samples, using the Gaussian kernel method. The training process performed by the net is shown in Algorithm 1. The estimated probability densities are then used in a Bayes decision rule to perform the classification, which is shown in Algorithm 2.

Algorithm 1: PNN Training
Step 1: For each training input pattern x(p), p = 1, ..., P, perform steps 2-3.
Step 2: Create a hidden layer (pattern) unit Z_p, with weight vector w_p = x(p).
Step 3: Connect the hidden unit to the summation layer: if x(p) belongs to Class A, connect hidden unit Z_p to summation unit S_A; otherwise connect Z_p to summation unit S_B, and so on.

In the PNN algorithm, calculating the class-node activations is a simple process. For each class node, the activations of its example vectors are summed, where each hidden-node activation is the product of the example vector and the input vector. The hidden-node activation, shown in the following equation, is simply the dot product of the two vectors (E_i is the i-th example vector and F is the input feature vector):

h_i = E_i · F

The class output activation is then defined as:

c = (1/N) Σ_{i=1..N} exp[(h_i − 1) / σ²]

where N is the total number of example vectors for the class, h_i is the hidden-node activation, and σ is a smoothing factor. The smoothing factor is chosen by experimentation: if it is too small, the classifier may not generalize well, while if it is too large, details can be lost. Even in a noisy data environment, the probabilistic neural network generalizes very well.

Algorithm 2: PNN Classification
Step 1: Initialize the weights from the training algorithm.
Step 2: For each pattern to be classified, do steps 3-5.
Step 3: In the hidden layer units:
i) calculate the net input, z_in,j = x · w_j = xᵀ w_j;
ii) calculate the output, z_j = exp[(z_in,j − 1) / σ²].
Step 4: In the summation layer units, the weight used by the summation unit for Class B is
v_B = − (P_B C_B m_A) / (P_A C_A m_B)
where P is the prior probability of a class, C is the cost of misclassifying it, and m is the number of its training vectors.
Step 5: In the output unit: sum the signals from f_A and f_B; the input vector is classified as Class A if the total input to the decision unit is positive.
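To make Algorithms 1 and 2 concrete, here is a minimal PNN sketch under simplifying assumptions: the class name, the default sigma, and the unit-normalization of pattern vectors are ours, and equal priors and misclassification costs are assumed so that the summation-layer weights reduce to simple averages of the hidden-node outputs.

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network sketch (equal priors and costs)."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma      # smoothing factor, chosen by experimentation
        self.examples = {}      # class label -> list of stored pattern vectors

    def train(self, vectors, labels):
        # Algorithm 1: each training pattern becomes a hidden (pattern) unit
        # whose weight vector is the pattern itself; nothing is adjusted.
        for x, y in zip(vectors, labels):
            x = np.asarray(x, dtype=float)
            x = x / np.linalg.norm(x)        # unit-normalize (assumes x != 0)
            self.examples.setdefault(y, []).append(x)

    def classify(self, x):
        # Algorithm 2: hidden activation h_i = E_i . F, class activation
        # c = (1/N) * sum_i exp((h_i - 1) / sigma^2), winner-takes-all output.
        x = np.asarray(x, dtype=float)
        x = x / np.linalg.norm(x)
        activations = {}
        for label, patterns in self.examples.items():
            h = np.dot(np.vstack(patterns), x)   # one dot product per hidden node
            activations[label] = np.exp((h - 1.0) / self.sigma ** 2).mean()
        return max(activations, key=activations.get)
```

Unit-normalizing both the stored examples and the input vector bounds the hidden activation h_i = E_i · F by 1, so exp[(h_i − 1)/σ²] peaks when a test vector coincides with a stored pattern.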
RESULTS
Experiments are conducted on the MNIST digit dataset. We have taken 100 samples of each digit from the dataset, for a total of 1000 samples. For every sample, the four directional chain code histogram feature vector is computed. All the images are normalized to 32 x 32, so the input pattern vector consists of 32 horizontal chain code values, 32 vertical chain code values, 63 left diagonal chain code values, and 63 right diagonal chain code values, a total of 190 values. The individual chain code histogram features are first used to train the probabilistic neural network classifier, and the recognition rate is observed. The four directional chain code histograms are then combined into a single histogram chain code feature vector. The combined histogram chain code features, trained with the probabilistic neural network, give better results than the individual histogram chain codes.
The recognition rate and the error rate for the combined histogram chain code are observed to be 98.1% and 1.9%, respectively. The combined histogram results are presented in the confusion matrix shown in Figure 5. The performance of the net is also computed and is shown in Figure 6.

[Figure 5: Confusion Matrix]

[Figure 6: Performance of the Net]
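A recognition rate and confusion matrix of the kind reported here can be computed with a short routine such as the following sketch; the function name and the assumption of integer digit labels 0-9 are ours.

```python
import numpy as np

def evaluate(classifier, vectors, labels, num_classes=10):
    """Return the recognition rate and a num_classes x num_classes
    confusion matrix (rows: true digit, columns: predicted digit)."""
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    for x, true_label in zip(vectors, labels):
        predicted = classifier.classify(x)
        confusion[true_label, predicted] += 1
    recognition_rate = np.trace(confusion) / confusion.sum()
    return recognition_rate, confusion
```

The recognition rate is the diagonal mass of the confusion matrix; the reported 1.9% error rate corresponds to the off-diagonal entries.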
CONCLUSION
In this paper, a new combined histogram chain code feature extraction method has been proposed to solve the problem of handwritten digit recognition with a probabilistic neural network. The feasibility and efficiency of the proposed method were evaluated in two respects: performance and recognition accuracy.

REFERENCES
[1] V. K. Govindan and A. P. Shivaprasad, Character Recognition - A Review, Pattern Recognition, Vol 23, No 7, pp 671-683, 1990.
[2] M. Shridhar and A. Badreldin, High Accuracy Syntactic Recognition Algorithm for Handwritten Numerals, IEEE Trans. on Systems, Man and Cybernetics, Vol 15, No 1, pp 152-158, 1985.
[3] N. Arica and F. Yarman-Vural, HMM Based Handwritten Recognition, Proceedings of ISCIS XII, pp 260-266, 1997.
[4] A. de S. Britto, R. Sabourin, F. Bortolozzi, and C. Y. Suen, The Recognition of Handwritten Numeral Strings Using a Two-stage HMM-based Method, International Journal on Document Analysis and Recognition, Vol 5, No 2, pp 102-117, 2003.
[5] Shunji Mori, Kazuhiko Yamamoto, and Michio Yasuda, Research on Machine Recognition of Handprinted Characters, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol 6, No 4, pp 386-404, 1984.
[6] J. Mantas, An Overview of Character Recognition Methodologies, Pattern Recognition, Vol 19, No 6, pp 425-430, 1986.
[7] Plummer and E. R. Davies, Thinning Algorithms: A Critique and a New Methodology, Pattern Recognition, Vol 14, No 1, pp 53-63, 1981.
[8] Ø. D. Trier, A. K. Jain, and T. Taxt, Feature Extraction Methods for Character Recognition - A Survey, Pattern Recognition, Vol 29, No 4, pp 641-662, 1996.
[9] L. Lam, S. W. Lee, and C. Y. Suen, Thinning Methodologies - A Comprehensive Survey, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol 14, pp 869-885, 1992.
[10] J. Cai, M. Ahmadi, and M. Shridhar, Recognition of Handwritten Numerals with Multiple Feature and Multi-stage Classifier, Pattern Recognition, Vol 28, No 2, pp 153-160, 1995.
[11] J. Cai, M. Ahmadi, and M. Shridhar, A Hierarchical Neural Network Architecture for Handwritten Numeral Recognition, Pattern Recognition, Vol 30, No 2, pp 289-294, 1997.
[12] G. Y. Chen, T. D. Bui, and A. Krzyzak, Contour-Based Handwritten Numeral Recognition Using Multiwavelets and Neural Networks, Pattern Recognition, Vol 36, No 7, pp 1597-1604, 2003.
[13] S. B. Cho, Neural-Network Classifiers for Recognizing Totally Unconstrained Handwritten Numerals, IEEE Transactions on Neural Networks, Vol 8, No 1, pp 43-53, 1997.
[14] J. X. Dong, A. Krzyzak, and C. Y. Suen, Fast SVM Training Algorithm with Decomposition on Very Large Datasets, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol 27, No 4, pp 603-618, 2005.
[15] P. D. Gader and M. A. Khabou, Automatic Feature Generation for Handwritten Digit Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 18, No 12, pp 1256-1261, 1996.
[16] E. Kussul and T. Baidyk, Improved Method of Handwritten Digit Recognition Tested on MNIST Database, Image and Vision Computing, Vol 22, No 12, pp 971-981, 2004.
[17] S. W. Lee, Off-Line Recognition of Totally Unconstrained Handwritten Numerals Using Multilayer Cluster Neural Network, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol 18, No 6, pp 648-652, 1996.

[18] C. L. Liu, K. Nakashima, H. Sako, and H. Fujisawa, Handwritten Digit Recognition: Benchmarking of State-of-the-Art Techniques, Pattern Recognition, Vol 36, No 10, pp 2271-2285, 2003.
[19] L. H. Yang, C. Y. Suen, T. D. Bui, and P. Zhang, Discrimination of Similar Handwritten Numerals Based on Invariant Curvature Features, Pattern Recognition, Vol 38, No 7, pp 947-963, 2005.
[20] Jitendra Jain, Soyuj Kumar Sahoo, S. R. Mahadeva Prasanna, and G. Siva Reddy, Modified Chain Code Histogram Feature for Handwritten Character Recognition, Springer Lecture Notes in CCSIT, Vol 84, pp 611-619, 2012.
[21] Abdel-Badeeh M. Salem, Adel A. Sewisy, and Usama A. Elyan, A Vertex Chain Code Approach for Image Recognition, ICGST-GVIP Journal, Vol 5, Issue 3, March 2005.
