
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617


Hindi Part-of-Speech Tagger Based on Neural Networks


Jabar H. Yousif and Dinesh Kumar Saini

Abstract— A multilayer perceptron (MLP) part-of-speech (POS) tagger for the Hindi language is developed and implemented. It achieves an accuracy of 82.8%. The neural network is implemented using the NeuroSolutions package, which is used to create and adapt the multilayer network. The backpropagation learning algorithm is used for error correction, and the efficiency of correctly predicting syntactic and semantic classification tags is tested. The results show clearly that the proposed MLP tagger tags words accurately and quickly; in addition, it requires little time to complete the training of the network. In comparison with the results of other researchers, the proposed tagger used a small number of data sets to achieve the adaptation and learning of the network.

Index Terms— Hindi part-of-speech, Hindi corpus, natural language processing, neural network

——————————  ——————————

1 INTRODUCTION

THE artificial neural network is a computational model motivated by the structural and/or functional aspects of biological neurons. It is considered one of the most effective techniques for learning from scarce data [9]. The aim of this paper is to design and implement a Hindi part-of-speech (POS) tagger based on a multilayer perceptron neural network. There are specific features that have made neural networks successful in a number of applications [9, 12, 22]. POS tagging is an important task for the majority of natural language processing applications, such as data classification, information extraction and information retrieval systems, machine translation, speech recognition, grammar and spelling checkers [5], etc.

Hindi (or Hindustani) is an Indo-Aryan language spoken by over 500 million people, mostly in India, where it is one of the two official languages of communication [3, 8], along with English. It is the third most spoken language in the world, after Chinese and English. Hindi is written in the Devanagari script. Hindi language processing has recently become a focus of research and commercial development, and any Hindi NLP application must include a fast POS tagger as one of its main core components [11, 19, 23]. A reliable POS tagger must be expeditious, expressive, inclusive, accurate, and portable. POS tagger systems have been implemented using varied methods such as rule-based models [5, 9], statistical models [2, 7, 19, 21], and neural network models [1, 7, 12, 13, 14, 15, 20]. In addition, statistical models [2, 19, 21], rule-based models [17, 23], and Support Vector Machine (SVM) models have been proposed to address and implement POS taggers for Indian languages [3, 8]. Malayalam, a Dravidian language of India, is spoken primarily in southern coastal India by over 35 million speakers.

All these approaches need a vast amount of data to adapt and implement the POS tagger, except the neural approaches, which use a small amount of data to perform the training and learning stages [16]. Moreover, the neural-based approaches not only learn the associations (word-to-tag mappings) from a representative training data set but can also generalize to unseen exemplars [1].

————————————————
• Jabar H. Yousif, Faculty of Computing and Information Technology, Sohar University, P.O. Box 44, PC 311, Sohar, Sultanate of Oman.
• Dinesh Kumar Saini, Faculty of Computing and Information Technology, Sohar University, P.O. Box 44, PC 311, Sohar, Sultanate of Oman.

2 BACKGROUND

2.1 HINDI LANGUAGE CHARACTERISTICS
The official language of India is Hindi. It is the native language of more than 180 million people, and many others speak Hindi as a second language [11]. Hindi is a morphologically rich language: the nature of the Hindi word structure is highly derivational and inflectional. Moreover, Hindi words are often compound structures which should syntactically be regarded as phrases rather than single words [5]. Like Latin-based alphabets, Hindi is written from left to right; in this respect Hindi writing is similar to that of languages such as English and Spanish. The common features that characterize Hindi writing [5, 23] can be summarized as follows:
** Writing is directed from left to right.
** Articles are not used in Hindi.
** Accents are not used in Hindi.
** Capitalization is not used in Hindi.
** Hindi does not have any particular forms of punctuation that an English speaker might find odd.

2.2 Hindi Language Basic Grammar

Sangya (Noun In Hindi Grammar)
A. Types
These are of five types (similar to those in the English language):
1. Vyakti vachak sangya (Proper Noun) - e.g. Delhi, Gandhi, Ramayan, Geetanjali, Himalaya, Tajmahal
2. Jati vachak sangya (Common Noun) - e.g. more (peacock), pustak (book), mahila (lady), baalak (boy), baalika (girl)
3. Bhav vachak sangya (Abstract Noun) - e.g. bachpan (childhood), satya (truth), sundarata (beauty), namrata (politeness)

4. Samudaay vachak sangya (Collective Noun) - e.g. sena (armed forces), sabha (assembly), mandali (group)
5. Dravya vachak sangya (Material Noun) - e.g. sona (gold), loha (iron), paani (water)

Sarvnaam (Pronoun In Hindi Grammar)
Pronouns in the Hindi language are of five types:
1. Purush vachak sarvnaam (Personal pronoun) - These are of three kinds:
   a. Uttam Purush (First Person) e.g. mae (I), hum (we), mera (my), humara (our)
   b. Madhyam Purush (Second Person) e.g. tum (you), tera (your)
   c. Anya Purush (Third Person) e.g. vah (he), uska (his)
2. Nischay vachak sarvnaam (Demonstrative pronoun) - Points to a definite person or object. e.g. yeh (this), veh (that), ye (these), ve (those)
3. Anischay vachak sarvnaam (Indefinite pronoun) - Does not point to a definite person or object. e.g. koi (someone), kuchh (something)
4. Sambandh vachak sarvnaam (Relative pronoun) - Relates one word to another. e.g. jo (who), jiski (whose), jaisa (like)
5. Prashna vachak sarvnaam (Interrogative pronoun) - Used for interrogation. e.g. kaun (who), kya (what), kisko (whom)

Visheshan (Adjective In Hindi Grammar)
A. Types
These are of four types: Gun Vachak (Quality), Sankhya Vachak (Numeral), Pariman Vachak (Quantity), and Sanket Vachak (Demonstrative).
B. Tulna (Degree of Comparison)
There are three degrees of comparison in Hindi Visheshan:
1. Mula vastha (Positive degree)
2. Uttara vastha (Comparative degree)
3. Uttama vastha (Superlative degree)

2.3 Part of Speech
One aim of natural language processing is to make the computer understand the input text and the meaning of every word in that text. In fact, any serious NLP application must include POS tagging as one of its main core components. First, the parsing task is a complex operation. Second, the meaning of many words may be ambiguous; pronouns, for example, always point to another part of the sentence. Therefore, it is important to be able to recognize and identify the different types of words in a language, so that one can understand grammatical constructions and use the right word form in the appropriate place. Part of speech (POS) tagging is the classification of words according to their meaning and function; it is also called grammatical tagging.
The methods which have been used for part-of-speech tagging can be categorized into the following two classes:
a. Supervised model taggers use a "training corpus" to create the rules and probabilities used in the actual tagging process. They require the analysis of a huge amount of correctly tagged text.
b. Unsupervised models, on the other hand, are those which do not require a pre-tagged corpus. They analyze input text and infer tagging rules and probabilities as they encounter them.
Several factors affect POS tagging accuracy, such as ambiguous words, ambiguous phrases, unknown words, and multi-part words.

2.4 Neural Networks
A neural network is a powerful data modeling tool that is able to capture and represent complex input/output relationships, whether linear or non-linear. There are specific features that stimulate scientists to adopt the neural network design theme in different application fields. The main features are massive parallelism, uniformity, distributed representation and computation, learning ability, trainability, generalization ability, and adaptivity [9, 16, 18]. Neural networks are being successfully applied in a number of areas such as data classification, resource allocation and scheduling, database mining, speech production and recognition, and pattern recognition.
The multilayer perceptron (MLP) is considered one of the most common neural network models [1, 9]. The MLP is a type of supervised network because it requires a desired output in order to learn. The main objective of this type of network is to engender a model that correctly maps the input to the output using previous knowledge, so that it can perform the tagging task with a low processing time.
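To make this structure concrete, here is a minimal forward-pass sketch (ours, not the paper's NeuroSolutions setup; the hidden-layer size is made up, while the 7 inputs and 26 outputs anticipate the tagger configuration of Section 4):

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP forward pass with tanh units.

    tanh squashes each neuron's output into (-1, 1), giving the
    'soft decision' behaviour described above.
    """
    h = np.tanh(W1 @ x + b1)   # hidden-layer activations
    y = np.tanh(W2 @ h + b2)   # output-layer activations
    return y

# Hypothetical sizes: 7 inputs, 10 hidden PEs, 26 output PEs.
rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(10, 7)), np.zeros(10)
W2, b2 = 0.1 * rng.normal(size=(26, 10)), np.zeros(26)
print(mlp_forward(rng.normal(size=7), W1, b1, W2, b2).shape)  # (26,)
```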

3 RELATED WORK

Ahmed [1] describes a part-of-speech tagger based on a three-layer MLP network. This tagger is trained with the error back-propagation learning algorithm using the SUSANNE English tagged corpus, which consists of 156,622 words. This method shows promise for addressing the part-of-speech tagging problem for Indian-language text, considering the fact that most Indian-language corpora, especially tagged ones, are still considerably small in size.

Aniket et al. [2] describe a Hindi part-of-speech tagger using Maximum Entropy techniques. It uses word-based context with one-level suffix and dictionary-based features.

Scott [21] designed a Hidden Markov Model based tagger for a partially free word order language. It chooses the best tag for a given word sequence, but it does not consider tackling the partially free word order characteristics of Hindi.

Schmid [20] successfully demonstrated that a Net-Tagger with a context window of three preceding words and two succeeding words, trained on the large Penn Treebank corpus, performed considerably well compared to statistical approaches based on the trigram model and the Hidden Markov Model (HMM).

Ma Q. et al. [12, 13] used a two-layer perceptron whose network outputs are corrected using Brill's transformation-rule approach [6] and whose context window is dynamically sized.

Marques and Pereira Lopes [15] adopted and explored the use of Elman's net for Portuguese, but with worse results for small training sets.

Delip [4] implements different part-of-speech taggers (INV, OOV) for the Hindi, Bengali, and Telugu languages, using a manually tagged corpus of about 10k words. The best tagger, TnT, achieves an accuracy of 76.68%.

4 MLP CONFIGURATION & DESIGN

The MLP is a supervised classification technique, which has the ability to classify data after learning from an example data set. The most extensively used neural network topology is the multilayer perceptron (MLP). Its discriminant functions can take any shape, as required by the input data clusters. In order to achieve the performance of the maximum a posteriori receiver, we need to normalize the input and output classes to 0/1, which is optimal from a classification point of view [1, 9, 16]. An MLP network with the error back-propagation learning algorithm is used [18]. The back-propagation learning algorithm propagates the errors through the network and permits the adaptation of the weights of the hidden PEs. Error-correction learning works on the response of PE $i$ at iteration $n$, $y_i(n)$: given the desired response $d_i(n)$ for an input pattern, the instantaneous error $e_i(n)$ is defined by formula 1:

$$e_i(n) = d_i(n) - y_i(n) \qquad (1)$$

With the aim of adapting each weight in the network, gradient-descent learning is used. It corrects the present value of each weight as depicted in formula 2:

$$w_{ij}(n+1) = w_{ij}(n) + \eta\,\delta_i(n)\,x_j(n) \qquad (2)$$

where the local error $\delta_i(n)$ is computed from $e_i(n)$ at an output PE, or as a weighted sum of errors at the internal PEs, and $\eta$ is the constant step size.

As an improvement over straight gradient descent, momentum learning is used, which speeds up and stabilizes the convergence of the network. The momentum weight update is computed as in formula 3:

$$w_{ij}(n+1) = w_{ij}(n) + \eta\,\delta_i(n)\,x_j(n) + \alpha\,\big(w_{ij}(n) - w_{ij}(n-1)\big) \qquad (3)$$

The best value of the momentum rate $\alpha$ lies between 0.1 and 0.9.
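As a concrete illustration of formulas 2 and 3, the following sketch (ours, not from the paper; the array shapes are made up) applies one momentum-learning step to a layer's weight matrix:

```python
import numpy as np

def momentum_step(w, w_prev, delta, x, eta=1.0, alpha=0.7):
    """One momentum-learning update (formula 3).

    w, w_prev : current and previous weight matrices, shape (out, in)
    delta     : local errors delta_i(n) at the layer's PEs, shape (out,)
    x         : inputs x_j(n) feeding the layer, shape (in,)
    """
    w_next = w + eta * np.outer(delta, x) + alpha * (w - w_prev)
    return w_next, w  # the returned pair becomes (w, w_prev) next step
```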

In this paper, a multilayer perceptron network with one hidden layer is used, as depicted in Fig. 1, with 7 input PEs and 26 output PEs; the data set consists of 200 (word, tag) pairs. The maximum number of epochs is 1000. The TanhAxon transfer function is implemented in the hidden and output layers. The TanhAxon applies a bias and the tanh function to every neuron in the layer, which squashes the range of each neuron's output to between -1 and 1. Such nonlinear elements provide the network with the ability to make soft decisions.

Fig. 1. MLP with one hidden layer.
The momentum learning method is adopted with a constant step size $\eta = 1$ and a momentum rate $\alpha = 0.7$. Each word $w$ in the training data set is encoded into a vector $IN = (b_1, b_2, b_3, \ldots, b_n)$, where $n$ is the total number of possible tags. $b_j$ is the prior probability that the word $w$ corresponds to the tag $pos_j$, and it is computed from the training data as given in formulas 4 and 5. If the word $w$ appears in the training data, $b_j$ is computed as explained in formula 4:

$$b_j = \frac{C(pos_j, w)}{C(w)} \qquad (4)$$

where $C(pos_j, w)$ is the number of occurrences of $w$ tagged as $pos_j$ in the training data, and $C(w)$ is the number of occurrences of $w$ in the training data. Otherwise, if the word $w$ does not appear in the training data, $b_j$ is computed as follows:

$$b_j = \begin{cases} \dfrac{1}{N(w)} & \text{if } pos_j \text{ is a candidate} \\[4pt] 0 & \text{otherwise} \end{cases} \qquad (5)$$

$N(w)$ is the number of possible POS tags that can be assigned to the word $w$. Each element of the input vector $IN$ is thus part of an $n$-element probability vector encoded as described in formula 5.
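A minimal sketch of this encoding (ours; it assumes the training data is a list of (word, tag) pairs, and that candidate tags for unseen words come from some lexicon or fallback, which the paper does not specify):

```python
from collections import Counter, defaultdict

def count_tags(tagged_pairs):
    """Collect C(w) and C(pos_j, w) from (word, tag) training pairs."""
    c_w, c_pos_w = Counter(), defaultdict(Counter)
    for word, tag in tagged_pairs:
        c_w[word] += 1
        c_pos_w[word][tag] += 1
    return c_w, c_pos_w

def encode_word(word, tagset, c_w, c_pos_w, candidates=None):
    """Build IN = (b_1, ..., b_n) for `word` per formulas 4 and 5."""
    if word in c_w:  # formula 4: b_j = C(pos_j, w) / C(w)
        return [c_pos_w[word][t] / c_w[word] for t in tagset]
    # formula 5: unseen word; candidate set is an assumption here
    cand = set(candidates) if candidates else set(tagset)
    return [1.0 / len(cand) if t in cand else 0.0 for t in tagset]

pairs = [("ghar", "NN"), ("ghar", "NN"), ("jao", "VB")]
c_w, c_pos_w = count_tags(pairs)
print(encode_word("ghar", ["NN", "VB"], c_w, c_pos_w))  # [1.0, 0.0]
```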

The output pattern $OUT$ is defined as in formula 6:

$$OUT = (o_1, o_2, \ldots, o_n) \qquad (6)$$

The output at each node is a function of the weighted sum of the inputs to the node:

$$op_i = f\Big(\sum_j w_{ji}\,x_j\Big) \qquad (7)$$

where $x_j$ is the output from the $j$-th node of the previous layer. The network learns the word-tag mappings as a complex function:

$$F(\text{targetword}, \text{context}) = \text{tag} \qquad (8)$$

The weights of the network act as the parameters of the function $F$, and the context refers to the set of words in the immediate neighborhood of the target word.
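Tying the pieces together, tagging a word then amounts to encoding it (formulas 4 and 5), running it through the trained network, and taking the tag of the strongest output PE. A sketch under the same assumptions as the snippets above:

```python
import numpy as np

def tag_word(word, tagset, encode, forward):
    """Return the tag whose output PE responds most strongly.

    `encode`  : word -> probability vector IN (formulas 4-5)
    `forward` : trained MLP forward pass, IN -> OUT (formula 6)
    Both are the hypothetical helpers from the earlier sketches.
    """
    out = forward(np.asarray(encode(word)))
    return tagset[int(np.argmax(out))]
```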

5 EXPERIMENTS AND RESULTS

In this section we describe our POS tagging experiments and results for the Hindi language, performed using the multilayer perceptron neural network presented in Section 4. The NeuroSolutions package is used to design and implement the MLP network. The experiments in this paper were conducted using a Hindi tag set manually extracted from the newspaper "Danik Bhaskar". The segmentation of a sentence into words is not considered in this study; we assume that the words were segmented before POS tagging began. Our objective is to identify the correct POS tag for each word.

The input data set was divided into three categories. First, 100 pairs are used for the training process. Second, 50 pairs are used for the cross-validation process; cross-validation computes the error on a test set at the same time that the network is being trained with the training set. Third, to test network performance, 50 pairs on which the network was not trained are used.

There are several ways to test the network's performance; usually, the mean squared error (MSE) is used. It is a function of two times the average cost, and it is computed as follows:

$$MSE = \frac{\sum_{j=0}^{P} \sum_{i=0}^{N} \left(d_{ij} - y_{ij}\right)^2}{N\,P} \qquad (9)$$

where $P$ is the number of output processing elements, $N$ is the number of exemplars in the data set, $y_{ij}$ is the network output for exemplar $i$ at processing element $j$, and $d_{ij}$ is the desired output for exemplar $i$ at processing element $j$.
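Formula 9 is straightforward to reproduce; a short sketch (ours) with NumPy:

```python
import numpy as np

def mse(desired, outputs):
    """Mean squared error of formula 9 over an N x P array
    (N exemplars, P output processing elements)."""
    d, y = np.asarray(desired), np.asarray(outputs)
    n, p = d.shape
    return float(((d - y) ** 2).sum() / (n * p))

# Two exemplars, two output PEs:
print(mse([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]))  # 0.025
```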
The result of the best MSE is depicted in Figure 2. The outputs of the network for the minimum and final MSE values are summarized in Table 1. To test the stability of the network, the average of the MSE with standard-deviation boundaries for 3 runs is computed.

TABLE 1. MINIMUM AND FINAL MSE

| Best Networks | Training | Cross Validation |
| Run # | 2 | 3 |
| Epoch # | 1000 | 1000 |
| Minimum MSE | 0.005644808 | 0.026806657 |
| Final MSE | 0.005644808 | 0.026806657 |

Figure 2 illustrates the average MSE for the three runs, and all results are summarized in Table 2. Likewise, the training MSE results for the 3 runs are depicted in Figure 3. In order to estimate the performance of the predictive model, we implement the cross-validation technique; Figure 4 summarizes the cross-validation MSE results for the 3 runs.
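The per-epoch curves in Figures 3 and 4 come from evaluating both data sets at every epoch. A sketch of that bookkeeping (ours; `net` is any object exposing hypothetical `train_epoch` and `evaluate` methods, the latter returning an MSE):

```python
def train_with_cross_validation(net, train_set, cv_set, epochs=1000):
    """Record training and cross-validation MSE once per epoch."""
    history = {"train": [], "cv": []}
    for _ in range(epochs):
        net.train_epoch(train_set)                   # one pass over 100 pairs
        history["train"].append(net.evaluate(train_set))
        history["cv"].append(net.evaluate(cv_set))   # 50 held-out pairs
    return history
```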

6 COMPARISON AND CONCLUSION

6.1 COMPARISON WITH RELATED WORK
We had to make this comparison carefully, because the features used here do not match those used in the previous studies. Moreover, comparing the MLP tagger with other existing taggers is a difficult matter, because tagger accuracy relies on numerous parameters, such as the complexity of the language (ambiguous words, ambiguous phrases), the language type (Hindi, Bengali, English, Arabic, Chinese, etc.), the size of the training data, the tag-set size, and the evaluation measurement criteria. The tag-set size depends on the complexity of the language and determines the number of possible classes into which words are to be categorized, so it has a great impact on the tagging process. Table 3 summarizes the comparison results for the different taggers.

6.2 CONCLUSION
We have demonstrated an MLP tagger that addresses the problem of Hindi part-of-speech tagging using a small tagged data set. The proposed tagger achieves an accuracy of 82.6%, has a low training time, and tags words quickly. On the other hand, the neural network produced some wrong tags; these outputs have negative correlations, which suggests that optimization algorithms such as genetic algorithms are needed to achieve the desired result.

7 FUTURE WORK

We are currently working on an unsupervised learning approach to perform automatic text tagging. Using an unsupervised approach will make fully automatic part-of-speech tagging an easier matter. Besides, we will implement optimization techniques such as genetic algorithms in order to increase the accuracy of tagging. Some attention and effort must also be devoted to resolving the issue of suffixes in the absence of a morphological tagger for such languages.
REFERENCES

[1] Ahmed, "Application of Multilayer Perceptron Network for Tagging Parts-of-Speech", Proceedings of the Language Engineering Conference (LEC'02), IEEE, 2002.
[2] Aniket D., et al., "Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach", In Proceedings of the NLPAI Machine Learning Competition, 2006.
[3] Antony P. J., et al., "SVM Based Part of Speech Tagger for Malayalam", IEEE International Conference on Recent Trends in Information, Telecommunication and Computing, pages 339-341, 2010.
[4] Delip Rao & David Yarowsky, "Part of speech tagging and shallow parsing of Indian languages", IJCAI-07 Workshop on Shallow Parsing for South Asian Languages, 2007.
[5] Dinesh K. & Gurpreet S. J., "Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey", International Journal of Computer Applications (0975-8887), 6(5), September 2010.
[6] E. Brill, "A simple rule-based part-of-speech tagger", In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing, pages 152-155, Trento, Italy, 1992.
[7] E. Brill, "Some Advances in Transformation Based Part of Speech Tagging", In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, 1994.
[8] Ekbal, A. & Bandyopadhyay, S., "Part of Speech Tagging in Bengali Using Support Vector Machine", ICIT-08, IEEE International Conference on Information Technology, pages 106-111, 2008.
[9] Lippmann R., "An introduction to computing with neural nets", IEEE ASSP Magazine, (4), pages 4-22, 1987.
[10] Manish S. & Pushpak B., "Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge", International Conference on NLP (ICON08), Pune, India, December 2008.
[11] Manju K., et al., "Development of a POS Tagger for Malayalam - An Experience", In Proceedings of the 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), pages 709-713, 2009.
[12] Ma Q., et al., "Elastic neural networks for part of speech tagging", In Proceedings of IJCNN'99, pages 2991-2996, Washington, DC, 1999.
[13] Ma Q., et al., "PoS Tagging With Mixed Approaches of NN and Transformation Rules", NLPRS'99 Workshop on Natural Language Processing and Neural Networks, Beijing, China, 1999.
[14] Marinai, S., Gori, M., & Soda, G., "Artificial neural networks for document analysis and recognition", IEEE Trans. on Pattern Analysis and Machine Intelligence, 27(1), pages 23-35, 2005.
[15] Marques N. C. and Gabriel Pereira Lopes, "Using Neural Nets for Portuguese Part-of-Speech Tagging", Proceedings of the Fifth International Conference on the Cognitive Science of Natural Language Processing, Dublin City University, Ireland, 1996.
[16] Masters, T., "Advanced Algorithms for Neural Networks", New York: Wiley, 1995.
[17] Navanath Saharia, et al., "Part of Speech Tagger for Assamese Text", Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 33-36, Suntec, Singapore, August 2009.
[18] Rumelhart D. E., et al., "Learning internal representations by error propagation", In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, volume 1, pages 318-362, Cambridge, USA, MIT Press, 1986.
[19] Sankaran B., "Hindi POS tagging and Chunking", In Proceedings of the NLPAI Machine Learning Contest, 2006.
[20] Schmid H., "Part-of-speech tagging with neural networks", In Proceedings of COLING-94, pages 172-176, Kyoto, Japan, 1994.
[21] Scott M. T. & Mary P. H., "A second-order Hidden Markov Model for part-of-speech tagging", In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 175-182, 1999.
[22] Smith, K. & Gupta, J., "Neural networks in business: Techniques and applications for the operations researcher", Computers & Operations Research, (27), pages 1023-1044, 2000.
[23] Smriti S., et al., "Morphological Richness Offsets Resource Demand - Experiences in Constructing a POS Tagger for Hindi", In Proceedings of COLING/ACL, pages 779-786, 2006.
TABLE 2. AVERAGE OF MSE FOR THREE RUNS

| All Runs | Training Minimum | Training Standard Deviation | Cross Validation Minimum | Cross Validation Standard Deviation |
| Average of Minimum MSEs | 0.008229626 | 0.003381067 | 0.029465726 | 0.003894326 |
| Average of Final MSEs | 0.008229626 | 0.003381067 | 0.029472774 | 0.003889426 |

TABLE 3. THE COMPARISON RESULTS OF DIFFERENT TAGGERS.



| | Schmid [20] | Ahmed [1] | Delip [4] | Ekbal [8] | Our tagger (MLP) |
| Method | NN | NN | HMM | SVM-TreeBank | NN |
| Language | English | English | Hindi | Hindi | Hindi |
| Data size for training | 44.4% | 85% | 100% | 100% | 10% |
| Accuracy | 96.22% | 90.4% | 76.68% | 71.65% | 82.6% |

Fig. 2. Average MSE with standard deviation boundaries for 3 runs.

[Plot not reproduced: training MSE versus epoch (1-1000) for Run #1, Run #2, and Run #3; MSE axis from 0 to 0.7.]

Fig. 3. Training MSE results for 3 runs.



[Plot not reproduced: cross-validation MSE versus epoch (1-1000) for Run #1, Run #2, and Run #3; MSE axis from 0 to 0.35.]

Fig. 4. Cross-validation MSE results for 3 runs.
