
(IJCSIS) International Journal of Computer Science and Information Security,

Vol. 9, No. 3, March 2011

Automatic Parsing for Arabic Sentences

Zainab Ali Khalaf*
School of Computer Science
Universiti Sains Malaysia (USM)
Penang, Malaysia
E-mail: zak10_com026@student.usm.my
*(Ass. Prof. in Computer Science Dept., Basra University, Iraq)

Dr. Tan Tien Ping
School of Computer Science
Universiti Sains Malaysia (USM)
Penang, Malaysia
E-mail: tienping@cs.usm.my

Abstract: The designed system is a parser for Arabic sentences that uses the syntactic and semantic relations between deep and surface structures. The system is based on an implementation of Fillmore's Case theory.

The parsing algorithm starts by analyzing the input sentence to check its syntax, semantics and spelling, using the Arabic transformation rules proposed by Al-Khouly to gain semantic strength. The proposed system depends on the effective element, represented by the verb of the sentence. This element is used to control the parsing operation.

The proposed system accepts different surface structures of Arabic sentences as input and produces automatic parsing forms for these input sentences.

Keywords: Artificial intelligence; natural language processing; transformation rules; deep structure and surface structure; parsing Arabic sentences.

I. INTRODUCTION

Arabic is a parsing language, where parsing means the relations among the words in the sentence. The most important component is the verb, which acts as the basic unit that controls the rules for choosing the other elements. Although Arabic sentences have different structures, Arabic is recognized as a (verb, subject, object) language. The subject or the object may precede the verb in an Arabic sentence according to pragmatic necessity [1,3,4].

Arabic syntax allows the deep structure and the surface structure of a sentence to be connected flexibly yet strongly. This property makes the Arabic language amenable to automatic processing [4,5].

The proposed system aims to use these properties to parse Arabic sentences according to the position of the words in the sentence and their functional meaning.

II. SYSTEM COMPONENTS

The syntactical properties of any natural language are formally described by what Chomsky calls production systems. A formal system generally depends on three types of data [2,3,6]:

A. Data of vocabulary lexicon

The lexicon plays an important role in any NLP system. It is a large database of variable entries describing the meaning of words in a contextual fashion through synonymy (and antonymy) [3,6]. The implemented lexicon consists of entries saved as a rule of the form Entrance [ Word , Features ].

• The Entrance is one of the following indicators: Verb, Noun, Preposition, Determinate, Assistant and Negation.

• The Word is a string index for the lexicon entry.

• The Features is a list of structured integers coded to hold the syntactical and semantic information of the word. Each coded integer, written as [Fp], consists of two parts, F and p. The [F] part is the feature code. The [p] part is either 1 or 0, depending on whether the feature [F] exists or not.
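The lexicon entry format Entrance [ Word , Features ] can be pictured with a small sketch. The paper does not say how F and p are packed into one integer, so the sketch assumes the decimal layout F*10 + p. The feature codes (HUMAN, ANIMATE, TRANSITIVE) and the helper names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Entrance(Enum):
    """The six entry indicators named in the lexicon description."""
    VERB = "Verb"
    NOUN = "Noun"
    PREPOSITION = "Preposition"
    DETERMINATE = "Determinate"
    ASSISTANT = "Assistant"
    NEGATION = "Negation"


def encode_feature(code: int, present: bool) -> int:
    """Pack feature code F and presence bit p as Fp (assumed layout: F*10 + p)."""
    return code * 10 + (1 if present else 0)


def decode_feature(fp: int) -> Tuple[int, bool]:
    """Split a coded integer Fp back into (F, p)."""
    return fp // 10, fp % 10 == 1


@dataclass(frozen=True)
class LexiconEntry:
    entrance: Entrance
    word: str                   # string index of the entry
    features: Tuple[int, ...]   # coded integers Fp

    def has_feature(self, code: int) -> bool:
        """True when the entry carries feature F with p = 1."""
        return encode_feature(code, True) in self.features


# Hypothetical feature codes, not taken from the paper.
HUMAN, ANIMATE, TRANSITIVE = 1, 2, 3

boy = LexiconEntry(
    Entrance.NOUN,
    "ولد",
    (encode_feature(HUMAN, True),
     encode_feature(ANIMATE, True),
     encode_feature(TRANSITIVE, False)),
)
```

Storing the absent features explicitly (p = 0) mirrors the description in the text, where each coded integer states whether its feature exists or not.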

58 http://sites.google.com/site/ijcsis/
ISSN 1947-5500

B. Data of syntactical rules

These rules are formalized to describe the language so that each deep structure is related to the many corresponding surface structures that have the same meaning. The rules are inductive and sequential. Some are obligatory and others are optional. From the optional rules, one can obtain various surface structures that serve different contexts. The transformations are mainly the operations of addition, deletion, moving forward and moving backward, together with some secondary operations. These operations are, in general, not performed at random, but are governed and selected according to a set of conditions and rules of structure description. These operations generate all the surface structures that emerge from one deep structure.

C. Data of syntactic structure

These data are rules described in BNF for the Arabic language. They act as constraints and controls on the formation of Arabic sentences. The most important component, as Fillmore and Schank recognized, is the verb element, which acts as the basic unit that controls the rules for choosing the other elements. The dependent phrase structure rules used are the following:

<Sentence> ::= <Modality> + <Auxiliary> + <Proposition>
<Sentence> ::= <Auxiliary> + <Proposition>
<Modality> ::= <External Condition> + <External Adverb>
<Proposition> ::= <Verb> + <Theme> + <Indirect Object> + <Place> + <Tool> + <Agent>
<Theme> ::= <Noun Phrase>
<Agent> ::= <Noun Phrase>
<Tool> ::= <Noun Phrase>
<Place> ::= <Noun Phrase>
<Indirect Object> ::= <Noun Phrase>
<Noun Phrase> ::= <Preposition> + <Determinate> + <Noun>
<Noun Phrase> ::= <Preposition> + <Noun>
<External Condition> ::= semi-statements used to combine two sentences, such as ( in spite of بالرغم من ) or ( moreover وعلاوة على ذلك ) etc.
<External Adverb> ::= <Time Adverb> + <Interrogative Words> + <Negation Words>
<Auxiliary> ::= lexical words such as ( كان ) or ( يكون ) etc.
<Verb> ::= a dictionary verb such as ( write يكتب ) etc.
<Noun> ::= a dictionary noun such as ( boy ولد ) etc.

The presence of the verb is obligatory, whereas the presence of the other elements is optional and depends on the verb rules [1,4].

III. DESIGNED SYSTEM STRUCTURE

The designed system has several stages. Figure (1) shows a flowchart of these stages, which are described below:

A. Input sentence stage

The function of this stage is to read an Arabic sentence from the keyboard into the computer. The sentence is ended by a dot, a semicolon or a space character.

B. Segmentation stage

The function of this stage is to segment the input sentence into words, using the space character as the separator (any number of space characters).

C. Lexicon search stage

The function of this stage is to search the lexicon for every word of the sentence. If a word is not found in the lexicon, the program gives a spelling error message and stops.

D. Syntactical analysis stage

The function of this stage is to verify the syntactical correctness of the input sentence. If the processing finds errors, the program gives a syntactical error message.

E. Semantic analysis stage

The function of this stage is to verify the input sentence with respect to its harmony, its vocabulary and the correctness of its meaning. If the sentence is not correct in its meaning, the program gives a semantic error message.
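The syntactical check can be pictured as testing a proposition against the rules of its verb: the verb is obligatory, and every other case slot of <Proposition> is optional and must be licensed by the verb. The following is a minimal sketch under that reading. The verb rules in VERB_RULES, the English verb keys, and the names check_proposition and SyntacticalError are assumptions made for illustration, not the paper's implementation.

```python
from typing import Dict, Optional, Set

# Case slots of <Proposition>, as named in the phrase structure rules.
CASE_SLOTS = ("Theme", "Indirect Object", "Place", "Tool", "Agent")

# Hypothetical verb rules: the case slots that each verb licenses.
VERB_RULES: Dict[str, Set[str]] = {
    "write": {"Agent", "Theme", "Tool", "Place", "Indirect Object"},
    "laugh": {"Agent", "Place"},
}


class SyntacticalError(Exception):
    """Raised when a proposition violates the rules of its verb."""


def check_proposition(verb: Optional[str], slots: Dict[str, str]) -> None:
    """Accept the proposition silently or raise SyntacticalError."""
    if not verb:
        # The verb is the one obligatory element.
        raise SyntacticalError("the verb is obligatory")
    if verb not in VERB_RULES:
        raise SyntacticalError(f"no rules for verb: {verb}")
    for slot in slots:
        if slot not in CASE_SLOTS:
            raise SyntacticalError(f"unknown case slot: {slot}")
        if slot not in VERB_RULES[verb]:
            # The slot exists in the grammar but this verb does not allow it.
            raise SyntacticalError(f"'{verb}' does not take a {slot}")


# Usage: a writer and a written thing are both licensed by "write".
check_proposition("write", {"Agent": "Ahmed", "Theme": "the letter"})
```

Keeping the licensing information with the verb reflects the claim in the text that the verb controls the choice of the other elements.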


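F. Generative deep structure stage

In this stage the transformational operations are carried out. The system applies addition, deletion, replacement and other operations to obtain the sentence structure that acts as the deep structure.

G. Parsing stage

The function of this stage is to parse the sentence according to its effective element and the position of that element in the phrase structure. This stage contains many Arabic language rules that control the parsing operations.

Here are examples of sentences that the system can parse:

[List of 17 Arabic example sentences; the Arabic text is not recoverable.]

IV. EXAMPLES

Suppose we want to know the parsing of the following sentences. Figure (2) depicts this mechanism:

A. Example 1

ما كتب أحمد ؟ ("What did Ahmed write?")

The system prints the following parsing:

ما : أسم أستفهام مبني على الفتح في محل رفع مبتدأ .
(ما: interrogative noun, indeclinable on the fatha, in nominative position as the subject, mubtada.)
كتب : فعل ماضٍ مبني على الفتح .
(كتب: past-tense verb, indeclinable on the fatha.)
أحمد : فاعل مرفوع وعلامة رفعة الضمة الظاهرة على آخره والجملة الفعلية في محل رفع خبر للمبتدأ .
(أحمد: agent, nominative, marked by the visible damma on its ending; the verbal sentence is in nominative position as the predicate of the mubtada.)

B. Example 2

كتبت منال الرسالة . ("Manal wrote the letter.")

The system prints the following parsing:

كتبت : فعل ماضٍ مبني على الفتح والتاء تاء التأنيث الساكنة لا محل لها من الأعراب .
(كتبت: past-tense verb, indeclinable on the fatha; the ta is the quiescent feminine ta, which has no parsing position.)
منال : فاعل مرفوع وعلامة رفعة الضمة الظاهرة على آخره .
(منال: agent, nominative, marked by the visible damma on its ending.)
الرسالة : مفعول به منصوب وعلامة نصبه الفتحة الظاهرة على آخره .
(الرسالة: direct object, accusative, marked by the visible fatha on its ending.)

C. Example 3

أن البنت ضحوك . ("Indeed, the girl is cheerful.")

The system prints the following parsing:

أن : حرف مشبهة بالفعل ينصب الأول اسماً له ويرفع الثاني خبر له .
(أن: verb-like particle; it puts the first word in the accusative as its noun and the second in the nominative as its predicate.)
البنت : أسم أن منصوب وعلامة نصبه الفتحة الظاهرة على آخره .
(البنت: noun of أن, accusative, marked by the visible fatha on its ending.)
ضحوك : خبر أن مرفوع وعلامة رفعة الضمة الظاهرة على آخره .
(ضحوك: predicate of أن, nominative, marked by the visible damma on its ending.)

D. Example 4

أنت أم حنون . ("You are a tender mother.")

The system prints the following parsing:

أنت : ضمير رفع منفصل في محل رفع مبتدأ .
(أنت: detached nominative pronoun, in nominative position as the mubtada.)
أم : خبر مرفوع وعلامة رفعة تنوين الضم .
(أم: predicate, nominative, marked by the tanwin of damma.)
حنون : صفة مرفوعة وعلامة رفعها الضمة الظاهرة على آخرها .
(حنون: adjective, nominative, marked by the visible damma on its ending.)

E. Example 5

كتب أحمد الرسالة كتابة . ("Ahmed wrote the letter", with a cognate object.)

The system prints the following parsing: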

كتب : فعل ماضٍ مبني على الفتح .
(كتب: past-tense verb, indeclinable on the fatha.)
أحمد : فاعل مرفوع وعلامة رفعة الضمة الظاهرة على آخره .
(أحمد: agent, nominative, marked by the visible damma on its ending.)
الرسالة : مفعول به منصوب وعلامة نصبه الفتحة الظاهرة على آخره .
(الرسالة: direct object, accusative, marked by the visible fatha on its ending.)
كتابة : مفعول مطلق منصوب وعلامة نصبه تنوين الفتح .
(كتابة: absolute object, accusative, marked by the tanwin of fatha.)

Conclusions

The present research reaches the following conclusions:

1. The verb is the main component, which controls all the other components appearing with it. For this reason, we consider every deep structure to contain a verb.

2. The word meaning depends on the essential effective element (the deep element).

3. The lexicon is the essential element that provides any system with vocabulary and its features. With these features, we can control the different processing levels of syntax and semantics.

4. The absence of vowelization might introduce some ambiguities into sentence understanding. However, the transformation rules remedy these ambiguities in an explicit and easy way, as in the following sentences, in all of which the man is the subject and the lion is the object.

قتل الرجل الأسد ("The man killed the lion", verb first)
الرجل قتل الأسد ("The man killed the lion", subject first)
الأسد قتله الرجل ("The lion, the man killed it", object first)

Acknowledgment

I would like to express my sincere appreciation to the TWAS organization and USM for their encouragement and continuous financial support through the provision of a PhD fellowship. In addition, I would like to thank the School of Computer Science for its encouragement and motivation of international students in the faculty.

References

[1] Abo-Arafah, A., "A grammar for the Arabic language suitable for machine parsing and automatic text generation", Ph.D. thesis, Illinois Institute of Technology, Chicago, USA, 1995.

[2] Ali, N., "Arabic language and Computer", Al-Tareeb Publishing House, Cairo, Egypt, 1988.

[3] Al-Khouly, M., "Transformation rules for Arabic language", Al-Riyadh, 1981.

[4] Al-Shalabi, R., Evens, M., "A Computational Morphology System for Arabic", Dept. of Computer Science and Applied Mathematics, Illinois Institute of Technology, Chicago, USA, W.D.

[5] Gheith, M., Mashour, M., "A Computer Based System for Understanding Arabic Language", Computer Science Department, Inst. of Statistical Study & Research, Cairo University, Egypt, W.D.

[6] Khalaf, Z., "Computerized Implementation for Processing Arabic Sentences by Interpretation Synonymy Relationships", M.Sc. thesis, Basra University, Iraq, 2001.


Figure (1): Flowchart of the parsing operations. Elements shown: User Interface; Input Stage; Segmentation Stage; Lexicon; Lexical Rules; Lexicon Stage; Spelling Error; Initial Descriptive Structure; Syntactical Rules; Syntactical Errors; Transformational Rules; Transformational Stage; Deep Structure; Semantic Stage; Semantic Error; Parsing Stage; User Interface.
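The first stages of the flowchart (input, segmentation and lexicon search) can be sketched as follows. This is a minimal illustration, not the paper's code: the toy LEXICON stores whole surface words, which is an assumption, and the names segment, lexicon_search and SpellingError are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy lexicon mapping a surface word to its Entrance (contents are hypothetical).
LEXICON: Dict[str, str] = {"كتبت": "Verb", "منال": "Noun", "الرسالة": "Noun"}

# Characters that may close the input sentence besides a space.
TERMINATORS = ".;"


class SpellingError(Exception):
    """Raised when a word of the sentence is missing from the lexicon."""


def segment(sentence: str) -> List[str]:
    """Drop the closing dot or semicolon, then split on runs of whitespace."""
    return sentence.strip().rstrip(TERMINATORS).split()


def lexicon_search(words: List[str]) -> List[Tuple[str, str]]:
    """Look up every word and stop at the first one that is not in the lexicon."""
    tagged = []
    for word in words:
        if word not in LEXICON:
            raise SpellingError(f"spelling error: {word}")
        tagged.append((word, LEXICON[word]))
    return tagged


# Usage: extra spaces between words do not change the segmentation.
words = segment("كتبت   منال الرسالة .")
tagged = lexicon_search(words)
```

Raising an exception at the first unknown word matches the described behaviour, in which the program reports a spelling error and stops before the later stages run.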


Figure (2): The mechanism for parsing an Arabic sentence. Elements shown: Surface structure; Transformation Rules (an agent used a tool to perform the verb to get the object); Deep structure with the slots Verb, Subject, Object and Tool; Sentence structure; Parsing Stage. [The Arabic text of the figure is not recoverable.]
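The step from surface structure to deep structure in Figure (2) can be illustrated with the three word orders listed in the Conclusions, which all share one deep structure. The sketch below is a deliberately small assumption: it handles only three-word sentences, fills only the Verb, Subject and Object slots, and uses three hand-written rules (basic order, subject moved forward, object moved forward and resumed by the attached pronoun ه). The names VERBS and to_deep_structure are hypothetical, and the paper's actual transformation rules are not reproduced here.

```python
from typing import Dict, List

VERBS = {"قتل"}         # toy verb list (hypothetical lexicon content)
OBJECT_PRONOUN = "ه"    # attached pronoun that resumes a fronted object


def to_deep_structure(words: List[str]) -> Dict[str, str]:
    """Map a three-word surface order onto Verb, Subject and Object slots."""
    if len(words) != 3:
        raise ValueError("this sketch handles exactly three words")
    first, second, third = words
    if first in VERBS:
        # Basic order: verb, subject, object.
        return {"Verb": first, "Subject": second, "Object": third}
    if second in VERBS:
        # Subject moved forward: subject, verb, object.
        return {"Verb": second, "Subject": first, "Object": third}
    if second.endswith(OBJECT_PRONOUN) and second[:-1] in VERBS:
        # Object moved forward; the pronoun on the verb points back to it.
        return {"Verb": second[:-1], "Subject": third, "Object": first}
    raise ValueError("no transformation rule matches this sentence")


# Usage: the three surface orders reduce to the same deep structure.
surface_forms = ["قتل الرجل الأسد", "الرجل قتل الأسد", "الأسد قتله الرجل"]
deep_forms = [to_deep_structure(s.split()) for s in surface_forms]
```

In this toy version the resumptive pronoun is what distinguishes a fronted object from a fronted subject, so no vowelization is needed to decide which noun is the subject.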
