From a computational perspective, the creation and availability of a morphological
analyzer and generator (MAG) for a language is important. Morphology plays a significant
role in NLP, as seen in applications such as MT, QA systems, IE, IR, spell checking and
lexicography. Morphological analyzers and generators are essential for languages with rich
inflectional and derivational morphology, such as Kannada. A morphological analyzer
returns all the morphemes of a given word form together with their grammatical
categories. Conversely, given a root word and grammatical information, a morphological
generator produces the corresponding word form. Developing a full-fledged morphological
analyzer and generator for a highly agglutinative language like Kannada is a challenging
task. To build such tools, one has to take care of the morphological peculiarities of the
language. Some peculiarities of Kannada, such as the usage of classifiers and the excessive
presence of vowel harmony, make it morphologically complex and thus a challenge in NLG.
Table 6.1: Input/output Examples for Morphological Analyzer
Input            Output
(AnegaLu)        (Ane+gaLu)
(hOguttEne)      (hOgu+utt+Ene)
The function of a morphological analyzer is to segment a given word into its component
morphemes and to assign the correct morpho-syntactic information. Table 6.1 shows
examples of morphological analysis of Kannada words. The function of a morphological
generator is to combine the constituent morphemes into the actual word. Table 6.2
shows examples of morphological generation of Kannada words.
Table 6.2: Input/output Examples for Morphological Generator
Input              Output
(Ane+gaLu)         (AnegaLu)
(hOgu+utt+Ene)     (hOguttEne)
The first section of this chapter explains the development of a Rule Based
Morphological Analyzer and Generator (RBMAG) system for the Kannada language,
incorporating the morphological information and peculiarities of the language. The
proposed RBMAG system was developed using FST.
The second section of this chapter explains the development of a statistical
morphological analyzer for Kannada verbs. The developed statistical system is a
paradigm based morphological analyzer built using a machine learning approach. The
system was designed using a sequence labelling approach, and training, testing and
evaluation were performed with SVM algorithms.
normally root word is affixed with several morphemes to generate thousands of word
forms. To build an effective morphological analyzer one should carefully analyze and
identify all these roots and morphemes. The next challenge is the design of morphological
structure and generation of well organized corpus that should possibly cover all types of
inflections.
Agglutination is the most critical and important feature of the Kannada language. Due
to the highly agglutinating nature of the language and the morphophonemic variations
that take place at the point of agglutination, it is very difficult to mark word
boundaries. For example, consider the verb form (OdikoMDiddavana), the one (masculine)
who was reading. The different meaningful parts of this word are as follows:
+ + + + + + + + +
The above word consists of ten meaningful parts: one root word (Root), two Verbal
Participles (VBP), two Auxiliary Verbs (AUXV), two Past Tense Markers (PST), one
Relative Participle (RP), one Pronoun (PRON-3SM) and one accusative marker (ACC).
In general, there are three types of Kannada words: i) namapada (declinable words or
nouns), ii) kriyapada (conjugable words or verbs) and iii) avyaya (uninflected words).
Nouns, pronouns and adjectives belong to the declinable words and are inflected to mark
differences of case, number and gender. Conjugable words are inflected to mark
differences of person, gender, number, aspect, mood and tense. All Kannada words belong
to one of three genders: masculine, feminine and neuter. Declinable and conjugable words
have two numbers: singular and plural. The singular has no distinguishing marker. The
plural marker is usually gaLu, with the following exceptions. Masculine nouns ending in a
(e.g., huDuga) and some feminine nouns ending in u (e.g., heMgasu) form the plural with
aru. Feminine nouns ending in i (e.g., huDugi) or e (e.g., atte) form the plural with
yaru. For kinship terms (e.g., aNNa), the plural marker is often aMdiru. Some nouns have
irregular plurals, such as makkaLu, which is the plural of the noun magu.
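The plural marker rules above can be sketched as a small selection function. This is only an illustration under stated assumptions: the function name, the gender codes and the two boolean flags are hypothetical, and the function returns the marker only, without applying any sandhi to build the surface form.

```python
# Irregular plurals are stored whole, since no marker is simply appended.
IRREGULAR_PLURALS = {"magu": "makkaLu"}


def plural_marker(noun, gender, is_kinship=False, takes_aru=False):
    """Return the plural marker for a Romanized noun, or None if irregular.

    gender: "m", "f" or "n" (hypothetical codes).
    is_kinship: True for kinship terms such as aNNa.
    takes_aru: True for the feminine nouns ending in u (e.g. heMgasu)
               that take aru; the text only says "some" nouns do, so the
               caller has to supply this lexical fact.
    """
    if noun in IRREGULAR_PLURALS:
        return None  # look the full form up in IRREGULAR_PLURALS instead
    if is_kinship:
        return "aMdiru"
    if gender == "m" and noun.endswith("a"):
        return "aru"
    if gender == "f" and noun.endswith("u") and takes_aru:
        return "aru"
    if gender == "f" and noun.endswith(("i", "e")):
        return "yaru"
    return "gaLu"  # default plural marker
```

The kinship check comes before the gender checks because a kinship term like aNNa would otherwise match the rule for masculine nouns ending in a.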
The case system of Kannada is similar to those of other south Dravidian languages
like Tamil, Telugu and Malayalam. Nouns usually end in a, e, i, u, A or in a
consonant [158]. Various suffixes are added to the noun stem to indicate different
relationships between the noun and the other constituents of the sentence. The suffix
used for a particular case depends on the type of the noun and its final character. For
example, the dative case suffixes are chosen by the criteria shown in Table 6.3.
Table 6.4 below shows the different cases and their corresponding characteristic
suffixes for nouns.
Table 6.4: Noun Cases and their Characteristic Suffixes
Case                     Singular                                 Plural
Ablative (Pachami)       (deseyiMda)                              (gaLadeseyiMda)/(MdiradeseyiMda)/(yaradeseyiMda)/(radeseyiMda)
Genitive (Shashti)       (da)/(ya)/(ina)/(na)/(a)/(vina)/(ra)     (gala)/(Mdira)/(yara)/(ra)
Locative (Saptami)       (dalli)/(yalli)/(alli)/(nalli)           (gaLalli)/(Mdiralli)/(aralli)/(ralli)
Vocative (Sambhodana)    (E)/(vE)/(A)/(I)                         (gale)/(MdirE)/(yare)
complexity makes it very challenging to capture in a machine analyzable and generatable
format. The formation of the verbal complex also involves the arrangement of the verbal
units and the interpretation of their combined meaning. Phonology also plays a small role
in word formation through morphophonemic and sandhi rules, which account for the shape
changes due to inflection. To resolve the computational challenges in verb morphological
analysis, I have classified verbs into 35 distinct paradigms, and verb words are grouped
according to their paradigm class.
Verb forms can be broadly classified into two types: finite verbs and non-finite verbs.
Finite verbs usually occur at the end of a sentence and, with the exception of clitics or
reportives, can have nothing added to them. The general word order of a sentence with a
finite verb is Subject-Object-Verb. Some of the finite forms of verbs are imperatives,
present and past forms marked with PNG, modals and verbal/participle nouns. In the
affirmative, the tense can be past, present or future. The negative form does not take
tense. Non-finite verbs, in contrast, cannot stand alone and must be followed by some
other form. Non-finite verb forms include infinitives, verbal and adjectival participles
and tense-marked verb stems [158]. The non-past denotes both present and future tenses
and, unlike in Malayalam (another south Dravidian language), all tenses have different
tense markers in Kannada. Mood is another important feature of Kannada and is associated
with statements of fact versus possibility, supposition, etc. [160]. Four different moods
are expressed in Kannada: infinitive, imperative, affirmative and negative. Kannada also
has some additional modal forms: indicative, conditional, optative, potential, monitory
and conjunctive.
Kannada also has past verb stems in addition to simple verb stems; these are used in
forming the past tense, past participles, conditionals and some other constructions. The
contingent form is another distinguishing feature of Kannada that is not present in any
other Dravidian language [161]. Past verbs are broadly classified into two types: regular
and irregular (or semi-regular). For regular verbs, the different word forms are built by
adding id to the verb stem. For irregular verbs, the word forms are built by adding one
of the past tense markers shown in Table 6.18.
6.1.4.1.1 The Infinitive
The infinitive is a non-finite form of the verb that occurs together with other verbs,
auxiliary verbs (modals), negative morphemes and some other forms. There are two types
of infinitives in Kannada, called (al) and (Okke). Both are added to the verb root to
generate other word forms, as shown in Table 6.5.
6.1.4.1.2 The Imperative
Based on various degrees of politeness and deference, Kannada verbs exhibit a number
of forms that express commands or exhortations. These imperatives also change with the
verb type, which usually depends on the ending of the verb. Table 6.6 below lists the
imperatives with an example of each.
Polite, plural    (kuDIri)           (tegoLLi)           (banni)           (hOgi)
Very polite       (kuDiyiri)         (tegoLLiri)         (banniri)         (hOgiri)
Ultrapolite       (tAvu kuDiyiri)    (tAvu tegoLLiri)    (tAvu banniri)    (tAvu hOgiri)
These forms command someone not to do something. There are two ways of indicating
negative imperatives, called (bAradu) and (bEDa). The first one, bAradu (historically a
form of bA/baru, come), may be added to the infinitive (a) to generate a negative verb
form. It may also be added to the infinitive (al) or (l) together with an emphatic (E)
to generate strong negative verb forms. Table 6.7 illustrates the negative imperative
with examples.
Similarly, the second way of indicating negative imperatives uses the negative modals
(bEDa, the negative modal of the word (bEku): want, need, must, should) and (kUDadu,
must not). Table 6.8 illustrates these negative imperatives with examples.
Table 6.8: The Negative Imperative
Case 3    (mADu) + (l) + (E) + (bEDa) = (mADalEbEDa) or (mADlEbEDa), must not do
Case 4    (mADu) + (a) + (kUDadu) = (mADakUDadu) or (mADu) + (kUDadu) = (mADkUDadu), must not do
6.1.4.1.4 Optative
The optative is usually used with the first and third persons in Kannada and is formed
by adding (i) to the infinitive. It often translates into the English word let when used
in the affirmative, and has the meaning shall, should or may when it appears in the
interrogative, as shown in Table 6.9.
6.1.4.1.5 Hortative
This form can be built by adding (ONa) to the verb stem and can be translated either as
let's (do something) or, in an interrogative sentence, as shall we (do something)?
Table 6.10 illustrates this with an example.
6.1.4.1.6 Participle
Kannada has some non-finite verb forms called participles that function verbally or
adjectivally, or have some special syntactic function in the sentence. Participles may be
affirmative, in which case they can be marked for tense, or negative. The important
verbal participles in Kannada are shown in Table 6.11 with an example of each.
Past verbal participle
  Description: If the past tense marker of the verb is (id), then add (i) to the verb
  stem, followed by a finite verb or verb phrase. Otherwise add (u) to the past verb
  stem, followed by a finite verb or verb phrase.
  Case 1: (mADi Urige baMdenu) having done, I came to village
  Case 2: (hOgi + biTTu + banni) go and come
Negative verbal participle
  Description: Used to express the negative of both the present and past verbal
  participle. These are formed by adding (ade) to the verb stem or to the negative
  stem illa.
  Case 1: (nODu + ade = nODade) without seeing
  Case 2: (illa + ade = illade) not being/having been
Verbal/participle noun
  Description: The most common among verbal participle nouns are the neuter singular
  (adu) and personal verbal nouns like (avanu), (avaLu), (avaru) etc.
  Case 1: (mADu + O + adu = mADOdu) the (act/fact of) doing, that which does
  Case 2: (nODu + O + avaru = nODOvavaru) those (people) who see
6.1.4.1.7 Modal auxiliaries
There are a number of modal auxiliaries in the Kannada language, each of which may have
several different meanings, as shown in Table 6.12.
can/may (do something)
Negative form: (bAradu)
  Example: (nODa bAradu) can't/shouldn't see
Negative contingent: (Ar), attached with PNG markers to the verbal infinitive to
generate the meaning cannot/might not
  Example: nagu + al + Ar + enu = nagalArenu, cannot/might not laugh
Two verb stems, (baru) and (Agu), are frequently used as dative-stative verbs in
Kannada. These verbs have a habitual sense when they stand alone and normally do not
take tense markers. The verb (iru, be), when indicating possession, has the meaning have
in English. The second verb (become) acts as an aspect marker indicating finality.
Aspect markers are very similar to main verbs in their morphology and syntax, but
semantically they do not express lexical meaning the way main verbs do. In Kannada, the
verbal aspect marker is usually added to the past verbal participle. Table 6.13 shows
the various verbal aspect markers used in Kannada. The aspect markers and their English
meanings are underlined in each of the given examples.
... definiteness).
  Description: ... participle. It is homophonous with the lexical verb (leave) and has
  a similar tense formation. However, the aspectual can also be attached to the
  lexical verb.
  Example: ... go
(hOgu)
  Meaning: completion (sometimes involuntary or accidental)
  Description: The aspectual indicates that something has changed from one state to
  another. It is homophonous with the lexical verb (go) and has a similar tense
  formation. The aspectual is attached to the past verbal participle.
  Case 1: (anna beMdu hOgide) the rice has gotten overcooked
  Case 2: (keTTu hOgu) get spoiled
(ADu)
  Meaning: continuity, duration (with some verbs reciprocal or competitive)
  Description: It is homophonous with the lexical verb (play) and has a similar tense
  formation. The aspectual is also attached to the past verbal participle.
  Case 1: (avaru ODADidaru) they ran around
  Case 2: (avaru kAdADidaru) they fought with each other
(koDu)
  Meaning: benefactive
  Description: It is homophonous with the lexical verb (give) and has a similar tense
  formation. The aspectual is also attached to the past verbal participle.
  Case: (avanu kate baredu koTTanu) he wrote the story for someone's benefit
(nODu)
  Meaning: attemptive, experimentive
  Description: It is homophonous with the lexical verb (see) and has a similar tense
  formation. It is usually used with transitive verbs and rarely with intransitive
  verbs.
  Case: (avanu kAfi kuDidu nODidanu) he tried drinking/tasted the coffee
(hAku)
  Meaning: exhaustive, malefactive
  Description: It is homophonous with the lexical verb (put, place) and takes regular
  (weak) tense formation. It is usually used with transitive verbs.
  Case: (avanu dOseyella tiMdu hAkidanu) he ate up all the pancakes (against our wishes)
(koLLu)
  Meaning: reflexive, self-benefactive
  Description: It is homophonous with the lexical verb (buy, take, acquire) and has a
  similar tense formation.
  Case: (avanu kate baredu koMDanu) he wrote a story for himself
The causative suffix (isu) (or yisu) can be added to any verb stem to make a causative
verb out of a non-causative one, as in Table 6.14.
Table 6.14: The causative suffix (isu)
The PNG and tense markers concatenated to the verb stem are the two important aspects
of verb morphology. The verbal inflectional morphemes attached to a verb provide
information about syntactic features such as number, person, case-ending relation and
tense. Table 6.16 shows the various PNG suffixes that can be attached to any verb root.
Present Future Past Contingent
(bILu). When inflected with the past tense and PNG markers, these two words form
(eddenu) and (biddenu). As the two words show the same orthographic changes, they are
grouped under the same paradigm. Table 6.17 illustrates these paradigms with an example
of each.
Paradigm    Past tense marker    Description & Example
Class-1     (-tt-)     Verbs ending with 'Ayu', 'Iyu', 'ILu'; e.g., sAyu, Iyu, kILu
Class-2     (-tt-)     Verbs ending with 'ru', 'aLu', 'uLu'; e.g., heru, aLu, uLu
Class-5     (-t-)      Verbs ending with 'i' and 'e'; e.g., kali, mere, koLe
Class-9     (-d-)      Verbs ending with 'Agu', 'Ogu'; e.g., hOgu, Agu
Class-11    (-d-)      Verbs ending with 'ge' and 'gi'; e.g., age, agi
Class-12    (-d-)      Verbs ending with 'yyu'; e.g., koyyu, geyyu, hoyyu
Class-13    (-d-)      Verbs ending with 'nnu'; e.g., annu, tinnu, ennu
Class-14    (-d-)      Verbs ending with 'Eyu'; e.g., gEyu, nEyu
Class-18    (-dd-)     Verbs ending with 'ILu', 'ELu'; e.g., bILu, ELu
Class-21    (-id-)     Verbs ending with 'ADu', 'ODu'; e.g., ADu, nODu, tODu
Class-22    (-id-)     Verbs ending with 'TTu', 'ddu', 'bbu', 'ttu', 'llu', 'ccu'
Class-24    (-id-)     Verbs ending with 'ju', 'su'; e.g., mOju, aMkurisu
Class-25    (-id-)     Verbs ending with 'MTu', 'Mju', 'Mcu'; e.g., IMTu, aMju, hoMcu
Class-26    (-id-)     Verbs ending with 'ELu', 'ILu'; e.g., hELu, sILu
Class-27    (-nd-)     Verbs ending with 'Eyu', 'Oyu'; e.g., bEyu, nOyu
Class-29    (-nd-)     Verbs ending with 'ollu', 'ellu', 'allu'; e.g., kollu, mellu, sallu
Class-35    (nD-)      Verbs ending with 'ko'; e.g., baggiko, bEDiko
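The paradigm grouping can be pictured as a lexicon lookup. The sketch below is a minimal illustration, assuming a dictionary that copies a few rows of the table above; the names PARADIGMS, build_index and same_paradigm are hypothetical and are not part of the developed system.

```python
# A few paradigm rows from the table: class name -> (past tense marker, sample roots).
PARADIGMS = {
    "Class-1": ("tt", ["sAyu", "Iyu", "kILu"]),
    "Class-9": ("d", ["hOgu", "Agu"]),
    "Class-18": ("dd", ["bILu", "ELu"]),
    "Class-21": ("id", ["ADu", "nODu", "tODu"]),
    "Class-26": ("id", ["hELu", "sILu"]),
}


def build_index(paradigms):
    """Invert the table into root -> (class name, past tense marker)."""
    index = {}
    for name, (marker, roots) in paradigms.items():
        for root in roots:
            index[root] = (name, marker)
    return index


ROOT_INDEX = build_index(PARADIGMS)


def same_paradigm(root_a, root_b):
    """True when both roots are listed under the same paradigm class."""
    return ROOT_INDEX[root_a][0] == ROOT_INDEX[root_b][0]
```

A per-root lookup is used here rather than a match on the verb ending because the endings overlap across classes: kILu, bILu and sILu all end in ILu but are listed under Class-1, Class-18 and Class-26 respectively.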
Designing a morphological system that generates all possible word forms for a given
verb root was the next important stage in the development of the system. Fig. 6.1 shows
the proposed flowchart for verb morphology. The meaning of each abbreviation used in the
flowchart is shown in Table 6.18.
Abbreviation                       Examples
PAST: Past tense marker            tt, Mt, t, d, dd, id, Md, D, T, kk, MD, ttidd, Mtidd, tidd, didd, ddidd, idd, Mdidd, Didd, Tidd, kkidd, MDidd
PRESENT: Present tense marker      utt, yutt, uttidd, yuttidd, iyutt, iyuttidd
INF: Infinitive                    isi, is, isal, isalu, isid, iyisid, sid, al, alu, i, sal, salu, si, s, a
The different levels show the different verb forms that can be derived from the same
root word. For example, the morphemes associated with the verb form (OdisinODuttAne)
are:
+ + + +
The proposed rule based MAG tool was developed using the AT&T Finite State Machine.
This section describes the work required to create the proposed rule based MAG system.
6.3.1.1 Lexicon
The list of stems and affixes, together with basic information about them (noun stem,
verb stem, etc.).
6.3.1.2 Morphotactics
The model of morpheme ordering that explains which classes of morphemes can follow
other classes of morphemes inside a word. E.g., the rule that Kannada plural morpheme
follows the noun stem rather than preceding it.
These are spelling rules used to model the changes that occur in a word, usually when
two morphemes combine. For example, insert a yu on the surface tape just when the
An FST is essentially a finite state automaton that works on two (or more) tapes. The
most common way to think about a transducer is as a kind of translating machine that
reads from one tape and writes onto the other, as shown in Fig. 6.2. A symbol paired
with itself means read that symbol on one tape and write the same symbol on the other
tape. Similarly, +N paired with the empty string means read a +N symbol on one tape and
write nothing on the other.
Fig. 6.2: FST working principle (lexical level and surface level)
FSTs can be used for both analysis and generation (they are bidirectional) and act as
a two-level morphology [162, 163]. A word is represented as a correspondence between a
lexical level and a surface level. The lexical level represents a simple concatenation
of the morphemes making up a word, whereas the surface level represents the actual
spelling of the final word.
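The lexical/surface correspondence can be sketched with aligned symbol pairs. This is a toy illustration, not the AT&T toolkit: the class TinyTransducer and its single hard-coded path are assumptions, chosen only to show that the same pairs serve both generation and analysis.

```python
# One path through a toy transducer: (lexical symbol, surface symbol) pairs.
# An empty surface string means "write nothing" for that lexical symbol.
MARA_PLURAL = [("m", "m"), ("a", "a"), ("r", "r"), ("a", "a"),
               ("+N", ""), ("+PL", "gaLu")]


def lexical_side(path):
    """Concatenate the lexical-tape symbols of a path."""
    return "".join(lex for lex, _ in path)


def surface_side(path):
    """Concatenate the surface-tape symbols of a path."""
    return "".join(surf for _, surf in path)


class TinyTransducer:
    """Stores explicit paths and reads them in either direction."""

    def __init__(self, paths):
        self.paths = paths

    def generate(self, lexical):
        # Lexical level -> surface level.
        return [surface_side(p) for p in self.paths if lexical_side(p) == lexical]

    def analyze(self, surface):
        # Surface level -> lexical level, using the very same pairs.
        return [lexical_side(p) for p in self.paths if surface_side(p) == surface]


fst = TinyTransducer([MARA_PLURAL])
```

A real FST encodes such paths compactly as states and arcs instead of listing them, but the bidirectional reading of each pair is the same idea.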
Using all the relevant morphological feature information of Kannada words, well-defined
sandhi rules were created based on FST. The architecture of the proposed rule based MAG
system is shown in Fig. 6.3.
For the morphological generator, if the string containing the root word and its
morphemic information is accepted by the automaton, then it generates the corresponding
root word and morpheme units in the first level, as shown in Fig. 6.4.
Here beLe is the root word, V indicates that the category of the root word is verb,
PRES and FUT are the tense markers for present and future tense respectively, and 3SM
is the PNG marker for third person singular masculine.
Fig. 6.4: Example for Morphotactics Rule
The output of the first level becomes the input of the second level, where the
orthographic (sandhi) rules are handled, as shown in Fig. 6.5. If the string is accepted,
the second level generates the inflected word.
The sandhi rules should be written in such a way that, if the root word ends with e and
the next morpheme is tt (PRES) or Ane (3SM), then yu is inserted immediately after the
root word. Fig. 6.6 below shows the corresponding sandhi rule.
Fig. 6.6: Example for Sandhi Rule
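The sandhi rule described above can be sketched as a plain function. This is an illustrative approximation of the rule as stated in the text, not the FST rule file itself; the function name and the list-of-morphemes input are assumptions.

```python
def apply_e_sandhi(root, suffixes):
    """Join a root with its suffix morphemes, inserting 'yu' after a root
    that ends in 'e' when the next morpheme is 'tt' (PRES) or 'Ane' (3SM)."""
    joined_suffixes = "".join(suffixes)
    if suffixes and root.endswith("e") and suffixes[0] in ("tt", "Ane"):
        return root + "yu" + joined_suffixes
    return root + joined_suffixes
```

For the root beLe followed by tt and Ane, the function returns beLeyuttAne, while a root such as hOgu, which does not end in e, is joined without insertion.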
Based on the inflections and differences, all possible Morphotactic and Sandhi rules
were written for all forms of nouns and verbs using FST.
####Plural "gaLu"#####
[b2] [<epsilon>] / __ gaLu
This rule works as follows. Suppose a root word is given as input in the form
Root+N+PL, for example mara+N+PL, where mara (tree) is the root word. The + symbol is
treated as [b2], which is already defined in the alphabets file. The tag +N is replaced
by [<epsilon>], that is, by the empty string, and the tag +PL is replaced by the plural
marker gaLu, which is added to the root word. The tool then gives the output maragaLu.
####PRESENT tt#####
[b2] [<epsilon>] / __ tt
This rule works as follows. Suppose a root word is given as input in the form
Root+V+PRES, for example Odu+V+PRES, where Odu (read) is the root word. The + symbol is
treated as [b2], which is already defined in the alphabets file and is common to both
nouns and verbs. The tag +V is replaced by [<epsilon>], that is, by the empty string,
and the tag +PRES is replaced by the present tense marker tt, which is added to the root
word. The tool then gives the output Odutt.
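The effect of the two rules above can be imitated with a tag-to-surface table. This is only a sketch of what the rules compute, assuming that + separates the root from its tags; the table name TAG_TO_SURFACE and the function generate are hypothetical and do not reproduce the rule-file syntax.

```python
# Category tags map to the empty string; inflection tags map to their markers.
TAG_TO_SURFACE = {
    "N": "",        # +N  -> [<epsilon>]
    "V": "",        # +V  -> [<epsilon>]
    "PL": "gaLu",   # +PL -> plural marker
    "PRES": "tt",   # +PRES -> present tense marker
}


def generate(lexical):
    """Turn a lexical string such as 'mara+N+PL' into its surface form."""
    root, *tags = lexical.split("+")
    return root + "".join(TAG_TO_SURFACE[tag] for tag in tags)
```

An unknown tag raises a KeyError, which loosely mirrors the automaton rejecting a string it does not accept.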
6.4 MORPHOLOGICAL ANALYZER USING MACHINE LEARNING APPROACH
The proposed morphological analyzer model was developed with a supervised machine
learning approach, using the popular classification and regression technique called SVM.
In supervised machine learning, the training corpus consists of pairs of input objects
and their desired outputs.
6.4.1.1 Romanization
The SVM tool supports only Roman character codes, whereas text in a Dravidian language
like Kannada is encoded in Unicode. Mapping files were therefore created and used to map
from Unicode to Roman and vice versa. The Romanized, aligned input-output data corpus,
consisting of the most commonly used verbs selected from all verb paradigms, was created
manually.
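A Unicode-to-Roman mapping of this kind can be sketched as follows. The tables below are hypothetical stand-ins for the mapping files and cover only a handful of Kannada characters; the handling of the inherent vowel a, the dependent vowel signs and the virama is an assumption about how such a mapping would typically be applied.

```python
# Hypothetical mapping tables (a tiny subset of the Kannada Unicode block).
CONSONANTS = {"\u0c97": "g", "\u0ca8": "n", "\u0cae": "m",
              "\u0cb0": "r", "\u0cb3": "L"}
VOWELS = {"\u0c85": "a", "\u0c86": "A"}        # independent vowels
VOWEL_SIGNS = {"\u0cc1": "u", "\u0cc6": "e"}   # dependent vowel signs
VIRAMA = "\u0ccd"                              # suppresses the inherent vowel


def romanize(text):
    """Map Kannada Unicode text to the Roman notation used in this chapter."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])  # vowel sign replaces inherent 'a'
                i += 1
            elif nxt == VIRAMA:
                i += 1                        # bare consonant, no vowel
            else:
                out.append("a")               # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        else:
            out.append(ch)                    # pass unknown characters through
        i += 1
    return "".join(out)
```

The reverse direction (Roman to Unicode) would need its own table and is not shown here.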
Fig. 6.7: Preprocessing steps
6.4.1.2 Segmentation
This step involves four stages: grapheme segmentation, syllable splitting, C-V
(Consonant-Vowel) representation and segmentation. In the first stage, every Romanized
word in the corpus is segmented into Kannada graphemes. In the second stage, these
graphemes are split into consonant and vowel units. In the next stage, the markers -C
and -V are appended to the segmented consonant and vowel units respectively. The symbol
* is used to indicate the morpheme boundaries in the output data.
Using the sequence labelling approach [165], the segmented input and output words were
aligned vertically, one segment per position, with a space between segments. Table 6.19
shows a sample alignment of input and output data in the corpus.
Input k-C a-V l-C i-V t-C a-V n-C u-V
Output k a l i* t* a n u*
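The C-V representation stage can be sketched in a few lines. The sketch assumes that every Romanized grapheme is a single character and that the vowel inventory is the set listed below; both are simplifying assumptions, and the function name is hypothetical.

```python
# Assumed vowel symbols of the Roman notation (short and long forms).
ROMAN_VOWELS = set("aAiIuUeEoO")


def cv_segments(word):
    """Tag each Romanized character as a consonant (-C) or a vowel (-V)."""
    return [f"{ch}-{'V' if ch in ROMAN_VOWELS else 'C'}" for ch in word]
```

Applied to kalitanu, the function reproduces the input row of the sample alignment: k-C a-V l-C i-V t-C a-V n-C u-V.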
When the input and output sequences are mapped, a mismatch may occur because the input
has either more or fewer units than the output. In the first case, when the input has
more units than the output, the mismatch is resolved by inserting a null symbol $ into
the output sequence, based on the morpho-syntactic rules. Consider the Kannada verb
kaliyuttAne, which has 11 segments in the input sequence and 10 segments in the output
sequence.
Due to the morpho-syntactic rule, the y in the input sequence has no counterpart in the
output sequence. To equalize the input and output data in such a situation, the y is
mapped to the empty symbol $ in the training data, as shown in Table 6.20.
In the second case, the input sequence has fewer segments than the corresponding output
sequence. To illustrate this situation, consider the Kannada verb OdidaLu, which has 7
segments in the input sequence but 8 segments in the output sequence. To overcome this
problem, based on the morpho-syntactic rule, the first occurrence of d-C in the input
sequence is mapped to the two segments d and u of the output sequence in the training
data.
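Both mismatch cases can be illustrated by decoding a label sequence back into morphemes. The label sequences used here are assumptions reconstructed from the description (the y mapped to $, and the first d carrying the two output segments d and u); the exact label inventory of the training corpus is not given in the text, and decode_labels is a hypothetical helper.

```python
def decode_labels(labels):
    """Rebuild morphemes from per-segment output labels.

    '$' is the null label (input segment with no output counterpart) and a
    trailing '*' marks the last segment of a morpheme.
    """
    morphemes, current = [], ""
    for label in labels:
        if label == "$":
            continue                      # input unit deleted in the output
        if label.endswith("*"):
            current += label[:-1]
            morphemes.append(current)     # morpheme boundary reached
            current = ""
        else:
            current += label
    if current:
        morphemes.append(current)         # trailing material without '*'
    return morphemes


# Case 1: kaliyuttAne, 11 input segments, the y is labelled '$'.
CASE_1 = ["k", "a", "l", "i*", "$", "u", "t", "t*", "A", "n", "e*"]
# Case 2: OdidaLu, 7 input segments, the first d carries 'd' and 'u'.
CASE_2 = ["O", "du*", "i", "d*", "a", "L", "u*"]
```

Under these assumed labels, each input segment still receives exactly one label, which is what a sequence labeller requires, while the decoded output recovers 10 and 8 output characters respectively.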
6.4.2 Morphological Analyzer Model Creation
The architecture of the proposed model consists of three different phases, as shown in
Fig. 6.8.
The pre-processed parallel corpus consists of sequences of input characters and their
corresponding output labels. The parallel corpus, consisting of more than 200,000 words,
was used to train the SVMTlearn tool, generating the morphological analyzer model called
model-I. Model-I is used to analyze a given input test word and identify the different
morphemes associated with it. Similarly, another model called model-II was created for
assigning grammatical classes to each morpheme in a word; this second model was trained
on sequences of morphemes and their grammatical categories.
The working principle is as follows. The test input word is given to the trained
model-I, which predicts a label for each input segment. In the next phase, the segmented
morphemes are given to the trained model-II, which predicts the grammatical categories
of the segmented morphemes for the given input word.
6.5 SYSTEM EVALUATION AND PERFORMANCE
Development of a MAG that handles all types of word forms is a challenging task. The
developed rule based MAG is capable of analyzing and generating a list of twenty
thousand nouns, around three thousand verbs and a relatively smaller list of adjectives.
The uniqueness of the developed MAG is its capacity to generate and analyze transitive,
causative and tense forms, apart from passive constructions, auxiliaries and verbal
nouns. The performance of the developed system can be substantially improved by adding
more rules, such as rules for complex morphology. The accuracy of the developed MAG can
also be improved by checking it against more and more different types of word lexicons.
A rule based machine translation system for English to Kannada was developed using the
proposed MAG. Fig. 6.9 shows a command line screenshot of the developed RBMAG.
In the second proposed method, a parallel corpus consisting of more than 200,000 words
was used to train the SVMTlearn tool, generating the morphological analyzer model. Using
SVMTagger, the model was tested on more than 50,000 different verb forms selected from
two standard dictionaries [166, 167] and from the Amrita POS tagged corpus. The
performance of the system was evaluated using the SVMTeval tool, and the incorrect
outputs were noted. In contrast to the rule based approach, the system performance was
considerably increased by adding to the training corpus those input words whose outputs
were incorrect during testing and evaluation. In this experiment, the system achieved a
competitive accuracy of 96.25% for Kannada verbs.
Fig. 6.9: Command Line Output of Morph Tool
Sample screenshots of the developed MAG model for nouns are shown in Fig. 6.10 and
6.11. Similarly, Fig. 6.12 and 6.13 show screenshots of the developed MAG model for
verbs.
Fig. 6.11: Screenshot of Kannada Morph generator for Noun
Fig. 6.13: Screenshot of Kannada Morph generator for Verb
6.6 SUMMARY
This chapter is the part of the research work that deals with the design and
development of a morphological analyzer and generator for the Kannada language using
rule based as well as statistical approaches.
Development of a morphological analyzer and generator for all types of word forms is a
challenging task for an agglutinative language like Kannada. The implementations aimed
to incorporate more lexical information of the Kannada language with good semantic
features, so as to solve the morphological problem more effectively. The performance of
the statistical approach depends on large aligned bilingual corpora covering all types
of word categories. On the other hand, the performance of the rule based approach
depends on simple and complex linguistic rules of all types, in order to cover all types
of word forms.
The performance of the developed rule based system can be substantially improved by
adding more rules and by checking against more and more different types of word
lexicons. On the other hand, the performance of the statistical approach can be improved
by increasing the corpus size to cover other word categories like noun, pronoun, adverb
etc.
6.7 PUBLICATIONS