
A Context-free Grammar for metre in Tamil Prosody and a parser to analyse metrical text

L.BalaSundaraRaman, Ishwar Sridharan


sundarbecse@yahoo.com, ishwar@gmail.com

Abstract Tamil is a classical language belonging to the Dravidian language family. Poetry in Tamil literature, especially from the post-classical and neo-classical periods, largely adheres to the well-defined rules of metre described in tolkāppiyam and, later, in yāpparunkalam, yāpparunkalakkārigai and other books on Tamil grammar.[1] Metre in Tamil is defined in terms of six elements: eḻuttu (phone), acai (metreme), cīr (metrical foot), taḷai (linkage), aṭi (metrical line) and toṭai (ornamentation). Based on the metre, Tamil poems are classified into five types of verse (pā), viz. (1) veṇpā, (2) āciriyappā, (3) vañcippā, (4) kalippā and (5) maruṭpā. The authors' claim is that the grammar governing metre in Tamil is structured well enough to be written as a Context-free Grammar (CFG), whose productions are written in terms of the six elements. With the formal grammar for the metres expressed in EBNF, a parser for it can be written. The authors have developed such a parser, named visaineri,[2] which analyses a given metrical text in Tamil and generates a tree consisting of the six basic elements of metre. The parser is built on the SPARK framework.[3] It takes Tamil verses as input, checks them for conformance to the formal grammar specified in EBNF, and produces as output an XML file that specifies the various metremes in a schema along the lines of TEI P5[4] and [JLC].[5] As a direct outcome, visaineri automates the intensive task of annotating literary works with metrical elements, which is otherwise a laborious manual effort. Once annotated, the text can be fed to a search engine to search for metrical patterns, and various statistical analyses can be performed on the annotated corpus. Since a large body of literature remains to be annotated and analysed, the parser should be of considerable help to researchers.
In addition to analysing existing corpora, the parser has been used by modern poets who want to write poems that conform to classical metre.
[1] Niklas, Ulrike (1988). "Introduction to Tamil Prosody". Bulletin de l'École française d'Extrême-Orient 77 (1): 165–227. doi:10.3406/befeo.1988.1744. ISSN 0336-1519.
[2] http://www.visaineri.net/
[3] http://pages.cpsc.ucalgary.ca/~aycock/spark/
[4] TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. [November 1, 2007]. TEI Consortium.
[5] Jean-Luc Chevillard, Critical editions of Tamil works: exploratory survey and future perspectives (INFITT 2009, Köln, 25th October).

System of Tamil Prosody


The system of Tamil prosody is defined in terms of the following six elements:[6]
1. eḻuttu (phone)
2. acai (metreme)
3. cīr (metrical foot)
4. taḷai (linkage)
5. aṭi (metrical line)
6. toṭai (ornamentation)

Apart from a few kinds of toṭai, all of these elements are identified by phonological rules. These well-defined rules determine the metrical character of any Tamil pā: based on them, a stanza of Tamil poetry can be classified as one of the four pā or one of their 12 variants.[7]

eḻuttu or phone
The eḻuttu are the basic tokens of the Tamil prosodic system, and acai are composed of them. Certain phonological rules in tolkāppiyam specify the length of a metrical line in terms of its phones, and several features of ornamentation depend on the attributes of phones occurring in specific positions of metrical feet. Tamil eḻuttu are classified into primary and secondary phones. The primary phones are further divided into vowels and consonants. The 12 vowels are either short (kuṟil) or long (neṭil), and the 18 consonants fall into three groups: hard, soft and middle.
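To illustrate how the primary phones described above might be recognised during lexical analysis, the following Python sketch classifies the 18 consonants into their three groups and splits a word written in Unicode Tamil script into syllables marked short or long and open or closed. It is not the visaineri lexer: the set-based tables, the function names `consonant_group` and `syllabify`, and the decision to reject Grantha letters and the āytam (and to treat the overshort u as an ordinary short u) are assumptions made for this example.

```python
import unicodedata

# Primary phones in the Unicode Tamil block (assumption of this sketch:
# Grantha letters and secondary phones such as the aytam are not handled).
SHORT_VOWELS = set("அஇஉஎஒ")        # kuṟil: a i u e o
LONG_VOWELS = set("ஆஈஊஏஐஓஔ")      # neṭil: ā ī ū ē ai ō au
HARD = set("கசடதபற")               # hard consonants
SOFT = set("ஙஞணநமன")               # soft consonants
MIDDLE = set("யரலவழள")             # middle consonants
CONSONANTS = HARD | SOFT | MIDDLE

# Dependent vowel signs written after a consonant letter.
SHORT_SIGNS = {"\u0bbf", "\u0bc1", "\u0bc6", "\u0bca"}   # i u e o
LONG_SIGNS = {"\u0bbe", "\u0bc0", "\u0bc2", "\u0bc7",
              "\u0bc8", "\u0bcb", "\u0bcc"}              # ā ī ū ē ai ō au
PULLI = "\u0bcd"  # virama: the consonant carries no vowel (pure consonant)


def consonant_group(letter):
    """Return 'hard', 'soft' or 'middle' for one of the 18 consonants."""
    for name, group in (("hard", HARD), ("soft", SOFT), ("middle", MIDDLE)):
        if letter in group:
            return name
    raise ValueError(f"not a primary consonant: {letter!r}")


def syllabify(word):
    """Split one word into (length, closed) pairs.

    length is "S" for kuṟil or "L" for neṭil; closed is True when one or
    more pure consonants follow the vowel. Overshort u is treated as short u.
    """
    word = unicodedata.normalize("NFC", word)
    syllables = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in SHORT_VOWELS or ch in LONG_VOWELS:
            syllables.append(["S" if ch in SHORT_VOWELS else "L", False])
            i += 1
        elif ch in CONSONANTS and nxt == PULLI:
            if not syllables:
                raise ValueError("a word cannot begin with a pure consonant")
            syllables[-1][1] = True  # the consonant closes the previous syllable
            i += 2
        elif ch in CONSONANTS and (nxt in SHORT_SIGNS or nxt in LONG_SIGNS):
            syllables.append(["S" if nxt in SHORT_SIGNS else "L", False])
            i += 2
        elif ch in CONSONANTS:
            syllables.append(["S", False])  # inherent short a
            i += 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return [tuple(s) for s in syllables]
```

For example, `syllabify("மலர்")` (malar) returns `[('S', False), ('S', True)]`: an open short syllable followed by a short syllable closed by a consonant.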

acai or metreme
acai are composed of phones and form the basic unit of Tamil metre. An acai consists of one or two syllables and, in certain specific situations, three. A single short or long syllable, open or closed by a consonant, that does not form part of a disyllabic sequence is called nēr-acai. An open short syllable followed by a short or long syllable, which may itself be open or closed by a consonant, is called nirai-acai. When an overshort u (a secondary phone) follows a nēr-acai or nirai-acai in specific positions of a pā, the resulting unit is called nērpu or niraipu respectively.
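The nēr/nirai rule can be sketched as a greedy left-to-right grouping of the syllables of a single word. The sketch assumes syllables are given as (length, closed) pairs, with length "S" for kuṟil and "L" for neṭil; it ignores the overshort u that yields nērpu and niraipu, and the function name `segment_acai` is hypothetical.

```python
def segment_acai(syllables):
    """Group the syllables of one word into nēr and nirai acai.

    syllables: list of (length, closed) pairs, length "S" (kuṟil) or "L" (neṭil).
    """
    acai = []
    i = 0
    while i < len(syllables):
        length, closed = syllables[i]
        # An open short syllable with another syllable after it starts a nirai.
        if length == "S" and not closed and i + 1 < len(syllables):
            acai.append("nirai")
            i += 2
        else:
            # Any other syllable stands alone as a nēr.
            acai.append("nēr")
            i += 1
    return acai
```

With this rule, ma-lar gives one nirai, tē-mā gives two nēr, and ka-ru-vi-ḷam gives two nirai, matching the pattern words malar, tēmā and karuviḷam.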

[6] In Introduction to Tamil Prosody, Ulrike Niklas elaborates on how distinctive the elements of Tamil prosody are. She favours a distinct terminology aligned with the original Tamil definitions; the same terminology is followed in this paper.
[7] https://www.tamilvu.org/courses/diploma/a021/a0214/html/a02144l0.htm

cīr or metrical foot
One to four metremes form a cīr or metrical foot, and metrical feet in turn form the aṭi or metrical line. Metrical feet are classified by the number and sequence of the metremes they contain. Each type is named by a mnemonic pattern word whose own acai sequence is the pattern it names: tēmā, for instance, scans as nēr nēr. The full list of metrical feet is given in the Context-free grammar below.
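Because each pattern word spells out its own acai sequence, naming a foot amounts to a table lookup. The sketch below transcribes the two-, three- and four-acai productions of the grammar into a Python dictionary and inverts it; the names `FOOT_PATTERNS` and `pattern_word` are illustrative, and the one-acai final feet are left out.

```python
# Two-, three- and four-acai feet, transcribed from the grammar's productions.
FOOT_PATTERNS = {
    "karuviḷam": "nirai nirai",
    "puḷimā": "nirai nēr",
    "kūviḷam": "nēr nirai",
    "tēmā": "nēr nēr",
    "karuviḷaṅkaṉi": "nirai nirai nirai",
    "puḷimāṅkaṉi": "nirai nēr nirai",
    "kūviḷaṅkaṉi": "nēr nirai nirai",
    "tēmāṅkaṉi": "nēr nēr nirai",
    "karuviḷaṅkāy": "nirai nirai nēr",
    "puḷimāṅkāy": "nirai nēr nēr",
    "kūviḷaṅkāy": "nēr nirai nēr",
    "tēmāṅkāy": "nēr nēr nēr",
    "karuviḷanaṟuniḻal": "nirai nirai nirai nirai",
    "kūviḷanaṟuniḻal": "nēr nirai nirai nirai",
    "puḷimānaṟuniḻal": "nirai nēr nirai nirai",
    "tēmānaṟuniḻal": "nēr nēr nirai nirai",
    "karuviḷanaṟumpū": "nirai nirai nirai nēr",
    "kūviḷanaṟumpū": "nēr nirai nirai nēr",
    "puḷimānaṟumpū": "nirai nēr nirai nēr",
    "tēmānaṟumpū": "nēr nēr nirai nēr",
    "karuviḷantaṇṇiḻal": "nirai nirai nēr nirai",
    "kūviḷantaṇṇiḻal": "nēr nirai nēr nirai",
    "puḷimāntaṇṇiḻal": "nirai nēr nēr nirai",
    "tēmāntaṇṇiḻal": "nēr nēr nēr nirai",
    "karuviḷantaṇpū": "nirai nirai nēr nēr",
    "kūviḷantaṇpū": "nēr nirai nēr nēr",
    "puḷimāntaṇpū": "nirai nēr nēr nēr",
    "tēmāntaṇpū": "nēr nēr nēr nēr",
}

# Invert the table: acai sequence -> pattern word.
PATTERN_WORD = {tuple(p.split()): name for name, p in FOOT_PATTERNS.items()}


def pattern_word(acai):
    """Return the pattern word for a sequence of two to four acai."""
    try:
        return PATTERN_WORD[tuple(acai)]
    except KeyError:
        raise ValueError(f"no two- to four-acai foot matches {acai!r}") from None
```

For example, `pattern_word(["nirai", "nēr"])` returns "puḷimā" (puḷi is nirai, mā is nēr). The inverted table has 28 distinct keys, one for every sequence of two, three or four acai.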

taḷai or linkage
The taḷai captures the constraints on how one cīr may be linked to the next: it relates the last acai of a cīr to the first acai of the cīr that follows. There are seven types of taḷai, determined by the acai composition of the cīr under consideration (nilaimoḻi) and the first acai of the following cīr (varumoḻi).
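The seven taḷai can be written as a function of the nilaimoḻi and the first acai of the varumoḻi. The rules in this sketch are the conventional ones from standard accounts of Tamil prosody rather than something stated in this paper; it covers only two- and three-acai feet, and the function name `talai` is hypothetical.

```python
def talai(nilaimoli, varumoli_first):
    """Name the taḷai between a foot and the first acai of the next foot.

    nilaimoli: the acai of the current foot, e.g. ["nēr", "nirai"].
    varumoli_first: "nēr" or "nirai", the first acai of the following foot.
    """
    last = nilaimoli[-1]
    if len(nilaimoli) == 2:  # māccīr (ends in nēr) or viḷaccīr (ends in nirai)
        if last != varumoli_first:
            return "iyarcīrveṇṭaḷai"
        return "nēroṉṟāciriyattaḷai" if last == "nēr" else "niraiyoṉṟāciriyattaḷai"
    if len(nilaimoli) == 3:
        if last == "nēr":  # kāyccīr
            return "veṇcīrveṇṭaḷai" if varumoli_first == "nēr" else "kalittaḷai"
        # kaṉiccīr (three acai ending in nirai)
        return "oṉṟiyavañcittaḷai" if varumoli_first == "nirai" else "oṉṟātavañcittaḷai"
    raise ValueError("this sketch only handles two- and three-acai feet")
```

Under these rules a veṇpā, which admits only iyarcīrveṇṭaḷai and veṇcīrveṇṭaḷai, requires a two-acai foot to be followed by a foot starting with the other acai type, and a kāy foot to be followed by one starting with nēr.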

aṭi or metrical line
An aṭi is a line of a pā. It can be defined either in terms of its component metrical feet or in terms of its sequence of taḷai. In addition, tolkāppiyam specifies phone counts for an aṭi. U. Niklas lists the types of aṭi together with the number of feet and the range of phone counts for each: kuṟaḷaṭi (dwarf), cintaṭi (short), aḷavaṭi (standard), neṭilaṭi (long) and kaḻineṭilaṭi (extremely long).
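If the phone-count ranges are set aside, the five aṭi types can be assigned from the number of feet alone. This sketch makes that simplifying assumption, using the conventional counts of two feet for kuṟaḷaṭi, three for cintaṭi, four for aḷavaṭi, five for neṭilaṭi and six or more for kaḻineṭilaṭi; `ati_type` is a hypothetical helper.

```python
# Number of feet -> aṭi type; six or more feet make a kaḻineṭilaṭi.
ATI_TYPES = {2: "kuṟaḷaṭi", 3: "cintaṭi", 4: "aḷavaṭi", 5: "neṭilaṭi"}


def ati_type(num_feet):
    """Classify an aṭi by its number of feet, ignoring phone counts."""
    if num_feet < 2:
        raise ValueError("an aṭi has at least two feet")
    return ATI_TYPES.get(num_feet, "kaḻineṭilaṭi")
```

For a kuṟaḷ veṇpā, whose two lines have four and three feet, `[ati_type(n) for n in (4, 3)]` gives `["aḷavaṭi", "cintaṭi"]`.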

toṭai or ornament
There are five kinds of ornament and seven methods of ornamentation, which can occur in corresponding phones within or across metrical lines. Ornaments include alliteration and rhyme. Ornamentation operates both within a single metrical line and across metrical feet, and ornaments and ornamentation together form the toṭai. Although many toṭai are phonological relationships, the authors have left them to future research.

Context-free grammar for Tamil prosody


At the level of phones, metremes and metrical feet, the phonological rules are shared by the various kinds of pā, and together they form a Context-free grammar. This grammar is expressed below in Extended Backus-Naur Form (EBNF).

    pā ::= aṭi pā
    pā ::= aṭi
    aṭi ::= cīr DELIMITER aṭi
    aṭi ::= cīr aṭi TERMINAL
    aṭi ::= īṟṟuccīr aṭi TERMINAL
    cīr ::= īracaiccīr
    cīr ::= mūvacaiccīr
    cīr ::= nālacaiccīr
    īṟṟuccīr ::= ōracaiccīr
    īṟṟuccīr ::= niraipu
    (ypp. 6, 7)
    nēr ::= …   (five productions)
    (ypp. 8, 9)
    nirai ::= …   (five productions)
    nirai ::= kuṟil neṭil
    īṟṟuccīr ::= nērpu
    (ypp. 14)
    ōracaiccīr ::= malar
    ōracaiccīr ::= nēr
    malar ::= …   (four productions)
    (ypp. 11)
    īracaiccīr ::= karuviḷam
    īracaiccīr ::= puḷimā
    īracaiccīr ::= kūviḷam
    īracaiccīr ::= tēmā
    nērpu ::= nēr overshort_u
    niraipu ::= nirai overshort_u
    karuviḷam ::= nirai nirai
    puḷimā ::= nirai nēr
    kūviḷam ::= nēr nirai
    tēmā ::= nēr nēr
    karuviḷaṅkaṉi ::= nirai nirai nirai
    puḷimāṅkaṉi ::= nirai nēr nirai
    kūviḷaṅkaṉi ::= nēr nirai nirai
    tēmāṅkaṉi ::= nēr nēr nirai
    karuviḷaṅkāy ::= nirai nirai nēr
    puḷimāṅkāy ::= nirai nēr nēr
    kūviḷaṅkāy ::= nēr nirai nēr
    tēmāṅkāy ::= nēr nēr nēr
    karuviḷanaṟuniḻal ::= nirai nirai nirai nirai
    kūviḷanaṟuniḻal ::= nēr nirai nirai nirai
    puḷimānaṟuniḻal ::= nirai nēr nirai nirai
    tēmānaṟuniḻal ::= nēr nēr nirai nirai
    karuviḷanaṟumpū ::= nirai nirai nirai nēr
    kūviḷanaṟumpū ::= nēr nirai nirai nēr
    puḷimānaṟumpū ::= nirai nēr nirai nēr
    tēmānaṟumpū ::= nēr nēr nirai nēr
    karuviḷantaṇṇiḻal ::= nirai nirai nēr nirai
    kūviḷantaṇṇiḻal ::= nēr nirai nēr nirai
    puḷimāntaṇṇiḻal ::= nirai nēr nēr nirai
    tēmāntaṇṇiḻal ::= nēr nēr nēr nirai
    karuviḷantaṇpū ::= nirai nirai nēr nēr
    kūviḷantaṇpū ::= nēr nirai nēr nēr
    puḷimāntaṇpū ::= nirai nēr nēr nēr
    tēmāntaṇpū ::= nēr nēr nēr nēr
    (ypp. 12)
    mūvacaiccīr ::= karuviḷaṅkaṉi
    mūvacaiccīr ::= karuviḷaṅkāy
    mūvacaiccīr ::= puḷimāṅkaṉi
    mūvacaiccīr ::= puḷimāṅkāy
    mūvacaiccīr ::= kūviḷaṅkaṉi
    mūvacaiccīr ::= kūviḷaṅkāy
    mūvacaiccīr ::= tēmāṅkaṉi
    mūvacaiccīr ::= tēmāṅkāy
    (ypp. 13)
    nālacaiccīr ::= karuviḷanaṟuniḻal
    nālacaiccīr ::= karuviḷanaṟumpū
    nālacaiccīr ::= karuviḷantaṇṇiḻal
    nālacaiccīr ::= karuviḷantaṇpū
    nālacaiccīr ::= puḷimānaṟuniḻal
    nālacaiccīr ::= puḷimānaṟumpū
    nālacaiccīr ::= puḷimāntaṇṇiḻal
    nālacaiccīr ::= puḷimāntaṇpū
    nālacaiccīr ::= kūviḷanaṟuniḻal
    nālacaiccīr ::= kūviḷanaṟumpū
    nālacaiccīr ::= kūviḷantaṇṇiḻal
    nālacaiccīr ::= kūviḷantaṇpū
    nālacaiccīr ::= tēmānaṟuniḻal
    nālacaiccīr ::= tēmānaṟumpū
    nālacaiccīr ::= tēmāntaṇṇiḻal
    nālacaiccīr ::= tēmāntaṇpū
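Since the 4 + 8 + 16 foot productions for īracaiccīr, mūvacaiccīr and nālacaiccīr enumerate every sequence of two, three and four acai, the foot layer of the grammar reduces to a length check. The sketch below checks the pā, aṭi and cīr layers over a token stream that has already been segmented into acai. It is neither SPARK code nor visaineri's grammar: the tokens "|" for DELIMITER and "$" for TERMINAL, the treatment of malar as a plain nirai, and the reading of an aṭi as feet separated by DELIMITER and ended by TERMINAL are simplifying assumptions.

```python
ACAI = {"nēr", "nirai"}
FINAL_ONLY = {"nērpu", "niraipu"}  # allowed only as the last foot of an aṭi
DELIMITER, TERMINAL = "|", "$"     # hypothetical token spellings


def is_cir(foot):
    """cīr: any sequence of two to four acai."""
    return 2 <= len(foot) <= 4 and all(a in ACAI for a in foot)


def is_irruccir(foot):
    """īṟṟuccīr: a one-acai foot (malar treated as nirai), nērpu or niraipu."""
    return len(foot) == 1 and (foot[0] in ACAI or foot[0] in FINAL_ONLY)


def parse_pa(tokens):
    """Return the lines of a pā as lists of feet, or raise SyntaxError."""
    lines, line, foot = [], [], []
    for tok in tokens:
        if tok not in (DELIMITER, TERMINAL):
            foot.append(tok)
            continue
        line.append(foot)  # a DELIMITER or TERMINAL closes the current foot
        foot = []
        if tok == TERMINAL:
            *body, last = line
            if not all(is_cir(f) for f in body):
                raise SyntaxError(f"invalid cīr in line {len(lines) + 1}")
            if not (is_cir(last) or is_irruccir(last)):
                raise SyntaxError(f"invalid final foot in line {len(lines) + 1}")
            lines.append(line)
            line = []
    if foot or line or not lines:
        raise SyntaxError("a pā must consist of aṭi, each ended by TERMINAL")
    return lines
```

For example, `parse_pa(["nēr", "nēr", "|", "nirai", "nēr", "|", "nēr", "$"])` accepts one line of three feet, tēmā, puḷimā and a one-acai final foot, while a one-acai foot anywhere but the end of a line is rejected.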

Parsing
Based on the P5 guidelines[8] developed by the Text Encoding Initiative, Jean-Luc Chevillard has proposed a format for representing the parse tree of Tamil metrical text.[9] The authors have adapted his proposal and added attributes such as the type of cīr and taḷai. An example output for a pā from tirukkuṟaḷ is given below.

[8] TEI Consortium, eds. TEI P5: Guidelines for Electronic Text Encoding and Interchange. [November 1, 2007]. TEI Consortium. http://www.tei-c.org/Guidelines/P5/
[9] Jean-Luc Chevillard, Critical editions of Tamil works: exploratory survey and future perspectives (INFITT 2009, Köln, 25th October).

Parser Design
With the grammar for Tamil prosody expressed in EBNF, it is now possible to write a parser that checks metrical texts for conformance to the grammar. The authors have developed the visaineri parser in the Python programming language using the SPARK framework.[10] The parser follows a conventional four-stage design in which each stage performs a well-defined task and passes its output data structure on to the next.

[10] http://pages.cpsc.ucalgary.ca/~aycock/spark/

The four stages of the visaineri parser are:
1. Lexical Analysis: this stage breaks the input text into a list of tokens (eḻuttu).
2. Syntax Analysis: the parser checks whether the tokens (eḻuttu) combine into metremes (acai), metrical feet (cīr) and metrical lines (aṭi) in conformance with the CFG. The result of parsing is an Abstract Syntax Tree (AST).
3. Semantic Analysis: the tree is traversed and information on the various acai, cīr and aṭi is collected to check that they conform to the taḷai rules; the corresponding nodes in the AST are updated accordingly.
4. XML Generation: once the AST is complete, the XML file is generated by traversing it.
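Stage 4 can be illustrated with the Python standard library's `xml.etree.ElementTree`. The element and attribute names below (`lg`, `l` and `seg` with `type` and `subtype`, as in TEI, plus a non-TEI `talai` attribute) are only a guess at the shape of such output; they are not Chevillard's schema or visaineri's actual format. The input is assumed to be a list of lines, each a list of (pattern word, acai list, taḷai or None) triples produced by the earlier stages, and `ast_to_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def ast_to_xml(lines):
    """Serialise a parsed pā to an XML string.

    lines: one entry per aṭi; each entry is a list of
    (pattern_word, acai_list, talai_or_None) triples, one per cīr.
    """
    lg = ET.Element("lg", type="pā")
    for n, line in enumerate(lines, start=1):
        l_el = ET.SubElement(lg, "l", n=str(n))
        for name, acai, talai in line:
            foot = ET.SubElement(l_el, "seg", type="cīr", subtype=name)
            if talai is not None:
                foot.set("talai", talai)  # hypothetical, non-TEI attribute
            for a in acai:
                ET.SubElement(foot, "seg", type="acai", subtype=a)
    return ET.tostring(lg, encoding="unicode")
```

For example, `ast_to_xml([[("tēmā", ["nēr", "nēr"], "iyarcīrveṇṭaḷai"), ("malar", ["nirai"], None)]])` produces one `l` element holding two foot `seg` elements, the first carrying its taḷai and two nested acai `seg` elements.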

Tirukkuṟaḷ Analysis
In order to show the utility of a metrical parser beyond producing a parse tree for a single verse, the authors ran it on a classical work of ethics, the Tirukkuṟaḷ. It is a poetic text from the Patiṉeṇkīḻkkaṇakku collection, dated to the post-Sangam period, and all of its poems are written in veṇpā metre. For the purposes of this paper, the authors fed 1323 verses from the text through the parser and obtained the XML output. The statistical analysis of the Tirukkuṟaḷ verses is presented below.

Frequency distribution of various prosodic features

Prosodic feature                  Frequency
nēr-acai                              14301
nirai-acai                             6456
karuviḷaṅkāy cīr                        507
kūviḷaṅkāy cīr                          364
puḷimāṅkāy cīr                         1008
kūviḷam cīr                             866
karuviḷam cīr                           602
tēmā cīr                               2549
puḷimā cīr                             1339
nāḷ                                     174
malar                                   661
iyarcīrveṇṭaḷai                        4868
veṇcīrveṇṭaḷai                         3070
nēr acai at cīr beginning              5144
nēr acai at cīr end                    7132
nirai acai at cīr beginning            4117
nirai acai at cīr end                  2129

The table lists the prosodic features and their observed frequencies. The first two rows show the distribution of nēr and nirai acai across the poems; owing to the nature of veṇpā grammar, the counts are skewed towards nēr, which accounts for 14301 of the 20757 acai (about 69%). The next nine rows show the distribution of metrical foot patterns (cīr). The following two rows show the distribution of linkages (taḷai) between consecutive cīr: since the Tirukkuṟaḷ is written in veṇpā metre, iyarcīrveṇṭaḷai and veṇcīrveṇṭaḷai are the only linkages allowed between cīr. The last four rows show how often nēr and nirai occur at the beginning and at the end of a cīr.
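The counts in the table can be cross-checked with a few lines of arithmetic. Assuming every verse is a kuṟaḷ veṇpā of seven feet (four in the first line, three in the second), the beginning and end counts should each add up to 1323 × 7 feet, and the two linkage counts to 1323 × 6 linkages. The variable names are illustrative.

```python
# Frequencies copied from the Tirukkuṟaḷ table (1323 verses).
VERSES = 1323
FEET_PER_KURAL = 7  # a kuṟaḷ veṇpā has 4 + 3 feet
counts = {
    "nēr-acai": 14301,
    "nirai-acai": 6456,
    "iyarcīrveṇṭaḷai": 4868,
    "veṇcīrveṇṭaḷai": 3070,
    "nēr at cīr beginning": 5144,
    "nēr at cīr end": 7132,
    "nirai at cīr beginning": 4117,
    "nirai at cīr end": 2129,
}

total_acai = counts["nēr-acai"] + counts["nirai-acai"]
ner_share = counts["nēr-acai"] / total_acai
feet_by_start = counts["nēr at cīr beginning"] + counts["nirai at cīr beginning"]
feet_by_end = counts["nēr at cīr end"] + counts["nirai at cīr end"]
linkages = counts["iyarcīrveṇṭaḷai"] + counts["veṇcīrveṇṭaḷai"]

print(f"nēr share of all acai: {ner_share:.1%}")           # 68.9%
print(feet_by_start, feet_by_end, VERSES * FEET_PER_KURAL)  # 9261 9261 9261
print(linkages, VERSES * (FEET_PER_KURAL - 1))              # 7938 7938
```

All three totals agree, which suggests that every one of the 1323 verses was parsed into seven feet with six linkages.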

Applications
The theoretical outcome of the research described here is that indigenous Tamil prosody can be expressed as a Context-free Grammar. In practical terms, the parser saves much of the manual labour spent by linguists who identify prosodic elements by hand and compile statistics from them, a process that is both cumbersome and error-prone. With many texts still to be parsed and analysed, automation is of considerable value.

Future work
Most of the Tamil works available from ancient to modern times are in poetic form. The authors intend to run the available metrical Tamil works through the parser and to use the parsed information to collect aggregate statistics on the distribution of prosodic elements, among other analyses. Interest in writing modern poems according to metrical rules has grown recently, and the parser will be used to let people write metrical poetry online, with real-time feedback on conformance to the phonological rules.
