
INTERNATIONAL JOURNAL OF TRANSLATION

Vol. 18, No. 1-2, Jan-Dec 2006

A Comparative Diagnostic Study of the Criterion of Comprehensibility in two English-to-Persian Machine Translations
DR. MOHAMMAD REZA FALAHATI QADIMI FUMANI
Faculty of Computational Linguistics Research Department, Regional Information Center for Science and Technology, RICeST, Shiraz, Iran. E-mail: mrfalahat@yahoo.com

DR. AZADEH NEMATI


Department of English Language Teaching, Jahrom Branch, Islamic Azad University, Jahrom, Iran. E-mail: azadehnematiar@yahoo.com

Cite this article:


Falahati Qadimi Fumani, M. R. and Nemati, A. (2006). A comparative diagnostic study of the criterion of comprehensibility in two English-to-Persian machine translations. International Journal of Translation, 18(1-2), 103-117.

ABSTRACT

The intent of this research was to analyze the performance of two English-Persian MTs based on the criterion of Comprehensibility, as well as to carry out an analysis of the types of errors they made, i.e., lexical, grammatical, punctuation, spelling, untranslated items and style. A text was randomly selected from among ten texts, all related to the area of General as specified by the two systems, and was given to three groups of raters: ordinary users, professional translators (case 1) and professional translators (case 2). The three groups comprised 4, 3 and 3 raters respectively. Based on a revised, researcher-made version of the theoretical framework proposed by Miller et al. (2001), the raters were asked to assign a score of 1, 0.5 or 0 to each sentence of the text. The data, extracted from a normal distribution, were summarized in the form of frequency tables, which were later analyzed using percentage tables as well as t-tests. The error analysis was carried out by the researchers, and therefore inter-rater reliability was used. The results indicated that neither of the two MTs could attain even a 40% degree of Comprehensibility, and thus their performance was indeed low. With regard to the error analysis, it was found that the two MTs made many errors (MT1 = 52, MT2 = 45 cases) and that grammatical and lexical errors were the most common, with 59.61% and 19.23% in MT1 and 51.11% and 28.89% in MT2. Finally, it is argued that these systems have problems mainly with their grammar and lexicon and that, in their present shape, they are heavily in need of expert translators as pre- and post-editors to check the input and revise, or better retranslate, the output.


Keywords: Computational Linguistics, Machine Translation, Evaluation, Comprehensibility, Diagnostic, Comparative Study, Black-box, Glass-box.
1. INTRODUCTION

The translation of natural languages by machine, first dreamt of in the seventeenth century, became a reality in the twentieth (Hutchins 2005b). Systematic research on MT started right after the invention of the computer, around 1950; works on the status of MT were available as early as 1951 (Bar-Hillel 1964). Therefore, for about the past fifty-six years, researchers from different countries and with different expertise, i.e., computational linguistics, computer science and other related areas, have endeavored to design systems that could successfully, and without human involvement, render texts from one or more source language(s) into one or more target language(s) within a short period of time. After such systems were designed, those inside and outside the projects, i.e., researchers, designers, etc., found that the outputs of the systems were not, in most cases, reliable, and hence evaluation procedures had to be incorporated to help researchers and designers eliminate the shortcomings and improve the output quality.
1.1. Research questions

Due to the great importance of evaluation and evaluation techniques, the present study seeks to answer the following research questions while carrying out a comparative diagnostic evaluation of two English-to-Persian MT systems. (We do not wish to disclose the names of the two systems used in this study and shall subsequently refer to them simply as MT1 and MT2.)

1. What is the degree of comprehensibility of each of the two MTs under study? (Which MT is more comprehensible?)
2. Within a diagnostic evaluation framework, what is the frequency of each type of error (grammatical, lexical, punctuation, spelling, untranslated items and style) each MT made? (Which MT has fewer errors?) The null hypothesis is that there is no difference between MT1 and MT2.

In the present study, a diagnostic evaluation has been adopted, since this type of evaluation is "the concern mainly of researchers and developers" (Hutchins 2005a).


1.2. Scope of the study

Each of these two MT systems has been designed to translate texts related to a variety of subject areas, i.e., humanities, geography, economy, etc., and in case a text is not related to any specific subject area, the icon "general" (or "public") can be selected. Because the text randomly selected for the present study was not related to any specific area, i.e., it was not a technical text, the researchers processed the input text using the icon "general" in MT1 and MT2 and left the remaining subject areas untackled.

1.3. Theoretical framework of the study

To analyze the data, the concept of Comprehensibility as described by Miller et al. (2001) has been used as the theoretical framework of the study. Comprehensibility in their work was defined as a measure that "seeks to address the question: Is the text understandable?" They carried out the evaluation process on a sentence-by-sentence basis and divided all sentences into two groups, comprehensible and incomprehensible. Each comprehensible sentence received the score 1 and each incomprehensible sentence was scored 0. They did not specify the exact number of raters needed for a reliable evaluation, but mentioned that the greater the number of raters, the more reliable the results obtained would be (Miller et al. 2001). Because Miller et al.'s model used just the two scores of 1 and 0 for comprehensible and incomprehensible sentences, the present researchers made use of a revised, researcher-made version of the model. In fact, Miller et al.'s model did not assign any score to sentences that were comprehensible to some extent: the raters had to use either 1 or 0, and this was thought to lower the quality of the evaluation. The present researchers therefore added a third score option, 0.5, to the other two possibilities, i.e., 0 and 1. The revised model was used in this study, and is sketched in code below.
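To make the revised model concrete, the following minimal sketch implements the three-point scoring scheme together with the per-rater aggregates (total, mean, percentage) reported later in Section 4. The function and variable names are ours, not Miller et al.'s; this is an illustration of the scheme, not code used in the study.

```python
# A minimal sketch of the revised three-point comprehensibility model:
# 0 = incomprehensible, 0.5 = comprehensible to some extent,
# 1 = fully comprehensible. Names are illustrative only.

VALID_SCORES = {0.0, 0.5, 1.0}

def comprehensibility(scores):
    """Return total, mean and percentage for one rater's sentence scores."""
    if not all(s in VALID_SCORES for s in scores):
        raise ValueError("Each sentence must be scored 0, 0.5 or 1.")
    total = sum(scores)
    return total, total / len(scores), 100 * total / len(scores)

# Example: a 20-sentence text in which 5 sentences are fully
# comprehensible, 8 partially comprehensible and 7 not at all.
scores = [1.0] * 5 + [0.5] * 8 + [0.0] * 7
print(comprehensibility(scores))  # (9.0, 0.45, 45.0)
```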
2. LITERATURE REVIEW

In this section some major works in the area of MT and MT evaluation will be introduced. Generally speaking, there are three ways of evaluating an MT (Hutchins 2005a): (1) diagnostic evaluation, which analyzes linguistic errors (grammatical, lexical, etc.) as well as system limitations; (2) adequacy evaluation, which mostly concerns those who use or purchase the system (persons, companies, organizations, etc.); and finally (3) performance


evaluation, which deals with the stages during which the system was developed as well as with its technical implementation. There are, in the literature, other criteria for classifying MT evaluation, which can be summarized as: human evaluation, semi-automatic evaluation and automatic evaluation. For example, Guessoum and Zantout (2001b) carried out a semi-automatic evaluation with a focus on the concept of grammatical coverage. In their later work, they wrote about the evaluation of the lexicons of MT systems (Guessoum & Zantout 2001a). Papineni et al. (2002) designed a method for the automatic evaluation of machine translation. They referred to human evaluation as extensive but expensive, and mentioned:

Human evaluations can take months to finish and involve human labor that cannot be reused ... We propose a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run. (Papineni et al. 2002)

There are also other ways of classifying different types of evaluation. For example, Guessoum and Zantout (2001b) state:

... system evaluation is divided into two types, Glass-box and Black-box. In the former, the rater or evaluator will have access to the input data as well as the output produced by the system. Black-box is different from Glass-box in that in the latter the evaluator also has access to various sub-parts of the system ...

In the present study, Glass-box evaluation was used with the third category of raters. Hutchins and Somers (1992), while discussing different types of evaluation, concentrated on quality assessment. They took into consideration items like accuracy, intelligibility, style and error analysis. Arnold (1995) and Locke & Booth (1955) wrote about MT and the evaluation of MT systems. Nagao et al. (1985), King (1990) and Reeder (2001) discussed in detail the methodologies available for the evaluation of MTs. Lehrberger and Bourbeau (1988) introduced a general methodology for evaluation by users. White and O'Connell (1994) wrote about the development of diagnostic and evaluation tools for natural language processing applications, like the DARPA project. Jihad (1996), Qendelft (1997) and Arabuter (1996) carried out various research studies on Arabic MT systems, e.g., Al-Wafi.
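For readers unfamiliar with the automatic metric of Papineni et al. (2002), the following sketch shows how a sentence-level BLEU score can be computed with NLTK. This is illustrative only: the present study used human raters, not BLEU, and the NLTK call is our assumption about a convenient modern implementation, not something the paper prescribes. The example sentence is borrowed from the corpus discussion later in this article.

```python
# Illustrative only: the present study used human raters, but the BLEU
# metric of Papineni et al. (2002) can be computed with, e.g., NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "number", "of", "coins", "varied"]]  # human translation(s)
hypothesis = ["the", "number", "of", "coin", "varied"]    # MT output

# Smoothing avoids zero scores on short sentences with missing n-grams.
smooth = SmoothingFunction().method1
print(sentence_bleu(reference, hypothesis, smoothing_function=smooth))
```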


Among the other works on evaluation, those carried out by Taylor and White (1998), Rajman and Hartley (2000), Vanni and Miller (2001), Niessen et al. (2000), Popescu-Belis, Manzi & King (2001), Hovy, King, & Popescu-Belis (2002), King and Falkedal (1990), White (1995) and Arbor (2005) are of particular importance.
3. METHODOLOGY

In this section, first the two MTs analyzed in the present research will be briefly reviewed. Then the participants of the study will be introduced. Finally, the corpus used as the input data will be described.
3.1. Machine translations under study

In order to carry out the present research the researchers used two English-to-Persian MT systems, which, as mentioned before, will be referred to simply as MT1 and MT2 here. Both MTs are unidirectional, that is, they can only translate English inputs into Persian outputs; it is not possible to render Persian texts into English. These two are not the only available MTs, but they are the ones that seem to perform better than their rivals. Below, a description of MT1 extracted from <www.parstranslator.net/eng/> is given:
The first commercial version of the software was issued to the public in June 1997 ... Its input system uses typed text or English contents selected from a file. By now, its engine is able to recognize and parse more than 1.5 million words and terminologies commonly used in public English and 33 fields of sciences. Its bank of words and terminologies is being reviewed and upgraded continuously ...

With regard to MT2, the following information, extracted from the Compendium of Translation Software, 8th edition, p. 52, will suffice:
MT2 is a unidirectional English-to-Persian MT that with the use of different types of dictionaries (technical dictionaries, seven groups, and a user dictionary, up to 128000 entries) translates English texts into Persian texts. This MT enjoys a translation speed equal to 400 pages per minute.

3.2. Participants
In order to evaluate the two MTs, a number of users and translators were used as the ultimate raters of the performance of each MT. The raters


were divided into three groups as follows: (1) ordinary users, who never saw the input text while evaluating the MT performance; (2) experienced translators, who were likewise only allowed to evaluate the system outputs without having a look at the input English text; and finally (3) experienced translators who used the input text while evaluating the output of each MT (in fact, a true example of Glass-box evaluation). There were four raters in group (1) and three raters each in groups (2) and (3). Therefore, in all, ten raters were used in this study. The raters were asked to score each sentence either 0, for incomprehensible sentences, or 1, for comprehensible sentences, and in case they thought a sentence was comprehensible to some extent they were required to assign that sentence a score of 0.5.
3.3. Corpus

The subject area of General was used by the researchers because it was present in both MTs. Ten different general one-page texts were first selected by the researchers. The researchers made such a selection because they were interested in using a general text as the corpus. Then one text was randomly selected from among them. This one-page text was used as the small corpus of the study. One point to be mentioned about the corpus is that the researchers read the translated text and divided the complex and compound sentences into simple sentences, each of which was then scored by the raters one by one. (The source text had also been divided into 20 simple sentences; a rough automatic approximation of this manual step is sketched below.)
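The division into simple sentences was done by hand in the study. Purely as an illustration of the idea, and with no claim that the study or the systems did this automatically, a crude approximation could split on sentence-final punctuation and bare coordinating conjunctions; real clause segmentation would require a parser.

```python
import re

def rough_simple_units(text):
    """Very rough approximation of the manual step described above:
    split on sentence boundaries, then on top-level coordinators.
    Illustrative only; real clause segmentation needs a parser."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    units = []
    for s in sentences:
        parts = re.split(r"\b(?:and|but|or)\b", s)
        units.extend(p.strip(" ,") for p in parts if p.strip(" ,"))
    return units

print(rough_simple_units("He went home, and he invited a friend."))
# ['He went home', 'he invited a friend.']
```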
4. DATA ANALYSIS AND DISCUSSIONS

In this section, using the two statistical procedures of frequency and percentage tables and the t-test, the rate of Comprehensibility of MT1 and MT2 will be examined. In the analysis three types of raters will be considered: (1) ordinary readers, who did not know any English and were only given the chance of observing the Persian output; (2) experienced translators, with a B.A. degree in English Translator Training, who judged the Comprehensibility of the output text without looking at the input; and finally (3) experienced translators, with a B.A. degree, who reviewed the English input while making judgements about the Persian output. A comparison will also be made between the two MTs. At the end, some points will be made on the most frequent types of errors, i.e., grammatical, lexical, etc., observed in the two outputs.


4.1. Comprehensibility of MT1 and MT2

The Persian output was first given to four native speakers of Persian who did not know English at all. They were asked to read the text, which had first been separated into twenty sentences. It was explained to the raters that they had to read the sentences one by one and assign each one of the following scores: 0 for sentences that were not comprehensible; 0.5 for sentences that were comprehensible to some extent, but had some errors; and 1 for sentences that were completely understood by the raters and were grammatical. As shown in Table 1, the raters scored MT1 and MT2 29 and 24.5 respectively. Similarly, the mean score and the percentage of comprehensible sentences in MT1 (x̄=7.25, 36.25%) were higher than in MT2 (x̄=6.12, 30.6%). This shows that even in the case of MT1 no more than 36.25% of the translated text was comprehensible. The result of the t-test was significant at the 0.5 level (t=0.8, df=6).

Referees   Machine Translation 1    Machine Translation 2
1          5                        8
2          6                        4
3          9                        5
4          9                        7.5
Total      29 (x̄=7.25), 36.25%      24.5 (x̄=6.12), 30.6%

Table 1. Ordinary referees.

Then the researchers gave the output to three translators, but asked them to assess the Comprehensibility of the sentences without looking at the input text. The results in Table 2 indicate that again the scores, MT1 (20.5) and MT2 (19.5), were really low. Here again the mean score as well as the percentage of Comprehensibility was higher in MT1 (x̄=6.83, 34.15%) than in MT2 (x̄=6.5, 32.5%), but the t-test did not reject the null hypothesis (t=0.22, at 0.05, df=4).

Referees   Machine Translation 1    Machine Translation 2
1          5.5                      4.5
2          9                        8
3          6                        7
Total      20.5 (x̄=6.83), 34.15%    19.5 (x̄=6.5), 32.5%

Table 2. Experienced translators judging the Persian output text without using the English input.


Unlike Table 2, in Table 3 the raters, experienced translators, were asked to look at the input sentences while evaluating and scoring each output sentence. As shown in Table 3, MT1 (20) received a marginally higher score than MT2 (19.5). Even in this case neither of the two MTs could reach a 40% Comprehensibility level (MT1: x̄=6.66, 33.3%; MT2: x̄=6.5, 32.5%). No significant difference was observed between MT1 and MT2 (t=0.002, at the 0.05 level, df=4).

Referees   Machine Translation 1    Machine Translation 2
1          5                        5
2          8                        8
3          7                        6.5
Total      20 (x̄=6.66), 33.3%       19.5 (x̄=6.5), 32.5%

Table 3. Experienced translators judging the Persian output text while having an eye on the English input.

In Table 4 all the output sentences were divided, by the researchers, into three groups. As indicated in Table 4, the numbers of comprehensible and grammatical sentences produced by MT1 and MT2 are 3 (15%) and 5 (25%) respectively. Adding the sentences that were comprehensible to some extent or comprehensible but with grammatical problems to the fully comprehensible ones, the result obtained is no better than 15 (75%) and 13 (65%) for MT1 and MT2. This proves that even if such MTs are to be used, widespread post-editing or even retranslating will be an unavoidable prerequisite. (There is nothing wrong with pre- and post-editing as such; the problem here is the amount of editing these MTs must undergo to produce a comprehensible output.)

Type of sentence                                    Machine Translation 1   Machine Translation 2
Comprehensible and grammatical                      3 (15%)                 5 (25%)
Incomprehensible                                    5 (25%)                 7 (35%)
Comprehensible to some extent or comprehensible
but with grammatical problems                       12 (60%)                8 (40%)
Total comprehensible sentences                      15 (75%)                13 (65%)

Table 4. The analysis of three types of sentences (comprehensible; incomprehensible; comprehensible to some extent or comprehensible but with grammatical problems) based on the principles of comprehensibility and relevance (numbers indicate the number of sentences within each type).
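The three sentence types of Table 4 follow directly from the 0/0.5/1 scores of the revised model. The sketch below shows one way the binning could be done; the function name and labels are ours, and the example profile simply mirrors the MT1 column of Table 4 (3, 12 and 5 of 20 sentences).

```python
from collections import Counter

def sentence_types(scores):
    """Bin per-sentence scores into the three types shown in Table 4
    and report the 'total comprehensible' figure (scores 0.5 and 1)."""
    bins = Counter(scores)
    return {"comprehensible and grammatical": bins[1.0],
            "comprehensible to some extent": bins[0.5],
            "incomprehensible": bins[0.0],
            "total comprehensible": bins[1.0] + bins[0.5]}

# Profile matching the MT1 column of Table 4:
print(sentence_types([1.0] * 3 + [0.5] * 12 + [0.0] * 5))
```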


4.2. A comparison of MT1 and MT2

A comparison of the two MTs indicates that neither MT1 nor MT2 performs well. In fact, the number of comprehensible and grammatical sentences is very low, and this seems to be due to deficiencies in the grammar as well as the lexicon of the MTs, particularly when they encounter words with a number of different meanings, like homonyms, polysemous words, etc. Both MTs need to be heavily post-edited by human translators. The low quality of the two MTs calls into question the justifiability of their application as an instrument for market use. One further point is that although MT1, as shown in Tables 1, 2 and 3, had a marginally better performance, this difference is not statistically significant, because the two-tailed t-tests calculated for the following cases (Table 5) did not reject the hypothesis of no difference between the two MTs:
Column   Table(s)             t-test    Level of Significance   Degrees of Freedom   Result
1        Table 2              t=0.22    0.05                    4                    Not sig.
2        Table 3              t=0.002   0.05                    4                    Not sig.
3        MT1 in Tables 1-2    t=0.28    0.05                    5                    Not sig.
4        MT1 in Tables 1-3    t=0.290   0.05                    5                    Not sig.
5        MT1 in Tables 2-3    t=0.086   0.05                    4                    Not sig.
6        MT2 in Tables 1-2    t=0.184   0.05                    5                    Not sig.
7        MT2 in Tables 1-3    t=0.121   0.05                    5                    Not sig.
8        MT2 in Tables 2-3    t=0       0.05                    4                    Not sig.

Table 5. A summary of the t-tests calculated.

As mentioned before, the only place where the t-test was marginally significant was in Table 1 (t=0.8, at the 0.5 level, df=6), where the result obtained cannot be strongly generalized to the two systems as a whole because of the small size of the sample of raters (four raters were used in this case) and of the input data. Moreover, the result was only marginally significant at the 0.5 level, which shows that the difference cannot be reliably cited. In any case, it must be kept in mind that neither of the two systems could attain even a 40% degree of Comprehensibility.
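The comparisons in Table 5 are standard independent two-sample t-tests. As an illustration of how such figures can be checked, here is a sketch using SciPy, fed with the per-rater scores from Tables 1-3; the paper does not say what software was used, so `scipy.stats.ttest_ind` is our assumption, not the study's actual procedure.

```python
from scipy import stats

# Per-rater scores for MT1 and MT2, taken from Tables 1-3 above.
tables = {
    "Table 1 (ordinary raters)":         ([5, 6, 9, 9], [8, 4, 5, 7.5]),
    "Table 2 (translators, no input)":   ([5.5, 9, 6],  [4.5, 8, 7]),
    "Table 3 (translators, with input)": ([5, 8, 7],    [5, 8, 6.5]),
}

for name, (mt1, mt2) in tables.items():
    t, p = stats.ttest_ind(mt1, mt2)  # two-tailed, df = n1 + n2 - 2
    df = len(mt1) + len(mt2) - 2
    print(f"{name}: t = {t:.3f}, p = {p:.3f}, df = {df}")
```

Note that the degrees of freedom produced this way (6 for Table 1, 4 for Tables 2 and 3) match those reported in the article.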
4.3. Error analysis of the performance of MT1 and MT2

In this section the performance of the two MT systems will be analyzed according to the errors that occurred in them. The errors considered are as follows: grammatical, lexical, punctuation, spelling, untranslated items and stylistic.


In order to pinpoint the above error types the researchers, as expert translators, reviewed the output of each MT and then extracted the errors observed. To increase the reliability of the results, the researchers consulted another experienced translator and, in case of discrepancies, discussed the matter with him to reach a single conclusion. For this reason, each error was classified in just one category. (A sketch of this tallying step follows Table 6.)

Table 6 provides the reader with the types of errors observed in MT1. The results indicate that, in all, there were 52 errors in this MT. Grammatical errors, with 31 cases (59.61%), were the most frequent type of error. Lexical errors with 10 (19.23%), punctuation errors with 5 (9.61%) and untranslated items with 4 (7.69%) cases ranked second to fourth in this regard. This MT could not correct or even understand misspellings, like "of cource", which is a misspelling of "of course". In such cases the component that embodied the misspelling would be left untackled as "cource", and the remaining part that was grammatically correct would be translated as a separate linguistic item. For example, "of" in "of cource" would simply be translated as "از". Therefore, the translation of the above phrase would be something like "cource از", which is written from right to left following Persian writing style. The other words not translated by this system are as follows: "went", "invited", "friend", and "heap-and". With regard to the last term (heap-and), the system failed to translate it, the main reason for which seems to be the dash and the lack of space before and after it. Thus, the system considered these two separate words, i.e., "heap" and "and", a single lexical unit.

Column   Error Type     Frequency   Percentage
1        Style          1           1.93
2        Grammatical    31          59.61
3        Lexical        *10         19.23
4        Punctuation    **5         9.61
5        Spelling       1           1.93
6        Untranslated   4           7.69
Total                   52          100

Table 6. The type and frequency of errors observed in MT1.
* Repetitions of the same error have not been included in the final error count.
** In all, this system made 23 punctuation errors, but after deleting repetitions 5 cases were left.
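The frequencies in Tables 6 and 7 were tallied after collapsing repetitions of the same error. A minimal sketch of such a tally follows; the category/description pairs are invented for illustration and are not the study's actual annotations.

```python
from collections import Counter

# Hypothetical annotated errors: (category, identifying description).
# Repetitions of the same error are collapsed before counting, as in
# Tables 6 and 7.
errors = [
    ("punctuation", "sentence-final dot omitted"),
    ("punctuation", "sentence-final dot omitted"),  # repetition: counted once
    ("untranslated", '"heap-and" left untranslated'),
    ("grammatical", "verb agreement with nearest noun"),
]

unique = set(errors)
by_type = Counter(cat for cat, _ in unique)
total = sum(by_type.values())
for cat, n in by_type.most_common():
    print(f"{cat:>12}: {n} ({100 * n / total:.2f}%)")
```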


In the diagram given below, the error types have been arranged in sequence based on the frequency of each error type; spelling and style errors occurred with the same frequency:

grammatical > lexical > punctuation > untranslated > spelling & style

As illustrated in Table 7, there were 45 errors in MT2 in the aggregate, which is 7 errors fewer than in MT1. As in MT1, grammatical errors, with 23 cases (51.11%), ranked first. After these, lexical errors and then punctuation and untranslated items ranked second and third with 13 (28.89%) and 4 (8.89%) cases respectively.

Column   Error Type     Frequency   Percentage
1        Style          0           0
2        Grammatical    23          51.11
3        Lexical        13          28.89
4        Punctuation    *4          8.89
5        Spelling       1           2.22
6        Untranslated   4           8.89
Total                   45          100

Table 7. The type and frequency of errors observed in MT2.
* In all, this system made 5 punctuation errors, but after deleting repetitions 4 cases were left.

This result is similar to what was obtained in Table 6 with regard to MT1, of course with varying frequencies, except that in Table 7 untranslated items and punctuation ranked jointly third, whereas in Table 6 they ranked third and fourth, with frequencies of 5 and 4 respectively. Although MT1 made more grammatical errors (31 vs. 23), MT2 produced more incomprehensible sentences (7 vs. 5, taken from Table 4) and fewer comprehensible-to-some-extent sentences (8 vs. 12, taken from Table 4). The number of lexical errors made by MT2 was also higher. As words embody concepts and meanings, this, as is evident in the data, resulted in the production of more incomprehensible sentences. (Using a larger corpus would provide more and better information in this regard.)

One further point concerning the spelling errors made by MT2 is that this system, like MT1, failed to correct a misspelling in the source text, i.e., the word "of cource". In fact, it was translated as "ecruoc از". But the

difference is that in MT1 the untranslated word preserved its English format (i.e., the arrangement of letters from left to right: c-o-u-r-c-e), whereas in MT2 the system, though it left the term untranslated, made some modifications to the order of the letters of this word to make it more Persian-like: the system converted the input "cource" into the output "ecruoc". Thus, it was treated as a Persian word. The diagram below summarizes the error types in sequence based on the frequency of each item:

grammatical > lexical > punctuation & untranslated > spelling > style

The results obtained from the error analysis of MT1 and MT2 indicate that the performance of both MT systems is very low, particularly with regard to grammar and lexicon. This might be due to shortcomings in the syntactic theory employed by the designers of the two MTs, or simply the result of incorrect application of the grammatical theory. The lexicon used in the two systems does not seem to be comprehensive either. In general, the shortcomings of the two systems can be summarized as follows (a toy illustration of weakness 8 is sketched after the list):
1. They cannot handle homonyms, polysemous words, etc. properly.
2. Misspelling is quite often left untackled.
3. They perform weakly when lexical or syntactic ambiguity is present.
4. Complex and compound sentences are not dealt with properly.
5. Tense is often misinterpreted.
6. Particles are not translated properly given the verb they accompany.
7. The direct object marker is often lost or may appear in the wrong place in the output. The systems also have problems with the identification and application of such markers in the input text.
8. Plural vs. singular terms. Both systems interpreted plurality and singularity based on the element that comes closest to the verb. For example, in "The number of coins varied" the verb was translated as plural because it was preceded by "coins", which is a plural noun. This is a serious mistake because the verb should agree in number with the word "number", which is a singular noun. Good application of syntactic tree diagrams may help designers remove this problem, at least to a degree.
9. Punctuation is misinterpreted. For instance, "heap-and" was taken as a single word, which was ultimately left untranslated because the lexicon did not contain such a non-English term. The semicolon is another example: it appeared in the output without being changed into its Persian form (where, unlike English, the comma is placed over the dot to form the semicolon). Finally, MT1 failed to close the sentence with a dot in 14 cases.
10. They make no reference to the context before interpreting a word or words in the structural representation of a sentence; "ruler", for instance, was translated as an object for writing.
11. Misinterpretation of 'adjective+noun' sequences.
12. Misinterpretation of relative clauses and wh-words. Sometimes a wh-word like "when" was mistakenly considered the starting point of an interrogative sentence, which, based on the input, was nothing more than a misinterpretation on the part of the system.
13. Failure to deal with redundancy, particularly in MT1 and with regard to subject pronouns.
14. Inability to interpret acceptable deletions. For example, the word "money" in "so he did not have much to pay" was not understood by MT1 and MT2. In fact, the word "money" could be inferred from the presence of words like "income", "pay", etc. in the surrounding context.
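The toy sketch below contrasts, for weakness 8, the faulty "nearest noun" agreement strategy the two MTs appear to use with agreement against the head of the subject phrase. The data structure and functions are illustrative inventions, not the systems' actual implementation.

```python
# Toy contrast between the faulty 'nearest noun' strategy and agreement
# with the head of the subject phrase. Hand-built structure for
# "The number of coins"; illustrative only.

subject = {"head": ("number", "sg"),        # "The number ..."
           "modifiers": [("coins", "pl")]}  # "... of coins"

def agree_nearest(subject):
    # Faulty: agree with whatever noun stands closest to the verb.
    nouns = [subject["head"]] + subject["modifiers"]
    return nouns[-1][1]                     # -> 'pl' (wrong)

def agree_head(subject):
    # Correct: agree with the head noun of the subject phrase.
    return subject["head"][1]               # -> 'sg'

print(agree_nearest(subject), agree_head(subject))  # pl sg
```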
5. SUMMARY

Two English-to-Persian MT systems were diagnostically analyzed to answer the two research questions of this study, which concerned the degree of Comprehensibility of each MT as well as the error types and frequencies observed in the output of each system. A modified, researcher-made version of Miller et al.'s theoretical framework for Comprehensibility was employed. Ten raters in three groups rated the performance of each MT on a sentence-by-sentence basis. The scores were analyzed using frequency tables, percentage figures and t-tests. The research indicated that the two MTs did not have a satisfactory performance and that their output abounded with problems, particularly in grammar and lexicon. Fourteen major weaknesses of the two systems were also listed, with the objective of helping designers and producers of such systems improve their performance.


REFERENCES

Arabuter. 1996. Al-mutarjim Al-aaliy Al-waafy [The Machine Translator: Al-Wafi]. Arabuter, 8171.

Arbor, A. 2005. Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization. 29th June. Available online: <http://www.isi.edu/~cyl/MTSE2005>.

Arnold, D. J. 1995. Evaluating MT Systems. Dec. Available online: <http://clwww.essex.ac.uk/~doug/book/node75.html>.

Bar-Hillel, Y. 1964. The State of Machine Translation in 1951. American Documentation, 2, 229-237. Reprinted in Bar-Hillel, 153-165.

Guessoum, A., & Zantout, R. 2001a. A Methodology for a Semi-Automatic Evaluation of the Lexicons of Machine Translation Systems. To appear in Machine Translation. Kluwer Academic.

Guessoum, A., & Zantout, R. 2001b. Semi-Automatic Evaluation of the Grammatical Coverage of Machine Translation Systems. MT Summit VIII, Spain, September 18-22.

Hovy, E. H., King, M., & Popescu-Belis, A. 2002. Computer-Aided Specification of Quality Models for MT Evaluation. In LREC 2002: The Third International Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, Spain, 1239-1246.

Hutchins, W. J. 2005a. Evaluation of Machine Translation and Translation Tools. Available online: <http://cslu.cse.ogi.edu/HLTsurvey/ch13node5.html>.

Hutchins, W. J. 2005b. Machine Translation: A Brief History. Available online: <http://ourworld.compuserve.com/homepages/WJHutchins/Conchist.htm>.

Hutchins, W. J., & Somers, H. L. 1992. An Introduction to Machine Translation. New York: Academic Press.

Jihad, A. 1996. Hal bada'a asru altarjamati al-aaliyyati arabiyyan? [Has the Arabic Machine Translation Era Started?] Byte Middle East (in Arabic), November.

King, M. 1990. Workshop on Evaluation. Background paper, Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Linguistic Research Center, University of Texas, Austin, TX, 255-259.

King, M., & Falkedal, K. 1990. Using Test Suites in Evaluation of Machine Translation Systems. In COLING-90: Papers Presented to the 13th International Conference on Computational Linguistics, Helsinki, Vol. 2, 211-216.

Lehrberger, J., & Bourbeau, L. 1988. Machine Translation: Linguistic Characteristics of MT Systems and General Methodology of Evaluation. Amsterdam: John Benjamins.

Locke, W. N., & Booth, A. D. 1955. Machine Translation of Languages. Cambridge, Mass.: MIT Press.

Miller, K. J., et al. 2001. Evaluating Machine Translation Output for an Unknown Source Language: Report of an ISLE-Based Investigation. Fourth ISLE Workshop, MT Summit VIII, Spain, September 18-22.


Nagao, M., et al. 1985. Machine Translation System from Japanese into English: Mu-JE. Computational Linguistics, 11(2-3), 91-110.

Niessen, S., et al. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. In LREC 2000: Second International Conference on Language Resources and Evaluation, Athens, Greece, 39-45.

Papineni, K., et al. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 311-318.

Popescu-Belis, A., Manzi, S., & King, M. 2001. Towards a Two-Stage Taxonomy for Machine Translation Evaluation. In MT Summit VIII Workshop on MT Evaluation "Who Did What to Whom?", Santiago de Compostela, Spain, 1-8.

Qendelft, G. 1997. Barnaamaj alwaafy liltarjamati mufiidun lifahmi alma'naa al'aammi min risaalatin inkliiziyyatin [The Translation Program Al-Wafi Is Useful for Getting a General Understanding of a Letter Written in English]. Al-Hayat Newspaper, 12657, October 25. (In Arabic.)

Rajman, M., & Hartley, A. 2000. Automatic Ranking of MT Systems. In LREC 2000: The Third International Conference on Language Resources and Evaluation, Las Palmas de Gran Canaria, Spain, 1247-1253.

Reeder, F. 2001. An Introduction to MT Evaluation. Available online: <http://www.issco.unige.ch/projects/isle/mte-introduction-fr/index.htm>.

Taylor, K. B., & White, J. S. 1998. Predicting What MT Is Good For: User Judgements and Task Performance. In D. Farwell, L. Gerber and E. Hovy (Eds.), Machine Translation and the Information Soup (pp. 364-373). Berlin: Springer-Verlag.

Vanni, M., & Miller, K. J. 2001. Scoring Methods for Multi-Dimensional Measurement of Machine Translation Quality. In MT Summit VIII Workshop on MT Evaluation "Who Did What to Whom?", Santiago de Compostela, Spain, 21-28.

White, J. 1995. Approaches to Black Box Evaluation. In Proceedings of the MT Summit, Luxembourg.

White, J., & O'Connell, T. 1994. The DARPA MT Evaluation Methodologies: Evolution, Lessons, and Future Approaches. In Proceedings of the 1994 Conference, Association for Machine Translation in the Americas.
