Online publication date: 31 January 2007
To cite this article: Armstrong, Stephen, Andy Way, Colm Caffrey, Marian Flanagan, Dorothy Kenny & Minako O'Hagan (2007) 'LEADING BY EXAMPLE: AUTOMATIC TRANSLATION OF SUBTITLES VIA EBMT', Perspectives, 14:3, 163-184.
DOI: 10.1080/09076760708669036
URL: http://dx.doi.org/10.1080/09076760708669036

LEADING BY EXAMPLE:
AUTOMATIC TRANSLATION OF SUBTITLES VIA EBMT1

Stephen Armstrong & Andy Way, School of Computing,


Colm Caffrey, Marian Flanagan, Dorothy Kenny & Minako O’Hagan,
School of Applied Language and Intercultural Studies,
Dublin City University, Ireland
minako.ohagan@dcu.ie

Abstract
This paper describes a project to investigate the scope of the application of Example-Based Ma-
chine Translation (EBMT) to the translation of DVD subtitles and bonus material for English-
German and English-Japanese. The project focused on the development of the EBMT system and
its evaluation. This was undertaken as an interdisciplinary study, combining expertise in the
areas of multimedia translation, corpus linguistics and natural language processing. The main
areas on which this paper focuses are subtitle corpus creation, development of the EBMT system
for English-German, development of evaluation methods for MT output, and an assessment of the
productiveness of different data types for EBMT.

Key words: Machine translation; EBMT systems; corpus creation; subtitling; English-
German; English-Japanese.

1. Background
Demand for subtitle translation is on the increase due to the proliferation of DVD releases of audiovisual content, in particular feature films. Despite this upward demand, however, the working conditions of human subtitlers are declining, with decreasing rates of pay and mounting pressure to translate within ever shorter timeframes. Worse still, when producing DVD subtitles in multilingual versions, translators are sometimes forced to work solely on the basis of a master file containing the source subtitle text, without access to the audiovisual content. Furthermore, DVD has opened the floodgates to piracy, with films distributed illegally, undercutting official prices, often with extremely poor quality subtitles produced by unqualified amateurs. Piracy is another reason why official versions need to be distributed without delay. These issues have been raised repeatedly at recent audiovisual translation conferences2 and yet the market reality suggests that prices have to be contained due to fierce competition (Carroll 2004). The subtitling process is increasingly facilitated by computer-based subtitling systems, but these are mainly used for mechanical aspects such as time-coding and word-processing, while the translation process itself remains unaided. The lack of attempts to introduce computer-aided translation (CAT) into audiovisual translation, particularly for fictional films, may stem from the notion that a source text consisting mainly of dialogue is unlikely to lend itself well to machine translation (MT). The problems include incomplete sentences with ellipsis, the need for condensation to fit the translation into the allocated space, and the requirement to synchronize the text with the images. All of these elements may have been considered insurmountable challenges for MT.
However, our investigation of Example-based MT (EBMT) seeded with subtitle data has an immediate link to the research direction represented in Taylor (2006a, 2006b), which detects predictable patterns in the dialogues of fictional audiovisual content. Implicit in our interest in the EBMT paradigm (explained in section 3.1 below) is therefore the question of to what extent repetition or similarity exists across film dialogues, both at the sentential and especially the sub-sentential level. Answering it will benefit audiovisual translation research and likewise EBMT, where there is no prior research focusing on subtitles for fictional films.
There were a number of early attempts to develop MT systems in the area of news subtitles, with notable examples by public broadcasting bodies such as NHK (Japan Broadcasting Corporation), which tested MT for displaying Japanese subtitles for English-language satellite news in the 1980s, with the disclaimer credit “MT-produced translations” running at the bottom of the TV screen. Following these early attempts, which mainly used a transfer-based MT system, NHK also tested the then developing EBMT paradigm (e.g. Nagao 1984), as reported in its 1996 annual report (NHK Annual Report 1996). Today the main foreign satellite news reports are translated live by human media translators in Japan, suggesting that the research has not produced workable systems. In the US, too, commercial MT systems were built to automatically translate and produce Spanish captions from English news (Toole et al. 1998). There have also been recent high-profile projects undertaken in Europe to automate subtitle translation. One is the MUSA (Multilingual Subtitling of Multimedia Content)3 project, funded by the European Union to produce a set of technologies for automatically generating subtitles for English TV documentaries in English (intralingual subtitles), French and Greek. In addition to an MT component, the MUSA project included the development of a speech recognition engine to turn the audio input into text, as well as a condensation technology to shrink the MT output into a shorter sentence immediately usable as a subtitle. Another is the eTITLE project4, which aimed to enable faster multilingual cross-platform localisation for media content owners via linguistic technologies such as automated speech-to-text, MT, sentence compression, subtitling automation and metadata automation. These projects differ from the present study in scope and coverage and, most of all, in our fundamental interest in investigating the suitability of the EBMT paradigm for the text type of subtitles for fictional films.
Our project is driven by the deteriorating working conditions of subtitlers and the fact that they currently translate mostly without the benefit of CAT tools. The ultimate goal of the current study is therefore to build a CAT tool for human subtitlers, integrating an MT unit into an existing computer-based subtitling system. Such a tool would be designed to increase the throughput of human subtitlers, enabling them to produce subtitles faster and even to improve their quality. A preliminary study (O’Hagan 2003) had pointed to the scope for applying a CAT paradigm to audiovisual translation on the basis of the shortness and the relative lack of complex sentence structures characteristic of subtitles. The present project set out to test the feasibility of seeding an EBMT system with human-produced subtitles and applying it to subtitle translation. Our choice of EBMT, as opposed to the more freely available rule-based MT (RBMT), is motivated by the increasing technical feasibility of harvesting human-produced subtitles from DVDs in significant quantities, copyright issues notwithstanding, and by the popular Translation Memory (TM) paradigm, in which translators are able to build up their own resources to increase productivity. Also, given the relatively short timeframe for the project, we set a realistic goal: to test the feasibility of building and testing an EBMT system designed to produce German and Japanese subtitles from already available human-produced English intralingual subtitles. Our system is based on a number of assumptions: (i) we specifically aim at producing translations of subtitles for DVD productions where the intralingual subtitles in the source language are already available; (ii) we aim at developing the system mainly for English and German, followed by English and Japanese; and (iii) we do not deal with the copyright issue in this particular feasibility study, assuming that it will fall to the party who ultimately wishes to commercialise our concept.
In this paper, we focus on our main EBMT system developed for English and German; the English-to-Japanese system is not discussed, due to space constraints. Section 2 provides a brief description of our research design to explain our methodological approach. We give an overview of EBMT in section 3, together with a description of the marker-based approach that we use in our system. In section 4 we describe the corpora we created, as well as the other corpora used to test the effect of training the EBMT system on heterogeneous or homogeneous data. Section 5 contains thorough evaluations of our results using automatic metrics commonplace in MT today, as well as two manual evaluations: one using the standard human scales of accuracy and intelligibility, and the other a summative, holistic evaluation testing the suitability of the automatically produced subtitles for viewing incorporated in the film clips. Finally, we conclude in section 6, and avenues for further work are outlined in section 7.

2. Methodology
Our objective was to build an EBMT system for the purposes of the feasibility study: to test whether this data-driven MT paradigm works for translating subtitles for fictional films and, if so, to determine which data type is more productive for seeding the system in order to produce high-quality translations. Given the short timeframe, we had to defer the system integration component, incorporating the MT unit into an existing subtitling system, to a post-project activity. The project was able to take advantage of the prototype EBMT system being developed by the MT group at the National Centre for Language Technology (NCLT) at Dublin City University (Armstrong et al. 2006b; Stroppa et al. 2006).5
The first task was to design and build parallel corpora made up of human-produced subtitles (in German and Japanese) for English-language material (cf. section 4 below). For the purpose of data type comparisons, we also created heterogeneous data from the publicly available Europarl corpus (Koehn 2005), in addition to the homogeneous data, which consisted solely of subtitles. We first built an English-German EBMT system and then repeated the steps for English-Japanese. This was followed by a series of evaluation sessions. One of the objectives of this study was to explore a holistic evaluation methodology, and we therefore combined the BLEU automatic metric (Papineni et al. 2002) with human-based methods. The BLEU scores provided a quantitative analysis to assess our EBMT performance according to different data sizes and data types. The human evaluations provided a qualitative assessment pointing to shortcomings of the system, as well as some positive reinforcement supporting our approach. The research design was motivated by a number of factors, including the interdisciplinary character of the project, which draws on talents from the humanities and the sciences. We reflect on the advantages and disadvantages of our approaches in our conclusions.

3. Example-Based Machine Translation


3.1. Introduction
While various types of MT systems exist, almost all MT research being carried out today is corpus-based, the two main data-driven approaches being Statistical Machine Translation (SMT) and EBMT. Despite this, the main commercial MT systems available on the market today are primarily rule-based (RBMT).
The idea behind EBMT is translation by analogy (Nagao 1984), meaning that human translations are recycled to automatically generate new output on the basis of similarities between the source text elements stored in the system’s databases and those of the input. Data-driven approaches such as EBMT rely on the availability of a sententially-aligned bilingual corpus, on which the system must first be trained so that source-target sub-sentential alignments can be extracted and stored for later use. During the translation process, the input sentence is segmented into chunks. These source-language chunks are then matched against the example database to locate corresponding target-language examples, which are recombined to produce the final output.
The first stage of this process may sound familiar to those who have used Translation Memory (TM) tools. However, the essential difference between the two approaches lies in the fact that, other than for 100% matches, a TM does not translate; rather, a human is required during the translation process to manipulate the target-language sentences corresponding to close-matching source strings in the TM into the appropriate final translation. By contrast, EBMT systems translate completely automatically, and require no human intervention in the translation process.
Somers (1999: 137) raises the question of whether certain language pairs are more suited than others to EBMT. Our research design was to concentrate on the English-to-German system first and then to follow it with the English-to-Japanese system. The reasons for choosing these languages had to do with commercial considerations, as German and Japanese both represent significant markets for DVD sales. In addition, given that these language pairs exhibit quite different translational phenomena, they were ideal for testing the coverage and robustness of the system.

3.2. Marker-Based EBMT


Many types of EBMT systems exist, including those using source-target tree pairs (Hearne & Way 2006), dependency structures (Watanabe et al. 2003) or strings (Somers et al. 1994), and those which generalise examples on the basis of content words (Brown 1999).
Another approach clusters instead on closed-class or ‘marker’ words, and has its roots in the ‘Marker Hypothesis’ (Green 1979). This is a psycholinguistic constraint stating that languages are ‘marked’ for syntactic structure at surface level by a closed set of specific lexemes and morphemes. As an example, consider the string in (1) from the Wall Street Journal section of the Penn-II Treebank:

(1) The Dearborn, Mich., energy company stopped paying a dividend in the third
quarter of 1984 because of troubles at its Midland nuclear plant.

Here we see that three noun phrases start with determiners and one with a
possessive pronoun. The sets of determiners and possessive pronouns are both
very small. Furthermore, there are four prepositional phrases, and the set of
prepositions is similarly small. The Marker Hypothesis is posited as universal, presuming that concepts and structures like these have similar morphological or structural marking in all languages.
When the EBMT system (cf. Armstrong et al. 2006b and Stroppa et al. 2006) was developed, eight marker sets were defined, namely determiners <DET>, prepositions <PREP>, quantifiers <QUANT>, conjunctions <CONJ>, wh-adverbs <WH>, possessive pronouns <POSS_PRON>, personal pronouns <PERS_PRON> and punctuation <PUNC> (as an end-of-chunk marker). These marker categories are used to segment aligned source and target sentences during a pre-processing stage, indicating where one chunk ends and the next one begins.

The steps of marker-based chunking can be illustrated with the English-German example in (2) (from ‘As Good as It Gets’, 1997):

(2) Do you like being interrupted when you’re playing in your garden?

Werden Sie gern gestört, wenn Sie in Ihrem Garten herumhüpfen?

In (3) we see how the source/target aligned sentences are traversed word by
word and automatically tagged with their marker categories:

(3) Do <PERS_PRON> you like being interrupted <CONJ> when <PERS_PRON> you ’re playing <PREP> in <POSS_PRON> your garden <PUNC> ?

Werden <PERS_PRON> Sie gern gestört, <CONJ> wenn <PERS_PRON> Sie <PREP> in <POSS_PRON> Ihrem Garten herumhüpfen <PUNC> ?

The marking of syntactic structures is necessary for the extraction of translation resources. Once the marking stage is over, aligned source-target chunks are created by segmenting the sentences based on these tags, as well as by the use of word translation probabilities and cognate information. A further constraint applies when creating chunks: each chunk must contain at least one non-marker word. This ensures that each chunk contains useful contextual information. If multiple marker words appear alongside each other, we keep the first as chunk head and discard the rest.
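To make this concrete, below is a minimal sketch of the chunking rules just described: segment at marker words, require at least one non-marker word per chunk, and let only the first of several adjacent marker words open a new chunk. The sketch is illustrative only; the toy marker lexicon is our own assumption, and the real system additionally uses word translation probabilities, cognate information and <PUNC> as an end-of-chunk marker.

```python
# Minimal sketch of marker-based chunking; the marker lexicon is a
# toy assumption standing in for the system's eight full marker sets.

MARKERS = {
    "the": "<DET>", "a": "<DET>", "an": "<DET>",
    "in": "<PREP>", "at": "<PREP>",
    "when": "<CONJ>", "but": "<CONJ>",
    "you": "<PERS_PRON>", "we": "<PERS_PRON>",
    "your": "<POSS_PRON>", "my": "<POSS_PRON>",
    "two": "<QUANT>", "one": "<QUANT>",
}

def chunk(sentence: str) -> list[str]:
    """Segment a tokenised sentence at marker words. A marker opens a
    new chunk only once the current chunk holds at least one non-marker
    word, so adjacent markers stay together and every chunk carries
    some contextual content."""
    chunks, current, has_content = [], [], False
    for word in sentence.lower().split():
        if word in MARKERS and has_content:
            chunks.append(" ".join(current))
            current, has_content = [word], False
        else:
            current.append(word)
            has_content = has_content or word not in MARKERS
    if current:
        chunks.append(" ".join(current))
    return chunks

print(chunk("do you like being interrupted when you 're playing in your garden ?"))
# ['do', 'you like being interrupted', "when you 're playing", 'in your garden ?']
```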
Figure 1: The System Architecture - MaTrEx
Where chunks contain just one non-marker word in both source and target,
we assume they are translations. From this assumption it is possible to extract
word-level translations, as in (4):

(4) <CONJ> when ↔ <CONJ> wenn
<PREP> in ↔ <PREP> in
<PERS_PRON> you ↔ <PERS_PRON> Sie
<POSS_PRON> your ↔ <POSS_PRON> Ihrem
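Our reading of this extraction rule can be sketched as follows: if an aligned chunk pair has exactly one non-marker word on each side, the chunk-initial marker words are assumed to translate one another, and likewise the content words. The toy marker lexicons are again our own assumption.

```python
# Sketch of word-level extraction from aligned chunk pairs that have
# exactly one non-marker word on each side (toy marker sets).

SRC_MARKERS = {"when", "in", "at", "you", "your", "the"}
TGT_MARKERS = {"wenn", "in", "an", "sie", "ihrem", "die"}

def extract_word_pairs(chunk_pairs):
    pairs = []
    for src, tgt in chunk_pairs:
        s, t = src.split(), tgt.split()
        s_content = [w for w in s if w not in SRC_MARKERS]
        t_content = [w for w in t if w not in TGT_MARKERS]
        if len(s_content) == 1 and len(t_content) == 1:
            if s[0] in SRC_MARKERS and t[0] in TGT_MARKERS:
                pairs.append((s[0], t[0]))              # marker pair, e.g. when <-> wenn
            pairs.append((s_content[0], t_content[0]))  # content-word pair
    return pairs

print(extract_word_pairs([("when playing", "wenn spielen")]))
# [('when', 'wenn'), ('playing', 'spielen')]
```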

3.3. The MaTrEx EBMT system


Now that we have described the marker-based chunking method, we will ex-
plain how our EBMT system uses this methodology and give examples to illus-
trate the various stages. The EBMT system used in this research is the MaTrEx
(Machine Translation using Examples) system (Armstrong et al. 2006b; Stroppa
et al. 2006).6 This is a corpus-based MT engine, and is designed in a modular
fashion. Figure 1 illustrates the system architecture and the interaction of each
module. There are four main modules in the system: the Word Alignment Module, the Chunking Module, the Chunk Alignment Module and the Decoding Module. These modules work together to produce the most likely translation of the input sentence. In brief, the word alignment module takes an aligned corpus as input and produces a set of word alignments (Och & Ney 2003); the chunking module also takes an aligned corpus as input, and produces a corpus of source and target chunks; the chunk alignment module takes in source and target chunks, aligning them sentence by sentence; and finally the decoder (Koehn 2004) searches for a translation using the original aligned corpus, together with the derived word and chunk alignments. A more detailed description of each module can be found in Armstrong et al. (2006b) and Stroppa et al. (2006).
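The data flow between these modules can be pictured with the following skeleton, in which each stage is stubbed out; the function names and signatures are illustrative only, not the actual MaTrEx code.

```python
# Illustrative skeleton of the four-module MaTrEx pipeline; each stub
# stands in for the real component named in the text.

def word_alignment_module(corpus):
    """Aligned sentence pairs -> word alignments (Och & Ney 2003)."""
    return {}

def chunking_module(corpus):
    """Aligned sentence pairs -> source and target chunks (marker-based)."""
    return [], []

def chunk_alignment_module(src_chunks, tgt_chunks):
    """Align source and target chunks sentence by sentence."""
    return {}

def decoding_module(sentence, corpus, word_aligns, chunk_aligns):
    """Search for the most likely translation (Koehn 2004)."""
    return ""

def translate(sentence, corpus):
    word_aligns = word_alignment_module(corpus)
    src_chunks, tgt_chunks = chunking_module(corpus)
    chunk_aligns = chunk_alignment_module(src_chunks, tgt_chunks)
    return decoding_module(sentence, corpus, word_aligns, chunk_aligns)
```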

3.4. An EBMT example


The following is an example task for the EBMT system: to translate the Eng-
lish input sentence in (5) into German, given the aligned data in (6) as the sys-
tem’s training corpus. The English-German examples are taken from a mix of
films including Breakfast at Tiffany’s, Casablanca, Being John Malkovich and Dr
Strangelove.

(5) Darling, we just met two weeks ago at the bar

(6) Darling <PUNC>, <PERS_PRON> I am sorry <CONJ> but <PERS_PRON> I lost <POSS_PRON> my key ↔ <POSS_PRON> Mein Guter <PUNC>, <PERS_PRON> es tut <PERS_PRON> mir Leid <PERS_PRON> Ich habe <POSS_PRON> meinen Schlüssel verloren

<PERS_PRON> I’m <DET> an artist ↔ <PERS_PRON> Ich bin <DET> ein Künstler

<DET> That was <QUANT> two weeks ago ↔ <DET> Das war <PREP> vor zwei Wochen

<PERS_PRON> We just met <QUANT> one day ↔ <PERS_PRON> Wir trafen uns einfach <DET> eines Tages

<PERS_PRON> I’ll call <DET> the police ↔ <PERS_PRON> Ich rufe <DET> die Polizei

<PERS_PRON> I’ll be <PREP> at <DET> the bar ↔ <PERS_PRON> Ich gehe <PREP> an <DET> die Bar

The data in the aligned corpus (6) is chunked (as described in section 3.2), and useful chunks and their target-language counterparts are extracted and stored for later use, including those in (7):

(7) Darling ↔ Mein Guter
That was ↔ Das war
Two weeks ago ↔ vor zwei Wochen
The police ↔ die Polizei
I’m ↔ Ich bin
An artist ↔ ein Künstler
We just met ↔ wir trafen uns einfach
One day ↔ eines Tages
At the bar ↔ an die Bar
I lost my key ↔ Ich habe meinen Schlüssel verloren

In order to identify how useful a chunk will be in the translation process, a range
of similarity metrics are used, including word alignment probabilities, cognates
and marker chunk labels (cf. Stroppa et al. 2006; Armstrong 2007). These metrics
are implemented in the chunk alignment module as previously mentioned.
The first step in the translation process is to search the German side of the
original corpus in (6) to check if it contains the whole input sentence in (5). It
does not, so the system chunks the input sentence into smaller constituents (8):

(8) Darling <PERS_PRON> we just met <QUANT> two weeks ago <PREP> at the
bar

These new input sentence chunks are then searched for in the corpus of aligned
chunks (7). Once suitable chunks in the database are found, they are recom-
bined by the decoder to produce the final translation in (9):

(9) Mein Guter, wir trafen uns einfach vor zwei Wochen an die Bar
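The lookup-and-recombine step just illustrated can be sketched as follows. The chunk database mirrors (7); the naive left-to-right concatenation stands in for the real decoder, which scores and reorders competing hypotheses.

```python
# Sketch of translating the chunked input (8) with the aligned
# chunks from (7); recombination here is naive concatenation.

CHUNK_DB = {
    "darling": "Mein Guter,",          # punctuation simplified for display
    "we just met": "wir trafen uns einfach",
    "two weeks ago": "vor zwei Wochen",
    "at the bar": "an die Bar",
}

def translate(input_chunks):
    # Every chunk is assumed to be in the database; the fallback to
    # word-level translations is omitted for brevity.
    return " ".join(CHUNK_DB[c] for c in input_chunks)

print(translate(["darling", "we just met", "two weeks ago", "at the bar"]))
# Mein Guter, wir trafen uns einfach vor zwei Wochen an die Bar
```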

4. Corpus Description
4.1. Introduction
A corpus is a large collection of authentic texts, gathered according to specific
criteria and most commonly stored in electronic format. These texts can then be
used to study authentic examples of language use (Bowker and Pearson 2002:9).
In the field of computational linguistics, natural language processing tools such as EBMT also make use of corpus-based resources (ibid.).
When creating the corpora for our research purposes, we wanted to have a selection of subtitle (homogeneous) and non-subtitle (heterogeneous) aligned data for each language pair. One of the main advantages of our EBMT system in comparison with an RBMT system is that the former can be fed a custom-made selection of aligned sentences. Given that no prior study existed testing EBMT seeded with a subtitle-specific corpus, at the beginning of our research we did not know whether a subtitle-specific homogeneous corpus would give us better results than a general language corpus made up of non-subtitle sentences. To this end it was necessary to build our own homogeneous corpora for both language pairs (English-German and English-Japanese). For our heterogeneous data, we were able to avail of an English-German corpus containing the European Parliament proceedings, Europarl (Koehn 2005), which is freely available for research purposes, together with a publicly accessible English-Japanese heterogeneous corpus made up of various books and articles, created by Utiyama & Isahara (2003).

4.2. Creating the Corpora


Japan is traditionally a subtitling country, with foreign films for theatrical release typically screened with subtitles (apart from Disney films intended for children, which may be both subtitled and dubbed), and most DVDs contain subtitles. Even though Germany favours dubbing for theatrical releases, DVD films sold in Germany contain German subtitles. All the corpora created for
our research are bilingual sententially-aligned parallel corpora, a prerequisite
for EBMT systems. Firstly we decided to create a corpus containing subtitles
from DVD films. These subtitles are from the main feature film. In addition, we
created an English-German bonus material subtitle corpus, as well as one for
English-Japanese. The majority of DVDs now contain extra bonus material such
as a ‘behind-the-scenes’ documentary on how the film was made, an interview
with the director and actors from the film, extra scenes which may be deleted
from the final version, etc. Our second corpus consisted solely of bonus mate-
rial. This corpus was substantially smaller in size than the main subtitle corpus,
due to the fact that very often subtitles are not provided for all bonus materials,
one of the factors influencing our experimental design.
We concluded that the best way to create a homogeneous corpus was to build up a collection of DVDs of English-language films which contained German or Japanese subtitles alongside English intralingual subtitles. In an attempt to assure the quality of the subtitles we trained the system on, we only took subtitles from major motion pictures, which tend to have high-quality subtitles produced by humans. The corpus compilation work was also undertaken by a team of researchers competent in the given language combination, so that any errors would be spotted.
We extracted both the interlingual and intralingual subtitles and saved them as .srt format text files using the freely available software SubRip, providing us with the subtitle text in English, German and/or Japanese, along with their respective TC-in/TC-out (the time codes at which each subtitle begins and ends). The software uses optical character recognition (OCR) to convert the subtitles into text format, as they are stored as images. During the corpus creation stage, we noted that it would be extremely helpful if the subtitles were stored in a text format on DVD. The current standard of storing subtitles as graphic files necessitates the use of OCR, or in some cases basic transcription, which is even more time-consuming, inevitably leading to a loss of time in preparing the source data, particularly in the case of Japanese. As the OCR component of SubRip is not optimised to recognise Japanese characters, processing the Japanese subtitles took on average at least five times as long as for German or English subtitles, which had a negative impact on the overall English-Japanese corpus size. This problem was compounded by the limited availability of DVDs with Japanese subtitles in Ireland, and the difficulty of sourcing them outside Japan due to region code regulations7, further adding to the difficulty of bulking up the subtitle data.
After the first step of extracting the subtitles (one file for each language) from the DVDs, we needed to clean up the files by removing the time codes. This was done by running a Perl script on the files, leaving just the subtitles in text format. When training the EBMT system with the corpus, the system works more efficiently when the text is all lowercased for English and German (Japanese characters are treated slightly differently, as there is no distinction between lower and upper case), meaning it will, for example, recognise that the token ‘The’ is the same word as the token ‘the’. This was done by running a separate Perl script converting all the text to lower case. The two files were then sententially aligned. This is quite a time-consuming stage, but by automatically numbering the lines (which works for English, German and Japanese), or by using an alignment tool such as Trados WinAlign,8 the time spent on this process can be reduced. The corpora were then ready for training and testing the EBMT system.

4.3. Corpus Statistics


The German-English DVD subtitle corpus contains 40K sentence pairs and 187,337 words, the DVD bonus material corpus contains 10K sentence pairs and 40,443 words, while the heterogeneous corpus (Europarl) contains in excess of 1 million sentence pairs. We currently have 36 film titles aligned for English-German and 12 titles aligned for English-Japanese. The English-Japanese (homogeneous) DVD subtitle corpus consists of 12,700 sentences and roughly 124,012 Japanese characters, while the heterogeneous corpus contains 82,805 sentences, equalling roughly 2,624,850 Japanese characters. When we compare the average Japanese sentence length of the two corpora (9.76 characters per sentence for the homogeneous and 31.7 for the heterogeneous corpus), there is a noticeable difference, with the subtitles being on average roughly one third the length of a sentence from an article or book, in line with earlier findings9 (O’Hagan 2003).
Using the corpus analysis tool WordSmith,10 we were able to extract some interesting statistics from our own corpus. We calculated the average sentence length for both English and German to be a little under 9 words. Contrast this with the average sentence length in, for example, the Europarl corpus, which we calculated to be 24 words per sentence, clearly bearing out the presuppositions about the space constraints imposed on subtitlers.
The corpora are still growing, with at least 12 more film titles ready to be added to the English-German corpus and a smaller number to be added to the English-Japanese corpus. These corpora will be used for further research, as discussed in the final section of this paper. It is imperative in our research to avoid creating corpora contaminated with erroneous translations, which would be unlikely to yield subtitles acceptable to human users. Given the end-use of the translation as subtitles, human verification was deemed essential. Furthermore, creating and testing different types of corpora is an integral part of the evaluation process for the system. Once the most productive corpus type is established, it can then be built on over the course of the research.

5. Evaluation Approaches and Results


5.1 Introduction
There are primarily two types of evaluation technique: automatic and real-user. Automatic evaluation has been a popular choice for many natural language generation technologies, as large amounts of text can be checked quickly, and it is also a very economical choice. Within the MT community, therefore, automatic evaluation is the norm (cf. NIST: Doddington 2002; BLEU: Papineni et al. 2002; GTM: Turian et al. 2003; METEOR: Banerjee & Lavie 2005, these being the most often-used metrics). There is essentially no human intervention in this evaluation process (the recently introduced HTER metric (Snover et al. 2006) being one obvious exception). An added feature of using an automatic metric is quantitative feedback on the system’s performance: in our case, to quantify the results of training the system either on a domain-specific homogeneous corpus (DVD subtitles) or on a heterogeneous corpus (Europarl proceedings) which is not domain-specific.
Given the nature of the text type and its use, we intended to incorporate some form of human evaluation in addition to the automatic metrics prevalent in the MT community. Within translation studies, translation evaluation techniques focus on human input, and little credit is given to automatic methodologies in which a text is scored by a computer program. Human evaluation has various drawbacks, including being prone to subjective opinion, expensive and time-consuming. It does, however, play an important part in any kind of natural language generation system, given that humans will ultimately be the end users of automatically generated text.
The main aim of our evaluation process was, therefore, to move towards a
balanced holistic evaluation of machine translation output. By incorporating
real user evaluation studies with automatic metrics, we hoped to gain a better
understanding of the quality of our automatically generated DVD subtitles.

5.2. Evaluation using an Automatic Metric


The automatic evaluation metric we used was BLEU (Bilingual Evaluation Understudy), which is based on the idea of measuring, with a numerical metric, the closeness of a candidate translation to a set of reference translations (Papineni et al. 2002). BLEU scores range between 0 and 1, where 1 indicates a perfect match between the output translation and the reference translations. The reference translations are treated as a “gold standard” against which the EBMT system output is compared: the nearer the BLEU score is to 1, the better the quality of the output translation is deemed to be. In addition to generating a BLEU score for the target translations, we wanted to train the system on increasing amounts of homogeneous and heterogeneous data, and to record the resulting BLEU scores for each. Our goal was to see which corpus type would produce the better scores and thus improve the quality of the automated subtitle translations.
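As an illustration, a corpus-level BLEU score on the 0-1 scale used here can be computed with, for example, the NLTK implementation (one of several; the sketch below uses toy sentences and a standard smoothing method):

```python
# Illustrative BLEU computation with NLTK on toy data.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

system_output = ["mein guter , wir trafen uns einfach vor zwei wochen an die bar"]
references = ["mein guter , wir haben uns erst vor zwei wochen in der bar getroffen"]

hyps = [h.split() for h in system_output]
refs = [[r.split()] for r in references]   # one reference list per hypothesis

# Smoothing avoids zero scores when a higher-order n-gram never matches.
score = corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.4f}")   # nearer 1 means nearer the reference
```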
Table 1 illustrates the BLEU scores for both the homogeneous and heterogeneous corpora when the system is trained on varying quantities of sentence pairs, ranging between 10K and 40K, from the English-German corpus (cf. Armstrong 2007 for scores using other automatic evaluation metrics, for the other language direction, and for the bonus material).
The results show that by training the system on 10K sentence pairs from the homogeneous corpus (0.1082), we achieved almost 50% better results than by training it on 40K sentence pairs from the heterogeneous corpus (0.0737).
While the improved score may appear far removed from the human-produced
reference translation of 1, this is still considered to be a sign of significant
progress from the point of view of system development. Note also that there is a
consistent increase in BLEU scores when incremental amounts of homogeneous
training data are used, while adding more than 20K sentence pairs of Europarl
data seems to show no improvement. With further evaluation studies we can
investigate whether or not a threshold exists for BLEU scores when the system
is trained on the DVD subtitle corpus.

Table 1: Automatic Evaluation Results

Training sentence pairs (En-De)   Type of Corpus       BLEU Score
10K                               Homogeneous Data     0.1082
10K                               Heterogeneous Data   0.0695
20K                               Homogeneous Data     0.1166
20K                               Heterogeneous Data   0.0740
30K                               Homogeneous Data     0.1195
30K                               Heterogeneous Data   0.0736
40K                               Homogeneous Data     0.1287
40K                               Heterogeneous Data   0.0737

5.3. Formative and Comparative Evaluation by Human


The two types of human-based evaluation we conducted were of a formative and a comparative nature, and were used to assess whether changes made to our system enabled it to produce subtitles of good enough quality for the end user. Formative evaluation is designed to detect areas requiring improvement while the system is still under development. It may be carried out at different stages of development, whereby changes are made to the system and the new changes are in turn rechecked. Comparative evaluation compares the performance of different MT systems in order to assess how the system under investigation fares against another MT system.
For the formative evaluation we carried out a text-only evaluation with 6 participants, and a text-and-image evaluation in which 6 participants evaluated DVD clips on a TV screen. Both evaluation strategies were used to improve the output of the EBMT system. For the comparative evaluation we carried out an online survey, which combined a questionnaire with subtitled movie clips. A total of 12 German-speaking students took part, including both native and non-native speakers.

5.3.1 Formative Evaluation Techniques with Text-only Evaluation



We used a training corpus of 30K sentence pairs and then fed a test set of 2K English sentences into the EBMT system. Of the 2K German output sentences, we randomly chose 200 for our evaluation purposes. The idea behind this was to make the evaluation completely objective, rather than choosing the ‘best’ 200 sentences from the output. However, this evaluation method was harsh, as the subtitles were not accompanied by any images. Moreover, the sentences were picked randomly, out of context, with no relation between one sentence and the next, each being an independent subtitle generated by our EBMT system. We then split the 200 sentences into four groups of 50. We provided the human evaluators with the sentences, along with two scales, one for intelligibility and the other for accuracy. We adopted the scales from Wagner (1998), who explains that they are useful for small-scale corpus-based research. The two scales had scores ranging from 1 to 4, 1 being the best result.
The evaluation indicated the main areas of weakness of our system. These included lexical errors, lack of capitalisation, faulty verb agreement, English words remaining in the German output, and problems with our chunking methods. The most negative comment came from one evaluator, who stated that MT subtitles would never be of any use in any situation. While we were prepared for this type of feedback, given the particularly harsh conditions explained earlier, it also raised the possibility that human evaluation of MT can be affected by a negative attitude the evaluator may already hold towards MT in general. On the other hand, positive findings from the evaluation included acceptable translations of short sentences and even creative renditions by the EBMT system in comparison with some of the original human subtitles. Furthermore, there were many instances of only minor grammatical errors, scoring 2, which can easily be fixed by training the system further. Nevertheless, evaluators noted that the subtitles would need post-editing if they were to reach a standard good enough to be shown on a commercial DVD. The errors which they pointed out helped us to develop the EBMT system further in a way that would improve the quality of the output.

5.3.2 Formative Evaluation Techniques with Text and Image Evaluation


In this evaluation, the participants watched a number of DVD clips with German subtitles produced by our EBMT system on a widescreen television in a dedicated lab11, simulating the home theatre set-up in which people are likely to watch DVD films. This session was then followed by a retrospective interview. The idea behind this type of formative evaluation is to introduce a relevant context to the process, presenting the text (subtitles) together with image and sound. These two extra media channels may influence the responses of the participants, as sound and image form part of the comprehension process when people watch a film with subtitles, rather than simply reading them as standalone texts with no accompanying context.
Six German native speakers participated in this evaluation. Three of the clips had English original soundtracks, and three had Japanese. The participants’ knowledge of English ranged from good to excellent; one participant had some knowledge of Japanese, but not at a level sufficient to understand a film. The participants were informed in advance that the DVD subtitles had been automatically generated by an EBMT system. Each clip lasted approximately 2 minutes, and each retrospective interview was recorded on cassette tape. There were ten sections in the interview, each containing an average of four questions. The two researchers conducting the evaluation session were present in the room throughout the viewing of the clips and the interview that followed.
The results of this evaluation session were more promising than those of the text-only evaluation. During the retrospective interview we gathered some background information on the participants, indicating whether they often watched films on DVD with subtitles and how much they knew about translation technology and machine translation. Most participants watched subtitled films on DVD three to four times a year, as most foreign films released in German cinemas are dubbed. However, they all said that they much preferred subtitled films, because hearing the original soundtrack gives the viewer a much better insight into cultural aspects of the film. None of the participants was familiar with the technologies we were using in the project, which may also have influenced their answers and their ideas of the capabilities of MT.
There was a general consensus among the evaluators that our EBMT subtitles, even with no post-editing, would still benefit viewers who did not understand the source language. It was also interesting to hear that, with post-editing, these subtitles could probably be used in certain public situations: for example, in-flight movies, film festivals with extremely short release times and small budgets to cover the cost of subtitling, minority-language scenarios, and streaming videos. The participants were hesitant to say whether they would accept post-edited EBMT subtitles on a commercial DVD. We were correct in our pre-evaluation assumption that knowledge of the original source might influence the participants’ answers. During the retrospective interviews, all of them were slightly more critical of mistakes in the English-language clips than in the Japanese-language clips, given that they had no knowledge of the source language in the latter. This pilot study was rewarding in providing us with insight into real-user perceptions and also into evaluation strategies. It will be the stepping-stone from which a larger real-user study can be devised.

5.3.3 Comparative Technique with Online Survey including Film Clips


Following the above evaluation session, we developed an online survey to carry out our third set of human evaluations, using the virtual learning environment Moodle12, which is implemented campus-wide at DCU. The idea behind devising an online survey was to reach a wider audience and also to test Moodle’s technical capability to incorporate multimedia files and give participants access to them. This therefore formed a pilot trial for the future larger-scale online surveys which we hope to conduct. We asked native German speakers as well as non-native speakers to take part in the survey, with the final number of participants totalling 12. While that number was a little smaller than we had hoped for, the experience alerted us to useful technical issues to consider in developing a large-scale online survey of this nature.
The questionnaire first asked the participants to give some background information, such as how often they normally watch subtitled media. Participants were then asked to look at film clips incorporating German subtitles produced by EBMT for two films, namely The Bourne Identity and Harry Potter and the Prisoner of Azkaban.
The approach taken was to prepare 3 different sets of subtitles for each film, making a total of 6 clips which the participants were asked to evaluate. The German subtitles used in the survey, along with the corresponding original (intralingual) English subtitles, are included in Appendix A. The first clip had raw EBMT subtitles from our system, the second had subtitles translated by the free online MT site Babelfish,13 and the third had post-edited EBMT subtitles from our system. We decided to include Babelfish output as a benchmark comparison for our MT system, in addition to human translation. Although the free version of Babelfish does not provide the full capability which might be available in its commercial version, it is perhaps the best-known freely available general-purpose automatic translation system, and gives the reader a very good idea of the relative quality of our EBMT system.14 The third set, of post-edited subtitles, was included to gauge acceptability from the end-user’s point of view. The editing was conducted by a native speaker of English with knowledge of German, within a pre-determined timeframe of 20 minutes to post-edit 38 subtitles. This was to test whether non-native input could be used to improve the results.
Figure 2 shows the results of our online survey. There are four charts (A-D), indicating how many of the 12 respondents rated each of the 3 subtitle versions for the selected scenes from the two films as acceptable for use in the given scenarios, namely on a purchased DVD, on a pirate DVD, on an in-flight film and on a streaming video. The participants were allowed to select multiple answers for which sets of subtitles they would consider acceptable. For example, in (C) for the Harry Potter clip, 1 respondent regarded the raw EBMT output as of acceptable quality for use on an in-flight film, 2 considered the raw Babelfish output acceptable, 9 considered the post-edited EBMT output acceptable, while 3 respondents regarded none of the three versions as acceptable for this scenario.


From the charts in Figure 2, it is very positive to see that high numbers of people would accept post-edited EBMT subtitles on all four types of media for both clips shown; in contrast, there were lower numbers of responses indicating that they would accept none of the subtitles offered, especially in the case of the Harry Potter subtitles. Overall, the Harry Potter subtitles were accepted more often than The Bourne Identity subtitles. This is probably due to the type of clip selected. The Bourne Identity is an action film, and therefore the camera changes are more frequent, with more interjections from various people within one scene. Harry Potter, on the other hand, tends to focus on its main characters, not making camera changes as frequently and as suddenly as in The Bourne Identity, and allowing them to finish their sentences without being interrupted. This perhaps has repercussions for the types of film to which automated subtitles are, in general, suited.

Figure 2: Online Survey Results

Of the 12 respondents, 9 said they would purchase a commercial DVD containing the post-edited EBMT subtitles, based on the short Harry Potter clip, and 6 said they would purchase a DVD based on the sample of post-edited subtitles for The Bourne Identity clip. Most responses regarding the post-edited clips were very supportive, with these subtitles being strongly accepted in all four scenarios. The participants liked the fact that the post-edited EBMT subtitles displayed good colloquial phrases and correct subject-verb agreement, and that tone and register were always translated correctly; one comment was that the subtitles ‘felt like German’. The raw EBMT output was not viewed favourably, the main complaint being the lack of capitalised nouns15. This is something which is relatively easily fixed in the post-editing phase, particularly given that our ultimate aim is to use EBMT as an integral tool for a human subtitler.
By contrast, the Babelfish translations received positive feedback in relation to nouns being capitalised. Nonetheless, some of the subtitles contained blatant lexical and grammatical errors, which were explicitly marked down by all participants. The Babelfish subtitles were also heavily criticised for being translated too literally, and in some cases the incorrect register was used. Nevertheless, Babelfish did score better than our raw EBMT output in all subtitle scenarios. The feedback provided in the participants’ narratives correlated well with the figures shown in each chart, and these results provided us with concrete evidence as to where EBMT is failing as compared with RBMT, which we could use in our future studies.
The overall improvement in positive responses confirms our research direction and the importance of human user feedback, which can be obtained in narrative form pinpointing the nature of a problem, albeit by way of black-box evaluation. It also points to the need to continue developing evaluation techniques which elicit the shortcomings of the system under study.

6. Conclusions
In this paper we presented a year-long proof-of-concept study whose main objective was to build and test the feasibility of an EBMT system for translating subtitles from English into German and Japanese for the DVD market. We focused mainly on our primary system, developed for English-German. We also had a secondary objective: to develop a holistic evaluation methodology combining automatic metrics popular in the MT community, such as BLEU, with a variety of human assessment methods. The BLEU scores provided a quantitative measure indicating which data type the EBMT system should preferably be seeded with. They showed that EBMT trained on homogeneous data is likely to deliver higher translation quality than EBMT trained on heterogeneous data. This in turn suggests that there are probably more similarities among subtitles than between a subtitle and a sentence from a more general text type.
The human evaluation was therefore applied only to the system trained on the homogeneous data. As predicted, the output in the text-only evaluation, in which randomly selected text strings were shown to the evaluators without the audiovisual context, was generally regarded as poor. This was also the first of our human-based formative evaluations, and the corpus size was the smallest of the three human evaluations conducted. The next human evaluation indicated an improvement, as the system had been adjusted based on the first set of feedback from the human evaluators. This points to the importance of human feedback in elaborating the system’s shortcomings. The third human evaluation included comparative evaluations of three different translations created from the same source text. The raw output from our system scored the worst, behind the output from the well-known online MT system Babelfish. This is understandable, however, given the time and resources invested in that long-standing system. It was encouraging to see that human evaluators considered the post-edited EBMT output acceptable for use as subtitles for commercial audiovisual content, despite the fact that the post-editing was performed by a non-native speaker. These results indicate that the EBMT paradigm is feasible as a CAT tool. Further, it may be possible, for instance, for such a system to be used by professional translators to produce subtitles into their non-native languages.
The project team consisted of humanities researchers specialising in multimedia translation and corpus linguistics, and computing researchers specialising in EBMT. Thanks to this combined expertise, we were able to achieve, within a relatively short timeframe, the research objectives of building a working MT system producing German subtitles from English on the basis of parallel corpora of two different types. We were also able to explore holistic evaluation methods by testing both automatic and human-based approaches.

7. Future Work
The time-consuming nature of corpus building with subtitle data was something we had slightly underestimated. Implementing a system such as ours in an industrial setting will require a much more efficient way of harvesting the data without compromising its quality. This suggests a great need for co-operation with film distributors, who may have subtitle data in electronic form. As alluded to before, copyright issues need to be addressed before the concepts tested in this project can be commercialised. As a post-project development, we hope to see our work continued, integrating the MT component into an existing subtitling environment and measuring any improvement in subtitler throughput. We also hope to further refine the holistic evaluation methods by expanding the online survey platform to reach a wider audience. Further, we hope to use the eye-tracking equipment in our research lab to explore differences in the viewer’s cognitive load when watching a film with machine-translated as opposed to human-translated subtitles, inspired by the work of O’Brien (2006).
Finally, on the basis of the subtitle parallel corpora we have created for English-German and English-Japanese, we hope to pursue our search for patterns of repetition and similarity in this text type in a more microscopic manner, from the perspective of the EBMT paradigm.

Works cited
Armstrong, S. 2007. Using EBMT to Produce Foreign Language Subtitles. MSc thesis, Dublin
City University, Dublin, Ireland.
Armstrong, S., Caffrey, C., Flanagan, M., Kenny, D., O’Hagan, M., and Way, A. 2006a.
Improving the Quality of Automated DVD Subtitles via Example-Based Machine
Translation. In Translating and the Computer. London: Aslib.
Armstrong, S., Flanagan, M., Graham, Y., Groves, D., Mellebeek, B., Morrissey, S., Strop-
pa, N. and Way, A. 2006b. MaTrEx: Machine Translation Using Examples. TC-STAR
OpenLab Workshop on Speech Translation. Trento, Italy (available at: www.computing.
dcu.ie/~away/PUBS/2006/Matrex.pdf).
Banerjee, S. and Lavie, A. 2005. METEOR: An Automatic Metric for MT Evaluation with
Improved Correlation with Human Judgments. In Proceedings of Workshop on Intrinsic
and Extrinsic Evaluation Measures for MT and/or Summarization at the 43rd Annual Meet-
ing of the Association of Computational Linguistics (ACL-2005), Ann Arbor, MI., 65-72.
Bowker, L. and Pearson, J. 2002. Working with Specialized Language: A practical guide to us-
ing corpora. London and New York: Routledge.
Brown, R. 1999. Adding Linguistic Knowledge to a Lexical Example-Based Translation
System. In Proceedings of the Eighth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99), Chester, UK, 22-32.
Carroll, M. 2004. Subtitling: Changing Standards for New Media: www.translationdirectory.com/article422.htm [Accessed November 2006].
Doddington, G. 2002. Automatic evaluation of machine translation quality using n-gram
co-occurrence statistics. In Proceedings ARPA Workshop on Human Language Technology,
San Diego, CA., 128-132.
Gough, N., and Way, A. 2004. Robust Large-Scale EBMT with Marker-Based Segmenta-
tion. In Proceedings of the Tenth Conference on Theoretical and Methodological Issues in
Machine Translation (TMI-04), Baltimore, MD., 95-104.
Green, T. 1979. The Necessity of Syntax Markers: Two Experiments with Artificial Languages. Journal of Verbal Learning and Verbal Behavior 18: 481-496.
Hearne, M. and Way, A. 2006. Disambiguation Strategies for Data-Oriented Translation.
In Proceedings of the 11th Conference of the European Association for Machine Translation,
Oslo, Norway, 59-68.
Koehn, P. 2005. Europarl: A Parallel Corpus for Statistical Machine Translation. In Machine
Translation Summit X, Phuket, Thailand, 79-86.
Morrissey, S. and Way, A. 2005. An Example-Based Approach to Translating Sign Lan-
guage. In Proceedings of the Second Workshop on Example-Based Machine Translation,
Phuket, Thailand, 109-116.
Morrissey, S. and Way, A. 2006. Lost in Translation: the Problems of Using Mainstream
MT Evaluation Metrics for Sign Language Translation. In Proceedings of the SALTMIL
Workshop on Minority Languages, 5th International Conference on Language Resources and
Evaluation (LREC 2006), Genoa, Italy, 91-98.
Nagao, M. 1984. A Framework of a Mechanical Translation between Japanese and Eng-
lish by Analogy Principle. In A. Elithorn and R. Banerji (eds.) Artificial and Human
Intelligence, North-Holland, Amsterdam, The Netherlands: Elsevier Science Publica-
tion. 173-180.
NHK Annual Report. 1996. www.nhk.or.jp/strl/results/annual96/3-4.html [Accessed 28
November 2006].
O’Brien, S. 2006. Investigating Translation From an Eye-Tracking Perspective. A paper
given at the 2nd International Association for Translation and Intercultural Studies
Conference: Intervention in Translation, Interpreting and Intercultural Encounters,
held at the University of the Western Cape, South Africa, 11-14 July, 2006.
Och, F. and Ney, H. 2003. A Systematic Comparison of Various Statistical Alignment
Models. Computational Linguistics 29(1): 19-51.
O’Hagan, M. 2003. Can language technology respond to the subtitler’s dilemma? – A
preliminary study. In Translating and the Computer 25. London: Aslib.
Papineni, K., Roukos, S., Ward, T. and Zhu, W-J. 2002. BLEU: a Method for Automatic
Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the As-
sociation for Computational Linguistics, Philadelphia, PA., 311-318.
Snover, M., Dorr, B., Schwartz, R., Makhoul, J. and Micciulla, L. 2006. A Study of Transla-
tion Error Rate with Targeted Human Annotation. In Proceedings of the 7th Conference of
the Association for Machine Translation in the Americas, Boston, MA., 223-231.
Somers, H. 1999. Review Article: Example-based Machine Translation. Machine Trans-
lation 14: 113-157 (revised, extended version in Carl, M. and Way, A. (eds.) (2003),
Recent Advances in Example-Based Machine Translation, Kluwer Academic Publishers,
Dordrecht, The Netherlands, 3-59).
Somers, H., McLean, I. and Jones, D. 1994. Experiments in Multilingual Example-based
Generation. CSNLP 1994: 3rd Conference on the Cognitive Science of Natural Language
Processing, Dublin City University, 6-8 July 1994.
Stroppa, N., Groves, D., Sarasola, K., and Way, A. 2006. Example-Based Machine Trans-
lation of the Basque Language. In Proceedings of the 7th Conference of the Association for
Machine Translation in the Americas, Boston, MA., 232-241.
Stroppa, N. and Way, A. 2006. MaTrEx: DCU Machine Translation System for IWSLT
2006. In Proceedings of the International Workshop on Spoken Language Translation, Kyoto,
Japan.
Taylor, C. 2006a. “I knew he’d say that!” A consideration of the predictability of language
use in film. A paper presented at the “Multidimensional Translation: Audiovisual
Translation Scenarios” conference, University of Copenhagen, 1-5 May, 2006, Copen-
hagen, Denmark.
Taylor, C. 2006b. The Language of Television Series: a Study of Predictable Patterns. A paper
presented at the “Languages & the Media” conference, 25-27 October, 2006, Berlin,
Germany.
Toole, J., Turcato, D., Popowich, F., Fass, D. and McFetridge, P. 1998. Time-constrained
machine translation. In: Farwell, D., Gerber, L. and Hovy, E. (eds.) Machine translation
and the information soup: third conference of the Association for Machine Translation in
the Americas, AMTA’98, Langhorne, PA. Proceedings (Berlin: Springer), 103-112.
Turian, J., Shen, L. and Melamed, D. 2003. Evaluation of Machine Translation and its
Evaluation. Machine Translation Summit IX, New Orleans, LA., 386-393.
Utiyama, M. and Isahara, H. 2003. Reliable Measures for Aligning Japanese-English
News Articles and Sentences. In Proceedings of the 41st Annual Meeting of the Association
for Computational Linguistics (ACL-03), Sapporo, Japan, 72-79.
van den Bosch, A., Stroppa, N. and Way, A. 2007. A memory-based classification ap-
proach to marker-based EBMT. In Proceedings of the METIS-II Workshop on New Ap-
proaches to Machine Translation, Leuven, Belgium (to appear).
Wagner, S. 1998. Small Scale Evaluation Methods. In: R. Nübel & U. Seewald-Heeg (eds.)
Evaluation of the Linguistic Performance of Machine Translation Systems. Proceedings of the
Workshop at KONVENS-98. Bonn, Germany, 93-105.
Watanabe, H., Kurohashi, S. and Aramaki, E. 2003. Finding Translation Patterns from
Paired Source and Target Dependency Structures. In Carl, M. and Way, A. (eds.)
(2003), Recent Advances in Example-Based Machine Translation, Kluwer Academic Pub-
lishers, Dordrecht, The Netherlands, 397-420.

Multimedia References
As Good as it Gets (1997). [DVD]. USA: TriStar Pictures.
Being John Malkovich (1999). [DVD]. USA: Universal Studios.
Breakfast at Tiffany’s (1961). [DVD]. USA: Paramount Pictures.
Casablanca (1942). [DVD]. USA: Time Warner.
Dr Strangelove (1964). [DVD]. UK: Hawk Films Ltd.
Harry Potter and the Prisoner of Azkaban (2004). [DVD]. USA: Time Warner.
The Bourne Identity (2002). [DVD]. USA: Universal Studios.

Notes
1. This work was generously supported by an Enterprise Ireland Proof of Concept
Commercialization award.
2. Examples include the international conference in audiovisual translation In So Many
Words: Language Transfer on the Screen, held in February 2004 in London; Languages and
the Media conferences held in October 2004 and 2006 in Berlin; EU High Level Scientific
Conference series: Multidimensional Translation held in Saarbrücken in May 2005 and in
Copenhagen in May 2006.
3. www.kuleuven.ac.be/research/researchdatabase/project/3E02/3E020715.htm
4. www.etitle.co.uk
5. www.nclt.dcu.ie/mt
6. The MaTrEx system currently translates between English and a number of languages,
including French (Gough & Way 2004), Spanish (Armstrong et al. 2006b), German and
Japanese (Armstrong et al. 2006a), Italian and Arabic (Stroppa & Way 2006), Basque
(Stroppa et al. 2006), Irish Sign Language (Morrissey & Way 2005, 2006), and Dutch (van
den Bosch et al. 2007).
7. For example, www.amazon.co.jp does not deliver these region-protected DVDs out-
side of Japan.
8. www.translationzone.com/component.asp?ID=244
9. This study indicated an average of 11 characters per sentence for Japanese subtitles for
the Lord of the Rings: The Fellowship of the Ring, against 31 characters per sentence for
the Japanese translation of book 1 of The Lord of the Rings.
10. www.lexically.net/wordsmith/index.html
11. The Advanced Translation Research Lab was recently established at SALIS, DCU,
with audiovisual equipment such as a 32” widescreen TV, and a DVD recorder/player.
The room can also be used for small group interviews.
12. http://moodle.org
13. http://babelfish.altavista.com/
14. Note that this is not a completely fair comparison, as our system is trained on mate-
rial similar to that with which it is tested, while Babelfish receives no such customisation,
of course.
15. In the German language, all nouns are capitalised. The production of lower-case
nouns by the EBMT system is described in section 4.2.

Appendix A: MT German Subtitles used in the Survey

Harry Potter and the Prisoner of Azkaban

Original English Subtitle: Oh for goodness sake
Raw EBMT: oh wie sie goodness’ himmels willen.
Babelfish: OH für Grund der Güte.
Post-edited EBMT: Oh für liebe Güte.

Original English Subtitle: Brilliant!
Raw EBMT: genial
Babelfish: Leuchtend
Post-edited EBMT: Toll!

Original English Subtitle: It’s not funny
Raw EBMT: das ist nicht witzig.
Babelfish: Sie ist nicht lustig.
Post-edited EBMT: Das ist nicht witzig.

Original English Subtitle: I think I was lucky not to be arrested
Raw EBMT: ich war ein glück nicht die arrested.
Babelfish: Ich war glücklich nicht festgehalten zu werden.
Post-edited EBMT: Ich hatte Glück nicht eingesperrt zu werden.

Original English Subtitle: Ouch Ron, that was my foot
Raw EBMT: ouch, ron, das war mein zu fuß.
Babelfish: Autsch Ron, der war mein Fuß.
Post-edited EBMT: Au Ron, das war mein Fuß.

Original English Subtitle: Why are we stopping?
Raw EBMT: warum halten wir an ?
Babelfish: Warum wir stoppend sind.
Post-edited EBMT: Warum halten wir?

Original English Subtitle: There’s something moving out there
Raw EBMT: da draußen bewegt sich irgendwas.
Babelfish: Es gibt etwas, das heraus dort bewegt.
Post-edited EBMT: Etwas bewegt draußen.

Original English Subtitle: I think someone’s come on board.
Raw EBMT: ich glaub da steigt jemand ein.
Babelfish: Ich denke jemand, das an Bord kommt.
Post-edited EBMT: Ich glaube irgendjemand kommt an Bord.
The Bourne Identity

Original English Subtitle: I got enough trouble, Ok.
Raw EBMT: ich hab und ein problem.
Babelfish: Ich erhielt genügend Mühe.
Post-edited EBMT: Ich habe genug Probleme, ja?

Original English Subtitle: Ok, I’m up.
Raw EBMT: also,ich bin zu.
Babelfish: O.K,bin ich oben.
Post-edited EBMT: Also,ich bin bereit.

Original English Subtitle: It’s no scam.
Raw EBMT: Nein, das ist ja scam .
Babelfish: Nr., ist es kein scam.
Post-edited EBMT: Nein, das ist keine Masche.

Original English Subtitle: Is it him?
Raw EBMT: sie auf dem ?
Babelfish: Ist es er?
Post-edited EBMT: Ist das er?

Original English Subtitle: I heard you inside.
Raw EBMT: ich hab sie dich .
Babelfish: Ich hörte Sie nach innen.
Post-edited EBMT: Ich hörte Sie rein.

Original English Subtitle: It’s that simple.
Raw EBMT: ich mir einfach .
Babelfish: Er ist einfacher der.
Post-edited EBMT: Es ist so einfach.

Original English Subtitle: And I’ll give you another $10,000 when we get there.
Raw EBMT: ich gebe dir einen $10 . 000 als wir kommen sehen.
Babelfish: Ich gebe Ihnen eine anderen $10.000, wenn wir ankommen
Post-edited EBMT: Wenn wir ankommen, gebe ich dir noch $10,000.
