
Breaking the Language Barrier:

A Game-Changing Approach
Version 0.25
Ziyuan Yao
yaoziyuan@gmail.com
https://sites.google.com/site/yaoziyuan/
24 June 2012
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Table of Contents
Overview
Chapter 1: Breaking the Language Barrier with Language Learning
1.1. Foreign Language Acquisition
1.1.1. L1-Driven L2 Teaching! (L1DL2T)
1.1.1.1. The Idea
1.1.1.2. Why Is It the Best? A Proof
1.1.1.3. Historical Developments
1.1.1.4. An Example System Design
1.1.1.4.1. Overview
1.1.1.4.2. ATLAS Mission Profiles
1.1.1.4.3. ATLAS User Profiles
1.1.1.4.4. Data Acquisition Strategies
1.1.1.5. A Newer Design: A Data-Driven Approach
1.1.2. Word Mnemonics
1.1.2.1. Essential Mnemonics
1.1.2.1.1. Phonetically Intuitive English! (PIE)
1.1.2.1.2. Etymology and Free Association
1.1.2.1.3. Why Are They Essential? A Proof
1.1.2.2. Other Mnemonics
1.1.2.2.1. Orthographically Intuitive English (OIE)
1.1.2.2.2. Progressive Word Acquisition (PWA)
1.1.2.3. Principles Learned
1.2. Foreign Language Writing Aids
1.2.1. Predictive vs. Corrective Writing Aids
1.2.2. Input-Driven Syntax Aid! (IDSA)
1.2.3. Input-Driven Ontology Aid! (IDOA)
1.3. Foreign Language Reading Aids
Chapter 2: Breaking the Language Barrier with Little Learning
2.1. Foreign Language Understanding
2.1.1. Syntax-Preserving Machine Translation! (SPMT)
2.2. Foreign Language Generation
2.2.1. Formal Language Machine Translation! (FLMT)
Overview
In today's world, the goal of breaking the language barrier is pursued on two fronts: language teachers teach students a second language, thus enabling humans to break the language barrier manually, and computational linguists build increasingly better machine translation systems to break the language barrier automatically.
However, I see important, unfulfilled opportunities on both fronts:
In second language teaching, amazingly efficient teaching methods have not gone mainstream and have not drawn enough attention from computational linguists (who could automate these methods and make them truly powerful). For example, imagine you're browsing a Web page in your native language, and a Web browser extension automatically detects the topic of this page and inserts relevant foreign language micro-lessons in it, so that you can incidentally learn a foreign language while browsing interesting native language information :-) This AdSense-like "L1-driven L2 teaching" will be the future of second language teaching.
In machine translation, computational linguists only pay attention to the computer's capabilities to process natural language (known as natural language processing, NLP), and totally ignore the human's capabilities to share some of the computer's burden in language processing, which can lead to significantly better results. For example, theory and practice have shown that syntax disambiguation is a much harder task than word sense disambiguation, and therefore machine translation tends to scramble the word order of the translation result if the two languages have disparate word orders; but what if machine translation preserves the source language's word order in the translation result, and teaches the end user about the source language's word order so that he can manually figure out the logic of the translation result? If the end user is willing to commit some of his own natural intelligence to the man-machine joint effort to break the language barrier, he will get the job done better.
Therefore this ebook presents emerging ideas and implementations in computer-assisted language learning (CALL), second language reading and writing aids and machine translation (MT) that strive to leverage both human and machine language processing potential and capabilities, and will redefine the way people break the language barrier.
Approaches whose titles have an exclamation mark (!) are stirring game-changing technologies which are the driving forces behind this initiative.
You can stay informed of new versions of this ebook by subscribing to
http://groups.google.com/group/blbgca-announce
and discuss topics in the ebook with the author and other readers at
http://groups.google.com/group/blbgca-discuss
Chapter 1: Breaking the Language Barrier with Language Learning
Sometimes a person wants to internalize a foreign language in order to understand and generate information in that language, especially in the case of English, which is the de facto lingua franca in this era of globalization.
Section 1.1 "Foreign Language Acquisition" discusses a novel approach to learning a foreign language (exemplified by English).
A person with some foreign language knowledge may still need assistance to better read and write in that language. Therefore, Sections 1.2 "Foreign Language Writing Aids" and 1.3 "Foreign Language Reading Aids" discuss how novel tools can assist a non-native user in writing and reading.
1.1. Foreign Language Acquisition
A language can be divided into two parts: the easy part is its grammar and a few function words, which account for a very small and fixed portion of the language's entire body of knowledge; the hard part is its vast vocabulary, which is constantly growing and changing and can't be exhausted even by a native speaker.
Therefore, the problem of language acquisition is largely the problem of vocabulary acquisition, and a language acquisition solution's overall performance is largely determined by its vocabulary acquisition performance.
The problem of vocabulary acquisition can be divided into two subproblems: "when" - when is potentially the best time to teach the user a word, and "how" - when such a teaching opportunity comes, what is the best way for the user to memorize the word and bond its spelling, pronunciation and meaning all together?
Section 1.1.1 addresses the "when" problem with a method called "L1-driven L2 teaching", which automatically teaches you a second language when you're browsing native language websites.
Section 1.1.2 addresses the "how" problem with various mnemonic devices, all of them fitting neatly with the "L1-driven L2 teaching" framework.
1.1.1. L1-Driven L2 Teaching! (L1DL2T)
1.1.1.1. The Idea
A Quick Introduction
Imagine you're browsing a Web page in your native language ("L1"), and a Web browser extension automatically detects the topic of this page and inserts relevant foreign language ("L2") micro-lessons in it, so that you can incidentally learn a foreign language while browsing interesting native language information :-) For example, if the Web page is a news story about basketball, the browser extension can insert micro-lessons about basketball words and expressions in the foreign language you wish to learn.
This AdSense-like "L1-driven L2 teaching" will be the future of second language teaching.
Topic-Oriented vs. Word-Oriented Teaching
Besides inserting foreign language micro-lessons based on the page's overall topic, the browser extension could even insert micro-lessons after individual words on that page to specifically teach these words' foreign counterparts. For example, if a sentence
他是一个好学生。
(Chinese for "He is a good student.") appears in a Chinese person's Web browser, the browser extension can insert after "学生" a micro-lesson that teaches its English counterpart, "student":
他是一个好学生 (学生是 student)。
(The micro-lesson reads: "STUDENT IS student".) Additional information such as the pronunciation of "student" can also be inserted. After several micro-lessons like this (each lesson teaching different additional information such as example sentences, related phrases and comparisons to near-synonyms), the computer can directly replace future occurrences of "学生" with "student":
他是一个好 student。
But bear in mind that such direct replacement is not always technically possible or pedagogically welcome, especially if the word being taught is a verb and has different argument structures in the two languages. In practice, a foreign word can be practiced separately in micro-lessons, each lesson containing one example sentence, e.g.
他是一个好学生 (学生是 student, 例如: There are 20 students in our class.)。
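The word-oriented insertion described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the ebook's implementation: the MICRO_LESSONS table, the function name insert_micro_lessons, the square-bracket lesson format and the rule of teaching each word once per page are all hypothetical. It also uses an English (L1) page with Chinese (L2) lessons, so that words can be found with a simple letter-run pattern.

```python
import re

# Hypothetical lesson table: lowercase L1 word -> micro-lesson text.
MICRO_LESSONS = {"student": "A student is a 学生 (xuéshēng)."}


def insert_micro_lessons(text, lessons):
    """Insert a bracketed micro-lesson after the first occurrence of each known word."""
    taught = set()  # words already taught on this page

    def annotate(match):
        word = match.group(0)
        key = word.lower()
        if key in lessons and key not in taught:
            taught.add(key)
            return f"{word} [{lessons[key]}]"
        return word

    # Treat every run of ASCII letters as one candidate word.
    return re.sub(r"[A-Za-z]+", annotate, text)


print(insert_micro_lessons("He is a good student. Every student learns.", MICRO_LESSONS))
```

Only the first "student" is annotated here; a real system would presumably consult a record of what the user has already been taught instead of a per-page set.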
Handling Ambiguity in Word-Oriented Teaching
Unlike topic-oriented teaching, word-oriented teaching needs to deal with a problem concerning ambiguous native language words on the Web page. The browser extension needs artificial intelligence (more specifically, "word sense disambiguation") to determine an ambiguous word's intended meaning based on context, and then teach foreign language for that meaning. Such disambiguation is not always right, so the computer should always tell the user which meaning is being assumed in the teaching.
If an L1DL2T system wants to avoid the problem of word sense disambiguation entirely, it should use topic-oriented teaching instead of word-oriented teaching.
What about Teaching Grammatical Knowledge?
Grammatical knowledge can be taught similarly using L1-driven L2 teaching. The browser extension could detect a certain grammatical usage in the native language page and insert a micro-lesson after that usage to teach its corresponding foreign language grammatical usage.
L1DL2T in a Multi-Peer Environment
If an L1DL2T system inserts foreign language micro-lessons not only into the user's incoming native language communication (e.g. a Web page loaded into his browser) but also outgoing communication (e.g. a message he posts to a forum), all his recipients will be engaged in language learning, even if they themselves do not install an L1DL2T system on their side. Put another way, if only one active participant in an online community (e.g. an IRC chat room or a forum) L1DL2T-izes his outgoing messages, all other members will be learning the foreign language. It's like someone smoking in a lobby - no one else will escape the smoke.
Such a situation also fosters language learners' "productive knowledge" in addition to "receptive knowledge" ("receptive" means a learner can recognize the meaning of a word when he sees or hears that word, while "productive" means he can independently write or say a word when he wants to use it). For example, suppose two Chinese speakers, Alice and Bob, are chatting with each other, and Alice says:
他是一个好学生。
(Chinese for "He is a good student."), but Alice's L1DL2T system transforms this outgoing message, replacing the Chinese word "学生" with its English counterpart, "student":
他是一个好 student。
Now both Alice and Bob see this transformed message, and suppose now Bob wants to say:
不，他不是一个好学生。
(Chinese for "No, he is not a good student."), but he is influenced by the English word "student" in Alice's message, and subconsciously follows suit, typing "student" instead of "学生" in his reply:
不，他不是一个好 student。
Thus, Bob is engaged in not only "recognizing" this English word but also "producing" it.
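The outgoing-message transformation in the Alice and Bob example might look like the following Python sketch. It is an illustration under assumptions rather than a described implementation: the function name l1dl2t_ize and the learned_words table are hypothetical, the Chinese rendering of the example sentence is assumed, and plain substring replacement stands in for the word segmentation an unspaced language would really need.

```python
def l1dl2t_ize(message, learned_words):
    """Replace each learned L1 word in an outgoing message with its L2 counterpart."""
    for l1_word, l2_word in learned_words.items():
        # A leading space sets the L2 word apart from the unspaced L1 text.
        message = message.replace(l1_word, " " + l2_word)
    return message


# Hypothetical table of words whose L2 counterpart is ready for direct replacement.
learned = {"学生": "student"}
print(l1dl2t_ize("他是一个好学生。", learned))
```

Because the replacement happens on the sender's side, every recipient sees the mixed-language message without installing anything.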
1.1.1.2. Why Is It the Best? A Proof
Why is L1DL2T the best vocabulary acquisition strategy? Below is my proof.
Fact 1: A Cognitive Constraint in Vocabulary Acquisition
The 1999 movie The Matrix shows us an advanced brain-computer interface, with which Neo is taught kung fu in seconds. Ideally, we would like to have such a tool to impart a foreign language to us in seconds as well. But this is still science fiction, and we still have to acquire knowledge by natural means: our eyes, ears and slowish memory.
Fact 2: An Affective Constraint in Vocabulary Acquisition
Besides the abovementioned cognitive limit, we humans also have an emotional threshold in acquiring new knowledge. For example, a beginner in a foreign language is unlikely to be willing to sit there all day and memorize a dictionary of this foreign language. Similarly, he would find it very difficult to read an article in this foreign language intended for native readers, as most words in such an article would be unfamiliar to him. These two examples tell us that we simply can't force a person to digest a large number of unfamiliar foreign language words at once.
Divide and Timely Conquer
Based on Fact 2 above, we can conclude that the vast vocabulary of a foreign language must be divided into small "pieces" and taught to a learner one piece at a time (e.g. one word at a time), and we'd better choose a time when the learner is most motivated to learn such a piece. For example:
• when the learner is playing a computer game or participating in a virtual reality world such as Second Life, label objects in that game or virtual world with foreign language descriptions;
• when the learner is watching a movie, show both native language and foreign language subtitles on the screen;
• when the learner is using a computer program, show the program's user interface in a foreign language;
• when the learner is interacting with the real world, show foreign language labels on road signs, billboards, products, official documents, etc.;
• when the learner is browsing a native language Web page, insert foreign language micro-lessons relevant to the page's topic or certain words of the page.
Among the above examples, it is obvious that the last example, L1DL2T, provides the widest spectrum of word teaching opportunities.
1.1.1.3. Historical Developments
1960s: Robbins Burling's Diglot Method
In the 1960s American anthropologist and linguist Robbins Burling first introduced the method of gradually introducing foreign language ingredients in a native language storybook to teach foreign language in his seminal paper Some Outlandish Proposals for the Teaching of Foreign Languages. Burling called it the "diglot method" and was inspired by a Learning Chinese book series published by Yale University Press, where new Chinese characters gradually replaced Romanized Chinese in a textbook.
Burling's method requires a human translator to manually transform a native language storybook in such a way that, as a reader reads on, he would see more and more words and phrases expressed in a foreign language, and eventually whole sentences and paragraphs too. This forces the reader to look up newly encountered foreign language elements in a dictionary or guess their meanings based on context, thereby gradually picking up that foreign language as he explores the story.
Since then the method has also been known as "mixed texts", "bilingual method", "code switching", "sandwich technique" or "diglot weave" in the foreign language teaching research community. However, the method has never become mainstream.
1990s - Present: Diglot Books, Ebooks and Videos on the Market
There are educational materials using manually prepared diglot texts on the foreign language teaching market, but they have never gone mainstream. For example, PowerGlide (www.power-glide.com) sells interactive diglot ebooks; Professors Ji Yuhua (纪玉华) and Xu Qichao (许其潮) sell diglot videos titled Three Little Pigs and Stepwise English in China.
2000s - Present: Automatic L1DL2T Theories and Systems
In November 2004 I independently came up with the diglot idea (Usenet post and thread), and again in April 2007 (Usenet post and thread). From the beginning I have been researching it as an automatic system (e.g. a Web browser extension) using computational linguistics and natural language processing (CL/NLP). Major aspects of this research are presented in Section 1.1.1.1 "The Idea". For example:
1. I propose a new L1DL2T paradigm, "topic-oriented teaching", to completely avoid the problem of word sense disambiguation;
2. I propose that word-oriented teaching must always tell the user which sense is assumed when teaching foreign language for an ambiguous native language word;
3. I propose that we can put L2 teachings and practices in "micro-lessons" separately from a Web page's original L1 text, so that we won't have linguistic problems commonly encountered when two languages are mixed in the same sentence.
I will also present an example L1DL2T system design in Section 1.1.1.4 "An Example System Design", code name ATLAS (Active Target Language Acquisition System).
There are also other efforts to implement an automatic L1DL2T system in recent years:
• WebVocab (http://webvocab.sourceforge.net/) is a kind of Firefox add-on (Greasemonkey user script). It can be classified as "word-oriented teaching" (see "Topic-Oriented vs. Word-Oriented Teaching" in Section 1.1.1.1), and only disambiguates words by part-of-speech clues (e.g. a word after "I" must be a verb/adverb rather than a noun/adjective, so "can" after "I" must be in its auxiliary verb sense, "be able to", rather than its noun sense, "container"); otherwise it will not teach or practice foreign language for ambiguous words at all.
• ming-a-ling (https://addons.mozilla.org/en-US/firefox/addon/ming-a-ling/) is a Firefox extension that can also be classified as "word-oriented" L1DL2T. It simply calls Google Translate to translate native language words on a Web page to a foreign language. However, in case a native language word is ambiguous, it does not tell the user which sense is assumed in the teaching, so the user may be taught a wrong foreign language word and will not know about it when such a misteaching happens.
• Characterizer (https://addons.mozilla.org/en-US/firefox/addon/characterizer/) is a Firefox extension that aims to teach Chinese or Japanese characters by putting them in the native language Web page you're browsing. It is also a word-oriented L1DL2T system. However, in case a native language word is ambiguous, it will simply choose a random sense and teach you the Chinese/Japanese character for that sense, and you will never know if a misteaching happens.
• polyglop (https://addons.mozilla.org/en-US/firefox/addon/polyglop/) is a Firefox extension that is also a word-oriented L1DL2T system. In case a native language word is ambiguous, it will choose a sense at random and teach that sense in foreign language. You will never know if a misteaching happens.
• polyglot (https://chrome.google.com/webstore/detail/plpjkjplknknmhfhkjgcfgofclmlnine) is a Chrome extension that also does word-oriented L1DL2T by calling Google Translate to translate certain native language words in your Chrome browser. Like the extensions above, it will not tell you which sense is assumed when teaching foreign language for an ambiguous native language word, so you will never know if a misteaching happens.
As you see, all these other efforts listed above are word-oriented L1DL2T systems but they don't tell you which sense is chosen for foreign language teaching when a native language word is ambiguous, so you will never know if a misteaching happens.
I believe an automatic L1DL2T system should either be topic-oriented (so that it doesn't involve the problem of word sense disambiguation at all), or be word-oriented but always tell the user which sense is chosen for teaching.
1.1.1.4. An Example System Design
1.1.1.4.1. Overview
Below I will present the design of a fictional L1DL2T system, code name ATLAS (Active Target Language Acquisition System). It consists of a Chrome extension and associated open standards.
Currently it is only focused on word-oriented teaching. However, the two paradigms (word-oriented teaching and topic-oriented teaching) actually share much in common, and the system has features that will support topic-oriented teaching at a later time. These features will also be presented.
Operation
The Chrome extension will load two files before actual foreign language teaching takes place: (1) an "ATLAS teaching mission profile" (or "mission profile" for short) that specifies which native language words will trigger foreign language teaching, what content will be displayed to the user in such a teaching (i.e. the micro-lessons), and data that help the extension to disambiguate ambiguous native language words so that ATLAS can try to teach foreign language for the right sense; (2) an "ATLAS user profile" (or "user profile" for short) that records which micro-lessons a user has already been taught.
Therefore the extension's behavior will largely be defined by these two kinds of profiles. Sections 1.1.1.4.2 and 1.1.1.4.3 will present the specifications of these profiles in detail.
Word Sense Disambiguation Strategy
As a word-oriented L1DL2T system, ATLAS requires a word sense disambiguation (WSD) strategy. We will use a very simple yet effective WSD approach that assumes a word's intended sense is determined by the "topic" of the word's context, and the "topic" is in turn determined by what other words occur in that context. For example, if the user's native language (L1) is English and there is an ambiguous word "bass" in his Web browser, and if there are words such as "sea" and "fishing" nearby, then the ongoing topic is probably "fishing" and therefore this "bass" is probably in the fish sense; but if there are words such as "music" and "song" nearby, then the ongoing topic is probably "music" and therefore this "bass" is probably in the music sense.
Therefore in an ATLAS teaching mission profile, words will be grouped into "topics" so that they can help disambiguate each other in the same topic. This will be explained in detail in Section 1.1.1.4.2 "ATLAS Mission Profiles".
Since no WSD strategy is perfect, micro-lessons must explicitly tell the user which word sense is being assumed in the teaching.
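To make the topic-based idea concrete, here is a rough Python sketch of choosing a sense from nearby cue words and then naming that sense in the lesson. It rests on assumptions: SENSE_CUES, guess_sense, labelled_lesson, the five-word window and the tie rule (the first listed sense wins) are hypothetical choices made for illustration, not part of the ATLAS design.

```python
# Hypothetical cue words for each sense of the ambiguous English word "bass".
SENSE_CUES = {
    "bass (fish)": {"sea", "fishing"},
    "bass (music)": {"music", "song"},
}


def guess_sense(words, index, sense_cues, window=5):
    """Pick the sense with the most cue words within `window` words of words[index]."""
    start = max(0, index - window)
    nearby = words[start:index] + words[index + 1:index + 1 + window]
    nearby = [w.lower() for w in nearby]
    scores = {sense: sum(w in cues for w in nearby)
              for sense, cues in sense_cues.items()}
    best = max(scores, key=scores.get)
    # With no cue word nearby, decline to guess rather than risk a misteaching.
    return best if scores[best] > 0 else None


def labelled_lesson(sense, translation):
    """Name the assumed sense in the lesson so the user can spot a wrong guess."""
    return f"{sense} is {translation}"


words = "we went to the sea and caught a bass".split()
sense = guess_sense(words, words.index("bass"), SENSE_CUES)
if sense is not None:
    print(labelled_lesson(sense, "鲈鱼 (lúyú)"))
```

Returning None when no cue is present mirrors the text's caution: teaching nothing is safer than silently teaching the wrong sense.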
Off-Topic Teaching
In principle, word-oriented teaching will teach a foreign language word only if its corresponding native language word appears in the user's Web browser. But what if native language words defined in a teaching mission profile never appear in the browser? For example, what if we have a mission profile that teaches foreign language for basketball words, but the user never happens to browse Web pages related to basketball? In that case the teaching mission would never be performed successfully. To address this problem, we should let ATLAS occasionally teach micro-lessons unrelated to a Web page's words, at the bottom of that page, if ATLAS finds it difficult to find normal teaching opportunities.
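One possible reading of "finds it difficult to find normal teaching opportunities" is a counter of consecutive pages on which nothing could be taught. The Python sketch below assumes exactly that; the function name plan_page, the patience threshold of three pages and the rule of picking the first pending word are hypothetical and are not specified in the text.

```python
def plan_page(page_words, pending_words, dry_pages, patience=3):
    """Decide what to teach on one page.

    Returns (inline_words, footer_word, dry_pages): the words to teach next to
    their occurrences, an optional unrelated word to teach at the bottom of the
    page, and the updated count of consecutive pages without any teaching.
    """
    present = {w.lower() for w in page_words}
    inline = [w for w in pending_words if w in present]
    if inline:
        return inline, None, 0
    dry_pages += 1
    if pending_words and dry_pages >= patience:
        # Too many pages without an opportunity: teach off-topic at the bottom.
        return [], pending_words[0], 0
    return [], None, dry_pages


pending = ["basketball", "dunk"]
print(plan_page(["News", "about", "cooking"], pending, 2))
```

The counter resets after any teaching, so off-topic lessons stay occasional, as the text asks.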
1.1.1.4.2. ATLAS Mission Profiles
An ATLAS teaching mission profile ("mission profile" for short) is a plain text file with the file extension ".mission.json" that defines what will be taught to the user and how: which native language words will trigger foreign language teaching, what content will be displayed to the user in such a teaching (i.e. the micro-lessons), and data that help the extension to disambiguate ambiguous native language words so that ATLAS can try to teach foreign language for the right sense.
A Sample Mission Profile
I will first show you a sample mission profile "sample.mission.json" and then explain it.
{
  "dataFormat": "ATLAS Teaching Mission Profile Format 0.01",
  "title": "School-related words (English -> Chinese)",
  "description": "For English speakers to learn school-related words in Chinese.",
  "authors":
  [
    "Ziyuan Yao (yaoziyuan@gmail.com)"
  ],
  "party": "yaoziyuan@gmail.com",
  "date": "01 February 2010",
  "license": "Creative Commons Attribution 3.0 License",
  "L1": "English",
  "L2": "Simplified Chinese",
  "paradigm": "word-oriented teaching",
  "wotContextWindowSize": {"unit": "word", "value": 30},
  "lexemes":
  [
    {
      "lexemeID": "teacher1",
      "inflectedForms": ["teacher", "teachers"],
      "microLessons": ["A teacher is a 教师 (jiàoshī).", "..."]
    },
    {
      "lexemeID": "student1",
      "inflectedForms": ["student", "students"],
      "microLessons": ["A student is a 学生 (xuéshēng).", "..."]
    },
    {
      "lexemeID": "blackboard1",
      "inflectedForms": ["blackboard", "blackboards"],
      "microLessons": ["A blackboard is a 黑板 (hēibǎn).", "..."]
    },
    {
      "lexemeID": "teach1",
      "inflectedForms": ["teach", "teaches", "teaching", "taught"],
      "microLessons": ["To teach something is to 教 (jiāo) something.", "..."]
    },
    {
      "lexemeID": "learn1",
      "inflectedForms": ["learn", "learns", "learning", "learned", "learnt"],
      "microLessons": ["To learn something is to 学 (xué) something.", "..."]
    },
    {
      "lexemeID": "textbook1",
      "inflectedForms": ["textbook", "textbooks"],
      "microLessons": ["A textbook is a 课本 (kèběn).", "..."]
    }
  ],
  "topics":
  [
    {
      "topicTitle": "Schooling",
      "members":
      [
        {"lexemeID": "teacher1", "weight": 10, "dependency": 0},
        {"lexemeID": "student1", "weight": 10, "dependency": 0},
        {"lexemeID": "blackboard1", "weight": 10, "dependency": 0},
        {"lexemeID": "teach1", "weight": 10, "dependency": 0},
        {"lexemeID": "learn1", "weight": 10, "dependency": 0},
        {"lexemeID": "textbook1", "weight": 10, "dependency": 0}
      ]
    },
    {
      "topicTitle": "...",
      "members":
      [
        {"lexemeID": "...", "weight": ..., "dependency": ...},
        {"lexemeID": "...", "weight": ..., "dependency": ...},
        {"lexemeID": "...", "weight": ..., "dependency": ...}
      ]
    }
  ]
}
The structured format you see in the above sample mission profile is called JSON (JavaScript Object Notation). Ideally, mission profiles should be edited with a dedicated editor program, but if you're brave enough to edit one directly, you should learn JSON first, which is very quick to learn.
dataFormat specifies the format version that this mission profile uses.
title, description, authors, date and license are descriptive items for human users to read; they don't affect ATLAS's behavior.
party is a feature that allows authors of multiple mission profiles to coordinate their efforts so that their mission profiles can virtually work as a single mission. This will be further explained later. If you don't want to coordinate with other authors, use a unique identity of yours (e.g. email address) as the party value.
L1 and L2 respectively specify the native and foreign language in this teaching mission. L1 also tells ATLAS whether the native language uses spaces to delimit words. For example, English uses spaces to delimit words but Simplified Chinese doesn't.
paradigm is either "word-oriented teaching" or "topic-oriented teaching". It tells ATLAS which paradigm to use with this mission profile.
wotContextWindowSize, used with paradigm = "word-oriented teaching", tells ATLAS's word sense disambiguation (WSD) algorithm how large a context it should look at to determine a word's sense. If L1 is a spaced language (e.g. English), wotContextWindowSize will be specified in terms of words; otherwise (e.g. Simplified Chinese) it will be specified in terms of characters.
totContextWindowSize, used with paradigm = "topic-oriented teaching", tells ATLAS how large a context it should look at to discover topics that may trigger L2 teaching. If L1 is a spaced language (e.g. English), totContextWindowSize will be specified in terms of words; otherwise (e.g. Simplified Chinese) it will be specified in terms of characters.
lexemes specifies a list of L1 lexemes (actually "lexical items", as phrases are also allowed) that could trigger L2 teaching. Each lexeme contains (1) a lexemeID that will be unique among mission profiles of the same party (see "party" above) and the same L1, (2) a list of inflectedForms that specify all possible forms that this lexeme may take in a Web page (including the lemma), and (3) a list of microLessons that will take turns to show up each time ATLAS decides to teach foreign language for this lexeme.
topics specifies a list of "topics" that group topically related lexemes together, so that these lexemes can hint at each other in word sense disambiguation (WSD). For example, the "Schooling" topic groups 6 school-related lexemes defined in the "lexemes" section. Each lexeme in a topic is associated with two values: weight and dependency. These values will be further explained later. It is also very important to note that a lexeme can appear in multiple topics, and each appearance will have its own unique set of weight and dependency values. "topicTitle" is a descriptive item for humans to identify a topic; if paradigm = "topic-oriented teaching", another item, "topicID", will be used for the computer to uniquely identify a topic among mission profiles of the same party and L1.
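As an illustration of how the "lexemes" section might be consumed, the Python sketch below indexes lexemes by their inflected forms and rotates through each lexeme's micro-lessons. It is a sketch under assumptions: build_form_index, next_micro_lesson and the times_taught dictionary (a stand-in for the user profile, whose real format is not described here) are hypothetical, the lesson texts are placeholders, and wrapping around after the last lesson is only one reading of "take turns".

```python
import json

# A cut-down mission profile; only the fields used below are included.
PROFILE_TEXT = """
{
  "lexemes": [
    {"lexemeID": "teacher1",
     "inflectedForms": ["teacher", "teachers"],
     "microLessons": ["lesson 1 about teacher", "lesson 2 about teacher"]},
    {"lexemeID": "teach1",
     "inflectedForms": ["teach", "teaches", "teaching", "taught"],
     "microLessons": ["lesson 1 about teach"]}
  ]
}
"""


def build_form_index(profile):
    """Map each inflected form (lowercased) to the lexemes that list it."""
    index = {}
    for lexeme in profile["lexemes"]:
        for form in lexeme["inflectedForms"]:
            index.setdefault(form.lower(), []).append(lexeme)
    return index


def next_micro_lesson(lexeme, times_taught):
    """Return the lexeme's next micro-lesson, wrapping around after the last one.

    `times_taught` (lexemeID -> count) is a stand-in for the user profile.
    """
    count = times_taught.get(lexeme["lexemeID"], 0)
    times_taught[lexeme["lexemeID"]] = count + 1
    lessons = lexeme["microLessons"]
    return lessons[count % len(lessons)]


profile = json.loads(PROFILE_TEXT)
index = build_form_index(profile)
times_taught = {}
for word in ["Teachers", "teacher", "teachers"]:
    for lexeme in index.get(word.lower(), []):
        print(word, "->", next_micro_lesson(lexeme, times_taught))
```

A form shared by several lexemes (such as an ambiguous word) maps to a list with more than one entry, which is where word sense disambiguation would have to choose.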
Word Sense Disambiguation with "weight" and "dependency"
The L1 words used in the above sample mission profile ("teacher", "student", "blackboard", etc.) are largely unambiguous, and therefore I can't explain "weight" and "dependency" using that profile. Let's look at another mission profile snippet that involves an ambiguous word, "bass":
"lexemes":
[
  {
    "lexemeID": "sea1",
    "inflectedForms": ["sea", "seas"],
    "microLessons": ["A sea is a 大海 (dàhǎi).", "..."]
  },
  {
    "lexemeID": "fishing1",
    "inflectedForms": ["fishing"],
    "microLessons": ["Fishing is 钓鱼 (diàoyú).", "..."]
  },
  {
    "lexemeID": "bass_(fish)",
    "inflectedForms": ["bass", "basses"],
    "microLessons": ["A bass (fish) is a 鲈鱼 (lúyú).", "..."]
  },
  {
    "lexemeID": "music1",
    "inflectedForms": ["music"],
    "microLessons": ["Music is 音乐 (yīnyuè).", "..."]
  },
  {
    "lexemeID": "song1",
    "inflectedForms": ["song", "songs"],
    "microLessons": ["A song is a 歌曲 (gēqǔ).", "..."]
  },
  {
    "lexemeID": "bass_(music)",
    "inflectedForms": ["bass"],
    "microLessons": ["Bass (music) is 贝斯 (bèisī).", "..."]
  }
],
"topics":
[
  {
    "topicTitle": "Fishing",
    "members":
    [
      {"lexemeID": "sea1", "weight": 10, "dependency": 0},
      {"lexemeID": "fishing1", "weight": 10, "dependency": 0},
      {"lexemeID": "bass_(fish)", "weight": 10, "dependency": 10}
    ]
  },
  {
    "topicTitle": "Music",
    "members":
    [
      {"lexemeID": "music1", "weight": 10, "dependency": 0},
      {"lexemeID": "song1", "weight": 10, "dependency": 0},
      {"lexemeID": "bass_(music)", "weight": 10, "dependency": 10}
    ]
  }
]
(Before you read on, bear in mind that it is not necessary for a mission profile author, such as a second
language teacher, to really understand "weight" and "dependency", because she/he can be instructed to
always use weight = 10 and dependency = 20 as a default setting.)
As you can see, the word "bass" appears in two topics, "Fishing" and "Music", as two separate
lexemes, "bass_(fish)" and "bass_(music)". We want ATLAS to interpret the word "bass" as the lexeme
"bass_(fish)" if either the lexeme "sea1" or "fishing1" appears nearby, or as "bass_(music)" if either
"music1" or "song1" appears nearby. To do this, we give "sea1" and "fishing1" a "weight" of 10, and
give "bass_(fish)" a "dependency" of 10, which means that "bass_(fish)" will be assumed only if other
"Fishing" lexemes (e.g. "sea1" and "fishing1") in the same context constitute a total weight of at
least 10. In other words, either "sea1" or "fishing1" or both of them must appear nearby to let ATLAS
interpret the word "bass" as the lexeme "bass_(fish)" and teach this lexeme's L2 micro-lessons. We
treat "music1", "song1" and "bass_(music)" similarly, so that at least one of "music1" and "song1"
must be present in the context to let ATLAS interpret the word "bass" as "bass_(music)" and teach L2
micro-lessons accordingly.
If a word is unambiguous (e.g. "music"), its lexeme (e.g. "music1") can have a "dependency" of 0,
which means it doesn't require any other lexeme from the same topic to appear nearby to help
disambiguate it, as it is unambiguous anyway.
If a lexeme does not suggest a topic as much as other lexemes do, it can have a smaller weight (e.g.
weight = 5) for that topic than other lexemes. For example, the occurrence of "ball pen" in a Web page
may not suggest the presence of the topic "Schooling" as much as the occurrence of "teacher" would,
so "ball pen" may have a smaller weight than "teacher" for that topic. Similarly, there can be lexemes
that have a greater weight than others, if they more strongly suggest a topic.
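The activation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ATLAS's actual implementation: the function names `is_activated` and `disambiguate` are hypothetical, and the context is assumed to arrive as a set of lexemeIDs already detected near the ambiguous word.

```python
from typing import Dict, List, Set, Tuple

# The two topics from the "bass" snippet, as plain Python dicts.
FISHING: Dict = {
    "topicTitle": "Fishing",
    "members": [
        {"lexemeID": "sea1", "weight": 10, "dependency": 0},
        {"lexemeID": "fishing1", "weight": 10, "dependency": 0},
        {"lexemeID": "bass_(fish)", "weight": 10, "dependency": 10},
    ],
}
MUSIC: Dict = {
    "topicTitle": "Music",
    "members": [
        {"lexemeID": "music1", "weight": 10, "dependency": 0},
        {"lexemeID": "song1", "weight": 10, "dependency": 0},
        {"lexemeID": "bass_(music)", "weight": 10, "dependency": 10},
    ],
}


def is_activated(topic: Dict, lexeme_id: str, context: Set[str]) -> bool:
    """Hypothetical helper: a lexeme is activated when the OTHER members of
    its topic found in the context reach a total weight >= its dependency."""
    members = {m["lexemeID"]: m for m in topic["members"]}
    support = sum(
        m["weight"]
        for lid, m in members.items()
        if lid != lexeme_id and lid in context
    )
    return support >= members[lexeme_id]["dependency"]


def disambiguate(candidates: List[Tuple[Dict, str]], context: Set[str]) -> List[str]:
    """Return the candidate lexemeIDs of one ambiguous word that the context activates."""
    return [lid for topic, lid in candidates if is_activated(topic, lid, context)]


BASS = [(FISHING, "bass_(fish)"), (MUSIC, "bass_(music)")]
print(disambiguate(BASS, {"song1"}))  # the music sense is selected
print(disambiguate(BASS, set()))      # no supporting lexeme nearby: nothing is taught
```

With an empty context neither sense reaches its dependency, so the word "bass" is simply left alone, which matches the intent that ambiguous words are only taught when the context supports one reading.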
Also note that the micro-lessons for "bass_(fish)" and "bass_(music)" explicitly tell the user which
"bass" is being taught in the foreign language, by adding "(fish)" and "(music)" in the lesson content.
This is in accordance with the principle I proposed in Section 1.1.1.1 "The Idea": word-oriented
teaching must explicitly tell the user which word sense is being assumed when teaching foreign
language for an ambiguous native language word.
"All-for-All" WSD vs. "All-for-One" WSD
Previously we demonstrated "topics" within which lexemes hint at each other in word sense
disambiguation (WSD), e.g.
{
    "topicTitle": "Fishing",
    "members":
    [
        {"lexemeID": "sea1", "weight": 10, "dependency": 0},
        {"lexemeID": "fishing1", "weight": 10, "dependency": 0},
        {"lexemeID": "bass_(fish)", "weight": 10, "dependency": 10}
    ]
}
We call this "all-for-all WSD". However, if necessary, we can even design topics solely for the purpose
of hinting at only one lexeme in WSD. For example, we can modify the above topic as:
{
    "topicTitle": "Bass (fish)",
    "members":
    [
        {"lexemeID": "sea1", "weight": 10, "dependency": 99999},
        {"lexemeID": "fishing1", "weight": 10, "dependency": 99999},
        {"lexemeID": "bass_(fish)", "weight": 10, "dependency": 10}
    ]
}
First, notice the topicTitle has been changed to "Bass (fish)", because this new topic is solely for
hinting at the lexeme "bass_(fish)".
Secondly, notice both "sea1" and "fishing1" have a "dependency" of 99999, which means they'll never
be activated for L2 teaching (at least from this topic). But they have a normal weight of 10, which
means they can help activate "bass_(fish)" for L2 teaching.
Parties
party is a feature that allows authors of multiple mission profiles to coordinate their efforts so that their
mission profiles can virtually work as a single mission. If you don't want to coordinate with other
authors, use a unique identity of yours (e.g. email address) as the party value.
If multiple mission profiles specify the same party value, their lexemeIDs are mutually recognized and
therefore their lexemes can be merged. Besides, their weight and dependency values are in accordance
with the same design standard (e.g. they would all use 10 as a standard weight). Furthermore, they will
have consistent wotContextWindowSize and totContextWindowSize values.
Supporting Topic-Oriented Teaching
Although currently this example system design is focused on word-oriented teaching, it will be easy to
support topic-oriented teaching because the two paradigms share much data in common.
For example, consider the Schooling topic in our first sample mission profile:
{
    "topicTitle": "Schooling",
    "members":
    [
        {"lexemeID": "teacher1", "weight": 10, "dependency": 0},
        {"lexemeID": "student1", "weight": 10, "dependency": 0},
        {"lexemeID": "blackboard1", "weight": 10, "dependency": 0},
        {"lexemeID": "teach1", "weight": 10, "dependency": 0},
        {"lexemeID": "learn1", "weight": 10, "dependency": 0},
        {"lexemeID": "textbook1", "weight": 10, "dependency": 0}
    ]
}
We can disable word-oriented teaching by changing all dependency values to a large number, say,
99999. Then we can introduce a topic-level dependency value and microLessons, so that the topic itself
can be activated for L2 teaching:
{
    "topicTitle": "Schooling",
    "topicID": "schooling",
    "members":
    [
        {"lexemeID": "teacher1", "weight": 10, "dependency": 99999},
        {"lexemeID": "student1", "weight": 10, "dependency": 99999},
        {"lexemeID": "blackboard1", "weight": 10, "dependency": 99999},
        {"lexemeID": "teach1", "weight": 10, "dependency": 99999},
        {"lexemeID": "learn1", "weight": 10, "dependency": 99999},
        {"lexemeID": "textbook1", "weight": 10, "dependency": 99999}
    ],
    "dependency": 30,
    "microLessons": ["Lesson 1...", "Lesson 2...", ...]
}
This new topic means that if at least three lexemes in this topic are present in a context, ATLAS can
insert the topic-level microLessons into that context to teach schooling-related L2 lessons.
"topicID" gives the topic a unique ID which will help ATLAS record the user's learning progress for
this topic's micro-lessons.
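As a rough illustration of the topic-level rule above, the following Python sketch sums the weights of the member lexemes found in a context window and compares the total with the topic's own dependency. The function name `topic_is_activated` is hypothetical, and the context is assumed to be a set of lexemeIDs already detected by some earlier scanning step.

```python
from typing import Dict, Set

# The topic-oriented "Schooling" topic, as a plain Python dict.
SCHOOLING: Dict = {
    "topicTitle": "Schooling",
    "topicID": "schooling",
    "members": [
        {"lexemeID": lid, "weight": 10, "dependency": 99999}
        for lid in ("teacher1", "student1", "blackboard1",
                    "teach1", "learn1", "textbook1")
    ],
    "dependency": 30,
}


def topic_is_activated(topic: Dict, context: Set[str]) -> bool:
    """Hypothetical helper: the topic is activated when the members present
    in the context reach a total weight >= the topic-level dependency."""
    present_weight = sum(
        m["weight"] for m in topic["members"] if m["lexemeID"] in context
    )
    return present_weight >= topic["dependency"]


print(topic_is_activated(SCHOOLING, {"teacher1", "student1", "learn1"}))  # three members
print(topic_is_activated(SCHOOLING, {"teacher1", "student1"}))            # only two
```

Because every member carries the standard weight of 10, a topic-level dependency of 30 is equivalent to "at least three member lexemes are present"; unequal weights would let strongly indicative lexemes count for more.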
Note that a mission profile for topic-oriented teaching would usually use a larger context window size
(totContextWindowSize) than mission profiles for word-oriented teaching (wotContextWindowSize),
because determining the topic of a large portion of text requires a larger scan window than
determining a word's sense.
Note that a mission profile can choose either word-oriented teaching or topic-oriented teaching (via the
"paradigm" item), but not both, because running both paradigms at the same time would involve
unnecessary additional engineering complexity to handle conflicts between the two.
1.1.1.4.3. ATLAS User Profiles
An ATLAS user profile ("user profile" for short) is a plain text file with the file extension ".user.json"
that specifies a user's identity, preferences and learning records.
A Sample User Profile
I will first show you a sample user profile "sample.user.json" and then explain it.
{
    "dataFormat": "ATLAS User Profile Format 0.01",
    "name": "Ziyuan Yao",
    "moreInfo": "Email: yaoziyuan@gmail.com",
    "preferences":
    {
        "learnSameWordFromMultipleParties": false,
        "microLessonDensity": {"unit": "word", "value": 100}
    },
    "wotLearningRecords":
    [
        {
            "L1": "English",
            "L2": "Simplified Chinese",
            "party": "yaoziyuan@gmail.com",
            "lexemeID": "teacher1",
            "lemma": "teacher",
            "microLessonsCompleted": 1
        },
        {
            ...
        }
    ],
    "totLearningRecords":
    [
        {
            "L1": "English",
            "L2": "Simplified Chinese",
            "party": "yaoziyuan@gmail.com",
            "topicID": "schooling",
            "microLessonsCompleted": 1
        },
        {
            ...
        }
    ]
}
The structured format you see in the above sample user profile is called JSON (JavaScript Object
Notation). Ideally, user profiles will be handled by ATLAS automatically, but if you're brave enough to
edit one manually, you should learn JSON first, which is very quick to learn.
dataFormat specifies the format version that this user profile uses.
name and moreInfo identify the user.
learnSameWordFromMultipleParties is a boolean value that specifies whether the user is willing to
learn lessons about the same word offered by multiple parties. See "party" in Section 1.1.1.4.2 "ATLAS
Mission Profiles". If false, lessons for the same word offered by new parties will be skipped.
microLessonDensity specifies the distance between two adjacent micro-lessons inserted into a Web
page, i.e. how frequently ATLAS should insert micro-lessons.
wotLearningRecords is a list of lexemes whose L2 micro-lessons have been or are being taught to the
user via word-oriented teaching.
totLearningRecords is a list of topics whose L2 micro-lessons have been or are being taught to the
user via topic-oriented teaching.
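To make the effect of learnSameWordFromMultipleParties concrete, here is a minimal Python sketch of how a client might decide whether to teach a lexeme offered by a given party. It is an assumption-laden illustration, not ATLAS code: the function name `should_teach` is hypothetical, and "the same word" is assumed to be identified by matching the "lemma" field within the same L1/L2 pair.

```python
from typing import Dict

# A trimmed-down user profile following the sample above.
PROFILE: Dict = {
    "preferences": {"learnSameWordFromMultipleParties": False},
    "wotLearningRecords": [
        {
            "L1": "English",
            "L2": "Simplified Chinese",
            "party": "yaoziyuan@gmail.com",
            "lexemeID": "teacher1",
            "lemma": "teacher",
            "microLessonsCompleted": 1,
        }
    ],
}


def should_teach(profile: Dict, l1: str, l2: str, party: str, lemma: str) -> bool:
    """Hypothetical rule: if the user does not want the same word from several
    parties, skip a lemma that already has a record from a different party."""
    if profile["preferences"]["learnSameWordFromMultipleParties"]:
        return True
    for record in profile["wotLearningRecords"]:
        same_word = (record["L1"], record["L2"], record["lemma"]) == (l1, l2, lemma)
        if same_word and record["party"] != party:
            return False
    return True


# A new party offers "teacher": skipped, because another party already teaches it.
print(should_teach(PROFILE, "English", "Simplified Chinese", "other@example.com", "teacher"))
# The original party continues its own lessons.
print(should_teach(PROFILE, "English", "Simplified Chinese", "yaoziyuan@gmail.com", "teacher"))
```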
1.1.1.4.4. Data Acquisition Strategies
Since ATLAS relies on mission profiles to provide L1-driven L2 teaching in the user's Web browser, it
is very important to discuss how such mission profiles can be produced. Generally, I see two
approaches:
Let Second Language Teachers Manually Prepare Mission Profiles
Teachers can design lexemes and topics based on:
- what topics their students most likely browse online (e.g. sports, entertainment, fashion,
  gaming, etc.);
- what topics they're going to teach in the near term;
- general-purpose topics that are likely to appear in any Web page (topics that group related
  general-purpose words, such as a topic that groups "if", "then" and "else", or another topic that
  groups "ask" and "answer").
Teachers can also form workgroups that pool their efforts together, using the same party value to
coordinate their mission profiles (see Section 1.1.1.4.2).
Let Computational Linguists Automatically Generate Mission Profiles
It is even possible for computational linguists to automatically generate ATLAS mission profiles from
relevant data sources. For example, a lexeme can correspond to a Wikipedia article, and a computer
program can automatically find links on that article to topically related lexemes (articles), thus
automatically generating topics that group related lexemes together. Wikipedia also connects
multilingual versions of the same lexeme (article) together, which can be useful for automatically
generating L2 micro-lessons for a given lexeme.
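The link-based idea can be sketched as follows. This is a toy illustration only: the `LINKS` dictionary is made-up sample data standing in for links extracted from encyclopedia articles, the function name `build_topic` is hypothetical, and using article titles directly as lexemeIDs is an assumption. No real Wikipedia API is called.

```python
from typing import Dict, List

# Made-up sample data: article title -> titles it links to.
LINKS: Dict[str, List[str]] = {
    "Fishing": ["Sea", "Bass (fish)", "Boat"],
    "Music": ["Song", "Bass (music)"],
}


def build_topic(title: str, links: Dict[str, List[str]],
                weight: int = 10, dependency: int = 10) -> Dict:
    """Hypothetical generator: group an article and the articles it links to
    into one topic, giving every member the same default weight and dependency."""
    titles = [title] + links.get(title, [])
    return {
        "topicTitle": title,
        "members": [
            {"lexemeID": t, "weight": weight, "dependency": dependency}
            for t in titles
        ],
    }


topic = build_topic("Fishing", LINKS)
print([m["lexemeID"] for m in topic["members"]])
```

A real generator would need to filter out links that are not topically related (navigation links, dates and so on) and tune weights per lexeme; the uniform defaults here are only placeholders.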
1.1.1.5. A Newer Design: A Data-Driven Approach
In May 2012 I realized that a data-driven approach, which doesn't require language teachers to
manually produce "mission profiles", is more practical. Below is my new idea posted to Corpora-List
and other computational linguistics mailing lists. I intend to develop this idea into a Chrome browser
extension.
"Language Immersion for Chrome", and a Better Idea
Google's "Language Immersion for Chrome"
Recently a Chrome browser extension called "Language Immersion for Chrome" has
been much publicized. Developed by "Use All Five Inc." on behalf of Google, the
extension translates certain words and phrases on the Web page you're browsing to
a foreign language via Google Translate, for the purpose of helping you learn
that foreign language while browsing the Web.
I have been researching this kind of thing for years, and one of my main
standpoints is machine translation shouldn't be used in serious language learning
as it is error-prone: it takes a learner a great effort to memorize a piece of
erroneous knowledge, another great effort to "unlearn" this wrong knowledge and
yet another great effort to "relearn" the right knowledge.
But I do understand online machine translation services like Google Translate and
Bing Translator are so readily available that directly using them to do the
translation can minimize development costs. Upon seeing this news, I asked
myself: "Can we use a kind of freely available, manually prepared data, instead
of machine translation, to do this better?" And the answer is YES!
A Better Idea
Imagine if we have a database of manually-translated bilingual sentence pairs
(such as those multilingual movie subtitle files on those subtitle websites, and
those famous quotations on wikiquote.org), e.g.
(German) Er ist ein guter Schüler.
(English) He is a good student.
Now if a German wants to learn English, and he happens to be browsing a German
Web page that contains the German word "Schüler" (student), and the computer
finds out that this German word also occurs in a bilingual sentence pair like the
above. Now, the computer can teach English for this German word, by inserting the
above bilingual sentence pair into that Web page, like an embedded advertisement.
This way, the German will learn the English word "student", and better yet, learn
it in a bilingual sentence pair! This means he will not only learn the word
"student" alone, but also its syntax, semantics and pragmatics, all implied by
this example sentence. As to phonetics, the computer can use text-to-speech to
read aloud the English sentence, or display some kind of pronunciation guide
above or alongside the English sentence (see my recent project "Phonetically
Intuitive English" for such a pronunciation aid:
https://sites.google.com/site/phoneticallyintuitiveenglish/).
That's the basic idea. But of course we can further refine this idea. For
example, if there are multiple bilingual sentence pairs containing "Schüler", the
computer can prefer a pair that contains words that appear near "Schüler" on the
Web page (i.e. context words). This would be very useful if the word in question
(Schüler) is ambiguous.
Besides bilingual sentence pairs, we may also explore multilingual data from
Wiktionary and Wikipedia, although their usage may not be as straightforward as
the model discussed above. I leave this as homework for the reader.
I also intend to develop a Chrome extension based on the idea discussed above :-)
Best Regards,
Ziyuan Yao
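The refinement mentioned in the post, preferring a sentence pair that shares context words with the Web page, might be sketched like this. It is a simplified illustration under stated assumptions: the function names `tokenize` and `best_pair` are hypothetical, the second sentence pair is invented for the example, and word overlap is counted naively without stemming or stop-word removal.

```python
import re
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (L1 sentence, L2 sentence)

PAIRS: List[Pair] = [
    ("Er ist ein guter Schüler.", "He is a good student."),
    ("Der Schüler spielt Fußball im Park.", "The student plays football in the park."),
]


def tokenize(text: str) -> Set[str]:
    """Lower-cased set of word tokens (\\w is Unicode-aware for str patterns)."""
    return set(re.findall(r"\w+", text.lower()))


def best_pair(word: str, page_text: str, pairs: List[Pair]) -> Optional[Pair]:
    """Among pairs whose L1 sentence contains `word`, pick the one sharing the
    most other words with the page; return None if no pair contains the word."""
    target = word.lower()
    context = tokenize(page_text) - {target}
    candidates = [p for p in pairs if target in tokenize(p[0])]
    if not candidates:
        return None
    return max(candidates, key=lambda p: len(tokenize(p[0]) & context))


page = "Im Park spielt ein Schüler Fußball."
print(best_pair("Schüler", page, PAIRS))
```

For this page the second pair shares four context words with the page while the first shares one, so the second pair is selected.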
1.1.2. Word Mnemonics
The L1-driven L2 teaching (L1DL2) method discussed in Section 1.1.1 already implies an approach
to word memorization: by repetition (a new word is taught and practiced in a series of micro-lessons
before it is considered learned). Research into more sophisticated mnemonics has unveiled methods
that can serve as powerful force multipliers for L1DL2, which will be presented in the following
sections. Among them, "Phonetically Intuitive English" and "Etymology and Free Association" are
recommended by this ebook as "essential mnemonics" and I will prove why (Section 1.1.2.1.3).
Phonetically Intuitive English (essential): Memorizing a word in terms of syllables takes far less
effort than in terms of letters, and therefore pronunciation, as a more compressed form than spelling,
is a key mnemonic. Section 1.1.2.1.1 "Phonetically Intuitive English" presents an approach that
integrates a word's pronunciation into its spelling, enforcing correct, confident and firm memorization
of pronunciation, which in turn effectively facilitates memorization and recall of spelling.
Etymology and Free Association (essential): Many words are known to be built on smaller
meaningful units known as word roots and affixes, or derived from related words. Knowing frequently
used roots and affixes and a new word's etymology can certainly help the user memorize the new word
in a logical manner. Even if a word is not etymologically associated with any word, root or affix,
people can still freely associate it with an already known word that is similar in form (either in written
form or in spoken form) and, optionally but desirably, related in meaning. Section 1.1.2.1.2
"Etymology and Free Association" revisits these widely known methods.
Orthographically Intuitive English: Certain parts of a long word can be so obscure that they are often
ignored even by native speakers, such as a word's choice between "-ance" and "-ence". Section
1.1.2.2.1 "Orthographically Intuitive English" discusses an approach that deliberately "amplifies" such
"weak signals", so that the learner gets a stronger impression.
Progressive Word Acquisition: Section 1.1.2.2.2 "Progressive Word Acquisition" presents an approach
that splits a long word into more digestible parts and eventually conquers the whole word.
Section 1.1.2.3 "Principles Learned" extracts several "principles of word memorization" from the
methods discussed in earlier sections, giving us a more fundamental understanding of why these
methods work.
1.1.2.1. Essential Mnemonics
1.1.2.1.1. Phonetically Intuitive English (PIE)
Note: Phonetically Intuitive English is one of two "essential mnemonics" recommended by this ebook,
the other one being "Etymology and Free Association" (see Section 1.1.2.1.2). A proof of why they are
essential is in Section 1.1.2.1.3.
A Quick Introduction
Phonetically Intuitive English slightly decorates or modifies an English word's visual form (usually by
adding diacritical marks) to better reflect its pronunciation, while retaining its original spelling. A word
can be displayed in this form as it is taught to a non-native learner for the first few times (e.g. by L1-
driven L2 teaching; see Section 1.1.1), in order to enforce correct, confident and firm memorization of
pronunciation as early as possible, which in turn also facilitates effective memorization of spelling.
A full-fledged PIE sentence may look like this (view with a Unicode font with advanced typography
features such as the free and open source "SIL Andika Basic"; in practice, a browser add-on that shows
PIE text in a browser will always enforce such fonts for such text to ensure good rendering):
A quick brown fox jumps over the lazy dog
The above example shows pronunciation in a very verbose mode: "A", "u", "c", "o", "e", "t", "a" and
"y" are assigned diacritics to differentiate from their default sound values; "w" and "h" have a short bar
which means they're silent; multi-syllable words such as "over" and "lazy" have a dot to indicate stress.
Such a mode is intended for a non-native beginner of English, who is unaware of digraphs like "ow",
"er" and "th".
On the other hand, more advanced learners can use a lighter version:
A quick brown fox jumps over the lazy dog
Furthermore, words and word parts (e.g. -tion) that a learner is already familiar with also don't need
diacritics.
Note that PIE is intended as a kind of pronunciation guide that is automatically displayed as the
computer teaches new words (as in L1-driven L2 teaching); it is not intended to be typewritten or
handwritten by a student.
The Chart
The chart of Phonetically Intuitive English 2.0 (PIE2), my latest PIE design, follows.
PHONETICALLY INTUITIVE ENGLISH 2.0

GENERAL MARKS (APPLY TO BOTH VOWEL AND CONSONANT LETTERS)
Default values (mark: -)
  Remarks: Usually omitted unless necessary. Drawn above vowel letters a, e, i, o and u for short
  vowels /æ/, /ɛ/, /ɪ/, /ɒ/ (US /ɑː/) and /ʌ/. Drawn below a consonant letter for that letter's most
  typical consonant value; can be drawn above certain consonant letters such as g and p.
  Examples (if necessary): bat, bet, bit, bot, but; bat
Silence (marks: a dot above, or - through)
  Remarks: A dot drawn above a vowel or consonant letter silences that letter. For certain letters
  such as i, a short horizontal line is drawn through them instead.
  Examples: take, pencil
Unsupported values
  Remarks: Drawn above a vowel or consonant letter to mean it has a sound value not supported by
  PIE2.
  Examples: one

VOWEL MARKS (ALWAYS ABOVE VOWEL LETTERS)
Short vowels (marks: - for default values; dedicated diacritics for custom values)
  Remarks: The "default value" mark (-) can be used when a vowel letter produces its default short
  vowel value, namely /æ/, /ɛ/, /ɪ/, /ɒ/ (US /ɑː/) and /ʌ/ for a, e, i, o and u. In case a vowel letter
  produces a short vowel other than its default value, a dedicated diacritic is used to represent each
  such vowel: one each for /ɛ/, /ɪ/, /ɒ/ (US /ɑː/), /ʌ/ and /ʊ/.
  Examples: (default) bat, bet, bit, bot, but; (custom) any, busy, swap, son, put
Long vowels (letter names)
  Remarks: When this mark is drawn above a, e, i/y, o or u/w, the letter has a long vowel sound
  equal to its name: /eɪ/, /iː/, /aɪ/, /əʊ/ (US /oʊ/) or /juː/. u /juː/ also has a weak variant, u /jʊ/.
  Examples: take, eve, nice, mode, cute; cure
Long vowels (Middle English-like)
  Remarks: In Middle English, a, e, i, o and u used to have long vowels /aː/, /eː/, /iː/, /oː/ and /uː/.
  In Modern English, they can have similar sounds /ɑː/, /eɪ/, /iː/, /ɔː/ and /uː/. These Middle
  English-like sounds are represented in PIE by drawing this mark above a, e, i/y, o and u/w. A good
  mnemonic is that the mark is simply a 90° rotation of /ː/, while the IPA letter before /ː/ becomes
  its corresponding Latin letter.
  Examples: father, café, machine, cord, brute
Long vowels (additive cases; mark: --)
  Remarks: When two -'s are added above ei/ey or oi/oy, it means /eɪ/ or /ɔɪ/. Note that these two
  -'s can be omitted unless necessary. When two long-vowel marks are added above two adjacent
  letters such as oo, it means /uː/.
  Examples: eight, boy; food
Long vowels (special cases; marks: \\ and //)
  Remarks: \\ The letter a has a special case where it pronounces the long vowel /ɔː/, and \\ is
  drawn above a for this case. A good mnemonic is "The word fall has falling strokes above."
  // The digraphs ou, ow and au can produce a long vowel /aʊ/ which is also not accounted for so
  far, and // is drawn above o or a for this case. A good mnemonic is "The word out has outgoing
  strokes above."
  Examples: fall, out
Schwa (mark: \)
  Remarks: When \ is drawn above a vowel letter (a, e, i/y, o, u/w) or r, the letter pronounces /ə/.
  Examples: fella, hour (UK)
Long schwa (mark drawn above r, as in er, ir, ur, ...)
  Remarks: When this mark is drawn above r as in "word", it means this r (along with all adjacent
  vowel letters) pronounces /ɜː/.
  Examples: word (UK)

CONSONANT MARKS (USUALLY BELOW CONSONANT LETTERS)
Default values (mark: -)
  Remarks: Usually omitted unless necessary. Drawn below a letter for that letter's most typical
  consonant value; can be drawn above certain consonant letters such as g and p.
  Examples: bat; quick
Secondary values
  Remarks: Drawn below or above a consonant letter to represent usually the second most typical
  value for that consonant letter, e.g. /dʒ/ for d or g, /k/ for c, /ŋ/ for n, /v/ for f, /θ/ for t, /z/
  for s, /ɡz/ for x.
  Examples: soldier, class, sing, of, thin, is, example
Tertiary values
  Remarks: Drawn below or above a consonant letter to represent usually the third most typical
  value for that consonant letter, e.g. /ð/ for t, /f/ for g or p (in order to align with g in this case,
  p has no secondary value), /t/ for d, /z/ for x.
  Examples: this, cough, phone, booked, xanadu
R-colored
  Remarks: When r /ə/ and r /ɜː/ are combined with a - (the "default value" mark) below, they get
  an additional /r/: /ər/ and /ɜːr/, as commonly heard in American English.
  Examples: hour (UK) / hour (US); word (UK) / word (US)
/tʃ/, /ʃ/ and /ʒ/
  Remarks: One of three marks below certain consonant letters (s, c, t, z) denotes /tʃ/, /ʃ/ or /ʒ/.
  A good mnemonic is that /tʃ/, /ʃ/ and /ʒ/ can correspond to three typical digraphs "ch", "sh" and
  "zh", and the three marks resemble the lower left parts of these digraphs (i.e. the bottoms of "c",
  "s" and "z").
  Examples: chair, actual; shirt, action, machine; version

STRESS MARK (BELOW A SYLLABLE'S PRIMARY VOWEL LETTER)
Primary stress
  Remarks: Drawn below the stressed syllable's primary vowel letter.
  Examples: pronunciation
Why PIE
To learn a new word, two tasks, among others, are involved: learning its pronunciation and its spelling.
These two tasks are related, and choosing which to do first makes a lot of difference. If we learn
spelling first, we'd be memorizing a (usually long) sequence of letters, which is as tedious as
remembering a long telephone number. But if we learn pronunciation first, we'd be memorizing a much
shorter sequence of syllables, which can be done in a breeze; then pronunciation can serve as a good
catalyst for the subsequent memorization of spelling.
Therefore pronunciation plays a prominent role in word acquisition, and it is worthwhile finding a
good method to learn it.
A big drawback of IPA is that, because it shows pronunciation separately from spelling, it gives the
user a chance to skip learning pronunciation altogether. This is especially the case when the user
encounters an unknown word while reading an article: at that moment, the user cares most about the
meaning of that new word, not the pronunciation, as he doesn't need to hear or say that word in real
life in the near future. Therefore he is very likely to skip looking up the word's true pronunciation in a
dictionary, and instead make a guessed pronunciation on his own. Making a guessed pronunciation will
then lead to two new problems: (a) because the user is a non-native speaker of English, his guessed
pronunciation will tend to be error-prone, and therefore he won't dare to commit this guessed
pronunciation to his long-term memory very firmly, lest it be difficult to "upgrade" the guessed
pronunciation to the correct pronunciation in the future; (b) the longer the word is, the more
uncertainties there are in guessing a pronunciation, making the guesswork more error-prone, and
therefore it's very likely that the user won't dare to guess a complete pronunciation; he would only
guess the first two syllables and then jump to the end of the word. For example, I used to memorize
"etymology" as just "ety...logy", "ubiquitous" as just "ubi...ous", "thesaurus" as just "thes...us", and so
on; this results in both an incomplete, guessed pronunciation and an incomplete spelling in the user's
memory.
PIE, on the other hand, eliminates the problems discussed above. Correct pronunciation is made
immediately available to the user as he scans through a word's spelling; there is no need to make a
"guessed pronunciation" at all. The user will memorize the correct, complete pronunciation firmly,
which in turn will facilitate memorization of the complete spelling.
Technical Analysis
There are several technical approaches to "adding something above normal text". You can design a
special font that draws letters with diacritics, or Web browser extensions, plugins and server-side
scripts that dynamically generate graphics from special codes (e.g. MathML), or HTML "inline tables"
as used in an implementation of "Ruby text" (http://en.wikipedia.org/wiki/Ruby_character,
http://web.nickshanks.com/stylesheets/ruby.css), or systems that use two Unicode features: "pre-
composed characters" (letters that come with diacritics right out of the box) and "combining
characters" (special characters that don't stand alone but add diacritics to other characters). The PIE
scheme shown above makes use of some combining characters.
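The difference between pre-composed and combining characters can be seen with Python's standard unicodedata module. This is a general Unicode illustration, not the PIE Transformer's code: the helper name `add_mark` is hypothetical, and the combining macron (U+0304) is chosen only as a familiar example mark, not because the PIE chart is known to use it.

```python
import unicodedata

COMBINING_MACRON = "\u0304"  # a combining character: it attaches to the preceding letter


def add_mark(letter: str, mark: str) -> str:
    """Hypothetical helper: append a combining mark, then normalize to NFC so
    that a pre-composed character is used whenever Unicode defines one."""
    return unicodedata.normalize("NFC", letter + mark)


a_macron = add_mark("a", COMBINING_MACRON)
q_macron = add_mark("q", COMBINING_MACRON)

# "a" + macron has a pre-composed form (U+0101), so NFC yields one code point.
print(a_macron, len(a_macron), unicodedata.name(a_macron))
# "q" + macron has no pre-composed form, so it stays as two code points.
print(q_macron, len(q_macron))
```

Because only some letter-mark combinations exist pre-composed, a scheme that decorates arbitrary letters has to rely on combining characters, which in turn depends on fonts that position the marks well.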
In the making of the PIE scheme, I consulted these Wikipedia articles:
- English spelling: spelling-to-sound and sound-to-spelling patterns in English
  (http://en.wikipedia.org/wiki/English_spelling#Spelling_patterns)
- Combining character: tables of combining characters in Unicode
  (http://en.wikipedia.org/wiki/Combining_character)
- Pronunciation respelling for English: comparison of respelling schemes in major dictionaries
  (http://en.wikipedia.org/wiki/Pronunciation_respelling_for_English)
Implementation
A free and open source Chrome browser extension, "PIE Transformer", that automatically adds PIE
diacritics to English words on Web pages was developed in April 2012, and its official website is
https://sites.google.com/site/phoneticallyintuitiveenglish/. It has been reported by PCWorld
(http://www.pcworld.com/article/2599A0/pie_a_chrome_extension_that_can_help_with_reading_and_p
ronunciation.html) and numerous other news media.
Historical Notes
The general idea of representing a letter's various sound values by additional marks is probably as old
as diacritics.
American dictionaries before the 20th century showed diacritical marks directly above headwords to
indicate pronunciation for native readers (though not necessarily verbosely). They have been replaced
by separate transcription schemes, such as the IPA and respelling systems.
Adding diacritics to English for non-native learners is an obscure method that has never gone
mainstream. In China the method was probably first published by George Siao, a retired
professor, in the 1980s. He finalized his phonetic transcription scheme in the 1990s based on that of
Webster's Dictionary and named it the Simple and Easy Phonetic Marks. Other people
have created more schemes derived from Siao's. Some of them introduced more IPA-like diacritics.
I independently came up with this idea in March 2009 (in a Usenet post) and created a scheme
called Phonetically Intuitive English (PIE), based on Unicode. The scheme's chart is shown above.
The scheme is designed with the aim of being the easiest to learn among all existing schemes of its kind.
1.1.2.1.2. Etymology and Free Association
Note: Etymology and Free Association is one of two "essential mnemonics" recommended by this
ebook, the other one being "Phonetically Intuitive English" (see Section 1.1.2.1.1). A proof of why they
are essential is in Section 1.1.2.1.3.
Many words are known to be built on smaller meaningful units known as word roots and affixes, or
derived from related words. Knowing frequently used roots and affixes and a new word's etymology
can certainly help the user memorize the new word in a logical manner. For example, "memorize"
comes from a related word, "memory", and a common suffix, "-ize".
Even if a word is not etymologically associated with any word, root or affix, people can still freely
associate it with an already known word that is similar in form (either in written form or in spoken
form) and, optionally but desirably, related in meaning. This already known word can come from the
target foreign language, or from the learner's native language. For example, as a Chinese, when I first
encountered the word "sonata" in a multimedia encyclopedia as a teenager, I associated it with a
traditional Chinese musical instrument, the suona, which was featured in an elementary school music
class and bears a similar pronunciation to the "sona" part of "sonata". It should also be noted that, as
said earlier, sometimes words serving as mnemonics are not necessarily related in meaning to the word
to be memorized. For example, to memorize the word "Oscar", we can associate it with two
known words, "OS" (operating system) and "car", although they have nothing to do with Oscar in
meaning.
Therefore it is useful to let people contribute native language-based and target language-based
mnemonics collaboratively online. Wiktionary might be a potential site for such collaboration.
1.1.2.1.3. Why Are They Essential? A Proof
Below I will prove why "Phonetically Intuitive English" and "Etymology and Free Association" are the
two and only two "essential mnemonics", by analogizing word memorization strategies to data
compression strategies in computer science.
We will enumerate strategies in data compression, and try to find their counterparts in word
memorization:
Compression "# removing use!ess portions: 48 a portion o8 a 8ile is uselessF !e can simply #elete it
'e8ore !e compress the 8ile. 4n the case o8 !or# memorizationF the same strategy is use# 'y
7honetically 4ntuiti%e <nglish 274<3F 'ecause 74< lets you memorize a !or# 'y pronunciation rather
than spellingF an# pronunciation #oesnEt ha%e as many useless phones as spelling !hen 'oth are rea#
alou#. -or e=ampleF the !or# JthesaurusK has a pronunciation that ta"es ( sylla'les: the L sau L rusF 'ut
it has a spelling that ta"es ? sylla'les !hen rea# alou#: tee L aych L ee L es L ay L you L are L you L es.
his is e=actly 'ecause the latter has many useless phones compare# to the 8ormer.
Compression "# 0inding redundant portions: 48 t!o portions o8 a 8ile is i#enticalF !e only nee# to
represent this i#entical content in the compression result once. his strategy is also use# in !or#
memorization: 48 !e ha%e alrea#y memorize# J+>K an# JcarKF !e !ill 8in# it easy to memorize the
spelling o8 J+scarK 'ecause it simply is the com'ination o8 t!o parts that !eEre alrea#y 8amiliar !ith.
here8oreF J<tymology an# -ree &ssociationK actually uses the same strategy as 8in#ing re#un#ant
portions in #ata compression.
Compression by re-coding portions: Data compression has a third strategy which uses shorter codes
to encode more frequently occurring portions of a file. This is seen in Huffman coding, for example. In
word memorization, this strategy is not used, because re-coding portions of a word when you memorize
it would mean manually un-re-coding these portions when you want to reproduce the word, which is
prohibitively hard for a human.
Compression by finding the data-generating algorithm: The “algorithmic information complexity”
field in computer science says that if you can find an algorithm that generates the given data, you
don't need to save the data but just the algorithm. Likewise, if we know why a meaning takes a
particular word form in a language, we would be able to generate that word form based on the meaning,
without memorizing the form itself. Words formed by roots and affixes indeed have some connection
between meaning and form, but what about single-syllable words like “cat”, “ant” and “blog”? We can
trace cat's etymology back to L.L. cattus, which has no earlier etymology and therefore can be
considered as randomly formed. We can trace ant to W.Gmc. *amaitjo (*ai- “off, away” + *mait- “cut”).
Although *amaitjo has a connection between its form and meaning, this connection gets lost when
*amaitjo further transforms to ant, which can be considered as “randomized”. Similarly, blog derives
from web log, a meaningful phrase, but when web is shortened to just b, the connection between meaning
and form is weakened. Therefore, we cannot always find a good connection between a word's meaning and
form, so we can't just memorize this “connection” to reproduce the word's form.
Therefore we now know “Phonetically Intuitive English” and “Etymology and Free Association” are
essential word memorization strategies as the same strategies are used in data compression.
1.1.2.2. Other Mnemonics
1.1.2.2.1. Orthographically Intuitive English (OIE)
PIE in Section 1.1.2.1.1 essentially encodes a word's pronunciation into its spelling. This spawns a
symmetric question: can we encode a word's spelling into its pronunciation as well? For example,
“reference” and “insurance” have suffixes that sound the same but are spelled differently (-ence and
-ance), so can we slightly modify these suffixes' pronunciations to reflect their spelling difference?
Can we give -ance a rising tone and -ence a falling tone? This sounds Chinese and would create new
dialects for English that lead to chaos in conversations.
However, if we think outside the box, if we no longer try to “put information into pronunciation”, we
may be able to explore other avenues. What about putting this information into a word's visual form?
What about lowering the “a” in -ance a little so that it makes a different impression on the learner? So
we have
insurₐnce
in contrast to
reference
Makes a difference, doesn't it? If the learner develops visual memory that “insurance” has a lowered
character in its suffix, then he can infer that this suffix is -ance because -ance has a lowered “a” while
-ence doesn't have anything lowered.
We call this “Orthographically Intuitive English” (OIE).
Technical Analysis
Like PIE, OIE makes slight modifications to a word's visual form to add some extra information.
Therefore the two can share the same rendering techniques. In the above example, most
document formats that support rich formatting should allow us to raise or lower a character from its
baseline. In particular, in HTML, we can use the <span> tag and its “vertical-align” style property:
<p>insur<span style="vertical-align: -15%">a</span>nce</p>
which will lower the “a” in “insurance” by 15%, making the word look like
insurₐnce
Of course, we can also encapsulate the style property into a CSS class so that the above HTML code
can shrink to something like
<p>insur<span class="lowered">a</span>nce</p>
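Markup of this kind could be generated automatically. The sketch below is a minimal illustration, assuming a hypothetical helper named oie_markup and a stylesheet that defines the “lowered” class; how the position of the character to lower is chosen is left open here.

```python
from html import escape


def oie_markup(word, index, css_class="lowered"):
    """Wrap the character at `index` in a <span> so that a stylesheet can lower it."""
    if not 0 <= index < len(word):
        raise ValueError("index must point at a character inside the word")
    before, target, after = word[:index], word[index], word[index + 1:]
    return (
        f"<p>{escape(before)}"
        f'<span class="{escape(css_class)}">{escape(target)}</span>'
        f"{escape(after)}</p>"
    )


# Lower the "a" of the -ance suffix (character position 5 in "insurance").
print(oie_markup("insurance", 5))
```

Keeping the visual rule in a CSS class rather than an inline style means the amount of lowering can be tuned in one place for every marked word.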
Historical Notes
I came up with this idea in August 2009 (Usenet post, thread).
1.1.2.2.2. Progressive Word Acquisition (PWA)
In L1-driven L2 teaching (see Section 1.1.1), long words are optionally split into small segments
(usually two syllables long) and taught progressively, and even practiced progressively. This lets the
user learn just a little bit each time and pay more attention to each bit (so that he wouldn't just learn an
incomplete form of a word as discussed in Section 1.1.2.1.1). For example, when
科罗拉多州
(Chinese for “Colorado”) first appears in a Chinese person's Web browser, the computer inserts Colo'
after it (optionally with Colo's pronunciation):
科罗拉多州 (Colo')
When 科罗拉多州 appears for the second time, the computer may decide to test the user's memory about
Colo' so it replaces 科罗拉多州 with
Colo' (US state)
Note that a hint such as “US state” is necessary in order to differentiate this Colo' from other words
beginning with Colo.
For the third occurrence of 科罗拉多州, the computer teaches the full form, Colorado, by inserting it
after the Chinese occurrence:
科罗拉多州 (Colorado)
At the fourth time, the computer may totally replace 科罗拉多州 with
Colorado
Not only can the foreign language element (Colorado) emerge gradually, the original native language
element (科罗拉多州) can also gradually fade out, either visually or semantically (e.g. 科罗拉多州 →
美国州 → 地名 → …, which means Colorado → US state → place name → …). This prevents the learner
from suddenly losing the Chinese clue, while also engaging him in active recalls of the occurrence's
complete meaning (科罗拉多州) with gradually reduced clues.
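The four-step schedule above can be sketched as a small function. This is only an illustration: the function name pwa_rendering, its parameters and the fixed four-stage schedule are assumptions made for the example, and the Chinese string used for “Colorado” is likewise an assumed rendering.

```python
def pwa_rendering(native, foreign, partial, hint, occurrence):
    """Return what the reader sees at the given 1-based occurrence of `native`."""
    if occurrence <= 1:
        return f"{native} ({partial})"   # teach the first segment
    if occurrence == 2:
        return f"{partial} ({hint})"     # test the first segment, with a hint
    if occurrence == 3:
        return f"{native} ({foreign})"   # teach the full form
    return foreign                       # replace the native word entirely


for n in range(1, 5):
    print(n, pwa_rendering("科罗拉多州", "Colorado", "Colo'", "US state", n))
```

A real system would presumably advance a stage only when the learner recalls the previous one correctly, rather than counting occurrences alone.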
1.1.2.3. Principles Learned
This section discusses several “principles of word memorization” learned from word memorization
methods discussed in previous sections, giving us a more fundamental understanding of why these
methods work.
Principle of Repetition (used in: L1-Driven L2 Teaching): The more times you learn or use a word,
the better you memorize it. This is why L1-driven L2 teaching teaches and practices a new word
several times in context before considering it as learned by the user.
Principle of Segmentation (used in: Progressive Word Acquisition): A very long word had better be
split into smaller segments and taught gradually. This is the rationale for “Progressive Word
Acquisition”.
Principle of Amplification (used in: Orthographically Intuitive English): “Orthographically Intuitive
English” deliberately amplifies the difference between similar spellings such as “-ence” and “-ance”,
giving the learner a stronger impression and hence better memorization.
Principle of Condensation (used in: Phonetically Intuitive English): You pronounce a word more
quickly than you spell it, so pronunciation is a more condensed form than spelling and takes much less
effort to memorize. Furthermore, pronunciation can facilitate the memorization of spelling. So
pronunciation should play an early and critical role in word acquisition. “Phonetically Intuitive
English” embodies this principle.
Principle of Association (used in: Etymology and Free Association): Memorizing a new word can be
made easier if we can reuse already memorized information (words, roots and affixes) to reconstruct it.
“Etymology and Free Association” associates a new word with known information etymologically or
freely.
Principle of Confidence (used in: Phonetically Intuitive English): We're willing to memorize a piece
of information more firmly if we're sure about its long-term validity and correctness. Otherwise we
would fear that it would get updated or corrected sooner or later, invalidating what we had already
memorized and forcing us to make a great effort to “unlearn” the invalidated version and learn the
updated or corrected version anew. Therefore, “Phonetically Intuitive English” prevents us from
guessing a new word's pronunciation wrong, and teaches its correct pronunciation from the very
beginning.
Principle of Integration (used in: L1-Driven L2 Teaching, Phonetically Intuitive English,
Orthographically Intuitive English): Often we are not motivated or self-disciplined enough to learn
something, but the computer can “integrate” it into something else that we're highly motivated to
engage in. For example, we may not want to learn a foreign language word without any context or
purpose, but L1-driven L2 teaching can put it in our daily native language reading experience; we may
not be very interested in learning a word's pronunciation when it is put separately from the word's
spelling, but “Phonetically Intuitive English” can integrate it into the word's spelling; we may not pay
attention to the “-ence vs. -ance” difference in a word, but “Orthographically Intuitive English” can
reflect this difference in the word's overall visual shape, to which we do pay attention.
1.2. Foreign Language Writing Aids
A person with some foreign language knowledge may still need assistance to better write in that
language. This section discusses how novel tools can assist a non-native user in writing.
1.2.1. Predictive vs. Corrective Writing Aids
In contrast to language learning methods such as L1-driven L2 teaching, which builds up the user's
foreign language incidentally on a long-term basis, the user also needs just-in-time (“on-demand”)
language support that caters to his immediate reading/writing needs. This is especially true of writing,
which requires “productive knowledge” that is often ignored in reading, such as a word's correct syntax
and applicable context.
On-demand writing aids can be divided into two types:
Predictive writing aids predict lexical, syntactic and topical information that might be useful in the
upcoming writing, based on clues in previous context. Sections 1.2.2 and 1.2.3 discuss two such tools,
one for making syntactically valid sentences, the other for choosing topically correct words and larger
building blocks such as essay templates.
Corrective writing aids retroactively examine what has just been input for possible errors and
suggestions. A spell checker is a typical example, which checks for misspellings in input. Corrective
writing aids are a much researched area, as most natural language analysis techniques can be applied to
examine sentences for invalid usages, and there are studies on non-native writing phenomena such as
wrong collocations. Therefore this ebook does not expand on this topic.
1.2.2. Input-Driven Syntax Aid (IDSA)
As a non-native English user inputs a word, e.g. “search”, the word's sentence-making syntaxes are
provided by the computer, e.g.
v. search: n. searcher search... [n. search scope] [for n. search target]
which means the syntax for the verb “search” normally begins with a noun phrase, the searcher, which
is followed by the verb's finite form, then by an optional noun phrase which is the search scope, and
then by an optional prepositional phrase stating the search target.
With this information, the user can now write a syntactically valid sentence like
I'm searching the room for the cat.
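One way such a syntax entry might be stored and displayed is sketched below. Everything here is an assumption made for illustration: the Slot class, the FRAMES table and the function syntax_hint are hypothetical names, and square brackets are used to mark optional parts.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    role: str              # semantic role, e.g. "searcher"
    category: str          # phrase type, e.g. "n." for a noun phrase
    preposition: str = ""  # leading preposition, if the slot is a prepositional phrase
    optional: bool = False

    def render(self):
        text = f"{self.category} {self.role}"
        if self.preposition:
            text = f"{self.preposition} {text}"
        return f"[{text}]" if self.optional else text


# Hypothetical frame table with a single verb entry.
FRAMES = {
    "search": {
        "subject": Slot("searcher", "n."),
        "complements": [
            Slot("search scope", "n.", optional=True),
            Slot("search target", "n.", preposition="for", optional=True),
        ],
    }
}


def syntax_hint(verb):
    """Return the one-line syntax hint for `verb`, or None if the verb is unknown."""
    frame = FRAMES.get(verb)
    if frame is None:
        return None
    parts = [frame["subject"].render(), f"{verb}..."]
    parts += [slot.render() for slot in frame["complements"]]
    return f"v. {verb}: " + " ".join(parts)


print(syntax_hint("search"))
```

Storing roles as structured slots rather than as a fixed string lets the same entry drive both the on-screen hint and later checks of what the user actually typed.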
Historical Notes
The underlying theories of this idea are widely known in linguistics as case grammar and frame
semantics.
1.2.3. Input-Driven Ontology Aid (IDOA)
As a non-native English user inputs a word, e.g. “badminton”, things (objects) and relations that
normally co-exist with the word in the same topic are provided to the user as an “ontology” by the
computer, which is a network where there are objects like “racquet”, “shuttlecock” and “playing court”,
relations like “serve” and “strike” that connect these objects, and even full-scripted essay templates like
“template: a badminton game”.
The user can even “zoom in” at an object or relation to explore the microworld around it (for example,
zooming in at “playing court” would lead to a more detailed look at what components a playing court
has, e.g. a net) and “zoom out”, just like how we play around in Google Earth.
The benefits of the ontology aid are twofold. First, the ontology helps the user verify that the “seed
word”, badminton, is a valid usage in his intended topic; second, the ontology pre-emptively exposes
other valid words in this topic to the user, preventing him from using a wrong word, e.g. bat (instead of
racquet), from the very beginning.
In case the ontology does not represent the user's intended topic, this means the seed word is wrong. In
this case, the computer can guess the user's intended topic based on previous context, show an ontology
that represents this topic, and let the user choose a right seed word from this ontology.
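A very small ontology of this kind can be sketched as nested dictionaries. The structure, the function names and the “player” node below are all assumptions made for the example; a real ontology would be far larger and would come from curated or mined data.

```python
# Hypothetical toy ontology: topic -> objects, relations (subject, relation, object), templates.
ONTOLOGY = {
    "badminton": {
        "objects": ["racquet", "shuttlecock", "playing court"],
        "relations": [
            ("player", "serve", "shuttlecock"),
            ("racquet", "strike", "shuttlecock"),
        ],
        "templates": ["template: a badminton game"],
    },
    "playing court": {
        "objects": ["net"],
        "relations": [],
        "templates": [],
    },
}


def related_objects(seed):
    """Objects shown when the user types `seed` or zooms in at it."""
    entry = ONTOLOGY.get(seed)
    return list(entry["objects"]) if entry else []


def topics_mentioning(word):
    """Topics whose ontology contains `word`, used to check a candidate word."""
    return sorted(topic for topic, entry in ONTOLOGY.items() if word in entry["objects"])


print(related_objects("badminton"))
print(related_objects("playing court"))
print(topics_mentioning("bat"))
```

An empty result from topics_mentioning is the signal that a word such as “bat” does not belong to any known topic here, which is when the aid would point the user at the listed objects instead.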
Historical Notes
I came up with this idea in March 2009 (Usenet post, thread).
1.3. Foreign Language Reading Aids
Unlike non-native writing, non-native reading doesn't require much help from sophisticated tools. A
learner with basic English grammar and the most frequent 100-300 words can engage in serious
reading with the help of a point-to-translate dictionary program such as GoldenDict or Babylon
(such a program shows translations for whatever English word or even phrase is under the learner's
mouse).
It should be noted that in reading something the learner only cares about the meaning of an unfamiliar
word, not further information such as irregular inflected forms. Such further information is taught in
L1-driven L2 teaching or provided just in time by writing aids, but can also be introduced using the
approach below.
A reading aid can insert educational information about a word or sentence into the text being read, just
like L1-driven L2 teaching, with the only difference that the main text is in the foreign language rather
than the native language. This enables the computer to teach additional knowledge such as idioms and
grammatical usages that are beyond word-for-word translation. Word-specific syntaxes as discussed in
Section 1.2.2 “Input-Driven Syntax Aid” and domain-specific vocabularies as discussed in Section
1.2.3 “Input-Driven Ontology Aid” are also good feeds.
Chapter 2: Breaking the Language Barrier with Little
Learning
Language learning isn't always a cost-effective option to process information in a foreign language,
especially if the number of foreign languages involved goes up: an ordinary person certainly doesn't
want to acquire the vast vocabularies of the world's many languages, as learning English alone is
already demanding. He more likely would like to harness the computer's memory capacity to interpret
and generate words in those other foreign languages. Sections 2.1 and 2.2 discuss how the human and
the machine can work together to understand and generate information in a foreign language.
2.1. Foreign Language Understanding
How do we understand information in a foreign language, without learning that foreign language?
Machine translation (MT) is often the only option. While MT gives us a “gist” about an article's main
idea, details are often elusive as MT usually garbles syntax (relations between content words) in the
translation result when the language pair has quite different syntactic rules. Therefore, Section 2.1.1
introduces a new approach to MT, where the computer preserves the original language's syntactic
structures in the translation result and helps the human reader understand these syntactic structures in
their original setting.
2.1.1. Syntax-Preserving Machine Translation (SPMT)
We will first examine today's machine translation, find out the worst part (syntax disambiguation) that
greatly undermines the whole system's usefulness, and then propose a new MT model (“Syntax-
Preserving Machine Translation”) that fixes that part.
Today's Machine Translation: Pros and Cons
Before artificial intelligence reaches its fullest potential, machine translation always faces unresolvable
ambiguities. The good news is, statistical MT such as Google Translate disambiguates content words
quite well in most cases, and syntactic ambiguity can largely be “transferred” to the target language,
without being resolved, if both the source and the target language have common syntactic features. For
example, both English and French support prepositional phrases, so
I passed the test with his help.
can be translated to French without determining whether “with his help” modifies “passed” or “the
test” (theoretically, “with his help” can modify “the test” if the test is administered with “his” help).
The bad news is, syntax disambiguation usually can't be bypassed in a language pair like English to
Chinese. In Chinese, “with his help” must be moved to the left of what it modifies, so the Chinese
translation result's word order will be either
I, with his help, passed the test.
or
I passed the with-his-help test.
So, in order to translate the original English sentence to Chinese, it is necessary to determine whether
“with his help” modifies “passed” or “the test”.
Inherent AI Complexity in Syntax Disambiguation
Syntax disambiguation like determining what is modified by “with his help” requires capabilities
ranging from shallow rules (e.g. “with help” should modify an action rather than an entity, and if there
is more than one action, as here where both “pass” and “test” can be considered actions, it should
modify the verb, “pass”) to the most sophisticated reasoning based on context or even information
external to the text (e.g. in
I saw a cat near a tree and a man.
what is the prepositional object of “near”? “A tree” or “a tree and a man”?)
Let the Human Understand Syntax in Its Original Formation
In the first example in this section, i.e.
I passed the test with his help.
what if “with his help” can be translated to Chinese still in the form of a postponed prepositional
phrase, just like how it is translated to French? Then the computer won't have to determine what is
modified by “with his help”, as this ambiguity is “transferred” to Chinese just like French.
The Chinese language itself doesn't have postponed prepositional phrases, but we can teach a Chinese
person what a postponed prepositional phrase is so that we can introduce such a prepositional phrase
in the Chinese result (the translation will be demonstrated later). To teach language concepts like “what
is a prepositional phrase”, L1-driven L2 teaching (see Section 1.1.1) is a good approach. Also,
considering that syntactic concepts like “preposition” are shared by many languages in the world, it
would be quick for a person to learn syntactic knowledge of the world's major languages.
More specifically, we can do machine translation in this way (also see “A Quick Example” below):
Content words are directly machine-translated by a statistical word sense disambiguation (WSD)
algorithm. In case a content word's default translation doesn't make sense, the user can move the
mouse to that translation to see alternative translations.
Word order is generally preserved exactly as it is in the source language text. At the beginning
of the translation result, the computer declares the translation result's “sentence word order”
(e.g. SVO, i.e. subject-verb-object) and “phrase word order” (e.g. head-dependent), so that the
user can have a general idea about each sentence and phrase's syntactic structures.
Unambiguous or easy-to-disambiguate syntactic features are automatically translated to
grammatical markers, which are an international standard we'll design to represent syntactic
features in a language-independent manner, and are supposed to be learned by the user in
advance. For example, if the computer can positively identify a sentence's subject, it can mark
that subject with a “subject marker”, so that the user will know it's a subject. Another example
is a verb's transitivity, which can be marked with “vt” or “vi”.
Hard-to-disambiguate syntactic features are left unchanged in the translation result (but may
be transcribed to the user's native alphabet for readability). For example, the English preposition
“with” is an ambiguous function word, and in case it can't be automatically and confidently
disambiguated, we will leave it alone in the translation result, and expect the user to manually
learn this word in advance or in place. However, the computer can be certain that “with” is a
preposition; this part of speech is an unambiguous syntactic feature and the computer can
append a “preposition marker” after “with”, to indicate this feature in the translation result.
Merely knowing this is a preposition can often enable the user to guess its meaning based on
context.
A Quick Example
The computer can translate
I passed the test with his help.
to Chinese as
pp
!hich literally means
I PASSED THE-TEST USINGpp HIS HELP.
where the computer translates all content words, preserves the original word order, automatically
disambiguates the function word “with” in the “using” sense (as the prepositional object “his help”
suggests this sense), and adds a “preposition marker”, “pp”, to indicate to the Chinese reader that this
is a preposition (so that the reader would realize that it leads a phrase that usually modifies something
before it, but not necessarily immediately before it).
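The word-by-word procedure can be sketched as follows. This is a toy illustration only: the function spmt_gloss, the pre-tagged token list and the tiny lexicon are all hypothetical, upper-case English glosses stand in for the target-language words, and the parsing and sense disambiguation that a real system needs are assumed to have happened already.

```python
def spmt_gloss(tokens, lexicon, marker="pp"):
    """Translate tokens in their original order, marking prepositions.

    Each token is (word, kind, sense). For a preposition, `sense` is the
    disambiguated translation, or None when it could not be disambiguated.
    """
    output = []
    for word, kind, sense in tokens:
        if kind == "preposition":
            # Keep the source word when no confident sense exists; always add the marker.
            output.append((sense or word) + marker)
        else:
            output.append(lexicon.get(word, word))
    return " ".join(output)


LEXICON = {"I": "I", "passed": "PASSED", "the test": "THE-TEST", "his": "HIS", "help": "HELP"}
TOKENS = [
    ("I", "content", None),
    ("passed", "content", None),
    ("the test", "content", None),
    ("with", "preposition", "USING"),
    ("his", "content", None),
    ("help", "content", None),
]

print(spmt_gloss(TOKENS, LEXICON))
```

Note that nothing in the function decides what the prepositional phrase modifies; that question is left to the human reader, which is the point of the approach.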
Historical Notes
I came up with this idea in March 2010 (Usenet posts 1 and 2).
2.2. Foreign Language Generation
How do we generate information in a foreign language, without learning that foreign language?
Machine translation (MT) is often the only option. However, MT doesn't generate publication-quality
translation results. Section 2.2.1 introduces an approach to generating text in a foreign language in
perfect quality, which requires that the source text be written in unambiguous syntactic structures
(well-formed syntax) and content words be disambiguated either automatically or manually.
2.2.1. Formal Language Machine Translation (FLMT)
A person not knowing a target language can generate information in that language by first writing his
information in a formal language, where syntactic structures are written in an unambiguous manner
from the very beginning, and content words are from his native vocabulary but will be automatically or
manually disambiguated. The formal language composition will then be machine-translated to virtually
any foreign language in perfect quality.
A Quick Example
Suppose the user's native language is English. A formal language sentence based on English vocabulary
may look like
A quick brown fox.jump(over_object: the lazy dog);
which literally means “A quick brown fox jumps over the lazy dog”. This formal sentence resembles an
object-oriented programming language's function call, where “jump” is a member function of the
object “fox”, and “the lazy dog” is the value for an optional argument labeled “over_object”.
Syntactic Well-Formedness
In the process of writing a formal language sentence, an input-driven syntax aid (see Section 1.2.2)
helps the user use valid syntax. For example, in writing the above formal sentence, as soon as the user
inputs “fox.”, the syntax aid will show actions that a fox can take, and as soon as the user inputs
“jump(”, the syntax aid shows possible roles that can be played in a jump event, one of them being
something that is jumped over, which is labeled “over_object”.
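Because the formal sentence has a fixed shape, it can be taken apart mechanically. The sketch below assumes a grammar of the form subject.verb(label: value, ...); inferred from the single example in this section; the regular expression and the function parse_formal_sentence are hypothetical and not a specification of the formal language.

```python
import re

# Assumed shape: <subject phrase>.<verb>(<label>: <value>, ...);
PATTERN = re.compile(
    r"^\s*(?P<subject>[^.();]+?)\.(?P<verb>\w+)\((?P<args>[^()]*)\)\s*;\s*$"
)


def parse_formal_sentence(text):
    """Split a formal sentence into its subject, verb and labeled arguments."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError("not a well-formed formal sentence")
    args = {}
    for chunk in filter(None, (c.strip() for c in match["args"].split(","))):
        label, separator, value = chunk.partition(":")
        if not separator:
            raise ValueError(f"argument without a label: {chunk!r}")
        args[label.strip()] = value.strip()
    return {"subject": match["subject"].strip(), "verb": match["verb"], "args": args}


print(parse_formal_sentence("A quick brown fox.jump(over_object: the lazy dog);"))
```

Input that does not fit the assumed shape raises ValueError, which is where a syntax aid would step in with suggestions instead of an error.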
Lexical Disambiguation
If a content word in such a formal-syntax sentence is ambiguous, automatic word sense disambiguation
(WSD) methods can calculate the most likely sense and immediately inform the user of this calculated
sense by displaying a synonym below the original word according to this sense. The user can manually
reselect a sense if the machine-calculated sense is wrong. All multi-sense content words are initially
marked as “unconfirmed” (e.g. using dotted underlines), which means their machine-calculated senses
are subject to automatic change if later entered text suggests a better interpretation. An unconfirmed
word becomes confirmed when the user corrects the machine-calculated sense of that word, or when
the user hits a special key to make all currently unconfirmed words confirmed. This process is like how
people input and confirm a Chinese string with a Chinese input method. In addition, if the computer
is certain about a word's disambiguation (e.g. the disambiguation is based on a reliable clue such as
a collocation), it can automatically make that word “confirmed” (remove its underline).
Machine Translation to Natural Languages
After all lexical ambiguity is resolved either automatically or manually, the computer can proceed to
machine-translating the formal language composition to any target natural language.
Machine Translation to a Standard Form
The Universal Networking Language (UNL) Project has been trying to do exactly what is discussed
above: machine-translating a formal language composition to all major natural languages, but it hasn't
borne fruit for 20 years. This is because natural language generation (NLG) for all major languages is a
formidable engineering challenge. An alternative approach I think may work is to machine-translate the
formal language composition to a formal but human-readable form like this:
a fox
quick, brown
jumpover:
the dog
lazy
.
Historical Notes
There are quite a few attempts at this approach. The most notable one is the UNL (Universal
Networking Language) at http://www.undl.org.
I independently came up with this idea in 2003, by the end of high school.
