Course Overview: What is NLP? How is it done? A brief history of NLP. Linguistics in NLP. Why is it hard? What are the applications?
2007/2008
Documentation
Webpage of the course: www.loria.fr/gardent/applicationsTAL
Slides will be handed out at the beginning of each lecture.
Thursday Nancy NLP Seminar: www.loria.fr/gardent/Seminar/content/seminar07-2.php
Course Overview
Theory
What is NLP? Why is it hard? How is it done? What are the applications?
Symbolic approaches, exemplified by natural language generation (Meaning → Text).
Statistical approaches, exemplified by information retrieval and information extraction (Text → Meaning).
Practice. First seminar: this week; Guillaume Pitel (in English), on using Latent Semantic Analysis to bootstrap a FrameNet for French.
Assessment
You should all have a login account on the UHP machines. Nancy 2 students first need to register at Nancy 1 (UHP); registration is free. From this account you should be able to access Python, NLTK and whatever else is needed for the exercises and the project. If not, tell us! Room I100 is reserved for you every Wednesday morning until Christmas. Optional lab sessions with Bertrand Gaiffe, Wednesday mornings from 10 to 12 in Room I100, starting next week.
Presentations
Each student must present a paper on either Question Answering or Natural Language Generation. A list of papers suggested for presentation will be posted shortly on the course web site. If you prefer, you can choose some other paper on either QA or NLG, but you must then first run it by me for approval. I will collect your choices at the end of the second week. Presentations will be held on the 4th (QA) and 16th of October (NLG). More about presentations and their grading at http://www.loria.fr/gardent/applicationsTAL
Software Project
A list of software projects will be presented at the end of the second week. You should gather into groups of 2 or 3 and choose a topic; if desired, there can also be individual projects. I will collect your choices at the end of the third week (4 October). Each group will give a short oral presentation (intermediate report) of their project at the end of the 5th week (18 October). The results (program and output) of each project will be returned at the end of the semester (roughly the end of January). More about projects at http://www.loria.fr/gardent/applicationsTAL
Course schedule
Mo 17 September, 2pm. What is NLP? Why is it hard? How is it done? An overview of NLP applications.
Tue 18 September, 10am. Python fundamentals.
Th 20 September, 10am. Regular expressions.
Mo 24 September, 2pm. Corpus processing and tokenization with NLTK.
Tue 25 September, 10am. Tagging and chunking with NLTK.
Th 27 September, 10am. Information retrieval. Presentation of topics for software projects; collecting individual choices for presentations.
Mo 1 October, 10am. Information extraction.
Tue 2 October, 10am. Information extraction.
Th 4 October, 10am. Presentations on Question Answering.
Mo 8 October, 2pm. NLG: Introduction.
Tue 9 October, 10am. NLG: Macro- and microplanning.
Th 11 October, 10am. NLG: Referential expressions.
Mo 15 October, 2pm. NLG: Surface realisation.
Tue 16 October, 10am. Presentations on NLG.
Th 18 October, 10am. Software project presentations.
Contact information
What is NLP?
Need to find me?
Natural Language Processing aims at making computers talk and, more precisely, at endowing them with the linguistic abilities of humans.
1966 ELIZA
1966: ELIZA, the first chatterbot, i.e., a computer program that mimics human conversation. Developed by Joseph Weizenbaum at the Massachusetts Institute of Technology.
1973: SHRDLU, a computer program that allows the user to interact with a robot in a block world. Developed by Terry Winograd (MIT).
ELIZA, the first chatterbot: a computer program that mimics human conversation. Joseph Weizenbaum, Massachusetts Institute of Technology. The user types in a statement or set of statements in natural language; ELIZA then analyzes the user's statement and generates some response, which it types out. http://www-ai.ijs.si/eliza/eliza.html
Sample conversation
ELIZA
Men are all alike.
IN WHAT WAY?
They're always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE?
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I'm depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It's true. I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY
ELIZA
ELIZA
My mother takes care of me.
WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU
My father.
YOUR FATHER
You are like my father in some ways.
WHAT RESEMBLANCE DO YOU SEE
You are not very aggressive but I think you don't want me to notice that.
WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE
You don't argue with me.
WHY DO YOU THINK I DON'T ARGUE WITH YOU
You are afraid of me.
DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU
My father is afraid of everybody.
WHAT ELSE COMES TO YOUR MIND WHEN YOU THINK OF YOUR FATHER
Bullies.
DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE
Example transformations
When a keyword is found, the sentence is transformed according to a rule associated with the keyword (regular expression rewriting).
I am X → HOW LONG HAVE YOU BEEN X?
I am very unhappy. → HOW LONG HAVE YOU BEEN VERY UNHAPPY?
X YOU Y ME → WHAT MAKES YOU THINK I Y YOU?
It seems that you hate me. → WHAT MAKES YOU THINK THAT I HATE YOU?
Otherwise, a content-free remark or, under certain conditions, an earlier transformation is retrieved. The text so computed or retrieved is then printed out.
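This keyword-rule mechanism can be sketched in a few lines of Python. The rule set below is a hypothetical miniature in the spirit of ELIZA's script, not Weizenbaum's original: each rule pairs a regular expression with a response template, and first/second person pronouns in the matched fragment are swapped.

```python
import re

# Hypothetical ELIZA-style rules: (pattern, response template).
RULES = [
    (re.compile(r"i am (.*)", re.I), "HOW LONG HAVE YOU BEEN {0}?"),
    (re.compile(r"(.*) you (.*) me(.*)", re.I), "WHAT MAKES YOU THINK I {1} YOU?"),
]
SWAPS = {"i": "you", "me": "you", "my": "your", "you": "i"}

def reflect(text):
    # Swap first/second person pronouns in the matched fragment.
    return " ".join(SWAPS.get(w.lower(), w) for w in text.split())

def respond(sentence, default="PLEASE GO ON."):
    for pattern, template in RULES:
        m = pattern.match(sentence.strip().rstrip("."))
        if m:
            return template.format(*(reflect(g) for g in m.groups())).upper()
    return default  # content-free remark when no keyword matches
```

For example, `respond("I am very unhappy.")` yields the canned "HOW LONG HAVE YOU BEEN VERY UNHAPPY?" response, while ill-formed input falls through to the default remark, exactly as in the transcripts above.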
ELIZA can pretend to know almost nothing of the real world: "I went for a long boat ride" → "Tell me about boats". Not a real dialogue (no real communication).
ELIZA generates a reaction (often a rather non-committal one) to the user's statement. The user cannot ask for information. The user cannot issue an order.
Eliza: Hello. I am ELIZA. How can I help you?
You: I went to Paris yesterday.
Eliza: Can you elaborate on that?
You: I took the train.
Eliza: Can you elaborate on that?
You: The train was late.
Eliza: Please go on.
You: All the seats were full.
Eliza: Please go on.
You: A little boy was having a tantrum.
Eliza: Please go on.
Indexicals (I, you, here, now)
User: I am having fun right now.
Eliza: Does it please you to believe that I am having fun right now.
World Knowledge
User: I am a human.
Eliza: How long have you been a human?
Wh-Questions
User: Who are you?
Eliza: Would you prefer it if I were not?
User: What do you want me to tell?
Eliza: Does that question interest you?
Complex Assertions
User: I don't have a problem. I'm quite happy, I have a girlfriend, I study. Everything is OK!
Eliza: Why do you not have any problem, I'm quite happy, you have a girlfriend, you study. Everything is OK!
Ill-formed Input
User: azefzaer is qgqdg in the agag.
Eliza: Please go on.
Ill-formed Output
User: Doesn't my story interest you?
Eliza: Why do you stay your story interest I?
User: I think that you can help me but I'm not sure that you can understand me.
Eliza: Do you wish to but im not sure that me
Is it NLP?
1973 SHRDLU
Ad hoc transformations. No linguistic (e.g., syntactic, semantic, morphological) knowledge or processing involved. Not portable: would not work for another domain, e.g., a train scheduling information system: "I want to go to Paris" → ??? "Tell me about Paris"
Terry Winograd, MIT Interaction with a robot in a block world. The user can:
instruct the robot to manipulate the blocks, ask it about the blocks' configurations, ask it about its reasoning, and update it with facts
http://hci.stanford.edu/winograd/shrdlu/
4s. What does the box contain? 4u. THE BLUE PYRAMID AND THE BLUE BOX. Keeps track of how context is affected by the dialogue. 5s. What is the pyramid supported by? 5u. THE BOX. Handles the passive/active variation; correctly identifies "the pyramid" as the pyramid just mentioned.
Is it NLP?
How many blocks are not in the box? FOUR OF THEM.
Is at least one of them narrower than the one which I told you to pick up? YES, THE RED CUBE.
Is it supported? YES, BY THE TABLE.
Can the table pick up blocks? NO.
SHRDLU can reason about the situation.
Real communicative abilities: the user can request information, issue instructions and update the system with new information. The system understands language in a limited domain by using syntactic parsing and semantic reasoning.
A large-scale grammar of English plus a parser. Procedural semantics for words and phrases.
Machine Translation (MT) was one of the earliest applications of computers. Major attempts in the US and USSR: Russian to English and the reverse. The Georgetown University (Washington) system translated sample texts in 1954. Euphoria: lots of funding, many groups in the US and USSR. But the system could not be scaled up.
The ALPAC report assessed the research results of the groups working on MT and concluded that MT would not be possible in the near future: funding for MT should cease, but basic research should be supported.
1957: Noam Chomsky's Syntactic Structures. A formal definition of grammars and languages; provides the basis for automatic syntactic processing of NL expressions.
Montague's PTQ: formal semantics for NL; the basis for the logical treatment of NL meaning.
1967: Woods' procedural semantics. A procedural approach to the meaning of a sentence; provides the basis for automatic semantic processing of NL expressions.
1970: TAUM Meteo, machine translation of weather reports (Canada).
1970s: SYSTRAN, an MT system still used by Google.
1973: LUNAR, to question an expert system about rock analyses from Moon samples.
1973: SHRDLU (T. Winograd), instructing a robot to move toy blocks.
Formally grounded and reasonably computationally tractable linguistic formalisms (Lexical Functional Grammar, Head-Driven Phrase Structure Grammar, Tree Adjoining Grammar, etc.). The linguistic/logical paradigm was extensively pursued, but it was not robust enough and yielded few applications.
Disk space becomes cheap. Machine-readable text becomes ubiquitous. US funding emphasises large-scale evaluation on real data.
1994: The British National Corpus is made available, a balanced corpus of British English.
Mid 1990s: WordNet (Fellbaum & Miller), a computational thesaurus developed by psycholinguists.
Early 2000s: The World Wide Web is used as a corpus.
CL History Summary
50s: Machine translation; ended by the ALPAC report.
60s: Applications use linguistic techniques (ELIZA, SHRDLU) from Chomsky (formal grammars, parsers); procedural semantics (Woods) is also important. Approaches only work on restricted domains; not portable.
70s/80s: Symbolic NLP. Applications based on extensive linguistic and real-world knowledge. Not robust enough; lexical acquisition bottleneck.
90s to now: Statistical NLP. Applications based on statistical methods and large (annotated) corpora.
Speech recognition shows that, given enough data, simple statistical techniques work. US funding emphasises speech-based interfaces and information extraction. Large digitised corpora are available.
Linguistics in NLP
Symbolic approaches: based on hand-written rules; require linguistic expertise; no frequency information; more brittle and slower than statistical approaches; often more precise than statistical approaches; error analysis is usually easier than for statistical approaches.
Statistical approaches: supervised or unsupervised; rules acquired from large corpora; not much linguistic expertise required; robust and quick; require large (annotated) corpora; error analysis is often difficult.
NLP applications use knowledge about language to process language All levels of linguistic knowledge are relevant:
Phonetics, phonology: the study of linguistic sounds and of their relation to words.
Morphology: the study of word components.
Syntax: the study of the structural relationships between words.
Semantics: the study of meaning.
Pragmatics: the study of how language is used to accomplish goals and of the influence of context on meaning.
Discourse: the study of linguistic units larger than a single utterance.
Phonetics/phonology
Phonetics: the study of the speech sounds used in the languages of the world; how to transcribe those sounds (IPA, the International Phonetic Alphabet); how sounds are produced (articulatory phonetics).
Phonology: the study of the way a sound is realised in different environments. A sound (phone) can usually be realised in different ways (allophones) depending on its context. E.g., the hand-transcribed Switchboard corpus of English telephone speech lists 16 ways of pronouncing "because" and "about".
Phonetics/phonology
An example illustrating the sound-to-text mapping issue:
(1) a. Recognise speech.
b. Wreck a nice peach.
Phonetics and phonology can be used either to map words into sounds (speech synthesis) or to map sounds onto words (speech recognition).
Morphology
The study of the structure of words. Two types of morphology:
Inflectional: decomposes a word into a lemma and one or more grammatical affixes giving information about tense, gender, number, etc. E.g., cats: lemma = cat + affix = -s.
Derivational: decomposes a word into a lemma and one or more affixes giving information about meaning and/or category. E.g., unfair: prefix = un- + lemma = fair.
Ambiguity:
saw → saw, noun, sg, neuter
saw → see, verb, 1st person, sg, past
saw → see, verb, 2nd person, sg, past
saw → see, verb, 3rd person, sg, past
saw → see, verb, 1st person, pl, past
saw → see, verb, 2nd person, pl, past
saw → see, verb, 3rd person, pl, past
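The lemma-plus-affix decomposition can be made concrete with a toy analyser. The lexicon and affix lists below are invented for the sketch; a real system would use a full morphological lexicon:

```python
# Toy inflectional/derivational analyser: strip a known affix and look
# the candidate lemma up in a small (invented) lexicon.
LEXICON = {"cat": "noun", "fair": "adjective", "walk": "verb"}
SUFFIXES = [("s", "plural/3sg"), ("ed", "past")]
PREFIXES = [("un", "negation")]

def analyse(word):
    analyses = []
    if word in LEXICON:  # bare lemma
        analyses.append((word, LEXICON[word], None))
    for suffix, feature in SUFFIXES:
        if word.endswith(suffix) and word[:-len(suffix)] in LEXICON:
            lemma = word[:-len(suffix)]
            analyses.append((lemma, LEXICON[lemma], feature))
    for prefix, feature in PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in LEXICON:
            lemma = word[len(prefix):]
            analyses.append((lemma, LEXICON[lemma], feature))
    return analyses
```

`analyse("cats")` returns the lemma "cat" with a "plural/3sg" affix, and `analyse("unfair")` the lemma "fair" with the negative prefix; ambiguous forms like "saw" would simply yield several analyses.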
Morphology: Applications
Methods
Tools
To resolve anaphora:
(2) Sarah met the women in the street. She did not like them. [She (sg) = Sarah (sg); them (pl) = the women (pl)]
For spell checking and for generation: *The women (pl) is (sg)
Syntax
Captures structural relationships between words and phrases. Describes the constituent structure of NL expressions. Grammars are used to describe the syntax of a language. Syntactic analysers and surface realisers assign a syntactic structure to a string or a semantic representation on the basis of a grammar.
[Parse tree fragment: NP John, Adv often, V gives]
Methods in Syntax
Syntax
Algorithm: a parser. Resources used: a lexicon and a grammar. Symbolic: hand-written grammar and lexicon. Statistical: grammar acquired from a treebank. Difficulty: coverage and ambiguity.
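The parser-plus-grammar setup can be sketched with a hand-written toy grammar and a top-down recogniser. The grammar below is invented for illustration (a real symbolic system would use a wide-coverage grammar, e.g. via NLTK):

```python
# Toy context-free grammar: nonterminals map to lists of productions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["John"], ["Mary"]],
    "VP":  [["V", "NP"], ["Adv", "VP"]],
    "V":   [["gives"], ["loves"]],
    "Adv": [["often"]],
}

def parse(symbol, words, i):
    """Return the positions reachable after deriving `symbol` from words[i:]."""
    if symbol not in GRAMMAR:  # terminal symbol
        return [i + 1] if i < len(words) and words[i] == symbol else []
    ends = []
    for production in GRAMMAR[symbol]:
        positions = [i]
        for part in production:
            positions = [e for p in positions for e in parse(part, words, p)]
        ends.extend(positions)
    return ends

def recognise(sentence):
    words = sentence.split()
    return len(words) in parse("S", words, 0)
```

`recognise("John often loves Mary")` succeeds while `recognise("John loves")` fails, illustrating the coverage problem: anything the hand-written grammar does not anticipate is simply rejected.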
For spell checking (e.g., subject-verb agreement). To construct the meaning of a sentence. To generate a grammatical sentence.
Spell checking
(3) *Its a fair exchange. → no syntactic tree. It's a fair exchange. → OK, syntactic tree.
(4) *My friends is unhappy. The number of my friends who were unhappy was amazing. The man who greets my friends is amazing. (Subject-verb agreement)
John loves Mary. Agent = Subject = John → love(j,m)
Mary loves John. Agent = Subject = Mary → love(m,j)
Mary is loved by John. Agent = By-object = John → love(j,m)
Lexical semantics
Lexical semantics
The study of word meanings and of their interaction with context. Words have several possible meanings. Early methods use selectional restrictions to identify the meaning intended in a given context:
(5) a. The astronomer saw the star.
b. The astronomer married the star.
Statistical methods use cooccurrence information derived from corpora annotated with word senses:
(6) a. John sat on the bank.
b. John went to the bank.
c. ?? King Kong sat on the bank.
Lesk algorithm: word overlap between the words appearing in the definitions of the ambiguous word and the words surrounding this word in the text.
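The Lesk idea fits in a few lines. The glosses below are made up for the sketch; the real algorithm uses dictionary definitions:

```python
# Simplified Lesk: choose the sense whose gloss shares the most words
# with the sentence containing the ambiguous word (toy glosses).
SENSES = {
    "bank": {
        "financial": "institution that accepts deposits and money",
        "river": "sloping land beside a body of water",
    }
}

def lesk(word, sentence):
    context = set(sentence.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

For "John deposited money at the bank" the overlap with the financial gloss wins; for "John sat beside the water near the bank" the river gloss wins. With no overlap at all the choice is arbitrary, which is one of the known weaknesses of the method.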
Lexical relations, i.e., relations between word meanings, are also very important for CL-based applications. The most used lexical relations are:
Hyponymy (ISA): e.g., dog is a hyponym of animal.
Meronymy (part-of): e.g., arm is a meronym of body.
Synonymy: e.g., eggplant and aubergine.
Antonymy: e.g., big and little.
Lexical semantics
Compositional Semantics
In NLP applications, the most commonly used lexical relation is hyponymy which is used:
for semantic classification (e.g., selectional restrictions, named entity recognition); for shallow inference (e.g., X murdered Y implies X killed Y); for word sense disambiguation; for machine translation (if a term cannot be translated, substitute a hypernym)
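Hyponymy-based shallow inference amounts to walking up ISA links. The tiny hierarchy below is hand-built for illustration; real systems use WordNet:

```python
# Toy ISA hierarchy: each word points to its (single) hypernym.
ISA = {"dog": "animal", "animal": "organism", "murder": "kill"}

def is_a(word, concept):
    # Follow hypernym links upwards; supports inferences such as
    # "X murdered Y" implies "X killed Y".
    while word in ISA:
        word = ISA[word]
        if word == concept:
            return True
    return word == concept
```

So `is_a("murder", "kill")` holds but the reverse does not: the relation is directional, which is exactly what makes it usable for inference.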
Semantics of phrases. Useful for reasoning about the meaning of an expression (e.g., to improve the accuracy of a question answering system).
(7) a. John saw Mary. b. Mary saw John. Same words, different meanings.
Pragmatics
Discourse
Compositional semantics delivers the literal meaning of an utterance, but NL phrases are often used non-literally. Examples:
(8) a. Can you pass the salt?
b. You are standing on my foot.
Speech act analysis and plan recognition are needed to determine the full meaning of an utterance.
Much of language interpretation depends on the preceding discourse/dialogue. Example: anaphora resolution.
(9) a. The councillors refused the women a permit because they feared revolution.
b. The councillors refused the women a permit because they advocated revolution.
The various types of linguistic knowledge are put to work in deep NLP systems. Deep natural language processing systems build a meaning representation (needed, e.g., for NL interfaces to databases, question answering and good MT) from user input and produce some feedback to the user. In a deep NLP system, each type of linguistic knowledge is encoded in a knowledge base which can be used by one or several modules of the system.
Ambiguity: the same linguistic unit (word, constituent, sentence, etc.) can be interpreted/categorised in several competing ways.
Paraphrase: the same content can be expressed in different ways.
Problem 1: Ambiguity
Lexical semantics: the same word can mean different things. étoile: sky star or celebrity?
Part of speech: the same word can belong to different parts of speech. la: pronoun, noun or determiner?
Syntax: the same sentence can have different syntactic structures. Jean regarde (la fille avec des lunettes) vs. Jean ((regarde la fille) avec des lunettes).
Semantics: the same sentence can have different meanings. La belle ferme la porte:
(La belle)Subj (ferme la porte)VP: "the beautiful woman closes the door".
(La belle ferme)Subj (la porte)VP: "the beautiful farm carries it".
A combinatorial problem
Problem 2: Paraphrase
Ambiguities multiply out, thereby inducing a combinatorial issue. Example: La porte que la belle ferme présente ferme mal.
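The combinatorics can be made concrete: k independent 2-way ambiguities already yield 2^k readings, and the number of binary bracketings of a string grows as the Catalan numbers:

```python
def catalan(n):
    # nth Catalan number, via C(0) = 1 and
    # C(n+1) = C(n) * 2(2n+1) / (n+2).
    c = 1
    for i in range(n):
        c = c * 2 * (2 * i + 1) // (i + 2)
    return c
```

`catalan(3)` is 5 and `catalan(10)` is already 16796, which is why parsers cannot simply enumerate all analyses of a long ambiguous sentence.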
Quand mon laptop arrivera-t-il? ("When will my laptop arrive?")
Pourriez-vous me dire quand je peux espérer recevoir mon laptop? ("Could you tell me when I can hope to receive my laptop?")
In generation (Meaning → Text), this implies making choices. Again, the combinatorics are high.
NLP applications
Spelling and grammar checking Speech recognition Spoken Language Dialog Systems Machine Translation Text summarisation Information retrieval and extraction Question answering
Three main types of applications: 1. Language input technologies 2. Language processing technologies 3. Language output technologies
Key focus
Spoken utterance → text.
Desktop control: dictation, voice control, navigation.
Telephony-based transactions: travel reservation, remote banking, pizza ordering, voice control.
Speech recognition
Cheap PC desktop software is available. 60-90% accuracy: good enough for dictation and simple transactions, but dependent on the speaker and circumstances. Speech recognition is not understanding!
Based on statistical techniques and very large corpora. Works for many languages. Accuracy depends on audio conditions (the robustness problem).
Dictation
Dictation systems can do more than just transcribe what was said:
Desktop control
leave out the "um"s and "eh"s; implement corrections that are dictated; fill the information into forms; rephrase sentences (add missing articles, verbs and punctuation; remove redundant or repeated words and self-corrections)
Communicate what is meant, not what is said. Speech can be used both to dictate content and to issue commands to the word-processing application (speech macros, e.g., to insert frequently used blocks of text or to navigate through a form).
Key focus
Nuance (www.nuance.com) ScanSoft (www.scansoft.com) Philips (www.speech.philips.com) Telstra directory enquiry (tel. 12455)
Printed material → computer-readable representation. Scanning (text → digitized format). Business card readers, to scan the printed information from business cards into the correct fields of an electronic address book (www.cardscan.com). Website construction from printed documents.
Applications
90% accuracy on clean text. 100-200 characters per second (as opposed to 3-4 for typing). Character segmentation and character recognition. Problems: unclean data and ambiguity. Many OCR systems use linguistic knowledge to correct recognition errors.
Fielded products
Fundamental issues
Key focus
Everyone writes differently! Isolated letters vs. cursive script. Train the user or the system? Most people type faster than they write: choose applications where keyboards are not appropriate. Needs elaborate language models and writing-style models.
Human handwriting → computer-readable representation. Forms processing. Mail routing. Personal digital assistants (PDAs).
Applications
Isolated letters
5-6% error rate (on isolated letters). Good typists tolerate up to a 1% error rate. Human subjects make 4-8% errors.
Palm's Graffiti (www.palm.com). Communication Intelligence Corporation's Jot (www.cic.com). Motorola's Lexicus. ParaGraph's CalliGrapher (www.paragraph.com).
Cursive scripts
Retroconversion
Key focus: identify the logical and physical structure of the input text Applications
Spelling and grammar checking Spoken Language Dialog System Machine Translation Text Summarisation Search and Information Retrieval Question answering systems
Recognising tables of contents. Recognising bibliographical references. Locating and recognising mathematical formulae. Document classification.
Flag words which are not in the dictionary (dictionary lookup): *neccessary. In languages with a rich morphology, flag words which are morphosyntactically incorrect (morphological processing): e.g., He *gived a book to Mary.
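Dictionary-lookup flagging is essentially one line of Python once a word list is at hand (the dictionary below is a toy stand-in for a real word list):

```python
# Non-word spell checking: flag every token absent from the dictionary.
DICTIONARY = {"necessary", "he", "gave", "a", "book", "to", "mary"}

def misspelled(text):
    return [w for w in text.lower().split() if w not in DICTIONARY]
```

`misspelled("neccessary")` flags the misspelling, while a fully in-dictionary sentence yields an empty list. Note that real-word errors like "bows" for "boughs" pass unflagged, which is why the word sense disambiguation methods below are needed.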
Word sense disambiguation: *The tree's bows were heavy with snow vs. The tree's boughs were heavy with snow.
Goal
A system that you can talk to in order to carry out some task: speech recognition, speech synthesis, dialogue management. Information provision systems provide information in response to a query (requests for timetable information, weather information). Transaction-based systems undertake transactions such as buying/selling stocks or reserving a seat on a plane.
Key focus
No training period is possible in phone-based systems. Error handling remains difficult. User initiative remains limited (or is likely to result in errors).
Applications
Nuance (www.nuance.com) SpeechWorks (www.scansoft.com) Philips (www.speech.philips.com) See also google category : http://directory.google.com/Top/Computers/Speech Technology/
Stock broking system Betting service American Airlines ight information system
Machine Translation
Key focus
Direct transfer: map between words. Syntactic transfer: map between syntactic structures. Semantic transfer: map between semantic structures. Interlingua: parse to derive a language-neutral semantic representation and generate from there. Example-based: use a database of translation pairs and find the closest matching phrase.
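Direct transfer is essentially word-for-word substitution, which a toy French-to-English lexicon makes concrete (the lexicon is invented for the sketch), and which immediately shows why word order, ambiguity and multi-word expressions break the approach:

```python
# Direct transfer: substitute each source word by a target word,
# passing unknown words through unchanged.
LEXICON = {"jean": "John", "aime": "loves", "marie": "Mary"}

def direct_transfer(sentence):
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())
```

`direct_transfer("Jean aime Marie")` happens to come out right only because French and English share this word order; a locution like "mettre le feu sur" would be translated word by word, producing exactly the kind of error shown in the Babelfish examples below.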
translating a text written/spoken in one language into another language Web based translation services Spoken language translation services
Applications
Existing MT Systems
An exceptional domain: limited language, a large translation need. A limited domain with enough material, never found again. The same group tried to build TAUM-Aviation for aircraft maintenance manuals, with only limited success.
The domain of weather reports: highly successful. Human-assisted translation. Rough translation, used over the internet through AltaVista: http://babelfish.altavista.com
Deux soldats britanniques de capot interne sont arrêtés par la police d'Iraq à Bassora suivant une chasse de voiture. On rapporte qu'ils mettent le feu sur la police. (Machine translation into French of: "Two undercover British soldiers are arrested by Iraqi police in Basra following a car chase. They are reported to have fired on the police.")
L'arbre est une structure très utilisée en linguistique. On l'utilise, par exemple, pour représenter la structure syntaxique d'une expression ou, par le biais des formules logiques, pour représenter le sens des expressions de la langue naturelle. ("The tree is a structure much used in linguistics. It is used, for example, to represent the syntactic structure of an expression or, via logical formulas, to represent the meaning of natural language expressions.")
undercover / "de capot interne": incorrect word translation. following / "suivant" (instead of "suite à"): gerund/preposition ambiguity wrongly resolved. car chase / "chasse à voiture" (instead of "course en voiture"): wrong recognition of an N-N compound. fire on / "mettre le feu sur": non-recognition of a verbal locution.
The tree is a structure very much used in linguistics. It is used for example, to represent the syntactic structure of an expression or, by the means of the logical formulas, to represent the direction of the expressions of the natural language.
Word salad
Cette approche est particulièrement intéressante parce que, un peu comme les grammaires d'unification introduites il y a quelques décennies par Martin Kay, [...]. Cette vision, qui est sans doute celle de la plupart des linguistes, n'a malheureusement toujours pas trouvé de cadre informatique adéquat pour s'exprimer et s'instancier.
Broad-coverage systems are already available on the web (Systran). Reasonable accuracy for specific domains (TAUM Meteo) or controlled languages. Machine-aided translation is what is mostly used.
This approach is particularly interesting because, a little like introduced grammars of unification a few decades ago by Martin Kay, [...]. This vision which is undoubtedly, that of the majority of the linguists, unfortunately still did not find of data-processing framework adequate to be expressed and instancier.
Text summarisation
Text summarisation
Key issue
Text → a shorter version of the text. To decide whether it is worth reading the original text. To read the summary instead of the full text. To automatically produce an abstract.
1. Extract important sentences: compute document keywords and score document sentences with respect to these keywords.
2. Cohesion check: spot anaphoric references and modify the text accordingly (e.g., add the sentence containing a pronoun's antecedent; remove difficult sentences; remove pronouns).
3. Balance and coverage: modify the summary so that it has an appropriate text structure (delete redundant sentences; harmonise the tense of verbs; ensure balance and proper coverage).
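Step 1 can be sketched as keyword-frequency sentence scoring (naive whitespace tokenisation, no stop-word list, both simplifications for the sketch):

```python
from collections import Counter

# Score each sentence by the document-wide frequency of its words and
# keep the n top-scoring sentences as the extractive summary.
def summarise(sentences, n=1):
    words = [w for s in sentences for w in s.lower().split()]
    freq = Counter(words)
    scored = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in s.lower().split()),
                    reverse=True)
    return scored[:n]
```

Because common function words dominate the counts, a real system would weight terms (e.g., tf-idf) and then apply the cohesion and coverage steps above to the extracted sentences.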
Applications
Text summarisation
Sentences are extracted on the basis of location, linguistic cues and statistical information. Low discourse coherence.
retrieve documents containing the answer (retrieval); fill in a template with the relevant information (extraction); produce an answer to a query (Q/A)
Commercial systems
British Telecom's ProSum (transend.labs.bt.com). Copernic (www.copernic.com). MS Word's summarisation tool. See also http://www.ics.mq.edu.au/swan/summarization/projects.htm
Limited to factoid questions, e.g., Who invented the electric guitar? How many hexagons are on a soccer ball? Where did Bill Gates go to college? Excludes how-to questions, yes/no questions and questions that require complex reasoning. The highest possible accuracy is estimated at around 70%.
AskJeeves (www.askjeeves.com). Artificial Life's ALife Sales Rep (www.artificial-life.com). NativeMinds' vReps (www.nativeminds.com). Soliloquy (www.soliloquy.com).
Text-to-Speech (1)
Text-to-Speech (2)
Key focus
Text → natural-sounding speech. Spoken rendering of email via desktop and telephone. Document proofreading. Voice portals. Computer-assisted language learning.
Applications
ScanSoft's RealSpeak (www.lhsl.com/realspeak). British Telecom's Laureate. AT&T Natural Voices (http://www.naturalvoices.att.com).
Key focus
Document structure + parameters → individually tailored documents. Personalised advice giving. Customised policy manuals. Web-delivered dynamic documents.
KnowledgePoint (www.knowledgepoint.com)
Applications
CoGenTex (www.cogentex.com)
CL Applications Summary
NLP applications process language using knowledge about language; all levels of linguistic knowledge are relevant. Two main problems: ambiguity and paraphrase. NLP applications use a mix of symbolic and statistical methods. Current applications are not perfect, as:
Symbolic processing is not robust/portable enough Statistical processing is not accurate enough
Applications can be classified into two main types: aids to human users (e.g., spell checkers, machine-aided translation) and agents in their own right (e.g., NL interfaces to databases, dialogue systems). Useful applications have been built since the late 70s. Commercial success is harder to achieve.