Semantics and Pragmatics
High-level Linguistics (the good stuff!)
Semantics: the study of meaning that can be determined from a sentence, phrase or word.
Pragmatics: the study of meaning as it depends on context (speaker, situation, dialogue history).
LING 2000 - 2006
NLP
Language to (Simplistic) Logic

John went to the book store.        go(John, store1)
John bought a book.                 buy(John, book1)
John gave the book to Mary.         give(John, book1, Mary)
Mary put the book on the table.     put(Mary, book1, on table1)
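A minimal sketch of how the four logical forms above could be stored and queried. The `Fact` type and the helper names are assumptions made for illustration; nothing here is a standard representation.

```python
from typing import List, NamedTuple, Tuple


class Fact(NamedTuple):
    """One simplistic logical form: a predicate name plus its arguments."""
    predicate: str
    args: Tuple[str, ...]


# The four logical forms from the slide, copied as-is.
FACTS = [
    Fact("go", ("John", "store1")),
    Fact("buy", ("John", "book1")),
    Fact("give", ("John", "book1", "Mary")),
    Fact("put", ("Mary", "book1", "on table1")),
]


def facts_mentioning(entity: str, facts: List[Fact]) -> List[Fact]:
    """Return every fact that has `entity` as one of its arguments."""
    return [f for f in facts if entity in f.args]


def render(fact: Fact) -> str:
    """Print a fact in predicate(arg, ...) notation."""
    return f"{fact.predicate}({', '.join(fact.args)})"


if __name__ == "__main__":
    for fact in facts_mentioning("book1", FACTS):
        print(render(fact))
```

Even this toy store only supports questions like "which events involve book1?" because the same constant was assigned to every mention of the book by hand.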
What's missing?
- Word sense disambiguation
- Quantification
- Coreference
- Interpreting within a phrase
- Many, many more issues
But it's still more than you get from parsing!

Some problems in shallow semantics
1. Identifying entities
   - noun-phrase chunking
   - named-entity recognition
   - coreference resolution (involves discourse/pragmatics too)
2. Identifying relationship names
   - Verb-phrase chunking
   - Predicate identification (step 0 of semantic role labeling)
   - Synonym resolution (e.g., get = receive)
3. Identifying arguments to predicates
   - Information extraction
   - Argument identification (step 1 of semantic role labeling)
4. Assigning semantic roles (step 2 of semantic role labeling)
5. Sentiment classification
   - That is, does the relationship express an opinion?
   - If so, is the opinion positive or negative?

1. Identifying Entities
Named Entity Tagging: Identify all the proper names in a text.
Sally went to see Up in the Air at the local theater.
   Sally = Person
   Up in the Air = Film
Noun Phrase Chunking: Find all base noun phrases (that is, noun phrases that don't have smaller noun phrases nested inside them).
Sally went to see Up in the Air at the local theater on Elm Street.
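A minimal sketch of rule-based base-NP chunking over the sentence above. The POS tags are hand-assigned assumptions (the film title is tagged as a run of proper nouns), and the single pattern "optional determiner, adjectives, one or more nouns" is a simplification, not a full chunker.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

# Hand-tagged tokens (assumed tags, Penn Treebank style).
TAGGED = [
    ("Sally", "NNP"), ("went", "VBD"), ("to", "TO"), ("see", "VB"),
    ("Up", "NNP"), ("in", "NNP"), ("the", "NNP"), ("Air", "NNP"),
    ("at", "IN"), ("the", "DT"), ("local", "JJ"), ("theater", "NN"),
    ("on", "IN"), ("Elm", "NNP"), ("Street", "NNP"), (".", "."),
]


def chunk_base_nps(tagged):
    """Greedily match DT? JJ* NOUN+ and return the matched phrases."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                      # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":         # any adjectives
            j += 1
        k = j
        while k < n and tagged[k][1] in NOUN_TAGS:    # one or more nouns
            k += 1
        if k > j:                                     # at least one noun matched
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1
    return chunks


if __name__ == "__main__":
    print(chunk_base_nps(TAGGED))
```

With these assumed tags the sketch returns "Sally", "Up in the Air", "the local theater", and "Elm Street"; none of them contains a smaller noun phrase.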
1. Identifying Entities (2)
Parsing: Identify all phrase constituents, which will of course include all noun phrases.
[Figure: parse tree for "Sally saw Up in the Air at the theater on Elm St.", with constituents S, NP, VP, and PP]

1. Identifying Entities (3)
Coreference Resolution: Identify all references (aka mentions) of people, places and things in text, and determine which mentions are co-referential.
John stuck his foot in his mouth.

2. Identifying relationship names
Verb phrase chunking: the commonest approach

Some issues:
1. Often, prepositions/particles belong with the relation name: You're ticking me off.
2. Many relationships are expressed without a verb: Jack Welch, CEO of GE,
3. Some verbs don't really express a meaningful relationship by themselves: Jim is the father of 12 boys.
4. Verb sense disambiguation
5. Synonymy: ticking off = bothering

2. Identifying relationship names (2)
Synonym Resolution: Discovery of Inference Rules from Text (DIRT) (Lin and Pantel, 2001)
1. They collect millions of examples of Subject-Verb-Object triples by parsing a Web corpus.
2. For a pair of verbs, v1 and v2, they compute mutual information scores between
   - the vector space model (VSM) for subjects of v1 and the VSM for subjects of v2
   - the VSM for objects of v1 and the VSM for objects of v2
3. They cluster verbs with high MI scores between them
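The three steps above can be sketched on toy data. This is a simplification under stated assumptions, not the published DIRT procedure: the triples are made up, slot vectors are weighted by positive pointwise mutual information, and cosine similarity stands in for the similarity measure used in the paper. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (subject, verb, object) triples; a real system parses a Web corpus.
TRIPLES = [
    ("you", "give", "money"), ("you", "give", "blood"), ("members", "give", "gift"),
    ("you", "donate", "money"), ("you", "donate", "blood"), ("members", "donate", "car"),
    ("dog", "chase", "cat"), ("cat", "chase", "mouse"),
]


def slot_vectors(triples, slot):
    """Map each verb to a sparse vector of positive-PMI weights over slot fillers."""
    idx = 0 if slot == "subj" else 2
    pair = Counter((t[1], t[idx]) for t in triples)
    verb = Counter(t[1] for t in triples)
    word = Counter(t[idx] for t in triples)
    total = len(triples)
    vectors = defaultdict(dict)
    for (v, w), count in pair.items():
        pmi = math.log(count * total / (verb[v] * word[w]))
        if pmi > 0:
            vectors[v][w] = pmi
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verb_similarity(v1, v2, triples):
    """Geometric mean of subject-slot and object-slot similarity."""
    subj = slot_vectors(triples, "subj")
    obj = slot_vectors(triples, "obj")
    s = cosine(subj.get(v1, {}), subj.get(v2, {}))
    o = cosine(obj.get(v1, {}), obj.get(v2, {}))
    return math.sqrt(s * o)


if __name__ == "__main__":
    print(verb_similarity("give", "donate", TRIPLES))
    print(verb_similarity("give", "chase", TRIPLES))
```

On this toy data "give" and "donate" share subjects and objects and so score above zero, while "give" and "chase" share no fillers and score zero. A clustering step would then group verbs whose pairwise score passes some threshold.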
[Figure: example subject and object fillers observed for the verbs "give" and "donate"]

See (Yates and Etzioni, JAIR 2009) for a more recent approach using probabilistic models.

5. Sentiment Classification
Given a review (about a movie, hotel, Amazon product, etc.), a sentiment classification system tries to determine what opinions are expressed in the review.
Coarse-level objective: is the review positive, negative, or neutral overall?
Fine-grained objective: what are the positive aspects (according to the reviewer), and what are the negative aspects?
Question: what technique(s) would you use to solve these two problems?

Semantic Role Labeling
a.k.a. Shallow Semantic Parsing

Semantic role labeling is the computational task of assigning semantic roles to phrases.
It's usually divided into three subtasks:
1. Predicate identification
2. Argument identification
3. Argument classification -- assigning semantic roles
John    broke   the     window  with    a       hammer.
B-Arg   Pred    B-Arg   I-Arg   B-Arg   I-Arg   I-Arg
Agent           Patient         Means (or instrument)
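A minimal sketch of how the B-Arg/I-Arg tags in the example above turn into argument phrases, which argument classification then labels. The function name is an assumption, and the roles are assigned by hand here in place of a trained classifier.

```python
def bio_to_spans(tokens, tags):
    """Group tokens into argument phrases using B-Arg / I-Arg tags."""
    spans = []
    current = []
    for token, tag in zip(tokens, tags):
        if tag == "B-Arg":                  # a new argument starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-Arg" and current:    # continue the open argument
            current.append(token)
        else:                               # Pred or any other tag closes the argument
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


TOKENS = ["John", "broke", "the", "window", "with", "a", "hammer"]
TAGS = ["B-Arg", "Pred", "B-Arg", "I-Arg", "B-Arg", "I-Arg", "I-Arg"]

if __name__ == "__main__":
    arguments = bio_to_spans(TOKENS, TAGS)
    # Hand-assigned roles standing in for the argument classification step.
    for phrase, role in zip(arguments, ["Agent", "Patient", "Means"]):
        print(f"{role}: {phrase}")
```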
Same event - different sentences
John broke the window with a hammer.
John broke the window with the crack.
The hammer broke the window.
The window broke.
Same event - different syntactic frames
John  broke  the window  with a hammer.
SUBJ  VERB   OBJ         MODIFIER

John  broke  the window  with the crack.
SUBJ  VERB   OBJ         MODIFIER

The hammer  broke  the window.
SUBJ        VERB   OBJ

The window  broke.
SUBJ        VERB
Semantic role example
break(AGENT, INSTRUMENT, PATIENT)

AGENT         PATIENT     INSTRUMENT
John   broke  the window  with a hammer.

INSTRUMENT         PATIENT
The hammer  broke  the window.

PATIENT
The window  broke.

Fillmore 68 - The case for case
AGENT         PATIENT     INSTRUMENT
John   broke  the window  with a hammer.
SUBJ          OBJ         MODIFIER

INSTRUMENT         PATIENT
The hammer  broke  the window.
SUBJ               OBJ

PATIENT
The window  broke.
SUBJ

Semantic roles
Semantic roles (or just roles) are slots, belonging to a predicate, which arguments can fill.
- There are different naming conventions, but one common set of names for semantic roles includes agent, patient, and means/instrument.
Some constraints:
1. Only certain kinds of phrases can fill certain kinds of semantic roles
   - "with a crack" will never be an agent
   - But many are ambiguous: "hammer" → patient or instrument?
2. Syntax provides a clue, but it is not the full answer
   - Subject → Agent? Patient? Instrument?
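A minimal sketch of roles as slots belonging to a predicate. The fixed role inventory, the variable names `?x`, `?y`, `?z`, and the function name are assumptions for illustration; unfilled slots are printed as variables.

```python
ROLES = ("Agent", "Patient", "Means")   # assumed slot order for this predicate
VARIABLES = ("?x", "?y", "?z")          # placeholders for unfilled slots


def shallow_semantics(predicate, fillers):
    """Render predicate(Agent, Patient, Means), using variables for empty slots."""
    unknown = set(fillers) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    fresh = iter(VARIABLES)
    args = [fillers[role] if role in fillers else next(fresh) for role in ROLES]
    return f"{predicate}({', '.join(args)})"


if __name__ == "__main__":
    print(shallow_semantics("broke", {
        "Agent": "John", "Patient": "the window", "Means": "with a hammer",
    }))
    print(shallow_semantics("broke", {"Patient": "the window"}))
```

The slot a phrase fills is decided by the role label, not by its syntactic position, which is why "The window broke" fills only the Patient slot even though "the window" is the subject.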
Slot Filling
Argument classification: phrases → slots (Agent, Patient, Means (or instrument))

John broke the window with a hammer (Pred: broke)
   John → Agent
   the window → Patient
   with a hammer → Means (or instrument)

The hammer broke the window (Pred: broke)
   The hammer → Means (or instrument)
   the window → Patient

The window broke (Pred: broke)
   The window → Patient

Slot Filling and Shallow Semantics
Shallow semantics: Pred(Agent, Patient, Means (or instrument))

John broke the window with a hammer
   broke(John, the window, with a hammer)

The window broke
   broke(?x, the window, ?y)

Semantic Role Labeling Techniques
We'll cover 3 approaches to SRL
1. Basic (Gildea and Jurafsky, Comp. Ling. 2003)
2. Joint inference for argument structure (Toutanova et al., Comp. Ling. 2008)
3. Open-domain (Huang and Yates, ACL 2010)

1. Gildea and Jurafsky
Main idea: start with the parse tree, and try to identify constituents that are arguments.

G&J (1)
Build a (probabilistic) classifier for predicting:
- for each constituent, which role is it?
- Essentially, a maximum-entropy classifier, although it's not described that way
Features for Argument Classification:
1. Phrase type of constituent
2. Governing category of NPs: S or VP (differentiates between subjects and objects)
3. Position w.r.t. predicate (before or after)
4. Voice of predicate (active or passive verb)
5. Head word of constituent
6. Parse tree path between predicate and constituent

G&J (2): Parse Tree Path Feature
Parse tree path (or just path) feature:
Determines the syntactic relationship between predicate and current constituent.
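A minimal sketch of computing such a path on a toy tree. The nested-tuple tree encoding, the node addresses (child indices from the root), the sentence "He ate some pancakes", and the function names are all assumptions for illustration.

```python
# Toy parse tree as nested tuples: (label, child, child, ...); leaves are words.
TREE = ("S",
        ("NP", ("PRP", "He")),
        ("VP",
         ("VB", "ate"),
         ("NP", ("DT", "some"), ("NN", "pancakes"))))


def label_chain(tree, address):
    """Labels from the root down to the node at `address` (a tuple of child indices)."""
    labels = [tree[0]]
    node = tree
    for child_index in address:
        node = node[child_index + 1]    # children start at position 1 of each tuple
        labels.append(node[0])
    return labels


def tree_path(tree, pred_address, const_address):
    """Path from the predicate up to the lowest common ancestor, then down to the constituent."""
    common = 0
    while (common < min(len(pred_address), len(const_address))
           and pred_address[common] == const_address[common]):
        common += 1
    up = label_chain(tree, pred_address)[common:][::-1]   # predicate ... common ancestor
    down = label_chain(tree, const_address)[common + 1:]  # below the common ancestor
    return "↑".join(up) + "".join("↓" + label for label in down)


if __name__ == "__main__":
    predicate = (1, 0)                          # the VB node under VP
    print(tree_path(TREE, predicate, (0,)))     # subject NP
    print(tree_path(TREE, predicate, (1, 1)))   # object NP
```

On this toy tree the subject NP gets the path VB↑VP↑S↓NP and the object NP gets VB↑VP↓NP, so the feature separates the two NPs even though both have phrase type NP.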
In this example, path feature:
VB↑VP↑S↓NP

G&J (3)
4086 possible values of the Path feature in training data. A sparse feature!

G&J (4)
Build a (probabilistic) classifier for predicting:
- for each constituent, which role is it?
- Essentially, a maximum-entropy classifier, although it's not described that way
Features for Argument Identification:
1. Predicate word
2. Head word of constituent
3. Parse tree path between predicate and constituent

G&J (5): Results
Task                              Best Result
Argument Identification (only)    92% prec., 86% rec., .89 F1
Argument Classification (only)    78.5% assigned correct role

2. Toutanova, Haghighi, and Manning
A Global Joint Model for SRL (Comp. Ling., 2008)
Main idea(s):
- Include features that depend on multiple arguments
- Use multiple parsers as input, for robustness

THM (1): Motivation
1. The day that the ogre cooked the children is still remembered.
2. The meal that the ogre cooked the children is still remembered.
Both sentences have identical syntax. They differ in only 1 word (day vs. meal).
If we classify arguments one at a time, "the children" will get the same label in both cases.
But in (1), the children is the Patient (thing being cooked). And in (2), the children is the Beneficiary (people for whom the cooking is done).
Intuitively, we can't classify these arguments independently.

THM (2): Features
Features:
1. Whole label sequence
   1. [voice:active, Arg1, pred, Arg4, ArgM-TMP]
   2. [voice:active, lemma:accelerated, Arg1, pred, Arg4, ArgM-TMP]
   3. [voice:active, lemma:accelerated, Arg1, pred, Arg4] (no adjuncts)
   4. [voice:active, lemma:accelerated, Arg, pred, Arg] (no adjuncts, no #s)
2. Syntax and semantics in the label sequence
   1. [voice:active, NP-Arg1, pred, PP-Arg4]
   2. [voice:active, lemma:accelerated, NP-Arg1, pred, PP-Arg4]
3. Repetition features: whether Arg1 (for example) appears multiple times

THM (3): Classifier
- First, for each sentence, obtain the top-10 most likely parse tree/semantic role label outputs from G&J
- Build a max-ent classifier to select from these 10, using the features above
- Also, include top-10 parses from the Charniak parser

THM (4): Results
These are on a different data set from G&J, so the results are not directly comparable. But the local model is similar to G&J, so think of that as the comparison.

Model                   WSJ (ID & CLS)   Brown (ID & CLS)
Local                   78.00            65.55
Joint (1 parse)         79.71            67.79
Joint (top 5 parses)    80.32            68.81

Results show F1 scores for IDentification and CLaSsification of arguments together. WSJ is the Wall Street Journal test set, a collection of approximately 4,000 news sentences. Brown is a smaller collection of fiction stories. The system is trained on a separate set of WSJ sentences.

3. Huang and Yates
Open-Domain SRL by Modeling Word Spans, ACL 2010
Main Idea: One of the biggest problems for SRL systems is that they need lexical features to classify arguments, but lexical features are sparse.
We build a simple SRL system that outperforms the previous state-of-the-art on out-of-domain data, by learning new lexical representations.

Simple, open-domain SRL
Baseline features: POS tag, chunk tag, distance from predicate. Baseline +HMM adds an HMM label for each word.

Word                   Chris         broke   the     window   with    a       hammer
POS tag                Proper Noun   Verb    Det.    Noun     Prep.   Det.    Noun
Chunk tag              B-NP          B-VP    B-NP    I-NP     B-PP    B-NP    I-NP
Dist. from predicate   -1            0       +1      +2       +3      +4      +5

SRL labels: Chris = Breaker; broke = Pred; the window = Thing Broken; with a hammer = Means

The importance of paths
Chris [predicate broke] [thing broken a hammer]
Chris [predicate broke] a window with [means a hammer]
Chris [predicate broke] the desk, so she fetched [not an arg a hammer] and nails.
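A minimal sketch of path features that separate the three uses of "a hammer" above. It assumes the path for a word is the sequence of words (or POS tags) strictly between the predicate and that word, with the value None when nothing intervenes; the function names and the hand-assigned POS tags are assumptions.

```python
def between(items, pred_index, index):
    """Items strictly between the predicate position and the given position."""
    low, high = sorted((pred_index, index))
    return items[low + 1:high]


def path_features(words, pos_tags, pred_index):
    """Distance, word path, and POS path to the predicate, for every word."""
    features = []
    for i, word in enumerate(words):
        features.append({
            "word": word,
            "dist": i - pred_index,
            "word_path": "-".join(between(words, pred_index, i)) or "None",
            "pos_path": "-".join(between(pos_tags, pred_index, i)) or "None",
        })
    return features


WORDS = ["Chris", "broke", "the", "window", "with", "a", "hammer"]
POS = ["Proper Noun", "Verb", "Det", "Noun", "Prep", "Det", "Noun"]  # hand-assigned

if __name__ == "__main__":
    for row in path_features(WORDS, POS, pred_index=1):
        print(row)
```

Under this assumption "hammer" gets the word path the-window-with-a when it is the means, but a different path in the other two sentences, which is the signal the path feature contributes.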
Simple, open-domain SRL
Baseline +HMM +Paths adds a word path, a POS path, and an HMM path for each word.

Word      Word path            POS path             HMM path
Chris     None                 None                 None
broke     None                 None                 None
the       None                 None                 None
window    the                  Det
with      the-window           Det-Noun
a         the-window-with      Det-Noun-Prep
hammer    the-window-with-a    Det-Noun-Prep-Det

SRL labels: Chris = Breaker; broke = Pred; the window = Thing Broken; with a hammer = Means

Experimental results (F1)
All systems were trained on newswire text from the Wall Street Journal (WSJ), and tested on WSJ and fiction texts from the Brown corpus (Brown).

[Figure: bar chart of F1 per system on WSJ and Brown; values listed in chart order]
WSJ:     0.672   0.729   0.750   0.808   0.786   0.794
Brown:   0.617   0.655   0.677   0.688   0.684   0.678

Span-HMMs
Span-HMM features
[Figure: the sentence "Chris broke the window with a hammer" with SRL labels (Breaker, Pred, Thing Broken, Means) and one Span-HMM feature per word, illustrated with the Span-HMM for "hammer" and the Span-HMM for "a"; three of the words have the feature value None]

Experimental results (SRL F1)
[Figure: bar chart of F1 per system on WSJ and Brown; values listed in chart order]
WSJ:     0.750   0.808   0.786   0.794   0.786   0.792
Brown:   0.677   0.688   0.684   0.678   0.718   0.731

Experimental results: feature sparsity
Benefit grows with distance from predicate