
NLP week 2

Rationalist and Empiricist paradigms in
Natural Language Processing

Rationalist, rule based, symbolic AI approach: finding a
grammar which represents the structure of natural language.
Empiricist, data driven approach: finding regularities in
language by examining large corpora and constructing models
from training data.

Rationalist position:
"Knowledge of language involves in the first place knowledge
of grammar; language is a derivative and perhaps not very
interesting concept." (Chomsky, 1980)

Empiricist position:
"Don't think, but look! ... the more we examine actual
language, the sharper becomes the conflict between it and
our requirement, the crystalline purity of logic."
(Wittgenstein, 1945)

NB: Chomsky has modified his position over the years, and
now says that empirical evidence is relevant.

Empirical, or corpus based, methods


Basic principle: set the parameters of the system by
exposure to training data, store them, and then apply them
to new, unseen data.
A language model is built from training data: a set of
characteristics of language, drawn from a corpus.
A corpus is a collection of natural language texts.
Empirical methods include probabilistic systems and neural
processors. The concepts underlying these methods are
related.

Types of grammar: the Chomsky hierarchy


Type 0: Turing equivalent (not needed for natural language processing)
Type 1: Context sensitive
Type 2: Context free
Type 3: Regular

Empirical methods and grammar type (1)


Based on regular, type 3, grammars:
Dependencies between adjacent symbols, e.g. words or tags,
are captured. These can be modelled by finite state
automata.
Regular grammars cannot model phrase structure, or higher
level grammar.
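As a sketch of the idea that a regular grammar captures only adjacent-symbol dependencies, the following minimal finite-state automaton accepts or rejects tag sequences. The states and transitions are invented for illustration, not taken from the lecture:

```python
# A minimal finite-state automaton over part-of-speech tags.
# States and transitions are illustrative, not from the lecture.
TRANSITIONS = {
    ("START", "det"): "DET",
    ("START", "noun"): "NOUN",
    ("DET", "noun"): "NOUN",
    ("NOUN", "verb"): "VERB",
    ("VERB", "det"): "DET",
    ("VERB", "noun"): "NOUN",
}
ACCEPTING = {"NOUN", "VERB"}

def accepts(tags):
    """Return True if the tag sequence is reachable via the transitions."""
    state = "START"
    for tag in tags:
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False
    return state in ACCEPTING

# "the dog bit the boy" -> det noun verb det noun
print(accepts(["det", "noun", "verb", "det", "noun"]))  # True
print(accepts(["verb", "det"]))                         # False
```

Each step depends only on the current state and the next symbol, which is exactly the limitation the slide describes: no phrase structure is represented.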

Empirical methods and grammar type (2)


Little semantic information can be captured by a regular
grammar. E.g. the agent and victim cannot be extracted from:
John kicked Bill.
John was kicked by Bill.
This may not matter, e.g. in the Ferret plagiarism detector.
Even if it does matter, a regular grammar base may be chosen
to make computation tractable.

Rule based methods and grammar type (1)


Rule based methods are typically founded on context free
grammars (CFGs), i.e. type 2 grammars. Phrase structure
grammars (PSGs) and Link Grammar are context free.
These model dependencies between adjacent symbols, and
between groups of symbols, e.g. (noun phrase - verb phrase).

Can be modelled by push down automata.

Cannot model context sensitive grammar.
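A CFG's grouping of symbols into phrases can be sketched with a toy recursive-descent recognizer. The miniature grammar below (S -> NP VP; NP -> det noun | name; VP -> verb NP) is invented for illustration, not a grammar from the lecture:

```python
# Toy context-free grammar over tags, recognised by recursive descent.
# Grammar (illustrative): S -> NP VP ; NP -> det noun | name ; VP -> verb NP
def parse_np(tags, i):
    """Return the position after an NP starting at i, or None."""
    if i < len(tags) and tags[i] == "name":
        return i + 1
    if i + 1 < len(tags) and tags[i] == "det" and tags[i + 1] == "noun":
        return i + 2
    return None

def parse_vp(tags, i):
    """Return the position after a VP starting at i, or None."""
    if i < len(tags) and tags[i] == "verb":
        return parse_np(tags, i + 1)
    return None

def parse_s(tags):
    """True if the whole tag sequence is an S = NP VP."""
    i = parse_np(tags, 0)
    if i is None:
        return False
    return parse_vp(tags, i) == len(tags)

# "John kicked Bill" -> name verb name
print(parse_s(["name", "verb", "name"]))         # True
print(parse_s(["det", "noun", "verb", "name"]))  # True
print(parse_s(["verb", "name"]))                 # False
```

Unlike the finite-state case, the recogniser groups words into NP and VP constituents, which is what a push-down automaton adds over a finite state automaton.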

Rule based methods and CFGs


Basic phrase structure can be modelled. It may be possible
to determine subject / predicate roles.
However:
In practice the rule base cannot usually cover unrestricted
natural language.
CFGs typically generate a large number of alternative parses
unless sentences are short and simple.
Semantic knowledge may be needed to disambiguate sentence
structure.

Context sensitive grammars


Context sensitive information is not captured by a CFG.
Consider translating the sentence:
The pipe connections of the cooler must be checked.
The head of the subject, connections, is plural, so the verb
must should be plural.
In English, modal verbs have the same singular/plural form,
so we cannot tell which it is by looking at the word. A CFG
could not represent the necessary context sensitive number
agreement.

Example of an empirical, probabilistic method


The CLAWS tag disambiguation system
Many words have more than one part-of-speech tag. The right
tag needs to be found.
Examples:
light - adjective, noun, verb
race  - noun, verb
cut   - verb, noun, past participle
her   - pronoun, possessive pronoun
flies - noun, verb

Tagsets

Tagsets of different sizes are used for different purposes,
depending on the degree of resolution needed. For instance,
nouns can be subdivided into singular and plural, verbs into
different tenses and number.
Tagsets in use vary in size from 2 (just content and
function words) to over 100.

Sample tagset
Tagset with 10 classes
Content words
noun
verb
adjective
adverb

Function words
determiner
pronoun
preposition
conjunction
auxiliary verb

Other

CLAWS version 5 has 61 tags.

Tag disambiguation with CLAWS


Tag disambiguation is often a preliminary task. The CLAWS
system is a well known method.
Underlying principle:
- Based on probabilities
- How likely is a word to have a certain tag, given its context?
Based on a Markov model, it uses information from local
dependencies.

CLAWS the underlying concepts (cont.)

A Markovian sequence is one in which the probability of a
certain output element occurring depends on a finite number
of preceding terms. Here the number of preceding terms is
one.
The training data is tagged by hand. Then the frequencies of
the tag transitions are counted. These frequencies
constitute the language model.
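The counting step can be sketched in a few lines. The one-sentence hand-tagged "corpus" below is invented for illustration; CLAWS itself was trained on a much larger corpus:

```python
from collections import Counter

# Count tag-transition frequencies from a hand-tagged corpus.
# This tiny "corpus" is invented for illustration only.
tagged = [("Henry", "NP"), ("likes", "VBZ"), ("stews", "NNS"), (".", ".")]

transitions = Counter()
for (_, t1), (_, t2) in zip(tagged, tagged[1:]):
    transitions[(t1, t2)] += 1

print(transitions[("NP", "VBZ")])   # 1
print(transitions[("VBZ", "NNS")])  # 1
```

The resulting table of counts, over a full corpus, is the language model the slides refer to.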


Current implementation of CLAWS


2 million words of the BNC were tagged by hand.
Transition frequencies, how often one tag is followed by
another, were found.
The tagged corpus is the training data; the transition
frequencies constitute the language model.

Tag disambiguation - example 1


NP  - proper noun
NNS - plural noun
VBZ - 3rd person singular verb
.   - full stop

We want to disambiguate the sentence:
Henry likes stews .
where Henry - NP
      likes - NNS or VBZ
      stews - NNS or VBZ

Part of transition matrix from training data


Counts of how often each following tag (columns) succeeds
each preceding tag (rows):

Preceding tag | NNS | VBZ |  .
NP            |   7 |  17 | ---
NNS           |   5 |   1 | 135
VBZ           |  28 |   0 |  37

Scores for the candidate tag sequences (products of the
transition counts):
NP_NNS_NNS_. = 7 * 5 * 135 = 4725
NP_NNS_VBZ_. = 7 * 1 * 37 = 259
NP_VBZ_NNS_. = 17 * 28 * 135 = 64260
NP_VBZ_VBZ_. = 17 * 0 * 37 = 0
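The sequence scores can be reproduced directly from the transition counts, and the most likely tagging selected as the highest-scoring sequence:

```python
# Reproduce the sequence scores from the transition counts.
# counts[(preceding, following)] as in the training-data matrix.
counts = {
    ("NP", "NNS"): 7,   ("NP", "VBZ"): 17,
    ("NNS", "NNS"): 5,  ("NNS", "VBZ"): 1,  ("NNS", "."): 135,
    ("VBZ", "NNS"): 28, ("VBZ", "VBZ"): 0,  ("VBZ", "."): 37,
}

def score(tags):
    """Product of transition counts along a candidate tag sequence."""
    result = 1
    for t1, t2 in zip(tags, tags[1:]):
        result *= counts.get((t1, t2), 0)
    return result

candidates = [["NP", "NNS", "NNS", "."], ["NP", "NNS", "VBZ", "."],
              ["NP", "VBZ", "NNS", "."], ["NP", "VBZ", "VBZ", "."]]
best = max(candidates, key=score)
print(best, score(best))  # ['NP', 'VBZ', 'NNS', '.'] 64260
```

The winner, NP VBZ NNS ., gives the correct tagging of "Henry likes stews .": likes is the verb, stews the plural noun.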

Tag transition probabilities


Let t1 and t2 be part-of-speech tags.
(t1, t2) means t2 follows t1.
C(t1) is a count of the number of times that tag t1 occurs
in the training set.
C(t1, t2) is a count of the number of times that tag t2
follows t1.
P(t2 | t1) = C(t1, t2) / C(t1)

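The maximum-likelihood estimate above is a one-line computation. The pair counts are those of the worked example; the total C(NP) = 24 is an assumed figure (7 + 17) for illustration only:

```python
# Maximum-likelihood estimate of a tag-transition probability:
# P(t2 | t1) = C(t1, t2) / C(t1).
# C(NP) = 24 is an assumed total (7 + 17) for illustration.
tag_count = {"NP": 24}                              # C(t1)
pair_count = {("NP", "NNS"): 7, ("NP", "VBZ"): 17}  # C(t1, t2)

def transition_prob(t1, t2):
    """Estimate P(t2 | t1) from the counts."""
    return pair_count.get((t1, t2), 0) / tag_count[t1]

print(transition_prob("NP", "VBZ"))  # 0.7083...
```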

CLAWS process
The lexicon holds a list of words and possible tags.
The processor takes each word in turn and looks it up.
If it is not there, tags are assigned using suffix or prefix
rules. Examples:
suffix  probable tag   example
-able   adjective      suitable
-cle    noun           article
-dle    noun or verb   handle
If the word does not have a suffix or prefix in the list,
default to noun or verb.
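The suffix fallback can be sketched as a small lookup. The three rules are those from the slide; the longest-suffix-first ordering is an implementation choice, not something the lecture specifies:

```python
# Suffix-based tag guessing for words not found in the lexicon.
# The three rules come from the slide; longest suffix is tried first.
SUFFIX_TAGS = {
    "able": ["adjective"],
    "cle": ["noun"],
    "dle": ["noun", "verb"],
}

def guess_tags(word):
    """Return possible tags, defaulting to noun or verb."""
    for suffix, tags in sorted(SUFFIX_TAGS.items(),
                               key=lambda kv: -len(kv[0])):
        if word.endswith(suffix):
            return tags
    return ["noun", "verb"]  # default when no rule applies

print(guess_tags("suitable"))  # ['adjective']
print(guess_tags("handle"))    # ['noun', 'verb']
```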

Claws process (continued)


When the words in the text have been allocated one or more
tags, the disambiguation process begins.
The transition probabilities are looked up in a table, which
has been created from the training data.
Then the most likely tags are calculated.

Tag disambiguation example 2


Norman forced her to cut down on smoking.
Norman  - proper noun, adjective
forced  - verb, adjective
her     - possessive pronoun, pronoun
to      - preposition, infinitive marker
cut     - verb, noun, past participle
down    - adverb, preposition, noun
on      - preposition
smoking - verb, noun, adjective (present participle)

Corpus based application, example 2


The Alpine neural parser
The Alpine parser takes natural language text and divides
sentences up into constituent parts. For example:
Yesterday                 {pre-subject}
that dog with a long tail {subject}
bit the boy               {predicate}

Steps in the process:
1. Using CLAWS, get possible tags for each word. The text is
mapped onto a sequence of part-of-speech tags.
2. Use the trained neural net to find the boundary markers
of the subject.

The Alpine parser (continued)


A single layer neural net is used, as the data is linearly
separable. The net is trained in supervised mode.
The neural net has 2 outputs: grammatical and ungrammatical.
Each sentence generates a set of alternative tag sequences,
and alternative locations for the boundary markers. These
are converted into sets of features that can be entered into
an input vector.
The net determines the sequence that is grammatical.
The correct location of the boundary markers, together with
the correct tags, are displayed.
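A single-layer net trained in supervised mode on linearly separable data is essentially a perceptron. The sketch below illustrates that idea only; the feature vectors, labels, and training data are invented toy values, not Alpine's actual features or architecture:

```python
# A minimal single-layer perceptron: binary grammatical (1) /
# ungrammatical (0) output, trained in supervised mode.
# Features and labels below are invented toy data, not Alpine's.
def train(samples, epochs=20, lr=1.0):
    """Classic perceptron learning rule over (features, label) pairs."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy, linearly separable data: the third feature decides the label.
data = [([1, 0, 1], 1), ([0, 1, 1], 1), ([1, 1, 0], 0), ([0, 0, 0], 0)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # [1, 1, 0, 0]
```

Because the toy data is linearly separable, the perceptron convergence theorem guarantees the training loop reaches zero errors, which is why a single layer suffices here, as the slide claims for Alpine's data.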

Alpine as a hybrid system


Alpine is a corpus based application, using empirical
methods.
It also has a rule based component: we assert that a
sentence can be decomposed into
{pre-subject} {subject} {predicate}

Development of the two paradigms in historical context


Empirical methods were dominant in the early 1950s:
- behaviourism in psychology
- information theory
- Rosenblatt introduced perceptrons.
Chomsky demolished the empirical position in linguistics in
his seminal book Syntactic Structures (1957):
- showed the limitations of regular grammars
- his work on compilers contributed to his reputation.
Minsky and Papert showed the limitations of the perceptron
(1969).
Empirical, data driven approaches went out of fashion.

Development of the two paradigms (cont.)


Much work done developing rule based systems in limited
language domains.
Limitations of rule based language processors became
apparent in 1980s.
Modest achievements in speech recognition, based on
empirical methods, excited enthusiasm.
Empirical methods became possible as the necessary large
corpora were collected.
Increasing computing power made data driven, empirical
methods feasible.
Data driven, empirical methods came back into favour.

Development of the two paradigms (cont.)


Research shifted from academia to industry:
- more interest in working systems than theories.
Focus shifted from deep analysis of small samples of
language (rule based methods) to shallow analysis of real
language (data driven methods).
Information theory provided metrics to evaluate
empirically derived language models.
Neural networks revived:
- methods found to avoid the problems with perceptrons.
Many recent developments are based mainly on data
driven methods.


Integration of the two paradigms


Integrated systems are now necessary for many key tasks.
1. Hybrid systems that are primarily data driven can operate
within a rule based framework, e.g. probabilistic or neural
parsers.
2. Systems may combine modules based on different paradigms.
Example: see the paper "LCC Tools for Question Answering",
an information retrieval system which integrates different
processing modules, including a probabilistic parser and a
rule based logic prover.