Academic documents
Professional documents
Cultural documents
Type 1: Context sensitive
Type 2: Context free
Type 3: Regular
Function words
determiner
pronoun
preposition
conjunction
auxiliary verb
Other
The underlying concepts
The underlying principle of CLAWS is probabilistic: how likely is it that a
word has a certain tag, given its context? This method uses information from local
dependencies. For example, noun followed by verb is a likely pair, whereas determiner
followed by verb is an unlikely pair. What about noun followed by noun? People often
think this is unlikely at first, but taking an empirical approach we find that in English
it is in fact very common. For example: window pane, table cloth, paper handkerchief,
Computer Science Department.
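To make the pair-likelihood idea concrete, here is a minimal sketch in Python (the
corpus, tagset and counts are invented for illustration, not taken from CLAWS) showing
how tag-transition likelihoods can be estimated from a hand-tagged corpus: count how
often each tag follows each other tag, then normalise the counts into conditional
probabilities.

    from collections import Counter, defaultdict

    # A tiny hand-tagged corpus (hypothetical, for illustration only).
    # Each sentence is a list of (word, tag) pairs.
    tagged_corpus = [
        [("the", "DET"), ("window", "NOUN"), ("pane", "NOUN"), ("broke", "VERB")],
        [("a", "DET"), ("table", "NOUN"), ("cloth", "NOUN"), ("tears", "VERB")],
        [("she", "PRON"), ("buys", "VERB"), ("paper", "NOUN"), ("handkerchiefs", "NOUN")],
    ]

    # Count how often each tag is immediately followed by each other tag.
    transitions = defaultdict(Counter)
    for sentence in tagged_corpus:
        tags = [tag for _, tag in sentence]
        for prev, nxt in zip(tags, tags[1:]):
            transitions[prev][nxt] += 1

    # Normalise counts into conditional probabilities P(next tag | previous tag).
    for prev, counter in transitions.items():
        total = sum(counter.values())
        for nxt, count in counter.items():
            print(f"P({nxt} | {prev}) = {count}/{total} = {count / total:.2f}")

Even in this toy corpus the noun-noun transition comes out as the most frequent pair,
echoing the point above.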
The possible tags for each word of an example sentence:
NP (proper noun)
VBZ (singular verb) or NNS (plural noun)
VBZ or NNS
. (full stop)
From the training data we have the following part of a transition matrix
(rows give the preceding tag, columns the following tag):

                     Following tag
    Preceding tag    NNS           VBZ
    NP               28            17
    NNS              not needed    135
    VBZ              37
When the words in a text have been allocated one or more tags each, the
disambiguation process begins: the transition probabilities are looked up in a table
that has been created from the training data.
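As a minimal sketch of this lookup-and-score step (the three-word sequence, and the
decision to treat pairs absent from the matrix as zero, are assumptions made for
illustration; CLAWS itself normalises the counts into probabilities), the following
Python enumerates every candidate tag path for the ambiguous example above and ranks
the paths by the product of their transition frequencies:

    from itertools import product

    # Transition frequencies from the partial matrix above; pairs absent
    # from the table are treated as zero (an assumption for this sketch).
    freq = {
        ("NP", "NNS"): 28, ("NP", "VBZ"): 17,
        ("NNS", "VBZ"): 135, ("VBZ", "NNS"): 37,
    }

    # Candidate tags per position: an unambiguous proper noun followed by
    # two words that are each ambiguous between VBZ and NNS.
    candidates = [["NP"], ["VBZ", "NNS"], ["VBZ", "NNS"]]

    # Score each complete tag path by multiplying its transition frequencies.
    best_path, best_score = None, 0
    for path in product(*candidates):
        score = 1
        for prev, nxt in zip(path, path[1:]):
            score *= freq.get((prev, nxt), 0)
        if score > best_score:
            best_path, best_score = path, score

    print(best_path, best_score)  # ('NP', 'NNS', 'VBZ') scores 28 * 135 = 3780

The highest-scoring path is chosen as the disambiguated tagging; the competing path
NP VBZ NNS scores only 17 * 37 = 629.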
Example 2
Which of the words in the following sentence have more than one part-of-speech tag?
Norman forced her to cut down on smoking.
References for tag disambiguation
The neural net has two outputs: grammatical (yes) and ungrammatical (no).
The correct location of the boundary markers, together with the correct tags, is
displayed.
Taking a similar stand, Minsky and Papert later demonstrated the limitations of the
perceptron in Perceptrons (1969), and empirical, data-driven approaches went out of
fashion. Much work was then done developing rule-based systems in limited language
domains, usually with invented examples rather than naturally occurring sentences. But
by the 1980s the limitations of rule-based language processors had become apparent: the
goal of codifying syntactic rules proved in practice unattainable for unrestricted
natural language. At this time modest achievements in speech recognition, based on
empirical methods, excited enthusiasm. Empirical methods became possible as the
necessary large corpora were collected, and increasing computing power made these
data-driven methods feasible, so they came back into favour.
There were other developments that encouraged this trend. In the 1980s some research,
and its funding, shifted from academia to industry, where there was more interest in
working systems than in underlying theories. The focus moved from deep analysis of
small samples of language (rule-based methods) to shallow analysis of real language
(data-driven methods).
Information theory provided metrics for evaluating empirically derived language models
based on regular grammars; rule-based systems had no comparable evaluation techniques.
This was important in securing resources in an industrial environment.
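As a rough illustration of the kind of metric information theory supplies (the bigram
model and its probabilities below are invented for this sketch, not taken from the
source), perplexity scores a language model by how well it predicts held-out data; a
lower perplexity means a better model:

    import math

    # A toy bigram tag model (hypothetical probabilities, for illustration).
    prob = {
        ("DET", "NOUN"): 0.6, ("NOUN", "VERB"): 0.4,
        ("VERB", "DET"): 0.5, ("NOUN", "NOUN"): 0.3,
    }

    # A held-out tag sequence to evaluate the model on.
    test = ["DET", "NOUN", "NOUN", "VERB"]

    # Cross-entropy: average negative log probability per transition.
    log_sum, n = 0.0, 0
    for prev, nxt in zip(test, test[1:]):
        p = prob.get((prev, nxt), 1e-6)  # small floor for unseen pairs (assumption)
        log_sum += -math.log2(p)
        n += 1

    cross_entropy = log_sum / n
    perplexity = 2 ** cross_entropy
    print(f"cross-entropy = {cross_entropy:.2f} bits, perplexity = {perplexity:.2f}")

Because the measure is computed automatically on data, two models can be compared
objectively, which is exactly what rule-based systems lacked.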
In the mid-1980s neural networks had a resurgence, as methods were found to avoid the
problems with perceptrons; a key publication was Parallel Distributed Processing (1986)
by Rumelhart and McClelland. Many recent developments in natural language processing
are based mainly on data-driven methods, e.g. language models for speech recognition
and systems for information retrieval, commonly based on Support Vector Machines.
Integrated systems that incorporate both rule-based and data-driven modules are now
found to be necessary for many key tasks. Hybrid systems that are primarily data-driven
can operate within a rule-based framework, e.g. probabilistic or neural parsers.
Systems may combine modules based on different paradigms.
Example 1: the Verbmobil system, which aims to translate simple conversations between
English and German. Its components include probabilistic taggers feeding into
rule-based parsers, integrated with semantic analysers.
Example 2: see the paper LCC Tools for Question Answering, which describes an
information retrieval system integrating at least five different processing modules,
including a probabilistic parser and a rule-based logic prover.