Outline
Introduction; Formal Learning Theory and motivation; Gold's Paradigm; Alternative Models of Language Acquisition; Strong Nativism (time permitting)
Introduction
Motivation
"I wish to construct a precise model for the intuitive notion 'able to speak a language' in order to be able to investigate theoretically how it can be achieved artificially." E. M. Gold (1967)
Comparative Grammar
Comparative grammar is the attempt to characterize the class of natural languages through formal specification of their grammars. Theories of comparative grammar begin with Chomsky (e.g. 1957, 1965).
Thus
If we can construct a formal model of how children learn a language, we may be able to program a computer to learn a language. Maybe.
Gold's Paradigm
Definitions
For the purpose of this discussion we will need to define the following:
Language; Learner; Learning Environment; Criterion of Learning;
All logically possible grammars are defined here as all possible Turing Machines; hence the set G of grammars is countable: |G| = ℵ₀.
Decidable Languages
A language is said to be decidable iff both it and its complement have grammars (i.e., both are recursively enumerable). We focus on non-empty languages.
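As a toy illustration (my example, not from the original slides): a language is decidable when a single procedure halts on every string with a yes/no answer, which yields grammars for both the language and its complement. Here the language of even-length strings:

```python
def in_L(s: str) -> bool:
    """Decider for the toy language L = {strings of even length}.
    It halts on every input with yes or no, so both L and its
    complement are recursively enumerable: L is decidable."""
    return len(s) % 2 == 0

def in_complement(s: str) -> bool:
    # The complement is decided by negating the same procedure.
    return not in_L(s)
```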
Environment
To understand how a learner acquires a language, we must understand the learning environment. Assumptions on the learning environment:
1. Sentences are presented one after another, with no ungrammatical intrusions;
2. Negative information is withheld;
3. Each sentence in L eventually appears;
4. Repetitions are allowed;
5. Sentences can arrive in any order;
6. Sentences are presented forever.
Texts as Environments
An environment is referred to as a text. A text t is for a language L if every member of L appears somewhere in t (repetitions are allowed), and no non-members of L appear in t. L(t) denotes the language for which t is a text. The first n members of a text t are denoted tn. The set of all finite sequences of any length (t1, t2, …) is denoted SEQ.
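To make the definition concrete, here is a sketch (a toy example of mine, not from Gold's paper) of a text for the infinite language L = {aⁿ : n ≥ 1}: every member of L eventually appears, and earlier members repeat along the way, both of which the definition allows:

```python
import itertools

def text():
    """An infinite text for L = {'a'*n : n >= 1}: in round n it
    emits the first n sentences of L, so every member of L
    eventually appears, with repetitions, and nothing outside L."""
    n = 1
    while True:
        for k in range(1, n + 1):
            yield 'a' * k
        n += 1

# t5 corresponds to the notation t_n with n = 5:
t5 = list(itertools.islice(text(), 5))
# rounds 1 and 2 plus the start of round 3: 'a', then 'a', 'aa', ...
```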
Learning Function
A learning function is defined as any function from the set of all finite sentence sequences (denoted SEQ) to the set of possible grammars: f : SEQ → G. Note: a learning function may be partial, i.e. undefined on some sequences.
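A minimal computable learning function, as a sketch (my example, not Gold's): conjecture that the language is exactly the finite set of sentences seen so far, with that set itself standing in for a "grammar":

```python
def f(seq):
    """f : SEQ -> G, where a 'grammar' here is simply a finite set
    of sentences. This f is total (defined on every finite
    sequence) and computable."""
    return frozenset(seq)
```

This f identifies every finite language: once all of L has appeared in the text, the conjecture is L and never changes again. It identifies no infinite language, since every conjecture it emits is finite.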
Learning Functions
We wish to define some criterion that would enable us to decide what is a good learning function and what is a bad one.
Criterion of Learning
In his paper "Language Identification in the Limit", Gold suggested the following criterion for learning:
1. A learning function f is defined on a text t if f is defined on tn for all n in N;
2. If f is defined on t and, for some grammar g in G, f(tn) = g for all but finitely many n in N, then f is said to converge on t to g;
3. If f converges on t to a grammar for L(t), then f is said to identify t;
4. If f identifies every text for a language L, then f is said to identify L;
5. If f identifies every language in a set of languages, then f is said to identify that set of languages;
6. A collection of languages is said to be identifiable if there is some learning function f which identifies it.
Intuitive Example
To build some intuition about what it means to identify a language, let's consider the following example: A text t is fed to a learner M, one sentence at a time; with each new input, M faces a finite sequence of sentences; M is defined on t if it offers a hypothesis on each of these finite sequences; if M is undefined somewhere on t, then it is stuck; if M does not get stuck and after some finite time converges on t to a grammar g for L(t), then M has identified t.
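The example above can be simulated for a finite target language (a toy sketch of mine; note that true convergence is a limit property and can never be verified from any finite prefix, so the check below is only illustrative):

```python
import itertools

def M(seq):
    # Learner: conjecture exactly the set of sentences seen so far.
    return frozenset(seq)

L = {'a', 'ab', 'abc'}                 # toy finite language
# A text for L: cycle through its sentences forever.
t = list(itertools.islice(itertools.cycle(sorted(L)), 30))
guesses = [M(t[:n]) for n in range(1, 31)]

# All of L has appeared by position 3, so from then on every
# conjecture equals L: M converges on t to a grammar for L(t) = L.
stable = all(g == frozenset(L) for g in guesses[2:])
```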
Questions
We will see the constraints that these alternative models impose on the classes of languages they are able to learn.
Computability
We would like to believe that language acquisition is computable, i.e. that for a natural language L there exists a Turing Machine M which outputs a grammar G (itself a Turing Machine) and thereby identifies L. Computable functions are a small subset of all learning functions. If we assume that language acquisition is computable, what constraints are we imposing?
[Figure: the space of all learning functions, with the computable learning functions as a subset.]
Nontriviality
Natural languages are infinite (no natural language contains a longest sentence).
A learning function is considered nontrivial if: (a) it is computable; (b) on every finite sequence for which it is defined, it produces a grammar that generates an infinite language.
Constraints of Nontriviality
The next proposition shows that nontriviality imposes limits on the computable learner. PROPOSITION: There are collections L of infinite languages such that some computable function identifies L, but no nontrivial learning function identifies L. If children are nontrivial learners, then there are collections of infinite languages beyond their reach that would otherwise (were they not nontrivial learners) have been available.
[Figure: nested classes of learning functions: all learning functions, the computable ones, the nontrivial ones, and the learning functions of identifiable languages, with the question of where children fall.]
Natural Environments
Gold defined the environment as a text. On the one hand, we know that a real learning environment contains ungrammatical intrusions, as well as omissions of some grammatical sentences. On the other hand, an arbitrary text allows an arbitrary ordering of sentences, which is usually not the case: a child hears "Waldo is in the red house." long before "All positive even integers greater than two can be expressed as the sum of two primes."
Noisy Text
Suppose D is some arbitrary finite set. A noisy text for a language L is a text for L ∪ D, that is, a text for L into which a finite number of intrusions has been inserted. It is easy to see that no collection of languages that includes a language together with a finite variant of it is identifiable on noisy text.
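A sketch of why finite variants defeat noisy-text learning (a toy example of mine): with D = {'b'}, the very same stream is both a noisy text for L and an ordinary text for the finite variant L ∪ {'b'}, so no learner can tell from the stream alone which language is the target:

```python
import itertools

def text_L():
    """A plain text for L = {'a'*n : n >= 1}."""
    n = 1
    while True:
        yield 'a' * n
        n += 1

# One intrusion from D = {'b'} inserted at the front:
noisy_t = itertools.chain(['b'], text_L())

# The same stream is also an ordinary text for L' = L with 'b' added:
# every member of L' appears and nothing outside L' does. A learner
# cannot decide whether 'b' is noise (target L) or data (target L'),
# so no collection containing both L and L' is identifiable on
# noisy text.
prefix = list(itertools.islice(noisy_t, 4))
```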
Strong Nativism
Summary
We have presented Gold's paradigm and the difficulties with the assumptions it makes. We have explored a few alternative models and witnessed their constraints. We have briefly introduced the concept of Strong Nativism. Ultimately, we might hope to find sufficiently powerful conditions on the human learning function, on the environments in which it typically operates, and on the criterion of success to uniquely define the class of natural languages.