Intelligence", while working at the University of Manchester (Turing, 1950, p. 460). It opens
with the words: "I propose to consider the question, 'Can machines think?'" Because
"thinking" is difficult to define, Turing chooses to "replace the question by another, which is
closely related to it and is expressed in relatively unambiguous words." Turing's new
question is: "Are there imaginable digital computers which would do well in the imitation
game?" This question, Turing believed, is one that can actually be answered. In the
remainder of the paper, he argued against all the major objections to the proposition that
"machines can think".
3. What are the cognitive sciences and what is their relation to AI.
Cognitive science is the interdisciplinary scientific study of the mind and its processes. It
examines what cognition is, what it does and how it works. It includes research on
intelligence and behaviour, especially focusing on how information is represented,
processed, and transformed (in faculties such as perception, language, memory, attention,
reasoning, and emotion) within nervous systems (humans or other animals) and machines
(e.g. computers). Cognitive science consists of multiple research disciplines, including
psychology, artificial intelligence, philosophy, neuroscience, linguistics, and anthropology. It
spans many levels of analysis, from low-level learning and decision mechanisms to high-level logic and planning; from neural circuitry to modular brain organization. The
fundamental concept of cognitive science is that "thinking can best be understood in terms
of representational structures in the mind and computational procedures that operate on
those structures."
4. What are the roots of AI.
Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning,
language, rationality
Mathematics: formal representation and proof, algorithms, computation, (un)decidability,
(in)tractability, probability
Psychology: adaptation phenomena of perception and motor control experimental
techniques (psychophysics, etc.)
Economics: formal theory of rational decisions
Linguistics: knowledge representation, grammar
Neuroscience: plastic physical substrate for mental activity
Control theory: homeostatic systems, stability, simple optimal agent designs
5. Name at least 3 important facts from the history of AI.
1. Greek myths of Hephaestus and Pygmalion incorporated the idea of intelligent robots
(such as Talos) and artificial beings (such as Galatea and Pandora).
2. René Descartes proposed that bodies of animals are nothing more than complex machines
(but that mental phenomena are of a different "substance").
3. Samuel Butler suggested that Darwinian evolution also applies to machines, and
speculated that they will one day become conscious and eventually supplant humanity.
6. The General Problem Solver.
General Problem Solver or G.P.S. was a computer program created in 1959 by Herbert
A. Simon, J.C. Shaw, and Allen Newell intended to work as a universal problem solver
machine. Any problem that can be expressed as a set of well-formed formulas (WFFs) or
Horn clauses, and that constitutes a directed graph with one or more sources (viz., axioms)
and sinks (viz., desired conclusions), can be solved, in principle, by GPS. Proofs in the
predicate logic and Euclidean geometry problem spaces are prime examples of the domain
of applicability of GPS. It was based on Simon and Newell's
theoretical work on logic machines. GPS was the first computer program which separated its
knowledge of problems (rules represented as input data) from its strategy of how to solve
problems (a generic solver engine). GPS was implemented in the third-order programming
language, IPL.
While GPS solved simple problems such as the Towers of Hanoi that could be
sufficiently formalized, it could not solve any real-world problems because search was easily
lost in the combinatorial explosion. Put another way, the number of "walks" through the
inferential digraph became computationally untenable. (In practice, even a straightforward
state space search such as the Towers of Hanoi can become computationally infeasible,
albeit judicious prunings of the state space can be achieved by such elementary AI
techniques as alpha-beta pruning and min-max.)
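The state-space framing above can be made concrete. Below is a minimal breadth-first search over Towers of Hanoi states in Python; the state encoding and the BFS strategy are illustrative assumptions for this set of notes, not the actual GPS/IPL implementation:

```python
from collections import deque

def hanoi_bfs(n):
    """Breadth-first search over Towers of Hanoi states.

    A state is a tuple of three tuples: the disks (largest at the
    bottom) on each of the three pegs. Returns the length of the
    shortest solution, which is known to be 2**n - 1."""
    start = (tuple(range(n, 0, -1)), (), ())
    goal = ((), (), tuple(range(n, 0, -1)))
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == goal:
            return depth
        for src in range(3):
            if not state[src]:
                continue
            disk = state[src][-1]
            for dst in range(3):
                # Legal move: target peg is empty or its top disk is larger.
                if dst != src and (not state[dst] or state[dst][-1] > disk):
                    pegs = list(state)
                    pegs[src] = pegs[src][:-1]
                    pegs[dst] = pegs[dst] + (disk,)
                    nxt = tuple(pegs)
                    if nxt not in seen:
                        seen.add(nxt)
                        frontier.append((nxt, depth + 1))

print(hanoi_bfs(3))  # shortest solution for 3 disks: 7 moves
```

Even this tiny example hints at the combinatorial explosion: the state graph has 3**n nodes, so uninformed search quickly becomes infeasible as n grows.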
7) The Mycin system
MYCIN was an early expert system that used artificial intelligence to identify bacteria
causing severe infections, such as bacteremia and meningitis, and to recommend
antibiotics, with the dosage adjusted for the patient's body weight. The name was derived from
the antibiotics themselves, as many antibiotics have the suffix "-mycin". The Mycin system
was also used for the diagnosis of blood clotting diseases.
MYCIN was developed over five or six years in the early 1970s at Stanford University.
It was written in Lisp as the doctoral dissertation of Edward Shortliffe under the direction of
Bruce G. Buchanan, Stanley N. Cohen and others. It arose in the laboratory that had created
the earlier Dendral expert system.
MYCIN was never actually used in practice but research indicated that it proposed an
acceptable therapy in about 69% of cases, which was better than the performance of
infectious disease experts who were judged using the same criteria.
8) The Eliza system.
ELIZA is a computer program and an early example of primitive natural language
processing. ELIZA operated by processing users' responses to scripts, the most famous of
which was DOCTOR, a simulation of a Rogerian psychotherapist. Using almost no
information about human thought or emotion, DOCTOR sometimes provided a startlingly
human-like interaction. ELIZA was written at MIT by Joseph Weizenbaum between 1964 and
1966.
When the "patient" exceeded the very small knowledge base, DOCTOR might provide a
generic response, for example, responding to "My head hurts" with "Why do you say your
head hurts?" A possible response to "My mother hates me" would be "Who else in your
family hates you?" ELIZA was implemented using simple pattern matching techniques, but
was taken seriously by several of its users, even after Weizenbaum explained to them how it
worked. It was one of the first chatterbots.
9) Name at least 3 programming languages and/or frameworks specific for AI. (Note: general
purpose programming languages, such as Java, are excluded).
Prolog is a declarative language where programs are expressed in terms of relations, and
execution occurs by running queries over these relations. Prolog is particularly useful for
symbolic reasoning, database and language parsing applications. Prolog is widely used in AI
today.
Python is very widely used for artificial intelligence, with packages covering many
subfields: general AI, machine learning, natural language processing and
neural networks. Companies like Narrative Science use Python to build artificial
intelligence for natural language generation.
IPL was the first language developed for artificial intelligence. It includes features
intended to support programs that could perform general problem solving, including lists,
associations, schemas (frames), dynamic memory allocation, data types, recursion,
associative retrieval, functions as arguments, generators (streams), and cooperative
multitasking.
10) Name at least 5 subfields of AI.
Iterative deepening A* (IDA*) is a graph-traversal and path-search algorithm. Since it is a depth-first search algorithm, its memory usage is lower than in A*, but
unlike ordinary iterative deepening search, it concentrates on exploring the most promising
nodes and thus doesn't go to the same depth everywhere in the search tree. Unlike A*, IDA*
doesn't utilize dynamic programming and therefore often ends up exploring the same nodes
many times.
While the standard iterative deepening depth-first search uses search depth as the cutoff for
each iteration, IDA* uses the more informative f(n) = g(n) + h(n), where g(n) is the
cost to travel from the root to node n and h(n) is a problem-specific heuristic estimate of the
cost to travel from n to the goal.
In recursive best-first search (RBFS), each node keeps a stored f-value
which estimates the cost of reaching the goal by taking a path through that
node. The lower the value, the higher the priority. As in A* this value is initialized
to f(n) = g(n) + h(n), but will then be updated to reflect changes to this estimate when its children are expanded. When
choosing a node to expand, it chooses the best according to that order. When selecting a node
to prune, it chooses the worst.
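The f(n)-bounded iteration above can be sketched compactly. The following IDA* sketch in Python assumes a toy weighted graph and an admissible heuristic supplied by the caller (both invented here for illustration):

```python
import math

def ida_star(start, goal, neighbors, h):
    """Iterative deepening A*: a series of depth-first searches, each
    bounded by a threshold on f(n) = g(n) + h(n). The threshold for the
    next iteration is the smallest f-value that exceeded the current
    one. Returns the cost of an optimal path when h is admissible."""
    def search(node, g, bound, path):
        f = g + h(node)
        if f > bound:
            return f               # candidate for the next bound
        if node == goal:
            return -g - 1          # encode success (g may be 0)
        minimum = math.inf
        for nxt, cost in neighbors(node):
            if nxt not in path:    # avoid cycles along the current path
                t = search(nxt, g + cost, bound, path | {nxt})
                if t < 0:          # success propagates up unchanged
                    return t
                minimum = min(minimum, t)
        return minimum

    bound = h(start)
    while True:
        t = search(start, 0, bound, frozenset([start]))
        if t < 0:
            return -t - 1          # decode the goal's g-value
        if t == math.inf:
            return None            # goal unreachable
        bound = t                  # deepen to the next f-threshold

# Tiny illustrative graph: edges with costs, plus an admissible heuristic.
graph = {"A": [("B", 1), ("C", 3)], "B": [("C", 1)], "C": []}
h_table = {"A": 2, "B": 1, "C": 0}
cost = ida_star("A", "C", lambda n: graph[n], lambda n: h_table[n])
print(cost)  # optimal cost of A -> B -> C is 2
```

Note how the same nodes can be re-expanded in every iteration, which is exactly the repeated exploration mentioned above.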
RBFS is a linear-space algorithm that expands nodes in best-first order even with a non-monotonic cost function and generates fewer nodes than iterative deepening with a
monotonic cost function.
In order to be expanded, the upper bound on a node must be at least as large as its stored
value.
If a node has been previously expanded, its stored value will be greater than its static value.
If the stored value of a node is greater than its static value, its stored value is the minimum
of the last stored values of its children.
In general, a parent's stored value is passed down to its children, which inherit the value
only if it exceeds both the parent's static value and the child's static value.
17) Describe the Hill-climbing search algorithm. Example.
In computer science, hill climbing is a mathematical optimization technique which belongs to
the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a
problem, then attempts to find a better solution by incrementally changing a single element of
the solution. If the change produces a better solution, an incremental change is made to the
new solution, repeating until no further improvements can be found.
For example, hill climbing can be applied to the travelling salesman problem. It is easy to find an
initial solution that visits all the cities but will be very poor compared to the optimal solution. The
algorithm starts with such a solution and makes small improvements to it, such as switching the
order in which two cities are visited. Eventually, a much shorter route is likely to be obtained.
Hill climbing is good for finding a local optimum (a solution that cannot be improved by
considering a neighbouring configuration) but it is not necessarily guaranteed to find the best
possible solution (the global optimum) out of all possible solutions (the search space). In convex
problems, hill climbing is optimal. Examples of algorithms that solve convex problems by hill-climbing include the simplex algorithm for linear programming and binary search.
The relative simplicity of the algorithm makes it a popular first choice amongst optimizing
algorithms. It is used widely in artificial intelligence, for reaching a goal state from a starting
node. Choice of next node and starting node can be varied to give a list of related algorithms.
Although more advanced algorithms such as simulated annealing or tabu search may give
better results, in some situations hill climbing works just as well. Hill climbing can often produce
a better result than other algorithms when the amount of time available to perform a search is
limited, such as with real-time systems. It is an anytime algorithm: it can return a valid solution
even if it's interrupted at any time before it ends.
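The travelling-salesman example above can be sketched directly in Python; the two-city swap neighbourhood, the iteration count, and the four-city instance below are illustrative assumptions:

```python
import math
import random

def tour_length(tour, pts):
    """Total length of the closed tour visiting the points in order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def hill_climb_tsp(pts, iterations=2000, seed=0):
    """Hill climbing for the TSP: start from a random tour and keep a
    random two-city swap only when it shortens the tour. Returns a
    local optimum, which need not be the global one."""
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    rng.shuffle(tour)
    best = tour_length(tour, pts)
    for _ in range(iterations):
        i, j = rng.sample(range(len(pts)), 2)
        tour[i], tour[j] = tour[j], tour[i]      # incremental change
        length = tour_length(tour, pts)
        if length < best:
            best = length                        # keep the improvement
        else:
            tour[i], tour[j] = tour[j], tour[i]  # otherwise undo it
    return tour, best

# Four cities on a unit square; the optimal closed tour has length 4.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
tour, length = hill_climb_tsp(points)
```

With only four cities any non-optimal tour has an improving swap, so this instance always reaches the global optimum; larger instances typically get stuck in a local one.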
Satisfiability modulo theories (SMT) and answer set programming (ASP) can be roughly
thought of as certain forms of the constraint satisfaction problem.
An example of a simple problem that can be modeled as a constraint satisfaction problem is
Sudoku.
Examples demonstrating the above are often provided with tutorials of ASP, Boolean SAT and
SMT solvers. In the general case, constraint problems can be much harder, and may not be
expressible in some of these simpler systems.
"Real life" examples include automated planning and resource allocation.
Unlike many other meta-heuristics, backtracking is guaranteed to find all solutions to a finite problem in a bounded
amount of time.
will exist. The process terminates when quiescence is achieved, meaning that a solution has been found,
or when the empty nogood is generated, meaning that the problem is unsolvable.
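A minimal backtracking solver for a toy constraint satisfaction problem can be sketched as follows; the map-colouring instance, the colours, and the function names are invented for illustration:

```python
def backtrack(assignment, variables, domains, neighbors):
    """Classic backtracking search for a binary CSP: assign variables
    one at a time and undo any choice that violates a constraint."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint: adjacent regions must receive different colours.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]      # backtrack
    return None

# Toy map-colouring instance: Australia's mainland states.
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"],
}
variables = list(neighbors)
domains = {v: ["red", "green", "blue"] for v in variables}
solution = backtrack({}, variables, domains, neighbors)
```

The same skeleton handles Sudoku by swapping in 81 cell variables, digit domains, and row/column/box inequality constraints.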
Depending on the tool used to implement the expert system, the explanation may be either in a natural language or
simply a listing of rule numbers.
27) Knowledge representation using the first order predicate logic. Constants, predicates,
functions, variables, connectives, quantifiers.
Functions, which are a subset of relations where there is only one value for any given
input
A sentence is
inconsistent if there does not exist any interpretation under which the sentence is true
Basics:
- binary predicates: p(x, y), q(x, y)
Backward chaining uses the modus ponens inference rule. It is one of the two most commonly used methods of reasoning with
inference rules and logical implications; the other is forward chaining. Backward chaining systems
usually employ a depth-first search strategy, e.g. Prolog.
Some of the benefits of IF-THEN rules are that they are modular, each defining a
relatively small and, at least in principle, independent piece of knowledge. New rules
may be added and old ones deleted usually independently of other rules.
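Backward chaining over such IF-THEN rules can be sketched in a few lines of Python; the rule base below is an invented toy example echoing the Mycin style, not Mycin's actual knowledge base:

```python
def backward_chain(goal, rules, facts):
    """Prove `goal` by depth-first backward chaining: a goal holds if
    it is a known fact, or if some rule concludes it and all of that
    rule's premises can themselves be proved. Assumes acyclic rules."""
    if goal in facts:
        return True
    for premises, conclusion in rules:
        if conclusion == goal and all(
                backward_chain(p, rules, facts) for p in premises):
            return True
    return False

# Toy rule base: IF-THEN rules stored as (premises, conclusion) pairs.
rules = [
    (["gram-positive", "rod-shaped"], "bacillus"),
    (["bacillus", "aerobic"], "likely-enterobacteriaceae"),
]
facts = {"gram-positive", "rod-shaped", "aerobic"}
print(backward_chain("likely-enterobacteriaceae", rules, facts))  # True
```

Note the modularity claimed above: adding or deleting a (premises, conclusion) pair changes the knowledge without touching the solver.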
Mycin was designed to help the doctor to decide whether a patient has a bacterial
infection, which organism is responsible, which drug may be appropriate for this
infection, and which may be used on the specific patient.
The global knowledge base contains facts and rules relating for example symptoms to
infections, and the local database will contain particular observations about the patient
being examined. A typical rule in Mycin is as follows:
IF the identity of the germ is not known with certainty
AND the germ is gram-positive
AND the morphology of the organism is "rod"
AND the germ is aerobic
THEN there is a strong probability (0.8) that the germ is of type
enterobacteriacae
WordNet properties have been studied from a network theory perspective and compared to other
semantic networks created from Roget's Thesaurus and word association tasks. From this perspective
all three of them have a small-world structure.
The deftemplate construct is used to create a template which can then be used by
nonordered facts to access fields of the fact by name. Ordered facts, by contrast, are
simple lists of fields, for example:
(temperature high)
(valve a3572 open)
40) CLIPS/JESS. Assert and retract. Examples.
Asserting facts
There are two ways of introducing facts into the CLIPS database. One way is to
include them in a set of initial facts
(deffacts initial-facts
(student (sno 123) (sname maigret) (major pre-med)
(advisor simenon)) ... )
deffacts are asserted after the CLIPS file containing them has been loaded into CLIPS
(see below) and then after the (reset) command.
Another way to assert a fact is to assert it "on the fly", generally as an action in a rule:
(assert (student (sno 123) (sname maigret)
(major pre-med) (advisor simenon)))
Notice that, just as in LISP, parentheses are important. Although CLIPS was written
in C (the letters CLI in CLIPS stand for "C Language Implementation"), the syntax
for using CLIPS is very much LISP-like.
41) CLIPS/JESS. Constraints. Examples
One type of field constraint is called a connective constraint. There are three types of
connective constraints. The first is called a ~ constraint. Its symbol is the tilde "~".
The ~ constraint acts on the one value that immediately follows it and will not allow
that value.
As a simple example of the ~ constraint, suppose you wanted to write a rule that
would print out "Don't walk" if the light was not green. One approach would be to
write rules for every possible light condition, including all possible malfunctions:
yellow, red, blinking yellow, blinking red, blinking green, winking yellow, blinking
yellow and winking red, and so forth. However, a much easier approach is to use the ~
constraint as shown in the following rule:
(defrule walk
(light ~green)
=>
(printout t "Don't walk" crlf))
By using the ~ constraint, this one rule does the work of many other rules that
required specifying each light condition.
42) CLIPS/JESS. Relational expressions. Examples.
An example is x < y, which takes the value true if the value of the variable x is less than the value of the
variable y.
The general form of a relational expression is:
operand1 relational-operator operand2
The operands can be either variables, constants or expressions. If an operand is an
expression then the expression is evaluated and its value used as the operand.
The relational operators allowable in C++ are:
<  less than
>  greater than
<= less than or equal to
>= greater than or equal to
== equals
!= not equals
Note that equality is tested for using the operator == since = is already used for
assigning values to variables.
The condition is true if the values of the two operands satisfy the relational operator,
and false otherwise.
The following example shows how (read) is used to input data. Note that no extra
(crlf) is needed after the (read) to put the cursor on a new line. The (read)
automatically resets the cursor to a new line.
CLIPS> (clear)
CLIPS> (defrule read-input
(initial-fact)
=>
(printout t "Name a primary color" crlf)
(assert (color (read))))
CLIPS>
(defrule check-input
?color <- (color ?color-read&red|yellow|blue)
=>
(retract ?color)
(printout t "Correct" crlf))
CLIPS> (reset)
CLIPS> (run)
Name a primary color
red
Correct
CLIPS> (reset)
CLIPS> (run)
Name a primary color
green
CLIPS> ; No "correct"
The rule is designed to use keyboard input on the RHS, so it's convenient to trigger
the rule with (initial-fact). Otherwise, you'd have to make up some dummy fact to
trigger the rule.
The (read) function is not a general-purpose function that will read anything you type
on the keyboard. One limitation is that (read) will read only one field. So if you try to
read
primary color is red
only the first field, "primary", will be read. To (read) all the input, you must enclose
the input within double quotes. Of course, once the input is within double quotes, it is
a single literal field. You can then access the substrings "primary", "color", "is", and
"red" with the str-explode or sub-string functions.
The second limitation of (read) is that you can't input parentheses unless they are
within double quotes. Just as you can't assert a fact containing parentheses, you can't
(read) parentheses directly except as literals.
The readline function is used to read multiple values until terminated by a carriage
return. This function reads in data as a string. In order to assert the (readline) data, an
(assert-string) function is used to assert the string as a nonstring fact, just as input by (readline).
A top-level example of (assert-string) follows.
CLIPS> (clear)
CLIPS> (assert-string "(primary color is red)")
<Fact-0>
CLIPS> (facts)
f-0 (primary color is red)
For a total of 1 fact.
CLIPS>
Notice that the argument of (assert-string) must be a string. The following shows how
to assert a fact of multiple fields from (readline).
CLIPS> (clear)
CLIPS> (defrule test-readline
(initial-fact)
=>
(printout t "Enter input" crlf)
(bind ?string (readline))
(assert-string (str-cat "(" ?string ")")))
CLIPS> (reset)
CLIPS> (run)
Enter input
primary color is red
CLIPS> (facts)
f-0 (initial-fact)
f-1 (primary color is red)
For a total of 2 facts.
CLIPS>
Since (assert-string) requires parentheses around the string to be asserted, the (str-cat)
function is used to put them around ?string.
Both (read) and (readline) also can be used to read information from a file by
specifying the logical name of the file as the argument. For more information, see
the CLIPS Reference Manual.
44) CLIPS/Jess. Loop structures and/or techniques.
Loop
The loop loop is used when a single variable is being incremented with each successive loop.
Syntax: loop (variable, startVal, endVal) {script};
A simple increment loop structure. Initializes variable with the value of startVal. Executes script.
Increments variable and tests if it is greater than endVal. If it is not, executes script and continues to loop.
For example, the following script outputs numbers from 1 to 4:
loop (ii, 1, 4)
{type "$(ii)";};
Note: The loop command provides faster looping through a block of script than does the for command. The
enhanced speed is a result of not having to parse out a LabTalk expression for the condition required to stop the
loop.
Doc -e
The doc -e loop is used when a script is being executed to affect objects of a specific type, such as graph
windows. The doc -e loop tells Origin to execute the script for each instance of the specified object type.
Syntax: doc -e object {script};
The different object types are listed in the document command.
For example, the following script prints the windows title of all graph windows in the project:
doc -e P {%H=}
For
The for loop is used for all other situations.
Syntax: for (expression1; expression2; expression3) {script};
In the for statement, expression1 is evaluated. This specifies initialization for the loop. Second, expression2 is
evaluated and if true (non-zero), the script is executed. Third, expression3, often incrementing of a counter, is
executed. The process repeats at the second step. The loop terminates when expression2 is found to be false
(zero). Any expression can consist of multiple statements, each separated by a comma.
For example, the following script outputs numbers from 1 to 4:
for (ii=1; ii<=4; ii+=1) {type "$(ii)";};
45) Planning. The language of planning problems (from section 11.1 of [1]).
PLANNING - In which we see how an agent can take advantage of the structure of a problem to
construct complex plans of action.
The language of planning problems. The preceding discussion suggests that the representation of
planning problems (states, actions, and goals) should make it possible for planning algorithms to take
advantage of the logical structure of the problem. The key is to find a language that is expressive enough
to describe a wide variety of problems, but restrictive enough to allow efficient algorithms to operate
over it.
48) Bayes' Rule and Its Use (from section 13.6 of [1]).
In probability theory and applications, Bayes's rule relates the odds of event A_1 to the odds of
event A_2, before (prior to) and after (posterior to) conditioning on another event B. The odds on A_1 to
A_2 are simply the ratio of the probabilities of the two events. The prior odds are the ratio of the
unconditional or prior probabilities, and the posterior odds are the ratio of conditional or posterior
probabilities given the event B. The relationship is expressed in terms of the likelihood ratio or Bayes
factor, \Lambda. By definition, this is the ratio of the conditional probabilities of the event B given that
A_1 is the case or that A_2 is the case, respectively. The rule simply states: posterior odds equals prior
odds times Bayes factor (Gelman et al., 2005, Chapter 1).
Bayes' rule is an equivalent way to formulate Bayes' theorem. If we know the odds for and against
A we also know the probabilities of A. It may be preferred to Bayes' theorem in practice for a number of
reasons.
Bayes' rule is widely used in statistics, science and engineering, for instance in model selection,
probabilistic expert systems based on Bayes networks, statistical proof in legal proceedings, email spam
filters. As an elementary fact from the calculus of probability, Bayes' rule tells us how unconditional and
conditional probabilities are related whether we work with a frequentist interpretation of probability or
a Bayesian interpretation of probability. Under the Bayesian interpretation it is frequently applied in the
situation where A_1 and A_2 are competing hypotheses, and B is some observed evidence. The rule
shows how one's judgement on whether A_1 or A_2 is true should be updated on observing the
evidence B .
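The rule itself is a single multiplication. Here is a sketch in Python, with invented spam-filter numbers for illustration:

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds x Bayes factor."""
    return prior_odds * likelihood_ratio

# Invented spam-filter example:
# P(spam) = 0.2  ->  prior odds = 0.2 / 0.8 = 0.25
# P(word | spam) = 0.6, P(word | not spam) = 0.05  ->  Lambda = 12
odds = posterior_odds(0.25, 0.6 / 0.05)
prob = odds / (1 + odds)        # convert odds back to a probability
print(round(prob, 3))           # P(spam | word) = 0.75
```

The same posterior follows from Bayes' theorem directly: 0.6 * 0.2 / (0.6 * 0.2 + 0.05 * 0.8) = 0.75, confirming that the odds form and the theorem are equivalent.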
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is
assumed to be a Markov process with unobserved (hidden) states. A HMM can be presented as the
simplest dynamic Bayesian network.
In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and
therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state
is not directly visible, but the output, dependent on the state, is visible. Each state has a probability
distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM
gives some information about the sequence of states. The adjective 'hidden' refers to the state
sequence through which the model passes, not to the parameters of the model; the model is still
referred to as a 'hidden' Markov model even if these parameters are known exactly.
Hidden Markov models are especially known for their application in temporal pattern recognition
such as speech, handwriting and gesture recognition, part-of-speech tagging, musical score following, partial
discharges and bioinformatics.
A hidden Markov model can be considered a generalization of a mixture model where the hidden
variables (or latent variables), which control the mixture component to be selected for each
observation, are related through a Markov process rather than independent of each other. Recently,
hidden Markov models have been generalized to pairwise Markov models and triplet Markov models
which allow consideration of more complex data structures and the modelling of nonstationary data.
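The probability an HMM assigns to an observed token sequence, summed over all hidden state sequences, is computed by the forward algorithm. A minimal sketch in Python, using an invented toy weather model (the states, observations, and probabilities are illustrative assumptions):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward algorithm: probability of an observation sequence under
    an HMM, summing over all hidden state sequences."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s]
                                       for r in states)
                 for s in states}
    return sum(alpha.values())

# Invented toy HMM: hidden weather states, visible activities.
states = ["rainy", "sunny"]
start_p = {"rainy": 0.6, "sunny": 0.4}
trans_p = {"rainy": {"rainy": 0.7, "sunny": 0.3},
           "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit_p = {"rainy": {"walk": 0.1, "umbrella": 0.9},
          "sunny": {"walk": 0.8, "umbrella": 0.2}}
p = forward(["umbrella", "walk"], states, start_p, trans_p, emit_p)
```

Only the activities are observed; the weather states stay hidden, which is exactly the "hidden" in hidden Markov model.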
54) Machine learning. Learning decision trees algorithm (from section 18.3 of [1])
Decision tree learning uses a decision tree as a predictive model which maps observations about an
item to conclusions about the item's target value. It is one of the predictive modelling approaches used
in statistics, data mining and machine learning. Tree models where the target variable can take a finite
set of values are called classification trees. In these tree structures, leaves represent class labels and
branches represent conjunctions of features that lead to those class labels. Decision trees where the
target variable can take continuous values (typically real numbers) are called regression trees.
In decision analysis, a decision tree can be used to visually and explicitly represent decisions and
decision making. In data mining, a decision tree describes data but not decisions; rather the resulting
classification tree can be an input for decision making. This section deals with decision trees in data
mining.
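The core of decision tree learning is choosing, at each node, the attribute whose split most reduces entropy. A sketch of that information-gain criterion in Python, with an invented toy dataset:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy from splitting the examples on one
    attribute; tree learning greedily picks the attribute with the
    largest gain at each node."""
    n = len(labels)
    total = entropy(labels)
    for value in set(r[attribute] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attribute] == value]
        total -= (len(subset) / n) * entropy(subset)
    return total

# Invented toy data: should we play outside?
rows = [{"sky": "sunny"}, {"sky": "sunny"}, {"sky": "rainy"}, {"sky": "rainy"}]
labels = ["yes", "yes", "no", "no"]
print(information_gain(rows, labels, "sky"))  # 1.0 bit: a perfect split
```

A gain of 1.0 bit means the split yields pure leaves, so this attribute would become the root test of the tree.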
55) Statistical Learning Methods. Naïve Bayes models (from section 20.2 of [1])
Naive Bayes models have been widely used for clustering and classification. However, they are
seldom used for general probabilistic learning and inference (i.e., for estimating and computing arbitrary
joint, conditional and marginal distributions). Research has shown that, for a wide range of benchmark
datasets, naive Bayes models learned using EM have accuracy and learning time comparable to Bayesian
networks with context-specific independence. Most significantly, naive Bayes inference is orders of
magnitude faster than Bayesian network inference using Gibbs sampling and belief propagation. This
makes naive Bayes models a very attractive alternative to Bayesian networks for general probability
estimation, particularly in large or real-time domains.
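The classification use of naive Bayes mentioned above fits in a few lines. A minimal multinomial naive Bayes sketch in Python, with Laplace smoothing and an invented toy corpus:

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs, labels):
    """Train a multinomial naive Bayes text classifier."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(words, model):
    """Pick the class maximizing log P(class) + sum log P(word|class),
    with Laplace (add-one) smoothing for unseen words."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n in class_counts.items():
        denom = sum(word_counts[c].values()) + len(vocab)
        score = log(n / total) + sum(
            log((word_counts[c][w] + 1) / denom) for w in words)
        if score > best_score:
            best, best_score = c, score
    return best

# Invented toy corpus for illustration:
docs = [["cheap", "pills"], ["cheap", "offer"], ["meeting", "agenda"]]
labels = ["spam", "spam", "ham"]
model = train_nb(docs, labels)
print(predict_nb(["cheap", "meeting"], model))  # spam
```

The per-class scores here are exactly the (log) joint probabilities the paragraph refers to, which is why the same model supports general probability estimation as well as classification.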
56) Single layer feed-forward neural networks (perceptrons) (from section 20.5 of [1])
Any number of McCulloch-Pitts neurons can be connected together in any way we like. The
arrangement that has one layer of input neurons feeding forward to one output layer of McCulloch-Pitts
neurons, with full connectivity, is known as a Perceptron.
This is a very simple network, but it is already a powerful computational device. Later we shall see
variations of it that make it even more powerful.
MLF (multi-layer feed-forward) neural networks, trained with a back-propagation learning algorithm, are the most popular
neural networks. They are applied to a wide variety of chemistry-related problems. A MLF neural
network consists of neurons that are ordered into layers. The first layer is called the input layer, the
last layer is called the output layer, and the layers in between are called hidden layers.
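A single-layer perceptron with a step activation can be trained by the perceptron learning rule, w <- w + lr * (target - output) * x. A minimal sketch in Python, learning the linearly separable AND function (the dataset and hyperparameters are illustrative choices):

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Single-layer perceptron with a step activation, trained by the
    perceptron learning rule: w <- w + lr * (target - output) * x."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - out
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err            # bias learns like an always-on input
    return w, b

def predict(w, b, x):
    """Step activation: fire iff the weighted sum exceeds zero."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# The linearly separable AND function:
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```

The perceptron convergence theorem guarantees this training loop terminates for linearly separable data like AND; for XOR it would oscillate forever, which is one motivation for the multi-layer networks described above.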