
UNIT-I

Q1. What is Artificial Intelligence? Explain history and importance of AI.


Ans:- Artificial Intelligence is the study of how to make computers do things, which at the
moment, people can do better.
History of AI: Although the computer provided the technology necessary for AI, it was not until the
early 1950s that the link between human intelligence and machines was really observed. Norbert
Wiener was one of the first Americans to make observations on the principle of feedback
theory. The most familiar example of feedback theory is the thermostat: it controls the
temperature of an environment by measuring the actual temperature of the house, comparing it to
the desired temperature, and responding by turning the heat up or down. What was so important
about his research into feedback loops was that Wiener theorized that all intelligent behavior was
the result of feedback mechanisms, mechanisms that could possibly be simulated by machines.
This discovery influenced much of the early development of AI.
Importance of AI: The subject of artificial intelligence originated with game-playing and theorem-proving programs and was gradually enriched with theories from a number of parent disciplines.
Learning Systems: Among the subject areas covered under artificial intelligence, learning
systems need special mention. The concept of learning is illustrated here with reference to the
natural problem of a child learning pronunciation from its mother.
Knowledge Representation and Reasoning: In a reasoning problem, one has to reach a predefined goal state from one or more given initial states. The smaller the number of transitions
for reaching the goal state, the higher the efficiency of the reasoning system.
Planning: Another significant area of artificial intelligence is planning. The problems of
reasoning and planning share many common issues, but have a basic difference that originates
from their definitions.
Knowledge Acquisition: Acquisition (Elicitation) of knowledge is equally hard for machines as
it is for human beings. It includes generation of new pieces of knowledge from given knowledge
base, setting dynamic data structures for existing knowledge, learning knowledge from the
environment and refinement of knowledge.
Logic Programming: For more than a century, mathematicians and logicians have designed
various tools to represent logical statements by symbolic operators. One outgrowth of
such attempts is propositional logic, which deals with a set of binary statements (propositions)
connected by Boolean operators.
Soft Computing: Soft computing, according to Prof. Zadeh, is "an emerging approach to
computing, which parallels the remarkable ability of the human mind to reason and learn in an
environment of uncertainty and imprecision."
Q2. What are application fields of AI?
Ans:- Almost every branch of science and engineering currently shares the tools and techniques
available in the domain of artificial intelligence. However, for the sake of the convenience of the
readers, we mention here a few typical applications, where AI plays a significant and decisive
role in engineering automation.
Expert Systems: For example, consider the reasoning process involved in an expert system
for a weather forecasting problem, with special emphasis on its architecture. An expert system
consists of a knowledge base, database and an inference engine for interpreting the database
using the knowledge supplied in the knowledge base. The inference engine attempts to match the
antecedent clauses (IF parts) of the rules with the data stored in the database. When all the
antecedent clauses of a rule are available in the database, the rule is fired, resulting in new
inferences. The resulting inferences are added to the database for activating subsequent firing of
other rules. In order to keep limited data in the database, a few rules that contain an explicit
consequent (THEN) clause to delete specific data from the databases are employed in the
knowledge base. On firing of such rules, the unwanted data clauses as suggested by the rule are
deleted from the database. Suppose, for example, a rule PR1 fires because both of its antecedent clauses are present in the
database. On firing of PR1, its consequent clause "it-will-rain" will be added to the database, enabling the
subsequent firing of another rule PR2.
Image Understanding and Computer Vision: A digital image can be regarded as a two-dimensional array of pixels containing gray levels corresponding to the intensity of the reflected
illumination received by a video camera. For interpretation of a scene, its image should be
passed through three basic processes: low, medium and high level vision.
The purpose of low level vision is to pre-process the image by filtering out noise. The
medium level vision system deals with enhancement of details and segmentation (i.e.,
partitioning the image into objects of interest ). The high level vision system includes three steps:
recognition of the objects from the segmented image, labeling of the image and interpretation of
the scene. Most of the AI tools and techniques are required in high level vision systems.
Recognition of objects from its image can be carried out through a process of pattern
classification, which at present is realized by supervised learning algorithms. The interpretation
process, on the other hand, requires knowledge-based computation.
Speech and Natural Language Understanding: Understanding speech and understanding natural
language are two distinct classical problems. In speech analysis, the main problem is to separate
the syllables of a spoken word and determine features like amplitude, and fundamental and
harmonic frequencies of each syllable. The words then could be identified from the extracted
features by pattern classification techniques. Recently, artificial neural networks have been
employed to classify words from their features. The problem of understanding natural languages
like English, on the other hand, includes syntactic and semantic interpretation of the words in a
sentence, and sentences in a paragraph. The syntactic steps are required to analyze the sentences
by its grammar and are similar with the steps of compilation. The semantic analysis, which is
performed following the syntactic analysis, determines the meaning of the sentences from the
association of the words and that of a paragraph from the closeness of the sentences. A robot
capable of understanding speech in a natural language will be of immense importance, for it
could execute any task verbally communicated to it. The phonetic typewriter, which prints the
words pronounced by a person, is another recent invention where speech understanding is
employed in a commercial application.
Scheduling: In a scheduling problem, one has to plan the time schedule of a set of events to
improve the time efficiency of the solution. For instance in a class-routine scheduling problem,
the teachers are allocated to different classrooms at different time slots, and we want most
classrooms to be occupied most of the time. Flowshop scheduling is an NP-complete
problem, and determination of an optimal schedule (minimizing the make-span) thus requires
an exponential order of time with respect to both machine size and job size. Finding a sub-optimal solution is thus preferred for such scheduling problems. Recently, artificial neural nets

and genetic algorithms have been employed to solve this problem. The heuristic search, to be
discussed shortly, has also been used for handling this problem.
Intelligent Control: In process control, the controller is designed from the known models of the
process and the required control objective. When the dynamics of the plant is not completely
known, the existing techniques for controller design no longer remain valid. Rule-based control
is appropriate in such situations. In a rule-based control system, the controller is realized by a set
of production rules intuitively set by an expert control engineer. The antecedent (premise) part of
the rules in a rule-based system is searched against the dynamic response of the plant parameters.
The rule whose antecedent part matches the plant response is selected and fired. When more
than one rule is firable, the controller resolves the conflict by a set of strategies. On the other
hand, there exist situations when the antecedent part of no rules exactly matches with the plant
responses. Such situations are handled with fuzzy logic, which is capable of matching the
antecedent parts of rules partially/ approximately with the dynamic plant responses. Fuzzy
control has been successfully used in many industrial plants. One typical application is the power
control in a nuclear reactor. Besides design of the controller, the other issue in process control is
to design a plant (process) estimator, which attempts to follow the response of the actual plant,
when both the plant and the estimator are jointly excited by a common input signal. The fuzzy
and artificial neural network-based learning techniques have recently been identified as new
tools for plant estimation.
Q3. Explain State space representation in AI with the help of the Water Jug problem.
Ans:- "You are given two jugs, a 4-gallon one and a 3-gallon one. Neither has any measuring
markers on it. There is a tap that can be used to fill the jugs with water. How can you get exactly
2 gallons of water into the 4-gallon jug?".
We can look at a state as a pair of numbers, where the first represents the number of
gallons of water currently in Jug-A and the second represents the number of gallons in Jug-B.
State space search (production rules):
1)  (x, y) -> (4, y)             if x < 4              (fill the 4-gallon jug)
2)  (x, y) -> (x, 3)             if y < 3              (fill the 3-gallon jug)
3)  (x, y) -> (x - d, y)         if x > 0              (pour some water out of the 4-gallon jug)
4)  (x, y) -> (x, y - d)         if y > 0              (pour some water out of the 3-gallon jug)
5)  (x, y) -> (0, y)             if x > 0              (empty the 4-gallon jug)
6)  (x, y) -> (x, 0)             if y > 0              (empty the 3-gallon jug)
7)  (x, y) -> (4, y - (4 - x))   if x + y >= 4, y > 0  (pour from the 3-gallon jug into the 4-gallon jug until it is full)
8)  (x, y) -> (x - (3 - y), 3)   if x + y >= 3, x > 0  (pour from the 4-gallon jug into the 3-gallon jug until it is full)
9)  (x, y) -> (x + y, 0)         if x + y <= 4, y > 0  (pour all the water from the 3-gallon jug into the 4-gallon jug)
10) (x, y) -> (0, x + y)         if x + y <= 3, x > 0  (pour all the water from the 4-gallon jug into the 3-gallon jug)
11) (0, 2) -> (2, 0)             (pour the 2 gallons from the 3-gallon jug into the 4-gallon jug)
12) (2, y) -> (0, y)             (empty the 2 gallons in the 4-gallon jug on the ground)

Solution:
1. current state = (0, 0)
2. Loop until the goal state (2, 0) is reached:
   (0, 0) -> (0, 3)   [rule 2: fill the 3-gallon jug]
   (0, 3) -> (3, 0)   [rule 9: pour it all into the 4-gallon jug]
   (3, 0) -> (3, 3)   [rule 2: fill the 3-gallon jug again]
   (3, 3) -> (4, 2)   [rule 7: pour into the 4-gallon jug until it is full]
   (4, 2) -> (0, 2)   [rule 5: empty the 4-gallon jug]
   (0, 2) -> (2, 0)   [rule 11: pour the 2 gallons into the 4-gallon jug]
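The same search can be automated with breadth-first search over the (x, y) state space. The sketch below is illustrative (the function name and rule encoding are mine, not from the text); it applies the fill, empty, and pour rules and recovers the same six-step solution:

```python
from collections import deque

def water_jug(cap_a=4, cap_b=3, goal=2):
    """Breadth-first search over states (x, y) = (gallons in the 4-gallon jug,
    gallons in the 3-gallon jug); returns a shortest path from (0, 0) to (goal, 0)."""
    start = (0, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == (goal, 0):
            path = []
            state = (x, y)
            while state is not None:          # walk the parent links back to the start
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(x, cap_b - y)           # amount movable from jug A to jug B
        pour_ba = min(y, cap_a - x)           # amount movable from jug B to jug A
        for nxt in [(cap_a, y), (x, cap_b),   # fill A, fill B
                    (0, y), (x, 0),           # empty A, empty B
                    (x - pour_ab, y + pour_ab),
                    (x + pour_ba, y - pour_ba)]:
            if nxt not in parent:             # skip states already generated
                parent[nxt] = (x, y)
                queue.append(nxt)
    return None

path = water_jug()
print(path)   # [(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2), (2, 0)]
```

Because BFS expands states level by level, the first time (2, 0) is reached the path is guaranteed to use the fewest moves.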
Q4. Discuss constraint satisfaction procedure to solve the following cryptarithmetic
problem:

SEND
+M O R E
MO N E Y
Ans:- We have to replace each letter by a distinct digit so that the resulting sum is correct.
Two-step process:
1. Constraints are discovered and propagated as far as possible.
2. If there is still not a solution, then search begins, adding new constraints.
Two kinds of rules are used:
1. Rules that define valid constraint propagation.
2. Rules that suggest guesses when necessary.
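As a concrete illustration of the two-step process: propagation alone pins down M = 1 (SEND + MORE < 20000, so the carry into the leftmost column is 1), after which search over the remaining digits finishes the puzzle. A rough Python sketch (the helper name is hypothetical):

```python
from itertools import permutations

def solve_send_more_money():
    """Step 1 (propagation): the carry into the leftmost column forces M = 1,
    since SEND + MORE < 20000.  Step 2 (search): try digits for the rest."""
    M = 1
    for S, E, N, D, O, R, Y in permutations([0, 2, 3, 4, 5, 6, 7, 8, 9], 7):
        if S == 0:                      # a leading letter cannot be zero
            continue
        send = 1000*S + 100*E + 10*N + D
        more = 1000*M + 100*O + 10*R + E
        money = 10000*M + 1000*O + 100*N + 10*E + Y
        if send + more == money:
            return {'S': S, 'E': E, 'N': N, 'D': D, 'M': M, 'O': O, 'R': R, 'Y': Y}
    return None

assignment = solve_send_more_money()
print(assignment)   # S=9, E=5, N=6, D=7, M=1, O=0, R=8, Y=2: 9567 + 1085 = 10652
```

A full constraint-satisfaction solver would propagate more deductions (e.g., S = 8 or 9 from the next carry) before guessing; here one propagated constraint already shrinks the search enough to run quickly.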
Q5. Distinguish between A* and AO* algorithms?
Ans:- A* algorithm: A* uses a best-first search and finds the least-cost path from a given initial node to one
goal node (out of one or more possible goals).
It uses a distance-plus-cost heuristic function (usually denoted f(x)) to determine the
order in which the search visits nodes in the tree. The distance-plus-cost heuristic is a
sum of two functions:
o the path-cost function, which is the cost from the starting node to the current node
(usually denoted g(x))
o an admissible "heuristic estimate" of the distance to the goal (usually denoted
h(x)).
o The h(x) part of the f(x) function must be an admissible heuristic; that is, it must
not overestimate the distance to the goal.
o Thus, for an application like routing, h(x) might represent the straight-line
distance to the goal, since that is physically the smallest possible distance between
any two points or nodes.
o If the heuristic h satisfies the additional condition h(x) <= d(x,y) + h(y) for every
edge (x, y) of the graph (where d denotes the length of that edge), then h is called
monotone, or consistent.
o In such a case, A* can be implemented more efficiently: roughly speaking, no
node needs to be processed more than once, and A* is equivalent to running
Dijkstra's algorithm with the reduced cost d'(x, y) := d(x, y) - h(x) + h(y).
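The f(x) = g(x) + h(x) bookkeeping described above can be sketched in Python with a priority queue (the toy graph and its heuristic values are invented for illustration, with h chosen to be admissible):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: expand nodes in order of f(n) = g(n) + h(n).
    `neighbors(n)` yields (successor, edge_cost) pairs; `h` must never overestimate."""
    open_heap = [(h(start), 0, start, [start])]   # (f, g, node, path so far)
    best_g = {start: 0}
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        for succ, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(succ, float('inf')):   # found a cheaper route to succ
                best_g[succ] = g2
                heapq.heappush(open_heap, (g2 + h(succ), g2, succ, [*path, succ]))
    return None

# Hypothetical toy graph; h values play the role of straight-line distances.
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 5)], 'C': [('D', 1)], 'D': []}
h_values = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
result = a_star('A', 'D', lambda n: graph[n], lambda n: h_values[n])
print(result)   # (4, ['A', 'B', 'C', 'D'])
```

With an admissible h, the first time the goal is popped from the queue its g value is the least-cost distance, here 4 via A-B-C-D rather than 5 via A-C-D.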

AO* algorithm:
1. Initialize the graph to the start node.
2. Traverse the graph, following the current path and accumulating nodes that have
not yet been expanded or solved.
3. Pick any of these nodes and expand it; if it has no successors, assign the value
FUTILITY to it, otherwise calculate f for each of the successors.
4. If f is 0, then mark the node as SOLVED.
5. Change the value of f for the newly created node to reflect its successors by
back-propagation.
6. Wherever possible, use the most promising routes, and if a node is marked as
SOLVED, then mark the parent node as SOLVED.
7. If the starting node is SOLVED or its value is greater than FUTILITY, stop; else
repeat from step 2.

Q6. Discuss Game playing. Explain Alpha-beta pruning.


Ans. Game playing
Games are well-defined problems that are generally interpreted as requiring intelligence
to play well.
Introduces uncertainty, since the opponent's moves cannot be determined in advance
Computer programs which play 2-player games
game-playing as search
with the complication of an opponent
General principles of game-playing and search
evaluation functions
minimax principle
alpha-beta-pruning
heuristic techniques
Status of Game-Playing Systems
in chess, checkers, backgammon, Othello, etc, computers routinely defeat leading
world players
Applications?
think of nature as an opponent
economics, war-gaming, medical drug treatment
Alpha-beta pruning
Idea:
Do depth-first search to generate a partial game tree,
Apply the static evaluation function to the leaves,
Compute bounds on the internal nodes.
Alpha, Beta bounds:
An alpha value at a max node means that Max can guarantee a value of at least alpha.
A beta value at a min node means that Min can guarantee a value of at most beta.
Computation:
The alpha of a max node is the maximum value seen so far among its children.
The beta of a min node is the minimum value seen so far among its children.
Pruning

Below a Min node whose beta value is lower than or equal to the alpha value of
its ancestors.
Below a Max node whose alpha value is greater than or equal to the beta value of
any of its Min-node ancestors.
Worst-Case
Branches are ordered so that no pruning takes place. In this case alpha-beta gives
no improvement over exhaustive search
Best-Case
Each player's best move is the left-most alternative (i.e., evaluated first)
In practice, performance is closer to the best case than the worst case
In practice one often gets O(b^(d/2)) rather than O(b^d)
This is the same as having a branching factor of sqrt(b),
since (sqrt(b))^d = b^(d/2) (i.e., we have effectively gone from b to the square
root of b)
In chess this goes from b ~ 35 to b ~ 6,
permitting much deeper search in the same amount of time
Alpha-Beta General Principle
Consider a node n where it is the Player's choice to move to that node. If the Player has a better
choice m at either the parent node of n or at any choice point further up, then n will never be
reached in actual play.
Maintain two parameters in the depth-first search: alpha, the best (highest) value found so
far for MAX along any path; and beta, the best (lowest) value found along any path for MIN. Prune
a subtree once it is known to be worse than the current alpha or beta.
Effectiveness of Alpha-Beta
Amount of pruning depends on the order in which siblings are explored.
In the optimal case, where the best options are explored first, time complexity reduces from O(b^d) to
O(b^(d/2)), a dramatic improvement. But this entails knowledge of the best move in advance!
With successors randomly ordered, the asymptotic bound is O((b/log b)^d), which is not much help
and is only accurate for b > 1000. A more realistic expectation is something like O(b^(3d/4)).
A fairly simple ordering heuristic can produce results closer to optimal than random ordering (e.g.,
check captures and threats first).
Theoretical analysis makes unrealistic assumptions such
as utility values distributed randomly across leaves and therefore experimental results are
necessary.
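The pruning rules above condense into a short recursive routine. The sketch below is illustrative (the 2-ply tree and leaf scores are hypothetical, and `children`/`evaluate` stand in for a real move generator and static evaluation function):

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
    """Minimax with alpha-beta cuts: abandon a subtree once it cannot
    affect the value backed up to the root."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)                 # static evaluation at the frontier
    if maximizing:
        value = -math.inf
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:                 # beta cut: a Min ancestor has a better option
                break
        return value
    else:
        value = math.inf
        for child in kids:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, children, evaluate))
            beta = min(beta, value)
            if beta <= alpha:                 # alpha cut
                break
        return value

# Hypothetical tree: a MAX root over three MIN nodes, each with three leaves.
tree = {'root': ['m1', 'm2', 'm3'],
        'm1': ['a', 'b', 'c'], 'm2': ['d', 'e', 'f'], 'm3': ['g', 'h', 'i']}
leaf = {'a': 3, 'b': 12, 'c': 8, 'd': 2, 'e': 4, 'f': 6, 'g': 14, 'h': 5, 'i': 2}
value = alphabeta('root', 2, -math.inf, math.inf, True,
                  lambda n: tree.get(n, []), lambda n: leaf[n])
print(value)   # 3
```

On this tree, once leaf d returns 2 at node m2 the remaining leaves e and f are never evaluated, since Min can already hold Max below the alpha of 3 established at m1.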
Q7. Explain Min-Max method of generating the game tree.
Ans. The Minimax search procedure is a depth-first, depth-limited procedure. The idea is to start
at the current position & use the plausible-move generator to generate the set of possible
successor positions. Now we can apply the static evaluation function to those positions & simply
choose the best one.
1. Generate the whole game tree to leaves
2. Apply utility (payoff) function to leaves
3. Back-up values from leaves toward the root:
a Max node computes the max of its child values
a Min node computes the Min of its child values
4. When value reaches the root: choose max value and the corresponding move.

1. Minimax Procedure
Figure 2 shows a hypothetical game tree with scores assigned at leaves ( terminal nodes).
As we are looking ahead, we need the evaluation function only at the leaves of the tree,
and the program will make a move based on these values. We start with the First Player at the
root and examine the whole situation from her perspective. The move the program chooses
is a branch coming from the root, and the program picks the move that maximizes the score in
order to minimize mistakes. She also assumes her opponent is as good as she is. At the next
level down, her opponent selects a move to minimize her score, and so on.

Figure 2 A Game tree with values assigned at leaves


By working up from the bottom of the tree, the program can assign backed-up values to
all the nodes. For example, at the right side of Figure 2, the parent of the leaves with
score 3 and 8 is at level 3, corresponding to Second Player who moves to minimize the
score; so he chooses 3, the minimum of ( 3, 8 ) and we assign score 3 to the node. Its
parent, at level 2, corresponding to First Player will choose from ( 3, 12 ) to maximize the
score; so she chooses 12 and the program assigns score 12 to the node; the parent of this
node again corresponds to Second Player who then chooses 6 from ( 6, 12 ) to minimize
the score and so on. The resulted tree is shown in Figure 3

Figure 3 Minimax evaluation of a game tree.


Since we alternately take minima and maxima, this process is called
a minimax procedure. In a minimax tree, one can view, in its entire form, the score
values at each level of the tree at any instant of the game. A player can find out from
the tree which moves are the best. In our example, the current situation has a score of 7.
So First Player should choose the leftmost branch which leads to the child with the
maximum score.
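The back-up computation can be written as one recursive function. The tree here is a hypothetical fragment mirroring the example above: a MIN node over leaves (3, 8) backs up 3, its MAX parent picks 12 from (3, 12), and the MIN node above that picks 6 from (6, 12):

```python
def minimax(node, maximizing, children, evaluate):
    """Back up values: a MAX node takes the maximum of its child values,
    a MIN node the minimum; leaves get their static evaluation."""
    kids = children(node)
    if not kids:
        return evaluate(node)
    values = [minimax(c, not maximizing, children, evaluate) for c in kids]
    return max(values) if maximizing else min(values)

# Hypothetical fragment matching the worked example in the text.
tree = {'min_top': ['leaf6', 'max_mid'],
        'max_mid': ['min_low', 'leaf12'],
        'min_low': ['leaf3', 'leaf8']}
leaf = {'leaf6': 6, 'leaf12': 12, 'leaf3': 3, 'leaf8': 8}
backed_up = minimax('min_top', False, lambda n: tree.get(n, []), lambda n: leaf[n])
print(backed_up)   # 6
```

The recursion visits the tree depth-first, exactly as the prose describes working up from the bottom.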
Q8. What are Blind Search techniques? Differentiate them.
Ans. Blind/Uninformed Search
Having no information about the number of steps from the current state to the
goal.
do not use any specific problem domain information
e.g., searching for a route on a map without using any information about
direction
yes, it seems dumb: but the power is in the generality
examples: breadth-first, depth-first, etc
we will look at several of these classic algorithms
1. Depth-first search: Expand one of the nodes at the deepest level.

Pseudocode for Depth-First Search


Initialize: Let Q = {S}
While Q is not empty
    pull Q1, the first element in Q
    if Q1 is a goal
        report(success) and quit
    else
        child_nodes = expand(Q1)
        eliminate child_nodes which represent loops
        put remaining child_nodes at the front of Q
    end
Continue
Comments
a specific example of the general search tree method
open nodes are stored in a queue Q of nodes
key feature
new unexpanded nodes are put at front of the queue
convention is that nodes are ordered left to right
2. Breadth-first search: Expand all the nodes of one level first.

Pseudocode for Breadth-First Search


Initialize: Let Q = {S}
While Q is not empty
    pull Q1, the first element in Q
    if Q1 is a goal
        report(success) and quit
    else
        child_nodes = expand(Q1)
        eliminate child_nodes which represent loops
        put remaining child_nodes at the back of Q
    end
Continue
Comments
another specific example of the general search tree method
open nodes are stored in a queue Q of nodes
differs from depth-first only in that
new unexpanded nodes are put at back of the queue
convention again is that nodes are ordered left to right
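Since the two pseudocode listings differ only in where new nodes enter Q, they can share one implementation. A minimal Python sketch (the sample tree is made up):

```python
from collections import deque

def graph_search(start, goal, expand, order='bfs'):
    """Generic blind search: Q holds paths; DFS puts children at the front
    of Q, BFS at the back, matching the pseudocode above."""
    Q = deque([[start]])
    visited = {start}
    while Q:
        path = Q.popleft()                    # pull Q1, the first element in Q
        node = path[-1]
        if node == goal:
            return path
        children = [c for c in expand(node) if c not in visited]  # eliminate loops
        visited.update(children)
        new_paths = [path + [c] for c in children]
        if order == 'dfs':
            Q.extendleft(reversed(new_paths)) # front of Q, keeping left-to-right order
        else:
            Q.extend(new_paths)               # back of Q
    return None

# Hypothetical search tree.
tree = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G'], 'C': [], 'D': [], 'G': []}
bfs_path = graph_search('S', 'G', lambda n: tree[n], 'bfs')
dfs_path = graph_search('S', 'G', lambda n: tree[n], 'dfs')
print(bfs_path)   # ['S', 'B', 'G']
print(dfs_path)   # ['S', 'B', 'G']
```

On this tiny tree both strategies find the same path; they differ in which open node is expanded next, which matters on deeper or infinite trees.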
Q9. Differentiate between
a. Heuristic and Brute force search
b. LISP and PROLOG
Ans.(a) Heuristic search :

Involving or serving as an aid to learning, discovery, or problem-solving by experimental and


especially trial-and-error methods. A heuristic technique improves the efficiency of a search
process, possibly by sacrificing claims of completeness or optimality.
A heuristic is a method that
might not always find the best solution
but is guaranteed to find a good solution in reasonable time.
By sacrificing completeness it increases efficiency.
Useful in solving tough problems which
o could not be solved any other way.
o solutions take an infinite time or very long time to compute
Heuristics are a weapon against combinatorial explosion.
Optimal solutions are rarely needed.
Heuristic Search methods Generate and Test Algorithm
1. generate a possible solution which can either be a point in the problem space or a path
from the initial state.
2. test to see if this possible solution is a real solution by comparing the state reached with
the set of goal states.
3. if it is a real solution, return. Otherwise repeat from 1.
This method is basically a depth first search as complete solutions must be created before testing.
It is often called the British Museum method as it is like looking for an exhibit at random. A
heuristic is needed to sharpen up the search. Consider the problem of four 6-sided cubes, and
each side of the cube is painted in one of four colours. The four cubes are placed next to one
another and the problem lies in arranging them so that the four available colours are displayed
whichever way the 4 cubes are viewed. The problem can only be solved if there are at least four
sides coloured in each colour and the number of options tested can be reduced using heuristics if
the most popular colour is hidden by the adjacent cube.
Hill climbing
Here the generate-and-test method is augmented by a heuristic function which measures the
closeness of the current state to the goal state.
1. Evaluate the initial state; if it is the goal state, quit; otherwise make it the current state.
2. Select a new operator for this state and generate a new state.
3. Evaluate the new state:
   o if it is closer to the goal state than the current state, make it the current state
   o if it is no better, ignore it
4. If the current state is the goal state, or no new operators are available, quit.
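The loop above can be sketched as follows; this is the steepest-ascent variant (it takes the best improving neighbour rather than the first one found), and the one-dimensional landscape and the names `neighbors`/`score` are illustrative assumptions:

```python
def hill_climb(initial, neighbors, score, is_goal):
    """Climb toward the goal: move to the best scoring neighbour that improves
    on the current state; stop at the goal or at a local maximum."""
    current = initial
    while not is_goal(current):
        better = [n for n in neighbors(current) if score(n) > score(current)]
        if not better:
            return current           # stuck: no neighbour is closer to the goal
        current = max(better, key=score)
    return current

# Hypothetical 1-D landscape: states are integers, score peaks at the goal 7.
result = hill_climb(0,
                    lambda s: [s - 1, s + 1],       # generate neighbouring states
                    lambda s: -abs(s - 7),          # closeness to the goal
                    lambda s: s == 7)
print(result)   # 7
```

On a landscape with local maxima the same loop can stop short of the goal, which is exactly the weakness the surrounding discussion of heuristics trades against efficiency.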
The Travelling Salesman Problem
A salesman has a list of cities, each of which he must visit exactly once. There are direct
roads between each pair of cities on the list. Find the route the salesman should follow for the
shortest possible round trip that both starts and finishes at any one of the cities.

Nearest neighbour heuristic:


1. Select a starting city.
2. Move to the unvisited city closest to the current city.
3. Repeat step 2 until all cities have been visited.
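The three steps translate directly into code. A rough Python sketch (the cities and distances are made up for illustration):

```python
def nearest_neighbour_tour(cities, dist, start):
    """Greedy TSP heuristic: always visit the closest unvisited city,
    then return to the starting city to close the round trip."""
    tour = [start]
    unvisited = set(cities) - {start}
    while unvisited:
        nearest = min(unvisited, key=lambda c: dist(tour[-1], c))
        tour.append(nearest)
        unvisited.remove(nearest)
    tour.append(start)               # close the round trip
    return tour

# Hypothetical cities placed on a line; distance is just the separation.
coords = {'A': 0, 'B': 1, 'C': 5, 'D': 2}
tour = nearest_neighbour_tour(coords,
                              lambda u, v: abs(coords[u] - coords[v]),
                              'A')
print(tour)   # ['A', 'B', 'D', 'C', 'A']
```

The heuristic is fast but not optimal in general; it may end with a long final leg back to the start that a full search would have avoided.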
Brute force search
The most general search algorithms are brute-force searches, since they do not require any
domain-specific knowledge. All that is required for a brute-force search is a state description, a
set of legal operators, an initial state, and a description of the goal state. Brute-force search is
also called uninformed search or blind search.
Brute-force search should proceed in a systematic way by exploring nodes in some
predetermined order or simply by selecting nodes at random. Search programs either return only
a solution value when a goal is found or record and return the solution path.

Brute-force search, also known as exhaustive search, is the simplest and crudest of all
possible heuristics: it means checking every single point in the function space.
Nearly all thought in heuristic pertains to how to find a solution, an optimum, or a pretty
good combination without searching every point in the design space. Thus brute-force
search is the null heuristic.
It's what you do when you don't know of any heuristic that could simplify the problem.
That said, though, brute force always has the last word. However you whittle down
your search space, you still must examine one possibility at a time, even in your much-reduced search space.
Using computers, brute-force search is becoming more and more feasible for more kinds
of problems. Brute-force search always has the advantage that it requires no imagination
or cleverness.
There is none of the metaheuristic problem of finding a good heuristic. If you want
results fast, and the problem is small enough that brute-force search can find the
solution, brute-force search is the way to go.
No matter how fast computers get, though, the vast majority of interesting problems will
never submit to brute force.
The reason is that in many problems, the number of combinations grows so quickly that
even if all the universe's matter were converted into the fastest computers, it would still
take more years to find the solution than there are sub-atomic particles in the universe.
For example, brute force cannot figure out optimal play in chess. There are 20 possible
opening moves in chess, and approximately that number of possible responses. 20 x 20
x 20 x 20 x ... soon multiplies out to a vaster number than anything that any computer
could ever deal with.
But, sometimes you can find a clever way to use brute force on part of a problem. And
often that's a huge advance.

(b) LISP :

Lisp (or LISP) is a family of computer programming languages with a long history and a
distinctive, fully parenthesized syntax. Originally specified in 1958, Lisp is the second-oldest
high-level programming language in widespread use today.
The name LISP derives from "LISt Processing". Linked lists are one of Lisp's
major data structures, and Lisp source code is itself made up of lists. As a result, Lisp programs
can manipulate source code as a data structure, giving rise to the macro systems that allow
programmers to create new syntax or even new domain-specific languages embedded in Lisp.
Versions of LISP

Lisp is an old language with many variants

Lisp is alive and well today


Most modern versions are based on Common Lisp
LispWorks is based on Common Lisp
Scheme is one of the major variants
The essentials haven't changed much

Recursion

Recursion is essential in Lisp

A recursive definition is a definition in which

certain things are specified as belonging to the category being defined, and

a rule or rules are given for building new things in the category from other things already
known to be in the category.
Informal Syntax

An atom is either an integer or an identifier.

A list is a left parenthesis, followed by zero or more S-expressions, followed by a right


parenthesis.

An S-expression is an atom or a list.

Example: (A (B 3) (C) ( ( ) ) )
Formal Syntax (approximate)

<S-expression> ::= <atom> | <list>


<atom> ::= <number> | <identifier>
<list> ::= ( <S-expressions> )
<S-expressions > ::= <empty>
| <S-expressions > <S-expression>
<number> ::= <digit> | <number> <digit>
<identifier> ::= string of printable characters, not including parentheses

T and NIL

NIL is the name of the empty list


As a test, NIL means false
T is usually used to mean true, but
anything that isn't NIL is true
NIL is both an atom and a list
it's defined this way, so just accept it

Function calls and data

A function call is written as a list


the first element is the name of the function
remaining elements are the arguments
Example: (F A B)
calls function F with arguments A and B
Data is written as atoms or lists
Example: (F A B) is a list of three elements
Do you see a problem here?

Basic Functions

CAR returns the head of a list


CDR returns the tail of a list
CONS inserts a new head into a list
EQ compares two atoms for equality
ATOM tests if its argument is an atom

Other useful Functions

(NULL S) tests if S is the empty list


(LISTP S) tests if S is a list
LIST makes a list of its (evaluated) arguments
(LIST 'A '(B C) 'D) returns (A (B C) D)
(LIST (CDR '(A B)) 'C) returns ((B) C)

APPEND concatenates two lists

(APPEND '(A B) '((X) Y) ) returns (A B (X) Y)

Examples of CAR, CDR, and CONS:

L             (CAR L)      (CDR L)      (CONS (CAR L) (CDR L))
(A B C)       A            (B C)        (A B C)
( (X Y) Z)    (X Y)        (Z)          ( (X Y) Z)
(X)           X            ()           (X)
( ( ) ( ) )   ()           (())         ( ( ) ( ) )
()            undefined    undefined    undefined

ATOM

ATOM takes any S-expression as an argument


ATOM returns "true" if the argument you gave it is an atom
As with any predicate, ATOM returns either NIL or something that isn't NIL

COND

COND implements the if...then...elseif...then...elseif...then... control structure

The arguments to a function are evaluated before the function is called


This isn't what you want for COND

COND is a special form, not a function

Defining Functions

(DEFUN function_name parameter_list


function_body )
Example: Test if the argument is the empty list
(DEFUN NULL (X)
(COND
(X NIL)
(T T) ) )

Rules for Recursion

Handle the base (simplest) cases first

Recur only with a simpler case


Simpler = more like the base case
Don't alter global variables (you can't anyway with these Lisp functions)
Don't look down into the recursion

Guidelines for Lisp Functions

Unless the function is trivial, start with COND.

Handle the base case first.


Avoid having more than one base case.

The base case is usually testing for NULL.


Do something with the CAR and recur with the CDR.

PROLOG :
Prolog is a general-purpose logic programming language associated with artificial
intelligence and computational linguistics. Prolog has its roots in formal logic, and unlike many
other programming languages, Prolog is declarative: the program logic is expressed in terms of
relations, represented as facts and rules. A computation is initiated by running a query over these
relations. Prolog was one of the first logic programming languages, and remains among the most
popular such languages today, with many free and commercial implementations available. While
initially aimed at natural language processing, the language has since stretched far into other
areas such as theorem proving, expert systems, and games.
Prolog is the major example of a fourth generation programming language supporting the
declarative programming paradigm. The programs in this tutorial are written in
'standard' University of Edinburgh Prolog, as specified in the classic Prolog textbook by authors
Clocksin and Mellish (1981,1992).
What Is Prolog?

Prolog is a logic-based language


With a few simple rules, information can be analyzed

Syntax

.pl files contain lists of clauses

Clauses can be either facts or rules


male(bob).
male(harry).
child(bob,harry).
son(X,Y) :- male(X), child(X,Y).
Rules

Rules combine facts to increase knowledge of the system


son(X,Y) :- male(X), child(X,Y).

X is a son of Y if X is male and X is a child of Y

UNIT-II
Q10. Explain Bayes theorem with example.
Ans. Thomas Bayes addressed both the case of discrete probability distributions of data and the more
complicated case of continuous probability distributions. In the discrete case, Bayes' theorem relates
the conditional and marginal probabilities of events A and B, provided that the probability of B does not
equal zero:

P(A|B) = P(B|A) * P(A) / P(B)
Each term in Bayes' theorem has a conventional name:


P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it
does not take into account any information about B.
P(A|B) is the conditional probability of A, given B. It is also called the posterior
probability because it is derived from or depends upon the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called the likelihood.
P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Bayes' theorem in this form gives a mathematical representation of how the conditional
probability of event A given B is related to the converse conditional probability
of B given A.
Most students understand that the probability of an event occurring can be influenced by another
event that has already occurred. However, many students cannot understand that the probability
of an event occurring can actually be dependent on an event that occurred later. Having
information about the outcome of an event can be used to revise probabilities of the occurrence
of a previous event. This lesson plan will help teachers to correct this common student
misconception. For students with more probability experience, this lesson also introduces them
to Bayes Theorem. Bayes Theorem provides a formula that to find one conditional probability
if other conditional probabilities are known. More specifically, it can be used to find P(A|B) if
P(B|A) is known. Bayes Theorem is usually expressed as:
P(A|B) = P(AB)/P(B) = [P(B|A) * P(A)] / [P(B|A) * P(A) + P(B|~A) * P(~A)]
If A and B are two mutually exclusive events, then P(AB) = 0 and P(A v B) = P(A) + P(B).
A simple example of Bayes' theorem
Suppose there is a school with 60% boys and 40% girls as its students. The female students wear
trousers or skirts in equal numbers; the boys all wear trousers. An observer sees a (random)
student from a distance, and what the observer can see is that this student is wearing trousers.
What is the probability this student is a girl? The correct answer can be computed using Bayes'
theorem.
The event A is that the student observed is a girl, and the event B is that the student observed is
wearing trousers. To compute P(A|B), we first need to know:

P(A), or the probability that the student is a girl regardless of any other information. Since
the observer sees a random student, meaning that all students have the same probability of
being observed, and the fraction of girls among the students is 40%, this probability equals
0.4.

P(B|A), or the probability of the student wearing trousers given that the student is a girl.
Since they are as likely to wear skirts as trousers, this is 0.5.

P(B), or the probability of a (randomly selected) student wearing trousers regardless of


any other information. Since half of the girls and all of the boys are wearing trousers, this
is 0.5 * 0.4 + 1.0 * 0.6 = 0.8.
Given all this information, the probability of the observer having spotted a girl given that the
observed student is wearing trousers can be computed by substituting these values in the
formula:
P(A|B) = P(B|A) * P(A) / P(B) = (0.5 * 0.4) / 0.8 = 0.25
Another, essentially equivalent way of obtaining the same result is as follows. Assume, for
concreteness, that there are 100 students, 60 boys and 40 girls. Among these, 60 boys and
20 girls wear trousers. All together there are 80 trouser-wearers, of which 20 are girls.
Therefore the chance that a random trouser-wearer is a girl equals 20/80 = 0.25. Put in terms
of Bayes' theorem: the probability of a student being a girl is 40/100, and the probability that
any given girl will wear trousers is 1/2, so the product of these two is 20/100. But we know the
student is wearing trousers, so we restrict attention to the 80 trouser-wearers and
calculate a probability of (20/100)/(80/100) = 20/80.
It is often helpful when calculating conditional probabilities to create a simple table
containing the number of occurrences of each outcome, or the relative frequencies of each
outcome, for each of the independent variables. The table below illustrates the use of this
method for the above girl-or-boy example
           Girls   Boys   Total
Trousers     20     60      80
Skirts       20      0      20
Total        40     60     100
Bayes' theorem derived via conditional probabilities
To derive Bayes' theorem, start from the definition of conditional probability. The probability of
the event A given the event B is
P(A|B) = P(AB) / P(B)
Equivalently, the probability of the event B given the event A is
P(B|A) = P(AB) / P(A)
Rearranging both for P(AB) and equating gives P(A|B) * P(B) = P(B|A) * P(A); dividing by
P(B) yields Bayes' theorem.
Q 11. What are issues involved in representation of knowledge?

Ans. Below are listed issues that should be raised when using a knowledge representation
technique:
Important Attributes
-- Are there any attributes that occur in many different types of problem?
There are two: instance and isa. Each is important because each supports property
inheritance.
Single-Valued Attributes
- Introduce an explicit notation for temporal intervals. If two values are given for the same
time, then signal a contradiction automatically.
- Assume the only temporal interval is NOW. So, if new value comes then replace the old
value.
- Provide no explicit support. If an attribute has one value then it is known not to have all
other values.
Relationships
-- What about the relationship between the attributes of an object, such as inverses,
existence, techniques for reasoning about values, and single-valued attributes? We can
consider an example of an inverse in band(John Zorn, Naked City).
This can be treated as "John Zorn plays in the band Naked City" or "John Zorn's band is
Naked City".
Another representation is:
band = Naked City
band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron
Granularity
-- At what level should the knowledge be represented and what are the primitives. Clearly the
separate levels of understanding require different levels of primitives and these need many rules
to link together apparently similar primitives.
Obviously there is a potential storage problem and the underlying question must be what level of
comprehension is needed.
Finding the right structures as needed:
Selecting an initial structure.
Revising the choice when necessary.
The range of knowledge representation issues includes, but is not limited to:
1. measure of a KR approach's adequacy to the represented knowledge
2. measure of the role of knowledge with respect to the goal that is trying to be achieved
3. measure of overall quality of knowledge within the knowledge representation
4. measure of knowledge uncertainty for knowledge utilization by the autonomous system
5. measure of the consistency of knowledge that is provided by autonomous software agents
or by service providers
6. measure of the role of ontologies in autonomous systems
Q 12. Represent following sentences in WFF or Predicate logic:
a. All gardeners like sun.
∀x: gardener(x) → likes(x, Sun)
b. Everyone is younger than his father.
∀x ∃y: younger(x, y)   (where y is x's father)
c. John likes all kinds of food.
∀x: food(x) → likes(John, x)
d. Everyone is loyal to someone.
∀x ∃y: loyalto(x, y)
e. Apple is food.
food(Apple)
Q 13. Discuss Frames in detail.
Ans. A frame is a data-structure for representing a stereotyped situation, like being in a certain
kind of living room, or going to a child's birthday party. Attached to each frame are several kinds
of information. Some of this information is about how to use the frame. Some is about what one
can expect to happen next. Some is about what to do if these expectations are not confirmed.
A frame is a data structure introduced by Marvin Minsky in the 1970s that can be used for
knowledge representation. Minsky frames are intended to help an artificial intelligence system
recognize specific instances of patterns. Frames usually contain properties called attributes or
slots. Slots may contain default values (subject to override by detecting a different value for an
attribute), refer to other frames (component relationships) or contain methods for recognizing
pattern instances. Frames are thus a machine-usable formalization of concepts or schemata. In
contrast, the object-oriented paradigm partitions an information domain into abstraction
hierarchies (classes and subclasses) rather than partitioning into component hierarchies, and is
used to implement any kind of information processing. Frame Technology is loosely based on
Minsky Frames, its purpose being software synthesis rather than pattern analysis.
Like many other knowledge representation systems and languages, frames are an attempt to
resemble the way human beings store knowledge. It seems that we store our knowledge in
rather large chunks, and that different chunks are highly interconnected. In frame-based
knowledge representations, knowledge describing a particular concept is organized as a
frame. The frame usually contains a name and a set of slots.
The slots describe the frame with attribute-value pairs <slotname value> or alternatively a triple
containing framename, slotname and value in some order. In many frame systems the slots are
complex structures that have facets describing the properties of the slot. The value of a slot may
be a primitive such as a text string or an integer, or it may be another frame. Most systems allow
multiple values for slots and some systems support procedural attachments. These attachments
can be used to compute the slot value, or they can be triggers used to make consistency checking
or updates of other slots. The triggers can be trigged by updates on slots.
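The slot-and-default machinery described above can be sketched in a few lines of Python. The bird/penguin names are illustrative, not from the text; the point is slots, default values, and inheritance through an isa link:

```python
# A minimal sketch of a frame system: each frame has a name, an optional
# "isa" parent frame, and a dictionary of slots (attribute-value pairs).
class Frame:
    def __init__(self, name, isa=None, **slots):
        self.name = name
        self.isa = isa            # parent frame in the hierarchy
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot value; fall back to the parent frame's default."""
        if slot in self.slots:
            return self.slots[slot]
        if self.isa is not None:
            return self.isa.get(slot)
        raise KeyError(slot)

bird = Frame("bird", legs=2, can_fly=True)
penguin = Frame("penguin", isa=bird, can_fly=False)  # override the default
tweety = Frame("tweety", isa=penguin)

print(tweety.get("legs"))     # inherited from bird: 2
print(tweety.get("can_fly"))  # overridden in penguin: False
```

The override in `penguin` shows the "default values subject to override" behaviour of slots.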

Q 14.What are Rule based Deduction systems? Also discuss certainty factors in detail.

Ans. The way in which a piece of knowledge is expressed by a human expert carries important
information,
example: if the person has fever and feels tummy-pain then she may have an infection.
In logic it can be expressed as follows:
∀x. (has_fever(x) & tummy_pain(x) → has_an_infection(x))
If we convert this formula to clausal form we lose the content, as we may then have equivalent
formulas like:
(i) has_fever(x) & ~has_an_infection(x) → ~tummy_pain(x)
(ii) ~has_an_infection(x) & tummy_pain(x) → ~has_fever(x)
Notice that (i) and (ii) are logically equivalent to the original sentence, but
they have lost the main information contained in its formulation.
Forward Production System: The main idea behind the forward/backward production systems is:
to take advantage of the implicational form in which production rules are stated
by the expert
and use that information to help achieve the goal.
In the present systems the formulas have two forms:
rules
and facts
Rules are the productions stated in implication form.
Rules express specific knowledge about the problem.
Facts are assertions not expressed as implications.
The task of the system will be to prove a goal formula with these facts and rules.
In a forward production system the rules are expressed as F-rules
F-rules operate on the global database of facts until the termination condition is
achieved.
This sort of proving system is a direct system rather than a refutation system.
Facts
Facts are expressed in AND/OR form.
An expression in AND/OR form consists of sub-expressions of literals connected
by & and V symbols.
An expression in AND/OR form is not in clausal form.
Steps to transform facts into AND/OR form for the forward system:
1. Eliminate (temporarily) implication symbols.
2. Reverse quantification of variables in first disjunct by moving negation symbol.
3. Skolemize existential variables.
4. Move all universal quantifiers to the front and drop.
5. Rename variables so the same variable does not occur in different main conjuncts.
- Main conjuncts are small AND/OR trees, not necessarily sums of literal clauses as
in Prolog.
Steps to transform the rules into a free-quantifier form:

1. Eliminate (temporarily) implication symbols.


2. Reverse quantification of variables in first disjunct by moving negation symbol.
3. Skolemize existential variables.
4. Move all universal quantifiers to the front and drop.
5. Restore implication.
All variables appearing on the final expressions are assumed to be universally quantified.
E.g. Original formula: ∀x. ((∃y. ∀z. p(x, y, z)) → ∀u. q(x, u))
Converted formula: p(x, y, f(x, y)) → q(x, u).
Backward Production System:We restrict B-rules to expressions of the form: W ==> L,
where W is an expression in AND/OR form and L is a literal,
and the scope of quantification of any variables in the implication is the entire implication.
Recall that W==>(L1 & L2) is equivalent to the two rules: W==>L1 and W==>L2.
An important property of logic is the duality between assertions and goals in theorem-proving
systems.
Duality between assertions and goals allows the goal expression to be treated as if it were an
assertion.
Conversion of the goal expression into AND/OR form:
1. Elimination of implication symbols.
2. Move negation symbols in.
3. Skolemize existential variables.
4. Drop existential quantifiers. Variables remaining in the AND/OR form are considered to
be existentially quantified.
Goal clauses are conjunctions of literals and the disjunction of these clauses is the clause form of
the goal well-formed formula.

Q 15.What is Reasoning under Uncertainty?

Ans. Axioms of Probability Theory


Probability Theory provides us with the formal mechanisms and rules for manipulating
propositions represented probabilistically. The following are the three axioms of probability
theory:
0 <= P(A=a) <= 1 for all a in sample space of A
P(True)=1, P(False)=0
P(A v B) = P(A) + P(B) - P(A ^ B)
From these axioms we can show the following properties also hold:
P(~A) = 1 - P(A)
P(A) = P(A ^ B) + P(A ^ ~B)
Sum{P(A=a)} = 1, where the sum is over all possible values a in the sample space of A
Conditional Probabilities
Conditional probabilities are key for reasoning because they formalize the process of
accumulating evidence and updating probabilities based on new evidence. For example,
if we know there is a 4% chance of a person having a cavity, we can represent this as the
prior (aka unconditional) probability P(Cavity)=0.04. Say that person now has a
symptom of a toothache, we'd like to know what is the posterior probability of a Cavity
given this new evidence. That is, compute P(Cavity | Toothache).
If P(A|B) = 1, this is equivalent to the sentence in Propositional Logic B => A. Similarly,
if P(A|B) =0.9, then this is like saying B => A with 90% certainty. In other words, we've
made implication fuzzy because it's not absolutely certain.
Given several measurements and other "evidence", E1, ..., Ek, we will formulate queries
as P(Q | E1, E2, ..., Ek) meaning "what is the degree of belief that Q is true given that we
know E1, ..., Ek and nothing else."
Conditional probability is defined as: P(A|B) = P(A ^ B)/P(B) = P(A,B)/P(B)
One way of looking at this definition is as a normalized (using P(B)) joint probability
(P(A,B)).
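The definition P(A|B) = P(A,B)/P(B) can be made concrete with the cavity/toothache example. Only P(Cavity) = 0.04 comes from the text; the rest of the joint table below is an assumed illustration:

```python
# Conditional probability from a small joint distribution,
# P(A|B) = P(A,B) / P(B). The joint numbers are illustrative, chosen so
# that P(Cavity) = 0.03 + 0.01 = 0.04 as in the text.
joint = {
    ("cavity", "toothache"): 0.03,
    ("cavity", "no_toothache"): 0.01,
    ("no_cavity", "toothache"): 0.02,
    ("no_cavity", "no_toothache"): 0.94,
}

# P(B): marginalize the joint over A
p_b = sum(p for (a, b), p in joint.items() if b == "toothache")
p_ab = joint[("cavity", "toothache")]

# The posterior P(Cavity | Toothache) is the normalized joint
p_a_given_b = p_ab / p_b
print(p_a_given_b)  # ≈ 0.6
```

Note how the evidence (toothache) raises the belief in a cavity from the prior 0.04 to 0.6.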
Using Bayes's Rule
Bayes's Rule is the basis for probabilistic reasoning because given a prior model of the
world in the form of P(A) and a new piece of evidence B, Bayes's Rule says how the new
piece of evidence decreases my ignorance about the world by defining P(A|B).
Combining Multiple Evidence using Bayes's Rule
Generalizing Bayes's Rule for two pieces of evidence, B and C, we get:
P(A|B,C) = P(A) P(B,C|A) / P(B,C)
= P(A) * [P(B|A)/P(B)] * [P(C | A,B)/P(C|B)]
A is (unconditionally) independent of B if P(A|B) = P(A). In this case, P(A,B) =
P(A)P(B).
A is conditionally independent of B given C if P(A|B,C) = P(A|C) and, symmetrically,
P(B|A,C) = P(B|C). What this means is that if we know P(A|C), we also know P(A|B,C),
so we don't need to store this case. Furthermore, it also means that P(A,B|C) = P(A|C) P(B|C).
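Combining evidence under a conditional-independence assumption can be sketched as follows. All the numbers are illustrative, and the factored form assumes P(B,C|A) = P(B|A)P(C|A), as described above:

```python
# Combining two pieces of evidence B and C for hypothesis A via Bayes's
# rule, assuming B and C are conditionally independent given A (and ~A).
def posterior(p_a, p_b_given, p_c_given):
    """P(A|B,C). p_b_given = (P(B|A), P(B|~A)); similarly for p_c_given."""
    num = p_a * p_b_given[0] * p_c_given[0]
    den = num + (1 - p_a) * p_b_given[1] * p_c_given[1]
    return num / den  # normalize over A and ~A

# prior 0.1; each evidence item is 4x more likely under A than under ~A
print(posterior(0.1, (0.8, 0.2), (0.8, 0.2)))  # ≈ 0.64
```

Each conditionally independent evidence item multiplies in a likelihood ratio, so belief in A rises from 0.1 to 0.64 after two observations.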
Bayesian Networks
Bayesian Networks, also known as Bayes Nets, Belief Nets, Causal Nets, and Probability
Nets, are a space-efficient data structure for encoding all of the information in the full
joint probability distribution for the set of random variables defining a domain. That is,
from the Bayesian Net one can compute any value in the full joint probability distribution
of the set of random variables.
Represents all of the direct causal relationships between variables
Intuitively, to construct a Bayesian net for a given set of variables, draw arcs from cause
variables to immediate effects.

Space efficient because it exploits the fact that in many real-world problem domains the
dependencies between variables are generally local, so there are a lot of conditionally
independent variables
Captures both qualitative and quantitative relationships between variables
Can be used to reason
o Forward (top-down) from causes to effects -- predictive reasoning (aka causal
reasoning)
o Backward (bottom-up) from effects to causes -- diagnostic reasoning
Formally, a Bayesian Net is a directed, acyclic graph (DAG), where there is a node for
each random variable, and a directed arc from A to B whenever A is a direct causal
influence on B. Thus the arcs represent direct causal relationships and the nodes represent
states of affairs. The occurrence of A provides support for B, and vice versa. The
backward influence is call "diagnostic" or "evidential" support for A due to the
occurrence of B.
Each node A in a net is conditionally independent of any subset of nodes that are not
descendants of A given the parents of A.
Building a Bayesian Net
Intuitively, "to construct a Bayesian Net for a given set of variables, we draw arcs from cause
variables to immediate effects. In almost all cases, doing so results in a Bayesian network [whose
conditional independence implications are accurate]." (Heckerman, 1996)
More formally, the following algorithm constructs a Bayesian Net:
1. Identify a set of random variables that describe the given problem domain
2. Choose an ordering for them: X1, ..., Xn
3. for i=1 to n do
a. Add a new node for Xi to the net
b. Set Parents(Xi) to be the minimal set of already added nodes such that we have
conditional independence of Xi and all other members of {X1, ..., Xi-1} given
Parents(Xi)
c. Add a directed arc from each node in Parents(Xi) to Xi
d. If Xi has at least one parent, then define a conditional probability table at Xi:
P(Xi=x | possible assignments to Parents(Xi)). Otherwise, define a prior
probability at Xi: P(Xi)
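The algorithm above yields a net whose joint distribution factors into the conditional probability tables. A tiny sketch with an assumed Cloudy → Rain → WetGrass chain (structure and numbers are illustrative, not from the text):

```python
# A tiny Bayesian net: Cloudy -> Rain -> WetGrass.
# The full joint factors as P(C) * P(R|C) * P(W|R), one table per node.
p_cloudy = {True: 0.5, False: 0.5}
p_rain_given_cloudy = {True: {True: 0.8, False: 0.2},
                       False: {True: 0.1, False: 0.9}}   # P(R=r | C=c)
p_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}      # P(W=w | R=r)

def joint(c, r, w):
    """Chain rule for the net: multiply each node's table entry."""
    return p_cloudy[c] * p_rain_given_cloudy[c][r] * p_wet_given_rain[r][w]

# Any value of the full joint distribution is recoverable, e.g. the
# marginal P(WetGrass=True) by summing out Cloudy and Rain:
p_wet = sum(joint(c, r, True) for c in (True, False) for r in (True, False))
print(p_wet)  # ≈ 0.515
```

The space saving is visible even here: three small tables (2 + 4 + 4 entries) encode all 8 joint probabilities.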

Q 16. Explain Temporal Reasoning.


Ans. Temporal reasoning is only one of the components of CEP (complex event processing).
CEP is about processing a large amount of events and identifying the meaningful events out of
the event cloud. CEP uses techniques such as detection of complex patterns, etc.
It aims at describing the common-sense background knowledge on which our human
perspective on the physical reality is based. Methodologically, qualitative constraint calculi
restrict the vocabulary of rich mathematical theories dealing with temporal or spatial entities
such that specific aspects of these theories can be treated within decidable fragments with simple
qualitative (non-metric) languages. Contrary to mathematical or physical theories about space
and time, qualitative constraint calculi allow for rather inexpensive reasoning about entities
located in space and time. For this reason, the limited expressiveness of qualitative
representation formalism calculi is a benefit if such reasoning tasks need to be integrated in
applications.

Temporal reasoning requires:


A CEP enabled engine (time and events)
Ability to express temporal relationships
Requires a reference clock
Requires support of temporal dimension
Temporal reasoning is widely used in AI, especially for natural language processing. Existing
methods for temporal reasoning are extremely expensive in time and space, because complete
graphs are used.
We present an approach of temporal reasoning for expert systems in technical applications that
reduces the amount of time and space by using sequence graphs.
A sequence graph consists of one or more sequence chains and other intervals that are connected
only loosely with these chains. Sequence chains are based on the observation that in technical
applications many events occur sequentially.
The uninterrupted execution of technical processes for a long time is characteristic for technical
applications. To relate the first intervals in the application with the last ones makes no sense. In
sequence graphs only these relations are stored that are needed for further propagation.
In contrast to other algorithms which use incomplete graphs, no information is lost and the
reduction of complexity is significant. Additionally, the representation is more transparent,
because the "flow" of time is modelled.
Reasoning about space and time is a major field of interest in many areas of theoretical and
applied AI, especially in the theory and application of temporal and spatial models in planning,
high-level navigation of autonomous mobile robots, natural language understanding, temporal
databases, and concurrent and distributed programming.
The special track on spatio-temporal reasoning focuses on research and development aspects in
the area of reasoning about models of space and time.
Q 17. Explain fuzzy reasoning.
Ans. Fuzzy logic is a form of multi-valued logic derived from fuzzy set theory to deal with
reasoning that is approximate rather than accurate. Fuzzy logic corresponds to "degrees of truth",
while probabilistic logic corresponds to "probability, likelihood"; as these differ, fuzzy logic and
probabilistic logic yield different models of the same real-world situations.
A fuzzy concept is a concept of which the content, value, or boundaries of application can vary
according to context or conditions, instead of being fixed once and for all.
Usually this means the concept is vague, lacking a fixed, precise meaning, without however
being meaningless altogether. It does have a meaning, or rather multiple meanings (it has
different semantic associations). But these can become clearer only through further elaboration
and specification, including a closer definition of the context in which they are used. Fuzzy
concepts "lack clarity and are difficult to test or operationalize". In logic, fuzzy concepts are
often regarded as concepts which in their application are neither completely true or completely
false, or which are partly true and partly false.
Complementing general questions of how to represent knowledge is the need to understand how
knowledge can be used. In general, realistic problems have enormous associated spaces of
possible solutions which must be explored (searched) to find an actual solution that meets the
requirements of the problem. These spaces are much too large to be searched in their entirety,
and ways must be found to focus or short-circuit the search for solutions if systems are to have
any practical utility.

Linguistic variables
While variables in mathematics usually take numerical values, in fuzzy logic applications, the
non-numeric linguistic variables are often used to facilitate the expression of rules and facts.[4]
A linguistic variable such as age may have a value such as young or its antonym old. However,
the great utility of linguistic variables is that they can be modified via linguistic hedges applied
to primary terms. The linguistic hedges can be associated with certain functions.
Example
Fuzzy set theory defines fuzzy operators on fuzzy sets. The problem in applying this is that the
appropriate fuzzy operator may not be known. For this reason, fuzzy logic usually uses IF-THEN
rules, or constructs that are equivalent, such as fuzzy associative matrices.
Rules are usually expressed in the form:
IF variable IS property THEN action
For example, a simple temperature regulator that uses a fan might look like this:
IF temperature IS very cold THEN stop fan
IF temperature IS cold THEN turn down fan
IF temperature IS normal THEN maintain level
IF temperature IS hot THEN speed up fan
There is no "ELSE"; all of the rules are evaluated, because the temperature might be "cold" and
"normal" at the same time to different degrees.
The AND, OR, and NOT operators of boolean logic exist in fuzzy logic, usually defined as the
minimum, maximum, and complement; when they are defined this way, they are called the
Zadeh operators. So for the fuzzy variables x and y:
NOT x = (1 - truth(x))
x AND y = minimum(truth(x), truth(y))
x OR y = maximum(truth(x), truth(y))
There are also other operators, more linguistic in nature, called hedges that can be applied. These
are generally adverbs such as "very", or "somewhat", which modify the meaning of a set using a
mathematical formula.
Q 18. Explain the heuristic methods for Reasoning under uncertainty.
Ans. Bayesian methods
The Bayesian methods have a number of advantages that indicates their suitability in uncertainty
management. Most significant is their sound theoretical foundation in probability theory. Thus,
they are currently the most mature of all of the uncertainty reasoning methods. While Bayesian
methods are more developed than the other uncertainty methods, they are not without faults.
1. They require a significant amount of probability data to construct a knowledge base.
Furthermore, human experts are normally uncertain and uncomfortable about the probabilities
they are providing.
2. What are the relevant prior and conditional probabilities based on? If they are statistically
based, the sample sizes must be sufficient so the probabilities obtained are accurate. If human
experts have provided the values, are the values consistent and comprehensive?
3. Often the type of relationship between the hypothesis and evidence is important in
determining how the uncertainty will be managed. Reducing these associations to simple numbers
removes relevant information that might be needed for successful reasoning about the
uncertainties. For example, Bayesian-based medical diagnostic systems have failed to gain

acceptance because physicians distrust systems that cannot provide explanations describing how
a conclusion was reached (a feature difficult to provide in a Bayesian-based system).
4. The reduction of the associations to numbers also eliminates using this knowledge within
other tasks. For example, the associations that would enable the system to explain its reasoning to
a user are lost, as is the ability to browse through the hierarchy of evidences to hypotheses.
2: Certainty factors: Certainty factor is another method of dealing with uncertainty. This method
was originally developed for the MYCIN system. One of the difficulties with the Bayesian method is
that there are too many probabilities required. Most of them could be unknown.
The problem gets very bad when there are many pieces of evidence. Besides the problem of
amassing all the conditional probabilities for the Bayesian method, another major problem that
appeared with medical experts was the relationship of belief and disbelief.
At first sight, this may appear trivial since obviously disbelief is simply the opposite of belief. In
fact, the theory of probability states that
P(H) + P(H) = 1
and so
P(H) = 1 - P(H)
For the case of a posterior hypothesis that relies on evidence,
E (1) P(H | E) = 1 - P(H | E)
However, when the MYCIN knowledge engineers began interviewing medical experts, they
found that physicians were extremely reluctant to state their knowledge in the form of equation
(1).
For example, consider a MYCIN rule such as the following.
IF 1) The stain of the organism is gram positive, and
2) The morphology of the organism is coccus, and
3) The growth conformation of the organism is chains
THEN There is suggestive evidence (0.7) that the identity of the organism is streptococcus
This can be written in terms of posterior probability:
(2) P(H | E1 E2 E3) = 0.7
where the Ei correspond to the three patterns of the antecedent.
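The MYCIN rule above can be sketched computationally. The combination function for two positive certainty factors below is the standard MYCIN formula; the corroborating second rule's CF of 0.5 is an assumed illustration:

```python
# Sketch of MYCIN-style certainty-factor propagation.
def rule_cf(evidence_cf, rule_strength):
    """Attenuate a rule's CF by the certainty of its (conjoined) evidence;
    conjunctions take the minimum of their antecedents' CFs."""
    return max(0.0, evidence_cf) * rule_strength

def combine(cf1, cf2):
    """Standard MYCIN combination of two positive CFs for one hypothesis."""
    return cf1 + cf2 * (1 - cf1)

# All three antecedent findings (gram positive, coccus, chains) certain:
e = min(1.0, 1.0, 1.0)
cf_strep = rule_cf(e, 0.7)     # 0.7, the strength stated in the rule above
print(combine(cf_strep, 0.5))  # a corroborating rule (assumed CF 0.5) raises belief
```

Note the combination is commutative and never exceeds 1, so independent corroborating rules monotonically strengthen belief without the full probability tables Bayesian methods require.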
3: Dempster-Shafer Theory
Here we discuss another method for handling uncertainty. It is called Dempster-Shafer theory. It
is evolved during the 1960s and 1970s through the efforts of Arthur Dempster and one of his
students, Glenn Shafer.
This theory was designed as a mathematical theory of evidence.
The development of the theory has been motivated by the observation that probability theory is
not able to distinguish between uncertainty and ignorance owing to incomplete information.

UNIT-III
Q 19. Explain the need of planning and also discuss the representation of planning.
Ans. Need of planning:
Intelligent agents must be able to set goals and achieve them. They need a way to visualize the
future (they must have a representation of the state of the world and be able to make predictions
about how their actions will change it) and be able to make choices that maximize the utility (or
"value") of the available choices. In classical planning problems, the agent can assume that it is
the only thing acting on the world and can be certain what the consequences of its actions will
be. However, if this is not true, it must periodically check whether the world matches its
predictions and change its plan as necessary, requiring the agent to reason under uncertainty.
Multi-agent planning uses the cooperation and competition of many agents to achieve a given
goal. Emergent behavior such as this is used by evolutionary algorithms and swarm intelligence.
The representation of planning :
An analysis of strategies, recognizable abstract patterns of planned behavior, highlights the
difference between the assumptions that people make about their own planning processes and the
representational commitments made in current automated planning systems.
Problem Solving and Planning (Newell and Simon, 1956)
Given the actions available in a task domain, and a problem specified as:
- an initial state of the world,
- a set of goals to be achieved,
find a solution to the problem, i.e., a way to transform the initial state into a new state of the
world in which the goal statement is true.
Key elements: action model, state, goals.
Classical Deterministic Planning
Action model: how to represent actions; deterministic, correct, rich representation.
State: single initial state, fully known.
Goals: complete satisfaction.
The Blocks World: Definition of Actions
- Blocks are picked up and put down by the arm.
- Blocks can be picked up only if they are clear, i.e., without any block on top.
- The arm can pick up a block only if the arm is empty, i.e., if it is not holding another block;
the arm can pick up only one block at a time.
- The arm can put down blocks on other blocks or on the table.
Planning by Plain State Search
- Search from an initial state of the world to a goal state.
- Enumerate all states of the world.
- Connect states with legal actions.
- Search for paths between the initial and goal states.
Planning - Generation
Many plan generation algorithms exist:
- forward from the state, backward from the goals,
- serial and parallel search,
- logical satisfiability,
- heuristic search.
Planning Actions and States
Model of an action: a description of the legal actions in the domain, e.g., move queen, open door
if unlocked, unstack if top is clear.
Model of the state:
- Numerical identification (s1, s2, ...) carries no information.
- Symbolic description: objects and predicates.
STRIPS Action Representation
Actions (operators, or rules) consist of:
- A precondition expression, which must be satisfied before the operator is applied.
- A set of effects, which describe how the application of the operator changes the state.
The precondition expression may use propositional or typed first-order predicate logic: negation,
conjunction, disjunction, existential and universal quantification, and functions.
Effects are given as an add-list and a delete-list. Conditional effects depend on a condition on
the state when the action takes place.
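The precondition/add-list/delete-list scheme can be sketched in a few lines of code. This is an illustrative sketch only (the function names and the blocks-world facts are invented for the example), with a state represented as a set of ground facts:

```python
# A minimal STRIPS-style operator sketch (names are illustrative,
# not from any particular planner implementation).

def applicable(state, preconditions):
    """An operator is applicable when all its preconditions hold in the state."""
    return preconditions <= state

def apply_op(state, preconditions, add_list, delete_list):
    """Apply an operator: remove the delete-list facts, then add the add-list facts."""
    if not applicable(state, preconditions):
        raise ValueError("preconditions not satisfied")
    return (state - delete_list) | add_list

# Blocks-world example: unstack block A from block B.
state = {"on(A,B)", "clear(A)", "ontable(B)", "armempty"}
pre   = {"on(A,B)", "clear(A)", "armempty"}
adds  = {"holding(A)", "clear(B)"}
dels  = {"on(A,B)", "clear(A)", "armempty"}

new_state = apply_op(state, pre, adds, dels)
print(sorted(new_state))  # ['clear(B)', 'holding(A)', 'ontable(B)']
```

A planner would search over sequences of such operator applications, which is exactly the state-space view of planning described above.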
Many Planning Domains
Web management agents
Robot planning
Manufacturing planning
Image processing management
Logistics transportation
Crisis management
Bank risk management
Blocks world
Puzzles
Artificial domains

Q 20. What is learning? Explain various types of learning in detail.


Ans. One of the most often heard criticisms of AI is that machines cannot be called intelligent
until they are able to learn to do new things and adapt to new situations, rather than simply
doing as they are told to do.
Some critics of AI have been saying that computers cannot learn!
Definition of Learning: "changes in the system that are adaptive in the sense that they
enable the system to do the same task or tasks drawn from the same population more
efficiently and more effectively the next time."
Learning covers a wide range of phenomena:
Skill refinement: Practice makes skills improve. The more you play tennis, the better you
get.
Knowledge acquisition: Knowledge is generally acquired through experience
Various types of learning:
1. Rote Learning
2. Learning in Problem Solving
3. Winston's Learning Program
4. Explanation-Based Learning
Rote Learning
When a computer stores a piece of data, it is performing a form of learning.
In case of data caching, we store computed values so that we do not have to recompute
them later.
When computation is more expensive than recall, this strategy can save a significant
amount of time.
Caching has been used in AI programs to produce some surprising performance
improvements.
Such caching is known as rote learning.
Rote learning does not involve any sophisticated problem-solving capabilities.
It shows the need for some capabilities required of complex learning systems such as:
Organized Storage of information
Generalization
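The caching idea behind rote learning can be sketched as a simple memoization wrapper. This is an illustrative sketch, not taken from any particular AI system:

```python
# Rote learning as caching: store computed values so they need not be recomputed.

def memoize(f):
    cache = {}  # the "rote memory": maps arguments to stored results
    def wrapper(*args):
        if args not in cache:      # not seen before: compute and store
            cache[args] = f(*args)
        return cache[args]         # seen before: recall instead of recompute
    return wrapper

@memoize
def fib(n):
    """Naive Fibonacci; exponential time without caching, linear with it."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, returned quickly because intermediate results are cached
```

When computation is more expensive than recall, as here, the cached version saves a significant amount of time, which is exactly the point made above.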
Learning In Problem solving
Can a program get better without the aid of a teacher?
It can, by generalizing from its own experiences.
Winston's Learning Program
An early structural concept learning program.
This program operates in a simple blocks world domain.
Its goal was to construct representations of the definitions of concepts in the blocks domain.
For example, it learned the concepts House, Tent and Arch.

A near miss is an object that is not an instance of the concept in question but that is very
similar to such instances.
Basic approach of Winston's program:
1. Begin with a structural description of one known instance of the concept. Call that
description the concept definition.
2. Examine descriptions of other known instances of the concepts. Generalize the definition
to include them.
3. Examine the descriptions of near misses of the concept. Restrict the definition to exclude
these.
Explanation-Based Learning
Learning from Examples: Induction
Classification is the process of assigning, to a particular input, the name of a class to
which it belongs.
The classes from which the classification procedure can choose can be described in a
variety of ways.
Their definition will depend on the use to which they are put.
Classification is an important component of many problem solving tasks.

Before classification can be done, the classes it will use must be defined:
- Isolate a set of features that are relevant to the task domain. Define each class by a
weighted sum of values of these features. Ex: if the task is weather prediction, the
parameters can be measurements such as rainfall, location of cold fronts, etc.
- Isolate a set of features that are relevant to the task domain. Define each class as a
structure composed of these features. Ex: in classifying animals, the features can
be such things as color, length of neck, etc.
The idea of producing a classification program that can evolve its own class definitions is
called concept learning or induction.
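The first style of class definition, a weighted sum of feature values, can be sketched as follows. The class names, feature names, and weights here are invented purely for illustration:

```python
# Classify an input by scoring it against each class's feature weights
# and choosing the class with the highest weighted sum.

def score(features, weights):
    return sum(weights[name] * value for name, value in features.items())

def classify(features, class_weights):
    return max(class_weights, key=lambda c: score(features, class_weights[c]))

# Hypothetical weather-prediction classes with per-feature weights.
class_weights = {
    "rain":  {"humidity": 2.0, "cold_front": 1.5},
    "clear": {"humidity": -1.0, "cold_front": -0.5},
}

observation = {"humidity": 0.9, "cold_front": 1.0}
print(classify(observation, class_weights))  # rain
```

Concept learning, in this picture, is the problem of evolving the `class_weights` table itself from examples rather than writing it by hand.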

Q 21. Explain partial order planning algorithm


Ans. Partial-order planning is an approach to automated planning. The basic idea is to leave the
decision about the order of the actions as open as possible. Given a problem description, a
partial-order plan is a set of all needed actions and order conditions for the actions where needed.
The approach is inspired by the least commitment strategy. In many cases, there are many
possible plans for a problem which differ only in the order of the actions, yet many traditional
automated planners search for plans in the full search space containing all possible orders.
Besides the smaller search space, partial-order planning also has the advantage of leaving the
choice about the order of the actions open until later.
Partial-order plan
A partial-order plan consists of four components:
A set of actions.
A partial order for the actions. It specifies the conditions about the order of some actions.
A set of causal links. It describes which actions meet which preconditions of other actions.
A set of open preconditions, i.e. those preconditions which are not fulfilled by any action
in the partial-order plan.
If you want to keep the possible orders of the actions as open as possible, you want to have the
set of order conditions as small as possible.
A plan is a solution if the set of open preconditions is empty.
Partial-order planner
A partial-order planner is an algorithm or program which will construct a plan and searches for a
solution. The input is the problem description, consisting of descriptions of the initial state, the
goal and possible actions.
The problem can be interpreted as a search problem where the set of possible partial-order plans
is the search space. The initial state would be the plan with the open preconditions equal to the
goal conditions. The final state would be any plan with no open preconditions, i.e. a solution.
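The four components above can be sketched as a simple data structure. This is an illustrative sketch of the plan representation only, not a working planner, and the goal condition shown is hypothetical:

```python
# A partial-order plan as a record of its four components.
from dataclasses import dataclass, field

@dataclass
class PartialOrderPlan:
    actions: set = field(default_factory=set)            # the needed actions
    ordering: set = field(default_factory=set)           # pairs (a, b): a before b
    causal_links: set = field(default_factory=set)       # (producer, condition, consumer)
    open_preconditions: set = field(default_factory=set) # (condition, action) still unmet

    def is_solution(self):
        """A plan is a solution when no open preconditions remain."""
        return not self.open_preconditions

# The initial search state: only Start and Finish, with the goal
# conditions recorded as open preconditions of Finish.
plan = PartialOrderPlan(
    actions={"Start", "Finish"},
    open_preconditions={("HaveCake", "Finish")},
)
print(plan.is_solution())  # False
```

A partial-order planner would then search by repeatedly picking an open precondition, adding or reusing an action that achieves it, and recording the causal link and any needed ordering constraints.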

Q 22. Explain Neural Network. Also explain the strength and weakness of NN.

Ans: An artificial neural network (ANN), usually called "neural network" (NN), is a
mathematical model or computational model that is inspired by the structure and/or functional
aspects of biological neural networks. A neural network consists of an interconnected group of
artificial neurons, and it processes information using a connectionist approach to computation. In
most cases an ANN is an adaptive system that changes its structure based on external or internal
information that flows through the network during the learning phase. Modern neural networks
are non-linear statistical data modeling tools. They are usually used to model complex
relationships between inputs and outputs or to find patterns in data.
In an artificial neural network, simple artificial nodes, called variously "neurons", "neurodes",
"processing elements" (PEs) or "units", are connected together to form a network of nodes
mimicking biological neural networks, hence the term "artificial neural network".
Neural networks are being used:
in investment analysis:
to attempt to predict the movement of stocks, currencies, etc., from previous data. There,
they are replacing earlier, simpler linear models.
in signature analysis:
as a mechanism for comparing signatures made (e.g. in a bank) with those stored. This is
one of the first large-scale applications of neural networks in the USA, and is also one of
the first to use a neural network chip.
in process control:
there are clearly applications to be made here: most processes cannot be determined as
computable algorithms. Newcastle University Chemical Engineering Department is
working with industrial partners (such as Zeneca and BP) in this area.
in monitoring:
networks have been used to monitor
the state of aircraft engines. By monitoring vibration levels and sound, early warning of
engine problems can be given.
British Rail have also been testing a similar application monitoring diesel engines.
in marketing:
networks have been used to improve marketing mailshots. One technique is to run a test
mailshot, and look at the pattern of returns from this. The idea is to find a predictive
mapping from the data known about the clients to how they have responded. This
mapping is then used to direct further mailshots.
Strengths and Weaknesses of Neural Networks
The greatest strength of neural networks is their ability to accurately predict outcomes of
complex problems. In accuracy tests against other approaches, neural networks consistently
score very high.
There are some downfalls to neural networks.
1) First, they have been criticized as being useful for prediction, but not always for
understanding a model. It is true that early implementations of neural networks were
criticized as "black box" prediction engines; however, with the new tools on the
market today, this criticism is debatable.
2) Secondly, neural networks are susceptible to over-training. If a network with a
large capacity for learning is trained using too few data examples to support that
capacity, it may memorize the training examples rather than learn the general
pattern, and then perform poorly on unseen data.

Q 23. Discuss Genetic algorithm.


Ans. A genetic algorithm (or GA for short) is a programming technique that mimics biological
evolution as a problem-solving strategy. Given a specific problem to solve, the input to the GA is
a set of potential solutions to that problem, encoded in some fashion, and a metric called a fitness
function that allows each candidate to be quantitatively evaluated. These candidates may be
solutions already known to work, with the aim of the GA being to improve them, but more often
they are generated at random.
The GA then evaluates each candidate according to the fitness function. In a pool of randomly
generated candidates, of course, most will not work at all, and these will be deleted. However,
purely by chance, a few may hold promise - they may show activity, even if only weak and
imperfect activity, toward solving the problem.
These promising candidates are kept and allowed to reproduce. Multiple copies are made of
them, but the copies are not perfect; random changes are introduced during the copying process.
These digital offspring then go on to the next generation, forming a new pool of candidate
solutions, and are subjected to a second round of fitness evaluation. Those candidate solutions
which were worsened, or made no better, by the changes to their code are again deleted; but
again, purely by chance, the random variations introduced into the population may have
improved some individuals, making them into better, more complete or more efficient solutions
to the problem at hand. Again these winning individuals are selected and copied over into the
next generation with random changes, and the process repeats. The expectation is that the
average fitness of the population will increase each round, and so by repeating this process for
hundreds or thousands of rounds, very good solutions to the problem can be discovered.
As astonishing and counterintuitive as it may seem to some, genetic algorithms have proven to
be an enormously powerful and successful problem-solving strategy, dramatically demonstrating
the power of evolutionary principles. Genetic algorithms have been used in a wide variety of
fields to evolve solutions to problems as difficult as or more difficult than those faced by human
designers. Moreover, the solutions they come up with are often more efficient, more elegant, or
more complex than anything comparable a human engineer would produce. In some cases,
genetic algorithms have come up with solutions that baffle the programmers who wrote the
algorithms in the first place!

Methods of representation
Before a genetic algorithm can be put to work on any problem, a method is needed to encode
potential solutions to that problem in a form that a computer can process. One common approach
is to encode solutions as binary strings: sequences of 1's and 0's, where the digit at each position
represents the value of some aspect of the solution. Another, similar approach is to encode
solutions as arrays of integers or decimal numbers, with each position again representing some
particular aspect of the solution. This approach allows for greater precision and complexity than
the comparatively restricted method of using binary numbers only and often "is intuitively closer
to the problem space" (Fleming and Purshouse 2002, p. 1228).
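The binary-string encoding can be sketched in a few lines. This is an illustrative toy setup (the "count the 1-bits" fitness function is a standard teaching example, not from the cited works):

```python
import random

# Toy GA setup: candidates are 8-bit strings; fitness counts the 1-bits.

def random_candidate(length=8):
    """Encode a candidate solution as a binary string of the given length."""
    return [random.randint(0, 1) for _ in range(length)]

def fitness(candidate):
    """Each 1-bit contributes one unit of fitness (the 'OneMax' toy problem)."""
    return sum(candidate)

# A pool of randomly generated candidates, evaluated by the fitness function.
population = [random_candidate() for _ in range(10)]
best = max(population, key=fitness)
print(best, fitness(best))
```

The same skeleton works for the integer-array and letter-string encodings described above; only the gene values and the mutation operator change.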
This technique was used, for example, in the work of Steffen Schulze-Kremer, who wrote a
genetic algorithm to predict the three-dimensional structure of a protein based on the sequence of
amino acids that go into it (Mitchell 1996, p. 62). Schulze-Kremer's GA used real-valued
numbers to represent the so-called "torsion angles" between the peptide bonds that connect
amino acids. (A protein is made up of a sequence of basic building blocks called amino acids,
which are joined together like the links in a chain. Once all the amino acids are linked, the
protein folds up into a complex three-dimensional shape based on which amino acids attract each
other and which ones repel each other. The shape of a protein determines its function.) Genetic
algorithms for training neural networks often use this method of encoding also.
A third approach is to represent individuals in a GA as strings of letters, where each letter again
stands for a specific aspect of the solution. One example of this technique is Hiroaki Kitano's
"grammatical encoding" approach, where a GA was put to the task of evolving a simple set of
rules called a context-free grammar that was in turn used to generate neural networks for a
variety of problems (Mitchell 1996, p. 74).
The virtue of all three of these methods is that they make it easy to define operators that cause
the random changes in the selected candidates: flip a 0 to a 1 or vice versa, add or subtract from
the value of a number by a randomly chosen amount, or change one letter to another. (See the
section on Methods of change for more detail about the genetic operators.) Another strategy,
developed principally by John Koza of Stanford University and called genetic programming,
represents programs as branching data structures called trees (Koza et al. 2003, p. 35). In this
approach, random changes can be brought about by changing the operator or altering the value at
a given node in the tree, or replacing one subtree with another.

Figure 1: Three simple program trees of the kind normally used in genetic programming. The
mathematical expression that each one represents is given underneath.
It is important to note that evolutionary algorithms do not need to represent candidate solutions
as data strings of fixed length. Some do represent them in this way, but others do not; for
example, Kitano's grammatical encoding discussed above can be efficiently scaled to create large
and complex neural networks, and Koza's genetic programming trees can grow arbitrarily large
as necessary to solve whatever problem they are applied to.
Methods of selection
There are many different techniques which a genetic algorithm can use to select the individuals
to be copied over into the next generation, but listed below are some of the most common
methods. Some of these methods are mutually exclusive, but others can be and often are used in
combination.
Elitist selection: The most fit members of each generation are guaranteed to be selected. (Most
GAs do not use pure elitism, but instead use a modified form where the single best, or a few of
the best, individuals from each generation are copied into the next generation just in case nothing
better turns up.)
Fitness-proportionate selection: More fit individuals are more likely, but not certain, to be
selected.
Roulette-wheel selection: A form of fitness-proportionate selection in which the chance of an
individual's being selected is proportional to its fitness relative to the total fitness of its
competitors. (Conceptually, this can be represented as a game of roulette - each
individual gets a slice of the wheel, but more fit ones get larger slices than less fit ones. The

wheel is then spun, and whichever individual "owns" the section on which it lands each time is
chosen.)
Scaling selection: As the average fitness of the population increases, the strength of the selective
pressure also increases and the fitness function becomes more discriminating. This method can
be helpful in making the best selection later on when all individuals have relatively high fitness
and only small differences in fitness distinguish one from another.
Tournament selection: Subgroups of individuals are chosen from the larger population, and
members of each subgroup compete against each other. Only one individual from each subgroup
is chosen to reproduce.
Rank selection: Each individual in the population is assigned a numerical rank based on fitness,
and selection is based on this ranking rather than absolute differences in fitness. The advantage
of this method is that it can prevent very fit individuals from gaining dominance early at the
expense of less fit ones, which would reduce the population's genetic diversity and might hinder
attempts to find an acceptable solution.
Generational selection: The offspring of the individuals selected from each generation become
the entire next generation. No individuals are retained between generations.
Steady-state selection: The offspring of the individuals selected from each generation go back
into the pre-existing gene pool, replacing some of the less fit members of the previous
generation. Some individuals are retained between generations.
Hierarchical selection: Individuals go through multiple rounds of selection each generation.
Lower-level evaluations are faster and less discriminating, while those that survive to higher
levels are evaluated more rigorously. The advantage of this method is that it reduces overall
computation time by using faster, less selective evaluation to weed out the majority of
individuals that show little or no promise, and only subjecting those who survive this initial test
to more rigorous and more computationally expensive fitness evaluation.
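Roulette-wheel selection, one of the methods listed above, can be sketched like this. It is an illustrative sketch assuming non-negative fitness values:

```python
import random

def roulette_select(population, fitness):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness(ind) for ind in population)
    spin = random.uniform(0, total)   # where the "wheel" stops
    running = 0.0
    for ind in population:
        running += fitness(ind)       # each individual's slice of the wheel
        if spin <= running:
            return ind
    return population[-1]             # guard against floating-point rounding

# Fitter individuals own larger slices, so they are chosen more often.
population = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
fit = sum  # toy fitness: number of 1-bits
print(roulette_select(population, fit))
```

Note that an individual with zero fitness owns no slice of the wheel at all; a small "floor" fitness is sometimes added in practice to preserve diversity.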

Methods of change
Once selection has chosen fit individuals, they must be randomly altered in hopes of improving
their fitness for the next generation. There are two basic strategies to accomplish this. The first

and simplest is called mutation. Just as mutation in living things changes one gene to another, so
mutation in a genetic algorithm causes small alterations at single points in an individual's code.
The second method is called crossover, and entails choosing two individuals to swap segments of
their code, producing artificial "offspring" that are combinations of their parents. This process is
intended to simulate the analogous process of recombination that occurs to chromosomes during
sexual reproduction. Common forms of crossover include single-point crossover, in which a
point of exchange is set at a random location in the two individuals' genomes, and one individual
contributes all its code from before that point and the other contributes all its code from after that
point to produce an offspring, and uniform crossover, in which the value at any given location in
the offspring's genome is either the value of one parent's genome at that location or the value of
the other parent's genome at that location, chosen with 50/50 probability.
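Single-point crossover and point mutation, as described above, can be sketched as follows (an illustrative sketch on bit-string genomes):

```python
import random

def single_point_crossover(parent_a, parent_b):
    """Swap segments at a random point: the child takes a's prefix and b's suffix."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(genome, rate=0.1):
    """Flip each bit independently with a small probability."""
    return [1 - bit if random.random() < rate else bit for bit in genome]

a = [1, 1, 1, 1, 1, 1, 1, 1]
b = [0, 0, 0, 0, 0, 0, 0, 0]
child = single_point_crossover(a, b)
print(child)           # e.g. [1, 1, 1, 0, 0, 0, 0, 0] if the crossover point is 3
print(mutate(child))
```

Uniform crossover differs only in that each position is drawn from one parent or the other with 50/50 probability instead of splitting at a single point.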

Figure 2: Crossover and mutation. The above diagrams illustrate the effect of each of these
genetic operators on individuals in a population of 8-bit strings. The upper diagram shows two
individuals undergoing single-point crossover; the point of exchange is set between the fifth and
sixth positions in the genome, producing a new individual that is a hybrid of its progenitors. The
second diagram shows an individual undergoing mutation at position 4, changing the 0 at that
position in its genome to a 1.

Other problem-solving techniques


With the rise of artificial life computing and the development of heuristic methods, other
computerized problem-solving techniques have emerged that are in some ways similar to genetic
algorithms. This section explains some of these techniques, in what ways they resemble GAs and
in what ways they differ.

Neural networks
A neural network, or neural net for short, is a problem-solving method based on a
computer model of how neurons are connected in the brain. A neural network consists of
layers of processing units called nodes joined by directional links: one input layer, one
output layer, and zero or more hidden layers in between. An initial pattern of input is
presented to the input layer of the neural network, and nodes that are stimulated then
transmit a signal to the nodes of the next layer to which they are connected. If the sum of
all the inputs entering one of these virtual neurons is higher than that neuron's so-called
activation threshold, that neuron itself activates, and passes on its own signal to neurons
in the next layer. The pattern of activation therefore spreads forward until it reaches the
output layer and is there returned as a solution to the presented input. Just as in the
nervous system of biological organisms, neural networks learn and fine-tune their
performance over time via repeated rounds of adjusting their thresholds until the actual
output matches the desired output for any given input. This process can be supervised by
a human experimenter or may run automatically using a learning algorithm (Mitchell
1996, p. 52). Genetic algorithms have been used both to build and to train neural
networks.
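The forward spread of activation described above can be sketched with threshold units. The connection pattern and thresholds below are made up for illustration, in the style of Figure 3:

```python
# Forward pass through a small threshold network: a unit fires (outputs 1)
# only if the count of active inputs feeding it reaches its threshold.

def layer_forward(inputs, connections, thresholds):
    """connections[j] lists which units of the previous layer feed unit j."""
    out = []
    for j, sources in enumerate(connections):
        total = sum(inputs[i] for i in sources)
        out.append(1 if total >= thresholds[j] else 0)
    return out

input_pattern = [1, 0, 1, 1]                       # activation of the input layer
hidden = layer_forward(input_pattern,
                       connections=[[0, 1], [1, 2], [2, 3]],
                       thresholds=[1, 2, 2])
output = layer_forward(hidden,
                       connections=[[0, 1, 2]],
                       thresholds=[2])
print(hidden, output)  # [1, 0, 1] [1]
```

Training would then consist of adjusting the thresholds (or, in weighted networks, the link weights) until the output matches the desired output for each input.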

Figure 3: A simple feedforward neural network, with one input layer consisting of four neurons,
one hidden layer consisting of three neurons, and one output layer consisting of four neurons.
The number on each neuron represents its activation threshold: it will only fire if it receives at
least that many inputs. The diagram shows the neural network being presented with an input
string and shows how activation spreads forward through the network to produce an output.

Hill-climbing
Similar to genetic algorithms, though more systematic and less random, a hill-climbing
algorithm begins with one initial solution to the problem at hand, usually chosen at
random. The string is then mutated, and if the mutation results in higher fitness for the
new solution than for the previous one, the new solution is kept; otherwise, the current
solution is retained. The algorithm is then repeated until no mutation can be found that
causes an increase in the current solution's fitness, and this solution is returned as the
result (Koza et al. 2003, p. 59). (To understand where the name of this technique comes
from, imagine that the space of all possible solutions to a given problem is represented as
a three-dimensional contour landscape. A given set of coordinates on that landscape
represents one particular solution. Those solutions that are better are higher in altitude,
forming hills and peaks; those that are worse are lower in altitude, forming valleys. A
"hill-climber" is then an algorithm that starts out at a given point on the landscape and
moves inexorably uphill.) Hill-climbing is what is known as a greedy algorithm, meaning
it always makes the best choice available at each step in the hope that the overall best
result can be achieved this way. By contrast, methods such as genetic algorithms and

simulated annealing, discussed below, are not greedy; these methods sometimes make
suboptimal choices in the hopes that they will lead to better solutions later on.

Simulated annealing
Another optimization technique similar to evolutionary algorithms is known as simulated
annealing. The idea borrows its name from the industrial process of annealing in which a
material is heated to above a critical point to soften it, then gradually cooled in order to
erase defects in its crystalline structure, producing a more stable and regular lattice
arrangement of atoms (Haupt and Haupt 1998, p. 16). In simulated annealing, as in
genetic algorithms, there is a fitness function that defines a fitness landscape; however,
rather than a population of candidates as in GAs, there is only one candidate solution.
Simulated annealing also adds the concept of "temperature", a global numerical quantity
which gradually decreases over time. At each step of the algorithm, the solution mutates
(which is equivalent to moving to an adjacent point of the fitness landscape). The fitness
of the new solution is then compared to the fitness of the previous solution; if it is higher,
the new solution is kept. Otherwise, the algorithm makes a decision whether to keep or
discard it based on temperature. If the temperature is high, as it is initially, even changes
that cause significant decreases in fitness may be kept and used as the basis for the next
round of the algorithm, but as temperature decreases, the algorithm becomes more and
more inclined to only accept fitness-increasing changes. Finally, the temperature reaches
zero and the system "freezes"; whatever configuration it is in at that point becomes the
solution. Simulated annealing is often used for engineering design applications such as
determining the physical layout of components on a computer chip.
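The temperature-controlled accept/reject decision described above can be sketched as follows. This is an illustrative sketch using the common exponential acceptance rule; the starting temperature, cooling rate, and mutation operator are invented for the example:

```python
import math
import random

def accept(delta_fitness, temperature):
    """Always keep improvements; keep worsening moves with probability
    exp(delta / T), which shrinks as the temperature decreases."""
    if delta_fitness >= 0:
        return True
    if temperature <= 0:
        return False                      # frozen: only improvements survive
    return random.random() < math.exp(delta_fitness / temperature)

def anneal(fitness, mutate, solution, temp=10.0, cooling=0.95, floor=1e-3):
    while temp > floor:
        candidate = mutate(solution)      # move to an adjacent point
        if accept(fitness(candidate) - fitness(solution), temp):
            solution = candidate
        temp *= cooling                   # gradually cool the system
    return solution

def flip_one(bits):
    i = random.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

print(anneal(sum, flip_one, [0] * 8))
```

Early on, when the temperature is high, even large drops in fitness are frequently accepted; as the temperature falls, the algorithm behaves more and more like the greedy hill-climber above.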

Q 24. Write explanatory note on Explanation based learning.

Ans. Learning complex concepts using induction procedures typically requires a substantial
number of training instances.
But people seem to be able to learn quite a bit from single examples.
We don't need to see dozens of positive and negative examples of fork (chess) positions
in order to learn to avoid this trap in the future and perhaps use it to our advantage.
What makes such single-example learning possible? The answer is knowledge.
Much of the recent work in machine learning has moved away from the empirical, data-intensive
approach described in the last section toward this more analytical, knowledge-intensive
approach.
A number of independent studies led to the characterization of this approach as
explanation-based learning (EBL).
An EBL system attempts to learn from a single example x by explaining why x is an
example of the target concept.
The explanation is then generalized, and the system's performance is improved through
the availability of this knowledge.
We can think of EBL programs as accepting the following as input:
A training example
A goal concept: a high-level description of what the program is supposed to learn.
An operationality criterion: a description of which concepts are usable.
A domain theory: A set of rules that describe relationships between objects and
actions in a domain.
From this EBL computes a generalization of the training example that is sufficient to
describe the goal concept, and also satisfies the operationality criterion.
Explanation-based generalization (EBG) is an algorithm for EBL and has two steps: (1)
explain, (2) generalize
During the explanation step, the domain theory is used to prune away all the unimportant
aspects of the training example with respect to the goal concept. What is left is an
explanation of why the training example is an instance of the goal concept. This
explanation is expressed in terms that satisfy the operationality criterion.
The next step is to generalize the explanation as far as possible while still describing the
goal concept.

An Explanation-based Learning (EBL ) system accepts an example (i.e. a training example) and
explains what it learns from the example. The EBL system takes only the relevant aspects of the
training. This explanation is translated into particular form that a problem solving program can
understand. The explanation is generalized so that it can be used to solve other problems.
PRODIGY is a system that integrates problem solving, planning, and learning methods in a
single architecture. It was originally conceived by Jaime Carbonell and Steven Minton, as an AI
system to test and develop ideas on the role that machine learning plays in planning and problem
solving. PRODIGY uses the EBL to acquire control rules.
The EBL module uses the results from the problem-solving trace (ie. Steps in solving problems)
that were generated by the central problem solver (a search engine that searches over a problem
space). It constructs explanations using an axiomatized theory that describes both the domain and
the architecture of the problem solver. The results are then translated as control rules and added
to the knowledge base. The control knowledge that contains control rules is used to guide the
search process effectively.
When an agent can utilize a worked example of a problem as a problem-solving method, the
agent is said to have the capability of explanation-based learning (EBL). This is a type of
analytic learning. The advantage of explanation-based learning is that, as a deductive
mechanism, it requires only a single training example (inductive learning methods often require
many training examples). However, to utilize just a single example, most EBL algorithms require
all of the following:
The training example
A Goal Concept
An Operationality Criteria
A Domain Theory
From the training example, the EBL algorithm computes a generalization of the example that is
consistent with the goal concept and that meets the operationality criterion (a description of the
appropriate form of the final concept). One criticism of EBL is that the required domain theory
needs to be complete and consistent. Additionally, the utility of learned information is an issue
when learning proceeds indiscriminately. Other forms of learning that are based on EBL are
knowledge compilation, caching, and macro-ops.
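The four ingredients above can be illustrated with a toy sketch. The domain theory, the goal concept ("cup"), and the operationality criterion below are all invented for illustration; the sketch simply expands the explanation of the goal through the theory until only operational (directly observable) predicates remain, which is one greatly simplified reading of explanation-based generalization:

```python
# A toy sketch of explanation-based generalization. All rule and
# predicate names here are illustrative, not from any real system.

# Domain theory: head <- body (facts needed to prove the head).
DOMAIN_THEORY = {
    "cup": ["liftable", "holds_liquid"],
    "liftable": ["light", "has_handle"],
    "holds_liquid": ["concave"],
}

# Operationality criterion: the final concept may mention only these
# directly observable predicates.
OPERATIONAL = {"light", "has_handle", "concave"}

def explain(goal):
    """Expand the goal through the domain theory until every leaf of
    the explanation satisfies the operationality criterion."""
    if goal in OPERATIONAL:
        return [goal]
    leaves = []
    for sub in DOMAIN_THEORY[goal]:
        leaves.extend(explain(sub))
    return leaves

# Training example: one object known to be a cup ("red" is irrelevant,
# and the explanation correctly ignores it).
example = {"light", "has_handle", "concave", "red"}

rule = explain("cup")        # operational definition of the goal concept
assert set(rule) <= example  # the explanation covers the training example
print(rule)                  # ['light', 'has_handle', 'concave']
```

Note how a single example suffices: the generalization comes from the domain theory, not from comparing many examples.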

Q 25. Write an explanatory note on learning by analogy.

Ans.
Analogy is a powerful inference tool.
Our language and reasoning are laden with analogies.
"Last month, the stock market was a roller coaster."
"Bill is like a fire engine."
"Problems in electromagnetism are just like problems in fluid flow."
Underlying each of these examples is a complicated mapping between what appear to be
dissimilar concepts.
For example, to understand the first sentence above, it is necessary to do two things:
Pick out one key property of a roller coaster, namely that it travels up and down
rapidly
Realize that physical travel is itself an analogy for numerical fluctuations.
This is no easy trick.
The space of possible analogies is very large.
An AI program that is unable to grasp analogy will be difficult to talk to and consequently
difficult to teach.
Thus analogical reasoning is an important factor in learning by advice taking.
Humans often solve problems by making analogies to things they already understand how
to do.

Analogical learning generally involves developing a set of mappings between features of
two instances. Paul Thagard and Keith Holyoak have developed a computational theory of
analogical reasoning that is consistent with the outline above, provided that abstraction
rules are provided to the model.
Analogy is a cognitive process of transferring information or meaning from a particular subject
(the analogue or source) to another particular subject (the target), and a linguistic expression
corresponding to such a process. In a narrower sense, analogy is an inference or an argument
from one particular to another particular, as opposed to deduction, induction, and abduction,
where at least one of the premises or the conclusion is general. The word analogy can also refer
to the relation between the source and the target themselves, which is often, though not
necessarily, a similarity, as in the biological notion of analogy.

Niels Bohr's model of the atom made an analogy between the atom and the solar system.
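Bohr's analogy can illustrate, very roughly, how a feature mapping might be computed. The relation tuples below are invented; this naive sketch aligns objects of the source and target solely by identically named relations, ignoring the consistency checking a real structure-mapping engine would perform:

```python
# A minimal sketch of analogical mapping: align objects of a source
# domain (the solar system) and a target domain (the atom) by the
# relations they take part in. Relation names are illustrative.

source = [("attracts", "sun", "planet"),
          ("revolves_around", "planet", "sun")]
target = [("attracts", "nucleus", "electron"),
          ("revolves_around", "electron", "nucleus")]

def map_objects(source, target):
    """Propose object correspondences supported by identically named
    relations holding in the same argument positions."""
    mapping = {}
    for rel, a, b in source:
        for rel2, x, y in target:
            if rel == rel2:
                mapping[a] = x
                mapping[b] = y
    return mapping

print(map_objects(source, target))
# {'sun': 'nucleus', 'planet': 'electron'}
```

The mapping, once found, lets known facts about the source suggest candidate inferences about the target, which is the essence of learning by analogy.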
Analogy plays a significant role in problem solving, decision making, perception, memory,
creativity, emotion, explanation and communication. It lies behind basic tasks such as the
identification of places, objects and people, for example, in face perception and facial
recognition systems. It has been argued that analogy is "the core of cognition". [3] Specific
analogical language comprises exemplification, comparisons, metaphors, similes, allegories, and
parables, but not metonymy.
The ANALOGY module uses reasoning strategies and justifications saved from previous
problem-solving traces to build strategies for a new problem. These strategies are evoked when
the problem solver comes to a place where the new problem's justifications are similar to a
previously-solved problem. ANALOGY then directs the problem-solving strategy in the
direction that led to the previous solution.
ANALOGY
Requires more inferencing.
It is the process of learning a new concept or solution through the use of similar known
concepts or solutions.

UNIT-IV
Q 26. What is an expert system? Discuss the architecture of an expert system.
Ans. Definition(s) of Expert/Knowledge-Based Systems
The primary intent of expert system technology is to realize the integration of human expertise
into computer processes. This integration not only helps to preserve the human expertise but also
allows humans to be freed from performing the more routine activities that might be associated
with interactions with a computer-based system.
Given the number of textbooks, journal articles, and conference publications about
expert/knowledge-based systems and their application, it is not surprising that there exist a
number of different definitions for an expert/knowledge-based system. In this article we use the
following definition (2, 3):
An expert/knowledge-based system is a computer program that is designed to mimic the
decision-making ability of a decision-maker(s) (i.e., expert(s)) in a particular narrow domain of
expertise.
In order to fully understand and appreciate the meaning and nature of this definition, we
highlight and detail its four major component pieces.
An expert/knowledge-based system is a computer program. A computer program is a piece of
software, written by a programmer as a solution to some particular problem or client need.
Because expert/knowledge-based systems are software products, they inherit all of the problems
associated with any piece of computer software. Some of these issues will be addressed in the
discussion on the development of these systems.
An expert/knowledge-based system is designed to mimic the decision-making ability. The
specific task of an expert/knowledge-based system is to be an alternative source of decision-making
ability for organizations to use, instead of relying on the expertise of just one (or a
handful) of people qualified to make a particular decision. An expert/knowledge-based system
attempts to capture the reasoning of a particular person for a specific problem. Usually
expert/knowledge-based systems are designed and developed to capture the scarce, but critical,
decision-making that occurs in many organizations. Expert/knowledge-based systems are often
feared to be replacements for decision-makers; however, in many organizations, these systems
are used to free up the decision-maker to address more complex and important issues facing
the organization.
An expert/knowledge-based system uses a decision-maker(s) (i.e., expert(s)). Webster's
dictionary defines an expert as
"one with the special skill or mastery of a particular subject."
The focal point in the development of an expert/knowledge-based system is to acquire and
represent the knowledge and experience of a person(s) who has been identified as possessing
this special skill or mastery.
An expert/knowledge-based system is created to solve problems in a particular narrow domain of
expertise. The above definition restricts the term expert to a particular subject. Some of the most

successful development efforts of expert/knowledge-based systems have been in domains that


are well scoped and have clear boundaries. Specific problem characteristics that lead to
successful expert/knowledge-based systems are discussed as part of the development process.
Now that we have defined what an expert/knowledge-based system is, we will briefly discuss the
history of these systems. In this discussion, we include their historical place within the Artificial
Intelligence area and highlight some of the early, significant expert system developments.
Major Application Areas
There are two different ways developers look at application areas for expert/knowledge-based
systems. First, they look at the functional nature of the problem. Secondly, they look at the
application domain. We review both of these ways to get a better understanding for the
application of expert/knowledge-based systems to real-world problems. In 1993, John Durkin
(12) published a catalog of expert system applications that briefly reviews a number of
applications of expert/knowledge-based system technology and categorizes each of the nearly
2,500 systems.
Both MYCIN and XCON point out two different functions that are viewed as highly favorable
for expert/knowledge-based system development. MYCIN mainly deals with the diagnosis of a
disease given a set of symptoms and patient information. XCON, on the other hand, is a
synthesis-based (design) configuration expert system. It takes as its input the needs of the
customer and builds a feasible arrangement of components to meet the need. Both of these
systems solve different generic types of problems.
An expert system may have many differing functions. It may monitor, detect faults, isolate faults,
control, give advice, document, assist, etc. Applications of expert system technology range
from highly embedded turnkey expert systems for controlling certain functions in a car or in a
home, to systems that provide financial, medical, or navigation advice, to systems that control
spacecraft.
Structure of Expert Systems
In the early days the phrase expert system was used to denote a system whose knowledge base
and reasoning mechanisms were based on those of a human expert. In this article a more general
position is held. A system will be called an expert system based on its form alone and
independent of its source of knowledge or reasoning capabilities.
The purpose of this section is to provide an intuitive overview of the architectural ideas
associated with expert systems. In discussing the architecture of expert systems we will first
introduce the concept of an expert system kernel and then embed that kernel in a fuller and more
traditional expert system architecture.
An Expert System Architecture
If we embed the kernel of an expert system in an operational context that contains processes
for interacting and interfacing with a user, a process for knowledge and data acquisition, and
a process to support the generation of explanations for rule firings and advice to the user, then
we arrive at what is customarily viewed as the architecture for an expert system.

Figure 2 displays the architecture commonly associated with expert systems. In our terminology,
it is comprised of a kernel augmented by processes for data and knowledge capture, user
interfaces and interactions, and a process for generating and presenting to a user explanations of
its behaviors.
The Knowledge and Data Acquisition process is used by the expert system to acquire new
facts and rules associated with its specific domain. It is through this process that capabilities can
be added to or subtracted from the expert system. Associated with this process is the concept of
knowledge engineering.
This is the process whereby knowledge from an expert, a group of experts, or other sources such
as books, procedure manuals, and training guides is gathered, formatted, verified and validated,
and input into the knowledge base of the expert system (see the discussion on expert/knowledge
development for a more detailed explanation of knowledge engineering activities).
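The kernel described above, a knowledge base of rules plus an inference engine operating over working memory, can be sketched minimally as a forward-chaining loop. The two medical-style rules below are invented for illustration and are not from any real system such as MYCIN:

```python
# A minimal sketch of a rule-based expert system kernel: if-then rules
# plus forward chaining over a working memory of facts. The rules and
# fact names below are invented for illustration only.

RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts, rules):
    """Fire every rule whose conditions are all present, adding its
    conclusion to working memory, until no new facts can be derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"fever", "cough", "short_of_breath"}, RULES)
print("refer_to_doctor" in derived)  # True
```

A full architecture would wrap this kernel with the user interface, knowledge acquisition, and explanation processes described in the text; the explanation facility, for instance, could be built by recording which rules fired and why.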

Q 27. Explain the application of AI to Robotics.

Ans.
Application of AI to Robotics
One of the main applications of AI is in the area of robot control. By using evolving control
architectures, the robot can 'learn' the best way to do a task. Designers can use neural networks
and genetic algorithms to enable the robot to cope with complicated tasks, such as navigation in
a complicated environment (Mars, for instance).
Another area is image, sound and pattern recognition - 3 traits that any anthropomorphic robot
would need. Again, neural-networks could be used to analyze data from the optical or audio
device the robot used.
Example
A robot is assigned to hover over an assembly line and examine the gears that pass
underneath it for faults. If a fault is discovered, the robot is to push the gear off the line
into the rejection bin. Before the robot is put into practice, it is trained using a
neural network to recognize the salient features of the gear: its radius, the shape of its
perimeter, and its size. When the machine is then put on the assembly line, its optical
equipment converts what it sees into input for the neural network, which then analyzes
whether the gear is acceptable or not. If so, it passes; otherwise an arm is activated to push the gear off.
There are limitations to such visual systems, described in the Problems with Machine
Vision essay.
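The gear-inspection example might be sketched, in greatly simplified form, as a single perceptron over hand-picked features. The features, weights, and threshold below are all invented for illustration; a real inspection network would be trained from labeled examples rather than hand-set:

```python
# A toy sketch of the gear-inspection idea: a single perceptron that
# accepts or rejects a gear from two made-up features (deviation of
# radius and of tooth shape from spec). Weights are hand-set here.

def perceptron(features, weights, bias):
    """Classify a feature vector: 1 = pass, 0 = push into the reject bin."""
    activation = sum(f * w for f, w in zip(features, weights)) + bias
    return 1 if activation >= 0 else 0

# Large deviations from spec drive the score negative.
weights, bias = [-1.0, -1.0], 0.5

good_gear = [0.1, 0.2]  # small deviations -> pass
bad_gear = [0.9, 0.8]   # large deviations -> reject

print(perceptron(good_gear, weights, bias))  # 1
print(perceptron(bad_gear, weights, bias))   # 0
```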
Another area where robotics and AI/ALife are closely connected is that of 'simple function, complex
behaviour' robots. Such robots perform small tasks, using very simple rules, but behave in
a rather complicated fashion, in a very similar way to Craig Reynolds' Boids. For example, 5-6 robots could
be programmed to clean up a room, moving small objects to the nearest corner of the room,
whilst avoiding obstacles and each other. The military and NASA are researching such robots as
possible spy (or for NASA, exploration) robots that could easily pass through enemy defenses,
and through sheer numbers gather a large amount of data, without risk of loss of human life.
Examples of robots :
Cog.
MIT has always been at the forefront of AI technology, and it is building its own robots under a
project heading "The Cog Shop", the main one being called Cog. Cog is an attempt at creating a robot
that simulates the sensory and motor dynamics of the human body (the only exception being its
lack of legs). The motors within Cog simulate the degrees of freedom of the human body.
The robot also has an advanced vision system that again simulates that of the human. Each
'eye' consists of two cameras, one for the wide-angle view, the other for a smaller, more precise
view (mimicking the fovea). Cog is powered by 8 (expandable to 239!) 16 MHz Motorola 68332
microprocessors that are networked together, running L, a Lisp-based language.
Kismet.
Another robot from "The Cog Shop" is a social robot called Kismet. Kismet is a completely
autonomous robot that attempts to simulate a baby learning from its parents. This rather cute
looking robot uses facial expressions to show the untrained user how it feels. So far it has over
10 different facial expressions, including fear, disgust, anger, surprise and other common
emotions. Its features include a mouth, eyebrows, ears and eyelids, giving Kismet a Gremlin-like appearance.
Kismet's software takes much of its theory from "...psychology, ethology, and developmental
psychology..." [Cog 2]. It is split up into five systems: the perception system, the motivation system,
the attention system, the behaviour system, and the motor system. The perception system extracts
data from the outside world (through the cameras in Kismet's eyes), the motivation system
maintains the emotions Kismet 'feels', the attention system regulates the extent of these emotions,
the behaviour system implements the emotions, and the motor system controls the hardware
required to express the emotion. Visit Kismet's homepage to learn more; it's a fascinating project.
Conclusion
Robotics is in many respects mechanical AI. It is also a lot more complicated, since the data the
robot receives is real-time, real-world data, far more complicated than what more software-based
AI programs have to deal with. On top of this more complicated programming, algorithms
to respond via motors and other sensors are needed.
The field of robotics is where AI is all eventually aimed; most research is intended to one day
become part of a robot.
Q 28. Discuss the current trends in Intelligent systems.
Ans.
Mobile Robot Competition and Exhibition: This is the twelfth year AAAI has
sponsored the Mobile Robot Competition, which brings together teams from leading
robotics research labs to compete and demonstrate state-of-the-art research in robotics
and AI. Each year the bar is raised on the competition challenges, and each year the
robots demonstrate increasing capabilities. This year, the competition includes three
events: Robot Host, Robot Rescue, and Robot Challenge.
Innovative Applications of AI Awards and Emerging Applications: This
conference, co-located with IJCAI-03 and sponsored by the Association for the
Advancement of Artificial Intelligence, honors deployed applications that use AI
techniques in innovative ways delivering quantifiable value. In recent years, the
conference has been expanded to recognize experimental emerging applications that in
preliminary tests are demonstrating promising results. Together, the 20 applications
presented point the way toward new trends in intelligent applications in a wide range of
areas from automated fraud detection in the NASDAQ stock market, to automated
search of broadcast news for items of interest, to new teaching and commerce systems,
and intelligent distributed computing.
Trading Agents Competition: The goal of this competition is to spur research in
intelligent agents for e-commerce. This year there will be two competitions: one in
which travel agents put together travel packages, and the second a dynamic supply
chain trading scenario in which PC manufacturers compete against one another for
customer orders and supplies over 250 simulated days.
AI and the Web special track: Four invited speakers will discuss emerging trends
in using AI to improve the intelligence of the Web infrastructure. Senior researchers from
Google, University of Southern California, Hong Kong Baptist University and University
of Trento, Italy, will discuss topics such as Web search engines, technologies to build and
deploy intelligent agents on the Web, the research agenda to build true Web intelligence,
and eCommerce travel systems.
Invited Speakers: Eleven speakers from leading research centers and business
ventures in the U.S. and Europe have been invited to the technical conference to discuss
emerging AI research and experimental systems on topics such as self-reconfiguring
robots and Paul Allen's Vulcan, Inc. Project Halo, aimed toward the creation of a digital
Aristotle, capable of answering and providing cogent explanations to arbitrary questions
in a variety of domains.
Technical Program: At the heart of the conference is the technical program, which
this year includes 189 technical paper presentations by leading researchers on a broad
array of AI topics, for example: computer vision, robotics, intelligent agents, intelligent
Internet, logic, learning, reasoning, representation and much more.
Workshops and Tutorials: There will be 30 workshops on research and applications
(by invitation only) on such topics as Agent-Oriented Systems, AI Applications,
Knowledge Representation and Reasoning, Machine Learning and Data Mining, and AI
and the Web. In addition, 19 tutorials will cover concentrated technical topics of current
or emerging interest, for instance: state of the art in ant robotics, intelligent Web service,
and automated security protocol verification.
Intelligent Systems Demonstrations: This track highlights innovative contributions to
the science of AI with an emphasis on the benefits of developing and using implemented
systems in AI research. This year's demonstrations include web technologies, intelligent
agents, reasoning engines, and collaborative and conversational systems, in domains
ranging from space exploration to travel agencies to writing technical papers.
Poster Session: This new program is designed to promote new research ideas and
widen participation at the conference. The 87 poster presentations represent a broad
cross-section of AI research areas including: automated reasoning, case-based reasoning,
constraints, knowledge representation, data mining and information retrieval, machine
learning, multiagents, natural language, neural networks, planning, search, vision and
robotics.
Exhibit Program: Some of the leading robotics vendors, publishers, research labs, and
others will exhibit their offerings.

Q. 29 What are the principles of natural language processing?

Ans.
INTRODUCTION
Natural Language Processing (NLP) is the computerized approach to analyzing text that is based
on both a set of theories and a set of technologies. Being a very active area of research and
development, there is not a single agreed-upon definition that would satisfy everyone, but there
are some aspects that would be part of any knowledgeable person's definition.
The definition is:
Natural Language Processing is a theoretically motivated range of computational techniques for
analyzing and representing naturally occurring texts at one or more levels of linguistic analysis
for the purpose of achieving human-like language processing for a range of tasks or applications.
Several elements of this definition can be further detailed. Firstly, the imprecise notion of "range
of computational techniques" is necessary because there are multiple methods or techniques from
which to choose to accomplish a particular type of language analysis.
Naturally occurring texts can be of any language, mode, genre, etc. The texts can be oral or
written. The only requirement is that they be in a language used by humans to communicate to
one another. Also, the text being analyzed should not be specifically constructed for the purpose
of the analysis, but rather that the text be gathered from actual usage.
The notion of levels of linguistic analysis (to be further explained in Section 2) refers to the fact
that there are multiple types of language processing known to be at work when humans produce
or comprehend language. It is thought that humans normally utilize all of these levels since each
level conveys different types of meaning. But various NLP systems utilize different levels, or
combinations of levels of linguistic analysis, and this is seen in the differences amongst various
NLP applications. This also leads to much
confusion on the part of non-specialists as to what NLP really is, because a system that uses any
subset of these levels of analysis can be said to be an NLP-based system. The difference between
them, therefore, may actually be whether the system uses weak NLP or strong NLP.
"Human-like language processing" reveals that NLP is considered a discipline within Artificial
Intelligence (AI). And while the full lineage of NLP does depend on a number of other
disciplines, since NLP strives for human-like performance, it is appropriate to consider it an AI
discipline.
"For a range of tasks or applications" points out that NLP is not usually considered a goal in and
of itself, except perhaps for AI researchers. For others, NLP is the means for accomplishing a
particular task. Therefore, you have Information Retrieval (IR) systems that utilize NLP, as well
as Machine Translation (MT), Question-Answering, etc.
Goal
The goal of NLP, as stated above, is to accomplish human-like language processing. The choice
of the word "processing" is very deliberate, and should not be replaced with "understanding".
Although the field of NLP was originally referred to as Natural Language Understanding (NLU)
in the early days of AI, it is well agreed today that while the goal of NLP is true NLU, that goal
has not yet been accomplished. A full NLU system would be able to:
1. Paraphrase an input text
2. Translate the text into another language

3. Answer questions about the contents of the text


4. Draw inferences from the text
While NLP has made serious inroads into accomplishing goals 1 to 3, the fact that NLP systems
cannot, of themselves, draw inferences from text means NLU still remains the goal of NLP.
There are more practical goals for NLP, many related to the particular application for which it is
being utilized. For example, an NLP-based IR system has the goal of providing more precise,
complete information in response to a user's real information need. The goal of the NLP system
here is to represent the true meaning and intent of the user's query, which can be expressed as
naturally in everyday language as if they were speaking to a reference librarian. Also, the
contents of the documents that are being searched will be represented at all their levels of
meaning so that a true match between need and response can be found, no matter how either is
expressed in its surface form.
Origins
As with most modern disciplines, the lineage of NLP is indeed mixed, and still today it has strong
emphases by different groups whose backgrounds are more influenced by one or another of the
disciplines. Key among the contributors to the discipline and practice of
NLP are: Linguistics, which focuses on formal, structural models of language and the discovery
of language universals (in fact, the field of NLP was originally referred to as Computational
Linguistics); Computer Science, which is concerned with developing internal representations of
data and efficient processing of these structures; and Cognitive Psychology, which looks at
language usage as a window into human cognitive processes and has the goal of modeling the
use of language in a psychologically plausible way.
Divisions
While the entire field is referred to as Natural Language Processing, there are in fact two distinct
focuses: language processing and language generation. The first of these refers to the analysis
of language for the purpose of producing a meaningful representation, while the latter refers to
the production of language from a representation. The task of Natural Language Processing is
equivalent to the role of the reader/listener, while the task of Natural Language Generation is that of
the writer/speaker. While much of the theory and
technology are shared by these two divisions, Natural Language Generation also requires a
planning capability. That is, the generation system requires a plan or model of the goal of the
interaction in order to decide what the system should generate at each point in an interaction. We
will focus on the task of natural language analysis, as this is most relevant to Library and
Information Science.
Another distinction is traditionally made between language understanding and speech
understanding. Speech understanding starts with, and speech generation ends with, oral language,
and both therefore rely on the additional fields of acoustics and phonology. Speech understanding
focuses on how the sounds of language as picked up by the system in the form of acoustical
waves are transcribed into recognizable morphemes and words. Once in this form, the same
levels of processing which are utilized on written text are utilized.
LEVELS OF NATURAL LANGUAGE PROCESSING
The most explanatory method for presenting what actually happens within a Natural Language
Processing system is by means of the levels of language approach. This is also referred to as the
synchronic model of language and is distinguished from the earlier sequential model, which

hypothesizes that the levels of human language processing follow one another in a strictly
sequential manner. Psycholinguistic research suggests that language processing is much more
dynamic, as the levels can interact in a variety of orders. Introspection reveals that we frequently
use information we gain from what is typically thought of as a higher level of processing to assist
in a lower level of analysis.
For example, the pragmatic knowledge that the document you are reading is about biology will
be used when a particular word that has several possible senses (or meanings) is encountered,
and the word will be interpreted as having the biology sense.
Of necessity, the following description of levels will be presented sequentially. The key point
here is that meaning is conveyed by each and every level of language and that since humans have
been shown to use all levels of language to gain understanding, the more capable an NLP system
is, the more levels of language it will utilize.
Phonology
This level deals with the interpretation of speech sounds within and across words. There are, in
fact, three types of rules used in phonological analysis: 1) phonetic rules, for sounds within
words; 2) phonemic rules, for variations of pronunciation when words are spoken together; and
3) prosodic rules, for fluctuation in stress and intonation across a sentence. In an NLP system
that accepts spoken input, the sound waves are analyzed and encoded into a digitized signal for
interpretation by various rules or by comparison to the particular language model being utilized.
Morphology
This level deals with the componential nature of words, which are composed of morphemes, the
smallest units of meaning. For example, the word "preregistration" can be morphologically
analyzed into three separate morphemes: the prefix "pre", the root "registra", and the suffix "tion".
Since the meaning of each morpheme remains the same across words, humans can break down
an unknown word into its constituent morphemes in order to understand its meaning. Similarly,
an NLP system can recognize the meaning conveyed by each morpheme in order to gain and
represent meaning. For example, adding the suffix "-ed" to a verb conveys that the action of the
verb took place in the past. This is a key piece of meaning and, in fact, is frequently only
evidenced in a text by the use of the "-ed" morpheme.
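The preregistration example can be sketched as naive affix stripping. The affix lists below are tiny and illustrative; a real morphological analyzer would need a full lexicon and rules for spelling changes at morpheme boundaries:

```python
# A minimal sketch of morphological analysis by affix stripping, using
# the "preregistration" example from the text. The affix lists are
# illustrative, not a real morphological lexicon.

PREFIXES = ["pre", "un", "re"]
SUFFIXES = ["tion", "ed", "ing"]

def morphemes(word):
    """Split a word into (prefix, root, suffix), using '' when absent."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    root = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix

print(morphemes("preregistration"))  # ('pre', 'registra', 'tion')
print(morphemes("launched"))         # ('', 'launch', 'ed')
```

The second call shows how recognizing the "-ed" morpheme recovers the past-tense meaning discussed above.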
Lexical
At this level, humans, as well as NLP systems, interpret the meaning of individual words.
Several types of processing contribute to word-level understanding, the first of these being the
assignment of a single part-of-speech tag to each word. In this processing, words that can
function as more than one part-of-speech are assigned the most probable part-of-speech tag based
on the context in which they occur.
Additionally at the lexical level, those words that have only one possible sense or meaning can
be replaced by a semantic representation of that meaning. The nature of the representation varies
according to the semantic theory utilized in the NLP system. The following representation of the
meaning of the word "launch" is in the form of logical predicates. As can be observed, a single
lexical unit is decomposed into its more basic properties. Given that there is a set of semantic
primitives used across all words, these simplified lexical representations make it possible to
unify meaning across words and to produce complex interpretations, much the same as humans
do.
launch (a large boat used for carrying people on rivers, lakes, harbors, etc.)

((CLASS BOAT)
 (PROPERTIES (LARGE))
 (PURPOSE (PREDICATION (CLASS CARRY) (OBJECT PEOPLE))))


The lexical level may require a lexicon, and the particular approach taken by an NLP system will
determine whether a lexicon will be utilized, as well as the nature and extent of information that
is encoded in the lexicon. Lexicons may be quite simple, with only the words and their
part(s)-of-speech, or may be increasingly complex and contain information on the semantic class of the
word, what arguments it takes, the semantic limitations on these arguments, definitions of
the sense(s) in the semantic representation utilized in the particular system, and even the
semantic field in which each sense of a polysemous word is used.
Syntactic
This level focuses on analyzing the words in a sentence so as to uncover the grammatical
structure of the sentence. This requires both a grammar and a parser. The output of this level of
processing is a (possibly delinearized) representation of the sentence that reveals the structural
dependency relationships between the words. There are various grammars that can be utilized,
and which will, in turn, impact the choice of a parser. Not all NLP applications require a full
parse of sentences; therefore, the remaining challenges in parsing, such as prepositional phrase
attachment and conjunction scoping, no longer stymie those applications for which phrasal and
clausal dependencies are sufficient. Syntax conveys meaning in most languages because order
and dependency contribute to meaning. For example, the two sentences "The dog chased the cat."
and "The cat chased the dog." differ only in terms of syntax, yet convey quite different meanings.
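The point that word order conveys meaning can be shown with a trivially simple positional extractor. It assumes exactly the "The X verbed the Y." pattern of the two example sentences and nothing more; a real parser would use a grammar rather than fixed positions:

```python
# A toy illustration of why word order matters syntactically: extract
# (subject, verb, object) by position from a fixed
# determiner-noun-verb-determiner-noun pattern. Illustrative only.

def svo(sentence):
    """Return (subject, verb, object) for a 'The X verbed the Y.' sentence."""
    words = sentence.rstrip(".").split()
    return words[1], words[2], words[4]

print(svo("The dog chased the cat."))  # ('dog', 'chased', 'cat')
print(svo("The cat chased the dog."))  # ('cat', 'chased', 'dog')
```

The two sentences contain identical words, yet the extracted roles differ, which is exactly the meaning carried by syntax.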
Semantic
This is the level at which most people think meaning is determined; however, as we can see in
the above definitions of the levels, all the levels contribute to meaning.
Semantic processing determines the possible meanings of a sentence by focusing on the
interactions among word-level meanings in the sentence. This level of processing can include the
semantic disambiguation of words with multiple senses, in an analogous way to how syntactic
disambiguation of words that can function as multiple parts-of-speech is accomplished at the
syntactic level. Semantic disambiguation permits one and only one sense of polysemous words to
be selected and included in the semantic representation of the sentence. For example, amongst
other meanings, file as a noun can mean either a folder for storing papers, or a tool to shape
ones fingernails, or a line of individuals in a queue. If information from the rest of the sentence
were required for the disambiguation, the semantic, not the lexical level, would do the
disambiguation. A wide range of methods can be implemented to accomplish the disambiguation,
some which require information as to the frequency with which each sense occurs in a particular
corpus of interest, or in general usage, some which require consideration of the local context, and
others which utilize pragmatic knowledge of the domain of the document.
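One of the local-context methods mentioned above can be sketched in the spirit of the classic Lesk overlap idea: score each sense of "file" by how many of its signature words appear in the surrounding sentence. The sense signatures below are illustrative assumptions, not entries from a real dictionary:

```python
# A minimal sketch of local-context sense disambiguation (Lesk-style
# overlap). Signature word sets are illustrative only.
SENSES = {
    "folder": {"folder", "papers", "documents", "store", "cabinet"},
    "tool":   {"tool", "shape", "fingernails", "metal", "smooth"},
    "queue":  {"line", "people", "queue", "single", "stood"},
}

def disambiguate(word_senses, context_words):
    """Pick the sense whose signature overlaps the context the most."""
    context = {w.lower().strip(".,") for w in context_words}
    return max(word_senses, key=lambda s: len(word_senses[s] & context))

print(disambiguate(SENSES, "she used a file to shape her fingernails".split()))
# tool
print(disambiguate(SENSES, "people stood in single file near the door".split()))
# queue
```

Frequency-based methods would instead pick the sense most common in a corpus, and pragmatic methods would consult knowledge of the document's domain; this sketch shows only the local-context variant.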
Discourse
While syntax and semantics work with sentence-length units, the discourse level of NLP works
with units of text longer than a sentence. That is, it does not interpret multisentence texts as just
concatenated sentences, each of which can be interpreted singly.
Rather, discourse focuses on the properties of the text as a whole that convey meaning by making
connections between component sentences. Several types of discourse processing can occur at
this level, two of the most common being anaphora resolution and discourse/text structure
recognition. Anaphora resolution is the replacing of words such as pronouns, which are
semantically vacant, with the appropriate entity to which they refer (30). Discourse/text structure
recognition determines the functions of sentences in
the text, which, in turn, adds to the meaningful representation of the text. For example,
newspaper articles can be deconstructed into discourse components such as: Lead, Main Story,
Previous Events, and Evaluation.
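The anaphora-resolution step described above can be sketched with a deliberately simple recency heuristic: a subject pronoun is replaced by the most recently mentioned entity with a compatible pronoun form. The entity table and the heuristic are illustrative assumptions; real systems also use syntax, salience, and world knowledge:

```python
# A minimal sketch of recency-based anaphora resolution. The entity ->
# pronoun table is an illustrative assumption, not a general solution.
ENTITIES = {"Mary": "she", "John": "he", "the committee": "it"}

def resolve(sentences):
    """Replace a sentence-initial pronoun with the last matching entity."""
    last_for = {}                      # pronoun form -> most recent entity
    resolved = []
    for sent in sentences:
        words = sent.split()
        for name, pron in ENTITIES.items():
            if sent.startswith(name):  # record this entity as most recent
                last_for[pron] = name
        if words[0].lower() in last_for:
            words[0] = last_for[words[0].lower()]
        resolved.append(" ".join(words))
    return resolved

print(resolve(["Mary bought a book.", "She read it overnight."])[1])
# Mary read it overnight.
```

Note that recency alone fails on the pragmatic examples discussed below, where both candidate referents are equally recent and only world knowledge can decide.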
Pragmatic
This level is concerned with the purposeful use of language in situations, and utilizes context over
and above the contents of the text for understanding. The goal is to explain how extra meaning is
read into texts without actually being encoded in them. This requires much world knowledge,
including the understanding of intentions, plans, and goals. Some NLP applications may utilize
knowledge bases and inferencing modules. For example, the following two sentences require
resolution of the anaphoric term "they", but this resolution requires pragmatic or world
knowledge:

The city councilors refused the demonstrators a permit because they feared violence.
The city councilors refused the demonstrators a permit because they advocated violence.

In the first sentence "they" refers to the councilors; in the second, to the demonstrators. Nothing
in the syntax or semantics differs; only world knowledge about who fears and who advocates
violence resolves the pronoun.