
PERCEPTION

Introduction
Some of the possible uses for vision:
Manipulation: grasping and insertion need local shape information and feedback for motor control.
Navigation: finding clear paths, avoiding obstacles, and calculating one's current velocity and orientation.
Object recognition: a useful skill for distinguishing between multiple objects.

Image Formation
Vision works by gathering light scattered from objects in the scene and creating a 2-D image. It's important to understand the geometry of this process in order to obtain information about the scene.

Image Formation

Perspective projection equations:
-x/f = X/Z, -y/f = Y/Z  =>  x = -fX/Z, y = -fY/Z
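These projection equations can be checked with a few lines of Python. This is a hypothetical sketch; the point coordinates and focal length below are made-up numbers for illustration.

```python
# Pinhole perspective projection: x = -fX/Z, y = -fY/Z.

def project(point, f):
    """Project a 3-D scene point (X, Y, Z) onto the image plane
    of a pinhole camera with focal length f."""
    X, Y, Z = point
    if Z == 0:
        raise ValueError("point lies in the pinhole plane (Z = 0)")
    return (-f * X / Z, -f * Y / Z)

# A point twice as far away projects half as large:
print(project((2.0, 1.0, 4.0), 1.0))   # (-0.5, -0.25)
print(project((2.0, 1.0, 8.0), 1.0))   # (-0.25, -0.125)
```

Note the 1/Z scaling: doubling the depth Z halves the image coordinates, which is exactly the foreshortening effect that orthographic projection (next slide) ignores.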

Image Formation
Perspective projection is often approximated by orthographic projection, but there is an important difference: orthographic projection does not project rays through a pinhole. Instead, the rays run parallel, either perpendicular to the image plane or at a consistent angle to it.

Lens Systems
Both human and artificial eyes use a lens. The lens is wider than a pinhole, allowing more light to enter and increasing the information collected. The human eye focuses by changing the shape of the lens. Artificial eyes focus by changing the distance between the lens and the image plane.

Photometry of Image Formation


A processed image plane contains a brightness value for each pixel. The brightness of a pixel p in the image is proportional to the amount of light directed toward the camera by the surface patch Sp that projects to pixel p. The light is characterized as either diffuse or specular reflection.

Photometry of Image Formation


Phong's formula: E = p E0 cos^m(theta), where
p is the coefficient of specular reflection,
E0 is the intensity of the light source,
m is the 'shininess' of the surface, and
theta is the angle between the viewing direction and the direction of perfect mirror reflection of the light source.
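As a sketch, Phong's formula can be evaluated directly. The values of p, E0, and m below are hypothetical, chosen only to show how the shininess exponent shapes the highlight.

```python
import math

def phong_specular(rho, E0, m, theta):
    """Brightness E = rho * E0 * cos(theta)**m (Phong's formula).
    Angles past 90 degrees receive no specular reflection."""
    c = math.cos(theta)
    return rho * E0 * c**m if c > 0 else 0.0

# A shinier surface (larger m) concentrates light near the mirror angle:
print(phong_specular(0.8, 1.0, 1, math.radians(30)))    # broad highlight
print(phong_specular(0.8, 1.0, 50, math.radians(30)))   # tight, dim here
```

Increasing m from 1 to 50 makes the reflected intensity fall off much faster as theta grows, which is why shiny surfaces show small, sharp highlights.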

Image-Processing Operations
Edge detection: edges are curves in the image plane across which there is a significant change in image brightness. The goal of edge detection is the construction of an idealized line drawing.

Image-Processing Operations
One idea for detecting edges is to differentiate the image and look for places where the brightness undergoes a sharp change. Consider a 1-D example. Below is an intensity profile for a 1-D image.

Image-Processing Operations
Below we have the derivative of the previous graph.

Here we have peaks at x = 18, x = 50, and x = 75, yet only x = 50 corresponds to a real edge. The spurious peaks are due to the presence of noise in the image.

Image-Processing Operations
This problem is countered by combining the differentiation operation with convolution by a smoothing function. The mathematical concept of convolution allows us to perform many useful image-processing operations.

Image-Processing Operations
One standard form of smoothing is to use a Gaussian function.

Now using the idea of convolving with the Gaussian function we can revisit the 1-D example.

Image-Processing Operations
With the smoothing convolution applied, we can more easily see the edge at x = 50.

Using convolution we can discover where edges are located, and this allows us to make an accurate line drawing.
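The smoothing-plus-differentiation idea above can be sketched in plain Python. The step edge at x = 50, the noise level, and the kernel width are invented for illustration; convolving with the derivative of a Gaussian smooths and differentiates in one pass.

```python
import math
import random

def gaussian_deriv_kernel(sigma, radius):
    """Samples of G'(x) = -x/sigma^2 * exp(-x^2 / (2 sigma^2))."""
    return [-x / sigma**2 * math.exp(-x * x / (2 * sigma**2))
            for x in range(-radius, radius + 1)]

def convolve(signal, kernel):
    """Convolution with clamped borders; kernel length is odd."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(signal) - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

# Noisy step: intensity 10 for x < 50, intensity 20 afterwards.
random.seed(0)
signal = [(10 if x < 50 else 20) + random.gauss(0, 0.5) for x in range(100)]

response = [abs(v) for v in convolve(signal, gaussian_deriv_kernel(2.0, 6))]
peak = response.index(max(response))
print(peak)  # at (or within a pixel of) the true edge x = 50
```

Unlike raw differentiation, the Gaussian-weighted response averages the noise away, so the single dominant peak sits at the true edge rather than at noise spikes.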

Extracting 3-D Information Using Vision


We need to extract 3-D information to perform certain tasks such as manipulation, navigation, and recognition. Three aspects:
1. Segmentation
2. Position & orientation
3. Shape
To recover 3-D information there are a number of cues available, including motion, binocular stereopsis, texture, shading, and contour.

Extracting 3-D Information Using Vision

To measure optical flow, we need to find corresponding points between one time frame and the next. One formula is the Sum of Squared Differences (SSD):

SSD(Dx, Dy) = Σ_(x,y) (I(x, y, t) - I(x + Dx, y + Dy, t + Dt))²
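A minimal sketch of SSD matching over a small search window. The frames, patch size, and search radius below are made up; real optical-flow code would work on image arrays and refine to subpixel precision.

```python
def ssd(frame1, frame2, x, y, dx, dy, half):
    """SSD between a (2*half+1)^2 patch centred at (x, y) in frame1
    and the patch displaced by (dx, dy) in frame2."""
    total = 0
    for v in range(-half, half + 1):
        for u in range(-half, half + 1):
            d = frame1[y + v][x + u] - frame2[y + v + dy][x + u + dx]
            total += d * d
    return total

def best_flow(frame1, frame2, x, y, search=2, half=1):
    """Displacement (dx, dy) minimising the SSD over a search window."""
    return min(((dx, dy) for dy in range(-search, search + 1)
                         for dx in range(-search, search + 1)),
               key=lambda d: ssd(frame1, frame2, x, y, d[0], d[1], half))

# Toy frames with distinct pixel values; frame2 is frame1 shifted
# one pixel to the right, so the true flow at any interior point is (1, 0).
frame1 = [[r * 16 + c for c in range(8)] for r in range(8)]
frame2 = [row[-1:] + row[:-1] for row in frame1]
print(best_flow(frame1, frame2, 4, 4))  # (1, 0)
```

The displacement with the smallest SSD is taken as the optical flow at that point; the distinct pixel values guarantee the zero-SSD match is unique here.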

Extracting 3-D Information Using Vision


Binocular stereopsis uses multiple images in space, whereas motion uses multiple images over time. Because each eye views the scene from a slightly different position, if we superpose the two images there will be a disparity in the locations of corresponding features, and this disparity depends on the depth of each feature.
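The disparity can be converted into depth with the standard pinhole-stereo relation Z = f * b / d (this formula is not stated on the slide; the focal length, baseline, and disparity values below are hypothetical):

```python
def depth_from_disparity(f, baseline, disparity):
    """Standard pinhole-stereo relation Z = f * b / d:
    larger disparity means the point is closer to the cameras.
    f and disparity in pixels, baseline in metres."""
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    return f * baseline / disparity

print(depth_from_disparity(500.0, 0.25, 50.0))  # 2.5 metres
```

This inverse relationship is why stereo depth estimates degrade for far-away points: distant features produce disparities of only a pixel or two, so small matching errors cause large depth errors.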

Extracting 3-D Information Using Vision


Texture Gradient: texture refers to a spatially repeating pattern on a surface that can be sensed visually. In images, the apparent size, shape, and spacing of the repeating texture elements (texels) vary.

Extracting 3-D Information Using Vision


Texture can be used to determine shape via a two-step process: (a) measure the texture gradients and (b) estimate the surface shape, slant and tilt that could give rise to them.

Extracting 3-D Information Using Vision


The main problem with shading is dealing with interreflections. In most scenes the surfaces are illuminated not only by the light sources but also by light reflected from other surfaces, which serve as secondary light sources. These mutual illumination effects are quite significant.

Extracting 3-D Information Using Vision


In a simplified world, where all surface marks and shadows have been removed, all the lines can be classified as either limbs or edges. A limb is the locus of points on the surface where the line of sight is tangent to the surface. An edge is a surface-normal discontinuity. Edges can be further classified as convex, concave, or occluding.

Extracting 3-D Information Using Vision


In 1971, Huffman and Clowes independently studied the line-labeling problem for trihedral solids: objects in which exactly three plane surfaces come together at each vertex.

Extracting 3-D Information Using Vision


They created a junction dictionary to find a labeling for a line drawing. This work was later generalized to arbitrary polyhedra and to piecewise-smooth curved objects.

Outline
9.1: Introduction 9.2: Image Formation 9.3: Image Processing Operations for Early Vision 9.4: Extracting 3D Information using Vision 9.5: Using Vision for Manipulation and Navigation 9.6: Object Representation and Recognition 9.7: Summary

Outline
9.5: Using Vision for Manipulation and Navigation Driving Example Lateral Control Longitudinal Control

Using Vision for Manipulation and Navigation


One of the main uses of vision is to provide information for manipulating objects and for navigating through a scene while avoiding obstacles. Driving is a good example of both.

Using Vision for Manipulation and Navigation

Figure 24.24: The information needed for visual control of a vehicle on a freeway.

Using Vision for Manipulation and Navigation


The problem for the driver is to generate appropriate steering, acceleration, or braking actions to best accomplish these tasks. Focusing specifically on lateral and longitudinal control, what information is needed for these tasks?

Using Vision for Manipulation and Navigation


Lateral Control: The steering control system for the vehicle needs to detect edges corresponding to the lane marker segments and then needs to fit smooth curves to these. The parameters of these curves carry information about the lateral position of the car, the direction it is pointing relative to the lane, and the curvature of the lane. The dynamics of the vehicle are also needed.

Using Vision for Manipulation and Navigation


Longitudinal Control: The driver needs to know the distance to the vehicles in front. This can be accomplished using binocular stereopsis or optical flow. The driving example makes one point very clear: for a specific task, one does not need to recover all the information that in principle can be recovered from an image.

Agents That Communicate


Chris Bourne Chris Christen Bryan Hryciw

Overview
Communication as Action
Types of Communicating Agents
A Formal Grammar for a Subset of English
Syntactic Analysis (Parsing)
Definite Clause Grammar (DCG)
Augmenting a Grammar
Semantic Interpretation
Ambiguity and Disambiguation
A Communicating Agent
Summary

Communication
Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. Most animals employ a fixed set of signs to represent messages that are important to their survival: food here, predator nearby, approach, withdraw, let's mate. Humans, like many other animals, also use a limited number of nonverbal signs to communicate (smiling, shaking hands).

Introduction
Humans are the only animal that has developed a complex, structured system of signs, known as language, that enables us to communicate most of what we know about the world. Although other animals such as chimpanzees and dolphins have shown vocabularies of hundreds of symbols, humans are the only species that can communicate an unbounded number of qualitatively different messages. Although there are other uniquely human attributes, such as wearing clothes and watching TV, Turing based his test on language because language is closely tied to thinking in a way these other attributes are not.

Origins & Evolution of Language


Did humans develop the use of language because we are smart, or are we smart because we use language well?
Jerison, 1991: Human language stems from a need for better cognitive maps of territory. Canines rely heavily on scent marking and their sense of smell to determine where they are and which other animals have been there. Since primates do not have such a highly developed sense of smell, they substituted vocal sounds for scent marking.

Communication as Action
Speech act: the actions available to an agent for producing language, including talking, typing, sign language, etc.
Speaker: an agent that produces a speech act.
Hearer: an agent that receives a speech act.
Why would agents choose to perform a speech act? To be able to inform, query, answer, request or command, promise, acknowledge, and share.

Communication as Action
Transferring information to the hearer:

Inform: each other about the part of the world each has explored, so the other agent has less exploring to do. Ex. "There's a breeze in [3,4]."

Answer: questions. This is a kind of informing. Ex. "Yes, I smelled the wumpus in [2,3]."

Acknowledge: requests and offers. Ex. "Okay."

Share: feelings and experiences with each other. Ex. "You know, that wumpus sure needs deodorant."

Communication as Action
Make the hearer take some action:

Promise: to do things or offer deals. Ex. "I'll shoot the wumpus if you let me share the gold."

Query: other agents about particular aspects of the world. Ex. "Have you smelled the wumpus anywhere?"

Request or command: other agents to perform actions. It can be seen as impolite to make direct requests, so an indirect speech act (a request in the form of a statement or question) is often used instead. Ex. "I could use some help carrying this" or "Could you please help me carry this?"

Difficulties with Communication


Speaking: When is a speech act called for? Which speech act, out of all the possibilities, is the right one? (Nondeterminism.)
Understanding: Given ambiguous inputs, what state of the world could have created these inputs?

Fundamentals of Language
Formal Languages: Languages that are invented and are rigidly defined. A set of strings where each string is a sequence of symbols taken from a finite set called the terminal symbols.
Lisp, C++, first order logic, etc.

Natural Languages: Languages that humans use to talk to one another.


Chinese, Danish, English, etc.

Component Steps of Communication

Three steps take place in the speaker:


Intention: S wants H to believe P
Generation: S chooses the words W
Synthesis: S utters the words W

Four steps take place in the hearer:


Perception: H perceives W1 (ideally, W1 = W)
Analysis: H infers that W1 has possible meanings P1, …, Pn
Disambiguation: H infers that S intended to convey Pi (ideally, Pi = P)
Incorporation: H decides to believe Pi (or rejects it if it is out of line with what H already believes)


Models of Communication
Encoded Message Model:
The speaker encodes a proposition into words or signs. The hearer then tries to decode this message to retrieve the original proposition. Under this model, the meaning in the speaker's head, the message that gets transmitted, and the interpretation that the hearer arrives at are all the same, unless there is noise during communication or an error occurs in encoding or decoding.

Situated Language Model:


Created because of the limitations of the encoded message model: the meaning of a message depends on both the words and the situation. Ex. "Diamond" refers to one thing when the subject is jewelry and to something completely different when the subject is baseball.

[Figure: two communicating agents. Agent A and Agent B each have a knowledge base (KB) and their own percepts, reasoning, and actions; they exchange messages in a shared formal language.]

Formal Grammar for a Subset of English


Lexicon: the list of allowable vocabulary words.

Noun -> stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ...
Verb -> is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | ...
Adjective -> right | left | east | south | back | smelly | ...
Adverb -> here | there | nearby | ahead | right | left | east | south | back | ...
Pronoun -> me | you | I | it | ...
Name -> John | Mary | Boston | Aristotle | ...
Preposition -> to | in | on | near | ...
Article -> the | a | an | ...
Conjunction -> and | or | but | ...
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Formal Grammar for a Subset of English


Grammar:

S -> NP VP | S Conjunction S
NP -> Pronoun | Noun | Article Noun | Digit Digit | NP PP | NP RelClause
VP -> Verb | VP NP | VP Adjective | VP PP | VP Adverb
RelClause -> that VP
PP -> Preposition NP

Parsing Algorithms
There are many algorithms for parsing:
Top-down parsing: start with an S and expand it to match the input words.
Bottom-up parsing: start with the words and combine them upward into an S.
Combinations of top-down and bottom-up.
Dynamic programming techniques: avoid the inefficiencies of backtracking.

Bottom-up Parse (example)


forest                    subsequence     rule
The wumpus is dead        The             Article -> the
Article wumpus is dead    wumpus          Noun -> wumpus
Article Noun is dead      Article Noun    NP -> Article Noun
NP is dead                is              Verb -> is
NP Verb dead              dead            Adjective -> dead
NP Verb Adjective         Verb            VP -> Verb
NP VP Adjective           VP Adjective    VP -> VP Adjective
NP VP                     NP VP           S -> NP VP
S

function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
  forest ← words
  loop do
    if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
      return forest[1]
    else
      i ← choose from {1 … LENGTH(forest)}
      rule ← choose from RULES(grammar)
      n ← LENGTH(RULE-RHS(rule))
      subsequence ← SUBSEQUENCE(forest, i, i + n - 1)
      if MATCH(subsequence, RULE-RHS(rule)) then
        forest[i … i + n - 1] ← [MAKE-NODE(RULE-LHS(rule), subsequence)]
      else fail
  end
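The pseudocode's nondeterministic "choose" steps can be realised by backtracking over every position and rule. A minimal Python sketch, using just the grammar fragment from the "The wumpus is dead" example (the tuple-based tree representation is an implementation choice, not part of the pseudocode):

```python
# Lexical rules and grammar rules share one list: (LHS, RHS) pairs.
GRAMMAR = [
    ("S",  ["NP", "VP"]),
    ("NP", ["Article", "Noun"]),
    ("VP", ["Verb"]),
    ("VP", ["VP", "Adjective"]),
    ("Article", ["the"]), ("Noun", ["wumpus"]),
    ("Verb", ["is"]), ("Adjective", ["dead"]),
]

def category(node):
    """Bare strings are words; tuples (cat, children) are parse nodes."""
    return node[0] if isinstance(node, tuple) else node

def bottom_up_parse(forest, start="S"):
    if len(forest) == 1 and category(forest[0]) == start:
        return forest[0]
    for i in range(len(forest)):          # choose a position
        for lhs, rhs in GRAMMAR:          # choose a rule
            n = len(rhs)
            subseq = forest[i:i + n]
            if [category(x) for x in subseq] == rhs:
                reduced = forest[:i] + [(lhs, subseq)] + forest[i + n:]
                result = bottom_up_parse(reduced, start)
                if result is not None:
                    return result
    return None                           # dead end: backtrack

tree = bottom_up_parse("The wumpus is dead".lower().split())
print(tree[0])  # S
```

Each recursive call replaces one matching subsequence by a new node, exactly as `forest[i … i+n-1] ← [MAKE-NODE(...)]` does; returning `None` plays the role of `fail`, sending control back to try a different position or rule.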

Definite Clause Grammar (DCG)


Problems with Backus-Naur Form (BNF):
We need meaning
Context sensitivity

Introduction of first-order logic:

BNF: S -> NP VP
FOL: NP(s1) /\ VP(s2) => S(Append(s1, s2))

BNF: Noun -> stench | ...
FOL: (s = "stench" \/ ...) => Noun(s)

DCG Notation
Positive:
Easy to describe grammars

Negative:
More verbose than BNF

3 rules:
The notation X -> Y Z translates as Y(s1) /\ Z(s2) => X(Append(s1, s2)).
The notation X -> word translates as X(["word"]).
The notation X -> Y | Z | ... translates as Y'(s) \/ Z'(s) \/ ... => X(s), where Y' is the translation into logic of the DCG expression Y.

Extending the Notation


Non-terminal symbols can be augmented. A variable can appear on the RHS. An arbitrary logical test can appear on the RHS.

DCG: Digit(sem) -> sem {0 <= sem <= 9}
FOL: (s = [sem]) /\ 0 <= sem <= 9 => Digit(sem, s)

DCG: Number(sem) -> Digit(sem)
FOL: Digit(sem, s) => Number(sem, s)

DCG: Number(sem) -> Number(sem1) Digit(sem2) {sem = 10 * sem1 + sem2}
FOL: Number(sem1, s1) /\ Digit(sem2, s2) /\ sem = 10 * sem1 + sem2 => Number(sem, Append(s1, s2))
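The Number/Digit rules and their semantic augmentation can be mimicked in Python. The `digit` and `number` helpers below are hypothetical; each returns (sem, remaining-input) pairs, echoing the extra string argument in the logical translation.

```python
def digit(s):
    """Digit(sem) -> sem {0 <= sem <= 9}: yield (sem, rest) pairs."""
    if s and s[0].isdigit():
        yield int(s[0]), s[1:]

def number(s):
    """Number(sem) -> Digit(sem)
       Number(sem) -> Number(sem1) Digit(sem2) {sem = 10 * sem1 + sem2}
    Implemented left-to-right: fold each new digit into the semantics."""
    for sem, rest in digit(s):
        while True:
            step = next(digit(rest), None)
            if step is None:
                break
            sem2, rest = step
            sem = 10 * sem + sem2   # the {sem = 10 * sem1 + sem2} test
        yield sem, rest

print(next(number("234 and more")))  # (234, ' and more')
```

The semantic variable sem is built as a side effect of parsing, which is exactly what the augmented DCG rules express declaratively.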

Augmenting a Grammar: Overgeneration

A simple grammar can overgenerate.
Ex: "Me smells a stench."

To fix this we must understand the cases of English:
Nominative (subjective): I
Accusative (objective): me

New Rules
Changes needed to handle subjective and objective cases:

S -> NPs VP
NPs -> Pronouns | Noun | Article Noun
NPo -> Pronouno | Noun | Article Noun
VP -> VP NPo | ...
PP -> Preposition NPo
Pronouns -> I | you | he | she | ...
Pronouno -> me | you | him | her | ...

Use of augmentation:

S -> NP(Subjective) VP
NP(case) -> Pronoun(case) | Noun | Article Noun
VP -> VP NP(Objective) | ...
PP -> Preposition NP(Objective)
Pronoun(Subjective) -> I | you | he | she | ...
Pronoun(Objective) -> me | you | him | her | ...

Verb Subcategorization
We now have a slight improvement, but we must also create a subcategorization list for each verb:

Verb      Subcats        Example
give      [NP, PP]       give the gold in [3,3] to me
          [NP, NP]       give me the gold
smell     [NP]           smell a wumpus
          [Adjective]    smell awful
          [PP]           smell like a wumpus
is        [Adjective]    is smelly
          [PP]           is in [2,2]
          [NP]           is a pit
believe   [S]            believe the smelly wumpus in [2,2] is dead

Parse Tree
S
  NP
    Pronoun: You
  VP([])
    VP([NP])
      VP([NP,NP])
        Verb([NP,NP]): give
      NP
        Pronoun: me
    NP
      Article: the
      Noun: gold

Semantic Interpretation
Semantic interpretation: responsible for combining meanings compositionally to get a set of possible interpretations.

Formal languages:
Compositional semantics: the semantics of any phrase is a function of the semantics of its subphrases. Ex: X + Y.
We can handle an infinite language with a finite set of rules.

Natural languages:
Appear to have noncompositional semantics. Ex: "The batter hit the ball."
Semantic interpretation alone cannot be certain of the right interpretation of a phrase or sentence.

Semantic Interpretation
Semantics as DCG augmentation: the same idea used to specify the semantics of numbers and digits can be applied to the complete language of mathematics.

Exp(sem) -> Exp(sem1) Operator(op) Exp(sem2) {sem = Apply(op, sem1, sem2)}
Exp(sem) -> ( Exp(sem) )
Exp(sem) -> Number(sem)
Number(sem) -> Digit(sem)
Number(sem) -> Number(sem1) Digit(sem2) {sem = 10 * sem1 + sem2}
Digit(sem) -> sem {0 <= sem <= 9}
Operator(sem) -> sem {sem ∈ {+, -, *, /}}
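A sketch of this grammar as a recursive-descent parser with semantic attachments. The function names are made up; it assumes well-formed input, ignores operator precedence, and associates to the right, details the grammar above leaves ambiguous.

```python
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b, "/": lambda a, b: a / b}

def exp(s):
    # Exp(sem) -> ( Exp(sem) ) | Number(sem), optionally followed by
    # Operator(op) Exp(sem2) {sem = Apply(op, sem1, sem2)}
    if s and s[0] == "(":
        sem, rest = exp(s[1:])
        assert rest and rest[0] == ")", "unbalanced parentheses"
        rest = rest[1:]
    else:
        sem, rest = number(s)
    if rest and rest[0] in OPS:
        op, (sem2, rest) = OPS[rest[0]], exp(rest[1:])
        sem = op(sem, sem2)        # sem = Apply(op, sem1, sem2)
    return sem, rest

def number(s):
    # Number(sem) -> Digit(sem) | Number(sem1) Digit(sem2)
    #                {sem = 10 * sem1 + sem2}
    sem = 0
    while s and s[0].isdigit():
        sem = 10 * sem + int(s[0])  # Digit(sem) -> sem {0 <= sem <= 9}
        s = s[1:]
    return sem, s

print(exp("(3+4)*2"))  # (14, '')
```

Each function returns (sem, remaining-input), so the curly-brace tests of the DCG become ordinary assignments executed as the parse proceeds. Note the right association: "2-3-4" parses as 2-(3-4), one of the several parses the ambiguous grammar admits.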

The Semantics of E1
Semantic structure is very different from syntactic structure. We use an intermediate form called a quasi-logical form which uses a new construction which we will call a quantified term.
every agent -> [∀ a Agent(a)]

Every agent smells a wumpus


∃ e  e ∈ Perceive([∀ a Agent(a)], [∃ w Wumpus(w)], Nose) ∧ During(Now, e)

Pragmatic Interpretation
Through semantic interpretation, an agent can perceive a string of words and use a grammar to derive a set of possible semantic interpretations. Now we address the problem of completing the interpretation by adding information about the current situation: information that is noncompositional and context-dependent.

Pragmatic Interpretation
Indexicals: Phrases that refer directly to the current situation
I am in Boston today.

Anaphora: Phrases referring to objects that have been mentioned previously


John was hungry. He entered a restaurant. After John proposed to Marsha, they found a preacher and got married. For the honeymoon, they went to Hawaii.

Deciding which reference is the right one is a part of disambiguation.

Ambiguity and Disambiguation


The biggest problem in communicative exchange is that most utterances are ambiguous.
Squad helps dog bite victim. Red-hot star to wed astronomer. Helicopter powered by human flies. Once-sagging cloth diaper industry saved by full dumps.

Ambiguity
Lexical Ambiguity
a word has more than one meaning

Syntactic Ambiguity (Structural Ambiguity)


more than one possible parse for the phrase

Semantic Ambiguity
follows from lexical or syntactic ambiguity

Referential Ambiguity
semantic ambiguity caused by anaphoric expressions

Ambiguity
Pragmatic Ambiguity
Speaker and hearer disagree on what the current situation is.

Local Ambiguity
A substring can be parsed several ways.

Vagueness
Natural languages are also vague
It's hot outside.

Disambiguation
Disambiguation is a question of diagnosis. Models of the world are used to provide possible interpretations of a speech act.
Models of the speaker Models of the hearer

It is difficult to pick the right interpretation because there may be several right ones.

Disambiguation
In general, disambiguation requires the combination of four models:
the world model
the mental model
the language model
the acoustic model

Natural language often uses deliberate ambiguity.


Most language understanding programs ignore this possibility

Disambiguation
Context-free grammars do not provide a very useful language model. In probabilistic context-free grammars (PCFGs), each rewrite rule has a probability associated with it:

S -> NP VP            (0.9)
S -> S Conjunction S  (0.1)
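A PCFG can be sampled directly from its rule probabilities. A toy sketch assuming the two S-rule probabilities 0.9 and 0.1; the other categories are stubbed with single invented words purely for illustration.

```python
import random

# Each category maps to a list of (right-hand side, probability) rules.
PCFG = {
    "S": [(["NP", "VP"], 0.9), (["S", "Conjunction", "S"], 0.1)],
    "NP": [(["I"], 1.0)],            # stub categories for illustration
    "VP": [(["go"], 1.0)],
    "Conjunction": [(["and"], 1.0)],
}

def generate(symbol="S"):
    """Expand symbol by sampling a rule according to its probability."""
    if symbol not in PCFG:           # a terminal word
        return [symbol]
    rules = PCFG[symbol]
    rhs = random.choices([r for r, _ in rules],
                         weights=[p for _, p in rules])[0]
    words = []
    for sym in rhs:
        words += generate(sym)
    return words

random.seed(1)
sentence = generate()
print(" ".join(sentence))  # "I go", or a longer conjunction 10% of the time
```

The same probabilities that drive generation can score a given parse (multiply the probabilities of the rules it uses), which is how a PCFG prefers one parse of an ambiguous sentence over another.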

A Communicating Agent
How does this all fit in to an agent that can communicate?
Start with the wumpus world robot slave.

Extend the grammar to accept commands


"Go east" and "Go to [2,2]"

Identify the kind of speech act (i.e., command or statement) as part of the quasi-logical form.

A Communicating Agent
Rules for commands and statements
S(Command(rel(Hearer))) -> VP(rel)
S(Statement(rel(obj))) -> NP(obj) VP(rel)

Rules for acknowledgements


S(Acknowledge(sem)) -> Ack(sem)
Ack(True) -> yes
Ack(True) -> OK
Ack(False) -> no
