Introduction
Some of the possible uses for vision:
Manipulation: grasping and insertion, which need local shape information and feedback for motor control.
Navigation: finding clear paths, avoiding obstacles, and calculating one's current velocity and orientation.
Object Recognition: a useful skill for distinguishing between multiple objects.
Image Formation
Vision works by gathering light scattered from objects in the scene and creating a 2-D image. It is important to understand the geometry of this process in order to obtain information about the scene.
Image Formation
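Perspective Projection Equations: -x/f = X/Z and -y/f = Y/Z, so x = -fX/Z and y = -fY/Z, where (X, Y, Z) is a point in the scene, (x, y) is its image, and f is the distance from the pinhole to the image plane.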
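The projection equations above can be sketched in a few lines of Python. This is only an illustration: the function name `perspective_project` and its argument order are assumptions, not part of the slides.

```python
from typing import Tuple


def perspective_project(X: float, Y: float, Z: float, f: float) -> Tuple[float, float]:
    """Project scene point (X, Y, Z) through a pinhole onto the image plane.

    Implements x = -f*X/Z and y = -f*Y/Z, where f is the distance from the
    pinhole to the image plane. The name and signature are hypothetical.
    """
    if Z == 0:
        raise ValueError("Z must be non-zero: the point lies in the pinhole plane")
    return (-f * X / Z, -f * Y / Z)


# The same point at twice the depth projects to half the image size.
near = perspective_project(2.0, 4.0, 10.0, 1.0)
far = perspective_project(2.0, 4.0, 20.0, 1.0)
```

The minus signs reflect that the pinhole image is inverted, and dividing by Z is what makes distant objects appear smaller.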
Image Formation
Perspective projection is often approximated by orthographic projection, but there is an important difference: orthographic projection does not project rays through a pinhole. Instead, the projection rays run parallel to each other, either perpendicular to the image plane or at a consistent angle to it.
Lens Systems
Both human and artificial eyes use a lens. The lens is wider than a pinhole, so it lets in more light and increases the information collected. The human eye focuses by changing the shape of the lens. Artificial eyes focus by changing the distance between the lens and the image plane.
Image-Processing Operations
Edge Detection: Edges are curves in the image plane across which there is a significant change in image brightness. The goal of edge detection is the construction of an idealized line drawing.
Image-Processing Operations
One way to detect edges is to differentiate the image and look for places where the brightness undergoes a sharp change. Consider a 1-D example. Below is an intensity profile for a 1-D image.
Image-Processing Operations
Below we have the derivative of the previous graph.
Here we have peaks at x=18, x=50 and x=75. Only the peak at x=50 corresponds to a true edge; the other peaks are errors caused by noise in the image.
Image-Processing Operations
This problem is countered by smoothing the image before differentiating it, which is done by convolving the image with a smoothing function. The mathematical concept of convolution allows us to perform many useful image-processing operations.
Image-Processing Operations
One standard form of smoothing is to use a Gaussian function.
Now using the idea of convolving with the Gaussian function we can revisit the 1-D example.
Image-Processing Operations
With the convolution applied we can more easily see the edge at x=50.
Using convolution we are able to discover where edges are located, and this allows us to make an accurate line drawing.
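The 1-D example can be reproduced with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the function names, the kernel radius of 3 sigma, the clamped borders, the central-difference derivative, and the synthetic profile with one true step and two noise spikes.

```python
import math
from typing import List, Optional


def gaussian_kernel(sigma: float) -> List[float]:
    """Sampled Gaussian, truncated at 3*sigma and normalized to sum to 1."""
    radius = int(3 * sigma)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve(signal: List[float], kernel: List[float]) -> List[float]:
    """Convolve with a symmetric kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    smoothed = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)
            acc += w * signal[j]
        smoothed.append(acc)
    return smoothed


def derivative(signal: List[float]) -> List[float]:
    """Central-difference derivative; the two end points are set to 0."""
    inner = [(signal[i + 1] - signal[i - 1]) / 2.0
             for i in range(1, len(signal) - 1)]
    return [0.0] + inner + [0.0]


def strongest_edge(signal: List[float], sigma: Optional[float] = None) -> int:
    """Index of the largest |derivative|, optionally after Gaussian smoothing."""
    if sigma is not None:
        signal = convolve(signal, gaussian_kernel(sigma))
    slope = derivative(signal)
    return max(range(len(slope)), key=lambda i: abs(slope[i]))


# Synthetic profile: a true step at x=50 plus isolated noise spikes.
profile = [1.0] * 50 + [2.0] * 50
profile[18] += 1.5
profile[75] -= 1.5
raw_peak = strongest_edge(profile)                # dominated by a noise spike
smooth_peak = strongest_edge(profile, sigma=2.0)  # lands on the true step
```

Without smoothing, the single-pixel spikes produce the largest derivative. After Gaussian smoothing the spikes are spread out and flattened, so the step at x=50 gives the strongest response.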
To measure Optical Flow, we need to find corresponding points between one time frame and the next. One formula is the Sum of Squared Differences (SSD):
$$\mathrm{SSD}(D_x, D_y) = \sum_{(x,y)} \bigl( I(x, y, t) - I(x + D_x,\, y + D_y,\, t + D_t) \bigr)^2$$
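A minimal sketch of SSD matching, assuming each frame is a list of rows of brightness values and the sum runs over a small square window. The names `ssd` and `best_displacement`, the window parameters, and the exhaustive search over shifts are illustrative assumptions, not something the slides specify.

```python
from typing import List, Tuple

Frame = List[List[float]]  # frame[y][x] is the brightness I(x, y) at one time step


def ssd(frame_t: Frame, frame_next: Frame,
        x0: int, y0: int, size: int, dx: int, dy: int) -> float:
    """Sum of squared differences over a size-by-size window at (x0, y0).

    Compares I(x, y, t) with I(x + dx, y + dy, t + Dt). The caller must keep
    the shifted window inside the image.
    """
    total = 0.0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            diff = frame_t[y][x] - frame_next[y + dy][x + dx]
            total += diff * diff
    return total


def best_displacement(frame_t: Frame, frame_next: Frame,
                      x0: int, y0: int, size: int, max_shift: int) -> Tuple[int, int]:
    """Try every (dx, dy) within max_shift and return the one with minimum SSD."""
    candidates = [(dx, dy)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda d: ssd(frame_t, frame_next, x0, y0, size, d[0], d[1]))
```

The displacement that minimizes the SSD is taken as the optical flow at the window's location. Real systems restrict the search range and choose the window size carefully, which this sketch leaves to the caller.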
Outline
9.1: Introduction
9.2: Image Formation
9.3: Image Processing Operations for Early Vision
9.4: Extracting 3D Information using Vision
9.5: Using Vision for Manipulation and Navigation
9.6: Object Representation and Recognition
9.7: Summary
Outline
9.5: Using Vision for Manipulation and Navigation
Driving Example
Lateral Control
Longitudinal Control
Figure 24.24: The information needed for visual control of a vehicle on a freeway.
Overview
Communication as Action
Types of Communicating Agents
A Formal Grammar for a Subset of English
Syntactic Analysis (Parsing)
Definite Clause Grammar (DCG)
Augmenting a Grammar
Semantic Interpretation
Ambiguity and Disambiguation
A Communicating Agent
Summary
Communication
Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs. Most animals employ a fixed set of signs to represent messages that are important to their survival: "food here", "predator nearby", "approach", "withdraw", "let's mate". Humans, like many other animals, also use a limited number of signs to communicate (smiling, shaking hands).
Introduction
Humans are the only animals that have developed a complex, structured system of signs, known as language, that enables us to communicate most of what we know about the world. Although other animals such as chimpanzees and dolphins have shown vocabularies of hundreds of symbols, humans are the only species that can communicate an unbounded number of qualitatively different messages. Although there are other uniquely human attributes, such as wearing clothes and watching TV, Turing created his test based on language because language is closely tied to thinking in a way these other attributes are not.
Communication as Action
Speech Act: An action available to an agent for producing language, such as talking, typing, or sign language.
Speaker - An agent that produces a speech act
Hearer - An agent that receives a speech act
Why would agents choose to perform a speech act? To be able to: Inform, Query, Answer, Request or Command, Promise, Acknowledge, and Share
Communication as Action
Transferring Information to Hearer:
Inform: each other about the part of the world each has explored, so that the other agent has less exploring to do. Ex. "There's a breeze in 3 4."
Answer: questions. This is a kind of informing. Ex. "Yes, I smelled the wumpus in 2 3."
Acknowledge:
Share: feelings and experiences with each other. Ex. "You know, that wumpus sure needs deodorant."
Communication as Action
Make the Hearer take some action:
Promise: to do things or offer deals. Ex. "I'll shoot the wumpus if you let me share the gold."
Query: other agents about particular aspects of the world. Ex. "Have you smelled the wumpus anywhere?"
Request or Command: other agents to perform actions. It can be seen as impolite to make direct requests, so an indirect speech act (a request in the form of a statement or question) is often used instead. Ex. "I could use some help carrying this" or "Could you please help me carry this?"
Fundamentals of Language
Formal Languages: Languages that are invented and rigidly defined. A formal language is a set of strings, where each string is a sequence of symbols taken from a finite set called the terminal symbols.
Ex. Lisp, C++, first-order logic, etc.
Models of Communication
Encoded Message Model:
The speaker encodes a proposition into words or signs. The hearer then tries to decode this message to retrieve the original proposition. The meaning in the speaker's head, the message that gets transmitted, and the interpretation that the hearer arrives at are all the same, unless there is noise during communication or an error occurs in encoding or decoding.
Formal Language
[Figure: two agents, Agent A and Agent B, each with a KB and with Percepts, Reasoning, and Actions, communicating through Language.]
Digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
PP -> Preposition NP
Parsing Algorithms
There are many algorithms for parsing
Top-down parsing
Starting with an S and expanding accordingly
function BOTTOM-UP-PARSE(words, grammar) returns a parse tree
  forest ← words
  loop do
    if LENGTH(forest) = 1 and CATEGORY(forest[1]) = START(grammar) then
      return forest[1]
    else
      i ← choose from {1...LENGTH(forest)}
      rule ← choose from RULES(grammar)
      n ← LENGTH(RULE-RHS(rule))
      subsequence ← SUBSEQUENCE(forest, i, i+n-1)
      if MATCH(subsequence, RULE-RHS(rule)) then
        forest[i...i+n-1] ← [MAKE-NODE(RULE-LHS(rule), subsequence)]
      else fail
  end
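The nondeterministic "choose" steps in BOTTOM-UP-PARSE can be simulated with backtracking search. The Python sketch below is one possible rendering under stated assumptions: the toy grammar `RULES`, the tuple-based tree representation, and the `seen` set that prunes repeated forests are all illustrative choices that do not come from the slides.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical toy grammar: (left-hand side, right-hand side) pairs.
# Lexical rules are written the same way as phrase rules.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Pronoun",)),
    ("NP", ("Article", "Noun")),
    ("VP", ("Verb", "NP")),
    ("Pronoun", ("I",)),
    ("Verb", ("smell",)),
    ("Article", ("the",)),
    ("Noun", ("wumpus",)),
]


def category(node) -> str:
    """A word is its own category; a tree node is (category, children)."""
    return node if isinstance(node, str) else node[0]


def bottom_up_parse(words: Sequence[str], rules=RULES, start: str = "S") -> Optional[Tuple]:
    """Return one parse tree for words, or None if no parse exists."""
    seen = set()  # forests already explored without success

    def search(forest: Tuple) -> Optional[Tuple]:
        if len(forest) == 1 and category(forest[0]) == start:
            return forest[0]
        if forest in seen:
            return None
        seen.add(forest)
        # Try every position i and every rule, instead of "choosing" one.
        for i in range(len(forest)):
            for lhs, rhs in rules:
                n = len(rhs)
                subsequence = forest[i:i + n]
                if len(subsequence) == n and tuple(map(category, subsequence)) == rhs:
                    node = (lhs, subsequence)  # MAKE-NODE
                    result = search(forest[:i] + (node,) + forest[i + n:])
                    if result is not None:
                        return result
        return None  # every choice failed

    return search(tuple(words))


tree = bottom_up_parse("I smell the wumpus".split())
```

Each successful match replaces a subsequence of the forest with a single node, as in the pseudocode. The search assumes the grammar has no cycles of single-symbol rules, otherwise it would not be guaranteed to finish.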
DCG Notation
Positive:
Easy to describe grammars
Negative:
More verbose than BNF
3 Rules:
The notation X → Y Z ... translates as Y(s1) ∧ Z(s2) ∧ ... ⇒ X(Append(s1, s2, ...)).
The notation X → word translates as X([word]).
The notation X → Y | Z | ... translates as Y'(s) ∨ Z'(s) ∨ ... ⇒ X(s), where Y' is the translation into logic of the DCG expression Y.
New Rules
Changes needed to handle subjective and objective cases:
S → NPs VP | ...
NPs → Pronouns | Noun | Article Noun
NPo → Pronouno | Noun | Article Noun
VP → VP NPo | ...
PP → Preposition NPo
Pronouns → I | you | he | she | ...
Pronouno → me | you | him | her | ...
Use of Augmentation:
S → NP(Subjective) VP | ...
NP(case) → Pronoun(case) | Noun | Article Noun
VP → VP NP(Objective) | ...
PP → Preposition NP(Objective)
Pronoun(Subjective) → I | you | he | she | ...
Pronoun(Objective) → me | you | him | her | ...
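The effect of the case augmentation can be sketched as a small recognizer in Python. This is a simplified illustration: the word lists for nouns, articles, and verbs are hypothetical, and the verb phrase is reduced to the assumed form Verb NP(Objective) rather than the full VP rules of the grammar.

```python
from typing import Sequence

PRONOUNS = {
    "Subjective": {"I", "you", "he", "she"},
    "Objective": {"me", "you", "him", "her"},
}
NOUNS = {"wumpus", "gold"}      # hypothetical lexicon
ARTICLES = {"the", "a"}         # hypothetical lexicon
VERBS = {"smell", "see"}        # hypothetical lexicon


def is_np(words: Sequence[str], case: str) -> bool:
    """NP(case) -> Pronoun(case) | Noun | Article Noun"""
    if len(words) == 1:
        return words[0] in PRONOUNS[case] or words[0] in NOUNS
    if len(words) == 2:
        return words[0] in ARTICLES and words[1] in NOUNS
    return False


def is_sentence(words: Sequence[str]) -> bool:
    """S -> NP(Subjective) VP, with VP simplified to Verb NP(Objective)."""
    for i in range(1, len(words) - 1):
        if (is_np(words[:i], "Subjective")
                and words[i] in VERBS
                and is_np(words[i + 1:], "Objective")):
            return True
    return False
```

Because the case is passed as an argument, one NP rule covers both the subjective and the objective noun phrase, which is the saving the augmented grammar gives over the duplicated NPs and NPo rules.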
Verb Subcategorization
We now have a slight improvement, but we must also create a subcategorization list for each verb.
Verb      Subcats        Example
give      [NP, PP]       give the gold in 3,3 to me
give      [NP, NP]       give me the gold
smell     [NP]           smell a wumpus
smell     [Adjective]    smell awful
smell     [PP]           smell like a wumpus
is        [Adjective]    is smelly
is        [PP]           is in 2 2
is        [NP]           is a pit
believe   [S]            believe the smelly wumpus in 2 2 is dead
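The subcategorization table can be stored as a simple lookup. The sketch below assumes a dictionary from each verb to its allowed complement lists; the name `SUBCATS` and the helper `accepts` are hypothetical.

```python
from typing import Dict, List, Sequence

# Subcategorization lists taken from the table above.
SUBCATS: Dict[str, List[List[str]]] = {
    "give": [["NP", "PP"], ["NP", "NP"]],
    "smell": [["NP"], ["Adjective"], ["PP"]],
    "is": [["Adjective"], ["PP"], ["NP"]],
    "believe": [["S"]],
}


def accepts(verb: str, complements: Sequence[str]) -> bool:
    """True if the verb allows exactly this sequence of complement categories."""
    return list(complements) in SUBCATS.get(verb, [])
```

For example, `accepts("give", ["NP", "NP"])` covers "give me the gold", while `accepts("smell", ["S"])` is rejected because "smell" has no [S] entry.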
Parse Tree
S
  NP
    Pronoun: You
  VP([])
    VP([NP])
      VP([NP,NP])
        Verb([NP,NP]): give
      NP
        Pronoun: me
    NP
      Article: the
      Noun: gold
Semantic Interpretation
Semantic Interpretation: Responsible for combining meanings compositionally to get a set of possible interpretations.
Formal Languages
Compositional Semantics: The semantics of any phrase is a function of the semantics of its subphrases
Ex. X+Y
Natural Languages
Appear to have a noncompositional semantics
Ex. "The batter hit the ball"
Semantic interpretation alone cannot be certain of the right interpretation of a phrase or sentence.
Semantic Interpretation
Semantics as DCG Augmentation: The same idea used to specify the semantics of numbers and digits can be applied to the complete language of arithmetic.
Exp(sem) → Exp(sem1) Operator(op) Exp(sem2) { sem = Apply(op, sem1, sem2) }
Exp(sem) → ( Exp(sem) )
Exp(sem) → Number(sem)
Digit(sem) → sem { 0 ≤ sem ≤ 9 }
Number(sem) → Digit(sem)
Number(sem) → Number(sem1) Digit(sem2) { sem = 10 × sem1 + sem2 }
Operator(sem) → sem { sem ∈ {+, −, ÷, ×} }
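The augmented arithmetic grammar can be mirrored by a small evaluator. This Python sketch makes several assumptions that the grammar itself does not fix: tokens arrive as a list of strings, operators are written `+ - * /`, and a chain of operators is applied strictly left to right with no precedence. All function names are hypothetical.

```python
import operator
from typing import List, Tuple

# Operator(sem) -> sem, with ASCII spellings assumed for the four operators.
OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}


def number_sem(digits: str) -> int:
    """Number(sem) -> Number(sem1) Digit(sem2) { sem = 10 * sem1 + sem2 }"""
    sem = 0
    for ch in digits:
        sem = 10 * sem + "0123456789".index(ch)  # Digit(sem), 0 <= sem <= 9
    return sem


def _operand(tokens: List[str]) -> Tuple[float, List[str]]:
    """Exp -> ( Exp ) | Number; returns the semantics and the unused tokens."""
    if not tokens:
        raise ValueError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        sem, rest = _exp(rest)
        if not rest or rest[0] != ")":
            raise ValueError("missing closing parenthesis")
        return sem, rest[1:]
    if head and all(ch in "0123456789" for ch in head):
        return number_sem(head), rest
    raise ValueError(f"unexpected token: {head!r}")


def _exp(tokens: List[str]) -> Tuple[float, List[str]]:
    """Exp -> Exp Operator Exp { sem = Apply(op, sem1, sem2) }, left to right."""
    sem, rest = _operand(tokens)
    while rest and rest[0] in OPERATORS:
        apply_op = OPERATORS[rest[0]]
        sem2, rest = _operand(rest[1:])
        sem = apply_op(sem, sem2)
    return sem, rest


def exp_sem(tokens: List[str]) -> float:
    """Semantics of a complete expression; rejects leftover tokens."""
    sem, rest = _exp(tokens)
    if rest:
        raise ValueError(f"unexpected token: {rest[0]!r}")
    return sem
```

Each function returns the `sem` value that the corresponding grammar rule attaches to its phrase, so the meaning of the whole expression is built up from the meanings of its parts.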
The Semantics of E1
Semantic structure is very different from syntactic structure. We use an intermediate form called a quasi-logical form which uses a new construction which we will call a quantified term.
every agent -> [∀ a Agent(a)]
Pragmatic Interpretation
Through semantic interpretation, an agent can perceive a string of words and use a grammar to derive a set of possible semantic interpretations. Now we address the problem of completing the interpretation by adding information about the current situation.
This is information that is noncompositional and context-dependent.
Pragmatic Interpretation
Indexicals: Phrases that refer directly to the current situation
I am in Boston today.
Ambiguity
Lexical Ambiguity
a word has more than one meaning
Semantic Ambiguity
follows from lexical or syntactic ambiguity
Referential Ambiguity
semantic ambiguity caused by anaphoric expressions
Ambiguity
Pragmatic Ambiguity
Speaker and hearer disagree on what the current situation is.
Local Ambiguity
A substring can be parsed several ways.
Vagueness
Natural languages are also vague
Ex. "It's hot outside."
Disambiguation
Disambiguation is a question of diagnosis. Models of the world are used to provide possible interpretations of a speech act.
Models of the speaker
Models of the hearer
It is difficult to pick the right interpretation because there may be several right ones.
Disambiguation
In general, disambiguation requires the combination of four models:
the world model
the mental model
the language model
the acoustic model
Disambiguation
Context-free grammars do not provide a very useful language model.
Probabilistic context-free grammars (PCFGs): each rewrite rule has a probability associated with it
(0.9) (0.1)
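One common use of a PCFG is to score a parse tree as the product of the probabilities of the rules it uses, so that competing interpretations can be ranked. The sketch below illustrates this in Python. The grammar, its probabilities, and the tuple-based tree format are hypothetical and are not the rules from the slide.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple[str, tuple]]  # a word, or (category, children)

# Hypothetical PCFG: (left-hand side, right-hand side) -> probability.
RULE_PROB: Dict[Tuple[str, Tuple[str, ...]], float] = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("Pronoun",)): 0.4,
    ("NP", ("Article", "Noun")): 0.6,
    ("VP", ("Verb", "NP")): 1.0,
    ("Pronoun", ("I",)): 0.5,
    ("Verb", ("smell",)): 1.0,
    ("Article", ("the",)): 1.0,
    ("Noun", ("wumpus",)): 0.5,
}


def tree_probability(tree: Tree, rule_prob: Dict = RULE_PROB) -> float:
    """Product of the probabilities of all rewrite rules used in the tree."""
    if isinstance(tree, str):
        return 1.0  # a bare word contributes no rule of its own
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(lhs, rhs)]
    for child in children:
        p *= tree_probability(child, rule_prob)
    return p


example_tree = (
    "S", (
        ("NP", (("Pronoun", ("I",)),)),
        ("VP", (
            ("Verb", ("smell",)),
            ("NP", (("Article", ("the",)), ("Noun", ("wumpus",)))),
        )),
    ),
)
p = tree_probability(example_tree)
```

When a sentence has several parses, the one with the highest probability under the grammar is a natural candidate for the preferred interpretation.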
A Communicating Agent
How does this all fit into an agent that can communicate?
Start with the wumpus world robot slave.
Identify the kind of speech act (i.e., command or statement) as part of the quasi-logical form.
A Communicating Agent
Rules for commands and statements
S(Command(rel(Hearer))) → VP(rel)
S(Statement(rel(obj))) → NP(obj) VP(rel)