Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R. Michaelis, Hala Mostafa, Santiago Ontañón, Charles Parker, Jainarayan Radhakrishnan, Anton Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Daniel Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong, James A. Hendler, Martin O. Hofmann, Kenneth Whitebread
Abstract
In this paper we describe the application of a novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspace need to be reconciled and managed automatically. The key feature of our Generalized Integrated Learning Architecture (GILA) is a set of integrated learning and reasoning (ILR) systems coordinated by a central meta-reasoning executive (MRE). Each ILR learns independently from the same training example and contributes to problem solving in concert with other ILRs, as directed by the MRE. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Further, GILA outperforms any individual ILR run in isolation, thus demonstrating the power of the ensemble architecture for learning and problem solving.
Introduction
Airspace management is a complex, knowledge-intensive activity that challenges even the best human experts. Not only do the experts need to keep track of myriad details of different kinds of aircraft, their limitations and requirements, but they also need to find a safe, mission-sensitive, and cost-effective global schedule of flights for the day. An expert-systems approach to airspace management requires painstaking knowledge engineering to build the system and a team of human experts to maintain it when changes occur to the fleet, possible missions, safety protocols, and costs of schedule changes. An approach based on learning from expert demonstrations is more attractive, especially if it can be based on a small number of training examples, since this bypasses the knowledge engineering bottleneck (Buchanan and Wilkins 1993).

(Author affiliations: University of Wyoming; University of Maryland; University of Illinois at Urbana-Champaign; Georgia Institute of Technology; University of Massachusetts, Amherst; Fujitsu Laboratories of America. Copyright © 2009, Association for the Advancement of Artificial Intelligence, www.aaai.org. All rights reserved.)
In this paper we describe a novel learning and problem
solving architecture based on multiple integrated learning
and reasoning (ILR) systems coordinated by a centralized
controller called the meta-reasoning executive (MRE). Our
architecture is directly motivated by the need to learn rapidly
from a small number of expert demonstrations in a complex
problem solving domain. Learning from a small number of
examples is only possible if the learning system has a strong
bias or prior knowledge (Mitchell 1997). However, a strong
bias could lead to undesired solutions or no solution when
the bias is not appropriate for solving the problem at hand.
One way to guard against such inappropriate biases and to
hedge one's bets is to consider the solutions of learners with
different biases and combine them in an optimal fashion.
Since learning is highly impacted by the knowledge representation, which in turn influences problem solving, we
tightly integrate learning and problem solving in each ILR by allowing it to use its own internal knowledge representation scheme. The MRE is responsible for presenting training examples to the ILRs, posing problems to be solved, and selecting and combining the proposed solutions in a novel way to solve the overall problem. Thus the MRE is spared from understanding the internal representations of the ILRs and operates only in the space of sub-problems and candidate
solutions. While there have been highly successful uses of ensembles in classification learning, including bagging and boosting (Polikar 2006), we are not aware of any work on ensemble learning in the context of complex problem solving. Unlike classification ensembles that use simple voting mechanisms to make the final decision, here the MRE needs to select the best among the proposed sub-problem solutions and assemble them into a final solution.
We developed a system called Generalized Integrated
Learning Architecture (GILA) that implements the above
(x, y, y′) > 0 iff y is a better choice than y′. The loss accumulated over all N training examples is

    L(w) = sum_{i=1}^{N} log(1 + e^{-m_i}).    (3)

We define the margin m_i to be the amount by which the model prefers y_i to y′_i as the output for x_i (see Equation 2).
The loss on each example is determined by the LogitBoost function of the margin and is accumulated over all training examples as in Equation 3. The parameters w of the cost function are repeatedly adjusted in proportion to the negative gradient of the cumulative loss L over the training data.
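A minimal numeric sketch of this update, assuming the margin is linear in w; the feature-difference vectors below are invented for illustration:

```python
import numpy as np

# Invented feature differences phi(x_i, y_i) - phi(x_i, y'_i), one row
# per training example; the margin is m_i = w . diffs[i].
diffs = np.array([[1.0, -0.5],
                  [0.3,  2.0],
                  [-0.2, 1.5]])

w = np.zeros(2)    # parameters of the cost function
lr = 0.1           # gradient-descent step size

for _ in range(200):
    margins = diffs @ w
    # Cumulative loss L(w) = sum_i log(1 + exp(-m_i))  (Equation 3);
    # its gradient is sum_i -diffs[i] / (1 + exp(m_i)).
    grad = -(diffs / (1.0 + np.exp(margins))[:, None]).sum(axis=0)
    w -= lr * grad                 # step along the negative gradient

loss = np.log1p(np.exp(-(diffs @ w))).sum()
```

Because the loss is convex in w, this descent drives every margin toward positive values whenever the training preferences are separable.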
The problem of finding y′_i, which is encountered during both learning and performance, is called inference. Based on the domain knowledge, we define a discretized space of operators, namely altitude, time, radius, and x-y coordinates, to produce various possible plans to deconflict each ACM in the ACMREQ. Since finding the best-scoring plan that resolves the conflict is too expensive, we approximate it by trying to find a single change that resolves the conflicts, and consider multiple changes only when necessary.
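The single-change-first strategy can be sketched as follows; the operator values and the conflict test are hypothetical stand-ins for the real domain model:

```python
import itertools

# Hypothetical discretized operator space: each operator maps to a
# list of candidate change values.
OPERATORS = {
    "altitude": [1000, 2000, 5000],   # feet
    "time":     [15, 30, 60],         # minutes
    "radius":   [1, 2, 5],            # nautical miles
}

def resolves(changes):
    """Hypothetical conflict check: True if the changes deconflict the ACM."""
    return sum(v for _, v in changes) >= 5000   # stand-in criterion

def deconflict():
    # First, try every single change in the discretized space.
    singles = [(op, v) for op, vals in OPERATORS.items() for v in vals]
    for change in singles:
        if resolves([change]):
            return [change]
    # Only when no single change works, consider pairs of changes.
    for pair in itertools.combinations(singles, 2):
        if resolves(list(pair)):
            return list(pair)
    return None

plan = deconflict()
```

Trying single changes first keeps inference cheap: the pair search, which is quadratic in the number of candidate changes, runs only as a fallback.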
Case-Based Learner Reasoner (CBLR)
CBLR learns how to prioritize conflicts and deconflict airspaces by using case-based reasoning (CBR). CBR (Aamodt and Plaza 1994) consists of storing and indexing a set of past experiences (called cases) and reusing them to solve new problems, under the assumption that similar problems have similar solutions. In GILA, CBLR uses the expert demonstration to learn cases. Unlike other learning techniques that attempt to generalize at learning time, learning in CBR systems consists of representing and retaining cases in the case libraries. All attempts at generalization are made during performance, by adapting the solutions of retrieved cases to the problem at hand.
Learning in the CBLR CBLR learns knowledge to enable solving of both conflict prioritization and deconfliction problems and, for that purpose, uses five knowledge containers. It uses a prioritization library, which holds a set of cases for reasoning about the priority with which conflicts should be solved, and a choice library, which contains cases for reasoning about which airspace to move in solving a given conflict. It has a constellation library, used to characterize the neighborhood surrounding a conflict (i.e., its context), and a deconfliction library, containing the specific steps that the expert used to solve the conflict. Finally, in order to adapt the deconflicted solutions, CBLR uses adaptation knowledge, stored as rules and constraints.

CBLR learns cases for the four case libraries by analyzing the expert trace. The order in which the expert solves conflicts provides the information required for prioritization cases, and the expert's choice of which airspace to modify for a particular conflict is the basis for choice cases. Constraints are learned by observing the range of values that the expert uses for each variable (altitude, radius, width, etc.), and by incorporating any additional constraints that other components in GILA might learn.
Reasoning in the CBLR Given a set of conflicts, CBLR uses the prioritization library to assign to each conflict the priority of the most similar conflict in the library. The conflicts are then ranked according to these priorities to obtain a final ordering. Thus, the bias of the CBLR is that similar conflicts have similar priorities. Because CBLR follows the expert's approach so closely, evaluation shows that the priorities computed by the CBLR were of very high quality.

Given a conflict, CBLR uses a hierarchical problem-solving process. First, the choice library is used to determine which of the two airspaces is to move. Then, using the constellation library, CBLR determines which deconfliction strategy (e.g., a shift in altitude) is best suited to resolve the conflict in the given context. Finally, cases from the deconfliction library are retrieved and adapted using the adaptation knowledge. Adaptation may produce several candidate solutions for each retrieved case, and the validity of each candidate solution is evaluated with the help of the 4DCL. CBLR tries different cases until satisfactory deconflicted solutions are found. In addition to standard problem-solving tests, incremental learning tests show that CBLR uses only the approaches that were demonstrated by the expert, thus showing CBLR's success in learning the expert's approach.
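The retrieve-most-similar step that underlies both prioritization and choice can be sketched as a nearest-case lookup; the conflict features, case contents, and similarity measure below are invented for illustration:

```python
def similarity(a, b):
    """Illustrative similarity: negative squared distance over numeric features."""
    return -sum((a[k] - b[k]) ** 2 for k in a)

# Hypothetical prioritization library: conflict features -> learned priority.
PRIORITIZATION_LIBRARY = [
    ({"overlap_alt": 2000, "n_neighbors": 3}, 0.9),
    ({"overlap_alt": 500,  "n_neighbors": 1}, 0.3),
]

def assign_priority(conflict):
    # Each conflict receives the priority of the most similar stored case.
    case, priority = max(PRIORITIZATION_LIBRARY,
                         key=lambda cp: similarity(conflict, cp[0]))
    return priority

def rank_conflicts(conflicts):
    # Rank by retrieved priority to obtain the final ordering.
    return sorted(conflicts, key=assign_priority, reverse=True)

ordered = rank_conflicts([{"overlap_alt": 600, "n_neighbors": 1},
                          {"overlap_alt": 1900, "n_neighbors": 3}])
```

This makes the stated bias concrete: a new conflict near a high-priority stored case inherits that high priority and is solved first.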
4D Constraint Learner/Reasoner (4DCLR)
The 4D Constraint Learner/Reasoner (4DCLR) within GILA is responsible for automated learning and application of planning knowledge in the form of safety constraints. Example constraints are: "The altitude of a UAV over the course of its trajectory should never exceed a maximum of 60,000 feet," and "An aircraft trajectory should never be moved so that it intersects a no-fly zone."
The 4DCLR (Rebguns et al. 2008) has two components: (1) the Constraint Learner (CL), which automatically infers constraints from expert demonstration traces, and (2) the Safety Checker (SC), which is responsible for verifying the correctness of solutions in terms of their satisfaction or violation of the safety constraints learned by the CL. Why is the SC needed if the ILRs already use the safety constraints during problem solving? The reason is that the ILRs output partial solutions to sub-problems. These partial solutions are then combined by the MRE into one complete final solution, and that is what needs to be checked by the SC.
Constraint Learner We assume that the system designer provides constraint templates a priori; it is the job of the CL to infer the values of the parameters within these templates. Learning in the CL is Bayesian. For each parameter, such as the maximum flying altitude for a particular aircraft, the CL begins with a prior probability distribution. If informed, the prior might be a Gaussian approximation of the real distribution, obtained by asking the expert for the average, variance, and covariance of the minimum and maximum altitudes. If uninformed, a uniform prior is used.

Learning proceeds based on evidence witnessed by the CL at each step of the demonstration trace. This evidence might be a change in maximum altitude that occurs as the expert positions and repositions an airspace to avoid a conflict. Based on this evidence, the prior is updated using Bayes' rule and the assumption that the expert always moves an airspace uniformly into a safe region. After observing evidence, the CL assigns zero probability to constraint parameters that are inconsistent with the expert's actions, and assigns the highest probability to more constraining sets of parameters that are consistent with the expert's actions.
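A discrete sketch of this update for a single maximum-altitude parameter, assuming (as above) that the expert places airspaces uniformly within the safe region; the candidate values and trace evidence are invented:

```python
# Candidate values for a maximum-altitude constraint parameter (feet).
candidates = [20_000, 40_000, 60_000, 80_000]
# Uninformed case: start from a uniform prior over the candidates.
posterior = {m: 1.0 / len(candidates) for m in candidates}

def update(posterior, observed_altitude):
    """One step of Bayes' rule, assuming the expert places airspaces
    uniformly inside the safe region [0, m]."""
    new = {}
    for m, p in posterior.items():
        # Likelihood is 1/m if the observation fits under bound m, else 0,
        # so tighter consistent bounds receive more probability mass.
        likelihood = (1.0 / m) if observed_altitude <= m else 0.0
        new[m] = p * likelihood
    z = sum(new.values())
    return {m: p / z for m, p in new.items()}

for altitude in [35_000, 15_000, 30_000]:   # invented trace evidence
    posterior = update(posterior, altitude)

best = max(posterior, key=posterior.get)    # most probable parameter value
```

Candidates below an observed altitude get zero probability, and among the consistent candidates the smallest (most constraining) bound ends up with the highest posterior, matching the behavior described above.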
Safety Checker The SC's inputs are the candidate sub-problem solutions from the ILRs, the current ACO on which to try the candidate solutions, and the safety constraints output by the CL; its output is a violation message. The SC uses its 4D Spatio-Temporal Reasoner to verify whether any constraint is violated. The violation message output by the SC includes the violated constraint, the solution that violated the constraint, the nature of the violation, and the expected degree (severity) of the violation, normalized to a value in the range [0, 1].

The expected degree of violation is called the safety violation penalty. The SC calculates this penalty by finding a normalized expectation of differences between expert actions (from the demonstration trace) and proposed actions in the candidate solution. The differences are weighted by their probabilities from the posterior distribution. The MRE uses the violation penalty to discard sub-problem solutions that are invalid because they have too many violations.
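A toy sketch of the check and the resulting violation message; the constraint format, action format, and normalization here are hypothetical simplifications of the SC's actual reasoning:

```python
def check_solution(actions, constraints, posterior):
    """Sketch of the safety check: return a message for the first violated
    constraint, or None if nothing is violated (structures are invented)."""
    for constraint in constraints:
        for action in actions:
            excess = action["altitude"] - constraint["max_altitude"]
            if excess > 0:
                # Normalize the degree of violation into [0, 1], then weight
                # it by the posterior probability of the learned parameter.
                degree = min(excess / constraint["max_altitude"], 1.0)
                return {
                    "constraint": constraint,
                    "solution": action,
                    "severity": degree * posterior[constraint["max_altitude"]],
                }
    return None  # no safety constraint is violated

msg = check_solution(
    actions=[{"altitude": 66_000}],
    constraints=[{"max_altitude": 60_000}],
    posterior={60_000: 1.0},
)
```

Weighting by the posterior means a violation of a weakly supported constraint yields a smaller penalty than the same violation of a constraint the CL is confident about.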
Meta-Reasoning Executive (MRE)
In both the practice-learning phase and the performance phase, the system is required to solve a test problem using the learned knowledge. The MRE directs a collaborative performance process in which the ILRs do not directly share their learned knowledge but all contribute to solving the test problem. This collaborative process is modeled as a search for a path from the initial state to a goal state in which the problem is fully solved. The complete solution is a combination of the partial solutions contributed by each ILR along this path. The given test problem is decomposed into a set of sub-problems (conflicts). As the MRE posts these sub-problems on the blackboard, each ILR posts its solutions to some of them. These sub-problem solutions are the search operators available at the current state. The MRE then selects one of these solutions and applies it to the current state, generating a new state. New conflicts may appear after applying a sub-problem solution. The new state is then evaluated: if it is a goal state, the problem is fully solved; otherwise, the MRE posts all sub-problems (conflicts) existing in the new state, and the previous process is repeated. Figure 5 shows an example of a search tree. The yellow nodes represent problem states, and the blue/green nodes represent sub-problem solutions (sequences of ACM modifications) posted by the ILRs. The yellow and blue/green nodes alternate. When a sub-problem solution is selected to