
An Ensemble Learning and Problem Solving Architecture for Airspace Management

Xiaoqin (Shelley) Zhang, Sungwook Yoon, Phillip DiBona, Darren Scott Appling, Li Ding, Janardhan Rao Doppa, Derek Green, Jinhong K. Guo, Ugur Kuter, Geoff Levine, Reid L. MacTavish, Daniel McFarlane, James R. Michaelis, Hala Mostafa, Santiago Ontañón, Charles Parker, Jainarayan Radhakrishnan, Anton Rebguns, Bhavesh Shrestha, Zhexuan Song, Ethan B. Trewhitt, Huzaifa Zafar, Chongjie Zhang, Daniel Corkill, Gerald DeJong, Thomas G. Dietterich, Subbarao Kambhampati, Victor Lesser, Deborah L. McGuinness, Ashwin Ram, Diana Spears, Prasad Tadepalli, Elizabeth T. Whitaker, Weng-Keen Wong, James A. Hendler, Martin O. Hofmann, Kenneth Whitebread
Abstract
In this paper we describe the application of a novel
learning and problem solving architecture to the do-
main of airspace management, where multiple requests
for the use of airspace need to be reconciled and man-
aged automatically. The key feature of our General-
ized Integrated Learning Architecture (GILA) is a set
of integrated learning and reasoning (ILR) systems co-
ordinated by a central meta-reasoning executive (MRE).
Each ILR learns independently from the same training
example and contributes to problem-solving in concert
with other ILRs as directed by the MRE. Formal evalua-
tions show that our system performs as well as or better
than humans after learning from the same training data.
Further, GILA outperforms any individual ILR run in
isolation, thus demonstrating the power of the ensem-
ble architecture for learning and problem solving.
Introduction
Air-space management is a complex knowledge-intensive
activity that challenges even the best of human experts. Not
only do the experts need to keep track of myriad details
of different kinds of aircraft, their limitations and require-
ments, but they also need to find a safe, mission-sensitive,

Distribution Statement A (Approved for Public Release, Distribution Unlimited). This material is based upon work supported by DARPA through a contract with Lockheed Martin (prime contract #FA8650-06-C-7605). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, Lockheed Martin or the U.S. Government.

University of Massachusetts at Dartmouth

Arizona State University

Lockheed Martin Advanced Technology Laboratories

Georgia Tech Research Institute

Rensselaer Polytechnic Institute

Oregon State University

University of Wyoming

University of Maryland
University of Illinois at Urbana
Georgia Institute of Technology
University of Massachusetts, Amherst
Fujitsu Laboratories of America
Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
and cost-effective global schedule of flights for the day. An
expert systems approach to airspace management requires
painstaking knowledge engineering to build the system and
a team of human experts to maintain it when changes occur
to the fleet, possible missions, safety protocols, and costs
of schedule changes. An approach based on learning from
expert demonstrations is more attractive, especially if it can
be based on a small number of training examples since this
bypasses the knowledge engineering bottleneck (Buchanan
and Wilkins 1993).
In this paper we describe a novel learning and problem
solving architecture based on multiple integrated learning
and reasoning (ILR) systems coordinated by a centralized
controller called the meta-reasoning executive (MRE). Our
architecture is directly motivated by the need to learn rapidly
from a small number of expert demonstrations in a complex
problem solving domain. Learning from a small number of
examples is only possible if the learning system has a strong
bias or prior knowledge (Mitchell 1997). However, a strong
bias could lead to undesired solutions or no solution when
the bias is not appropriate for solving the problem at hand.
One way to guard against such inappropriate biases and to
hedge one's bets is to consider the solutions of learners with
different biases and combine them in an optimal fashion.
Since learning is highly impacted by the knowledge rep-
resentation, which in turn influences problem solving, we
tightly integrate learning and problem solving in each ILR
by allowing it to use its own internal knowledge representa-
tion scheme. The MRE is responsible for presenting train-
ing examples to the ILRs, posing problems to be solved, and
selecting and combining the proposed solutions in a novel
way to solve the overall problem. Thus the MRE is spared
from understanding the internal representations of the ILRs
and operates only in the space of subproblems and candidate
solutions. While there have been highly successful uses of
ensembles in classification learning, including bagging and
boosting (Polikar 2006), we are not aware of any work on
ensemble learning in the context of complex problem solv-
ing. Unlike classification ensembles that use simple voting
mechanisms to make the final decision, here the MRE needs
to select the best among the proposed sub-problem solutions
and assemble them into a final solution.
We developed a system called Generalized Integrated
Learning Architecture (GILA) that implements the above

Figure 1: An example of an ACO (left) and its deconflicted
solution (right). Yellow: existing airspaces that cannot be mod-
ified. Light Blue: airspace requests that can be modified. Ma-
genta/Red: conflicts.
framework and evaluated it in the domain of airspace man-
agement. Our system includes the MRE and four different ILRs
with diverse biases including the decision-theoretic learner-
reasoner (DTLR) that learns a linear cost function, symbolic
planner learner and reasoner (SPLR) that learns decision
rules and value functions, the case-based learner reasoner
(CBLR) that learns and stores a feature-based case database,
and the 4D-deconfliction and constraint learner (4DCLR)
that learns hard constraints on the schedules in the form of
rules. In rigorously evaluated comparisons, GILA was able
to outperform human novices who were provided with the
same background knowledge and the same training exam-
ples as GILA, and to do so in much less time. Our results show
that the quality of the solutions of the overall system is bet-
ter than that of the individual ILRs, all but one of which
were unable to fully solve the problems by themselves. This
demonstrates that the ensemble learning and problem solv-
ing architecture as instantiated by GILA is an effective ap-
proach to learning and managing complex problem-solving
in domains such as airspace management.
Domain Background
The domain of application for the GILA project is Airspace
Management in an Air Operations Center (AOC). Airspace
Management is the process of making changes to requested
airspaces so that they do not overlap with other requested
airspaces or previously approved airspaces. The problem
is: given a set of Airspace Control Means Requests (ACM-
Reqs), each representing an airspace requested by a pilot as
part of a given military mission, identify undesirable con-
flicts between airspace uses and suggest moves in latitude,
longitude, time, or altitude that will eliminate them. An
Airspace Control Order (ACO) is used to represent the en-
tire collection of airspaces to be used during a given 24-hour
period. Each airspace is defined by a polygon described by latitude and longitude points, an altitude range, and a time interval during which the air vehicle will be allowed to occupy the airspace.

Figure 2: GILA System Architecture (GILA-TEIF: Training and Evaluation Interface; GILA-AMS: Air Management System)

The process of deconfliction assures that the airspaces of any two vehicles do not overlap or conflict. Figure 1 shows an example of an original ACO (left) as well as its deconflicted solution (right). In order to solve the
airspace management problem, GILA must decide in what
order to address the conflicts during the problem-solving process and, for each conflict, which airspace to move and how to move the airspace so as to resolve the conflict and minimize the impact on the mission. The main problem in airspace deconfliction is that there are typically infinitely many ways to successfully deconflict. The goal of the system is to find deconflicted solutions that are qualitatively similar to those found by human experts. Since it is hard to specify what this means, we take the approach of learning from expert demonstrations, and evaluate GILA's results by comparing them to human solutions.
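To make the airspace representation just described concrete, here is a minimal sketch of how an airspace request and a coarse conflict test could be encoded (the field names and bounding-box test are illustrative assumptions, not GILA's actual data model):

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AirspaceRequest:
    """Hypothetical encoding of an ACMReq: a lat/lon polygon,
    an altitude block, and a usage time window."""
    name: str
    polygon: List[Tuple[float, float]]   # (latitude, longitude) vertices
    min_alt_ft: float
    max_alt_ft: float
    start_time: float                    # hours into the 24-hour ACO period
    end_time: float

def _intervals_overlap(lo1, hi1, lo2, hi2):
    return lo1 < hi2 and lo2 < hi1

def may_conflict(a: AirspaceRequest, b: AirspaceRequest) -> bool:
    """Coarse 4D conflict test: altitude and time overlap plus lat/lon
    bounding-box overlap (a real deconfliction system would intersect the
    actual polygons rather than their bounding boxes)."""
    if not _intervals_overlap(a.min_alt_ft, a.max_alt_ft, b.min_alt_ft, b.max_alt_ft):
        return False
    if not _intervals_overlap(a.start_time, a.end_time, b.start_time, b.end_time):
        return False
    a_lats = [p[0] for p in a.polygon]; a_lons = [p[1] for p in a.polygon]
    b_lats = [p[0] for p in b.polygon]; b_lons = [p[1] for p in b.polygon]
    return (_intervals_overlap(min(a_lats), max(a_lats), min(b_lats), max(b_lats))
            and _intervals_overlap(min(a_lons), max(a_lons), min(b_lons), max(b_lons)))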
Generalized Integrated Learning Architecture
GILA loosely integrates a number of tightly coupled learn-
ing and reasoning mechanisms called Integrated Learner-
Reasoners (ILRs). ILR interactions are controlled by a meta-
reasoning executive (MRE). Coordination is facilitated by
a blackboard. GILA is self-learning, i.e., it learns how to
best apply its reasoning and learning modules via the MRE.
Figure 2 shows GILA's architecture. Different components
communicate through a common ontology. We will discuss
the ontology, ILRs and MRE in detail in later sections.
As shown in Figure 3, the system process is divided into
three sub-phases: demonstration learning, practice learn-
ing and performance phases. In the demonstration learning
phase, a complete, machine-parsable trace of the expert's in-
teractions with a set of application services is captured and
is made available to the ILRs via the blackboard. Each ILR
uses shared world, domain, and ILR-specific knowledge to
expand its private models both in parallel during individual learning and in collaboration.

Figure 3: GILA System Process

During the practice learning
phase, GILA is given a practice problem (a set of airspaces
with conflicts) and its goal state (with all conflicts resolved),
but is not told how this goal state is achieved (the actual modi-
fications to those airspaces). The MRE then directs all ILRs
to collaboratively solve this practice problem and generate
a solution that is referred to as a pseudo expert trace. ILRs
can learn from this pseudo expert trace, thus indirectly shar-
ing their learned knowledge through practice. In the perfor-
mance phase, GILA solves a problem based on the knowl-
edge it has already learned. The MRE decomposes the prob-
lem into a set of sub-problems (each sub-problem is a con-
flict to resolve), which are then ordered using the learned
knowledge from ILRs. ILRs are requested to solve those
sub-problems one at a time according to the ordering. The
MRE evaluates the partial solutions to the sub-problems pro-
duced by different ILRs, and composes the best ones into a
complete final solution. The evaluation criteria are also based
on the knowledge learned by ILRs.
Ontology and Explanations
Background ontologies have been used to aid GILA's learn-
ing, problem-solving, and explanation processes. We de-
veloped a family of ontologies to support the goal of de-
conflicting requests in an Airspace Control Order (ACO).
GILA ontologies are partitioned into two inter-dependent
groups: (i) domain knowledge related to the ACO decon-
fliction problem, e.g., constraints, deconfliction plan steps,
problem state description, and (ii) general-purpose knowl-
edge of the world, e.g. explanation, problem-solving and
temporal-spatial concepts. By adopting GILA ontologies,
especially the explanation interlingua - Proof Markup Lan-
guage (PML) (McGuinness et al. 2007), MRE and ILRs
with drastically different implementations can communi-
cate.
GILA ontologies are also used to drive the collabora-
tion between ILRs and the MRE. Instead of using a peer-
to-peer communications model, ILRs are driven by knowl-
edge shared on a centralized blackboard. The ILRs use
the blackboard to obtain the current system state and po-
tential problems, guide action choice, and publish results.
This communication model can be abstracted as a collabo-
rative and monotonic knowledge base evolution process, run
by an open hybrid reasoning system. Besides supporting
runtime collaboration, GILA ontologies enable logging and
explanation of the learning and problem-solving processes.
While the initial implementation only used the blackboard
system to encode basic provenance, an updated prototype
that leverages the Inference Web (IW) (McGuinness and da
Silva 2004) explanation infrastructure and SPARQL uses the
blackboard system to log more comprehensive GILA com-
ponent behavior. Given the complexity of a large hybrid
system, explanations are critical to end users' acceptance of
the final GILA recommendations, and they can also be used
by the internal components when trying to decide how and
when to make choices.
Integrated Learning Reasoning systems (ILRs)
In this section, we describe how each ILR works inside the
GILA system and how their internal knowledge representa-
tion formats and learning/reasoning mechanisms are differ-
ent from each other.
Symbolic Planner Learner and Reasoner
Symbolic planner learner and reasoner (SPLR) represents
its learned solution strategy with hybrid hierarchical repre-
sentation machine (HHRM). There is anecdotal evidence to
suggest that a hybrid representation is consistent with expert
reasoning in the Airspace Deconfliction problem. Experts
choose the top level solution strategy by looking at the usage
of the airspaces. This type of activity is represented with di-
rect policy representation. When, subsequently, the experts
need to decide how much to change the altitude or time, they
tend to use some value function that they attempt to minimize.
SPLR builds on a method for learning and using such a hybrid
hierarchical representation machine (Yoon and Kambhampati 2007).
An abstract action is either a parameterized action literal such as $A(x_1, \ldots, x_n)$, or a minimizer of a real-valued cost function such as $A(\arg\min V(x_1, \ldots, x_n))$.
HHRM for Airspace Management Domain Since our
target domain is airspace management, we provide the set
of action hierarchies that represents the action space for
the airspace management problem. At the top of the hi-
erarchy, we define three relational actions: 1) altitude-
change (airspace), 2) geometrical-change (airspace), 3)
width-change (airspace). Each of the top-level actions has
a corresponding cost function to minimize. Each cost func-
tion is minimized with the following ground-level physical
actions: SetAltitude, SetPoint, and SetWidth.
Learning HHRM Learning of HHRM from the example
solution trajectory starts with partitioning the solution ac-
cording to the top level actions. Note that the example solu-
tion presented to SPLR has only physical level actions, e.g.,
SetAltitude(Airspace1, 10000ft). SPLR then partitions these
0. Learn-HHRM (Training-Sequence, Top-Level-Actions)
1. If Top-Level-Actions is null, then Return
2. Part ← Partition (Training-Sequence, Top-Level-Actions)
3. Learn-Top-Level-Actions (Part, Top-Level-Actions)
4. Learn-HHRM (Part, Lower-Level-Actions (Part))
Figure 4: Hybrid Hierarchical Representation Machine Learning.
sequences into top level action sequences, e.g. altitude-
change (Airspace1). Then for each partitioned space, SPLR
learns the top level actions. This process is recursively ap-
plied to each partition by setting the lower level actions as
top level actions and then partitioning the sequence of the
new top level actions. We conclude this section with the
HHRM learning algorithm in Figure 4. We used features
based on Taxonomic syntax (Yoon, Fern, and Givan 2002),
derived directly from the domain ontology and the demon-
stration examples, to learn direct policy representation ac-
tions. Taxonomic syntax has empirically been proven to
work very well with learning direct policies and, as the em-
pirical evaluation demonstrates, this was also the case for
SPLR. Linear regression was used to learn the value function
actions. Further details of both these phases can be found in
(Yoon and Kambhampati 2007).
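The recursion in Figure 4 can be sketched directly. The Python outline below assumes a hypothetical `learners` helper object that supplies the partitioning, model-fitting, and action-hierarchy routines; it is a sketch of the control flow, not SPLR's actual code.

def learn_hhrm(training_sequence, top_level_actions, learners):
    """Sketch of the Learn-HHRM recursion from Figure 4.
    `learners` supplies problem-specific routines for partitioning the trace,
    fitting a top-level action model (e.g., a taxonomic-syntax policy or a
    regression-based value function), and naming the lower-level actions
    inside a partition. The recursion bottoms out when no actions remain."""
    if not top_level_actions:
        return []

    # 1. Split the observed physical steps by the top-level action
    #    (altitude-change, geometrical-change, width-change) they realize.
    partitions = learners.partition(training_sequence, top_level_actions)

    models = []
    for action, segment in partitions:
        # 2. Fit a model that selects/parameterizes this top-level action.
        models.append((action, learners.fit_action_model(action, segment)))
        # 3. Recurse: the actions one level down become the new "top level".
        models.extend(
            learn_hhrm(segment, learners.lower_level_actions(segment), learners))
    return models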
Decision Theoretic Learner-Reasoner (DTLR)
The Decision Theoretic Learner-Reasoner (DTLR) learns a
cost function over possible solutions to problems. It is as-
sumed that the expert's solution optimizes the cost function
subject to some constraints. The goal of the learner is to
learn a close approximation of the expert's cost function.
Once the cost function is learned, the performance system
tries to find a solution that nearly minimizes the cost through
iterative-deepening search.
We will formalize the cost function learning in the framework of structured prediction (Taskar, Guestrin, and Koller 2004; Tsochantaridis et al. 2005). We define a structured prediction problem as a tuple $\langle \mathcal{X}, \mathcal{Y}, \Phi, \Delta \rangle$, where $\mathcal{X}$ is the input space and $\mathcal{Y}$ is the output space. In the case of ACO scheduling, an input drawn from $\mathcal{X}$ is a combination of an ACO and a set of ACMReqs to be scheduled. An output $y$ drawn from $\mathcal{Y}$ is a schedule of the deconflicted ACMs. The loss function, $\Delta : \mathcal{X} \times \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, quantifies the relative preference of two outputs given some input. Formally, for an input $x$ and two outputs $y$ and $y'$, $\Delta(x, y, y') > 0$ iff $y$ is a better choice than $y'$ given input $x$. Our goal, then, is to learn a function that selects the best output for every input. For each possible input $x_i \in \mathcal{X}$, we define output $y_i \in \mathcal{Y}$ to be the best output, so that $\forall y \in \mathcal{Y} \setminus \{y_i\} : \Delta(x_i, y_i, y) > 0$.
We are aided in the learning process by the joint feature function $\Phi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^n$. In the ACO scheduling domain, these features are the x-y coordinate change, altitude change, and time change for each ACM, and the changes in the number of intersections of the flight path with enemy territory. The decision function is the linear inner product $\langle \Phi(x, y), w \rangle$, where $w$ is the vector of learned model parameters. The specific goal of learning, then, is to find $w$ such that $\forall i : \arg\max_{y \in \mathcal{Y}} \langle \Phi(x_i, y), w \rangle = y_i$.
We now describe our Structured Gradient Boosting (SGB) algorithm (Parker, Fern, and Tadepalli 2006), which is a gradient descent approach to solving the structured prediction problem. Suppose that we have some training example $x_i \in \mathcal{X}$ with the correct output $y_i \in \mathcal{Y}$. We define $\hat{y}_i$ to be the best incorrect output for $x_i$ according to the current model parameters:

$\hat{y}_i = \arg\max_{y \in \mathcal{Y} \setminus \{y_i\}} \langle \Phi(x_i, y), w \rangle$    (1)

$m_i = \langle \Phi(x_i, y_i), w \rangle - \langle \Phi(x_i, \hat{y}_i), w \rangle$    (2)

Cumulative loss $L = \sum_{i=1}^{n} \log(1 + e^{-m_i})$    (3)
We define the margin $m_i$ to be the amount by which the model prefers $y_i$ to $\hat{y}_i$ as the output for $x_i$ (see Equation 2). The loss on each example is determined by the logitboost function of the margin and is accumulated over all training examples as in Equation 3. The parameters $w$ of the cost function are repeatedly adjusted in proportion to the negative gradient of the cumulative loss $L$ over the training data.
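For concreteness, the gradient used in that adjustment can be written out explicitly (a routine derivation from Equations 2 and 3, holding each $\hat{y}_i$ fixed at its current value; it is not stated in this form in the original text):

$\frac{\partial L}{\partial w} = -\sum_{i=1}^{n} \frac{e^{-m_i}}{1 + e^{-m_i}} \left( \Phi(x_i, y_i) - \Phi(x_i, \hat{y}_i) \right)$

Stepping along the negative gradient therefore moves $w$ toward the features of the correct output and away from those of the best incorrect output, with a weight that shrinks as the margin $m_i$ grows.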
The problem of finding $\hat{y}_i$, which is encountered during both learning and performance, is called inference. We define a discretized space of operators, namely altitude, time, radius, and x-y coordinates, based on the domain knowledge, to produce various possible plans to deconflict each ACM in the ACMReq. Since finding the best-scoring plan that resolves the conflict is too expensive, we approximate it by trying to find a single change that resolves the conflicts and consider multiple changes only when necessary.
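As a minimal illustration of one SGB step under these definitions, the sketch below performs the inference of Equation 1 over a supplied candidate set, computes the margin of Equation 2, and descends the per-example logistic loss of Equation 3. The names `phi` and `candidate_outputs` are assumed, illustrative interfaces rather than DTLR's actual API.

import numpy as np

def sgb_update(w, x, y_correct, candidate_outputs, phi, learning_rate=0.1):
    """One Structured Gradient Boosting step on a single training example.
    w: current weight vector; x: input (ACO plus ACMReqs to schedule);
    y_correct: the expert's schedule; candidate_outputs(x): iterable of
    alternative schedules from the discretized operator space;
    phi(x, y): joint feature vector as a numpy array."""
    candidates = [y for y in candidate_outputs(x) if y != y_correct]
    if not candidates:
        return w
    # Inference (Eq. 1): best-scoring incorrect output under the current model.
    y_hat = max(candidates, key=lambda y: float(np.dot(phi(x, y), w)))
    # Margin (Eq. 2) between the correct and the best incorrect output.
    diff = phi(x, y_correct) - phi(x, y_hat)
    margin = float(np.dot(diff, w))
    # Per-example gradient of log(1 + exp(-margin)) with respect to w.
    grad = -diff / (1.0 + np.exp(margin))
    # Descend: move w toward the correct output's features.
    return w - learning_rate * grad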
Case-Based Learner Reasoner (CBLR)
CBLR learns how to prioritize conflicts and deconflict
airspaces by using case-based reasoning (CBR). CBR
(Aamodt and Plaza 1994) consists of storing and indexing
a set of past experiences (called cases) and reusing them to
solve new problems, assuming that similar problems have
similar solutions. In GILA, CBLR uses the expert demon-
stration to learn cases. Unlike other learning techniques that
attempt to generalize at learning time, learning in CBR sys-
tems consists of representing and retaining cases in the case
libraries. All attempts at generalization are made during per-
formance by adapting the solutions of retrieved cases to the
problem at hand.
Learning in the CBLR CBLR learns knowledge to en-
able solving of both conflict prioritization and deconfliction
problems and, for that purpose, it uses five knowledge con-
tainers. It uses a prioritization library, which has a set of
cases for reasoning about the priority with which conflicts
should be solved, and a choice library, which contains cases
for reasoning about which airspace is to be moved in solving
a given conflict. It has a constellation library used to charac-
terize the neighborhood surrounding a conflict (i.e., its con-
text) and a deconfliction library containing the specific steps
that the expert used to solve the conflict. Finally, in order
to adapt the deconflicted solutions, CBLR uses adaptation
knowledge, stored as rules and constraints.
CBLR learns cases for the four case libraries by analyz-
ing the expert trace. The order in which the expert solves
conflicts provides the information required for prioritization
cases, and the expert's choice of which airspace to modify
for a particular conflict is the basis for choice cases. Con-
straints are learned by observing the range of values that the
expert uses for each variable (altitude, radius, width, etc.),
and by incorporating any additional constraints that other
components in GILA might learn.
Reasoning in the CBLR Given a set of conflicts, CBLR uses the prioritization library to assign to each conflict the priority of the most similar conflict in the library. The conflicts are then ranked according to these priorities to obtain a final ordering. Thus, the bias of the CBLR is that similar conflicts have similar priorities. Because CBLR follows the expert's approach so closely, evaluation shows that the priorities computed by the CBLR were of very high quality.
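The prioritization step is essentially nearest-neighbor retrieval over conflict descriptions. A minimal sketch (the case encoding and similarity function are hypothetical, not CBLR's actual case schema):

def prioritize_conflicts(conflicts, prioritization_library, similarity):
    """Rank conflicts using case-based priorities. prioritization_library is
    a list of (case_features, priority) pairs learned from the expert trace;
    similarity(conflict, case_features) returns a larger value for more
    similar conflicts. Each new conflict inherits the priority of its most
    similar stored case, and conflicts are ranked highest-priority first."""
    def retrieved_priority(conflict):
        _, priority = max(prioritization_library,
                          key=lambda case: similarity(conflict, case[0]))
        return priority
    return sorted(conflicts, key=retrieved_priority, reverse=True)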
Given a conflict, CBLR uses a hierarchical problem-solving process. First, the choice library is used to determine which of the two airspaces to move. Then, using the constellation library, CBLR determines which deconfliction strategy (e.g., shift in altitude) is best suited to resolve the conflict given the context. Finally, cases from the deconfliction library are retrieved and adapted using the adaptation knowledge. Adaptation may produce several candidate solutions for each case retrieved, and the validity of each candidate solution is evaluated with the help of the 4DCLR. CBLR attempts different cases until satisfactory deconflicted solutions are found. In addition to standard problem-solving tests, incremental learning tests show that CBLR uses only the approaches that were demonstrated by the expert, thus showing CBLR's success in learning the expert's approach.
4D Constraint Learner/Reasoner (4DCLR)
The 4D Constraint Learner/Reasoner (4DCLR) within
GILA is responsible for automated learning and application
of planning knowledge in the form of safety constraints.
Example constraints are: "The altitude of a UAV over the course of its trajectory should never exceed a maximum of 60,000 feet," and "An aircraft trajectory should never be moved so that it intersects a no-fly zone."
The 4DCLR (Rebguns et al. 2008) has two components:
(1) the Constraint Learner (CL), which automatically in-
fers constraints from expert demonstration traces, and (2)
the Safety Checker (SC), which is responsible for verifying
the correctness of solutions in terms of their satisfaction or
violation of the safety constraints learned by the CL. Why is
the SC needed if the ILRs already use the safety constraints
during problem solving? The reason is that the ILRs output
partial solutions to sub-problems. These partial solutions are
then combined by the MRE into one complete final solution,
and that is what needs to be checked by the SC.
Constraint Learner We assume that the system designer
provides constraint templates a priori; it is the job of the
CL to infer the values of parameters within these templates.
Learning in the CL is Bayesian. For each parameter, such as
the maximum flying altitude for a particular aircraft, the CL
begins with a prior probability distribution. If informed, the
prior might be a Gaussian approximation of the real distribu-
tion obtained by asking the expert for the average, variance,
and covariance of the minimum and maximum altitudes. If
uninformed, a uniform prior is used.
Learning proceeds based on evidence witnessed by the CL at each step of the demonstration trace. This evidence might be a change in maximum altitude that occurs as the expert positions and repositions an airspace to avoid a conflict. Based on this evidence, the prior is updated using Bayes' rule and the assumption that the expert always moves an airspace uniformly into a safe region. After observing evidence, the CL assigns zero probability to constraint parameters that are inconsistent with the expert's actions, and assigns the highest probability to the more constraining sets of parameters that are consistent with the expert's actions.
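A simplified sketch of this update for a single constraint parameter (illustrative only; the real CL works from constraint templates and a richer likelihood model):

def update_max_altitude_posterior(prior, observed_altitude_ft):
    """Bayesian update over a discretized maximum-altitude constraint.
    prior: dict mapping candidate ceiling values (ft) to probabilities.
    Candidate ceilings below an altitude the expert actually used are
    inconsistent with the evidence and receive zero posterior probability;
    the remaining candidates are renormalized. (Under the 'uniform move
    into the safe region' assumption, tighter consistent ceilings would
    additionally receive proportionally more mass.)"""
    posterior = {ceiling: (p if ceiling >= observed_altitude_ft else 0.0)
                 for ceiling, p in prior.items()}
    total = sum(posterior.values())
    if total == 0.0:
        raise ValueError("no candidate constraint is consistent with the evidence")
    return {ceiling: p / total for ceiling, p in posterior.items()}

# Example: uniform prior over five candidate ceilings; the expert places an
# airspace at 55,000 ft, ruling out the 40,000 and 50,000 ft candidates.
prior = {alt: 0.2 for alt in (40000, 50000, 60000, 70000, 80000)}
posterior = update_max_altitude_posterior(prior, observed_altitude_ft=55000)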
Safety Checker The SC inputs are candidate sub-problem
solutions from the ILRs, the current ACO on which to try
the candidate solutions, and the safety constraints output by
the CL; it outputs a violation message. The SC uses its 4D
Spatio-Temporal Reasoner to verify whether any constraint
is violated. A violation message is output by the SC that
includes the violated constraint, the solution that violated
the constraint, the nature of the violation, and the expected
degree (severity) of the violation, normalized to a value in
the range [0, 1].
The expected degree of violation is called the safety vi-
olation penalty. The SC calculates this penalty by finding
a normalized expectation of differences between expert ac-
tions (from the demonstration trace) and proposed actions
in the candidate solution. The differences are weighted by
their probabilities from the posterior distribution. The MRE
uses the violation penalty to discard sub-problem solutions
that are invalid because they have too many violations.
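A much-simplified sketch of that calculation for a single constrained quantity (hypothetical signature; the actual SC reasons over full 4D trajectories):

def violation_penalty(proposed_value, expert_value, posterior, max_difference):
    """Expected, normalized severity of a violation for one constrained value
    (e.g., altitude). posterior maps candidate constraint bounds to their
    posterior probabilities from the Constraint Learner. For every candidate
    bound the proposed action would exceed, the difference from the expert's
    demonstrated value is counted, weighted by that bound's probability, and
    the total is normalized into [0, 1] via a domain-supplied maximum
    difference."""
    expected = sum(prob * abs(proposed_value - expert_value)
                   for bound, prob in posterior.items()
                   if proposed_value > bound)
    return min(1.0, expected / max_difference)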
Meta-Reasoning Executive (MRE)
In both the practice learning phase and the performance
phase, the system is required to solve a test problem using
the learned knowledge. The MRE directs a collaborative
performance learning process where ILRs are not directly
sharing their learned knowledge but are all contributing to
solving the test problem. This collaborative performance
process is modeled as a search process, searching for a path from the initial state to the goal state where the problem is fully solved. The complete solution is a combination of the partial solutions contributed by each ILR found on this path. The given test problem is decomposed into a set of sub-problems (conflicts). As the MRE posts these sub-problems on the blackboard, each ILR posts its solutions to some of these sub-problems. These sub-problem solutions are the search operators available at the current state. The MRE then selects one of these solutions and applies it to the current state, generating a new state; new conflicts may appear after applying a sub-problem solution. The new state is then evaluated: if it is a goal state, the problem is fully solved; otherwise, the MRE posts all sub-problems (conflicts) existing in the new state and the previous process is repeated. Figure 5 shows an example of a search tree. The yellow nodes represent problem states and the blue/green nodes represent sub-problem solutions (sequences of ACM modifications) posted by the ILRs. The yellow and blue/green
nodes alternate. When a sub-problem solution is selected to be explored, its color is changed from blue to green. If a yellow node represents the problem state p, and its (green) child node represents a solution s, the green node's (yellow) child represents the result of applying s to p. The ordering of nodes to be explored depends on the search strategy. A best-first search strategy is used in this work. This search process is directed by the learned knowledge from the ILRs in the following two ways.

Figure 5: Search Tree Example
First, GILA learns to decide which sub-problems to work on initially. It is not efficient to have all ILRs provide solutions to all sub-problems, as it takes more time to generate those sub-problem solutions and also requires more effort to evaluate them. Since solving one sub-problem could make solving the remaining problems easier or more difficult, it is crucial to direct ILRs to work on sub-problems in a facilitating order. This ordering knowledge is learned by each ILR. At the beginning of the search process, the MRE asks the ILRs to provide a priority ordering of the sub-problems. Based on each ILR's input, the MRE generates a priority list that suggests which sub-problem to work on first. This suggestion is taken by all ILRs as guidance to generate solutions for sub-problems.
Secondly, GILA learns to evaluate the proposed sub-
problem solutions. Each sub-problem solution (green node
on the search tree) is evaluated using the learned knowledge
from ILRs in the following ways:
1. Check for safety violations in the sub-problem solution. Some sub-problem solutions may cause safety violations that make them invalid. The Safety Checker learns to check whether there is a safety violation and how severe the violation is, which is represented by a number called the violation penalty. If the violation penalty is greater than the safety threshold, the MRE discards this sub-problem solution; otherwise, the following evaluations are performed.
2. DTLR derives the execution cost for a sub-problem solution, which is expected to approximate the expert's cost function. However, this estimate is not exact, due to various assumptions in its learning such as discretization of the action space and inexact inference.
3. Another ILR, the 4DCLR, performs an internal simulation to investigate the result of applying a sub-problem solution to the current state. The resulting new problem state is evaluated, and the number of remaining conflicts is returned to the MRE as an estimate of how far the state is from the goal state, in which no conflicts remain.
If a sub-problem solution does not solve any conflict at all, it is discarded; otherwise, it is evaluated based on the following factors: the cost of executing all sub-problem solutions selected on this path, the safety violation penalties that would be present if the path were executed, and the estimated execution cost and violation penalties of resolving the remaining conflicts. These factors are combined us-
ing a linear function with a set of weight parameters. The
values of these weight parameters can be varied to generate
different search behaviors.
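For illustration, the node evaluation just described might look like the following sketch (the field and weight names are made up; the actual weights are GILA configuration parameters):

def node_score(node, weights):
    """Linear evaluation of a search node: accumulated cost and safety
    penalties of the sub-problem solutions chosen so far, plus estimates
    for resolving the conflicts that remain. Lower is better."""
    return (weights["cost"] * node.execution_cost_so_far
            + weights["violation"] * node.violation_penalty_so_far
            + weights["est_cost"] * node.estimated_remaining_cost
            + weights["est_violation"] * node.estimated_remaining_penalty)

def select_next_node(frontier, weights):
    """Best-first search: expand the open node with the lowest score."""
    return min(frontier, key=lambda node: node_score(node, weights))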
Experimental Setup and Results
The GILA system consists of a set of distributed, loosely-
coupled components, interacting via a blackboard. Each
component is a standalone software module that interacts
with the GILA System using a standard set of domain-
independent APIs (e.g., interfaces). The distributed nature
of the design allows components to operate in parallel, max-
imizing efficiency and scalability. Figure 6 shows the design of the GILA software architecture.
Test Cases and Evaluation Criteria
The testing cases for experiments were developed by Sub-
ject Matter Experts (SMEs) from BlueForce LLC. The ex-
periment results were graded by the SME. In each test-
ing case, there are 24 Airspace Control Measures (ACMs).
There are 14 conflicts among these ACMs as well as exist-
ing airspaces.
its solution. Each testing scenario consists of three test cases
for demonstration, practice and performance respectively.
The core task is to remove conflicts between ACMs and to configure ACMs such that they do not violate constraints on time, altitude, or geometry of airspace. The quality of each step (action) inside the solution is judged according to the following factors:
Whether this action solves a conflict
Whether this is the requested action
The simplicity of the action
Proximity to the problem area
Suitability for operational context
Originality of the action
Figure 6: GILA Software Architecture
Each factor is graded on a 0-5 scale. The score for each
step is the average of all six factors. The final score for a solution is the average of the step scores, multiplied by 20 to put it on a 0-100 scale.
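In code form, the grading scheme amounts to the following (the example grades are made up):

def solution_score(step_factor_grades):
    """step_factor_grades: one list of six 0-5 factor grades per step.
    A step's score is the mean of its six factor grades; the solution's
    score is the mean of the step scores, scaled by 20 onto 0-100."""
    step_scores = [sum(grades) / len(grades) for grades in step_factor_grades]
    return 20 * sum(step_scores) / len(step_scores)

# Example: a two-step solution; both steps average 4.67, giving about 93.3.
print(solution_score([[5, 5, 4, 5, 4, 5],
                      [4, 5, 5, 4, 5, 5]]))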
GILA vs. Human Novice Performance Comparison
Comparative evaluation of the GILA system is difficult because we have not found a similar man-made system that can learn from a few demonstrations to solve complicated problems. Hence we chose human novices as our baseline, which is both challenging and exciting.
To compare the performance of GILA with novices, we
grouped 33 people into six groups. Each group was given
a demonstration case, a practice case and a testing case to
perform on. The 33 people were engineers from Lockheed
Martin with no prior airspace management experience. We
used three testing cases in six combinations for demonstra-
tion, practice and performance. We started with an introduc-
tion of the background knowledge. Each of the participants
was given a written document that lists all the knowledge
GILA had before learning. They also received GUI training
on how to use the graphical interface designed to make hu-
man testing fair in comparison with GILA testing. After the
training, each participant was handed a questionnaire to val-
idate that they had gained the basic knowledge to carry out
the test. The participants were then given a video demonstra-
tion of the expert traces on how to deconict the airspaces.
Based on their observation, they practiced on the practice
case, which only had the beginning and the ending states of
the airspaces without the detailed actions to deconict them.
Finally, the participants were given a testing case to work
on. The test ended with an exit questionnaire.
Table 1: Solution Quality: GILA vs. Human Novices, and Solution Contribution by ILR
Scenario Human Novices GILA SPLR CBLR DTLR
EFG 93.84 95.40 75% 25% 0
EGF 91.97 96.00 60% 30% 10%
FEG 92.03 95.00 64% 36% 0
FGE 91.44 95.80 75% 17% 8%
GEF 87.40 95.40 75% 25% 0
GFE 86.3 95.00 75% 17% 8%
Average 90.5 95.4 70% 24% 6%

Table 1 shows the scores achieved by GILA and human novices. The score for human novices shown in the table is the average score of all human novices in a group working on the same testing scenario. The score of a solution represents the quality of the solution, which is evaluated by SMEs based on the six factors described earlier. The
maximum possible score for one solution is 100. For ex-
ample, the first row in Table 1 shows that for experiment scenario EFG (using test case E for demonstration, F for practice, and G for performance), the average score for human novices is 93.84 while the score of GILA's solution is 95.40. Based on the average over all six experiments, GILA achieved 105% of the human novices' performance. The trimmed mean score of the human novices (ignoring the two highest and two lowest scores) is 91.24. The hypothesis that GILA has achieved 100% of human novice performance (measured by the trimmed mean score) is supported with 99.98% confidence using a t-test. Here are some gen-
eral observations of how human novices perform differently
from GILA in solving an airspace management problem.
1. GILA sometimes gives uneven solutions, for example 35001 ft
instead of 35000 ft. Novices can infer from the expert trace that
35000 is the convention. It seems that the human reasoning process uses a piece of common knowledge that is missing from GILA's
knowledge base.
2. Overall, novices lacked the ability to manage more than one
piece of information. As the complexity of the conflicts increased, they started to forget factors (which ACM, which method to change, etc.) that needed to be taken into account. GILA demonstrated a clearly higher level of information-management ability, working with multiple conflicts at the same time.
Table 2: Comparison of GILA and Single ILRs for Conflict Resolution (Test Scenario: EFG)
GILA SPLR DTLR CBLR
Conflicts Solved 14 14 5 7
Quality Score 95.40 81.2 N/A N/A
The last three columns of Table 1 show the percentage of the final solution output by GILA that was contributed by each ILR. Note that 4DCLR is not in this list since it does not propose conflict resolutions but only checks safety and constraint violations. On average, SPLR clearly dominates the performance by contributing 70% of the final solution, followed by CBLR, which contributes 24%, and finally DTLR, which contributes 6%. One reason SPLR's performance is so good is that its rule language, which is based on taxonomic syntax, is very natural and appropriate for capturing the kind of rules that people seem to be using. Second, its lower-level value function captures nuanced differences between different parameter values for the ACM modification operators. Third, it does a more exhaustive search during the performance phase than the other ILRs to find the best possible ACM modifications. CBLR does well when its training cases are similar to the test cases, and otherwise does poorly. DTLR suffers from its approximate search and coarse discretization of the search space. It sometimes completely fails to find a solution that passes muster with the constraint learner, although such solutions do exist in the finer search space.
GILA vs. Single ILR for Solving Conflicts
To test the importance of the collaboration among vari-
ous ILRs, we performed additional experiments by running
GILA with only one ILR for solving conicts. However,
in all these experiments, DTLR was still used for providing
cost information for MRE, and 4DCLR was used for con-
straint and safety checking. Table 2 shows that DTLR is able
to solve five conflicts out of 14 total conflicts, while CBLR is able to solve seven of them. Though SPLR is able to solve all 14 conflicts, the quality score of its solution (81.2) is significantly lower than the score achieved by GILA as a whole (95.4). The lower score for the SPLR-only solution is caused by some large altitude and time changes, including moving the altitude above 66,000. Though there are multiple alternatives for resolving a conflict, usually an action that minimizes change is preferred over those with larger changes. Such large-change actions were not in the solution produced using all ILRs because other ILRs proposed alternative actions, which were preferred and chosen by the MRE. Even when DTLR's cost function is good, it is unable to solve some conflicts because of its incomplete search. CBLR fails to solve conflicts if its case base does not contain similar conflicts. These experiments verify that the collaboration of
multiple ILRs is indeed important to solve problems with
high-quality solutions.
Conclusions
In this paper we presented an ensemble architecture for
learning to solve the airspace management problem. Mul-
tiple components, each using a different learning/reasoning
mechanism and internal knowledge representations, learn
independently from the same expert demonstration trace.
The meta-reasoning executive component directs a collab-
orative performance process, in which it posts sub-problems
and selects partial solutions from ILRs to explore. During
this process, each ILR contributes to the problem-solving
without explicitly transferring its learned knowledge. Ex-
perimental results show that GILA matches or exceeds the
performance of human novices after learning from the same
expert demonstration. This system can be used to provide
advice for human operators to deconflict airspace requests.
It is also straightforward to use this architecture for a new
problem domain since the MRE and the core learning algo-
rithms inside ILRs are domain-independent, and also there
is no need to maintain a common knowledge representa-
tion. This ensemble learning and problem-solving architec-
ture opens a new path to learning to solve complex problems
from very few examples.
References
Aamodt, A., and Plaza, E. 1994. Case-based reasoning: Foundational issues, methodological variations, and system approaches. Artificial Intelligence Communications 7(1):39-59.
Buchanan, B. G., and Wilkins, D. C. 1993. Readings in Knowledge Acquisition and Learning: Automating the Construction and Improvement of Expert Systems. San Mateo, CA: Morgan Kaufmann.
McGuinness, D. L., and da Silva, P. P. 2004. Explaining answers from the semantic web: the inference web approach. J. Web Sem. 1(4):397-413.
McGuinness, D. L.; Ding, L.; da Silva, P. P.; and Chang, C. 2007. PML 2: A modular explanation interlingua. In Proceedings of the AAAI 2007 Workshop on Explanation-Aware Computing.
Mitchell, T. M. 1997. Machine Learning. New York, NY: McGraw-Hill.
Parker, C.; Fern, A.; and Tadepalli, P. 2006. Gradient boosting for sequence alignment. In Proceedings of the 21st National Conference on Artificial Intelligence. AAAI Press.
Polikar, R. 2006. Ensemble based systems in decision making. IEEE Circuits and Systems Magazine 6(3):21-45.
Rebguns, A.; Green, D.; Levine, G.; Kuter, U.; and Spears, D. 2008. Inferring and applying safety constraints to guide an ensemble of planners for airspace deconfliction. In CP/ICAPS COPLAS'08 Workshop on Constraint Satisfaction Techniques for Planning and Scheduling Problems.
Taskar, B.; Guestrin, C.; and Koller, D. 2004. Max-margin Markov networks. In NIPS.
Tsochantaridis, I.; Joachims, T.; Hofmann, T.; and Altun, Y. 2005. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research 6:1453-1484.
Yoon, S., and Kambhampati, S. 2007. Hierarchical strategy learning with hybrid representations. In AAAI 2007 Workshop on Acquiring Planning Knowledge via Demonstrations.
Yoon, S.; Fern, A.; and Givan, R. 2002. Inductive policy selection for first-order MDPs. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence.
