Managing Editor
Dov M. Gabbay, Department of Computer Science, King's College, London, U.K.
Co-Editor
John Barwise, Department of Philosophy, Indiana University, Bloomington, IN, U.S.A.
Editorial Assistant
Jane Spurr, Department of Computer Science, King's College, London, U.K.
The titles published in this series are listed at the end of this volume.
Abduction and
Induction
Essays on their Relation and Integration
edited by
PETER A. FLACH
University of Bristol
and
ANTONIS C. KAKAS
University of Cyprus
Foreword ix
Preface xiii
Contributing Authors xv
3
Abduction as epistemic change: a Peircean model in Artificial Intelligence 45
Atocha Aliseda
3.1 Introduction 45
3.2 Abduction and induction 46
3.3 The notion of abduction 47
3.4 Epistemic change 51
3.5 Abduction as epistemic change 54
3.6 Discussion and conclusions 57
4
Abduction: between conceptual richness and computational complexity 59
Stathis Psillos
4.1 Introduction 59
4.2 Ampliative reasoning 60
4.3 Explanatory reasoning: induction and hypothesis 62
4.4 Abduction 64
4.5 Abduction and computation 69
4.6 Conclusions 73
6
On the logic of hypothesis generation 89
Peter A. Flach
6.1 Introduction 89
6.2 Logical preliminaries 90
6.3 Explanatory reasoning 93
6.4 Confirmatory reasoning 97
6.5 Discussion 105
7
Abduction and induction from a non-monotonic reasoning perspective 107
Nicolas Lachiche
7.1 Introduction 107
7.2 Definitions 108
7.3 Abduction and explanatory induction 111
7.4 Abduction and descriptive induction 111
7.5 Discussion 114
7.6 Conclusion 115
8
Unified inference in extended syllogism 117
Pei Wang
8.1 Term logic and predicate logic 117
8.2 Extended syllogism in NARS 119
8.3 An example 123
8.4 Discussion 125
10
Learning, Bayesian probability, graphical models, and abduction 153
David Poole
10.1 Introduction 153
10.2 Bayesian probability 156
10.3 Bayesian networks 161
10.4 Bayesian learning and logic-based abduction 162
10.5 Combining induction and abduction 166
10.6 Conclusion 168
11
On the relation between abductive and inductive hypotheses 169
Akinori Abe
11.1 Introduction 169
11.2 The relation between abduction and induction 170
11.3 About the integration of abduction and induction 176
11.4 Conclusion 179
12
Integrating abduction and induction in Machine Learning 181
Raymond J. Mooney
12.1 Introduction 181
12.2 Abduction and induction 182
12.3 Abduction in theory refinement 183
12.4 Induction of abductive knowledge bases 188
12.5 Conclusions 191
14
Learning abductive and nonmonotonic logic programs 213
Katsumi Inoue and Hiromasa Haneda
14.1 Introduction 213
14.2 Learning nonmonotonic logic programs 216
14.3 Learning abductive logic programs 223
14.4 Related work 228
14.5 Conclusion 229
Appendix: Proof of Theorem 14.2 229
15
Cooperation of abduction and induction in Logic Programming 233
Evelina Lamma, Paola Mello, Fabrizio Riguzzi, Floriana Esposito,
Stefano Ferilli, and Giovanni Semeraro
15.1 Introduction 233
15.2 Abductive and Inductive Logic Programming 234
15.3 An algorithm for learning abductive logic programs 239
15.4 Examples 241
15.5 Integration of abduction and induction 248
15.6 Conclusions and future work 250
Appendix: Abductive proof procedure 250
16
Abductive generalization and specialization 253
Chiaki Sakama
16.1 Introduction 253
16.2 Preliminaries 254
16.3 Generalizing knowledge bases through abduction 256
16.4 Specializing knowledge bases through abduction 260
16.5 Related work 264
16.6 Concluding remarks 264
17
Using abduction for induction based on bottom generalization 267
Akihiro Yamamoto
17.1 Introduction 267
17.2 From abduction to induction 269
17.3 SOLD-resolution 270
17.4 Finding definite clauses 273
17.5 Finding unit programs 278
17.6 Concluding remarks 279
Bibliography 281
Index 301
Foreword
Reasoning in reverse
Logic is the systematic study of cogent reasoning. The central process of reasoning studied by modern logicians is the accumulative deduction, usually explained semantically, as taking us from truths to further truths. But actually, this emphasis is the result
of a historical contraction of the agenda for the field. Up to the 1930s, many logic
textbooks still treated deduction, induction, confirmation, and various further forms of
reasoning in a broader sense as part of the logical core curriculum. And moving back
to the 19th century, authors like Mill or Peirce included various non-deductive modes
of reasoning (induction, abduction) on a par with material that we would recognize at
once as 'modern' concerns. Since these non-deductive styles of reasoning seemed irrelevant to foundational research in mathematics, they moved out quietly in the Golden
Age of mathematical logic. But they do remain central to a logical understanding of
ordinary human cognition. These days, this older broader agenda is coming back to
life, mostly under the influence of Artificial Intelligence, but now pursued by more
sophisticated techniques - made available, incidentally, by advances in mathematical
logic ...
The present volume is devoted to two major varieties of non-deductive inference,
namely abduction and induction, identified as logical 'twins' by C.S. Peirce, but dis-
covered independently under many different names. Roughly speaking, abduction is
about finding explanations for observed facts, viewed as missing premises in an ar-
gument from available background knowledge deriving those facts. Equally roughly
speaking, induction is about finding general rules covering a large number of given
observations. Both these phenomena have been studied by philosophers of science
since the 1950s, such as Carnap (the pioneer of inductive logic) and Hempel (whose 'deductive-nomological' model of explanation has unmistakable abductive features). Another major contribution was made by Popper, who stressed falsification: valid deduction carries truth forward from premises to conclusions, but if good news travels in this forward direction, bad news travels in the opposite. Valid consequences also take us from false
conclusions to falsity of at least one of the premises, allowing us to learn by revision
- even though we may have some latitude in where to assign the blame. Thus, rea-
soning is also tied up with scientific theory change, and more generally, flux of our
commonsense opinions. What the present volume shows is how these concerns are
converging with those of logicians and computer scientists, into a broader picture of
what reasoning is all about.
A pervasive initial problem in this area, even if just an irritant, is terminology. Some
people feel that 'abduction' and 'induction', once baptised, must be real phenomena.
But they might be mere terms of art, still looking for a substantial denotation ... Indeed,
it is not easy to give a crystal-clear definition for them, either independently or in their
inter-relationship. (Of course, this is not easy for 'deduction' either.) Fortunately, the
editors do an excellent job in their introductory chapter of clearing up a number of
confusions, and relating abduction and induction in a productive manner. No need to
repeat that. Instead, let me highlight how the subject of this book intertwines many
general features of reasoning that need to be understood - and that somehow man-
age to escape from the usual logical agenda. For this purpose, we must make some
distinctions.
Every type of reasoning revolves around some underlying connection, giving us a
link with a certain 'quality' between input (data) and output (conclusions). The classical schema for this connection is the binary format P, Q, ... ⊨ C. But this format leaves
many 'degrees of freedom', that are essential to reasoning. First, the strength of the
connection may vary. With Tarski it says "all models of the premises P, Q, ... are also models for the conclusion C". But there are respectable alternatives which ask less:
replacing "all" by "almost all" (as happens in some probabilistic reasoning), or by "all
most preferred" (as in non-monotonic logics in AI). This variety of 'styles of reason-
ing' can be traced back to the pioneering work of Bolzano in the early 19th century,
including an awareness - widespread now, but quite novel then - that these different styles differ, not only in the individual inferences they sanction, but also in their general structural rules, such as Monotonicity or Transitivity. Second, varieties of logical
consequence multiply by the existence of very different viewpoints on the connection.
All variations so far were semantic, in terms of models, truth, and preference. But
we can also analyse cogent reasoning in a proof-theoretic manner (consequence as
derivability), giving us options between classical, or intuitionistic, or linear logic - or
a game-theoretic one (where valid consequence is the existence of a winning strategy
for a proponent in debate), giving us yet further logical systems.
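These three strengths of connection can be made concrete in a few lines of code. The sketch below is our illustration, not the author's: the atoms, the toy set of models, and the 0.6 'almost all' threshold are all invented for the example.

```python
# Illustrative miniature of the 'styles of reasoning' just described:
# the same premises/conclusion pair judged under three notions of
# consequence, over an explicitly enumerated set of models.
# Each model is the set of atoms true in it; all names are invented.

models_of_premises = [
    {"bird", "flies"},            # typical birds
    {"bird", "flies", "small"},
    {"bird", "penguin"},          # an exceptional bird
]

def classical(models, atom):
    """Tarski: the conclusion holds in ALL models of the premises."""
    return all(atom in m for m in models)

def almost_all(models, atom, threshold=0.6):
    """Probabilistic flavour: the conclusion holds in 'almost all' models."""
    return sum(atom in m for m in models) / len(models) >= threshold

def preferential(models, atom, preferred):
    """Non-monotonic flavour: the conclusion holds in all MOST PREFERRED models."""
    return all(atom in m for m in models if preferred(m))

print(classical(models_of_premises, "flies"))    # False: the penguin model blocks it
print(almost_all(models_of_premises, "flies"))   # True: holds in 2/3 of the models
print(preferential(models_of_premises, "flies",
                   preferred=lambda m: "penguin" not in m))  # True
```

The same conclusion is thus rejected, accepted, or accepted-by-default depending on which style of consequence one adopts.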
Another key dimension to reasoning is direction. Standard logical inference moves
forward, from given premises to new conclusions. But abduction moves backwards,
looking for premises that imply a given conclusion. The backwards direction is often
less deterministic, as we can choose from a vast background reservoir of knowledge,
prejudices, hypotheses, etc. Indeed, to put backwards reasoning in proper perspective,
we need richer formats of inference. An example is Toulmin's schema from the 1950s,
where claims follow from data via a 'warrant', where data are backed up by evidence,
and warrants by background theory. Thus, we are led naturally to a study of theory
structure. The latter is prominent in the philosophy of science, where assertions may
fall into laws, facts, and auxiliary hypotheses. This structure seems essential to logical
analysis of reasoning. Thus, abduction looks for facts, while induction searches for
regularities. Indeed, would the same distinctions make sense in the forward direction?
Structured theories fix different roles for assertions. A more radical view would
be that such roles are not fixed globally, but just represent a focus in the current context.

Preface
From the very beginning of investigation of human reasoning, philosophers had identified - along with deduction - two other forms of reasoning which we now call abduction and induction. Whereas deduction has been widely studied over the years and is
now fairly well understood, these two other forms of reasoning have, until now, eluded
a similar level of understanding. Their study has concentrated more on the role they
play in the evolution of knowledge and the development of scientific theories.
In an attempt to increase our understanding of these two forms of non-deductive
reasoning, this book presents a collection of works addressing the issues of the re-
lation between abduction and induction, as well as their possible integration. These
issues are approached sometimes from a philosophical perspective, sometimes from a
(purely) logical perspective, but also from the more task-oriented perspective of Arti-
ficial Intelligence. To a certain extent, the emphasis lies with the last area of Artificial
Intelligence, where abduction and induction have been more intensively studied in
recent years.
This book grew out of a series of workshops on this topic. The first of these took
place at the Twelfth European Conference on Artificial Intelligence (Budapest, August
1996), and concentrated on the general philosophical issues pertaining to the unifica-
tion or distinction between abduction and induction. The second workshop took place
at the Fifteenth International Joint Conference on Artificial Intelligence (Nagoya, August 1997), with an emphasis on the more practical issues of integration of abduction and induction. Taking place in parallel with the preparation of this book, a third
workshop was held at the Thirteenth European Conference on Artificial Intelligence
(Brighton, August 1998). Detailed reports on the first two workshops have been published as (Flach and Kakas, 1997a; Flach and Kakas, 1998); these reports, as well as further information about the workshops (including submitted papers), are available on-line at http://www.cs.bris.ac.uk/~flach/abdind/.
After the first two workshops, we invited the participants to submit a longer paper
based on their workshop contribution(s), suitable for publication in an edited volume.
Following a careful reviewing process, thirteen of the submitted papers were selected
for publication. In addition, we invited four well-known authors to contribute a paper:
John Josephson, Luca Console, Lorenza Saitta, and David Poole.
xiii
xiv PREFACE
Following a general introduction into the subject, the book is structured into four
main parts. The first two parts take a more theoretical perspective, while the remaining
two parts address the more practical issue of integrating abduction and induction. Part
1 contains three papers addressing philosophical aspects of abduction and induction.
In Part 2, four papers investigate the logical relation of the two forms of reasoning.
The four papers in Part 3 deal with integration of the two forms of reasoning from the
perspective of Artificial Intelligence, while the five papers that can be found in Part
4 address this problem within the more particular framework of Logic Programming.
The book starts off with an introductory chapter aimed at helping the reader in two
ways. It provides background material on the general subject of the book and exposes
the main issues involved. At the same time it positions the other contributions in the
book within the general terrain of debate.
The present book is one of the first books to address explicitly the problem of under-
standing the relation and interaction between abduction and induction in the various
fields of study where these two forms of reasoning appear. As such, it should be rele-
vant to a variety of students and researchers from these different areas of study, such
as philosophers, logicians, and people working in Artificial Intelligence and Computer
Science more generally.
Acknowledgments
We would like to thank all persons who helped in one way or another with the prepa-
ration of this book.
These are all those involved in the organisation of the three workshops on the sub-
ject at ECAI'96, IJCAI'97 and ECAI'98, where much of the groundwork for this book
has been done. In particular, we would like to thank the other members of the organis-
ing committees of these workshops: Henning Christiansen, Luca Console, Marc De-
necker, Luc De Raedt, Randy Goebel, Katsumi Inoue, John Josephson, Ray Mooney
and Chiaki Sakama. A special thanks goes to the three invited speakers at these work-
shops, John Josephson, David Poole and Murray Shanahan. And of course we are
grateful to all the participants, for the pleasant atmosphere and lively discussions dur-
ing the workshops. Finally, we would like to thank the two European networks of
excellence, Compulog-Net and ML-Net, for their financial support in organising these
workshops.
We thank everybody who submitted a paper to this book; those who helped review-
ing the submissions; the invited authors for their marvellous contributions; and Johan
van Benthem for his beautiful and thought-provoking foreword.
Part of this work falls under the workplan of the ESPRIT project ILP2: Inductive
Logic Programming. We wish to thank the other partners of the project for their help
and valuable discussions on the subject of the book. We also thank the Universities of
Cyprus, Tilburg and Bristol for providing the opportunities to prepare this book.
Special thanks go to Kim and Nada for their patient understanding and support with
all the rest of life's necessities, thus allowing us the selfish pleasure of concentrating
on research and other academic matters such as putting this book together.
Akinori Abe (abe@cslab.kecl.ntt.co.jp) is a Senior Research Scientist at NTT Communication Science Laboratories. He obtained his Doctor of Engineering degree from the University of Tokyo in 1991, with a thesis entitled A Fast Hypothetical
Reasoning System using Analogical Case. His main research interests are abduction
(hypothetical reasoning), analogical reasoning and language sense processing. He is a
member of the Planning Committee of New Generation Computing.
Stefano Ferilli (ferilli@di.uniba.it) is currently a PhD student in Computer
Science at DIB, University of Bari. He graduated in Computer Science at University
of Bari in 1996. His research interests include logic programming, machine learning
and theory revision.
John R. Josephson (jj@cis.ohio-state.edu) is a Research Scientist and the
Associate Director of the Laboratory for AI Research (LAIR) in the Department of
Computer and Information Science at the Ohio State University. He received his Ph.D.
in Philosophy (of science) from Ohio State in 1982; he also holds B.S. and M.S. de-
grees in Mathematics from Ohio State. His primary research interests are artificial
intelligence, knowledge-based systems, abductive inference, causal reasoning, theory
formation, perception, diagnosis, the logic of investigation, and the foundations of
science. He has worked in several application domains including: medical diagno-
sis, diagnosis of engineered systems, logistics planning, speech recognition, genetics,
molecular biology, and design of electro-mechanical systems. He is the co-editor,
with Susan G. Josephson, of Abductive Inference (Cambridge University Press, 1994,
1996). His homepage is at http://www.cis.ohio-state.edu/~jj/.
David Poole (poole@cs.ubc.ca) is a Professor of Computer Science at the Uni-
versity of British Columbia. He received his Ph.D. from the Australian National Uni-
versity in 1984. He is known for his work on knowledge representation, default rea-
soning, assumption-based reasoning, diagnosis, reasoning under uncertainty, and au-
tomated decision making. He is a co-author of a recent AI textbook, Computational
Intelligence: A Logical Perspective (Oxford University Press, 1998), co-editor of the
Proceedings of the Tenth Conference in Uncertainty in Artificial Intelligence (Morgan
Kaufmann, 1994), serves on the editorial board of the Journal of AI research, and
is a principal investigator in the Institute for Robotics and Intelligent Systems. His
homepage is at http://www.cs.ubc.ca/spider/poole/.
Lorenza Saitta (saitta@di.unito.it) is a Professor of Computer Science at the Università del Piemonte Orientale, Alessandria, Italy. Her main research interests
are in Machine Learning, specifically learning relations, multistrategy learning, and
complexity issues. Recently, she became also interested in Genetic Algorithms and
Cognitive Sciences. She has been the Chairperson of the International Conference on
Machine Learning in 1996.
1.1 INTRODUCTION
This collection is devoted to the analysis and application of abductive and inductive
reasoning in a common context, studying their relation and possible ways for integra-
tion. There are several reasons for doing so. One reason is practical, and based on the
expectation that abduction and induction are sufficiently similar to allow for a tight
integration in practical systems, yet sufficiently complementary for this integration to
be useful and productive.
Our interest in combining abduction and induction is not purely practical, however.
Conceptually, the relation between abduction and induction is not well understood.
More precisely, there are several, mutually incompatible ways to perceive this relation.
For instance, Josephson writes that 'it is possible to treat every good (...) inductive generalisation as an instance of abduction' (Josephson, 1994, p. 19), while Michalski
has it that 'inductive inference was defined as a process of generating descriptions that
imply original facts in the context of background knowledge. Such a general definition
includes inductive generalisation and abduction as special cases' (Michalski, 1987,
p.188).
One can argue that such incompatible viewpoints indicate that abduction and induc-
tion themselves are not well-defined. Once their definitions have been fixed, studying
their relation becomes a technical rather than a conceptual matter. However, it is not
self-evident why there should exist absolute, Platonic ideals of abduction and induction, waiting to be discovered and captured once and for all by an appropriate definition. As with most theoretical notions, it is more a matter of pragmatics, of how useful a particular definition is going to be in a particular context.

P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 1-27. © 2000 Kluwer Academic Publishers.
A more relativistic viewpoint is often more productive in these matters, looking at
situations where it might be more appropriate to distinguish between abduction and
induction, and also at cases where it seems more useful to unify them. Sometimes
we want to stress that abduction and induction spring from a common root (say hypo-
thetical or non-deductive reasoning), and sometimes we want to take a finer grained
perspective by looking at what distinguishes them (e.g. the way in which the hypothe-
sis extends our knowledge). The following questions will therefore be our guidelines:
• When and how will it be useful to unify, or distinguish, abduction and induction?
This establishes a dichotomy of the set of non-fallacious arguments into either deduc-
tive or inductive arguments, the distinction being based on the way they are supported
or justified: while deductive support is an absolute notion, inductive support must be
expressed in relative (e.g. quantitative) terms.
Salmon further classifies inductive arguments into arguments based on samples,
arguments from analogy, and statistical syllogisms. Arguments based on samples or
inductive generalisations have the following general form:
X percent of observed Fs are Gs;
therefore, (approximately) X percent of all Fs are Gs.
Peirce's syllogistic theory. In Peirce's days logic was not nearly as well-developed as it is today, and his first attempt to classify arguments (which he considers 'the chief business of the logician' (2.619)) follows Aristotle in employing syllogisms.
The following syllogism is known as Barbara:
All the beans from this bag are white;
these beans are from this bag;
therefore, these beans are white.
The idea is that this valid argument represents a particular instantiation of a reason-
ing scheme, and that any alternative instantiation represents another argument that is
likewise valid. Syllogisms should thus be interpreted as argument schemas.
Two other syllogisms are obtained from Barbara if we exchange the conclusion (or
Result, as Peirce calls it) with either the major premiss (the Rule) or the minor premiss
(the Case):
Case. -These beans are from this bag.
Result. -These beans are white.
Rule. - All the beans from this bag are white.
1 References to Peirce's collected papers take the form X.Y, where X denotes the volume number and Y the paragraph within the volume.
and the result) Peirce calls making a hypothesis or, briefly, hypothesis - the term
'abduction' is introduced only in his later theory. 2
Peirce thus arrives at the following classification of inference (2.623):
Inference
  - Deductive or Analytic
  - Synthetic
      - Induction
      - Hypothesis
Comparing this classification with the one obtained in Section 1.2.1, we can point
out the following similarities. What was called induction previously corresponds
to what Peirce calls synthetic inference (another term he uses is ampliative reason-
ing, since it amplifies, or goes beyond, the information contained in the premisses).
Furthermore, what Peirce calls induction corresponds to what we called inductive gen-
eralisation in Section 1.2.1.3
On the other hand, the motivations for these classifications are quite different in
each case. In Section 1.2.1 we were concentrating on the different kinds of support or
confirmation that arguments provide, and we noticed that this is essentially the same
for all non-deductive reasoning. When we concentrate instead on the syllogistic form
of arguments, we find this to correspond more naturally to a trichotomy, separating
non-deductive reasoning into two subcategories. As Horn clause logic is in some sense
a modern upgrade of syllogistic logic, it is perhaps not surprising that the distinction
between abduction and induction in logic programming follows Peirce's syllogistic
classification to a large extent. This will be further taken up in Section 1.3.
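The syllogistic rearrangement described above can also be sketched programmatically. The toy code below is our own illustration (the bean example is Peirce's, but the function names and the encoding of a rule as an antecedent/consequent pair are ours): deduction applies the Rule forward, while induction and abduction conjecture the missing element from the other two.

```python
# A rule is modelled as an implication (antecedent, consequent).
# Deduction is truth-preserving; induction and abduction merely
# conjecture the Rule or the Case, respectively.

def deduce(rule, case):
    """Rule + Case => Result: if the case matches the antecedent,
    the consequent follows necessarily (Barbara)."""
    antecedent, consequent = rule
    return consequent if case == antecedent else None

def induce(case, result):
    """Case + Result => Rule: conjecture the general rule
    'everything satisfying the case satisfies the result'."""
    return (case, result)

def abduce(rule, result):
    """Rule + Result => Case: conjecture that the antecedent held,
    since it would make the result 'a matter of course'."""
    antecedent, consequent = rule
    return antecedent if result == consequent else None

rule = ("from this bag", "white")       # All the beans from this bag are white
print(deduce(rule, "from this bag"))    # white            (the Result)
print(induce("from this bag", "white")) # the conjectured Rule
print(abduce(rule, "white"))            # from this bag    (the Case)
```

Note that only `deduce` is sound; the other two functions return hypotheses that may be false, which is precisely what makes them ampliative.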
Peirce's inferential theory. In his later theory of reasoning Peirce abandoned the
idea of a syllogistic classification of reasoning:
'(...) I was too much taken up in considering syllogistic forms and the doctrine of logical extension and comprehension, both of which I made more fundamental than they really are. As long as I held that opinion, my conceptions of Abduction necessarily confused two different kinds of reasoning.' (Peirce, 1958, 2.102, written in 1902)
Instead, he identified the three reasoning forms - abduction, deduction and induction
- with the three stages of scientific inquiry: hypothesis generation, prediction, and
evaluation (Figure 1.1). The underlying model of scientific inquiry runs as follows.
When confronted with a number of observations she seeks to explain, the scientist
comes up with an initial hypothesis; then she investigates what other consequences
this theory, were it true, would have; and finally she evaluates the extent to which
these predicted consequences agree with reality. Peirce calls the first stage, coming up
with a hypothesis to explain the initial observations, abduction; predictions are derived
from a suggested hypothesis by deduction; and the credibility of that hypothesis is
estimated through its predictions by induction. We will now take a closer look at these
stages.
2 Peirce also uses the term 'retroduction', a translation of the Greek word ἀπαγωγή used by Aristotle (translated by others as 'reduction').
3 It should be noted that, although the above syllogistic arguments are all categorical, Peirce also considered
statistical versions.
ABDUCTIVE AND INDUCTIVE REASONING: BACKGROUND AND ISSUES 7
[Figure 1.1 The three stages of scientific inquiry: abduction leads from observations to a hypothesis, deduction derives predictions from the hypothesis, and induction tests those predictions against reality.]
Let us investigate the logical form of abduction given by Peirce a little closer. About
C we know two things: that it is true in the actual world, and that it is surprising. The
latter thing can be modelled in many ways, one of the simplest being the requirement
that C does not follow from our other knowledge about the world. In this volume,
Aliseda models it by an epistemic state of doubt which calls for abductive reasoning
to transform it into a state of belief.
Then, 'if A were true, C would be a matter of course' is usually interpreted as 'A
logically entails C'. 4 Peirce calls A an explanation of C, or an 'explanatory hypothe-
sis'. Whether or not this is an appropriate notion of explanation remains an issue of
4 Note that interpreting the second premiss as a material implication, as is sometimes done in the literature, renders it superfluous, since the truth of A → C follows from the truth of the observation C.
debate. In this volume, Console and Saitta also propose to identify explanation with
entailment, but Josephson argues against it.
Besides being explanatory, Peirce mentions two more conditions to be fulfilled
by abductive hypotheses: they should be capable of experimental verification, and
they should be 'economic'. A hypothesis should be experimentally verifiable, since
otherwise it cannot be evaluated inductively. Economic factors include the cost of
verifying the hypothesis, its intrinsic value, and its effect upon other projects (Peirce,
1958, 7.220). In other words, economic factors are taken into account when choosing
the best explanation among the logically possible ones. For this reason, abduction is
often termed 'inference to the best explanation' (Lipton, 1991).
Induction is identified by Peirce as the process of testing a hypothesis against reality
through selected predictions. 'Induction consists in starting from a theory, deducing
from it predictions of phenomena, and observing those phenomena in order to see how
nearly they agree with the theory' (Peirce, 1958, 5.170). Such predictions can be seen
as experiments:
'When I say that by inductive reasoning I mean a course of experimental inves-
tigation, I do not understand experiment in the narrow sense of an operation by
which one varies the conditions of a phenomenon almost as one pleases. (...) An
experiment (...) is a question put to nature. (...) The question is, Will this be the
result? If Nature replies 'No!' the experimenter has gained an important piece
of knowledge. If Nature says 'Yes,' the experimenter's ideas remain just as they
were, only somewhat more deeply engrained.' (Peirce, 1958, 5.168)
This view of hypothesis testing is essentially what is called the 'hypothetico-deductive
method' in philosophy of science (Hempel, 1966). The idea that a verified prediction
provides further support for the hypothesis is very similar to the notion of confirma-
tion as discussed in Section 1.2.1, and also refutation of hypotheses through falsified
predictions can be brought in line with confirmation theory, with a limiting degree of
support of zero. 5 The main difference from confirmation theory is that in the Peircean
view of induction the hypothesis is, through the predictions, tested against selected
pieces of evidence only. This leads to a restricted form of hypothesis evaluation, for
which we will use the term hypothesis testing.
Peirce's inferential theory makes two main points. It posits a separation between
hypothesis generation and hypothesis evaluation; and it focuses attention on hypothe-
ses that can explain and predict. Combining the two points, abduction is the process of
generating explanatory hypotheses (be they general 'rules' or specific 'cases', as in the
syllogistic account), and induction corresponds to the hypothetico-deductive method
of hypothesis testing. However, the two points are relatively independent: e.g., we can
perceive generation of non-explanatory hypotheses. We will come back to this point
in the discussion below.
5 From a Bayesian perspective P(H|E) is proportional to P(E|H)P(H), where P(H) is the prior probability of the hypothesis; if E is contrary to a prediction, P(E|H) = 0. See Poole's chapter for further discussion of the Bayesian perspective.
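The footnote's point can be checked with a toy calculation (our own numbers, not from the text): normalising P(E|H)P(H) over two hypotheses shows how a falsified prediction drives a hypothesis's support to zero.

```python
# Sketch of the footnote's Bayesian reading of hypothesis testing.
# The hypotheses H1, H2 and all probabilities are invented numbers.

def posterior(priors, likelihoods):
    """Normalise P(E|H)P(H) over all hypotheses H to get P(H|E)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: p / total for h, p in joint.items()}

priors = {"H1": 0.5, "H2": 0.5}
# Evidence E contradicts a prediction of H1, so P(E|H1) = 0:
likelihoods = {"H1": 0.0, "H2": 0.8}
print(posterior(priors, likelihoods))  # {'H1': 0.0, 'H2': 1.0}
```

Refutation thus appears as the limiting case of confirmation: H1's degree of support drops to zero, while the surviving hypothesis absorbs all the posterior probability.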
1.2.3 Discussion
In the previous two sections we have considered three philosophical and logical per-
spectives on how non-deductive reasoning may be categorised: the inductivist view,
which holds that no further categorisation is needed since all non-deductive reasoning
must be justified in the same way by means of confirmation theory; the syllogistic
view, which distinguishes between inductive generalisation on the one hand and hy-
pothesis or abduction as inference of specific 'cases' on the other; and the inferential
view, which holds that abduction and induction represent the hypothesis generation
and evaluation phases in explanatory reasoning. As we think that none of these view-
points provides a complete picture, there is opportunity to come to a partial synthesis.
Hypothesis generation and hypothesis evaluation. The most salient point of Peirce's
later, inferential theory is the distinction between hypothesis generation and hypothe-
sis evaluation. In most other accounts of non-deductive reasoning the actual hypothe-
sis is already present in the argument under consideration, as can be seen clearly from
the argument forms discussed in Section 1.2.1. For instance, when constructing an
inductive generalisation
X percent of observed Fs are Gs;
therefore, (approximately) X percent of all Fs are Gs.
our job is first to conjecture possible instantiations of F and G (hypothesis generation),
and then to see whether the resulting argument has sufficient support (hypothesis eval-
uation).
One may argue that a too rigid distinction between generation and evaluation of
hypotheses is counter-productive, since it would lead to the generation of many, ulti-
mately useless hypotheses. Indeed, Peirce's 'economic factors', to be considered when
constructing possible abductive hypotheses, already blur the distinction to a certain ex-
tent. However, even if a too categorical distinction may have practical disadvantages,
on the conceptual level the dangers of confusing the two processes are much larger.
Furthermore, the distinction will arguably be drawn more sharply in artificial reasoning systems than it is in humans, just as chess-playing computers still have no real alternative for finding useful moves other than considering all possible ones.
In any case, whether tightly integrated or clearly separated, hypothesis generation
and hypothesis evaluation have quite distinct characteristics. Here we would argue that
it is hypothesis generation, being concerned with possibilities rather than choices, that
is most inherently 'logical' in the traditional sense. Deductive logic does not help the
mathematician in selecting theorems, only in distinguishing potential theorems from
fallacious ones. Also, as (Hanson, 1958) notes, if hypothesis evaluation establishes
a logic at all, then this would be a 'Logic of the Finished Research Report' rather
than a 'Logic of Discovery'. An axiomatic formalisation of the logic of hypothesis
generation is suggested by Flach in his chapter in this volume.
We also stress the distinction between generation and evaluation because it provides
a useful heuristic for understanding the various positions of participants in the debate
on abduction and induction. This rule of thumb states that those concentrating on
generating hypotheses tend to distinguish between non-deductive forms of reasoning;
those concentrating on evaluating hypotheses tend not to distinguish between them.
10 P.A. FLACH AND A.C. KAKAS
Not only does the rule apply to the approaches discussed in the previous two sections;
we believe that it can guide the reader, by and large, through the chapters in this
collection.
For instance, instead of observing that 53% of observed humans are female, such approaches will continue to refine F until all observed Fs are female (e.g., F could be 'humans wearing a dress').
The point here is not so much that in artificial intelligence we are only interested
in infallible truths. Often, we have to deal with uncertainties in the form of noisy
data, exceptions to rules, etc. Instead of representing these uncertainties explicitly in
the form of relative frequencies, one deals with them semantically, e.g. by attaching a
degree of confirmation to the inductive conclusion, or by interpreting rules as defaults.
The above formulation of categorical inductive generalisation is still somewhat lim-
iting. The essential step in any inductive generalisation is the extension of the universal
quantifier's scope from the sample to the population. Although the universally quantified sentence is frequently a material implication, this need not be the case. A more general
form for categorical inductive generalisation would therefore be:
All objects in the sample satisfy P(x);
therefore, all objects in the population satisfy P(x).
where P(x) denotes a formula with free variable x. Possible instantiations of P(x) can
be found by pretending that there exist no other objects than those in the sample, and
looking for true universal sentences. For instance, we might note that every object in
the sample is either female or male. This approach is further discussed in the chapter
by Lachiche.
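To illustrate, the recipe of pretending that the sample exhausts the universe can be sketched in a few lines of Python. The predicates and data below are invented for illustration and are not taken from the chapter:

```python
# Hypothetical sketch: enumerate candidate formulae P(x) that hold for every
# object in the sample, pretending the sample is the whole universe.
sample = [
    {"female": True,  "male": False},
    {"female": False, "male": True},
]

# Candidate formulae P(x), given as named Boolean tests over an object.
candidates = {
    "female(x)": lambda o: o["female"],
    "male(x)": lambda o: o["male"],
    "female(x) or male(x)": lambda o: o["female"] or o["male"],
}

# Keep the candidates true of every sampled object; these are the universal
# sentences we may conjecture about the whole population.
conjectures = [name for name, p in candidates.items()
               if all(p(o) for o in sample)]

print(conjectures)  # only the disjunction survives this particular sample
```

On this sample, 'every object is female' and 'every object is male' are both refuted, and only the disjunctive generalisation remains as a conjecture.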
generalisations: the rule 'every parent of John is a parent of John's brother' does not
explain parenthood.
In line with recent developments in inductive logic programming, we would like to
suggest that inductive generalisations like these are not explanatory at all. They sim-
ply are generalisations that are confirmed by the sample. The process of finding such
generalisations has been called confirmatory induction (also descriptive induction).
The difference between the two forms of induction can be understood as follows. A
typical form of explanatory induction is concept learning, where we want to learn a
definition of a given concept C in terms of other concepts. This means that our induc-
tive hypotheses are required to explain (logically entail) why particular individuals are
Cs, in terms of the properties they have.
However, in the more general case of confirmatory induction we are not given a
fixed concept to be learned. The aim is to learn relationships between any of the
concepts, with no particular concept singled out. The formalisation of confirmatory
hypothesis formation thus cannot be based on logical entailment, as in Peirce's ab-
duction. Rather, it is a qualitative form of degree of confirmation, which explains its
name. We will have more to say about the issue in Section 1.3.2.
Abduction. Turning next to abduction, it may seem at first that Peirce's syllogistic
and inferential definitions are not easily reconcilable. However, it is possible to per-
ceive a similarity between the two when we notice that the early syllogistic view of
abduction or hypothesis (p. 5) provides a special form of explanation. The Result (tak-
ing the role of the observation) is explained by the Case in the light of the Rule as a
given theory. The syllogistic form of abduction can thus be seen to meet the explana-
tory requirement of the later inferential view of abduction. Hence we can consider
explanation as a characterising feature of abduction. This will be further discussed in
Section 1.3.2.
Even if the syllogistic and inferential view of abduction can thus be reconciled, it is
still possible to distinguish between approaches which are primarily motivated by one
of the two views. The syllogistic account of abduction has been taken up, by and large,
in logic programming and other work in artificial intelligence addressing tasks such as diagnosis and planning. In this volume, the logic programming perspective
on abduction can be found in the contributions by Christiansen, Console and Saitta,
Inoue and Haneda, Mooney, Poole, Lamma et al., Sakama, and Yamamoto. The logic
programming and artificial intelligence perspective will be more closely examined in
the next section. On the other hand, the chapters by Aliseda, Josephson, and Psillos
are more closely related to the inferential perspective on abduction.
earlier, syllogistic theory. In Section 1.3.2 we argue that abductive hypotheses primar-
ily provide explanations, while inductive hypotheses provide generalisations. We then
further investigate abduction and induction from a logical perspective in Section 1.3.3,
pointing out differences in the way in which they extend incomplete theories. In Sec-
tion 1.3.4 we investigate how more complex reasoning patterns can be viewed as being
built up from simple abductive and inductive inferences. Finally, in Section 1.3.5 we
address the computational characteristics of abduction and induction.
The first pattern, inference of a general rule from a case (description) and a result
(observation) of a particular individual, exemplifies the kind of reasoning performed
by inductive logic programming (ILP) systems. The second pattern, inferring a more
complete description of an individual from an observation and a general theory valid
for all such individuals, is the kind of reasoning studied in abductive logic program-
ming (ALP).
The above account describes ILP and ALP by example, and does not provide a gen-
eral definition. Interestingly, attempts to provide such a general definition of abduction
and induction in logic programming typically correspond to Peirce's later, inferential
characterisation of explanatory hypothesis generation. Thus, in ALP abductive inference is typically specified as follows:
'Given a set of sentences T (a theory presentation), and a sentence G (observation), to a first approximation, the abductive task can be characterised as the problem of finding a set of sentences Δ (abductive explanation for G) such that:
(1) T ∪ Δ ⊨ G,
(2) T ∪ Δ is consistent.' (Kakas et al., 1992, p. 720)
The following is a specification of induction in ILP:
'Given a consistent set of examples or observations O and consistent background knowledge B, find an hypothesis H such that: B ∪ H ⊨ O.' (Muggleton and De Raedt, 1994)
In spite of small terminological differences the two specifications are virtually iden-
tical: they both invert a deductive consequence relation in order to complete an incom-
plete given theory, prompted by some new observations that cannot be deductively
accounted for by the theory alone.6 If our assessment of the distinction between abduction and induction that is usually drawn in AI is correct, we must conclude that
the above specifications are unable to account for this distinction. In the remainder of
Section 1.3 we will try to understand the differences between abduction and induction
as used in AI in modern, non-syllogistic terms. For an account which stays closer to
syllogisms, the reader is referred to the chapter by Wang.
6 Extra elements that are often added to the above definitions are the satisfaction of integrity constraints for
the case of abduction, and the avoidance of negative examples for the case of induction; these can again be
viewed under the same heading, namely as being aimed at exclusion of certain hypotheses.
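As a toy illustration of the abductive specification quoted above, consider a propositional theory given as ground Horn clauses. The clause names and the forward-chaining helper below are our own invention, a sketch rather than a faithful ALP system; with definite clauses and no integrity constraints, consistency of T ∪ Δ is automatic, so the check is omitted:

```python
# Theory T: rules as (head, [body atoms]); a fact would have an empty body.
T = [
    ("grass_is_wet", ["rained"]),
    ("grass_is_wet", ["sprinkler_was_on"]),
]
abducibles = ["rained", "sprinkler_was_on"]

def consequences(clauses, facts):
    """Forward-chain to the deductive closure of clauses plus facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return known

def abduce(goal):
    """Return the singleton explanations Delta such that T ∪ Delta entails goal."""
    return [{a} for a in abducibles if goal in consequences(T, {a})]

print(abduce("grass_is_wet"))  # [{'rained'}, {'sprinkler_was_on'}]
```

Both candidate explanations entail the observation via T, which is exactly the pattern of the ALP specification: the search inverts the entailment relation rather than computing it forwards.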
foreground knowledge may also be used. In some cases it may be empty, for instance
when we are learning the definition of a recursive predicate, when we are learning
the definitions of several mutually dependent predicates, or when we are doing data
mining. The observations specify incomplete (usually extensional) knowledge about
the observables, which we try to generalise into new foreground knowledge.
On the other hand, in abduction we are inferring instance knowledge from ob-
servations and other known information. The latter necessarily contains foreground
information pertaining to the observations at hand. Possible abductive hypotheses are
built from specific non-observable predicates called abducibles in ALP. The intuition
is that these are the predicates of which the extensions are not completely known as in-
stance knowledge. Thus, an abductive hypothesis is one which completes the instance
knowledge about an observed individual. This difference between the effect of abduc-
tion and induction on observable and instance knowledge is studied in the chapter by
Console and Saitta.
puts it, inductive hypotheses do not explain particular observations, but they explain
the frequencies with which the observations occur (viz. that non-white beans from this
bag are never observed).
Generalisation. We thus find that inductive hypotheses are not explanatory in the
same way as abductive hypotheses are. But we would argue that being explanatory is
not the primary aim of inductive hypotheses in the first place. Rather, the main goal of
induction is to provide generalisations. In this respect, we find that the ILP definition
of induction (p. 13) is too much focused on the problem of learning classification rules,
without stressing the aspect of generalisation. An explanatory hypothesis would only
be inductive if it generalises. The essential aspect of induction as applied in AI seems
to be the kind of sample-to-population inference exemplified by categorical inductive
generalisation, reproduced here in its more general form from Section 1.2.3:
All objects in the sample satisfy P(x);
therefore, all objects in the population satisfy P(x).
As with Peirce's syllogisms, the problem here is that P(x) is already assumed to be
given, while in AI a major problem is to generate such hypotheses. The specification
of confirmatory or descriptive induction follows this pattern, but leaves the hypothesis
unspecified:
Given a consistent set of observations O and consistent background knowledge B, find a hypothesis H such that: M(B ∪ O) ⊨ H
(Helft, 1989; De Raedt and Bruynooghe, 1993; Flach, 1995)
Hence the formal requirement now is that any generated hypothesis should be true
in a certain model constructed from the given knowledge and observations (e.g. the
truth-minimal model).
This specification can be seen as sample-to-population inference. For example, in
Peirce's bean example (p. 5), B is 'these beans are from this bag' (instance knowledge),
O is 'these beans are white' (observation), and H, 'all the beans from this bag are white', is satisfied by the model containing 'these beans' as the only beans in the
universe. Under the assumption that the population is similar to the sample, we achieve
generalisation by restricting attention to formulae true in the sample. Note that the
induced hypothesis is not restricted to one explaining the whiteness of these beans:
we might equally well have induced that 'all white beans are from this bag'.
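The bean example can be replayed as a small sketch. The predicate names and the conditional form of the hypotheses are our own illustrative choices; the point is only that both directions of generalisation are true in the model built from the sample:

```python
# Sketch of confirmatory induction on Peirce's bean example: treat the observed
# beans as the only beans in the universe (a model built from B and O) and
# accept any universally quantified hypothesis true in that model.
sample = [
    {"from_this_bag": True, "white": True},
    {"from_this_bag": True, "white": True},
]

def true_in_model(hypothesis):
    """Check a universal conditional (antecedent -> consequent) in the sample model."""
    antecedent, consequent = hypothesis
    return all(o[consequent] for o in sample if o[antecedent])

# Both directions are confirmed by the sample, as the chapter notes:
print(true_in_model(("from_this_bag", "white")))  # 'all beans from this bag are white'
print(true_in_model(("white", "from_this_bag")))  # 'all white beans are from this bag'
```

Neither hypothesis is privileged as an explanation: both are simply true in the truth-minimal model of the background knowledge and observations.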
Above we defined a hypothesis as generalising if it makes a prediction involving
an observable. We have to qualify this statement somewhat, as the following example
shows (taken from the chapter by Console and Saitta, Example 9.2, p. 141). Let our
background theory contain the following clauses:
measles(X) :- brother(X,Y), measles(Y).
red_spots(X) :- measles(X).
brother(john,dan).
red_spots(dan).

Thus, the hypothesis that John has measles also seems to qualify
as a generalisation. We would argue however that this generalisation effect is already
present in the background theory. On the other hand, an inductive hypothesis produces
a genuinely new generalisation effect, in the sense that we can find new individuals for
which the addition of the hypothesis to our knowledge is necessary to derive some ob-
servable property for these individuals (usually this property is that of the observations
on which the induction was based). With an abductive hypothesis this kind of exten-
sion of the observable property to other new individuals does not necessarily require
the a priori addition of the abductive hypothesis to the theory but depends only on the
properties of this individual and the given background theory: the generalisation, if
any, already exists in the background theory.
We conclude that abductive and inductive hypotheses differ in the degree of gen-
eralisation that each of them produces. With the given background theory T we im-
plicitly restrict the generalising power of abduction as we require that the basic model
of our domain remains that of T. The existence of this theory separates two levels
of generalisation: (a) that contained in the theory and (b) new generalisations that are
not given by the theory. In abduction we can only have the first level with no in-
terest in genuinely new generalisations, while in induction we do produce such new
generalisations.
ties for this new individual.7 Given an abductive theory T as above, the process of
abduction is to select one of the abductive extensions T(Δ) of T in which the given observation to be explained holds, by selecting the corresponding formula Δ. We can then reason deductively in T(Δ) to arrive at other conclusions. By selecting Δ we are essentially enabling one of the possible associations between Δ and the observation
among those supplied by the theory T.
It is important here to emphasise that the restriction of the hypothesis of abduction
to abducible predicates is not incidental or computational, but has a deeper representa-
tional reason. It reflects the relative comprehensiveness of knowledge of the problem
domain contained in T. The abducible predicates and the allowed abductive formu-
lae take the role of 'answer-holders' for the problem goals that we want to set to our
theory. In this respect they take the place of the logical variable as the answer-holder
when deductive reasoning is used for problem solving. As a result this means that the
form of the abductive hypothesis depends heavily on the particular theory T at hand, and the way that we have chosen to represent our problem domain in it.
Typically, the allowed abducible formulae are further restricted to simple logical
forms such as ground or existentially quantified conjunctions of abducible literals.
Although these further restrictions may be partly motivated by computational consid-
erations, it is again important to point out that they are only made possible by the
relative comprehensiveness of the particular representation of our problem domain in
the theory T. Thus, the case of simple abduction, where the abducible hypotheses are ground facts, occurs exactly because the representation of the problem domain in
T is sufficiently complete to allow this. Furthermore, this restriction is not significant
for the purposes of comparison of abduction and induction: our analysis here is inde-
pendent of the particular form of abducible formulae. The important elements are the
existence of an enumeration of the abductive formulae, and the fact that these do not
involve observable predicates.
Inductive extensions. Let us now turn to the case of induction and analyse this
process to facilitate comparison with the process of abduction as described above.
Again, we have a collection of possible inductive hypotheses from which one must be
selected. The main difference now is the fact that these hypotheses are not limited to
a particular subset of predicates that are incompletely specified in the representation
of our problem domain by the theory T, but are restricted only by the language of T.
In practice, there may be a restriction on the form of the hypothesis, called language
bias, but this is usually motivated either by computational considerations, or by other
information external to the theory T that guides us to an inductive solution.
Another essential characteristic of the process of induction concerns the role of the
selected inductive hypothesis H. The role of H is to extend the existing theory T to a
new theory T' = T ∪ H, rather than reason with T under the set of assumptions H as is
the case for abduction. Hence T is replaced by T' to become a new theory with which
we can subsequently reason, either deductively or abductively, to extract information
7 Note that this type of abductive (or open) reasoning with a theory T collapses to deduction, when and if
the theory becomes fully complete.
from it. The hypothesis H changes T by requiring extra conditions on the observable
predicates that drive the induction, unlike abduction where the extra conditions do not
involve the observable predicates. In effect, H provides the link between observables
and non-observables that was missing or incomplete in the original theory T.
Analogously to the concept of abductive extension, we can define inductive ex-
tensions as follows. Consider a common given theory T with which we are able to
perform abduction and induction. That is, T has a number of abductive extensions
T(Δ). Choosing an inductive hypothesis H as a new part of the theory T has the effect of further conditioning each of the abductive extensions T(Δ). Hence, while in
abduction we select an abductive extension of T, with induction we extend each of
the abductive extensions with H. The effect of induction is thus 'universal' on all the
abductive extensions.
If we now consider the new abductive theory T' = T ∪ H, constructed by induction,
we can view induction as a process of selecting a collection of abductive extensions,
namely those of the new theory T'. Hence an inductive extension can be viewed as a
set of abductive extensions of the original theory T that are further (uniformly) conditioned by the common statement of the inductive hypothesis H. This idea of an inductive extension consisting of a set of abductive extensions was used in (Denecker et al.,
1996) to obtain a formalisation of abduction and induction as selection processes in
a space of possible world models over the given theory in each case. In this way the
process of induction can be seen to have a more general form than abduction, able to
select a set of extensions rather than a single one. Note that this does not necessar-
ily mean that induction will yield a more general syntactic form of hypotheses than
abduction.
Analysis. Comparing the possible inductive and abductive extensions of a given the-
ory T we have an essential difference. In the case of abduction some of the predicates
in the theory, namely the observables, cannot be arbitrarily defined in an extension.
The freedom of choice of abduction is restricted to constrain directly (via Δ) only the
abducibles of the theory. The observable predicates cannot be affected except through
the theory: the observables must be grounded in the existing theory T by the choice of
the abductive conditions on the abducible part of the extension. Hence in an abductive
extension the extent to which the observables can become true is limited by the theory
T and the particular conditions Δ on the rest of the predicates.
In induction this restriction is lifted, and indeed we can have inductive extensions
of the given theory T, the truth value of which on the observable predicates need not
be attributed via T to a choice on the abducibles. The inductive extensions 'induce'
a more general change (from the point of view of the observables) on the existing
theory T, and, as we will see below, this will allow induction to genuinely generalise the given observations to other cases not derivable from the original theory T.
The generalising effect of abduction, if at all present, is much more limited. The selected abductive hypothesis Δ may produce in T(Δ) further information on abducible
or other predicates, as in the measles example from the previous section. Assuming
that abducibles and observables are disjoint, any information on an observable derived
in T(Δ) is a generalisation already contained in T.
What cannot happen is that the chosen abductive hypothesis Δ alone (without T)
predicts a new observation, as Δ does not affect directly the value of the observable
predicates. Every prediction on an observable derived in T(Δ), not previously true in
T (including the observation that drives the abductive process), corresponds to some
further instance knowledge Δ', which is a consequence of T(Δ), and describes the
new situation (or individual) at hand. Such consequences are already known to be
possible in the theory T, as we know that one of its possible extensions is T(Δ').
In the measles example (p. 16), the observation red_spots(john) gives rise to the
hypothesis Δ = measles(john). Adopting this hypothesis leads to a new prediction
red_spots(dan), corresponding to the instance knowledge Δ' = measles(dan),
which is a consequence of T(Δ). This new prediction could be obtained directly from
T(measles(dan)) without the need of Δ = measles(john).
Similarly, if we consider a previously unobserved situation (not derivable from
T(Δ)) described by Δ_new with T(Δ) ∪ Δ_new deriving a new observation, this is also
already known to be possible, as T(Δ ∪ Δ_new) is one of the possible extensions of T.
For example, if Δ_new = measles(mary), then T(Δ) ∪ Δ_new, and in fact T ∪ Δ_new,
derives red_spots(mary), which is again not a genuine generalisation.
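The measles example can be replayed as a small sketch. The forward-chaining helper is our own, and brother/2 is treated as symmetric here, an assumption made purely so that the chapter's prediction for Dan goes through:

```python
def derivable(facts):
    """Ground forward chaining for the measles theory over john, dan, mary;
    brother/2 is closed under symmetry (an illustrative assumption)."""
    people = ["john", "dan", "mary"]
    known = set(facts)
    # Close the brother relation under symmetry.
    for a in people:
        for b in people:
            if ("brother", a, b) in known:
                known.add(("brother", b, a))
    changed = True
    while changed:
        changed = False
        for x in people:
            # measles(X) :- brother(X,Y), measles(Y).
            for y in people:
                if (("brother", x, y) in known and ("measles", y) in known
                        and ("measles", x) not in known):
                    known.add(("measles", x))
                    changed = True
            # red_spots(X) :- measles(X).
            if ("measles", x) in known and ("red_spots", x) not in known:
                known.add(("red_spots", x))
                changed = True
    return known

T = {("brother", "john", "dan")}

# Abducing measles(john) explains red_spots(john) and predicts red_spots(dan):
print(("red_spots", "dan") in derivable(T | {("measles", "john")}))

# But the same prediction was available from T plus measles(dan) alone:
print(("red_spots", "dan") in derivable(T | {("measles", "dan")}))
```

Both queries succeed: any observable prediction obtained through the abductive hypothesis was already obtainable from T together with the relevant instance knowledge, which is the sense in which abduction's generalisation is not genuine.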
In short, abduction is meant to select some further conditions Δ under which we
should reason with T. It concerns only this particular situation described by Δ and
hence, if Δ cannot impose directly any conditions on the observable predicates, the
only generalisations that we can get on the observables are those contained in T under
the particular restrictions Δ. In this sense we say that the generalisation is not genuine
but already contained in T. Hence, as argued in the chapter by Console and Saitta,
abduction increases the intension of known individuals (abducible properties are now
made true for these individuals), but does not have a genuine generalisation effect on
the observables (it does not increase the extension of the observables with previously
unobserved individuals for which the theory T alone could not produce this extension
when it is given the instance knowledge that describes these individuals).
On the other hand, the universal conditioning of the theory T by the inductive
hypothesis H produces a genuine generalisation on the observables of induction. The
extra conditions in H on the observables introduce new information on the relation of
these predicates to non-observable predicates in the theory T, and from this we get
new observable consequences. We can now find cases where from H alone together
with a (non-observable) part ofT, describing this case, we can derive a prediction not
previously derivable in T.
The new generalisation effect of induction shows up more when we consider as
above the case where the given theory for induction has some of its predicates as
abducible (different from the observables). It is now possible to have a new individual
described by the extra abducible information Δ_new, such that in the new theory T' =
T ∪ H produced by induction a new observation holds which was not known to be
possible in the old theory T (i.e. it is not a consequence of T ∪ Δ_new). Note that we
cannot (as in the case of abduction) combine H with Δ_new into a set Δ'_new of instance
knowledge under which the observation would hold from the old theory T. We can
also have that a new observation holds from the hypothesis H and Δ_new alone for such
previously unobserved situations not described in the given theory T. These are cases
of genuine generalisation not previously known to be possible from the initial theory
T.
Summarising this subsection, induction, seen as a selection of a set of extensions defined by the new theory T ∪ H, has a stronger and genuinely new generalising
effect on the observable predicates than abduction. The purpose of abduction is to
select an extension and reason with it, thus enabling the generalising potential of the
given theory T. In induction the purpose is to extend the given theory to a new theory,
the abductive extensions of which can provide new possible observable consequences.
Finally, we point out a duality between abduction and induction (first studied in
(Dimopoulos and Kakas, 1996b)) as a result of this analysis. In abduction the the-
ory T is fixed and we vary the instance knowledge to capture (via T) the observable
knowledge. On the other hand, in induction the instance knowledge is fixed as part of
the background knowledge B, and we vary the general theory so that if the selected
theory T is taken as our abductive theory then the instance knowledge in B will form
an abductive solution for the observations that drove the induction. Conversely, if
we perform abduction with T and we consider the abductive hypothesis Δ explaining the observations as instance knowledge, the original theory T forms a valid inductive
hypothesis.
by the statement of the observation O. In fact, we can replace the universal quantification in 'all bananas from this shop' by a typical representative through skolemisation.
More importantly, the link of the observation 0 with the extra information of H is
known a priori as one of the possible ways of reasoning with the theory T to derive
new observable information.
There is a second way in which to view this reasoning and the hypothesis H above.
We can consider the predicate 'from Barbados' as the observable predicate with a set
of observations that each of the observed bananas in the shop is from Barbados. We
then have a prototypical inductive problem (like the white bean example of Peirce)
where we generate the same statement H as above, but now as an inductive hypothe-
sis. From this point of view the hypothesis now has a genuine generalising effect over
the observations on the predicate 'from Barbados'. But where did the observations on
Barbados come from? These can be obtained from the theory T as separate abductive
explanations for each of the original observations (or a typical one) on the predicate
'yellow'. We can thus understand this example as a hybrid process of first using (simple) abduction to translate separately each given observation into an observation on the abducibles, and then using induction to generalise the latter set of observations, thus
arriving at a general statement on the abducibles.
Essentially, in this latter view, by changing which predicates within the same problem count as observable and which as abducible, we are identifying simple basic forms of abduction and induction on which we can build more complex forms of non-deductive reasoning.
Referring back to our earlier discussion in Section 1.3, these basic forms are: pure
abduction for explanation with no generalisation effect (over what already exists in the
theory T); and pure induction of simple generalisations from sample to population.
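The hybrid abduce-then-generalise process behind the banana example can be caricatured in a few lines. All names here are illustrative, and the single-clause 'abduction' stands in for a real abductive procedure:

```python
# Toy version of the banana example: abduce an explanation per observation,
# then inductively generalise the abduced facts from sample to population.
# Assumed theory: yellow(X) :- from_barbados(X).

def abduce_explanation(observation):
    """Single-clause abduction: explain yellow(b) by from_barbados(b)."""
    pred, individual = observation
    assert pred == "yellow"
    return ("from_barbados", individual)

observations = [("yellow", f"banana{i}") for i in range(5)]

# Step 1: simple abduction, applied separately to each observation.
abduced = [abduce_explanation(o) for o in observations]

# Step 2: pure induction, generalising the abduced sample to the population.
hypothesis = None
if all(pred == "from_barbados" for pred, _ in abduced):
    hypothesis = "all bananas from this shop are from Barbados"

print(hypothesis)
```

Each basic step is trivial on its own; the interesting work lies in the decomposition, which is exactly the integration question taken up in Section 1.4.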
This identification of basic distinct forms of reasoning has important computational
consequences. It means that we can consider two basic computational models for the
separate tasks of abduction and induction. The emphasis then shifts to the question
of how these basic forms of reasoning and computation can be integrated together to
solve more complex problems by suitably breaking down these problems into simpler
ones.
It is interesting to note here that in the recent framework of inverse entailment, as used by the ILP system Progol (Muggleton, 1995), where we can learn from general clauses as observations, the analysis of its computation in the chapter by Yamamoto reveals that it can be understood as a mixture of abduction and induction.
As described in the above example, the Progol computation can be separated into first
abductively explaining according to the background theory a skolemised, typical ob-
servation, and then inductively generalising over this abductive explanation. The use-
fulness of explicitly separating out abduction and induction is also evident in several
works on theory formation or revision. Basic computational forms of abduction and
induction are used together to address these complex problems. This will be described
further in Section 1.4 on the integration of abduction and induction in AI.
ming. Indeed, when we examine the computational models used for abduction and
induction in AI, we notice that they are very different. Their difference is so wide that
it is difficult, if not impossible, to use the computational framework of one form of
reasoning in order to compute the other form of reasoning. Systems developed in AI
for abduction cannot be used for induction (and learning), and vice versa, inductive AI
systems cannot be used to solve abductive problems.9 In the chapter by Christiansen a
system is described where the computation of both forms of reasoning can be unified
at a meta-level, but where the actual computation followed by the system is different
for the separate forms of reasoning.
We will describe here the main characteristics of the computational models of the
basic forms of abduction and induction, discussed above, as they are found in practical
AI approaches. According to these basic forms, abduction extracts an explanation
for an observation from a given theory T, and induction generalises a set of atomic
observations. For abduction the computation has the following basic form: extract
from the given theory T a hypothesis Δ and check this for consistency. The search
for a hypothesis is done via some form of enhanced deduction method, e.g. resolution
with residues (Cox and Pietrzykowski, 1986a; Eshghi and Kowalski, 1989; Kakas
and Mancarella, 1990c; Denecker and de Schreye, 1992; Inoue, 1992a; Kakas and
Michael, 1995), or unfolding of the theory T (Console et al., 1991b; Fung and
Kowalski, 1997).
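The extract-and-check scheme just described can be sketched in a few lines. This is a minimal propositional illustration, not the algorithm of any of the cited systems; the theory, abducible atoms, and integrity constraints below are all invented for the example.

```python
# Sketch of the basic abductive computation: backward-chain through the
# theory, terminating proofs 'early' at abducible atoms, then check the
# collected hypothesis against integrity constraints.

def abduce(goal, theory, abducibles, depth=8):
    """Return candidate sets of abducible atoms that, added to `theory`,
    would entail `goal`. `theory` maps a head atom to alternative bodies."""
    if depth == 0:
        return []
    if goal in abducibles:
        return [{goal}]          # terminate the proof 'early' with a hypothesis
    explanations = []
    for body in theory.get(goal, []):
        partial = [set()]
        for subgoal in body:
            partial = [e1 | e2
                       for e1 in partial
                       for e2 in abduce(subgoal, theory, abducibles, depth - 1)]
        explanations.extend(partial)
    return explanations

def consistent(hypothesis, constraints):
    """Reject hypotheses containing any forbidden combination of abducibles."""
    return not any(c <= hypothesis for c in constraints)

theory = {'wet_grass': [['rained'], ['sprinkler_on']]}
abducibles = {'rained', 'sprinkler_on'}
constraints = [{'rained', 'sprinkler_on'}]     # not both at once

candidates = abduce('wet_grass', theory, abducibles)
valid = [h for h in candidates if consistent(h, constraints)]
```

Each candidate is found by a deduction-style proof that is allowed to stop at an abducible; the consistency check is a separate, second step, mirroring the two-phase structure described in the text.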
The important thing to note is that the abductive computation is primarily based on
the computation of deductive consequences from the theory T. The proofs are now
generalised so that they can be successfully terminated 'early' with an abductive for-
mula. To check consistency of the found hypothesis, abductive systems employ stan-
dard deductive methods (these may sometimes be specially simplified and adapted to
the particular form that the abductive formulae are restricted to take). If a hypothesis
(or part of a hypothesis) is found inconsistent then it is rejected and another one is
sought. Note that systems that compute constructive abduction (e.g. SLDNFA (Denecker and de Schreye, 1998), IFF (Fung and Kowalski, 1997), ACLP (Kakas and Michael, 1995)), where the hypothesis may not be ground but can be an existentially
quantified conjunction (with arithmetic constraints on these variables) or even a uni-
versally quantified formula, have the same computational characteristics. They arrive
at these more complex hypotheses by extending the proof methods for entailment to
account for the (isolated) incompleteness on the abducible predicates.
On the other hand, the computational model for the basic form of induction in AI
takes a rather different form. It constructs a hypothesis and then refines this under
consistency and other criteria. The construction of the hypothesis is based on methods
for inverting entailment proofs (or satisfaction proofs in the case of confirmatory in-
duction) so that we can obtain a new theory that would then entail (or be satisfied by)
the observations. Thus, unlike the abductive case, the computation cannot be based
on proof methods for entailment, and new methods such as inverse resolution, clause
generalisation and specialisation are used. In induction the hypothesis is generated
9 With the possible exception of Cigol (Muggleton and Buntine, 1988), a system designed for doing unre-
stricted reversed deduction.
from the language of the problem domain (rather than a given theory of the domain),
in a process of iteratively improving a hypothesis to meet the various requirements
posed by the problem. Furthermore, in induction the comparison of the different pos-
sible hypotheses plays a prominent and dynamic role in the actual process of hypothe-
sis generation, whereas in abduction evaluation of the different alternative hypotheses
may be done after these have been generated.
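The contrast with the abductive computation can be made concrete with a toy version of the inductive generalisation step: an anti-unification of ground atoms, refined against negative examples. The representation (tuples for atoms, 'X'-prefixed strings for variables) and all predicate names are illustrative assumptions, not a real ILP system.

```python
# Sketch of the inductive direction: instead of proving from a theory,
# generalise ground observations into a single clause and keep it only if
# it is consistent with the negative examples.

def generalise(atoms):
    """Anti-unify ground atoms: keep argument positions on which all
    observations agree, replace the rest by variables."""
    pred, arity = atoms[0][0], len(atoms[0])
    assert all(a[0] == pred and len(a) == arity for a in atoms)
    args = []
    for i in range(1, arity):
        values = {a[i] for a in atoms}
        args.append(values.pop() if len(values) == 1 else f'X{i}')
    return (pred, *args)

def covers(general, ground):
    """A generalised atom covers a ground atom if constants match positionwise."""
    return (general[0] == ground[0] and len(general) == len(ground) and
            all(g.startswith('X') or g == c for g, c in zip(general[1:], ground[1:])))

pos = [('flies', 'tweety', 'bird'), ('flies', 'polly', 'bird')]
neg = [('flies', 'rex', 'dog')]

hypothesis = generalise(pos)                       # ('flies', 'X1', 'bird')
assert all(covers(hypothesis, e) for e in pos)     # covers the observations
assert not any(covers(hypothesis, e) for e in neg) # consistent with negatives
```

Note that nothing here is a proof from a background theory: the hypothesis is constructed directly in the language of the problem and then tested, which is exactly the computational difference the text points to.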
It should be noted, however, that the observed computational differences between
generating abductive hypotheses and generating inductive hypotheses are likely to be-
come smaller once more complex abductive hypotheses are allowed. Much of the
computational effort of ILP systems is spent on efficiently searching and pruning the
space of possible hypotheses, while ALP systems typically enumerate all possible
abductive explanations. The latter approach becomes clearly infeasible when the ab-
ductive hypothesis space grows. In this respect, we again mention the system Cigol
which seems to be the only system employing a unified computational method (inverse
resolution) to generate both abductive and inductive hypotheses.
Computational distinctions of the two forms of reasoning are amplified when we
consider the different works of trying to integrate abduction and induction in a com-
mon system. In most of these works, each of the two forms of reasoning is computed
separately, and their results are transferred to the other form of reasoning as input. The
integration clearly recognises two different computational processes (for each reason-
ing) which are then suitably linked together. For example, in LAB (Thompson and
Mooney, 1994) or ACL (Kakas and Riguzzi, 1997; Kakas and Riguzzi, 1999) the
overall computation is that of induction as described above, but where now, at the point of evaluation and improvement of the hypothesis, a specific abductive problem
is computed that provides feedback regarding the suitability of the inductive hypothe-
sis. In other cases, such as RUTH (Ade et al., 1994) or Either (Ourston and Mooney,
1994) an abductive process generates new observable input for a subsidiary inductive
process. In all these cases we have well-defined separate problems of simple forms of
abduction and induction each of which is computed along the lines described above.
In other words, the computational viability of the integrated systems depends signifi-
cantly on this separation of the problem and computation into instances of the simple
forms of abduction and induction.
[Figure: a cycle in which induction generalises observations O into a hypothesis H, extending the theory to T' = T ∪ H, and abduction in turn uses the theory to produce new observational data O'.]
Figure 1.2 The cycle of abductive and inductive knowledge development.
There are several ways in which this can happen within a cycle of development of T, as will be
described below. For further discussion on the integration of abduction and induction
in the context of machine learning see the chapter by Mooney in this volume. Also
the chapter by Sakama studies how abduction can be used to compute induction in an
integrated way.
The cycle of abductive and inductive knowledge development. On the one hand,
abduction can be used to extract from the given theory T and observations 0 abducible
information that would then feed into induction as (additional) training data. One ex-
ample of this is provided by (Ourston and Mooney, 1994), where abduction identifies
points of repair of the original, faulty theory T, i.e. clauses that could be generalised
so that positive observations in 0 become entailed, or clauses that may need to be
specialised or retracted because they are inconsistent with negative observations.
A more active cooperation occurs when, first, through the use of basic abduction,
the original observations are transformed to data on abducible background predicates
in T, becoming training data for induction on these predicates. An example of this was
discussed in Section 1.3.4; another example in (Dimopoulos and Kakas, 1996b) shows that the original inductive learning task can only be solved if, before inductive generalisation takes place, the observations are abductively transformed into other predicates in a uniform way. In this volume, Abe studies this type of integration, employing
an analogy principle to generate suitable data for induction. Similarly, Yamamoto's
analysis of the ILP system Progol in this volume shows that, at an abstract level, the
computation splits into a first phase of abductively transforming the observations on
one predicate to data on other predicates, followed by a second generalisation phase
to produce the solution.
In the framework of the system RUTH (Ade et al., 1994), we see induction feeding
into the original abductive task. An abductive explanation may lead to a set of required
facts on 'inducible' predicates, which are inductively generalised to give a general rule
in the abductive explanation for the original observations, similar to (one analysis of)
the bananas example discussed previously.
These types of integration can be succinctly summarised as follows. Consider a cycle of knowledge development governed by the 'equation' T ∪ H ⊨ O, where T is the current theory, O the observation triggering theory development, and H the new
knowledge generated. Then, as shown in Figure 1.2, on one side of this cycle we have induction, its output feeding into the theory T for later use by abduction, as shown in the other half of the cycle, where the abductive output in turn feeds into the observational data O for later use by induction, and so on.
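In a propositional definite-clause setting, the governing 'equation' can be checked directly by naive forward chaining. The clause encoding and the example atoms below are illustrative assumptions, chosen only to make the check runnable.

```python
# The 'equation' T ∪ H ⊨ O of the development cycle, checked by forward
# chaining over propositional definite clauses encoded as (head, body).

def consequences(clauses):
    """Forward-chain (head, body) clauses to the least fixpoint of atoms."""
    facts = set()
    while True:
        new = {h for h, body in clauses if set(body) <= facts} - facts
        if not new:
            return facts
        facts |= new

T = [('mortal', ['man'])]      # current theory: man -> mortal
O = {'mortal'}                 # observation triggering theory development
H = [('man', [])]              # new knowledge (here a single abduced fact)

assert not O <= consequences(T)      # T alone does not entail O
assert O <= consequences(T + H)      # but T ∪ H ⊨ O
```

One turn of the cycle fills the gap between `consequences(T)` and O with H; the next turn can then treat T + H as the current theory.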
Inducing abductive theories. Another way in which induction can feed into abduc-
tion is through the generation of confirmatory (or descriptive) inductive hypotheses
that could act as integrity constraints for the new theory. Here we initially have some
abductive hypotheses regarding the presence or absence of abducible assumptions.
Based on these hypotheses and other data in T we generate, by means of confirmatory
induction, new sentences I which, when interpreted as integrity constraints on the new theory T, would support the abducible assumptions (assumptions of presence would be consistent with I, assumptions of absence would now be inconsistent with I).
This type of cooperation between abductive and inductive reasoning is based on a
deeper level of integration of the two forms of reasoning, where induction is perceived
as hypothesising abductive (rather than deductive) theories. The deductive coverage
relation for learning is replaced by abductive coverage, such that an inductive hypoth-
esis H is a valid generalisation if the observations can be abductively explained by
T' = T U H, rather than deductively entailed. A simple example of this is the exten-
sion of Explanation-Based Learning with abduction (Cohen, 1992; O'Rorke, 1994),
such that deductive explanations are allowed to be completed by abductive assump-
tions before they are generalised.
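The abductive coverage relation can itself be sketched: a hypothesis H covers an observation if T ∪ H entails it once some set of abducible facts is added. This brute-force propositional check, and all the atom names in it, are illustrative assumptions rather than any cited system's procedure.

```python
# Sketch of abductive coverage: observation covered by T ∪ H if some set
# Δ of abducible facts makes T ∪ H ∪ Δ entail it.
from itertools import chain, combinations

def entails(clauses, atom):
    """Forward-chain (head, body) clauses and test membership of `atom`."""
    facts = set()
    while True:
        new = {h for h, b in clauses if set(b) <= facts} - facts
        if not new:
            return atom in facts
        facts |= new

def abductively_covers(T, H, abducibles, observation):
    """True if T ∪ H ∪ Δ entails the observation for some Δ ⊆ abducibles."""
    subsets = chain.from_iterable(
        combinations(sorted(abducibles), r) for r in range(len(abducibles) + 1))
    return any(entails(T + H + [(a, []) for a in d], observation)
               for d in subsets)

T = [('sound', ['instrument', 'played'])]
H = [('instrument', ['guitar'])]
abducibles = {'played', 'guitar'}

assert abductively_covers(T, H, abducibles, 'sound')   # via Δ = {played, guitar}
assert not entails(T + H, 'sound')                     # not deductively entailed
```

The point of the example is the gap between the two assertions: H is a valid generalisation under abductive coverage even though deductive coverage fails.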
Inducing abductive theories is particularly useful in cases where the domain theory
is incomplete, and also when performing multiple predicate learning, since in that case too the background knowledge for one predicate includes the incomplete data for the
other predicates to be learned. In these cases the given theory T is essentially an ab-
ductive theory, and hence it is appropriate to use an abductive coverage relation. On
the other hand, it may be that the domain that we are trying to learn is itself inher-
ently abductive or non-monotonic (e.g. containing nested hierarchies of exceptions),
in which case the hypothesis space for learning is a space of abductive theories.
LAB (Thompson and Mooney, 1994) is one of the first learning systems adopting
this point of view (see also Mooney's contribution to this volume). The class predi-
cates to be learned are the abducible predicates, and the induced theory H describes the
effects of these predicates on other predicates that we can observe directly with rules
of the form observation ← class. Then the training examples (each consisting of a set
of properties and its classification) are captured by the induced hypothesis H when the
correct classification of the examples forms a valid abductive explanation, given H, for
their observed properties. Other frameworks for learning abductive theories are given
in (Kakas and Riguzzi, 1997; Kakas and Riguzzi, 1999; Dimopoulos et al., 1997)
and the chapter by Lamma et al. Here, both explanatory and confirmatory induction
are used to generate theories together with integrity constraints. In this volume, Inoue
and Haneda also study the problem of learning abductive logic programs for capturing
non-monotonic theories.
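The LAB-style coverage test just described can be sketched in propositional form. The rule table, symptom names, and class names below are invented for illustration; the real system works over first-order rules.

```python
# Sketch of the LAB-style test: the induced theory H has rules
# observation <- class, with the class predicates abducible; a training
# example is captured when its given classification explains all of its
# observed properties.

H = {                       # each observation mapped to the classes explaining it
    'fever': {'flu', 'infection'},
    'cough': {'flu'},
    'rash':  {'measles'},
}

def captured(properties, classification):
    """True iff assuming the (abducible) class atom explains every property."""
    return all(classification in H.get(p, set()) for p in properties)

assert captured({'fever', 'cough'}, 'flu')
assert not captured({'fever', 'rash'}, 'flu')   # 'rash' is not explained by flu
```

Here the classification plays the role of the abductive explanation, and the induced rules H determine whether that explanation accounts for the observed properties.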
With this type of integration we can perceive abduction as being used to evaluate
the suitability or credibility of the inductive hypothesis. Similarly, abductive explanations that lead to induction can be evaluated by testing the induced generalisation.
In this sense, the integration of abduction and induction can help to cross-evaluate the
hypotheses that they generate.
1.5 CONCLUSIONS
The nature of abduction and induction is still hotly debated. In this introductory chap-
ter we have tried to chart the terrain of possible positions in this debate, and also to
provide a roadmap for the contributions to this volume. From a logico-philosophical
perspective, there are broadly speaking two positions: either one holds that abduction
provides explanations and induction provides generalisations; or one can hold that ab-
duction is the logic of hypothesis generation and induction is the logic of hypothesis
evaluation. AI approaches tend to adopt the first perspective (although there are exceptions): abduction and induction each deal with a different kind of incompleteness of the given theory, extending it in different ways.
As stressed in the introduction to this chapter, we do however think that absolute
positions in this debate may be counter-productive. Referring back to the questions
formulated there, we think it will be useful to unify abduction and induction when
concentrating on hypothesis evaluation. On the other hand, when considering hypoth-
esis generation we often perceive a distinction between abduction and induction, in
particular in their computational aspects.
With respect to the second question, abduction and induction can be usefully inte-
grated when trying to solve complex theory development tasks. We have reviewed a
number of AI approaches to such integration. Most of these frameworks of integration
use relatively simple forms of abduction and induction, namely abduction of ground
facts and basic inductive generalisations. Moreover, each of the two is computed sep-
arately and its results transferred to the other, thus clearly recognising two separate
and basic computational problems. From these, they synthesise an integrated form of
reasoning that can produce more complex solutions, following a cyclic pattern with
each form of reasoning feeding into the other.
A central question then arises as to what extent the combination of such basic forms
of abduction and induction is complete, in the sense that it encapsulates all solutions to
the task. Can they form a generating basis for any method for such theory development
which Peirce describes in his later work as 'coming up with a new theory'? We hope
that the present collection of papers will contribute towards understanding this issue,
and many other issues pertaining to the relation between abduction and induction.
Acknowledgments
Part of this work was supported by Esprit IV Long Term Research Project 20237 (Inductive
Logic Programming 2).
I The philosophy of abduction
and induction
2
SMART INDUCTIVE GENERALIZATIONS ARE ABDUCTIONS
John R. Josephson
1The phrase "inference to the best explanation" seems to originate with Gilbert Harman (Harman, 1965).
2 This formulation is largely due to William Lycan.
3 Please see (Josephson, 1994, p. 14), for a more complete description of the considerations governing confidence and acceptance, including pragmatic considerations.
31
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 31-44.
© 2000 Kluwer Academic Publishers.
I trust that my readers recognize IBE as familiar, and as having a kind of intuitively
recognizable evidential force. We can observe that people quite commonly justify
their conclusions by direct, or barely disguised, appeal to IBE, showing that speaker
and hearer share a common understanding of the pattern. Thus, IBE appears to be
part of "commonsense logic." Why this might be so makes for interesting speculation: perhaps it is somehow built into the human mind, and perhaps this is good design.
These speculations aside, it seems undeniable that people commonly view IBE as a
form of good thinking. Moreover, it appears that people intuitively recognize many
of the considerations, such as those just mentioned, that govern the strength of the
conclusions of IBEs. Beyond that, people sometimes actually come up to the standards
set by IBE in their actual reasoning. When they do so, they can be reasonably said to
be "behaving intelligently" (in that respect). Thus, IBE is part of being intelligent, part
of being "smart."
When I say that a form of inference is "smart", I mean that reasoning in accor-
dance with it has some value in contributing to intelligence, either because inferences
according to the pattern carry evidential force, as in the case of IBE, or because they
have some other power or effectiveness to contribute to intelligence. I will leave "in-
telligence" undefined, although I suppose that intelligence is approximately the same
as western philosophers have called "reason," and that intelligence is a biological phe-
nomenon, an information-processing capability of humans and other organisms, and
that it comes in degrees and dimensions, with some species and individuals being more
intelligent than others in some respects.
Besides everyday intelligence, we can readily see IBE in scientific reasoning, as
well as in the reasoning of historians, juries, diagnosticians, and detectives. Thus, IBE
seems to characterize some of the most careful and productive processes of human
reasoning.4 Considering its apparent ubiquity, it is remarkable how overlooked and
underanalyzed this inference pattern is by some 2,400 years of logic and philosophy.
The effectiveness of predictions enters the evaluative considerations in a natural
way. A hypothesis that leads to false or highly inaccurate predictions is poor by itself,
and should not be accepted, even if it appears to be the best explanation when consid-
ering all the available data. Failures in predictive power count as evidence against a
hypothesis and so tend to improve the chances of other hypotheses coming out as best.
Failures in predictive power may also improve the margin of decisiveness by which
the best explanation surpasses the failing alternatives. Thus, we see that IBEs are
capable of turning negative evidence against some hypotheses into positive evidence
for alternative hypotheses.
This kind of reasoning by exclusion, which is able to turn negative to positive evidence, can be viewed deductively as relying on the assumption that the contrast set
(the set of hypotheses within which one hypothesis gets evidential support by being
best and over which reasoning by exclusion proceeds) exhausts the possibilities. It
4 For more extensive discussions of the epistemic virtues of this pattern of inference, see (Harman, 1965;
Lipton, 1991; Josephson, 1994).
must either exhaust the possibilities, or at least be broad enough to include all plau-
sible hypotheses, i.e., all hypotheses with a significant chance of being true. If the
contrast set is broad enough, the true explanation can be presumed to be included
somewhere in the set of hypotheses under consideration, and the best explanation can
then be brought out by reasoning by exclusion. A thorough search for alternatives,
the third consideration mentioned previously, is important for high confidence in the
conclusion, since a thorough search reduces the danger that the contrast set is too nar-
row and that the true explanation has been overlooked.5 Note that nothing requires
the alternatives in a contrast set to be mutually exclusive. In principle a patient might
have both indigestion and heart trouble. Reasoning by exclusion works fine for non-exclusive hypotheses; reasoning by exclusion depends on exhaustiveness, not mutual
exclusion.
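The way negative evidence redistributes support over an exhaustive contrast set can be illustrated numerically. The hypotheses and all the figures below are invented for the example, and the update rule is only a crude stand-in for a proper probabilistic treatment.

```python
# Numerical sketch of reasoning by exclusion: over a contrast set assumed
# to be exhaustive, evidence *against* one hypothesis becomes, after
# renormalisation, evidence *for* the alternatives.

priors = {'indigestion': 0.5, 'heart_trouble': 0.3, 'ulcer': 0.2}

def update_on_predictive_failure(beliefs, failed, factor=0.1):
    """Scale down a hypothesis whose prediction failed, then renormalise
    over the (assumed exhaustive) contrast set."""
    scaled = {h: p * (factor if h == failed else 1.0) for h, p in beliefs.items()}
    total = sum(scaled.values())
    return {h: p / total for h, p in scaled.items()}

posterior = update_on_predictive_failure(priors, 'indigestion')

# negative evidence against 'indigestion' has turned into positive
# evidence for the alternatives:
assert posterior['indigestion'] < priors['indigestion']
assert posterior['heart_trouble'] > priors['heart_trouble']
```

The renormalisation step is where the exhaustiveness assumption does its work: the probability mass lost by the failing hypothesis has nowhere to go but to the remaining candidates.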
I do not suggest that the description of the IBE inference pattern that I have given
here is perfect, or precise, or complete, or the best possible description of it. I suggest
only that it is good enough so that we can recognize IBE as distinctive, logically
forceful, ubiquitous, and smart.
2.1.2 "Abduction"
Sometimes a distinction has been made between an initial process of coming up with
explanatorily useful hypothesis alternatives and a subsequent process of critical eval-
uation wherein a decision is made as to which explanation is best. Sometimes the
term "abduction" has been restricted to the hypothesis-generation phase. Peirce him-
self commonly wrote this way, although at other times Peirce clearly used the term
"abduction" for something close to what I have here called "inference to the best ex-
planation."6
Sometimes "abduction" has been identified with the creative generation of explana-
tory hypotheses, even sometimes with the creative generation of ideas in general.
Kruijff suggests that, besides the creativity of hypotheses, the surprisingness of what
is to be explained is at the core of abduction's ubiquity and of its relation to reality
(Kruijff, 1997). Peirce, too, sometimes emphasizes surprise. It is clear that there is
much expected utility in trying to explain things that are surprising. Surprise points out
just where knowledge is lacking, and when a failed expectation has distinctly pleas-
ant or unpleasant effects, there may well be something of practical importance to be
learned. But one may also wonder about, and seek explanations for, things that are not
ordinarily surprising, and which only become "surprising" when you wonder about
them, when you recognize that in some way things could be different. "Why do things
fall?" "Why do people get angry?" "Why do arctic foxes have white coats in winter?"
None of these is unexpected, yet all present openings for new knowledge. Clearly, neither novelty of hypothesis nor surprise at the data is essential for an IBE to establish
5 For a more extensive discussion of the importance of evidence that the contrast set includes the true explanation, please see (Josephson, 1994, p. 15).
6 For a discussion of Peirce's views on abduction, please see the opening essay by Flach and Kakas and
the essay by Psillos in this volume. For a detailed scholarly examination of Peirce's writings on abduction
please see (Fann, 1970).
its conclusion with evidential force. "Who forgot to put the cheese away last night?"
"Probably Billy. He has left the cheese out most nights this week."
While the creative generation of ideas is certainly virtuous in the right contexts, and
useful for being smart, it is necessary for creative hypotheses to have some plausibility,
some chance of being true, or some pursuit value, before creativity can make a gen-
uine contribution to working intelligence. Generating low value creative explanatory
hypotheses is in itself a dis-virtue in that time, attention, and other cognitive or com-
putational resources must then be expended in rejecting these low value hypotheses so
that better hypotheses may be pursued. Too much of the wrong kind of creativity is a
drain on intelligence, and so is not smart. Generation of hypotheses, without critical
control, is not smart. Generation of hypotheses, as a pattern of inference, in and of
itself, is not smart.
Generation of plausible explanatory hypotheses, relevant to the current explanatory
problem, is smart. Yet pre-screening hypotheses to remove those that are implausible
or irrelevant mixes critical evaluation into the hypothesis-generation process, and so
breaks the separation between the process of hypotheses generation and the process of
critical evaluation. Furthermore, evaluating one or more explanatory hypotheses may
require (according to IBE) that alternative explanations are generated and considered
and that a judgment is made concerning the thoroughness of the search for alternative
explanations. Again we see a breakdown in the separation of the processes of hypoth-
esis generation from the processes of critical evaluation. Either type of process will
sometimes need the other as a subprocess. The use of one process by the other might
be precompiled, so that it is not invoked explicitly at run time, but instead is only
implicit. A hypothesis generation mechanism might implicitly use criticism (it must
use some criticism if it is to be smart), and criticism might implicitly use hypothesis
generation, for example by implicitly considering and eliminating a large number of
alternatives as being implausible. Thus I conclude that hypothesis generation and hy-
pothesis evaluation cannot be neatly separated, and in any case, hypothesis generation
by itself is not smart.
Consider another pattern of inference, which I will call "backward modus ponens,"
which has a pattern as follows:

p → q
q
Therefore, p.

The arrow, "→", here may be variously interpreted, so let us just suppose it to have
more or less the same meaning as the arrow used in schematizing:

p → q
p
Therefore, q.
This second one is modus ponens, and this one is smart. Modus ponens has some
kind of intuitively visible logical force. In contrast, backward modus ponens is obviously fallacious. Copi calls it "the fallacy of affirming the consequent" (Copi and
Cohen, 1998). By itself, backward modus ponens is not smart, although reasoning in
accordance with its pattern may be smart for other reasons, and there may be special
contexts in which following the pattern is smart.
It has become common in AI to identify "abduction" with backward modus po-
nens, or with backward modus ponens together with syntactic or semantic constraints,
such as that the conclusion must come from some fixed set of abducibles. There is
a burden on those who study restricted forms of backward modus ponens to show us
the virtues of their particular forms, that is, they need to show us how they are smart.
I suggest that we will find that backward modus ponens is smart to the degree that
it approximates, or when it is controlled and constrained to approximate, or when it
implements, inference to the best explanation.
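The suggestion that backward modus ponens becomes smart when constrained and pointed at the best explanation can be sketched as follows. The rules, plausibility scores, and threshold are all invented for illustration.

```python
# Sketch of constrained backward modus ponens: from rules p -> q and an
# observed q, collect candidate antecedents p, filter by plausibility, and
# return the best-rated survivor, approximating inference to the best
# explanation.

rules = [('rained', 'wet_grass'),
         ('sprinkler_on', 'wet_grass'),
         ('flood', 'wet_grass')]
plausibility = {'rained': 0.6, 'sprinkler_on': 0.3, 'flood': 0.01}

def best_explanation(observed, threshold=0.05):
    # unconstrained backward modus ponens: every p with p -> observed ...
    candidates = [p for p, q in rules if q == observed]
    # ... then critical control: keep plausible hypotheses, pick the best
    plausible = [p for p in candidates if plausibility[p] >= threshold]
    return max(plausible, key=plausibility.get) if plausible else None

assert best_explanation('wet_grass') == 'rained'
assert best_explanation('dry_grass') is None   # nothing to infer without a rule
```

The first line of the function is the bare fallacy; only the filtering and selection that follow give the conclusion any evidential force.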
From the foregoing discussion it appears that IBE is distinctive, evidentially force-
ful, ubiquitous, and smart, and that no other proposed definition or description of the
term "abduction" has all of these virtues. Thus it seems that IBE is our best candidate
as a description of what is at the epistemological and information-processing core of
the family of patterns collected around the idea of abduction. I therefore claim the
term "abduction" for IBE, and in the remainder of this essay, by "abduction" I mean
"inference to the best explanation."
Some authors characterize abduction as reasoning from effects to causes, a view to
which we will return later in this essay. For now, I would just like to point out that, at
least, abduction is a good way to be effective in reasoning from effects to causes. From
an effect, we may generate a set of alternative causal explanations and try to determine
which is the best. If a hypothesized cause is the best explanation, then we have good
evidence that it is the true cause.
gies, implementations, and processes that will be needed to accomplish it. 7 These
three perspectives - justification, task, and process - are conceptually tightly inter-
connected, as follows. An abductive reasoning task, prototypically, is one that has
the goal of producing a satisfactory explanation, which is an explanation that can be
confidently accepted. An explanation that can be confidently accepted is one that has
strong abductive justification. Thus, a prototypical abductive task aims at setting up a
strong abductive justification. Information processing that is undertaken for the pur-
pose of accomplishing a prototypical abductive task, that is, of producing a confident
explanation, may reasonably be called an "abductive reasoning process." From an
information-processing perspective, it makes sense to think of abductive reasoning
as comprising the whole process of generation, criticism, and possible acceptance of
explanatory hypotheses.
Note that the abductive justifications set up by abductive reasoning might be ex-
plicit, as when a diagnostic conclusion can be justified, or they might arise implicitly
as a result of the functioning of an "abductively effective mechanism," such as, per-
haps, the human visual system, or the human language understanding mechanism, or
an effective neural-net diagnostic system. Note also that the conclusions of abduc-
tive arguments (and correspondingly, the accomplishments of abductive tasks, and the
results of abductive reasoning processes) may be either general or particular propo-
sitions. Sometimes a particular patient's symptoms are explained; sometimes an em-
pirical generalization is explained by an underlying causal mechanism (e.g., univer-
sal gravitation explains the orbits of the planets). Sometimes an individual event is
explained - "What caused the fire?" - and sometimes a recurrent phenomenon is
explained - "What causes malaria?"
The account of abduction that has been sketched so far in this essay still has two
large holes: (1) what is an explanation? and (2) what makes one explanation better
than another? I will not attempt to fill the second hole in this chapter; the literature
on the subject is vast (see (Darden, 1991, p.277 ff.) for a starting point). I will simply
mention some desirable features of explanatory hypotheses: consistency, plausibility,
simplicity, explanatory power, predictive power, precision, specificity, and theoretical
promise. To begin to fill the first hole, let us ask: what conception of explanation is
needed for understanding abduction?
nomological" (D-N) model of explanation.8 The main difficulty with these accounts
(besides Hempel's confounding the question of what makes an ideally good explana-
tion with the question of what it is to explain at all) is that being a deductive proof is
neither necessary nor sufficient for being an explanation. Consider the following:
QUESTION: Why does he have burns on his hand?
EXPLANATION: He sneezed while cooking pasta and upset the pot.
The point of this example is that an explanation is given, but no deductive proof, and
although it could be turned into a deductive proof by including additional proposi-
tions, this would amount to gratuitously completing what is on the face of it an in-
complete explanation. Real explanations are almost always incomplete. Under the
circumstances (incompletely specified) sneezing and upsetting the pot were presum-
ably causally sufficient for the effect, but this is quite different from being deductively
sufficient. For another example, consider that the flu hypothesis explains the body
aches, but often people have flu without body aches, so having flu does not imply
having body aches. The lesson is that an explanatory hypothesis need not deductively
entail what it explains.
The case that explanations are not necessarily deductive proofs is made even stronger
when we consider psychological explanations, where there is presumptively an ele-
ment of free will, and explanations that are fundamentally statistical, where, for ex-
ample, quantum phenomena are involved. In these cases it is clear that causal deter-
minism cannot be assumed, so the antecedent conditions, even all antecedent condi-
tions together, known and unknown, cannot be assumed to be causally sufficient for
the effects.
Conversely, many deductive proofs fail to be explanations of anything. For exam-
ple, classical mechanics is deterministic and time reversible, so an earlier state of a
system can be deduced from a later state, but the earlier state cannot be said to be
explained thereby. Also, q can be deduced from 'p and q' but is not thereby explained.
Many mathematicians will at least privately acknowledge that some proofs establish
their conclusion without giving much insight into why the conclusions are true, while
other proofs give richer understanding. So it seems that, even in pure mathematics,
some proofs are explanatory and some are not.
We are forced to conclude that explanations are not deductive proofs in any par-
ticularly interesting sense. Although they can always be presented in the form of
deductive proofs by adding premises, doing so does not succeed in capturing anything
essential or especially useful, and typically requires completing an incomplete expla-
nation. Thus the search for a proof of D is not the same as the search for an explanation
of D. Instead it is only a traditional, but seriously flawed, approximation of it.
8 For a brief summary of deductive and other models of explanation please see (Bhaskar, 1981). For a history of more recent philosophical accounts of explanation, please see (Salmon, 1990).
the plane crash. The mechanisms that connect the ingestion of cigarette smoke with effects on the arteries of the heart explain the statistical association between smoking and heart disease. It is common in science for an empirical generalization, an
observed regularity, to be explained by reference to underlying structure and mech-
anisms. Explainer and explained, explanans and explanandum, may be general or
particular. Accordingly, abductions may apply to, or arrive at, propositions that are
either general or particular. Computational models of abduction that do not allow for
this are not fully general, although they may be effective as special-purpose models.
As I have argued, explanations are not deductive proofs in any particularly inter-
esting sense. Although they can always be presented in the form of deductive proofs,
doing so seems not to capture anything essential or especially useful, and usually
requires completing an incomplete explanation. Thinking of explanations as proofs
tends to confuse causation with logical implication. To put it simply: causation is in
the world, implication is in the mind. Of course, mental causation exists (e.g., where
decisions cause other decisions), which complicates the simple distinction by includ-
ing mental processes in the causal world, but that complication should not be allowed
to obscure the basic point, which is not to confuse an entailment relationship with
what may be the objective, causal grounds for that relationship. Deductive models of
causation are at their best when modeling deterministic closed-world causation, but
this is too narrow for most real-world purposes. Even for modeling situations where
determinism and closed world are appropriate assumptions, treating causality as de-
duction is dangerous, since one must be careful to exclude non-causal and anti-causal
(effect-to-cause) conditionals from any knowledge base if one is to distinguish cause-
effect from other kinds of inferences. [Pearl has pointed out the significant dangers of unconstrained mixing of cause-to-effect with effect-to-cause reasoning (Pearl, 1988b).]
Per se, there is no reason to seek an implier of some given fact. The set of possible
impliers includes all sorts of riffraff, and there is no obvious contrast set at that level to
set up reasoning by exclusion. But there is a reason to seek a possible cause: broadly
speaking, because knowledge of causes gives us powers of influence and prediction.
And the set of possible causes (of the kind we are interested in) does constitute a
contrast set for reasoning by exclusion. Common sense takes it on faith that everything
has a cause. (Compare this with Leibniz's Principle of Sufficient Reason.) There is
no (non-trivial) principle of logic or common sense that says that everything has an
implier.
In the search for interesting causes, we may set up alternative explanations and
reason by exclusion. Thus, IBE is a way to reason from effect to cause. Effect-to-cause reasoning is not itself the same as abduction; rather, effect-to-cause reasoning is what abduction is for.
To begin with, let us note that the word "induction" has had no consistent use, either
recently or historically. Sometimes writers have used the term to mean all inferences
that are not deductive, sometimes they have specifically meant inductive generaliza-
tions, and sometimes they have meant next-case inductions as in the philosophical
"problem of induction" as put by David Hume. We focus on inductive generalizations,
which we may describe by saying that an inductive generalization is an inference that
goes from the characteristics of some observed sample of individuals to a conclusion
about the distribution of those characteristics in some larger population. Examples
include generalizations that arrive at categorical propositions (All A's are B's) and
generalizations that arrive at statistical propositions (71% of A's are B's; Most A's are B's; Typical A's are B's). A common form of inductive generalization in AI is called
"concept learning from examples," which may be supervised or unsupervised. Here
the learned concept generalizes the frequencies of occurrence and co-occurrence of
certain characteristics in a sample, with the intention to apply them to a larger general
population, which includes unobserved as well as observed instances.
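As a toy sketch (the representation and the function below are my own illustration, nothing from the chapter), such a generalization can be read off a sample mechanically: tally the frequency of attribute B among the observed A's and project it onto the larger population. This mechanical projection is exactly what, as argued next, cannot by itself distinguish smart generalizations from silly ones.

```python
def generalize(sample):
    """sample: list of (is_A, is_B) pairs for observed individuals.
    Returns a categorical or statistical generalization over the A's."""
    bs = [is_b for is_a, is_b in sample if is_a]
    if not bs:
        return "no A's observed"
    count, total = sum(bs), len(bs)
    if count == total:
        return "All A's are B's"          # categorical generalization
    return f"{round(100 * count / total)}% of A's are B's"  # statistical

# All 5 observed A's were B's -> categorical conclusion.
print(generalize([(True, True)] * 5))
# 7 of 10 observed A's were B's -> statistical conclusion.
print(generalize([(True, True)] * 7 + [(True, False)] * 3))
```

The function projects sample frequencies without any explanatory vetting, which is precisely the gap the abductive analysis below is meant to fill.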
I will argue that it is possible to treat every "smart" (i.e., reasonable, valid, strong)
inductive generalization as an instance of abduction, and that analyzing inductive gen-
eralizations as abductions shows us how to evaluate the strengths of these inferences.
First we note that many possible inductive generalizations are not smart.9
This thumb is mine & this thumb is mine.
Russell's example: a man falls from a tall building, passes the 75th floor, passes the
74th floor, passes the 73rd floor, is heard to say, "so far, so good."
Harman pointed out that it is useful to describe inductive generalizations as abductions because it helps to make clear when the inferences are warranted (Harman, 1965).
Consider the following inference:
All observed A's are B's.
Therefore, all A's are B's.
This inference is warranted, Harman writes, " ... whenever the hypothesis that all
A's are B's is (in the light of all the evidence) a better, simpler, more plausible (and
so forth) hypothesis than is the hypothesis, say, that someone is biasing the observed
sample in order to make us think that all A's are B's. On the other hand, as soon as the
total evidence makes some other competing hypothesis plausible, one may not infer
from the past correlation in the observed sample to a complete correlation in the total
population."
9 I did not invent these examples, but I forget where I got them.
SMART INDUCTIVE GENERALIZATIONS ARE ABDUCTIONS 41
If this is indeed an abductive inference, then "All A's are B's" should explain "All
observed A's are B's." The problem is that, "All A's are B's" does not seem to explain
why "This A is a B," or why A and B are regularly associated (pointed out by (Ennis,
1968)). Furthermore, it is hard to see how a general fact could explain its instances,
because it does not seem in any way to cause them.
The story becomes clearer if we are careful about what precisely is explained and
what is doing the explaining. What the general statement in the conclusion explains are
certain characteristics of the set of observations, not the facts observed. For example,
suppose I choose a ball at random (arbitrarily) from a large hat containing colored
balls. The ball I choose is red. Does the fact that all of the balls in the hat are red
explain why this particular ball is red? No, but it does explain why, when I chose a
ball at random, it turned out to be a red one (because they all are). "All A's are B's"
cannot explain why "This A is a B" because it does not say anything at all about how
its being an A is connected with its being a B. The information that "they all are" does
not tell us anything about why this one is, except that it suggests that if we want to
know why this one is, we would do well to figure out why they all are. Instead, "All A's are B's" helps to explain why, when a sample was taken, it turned out that all of the A's
in the sample were B's. A generalization helps to explain some characteristics of the
set of observations of the instances, but it does not explain the instances themselves.
That the cloudless, daytime sky is blue helps explain why, when I look up, I see the
sky to be blue, but it doesn't explain why the sky is blue. Seen this way, an inductive
generalization does indeed have the form of an inference whose conclusion explains
its premises.
In particular, "A's are mostly B's" together with "This sample of A's was obtained
without regard to whether or not they were B's" explains why the A's that were sam-
pled were mostly B's.
Why were 61% of the chosen balls yellow?
Because the balls were chosen more or less randomly from a population that
was two thirds yellow, the difference from two thirds in the sample being due
to chance.
Alternative explanation for the same observation:
Because the balls were chosen by a selector with a bias for large balls, from a
population that was only one third yellow but where yellow balls tend to be larger
than non-yellow ones.
Core claim: The frequencies in the larger population, together with the frequency-
relevant characteristics of the method for drawing a sample, explain the frequencies
in the observed sample.
What is explained? In this example, just the frequency of characteristics in the
sample is explained, not why these particular balls are yellow or why the experiment
was conducted on Tuesday. In general, the explanation explains why the sample fre-
quency was the way it was, rather than having some markedly different value. If there
is a deviation in the sample from what you would expect, given the population and the
sampling method, then you have to throw some Chance into the explanation (which is
more or less plausible depending on how much Chance you have to suppose).
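The appeal to Chance can be made quantitative in a small sketch; the sample size of 100, the 61 yellow balls, and the candidate population frequencies are my own illustrative assumptions, not the chapter's. The less probable the observed sample frequency is under a hypothesized population and sampling method, the more Chance the corresponding explanation must suppose.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent random draws."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, k = 100, 61  # suppose 61 of 100 sampled balls were yellow

# Explanation 1: (near-)random sampling from a two-thirds-yellow population;
# only the modest shortfall from 2/3 is charged to Chance.
p_two_thirds = binom_pmf(k, n, 2 / 3)

# Explanation 2: a one-third-yellow population with no bias hypothesized;
# the very same observation must charge vastly more to Chance.
p_one_third = binom_pmf(k, n, 1 / 3)

print(p_two_thirds, p_one_third)
```

The first hypothesis has to suppose little Chance and is correspondingly plausible; the second, without some further story such as the size bias above, has to suppose an enormous amount.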
Generalizations may differ in the reference class for the parent frequency (e.g., is it just humans or all multi-celled animals that
use insulin for glucose control?). This is the A class in "All A's are B's." Generaliza-
tions may also differ in the B class, which amounts to differing on the choice of which
attribute to generalize (e.g., 'Crows are black' versus 'Crows are squawky'). This
example shows clearly that sometimes alternative generalizations from the same data
(e.g., crow observations) are not genuinely contrastive; they do not belong together
in contrast sets from which a best explanation may be drawn by abductive reasoning.
It does not make sense to argue in favor of the generalization 'Crows are black' by
arguing against 'Crows are squawky.' This is not simply because they are compati-
ble hypotheses; we have seen earlier that alternative explanations for abduction need
not be mutually exclusive. 'Heart attack' and 'indigestion' are compatible explana-
tions for the pain since it is possible for the patient to have both conditions. But in
this case one may argue for one by arguing against the other; they are genuinely con-
trastive. Perhaps the 'black' and 'squawky' generalizations of the crow observations
are not contrastive because different aspects of the observations are explained by the
generalizations, so they are not alternative ways of explaining the same things.
I am puzzled about what principles govern when explanations in general are gen-
uinely contrastive, and when they are not. Explanations of different causal types, e.g.,
final cause, efficient cause, are not usually contrastive. Yet even within the same causal type, explanations may fail to be contrastive, e.g., whether we blame the death on the murderer or on the heart stoppage.
Peter Flach has suggested that finding the right level of generality is most of the
work in forming good generalizations (mailing-list communication). It is interesting
to note that alternative levels of generality do seem to be genuinely contrastive. Let
us suppose that the attribute to be generalized is fixed, and we want to determine the
best reference class for the parent frequency. For concreteness, let us suppose that
we have noticed (and have carefully recorded data showing that) on average, week-
ends are rainier than weekdays. Our observations are taken in New York. Alternative
generalizations include that weekend days are rainier in New York, on the east coast
of North America, on Earth. These generalizations lead to different predictions and
perhaps have differing levels of plausibility based on background knowledge about
plausible mechanisms. (The best generalization in this case seems to be the east coast
of North America, according to a recent report in Nature.)
We have seen that contrastive alternative generalizations may differ in their degrees
of plausibility, for example by supposing more or less chance or by hypothesizing
kinds of sampling bias that are made more or less plausible by background knowl-
edge. Alternative explanations with alternative generalizations may also differ in other
virtues, such as explanatory power, predictive power, precision (e.g., in hypothesized
frequency), specificity (of reference class), consistency (internal consistency of the ex-
planation), simplicity (e.g., complicated versus simple account of bias), and theoretical
promise (e.g., whether the generalization is both testable and theoretically suggestive).
2.4 CONCLUSION
I have argued that it is possible to treat every "smart" inductive generalization as an instance of abduction, and that analyzing inductive generalizations as abductions shows us how to evaluate the strengths of these inferences.
Acknowledgments
Parts of this essay are adapted from (Josephson and Josephson, 1994, Chapter 1), "Conceptual
analysis of abduction" (used with permission). I especially want to thank Richard Fox for many
helpful comments on an earlier draft of this chapter.
3 ABDUCTION AS EPISTEMIC
CHANGE: A PEIRCEAN MODEL IN
ARTIFICIAL INTELLIGENCE
Atocha Aliseda
3.1 INTRODUCTION
Charles S. Peirce's abductive formulation ((Peirce, 1958, 5.189), reproduced on p.7),
has been the point of departure of many recent studies on abductive reasoning in arti-
ficial intelligence, such as in logic programming (Kakas et al., 1992), knowledge ac-
quisition (Kakas and Mancarella, 1994) and natural language processing (Hobbs et al.,
1990).
Nevertheless, these approaches have paid little attention to the elements of this for-
mulation and none to what Peirce said elsewhere in his writings. This situation may be
due to the fact that his philosophy is very complex and not easily implemented in
the computational realm. The notions of logical inference and of validity that Peirce
puts forward go beyond logical formulations. They are linked to his epistemology, a
dynamic view of thought as logical inquiry. In our view, however, there are several
aspects of Peirce's abduction which are tractable and may be implemented using ma-
chinery of artificial intelligence (AI), such as that found in theories of belief revision.
In this chapter, we propose abduction as an epistemic process for the acquisition of
knowledge and present a model which combines elements from Peirce's epistemology
and theories of epistemic change in AI originating in (Alchourrón et al., 1985). In
particular, our interest is in the role played by the element of surprise in the abductive formulation, and in its connection to the epistemic transition between the states of doubt and belief.
45
P.A. Flach and A. C. Kakas (eds.), Abduction and Induction, 45-58.
© 2000 Kluwer Academic Publishers.
46 A.ALISEDA
which a single best explanation is constructed. While the latter view considers finding
the best explanation as fundamental for abduction (an approach shared by Joseph-
son and Psillos in this volume), abduction understood as the construction of explana-
tions, regards the notion of explanation as more fundamental, as shown in (Aliseda,
1996b; Denecker et al., 1996), Bessant (this volume), and Flach (this volume).
To clear up all these conflicts, which are terminological to a large extent, one might
want to coin new terminology altogether. I have argued for a new term of "explana-
tory reasoning" in (Aliseda, 1996b; Aliseda, 1997), trying to describe its fundamental
aspects without having to decide if they are instances of either abduction or induction.
However, for the purposes of this chapter, rather than introducing new terminology, I
shall use the term 'abduction' for the basic type of explanatory reasoning.
Our focus is on abduction as hypothesis construction. It is a general process of
explanation which is best described by a taxonomy (cf. Section 3.3.2) in which sev-
eral parameters (inference, triggers, outcomes) determine types of abduction. More
precisely, we shall understand abduction as reasoning from a single observation to its
explanations, and induction as enumerative induction from samples to general statements. Therefore, abduction (when properly generalized) includes (some cases of) induction as one of its instances, namely when the observations are many and the outcome is a universal statement.
Peirce proposes abduction to be the logic for synthetic reasoning, a method to acquire new ideas. He was the first philosopher to give abduction a logical form.
However, his notion of abduction is a difficult one to unravel. On the one hand, it is
entangled with many other aspects of his philosophy, and on the other hand, several
different conceptions of abduction evolved in his thought. We will point out a few
general aspects of his theory of inquiry, and later concentrate on some of its more
logical aspects.
1See p.6 for Peirce's classification of inferences into analytic and synthetic.
The development of a logic of inquiry occupied Peirce's thought since the begin-
ning of his work. In the early years he thought of a logic composed of three modes
of reasoning: deduction, induction and hypothesis, each of which corresponds to a
syllogistic form (p.5). Of these, deduction is the only reasoning which is completely
certain, inferring its 'Result' as a necessary conclusion. Induction produces a 'Rule'
validated only in the 'long run' (Peirce, 1958, 5.170), and hypothesis merely suggests
that something may be 'the Case' (Peirce, 1958, 5.171).
Later on, Peirce proposed these types of reasoning as the stages composing a
method for logical inquiry, of which hypothesis (now called abduction), is the be-
ginning:
"From its [abductive] suggestion deduction can draw a prediction which can be
tested by induction". (Peirce, 1958, 5.171)
The notion of abduction is then enriched by the more general conception of: "the pro-
cess of forming an explanatory hypothesis" (Peirce, 1958, 5.171) and the syllogistic
form is replaced by the often-quoted logical formulation (p.7).
For Peirce, three aspects determine whether a hypothesis is promising: it must be
explanatory, testable, and economic. A hypothesis is an explanation if it accounts
for the facts, according to the abductive formulation. Its status is that of a suggestion
until it is verified, which explains the need for the testability criterion. Finally, the
motivation for the economic criterion is twofold: a response to the practical problem of
having innumerable explanatory hypotheses to test, as well as the need for a criterion
to select the best explanation amongst the testable ones.
Moreover, abductive reasoning is essential for every human inquiry. It plays a role
in perception, in which:
"The abductive suggestion comes to us as a flash" (Peirce, 1958, 5.181)
In all this, abduction is both "an act of insight and an inference" as has been claimed
by (Anderson, 1986), who suggests a double aspect of abduction: an intuitive and a rational one.
Interpreting Peirce's abduction. The notion of abduction has puzzled Peirce schol-
ars all along. Some have concluded that Peirce held no coherent view on abduction at
all (Frankfurt, 1958), others have tried to give a joint account with induction (Reilly,
1970) and still others claim it is a form of inverted modus ponens (Anderson, 1986).
A more modern view is found in (Kapitan, 1990), who interprets Peirce's abduction as a form of heuristics. An account that tries to make sense of the two extremes of abduction, both as a guessing instinct and as a rational activity, is found in (Ayim, 1974). This last approach continues to the present day. While (Debrock, 1997) proposes
to reinterpret the concept of rationality to account for these two aspects, (Gorlee, 1997)
shows abductive inference in language translation, a process in which the best possi-
ble hypothesis is sought using instinctive as well as rational elements of translation.
Peirce's epistemic model proposes two varieties of surprise as the triggers for ev-
ery inquiry, which we relate to the previously proposed novelty and anomaly (cf. Sec-
tion 3.3.2). We will see (Section 3.5.1) how these are related to the epistemic opera-
tions for belief change in AI.
• Expansion
A new sentence is added to Θ regardless of the consequences of the larger set to be formed. The belief system that results from expanding Θ by a sentence φ together with the logical consequences is denoted by Θ + φ.
• Revision
A new sentence that is (typically) inconsistent with a belief system Θ is added, but in order that the resulting belief system be consistent, some of the old sentences in Θ are deleted. The result of revising Θ by a sentence φ is denoted by Θ ∗ φ.
• Contraction
Some sentence in Θ is retracted without adding any new facts. In order to guarantee the deductive closure of the resulting system, some other sentences of Θ may be given up. The result of contracting Θ with respect to sentence φ is denoted by Θ − φ.
While expansion can be uniquely and easily defined (Θ + φ = {α | Θ ∪ {φ} ⊢ α}), this is not so with contraction or revision. A simple example to illustrate this point is the following:
3 However, the material of this section is mainly based on (Gärdenfors and Rott, 1995).
Θ: r, r → w.
φ: ¬w.
In order to incorporate φ into Θ and maintain consistency, the theory must be revised. But there are two possibilities for doing this: deleting either of r → w or r allows us to then expand the contracted theory with ¬w consistently. Several formulas can be retracted to achieve the desired effect, thus it is impossible to state in purely logical or set-theoretical terms which of these is to be chosen.
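This point can be made executable in a minimal sketch (the propositional encoding and brute-force checks are my own illustration; real belief-revision systems are far more sophisticated): given Θ = {r, r → w} and incoming ¬w, contracting by either single formula yields a consistent revision, and nothing in the logic itself prefers one over the other.

```python
from itertools import product

ATOMS = ["r", "w"]

# Formulas as (name, truth function over a valuation dict).
R      = ("r",      lambda v: v["r"])
R_TO_W = ("r -> w", lambda v: (not v["r"]) or v["w"])
NOT_W  = ("~w",     lambda v: not v["w"])

def consistent(base):
    """A base is consistent iff some valuation satisfies every formula."""
    for vals in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, vals))
        if all(f(v) for _, f in base):
            return True
    return False

def revision_candidates(base, new):
    """Levi-identity sketch: contract by a single formula, then expand
    with `new`; keep every outcome that is consistent."""
    candidates = []
    for i in range(len(base)):
        contracted = base[:i] + base[i + 1:]
        if consistent(contracted + [new]):
            candidates.append(contracted + [new])
    return candidates

# Two candidates survive: drop r, or drop r -> w.
for cand in revision_candidates([R, R_TO_W], NOT_W):
    print([name for name, _ in cand])
```

Both candidates pass the consistency check, illustrating why an extra-logical criterion such as entrenchment is needed to choose between them.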
Therefore, an additional criterion must be incorporated in order to fix which for-
mula to retract. Here, the general intuition is that changes on the theory should be kept
'minimal', in some sense of informational economy.4
Moreover, epistemic theories in this tradition observe certain 'integrity constraints',
which concern the theory's preservation of consistency, its deductive closure and two
criteria for the retraction of beliefs: the loss of information should be kept minimal
and the less entrenched beliefs should be removed first.
These are the very basics of the AGM approach. In practice, however, full-fledged
systems of belief revision can be quite diverse. They differ in at least three aspects: (a)
belief state representation (sets, bases, possible worlds or probabilities over sentences
or worlds), (b) characterization of the operations of epistemic change (via postulates or
constructively), and (c) epistemological stance. This last aspect concerns the epistemic
quality to be preserved. While the foundationalists argue that beliefs must be justified
(with the exception of a selected set of 'basic beliefs'), the coherentists consider it
a priority to maintain the overall coherence of the system and reject the existence of
basic beliefs.
Therefore, each theory of epistemic change may be characterized by its represen-
tation of belief states, its description of belief revision operations, and its stand on
the main properties of sets of beliefs one should be looking for. These choices may
be interdependent. Say, a constructive approach might favor a representation by be-
lief bases, and hence define belief revision operations on some finite base, rather than
the whole background theory. Moreover, the epistemological stance determines what
constitutes rational epistemic change. The foundationalist accepts only those beliefs
which are justified in virtue of other basic beliefs, thus having an additional challenge
of computing the reasons for an incoming belief. On the other hand, the coheren-
tist must maintain coherence, and hence make only those minimal changes which do
not endanger (at least) consistency (however, coherence need not be identified with
consistency).
In particular, the AGM paradigm represents belief states as sets (in fact, as theories
closed under logical consequence), provides 'rationality postulates' to characterize the
belief revision operations, and finally, it advocates a coherentist view.
4 Various ways of dealing with this issue occur in the literature. I mention only the one in (Gärdenfors, 1988). It is based on the notion of entrenchment, a preferential ordering which lines up the formulas in a belief state according to their importance. Thus, we may retract those formulas which are the 'least entrenched' first. For a more detailed reference as to how this is done exactly, see (Gärdenfors and Makinson, 1988).
for expansion. So, the basic operations for abduction are expansion and revision.5
Therefore, two epistemic attitudes and changes in them are reflected in an abductive
model.
Here, then, are the abductive operations for epistemic change:
• Abductive Expansion
Given an abductive novelty φ, a consistent explanation α for φ is computed in such a way that Θ, α ⇒ φ, and then added to Θ.
• Abductive Revision
Given an abductive anomaly φ, a consistent explanation α is computed as follows: the theory Θ is revised into Θ' so that it does not explain ¬φ. That is, Θ' ⊭ ¬φ, where Θ' = Θ − {γ1, . . . , γl}.6
Once Θ' is obtained, a consistent explanation α is calculated in such a way that Θ', α ⇒ φ, and then added to Θ'.
Thus, the process of revision involves both contraction and expansion.
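The abductive expansion operation can be sketched concretely; the rain/sprinkler theory, the candidate set of abducibles, and the brute-force entailment check below are my own illustrative assumptions, not part of the chapter's proposal. Given a novelty φ not yet entailed by the theory, the sketch returns every abducible α that is consistent with the theory and, together with it, entails φ.

```python
from itertools import product

ATOMS = ["rain", "sprinkler", "wet"]

def holds(base, v):
    """True iff valuation v satisfies every formula in the base."""
    return all(f(v) for _, f in base)

def entails(base, goal):
    """base entails goal, checked over all valuations of ATOMS."""
    return all(goal[1](dict(zip(ATOMS, vals)))
               for vals in product([True, False], repeat=len(ATOMS))
               if holds(base, dict(zip(ATOMS, vals))))

def consistent(base):
    return any(holds(base, dict(zip(ATOMS, vals)))
               for vals in product([True, False], repeat=len(ATOMS)))

def abductive_expansion(theory, phi, abducibles):
    """Return each abducible alpha with theory + alpha consistent and
    theory + alpha entailing phi; phi must be a novelty (not yet entailed)."""
    assert not entails(theory, phi), "phi is not an abductive novelty"
    return [a for a in abducibles
            if consistent(theory + [a]) and entails(theory + [a], phi)]

RAIN_WET = ("rain -> wet", lambda v: (not v["rain"]) or v["wet"])
SPR_WET  = ("sprinkler -> wet", lambda v: (not v["sprinkler"]) or v["wet"])
RAIN     = ("rain", lambda v: v["rain"])
SPR      = ("sprinkler", lambda v: v["sprinkler"])
WET      = ("wet", lambda v: v["wet"])

theory = [RAIN_WET, SPR_WET]
print([name for name, _ in abductive_expansion(theory, WET, [RAIN, SPR])])
```

Here both rain and sprinkler qualify as explanations of the novelty wet, so some further criterion, such as Peirce's economic criterion or a best-explanation selection, would still be needed to choose among them.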
In one respect, these operations are more general than their counterparts in the
AGM theory, since incoming beliefs are incorporated into the theory together with
their explanation (when the theory is closed under logical consequence). But in the
type of sentences to accept, they are more restrictive. Given that in our model non-
surprising facts (where e => <p) are not candidates for being explained, abductive ex-
pansion does not apply to already accepted beliefs, and similarly, revision only accepts
rejected facts. Other approaches however, do not commit themselves to the precondi-
tions of novelty and anomaly that we have set forward. Pagnucco's abductive expan-
sion (Pagnucco, 1996) is defined for an inconsistent input, but in this case the resulting
state stays the same. Lobo and Uzcategui's abductive expansion (Lobo and Uzcategui,
1998) is even closer to standard AGM expansion; it is in fact the same when every
atom is "abducible".
5 Indeed the three belief change operations can be reduced into two of them, since revision and contraction may be defined in terms of each other. In particular, revision here is defined as a composition of contraction and expansion: first contract those beliefs of Θ that are in conflict with φ, and then expand the modified theory with sentence φ (known as 'Levi's identity').
6 In many cases, several formulas and not just one must be removed from the theory. The reason is that sets of formulas which entail (explain) ¬φ should be removed. E.g., given Θ = {α → β, α, β} and φ = ¬β, in order to make Θ, ¬β consistent, one needs to remove either {β, α} or {β, α → β}.
of what we have examined so far, is to stay close to the AGM approach. That is, to
represent belief states as sets (in fact as closed theories), and to characterize the abductive operations of expansion and revision through definitions, rationality postulates
and a number of constructions motivated by those for AGM contraction and revision.
As for epistemological stance, Pagnucco is careful to keep his proposal away from
being interpreted as foundationalist; he thinks that having a special set of beliefs like
the "abducibles", as found in (Lobo and Uzcategui, 1998), is "against the coherentist
spirit of the AGM" (Pagnucco, 1996, p.174).
In our view, an abductive theory for epistemic change which aims to model Peirce's
abduction naturally calls for a procedural approach. It should produce explanations for
surprising phenomena and thus transform a state of doubt into one of belief. The AGM
postulates describe expansions, contractions and revisions as epistemic products rather
than processes in their own right. Their concern is with the nature of epistemic states,
not with their dynamics. This gives them a 'static' flavour, which may not always be
appropriate.
Therefore, we aim at giving a constructive model in which abduction is an epistemic
activity. In (Aliseda, 1997), I propose (an extension of) the logical framework of se-
mantic tableaux as a constructive representation of theories, and abductive expansion
and revision operations that work over them. These tableaux analyze non-deductively
closed finite sets of formulas, corresponding with 'belief bases'. For this chapter, however, I leave the precise technical proposal out of scope. Let me just mention that semantic tableaux provide a very attractive framework, in which expanding a
theory concerns the addition of new formulae to open branches and contraction cor-
responds to deleting formulas by 'opening' closed branches. Besides implementing
the standard account of contraction, which is done by removing complete formulas,
tableaux offer an alternative route. This is done by removing "subformulas", which is
a more delicate kind of minimal change. We thus offer two strategies for contracting
theories, global and local. We also explore the idea of contracting by revising the
language, which seems a more realistic way to account for inconsistencies, as often
people resolve anomalies by introducing distinctions in the language, rather than by
deleting and expanding their background theory. Moreover, in tableaux consistency is
handled in a natural way and its structure provides a way in which an entrenchment
order is easily represented.
As for epistemological stance, here is our position. The main motivation for an ab-
ductive epistemic theory is to incorporate an incoming belief together with its explana-
tion, the belief (or set of) that justifies it. This fact places abduction close to the above
foundationalist line, which requires that beliefs are justified in terms of other basic be-
liefs. Often, abductive beliefs are used by a (scientific) community, so the claim that
individuals do not keep track of the justifications of their beliefs (Gärdenfors, 1988)
does not apply. On the other hand, an important feature of abductive reasoning is
maintaining consistency of the theory. Otherwise, explanations would be meaningless
(especially if ⇒ is interpreted as classical logical consequence; cf. (Aliseda, 1997)).
Therefore, abduction is committed to the coherentist approach as well. This is not
a case of opportunism. Abduction rather demonstrates that the earlier philosophical
stances are not incompatible. Indeed, (Haack, 1993) argues for an intermediate stance
the traditional view of abduction in AI, to include cases in which the observation is in
conflict with the theory, as we have suggested in our taxonomy (Section 3.3.2).
Regarding the connection to other work, it would be interesting to compare our
approach with that mentioned which follows the AGM line. Although this is not as
straightforward as it may seem (cf. (Aliseda, 1997)), we could at least check whether
our algorithmic constructions of abductive expansion and revision validate the abduc-
tive postulates found in (Pagnucco, 1996) and (Lobo and Uzcategui, 1998).
Acknowledgments
I am grateful to Michael Hoffmann and Raymundo Morado for discussions on several points of
this chapter, and to Samir Okasha for very helpful comments and suggestions to this chapter.
4
ABDUCTION: BETWEEN
CONCEPTUAL RICHNESS AND
COMPUTATIONAL COMPLEXITY
Stathis Psillos
4.1 INTRODUCTION
The aim of this chapter is two-fold: first, to explore the relationship between abduction
and induction from a philosophical point of view; and second, to examine critically
some recent attempts to provide computational models of abduction. Induction is
typically conceived as the mode of reasoning which produces generalisations over
domains of individuals based on samples. Abduction, on the other hand, is typically
seen as the mode of reasoning which produces hypotheses such that, if true, they would
explain certain phenomena or evidence. Recently there has been increasing interest in the issue of how exactly, if at all, they are related. There seem to be two main problems: first, whether or not induction and abduction are conceptually distinct
modes of reasoning; second, whether or not they can be modelled computationally in
the same, or similar, ways. The second issue is explored in some detail by several
chapters in this collection (e.g. the contributions by Aliseda, Mooney and Poole). The
first issue is what the present chapter will concentrate on. My suggestion will be that
abduction is the basic type of ampliative reasoning. It comprises as special cases both
Induction and what the American philosopher Charles Peirce called "the Method of
Hypothesis".
In order to motivate and defend my thesis, I proceed as follows. Section 4.2 de-
scribes the basic logical features of ampliative reasoning. Section 4.3 takes its cue
from Peirce's distinction between Induction and Hypothesis and raises the following
question: should the fact that Induction and Hypothesis admit different logical forms
59
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 59-74.
© 2000 Kluwer Academic Publishers.
be taken to indicate that they are conceptually distinct modes of ampliative reasoning?
I answer this question negatively and defend the view that Induction and Hypothesis
are very similar in nature: they are instances of what can be called "explanatory rea-
soning", where explanatory considerations govern the acceptance of the conclusion.
So, I suggest that explanatory reasoning is a basic type of ampliative reasoning, irre-
spective of the specific logical forms it may admit. In Section 4.4, I describe abduction
as the basic type of explanatory reasoning. I suggest that it should be best understood
as Inference to the Best Explanation. In particular, I deal with three problems. First,
how abduction can acquire an eliminative-evaluative dimension; second, how abduc-
tion can produce likely hypotheses; and third, what the nature of explanation is. These
are still open issues and what this chapter aims to do is motivate some ways to address
them. Finally, Section 4.5 discusses some recent computational models of abduction
and notes that there seems to be an inherent tension in the project of modelling abduc-
tion. Simple models of abduction are computationally tractable, but fail to capture the
rich conceptual structure of abductive reasoning. And conversely, conceptually rich
models of abduction become computationally intractable.
1All references to Peirce's work are given in the standard form and refer to the relevant volume and para-
graph of his collected papers.
constitutive difference from explicative reasoning. The latter is not defeasible, since
the addition of further information in the premises of a logically valid argument would
not affect the derivation of the original conclusion. When it comes to ampliative rea-
soning, further evidence, which does not affect the truth of the premises, can render
the conclusion false. Take for instance the simple inductive argument: 'All hitherto
observed swans have been white; so, all swans are white'. The observation of a black
swan falsifies its conclusion, without contradicting its premises. Given its defeasi-
bility, one may wonder why ampliative reasoning should be accepted as a legitimate
type of reasoning in the first place. The reason for this is that explicative reasoning
is not concerned with one of the basic aspects of reasoning, viz., how it is reasonable
for someone to form and change their system of beliefs, or the information they hold
true. All that explicative reasoning dictates is that since a certain conclusion logically
follows from a set of premises, its likelihood of being true is at least as great as the
likelihood of the premises being true, and it will remain so when further premises are
added. But this is too thin. Judgements as to whether the conclusion, or the premises,
are probable enough, or even plausible at all, to be accepted fall outside the province of
explicative reasoning. When, for instance, the conclusion of an explicative argument
is not acceptable to a reasoner, at least one of the premises should have to go (or the
integrity of the derivation may be challenged). But explicative reasoning on its own
cannot tell us which premise should go. This requires reasoning based on some con-
siderations of plausibility, and only ampliative reasoning can tell the reasoner what to
count as plausible and what not, given the information available. In order, however, to
avoid a possible misunderstanding, the following should be stressed. There is nothing
wrong with the claim that it is reasonable to accept a statement which logically follows
from other premises accepted by a reasoner. Rather, what needs to be emphasised is
that a) what makes premises acceptable in the first place is some sort of ampliative rea-
soning which renders them plausible, or reasonable, given the evidence available; and
b) if the conclusion of a deductive argument is not acceptable, explicative reasoning
alone cannot tell the reasoner where to revise.
Such opening remarks lead us directly to the problem of justification of ampliative
reasoning: given that ampliative reasoning is not necessarily truth-preserving, how
can it be justified? This is Hume's problem, for although David Hume first raised it for
induction, his challenge concerns ampliative reasoning in general. His point hinges
on the fact that ampliative reasoning is defeasible. Since, the Humean challenge goes,
the premises of an ampliative argument do not logically entail the conclusion, there
are possible worlds in which the premises are true and the conclusion false. How then,
the challenge goes on, can we show that the actual world is one of the possible worlds
in which whenever the premises of an ampliative argument are true, its conclusion is
also true? Or even, how can we show that in the actual world most of the times in
which the premises of an ampliative argument are true, the conclusion is also true?
The Humean challenge is precisely that the only way to do this is bound to presuppose
that ampliative reasoning is rational and reliable; hence, it is bound to beg the question.
What the Humean challenge is taken to suggest is that the premises of an amplia-
tive argument cannot confer warrant or rational support on its conclusion. This is
the central philosophical issue concerning ampliative reasoning. Any substantial de-
fence of the rationality of ampliative reasoning should either solve or dissolve Hume's
problem. Yet, this is not the place to deal with this philosophical problem. Instead,
this chapter will concentrate on another problem, which needs to be dealt with inde-
pendently of the problem of justification. It is the descriptive problem: what is the
structure of ampliative reasoning? No-one, including Hume himself (save perhaps Popper),
denies that humans are engaged in ampliative inferential practices. What exactly these
inferential practices involve, and whether or not they admit specific logical forms, are
issues worth looking into. One may call the descriptive problem Peirce's problem,
since Peirce was, arguably, the first who tried to address it systematically.
where the premises are a particular known fact (a is B) and a generalisation (All A's
are B), while the conclusion is a particular hypothesis (that a is A).
The fact that argument-patterns I and H have different logical forms suggests that
there may well be two different and distinct types of ampliative reasoning. While the
argument-pattern I clearly characterises the logical form of the intuitive more-of-the-
same rule of induction, the argument-pattern H is more difficult to characterise. Peirce
called "Hypothesis" (or the "method of hypothesis") the mode of reasoning which cor-
responds to H. It can be illustrated by using Peirce's own example: given the premises
"All the beans from this bag are white" and "These beans are white", one can draw the
hypothetical conclusion that "These beans are from this bag" (2.623). Peirce seems
to have thought that the argument-patterns H and I correspond to two distinct modes
of ampliative reasoning, since he noted that "induction classifies, whereas hypothesis
explains" (2.636). As he put it: "Induction is where we generalise from a number of
cases of which something is true, and infer that the same thing is true of a whole class.
(...) Hypothesis is where we find some very curious circumstance, which would be
explained by the supposition that it was a case of a certain general rule, and thereupon
adopt that supposition" (2.636). However, scholars of his work, most notably (Fann,
1970, pp. 22-23), suggest that he was not prepared to separate sharply the two forms
of inference, but that he conceived of induction and hypothesis as occupying opposite
2A similar point is made by John Josephson in his chapter in the present volume.
3Gilbert Harman (Harman, 1965) has also emphasised this point.
4.4 ABDUCTION
In his famous characterisation of abduction, Peirce described "abduction" as the rea-
soning process which proceeds as follows: "The surprising fact C is observed. But if
A were true, C would be a matter of course. Hence, there is reason to suspect that A is
true" (5.189). We can easily see that this process can underlie both argument-patterns
I and H. Suppose that the surprising fact is the particular fact that a is B. This is ex-
4 With one possible exception, viz., the predictive inference, as in the case of next-instance induction. There,
we move from n observed A's being B to conclude that the next A is going to be B. The conclusion is a
singular statement and is clearly non-explanatory. But one may think of predictive inference as parasitic on
the implicit generalisation "All A's are B".
plained by saying that if a is A and All A's are B, then a is expected to be B. This piece
of reasoning is nothing but an instance of the argument-pattern H. Suppose now that
we allow the surprising fact to be the correlation of two properties A and B in a sample
of individuals. Then, we can explain this by saying that the sample is the way it is
because All A's are B. This is an instance of the argument-pattern I. So, abduction can
incorporate both I and H, and therefore can lead to the generation of generalisations
no less than to the generation of hypotheses stating particular facts. Accordingly, I
shall reserve the term abduction for explanatory reasoning in general and suggest that
abduction comprises both argument-patterns I and H. This may tally with what Peirce
himself thought of abduction, when in his later years he introduced abduction as a distinct
type of reasoning. But I refrain from engaging in interpretative work.5 Instead, I will
try to characterise more precisely what exactly is involved in abduction as a reasoning
process, drawing on some of Peirce's thoughts and suggestions and pointing out some
open problems.
Three, I think, are the big problems that any precise characterisation of abduction
faces. The first is what I will call the "multiple explanations problem". The second
concerns the connection between the reasoning process behind abduction and the
likelihood of the hypotheses that it generates. The third is the nature of explanation
itself. Let me consider them in turn.
5The interested reader should look at the Introduction of this book for a description of Peirce's account
of abduction. More relevant literature includes (Burks, 1946; Hanson, 1965; Fann, 1970; Thagard, 1981;
Flach, 1996a).
6Using background beliefs to give a hypothesis a certain place in the order of preference is going to influ-
ence the likelihood of the hypothesis, and hence its acceptability, since background beliefs themselves are,
typically, supported by evidence to some degree.
far as possible, simple; d) have unifying power;7 e) are more testable, and especially,
are such that they entail novel predictions.8 These factors are not algorithmic in charac-
ter, but this does not mean that one cannot decide, on their basis, which hypothesis
should be ranked highest. In fact, in most typical cases, these factors will lead to a
definite conclusion, be it about medical diagnosis or car mechanics, or what have you.
So, for instance, a diagnostician, pretty much like a good car mechanic, will look for
a hypothesis about the cause of symptoms such that: it accounts, if possible, for all
the symptoms; it is consonant with background knowledge as to what types of causes
produce these symptoms; it avoids, in the first instance, attributing the symptoms to
multiple causes; it can yield further predictions that can be tested (e.g., that the patient
will recover if they take a certain medicine which acts on the cause of the symptoms).
It's not implausible to think that although virtually never do we go through all such
factors explicitly when we are engaged in abductive reasoning, all these factors have
nonetheless been internalised by a good reasoner who, then, applies them implicitly
in the case at hand. The internalisation of these factors may well be what Peirce called
"good sense" (7.220). In most typical cases, an explicit reconstruction of the reason-
ing process will reveal the implicit reliance on such factors. Similarly, in most typical
cases, the product of the reasoning will be just one hypothesis which is ranked as most
plausible. But when there is more than one (e.g., the light does not come on; is it
because the light-bulb is gone; because the fuse is blown; or because of a power-cut?),
the reasoning process itself contains obvious resources which will lead to adjudica-
tion. To the extent that the application of these evaluative-ranking criteria marks the
degree of goodness of a hypothesis, it is reasonable to say that abduction is nothing
but what (Harman, 1965) has called "Inference to the Best Explanation". According to
this mode of reasoning, a hypothesis H is accepted on the basis that a) it explains the
evidence and b) no other hypothesis explains the evidence as well as H does. So, not
only is there a reasoning process which underlies abduction, but also this reasoning
process has a certain logical, though not algorithmic, structure.
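The evaluative-ranking structure just described can be made concrete in a small sketch. This is not the author's algorithm: the scoring keys, the hypothesis names, and the scores are all illustrative inventions, intended only to show how factors like coverage of the symptoms, consonance with background knowledge, and avoidance of multiple causes could jointly rank candidate explanations without being "algorithmic" in any deep sense.

```python
# A minimal, illustrative sketch of Inference to the Best Explanation as
# a ranking procedure. The keys 'covers', 'licensed' and 'num_causes'
# are hypothetical encodings of three of the factors listed in the text:
# accounting for the symptoms, consonance with background knowledge,
# and not attributing the symptoms to multiple causes.

def ibe_rank(hypotheses):
    """Rank candidate explanations, best first."""
    def score(h):
        return (len(h['covers']),   # account for more of the evidence
                h['licensed'],      # prefer hypotheses licensed by background knowledge
                -h['num_causes'])   # prefer fewer independent causes
    return sorted(hypotheses, key=score, reverse=True)

# The light-does-not-come-on example from the text, with made-up data:
candidates = [
    {'name': 'bulb gone',  'covers': {'no light'},            'licensed': True, 'num_causes': 1},
    {'name': 'power cut',  'covers': {'no light', 'tv dead'}, 'licensed': True, 'num_causes': 1},
    {'name': 'two faults', 'covers': {'no light', 'tv dead'}, 'licensed': True, 'num_causes': 2},
]
best = ibe_rank(candidates)[0]['name']   # -> 'power cut'
```

On this toy data the power-cut hypothesis wins because it covers both observations with a single cause; the point of the sketch is only that such criteria can yield a definite ranking even though they are not themselves derived from a calculus.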
by the facts is what I call abduction" (7.202). The question is: how can abduction be
this process? How, that is, can abduction render the chosen hypothesis likely?
For a plausible solution to this problem we may take our cue from the later Peirce's
suggestion that abduction should be seen as part and parcel of the method of enquiry
(cf. 7.202ff.). So, the reasoning process that underlies abduction should be embedded
in a more general framework of inquiry so that the hypotheses generated and evalu-
ated by abduction can be further tested. The result of this testing is the confirmation
or disconfirmation of the hypothesis which, naturally, affects its likelihood to be true.
We should therefore conceive of abduction as the first stage of the reasoner's attempt
to add reasonable beliefs into his belief-corpus in the light of new phenomena or ob-
servations. The process of generation and ranking of hypotheses in terms of plausibil-
ity (abduction) is followed by the derivation of further predictions from the abduced
hypotheses. Insofar as these predictions are fulfilled, the abduced hypothesis gets con-
firmed. Peirce himself thought that the process of generating predictions is deductive
and came to call "Induction" the testing of these predictions, and hence the process of
confirming the abduced hypothesis (cf. 7.202ff). 9 Leaving once again aside some im-
portant interpretative issues, I make the following use of Peirce's idea: although a
hypothesis might be reasonably accepted as the most plausible hypothesis based on
explanatory considerations (abduction), the degree of confidence in this hypothesis is
tied to its degree of subsequent confirmation. The latter has an antecedent input, i.e.,
it depends on how good the hypothesis is (i.e., how thorough the search for other po-
tential explanations was, how plausible a potential explanation the one at hand is, etc.),
but it also crucially depends on how well-confirmed the hypothesis becomes in light
of further evidence. So, abduction can return likely hypotheses, but only insofar as it
is seen as an integral part of the method of inquiry, whereby hypotheses are further
evaluated and tested.
4.4.3 Explanation
As we have already seen, in his famous characterisation of abduction, Peirce noted
that the abduced hypothesis makes the surprising fact to be explained a matter of
course. This reference to matter of course is not accidental. It suggests that the ex-
planatory hypothesis should be such that it removes the surprise from the occurrence
of the explanandum. But, although it is certainly part of an explanation that it renders
the explanandum non-surprising, what needs to be added is in exactly what ways the
explanandum is rendered non-surprising. And although it is intuitively clear that to
explain an explanandum-event is to provide information about its causal history, there
is substantive disagreement over how exactly we should understand this last claim.
Explanation is effected by pointing to some causal-nomological connections between
the explanandum and the fact that is called upon to do the explaining. But the nature
of these causal-nomological connections is under heavy dispute.
9 "But if [abduction is] to be understood to be a process antecedent to the application of induction, not
intending to test the hypothesis, but intended to aid in perfecting that hypothesis and making it more definite,
this proceeding is an essential part of a well-conducted inquiry" (7.114). And "Induction is a process for
testing hypotheses already in hand. The induction adds nothing" (7.217).
Two are the important points of dispute. The first centres around how exactly
the explanatory connection is to be understood. Some philosophers (e.g. (Hempel,
1965; Kitcher, 1981)) argue that explanation proceeds via derivation. They claim that
explanations are, essentially, arguments such that an event-type P explains an event-
type Q iff (a description of) the explanandum-event logically follows from a set of
premises which essentially involve (a description of) P. The well-known Deductive-
Nomological account of explanation is an instance of this approach. What's typical
of this approach is that causal order follows from (instead of being presupposed by)
explanatory-derivational order: what causes what is settled after we have settled the
question of what explains (in a derivational sense) what. Opposite to the above approach
is the view (advocated, among others, by (Salmon, 1984b)) that explanations are not
arguments. Instead, they should characterise the causal mechanisms that bring about
the explanandum-event, irrespective of whether (descriptions of) these mechanisms
can be captured in the premises of an argument whose conclusion is (a description of)
the explanandum-event. 10 The second (related) dispute focuses on the role of laws in
explanation. On one approach, laws and reference to nomological connections are an
essential part of an explanation, whereas on another view, causal stories can be complete
even though they make no reference to laws, or even though there may be no relevant
laws to refer to. These issues are in the forefront of the current philosophical debate.
So, here I will not try to examine them further (but cf. (Salmon, 1990)). It should be
enough to keep in mind two things. First, it is still an open issue what exactly an ex-
planation is. Second, whatever the explanatory relation is taken to be in its details, its
connection with causal and/or nomological information about the explanandum-event
and its function as a surprise-remover should be pretty uncontroversial. I think the best
way to capture the latter function is to point out that explanation is typically linked
with improving our understanding of why an event happened and that improvement of
understanding occurs when we succeed in showing how an event can be made to fit in
the causal-nomological nexus of things we accept. We remove the surprise of the oc-
currence of an event if we show that the acceptance of certain explanatory hypotheses,
and their incorporation into our belief-corpus, helps to include the explanandum-event
into this corpus. Schematically, if BK is this belief corpus, e is the explanandum-event
and T is a potentially explanatory hypothesis, then T should be accepted as a potential
explanation of e if BK alone cannot explain e, but BK ∪ T explains e.
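The schematic acceptance condition just stated can be sketched computationally. The encoding below is an illustration, not the author's: entailment is approximated by forward chaining over propositional Horn rules, and the facts and rule are invented for the example.

```python
# A sketch of the condition in the text: T is a potential explanation of
# e if BK alone cannot explain e, but BK together with T can. "Explains"
# is approximated here by derivability under Horn rules, each rule a
# (premises, conclusion) pair. The rain/grass example is illustrative.

def closure(facts, rules):
    """Forward-chain Horn rules over a set of facts to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def potential_explanation(bk, rules, t, e):
    """True iff BK alone does not yield e, but BK plus T does."""
    return e not in closure(bk, rules) and e in closure(bk | {t}, rules)

rules = [(('rain',), 'grass_wet')]
# 'rain' explains 'grass_wet' relative to a BK that does not already yield it:
assert potential_explanation({'cloudy'}, rules, 'rain', 'grass_wet')
```

Note that on this condition a hypothesis is rejected when BK already accounts for e on its own, which matches the requirement that T do genuine explanatory work.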
To sum up, abduction, conceived as Inference to the Best Explanation has a rather
definite logical structure: it is the reasoning process in which the reasoner generates
and evaluates a number of potentially explanatory hypotheses, in the light of back-
ground knowledge. Judging the plausibility of each of them, and ranking them ac-
cordingly, is precisely the respect in which abduction is evaluative. The degree of
is more to causal explanation than can be captured by the DN-pattern. The DN-pattern is symmetric, but
causation is not. For instance, one can explain the length of the shadow of a flagpole in a DN-fashion by
constructing an argument whose premises are general laws about the propagation of light and particular
conditions about the height of the flagpole. Yet, one can use the length of the shadow as the initial condition
and DN-explain the height of the flagpole by reversing the above DN-argument. The latter DN-derivation
cannot count as a genuine explanation because it does not respect the relation of cause and effect.
confidence in the chosen hypothesis, however, is a matter of how well the hypothesis
will stand up to further testing.
11 For a recent survey of the role of abduction in AI, see (Konolige, 1996).
consistent with KB, but they may stand in a stronger relation to it (e.g., they are made
likely by KB). Notice also that the required relation cannot be entailment of the H's by
KB, unless one is willing to accept only mutually consistent hypotheses as potential
explanations of O. If all potential H's are entailed by KB, then they have to be mutu-
ally consistent. However, it is essential for abduction to be able to deal with mutually
inconsistent explanatory hypotheses. Fourth, knowledge assimilation is typically ab-
ductive, but it is a much more complicated process than the one characterised by (i)
and (ii) above. Here, let me only stress that requirement (ii) above can be too re-
strictive. It may well be the case that the assimilation of a datum O requires extensive
modification of the existing KB in such a way that the adopted H is inconsistent with
the existing KB, although, of course, the new KB' which includes H should be inter-
nally consistent. Fifth, the explanation of the datum O need not be deductive. It may
well be the case that O does not logically follow from H and KB, but that still KB ∪ H
explains O, by showing how O was to be expected (e.g., by showing how KB ∪ H makes
O likely, or more likely than not-O).
An adequate computational model of abduction should be able to deal with such
problems. That is, it should incorporate these features into the computation. Naturally,
there should be a trade-off between the need to set up a conceptually adequate model
and the need for the model to be computationally tractable. But at least computational
modelling should aim to characterise as adequately as possible the rich conceptual
structure of an abductive problem.
LP-theorists have attempted to improve on F above. There are three main ways in
which F has been improved. First, the logical space from which H's are drawn com-
prises a set A of domain-specific hypotheses, called abducibles. In the toy-example
above the abducibles are {rain last night; sprinkler was on}. In the event calculus
the set of abducibles is a set of events (or event-predicates of the form happens(E))
which are abduced to hold at a time t1 in order to explain how a property P holds at
t2 (t2 > t1) (cf. (Shanahan, 1989)). Second, the updates of KB are subjected to a set
of integrity constraints (IC), i.e., a set of meta-rules which specify which changes of
the KB are not allowed and, therefore, specify which abducibles are not acceptable.
In the event calculus, an IC is such that a property P cannot hold at time t2 even though
it held at time t1 (t2 > t1) if there was an event that terminated P at a time t12, where
t2 > t12 > t1. Third, the abduced hypothesis must be minimal, i.e., it must not be de-
composable into two others each of which could on its own explain the datum. So, F
above gives way to the following schema (F'): an abductive problem is characterised
by the triple <KB, A, IC> and it consists in searching for a minimal H such that (i')
KB ∪ H ⊨ O and (ii') KB ∪ H satisfies IC.
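The schema F' can be given a small generate-and-test sketch. The encoding is illustrative, not taken from the logic-programming literature it summarises: entailment is approximated by Horn-rule forward chaining, integrity constraints are arbitrary predicates over the derived facts, and minimality is approximated by returning only the smallest explanations found. The rain/sprinkler abducibles come from the toy example in the text.

```python
# A sketch of F': given <KB, A, IC>, search for minimal H ⊆ A such that
# KB ∪ H derives the observation and satisfies the integrity constraints.
# Smallest candidate sets are tried first, so the solutions returned are
# of minimum size (an approximation of subset-minimality).

from itertools import combinations

def closure(facts, rules):
    """Forward-chain Horn rules ((premises, conclusion) pairs) to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

def abduce(kb_facts, rules, abducibles, ics, obs):
    """Return the smallest sets H of abducibles that explain obs and satisfy ics."""
    solutions = []
    for k in range(1, len(abducibles) + 1):
        for h in combinations(sorted(abducibles), k):
            derived = closure(kb_facts | set(h), rules)
            if obs in derived and all(ic(derived) for ic in ics):
                solutions.append(set(h))
        if solutions:            # stop at the first size that yields explanations
            return solutions
    return solutions

# The toy example from the text: rain or the sprinkler explains wet grass.
rules = [(('rain',), 'grass_wet'), (('sprinkler',), 'grass_wet')]
hs = abduce(set(), rules, {'rain', 'sprinkler'}, [], 'grass_wet')
# both singleton hypotheses come back as minimal explanations
```

An integrity constraint can then prune candidates: passing, say, `lambda d: 'sprinkler' not in d` (the sprinkler is known to be off) leaves only the rain explanation. This mirrors the role of IC in F' as meta-rules ruling out unacceptable abducibles.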
There is no doubt that F' is on the right track. But its own limitations suggest some
very general problems with the computational approach. Two such problems stick out.
First, although it should be clear that the specification of a set of abducibles is nec-
essary for any computational model of abduction, it is doubtful that this specification
can be achieved by syntactic-computational resources alone. An abductive problem
is not merely the search for an explanation H of a datum O such that KB ∪ H ⊨ O.
Rather, it is the search for an explanation of a particular type, one that the background
information suggests is relevant to understanding why O occurred. When,
for instance, the computer does not come on we look for blown fuses, power-cuts, or
internal failures, but we do not look for astral influence, or for who switched it off
last time etc. Some hypotheses are relevant while others are not. The first should be
properly called abducible. Yet, these judgements of relevance - and hence the spec-
ification of the appropriate type of abducibles - are not syntactic, although once in
place, they may admit a certain logical-computational form. 12 The second problem
with F' is that it does not yet have built into it the required preferential structure. As it
stands it simply seems to lack the resources to rank abducibles in some order of prefer-
ence. The requirement of minimality says that of two mutually consistent abducibles
the minimal should be preferred, but as it stands it applies only to abducibles with a
definite logical structure, e.g., p and p&q. It does not say, for instance, among two
or more mutually inconsistent abducibles which one should be preferred. Nor does it
say among two equally simple, but mutually consistent hypotheses, which should be
chosen. In a nutshell, F' does not yet capture the rich structure of abductive reasoning.
LP-theorists have developed several techniques to deal especially with the multi-
ple explanations problem. 13 At this stage, however, they are sets of heuristics which
are not fully incorporated into the computational framework of abductive reasoning.
According to Michalski in (Michalski, 1993, p.120), however, the computational char-
acterisation of abduction need not capture its preferential structure. He suggests that
what needs to be formalised is the process of generating (creating) an explanation, not
the evaluation of which explanation is the best. Michalski's abstract model conceives
of abduction as "reversed deduction", or as he puts it, as tracing backwards an im-
plication rule: where in deduction the reasoner looks for conclusions of the premises
she already accepts, in abduction she looks for the premises that, if true, would entail
a certain conclusion she already accepts. (Hence, Michalski's abduction looks very
much like early Peirce's Hypothesis.)
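Michalski's "reversed deduction" can be sketched as a naive backward trace over implication rules. The encoding is illustrative: the second, "painted" rule is a hypothetical alternative added for contrast, and the point of the sketch is the one made below, that without a goodness measure the trace returns the absurd candidates alongside the sensible ones.

```python
# A sketch of abduction as "reversed deduction": given implication rules
# (premises, conclusion) and a conclusion the reasoner already accepts,
# return every premise set that would, if true, entail it.

def trace_backwards(rules, conclusion):
    """All premise sets of rules whose conclusion matches."""
    return [set(premises) for premises, concl in rules if concl == conclusion]

# Michalski's pencil example from the text, plus a hypothetical rule:
rules = [
    (('my pencil is grass', 'everything that is grass is green'),
     'my pencil is green'),
    (('my pencil was painted green',), 'my pencil is green'),
]
candidates = trace_backwards(rules, 'my pencil is green')
# the "grass" premises come back as a candidate explanation alongside
# the sensible one: unconstrained reversal does not discriminate
```

The sketch makes vivid why generation alone underdetermines explanation: both premise sets entail the accepted conclusion, so something beyond the backward trace must rank them.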
Michalski is quite right in stressing that the determination of the best among a set
of alternative explanations is not always easy. Abductive reasoning will not always
rank hypotheses in such a way that one, and only one, comes out the best. But his
objection cuts much deeper than that. He thinks that the logical properties of abduc-
tive reasoning do not depend on any measure of the goodness of an explanation. It
should be clear, however, that if abduction was just what Michalski thinks, then ab-
duction would generate an infinity of crazy explanations. Michalski's own example
is instructive. Envisage a case in which one wants to explain why one's pencil is
green. Abduction could easily trace backwards the following implication: {My pencil
is grass; Everything that is grass is green; Therefore, my pencil is green}. The result
12 (Console et al., 1991b, p.668) give a syntactic characterisation of abducibles as follows: the abducible
symbols are exactly those not occurring in the head of any clause in the theory. This characterisation works,
however, only in the limited case in which the explanatory hypothesis is already included in the background
theory. If the explanation is to be sought outside the theory and is such that, together with the theory, it
explains the datum, then Console et al. need to explicitly introduce a set of abducibles (cf. p.676). It should
then be clear that the specification of this set cannot be made syntactically.
13 See (Kakas et al., 1997) and (Evans and Kakas, 1992). Evans and Kakas use the notion of corroboration
to select explanations. But it should be clear that the notion of corroboration is not related to the search for
explanations but rather to the degree of confidence in the chosen explanation. Corroboration is more akin
to Peirce's later use of induction than to his abduction.
would be that the reasoner might consider as an explanation of the fact that the pencil
is green that it is grass. It is precisely because the computational characterisation of
abduction should avoid such trivialities that some measure of goodness of the abduced
hypotheses should be incorporated in it.
Michalski does, after all, build into his abstract model some measure of goodness of
a potential explanation. He makes abduction dependent on some estimation of the like-
lihood of what he calls a "mutual implication". According to his suggestion, whether
or not a hypothesis of the form "All A's are B" is a good explanation depends on the
backward strength of the converse implication: "All B's are A". If it is likely that 'If
something is a B, then it is also an A' then, upon finding a B we may conclude that it
is also an A. So, on this suggestion, inferring from "a is B" and "All A's are B" that
probably "a is A" depends on how likely it is that "All B's are A". If it is very likely,
then the reasoner may accept that a is A, but not otherwise. In the example above,
it would be silly to infer that my pencil is grass because the reversed implication "If
something is green, then it is grass" is not at all likely. Since there is not much space at
present to evaluate Michalski's theory properly, the only point I will stress is that his
suggested measure of goodness does not depend on explanatory considerations. His
suggestion amounts to the claim that the likeliest hypothesis should be chosen. This
is a sound piece of advice, if we already know which is the likeliest hypothesis. But
if we do know that, then there is no reason to generate any other than the likeliest
hypothesis.
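Michalski's "backward strength" criterion, as described above, can be sketched by estimating the likelihood of the converse implication from a sample. The sample, the predicates, and the threshold below are all made up for illustration; Michalski's own estimation procedure is not reproduced here.

```python
# A sketch of the backward-strength test: infer "a is A" from "a is B"
# and "All A's are B" only if the converse "All B's are A" is likely.
# The likelihood is estimated, illustratively, as the observed fraction
# of B-instances that are also A-instances.

def backward_strength(sample, is_a, is_b):
    """Fraction of B-instances in the sample that are also A-instances."""
    bs = [x for x in sample if is_b(x)]
    return sum(1 for x in bs if is_a(x)) / len(bs) if bs else 0.0

def abduce_membership(sample, is_a, is_b, threshold=0.9):
    """Accept 'a is A' from 'a is B' only if backward strength is high."""
    return backward_strength(sample, is_a, is_b) >= threshold

# In a made-up sample of green things, most are not grass, so the silly
# pencil-is-grass inference is (rightly) blocked:
sample = ['grass', 'green pencil', 'green leaf', 'green car']
is_green = lambda x: True          # everything in this sample is green
is_grass = lambda x: x == 'grass'
assert not abduce_membership(sample, is_grass, is_green)
```

This makes the objection in the text concrete: the measure blocks absurd conclusions, but it is a statistical criterion on the converse implication, not an explanatory one.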
Bylander and his collaborators in (Bylander et al., 1991) have aimed to offer a
computational model of abduction which captures its evaluative element. According
to them, an abduction problem is a tuple <Dall, Hall, e, pl> where: Dall is a finite set
of all the data to be explained; Hall is a finite set of all the individual hypotheses; e is
a map from all subsets of Hall to subsets of Dall; pl is a map from subsets of Hall to a
partially ordered set representing the plausibility of various hypotheses. In this model,
an explanation is a set of hypotheses H such that H is complete and parsimonious, i.e.,
such that e(H) = Dall and there is no proper subset H' of H such that e(H') = Dall.
The best explanation is the H with the highest place in the plausibility ordering. Let's
call this model J.
There are clear senses in which J is an improvement over F' and over Michalski's
model. Its most distinctive improvements are a) that it is not built into the model that
an explanation should be a deductive argument and b) that potential explanations are
ordered in terms of plausibilities. Allowing for an initial plausibility ordering takes
account of the way in which background information and explanatory considerations
may affect the trustworthiness of a hypothesis. In the case of medical diagnosis, where
J has been applied, the plausibility ordering suggests, for instance, that not all hypothe-
ses concerning the causes of a set of symptoms are equally licensed by background
information. The plausibility ordering is also helpful from a technical point of view.
Given that J conceives of explanation as a function from subsets of H_all to subsets of
D_all, it should be clear that there will, normally, be a large number of such potential
explanations. If they are ranked in terms of plausibility, then some of them will be
deemed implausible and will not be further entertained.
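Under simplifying assumptions, the complete-and-parsimonious explanations of J can be enumerated directly. In this sketch the map e is taken to be the union of the data each individual hypothesis explains (hence monotone), and pl is flattened to a numeric plausibility per individual hypothesis; all hypothesis and datum names are hypothetical.

```python
# A toy rendering of J, under simplifying assumptions: e maps a set of
# hypotheses to the union of the data each one explains (so e is
# monotone), and pl assigns a numeric plausibility to each individual
# hypothesis. All names and numbers are hypothetical.
from itertools import combinations

def explanations(D_all, e, H_all):
    """All complete and parsimonious hypothesis sets H: e(H) = D_all and
    no proper subset of H covers D_all (checking single removals
    suffices because e is monotone)."""
    result = []
    for r in range(len(H_all) + 1):
        for H in combinations(sorted(H_all), r):
            if e(set(H)) != D_all:
                continue
            if any(e(set(H) - {h}) == D_all for h in H):
                continue  # not parsimonious
            result.append(set(H))
    return result

covers = {"h1": {"d1", "d2"}, "h2": {"d3", "d4"}, "h3": {"d1", "d2", "d3"}}
e = lambda H: set().union(*(covers[h] for h in H))
D_all = {"d1", "d2", "d3", "d4"}
pl = {"h1": 0.9, "h2": 0.8, "h3": 0.4}

cands = explanations(D_all, e, set(covers))
best = max(cands, key=lambda H: min(pl[h] for h in H))  # rank by weakest member
print(sorted(tuple(sorted(H)) for H in cands))  # [('h1', 'h2'), ('h2', 'h3')]
print(best == {"h1", "h2"})                     # True
```

Ranking candidates by their least plausible member is only one possible reading of the plausibility ordering; J itself leaves the ordering's structure domain-specific.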
BETWEEN CONCEPTUAL RICHNESS AND COMPUTATIONAL COMPLEXITY 73
The J model, however, has some weaknesses, too. Some computational difficulties
have been noted by the authors of J themselves. They point out that J makes abductive
problems computationally tractable only if it is assumed - as a rule, implausibly - that
there are no incompatibility relationships between the competing hypotheses. Besides,
if it is required that there always should be one most plausible (best) explanation, then
intractability is guaranteed. Some problems of a conceptual nature with J have
been noted by (Thagard and Shelley, 1997). What one may add is that plausibility in J
is taken as a primitive notion. Although it is right to say that the details of the plausi-
bility ordering will be domain-specific, J needs to say more about its general structure
in order to accommodate explanatory factors into abductive reasoning. To be sure,
(Josephson and Josephson, 1994, ch.9) offer a weaker model of abductive reasoning
which is computationally tractable. In this model the task of an abductive problem
is to explain as much as possible of the data with acceptable levels of confidence.
Completeness and maximal plausibility have to be sacrificed in favour of the weaker
aim of maximising explanatory coverage. There are algorithms for this model which
compute the result in polynomial time. This is clearly an improvement in respect of
computation, but some of the elements which make their original model J conceptually
rich have to go.
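In the spirit of this weaker model (a sketch, not the Josephsons' actual algorithm, and with hypothetical data), a greedy pass that repeatedly adds the hypothesis explaining the most still-unexplained data runs in polynomial time while giving up completeness:

```python
# A greedy coverage sketch: always add the hypothesis that explains the
# most still-unexplained data, stopping when nothing helps further. This
# trades completeness and maximal plausibility for polynomial time.

def greedy_cover(D_all, covers):
    """covers: hypothesis -> set of data it explains."""
    chosen, uncovered = [], set(D_all)
    while uncovered:
        h = max(covers, key=lambda h: len(covers[h] & uncovered))
        gain = covers[h] & uncovered
        if not gain:
            break  # no hypothesis explains anything further
        chosen.append(h)
        uncovered -= gain
    return chosen, uncovered

covers = {"h1": {"d1", "d2", "d3"}, "h2": {"d3", "d4"}, "h3": {"d5"}}
chosen, left = greedy_cover({"d1", "d2", "d3", "d4", "d6"}, covers)
print(chosen, left)  # ['h1', 'h2'] {'d6'}
```

Here the datum d6 simply stays unexplained: coverage is maximised approximately rather than completeness being enforced.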
4.6 CONCLUSIONS
To recapitulate, I have argued that abduction has a rich conceptual structure which
comprises induction as a special case. Abduction is the mode of reasoning in which a
hypothesis H is accepted on the basis that a) it explains the evidence and b) no other
hypothesis explains the evidence as well as H does. So, the reasoning process which
underlies abduction has a certain logical, though not algorithmic, structure. Induc-
tion produces generalisations (be they universal or statistical), but these are explana-
tory and their acceptance is governed by explanatory considerations. So, although
induction may be taken to be superficially distinct from abduction, it is an instance of
explanatory reasoning.
As for the second theme of this chapter, i.e., the critical discussion of the recent
computational modelling of abduction, I wish to sum it up with a conjecture: the
more conceptually adequate a model of abduction becomes, the less computationally
tractable it is. This may leave us with a dilemma: either we may have to go for
computational tractability at the expense of conceptual richness, or we may have to
settle for the view that a rich conceptual model of abduction cannot be adequately
programmed. The solution, if any, lies with future research.
Acknowledgments
Many thanks to Peter Lipton whose book "Inference to the Best Explanation" (Lipton, 1991) has
been a great source of inspiration; to John Josephson for many helpful comments on an earlier
draft; to Bob Kowalski for many hours of discussions about the role of abduction in Artificial
Intelligence; and to Francesca Toni for her patient defence of the Logic Programming approach
to abduction. Research for this chapter was conducted under a British Academy Postdoctoral
Fellowship. I am grateful to the Academy for all the help. Many ideas of this chapter have been
74 S. PSILLOS
stimulated by, and have found a great companion in, the views expressed in the book "Abductive
Inference" by John Josephson and his collaborators (Josephson and Josephson, 1994), whom I
wish to thank.
II The logic of abduction and
induction
5 ON RELATIONSHIPS BETWEEN
INDUCTION AND ABDUCTION: A
LOGICAL POINT OF VIEW
Brigitte Bessant
5.1 INTRODUCTION
The concepts of inductive and abductive reasonings have led and still lead to many
open discussions. Many works are based on these reasonings, for example in machine
learning (e.g. (Kodratoff and Ganascia, 1986; Michalski, 1983a) ), inductive logic pro-
gramming (see (Muggleton and De Raedt, 1994) for a survey), abductive logic pro-
gramming (e.g. (Poole et al., 1987)), or resolution of diagnosis problems (see (Cox
and Pietrzykowski, 1987; de Kleer and Williams, 1987; Poole, 1989b) for various ap-
proaches to abduction). However, many of these works deal with only one of these
reasonings. When we attempt to relate induction and abduction, the obvious differ-
ences between the various approaches make this task difficult. How can we, for ex-
ample, relate induction characterized by logical conditions of adequacy of the relation
"observation 0 confirms hypothesis H" (Hempel, 1945) with the ABDUCE procedure
in logic programming (Console et al., 1991b)?
In order to clarify, we examine the relationships between induction and abduc-
tion through three standard approaches which are analyzed and compared. The first,
namely "one is an instance of the other", is that induction is simply a form of abduction
and vice versa. The second, namely "different with a common root", is that they are
distinct but they may be made to share a common logical form based on a hypothetico-
deductive model. The third, namely "totally different", is that induction is nothing but
the process of confirmation and hence it is distinct from abduction, based on what
we call "the hypothetico-nonmonotonic model". We then present the viewpoint that
77
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 77-87.
© 2000 Kluwer Academic Publishers.
78 B. BESSANT
both abduction and induction are means by which knowledge bases are completed in
order to employ deduction and to draw further conclusions. Finally, we discuss
another way to deal with inductive or abductive inference, namely the study at
the meta-level of its logical properties.
Discussion. First of all, through this first approach, we note the difficulty of relating
and comparing definitions of abduction and induction, which are sometimes conflicting.
The variety of definitions and models using unrelated formalisms, the intuitive
perception of some basic concepts (e.g. the concepts of explanation and generality), or
else the existence of numerous languages to deal with abduction and induction, is
what we call "the babelism problem", pointed out in (Bessant, 1996). The different
points of view presented above result, among other things, from a problem of terminology,
which of course is not the main problem but should nevertheless be noted. Another reason is
the level of generality of the definitions, which allows plenty of scope for interpretation. For
example, when induction is considered as non-deductive inference, it covers a very large
set of possible reasonings; or else when abduction is defined as inference to the best
explanation, the generality lies in the fact that the notion of explanation is complex
and leads to many interpretations. Usual informal definitions are inference to the best
explanation for abduction and generalization of observations for induction. The no-
tions of "explanation" and "generality" bring confusion in the relationships between
these two forms of reasoning.
The role of explanation in inductive and abductive reasonings is a fundamental
component in the analysis of their relationships. If we consider that inductive general-
ity is explanatory, we may agree with the thesis that induction is a form of abduction.
However, this consideration is not as simple as it may seem. It depends on the defi-
nitions we give to explanation and generality, and it also depends on how we situate
ON RELATIONSHIPS BETWEEN INDUCTION AND ABDUCTION 79
induction and abduction within these definitions. We refer to the chapters of Console
and Saitta, Josephson and Psillos for a variety of points of view on the notions of gen-
erality and explanation. In short, in this first approach - "one is an instance of the
other" - we grant inductive and abductive reasonings a certain explanatory power,
one being able to be fully incorporated into the other. In the sequel, we present one
of the most common approaches to both inferences, based on a hypothetico-deductive
model. In this approach, the explanatory power turns into a root common to induction
and abduction, which are then considered as different.
explanans = L ∧ C
We examine this particular class of inductive and abductive reasonings, that we call
the AI class defined by:
Definition 5.1 Induction and abduction are of AI class iff they are based
- on the logical model of hypothetico-deductive explanation
- on the epistemological model of deductive-nomological explanation
Induction of AI class is defined as the inference of covering laws and abduction of AI
class as the inference of particular conditions.
• O is an observation iff
1. O is a closed formula of L
2. M(O) ≠ ∅
3. If Th is a domain theory,
then Th ⊭ O and M(Th ∧ O) ≠ ∅
• H is a hypothesis iff
1. H is a closed formula of L
2. Given Th and O, Th ∧ O ⊭ H and M(Th ∧ O ∧ H) ≠ ∅
We consider that the domain theory Th and the observation O are consistent (M(Th) ≠
∅ and M(O) ≠ ∅) and that O and H have the particularity of being consistent with the a
priori knowledge (M(Th ∧ O) ≠ ∅ and M(Th ∧ O ∧ H) ≠ ∅). Indeed, the reason is that
there exists at least one common model of Th, O and H, namely the intentional interpre-
tation (or real world). The validity of a hypothesis H is not known (Th ∧ O ⊭ H).
Finally, observation O has a particularity linked to its part in triggering induction and
abduction, which is that, a priori, we do not have a proof of its validity (Th ⊭ O).
• L is a law iff
it is semantically² equivalent to a universally quantified formula.
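These conditions can be checked mechanically in a propositional approximation, reading M(φ) ≠ ∅ as satisfiability and Th ⊭ O as satisfiability of Th ∧ ¬O. The atoms and the formulas Th, O and H below are hypothetical stand-ins for the first-order language L:

```python
# A propositional sketch of the conditions above. M(phi) != {} is read as
# "phi is satisfiable"; Th |/= O as "Th and not-O is satisfiable". The
# atoms and the formulas Th, O, H are hypothetical stand-ins for L.
from itertools import product

ATOMS = ["rain", "wet"]

def assignments():
    return [dict(zip(ATOMS, vs))
            for vs in product([True, False], repeat=len(ATOMS))]

def satisfiable(phi):          # M(phi) is non-empty
    return any(phi(m) for m in assignments())

def entails(phi, psi):         # phi |= psi
    return all(psi(m) for m in assignments() if phi(m))

Th = lambda m: (not m["rain"]) or m["wet"]   # domain theory: rain -> wet
O  = lambda m: m["wet"]                      # observation: wet
H  = lambda m: m["rain"]                     # candidate hypothesis: rain

# O is an observation: consistent with Th, but not provable from Th alone.
print(satisfiable(lambda m: Th(m) and O(m)), not entails(Th, O))        # True True
# H is a hypothesis: not provable from Th and O, yet consistent with them.
print(not entails(lambda m: Th(m) and O(m), H),
      satisfiable(lambda m: Th(m) and O(m) and H(m)))                   # True True
```

The truth-table enumeration only stands in for the model-theoretic conditions on the first-order language; it is meant to make the four side conditions concrete, not to capture L itself.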
(I1) H is a hypothesis
(I2) H is a law
(I3)
1 - Either H ⊨ O
2 - Or H is such that there exists a particular condition C such that
H ⊭ O, C ⊭ O, C ∧ H ⊨ O, and
either (a) C ∈ Th or (b) C ∈ O
(I1)(I2)(I3) (resp. (A1)(A2)(A3)) determine the set of possible inductive (resp. ab-
ductive) hypotheses. We do not address here the preference criteria that allow us to select
the best hypotheses.
equivalent to fly(Tweety) (resp. bird(Tweety)).
F5 = ∀x(crow(x) → (fly(x) ∧ bird(x))) is not an inductive hypothesis: (I3) is not
verified. However, if we add crow(Tweety) to Th, F5 becomes inductive: (I3-2-a) is
verified.
Th does not contain any law, so the set of possible abductive hypotheses is empty be-
cause of (A3).
If we add F6 = ∀x(gold(x) → shine(x)) to the theory, the only ground formulas that
we can consider such that Th ∧ F ⊨ O are F = O ∧ G with any formula G. However,
such an F is not an abductive hypothesis (F ⊨ O).
If we add F5 to Th then crow(Tweety) is an abductive hypothesis: (A1)(A2)(A3) are
verified.
5.3.2 Discussion
From an epistemological point of view, we adopt Hempel's (with the notion of law
and particular condition) instead of Peirce's (see the introductory chapter to this vol-
ume). The reason is that in many AI works, the intuitive definitions are closer to the
deductive-nomological explanation. From a logical point of view, (12) and (A2) are
syntactical considerations that could be too restrictive and debatable from an episte-
mological point of view. However, we think that they capture a non-negligible part
of AI works. (I2) and (A2) represent one of the main points on which induction and
abduction are differentiated: induction is defined as inference of covering laws and
abduction is defined as inference to particular conditions. Sometimes abduction or
induction are defined in a less restricted way, as for example when abduction is con-
sidered as scientific discovery. In our context, abduction would then be the inference
to the explanans (that is inference to the conjunction of the laws and particular con-
ditions) instead of restricting abduction to the inference to particular conditions. The
adopted point of view depends on the choice of generality in the definition. We agree
with Dimopoulos and Kakas' point of view (Dimopoulos and Kakas, 1996a) about the
complexity of such a reasoning, which can be broken down into a combination of more
basic steps (which is what we do here). In our approach, this form of reasoning naturally appears
as the "explanatory reasoning" (as in Aliseda's chapter (this volume) where the author
presents a unified framework for abductive and inductive reasoning). It represents
here the common root, described at the beginning of the section through the hypothetico-
deductive model.
In our definition, induction and abduction are forms of hypothetico-deductive ex-
planation except for the case (I1.I2.I3-2-b). The two cases (I1.I2.I3-1) and (I1.I2.I3-2-
b) correspond to enumerative induction. This is another point on which induction and
abduction differ: whereas abduction is necessarily domain-dependent (see A3), this
is not the case for induction (see I1.I2.I3-1). Abduction is domain-dependent because
of the dependence between particular conditions and the associated law. If this law
was not in the domain theory, we would have to infer it as a hypothesis too, then it
would not be an abductive reasoning but an explanatory one (i.e. inference to the ex-
planans). In enumerative induction, we do not need to deal with a domain theory, it is
simply an extension of some properties of observed individuals, to a larger unobserved
population. This form of reasoning is sometimes called inductive generalization.
Enumerative induction is related to the concept of confirmation that we present in the
following section. In the next approach, abduction and induction are totally different:
they are no longer linked by the common root of explanatory reasoning. Induction
is nothing but the process of confirmation, and abduction is always explanatory
but is based on what we call "the hypothetico-nonmonotonic model".
Th ∧ H |∼ O
We call such a hypothesis H, a hypothetico-nonmonotonic explanation.
This case corresponds to the possibility that some causally relevant factors might
be omitted in the domain theory which is not assumed perfect (Leake, 1995). For
example, let us assume that hypothesis H = "a lit match fell down" explains the obser-
vation that "the forest is burning". When we add the omitted information "the match
fell down in a pool of water", H is no longer an explanation for the observation. It
is related to the qualification problem raised by (McCarthy, 1980). Let us note the
existence of some works (Aliseda, 1996a; Flach, 1996b; Mayer and Pirri, 1996; Pino
Perez and Uzcategui, 1997), studying the inductive and abductive inferences based on
the hypothetico-nonmonotonic model, at the meta-level (i.e. studying properties that
the underlying inferences should satisfy). This approach of induction and abduction
is presented in Section 5.5.2.
5.4.3 Discussion
Abduction based on the hypothetico-nonmonotonic model is different from induction
based on the hypothetico-deductive model. It is no longer falsity-preserving and it is not a
form of reversed deduction. Let us note that the notion of explanation is complex and,
although it is often mixed up with deductive inference, many other cases of explanation
can be considered, modelled by a non-deductive inference (see for example (Lipton,
1991)).
Another fundamental point raised here is the notion of confirmation that points out
the explanation/confirmation duality, and allows us to distinguish some characteristics
about abduction and induction. The problem of confirmation is identified as the problem
of induction. So the question is: what would the concept of confirmation be in ab-
ductive reasoning? Usually, in abduction, a hypothesis is simply accepted, and from a
technical point of view it is extracted from the domain theory (Dimopoulos and Kakas,
1996a). Abduction is a way to clarify particular conditions, implicit in the domain the-
ory, that allows to give a proof (not necessarily deductive) to accept the observation,
whereas induction is the construction of a hypothesis that gives an account of unifor-
mity, common to a set of observations. Abduction is the admission of the necessity
of "a jump to conclusion" while induction is the formulation of a uniformity, with an
increasing empiricism. We seize the opportunity to point out another difference usu-
ally expressed between the two forms of reasoning that is, induction is a prediction
for some unobserved information, whereas abduction deals with the available obser-
vations.
In the following section, we examine the underlying logical inference of abductive
and inductive reasoning, on the object- and meta-level.
Let us note |<ind (resp. |<abd) the inductive (resp. abductive) inference, defined mod-
ulo a domain theory Th, as follows:
O |<ind(abd) H
From the observation 0 and the domain theory T h, we induce (or abduce) the hypoth-
esis H.
O |<ind(abd) H  iff  Compl(Th) ∧ O ⊨ H
5.6 CONCLUSION
We investigated many ways to relate induction and abduction in a single framework,
each corresponding to different definitions of induction and abduction. Some of these
definitions are complementary, some of them are more abstract than others, some of
them are antinomic. In the first approach, we studied possibilities of inserting one into
the other: induction as a form of abduction and vice versa. In this approach, induction
considered as generalization of observations plays an explanatory role of abductive
type; induction is situated as a form of abduction, or inference to the best explanation.
In the second approach, definitions of induction and abduction are restricted and
refined. They remain forms of explanatory reasoning that constitutes their common
root, but they differ about their explanatory power. We define induction as inference
of covering laws, and abduction as inference of particular conditions, which is close to
Acknowledgments
This work has been partly supported by the Ganymede II project of the Contrat de Plan Etat/Nord-
Pas-de-Calais.
6 ON THE LOGIC OF HYPOTHESIS
GENERATION
Peter A. Flach
6.1 INTRODUCTION
It has been argued in the introductory chapter to this volume that, when dealing with
non-deductive reasoning forms like abduction and induction, a distinction between
hypothesis generation and hypothesis evaluation arises. Hypothesis generation is con-
cerned with hypotheses that are not yet ruled out by the data, and is as such a purely
logical process. Hypothesis evaluation then proceeds with further investigating the
possible hypotheses in order to select one of which the predictions agree sufficiently
with reality. This distinction has been inspired by Peirce's later, inferential theory of
abduction, deduction and induction as the three stages of scientific inquiry.
Furthermore, as argued in the introductory chapter and by several authors contribut-
ing to this volume (e.g. Lachiche), while abduction is predominantly inference of an
explanation of the observations, induction can be understood from two perspectives.
On the one hand, we can perceive inductively inferred classification rules as explain-
ing the classifications of the examples (explanatory induction); on the other, we can
take a wider perspective by including non-classificatory forms of induction, which aim
at discovering generalisations that are sufficiently confirmed by the data (confirmatory
or descriptive induction).
In this chapter we deal with hypothesis generation rather than evaluation. The aim
is to formalise the relation between observations and possible hypotheses. The distinc-
tion between explanatory hypotheses and confirmed generalisations naturally leads to
two different forms of hypothesis generation, which I will call explanatory and con-
firmatory reasoning. The reader may think of confirmatory reasoning as generation of
(non-classificatory) inductive generalisations and of explanatory reasoning as genera-
89
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 89-106.
© 2000 Kluwer Academic Publishers.
90 P.A.FLACH
tion of abductive explanations (including classification rules), but it is not the aim of
this chapter to draw a firm line between abduction and induction. In fact, the analysis
in this chapter can be seen as an alternative to the abduction/induction dichotomy, dis-
tinguishing non-deductive reasoning forms instead by the logical relation the inferred
hypotheses bear to the premisses.
The way this logical relation is characterised in this chapter has been inspired by
work on the analysis of nonmonotonic reasoning (Kraus et al., 1990). The analysis is
carried out on the meta-level by considering hypothetical consequence relations that
link observations with possible hypotheses. This meta-level analysis is accompanied
by semantic characterisations and representation theorems. The approach thus not
only links up with other work on logics for artificial intelligence, but also with classical
logical analysis dealing with soundness and completeness of proof systems. Moreover,
we draw connections with work in philosophy of science, in particular Hempel's
qualitative approach to confirmation.
Q are literals (atomic formulae or their negation). Intuitively, such a postulate should
be interpreted as an implication with antecedent P1, ..., Pn (interpreted conjunctively)
and consequent Q, in which all variables are implicitly universally quantified. An
example of such a postulate, written in an expanded Gentzen-style notation, is
α |< β , α ∧ β ⊨ γ
¬(α ∧ ¬γ |< β)
This is a postulate with two positive literals in its antecedent, and a negative literal in
its consequent. Intuitively, it expresses that a hypothesis β, previously inferred from
evidence α, should be withdrawn if the negation of a consequence of α and β together
is added to the evidence.
Consequence relations provide the semantics for this meta-language, by fixing the
meaning of the meta-predicate |<. Formally, a consequence relation is a subset of
L × L. Consequence relations will be used to model part or all of the reasoning
behaviour of an abductive or inductive agent, by listing a number of arguments (pairs
of premiss and conclusion) the agent is prepared to accept. A consequence relation
satisfies a postulate whenever it satisfies all instances of the postulate, and violates
it otherwise, where an instance of a postulate is obtained by replacing the variables
of the postulate with formulae from L. For instance, the consequence relation
{(p, q), (p ∧ ¬p, q)} violates the postulate above. We will normally refer to a particular
consequence relation as |< and write p |< q instead of (p, q) ∈ |<.
1 Practical algorithms establish a function from evidence to hypothesis rather than a relation, i.e. also Right
Logical Equivalence would be invalidated by an induction algorithm (of all the logically equivalent
hypotheses only one would be output).
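The claimed violation can be checked concretely (a sketch, with formulas matched to string names by hand): taking α = p, β = q, γ = p, both premisses of the postulate hold for the relation {(p, q), (p ∧ ¬p, q)}, yet the pair (p ∧ ¬p, q) is still in the relation, contradicting the postulate's negative consequent.

```python
# Checking the violation: the relation is represented as a set of
# (premiss, conclusion) name pairs, with the propositional formulas
# matched to those names by hand.
from itertools import product

ATOMS = ["p", "q"]

def entails(phi, psi):
    assigns = [dict(zip(ATOMS, vs))
               for vs in product([True, False], repeat=len(ATOMS))]
    return all(psi(m) for m in assigns if phi(m))

p = lambda m: m["p"]
q = lambda m: m["q"]

rel = {("p", "q"), ("p & ~p", "q")}   # the consequence relation above

alpha, beta, gamma = p, q, p          # the violating instance
premisses_hold = ("p", "q") in rel and entails(lambda m: alpha(m) and beta(m), gamma)
withdrawn = ("p & ~p", "q") not in rel   # alpha & ~gamma is p & ~p
print(premisses_hold, withdrawn)      # True False: the postulate is violated
```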
Verification
α ∧ β ⊨ γ , α |< β
α ∧ γ |< β
Falsification
α ∧ β ⊨ γ , α |< β
¬(α ∧ ¬γ |< β)
In these two postulates γ is a prediction made on the basis of hypothesis β and evidence
α. Verification expresses that if such a prediction is indeed observed, hypothesis β re-
mains a possible hypothesis, while if its negation is observed, β may be considered
refuted according to Falsification.² One might remark that typically the hypothesis
will entail the evidence, so that the first condition in the antecedent of Verification
and Falsification may be simplified to ⊨ β → γ. However, this is only the case for
certain approaches to explanatory reasoning; generally speaking, hypotheses, in par-
ticular those that are confirmed without being an explanation, may not contain all the
information conveyed by the evidence. The formulation above represents the general
case.
Falsification can be simplified in another sense, as shown by the following lemma.
Consistency
α |< β
⊭ ¬(α ∧ β)
Falsification and Consistency rule out inconsistent evidence and hypotheses. The way
inconsistent evidence is handled is merely a technicality, and we might have decided
to treat it differently. The case of inconsistent hypotheses is different however: it is
awkward to say, for instance, that arbitrary evidence induces an inconsistent hypothe-
sis. Furthermore, in inductive concept learning negative examples are often included
that are not to be classified as belonging to the concept, which requires consistency
of the induced rule. Also, the adoption of Consistency is the only way to treat ex-
planatory and confirmatory reasoning in a unified way as regards the consistency of
evidence and hypothesis.
In the presence of Consistency a number of other principles have to be formulated
carefully. For instance, we have reflexivity only for consistent formulae. In the light
of this, reflexivity takes the following two forms:
2 Notice that, contrary to the previous two postulates, Verification and Falsification happen to be meaning-
ful also when modelling the behaviour of an induction algorithm: Verification expresses that the current
hypothesis should not be abandoned when the next observation is a predicted one (in the terminology of
(Angluin and Smith, 1983) the algorithm is conservative), while Falsification expresses that the current hy-
pothesis must be abandoned when the next observation runs counter to the predictions of the algorithm
(called consistency by (Angluin and Smith, 1983)). However, in the context of the present chapter these are
not the intended interpretations of the two postulates.
ON THE LOGIC OF HYPOTHESIS GENERATION 93
Left Reflexivity
α |< β
α |< α
Right Reflexivity
α |< β
β |< β
If a consequence relation contains an argument α |< α, this signals that α is consistent
with the reasoner's background theory. We will call such an α admissible (with respect
to the consequence relation), and use conditions of this form whenever we require
consistency of evidence or hypothesis in a postulate.³
The final postulate mentioned in this section is a variant of Verification that allows
any prediction to be added to the hypothesis rather than the evidence:
Right Extension
α |< β , α ∧ β ⊨ γ
α |< β ∧ γ

6.3 EXPLANATORY REASONING

C , A ⊨ C
A

In the logical analysis of this chapter the explanatory inference from C to A is lifted
to the meta-level, as follows:
A ⊨ C
C |< A
3 Readers with a background in machine learning may interpret α |< α as 'hypothesis α does not cover any
negative example'.
We may note that Admissible Converse Entailment can be derived from Admissible
Right Strengthening if we assume Consistency and the following postulate:
Explanatory Reflexivity
While the postulates above express properties of possible explanations, the fol-
lowing two postulates concentrate on the evidence. The underlying idea is a basic
principle in inductive machine learning: if the evidence is a set of instances of the
target concept, we can partition the evidence arbitrarily and find a single hypothesis
that is an explanation of each subset of instances. This principle is established by the
following two postulates: 4
Incrementality
α |< γ , β |< γ
α ∧ β |< γ
Convergence
⊨ α → β , α |< γ
β |< γ
Lemma 6.3 If |< is a consequence relation satisfying Incrementality and Conver-
gence, then α ∧ β |< γ iff α |< γ and β |< γ.
4 In previous work (Flach, 1995) Incrementality was called Additivity, and Convergence was called
Incrementality. The terminology employed here better reflects the meaning of the postulates.
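The lemma follows directly from the two postulates; a proof sketch, writing the hypothetical consequence relation as \hyp:

```latex
\newcommand{\hyp}{\mathrel{|\!\!<}}
\textit{Proof sketch.}
($\Leftarrow$) From $\alpha \hyp \gamma$ and $\beta \hyp \gamma$,
Incrementality yields $\alpha \wedge \beta \hyp \gamma$.
($\Rightarrow$) Since $\models \alpha \wedge \beta \rightarrow \alpha$,
Convergence applied to $\alpha \wedge \beta \hyp \gamma$ gives
$\alpha \hyp \gamma$; the argument for $\beta \hyp \gamma$ is symmetric. $\Box$
```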
of evidence can be dealt with in isolation. Another way to say the same thing is that
the set of evidence explained by a given hypothesis is conjunctively closed. By the
postulate of Consistency this set is consistent, which yields the following principle:
Left Consistency
Lemma 6.4 In the presence of Right Reflexivity and Admissible Converse Entailment,
Left Consistency implies Consistency.
It follows that Left Consistency and Consistency are equivalent in the presence of
Right Reflexivity, Admissible Converse Entailment, and Incrementality.
Convergence expresses a monotonicity property of hypothesis generation, which
can again best be understood by considering its contrapositive: a hypothesis that is re-
jected on the basis of evidence β cannot become feasible again when stronger evidence
α is available. In other words: the process of rejecting a hypothesis is not defeasible
(i.e. based on assumptions), but based on the evidence only. This is the analogue of the
monotonicity property of deduction (note that the latter can be obtained by reversing
the implication in the first condition of Convergence).
Lemma 6.5 The combination of Verification and Convergence is equivalent to the
following postulate:
Predictive Convergence
⊨ α ∧ γ → β , α |< γ
β |< γ
Conditionalisation
That is, possible explanations of α are those hypotheses β that are consistent and entail
α, where consistency and entailment are taken relative to a background theory encoded
in W.
The following system of postulates can be proved to axiomatise strong explanatory
structures.
Admissible Right Strengthening
⊨ γ → β , α |< β , γ |< γ
α |< γ
Explanatory Reflexivity
Incrementality
α |< γ , β |< γ
α ∧ β |< γ
Predictive Convergence
⊨ α ∧ γ → β , α |< γ
β |< γ
Left Consistency
5 An explanation mechanism reasons from an explanans to an explanandum, and thus should not be confused
with an explanatory consequence relation, which reasons from explanandum to a possible explanans.
6 C(α) ⊆ C(β) implies β ⊢ α if ⊢ is reflexive; β ⊢ α implies C(α) ⊆ C(β) if ⊢ is transitive.
Conditionalisation
Theorem 6.6 (Soundness of EM) Any strong explanatory consequence relation sat-
isfies the postulates of EM.
Theorem 6.7 (Completeness of EM) Any consequence relation satisfying the postu-
lates of EM is defined by a strong explanatory structure.
The entailment condition (H1) simply means that entailment 'might be referred to
as the special case of conclusive confirmation' (Hempel, 1945, p. 107). The conse-
quence conditions (H2) and (H2.1) state that the relation of confirmation is closed
under weakening of the hypothesis or set of hypotheses (H1 is weaker than H2 iff it
is logically entailed by the latter). Hempel justifies this postulate as follows (Hempel,
1945, p. 103): 'an observation report which confirms certain hypotheses would invari-
ably be qualified as confirming any consequence of those hypotheses. Indeed: any
such consequence is but an assertion of all or part of the combined content of the orig-
inal hypotheses and has therefore to be regarded as confirmed by any evidence which
confirms the original hypotheses.' Now, this may be reasonable for single hypotheses
(H2.1), but much less so for sets of hypotheses, each of which is confirmed separately.
The culprit can be identified as (H2.3), which together with (H2.1) implies (H2). A
similar point can be made as regards the consistency condition (H3), about which
Hempel remarks that it 'will perhaps be felt to embody a too severe restriction'.
(H3.1), on the other hand, seems to be reasonable enough; however, combined with
the conjunction condition (H2.3) it implies (H3). We thus see that Hempel's rationality
postulates are intuitively justifiable, except for the conjunction condition (H2.3) and,
a fortiori, the general consequence condition (H2). On the other hand, the conjunction
Admissible Entailment
  ⊨ α → β    α |< α
  ─────────────────
  α |< β

Confirmatory Reflexivity
  α |< α    α |≮ ¬β
  ─────────────────
  β |< β
Admissible Entailment expresses that admissible evidence (i.e. evidence that is con-
sistent with the background knowledge) confirms any of its consequences. In other
words, consistent entailment is a special case of confirmation. Confirmatory Reflex-
ivity is the confirmatory counterpart of Explanatory Reflexivity encountered in the
previous section. It is added as a separate postulate since, in its original formulation,
(H1) includes reflexivity as a special case. As with its explanatory counterpart, Con-
firmatory Reflexivity is best understood by considering its contrapositive: if β is
inadmissible, i.e. too strong a statement with regard to the background knowledge, its
negation ¬β is so weak that it is confirmed by arbitrary admissible formulae α.
Consequence condition (H2) cannot be translated directly, since in the language of
consequence relations as defined here we have no means to refer to a set of confirmed
sentences. However, a translation of the special consequence condition (H2.1) and the
conjunction condition (H2.3) will suffice:
Right Weakening
  ⊨ β → γ    α |< β
  ─────────────────
  α |< γ

Right And
  α |< β    α |< γ
  ────────────────
  α |< β ∧ γ
Lemma 6.8 If |< is a consequence relation satisfying Right And and Right Weakening,
then α |< β ∧ γ iff α |< β and α |< γ.
Right Weakening expresses that any hypothesis entailed by a given hypothesis con-
firmed by α is also confirmed by α. Notice that Admissible Entailment is an instance
of Right Weakening (put β = α).
Lemma 6.9 The combination of Right Extension and Right Weakening is equivalent
to the following postulate:
Consistency
  α |< β
  ──────
  ⊭ ¬(α ∧ β)
Condition (H3.2) expresses that for any formula β, if β is in the set of confirmed
hypotheses then ¬β is not. This principle is expressed by the following postulate:

Right Consistency
  α |< β
  ──────
  α |≮ ¬β
Lemma 6.10 In the presence of Admissible Entailment and Left Reflexivity, Right
Consistency implies Consistency.
7 This holds only for finite K, an assumption that I will make throughout.
[α] = S iff ⊨ α
[α ∧ β] = [α] ∩ [β]
[¬α] = S − [α]
and similarly for the other connectives.
Let us call confirmatory structures which satisfy these conditions, as well as ⟦α⟧ ⊆
[α] for all α ∈ L, simple confirmatory structures, defining simple (closed) confirma-
tory consequence relations. I will now demonstrate that simple confirmatory structures
are axiomatised by the following postulate system.
102 P.A.FLACH
Predictive Right Weakening
  ⊨ α ∧ β → γ    α |< β
  ─────────────────────
  α |< γ

Right And
  α |< β    α |< γ
  ────────────────
  α |< β ∧ γ

Right Consistency
  α |< β
  ──────
  α |≮ ¬β
Derived postulates of CS include Right Weakening and Right Extension (Lemma 6.9),
Left Reflexivity (an instance of Predictive Right Weakening), Admissible Entailment
(an instance of Right Weakening), and Consistency (Lemma 6.10).
Theorem 6.11 (Soundness of CS) Any simple closed confirmatory consequence re-
lation satisfies the postulates of CS.
For admissible formulae, normal models will play the role of the regular models in the
simple confirmatory structure we are building (remember that, given a consequence
relation |<, a formula α ∈ L is admissible iff α |< α).
In the standard treatment inconsistent premisses have all formulae as consequences,
hence no normal models; since in our treatment inconsistent premisses do not confirm
any hypothesis and thus have all models in U as normal models, we have to treat them
as a separate case. Given a consequence relation |<, let W = (U, [·], ⟦·⟧) be defined
as follows:
2. [α] = {m ∈ U | m ⊨ α};
3. ⟦α⟧ = {m ∈ U | m is a normal model for α} if α is admissible, and ∅ otherwise.
The following completeness result states that the consequence relation defined by W
coincides with the original one if the latter satisfies the postulates of CS.
Theorem 6.12 (Completeness of CS) Any consequence relation satisfying the pos-
tulates of CS is the closed consequence relation defined by a simple confirmatory
structure.
One may note that two of the postulates obtained in the previous section have not
been mentioned above, viz. Confirmatory Reflexivity and Left Logical Equivalence.
Definition 6.7 A preferential structure is a triple W = (S, l, <), where S is a set of
states, l : S → U is a function that labels every state with a model, and < is a strict
partial order on S, called the preference ordering, that is smooth.9 W defines a prefer-
ential confirmatory structure (S, [·], ⟦·⟧) and a preferential confirmatory consequence
relation as follows: [α] = {s ∈ S | l(s) ⊨ α}, and ⟦α⟧ = {s ∈ [α] | ∀s′ ∈ S : s′ < s ⇒
s′ ∉ [α]}.
Note that preferential confirmatory structures are simple confirmatory structures, and
that they also satisfy the conditions associated above with Left Logical Equivalence
and Confirmatory Reflexivity (by the smoothness condition, if s ∈ [α] then either
s ∈ ⟦α⟧ or there is a t < s such that t ∈ ⟦α⟧; hence [α] ≠ ∅ implies ⟦α⟧ ≠ ∅).
In comparison with the preferential semantics of (Kraus et al., 1990), the only dif-
ference is that in a preferential confirmatory argument the evidence is required to be
satisfiable, in order to guarantee the validity of Consistency. The intermediate seman-
tic level of states is mainly needed for technical reasons, and can be interpreted as
the set of models the hypothesis generating agent considers possible in that epistemic
state.
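This preferential semantics can be made concrete in a few lines. The encoding below is an illustrative sketch, not the chapter's formal apparatus: states are identified with propositional models over two atoms, the preference order is proper set-inclusion of true atoms (a smooth strict partial order, since the structure is finite), and α |< β is given the skeptical reading "⟦α⟧ is non-empty and every preferred α-state satisfies β".

```python
from itertools import chain, combinations

ATOMS = ("p", "q")

def models():
    """All propositional models over ATOMS, as frozensets of true atoms."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(ATOMS, r) for r in range(len(ATOMS) + 1))]

def preferred_states(alpha):
    """[[alpha]]: the <-minimal states of [alpha], where s < t iff the
    model labelling s makes strictly fewer atoms true (proper subset)."""
    ext = [m for m in models() if alpha(m)]                 # [alpha]
    return [s for s in ext if not any(t < s for t in ext)]  # minimal states

def confirms(alpha, beta):
    """alpha |< beta: [[alpha]] is non-empty and every preferred
    alpha-state satisfies beta (an illustrative, skeptical reading)."""
    pref = preferred_states(alpha)
    return bool(pref) and all(beta(s) for s in pref)

# Evidence "p" confirms "not q": the only minimal p-state is {p}.
p      = lambda m: "p" in m
not_q  = lambda m: "q" not in m
bottom = lambda m: False   # unsatisfiable evidence confirms nothing
```

With this reading, Consistency holds by construction: unsatisfiable evidence has an empty set of preferred states, so it confirms no hypothesis.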
The following set of postulates axiomatises preferential confirmatory consequence
relations.
Definition 6.8 The system CP consists of the postulates of CS, Confirmatory Reflex-
ivity, Left Logical Equivalence, plus the following postulates:

Left Or
  α |< γ    β |< γ
  ────────────────
  α ∨ β |< γ

Strong Verification
  α |< γ    α |< β
  ────────────────
  α ∧ γ |< β
The first postulate can be seen as a variant of Convergence, which is clearly invalid in
the general case: if we weaken the evidence, there will presumably come a point where
the evidence no longer confirms the hypothesis. However, Left Or states that pieces
of confirming evidence for a hypothesis can be weakened by taking their disjunction.
The second postulate is a variant of Verification, which states that a predicted formula
γ can be added to confirming evidence α for hypothesis β. Strong Verification states
that this is also allowed when γ is confirmed by α. The way in which Strong Verifi-
cation strengthens Verification is very similar to the way Right And strengthens Right
Extension. The underlying intuition is that the evidence is strong enough to have all
confirmations "point in the same direction", as it were. 10
Theorem 6.13 (Representation theorem for CP) A consequence relation is prefer-
ential confirmatory iff it satisfies the postulates of CP.
From this definition it is clear that weak confirmatory consequence relations satisfy
both Right Weakening and Left Weakening (i.e. Convergence), as well as Consistency.
One additional postulate is needed.
10 In the context of nonmonotonic reasoning, Strong Verification is known as Cautious Monotonicity. For
the purposes of this chapter I prefer to use the first name, which expresses more clearly the underlying
intuition in the present context.
11 This is sometimes called credulous inference, in contrast with skeptical inference which requires truth in
all regular models.
Predictive Convergence
  ⊨ α ∧ γ → β    α |< γ
  ─────────────────────
  β |< γ

Consistency
  α |< β
  ──────
  ⊭ ¬(α ∧ β)

Disjunctive Rationality
  α ∨ β |< γ    α |≮ γ
  ────────────────────
  β |< γ
Disjunctive Rationality has not been considered before in this chapter. The name
has been borrowed from Kraus et al., who identify it as a valid principle of plausible
reasoning. In the context of confirmatory reasoning, Disjunctive Rationality states that
a hypothesis cannot be confirmed by disjunctive observations unless it is confirmed by
at least one of the disjuncts separately.
6.5 DISCUSSION
This chapter has been written in an attempt to increase our understanding of hypoth-
esis generation through logical analysis. What logic can achieve for arbitrary forms
of reasoning is no more and no less than a precise definition of what counts as a pos-
sible hypothesis given certain premisses. Evaluating the usefulness or plausibility of
hypotheses is an extra-logical matter.
There is not a single logic of hypothesis generation. The logical relationship be-
tween evidence and possible hypotheses depends on the task these hypotheses are
intended to perform. Abductive hypotheses or induced classification rules such as con-
cept definitions are based on a notion of explanation, while non-classificatory gener-
alisations are based on a notion of confirmation. Other forms of hypothesis generation
are conceivable. In this chapter I have proposed a meta-level framework for charac-
terising and reasoning about different forms of hypothesis generation. The framework
does not fix a material definition of hypothesis generation, but can be used to aggregate
knowledge about classes of such material logics of hypothesis generation.
A number of technical results have been obtained. The system EM axiomatises
explanation-preserving reasoning with respect to a monotonic explanation mechanism.
This system stays close to Peirce's conception of abduction in his later inferential
theory, but enables a fuller logical analysis. Characterisation of explanatory reasoning
with respect to weaker (e.g. preferential) explanation mechanisms is left as an open
problem.
Acknowledgments
An extended version of this chapter appears in the Handbook of Defeasible Reasoning and
Uncertainty Management (Flach, 2000). Part of this work was supported by Esprit IV Long
Term Research Project 20237 (Inductive Logic Programming 2).
7 ABDUCTION AND INDUCTION
FROM A NON-MONOTONIC
REASONING PERSPECTIVE
Nicolas Lachiche
7.1 INTRODUCTION
Both abduction and induction aim at inferring hypotheses given some observations. In
the inferential perspective of Peirce, recalled by Flach and Kakas in the introductory
chapter, abduction is defined as the process of coming up with a hypothesis to explain
the observations. I will consider a slightly more constrained definition of abduction as
the inference of the best explanation. Reasoning from symptoms to their causes (dis-
eases) is, for instance, a typical abductive task. This definition is the most frequently
used in computer science.
The underlying notion of explanation still has to be made precise. In his chapter in
this volume, Josephson argues that some good explanations are not proofs and some
proofs are not explanations, so explanations are not deductive proofs but assignments
of causal responsibility. Similarly, Bessant introduces a hypothetico-nonmonotonic
model. We will consider in this chapter the simpler case where explanation is modelled
by logical consequence. We will also assume that knowledge is represented in a first-
order predicate logic.
While induction is sometimes considered to cover any ampliative reasoning as men-
tioned by Flach and Kakas, in this chapter, induction will only refer to inductive gen-
eralisation: induction is the inference of general laws from observations. Given the
observations of the positions of the planets during a limited lapse of time, inferring
that the planets move in ellipses with the sun at one focus is an inductive task.
P.A. Flach and A. C. Kakas (eds.), Abduction and Induction, 107-116.
© 2000 Kluwer Academic Publishers.
108 N.LACHICHE
Clearly abduction and induction meet each other when the inference of general laws
explaining the observations is considered. In this case, abduction and induction can-
not be distinguished in the inference process. I will however argue that they can still
be separated according to the utility of inferred hypotheses. Actually, the inference of
general rules explaining the observations is one form of induction, appropriately called
explanatory induction. There is another form of induction whose aim is not to explain
the observations, but simply to point out regularities confirmed by the observations.
This other kind of induction is called descriptive (or confirmatory) induction. The
relationship between abduction and descriptive induction is less obvious and requires
the consideration of their underlying completion principles. Abduction and induc-
tion are indeed non-monotonic forms of reasoning: their conclusions, rather called
hypotheses, may be falsified when new knowledge is provided. Abduction and induc-
tion are ampliative: they provide more knowledge than can be deduced from the
available knowledge. Their results can nevertheless be built automatically by a computer
system, which means that they can be deduced in some way. This is done by com-
pleting the limited available knowledge in order to have enough information to be able
to use deduction. The completion technique has therefore to represent the intended
reasoning. At least one completion technique has been used for both abduction and
induction but this does not mean that the completion principles are the same in both
cases. In fact, I will argue that induction relies on a completion of individuals whereas
abduction relies on a completion of properties. Therefore, abduction and induction
differ from a non-monotonic reasoning perspective.
In the following sections, the different meanings of abduction and of induction
are recalled, in particular the explanatory and descriptive forms of induction. These
two forms of induction are then compared with abduction. The case of explanatory
induction is quite straightforward, but descriptive induction requires detailing the
underlying completion principles. The relationship between abduction and induction will finally
be discussed in the light of their freshly highlighted completion principles and their
different applications.
7.2 DEFINITIONS
In this section, the different meanings that can be given to logic-based abduction and
induction are presented. We will first define abduction and the underlying notion
of explanation. Then we will consider induction and its explanatory and descriptive
forms.
Actually, the non-monotonicity of both forms of reasoning does not mean that every
piece of knowledge that cannot be deduced from the observations is a hypothesis. Let
us first define abductive hypotheses as the set of preferred hypotheses explaining the
observations.
The notion of explanation can be defined in numerous ways (Josephson, this vol-
ume). In this chapter, we will only model explanation by logical consequence. We are
thus in the hypothetico-deductive model detailed by Bessant (this volume). Clearly,
a pure deductive model cannot completely fit the real world and any pure deductive
model has to be adapted into a hypothetico-nonmonotonic model to cover some ex-
ceptions. However, a pure deductive model usually covers most of the observations
and therefore provides a good basis for reasoning.
For instance, given the observation E = mortal(Socrates) and the domain theory
Th = ∀X, human(X) ⇒ mortal(X), the formula H = human(Socrates) is an abductive
hypothesis for E given Th.
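The entailment-plus-consistency definition of an abductive hypothesis can be sketched by brute-force model checking over a ground (propositional) rendering of this example; the atom names and helper functions below are my own illustrative assumptions, not the chapter's machinery.

```python
from itertools import product

ATOMS = ("human_socrates", "mortal_socrates")

def assignments():
    """Every truth assignment over ATOMS."""
    for values in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def entails(premises, goal):
    """Classical entailment by brute-force model checking."""
    return all(goal(v) for v in assignments() if all(p(v) for p in premises))

def consistent(formulas):
    """Satisfiability over the finite set of assignments."""
    return any(all(f(v) for f in formulas) for v in assignments())

def is_abductive_hypothesis(h, th, e):
    """H is an abductive hypothesis for E given Th iff
    Th & H |= E and Th & H is consistent."""
    return entails(th + [h], e) and consistent(th + [h])

# Ground rendering of the Socrates example:
th = [lambda v: (not v["human_socrates"]) or v["mortal_socrates"]]
e  = lambda v: v["mortal_socrates"]
h  = lambda v: v["human_socrates"]
```

Here human(Socrates) qualifies as an abductive hypothesis, while its negation does not: together with Th it fails to entail the observation.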
Inductive hypotheses are not necessarily required to be explanatory. Inferring that
planets move in ellipses from observations does describe some general knowledge on
planets, but does not explain a lot. Roughly, inductive hypotheses have to account for
the observations.
• a domain theory Th in L,
induction consists in determining the set of hypotheses H from L which account for E
given Th and are preferred according to C.
Accounting for the observations can have at least two meanings: hypotheses can
either explain the observations, or reflect regularities of the observations. The for-
mer defines explanatory induction and the latter defines descriptive induction. The
definition of explanatory induction is thus the same as that of abduction.
• a domain theory Th in L,
For instance, given the observation E = mortal(Socrates) and the domain theory
Th = human(Socrates), the formula
H = ∀X, human(X) ⇒ mortal(X)
• Comp_SA(E ∧ Th) ⊨ H,
• H is preferred according to C.
Comp_SA(E ∧ Th) is the completion of the initial knowledge by a similarity assumption
such that unknown individuals behave like the known ones.
E = {mortal(Socrates), human(Socrates)}
and the domain theory Th = ∀X, man(X) ⇒ human(X), the formula H = ∀X, human(X) ⇒
mortal(X) is a descriptive inductive hypothesis for E given Th.
A NON-MONOTONIC REASONING PERSPECTIVE 111
∀X, mortal(X) ⇒ human(X).
Clark's predicate completion has also been used for descriptive induction. In the
CLAUDIEN system proposed by De Raedt and Bruynooghe (1993), a clause is a de-
scriptive inductive hypothesis if no counter-instance of it can be deduced from the
completed initial knowledge, Comp_Clark(E ∧ Th). For instance, Clark's predicate
completion of the set of observations
E = {human(Socrates), mortal(Socrates)}
Th = ∀X, man(X) ⇒ human(X)
E ∧ Th ∧ (∀X, X ∈ C) ⊨ H.
E = {mortal(Socrates), human(Socrates)}
and the domain theory
is a logical consequence of
E ∧ Th ∧ (∀X, X = Socrates).
At least two definitions have been proposed for the notion of "appearing" individ-
uals. Lachiche and Marquis (1997) suggested using the Herbrand domain of E ∧ Th,
while Hempel (1945) suggested using the set of "essential" individuals of E ∧ Th,
that is, the set of individuals appearing in every formula logically equivalent to E ∧ Th.
This issue is discussed in (Lachiche and Marquis, 1997).
The domain closure assumption is clearly appropriate to model descriptive induc-
tion, but it does not seem appropriate to model abduction. Actually, abduction and
induction do not rely on the same completion principles.
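Under domain closure, a universally quantified hypothesis reduces to finitely many ground instances, so checking whether it follows from the completed knowledge becomes a finite test. The sketch below is an illustrative assumption (predicate encoding and helper names are my own):

```python
# Descriptive induction under a domain closure assumption: the domain is
# closed to the named individuals, so 'forall X, body(X) => head(X)'
# holds iff every ground instance over the domain holds.

DOMAIN = ("Socrates",)                                   # forall X, X = Socrates
FACTS = {("mortal", "Socrates"), ("human", "Socrates")}  # the observations E

def holds(pred, individual, facts):
    return (pred, individual) in facts

def is_descriptive_hypothesis(body, head, facts, domain):
    """Is 'forall X, body(X) => head(X)' a consequence of the knowledge
    completed by domain closure? Checking the ground instances suffices."""
    return all(not holds(body, x, facts) or holds(head, x, facts)
               for x in domain)
```

Note that with Socrates as the only individual, both ∀X, human(X) ⇒ mortal(X) and its converse ∀X, mortal(X) ⇒ human(X) come out as descriptive hypotheses, matching the example above.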
7.5 DISCUSSION
Both abduction and explanatory induction infer an explanation of the observations. I
don't think that syntactical constraints or language biases, such as considering only ab-
ducibles, must be put forward to distinguish between induction and abduction. There-
fore, abduction and explanatory induction can be seen as logically identical. However,
they can still be separated from an epistemic perspective. Those epistemic differences
apply to descriptive induction as well. However more differences can be pointed out
for descriptive induction.
7.6 CONCLUSION
In this chapter, abduction is considered as the process of inferring an explanation for
given observations. Explanations are assumed to logically entail the observations.
This definition is close to the one of explanatory induction. In fact, they are identical
when no syntactical arguments are taken into account and I believe that the difference
between abduction and induction should not rely on syntactical restrictions. Logically
speaking, abduction and explanatory induction are the same process.
A distinctive contribution of this chapter is the comparison of abduction with another form of induc-
tion called descriptive (or confirmatory) induction. This kind of induction has been
studied for a long time by philosophers, but has only recently been considered in arti-
ficial intelligence. Descriptive induction doesn't aim at explaining the observations but
Acknowledgments
This work was partially supported by Esprit IV Long Term Research Project 20237 (Inductive
Logic Programming 2). I would like to thank Pierre Marquis and Antony Bowers for their
helpful comments.
III The integration of
abduction and induction: an
Artificial Intelligence perspective
8 UNIFIED INFERENCE IN
EXTENDED SYLLOGISM
Pei Wang
S ⊂ P
where S is the subject term of the statement, and P is the predicate term. Intuitively,
this statement says that S is a specialization (instantiation) of P, and P is a generaliza-
tion (abstraction) of S. This roughly corresponds to "S is a kind of P" in English.
Term logic uses syllogistic inference rules, in each of which two statements shar-
ing a common term generate a new statement linking the two unshared terms. In
Aristotle's syllogism (Aristotle, 1989), all statements are binary (that is, either true
or false), and all valid inference rules are deduction, therefore when both premises
are true, the conclusion is guaranteed to be true. When Aristotle introduced the de-
duction/induction distinction, he presented it in term logic, and so did Peirce when
he added abduction into the picture (Peirce, 1958). According to Peirce, the deduc-
tion/abduction/induction triad is defined formally in terms of the position of the shared
term (Table 8.1). Defined in this way, the difference among the three is purely syn-
tactic: in deduction, the shared term is the subject of one premise and the predicate of
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 117-129.
© 2000 Kluwer Academic Publishers.
118 P. WANG
the other; in abduction, the shared term is the predicate of both premises; in induction,
the shared term is the subject of both premises. If we only consider combinations of
premises with one shared term, these three exhaust all the possibilities (the order of
the premises does not matter for our current purpose).
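Peirce's purely syntactic classification by the position of the shared term (Table 8.1) can be captured in a few lines; encoding a statement "S ⊂ P" as a (subject, predicate) pair is my own illustrative assumption.

```python
def classify(premise1, premise2):
    """Classify a pair of term-logic premises, each a (subject, predicate)
    pair, by the position of the shared term (Peirce's syllogistic triad)."""
    s1, p1 = premise1
    s2, p2 = premise2
    if s1 == p2 or s2 == p1:
        return "deduction"   # shared term: subject of one, predicate of the other
    if p1 == p2:
        return "abduction"   # shared term is the predicate of both premises
    if s1 == s2:
        return "induction"   # shared term is the subject of both premises
    return "none"            # no shared term: premises cannot be combined
```

For example, premises M ⊂ P and S ⊂ M classify as deduction, S ⊂ M and P ⊂ M as abduction, and M ⊂ S and M ⊂ P as induction.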
It is well-known that only deduction can generate sure conclusions, while the other
two are fallible. However, it seems that abduction and induction, expressed in this way,
do happen in everyday thinking, though they do not serve as rules for proof or demon-
stration, as deduction does. In seeking for a semantic justification for them, Aristotle
noticed that induction corresponds to generalization, and Peirce proposed that abduc-
tion is for explanation. Later, when Peirce's focus turned to the pragmatic usage of
non-deductive inference in scientific inquiry, he viewed abduction as the process of
hypothesis generation, and induction as the process of hypothesis confirmation.
Peirce's two ways to classify inference, syllogistic and inferential (in the terminol-
ogy introduced by Flach and Kakas in their introductory chapter), correspond to two
levels of observation in the study of reasoning. The "syllogistic" view is at the micro
level, and is about a single inference step. It indicates what conclusion can be derived
from given premises. On the other hand, the "inferential" view is at the macro level,
and is about a complete inference process. It specifies the logical relations between
the initial premises and the final result, without saying how the result is obtained, step
by step.
Though the syllogistic view is constructive and appealing intuitively, some issues
remain open, as argued by Flach and Kakas:
1. Abduction and induction, when defined in the syllogistic form, have not ob-
tained an appropriate semantic justification.
2. The expressive capacity of the traditional term-oriented language is much less
powerful than the language used in predicate calculus.
I fully agree that these two issues are crucial to term logic.
For the first issue, it is well known that abduction and induction cannot be justified
model-theoretically in the way deduction is. Since "preserving truth in all models" seems
to be the only existing definition for the validity of inference rules, in what sense
are abductive and inductive conclusions "better", or "more rational", than arbitrary
conclusions? In what semantic aspect do they differ from each other?
UNIFIED INFERENCE IN EXTENDED SYLLOGISM 119
The second issue actually is a major reason why syllogism lost its dominance to
"mathematical logic" (mainly, first-order predicate logic) one hundred years ago. How
much of our knowledge can be put into the form "S is a kind of P", where S and P are
English words? Currently almost all logic textbooks treat syllogism as an ancient and
primary form of logic, and as a subset of predicate logic in terms of functionalities.
It is no surprise that even in the works that accept the "syllogistic" view of Peirce on
abduction and induction, the actual formal language used is not that of syllogism, but
predicate logic (see the chapter of Christiansen).
In the following, I want to show that when properly extended, syllogism (or term
logic, I am using these two names interchangeably here) provides a more natural,
elegant, and powerful framework, in which deduction, abduction, and induction can
be unified in syntax, semantics, and pragmatics. Furthermore, this new logic can be
implemented in a computer system for the purpose of artificial intelligence.
where "S ⊂ P" is used as in the previous section, and "<F, C>" is a pair of real
numbers in [0, 1] representing the truth value of the statement. F is the frequency of
the statement, and C is the confidence.
When both F and C reach their maximum value, 1, the statement indicates a com-
plete inheritance relation from S to P. This special case is written as "S ⊑ P". By
definition, the binary relation "⊑" is reflexive and transitive.
We further define the extension and intension of a term T as sets of terms:

E_T = {x | x ⊑ T},   I_T = {x | T ⊑ x},

so that

S ⊑ P  ⟺  E_S ⊆ E_P  ⟺  I_P ⊆ I_S,

where the first relation is an inheritance relation between two terms, while the last
two are inclusion relations between two sets. This is why "⊑" is called a "complete
inheritance" relation: "S ⊑ P" means that P completely inherits the extension of S,
and S completely inherits the intension of P.
As mentioned before, "⊑" is a special case of "⊂". In NARS, according to the as-
sumption of insufficient knowledge, inheritance relations are usually incomplete. To
adapt to its environment, even incomplete inheritance relations are valuable to NARS,
and the system needs to know how incomplete the relation is, according to given
knowledge. Though complete inheritance relations do not appear as given knowl-
edge to the system, we can use them to define positive and negative evidence of a
statement in idealized situations, just like we usually define measurements of physical
quantities in highly idealized situations, then use them in actual situations according
to the definition.
For a given statement "S ⊂ P" and a term M, when M is in the extension of S, it
can be used as evidence for the statement. If M is also in the extension of P, then the
statement is true, as far as M is concerned; otherwise it is false, with respect to M. In
the former case, M is a piece of positive evidence, and in the latter case, it is a piece of
negative evidence. Similarly, if M is in the intension of P, it becomes evidence for the
statement, and it is positive if M is also in the intension of S, but negative otherwise.
For example, if we know that iron (M) is metal (S), then iron can be used as ev-
idence for the statement "Metal is crystal" (S ⊂ P). If iron (M) is crystal (P), it is
positive evidence for the above statement (that is, as far as its instance iron is con-
cerned, metal is crystal). If iron is not crystal, it is negative evidence for the above
statement (that is, as far as its instance iron is concerned, metal is not crystal). There-
fore, syntactically, "Metal is crystal" is an inductive conclusion defined in term logic;
semantically, the truth value of the conclusion is determined by checking for inherited
instance (extension) from the subject to the predicate; pragmatically, the conclusion
"Metal is crystal" is a generalization of "Iron is crystal", given "Iron is metal" as
background knowledge.
Similarly, if we know that metal (P) is crystal (M), then "being crystal" can be
used as evidence for the statement "Iron is metal" (S ⊂ P). If iron (S) is crystal (M),
it is positive evidence for the above statement (that is, as far as its property crystal is
concerned, iron is metal). If iron is not crystal, it is negative evidence for the above
statement (that is, as far as its property crystal is concerned, iron is not metal). There-
fore, syntactically, "Iron is metal" is an abductive conclusion defined in term logic;
semantically, the truth value of the conclusion is determined by checking for inherited
property (intension) from the predicate to the subject; pragmatically, the conclusion
"Iron is metal" is an explanation of "Iron is crystal", given "Metal is crystal" as back-
ground knowledge.
The perfect parallelism of the above two paragraphs indicates that induction and
abduction, when defined in term logic as above, become duals of each other.
If the given knowledge to the system is a set of complete inheritance relations, then
the weight of positive evidence and the weight of total (positive and negative) evidence
are defined, respectively, as

w⁺ = |E_S ∩ E_P| + |I_P ∩ I_S|,   w = |E_S| + |I_P|,

and the truth value is obtained as F = w⁺/w and C = w/(w + 1).
Intuitively, F is the proportion of positive evidence among all evidence, and C is the
proportion of current evidence among evidence in the near future (after a unit-weight
evidence is collected). When C is 0, it means that the system has no evidence on
the proposed inheritance relation at all (and F is undefined); the more evidence the
system gets (no matter positive or negative), the more confident the system is on this
judgment.
Now we can see that while in traditional binary logics, the truth value of a state-
ment qualitatively indicates whether there exists negative evidence for the statement,
in NARS the truth value quantitatively measures available positive and negative evi-
dence.
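The evidence-counting scheme just described can be sketched as follows, assuming the unit evidential horizon (k = 1) of (Wang, 1994; Wang, 1995); the function and variable names are my own.

```python
def truth_value(ext_s, ext_p, int_s, int_p, k=1):
    """<F, C> from idealized evidence sets (evidential horizon k = 1).
    Positive evidence: extension of S shared with P, plus intension of P
    shared with S; total evidence: |E_S| + |I_P|."""
    w_plus = len(ext_s & ext_p) + len(int_p & int_s)
    w = len(ext_s) + len(int_p)
    if w == 0:
        return None, 0.0   # no evidence at all: F undefined, C = 0
    return w_plus / w, w / (w + k)

# "Metal is crystal" with iron as its only, positive, extensional evidence:
f, c = truth_value(ext_s={"iron"}, ext_p={"iron"}, int_s=set(), int_p=set())
```

A single piece of positive evidence yields frequency 1 but only confidence 1/2, reflecting how little one observation constrains future evidence.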
Revision
  S ⊂ P <F1, C1>    S ⊂ P <F2, C2>
  ────────────────────────────────
  S ⊂ P <F, C>
lishes inheritance relations based on shared extension. Though intuitively we can still
say that deduction is for proving, abduction is for explanation, and induction is for
generalization, these characteristics are no longer essential. Actually here labels like
"generalization" and "explanation" are more about the pragmatics of the inference
rules (that is, what the user can use them for) than about their semantics (that is, how
the truth values of their conclusions are determined).
Each time one of the above rules is applied, the truth value of the conclusion is
evaluated solely according to the evidence summarized in the premises. Abductive and
inductive conclusions are always uncertain (i.e., their confidence values cannot reach
1 even if the premises have confidence 1), because they never check the extension or
intension of the two terms exhaustively in one step, as deductive conclusions do.
To get more confident conclusions, a revision rule is used to combine evidence
from different sources (Table 8.3). These two functions are derived from the relation
between weight of evidence and truth value, and the additivity of weight of evidence
during revision (Wang, 1994; Wang, 1995). The conclusion is more confident than
either premise, because it is based on more evidence. The frequency of the conclusion
is more stable in the sense that it is less sensitive (compared to either premise) to (a
given amount of) future evidence.
Using this rule to combine abductive and inductive conclusions, the system can
obtain more confident and stable generalizations and explanations.
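The revision functions can be reconstructed from the stated additivity of evidence weights: confidence is converted back to a total weight by w = kC/(1 − C), the weights of the two premises are added, and the sums are converted back to <F, C>. The sketch below assumes the unit horizon k = 1 of (Wang, 1994; Wang, 1995).

```python
def revise(f1, c1, f2, c2, k=1):
    """Combine evidence from two independent sources by adding weights.
    w = k*C/(1-C) recovers total weight from confidence; the revised
    frequency is the evidence-weighted average of the premise frequencies."""
    w1 = k * c1 / (1.0 - c1)
    w2 = k * c2 / (1.0 - c2)
    w = w1 + w2
    f = (f1 * w1 + f2 * w2) / w
    return f, w / (w + k)

# Merging two <1.00, 0.90> judgements from different sources:
f, c = revise(1.00, 0.90, 1.00, 0.90)
```

As the text says, the conclusion is more confident than either premise: two weight-9 judgements give total weight 18, so confidence rises from 0.90 to 18/19 ≈ 0.95 while the frequency stays 1.00.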
available knowledge. The task and knowledge are chosen probabilistically according
to priority values reflecting the urgency of each task and the salience of each piece
of knowledge. The priority distributions are adjusted after each step, according to
the result the system obtained in that step. By doing so, the system tries to spend
more resources on more important and promising tasks, with more reliable and useful
knowledge. Consequently, in each inference step the system does not decide what rule
to use first, then look for corresponding knowledge. Instead, it picks up two statements
that share a common term, and decides what rule to apply according to the position of
the shared term.
In general, a question-answering procedure in NARS consists of many inference
steps. Each step carries out a certain type of inference, such as deduction, abduction,
induction, revision, and so on. These steps are linked together in run-time in a context-
sensitive manner, so the processing of a question or a piece of new knowledge does
not follow a predetermined algorithm. If the same task appears at different times in
different contexts, the processing path and the result may differ, depending on the
available knowledge, the order in which the pieces of knowledge are accessed, and
the time-space resources allotted to the task.
When the system runs out of space, it removes the terms and statements with the lowest
priority, so some knowledge and tasks may be permanently forgotten by the
system. When the system is busy (that is, working on many urgent tasks at the same
time), it cannot afford the time to answer all questions and to consider all relevant
knowledge, so some knowledge and tasks may be temporarily forgotten.
Therefore the quality of the answers the system can provide depends not only on the
available knowledge, but also on the context in which the questions are asked.
This control mechanism makes it possible for NARS to answer questions in real
time, to handle unexpected tasks, and to use its limited resources efficiently.
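The control strategy just described amounts to priority-weighted random selection with feedback. A minimal illustration (the item names, priority values, and linear adjustment rule are my own inventions, not the actual NARS mechanism):

```python
import random

def pick(items, priority):
    """Choose an item with probability proportional to its priority."""
    return random.choices(items, weights=[priority[i] for i in items], k=1)[0]

def adjust(priority, item, useful, rate=0.1):
    """Raise the priority of an item that produced a useful result, lower it otherwise."""
    target = 1.0 if useful else 0.0
    priority[item] += rate * (target - priority[item])

tasks = ["urgent-question", "background-task"]
priority = {"urgent-question": 0.9, "background-task": 0.1}
```

Over many steps, urgent tasks are selected far more often, yet low-priority tasks are never completely starved, and the feedback loop shifts resources toward items that proved useful.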
8.3 AN EXAMPLE
Let us see a simple example. Assume that the following is the relevant knowledge that
NARS has at a certain moment:
(1) robin ⊂ feathered-creature ⟨1.00, 0.90⟩
("Robins have feathers.")
(2) bird ⊂ feathered-creature ⟨1.00, 0.90⟩
("Birds have feathers.")
(3) swan ⊂ bird ⟨1.00, 0.90⟩
("Swans are a kind of bird.")
(4) swan ⊂ swimmer ⟨1.00, 0.90⟩
("Swans can swim.")
(5) gull ⊂ bird ⟨1.00, 0.90⟩
("Gulls are a kind of bird.")
(6) gull ⊂ swimmer ⟨1.00, 0.90⟩
("Gulls can swim.")
124 P. WANG
robin ⊂ swimmer
Swan provides positive evidence for "Bird swims". Again, the confidence is low.
[Step 5] (10) and (12) look identical, but since they come from different sources,
they are not redundant and can be merged by the revision rule to get:
A compromise is formed by considering both positive and negative evidence, and the
positive evidence is stronger.
It should be mentioned that a typical run in NARS is much more complex than
the above description: we have omitted the conclusions that are irrelevant to
the current question, and we have assumed an order of inference that leads directly to
the desired result.
For example, in Steps 2 and 4, NARS actually also gets a symmetric inductive
conclusion
(17) swimmer ⊂ bird ⟨1.00, 0.45⟩
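The confidence value 0.45 in conclusion (17) can be reproduced from premises with truth value ⟨1.00, 0.90⟩ using the induction truth function as I reconstruct it from (Wang, 1994): w⁺ = f₁f₂c₁c₂ and w = f₂c₁c₂, with f = w⁺/w and c = w/(w + k), k = 1. A sketch (treat the exact formulas as an assumption):

```python
def induction(t1, t2, k=1.0):
    """From M ⊂ P <f1, c1> and M ⊂ S <f2, c2>, derive S ⊂ P.
    Only instances of M supply evidence, so the confidence stays low."""
    (f1, c1), (f2, c2) = t1, t2
    w_pos = f1 * f2 * c1 * c2   # positive evidence
    w = f2 * c1 * c2            # total evidence
    return w_pos / w, w / (w + k)

# "Swan is a kind of bird" and "Swan can swim" yield "Bird can swim":
# w = 0.9 * 0.9 = 0.81, so c = 0.81 / 1.81, which rounds to 0.45
f, c = induction((1.0, 0.9), (1.0, 0.9))
```

A single inference step never examines the extensions exhaustively, which is why the confidence of an inductive conclusion cannot approach 1 even from confident premises.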
8.4 DISCUSSION
In this book, the current chapter is the only one that belongs to the term logic tradition,
while all the others belong to the predicate logic tradition. Instead of comparing NARS
with the other approaches introduced in the other chapters one by one, I will compare
the two paradigms and show their differences in handling abduction and induction,
because this is the origin of many minor differences between NARS and the other
works.
Compared with deduction, a special property of abduction and induction is the
uncertainty they introduce into their conclusions: even when all the premises
are completely true and an abduction (or induction) rule is correctly applied, there is
still no guarantee that the conclusion is completely true.
When abduction and induction are formalized in binary logic, as in most chapters
of this book, their conclusions become defeasible, that is, a conclusion can be
falsified by a single piece of counter-evidence (see Lachiche's chapter). The philosophical
foundation and implications of this treatment of induction can be found in Popper's
work (Popper, 1959). According to this approach, an inductive conclusion is a universally
quantified formula that implies all positive evidence but no negative evidence.
Though many practical problems can be forced into this framework, I believe that
there are many more that cannot: in empirical science and everyday life, it is not
easy to get a non-trivial "rule" without counter-examples. Abduction is similar:
staying in binary logic means that we are only interested in explanations that explain
all relevant facts, which are not very common either.
To generate and/or evaluate generalizations and explanations with both positive and
negative evidence usually means measuring the evidence quantitatively, preferring the
candidates with more positive and less negative evidence (other things
being equal). A natural candidate theory for this is "probabilistic logic" (a combination
of first-order predicate logic and probability theory).
Let us use induction as an example. In predicate logic, a general conclusion "Ravens
are black" can be represented as a universally quantified proposition (∀x)(Raven(x) →
Black(x)). To extend it beyond binary logic, we attach a probability to it, to allow it to
be "true to a degree". Intuitively, each time a black raven is observed, the probability
should be increased a little, while each time a non-black raven is observed, it
should be decreased a little.
Unfortunately, Hempel found a paradox in this naive solution (Hempel, 1943).
(∀x)(Raven(x) → Black(x)) is logically identical to (∀x)(¬Black(x) → ¬Raven(x)).
Since the probability of the latter is increased by any non-black non-raven (such as a
green shirt), so is that of the former. This is highly counter-intuitive.
This chapter makes no attempt to survey the huge literature on Hempel's
"Raven Paradox". What I want to mention is the fact that the previous solutions
were all proposed within the framework of first-order predicate logic. I will show that this
problem is actually caused by the framework itself, and that the paradox does not appear
in term logic.
In first-order predicate logic, every general conclusion is represented by a proposition
which contains at least one universally quantified variable, such as the x in the
previous example. This variable can be substituted by any constant in the domain, and
the resulting proposition is either true or false. If we call the constants that make it
true "positive evidence" and those that make it false "negative evidence", then everything
must belong to one of the two categories, and nothing in the domain is irrelevant.
UNIFIED INFERENCE IN EXTENDED SYLLOGISM 127
Literally, (∀x)(Raven(x) → Black(x)) states that "For everything in the domain, either it
is a raven, or it is not black". Though it is a meaningful statement, there is a subtle
difference between it and "Ravens are black": the latter is about ravens, not about
everything.
The situation in term logic is different. In term logic "Ravens are black" can be
represented as raven ⊂ black_thing, and "Non-black things are not ravens" as (thing −
black_thing) ⊂ (thing − raven). According to the definition, these two statements
share common negative evidence (non-black ravens), but the positive evidence for
the former (black ravens) and the latter (non-black non-ravens) are completely different
(here we consider only the extension of the concepts). The two statements have
the same truth value in binary (extensional) term logic, because there a truth value
merely indicates qualitatively whether there is negative evidence for the statement. In
a non-binary term logic like NARS, they no longer necessarily have the same truth value,
so in NARS a green shirt has nothing to do with the system's belief about
whether ravens are black, just as a crow, being a non-swimmer, provides no evidence for
swimmer ⊂ bird, whether it is a bird or not (see the example in the previous section).
The crucial point is that in term logic, general statements are usually not about
everything (except when "everything" or "thing" happens to be the subject or the predicate),
and the domain of evidence is only the extension of the subject (and the intension
of the predicate, for a logic that considers both extensional and intensional
inference). I cannot see how first-order predicate logic can be extended or revised to
do a similar thing.
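The three-way distinction between positive, negative, and irrelevant evidence can be made concrete with a toy classifier for extensional evidence about a statement S ⊂ P (an illustrative sketch; the data representation is mine, not NARS code):

```python
def classify(individual, statement):
    """Classify an individual as positive, negative, or irrelevant evidence
    for a term-logic statement 'subject ⊂ predicate', read extensionally:
    only members of the subject's extension count as evidence at all."""
    subject, predicate = statement
    if subject not in individual["kinds"]:
        return "irrelevant"                      # outside the evidence domain
    return "positive" if predicate in individual["kinds"] else "negative"

ravens_are_black = ("raven", "black_thing")
black_raven = {"kinds": {"raven", "black_thing"}}
white_raven = {"kinds": {"raven"}}
green_shirt = {"kinds": {"shirt", "green_thing"}}
```

A green shirt is classified as irrelevant rather than as (weak) positive evidence, which is exactly how the paradox is dissolved: the evidence domain of raven ⊂ black_thing is the extension of raven, not the whole universe.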
In summary, my argument goes like this: the real challenge of abduction and induction
is to draw conclusions from conflicting and incomplete evidence. To do this, it
is necessary to distinguish positive evidence, negative evidence, and irrelevant information
for a given statement. This task can easily be carried out in term logic, though
it is hard (if possible at all) in predicate logic.
Another advantage of term logic over predicate logic is the relation among deduction,
abduction, and induction. As described previously, in NARS the three have a
simple, natural, and elegant relationship, in both their syntax and semantics. Their
definitions and the relationships among them become controversial in predicate logic,
which is a major issue discussed in the other chapters of this book.
By using a term logic, NARS gets the following properties that distinguish it from
other artificial intelligence systems doing abduction and induction:
• Abduction and induction become dual in the sense that they are completely symmetric
to each other, both syntactically and semantically. The difference is that
abduction collects evidence from the intensions of the terms in the conclusion,
while induction collects evidence from the extensions of the terms. Intuitively,
they still correspond to explanation and generalization, respectively.
• With the help of the revision rule, abduction and induction at the problem-
solving level become incremental and open-ended processes, and they do not
need predetermined algorithms.
Here I want to claim (though I have discussed only part of the reasons in this chapter)
that, though first-order predicate logic is still better for binary deductive reasoning,
term logic provides a better platform for the enterprise of artificial intelligence.
However, this does not mean that we should simply go back to Aristotle. NARS has
extended the traditional term logic in the following aspects:
Though the last issue is beyond the scope of this chapter, it needs to be addressed
briefly. Term logic is often criticized for its poor expressibility. Obviously, many
statements cannot be put into the "S ⊂ P" format where S and P are simple words.
However, this problem can be solved by allowing compound terms. This is similar to
the situation in natural language: most (if not all) declarative sentences can be parsed
into a subject phrase and a predicate phrase, each of which can be either a word or a
structure consisting of multiple words. In the same way, term logic can be extended to
represent more complex knowledge.
For example, in the previous section "Non-black things are not ravens" is represented
as (thing − black_thing) ⊂ (thing − raven), where both the subject and the
predicate are compound terms formed from simpler terms with the help of the difference
operator. Similarly, "Ravens are black birds" can be represented as raven ⊂
(black_thing ∩ bird), where the predicate is the intersection of two simpler terms;
"Sulfuric acid and sodium hydroxide neutralize each other" can be represented as
(sulfuric_acid × sodium_hydroxide) ⊂ neutralization, where the subject is a Cartesian
product of two simpler terms.
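The compound terms above can be sketched as a small recursive data structure (illustrative only; the class and operator names are mine):

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Atom:
    """A simple term: a single word."""
    name: str
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Compound:
    """A compound term: simpler terms joined by a logical operator
    ('-' difference, 'n' intersection, 'x' Cartesian product)."""
    op: str
    parts: Tuple["Term", ...]
    def __str__(self):
        return "(" + (" %s " % self.op).join(str(p) for p in self.parts) + ")"

Term = Union[Atom, Compound]

thing, black, raven = Atom("thing"), Atom("black_thing"), Atom("raven")
# "Non-black things are not ravens": (thing - black_thing) ⊂ (thing - raven)
subject = Compound("-", (thing, black))
predicate = Compound("-", (thing, raven))
```

Because parts may themselves be compounds, operators nest recursively, which is what gives the term-oriented language its enriched expressibility.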
Though the new version of NARS containing compound terms is still under de-
velopment, it is obvious that the expressibility of the term-oriented language can be
greatly enriched by recursively applying logical operators to form compound terms
from simpler terms.
Finally, let us re-visit the relationship between the micro-level (inference step) and
macro-level (inference process) perspectives of abduction and induction, in the context
of NARS. As described previously, in NARS the words "abduction" and "induction"
are used to name (micro-level) inference rules. Though the conclusions derived by
these rules still intuitively correspond to explanation and generalization, such a
correspondence does not hold accurately at the macro level. If NARS is given a list of
statements to start with, then after many inference steps the system may reach a
conclusion which is recognized by human observers as an explanation (or generalization)
of some of the given statements. In such a situation the abduction
(or induction) rule has usually played a major role in the process, though it is rarely the only
rule involved. As shown by the example in the previous section, the answers NARS reports
to the user are rarely purely abductive (or deductive, inductive, and so on).
In summary, though different types of inference can be clearly distinguished in each
step (at the micro level), a multiple-step inference procedure usually consists of
various types of inference, and so cannot be accurately classified as induction, abduction, or
deduction.
As mentioned at the beginning of this chapter, Peirce introduced the deduction-
induction-abduction triad at two levels of reasoning: syllogistic (micro, single step)
and inferential (macro, complete process). I prefer to use the triad in the first sense,
because it has an elegant and natural formalization in term logic. On the other hand,
I doubt that we can identify a similar formalization at the macro level when using
"abduction" for hypothesis generation and "induction" for hypothesis confirmation.
It is very unlikely that there is a single, universal method for inference processes like
hypothesis generation or confirmation. On the contrary, these processes are typically
complex, and vary from situation to situation. For the purposes of artificial intelligence,
we prefer a constructive explanation to a descriptive one, and we are more likely
to achieve this goal at the micro level than at the macro level.
Because term logic has been ignored by mainstream logic and artificial intelligence
for a long time, it is still too early to draw conclusions about its power and limitations.
However, according to the available evidence, we can at least say that it shows many novel
properties, and that some, if not all, of the previous criticisms of term logic can be avoided
if we extend the logic properly.
9 ON THE RELATIONS BETWEEN
ABDUCTIVE AND INDUCTIVE
EXPLANATION
Luca Console and Lorenza Saitta
9.1 INTRODUCTION
Abduction and induction are two forms of inference that are commonly used in many
artificial intelligence tasks with the goal of generating explanations about the world.
Paradigmatic is the case of Machine Learning, which traditionally relied on induction
to generate hypotheses (Plotkin, 1970; Mitchell, 1982; Michalski, 1983b). However,
some limitations emerging in purely inductive systems led researchers to propose
the use of other reasoning mechanisms for learning, e.g. deduction (Mitchell et al.,
1986; DeJong and Mooney, 1986), abduction (O'Rorke et al., 1990; Saitta et al., 1993)
and analogy (Veloso and Carbonell, 1991). Thus, a precise characterization of the various
mechanisms could help clarify their relations with learning tasks.
Interest in abduction as a mechanism for generating (best) explanations has grown
considerably in many fields of AI, such as diagnosis (Console and Torasso, 1991; Reiter,
1987; Cox and Pietrzykowski, 1987; de Kleer et al., 1992; Poole, 1989b), planning
(Eshghi, 1988), natural language understanding (Charniak, 1988; Hobbs et al.,
1993), and logic programming (Kakas et al., 1992). Indeed, several formal accounts of
abduction have been proposed (e.g., (Console et al., 1991b; O'Rorke et al., 1990; Cox
and Pietrzykowski, 1986a; Poole et al., 1987; Konolige, 1992; Kakas et al., 1992;
Levesque, 1989; De Raedt and Bruynooghe, 1991; Josephson, 1994)).
The goal of this chapter is to analyse the notion of reasoning towards explanation,
with specific interest in abduction and induction. We must say immediately that we
shall not be concerned with a universal notion of explanation, whose explication has
attracted for years, and still does, the attention of many philosophers (see, e.g., (Salmon,
1990) for a review). Thus, in this chapter we shall deal only with a restricted notion of
deductive explanation, as used, for instance, in the literature on principles of diagnosis
(Hamscher et al., 1992).
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 133-151.
© 2000 Kluwer Academic Publishers.
134 L. CONSOLE AND L. SAITTA
One of the goals of this chapter is to show that, using a logical framework, different
tasks aimed at providing explanations for a set of observations can be conceptually
unified; these tasks can be differentiated by imposing different constraints on the type
of explanation searched for. The goal is achieved by using a generalized notion of
explanatory hypothesis and of observation, including any kind of formulas, not just
ground ones. The framework allows induction and abduction to be characterized as
two aspects of the same inference process and to be related to each other. The proposed
characterization is in no way claimed to be the correct one; however, it does seem to
capture, in most cases, a basic intuition behind these inference schemes, and to make
explicit the grounds on which the hypotheses they generate are based.
The process of explaining observations is not limited to the generation of hypotheses,
but also includes their evaluation and selection. Besides domain-dependent criteria,
some notion of minimality (see, e.g., (Poole, 1989a; Stickel, 1988)) or simplicity
(Michalski, 1983b; Kemeny, 1953; Pearl, 1978) has been proposed to introduce an
order in the hypothesis space. Generation and selection of hypotheses can be done at
the same time, by biasing the search process in such a way that only hypotheses in a
preferred set are generated. In this chapter, however, we consider the two phases as
conceptually distinct, in order to give a definition of explanation neutral with respect
to any additional constraint suggested by the domain of application. A fundamental
partial order between hypotheses, widely used in Machine Learning, is given by their
degree of generality (specificity). In the first part of the chapter we briefly introduce a
definition of the notion of generality, which will then be used in our characterization
of explanation.
The chapter is organized as follows: the notion of generality is discussed in Section
9.2; a generalized notion of explanation is introduced in Section 9.3; the relations be-
tween induction and abduction are investigated in Section 9.4. Section 9.5 applies the
framework to some examples of reasoning mechanisms. Finally, Section 9.6 discusses
related work.
The concept extension consists of the set of individuals (or tuples of individuals) which
satisfy the concept definition. More precisely, let f(x₁, ..., xₙ) be a concept over the
free variables x₁, ..., xₙ; the extension of f with respect to an interpretation I is defined
as follows¹:

EXT_I(f) = {⟨a₁, ..., aₙ⟩ ∈ Ωⁿ | f(a₁, ..., aₙ) is true in I}

The predicates that are true of a given n-tuple ⟨a₁, ..., aₙ⟩ ∈ Ωⁿ are said to belong
to the intension of that n-tuple (Descles, 1987):

INT(⟨a₁, ..., aₙ⟩) = {p | p(a₁, ..., aₙ) is true}
A certain confusion between these two aspects has influenced some of the definitions
of the more-specific-than (more-general-than) relation. Any definition of this relation
should acknowledge that specificity (generality) is an extensional property and, hence,
pertains only to concepts. Closed formulas (sentences) are statements about the generality
of the associated concepts and can be compared according to the information
they provide about a concept. A concept and a sentence are not comparable with respect
to generality. In order to illustrate this difference, let us consider the concept
square(x) and the sentences:

s₁ = square(a)    s₂ = ∃x[square(x)]    s₃ = ∀x[square(x)]

No extension can be associated with any of s₁, s₂ or s₃; hence, it makes no sense
to speak of their degree of generality. However, each of the three sentences provides
information about the extension, and, hence, the degree of generality of the associated
concept square(x). In particular, s₁ states that a ∈ EXT(square) (with a ∈ Ω), i.e.,
that the extension of the concept square contains at least the individual a. s₂ states
that EXT(square) ≠ ∅, i.e., that the extension of square(x) is not empty. s₃ states
that EXT(square) = Ω, i.e., that the extension of square(x) coincides with the whole
universe.
We can now introduce the more-specific-than relation among concepts and the
more-informative-than relation among sentences.

Definition 9.1 Given two concepts f(x₁, ..., xₙ) and g(x₁, ..., xₙ), and a universe of
discourse Ω, the concept f(x₁, ..., xₙ) will be said to be more specific than the concept
g(x₁, ..., xₙ) (denoted by f |< g (Michalski, 1983b)) iff EXT(f) ⊆ EXT(g) for any
interpretation.

If both f |< g and g |< f hold, then f and g belong to the same equivalence class with
respect to the more-specific-than relation; equivalence in generality will be denoted
by f <|> g. The relation |< is reflexive and transitive, but not antisymmetric,
because it usually includes the case of equivalence. Definition 9.1, however, may
not be applicable in practice, and an intensional criterion is needed. θ-subsumption
(Plotkin, 1970) was one of the first proposed criteria; recently it has been widely used
in Inductive Logic Programming (Muggleton, 1993).
¹In the discussion that follows, in order to simplify the notation, we shall limit ourselves to Herbrand
interpretations.
Definition 9.2 Given two sentences φ and ψ of L, φ will be said to be more informative
than ψ (denoted by φ ⊐ ψ) iff W(φ) ⊆ W(ψ), where W(φ) denotes the set of
consistent worlds in which φ is true.

Definition 9.3 Given a theory T, expressed in the language L, and two concepts
f(x₁, ..., xₙ) and g(x₁, ..., xₙ), the concept f will be said to be more specific than
the concept g with respect to T (denoted by f |<_T g) iff T ⊢ ∀x₁, ..., xₙ [f(x₁, ..., xₙ) →
g(x₁, ..., xₙ)], where → denotes material implication.

Definition 9.4 Given two sentences φ and ψ of L and a theory T, φ will be said to be
more informative than ψ with respect to T (denoted by φ ⊐_T ψ) iff T ⊢ (φ → ψ).
It is easy to see that Definitions 9.3 and 9.4 are special cases of Definitions 9.1 and
9.2, respectively. If we want to draw a parallel between the more-specific-than (|<_T)
and the more-informative-than (⊐_T) relations, we can say that the extension of a concept
corresponds to the information content of a sentence. In order to examine more
deeply the parallel between informativeness of sentences and generality of concepts,
let us consider some examples. For instance, with an empty theory, we have:

∀x p(x) ⊐ p(a) ⊐ ∃x p(x)

In fact, the formula ∀x p(x) selects the worlds in which every object has the property
p, whereas p(a) selects all those worlds in which only the object a is bound to have
that property. Finally, ∃x p(x) is true in every world in which some object has the
property p. Then:
The attempt by (Flach, 1992) at assessing the relative generality of the two sentences
above is not meaningful in this chapter, because only the more-informative-than relation
can be applied to sentences. On the other hand, Sigmund lives in Vienna
is a ground instance of the concept lives(x, Vienna), which can be proved to be more
specific than the concept lives(x, Austria) with respect to T.
Definition 9.5 Given a domain theory T and a set of sentences α_obs, describing observations
performed on the world, a set E of sentences will be called an Explanation
for α_obs (with respect to T) iff the following conditions are satisfied:

(a) E ⊐_T α_obs
(b) not (E ⊐_T ⊥)
(c) not (True ⊐_T α_obs)
(d) E has some additional properties that make it interesting.
According to Definition 9.5, the explanation (a) must be more informative than the
observations, (b) must be consistent with the theory, and (c) must be essential, i.e.,
complete information about the observation must not already be present in the theory.
Condition (a) is what Johnson-Laird requires of an explanation in human reasoning: a
hypothesis must increase the semantic information with respect to the observations,
in the sense that it must reduce the number of possible states of affairs. However, we
have to pay for this increase with the uncertainty of the hypothesis (Johnson-Laird,
1988, Ch. XIII). Definition 9.5 captures this notion by explicitly referring to the
information content added by the generated hypothesis to our knowledge of the world.
Definition 9.5 has the advantage, on the one hand, of leaving open the possibility of
using different definitions of explanations (for instance, logical or causal ones), and,
on the other hand, of allowing a great freedom in the syntactic form of both observa-
tions and hypotheses. For example, using Definition 9.4, the conditions of Definition
9.5 can be rewritten as follows:

(a) T ∪ E ⊢ α_obs    (b) T ∪ E ⊬ ⊥    (c) T ⊬ α_obs
obtaining thus the usual formulation of explanation in model-based diagnosis (Console
and Torasso, 1991; Poole, 1989a). Moreover, in model-based diagnosis (or planning),
α_obs and E are usually sets of ground literals. Restricting observations and hypotheses
to be ground seems reasonable for these tasks, where one wants to hypothesize
either faults in the modelled system or the effect of an action in order to explain specific
symptoms or a specific state of the world. In learning, on the contrary, the hypothesized
explanation of a phenomenon is to be added to the current knowledge for future
use. Then, in learning, one would also like to explain non-ground, general formulas,
for instance to check part of a domain theory or to justify regularities noticed in the
world; this last task is fundamental for theory formation, in which empirical laws are
to be explained by general theories. The suitability of explaining quantified data has
also been advocated more recently by (Kelly and Glymour, 1988), who are interested
in the impact of this kind of data on the complexity of learning. In Machine Learning,
most systems assume that α_obs is ground and E contains universally quantified
sentences. Notwithstanding the possible differences in the format of observations and
hypotheses, the very same conditions above have been adopted in Machine Learning
to characterize the learned hypotheses (Muggleton, 1991; Michalski, 1991).
Without any modification, Definition 9.5 also covers the case in which the φ ⊐_T ψ
relation is interpreted as φ is a cause of ψ (Cox and Pietrzykowski, 1986a; Saitta et al.,
1993; O'Rorke et al., 1990).
In summary, Definition 9.5 is justified by the requirement that it satisfies at least
the following properties:
• It is widely applicable, in that it only demands a minimal set of broadly accepted
requirements for explanations. In fact, it is neutral with respect both to the
selected notion of explanation and to any criterion that may be used to single
out interesting explanations.
• The analysis of the reasoning mechanisms in the rest of the chapter depends only
on this minimal set of requirements. In particular, the considerations that
follow hold for any criterion used either to specialize Definition 9.5 or to select
among alternative hypotheses.
Given Definition 9.5, the number of explanations satisfying conditions (a)-(c) is usually
very large; condition (d) then comes into play, allowing a set of preferred explanations
to be selected. Even though this chapter is not concerned with the problem of
explanation selection, some of the proposed criteria are briefly mentioned (see (Poole,
1989a; Stickel, 1988) for a discussion), pointing out their relations with the notion of
informativeness introduced in Section 9.2.
• Avoiding redundant explanations. This criterion corresponds to the requirement
that an interesting explanation E must be minimal, in the sense that no proper subset
of E satisfies conditions (a)-(c). This criterion, adopted in many early definitions
of explanation, is subsumed by the notion of information content (see the discussion
in (Console et al., 1991b)). Such a criterion is also related to the notion of prime
implicants, which is in turn defined in terms of logical entailment and has been used,
e.g., in model-based diagnosis (de Kleer et al., 1992).
• Avoiding trivial explanations, coinciding with the observations themselves. In
learning from examples, for instance, hypotheses consisting of the disjunction of
the observed examples may be undesirable.
• Explanations may be required to be expressed only in terms of a predefined set of
predicates (or language), for instance, the set of abducible predicates in diagno-
sis (Console and Torasso, 1991). This limitation is extensively used in Machine
Learning, under the name of language bias (Mitchell, 1982).
• A further dimension for defining preference criteria among explanations concerns
the specificity (or basicness (Cox and Pietrzykowski, 1986a; Stickel, 1988)). In
most specific abduction (Stickel, 1988), the assumptions must be basic, i.e., not
provable by making other assumptions. In least-specific abduction the only al-
lowable assumptions are the observations themselves. The notion of information
content partially interacts with such a choice: if we compare two explanations us-
ing as background theory the same domain theory used for generating explanations,
we enforce a preference for less specific explanations. It is worth noting that inter-
mediate notions between the two mentioned above have been suggested, e.g., in
cost-based abduction (Hobbs et al., 1988), in which numeric costs are associated
with assumptions.
• (Poole, 1989a) introduced the notion of least presumptive explanations, i.e., explanations
which do not make any assumptions that are not supported by the observations:
given a set T of relations among assumptions (domain theory), an explanation
E₁ is less presumptive than E₂ iff T ∪ E₂ ⊢ E₁. Thus, as expected, this notion is a
special case of that of informativeness.
• In Machine Learning, three criteria have been traditionally used to compare and
select hypotheses: consistency, completeness and simplicity (Occam's razor prin-
ciple). A complete hypothesis is able to explain all the positive occurrences of the
phenomenon under study, whereas a consistent hypothesis does not explain any of
the negative occurrences of the phenomenon. Simplicity is more difficult to define,
and has mostly been associated with the hypothesis' syntactic simplicity. However,
this type of simplicity is not always satisfactory, and other notions, more seman-
tic in nature, have been proposed. One attempt has been to introduce a measure
of coherence (Ng and Mooney, 1991), a metric that selects the assumptions that
are most relevant to (connected to) the given observations. This approach has
been proposed for natural language understanding (inferring users' plans from
utterances).
Inductive hypotheses are not all inductive to the same degree. For instance, in
Example 9.1 the sentence E₄ = q(a) ∧ q(b) is also a valid inductive explanation for
q(a). However, the inductive leap from q(a) to ∀x q(x) is much larger than that from q(a)
to q(a) ∧ q(b). A quantitative measure based on informativeness might allow
inductive leaps to be measured and compared.
Example 9.2 Let us now consider a universe of discourse Ω = {John, Dan} and the
following theory and observation:

T = { ∀x[measles(x) → red_spots(x)],
      ∀x, y[measles(x) ∧ brothers(x, y) → measles(y)],
      ∀x, y[brothers(x, y) → brothers(y, x)],
      brothers(John, Dan) }

α_obs = red_spots(John)
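Conditions (a) and (c) of Definition 9.5, in their rewritten form T ∪ E ⊢ α_obs and T ⊬ α_obs, can be checked mechanically for this example with a small forward chainer over the theory grounded on Ω = {John, Dan} (a sketch; the encoding is mine):

```python
OMEGA = ["John", "Dan"]

def rules():
    """The theory T of Example 9.2, grounded on the universe {John, Dan}."""
    rs = []
    for x in OMEGA:
        rs.append(({("measles", x)}, ("red_spots", x)))
        for y in OMEGA:
            rs.append(({("measles", x), ("brothers", x, y)}, ("measles", y)))
            rs.append(({("brothers", x, y)}, ("brothers", y, x)))
    return rs

FACTS = {("brothers", "John", "Dan")}

def closure(extra):
    """All ground atoms derivable from T plus the extra hypotheses."""
    known = FACTS | set(extra)
    changed = True
    while changed:
        changed = False
        for body, head in rules():
            if body <= known and head not in known:
                known.add(head)
                changed = True
    return known

obs = ("red_spots", "John")
```

T alone does not derive red_spots(John), while adding E = {measles(John)} does, and it also derives measles(Dan) and red_spots(Dan) via the brothers rules; condition (b) holds trivially here, since the grounded theory contains no negation.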
T ∪ E_i ⊢ α_i (1 ≤ i ≤ k)

In this case, the nature of the E_i's can be assessed independently. Therefore, we can
consider, without loss of generality, the case of a non-decomposable explanation E
for α_obs. The definition of the various types of explanations (hypotheses) will be
introduced in several steps, starting from the case where the observation is ground and
consists of instances of a unique predicate p.
E4 = {r(a), ∀x [r(x) → p(x)]}

where the predicate r does not occur in T; hence, r(a) is not a subformula of T
instantiated on a. However, the generation of E4 corresponds to the acknowledgement
that T may be incomplete and that p(a) is not actually a satisfactory hypothesis; some
other phenomenon should then be postulated to produce p(a). This kind of process of
generating the abductive hypothesis E4 is quite common in theory formation, where
theoretical terms and hidden properties are hypothesized to extend the theory T on the
basis of new evidence.
Definition 9.6 can be generalized to the case where more than one predicate symbol
occurs in the observation, i.e., α_obs = p_1(x̄_1) ∧ … ∧ p_m(x̄_m). In such a case, the nature
of an explanation E can be evaluated with respect to each p_i (1 ≤ i ≤ m); we will say
that, globally, E is inductive when it is inductive with respect to at least one p_i, E
is abductive when it is abductive with respect to every p_i, and inductive/abductive in
any other case. The reason for this definition is that an abductive hypothesis must not
introduce any extensional increase; thus the extension of the p_i's should not change.
On the other hand, it is sufficient that a single p_i has its extension increased in order
to have an inductive leap. An analogous definition can be given in the more general
case in which α_obs is a ground formula expressed in conjunctive or disjunctive normal
form.
The case where negative literals occur in the observations has to be dealt with care-
fully. In particular, one can interpret negation in a classical way; in such a case, given
an observation ¬α(a) (or ¬α(x̄)), we can apply the same considerations as above,
by simply turning ¬α into a new predicate and considering the complements
of the extension(s). However, interpreting negation classically is not natural in many
applications of abductive and inductive reasoning (see (Console et al., 1991b)), and the
interpretation as failure to prove is much more useful and common. In this case, the
explanations must be consistent with the negative literals, i.e., positive atoms incon-
sistent with the negative literals must not be derivable from the explanation. The
considerations above must then be adapted; however, a detailed description of this process
is outside the scope of this chapter.
Let us now consider the case where the observation is not ground; the formal defi-
nition of the nature of an explanation becomes more complex. Let us start again from
the simplest case where the observation is a single universally quantified predicate p:
α_obs = ∀x p(x). In this case, the observed extension of p coincides with the whole
universe, i.e., X = EXT(p) = Ω. Then, only purely abductive explanations can exist
for α_obs.
Consider now the case of the observation being a universally quantified formula
not consisting of a single predicate, i.e., α_obs = ∀x̄ φ(x̄), where x̄ denotes a set of
m variables. Obviously, as X = EXT_obs(φ) = Ω^m, there cannot exist inductive ex-
planations for the whole α_obs. However, φ(x̄) is a concept whose extension derives
from the combination of the extensions of sub-concepts, each consisting of a single
predicate. Let Φ = {p_j | 1 ≤ j ≤ n} be the set of such predicates. Let, moreover,
X_j = EXT_obs(p_j) (1 ≤ j ≤ n) be the observed extension of p_j in α_obs. Definition 9.6
can be applied to each p_j; in fact, even if EXT_obs(φ) cannot be further extended, some
of the X_j's could be. Then, an explanation E of α_obs = ∀x̄ φ(x̄) will be called inductive
if it is inductive for at least one p_j ∈ Φ, abductive if it is abductive for every p_j ∈ Φ,
and inductive/abductive in every other case.
ON THE RELATIONS BETWEEN ABDUCTIVE AND INDUCTIVE EXPLANATION 145
Absorption
    Theory:       ∀x [φ(x) → p(x)]              Theory:       ∀x [φ(x) → p(x)]
    Observation:  ∀x [φ(x) ∧ ψ(x) → q(x)]       Explanation:  ∀x [p(x) ∧ ψ(x) → q(x)]

Identification
    Theory:       ∀x [φ(x) ∧ p(x) → q(x)]       Theory:       ∀x [φ(x) ∧ p(x) → q(x)]
    Observation:  ∀x [φ(x) ∧ ψ(x) → q(x)]       Explanation:  ∀x [φ(x) → p(x)]

Truncation
    Observation:  ∀x [φ(x) ∧ ψ(x) → p(x)]       Explanation:  ∀x [φ(x) → p(x)]

Table 9.1 Inverse resolution operators.
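The absorption operator of Table 9.1 can be sketched in propositional form. The clause encoding and the function name below are my own choices, not the chapter's notation:

```python
# A small propositional sketch of the absorption operator of Table 9.1.
# A clause body -> head is encoded as (frozenset(body), head).

def absorption(theory_clause, observation):
    """Given theory phi -> p and observation phi & psi -> q,
    return the explanation p & psi -> q."""
    phi, p = theory_clause
    obs_body, q = observation
    assert phi <= obs_body, "theory antecedent must occur in the observation"
    psi = obs_body - phi          # the conditions not accounted for by phi
    return (frozenset({p}) | psi, q)

# Theory: phi -> p;  Observation: phi & psi -> q
explanation = absorption((frozenset({"phi"}), "p"),
                         (frozenset({"phi", "psi"}), "q"))
assert explanation == (frozenset({"p", "psi"}), "q")
```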
is quite old (Meltzer, 1970; Morgan, 1971). Choosing resolution as the deduction
machinery, Muggleton defined a set of rules for performing Inverse Resolution for the
propositional calculus (Muggleton, 1987). Later, a reduced set of inference rules was
considered for First Order Logic (Muggleton and Buntine, 1988; Rouveirol and
Puget, 1990). The nature of the inverse resolution operators has been investigated in
(Console et al., 1991a) in the case of propositional calculus. In particular, in that paper
absorption, identification and truncation (whose definitions are summarized in Table
9.1) have been proved to be sound inference rules with respect to the general definition
of explanation reported above.
In order to investigate the nature of these operators, we have to identify the theory
and the observation. The criterion we chose is that the formulas that are not modified
by the inference process belong to the theory. Explanations are new rules to be added
to the theory.
It is easy to show that absorption and identification have a potentially inductive/
abductive nature. Absorption, in fact, corresponds to the process of generalizing prop-
erties of classes of objects (the climbing-generalization rule (Michalski, 1983b)). Given
the assertion in T, which states that each object in class φ belongs to the class p,
the fact that the objects in class φ have the property q (when they also have property
ψ) is explained by stating that the same is true of all the objects in the super-class p of
φ. If the theory T also contains the clause:

∀x [δ(x) → p(x)]

then the objects in class δ would also inherit the property q, even though those objects
are not involved in the observation. This is a case in which the nature of a hypothesis
can be determined only in the context of a given theory.

The truncation operator has a purely inductive nature, since it corresponds to the
dropping-condition rule (Michalski, 1983b). The extension of p is enlarged from the
set of objects with both properties φ and ψ to the set of objects with only property φ.
r = ∀x∀y [φ(x,y) → w(x)]

to be added to the theory, by trying to prove w(a) from T. The addition of r to T is
only a matter of convenience because, in fact, T ⊢ r.

In EBG, the observation α_obs, which we want to justify, is:

Actually, it is well known that EBG does not really allow something new to be learned,
but only to state, in a new and effective way, implicit links between w and properties
embedded in T.
Given a set α_obs of observations (containing at most one ground instance of each
predicate belonging to the language of the observations) and, possibly, contextual
information, a diagnosis is a set E of ground instances of the mode predicate, assigning
at most one mode to each component, and such that α_obs follows from T, E, the inputs
and the context.
It should be clear that explanations of this form have a purely abductive nature
according to Definition 9.6 (as one would expect). This can be shown by considering
that observations and explanations are forced to be ground and that the behavioral
models that are used are usually functional. Therefore, no value other than the observed
one can be predicted for a predicate corresponding to a given observation. Notice that
the same notion of explanation has also been used in other tasks, such as planning or
natural language understanding (see the overview in (Poole, 1990)).
(for instance, (Cox and Pietrzykowski, 1986a; Saitta et al., 1993; O'Rorke et al.,
1990)), this notion is mostly left to the reader's intuition from everyday life: a non-
monotonic relation between cause and effect, which is not material implication, but
which is context-sensitive and related to temporal order. Definition 9.5 also applies
to causal explanations, provided that some computationally precise semantics is as-
sociated with the explanation relation between φ and ψ, to be read, in this case, as
"φ causes ψ". Finally, we may notice that a weaker notion of explanation can also be
considered. For instance, (Flach, 1991; Flach, 1995) makes a distinction between weak
explanation (based on consistency between the hypothesis and data) and strong
explanation (based on derivability of data from the hypothesis). Requiring only
consistency of the explanation with the observations and the available theory is common
in model-based diagnosis, when the theory models the system's correct behaviour
(Console and Torasso, 1991; de Kleer et al., 1992). Observation derivability may be
required, instead, when the theory models the system's possible fault modes (in
particular, causes of misbehaviour).
such that P ∪ BK ⊨ C, where BK is a body of background knowledge and C is the set
of statements to be explained. In his view, then, abduction is a special case of
induction, unlike what is proposed in this chapter. However, Michalski does not provide
any formal characterization of the abductive explanations among all the inductive
ones, so it is difficult to recognize them.
A similar classification is introduced in (Kodratoff, 1991), where abduction is one
among many other forms of inductive inference. In particular, Kodratoff distinguishes
between abduction (as inversion of modus ponens) and generalization (inference from
the particular to the general), but does not provide any formal support for the
distinction, relying only on intuitive arguments.
On the contrary, (Josephson, 1994) (see also Josephson's chapter in the present
volume) classifies inductive generalization as a form of inference to some best
explanation, i.e., as a form of abduction (following (Harman, 1965)); in fact, unlike
Michalski and Kodratoff, and also unlike us, he calls abduction the whole process of
generating, evaluating and selecting hypotheses. However, he too fails to provide a
formal definition of inductive generalization and, more importantly, a convincing
distinction between inductive generalization and abduction. In fact, he states that the
result of an inductive generalization does not explain the observed facts, but the
events of observing the facts, which is quite an ambiguous assertion. Moreover, his
motivation for including inductive generalization within abduction is also weak: were
it not so, he argues, we would be unable to justify why the credibility of an inductive
generalization increases with the number of supporting observations. This is not true:
it is sufficient to introduce a ranking among inductive hypotheses based on some
quantitative measure.
In this chapter we agree with Aristotle and the later proposal of Peirce, who
considered induction and abduction as distinct processes. Our classification may provide a
formal basis for a precise distinction between explanations which generalize and those
which do not.

Finally, we disagree with the view proposed by (Flach, 1991) that induction =
abduction + revision, because this definition is too vague and does not allow for a
precise distinction between the two; moreover, it is not psychologically satisfactory
to tie the definition of induction to incrementality. In fact, this amounts to saying that,
if one sees ten green apples today and concludes that all apples are green, he or she
performs abductive reasoning, whereas if tomorrow two more green apples are taken
into consideration, inductive reasoning takes place.
9.7 CONCLUSIONS
This chapter proposed a generalized definition of explanation and a characterization
of the abductive or inductive nature of explanations in the context of a theory.
Recognizing the inductive or abductive nature of a hypothesis allows different kinds of
motivation to be sought when assessing its credibility: its frequency of occurrence in
the actual world, for an inductive hypothesis, and its causes, for an abductive one.
10
LEARNING, BAYESIAN
PROBABILITY, GRAPHICAL MODELS,
AND ABDUCTION
David Poole
10.1 INTRODUCTION
This chapter explores the relationship between learning (induction) and abduction. I
take what can be called the Bayesian view, where all uncertainty is reflected in proba-
bilities. In this chapter I argue that, not only can abduction be used for induction, but
that most current learning techniques (from statistical learning to neural networks to
decision trees to inductive logic programming to unsupervised learning) can be best
viewed in terms of abduction.
Definition 10.1 An evidential reasoning task is one in which some parts of a system
are observed and you want to make inferences about other (hidden) parts.
Example 10.1 The problem of diagnosis is an evidential reasoning task. Given ob-
servations about the symptoms of a patient or artifact, we want to determine what is
going on inside the system to produce those symptoms.
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 153-168.
© 2000 Kluwer Academic Publishers.
Evidential reasoning tasks are often of the form where there is a cause-effect relation-
ship between the parts. In diagnosis we can think of the disease causing the symptoms.
In vision we can think of the scene causing the image. By causation¹, I mean that dif-
ferent diseases can result in different symptoms (but changing the symptoms doesn't
affect the disease) and different scenes can result in different images (but manipulating
an image doesn't affect the scene).
There are a number of different ways of modelling such a causal domain:
causal modelling where we model the function from causes to effects. For example,
we can model how diseases or faults manifest their symptoms. We can model
how scenes produce images.
evidential modelling where we model the function from effects to causes. For
example, we can model the mapping from symptoms to diseases, or from image to
scene.
Independently of these two modelling strategies, we can consider two reasoning tasks:
Evidential Reasoning given an observation of the effects, determine the causes. For
example, determine the disease from the symptoms, or the scene from the image.
Causal Reasoning given some cause, make a prediction of the effects. For example,
predicting symptoms or prognoses from a disease, or predicting an image from
a scene. This is often called simulation.
[Figure 10.1 appeared here; only its labels "observation" and "prediction" are recoverable.]
• The second strategy is to model both causally and evidentially and to use the
causal model for causal reasoning and the evidential model for evidential rea-
soning. The main problem with this is the redundancy of the knowledge, and
its associated problem of consistency, although there are techniques for au-
tomatically inferring the evidential model from the causal model for limited
cases (Poole, 1988b; Console et al., 1989; Konolige, 1992; Poole, 1994). Pearl
(Pearl, 1988a) has pointed out how naive representations of evidential and causal
knowledge can lead to problems.
• The third strategy is to model causally and use different reasoning strategies for
causal and evidential reasoning. For causal reasoning we can directly use the
causal model, and for evidential reasoning we can use abduction.
This leads to an abstract formulation of abduction that will include both logical and
probabilistic formulations of abduction:
Definition 10.2 Abduction is the use of a model in its opposite direction. That is, if a
model specifies how x gives rise to y, abduction lets us infer x from y. Abduction is
usually evidential reasoning from a causal model².
If we have a model of how causes produce effects, abduction lets us infer causes
from effects. Abduction depends on an implicit assumption of complete knowledge of
possible causes (Console et al., 1989; Poole, 1994); when an effect is observed, one
of its causes must be present.
² Neither the standard logical definition of abduction nor the probabilistic version of abduction (presented
below) prescribes that the given knowledge is causal. It shouldn't be surprising that the formal definitions
don't depend on the knowledge base being causal, as the causal relationship is a modelling assumption. We
don't want the logic to impose arbitrary restrictions on modelling.
• An agent can only act according to its beliefs and its goals. An agent doesn't
have access to everything that is true in its domain, but only to its beliefs. An
agent must somehow be able to decide on actions based on its beliefs.
• It is not enough for an agent to have just a single model of the world in which
it is interacting and act on that model. It also needs to consider what other
alternatives may be true, and make sure that its actions are not too disastrous if
these other contingencies happen to arise.
A classic example is wearing a seat belt; an agent may assume that it won't have
an accident on a particular trip, but wears a seat belt to cover the possibility
that it does have an accident. Under normal circumstances, the seat belt is a
slight nuisance, but if there is an accident, the agent is much better off when
it is wearing a seat belt. Whether the agent wears a seat belt depends on how
inconvenient it is when there is no accident, how much better off the agent would
be if they were wearing a seat belt when there is an accident, and how likely an
accident is. This tradeoff between various outcomes, their relative desirability,
and their likelihood is the subject of decision theory.
• As we will see below, probabilities are what can be obtained from data. Proba-
bility lets us explicitly model noise in data, and lets us update our beliefs based
on noisy data.
w ⊨ α ∧ β iff w ⊨ α and w ⊨ β

So far this is just standard logic, but using the terminology of random variables.
Let's assign a nonnegative measure μ(w) to each world w so that the measures of
the possible worlds sum³ to 1. The use of 1 is purely by convention; we could just
as easily have used 100, for example.

The probability of proposition α, written P(α), is the sum of the measures of the
worlds in which α is true:

P(α) = Σ_{w ⊨ α} μ(w)
μ_e(w) = { μ(w)/P(e)   if w ⊨ e
         { 0           if w ⊭ e

We can then define the conditional probability of α given e, written P(α|e), in terms
of the new measure:

P(α|e) = Σ_{w ⊨ α} μ_e(w)
Example 10.3 The probability P(sneeze = yes | cold = severe) specifies, out of all of
the worlds where cold is severe, what proportion have sneeze with value yes. It is
the measure of belief in the proposition sneeze = yes given that all you knew was that
the cold was severe. The probability P(sneeze = yes | cold ≠ severe) considers the
other worlds, where the cold isn't severe, and specifies the proportion of these in which
sneeze has value yes. This second probability is independent of the first.
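The possible-worlds reading of Example 10.3 can be made concrete. The measure μ over the four (sneeze, cold) worlds below is invented for illustration; only the mechanics (summing μ over worlds, renormalising on the evidence) follow the text.

```python
# The possible-worlds semantics above, made concrete with invented numbers.

from itertools import product

worlds = list(product(["yes", "no"], ["severe", "mild"]))  # (sneeze, cold)
mu = {("yes", "severe"): 0.20, ("no", "severe"): 0.05,
      ("yes", "mild"): 0.15, ("no", "mild"): 0.60}          # sums to 1

def P(prop):
    """P(alpha) = sum of mu(w) over the worlds where alpha holds."""
    return sum(mu[w] for w in worlds if prop(w))

def P_given(prop, evidence):
    """P(alpha | e) via the renormalised measure mu_e."""
    return P(lambda w: prop(w) and evidence(w)) / P(evidence)

sneeze = lambda w: w[0] == "yes"
severe = lambda w: w[1] == "severe"

print(P_given(sneeze, severe))                   # 0.20 / 0.25 = 0.8
print(P_given(sneeze, lambda w: not severe(w)))  # 0.15 / 0.75 = 0.2
```

As the example notes, the two conditional probabilities are independent of one another: changing μ on the mild worlds leaves the first untouched.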
P(h|e) = P(h ∧ e) / P(e).

Rewriting the above formula, and noticing that h ∧ e is the same proposition as e ∧ h,
we get:

P(h ∧ e) = P(h|e) × P(e)
         = P(e|h) × P(h)

We can divide the right-hand sides by P(e), giving

P(h|e) = P(e|h) × P(h) / P(e)
³ When there are infinitely many possible worlds, we need to use some form of measure theory, so that the
measure of all of the possible worlds is 1. This requires us to assign probabilities to measurable sets of
worlds, but the general idea is essentially the same.
if P(e) ≠ 0. This equation is known as Bayes' theorem or Bayes' rule. It was first
given in this generality by (Laplace, 1812).

It may seem puzzling why such an innocuous-looking equation should be so
celebrated. It is important because it tells us how to do evidential reasoning from a causal
knowledge base; Bayes' rule is an equation for abduction. Suppose P(e|h) specifies
a causal model; it gives the propensity of effect e in the context where h is true. Bayes'
rule specifies how to do evidential reasoning; it tells us how to infer the cause h from
the effect e.

The numerator is the product of the likelihood, P(e|h), which specifies how well
the hypothesis h predicts the evidence e, and the prior probability, P(h), which specifies
how much the hypothesis was believed before any evidence arrived.

The denominator, P(e), is a normalising constant that ensures the probabilities
are well formed. If {h_1, …, h_k} is a pairwise incompatible (h_i and h_j cannot
both be true if i ≠ j) and covering (one h_i must be true) set of hypotheses, then

P(e) = Σ_i P(e|h_i) × P(h_i)

If you are only interested in comparing hypotheses, this denominator can be ignored.
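The role of the denominator can be checked numerically. The two hypotheses and all numbers below are invented for illustration; only the algebra follows the text.

```python
# The denominator of Bayes' rule, made concrete: two invented, mutually
# exclusive and covering hypotheses (flu / no_flu) with made-up numbers.

prior = {"flu": 0.1, "no_flu": 0.9}
likelihood = {"flu": 0.8, "no_flu": 0.05}   # P(fever | h)

# P(e) = sum_i P(e | h_i) * P(h_i)
p_e = sum(likelihood[h] * prior[h] for h in prior)

posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}

# The posteriors over a covering, incompatible hypothesis set sum to 1 ...
assert abs(sum(posterior.values()) - 1.0) < 1e-12
# ... and for *comparing* hypotheses the denominator cancels out:
# the ratio of posteriors equals the ratio of the numerators.
ratio = posterior["flu"] / posterior["no_flu"]
assert abs(ratio - (0.8 * 0.1) / (0.05 * 0.9)) < 1e-9
```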
If e is the data (all of the training examples) and h is a hypothesis, Bayes' rule
specifies how, given the model of how the hypothesis h produces the data e and the prior
propensity of h, you can infer how likely the hypothesis is, given the data. One
of the main reasons why this is of interest is that the hypotheses can be noisy; a
hypothesis can specify a probability distribution over the data it predicts. Moreover,
Bayes' rule allows us to compare those hypotheses that predict the data exactly (where
P(e|h) = 1) amongst themselves and with the hypotheses that specify any other
probability of the data.
Example 10.4 Suppose we are doing Bayesian learning of decision trees, and are
considering a number of definitive decision trees (i.e., they predict classifications with
0 or 1 probabilities, and thus have no room for noise). For each such decision tree h,
either P(e|h) = 1 or P(e|h) = 0. Bayes' theorem tells us that those that don't predict
the data have posterior probability 0, and those that predict the observed data have
posterior probabilities proportional to their priors. Thus the prior probability specifies
the learning bias (for example, towards simpler decision trees): out of all of the trees
that match the data, which are to be preferred. Without such a bias there can be no
learning, as every possible function can be represented as a decision tree. Bayes' rule
also specifies how to compare simpler decision trees that may not exactly fit the data
(e.g., if they have probabilities at the leaves) with more complex ones that exactly fit
the data. This gives a principled way to handle overfitting.
Example 10.5 The simplest form of Bayesian learning with probabilistic hypotheses
is when there is a single binary event that is repeated and statistics are collected; that
is, we are trying to learn probabilities. Suppose we have some object that can fall
so that either some distinguishing feature (which we will call heads) shows on top,
or it does not (which we will call tails). We would like to learn the probability that
heads shows on top. Suppose our hypothesis space consists of hypotheses that specify
P(heads) = p, where heads is the proposition that says heads is on top, and p is a
number specifying the probability of heads on top. Implicit in this hypothesis is that
repeated tosses are independent⁴. Suppose we have an observation e consisting of a
particular sequence of outcomes
⁴ Bayesian probability doesn't require independent trials. You can model the interdependence of the trials
in the hypothesis space.
with n of the m outcomes having heads true. Let h_p be the hypothesis that
P(heads) = p, for some 0 ≤ p ≤ 1. Then we have, by elementary probability theory,

P(e|h_p) = p^n (1 − p)^(m−n)

Suppose that our prior probability is uniform on [0,1]; that is, we consider each value
for P(heads) to be equally likely before we see any data.

Figure 10.2 shows the posterior distributions for various values of n and m. Note
that the only hypotheses that are inconsistent with the observations are P(heads) = 0
when n > 0 and P(heads) = 1 when m > n. Note that if the prior isn't very biased, it
soon gets dominated by the data.
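Example 10.5 can be sketched numerically by discretising p on a grid. The grid size and variable names below are my choices; only the likelihood p^n(1−p)^(m−n) and the uniform prior come from the text.

```python
# A numerical sketch of Example 10.5: a uniform prior over a grid of
# hypotheses h_p, each weighted by the likelihood p**n * (1-p)**(m-n).

N_GRID = 1001
grid = [i / (N_GRID - 1) for i in range(N_GRID)]

def posterior(n, m):
    """Posterior over the h_p grid given n heads in m tosses,
    normalised to sum to 1 over the grid."""
    weights = [p**n * (1 - p)**(m - n) for p in grid]
    total = sum(weights)
    return [w / total for w in weights]

post = posterior(n=8, m=10)
# Hypotheses inconsistent with the data get posterior 0 ...
assert post[0] == 0.0 and post[-1] == 0.0   # p = 0 and p = 1
# ... and the posterior mean approximates the Laplace estimate (n+1)/(m+2).
mean = sum(p * w for p, w in zip(grid, post))
assert abs(mean - 9 / 12) < 1e-3
```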
The latter is the number of bits it takes to describe the data in terms of the model plus
the number of bits it takes to describe the model. Thus the best hypothesis is the one
that gives the shortest description of the data in terms of that model.
⁵ We don't have to do this. In particular, it is the posterior distribution of the hypotheses that we want to use
to make decisions, rather than the most likely hypothesis.
idea is to represent a domain in terms of random variables and to explicitly model the
interdependence of the random variables in terms of a graph. This is useful when a
random variable only depends on a few other random variables, as occurs in many
domains.
Suppose we decide to represent some domain using the random variables x_1, …, x_n.
If we totally order the variables, it is easy to prove that

P(x_1, …, x_n) = P(x_1) P(x_2|x_1) P(x_3|x_1, x_2) ⋯ P(x_n|x_1, …, x_{n−1})

For each variable x_i, suppose there is some minimal set π_{x_i} ⊆ {x_1, …, x_{i−1}} such that

P(x_i | x_1, …, x_{i−1}) = P(x_i | π_{x_i})

That is, once you know the values of the variables in π_{x_i}, knowing the values of other
predecessors of x_i in the total ordering will not change your belief in x_i. The elements
of the set π_{x_i} are known as the parents of variable x_i. We say x_i is conditionally
independent of its predecessors given its parents. We can create a graph with an arc
from each parent of a node into that node. Such a graph, together with the conditional
probabilities P(x_i | π_{x_i}) for each variable x_i, is known as a Bayesian network or a
belief network (Pearl, 1988b; Jensen, 1996).
There are a few important points to notice about a Bayesian network:
• By construction, the graph defining a Bayesian network is acyclic.
• Different total orderings of the variables can result in different Bayesian net-
works for the same underlying distribution.
• The size of the conditional probability table P(x_i | π_{x_i}) is exponential in the
number of parents of x_i.
Typically we try to build Bayesian networks so that the total ordering implies few
parents and a sparse graph.
Bayesian networks are of interest because they can be constructed taking into ac-
count just local information, the information that has to be specified is reasonably
intuitive, and there are many domains that have concise representations as Bayesian
networks. There are algorithms that can exploit the sparseness of the graph for com-
putational gain (Lauritzen and Spiegelhalter, 1988; Dechter, 1996; Zhang and Poole,
1996), exploit the skewness of distributions (Poole, 1996) or use the structure for
stochastic simulation (Henrion, 1988; Pearl, 1987; Dagum and Luby, 1997).
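The factorisation into parent-conditional probabilities can be sketched on a toy network. The rain/sprinkler/grass_wet structure and all of its numbers below are invented for illustration; only the factorisation itself follows the text.

```python
# A minimal Bayesian-network sketch: rain -> grass_wet <- sprinkler.
# The joint is the product of each variable's probability given its parents.

from itertools import product

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.1, False: 0.9}
# P(grass_wet = True | rain, sprinkler)
P_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.05}

def joint(rain, sprinkler, wet):
    p_w = P_wet[(rain, sprinkler)]
    return P_rain[rain] * P_sprinkler[sprinkler] * (p_w if wet else 1 - p_w)

# The factorised joint is a proper distribution: it sums to 1.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
assert abs(total - 1.0) < 1e-12

# Evidential reasoning by summing worlds: P(rain | grass_wet).
p_wet = sum(joint(r, s, True) for r, s in product([True, False], repeat=2))
p_rain_given_wet = sum(joint(True, s, True) for s in [True, False]) / p_wet
assert p_rain_given_wet > P_rain[True]  # wet grass raises belief in rain
```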
P(e) = Π_{n ∈ e} P(n)

In (Poole, 1993) it was proved that the Bayesian network and the abductive
characterisation result in the same probabilities.
Suppose we want to compute a probability given evidence; we have

P(h|e) = P(h ∧ e) / P(e)

Thus this can be seen in terms of abduction as: given evidence e, first explain the
evidence (this gives P(e)), and from the explanations of the evidence, explain h (this
gives P(h ∧ e)). Note that the explanations of h ∧ e are the explanations of e extended
to also explain h. In terms of a Bayesian network, you can first go backwards along
the arrows to explain the evidence, and then go forward along the arrows to make
predictions. Thus not only can Bayes' rule be seen as a rule for abduction, but Bayesian
networks can be seen as a representation for abduction. Note that this reasoning
framework of using abduction for evidential reasoning and assumption-based reasoning for
causal reasoning (see Figure 10.1), which is what the above analysis gives us for
Bayesian networks, has also been proposed in the default reasoning literature (Poole,
1989a; Poole, 1990; Shanahan, 1989).
The logic programs have a standard logical meaning and can be extended to include
(universally quantified) logical variables⁶ in the usual way. The only difference from
standard logic programs⁷ is that some of the premises are hypotheses that may have
an associated probability.
Example 10.6 Figure 10.3 shows a Bayesian network for the coin tossing of Example
10.5. The probability of heads on example i, which in the left-hand side of Figure 10.3
is shown as heads_i, is a random variable that depends only on θ, the probability of
⁶ It is important not to confuse logical variables, which stand for individuals, with random variables. In this
chapter, I will follow the Prolog convention of writing logical variables in upper case.
⁷ In the independent choice logic (Poole, 1997), we can also have negation as failure in the rules. The notion
of abduction needs to be expanded to allow abduction through the negation (Poole, 1998).
Figure 10.3 Bayesian network for coin tossing, with and without plates.
heads appearing on a coin toss. The right-hand side of Figure 10.3 shows the same
network using plates, where there is one copy of the boxed node for each example.
Given the logic-programming characterisation of Bayesian networks, we can use
universally quantified logical variables in the rules to represent the plates of Buntine.
Example 10.7 Let's write the example of Figure 10.3 in terms of probabilistic Horn
abduction. First we can represent each arc to an example by the rule:

heads(E) ← happens_to_turn_heads(E, P) ∧ prob_of_heads(P)

where heads(E) is true if example E shows a heads, and tails(E) is true if example E
shows a tails.

The corresponding alternatives are

∀E ∀P {happens_to_turn_heads(E, P), happens_to_turn_tails(E, P)} ∈ C

That is, we can assume that example E turns up heads or assume it turns up tails. We
then have the probabilities:

P(happens_to_turn_heads(E, P)) = P
P(happens_to_turn_tails(E, P)) = 1 − P

for each P ∈ [0, 1]. Suppose there were n heads and m tails in the k = n + m examples;
then the probability of this explanation is

P^n × (1 − P)^m × q
⁸ Note that when these decision trees are translated into rules, probabilistic Horn abduction theories result.
But here we are using probabilistic Horn abduction to represent the learning task, not the task being learned.
number(N) ∧
predicts_prob(Ex, N, V).

where

∀Ex ∀N {predicts_prob(Ex, N, true), predicts_prob(Ex, N, false)} ∈ C

such that

P(predicts_prob(Ex, N, true)) = N
P(predicts_prob(Ex, N, false)) = 1 − N
Similarly, we need ways to abduce what the trees are, and to solve the (more difficult)
problem of assigning the priors on the decision trees.

The most likely explanation of a set of classifications on examples results in the
most likely decision tree given those examples.
10.5.2 Generalization
It has often been thought that probability is unsuitable for generalization, as the
generalization ∀X r(X) must have a lower probability than any set of examples
r(e_1), …, r(e_k), since the generalization implies the examples. While the statement
about probability is correct, it is misleading, because it is not the hypothesis and the
evidence that we want to compare but the different hypotheses⁹.
The different hypotheses may be, for example:

1. r(X) is always true;

2. r(X) is sometimes true (and it just happened to be true for examples e_1, …, e_k).
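The point can be checked with a two-hypothesis sketch. Both hypotheses and the equal prior below are invented; only the Bayesian comparison follows the text.

```python
# Why generalisation survives Bayesian comparison: compare h_all
# (r(X) always true) against h_half (r(X) true with probability 0.5),
# starting from equal priors. All numbers are illustrative.

def posteriors(k):
    """Posterior over (h_all, h_half) after k examples with r true."""
    like_all, like_half = 1.0, 0.5 ** k
    prior = 0.5
    z = prior * like_all + prior * like_half
    return prior * like_all / z, prior * like_half / z

# With no data the hypotheses are tied; ten positive examples all but
# decide the question in favour of the generalisation.
assert posteriors(0) == (0.5, 0.5)
p_all, p_half = posteriors(10)
assert p_all > 0.999
```

Each supporting example halves the likelihood of the non-generalising hypothesis, which is exactly why the credibility of the generalisation grows with the number of observations.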
10.6 CONCLUSION
This chapter has related the Bayesian approach to learning to logic-based abduction.
In particular, I have sketched the relationship between Bayesian learning and the
graphical models of (Buntine, 1994), and the relationship between graphical models
and the abductive logic programming of (Poole, 1993). It should be emphasised that,
while each of the links has been developed, the chain has not been fully investigated.
This chapter should be seen as a starting point, rather than a survey of mature work.
Acknowledgments
This work was supported by Institute for Robotics and Intelligent Systems, Phase III (IRIS-III),
project "Dealing with Actions", and Natural Sciences and Engineering Research Council of
Canada Operating Grant OGP0044121.
11 ON THE RELATION BETWEEN
ABDUCTIVE AND INDUCTIVE
HYPOTHESES
Akinori Abe
11.1 INTRODUCTION
Abduction and induction have been recognized as important forms of reasoning with
incomplete information that are appropriate for many problems in artificial intelli-
gence (AI). Abduction is usually used for design, diagnosis and other such tasks. In-
duction is usually used for classification, program generation, and other similar tasks.
As mentioned in the introductory chapter to this volume, Peirce classified abduc-
tion from a philosophical point of view as the operation of adopting an explanatory
hypothesis and characterized its form. In addition, he characterized induction as the
operation of testing a hypothesis by experiments (Peirce, 1955a). Abduction and in-
duction are different in his viewpoint.
On the other hand, abduction in the AI field is generally understood as reasoning
from observation to explanations, and induction as the generation of general rules from
specific data. Sometimes, both types of inferences are thought to be the same because
they can be viewed as being the inverse of deduction. Pople mechanized abduction as
the inverse of deduction (Pople, Jr., 1973), although he seems to distinguish abduc-
tion from induction. Muggleton and Buntine have formalised induction as inverted
resolution (Muggleton and Buntine, 1988).
Some researchers have contended that abduction and induction are similar pro-
cesses. For example, Josephson, in his contribution to this volume, argues that "smart"
inductive generalisation is a special case of abduction. On the other hand, Dimopou-
los and Kakas have shown that the two types of inferences differ, in that abduction
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 169-180.
© 2000 Kluwer Academic Publishers.
extracts hypotheses from theories, while induction constructs its hypotheses using in-
formation from theories (Dimopoulos and Kakas, 1996b). Furthermore, from the two
perspectives, Flach has shown that "abduction (inferential) = abduction (syllogistic) ∪
induction (syllogistic)", and that induction (inferential) does not have an analogue in
the syllogistic perspective (Flach, 1996a).
In this chapter, I will argue from the inferential point of view. Since the role and
behaviour of abductive hypotheses and those of inductive ones are somewhat different,
I support the position that they are different. I will provide a way to integrate abduction
and induction from the features of the above hypotheses.
There are various sorts of studies about abduction and induction in the AI field.
However, for simplicity, in this chapter I will specialize abduction to hypothetical
reasoning and induction to inductive logic programming (ILP).
devices' functions and their connections, and knowledge of other rules in the predicate
form as follows (specification).¹
fact((equ(out(N,f1),O) :- conn(Node,out(N,f1)) & equ(Node,O))).
If the following input-output relation for the LSI circuit is given as an observation,
abduction computes the devices' names and their connections in a propositional form as
a set of hypotheses.
conn(out(1,x5),in(1,x6)) & conn(out(1,x4),in(1,x5)) &
conn(in(1,f1),in(1,x4)) & function(x6,reg) & dev(x6,1,1) & ...
(usually in the predicate form). If examples are given in the propositional form like
¹In the following examples, names beginning with a capital letter are variables.
(F). If it fails, it selects a subset of hypotheses (h) from H and tries to explain the
observation with facts and a consistent set of hypotheses.
F ⊬ O (O cannot be explained by F alone)
CMS. Another popular abduction system is the CMS (Reiter and de Kleer, 1987). The
CMS inference mechanism is: when
Σ ⊬ C (C cannot be explained by Σ alone),
a clause S is called a minimal support clause, and ¬S is a clause missing from the clause
set Σ that can explain C. Therefore, ¬S can be thought of as an abductive hypothesis
from the abductive point of view. This hypothesis is not included in the knowledge
base (Σ ⊬ ¬S). Therefore, there is no justification for the hypothesis except that
it provides a minimal completion for the abductive puzzle.
Definition 11.2 Let A and B be sets of clauses. "A ∼ B′" is the relationship between
A and B′, iff there exists a set of clauses B′ such that:
(i) A ⊨ B,
(ii) B ≠ B′, and
(iii) every clause in B′ is either included in B, derivable from B, an analogical clause
of one included in B, or an analogical clause of one derivable from B.
Definition 11.3 Let A be a set of clauses. "A ⊢∼ A′" is the relationship between A and
A′, such that A′ is a set of clauses that can derive analogical clauses of the clauses
derived from A. Here, "⊢∼" means "can derive clauses with analogical mapping."
(S ∼ S)
Σ ⊨ S′ ∨ C  (S ∼ S′)
Σ ⊢∼ S′
²In this chapter, "logically equivalent" means that both results from clauses are the same. For example, if A
is ¬a ∨ c and B is ¬a ∨ ¬b ∨ c and b, then A is logically equivalent to B.
³In fact, this query is given in the form of ¬donkey ∧ palace, because of the limitation of the CMS. One solution
to this problem is found in (Abe, 1997).
⁴A possible analogical target can be easily found from the inference path. For details, see (Abe, 1998).
B ∪ h ⊨ E  (11.3)
B ∪ h ⊭ ⊥  (11.4)
then ILP finds a logic program h ∈ H, such that B and h are complete (11.3) and
consistent (11.4).
F ∪ h ⊢ O

Figure 11.1 Relation between abductive hypotheses and inductive hypotheses.
Related work. As Mooney writes in his contribution to this volume, in his method
abduction works as knowledge base refinement, and inductive learning provides the
acquisition of abductive theories. The second part of his method is the same as pre-
sented here. In the first half, abduction is expected to generate positive examples to
refine the knowledge base. Therefore, abductive hypotheses in his system correspond
to inductive background knowledge.
Other related work is by (Kanai and Kunifuji, 1997). When background knowl-
edge for induction is not sufficient in their method, abduction produces sufficient
background knowledge. They regard abductive hypotheses as inductive background
knowledge. Their treatment of abductive hypotheses is also slightly different because
here we have abductive hypotheses corresponding to inductive examples.
⁵A near-miss example is a negative example that differs from a positive example in only a few elements.
To illustrate, if a positive example is ([v11, v12], [v21, v22], ...), then one near-miss example is
([v91, v92], [v21, v22], ...).
in general, it is rather excessive to divide negative examples and positive ones by such
a classification.
Therefore, in this chapter, I focus upon learning from positive-only examples. One
of the restrictions on learning rules from positive-only examples is the Subset Principle
(Angluin, 1980; Berwick, 1986), which is a necessary condition for positive-only
learning. Let Lᵢ be an indexed family of non-empty languages. The necessary and
sufficient condition for the Subset Principle is:
S ∪ T ⊬ G
S ∪ T ∪ M ⊨ G
S ∪ T ∪ M ⊬ ¬m  (m ∈ M)
T ∪ A ⊬ G
T ∪ A ∪ M ⊨ G
11.4 CONCLUSION
Abductive hypotheses and inductive hypotheses are sometimes treated as if they were
the same type of hypothesis. Indeed, the forms of the formulae for abduction and
induction are similar. However, their roles and behaviour show that they are not the same.
I have shown the relationship between abduction and induction. The relationship be-
tween abductive hypotheses and inductive hypotheses is shown in Figure 11.1. It gives
a clue to the integration of abduction and induction. By considering this relationship,
my solution to their integration is
2) to generate predicate rules by induction from abduced hypotheses, and then put
them into the knowledge base as facts.
mapping function is presently very simple, AAR can generate plausible hypotheses by
analogical mapping from clauses in the knowledge base. Therefore, induced rules are
also plausible ones.
Arima has shown that induction and analogy have a common form of preduc-
tion (Arima, 1997). He has described empirical inductive reasoning as "preduction
and mathematical induction" and analogical reasoning as "preduction and deduction."
From this viewpoint, analogical reasoning and inductive reasoning come from the
same root and are rather similar. Furthermore, I think analogical mapping works as
the ideal inductive reasoning. I believe this is so because, while mathematical induction
proceeds without regard to the relation between examples, ideal inductive
example collection can be done by analogical mapping. As such, analogical mapping
will work as a subtask of inductive reasoning.
Dimopoulos and Kakas have suggested that abduction can help exploit high-level
background theory available for learning and help handle possible incompleteness in
the background theory (Dimopoulos and Kakas, 1996b). Their concept for integra-
tion of abduction and induction is similar to the integration presented in this chapter.
However, it uses negative observations to eliminate inaccurate theories. Instead of
generating negative examples, it refines existing rules. Despite its use of negative
observations, it seems to be another way of learning with only positive examples.
12 INTEGRATING ABDUCTION
AND INDUCTION IN MACHINE
LEARNING
Raymond J. Mooney
12.1 INTRODUCTION
Abduction is the process of inferring cause from effect or constructing explanations
for observed events and is central to tasks such as diagnosis and plan recognition.
Induction is the process of inferring general rules from specific data and is the primary
task of machine learning. An important issue is how these two reasoning processes can
be integrated, or how abduction can aid machine learning and how machine learning
can acquire abductive theories. The machine learning research group at the University
of Texas at Austin has explored these issues in the development of several machine
learning systems over the last ten years. In particular, we have developed methods for
using abduction to identify faults and suggest repairs for theory refinement (the task of
revising a knowledge base to fit empirical data), and for inducing knowledge bases for
abductive diagnosis from a database of expert-diagnosed cases. We treat induction and
abduction as two distinct reasoning tasks, but have demonstrated that each can be of
direct service to the other in developing AI systems for solving real-world problems.
This chapter reviews our work in these areas, focusing on the issue of how abduction
and induction are integrated.¹
Recent research in machine learning and abductive reasoning has been characterized
by different methodologies. Machine learning research has emphasized experi-
¹Additional details are available in our publications listed in the bibliography, most of which are available
in postscript on the World Wide Web at http://www.cs.utexas.edu/users/ml.
of a negative example). Revising a logical theory may require both adding and remov-
ing clauses as well as adding or removing literals from existing clauses. Generally, the
ideal goal is to make the minimal syntactic change to the existing theory according to
some measure of edit distance between theories that counts the number of literal
additions and deletions required to transform one theory into another (Wogulis
and Pazzani, 1993; Mooney, 1995b). Unfortunately, this task is computationally in-
tractable; therefore, in practice, heuristic search methods must be used to approximate
minimal syntactic change. Note that compared to the use of background knowledge in
induction, theory refinement requires modifying the existing background knowledge
rather than just adding clauses to it. Experimental results in a number of realistic
applications have demonstrated that revising an existing imperfect knowledge base
provided by an expert results in more accurate results than inducing a knowledge base
from scratch (Ourston and Mooney, 1994; Towell and Shavlik, 1993).
Abduction would find that the assumption T (a) makes this positive example prov-
able. Therefore, two possible revisions to the theory are to remove the literal T (X)
from the second clause in the theory, or to learn a new clause for T (X ) , such as
T(X) :- V(X).
Q(X) :- V(X).
or
Q(X) :- S(X), V(X).
In order to find a small set of repairs that allow all of the positive examples to be
proven, a greedy set-covering algorithm can be used to select a small subset of the
union of repair points suggested by the abductive explanations of individual positive
examples, such that the resulting subset covers all of the positive examples. If sim-
ply deleting literals from a clause causes negative examples to be covered, inductive
methods (e.g. ILP techniques like FOIL (Quinlan, 1990)) can be used to learn a new
clause that is consistent with the negative examples. Continuing the example, assume
the positive examples are
P(a): R(a), S(a), V(a), W(a).
P(b): R(b), V(b), W(b).
The abductive assumptions Q(a) and Q(b) are generated for the first and second
positive examples respectively. Therefore, making a repair to the Q predicate would
cover both cases. Note that the previously mentioned potential repairs to T would not
cover the second example, since the abductive assumption T(b) is not sufficient (both
T(b) and S(b) must be assumed). Since a repair to the single predicate Q covers
both positive examples, it is chosen. However, deleting the antecedent Q(X) from
the first clause of the original theory would allow both of the negative examples to be
proven.
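The greedy selection step can be sketched as follows (an illustrative reconstruction, not the actual EITHER/NEITHER code; the example names echo the running example above):

```python
# Illustrative greedy set-cover sketch (not the actual EITHER/NEITHER code).
# Each candidate repair point covers the set of positive examples whose
# abductive explanations mention it; we greedily choose repairs until every
# positive example is covered.

def greedy_cover(positives, repairs):
    """repairs maps a repair point to the set of positive examples it covers."""
    uncovered = set(positives)
    chosen = []
    while uncovered:
        # pick the repair covering the most still-uncovered examples
        best = max(repairs, key=lambda r: len(repairs[r] & uncovered))
        if not repairs[best] & uncovered:
            break  # remaining examples cannot be covered by any repair
        chosen.append(best)
        uncovered -= repairs[best]
    return chosen

# The running example: a repair to Q covers both positives, a repair to T
# covers only the first, so the single repair point Q is selected.
repairs = {"Q": {"P(a)", "P(b)"}, "T": {"P(a)"}}
print(greedy_cover(["P(a)", "P(b)"], repairs))  # -> ['Q']
```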
Therefore, a new clause for Q is needed. Positive examples for Q are the required
abductive assumptions Q(a) and Q(b). Negative examples are Q(c) and Q(d),
since these assumptions would allow the negative examples to be derived. Given the
descriptions provided for a, b, c, and d in the examples, an ILP system such as
FOIL would induce the clause
Q(X) :- V(X).
since this is the simplest clause that covers both of the positive examples without
covering either of the negatives. Note that although the alternative, equally-simple
clause
Q(X) :- W(X)
covers both positive examples, it also covers the negative example Q(d).
A general outline of the basic procedure for using abduction for theory refinement is
given in Figure 12.1. The selection of an appropriate subset of assumption sets (repair
points) is generally performed using some form of greedy set-covering algorithm in
order to limit search. Selection of an appropriate assumption set may be based on an
estimate of the complexity of the resulting repair as well as the number of positive
examples that it covers. For example, the more negative examples that are generated
when the literals corresponding to an assumption set are deleted, the more complex
the resulting repair is likely to be.
The EITHER (Ourston and Mooney, 1990; Ourston and Mooney, 1994; Ourston,
1991) and NEITHER (Baffes and Mooney, 1993; Baffes, 1994) theory refinement sys-
tems allow multiple assumptions in order to prove an example, preferring more
specific assumptions, i.e. they employ most-specific abduction (Cox and Pietrzykowski,
1987). AUDREY (Wogulis, 1991), AUDREY II (Wogulis and Pazzani, 1993), A3
(Wogulis, 1994), and CLARUS (Brunk, 1996) are a series of theory refinement systems
that make a single-fault assumption during abduction. For each positive example, they
find a single most-specific assumption that makes the example provable. Different
constraints on abduction may result in different repairs being chosen, affecting the
level of specificity at which the theory is refined. EITHER and NEITHER strongly pre-
fer making changes to the more specific aspects of the theory rather than modifying
the top-level rules.
It should be noted that abduction is primarily useful in generalizing a theory to
cover more positive examples rather than specializing it to uncover negative examples.
A separate procedure is generally needed to determine how to appropriately specialize
a theory. However, if a theory employs negation as failure, abduction can also be used
to determine appropriate specializations (Wogulis, 1993; Wogulis, 1994).
It should also be noted that a related approach to combining abduction and in-
duction is useful in learning definitions of newly invented predicates. In particular,
several ILP methods for inventing predicates use abduction to infer training sets for
an invented predicate and then invoke induction recursively on the abduced data to
learn a definition for the new predicate (Wirth and O'Rorke, 1991; Kijsirikul et al.,
1992; Zelle and Mooney, 1994; Stahl, 1996; Flener, 1997) . This technique is ba-
sically the same as using abduced data to learn new rules for existing predicates in
theory refinement as described above.
A final interesting point is that the same approach to using abduction to guide re-
finement can also be applied to probabilistic domain theories. We have developed a
system, BANNER (Ramachandran and Mooney, 1998; Ramachandran, 1998) for re-
vising Bayesian networks that uses probabilistic abductive reasoning to isolate faults
and suggest repairs. Bayesian networks are particularly appropriate for this approach
since the standard inference procedures support both causal (predictive) and abductive
(evidential) inference (Pearl, 1988b). Our technique focuses on revising a Bayesian
network intended for causal inference by adapting it to fit a set of training examples of
correct causal inference. Analogous to the logical approach outlined above, Bayesian
abductive inference on each positive example is used to compute assumptions that
would explain the correct inference and thereby suggest potential modifications to the
existing network. The ability of this general approach to theory revision to employ
probabilistic as well as logical methods of abduction is an interesting indication of its
generality and strength.
[Figure: learning curves plotting percentage correct (45 to 95%) against the number
of training examples (0 to 90) for EITHER, NEITHER, C4.5, and BANNER.]
edge base such that the correct diagnosis for each training example is a minimum
cover. The system uses a fairly straightforward hill-climbing induction algorithm. At
each iteration, it adds to the developing knowledge base the individual disorder
→ symptom rule that maximally increases accuracy of abductive diagnosis over the
complete set of training cases. The knowledge base is considered complete when the
addition of any new rule fails to increase accuracy on the training data.
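The loop just described can be sketched as follows (a hypothetical reconstruction of the hill-climbing skeleton, not the actual system code; `accuracy` abstracts the abductive-diagnosis evaluation over the training cases, and the toy accuracy function in the usage run is invented for illustration):

```python
# Hypothetical reconstruction of the hill-climbing skeleton described above
# (not the actual system code). accuracy(kb) abstracts the average accuracy
# of abductive diagnosis over the training cases using rule base kb.

def hill_climb(candidate_rules, accuracy):
    kb = []
    best = accuracy(kb)
    while True:
        scored = [(accuracy(kb + [r]), r) for r in candidate_rules if r not in kb]
        if not scored:
            return kb
        top_score, top_rule = max(scored, key=lambda sr: sr[0])
        if top_score <= best:   # no rule increases training accuracy: stop
            return kb
        kb.append(top_rule)
        best = top_score

# Toy run with a made-up accuracy function that rewards two target rules.
toy = lambda kb: len(set(kb) & {"flu -> fever", "flu -> cough"})
print(hill_climb(["flu -> fever", "flu -> cough", "flu -> rash"], toy))
# -> ['flu -> fever', 'flu -> cough']
```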
An outline of the learning algorithm is given in Figure 12.3. It assumes E is the set
of training examples, {E₁, ..., Eₙ}, where each Eᵢ consists of a set of disorders Dᵢ and a
set of symptoms Sᵢ. An example is diagnosed by finding the minimum covering set of
disorders given the current rule-base, R, using the BIPARTITE algorithm of (Peng and
Reggia, 1990). If there are multiple minimum covering sets, one is chosen at random
as the system diagnosis. To account for the fact that both the correct and system
diagnoses may contain multiple disorders, performance is measured by intersection
accuracy. If S is the system diagnosis and C the correct diagnosis, the intersection
accuracy is:
(|S ∩ C| / |S| + |S ∩ C| / |C|) / 2.
The average intersection accuracy across a set of examples is used to evaluate a knowl-
edge base.
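The measure is straightforward to compute; the following sketch (ours, with the zero-score convention for empty diagnoses as an added assumption) implements it over sets of disorder names:

```python
# Direct transcription of the intersection-accuracy measure above, where
# S is the system diagnosis and C the correct diagnosis (sets of disorders).
# Scoring empty diagnoses as zero is our added convention.

def intersection_accuracy(system, correct):
    s, c = set(system), set(correct)
    if not s or not c:
        return 0.0
    inter = len(s & c)
    return (inter / len(s) + inter / len(c)) / 2

print(intersection_accuracy({"d1"}, {"d1"}))                    # -> 1.0
print(intersection_accuracy({"d1", "d2"}, {"d1", "d3", "d4"}))  # (1/2 + 1/3)/2
```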
LAB employs a fairly simple, restricted, propositional model of abduction and a
simple, hill-climbing inductive algorithm. However, using techniques from induc-
tive logic programming (ILP), the basic idea of using induction to acquire abductive
knowledge bases from examples can be generalized to more expressive first-order rep-
resentations. Both (Dimopoulos and Kakas, 1996b) and (Lamma et al., this volume)
present interesting ideas and algorithms on using ILP to learn abductive theories; how-
ever, this approach has yet to be tested on a realistic application. Finally, on-going
research on the induction of Bayesian networks from data (Cooper and Herskovits,
1992; Heckerman, 1995) can be viewed as an alternative approach to learning knowl-
edge that supports abductive inference.
[Figure 12.4: learning curves plotting intersection accuracy against the number of
training examples (0 to 40) for LAB, MULTI-DIAG-ID3, BACKPROP, EXPERT-KB,
and MULTI-DIAG-PFOIL.]
The resulting learning curves are shown in Figure 12.4. All results are averaged
over 20 separate trials with different disjoint training and test sets. The results demon-
strate that abductive knowledge bases can be induced that are more accurate than man-
ually constructed abductive rules. In addition, for a limited number of training
examples, induced abductive rules are also more accurate than the knowledge induced by
competing machine learning methods.
12.5 CONCLUSIONS
In conclusion, we believe our previous and on-going work on integrating abduction
and induction has effectively demonstrated two important points: 1) Abductive rea-
soning is useful in inductively revising existing knowledge bases to improve their
accuracy; and 2) Inductive learning can be used to acquire accurate abductive theo-
ries. We have developed several machine-learning systems that integrate abduction
and induction in both of these ways and experimentally demonstrated their ability to
successfully aid the construction of AI systems for complex problems in medicine,
molecular biology, and intelligent tutoring. However, our work has only begun to
explore the potential benefits of integrating abductive and inductive reasoning. Fur-
ther explorations into both of these general areas of integration will likely result in
additional important discoveries and successful applications.
Acknowledgments
Many of the ideas reviewed in this chapter were developed in collaboration with Dirk Ourston,
Brad Richards, Paul Baffes, Cindi Thompson, and Sowmya Ramachandran. This research
was partially supported by the National Science Foundation through grants IRI-9102926,
IRI-9310819, and IRI-9704943, the Texas Advanced Research Projects program through grant
ARP-003658-114, and the NASA Ames Research Center through grant NCC 2-629.
IV The integration of
abduction and induction: a
Logic Programming perspective
13 ABDUCTION AND INDUCTION
COMBINED IN A METALOGIC
FRAMEWORK
Henning Christiansen
13.1 INTRODUCTION
We see abduction and induction as instances within a wide spectrum of reasoning pro-
cesses. They are of special interest because they represent pure and isolated forms.
These, together with deduction, were identified by C.S. Peirce, and central to his
philosophy was the claim that they are the fundamental mechanisms of reasoning, as
spelled out in more detail by Flach and Kakas in their introductory chapter to this
volume.
In this chapter, we show that notions and methods developed for metaprogramming
in logic programming can provide a common framework and computational models
for a wide range of reasoning processes, including combinations of abduction and in-
duction. We show examples developed in an implemented metaprogramming system,
called the DEMO system, whose central component is a reversible implementation of
a proof predicate. Reversibility means that the proof predicate can work with partly
specified object programs, and the implementation may produce object program frag-
ments that make the given query provable. Using this proof predicate, we can give
declarative specifications of the overall consistency relation in a given context, cov-
ering a wide range of computations. When facts of an object program are unknown,
the specified process resembles abduction. When rules are unknown, it resembles in-
duction, but any combination of known and unknown rules and facts can be specified,
thus providing models for this wider spectrum of reasoning processes.
nations thereof. This includes a kind of analogical reasoning as well as the derivation
from observations of whole theories of rules and facts. The basic algorithms that im-
plement the DEMO system are based on metaprogramming and constraint logic meth-
ods outlined in section 13.4. We discuss also the similarities and differences between
abduction and induction that become visible in the execution of the algorithms. The
procedural properties that imply the smooth interaction between different reasoning
methods are explained. In the final section 13.5 we give a summary with a discussion
of related work.
We prefer a ground representation for the object language in order to avoid well-known
semantic problems spelled out by (Hill and Lloyd, 1989). Using a bit of syntactic
sugar, the actual representation of names can be hidden, as in our own implemented
metaprogramming framework. Here the notation \(p('X') :- q('X')) is read
as the ground name for the object language clause p(X) :- q(X). The backslash
serves as a concrete syntax for the "naming brackets" ⌜·⌝. This notation is extended
so that partly instantiated patterns also can be described by means of a question mark
In case P and Q are completely specified, demo just replicates the work of a conven-
tional interpreter for logic programs. The interesting applications arise when demo is
queried with partly specified arguments. Assume, for example, that the metavariable z
stands in the position of an unknown object language clause in the program argument
of a call to demo as follows.
demo(\[... ?z ...], ⌜p(a)⌝)
A capable implementation (as the one we describe in this chapter) will, according to
the specification of demo, compute answers for z, each being a name for an object
language clause that makes the object language query p(a) succeed in the completed
program.
A representation of non-provability is useful in order to express integrity constraints
and counterexamples, which are often given as part of abduction and induction prob-
lems. This can be made by allowing some form of negation in the query argument of
demo or by means of an additional proof predicate demo_fails(⌜P⌝, ⌜Q⌝) with the
meaning that the object query Q fails in the object program P.
A metalevel query may include user-defined side-conditions limiting the program
fragments sought to, say, rules of a certain form or facts belonging to a class of ab-
ducibles. In general, a user can define new metalevel predicates making any combina-
tion of the syntactic and semantic facilities offered by the framework and making full
use of the general logic programming capabilities of the metalanguage.
In this chapter we show, by means of a series of examples developed in an im-
plemented metaprogramming framework called the DEMO system, that it is possible
using this approach to specify a wide range of tasks involving abduction and induction,
and combinations thereof, in a quite natural and declarative way.
The DEMO system is implemented in Sicstus Prolog (SICS, 1998), thus includ-
ing its repertoire of built-in predicates, delay mechanisms, and facilities for writing
constraint solvers in the metalanguage. In addition, the DEMO system provides
• a syntactically sugared naming relation as indicated above, with a Prolog-like
syntax for the object language and the inherent ambiguity resolved using three
different naming operators: \ for programs and clauses, \\ for atoms, constraints
and conjunctions, and \\\ for terms,
                     R      F      Obs
deduction            √      √      √/?
abduction            √      ?      √
induction            ?      √      √
general reasoning    √/?    √/?    √/?
¹The most recent version of the system and documentation is available on-line at
http://www.dat.ruc.dk/software/demo.html.
For reasons of symmetry in the examples, we introduce the following for negative
examples and integrity constraints.
demo_fails_family(Rules, Facts, Obs) :-
    demo_fails(\(?Rules & ?Facts), Obs).
13.3.1 Abduction
Different reasoning tasks can be defined by varying the degree of instantiation of the
arguments in a query to the demo_family predicate. In particular, abduction is
performed when part of the facts is left unknown.
Here we give as input to the query some facts about the parent relation and a rule
defining the sibling relation. The free variable NewFact in the query below stands
for an unknown fact which, when added to the fact base, should be able to explain how
it can be the case that sibling (mary, brian).
?- Facts \ [parent(john,mary), parent(jane,mary)],
   Rules \ [(sibling('X','Y') :-
               parent('P','X'), parent('P','Y'))],
The following two abductive explanations are returned for this query.
NewFact \ (parent(john,brian):-true)
NewFact \ (parent(jane,brian):-true)
We could also allow for abducing more than one fact in which case the system would
return one more answer providing a third parent for mary shared with brian; we
show this in detail in another example below.
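The behaviour of this query can be mimicked in a few lines of ordinary code (a toy illustration only, not the DEMO system; it enumerates candidate parent facts rather than using a reversible proof predicate, and the candidate-person list is an assumption of the sketch):

```python
# Toy illustration only (not the DEMO system): enumerate single parent/2
# facts which, added to the known facts, make sibling(mary, brian) derivable
# under the rule sibling(X, Y) :- parent(P, X), parent(P, Y).

def abduce_parent(facts, child1, child2, people):
    """Return candidate parent facts explaining sibling(child1, child2)."""
    explanations = []
    for p in people:
        for new_fact in [("parent", p, child1), ("parent", p, child2)]:
            extended = facts | {new_fact}
            # sibling(child1, child2) holds if some q is a parent of both
            derivable = any(("parent", q, child1) in extended and
                            ("parent", q, child2) in extended for q in people)
            if derivable and new_fact not in facts:
                explanations.append(new_fact)
    return explanations

facts = {("parent", "john", "mary"), ("parent", "jane", "mary")}
print(abduce_parent(facts, "mary", "brian", ["john", "jane"]))
# -> [('parent', 'john', 'brian'), ('parent', 'jane', 'brian')]
```

As in the DEMO query, the two explanations correspond to giving brian one of mary's known parents.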
13.3.2 Induction
In the following, we ask for a rule defining the sibling relation, giving as input
a number of parent facts together with the observation that mary and brian are
siblings.
?- Facts \[parent(john,mary), parent(jane,mary),
parent(john,brian), parent(hubert,zoe)],
The system suggests the following alternative rules each of which correctly satisfies
the metalogical specification.2
Only the first rule is an intuitively correct definition of the sibling relation. This
leads us to refine the query as follows, giving in addition one negative example.
demo_fails_family(\[?NewRule], Facts,
\\sibling(mary,zoe)),
demo_family(\[?NewRule], Facts,
\\sibling(mary,brian)).
²A standard ordering is imposed on the goals in the body of a clause, so that series of equivalent solutions
by means of permutations and duplications of body atoms are suppressed.
demo_fails_family(\[?NewRule], Facts,
\\sibling(mary,zoe)),
demo_family(\[?NewRule], Facts,
\\sibling(mary,brian)),
All answers include the now familiar sibling rule as the value of NewRule to-
gether with one of the following alternative sets of abduced facts.
NewFacts \[(parent(john,donald):-true)]
NewFacts \[(parent(jane,donald):-true)]
NewFacts \[(parent(a0,mary):-true),
           (parent(a0,donald):-true)]
This means that donald can have one of mary's known parents as one of his own
parents, or that there exists another parent common to mary and donald; the name
a0 in the last answer is generated by the system.³
This example illustrates also that demo and the underlying constraint solver are able
to cope correctly with a problem concerning variables in abducibles which makes
some abduction algorithms flounder. In the process of calculating the third answer
above, a single metavariable (visible in the answer as constant a0) stands for unknown
parts of two different and interdependent facts to be abduced. Without some means
to distinguish between meta and object variables this tends to give problems; this
phenomenon is discussed further in the final section of this chapter.
Finally, we notice that this answer has as a consequence that mary has three parents,
which we might want to suppress by means of an integrity constraint that can be
expressed as follows.
demo_fails_family(\[?NewRule], \(?Facts & ?NewFacts),
    \\(parent('P1','C'), parent('P2','C'),
       parent('P3','C'),
       dif('P1','P2'), dif('P1','P3'), dif('P2','P3')))
The condition reads: for no individual (given by the object variable C) can there be found three parents that are all different. We used here a constraint dif which is included in DEMO's object language; dif(t1,t2) means that t1 and t2 must be syntactically different.
3 The system includes a device which uses an adapted least-general-generalization algorithm in order to instantiate metavariables in a way which satisfies the pending constraints, thus making the answers more readable. It needs to be included in the query as an explicit call of a metalevel predicate, which we have suppressed in this presentation.

demo_fails_family([?NewRule],
    [parent(p1,c), parent(p2,c)], \\sibling(p1,p2)),
demo_fails_family([?NewRule], NewFacts,
    \\sibling(mary,donald)),
demo_family([?NewRule], NewFacts,
    \\(sibling(mary,zoe), sibling(donald,peter))).
The first four negative examples express that none of the mentioned individuals are parents. The next condition expresses, using skolem constants, that whatever sibling rule is generated, it should not allow two siblings to have a common child. Notice here that this call to demo shares only the rule with the other calls; the facts are different. The remaining calls are positive and negative examples of the sibling relation. The following answer is printed out.
NewFacts = [(parent(a0,mary):-true),
            (parent(a0,zoe):-true),
            (parent(b0,donald):-true),
            (parent(b0,peter):-true)]
This example indicates that metalogical frameworks in our sense have a potential for being used as general concept learners, producing a theory respecting a certain bias from a collection of unsorted observations. This bias needs to include an assumption about a stratification among the predicates used. The actual stratification corresponding to a sequence of examples should be determined dynamically, including which predicates are to be considered abducible or basic, corresponding to the lowest stratum. The examples considered above are especially simple because they have only two a priori given strata, one for the abducible parent predicate and one for all others. Identification of taxonomies seems to be another obvious application, using metalevel predicates to define the sort of object programs that represent a taxonomy.
demo(P, Q) :-
    instance(Q, Q1, _),
    demo1(P, Q1).
S := initial query;
while any of the following steps apply, do
    while (t1 = t2) ∈ S do S := (S \ {t1 = t2}) mgu(t1,t2);
    if some constraint solver rule can apply to S, then do so,
    otherwise, select an atom A in S and an instance of a clause
    with new variables H :- B1, ..., Bn and let
    S := S \ {A} ∪ {A = H, B1, ..., Bn};
niques are necessary in order to avoid floundering and other problems that otherwise
arise with a straightforward implementation in Prolog, e.g., (Gallagher, 1993; Hill and
Gallagher, 1994) in case of partly specified object programs.
The operational semantics for the metalanguage is summarized in the nondeterministic algorithm of Figure 13.2, which is a straightforward generalization of SLD-resolution with constraint handling. The state S is a set of unresolved constraints and atoms; "mgu" stands for a most-general-unifier operation which produces a substitution that is applied to the state; if unification fails, the algorithm stops with failure for the given branch. Notice that unifications are passed through the state as equations and executed at the next entry of the loop. A state is final if it is different from failure and consists of constraints only, to which no constraint solver rule applies. A computed answer consists of the constraints in a final state together with the substitutions which have been made to the variables of the initial query.
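For illustration only, the core rewriting loop of Figure 13.2 can be sketched for the propositional, constraint-free special case; the encoding and all names below are ours, not part of DEMO:

```python
# Minimal propositional sketch of the Figure 13.2 loop, without
# constraints or unification: a state is a set of unresolved atoms, and
# one step replaces a selected atom by the body of a matching clause.
def derivable(program, goal):
    """program maps an atom to a list of alternative bodies (lists of
    atoms); returns True if the goal can be rewritten to the empty
    (final) state along some nondeterministic branch."""
    def solve(state):
        if not state:                      # final state reached
            return True
        atom = next(iter(state))           # select an atom A in S
        for body in program.get(atom, []):
            # S := S \ {A} u body, one branch per matching clause
            if solve((state - {atom}) | frozenset(body)):
                return True
        return False                       # failure for this branch
    return solve(frozenset(goal))

# toy program: sibling holds if both parent facts hold
prog = {"sibling": [["parent1", "parent2"]],
        "parent1": [[]], "parent2": [[]]}
```

Backtracking over the alternative clause bodies plays the role of the nondeterministic choice in the algorithm.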
The rules of Figure 13.3 define the execution of instance constraints. Each rule is of the form C1 ⇒ C2 and should be understood as follows: if constraints of the form indicated by expression C1 exist in the state, replace them by the constraints indicated by C2. Rule (I1), for example, expresses that if two instance constraints have the same (meta-)variable as their first arguments, and identical third arguments (object substitution), then the two second arguments should be unified (and one of the two instance constraints is removed, as they anyhow become identical following the unification). Rules (I2-3) move instance constraints that express bindings to given object variables into the representation of substitutions.4 Rules (I4-5) perform a decomposition of (names for) composite object language phrases. Notice the slightly different treatment of instance constraints related to terms of the object language and those related to other categories (atoms, clauses, and conjunctions).
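As an illustration of this style of rewriting rule, a toy version of (I1) over a simplified constraint store might look as follows; the representation is our own, not DEMO's:

```python
# Toy rendering of rule (I1): two instance constraints with the same
# (meta)variable and the same object substitution force their second
# arguments to be unified, after which one of the constraints is
# redundant and can be dropped.
def apply_I1(store):
    """store: list of ('instance', var, term, subst) tuples.
    Returns (reduced_store, equations_to_unify)."""
    equations, seen, kept = [], {}, []
    for constraint in store:
        tag, var, term, subst = constraint
        key = (var, subst)
        if key in seen:
            equations.append((seen[key], term))  # unify second arguments
        else:
            seen[key] = term
            kept.append(constraint)
    return kept, equations
```

The produced equations correspond to the unifications that the real solver passes back into the state.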
In general, a constraint solver should be such that final constraint sets are guaranteed to be in a certain simplified form known to be satisfiable. In (Christiansen, 1998a) we have proved this property together with soundness and completeness of

4 Rules (I2-3) assume the invariant property that substitution arguments in the state always have an open tail so that new bindings can be added.
5 Solving instance constraints is equivalent to the multiple semi-unification problem, which is known to be undecidable (Kfoury et al., 1990), and some readers may have noticed the close similarity between rules (I1-I4) of Figure 13.3 and proposed semi-unification algorithms (Leiß, 1984; Henglein, 1989). However, the structure of the metainterpreter in Figure 13.1 implies invariant properties that ensure termination.
(C-I) constant_(v), instance(v,t,s)
          ⇒ v = t, constant_(t)
      where v is a variable.
is of the right form), it reduces to constraints for the terms supposed to name the
head and the body of a clause. In some cases, these constraints make it possible to
obtain an optimized behaviour of instance constraints as shown in the additional
constraint solver rule in Figure 13.4. Rule (C-I) overrides the delay of the instance
constraint in rule (I4) in the particular case that v is constrained to be a name for an
object language constant.
A user of the DEMO system does not need detailed knowledge of the underlying constraint mechanism when setting up side-conditions for defining a particular reasoning task. Such conditions can be written as straightforward Prolog definitions, extended with unsophisticated use of delay mechanisms as found in recent versions of logic programming languages, e.g., SICStus Prolog (SICS, 1998).
We can illustrate the principle by means of a small example. Assume we want to define a predicate abducible which accepts names for facts about object language predicates r and s. The following definition is sufficient.

:- block abducible(-), abd_atom(-).

abducible( \(?A :- true) ) :- abd_atom(A).

abd_atom( \\r(?Const) ) :- constant_(Const).
abd_atom( \\s(?Const) ) :- constant_(Const).
The block directive is a standard SICStus Prolog declaration which informs the interpreter that these predicates should delay until their arguments become instantiated.6 The overall effect is that abducible becomes a very lazy predicate in the sense that it tends not to instantiate but rather waits to test the instantiations that other events in the computation process might perform. In the present context, these other events are typically actions performed inside demo, and the Prolog interpreter automatically provides an optimal interleaving of the two predicates. In this way, backtracking in the abducible predicate is effectively prevented. With more complex patterns for new clauses, typically with an infinite space of candidate hypotheses, as may be the case for induction or when arbitrary function symbols are used, this property becomes crucial for efficiency and termination.

6 This use of delays can easily be incorporated in the semantics we have described for the object language. Change the wording of Figure 13.2 so as to include "... select a non-blocked atom A in S ..." where an atom is said to be blocked if its predicate has a block declaration of the indicated form and its argument is a variable.
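The delay behaviour can be imitated outside Prolog by a small scheduler that postpones goals whose argument is still unbound; the sketch below captures only the idea, not SICStus Prolog's actual mechanism, and all names are ours:

```python
# Sketch of the delay idea: goals whose argument is still unbound are
# postponed, and are run only once some other step has bound the
# argument. This imitates the effect of a block declaration.
def run(goals, bindings, tests):
    """goals: list of (predicate_name, variable); tests maps a
    predicate name to a check on the variable's value. Returns True if
    all runnable goals succeed, False on a failing test, or the list of
    still-blocked goals if nothing more can run."""
    pending = list(goals)
    while pending:
        runnable = [g for g in pending if g[1] in bindings]
        if not runnable:
            return pending             # every remaining goal is blocked
        name, var = runnable[0]
        if not tests[name](bindings[var]):
            return False               # a delayed test fails on waking
        pending.remove((name, var))
    return True

# toy check: 'abducible' accepts any string-encoded fact
checks = {"abducible": lambda value: isinstance(value, str)}
```

The scheduler only tests instantiations made elsewhere, which is exactly the "very lazy" behaviour described above.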
or less!) fixed once and for all in the underlying resolution-based interpreter. The
relations defined by a logic program can be used for computing a variety of functions
that would require a whole collection of functional or procedural programs. As a
simple but striking example of this quality, consider the standard append predicate,
most often used as a device for computing the concatenation of two lists, but when
used properly, it serves as a quite powerful pattern matching device. This is analogous
to the way we use a proof predicate, normally thought of as a device for computing
proofs from given programs, but using it properly, it can perform other tasks such as
abduction and induction as well.
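The relational reading of append used "backwards" can be imitated in a functional setting by a generator that enumerates every split of a list; a minimal sketch (our own illustration):

```python
# append(Front, Back, Whole) run in reverse: instead of computing one
# concatenation, enumerate every pair of lists whose concatenation is
# the given list, mirroring the relational reading of append/3.
def splits(whole):
    """Yield all (front, back) pairs with front + back == whole."""
    for i in range(len(whole) + 1):
        yield whole[:i], whole[i:]
```

Each yielded pair corresponds to one answer of the Prolog query append(F, B, Whole).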
The method we have used in order to obtain the crucial property of reversibility in demo descends from a resolution method described in an abstract way in (Christiansen, 1992). However, the use of constraint logic methods as sketched in this chapter is necessary in order to obtain a reasonably efficient implementation. See (Christiansen, 1998a) for an overview of other work concerned with the demo predicate. For integrations of procedures for abductive and inductive reasoning, which can lead to quite powerful systems, we refer to (Ade and Denecker, 1995) and to work documented in this volume by Lamma et al., Mooney, Inoue and Haneda, Sakama, and Yamamoto. The reader is referred to the chapter by Lamma et al., which gives an overview and comparison of these and related methods.
The approach of Lamma et al. induces abducible theories, a problem specified quite similarly to our example of induction aided by abduction; it also seems possible, by changing their definitions a bit, to extend their methods by analogy to handle the reverse, abduction aided by induction. They inherit from the underlying abduction algorithm a floundering problem, briefly touched upon in Section 13.3.3, that arises in case of variables in abducibles. This problem seems inherent in abduction algorithms that do not explicitly distinguish between meta and object level variables, e.g., (Kakas and Mancarella, 1990a; Decker, 1996). The approaches of (Eshghi, 1988; Denecker and de Schreye, 1992) get around the problem by inserting skolem constants for "problematic" variables. Denecker and de Schreye (1998) apply a classification scheme for variables, based on the variables' level of quantification, which seems related to the distinction between object and metavariables. Ade and Denecker (1995) generalize the abduction method of (Denecker and de Schreye, 1992) into a method performing abduction and induction in parallel under integrity constraints and negation. We have not made a detailed comparison of the expressibility of this approach and ours, but it appears that a metaprogramming framework provides a flexibility, not found in other approaches, to set up a diversity of requirements on the facts and rules sought.
One problem in our approach inherited from the logic programming setting is that
it is difficult to put a preference ordering on the answers produced. A property such
as minimality of an abductive explanation cannot be specified in an elegant way. The
natural way to work in DEMO is to set up, and revise, the side-conditions such that the
class of possible solutions becomes sufficiently small.
The methods of induction applied by (Ade and Denecker, 1995), Lamma et al. (this volume), and most other work referenced above are descendants of the method for inductive program synthesis of (Shapiro, 1983). The synthesis process is performed
Acknowledgments
This research is supported in part by the DART project funded by the Danish Research Councils.
14 LEARNING ABDUCTIVE AND
NONMONOTONIC LOGIC PROGRAMS
Katsumi Inoue and Hiromasa Haneda
14.1 INTRODUCTION
We investigate the integration of induction and abduction in the context of logic pro-
gramming. Our integration proceeds in a way that we learn theories for abductive
logic programming (ALP) in the framework of inductive logic programming (ILP).
Both ILP and ALP are important research areas in logic programming and AI. ILP
provides theoretical frameworks and practical algorithms for inductive learning of re-
lational descriptions in the form of logic programs (Muggleton, 1992; Lavrac and
Dzeroski, 1994; De Raedt, 1996). ALP, on the other hand, is usually considered as an
extension of logic programming to deal with abduction so that incomplete information
is represented and handled easily (Kakas et al., 1992). Learning abductive programs
has also been proposed as an extension of previous work on ILP (Dimopoulos and
Kakas, 1996b; Kakas and Riguzzi, 1997).1 The important question here is "how do we learn abductive theories?"
To answer this question, we rely on the following two important ideas presented in
ILP and logic programming:
• Learning nonmonotonic theories has recently received much attention in ILP to
capture the intuition behind learning under incomplete information (Bain and
Muggleton, 1992; Dimopoulos and Kakas, 1995; Inoue and Kudoh, 1997).
1 In this chapter, we use the terms abduction and induction precisely in the contexts of ALP and ILP. As far as these two research fields are considered, the role of each and the distinction between them are clear and uncontroversial.
213
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 213-231.
© 2000 Kluwer Academic Publishers.
Before we present the main idea for learning abductive theories, we briefly review the
above two issues.
This chapter not only extends (Inoue and Kudoh, 1997) to deal with abduction, but
also revises the previous LELP algorithm. The rest of this chapter is organized as
follows. Section 14.2 outlines how LELP produces ELPs to learn default and non-
deterministic rules. Section 14.3 extends LELP to learn abductive theories. Sec-
tion 14.4 discusses related work, and Section 14.5 concludes this chapter. The Ap-
pendix presents the proof of the correctness of LELP, which is not included in (Inoue
and Kudoh, 1997).
where the Li (0 ≤ i ≤ n; n ≥ m) are literals. Here, the left-hand side L0 is called the head of the rule (14.1), and the right-hand side is called the body of the rule. A rule with an empty head is called an integrity constraint, in which the empty head L0 is identified with false. Two kinds of negation appear in a program: not is the negation as failure (NAF) operator, and ¬ is classical negation. Intuitively, the rule (14.1) can be read as: if L1, ..., Lm are believed and Lm+1, ..., Ln are not believed, then L0 is believed.
The semantics of ELPs is defined by the notion of answer sets (Gelfond and Lifschitz, 1991), which are sets of ground literals representing possible beliefs.2 The class of ELPs can be considered a subset of default logic (Reiter, 1980): each rule of the form (14.1) in an ELP can be identified with the default

    L1 ∧ ... ∧ Lm : L̄m+1, ..., L̄n
    ------------------------------
                  L0

where L̄ stands for the literal complementary to L. Then, each answer set is the set of literals in an extension of the default theory. An ELP is consistent if it has a consistent answer set. We say that a literal L is entailed by an ELP P, written P ⊢ L, if L is contained in every answer set of P. In the following, we often denote classical negation ¬ as -, NAF not as \+, and the arrow ← as :- in programs.
We allow rules of the form (14.1) in background knowledge. We call a rule having a positive literal in its head a positive rule, and a rule having a negative literal in its head a negative rule. In LELP, the input positive examples are represented as positive literals, and negative examples are denoted as negative literals.

2 While we adopted the answer set semantics for LELP, other semantics for ELPs may be applicable to our learning framework with minor modification. For example, Lamma et al. use a well-founded semantics for learning ELPs, and their output hypotheses are in a slightly different form from ours (Lamma et al., 1998).
The completeness and consistency of concept learning (Lavrac and Dzeroski, 1994) can be reformulated in the three-valued setting as follows. Let BG be an ELP as background knowledge, E a set of positive and negative literals given as the union of positive examples E+ and negative examples E-, and H a set of rules as the output hypotheses.

1. H is complete with respect to BG and E if for every e ∈ E, H covers e, i.e., BG ∪ H ⊢ e.

2. H is consistent with respect to BG and E if for no e ∈ E does H cover the complement ē, i.e., BG ∪ H ⊬ ē.

Note here that positive examples are not given any higher priority than negative ones. Both positive and negative examples are to be covered by the learned rules, which must be consistent with respect to background knowledge and examples. Thus, we will learn both positive and negative rules: no CWA is assumed to derive non-instances as in (De Raedt and Bruynooghe, 1990).

In the above formalization, whenever H is complete with respect to BG and E, the consistency of H with respect to BG and E can be replaced with the consistency of BG ∪ H under the answer set semantics.
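Taking the answer sets of BG ∪ H as given (their computation is the job of an ELP solver and is not shown), the two conditions can be sketched directly; the string encoding of literals below is our own, not the authors':

```python
# Literals are strings, with a leading '-' marking classical negation.
def entails(answer_sets, literal):
    """P |- L iff L is contained in every answer set of P."""
    return bool(answer_sets) and all(literal in s for s in answer_sets)

def complement(literal):
    return literal[1:] if literal.startswith("-") else "-" + literal

def complete(answer_sets, examples):
    # every positive and negative example must be covered
    return all(entails(answer_sets, e) for e in examples)

def consistent(answer_sets, examples):
    # no complement of an example may be covered
    return all(not entails(answer_sets, complement(e)) for e in examples)
```

Entailment quantifies over all answer sets, which is why a literal true in only some answer sets (as with nondeterministic rules later in the chapter) is not entailed.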
Proposition 14.1 Let BG, E and H be the same as above. Suppose that H is complete with respect to BG and E. Then, the following two statements are equivalent.
(1) H is consistent with respect to BG and E.
(2) BG ∪ H is consistent, that is, BG ∪ H has a consistent answer set.
3. OWS(T0, E+, E-, BG, AB, T1);
   % Compute default rules T1 and exceptions AB by specializing T0
   % so that T1 is consistent wrt. BG and E-.
4. Counter(E-, AB, T1, T2);
   % Corresponding to T1, generate rules T2 covering E- from AB.
5. Cancel(AB, BG, T3).
   % Generalize AB to default cancellation rules T3 wrt. BG.
In Step 2 of Algorithm 14.1, given positive (or negative) examples E and background knowledge BG, general rules T are generated to cover E using an ordinary ILP technique. We denote this part of the algorithm as GenRules(E, BG, T), which generates a minimal set of rules T satisfying, for each example e ∈ E:

1. BG ∪ T ⊢ e, where BG ∪ T is consistent, and

2. there exists a rule in T whose head is unifiable with e.

Here, the latter condition for GenRules(E, BG, T) means that the examples E are covered directly by T, so that T can be regarded as a definition of the learned concept. Note also that T is complete with respect to BG and E. Then, if no generalization is induced for some examples in E, they should simply be added to the hypotheses T. In a special case, T can even be identical to E, so that the existence of the output of GenRules is always guaranteed. We do not assume any particular learning algorithm for the implementation of GenRules. However, since no negative example is used to cover positive examples, some restrictions on the form of learned rules are necessary if a top-down learning algorithm is used in GenRules. For instance, learned rules should be range-restricted, that is, every variable in a rule should appear in the body. The inductive bias can also be introduced in the definition of GenRules. In any case, GenRules can be considered a black box, and here we are not concerned with the details.
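The always-available special case mentioned above, taking T identical to E, can be sketched as follows; this is a toy illustration of the fallback, not part of the LELP implementation:

```python
# Trivial GenRules: every example becomes a unit rule, so each example
# is covered directly by a rule whose head it trivially unifies with.
def gen_rules_trivial(examples):
    """Return each example e as the unit rule (e :- true)."""
    return [(e, ["true"]) for e in examples]
```

A real GenRules would instead generalize, but this degenerate output shows why the procedure can never fail to produce some T.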
General rules computed by GenRules(E+, BG, T) (resp. GenRules(E-, BG, T)) cover the positive (resp. negative) examples, but may also cover the complements of some negative (resp. positive) examples. To specialize general rules, we use the algorithm of open world specialization (OWS), which is closely related to closed world specialization (CWS) (Bain and Muggleton, 1992). Unlike CWS, OWS does not apply CWA to identify non-instances of the target concept. In OWS, exceptions are identified from literals contained in negative examples (or positive examples if the general rule is negative) such that their complements are proved from the general rules with background knowledge.
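The exception-spotting step can be sketched as follows; this is a toy illustration with an invented stand-in prover encoding the over-general rule flies(x) ← bird(x), not the actual OWS implementation:

```python
# Among the examples, those whose complements are provable from the
# general rules with background knowledge are flagged as exceptions.
def find_exceptions(examples, proves):
    def complement(lit):
        return lit[1:] if lit.startswith("-") else "-" + lit
    return [e for e in examples if proves(complement(e))]

# stand-in prover: facts plus the rule flies(x) <- bird(x)
facts = {"bird(a)", "bird(1)"}

def toy_proves(lit):
    return lit in facts or (lit.startswith("flies(")
                            and "bird(" + lit[len("flies("):] in facts)
```

With negative example -flies(a) and fact bird(a), the complement flies(a) is provable, so -flies(a) becomes an exception.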
else N := abi(V1, ..., Vn), where abi is a new predicate appearing nowhere in
     BG and {V1, ..., Vn} are the variables in H;
T' := (T' \ {Ci}) ∪ { (H :- B, \+N) };
AB := AB ∪ { Nθ | Lθ ∈ Exc }.
Example 14.1 LELP is implemented in SICStus Prolog and is called by lelp(Examples, Background_Knowledge, Result). In Examples, atoms preceded by + represent positive examples, and those with - are negative examples.

| ?- lelp([+flies(1),+flies(2),+flies(3),+flies(4),+flies(5),
          +flies(6),-flies(a),-flies(b),-flies(c)],
         [bird(1),bird(2),bird(3),bird(4),bird(5),bird(c),
          (bird(X) :- pen(X)),pen(a),pen(b)], Rules).

Rules =
[(flies(A):-bird(A),\+ab1(A)), flies(6),  % T1 by OWS
 (-flies(B):-ab1(B)),                     % T2 by Counter
 (ab1(C):-pen(C)), ab1(c)] ?              % T3 by Cancel
Theorem 14.2 Given background knowledge BG, positive examples E+ and negative examples E-, let H be the hypotheses produced by LELP1(E+, E-, BG, H). Assume that BG ∪ E is consistent, where E = E+ ∪ E-. Then,
(1) H is complete with respect to BG and E, i.e., ∀e ∈ E (BG ∪ H ⊢ e).
(2) H is consistent with respect to BG and E, i.e., ∀e ∈ E (BG ∪ H ⊬ ē).
3 In conventional machine learning methods, a search bias and a noise-handling mechanism are usually implemented to prevent the induced hypotheses from overfitting the given examples. See (Lavrac and Dzeroski, 1994, Chapter 8) for an overview of mechanisms for handling imperfect data in ILP. These conventional approaches to noise handling can also be applied to the determination and the implementation of GenRules in learning positive or negative rules, e.g., (Srinivasan et al., 1992), in conjunction with our solutions. Since both positive and negative concepts are learned in our proposals, the use of parallel default rules and nondeterministic rules further minimizes the number of incorrectly classified training examples.
[bird(1),bird(2),bird(3),bird(4),
 bird(5),bird(6),bird(7),bird(8)], Rules).
Here, the ratio of positive examples is 50%, and both positive and negative rules can
be generated. If we use Basic LELP in parallel, the following parallel default rules
are computed:
(flies(A):-bird(A),\+ab1(A)),
ab1(5), ab1(6), ab1(7), ab1(8),
(-flies(B):-bird(B),\+ab2(B)),
ab2(1), ab2(2), ab2(3), ab2(4)
In the above rules, we do not include the rules produced by Counter, i.e., (-flies(A):-ab1(A)) and (flies(B):-ab2(B)). In fact, these rules are not necessary since we learn default rules for both positive and negative examples in parallel. Notice also that exceptions with the ab1 and ab2 predicates cannot be generalized here because rules like (ab1(A):-bird(A)) and (ab2(B):-bird(B)) do not satisfy the constraint in Cancel. Now, the above rules look correct, but if bird(9) is added to background knowledge, neither ab1(9) nor ab2(9) is proved, so that a contradiction occurs. In such a case, nondeterministic rules are generated by adding a NAF formula not γ̄ to the body of each parallel rule with γ in the head. In the example, we have:
(flies(A):-bird(A),\+ab1(A),\+ -flies(A)),
ab1(5), ab1(6), ab1(7), ab1(8),
(-flies(B):-bird(B),\+ab2(B),\+flies(B)),
ab2(1), ab2(2), ab2(3), ab2(4)
The above two rules act as nondeterministic rules, and each default rule is represented
in the form:
Hence, for the bird 9, two answer sets exist, one concluding flies(9) and the other -flies(9), but neither one is entailed.
1. GenRules(AB, BG, T), under the condition that |S \ A| < |A|, where S is the set of ground abi literals entailed by BG ∪ T, and A is the set of ground instances from AB.

The condition |S \ A| < |A| in Algorithm 14.5 replaces the stronger condition S = A in Algorithm 14.4. Here, the set S \ A denotes the exceptions to exceptions. Under this new condition, S may properly include A as long as the number of elements in S \ A is less than that in A.4 This condition represents the monotone assumption that "in every level of the hierarchy, the number of exceptions is less than that of instances with default properties".
To learn hierarchical default rules, in Algorithm 14.5, we need to call the algorithms Cancel2 and OWS recursively. The procedure stops when there are no more exceptions or no more instances to be generalized. The extended algorithm LELP2 is as follows.5
4 We can also consider other criteria for learning hierarchical default cancellation rules. For example, we
In Algorithm 14.7, AB_low denotes the exceptions to the default cancellation rules R that cover the exceptions AB.
Rules =
[(-flies(A):-animal(A),\+ab1(A)),  % R4 by OWS
 (flies(B):-ab1(B)),               % R6 by Counter
 (ab1(C):-bird(C),\+ab2(C)),       % R by OWS in ABs
Γ} and Γ' = { δR | R ∈ Γ }. Note that (P', Γ') is an AELP. Second, for each atomic abducible δ in Γ', the following pair of new rules is introduced:

    δ ← not ¬δ,                (14.2)
    ¬δ ← not δ.

Third, P* is defined as the union of P' and the set of rules of the form (14.2) from Γ'. Then, there is a one-to-one correspondence between the belief sets of (P, Γ) and the consistent answer sets of P*. The nondeterministic rules (14.2) produce multiple answer sets, one containing δ and the other ¬δ, which correspond to the addition and non-addition of the original hypothesis, respectively.
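The claim that the pair (14.2) yields exactly two answer sets can be checked by brute force for a single abducible; the sketch below uses the standard reduct test, with our own string encoding (d for δ, -d for ¬δ), and is not part of the authors' system:

```python
from itertools import chain, combinations

# Rules of the pair (14.2) as (head, literal under NAF) tuples:
#   d <- not -d   and   -d <- not d
rules = [("d", "-d"), ("-d", "d")]

def answer_sets(rules, literals):
    """A candidate set S is an answer set iff it equals the heads of
    the rules surviving the reduct, i.e., those whose NAF literal is
    outside S (brute force over all subsets of the literals)."""
    found = []
    candidates = chain.from_iterable(
        combinations(literals, r) for r in range(len(literals) + 1))
    for cand in map(set, candidates):
        reduct_heads = {head for (head, naf) in rules if naf not in cand}
        if cand == reduct_heads:
            found.append(cand)
    return found
```

Running this on the pair produces exactly the two answer sets {d} and {-d}, matching the multiple answer sets described above.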
Theorem 14.3 Let P be an ELP produced by LELP, and (P*, Γ*) the AELP constructed as above. Then, for every consistent answer set S of P, there is a belief set S* of (P*, Γ*) such that S = S* \ Γ*.
such that B2 is true in S' again implies that ¬γ is true in S'. Otherwise, if S contains neither γ nor ¬γ, B1 and B2 are not true in S. Then, for S' in which B1 and B2 are not true, neither γ nor ¬γ is true in S'. Therefore, any consistent answer set S of P is also a belief set of (P1 ∪ E1, Γ1).

Next, there is a one-to-one correspondence between the belief sets of (P1 ∪ E1, Γ1) and the belief sets of the AELP (P2 ∪ E1, Γ2), as shown in Section 14.3.1 and (Inoue, 1994).

Finally, any belief set of the AELP (P2 ∪ E1, Γ2) is also a belief set of the AELP (P2 ∪ E2, Γ2) (= (P*, Γ*)). Hence, the result follows. ∎
Example 14.4 (Nixon Diamond) Suppose that examples and background knowledge are given as follows.

E = { +pacifist(c), +pacifist(d), -pacifist(a), -pacifist(b) }.
BG = { republican(a), republican(b), quaker(c), quaker(d),
       republican(nixon), quaker(nixon) }.
P = BG ∪ N1 has two answer sets, one containing pacifist(nixon) and the other -pacifist(nixon). Let P1 = P \ N1 = BG and E1 = E. Next, the knowledge system (P1 ∪ E1, Γ1) has the abducible rules:

Γ1 = { (pacifist(A):-quaker(A)), (-pacifist(B):-republican(B)) }.

Now, by naming abducible rules, the AELP (P*, Γ*) is obtained as:

P* = { (pacifist(A) :- quaker(A), dove(A)),
       (-pacifist(B) :- republican(B), hawk(B)),
       dove(c), dove(d), hawk(a), hawk(b) } ∪ BG,
Γ* = { dove(A), hawk(B) }.
The converse of Theorem 14.3 does not hold. For Example 14.4, the AELP (P*, Γ*) has a belief set containing neither pacifist(nixon) nor -pacifist(nixon), which is not an answer set of P. In that belief set, neither dove(nixon) nor hawk(nixon) is abduced, so that nixon is neutral or irrelevant to this matter. Note in this case that the truth value of pacifist(nixon) is unknown for both P and P*. In general, however, an ELP P entails more literals than the corresponding AELP P*, as shown in the next subsection.

The next theorem shows that P* entails every example in E.

Theorem 14.4 Let P and (P*, Γ*) be the same as in Theorem 14.3, and E the given set of positive and negative examples. Let M1 be the set of literals entailed by P, and M2 the set of literals contained in every belief set of (P*, Γ*). Then, M1 ∩ E = M2 ∩ E = E.
Proof. By Theorem 14.3, ignoring naming literals from Γ2, the answer sets of P are included in the belief sets of P*, so the intersection M1 of the former sets includes the intersection M2 of the latter sets. Hence, M1 ⊇ M2, and so M1 ∩ E ⊇ M2 ∩ E.

Now, suppose that there is a literal L in (M1 \ M2) ∩ E. Then, L is included in every answer set of P, but there is a belief set S of (P*, Γ*) such that (i) S is not an answer set of P and (ii) L ∉ S. By (i), there is a pair of literals γ and ¬γ from some rules of the form (14.3) in P such that neither γ nor ¬γ is in S although either B1 or B2 is true in S. In other words, neither of the two rules (14.4) is abduced in S. Here, we can assume that rules (14.3) are generated by LELP to account for the nondeterminism of the concept to be learned. Then, γ and ¬γ never appear in the body of any rule other than (14.3). Therefore, L is either γ or ¬γ. In either case, L is covered by one of the rules (14.3). Since L ∈ E, L must be in E1. Then, the corresponding abducible name δ is contained in E2, which is a part of P*. This implies that δ is in every belief set of (P*, Γ*), and hence L is also in S, contradicting (ii). Hence, (M1 \ M2) ∩ E is empty.

Since E is covered by P, M1 ∩ E = E holds. Therefore, M1 ∩ E = M2 ∩ E = E. ∎
← penguin(X), normali(X).
6 This process leads us to the so-called abduction-induction cycle, in which abduced literals are input to the inductive process as examples to be generalized, while rules generated by the inductive process are used as background knowledge in the abductive process (Flach and Kakas, this volume, Section 1.4).
LELP generates the rule h = (flies(x) ← bird(x)), which realizes an inductive leap because BG ∪ {h} entails flies(oliver).7 In the three-valued semantics, however, one often wants to conclude that flies(oliver) is unknown. To derive this weak conclusion, a new abducible is added to h as (flies(x) ← bird(x), normal(x)). Note again that the abducible normal(x) can be regarded as the name of the abducible rule (flies(x) ← bird(x)) (Poole, 1988a; Inoue, 1994). In this case, normal(tweety) has to be introduced as a fact. For oliver, if we assume normal(oliver) it flies; otherwise, we do not know whether it flies or not. This inference is preferable if we do not want to make rules defaults but need to have rules as candidate hypotheses.
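The behaviour described for oliver can be sketched as follows; this is a toy rendering of the single rule with its abducible, where the names come from the example but the encoding is ours:

```python
# flies(x) <- bird(x), normal(x), with normal abducible: the conclusion
# follows only when normal(x) is a fact or an assumed hypothesis.
def flies(x, facts, assumed):
    bird = "bird(%s)" % x
    normal = "normal(%s)" % x
    return bird in facts and (normal in facts or normal in assumed)

facts = {"bird(tweety)", "bird(oliver)", "normal(tweety)"}
```

With normal(tweety) given as a fact, tweety flies outright, while flies(oliver) holds only under the explicit assumption normal(oliver), which is precisely the weak conclusion discussed above.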
14.3.4 Discussion
Sections 14.3.2 and 14.3.3 have shown that we can convert rules learned by LELP
into rules with abducibles. We now call this learning method LAELP. Since LAELP
is based on learning ELPs, it can be regarded as an indirect method to generate ab-
ductive theories. An alternative way is to learn abductive programs directly from the
beginning. This is possible because we can generate abducible rules instead of rules
with NAF when we learn default rules or nondeterministic rules.
One might think that LELP is enough for learning under incomplete information
because the class of ELPs is as expressive as the class of AELPs. The question as to
why we need to learn abductive theories should be answered by considering the role of
abduction in application domains. One may often understand abductive theories more
easily and more intuitively than theories represented in other nonmonotonic logics.
For example, in diagnostic domains, background knowledge contains the cause-effect
relations and abducibles are written as a set of causes. Moreover, in the process of the-
ory formation, incomplete knowledge is naturally represented in the form of abducible
rules.
As shown in Section 14.3.2, both LELP and LAELP can avoid the situation that
an instance may be classified as both positive and negative. In LELP, this can be
achieved only by nondeterministic rules, while the same effect is also obtained by
abducible rules in LAELP. In Section 14.3.3, we have shown a difference between
abducible rules and default rules. Inductive leaps can be better avoided by abducible
rules, while defaults are better represented by rules with NAF.
Another merit of learning abductive theories lies in the fact that abductive proof
procedures developed for ALP are computationally useful. Often abductive proce-
dures can be implemented more easily than theorem provers for other kinds of non-
monotonic reasoning.
⁷To avoid inductive leaps, some researchers propose a weak form of induction by applying CWA to BG ∪ E through Clark's completion, e.g., (De Raedt and Lavrac, 1993). However, as explained earlier, CWA is not appropriate in learning ELPs.
228 K. INOUE AND H. HANEDA
the ordinary one in ELPs. Abductive entailment is necessary when we allow an ab-
ductive program as background knowledge. While an abducible literal introduced by
LAELP is either of the type expressing the nondeterminism of rules or of the type of
defaults, Lamma et al. can learn rules containing more than two abducibles, which are
available from background knowledge. However, such an extension with abductive en-
tailment is not obvious for LAELP, because we would like to acquire new abducibles
and revise old abducibles at the same time in the abduction-induction cycle. This is
important future work.
Learning abductive programs is also investigated in different contexts by other re-
searchers, including Mooney and Sakama in this volume and (Kanai and Kunifuji,
1997). They use abduction to compute inductive hypotheses efficiently, and such an
integration is useful in theory refinement.
14.5 CONCLUSION
We presented an integration of abduction and induction from the viewpoint of learning
abductive logic programs in ILP. We proposed techniques to learn abductive theories
based on the theory of learning nonmonotonic rules and the relationships between non-
monotonic reasoning and abductive reasoning. The proposed learning system LAELP
can avoid problems in handling nondeterminism and inductive leaps by introducing
new abducibles. This automatic discovery of abducibles is important for learning un-
der incomplete information.
The knowledge representation language on which our proposal is based is abduc-
tive extended logic programs, which is rich enough to allow explicit negation, default
negation, and abducibles in programs. Since both nonmonotonic programs and ab-
ductive programs can be used to represent incomplete information, one may represent
incomplete knowledge in various ways. This means that possible solutions other than
the method presented in this chapter could be considered to learning under incomplete
information. Hence, our important future work is to consider which representation
method is better in acquiring knowledge of various domains. We should make such a
comparison in terms of expressiveness, efficiency, comprehensibility and applicability.
Acknowledgments
We are indebted to Hiroichi Nakanishi, Yoshimitsu Kudoh and Munenori Nakakoji for their assistance in implementing versions of L(A)ELP. We would like to thank Chiaki Sakama for his
valuable comments.
In Step 2, GenRules(E+, BG, T0) is computed. Then, T0 is complete and consistent with respect to BG and E+ by the consistency of BG ∪ E+. That is,

T0 = E+: GenRules in Step 2 did not produce new rules. In this case, T1 = T0 and AB = ∅. Then, ∀e ∈ E+ (BG ∪ AB ∪ T1 ⊢ e) and ∀e ∈ E- (BG ∪ AB ∪ T1 ⊬ e), the latter by the consistency of BG ∪ E.

T0 ≠ E+: GenRules in Step 2 produced new rules. Here, we consider the case that there is only one rule C = (H :- B) in T0 such that H is resolved with some literals from E-, but the result can be extended to the case that there are multiple such rules in T0. Consider the following cases.

Moreover, BG ∪ T3 never entails any new abᵢ literal which is not an instance of an element of AB.

Finally, put H = T1 ∪ T2 ∪ T3. With the above results, it holds that
15.1 INTRODUCTION
This chapter proposes an approach for the cooperation of abduction and induction in
the context of Logic Programming. We do not take a stance on the debate on the nature
of abduction and induction (see Flach and Kakas, this volume); rather, we assume the definitions that are given in Abductive Logic Programming (ALP) and Inductive Logic Programming (ILP).
We present an algorithm where abduction helps induction by generating atomic hy-
potheses that can be used as new training examples or for completing an incomplete
background knowledge. Induction helps abduction by generalizing abductive expla-
nations.
A number of approaches for the cooperation of abduction and induction are presented in this volume (e.g., by Abe, Sakama, Inoue and Haneda, Mooney). Even though these approaches have been developed independently, they show remarkable similarities, suggesting that there is a "natural way" to integrate the two inference processes, as has been pointed out in the introductory chapter by Flach and Kakas.
The algorithm solves a new learning problem where background and target theory
are abductive theories, and abductive derivability is used as the example coverage
relation. The algorithm is an extension of a basic top-down algorithm adopted in
ILP (Bergadano and Gunetti, 1996), where the proof procedure defined in (Kakas and
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 233-252.
© 2000 Kluwer Academic Publishers.
234 E. LAMMA ET AL.
Mancarella, 1990c) for abductive logic programs is used for testing the coverage of examples, in place of the deductive proof procedure of Logic Programming.
The algorithm has been implemented in a system called LAP (Lamma et al., 1997) using SICStus Prolog 3#5. The code of the system and some of the examples shown in the chapter are available at http://www-lia.deis.unibo.it/Software/LAP/.
We also discuss how to learn abductive theories: we show that, in the case of complete knowledge, the rule part of an abductive theory can also be learned without abduction. Abduction is not essential to this task, but it is essential in the case of missing information, i.e., when the background theory is abductive.
The chapter is organized as follows: in Section 15.2 we recall the main concepts
of Abductive Logic Programming, Inductive Logic Programming, and the definition
of the abductive learning framework. Section 15.3 presents the learning algorithm.
In Section 15.4 we apply the algorithm to the problem of learning from incomplete
knowledge, learning theories for abductive diagnosis and learning exceptions to rules.
Our approach to the integration of abduction and induction is discussed in detail and
is compared with works by other authors in Section 15.5. Section 15.6 concludes and
presents directions for future work.
not_p/n is added to the set A, and the integrity constraint ← p(X), not_p(X) is added to IC. Then, each negative literal not p(t) in the program is replaced by a literal not_p(t). Atoms of the form not_p(t) are called default atoms. For simplicity, in the following we will write abductive theories with Negation by Default, and we will implicitly assume the transformation.
We define the complement l̄ of a literal l as

    l̄ = not_p(x)   if l = p(x)
    l̄ = p(x)       if l = not_p(x)
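As a concrete illustration, the complement operation can be sketched in Python over a string representation of literals, with the not_ prefix standing for Negation by Default (the function name and encoding are ours, not part of the chapter):

```python
def complement(literal: str) -> str:
    """Complement of a literal in the positive version of a program:
    p(x) maps to not_p(x), and not_p(x) maps back to p(x)."""
    if literal.startswith("not_"):
        return literal[len("not_"):]
    return "not_" + literal

assert complement("male(kathy)") == "not_male(kathy)"
assert complement(complement("male(kathy)")) == "male(kathy)"
```

Applying the operation twice is the identity, as the definition requires.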
In (Kakas and Mancarella, 1990c) a proof procedure for the positive version of abductive logic programs has been defined. This procedure (reported in the Appendix) starts from a goal G and a set of initial assumptions Δi, and results in a set of consistent hypotheses (abduced literals) Δo such that Δo ⊇ Δi and Δo is an abductive explanation of G. The proof procedure employs the notions of abductive and consistency derivations. Intuitively, an abductive derivation is the standard Logic Programming derivation suitably extended in order to take abducibles into account. As soon as an abducible atom δ is encountered, it is added to the current set of hypotheses, and it must be proved that every integrity constraint containing δ is satisfied. To this purpose, a consistency derivation for δ is started. Every integrity constraint containing δ is considered and δ is removed from it. The constraints are satisfied if we prove that the resulting goals fail. In the consistency derivation, when an abducible is encountered, an abductive derivation for its complement is started in order to prove its falsity, so that the constraint is satisfied.
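To make the interleaving of the two derivations concrete, the following is a minimal propositional sketch in Python. It is our own simplification of the procedure just described, not the procedure of (Kakas and Mancarella, 1990c) itself: literals are strings with a not_ prefix for defaults, the program maps each atom to its list of clause bodies, integrity constraints are denial bodies, and the search commits to the first successful choice instead of backtracking over alternative abductions.

```python
def complement(l):
    return l[4:] if l.startswith("not_") else "not_" + l

def solve(goals, delta, program, abducibles, ics):
    """Abductive derivation: prove all goals, possibly extending delta."""
    if not goals:
        return delta
    l, rest = goals[0], goals[1:]
    atom = l[4:] if l.startswith("not_") else l
    if atom in abducibles:
        if l in delta:                          # hypothesis already assumed
            return solve(rest, delta, program, abducibles, ics)
        if complement(l) in delta:              # contradicts an assumption
            return None
        new = consistent(l, delta | {l}, program, abducibles, ics)
        return None if new is None else solve(rest, new, program, abducibles, ics)
    for body in program.get(l, []):             # resolve with a program clause
        result = solve(body + rest, delta, program, abducibles, ics)
        if result is not None:
            return result
    return None

def consistent(l, delta, program, abducibles, ics):
    """Consistency derivation: every denial containing l must fail."""
    for ic in ics:
        if l in ic:
            delta = fails([g for g in ic if g != l], delta,
                          program, abducibles, ics)
            if delta is None:
                return None
    return delta

def fails(goals, delta, program, abducibles, ics):
    """Make a denial body unprovable, abducing complements if needed."""
    if not goals:
        return None                             # body provable: inconsistency
    l, rest = goals[0], goals[1:]
    atom = l[4:] if l.startswith("not_") else l
    if atom in abducibles:
        if complement(l) in delta:              # l already false: branch fails
            return delta
        if l in delta:                          # l holds: the rest must fail
            return fails(rest, delta, program, abducibles, ics) if rest else None
        return solve([complement(l)], delta, program, abducibles, ics)
    result = delta
    for body in program.get(l, []):             # every resolvent must fail
        result = fails(body + rest, result, program, abducibles, ics)
        if result is None:
            return None
    return result

# Toy theory: P = {p <- a. q <- b.}, A = {a, b}, IC = {<- a, b}.
program = {"p": [["a"]], "q": [["b"]]}
abducibles, ics = {"a", "b"}, [["a", "b"]]
assert solve(["p"], set(), program, abducibles, ics) == {"a", "not_b"}
assert solve(["p", "q"], set(), program, abducibles, ics) is None
```

On this toy theory the sketch explains p with {a, not_b} (the complement not_b is abduced while checking the denial) but fails on the conjunction p, q, which is exactly the behaviour discussed in Example 15.2 below.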
When the procedure succeeds for the goal G and the initial set of assumptions Δi, producing as output the set of assumptions Δo, we say that T abductively derives G, or that G is abductively derived from T, and we write T ⊢Δo G. Negative atoms of the form not_a in the explanation have to be interpreted as "a must be false in the theory", "a cannot be assumed" or "a must be absent from any model of the theory".
In (Brogi et al., 1997) it has been proved that the proof procedure is sound and
weakly complete with respect to an abductive model semantics under a number of
restrictions:

• the program is ground;
• integrity constraints are denials with at least one abducible in each constraint.
The requirement that the program is ground is not restrictive in the case in which there
are no function symbols in the program and therefore the Herbrand universe is finite.
In this case, in fact, we can obtain a finite ground program from a non-ground one by
grounding in all possible ways the rules and constraints in the program.
The soundness and weak completeness of the procedure require the absence of
any definition for abducibles. However, when representing incomplete information,
it is often the case that for some predicate a partial definition is available, expressing known information about that predicate. In this case, we can apply a transformation to T so that the resulting program T' has no definition for abducible predicates. This is done by introducing an auxiliary predicate δa/n for each abducible predicate a/n with a partial definition, and by adding the clause

    a(X) ← δa(X).

Predicate a/n is no longer abducible, whereas δa/n is now abducible. If a(t) cannot be derived using the partial definition for a/n, it can be derived by abducing δa(t), provided that this is consistent with the integrity constraints.
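The transformation can be sketched as a small program rewrite; clause strings and the delta_ prefix are our illustrative encoding of the auxiliary predicate written δa in the text:

```python
def split_partial_abducible(program, a):
    """Keep the partial definition of a/1, make only a fresh predicate
    delta_a abducible, and add the bridging clause a(X) :- delta_a(X)."""
    delta = "delta_" + a
    return program + [f"{a}(X) :- {delta}(X)."], delta

program = ["male(john)."]                 # partial definition of male/1
program2, new_abducible = split_partial_abducible(program, "male")
assert new_abducible == "delta_male"
assert program2 == ["male(john).", "male(X) :- delta_male(X)."]
```

The partial definition stays in the program; only the abducible status moves to the fresh predicate, exactly as described above.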
Usually, observations to be explained are positive. However, by representing negation by default through abduction, we are able to explain negative observations as well. A negative observation is represented by a literal not_l, and an explanation for it can be generated by the abductive proof procedure applied to the goal ← not_l. The explanation of a negative observation has the following meaning: if all the atoms in the explanation for ← not_l are added to the theory, then l will not be derivable. This differs from the abductive frameworks proposed in this volume by Abe, for whom the explanation of negative observations is uncommon, and by Sakama, for whom the explanation of negative observations is not allowed.
Find: a program P ∈ P such that P ∪ B ⊢ e+ for all e+ ∈ E+, and P ∪ B ⊬ e- for all e- ∈ E-.
Let us introduce some terminology. The sets E+ and E- are called training sets. The program P that we want to learn is the target program, and the predicates defined in it are target predicates. The program B is called background knowledge and contains the definitions of the predicates that are already known. We say that the learned program P covers an example e if P ∪ B ⊢ e. A theory that covers all positive examples is said to be complete, while a theory that does not cover any negative example is said to be consistent. The set P is called the hypothesis space.
COOPERATION OF ABDUCTION AND INDUCTION IN LOGIC PROGRAMMING 237
The language bias (or simply bias in this chapter) is a description of the hypothesis
space. Some systems require an explicit definition of this space and many formalisms
have been introduced in order to describe it (Bergadano and Gunetti, 1996). In order
to ease the implementation of the algorithm, we have considered only a very simple
bias in the form of a set of literals which are allowed in the body of clauses for target
predicates.
Definition 15.1 (Correctness) An abductive logic program T is correct with respect to E+ and E-, iff there exists Δ such that

T ⊢Δ E+, not_E-

where not_E- = {not_e- | e- ∈ E-} and E+, not_E- stands for the conjunction of each atom in E+ and not_E-.
Find: a new abductive theory T' = (P ∪ P', A, IC) such that P' ∈ P and T' is correct with respect to E+ and E-.
We say that a positive example e+ is covered if T ⊢Δ e+. We say that a negative example e- is not covered (or ruled out) if T ⊢Δ not_e-.
The abductive program that is learned can contain new rules (possibly containing abducibles in the body), but neither new abducible predicates² nor new integrity constraints.
We now give an example of an Abductive Learning Problem.
²If we exclude the abducible predicates added in order to deal with exceptions, as explained in Section 15.3.
Example 15.1 We want to learn a definition for the concept father from a background
knowledge containing facts about the concepts parent, male and female. Knowledge
about male and female is incomplete, and we can make assumptions about them by considering them as abducibles.
Consider the following training sets and background knowledge:

E+ = {father(john,mary), father(david,steve)}
E- = {father(john,steve), father(kathy,ellen)}
P = {parent(john,mary), male(john),
     parent(david,steve),
     parent(kathy,ellen), female(kathy)}
A = {male/1, female/1}
IC = {← male(X), female(X)}

Moreover, let the bias be

father(X,Y) ← α where α ⊆ {parent(X,Y), parent(Y,X),
                           male(X), male(Y), female(X), female(Y)}
A solution to this Abductive Learning Problem is the theory T' = (P ∪ P', A, IC) where

P' = {father(X,Y) ← parent(X,Y), male(X)}

In fact, the condition on the solution

T ⊢Δ E+, not_E-

is verified with

Δ = {male(david), not_female(david), not_male(kathy)}.

Note that, for the example father(david,steve), the abductive proof procedure returns the explanation {male(david), not_female(david)}, containing also the literal not_female(david), which is implied by the constraints and male(david). In this way the explanation is such that it cannot be extended in a way that violates the constraints.
Differently from the ILP problem, we require the conjunction of the examples to be derivable, rather than each example singly. This is done in order to avoid abductive explanations for different examples that are inconsistent with each other, as shown in the next example.
Example 15.2 Consider the following abductive theory:

P = {p ← a.
     q ← b.}
A = {a/0, b/0}
IC = {← a, b.}

and consider two positive examples p and q. Taken singly, they are both abductively derivable from the theory with, respectively, the explanations {a} and {b}. However, these explanations are inconsistent with each other because of the integrity constraint ← a, b; therefore the conjunction p, q is not abductively derivable in the theory.
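The point of the example can be checked mechanically: each explanation is consistent on its own, but their union contains the whole body of the denial ← a, b. A sketch, with our own helper name:

```python
def violates(assumptions, denials):
    """True if the assumptions contain every literal of some denial body."""
    return any(set(d) <= set(assumptions) for d in denials)

denials = [["a", "b"]]                    # IC = { <- a, b }
assert not violates({"a"}, denials)       # explanation for p alone
assert not violates({"b"}, denials)       # explanation for q alone
assert violates({"a", "b"}, denials)      # joint explanation is inconsistent
```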
procedure LearnAbdLP(
  inputs: E+, E-: training sets,
          T = (P, A, IC): background abductive theory;
  outputs: P': learned theory, Δ: abduced literals)
P' := ∅
Δ := ∅
repeat (covering loop)
  GenerateRule(in: T, E+, E-, P', Δ; out: Rule, E+_Rule, Δ_Rule)
  Add to E+ all the positive literals of target predicates in Δ_Rule
  Add to E- all the atoms corresponding to
      negative literals of target predicates in Δ_Rule
  E+ := E+ - E+_Rule
  P' := P' ∪ {Rule}
  Δ := Δ ∪ Δ_Rule
until E+ = ∅ (completeness stopping criterion)
output P', Δ
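The covering loop can be paraphrased in Python as follows. GenerateRule is passed in as a black box, and, as a simplification, every abduced literal is treated as belonging to a target predicate; all names are ours:

```python
def learn_abd_lp(e_pos, e_neg, generate_rule):
    """Covering loop: repeat until all positive examples are covered."""
    e_pos, e_neg = set(e_pos), set(e_neg)
    learned, delta = [], set()
    while e_pos:                                  # completeness criterion
        rule, covered, abduced = generate_rule(e_pos, e_neg, learned, delta)
        for lit in abduced:                       # abduced target literals
            if lit.startswith("not_"):            # become new examples
                e_neg.add(lit[len("not_"):])
            else:
                e_pos.add(lit)
        e_pos -= covered
        learned.append(rule)
        delta |= abduced
    return learned, delta

# Toy GenerateRule: one rule that covers everything, abducing nothing.
def gen(e_pos, e_neg, learned, delta):
    return "father(X,Y) :- parent(X,Y), male(X)", set(e_pos), set()

rules, delta = learn_abd_lp({"father(john,mary)", "father(david,steve)"},
                            set(), gen)
assert rules == ["father(X,Y) :- parent(X,Y), male(X)"]
assert delta == set()
```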
procedure GenerateRule(
  inputs: T, E+, E-, P', Δ;
  outputs: Rule: rule,
           E+_Rule: positive examples covered by Rule,
           Δ_Rule: abduced literals)
procedure TestCoverage(
  inputs: Rule, T, P', E+, E-, Δ;
  outputs: E+_Rule, E-_Rule: examples covered by Rule,
           Δ_Rule: new set of abduced literals)
E+_Rule := E-_Rule := ∅
Δ_in := Δ
for each e+ ∈ E+ do
  if AbdDer(← e+, (P ∪ P' ∪ {Rule}, A, IC), Δ_in, Δ_out)
    succeeds then Add e+ to E+_Rule; Δ_in := Δ_out
end for
for each e- ∈ E- do
  if AbdDer(← not_e-, (P ∪ P' ∪ {Rule}, A, IC), Δ_in, Δ_out)
    succeeds then Δ_in := Δ_out
  else Add e- to E-_Rule
end for
Δ_Rule := Δ_out - Δ
output E+_Rule, E-_Rule, Δ_Rule
negation of each negative example (← not_e-). Also in this case, each derivation starts from the set of abducibles previously assumed. The set of assumptions is initialized to the empty set at the beginning of the computation, and is gradually extended as it is passed on from derivation to derivation. This is done across different clauses as well.
Third, some abducible predicates may also be target predicates, i.e., predicates for which we want to learn a definition. To this purpose, after the generation of each clause, abduced atoms of target predicates are added to the training set, so that they become new training examples: for each abduced literal l of a target predicate, if l is positive, l is added to E+; if l is negative, the corresponding atom is added to E-.
In order to achieve consistency, in rule specialization (Figure 15.2) a rule can be specialized by adding either a non-abducible literal or an abducible one. However, the system does not need to be aware of which kind of literal it is adding to the rule, since the abductive proof procedure takes care of both cases. When adding an abducible atom δ(X) that has no definition in the background, the rule becomes consistent, because each negative example p(t-) is uncovered by assuming not_δ(t-), and each previously covered positive example p(t+) is still covered by assuming δ(t+). If the abducible has a partial definition, some positive examples will be covered without abduction and others with abduction, while some negative examples will be uncovered with abduction and others will remain covered.
We prefer to first try adding non-abducible literals to the rule, since complete infor-
mation is available about them and therefore the coverage of examples is more certain.
The algorithm also performs the task of learning exceptions to rules. This is a difficult task because exceptions limit the generality of the rules, since they represent specific cases. In order to deal with exceptions, a number of (new) auxiliary abducible predicates are provided, so that the system can use them for rule specialization when the bias offers no standard literal, and no abducible literal with a partial definition, that makes a rule for a target predicate consistent.
The algorithm can be extended in order to learn not only from examples but also
from integrity constraints on target predicates. The details of this extension together
with an example application will be described in Section 15.4.4.
Note that the system is not able to learn full abductive theories, including new integrity constraints as well. In order to do this, in (Kakas and Riguzzi, 1997) the authors proposed the use of systems that learn from interpretations, such as Claudien (De Raedt and Bruynooghe, 1993) and ICL (De Raedt and Van Laer, 1995).
15.4 EXAMPLES
Two interesting applications of the integration are learning from incomplete knowl-
edge and learning exceptions. When learning from incomplete knowledge, abduction
completes the information available in the background knowledge. When learning ex-
ceptions, instead, assumptions are used as new training examples in order to generate
a definition for the class of exceptions (Section 15.4.3).
When learning from incomplete data, what to do with the assumptions depends on
the type of theory we are learning. When learning a non-abductive theory, abduction
P = {parent(john,mary), male(john),
     parent(david,steve),
     parent(kathy,ellen), female(kathy)}
A = {male/1, female/1}
IC = {← male(X), female(X)}
E+ = {father(john,mary), father(david,steve)}
E- = {father(john,steve), father(kathy,ellen)}
The program must first be transformed into its positive version and then into a program
where abducibles have no definition, as shown in Section 15.2.1. For simplicity, we omit the two transformations, and we assume that the inverse transformations are applied to the learned program.
At the first iteration of the specialization loop, the algorithm generates the rule
father(X,Y) ←.

which covers all positive examples, but also all negative ones. Therefore another iteration is started and the literal parent(X,Y) is added to the rule:

father(X,Y) ← parent(X,Y).

This clause also covers all positive examples, but also the negative example father(kathy,ellen). Note that up to this point no abducible literal has been added to the rule, therefore no abduction has been performed and the set Δ is still empty. Now an abducible literal, male(X), is added to the rule, obtaining

father(X,Y) ← parent(X,Y), male(X).
At this point the coverage of the examples is tested: father(john,mary) is covered without abduction, while father(david,steve) is covered with the abduction of {male(david), not_female(david)}.
Then the coverage of negative examples is tested by starting the abductive derivations

← not_father(john,steve).
← not_father(kathy,ellen).

The first derivation succeeds with an empty explanation, while the second succeeds abducing not_male(kathy), which is consistent with the fact female(kathy) and the constraint ← male(X), female(X). Now no negative example is covered, therefore
the specialization loop ends. No target atom is in Δ, therefore no example is added to the training set. The positive examples covered by the rule are removed from the training set, which becomes empty. Therefore the covering loop also terminates and the algorithm ends, returning the rule

father(X,Y) ← parent(X,Y), male(X).

and the assumptions

Δ = {male(david), not_female(david), not_male(kathy)}.
At this point, the assumptions made are added to the background knowledge in order to complete the theory, thus performing a kind of theory revision. Only positive assumptions are added to the resulting theory, since negative assumptions can be derived by Negation As Failure³. In this case, only male(david) ←. is added to the theory, while not_female(david) and not_male(kathy) can be derived by Negation As Failure.
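This completion step is easy to sketch: only the positive assumptions are asserted as facts, and the negative ones are left to Negation As Failure (clause strings are illustrative):

```python
def complete_theory(program, assumptions):
    """Assert only the positive assumptions as facts; negative
    assumptions will follow by Negation As Failure."""
    return program + [a + "." for a in assumptions
                      if not a.startswith("not_")]

delta = ["male(david)", "not_female(david)", "not_male(kathy)"]
theory = complete_theory(["father(X,Y) :- parent(X,Y), male(X)."], delta)
assert theory == ["father(X,Y) :- parent(X,Y), male(X).", "male(david)."]
```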
P = {flat_tyre(bike1).
     circular(bike1).
     tyre_holds_air(bike3).
³Since the Prolog proof procedure will be used with the final theory, Negation As Failure will replace Negation by Default.
circular(bike4).
tyre_holds_air(bike4).}
A = {flat_tyre/1, broken_spokes/1}
which covers all positive examples, but also all negative ones. In order to rule out negative examples, the abducible literal not_abnorm1 is added to the body of R1, obtaining R2:
Now, the theory is correct and the set of assumptions resulting from the derivations of
positive and (negated) negative examples is
Since abnorm1/1 is a target predicate, these assumptions become new training examples, yielding:

E+ = {abnorm1(c), abnorm1(d)}
E- = {abnorm1(a), abnorm1(b), abnorm1(e), abnorm1(f)}
Therefore, a new iteration of the covering loop is started in which the following clause
is generated (R3):
The rule is correct and the set of assumptions resulting from the derivations of positive
and (negated) negative examples is
← not_abnorm1(a)
← not_abnorm1(b)
since penguin(a) and penguin(b) are false.
After the addition of the new assumptions, the training sets become
E+ = {abnorm2(e), abnorm2(f)}
E- = {abnorm2(c), abnorm2(d)}

abnorm2(X) ← superpenguin(X).
that is correct and no assumption is generated for covering examples. The algorithm
now ends by producing the following program:
flies(X) ← bird(X), not_abnorm1(X).
abnorm1(X) ← penguin(X), not_abnorm2(X).
abnorm2(X) ← superpenguin(X).

← rests(X), plays(X).
Consider now the new training sets:
E+ = {plays(a), plays(b), rests(e), rests(f)}
E- = {}
In this case, the information about the target predicates comes not only from the training set but also from integrity constraints. These constraints contain target predicates, and therefore they differ from those usually given in the background knowledge, which contain only non-target predicates, either abducible or non-abducible. The generalization process is limited not by negative examples but by integrity constraints. Suppose that we generalize the two positive examples for plays/1 to plays(X). This means that, for all X, plays(X) is true. However, this is inconsistent with the integrity constraint I, because plays(X) cannot be true for e and f.
The information contained in this type of integrity constraint must be made available in a form that is exploitable by our learning algorithm, i.e., it must be transformed into new training examples, as is done in theory revision systems (De Raedt and Bruynooghe, 1992a; Ade et al., 1994). When the knowledge base violates a newly supplied integrity constraint, these systems extract one example from the constraint and revise the theory on the basis of it: in (De Raedt and Bruynooghe, 1992a) the example is extracted by querying the user on the truth value of the literals in the constraint, while in (Ade et al., 1994) the example is automatically selected by the system.
In our approach, one or more examples are generated from constraints on target
predicates using the abductive proof procedure. The consistency of each available
example is checked with the constraints, and assumptions are possibly made to ensure
consistency. Assumptions about target predicates are considered as new negative or
positive examples.
In the previous case, we start an abductive derivation for

← plays(a), plays(b), rests(e), rests(f)

Since plays/1 and rests/1 are abducibles, a consistency derivation is started for each atom. Consider plays(a): in order to ensure consistency with the constraint ← plays(X), rests(X), the literal not_rests(a) is abduced. The same is done for the other literals in the goal, obtaining the set of assumptions

{not_rests(a), not_rests(b), not_plays(e), not_plays(f)}

which is then transformed into the set of negative examples

E- = {rests(a), rests(b), plays(e), plays(f)}
Now the learning process, applied to the new training set, generates the following correct rules:

plays(X) ← bird(X), not_abnorm1(X).
rests(X) ← superpenguin(X).
In this way, we can learn not only from (positive and negative) examples but also from
integrity constraints.
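The extraction of negative examples from a denial of this shape can be sketched as follows, using the predicate names of the example (the function and its pair encoding of atoms are ours):

```python
def neg_examples_from_constraint(constraint, e_pos):
    """constraint (p, q) encodes the denial <- p(X), q(X): whenever one
    conjunct is a positive example, the other must be assumed false,
    and that assumption becomes a negative example."""
    p, q = constraint
    neg = set()
    for pred, const in e_pos:
        if pred == p:
            neg.add((q, const))
        elif pred == q:
            neg.add((p, const))
    return neg

e_pos = [("plays", "a"), ("plays", "b"), ("rests", "e"), ("rests", "f")]
e_neg = neg_examples_from_constraint(("plays", "rests"), e_pos)
assert e_neg == {("rests", "a"), ("rests", "b"),
                 ("plays", "e"), ("plays", "f")}
```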
to ensure consistency among examples. At the end of the computation, the resulting
set of assumptions can be discarded, if we are learning an abductive theory, or added
to the theory, if we want to complete an incomplete theory.
In this way, we obtain a particular instantiation of the cycle of abductive and in-
ductive knowledge development described by Flach and Kakas in this volume. In our
approach, abduction helps induction by generating suitable background knowledge or
training examples, while induction helps abduction by generalizing the assumptions
made.
Abe, Sakama, Mooney (this volume) and (Dimopoulos and Kakas, 1996b; De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995; Kanai and Kunifuji, 1997) agree on using abduction for covering examples. What to do with the generated assumptions then depends on the task being performed. In theory revision, overspecific rules can be revised by dropping the abducible literals from the body of rules (Mooney, Sakama, this volume) or by adding the explanations to the theory (De Raedt and Bruynooghe, 1992a; Ade et al., 1994). In both theory revision and batch learning, a definition for the abducible predicates can be learned by considering the assumptions as examples for those predicates (Mooney, this volume; Kanai and Kunifuji, 1997; De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995).
Another approach for the use of abduction in learning is described in (Dimopoulos
and Kakas, 1996b) where each example is given together with a set of observations
that are related to it. Abduction is then used to explain the observations in order to
generate relevant background data for the inductive generalization.
Different positions exist on the treatment of negative observations. According to
Abe (p.177), "it is very rare for hypotheses to be generated when observations are
negative", therefore he does not consider this possibility. To avoid the difficulties of
learning from positive examples only, he adopts Abductive Analogical Reasoning to
generate abductive hypotheses under similar observations: in this case, generated hy-
potheses satisfy the Subset Principle and it is possible to learn from positive examples
only.
Sakama (this volume), instead, revises a database that covers negative observations by revising only the extensional part of the database: by abduction, he finds the facts that are responsible for the coverage of the negative example/observation, and he replaces each such fact A ← with the clause A ← δ. Semantically, A ← δ (which is equivalent to A ∨ ¬δ) represents two possible worlds, one in which A is true and the other in which A is false. In this way the inconsistency is removed but, differently from systems where the theory is revised by removing A, information about the previous state is kept, thus allowing the database to be restored to its original state.
Our approach to the treatment of negative examples is similar to the work by Mooney (this volume) and (Kanai and Kunifuji, 1997). It differs from Abe's work (this volume), since we do generate explanations for negative examples, and it is a generalization of Sakama's, since we are able to revise not only facts but also rules. The kind of revision we perform is very similar: instead of adding an abnormality literal to the head of the fact, we add a non-abnormality literal to the body. Moreover, Sakama's procedure is effective for dealing with single exceptions rather than classes of exceptions, because it treats each exception singly by adding a new abducible literal to a fact. Instead, we try to generalize exceptions in order to treat them as a whole, possibly leading to the discovery of a hierarchy of exceptions.
In his chapter, Christiansen proposes a reversible demo predicate that is able to generate the (parts of the) program that are necessary for deriving the goal. Constraint Logic Programming techniques are used for specifying conditions on the missing program parts in a declarative way. The approach is highly general, being able to perform either induction or abduction depending on which program part is missing: general rules or specific facts. The author also shows that the system is able to learn exceptions to rules, though not hierarchies of exceptions.
Acknowledgments
This research was partially funded by the MURST Project 40% "Rappresentazione della conoscenza
e meccanismi di ragionamento".
Abductive derivation
An abductive derivation from (G1, Δ1) to (Gn, Δn) in (P, Ab, IC) via a selection rule R
is a sequence

(G1, Δ1), (G2, Δ2), ..., (Gn, Δn)

such that each Gi has the form ← L1,...,Lk, R(Gi) = Lj, and (Gi+1, Δi+1) is obtained according to one of the following rules:

(A1) If Lj is not abducible or default, then Gi+1 = C and Δi+1 = Δi, where C is the resolvent of some clause in P with Gi on the selected literal Lj;

(A2) If Lj is abducible or default and Lj ∈ Δi, then
Gi+1 = ← L1,...,Lj-1,Lj+1,...,Lk and Δi+1 = Δi;
Steps (A1) and (A2) are SLD-resolution steps with the rules of P and abductive or default hypotheses, respectively. In step (A3) a new abductive or default hypothesis is required, and it is added to the current set of hypotheses provided it is consistent.
Consistency derivation
A consistency derivation for an abducible or default literal α from (F1, Δ1) to (Fn, Δn) in (P, Ab, IC) is a sequence

(F1, Δ1), (F2, Δ2), ..., (Fn, Δn)

where:

(Ci) F1 is the union of all goals of the form ← L1,...,Ln obtained by resolving the abducible or default α with the denials in IC, with no such goal being empty (←);

(Cii) for each i > 1, Fi has the form {← L1,...,Lk} ∪ Fi' and for some j = 1,...,k, (Fi+1, Δi+1) is obtained according to one of the following rules:
(C1) If Lj is not abducible or default, then Fi+1 = C' ∪ Fi', where C' is the set of all resolvents of clauses in P with ← L1,...,Lk on the literal Lj and ← ∉ C', and Δi+1 = Δi;
(C2) If L j is abducible or default, L j E ~i and k > 1, then
fi+l = {+-LI, ... ,Lj-I.Lj+I•····Lk}UF/
and ~i+l = ~;;
(C3) If Lj is abducible or default, Lj E ~i then Fi+l = F;' and ~i+l = ~;;
(C4) If L j is abducible or default, L j ~ ~; and L j ~ ~;, and there exists an abduc-
tive derivation from ( +- L j ~;) to ( +- ~') then Fi+ 1 = F;' and ~i+ 1 = ~'.
In case (C1) the current branch splits into as many branches as the number of resolvents
of ← L1,...,Lk with the clauses in P on Lj. If the empty clause is one of these
resolvents, the whole consistency check fails. In case (C2) the goal under consideration
is made simpler if the literal Lj belongs to the current set of hypotheses Δi. In case
(C3) the current branch is already consistent under the assumptions in Δi, and this
branch is dropped from the consistency checking. In case (C4) the current branch of
the consistency search space can be dropped provided the complement of Lj is abductively
provable.

Given a query L, the procedure succeeds and returns the set of abducibles Δ if there
exists an abductive derivation from (← L {}) to (← Δ). With abuse of terminology,
in this case we also say that the abductive derivation succeeds.
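To make the interplay of the (A) and (C) rules concrete, the following is a minimal propositional sketch of such an abductive proof procedure. It is an illustration rather than the procedure defined above: it ignores default literals, variables, and the selection rule R, and it conservatively treats a denial as violated only when all of its remaining literals are abductively provable. All names, including the tyre-diagnosis example, are hypothetical.

```python
def solve(goals, delta, rules, abducibles, denials):
    """Abductive derivation: prove every goal, extending the hypothesis
    set delta when an abducible is needed (rules (A1)-(A3), simplified)."""
    if not goals:
        return delta
    g, rest = goals[0], goals[1:]
    if g in abducibles:
        if g in delta:                                   # (A2) already assumed
            return solve(rest, delta, rules, abducibles, denials)
        extended = consistent(g, delta | {g}, rules, abducibles, denials)
        if extended is not None:                         # (A3) assume g
            return solve(rest, extended, rules, abducibles, denials)
        return None
    for head, body in rules:                             # (A1) resolve with P
        if head == g:
            result = solve(list(body) + rest, delta, rules, abducibles, denials)
            if result is not None:
                return result
    return None

def consistent(a, delta, rules, abducibles, denials):
    """Consistency check: no denial mentioning a may become provable
    (a compressed stand-in for rules (C1)-(C4))."""
    for denial in denials:                     # a denial forbids the joint
        if a in denial:                        # truth of its literals
            others = [l for l in denial if l != a]
            if all(solve([l], set(delta), rules, abducibles, denials) is not None
                   for l in others):
                return None                    # denial would be violated
    return delta

# Hypothetical diagnosis example: a wobbly wheel is explained by abducing
# a punctured tube, as long as no integrity constraint forbids it.
rules = [("wobbly_wheel", ("flat_tyre",)),
         ("flat_tyre", ("punctured_tube",))]
abducibles = {"punctured_tube", "broken_spokes"}
denials = [{"punctured_tube", "broken_spokes"}]  # not both at once
explanation = solve(["wobbly_wheel"], set(), rules, abducibles, denials)
print(explanation)                               # {'punctured_tube'}
```

Note how the denial ← punctured_tube, broken_spokes does not block the abduction, because its remaining literal is not provable; a denial ← punctured_tube alone would make the query fail.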
16 ABDUCTIVE GENERALIZATION
AND SPECIALIZATION
Chiaki Sakama
16.1 INTRODUCTION
Abduction and induction both generate hypotheses to explain observed phenomena
in an incomplete knowledge base, but they differ in the following respects. Abduction
conjectures specific facts accounting for some particular observation. Those
assumptions of facts are extracted using causal relations in the background
knowledge base. As there are generally many possible facts which may imply the ob-
servation, candidates for hypotheses are usually pre-specified as abducibles. Then,
the task is finding the best explanations from those candidates. By contrast, induction
seeks regularities underlying the observed phenomena. The goal is not only explain-
ing the current observations but discovering new knowledge for future usage. Hence
induced hypotheses are general rules rather than specific facts. In constructing general
rules, some constraints called biases are often used but candidates for hypotheses are
not usually given in advance. The task is then forming new hypotheses using informa-
tion in the background knowledge base.
Comparing the two forms of reasoning, abduction can compute explanations efficiently
by specifying possible hypotheses in advance. Induction has greater reasoning ability
than abduction in the sense that it can produce new hypotheses. However, the computation
of hypotheses requires a large search space and is generally expensive. Thus abduction
and induction present a trade-off between reasoning ability and computational
cost. Integrating the two paradigms and taking advantage of each framework will
therefore provide a powerful methodology for hypothetical reasoning. Moreover, such transfers
of techniques will benefit both abduction and induction. In abduction, introducing
a mechanism for abducing not only facts but also general rules will enhance the reasoning
ability of abduction.
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 253-265.
© 2000 Kluwer Academic Publishers.
16.2 PRELIMINARIES

A knowledge base K is a finite set of definite clauses of the form

H ← B1, ..., Bn  (n ≥ 0)

where H and the Bi (1 ≤ i ≤ n) are atoms. The atom H is the head and the conjunction
B1,...,Bn is the body of the clause. A clause with an empty body H ← is called
a fact. Each fact H ← is identified with the atom H. A conjunction in the body is
identified with the set of atoms included in it. A clause (atom, literal) is ground if it
contains no variable. Given a knowledge base K, a set of atoms A from the language
of K is called the abducibles. Abducibles specify a set of hypothetical facts. Any instance
A of an element from A is also called an abducible and is written as A ∈ A. Given a
knowledge base K, its associated abducibles A are often omitted when their existence
is clear from the context.
Let O be a set of ground literals. Each positive literal in O represents a positive
observation, while each negative literal in O represents a negative observation. A positive
observation presents evidence that is known to be true, while a negative observation
presents evidence that is known to be false. An individual positive/negative
observation is written as o+ / o−, and the sets of positive/negative observations from O
are written as O+ / O−, respectively.
1 In (Inoue and Sakama, 1995) the framework is introduced for nonmonotonic theories. Here we use it for
definite Horn theories with multiple observations.
That is, the knowledge base (K ∪ E) \ F derives every positive observation and is consistent
with every negative observation.2 It should be noted that in this extended framework
hypotheses can not only be added to a knowledge base but also be discarded from
it to explain observations. When O+ contains a single observation and O− and F are
empty, the above definition reduces to the traditional logical framework of abduction
addressed by Flach and Kakas in the introduction of this volume.
An explanation (E,F) is minimal if for any explanation (E′,F′), E′ ⊆ E and F′ ⊆ F
imply E′ = E and F′ = F. It holds that E ∩ F = ∅ for any minimal explanation (E,F).
In this chapter explanations mean minimal explanations unless stated otherwise.
2 In (Inoue and Sakama, 1995), explanations for a negative observation are called anti-explanations.
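The minimality condition is a mechanical componentwise-inclusion test; here is a small sketch, where the representation of explanations as pairs of Python frozensets of ground facts is an assumption of this illustration:

```python
def minimal_explanations(explanations):
    """Filter a list of (E, F) pairs, keeping those that are minimal:
    no distinct pair is componentwise included in them."""
    result = []
    for e, f in explanations:
        dominated = any((e2 <= e and f2 <= f) and (e2, f2) != (e, f)
                        for e2, f2 in explanations)
        if not dominated:
            result.append((e, f))
    return result

# Hypothetical candidates: the second is dominated by the first.
candidates = [(frozenset({"up(e)"}), frozenset()),
              (frozenset({"up(e)", "up(f)"}), frozenset())]
print(minimal_explanations(candidates))   # only the first pair survives
```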
Example 16.2 One can make a profit if he/she buys a stock and the stock price goes
up. Now there are four persons a, b, c, d, and each one bought a stock e, f, g, h,
respectively. The situation is represented as

K1: profit(x) ← stock(x,y), up(y),
    stock(a,e) ←, stock(b,f) ←,
    stock(c,g) ←, stock(d,h) ←.

Suppose that abducibles are specified as A = {stock(x,y), up(y)}. Then, given the
set of positive observations O+ = {profit(a), profit(b), profit(c), profit(d)},
abduction computes the explanations E = {up(e), up(f), up(g), up(h)}.

In this situation it appears more reasonable to generalize the original rule to

profit(x) ← stock(x,y),

rather than computing similar explanations for each observation. This inference is an
inductive generalization, which is obtained from the original rule by dropping conditions
(Michalski, 1983a).
Our goal in this section is to compute such inductive generalizations through abduction.
That is, given a knowledge base and positive observations, we produce a
generalized knowledge base which explains the observations.

Some terminology is introduced from (Plotkin, 1970). Two atoms are compatible
if they have the same predicate and the same number of arguments. Let S be a set
of compatible atoms. For A1, A2 ∈ S, A1 is more general than A2 (written A1 ≤ A2)
if A1θ = A2 for some substitution θ. An atom A is a least generalization3 of S if A
is more general than every atom in S, and any atom that is more general than every
atom in S is also more general than A. The least generalization of S is written lg(S).
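Least generalizations of compatible, function-free atoms can be computed column by column, introducing one fresh variable per distinct disagreement pattern; a sketch follows, where the tuple encoding of atoms is an assumption of this illustration and nested terms are not handled:

```python
from itertools import count

def lg(atoms):
    """Least generalization (Plotkin, 1970) of a list of compatible
    function-free atoms, encoded as tuples (predicate, arg1, ..., argn)."""
    assert len({a[0] for a in atoms}) == 1 and len({len(a) for a in atoms}) == 1
    fresh, seen, result = count(1), {}, [atoms[0][0]]
    for column in zip(*(a[1:] for a in atoms)):
        if len(set(column)) == 1:
            result.append(column[0])       # all atoms agree on this argument
        else:
            # equal disagreement patterns must map to the same variable
            result.append(seen.setdefault(column, "Y%d" % next(fresh)))
    return tuple(result)

print(lg([("up", "e"), ("up", "f"), ("up", "g"), ("up", "h")]))  # ('up', 'Y1')
```

The `seen` table keeps the generalization least: repeated argument positions with the same disagreement pattern share one variable, so lg of p(a,a) and p(b,b) is p(Y1,Y1) rather than the overly general p(Y1,Y2).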
Definition 16.1 Let K be a knowledge base and O+ a set of positive observations.
Then the following procedure computes an abductive generalization K+ of K wrt O+.

1. Put K+ = K. Compute an explanation E of O+ in K, and compute its least
generalization lg(E).

2. For any clause C from K+ whose body has atoms unifiable with atoms in lg(E),
produce a new clause C+ by resolving C with lg(E) on every such atom,4 and
replace C with C+ in K+.
The procedure consists of two generalization processes. The first one is the gen-
eralization of abduced explanations, and the second one is the generalization of a
knowledge base. Abductive generalization weakens the conditions of existing clauses
by the least generalization of the abduced explanations. The knowledge base K+ is
also an inductive generalization of K, which explains the observations O+.
Example 16.3 In Example 16.2, the least generalization of E is lg(E) = {up(y)}. As
the clause C1: profit(x) ← stock(x,y), up(y) contains the atom up(y), resolving C1
with up(y) produces the clause

C1+: profit(x) ← stock(x,y).

Since the original clause C1 is subsumed by the produced clause C1+, K1+ is obtained
from K1 by replacing C1 with C1+:

K1+: profit(x) ← stock(x,y),
     stock(a,e) ←, stock(b,f) ←,
     stock(c,g) ←, stock(d,h) ←.
3 In the ILP literature, it is also called a least general generalization. But we use the term from (Plotkin,
1970) in this chapter.
4 Resolving C with lg(E) means resolution between C and an atom in lg(E).
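Step 2 of Definition 16.1 can be sketched for function-free clauses as follows. The tuple encoding, the uppercase-variable convention, and the simplification of dropping the resolved-upon atom without applying the unifier (harmless here because the lg atoms contain only fresh variables) are assumptions of this illustration:

```python
def is_var(term):
    return term[:1].isupper()              # convention: variables are uppercase

def unifiable(a, b):
    """Function-free test: same predicate and arity, no clashing constants."""
    return (a[0] == b[0] and len(a) == len(b) and
            all(is_var(s) or is_var(t) or s == t
                for s, t in zip(a[1:], b[1:])))

def generalize(clause, lg_atoms):
    """Generalize a clause by dropping every body atom unifiable with an
    atom of lg(E), as in step 2 of Definition 16.1 (simplified)."""
    head, body = clause
    return (head, [b for b in body if not any(unifiable(b, g) for g in lg_atoms)])

# C1: profit(X) <- stock(X,Y), up(Y), generalized with lg(E) = {up(Z)}.
c1 = (("profit", "X"), [("stock", "X", "Y"), ("up", "Y")])
print(generalize(c1, [("up", "Z")]))   # the condition up(Y) is dropped
```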
Example 16.5 Consider the knowledge base K2 = K1+ of Example 16.3. When the
negative observation O− = {¬profit(d)} is provided, K2 ∪ O− is inconsistent. To
recover consistency of K2 wrt O−, abduction computes the explanation F = {stock(d,h)}.

Each fact A in F is then replaced with a clause

C−: A ← A′

where A′ is a newly introduced abducible uniquely associated with A.

Abductive specialization abductively finds the facts which are the sources of inconsistency.
Those facts are then specialized by introducing newly invented abducibles into
their conditions. The specialized knowledge base K− is consistent with O−.

In Example 16.5, the fact stock(d,h) ← is thus specialized to

C2: stock(d,h) ← stock′(d,h).
As a result, K2 becomes

K2−: profit(x) ← stock(x,y),
     stock(a,e) ←, stock(b,f) ←, stock(c,g) ←,
     stock(d,h) ← stock′(d,h).
Note that abduction removes explanatory facts from a knowledge base, while abductive
specialization keeps information on them. This is useful for recovering the previous
state of a knowledge base. For instance, if the stock h later rises and profit(d)
turns positive, K2 is reproduced from K2− using abductive generalization, i.e., by
dropping the condition stock′(d,h) in C2.
Abductive specialization recovers consistency by modifying facts while retaining
general knowledge. This is also the case for updates in deductive databases, where
every fact in a database is considered an abducible which is subject to change (Kakas
and Mancarella, 1990b). On the other hand, when one wants to specialize not only
facts but also rules in a knowledge base, abductive specialization is applied in the
following manner.
Given a knowledge base K with abducibles A, we first select hypothetical clauses
from K which are subject to change. Any hypothetical clause

Ci: H ← B

is transformed into

Ci′: H ← B, Ai

where Ai is a new abducible uniquely associated with each Ci.6 Then we consider the
knowledge base K′ obtained by this transformation together with the facts Aiσj ←,
where Aiσj is any ground instantiation of Ai. The abducibles associated with this new
theory K′ are defined accordingly.

Then we apply abductive specialization to K′ with the following policy. If we want
to specialize Ci and the negative observations O− have an explanation F containing Aiσj,
then we take the explanation F and specialize the corresponding fact Aiσj ← in K′.
The resulting knowledge base K′− has the same effect as specializing Ci in K.
6 This technique is called naming in (Poole, 1988a). When Ci contains n distinct free variables x = x1,...,xn,
an abducible Ai = pi(x) is associated with Ci, where pi is an n-ary predicate appearing nowhere in K.
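The naming transformation of footnote 6 is mechanical; a sketch follows, where the tuple encoding and the predicate names p1, p2, ... are assumptions of this illustration:

```python
def name_clause(clause, i):
    """Poole-style naming: append a fresh abducible p_i(x1,...,xn) over the
    clause's distinct free variables, so the rule itself becomes assumable
    (and hence specializable)."""
    head, body = clause
    variables = []
    for atom in [head] + body:
        for term in atom[1:]:
            if term[:1].isupper() and term not in variables:
                variables.append(term)     # variables written in uppercase
    abducible = tuple(["p%d" % i] + variables)
    return (head, body + [abducible]), abducible

# Name the hypothetical clause flies(X) <- bird(X).
named, ab = name_clause((("flies", "X"), [("bird", "X")]), 1)
print(named)   # (('flies', 'X'), [('bird', 'X'), ('p1', 'X')])
```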
Consider the knowledge base

K: flies(x) ← bird(x),
   bird(tweety) ←,

with A = {bird(x)}. Suppose that the first clause is a hypothetical clause which we
want to revise. First, K is transformed to K′ as described above.

Note that K′− has the effect of specializing the first clause in K wrt O−. The revised
knowledge base means that a bird flies if it satisfies an additional property p (normality
or something). But tweety fails to satisfy the property by the presence of the unproved
condition p′.
7 When there are (infinitely) many ground instantiations of p(x), the set of facts p(t) ← other than
p(tweety) ← is shortly written as p(x) ← x ≠ tweety.
through abduction. Although the proposed techniques are still restrictive compared
with general induction systems, they enhance the reasoning ability of abduction and
also realize efficient induction. Our system is realized using the procedure of extended
abduction (Inoue and Sakama, 1998).

According to Peirce, "if we are ever to learn anything or to understand phenomena
at all, it must be by abduction that this is to be brought about" (Peirce, 1958). In this
respect, abduction is considered as a step to induction. Abductive generalization and
specialization are captured as techniques based on this view, and there are possibilities
of exploiting further techniques in this direction. On the application side, it is known
that abduction is useful for database updates and theory revision where extensional
facts are subject to change (Kakas and Mancarella, 1990b; Inoue and Sakama, 1995).
By contrast, abductive generalization/specialization constructs intensional rules for
new information, so it has potential applications to rule updates in knowledge bases.
Future research also includes extending the techniques to nonmonotonic knowledge
bases.
Acknowledgments
The author thanks Katsumi Inoue for comments on an earlier draft of this chapter.
17 USING ABDUCTION FOR
INDUCTION BASED ON BOTTOM
GENERALIZATION
Akihiro Yamamoto
17.1 INTRODUCTION
Abduction is the process of finding explanations which account for a given example
under a background theory. Induction, often called inductive inference, is a process of
generating general rules which given examples obey. From these simple definitions, we
can expect an inductive inference procedure that generates rules by modifying
explanations which some abductive inference generates from input examples. In
this chapter we give such a procedure, with the support of deductive inference and
generalization. The procedure is a refinement of bottom generalization (Yamamoto,
1997; Yamamoto, 1999a), which was invented in the analysis of inverse entailment by
(Muggleton, 1995). Because inverse entailment is an extension of bottom generaliza-
tion, the results in this chapter show that inverse entailment also contains abduction
potentially.
In Artificial Intelligence research, abduction has already been formalized in
various ways (Cox and Pietrzykowski, 1986b; Demolombe and Fariñas del Cerro,
1991; Hirata, 1995; Inoue, 1992a; Poole, 1988a; Pople, Jr., 1973). Induction has also
been formalized in various ways, and a comprehensive framework has been proposed
in Theoretical Computer Science (see, for example, (Angluin and Smith, 1983)). In
the past ten years, inductive inference based on theorem proving and logic programming
has attracted much attention under the name inductive logic programming (ILP) (Muggleton,
1992; Lavrac and Dzeroski, 1994; Nienhuys-Cheng and de Wolf, 1997).
P.A. Flach and A.C. Kakas (eds.), Abduction and Induction, 267-280.
© 2000 Kluwer Academic Publishers.
268 A. YAMAMOTO
1We do not use the name in this chapter because it means another theorem in the theorem proving area.
(A-1) B ∧ H is consistent.
(A-2) B ∧ H ⊨ E.
Poole and Inoue gave an inference method for abduction in clausal logic by using
resolution as a consequence finding procedure. Before explaining the method we will
give the completeness theorem of resolution for consequence finding.
Definition 17.1 A clause D subsumes a clause C if there is a substitution θ such that
every literal in Dθ occurs in C.

Theorem 17.1 (Lee, 1967; Kowalski, 1970) A clause C is a logical consequence of
a clausal theory T iff there is a clause D which is derivable from T by resolution with
factoring and subsumes C.
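Definition 17.1 can be checked directly when C is ground, by searching for a consistent matching of the literals of D into C; a sketch follows, where the literal encoding is an assumption of this illustration:

```python
def subsumes(d, c):
    """Does clause D subsume ground clause C, i.e. is there a substitution
    theta with every literal of D.theta occurring in C (Definition 17.1)?
    Literals are tuples (sign, predicate, arg1, ...); uppercase strings
    are variables, and C must be ground."""
    def match(lit, ground, theta):
        if lit[:2] != ground[:2] or len(lit) != len(ground):
            return None
        theta = dict(theta)                # bind variables consistently
        for s, t in zip(lit[2:], ground[2:]):
            if s[:1].isupper():
                if theta.setdefault(s, t) != t:
                    return None
            elif s != t:
                return None
        return theta

    def search(lits, theta):
        if not lits:
            return True
        return any((t2 := match(lits[0], g, theta)) is not None
                   and search(lits[1:], t2) for g in c)

    return search(list(d), {})

# p(X,b) subsumes p(a,b) v ~q(a); p(X,X) does not subsume p(a,b).
print(subsumes([("+", "p", "X", "b")], [("+", "p", "a", "b"), ("-", "q", "a")]))
print(subsumes([("+", "p", "X", "X")], [("+", "p", "a", "b")]))
```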
Suppose that LB is a set of conjunctions of clauses and both LE and LH are sets
of ground atoms. Note that both ¬E and ¬H are clauses in this language structure.
Then any ground atom H satisfying (A-2) can be derived by generating clauses D
from T = B ∧ ¬E with resolution and checking whether or not Dθ = ¬H for some θ.
This method is justified by Theorem 17.1. We can also apply this abduction method in
the case that both LE and LH are sets of negations of clauses. In this case resolution
derives clauses D which subsume the negation of an explanation H satisfying (A-2). The
explanation obtained by negating D is called minimal (Inoue, 1992a). Non-minimal
explanations can be derived by adding some literals to D before negating it.
We now formalize induction from positive examples. An inductive inference system
M sequentially takes as input examples E1, E2, E3, ... and returns hypotheses B1, B2,
B3, .... (We choose the symbol Bi because in our formalization the current hypothesis
corresponds to the background theory in the abduction explained above.) Some initial
hypothesis B0 is given to M. Two sets LE and LB of first-order formulas are fixed, and
each example Ei and each hypothesis Bi must belong to LE and LB, respectively. We
assume the following conditions for Ei and Bi (i = 1, 2, ...):

(I-1) Bi is consistent.

(I-2) Bi ⊨ E1 ∧ E2 ∧ ... ∧ Ei, that is, each example is positive and each hypothesis
explains all the examples already given to M.
(I-3) Bi = Bi−1 ∧ Hi, where Hi is in some set LH. The system M generates the
hypothesis incrementally.

The following two conditions follow directly from (I-1), (I-2) and (I-3):

(I-4) Bi−1 ∧ Hi is consistent.

(I-5) Bi−1 ∧ Hi ⊨ Ei.
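The incremental scheme of (I-1)-(I-3) can be written down as a driver loop that is parametric in how each Hi is produced; a sketch follows, in which all function names are hypothetical placeholders:

```python
def inductive_inference(b0, examples, make_hypothesis, explains, consistent):
    """Fold each example into the current theory: B_i = B_{i-1} /\ H_i (I-3),
    checking consistency (I-1) and coverage of all examples seen so far (I-2)."""
    theory, seen = list(b0), []
    for e in examples:
        h = make_hypothesis(theory, e)     # e.g. abduce, then generalize
        theory = theory + [h]              # conjoin the new hypothesis
        seen.append(e)
        assert consistent(theory)                      # (I-1)
        assert all(explains(theory, x) for x in seen)  # (I-2)
    return theory

# Degenerate illustration: theories are lists of ground facts and the
# "hypothesis" for an example is the example itself.
final = inductive_inference([], ["e1", "e2"],
                            make_hypothesis=lambda b, e: e,
                            explains=lambda b, x: x in b,
                            consistent=lambda b: True)
print(final)   # ['e1', 'e2']
```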
This formalization has already been used to design some inductive inference sys-
tems, e.g. ITOU (Rouveirol, 1992) and Progol (Muggleton, 1995). (Cohen, 1995)
and (Arimura, 1997; Arimura et al., 1995) used the formalization for theoretical anal-
ysis of inductive inference.
The expectation stated at the beginning of this chapter can now be formalized by
comparing (I-4) and (I-5) with (A-1) and (A-2): we expect that an inductive inference
system M iteratively revises Bi−1 by adding to it an Hi obtained by modifying explanations
which some abductive inference procedure derives from Ei.
Now we consider how to realize this expectation in clausal logic. Assume that LB
is a set of conjunctions of clauses, and that LH is a set of clauses. Suppose that we try
applying Poole and Inoue's abduction method to generating Hi from Bi−1 and Ei. Then a
problem arises: we cannot use the explanations derived by the abduction method as
Hi, because the condition (I-3) is not satisfied in general. That is, since the abduction
method is based on consequence finding, Hi is the negation of some clause, and
therefore Bi = Bi−1 ∧ Hi does not always belong to LB.

Some readers might think the problem would disappear if we assumed that LH is
a set of ground literals. However, ground literals are not what we expect as outputs
of an inductive inference system. We expect the outputs to be general rules, but a
ground literal can hardly be regarded as a general rule. We should allow LH
to contain clauses which are not ground literals. We therefore have to find methods to
modify the explanations derived by the abduction method so that we obtain a general
rule which is suitable as an output Hi of an inductive inference procedure. We use
saturation and generalization for this modification.
In the following discussion, LE is assumed to be a set of clauses. We adopt
this assumption expecting our results to contribute to the progress of such theories and
practices, because examples for an inference system are often assumed to be clauses
in both the theory and practice of inductive inference. If we restricted ourselves to
applying the abduction method to inductive inference, LE would have to be a set of
negations of clauses, or a set of ground literals.
17.3 SOLD-RESOLUTION
We now introduce SOLD-resolution, with which we implement Poole and Inoue's
abduction method.

A definite program is a finite conjunction of definite clauses. A definite clause is a
formula of the form

A ← B1, B2, ..., Bn  (n ≥ 0)

where A and each Bi are atoms.
Remark 17.1 The resolving operation (4(b)) for SOLD-resolution is different from the
derivation operation for SOL-resolution (Inoue, 1992a) in the following three points:
3. Neither the factoring rule nor the merging rule for SOL-resolution is used in
constructing SOLD-derivations.
These differences are due to the fact that P and G are respectively a definite program
and a goal clause.
Remark 17.2 For a Horn clause C, ¬(C−σ) is a ground definite program, ¬(C+σ) is
the negation of a ground atom, and ¬(Cσ) = ¬(C+σ) ∧ ¬(C−σ).
(BG-1) Choose non-deterministically one positive literal A0 and several negative literals
¬A1, ¬A2, ..., ¬An (n ≥ 0) from Bot(E,B), and put K = A0 ← A1, A2, ..., An.
SOLDR(P, G) = { A | ← A is a ground instance of a goal ← A′ which is
SOLDR-derivable from (P, G) }.
The theorem is proved with a lemma, called the switching lemma, for SLD-refutation.

Lemma 17.5 (Lloyd, 1987) Let (G,_,_), (G1,θ1,C1), ..., (□,θn,Cn) be an SLD-refutation
of (P,G). Suppose that the input clauses used at the (q+1)-st and (q+2)-nd steps are
switched. Then there is an SLD-refutation of (P,G) of the form

(G,_,_), ..., (Gq,θq,Cq), (G′q+1,θ′q+1,Cq+2), (G′q+2,θ′q+2,Cq+1),
(G′q+3,θ′q+3,Cq+3), ..., (□,θ′n,Cn)

and every clause G′i, for i = q+2,...,n, is a variant of Gi.
Proof. (if part) Let P ∧ G ⊨ ¬A. Then there is an SLD-refutation (G0,_,_), (G1,θ1,C1),
..., (□,θn,Cn) of (P ∧ (A ←), G). If no Ci is A ←, the refutation is a refutation of (P,G)
and therefore P ∧ G is inconsistent. This contradicts the assumption of the theorem. So
A ← must be used as an input clause at least once. By Lemma 17.5 we can assume
that the input clauses C1, C2, ..., Ck are variants of clauses in P, and the rest
are A ←. Let Gk+1 = ← B1, B2, ..., Bn−k. Since Gk+1 is refuted with only A ←, the
atoms {A, B1, B2, ..., Bn−k} are unifiable. By applying the skipping operation instead
of resolution with A ←, we get an SOLD-derivation of (P, G) whose consequence is
Fn+1 = ← B1, B2, ..., Bn−k. This shows that A is in SOLDR(P,G).

(only-if part) Suppose that A is in SOLDR(P,G). Then there is an SOLD-derivation
of (P, G) with consequence ← B1, B2, ..., Bm such that each Bi is unifiable with
A. By replacing each application of the skipping operation with an application of
resolution with A ←, we get an SLD-refutation of (P ∧ (A ←), G), and therefore
P ∧ (A ←) ∧ G is unsatisfiable. This completes the proof. ∎
Procedure Find_DC_with_BG
Input: a definite program B and a definite clause E
Output: a definite clause H
Method:
where A1, A2, ..., An are ground literals which are logical consequences of B ∧ ¬(Eσ).
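Such ground consequences can be enumerated by forward chaining on the ground program, i.e. computing its least Herbrand model; a sketch follows, where the pair encoding of clauses and the example atoms are assumptions of this illustration:

```python
def least_model(program, facts):
    """Least Herbrand model of a ground definite program together with
    extra facts (here: the atoms contributed by the skolemized example)."""
    model = set(facts)
    changed = True
    while changed:                         # naive fixpoint of the T_P operator
        changed = False
        for head, body in program:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

# Hypothetical ground program plus the body atoms of a skolemized example.
program = [("flies_sk", ["bird_sk", "normal_sk"])]
print(sorted(least_model(program, {"bird_sk", "normal_sk"})))
# ['bird_sk', 'flies_sk', 'normal_sk']
```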
Lemma 17.6 Let A be a ground atom, P a definite program, and G a goal. If P ∧ G is
consistent, then P ∧ G ⊨ A iff P ⊨ A.

Proof. It is sufficient to show the only-if part. Let G = ← A1, A2, ..., Am, and let M(P)
be the least Herbrand model of P. If G is true in M(P), then ¬A must be false in M(P) by
the assumption that P ∧ G ∧ ¬A is inconsistent. Therefore A is a logical consequence
of P.

The proof is completed if we show that G is true in M(P). Suppose that G is
false in M(P). Then there is a ground substitution θ such that ¬(Gθ) = A1θ ∧ A2θ ∧
... ∧ Amθ is true in M(P). Since each Aiθ is a logical consequence of P, P ∧ G is
inconsistent, and this contradicts the assumption of the lemma. ∎
It is a well-known result in Logic Programming theory (Lloyd, 1987) that, for a ground
atom A, P ⊨ A iff A is derivable by resolution and instantiation from P. Therefore we
can adopt, as ¬K−, any conjunction A1 ∧ A2 ∧ ... ∧ An of ground atoms which is a
logical consequence of P.

In Figure 17.1 we illustrate an inference procedure Find_DC_with_BG based on
Theorem 17.4 and Lemma 17.6. We state, in the form of a theorem, that this procedure
is a correct implementation of bottom generalization.
Procedure FIND_UP_with_ID
Input: a definite program B and a definite clause E
Output: a unit program H
Method:
← p(a,x1), p(x1,b),
← p(a,x2), p(x2,x3), p(x3,b),
From the goal clauses we get the following hypotheses of unit programs:
p(a,b) ←,
(p(a,y1) ←) ∧ (p(y2,b) ←),
(p(a,a) ←) ∧ (p(a,b) ←),
(p(a,b) ←) ∧ (p(b,b) ←),
(p(a,y1) ←) ∧ (p(y2,y3) ←) ∧ (p(y4,b) ←),
(p(a,a) ←) ∧ (p(a,y3) ←) ∧ (p(y4,b) ←),
it was first proposed by (Rouveirol, 1992), while (Angluin et al., 1992) independently
developed the operation in the Computational Learning area. (Cohen, 1995) uses
saturation in analyzing the complexity of finding definite clauses. (Arimura, 1997)
designed an algorithm which exactly learns definite programs (with some restricted
syntax) in polynomial time.
Inverse entailment also uses the bottom set in deriving a seed K of hypotheses,
but it generalizes K by the inverse of logical entailment. Since instantiation is a sub-
operation of entailment, inverse entailment is more powerful than bottom generaliza-
tion. Unfortunately, there is no characterization of the hypotheses generated by inverse
entailment. All that we have shown is that it cannot derive all hypotheses H such that
B ∧ H ⊨ E (Yamamoto, 1999b).
We conjecture that it is quite difficult to extend the procedure FIND_UP_with_ID
so that it can derive conjunctions of definite clauses. For such an extension, we would
have to consider, for example, the case that two definite clauses in a hypothesis call
each other in an SLD-resolution. At the current stage of our research, we have no
solution to the problem of how to treat such a case. However, remember that the procedure
Find_DC_with_BG is assumed to be used iteratively in an inductive inference system.
Under this assumption, if we gave sufficient and proper examples to the system, hypotheses
consisting of more than two clauses could be inferred. So we should carefully consider
when we need extensions of FIND_UP_with_ID.
From the construction of Find_DC_with_BG, some readers might think that we
used abduction as a sub-inference of induction. But this is not correct. In the formalization
of inductive inference in Section 17.2, a hypothesis derived by Find_DC_with_BG
is added to the background theory of the inference system, and this addition is followed
by the next call of Find_DC_with_BG. So we can consider that the abduction
in the next call of Find_DC_with_BG is induced by induction. That is, abduction and
induction are co-routines in our formalization.
Acknowledgments
The main part of this work was accomplished when the author was visiting the Computer Sci-
ence Department of Technical University Darmstadt, Germany. The author wishes to thank
Prof. Dr. Wolfgang Bibel and the members of his group for many discussions and suggestions
on this issue.
Bibliography
Cheeseman, P. (1990). On finding the most probable model. In Shrager, J. and Lan-
gley, P., editors, Computational Models of Scientific Discovery and Theory Forma-
tion, chapter 3, pages 73-95. Morgan Kaufmann, San Mateo, CA.
Cheeseman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., and Freeman, D. (1988). AutoClass:
A Bayesian classification system. In Proceedings of the Eighth International
Workshop on Machine Learning, pages 54-64, Ann Arbor, MI.
Chinn, C. (1994). Are scientific theories that predict data more believable than theories
that retrospectively explain data? A psychological investigation. In Proceedings of
the XVI Conference of the Cognitive Science Society, pages 177-182.
Christiansen, H. (1992). A complete resolution method for logical meta-programming
languages. In Pettorossi, A., editor, Proceedings of the Third International Work-
shop on Meta-Programming in Logic, volume 649 of Lecture Notes in Computer
Science, pages 205-219.
Christiansen, H. (1998a). Automated reasoning with a constraint-based metainter-
preter. Journal of Logic Programming, 37(1-3):213-254. Special issue on Con-
straint Logic Programming.
Christiansen, H. (1998b). Implicit program synthesis by a reversible metainterpreter.
In Fuchs, N. E., editor, Proceedings of the Seventh International Workshop on Logic
Program Synthesis and Transformation LOPSTR' 97, volume 1463 of Lecture Notes
in Computer Science, pages 87-106, Berlin. Springer-Verlag.
Christiansen, H. (1999). Integrity constraints and constraint logic programming. In
Proceedings of the Twelfth International Conference on Applications of Prolog,
pages 5-12.
Clark, K. L. (1978). Negation as failure. In Gallaire, H. and Minker, J., editors, Pro-
ceedings of the Symposium on Logic and Databases, pages 293-322. Plenum Press,
New York.
Cohen, W. W. (1992). Abductive explanation-based learning: a solution to the multiple
inconsistent explanation problem. Machine Learning, 8:167-219.
Cohen, W. W. (1995). PAC-learning recursive logic programs: Efficient algorithms.
Journal of Artificial Intelligence Research, 2:501-539.
Console, L., Giordana, A., and Saitta, L. (1991a). Investigating the relationships be-
tween abduction and inverse resolution in propositional calculus. In Proceedings of
the Fifth International Symposium on Methodologies for Intelligent Systems, vol-
ume 542 of Lecture Notes in Computer Science, pages 316-325. Springer-Verlag.
Console, L., Theseider Dupre, D., and Torasso, P. (1989). Abductive reasoning through
direct deduction from completed domain models. In Ras, Z. W., editor, Proceedings
of the Fourth International Symposium on Methodologies for Intelligent Systems,
pages 175-182. Elsevier.
Console, L., Theseider Dupre, D., and Torasso, P. (1991b). On the relationship be-
tween abduction and deduction. Journal of Logic and Computation, 1(5):661-690.
Console, L. and Torasso, P. (1991). A spectrum of logical definitions of model-based
diagnosis. Computational Intelligence, 7(3):133-141. Also in (Hamscher et al.,
1992).
Cooper, G. F. and Herskovits, E. (1992). A Bayesian method for the induction of
probabilistic networks from data. Machine Learning, 9:309-347.
Copi, I. M. and Cohen, C. (1998). Introduction to Logic. Prentice Hall, 10th edition.
Cox, P. T. and Pietrzykowski, T. (1986a). Causes for events: their computation and
application. In Proceedings of the Eighth International Conference on Automated
Deduction, volume 230 of Lecture Notes in Computer Science, pages 608-621.
Springer-Verlag.
Cox, P. T. and Pietrzykowski, T. (1986b). Incorporating equality into logic program-
ming. Annals of Pure and Applied Logic, 31:177-189.
Cox, P. T. and Pietrzykowski, T. (1987). General diagnosis by abductive inference. In
IEEE Symposium on Logic Programming, pages 183-189, San Francisco.
Dagum, P. and Luby, M. (1997). An optimal approximation algorithm for Bayesian
inference. Artificial Intelligence, 93(1-2):1-27.
Darden, L. (1991). Theory Change in Science: Strategies from Mendelian Genetics.
Oxford University Press, New York.
de Kleer, J., Mackworth, A. K., and Reiter, R. (1992). Characterizing diagnoses and
systems. Artificial Intelligence, 56(2-3):197-222. Also in (Hamscher et al., 1992).
de Kleer, J. and Williams, B. C. (1987). Diagnosing multiple faults. Artificial Intelli-
gence, 32:97-130.
De Raedt, L., editor (1996). Advances in Inductive Logic Programming. IOS Press,
Amsterdam.
De Raedt, L. and Bruynooghe, M. (1990). On negation and three-valued logic in in-
teractive concept-learning. In Proceedings of the Ninth European Conference on
Artificial Intelligence, pages 207-212. Pitman.
De Raedt, L. and Bruynooghe, M. (1991). A multistrategy interactive concept-learner
and theory revision system. In Proceedings of the First International Workshop on
Multistrategy Learning, pages 175-190, Harpers Ferry.
De Raedt, L. and Bruynooghe, M. (1992a). Belief updating from integrity constraints
and queries. Artificial Intelligence, 53:291-307.
De Raedt, L. and Bruynooghe, M. (1992b). An overview of the interactive concept-
learner and theory revisor CLINT. In (Muggleton, 1992), pages 163-191.
De Raedt, L. and Bruynooghe, M. (1993). A theory of clausal discovery. In Pro-
ceedings of the Thirteenth International Joint Conference on Artificial Intelligence,
pages 1058-1063, Chambery, France.
De Raedt, L. and Lavrac, N. (1993). The many faces of inductive logic programming.
In Komorowski, J., editor, Proceedings of the Seventh International Symposium on
Methodologies for Intelligent Systems, volume 689 of Lecture Notes in Artificial
Intelligence, pages 435-449. Springer-Verlag.
De Raedt, L. and Van Laer, W. (1995). Inductive constraint logic. In Proceedings of
the Sixth International Workshop on Algorithmic Learning Theory, volume 997 of
Lecture Notes in Artificial Intelligence. Springer-Verlag.
Debrock, G. (1997). The artful riddle of abduction (abstract). In (Rayo et al., 1997),
page 230.
Dechter, R. (1996). Bucket elimination: A unifying framework for probabilistic inference.
In Horvitz, E. and Jensen, F., editors, Proceedings of the Twelfth Conference
on Uncertainty in Artificial Intelligence (UAI-96), pages 211-219, Portland, OR.
Esposito, F., Lamma, E., Malerba, D., Mello, P., Milano, M., Riguzzi, F., and Semeraro,
G. (1996). Learning abductive logic programs. In (Flach and Kakas, 1996),
pages 23-30. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Evans, C. A. and Kakas, A. C. (1992). Hypothetico-deductive reasoning. In Proceed-
ings of the International Conference on Fifth Generation Computer Systems, pages
546-554. ICOT.
Fann, K. T. (1970). Peirce's Theory of Abduction. Martinus Nijhoff, The Hague.
Flach, P. A. (1991). The role of explanations in inductive learning. ITK Research Re-
port 30, Tilburg University, The Netherlands.
Flach, P. A. (1992). Generality revisited. In Proceedings of the ECAI'92 Workshop on
Logical Approaches to Machine Learning, Vienna.
Flach, P. A. (1995). Conjectures: an inquiry concerning the logic of induction. PhD
thesis, Tilburg University.
Flach, P. A. (1996a). Abduction and induction: syllogistic and inferential perspectives.
In (Flach and Kakas, 1996), pages 31-35. Available on-line at
http://www.cs.bris.ac.uk/~flach/ECAI96/.
Flach, P. A. (1996b). Rationality postulates for induction. In Shoham, Y., editor, Pro-
ceedings of the Sixth Conference on Theoretical Aspects of Rationality and Knowl-
edge (TARK'96), pages 267-281.
Flach, P. A. (2000). Logical characterisations of inductive learning. In Gabbay, D. M.
and Smets, P., editors, Handbook of Defeasible Reasoning and Uncertainty Man-
agement, volume IV: Abduction and Learning, chapter 4. Kluwer Academic Pub-
lishers.
Flach, P. A. and Kakas, A. C., editors (1996). Proceedings of the ECAI'96 Workshop
on Abductive and Inductive Reasoning. Available on-line at
http://www.cs.bris.ac.uk/~flach/ECAI96/.
Flach, P. A. and Kakas, A. C. (1997a). Abductive and inductive reasoning: report of
the ECAI'96 workshop. Logic Journal of the IGPL, 5(5):773-778.
Flach, P. A. and Kakas, A. C., editors (1997b). Proceedings of the IJCAI'97 Workshop
on Abduction and Induction in Artificial Intelligence. Available on-line at
http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Flach, P. A. and Kakas, A. C. (1998). Abduction and induction in AI: report of the
IJCAI'97 workshop. Logic Journal of the IGPL, 6(4):651-656.
Flener, P. (1997). Inductive logic program synthesis with DIALOGS. In Muggleton, S.,
editor, Inductive Logic Programming: Selected papers from the Sixth International
Workshop, pages 175-198. Springer-Verlag, Berlin.
Frankfurt, H. (1958). Peirce's notion of abduction. Journal of Philosophy, 55:594.
Frege, G. (1893). Grundgesetze der Arithmetik, Begriffsschriftlich Abgeleitet, Vol. I.
Jena, Germany. Reprinted by Olms, Hildesheim (1966).
Frühwirth, T. W. (1995). Constraint handling rules. In Podelski, A., editor, Constraint
Programming: Basics and Trends, volume 910 of Lecture Notes in Computer Sci-
ence, pages 90-107.
Fung, T. and Kowalski, R. A. (1997). The IFF proof procedure for Abductive Logic
Programming. Journal of Logic Programming, 33(2):151-165.
Hamscher, W., Console, L., and de Kleer, J., editors (1992). Readings in Model-Based
Diagnosis. Morgan Kaufmann.
Hanson, N. R. (1958). The logic of discovery. Journal of Philosophy, 55(25):1073-
1089.
Hanson, N. R. (1965). Notes towards a logic of discovery. In Bernstein, R., editor,
Critical Essays on C.S. Peirce. Yale University Press.
Harman, G. (1965). The inference to the best explanation. Philosophical Review,
74:88-95.
Harper, W. and Skyrms, B., editors (1988). Causation, Cause and Credence. Kluwer
Academic Press.
Heckerman, D. (1995). A tutorial on learning Bayesian networks. Technical Report
MSR-TR-95-06, Microsoft Research, Redmond, WA. (Revised November 1996).
Helft, N. (1989). Induction as nonmonotonic inference. In Proceedings of the First In-
ternational Conference on Principles of Knowledge Representation and Reasoning,
pages 149-156, Toronto, Canada. Morgan Kaufmann.
Hempel, C. G. (1943). A purely syntactical definition of confirmation. Journal of Sym-
bolic Logic, 8(4):122-143.
Hempel, C. G. (1945). Studies in the logic of confirmation. Mind, 54(213 & 214):1-26
& 97-121.
Hempel, C. G. (1962). Deductive-nomological versus statistical explanation. In Feigl,
H. and Maxwell, G., editors, Minnesota Studies in the Philosophy of Science, Vol.
III, pages 98-169. University of Minnesota Press.
Hempel, C. G. (1965). Aspects of Scientific Explanation and Other Essays in the Phi-
losophy of Science. Free Press, New York.
Hempel, C. G. (1966). Philosophy of Natural Science. Prentice Hall, Englewood Cliffs,
NJ.
Hempel, C. G. and Oppenheim, P. (1948). Studies in the logic of explanation. Philos-
ophy of Science, 15:135-175.
Henglein, F. (1989). Polymorphic type inference and semi-unification. PhD thesis, De-
partment of Computer Science, New York University.
Henrion, M. (1988). Propagating uncertainty in Bayesian networks by probabilistic
logic sampling. In Lemmer, J. F. and Kanal, L. N., editors, Uncertainty in Artificial
Intelligence 2, pages 149-163. Elsevier.
Hill, P.M. and Gallagher, J. P. (1994). Meta-programming in logic programming. In
Gabbay, D. M., Hogger, C. J., and Robinson, J. A., editors, Handbook of Logic in
Artificial Intelligence and Logic Programming, volume V. Oxford University Press.
Hill, P. M. and Lloyd, J. W. (1989). Analysis of meta-programs. In Meta-programming
in Logic Programming, pages 23-51. MIT Press.
Hirata, K. (1995). A classification of abduction: Abduction for logic programming. In
Machine Intelligence 14, pages 397-424. Oxford University Press.
Hobbs, J., Stickel, M., Appelt, D., and Martin, P. (1990). Interpretation as abduction.
Technical report, SRI International, Menlo Park, Ca.
Hobbs, J., Stickel, M., Appelt, D., and Martin, P. (1993). Interpretation as abduction.
Artificial Intelligence, 63:69-142.
Hobbs, J., Stickel, M., Martin, P., and Edwards, D. (1988). Interpretation as abduction.
In Proceedings of the Twenty-sixth Annual ACL Meeting, pages 95-103, Buffalo.
Holland, J., Holyoak, K., Nisbett, R., and Thagard, P.R. (1986). Induction: Processes
of inference, learning, and discovery. MIT Press, Cambridge, MA.
Hookway, C. (1992). Peirce. Routledge & Kegan Paul, London.
Horwich, P. (1982). Probability and Evidence. Cambridge University Press.
Inoue, K. (1992a). Linear resolution for consequence finding. Artificial Intelligence,
56:301-353.
Inoue, K. (1992b). Studies on abductive and nonmonotonic reasoning. PhD thesis,
Kyoto University, Kyoto.
Inoue, K. (1994). Hypothetical reasoning in logic programs. Journal of Logic Pro-
gramming, 18(3):191-227.
Inoue, K. and Kudoh, Y. (1997). Learning extended logic programs. In Proceedings of
the Fifteenth International Joint Conference on Artificial Intelligence, pages 176-
181. Morgan Kaufmann.
Inoue, K. and Sakama, C. (1995). Abductive framework for nonmonotonic theory
change. In Proceedings of the Fourteenth International Joint Conference on Ar-
tificial Intelligence, pages 204-210. Morgan Kaufmann, California.
Inoue, K. and Sakama, C. (1996). A fixpoint characterization of abductive logic pro-
grams. Journal of Logic Programming, 27(2):107-136.
Inoue, K. and Sakama, C. (1998). Specifying transactions for extended abduction.
In Proceedings of the Sixth International Conference on Principles of Knowledge
Representation and Reasoning, pages 394-405. Morgan Kaufmann, California.
Jaffar, J. and Maher, M. (1994). Constraint logic programming: A survey. Journal of
Logic Programming, 19-20:503-581.
Jaffar, J., Maher, M., Marriott, K., and Stuckey, P. (1998). Semantics of constraint
logic programs. Journal of Logic Programming, 37:1-46.
Jaynes, E. (1985). Bayesian methods: General background. In Justice, J. H., editor,
Maximum Entropy and Bayesian Methods in Applied Statistics, pages 1-25. Cam-
bridge University Press, Cambridge, England.
Jaynes, E. (1995). Probability Theory: The Logic of Science. Unpublished manuscript.
Available on-line at ftp://bayes.wustl.edu/Jaynes.book.
Jensen, F. V. (1996). An Introduction to Bayesian Networks. Springer-Verlag, New
York.
Johnson-Laird, P. (1988). The Computer and the Mind. W. Collins and Co.
Jordan, M. and Bishop, C. (1996). Neural networks. Memo 1562, MIT Artificial In-
telligence Lab, Cambridge, MA.
Josephson, J. R. (1994). Conceptual analysis of abduction. In (Josephson and Joseph-
son, 1994), chapter 1, pages 5-30.
Josephson, J. R. and Josephson, S. G. (1994). Abductive Inference: Computation, Phi-
losophy, Technology. Cambridge University Press, New York.
Jung, B. (1993). On inverting generality relations. In Proceedings of the Third Inter-
national Workshop on Inductive Logic Programming, pages 87-101.
Kakas, A. C., Kowalski, R. A., and Toni, F. (1992). Abductive logic programming.
Journal of Logic and Computation, 2(6):719-770.
Kakas, A. C., Kowalski, R. A., and Toni, F. (1997). The role of abduction in logic
programming. In Gabbay, D. M., Hogger, C. J., and Robinson, J. A., editors, Hand-
book of Logic in Artificial Intelligence and Logic Programming, volume 5, pages
233-306. Oxford University Press.
Kakas, A. C. and Mancarella, P. (1990a). Database updates through abduction. In Pro-
ceedings of the Sixteenth International Conference on Very Large Databases, pages
650-661. Morgan Kaufmann, California.
Kakas, A. C. and Mancarella, P. (1990b). Generalized stable models: a semantics for
abduction. In Proceedings of the Ninth European Conference on Artificial Intelli-
gence, pages 385-391. Pitman.
Kakas, A. C. and Mancarella, P. (1990c). On the relation between truth maintenance
and abduction. In Proceedings of the Second Pacific Rim International Conference
on Artificial Intelligence.
Kakas, A. C. and Mancarella, P. (1994). Knowledge assimilation and abduction. In In-
ternational Workshop on Truth Maintenance, Lecture Notes in Computer Science.
Springer-Verlag.
Kakas, A. C. and Michael, A. (1995). Integrating abductive and constraint logic pro-
gramming. In Proceedings of the Twelfth International Conference on Logic Pro-
gramming ICLP-95, pages 399-413.
Kakas, A. C. and Riguzzi, F. (1997). Learning with abduction. In Lavrac, N. and
Dzeroski, S., editors, Proceedings of the Seventh International Workshop on Induc-
tive Logic Programming, volume 1297 of Lecture Notes in Artificial Intelligence,
pages 181-188. Springer-Verlag.
Kakas, A. C. and Riguzzi, F. (1999). Abductive concept learning. New Generation
Computing.
Kanai, T. and Kunifuji, S. (1997). Extending inductive generalisation with abduction.
In (Flach and Kakas, 1997b), pages 25-30. Available on-line at
http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Kapitan, T. (1990). In what way is abductive inference creative? Transactions of the
Charles S. Peirce Society, 26(4):449-512.
Kasahara, K., Matsuzawa, K., Ishikawa, T., and Kawaoka, T. (1996). Viewpoint-based
measurement of semantic similarity between words. In Proceedings of the Fifth In-
ternational Workshop on Artificial Intelligence and Statistics, volume 112 of Lec-
ture Notes in Statistics, pages 433-442.
Kelly, K. and Glymour, C. (1988). Theory discovery from data with mixed quanti-
fiers. Technical Report CMU-PHIL-9, Department of Philosophy, Methodology and
Logic, Carnegie-Mellon University, Pittsburgh, PA.
Kemeny, J. (1953). The use of simplicity in induction. Philosophical Review, 62:391-
408.
Kfoury, A., Tiuryn, J., and Urzyczyn, P. (1990). The undecidability of the semi-unifica-
tion problem. In Proceedings of the Twenty-second Annual ACM Symposium on
Theory of Computing, pages 468-476.
Kijsirikul, B., Numao, M., and Shimura, M. (1992). Discrimination-based constructive
induction of logic programs. In Proceedings of the Tenth National Conference on
Artificial Intelligence, pages 44-49, San Jose, CA.
Lee, R. (1967). A Completeness Theorem and Computer Program for Finding Theo-
rems Derivable from Given Axioms. PhD thesis, University of California, Berkeley.
Leiß, H. (1984). Polymorphic recursion and semi-unification. In Lecture Notes in
Computer Science 440, pages 211-224. Springer-Verlag.
Levesque, H. J. (1989). A knowledge-level account of abduction. In Proceedings of
the Eleventh International Joint Conference on Artificial Intelligence, pages 1061-
1067, Detroit.
Levi, I. (1991). The Fixation of Belief and its Undoing. Cambridge University Press.
Lewis, D. (1986). Causal explanation. In Philosophical Papers, volume 2. Oxford Uni-
versity Press.
Lipton, P. (1991). Inference to the Best Explanation. Routledge & Kegan Paul, Lon-
don.
Lloyd, J. W. (1987). Foundations of Logic Programming. Springer-Verlag, 2nd edition.
Lobo, J. and Uzcategui, C. (1998). Abductive change operators. Fundamenta Infor-
maticae.
Loredo, T. (1990). From Laplace to supernova SN 1987A: Bayesian inference in as-
trophysics. In Fougere, P., editor, Maximum Entropy and Bayesian Methods, pages
81-142. Kluwer Academic Press, Dordrecht, The Netherlands.
Lucas, P. (1998). Analysis of notions of diagnosis. Artificial Intelligence, 105(1-2):289-
337.
Mackworth, A. K. (1978). Vision research strategy: black magic, metaphors, mecha-
nisms, miniworlds and maps. In Hanson, A. R. and Riseman, E. M., editors, Com-
puter Vision Systems, pages 53-61. Academic Press, New York, NY.
Makinson, D. (1989). General theory of cumulative inference. In Reinfrank, M., de
Kleer, J., Ginsberg, M. L., and Sandewall, E., editors, Proceedings of the Second In-
ternational Workshop on Non-Monotonic Reasoning, volume 346 of Lecture Notes
in Artificial Intelligence, pages 1-18, Berlin. Springer-Verlag.
Martin, L. and Vrain, C. (1996). A three-valued framework for the induction of general
logic programs. In (De Raedt, 1996), pages 219-235.
Mayer, M. C. and Pirri, F. (1996). Abduction is not deduction-in-reverse. Logic Jour-
nal of the IGPL, 4(1):1-14.
McCarthy, J. (1980). Circumscription: a form of non-monotonic reasoning. Artificial
Intelligence, 13(1-2):27-39.
Meltzer, B. (1970). The semantics of induction and the possibility of complete sys-
tems of inductive inference. Artificial Intelligence, 1:189-192.
Michalski, R. S. (1983a). A theory and methodology of inductive learning. In (Michal-
ski et al., 1983), pages 83-134.
Michalski, R. S. (1983b). A theory and methodology of inductive learning. Artificial
Intelligence, 20(2):111-162.
Michalski, R. S. (1987). Concept learning. In Shapiro, S., editor, Encyclopedia of Ar-
tificial Intelligence, pages 185-194. John Wiley, Chichester.
Michalski, R. S. (1991). Inferential learning theory as a basis for multistrategy task-
adaptive learning. In Proceedings of the First International Workshop on Multi-
strategy Learning, pages 3-18, Harpers Ferry.
Pearl, J. (1978). On the connection between the complexity and credibility of inferred
models. International Journal of General Systems, 4:255-264.
Pearl, J. (1987). Evidential reasoning using stochastic simulation of causal models.
Artificial Intelligence, 32(2):245-257.
Pearl, J. (1988a). Embracing causation in default reasoning. Artificial Intelligence,
35(2):259-271.
Pearl, J. (1988b). Probabilistic Reasoning in Intelligent Systems: Networks of Plausi-
ble Inference. Morgan Kaufmann, San Mateo, CA.
Peirce, C. S. (1955a). Abduction and induction. In Buchler, J., editor, Philosophical
Writings of Peirce, chapter 11, pages 150-156. Dover.
Peirce, C. S. (1955b). The fixation of belief. In Buchler, J., editor, Philosophical Writ-
ings of Peirce. Dover.
Peirce, C. S. (1957). Essays in the Philosophy of Science. Liberal Arts Press.
Peirce, C. S. (1958). Collected Papers of Charles Sanders Peirce. Harvard University
Press, Cambridge, Massachusetts. Edited by C. Hartshorne, P. Weiss & A. Burks.
Peng, Y. and Reggia, J. A. (1990). Abductive Inference Models for Diagnostic Problem-
Solving. Springer-Verlag, New York.
Pino Perez, R. and Uzcategui, C. (1997). Jumping to explanations vs. jumping to con-
clusions. Technical Report IT-301, LIFL, University of Lille, France.
Plotkin, G. D. (1970). A note on inductive generalization. In Meltzer, B. and Michie,
D., editors, Machine Intelligence 5, pages 153-163. Edinburgh University Press.
Plotkin, G. D. (1971). Automatic Methods of Inductive Inference. PhD thesis, Edin-
burgh University.
Poole, D. (1988a). A logical framework for default reasoning. Artificial Intelligence,
36(1):27-47.
Poole, D. (1988b). Representing knowledge for logic-based diagnosis. In Proceedings
of the International Conference on Fifth Generation Computing Systems, pages
1282-1290, Tokyo, Japan.
Poole, D. (1989a). Explanation and prediction: An architecture for default and abduc-
tive reasoning. Computational Intelligence, 5(2):97-110.
Poole, D. (1989b). Normality and faults in logic-based diagnosis. In Proceedings of
the Eleventh International Joint Conference on Artificial Intelligence, pages 1304-
1310, Detroit, MI.
Poole, D. (1990). A methodology for using a default and abductive reasoning system.
International Journal of Intelligent Systems, 5(5):521-548.
Poole, D. (1993). Probabilistic Horn abduction and Bayesian networks. Artificial In-
telligence, 64(1):81-129.
Poole, D. (1994). Representing diagnosis knowledge. Annals of Mathematics and Ar-
tificial Intelligence, 11:33-50.
Poole, D. (1996). Probabilistic conflicts in a search algorithm for estimating posterior
probabilities in Bayesian networks. Artificial Intelligence, 88:69-100.
Poole, D. (1997). The independent choice logic for modelling multiple agents under
uncertainty. Artificial Intelligence, 94:7-56. Available on-line at
http://www.cs.ubc.ca/spider/poole/abstracts/icl.html.
Poole, D. (1998). Abducing through negation as failure: stable models in the Indepen-
dent Choice Logic. Journal of Logic Programming. Available on-line at
http://www.cs.ubc.ca/spider/poole/abstracts/approx-pa.html.
Poole, D., Goebel, R., and Aleliunas, R. (1987). Theorist: A logical reasoning system
for defaults and diagnosis. In Cercone, N. and McCalla, G., editors, The Knowledge
Frontier: Essays in the Representation of Knowledge, pages 331-352. Springer-
Verlag.
Pople, Jr., H. E. (1973). On the mechanization of abductive logic. In Nilsson, N. J.,
editor, Proceedings of the Third International Joint Conference on Artificial Intel-
ligence, pages 147-152, Stanford, CA. William Kaufmann.
Popper, K. (1959). The Logic of Scientific Discovery. Basic Books, New York.
Psillos, S. (1996). Ampliative reasoning: induction or abduction? In (Flach and Kakas,
1996), pages 56-61. Available on-line at
http://www.cs.bris.ac.uk/~flach/ECAI96/.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81-106.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning,
5(3):239-266.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San
Mateo, CA.
Ramachandran, S. (1998). Theory Refinement ofBayesian Networks with Hidden Vari-
ables. PhD thesis, Department of Computer Sciences, University of Texas, Austin,
TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-265
(see http://www.cs.utexas.edu/users/ai-lab).
Ramachandran, S. and Mooney, R. J. (1998). Theory refinement for Bayesian net-
works with hidden variables. In Proceedings of the Fifteenth International Confer-
ence on Machine Learning, Madison, WI. Morgan Kaufmann.
Ramsey, F. P. (1931). General propositions and causality. In Foundations of Mathe-
matics and Other Logical Essays, pages 237-257. Routledge & Kegan Paul.
Rayo, M., Gimate-Welsh, A., and Pellegrino, P., editors (1997). VIth International
Congress of the International Association for Semiotic Studies: Semiotics Bridging
Nature and Culture. Editorial Solidaridad, Mexico.
Reggia, J. A., Nau, D., and Wang, P. (1983). Diagnostic expert systems based on a set
covering model. International Journal of Man-Machine Studies, 19(5):437-460.
Reilly, F. E. (1970). Charles Peirce's Theory of Scientific Method. Fordham University
Press, New York.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13:81-132.
Reiter, R. (1987). A theory of diagnosis from first principles. Artificial Intelligence,
32(1):57-96. Also in (Hamscher et al., 1992).
Reiter, R. and de Kleer, J. (1987). Foundations of assumption-based truth maintenance
systems: preliminary report. In Proceedings of the Sixth National Conference on
Artificial Intelligence, pages 183-188.
Roesler, A. (1997). Perception and abduction (abstract). In (Rayo et al., 1997), page
226.
Rouveirol, C. (1992). Extensions of inversion of resolution applied to theory comple-
tion. In (Muggleton, 1992), pages 63-92.
Kakas, A.C., 13, 21, 23-26, 33, 35, 45, 49, 51, 54, 69, 71, 82, 84, 107, 115, 118, 133, 170, 180, 183, 184, 189, 195, 196, 199, 210, 211, 213-215, 223, 226, 228, 233-235, 241, 245, 248-250, 261, 264, 265
Kanai, T., 176, 229, 248, 249
Kant, I., 47
Kapitan, T., 48
Kasahara, K., 173
Kawaoka, T., 173
Kedar-Cabelli, S., 133, 147
Keller, R., 133, 147
Kelly, K., 138
Kemeny, J., 134
Kfoury, A.J., 207
Kijsirikul, B., 186
Kitcher, P., 68
knowledge assimilation, 54, 70
knowledge base refinement, 183
knowledge representation, 108, 111, 214, 229
knowledge system, 215, 223-225
Kodratoff, Y., 77, 151
Konolige, K., 69, 112, 133, 150, 155
Kowalski, R.A., 13, 23, 45, 51, 69, 71, 133, 183, 211, 213-215, 234, 268, 269
Kraus, S., 86, 90, 103, 105, 106
Kruijff, G.J., 33, 49
Kudoh, Y., 214, 216, 222
Kuipers, T., 57
Kunifuji, S., 176, 229, 248, 249
LAB, 24, 26
Lachiche, N., 10, 107, 113, 114, 126
LAELP, 215, 227, 228
Lakatos, I., 137
Lamma, E., 11, 26, 111, 116, 177, 196, 210, 215, 216, 222, 228, 234, 235, 239, 264
Laplace, P.S., 159
Lauritzen, S.L., 162
Lavrac, N., 183, 213, 217, 220, 226, 227, 267
Leake, D., 83
learning, 133, 138, 148, 156
    abductive logic programs, 26, 223, 239, 264
    abductive theories, 188, 243
    Bayesian networks, 161
    decision tree, 159, 161, 166
    exceptions, 241
    explanation-based, 108, 147, 211
    from examples, 46, 139, 145
    from incomplete knowledge, 234, 242
    from integrity constraints, 247
    neural networks, 161, 166
    nonmonotonic logic programs, 216
    nonmonotonic theories, 214
    positive-only, 178-180
    probabilities, 160
    supervised, 110
    unsupervised, 40, 153, 161, 166
least generalization, 257
Lee, R., 268, 269
Lehmann, D., 86, 90, 103, 105, 106
Leiß, H., 207
Leibniz, G.W., 39
LELP, 214, 216, 217, 220, 224-228
Levesque, H.J., 54, 133, 183
Levi, I., 57
Lewis, D., 63
Lifschitz, V., 214, 216
Lipton, P., 8, 32, 73, 84
Lloyd, J.W., 197, 271, 275, 276
Lobo, J., 54-56, 58
logic
    first-order, 107, 117, 126-128, 197
    of discovery, 9
    of the finished research report, 9
    predicate, 107, 117, 119, 125-128
    propositional, 117
logic program
    abductive extended, 223, 229
    extended, 214, 216
    normal, 214
logic programming, 2, 5, 6, 11, 13, 23, 69, 133, 161, 163, 196, 205, 209, 213, 214, 233-235
    abductive, 13, 15, 24, 85, 163, 168, 199, 200, 211, 223, 228, 229, 233, 234
    inductive, 11, 13, 16, 22, 24, 25, 101, 103, 110, 114, 116, 153, 170, 171, 175, 183, 199, 211, 214, 218, 233, 234, 236, 267
    probabilistic, 163
Loredo, T.J., 159
Luby, M., 162
Lucas, P.J.F., 36
Lycan, W., 31
machine learning, 77, 79, 93-95, 110, 133, 134, 138, 139, 181, 191, 220
Mackworth, A.K., 133, 139, 147, 149, 154
Magidor, M., 86, 90, 103, 105, 106
Makinson, D., 45, 52, 53, 86, 90
Malerba, D., 177, 239
Malfait, B., 24, 25, 247-249
Mancarella, P., 23, 45, 54, 210, 215, 223, 234, 235, 250, 261, 265
Marquis, P., 113, 114
Martens, B., 19, 111
Martin, L., 214, 228
Martin, P., 45, 133, 139
Matsuzawa, K., 173
Mayer, M.C., 84, 86
McCarthy, J., 83
INDEX
Mello, P., 11, 26, 111, 116, 177, 215, 228, 234, 235, 239
Meltzer, B., 146, 148, 150
meta-interpreter, 205
meta-language, 90, 91, 109, 110, 112, 197, 198
meta-level, 15, 23, 84, 85, 90, 93, 105, 111
meta-logic, 85, 90, 205
Michael, A., 23
Michalski, R.S., 1, 21, 24, 71, 77, 78, 133-135, 138, 145, 146, 148, 150, 183, 256
Milano, M., 177, 215, 228, 234, 239
Mill, J.S., 46, 150
minimal explanation, 255, 269
minimal model, 16, 103
minimality, 103, 114, 134
minimum description length, 161
MIS, 264
Mitchell, T.M., 133, 139, 145, 147
modelling
    causal, 154
    evidential, 154
Mooney, R.J., 11, 24-26, 116, 133, 139, 147, 172, 176, 183, 184, 186-188, 190, 210, 229, 233, 248, 249, 264
Morgan, C., 146
Morris, S., 133
Mortimer, H., 78, 82
most general unifier, 206
Muggleton, S.H., 13, 22, 23, 77, 101, 111, 114, 135, 138, 146, 148, 169, 170, 175, 183, 211, 213, 214, 218, 220, 228, 247, 267, 270, 277, 278
Mycin, 154
natural language, 12, 45, 128, 133, 140, 148, 172
Nau, D.S., 147
Neal, R.M., 161
near-miss example, 177
negation, 216
    as failure, 144, 199, 214, 216, 234, 243
    by default, 214, 229, 234-236, 243
    classical, 214, 216, 228
    explicit, 214, 229
negative observation, 254
Neri, F., 133, 138, 149
neural network, 36, 153, 154, 156, 161, 166, 190
Ng, H.T., 139, 172, 183
Nicosia, M., 214, 228
Nienhuys-Cheng, S.-H., 211, 267, 268, 272
Nilsson, J.F., 211
Nisbett, R., 46
no-good clause, 177
noise, 10, 157, 159, 161, 220
nondeterminism, 226, 228, 229
nondeterministic rule, 215, 220, 221, 224, 225, 227, 228
nonmonotonic
    explanation, 83
    inference, 83
    reasoning, 83, 90, 103, 104, 107-109, 112, 114, 214, 215, 227, 229
normal form, 13, 144
normal logic program, 214
Numao, M., 186, 211
O'Neill, M., 187
O'Rorke, P., 26, 133, 138, 149, 186, 264
observable, 14-22, 24, 199
observation, 6, 11, 13, 14, 78-80, 82, 134
Occam's razor, 139, 172, 183
Olshen, R.A., 156
Oppenheim, P., 148
Ourston, D., 24, 25, 184, 186, 187, 264
overfitting, 160, 161, 220
Pagnucco, M., 54-56, 58
parsimonious, 36, 43, 72, 188
partial evaluation, 257
Pazzani, M.J., 184, 186, 264
Pearl, J., 39, 134, 154-156, 161, 162, 187
Peirce, C.S., 5-11, 13, 16, 33, 45, 47, 48, 51, 93-95, 97, 107, 145, 151, 169, 170, 195, 265
    abduction, 47
    epistemology, 51
    inferential theory, 6, 13, 35, 89, 105, 107, 170, 196
    syllogistic theory, 5, 35, 117-119, 170, 196, 199
Peng, Y., 183, 188, 189
perception, 154
Pereira, L.M., 216, 222
PFOIL, 190
Pietrzykowski, T., 23, 77, 133, 138, 139, 149, 150, 186, 267
Pino Perez, R., 84-86
Pirri, F., 85, 86
Pitt, L., 280
planning, 11, 12, 44, 133, 138, 148
Plotkin, G.D., 133, 135, 256, 277
Poole, D., 11, 12, 77, 112, 133, 134, 138, 139, 147, 148, 150, 155, 162-164, 168, 171, 183, 214, 215, 227, 261, 267-269
Pople Jr., H.E., 111, 150, 169, 170, 183, 267
Popper, K., 137
positive
    observation, 254
positive-only learning, 178-180
possible world, 19, 53, 61, 136, 157, 158, 163, 249
predicate
    clause, 170, 171
    completion, 112, 114, 116
Tanner, M.C., 72
Tarski, A., 90
taxonomy, 204, 214
testing a hypothesis by experiment, 169, 170
Thagard, P.R., 46, 65, 73, 137, 172
theory
    development, 17, 24, 25, 27
    formation, 17, 22
    refinement, 181, 229, 264
    revision, 22, 24, 54, 181, 242
Theseider Dupre, D., 23, 69, 71, 77, 85, 112, 133, 139, 144, 148, 150, 155
Thompson, C.A., 24, 26, 188
Tiuryn, J., 207
Toni, F., 13, 45, 51, 69, 71, 133, 183, 211, 213, 214, 234
Torasso, P., 23, 69, 71, 77, 85, 112, 133, 138, 139, 144, 147-150, 155
Towell, G., 184
transitivity, 96, 103, 119, 121, 135
truth-preserving, 2, 61, 115
truth-value function, 121, 127
Tuhrim, S., 190
unfolding, 226
APPLIED LOGIC SERIES