
Abduction and Induction

APPLIED LOGIC SERIES


VOLUME 18

Managing Editor
Dov M. Gabbay, Department of Computer Science, King's College, London, U.K.

Co-Editor
John Barwise, Department of Philosophy, Indiana University, Bloomington, IN,
U.S.A.

Editorial Assistant
Jane Spurr, Department of Computer Science, King's College, London, U.K.

SCOPE OF THE SERIES


Logic is applied in an increasingly wide variety of disciplines, from the traditional sub-
jects of philosophy and mathematics to the more recent disciplines of cognitive science,
computer science, artificial intelligence, and linguistics, leading to new vigor in this
ancient subject. Kluwer, through its Applied Logic Series, seeks to provide a home for
outstanding books and research monographs in applied logic, and in doing so demon-
strates the underlying unity and applicability of logic.

The titles published in this series are listed at the end of this volume.

Abduction and Induction
Essays on their Relation and Integration

edited by

PETER A. FLACH
University of Bristol

and

ANTONIS C. KAKAS
University of Cyprus

SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.


A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN 978-90-481-5433-3 ISBN 978-94-017-0606-3 (eBook)


DOI 10.1007/978-94-017-0606-3

Printed on acid-free paper

All Rights Reserved


© 2000 Springer Science+Business Media Dordrecht
Originally published by Kluwer Academic Publishers in 2000
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
Contents

Foreword ix

Preface xiii

Contributing Authors xv

1
Abductive and inductive reasoning: background and issues 1


Peter A. Flach and Antonis C. Kakas
1.1 Introduction 1
1.2 Abduction and induction in philosophy and logic 2
1.3 Abduction and induction in logic programming and artificial intelligence 11
1.4 Integration of abduction and induction 24
1.5 Conclusions 27

Part I The philosophy of abduction and induction


2
Smart inductive generalizations are abductions 31
John R. Josephson
2.1 A distinctive pattern of inference 31
2.2 What is an explanation? 36
2.3 Smart inductive generalizations are abductions 39
2.4 Conclusion 43

3
Abduction as epistemic change: a Peircean model in Artificial Intelligence 45
Atocha Aliseda
3.1 Introduction 45
3.2 Abduction and induction 46
3.3 The notion of abduction 47
3.4 Epistemic change 51
3.5 Abduction as epistemic change 54
3.6 Discussion and conclusions 57

4
Abduction: between conceptual richness and computational complexity 59
Stathis Psillos
4.1 Introduction 59
4.2 Ampliative reasoning 60
4.3 Explanatory reasoning: induction and hypothesis 62
4.4 Abduction 64
4.5 Abduction and computation 69
4.6 Conclusions 73

Part II The logic of abduction and induction


5
On relationships between induction and abduction: a logical point of view 77
Brigitte Bessant
5.1 Introduction 77
5.2 Abduction and induction: one is an instance of the other 78
5.3 Abduction and induction: different with a common root 79
5.4 Abduction and induction: totally different 82
5.5 Abduction and induction: a logical inference 84
5.6 Conclusion 86

6
On the logic of hypothesis generation 89
Peter A. Flach
6.1 Introduction 89
6.2 Logical preliminaries 90
6.3 Explanatory reasoning 93
6.4 Confirmatory reasoning 97
6.5 Discussion 105

7
Abduction and induction from a non-monotonic reasoning perspective 107
Nicolas Lachiche
7.1 Introduction 107
7.2 Definitions 108
7.3 Abduction and explanatory induction 111
7.4 Abduction and descriptive induction 111
7.5 Discussion 114
7.6 Conclusion 115

8
Unified inference in extended syllogism 117
Pei Wang
8.1 Term logic and predicate logic 117
8.2 Extended syllogism in NARS 119
8.3 An example 123
8.4 Discussion 125

Part III The integration of abduction and induction: an Artificial Intelligence perspective

9
On the relations between abductive and inductive explanation 133
Luca Console and Lorenza Saitta
9.1 Introduction 133
9.2 Generality and informativeness 134
9.3 A general definition of explanation 137
9.4 Inductive and abductive explanations 140
9.5 Analysis of inference mechanisms in the literature 145
9.6 Related work 148
9.7 Conclusions 151

10
Learning, Bayesian probability, graphical models, and abduction 153
David Poole
10.1 Introduction 153
10.2 Bayesian probability 156
10.3 Bayesian networks 161
10.4 Bayesian learning and logic-based abduction 162
10.5 Combining induction and abduction 166
10.6 Conclusion 168

11
On the relation between abductive and inductive hypotheses 169
Akinori Abe
11.1 Introduction 169
11.2 The relation between abduction and induction 170
11.3 About the integration of abduction and induction 176
11.4 Conclusion 179
12
Integrating abduction and induction in Machine Learning 181
Raymond J. Mooney
12.1 Introduction 181
12.2 Abduction and induction 182
12.3 Abduction in theory refinement 183
12.4 Induction of abductive knowledge bases 188
12.5 Conclusions 191

Part IV The integration of abduction and induction: a Logic Programming perspective

13
Abduction and induction combined in a metalogic framework 195
Henning Christiansen
13.1 Introduction 195
13.2 A metalogic framework for models of reasoning 197
13.3 Modelling a variety of reasoning processes 199
13.4 Implementation of the DEMO system 204
13.5 Summary and related work 209

14
Learning abductive and nonmonotonic logic programs 213
Katsumi Inoue and Hiromasa Haneda
14.1 Introduction 213
14.2 Learning nonmonotonic logic programs 216
14.3 Learning abductive logic programs 223
14.4 Related work 228
14.5 Conclusion 229
Appendix: Proof of Theorem 14.2 229

15
Cooperation of abduction and induction in Logic Programming 233
Evelina Lamma, Paola Mello, Fabrizio Riguzzi, Floriana Esposito,
Stefano Ferilli, and Giovanni Semeraro
15.1 Introduction 233
15.2 Abductive and Inductive Logic Programming 234
15.3 An algorithm for learning abductive logic programs 239
15.4 Examples 241
15.5 Integration of abduction and induction 248
15.6 Conclusions and future work 250
Appendix: Abductive proof procedure 250

16
Abductive generalization and specialization 253
Chiaki Sakama
16.1 Introduction 253
16.2 Preliminaries 254
16.3 Generalizing knowledge bases through abduction 256
16.4 Specializing knowledge bases through abduction 260
16.5 Related work 264
16.6 Concluding remarks 264

17
Using abduction for induction based on bottom generalization 267
Akihiro Yamamoto
17.1 Introduction 267
17.2 From abduction to induction 269
17.3 SOLD-resolution 270
17.4 Finding definite clauses 273
17.5 Finding unit programs 278
17.6 Concluding remarks 279

Bibliography 281

Index 301
Foreword

Reasoning in reverse
Logic is the systematic study of cogent reasoning. The central process of reasoning
studied by modern logicians is the accumulative deduction, usually explained semanti-
cally, as taking us from truths to further truths. But actually, this emphasis is the result
of a historical contraction of the agenda for the field. Up to the 1930s, many logic
textbooks still treated deduction, induction, confirmation, and various further forms of
reasoning in a broader sense as part of the logical core curriculum. And moving back
to the 19th century, authors like Mill or Peirce included various non-deductive modes
of reasoning (induction, abduction) on a par with material that we would recognize at
once as 'modern' concerns. Since these non-deductive styles of reasoning seemed ir-
relevant to foundational research in mathematics, they moved out quietly in the Golden
Age of mathematical logic. But they do remain central to a logical understanding of
ordinary human cognition. These days, this older broader agenda is coming back to
life, mostly under the influence of Artificial Intelligence, but now pursued by more
sophisticated techniques - made available, incidentally, by advances in mathematical
logic ...
The present volume is devoted to two major varieties of non-deductive inference,
namely abduction and induction, identified as logical 'twins' by C.S. Peirce, but dis-
covered independently under many different names. Roughly speaking, abduction is
about finding explanations for observed facts, viewed as missing premises in an ar-
gument from available background knowledge deriving those facts. Equally roughly
speaking, induction is about finding general rules covering a large number of given
observations. Both these phenomena have been studied by philosophers of science
since the 1950s, such as Carnap (the pioneer of inductive logic) and Hempel (whose
'logico-deductive' model of explanation has unmistakable abductive features). An-
other major contribution was made by Popper. If good news travels in this forward
direction, bad news travels in the opposite. Valid consequences also take us from false
conclusions to falsity of at least one of the premises, allowing us to learn by revision
- even though we may have some latitude in where to assign the blame. Thus, rea-
soning is also tied up with scientific theory change, and more generally, flux of our
commonsense opinions. What the present volume shows is how these concerns are
converging with those of logicians and computer scientists, into a broader picture of
what reasoning is all about.
A pervasive initial problem in this area, even if just an irritant, is terminology. Some
people feel that 'abduction' and 'induction', once baptised, must be real phenomena.
But they might be mere terms of art, still looking for a substantial denotation ... Indeed,
it is not easy to give a crystal-clear definition for them, either independently or in their
inter-relationship. (Of course, this is not easy for 'deduction' either.) Fortunately, the
editors do an excellent job in their introductory chapter of clearing up a number of
confusions, and relating abduction and induction in a productive manner. No need to
repeat that. Instead, let me highlight how the subject of this book intertwines many
general features of reasoning that need to be understood - and that somehow man-
age to escape from the usual logical agenda. For this purpose, we must make some
distinctions.
Every type of reasoning revolves around some underlying connection, giving us a
link with a certain 'quality' between input (data) and output (conclusions). The classi-
cal schema for this connection is the binary format P, Q, ... ⊨ C. But this format leaves
many 'degrees of freedom', that are essential to reasoning. First, the strength of the
connection may vary. With Tarski it says "all models of the premises P, Q, ... are also
models for the conclusion C". But there are respectable alternatives which ask less:
replacing "all" by "almost all" (as happens in some probabilistic reasoning), or by "all
most preferred" (as in non-monotonic logics in AI). This variety of 'styles of reason-
ing' can be traced back to the pioneering work of Bolzano in the early 19th century,
including an awareness - wide-spread now, but quite novel then - that these different
styles differ, not only in the individual inferences they sanction, but also in their gen-
eral structural rules, such as Monotonicity or Transitivity. Second, varieties of logical
consequence multiply by the existence of very different viewpoints on the connection.
All variations so far were semantic, in terms of models, truth, and preference. But
we can also analyse cogent reasoning in a proof-theoretic manner (consequence as
derivability), giving us options between classical, or intuitionistic, or linear logic - or
a game-theoretic one (where valid consequence is the existence of a winning strategy
for a proponent in debate), giving us yet further logical systems.
Another key dimension to reasoning is direction. Standard logical inference moves
forward, from given premises to new conclusions. But abduction moves backwards,
looking for premises that imply a given conclusion. The backwards direction is often
less deterministic, as we can choose from a vast background reservoir of knowledge,
prejudices, hypotheses, etc. Indeed, to put backwards reasoning in proper perspective,
we need richer formats of inference. An example is Toulmin's schema from the 1950s,
where claims follow from data via a 'warrant', where data are backed up by evidence,
and warrants by background theory. Thus, we are led naturally to a study of theory
structure. The latter is prominent in the philosophy of science, where assertions may
fall into laws, facts, and auxiliary hypotheses. This structure seems essential to logical
analysis of reasoning. Thus, abduction looks for facts, while induction searches for
regularities. Indeed, would the same distinctions make sense in the forward direction?
Structured theories fix different roles for assertions. A more radical view would
be that such roles are not fixed globally, but just represent a focus in the current con-
text. What counts as a 'relevant' assertion, or as an 'explanation' for a given ob-


servation, may depend on the dynamics of some ongoing argument - and the topic
of conversation. This epistemic flux reflects a general modem concern. Inferential
connections and theory structures are static entities, but actual reasoning is a dynamic
process, driven by purposes over time. This dynamics is reflected in standard accounts
of abduction and induction, as being about 'finding' explanations or generalizations.
But even standard deduction is usually dynamically goal-driven, say by a conjecture,
pulling us toward intermediate results. Of course, we need to 'compare like to like':
abduction, induction and deduction all have both static and dynamic aspects.
Another dimension of fine-structure is the role of language in reasoning. Eventu-
ally, one cannot understand the workings of a style of inference without understanding
its concomitant language design. E.g., the above distinction between individual facts
and general assertions is language-dependent. (I do not know of any conclusive se-
mantic underpinning for it.) Likewise, the choice of concrete and abstract vocabulary
is essential to perspicuous theory structure, as is well-known from philosophy of sci-
ence. But also, different notions of consequence suggest different vocabularies of
logical operators reflecting 'control structures' of the process, witness classical, intu-
itionistic, or modal logic. Modern non-standard logics invite even more exotic new
operations.
Finally, this book highlights issues of combination. Different forms of reasoning do
not live in isolated domains, they interact. This is clear even in classical mathematics,
where backwards problem Analysis lived side-by-side with forward proof Synthesis.
Logic systems like semantic tableaux have this same dual character. In this book,
abduction and induction occur intertwined, which raises many additional questions.
This is one instance of a more general trend toward understanding the architecture of
logical systems, and its effects on the complexity of their behaviour in bulk.
Reasoning is a many-dimensional process, involving a complex interplay of infer-
ential connections, language design, changing directions, and larger-scale combina-
tions. Naturally, the present book does not address all this once and for all. But it
does throw open windows towards understanding the true complexities of reasoning,
by presenting abduction and induction intertwined as a fascinating case study for 'real
logic'.

Johan van Benthem


Preface

From the very beginning of investigation of human reasoning, philosophers had iden-
tified - along with deduction - two other forms of reasoning which we now call abduc-
tion and induction. Whereas deduction has been widely studied over the years and is
now fairly well understood, these two other forms of reasoning have, until now, eluded
a similar level of understanding. Their study has concentrated more on the role they
play in the evolution of knowledge and the development of scientific theories.
In an attempt to increase our understanding of these two forms of non-deductive
reasoning, this book presents a collection of works addressing the issues of the re-
lation between abduction and induction, as well as their possible integration. These
issues are approached sometimes from a philosophical perspective, sometimes from a
(purely) logical perspective, but also from the more task-oriented perspective of Arti-
ficial Intelligence. To a certain extent, the emphasis lies with the last area of Artificial
Intelligence, where abduction and induction have been more intensively studied in
recent years.
This book grew out of a series of workshops on this topic. The first of these took
place at the Twelfth European Conference on Artificial Intelligence (Budapest, August
1996), and concentrated on the general philosophical issues pertaining to the unifica-
tion or distinction between abduction and induction. The second workshop took place
at the Fifteenth International Joint Conference on Artificial Intelligence (Nagoya, Au-
gust 1997), with an emphasis on the more practical issues of integration of abduc-
tion and induction. Taking place in parallel with the preparation of this book, a third
workshop was held at the Thirteenth European Conference on Artificial Intelligence
(Brighton, August 1998). Detailed reports on the first two workshops have been pub-
lished as (Flach and Kakas, 1997a; Flach and Kakas, 1998); these reports, as well as
further information about the workshops (including submitted papers), are available
on-line at http://www.cs.bris.ac.uk/~flach/abdind/.
After the first two workshops, we invited the participants to submit a longer paper
based on their workshop contribution(s), suitable for publication in an edited volume.
Following a careful reviewing process, thirteen of the submitted papers were selected
for publication. In addition, we invited four well-known authors to contribute a paper:
John Josephson, Luca Console, Lorenza Saitta, and David Poole.


Following a general introduction into the subject, the book is structured into four
main parts. The first two parts take a more theoretical perspective, while the remaining
two parts address the more practical issue of integrating abduction and induction. Part
1 contains three papers addressing philosophical aspects of abduction and induction.
In Part 2, four papers investigate the logical relation of the two forms of reasoning.
The four papers in Part 3 deal with integration of the two forms of reasoning from the
perspective of Artificial Intelligence, while the five papers that can be found in Part
4 address this problem within the more particular framework of Logic Programming.
The book starts off with an introductory chapter aimed at helping the reader in two
ways. It provides background material on the general subject of the book and exposes
the main issues involved. At the same time it positions the other contributions in the
book within the general terrain of debate.
The present book is one of the first books to address explicitly the problem of under-
standing the relation and interaction between abduction and induction in the various
fields of study where these two forms of reasoning appear. As such, it should be rele-
vant to a variety of students and researchers from these different areas of study, such
as philosophers, logicians, and people working in Artificial Intelligence and Computer
Science more generally.

Acknowledgments
We would like to thank all persons who helped in one way or another with the prepa-
ration of this book.
These are all those involved in the organisation of the three workshops on the sub-
ject at ECAI'96, IJCAI'97 and ECAI'98, where much of the groundwork for this book
has been done. In particular, we would like to thank the other members of the organis-
ing committees of these workshops: Henning Christiansen, Luca Console, Marc De-
necker, Luc De Raedt, Randy Goebel, Katsumi Inoue, John Josephson, Ray Mooney
and Chiaki Sakama. A special thanks goes to the three invited speakers at these work-
shops, John Josephson, David Poole and Murray Shanahan. And of course we are
grateful to all the participants, for the pleasant atmosphere and lively discussions dur-
ing the workshops. Finally, we would like to thank the two European networks of
excellence, Compulog-Net and ML-Net, for their financial support in organising these
workshops.
We thank everybody who submitted a paper to this book; those who helped review-
ing the submissions; the invited authors for their marvellous contributions; and Johan
van Benthem for his beautiful and thought-provoking foreword.
Part of this work falls under the workplan of the ESPRIT project ILP2: Inductive
Logic Programming. We wish to thank the other partners of the project for their help
and valuable discussions on the subject of the book. We also thank the Universities of
Cyprus, Tilburg and Bristol for providing the opportunities to prepare this book.
Special thanks go to Kim and Nada for their patient understanding and support with
all the rest of life's necessities, thus allowing us the selfish pleasure of concentrating
on research and other academic matters such as putting this book together.

Peter Flach and Antonis Kakas


Contributing Authors

Akinori Abe (abe@cslab.kecl.ntt.co.jp) is a Senior Research Scientist at
NTT Communication Science Laboratories. He obtained his Doctor of Engineering
degree from the University of Tokyo in 1991, with a thesis entitled A Fast Hypothetical
Reasoning System using Analogical Case. His main research interests are abduction
(hypothetical reasoning), analogical reasoning and language sense processing. He is a
member of the Planning Committee of New Generation Computing.

Atocha Aliseda (atocha@filosoficas.unam.mx) is an Associate Professor at
the Institute for Philosophical Research of the National Autonomous University of
Mexico. She obtained her PhD from Stanford University in 1997, with a thesis enti-
tled Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial
Intelligence, which was also published by the Institute for Logic, Language and Com-
putation (ILLC) of the University of Amsterdam, 1997. Her main research interests
are abductive logic, heuristic reasoning and the connection between philosophy of sci-
ence and artificial intelligence. Her homepage is at http://www.filosoficas.
unam.mx/~atocha/home.html.

Brigitte Bessant (bessant@cmi.univ-mrs.fr) is a Lecturer in computer sci-
ence at the University of Artois in France. She obtained her PhD in 1999, with a thesis
entitled Contributions to techniques of belief revision in artificial intelligence: se-
mantic and practical aspects. Her main research interests are nonmonotonic logics,
commonsense reasoning, belief revision, update and machine learning.

Henning Christiansen (henning@ruc.dk) is an Associate Professor at the Com-
puter Science Department of Roskilde University, Denmark. He obtained his PhD
from Roskilde University in 1988, with a thesis entitled Programming as language
development. His main research interests are logic programming with emphasis on
metaprogramming, constraints, and abduction, and query-answering systems. He is
co-editor of books on Flexible Query-Answering Systems (Kluwer, 1997; Springer,
1998). His homepage is at http://www.dat.ruc.dk/~henning/.

Luca Console (Luca.Console@di.unito.it) is an Associate Professor of Com-
puter Science at the Dipartimento di Informatica of the Universita' di Torino. His main
research interests concern reasoning mechanisms, with specific attention to model-based
reasoning and diagnosis, temporal reasoning, abductive reasoning, adaptive systems.
He is author of several papers, of introductory books and editor of collections on
model-based diagnosis.

Floriana Esposito (esposito@di.uniba.it) is a Professor of Computer Sci-
ence at DIB, University of Bari. She graduated in Electronic Physics at the University
of Bari in 1970 and has been Assistant Professor of Computer Science since 1974. Her re-
search interests are artificial intelligence, machine learning, programming languages
and symbolic computation.

Stefano Ferilli (ferilli@di.uniba.it) is currently a PhD student in Computer
Science at DIB, University of Bari. He graduated in Computer Science at University
of Bari in 1996. His research interests include logic programming, machine learning
and theory revision.

Peter A. Flach (Peter.Flach@bristol.ac.uk) is a Lecturer at the Computer
Science Department of the University of Bristol. He obtained his PhD from Tilburg
University in 1995, with a thesis entitled Conjectures: an inquiry concerning the logic
of induction. His main research interests are inductive logic programming, intelligent
reasoning, and philosophy of artificial intelligence. He is author of the textbook Sim-
ply Logical - intelligent reasoning by example (John Wiley, 1994). He is academic
coordinator of ILPnet2: the European Network of Excellence in Inductive Logic Pro-
gramming. His homepage is at http://www.cs.bris.ac.uk/~flach/.

Hiromasa Haneda (haneda@kobe-u.ac.jp) is a Professor at the Department
of Electrical and Electronics Engineering at Kobe University, Japan. He obtained his
PhD from the University of California, Berkeley in 1972 in the area of computer-aided
analysis of electronic circuits and systems. His main research interests are machine
learning as applied to computer-aided design and analysis of industrial systems.

Katsumi Inoue (inoue@eedept.kobe-u.ac.jp) is an Associate Professor at
the Department of Electrical and Electronics Engineering at Kobe University, Japan.
He obtained a Doctor of Engineering from Kyoto University in 1993 with a thesis
entitled Studies on abductive and nonmonotonic reasoning. His main research inter-
ests are automated reasoning, knowledge representation, machine learning, and logic
programming. His homepage is at http://cslab.eedept.kobe-u.ac.jp/~inoue/.

John R. Josephson (jj@cis.ohio-state.edu) is a Research Scientist and the
Associate Director of the Laboratory for AI Research (LAIR) in the Department of
Computer and Information Science at the Ohio State University. He received his Ph.D.
in Philosophy (of science) from Ohio State in 1982; he also holds B.S. and M.S. de-
grees in Mathematics from Ohio State. His primary research interests are artificial
intelligence, knowledge-based systems, abductive inference, causal reasoning, theory
formation, perception, diagnosis, the logic of investigation, and the foundations of
science. He has worked in several application domains including: medical diagno-
sis, diagnosis of engineered systems, logistics planning, speech recognition, genetics,
molecular biology, and design of electro-mechanical systems. He is the co-editor,
with Susan G. Josephson, of Abductive Inference (Cambridge University Press, 1994,
1996). His homepage is at http://www.cis.ohio-state.edu/~jj/.

Antonis C. Kakas (antonis@ucy.ac.cy) is an Associate Professor at the Com-
puter Science Department of the University of Cyprus. He obtained his PhD in The-
oretical Physics from Imperial College, London, in 1984. In 1989 he started working
in Computational Logic and Artificial Intelligence. His main research interests are ab-
duction, with specific interest in the integration of abductive, inductive and constraint
logic programming and its applications in the areas of planning and information in-
tegration, argumentation and the theory of actions and change. He is the editor in
chief of the magazine Computational Logic published by Compulog-Net: the Euro-
pean Network of Excellence in Computational Logic.

Nicolas Lachiche (lachiche@iutsud.u-strasbg.fr) is a Lecturer in Com-
puter Science at the University Robert Schuman (Strasbourg, France). He was pre-
viously a Research Associate at the Computer Science Department of the University
of Bristol. He obtained his PhD from the University of Nancy in 1997, with a thesis
focusing on classification and descriptive induction, and their relations. His research
mainly concerns data mining and machine learning, from both supervised and unsu-
pervised perspectives, in either an attribute-value representation or a first-order logic
language.

Evelina Lamma (elamma@deis.unibo.it) is an Associate Professor at DEIS,
University of Bologna. She graduated in Electrical Engineering in 1985 at the Uni-
versity of Bologna and obtained her PhD in Computer Science in 1990. Her main
research interests are artificial intelligence, and extensions of logic programming in
particular.

Paola Mello (pmello@deis.unibo.it) is a Professor at DEIS, University of
Bologna. She graduated in Electrical Engineering in 1982 at the University of Bologna
and obtained her PhD in Computer Science in 1989. Her main research interests are
artificial intelligence, and extensions of logic programming in particular.

Raymond J. Mooney (mooney@cs.utexas.edu) is an Associate Professor in the
Department of Computer Sciences at the University of Texas at Austin. He received
his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign with a thesis on
explanation-based learning. He is an editor for the journal Machine Learning where he
recently co-edited a special issue on natural language learning. His current research in-
terests include natural-language learning, knowledge-base refinement, inductive logic
programming, and learning for text categorization and recommender systems. His
His homepage is at http://www.cs.utexas.edu/users/mooney/.

David Poole (poole@cs.ubc.ca) is a Professor of Computer Science at the Uni-
versity of British Columbia. He received his Ph.D. from the Australian National Uni-
versity in 1984. He is known for his work on knowledge representation, default rea-
soning, assumption-based reasoning, diagnosis, reasoning under uncertainty, and au-
tomated decision making. He is a co-author of a recent AI textbook, Computational
Intelligence: A Logical Perspective (Oxford University Press, 1998), co-editor of the
Proceedings of the Tenth Conference in Uncertainty in Artificial Intelligence (Morgan
Kaufmann, 1994), serves on the editorial board of the Journal of AI research, and
is a principal investigator in the Institute for Robotics and Intelligent Systems. His
homepage is at http://www.cs.ubc.ca/spider/poole/.

Stathis Psillos (psillos@netplan.gr) is a Lecturer at the Department of Phi-
losophy and History of Science, University of Athens. Between 1995 and 1998 he was a
British Academy Postdoctoral Fellow at the London School of Economics. He com-
pleted his Ph.D. in 1994, at King's College London. His book Scientific Realism: How
Science Tracks Truth is due to appear from Routledge in November 1999.

Fabrizio Riguzzi (friguzzi@deis.unibo.it) is currently affiliated with DEIS,
University of Bologna. He obtained his PhD from the University of Bologna in 1999,
with a thesis entitled Extensions of Logic Programming as Representation Languages
for Machine Learning. His main research interests are logic programming, machine
learning and inductive logic programming in particular.

Chiaki Sakama (sakama@sys.wakayama-u.ac.jp) is an Associate Professor
at the Department of Computer and Communication Sciences of Wakayama Univer-
sity. He obtained his Doctor of Engineering degree from Kyoto University in 1995,
with a thesis entitled Studies on Disjunctive Logic Programming. His research in-
terests include abductive/inductive logic programming, nonmonotonic reasoning, and
belief revision. His homepage is at http://www.sys.wakayama-u.ac.jp/~sakama/.

Lorenza Saitta (saitta@di.unito.it) is a Professor of Computer Science at
the Universita' del Piemonte Orientale, Alessandria, Italy. Her main research interests
are in Machine Learning, specifically learning relations, multistrategy learning, and
complexity issues. Recently, she also became interested in Genetic Algorithms and
Cognitive Sciences. She was Chairperson of the International Conference on
Machine Learning in 1996.

Giovanni Semeraro (semeraro@di.uniba.it) is an Associate Professor at DIB,
University of Bari. He graduated in Computer Science in 1988. He joined the Univer-
sity of Bari in 1991. His research interests are centered on the logical and algebraic
foundations of inductive inference, document classification and understanding, multi-
strategy learning, theory revision and intelligent digital libraries.

Pei Wang (pwang@cogsci.indiana.edu) is the Director of Artificial Intelli-
gence at IntelliGenesis Corporation, and an Adjunct Researcher at the Center for
Research on Concepts and Cognition, Indiana University. He obtained his BS and
MS in Computer Science from Peking University, and his PhD in Computer Sci-
ence and Cognitive Science from Indiana University. His main research interests
are the foundation of intelligence, reasoning with uncertainty, learning and adap-
tation, and decision making under time pressure. His publications are available at
http://www.cogsci.indiana.edu/farg/peiwang/papers.html.

Akihiro Yamamoto (yamamoto@meme.hokudai.ac.jp) is an Associate Pro-
fessor of the Division of Electronics and Information Engineering at Hokkaido Univer-
sity. He is also a researcher of Precursory Research for Embryonic Science and Tech-
nology (PRESTO) at Japan Science and Technology Corporation (JST). He obtained
the Dr. Sci. degree from Kyushu University in 1990, with a thesis entitled Studies on
Unification in Logic Programming. His main research interests are logic programming
and its applications to abduction, inductive inference and computer networks.
1 ABDUCTIVE AND INDUCTIVE REASONING: BACKGROUND AND ISSUES

Peter A. Flach and Antonis C. Kakas

1.1 INTRODUCTION
This collection is devoted to the analysis and application of abductive and inductive
reasoning in a common context, studying their relation and possible ways for integra-
tion. There are several reasons for doing so. One reason is practical, and based on the
expectation that abduction and induction are sufficiently similar to allow for a tight
integration in practical systems, yet sufficiently complementary for this integration to
be useful and productive.
Our interest in combining abduction and induction is not purely practical, however.
Conceptually, the relation between abduction and induction is not well understood.
More precisely, there are several, mutually incompatible ways to perceive this relation.
For instance, Josephson writes that 'it is possible to treat every good (...) inductive
generalisation as an instance of abduction' (Josephson, 1994, p.19), while Michalski
has it that 'inductive inference was defined as a process of generating descriptions that
imply original facts in the context of background knowledge. Such a general definition
includes inductive generalisation and abduction as special cases' (Michalski, 1987,
p.188).
One can argue that such incompatible viewpoints indicate that abduction and induc-
tion themselves are not well-defined. Once their definitions have been fixed, studying
their relation becomes a technical rather than a conceptual matter. However, it is not
self-evident why there should exist absolute, Platonic ideals of abduction and induc-
tion, waiting to be discovered and captured once and for all by an appropriate defini-

tion. As with most theoretical notions, it is more a matter of pragmatics, of how useful
a particular definition is going to be in a particular context.
A more relativistic viewpoint is often more productive in these matters, looking at
situations where it might be more appropriate to distinguish between abduction and
induction, and also at cases where it seems more useful to unify them. Sometimes
we want to stress that abduction and induction spring from a common root (say hypo-
thetical or non-deductive reasoning), and sometimes we want to take a finer grained
perspective by looking at what distinguishes them (e.g. the way in which the hypothe-
sis extends our knowledge). The following questions will therefore be our guidelines:

• When and how will it be useful to unify, or distinguish, abduction and induction?

• How can abduction and induction be usefully integrated?

Here and elsewhere, by unification we mean considering them as part of a common
framework, while by integration we mean employing them together, in some mutually
enhancing way, for a practical purpose.
The current state of affairs with regard to these issues is perhaps most adequately
described as an ongoing debate, and the reader should look upon the following chap-
ters as representing a range of possible positions in this debate. One of our aims in this
introductory chapter is to chart the terrain where the debate is taking place, and to po-
sition the contributions to this volume within the terrain. We will retrace some of the
main issues in this debate to their historical background. We will also attempt a syn-
thesis of some of these issues, primarily motivated by work in artificial intelligence,
sometimes taking positions that may not be shared by every author in this volume.
The outline of this chapter is as follows. In Section 1.2 we discuss the philosophi-
cal and logical origins of abduction and induction. In Section 1.3 we analyse previous
work on abduction and induction in the context of logic programming and artificial
intelligence, and attempt a (partial) synthesis of this work. Section 1.4 considers the
integration of abduction and induction in artificial intelligence, and Section 1.5 con-
cludes.
Before we embark on this, let us express our sincere thanks to all authors con-
tributing to this volume, without whom we couldn't have written this introduction -
indeed, some of the viewpoints we're advocating have been strongly influenced by the
other contributions. Wherever possible we have tried to indicate the original source of
a viewpoint we discuss, but we apologise in advance for any omissions in this respect.

1.2 ABDUCTION AND INDUCTION IN PHILOSOPHY AND LOGIC


In this section we discuss various possible viewpoints on abduction and induction
that can be found in the philosophical and logical literature. The philosophical issue is
mainly one of categorisation (which forms of reasoning exist?), while the logical issue
is one of formalisation.
As far as categorisation is concerned, it seems uncontroversial that deduction should
be singled out as a separate reasoning form which is fundamentally different from any
other form of reasoning by virtue of its truth-preserving nature. The question, then,
is how non-deductive reasoning should be mapped out. One school of thought holds
that no further sub-categorisation is needed: all non-deductive logic is of the same
category, which is called induction. Another school of thought argues for a further
division of non-deductive reasoning into abduction and induction. We will discuss
these two viewpoints in the next two sections. A general analysis of the relationship
between abduction and induction from several different perspectives is also carried out
by Bessant in her contribution to this volume.

1.2.1 Induction as non-deductive reasoning


Let us start by taking a look at a textbook definition of induction.
Arguments can be classified in terms of whether their premisses provide (1) con-
clusive support, (2) partial support, or (3) only the appearance of support (that is,
no real support at all.) When we say that the premisses provide conclusive support
for the conclusion, we mean that if the premisses of the argument were all true,
it would be impossible for the conclusion of the argument to be false. Arguments
that have this characteristic are called deductive arguments. When we say that
the premisses of an argument provide partial support for the conclusion, we mean
that if the premisses were true, they would give us good reasons - but not con-
clusive reasons - to accept the conclusion. That is to say, although the premisses,
if true, provide some evidence to support the conclusion, the conclusion may still
be false. Arguments of this type are called inductive arguments. (Salmon, 1984a,
p.32)

This establishes a dichotomy of the set of non-fallacious arguments into either deduc-
tive or inductive arguments, the distinction being based on the way they are supported
or justified: while deductive support is an absolute notion, inductive support must be
expressed in relative (e.g. quantitative) terms.
Salmon further classifies inductive arguments into arguments based on samples,
arguments from analogy, and statistical syllogisms. Arguments based on samples or
inductive generalisations have the following general form:
X percent of observed Fs are Gs;
therefore, (approximately) X percent of all Fs are Gs.

Arguments from analogy look as follows:


Objects of type X have properties F, G, H, ...;
objects of type Y have properties F, G, H, ... , and also property Z;
therefore, objects of type X have property Z as well.

Finally, statistical syllogisms have the following abstract form:


X percent of all Fs are Gs;
a is an F;
therefore, a is a G.

Here X is understood to be a high percentage (i.e. if X is close to zero, the conclusion
must be changed to 'a is not a G').
There are several important things to note. One is that some premisses and conclu-
sions are statistical, talking about relative frequencies ('X percent of'), while others
are categorical. In general, we can obtain a categorical special case from arguments
involving a relative frequency X by putting X = 100%. Obviously, the categorical
variant of statistical syllogism is purely deductive. More importantly, categorical in-
ductive generalisation has the following form:
All observed Fs are Gs;
therefore, all Fs are Gs.
As argued in Section 1.2.3, most inductive arguments in artificial intelligence are cat-
egorical, as this facilitates further reasoning with the inductive conclusion.
Regardless of whether inductive arguments are statistical or categorical, we must
have a way to assess their strength or inductive support, and this is the second way
in which statistics comes into play. Given evidence E collected in the premisses of
an inductive argument, we want to know the degree of belief we should attach to
the hypothetical conclusion H. It is widely believed that degrees of belief should be
quantified as (subjective) probabilities - in particular, the degree of belief in H given
E is usually identified with the conditional probability P(H|E). The probabilistic
formalisation of inductive support is known as confirmation theory.
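To see how conditional probability plays this role, consider the following small Python sketch. It is our illustration rather than anything from the confirmation-theory literature, and the hypothesis, prior and likelihoods are invented numbers: H is the toy generalisation 'all Fs are Gs', and E is the evidence that ten observed Fs were all Gs.

```python
# Illustrative sketch: degree of confirmation as conditional probability
# P(H|E), computed by Bayes' rule. All numbers are made up for the example.

def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) = P(E|H)P(H) / (P(E|H)P(H) + P(E|~H)P(~H))."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e

# H = 'all Fs are Gs'; E = 'ten observed Fs were all Gs'.
# If H is true, E is certain; if H is false, assume each F is a G
# with probability 0.5 (an arbitrary modelling choice).
p = posterior(prior_h=0.1, p_e_given_h=1.0, p_e_given_not_h=0.5 ** 10)
print(f"P(H|E) = {p:.4f}")  # ~0.9913: the evidence strongly confirms H
```

Note that the sketch only evaluates a hypothesis that is already given; nothing in it suggests how H was generated in the first place, which is precisely the limitation of confirmation theory discussed below.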
It is tempting to consider the degree of confirmation of hypothesis H by evidence
E as the degree of validity of the inductive argument 'E, therefore H', and treat this
'inductive validity' as analogous to deductive validity. Following this line of thought,
several authors speak of confirmation theory as establishing an 'inductive logic':
'What we call inductive logic is often called the theory of nondemonstrative or
nondeductive inference. Since we use the term 'inductive' in the wide sense of
'nondeductive', we might call it the theory of inductive inference... However,
it should be noticed that the term 'inference' must here, in inductive logic, not
be understood in the same sense as in deductive logic. Deductive and inductive
logic are analogous in one respect: both investigate logical relations between sen-
tences; the first studies the relation of [entailment], the second that of degree of
confirmation which may be regarded as a numerical measure for a partial [entail-
ment] ... The term 'inference' in its customary use implies a transition from given
sentences to new sentences or an acquisition of a new sentence on the basis of
sentences already possessed. However, only deductive inference is inference in
this sense.' (Carnap, 1950, §44B, pp.205-6)
In other words, confirmation theory by itself does not establish a consequence relation
(a subset of L x L, where L is the logical language), since any evidence will confirm
any hypothesis to a certain degree. Inductive logic based on confirmation theory does
not have a proof theory in the traditional sense, and therefore does not guide us in
generating possible inductive hypotheses from evidence, but rather evaluates a given
hypothesis against given evidence. The inductive logic arising from confirmation the-
ory is a logic of hypothesis evaluation rather than hypothesis generation. This dis-
tinction between hypothesis generation and hypothesis evaluation is an important one
in the present context, and we will have more to say about the issue in Sections 1.2.3
and 1.3.
To summarise, one way to categorise arguments is by dividing them into non-
defeasible (i.e. deductive) and defeasible but supported (i.e. inductive) arguments. A
further sub-categorisation can be obtained by looking at the syntactic form of the ar-
gument. Confirmation theory quantifies inductive support in probabilistic terms, and
deals primarily with hypothesis evaluation.

1.2.2 Deduction, induction and abduction


After having discussed the view that identifies induction with all non-deductive rea-
soning, we next turn to the trichotomy of deductive, inductive and abductive reasoning
proposed by the American philosopher Charles Sanders Peirce (1839-1914).
Peirce was a very prolific thinker and writer, but only a fraction of his work was
published during his life. His collected works (Peirce, 1958)¹ therefore reflect, first
and foremost, the evolution of his thinking, and should be approached with some care.
With respect to abduction and induction Peirce went through a substantial change of
mind during the decade 1890-1900 (Fann, 1970). It is perhaps fair to say that many
of the current controversies surrounding abduction seem to be attributable to Peirce's
mind change. Below we will briefly discuss both his early, syllogistic theory, which
can be seen as a precursor to the current use of abduction in logic programming and
artificial intelligence, and his later, inferential theory, in which abduction represents
the hypothesis generation part of explanatory reasoning.

Peirce's syllogistic theory. In Peirce's days logic was not nearly as well-developed
as it is today, and his first attempt to classify arguments (which he considers 'the
chief business of the logician' (2.619)) follows Aristotle in employing syllogisms.
The following syllogism is known as Barbara:
All the beans from this bag are white;
these beans are from this bag;
therefore, these beans are white.

The idea is that this valid argument represents a particular instantiation of a reason-
ing scheme, and that any alternative instantiation represents another argument that is
likewise valid. Syllogisms should thus be interpreted as argument schemas.
Two other syllogisms are obtained from Barbara if we exchange the conclusion (or
Result, as Peirce calls it) with either the major premiss (the Rule) or the minor premiss
(the Case):
Case. - These beans are from this bag.
Result. - These beans are white.
Rule. - All the beans from this bag are white.

Rule. - All the beans from this bag are white.
Result. - These beans are white.
Case. - These beans are from this bag.
The first of these two syllogisms (inference of the rule from the case and the result)
can be recognised as what we called previously a categorical inductive generalisation,
generalising from a sample of beans to the population of beans in the bag. The sort
of inference exemplified by the second syllogism (inference of the case from the rule
and the result) Peirce calls making a hypothesis or, briefly, hypothesis - the term
'abduction' is introduced only in his later theory.²

¹References to Peirce's collected papers take the form X.Y, where X denotes the volume number and Y the
paragraph within the volume.
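The three syllogistic directions can be played out in a few lines of code. The following Python sketch is our own illustration (the pair encoding of the Rule is just a convenient representation, nothing of Peirce's): each function takes two of the three statements as premisses and yields the third.

```python
# Peirce's bean example: the same three statements, recombined in three ways.
rule = ("from_this_bag", "white")   # Rule: all the beans from this bag are white
case = "from_this_bag"              # Case: these beans are from this bag
result = "white"                    # Result: these beans are white

def deduce(rule, case):
    """Barbara: Rule + Case yield the Result (truth-preserving)."""
    antecedent, consequent = rule
    return consequent if case == antecedent else None

def induce(case, result):
    """Induction: Case + Result suggest the Rule (a defeasible generalisation)."""
    return (case, result)

def hypothesise(rule, result):
    """Peirce's 'hypothesis' (abduction): Rule + Result suggest the Case."""
    antecedent, consequent = rule
    return antecedent if result == consequent else None  # a defeasible guess

assert deduce(rule, case) == result
assert induce(case, result) == rule
assert hypothesise(rule, result) == case
```

Only the first function is truth-preserving; the other two produce conjectures that further evidence may overturn.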
Peirce thus arrives at the following classification of inference (2.623):

    Inference
        Deductive or Analytic
        Synthetic
            Induction
            Hypothesis
Comparing this classification with the one obtained in Section 1.2.1, we can point
out the following similarities. What was called induction previously corresponds
to what Peirce calls synthetic inference (another term he uses is ampliative reason-
ing, since it amplifies, or goes beyond, the information contained in the premisses).
Furthermore, what Peirce calls induction corresponds to what we called inductive gen-
eralisation in Section 1.2.1.³
On the other hand, the motivations for these classifications are quite different in
each case. In Section 1.2.1 we were concentrating on the different kinds of support or
confirmation that arguments provide, and we noticed that this is essentially the same
for all non-deductive reasoning. When we concentrate instead on the syllogistic form
of arguments, we find this to correspond more naturally to a trichotomy, separating
non-deductive reasoning into two subcategories. As Horn clause logic is in some sense
a modern upgrade of syllogistic logic, it is perhaps not surprising that the distinction
between abduction and induction in logic programming follows Peirce's syllogistic
classification to a large extent. This will be further taken up in Section 1.3.

Peirce's inferential theory. In his later theory of reasoning Peirce abandoned the
idea of a syllogistic classification of reasoning:
'( ... ) I was too much taken up in considering syllogistic forms and the doctrine
of logical extension and comprehension, both of which I made more fundamental
than they really are. As long as I held that opinion, my conceptions of Abduc-
tion necessarily confused two different kinds of reasoning.' (Peirce, 1958, 2.102,
written in 1902)
Instead, he identified the three reasoning forms - abduction, deduction and induction
- with the three stages of scientific inquiry: hypothesis generation, prediction, and
evaluation (Figure 1.1). The underlying model of scientific inquiry runs as follows.
When confronted with a number of observations she seeks to explain, the scientist
comes up with an initial hypothesis; then she investigates what other consequences
this theory, were it true, would have; and finally she evaluates the extent to which
these predicted consequences agree with reality. Peirce calls the first stage, coming up
with a hypothesis to explain the initial observations, abduction; predictions are derived
from a suggested hypothesis by deduction; and the credibility of that hypothesis is
estimated through its predictions by induction. We will now take a closer look at these
stages.

²Peirce also uses the term 'retroduction', a translation of the Greek word ἀπαγωγή used by Aristotle (trans-
lated by others as 'reduction').
³It should be noted that, although the above syllogistic arguments are all categorical, Peirce also considered
statistical versions.
[Figure 1.1 The three stages of scientific inquiry: abduction leads from observed reality to a hypothesis, deduction derives predictions from the hypothesis, and induction confronts those predictions with reality.]

Abduction is defined by Peirce as the process of forming an explanatory hypothe-


sis from an observation requiring explanation. This process is not algorithmic: 'the
abductive suggestion comes to us like a flash. It is an act of insight, although of ex-
tremely fallible insight' (Peirce, 1958, 5.181). Elsewhere Peirce describes abduction
as 'a capacity for 'guessing' right', a 'mysterious guessing power' underlying all sci-
entific research (Peirce, 1958, 6.530). Its non-algorithmic character notwithstanding,
abduction
'is logical inference (...) having a perfectly definite logical form. (...) Namely, the
hypothesis cannot be admitted, even as a hypothesis, unless it be supposed that it
would account for the facts or some of them. The form of inference, therefore, is
this:

The surprising fact, C, is observed;


But if A were true, C would be a matter of course,
Hence, there is reason to suspect that A is true.' (Peirce, 1958, 5.188-9)

Let us investigate the logical form of abduction given by Peirce a little closer. About
C we know two things: that it is true in the actual world, and that it is surprising. The
latter thing can be modelled in many ways, one of the simplest being the requirement
that C does not follow from our other knowledge about the world. In this volume,
Aliseda models it by an epistemic state of doubt which calls for abductive reasoning
to transform it into a state of belief.
Then, 'if A were true, C would be a matter of course' is usually interpreted as 'A
logically entails C'.⁴ Peirce calls A an explanation of C, or an 'explanatory hypothe-
sis'. Whether or not this is an appropriate notion of explanation remains an issue of
debate. In this volume, Console and Saitta also propose to identify explanation with
entailment, but Josephson argues against it.

⁴Note that interpreting the second premiss as a material implication, as is sometimes done in the literature,
renders it superfluous, since the truth of A → C follows from the truth of the observation C.
Besides being explanatory, Peirce mentions two more conditions to be fulfilled
by abductive hypotheses: they should be capable of experimental verification, and
they should be 'economic'. A hypothesis should be experimentally verifiable, since
otherwise it cannot be evaluated inductively. Economic factors include the cost of
verifying the hypothesis, its intrinsic value, and its effect upon other projects (Peirce,
1958, 7.220). In other words, economic factors are taken into account when choosing
the best explanation among the logically possible ones. For this reason, abduction is
often termed 'inference to the best explanation' (Lipton, 1991).
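The generate-and-select character of this 'inference to the best explanation' can be sketched in a few lines of Python. This is our own toy illustration, not a system from the literature: the background rules, the abducible hypotheses and their 'economic' costs are all made up.

```python
from itertools import combinations

# Background theory as (condition, conclusion) rules, abducibles with a
# made-up verification cost, and a surprising observation C to be explained.
background = {("rained", "grass_wet"), ("sprinkler_on", "grass_wet")}
abducibles = {"rained": 2.0, "sprinkler_on": 1.0}
observation = "grass_wet"

def entails(facts, rules, goal):
    """Naive forward chaining: do the facts plus the rules derive the goal?"""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return goal in derived

# Generate every set of abducibles A such that background ∪ A entails C ...
candidates = [set(s)
              for r in range(1, len(abducibles) + 1)
              for s in combinations(abducibles, r)
              if entails(s, background, observation)]
# ... and select the most 'economic' explanation among them.
best = min(candidates, key=lambda a: sum(abducibles[h] for h in a))
print(best)  # {'sprinkler_on'}: the cheapest hypothesis entailing C
```

Changing the cost function changes which explanation counts as 'best', mirroring Peirce's point that the choice among logically admissible hypotheses is governed by economic rather than purely logical considerations.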
Induction is identified by Peirce as the process of testing a hypothesis against reality
through selected predictions. 'Induction consists in starting from a theory, deducing
from it predictions of phenomena, and observing those phenomena in order to see how
nearly they agree with the theory' (Peirce, 1958, 5.170). Such predictions can be seen
as experiments:
'When I say that by inductive reasoning I mean a course of experimental inves-
tigation, I do not understand experiment in the narrow sense of an operation by
which one varies the conditions of a phenomenon almost as one pleases. (...) An
experiment (...) is a question put to nature. (...) The question is, Will this be the
result? If Nature replies 'No!' the experimenter has gained an important piece
of knowledge. If Nature says 'Yes,' the experimenter's ideas remain just as they
were, only somewhat more deeply engrained.' (Peirce, 1958, 5.168)
This view of hypothesis testing is essentially what is called the 'hypothetico-deductive
method' in philosophy of science (Hempel, 1966). The idea that a verified prediction
provides further support for the hypothesis is very similar to the notion of confirma-
tion as discussed in Section 1.2.1, and also refutation of hypotheses through falsified
predictions can be brought in line with confirmation theory, with a limiting degree of
support of zero. 5 The main difference from confirmation theory is that in the Peircean
view of induction the hypothesis is, through the predictions, tested against selected
pieces of evidence only. This leads to a restricted form of hypothesis evaluation, for
which we will use the term hypothesis testing.
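This restricted form of evaluation can be written down directly as a test loop. Again this is only a sketch under invented details, with 'nature' played by a lookup into data the hypothesis has not seen:

```python
# Peircean induction as hypothesis testing: deduce a prediction for each
# selected experiment and put the question to nature.

def survives_tests(hypothesis, experiments, nature):
    for x in experiments:
        prediction = hypothesis(x)   # deduce: what should we observe?
        if nature(x) != prediction:  # 'Will this be the result?'
            return False             # Nature replies 'No!': the hypothesis falls
    return True                      # only 'somewhat more deeply engrained'

# Toy run: 'all beans from this bag are white', tested on four draws.
draws = ["white", "white", "white", "brown"]
print(survives_tests(lambda i: "white", range(len(draws)), lambda i: draws[i]))
# False: the fourth draw refutes the generalisation
```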
Peirce's inferential theory makes two main points. It posits a separation between
hypothesis generation and hypothesis evaluation; and it focuses attention on hypothe-
ses that can explain and predict. Combining the two points, abduction is the process of
generating explanatory hypotheses (be they general 'rules' or specific 'cases', as in the
syllogistic account), and induction corresponds to the hypothetico-deductive method
of hypothesis testing. However, the two points are relatively independent: e.g., we can
perceive the generation of non-explanatory hypotheses. We will come back to this point
in the discussion below.

⁵From a Bayesian perspective P(H|E) is proportional to P(E|H)P(H), where P(H) is the prior probability
of the hypothesis; if E is contrary to a prediction, P(E|H) = 0. See Poole's chapter for further discussion of
the Bayesian perspective.

1.2.3 Discussion
In the previous two sections we have considered three philosophical and logical per-
spectives on how non-deductive reasoning may be categorised: the inductivist view,
which holds that no further categorisation is needed since all non-deductive reasoning
must be justified in the same way by means of confirmation theory; the syllogistic
view, which distinguishes between inductive generalisation on the one hand and hy-
pothesis or abduction as inference of specific 'cases' on the other; and the inferential
view, which holds that abduction and induction represent the hypothesis generation
and evaluation phases in explanatory reasoning. As we think that none of these view-
points provides a complete picture, there is opportunity to come to a partial synthesis.

Hypothesis generation and hypothesis evaluation. The most salient point of Peirce's
later, inferential theory is the distinction between hypothesis generation and hypothe-
sis evaluation. In most other accounts of non-deductive reasoning the actual hypothe-
sis is already present in the argument under consideration, as can be seen clearly from
the argument forms discussed in Section 1.2.1. For instance, when constructing an
inductive generalisation
X percent of observed Fs are Gs;
therefore, (approximately) X percent of all Fs are Gs.
our job is first to conjecture possible instantiations of F and G (hypothesis generation),
and then to see whether the resulting argument has sufficient support (hypothesis eval-
uation).
One may argue that a too rigid distinction between generation and evaluation of
hypotheses is counter-productive, since it would lead to the generation of many, ulti-
mately useless hypotheses. Indeed, Peirce's 'economic factors', to be considered when
constructing possible abductive hypotheses, already blur the distinction to a certain ex-
tent. However, even if a too categorical distinction may have practical disadvantages,
on the conceptual level the dangers of confusing the two processes are much larger.
Furthermore, the distinction will arguably be drawn more sharply in artificial reasoning systems than it is in humans, just as chess-playing computers still have no real alternative but to consider all possible moves in their search for useful ones.
In any case, whether tightly integrated or clearly separated, hypothesis generation
and hypothesis evaluation have quite distinct characteristics. Here we would argue that
it is hypothesis generation, being concerned with possibilities rather than choices, that
is most inherently 'logical' in the traditional sense. Deductive logic does not help the
mathematician in selecting theorems, only in distinguishing potential theorems from
fallacious ones. Also, as (Hanson, 1958) notes, if hypothesis evaluation establishes
a logic at all, then this would be a 'Logic of the Finished Research Report' rather
than a 'Logic of Discovery'. An axiomatic formalisation of the logic of hypothesis
generation is suggested by Flach in his chapter in this volume.
We also stress the distinction between generation and evaluation because it provides
a useful heuristic for understanding the various positions of participants in the debate
on abduction and induction. This rule of thumb states that those concentrating on
generating hypotheses tend to distinguish between non-deductive forms of reasoning;
those concentrating on evaluating hypotheses tend not to distinguish between them.
Not only does the rule apply to the approaches discussed in the previous two sections;
we believe that it can guide the reader, by and large, through the chapters in this
collection.

Inductive generalisation. Turning next to the question 'What is induction?', we ex-
pect that any form of consensus will centre around the argument form we called induc-
tive generalisation (see above). In the inductivist approach such sample-to-population
arguments were separated out on syntactic grounds. They also figured in Peirce's
syllogistic theory as one of the two possible reversals of Barbara.
As we remarked above, hypothesis generation here amounts to instantiating F and
G. In general the number of possibilities is large, but it can be reduced by constraining
the proportion X. Many artificial intelligence approaches to induction actually choose
F and G such that X is (close to) 100%, thereby effectively switching to categorical
inductive generalisations:
All observed Fs are Gs;
therefore, all Fs are Gs.

For instance, instead of observing that 53% of observed humans are female, such
approaches will continue to refine F until all observed Fs are female (for instance, F
could be 'humans wearing a dress').
The point here is not so much that in artificial intelligence we are only interested
in infallible truths. Often, we have to deal with uncertainties in the form of noisy
data, exceptions to rules, etc. Instead of representing these uncertainties explicitly in
the form of relative frequencies, one deals with them semantically, e.g. by attaching a
degree of confirmation to the inductive conclusion, or by interpreting rules as defaults.
The above formulation of categorical inductive generalisation is still somewhat lim-
iting. The essential step in any inductive generalisation is the extension of the universal
quantifier's scope from the sample to the population. Although the universally quan-
tified sentence is frequently a material implication, this need not be the case. A more general
form for categorical inductive generalisation would therefore be:
All objects in the sample satisfy P(x);
therefore, all objects in the population satisfy P(x).

where P(x) denotes a formula with free variable x. Possible instantiations of P(x) can
be found by pretending that there exist no other objects than those in the sample, and
looking for true universal sentences. For instance, we might note that every object in
the sample is either female or male. This approach is further discussed in the chapter
by Lachiche.
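As a small illustration of this idea (our own sketch in Prolog, not taken from Lachiche's chapter), the following fragment treats the sample as if it exhausted the domain and checks a candidate P(x):

sample(maria). sample(john).
female(maria).
male(john).

% Candidate P(x): every object is female or male. Pretending the
% sample is the whole domain, we check whether P(x) holds universally:
% ?- forall(sample(X), (female(X) ; male(X))).
% true.

A candidate that fails this closed-world test, such as 'every object is female', is discarded before it is ever proposed as a generalisation.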

Confirmatory and explanatory induction. This more comprehensive formulation
of categorical inductive generalisation also indicates a shortcoming of Peirce's infer-
ential theory: not all hypotheses are explanatory. For instance, take the inductive gen-
eralisation 'every object in the population is female or male'. This generalisation does
not, by itself, explain that Maria is female, since it requires the additional knowledge
that Maria is not male. Likewise, an explanation of John being male is only obtained by adding that John is not female. This phenomenon is not restricted to disjunctive generalisations: the rule 'every parent of John is a parent of John's brother' does not explain parenthood.
In line with recent developments in inductive logic programming, we would like to
suggest that inductive generalisations like these are not explanatory at all. They sim-
ply are generalisations that are confirmed by the sample. The process of finding such
generalisations has been called confirmatory induction (also descriptive induction).
The difference between the two forms of induction can be understood as follows. A
typical form of explanatory induction is concept learning, where we want to learn a
definition of a given concept C in terms of other concepts. This means that our induc-
tive hypotheses are required to explain (logically entail) why particular individuals are
Cs, in terms of the properties they have.
However, in the more general case of confirmatory induction we are not given a
fixed concept to be learned. The aim is to learn relationships between any of the
concepts, with no particular concept singled out. The formalisation of confirmatory
hypothesis formation thus cannot be based on logical entailment, as in Peirce's ab-
duction. Rather, it is a qualitative form of degree of confirmation, which explains its
name. We will have more to say about the issue in Section 1.3.2.

Abduction. Turning next to abduction, it may seem at first that Peirce's syllogistic
and inferential definitions are not easily reconcilable. However, it is possible to per-
ceive a similarity between the two when we notice that the early syllogistic view of
abduction or hypothesis (p. 5) provides a special form of explanation. The Result (tak-
ing the role of the observation) is explained by the Case in the light of the Rule as a
given theory. The syllogistic form of abduction can thus be seen to meet the explana-
tory requirement of the later inferential view of abduction. Hence we can consider
explanation as a characterising feature of abduction. This will be further discussed in
Section 1.3.2.
Even if the syllogistic and inferential view of abduction can thus be reconciled, it is
still possible to distinguish between approaches which are primarily motivated by one
of the two views. The syllogistic account of abduction has been taken up, by and large,
in logic programming and other work in artificial intelligence addressing tasks such
as diagnosis and planning. In this volume, the logic programming perspective
on abduction can be found in the contributions by Christiansen, Console and Saitta,
Inoue and Haneda, Mooney, Poole, Lamma et al., Sakama, and Yamamoto. The logic
programming and artificial intelligence perspective will be more closely examined in
the next section. On the other hand, the chapters by Aliseda, Josephson, and Psillos
are more closely related to the inferential perspective on abduction.

1.3 ABDUCTION AND INDUCTION IN LOGIC PROGRAMMING AND


ARTIFICIAL INTELLIGENCE
In this section, we will examine how abduction and induction appear in the field of arti-
ficial intelligence (AI) and its specific subfield of logic programming. In Section 1.3.1
we will argue that in these fields abduction and induction are generally perceived as
distinct reasoning forms, mainly because they are used to solve different tasks. Con-
sequently, most of what follows should be interpreted from the viewpoint of Peirce's earlier, syllogistic theory. In Section 1.3.2 we argue that abductive hypotheses primar-
ily provide explanations, while inductive hypotheses provide generalisations. We then
further investigate abduction and induction from a logical perspective in Section 1.3.3,
pointing out differences in the way in which they extend incomplete theories. In Sec-
tion 1.3.4 we investigate how more complex reasoning patterns can be viewed as being
built up from simple abductive and inductive inferences. Finally, in Section 1.3.5 we
address the computational characteristics of abduction and induction.

1.3.1 A task-oriented view


In AI the two different terms of abduction and induction exist separately and are used
by different communities of researchers. This gives the impression that two distinct
and irreducible forms of non-deductive reasoning exist. We believe this separation to
be caused by the fact that in AI, irrespective of the level at which we are examining
the problem, we are eventually interested in tackling particular tasks such as planning,
diagnosis, learning, and language understanding. For instance, a prototypical AI ap-
plication of abductive reasoning is the problem of diagnosis. Here abduction is used
to produce a reason, according to some known theory of a system, for the observed
(often faulty) behaviour of the system. A typical inductive task, on the other hand,
is the problem of concept learning from examples. From a collection of observations
which are judged according to some background information to be similar or related
we draw hypotheses that generalise this observed behaviour to other as yet unseen
cases.
What distinguishes this AI view from the philosophical and logical analyses dis-
cussed in the previous section is the more practical perspective required to tackle these
tasks. Hence in AI it is necessary to study not only the issue of hypothesis evalua-
tion but also the problem of hypothesis generation, taking into account the specific
characteristics of each different task. These tasks require different effects from the
non-deductive reasoning used to address them, resulting in different kinds of hypothe-
ses, generated by different computational methods. As we will argue in Section 1.3.2,
abductive hypotheses are primarily intended to provide explanations and inductive hy-
potheses aim at providing generalisations of the observations.
The point we want to stress here is that in AI hypothesis generation is a real is-
sue, while in philosophy and logic it often seems to be side-stepped since the analysis
usually assumes a given hypothesis. Since abduction and induction produce different
kinds of hypotheses, with different relations to the observations and the background
theory, it seems natural that this increased emphasis on hypothesis generation rein-
forces the distinguishing characteristics of the two reasoning forms. However, despite
this emphasis on hypothesis generation in AI it is not possible to avoid the problem of
hypothesis evaluation and selection amongst several possible alternatives. Returning
to this problem we see that work in AI where the emphasis lies on hypothesis selection
tends to conclude that the two forms of reasoning are not that different after all - they
use the same kind of mechanism to arrive at the conclusion. This is seen in Poole's
work which uses Bayesian probability for the selection of hypotheses and Josephson's
work where several, more qualitative criteria are used.

Peirce revisited. AI's emphasis on solving practical tasks notwithstanding, most
research is still aimed at providing general solutions in the form of abductive and in-
ductive engines that can be applied to specific problems by providing the right domain
knowledge and setting the right parameters. In order to understand what these systems
are doing, it is still necessary to use abstract (logical) specifications. Let us examine
this more closely, using the case of logic programming and its two extensions of ab-
ductive and inductive logic programming.
Logic programming assumes a normal form of logical formulae, and therefore has
a strong syllogistic flavour. Consequently, the logic programming perception of ab-
duction and induction essentially follows Peirce's earlier, syllogistic characterisation.
Here are Peirce's two reversals of the syllogism Barbara, recast in logic programming
terms:
Case - from_this_bag(b).
Result - white(b).
Rule - white(X) :- from_this_bag(X).

Rule - white(X) :- from_this_bag(X).
Result - white(b).
Case - from_this_bag(b).

The first pattern, inference of a general rule from a case (description) and a result
(observation) of a particular individual, exemplifies the kind of reasoning performed
by inductive logic programming (ILP) systems. The second pattern, inferring a more
complete description of an individual from an observation and a general theory valid
for all such individuals, is the kind of reasoning studied in abductive logic program-
ming (ALP).
The above account describes ILP and ALP by example, and does not provide a gen-
eral definition. Interestingly, attempts to provide such a general definition of abduction
and induction in logic programming typically correspond to Peirce's later, inferential
characterisation of explanatory hypotheses generation. Thus, in ALP abductive infer-
ence is typically specified as follows:
'Given a set of sentences T (a theory presentation), and a sentence G (obser-
vation), to a first approximation, the abductive task can be characterised as the
problem of finding a set of sentences ll. (abductive explanation for G) such that:
(I) TUI:J. F G.
(2) T U ll. is consistent. • (Kakas et al., 1992, p. 720)
The following is a specification of induction in ILP:
'Given a consistent set of examples or observations O and consistent background knowledge B find an hypothesis H such that: B ∪ H ⊨ O.' (Muggleton and De Raedt, 1994)
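To make the closeness of these two specifications concrete, consider Peirce's bean example recast in logic programming terms (the encoding is ours):

% Abductive reading: given the theory T and the observation G,
%   T:  white(X) :- from_this_bag(X).
%   G:  white(b).
% an abductive explanation is Delta = { from_this_bag(b) }:
% T together with Delta entails G, and T ∪ Delta is consistent.

% Inductive reading: given background B and observations O,
%   B:  from_this_bag(b1).  from_this_bag(b2).
%   O:  white(b1).  white(b2).
% an inductive hypothesis is H = { white(X) :- from_this_bag(X) }:
% B together with H entails O.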
In spite of small terminological differences the two specifications are virtually iden-
tical: they both invert a deductive consequence relation in order to complete an incom-
plete given theory, prompted by some new observations that cannot be deductively accounted for by the theory alone.6 If our assessment of the distinction between ab-
duction and induction that is usually drawn in AI is correct, we must conclude that
the above specifications are unable to account for this distinction. In the remainder of
Section 1.3 we will try to understand the differences between abduction and induction
as used in AI in modern, non-syllogistic terms. For an account which stays closer to
syllogisms, the reader is referred to the chapter by Wang.

1.3.2 Explanation and generalisation


Let us further analyse the logical processes of abduction and induction from the utility
perspective of AI, and examine to what extent it is possible to distinguish two such
processes on the basis of the function they are intended to perform. We will argue
that such a distinction is indeed possible, since the function of abduction is to provide
explanations, and the function of induction is to provide generalisations. Some of our
views on this matter have been influenced directly by the contribution by Console and
Saitta where more discussion on this possibility of distinction between abduction and
induction can be found.
First, it will be convenient to introduce some further terminology.

Observables and abducibles. We will assume a common first-order language for
all knowledge (known, observed, or hypothetical). We assume that the predicates
of this language are separated into observables and non-observables or background
predicates. Domain knowledge or background knowledge is a general theory con-
cerning non-observable predicates only. Foreground knowledge is a general theory
relating observable predicates to background predicates and each other. Instance
knowledge (sometimes called scenario knowledge) consists of formulae containing
non-observable predicates only, possibly drawn from a restricted subset of such pred-
icates. Known instance knowledge can be part of the background knowledge. Obser-
vations are formulae containing observable predicates, known to hold; predictions are
similar to observations, but their truthvalue is not given.
It will often be useful to employ the notion of an individual to refer to a particu-
lar object or situation in the domain of discourse. For example, instance knowledge
will usually contain descriptions of individuals in terms of non-observable predicates
(thence the name). An unobserved or new individual is one of which the description
becomes known only after the abductive or inductive hypothesis has been formed. As
a consequence, the hypothesis cannot refer to this particular individual; however, the
hypothesis may still be able to provide a prediction for it when its description becomes
available.
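Peirce's bean example can be used to illustrate this vocabulary (the labelling is ours):

% Observable predicate: white/1; non-observable predicate: from_this_bag/1.
white(X) :- from_this_bag(X).    % foreground knowledge
from_this_bag(b1).               % instance knowledge, describing individual b1
% Observation (known to hold):  white(b1).
% For a new individual b2, the prediction white(b2) becomes available
% only once its description from_this_bag(b2) is given.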
Given this terminology, we can specify the aim of induction as inference of foreground knowledge from observations and other known information. Typically, this information consists of background and instance knowledge, although other known foreground knowledge may also be used. In some cases it may be empty, for instance when we are learning the definition of a recursive predicate, when we are learning the definitions of several mutually dependent predicates, or when we are doing data mining. The observations specify incomplete (usually extensional) knowledge about the observables, which we try to generalise into new foreground knowledge.

6 Extra elements that are often added to the above definitions are the satisfaction of integrity constraints for the case of abduction, and the avoidance of negative examples for the case of induction; these can again be viewed under the same heading, namely as being aimed at exclusion of certain hypotheses.
On the other hand, in abduction we are inferring instance knowledge from ob-
servations and other known information. The latter necessarily contains foreground
information pertaining to the observations at hand. Possible abductive hypotheses are
built from specific non-observable predicates called abducibles in ALP. The intuition
is that these are the predicates of which the extensions are not completely known as in-
stance knowledge. Thus, an abductive hypothesis is one which completes the instance
knowledge about an observed individual. This difference between the effect of abduc-
tion and induction on observable and instance knowledge is studied in the chapter by
Console and Saitta.

Explanation. Non-deductive reasoning as used in AI provides two basic functions
that are generally useful in addressing different problems. These two functions are
(a) finding how a piece of information came about to be true according to a general
theory describing the domain of interest, and (b) constructing theories that can de-
scribe the present and future behaviour of a system. Purely from this utility point of
view, non-deductive reasoning is required to provide these two basic effects of expla-
nation and generalisation. Informally, for the purposes of this chapter it is sufficient
for explanation to mean that the hypothesis reasoned to (or generated) by the non-
deductive reasoning does not refer to observables (i.e. consists of instance knowledge)
and entails a certain formula (an observation), and for generalisation to mean that
the hypothesis can entail additional observable information on unobserved individuals
(i.e. predictions).
As we have seen before, both abduction and induction can be seen as a form of
reversed deduction in the presence of a background theory, and thus formally qualify
as providing explanations of some sort. The claim that abduction is explanatory infer-
ence indeed seems undisputed, and we do not find a need to say more about the issue
here (see the chapters by Console and Saitta, Josephson, and Psillos for a discussion
of abduction as explanatory inference). We only point out that if an abductive explana-
tion 11 is required to consist of instance knowledge only, then clearly abduction needs
a given theory T of foreground knowledge, connecting observables to background
predicates, in order to be able to account for the observation with 11. An abductive ex-
planation thus makes sense only relative to this theory T from which it was generated:
it explains the observation according to this particular theory.
However, if induction provides explanations at all, these explanations are of a dif-
ferent kind. For instance, we can say that 'all the beans from this bag are white' is an
explanation for why the observed beans from the bag are white. Notice however that
this kind of explanation is universal: 'observed Xs are Y' is explained by the hypothesis that 'all Xs are Y'. This explanation does not depend on a particular theory: it is not according to a particular model of the 'world of beans'. It is a general, meta-level explanation that does not provide any insight into why things are so. As Josephson puts it, inductive hypotheses do not explain particular observations, but they explain
the frequencies with which the observations occur (viz. that non-white beans from this
bag are never observed).

Generalisation. We thus find that inductive hypotheses are not explanatory in the
same way as abductive hypotheses are. But we would argue that being explanatory is
not the primary aim of inductive hypotheses in the first place. Rather, the main goal of
induction is to provide generalisations. In this respect, we find that the ILP definition
of induction (p. 13) is too much focused on the problem of learning classification rules,
without stressing the aspect of generalisation. An explanatory hypothesis would only
be inductive if it generalises. The essential aspect of induction as applied in AI seems
to be the kind of sample-to-population inference exemplified by categorical inductive
generalisation, reproduced here in its more general form from Section 1.2.3:
All objects in the sample satisfy P(x);
therefore, all objects in the population satisfy P(x).
As with Peirce's syllogisms, the problem here is that P(x) is already assumed to be
given, while in AI a major problem is to generate such hypotheses. The specification
of confirmatory or descriptive induction follows this pattern, but leaves the hypothesis
unspecified:
Given a consistent set of observations O and a consistent background knowledge B, find a hypothesis H such that: M(B ∪ O) ⊨ H
(Helft, 1989; De Raedt and Bruynooghe, 1993; Flach, 1995)
Hence the formal requirement now is that any generated hypothesis should be true
in a certain model constructed from the given knowledge and observations (e.g. the
truth-minimal model).
This specification can be seen as sample-to-population inference. For example, in
Peirce's bean example (p. 5), B is 'these beans are from this bag' (instance knowledge),
O is 'these beans are white' (observation), and H - 'all the beans from this bag are white' - is satisfied by the model containing 'these beans' as the only beans in the
universe. Under the assumption that the population is similar to the sample, we achieve
generalisation by restricting attention to formulae true in the sample. Note that the
induced hypothesis is not restricted to one explaining the whiteness of these beans:
we might equally well have induced that 'all white beans are from this bag'.
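A minimal sketch of this in Prolog (our encoding): in the model in which 'these beans' are the only beans, both candidate generalisations are confirmed:

from_this_bag(b1). from_this_bag(b2).   % B: instance knowledge
white(b1). white(b2).                   % O: observations

% H1: all the beans from this bag are white.
% ?- forall(from_this_bag(X), white(X)).
% true.
% H2: all white beans are from this bag.
% ?- forall(white(X), from_this_bag(X)).
% true.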
Above we defined a hypothesis as generalising if it makes a prediction involving
an observable. We have to qualify this statement somewhat, as the following example
shows (taken from the chapter by Console and Saitta, Example 9.2, p. 141). Let our
background theory contain the following clauses:

measles(X):-brother(X,Y),measles(Y).
red_spots(X):-measles(X).
brother(dan,john).

The observation is red_spots(john). A possible explanation for this observation is measles(john). While this explanation is clearly completing instance knowledge and thus abductive, adding it to our theory will lead to the prediction red_spots(dan). Thus, the hypothesis that John has measles also seems to qualify
as a generalisation. We would argue however that this generalisation effect is already
present in the background theory. On the other hand, an inductive hypothesis produces
a genuinely new generalisation effect, in the sense that we can find new individuals for
which the addition of the hypothesis to our knowledge is necessary to derive some ob-
servable property for these individuals (usually this property is that of the observations
on which the induction was based). With an abductive hypothesis this kind of exten-
sion of the observable property to other new individuals does not necessarily require
the a priori addition of the abductive hypothesis to the theory but depends only on the
properties of this individual and the given background theory: the generalisation, if
any, already exists in the background theory.
We conclude that abductive and inductive hypotheses differ in the degree of gen-
eralisation that each of them produces. With the given background theory T we im-
plicitly restrict the generalising power of abduction as we require that the basic model
of our domain remains that of T. The existence of this theory separates two levels
of generalisation: (a) that contained in the theory and (b) new generalisations that are
not given by the theory. In abduction we can only have the first level with no in-
terest in genuinely new generalisations, while in induction we do produce such new
generalisations.

1.3.3 Extending incomplete theories


We will now further examine the general logical process that each of abduction and
induction takes. The overall process that sets the two forms of reasoning of abduction
and induction in context is that of theory formation and theory development. In this
we start with a theory T (that may be empty) which describes at a certain level the
problem domain we are interested in. This theory is incomplete in its representation
of the problem domain, as otherwise there is no need for non-deductive ampliative
reasoning. New information given to us by the observations is to be used to complete
this description. As we argue below, abduction and induction each deal with a different
kind of incompleteness of the theory T.

Abductive extensions. In a typical use of abduction, the description of the problem
domain by the theory T is further assumed to be sufficient, in the sense that it has
reached a stage where we can reason with it. Typically this means that the incom-
pleteness of the theory can be isolated in some of its non-observable predicates, which
are called abducible (or open) predicates. We can then view the theory T as a repre-
sentation of all of its possible abductive extensions T ∪ Δ, usually denoted T(Δ), for each abducible hypothesis Δ. An enumeration of all such formulae (consistent with T) gives the set of all possible abductive extensions of T. Abductive entailment with T is then defined by deductive entailment in each of its abductive extensions.
Alternatively, we can view each abductive formula Δ as supplying the missing instance knowledge for a different possible situation or individual in our domain, which is then completely described by T(Δ). For example, an unobserved individual and its background properties can be understood via a corresponding abductive formula Δ.
Once we have these background properties, we can derive - using T - other properties for this new individual.7 Given an abductive theory T as above, the process of abduction is to select one of the abductive extensions T(Δ) of T in which the given observation to be explained holds, by selecting the corresponding formula Δ. We can then reason deductively in T(Δ) to arrive at other conclusions. By selecting Δ we are essentially enabling one of the possible associations between Δ and the observation among those supplied by the theory T.
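For instance, taking the measles theory of Section 1.3.2 with measles/1 as the only abducible, each consistent set Δ of measles facts determines one abductive extension (the enumeration is ours):

% Delta = {}                : T(Delta) entails no red_spots atom.
% Delta = {measles(dan)}    : T(Delta) entails red_spots(dan).
% Delta = {measles(john)}   : T(Delta) entails red_spots(john) and,
%                             via brother(dan,john), also red_spots(dan).

Abduction, given the observation red_spots(john), selects the last of these extensions.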
It is important here to emphasise that the restriction of the hypothesis of abduction
to abducible predicates is not incidental or computational, but has a deeper representa-
tional reason. It reflects the relative comprehensiveness of knowledge of the problem
domain contained in T. The abducible predicates and the allowed abductive formu-
lae take the role of 'answer-holders' for the problem goals that we want to set to our
theory. In this respect they take the place of the logical variable as the answer-holder
when deductive reasoning is used for problem solving. As a result this means that the
form of the abductive hypothesis depends heavily on the particular theory T at hand,
and the way in which we have chosen to represent our problem domain in it.
Typically, the allowed abducible formulae are further restricted to simple logical
forms such as ground or existentially quantified conjunctions of abducible literals.
Although these further restrictions may be partly motivated by computational consid-
erations, it is again important to point out that they are only made possible by the
relative comprehensiveness of the particular representation of our problem domain in
the theory T. Thus, the case of simple abduction - where the abducible hypotheses are ground facts - occurs exactly because the representation of the problem domain in T is sufficiently complete to allow this. Furthermore, this restriction is not significant
for the purposes of comparison of abduction and induction: our analysis here is inde-
pendent of the particular form of abducible formulae. The important elements are the
existence of an enumeration of the abductive formulae, and the fact that these do not
involve observable predicates.

Inductive extensions. Let us now turn to the case of induction and analyse this
selected. The main difference now is the fact that these hypotheses are not limited to
a particular subset of predicates that are incompletely specified in the representation
of our problem domain by the theory T, but are restricted only by the language ofT.
In practice, there may be a restriction on the form of the hypothesis, called language
bias, but this is usually motivated either by computational considerations, or by other
information external to the theory T that guides us to an inductive solution.
Another essential characteristic of the process of induction concerns the role of the
selected inductive hypothesis H. The role of H is to extend the existing theory T to a
new theory T' = T ∪ H, rather than reason with T under the set of assumptions H as is
the case for abduction. Hence T is replaced by T' to become a new theory with which
we can subsequently reason, either deductively or abductively, to extract information from it. The hypothesis H changes T by requiring extra conditions on the observable predicates that drive the induction, unlike abduction where the extra conditions do not involve the observable predicates. In effect, H provides the link between observables and non-observables that was missing or incomplete in the original theory T.

7 Note that this type of abductive (or open) reasoning with a theory T collapses to deduction, when and if the theory becomes fully complete.
Analogously to the concept of abductive extension, we can define inductive extensions as follows. Consider a common given theory T with which we are able to perform abduction and induction. That is, T has a number of abductive extensions T(Δ). Choosing an inductive hypothesis H as a new part of the theory T has the effect of further conditioning each of the abductive extensions T(Δ). Hence, while in abduction we select an abductive extension of T, with induction we extend each of the abductive extensions with H. The effect of induction is thus 'universal' on all the abductive extensions.
If we now consider the new abductive theory T' = T ∪ H, constructed by induction, we can view induction as a process of selecting a collection of abductive extensions, namely those of the new theory T'. Hence an inductive extension can be viewed as a set of abductive extensions of the original theory T that are further (uniformly) conditioned by the common statement of the inductive hypothesis H. This idea of an inductive extension consisting of a set of abductive extensions was used in (Denecker et al., 1996) to obtain a formalisation of abduction and induction as selection processes in a space of possible world models over the given theory in each case. In this way the process of induction can be seen to have a more general form than abduction, able to select a set of extensions rather than a single one. Note that this does not necessarily mean that induction will yield a more general syntactic form of hypotheses than abduction.

Analysis. Comparing the possible inductive and abductive extensions of a given theory T we have an essential difference. In the case of abduction some of the predicates in the theory, namely the observables, cannot be arbitrarily defined in an extension. The freedom of choice of abduction is restricted to constrain directly (via Δ) only the abducibles of the theory. The observable predicates cannot be affected except through the theory: the observables must be grounded in the existing theory T by the choice of the abductive conditions on the abducible part of the extension. Hence in an abductive extension the extent to which the observables can become true is limited by the theory T and the particular conditions Δ on the rest of the predicates.
In induction this restriction is lifted, and indeed we can have inductive extensions of the given theory T, the truthvalue of which on the observable predicates need not be attributed via T to a choice on the abducibles. The inductive extensions 'induce' a more general change (from the point of view of the observables) on the existing theory T, and - as we will see below - this will allow induction to genuinely generalise the given observations to other cases not derivable from the original theory T. The generalising effect of abduction, if at all present, is much more limited. The selected abductive hypothesis Δ may produce in T(Δ) further information on abducible or other predicates, as in the measles example from the previous section. Assuming that abducibles and observables are disjoint, any information on an observable derived in T(Δ) is a generalisation already contained in T.

What cannot happen is that the chosen abductive hypothesis Δ alone (without T) predicts a new observation, as Δ does not affect directly the value of the observable predicates. Every prediction on an observable derived in T(Δ), not previously true in T (including the observation that drives the abductive process), corresponds to some further instance knowledge Δ', which is a consequence of T(Δ), and describes the new situation (or individual) at hand. Such consequences are already known to be possible in the theory T, as we know that one of its possible extensions is T(Δ'). In the measles example (p. 16), the observation red_spots(john) gives rise to the hypothesis Δ = measles(john). Adopting this hypothesis leads to a new prediction red_spots(dan), corresponding to the instance knowledge Δ' = measles(dan), which is a consequence of T(Δ). This new prediction could be obtained directly from T(measles(dan)) without the need of Δ = measles(john).
Similarly, if we consider a previously unobserved situation (not derivable from T(Δ)) described by Δ_new with T(Δ) ∪ Δ_new deriving a new observation, this is also already known to be possible, as T(Δ ∪ Δ_new) is one of the possible extensions of T. For example, if Δ_new = measles(mary), then T(Δ) ∪ Δ_new, and in fact T ∪ Δ_new, derives red_spots(mary), which is again not a genuine generalisation.
In short, abduction is meant to select some further conditions Δ under which we should reason with T. It concerns only this particular situation described by Δ and hence, if Δ cannot impose directly any conditions on the observable predicates, the only generalisations that we can get on the observables are those contained in T under the particular restrictions Δ. In this sense we say that the generalisation is not genuine but already contained in T. Hence, as argued in the chapter by Console and Saitta, abduction increases the intension of known individuals (abducible properties are now made true for these individuals), but does not have a genuine generalisation effect on the observables (it does not increase the extension of the observables with previously unobserved individuals for which the theory T alone could not produce this extension when it is given the instance knowledge that describes these individuals).
On the other hand, the universal conditioning of the theory T by the inductive
hypothesis H produces a genuine generalisation on the observables of induction. The
extra conditions in H on the observables introduce new information on the relation of
these predicates to non-observable predicates in the theory T, and from this we get
new observable consequences. We can now find cases where from H alone together
with a (non-observable) part of T, describing this case, we can derive a prediction not previously derivable in T.
The new generalisation effect of induction shows up more when we consider as above the case where the given theory for induction has some of its predicates as abducible (different from the observables). It is now possible to have a new individual described by the extra abducible information Δ_new, such that in the new theory T' = T ∪ H produced by induction a new observation holds which was not known to be possible in the old theory T (i.e. it is not a consequence of T ∪ Δ_new). Note that we cannot (as in the case of abduction) combine H with Δ_new to a set Δ'_new of instance knowledge, under which the observation would hold from the old theory T. We can also have that a new observation holds alone from the hypothesis H and Δ_new for such previously unobserved situations not described in the given theory T. These are cases of genuine generalisation not previously known to be possible from the initial theory T.
Summarising this subsection, induction - seen as a selection of a set of extensions defined by the new theory T ∪ H - has a stronger and genuinely new generalising effect on the observable predicates than abduction. The purpose of abduction is to select an extension and reason with it, thus enabling the generalising potential of the given theory T. In induction the purpose is to extend the given theory to a new theory, the abductive extensions of which can provide new possible observable consequences.
Finally, we point out a duality between abduction and induction (first studied in (Dimopoulos and Kakas, 1996b)) as a result of this analysis. In abduction the theory T is fixed and we vary the instance knowledge to capture (via T) the observable knowledge. On the other hand, in induction the instance knowledge is fixed as part of the background knowledge B, and we vary the general theory so that if the selected theory T is taken as our abductive theory then the instance knowledge in B will form an abductive solution for the observations that drove the induction. Conversely, if we perform abduction with T and we consider the abductive hypothesis Δ explaining the observations as instance knowledge, the original theory T forms a valid inductive hypothesis.

1.3.4 Interaction between abduction and induction


In the preceding sections we analysed basic patterns of abduction and induction. In
practice hybrid forms of ampliative reasoning occur, requiring an interaction between
these basic patterns. Such interaction is the subject of this section.
Let us consider a simple example originating from (Michalski, 1993). We have the observation that:

O: all bananas in this shop are yellow,

and we want to explain this given a theory T containing the statement:

T: all bananas from Barbados are yellow.

An explanation for this is given by the hypothesis:

H: all bananas in this shop are from Barbados.

Is this a form of abduction or a form of induction, or perhaps a hybrid form? As we will show, this strongly depends on the choice of observables and abducibles.
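In logic programming terms the example might be rendered as follows (the encoding is ours, with the universally quantified shop bananas represented by a typical skolemised individual b, anticipating the remark on skolemisation below):

yellow(X) :- from_barbados(X).       % T
from_barbados(X) :- in_this_shop(X). % H
in_this_shop(b).                     % a typical banana in this shop
% O, for the typical banana:
% ?- yellow(b).
% true, given T and H.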
Suppose, first, that we choose 'yellow' as observable and the other predicates as ab-
ducibles.8 The hypothesis H selects amongst all the possible abductive extensions of
the theory T (corresponding to the different abducible statements of instance knowl-
edge consistent with T) a particular one. In this selected extension the observation is
entailed and therefore the hypothesis explains according to the abductive theory T the
observation. Note that this hypothesis H does not generalise the given observations: it
does not enlarge the extension of the observable predicate 'yellow' over that provided by the statement of the observation O. In fact, we can replace the universal quantification in 'all bananas from this shop' by a typical representative through skolemisation. More importantly, the link of the observation O with the extra information of H is known a priori as one of the possible ways of reasoning with the theory T to derive new observable information.

8 We can if we wish consider only the predicate 'from Barbados' as abducible.
There is a second way in which to view this reasoning and the hypothesis H above.
We can consider the predicate 'from Barbados' as the observable predicate with a set
of observations stating that each of the observed bananas in the shop is from Barbados. We
then have a prototypical inductive problem (like the white bean example of Peirce)
where we generate the same statement H as above, but now as an inductive hypothe-
sis. From this point of view the hypothesis now has a genuine generalising effect over
the observations on the predicate 'from Barbados'. But where did the observations on
Barbados come from? These can be obtained from the theory T as separate abductive
explanations for each of the original observations (or a typical one) on the predicate
'yellow'. We can thus understand this example as a hybrid process of first using (sim-
ple) abduction to translate separately each given observation as an observation on the
abducibles, and then using induction to generalise the latter set of observations, thus
arriving at a general statement on the abducibles.
Essentially, in this latter view we are identifying, by changing within the same
problem the observable and abducible predicates, simple basic forms of abduction and
induction on which we can build more complex forms of non-deductive reasoning.
Referring back to our earlier discussion in Section 1.3, these basic forms are: pure
abduction for explanation with no generalisation effect (over what already exists in the
theory T); and pure induction of simple generalisations from sample to population.
This identification of basic distinct forms of reasoning has important computational
consequences. It means that we can consider two basic computational models for the
separate tasks of abduction and induction. The emphasis then shifts to the question
of how these basic forms of reasoning and computation can be integrated together to
solve more complex problems by suitably breaking down these problems into simpler
ones.
It is interesting to note here that in the recent framework of inverse entailment as used by the ILP system Progol (Muggleton, 1995), where we can learn from general clauses as observations, an analysis of its computation, as done in the chapter by Yamamoto, reveals that this can be understood as a mixture of abduction and induction.
As described in the above example, the Progol computation can be separated into first
abductively explaining according to the background theory a skolemised, typical ob-
servation, and then inductively generalising over this abductive explanation. The use-
fulness of explicitly separating out abduction and induction is also evident in several
works of theory formation or revision. Basic computational forms of abduction and
induction are used together to address these complex problems. This will be described
further in Section 1.4 on the integration of abduction and induction in AI.

1.3.5 Computational characteristics


We will close this section by discussing further the computational distinction that the
basic forms of abduction and induction have in their practice in AI and logic program-
ming. Indeed, when we examine the computational models used for abduction and
induction in AI, we notice that they are very different. Their difference is so wide that
it is difficult, if not impossible, to use the computational framework of one form of
reasoning in order to compute the other form of reasoning. Systems developed in AI
for abduction cannot be used for induction (and learning), and vice versa, inductive AI
systems cannot be used to solve abductive problems.9 In the chapter by Christiansen a
system is described where the computation of both forms of reasoning can be unified
at a meta-level, but where the actual computation followed by the system is different
for the separate forms of reasoning.
We will describe here the main characteristics of the computational models of the
basic forms of abduction and induction, discussed above, as they are found in practical
AI approaches. According to these basic forms, abduction extracts an explanation
for an observation from a given theory T, and induction generalises a set of atomic
observations. For abduction the computation has the following basic form: extract
from the given theory T a hypothesis Δ and check this for consistency. The search for a hypothesis is done via some form of enhanced deduction method, e.g. resolution with residues (Cox and Pietrzykowski, 1986a; Eshghi and Kowalski, 1989; Kakas and Mancarella, 1990c; Denecker and de Schreye, 1992; Inoue, 1992a; Kakas and Michael, 1995), or unfolding of the theory T (Console et al., 1991b; Fung and Kowalski, 1997).
The important thing to note is that the abductive computation is primarily based on
the computation of deductive consequences from the theory T. The proofs are now
generalised so that they can be successfully terminated 'early' with an abductive for-
mula. To check consistency of the found hypothesis, abductive systems employ stan-
dard deductive methods (these may sometimes be specially simplified and adapted to
the particular form that the abductive formulae are restricted to take). If a hypothesis
(or part of a hypothesis) is found inconsistent then it is rejected and another one is
sought. Note that systems that compute constructive abduction (e.g. SLDNFA (Denecker and de Schreye, 1998), IFF (Fung and Kowalski, 1997), ACLP (Kakas and
Michael, 1995)), where the hypothesis may not be ground but can be an existentially
quantified conjunction (with arithmetic constraints on these variables) or even a uni-
versally quantified formula, have the same computational characteristics. They arrive
at these more complex hypotheses by extending the proof methods for entailment to
account for the (isolated) incompleteness on the abducible predicates.
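The following minimal abductive meta-interpreter illustrates this style of computation; it is our own sketch, not one of the systems cited above, and it omits consistency checking, integrity constraints and loop detection. The object-level theory (here the measles example) is encoded by rule/2 facts:

% rule(Head, Body): object-level clauses; abducible/1 declares abducibles.
rule(red_spots(X), [measles(X)]).
rule(measles(X), [brother(X,Y), measles(Y)]).
rule(brother(dan,john), []).
abducible(measles(_)).

% solve(Goals, Delta0, Delta): prove the list of Goals, extending the
% set of abduced assumptions Delta0 to Delta. Proofs terminate 'early'
% at abducible subgoals by assuming them.
solve([], Delta, Delta).
solve([A|Gs], Delta0, Delta) :-        % subgoal already assumed
    member(A, Delta0),
    solve(Gs, Delta0, Delta).
solve([A|Gs], Delta0, Delta) :-        % assume a new abducible fact
    abducible(A),
    \+ member(A, Delta0),
    solve(Gs, [A|Delta0], Delta).
solve([A|Gs], Delta0, Delta) :-        % unfold with a program clause
    rule(A, Body),
    append(Body, Gs, Goals),
    solve(Goals, Delta0, Delta).

The query ?- solve([red_spots(john)], [], Delta). returns Delta = [measles(john)]; with that assumption in place, solve([red_spots(dan)], [measles(john)], Delta) has a solution with Delta unchanged, reflecting the prediction discussed in Section 1.3.3.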
On the other hand, the computational model for the basic form of induction in AI
takes a rather different form. It constructs a hypothesis and then refines this under
consistency and other criteria. The construction of the hypothesis is based on methods
for inverting entailment proofs (or satisfaction proofs in the case of confirmatory in-
duction) so that we can obtain a new theory that would then entail (or be satisfied by)
the observations. Thus, unlike the abductive case, the computation cannot be based
on proof methods for entailment, and new methods such as inverse resolution, clause
generalisation and specialisation are used. In induction the hypothesis is generated from the language of the problem domain (rather than a given theory of the domain), in a process of iteratively improving a hypothesis to meet the various requirements posed by the problem. Furthermore, in induction the comparison of the different possible hypotheses plays a prominent and dynamic role in the actual process of hypothesis generation, whereas in abduction evaluation of the different alternative hypotheses may be done after these have been generated.

9 With the possible exception of Cigol (Muggleton and Buntine, 1988), a system designed for doing unrestricted reversed deduction.
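As a small illustration of such generalisation operators, here is a sketch of Plotkin's least general generalisation of two atoms (our own simplified version; ILP systems extend this to clauses and embed it in a search with pruning):

% lgg(A, B, G): G is the least general generalisation of atoms A and B.
% Differing pairs of subterms are consistently replaced by variables.
lgg(A, B, G) :- lgg(A, B, G, [], _).

lgg(A, B, G, S, S) :-
    A == B, !, G = A.                  % identical subterms are kept
lgg(A, B, G, S0, S) :-
    compound(A), compound(B),
    A =.. [F|As], B =.. [F|Bs],
    length(As, N), length(Bs, N), !,   % same functor and arity: recurse
    lgg_args(As, Bs, Gs, S0, S),
    G =.. [F|Gs].
lgg(A, B, G, S0, S) :-                 % mismatch: introduce a variable,
    ( memberchk(map(A,B,V), S0)        % reusing it for repeated pairs
    -> G = V, S = S0
    ;  S = [map(A,B,G)|S0]
    ).

lgg_args([], [], [], S, S).
lgg_args([A|As], [B|Bs], [G|Gs], S0, S) :-
    lgg(A, B, G, S0, S1),
    lgg_args(As, Bs, Gs, S1, S).

For example, ?- lgg(white(b1), white(b2), G). yields G = white(_), generalising the two ground observations to the atom white(X), a first step towards a rule such as white(X) :- from_this_bag(X).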
It should be noted, however, that the observed computational differences between
generating abductive hypotheses and generating inductive hypotheses are likely to be-
come smaller once more complex abductive hypotheses are allowed. Much of the
computational effort of ILP systems is spent on efficiently searching and pruning the
space of possible hypotheses, while ALP systems typically enumerate all possible
abductive explanations. The latter approach becomes clearly infeasible when the ab-
ductive hypothesis space grows. In this respect, we again mention the system Cigol
which seems to be the only system employing a unified computational method (inverse
resolution) to generate both abductive and inductive hypotheses.
Computational distinctions of the two forms of reasoning are amplified when we
consider the different works of trying to integrate abduction and induction in a com-
mon system. In most of these works, each of the two forms of reasoning is computed
separately, and their results are transferred to the other form of reasoning as input. The
integration clearly recognises two different computational processes (for each reason-
ing) which are then suitably linked together. For example, in LAB (Thompson and
Mooney, 1994) or ACL (Kakas and Riguzzi, 1997; Kakas and Riguzzi, 1999) the
overall computation is that of induction as described above, but where now - at the
point of evaluation and improvement of the hypothesis - a specific abductive problem
is computed that provides feedback regarding the suitability of the inductive hypothe-
sis. In other cases, such as RUTH (Ade et al., 1994) or Either (Ourston and Mooney,
1994) an abductive process generates new observable input for a subsidiary inductive
process. In all these cases we have well-defined separate problems of simple forms of
abduction and induction each of which is computed along the lines described above.
In other words, the computational viability of the integrated systems depends signifi-
cantly on this separation of the problem and computation into instances of the simple
forms of abduction and induction.

1.4 INTEGRATION OF ABDUCTION AND INDUCTION


The complementarity between abduction and induction, as we have seen it in the pre-
vious section - abduction providing explanations from the theory while induction gen-
eralises to form new parts of the theory - suggests a basis for their integration. Co-
operation between the two forms of reasoning would be useful within the context of
theory development (construction or revision), where a current theory T is updated to a new theory T' in the light of new observations O so that T' captures O (i.e. T' ⊨ O).
At the simplest level, abduction and induction simply co-exist and both function as
revision mechanisms that can be used in developing the new theory (Michalski, 1993).
In a slightly more cooperative setting, induction provides new foreground knowledge
in T for later use by abduction. At a deeper level of cooperation, abduction and in-
duction can be integrated together within the process of constructing T. There are several ways in which this can happen within a cycle of development of T, as will be described below. For further discussion on the integration of abduction and induction in the context of machine learning see the chapter by Mooney in this volume. Also the chapter by Sakama studies how abduction can be used to compute induction in an integrated way.

[Diagram: the theories T and T ∪ H and the observations O and O', connected by arrows labelled Induction and Abduction.]

Figure 1.2 The cycle of abductive and inductive knowledge development.

The cycle of abductive and inductive knowledge development. On the one hand, abduction can be used to extract from the given theory T and observations O abducible information that would then feed into induction as (additional) training data. One example of this is provided by (Ourston and Mooney, 1994), where abduction identifies points of repair of the original, faulty theory T, i.e. clauses that could be generalised so that positive observations in O become entailed, or clauses that may need to be specialised or retracted because they are inconsistent with negative observations.
A more active cooperation occurs when, first, through the use of basic abduction,
the original observations are transformed to data on abducible background predicates
in T, becoming training data for induction on these predicates. An example of this was
discussed in Section 1.3.4; another example in (Dimopoulos and Kakas, 1996b) shows that it is possible to solve the original inductive learning task only if, before inductive generalisation takes place, we abductively transform the observations into other predicates in a uniform way. In this volume, Abe studies this type of integration, employing
an analogy principle to generate suitable data for induction. Similarly, Yamamoto's
analysis of the ILP system Progol in this volume shows that - at an abstract level - the
computation splits into a first phase of abductively transforming the observations on
one predicate to data on other predicates, followed by a second generalisation phase
to produce the solution.
In the framework of the system RUTH (Ade et al., 1994), we see induction feeding
into the original abductive task. An abductive explanation may lead to a set of required
facts on 'inducible' predicates, which are inductively generalised to give a general rule
in the abductive explanation for the original observations, similar to (one analysis of)
the bananas example discussed previously.
These types of integration can be succinctly summarised as follows. Consider a
cycle of knowledge development governed by the 'equation' T ∪ H |= O, where T is
the current theory, O the observation triggering theory development, and H the new
knowledge generated. Then, as shown in Figure 1.2, on one side of this cycle we
have induction, its output feeding into the theory T for later use by abduction, as
shown in the other half of the cycle, where the abductive output in turn feeds into the
observational data O for later use by induction, and so on.
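To make this concrete, the following toy Python sketch implements the abductive half of one pass around the cycle: ground facts on abducible predicates are assumed until the theory entails an observation, yielding exactly the kind of data that induction could then generalise into new rules of T'. The propositional theory, the atoms, and the naive single-assumption search are purely illustrative assumptions and are not taken from any of the systems discussed in this chapter.

```python
def entails(rules, facts, atom):
    """Forward-chain over propositional Horn rules: do `rules` and
    `facts` together derive `atom`?"""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return atom in derived

def abduce(rules, facts, abducibles, observation):
    """Return a set A of abducible atoms with rules + facts + A |= observation
    (naive search over single assumptions; None if nothing works)."""
    if entails(rules, facts, observation):
        return set()
    for a in sorted(abducibles):
        if entails(rules, facts | {a}, observation):
            return {a}
    return None

# A toy theory T: shoes get wet from wet grass; grass gets wet from
# rain or from the sprinkler. 'rained' and 'sprinkler' are abducible.
T = [("wet_shoes", ["wet_grass"]),
     ("wet_grass", ["rained"]),
     ("wet_grass", ["sprinkler"])]

print(abduce(T, set(), {"rained", "sprinkler"}, "wet_shoes"))
# prints {'rained'}: an abduced fact that induction could later
# generalise into a new rule of the revised theory T'
```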

Inducing abductive theories. Another way in which induction can feed into abduc-
tion is through the generation of confirmatory (or descriptive) inductive hypotheses
that could act as integrity constraints for the new theory. Here we initially have some
abductive hypotheses regarding the presence or absence of abducible assumptions.
Based on these hypotheses and other data in T we generate, by means of confirmatory
induction, new sentences I which, when interpreted as integrity constraints on the new
theory T, would support the abducible assumptions (assumptions of presence would
be consistent with I, assumptions of absence would now be inconsistent with I).
This type of cooperation between abductive and inductive reasoning is based on a
deeper level of integration of the two forms of reasoning, where induction is perceived
as hypothesising abductive (rather than deductive) theories. The deductive coverage
relation for learning is replaced by abductive coverage, such that an inductive hypoth-
esis H is a valid generalisation if the observations can be abductively explained by
T' = T ∪ H, rather than deductively entailed. A simple example of this is the exten-
sion of Explanation-Based Learning with abduction (Cohen, 1992; O'Rorke, 1994),
such that deductive explanations are allowed to be completed by abductive assump-
tions before they are generalised.
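Schematically, and reusing the hypothetical `entails` check from the sketch above, such an abductive coverage relation might be rendered as follows; this is only an illustration of the definition, not the coverage test of any particular system.

```python
def covers_abductively(T, H, example, abducibles, entails):
    """An inductive hypothesis H covers the example if T' = T + H,
    possibly extended with an abducible assumption, entails it."""
    theory = T + H  # T' = T u H, both given as lists of Horn rules
    if entails(theory, set(), example):
        return True  # plain deductive coverage as a special case
    return any(entails(theory, {a}, example) for a in abducibles)
```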
Inducing abductive theories is particularly useful in cases where the domain theory
is incomplete, and also when performing multiple predicate learning, since in this
case too the background knowledge for one predicate includes the incomplete data for the
other predicates to be learned. In these cases the given theory T is essentially an ab-
ductive theory, and hence it is appropriate to use an abductive coverage relation. On
the other hand, it may be that the domain that we are trying to learn is itself inher-
ently abductive or non-monotonic (e.g. containing nested hierarchies of exceptions),
in which case the hypothesis space for learning is a space of abductive theories.
LAB (Thompson and Mooney, 1994) is one of the first learning systems adopting
this point of view (see also Mooney's contribution to this volume). The class predi-
cates to be learned are the abducible predicates, and the induced theory H describes the
effects of these predicates on other predicates that we can observe directly with rules
of the form observation ← class. Then the training examples (each consisting of a set
of properties and its classification) are captured by the induced hypothesis H when the
correct classification of the examples forms a valid abductive explanation, given H, for
their observed properties. Other frameworks for learning abductive theories are given
in (Kakas and Riguzzi, 1997; Kakas and Riguzzi, 1999; Dimopoulos et al., 1997)
and the chapter by Lamma et al. Here, both explanatory and confirmatory induction
are used to generate theories together with integrity constraints. In this volume, Inoue
and Haneda also study the problem of learning abductive logic programs for capturing
non-monotonic theories.
With this type of integration we can perceive abduction as being used to evaluate
the suitability or credibility of the inductive hypothesis. Similarly, abductive expla-
nations that lead to induction can be evaluated by testing the induced generalisation.
In this sense, the integration of abduction and induction can help to cross-evaluate the
hypotheses that they generate.

1.5 CONCLUSIONS
The nature of abduction and induction is still hotly debated. In this introductory chap-
ter we have tried to chart the terrain of possible positions in this debate, and also to
provide a roadmap for the contributions to this volume. From a logico-philosophical
perspective, there are broadly speaking two positions: either one holds that abduction
provides explanations and induction provides generalisations; or one can hold that ab-
duction is the logic of hypothesis generation and induction is the logic of hypothesis
evaluation. AI approaches tend to adopt the first perspective (although there are ex-
ceptions) - abduction and induction each deal with a different kind of incompleteness
of the given theory, extending it in different ways.
As stressed in the introduction to this chapter, we do however think that absolute
positions in this debate may be counter-productive. Referring back to the questions
formulated there, we think it will be useful to unify abduction and induction when
concentrating on hypothesis evaluation. On the other hand, when considering hypoth-
esis generation we often perceive a distinction between abduction and induction, in
particular in their computational aspects.
With respect to the second question, abduction and induction can be usefully inte-
grated when trying to solve complex theory development tasks. We have reviewed a
number of AI approaches to such integration. Most of these frameworks of integration
use relatively simple forms of abduction and induction, namely abduction of ground
facts and basic inductive generalisations. Moreover, each of the two is computed sep-
arately and its results transferred to the other, thus clearly recognising two separate
and basic computational problems. From these, they synthesise an integrated form of
reasoning that can produce more complex solutions, following a cyclic pattern with
each form of reasoning feeding into the other.
A central question then arises as to what extent the combination of such basic forms
of abduction and induction is complete, in the sense that it encapsulates all solutions to
the task. Can they form a generating basis for any method for such theory development
which Peirce describes in his later work as 'coming up with a new theory'? We hope
that the present collection of papers will contribute towards understanding this issue,
and many other issues pertaining to the relation between abduction and induction.

Acknowledgments
Part of this work was supported by Esprit IV Long Term Research Project 20237 (Inductive
Logic Programming 2).
Part I
The philosophy of abduction and induction
2 SMART INDUCTIVE GENERALIZATIONS ARE ABDUCTIONS

John R. Josephson

2.1 A DISTINCTIVE PATTERN OF INFERENCE

2.1.1 Inference to the best explanation


To postpone entanglements with the abundant confusions surrounding various uses
of the term "abduction," for which Peirce himself seems to be largely responsible,
and to proceed as directly as possible to engage the basic logical and computational
issues, let us begin by examining a pattern of inference I will call "inference to the
best explanation" and abbreviate as "IBE" ("IBEs" for the plural). 1
IBEs follow a pattern like this:2
D is a collection of data (facts, observations, givens),
H explains D (would, if true, explain D),
No other hypothesis explains D as well as H does.

Therefore, H is probably correct.


The strength of the conclusion depends on several considerations (rendered schematically in the sketch after this list), including:3

• how good H is by itself, independently of considering the alternatives,

• how decisively H surpasses the alternatives, and

1The phrase "inference to the best explanation" seems to originate with Gilbert Harman (Harman, 1965).
2This formulation is largely due to William Lycan.
3Please see (Josephson, 1994, p.14) for a more complete description of the considerations governing con-
fidence and acceptance, including pragmatic considerations.


• how thorough was the search for alternative explanations.
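The following Python fragment renders these three considerations schematically. The scoring function, the threshold, and the margin are illustrative assumptions introduced here; nothing in the text fixes particular numbers.

```python
def accept_best_explanation(hypotheses, quality,
                            min_quality=0.5, min_margin=0.2,
                            search_was_thorough=True):
    """Accept a hypothesis only if it is good by itself, decisively
    surpasses the alternatives, and the search for alternative
    explanations was thorough."""
    ranked = sorted(hypotheses, key=quality, reverse=True)
    if not ranked or not search_was_thorough:
        return None
    best = ranked[0]
    if quality(best) < min_quality:
        return None  # poor by itself, even if best available
    if len(ranked) > 1 and quality(best) - quality(ranked[1]) < min_margin:
        return None  # does not surpass the alternatives decisively
    return best

# Example: 'flu' decisively beats the alternatives for the body aches.
scores = {"flu": 0.8, "food poisoning": 0.3, "stress": 0.2}
print(accept_best_explanation(scores, scores.get))  # prints: flu
```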

I trust that my readers recognize IBE as familiar, and as having a kind of intuitively
recognizable evidential force. We can observe that people quite commonly justify
their conclusions by direct, or barely disguised, appeal to IBE, showing that speaker
and hearer share a common understanding of the pattern. Thus, IBE appears to be
part of "commonsense logic." Why this might be so makes for interesting speculation
- perhaps it is somehow built into the human mind, and perhaps this is good design.
These speculations aside, it seems undeniable that people commonly view IBE as a
form of good thinking. Moreover, it appears that people intuitively recognize many
of the considerations, such as those just mentioned, that govern the strength of the
conclusions of IBEs. Beyond that, people sometimes actually come up to the standards
set by IBE in their actual reasoning. When they do so, they can be reasonably said to
be "behaving intelligently" (in that respect). Thus, IBE is part of being intelligent, part
of being "smart."
When I say that a form of inference is "smart" I mean that reasoning in accor-
dance with it has some value in contributing to intelligence, either because inferences
according to the pattern carry evidential force, as in the case of IBE, or because they
have some other power or effectiveness to contribute to intelligence. I will leave "in-
telligence" undefined, although I suppose that intelligence is approximately the same
as western philosophers have called "reason," and that intelligence is a biological phe-
nomenon, an information-processing capability of humans and other organisms, and
that it comes in degrees and dimensions, with some species and individuals being more
intelligent than others in some respects.
Besides everyday intelligence, we can readily see IBE in scientific reasoning, as
well as in the reasoning of historians, juries, diagnosticians, and detectives. Thus, IBE
seems to characterize some of the most careful and productive processes of human
reasoning.4 Considering its apparent ubiquity, it is remarkable how overlooked and
underanalyzed this inference pattern is by some 2,400 years of logic and philosophy.
The effectiveness of predictions enters the evaluative considerations in a natural
way. A hypothesis that leads to false or highly inaccurate predictions is poor by itself,
and should not be accepted, even if it appears to be the best explanation when consid-
ering all the available data. Failures in predictive power count as evidence against a
hypothesis and so tend to improve the chances of other hypotheses coming out as best.
Failures in predictive power may also improve the margin of decisiveness by which
the best explanation surpasses the failing alternatives. Thus, we see that IBEs are
capable of turning negative evidence against some hypotheses into positive evidence
for alternative hypotheses.
This kind of reasoning by exclusion, which is able to turn negative into positive ev-
idence, can be viewed deductively as relying on the assumption that the contrast set
(the set of hypotheses within which one hypothesis gets evidential support by being
best and over which reasoning by exclusion proceeds) exhausts the possibilities. It

4 For more extensive discussions of the epistemic virtues of this pattern of inference, see (Harman, 1965;
Lipton, 1991; Josephson, 1994).

must either exhaust the possibilities, or at least be broad enough to include all plau-
sible hypotheses, i.e., all hypotheses with a significant chance of being true. If the
contrast set is broad enough, the true explanation can be presumed to be included
somewhere in the set of hypotheses under consideration, and the best explanation can
then be brought out by reasoning by exclusion. A thorough search for alternatives,
the third consideration mentioned previously, is important for high confidence in the
conclusion, since a thorough search reduces the danger that the contrast set is too nar-
row and that the true explanation has been overlooked.5 Note that nothing requires
the alternatives in a contrast set to be mutually exclusive. In principle a patient might
have both indigestion and heart trouble. Reasoning by exclusion works fine for non-
exclusive hypotheses - reasoning by exclusion depends on exhaustiveness, not mutual
exclusion.
I do not suggest that the description of the IBE inference pattern that I have given
here is perfect, or precise, or complete, or the best possible description of it. I suggest
only that it is good enough so that we can recognize IBE as distinctive, logically
forceful, ubiquitous, and smart.

2.1.2 "Abduction"
Sometimes a distinction has been made between an initial process of coming up with
explanatorily useful hypothesis alternatives and a subsequent process of critical eval-
uation wherein a decision is made as to which explanation is best. Sometimes the
term "abduction" has been restricted to the hypothesis-generation phase. Peirce him-
self commonly wrote this way, although at other times Peirce clearly used the term
"abduction" for something close to what I have here called "inference to the best ex-
planation."6
Sometimes "abduction" has been identified with the creative generation of explana-
tory hypotheses, even sometimes with the creative generation of ideas in general.
Kruijff suggests that, besides the creativity of hypotheses, the surprisingness of what
is to be explained is at the core of abduction's ubiquity and of its relation to reality
(Kruijff, 1997). Peirce, too, sometimes emphasizes surprise. It is clear that there is
much expected utility in trying to explain things that are surprising. Surprise points out
just where knowledge is lacking, and when a failed expectation has distinctly pleas-
ant or unpleasant effects, there may well be something of practical importance to be
learned. But one may also wonder about, and seek explanations for, things that are not
ordinarily surprising, and which only become "surprising" when you wonder about
them, when you recognize that in some way things could be different. "Why do things
fall?" "Why do people get angry?" "Why do arctic foxes have white coats in winter?"
None of these are unexpected, all present openings for new knowledge. Clearly, nei-
ther novelty of hypothesis nor surprise at the data are essential for an IBE to establish

5For a more extensive discussion of the importance of evidence that the contrast set includes the true expla-
nation, please see (Josephson, 1994, p.15).
6 For a discussion of Peirce's views on abduction, please see the opening essay by Flach and Kakas and
the essay by Psillos in this volume. For a detailed scholarly examination of Peirce's writings on abduction
please see (Fann, 1970).

its conclusion with evidential force. "Who forgot to put the cheese away last night?"
"Probably Billy. He has left the cheese out most nights this week."
While the creative generation of ideas is certainly virtuous in the right contexts, and
useful for being smart, it is necessary for creative hypotheses to have some plausibility,
some chance of being true, or some pursuit value, before creativity can make a gen-
uine contribution to working intelligence. Generating low value creative explanatory
hypotheses is in itself a dis-virtue in that time, attention, and other cognitive or com-
putational resources must then be expended in rejecting these low value hypotheses so
that better hypotheses may be pursued. Too much of the wrong kind of creativity is a
drain on intelligence, and so is not smart. Generation of hypotheses, without critical
control, is not smart. Generation of hypotheses, as a pattern of inference, in and of
itself, is not smart.
Generation of plausible explanatory hypotheses, relevant to the current explanatory
problem, is smart. Yet pre-screening hypotheses to remove those that are implausible
or irrelevant mixes critical evaluation into the hypothesis-generation process, and so
breaks the separation between the process of hypotheses generation and the process of
critical evaluation. Furthermore, evaluating one or more explanatory hypotheses may
require (according to IBE) that alternative explanations are generated and considered
and that a judgment is made concerning the thoroughness of the search for alternative
explanations. Again we see a breakdown in the separation of the processes of hypoth-
esis generation from the processes of critical evaluation. Either type of process will
sometimes need the other as a subprocess. The use of one process by the other might
be precompiled, so that it is not invoked explicitly at run time, but instead is only
implicit. A hypothesis generation mechanism might implicitly use criticism (it must
use some criticism if it is to be smart), and criticism might implicitly use hypothesis
generation, for example by implicitly considering and eliminating a large number of
alternatives as being implausible. Thus I conclude that hypothesis generation and hy-
pothesis evaluation cannot be neatly separated, and in any case, hypothesis generation
by itself is not smart.
Consider another pattern of inference, which I will call "backward modus ponens,"
which has a pattern as follows:

p → q
q

Therefore, p.

The arrow, "→", here may be variously interpreted, so let us just suppose it to have
more or less the same meaning as the arrow used in schematizing:

p → q
p

Therefore, q.
This second one is modus ponens, and this one is smart. Modus ponens has some
kind of intuitively visible logical force. In contrast, backward modus ponens is ob-
viously fallacious. Copi calls it "the fallacy of affirming the consequent" (Copi and
Cohen, 1998). By itself, backward modus ponens is not smart, although reasoning in
accordance with its pattern may be smart for other reasons, and there may be special
contexts in which following the pattern is smart.
It has become common in AI to identify "abduction" with backward modus po-
nens, or with backward modus ponens together with syntactic or semantic constraints,
such as that the conclusion must come from some fixed set of abducibles. There is
a burden on those who study restricted forms of backward modus ponens to show us
the virtues of their particular forms, that is, they need to show us how they are smart.
I suggest that we will find that backward modus ponens is smart to the degree that
it approximates, or when it is controlled and constrained to approximate, or when it
implements, inference to the best explanation.
From the foregoing discussion it appears that IBE is distinctive, evidentially force-
ful, ubiquitous, and smart, and that no other proposed definition or description of the
term "abduction" has all of these virtues. Thus it seems that IBE is our best candidate
as a description of what is at the epistemological and information-processing core of
the family of patterns collected around the idea of abduction. I therefore claim the
term "abduction" for IBE, and in the remainder of this essay, by "abduction" I mean
"inference to the best explanation."
Some authors characterize abduction as reasoning from effects to causes, a view to
which we will return later in this essay. For now, I would just like to point out that, at
least, abduction is a good way to be effective in reasoning from effects to causes. From
an effect, we may generate a set of alternative causal explanations and try to determine
which is the best. If a hypothesized cause is the best explanation, then we have good
evidence that it is the true cause.

2.1.3 Abductive reasoning


Until now the discussion has mainly focused on abduction as an argument pattern
- as a pattern of evidence and justification - although we have briefly touched on a
process-oriented view of abduction in our discussion of the separability of hypoth-
esis generation and evaluation, and in other hints about what it takes to be smart.
In their opening essay to this volume, Flach and Kakas have distinguished Peirce's
early views on abduction from his later more mature views and have characterized
these as "syllogistic" and "inferential" views of abduction, the latter being more pro-
cess oriented. Öztürk has distinguished inference "as an evidential process," which
is concerned with the value of conclusions either in security or in productivity, from
inference "as a methodological process," which emphasizes the role of inferences in
the economy of processes of inquiry, or the uses of inferences in support of other tasks
(Öztürk, 1997).
It will be helpful for conceptual clarity to distinguish abduction as a pattern of
argument or justification, from abduction as a reasoning task, from abduction as
a reasoning process. An information-processing task sets up a goal to accomplish,
which may be described independently of its means of accomplishment, that is, a
task may be described separately from the available algorithms, mechanisms, strate-
gies, implementations, and processes that will be needed to accomplish it.7 These
three perspectives - justification, task, and process - are conceptually tightly inter-
connected, as follows. An abductive reasoning task, prototypically, is one that has
the goal of producing a satisfactory explanation, which is an explanation that can be
confidently accepted. An explanation that can be confidently accepted is one that has
strong abductive justification. Thus, a prototypical abductive task aims at setting up a
strong abductive justification. Information processing that is undertaken for the pur-
pose of accomplishing a prototypical abductive task, that is, of producing a confident
explanation, may reasonably be called an "abductive reasoning process." From an
information-processing perspective, it makes sense to think of abductive reasoning
as comprising the whole process of generation, criticism, and possible acceptance of
explanatory hypotheses.
Note that the abductive justifications set up by abductive reasoning might be ex-
plicit, as when a diagnostic conclusion can be justified, or they might arise implicitly
as a result of the functioning of an "abductively effective mechanism," such as, per-
haps, the human visual system, or the human language understanding mechanism, or
an effective neural-net diagnostic system. Note also that the conclusions of abduc-
tive arguments (and correspondingly, the accomplishments of abductive tasks, and the
results of abductive reasoning processes) may be either general or particular propo-
sitions. Sometimes a particular patient's symptoms are explained; sometimes an em-
pirical generalization is explained by an underlying causal mechanism (e.g., univer-
sal gravitation explains the orbits of the planets). Sometimes an individual event is
explained - "What caused the fire?" - and sometimes a recurrent phenomenon is
explained- "What causes malaria?"
The account of abduction that has been sketched so far in this essay still has two
large holes: (1) what is an explanation? and (2) what makes one explanation better
than another? I will not attempt to fill the second hole in this chapter - the literature
on the subject is vast (see (Darden, 1991, p.277 ff.) for a starting point). I will simply
mention some desirable features of explanatory hypotheses: consistency, plausibility,
simplicity, explanatory power, predictive power, precision, specificity, and theoretical
promise. To begin to fill the first hole, let us ask: what conception of explanation is
needed for understanding abduction?

2.2 WHAT IS AN EXPLANATION?


2.2.1 Explanations are not proofs
There have been two main traditional attempts to analyze explanations as deductive
proofs. By most accounts, neither attempt has been particularly successful. First,
Aristotle maintained that an explanation is a syllogism of a certain form that also
satisfies various informal conditions, one of which is that the middle term of the syllo-
gism is the cause of the thing being explained. More recently (considerably) Hempel
(Hempel, 1965) modernized the logic and proposed the "covering law" or "deductive

7 See (Lucas, 1998) for a method-independent account of diagnosis as a task.


nomological" (D-N) model of explanation. 8 The main difficulty with these accounts
(besides Hempel's confounding the question of what makes an ideally good explana-
tion with the question of what it is to explain at all) is that being a deductive proof is
neither necessary nor sufficient for being an explanation. Consider the following:
QUESTION: Why does he have burns on his hand?
EXPLANATION: He sneezed while cooking pasta and upset the pot.
The point of this example is that an explanation is given, but no deductive proof, and
although it could be turned into a deductive proof by including additional proposi-
tions, this would amount to gratuitously completing what is on the face of it an in-
complete explanation. Real explanations are almost always incomplete. Under the
circumstances (incompletely specified) sneezing and upsetting the pot were presum-
ably causally sufficient for the effect, but this is quite different from being deductively
sufficient. For another example, consider that the flu hypothesis explains the body
aches, but often people have flu without body aches, so having flu does not imply
having body aches. The lesson is that an explanatory hypothesis need not deductively
entail what it explains.
The case that explanations are not necessarily deductive proofs is made even stronger
when we consider psychological explanations, where there is presumptively an ele-
ment of free will, and explanations that are fundamentally statistical, where, for ex-
ample, quantum phenomena are involved. In these cases it is clear that causal deter-
minism cannot be assumed, so the antecedent conditions, even all antecedent condi-
tions together, known and unknown, cannot be assumed to be causally sufficient for
the effects.
Conversely, many deductive proofs fail to be explanations of anything. For exam-
ple, classical mechanics is deterministic and time reversible, so an earlier state of a
system can be deduced from a later state, but the earlier state cannot be said to be
explained thereby. Also, q can be deduced from 'p and q' but is not thereby explained.
Many mathematicians will at least privately acknowledge that some proofs establish
their conclusion without giving much insight into why the conclusions are true, while
other proofs give richer understanding. So it seems that, even in pure mathematics,
some proofs are explanatory and some are not.
We are forced to conclude that explanations are not deductive proofs in any par-
ticularly interesting sense. Although they can always be presented in the form of
deductive proofs by adding premises, doing so does not succeed in capturing anything
essential or especially useful, and typically requires completing an incomplete expla-
nation. Thus the search for a proof of D is not the same as the search for an explanation
of D. Instead it is only a traditional, but seriously flawed, approximation of it.

2.2.2 Explanations give causes


An attractive alternative view is that an explanation is an assignment of causal respon-
sibility; it tells a causal story. Finding possible explanations is finding possible causes

8For a brief summary of deductive and other models of explanation please see (Bhaskar, 1981). For a history
of more recent philosophical accounts of explanation, please see (Salmon, 1990).

of the thing to be explained. It follows that abduction, as a process of reasoning to
an explanation, is a process of reasoning from effect to cause. (Ideas of causality and
explanation have been intimately linked for a very long time. For a well-developed
historical account of the connections, see (Wallace, 1972; Wallace, 1974).)
It appears that "cause" for abduction must be understood somewhat more broadly
than its usual senses of mechanical, or efficient, or event-event causation. To get
some idea of a more expanded view of causation, consider the four kinds of causes
according to Aristotle: efficient cause, material cause, final cause, and formal cause
(Aristotle, Physics, bk.2, chap.3). Consider the example of my coffee mug. The
efficient cause is the process by which the mug was manufactured and helps explain
such things as why there are ripples on the surface of the bottom. The material cause
is the ceramic and glaze, which compose the mug and cause it to have certain gross
properties such as hardness. The final cause is the end, or function, or purpose, in this
case to serve as a container for liquids and as a means of conveyance for drinking.
A final-cause explanation is needed to explain the presence and shape of the handle.
Formal cause is somewhat more mysterious - Aristotle is hard to interpret here - but
it is perhaps something like the mathematical properties of the shape, which impose
constraints resulting in certain specific other properties. That the cross-section of the
mug, viewed from above, is approximately a circle, explains why the length and width
of the cross-section are approximately equal.
What the types of causation and explanation are remains unsettled, despite Aris-
totle's best efforts and those of many other thinkers over the centuries. Apparently,
the causal story told by an abductive explanation might rely on any type of causation.
At different times we seek best explanations of different types. Note that different
types of explanations do not usually compete with each other: they answer different
kinds of explanation-seeking puzzlements; they explain different aspects; and they do
not belong together in the same contrast sets. Yet it seems that the various types of
best-explanation reasoning, corresponding to different types of explanations, are fun-
damentally similar in their reliance on the logic of reasoning by exclusion over a set
of possible explanations.
When we conclude that data D is explained by hypothesis H, we say more than
just that H is a cause of D in the case at hand. We conclude that among all the vast
causal ancestry of D we will assign responsibility to H. Commonly, our reasons for
focusing on H are pragmatic and connected rather directly with goals of producing,
preventing, or repairing D. We blame the heart attack on the blood clot in the coronary
artery or on the high-fat diet, depending on our interests. We can blame the disease on
the invading organism, on the weakened immune system that permitted the invasion,
or on the wound that provided the route of entry into the body. I suggest that it comes
down to this: the things that will satisfy us as accounting for D will depend on what
we are trying to account for about D, and why we are interested in accounting for it;
but the only things that count as candidates are plausible parts of the causal ancestry
of D according to a desired type of causation.
I have argued that explanations give causes. Explaining something, whether that
something is particular or general, gives something else upon which the first thing de-
pends for its existence, or for being the way that it is. The bomb explosion explains
the plane crash. The mechanisms that connect the ingestion of cigarette smoke with
effects on the arteries of the heart, explain the statistical association between smok-
ing and heart disease. It is common in science for an empirical generalization, an
observed regularity, to be explained by reference to underlying structure and mech-
anisms. Explainer and explained, explanans and explanandum, may be general or
particular. Accordingly, abductions may apply to, or arrive at, propositions that are
either general or particular. Computational models of abduction that do not allow for
this are not fully general, although they may be effective as special-purpose models.
As I have argued, explanations are not deductive proofs in any particularly inter-
esting sense. Although they can always be presented in the form of deductive proofs,
doing so seems not to capture anything essential or especially useful, and usually
requires completing an incomplete explanation. Thinking of explanations as proofs
tends to confuse causation with logical implication. To put it simply: causation is in
the world, implication is in the mind. Of course, mental causation exists (e.g., where
decisions cause other decisions), which complicates the simple distinction by includ-
ing mental processes in the causal world, but that complication should not be allowed
to obscure the basic point, which is not to confuse an entailment relationship with
what may be the objective, causal grounds for that relationship. Deductive models of
causation are at their best when modeling deterministic closed-world causation, but
this is too narrow for most real-world purposes. Even for modeling situations where
determinism and closed world are appropriate assumptions, treating causality as de-
duction is dangerous, since one must be careful to exclude non-causal and anti-causal
(effect-to-cause) conditionals from any knowledge base if one is to distinguish cause-
effect from other kinds of inferences. [Pearl has pointed out the significant dangers of
unconstrained mixing of cause-to-effect with effect-to-cause reasoning (Pearl, 1988b).]
Per se, there is no reason to seek an implier of some given fact. The set of possible
impliers includes all sorts of riffraff, and there is no obvious contrast set at that level to
set up reasoning by exclusion. But there is a reason to seek a possible cause: broadly
speaking, because knowledge of causes gives us powers of influence and prediction.
And the set of possible causes (of the kind we are interested in) does constitute a
contrast set for reasoning by exclusion. Common sense takes it on faith that everything
has a cause. (Compare this with Leibniz's Principle of Sufficient Reason.) There is
no (non-trivial) principle of logic or common sense that says that everything has an
implier.
In the search for interesting causes, we may set up alternative explanations and
reason by exclusion. Thus, IBE is a way to reason from effect to cause. Effect-to-
cause reasoning is not itself the same as abduction, rather, effect-to-cause reasoning
is what abduction is for.

2.3 SMART INDUCTIVE GENERALIZATIONS ARE ABDUCTIONS


This brings us to the main point of the essay, which is the argument that inductive
generalizations are a special case of abductions, or they are so if they are any good.
More precisely, I will argue that epistemically warranted inductive generalizations are
warranted because they are warranted abductions.

To begin with, let us note that the word "induction" has had no consistent use, either
recently or historically. Sometimes writers have used the term to mean all inferences
that are not deductive, sometimes they have specifically meant inductive generaliza-
tions, and sometimes they have meant next-case inductions as in the philosophical
"problem of induction" as put by David Hume. We focus on inductive generalizations,
which we may describe by saying that an inductive generalization is an inference that
goes from the characteristics of some observed sample of individuals to a conclusion
about the distribution of those characteristics in some larger population. Examples
include generalizations that arrive at categorical propositions (All A's are B's) and
generalizations that arrive at statistical propositions (71% of A's are B's, Most A's are
B's, Typical A's are B's.). A common form of inductive generalization in AI is called
"concept learning from examples," which may be supervised or unsupervised. Here
the learned concept generalizes the frequencies of occurrence and co-occurrence of
certain characteristics in a sample, with the intention to apply them to a larger general
population, which includes unobserved as well as observed instances.
I will argue that it is possible to treat every "smart" (i.e., reasonable, valid, strong)
inductive generalization as an instance of abduction, and that analyzing inductive gen-
eralizations as abductions shows us how to evaluate the strengths of these inferences.
First we note that many possible inductive generalizations are not smart.9
This thumb is mine & this thumb is mine.

Therefore, all thumbs are mine.

All observed apples are observed.

Therefore, all apples are observed.

Russell's example: a man falls from a tall building, passes the 75th floor, passes the
74th floor, passes the 73rd floor, is heard to say, "so far, so good."
Harman pointed out that it is useful to describe inductive generalizations as abduc-
tions because it helps to make clear when the inferences are warranted (Harman, 1965).
Consider the following inference:
All observed A's are B's

Therefore, All A's are B's

This inference is warranted, Harman writes, "... whenever the hypothesis that all
A's are B's is (in the light of all the evidence) a better, simpler, more plausible (and
so forth) hypothesis than is the hypothesis, say, that someone is biasing the observed
sample in order to make us think that all A's are B's. On the other hand, as soon as the
total evidence makes some other competing hypothesis plausible, one may not infer
from the past correlation in the observed sample to a complete correlation in the total
population."

9I did not invent these examples, but I forget where I got them.

If this is indeed an abductive inference, then "All A's are B's" should explain "All
observed A's are B's." The problem is that, "All A's are B's" does not seem to explain
why "This A is a B," or why A and Bare regularly associated (pointed out by (Ennis,
1968)). Furthermore, it is hard to see how a general fact could explain its instances,
because it does not seem in any way to cause them.
The story becomes clearer if we are careful about what precisely is explained and
what is doing the explaining. What the general statement in the conclusion explains are
certain characteristics of the set of observations, not the facts observed. For example,
suppose I choose a ball at random (arbitrarily) from a large hat containing colored
balls. The ball I choose is red. Does the fact that all of the balls in the hat are red
explain why this particular ball is red? No, but it does explain why, when I chose a
ball at random, it turned out to be a red one (because they all are). "All A's are B's"
cannot explain why "This A is a B" because it does not say anything at all about how
its being an A is connected with its being a B. The information that "they all are" does
not tell us anything about why this one is, except that it suggests that if we want to
know why this one is, we would do well to figure out why they all are. Instead, all A's
are B's helps to explain why, when a sample was taken, it turned out that all of the A's
in the sample were B's. A generalization helps to explain some characteristics of the
set of observations of the instances, but it does not explain the instances themselves.
That the cloudless, daytime sky is blue helps explain why, when I look up, I see the
sky to be blue, but it doesn't explain why the sky is blue. Seen this way, an inductive
generalization does indeed have the form of an inference whose conclusion explains
its premises.
In particular, "A's are mostly B's" together with "This sample of A's was obtained
without regard to whether or not they were B's" explains why the A's that were sam-
pled were mostly B's.
Why were 61% of the chosen balls yellow?

Because the balls were chosen more or less randomly from a population that
was two thirds yellow, the difference from two thirds in the sample being due
to chance.
Alternative explanation for the same observation:
Because the balls were chosen by a selector with a bias for large balls from a
population that was only one third yellow but where yellow balls tend to be larger
than non-yellow ones.
Core claim: The frequencies in the larger population, together with the frequency-
relevant characteristics of the method for drawing a sample, explain the frequencies
in the observed sample.
What is explained? In this example, just the frequency of characteristics in the
sample is explained, not why these particular balls are yellow or why the experiment
was conducted on Tuesday. In general, the explanation explains why the sample fre-
quency was the way it was, rather than having some markedly different value. If there
is a deviation in the sample from what you would expect, given the population and the
sampling method, then you have to throw some Chance into the explanation (which is
more or less plausible depending on how much Chance you have to suppose).
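A small simulation makes the core claim vivid: the population frequency plus the frequency-relevant features of the sampling method fix the distribution of sample frequencies, and two rival explanatory stories can produce the same observed frequency. The proportions, the bias model, and the function names below are illustrative assumptions only, loosely shaped after the ball-drawing example.

```python
import random

def sample_frequency(pop_yellow, n=100, bias_toward_yellow=1.0):
    """Draw n balls; a bias > 1 makes yellow balls likelier to be
    picked (e.g. a selector preferring large balls, where yellow
    balls tend to be larger)."""
    p = (pop_yellow * bias_toward_yellow) / (
        pop_yellow * bias_toward_yellow + (1 - pop_yellow))
    return sum(random.random() < p for _ in range(n)) / n

random.seed(0)
print(sample_frequency(2/3))                        # unbiased draw: near 2/3
print(sample_frequency(1/3, bias_toward_yellow=4))  # biased draw: also near 2/3
print(sample_frequency(2/3, n=100_000))             # chance deviation shrinks
```

Both rival stories hover around the same observed frequency, which is why independent evidence about the sampling method, and larger samples to shrink the role of Chance, are what discriminate between them.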

How are frequencies explained? An observed frequency is explained by giving a
causal story that explains how the frequency came to be the way it was. This causal
story typically includes both the method of drawing the sample, and the population
frequency in some reference class. Unbiased sampling processes tend to produce
representative outcomes; biased sampling processes tend to produce unrepresentative
outcomes. This "tending to produce" is a fully causal relationship and it supports ex-
planation and prediction. For example, we may say that an outcome has been caused,
in part, by a certain kind of sampling bias; this sampling bias will then be part of the
explanation for existing data, and a basis for predictions. Similarly, outcomes causally
depend partly on the population frequency, which is also explanatory and predictive.
A peculiarity is that characterizing a sample as "representative" is characterizing
the effect (sample frequency) by reference to part of its cause (population frequency).
Straight inductive generalization (carrying the sample frequencies unchanged to the
generalization) is equivalent to concluding that a sample is representative, which is a
conclusion about its cause. Straight inductive generalization depends partly on evi-
dence or presumption that the sampling process is (close enough to) unbiased. The
unbiased sampling process is part of the explanation of the sample frequency, and any
independent evidence for or against unbiased sampling bears on its plausibility as part
of the explanation.
If we do not think of inductive generalizations as abductions, we are at a loss to
explain why such an inference is made stronger or more warranted, if in collecting data
we make a systematic search for counter-instances and cannot find any, than it would
be if we just take the observations passively. Why is the generalization made stronger
by making an effort to examine a wide variety of types of A's? The answer is that it
is made stronger because the failure of the active search for counter-instances tends
to rule out various hypotheses about ways in which the sample might be biased, that
is, it strengthens the abductive conclusion by ruling out alternative explanations for
the observed frequency. If we think that a sampling method is fair and unbiased, then
straight generalization gives the best explanation of the sample frequencies. But if the
sample size is small, alternative explanations, where the frequencies differ, may still be
plausible. These alternative explanations become less and less plausible as the sample
size grows, because the sample being unrepresentative due to chance becomes more
and more improbable. Thus viewing inductive generalizations as abductions shows us
why sample size is important. Again, we see that analyzing inductive generalizations
as abductions shows us how to evaluate the strengths of these inferences.

2.3.1 What is the best generalization?


We have seen that alternative generalizations are parts of alternative explanations, and
that for a generalization to be warranted it must be part of the best explanation in
some contrast set. (It must also be best by a distinct margin and the contrast set must
plausibly exhaust the possibilities.)
We have considered contrast sets where hypotheses differ with respect to the nature
and degree of hypothesized bias in the sampling method, and differ correspondingly
in the hypothesized parent frequency and amount of supposed chance. But generaliza-
tions sometimes differ in other ways, such as differing in the choice of the reference
class for the parent frequency (e.g., is it just humans or all multi-celled animals that
use insulin for glucose control?). This is the A class in "All A's are B's." Generaliza-
tions may also differ in the B class, which amounts to differing on the choice of which
attribute to generalize (e.g., 'Crows are black' versus 'Crows are squawky'). This
example shows clearly that sometimes alternative generalizations from the same data
(e.g., crow observations) are not genuinely contrastive; they do not belong together
in contrast sets from which a best explanation may be drawn by abductive reasoning.
It does not make sense to argue in favor of the generalization, 'Crows are black' by
arguing against 'Crows are squawky.' This is not simply because they are compati-
ble hypotheses; we have seen earlier that alternative explanations for abduction need
not be mutually exclusive. 'Heart attack' and 'indigestion' are compatible explana-
tions for the pain since it is possible for the patient to have both conditions. But in
this case one may argue for one by arguing against the other; they are genuinely con-
trastive. Perhaps the 'black' and 'squawky' generalizations of the crow observations
are not contrastive because different aspects of the observations are explained by the
generalizations, so they are not alternative ways of explaining the same things.
I am puzzled about what principles govern when explanations in general are gen-
uinely contrastive, and when they are not. Explanations of different causal types, e.g.,
final cause, efficient cause, are not usually contrastive. Yet within the same causal
type, explanations may not be contrastive, e.g., whether we blame the death on the
murderer or the heart stoppage.
Peter Flach has suggested that finding the right level of generality is most of the
work in forming good generalizations (mailing-list communication). It is interesting
to note that alternative levels of generality do seem to be genuinely contrastive. Let
us suppose that the attribute to be generalized is fixed, and we want to determine the
best reference class for the parent frequency. For concreteness, let us suppose that
we have noticed (and have carefully recorded data showing that) on average, week-
ends are rainier than weekdays. Our observations are taken in New York. Alternative
generalizations include that weekend days are rainier in New York, on the east coast
of North America, on Earth. These generalizations lead to different predictions and
perhaps have differing levels of plausibility based on background knowledge about
plausible mechanisms. (The best generalization in this case seems to be the east coast
of North America, according to a recent report in Nature.)
We have seen that contrastive alternative generalizations may differ in their degrees
of plausibility, for example by supposing more or less chance or by hypothesizing
kinds of sampling bias that are made more or less plausible by background knowl-
edge. Alternative explanations with alternative generalizations may also differ in other
virtues, such as explanatory power, predictive power, precision (e.g., in hypothesized
frequency), specificity (of reference class), consistency (internal consistency of the ex-
planation), simplicity (e.g., complicated versus simple account of bias), and theoretical
promise (e.g., whether the generalization is both testable and theoretically suggestive).

2.4 CONCLUSION
I have argued that it is possible to treat every "smart" inductive generalization as an in-
stance of abduction, and that analyzing inductive generalizations as abductions shows
us how to evaluate the strengths of these inferences. Good inductive generalizations
are good because they are parts of best explanations. I conclude that inductive general-
izations derive their epistemic warrants from their natures as abductions. The warrant
of an inductive generalization is not evident from its form as an inductive generaliza-
tion, but only from its form as an abduction.
If this conclusion is correct, it follows that computational mechanisms for inductive
generalization must be abductively well-constructed, or abductively well-controlled,
if they are to be smart and effective.
We can easily imagine that inference to the best explanation (IBE) and adopting
the best plan (ABP) might rely on the capabilities of a single underlying mechanism
able to generate causal stories. Both admit of separation into two stages: (1) gen-
erating and evaluating alternatives, and (2) the decision to adopt. (This is not the same as
separation into propose and evaluate, which I argued against earlier.) However, these
pieces of intelligence are functionally different: IBE leads to belief and ABP leads to
decision to act. If we assume the existence of supervisory control by a problem solver
that is able to keep track of goals, reason hypothetically, and make predictions based
on causal stories, then an important challenge for artificial intelligence is to develop an
integrated framework that unifies both general and specialized methods for generating
plausible causal stories.

Acknowledgments
Parts of this essay are adapted from (Josephson and Josephson, 1994, Chapter 1), "Conceptual
analysis of abduction" (used with permission). I especially want to thank Richard Fox for many
helpful comments on an earlier draft of this chapter.
3 ABDUCTION AS EPISTEMIC CHANGE: A PEIRCEAN MODEL IN ARTIFICIAL INTELLIGENCE

Atocha Aliseda

3.1 INTRODUCTION
Charles S. Peirce's abductive formulation ((Peirce, 1958, 5.189), reproduced on p.7),
has been the point of departure of many recent studies on abductive reasoning in arti-
ficial intelligence, such as in logic programming (Kakas et al., 1992), knowledge ac-
quisition (Kakas and Mancarella, 1994) and natural language processing (Hobbs et al.,
1990).
Nevertheless, these approaches have paid little attention to the elements of this for-
mulation and none to what Peirce said elsewhere in his writings. This situation may be
due to the fact that his philosophy is very complex and not easy to implement in
the computational realm. The notions of logical inference and of validity that Peirce
puts forward go beyond logical formulations. They are linked to his epistemology, a
dynamic view of thought as logical inquiry. In our view, however, there are several
aspects of Peirce's abduction which are tractable and may be implemented using ma-
chinery of artificial intelligence (AI), such as that found in theories of belief revision.
In this chapter, we propose abduction as an epistemic process for the acquisition of
knowledge and present a model which combines elements from Peirce's epistemology
and theories of epistemic change in AI originated in (Alchourrón et al., 1985). In
particular, our interest is on the role played by the element of surprise in the abductive
formulation; and its connection to the epistemic transition between the states of doubt
and belief.


A natural consequence of our interpretation is that the logical form of abduction
is that of an epistemic process instead of a logical argument, as it has generally been
understood. This approach also contributes to placing Peirce's notion of abduction in
the epistemological agenda of AI. As for induction, in our view, it may be described
as part of abduction; therefore we will make no special analysis of induction, but only
show how it fits in our taxonomy for abductive reasoning.
The chapter is organized as follows. In Section 3.2 the terminological confusions
between abduction and induction are discussed. Section 3.3 presents abduction from
three perspectives: the notion of abduction in Peirce, abduction viewed as logical
inference in AI, and our own view based on several parameters which determine types
of abduction. Section 3.4 describes the epistemological models of Peirce and those in
AI. In Section 3.5 our model for abduction as epistemic change is introduced. Finally,
in Section 3.6 our conclusions are offered.

3.2 ABDUCTION AND INDUCTION


Once beyond deductive logic, diverse terminologies are being used. Perhaps the most
widely used term is inductive reasoning (Mill, 1843; Salmon, 1990; Holland et al.,
1986; Thagard, 1988; Flach, 1995). For C.S. Peirce, as we shall see, 'deduction',
'induction' and 'abduction' formed a natural triangle - but the literature in general
shows many overlaps, and even confusions.
Since the time of John Stuart Mill (1806-1873), the technical name given to all
kinds of non-deductive reasoning has been 'induction', though several methods for
discovery and demonstration of causal relationships (Mill, 1843) were recognized.
These included generalizing from a sample to a general property, and reasoning from
data to a causal hypothesis (the latter further divided into methods of agreement, dif-
ference, residues, and concomitant variation). A more refined and modern terminol-
ogy is 'enumerative induction' and 'explanatory induction'. Some instances of these
are: 'inductive generalization', 'predictive induction', 'inductive projection', 'statisti-
cal syllogism' and 'concept formation'.
Such a broad connotation of the term 'induction' continues to the present day. For
instance, in the so called "computational philosophy of science", induction is under-
stood "in the broad sense of any kind of inference that expands knowledge in the face
of uncertainty" (Thagard, 1988, p.54). In artificial intelligence, 'induction' is used for
the process of learning from examples - but also for creating a theory to explain the
observed facts (Shapiro, 1991). Under this view, abduction is viewed as an instance of
induction, when the observation is a single fact.
On the other hand, some authors regard abduction under the name of inference
to the best explanation, as the basic form of non-deductive inference (Harman, 1965),
and consider (enumerative) induction as a special case. But we must approach the term
'abduction' with care. Given a fact to be explained, there are often several possible
explanations, but only one that counts as the best one. Thus, abduction is connected
to both hypothesis generation and hypothesis selection. Some authors consider these
processes as two separate steps, construction dealing with what counts as a possible
explanation, and selection with applying some preference criterion over possible ex-
planations to select the best one. Other authors regard abduction as a single process by

which a single best explanation is constructed. While the latter view considers finding
the best explanation as fundamental for abduction (an approach shared by Joseph-
son and Psillos in this volume), abduction understood as the construction of explana-
tions, regards the notion of explanation as more fundamental, as shown in (Aliseda,
1996b; Denecker et al., 1996), Bessant (this volume), and Flach (this volume).
To clear up all these conflicts, which are terminological to a large extent, one might
want to coin new terminology altogether. I have argued for a new term of "explana-
tory reasoning" in (Aliseda, 1996b; Aliseda, 1997), trying to describe its fundamental
aspects without having to decide if they are instances of either abduction or induction.
However, for the purposes of this chapter, rather than introducing new terminology, I
shall use the term 'abduction' for the basic type of explanatory reasoning.
Our focus is on abduction as hypothesis construction. It is a general process of
explanation which is best described by a taxonomy (cf. Section 3.3.2) in which sev-
eral parameters (inference, triggers, outcomes) determine types of abduction. More
precisely, we shall understand abduction as reasoning from a single observation to its
explanations, and induction as enumerative induction from samples to general state-
ments. Therefore, abduction (when properly generalized) encloses (some cases of)
induction as one of its instances, when the observations are many and the outcome a
universal statement.

3.3 THE NOTION OF ABDUCTION


3.3.1 Abduction in the work of Peirce
The intellectual enterprise of Charles Sanders Peirce, in its broadest sense, was to
develop a semiotic theory, in order to provide a framework to give an account of thought and language. With regard to our purposes, the fundamental question Peirce
addressed was how synthetic reasoning was possible.¹ Very much influenced by the
philosophy of Immanuel Kant, Peirce's aim was to extend his categories and correct
his logic.
"According to Kant, the central question of philosophy is 'How are synthetical
judgments a priori possible?' But antecedently to this comes the question how
synthetical judgments in general, and still more generally, how synthetical reason-
ing is possible at all. When the answer to the general problem has been obtained,
the particular one will be comparatively simple. This is the lock upon the door of
philosophy." (Peirce, 1958, 5.348), quoted in (Hookway, 1992, p.l8)

Peirce proposes abduction to be the logic for synthetic reasoning, a method to ac-
quire new ideas. He was the first philosopher to give to abduction a logical form.
However, his notion of abduction is a difficult one to unravel. On the one hand, it is
entangled with many other aspects of his philosophy, and on the other hand, several
different conceptions of abduction evolved in his thought. We will point out a few
general aspects of his theory of inquiry, and later concentrate on some of its more
logical aspects.

¹See p.6 for Peirce's classification of inferences into analytic and synthetic.
The development of a logic of inquiry occupied Peirce's thought since the begin-
ning of his work. In the early years he thought of a logic composed of three modes
of reasoning: deduction, induction and hypothesis, each of which corresponds to a
syllogistic form (p.5). Of these, deduction is the only reasoning which is completely
certain, inferring its 'Result' as a necessary conclusion. Induction produces a 'Rule'
validated only in the 'long run' (Peirce, 1958, 5.170), and hypothesis merely suggests
that something may be 'the Case' (Peirce, 1958, 5.171).
Later on, Peirce proposed these types of reasoning as the stages composing a
method for logical inquiry, of which hypothesis (now called abduction), is the be-
ginning:
"From its [abductive] suggestion deduction can draw a prediction which can be
tested by induction". (Peirce, 1958, 5.171)

The notion of abduction is then enriched by the more general conception of "the process of forming an explanatory hypothesis" (Peirce, 1958, 5.171), and the syllogistic
form is replaced by the often-quoted logical formulation (p.7).
For Peirce, three aspects determine whether a hypothesis is promising: it must be
explanatory, testable, and economic. A hypothesis is an explanation if it accounts
for the facts, according to the abductive formulation. Its status is that of a suggestion
until it is verified, which explains the need for the testability criterion. Finally, the
motivation for the economic criterion is twofold: a response to the practical problem of
having innumerable explanatory hypotheses to test, as well as the need for a criterion
to select the best explanation amongst the testable ones.
Moreover, abductive reasoning is essential for every human inquiry. It plays a role
in perception, in which:
"The abductive suggestion comes to us as a flash" (Peirce, 1958, 5.181)

As well as in the general process of invention:


"It [abduction] is the only logical operation which introduces any new ideas"
(Peirce, 1958, 5.171)

In all this, abduction is both "an act of insight and an inference" as has been claimed
by (Anderson, 1986), who suggests a double aspect of abduction: an intuitive and a
rational one.

Interpreting Peirce's abduction. The notion of abduction has puzzled Peirce schol-
ars all along. Some have concluded that Peirce held no coherent view on abduction at
all (Frankfurt, 1958), others have tried to give a joint account with induction (Reilly,
1970) and still others claim it is a form of inverted modus ponens (Anderson, 1986).
A more modern view is found in (Kapitan, 1990), who interprets Peirce's abduction as a form of heuristics. An account that tries to make sense of the two extremes of abduction, both as a guessing instinct and as a rational activity, is found in (Ayim, 1974). This last approach continues to the present day. While (Debrock, 1997) proposes
to reinterpret the concept of rationality to account for these two aspects, (Gorlee, 1997)
shows abductive inference in language translation, a process in which the best possi-
ble hypothesis is sought using instinctive as well as rational elements of translation.
Thus, abductive inference is found in a variety of contexts. To explain abduction in perception, (Roesler, 1997) offers a reinterpretation of Peirce's abductive formulation,
whereas (Wirth, 1997) uses the notion of 'abductive competence' to account for lan-
guage interpretation.
In AI circles, Peirce's abductive formulation has been generally interpreted as the
following logical argument-schema:
C
A → C
-----
A

where the status of A is tentative (it does not follow as a logical consequence from the premises).
However intuitive, this interpretation certainly captures neither the fact that C is
surprising nor the additional criteria Peirce proposed. Moreover, the interpretation of
the second premise is not committed to material implication. In fact, Flach and Kakas
(this volume) argue that this is a vacuous interpretation and favour one of classical
logical entailment. But other interpretations are possible; any other non-standard form of logical entailment, or even a computational process in which A is the input and C the output, are all feasible interpretations for "if A were true, C would be a matter of course".
The additional Peircean requirements of testability and economy are not recognized
as such in AI, but are nevertheless incorporated. The latter criterion is implemented as
a further selection process to produce the best explanation, since there might be several
formulae which satisfy the above formulation but are not appropriate as explanations.
As for the testability requirement, when the second premise is interpreted as A ⊨ C (or
any other form of logical entailment) this requirement is trivialized, since given that C
is true, in the simplest sense of 'testable', A will always be testable.
We leave here the reconstruction of Peirce's notion of abduction. A nice concise
account of the development of abduction in Peirce, which clearly distinguishes three stages in the evolution of his thought, is given in (Fann, 1970). Another key reference
on Peirce's abduction, in its relation to creativity in art and science is found in (An-
derson, 1987). As to more general semiotic aspects of Peirce's philosophy, another
proposal for characterizing abduction in AI is found in (Kruijff, 1995).

3.3.2 Abduction in artificial intelligence


Of the many areas in AI in which abduction is studied, our focus will be on logic
based approaches. The general trend in this field is to interpret abduction as reversed
deduction plus a consistency requirement (p.13). In addition to these conditions, it is often required that an explanation α be 'minimal' (but there are several ways to characterize minimality, see (Aliseda, 1997)), and have some restricted syntactical form (usually an atomic formula).
An additional condition not always made explicit is that Θ ⊭ φ (the fact to be explained should not already follow from the background theory alone).
In our view, this characterization of abduction is but one instance of a general form
for abductive reasoning, which we proceed to present.
A taxonomy for abduction. Abduction is a general process of explanation, whose products are specific explanations, with a certain inferential structure. As for the logi-
cal schema for abduction, it may be viewed as a threefold relation:
Θ, α ⇒ φ
between an observation φ, an abduced item α (the explanation), and a background theory Θ.² Against this background, I have proposed three main parameters that determine types of abduction. (i) An 'inferential parameter' (⇒) sets some suitable logical relationship among explanans, background theory, and explanandum. This may be classical semantic entailment, statistical inference, or even any non-standard interpretation of logical consequence. (ii) Next, 'triggers' determine what kind of abduction is to be performed: φ may be a novel phenomenon, or it may be in conflict with the theory Θ, in which case the phenomenon is anomalous. (iii) Finally, 'outcomes' (α) are the various products of an abductive process: singular facts, universal statements, or even new theories.
This proposed taxonomy generalizes the standard one of abduction as logical inference in AI. It goes further in that it does not limit the underlying consequence to be classical, nor the form of 'abducible outcomes' to be singular facts, as is the case in most approaches. Admittedly, though, the proposed logical schema gives neither necessary nor sufficient conditions for the existence of an explanatory relation between Θ, α and φ. There are many well-known cases of explanation in philosophy of science which do not fit this schema, as well as cases in which additional conditions are required, such as consistency, relevance and minimality. But we think this schema gives us enough already to characterize abduction as explanation. Moreover, it remains unclear whether the requirements of relevance and minimality can be given in purely logical (classical or otherwise) terms. (Cf. (Aliseda, 1996b) for motivation on the proposed taxonomy and (Aliseda, 1997) for many more details, including logical characterizations of several 'abductive logics' generated by setting the above parameters and additional conditions.)
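As a bare-bones rendering of the threefold relation, one might write the following (a sketch of my own; 'entails' stands in for whichever interpretation of ⇒ is chosen):

    # The schema Theta, alpha => phi with the inferential parameter
    # passed in as a function, so that classical, statistical or other
    # non-standard consequence relations can be plugged in.
    def explains(theory, alpha, phi, entails):
        return entails(theory | {alpha}, phi)

As just noted, nothing in this rendering enforces consistency, relevance or minimality; those remain additional conditions.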
Concerning parameter (ii), the one in focus for this chapter, abduction is generally understood as a process in which the observation φ to be explained is novel, that is, neither φ nor ¬φ are explained by the background theory. In our interpretation of abduction, this is just one dimension of abductive reasoning. The other one is when the observation φ is an anomaly, a fact in conflict with the theory.
Thus, we identify at least two triggers for abduction: novelty and anomaly, which
we characterize within our model as follows:

• Abductive Novelty: Θ ⇏ φ, Θ ⇏ ¬φ

• Abductive Anomaly: Θ ⇏ φ, Θ ⇒ ¬φ.
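Procedurally, and assuming some entailment test implementing ⇒, the two triggers can be told apart as in this sketch (the names and the toy negation are illustrative assumptions):

    def neg(phi):
        # toy negation on string atoms, for illustration only
        return phi[1:] if phi.startswith("~") else "~" + phi

    def trigger(theory, phi, entails):
        # Classify an observation phi relative to theory, given an
        # entailment test implementing the parameter "=>".
        if entails(theory, phi):
            return "no abductive problem"  # phi already follows
        if entails(theory, neg(phi)):
            return "anomaly"               # Theta =/=> phi, Theta => ~phi
        return "novelty"                   # Theta =/=> phi, Theta =/=> ~phi

    # With a deliberately crude membership test standing in for entailment:
    print(trigger({"~w"}, "w", lambda th, f: f in th))   # -> 'anomaly'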


²Philosophers of science may think it is inappropriate to call α the explanation. In this tradition the explanation is the whole schema, such as the one for the D-N model of explanation, in which the explanans (the general laws and antecedent conditions) are related to the explanandum (the phenomenon to be explained) by logical entailment.
In the computational literature on abduction, novelty is the condition for an abductive problem (Kakas et al., 1992). My suggestion is to incorporate anomaly as a
second basic type.
Moreover, it seems straightforward to account for (many cases of) induction as an
instance of abduction within this schema. When induction is interpreted as inductive generalization, the reasoning is triggered by repeatedly finding a novelty in a set of observations, and the outcome is a general statement that generalizes the novel property found in the observations. When it is viewed as statistical induction, the inferential parameter is set to probable inference.

3.4 EPISTEMIC CHANGE


3.4.1 Peirce's epistemology
In Peirce's epistemology, thought is a dynamic process, essentially an interaction be-
tween two states of mind: doubt and belief. While the essence of the latter is the
"establishment of a habit which determines our actions" (Peirce, 1958, 5.388), with
the quality of being a calm and satisfactory state in which all humans would like to
stay, the former "stimulates us to inquiry until it is destroyed" (Peirce, 1955b), and
it is characterized by being a stormy and unpleasant state from which every human
struggles to be freed:
"The irritation of doubt causes a struggle to attain a state of belief'. (Peirce,
1955b).
Peirce speaks of a state of belief and not of knowledge. Thus, the pair 'doubt-belief' is a cycle between two opposite states. While belief is a habit, doubt is its privation. Doubt, however, Peirce claims, is not a state generated at will by raising a question, just as a sentence does not become interrogative by putting a special mark on it; there must be a real and genuine doubt:
"genuine doubt always has an external origin, usually from surprise; and that it
is as impossible for a man to create in himself a genuine doubt by such an act of
the will as would suffice to imagine the condition of a mathematical theorem, as
it would be for him to give himself a genuine surprise by a simple act of the will."
(Peirce, 1958, 5.443)
Moreover, it is surprise that breaks a habit:
"For belief, while it lasts, is a strong habit, and as such, forces the man to believe
until some surprise breaks up the habit". ((Peirce, 1958, 5.524), my emphasis).
And Peirce distinguishes two ways to break a habit:
"The breaking of a belief can only be due to some novel experience" (Peirce, 1958,
5.524) or " ...until we find ourselves confronted with some experience contrary to
those expectations." ((Peirce, 1958, 7.36), my emphasis).

Peirce's epistemic model proposes two varieties of surprise as the triggers for ev-
ery inquiry, which we relate to the previously proposed novelty and anomaly (cf. Sec-
tion 3.3.2). We will see (Section 3.5.1) how these are related to the epistemic opera-
tions for belief change in AI.
3.4.2 Epistemic theories in artificial intelligence


Notions related to explanation have also emerged in theories of belief change in AI.
One does not just want to incorporate new beliefs, but often also, to justify them. The
main motivation of these theories is to develop logical and computational mechanisms
to incorporate new information to a scientific theory, data base or set of beliefs. Dif-
ferent types of change are appropriate in different situations. Indeed, the pioneering
work of Carlos Alchourrón, Peter Gärdenfors and David Makinson (often referred to as the AGM approach) (Alchourrón et al., 1985), proposes a normative theory of epis-
temic change characterized by the conditions that a rational belief change operator
should satisfy.
My discussion of epistemic change is in the same spirit, taking a number of cues
from their analysis.³ I concentrate on belief revision, where changes occur only in the theory. The situation or world to be modelled is supposed to be static, only new information is coming in. (The other type of epistemic change in AI which accounts for a changing world is called update.)

³However, the material of this section is mainly based on (Gärdenfors and Rott, 1995).

The basic elements of this theory are the following. Given a consistent theory Θ closed under logical consequence, called the belief state, and a sentence φ, the incoming belief, there are three epistemic attitudes for Θ with respect to φ: either φ is accepted (φ ∈ Θ), φ is rejected (¬φ ∈ Θ), or φ is undetermined (φ ∉ Θ, ¬φ ∉ Θ). Given these attitudes, the following operations characterize the kind of belief change φ brings into Θ, thereby effecting an epistemic change in the agent's currently held beliefs:

• Expansion
A new sentence is added to Θ regardless of the consequences of the larger set to be formed. The belief system that results from expanding Θ by a sentence φ together with the logical consequences is denoted by Θ + φ.

• Revision
A new sentence that is (typically) inconsistent with a belief system Θ is added, but in order that the resulting belief system be consistent, some of the old sentences in Θ are deleted. The result of revising Θ by a sentence φ is denoted by Θ ∗ φ.

• Contraction
Some sentence in Θ is retracted without adding any new facts. In order to guarantee the deductive closure of the resulting system, some other sentences of Θ may be given up. The result of contracting Θ with respect to sentence φ is denoted by Θ − φ.

While expansion can be uniquely and easily defined (Θ + φ = {α | Θ ∪ {φ} ⊢ α}), this is not so with contraction or revision. A simple example to illustrate this point is the following:
Θ: r, r → w.
φ: ¬w.

In order to incorporate φ into Θ and maintain consistency, the theory must be revised. But there are two possibilities for doing this: deleting either of r → w or r allows us to then expand the contracted theory with ¬w consistently. Several formulas can be retracted to achieve the desired effect; thus it is impossible to state in purely logical or set-theoretical terms which of these is to be chosen.
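The impossibility of a purely logical choice is easy to confirm by brute force, as in this sketch (entirely illustrative; formulas are encoded as truth-functions over the two atoms):

    from itertools import product

    # Each belief is a (name, truth-function over a valuation) pair.
    r      = ("r",      lambda v: v["r"])
    r_to_w = ("r -> w", lambda v: (not v["r"]) or v["w"])
    not_w  = ("~w",     lambda v: not v["w"])

    def consistent(beliefs, atoms=("r", "w")):
        # Satisfiability by exhaustive search over all valuations.
        return any(all(f(dict(zip(atoms, vals))) for _, f in beliefs)
                   for vals in product([True, False], repeat=len(atoms)))

    theta = [r, r_to_w]
    print(consistent(theta + [not_w]))        # False: revision is needed
    for dropped in theta:
        kept = [b for b in theta if b is not dropped]
        print("retract", dropped[0], "->", consistent(kept + [not_w]))
    # Both single retractions restore consistency, so logic alone
    # cannot decide between them.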
Therefore, an additional criterion must be incorporated in order to fix which formula to retract. Here, the general intuition is that changes to the theory should be kept 'minimal', in some sense of informational economy.⁴
Moreover, epistemic theories in this tradition observe certain 'integrity constraints',
which concern the theory's preservation of consistency, its deductive closure and two
criteria for the retraction of beliefs: the loss of information should be kept minimal
and the less entrenched beliefs should be removed first.
These are the very basics of the AGM approach. In practice, however, full-fledged
systems of belief revision can be quite diverse. They differ in at least three aspects: (a)
belief state representation (sets, bases, possible worlds or probabilities over sentences
or worlds), (b) characterization of the operations of epistemic change (via postulates or
constructively), and (c) epistemological stance. This last aspect concerns the epistemic
quality to be preserved. While the foundationalists argue that beliefs must be justified
(with the exception of a selected set of 'basic beliefs'), the coherentists consider it
a priority to maintain the overall coherence of the system and reject the existence of
basic beliefs.
Therefore, each theory of epistemic change may be characterized by its represen-
tation of belief states, its description of belief revision operations, and its stand on
the main properties of sets of beliefs one should be looking for. These choices may
be interdependent. Say, a constructive approach might favor a representation by be-
lief bases, and hence define belief revision operations on some finite base, rather than
the whole background theory. Moreover, the epistemological stance determines what
constitutes rational epistemic change. The foundationalist accepts only those beliefs
which are justified in virtue of other basic beliefs, thus having an additional challenge
of computing the reasons for an incoming belief. On the other hand, the coheren-
tist must maintain coherence, and hence make only those minimal changes which do
not endanger (at least) consistency (however, coherence need not be identified with
consistency).
In particular, the AGM paradigm represents belief states as sets (in fact, as theories
closed under logical consequence), provides 'rationality postulates' to characterize the
belief revision operations, and finally, it advocates a coherentist view.

⁴Various ways of dealing with this issue occur in the literature. I mention only that in (Gärdenfors, 1988). It is based on the notion of entrenchment, a preferential ordering which lines up the formulas in a belief state according to their importance. Thus, we may retract those formulas which are the 'least entrenched' first. For a more detailed reference as to how this is done exactly, see (Gärdenfors and Makinson, 1988).
3.5 ABDUCTION AS EPISTEMIC CHANGE


In AI, practical connections of abduction to theories of belief revision have often been
noted. But these in general use abduction to determine explanations of incoming be-
liefs or as an aid to perform the epistemic operations for belief revision. Of many
references in the literature, we mention (Williams, 1994) (which studies the relation-
ship between explanations based on abduction and 'Spohnian reasons') and (Aravin-
dan and Dung, 1994) (which uses abductive procedures to realize contractions over
theories with 'immutability conditions'). Some work has been done relating abduc-
tion to knowledge assimilation (Kakas and Mancarella, 1994), in which the goal is to
assimilate a series of observations into a theory maintaining its consistency.
Our claim will be stronger. Abduction can function in a model of theory revision
as a means to determine explanations for incoming beliefs. But also more generally,
abductive reasoning itself provides a model for epistemic change. It may be described
by two operations: either as an (abductive) expansion, where the background theory
gets extended to account for a novel fact, or as an (abductive) revision, in which the
theory needs to be revised to account for an anomalous fact. Belief revision theories
provide an explicit calculus of modification for both cases and indeed serve as a guide
to define abductive operations for epistemic change. In AI, characterizing abduction
via belief operators goes back to (Levesque, 1989). More recently, other work has
been done in this direction (Pagnucco, 1996; Lobo and Uzcátegui, 1998). As we shall
see, while the approaches share the same intuition, they all differ in implementation.
In philosophy, the idea of abduction as epistemic change is already present in
Peirce's philosophical system, in the connection between abduction and the epistemic
transition between the mental states of doubt and belief. It shows itself very clearly in
the fact that surprise is both the trigger of abductive reasoning - as indicated by the
first premise of the logical formulation - as well as the trigger of the state of doubt,
when a belief habit has been broken. Abductive reasoning is a process by which
doubts are transformed into beliefs, since an explanation for a surprising fact is but a
suggestion to be tested thereafter. Moreover, one aspect of abduction is related to the
"Ramsey test" (Ramsey, 1931): given a conditional sentence a-t cp, a is a reason for
cp iff revising your current beliefs by a causes cp to be believed. This test has been the
point of departure of much epistemological work both in philosophy and in AI.
Next we propose two abductive epistemic operations for the acquisition of knowl-
edge, following those in the AGM theory while being faithful to our interpretation of
Peirce's abduction and proposed taxonomy. Then follows a discussion of abductive epistemic theories, in which we sketch ours and compare it to other work.

3.5.1 Abductive operations for epistemic change


The previously defined abductive novelty and abductive anomaly correspond, respectively, to the AGM epistemic attitudes of undetermination and rejection (provided that ⇒ is ⊢ and Θ is closed under logical consequence).
In our account of abduction, both a novel phenomenon and an anomalous one in-
duce a change in the original theory. The latter calls for a revision and the former
for expansion. So, the basic operations for abduction are expansion and revision.⁵
Therefore, two epistemic attitudes and changes in them are reflected in an abductive
model.
Here, then, are the abductive operations for epistemic change:

• Abductive Expansion
Given an abductive novelty φ, a consistent explanation α for φ is computed in such a way that Θ, α ⇒ φ, and then added to Θ.

• Abductive Revision
Given an abductive anomaly φ, a consistent explanation α is computed as follows: the theory Θ is revised into Θ' so that it does not explain ¬φ. That is, Θ' ⇏ ¬φ, where Θ' = Θ − {β₁, ..., βₙ}.⁶
Once Θ' is obtained, a consistent explanation α is calculated in such a way that Θ', α ⇒ φ, and then added to Θ'.
Thus, the process of revision involves both contraction and expansion.
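Schematically, the two operations combine into a single update step, as in the following sketch of my own (entails, abduce and contract are assumed black boxes, and the theory is represented as a finite set of formulas rather than a logically closed one):

    def abductive_update(theory, phi, entails, abduce, contract, neg):
        # Abduction as epistemic change: expand on a novelty,
        # contract-then-expand on an anomaly. All helpers are assumed.
        if entails(theory, phi):
            return theory                        # phi is not surprising
        if entails(theory, neg(phi)):            # anomaly: revise first
            theory = contract(theory, neg(phi))  # Theta' no longer explains ~phi
        alpha = abduce(theory, phi)              # Theta', alpha => phi
        return theory | {alpha}                  # phi now follows from the result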

In one respect, these operations are more general than their counterparts in the
AGM theory, since incoming beliefs are incorporated into the theory together with
their explanation (when the theory is closed under logical consequence). But in the
type of sentences to accept, they are more restrictive. Given that in our model non-surprising facts (where Θ ⇒ φ) are not candidates for being explained, abductive expansion does not apply to already accepted beliefs, and similarly, revision only accepts rejected facts. Other approaches, however, do not commit themselves to the preconditions of novelty and anomaly that we have set forward. Pagnucco's abductive expansion (Pagnucco, 1996) is defined for an inconsistent input, but in this case the resulting state stays the same. Lobo and Uzcátegui's abductive expansion (Lobo and Uzcátegui, 1998) is even closer to standard AGM expansion; it is in fact the same when every atom is "abducible".

⁵Indeed, the three belief change operations can be reduced to two of them, since revision and contraction may be defined in terms of each other. In particular, revision here is defined as a composition of contraction and expansion: first contract those beliefs of Θ that are in conflict with φ, and then expand the modified theory with sentence φ (known as 'Levi's identity').
⁶In many cases, several formulas, and not just one, must be removed from the theory. The reason is that sets of formulas which entail (explain) ¬φ should be removed. E.g., given Θ = {α → β, α, β} and φ = ¬β, in order to make Θ, ¬β consistent, one needs to remove either {β, α} or {β, α → β}.

3.5.2 Abductive epistemic theories


Once we interpret abductive reasoning as a model for epistemic change, along the lines of those proposed in AI, the next question is: what kind of theory is an abductive epistemic theory?
As we saw in Section 3.4.2, there are several choices for representing belief states,
for characterizing the operations for epistemic change and finally, an epistemic the-
ory adheres to either the foundationalist or the coherence trend. The predominant line
of what we have examined so far is to stay close to the AGM approach. That is, to represent belief states as sets (in fact as closed theories), and to characterize the abductive operations of expansion and revision through definitions, rationality postulates and a number of constructions motivated by those for AGM contraction and revision.
As for epistemological stance, Pagnucco is careful to keep his proposal away from
being interpreted as foundationalist; he thinks that having a special set of beliefs like
the "abducibles", as found in (Lobo and Uzcategui, 1998), is "against the coherentist
spirit of the AGM" (Pagnucco, 1996, p.174).
In our view, an abductive theory for epistemic change which aims to model Peirce's
abduction naturally calls for a procedural approach. It should produce explanations for
surprising phenomena and thus transform a state of doubt into one of belief. The AGM
postulates describe expansions, contractions and revisions as epistemic products rather
than processes in their own right. Their concern is with the nature of epistemic states,
not with their dynamics. This gives them a 'static' flavour, which may not always be
appropriate.
Therefore, we aim at giving a constructive model in which abduction is an epistemic
activity. In (Aliseda, 1997), I propose (an extension of) the logical framework of se-
mantic tableaux as a constructive representation of theories, and abductive expansion
and revision operations that work over them. These tableaux analyze non-deductively
closed finite sets of formulas, corresponding to 'belief bases'. For this chapter, however, I leave the precise technical proposal outside of its scope. Let me just mention that semantic tableaux provide a very attractive framework, in which expanding a
theory concerns the addition of new formulae to open branches and contraction cor-
responds to deleting formulas by 'opening' closed branches. Besides implementing
the standard account of contraction, which is done by removing complete formulas,
tableaux offer an alternative route. This is done by removing "subformulas", which is
a more delicate kind of minimal change. We thus offer two strategies for contracting
theories, global and local. We also explore the idea of contracting by revising the
language, which seems a more realistic way to account for inconsistencies, as often
people resolve anomalies by introducing distinctions in the language, rather than by
deleting and expanding their background theory. Moreover, in tableaux consistency is handled in a natural way, and their structure provides a way in which an entrenchment order is easily represented.
As for epistemological stance, here is our position. The main motivation for an ab-
ductive epistemic theory is to incorporate an incoming belief together with its explana-
tion, the belief (or set of beliefs) that justifies it. This fact places abduction close to the above
foundationalist line, which requires that beliefs are justified in terms of other basic be-
liefs. Often, abductive beliefs are used by a (scientific) community, so the claim that
individuals do not keep track of the justifications of their beliefs (Gärdenfors, 1988)
does not apply. On the other hand, an important feature of abductive reasoning is
maintaining consistency of the theory. Otherwise, explanations would be meaningless (especially if ⇒ is interpreted as classical logical consequence; cf. (Aliseda, 1997)).
Therefore, abduction is committed to the coherentist approach as well. This is not
a case of opportunism. Abduction rather demonstrates that the earlier philosophical
stances are not incompatible. Indeed, (Haack, 1993) argues for an intermediate stance
of 'foundherentism'. Combinations of foundationalist and coherentist approaches are
also found in the AI literature (Galliers, 1992).

3.6 DISCUSSION AND CONCLUSIONS


We have argued for an epistemic model of abductive reasoning, along the lines of those proposed for belief revision in AI. However, this connection does not imply that ab-
duction can be equated to belief revision. Let me discuss some reasons for this claim.
On the one hand, in its emphasis on explanations, an abductive model for epis-
temic change is richer than many theories of belief revision. Admittedly, though, not
all cases of belief revision involve explanation, so the greater richness also reflects a
restriction to a special setting. Moreover, in our model, not all input data is epistem-
ically assimilated, but only that which is surprising; those facts which have not been
explained by the theory, or that are in conflict with it. (Even so, one might speculate
whether facts which are merely probable on the basis of Θ might still need explanation
of some sort to further cement their status.)
On the other hand, having a model for abduction within the belief revision frame-
work has imposed some other restrictions which might not be appropriate in a broader
conception of abductive reasoning. One is that our abduction always leads to some
modification of the background theory to account for a surprising phenomenon. There-
fore, we leave out cases in which a fact may be surprising (e.g. my computer is not
working) even though it has been explained in the past. This sense of "surprising"
seems closer to "unexpected"; there is a need to search for an explanation, but it does
not involve any revision whatsoever. Moreover, the type of belief revision accounted
for in this model for abductive reasoning is of a very simple kind. Addition and re-
moval of information are the basic operations to effect the epistemic changes, and the
only criterion for theory revision is explanation. There is neither room for conceptual change, a more realistic type of scientific reasoning, nor place for a finer-grained distinction of kinds of expansion (as in (Levi, 1991)) or of revision. Our approach may also be extended with a theory revision procedure, in which the revised theory is more successful than the original one, along the lines proposed in (Kuipers, 1998).
Even so, the restrictions we set for abduction were of a piece with the aim of this
chapter, namely to provide a working model for Peirce's abduction, at least in respect
of capturing the notion of a "surprising fact" and relating it to the epistemic process of
transforming a state of doubt into one of belief.
The overall abductive epistemic process may be described as follows: a novel or an anomalous fact gives rise to a surprising fact, generating a state of doubt. Therefore, abductive reasoning is triggered, which consists in explaining the surprising fact and
incorporating it into the theory. In the case of an abductive novelty, the explanation is
assimilated into the theory by the operation of expansion. In the case of an abductive
anomaly, the operation of revision is needed to modify the theory and incorporate the
explanation. These two operations lead to a state of belief, which will remain as such
until another surprising fact is encountered.
A natural consequence of this analysis is that the interpretation of Peirce's abduc-
tive formulation goes beyond that of a logical argument. It also involves extending
the traditional view of abduction in AI, to include cases in which the observation is in
conflict with the theory, as we have suggested in our taxonomy (Section 3.3.2).
Regarding the connection to other work, it would be interesting to compare our
approach with that mentioned which follows the AGM line. Although this is not as
straightforward as it may seem (cf. (Aliseda, 1997)), we could at least check whether
our algorithmic constructions of abductive expansion and revision validate the abduc-
tive postulates found in (Pagnucco, 1996) and (Lobo and Uzcátegui, 1998).

Acknowledgments
I am grateful to Michael Hoffmann and Raymundo Morado for discussions on several points of
this chapter, and to Samir Okasha for very helpful comments and suggestions to this chapter.
4
ABDUCTION: BETWEEN
CONCEPTUAL RICHNESS AND
COMPUTATIONAL COMPLEXITY
Stathis Psillos

4.1 INTRODUCTION
The aim of this chapter is two-fold: first, to explore the relationship between abduction
and induction from a philosophical point of view; and second, to examine critically
some recent attempts to provide computational models of abduction. Induction is
typically conceived as the mode of reasoning which produces generalisations over
domains of individuals based on samples. Abduction, on the other hand, is typically
seen as the mode of reasoning which produces hypotheses such that, if true, they would
explain certain phenomena or evidence. Recently there has been increasing interest in the issue of how exactly, if at all, they are related. Two seem to be the
main problems: first, whether or not induction and abduction are conceptually distinct
modes of reasoning; second, whether or not they can be modelled computationally in
the same, or similar, ways. The second issue is explored in some detail by several
chapters in this collection (e.g. the contributions by Aliseda, Mooney and Poole). The
first issue is what the present chapter will concentrate on. My suggestion will be that
abduction is the basic type of ampliative reasoning. It comprises as special cases both
Induction and what the American philosopher Charles Peirce called "the Method of
Hypothesis".
In order to motivate and defend my thesis, I proceed as follows. Section 4.2 de-
scribes the basic logical features of ampliative reasoning. Section 4.3 takes its cue
from Peirce's distinction between Induction and Hypothesis and raises the following
question: should the fact that Induction and Hypothesis admit different logical forms
be taken to indicate that they are conceptually distinct modes of ampliative reasoning?
I answer this question negatively and defend the view that Induction and Hypothesis
are very similar in nature: they are instances of what can be called "explanatory rea-
soning", where explanatory considerations govern the acceptance of the conclusion.
So, I suggest that explanatory reasoning is a basic type of ampliative reasoning, irre-
spective of the specific logical forms it may admit. In Section 4.4, I describe abduction
as the basic type of explanatory reasoning. I suggest that it should be best understood
as Inference to the Best Explanation. In particular, I deal with three problems. First,
how abduction can acquire an eliminative-evaluative dimension; second, how abduc-
tion can produce likely hypotheses; and third, what the nature of explanation is. These
are still open issues and what this chapter aims to do is motivate some ways to address
them. Finally, Section 4.5 discusses some recent computational models of abduction
and notes that there seems to be an inherent tension in the project of modelling abduc-
tion. Simple models of abduction are computationally tractable, but fail to capture the
rich conceptual structure of abductive reasoning. And conversely, conceptually rich
models of abduction become computationally intractable.

4.2 AMPLIATIVE REASONING


It was Charles Peirce who, following Kant's distinction between analytic and syn-
thetic reasoning, called "ampliative" the kind of reasoning in which the conclusion
of the argument goes beyond what is already stated in its premises (2.623).¹ A typical case of ampliative reasoning is the following more-of-the-same type of inference: 'All observed individuals who have the property A also have the property B; therefore, (probably) All individuals who have the property A also have the property B'. This is what is known as the rule of induction, where the conclusion of the argument is a generalisation over the individuals referred to in its premises.

¹All references to Peirce's work are given in the standard form and refer to the relevant volume and paragraph of his collected papers.

Ampliative reasoning is to be contrasted to what Peirce called "explicative reason-
ing". The conclusion of an explicative inference is already included in its premises,
and hence contains no information which is not already, albeit implicitly, in them: the
reasoning process itself merely unpacks the premises and shows what follows logi-
cally from them. Deductive inferences are explicative inferences. In contrast to this,
ampliative reasoning is logically invalid: the conclusions of an ampliative argument
can be false although all of its premises may be true. Consequently, the rules involved
in ampliative reasoning do not guarantee that whenever the premises of an argument
are true the conclusion will also be true. But this is as it should be: the conclusion
of an ampliative argument is adopted on the basis that the premises offer some rea-
son to accept it as plausible. Were it not for the premises, the conclusion would be
unwarranted.
If ampliative reasoning is to be possible at all, one should be reasonable in ac-
cepting the conclusions of ampliative arguments, although further information might
render them wrong. This feature of ampliative reasoning is called defeasibility. It is its
constitutive difference from explicative reasoning. The latter is not defeasible, since
the addition of further information in the premises of a logically valid argument would
not affect the derivation of the original conclusion. When it comes to ampliative rea-
soning, further evidence, which does not affect the truth of the premises, can render
the conclusion false. Take for instance the simple inductive argument: 'All hitherto
observed swans have been white; so, all swans are white'. The observation of a black
swan falsifies its conclusion, without contradicting its premises. Given its defeasi-
bility, one may wonder why ampliative reasoning should be accepted as a legitimate
type of reasoning in the first place. The reason for this is that explicative reasoning
is not concerned with one of the basic aspects of reasoning, viz., how it is reasonable
for someone to form and change their system of beliefs, or the information they hold
true. All that explicative reasoning dictates is that since a certain conclusion logically
follows from a set of premises, its likelihood of being true is at least as great as the
likelihood of the premises being true, and it will remain so when further premises are
added. But this is too thin. Judgements as to whether the conclusion, or the premises,
are probable enough, or even plausible at all, to be accepted fall outside the province of
explicative reasoning. When, for instance, the conclusion of an explicative argument
is not acceptable to a reasoner, at least one of the premises should have to go (or the
integrity of the derivation may be challenged). But explicative reasoning on its own
cannot tell us which premise should go. This requires reasoning based on some con-
siderations of plausibility, and only ampliative reasoning can tell the reasoner what to
count as plausible and what not, given the information available. In order, however, to
avoid a possible misunderstanding, the following should be stressed. There is nothing
wrong with the claim that it is reasonable to accept a statement which logically follows
from other premises accepted by a reasoner. Rather, what needs to be emphasised is
that a) what makes premises acceptable in the first place is some sort of ampliative rea-
soning which renders them plausible, or reasonable, given the evidence available; and
b) if the conclusion of a deductive argument is not acceptable, explicative reasoning
alone cannot tell the reasoner where to revise.
Such opening remarks lead us directly to the problem of justification of ampliative
reasoning: given that ampliative reasoning is not necessarily truth-preserving, how
can it be justified? This is Hume's problem, for although David Hume first raised it for
induction, his challenge concerns ampliative reasoning in general. His point hinges
on the fact that ampliative reasoning is defeasible. Since, the Humean challenge goes,
the premises of an ampliative argument do not logically entail the conclusion, there
are possible worlds in which the premises are true and the conclusion false. How then,
the challenge goes on, can we show that the actual world is one of the possible worlds
in which whenever the premises of an ampliative argument are true, its conclusion is
also true? Or even, how can we show that in the actual world most of the times in
which the premises of an ampliative argument are true, the conclusion is also true?
The Humean challenge is precisely that the only way to do this is bound to presuppose
that ampliative reasoning is rational and reliable; hence, it is bound to beg the question.
What the Humean challenge is taken to suggest is that the premises of an amplia-
tive argument cannot confer warrant or rational support on its conclusion. This is
the central philosophical issue concerning ampliative reasoning. Any substantial de-
fence of the rationality of ampliative reasoning should either solve or dissolve Hume's
problem. Yet, this is not the place to deal with this philosophical problem. Instead,
this chapter will concentrate on another problem, which needs to be dealt with inde-
pendently of the problem of justification. It is the descriptive problem: what is the
structure of ampliative reasoning? No-one, including Hume himself, save Popper,
denies that humans are engaged in ampliative inferential practices. What exactly these
inferential practices involve, and whether or not they admit specific logical forms, are
issues worth looking into. One may call the descriptive problem Peirce's problem,
since Peirce was, arguably, the first who tried to address it systematically.

4.3 EXPLANATORY REASONING: INDUCTION AND HYPOTHESIS


That ampliative reasoning admits specific logical forms goes back to Peirce's early
work on logic and inference. As is well-known (and further explained in the intro-
duction of the present book), early Peirce (2.372-388) attempted to model ampliative
reasoning on the logical form of explicative reasoning. Take, for instance, the follow-
ing typical case of explicative reasoning:
D: All A's are B; a is A; therefore, a is B.
An obvious re-organisation of D is
I: a is A; a is B; therefore, All A's are B.
I (what Peirce originally called "Induction") moves from some observations about a
set of individuals (i.e., that the individuals in the sample are both A and B) and returns
a generalisation over all individuals of a certain domain. But the deductive rule D
above admits of yet another re-organisation:
H: a is B; All A's are B; therefore a is A,

where the premises are a particular known fact (a is B) and a generalisation (All A's
are B), while the conclusion is a particular hypothesis (that a is A).
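For vividness, here are the three patterns side by side in Peirce's bag-of-beans example (my own toy rendering, with premises and conclusion paired as strings):

    # D: rule + case   => result   (deduction, certain)
    # I: case + result => rule     (induction, ampliative)
    # H: rule + result => case     (hypothesis, ampliative)
    rule   = "All the beans from this bag are white"
    case   = "These beans are from this bag"
    result = "These beans are white"

    D = ((rule, case), result)
    I = ((case, result), rule)
    H = ((rule, result), case)

The same three sentences are merely re-organised; only D is truth-preserving.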
The fact that argument-patterns I and H have different logical forms suggests that
there may well be two different and distinct types of ampliative reasoning. While the
argument-pattern I clearly characterises the logical form of the intuitive more-of-the-
same rule of induction, the argument-pattern H is more difficult to characterise. Peirce
called "Hypothesis" (or the "method of hypothesis") the mode of reasoning which cor-
responds to H. It can be illustrated by using Peirce's own example: given the premises
"All the beans from this bag are white" and "These beans are white", one can draw the
hypothetical conclusion that "These beans are from this bag" (2.623). Peirce seems
to have thought that the argument-patterns H and I correspond to two distinct modes
of ampliative reasoning, since he noted that "induction classifies, whereas hypothesis
explains" (2.636). As he put it: "Induction is where we generalise from a number of
cases of which something is true, and infer that the same thing is true of a whole class.
(...) Hypothesis is where we find some very curious circumstance, which would be
explained by the supposition that it was a case of a certain general rule, and thereupon
adopt that supposition" (2.636). However, scholars of his work, most notably (Fann,
1970, p.22-23), suggest that he was not prepared to separate sharply the two forms
of inference, but that he conceived of induction and hypothesis as occupying opposite
ends of the continuum of ampliative inference. Be that as it may, I want to focus on
and defend the following thesis. The different logical forms of Induction and Hypoth-
esis should not obscure the fact that they are very similar in nature: they are instances
of what one may call explanatory reasoning, where explanatory considerations govern
the acceptance of the conclusion. So, I want to suggest that explanatory reasoning is
a basic type of ampliative reasoning, irrespective of the specific logical forms it may
admit. In order to defend this thesis I shall first discuss the case of Induction.
In order to see how a nomological generalisation of the form "All A's are B" is ex-
planatory we need to consider the following contrastive explanation-seeking question:
'Why is this sample of individuals which are A also B, rather than not-B'? (e.g., 'Why
is this sample of ravens black, rather than white'?). When this question is asked, what
is looked for is a relevant difference between an actual case (e.g. the sample containing
only black ravens) and an unactualised, but possible, case (e.g., the sample containing
white ravens, or both white and black ravens) (cf. (Lewis, 1986)). The relevant differ-
ence is that by virtue of a law, the contrastive class of A's which are not B is empty. In
other words, the relevant difference is that there is a nomological connection between
being A and being B which makes it the case that all A's are B. Therefore, the nomo-
logical generalisation "All A's are B" explains why the sample has failed to contain an
individual which is A but not B. This can be suitably extended to statistical generalisa-
tions. The nomological generalisation that x% of A's are B explains why the random
sample of individuals has displayed the observed frequency of A's which are B.²
What needs to be stressed is that good inductive reasoning involves comparison of
alternative explanatory hypotheses. In a typical case, where the reasoning starts from
the premise that 'All A's in the sample are B', there are two possible conclusions that
can be drawn. The first is that the observed correlation in the sample is due to the fact
that the sample is biased. The second is that the observed correlation is due to the fact
that there is a nomological connection between being A and being B such that All A's
are B. Which hypothesis should be chosen as the appropriate conclusion will depend
on explanatory considerations. Insofar as the conclusion "All A's are B" is accepted, it is accepted on the basis that it offers a better explanation of the observed frequencies of A's which are B in the sample, in contrast to the (alternative potential) explanation that someone has biased the sample in order to make us think that all A's are B.³

²A similar point is made by John Josephson in his chapter in the present volume.
³Gilbert Harman (Harman, 1965) has also emphasised this point.

In order to see how hypothetical reasoning of the form H above is explanatory, let's take a toy example. Suppose that we observe a black bird (a is B) and that, by instantiating schema H: {a is B; All A's are B; therefore a is A}, we infer that, given that All ravens are black, this bird is a raven (a is A). We have thereby answered the
explanation-seeking question "Why is individual a B?" by hypothesising that a is A
and by appealing to some sort of nomological connection between being A and being
B. The nomological connection between property A and property B is part of the in-
formation contained in the premises of the explanatory argument H. The conclusion
of the argument, that a is A, does not explain (how could it?) this nomological con-
nection, but it is itself a potential explanation of the observation that a is B only in virtue of this nomological connection. This observation seems to be the essence of
the Hempelian Deductive-Nomological account of explanation (cf. (Hempel, 1965)).
On this account, explanation amounts to nomic expectability. A singular event e (the
explanandum) is explained iff a description of e is the conclusion of a valid deductive
argument, whose premises, the explanans, involve essentially a law-like statement L,
reporting a law of nature, and a set C of initial, or antecedent, conditions. So, the event
e is explained by showing how this event should have been expected, if the relevant
laws and certain initial conditions were taken into account. For instance, on this ac-
count, we offer a potential explanation of the fact that the beer keg exploded in the
basement by citing the law which connects the pressure of a liquid with its tempera-
ture and by appealing to a certain antecedent condition, viz., that the temperature of
the beer in the keg rose rapidly. We therefore explain the explanandum by subsuming
it under a law. It then appears that schema H is nothing but the Hempelian Deductive-
Nomological account of explanation. We ask: why did e happen? And we answer
the question by constructing an argument of the type H above, whose premises are
law-like statements and statements of initial conditions.
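Schematically, the D-N pattern has the familiar form (a standard textbook presentation, not Psillos's own notation):

    L1, ..., Ln    (law-like statements)
    C1, ..., Ck    (antecedent conditions)
    ---------------------------------------------
    e              (description of the explanandum-event)

In the beer-keg case, L is the pressure-temperature law, C the rapid rise in temperature, and e the explosion.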
My suggestion then is that one should simply see both I and H as species of one and
the same genus of reasoning: explanatory reasoning, where hypotheses are being gen-
erated and accepted on an explanatory basis, irrespective of the logical form that these
hypotheses might take. This is not to suggest that there are no differences between I
and H. The reasoning process behind I produces generalisations. In the case of H, the
reasoning begins with the observation that a is B, and then, in one and the same act, it
asserts that this observation would be explained if one hypothesised that there was a
nomological connection between A and B and that a was A. So, the reasoning process
behind H produces both a general hypothesis (asserting the nomological connection
between A and B) and a particular hypothesis (asserting that the individual a is A). In
any case, we shouldn't lose sight of the fact that the generation and acceptance of gen-
eral hypotheses is the product of explanatory reasoning no less than the generation and
acceptance of particular hypotheses. The fact that I can lead only to general hypothe-
ses, whereas H can lead to both general and particular hypotheses does not amount to
a fundamental category difference between the two. In either case, it is explanatory
reasoning which leads to the generation and acceptance of hypotheses, be they general
or particular.⁴

⁴With one possible exception, viz., the predictive inference, as in the case of next-instance induction. There, we move from n observed A's being B to conclude that the next A is going to be B. The conclusion is a singular statement and is clearly non-explanatory. But one may think of predictive inference as parasitic on the implicit generalisation "All A's are B".

4.4 ABDUCTION
In his famous characterisation of abduction, Peirce described "abduction" as the rea-
soning process which proceeds as follows: "The surprising fact C is observed. But if
A were true, C would be a matter of course. Hence, there is reason to suspect that A is
true" (5.189). We can easily see that this process can underlie both argument-patterns
I and H. Suppose that the surprising fact is the particular fact that a is B. This is ex-
plained by saying that if a is A and All A's are B, then a is expected to be B. This piece
of reasoning is nothing but an instance of the argument-pattern H. Suppose now that
we allow the surprising fact to be the correlation of two properties A and B in a sample
of individuals. Then, we can explain this by saying that the sample is the way it is
because All A's are B. This is an instance of the argument-pattern I. So, abduction can incorporate both I and H, and therefore can lead to the generation of generalisations no less than to the generation of hypotheses stating particular facts. Accordingly, I shall reserve the term abduction for explanatory reasoning in general and suggest that abduction comprises both argument-patterns I and H. This may tally with what Peirce himself thought of abduction, when in his later years he introduced abduction as a distinct type of reasoning. But I refrain from engaging in interpretative work.⁵ Instead, I will try to characterise more precisely what exactly is involved in abduction as a reasoning process, drawing on some of Peirce's thoughts and suggestions and pointing out some open problems.

⁵The interested reader should look at the Introduction of this book for a description of Peirce's account of abduction. More relevant literature includes (Burks, 1946; Hanson, 1965; Fann, 1970; Thagard, 1981; Flach, 1996a).

Three, I think, are the big problems that any precise characterisation of abduction faces. The first is what I will call the "multiple explanations problem". The second concerns the connection between the reasoning process behind abduction and the likelihood of the hypotheses that it generates. The third is the nature of explanation itself. Let me consider them in turn.

4.4.1 Abduction as inference to the best explanation


There is a clear sense in which the reasoning process that Peirce's quotation captures
is inadequate. For there are, typically, more than one mutually incompatible hypotheses T₁, ..., Tₙ such that, if true, they would make the explanandum-event e a matter of
course. If the mere fact that the explanatory hypothesis made the explanandum "a
matter of course" were enough to render this hypothesis plausible, then an unlimited
number of mutually incompatible hypotheses would be equally plausible. I shall call
this problem the "multiple explanations problem". As a result of this, one has either
to resolve for the view that abduction is impotent to impose any restriction on the ac-
ceptance of hypotheses, or to beef up the reasoning process behind abduction so that
it acquires an evaluative-eliminative component.
The search for (types of) explanatory hypotheses should be preferential. The search
should aim to create, as Peirce nicely put it (Peirce, 1957, p.254), "good hypotheses".
Consequently, this search should produce an evaluation of hypotheses which ranks
them in an order of preference, reflecting a distinction of hypotheses into better and
worse. Those hypotheses are ranked higher which a) explain all the facts that led to the
search for hypotheses; b) are licensed by the existing background beliefs;6 c) are, as

5The interested reader should look at the Introduction of this book for a description of Peirce's account
of abduction. More relevant literature includes (Burks, 1946; Hanson, 1965; Fann, 1970; Thagard, 1981;
Flach, 1996a).
6Using background beliefs to give a hypothesis a certain place in the order of preference is going to influ-
ence the likelihood of the hypothesis, and hence its acceptability, since background beliefs themselves are,
typically, supported by evidence to some degree.
far as possible, simple; d) have unifying power;7 e) are more testable, and especially,
are such that they entail novel predictions.8 These factors are not algorithmic in charac-
ter, but this does not mean that one cannot decide, on their basis, which hypothesis
should be ranked highest. In fact, in most typical cases, these factors will lead to a
definite conclusion, be it about medical diagnosis or car mechanics, or what have you.
So, for instance, a diagnostician, pretty much like a good car mechanic, will look for
a hypothesis about the cause of symptoms such that: it accounts, if possible, for all
the symptoms; it is consonant with background knowledge as to what types of causes
produce these symptoms; it avoids, in the first instance, attributing the symptoms to
multiple causes; it can yield further predictions that can be tested (e.g., that the patient
will recover if they take a certain medicine which acts on the cause of the symptoms).
It's not implausible to think that although virtually never do we go through all such
factors explicitly when we are engaged in abductive reasoning, all these factors have
nonetheless been internalised by a good reasoner who, then, applies them implicitly
in the case at hand. The internalisation of these factors may well be what Peirce called
"good sense" (7.220). In most typical cases, an explicit reconstruction of the reason-
ing process will reveal the implicit reliance on such factors. Similarly, in most typical
cases, the product of the reasoning will be just one hypothesis which is ranked as most
plausible. But when there is more than one (e.g., the light does not come on; is it
because the light-bulb is gone; because the fuse is blown; or because of a power-cut?),
the reasoning process itself contains obvious resources which will lead to adjudica-
tion. To the extent that the application of these evaluative-ranking criteria marks the
degree of goodness of a hypothesis, it is reasonable to say that abduction is nothing
but what (Harman, 1965) has called "Inference to the Best Explanation". According to
this mode of reasoning, a hypothesis H is accepted on the basis that a) it explains the
evidence and b) no other hypothesis explains the evidence as well as H does. So, not
only is there a reasoning process which underlies abduction, but also this reasoning
process has a certain logical, though not algorithmic, structure.

4.4.2 Abduction and confirmation


It should be clear that the product of abductive reasoning - the explanatory hypothesis
- is not guaranteed to be true. This is not surprising, given that abductive reasoning is
defeasible. But, surely, one may think, what is at issue here is not the obvious fact of
defeasibility. Instead, the objection may be that abductive reasoning cannot return an
explanatory hypothesis which might be reasonably said to be (likely to be) true. For,
one might ask, what reasons would govern such judgements of likelihood? Yet, at the
end of the day, a good reasoner should want to adopt hypotheses that are likely to be
true, or that she has reasons to think that they are likely to be true. Peirce was surely
aware of the problem: "A hypothesis then has to be adopted which is likely in itself
and renders the facts likely. This process of adopting a hypothesis as being suggested

7 Or, breadth, as Peirce put it (7.220-1 & 7.410).


8 cf. Peirce (7.220 & 7.115).
by the facts is what I call abduction" (7.202). The question is: how can abduction be
this process? How, that is, can abduction render the chosen hypothesis likely?
For a plausible solution to this problem we may take our cue from late Peirce's
suggestion that abduction should be seen as part and parcel of the method of enquiry
(cf. 7.202ff.). So, the reasoning process that underlies abduction should be embedded
in a more general framework of inquiry so that the hypotheses generated and evalu-
ated by abduction can be further tested. The result of this testing is the confirmation
or disconfirmation of the hypothesis which, naturally, affects its likelihood to be true.
We should therefore conceive of abduction as the first stage of the reasoner's attempt
to add reasonable beliefs into his belief-corpus in the light of new phenomena or ob-
servations. The process of generation and ranking of hypotheses in terms of plausibil-
ity (abduction) is followed by the derivation of further predictions from the abduced
hypotheses. Insofar as these predictions are fulfilled, the abduced hypothesis gets con-
firmed. Peirce himself thought that the process of generating predictions is deductive
and came to call "Induction" the testing of these predictions, and hence the process of
confirming the abduced hypothesis (cf. 7.202ff). 9 Leaving once again aside some im-
portant interpretative issues, I make the following use of Peirce's idea: although a
hypothesis might be reasonably accepted as the most plausible hypothesis based on
explanatory considerations (abduction), the degree of confidence in this hypothesis is
tied to its degree of subsequent confirmation. The latter has an antecedent input, i.e.,
it depends on how good the hypothesis is (i.e., how thorough the search for other po-
tential explanations was, how plausible the one at hand is as a potential explanation, etc.),
but it also crucially depends on how well-confirmed the hypothesis becomes in light
of further evidence. So, abduction can return likely hypotheses, but only insofar as it
is seen as an integral part of the method of inquiry, whereby hypotheses are further
evaluated and tested.

4.4.3 Explanation
As we have already seen, in his famous characterisation of abduction, Peirce noted
that the abduced hypothesis makes the surprising fact to be explained a matter of
course. This reference to matter of course is not accidental. It suggests that the ex-
planatory hypothesis should be such that it removes the surprise from the occurrence
of the explanandum. But, although it is certainly part of an explanation that it renders
the explanandum non-surprising, what needs to be added is in exactly what ways the
explanandum is rendered non-surprising. And although it is intuitively clear that to
explain an explanandum-event is to provide information about its causal history, there
is substantive disagreement over how exactly we should understand this last claim.
Explanation is effected by pointing to some causal-nomological connections between
the explanandum and the fact that is called upon to do the explaining. But the nature
of these causal-nomological connections is under heavy dispute.

9 "But if [abduction is] to be understood to be a process antecedent to the application of induction, not
intending to test the hypothesis, but intended to aid in perfecting that hypothesis and making it more definite,
this proceeding is an essential part of a well-conducted inquiry" (7.114). And "Induction is a process for
testing hypotheses already in hand. The induction adds nothing" (7.217).
Two are the important points of dispute. The first centres around how exactly
the explanatory connection is to be understood. Some philosophers (e.g. (Hempel,
1965; Kitcher, 1981)) argue that explanation proceeds via derivation. They claim that
explanations are, essentially, arguments such that an event-type P explains an event-
type Q iff (a description ot) the explanandum-event logically follows from a set of
premises which essentially involve (a description ot) P. The well-known Deductive-
Nomological account of explanation is an instance of this approach. What's typical
of this approach is that causal order follows from (instead of being presupposed by)
explanatory-derivational order: what causes what is settled after we have settled the
question what explains (in a derivational sense) what. Opposite to the above approach
is the view (advocated, among others, by (Salmon, 1984b)) that explanations are not
arguments. Instead, they should characterise the causal mechanisms that bring about
the explanandum-event, irrespective of whether (descriptions of) these mechanisms
can be captured in the premises of an argument whose conclusion is (a description of)
the explanandum-event. 10 The second (related) dispute focuses on the role of laws in
explanation. On one approach, laws and reference to nomological connections are an es-
sential part of an explanation, whereas on another view, causal stories can be complete
even though they make no reference to laws, or even though there may be no relevant
laws to refer to. These issues are in the forefront of the current philosophical debate.
So, here I will not try to examine them further (but cf. (Salmon, 1990)). It should be
enough to keep in mind two things. First, it is still an open issue what exactly an ex-
planation is. Second, whatever the explanatory relation is taken to be in its details, its
connection with causal and/or nomological information about the explanandum-event
and its function as a surprise-remover should be pretty uncontroversial. I think the best
way to capture the latter function is to point out that explanation is typically linked
with improving our understanding of why an event happened and that improvement of
understanding occurs when we succeed in showing how an event can be made to fit in
the causal-nomological nexus of things we accept. We remove the surprise of the oc-
currence of an event if we show that the acceptance of certain explanatory hypotheses,
and their incorporation into our belief-corpus, helps to include the explanandum-event
into this corpus. Schematically, if BK is this belief corpus, e is the explanandum-event
and T is a potentially explanatory hypothesis, then T should be accepted as a potential
explanation of e if BK alone cannot explain e, but BK U T explains e.
To sum up, abduction, conceived as Inference to the Best Explanation has a rather
definite logical structure: it is the reasoning process in which the reasoner generates
and evaluates a number of potentially explanatory hypotheses, in the light of back-
ground knowledge. Judging the plausibility of each of them, and ranking them ac-
cordingly, is precisely the respect in which abduction is evaluative. The degree of

10 Well-known counter-examples to the Deductive-Nomological account of explanation suggest that there
is more to causal explanation than can be captured by the DN-pattern. The DN-pattern is symmetric, but
causation is not. For instance, one can explain the length of the shadow of a flagpole in a DN-fashion by
constructing an argument whose premises are general laws about the propagation of light and particular
conditions about the height of the flagpole. Yet, one can use the length of the shadow as the initial condition
and DN-explain the height of the flagpole by reversing the above DN-argument. The latter DN-derivation
cannot count as a genuine explanation because it does not respect the relation of cause and effect.
confidence in the chosen hypothesis, however, is a matter of how well the hypothesis
will stand up to further testing.

4.5 ABDUCTION AND COMPUTATION


Having outlined a conceptual model of abductive reasoning, I shall now turn my atten-
tion to two major attempts to provide computational models of abductive reasoning.
The aim of this section is to motivate (but not prove) the point that, because of its rich
structure, abduction resists an adequate and computationally tractable formal model.
Before I embark on this task, I should note that the following points are meant only to
be part of a general philosophical critique of (some aspects of) the computational ap-
proach to abduction which does not aim to minimise or bypass the important technical
achievements related to the use of abduction in Artificial Intelligence. 11
In Logic Programming (LP) (Kakas et al., 1992; Console et al., 1991b), abduction
operates in the context of a logic program. The aim of an abductive problem is to
assimilate a new datum O into a knowledge-base (KB). So, KB is suitably extended
by a certain hypothesis H into KB' such that KB' incorporates the datum O. Abduction
is the process through which an H is chosen. The logical form of an abductive problem
in LP is the following (call it F): given a KB and a datum O, search for a hypothesis
H such that i) KB ∪ H ⊨ O and ii) KB ∪ H is consistent. In a typical LP, abduction is
used to detach (and affirm) the antecedent (the body) of a conditional (rule) which is
part of the domain theory (KB) in order to show how its consequent (its head) can be
proved. Take, for instance, the following well-known toy-example. The domain theory
consists of the following two statements: KB : {grass is wet, if rained last night; grass
is wet, if sprinkler was on}. In order to explain the observation O that the grass is
wet we may abduce the hypothesis H that it rained last night (or, alternatively, that the
sprinkler was on). Given H, and given that it is consistent with KB, we can then run
the program to prove that KB ∪ H ⊨ O.
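To make schema F concrete, here is a minimal propositional sketch of the wet-grass example (our illustration in Python, not the actual logic-programming machinery; all identifiers are invented, entailment is approximated by forward chaining, and consistency is trivial for definite clauses):

# Schema F on the wet-grass toy-example (illustrative sketch only).
RULES = [({"rained"}, "wet_grass"),       # grass is wet, if rained last night
         ({"sprinkler"}, "wet_grass")]    # grass is wet, if sprinkler was on

def closure(facts, rules):
    """Forward-chain the rules to a fixpoint: everything KB ∪ H proves."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

def abduce(kb_facts, rules, observation, candidates):
    """Schema F: yield every H with (i) KB ∪ H ⊨ O; condition (ii) holds
    trivially for definite clauses, which cannot be inconsistent."""
    for hypothesis in candidates:
        if observation in closure(kb_facts | {hypothesis}, rules):
            yield hypothesis

print(list(abduce(set(), RULES, "wet_grass", ["rained", "sprinkler"])))
# -> ['rained', 'sprinkler']: F alone cannot choose between the two
#    hypotheses, which is one of the inadequacies discussed next.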
As it stands, F cannot adequately capture the structure of an abductive problem.
Here are some of the reasons why. First, if the only task is to suitably augment KB
so that (i) and (ii) above are satisfied, then the reasoner (program) might trivially in-
corporate 0 straight into KB, without bothering for an H. Second, if (i) and (ii) are the
only elements of an abductive problem, then, as the toy-example shows, there can be
more than one hypothesis that satisfy them. (i) and (ii) above cannot, on their own,
distinguish between the many H's that satisfy them. From a syntactic-computational
point of view, all H's which satisfy (i) and (ii) are the same. Third, searching for
hypotheses H that satisfy non-trivially (i) and (ii) above requires a conceptual space
of hypotheses from which H's can be drawn. Consistency with KB is too permissive
a criterion because the reasoner (program) might well end up examining all kinds of
irrelevant, but consistent with KB, hypotheses before starting to investigate the rele-
vant ones. So, other constraints should be incorporated which guide the search and
order the hypotheses to be examined in a preferential way. Typically, the generation of
H's is constrained by the existing KB. Hence, the selected H's should not be merely

11 For a recent survey of the role of abduction in AI, see (Konolige, 1996).
consistent with KB, but they may stand in a stronger relation to it (e.g., they are made
likely by KB). Notice also that the required relation cannot be entailment of H's by
KB, unless one is willing to accept only mutually consistent hypotheses as potential
explanations of O. If all potential H's are entailed by KB, then they have to be mutu-
ally consistent. However, it is essential for abduction to be able to deal with mutually
inconsistent explanatory hypotheses. Fourth, knowledge assimilation is typically ab-
ductive, but it is a much more complicated process than the one characterised by (i)
and (ii) above. Here, let me only stress that requirement (ii) above can be too re-
strictive. It may well be the case that the assimilation of datum O requires extensive
modification of the existing KB in such a way that the adopted H is inconsistent with
the existing KB, although, of course, the new KB' which includes H should be inter-
nally consistent. Fifth, the explanation of the datum O need not be deductive. It may
well be the case that O does not logically follow from H and KB, but that still KB ∪ H
explains O, by showing how O was to be expected (e.g., by showing how KB ∪ H makes
O likely, or more likely than not-O).
An adequate computational model of abduction should be able to deal with such
problems. That is, it should incorporate these features into the computation. Naturally,
there should be a trade-off between the need to set up a conceptually adequate model
and the need for the model to be computationally tractable. But at least computational
modelling should aim to characterise as adequately as possible the rich conceptual
structure of an abductive problem.
LP-theorists have attempted to improve on F above. There are three main ways in
which F has been improved. First, the logical space from which H's are drawn com-
prises a set A of domain-specific hypotheses, called abducibles. In the toy-example
above the abducibles are {rain last night; sprinkler was on}. In the event calculus
the set of abducibles is a set of events (or event-predicates of the form happens(E))
which are abduced to hold at a time t1 in order to explain how a property holds at
t2 (t2 > t1) (cf. (Shanahan, 1989)). Second, the updates of KB are subjected to a set
of integrity constraints (IC), i.e., a set of meta-rules which specify which changes of
the KB are not allowed and, therefore, specify which abducibles are not acceptable.
In the event calculus, an IC is such that a property P cannot hold at time t2 even though
it held at time t1 (t2 > t1) if there was an event that terminated P at a time t12, where
t2 > t12 > t1. Third, the abduced hypothesis must be minimal, i.e., it must not be de-
composable into two others each of which could on its own explain the datum. So, F
above gives way to the following schema (F'): an abductive problem is characterised
by the triple ⟨KB, A, IC⟩ and it consists in searching for a minimal H such that i')
KB ∪ H ⊨ O and ii') KB ∪ H satisfies IC.
There is no doubt that F' is on the right track. But its own limitations suggest some
very general problems with the computational approach. Two such problems stick out.
First, although it should be clear that the specification of a set of abducibles is nec-
essary for any computational model of abduction, it's doubtful that this specification
can be achieved by syntactic-computational resources alone. An abductive problem
is not merely the search for an explanation H of a datum O such that KB ∪ H ⊨ O.
Rather, it is the search for an explanation of a particular type, one that the background
information suggests is relevant to understanding why O occurred. When,
for instance, the computer does not come on, we look for blown fuses, power-cuts, or
internal failures, but we do not look for astral influence, or for who switched it off
last time etc. Some hypotheses are relevant while others are not. The former should
properly be called abducible. Yet, these judgements of relevance - and hence the spec-
ification of the appropriate type of abducibles - are not syntactic, although once in
place, they may admit a certain logical-computational form.12 The second problem
with F' is that it does not yet have built into it the required preferential structure. As it
stands it simply seems to lack the resources to rank abducibles in some order of prefer-
ence. The requirement of minimality says that of two mutually consistent abducibles
the minimal should be preferred, but as it stands it applies only to abducibles with a
definite logical structure, e.g., p and p&q. It does not say, for instance, among two
or more mutually inconsistent abducibles which one should be preferred. Nor does it
say among two equally simple, but mutually consistent hypotheses, which should be
chosen. In a nutshell, F' does not yet capture the rich structure of abductive reasoning.
LP-theorists have developed several techniques to deal especially with the multi-
ple explanations problem. 13 At this stage, however, they are sets of heuristics which
are not fully incorporated into the computational framework of abductive reasoning.
According to Michalski in (Michalski, 1993, p.120), however, the computational char-
acterisation of abduction need not capture its preferential structure. He suggests that
what needs to be formalised is the process of generating (creating) an explanation, not
the evaluation of which explanation is the best. Michalski's abstract model conceives
of abduction as "reversed deduction", or as he puts it, as tracing backwards an im-
plication rule: where in deduction the reasoner looks for conclusions of the premises
she already accepts, in abduction she looks for the premises that, if true, would entail
a certain conclusion she already accepts. (Hence, Michalski's abduction looks very
much like early Peirce's Hypothesis.)
Michalski is quite right in stressing that the determination of the best among a set
of alternative explanations is not always easy. Abductive reasoning will not always
rank hypotheses in such a way that one, and only one, comes out the best. But his
objection cuts much deeper than that. He thinks that the logical properties of abduc-
tive reasoning do not depend on any measure of the goodness of an explanation. It
should be clear, however, that if abduction were just what Michalski thinks, then ab-
duction would generate an infinity of crazy explanations. Michalski's own example
is instructive. Envisage a case in which one wants to explain why one's pencil is
green. Abduction could easily trace backwards the following implication: {My pencil
is grass; Everything that is grass is green; Therefore, my pencil is green}. The result

12 (Console et al., 1991b, p.668) give a syntactic characterisation of abducible as follows: the abducible
symbols are exactly those not occurring in the head of any clause in the theory. This characterisation works,
however, only in the limited case in which the explanatory hypothesis is already included in the background
theory. If the explanation is to be sought outside the theory and is such that, together with the theory, it
explains the datum, then Console et al. need to explicitly introduce a set of abducibles (cf. p.676). It should
then be clear that the specification of this set cannot be made syntactically.
13 See (Kakas et al., 1997) and (Evans and Kakas, 1992). Evans and Kakas use the notion of corroboration
to select explanations. But it should be clear that the notion of corroboration is not related to the search for
explanations but rather to the degree of confidence in the chosen explanation. Corroboration is more akin
to Peirce's later use of induction rather than to his abduction.
would be that the reasoner might consider as an explanation of the fact that the pencil
is green that it is grass. It is precisely because the computational characterisation of
abduction should avoid such trivialities, that some measure of goodness of the abduced
hypotheses should be incorporated in it.
Michalski does, after all, build into his abstract model some measure of goodness of
potential explanation. He makes abduction dependent on some estimation of the like-
lihood of what he calls a "mutual implication". According to his suggestion, whether
or not a hypothesis of the form "All A's are B" is a good explanation depends on the
backward strength of the converse implication: "All B's are A". If it is likely that 'If
something is a B, then it is also an A' then, upon finding a B we may conclude that it
is also an A. So, on this suggestion, inferring from "a is B" and "All A's are B" that
probably "a is A" depends on how likely it is that "All B's are A". If it is very likely,
then the reasoner may accept that a is A, but not otherwise. In the example above,
it would be silly to infer that my pencil is grass because the reversed implication "If
something is green, then it is grass" is not at all likely. Since there is not much space at
present to properly evaluate Michalski's theory, the only point I will stress is that his
suggested measure of goodness does not depend on explanatory considerations. His
suggestion amounts to the claim that the likeliest hypothesis should be chosen. This
is a sound piece of advice, if we already know which is the likeliest hypothesis. But
if we do know that, then there is no reason to generate any other than the likeliest
hypothesis.
Bylander and his collaborators in (Bylander et al., 1991) have aimed to offer a
computational model of abduction which captures its evaluative element. According
to them, an abduction problem is a tuple ⟨Dall, Hall, e, pl⟩ where: Dall is a finite set
of all the data to be explained; Hall is a finite set of all the individual hypotheses; e is
a map from all subsets of Hall to subsets of Dall; pl is a map from subsets of Hall to a
partially ordered set representing the plausibility of various hypotheses. In this model,
an explanation is a set of hypotheses H such that H is complete and parsimonious, i.e.,
such that e(H) = Dall and there is no proper subset H' of H such that e(H') = Dall.
The best explanation is the H with the highest place in the plausibility ordering. Let's
call this model J.
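The model J admits a direct toy rendering (the data, the map e and the plausibility values below are invented for illustration, and we aggregate pl by summation, which is cruder than the partially ordered pl of the original):

from itertools import chain, combinations

D_all = {"fever", "rash"}                         # all the data to be explained
H_all = ["flu", "measles", "allergy"]             # all individual hypotheses
e  = {"flu": {"fever"}, "measles": {"fever", "rash"}, "allergy": {"rash"}}
pl = {"flu": 1, "measles": 3, "allergy": 1}       # higher = more plausible

def explains(H):
    """The map e lifted to sets of hypotheses."""
    return set().union(*(e[h] for h in H)) if H else set()

candidates = map(set, chain.from_iterable(
    combinations(H_all, r) for r in range(len(H_all) + 1)))

# explanations are the complete and parsimonious hypothesis sets
explanations = [H for H in candidates
                if explains(H) == D_all
                and not any(explains(H - {h}) == D_all for h in H)]
best = max(explanations, key=lambda H: sum(pl[h] for h in H))
print(explanations)   # [{'measles'}, {'flu', 'allergy'}]
print(best)           # {'measles'}: highest in the (crude) plausibility ordering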
There are clear senses in which J is an improvement over F' and over Michalski's
model. Its most distinctive improvements are a) that it is not built into the model that
an explanation should be a deductive argument and b) that potential explanations are
ordered in terms of plausibilities. Allowing for an initial plausibility ordering takes
account of the way in which background information and explanatory considerations
may affect the trustworthiness of a hypothesis. In the case of medical diagnosis, where
J has been applied, the plausibility ordering suggests, for instance, that not all hypothe-
ses concerning the causes of a set of symptoms are equally licensed by background
information. The plausibility ordering is also helpful from a technical point of view.
Given that J conceives of explanation as a function from subsets of Hall to subsets of
Dall, it should be clear that there will, normally, be a large number of such potential
explanations. If they are ranked in terms of plausibility, then some of them will be
deemed implausible and will not be further entertained.

The J model, however, has some weaknesses, too. Some computational difficulties
have been noted by the authors of J themselves. They point out that J makes abductive
problems computationally tractable only if it is assumed - as a rule, implausibly - that
there are no incompatibility relationships between the competing hypotheses. Besides,
if it is required that there always should be one most plausible (best) explanation, then
intractability is guaranteed. Some problems of a more conceptual nature with J have
been noted by (Thagard and Shelley, 1997). What one may add is that plausibility in J
is taken as a primitive notion. Although it is right to say that the details of the plausi-
bility ordering will be domain-specific, J needs to say more about its general structure
in order to accommodate explanatory factors into abductive reasoning. To be sure,
(Josephson and Josephson, 1994, ch.9) offer a weaker model of abductive reasoning
which is computationally tractable. In this model the task of an abductive problem
is to explain as much as possible of the data with acceptable levels of confidence.
Completeness and maximal plausibility have to be sacrificed in favour of the weaker
aim of maximising explanatory coverage. There are algorithms for this model which
compute the result in polynomial time. This is clearly an improvement in respect of
computation, but some of the elements which make their original model J conceptually
rich have to go.

4.6 CONCLUSIONS
To recapitulate, I have argued that abduction has a rich conceptual structure which
comprises induction as a special case. Abduction is the mode of reasoning in which a
hypothesis H is accepted on the basis that a) it explains the evidence and b) no other
hypothesis explains the evidence as well as H does. So, the reasoning process which
underlies abduction has a certain logical, though not algorithmic, structure. Induc-
tion produces generalisations (be they universal or statistical), but these are explana-
tory and their acceptance is governed by explanatory considerations. So, although
induction may be taken to be superficially distinct from abduction, it is an instance of
explanatory reasoning.
As for the second theme of this chapter, i.e., the critical discussion of the recent
computational modelling of abduction, I wish to sum it up with a conjecture: the
more conceptually adequate a model of abduction becomes, the less computationally
tractable it is. This may leave us with a dilemma: either we may have to go for
computational tractability at the expense of conceptual richness, or we may have to
settle for the view that a rich conceptual model of abduction cannot be adequately
programmed. The solution, if any, lies with future research.

Acknowledgments
Many thanks to Peter Lipton whose book "Inference to the Best Explanation" (Lipton, 1991) has
been a great source of inspiration; to John Josephson for many helpful comments on an earlier
draft; to Bob Kowalski for many hours of discussions about the role of abduction in Artificial
Intelligence; and to Francesca Toni for her patient defence of the Logic Programming approach
to abduction. Research for this chapter was conducted under a British Academy Postdoctoral
Fellowship. I am grateful to the Academy for all the help. Many ideas of this chapter have been
stimulated by, and have found a great companion in, the views expressed in the book "Abductive
Inference" by John Josephson and his collaborators (Josephson and Josephson, 1994), whom I
wish to thank.
II The logic of abduction and
induction
5 ON RELATIONSHIPS BETWEEN
INDUCTION AND ABDUCTION: A
LOGICAL POINT OF VIEW
Brigitte Bessant

5.1 INTRODUCTION
The concepts of inductive and abductive reasonings have led and still lead to many
open discussions. Many works are based on these reasonings, for example in machine
learning (e.g. (Kodratoff and Ganascia, 1986; Michalski, 1983a)), inductive logic pro-
gramming (see (Muggleton and De Raedt, 1994) for a survey), abductive logic pro-
gramming (e.g. (Poole et al., 1987)), or resolution of diagnosis problems (see (Cox
and Pietrzykowski, 1987; de Kleer and Williams, 1987; Poole, 1989b) for various ap-
proaches to abduction). However, many of these works deal with only one of these
reasonings. When we attempt to relate induction and abduction, the obvious differ-
ences between the various approaches make this task difficult. How can we, for ex-
ample, relate induction characterized by logical conditions of adequacy of the relation
"observation 0 confirms hypothesis H" (Hempel, 1945) with the ABDUCE procedure
in logic programming (Console et al., 199lb)?
In order to clarify matters, we examine the relationships between induction and abduc-
tion through three standard approaches which are analyzed and compared. The first,
namely "one is an instance of the other", is that induction is simply a form of abduction
and vice versa. The second, namely "different with a common root", is that they are
distinct but they may be made to share a common logical form based on a hypothetico-
deductive model. The third, namely "totally different", is that induction is nothing but
the process of confirmation and hence it is distinct from abduction, based on what
we call "the hypothetico-nonmonotonic model". We then present the viewpoint that
both abduction and induction are means by which knowledge bases are completed in
order to employ deduction and to draw further conclusions. We finally discuss
another way to deal with inductive or abductive inference, that is the study at
the meta-level of its logical properties.

5.2 ABDUCTION AND INDUCTION: ONE IS AN INSTANCE OF THE OTHER
Abduction is a form of induction. Abduction can be considered as a form of in-
duction in different ways. In Michalski's work (Michalski, 1993), abduction is clearly
defined as a special case of induction defined as follows: "given a background knowl-
edge Th and observations O, induction hypothesizes a premise H, consistent with Th,
such that H ∪ Th ⊨ O ... Induction is viewed as "tracing backward" this relationship."
Another way to situate abduction as a special case of induction is by following (Mor-
timer, 1988): "deduction and induction would create a separate and exhaustive divi-
sion of all possible inferences". Induction being characterized as the non-deductive
inference, abduction is an instance of induction.

Induction is a form of abduction. In Psillos' paper (Psillos, 1996), abduction is


considered as an ampliative reasoning: "a reasoning in which the conclusion of the
inference exceeds what is already stated in the premises". So, induction can be seen
as a particular case of abduction. According to Josephson (this volume), inductive
generalizations are abductions because abduction is seen as inference to the best ex-
planation, and inductive generalization as an explanation of the set of observations.

Discussion. First of all, through this first approach, we notice the difficulty of relating
and comparing definitions of abduction and induction which are sometimes conflict-
ing. The variety of definitions and models using unrelated formalisms, the intuitive
perception of some basic concepts (e.g. the concept of explanation, generality), or
else, the existence of numerous languages to deal with abduction and induction, is
what we call "the babelism problem", pointed out in (Bessant, 1996). The different
points of view presented above result, among others, from a problem of terminology
which, of course, is not the main problem but should still be noted. Another reason is
the level of generality of the definitions, which allows plenty of scope for liberty. For ex-
ample, when induction is considered as non-deductive inference, it covers a very large
set of possible reasonings; or else when abduction is defined as inference to the best
explanation, the generality lies in the fact that the notion of explanation is complex
and leads to many interpretations. Usual informal definitions are inference to the best
explanation for abduction and generalization of observations for induction. The no-
tions of "explanation" and "generality" bring confusion in the relationships between
these two forms of reasoning.
The role of explanation in inductive and abductive reasonings is a fundamental
component in the analysis of their relationships. If we consider that inductive general-
ity is explanatory, we may agree with the thesis that induction is a form of abduction.
However, this consideration is not as simple as it may seem. It depends on the defi-
nitions we give to explanation and generality, and it also depends on how we situate
induction and abduction within these definitions. We refer to the chapters of Console
and Saitta, Josephson and Psillos for a variety of points of view on the notions of gen-
erality and explanation. In short, in this first approach - "one is an instance of the
other" - we consider inductive and abductive reasonings a certain explanatory power,
one being able to be fully incorporated into the other. In the sequel, we present one
of the most common approach to both inferences, based on a hypothetico-deductive
model. In this approach, the explanatory power turns into a root common to induction
and abduction which are, then, considered as different.

5.3 ABDUCTION AND INDUCTION: DIFFERENT WITH A COMMON ROOT

5.3.1 The hypothetico-deductive model


In many AI works, abduction and induction are considered as two different forms of
reasoning, linked up by a single root, which is the hypothetico-deductive model, de-
fined as follows:

Given an observation O and a domain theory Th, find a hypothesis H such that

Th ∧ H ⊨ O

We call such a hypothesis a hypothetico-deductive explanation.
In machine learning, Th represents the background knowledge, O can be seen as
a set of examples and counter-examples of a concept and the inductive hypothesis H
is the description of the concept. With diagnosis, Th accounts for the description of
the system, O is a set of symptoms and the abductive hypothesis H is the diagnosis.
We propose to refine the hypothetico-deductive model by a logical characterization of
some minimal properties for inductive and abductive hypotheses.
We observe that in several works in AI, the approach to induction or abduction fits
in with the epistemological point of view of (Hempel, 1966). The common way of in-
terpreting it is as follows: given a phenomenon to be explained, namely the explanandum,
a deductive-nomological explanation of this phenomenon is the knowledge called
the explanans, such that

explanans ⊨ explanandum

The explanans is characterized by Hempel as the conjunction of a set of covering laws
represented by L and particular conditions represented by C:

explanans = L ∧ C
We examine this particular class of inductive and abductive reasonings, which we call
the AI class, defined by:
Definition 5.1 Induction and abduction are of AI class iff they are based
- on the logical model of hypothetico-deductive explanation
- on the epistemological model of deductive-nomological explanation
Induction of AI class is defined as the inference of covering laws and abduction of AI
class as the inference of particular conditions.

We consider a first-order language L consisting of an infinite countable set of classi-
cal symbols (variables, connectives ∨, ∧, ¬, →, universal and existential quantifiers ∀,
∃) and an infinite countable set of non-logical symbols (constants, functions, rela-
tions). Classical rules of formation of terms and formulas are not recalled here. Let
us simply note that capital roman letters A, B, C, ... denote well-formed formulas of L.
M(A) is the set of models of A (interpretations where A is true). Ā is the deductive
closure of A (i.e. Ā = {B formulas of L such that A ⊨ B}).
We first give the logical characterization of knowledge common to induction and
abduction.

• Th is a domain theory (or knowledge base) iff1

1. Th is a closed formula of L
2. M(Th) ≠ ∅

• O is an observation iff
1. O is a closed formula of L
2. M(O) ≠ ∅
3. If Th is a domain theory,
then Th ⊭ O and M(Th ∧ O) ≠ ∅

• H is a hypothesis iff

1. H is a closed formula of L
2. Given Th and O, Th ∧ O ⊭ H and M(Th ∧ O ∧ H) ≠ ∅

We consider that the domain theory Th and the observation O are consistent (M(Th) ≠
∅ and M(O) ≠ ∅) and that O and H have the particularity of being consistent with the a
priori knowledge (M(Th ∧ O) ≠ ∅ and M(Th ∧ O ∧ H) ≠ ∅). Indeed, the reason is that
there exists at least one common model of Th, O and H, namely the intentional interpre-
tation (or real world). The validity of a hypothesis H is not known (Th ∧ O ⊭ H).
Finally, observation O has a particularity linked to its part in triggering induction and
abduction, which is that, a priori, we do not have a proof of its validity (Th ⊭ O).

• L is a law iff
it is semantically2 equivalent to a universally quantified formula.

• C is a particular condition iff


it is a ground formula

Definition 5.2 (Induction of AI class) Given an observation O and a domain theory
Th, the minimal properties for a hypothesis H induced from O and Th are:

1 if and only if by definition


2 A is semantically equivalent to B iff M(A) = M(B).
(I1) H is a hypothesis
(I2) H is a law
(I3)
1 - Either H ⊨ O
2 - Or H is such that there exists a particular condition C such that
H ⊭ O, C ⊭ O, C ∧ H ⊨ O, and
either (a) C ∈ Th or (b) C ∈ O

Definition 5.3 (Abduction of AI class) Given an observation O and a domain theory
Th, the minimal properties for a hypothesis H abduced from O and Th are:
(A1) H is a hypothesis
(A2) H is a particular condition
(A3) H is such that there exists a law L ∈ Th such that
H ⊭ O, L ⊭ O, L ∧ H ⊨ O

(I1)(I2)(I3) (resp. (A1)(A2)(A3)) determine the set of possible inductive (resp. ab-
ductive) hypotheses. We do not address here the preference criteria that allow the
best hypotheses to be selected.

Let us illustrate these properties with some examples. Given a domain theory
Th = shine(sun), let an observation be O = fly(Tweety) ∧ bird(Tweety).
F1 = mouse(Clyde) ∧ fly(Tweety) is neither an inductive hypothesis because of (I2),
nor an abductive one because of (A3): Th does not contain any law.
F2 = ∀x(fly(x) ∧ bird(x)) is an inductive hypothesis: (I1) (I2) (I3-1) are verified.
F3 = ∀x(fly(x) → bird(x)) (resp. F4 = ∀x(bird(x) → fly(x))) is an inductive hypoth-
esis: (I1) (I2) (I3-2-b) are verified with the particular condition C (such that C ∈ O)
equivalent to fly(Tweety) (resp. bird(Tweety)).
F5 = ∀x(crow(x) → (fly(x) ∧ bird(x))) is not an inductive hypothesis: (I3) is not ver-
ified. However, if we add crow(Tweety) to Th, F5 becomes inductive: (I3-2-a) is
verified.
Th does not contain any law, so the set of possible abductive hypotheses is empty be-
cause of (A3).
If we add F6 = ∀x(gold(x) → shine(x)) to the theory, the only ground formulas F that
we can consider such that Th ∧ F ⊨ O are F = O ∧ G with any formula G. However,
such an F is not an abductive hypothesis (F ⊨ O).
If we add F5 to Th then crow(Tweety) is an abductive hypothesis: (A1)(A2)(A3) are
verified.
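These conditions can be checked mechanically. The following brute-force sketch (ours, using a hypothetical propositional grounding of the laws over the finite domain {Tweety, Clyde}) verifies, for instance, that F4 satisfies (I1) and (I3-2-b):

from itertools import product

ATOMS = ["fly_t", "fly_c", "bird_t", "bird_c", "shine_s"]

def models(formula):
    """All truth assignments (dicts over ATOMS) satisfying `formula`."""
    for bits in product([False, True], repeat=len(ATOMS)):
        w = dict(zip(ATOMS, bits))
        if formula(w):
            yield w

def entails(a, b):        # A ⊨ B: every model of A is a model of B
    return all(b(w) for w in models(a))

def conj(*fs):            # A ∧ B ∧ ...
    return lambda w: all(f(w) for f in fs)

def consistent(*fs):      # M(A ∧ B ∧ ...) ≠ ∅
    return any(True for _ in models(conj(*fs)))

Th = lambda w: w["shine_s"]                              # shine(sun)
O  = lambda w: w["fly_t"] and w["bird_t"]                # fly(Tweety) ∧ bird(Tweety)
F4 = lambda w: all(not w["bird_" + x] or w["fly_" + x]   # ∀x(bird(x) → fly(x)),
                   for x in "tc")                        # grounded over {t, c}
C  = lambda w: w["bird_t"]                               # particular condition in O

# (I1): Th ∧ O ⊭ F4 and M(Th ∧ O ∧ F4) ≠ ∅
print(not entails(conj(Th, O), F4), consistent(Th, O, F4))             # True True
# (I3-2-b): F4 ⊭ O, C ⊭ O, C ∧ F4 ⊨ O
print(not entails(F4, O), not entails(C, O), entails(conj(C, F4), O))  # True True True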

5.3.2 Discussion
From an epistemological point of view, we adopt Hempel's (with the notion of law
and particular condition) instead of Peirce's (see the introductory chapter to this vol-
ume). The reason is that in many AI works, the intuitive definitions are closer to the
deductive-nomological explanation. From a logical point of view, (I2) and (A2) are
syntactical considerations that could be too restrictive and debatable from an episte-
mological point of view. However, we think that they capture a non-negligible part
of AI works. (I2) and (A2) represent one of the main points on which induction and
abduction are differentiated: induction is defined as inference of covering laws and
abduction is defined as inference to particular conditions. Sometimes abduction or
induction are defined in a less restricted way, as for example when abduction is con-
sidered as scientific discovery. In our context, abduction would then be the inference
to the explanans (that is inference to the conjunction of the laws and particular con-
ditions) instead of restricting abduction to the inference to particular conditions. The
adopted point of view depends on the choice of generality in the definition. We agree
with Dimopoulos and Kakas' point of view (Dimopoulos and Kakas, 1996a) about the
complexity of such a reasoning, which can be broken down into a combination of more ba-
sic steps (which is what we do here). In our approach, this form of reasoning naturally appears
as the "explanatory reasoning" (as in Aliseda's chapter (this volume) where the author
presents a unified framework for abductive and inductive reasoning). It represents
here the common root, described in the beginning of the section through hypothetico-
deductive model.
In our definition, induction and abduction are forms of hypothetico-deductive ex-
planation except for the case (I1.I2.I3-2-b). The two cases (I1.I2.I3-1) and (I1.I2.I3-2-
b) correspond to enumerative induction. This is another point on which induction and
abduction differ: whereas abduction is necessarily domain-dependent (see A3), this
is not the case for induction (see I1.I2.I3-1). Abduction is domain-dependent because
of the dependence between particular conditions and the associated law. If this law
was not in the domain theory, we would have to infer it as a hypothesis too, then it
would not be an abductive reasoning but an explanatory one (i.e. inference to the ex-
planans). In enumerative induction, we do not need to deal with a domain theory, it is
simply an extension of some properties of observed individuals, to a larger unobserved
population. This form of reasoning is sometimes called inductive generalization. The
enumerative induction is related to the concept of confirmation that we present in the
following section. In the next approach, abduction and induction are totally different,
they are no more linked up by the common root of the explanatory reasoning. Induc-
tion is nothing but the process of confirmation and abduction is always explanatory
but it is based on what we call, "the hypothetico-nonmonotonic model".

5.4 ABDUCTION AND INDUCTION: TOTALLY DIFFERENT

5.4.1 Theory of confirmation


Another approach consists in considering the logic of induction as the theory of confir-
mation (Mortimer, 1988). The problem of induction comes down to characterizing to
what extent a set of observations confirms a general hypothesis. We refer to (Hempel,
1945) for a qualitative account of confirmation and to (Carnap, 1952) for a quantitative
concept of degree of confirmation. Carnap's approach is based on conditional prob-
abilities where a function of degree of confirmation is defined as logical probability.
Hempel proposes a set of logical conditions of adequacy that a binary relation between
two logical formulas should satisfy. This relation Ct(O,H) formalizes the statement
"0 confirms H" and satisfies adequacy conditions such as, for example,
(H2.2) equivalence condition

if Ct(O,H) then, for all H' such that ⊨ (H ↔ H'), Ct(O,H')
Note that this condition leads to the so-called raven paradox, which is connected to the
problem of confirmation of universal statements by an individual statement (e.g. "white
shoes" confirms the hypothesis that "all non-black objects are not ravens" and so by
equivalence "all ravens are black". This view, which suggests that the observation of
"white shoes" confirms the hypothesis that "all ravens are black", is clearly consid-
ered to be inconsistent with the concept of confirmation). A solution to this paradox
is proposed in (Flach, 1995).
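The paradox can be reproduced in a few lines (our toy encoding of the instance-confirmation criterion; the object representation and predicate names are invented):

# An object instance-confirms "All A's are B" if it is an A that is a B.
def instance_confirms(obj, antecedent, consequent):
    return antecedent(obj) and consequent(obj)

is_raven  = lambda o: o["kind"] == "raven"
is_black  = lambda o: o["colour"] == "black"
non_black = lambda o: not is_black(o)
non_raven = lambda o: not is_raven(o)

white_shoe = {"kind": "shoe", "colour": "white"}
# The shoe confirms "all non-black things are non-ravens" ...
print(instance_confirms(white_shoe, non_black, non_raven))   # True
# ... which is logically equivalent to "all ravens are black", so by the
# equivalence condition (H2.2) the white shoe confirms that hypothesis too.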
According to Hempel's approach where induction is considered as logical infer-
ence, the inductive inference is then the confirmatory inference (O confirms H) which
is different from the explanatory inference (H explains O). Abduction and induc-
tion are totally different when abduction is defined as inference to the best explana-
tion, and induction as confirmatory inference. This raises the problem of the explana-
tion/confirmation duality that we will discuss later. We present, in the next section,
another case where abduction and induction are totally different.

5.4.2 Hypothetico-nonmonotonic model


Abduction is often characterized as inference to the best explanation. This requires
addressing, among others, the question of what is an explanation. In the previous
section, the abductive explanation is seen as a deductive proof (see Section 5.3.1).
Such approaches based on AI class pursue specific objectives, motivated by computa-
tional perspectives. The knowledge domain is then restricted and the domain theory
is assumed to be perfect. However, in the everyday world this is not the case, so the ex-
planation no longer leads to the observations by a deductive inference (⊨) but by
a nonmonotonic inference (denoted |~). The property of nonmonotonicity is defined
as follows:

A |~ B does not imply A ∧ C |~ B
Adding a new piece of information C may call back into question the validity of B. So,
instead of being based on the hypothetico-deductive model, abduction is based on what
we call the hypothetico-nonmonotonic model:

Given an observation O and a domain theory Th, find a hypothesis H such that

Th ∧ H |~ O

We call such a hypothesis H a hypothetico-nonmonotonic explanation.

This case corresponds to the possibility that some causally relevant factors might
be omitted in the domain theory which is not assumed perfect (Leake, 1995). For
example, let us assume that hypothesis H ="a lit match fell down" explains the obser-
vation that "the forest is burning". When we add the omitted information "the match
fell down in a pool of water", H is no longer an explanation for the observation. It
is related to the qualification problem raised by (McCarthy, 1980). Let us note the
existence of some works (Aliseda, 1996a; Flach, 1996b; Mayer and Pirri, 1996; Pino
Perez and Uzcategui, 1997), studying the inductive and abductive inferences based on
the hypothetico-nonmonotonic model, at the meta-level (i.e. studying properties that
the underlying inferences should satisfy). This approach of induction and abduction
is presented in Section 5.5.2.
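The forest-fire example above can be made concrete with a single toy default rule (a sketch under our own encoding; the atom names are illustrative):

# Th ∧ H |~ O, defeated when extra information is added.
def nm_explains(theory, hypothesis, observation):
    """One toy default: a fallen lit match sets the forest burning,
    unless it fell into water."""
    facts = set(theory) | {hypothesis}
    if "lit_match_fell" in facts and "fell_in_water" not in facts:
        facts.add("forest_burning")          # default conclusion
    return observation in facts

Th = []
H = "lit_match_fell"
print(nm_explains(Th, H, "forest_burning"))                      # True
print(nm_explains(Th + ["fell_in_water"], H, "forest_burning"))  # False: defeated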

5.4.3 Discussion
Abduction based on the hypothetico-nonmonotonic model is different from induction
based on the hypothetico-deductive model. It is no longer falsity-preserving and it is not a
form of reversed deduction. Let us note that the notion of explanation is complex and
although it is often mixed up with deductive inference, many other cases of explanation
can be considered, modelled by a non-deductive inference (see for example (Lipton,
1991)).
Another fundamental point raised here is the notion of confirmation that points out
the explanation/confirmation duality, and allows us to distinguish some characteristics
about abduction and induction. The problem of confirmation is identified as the problem
of induction. So the question is: what would the concept of confirmation be in ab-
ductive reasoning? Usually, in abduction, a hypothesis is simply accepted and from a
technical point of view it is extracted from the domain theory (Dimopoulos and Kakas,
1996a). Abduction is a way to clarify particular conditions, implicit in the domain the-
ory, that allow a proof (not necessarily deductive) to be given for accepting the observation,
whereas induction is the construction of a hypothesis that gives an account of unifor-
mity, common to a set of observations. Abduction is the admission of the necessity
of "a jump to conclusion" while induction is the formulation of a uniformity, with an
increasing empiricism. We seize the opportunity to point out another difference usu-
ally expressed between the two forms of reasoning that is, induction is a prediction
for some unobserved information, whereas abduction deals with the available obser-
vations.
In the following section, we examine the underlying logical inference of abductive
and inductive reasoning, at the object- and meta-level.

5.5 ABDUCTION AND INDUCTION: A LOGICAL INFERENCE

5.5.1 Process of logical inference: relationship with deduction


Induction and abduction are forms of hypothetical reasoning, based on incomplete
knowledge. The underlying logical inference is then nonmonotonic (adding new in-
formation may call back into question the validity of hypothesis H). For example, if,
from the observation P(a) ∧ P(b) ∧ P(c) we induce ∀x P(x), adding the information
¬P(d) calls back into question the validity of ∀x P(x).

Let us denote by |<ind (resp. |<abd) the inductive (resp. abductive) inference, defined mod-
ulo a domain theory Th, as follows:

O |<ind(abd) H
From the observation O and the domain theory Th, we induce (or abduce) the hypoth-
esis H.

Our analysis is based on the underlying inferences of abduction and induction of AI
class, that is to say:

O |<ind(abd) H means:
find H such that Th ∧ H ⊨ O, with the conditions for induction (or abduction) mentioned
in Section 5.3.

Generally, the aim of the construction process of inductive or abductive hypotheses


is to come down to the deductive inference. To achieve this, the process is based on
the technique of completion:

O |<ind(abd) H iff Compl(Th) ∧ O ⊨ H

where Compl(Th) is the completed domain theory.


From a semantical point of view, the completion achieves a preference between models:
O |<ind(abd) H iff every preferred model of Th that is a model of O is a model of H.

The set of hypotheses is infinite; the completion of Th allows us to restrict attention to a finite
case. The type of completion corresponds to a particular choice of H. In induction
and abduction, the technique of completion differs. In induction, the completion is
performed by adding a hypothesis of similarity to Th, conveying the idea that "all that
is known looks like what is unknown". This similarity hypothesis is used for example
in Lachiche's chapter to circumscribe the individuals to the known ones, by a domain-
closure. In abduction, the completion is performed by circumscribing some properties
of individuals, the abductive hypothesis is supposed to be among the known ones. For
example, in (Console et al., 1991b), properties are split into abducible properties and
non-abducible ones, assuming that the abductive hypothesis is a combination of abducible
properties. The completion of Th (a logic program in that case) is the completion of
non-abducible properties in order to replace them in the observation with a logically
equivalent formula. Consequently, the result of the construction process is a formula
that contains only abducible properties, and so represents an abductive hypothesis.
The analysis of these two different processes of completion confirms the idea that we
developed in Section 5.4.3, according to which abduction is a way to clarify conditions
which are implicit in the domain theory and induction gives an account of uniformity,
common to a set of observations. The abductive hypothesis is specific to the observation,
whereas induction gives an account of future observations.
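As an illustration of the abductive completion step, the following sketch (ours, not Console et al.'s actual procedure or syntax) unfolds the completed definition of each non-abducible property, so that the observation is rewritten into a logically equivalent formula over abducibles only:

# Clark-style completion on the wet-grass program (illustrative names).
program = {"wet_grass": [["rained"], ["sprinkler"]]}   # head -> list of bodies
abducibles = {"rained", "sprinkler"}

def unfold(goal):
    """Replace a non-abducible goal by the disjunction of its rule bodies,
    recursively, until only abducible properties remain."""
    if goal in abducibles:
        return goal
    return " v ".join("(" + " & ".join(unfold(g) for g in body) + ")"
                      for body in program.get(goal, []))

print(unfold("wet_grass"))   # -> (rained) v (sprinkler)
# each disjunct of the result is a candidate abductive hypothesis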

5.5.2 Meta-level: properties of the inference


Another way to deal with inductive or abductive inference is to study its logical prop-
erties at the meta-level. This is the purpose of some researchers (for example, (Flach,
1995; Aliseda, 1996a; Pino Perez and Uzcategui, 1997; Mayer and Pirri, 1996); see
also the chapter by Flach) who propose sets of metalogical rules aiming at defining
the "rationality" of the abductive and inductive consequence relations. An example of


such a rule is
I= 01 1\ H --+ 02 01 I< inti( abd) H
01 1\ 02 l<;nd( abd) H
This rule, called by Flach "Verification" (see p. 92), is interpreted as follows: "O2 is a
prediction made on the basis of hypothesis H and evidence O1. If such a prediction is
actually observed, hypothesis H remains a possible hypothesis". Let us note that the
notation |<ind(abd) used in this section is interpreted in many different ways according
to each author, and although this rule is common to a great number of works, the con-
sequence relation |<ind(abd) does not always have the same meaning. This is the first
reason why it seems difficult to relate and compare these works in order
to relate induction and abduction. The second point is that the authors' motivations
are different and focused on only one of these two forms of reasoning (for exam-
ple, Flach's works originate from the study of induction, Pino Perez and Uzcategui's
works originate from the study of abduction). However, metalogical approaches to
inductive and abductive consequence relations allow us to open new perspectives as
to the issue of relations between these two forms of reasoning. First, they are in the
same vein as Kraus et al.'s or Gardenfors and Makinson's works (Kraus et al.,
1990; Gardenfors and Makinson, 1994). In the former, the authors give a classification of
consequence relations according to how close the relation is to the classical entailment
relation. The latter is an approach to nonmonotonic inference seen from belief revi-
sion. Strong connections with these different approaches can be obtained (Mayer and
Pirri, 1996; Pino Perez and Uzcategui, 1997) and so improve our understanding of
induction and abduction. Another perspective is the explanatory/confirmatory duality
(proposed by Flach) or explanation/conclusion duality (proposed by Pino Perez and
Uzcategui), which are strongly related to the notions of induction and abduction. Then we
can adopt two points of view: these dualities are equivalent to the abduction/induction
duality or they can be incorporated in each form of reasoning. Let us note that these
two points of view are not exclusive. Eventually, the domain theory being gener-
ally fixed, the extension of the metalogical approaches by allowing the change of the
domain theory, could provide new insights about relations between induction and ab-
duction.
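
To make this meta-level reading concrete, here is a minimal Python sketch (propositional formulas are written as Python boolean expressions; the toy relation REL and all helper names are illustrative assumptions, not taken from any of the cited works) that checks a single instance of the Verification rule against a consequence relation given in extension:

from itertools import product

ATOMS = ['p', 'q']
UNIVERSE = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

def valid(formula):
    """The meta-predicate |=: truth under every assignment."""
    return all(eval(formula, {}, v) for v in UNIVERSE)

# A toy consequence relation |<ind(abd), given in extension as
# (evidence, hypothesis) pairs; 'p -> q' is written 'not p or q'.
REL = {('p', 'not p or q'),        # from evidence p, hypothesise p -> q
       ('p and q', 'not p or q')}  # the hypothesis survives the verified prediction q

def satisfies_verification(o1, h, o2, rel):
    """One instance of Verification: if |= O1 & H -> O2 and O1 |< H,
    then O1 & O2 |< H must also belong to the relation."""
    antecedent = valid(f'not (({o1}) and ({h})) or ({o2})') and (o1, h) in rel
    return (not antecedent) or ((f'{o1} and {o2}', h) in rel)

print(satisfies_verification('p', 'not p or q', 'q', REL))  # True

Dropping the pair ('p and q', 'not p or q') from REL would make the same call return False, exhibiting a violation of the rule.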

5.6 CONCLUSION
We investigated many ways to relate induction and abduction in a single framework,
each corresponding to different definitions of induction and abduction. Some of these
definitions are complementary, some are more abstract than others, and some are
antinomic. In the first approach, we studied possibilities to embed one into the other:
induction as a form of abduction and vice versa. In this approach, induction, considered
as generalisation of observations, plays an explanatory role of an abductive type, and
is situated as a form of abduction or inference to the best explanation. In the second
approach, the definitions of induction and abduction are restricted and refined. They
remain forms of explanatory reasoning, which constitutes their common root, but they
differ in their explanatory power. We define induction as inference of covering laws,
and abduction as inference of particular conditions, which is close to
many approaches in AI. These definitions are linked up to the hypothetico-deductive
model that constitutes their common root; from this viewpoint, explanation is assimilated
to deduction. In the third approach, we study how induction and abduction can be
considered as totally different, that is to say, when induction is nothing but confirmation
and no longer plays an explanatory role, while abduction remains explanatory but is
based on the "hypothetico-nonmonotonic model". This third approach introduces a
new component to the study of the relationships between induction and abduction:
the confirmation/explanation duality. These different points of view, which lead to
different answers to the issue of the relations between these two forms of reasoning,
correspond to specific needs and purposes. We saw, for example, how induction and
abduction come down to deduction via completion techniques, and in the last section
we presented another logical point of view, namely the characterisation of the
metalogical properties of the underlying inferences. An in-depth comparison between
works on this subject would open new perspectives on the relationships between
induction and abduction.

Acknowledgments
This work has been partly supported by the Ganymede II project of the Contrat de Plan Etat/Nord-
Pas-de-Calais.
6 ON THE LOGIC OF HYPOTHESIS
GENERATION
Peter A. Flach

6.1 INTRODUCTION
It has been argued in the introductory chapter to this volume that, when dealing with
non-deductive reasoning forms like abduction and induction, a distinction between
hypothesis generation and hypothesis evaluation arises. Hypothesis generation is con-
cerned with hypotheses that are not yet ruled out by the data, and is as such a purely
logical process. Hypothesis evaluation then proceeds with further investigating the
possible hypotheses in order to select one of which the predictions agree sufficiently
with reality. This distinction has been inspired by Peirce's later, inferential theory of
abduction, deduction and induction as the three stages of scientific inquiry.
Furthermore, as argued in the introductory chapter and by several authors contribut-
ing to this volume (e.g. Lachiche), while abduction is predominantly inference of an
explanation of the observations, induction can be understood from two perspectives.
On the one hand, we can perceive inductively inferred classification rules as explain-
ing the classifications of the examples (explanatory induction); on the other, we can
take a wider perspective by including non-classificatory forms of induction, which aim
at discovering generalisations that are sufficiently confirmed by the data (confirmatory
or descriptive induction).
In this chapter we deal with hypothesis generation rather than evaluation. The aim
is to formalise the relation between observations and possible hypotheses. The distinc-
tion between explanatory hypotheses and confirmed generalisations naturally leads to
two different forms of hypothesis generation, which I will call explanatory and con-
firmatory reasoning. The reader may think of confirmatory reasoning as generation of
(non-classificatory) inductive generalisations and of explanatory reasoning as genera-
tion of abductive explanations (including classification rules), but it is not the aim of
this chapter to draw a firm line between abduction and induction. In fact, the analysis
in this chapter can be seen as an alternative to the abduction/induction dichotomy, dis-
tinguishing non-deductive reasoning forms instead by the logical relation the inferred
hypotheses bear to the premisses.
The way this logical relation is characterised in this chapter has been inspired by
work on the analysis of nonmonotonic reasoning (Kraus et al., 1990). The analysis is
carried out on the meta-level by considering hypothetical consequence relations that
link observations with possible hypotheses. This meta-level analysis is accompanied
by semantic characterisations and representation theorems. The approach thus not
only links up with other work on logics for artificial intelligence, but also with classical
logical analysis dealing with soundness and completeness of proof systems. Moreover,
we draw connections with work in philosophy of science, in particular Hempel's
qualitative approach to confirmation.

6.2 LOGICAL PRELIMINARIES


The main logical tool in this chapter is the notion of a consequence relation, originat-
ing from (Tarski, 1956) and further elaborated by (Gabbay, 1985), (Makinson, 1989),
and (Kraus et al., 1990). In this section I give an introduction to this important meta-
logical concept that is largely self-contained. The basic definitions are given in Section
6.2.1. In Section 6.2.2 I consider some general properties of hypothetical consequence
relations.

6.2.1 Consequence relations


We distinguish between the language L in which evidence and hypotheses are for-
mulated, and the meta-language in which statements about the process of hypothesis
generation are expressed. In this chapter L is a propositional language over a fixed
countable set of proposition symbols, closed under the usual logical connectives. We
assume a set of propositional models U, and a satisfaction relation ⊨ ⊆ U × L that is
well-behaved with respect to the logical connectives and compact. As usual, we write
⊨ α for ∀m ∈ U : m ⊨ α, for arbitrary α ∈ L. Note that U may be a proper subset
of the set of all truth-assignments to proposition symbols in L, which would reflect
prior knowledge or background knowledge. Equivalently, we may think of U as the
set of models of an implicit background theory T, and let ⊨ α stand for 'α is a logical
consequence of T'.
The meta-language is a restricted predicate language built up from a unary meta-
predicate ⊨ in prefix notation (standing for validity with respect to U in L) and a
binary meta-predicate |< in infix notation (standing for hypothetical consequence).
In statements of the form α |< β, α is called the evidence (if α is a conjunction then
each of the conjuncts is called an observation), while β is called the hypothesis. In
referring to object-level formulae from L we employ a countable set of meta-variables
α, β, γ, δ, ..., the logical connectives from L (acting like function symbols on the meta-
level), and the meta-constants true and false. Formulae of the meta-language, usually
referred to as postulates, are of the form P1, ..., Pn / Q for n ≥ 0, where P1, ..., Pn and
Q are literals (atomic formulae or their negation). Intuitively, such a postulate should
be interpreted as an implication with antecedent P1, ..., Pn (interpreted conjunctively)
and consequent Q, in which all variables are implicitly universally quantified. An
example of such a postulate, written in an expanded Gentzen-style notation, is
    α |< β , ⊨ α ∧ β → γ
    ─────────────────────
    α ∧ ¬γ |≮ β
This is a postulate with two positive literals in its antecedent, and a negative literal in
its consequent. Intuitively, it expresses that a hypothesis β, previously inferred from
evidence α, should be withdrawn if the negation of a consequence of α and β together
is added to the evidence.
Consequence relations provide the semantics for this meta-language, by fixing the
meaning of the meta-predicate |<. Formally, a consequence relation is a subset of
L × L. Consequence relations will be used to model part or all of the reasoning
behaviour of an abductive or inductive agent, by listing a number of arguments (pairs
of premiss and conclusion) the agent is prepared to accept. A consequence relation
satisfies a postulate whenever it satisfies all instances of the postulate, and violates
it otherwise, where an instance of a postulate is obtained by replacing the variables
of the postulate with formulae from L. For instance, the consequence relation
{(p, q), (p ∧ ¬p, q)} violates the postulate above. We will normally refer to a particular
consequence relation as |< and write p |< q instead of (p, q) ∈ |<.
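
The violation just mentioned can be checked mechanically. Here is a minimal Python sketch (formulas are encoded as Python boolean expressions; this encoding is an assumption of the sketch, not part of the formal framework), instantiating the postulate above with α = p, β = q, γ = p:

from itertools import product

ATOMS = ['p', 'q']
UNIVERSE = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

def valid(formula):
    """The unary meta-predicate |=: truth in every model."""
    return all(eval(formula, {}, v) for v in UNIVERSE)

# The consequence relation {(p, q), (p & ~p, q)} from the text.
rel = {('p', 'q'), ('p and not p', 'q')}

# Instance alpha = p, beta = q, gamma = p of the example postulate:
# from alpha |< beta and |= alpha & beta -> gamma, infer alpha & ~gamma |/< beta.
antecedent = ('p', 'q') in rel and valid('not (p and q) or p')
consequent = ('p and not p', 'q') not in rel
print(antecedent and not consequent)   # True: the postulate is violated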

6.2.2 Some rationality postulates for hypothesis generation


After having stated the main definitions concerning consequence relations I will now
list some properties generally obeyed by hypothetical consequence relations. These
properties should be taken as rationality postulates to be satisfied by hypothesis gen-
erating agents. In this section we will not distinguish between explanatory or con-
firmatory reasoning, and simply interpret α |< β as 'β is a possible hypothesis given
evidence α'.
The first two postulates state that the logical form of evidence and hypotheses is
immaterial:
Left Logical Equivalence
    ⊨ α ↔ β , α |< γ
    ─────────────────
    β |< γ

Right Logical Equivalence
    ⊨ α ↔ β , γ |< α
    ─────────────────
    γ |< β
Left Logical Equivalence states, for instance, that if the evidence is expressed as a
conjunction of ground facts, the order in which they occur is immaterial. From the
viewpoint of practical induction algorithms this may embody a considerable simpli-
fication - however, the framework presented in this chapter is intended to provide a
model for hypothesis generation in general rather than particular algorithms. 1

1Practical algorithms establish a function from evidence to hypothesis rather than a relation, i.e. also Right
Logical Equivalence would be invalidated by an induction algorithm (of all the logically equivalent hy-
potheses only one would be output).
The following two postulates express principles well-known from philosophy of


science:

Verification
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α ∧ γ |< β

Falsification
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α ∧ ¬γ |≮ β

In these two postulates γ is a prediction made on the basis of hypothesis β and evidence
α. Verification expresses that if such a prediction is indeed observed, hypothesis β
remains a possible hypothesis, while if its negation is observed, β may be considered
refuted according to Falsification.2 One might remark that typically the hypothesis
will entail the evidence, so that the first condition in the antecedent of Verification
and Falsification may be simplified to ⊨ β → γ. However, this is only the case for
certain approaches to explanatory reasoning; generally speaking hypotheses, in
particular those that are confirmed without being an explanation, may not contain all the
information conveyed by the evidence. The formulation above represents the general
case.
Falsification can be simplified in another sense, as shown by the following lemma.

Lemma 6.1 In the presence of Left Logical Equivalence, Falsification is equivalent


with the following postulate:

Consistency
    α |< β
    ───────────
    ⊭ ¬(α ∧ β)

Falsification and Consistency rule out inconsistent evidence and hypotheses. The way
inconsistent evidence is handled is merely a technicality, and we might have decided
to treat it differently. The case of inconsistent hypotheses is different however: it is
awkward to say, for instance, that arbitrary evidence induces an inconsistent hypothe-
sis. Furthermore, in inductive concept learning often negative examples are included,
that are not to be classified as belonging to the concept, which requires consistency
of the induced rule. Also, the adoption of Consistency is the only way to treat ex-
planatory and confirmatory reasoning in a unified way as regards the consistency of
evidence and hypothesis.
In the presence of Consistency a number of other principles have to be formulated
carefully. For instance, we have reflexivity only for consistent formulae. In the light

2 Notice that, contrary to the previous two postulates, Verification and Falsification happen to be meaning-
ful also when modelling the behaviour of an induction algorithm: Verification expresses that the current
hypothesis should not be abandoned when the next observation is a predicted one (in the terminology of
(Angluin and Smith, 1983) the algorithm is conservative), while Falsification expresses that the current hy-
pothesis must be abandoned when the next observation runs counter to the predictions of the algorithm
(called consistency by (Angluin and Smith, 1983)). However, in the context of the present chapter these are
not the intended interpretations of the two postulates.
of Consistency a formula is consistent if it occurs in an hypothetical argument, either


as evidence or as hypothesis, so we have the following weaker versions of reflexivity:

Left Reflexivity
    α |< β
    ───────
    α |< α

Right Reflexivity
    α |< β
    ───────
    β |< β
If a consequence relation contains an argument α |< α, this signals that α is consistent
with the reasoner's background theory. We will call such an α admissible (with respect
to the consequence relation), and use conditions of this form whenever we require
consistency of evidence or hypothesis in a postulate.3
The final postulate mentioned in this section is a variant of Verification that allows
any prediction to be added to the hypothesis rather than to the evidence:

Right Extension
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α |< β ∧ γ

Further postulates considered below are specific to either explanatory or confirmatory


reasoning.

6.3 EXPLANATORY REASONING


In the introductory chapter we have seen that Peirce's definition of explanatory hy-
pothesis generation could be formalised by the inference rule

    C , A ⊨ C
    ──────────
    A

In the logical analysis of this chapter the explanatory inference from C to A is lifted
to the meta-level, as follows:

    A ⊨ C
    ──────
    C |< A

The symbol |< stands for the explanatory consequence relation.


In this section we will study postulates like the above 'converse entailment' prop-
erty. Throughout the section α |< β is to be read as 'evidence α is explained by hy-
pothesis β' or 'hypothesis β is a possible explanation of evidence α'. What counts
as a possible explanation will initially be left unspecified - the framework of conse-
quence relations allows us to formulate abstract properties of hypothesis generation,
without fixing a particular material definition. We will then single out a particular set
of postulates and characterise it semantically by means of so-called strong explanatory
structures.

3Readers with a background in machine learning may interpret α |< α as 'hypothesis α does not cover any
negative example'.
6.3.1 Postulates for explanatory reasoning


As has been explained before, the converse entailment postulate introduced by Peirce
has to be adapted in order to exclude inconsistent evidence and hypotheses, as follows:

Admissible Converse Entailment
    ⊨ β → α , β |< β
    ─────────────────
    α |< β
A stronger postulate is the following: possible explanations may be logically strength-
ened, as long as they remain consistent. This is expressed as follows:
Admissible Right Strengthening
    ⊨ γ → β , α |< β , γ |< γ
    ──────────────────────────
    α |< γ

We may note that Admissible Converse Entailment can be derived from Admissible
Right Strengthening if we assume Consistency and the following postulate:

Explanatory Reflexivity
    α |< α , ¬β |≮ α
    ─────────────────
    β |< β

This postulate represents a weakening of reflexivity especially tailored for explana-


tory reasoning. It is best understood by rewriting it into its contrapositive: from α |< α
and β |≮ β infer ¬β |< α, which states that if β is inadmissible, i.e. too strong a statement
with regard to the background knowledge, its negation ¬β is so weak that it is
explained by arbitrary admissible hypotheses α.

Lemma 6.2 In the presence of Consistency and Explanatory Reflexivity, Admissible


Right Strengthening implies Admissible Converse Entailment.

While the postulates above express properties of possible explanations, the fol-
lowing two postulates concentrate on the evidence. The underlying idea is a basic
principle in inductive machine learning: if the evidence is a set of instances of the
target concept, we can partition the evidence arbitrarily and find a single hypothesis
that is an explanation of each subset of instances. This principle is established by the
following two postulates: 4

Incrementality
    α |< γ , β |< γ
    ────────────────
    α ∧ β |< γ

Convergence
    ⊨ α → β , α |< γ
    ─────────────────
    β |< γ
Lemma 6.3 If |< is a consequence relation satisfying Incrementality and Convergence,
then α ∧ β |< γ iff α |< γ and β |< γ.

Incrementality and Convergence are of considerable importance for machine learn-


ing, since they allow for an incremental approach. Incrementality states that pieces

4 In
previous work (Flach, 1995) Incrementality was called Additivity, and Convergence was called Incre-
mentality. The terminology employed here better reflects the meaning of the postulates.
of evidence can be dealt with in isolation. Another way to say the same thing is that
the set of evidence explained by a given hypothesis is conjunctively closed. By the
postulate of Consistency this set is consistent, which yields the following principle:

Left Consistency
    α |< γ , β |< γ
    ────────────────
    ⊭ ¬(α ∧ β)

Lemma 6.4 In the presence of Right Reflexivity and Admissible Converse Entailment,
Left Consistency implies Consistency.

It follows that Left Consistency and Consistency are equivalent in the presence of
Right Reflexivity, Admissible Converse Entailment, and Incrementality.
Convergence expresses a monotonicity property of hypothesis generation, which
can again best be understood by considering its contrapositive: a hypothesis that is
rejected on the basis of evidence β cannot become feasible again when stronger evidence
α is available. In other words: the process of rejecting a hypothesis is not defeasible
(i.e. based on assumptions), but based on the evidence only. This is the analogue of the
monotonicity property of deduction (note that the latter can be obtained by reversing
the implication in the first condition of Convergence).

Lemma 6.5 The combination of Verification and Convergence is equivalent with the
following postulate:

Predictive Convergence
    ⊨ α ∧ γ → β , α |< γ
    ─────────────────────
    β |< γ

Predictive Convergence can be seen as a strengthening of Convergence, in the sense


that β is not merely a weakening of evidence α, but can be any set of predictions. Note
that Right Reflexivity is an instance of Predictive Convergence (put γ = β).
The final postulate we consider in this section expresses a principle well-known
from machine learning: if α represents the classification of an instance and β its
description, then we may either induce a concept definition from examples of the form
β → α, or we may add β to the background theory and induce from α alone. Since
in our framework background knowledge is included implicitly, β is added to the
hypothesis instead.

Conditionalisation
    α |< γ ∧ β
    ────────────
    β → α |< γ

After having discussed various abstract properties of generation of explanatory hy-


potheses we now turn to the question of characterising explanatory reasoning seman-
tically.

6.3.2 Strong explanatory consequence relations


As we have seen, Peirce's original idea was to define explanatory hypothesis gener-
ation as reversed deduction. I will amend Peirce's proposal in two ways. First, as
explained above it is required that the hypothesis be consistent with respect to the
background knowledge. Secondly, I reformulate reversed deduction as inclusion of


deductive consequences. The main reason for the latter is that in this way the explana-
tory consequence relation is defined in terms of a property that is preserved by its
arguments (viz. explanatory power).

Definition 6.1 An explanation mechanism is some consequence relation ⊢.5 Let
C⊢(α) denote {β | α ⊢ β}, i.e. the set of formulae explained by α under ⊢. The
explanatory consequence relation defined by ⊢ is defined as α |< β iff C⊢(α) ⊆ C⊢(β) ⊂
L. A strong explanatory consequence relation is defined by a monotonic explanation
mechanism.

Thus, an explanation is required to have at least the same explananda as the evidence
it is obtained from, without becoming inconsistent. It should be noted that, in the
general case, the conditions C⊢(α) ⊆ C⊢(β) and β ⊢ α are not equivalent.6 However,
for monotonic explanation mechanisms they are, which provides us with the following
'Peircean' definition of strong explanatory consequence relations in terms of reversed
deduction.

Definition 6.2 A strong explanatory structure is a set W ⊆ U. The consequence relation
it defines is denoted by |<_W and is defined by: α |<_W β iff (i) there is an m0 ∈ W
such that m0 ⊨ β, and (ii) for every m ∈ W, m ⊨ β → α.

That is, possible explanations of α are those hypotheses β that are consistent and entail
α, where consistency and entailment are taken relative to a background theory encoded
in W.
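
A minimal computational sketch of this definition (in Python; the background theory q → p and all names are illustrative assumptions of the sketch): W is a finite set of propositional models, and a hypothesis explains some evidence iff it is satisfiable in W and entails the evidence throughout W.

from itertools import product

ATOMS = ['p', 'q']
ALL_MODELS = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

# Strong explanatory structure W: the models of the background theory q -> p.
W = [m for m in ALL_MODELS if (not m['q']) or m['p']]

def explains(alpha, beta):
    """alpha |<_W beta: (i) beta holds in some model of W,
    (ii) beta -> alpha holds in every model of W."""
    return any(eval(beta, {}, m) for m in W) and \
           all(eval(f'not ({beta}) or ({alpha})', {}, m) for m in W)

print(explains('p', 'q'))             # True: q is a possible explanation of p
print(explains('p', 'q and not q'))   # False: inconsistent hypotheses are excluded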
The following system of postulates can be proved to axiomatise strong explanatory
structures.

Definition 6.3 The system EM consists of the following postulates:

Admissible Right Strengthening
    ⊨ γ → β , α |< β , γ |< γ
    ──────────────────────────
    α |< γ

Explanatory Reflexivity
    α |< α , ¬β |≮ α
    ─────────────────
    β |< β

Incrementality
    α |< γ , β |< γ
    ────────────────
    α ∧ β |< γ

Predictive Convergence
    ⊨ α ∧ γ → β , α |< γ
    ─────────────────────
    β |< γ

Left Consistency
    α |< γ , β |< γ
    ────────────────
    ⊭ ¬(α ∧ β)

Conditionalisation
    α |< γ ∧ β
    ────────────
    β → α |< γ

5An explanation mechanism reasons from an explanans to an explanandum, and thus should not be confused
with an explanatory consequence relation which reasons from explanandum to a possible explanans.
6C⊢(α) ⊆ C⊢(β) implies β ⊢ α if ⊢ is reflexive; β ⊢ α implies C⊢(α) ⊆ C⊢(β) if ⊢ is transitive.

We note the following derived postulates of EM: Convergence, Admissible Converse
Entailment and Right Reflexivity (instances of Predictive Convergence) and Consistency
(Lemma 6.4).

Theorem 6.6 (Soundness of EM) Any strong explanatory consequence relation sat-
isfies the postulates of EM.

In order to prove completeness we build a strong explanatory structure W from a given


consequence relation |< satisfying the postulates of EM, such that α |< β iff α |<_W β.
For non-empty explanatory relations the following construction is used:

    W = {m ∈ U | for all α, β such that α |< β : m ⊨ β → α}


An empty explanatory relation signals inconsistent background knowledge, and is
hence defined by the empty explanatory structure.

Theorem 6.7 (Completeness of EM) Any consequence relation satisfying the postu-
lates of EM is defined by a strong explanatory structure.

What we have achieved in this section is a refinement of Peirce's conception of


explanatory reasoning into a form that is more amenable to logical analysis. Rather
than defining explanatory reasoning by means of the single postulate of converse en-
tailment, we have defined it through six necessary and sufficient structural conditions
on the explanatory consequence relation. Each of these postulates can be judged on its
own merits, and perhaps changed or dropped to obtain a weaker form of explanatory
reasoning. We have also introduced an (implicit) background theory, and required that
the hypothesis be logically compatible with the evidence.

6.4 CONFIRMATORY REASONING


We will now switch from the explanatory viewpoint (as in abduction or classification-
oriented induction) to the confirmatory perspective (descriptive induction). Through-
out this section α |< β is to be read as 'evidence α confirms hypothesis β'. Our goals
will be to find reasonable properties of |< under this interpretation, and to characterise
particular sets of postulates by a suitable semantics.

6.4.1 Hempel's rationality postulates for confirmation


Carl G. Hempel (Hempel, 1943; Hempel, 1945) developed a qualitative account of
hypothesis generation that can be seen as a direct precursor of the consequence relation
approach. Before developing a material definition of confirmation he lists a number of
rationality postulates (or, as he calls them, adequacy conditions) any such definition
should satisfy. The following conditions can be found in (Hempel, 1945, pp. 103-106,
110); logical consequences of some of the conditions are also stated.
(H1) Entailment condition: any sentence which is entailed by an observation report


is confirmed by it.

(H2) Consequence condition: if an observation report confirms every one of a class K


of sentences, then it also confirms any sentence which is a logical consequence
ofK.

(H2.1) Special consequence condition: if an observation report confirms a hy-


pothesis H, then it also confirms every consequence of H.
(H2.2) Equivalence condition: if an observation report confirms a hypothesis H,
then it also confirms every hypothesis which is logically equivalent with
H.
(H2.3) Conjunction condition: if an observation report confirms each of two hy-
potheses, then it also confirms their conjunction.

(H3) Consistency condition: every logically consistent observation report is logically


compatible with the class of all the hypotheses which it confirms.

(H3.1) Unless an observation report is self-contradictory, it does not confirm any


hypothesis with which it is not logically compatible.
(H3.2) Unless an observation report is self-contradictory, it does not confirm any
hypotheses which contradict each other.

(H4) Equivalence condition for observations: if an observation report B confirms


a hypothesis H, then any observation report logically equivalent with B also
confirms H.

The entailment condition (H1) simply means that entailment 'might be referred to
as the special case of conclusive confirmation' (Hempel, 1945, p. 107). The conse-
quence conditions (H2) and (H2.1) state that the relation of confirmation is closed
under weakening of the hypothesis or set of hypotheses (H1 is weaker than H2 iff it
is logically entailed by the latter). Hempel justifies this postulate as follows (Hempel,
1945, p. 103): 'an observation report which confirms certain hypotheses would invari-
ably be qualified as confirming any consequence of those hypotheses. Indeed: any
such consequence is but an assertion of all or part of the combined content of the orig-
inal hypotheses and has therefore to be regarded as confirmed by any evidence which
confirms the original hypotheses.' Now, this may be reasonable for single hypotheses
(H2.1), but much less so for sets of hypotheses, each of which is confirmed separately.
The culprit can be identified as (H2.3), which together with (H2.1) implies (H2). A
similar point can be made as regards the consistency condition (H3), about which
Hempel remarks that it 'will perhaps be felt to embody a too severe restriction'.
(H3.1), on the other hand, seems to be reasonable enough; however, combined with
the conjunction condition (H2.3) it implies (H3). We thus see that Hempel's rationality
postulates are intuitively justifiable, except for the conjunction condition (H2.3) and,
a fortiori, the general consequence condition (H2). On the other hand, the conjunction
condition can be justified by a completeness assumption on the evidence, as will be


further discussed below.
I will now translate Hempel's set of adequacy conditions for confirmation into pos-
tulates for confirmatory consequence relations. The conditions will be slightly mod-
ified, in order to keep the treatment of inconsistent evidence and hypothesis in line
with the explanatory case: inconsistent evidence does not confirm any hypothesis, and
inconsistent hypotheses are not confirmed by any evidence.
Entailment condition (H1) is translated into two postulates:

Admissible Entailment
    ⊨ α → β , α |< α
    ─────────────────
    α |< β

Confirmatory Reflexivity
    α |< α , α |≮ ¬β
    ─────────────────
    β |< β
Admissible Entailment expresses that admissible evidence (i.e. evidence that is con-
sistent with the background knowledge) confirms any of its consequences. In other
words, consistent entailment is a special case of confirmation. Confirmatory Reflex-
ivity is the confirmatory counterpart of Explanatory Reflexivity encountered in the
previous section. It is added as a separate postulate since, in its original formulation,
(Hl) includes reflexivity as a special case. As with its explanatory counterpart, Con-
firmatory Reflexivity is best understood when considering its contrapositive: if β is
inadmissible, i.e. too strong a statement with regard to the background knowledge, its
negation ¬β is so weak that it is confirmed by arbitrary admissible formulae α.
Consequence condition (H2) cannot be translated directly, since in the language of
consequence relations as defined here we have no means to refer to a set of confirmed
sentences. However, a translation of the special consequence condition (H2.1) and the
conjunction condition (H2.3) will suffice:

Right Weakening
    ⊨ β → γ , α |< β
    ─────────────────
    α |< γ

Right And
    α |< β , α |< γ
    ────────────────
    α |< β ∧ γ

Lemma 6.8 If |< is a consequence relation satisfying Right And and Right Weakening,
then α |< β ∧ γ iff α |< β and α |< γ.

Right Weakening expresses that any hypothesis entailed by a given hypothesis confirmed
by α is also confirmed by α. Notice that Admissible Entailment is an instance
of Right Weakening (put β = α).

Lemma 6.9 The combination of Right Extension and Right Weakening is equivalent
to the following postulate:

Predictive Right Weakening
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α |< γ


In words, Predictive Right Weakening expresses that given a confirmatory argument,


any predicted formula is confirmed by the same evidence. Notice that by putting γ = α
in Predictive Right Weakening we obtain Left Reflexivity.
Right And states that the set of all confirmed hypotheses (interpreted as a conjunc-
tion) is itself confirmed. The combination of Right And and Right Weakening implies
Hempel's general consequence condition (H2): if E confirms every formula of a set K,
then it also confirms the conjunction of the formulae in K (by Right And), and therefore
also every consequence of this conjunction (by Right Weakening).7 It has already
been remarked that Right And is probably too strong in the general case, if we have
inconclusive evidence that is unable to choose between incompatible hypotheses. In
this respect it is perhaps appropriate to point at a certain similarity between Right And
and Right Extension: the latter postulate requires γ to be predicted rather than being
confirmed by α.
Like the general consequence condition (H2), general consistency condition (H3)
cannot be translated directly into a postulate, since we have no means to refer to the
set of confirmed formulae. However, in the light of Right And the conjunction of the
formulae in this set is itself confirmed, and therefore it is sufficient to formulate a
postulate expressing the special consistency condition (H3.1 ), which is the postulate
of Consistency previously encountered:

Consistency
    α |< β
    ───────────
    ⊭ ¬(α ∧ β)

Condition (H3.2) expresses that for any formula β, if β is in the set of confirmed
hypotheses then ¬β is not. This principle is expressed by the following postulate:

Right Consistency
    α |< β
    ────────
    α |≮ ¬β

Lemma 6.10 In the presence of Admissible Entailment and Left Reflexivity, Right
Consistency implies Consistency.

Clearly, Consistency implies Right Consistency in the presence of Right And. As a


corollary to Lemma 6.10, we have that Right Consistency and Consistency are equiv-
alent in the presence of Left Reflexivity, Admissible Entailment, and Right And.
Finally, the equivalence condition for observations (H4) is translated into

Left Logical Equivalence
    ⊨ α ↔ β , α |< γ
    ─────────────────
    β |< γ

We now turn to the question of devising a meaningful semantics for Hempel's conditions
as re-expressed in our framework of hypothetical consequence relations.

7 This holds only for finite K, an assumption that I will make throughout.
6.4.2 Simple confirmatory structures


In inductive logic programming, it is customary to draw a distinction between in-
duction of logic programs that are to entail given sets of examples, and induction of
integrity constraints (Muggleton and De Raedt, 1994). This distinction corresponds
more or less to our distinction between explanatory reasoning and confirmatory rea-
soning. In the latter case the evidence is considered as a partial or complete specifica-
tion of a logical model. Correspondingly, we will express a semantics for confirma-
tory reasoning in terms of satisfaction by an appropriately constructed model or set of
models.
More precisely, a confirmatory semantics is conceived as one in which certain reg-
ular models are constructed from the premisses, such that a hypothesis is confirmed
if it is true in all those regular models. The requirement of truth in all regular models
is rather strong, but can be justified by assuming that the evidence contains enough
information to construct those regular models. As this is reminiscent of the Closed
World Assumption (Helft, 1989) I will call this a closed confirmatory semantics. In
Section 6.4.4 I will consider a variant which relaxes the assumptions regarding the
evidence (open confirmatory semantics).
Definition 6.4 A confirmatory structure is a triple W = (S, ⟦·⟧, [·]), where S is a set of
semantic objects, and ⟦·⟧ : L → 2^S and [·] : L → 2^S are functions mapping formulae to
sets of semantic objects. The closed confirmatory consequence relation defined by W
is given by: α |<_W β iff (i) ⟦α⟧ ≠ ∅, and (ii) ⟦α⟧ ⊆ [β].
Intuitively, ⟦α⟧ denotes the set of regular models constructed from premisses α,
each of which should satisfy hypothesis β. A similar semantics has been considered
by (Bell, 1991), who calls it a pragmatic model. There are however two differences
between Bell's approach and mine. First, in order to rule out inconsistent premisses I
have added condition (i) in the definition of |<_W. Furthermore, I allow the possibility
that some of the regular models may not satisfy the premisses (i.e. ⟦α⟧ ⊈ [α]). However,
the characterisation of such anti-reflexive logics of confirmation is left as an open
problem, and the results obtained below are based on the assumption that ⟦α⟧ ⊆ [α]
for all α ∈ L.
Bell proves (Bell, 1991, Theorem 4.8) that his pragmatic models, including the condition
⟦α⟧ ⊆ [α], are axiomatised by the postulate system B consisting of Reflexivity,
Right Weakening, and Right And, if one additionally assumes that [·] is well-behaved
with respect to the logical connectives and logical entailment:

    [α] = S iff ⊨ α
    [α ∧ β] = [α] ∩ [β]
    [¬α] = S − [α]

and similarly for the other connectives.
Let us call confirmatory structures which satisfy these conditions, as well as ⟦α⟧ ⊆
[α] for all α ∈ L, simple confirmatory structures, defining simple (closed) confirma-
tory consequence relations. I will now demonstrate that simple confirmatory structures
are axiomatised by the following postulate system.
Definition 6.5 The system CS consists of the following postulates:

Predictive Right Weakening
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α |< γ

Right And
    α |< β , α |< γ
    ────────────────
    α |< β ∧ γ

Right Consistency
    α |< β
    ────────
    α |≮ ¬β

Derived postulates of CS include Right Weakening and Right Extension (Lemma 6.9),
Left Reflexivity (an instance of Predictive Right Weakening), Admissible Entailment
(an instance of Right Weakening), and Consistency (Lemma 6.10).

Theorem 6.11 (Soundness of CS) Any simple closed confirmatory consequence
relation satisfies the postulates of CS.

In order to prove completeness we need to build, for a given consequence relation, a


simple confirmatory structure W that defines exactly that relation. The key concept is
the following.

Definition 6.6 Let |< be a confirmatory consequence relation. The model m ∈ U is
said to be normal for α iff for all β in L such that α |< β, m ⊨ β.

For admissible formulae, normal models will play the role of the regular models in the
simple confirmatory structure we are building (remember that, given a consequence
relation |<, a formula α ∈ L is admissible iff α |< α).
In the standard treatment of inconsistent premisses they have all formulae as consequences,
hence no normal models. Since in our treatment inconsistent premisses do
not confirm any hypothesis and thus have all models in U as normal models, we have
to treat them as a separate case. Given a consequence relation |<, let W = (U, ⟦·⟧, [·])
be defined as follows:

1. U is the set of models of L under consideration;

2. ⟦α⟧ = {m ∈ U | m is a normal model for α} if α is admissible, and ∅ otherwise;

3. [α] = {m ∈ U | m ⊨ α}.

The following completeness result states that the consequence relation defined by W
coincides with the original one if the latter satisfies the postulates of CS.

Theorem 6.12 (Completeness of CS) Any consequence relation satisfying the pos-
tulates of CS is the closed consequence relation defined by a simple confirmatory
structure.
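
The completeness construction can be mirrored computationally. The following minimal Python sketch (the toy relation rel and all names are illustrative assumptions, not part of the proof) computes the normal models of a piece of evidence, which serve as its regular models in the constructed structure:

from itertools import product

ATOMS = ['p', 'q']
U = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

# A toy confirmatory relation: evidence p confirms p itself and p -> q.
rel = {('p', 'p'), ('p', 'not p or q')}

def normal_models(alpha):
    """Models of U satisfying every hypothesis confirmed by alpha (Definition 6.6)."""
    confirmed = [beta for (a, beta) in rel if a == alpha]
    return [m for m in U if all(eval(beta, {}, m) for beta in confirmed)]

print(normal_models('p'))   # [{'p': True, 'q': True}]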

One may note that two of the postulates obtained in the previous section have not
been mentioned above, viz. Confirmatory Reflexivity and Left Logical Equivalence.
Each of these postulates poses additional restrictions on simple confirmatory structures:

(Left Logical Equivalence) If ⊨ α ↔ β, then ⟦α⟧ = ⟦β⟧.
(Confirmatory Reflexivity) If [β] ≠ ∅, then ⟦β⟧ ≠ ∅.
Additional postulates may be obtained by being more explicit about the construction
of regular models. Inductive logic programming approaches like (De Raedt and
Bruynooghe, 1993) suggest taking the truth-minimal Herbrand model(s) of the evi-
dence as the regular model(s). In the analysis of nonmonotonic reasoning it is cus-
tomary to abstract this into a preference ordering on the set of models, such that the
regular models are the minimal ones under this ordering. We will work this out in the
next section.

6.4.3 Preferential confirmatory consequence relations


The main result of this section concerns an adaptation of Kraus et al.'s preferential
semantics, recast as a confirmatory semantics. In other words, the regular semantic
objects are those that are minimal with respect to a fixed preference ordering.

Definition 6.7 A preferential structure is a triple W = (S, l, <), where S is a set of
states, l : S → U is a function that labels every state with a model, and < is a strict
partial order8 on S, called the preference ordering, that is smooth.9 W defines a preferential
confirmatory structure (S, ⟦·⟧, [·]) and a preferential confirmatory consequence
relation as follows: [α] = {s ∈ S | l(s) ⊨ α}, and ⟦α⟧ = {s ∈ [α] | ∀s′ ∈ S : s′ < s ⟹
s′ ∉ [α]}.
Note that preferential confirmatory structures are simple confirmatory structures, and
that they also satisfy the conditions associated above with Left Logical Equivalence
and Confirmatory Reflexivity (by the smoothness condition, if s ∈ [α] then either
s ∈ ⟦α⟧ or there is a t < s such that t ∈ ⟦α⟧; hence [α] ≠ ∅ implies ⟦α⟧ ≠ ∅).
In comparison with the preferential semantics of (Kraus et al., 1990), the only dif-
ference is that in a preferential confirmatory argument the evidence is required to be
satisfiable, in order to guarantee the validity of Consistency. The intermediate seman-
tic level of states is mainly needed for technical reasons, and can be interpreted as
the set of models the hypothesis generating agent considers possible in that epistemic
state.
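
As a computational sketch of this semantics (Python; the preference ordering 'fewer true atoms is preferred' is an illustrative choice, and states are identified with their labelling models):

from itertools import product

ATOMS = ['p', 'q']
# States are identified with models (the labelling l is the identity).
S = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

def preferred(s, t):
    """An illustrative smooth strict partial order: fewer true atoms is better."""
    return sum(s.values()) < sum(t.values())

def regular(alpha):
    """The <-minimal states satisfying alpha."""
    sat = [s for s in S if eval(alpha, {}, s)]
    return [s for s in sat if not any(preferred(t, s) for t in sat)]

def confirms(alpha, beta):
    """Closed preferential confirmatory consequence: beta holds in every
    regular model of alpha, and there is at least one such model."""
    reg = regular(alpha)
    return bool(reg) and all(eval(beta, {}, s) for s in reg)

print(confirms('p or q', 'not (p and q)'))   # True: both minimal models satisfy it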
The following set of postulates axiomatises preferential confirmatory consequence
relations.

Definition 6.8 The system CP consists of the postulates of CS, Confirmatory Reflexivity,
Left Logical Equivalence, plus the following postulates:

Left Or
    α |< γ , β |< γ
    ────────────────
    α ∨ β |< γ

8I.e., < is irreflexive and transitive.

9I.e. for any S′ ⊆ S and for any s ∈ S′, either s is minimal in S′, or there is a t ∈ S′ such that t < s and t is
minimal in S′. This condition is satisfied if < does not allow infinite descending chains.
Strong Verification
    α |< γ , α |< β
    ────────────────
    α ∧ γ |< β

The first postulate can be seen as a variant of Convergence, which is clearly invalid in
the general case: if we weaken the evidence, there will presumably come a point where
the evidence no longer confirms the hypothesis. However, Left Or states that pieces
of confirming evidence for a hypothesis can be weakened by taking their disjunction.
The second postulate is a variant of Verification, which states that a predicted formula
γ can be added to confirming evidence α for hypothesis β. Strong Verification states
that this is also allowed when γ is confirmed by α. The way in which Strong Verifi-
cation strengthens Verification is very similar to the way Right And strengthens Right
Extension. The underlying intuition is that the evidence is strong enough to have all
confirmations "point in the same direction", as it were. 10
Theorem 6.13 (Representation theorem for CP) A consequence relation is preferential
confirmatory iff it satisfies the postulates of CP.

6.4.4 Weak confirmatory consequence relations


Any semantics that is to obey postulates like Right And and Strong Verification must
be based on completeness assumptions with regard to the evidence. On the other hand,
such strong assumptions cannot be made for all hypothesis generation tasks. It seems
reasonable, then, to investigate also an alternative approach, in which a confirmed
hypothesis is required to be true in some of the regular models. 11 It is not difficult
to see that such an alternative semantics, based on some notion of consistency, would
invalidate both Right And and Strong Verification. On the other hand, by not making
completeness assumptions one may again have the desirable property of Convergence.
Definition 6.9 Let W = (S, ⟦·⟧, [·]) be a confirmatory structure. The open confirmatory
consequence relation defined by W is given by: α |<_W β iff ⟦α⟧ ∩ [β] ≠ ∅.
We will characterise an extreme form of open confirmatory relations, which arises
when ⟦·⟧ is identified with [·], which in turn is well-behaved with respect to the
connectives and entailment.
Definition 6.10 A classical confirmatory structure is a simple confirmatory structure
(S, [·], [·]), i.e. one in which ⟦·⟧ coincides with [·]. A consequence relation is called
weak confirmatory iff it is the open consequence relation defined by a classical
confirmatory structure.

From this definition it is clear that weak confirmatory consequence relations satisfy
both Right Weakening and Left Weakening (i.e. Convergence), as well as Consistency.
One additional postulate is needed.

10In the context of nonmonotonic reasoning, Strong Verification is known as Cautious Monotonicity. For

the purposes of this chapter I prefer to use the first name, which expresses more clearly the underlying
intuition in the present context.
11 This is sometimes called credulous inference, in contrast with skeptical inference which requires truth in
all regular models.
Definition 6.11 The system CW consists of the following postulates:

Predictive Convergence
    ⊨ α ∧ γ → β , α |< γ
    ─────────────────────
    β |< γ

Predictive Right Weakening
    ⊨ α ∧ β → γ , α |< β
    ─────────────────────
    α |< γ

Consistency
    α |< β
    ───────────
    ⊭ ¬(α ∧ β)

Disjunctive Rationality
    α ∨ β |< γ , α |≮ γ
    ────────────────────
    β |< γ

Disjunctive Rationality has not been considered before in this chapter. The name
has been borrowed from Kraus et al., who identify it as a valid principle of plausible
reasoning. In the context of confirmatory reasoning, Disjunctive Rationality states that
a hypothesis cannot be confirmed by disjunctive observations unless it is confirmed by
at least one of the disjuncts separately.

Theorem 6.14 (Representation theorem for CW) A consequence relation is weak
confirmatory iff it satisfies the postulates of CW.

The system CW thus provides an axiomatisation of the relation of logical compatibility.
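
A minimal sketch of this compatibility relation (Python; all names are illustrative):

from itertools import product

ATOMS = ['p', 'q']
U = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]

def weakly_confirms(alpha, beta):
    """Open confirmation in a classical structure: alpha and beta
    are jointly satisfiable (logical compatibility)."""
    return any(eval(f'({alpha}) and ({beta})', {}, m) for m in U)

print(weakly_confirms('p', 'not p or q'))   # True: compatible
print(weakly_confirms('p', 'not p'))        # False: incompatible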

6.5 DISCUSSION
This chapter has been written in an attempt to increase our understanding of hypoth-
esis generation through logical analysis. What logic can achieve for arbitrary forms
of reasoning is no more and no less than a precise definition of what counts as a pos-
sible hypothesis given certain premisses. Evaluating the usefulness or plausibility of
hypotheses is an extra-logical matter.
There is not a single logic of hypothesis generation. The logical relationship be-
tween evidence and possible hypotheses depends on the task these hypotheses are
intended to perform. Abductive hypotheses or induced classification rules such as con-
cept definitions are based on a notion of explanation, while non-classificatory gener-
alisations are based on a notion of confirmation. Other forms of hypothesis generation
are conceivable. In this chapter I have proposed a meta-level framework for charac-
terising and reasoning about different forms of hypothesis generation. The framework
does not fix a material definition of hypothesis generation, but can be used to aggregate
knowledge about classes of such material logics of hypothesis generation.
A number of technical results have been obtained. The system EM axiomatises
explanation-preserving reasoning with respect to a monotonic explanation mechanism.
This system stays close to Peirce's conception of abduction in his later inferential
theory, but enables a fuller logical analysis. Characterisation of explanatory reasoning
with respect to weaker (e.g. preferential) explanation mechanisms is left as an open
problem.
Confirmatory reasoning has been divided into a closed form making completeness


assumptions about the evidence, and an open form not making such assumptions. The
systems CS and CP axiomatise the general and preferential forms of closed confir-
matory reasoning, conceived as reasoning about selected regular models. They repre-
sent variations of earlier, differently motivated, characterisations by (Bell, 1991) and
(Kraus et al., 1990), with a different treatment of inconsistent premisses. An impor-
tant open problem here is the axiomatisation of confirmatory structures where regular
models may not be models of the premisses. Finally, the system CW represents an
extreme form of open confirmatory reasoning (i.e. compatibility of premisses and hy-
pothesis). The point to note here is that this is the only form of confirmatory reasoning
considered here that satisfies the desirable property of Convergence, enabling an in-
cremental approach. Finding more realistic forms of open confirmatory reasoning
remains an open problem.

Acknowledgments
An extended version of this chapter appears in the Handbook of Defeasible Reasoning and
Uncertainty Management (Flach, 2000). Part of this work was supported by Esprit IV Long
Term Research Project 20237 (Inductive Logic Programming 2).
7 ABDUCTION AND INDUCTION
FROM A NON-MONOTONIC
REASONING PERSPECTIVE
Nicolas Lachiche

7.1 INTRODUCTION
Both abduction and induction aim at inferring hypotheses given some observations. In
the inferential perspective of Peirce, recalled by Flach and Kakas in the introductory
chapter, abduction is defined as the process of coming up with a hypothesis to explain
the observations. I will consider a slightly more constrained definition of abduction as
the inference of the best explanation. Reasoning from symptoms to their causes (dis-
eases) is, for instance, a typical abductive task. This definition is the most frequently
used in computer science.
The underlying notion of explanation still has to be made precise. In his chapter in
this volume, Josephson argues that some good explanations are not proofs and some
proofs are not explanations, so explanations are not deductive proofs but assignments
of causal responsibility. Similarly, Bessant introduces a hypothetico-nonmonotonic
model. We will consider in this chapter the simpler case where explanation is modelled
by logical consequence. We will also assume that knowledge is represented in a first-
order predicate logic.
While induction is sometimes considered to cover any ampliative reasoning, as mentioned
by Flach and Kakas, in this chapter induction will refer only to inductive
generalisation: induction is the inference of general laws from observations. Given the
observations of the positions of the planets during a limited lapse of time, inferring
that the planets move in ellipses with the sun at one focus is an inductive task.

Clearly abduction and induction meet each other when the inference of general laws
explaining the observations is considered. In this case, abduction and induction can-
not be distinguished in the inference process. I will however argue that they can still
be separated according to the utility of inferred hypotheses. Actually, the inference of
general rules explaining the observations is one form of induction, appropriately called
explanatory induction. There is another form of induction whose aim is not to explain
the observations, but simply to point out regularities confirmed by the observations.
This other kind of induction is called descriptive (or confirmatory) induction. The
relationship between abduction and descriptive induction is less obvious and requires
the consideration of their underlying completion principles. Abduction and induc-
tion are indeed forms of non-monotonic reasoning. Their conclusions, which are rather
called hypotheses, may be falsified when new knowledge is provided. Abduction and
induction are ampliative: they provide more knowledge than can be deduced from the
available knowledge. Their results can however be built automatically by a computer
system, and that means that they can be deduced in some way. This is done by com-
pleting the limited available knowledge in order to have enough information to be able
to use deduction. The completion technique has therefore to represent the intended
reasoning. At least one completion technique has been used for both abduction and
induction but this does not mean that the completion principles are the same in both
cases. In fact, I will argue that induction relies on a completion of individuals whereas
abduction relies on a completion of properties. Therefore, abduction and induction
differ from a non-monotonic reasoning perspective.
In the following sections, the different meanings of abduction and of induction
are recalled, in particular the explanatory and descriptive forms of induction. These
two forms of induction are then compared with abduction. The case of explanatory
induction is quite obvious, but descriptive induction will require detailing the underlying
completion principles. The relationship between abduction and induction will finally
be discussed in the light of their freshly highlighted completion principles and their
different applications.

7.2 DEFINITIONS
In this section, the different meanings that can be given to logic-based abduction and
induction are presented. We will first define abduction and the underlying notion
of explanation. Then we will consider induction and its explanatory and descriptive
forms.
Actually, the non-monotonicity of both forms of reasoning does not mean that every
piece of knowledge that cannot be deduced from the observations is a hypothesis. Let
us first define abductive hypotheses as the set of preferred hypotheses explaining the
observations.

Definition 7.1 (Abduction) Given

• a set of observations E in a language L,

• a domain theory, also called background knowledge, Th in L,


• a set of preference criteria C intensionally characterising preferred hypotheses
  in the meta-language of L,

abduction consists in determining the set of hypotheses H from L which explain E
given Th and are preferred according to C.

The notion of explanation can be defined in numerous ways (Josephson, this vol-
ume). In this chapter, we will only model explanation by logical consequence. We are
thus in the hypothetico-deductive model detailed by Bessant (this volume). Clearly,
a pure deductive model cannot completely fit the real world and any pure deductive
model has to be adapted into a hypothetico-nonmonotonic model to cover some ex-
ceptions. However, a pure deductive model usually covers most of the observations
and therefore provides a good basis for reasoning.

Definition 7.2 (Explanation) A set of hypotheses H explains a set of observations E
given a domain theory Th if and only if H ∧ Th is consistent and H ∧ Th ⊨ E.

For instance, given the observation E = mortal(Socrates) and the domain theory
Th = ∀X, human(X) ⇒ mortal(X), the formula H = human(Socrates) is an abductive
hypothesis for E given Th.
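
For a concrete check of Definition 7.2, here is a minimal Python sketch in which the Socrates example is rendered propositionally over its ground instances (the atom names, such as human_socrates, are an encoding assumption of the sketch):

from itertools import product

ATOMS = ['human_socrates', 'mortal_socrates']
U = [dict(zip(ATOMS, bits)) for bits in product([False, True], repeat=2)]
TH = 'not human_socrates or mortal_socrates'   # ground rendering of the domain theory
E = 'mortal_socrates'                          # the observation

def entails(premise, conclusion):
    return all(eval(conclusion, {}, m) for m in U if eval(premise, {}, m))

def explains(h):
    """Definition 7.2: H & Th is consistent and H & Th |= E."""
    h_and_th = f'({h}) and ({TH})'
    consistent = any(eval(h_and_th, {}, m) for m in U)
    return consistent and entails(h_and_th, E)

print(explains('human_socrates'))        # True: an abductive hypothesis
print(explains('not mortal_socrates'))   # False: consistent, but does not entail E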
Inductive hypotheses are not necessarily required to be explanatory. Inferring that
planets move in ellipses from observations does describe some general knowledge on
planets, but does not explain a lot. Roughly, inductive hypotheses have to account for
the observations.

Definition 7.3 (Induction) Given

• a set of observations E in a language L,

• a domain theory Th in L,

• a set of preference criteria C intensionally characterising preferred hypotheses


in the meta-language of L,

induction consists in determining the set of hypotheses H from L which account for E
given Th and are preferred according to C.

Accounting for the observations can have at least two meanings: hypotheses can
either explain the observations, or reflect regularities of the observations. The for-
mer defines explanatory induction and the latter defines descriptive induction. The
definition of explanatory induction is thus the same as that of abduction.

Definition 7.4 (Explanatory induction) Given

• a set of observations E in a language L,

• a domain theory Th in L,

• a set of preference criteria C intensionally characterising preferred hypotheses


in the meta-language of L,
explanatory induction consists in determining the set of hypotheses H from L which
explain E given Th,

    H ∧ Th ⊨ E,

and are preferred according to C.

For instance, given the observation E = mortal(Socrates) and the domain theory
Th = human(Socrates), the formula
    H = ∀X, human(X) ⇒ mortal(X)

is an explanatory inductive hypothesis for E given Th. According to the previous


definitions, it is also an abductive hypothesis.
Explanatory induction is the form of induction most considered in inductive logic
programming in particular and in machine learning in general. It is often
referred to as supervised learning. The other form of induction was introduced re-
cently by (Helft, 1989). Descriptive induction, also called confirmatory induction,
looks for regularities reflecting the observations, that is general laws satisfied by the
observations.

Definition 7.5 (Descriptive induction) Given

• a set of observations E in a language L,


• a domain theory Th in L,

• a set of preference criteria C intensionally characterising preferred hypotheses


in the meta-language of L,

descriptive (or confirmatory) induction consists in determining the set of hypotheses
H from L such that:

• Comp_SA(E ∧ Th) ⊨ H,

• H is preferred according to C.

Comp_SA(E ∧ Th) is the completion of the initial knowledge by a similarity assumption
such that unknown individuals behave like the known ones.

For instance, given the set of observations

    E = {mortal(Socrates), human(Socrates)}

and the domain theory Th = ∀X, man(X) ⇒ human(X), the formula H = ∀X, human(X) ⇒
mortal(X) is a descriptive inductive hypothesis for E given Th.
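
A minimal Python sketch of the similarity assumption (the ground knowledge base and predicate names are illustrative; the sketch additionally reads the absence of a ground fact as falsity): once the completed theory closes the domain over the known individuals, a general law is a descriptive hypothesis exactly when it holds for each of them.

# Ground knowledge base: everything known about the known individuals.
facts = {('human', 'socrates'), ('mortal', 'socrates')}
individuals = {i for (_, i) in facts}

def law_holds(premise_pred, conclusion_pred):
    """Does premise(X) => conclusion(X) hold for every known individual?
    Under the similarity assumption these are the only individuals."""
    return all((premise_pred, i) not in facts or (conclusion_pred, i) in facts
               for i in individuals)

print(law_holds('human', 'mortal'))   # True: a descriptive inductive hypothesis
print(law_holds('mortal', 'man'))     # False: man(socrates) is not known to hold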
7.3 ABDUCTION AND EXPLANATORY INDUCTION


It follows from the definitions that explanatory induction is closely related to abduc-
tion. Sometimes, syntactic constraints are put forward to distinguish explanatory in-
duction from abduction (see the chapters by Bessant, Lamma et al., and Sakama). The
latter is restricted to the inference of ground facts whereas the former infers general
rules. This restriction is successfully modelled from a semantic point of view in (De-
necker et al., 1996). But I think the use of such restrictions is not desirable since
abduction can also consider general formulas. Thus both forms of reasoning consist
in determining hypotheses that explain (in the sense of logical consequence) the ob-
servations. This is the hypothetico-deductive model presented in Bessant's chapter. It
can also be related to the meta-level consistency condition expressed by Christiansen
(this volume).
As mentioned by Abe (this volume), both abduction and induction can be seen
as inverse deduction. Inverse deduction can be used for abduction (Pople, Jr., 1973)
or for induction (Muggleton and Buntine, 1988). When the observations E and the
domain theory Th are represented by closed formulas, the deduction theorem gives
the following result, cited in (Grégoire and Saïs, 1996):

    H ∧ Th ⊨ E if and only if ¬E ∧ Th ⊨ ¬H.
By adding a negation, hypotheses can thus be deduced from observations. This rela-
tion is, for instance, used in the Progol system (Muggleton, 1995).
For instance, given the observation E = mortal(Socrates) and the domain theory
Th = human(Socrates), the formula

    ¬H = ∃X [¬mortal(X) ∧ human(X)]

is a logical consequence of

    ¬E ∧ Th = ¬mortal(Socrates) ∧ human(Socrates)

and H = ∀X, human(X) ⇒ mortal(X) is an explanatory inductive hypothesis for E
given Th.
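
The equivalence can be checked mechanically on a small domain. The following
Python sketch (mine, not from the chapter) grounds H, Th and E over the single
individual Socrates and verifies both directions by enumerating truth assignments:

    from itertools import product

    # Ground atoms for the single known individual Socrates.
    ATOMS = ["human(socrates)", "mortal(socrates)"]

    def models():
        # Enumerate all truth assignments over the ground atoms.
        for values in product([False, True], repeat=len(ATOMS)):
            yield dict(zip(ATOMS, values))

    def entails(premise, conclusion):
        # premise |= conclusion: every model of the premise satisfies the conclusion.
        return all(conclusion(m) for m in models() if premise(m))

    # H = forall X, human(X) => mortal(X), ground over the domain {socrates}.
    H  = lambda m: (not m["human(socrates)"]) or m["mortal(socrates)"]
    Th = lambda m: m["human(socrates)"]
    E  = lambda m: m["mortal(socrates)"]

    left = entails(lambda m: H(m) and Th(m), E)                          # H & Th |= E
    right = entails(lambda m: (not E(m)) and Th(m), lambda m: not H(m))  # ~E & Th |= ~H
    print(left, right)   # True True: the two formulations agree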
Thus, if we do not consider syntactical or semantical restrictions on the form of
hypotheses, we conclude that there is no difference between abduction and explanatory
induction from a logical point of view. But there is still a difference between abduction
and explanatory induction concerning the intended use of hypotheses. This will be
discussed in Section 7.5 since it applies to descriptive induction as well.

7.4 ABDUCTION AND DESCRIPTIVE INDUCTION


The relationship between descriptive induction and abduction is less straightforward.
Some techniques have been used in both descriptive induction and abduction. These
techniques are actually completion techniques used to complete the available knowl-
edge in order to be able to produce hypotheses deductively. We will consider an
example of a completion technique that has been used for both abduction and descrip-
tive induction and a completion technique specific to descriptive induction. We will
then define more precisely the underlying completion principles of abduction and of
descriptive induction.

7.4.1 A non-monotonic reasoning perspective


Abduction and induction share similarities at the inferential level. In fact, they look
for a set of hypotheses in a language L given a set of observations and possibly a
domain theory. These hypotheses can be characterised intensionally by completion
assumptions, which are often formulas from a meta-language describing L. The problem
consists in finding them in extension, that is, in L. For instance, the similarity assump-
tion of descriptive induction is a completion assumption. It can be represented by the
formula ∀X, X ∈ C, where C is the set of known individuals. Given this completion
technique, descriptive inductive hypotheses are defined as being logical consequences
of the (thus completed) observations.
Several works that characterise non-monotonic inference as the production of hypothe-
ses explicitly originate in the idea of a search for hypotheses characterised by a com-
pletion principle. The alternative is not to produce the hypotheses explicitly, but to
reason directly from the completion assumptions, as is done, for instance, with
circumscription (Poole, 1988a; Brewka, 1989; Cayrol, 1992).
While abduction and induction can be modelled as deduction from a completed
theory, different completion policies must be considered. Some have been used by
both abduction and induction while others are specific to descriptive induction.

7.4.2 A shared completion technique


Clark's predicate completion (Clark, 1978) is a completion technique that has been
used for both forms of reasoning.
It has been used in several works on abduction (Console et al., 1991b; Inoue, 1992b;
Konolige, 1992). For instance, given the observation E = mortal(Socrates) and the
domain theory

    Th = ∀X, human(X) ⇒ mortal(X),

the hypothesis H = human(Socrates) can be deduced from E ∧ Th by adding the other
half of Clark's predicate completion of Th, that is the formula

    ∀X, mortal(X) ⇒ human(X).

Clark's predicate completion has also been used for descriptive induction. In the
CLAUDIEN system proposed by (De Raedt and Bruynooghe, 1993), a clause is a
descriptive inductive hypothesis if no counter-instance of it can be deduced from the
completed initial knowledge, Comp_Clark(E ∧ Th). For instance, Clark's predicate
completion of the set of observations

    E = {human(Socrates), mortal(Socrates)}

and the domain theory

    Th = ∀X, man(X) ⇒ human(X)

adds the set of formulas

    {∀X, human(X) ⇒ (man(X) ∨ (X = Socrates));
     ∀X, mortal(X) ⇒ (X = Socrates);
     ∀X, ¬man(X)}.

Clearly, the formula

    ∃X, human(X) ∧ ¬mortal(X)

cannot be satisfied in Comp_Clark(E ∧ Th), thus its negation

    ∀X, ¬human(X) ∨ mortal(X)

is a descriptive inductive hypothesis for E given Th.
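
For this ground example, the CLAUDIEN-style test can be sketched in a few lines:
Clark's completion pins each predicate to exactly its derivable instances, so checking
a clause against the completed knowledge amounts to checking it in the least Herbrand
model of E ∧ Th. The following Python sketch is my own illustration, not CLAUDIEN
code:

    # Least Herbrand model of E ∧ Th: start from the facts and apply the rule
    # man(X) => human(X) until fixpoint. Under Clark's completion, every atom
    # not derivable this way is false.
    facts = {("human", "socrates"), ("mortal", "socrates")}
    domain = {"socrates"}

    model = set(facts)
    changed = True
    while changed:
        changed = False
        for x in domain:
            if ("man", x) in model and ("human", x) not in model:
                model.add(("human", x))
                changed = True

    def counter_instances(model, domain):
        # Counter-instances to 'human(X) => mortal(X)' in the completed knowledge.
        return [x for x in domain
                if ("human", x) in model and ("mortal", x) not in model]

    # No counter-instance exists, so the clause qualifies as a descriptive
    # inductive hypothesis in the sense of the CLAUDIEN check above.
    print(counter_instances(model, domain))   # []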
Thus abduction and descriptive induction share at least one common completion
technique. However, there exist some completion techniques specific to descriptive
induction that do not seem to apply to abduction.

7.4.3 A completion technique specific to induction


Descriptive induction basically relies on the similarity assumption that "All individuals
behave like the known ones". Hempel expresses it in the following way (Hempel,
1943):

    A hypothesis H is confirmed by the set of observations E and the domain theory
    Th if and only if H would be a logical consequence of E ∧ Th if the domain of
    individuals were reduced to the set of individuals appearing in E ∧ Th.

Thus, if we denote by C the set of individuals appearing in E ∧ Th, H is confirmed
by E ∧ Th if and only if

    E ∧ Th ∧ ∀X, X ∈ C ⊨ H.

This completion principle is known as the domain closure assumption.


For instance, given the set of observations

    E = {mortal(Socrates), human(Socrates)}

and the domain theory

    Th = ∀X, man(X) ⇒ human(X),

the set of individuals appearing in E ∧ Th is C = {Socrates} and clearly the formula

    H = ∀X, human(X) ⇒ mortal(X)

is a logical consequence of

    E ∧ Th ∧ ∀X, (X = Socrates).
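
A minimal Python sketch of this confirmation test (my own illustration; the quantifier
in H is evaluated over the closed domain C only):

    # Domain closure: evaluate H with X ranging only over the individuals
    # appearing in E ∧ Th (here C = {socrates}).
    C = {"socrates"}
    # Ground atoms made true by E; Th (man(X) => human(X)) adds nothing here,
    # since man has no known instances.
    true_atoms = {("mortal", "socrates"), ("human", "socrates")}

    def holds(pred, x):
        return (pred, x) in true_atoms

    # H = forall X, human(X) => mortal(X), restricted to C. For this example
    # the atoms relevant to H are fixed by E, so checking them directly
    # suffices; in general one would enumerate all models over C.
    confirmed = all((not holds("human", x)) or holds("mortal", x) for x in C)
    print(confirmed)   # True: H is confirmed under the domain closure assumption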

At least two definitions have been proposed for the notion of "appearing" individ-
uals. (Lachiche and Marquis, 1997) suggested using the Herbrand domain of E ∧ Th,
while (Hempel, 1945) suggested using the set of "essential" individuals of E ∧ Th,
that is, the set of individuals appearing in every formula logically equivalent to E ∧ Th.
This issue is discussed in (Lachiche and Marquis, 1997).
The domain closure assumption is clearly appropriate to model descriptive induc-
tion, but it does not seem appropriate to model abduction. Actually, abduction and
induction do not rely on the same completion principles.

7.4.4 Circumscription of properties and circumscription of individuals


Clark's predicate completion represents a kind of inverse deduction. It roughly inverts
implications such that each conclusion entails its condition. Hence the explanation of
the observations can be automatically deduced from the completed knowledge. Clark's
completion is thus appropriate to abduction.
However, Clark's predicate completion is less appropriate to descriptive induction.
(Lachiche and Marquis, 1997) have argued that the use of Clark's completion, for
instance in (De Raedt and Bruynooghe, 1993), or of minimal models, for instance in
(Helft, 1989; Muggleton and De Raedt, 1994; Wrobel and Dzeroski, 1995) requires
a more restricted language of hypotheses than the domain closure assumption and
entails a closed world assumption that is not necessarily wanted.
To be precise, descriptive induction and abduction do not require the same comple-
tion policy. Descriptive induction simply requires the circumscription of individuals
of the domain, without more assumptions: all individuals behave like the known ones.
Abduction requires the circumscription of properties of the (known!) individuals: ex-
planations are assumed to be among the known explanations. These assumptions en-
able the use of deduction to check the validity of a hypothesis.
Usually a circumscription of properties entails a circumscription of individuals.
It can then be used for induction, but the converse is false. This explains why some
completion principles, such as Clark's predicate completion, have been used for both
abduction and induction. But since the circumscription of properties is stronger than
the circumscription of individuals, using the former for descriptive induction requires
a restriction of expressiveness (Lachiche and Marquis, 1997), therefore the latter must
be preferred.

7.5 DISCUSSION
Both abduction and explanatory induction infer an explanation of the observations. I
don't think that syntactical constraints or language biases, such as considering only ab-
ducibles, must be put forward to distinguish between induction and abduction. There-
fore, abduction and explanatory induction can be seen as logically identical. However,
they can still be separated from an epistemic perspective. These epistemic differences
apply to descriptive induction as well, and further differences can be pointed out
for descriptive induction.

7.5.1 A non-monotonic perspective


Descriptive induction, also known as the non-monotonic setting of inductive logic
programming (Muggleton and De Raedt, 1994), has been less considered than ex-
planatory induction by computer scientists, and therefore less considered in the recent
debate on abduction and induction. Considering descriptive induction helps to clarify
the relationship between abduction and induction by making explicit the completion
principle of each form of reasoning. As we pointed out, abduction and descriptive
induction basically do not rely on the same completion principles. Abduction makes
use of a circumscription of properties: the explanation is among the known ones.
Descriptive induction relies on a circumscription of individuals: all individuals are

known. To the circumscription of properties required by abduction is usually added
a circumscription of individuals. Since the circumscription of individuals is the only
completion required by descriptive induction, techniques used for abduction can often
be used for descriptive induction. But clearly, some completion principles, like the do-
main closure assumption, which are appropriate to descriptive induction do not seem
appropriate for abduction.
Another main difference between abduction and descriptive induction is related
to the status of completed knowledge. In the case of abduction, formulas resulting
from the completion can be added to the initial knowledge, as long as no new given
knowledge refutes them. For instance, completing the knowledge with the formula
∀X, mortal(X) ⇒ human(X) simply states that the only known cause of mortality is
to be human, even if this belief can be falsified in the future. In the case of descriptive
induction, formulas resulting from the completion are only needed for calculus. They
are not intended to be kept: they perform a domain closure and we certainly do not
believe that all individuals are known since the aim of induction is to produce general
formulas which apply to new individuals.

7.5.2 An epistemic point of view


The completion principles also reflect the interests of abduction and induction. Abduc-
tion focuses on properties while induction focuses on individuals. This can be related
to the attention paid to the generality of hypotheses and to their explanatory interest,
as pointed out by Flach and Kakas (this volume). Abduction considers mainly known
individuals. Abduction can sometimes introduce unknown individuals by skolemising
them, but basically it is not concerned with the existence of other possible individuals,
while induction is performed especially to cover new individuals. This is the difference be-
tween abduction and explanatory induction: while they are technically identical, they
differ from an epistemic point of view. An abductive hypothesis is intended to explain
the observations. An explanatory inductive hypothesis is not intended to explain the
observations; even if it does, it is intended to apply to new individuals. The difference
is even more important with descriptive induction. A descriptive inductive hypothesis
is first intended to apply to new individuals and secondly it is absolutely not required
to explain the observations, but on the contrary to be confirmed by the observations.

7.6 CONCLUSION
In this chapter, abduction is considered as the process of inferring an explanation for
given observations. Explanations are assumed to logically entail the observations.
This definition is close to the one of explanatory induction. In fact, they are identical
when no syntactical arguments are taken into account and I believe that the difference
between abduction and induction should not rely on syntactical restrictions. Logically
speaking, abduction and explanatory induction are the same process.
One originality of this chapter is to compare abduction with another form of induc-
tion called descriptive (or confirmatory) induction. This kind of induction has been
studied for a long time by philosophers, but has only recently been considered in arti-
ficial intelligence. Descriptive induction doesn't aim at explaining the observations but

rather at reflecting regularities satisfied by the observations. Some completion princi-


ples used in abduction, for instance Clark's predicate completion, have been used for
descriptive induction as well. I have pointed out that this is due to the completion of
the individuals hidden in those completion techniques, and that other completion tech-
niques modelling only a circumscription of individuals exist that are more appropriate
to descriptive induction than those using a circumscription of properties. Moreover
a circumscription of individuals does not seem to be of any use for abduction. Thus
abduction and descriptive induction are different in their completion techniques.
Explanatory reasoning, either called abduction or explanatory induction, relies on
the assumption that the explanation of the observations is among the known properties
whereas descriptive induction relies on the assumption that the set of individuals con-
sists only of the known ones. I have emphasised that abduction and induction differ on
several practical aspects. The completed knowledge can be kept in abduction whereas
it must not be in descriptive induction. The abductive hypothesis is only required to
explain the current observations whereas the inductive hypothesis is intended to cover
more observations. Finally, from my point of view, the generation of a general rule
explaining given observations is either an abductive process if it aims at explaining the
observations only, or an inductive process if the rule is intended to cover new obser-
vations. Thus abduction and induction are different forms of reasoning. However, this
does not prevent them from being used in symbiosis to help each other; see, for instance,
the chapters by Mooney and Lamma et al.

Acknowledgments
This work was partially supported by Esprit IV Long Term Research Project 20237 (Inductive
Logic Programming 2). I would like to thank Pierre Marquis and Antony Bowers for their
helpful comments.
III  The integration of abduction and induction:
     an Artificial Intelligence perspective
8 UNIFIED INFERENCE IN
EXTENDED SYLLOGISM
Pei Wang

8.1 TERM LOGIC AND PREDICATE LOGIC


There are two major traditions in formal logic: term logic and propositional/predicate
logic, exemplified respectively by the Syllogism of Aristotle and the First-Order Pred-
icate Logic founded by Frege, Russell, and Whitehead.
Term logic is different from predicate logic in both its knowledge representation
language and its inference rules. Term logic represents knowledge in subject-predicate
statements. In the simplest form, such a statement contains two terms, linked together
by an inheritance relation:

    S ⊂ P

where S is the subject term of the statement, and P is the predicate term. Intuitively,
this statement says that S is a specialization (instantiation) of P, and P is a generaliza-
tion (abstraction) of S. This roughly corresponds to "S is a kind of P" in English.
Term logic uses syllogistic inference rules, in each of which two statements shar-
ing a common term generate a new statement linking the two unshared terms. In
Aristotle's syllogism (Aristotle, 1989), all statements are binary (that is, either true
or false), and all valid inference rules are deductive; therefore, when both premises
are true, the conclusion is guaranteed to be true. When Aristotle introduced the de-
duction/induction distinction, he presented it in term logic, and so did Peirce when
he added abduction into the picture (Peirce, 1958). According to Peirce, the deduc-
tion/abduction/induction triad is defined formally in terms of the position of the shared
term (Table 8.1). Defined in this way, the difference among the three is purely syn-
tactic: in deduction, the shared term is the subject of one premise and the predicate of

        Deduction        Abduction        Induction

        M ⊂ P            P ⊂ M            M ⊂ P
        S ⊂ M            S ⊂ M            M ⊂ S
        -----            -----            -----
        S ⊂ P            S ⊂ P            S ⊂ P

        Table 8.1  The deduction/abduction/induction triad.

the other; in abduction, the shared term is the predicate of both premises; in induction,
the shared term is the subject of both premises. If we only consider combinations of
premises with one shared term, these three exhaust all the possibilities (the order of
the premises does not matter for our current purpose).
It is well-known that only deduction can generate sure conclusions, while the other
two are fallible. However, it seems that abduction and induction, expressed in this way,
do happen in everyday thinking, though they do not serve as rules for proof or demon-
stration, as deduction does. In seeking a semantic justification for them, Aristotle
noticed that induction corresponds to generalization, and Peirce proposed that abduc-
tion is for explanation. Later, when Peirce's focus turned to the pragmatic usage of
non-deductive inference in scientific inquiry, he viewed abduction as the process of
hypothesis generation, and induction as the process of hypothesis confirmation.
Peirce's two ways to classify inference, syllogistic and inferential (in the terminol-
ogy introduced by Flach and Kakas in their introductory chapter), correspond to two
levels of observation in the study of reasoning. The "syllogistic" view is at the micro
level, and is about a single inference step. It indicates what conclusion can be derived
from given premises. On the other hand, the "inferential" view is at the macro level,
and is about a complete inference process. It specifies the logical relations between
the initial premises and the final result, without saying how the result is obtained, step
by step.
Though the syllogistic view is constructive and appealing intuitively, some issues
remain open, as argued by Flach and Kakas:

1. Abduction and induction, when defined in the syllogistic form, have not ob-
tained an appropriate semantic justification.
2. The expressive capacity of the traditional term-oriented language is much lower
than that of the language used in predicate calculus.

I fully agree that these two issues are crucial to term logic.
For the first issue, it is well known that abduction and induction cannot be justified
model-theoretically in the way deduction is. Since "preserving truth in all models" seems
to be the only existing definition for the validity of inference rules, in what sense are
abductive and inductive conclusions "better", or "more rational", than arbitrary
conclusions? In what semantic aspect do they differ from each other?

The second issue actually is a major reason why syllogism lost its dominance to
"mathematical logic" (mainly, first-order predicate logic) one hundred years ago. How
much of our knowledge can be put into the form "S is a kind of P", where S and P are
English words? Currently almost all logic textbooks treat syllogism as an ancient and
primary form of logic, and as a subset of predicate logic in terms of functionalities.
It is no surprise that even in the works that accept the "syllogistic" view of Peirce on
abduction and induction, the actual formal language used is not that of syllogism, but
predicate logic (see the chapter by Christiansen).
In the following, I want to show that when properly extended, syllogism (or term
logic, I am using these two names interchangeably here) provides a more natural,
elegant, and powerful framework, in which deduction, abduction, and induction can
be unified in syntax, semantics, and pragmatics. Furthermore, this new logic can be
implemented in a computer system for the purpose of artificial intelligence.

8.2 EXTENDED SYLLOGISM IN NARS


NARS is an intelligent reasoning system. In this chapter, I only introduce the part of
it that is directly relevant to the theme of this book. For more comprehensive descrip-
tions of the system, see (Wang, 1994; Wang, 1995).

8.2.1 Syntax and semantics


NARS is designed to be adaptive to its environment, and to work with insufficient
knowledge and resources. The system answers questions in real time according to
available knowledge, which may be incomplete (with respect to the questions to be
answered), uncertain, and with internal conflicts.
In NARS, each statement has the form

    S ⊂ P <F, C>

where "S ⊂ P" is used as in the previous section, and "<F, C>" is a pair of real
numbers in [0, 1] representing the truth value of the statement. F is the frequency of
the statement, and C is the confidence.
When both F and C reach their maximum value, 1, the statement indicates a com-
plete inheritance relation from S to P. This special case is written as "S ⊑ P". By
definition, the binary relation "⊑" is reflexive and transitive.
We further define the extension and intension of a term T as sets of terms:

    E_T = {x | x ⊑ T}  and  I_T = {x | T ⊑ x}

respectively. It can be proven that

    (S ⊑ P) ⟺ (E_S ⊆ E_P) ⟺ (I_P ⊆ I_S)

where the first relation is an inheritance relation between two terms, while the last
two are inclusion relations between two sets. This is why "⊑" is called a "complete
inheritance" relation: "S ⊑ P" means that P completely inherits the extension of S,
and S completely inherits the intension of P.

As mentioned before, "⊑" is a special case of "⊂". In NARS, according to the as-
sumption of insufficient knowledge, inheritance relations are usually incomplete. To
adapt to its environment, even incomplete inheritance relations are valuable to NARS,
and the system needs to know how incomplete the relation is, according to given
knowledge. Though complete inheritance relations do not appear as given knowl-
edge to the system, we can use them to define positive and negative evidence of a
statement in idealized situations, just like we usually define measurements of physical
quantities in highly idealized situations, then use them in actual situations according
to the definition.
For a given statement "S ⊂ P" and a term M, when M is in the extension of S, it
can be used as evidence for the statement. If M is also in the extension of P, then the
statement is true, as far as M is concerned; otherwise it is false, with respect to M. In
the former case, M is a piece of positive evidence, and in the latter case, it is a piece of
negative evidence. Similarly, if M is in the intension of P, it becomes evidence for the
statement, and it is positive if M is also in the intension of S, but negative otherwise.
For example, if we know that iron (M) is metal (S), then iron can be used as ev-
idence for the statement "Metal is crystal" (S ⊂ P). If iron (M) is crystal (P), it is
positive evidence for the above statement (that is, as far as its instance iron is con-
cerned, metal is crystal). If iron is not crystal, it is negative evidence for the above
statement (that is, as far as its instance iron is concerned, metal is not crystal). There-
fore, syntactically, "Metal is crystal" is an inductive conclusion defined in term logic;
semantically, the truth value of the conclusion is determined by checking for inherited
instance (extension) from the subject to the predicate; pragmatically, the conclusion
"Metal is crystal" is a generalization of "Iron is crystal", given "Iron is metal" as
background knowledge.
Similarly, if we know that metal (P) is crystal (M), then "being crystal" can be
used as evidence for the statement "Iron is metal" (S ⊂ P). If iron (S) is crystal (M),
it is positive evidence for the above statement (that is, as far as its property crystal is
concerned, iron is metal). If iron is not crystal, it is negative evidence for the above
statement (that is, as far as its property crystal is concerned, iron is not metal). There-
fore, syntactically, "Iron is metal" is an abductive conclusion defined in term logic;
semantically, the truth value of the conclusion is determined by checking for inherited
property (intension) from the predicate to the subject; pragmatically, the conclusion
"Iron is metal" is an explanation of "Iron is crystal", given "Metal is crystal" as back-
ground knowledge.
The perfect parallelism of the above two paragraphs indicates that induction and
abduction, when defined in term logic as above, become duals of each other.
If the given knowledge to the system is a set of complete inheritance relations, then
the weight of positive evidence and the weight of total (positive and negative) evidence
are defined, respectively, as

    W+ = |E_S ∩ E_P| + |I_P ∩ I_S|,    W = |E_S| + |I_P|

where set-theoretic notation is used.
Finally, the truth value mentioned previously is defined as

    F = W+ / W,    C = W / (W + 1)

    Deduction                  Abduction                  Induction

    M ⊂ P <F1, C1>             P ⊂ M <F1, C1>             M ⊂ P <F1, C1>
    S ⊂ M <F2, C2>             S ⊂ M <F2, C2>             M ⊂ S <F2, C2>
    --------------             --------------             --------------
    S ⊂ P <F, C>               S ⊂ P <F, C>               S ⊂ P <F, C>

    F = F1F2/(F1+F2-F1F2)      F = F2                     F = F1
    C = C1C2(F1+F2-F1F2)       C = F1C1C2/(F1C1C2+1)      C = F2C1C2/(F2C1C2+1)

        Table 8.2  The NARS inference rules.

Intuitively, F is the proportion of positive evidence among all evidence, and C is the
proportion of current evidence among the evidence in the near future (after a unit of
evidence weight is collected). When C is 0, it means that the system has no evidence on
the proposed inheritance relation at all (and F is undefined); the more evidence the
system gets (whether positive or negative), the more confident the system is in this
judgment.
Now we can see that while in traditional binary logics the truth value of a state-
ment qualitatively indicates whether there exists negative evidence for the statement,
in NARS the truth value quantitatively measures the available positive and negative evi-
dence.
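
To make these definitions concrete, here is a small Python sketch (mine; the toy sets
below are invented) that computes the truth value of a statement from given extensions
and intensions:

    def truth_from_evidence(ES, EP, IS, IP):
        # Compute <F, C> for 'S ⊂ P' from the extensions and intensions of
        # S and P, following W+ = |ES ∩ EP| + |IP ∩ IS| and W = |ES| + |IP|.
        w_plus = len(ES & EP) + len(IP & IS)
        w = len(ES) + len(IP)
        if w == 0:
            return None, 0.0       # no evidence: C = 0 and F is undefined
        return w_plus / w, w / (w + 1)

    # 'swan ⊂ swimmer' judged from invented toy extensions/intensions:
    F, C = truth_from_evidence(ES={"cygnus"}, EP={"cygnus", "larus"},
                               IS={"feathered"}, IP=set())
    print(F, C)   # 1.0 0.5: one piece of positive evidence, none negative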

8.2.2 Inference rules


Though the truth value of a statement is defined in terms of counting in extension and
intension of the two terms, it is not actually obtained in this way in NARS. Instead, the
user specifies truth values of input statements (according to the semantics described
in the previous section), and the inference rules in NARS have truth-value functions
attached to calculate the truth values of the conclusions from those of the premises in
each inference step.
The triad of inference in NARS is given in Table 8.2. These truth-value functions
are determined in two steps: first, from the definition of the truth value in NARS,
we get the boundary conditions of the functions, that is, the function values when the
premises are complete inheritance relations. Second, these boundary conditions are
extended into general situations according to certain principles, such as the continuity
of the function, the independence of the premises, and so on. For detailed discussions,
see (Wang, 1994; Wang, 1995).
From a semantic point of view, the truth value of an inheritance relation is deter-
mined in different ways in different inference types: deduction extends the transitiv-
ity of the complete inheritance relation to (incomplete) inheritance relations in general;
abduction establishes inheritance relations based on shared intension; induction estab-

    Revision

    S ⊂ P <F1, C1>
    S ⊂ P <F2, C2>
    --------------
    S ⊂ P <F, C>

    F = (F1C1(1-C2) + F2C2(1-C1)) / (C1(1-C2) + C2(1-C1))
    C = (C1(1-C2) + C2(1-C1)) / (C1(1-C2) + C2(1-C1) + (1-C1)(1-C2))

        Table 8.3  NARS revision rule.

lishes inheritance relations based on shared extension. Though intuitively we can still
say that deduction is for proving, abduction is for explanation, and induction is for
generalization, these characteristics are no longer essential. Here, labels like
"generalization" and "explanation" are more about the pragmatics of the inference
rules (that is, what the user can use them for) than about their semantics (that is, how
the truth values of their conclusions are determined).
Each time one of the above rules is applied, the truth value of the conclusion is
evaluated solely according to the evidence summarized in the premises. Abductive and
inductive conclusions are always uncertain (i.e., their confidence values cannot reach
1 even if the premises have confidence 1), because they never check the extension or
intension of the two terms exhaustively in one step, as deductive conclusions do.
To get more confident conclusions, a revision rule is used to combine evidence
from different sources (Table 8.3). These two functions are derived from the relation
between weight of evidence and truth value, and the additivity of weight of evidence
during revision (Wang, 1994; Wang, 1995). The conclusion is more confident than
either premise, because it is based on more evidence. The frequency of the conclusion
is more stable in the sense that it is less sensitive (compared to either premise) to (a
given amount of) future evidence.
Using this rule to combine abductive and inductive conclusions, the system can
obtain more confident and stable generalizations and explanations.
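
The truth-value functions in Tables 8.2 and 8.3 can be transcribed directly into code.
The following Python sketch is my own rendering, not NARS source code (the guard
for the degenerate deduction case is my addition), and it reproduces some of the
numbers from the example in Section 8.3 below:

    def deduction(f1, c1, f2, c2):
        # M ⊂ P <f1,c1> and S ⊂ M <f2,c2> yield S ⊂ P <f,c>.
        e = f1 + f2 - f1 * f2
        if e == 0:
            return 0.0, 0.0    # degenerate case: F undefined, no confidence
        return f1 * f2 / e, c1 * c2 * e

    def abduction(f1, c1, f2, c2):
        # P ⊂ M <f1,c1> and S ⊂ M <f2,c2> yield S ⊂ P <f,c>.
        k = f1 * c1 * c2
        return f2, k / (k + 1)

    def induction(f1, c1, f2, c2):
        # M ⊂ P <f1,c1> and M ⊂ S <f2,c2> yield S ⊂ P <f,c>.
        k = f2 * c1 * c2
        return f1, k / (k + 1)

    def revision(f1, c1, f2, c2):
        # Combine two judgments of the same statement by adding evidence
        # weights, using the confidence-to-weight mapping w = c / (1 - c).
        w1, w2 = c1 / (1 - c1), c2 / (1 - c2)
        return (f1 * w1 + f2 * w2) / (w1 + w2), (w1 + w2) / (w1 + w2 + 1)

    # Step 1 below: robin ⊂ bird, by abduction from (2) and (1).
    print(abduction(1.0, 0.9, 1.0, 0.9))    # -> (1.0, ~0.45), matching (9)
    # Step 5: merging (10) and (12), two inductive judgments of bird ⊂ swimmer.
    print(revision(1.0, 0.45, 1.0, 0.45))   # -> (1.0, ~0.62), matching (13)
    # Step 7: adding crow's negative evidence (14) to (13).
    print(revision(1.0, 0.62, 0.0, 0.45))   # -> (~0.67, ~0.71), matching (15)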

8.2.3 Inference control


Since all statements have the same format, different types of inference can easily be
mixed in an inference procedure. The premises used by the induction rule may be
generated by the deduction (or abduction, and so on) rule, and the conclusions of the
induction rule may be used as premises by the other rules. The revision rule may
merge an inductive conclusion with a deductive (or abductive, and so on) conclusion.
NARS processes many inference tasks at the same time by time-sharing, and in
each time-slice a task (a question or a piece of new knowledge) interacts with a piece of

available knowledge. The task and knowledge are chosen probabilistically according
to priority values reflecting the urgency of each task and the salience of each piece
of knowledge. The priority distributions are adjusted after each step, according to
the result the system obtained in that step. By doing so, the system tries to spend
more resources on more important and promising tasks, with more reliable and useful
knowledge. Consequently, in each inference step the system does not decide what rule
to use first, then look for corresponding knowledge. Instead, it picks up two statements
that share a common term, and decides what rule to apply according to the position of
the shared term.
In general, a question-answering procedure in NARS consists of many inference
steps. Each step carries out a certain type of inference, such as deduction, abduction,
induction, revision, and so on. These steps are linked together in run-time in a context-
sensitive manner, so the processing of a question or a piece of new knowledge does
not follow a predetermined algorithm. If the same task appears at different times in
different contexts, the processing path and result may be different, depending on the
available knowledge, the order by which the pieces of knowledge are accessed, and
the time-space resource supplied to the task.
When the system runs out of space, it removes terms and statements with the lowest
priority, therefore some knowledge and tasks may be permanently forgotten by the
system. When the system is busy (that is, working on many urgent tasks at the same
time), it cannot afford the time to answer all questions and to consider all relevant
knowledge, so some knowledge and tasks may be temporarily forgotten by the system.
Therefore the quality of the answers the system can provide not only depends on the
available knowledge, but also depends on the context in which the questions are asked.
This control mechanism makes it possible for NARS to answer questions in real
time, to handle unexpected tasks, and to use its limited resources efficiently.

8.3 AN EXAMPLE
Let us see a simple example. Assume that the following is the relevant knowledge that
NARS has at a certain moment:

    (1) robin ⊂ feathered_creature <1.00, 0.90>
        ("Robin has feathers.")
    (2) bird ⊂ feathered_creature <1.00, 0.90>
        ("Bird has feathers.")
    (3) swan ⊂ bird <1.00, 0.90>
        ("Swan is a kind of bird.")
    (4) swan ⊂ swimmer <1.00, 0.90>
        ("Swan can swim.")
    (5) gull ⊂ bird <1.00, 0.90>
        ("Gull is a kind of bird.")
    (6) gull ⊂ swimmer <1.00, 0.90>
        ("Gull can swim.")
    (7) crow ⊂ bird <1.00, 0.90>
        ("Crow is a kind of bird.")
    (8) crow ⊂ swimmer <0.00, 0.90>
        ("Crow cannot swim.")

Then the system is asked to evaluate the truth value of

    robin ⊂ swimmer

which is like asking "Can robin swim?"


To make the discussion simple, let us assume a certain priority distribution, accord-
ing to which the premises are chosen in the following order.

[Step 1] From (1) and (2), by abduction, the system gets:

    (9) robin ⊂ bird <1.00, 0.45>

Here "having feathers" gives the system evidence to believe that robin is a kind of bird,
though the confidence of the conclusion is low, because it is only based on a single
piece of evidence.

[Step 2] From (3) and (4), by induction, the system gets:

    (10) bird ⊂ swimmer <1.00, 0.45>

Swan provides positive evidence for "Bird swims". Again, the confidence is low.

[Step 3] From (9) and (10), by deduction, the system gets:

    (11) robin ⊂ swimmer <1.00, 0.20>

As an answer to a question asked by the user, this result is reported to the user, while
the system continues to work on it when resources are available. Here the system an-
swers "Yes" to "Can robin swim?", though it is far from confident about this answer,
and it is going to look for more evidence.

[Step 4] From (5) and (6), by induction, the system gets:

    (12) bird ⊂ swimmer <1.00, 0.45>

Gull also provides positive evidence for "Bird swims".

[Step 5] (10) and (12) look identical, but since they came from different sources,
they are not redundant and can be merged by the revision rule to get:

    (13) bird ⊂ swimmer <1.00, 0.62>

Evidence from different sources accumulates to support a more confident conclusion.

[Step 6] From (7) and (8), by induction, the system gets:

    (14) bird ⊂ swimmer <0.00, 0.45>

Crow provides negative evidence for "Bird swims".

[Step 7] From (13) and (14), by revision, the system gets:

    (15) bird ⊂ swimmer <0.67, 0.71>

A compromise is formed by considering both positive and negative evidence, and the
positive evidence is stronger.

[Step 8] From (9) and (15), by deduction, the system gets:

    (16) robin ⊂ swimmer <0.67, 0.32>

Because this conclusion is a more confident answer to the user's question than (11), it
is reported to the user, too. In this way, the system can change its mind after more
knowledge and resources become available.

It needs to be mentioned that a typical run in NARS is much more complex than
the previous description, where we have omitted the conclusions that are irrelevant to
the current question, and we have assumed an order of inference that directly leads to
the desired result.
For example, in Steps 2 and 4, NARS actually also gets a symmetric inductive con-
clusion

    (17) swimmer ⊂ bird <1.00, 0.45>

which can be combined to become

    (18) swimmer ⊂ bird <1.00, 0.62>

However, in Step 6 there is no symmetric inductive conclusion generated: since crow
is not a swimmer, whether it is a bird or not, it provides no evidence for swimmer ⊂
bird. From the definition of (positive and negative) evidence introduced earlier, it is
not hard to see that in induction and abduction, positive evidence for "X ⊂ Y" is also
positive evidence for "Y ⊂ X", but negative evidence for the former is not counted as
evidence for the latter.
In practical situations, the system may wander around and jump from task to task.
However, these behaviors are rational in the sense that all conclusions are based on
available evidence, and the choice of task and knowledge at each step is determined
by a priority distribution, which is formed according to the system's experience and
current environmental factors (such as user requirements).

8.4 DISCUSSION
In this book, the current chapter is the only one that belongs to the term logic tradition,
while all the others belong to the predicate logic tradition. Instead of comparing NARS

with the other approaches introduced in the other chapters one by one, I will compare
the two paradigms and show their difference in handling abduction and induction,
because this is the origin of many minor differences between NARS and the other
works.
Compared with deduction, a special property of abduction and induction is the
uncertainty they introduce into their conclusions; that is, even when all the premises
are completely true and an abduction (or induction) rule is correctly applied, there is
still no guarantee that the conclusion is completely true.
When abduction and induction are formalized in binary logic, as in most chap-
ters of this book, their conclusions become defeasible, that is, a conclusion can be
falsified by any single piece of counter-evidence (see Lachiche's chapter). The philosophical
foundation and implication of this treatment of induction can be found in Popper's
work (Popper, 1959). According to this approach, an inductive conclusion is a univer-
sally quantified formula that implies all positive evidence but no negative evidence.
Though many practical problems can be forced into this framework, I believe that
there are many more that cannot - in empirical science and everyday life, it is not
very easy to get a non-trivial "rule" without counter-examples. Abduction is similar:
staying in binary logic means that we are only interested in explanations that explain
all relevant facts, and these are not very common either.
To generate and/or evaluate generalizations and explanations with both positive and
negative evidence usually means measuring the evidence quantitatively, so that the ones
with more positive evidence and less negative evidence are preferred (other things
being equal). A natural candidate theory for this is "probabilistic logic" (a combination
of first-order predicate logic and probability theory).
Let us use induction as an example. In predicate logic, a general conclusion "Ravens
are black" can be represented as a universally quantified proposition (∀x)(Raven(x) →
Black(x)). To extend it beyond binary logic, we attach a probability to it, to allow it to
be "true to a degree". Intuitively, each time a black raven is observed, the probability
should be increased a little bit, while when a non-black raven is observed, the probability
should be decreased a little bit.
Unfortunately, Hempel found a paradox in this naive solution (Hempel, 1943).
(∀x)(Raven(x) → Black(x)) is logically identical to (∀x)(¬Black(x) → ¬Raven(x)).
Since the probability of the latter is increased by any non-black non-raven (such as a
green shirt), so is that of the former. This is highly counter-intuitive.
This chapter makes no attempt to survey the huge amount of literature on Hempel's
"Raven Paradox". What I want to mention is the fact that all the previous solutions
are proposed within the framework of first-order predicate logic. I will show that this
problem is actually caused by the framework itself, and the paradox does not appear
in term logic.
In first-order predicate logic, every general conclusion is represented by a propo-
sition which contains at least one universally quantified variable, such as the x in the
previous example. This variable can be substituted by any constant in the domain, and
the resulting proposition is either true or false. If we call the constants that make it
true "positive evidence" and those that make it false "negative evidence", then everything
must belong to one of the two categories, and nothing in the domain is irrelevant. Lit-
erally, (∀x)(Raven(x) → Black(x)) states that "For everything in the domain, either it
is not a raven, or it is black". Though it is a meaningful statement, there is a subtle
difference between it and "Ravens are black" - the latter is about ravens, not about
everything.
The situation in term logic is different. In term logic "Ravens are black" can be
represented as raven ⊂ black_thing, and "Non-black things are not ravens" as (thing -
black_thing) ⊂ (thing - raven). According to the definition, these two statements
share common negative evidence (non-black ravens), but the positive evidence for
the former (black ravens) and the latter (non-black non-ravens) are completely differ-
ent (here we only consider the extension of the concepts). The two statements have
the same truth value in binary (extensional) term logic, because there a truth value
merely qualitatively indicates whether there is negative evidence for the statement. In
a non-binary term logic like NARS, they do not necessarily have the same truth value
anymore, so in NARS a green shirt has nothing to do with the system's belief about
whether ravens are black, just as crow, being a non-swimmer, provides no evidence for
swimmer ⊂ bird, whether it is a bird or not (see the example in the previous section).
The crucial point is that in term logic, general statements are usually not about
everything (except when "everything" or "thing" happens to be the subject or the predi-
cate), and the domain of evidence is only the extension of the subject (and the intension
of the predicate, for a logic that considers both extensional inference and intensional
inference). I cannot see how first-order predicate logic can be extended or revised to
do a similar thing.
In summary, my argument goes like this: the real challenge of abduction and in-
duction is to draw conclusions from conflicting and incomplete evidence. To do this, it
is necessary to distinguish positive evidence, negative evidence, and irrelevant infor-
mation for a given statement. This task can be easily carried out in term logic, though
it is hard (if at all possible) for predicate logic.
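
This evidential asymmetry can be made concrete with the extensional part of the
evidence measure from Section 8.2.1. The following Python sketch is my own
illustration over an invented toy universe; it shows that a green shirt supplies
evidence for (thing - black_thing) ⊂ (thing - raven) but leaves the evidence for
raven ⊂ black_thing untouched:

    # Extensional evidence only: w+ = |ES ∩ EP|, w = |ES|.
    ravens = {"raven1", "raven2"}
    black_things = {"raven1", "raven2", "coal"}
    things = ravens | black_things | {"green_shirt"}

    def ext_evidence(ES, EP):
        # Positive and total extensional evidence for 'S ⊂ P'.
        return len(ES & EP), len(ES)

    # 'raven ⊂ black_thing': the evidence domain is the set of ravens.
    print(ext_evidence(ravens, black_things))                    # (2, 2)

    # '(thing - black_thing) ⊂ (thing - raven)': the evidence domain is the
    # set of non-black things, so the green shirt counts here...
    print(ext_evidence(things - black_things, things - ravens))  # (1, 1)

    # ...and adding another green shirt raises its evidence further, while
    # leaving the first statement untouched:
    things.add("green_shirt2")
    print(ext_evidence(things - black_things, things - ravens))  # (2, 2)
    print(ext_evidence(ravens, black_things))                    # still (2, 2)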
Another advantage of term logic over predicate logic is the relation among deduc-
tion, abduction, and induction. As described previously, in NARS the three have a
simple, natural, and elegant relationship, both in their syntax and semantics. Their
definitions and the relationship among them become controversial in predicate logic,
which is a major issue discussed in the other chapters of this book.
By using a term logic, NARS gets the following properties that distinguish it from
other artificial intelligence systems doing abduction and induction:

• In the framework of term logic, different types of inference (deduction, abduc-


tion, induction, and so on) are defined syntactically, and their relationship is
simple and elegant.

• With the definition of extension and intension introduced in NARS, it becomes


easy to define truth value as a function of available evidence, to consistently
represent uncertainty from various sources, and to design truth-value functions
accordingly. As a result, different types of inference can be justified by the
same experience-grounded semantics, while the difference among them is still
visible.

• Abduction and induction become dual in the sense that they are completely sym-
metric to each other, both syntactically and semantically. The difference is that
abduction collects evidence from the intensions of the terms in the conclusion,
while induction collects evidence from the extensions of the terms. Intuitively,
they still correspond to generalization and explanation, respectively.

• With the help of the revision rule, abduction and induction at the problem-
solving level become incremental and open-ended processes, and they do not
need predetermined algorithms.

• The choice of inference rule is knowledge-driven and context-sensitive. Though


in each inference step, different types of inference are well-defined and clearly
distinguished, the processing of user tasks typically consists of different types
of inference. Solutions the user gets are seldom purely inductive, abductive, or
deductive.

Here I want to claim (though I have only discussed part of the reasons in this chap-
ter) that, though First-Order Predicate Logic is still better for binary deductive reason-
ing, term logic provides a better platform for the enterprise of artificial intelligence.
However, this does not mean that we should simply go back to Aristotle. NARS has
extended the traditional term logic in the following aspects:

1. from binary to multi-valued,

2. from monotonic to revisable,


3. from extensional to both extensional and intensional,

4. from deduction only to multiple types of inference,

5. from atomic terms to compound terms.

Though the last issue is beyond the scope of this chapter, it needs to be addressed
briefly. Term logic is often criticized for its poor expressibility. Obviously, many
statements cannot be put into the "S ⊂ P" format where S and P are simple words.
However, this problem can be solved by allowing compound terms. This is similar to
the situation in natural language: most (if not all) declarative sentences can be parsed
into a subject phrase and a predicate phrase, each of which can either be a word or a
structure consisting of multiple words. In the same way, term logic can be extended to
represent more complex knowledge.
For example, in the previous section "Non-black things are not ravens" is repre-
sented as (thing - black_thing) ⊂ (thing - raven), where both the subject and the
predicate are compound terms formed from simpler terms with the help of the differ-
ence operator. Similarly, "Ravens are black birds" can be represented as raven ⊂
(black_thing ∩ bird), where the predicate is the intersection of two simpler terms;
"Sulfuric acid and sodium hydroxide neutralize each other" can be represented as
(sulfuric_acid × sodium_hydroxide) ⊂ neutralization, where the subject is a Carte-
sian product of two simpler terms.

Though the new version of NARS containing compound terms is still under de-
velopment, it is obvious that the expressibility of the term-oriented language can be
greatly enriched by recursively applying logical operators to form compound terms
from simpler terms.
Finally, let us re-visit the relationship between the micro-level (inference step) and
macro-level (inference process) perspectives of abduction and induction, in the context
of NARS. As described previously, in NARS the words "abduction" and "induction"
are used to name (micro-level) inference rules. Though the conclusions derived by
these rules still intuitively correspond to explanation and generalization, such a cor-
respondence does not accurately hold at the macro-level. If NARS is given a list of
statements to start with, then after many inference steps the system may reach a con-
clusion, which is recognized by human observers as an explanation (or generalization)
of some of the given statements. In such a situation it is usually the case that the abduction
(or induction) rule has played a major role in the process, though it is rarely the only
rule involved. As shown by the example in the previous section, the answers reported
to the user by NARS are rarely purely abductive (or deductive, or inductive, and so on).
In summary, though different types of inference can be clearly distinguished in each
step (at the micro level), a multiple-step inference procedure usually consists of var-
ious types of inference, and so cannot be accurately classified as induction, abduction, or
deduction.
As mentioned at the beginning of this chapter, Peirce introduced the deduction-
induction-abduction triad in two levels of reasoning: syllogistic (micro, single step)
and inferential (macro, complete process). I prefer to use the triad in the first sense,
because it has an elegant and natural formalization in term logic. On the other hand,
I doubt that we can identify a similar formalization at the macro level when using
"abduction" for hypothesis generation, and "induction" for hypothesis confirmation.
It is very unlikely that there is a single, universal method for inference processes like
hypothesis generation or confirmation. On the contrary, these processes are typically
complex, and vary from situation to situation. For the purpose of artificial intelligence,
we prefer a constructive explanation to a descriptive one. It is more likely
for us to achieve this goal at the micro level than at the macro level.
Because term logic has been ignored by mainstream logic and artificial intelligence
for a long time, it is still too early to draw conclusions about its power and limitation.
However, according to available evidence, at least we can say that it shows many novel
properties, and some, if not all, of the previous criticisms on term logic can be avoided
if we properly extend the logic.
9 ON THE RELATIONS BETWEEN
ABDUCTIVE AND INDUCTIVE
EXPLANATION
Luca Console and Lorenza Saitta

9.1 INTRODUCTION
Abduction and induction are two forms of inference that are commonly used in many
artificial intelligence tasks with the goal of generating explanations about the world.
Paradigmatic is the case of Machine Learning. It traditionally relied on induction in
order to generate hypotheses (Plotkin, 1970; Mitchell, 1982; Michalski, 1983b). How-
ever, some limitations emerging in purely inductive systems led researchers to propose
the use of other reasoning mechanisms for learning, e.g. deduction (Mitchell et al.,
1986; DeJong and Mooney, 1986), abduction (O'Rorke etal., 1990; Saitta et al., 1993)
and analogy (Veloso and Carbonell, 1991 ). Thus, a precise characterization of the var-
ious mechanisms could contribute to clarify their relations with learning tasks.
The interest in abduction, as a mechanism for generating (best) explanations, grew
considerably in many fields of AI, such as diagnosis (Console and Torasso, 1991; Re-
iter, 1987; Cox and Pietrzykowski, 1987; de Kleer et al., 1992; Poole, 1989b), plan-
ning (Eshghi, 1988), natural language understanding (Charniak, 1988; Hobbs et al.,
1993), logic programming (Kakas et al., 1992). Indeed, several formal accounts of
abduction have been proposed (e.g., (Console et al., 1991b; O'Rorke et al., 1990; Cox
and Pietrzykowski, 1986a; Poole et al., 1987; Konolige, 1992; Kakas et al., 1992;
Levesque, 1989; De Raedt and Bruynooghe, 1991; Josephson, 1994)).
The goal of this chapter is to analyse the notion of reasoning towards explanation,
with specific interest in abduction and induction. We must immediately say that we
shall not be concerned with a universal notion of explanation, whose explication at-

tracted for years, and still does, the attention of many philosophers (see, e.g., (Salmon,
1990) for a review). Thus, in this chapter we shall only deal with a restricted notion of
deductive explanation, as used, for instance, in the literature on principles of diagnosis
(Hamscher et al., 1992).
One of the goals of this chapter is to show that, using a logical framework, different
tasks aimed at providing explanations for a set of observations can be conceptually
unified; these tasks can be differentiated by imposing different constraints on the type
of explanation searched for. The goal is achieved by using a generalized notion of
explanatory hypothesis and of observation, including any kind of formulas, not just
ground ones. The framework allows induction and abduction to be characterized as
two aspects of the same inference process and to be related to each other. The proposed
characterization is in no way claimed to be the correct one; however, it does seem to
capture, in most cases, a basic intuition behind these inference schemes, and to make
explicit the grounds on which the hypotheses they generate are based.
The process of explaining observations is not limited to the generation of hypothe-
ses, but also includes their evaluation and selection. Besides domain-dependent crite-
ria, some notion of minimality (see, e.g., (Poole, 1989a; Stickel, 1988)) or simplicity
(Michalski, 1983b; Kemeny, 1953; Pearl, 1978) has been proposed to introduce an
order in the hypothesis space. Generation and selection of hypotheses can be done at
the same time, by biasing the search process in such a way that only hypotheses in a
preferred set are generated. In this chapter, however, we consider the two phases as
conceptually distinct, in order to give a definition of explanation neutral with respect
to any additional constraint suggested by the domain of application. A fundamental
partial order between hypotheses, widely used in Machine Learning, is given by their
degree of generality (specificity). In the first part of the chapter we briefly introduce a
definition of the notion of generality which will then be used in our characterization
of explanation.
The chapter is organized as follows: the notion of generality is discussed in Section
9.2; a generalized notion of explanation is introduced in Section 9.3; the relations be-
tween induction and abduction are investigated in Section 9.4. Section 9.5 applies the
framework to some examples of reasoning mechanisms. Finally, Section 9.6 discusses
related work.

9.2 GENERALITY AND INFORMATIVENESS


The notion of generality is a fundamental one when discussing explanation.
However, in the literature there seems to be some confusion about this notion. A de-
tailed analysis of generality is not a goal of this chapter. In this section we briefly
introduce the notions of generality and informativeness that will be used in the next
sections.
Let us consider a First Order Logic language L. Let P be the set of basic predicates
and Ω the set of individuals of the universe. According to classical logic, formulas
in L can be partitioned into two subsets: open formulas, with some occurrence of
free variables, and closed ones (sentences), with no free variables. Following (Frege,
1893), the open formulas will be called concepts. A concept does not have a truth
value associated with it; it partitions Ω into the concept extension and its complement.

The concept extension consists of the set of individuals (or tuples of individuals) which
satisfy the concept definition. More precisely, let f(x1, ..., xn) be a concept over the
free variables x1, ..., xn; the extension of f with respect to an interpretation I is defined
as follows¹:

    EXT(f) = {<a1, ..., an> | f(a1, ..., an) is true in I} ⊆ Ω^n

The predicates that are true of a given n-tuple <a1, ..., an> ∈ Ω^n are said to belong
to the intension of that n-tuple (Desclés, 1987):

    INT(<a1, ..., an>) = {p ∈ P | p(a1, ..., an) is true in I}

A certain confusion between these two aspects has influenced some of the definitions
of the more-specific-than (more-general-than) relation. Any definition of this relation
should acknowledge that specificity (generality) is an extensional property and, hence,
it only pertains to concepts. Closed formulas (sentences) are statements about the gen-
erality of the associated concepts and can be compared according to the information
they provide about a concept. A concept and a sentence are not comparable with re-
spect to generality. In order to illustrate this difference, let us consider the concept
square(x) and the sentences:

    φ1 = square(a)    φ2 = ∃x [square(x)]    φ3 = ∀x [square(x)]

No extension can be associated with any of φ1, φ2 or φ3; hence, it makes no sense
to speak of their degree of generality. However, each of the three sentences provides
information about the extension, and, hence, the degree of generality of the associated
concept square(x). In particular, φ1 states that a ∈ EXT(square) (with a ∈ Ω), i.e.,
that the extension of the concept square contains at least the individual a. φ2 states
that EXT(square) ≠ ∅, i.e., that the extension of square(x) is not empty. φ3 states
that EXT(square) = Ω, i.e., that the extension of square(x) coincides with the whole
universe.
We can now introduce the more-specific-than relation among concepts and the
more-informative-than relation among sentences.

Definition 9.1 Given two concepts f(x₁, …, xₙ) and g(x₁, …, xₙ), and a universe of
discourse Ω, the concept f(x₁, …, xₙ) will be said to be more specific than the concept
g(x₁, …, xₙ) (denoted by f ≼ g (Michalski, 1983b)) iff EXT(f) ⊆ EXT(g) for any
interpretation.

If both f ≼ g and g ≼ f hold, then f and g belong to the same equivalence class with
respect to the more-specific-than relation; equivalence in generality will be denoted
by f ≃ g. The relation ≼ is reflexive and transitive, but not antisymmetric,
because it usually includes the case of equivalence. Definition 9.1, however, may
not be applicable in practice, and an intensional criterion is needed. θ-subsumption
(Plotkin, 1970) was one of the first proposed criteria; more recently it has been widely
used in Inductive Logic Programming (Muggleton, 1993).

¹In the discussion that follows, in order to simplify the notation, we shall limit ourselves to Herbrand
interpretations.

Generality is an extensional property, pertaining only to concepts. In order to compare
sentences, we can take into account the amount of information that a true sentence
conveys. To this end, let us recall the following notions. Given a universe of discourse
Ω and a set of predicates P, a possible world is a set of assignments of truth values to
every basic n-ary predicate of P, for each tuple of n objects in Ωⁿ (n ≥ 1). Let W be
the set of possible worlds.

Definition 9.2 Given two sentences φ and ψ of L, φ will be said to be more informative
than ψ (denoted by φ ⊒ ψ) iff W(φ) ⊆ W(ψ), where W(φ) denotes the set of
consistent worlds in which φ is true.

Notice that in such a way a tautology ⊤ is the least informative sentence in L. On
the contrary, a most informative sentence is one that reduces the number of possible
consistent worlds to 1. There are several ways in which we can quantitatively evaluate
the information content of sentences; a well-known approach is to use the notion of
entropy (Shannon and Weaver, 1949).
In principle, any intensional definition of the more-specific-than and of the more-
informative-than relations may be acceptable, provided that they are compatible with
Definition 9.1 or 9.2, respectively. However, intensional definitions based on material
implication provide a unified view of both relations and highlight the links between
them.

Definition 9.3 Given a theory T, expressed in the language L, and two concepts
f(x₁, …, xₙ) and g(x₁, …, xₙ), the concept f will be said to be more specific than
the concept g with respect to T (denoted by f ≼_T g) iff T ⊢ ∀x₁, …, xₙ [f(x₁, …, xₙ) →
g(x₁, …, xₙ)], where → denotes material implication.

Definition 9.4 Given two sentences φ and ψ of L and a theory T, φ will be said to be
more informative than ψ with respect to T (denoted by φ ⊒_T ψ) iff T ⊢ (φ → ψ).

It is easy to see that Definitions 9.3 and 9.4 are special cases of Definitions 9.1 and
9.2, respectively. If we want to draw a parallel between the more-specific-than (≼_T)
and the more-informative-than (⊒_T) relations, we can say that the extension of a concept
corresponds to the information content of a sentence. In order to examine more
deeply the parallel between informativeness of sentences and generality of concepts,
let us consider some examples. For instance, with an empty theory, we have:

∀x p(x) ⊒ p(a) ⊒ ∃x p(x)

which is justified by the following derivations:

⊢ [∀x p(x) → p(a)] and ⊢ [p(a) → ∃x p(x)]

In fact, the formula ∀x p(x) selects the worlds in which every object has the property
p, whereas p(a) selects all those worlds in which at least the object a has that
property. Finally, ∃x p(x) is true in every world in which some object has the
property p. Then:

W(∀x p(x)) ⊆ W(p(a)) ⊆ W(∃x p(x))
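The chain above can be checked mechanically. The sketch below is illustrative only: it assumes a two-element universe and represents each possible world by the set of ground atoms it makes true, following Definition 9.2:

```python
from itertools import combinations

# Minimal sketch of Definition 9.2 on the universe {a, b}: a possible
# world is the set of ground atoms p(x) it makes true.
universe = ["a", "b"]
atoms = [f"p({x})" for x in universe]
worlds = [frozenset(s) for r in range(len(atoms) + 1)
          for s in combinations(atoms, r)]

def W(formula):
    """The set of worlds in which `formula` holds."""
    return {w for w in worlds if formula(w)}

forall_p = lambda w: all(f"p({x})" in w for x in universe)   # ∀x p(x)
p_a      = lambda w: "p(a)" in w                             # p(a)
exists_p = lambda w: any(f"p({x})" in w for x in universe)   # ∃x p(x)

# φ ⊒ ψ iff W(φ) ⊆ W(ψ): the informativeness chain holds.
assert W(forall_p) <= W(p_a) <= W(exists_p)
print("chain verified on a 2-element universe")
```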



Notice that in the example above no theory is needed. On the contrary:

Sigmund lives in Vienna ⊒_T Sigmund lives in Austria

can be asserted only with respect to a theory T including, e.g.:

∀x,y,z[lives(x,y) ∧ is_in(y,z) → lives(x,z)],    is_in(Vienna, Austria)

The attempt by (Flach, 1992) to assess the relative generality of the two sentences
above is not meaningful in this chapter, because only the more-informative-than
relation can be applied to sentences. On the other hand, Sigmund lives in Vienna
is a ground instance of the concept lives(x, Vienna), which can be proved to be more
specific than the concept lives(x, Austria) with respect to T.

9.3 A GENERAL DEFINITION OF EXPLANATION


The search for an explanation of events is a fundamental aspect of human reasoning.
Humans do not like events that they do not understand, because these events cannot
be predicted or mastered. The notion of explanation is so deep and multifaced and
involves so many aspects of life that no explication, capturing the whole of its nature,
has been found, notwithstanding the centuries of philosophical investigations. What
is considered a bona fide explanation depends on several epistemological, teleological
and pragmatic factors, such as the field of interest, the model of the world of the rea-
soning agent, the importance of the explanandum, and the goal which the explanation
is aimed at.
Searching for an explanation has complex links with prediction: an ex ante expla-
nation allows unknown events to be predicted, whereas an ex post explanation only
accounts for what has been observed previously. There is a lively debate between the
Predictivist (e.g., (Lakatos, 1975; Popper, 1959)) and the Nonpredictivist (e.g., (Hor-
wich, 1982; Thagard, 1992)) schools on the psychological validity ascribed by humans
to ex ante and ex post explanations. Recent studies did not find a clear preference in
(non-scientist) subjects toward either of the two (Chinn, 1994).
It is clearly outside the scope of this chapter to try to contribute to such a complex
philosophical and psychological matter. We will limit ourselves to investigating a much
more restricted computational notion of explanation, which can be manipulated by
an automated reasoner to account for some observations provided to it. As we aim
at extracting a common core among explanations used in various tasks, a generalized
notion of explanation will be introduced first.

Definition 9.5 Given a domain theory T and a set of sentences α_obs, describing observations
performed on the world, a set E of sentences will be called an Explanation
for α_obs (with respect to T) iff the following conditions are satisfied:
(a) E ⊒_T α_obs
(b) E ⋣_T ⊥
(c) True ⋣_T α_obs
(d) E has some additional properties that make it interesting.

According to Definition 9.5, the explanation (a) must be more informative than the
observations, (b) must be consistent with the theory, and (c) must be essential, i.e.,
complete information about the observation must not be present in the theory. Condition
(a) is what Johnson-Laird requires of an explanation in human reasoning: a
hypothesis must increase the semantic information with respect to the observations,
in the sense that it must reduce the number of possible states of affairs. However, we
have to pay for this increase with the uncertainty of the hypothesis (Johnson-Laird,
1988, ch. XIII). Definition 9.5 captures this notion by explicitly referring to the
information content added by the generated hypothesis to our knowledge of the world.
Definition 9.5 has the advantage, on the one hand, of leaving open the possibility of
using different definitions of explanation (for instance, logical or causal ones) and,
on the other hand, of allowing great freedom in the syntactic form of both observations
and hypotheses. For example, using Definition 9.4, the conditions of Definition
9.5 can be rewritten as follows:
(a) T ∪ E ⊢ α_obs    (b) T ∪ E ⊬ ⊥    (c) T ⊬ α_obs
thus obtaining the usual formulation of explanation in model-based diagnosis (Console
and Torasso, 1991; Poole, 1989a). Moreover, in model-based diagnosis (or planning),
α_obs and E are usually sets of ground literals. Restricting observations and hypotheses
to be ground seems reasonable for these tasks, where one wants to hypothesize
either faults in the modelled system or the effect of an action for explaining specific
symptoms or a specific state of the world. In learning, on the contrary, the hypothesized
explanation of a phenomenon is to be added to the current knowledge for future
use. Then, in learning, one would also like to explain non-ground, general formulas,
for instance to check part of a domain theory or to justify regularities noticed in the
world; this last task is fundamental for theory formation, in which empirical laws are
to be explained by general theories. The suitability of explaining quantified data has
also been advocated more recently by (Kelly and Glymour, 1988), who are interested
in the impact of this kind of data on the complexity of learning. In Machine Learning,
most systems assume that α_obs is ground and E contains universally quantified
sentences. Notwithstanding the possible differences in the format of observations and
hypotheses, the very same conditions above have been adopted in Machine Learning
to characterize the learned hypotheses (Muggleton, 1991; Michalski, 1991).
Without any modification, Definition 9.5 also covers the case in which the φ ⊒_T ψ
relation is interpreted as φ is a cause of ψ (Cox and Pietrzykowski, 1986a; Saitta et al.,
1993; O'Rorke et al., 1990).
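A minimal sketch of the three conditions in their proof-theoretic form, for propositional definite clauses; the one-rule theory, candidate explanation and observation below are made up for illustration, and entailment is decided by naive forward chaining:

```python
# Sketch of conditions (a)-(c) of Definition 9.5 in proof-theoretic form.
def closure(rules, facts):
    """Naive forward chaining: rules are (body_set, head) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

T = [({"p"}, "q")]                        # T = {p -> q}
E = {"p"}                                 # candidate explanation
alpha_obs = "q"                           # observation

a = alpha_obs in closure(T, E)            # (a) T ∪ E ⊢ α_obs
b = "false" not in closure(T, E)          # (b) T ∪ E consistent (no clause derives "false" here)
c = alpha_obs not in closure(T, set())    # (c) T alone does not yield α_obs
print(a and b and c)                      # True: E explains α_obs with respect to T
```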
In summary, Definition 9.5 is justified by the requirement that it satisfies at least
the following properties:
• It is widely applicable, in that it demands only a minimal set of broadly accepted
requirements for explanations. In fact, it is neutral with respect to both the
selected notion of explanation and any criterion that can be used to single
out interesting explanations.
• The analysis of the reasoning mechanisms in the rest of the chapter only depends
on this minimal set of requirements. In particular, the considerations that will
follow hold for any criterion used either to specialize Definition 9.5 or to select
among alternative hypotheses.

Given Definition 9.5, the number of explanations satisfying conditions (a)-(c) is usually
very large; condition (d) then comes into play, allowing a set of preferred explanations
to be selected. Even though this chapter is not concerned with the problem of
explanation selection, some of the proposed criteria are briefly mentioned (see (Poole,
1989a; Stickel, 1988) for a discussion), pointing out their relations with the notion of
informativeness introduced in Section 9.2.
• Avoiding redundant explanations. This criterion corresponds to the requirement
that an interesting explanation E must be minimal, in the sense that no proper subset
of E satisfies conditions (a) - (c). This criterion, adopted in many early definitions
of explanation, is subsumed by the notion of information content (see the discussion
in (Console et al., 1991b)). Such a criterion is also related to the notion of prime
implicants, which is in turn defined in terms of logical entailment and has been used,
e.g., in model-based diagnosis (de Kleer et al., 1992).
• Avoiding trivial explanations, coinciding with the observations themselves. In
learning from examples, for instance, hypotheses consisting of the disjunction of
the observed examples may be undesirable.
• Explanations may be required to be expressed only in terms of a predefined set of
predicates (or language), for instance, the set of abducible predicates in diagno-
sis (Console and Torasso, 1991). This limitation is extensively used in Machine
Learning, under the name of language bias (Mitchell, 1982).
• A further dimension for defining preference criteria among explanations concerns
the specificity (or basicness (Cox and Pietrzykowski, 1986a; Stickel, 1988)). In
most specific abduction (Stickel, 1988), the assumptions must be basic, i.e., not
provable by making other assumptions. In least-specific abduction the only al-
lowable assumptions are the observations themselves. The notion of information
content partially interacts with such a choice: if we compare two explanations us-
ing as background theory the same domain theory used for generating explanations,
we enforce a preference for less specific explanations. It is worth noting that inter-
mediate notions between the two mentioned above have been suggested, e.g., in
cost-based abduction (Hobbs et al., 1988), in which numeric costs are associated
with assumptions.
• (Poole, 1989a) introduced the notion of least presumptive explanations, i.e., explanations
which do not make any assumptions that are not supported by the observations:
given a set T of relations among assumptions (domain theory), an explanation
E₁ is less presumptive than E₂ iff T ∪ E₂ ⊢ E₁. Thus, as expected, this notion is a
special case of that of informativeness.
• In Machine Learning, three criteria have been traditionally used to compare and
select hypotheses: consistency, completeness and simplicity (Occam's razor prin-
ciple). A complete hypothesis is able to explain all the positive occurrences of the
phenomenon under study, whereas a consistent hypothesis does not explain any of
the negative occurrences of the phenomenon. Simplicity is more difficult to define,
and has mostly been associated with the hypothesis' syntactic simplicity. However,
this type of simplicity is not always satisfactory, and other notions, more seman-
tic in nature, have been proposed. One attempt has been to introduce a measure
of coherence (Ng and Mooney, 1991), a metric that selects those assumptions that

are more relevant for (connected to) the given observations. This approach has
been proposed for natural language understanding (inferring users' plans from
utterances).

9.4 INDUCTIVE AND ABDUCTIVE EXPLANATIONS


The previous section is only concerned with a computational definition of what an
explanation is. However, a most important point is where the explanation comes from.
Traditionally, reasoning mechanisms such as induction and abduction have been con-
sidered as the basic means to generate explanations (hypotheses), even though there is
no agreement on a widely accepted definition.
In the following we will try to formalize an intuitive understanding of the inductive
and abductive reasoning mechanisms: an inductive hypothesis explains a phenomenon
by asserting that the same phenomenon has been already observed several times in the
past, under the same circumstances. An abductive hypothesis explains a phenomenon
by specifying enabling conditions (as a special case, causes) for it. As a consequence,
abduction necessarily needs a domain theory (often a causal theory, even though it
need not be of such a nature), while induction does not. If we want to explain, for
instance, why the light appears in a bulb when we turn a switch on, an inductive explanation
would say that this is because it has happened hundreds of times before, whereas
an abductive one could supply an explanation in terms of the electric current flowing
into the bulb filament. If, at some moment, turning the switch on does not make the
bulb light up, the inductive explanation simply fails, whereas the abductive
one can supply hints for understanding what happened and for suggesting remedies.
Before moving to a formal characterization, let us look at the problem informally,
by means of two simple examples.

Example 9.1 Consider the following theory and observation:

• T = {∀x[p(x) → q(x)]}
• α_obs = q(a)

Three alternative explanations for α_obs are:

E₁ = {p(a)}    E₂ = {∀x q(x)}    E₃ = {∀x p(x)}

E₁ corresponds to the assumption that the property p holds for the individual a, the
same individual for which q has been observed. E₁ needs T in order to derive the observation.
We would like to say that E₁ is a (purely) abductive explanation.
On the contrary:

E₂ ⊢ {q(y) | y ∈ Ω} ⊢ q(a)

Hence, q(a) is deducible from E₂ without using T and, furthermore, q(a) cannot be
deduced without also deducing q(y) for every y ∈ Ω. We would like to label E₂ as a (purely)
inductive explanation.
Finally, let us consider E₃. We obtain:

T ∪ E₃ ⊢ {q(y) | y ∈ Ω} ⊢ q(a)

Again, we cannot prove q(a) without also proving q(y) for every y ∈ Ω. However, T is essential
in the derivation. If we look more closely at the generation of E₃, we notice
that this process actually corresponds to a combination of two inference steps: a first
one, abductive, in which E₁, sufficient to explain q(a), is generated, and a second one,
inductive, in which E₁ is explained, in turn, by E₃. We would like to label E₃ as an
inductive/abductive explanation.

This labelling of hypotheses as abductive, inductive/abductive or inductive tries to
capture the intuitive feeling that the only support of an inductive explanation is a sup-
posed similarity between unobserved individuals and observed ones (Helft, 1989). In
other words, an inductive hypothesis allows the validity of properties, observed on a
set of individuals, to be extended to other individuals not in the observations, whereas
an abductive one allows unobserved properties to be applied to observed individuals.
Notice that the distinction between inductive and abductive hypotheses strictly par-
allels the dichotomy extension vs. intension, or generality vs. informativeness. In
fact, inductive hypotheses are related to (concept) extensions (i.e., they enlarge the
extension of observed properties, or they generalize observed properties to other in-
dividuals), whereas abductive hypotheses are related to (individuals') intensions (i.e.,
they increase the intension of the individuals in the observation, that is, the set of
properties that are true of such individuals). The inductive/abductive explanations are a
kind of hybrid, since they extend observed properties to other individuals not in the
observations through the use of other properties specified by the theory.
By considering again the hypotheses E₁, E₂ and E₃ (for which E₃ ⊒_T E₁ ⊒_T α_obs
and E₃ ⊒_T E₂ ⊒_T α_obs) in Example 9.1, if the task at hand is a diagnostic one, E₁ may
be the preferred explanation, because it corresponds to making the minimum assumption
about the actual world (i.e., p(a)). If we want to learn something useful for the future,
explanations E₂ and E₃ are better, because they potentially apply to yet unseen cases.
On the other hand, both of them make very strong assumptions about the actual world,
and one could question their validity. Then, for learning, a trade-off between the
generality of a hypothesis and its credibility has to be sought.

Inductive hypotheses are not all inductive to the same degree. For instance, in
Example 9.1 the sentence E₄ = q(a) ∧ q(b) is also a valid inductive explanation for
q(a). However, the inductive leap from q(a) to ∀x q(x) is much larger than from q(a)
to q(a) ∧ q(b). A quantitative measure based on informativeness might allow
inductive leaps to be measured and compared.

Example 9.2 Let us now consider a universe of discourse Ω = {John, Dan} and the
following theory and observation:

T = {∀x[measles(x) → red_spots(x)],
     ∀x,y[measles(x) ∧ brothers(x,y) → measles(y)],
     ∀x,y[brothers(x,y) → brothers(y,x)],
     brothers(John, Dan)}

α_obs = red_spots(John)

An explanation of α_obs is:

E₁ = {measles(John)}

The hypothesis E₁ is sufficient to explain α_obs. However, if we now use the theory T
together with E₁, we can also prove measles(Dan) and red_spots(Dan), which extend
the properties measles and red_spots to Dan; hence, measles(Dan) and red_spots(Dan)
belong to the set of consequences of measles(John) with respect to T. Notice that
E₂ = {measles(Dan)} is also an explanation of red_spots(John), obtained by using
the first and second rules in T. By comparing E₁ and E₂ with respect to their informativeness,
we obtain:

measles(John) ≡_T measles(Dan)

(where ≡_T means equivalence in information content). In fact, without any temporal
ordering among events, whenever measles(John) is true, measles(Dan) is also true,
and vice versa. This special case originates from the presence of a recursive rule
in T. By analyzing the way E₁ and E₂ are generated, we would like to label both E₁
and E₂ as inductive/abductive hypotheses.
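A minimal sketch of this example, with the rules of T ground by hand over the two-person universe; naive forward chaining from T ∪ E₁ shows both properties being extended to Dan:

```python
# Sketch of Example 9.2: facts are tuples, rules are hand-ground over Ω.
people = ["John", "Dan"]

def closure(facts):
    derived = set(facts)
    while True:
        new = set(derived)
        for x in people:
            if ("measles", x) in derived:
                new.add(("red_spots", x))        # measles(x) -> red_spots(x)
            for y in people:
                if ("measles", x) in derived and ("brothers", x, y) in derived:
                    new.add(("measles", y))      # measles(x) ∧ brothers(x,y) -> measles(y)
                if ("brothers", x, y) in derived:
                    new.add(("brothers", y, x))  # brothers(x,y) -> brothers(y,x)
        if new == derived:
            return derived
        derived = new

T_facts = {("brothers", "John", "Dan")}
E1 = {("measles", "John")}
print(sorted(closure(T_facts | E1)))
# includes ('measles', 'Dan') and ('red_spots', 'Dan'): E1 generalizes via T
```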

At first sight, labelling explanations E₁ and E₂ of Example 9.2 as inductive/abductive
may seem counter-intuitive, because one would rather be tempted to label them as
abductive. However, we have to consider that we have related the nature of an explanation
E to the effects produced by the addition of E to T, and not to the mechanism
employed to generate the hypothesis. Moreover, the explanation E acquires its nature
only in the context of a theory T; the explanation simply lets the generalization
capability, already implicit in the theory, become apparent. An empty theory does not
offer abductive capabilities, only inductive ones.
This fact can be exploited to check the level of inductive power of a theory. Let us
consider, for example, the theory:

T = {rain → wet(grass), rain → wet(road)}

If we observe wet(grass), a possible explanation could be E = {rain}. Then, we
can conclude that wet(road) is also true. This explanation points out that the theory
allows the transfer of the property wet from grass to road. This is not substantially
different from predicting that the next apple we see will be red, because we have
previously observed another one of the same colour. The point we want to make is
even more apparent if we consider the theory:

T = {rain → ∀x wet(x)}

which asserts that anything is wet when it rains.
In other words, we believe that the counter-intuitiveness disappears if the inductive/abductive
nature is seen more as a sign of the theory's generality than as a characteristic
of the explanation. In this way, a continuum can be established from an empty
theory, which has the maximum inductive power, to a theory consisting only of rules
of the type {p(a) → q(a)}, which has none, and from hypotheses that only
modify extensions to hypotheses that only modify intensions.
By summarizing the ideas discussed in Examples 9.1 and 9.2, given a (possibly
empty) theory T, we propose to partition the set of explanations into three subsets:
(purely) inductive, (purely) abductive and inductive/abductive hypotheses, with re-
spect to T. Notice that the nature of an explanation is evaluated with respect to the
extensions of the predicates occurring in the observation to be explained, and with
respect to a theory. By changing the theory, the nature of an explanation may change.

Let us introduce, first, the notion of a decomposable explanation. Let α₁, …, α_k be
subformulas occurring in α_obs. It may happen that the explanation for α_obs is decomposable,
i.e., subformulas of α_obs can be explained separately:

T ∪ Eᵢ ⊢ αᵢ   (1 ≤ i ≤ k)

In this case, the nature of the Eᵢ's can be assessed independently. Therefore, we can
consider, without loss of generality, the case of a non-decomposable explanation E
for α_obs. The definition of the various types of explanations (hypotheses) will be
introduced in several steps, starting from the case where the observation is ground and
consists of instances of a unique predicate p.

Definition 9.6 Given a domain theory T, a universe of discourse Ω, and an observation
α_obs = p(a₁) ∧ … ∧ p(aₙ), let p(X) denote the set of assertions p(aᵢ) (1 ≤ i ≤ n),
X = {a₁, …, aₙ} being the set of individuals for which p has been observed. An explanation
E for α_obs, satisfying Definition 9.5, will be called:
• (Purely) Inductive in the context of T, iff:
  ∃Y such that E ⊢ p(Y) ⊢ p(X) (with X ⊂ Y)
• Inductive/Abductive in the context of T, iff:
  ∃Y such that T ∪ E ⊢ p(Y) ⊢ p(X) (with X ⊂ Y), and
  E ⊬ α_obs
• (Purely) Abductive in the context of T, iff:
  T ∪ E ⊢ p(X),
  ¬∃b ∉ X such that T ∪ E ⊢ p(b), and
  E ⊬ α_obs, unless E = α_obs

Definition 9.6 can be interpreted as follows. A (purely) inductive explanation does
not need any theory to derive the observations, and the observations cannot be derived
without proving that the property p holds for a set of individuals Y strictly larger than
X. An inductive hypothesis is thus required to show generalizing effects. An inductive/abductive
explanation also necessarily extends property p to a strictly larger set of
individuals, but it needs the theory T to do so. (Purely) abductive hypotheses necessarily
need a theory to derive the observation; moreover, the observed properties cannot
be extended to any other individual not belonging to X. Finally, the observation cannot
be derived from E alone, unless E is a trivial explanation. It has to be noticed that,
in common practice, abductive hypotheses are often required to be ground atoms
of T. This requirement is very important, because it strongly limits the size of the
hypothesis space without reducing the expressive power of the hypothesis language.
However, adding this further constraint to the very definition of a generic abductive
hypothesis may not be sensible, because it would rule out some explanations which
may be of interest. Let us consider again Example 9.1 and the explanation:

E₄ = {r(a), ∀x[r(x) → p(x)]}

where the predicate r does not occur in T; hence, r(a) is not a subformula of T
instantiated on a. However, the generation of E₄ corresponds to the acknowledgement

that T may be incomplete and that p(a) is not actually a satisfactory hypothesis; then,
some other phenomenon should be postulated to produce p(a). This kind of process
of generating the abductive hypothesis E₄ is quite common in theory formation, where
theoretical terms and hidden properties are hypothesized to extend the theory T on the
basis of new evidence.
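The classification of Definition 9.6 can be reproduced mechanically on Example 9.1. The sketch below is illustrative only: it assumes a small finite universe, hard-codes the single rule of T, and decides the label by comparing the extension of q derivable from E with and without the theory; the subtler conditions of the definition (e.g., E ⊬ α_obs) are omitted.

```python
# Sketch of Definition 9.6 on Example 9.1 (T = {∀x p(x) -> q(x)}).
universe = ["a", "b", "c"]
observed = {("q", "a")}                       # α_obs = q(a), so X = {a}

def q_extension(E_atoms, E_universal_preds, use_theory):
    """Individuals provably having q, given E (and T if use_theory)."""
    derived = set(E_atoms)
    for pred in E_universal_preds:            # a '∀x pred(x)' in E
        derived |= {(pred, x) for x in universe}
    if use_theory:                            # apply T: p(x) -> q(x)
        derived |= {("q", x) for (pred, x) in set(derived) if pred == "p"}
    return {x for (pred, x) in derived if pred == "q"}

def classify(E_atoms, E_universal_preds):
    X = {x for (_, x) in observed}
    if X < q_extension(E_atoms, E_universal_preds, use_theory=False):
        return "inductive"                    # E alone extends q beyond X
    if X < q_extension(E_atoms, E_universal_preds, use_theory=True):
        return "inductive/abductive"          # the extension grows, but only via T
    return "abductive"                        # no individual outside X gets q

print(classify({("p", "a")}, []))   # E1 = {p(a)}         -> abductive
print(classify(set(), ["q"]))       # E2 = {∀x q(x)}      -> inductive
print(classify(set(), ["p"]))       # E3 = {∀x p(x)}      -> inductive/abductive
```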
Definition 9.6 can be generalized to the case where more than one predicate symbol
occurs in the observation, i.e., α_obs = p₁(X₁) ∧ … ∧ p_m(X_m). In such a case, the nature
of an explanation E can be evaluated with respect to each pᵢ (1 ≤ i ≤ m); we will say
that, globally, E is inductive when it is inductive with respect to at least one pᵢ, E
is abductive when it is abductive with respect to every pᵢ, and inductive/abductive in
any other case. The reason for this definition is that an abductive hypothesis must not
introduce any extensional increase; thus the extensions of the pᵢ's should not change.
On the other hand, it is sufficient that a single pᵢ has its extension increased in order
to have an inductive leap. An analogous definition can be given in the more general
case in which α_obs is a ground formula expressed in conjunctive or disjunctive normal
form.
The case where negative literals occur in the observations has to be dealt with carefully.
In particular, one can interpret negation in a classical way; in such a case, given
an observation ¬α(a) (or ¬α(X)), we can apply the same considerations as above,
by simply treating ¬α as a new predicate and considering the complements
of the extension(s). However, interpreting negation classically is not natural in many
applications of abductive and inductive reasoning (see (Console et al., 1991b)), and the
interpretation as failure to prove is much more useful and common. In this case, the
explanations must be consistent with the negative literals, i.e., positive atoms inconsistent
with the negative literals must not be derivable from the explanation. The
considerations above must then be adapted; however, a detailed description of this process
is outside the scope of this chapter.
Let us now consider the case where the observation is not ground; the formal definition
of the nature of an explanation becomes more complex. Let us start again from
the simplest case, where the observation is a single universally quantified predicate p:
α_obs = ∀x p(x). In this case, the observed extension of p coincides with the whole
universe, i.e., X = EXT(p) = Ω. Then, only purely abductive explanations can exist
for α_obs.

Consider now the case of the observation being a universally quantified formula
not consisting of a single predicate, i.e., α_obs = ∀x̄ φ(x̄), where x̄ denotes a set of
m variables. Obviously, as X = EXT_obs(φ) = Ω^m, there cannot exist inductive explanations
for the whole α_obs. However, φ(x̄) is a concept whose extension derives
from the combination of the extensions of sub-concepts, each consisting of a single
predicate. Let Φ = {p_j | 1 ≤ j ≤ n} be the set of such predicates. Let, moreover,
X_j = EXT_obs(p_j) (1 ≤ j ≤ n) be the observed extension of p_j in α_obs. Definition 9.6
can be applied to each p_j; in fact, even if EXT_obs(φ) cannot be further extended, some
of the X_j's could be. Then, an explanation E of α_obs = ∀x̄ φ(x̄) will be called inductive
if it is inductive for at least one p_j ∈ Φ, abductive if it is abductive for every p_j ∈ Φ,
and inductive/abductive in every other case.

Finally, let α_obs = ∃x p(x) be an existentially quantified observation involving a
single predicate. The observation tells us that X = EXT_obs(p) ≠ ∅. Moreover, we make
the reasonable assumption that the observer is fair, i.e., he/she reports everything he/she
observed; then, we can say for sure that X = EXT_obs(p) ⊂ Ω, otherwise the observer
would have provided a universally quantified observation. In this case, Definition 9.6
applies directly. However, it may be difficult to prove that X ⊂ Y, since the exact
identity of X is unknown. A special case is the one where Y = Ω; in such a case, in
fact, the explanation is clearly generalizing the observation. If the observation is an
existentially quantified formula α_obs = ∃x̄ φ(x̄), the definition provided for the
universally quantified case can be used.
Another way of distinguishing between abduction and induction, consistent with
Definition 9.6, can be obtained by considering the generation process of inductive or
abductive hypotheses as the effect of specific inference rules. Given the modus ponens
inference rule:

    ∀x[p(x) → q(x)]    p(a)
    -----------------------
             q(a)

an abductive inference step (Peirce, 1958) is performed when p(a) is hypothesized
upon observing α_obs = q(a) and knowing that ∀x[p(x) → q(x)] belongs to the theory.
Explanations obtained by chaining only this type of basic inference step have
an (at least partially) abductive nature. On the other hand, we can also hypothesize
∀x[p(x) → q(x)] upon observing α_obs = p(a) → q(a), thus obtaining the usual form of
induction.

9.5 ANALYSIS OF INFERENCE MECHANISMS IN THE LITERATURE


In this section we will show how the framework in Sections 9.3 and 9.4 can be used to
analyse some approaches to computing explanations.

9.5. 1 Learning from examples


Definitions 9.5 and 9.6 cover the classical way of considering induction (Mitchell,
1982; Michalski, 1983b). In fact, suppose we are given a set A = {a₁, …, aₙ} of
individuals, which is a proper subset of the universe Ω. For each a_k ∈ A we have
observed that if a_k has the logical combination of properties φ, then a label ω can be
applied to it. Then:

α_obs = {φ(a_k) → ω(a_k) | a_k ∈ A}

The set X coincides with A. The following inductive hypothesis can be generated:

E = {∀x[φ(x) → ω(x)]}

In this case Y = Ω and E is an inductive hypothesis, according to Definition 9.6.
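As a sketch of this schema (with made-up individuals and properties; universe, has_phi and sample below are all illustrative), the rule E is hypothesized from the labelled sample A and then applied predictively to the rest of the universe:

```python
# Sketch of learning from examples: hypothesize E = ∀x[φ(x) -> ω(x)].
universe = ["a1", "a2", "a3", "a4"]     # Ω
has_phi = set(universe)                 # individuals known to satisfy φ
sample = {"a1": True, "a2": True}       # A: observed ω-labels

# If φ -> ω holds on every observed individual, adopt the universal rule
# (Definition 9.6: here Y = Ω, which strictly contains A).
if all(label for x, label in sample.items() if x in has_phi):
    predicted = sorted(x for x in universe if x in has_phi)
    print("ω predicted for:", predicted)   # includes a3 and a4, outside A
```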

9.5.2 Inverse resolution


Table 9.1  Inverse resolution operators.

  Absorption
    Theory:       ∀x[φ(x) → p(x)]          (unchanged)
    Observation:  ∀x[φ(x) ∧ ψ(x) → q(x)]
    Explanation:  ∀x[p(x) ∧ ψ(x) → q(x)]

  Identification
    Theory:       ∀x[φ(x) ∧ p(x) → q(x)]   (unchanged)
    Observation:  ∀x[φ(x) ∧ ψ(x) → q(x)]
    Explanation:  ∀x[ψ(x) → p(x)]

  Truncation
    Observation:  ∀x[φ(x) ∧ ψ(x) → q(x)]
    Explanation:  ∀x[φ(x) → q(x)]

Given a complete deduction system, the idea of inverting deduction in order to obtain
plausible hypotheses explaining sets of observations, and/or to define new predicates,
is quite old (Meltzer, 1970; Morgan, 1971). Choosing resolution as the deduction
machinery, Muggleton defined a set of rules for performing Inverse Resolution for the
propositional calculus (Muggleton, 1987). Later, a reduced set of inference rules has
been considered for First Order Logic (Muggleton and Buntine, 1988; Rouveirol and
Puget, 1990). The nature of the inverse resolution operators has been investigated in
(Console et al., 1991a) in the case of propositional calculus. In particular, in that paper
absorption, identification and truncation (whose definitions are summarized in Table
9.1) have been proved to be sound inference rules with respect to the general definition
of explanation reported above.
In order to investigate the nature of these operators, we have to identify the theory
and the observation. The criterion we chose is that the formulas that are not modified
by the inference process belong to the theory. Explanations are new rules to be added
to the theory.
It is easy to show that absorption and identification have a potentially inductive/abductive
nature. Absorption, in fact, corresponds to the process of generalizing properties
of classes of objects (the climbing-generalization rule (Michalski, 1983b)): given
the assertion in T, which states that each object in class φ belongs to the class p,
the fact that the objects in class φ have the property q (when they also have property
ψ) is explained by stating that the same is true of all the objects in the super-class p of
φ. If the theory T also contains the clause:

∀x[δ(x) → p(x)]

then the objects in class δ would also inherit the property q, and those objects are not
involved in the observation. This is a case in which the nature of a hypothesis can be
determined only in the context of a given theory.
The truncation operator has a purely inductive nature, since it corresponds to the
dropping-condition rule (Michalski, 1983b): the extension of q is enlarged from the
set of objects with both properties φ and ψ to the set of objects with only property φ.
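The three operators can be sketched on propositional clauses. In the sketch below (illustrative only), a clause is a (body, head) pair of atom names and no soundness checking is performed:

```python
# Sketch of the Table 9.1 operators on propositional (body, head) clauses.
def absorption(theory_clause, observation):
    """From T: phi -> p and obs: phi & psi -> q, propose p & psi -> q."""
    (phi, p), (body, q) = theory_clause, observation
    assert phi <= body
    return (body - phi) | {p}, q

def identification(theory_clause, observation):
    """From T: phi & p -> q and obs: phi & psi -> q, propose psi -> p."""
    (body_t, head_t), (body_o, head_o) = theory_clause, observation
    assert head_t == head_o
    p = next(iter(body_t - body_o))   # the theory-body atom not observed
    return body_o - body_t, p

def truncation(observation, dropped):
    """From obs: phi & psi -> q, propose phi -> q (dropping condition)."""
    body, head = observation
    return body - dropped, head

obs = (frozenset({"phi", "psi"}), "q")
print(absorption((frozenset({"phi"}), "p"), obs))           # ({'psi','p'}, 'q')
print(identification((frozenset({"phi", "p"}), "q"), obs))  # ({'psi'}, 'p')
print(truncation(obs, {"psi"}))                             # ({'phi'}, 'q')
```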

9.5.3 Explanation-based generalization


Explanation-based generalization (EBG) searches for sufficient conditions to recognize
instances of a given concept ω, by trying to justify, with the help of a domain
theory, why an object a is indeed an instance of the concept (Mitchell et al., 1986; DeJong
and Mooney, 1986). Formally, given a domain theory T and an instance a of ω,
EBG searches for a rule

r = ∀x,y[φ(x,y) → ω(x)]

to be added to the theory, by trying to prove ω(a) from T. The addition of r to T is
only a matter of convenience because, in fact, T ⊢ r.
In EBG, the observation α_obs which we want to justify is:

α_obs = α(a,b) = ψ(a,b) → ω(a)


where 'If(a, b) contains the set of all the observed facts (possibly including irrelevant
ones) involving a and other objects b. In order to find an explanation, EBG searches
for a subset q>(a,b) of the facts 'lf(a,b) such that T r- q>(a,b) -+ w(a). Then, Et =
{ q>(a, b) -+ w(a)} could be an explanation for a.ohs• because T U {q>(a,b)} -+ w(a)) r-
('lf(a,b) -+ w(a)). However, £ 1 is not an explanation, according to Definition 9.5,
because a.obs can be derived directly from T; in fact:

T f-- 'v'xb[q>(x,b)-+ w(x)] f-- q>(a,b)-+ w(a)

Actually, it is well known that EBG does not really allow something new to be learned,
but only to state in a new, effective way implicit links between w and properties em-
bedded in T.
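A propositional sketch of this computation, with made-up rule and fact names; variable bindings and the generalization step of full EBG are omitted. Backward chaining from ω collects exactly the observed leaves used in the proof, and these play the role of φ(a,b):

```python
# Sketch of the EBG step: prove omega from T and the observed facts psi,
# keeping only the observed leaves actually used in the proof.
T = {"omega": ["m", "n"], "m": ["f1"], "n": ["f2"]}   # rules: head <- body
observed = {"f1", "f2", "f3"}                          # psi (f3 is irrelevant)

def explain(goal):
    """Return the observed facts supporting `goal`, or None if unprovable."""
    if goal in observed:
        return {goal}
    if goal not in T:
        return None
    leaves = set()
    for subgoal in T[goal]:
        support = explain(subgoal)
        if support is None:
            return None
        leaves |= support
    return leaves

print(explain("omega"))    # {'f1', 'f2'}: the relevant subset phi of psi
```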

9.5.4 Explanation in diagnosis


Let us consider the notion of explanation used in the abductive approaches to diagnosis
(Reggia et al., 1983; Console and Torasso, 1991; de Kleer et al., 1992; Poole et al.,
1987). In these approaches, the model of the system to be diagnosed is a theory T
including:
• A set of axioms describing the structure of the system.
• A set of axioms describing the (correct and/or faulty) behavior of the system (of its
components); such axioms have the following form:

type(C) ∧ mode(C,M) ∧ inp(C,X) ∧ context(C,Y) → out(C, f(X,Y))

where type is replaced by a predicate specifying the type of the component C to
which the rule refers, mode(C,M) indicates that the component C is in
mode M (correct or faulty), inp and out indicate the input and output of component
C (f being a function specifying how the output depends on inputs and context),
and context may indicate relevant contextual information.
We assume that in T there is a predicate symbol for each input and output of the
system to be diagnosed (i.e., each observable parameter); the set of such predicates is
the language of the observations.

Given a set α_obs of observations (containing at most one ground instance of each
predicate belonging to the language of the observations) and, possibly, contextual
information, a diagnosis is a set E of ground instances of the mode predicate, assigning
at most one mode to each component, and such that α_obs follows from T, E, inputs
and context.
It should be clear that explanations of this form have a purely abductive nature
according to Definition 9.6 (as one would expect). This can be shown by considering
that observations and explanations are forced to be ground and that the behavioral
models used are usually functional. Therefore, no value other than the observed
one can be predicted for a predicate corresponding to a given observation. Notice that
the same notion of explanation has also been used in other tasks, such as planning or
natural language understanding (see the overview in (Poole, 1990)).
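A minimal sketch of this scheme for a made-up chain of two inverters; the component names, fault modes and behavioural model are illustrative only. A diagnosis is a mode assignment E from which the observed output follows, given the input:

```python
from itertools import product

# Sketch of abductive diagnosis over a toy two-inverter circuit.
def behave(mode, inp):
    """Functional model: an 'ok' inverter negates; a 'stuck1' one outputs 1."""
    return (1 - inp) if mode == "ok" else 1

def diagnoses(inp, observed_out):
    found = []
    for m1, m2 in product(["ok", "stuck1"], repeat=2):
        if behave(m2, behave(m1, inp)) == observed_out:   # α_obs follows from T ∪ E
            found.append({"inv1": m1, "inv2": m2})
    return found

# Input 0 through two correct inverters would give 0; we observe 1 instead.
print(diagnoses(0, 1))
# [{'inv1': 'ok', 'inv2': 'stuck1'}, {'inv1': 'stuck1', 'inv2': 'stuck1'}]
```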

9.6 RELATED WORK


In Philosophy of Science, the proposed definitions of explanation fall into three broad
classes (Salmon, 1990): explanations as logical arguments, as functional relations,
and as a search for causes. In a paper that marks the transition between the prehistory
and the history of the discussion on explanation, (Hempel and Oppenheim, 1948)
presented their Deductive-Nomological (D-N) model, in which the definition of explanation
as a logical argument was formalized. According to the D-N model, an
observation α (the explanandum) is explained if it can be derived from a set of general
laws and antecedent conditions (the explanans).
The D-N model was the reference for more than twenty years, until it was
challenged by a number of famous counter-examples (Salmon, 1990, pp. 46-50).
Moreover, Hempel himself limited the applicability of his model to ground facts,
because he found difficulties in explaining general regularities (laws to be explained
in terms of more general laws). The D-N model was then extended to the Inductive-Statistical
(I-S) and the Deductive-Statistical (D-S) models, in order to account for
stochastic phenomena (Hempel, 1962; Hempel, 1965).
If we assume Definition 9.4 for the more-informative-than relation, Definition 9.5
is substantially Hempel's D-N model, extended to the case in which existentially
and universally quantified observations can also be considered.
In learning, observations are mostly ground examples, and the basic definition
given in (Dietterich and Michalski, 1983) falls under the D-N or the I-S model. A
hypothesis H is considered an explanation of some set of observations O, in the presence
of a background theory B, also in the ILP framework (Muggleton, 1991). In fact, H
must be such that B ∧ H ⊢ O. The same definition is also found in precursor papers of
ILP (see, for instance, (Meltzer, 1970)).
The identification of an explanation with a cause is very ancient: let us only recall the
four Aristotelian causes (Analytica Posteriora, 71 b 9-12, 94 a 20; Physica, 184 a 10-14)
and the derived scire per causas of the Scholastic School. Even though causality
has had a controversial history in Philosophy, mostly due to its metaphysical nature, it
is nevertheless a very intuitive and appealing notion, which has been reconsidered recently,
though with a more epistemological than realist character (Bunge, 1979; Harper and
Skyrms, 1988). In artificial systems which exploit causality (Console et al., 1991b)

(for instance, (Cox and Pietrzykowski, 1986a; Saitta et al., 1993; O'Rorke et al.,
1990)), this notion is mostly left to the reader's intuition from everyday life: a non-monotonic
relation between cause and effect, which is not material implication, but
which is context-sensitive and related to temporal order. Definition 9.5 also applies
to causal explanations, provided that some computationally precise semantics is associated
with the relation φ ⊒_T ψ, to be read, in this case, as φ causes ψ. Finally, we
may notice that a weaker notion of explanation can also be considered. For instance,
(Flach, 1991; Flach, 1995) makes a distinction between weak explanation (based on
consistency between the hypothesis and the data) and strong explanation (based on
derivability of the data from the hypothesis). Requiring only consistency of the explanation
with the observations and the available theory is common in model-based diagnosis
when the theory models the system's correct behaviour (Console and Torasso, 1991; de
Kleer et al., 1992). Observation derivability may be required, instead, when the theory
models the system's possible fault modes (in particular, causes of misbehaviour).

9.6.1 Induction and abduction


Definitions of explanation do not include indications of any computational tool for
actually figuring out explanations. The process of generating explanatory hypotheses
falls within the scope of reasoning. As the main motivation for wanting an explanation
is to increase our knowledge about the world, deduction has to be excluded from our
analysis, because adding consequences to a theory does not increase the amount of
information we already have. As we are interested, among the means of generating
hypotheses, in induction and abduction, we will briefly mention previous definitions
related to them.
A characterization of these reasoning mechanisms goes back to Aristotle (Analytica
Priora II, 23-27). Besides syllogism (deduction), he acknowledges four other kinds of
inference: induction, single-example inference, enthymeme and reduction. Induction
(epagōgē) is the recognition of the universal in the particular, a process inherent in any
perceptual act, and a fundamental means of building up the axioms (premises) to be used
in syllogisms. Induction has to be considered, in a certain sense, the inverse of syllogism
(op. cit., 23). It has to be noticed that this kind of induction is perfect, in the sense that
it is based on the observation of all the possible cases: it is then a limiting situation,
which every inductive act (not considering all the examples) shall approximate.
Reasoning from one example is the other extreme: from the observation of one
fact, a general law is hypothesized and used to prove another fact. An example is the
following (op. cit., 24): starting from the knowledge that "Thebans and Phocians are
neighbours" and "Athenians and Thebans are neighbours", and observing that "it was
bad for the Phocians to be at war with the Thebans", it is possible to conclude that
"it will be bad for the Athenians to be at war with the Thebans" (after making the
general hypothesis that it is bad to be at war with one's neighbours). Aristotle states that the
above reasoning is neither a syllogism (in fact, a hypothesis has been introduced) nor
an inductive step, because it is not reasoning from a part to the whole, but from a part
to a part. According to our classification, this conclusion would have been labelled as
inductive/abductive, as it is similar to Example 9.2 reported in Section 9.4.

The enthymeme is a syllogism from probable premises or from signs. As an example,
we can infer that a woman gave birth to a child from the fact that she has milk
(op. cit., 27). We can easily recognize here a form of reasoning from effects to causes,
an instance of the scheme we now call abduction. Finally, reduction (apagōgē) is
the reduction of a problem to another problem whose solution also entails a solution of
the original problem (op. cit., 25). This too is an instance of abduction.
In modern times, the first relevant studies on induction were those by Francis
Bacon (Novum Organum, 1620), David Hume (A Treatise of Human Nature, 1739)
and John Stuart Mill (A System of Logic, 1843), whereas abduction was re-defined
by Peirce. For Bacon, induction consists in forming lists of objects having or not having a
given property, with the aim of finding a suitable classification for each object. Hume
discusses hypothesis formation in connection with causality, and suggests eight rules
governing the relations between causes and effects. Some of these rules resemble the
canons of induction defined later by Mill, who approached the problem by trying to
justify inductive reasoning on the basis of a supposed constancy of nature. As
regards Peirce's views, abduction generates hypotheses, whereas induction evaluates
them: "Abduction is the process of forming an explanatory hypothesis. It is the only
logical operation which introduces any new idea; for induction does nothing but determine
a value, and deduction merely evolves the necessary consequences of a pure
hypothesis. Deduction proves that something must be; Induction shows that something
actually is operative; Abduction merely suggests that something may be." It must
be noted, however, that in his early work Peirce provided a different view of abduction
and induction: the former is seen as a process that generates new information
about the individuals in the observation, while the latter generalizes properties to other
individuals.
Interest in abduction has grown considerably in recent years in the AI community,
and many definitions of abduction have been proposed (see, for example, (Cox
and Pietrzykowski, 1987; Pople, Jr., 1973; Poole et al., 1987)). The relationships between
abduction and deduction have been pointed out in (Console et al., 1991b), where
it is proved that the form of abduction used in many AI systems can be characterized as
deduction on a completed theory (Clark, 1978); similar considerations have later been
suggested in (Konolige, 1992). Such a characterization provides further insight into
the general framework of this chapter, since generalized, non-inductive explanations
can be regarded as deductions on a closed world as well. This could be considered a
further criterion to distinguish between abductive inferences, i.e., deductions on the
closed world that do not affect the extension of the predicates for which observations
are available, and inductive inferences, which are not deductions on the closed world
and in which the extensions are affected.
Further interesting comparisons can be drawn by considering works that relate abduction
and induction. The kind of relation proposed between them strongly depends
on the definition attributed to the two terms. (Michalski, 1991), for instance, recognizes
the fundamental role of inference towards explanations and calls induction the
process of building up such explanations. The same definition was already proposed
by (Meltzer, 1970), for whom induction is the inverse of deduction. More precisely,
the goal of induction, for Michalski, is to find a set of statements P (the premises)
such that P ∪ BK ⊨ C, where BK is a body of background knowledge and C is the set
of statements to be explained. According to his view, then, abduction is a special case
of induction, differently from what is proposed in this chapter. However, Michalski
does not provide any formal characterization of the abductive explanations among all
the inductive ones, so that it is difficult to recognize them.
A similar classification is introduced in (Kodratoff, 1991), where abduction is one
among many other forms of inductive inference. In particular, Kodratoff distinguishes
between abduction (as the inversion of modus ponens) and generalization (inference from
the particular to the general), but does not provide any formal support for this distinction,
using only intuitive arguments.
On the contrary, (Josephson, 1994) (see also Josephson's chapter in the present
volume) classifies inductive generalization as a form of inference aimed at some best
explanation, i.e., as a form of abduction (following (Harman, 1965)); in fact, differently
from Michalski and Kodratoff, and also from us, he calls abduction the whole
process of generating, evaluating and selecting hypotheses. However, he too fails to
provide a formal definition of inductive generalization and, more importantly, a convincing
distinction between inductive generalization and abduction. In fact, he states
that the result of an inductive generalization does not explain the observed facts, but
rather the events of observing the facts, which is quite an ambiguous assertion. Moreover,
the motivation for including inductive generalization in abduction is also weak: if
it were not a form of abduction, we would allegedly be unable to justify why the credibility
of an inductive generalization increases with the number of supporting observations.
This is not true: it is sufficient to introduce a ranking among inductive hypotheses based
on some quantitative measure.
In this chapter we agree with Aristotle and with the later proposal of Peirce, who
considered induction and abduction as distinct processes. Our classification may provide a
formal basis for a precise distinction between explanations which generalize and those
which do not.
Finally, we disagree with the view proposed by (Flach, 1991) that induction =
abduction + revision, because this definition is too vague and does not allow for a
precise distinction between the two; moreover, it is not psychologically satisfactory
to tie the definition of induction to incrementality. In fact, this amounts to saying
that, if one sees ten green apples today and concludes that all apples are green,
he/she performs abductive reasoning, whereas if tomorrow two more green apples are taken
into consideration, inductive reasoning takes place.

9.7 CONCLUSIONS
This chapter proposed a generalized definition of explanation and a characterization
of the abductive or inductive nature of explanations in the context of a theory. Recognizing
the inductive or abductive nature of a hypothesis allows different kinds of
motivation to be sought in assessing its credibility: its frequency of occurrence in
the actual world, for an inductive hypothesis, and its causes, for an abductive one.
10
LEARNING, BAYESIAN PROBABILITY, GRAPHICAL MODELS, AND ABDUCTION

David Poole

10.1 INTRODUCTION
This chapter explores the relationship between learning (induction) and abduction. I
take what can be called the Bayesian view, where all uncertainty is reflected in proba-
bilities. In this chapter I argue not only that abduction can be used for induction, but
that most current learning techniques (from statistical learning to neural networks to
decision trees to inductive logic programming to unsupervised learning) can best be
viewed in terms of abduction.

10. 1. 1 Causal and evidential modelling and reasoning


In order to understand abduction and its role in reasoning, it is important to understand
ways to model, as well as ways to reason. In this section we consider reasoning
strategies independently of learning, and return to learning in Section 10.1.2.
Many reasoning problems can be best understood as evidential reasoning tasks.

Definition 10.1 An evidential reasoning task is one in which some parts of a system are
observed and you want to make inferences about other (hidden) parts.

Example 10.1 The problem of diagnosis is an evidential reasoning task. Given ob-
servations about the symptoms of a patient or artifact, we want to determine what is
going on inside the system to produce those symptoms.


Example 10.2 The problem of perception (including vision) is an evidential reasoning
task. In the world the scene produces the image, but the problem of vision is, given
an image, to determine what is in the scene.

Evidential reasoning tasks are often of the form where there is a cause-effect relation-
ship between the parts. In diagnosis we can think of the disease causing the symptoms.
In vision we can think of the scene causing the image. By causation¹, I mean that dif-
ferent diseases can result in different symptoms (but changing the symptoms doesn't
affect the disease) and different scenes can result in different images (but manipulating
an image doesn't affect the scene).
There are a number of different ways of modelling such a causal domain:

causal modelling where we model the function from causes to effects. For example,
we can model how diseases or faults manifest their symptoms. We can model
how scenes produce images.

evidential modelling where we model the function from effects to causes. For ex-
ample we can model the mapping from symptoms to diseases, or from image to
scene.

Independently of these two modelling strategies, we can consider two reasoning tasks:

Evidential Reasoning given an observation of the effects, determine the causes. For
example, determine the disease from the symptoms, or the scene from the image.

Causal Reasoning given some cause, make a prediction of the effects. For example,
predicting symptoms or prognoses from a disease, or predicting an image from
a scene. This is often called simulation.

In particular, much reasoning consists of evidential reasoning followed by causal
reasoning (see Figure 10.1). For example, a doctor may observe a patient, determine
possible diseases, then make predictions of other symptoms or prognoses. This then
can feedback to making the doctor look for the presence or absence of these symptoms,
forming the cycle of perception (Mackworth, 1978). Similarly, a robot can observe its
world, determine what is where, and act on its beliefs, leading to further observations.
There are a number of combinations of modelling and reasoning strategies that have
been proposed:

• The simplest strategy is to do evidential modelling and only evidential reasoning.
Examples of this are neural networks (Jordan and Bishop, 1996) and old-fashioned
expert systems such as Mycin (Buchanan and Shortliffe, 1984). A
neural network for character recognition may be able to recognise an "A" from
a bitmap, but could not say what an "A" looks like. In Mycin there are rules
leading from the symptoms to the diseases, but the system can't tell you what
the symptoms of some disease are.

¹See http://singapore.cs.ucla.edu/LECTURE/lecture_sec1.htm for a fascinating lecture by Judea Pearl
on causation.

Figure 10.1 Causal and evidential reasoning. [Figure omitted; its recoverable labels are "observation" and "prediction".]

• The second strategy is to model both causally and evidentially and to use the
causal model for causal reasoning and the evidential model for evidential rea-
soning. The main problem with this is the redundancy of the knowledge, and
its associated problem of consistency, although there are techniques for au-
tomatically inferring the evidential model from the causal model for limited
cases (Poole, 1988b; Console et al., 1989; Konolige, 1992; Poole, 1994). Pearl
(Pearl, 1988a) has pointed out how naive representations of evidential and causal
knowledge can lead to problems.

• The third strategy is to model causally and use different reasoning strategies for
causal and evidential reasoning. For causal reasoning we can directly use the
causal model, and for evidential reasoning we can use abduction.

This leads to an abstract formulation of abduction that will include both logical and
probabilistic formulations of abduction:

Definition 10.2 Abduction is the use of a model in its opposite direction. That is, if a
model specifies how x gives a y, abduction lets us infer x from y. Abduction is usually
evidential reasoning from a causal model².

If we have a model of how causes produce effects, abduction lets us infer causes
from effects. Abduction depends on an implicit assumption of complete knowledge of
possible causes (Console et al., 1989; Poole, 1994); when an effect is observed, one
of its causes must be present.
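A minimal sketch of this reading of abduction, with a made-up cause-effect table; the same model serves prediction in the causal direction and abduction, by search, in the evidential direction:

```python
# Sketch of Definition 10.2: one model, used in both directions.
causal_model = {                     # cause -> the effects it produces
    "flu":     {"fever", "cough"},
    "measles": {"fever", "red_spots"},
}

def predict(cause):                  # causal reasoning: cause -> effects
    return causal_model[cause]

def abduce(effect):                  # evidential reasoning: effect -> causes
    return {c for c, effects in causal_model.items() if effect in effects}

print(predict("measles"))            # {'fever', 'red_spots'}
print(abduce("fever"))               # {'flu', 'measles'}: every candidate cause
```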

²Neither the standard logical definition of abduction nor the probabilistic version of abduction (presented
below) prescribes that the given knowledge is causal. It shouldn't be surprising that the formal definitions
don't depend on the knowledge base being causal, as the causal relationship is a modelling assumption. We
don't want the logic to impose arbitrary restrictions on modelling.

10.1.2 Learning as an evidential reasoning task


In this section we explore learning as an evidential reasoning task. Given a task, a
prior belief or bias, and some data, the learning task is to produce an updated theory
of the data (the posterior belief) that can be used in the task.
In order to make this clear, we must be very careful to distinguish:
• the task being learned
• the task of learning itself.
This distinction is very important when the task being learned is also an evidential
reasoning task (e.g., learning to do diagnosis, or learning a perceptual task).
The task of learning can be seen as an evidential reasoning task where the model "causes" the data. The aim of learning is: given the data, to find appropriate models (evidential reasoning), and from the model(s) to make predictions on unseen cases (causal reasoning).
When we look at learning as an evidential reasoning task, not surprisingly, we find
learning methods that correspond to the two strategies that allow causal and evidential
reasoning (the second and third strategies of the previous section).
The second strategy of the previous section is to build special-purpose reasoning strategies to carry out the evidential reasoning task (i.e., inferring the model from the data) that is separate from the causal reasoning task (predicting new data from the model). Examples of such special-purpose mechanisms are decision-tree learning algorithms such as C4.5 (Quinlan, 1993) and CART (Breiman et al., 1984), and backpropagation for neural network learning (Rumelhart et al., 1986).
The rest of this chapter will show how the third strategy of the previous section,
namely causal modelling and using different strategies for causal and evidential rea-
soning, can be used for learning, and can be carried out with both logical and prob-
abilistic specifications of abductive reasoning. Such a strategy implies that we need
a specification of the models to be learned and what these models predict in order to
build a learning algorithm.

10.2 BAYESIAN PROBABILITY


In this section we introduce and motivate probability theory independently of learning.
The interpretation of probability theory we use here is called Bayesian, personal, or
subjective probability, as opposed to the frequentist interpretation of probability as the
study of the frequency of repeatable events.
Probability theory (Pearl, 1988b) is a study of belief update: how an agent's knowledge affects its beliefs. An agent's probability of a proposition is a measure of how much the proposition is believed by the agent. Rather than considering an agent maintaining one coherent set of beliefs (for example, the most plausible way the world could be based on the agent's knowledge), Bayesian probability specifies that an agent must consider all possible ways that the world could be and their relative plausibilities. This plausibility, normalised to the range [0,1] so that the values for all possible situations sum to one, is called a probability.
There are a number of reasons why we would be interested in probability, including:

• An agent can only act according to its beliefs and its goals. An agent doesn't
have access to everything that is true in its domain, but only to its beliefs. An
agent must somehow be able to decide on actions based on its beliefs.

• It is not enough for an agent to have just a single model of the world in which
it is interacting and act on that model. It also needs to consider what other
alternatives may be true, and make sure that its actions are not too disastrous if
these other contingencies happen to arise.
A classic example is wearing a seat belt; an agent may assume that it won't have
an accident on a particular trip, but wears a seat belt to cover the possibility
that it does have an accident. Under normal circumstances, the seat belt is a
slight nuisance, but if there is an accident, the agent is much better off when
it is wearing a seat belt. Whether the agent wears a seat belt depends on how
inconvenient it is when there is no accident, how much better off the agent would
be if they were wearing a seat belt when there is an accident, and how likely an
accident is. This tradeoff between various outcomes, their relative desirability,
and their likelihood is the subject of decision theory.

• As we will see below, probabilities are what can be obtained from data. Proba-
bility lets us explicitly model noise in data, and lets us update our beliefs based
on noisy data.

The formalisation of probability theory is simple.


A random variable is a term in a language that can take one of a number of different values. The set of all possible values a variable can take is called the domain of the variable. We write x = v to mean the proposition that variable x has value v. A Boolean random variable is one where the domain is {true, false}. Often we write x rather than x = true and ¬x rather than x = false. A proposition is a Boolean formula made from assignments of values to variables.
Some example random variables may be a patient's blood pressure at 2:00 p.m. on October 25, 1999, the value of the Australian dollar relative to the Canadian dollar on January 1, 2001, whether a patient has cancer at a particular time, whether a light is lit at some time point, or whether a particular coin lands heads on a particular toss.
There is nothing random about random variables. We introduce them because it is often useful to be able to refer to a variable without specifying its value.
Suppose we have a set of random variables. A possible world specifies an assignment of one value to each random variable. If w is a world, x is a random variable and v is a value in the domain of x, we write

w ⊨ x = v

to mean that variable x is assigned value v in world w. We can allow Boolean combinations on the right-hand side of ⊨, where the logical connectives have their standard meaning, for example,

w ⊨ α ∧ β iff w ⊨ α and w ⊨ β
So far this is just standard logic, but using the terminology of random variables.

Let us assign a nonnegative measure μ(w) to each world w so that the measures of the possible worlds sum³ to 1. The use of 1 is purely by convention; we could have just as easily used 100, for example.
The probability of proposition α, written P(α), is the sum of the measures of the worlds in which α is true:

P(α) = Σ_{w ⊨ α} μ(w).

The most important part of Bayesian probability is conditioning on observations. The set of all observations is called the evidence. If you are given evidence e, conditioning means that all worlds in which e is false are eliminated, and the remaining worlds are renormalised so that their probabilities sum to 1. This can be seen as creating a new measure μ_e defined by:

μ_e(w) = μ(w)/P(e) if w ⊨ e
μ_e(w) = 0 if w ⊭ e

We can then define the conditional probability of α given e, written P(α|e), in terms of the new measure:

P(α|e) = Σ_{w ⊨ α} μ_e(w).

Example 10.3 The probability P(sneeze = yes | cold = severe) specifies, out of all of the worlds where cold is severe, what proportion have sneeze with value yes. It is the measure of belief in the proposition sneeze = yes given that all you knew was that the cold was severe. The probability P(sneeze = yes | cold ≠ severe) considers the other worlds, where the cold isn't severe, and specifies the proportion of these in which sneeze has value yes. This second probability is independent of the first.
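To make this semantics concrete, here is a minimal Python sketch (mine, not part of the original text) that represents possible worlds as assignments with a measure and computes P(α) and P(α|e) by summation; the cold/sneeze variables and the particular measures are invented for illustration.

# Possible worlds: each assigns a value to every random variable;
# the nonnegative measures sum to 1.
worlds = [
    ({"cold": "severe", "sneeze": "yes"}, 0.3),
    ({"cold": "severe", "sneeze": "no"},  0.1),
    ({"cold": "mild",   "sneeze": "yes"}, 0.2),
    ({"cold": "mild",   "sneeze": "no"},  0.4),
]

def prob(alpha):
    # P(alpha) = sum of measures of worlds in which alpha is true
    return sum(mu for w, mu in worlds if alpha(w))

def cond_prob(alpha, e):
    # P(alpha | e): eliminate worlds where e is false, renormalise
    return (sum(mu for w, mu in worlds if e(w) and alpha(w))
            / prob(e))

# P(sneeze = yes | cold = severe) = 0.3 / (0.3 + 0.1) = 0.75
print(cond_prob(lambda w: w["sneeze"] == "yes",
                lambda w: w["cold"] == "severe"))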

10.2.1 Bayes' rule


Given the above semantic definition of conditioning, it is easy to prove:

P(h|e) = P(h ∧ e) / P(e).

Rewriting the above formula, and noticing that h ∧ e is the same proposition as e ∧ h, we get:

P(h ∧ e) = P(h|e) × P(e) = P(e|h) × P(h)

We can divide the right-hand sides by P(e), giving

P(h|e) = P(e|h) × P(h) / P(e)

3 When there are infinitely many possible worlds, we need to use some form of measure theory, so that the
measure of all of the possible worlds is 1. This requires us to assign probabilities to measurable sets of
worlds, but the general idea is essentially the same.

if P(e) ≠ 0. This equation is known as Bayes' theorem or Bayes' rule. It was first given in this generality by (Laplace, 1812).
It may seem puzzling why such an innocuous-looking equation should be so celebrated. It is important because it tells us how to do evidential reasoning from a causal knowledge base; Bayes' rule is an equation for abduction. Suppose P(e|h) specifies a causal model; it gives the propensity of effect e in the context when h is true. Bayes' rule specifies how to do evidential reasoning; it tells us how to infer the cause h from the effect e.
The numerator is the product of the likelihood, P(e|h), which specifies how well the hypothesis h predicts the evidence e, and the prior probability, P(h), which specifies how much the hypothesis was believed before any evidence arrived.
The denominator, P(e), is a normalising constant to ensure that the probabilities are well formed. If {h1, ..., hk} is a set of hypotheses that is pairwise incompatible (h_i and h_j cannot both be true if i ≠ j) and covering (one h_i must be true), then

P(e) = Σ_i P(e|h_i) × P(h_i)

If you are only interested in comparing hypotheses, this denominator can be ignored.
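As a concrete illustration (my sketch, with invented priors and likelihoods, not the author's), the following Python fragment applies Bayes' rule to a pairwise incompatible and covering set of hypotheses, computing P(e) as the normalising sum:

# Priors P(h_i) over pairwise incompatible, covering hypotheses
prior = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
# Likelihoods P(e | h_i): how well each hypothesis predicts e
likelihood = {"h1": 0.2, "h2": 0.9, "h3": 0.5}

# P(e) = sum over i of P(e | h_i) * P(h_i)
p_e = sum(likelihood[h] * prior[h] for h in prior)

# Bayes' rule: P(h_i | e) = P(e | h_i) * P(h_i) / P(e)
posterior = {h: likelihood[h] * prior[h] / p_e for h in prior}
print(posterior)  # the posteriors sum to 1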

10.2.2 Bayesian learning


Bayesian learning, or Bayesian statistics (Cheeseman, 1990; Loredo, 1990; Jaynes, 1985; Jaynes, 1995), is the method of using Bayes' rule to perform evidential reasoning for the evidential reasoning task of learning.
Bayes' rule is

P(h|e) = P(e|h) × P(h) / P(e)

If e is the data (all of the training examples), and h is a hypothesis, Bayes' rule specifies how, given the model of how the hypothesis h produces the data e and the prior propensity of h, you can infer how likely the hypothesis is, given the data. One of the main reasons why this is of interest is that the hypotheses can be noisy; a hypothesis can specify a probability distribution over the data it predicts. Moreover, Bayes' rule allows us to compare those hypotheses that predict the data exactly (where P(e|h) = 1) amongst themselves and with the hypotheses that specify any other probability of the data.

Example 10.4 Suppose we are doing Bayesian learning of decision trees, and are considering a number of definitive decision trees (i.e., they predict classifications with 0 or 1 probabilities, and thus have no room for noise). For each such decision tree h, either P(e|h) = 1 or P(e|h) = 0. Bayes' theorem tells us that those that don't predict the data have posterior probability 0, and those that predict the observed data have posterior probabilities proportional to their priors. Thus the prior probability specifies the learning bias (for example, towards simpler decision trees): out of all of the trees that match the data, which are to be preferred. Without such a bias, there can be no learning, as every possible function can be represented as a decision tree. Bayes' rule

Figure 10.2 Posterior distribution for learning a probability.

also specifies how to compare simpler decision trees that may not exactly fit the data
(e.g., if they have probabilities at the leaves) with more complex ones that exactly fit
the data. This gives a principled way to handle overfitting.

Example 10.5 The simplest form of Bayesian learning with probabilistic hypotheses is when there is a single binary event that is repeated and statistics are collected. That is, we are trying to learn probabilities. Suppose we have some object that can fall down such that either there is some distinguishing feature (which we will call heads) showing on top, or there is not heads (which we will call tails) showing on top. We would like to learn the probability that there is a heads showing on top. Suppose our hypothesis space consists of hypotheses that specify P(heads) = p, where heads is the proposition that says heads is on top, and p is a number that specifies the probability of a heads on top. Implicit in this hypothesis is that repeated tosses are independent⁴.
Suppose we have an observation e consisting of a particular sequence of outcomes
4 Bayesian probability doesn't require independent trials. You can model the interdependence of the trials in the hypothesis space.

with heads true in n of the m outcomes. Let h_p be the hypothesis that P(heads) = p for some 0 ≤ p ≤ 1. Then we have, by elementary probability theory,

P(e|h_p) = p^n (1 − p)^(m−n)

Suppose that our prior probability is uniform on [0,1]. That is, we consider each value for P(heads) to be equally likely before we see any data.
Figure 10.2 shows the posterior distributions for various values of n and m. Note that the only hypotheses that are inconsistent with the observations are P(heads) = 0 when n > 0 and P(heads) = 1 when n < m. Note that if the prior isn't very biased, it soon gets dominated by the data.
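The posterior curves of Figure 10.2 can be reproduced with a few lines of Python; the sketch below is mine and discretises the uniform prior on a grid of values for p.

# Posterior over p = P(heads) after n heads in m tosses,
# with a uniform prior on [0,1] discretised on a grid.
n, m = 3, 10
grid = [i / 100 for i in range(101)]
# Unnormalised posterior: likelihood p^n (1-p)^(m-n) times prior 1
post = [p**n * (1 - p)**(m - n) for p in grid]
total = sum(post)
post = [x / total for x in post]  # renormalise over the grid
# The posterior peaks near n/m = 0.3; p = 0 and p = 1 get weight 0.
print(max(zip(post, grid)))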

Bayesian learning has been applied to many representations including decision


trees (Buntine, 1992), neural networks (Neal, 1996), Bayesian networks (Heckerman,
1995), and unsupervised learning (Cheeseman et al., 1988). All we need is a way to
specify what a particular decision tree, neural network, Bayesian network, or logic
program predicts (this is well defined by the definition of the representation), as well
as a prior probability on the different representations.
Prior probabilities may seem to be problematic, but are important for avoiding overfitting. They give a principled way to do what would otherwise have to be done by some ad hoc mechanism, such as pruning decision trees or limiting the size of neural networks. For example, if there is noise in the data, a more detailed decision tree can always be made to fit the data better, but usually has worse predictive properties on unseen examples. A prior probability on decision trees provides a bias that lets us trade off fitting the training data against simplicity of the trees (Buntine, 1992).
Bayesian learning is closely related to the minimum description length (MDL) principle. If we were to choose the most likely hypothesis given the data⁵ (called the maximum a posteriori probability, or MAP, hypothesis), we can use:

argmax_h P(h|e)
= argmax_h P(e|h) × P(h) / P(e)
= argmax_h P(e|h) × P(h)
= argmin_h (−log₂ P(e|h) − log₂ P(h))

The latter is the number of bits it takes to describe the data in terms of the model plus the number of bits it takes to describe the model. Thus the best hypothesis is the one that gives the shortest description of the data in terms of that model.
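The MAP/MDL equivalence is easy to check numerically. The sketch below (mine, with invented priors and likelihoods) selects a hypothesis both by maximising P(e|h) × P(h) and by minimising the total description length in bits, and the two criteria agree:

import math

prior = {"simple": 0.7, "complex": 0.3}         # P(h)
likelihood = {"simple": 0.01, "complex": 0.02}  # P(e | h)

# MAP hypothesis: maximise P(e|h) * P(h)
map_h = max(prior, key=lambda h: likelihood[h] * prior[h])

# MDL: minimise bits for the data given the model plus the model
def bits(h):
    return -math.log2(likelihood[h]) - math.log2(prior[h])
mdl_h = min(prior, key=bits)

assert map_h == mdl_h  # both criteria pick the same hypothesis
print(map_h)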

10.3 BAYESIAN NETWORKS


Probability specifies a semantic construction and not a representation of knowledge.
A Bayesian network (Pearl, 1988b) is a way to represent probabilistic knowledge. The

5 We don't have to do this. In particular, it is the posterior distribution of the hypotheses that we want to use to make decisions, rather than the most likely hypothesis.

idea is to represent a domain in terms of random variables and to explicitly model the
interdependence of the random variables in terms of a graph. This is useful when a
random variable only depends on a few other random variables, as occurs in many
domains.
Suppose we decide to represent some domain using the random variables x1, ..., xn. If we totally order the variables, it is easy to prove that

P(x1, ..., xn) = P(x1) P(x2|x1) P(x3|x1,x2) ··· P(xn|x1, ..., x_{n−1})

For each variable x_i suppose there is some minimal set π_{x_i} ⊆ {x1, ..., x_{i−1}} such that

P(x_i | x1, ..., x_{i−1}) = P(x_i | π_{x_i})

That is, once you know the values of the variables in π_{x_i}, knowing the values of other predecessors of x_i in the total ordering will not change your belief in x_i. The elements of the set π_{x_i} are known as the parents of variable x_i. We say x_i is conditionally independent of its predecessors given its parents. We can create a graph where there is an arc from each parent of a node into that node. Such a graph, together with the conditional probabilities P(x_i | π_{x_i}) for each variable x_i, is known as a Bayesian network or a belief network (Pearl, 1988b; Jensen, 1996).
There are a few important points to notice about a Bayesian network:
• By construction, the graph defining a Bayesian network is acyclic.
• Different total orderings of the variables can result in different Bayesian networks for the same underlying distribution.
• The size of the conditional probability table P(x_i | π_{x_i}) is exponential in the number of parents of x_i.
Typically we try to build Bayesian networks so that the total ordering implies few parents and a sparse graph.
Bayesian networks are of interest because they can be constructed taking into ac-
count just local information, the information that has to be specified is reasonably
intuitive, and there are many domains that have concise representations as Bayesian
networks. There are algorithms that can exploit the sparseness of the graph for com-
putational gain (Lauritzen and Spiegelhalter, 1988; Dechter, 1996; Zhang and Poole,
1996), exploit the skewness of distributions (Poole, 1996) or use the structure for
stochastic simulation (Henrion, 1988; Pearl, 1987; Dagum and Luby, 1997).
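To illustrate the factorisation, here is a minimal Python sketch (mine) of a two-variable Bayesian network, in which each node stores its parents and a conditional probability table, and the joint probability is the product of the local conditional probabilities; the cold/sneeze network is an invented example.

# Each node: (parents, CPT); the CPT maps a tuple of parent
# values to a distribution over the node's own values.
network = {
    "cold":   ((), {(): {"yes": 0.1, "no": 0.9}}),
    "sneeze": (("cold",), {("yes",): {"yes": 0.8, "no": 0.2},
                           ("no",):  {"yes": 0.1, "no": 0.9}}),
}
order = ["cold", "sneeze"]  # a total ordering consistent with the arcs

def joint(assignment):
    # P(x1,...,xn) = product over i of P(x_i | parents of x_i)
    p = 1.0
    for x in order:
        parents, cpt = network[x]
        parent_vals = tuple(assignment[q] for q in parents)
        p *= cpt[parent_vals][assignment[x]]
    return p

print(joint({"cold": "yes", "sneeze": "yes"}))  # 0.1 * 0.8 = 0.08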

10.4 BAYESIAN LEARNING AND LOGIC-BASED ABDUCTION


So far we have given an informal characterisation of Bayes' rule as a rule for abduction. (Poole, 1993) has shown a direct correspondence between Bayesian networks and logic-based conceptions of abduction. (Buntine, 1994) has shown how Bayesian networks form a representation for many inductive learning tasks. In this section we put these together to show how inductive learning tasks can be related to logic-based abduction. In the following section, we expand on this mapping to discuss some of the issues of this book relating abduction and induction.

10.4.1 Logic programs, abduction and Bayesian networks


This section overviews the relationship between Bayesian networks and logic-based abduction (Poole, 1993). In particular, I give the translation of Bayesian networks into probabilistic Horn abduction (Poole, 1993), a form of probabilistic and abductive logic programming.
Suppose variable a has parents b1, ..., bk in a Bayesian network. As part of the Bayesian network are probabilities of the form

P(a = v | b1 = v1, ..., bk = vk) = p

These can be translated into rules of the form:

a = v ← b1 = v1 ∧ ... ∧ bk = vk ∧ h.     (10.1)
which can be treated as normal logical rules where h is assumable.
In probabilistic Horn abduction (and its successor, the independent choice logic (Poole, 1997), which can handle more general rules, including negation as failure, as well as different agents choosing assumptions), the assumables are structured in terms of a choice space, C, which is a set of alternatives (called disjoint sets in (Poole, 1993)), where an alternative is a set of ground atoms. Each member of an alternative is assumable and can only appear in one alternative. The integrity constraints are that the elements of an alternative are pairwise inconsistent.
An independent choice logic theory is specified by a choice space and an acyclic
logic program that doesn't imply any element of an alternative. The semantics is
defined in terms of possible worlds. There is a possible world for each selection of
one element from each alternative. What is true in a possible world is given by the
stable model of the logic program and the atoms selected. The logic is abductive in
the sense that the explanations of g form a concise specification of the possible worlds
in which g is true (Poole, 1993; Poole, 1998). This thus forms a natural form of
abductive logic programming.
We place a probability over the assumables so that the probabilities of the elements of an alternative sum to one. We assume that the different alternatives are probabilistically independent (the alternatives correspond to random variables).
In terms of representing the Bayesian network above, there is an alternative for each assignment of values to the parents of a. For each such alternative, there is an element of the alternative for each value of a. The probability of the assumable h (from equation (10.1)) is the same as the corresponding conditional probability in the Bayesian network:

P(h) = P(a = v | b1 = v1, ..., bk = vk)

The abductive characterisation of probabilistic Horn abduction is straightforward. For any proposition h, the probability of h can be computed from the set of minimal explanations of h. The minimal explanations are disjoint (by the way the rules were constructed), and so the probability of h is the sum of the probabilities of the minimal explanations for h. The probability of an explanation is the product of the probabilities of the assumables. That is,

P(h) = Σ_{e a minimal explanation of h} P(e)

where the probability of explanation e is given by

P(e) = Π_{n ∈ e} P(n)

In (Poole, 1993) it was proved that the Bayesian network and the abductive character-
isation result in the same probabilities.
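Given the disjointness of minimal explanations and the independence of the alternatives, the probability computation is a short calculation; the following Python sketch (mine, with an invented theory) sums the products of assumable probabilities over the minimal explanations:

# Probabilities of the assumables; assumables from different
# alternatives are independent, and the minimal explanations
# are pairwise disjoint by construction.
p_assumable = {"a1": 0.3, "a2": 0.7, "b1": 0.5}
explanations = [{"a1", "b1"}, {"a2", "b1"}]  # minimal explanations of g

def p_explanation(e):
    prod = 1.0
    for n in e:
        prod *= p_assumable[n]  # P(e) = product of P(n) for n in e
    return prod

p_g = sum(p_explanation(e) for e in explanations)
print(p_g)  # 0.3*0.5 + 0.7*0.5 = 0.5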
Suppose we want to compute a probability given evidence; we have

P(h|e) = P(h ∧ e) / P(e)
Thus this can be seen in terms of abduction as: given evidence e, first explain the evidence (this gives P(e)), and from the explanations of the evidence, explain h (this gives P(h ∧ e)). Note that the explanations of h ∧ e are the explanations of e extended to also explain h. In terms of a Bayesian network, you can first go backwards along the arrows to explain the evidence, and then go forward along the arrows to make predictions. Thus not only can Bayes' rule be seen as a rule for abduction, but Bayesian networks can be seen as a representation for abduction. Note that this reasoning framework of using abduction for evidential reasoning and assumption-based reasoning for causal reasoning (see Figure 10.1), which is what the above analysis gives us for Bayesian networks, has also been proposed in the default reasoning literature (Poole, 1989a; Poole, 1990; Shanahan, 1989).
The logic programs have a standard logical meaning and can be extended to include
(universally quantified) logical variables6 in the usual way. The only difference to
standard logic programs7 is that some of the premises are hypotheses that may have
an associated probability.

10.4.2 Bayesian networks and induction


Buntine argues that Bayesian networks (as well as related chain graphs) form a good representation for many induction tasks (Buntine, 1994). That is, he argued that Bayesian networks can form a representation for the evidential reasoning task of learning.
Note that this is very different from the problem of learning Bayesian networks themselves, for which there are Bayesian and non-Bayesian techniques (see (Heckerman, 1995) for a review of learning Bayesian networks). Buntine was using Bayesian networks to represent the task of learning, independently of the task being learned. Buntine used the notion of plates, which are repeated copies of a network.

Example 10.6 Figure 10.3 shows a Bayesian network for the coin tossing of Example 10.5. The probability of heads on example i, which in the left-hand side of Figure 10.3 is shown as heads_i, is a random variable that depends only on θ, the probability of
6 It is important not to confuse logical variables, which stand for individuals, and random variables. In this chapter, I will follow the Prolog convention of having logical variables in upper case.
7 In the independent choice logic (Poole, 1997), we can also have negation as failure in the rules. The notion of abduction needs to be expanded to allow abduction through the negation (Poole, 1998).

Figure 10.3 Bayesian network for coin tossing, with and without plates.

heads appearing on a coin toss. The right-hand side of Figure 10.3 shows the same
network using plates, where there is one copy of the boxed node for each example.
Given the logic-programming characterisation of Bayesian networks, we can use
universally quantified logical variables in the rules to represent the plates of Buntine.

Example 10.7 Let's write the example of Figure 10.3 in terms of probabilistic Horn abduction. First we can represent each arc to an example as the rules:

heads(E) ← happens_to_turn_heads(E, P) ∧ prob_of_heads(P)
tails(E) ← happens_to_turn_tails(E, P) ∧ prob_of_heads(P)

where heads(E) is true if example E shows a heads, and tails(E) is true if example E shows a tails.
The corresponding alternatives are

∀E ∀P {happens_to_turn_heads(E, P), happens_to_turn_tails(E, P)} ∈ C

That is, we can assume that example E turns heads or assume it turns tails. We then have the probabilities:

P(happens_to_turn_heads(E, P)) = P
P(happens_to_turn_tails(E, P)) = 1 − P

We also have the alternative that corresponds to the θ in Figure 10.3:

{prob_of_heads(P) : 0 ≤ P ≤ 1} ∈ C

That is, you can assume any single probability in the range [0,1].
Suppose you have examples e1, ..., ek, and have observed, say,

heads(e1), tails(e2), tails(e3), ...

The explanations of this observation are of the form:

{happens_to_turn_heads(e1, P), happens_to_turn_tails(e2, P),
 happens_to_turn_tails(e3, P), ...,
 prob_of_heads(P)}

for each P ∈ [0,1]. Suppose there were n heads and m tails in the k = n + m examples; then the probability of this explanation is

P^n × (1 − P)^m × q

where q is P(prob_of_heads(P)).

10.5 COMBINING INDUCTION AND ABDUCTION


In terms of abduction, the basic idea of this model of induction is to have some as-
sumptions that are specific to each example, and some assumptions that are specific
to the model being learned. For each example, you make some model-specific as-
sumptions and some example-specific assumptions (that also depend on the model
assumptions). When explaining a number of examples, they each have their own
example-specific assumptions, but must share the model assumptions.
Buntine has shown how many different learning algorithms from neural networks
to unsupervised learning can be put into this framework (Buntine, 1994).

10.5.1 Learning decision trees


In this section we will sketch how the same framework can be used for more complicated examples, where the models must be constructed, rather than having a fixed number of parameters to be estimated. Here the flexibility of representation in terms of logic-based abduction can be seen to have great advantages over the use of plates.
Let's look at the same framework for Bayesian learning of decision trees with probabilities at the leaves⁸ (Buntine, 1992). To keep this simple, let's suppose that all attributes are Boolean.
We use the relation prop(Ex, Att, Val) that is true when example Ex has value Val on attribute Att. Suppose a decision tree is either a number or is of the form if(C, YT, NT), where C is an attribute and YT and NT are trees.
We need to write rules that specify the value of the classification based on the tree:

prop(Ex, classification, V) ← tree(T) ∧ tree_predicts(T, Ex, V).

It is straightforward to define what a tree predicts:


tree_predicts(if(C, YesT, NoT), Ex, V) ←
    prop(Ex, C, true) ∧
    tree_predicts(YesT, Ex, V).
tree_predicts(if(C, YesT, NoT), Ex, V) ←
    prop(Ex, C, false) ∧
    tree_predicts(NoT, Ex, V).
tree_predicts(N, Ex, V) ←

8 Note that when these decision trees are translated into rules, probabilistic Horn abduction theories result. But here we are using probabilistic Horn abduction to represent the learning task, not the task being learned.

    number(N) ∧
    predicts_prob(Ex, N, V).

where

∀Ex ∀N {predicts_prob(Ex, N, true), predicts_prob(Ex, N, false)} ∈ C

such that

P(predicts_prob(Ex, N, true)) = N
P(predicts_prob(Ex, N, false)) = 1 − N
Similarly, we need ways to abduce what the trees are, and to solve the (more difficult) problem of assigning the priors on the decision trees.
The most likely explanation of a set of classifications on examples results in the most likely decision tree given those examples.
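The tree_predicts relation above has a direct functional reading. Here is a hedged Python transcription (my sketch, not part of the original chapter), in which a tree is either a leaf probability or a triple of an attribute and two subtrees:

# A decision tree is a leaf probability or ("if", C, YesT, NoT).
def tree_predicts(tree, example):
    # Return the probability that the classification is true.
    if isinstance(tree, (int, float)):
        return tree  # leaf: probability of a true classification
    _, c, yes_t, no_t = tree
    branch = yes_t if example[c] else no_t
    return tree_predicts(branch, example)

tree = ("if", "fever", 0.9, ("if", "cough", 0.6, 0.1))
print(tree_predicts(tree, {"fever": False, "cough": True}))  # 0.6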

10.5.2 Generalization
It has often been thought that probability is unsuitable for generalization, as the generalization ∀X r(X) must have a lower probability than any set of examples r(e1), ..., r(ek), since the generalization implies the examples. While the statement about probability is correct, it is misleading, because it is not the hypothesis and the evidence that we want to compare but the different hypotheses⁹.
The different hypotheses may be, for example:
1. r(X) is always true,
2. r(X) is sometimes true (and it just happened to be true for examples e1, ..., ek),
3. r(X) is always false.


This can be represented as having the alternatives:
{r ...alwaysJrue, r ..sometimesJrue, r ..always_false} E C
VX {r ..happensJrue(X),r..happens_false(X)} E C
with some probabilities associated with the assumables, and the rules
r(X) +- r ...alwaysJrue.
r(X) +- r ..sometimesJrue 1\ r ..happensJrue(X).
For any set of (all positive) observations: r(et), ... ,r(ek), there are two competing
explanations:
{r ...alwaysJrue}
{r ..sometimesJrue,r..happensJrue(et), . .. ,r..happensJrue(ek)}

9 It is interesting to note that in the abductive framework the hypothesis always implies the evidence, and so it is always less likely. But this is exactly what we want from learning: we want the learned hypothesis to make risky predictions, which could be wrong, on unseen data.

If there are no extreme (0 or 1) probabilities, with enough positive examples the conclusion that r is always true will be the most likely hypothesis. Thus we can make universal generalizations within this framework.
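The effect can be seen numerically. In the sketch below (mine, with invented prior probabilities), the explanation using per-example assumptions pays a multiplicative price for every example, so with enough positive examples the universal generalization r_always_true becomes the most likely explanation:

# Invented prior probabilities of the alternatives.
p_always, p_sometimes = 0.1, 0.8
p_happens_true = 0.5  # per-example probability under "sometimes"

for k in (1, 5, 20):  # k positive examples r(e1), ..., r(ek)
    p_exp1 = p_always                         # {r_always_true}
    p_exp2 = p_sometimes * p_happens_true**k  # per-example assumptions
    winner = "always" if p_exp1 > p_exp2 else "sometimes"
    print(k, winner)
# Output: 1 sometimes / 5 always / 20 always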

10.6 CONCLUSION
This chapter has related the Bayesian approach to learning with logic-based abduction. In particular, I have sketched the relationship between Bayesian learning and the graphical models of (Buntine, 1994) and the relationship between graphical models and the abductive logic programming of (Poole, 1993). It should be emphasised that, while each of the links has been developed, the chain has not been fully investigated. This chapter should be seen as a starting point, rather than a survey of mature work.

Acknowledgments
This work was supported by the Institute for Robotics and Intelligent Systems, Phase III (IRIS-III), project "Dealing with Actions", and Natural Sciences and Engineering Research Council of Canada Operating Grant OGP0044121.
11 ON THE RELATION BETWEEN
ABDUCTIVE AND INDUCTIVE
HYPOTHESES
Akinori Abe

11.1 INTRODUCTION
Abduction and induction have been recognized as important forms of reasoning with
incomplete information that are appropriate for many problems in artificial intelli-
gence (AI). Abduction is usually used for design, diagnosis and other such tasks. In-
duction is usually used for classification, program generation, and other similar tasks.
As mentioned in the introductory chapter to this volume, Peirce classified abduc-
tion from a philosophical point of view as the operation of adopting an explanatory
hypothesis and characterized its form. In addition, he characterized induction as the
operation of testing a hypothesis by experiments (Peirce, 1955a). Abduction and in-
duction are different in his viewpoint.
On the other hand, abduction in the AI field is generally understood as reasoning
from observation to explanations, and induction as the generation of general rules from
specific data. Sometimes, both types of inferences are thought to be the same because
they can be viewed as being the inverse of deduction. Pople mechanized abduction as
the inverse of deduction (Pople, Jr., 1973), although he seems to distinguish abduc-
tion from induction. Muggleton and Buntine have formalised induction as inverted
resolution (Muggleton and Buntine, 1988).
Some researchers have contended that abduction and induction are similar pro-
cesses. For example, Josephson, in his contribution to this volume, argues that "smart"
inductive generalisation is a special case of abduction. On the other hand, Dimopou-
los and Kakas have shown that the two types of inferences differ, in that abduction

extracts hypotheses from theories, while induction constructs its hypotheses using information from theories (Dimopoulos and Kakas, 1996b). Furthermore, from the two perspectives, Flach has shown that "abduction (inferential) = abduction (syllogistic) ∪ induction (syllogistic)", and that induction (inferential) does not have an analogue in the syllogistic perspective (Flach, 1996a).
In this chapter, I will argue from the inferential point of view. Since the role and
behaviour of abductive hypotheses and those of inductive ones are somewhat different,
I support the position that they are different. I will provide a way to integrate abduction
and induction from the features of the above hypotheses.
There are various sorts of studies about abduction and induction in the AI field. However, for simplification, in this chapter I will specialize abduction to hypothetical reasoning and induction to inductive logic programming (ILP).

11.2 THE RELATION BETWEEN ABDUCTION AND INDUCTION

11.2.1 Abduction and induction


Peirce wrote that abduction is an operation for adopting an explanatory hypothesis,
which is subject to certain conditions, and that in pure abduction, there can never
be justification for accepting the hypothesis other than through interrogation (Peirce,
1955a). In AI, if we use this logic framework, a sort of justification will be made by
generating and testing candidate hypotheses to find acceptable hypotheses. Therefore,
in simple words, abduction can be formalized as an explanation of an observation. It
generates or adopts consistent hypotheses that explain an observation. Furthermore,
abduction can be formalized as prediction because hypotheses generation and testing
can be seen as a prediction of the observation's causation. Peirce also wrote that in-
duction is an operation for testing a hypothesis by experiment, and if it is true, an
observation made under certain conditions ought to have certain results. Then, if these
conditions are fulfilled and the results are favourable, we extend a certain confidence
to the hypothesis. Thus, simply speaking, induction can be formalized as the gener-
alization of examples. Induction finds tendencies in examples and generates general
rules (hypotheses) from examples and background knowledge.
Both types of inferences seem to be the same because they can be viewed as the
inverse of deduction. For example, Pople mechanized abduction as the inverse of
deduction (Pople, Jr., 1973), and Muggleton and Buntine have introduced inverse res-
olution to induction (Muggleton and Buntine, 1988).
From their points of view, both inference mechanisms generate hypotheses and
appear similar. However, the main difference between abductive hypotheses and in-
ductive hypotheses is that, in general, abductive hypotheses are propositional clauses
and inductive hypotheses are predicate clauses. Intuitively, this difference comes from
the method of inference. Abduction involves the adoption of existing or non-existing hypotheses to explain a given observation. Therefore, the hypotheses are usually
propositional clauses. For example, when a hypothetical reasoning system is used for
an LSI circuit design, the knowledge base (including facts) has knowledge about the
ON THE RELATION BETWEEN ABDUCTIVE AND INDUCTIVE HYPOTHESES 171

devices' functions and their connections, and knowledge of other rules in the predicate form as follows (specification).¹

fact((equ(out(N,f1),0) :- conn(Node,out(N,f1)) & equ(Node,0))).

If the following input-output relation for the LSI circuit is given as an observation,

equ(out(1,f1),input+x2+x3+x4) :- equ(in(1,f1),input) & ...

abduction computes the devices' names and their connections in a propositional form as a set of hypotheses.

conn(out(1,x5),in(1,x6)) & conn(out(1,x4),in(1,x5)) &
conn(in(1,f1),in(1,x4)) & function(x6,reg) & dev(x6,1,1) & ...

In hypothetical reasoning, hypotheses are specialized to explain the observation (even though those in the hypothesis base are predicate clauses); therefore, the adopted hypotheses tend to be propositional clauses.
On the other hand, induction involves the classification or generalization of examples for explaining tendencies in observations. Therefore, the hypotheses become predicate clauses. For example, if ILP is used for program generation, background knowledge includes sample programs like

head([A|_], A), tail([_|B], B), cons(C, D, [C|D]), ...

(usually in the predicate form). If examples are given in the propositional form like

append([a], [b], [a, b]), ...

it returns the answer program in a predicate form like

append(X, Y, Z) :- head(X, H), tail(X, T),
    append(T, Y, W), cons(H, W, Z).

As such, some predicate clauses can be generated as hypotheses by generalization


from propositional examples.

11.2.2 Abduction in artificial intelligence


Theorist. One of the popular abduction systems is Theorist (Poole et al., 1987), which is a hypothetical reasoning system. In Theorist, logical knowledge is divided into two categories: facts and hypotheses. A fact (F) is knowledge that is always true and cannot be inconsistent with other knowledge, while a hypothesis (h) is defeasible knowledge, which requires consistency checking during an inference process.
The inference mechanism of Theorist explains an observation (O) by consistent hypotheses (h). Hypotheses are selected from a set of hypotheses (H). As a result, when an observation is given, it at first tries to explain the observation with only facts
1 In the following examples, names beginning with a capital letter are variables.

(F). If it fails, it selects a subset of hypotheses (h) from H and tries to explain the observation with facts and a consistent set of hypotheses.

F ⊬ O    (O cannot be explained by F alone)
F ∪ h ⊢ O    (O can be explained by F and h)    (11.1)
F ∪ h ⊬ □    (F and h are consistent),    (11.2)

where F is a fact, h is a hypothesis that is a subset of H (h ⊆ H), O is the observation to be explained, and □ is the empty clause.
For example, when Theorist is used for an LSI circuit design, F includes knowledge about the devices' functions and their connections, and the knowledge of other rules. In addition, H includes candidate devices and their candidate connections. If the relation between input and output of the circuit is given as an observation O, Theorist computes the names of devices and their connections as hypotheses h. Theorist cannot generate new hypotheses; it only adopts consistent hypotheses (usually atom-type clauses) from the hypotheses set.
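To make the generate-and-test cycle concrete, here is a propositional caricature in Python (my sketch, not the actual Theorist system): facts are Horn rules, hypotheses are assumable atoms, and consistency checking is simplified to a list of "nogood" sets.

from itertools import combinations

rules = [({"a", "h1"}, "o"), ({"h2"}, "o")]  # body -> head
facts = {"a"}
hypotheses = ["h1", "h2"]
nogoods = [{"a", "h2"}]  # sets of atoms that may not all hold

def closure(atoms):
    # Forward-chain the Horn rules to a fixpoint.
    atoms, changed = set(atoms), True
    while changed:
        changed = False
        for body, head in rules:
            if body <= atoms and head not in atoms:
                atoms.add(head)
                changed = True
    return atoms

def explain(obs):
    # Try smaller hypothesis sets first (a simplicity criterion).
    for k in range(len(hypotheses) + 1):
        for h in combinations(hypotheses, k):
            derived = closure(facts | set(h))
            consistent = not any(ng <= derived for ng in nogoods)
            if obs in derived and consistent:
                return set(h)

print(explain("o"))  # {'h1'}: F alone fails, and {a, h2} is a nogood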

CMS. Another popular abduction system is the CMS (Reiter and de Kleer, 1987). The CMS inference mechanism is as follows. When

Σ ⊬ C    (C cannot be explained by Σ alone)

i.e., if a propositional clause C (observation) is given that cannot be explained by the clause set Σ alone, the CMS returns a set of minimal clauses S such that

Σ ⊨ S ∨ C

A clause S is called a minimal support clause, and ¬S is a clause missing from the clause set Σ that can explain C. Therefore, ¬S can be thought of as an abductive hypothesis according to the abductive point of view. This hypothesis is not included in the knowledge base (Σ ⊭ S). Therefore, there is no justification for the hypothesis except that it provides a minimal completion for the abductive puzzle.

Abductive Analogical Reasoning (AAR). In both popular abduction systems I have mentioned, the criterion for accepting hypotheses is a simplicity criterion, i.e. "Occam's razor". However, in applications like natural language processing, using a simplicity criterion is insufficient. Such a criterion considers neither the meaning of hypotheses nor their relationships. Therefore, problems will occur when selecting the best hypotheses. (Ng and Mooney, 1990) have shown that some notion of explanatory coherence is more important in deciding which explanation is the best, and introduced coherence metrics in order to select better explanations.
Independently of Ng and Mooney's work, (Thagard, 1989) has proposed a notion of explanatory coherence. He explained seven principles of explanatory coherence. Intuitively, explanatory coherence is interpreted as a plausibility of explanations. However, Ng and Mooney on the one hand and Thagard on the other use this word quite

differently. Ng and Mooney focus on the connection between an observation and explanations, while Thagard does not seem to focus on the relationship between an observation and explanations. In turn, he seems to focus on the coherence of the analogical relationship between explanations or between observations.
I have adopted the notion of explanatory coherence as analogical mapping and proposed Abductive Analogical Reasoning (AAR) (Abe, 1998). My characterization of explanatory coherence is similar to Ng and Mooney's. However, the explicit connectedness between an observation and explanations does not play a significant role in my framework. Instead, the analogical relationship between similar inferences plays a significant role. That is, if a previous successful abduction result from a similar observation is known, a result mapped from it is more coherent than that abduced from an observation with a simplicity criterion.
I will give a brief introduction to the AAR inference mechanism. First, I will define some notations.

Definition 11.1 (Analogical clause) In general, "A is an analogical clause of B" means that B is analogically mapped from A. Usually, the relation is a high-order relation like structure-mapping (Gentner, 1983; Gentner, 1988). However, in (Abe, 1998), I simplify analogical mapping. Let A be a ∨ b ∨ ... ∨ l. An analogical clause of A (= B) is a clause that has at least one atom that is similar to an atom of A, while the other atoms are the same as those of A. For example, if a is similar to a′, then B is a′ ∨ b ∨ ... ∨ l. The similarity between words is calculated by the concept base (Kasahara et al., 1996).
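Since Definition 11.1 reduces analogical mapping to a single-atom substitution, it is easy to sketch; in the Python fragment below (mine), the similarity table is an invented stand-in for the concept base lookup:

# A clause is a list of literals; "~" marks negation.
similar = {"horse": "donkey"}  # stand-in for the concept base

def analogical_clauses(clause):
    # All clauses obtained by replacing one atom by a similar atom.
    result = []
    for i, literal in enumerate(clause):
        atom = literal.lstrip("~")
        if atom in similar:
            sign = "~" if literal.startswith("~") else ""
            result.append(clause[:i] + [sign + similar[atom]]
                          + clause[i + 1:])
    return result

print(analogical_clauses(["~horse", "~saddle", "ride"]))
# [['~donkey', '~saddle', 'ride']]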

Definition 11.2 Let A and B be sets of clauses. "A ⊨∼ B′" is the relationship between A and B′, iff there exists a set of clauses B′ such that:
(i) A ⊨ B,
(ii) B ≠ B′, and
(iii) every clause in B′ is either included in B, derivable from B, an analogical clause of one included in B, or an analogical clause of one derivable from B.

Definition 11.3 Let A be a set of clauses. "A |∼ A′" is the relationship between A and A′, such that A′ is a set of clauses that can derive analogical clauses of the clauses derived from A. Here, "|∼" means "can derive clauses with analogical mapping."

From the above definitions, the following relation can be obtained.

Let Σ, A and A′ be sets of clauses. When Σ ⊨ A, if A |∼ A′, then Σ ⊨∼ A′.
The AAR inference mechanism is as follows. AAR explains an observation, C, by satisfying the following formulae. If an observation C is given, AAR tries to explain it using clauses in the knowledge base Σ. When

Σ ⊬ C    (C cannot be explained by Σ alone)

it means that Σ lacks the clauses to explain C. Then, AAR returns a set of minimal clauses S such that

Σ ⊨ S ∨ C

Clauses in Σ                      Results from Concept Base

¬east ∨ palace.                   horse is similar to donkey
¬ride ∨ east.
¬palace ∨ beautiful.
¬desert ∨ donkey.
¬horse ∨ ¬saddle ∨ ride.
¬town ∨ car.
desert.
saddle.

Table 11.1 Example knowledge base.

S is the clause that is necessary to explain C. However, since Σ ⊭ S, the justification of S is not guaranteed. As a result, it generates clauses S′ such that

Σ ⊨∼ S′ ∨ C    (S |∼ S′)

and generates clauses S″ such that

Σ ⊨ S″ ∨ C    (S′ |∼ S″)
Σ ⊭ S″

C is then explained by ¬S″ as hypotheses. S″ is logically equivalent² to, or identical to, S, because of the reverse mapping.
AAR generates clauses that are necessary for explaining an observation, transforms these clauses into plausible hypotheses for explaining the observation by referring to clauses in the knowledge base, and then explains the observation by those hypotheses. For example, if the knowledge base contains the clauses shown in Table 11.1, and the user gives a query (an observation) "palace³", the set of the minimal support clauses (S) is {donkey ∧ ¬palace, donkey ∧ ¬east, donkey ∧ ¬ride, donkey ∧ ¬horse}. Some negations of these clauses seem to be necessary for explaining the observation. Furthermore, since "donkey" is similar to⁴ "horse," by referring to "¬horse ∨ ¬saddle ∨ ride" in the knowledge base, and using analogical mapping from {¬(donkey ∧ ¬ride), saddle}, a hypothesis {¬donkey ∨ ¬saddle ∨ ride, saddle} is returned from AAR.

2 In this chapter, "logically equivalent" means that both results from the clauses are the same. For example, if A is ¬a ∨ c and B is ¬a ∨ ¬b ∨ c together with b, then A is logically equivalent to B.
3 In fact, this query is given in the form of ¬donkey ∧ palace, because of a limitation of the CMS. One solution to this problem is found in (Abe, 1997).
4 A possible analogical target can be easily found from the inference path. For details, see (Abe, 1998).

"-.donkey V ride" is a minimal hypothesis, or a 'short-cut' hypothesis. On the other


hand, {-.donkey V •saddle V ride, saddle } are modified by referring to {-.horse V
-.saddle V ride, saddle } in the knowledge base; therefore, this is a more plausible
hypothesis than "•donkey V ride." This is because it is generated by referring to the
clause in the knowledge base.

11.2.3 Induction in artificial intelligence


Various Inductive logic programming (ILP) systems have been proposed over the past
ten years (CIGOL (Muggleton and Buntine, 1988), GOLEM (Muggleton and Feng,
1990), FOIL (Quinlan, 1990), Progol (Muggleton, 1995), etc.). ILP is now one of
the most popular research areas in the realm of induction. ILP systems may receive
various kinds of information about the desired program as input, but this input always
includes examples of the program's input/output behavior. The output that is produced
is a logic program that behaves as expected on the given examples, or at least on a high
percentage of them (Bergadano and Gunetti, 1996). ILP's basic mechanism is:
E =E+ +E-

(11.3)

(11.4)

where B is background knowledge and h represents hypotheses. E denotes examples,


and E can be separated into£+ (positive examples) and E- (negative examples). 0 is
an empty clause.
If a setH (hypotheses space: possible program), a set£+, a set E-, and consistent
logic program B are given such that

B If e+ , for at least one e+ E £+

then ILP finds a logic program h E H, such that B and h are complete (11.3) and
consistent (11.4).
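For ground, propositional data the completeness condition (11.3) and the consistency condition (11.4) can be checked directly. The sketch below (mine; the bird/penguin data are invented) tests a candidate hypothesis h against E⁺ and E⁻ using a simple forward-chaining derivability check:

def derives(clauses, atom, description):
    # Forward-chain Horn clauses (body, head) from the description.
    known, changed = set(description), True
    while changed:
        changed = False
        for body, head in clauses:
            if set(body) <= known and head not in known:
                known.add(head)
                changed = True
    return atom in known

B = [(["bird"], "has_wings")]   # background knowledge
h = [(["has_wings"], "flies")]  # candidate hypothesis
E_pos = [("flies", ["bird"])]   # (example, its description)
E_neg = [("flies", ["penguin"])]

complete = all(derives(B + h, e, d) for e, d in E_pos)        # (11.3)
consistent = not any(derives(B + h, e, d) for e, d in E_neg)  # (11.4)
print(complete and consistent)  # True: h covers E+ and no E-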

11.2.4 The relation between abduction and induction


I previously gave some features of abductive inferences and inductive inferences. If an observation is given, both inferences produce so-called hypotheses. In fact, the formulae of both types of inferences are superficially similar. In particular, if E⁻ in the inductive inference is an empty set, i.e., there are positive observations only, formulae (11.1) and (11.2) are quite similar to formulae (11.3) and (11.4).
For the example abduction behaviour, since abductive hypotheses are usually concrete or example types, h in formulae (11.1) and (11.2) corresponds to E⁺ in formula (11.3). However, for the inductive program generation example, since inductive hypotheses are usually generalized rules, h in formulae (11.3) and (11.4) corresponds to F in formulae (11.1) and (11.2). In addition, B plays the same role as F.
In this sense, abduction can generate propositional examples that can be used in
induction in a predictive way, and induction can generate generalized rules that can

Figure 11.1 Relation between abductive hypotheses and inductive hypotheses.

Figure 11.2 Integration of abduction and induction.

be used in abduction as facts (background knowledge). Therefore, if abduction is performed more than once and various abduced hypotheses are found under the same or similar observations, and if induction is then performed using these abduced hypotheses as examples, abduction and induction can be integrated. Thus, abduction generates examples for induction and induction generates facts for abduction (Figure 11.2).

Related work. As Mooney writes in his contribution to this volume, in his method abduction works as knowledge base refinement, and inductive learning provides the acquisition of abductive theories. The second part of his method is the same as presented here. In the first half, abduction is expected to generate positive examples to refine the knowledge base. Therefore, abductive hypotheses in his system correspond to inductive background knowledge.
Other related work is by (Kanai and Kunifuji, 1997). When background knowledge for induction is not sufficient in their method, abduction produces sufficient background knowledge. They regard abductive hypotheses as inductive background knowledge. Their treatment of abductive hypotheses is also slightly different from mine, because here abductive hypotheses correspond to inductive examples.

11.3 ABOUT THE INTEGRATION OF ABDUCTION AND INDUCTION


11.3. 1 Problems in integrating abduction and induction
In the previous section, I showed the relationship between abduction and induction, and the difference between abductive hypotheses and inductive hypotheses. By considering the formalization of abduction as prediction and induction as generalization, it is natural to generate propositional examples for induction by abduction, and then use induction to generate predicate rules from those abduced examples, previously existing examples, and background knowledge.
However, there are some problems. For example, hypotheses generated by abduc-
tion, like hypothetical reasoning systems, are thought to be positive examples in typi-
cal applications. It is very rare for hypotheses to be generated when observations are
negative. Therefore, the induction process must be done with only positive examples
in a simple integration. However, learning rules from only positive examples is harder
than learning them from positive and negative examples. Therefore, some negative
examples may be generated by abduction to perform good induction. To overcome
this problem, near-miss examples5 are often intentionally added to an example set as
negative examples.
For example, AILP (Ade and Denecker, 1995), a system for integrating induction
and abduction, adopts abduction to resolve a positive goal, treats a negated positive
goal as a negative goal, and then generates negative examples. However, it seems
to be slightly unnatural to explain an intentionally negated observation to generate
negative examples. Esposito introduced a method for generating negative examples
from integrity constraints (Esposito et al., 1996). It seems to work well; however,
in hypothetical reasoning, such no-good clauses usually do not explicitly appear as a
result of inference.
How, then, can we properly perform induction without intentionally generating
negative examples?

11.3.2 Collecting examples by AAR


I have adopted the notion of explanatory coherence as analogical mapping and pro-
posed Abductive Analogical Reasoning (AAR). In AAR, analogical mapping guaran-
tees explanatory coherence among hypotheses. It works as a constraint or a reference
to generate plausible hypotheses. Since it transforms a logically correct and minimal
clause to a logically correct and plausible clause by referring to known clauses, ex-
planatory coherence is a criterion for generating plausible abductive hypotheses. Fur-
thermore, hypotheses generated by analogical mapping from a similar source clause
are classified into the same category. If AAR generates hypotheses from similar ob-
servations, hypotheses collected by AAR can become good examples for induction.
However, they are all positive examples.
Furukawa and Shimazu introduced an algorithm to generate negative examples for Progol (Furukawa and Shimazu, 1996). In their algorithm, examples in another class are considered to be negative with respect to a certain class whose examples are considered to be positive. Thus, good negative examples are selected that are near-miss examples of positive examples. In some applications like CAD, it seems to work well. However,

5 A near-miss example is a negative example that differs from a positive example in only a few elements. To illustrate, if a positive example is ([v11, v12], [v21, v22], ...), then one near-miss example is ([v91, v92], [v21, v22], ...).

in general, it is rather excessive to divide negative examples and positive ones by such a classification.
Therefore, in this chapter, I focus upon learning from positive-only examples. One of the restrictions on learning rules from positive-only examples is the Subset Principle (Angluin, 1980; Berwick, 1986), which is a necessary condition for positive-only learning. Let L_i be an indexed family of non-empty languages. The necessary and sufficient condition for the Subset Principle is:

(i) T_i is finite, where T_i is a set of strings,

(ii) T_i ⊆ L_i, and

(iii) for all j > i, if T_i ⊆ L_j then L_j is not a proper subset of L_i,

i.e., for all i, hypothesis(i + 1) is not a proper subset of hypothesis(i).


For abduced hypotheses, the following relation can be obtained from experience.

In general, a hypothesis abduced from a certain observation can never be a proper subset or superset of that abduced from a similar observation.

In hypothetical reasoning, the above relation can be shown as follows: if a hypothesis (¬S) generated from a certain observation (C) is a subset of that (¬S′) generated from a similar observation (C′), the certain observation must be a subset of the similar observation. Let ¬S be a subset of ¬S′, i.e., ¬S′ = ¬S ∨ a, where a ≠ ∅ and a ∩ ¬S = ∅; then the following formulae hold.

Σ ⊨ S ∨ C
Σ ⊨ S′ ∨ C′ = ¬(¬S ∨ a) ∨ C′

Then,

C = C′ ∧ ¬a    (11.5)

Thus, if a hypothesis generated from a certain observation is a subset of that gener-


ated from a similar observation, then the certain observation is a subset of the similar
observation. Similarly, if a certain observation is a superset of a similar observation, a
hypothesis generated from the certain observation is a superset of that generated from
the similar observation.
Furthermore, analogical reasoning can be formalized under the Theorist framework as follows (Goebel, 1989).

S ∪ T ⊭ G
S ∪ T ∪ M ⊨ G
S ∪ T ∪ M ⊭ ¬m    (m ∈ M)

where S is source knowledge, T is target knowledge, M is an equality assumption, and G is an analogical inference (axiom).

If T is a subset of S, i.e., S = T ∪ A, where A ≠ ∅ and A ∩ T = ∅, then these formulae become as follows.

T ∪ A ⊭ G
T ∪ A ∪ M ⊨ G
T ∪ A ∪ M ⊭ ¬m    (m ∈ M)

Regarding these formulae, A = S = T ∪ A. So, the assumption "T is a subset of S" is wrong. Therefore, T ⊄ S. Similarly, S ⊄ T. As a result, from (11.5), a hypothesis generated from a certain observation is neither a subset nor a superset of that generated from a similar observation. Therefore, hypotheses generated by analogical mapping from the same source clause can meet the Subset Principle.
If hypotheses are abduced by AAR under similar observations and rules are induced
from these abduced hypotheses as positive examples, abduction and induction can be
integrated. The key to the integration of abduction and induction is analogy or analog-
ical mapping. Analogical mapping from clauses in a knowledge base can guarantee
explanatory coherence among hypotheses. This guarantees the generation of plausible
hypotheses. Furthermore, hypotheses generated under similar situations are good pos-
itive examples, so induction of plausible rules with only positive examples can also be
achieved.

11.4 CONCLUSION
Abductive hypotheses and inductive hypotheses are sometimes treated as if they were
the same type of hypotheses. Indeed, the forms of the formulae for abduction and
induction are similar. However, they are not the same from their roles and behaviour.
I have shown the relationship between abduction and induction. The relationship be-
tween abductive hypotheses and inductive hypotheses is shown in Figure 11.1. It gives
a clue to the integration of abduction and induction. By considering this relationship,
my solution to their integration is

1) to generate propositional hypotheses by abduction under similar observations, and

2) to generate predicate rules by induction from abduced hypotheses, and then put them into the knowledge base as facts.

However, there are some restrictions to generating a predicate rule by induction.


Since the usual abduction procedure generates only positive examples, the induction
procedure must generate rules from only positive examples. In general, it is somewhat
hard to learn rules from only positive examples. However, if examples satisfy the
Subset Principle, new rules can be generated from only positive examples.
The solution to this problem is to generate hypotheses under similar observations.
As shown in Section 11.3, if AAR generates hypotheses under similar observations,
the generated hypotheses can satisfy the Subset Principle, and inductive hypotheses
(rules) can be generated from only positive examples. Furthermore, even though the

mapping function is presently very simple, AAR can generate plausible hypotheses by
analogical mapping from clauses in the knowledge base. Therefore, induced rules are
also plausible ones.
Arima has shown that induction and analogy have a common form of preduc-
tion (Arima, 1997). He has described empirical inductive reasoning as "preduction
and mathematical induction" and analogical reasoning as "preduction and deduction."
From this viewpoint, analogical reasoning and inductive reasoning come from the
same root and are rather similar. Furthermore, I think analogical mapping works as an ideal form of inductive reasoning. I believe this is so because, while mathematical induction is done without regard to the relation between examples, the ideal collection of inductive examples can be done by analogical mapping. As such, analogical mapping will work as a subtask of inductive reasoning.
Dimopoulos and Kakas have suggested that abduction can help exploit high-level
background theory available for learning and help handle possible incompleteness in
the background theory (Dimopoulos and Kakas, 1996b). Their concept for integra-
tion of abduction and induction is similar to the integration presented in this chapter.
However, it uses negative observations to eliminate inaccurate theories. Instead of
generating negative examples, it refines existing rules. Despite its use of negative
observations, it seems to be another way of learning with only positive examples.
12 INTEGRATING ABDUCTION
AND INDUCTION IN MACHINE
LEARNING
Raymond J. Mooney

12.1 INTRODUCTION
Abduction is the process of inferring cause from effect or constructing explanations
for observed events and is central to tasks such as diagnosis and plan recognition.
Induction is the process of inferring general rules from specific data and is the primary
task of machine learning. An important issue is how these two reasoning processes can
be integrated, or how abduction can aid machine learning and how machine learning
can acquire abductive theories. The machine learning research group at the University
of Texas at Austin has explored these issues in the development of several machine
learning systems over the last ten years. In particular, we have developed methods for
using abduction to identify faults and suggest repairs for theory refinement (the task of
revising a knowledge base to fit empirical data), and for inducing knowledge bases for
abductive diagnosis from a database of expert-diagnosed cases. We treat induction and
abduction as two distinct reasoning tasks, but have demonstrated that each can be of
direct service to the other in developing AI systems for solving real-world problems.
This chapter reviews our work in these areas, focusing on the issue of how abduction
and induction are integrated.1
Recent research in machine learning and abductive reasoning has been character-
ized by different methodologies. Machine learning research has emphasized experi-

1Additional details are available in our publications listed in the bibliography, most of which are available
in postscript on the World Wide Web at http://www.cs.utexas.edu/users/ml.


mental evaluation on actual data for realistic problems. Performance is evaluated by


training a system on a set of classified examples and measuring its accuracy at predict-
ing the classification of novel test examples. For instance, a classified example can be
a set of symptoms paired with a diagnosis provided by an expert. A variety of data
sets on problems ranging from character recognition and speech synthesis to medical
diagnosis and genetic sequence detection have been assembled and made available in
electronic form at the University of California at Irvine. 2 Experimental comparisons
of various algorithms on these data sets have been used to demonstrate the advan-
tages of new approaches and analyze the relative performance of different methods on
different kinds of problems.
On the other hand, recent research on abductive reasoning has emphasized philo-
sophical discussion on the nature of abduction and the development and theoretical
analysis of various logical and probabilistic formalisms. The philosophical discussions
have focussed on the relation between deduction, abduction, induction, and probabilis-
tic inference. Logicists have developed various models of abductive inference based
on reverse deduction, i.e. the formation of assumptions that entail a set of observations.
Probabilists have developed various models based on Bayesian inference. A number
of interesting formalisms have been proposed and analyzed; however, there has been
relatively little experimental evaluation of the methods on real-world problems.
Our research adopts the standard methodology of machine learning to evaluate
techniques for integrating traditional abductive and inductive methods. We have pro-
duced more effective machine learning systems, and the advantages of these systems
have been demonstrated on real applications such as DNA sequence identification
and medical diagnosis. We believe that such experimental evaluation is important in
demonstrating the utility of research in the area and in allowing the exploration and
analysis of the strength and weaknesses of different approaches.
The remainder of the chapter is organized as follows. Section 12.2 presents defi-
nitions of abduction and induction that we will assume for most of the chapter. Sec-
tion 12.3 reviews our work on using abductive inference to aid theory refinement. Sec-
tion 12.4 reviews our work on the induction of abductive knowledge bases. Finally,
Section 12.5 presents some overall conclusions.

12.2 ABDUCTION AND INDUCTION


Precise definitions for abduction and induction are still somewhat controversial. In
order to be concrete, I will generally assume that abduction and induction are both
defined in the following general logical manner.

• Given: Background knowledge, B, and observations (data), O, both represented as sets of formulae in first-order predicate calculus, where O is restricted to ground formulae.

• Find: An hypothesis H (also a set of logical formulae) such that B ∪ H ⊬ ⊥ and B ∪ H ⊢ O.

2http://www.ics.uci.edu/~mlearn/MLRepository.html



In abduction, H is generally restricted to a set of atomic ground or existentially quantified formulae (called assumptions) and B is generally quite large relative to H. On the other hand, in induction, H generally consists of universally quantified Horn clauses (called a theory or knowledge base), and B is relatively small and may even be empty.
In both cases, following Occam's Razor, it is preferred that H be kept as small and
simple as possible.
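To see the syntactic contrast in one toy instance (my own illustration, not an example from the cited literature): with background B = {shoes_wet(X) :- walked_on_grass(X), grass_wet} and observation O = {shoes_wet(fred), walked_on_grass(fred)}, an abductive hypothesis is the single ground assumption grass_wet, while an inductive hypothesis (with little or no background) would be a universally quantified clause such as shoes_wet(X) :- walked_on_grass(X).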
Despite their limitations, these formal definitions encompass a significant fraction
of the existing research on abduction and induction, and the syntactic constraints on H
capture at least some of the intuitive distinctions between the two reasoning methods.
In abduction, the hypothesis is a specific set of assumptions that explain the observa-
tions of a particular case; while in induction, the hypothesis is a general theory that
explains the observations across a number of cases. The body of logical work on ab-
duction, e.g. (Pople, Jr., 1973; Poole et al., 1987; Levesque, 1989; Ng and Mooney,
1991; Ng and Mooney, 1992; Kakas et al., 1992), generally fits this definition of ab-
duction and several diagnostic models (Reiter, 1987; Peng and Reggia, 1990) can be
shown to be equivalent or a special case of it (Poole, 1989b; Ng, 1992). The work
on inductive logic programming (ILP) (Muggleton, 1992; Lavrac and Dzeroski, 1994)
employs this definition of induction, and most machine learning work on induction
can also be seen as fitting this paradigm (Michalski, 1983b). In addition, most algo-
rithms and implemented systems for logical abduction or induction explicitly assume
a representation of hypotheses that is consistent with these restrictions and are tailored
to be computationally efficient for problems satisfying these assumptions.
The intent of the current chapter is not to debate the philosophical advantages and
disadvantages of these definitions of induction and abduction; I believe this debate
eventually becomes just a question of terminology. Given their acceptance by a fairly
large body of researchers in both areas, a range of specific algorithms and systems
have been developed for performing abductive and inductive reasoning as prescribed
by these definitions. The claim of the current chapter is that these existing methods
can be fruitfully integrated to develop machine learning systems whose effectiveness
has been experimentally demonstrated in several realistic applications.

12.3 ABDUCTION IN THEORY REFINEMENT


12.3. 1 Definition of theory refinement
Theory refinement (theory revision, knowledge-base refinement) is the machine learn-
ing task of modifying an existing imperfect domain theory to make it consistent with
a set of data. For logical theories, it can be more precisely defined as follows:
• Given: An initial theory, T, a set of positive examples, P, and a set of negative examples, N, where P and N are restricted to ground formulae.
• Find: A "minimally revised" consistent theory T' such that ∀p ∈ P : T' ⊢ p and ∀n ∈ N : T' ⊬ n.
Generally, examples are ground Horn clauses of the form C :- B1, ..., Bn, where the
body, B, gives a description of a case and the head, C, gives a conclusion or classifica-
tion that should logically follow from this description (or should not follow in the case

of a negative example). Revising a logical theory may require both adding and remov-
ing clauses as well as adding or removing literals from existing clauses. Generally, the
ideal goal is to make the minimal syntactic change to the existing theory according to
some measure of edit distance between theories that measures the number of literal ad-
ditions and deletions that are required to transform one theory into another (Wogulis
and Pazzani, 1993; Mooney, 1995b). Unfortunately, this task is computationally in-
tractable; therefore, in practice, heuristic search methods must be used to approximate
minimal syntactic change. Note that compared to the use of background knowledge in
induction, theory refinement requires modifying the existing background knowledge
rather than just adding clauses to it. Experimental results in a number of realistic
applications have demonstrated that revising an existing imperfect knowledge base
provided by an expert results in more accurate results than inducing a knowledge base
from scratch (Ourston and Mooney, 1994; Towell and Shavlik, 1993).

12.3.2 Theory refinement algorithms and systems


Several theory refinement systems use abduction on individual examples to locate
faults in a theory and suggest repairs (Ourston and Mooney, 1990; Ourston, 1991;
Ourston and Mooney, 1994; Wogulis and Pazzani, 1993; Wogulis, 1994; Baffes and
Mooney, 1993; Baffes, 1994; Baffes and Mooney, 1996; Brunk, 1996). The ways
in which various forms of logical abduction can be used in revising theories are also
discussed and reviewed by (Dimopoulos and Kakas, 1996b); however, they do not
discuss using abduction to generalize existing clauses by deleting literals (removing
antecedents). Different theory-refinement systems use abduction in slightly different
ways, but the following discussion summarizes the basic approach. For each individ-
ual positive example that is not derivable from the current theory, abduction is applied
to determine a set of assumptions that would allow it to be proven. These assumptions
can then be used to make suggestions for modifying the theory. One potential repair is
to learn a new rule for the assumed proposition so that it could be inferred from other
known facts about the example. Another potential repair is to remove the assumed
proposition from the list of antecedents of the rule in which it appears in the abductive
explanation of the example. For example, consider the theory

P(X) :- R(X), Q(X).
Q(X) :- S(X), T(X).

and the unprovable positive example

P(a) :- R(a), S(a), V(a).

Abduction would find that the assumption T(a) makes this positive example provable. Therefore, two possible revisions to the theory are to remove the literal T(X) from the second clause in the theory, or to learn a new clause for T(X), such as
T(X) :- V(X).

Another possible abductive assumption is Q(a), suggesting the possible revisions of removing Q(X) from the first clause or learning a new clause for Q(X) such as

Q(X) :- V(X).

or

Q(X) :- S(X), V(X).

In order to find a small set of repairs that allow all of the positive examples to be
proven, a greedy set-covering algorithm can be used to select a small subset of the
union of repair points suggested by the abductive explanations of individual positive
examples, such that the resulting subset covers all of the positive examples. If sim-
ply deleting literals from a clause causes negative examples to be covered, inductive
methods (e.g. ILP techniques like FOIL (Quinlan, 1990)) can be used to learn a new
clause that is consistent with the negative examples. Continuing the example, assume
the positive examples are
P(a) :- R(a), S(a), V(a), W(a).
P(b) :- R(b), V(b), W(b).

and the negative examples are


P(c) :- R(c), S(c).
P(d) :- R(d), W(d).

The abductive assumptions Q(a) and Q(b) are generated for the first and second positive examples respectively. Therefore, making a repair to the Q predicate would cover both cases. Note that the previously mentioned potential repairs to T would not cover the second example since the abductive assumption T(b) is not sufficient (both T(b) and S(b) must be assumed). Since a repair to the single predicate Q covers both positive examples, it is chosen. However, deleting the antecedent Q(X) from the first clause of the original theory would allow both of the negative examples to be proven.
Therefore, a new clause for Q is needed. Positive examples for Q are the required abductive assumptions Q(a) and Q(b). Negative examples are Q(c) and Q(d) since these assumptions would allow the negative examples to be derived. Given the descriptions provided for a, b, c, and d in the examples, an ILP system such as FOIL would induce the new clause

Q(X) :- V(X).

since this is the simplest clause that covers both of the positive examples without covering either of the negatives. Note that although the alternative, equally-simple clause

Q(X) :- W(X).

covers both positive examples, it also covers the negative example Q(d).
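The abductive step of this running example can be reproduced with a minimal sketch (my own illustration; EITHER, NEITHER and the other systems cited below use considerably more elaborate procedures). The theory is encoded as rule/2 facts, abducible/1 declares which atoms may be assumed, and abduce/3 collects the assumptions needed to prove the head of an example from its body literals.

% The theory of the example.
rule(p(X), [r(X), q(X)]).
rule(q(X), [s(X), t(X)]).

% Atoms that may be assumed.
abducible(q(_)).
abducible(t(_)).

% abduce(Goal, Facts, A): Goal is provable from the ground literals
% Facts (the body of the example) and the rules, given assumptions A.
abduce(Goal, Facts, A) :- prove([Goal], Facts, [], A).

prove([], _, A, A).
prove([G|Gs], Facts, A0, A) :-
    member(G, Facts), !,          % literal given in the example body
    prove(Gs, Facts, A0, A).
prove([G|Gs], Facts, A0, A) :-
    rule(G, Body),                % resolve against a theory clause
    append(Body, Gs, Gs1),
    prove(Gs1, Facts, A0, A).
prove([G|Gs], Facts, A0, A) :-
    abducible(G),                 % otherwise assume G, if allowed
    \+ member(G, A0),
    prove(Gs, Facts, [G|A0], A).

The query abduce(p(a), [r(a),s(a),v(a)], A) then returns A = [t(a)] and A = [q(a)], the two assumptions discussed above.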
A general outline of the basic procedure for using abduction for theory refinement is
given in Figure 12.1. The selection of an appropriate subset of assumption sets (repair
points) is generally performed using some form of greedy set-covering algorithm in
order to limit search. Selection of an appropriate assumption set may be based on an

For each unprovable positive example, i, do
    Abduce alternative sets of assumptions Ai1, Ai2, ..., Aini that
    allow example i to be proven.
Select a subset, S, of the resulting assumption sets (the Aij's) such that
their union allows all of the positive examples to be proven.
For each assumption set Aij ∈ S do
    If deleting the literals in the theory indicated by Aij causes
    negative examples to be proven
    then Induce a new consistent clause to cover the
        examples made provable by Aij.
    else Delete the literals indicated by Aij.

Figure 12.1 General theory refinement algorithm with abduction.

estimate of the complexity of the resulting repair as well as the number of positive
examples that it covers. For example, the more negative examples that are generated
when the literals corresponding to an assumption set are deleted, the more complex
the resulting repair is likely to be.
The EITHER (Ourston and Mooney, 1990; Ourston and Mooney, 1994; Ourston,
1991) and NEITHER (Baffes and Mooney, 1993; Baffes, 1994) theory refinement sys-
tems allow multiple assumptions in order to prove an example, preferring more spe-
cific assumptions, i.e. they employ most-specific abduction (Cox and Pietrzykowski,
1987). AUDREY (Wogulis, 1991), AUDREY II (Wogulis and Pazzani, 1993), A3
(Wogulis, 1994), and CLARUS (Brunk, 1996) are a series of theory refinement systems
that make a single-fault assumption during abduction. For each positive example, they
find a single most-specific assumption that makes the example provable. Different
constraints on abduction may result in different repairs being chosen, affecting the
level of specificity at which the theory is refined. EITHER and NEITHER strongly pre-
fer making changes to the more specific aspects of the theory rather than modifying
the top-level rules.
It should be noted that abduction is primarily useful in generalizing a theory to
cover more positive examples rather than specializing it to uncover negative examples.
A separate procedure is generally needed to determine how to appropriately specialize
a theory. However, if a theory employs negation as failure, abduction can also be used
to determine appropriate specializations (Wogulis, 1993; Wogulis, 1994).
It should also be noted that a related approach to combining abduction and in-
duction is useful in learning definitions of newly invented predicates. In particular,
several ILP methods for inventing predicates use abduction to infer training sets for
an invented predicate and then invoke induction recursively on the abduced data to
learn a definition for the new predicate (Wirth and O'Rorke, 1991; Kijsirikul et al.,
1992; Zelle and Mooney, 1994; Stahl, 1996; Flener, 1997) . This technique is ba-
sically the same as using abduced data to learn new rules for existing predicates in
theory refinement as described above.

A final interesting point is that the same approach to using abduction to guide re-
finement can also be applied to probabilistic domain theories. We have developed a
system, BANNER (Ramachandran and Mooney, 1998; Ramachandran, 1998) for re-
vising Bayesian networks that uses probabilistic abductive reasoning to isolate faults
and suggest repairs. Bayesian networks are particularly appropriate for this approach
since the standard inference procedures support both causal (predictive) and abductive
(evidential) inference (Pearl, 1988b). Our technique focuses on revising a Bayesian
network intended for causal inference by adapting it to fit a set of training examples of
correct causal inference. Analogous to the logical approach outlined above, Bayesian
abductive inference on each positive example is used to compute assumptions that
would explain the correct inference and thereby suggest potential modifications to the
existing network. The ability of this general approach to theory revision to employ
probabilistic as well as logical methods of abduction is an interesting indication of its
generality and strength.

12.3.3 Experimental results on theory refinement


The general approach of using abduction to suggest theory repairs has proven quite
successful at revising several real-world knowledge bases. The systems referenced
above have significantly improved the accuracy of knowledge bases for detecting
special DNA sequences called promoters (Ourston and Mooney, 1994; Baffes and
Mooney, 1993), diagnosing diseased soybean plants (Ourston and Mooney, 1994), and
determining when repayment is due on a student loan (Brunk, 1996). The approach
has also been successfully employed to construct rule-based models of student knowl-
edge for over 50 students using an intelligent tutoring system for teaching concepts
in C++ programming (Baffes, 1994; Baffes and Mooney, 1996). In this application,
theory refinement was used to modify correct knowledge of the domain to account for
errors individual students made on a set of sample test questions. The resulting modifi-
cations to the correct knowledge base were then used to generate tailored instructional
feedback for each student. In all of these cases, experiments with real training and test
data were used to demonstrate that theory revision resulted in improved performance
on novel, independent test data and generated more accurate knowledge than raw in-
duction from the data alone. These results clearly demonstrate the utility of integrating
abduction and induction for theory refinement.
As an example of the sort of experimental results that have been reported, consider
some results obtained on the popular DNA promoter problem. The standard data set
consists of 106 DNA strings with 57 features called nucleotides, each of which can
take on one of four values, A, G, T or C. The target class, promoter, predicts whether
or not the input DNA sequence indicates the start of a new gene. The data is evenly
split between promoters and non-promoters. The initial domain theory was assembled
from information in the biological literature (O'Neill and Chiafari, 1989). Figure 12.2
presents learning curves for this data for several systems. All results are averaged over
25 separate trials with different disjoint training and test sets. Notice that all of the
abduction-based refinement systems improved the accuracy of the initial theory sub-
stantially and outperform a standard decision-tree induction method, C4.5 (Quinlan,
1993), that does not utilize an initial theory.

[Learning-curve plot omitted: test accuracy (roughly 45-95%) versus number of training examples (0-90) for EITHER, NEITHER, C4.5, and BANNER.]

Figure 12.2 Learning curves for DNA promoter recognition.

12.4 INDUCTION OF ABDUCTIVE KNOWLEDGE BASES

12.4.1 Learning for abduction


Another important aspect of integrating abduction and induction is the learning of
abductive theories. Induction of abductive theories can be viewed as a variant of in-
duction where the provability relation (f-) is itself interpreted abductively. In other
words, given the learned theory it must be possible to abductively infer the correct
conclusion for each of the training examples.
We have previously developed a learning system, LAB (Thompson and Mooney,
1994; Thompson, 1993), for inducing an abductive knowledge base appropriate for the
diagnostic reasoning model of parsimonious covering theory (PCT) (Peng and Reggia, 1990). In PCT, a knowledge base consists of a set of disorder → symptom rules that demonstrate how individual disorders cause individual symptoms. Such an abductive knowledge base stands in contrast to the deductive symptoms → disorder
rules used in standard expert systems and learned by traditional machine-learning
methods. Given a set of symptoms for a particular case, the task of abductive di-
agnosis is to find a minimum set of disorders that explains all of the symptoms, i.e. a
minimum covering set.
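A minimal sketch of this diagnostic task (my own illustration over a propositional rule base; the causes/2 facts and all predicate names are hypothetical, and Peng and Reggia's actual algorithms are far more sophisticated) enumerates disorder sets by increasing size and returns the first one that covers all symptoms.

% causes(Disorder, Symptom): one disorder -> symptom rule.
causes(flu,  fever).
causes(flu,  cough).
causes(cold, cough).

% covers(Ds, Symptoms): every symptom is caused by some D in Ds.
covers(Ds, Symptoms) :-
    \+ ( member(S, Symptoms),
         \+ ( member(D, Ds), causes(D, S) ) ).

subset_of([], []).
subset_of([X|Xs], [X|Ys]) :- subset_of(Xs, Ys).
subset_of([_|Xs], Ys) :- subset_of(Xs, Ys).

% min_cover(Symptoms, Ds): a minimum covering set of disorders
% (assumes every symptom has at least one known cause).
min_cover(Symptoms, Ds) :-
    setof(D, S^(member(S, Symptoms), causes(D, S)), Cands),
    length(Cands, Max),
    between(0, Max, N),      % try increasing sizes: first hit is minimal
    length(Ds, N),
    subset_of(Cands, Ds),
    covers(Ds, Symptoms), !.

For instance, min_cover([fever,cough], Ds) returns Ds = [flu] rather than the larger cover [cold,flu].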

12.4.2 LAB algorithm


Given a set of training cases each consisting of a set of symptoms together with their correct diagnosis (set of disorders), LAB attempts to construct an abductive knowledge base such that the correct diagnosis for each training example is a minimum cover. The system uses a fairly straightforward hill-climbing induction algorithm. At each iteration, it adds to the developing knowledge base the individual disorder → symptom rule that maximally increases accuracy of abductive diagnosis over the complete set of training cases. The knowledge base is considered complete when the addition of any new rule fails to increase accuracy on the training data.

Let R = ∅ {initialize rule-base}
Let P be the set of potential rules {d → s | d ∈ Di, s ∈ Si}
Until the accuracy of R on E ceases to improve do
    Let A = 0 {initialize best accuracy}
    For each rule r ∈ P do
        Let R' = R ∪ {r}
        Compute the accuracy, a, of R' on E
        If a > A then let A = a, b = r {update best rule}
    Let R = R ∪ {b} {add best rule to KB}
Return R

Figure 12.3 General LAB algorithm.
An outline of the learning algorithm is given in Figure 12.3. It assumes E is the set of training examples, {E1, ..., En}, where each Ei consists of a set of disorders Di and a set of symptoms Si. An example is diagnosed by finding the minimum covering set of disorders given the current rule-base, R, using the BIPARTITE algorithm of (Peng and Reggia, 1990). If there are multiple minimum covering sets, one is chosen at random as the system diagnosis. To account for the fact that both the correct and system diagnoses may contain multiple disorders, performance is measured by intersection accuracy. If S is the system diagnosis and C the correct diagnosis, the intersection accuracy is:

(|S ∩ C|/|S| + |S ∩ C|/|C|)/2.
The average intersection accuracy across a set of examples is used to evaluate a knowl-
edge base.
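For concreteness, intersection accuracy can be computed as in the following sketch (my own illustration; it assumes diagnoses are non-empty ground lists without duplicates, and uses intersection/3 from the standard lists library).

intersection_accuracy(S, C, Acc) :-
    intersection(S, C, I),        % I = S n C
    length(I, NI),
    length(S, NS),
    length(C, NC),
    Acc is (NI/NS + NI/NC) / 2.

For example, if the system diagnosis is [d1,d2] and the correct diagnosis is [d1,d3], the intersection accuracy is (1/2 + 1/2)/2 = 0.5.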
LAB employs a fairly simple, restricted, propositional model of abduction and a
simple, hill-climbing inductive algorithm. However, using techniques from induc-
tive logic programming (ILP), the basic idea of using induction to acquire abductive
knowledge bases from examples can be generalized to more expressive first-order rep-
resentations. Both (Dimopoulos and Kakas, 1996b) and (Lamma et al., this volume)
present interesting ideas and algorithms on using ILP to learn abductive theories; how-
ever, this approach has yet to be tested on a realistic application. Finally, on-going
research on the induction of Bayesian networks from data (Cooper and Herskovits,
1992; Heckerman, 1995) can be viewed as an alternative approach to learning knowl-
edge that supports abductive inference.

[Learning-curve plot omitted: intersection accuracy versus number of training examples (0-40) for LAB, MULTI-DIAG-ID3, BACKPROP, EXPERT-KB, and MULTI-DIAG-PFOIL.]

Figure 12.4 Learning curves for stroke-damage diagnosis.

12.4.3 Experimental evaluation of LAB


Using real data for diagnosing brain damage due to stroke originally assembled by
(Tuhrim et al., 1991), LAB was shown to produce abductive knowledge bases that were
more accurate than an expert-built abductive rule base, deductive knowledge bases
learned by several standard machine-learning methods, and trained neural networks.
The data consists of 50 patients described by 155 possible symptoms. The possible
disorders consist of 25 different areas of the brain that could be damaged. The fifty
cases have an average of 8.56 symptoms and 1.96 disorders each. In addition, we
obtained the accompanying abductive knowledge base generated by an expert, which
consists of 648 rules.
LAB was compared with a decision-tree learner, ID3 (Quinlan, 1986), a proposi-
tional rule learner, PFOIL (Mooney, 1995a), and a neural network trained using stan-
dard backpropagation (Rumelhart et al., 1986). The neural network had one output
bit per disorder and the number of hidden units was 10% of the number of disorders
plus the number of symptoms. Since ID3 and PFOIL are typically used for predicting
a single category, an interface was built to allow them to handle multiple-disorder di-
agnosis by learning a separate decision tree or rule-base for predicting each disorder.
An example Ei ∈ E is given to the learner as a positive example if the given disorder
is present in Di, otherwise it is given as a negative example.

The resulting learning curves are shown in Figure 12.4. All results are averaged
over 20 separate trials with different disjoint training and test sets. The results demon-
strate that abductive knowledge bases can be induced that are more accurate than man-
ually constructed abductive rules. In addition, for a limited number of training exam-
ples, induced abductive rules are also more accurate than the knowledge induced by
competing machine learning methods.

12.5 CONCLUSIONS
In conclusion, we believe our previous and on-going work on integrating abduction
and induction has effectively demonstrated two important points: 1) Abductive rea-
soning is useful in inductively revising existing knowledge bases to improve their
accuracy; and 2) Inductive learning can be used to acquire accurate abductive theo-
ries. We have developed several machine-learning systems that integrate abduction
and induction in both of these ways and experimentally demonstrated their ability to
successfully aid the construction of AI systems for complex problems in medicine,
molecular biology, and intelligent tutoring. However, our work has only begun to
explore the potential benefits of integrating abductive and inductive reasoning. Fur-
ther explorations into both of these general areas of integration will likely result in
additional important discoveries and successful applications.

Acknowledgments
Many of the ideas reviewed in this chapter were developed in collaboration with Dirk Ourston,
Brad Richards, Paul Baffes, Cindi Thompson, and Sowmya Ramachandran. This research
was partially supported by the National Science Foundation through grants IRI-9102926, IRI-9310819, and IRI-9704943, the Texas Advanced Research Projects program through grant ARP-003658-114, and the NASA Ames Research Center through grant NCC 2-629.
IV The integration of
abduction and induction: a
Logic Programming perspective
13 ABDUCTION AND INDUCTION
COMBINED IN A METALOGIC
FRAMEWORK
Henning Christiansen

13.1 INTRODUCTION
We see abduction and induction as instances within a wide spectrum of reasoning pro-
cesses. They are of special interest because they represent pure and isolated forms.
These, together with deduction, were identified by C.S. Peirce, and central to his philosophy was the claim that these are the fundamental mechanisms of reasoning, as spelled out in more detail by Flach and Kakas in their introductory chapter to this volume.
In this chapter, we show that notions and methods developed for metaprogramming
in logic programming can provide a common framework and computational models
for a wide range of reasoning processes, including combinations of abduction and in-
duction. We show examples developed in an implemented metaprogramming system,
called the DEMO system, whose central component is a reversible implementation of
a proof predicate. Reversibility means that the proof predicate can work with partly
specified object programs, and the implementation may produce object program frag-
ments that make the given query provable. Using this proof predicate, we can give
declarative specifications of the overall consistency relation in a given context, cov-
ering a wide range of computations. When facts of an object program are unknown,
the specified process resembles abduction. When rules are unknown, it resembles in-
duction, but any combination of known and unknown rules and facts can be specified,
thus providing models for this wider spectrum of reasoning processes.


13.1.1 A spectrum of inferential processes


Here we relate briefly the position of our work with respect to Peirce's different the-
ories about abduction and induction reviewed by Flach and Kakas in the introductory
chapter of this volume.
The logic programming community has adopted Peirce's syllogistic theory and de-
fined abduction and induction as two different and more or less orthogonal processes.
Different computational methods have been developed for the two and, as described by
Flach and Kakas, with important practical applications. The algorithms developed for
the two appear to be quite different and current research, some of which is documented
in the present volume, is concerned with possible combinations of such algorithms as
to obtain a higher degree of expressibility. For practical reasons, and for compatibility
with the logic programming literature, we apply the terms abduction and induction
with these "syllogistic" meanings.
In his later, so-called inferential theory, Peirce addressed a more general notion of hypothesis generation which seems analogous to the wider spectrum of reasoning tasks that we address in our approach. Our model concerns problems of extending
a background theory (formalized as a logic program) with new hypotheses so that
given observations become logical consequences of the extended theory. These new
hypotheses can be any kind of logic program clauses, and by means of declarative
side-conditions, our methods can be tuned to produce specific kinds of clauses, e.g.,
consisting of facts only, of rules only, or any combination thereof. As it will appear in
the examples, the specification of reasoning problems in this metalogical framework
tends to be fairly straightforward. In addition, the declarative nature of these specifica-
tions together with a capable interpreter provides a quite effortless interaction between
the different forms of reasoning.
Peirce's inferential theory includes also verification of hypotheses by means of new
experiments in nature. This aspect is more difficult to replicate in a logic programming
context as reality or nature is a somewhat intractable object to model on a computer.
However, the use of integrity constraints seems related to this, and what comes closest
is perhaps the judgments made by a user in the process of refining a query to our
system until an acceptable answer is produced.
There are other systems that combine abduction and induction, some of which are
described in other chapters of this volume by Lamma et al., Inoue and Haneda, and by
Yamamoto, but the declarative and flexible style of specification in the DEMO system
does not seem to have a counterpart in other known systems. The approach has been
made possible by new constraint logic methods that needed to be developed in order
to obtain reversibility in the proof predicate in an efficient way.

13. 1.2 Overview


In section 13.2 we discuss requirements for a framework for models of reasoning with
the generality that we have in mind and we present the main features of the DEMO
system as a suggestion for such a framework. Section 13.3 gives a schematic char-
acterization for different sorts of reasoning in our framework and provides a suite of
examples ranging from the pure forms of abduction and induction to various combi-

nations thereof. This includes a kind of analogical reasoning as well as the derivation
from observations of whole theories of rules and facts. The basic algorithms that im-
plement the DEMO system are based on metaprogramming and constraint logic meth-
ods outlined in section 13.4. We discuss also the similarities and differences between
abduction and induction that become visible in the execution of the algorithms. The
procedural properties that imply the smooth interaction between different reasoning
methods are explained. In the final section 13.5 we give a summary with a discussion
of related work.

13.2 A METALOGIC FRAMEWORK FOR MODELS OF REASONING


A framework for models of reasoning must include a representation of theories and a
metalanguage in which interesting properties about them can be expressed. One such
framework often used in the literature is first-order logic together with metalogic op-
erators such as ⊨ and ⊭ (understood as logical consequence, resp., not-consequence),
capital Greek letters referring to sets of formulas, etc., and precise natural language
statements in order to restrict to specific classes of first-order programs. Most pub-
lished papers on abduction and induction in a logic programming setting apply this
style for specifying their particular problem, but with the solution methods defined in
a procedural style thus giving rise to the problem of integrating different methods.
In the present chapter we show that metaprogramming methods developed in logic
programming can be put together and extended to form an implemented framework
which to a large extent appears as a formalization of the style of specification based
on first-order logic.
A framework in our sense concerns a particular object language whose programs
play the role of theories; the object language may consist of Horn clauses or other
classes of logic programs. We define informally a metaprogramming framework for a
given object language to consist of

• a metalanguage which is a general-purpose logic programming language,

• a naming relation which to phrases (including whole programs) of the object language associates ground terms in the metalanguage; if w belongs to the object language, ⌈w⌉ denotes the corresponding metalanguage term called a name for w,

• a collection of built-in predicates or constraints that makes it possible to ex-


press in the metalanguage interesting semantic and syntactic properties about
elements of the object language.

We prefer a ground representation for the object language in order to avoid well-known semantic problems spelled out by (Hill and Lloyd, 1989). Using a bit of syntactic sugar, the actual representation of names can be hidden as in our own implemented metaprogramming framework. Here the notation \(p('X') :- q('X')) is read as the ground name for the object language clause p(X) :- q(X). The backslash serves as a concrete syntax for the "naming brackets" ⌈·⌉. This notation is extended so that partly instantiated patterns also can be described by means of a question mark

operator. For example, the notation \(?Z :- q('X')) denotes a non-ground metalevel term with the metavariable Z standing in the position of the name for the head of an object language clause. The "?" so to speak suspends the effect of "\". Variables of the metalanguage are called metavariables so as to distinguish them from the variables of the object language.
In order to specify in a reasonable way (processes that resemble) abduction and
induction, a metaprogramming framework needs to support a representation of logical
consequence for object programs. For Horn clause programs, provability is a good
approximation of logical consequence which can be formalized in terms of the meta-
level predicate demo (for 'demonstrate') specified as follows; ⊢ stands for provability in the object language.

demo(⌈P⌉, ⌈Q⌉) iff P and Q are object program and query and there
exists a substitution σ with
P ⊢ Qσ

In case P and Q are completely specified, demo just replicates the work of a conventional interpreter for logic programs. The interesting applications arise when demo is queried with partly specified arguments. Assume, for example, that the metavariable Z stands in the position of an unknown object language clause in the program argument of a call to demo as follows.

demo(\[ ... ?Z ... ], ⌈p(a)⌉)

A capable implementation (as the one we describe in this chapter) will, according to the specification of demo, compute answers for Z, each being a name for an object language clause that makes the object language query p(a) succeed in the completed program.
A representation of non-provability is useful in order to express integrity constraints
and counterexamples, which are often given as part of abduction and induction prob-
lems. This can be made by allowing some form of negation in the query argument of
demo or by means of an additional proof predicate demo_fails(⌈P⌉, ⌈Q⌉) with the meaning that the object query Q fails in the object program P.
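As a rough indication of the intended meaning only (my own naive sketch; the DEMO system's actual demo_fails is constraint-based and lazy, precisely so that it also works with partly specified programs), for completely specified arguments the predicate amounts to negation as finite failure:

% Naive version for completely specified P and Q: Q fails in P.
demo_fails(P, Q) :- \+ demo(P, Q).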
A metalevel query may include user-defined side-conditions limiting the program
fragments sought to, say, rules of a certain form or facts belonging to a class of ab-
ducibles. In general, a user can define new metalevel predicates making any combina-
tion of the syntactic and semantic facilities offered by the framework and making full
use of the general logic programming capabilities of the metalanguage.
In this chapter we show, by means of a series of examples developed in an im-
plemented metaprogramming framework called the DEMO system, that it is possible
using this approach to specify a wide range of tasks involving abduction and induction
- and combinations thereof- in a quite natural and declarative way.
The DEMO system is implemented in Sicstus Prolog (SICS, 1998), thus includ-
ing its repertoire of built-in predicates, delay mechanisms, and facilities for writing
constraint solvers in the metalanguage. In addition, the DEMO system provides
• a syntactically sugared naming relation as indicated above, with a Prolog-like
syntax for the object language and the inherent ambiguity resolved using three

different naming operators, \ for programs and clauses, \\ for atoms, con-
straints and conjunctions, and \\\ for terms,

• an implementation of demo which is capable of working properly with unin-


stantiated metavariables standing for unknown parts of the object program,
• a version of demo_fails implemented in a straightforward negation-as-finite-
failure fashion, however, lazy in the sense that it delays subcomputations that
depend on missing program parts when necessary,

• a sound and complete constraint solver for a collection of syntactic constraints


together with constraints primarily intended for implementing the demo predi-
cate.
We explain facilities and notation in the system as they are encountered in the ex-
amples. A full description of the DEMO system and its background can be found
in (Christiansen, 1998a); the implementation of demo_fails is described in (Christiansen, 1999).1

13.3 MODELLING A VARIETY OF REASONING PROCESSES


We can use our framework to give a simplistic classification of different sorts of rea-
soning in the following way. Assume a metalevel predicate rules defining the shape
of rules for describing general knowledge in some context, similarly facts for basic
or irreducible facts, and obs for observations which should be explainable from the
current rules and facts.
The following query captures the overall consistency relation among the different
components; the & operator denotes concatenation of object programs.
rules(R), facts(F), obs(Obs), demo(\(?R & ?F), Obs).
The following table characterizes the syllogistic view (Flach and Kakas, this volume) of deduction, abduction and induction, which is usually adopted in logic programming, together with the more general spectrum of reasoning processes that we address. By √ we indicate that the given argument must be completely given, by ? that the argument is unknown (or partly given) in the query, thus containing metavariables whose values are enquired for.

                          R      F      Obs
    deduction             √      √      √/?
    abduction             √      ?      √
    induction             ?      √      √
    general reasoning     √/?    √/?    √/?

1The most recent version of the system and documentation is available on-line at http://www.dat.ruc.dk/software/demo.html.

Deduction means to verify or predict observations from established knowledge, ab-


duction means to derive those basic facts that are necessary in order to explain the
given observations, and induction means to extract rules from examples. In the fol-
lowing we show a number of examples of the "pure" forms of abduction and induction
modelled in our framework as well as combinations thereof, indicating examples of
what we have called general reasoning in the table.
We consider in these examples family relations with basic facts concerning the
parent relation and rules of the form pred(X,Y) :- body where body contains one
or two parent atoms and pred can be any predicate different from parent. In
addition, the rules must be range restricted in the sense that the variables in the head,
here X and Y, must appear in the body. We put no restrictions on the observations.
The overall consistency relation in this context is summarized in the following met-
alevel predicate.

demo_family(Rules, Facts, Obs) :-
    facts(Facts),
    rules(Rules),
    demo(\(?Rules & ?Facts), Obs).

For reasons of symmetry in the examples, we introduce the following for negative
examples and integrity constraints.
demo_fails_family(Rules, Facts, Obs) :-
    demo_fails(\(?Rules & ?Facts), Obs).

13.3.1 Abduction
Different reasoning tasks can be defined by varying the degree of instantiation of the
arguments in a query to the demo_family predicate. In particular, abduction is
performed when part of the facts is left unknown.
Here we give as input to the query some facts about the parent relation and a rule
defining the sibling relation. The free variable NewFact in the query below stands
for an unknown fact which, when added to the fact base, should be able to explain how
it can be the case that sibling (mary, brian).
?- Facts = \[parent(john,mary), parent(jane,mary)],
   Rules = \[(sibling('X','Y') :-
                 parent('P','X'), parent('P','Y'))],

   demo_family(Rules, \(?Facts & [?NewFact]),
               \\sibling(mary,brian)).

The following two abductive explanations are returned for this query.
NewFact = \(parent(john,brian):-true)
NewFact = \(parent(jane,brian):-true)

We could also allow for abducing more than one fact in which case the system would
return one more answer providing a third parent for mary shared with brian; we
show this in detail in another example below.

13.3.2 Induction
In the following, we ask for a rule defining the sibling relation, giving as input
a number of parent facts together with the observation that mary and brian are
siblings.
?- Facts = \[parent(john,mary), parent(jane,mary),
             parent(john,brian), parent(hubert,zoe)],

   demo_family(\[?NewRule], Facts, \\sibling(mary,brian)).

The system suggests the following alternative rules each of which correctly satisfies
the metalogical specification.2

NewRule = \(sibling('X','Y') :-
               parent('X0','X'), parent('X0','Y'))
NewRule = \(sibling('X','Y') :-
               parent('X0','X'), parent('X1','Y'))
NewRule = \(sibling('X','Y') :-
               parent('X0','Y'), parent('X1','X'))

Only the first rule is an intuitively correct definition of the sibling relation. This
leads us to refine the query as follows, giving in addition one negative example.

?- Facts = \[parent(john,mary), parent(jane,mary),
             parent(john,brian), parent(hubert,zoe)],

   demo_fails_family(\[?NewRule], Facts,
                     \\sibling(mary,zoe)),
   demo_family(\[?NewRule], Facts,
               \\sibling(mary,brian)).

This solves the problem, only the correct answer is returned.

13.3.3 Induction aided by abduction


It may be the case that the necessary amount of basic facts is not available in order
to induce a rule. In this example, the background theory includes a rule saying that
children live with their parents, and we delete any basic facts concerning brian's
parents. Instead we provide an observation in the query describing brian's place of
living.

2A standard ordering is imposed on the goals in the body of a clause so that series of equivalent solutions
by means of permutations and duplications of body atoms are suppressed.

?- Facts = \[parent(john,mary), parent(jane,mary),
             parent(hubert,zoe), ?NewFact],

   Rules = \[(lives_by('X','Y') :- parent('X','Y')),
             ?NewRule],

   demo_fails_family(Rules, Facts, \\sibling(mary,zoe)),

   demo_family(Rules, Facts, \\(sibling(mary,brian),
                                lives_by(john,brian))).

In order to explain the observation lives_by(john,brian), the system abduces the fact parent(john,brian) which in turn provides enough information to induce the rule, and the answer is as follows.

NewRule = \(sibling('X','Y') :-
               parent('X0','X'), parent('X0','Y'))
NewFact = \(parent(john,brian):-true)

13.3.4 Abduction aided by induction: Reasoning by analogy


If no rule is available whose head matches an observation to be explained, it is not
possible to perform an abduction in the usual way. On the other hand, reasoning
by analogy from other observations can make it possible to produce an acceptable
explanation. This can be described as a combination of abduction and induction where
induction involving known cases produces a rule that makes an abduction possible.
Assume some observations are available concerning the siblings of mary and that
the known facts describe her parent relationships. The question is now, if we know
that donald is a sibling of mary, what can we say about his parents?
This example is a straightforward continuation of the pure induction example above,
except that we allow the abduction to introduce any number of facts.
?- Facts = \[parent(john,mary), parent(jane,mary),
             parent(john,brian), parent(hubert,zoe)],

   demo_fails_family(\[?NewRule], Facts,
                     \\sibling(mary,zoe)),
   demo_family(\[?NewRule], Facts,
               \\sibling(mary,brian)),

   demo_family(\[?NewRule], \(?Facts & ?NewFacts),
               \\sibling(mary,donald)).

All answers include the now familiar sibling rule as the value of NewRule together with one of the following alternative sets of abduced facts.

NewFacts = \[(parent(john,donald):-true)]
NewFacts = \[(parent(jane,donald):-true)]
NewFacts = \[(parent(a0,mary):-true),
             (parent(a0,donald):-true)]

This means that donald can have one of mary's known parents as one of his own parents, or that there exists another parent common to mary and donald; the name a0 in the last answer is generated by the system.3
This example illustrates also that demo and the underlying constraint solver are able
to cope correctly with a problem concerning variables in abducibles which makes
some abduction algorithms flounder. In the process of calculating the third answer
above, a single metavariable (visible in the answer as constant aO) stands for unknown
parts of two different and interdependent facts to be abduced. Without some means
to distinguish between meta and object variables this tends to give problems; this
phenomenon is discussed further in the final section of this chapter.
Finally, we notice that this answer has as a consequence that mary has three par-
ents which we might want to suppress by means of an integrity constraint that can be
expressed as follows.
demo_fails_family(\[?NewRule], \(?Facts & ?NewFacts),
    \\(parent('P1','C'), parent('P2','C'),
       parent('P3','C'),
       dif('P1','P2'), dif('P1','P3'), dif('P2','P3')
      ))

The condition reads: For no individual (given by the object variable C), can there be found three parents that all are different. We used here a constraint dif which is included in DEMO's object language; dif(t1,t2) means that t1 and t2 must be syntactically different.

13.3.5 Building whole theories from examples


The final example in this suite shows how an entire theory can be derived from a series
of observations given the "bias" inherent in the definition of the rules and facts
metalevel predicates. In general it is difficult to produce the intuitively correct answer
from only a few observations, but through a number of iterations we get to the following
query of positive and negative examples.

demo_fails_family(\[], NewFacts, \\parent(mary,'_')),
demo_fails_family(\[], NewFacts, \\parent(zoe,'_')),
demo_fails_family(\[], NewFacts, \\parent(donald,'_')),
demo_fails_family(\[], NewFacts, \\parent(peter,'_')),

demo_fails_family(\[?NewRule],
    \[parent(p1,c), parent(p2,c)], \\sibling(p1,p2)),

demo_fails_family(\[?NewRule], NewFacts,
    \\sibling(mary,donald)),

demo_family(\[?NewRule], NewFacts,
    \\(sibling(mary,zoe), sibling(donald,peter))).

3The system includes a device which uses an adapted least-general-generalization algorithm in order to instantiate metavariables in a way which satisfies the pending constraints, thus making the answers more readable. It needs to be included in the query as an explicit call of a metalevel predicate which we have suppressed in this presentation.
The first four negative examples express that none of the mentioned individuals are parents. The next condition expresses, using skolem constants, that whatever sibling rule is generated, it should not allow two siblings to have a common child. Notice here that this call to demo only has the rule in common with the other calls; the facts are different. The remaining calls are positive and negative examples of the sibling relation. The following answer is printed out.
NewFacts = \[(parent(a0,mary):-true),
             (parent(a0,zoe):-true),
             (parent(b0,donald):-true),
             (parent(b0,peter):-true)]

NewRule = \(sibling('X','Y') :-
               parent('X0','X'), parent('X0','Y'))

This example indicates that metalogical frameworks in our sense have a potential for
being used as general concept learners, producing a theory respecting a certain bias
from a collection of unsorted observations. This bias needs to include an assumption
about a stratification among the predicates used. The actual stratification correspond-
ing to a sequence of examples should be determined dynamically, including which
predicates are to be considered abducible or basic, corresponding to the lowest stra-
tum. The examples considered above are especially simple because they have only two
a priori given strata, one for the abducible parent predicate and one for all others.
Identification of taxonomies seems to be another obvious application, using metalevel
predicates to define the sort of object programs that represent a taxonomy.

13.4 IMPLEMENTATION OF THE DEMO SYSTEM


In the introductory chapter of this book, Flach and Kakas discuss differences between abduction and induction which are apparent in the different sorts of algorithms that usually are applied to the two sorts of problems. In our approach, the implementation of the demo predicate serves as a common engine for both (and so to speak characterizes the properties that are common to the two), while the differences between them appear in the amount of information that needs to be formalized at the metalevel and in the actual computations performed by the underlying interpreter. In this section, we sketch the basic algorithms that implement the DEMO system. Firstly, we describe the constraint-based implementation of the demo predicate; next we explain how the side-conditions that define particular reasoning tasks are controlled.

(1) demo(P, Q) :-
        instance(Q, Q1, _),
        demo1(P, Q1).

(2) demo1(P, \\true).

(3) demo1(P, Atom) :-
        member(C, P),
        instance(C, \(?Atom :- ?Body), _),
        demo1(P, Body).

(4) demo1(P, \\(?A, ?B)) :-
        demo1(P, A),
        demo1(P, B).

Figure 13.1 Definition of the demo predicate.

13.4.1 A constraint logic implementation of the demo predicate


The demo predicate is implemented by means of metaprogramming techniques in
logic programming, where our implementation differs from earlier work on demo by
using constraint logic for interpreting the primitive operations used inside demo. In
this way, we have obtained the reversibility which is needed in order to apply demo
for abduction and induction as indicated. We are not aware of other implementations
of proof predicates such as demo which can be used in this way; see (Christiansen,
1998a) for a review and comparison with related work in this area.
The evaluation of a call to demo is defined by two levels of interpretation, the
first level being the metainterpreter shown in Figure 13.1, the next level given by the
semantics for the metalanguage in which this metainterpreter is written.
The program for demo in Figure 13.1 is a straightforward formalization of SLD-
resolution for the object language. The member condition expresses that a given
object program clause is a member of a given object program. In the actually implemented
DEMO system, member provides a view of programs as sets of clauses, in order to
avoid the generation of multitudes of equivalent answers by means of permutations
and duplications of object clauses. However, for the present explanation of the basic
principles underlying demo, it is sufficient to think of it as ordinary list membership
defined in the usual Prolog fashion.
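For reference, the "usual Prolog fashion" referred to here is simply:

member(X, [X|_]).
member(X, [_|Xs]) :- member(X, Xs).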
The instance condition is a constraint which formalizes, at the metalevel, the
notion of one object expression having another object expression as an instance; the
third argument of instance represents the substitution that gives that particular in-
stance.
We do not expect the reader to be familiar with constraint logic programming and
we use here a simplified and self-contained model to explain the interpretation of the
metalanguage applied in Figure 13.1. For an overview of constraint logic languages,
see (Jaffar and Maher, 1994; Jaffar et al., 1998); our own model is described in full de-
tail in (Christiansen, 1998a). As argued in the referenced paper, constraint logic tech-

S := initial query;
while any of the following steps apply, do
    while (t1 = t2) ∈ S do S := (S \ {t1 = t2}) mgu(t1,t2);
    if some constraint solver rule can apply to S, then do so,
    otherwise, select an atom A in S and an instance of a clause
        with new variables H :- B1, ..., Bn and let
        S := S \ {A} ∪ {A = H, B1, ..., Bn};

Figure 13.2  A semantics for constraint logic languages.

niques are necessary in order to avoid floundering and other problems that otherwise
arise with a straightforward implementation in Prolog, e.g., (Gallagher, 1993; Hill and
Gallagher, 1994), in the case of partly specified object programs.
The operational semantics for the metalanguage is summarized in the nondeterministic
algorithm of Figure 13.2, which is a straightforward generalization of SLD-resolution
with constraint handling. The state S is a set of unresolved constraints and
atoms; "mgu" stands for a most-general-unifier operation which produces a substitution
that is applied to the state; if unification fails, the algorithm stops with failure
for the given branch. Notice that unifications are passed through the state as equations
and executed at the next entry of the loop. A state is final if it is different from failure
and consists of constraints only, to which no constraint solver rule applies. A computed
answer consists of the constraints in a final state together with the substitutions
which have been made to the variables of the initial query.
The rules of Figure 13.3 define the execution of instance constraints. Each rule
is of the form C1 → C2 and should be understood as follows: if constraints of the
form indicated by expression C1 exist in the state, replace them by the constraints
indicated by C2. Rule (I1), for example, expresses that if two instance constraints
have the same (meta-) variable as their first arguments, and identical third arguments
(object substitution), then the two second arguments should be unified (and one of the
two instance constraints is removed as they anyhow become identical following
the unification). Rules (I2-3) move instance constraints that express bindings to
given object variables into the representation of substitutions.4 Rules (I4-5) perform a
decomposition of (names for) composite object language phrases. Notice the slightly
different treatment of instance constraints related to terms of the object language
and those related to other categories (atoms, clauses, and conjunctions).
In general, a constraint solver should be such that final constraint sets are guar-
anteed to be in a certain simplified form known to be satisfiable. In (Christiansen,
1998a) we have proved this property together with soundness and completeness of

4Rules (I2-3) assume the invariant property that substitution arguments in the state always have an open
tail so that new bindings can be added.

(I1) instance(v,t1,s), instance(v,t2,s)
         → t1 = t2, instance(v,t1,s)
     where v is a variable.

(I2) instance(x,t1,[ ... (x,t2) ... ]) → t1 = t2
     where x names an object variable.

(I3) instance(x,t,[ ... |w]) → w = [(x,t)|w1]
     where x names an object variable, w is a variable, and (I2)
     does not apply; w1 is a new variable.

(I4) instance(f(t1,...,tn), t, s)
         → t = f(v1,...,vn), instance(t1,v1,s), ..., instance(tn,vn,s)
     where f names a function symbol in the object language;
     v1,...,vn are new variables.

(I5) instance(t1,t2,s)
         → t1 = h(v1,...,vn), t2 = h(u1,...,un),
           instance(v1,u1,s), ..., instance(vn,un,s)
     where t1 or t2 is of the form h(r1,...,rn) where h names a
     predicate or connective of the object language;
     v1,...,vn, u1,...,un are new variables.

Figure 13.3  Constraint solver for instance constraints.

the constraint solver as well as of the entire implementation of demo summarized in
Figures 13.1-13.3.5
In the actual implementation, the metainterpreter of Figure 13.1 is interpreted directly
by Prolog and, thus, demo's termination properties inherit the consequences of
Prolog's depth-first execution method. The constraint solver has been implemented in
SICStus Prolog (SICS, 1998) using its library of Constraint Handling Rules (Frühwirth,
1995) in a way that preserves termination in constraint solving.
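For readers familiar with CHR, rule (I1) of Figure 13.3 could be rendered roughly as
follows; this is a sketch of our own, not the DEMO system's actual source:

% CHR simpagation rule: keep one instance/3 constraint, remove the
% duplicate with the same (meta-)variable and substitution, and
% unify the two second arguments.
instance(V,T1,S) \ instance(V,T2,S) <=> var(V) | T1 = T2.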

13.4.2 Additional constraints in the DEMO system and user-defined side-conditions
The DEMO system also includes syntactic constraints that are convenient when setting
up additional side-conditions that co-operate with demo. For each syntactic category
of the object language, we have a corresponding constraint that restricts its argument
to be a metalevel term that names a phrase belonging to that category. A constraint
such as clause_(z) delays until its argument gets instantiated and (if the argument
is of the right form) it reduces to constraints for the terms supposed to name the
head and the body of a clause. In some cases, these constraints make it possible to
obtain an optimized behaviour of instance constraints, as shown in the additional
constraint solver rule in Figure 13.4. Rule (C-I) overrides the delay of the instance
constraint in rule (I4) in the particular case that v is constrained to be a name for an
object language constant.

(C-I) constant_(v), instance(v,t,s)
          → v = t, constant_(t)
      where v is a variable.

Figure 13.4  A constraint solver rule for constant_ constraints.

5Solving instance constraints is equivalent to the multiple semi-unification problem, which is known to
be undecidable (Kfoury et al., 1990), and some readers may have noticed the close similarity between rules
(I1-I4) of Figure 13.3 and proposed semi-unification algorithms (Leiß, 1984; Henglein, 1989). However,
the structure of the metainterpreter in Figure 13.1 implies invariant properties that ensure termination.
A user of the DEMO system does not need a detailed knowledge of the underlying
constraint mechanism when setting up side-conditions for defining a particular reasoning
task. Such conditions can be written as straightforward Prolog definitions, extended
with unsophisticated use of delay mechanisms as they are found in recent versions of
logic programming languages, e.g., SICStus Prolog (SICS, 1998).
We can illustrate the principle by means of a small example. Assume we want to
define a predicate abducible which accepts names for facts about object language
predicates r and s. The following definition is sufficient.

:- block abducible(-), abd_atom(-).
abducible( \(?A :- true) ) :- abd_atom(A).
abd_atom( \r(?Const) ) :- constant_(Const).
abd_atom( \s(?Const) ) :- constant_(Const).

The block directive is a standard SICStus Prolog declaration which informs the interpreter
that these predicates should delay until their arguments become instantiated.6
The overall effect is that abducible becomes a very lazy predicate in the sense
that it tends not to instantiate but rather to wait and test the instantiations that
other events in the computation process might perform. In the present context, these
other events are typically actions performed inside demo, and the Prolog interpreter
automatically provides an optimal interleaving of the two predicates. In this way,
backtracking in the abducible predicate is effectively prevented. With more complex
patterns for new clauses - typically with an infinite space of candidate hypotheses,
as may be the case for induction or when arbitrary function symbols are used - this
property becomes crucial for efficiency and termination.

Distinguishing between abduction and induction. At the level of specification in
DEMO, abduction and induction appear very much the same, as we have emphasized in
our choice of examples. However, if a given problem to be solved is in essence abduction,
there is no reason for the developer to provide a metalevel predicate that approves
those background clauses that anyhow are fixed. And vice versa for induction.

6This use of delays can easily be incorporated in the semantics we have described for the object language.
Change the wording of Figure 13.2 so as to include "... select a non-blocked atom A in S ..." where an atom is
said to be blocked if its predicate has a block declaration of the indicated form and its argument is a variable.
At the procedural level, the following differences are apparent:
• Abduction: Whenever an "unknown" object clause is selected by member in
rule (3) of Figure 13.1, a side-condition wakes up and effectively reduces its
body to true, and instance constraints together with the side-conditions
determine the head of the object clause.
• Induction: Executes in a similar way except that the side-conditions should
allow the unknown clause body to be expanded gradually up to a certain point
through recursive application of the rules (2-4) of Figure 13.1.
For induction, this may imply quite complex, recursive computations, and for less
trivial cases of induction than those shown above, it may not be easy to come up with
the right side-conditions.
As our examples illustrate, the declarative nature of the metalevel specifications
provides an easy integration of different methods once developed: Pose the different
side-conditions in a conjunction together with one or more calls to demo and let the
underlying interpreter take care of integrating the different "algorithms".
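As a rough sketch of what such a combined specification might look like, consider the
following goal; the predicates rules/1 and facts/1 stand for user-defined side-conditions,
and obs1 and obs2 for observations — all of these names are placeholders of our own
rather than predicates of the DEMO system:

% Hypothetical combined goal: induce a rule and abduce facts at once.
% The side-conditions delay until demo instantiates the unknown parts.
?- rules(NewRule), facts(NewFacts),
   demo([NewRule|NewFacts], \obs1),
   demo([NewRule|NewFacts], \obs2).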

13.5 SUMMARY AND RELATED WORK


Our main thesis in the present chapter is that metalogic programming, with a ground
representation of object programs and a reversible demo predicate, is well-suited as a
general framework for specifying a wide range of reasoning processes in a declarative
way. We have supported this thesis by showing a collection of examples of abduction
and induction and various combinations thereof, developed in an implemented system
called the DEMO system.
The overall consistency relation in a given context can be specified by means of
demo together with additional metalevel predicates supplied by the developer defin-
ing the sort of (e.g.) rules and facts that are meaningful in this context; such metalevel
predicates correspond to what is called bias in other approaches. Different kinds of
reasoning processes can be performed by varying the parameters to this consistency
relation, e.g., leaving the facts unspecified leads to abduction, leaving the rules unspec-
ified leads to induction, and the system will return values for the unspecified compo-
nents. In this way, there is no problem of integrating abduction and induction; rather,
we have specified a wide spectrum of reasoning processes in which "pure" abduction
and induction appear as special cases. The DEMO system in which we developed the
examples given in this chapter shows that this seemingly naive approach can work in
practice.
Our classification of abduction and induction within the mentioned spectrum is
merely syntactic, whereas Console and Saitta provide a more "logical" characteriza-
tion in their chapter. They use notions of generality in order to classify generated
explanations as being inductive, abductive or something in between.
Our work is essentially an application of logic programming, the problem being
specified declaratively in terms of relations, with the procedural considerations (more
or less!) fixed once and for all in the underlying resolution-based interpreter. The
relations defined by a logic program can be used for computing a variety of functions
that would require a whole collection of functional or procedural programs. As a
simple but striking example of this quality, consider the standard append predicate,
most often used as a device for computing the concatenation of two lists, but which,
when used properly, serves as a quite powerful pattern matching device. This is analogous
to the way we use a proof predicate, normally thought of as a device for computing
proofs from given programs, but which, used properly, can perform other tasks such as
abduction and induction as well.
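To make the analogy concrete, recall the textbook definition of append and its two
modes of use (this illustration is our own and not part of the DEMO system):

% Standard list concatenation.
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).

% Forwards, it concatenates:
%   ?- append([a,b], [c], Zs).    %  Zs = [a,b,c]
% Backwards, it pattern-matches, enumerating all splittings:
%   ?- append(Xs, Ys, [a,b,c]).   %  Xs = [], Ys = [a,b,c] ; ...

In the same way, demo(P,Q) can be run with P fully given, computing a proof of Q, or
with parts of P left open, in which case the interpreter completes P so that Q becomes
provable.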
The method we have used in order to obtain the crucial property of reversibility
in demo descends from a resolution method described in an abstract way in (Chris-
tiansen, 1992). However, the use of constraint logic methods as sketched in this chap-
ter is necessary in order to obtain a reasonably efficient implementation. See (Chris-
tiansen, 1998a) for an overview of other work concerned with the demo predicate.
Integrations of procedures for abductive and inductive reasoning can lead to quite
powerful systems; here we can refer to (Ade and Denecker, 1995) and to work documented
in this volume by Lamma et al., Mooney, Inoue and Haneda, Sakama, and by Yamamoto.
The reader is referred to the chapter by Lamma et al., which gives an overview and
comparison of these and related methods.
The approach of Lamma et al. induces abducible theories, a problem specified
quite similarly to our example of induction aided by abduction; it also seems possible,
by changing their definitions a bit, to extend their methods, reasoning by analogy, to
handle the reverse: abduction aided by induction. They inherit
a floundering problem, briefly touched upon in Section 13.3.3, from the underlying
abduction algorithm, which arises in the case of variables in abducibles. This problem
seems inherent in abduction algorithms that do not explicitly distinguish between meta
and object level variables, e.g., (Kakas and Mancarella, 1990a; Decker, 1996). The
approaches of (Eshghi, 1988; Denecker and de Schreye, 1992) get around the problem
by inserting skolem constants for "problematic" variables. Denecker and de Schreye
(Denecker and de Schreye, 1998) apply a classification scheme for variables, based on
the variables' level of quantification, which seems related to the distinction between
object and metavariables. Ade and Denecker (Ade and Denecker, 1995) generalize
the abduction method of (Denecker and de Schreye, 1992) into a method to perform
abduction and induction in parallel under integrity constraints and negation. We have
not made a detailed comparison of the expressibility of this and our approach, but it
appears that a metaprogramming framework provides a flexibility not found in other
approaches to set up a diversity of requirements on the facts and rules sought.
One problem in our approach inherited from the logic programming setting is that
it is difficult to put a preference ordering on the answers produced. A property such
as minimality of an abductive explanation cannot be specified in an elegant way. The
natural way to work in DEMO is to set up, and revise, the side-conditions such that the
class of possible solutions becomes sufficiently small.
The methods of induction applied by (Ade and Denecker, 1995), Lamma et al. (this
volume), and most other work referenced above are descendants of the method for
inductive program synthesis of (Shapiro, 1983). The synthesis process is performed

as an iterative process in which a program is gradually modified as negative examples
show it to be too general and positive examples show it to be too specific. This and
other methods derived from it can show interesting results for the synthesis of recursive
programs. We refer to (Bergadano and Gunetti, 1996; Muggleton and De Raedt, 1994;
Nienhuys-Cheng and de Wolf, 1997) for overviews of the area of inductive logic
programming. In (Christiansen, 1998b) we compare our work with the tradition for
logic program synthesis. Methods for abduction in logic programming are reviewed
in (Kakas et al., 1992).
In addition to the sort of reasoning tasks considered in this chapter, we mention the
following applications which seem worth considering in a metalogical framework in
our sense (although not all of them are feasible in the present DEMO system).

• Counterfactual reasoning. In (Andreasen and Christiansen, 1996; Andreasen
and Christiansen, 1998), we have shown that a special case of counterfactual
reasoning can be computed efficiently in a metalogical setting. It would be
interesting to integrate this with abduction and induction in the DEMO
system.

• Explanation-based reasoning. It has been suggested by (Numao and Shimura,
1990) to use explanation-based reasoning together with a reversible demo predicate
for extracting the inherent decomposition and recursion patterns of existing
programs and reusing these as part of the bias in the inductive synthesis of other
programs.

• Synthesis of recursive programs, using the recursion operators of (Hamfelt and
Nilsson, 1996; Hamfelt and Nilsson, 1997) to structure the space of recursive
programs. In this way, the demo predicate needs only to invent non-recursive
plug-ins in order to define, say, the append predicate.

Acknowledgments
This research is supported in part by the DART project funded by the Danish Research Councils.
14 LEARNING ABDUCTIVE AND
NONMONOTONIC LOGIC PROGRAMS
Katsumi Inoue and Hiromasa Haneda

14.1 INTRODUCTION
We investigate the integration of induction and abduction in the context of logic programming.
Our integration proceeds by learning theories for abductive
logic programming (ALP) in the framework of inductive logic programming (ILP).
Both ILP and ALP are important research areas in logic programming and AI. ILP
provides theoretical frameworks and practical algorithms for inductive learning of relational
descriptions in the form of logic programs (Muggleton, 1992; Lavrac and
Dzeroski, 1994; De Raedt, 1996). ALP, on the other hand, is usually considered as an
extension of logic programming to deal with abduction, so that incomplete information
is represented and handled easily (Kakas et al., 1992). Learning abductive programs
has also been proposed as an extension of previous work on ILP (Dimopoulos and
Kakas, 1996b; Kakas and Riguzzi, 1997).1 The important question here is "how do
we learn abductive theories?"
To answer this question, we rely on the following two important ideas presented in
ILP and logic programming:
• Learning nonmonotonic theories has recently received much attention in ILP to
capture the intuition behind learning under incomplete information (Bain and
Muggleton, 1992; Dimopoulos and Kakas, 1995; Inoue and Kudoh, 1997).

1In this chapter, we use the terms abduction and induction precisely in the contexts of ALP and ILP. As
far as these two research fields are considered, each role and the distinction between them are clear without
controversy.

• There are close relationships between abduction and nonmonotonic reasoning
(Poole, 1988a; Inoue, 1992b), in particular between ALP and nonmonotonic
logic programming (Eshghi and Kowalski, 1989; Kakas et al., 1992; Inoue,
1994; Baral and Gelfond, 1994).

Before we present the main idea for learning abductive theories, we briefly review the
above two issues.

14.1.1 Learning nonmonotonic theories


Learning nonmonotonic logic programs has also been considered as an extension of
previous work on ILP to deal with learning under incomplete information. In most
previous work on ILP, definite Horn programs or classical clausal programs are considered
in the form of logic programs. However, research on knowledge representation
in AI, in particular work on nonmonotonic reasoning, has shown that such monotonic
programs are not adequate to represent our commonsense knowledge, including no-
tions of concepts and taxonomies. To learn default rules or concepts in taxonomic
hierarchy, we need a learning mechanism that can deal with nonmonotonic reasoning.
On the other hand, recent advances on theories of logic programming and non-
monotonic reasoning have revealed that logic programs with negation as failure (NAF)
are an appropriate tool for knowledge representation (Baral and Gelfond, 1994). Normal
logic programs (NLPs) are the class of programs in which NAF is allowed to
appear freely in bodies of rules. Learning NLPs is an important step towards a better
learning tool, and has been considered in (Bain and Muggleton, 1992; Srinivasan et al.,
1992; Bergadano et al., 1996; Martin and Vrain, 1996).
However, NLPs still have a limitation as a knowledge representation tool since they
do not allow us to deal directly with incomplete information (Gelfond and Lifschitz,
1991). NLPs automatically apply the closed world assumption (CWA) to each pred-
icate, so that negative information is implicitly defined in NLPs. As pointed out by
De Raedt and Bruynooghe, the automatic application of CWA is also inappropriate in
inductive concept learning (De Raedt and Bruynooghe, 1990).
To overcome the above problem of NLPs, learning extended logic programs (ELPs)
has recently been proposed (Inoue and Kudoh, 1997). ELPs were introduced by Gel-
fond and Lifschitz to extend the class of NLPs by including classical negation (or
explicit negation)"--," along with NAF (or default negation) "not". The semantics of
ELPs is given by the notion of answer sets (Gelfond and Lifschitz, 1991), and the
answer to a ground query A is either yes, no, or unknown, depending on whether A is
contained in all answer sets, or •A is in all answer sets, or otherwise. Using ELPs,
any literal not contained in either positive or negative examples is considered unknown
unless the learned theory says that it must or must not be in that concept.
In (Inoue and Kudoh, 1997), a system called LELP (Learning ELPs) is proposed
to learn default rules with exceptions in the form of extended logic programs, given
incomplete positive and negative examples and background knowledge. While default
rules are generated as specializations of general rules that cover positive examples,
exceptions to general rules are identified from negative examples and are then generalized
to default cancellation rules. In LELP, hierarchical defaults can also be learned
by recursively calling the exception identification algorithm. Moreover, when some
instances are possibly classified as both positive and negative, nondeterministic rules
can also be learned so that there are multiple answer sets for the resulting program.

14.1.2 Abduction and negation as failure


The links between abduction and nonmonotonic reasoning are bidirectional (Inoue,
1992b): abduction can be formalized by nonmonotonic logics, and conversely many
nonmonotonic reasoning formalisms can be represented and computed by abduction.
Here, we focus on logic programs with NAF as the nonmonotonic formalism.
On the direction from abduction to NAF, it should be noted that abduction is intrin-
sically nonmonotonic inference. Since abduction is ampliative and plausible reason-
ing, conclusions of abductive reasoning may not be correct and are subject to change.
The nonmonotonicity of abduction is also verified by the fact that abductive hypothe-
ses can be represented by default logic (Reiter, 1980) in the form of prerequisite-free
normal default theories (Poole, 1988a). While Poole's result is stated in the context
of first-order logic, it is extended by Inoue to abductive theories represented in ELPs
called knowledge systems (Inoue, 1992b; Inoue, 1994). A knowledge system is defined
as a pair (P, Γ), where both P (facts) and Γ (hypotheses) are ELPs. Inoue has
shown that such a theory can be translated into a single ELP. In other words, the class
of ELPs can represent abduction in logic programming.
On the direction from NAF to abduction, Eshghi and Kowalski give an abductive
interpretation of NAF in the class of NLPs (Eshghi and Kowalski, 1989). Their approach
is followed by (Kakas and Mancarella, 1990b). Given an NLP P, each NAF
formula not a is replaced by a new atom a* as a hypothesis (called an abducible),
thereby translating P into a Horn program and integrity constraints together with the
set of abducibles.
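As a minimal illustration of this style of translation (our own sketch; the integrity
conditions are stated only informally in the comments):

% NLP rule with NAF:
%   flies(X) :- bird(X), not ab(X).
% Replacing "not ab(X)" by a new abducible atom ab_star(X)
% yields the Horn rule
flies(X) :- bird(X), ab_star(X).
% together with integrity constraints stating, roughly, that ab(X)
% and ab_star(X) may not hold together, and that one of them holds.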

14.1.3 Outline of this chapter


Combining results presented in the above two subsections, an obvious way to integrate
ALP and ILP consists of two steps: (1) learning ELPs first, then (2) converting the
resulting programs into abductive programs. The basic idea of this extension is based
on two kinds of translations of NAF into abducibles:
1. If LELP generates nondeterministic rules, they are converted into rules with
abducibles using the framework of (Inoue, 1994).
2. NAF formulas in default rules generated by LELP are converted into abducibles
in a simple way.
In this way, we propose the learning system called LAELP (Learning Abductive
ELPs) as an extension of LELP. Although these translations are not difficult to achieve,
there are important differences between rules with NAF and rules with abducibles. We
will present such relationships between these two learning forms. Since abducibles are
introduced when nondeterministic rules are found, such abducibles are newly gener-
ated (or invented) in LAELP. This is contrasted with other work on learning abductive
theories by (Dimopoulos and Kakas, 1996b; Kakas and Riguzzi, 1997) and (Lamma
et al., this volume).

This chapter not only extends (Inoue and Kudoh, 1997) to deal with abduction, but
also revises the previous LELP algorithm. The rest of this chapter is organized as
follows. Section 14.2 outlines how LELP produces ELPs to learn default and non-
deterministic rules. Section 14.3 extends LELP to learn abductive theories. Sec-
tion 14.4 discusses related work, and Section 14.5 concludes this chapter. The Ap-
pendix presents the proof of the correctness of LELP, which is not included in (Inoue
and Kudoh, 1997).

14.2 LEARNING NONMONOTONIC LOGIC PROGRAMS


This section reviews how LELP learns default rules with exceptions (Inoue and Kudoh,
1997). We first show a simple model, called Basic LELP, in which there are general
rules and exceptions but there are no exceptions to exceptions. This basic model is
then extended to deal with complex concept structures with hierarchical exceptions.

14.2.1 Extended logic programs


Extended logic programs (ELPs) were introduced in (Gelfond and Lifschitz, 1991) as
a tool for reasoning in the presence of incomplete information. They are defined as
sets of rules of the form

    L0 ← L1, ..., Lm, not Lm+1, ..., not Ln                    (14.1)

where the Li (0 ≤ i ≤ n; n ≥ m) are literals. Here, the left-hand side L0 is called the head
of the rule (14.1), and the right-hand side is called the body of the rule. A rule with an
empty head is called an integrity constraint, in which the empty head L0 is identified
with false. Two kinds of negation appear in a program: not is the negation as failure
(NAF) operator, and ¬ is classical negation. Intuitively, the rule (14.1) can be read as:
if L1, ..., Lm are believed and Lm+1, ..., Ln are not believed then L0 is believed.
The semantics of ELPs are defined by the notion of answer sets (Gelfond and Lifschitz,
1991), which are sets of ground literals representing possible beliefs.2 The
class of ELPs is considered as a subset of default logic (Reiter, 1980): each rule of the
form (14.1) in an ELP can be identified with the default of the form

    L1 ∧ ... ∧ Lm : ~Lm+1, ..., ~Ln
    -------------------------------
                  L0

where ~L stands for the literal complementary to L. Then, each answer set is the set of
literals in an extension of the default theory. An ELP is consistent if it has a consistent
answer set. We say that a literal L is entailed by an ELP P, written P ⊢ L, if L is
contained in every answer set of P. In the following, we often denote classical negation
¬ as -, NAF not as \+, and the arrow ← as :- in programs.
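For instance (an illustration of our own), a rule of the form (14.1) such as
¬flies(x) ← penguin(x), not injured(x) is written in program notation as:

-flies(X) :- penguin(X), \+ injured(X).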
We allow rules of the form ( 14.1) in background knowledge. We call a rule having
a positive literal in its head a positive rule, and a rule having a negative literal in its

2 While we adopted the answer set semantics for LELP, other semantics for ELPs may be applicable to our
learning framework with minor modification. For example, Lamma et al. use a well-founded semantics for
learning ELPs, and their output hypotheses are in a slightly different form from ours (Lamma et al., 1998).

head a negative rule. In LELP, the input positive examples are represented as positive
literals, and negative examples are denoted as negative literals.
The completeness and consistency of concept learning (Lavrac and Dzeroski, 1994)
can be reformulated in the three-valued setting as follows. Let BG be an ELP as
background knowledge, E a set of positive and negative literals as the union of positive
examples E+ and negative examples E-, and H a set of rules as the output hypotheses.
1. H is complete with respect to BG and E if for every e ∈ E, H covers e, i.e.,
   BG ∪ H ⊢ e.
2. H is consistent with respect to BG and E if for any e ∈ E, H does not cover ~e,
   i.e., BG ∪ H ⊬ ~e.
Note here that positive examples are not given any higher priority than negative ones.
Both positive and negative examples are to be covered by learned rules that are
consistent with respect to background knowledge and examples. Thus, we will learn
both positive and negative rules: no CWA is assumed to derive non-instances as in (De
Raedt and Bruynooghe, 1990).
In the above formalization, whenever H is complete with respect to BG and E, the
consistency of H with respect to BG and E can be replaced with the consistency of
BG ∪ H under the answer set semantics.

Proposition 14.1 Let BG, E and H be the same as above. Suppose that H is complete
with respect to BG and E. Then, the following two statements are equivalent.
(1) H is consistent with respect to BG and E.
(2) BG ∪ H is consistent, that is, BG ∪ H has a consistent answer set.

Proof. Suppose that H is consistent with respect to BG and E. Then, ∀e ∈ E (BG ∪ H ⊬ ~e).
That is, for any e ∈ E, there is an answer set S of BG ∪ H such that ~e ∉ S. For such
an S, e ∈ S holds by the completeness of H wrt. BG and E. Then, S is a consistent
answer set. Hence, BG ∪ H is consistent.
Conversely, suppose that BG ∪ H has a consistent answer set S. Let e be any example
in E. By the completeness of H wrt. BG and E, e ∈ S holds. By the consistency of
S, ~e ∉ S holds. Hence, H is consistent with respect to BG and E. ∎

14.2.2 Basic LELP


The algorithm of Basic LELP is summarized as follows.

Algorithm 14.1 LELP1(E+, E-, BG, H)

Input: positive examples E+, negative examples E-,
background knowledge BG.
Output: rules H := T1 ∪ T2 ∪ T3.
1. Determine whether learned general rules are positive or negative;
   if negative rules are to be learned then swap E+ and E-;
2. GenRules(E+, BG, T0);
   % Given E+ and BG, generate general rules T0 covering E+.
3. OWS(T0, E+, E-, BG, AB, T1);
   % Compute default rules T1 and exceptions AB by specializing T0
   % so that T1 is consistent wrt. BG and E-.
4. Counter(E-, AB, T1, T2);
   % Corresponding to T1, generate rules T2 covering E- from AB.
5. Cancel(AB, BG, T3).
   % Generalize AB to default cancellation rules T3 wrt. BG.

In Step 2 of Algorithm 14.1, given positive (or negative) examples E and background
knowledge BG, general rules T are generated to cover E using an ordinary ILP
technique. We denote this part of the algorithm as GenRules(E, BG, T), which generates
a minimal set of rules T satisfying, for each example e ∈ E:
1. BG ∪ T ⊢ e, where BG ∪ T is consistent, and
2. there exists a rule in T whose head is unifiable with e.
Here, the latter condition for GenRules(E, BG, T) means that the examples E are covered
directly by T, so that T can be regarded as a definition of the learned concept. Note
also that T is complete with respect to BG and E. Then, if no generalization is induced
for some examples in E, they should just be added to the hypotheses T. In a special
case, T can even be identical to E, so that the existence of the output of GenRules is
always guaranteed. We do not assume any particular learning algorithm for the implementation
of GenRules. However, since no negative example is used to cover positive
examples, some restrictions on the form of learned rules are necessary if a top-down
learning algorithm is used in GenRules. For instance, learned rules should be range-restricted,
that is, every variable in a rule should appear in the body. The inductive
bias can also be introduced in the definition of GenRules. Anyway, GenRules can be
considered as a black box, and here we are not concerned with the details.
General rules computed by GenRules(E+, BG, T) cover the positive (resp. negative)
examples E+, but may also cover the complements of some negative (resp.
positive) examples in E-. To specialize general rules, we use the algorithm of open
world specialization (OWS), which is closely related to closed world specialization
(CWS) (Bain and Muggleton, 1992). Unlike CWS, OWS does not apply CWA to
identify non-instances of the target concept. In OWS, exceptions are identified from
literals contained in negative examples (or positive examples if the general rule is
negative) such that their complements are proved from general rules with background
knowledge.

Algorithm 14.2 OWS(T, E+, E-, BG, AB, T')

Input: rules T, examples E+ ∪ E-, background knowledge BG.
Output: default rules T', exceptions AB.
Let T' := T; AB := ∅;
for each rule Ci = (H :- B) containing variables in T do
    Exc := { Lθ | L ∈ E-, BG ∪ T ⊨ ~Lθ and BG ∪ (T \ {Ci}) ⊭ ~Lθ };
    if Exc ≠ ∅ then
        if B contains a NAF formula \+N then
            AB := AB ∪ { Nθ | Lθ ∈ Exc }
        else N := abi(V1, ..., Vn), where abi is a new predicate appearing nowhere in
            BG and {V1, ..., Vn} are the variables in H;
            T' := (T' \ {Ci}) ∪ { (H :- B, \+N) };
            AB := AB ∪ { Nθ | Lθ ∈ Exc }.

In Algorithm 14.2, each literal Lθ for L ∈ E- is collected among the exceptional literals
Exc whenever ~Lθ is entailed by BG ∪ T but is not entailed by BG ∪ (T \ {Ci}). This
condition means that the rule Ci is necessary to derive ~Lθ from BG ∪ T. When such a
rule Ci is generated in Step 2 of Algorithm 14.1, the head H of Ci is actually resolved
with ~L for L ∈ E- since H must be unifiable with some example in E+.
In Step 4 of Algorithm 14.1, we need rules to cover negative examples (or positive
examples if the default rule is negative). Given negative (resp. positive) examples E-,
default rules D and the exceptions AB, LELP generates such negative (resp. positive)
rules T in Counter(E-, AB, D, T).

Algorithm 14.3 Counter(E, AB, D, T)

Input: examples E, default rules D, exceptions AB.
Output: rules T.
1. R := { (~H :- N) | (H :- B, \+N) ∈ D and Nθ ∈ AB };
2. T := R ∪ { e ∈ E | an instance of e is not covered by AB ∪ R }.

In Algorithm 14.3, if AB is empty then so is R, and no generalization is attempted, i.e.,
T = E. An alternative method is to produce rules T covering the negative (resp. positive)
examples E by GenRules(E, AB, T) under the constraint that T should not block the
derivation of any positive (resp. negative) example that has been covered by the defaults
D. However, if some general rules are also needed for negative (or positive) examples,
parallel rules, which consist of both positive rules and negative rules for the same
concept, should be learned as in Section 14.2.3. On the other hand, when parallel rules
are learned, the Counter algorithm is not necessary (see Algorithm 14.6).
The exceptions AB produced by the OWS algorithm form a set of ground atoms. When
exceptions have some common properties, they are generalized to default cancellation
rules by Cancel(AB, BG, T), in which GenRules is again used. Since exceptions are
not anticipated in general, rules deriving exceptions should be used to derive only
exceptions. Thus, if such rules are too general, that is, if they derive more facts than
the expected exceptions, they should be rejected.

Algorithm 14.4 Cancel(AB, BG, T)

Input: exceptions AB, background knowledge BG.
Output: default cancellation rules T.
1. GenRules(AB, BG, T),
   under the condition that the set of ground abi literals entailed by BG ∪ T
   coincides with the set of ground instances from AB.

In Algorithm 14.4, if no generalization is induced for AB then T is set to AB, so
that T satisfies the condition. Hence, the existence of the output of Cancel is always
guaranteed.

Example 14.1 LELP is implemented in SICStus Prolog and is called by lelp(Examples,
Background_Knowledge, Result). In Examples, atoms preceded
by + represent positive examples, and those preceded by - are negative examples.

| ?- lelp([+flies(1),+flies(2),+flies(3),+flies(4),+flies(5),
          +flies(6),-flies(a),-flies(b),-flies(c)],
         [bird(1),bird(2),bird(3),bird(4),bird(5),bird(c),
          (bird(X) :- pen(X)),pen(a),pen(b)], Rules).

Rules =
[(flies(A):-bird(A),\+ab1(A)), flies(6),   % T1 by OWS
 (-flies(B):-ab1(B)),                      % T2 by Counter
 (ab1(C):-pen(C)), ab1(c)]?                % T3 by Cancel

Theorem 14.2 Given background knowledge BG, positive examples E+ and negative
examples E-, let H be the hypotheses produced by
LELP1(E+, E-, BG, H). Assume that BG ∪ E is consistent, where E = E+ ∪ E-. Then,
(1) H is complete with respect to BG and E, i.e., ∀e ∈ E (BG ∪ H ⊢ e).
(2) H is consistent with respect to BG and E, i.e., ∀e ∈ E (BG ∪ H ⊬ ~e).

The proof is given in an appendix to this chapter.

14.2.3 Nondeterministic rules and hierarchical defaults


In Basic LELP, learned general rules are either positive or negative, which must be
determined in Step 1. A possible way to make this determination is to consider the
ratios of the positive and negative examples to all objects. In general, however, it
is difficult to judge whether the general rules should be positive or negative. Some-
times the number of positive examples is close to that of negative examples, and often
given training examples are too sparse so that the ratios of positive and negative ex-
amples to all objects are very low. Two solutions can be considered for this problem:
(1) parallel default rules, and (2) nondeterministic rules.3 Parallel default rules are
generated when exceptions may exist for both positive and negative rules in parallel
(e.g., mammals normally do not fly except bats, and birds normally fly except pen-
guins). Nondeterministic rules are generated when some instance is possibly proved
to be both positive and negative by parallel default rules so that a contradiction occurs.

Example 14.2 (Learning Nondeterministic Rules)

| ?- lelp([+flies(1),+flies(2),+flies(3),+flies(4),
          -flies(5),-flies(6),-flies(7),-flies(8)],
         [bird(1),bird(2),bird(3),bird(4),
          bird(5),bird(6),bird(7),bird(8)], Rules).

3In conventional machine learning methods, a search bias and a noise-handling mechanism are usually
implemented to prevent the induced hypotheses from overfitting the given examples. See (Lavrac and
Dzeroski, 1994, Chapter 8) for an overview of mechanisms for handling imperfect data in ILP. These
conventional approaches to noise handling can also be applied to the determination and the implementation
of GenRules in learning positive or negative rules, e.g., (Srinivasan et al., 1992), in conjunction with our
solutions. Since both positive and negative concepts are learned in our proposals, the use of parallel default
rules and nondeterministic rules further minimizes the number of incorrectly classified training examples.

Here, the ratio of positive examples is 50%, and both positive and negative rules can
be generated. If we use Basic LELP in parallel, the following parallel default rules
are computed:

(flies(A):-bird(A),\+ab1(A)),
ab1(5),ab1(6),ab1(7),ab1(8),
(-flies(B):-bird(B),\+ab2(B)),
ab2(1),ab2(2),ab2(3),ab2(4)

In the above rules, we do not include the rules produced by Counter, i.e., (-flies(A)
:-ab1(A)) and (flies(B):-ab2(B)). In fact, these rules are not necessary
since we learn default rules for both positive and negative examples in parallel. Notice
also that exceptions with the ab1 and ab2 predicates cannot be generalized here
because rules like (ab1(A):-bird(A)) and (ab2(B):-bird(B)) do not satisfy
the constraint in Cancel. Now, the above rules look correct, but if bird(9)
is added to background knowledge, neither ab1(9) nor ab2(9) is proved, so that a
contradiction occurs. In such a case, nondeterministic rules are generated by adding a
NAF formula not ~γ to the body of each parallel rule with γ in the head. In the example,
we have:

(flies(A):-bird(A),\+ab1(A),\+ -flies(A)),
ab1(5),ab1(6),ab1(7),ab1(8),
(-flies(B):-bird(B),\+ab2(B),\+flies(B)),
ab2(1),ab2(2),ab2(3),ab2(4)

The above two rules act as nondeterministic rules, and each default rule is represented
in the form:

    γi(x) ← Bi, not abi(x), not ~γi(x)

where x is a tuple of variables and Bi is a conjunction of body literals. This form
of rules is shown in (Baral and Gelfond, 1994) as an appropriate way to represent
default rules in ELPs. Then, for the added ground term 9, the ground instances of the
new default rules are:

flies(9) :- bird(9), \+ab1(9), \+ -flies(9).
-flies(9) :- bird(9), \+ab2(9), \+flies(9).

Hence, for the bird 9, two answer sets exist, one concluding flies(9) and the other
-flies(9), but neither one is entailed.

To deal with hierarchical structures of concepts and exceptions, we first modify
Algorithm 14.4 to generate default cancellation rules.

Algorithm 14.5 Cancel2(AB, BG, T)

Input: exceptions AB, background knowledge BG.
Output: default cancellation rules T.
1. GenRules(AB, BG, T), under the condition that |S \ A| < |A|, where S is the set
   of ground abi literals entailed by BG ∪ T, and A is the set of ground instances
   from AB.

The condition |S \ A| < |A| in Algorithm 14.5 replaces the stronger condition S = A
in Algorithm 14.4. Here, the set S \ A denotes the exceptions to exceptions. Under
this new condition, S may properly include A as long as the number of elements in
S \ A is less than that in A.4 This condition represents the monotone assumption that
"in every level of the hierarchy, the number of exceptions is less than that of instances
with default properties".
To learn hierarchical default rules, we need to call the algorithms Cancel2 and OWS
recursively. The procedure stops when there are no more exceptions
or no more instances to be generalized. The extended algorithm LELP2 is as follows.5

Algorithm 14.6 LELP2(E+, E-, BG, H)

Input: positive examples E+, negative examples E-,
background knowledge BG.
Output: rules H := R3 ∪ R4 ∪ R5 ∪ R6 ∪ R7 ∪ R8.
1. Determine to learn either (1) default rules for E+ only, (2) default rules for E-
   only, or (3) parallel default rules;
   if (1) then put R4 = R6 = R8 = ∅, perform Steps 2, 4, 7, 9;
   if (2) then put R3 = R5 = R7 = ∅, perform Steps 3, 5, 8, 10;
   if (3) then put R5 = R6 = ∅, perform Steps 2-6, 9-10;
2. GenRules(E+, BG, R1);
3. GenRules(E-, BG, R2);
4. OWS(R1, E+, E-, BG, AB1, R3);
5. OWS(R2, E-, E+, BG, AB2, R4);
6. if there are contradictory rules in R1 ∪ R2 then transform R3 ∪ R4
   into nondeterministic rules, put R5 = R6 = R7 = R8 = ∅, and stop;
7. Counter(E-, AB1, R3, R5);
8. Counter(E+, AB2, R4, R6);
9. ABs(E+, E-, AB1, BG ∪ R3 ∪ R5, R7);
10. ABs(E-, E+, AB2, BG ∪ R4 ∪ R6, R8).

4We can also consider other criteria for learning hierarchical default cancellation rules. For example, we
can even produce nondeterministic rules at lower levels of the hierarchy.

5The LELP2 algorithm in this chapter has been revised from the previous version in (Inoue and Kudoh,
1997). The previous algorithm produces rules deriving counter-examples by Counter in every level of
the hierarchy, while such rules are now added only once at the top level (Step 7 or 8), and only when parallel
default rules are not learned. Then, for Example 14.3, the resulting Rules do not include the rule
(-flies(D) :- ab2(D)), which is not necessary. This redundancy in the previous version was pointed
out in (Lamma et al., 1998).

Algorithm 14.7 ABs(E+, E-, AB, BG, T)

Input: examples E+ ∪ E-, exceptions AB, background knowledge BG.
Output: rules T.
1. if AB = ∅ then T := ∅, and return;
2. Cancel2(AB, BG, R);
3. OWS(R, E-, E+, BG, ABlow, R');
4. ABs(E-, E+, ABlow, BG ∪ R', Tlow);
5. T := R' ∪ Tlow.

In Algorithm 14.7, ABlow denotes the exceptions to the default cancellation rules R
that cover the exceptions AB.

Example 14.3 (Learning Hierarchical Defaults)

| ?- lelp([-flies(1),-flies(2),+flies(3),+flies(4),+flies(5),
          -flies(6),-flies(7),-flies(8),-flies(9),-flies(10),
          -flies(11),-flies(12)],
         [pen(1),pen(2),bird(3),bird(4),bird(5),
          (bird(X) :- pen(X)),animal(6),animal(7),animal(8),
          animal(9),animal(10),animal(11),animal(12),
          (animal(X) :- bird(X))], Rules).

Rules =
[(-flies(A):-animal(A),\+ab1(A)),   % R4 by OWS
 (flies(B):-ab1(B)),                % R6 by Counter
 (ab1(C):-bird(C),\+ab2(C)),        % R' by OWS in ABs
 (ab2(D):-pen(D))]?                 % Tlow by ABs in ABs

14.3 LEARNING ABDUCTIVE LOGIC PROGRAMS


14.3.1 Abductive extended logic programs
An abductive extended logic program (AELP) is a pair (P, Γ), where P is an ELP
and Γ is a set of literals from the language of P. The set Γ is identified with the set
of ground instances from Γ, and each literal in Γ is called an abducible. The model-theoretic
and fixpoint semantics for AELPs are given in (Inoue and Sakama, 1996).
Let S be a set of literals. If S is a consistent answer set of the ELP P ∪ A for some A ⊆ Γ,
then S is called a belief set of (P, Γ). Belief sets of (P, Γ) are also called generalized
stable models (Kakas and Mancarella, 1990b) when classical negation "¬" does not
appear in (P, Γ). Note that belief sets reduce to consistent answer sets when the set of
abducibles is empty.
In an AELP (P, Γ), each abducible in Γ is a literal. Often, however, we would like to
introduce rules of the form (14.1) in Γ. Such a rule, called an abducible rule, intuitively
means that if the rule is abduced then it is used for inference together with background
knowledge P. This extended abductive framework is introduced in (Inoue, 1994) as
a knowledge system, and has been shown to be a useful tool for theory formation and
representation of commonsense knowledge.
Any knowledge system (P, Γ), where both P and Γ are ELPs, can be translated into
a single ELP P* as follows (Inoue, 1994). First, for each abducible rule R in Γ, a new
naming atom δR is associated with R, and let P' = P ∪ { (H ← B, δR) | R = (H ← B) ∈ Γ }
and Γ' = { δR | R ∈ Γ }. Note that (P', Γ') is an AELP. Second, for each atomic
abducible δ in Γ', the following pair of new rules is introduced:

    δ ← not ¬δ,
    ¬δ ← not δ.                                                (14.2)

Third, P* is defined as the union of P' and the set of rules of the form (14.2) from
Γ'. Then, there is a one-to-one correspondence between the belief sets of (P, Γ) and
the consistent answer sets of P*. The nondeterministic rules (14.2) produce multiple
answer sets, one containing δ and the other ¬δ, which correspond to the addition and
non-addition of the original hypothesis, respectively.
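A small worked instance of the translation may be helpful (our own illustration; the
naming atom d1 plays the role of δR):

% Knowledge system: P = { bird(t) }, hypotheses = { (flies(X) :- bird(X)) }.
% First, name the abducible rule with the new atom d1(X):
flies(X) :- bird(X), d1(X).
bird(t).
% Second, make d1 abducible via the nondeterministic pair (14.2):
d1(X) :- \+ -d1(X).
-d1(X) :- \+ d1(X).

The resulting ELP has one consistent answer set containing d1(t) and flies(t),
corresponding to adding the hypothesis, and another containing -d1(t), corresponding
to not adding it.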

14.3.2 From NAF to abducibles (I): nondeterministic rules

A pair of rules of the form (14.2) in ELPs represents the fact that an atom δ can be
regarded as an abducible. In other words, the expressive power of the class of ELPs is
the same as that of AELPs. Therefore, when we generate rules having the form (14.2)
in their components, they can be converted into abducible rules.
In LELP, nondeterministic rules are generated when an instance can possibly be
classified as both positive and negative. In an ELP P, these rules have the form:

    γ ← B1, not ¬γ,
    ¬γ ← B2, not γ,                                            (14.3)

where γ is an atom and both B1 and B2 are conjunctions of literals and NAF formulas.
Here, we assume that neither γ nor ¬γ appears in the head of any other rule in P.
Then, the AELP (P*, Γ*) is constructed as follows. First, let N1 be a pair of rules of
the form (14.3) which are generated by LELP, and P1 = P \ N1. Also, let E1 be the set
of positive and negative examples that are covered by N1. Second, Γ1 is obtained by
converting N1 into abducible rules:

    γ ← B1,
    ¬γ ← B2.                                                   (14.4)

Third, each abducible rule R of the form (14.4) in Γ1 is named with a new atom
δR, and let P2 = P1 ∪ { (H ← B, δR) | R = (H ← B) ∈ Γ1 } and Γ2 = { δR | R ∈ Γ1 }.
Fourth, compute the set E2 of abducibles from Γ2 such that P2 ∪ E2 ⊢ e for every
e ∈ E1. While this computation is of course realized by abductive procedures for ALP,
the identification of E2 from E1 is not a difficult task. The final AELP is defined as
(P*, Γ*) = (P2 ∪ E2, Γ2).

Theorem 14.3 Let P be an ELP produced by LELP, and (P*, Γ*) the AELP constructed
as above. Then, for every consistent answer set S of P, there is a belief set S*
of (P*, Γ*) such that S = S* \ Γ*.

Proof. Let us first consider the knowledge system (P1 ∪ E1, Γ1). If an answer set S
of P contains γ from (14.3), then it must be that B1 is also true in S. Then, assuming
(γ ← B1) in a belief set S' of (P1 ∪ E1, Γ1) such that B1 is true in S', γ is also true in
S'. On the other hand, if S contains ¬γ, B2 is true in S. Assuming (¬γ ← B2) in S'
such that B2 is true in S' again implies that ¬γ is true in S'. Otherwise, if S contains
neither γ nor ¬γ, B1 and B2 are not true in S. Then, for S' in which B1 and B2 are not
true, neither γ nor ¬γ is true in S'. Therefore, any consistent answer set S of P is also
a belief set of (P1 ∪ E1, Γ1).
Next, there is a one-to-one correspondence between the belief sets of (P1 ∪ E1, Γ1)
and the belief sets of the AELP (P2 ∪ E1, Γ2) as shown in Section 14.3.1 and (Inoue,
1994).
Finally, any belief set of the AELP (P2 ∪ E1, Γ2) is also a belief set of the AELP
(P2 ∪ E2, Γ2) (= (P*, Γ*)). Hence, the result follows. ∎
Example 14.4 (Nixon Diamond) Suppose that examples and background knowledge
are given as follows.

E = { +pacifist(c), +pacifist(d), -pacifist(a), -pacifist(b) }.
BG = { republican(a), republican(b), quaker(c), quaker(d),
       republican(nixon), quaker(nixon) }.

Then, LELP produces the nondeterministic rules N1:

(pacifist(A) :- quaker(A), \+ -pacifist(A)),
(-pacifist(B) :- republican(B), \+pacifist(B)).

P = BG ∪ N1 has two answer sets, one containing pacifist(nixon) and the other
-pacifist(nixon). Let P1 = P \ N1 = BG and E1 = E. Next, the knowledge
system (P1 ∪ E1, Γ1) has the abducible rules:

Γ1 = { (pacifist(A) :- quaker(A)), (-pacifist(B) :- republican(B)) }.

Now, by naming the abducible rules, the AELP (P*, Γ*) is obtained as:

P* = { (pacifist(A) :- quaker(A), dove(A)),
       (-pacifist(B) :- republican(B), hawk(B)),
       dove(c), dove(d), hawk(a), hawk(b) } ∪ BG,
Γ* = { dove(A), hawk(B) }.

In the AELP, pacifist(nixon) is concluded by abducing dove(nixon), while
-pacifist(nixon) is explained by hawk(nixon). Both abducibles cannot be
assumed at the same time.

The converse of Theorem 14.3 does not hold. For Example 14.4, the AELP (P*, Γ*)
has a belief set containing neither pacifist(nixon) nor -pacifist(nixon),
which is not an answer set of P. In that belief set, neither dove(nixon) nor
hawk(nixon) is abduced, so that nixon is neutral or irrelevant to this matter. Note
in this case that the truth value of pacifist(nixon) is unknown for both P and
P*. In general, however, an ELP P entails more literals than the corresponding AELP
P*, as shown in the next subsection.
The next theorem shows that P* entails every example in E.

Theorem 14.4 Let P and (P*, Γ*) be the same as in Theorem 14.3, and E the given set
of positive and negative examples. Let M1 be the set of literals entailed by P, and M2
the set of literals contained in every belief set of (P*, Γ*). Then, M1 ∩ E = M2 ∩ E = E.

Proof. By Theorem 14.3, ignoring naming literals from Γ2, the answer sets of P are
included in the belief sets of P*, so the intersection M1 of the former sets includes the
intersection M2 of the latter sets. Hence, M1 ⊇ M2, and so M1 ∩ E ⊇ M2 ∩ E.
Now, suppose that there is a literal L in (M1 \ M2) ∩ E. Then, L is included in every
answer set of P, but there is a belief set S of (P*, Γ*) such that (i) S is not an answer
set of P and (ii) L ∉ S. By (i), there is a pair of literals γ and ¬γ from some rules of the
form (14.3) in P such that neither γ nor ¬γ is in S although either B1 or B2 is true in S.
In other words, neither of the two rules (14.4) is abduced in S. Here, we can assume that
the rules (14.3) are generated by LELP to account for the nondeterminism of the concept
to be learned. Then, γ and ¬γ never appear in the body of any rule other than (14.3).
Therefore, L is either γ or ¬γ. In either case, L is covered by one of the rules (14.3).
Since L ∈ E, L must be in E1. Then, the corresponding abducible name δ is contained
in E2, which is a part of P*. This implies that δ is in every belief set of (P*, Γ*) and
hence L is also in S, contradicting (ii). Hence, (M1 \ M2) ∩ E is empty.
Since E is covered by P, M1 ∩ E = E holds. Therefore, M1 ∩ E = M2 ∩ E = E. ∎

14.3.3 From NAF to abducibles (II): default rules


Another possibility for converting NAF into abducibles is to substitute the NAF formulas
in default rules produced by LELP with abducibles. This is an alternative way
to specialize general rules in LELP. Where OWS places a NAF formula of the form
\+abi(X), it is now replaced with an abducible normali(X). Then, instead of identifying
exceptions from negative examples, the normal objects are identified from
positive examples. For each such positive example p(t), we introduce normali(t) as
a fact. Sometimes such normality abducibles are collected, and can be further generalized.6
On the other hand, for each negative example ¬p(t), it is necessary to block
the assumption normali(t). This is realized by introducing integrity constraints, for
example,

    ← penguin(X), normali(X).

This last integrity constraint can be obtained by unfolding the constraint
(← abi(X), normali(X)) upon the abnormal literal with the default cancellation
rule (abi(X) ← penguin(X)) produced by Cancel(2).

6This process leads us to the so-called abduction-induction cycle, in which abduced literals are input to the
inductive process as examples to be generalized, while rules generated by the inductive process are used as
background knowledge in the abductive process (Flach and Kakas, this volume, Section 1.4).
Like the first translation presented in Section 14.3 .2, the meaning of the AELP con-
structed by the above conversion is not the same as that of the original ELP produced
by LELP. The difference is due to the fact that abnormality is usually minimized by
NAF, while normality is assumed only when needed by abduction. In this sense, ab-
ducible rules are more cautious than default rules. When examples are incompletely
given, such a difference cannot be ignored.
⁶This process leads us to the so-called abduction-induction cycle, in which abduced literals are input to the inductive process as examples to be generalized, while rules generated by the inductive process are used as background knowledge in the abductive process (Flach and Kakas, this volume, Section 1.4).

For example, let BG = {bird(tweety), bird(oliver)} be background knowledge, and E⁺ = {flies(tweety)} the positive examples (De Raedt and Lavrac, 1993). Then, LELP generates the rule h = (flies(x) ← bird(x)), which realizes an inductive leap because BG ∪ {h} entails flies(oliver).⁷ In the three-valued semantics, however, one often wants to conclude that flies(oliver) is unknown. To derive this weaker conclusion, a new abducible is added to h, giving (flies(x) ← bird(x), normal(x)). Note again that the abducible normal(x) can be regarded as the name of the abducible rule (flies(x) ← bird(x)) (Poole, 1988a; Inoue, 1994). In this case, normal(tweety) has to be introduced as a fact. For oliver, if we assume normal(oliver), it flies; otherwise, we do not know whether it flies or not. This inference is preferable if we do not want to treat rules as defaults but need them as candidate hypotheses.
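In program form, the converted theory for this example reads as follows (a sketch in Prolog syntax; the open status of normal(oliver) is simulated simply by leaving it undefined):

% Converted AELP for the tweety/oliver example.
bird(tweety).
bird(oliver).
flies(X) :- bird(X), normal(X).
normal(tweety).    % introduced as a fact from the positive example
% normal(oliver) is deliberately absent: abducing it would make oliver
% fly, while without the assumption flies(oliver) is not derived,
% matching the intended "unknown" status rather than an inductive leap.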

14.3.4 Discussion
Sections 14.3.2 and 14.3.3 have shown that we can convert rules learned by LELP into rules with abducibles. We call this learning method LAELP. Since LAELP is based on learning ELPs, it can be regarded as an indirect method for generating abductive theories. An alternative is to learn abductive programs directly from the beginning. This is possible because we can generate abducible rules instead of rules with NAF when we learn default rules or nondeterministic rules.

One might think that LELP is enough for learning under incomplete information because the class of ELPs is as expressive as the class of AELPs. The question why we nevertheless need to learn abductive theories is answered by considering the role of abduction in application domains. Abductive theories are often easier and more intuitive to understand than theories represented in other nonmonotonic logics. For example, in diagnostic domains, the background knowledge contains the cause-effect relations and the abducibles are written as a set of causes. Moreover, in the process of theory formation, incomplete knowledge is naturally represented in the form of abducible rules.

As shown in Section 14.3.2, both LELP and LAELP can avoid the situation in which an instance is classified as both positive and negative. In LELP, this can be achieved only by nondeterministic rules, while the same effect is also obtained by abducible rules in LAELP. In Section 14.3.3, we have shown a difference between abducible rules and default rules. Inductive leaps are better avoided by abducible rules, while defaults are better represented by rules with NAF.

Another merit of learning abductive theories lies in the fact that the abductive proof procedures developed for ALP are computationally useful. Abductive procedures can often be implemented more easily than theorem provers for other kinds of nonmonotonic reasoning.

⁷To avoid inductive leaps, some researchers propose a weak form of induction by applying CWA to BG ∪ E through Clark's completion, e.g., (De Raedt and Lavrac, 1993). However, as explained earlier, CWA is not appropriate in learning ELPs.

14.4 RELATED WORK


14.4.1 Learning nonmonotonic programs
The CWS algorithm (Bain and Muggleton, 1992) has been applied to nonmonotonic versions of CIGOL and GOLEM in (Bain, 1992), and to a learning algorithm that can acquire hierarchical programs in (Srinivasan et al., 1992). CWS produces default rules with NAF in stratified NLPs. Since CWS is based on CWA in the two-valued setting, it regards every ground atom that is not contained in an intended model as an exception. In LELP, on the other hand, OWS is employed instead of CWS, and incomplete information can be represented in ELPs with the three-valued semantics.

TRACYnot (Bergadano et al., 1996) learns stratified NLPs using trace information from SLDNF derivations. Since this system needs the hypothesis space in advance, it does not invent new predicates like abᵢ expressing exceptions, and hence seems more suitable for learning rules with negative knowledge and CWA than for learning defaults. Martin and Vrain use the three-valued semantics for NLPs in their inductive framework (Martin and Vrain, 1996). Since they do not adopt ELPs, CWA is still employed and the two kinds of negation are not distinguished.

While no previous work adopts full ELPs as the form of learned programs, a limited form of classical negation has been used in (De Raedt and Bruynooghe, 1990; Dimopoulos and Kakas, 1995). De Raedt and Bruynooghe first discussed the importance of the three-valued semantics in ILP (De Raedt and Bruynooghe, 1990). However, since they did not allow NAF, an explicit list of exceptions is necessary for each rule, which causes the qualification problem in AI. Dimopoulos and Kakas propose a learning method that can acquire rules with hierarchical exceptions (Dimopoulos and Kakas, 1995). They also do not use NAF to represent defaults, but adopt their own nonmonotonic logic with priority relations.

In none of these previous works can nondeterministic rules be generated, and hence commonsense knowledge with multiple extensions cannot be learned. Thus, it may not be easy to construct abductive theories as in Section 14.3.2 from theories generated by previous nonmonotonic learning methods.

14.4.2 Learning abductive programs


Dimopoulos and Kakas propose a general framework in which a new abductive logic program is learned given a previous abductive program as background knowledge and training examples (Dimopoulos and Kakas, 1996b). Their framework is followed by (Kakas and Riguzzi, 1997) and (Lamma et al., this volume). There are two important differences between these works and ours. Firstly, new abducibles are acquired in LAELP, while this function is not implemented in any of the other works, since the set of abducibles is always fixed in their learning definitions. In LAELP, abducible rules are converted from nondeterministic rules, and such nondeterminism is automatically discovered in the learning process. We believe that this mechanism of abducible invention is the core of the automatic acquisition of incomplete knowledge, and that it is therefore necessary for learning abductive theories. Secondly, the covering relation in (Dimopoulos and Kakas, 1996b; Kakas and Riguzzi, 1997) and (Lamma et al., this volume) is defined in terms of abductive entailment, while we did not change the entailment relation from the ordinary one in ELPs. Abductive entailment is necessary when we allow an abductive program as background knowledge. While an abducible literal introduced by LAELP is either of the type expressing the nondeterminism of rules or of the type of defaults, Lamma et al. can learn rules containing more than two abducibles, which are available from the background knowledge. However, such an extension with abductive entailment is not obvious for LAELP, because we would like to acquire new abducibles and revise old abducibles at the same time in the abduction-induction cycle. This is important future work.

Learning abductive programs is also investigated in different contexts by other researchers, including Mooney and Sakama in this volume and (Kanai and Kunifuji, 1997). They use abduction to compute inductive hypotheses efficiently, and such an integration is useful in theory refinement.

14.5 CONCLUSION
We have presented an integration of abduction and induction from the viewpoint of learning abductive logic programs in ILP. We proposed techniques for learning abductive theories based on the theory of learning nonmonotonic rules and on the relationships between nonmonotonic reasoning and abductive reasoning. The proposed learning system LAELP can avoid problems in handling nondeterminism and inductive leaps by introducing new abducibles. This automatic discovery of abducibles is important for learning under incomplete information.

The knowledge representation language on which our proposal is based is that of abductive extended logic programs, which is rich enough to allow explicit negation, default negation, and abducibles in programs. Since both nonmonotonic programs and abductive programs can be used to represent incomplete information, one may represent incomplete knowledge in various ways. This means that solutions other than the method presented in this chapter could be considered for learning under incomplete information. Hence, an important piece of future work is to determine which representation method is better for acquiring knowledge in various domains. Such a comparison should be made in terms of expressiveness, efficiency, comprehensibility and applicability.

Acknowledgments
We are indebted to Hiroichi Nakanishi, Yoshimitsu Kudoh and Munenori Nakakoji for their assistance in implementing versions of L(A)ELP. We would like to thank Chiaki Sakama for his valuable comments.

Appendix: Proof of Theorem 14.2

Theorem 14.2 Assume that BG ∪ E is consistent, where E = E⁺ ∪ E⁻. Let H be the hypotheses produced by LELP1(E⁺, E⁻, BG, H). Then, H is complete and consistent with respect to BG and E.

Proof. We prove the theorem by tracing the LELP1 algorithm step by step. In the following, we consider the case where positive rules are learned as general rules. The correctness for learning negative rules can be proved in the same way.

Firstly, by the requirement for GenRules, for any K and F,

GenRules(F, K, T) ⇒ [∀f ∈ F (K ∪ T ⊢ f)] ∧ (K ∪ T is consistent).

By Proposition 14.1, this can be written as:

GenRules(F, K, T) ⇒ ∀f ∈ F [(K ∪ T ⊢ f) ∧ (K ∪ T ⊬ f̄)].

In Step 2, GenRules(E⁺, BG, T₀). Then, T₀ is complete and consistent with respect to BG and E⁺ by the consistency of BG ∪ E⁺. That is,

∀e ∈ E⁺ [(BG ∪ T₀ ⊢ e) ∧ (BG ∪ T₀ ⊬ ē)].

In Step 3, OWS(T₀, E⁺, E⁻, AB, T₁). Consider the following cases.

T₀ = E⁺: GenRules in Step 2 did not produce new rules. In this case, T₁ = T₀ and AB = ∅. Then, ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊢ e) and ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊬ ē). Also, ∀e ∈ E⁻ (BG ∪ AB ∪ T₁ ⊬ ē) by the consistency of BG ∪ E.

T₀ ≠ E⁺: GenRules in Step 2 produced new rules. Here, we consider the case where there is only one rule C = (H :- B) in T₀ such that H is resolved with some literals from E⁻, but the result can be extended to the case where there are multiple such rules in T₀. Consider the following cases.

1. When there is no literal L such that L̄ ∈ E⁻ and BG ∪ T₀ ⊢ ∃L, then T₁ = T₀ and AB = ∅ hold. As in the above case, ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊢ e), ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊬ ē), and ∀e ∈ E⁻ (BG ∪ AB ∪ T₁ ⊬ ē) hold.

2. When there is a literal L such that L̄ ∈ E⁻ and BG ∪ T₀ ⊢ ∃L, let θ be an answer substitution for C in proving L. T₁ includes a rule (H :- B, \+ N), where N = abᵢ(V₁, ..., Vₙ), and Nθ is included in AB. Then, ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊢ e) and ∀e ∈ E⁺ (BG ∪ AB ∪ T₁ ⊬ ē) still hold, while we also have BG ∪ AB ∪ T₁ ⊬ Lθ. Putting every such Nθ into AB, it holds that ∀e ∈ E⁻ (BG ∪ AB ∪ T₁ ⊬ ē).

Hence, in either case,

∀e ∈ E⁺ [(BG ∪ AB ∪ T₁ ⊢ e) ∧ (BG ∪ AB ∪ T₁ ⊬ ē)]

and

∀e ∈ E⁻ (BG ∪ AB ∪ T₁ ⊬ ē).

In Step 4, Counter(E⁻, AB, T₁, T₂). By the construction of T₂, adding T₂ to BG ∪ AB ∪ T₁ does not change the derivability of any other literal in E = E⁺ ∪ E⁻, because every general rule in T₂ is of the form (H̄ :- N), where (H :- B, \+ N) ∈ T₁ and Nθ ∈ AB, so that H̄θ is derived only if Nθ ∈ AB. Hence,

∀e ∈ E [(BG ∪ AB ∪ T₁ ∪ T₂ ⊢ e) ∧ (BG ∪ AB ∪ T₁ ∪ T₂ ⊬ ē)].

In Step 5, Cancel(AB, BG, T₃). Here, GenRules(AB, BG, T₃) produces T₃ under the constraint that the set of ground abᵢ literals covered by T₃ is exactly the set of ground instances from AB. Then,

∀abᵢ ∈ AB (BG ∪ T₃ ⊢ abᵢ).

Moreover, BG ∪ T₃ never entails any new abᵢ literal which is not an instance of an element of AB.

Finally, put H = T₁ ∪ T₂ ∪ T₃. With the above results, it holds that

∀e ∈ E [(BG ∪ H ⊢ e) ∧ (BG ∪ H ⊬ ē)].

That is, H is complete and consistent with respect to BG and E. □
15 COOPERATION OF ABDUCTION
AND INDUCTION IN LOGIC
PROGRAMMING
Evelina Lamma, Paola Mello, Fabrizio Riguzzi,
Floriana Esposito, Stefano Ferilli, and Giovanni Semeraro

15.1 INTRODUCTION
This chapter proposes an approach for the cooperation of abduction and induction in the context of Logic Programming. We do not take a stance in the debate on the nature of abduction and induction (see Flach and Kakas, this volume); rather, we assume the definitions that are given in Abductive Logic Programming (ALP) and Inductive Logic Programming (ILP).

We present an algorithm where abduction helps induction by generating atomic hypotheses that can be used as new training examples or for completing an incomplete background knowledge. Induction helps abduction by generalizing abductive explanations.

A number of approaches for the cooperation of abduction and induction are presented in this volume (e.g., by Abe, Sakama, Inoue and Haneda, Mooney). Even though these approaches have been developed independently, they show remarkable similarities, leading one to think that there is a "natural way" for the integration of the two inference processes, as has been pointed out in the introductory chapter by Flach and Kakas.

The algorithm solves a new learning problem where background and target theory are abductive theories, and abductive derivability is used as the example coverage relation. The algorithm is an extension of a basic top-down algorithm adopted in ILP (Bergadano and Gunetti, 1996), where the proof procedure defined in (Kakas and Mancarella, 1990c) for abductive logic programs is used for testing the coverage of examples in place of the deductive proof procedure of Logic Programming.

The algorithm has been implemented in a system called LAP (Lamma et al., 1997) using SICStus Prolog 3#5. The code of the system and some of the examples shown in the chapter are available at http://www-lia.deis.unibo.it/Software/LAP/.

We also discuss how to learn abductive theories: we show that, in the case of complete knowledge, the rule part of an abductive theory can also be learned without abduction. Abduction is not essential to this task, but it is essential in the case of missing information, i.e., when the background theory is abductive.

The chapter is organized as follows: in Section 15.2 we recall the main concepts of Abductive Logic Programming and Inductive Logic Programming, and the definition of the abductive learning framework. Section 15.3 presents the learning algorithm. In Section 15.4 we apply the algorithm to the problems of learning from incomplete knowledge, learning theories for abductive diagnosis and learning exceptions to rules. Our approach to the integration of abduction and induction is discussed in detail and compared with the work of other authors in Section 15.5. Section 15.6 concludes and presents directions for future work.

15.2 ABDUCTIVE AND INDUCTIVE LOGIC PROGRAMMING


In this section we recall the definitions of abduction and induction in Logic Programming given by Flach and Kakas in the introductory chapter, and we add the satisfaction of integrity constraints to abduction and the avoidance of negative examples to induction.

15.2.1 Abductive Logic Programming


In a Logic Programming setting, an abductive theory (Kakas et al., 1997) is a triple (P, A, IC) where:

• P is a normal logic program;

• A is a set of abducible predicates;

• IC is a set of integrity constraints in the form of denials, i.e.:

← A₁, ..., Aₘ, not Aₘ₊₁, ..., not Aₘ₊ₙ.

Given an abductive theory T = (P, A, IC) and a formula G, an abductive explanation Δ for G is a set of ground atoms of predicates in A such that P ∪ Δ ⊨ G (Δ explains G) and P ∪ Δ ⊨ IC (Δ is consistent). When there exists an abductive explanation for G in T, we say that T abductively entails G and we write T ⊨_A G.
Negation As Failure (Clark, 1978) is replaced, in ALP, by Negation by Default (Eshghi and Kowalski, 1989), and is obtained by transforming the program into its positive version: for each predicate symbol p/arity in the program, a new predicate symbol not_p/arity is added to the set A, and the integrity constraint ← p(X̄), not_p(X̄)¹ is added to IC. Then, each negative literal not p(t̄) in the program is replaced by the literal not_p(t̄). Atoms of the form not_p(t̄) are called default atoms. For simplicity, in the following we will write abductive theories with Negation by Default and we will implicitly assume the transformation.

We define the complement l̄ of a literal l as

l̄ = not_p(x̄) if l = p(x̄), and l̄ = p(x̄) if l = not_p(x̄).

¹In the following, X̄ represents a tuple of variables and t̄ a tuple of terms.

In (Kakas and Mancarella, 1990c) a proof procedure for the positive version of abductive logic programs has been defined. This procedure (reported in the Appendix) starts from a goal G and a set of initial assumptions Δᵢ, and results in a set of consistent hypotheses (abduced literals) Δₒ such that Δₒ ⊇ Δᵢ and Δₒ is an abductive explanation of G. The proof procedure employs the notions of abductive and consistency derivations. Intuitively, an abductive derivation is the standard Logic Programming derivation suitably extended in order to consider abducibles. As soon as an abducible atom δ is encountered, it is added to the current set of hypotheses, and it must be proved that every integrity constraint containing δ is satisfied. To this purpose, a consistency derivation for δ is started. Every integrity constraint containing δ is considered and δ is removed from it. The constraints are satisfied if we prove that the resulting goals fail. In the consistency derivation, when an abducible is encountered, an abductive derivation for its complement is started in order to prove its falsity, so that the constraint is satisfied.

When the procedure succeeds for the goal G and the initial set of assumptions Δᵢ, producing as output the set of assumptions Δₒ, we say that T abductively derives G, or that G is abductively derived from T, and we write T ⊢_Δᵢ^Δₒ G. Negative atoms of the form not_a in the explanation have to be interpreted as "a must be false in the theory", "a cannot be assumed" or "a must be absent from any model of the theory".
In (Brogi et al., 1997) it has been proved that the proof procedure is sound and
weakly complete with respect to an abductive model semantics under a number of
restrictions:

• the abductive logic program must be ground,

• abducibles must not have a definition in the program,

• integrity constraints are denials with at least one abducible in each constraint.
The requirement that the program is ground is not restrictive in the case in which there
are no function symbols in the program and therefore the Herbrand universe is finite.
In this case, in fact, we can obtain a finite ground program from a non-ground one by
grounding in all possible ways the rules and constraints in the program.
The soundness and weak completeness of the procedure require the absence of any definition for the abducibles. However, when representing incomplete information, it is often the case that a partial definition is available for some predicate, expressing known information about that predicate. In this case, we can apply a transformation to T so that the resulting program T' has no definition for abducible predicates. This is done by introducing an auxiliary predicate δₐ/n for each abducible predicate a/n with a partial definition, and by adding the clause

a(X̄) ← δₐ(X̄).

Predicate a/n is no longer abducible, whereas δₐ/n is now abducible. If a(t̄) cannot be derived using the partial definition for a/n, it can be derived by abducing δₐ(t̄), provided that this is consistent with the integrity constraints.
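For instance, for the partially defined abducible male/1 of the examples below, the transformation can be sketched as follows (δₐ is rendered delta_male; an illustrative fragment, not LAP syntax):

% Before: male/1 is abducible and has the partial definition male(john).
% After the transformation, male/1 keeps its definition but is no longer
% abducible; the fresh predicate delta_male/1 becomes abducible instead.
:- dynamic delta_male/1.       % no clauses yet: calls simply fail
male(john).                    % partial definition (kept in P)
male(X) :- delta_male(X).      % bridging clause added by the transformation
% If male(t) is not derivable from the partial definition, it can still be
% obtained by abducing delta_male(t), subject to the integrity constraints.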
Usually, the observations to be explained are positive. However, by representing negation by default through abduction, we are able to explain negative observations as well. A negative observation is represented by a literal not_l, and an explanation for it can be generated by the abductive proof procedure applied to the goal ← not_l. The explanation of a negative observation has the following meaning: if all the atoms in the explanation for ← not_l are added to the theory, then l will not be derivable. This differs from the abductive frameworks proposed in this volume by Abe, for whom the explanation of negative observations is uncommon, and by Sakama, for whom the explanation of negative observations is not allowed.

15.2.2 Inductive Logic Programming


The ILP problem can be defined as follows (Bergadano and Gunetti, 1996):

Given:

• a set E⁺ of positive examples (atoms)

• a set E⁻ of negative examples (atoms)

• a logic program B (background knowledge)

• a set 𝒫 of possible programs

Find:

• a logic program P ∈ 𝒫 such that

- ∀e⁺ ∈ E⁺, B ∪ P ⊢ e⁺ (completeness)
- ∀e⁻ ∈ E⁻, B ∪ P ⊬ e⁻ (consistency).

Let us introduce some terminology. The sets E⁺ and E⁻ are called training sets. The program P that we want to learn is the target program, and the predicates which are defined in it are target predicates. The program B is called the background knowledge and contains the definitions of the predicates that are already known. We say that the learned program P covers an example e if P ∪ B ⊢ e. A theory that covers all positive examples is said to be complete, while a theory that does not cover any negative example is said to be consistent. The set 𝒫 is called the hypothesis space.

The language bias (or simply bias in this chapter) is a description of the hypothesis
space. Some systems require an explicit definition of this space and many formalisms
have been introduced in order to describe it (Bergadano and Gunetti, 1996). In order
to ease the implementation of the algorithm, we have considered only a very simple
bias in the form of a set of literals which are allowed in the body of clauses for target
predicates.

15.2.3 The new learning framework


We consider a new learning problem where both background and target theory are abductive theories, and the notion of deductive coverage is replaced by abductive coverage.

Let us first define the correctness of an abductive logic program T with respect to the training sets E⁺, E⁻. This notion replaces those of completeness and consistency for logic programs.

Definition 15.1 (Correctness) An abductive logic program T is correct, with respect to E⁺ and E⁻, iff there exists a Δ such that

T ⊢_∅^Δ E⁺, not_E⁻

where not_E⁻ = {not_e⁻ | e⁻ ∈ E⁻} and E⁺, not_E⁻ stands for the conjunction of each atom in E⁺ and not_E⁻.

Definition 15.2 (Abductive Learning Problem)

Given:

• a set of positive examples E⁺

• a set of negative examples E⁻

• an abductive theory T = (P, A, IC) as background theory

• a set 𝒫 of possible programs

Find:

A new abductive theory T' = (P ∪ P', A, IC) such that P' ∈ 𝒫 and T' is correct with respect to E⁺ and E⁻.

We say that a positive example e⁺ is covered if T ⊢_∅^Δ e⁺. We say that a negative example e⁻ is not covered (or ruled out) if T ⊢_∅^Δ not_e⁻. The abductive program that is learned can contain new rules (possibly containing abducibles in the body), but neither new abducible predicates² nor new integrity constraints.

We now give an example of an Abductive Learning Problem.

²If we exclude the abducible predicates added in order to deal with exceptions, as explained in Section 15.3.

Example 15.1 We want to learn a definition for the concept father from a background knowledge containing facts about the concepts parent, male and female. Knowledge about male and female is incomplete, and we can make assumptions about these concepts by considering them as abducibles.

Consider the following training sets and background knowledge:

E⁺ = {father(john, mary), father(david, steve)}
E⁻ = {father(john, steve), father(kathy, ellen)}
P = {parent(john, mary), male(john),
     parent(david, steve),
     parent(kathy, ellen), female(kathy)}
A = {male/1, female/1}
IC = {← male(X), female(X)}

Moreover, let the bias be

father(X,Y) ← α where α ⊆ {parent(X,Y), parent(Y,X), male(X), male(Y), female(X), female(Y)}

A solution to this Abductive Learning Problem is the theory T' = (P ∪ P', A, IC) where

P' = {father(X,Y) ← parent(X,Y), male(X)}

In fact, the condition on the solution

T' ⊢_∅^Δ E⁺, not_E⁻

is verified with

Δ = {male(david), not_female(david), not_male(kathy)}.

Note that, for the example father(david, steve), the abductive proof procedure returns the explanation {male(david), not_female(david)}, containing also the literal not_female(david), which is implied by the constraints and male(david). In this way, the explanation cannot later be extended in a way that violates the constraints.
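Under the sketch interpreter of Section 15.2.1, this problem instance can be encoded and queried as follows (an illustration only: the transformation for partially defined abducibles is omitted, so male(john) would itself be re-abduced rather than derived from the fact, and the simplified sketch does not record default literals such as not_female(david)):

rule(father(X,Y), [parent(X,Y), male(X)]).    % the learned rule P'
rule(parent(john,mary), []).   rule(male(john), []).
rule(parent(david,steve), []).
rule(parent(kathy,ellen), []). rule(female(kathy), []).
abducible(male(_)).
abducible(female(_)).
ic([male(X), female(X)]).

% ?- demo([father(david,steve)], [], Delta).
% Delta = [male(david)]
% ?- demo([father(kathy,ellen)], [], Delta).
% fails: abducing male(kathy) would violate the denial, given female(kathy)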
Differently from the ILP problem, we require the conjunction of the examples to be derivable, instead of each example singularly. This is done in order to avoid abductive explanations for different examples being inconsistent with each other, as shown in the next example.

Example 15.2 Consider the following abductive theory:

P = {p ← a.
     q ← b.}
A = {a/0, b/0}
IC = {← a, b.}

and consider two positive examples p and q. Taken singularly, they are both abductively derivable from the theory with, respectively, the explanations {a} and {b}. However, these explanations are inconsistent with each other because of the integrity constraint ← a, b; therefore the conjunction p, q will not be abductively derivable in the theory.
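The same sketch reproduces this behaviour directly (illustrative encoding):

rule(p, [a]).
rule(q, [b]).
abducible(a).
abducible(b).
ic([a, b]).

% ?- demo([p], [], Delta).      % Delta = [a]
% ?- demo([q], [], Delta).      % Delta = [b]
% ?- demo([p, q], [], Delta).   % fails: a and b cannot be assumed together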

procedure LearnAbdLP(
  inputs: E⁺, E⁻: training sets,
    T = (P, A, IC): background abductive theory;
  outputs: P': learned theory, Δ: abduced literals)

P' := ∅
Δ := ∅
repeat (covering loop)
  GenerateRule(in: T, E⁺, E⁻, P', Δ; out: Rule, E⁺_Rule, Δ_Rule)
  Add to E⁺ all the positive literals of target predicates in Δ_Rule
  Add to E⁻ all the atoms corresponding to
    negative literals of target predicates in Δ_Rule
  E⁺ := E⁺ − E⁺_Rule
  P' := P' ∪ {Rule}
  Δ := Δ ∪ Δ_Rule
until E⁺ = ∅ (Completeness stopping criterion)
output P'

Figure 15.1 The covering loop.

15.3 AN ALGORITHM FOR LEARNING ABDUCTIVE LOGIC PROGRAMS


In this section, we present an algorithm that is able to learn abductive logic programs
according to Definition 15.2. It evolved from the one we proposed in (Esposito et al.,
1996).
The algorithm is obtained from the basic top-down ILP algorithm (Bergadano and Gunetti, 1996) by replacing, in the coverage test of examples, the Prolog proof procedure with the abductive proof procedure.

Like the basic algorithm, our algorithm consists of two nested loops: the covering loop (Figure 15.1) and the specialization loop (Figure 15.2). At each iteration of the covering loop, a new clause is generated such that it covers at least one positive example and no negative one. Positive examples covered by the rule are removed from the training set, and a new iteration of the covering loop is started. The algorithm ends when the set of positive examples becomes empty. The new clause is generated in the specialization loop: the clause is initially assigned an empty body, and literals are added to it until the clause does not cover any negative example while still covering at least one positive one. The basic top-down algorithm is extended in the following respects.

First, in order to determine the positive examples E⁺_Rule covered by the generated rule Rule (procedure TestCoverage in Figure 15.3), an abductive derivation is started for each positive example. This derivation results in a (possibly empty) set of abduced literals. We give as input to the abductive procedure also the set of literals abduced in the derivations of previous examples. In this way, we ensure that assumptions made during the derivation of the current example are consistent with the assumptions for other examples.

Second, in order to check that no negative example is covered (E⁻_Rule = ∅ in Figure 15.2) by the generated rule Rule, an abductive derivation is started for the default

procedure GenerateRule(
  inputs: T, E⁺, E⁻, P', Δ;
  outputs: Rule: rule,
    E⁺_Rule: positive examples covered by Rule,
    Δ_Rule: abduced literals)

Select a target predicate p
Let Rule := p(X̄) ← true.
repeat (specialization loop)
  select a literal L from the language bias
  add L to the body of Rule
  TestCoverage(in: Rule, T, P', E⁺, E⁻, Δ;
    out: E⁺_Rule, E⁻_Rule, Δ_Rule)
  if E⁺_Rule = ∅
    backtrack to a different choice for L
until E⁻_Rule = ∅ (Consistency stopping criterion)
output Rule, E⁺_Rule, Δ_Rule

Figure 15.2 The specialization loop.

procedure TestCoverage(
  inputs: Rule, T, P', E⁺, E⁻, Δ;
  outputs: E⁺_Rule, E⁻_Rule: examples covered by Rule,
    Δ_Rule: new set of abduced literals)

E⁺_Rule := E⁻_Rule := ∅
Δᵢₙ := Δ
for each e⁺ ∈ E⁺ do
  if AbdDer(← e⁺, (P ∪ P' ∪ {Rule}, A, IC), Δᵢₙ, Δₒᵤₜ)
    succeeds then add e⁺ to E⁺_Rule; Δᵢₙ := Δₒᵤₜ
end for
for each e⁻ ∈ E⁻ do
  if AbdDer(← not_e⁻, (P ∪ P' ∪ {Rule}, A, IC), Δᵢₙ, Δₒᵤₜ)
    succeeds then Δᵢₙ := Δₒᵤₜ
  else add e⁻ to E⁻_Rule
end for
Δ_Rule := Δₒᵤₜ − Δ
output E⁺_Rule, E⁻_Rule, Δ_Rule

Figure 15.3 Coverage testing.


COOPERATION OF ABDUCTION AND INDUCTION IN LOGIC PROGRAMMING 241

negation of each negative example (← not_e⁻). Also in this case, each derivation starts from the set of abducibles previously assumed. The set of assumptions is initialized to the empty set at the beginning of the computation, and is gradually extended as it is passed on from derivation to derivation. This is done across different clauses as well.

Third, some abducible predicates may also be target predicates, i.e., predicates for which we want to learn a definition. To this purpose, after the generation of each clause, the abduced atoms of target predicates are added to the training set, so that they become new training examples. For each abduced literal l of a target predicate: if l is positive, l is added to E⁺; if l is negative, l̄ is added to E⁻.

In order to achieve consistency in rule specialization (Figure 15.2), a rule can be specialized by adding either a non-abducible literal or an abducible one. However, the system does not need to be aware of which kind of literal it is adding to the rule, since the abductive proof procedure takes care of both cases. When adding an abducible atom a(X̄) that has no definition in the background, the rule becomes consistent, because each negative example p(t̄⁻) is uncovered by assuming not_a(t̄⁻), and each previously covered positive example p(t̄⁺) is still covered by assuming a(t̄⁺). If the abducible has a partial definition, some positive examples will be covered without abduction and others with abduction, while some negative examples will be uncovered with abduction and others will remain covered.

We prefer to first try adding non-abducible literals to the rule, since complete information is available about them and therefore the coverage of examples is more certain.

The algorithm also performs the task of learning exceptions to rules. This task is a difficult one, because exceptions limit the generality of the rules, since they represent specific cases. In order to deal with exceptions, a number of (new) auxiliary abducible predicates is provided, so that the system can use them for rule specialization when the bias offers no standard literal or abducible literal with a partial definition that makes a rule for a target predicate consistent.

The algorithm can be extended in order to learn not only from examples but also from integrity constraints on target predicates. The details of this extension, together with an example application, are described in Section 15.4.4.

Note that the system is not able to learn full abductive theories, including new integrity constraints as well. In order to do this, (Kakas and Riguzzi, 1997) proposed the use of systems that learn from interpretations, such as Claudien (De Raedt and Bruynooghe, 1993) and ICL (De Raedt and Van Laer, 1995).

15.4 EXAMPLES

Two interesting applications of the integration are learning from incomplete knowledge and learning exceptions. When learning from incomplete knowledge, abduction completes the information available in the background knowledge. When learning exceptions, instead, assumptions are used as new training examples in order to generate a definition for the class of exceptions (Section 15.4.3).

When learning from incomplete data, what to do with the assumptions depends on the type of theory we are learning. When learning a non-abductive theory, abduction completes the information available in the background knowledge, and it is therefore natural to add the assumptions to the theory at the end of the learning process, thus performing a form of theory revision (Section 15.4.1). When learning an abductive theory, the assumptions made do not have to be added to the theory, since they can be guessed by abduction in the final theory (Section 15.4.2).

15.4.1 Learning from incomplete knowledge


Abduction is particularly suitable for modelling domains in which there is incomplete knowledge. In this section, we show how the algorithm finds the solution of the learning problem presented in Example 15.1.

For the sake of clarity, we repeat the problem statement. We want to learn a definition for the concept father from a background knowledge containing facts about the concepts parent, male and female, with male and female being incompletely defined.

Consider the abductive background theory B = (P, A, IC) and training sets:

P = {parent(john, mary), male(john),
     parent(david, steve),
     parent(kathy, ellen), female(kathy)}
A = {male/1, female/1}
IC = {← male(X), female(X)}
E⁺ = {father(john, mary), father(david, steve)}
E⁻ = {father(john, steve), father(kathy, ellen)}

Moreover, let the bias be

father(X,Y) ← α where α ⊆ {parent(X,Y), parent(Y,X), male(X), male(Y), female(X), female(Y)}

The program must first be transformed into its positive version, and then into a program where abducibles have no definition, as shown in Section 15.2.1. For simplicity, we omit the two transformations, and we assume that the inverse transformations are applied to the learned program.

At the first iteration of the specialization loop, the algorithm generates the rule

father(X,Y) ←.

which covers all positive examples, but also all negative ones. Therefore another iteration is started and the literal parent(X,Y) is added to the rule:

father(X,Y) ← parent(X,Y).

This clause also covers all positive examples, but still covers the negative example father(kathy, ellen). Note that up to this point no abducible literal has been added to the rule, therefore no abduction has been made and the set Δ is still empty. Now, an abducible literal, male(X), is added to the rule, obtaining

father(X,Y) ← parent(X,Y), male(X).

At this point the coverage of the examples is tested. father(john, mary) is covered without abduction, while father(david, steve) is covered with the abduction of {male(david), not_female(david)}.

Then the coverage of the negative examples is tested by starting the abductive derivations

← not_father(john, steve).
← not_father(kathy, ellen).

The first derivation succeeds with an empty explanation, while the second succeeds abducing not_male(kathy), which is consistent with the fact female(kathy) and the constraint ← male(X), female(X). Now no negative example is covered, therefore the specialization loop ends. No target atom is in Δ, therefore no example is added to the training set. The positive examples covered by the rule are removed from the training set, which becomes empty. Therefore the covering loop also terminates and the algorithm ends, returning the rule

father(X,Y) ← parent(X,Y), male(X).

and the assumptions

Δ = {male(david), not_female(david), not_male(kathy)}.

At this point, the assumptions made are added to the background knowledge in order to complete the theory, thus performing a kind of theory revision. Only the positive assumptions are added to the resulting theory, since the negative assumptions can be derived by Negation As Failure³. In this case, only male(david) ← is added to the theory, while not_female(david) and not_male(kathy) can be derived by using Negation As Failure.

³Since the Prolog proof procedure will be used with the final theory, Negation As Failure will replace Negation by Default.

15.4.2 Learning abductive theories


In this section, we apply the algorithm to the problem of learning an abductive theory for diagnosis from an incomplete background knowledge. When learning an abductive theory, the observations to be explained are represented by facts in the training set, and the corresponding known explanations by facts in the background knowledge.

It must be observed that, when the available information on the explanations is complete, it is not necessary to use our algorithm for learning the rule part of the theory: any standard ILP system can be used. In fact, if all the explanations are known, then every positive example can be entailed by the resulting theory without the need for abduction. Therefore, we argue that the main use of abduction in learning is for completing incomplete knowledge.

In the presence of incomplete information on the explanations, abduction is necessary to generate the missing (parts of) explanations, as in the general case of learning from incomplete knowledge (see Section 15.4.1). However, differently from that case, the explanations should not be added to the resulting theory. In fact, explanations can be obtained anew from the target theory by means of abductive reasoning.
Let us consider the case of an abductive background theory containing the following clauses, abducibles and constraints:

P = {flat_tyre(bike1).
     circular(bike1).
     tyre_holds_air(bike3).
     circular(bike4).
     tyre_holds_air(bike4).}

A = {flat_tyre/1, broken_spokes/1}

IC = {← flat_tyre(X), tyre_holds_air(X).
      ← circular(X), broken_spokes(X).}

E⁺ = {wobbly_wheel(bike1), wobbly_wheel(bike2), wobbly_wheel(bike3)}

E⁻ = {wobbly_wheel(bike4)}
In the specialization loop, the algorithm generates the clause:

wobbly_wheel(X) ← flat_tyre(X).

Then the clause is tested. This clause covers wobbly_wheel(bike1), because flat_tyre(bike1) is specified in the background knowledge, while the second example, wobbly_wheel(bike2), is covered by assuming

{flat_tyre(bike2), not_tyre_holds_air(bike2)}.

The example wobbly_wheel(bike3), however, cannot be covered: in fact, we cannot assume flat_tyre(bike3), since this is inconsistent with the integrity constraint ← flat_tyre(X), tyre_holds_air(X) and the fact tyre_holds_air(bike3). Then, we check that not_wobbly_wheel(bike4) is derivable under the hypotheses set. This derivation succeeds by abducing not_flat_tyre(bike4).

The algorithm adds the clause to the current theory and removes the covered examples from E⁺. A new iteration of the covering loop is then started with:

E⁺ = {wobbly_wheel(bike3)}
E⁻ = {wobbly_wheel(bike4)}
Δ = {flat_tyre(bike2), not_tyre_holds_air(bike2), not_flat_tyre(bike4)}.

In order to cover the remaining positive example wobbly_wheel(bike3), the system generates the clause:

wobbly_wheel(X) ← broken_spokes(X).

which covers the example by abducing

{broken_spokes(bike3), not_circular(bike3)}.

In fact, these assumptions are consistent with the integrity constraint ← circular(X), broken_spokes(X). As in the previous case, the negative example is ruled out by assuming not_broken_spokes(bike4). At this point the algorithm terminates, because E⁺ has become empty.

The resulting set of assumptions constitutes a set of diagnoses for the devices considered in the training set. The assumptions are not added to the resulting theory, since they can be generated by abductive reasoning at any time.
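The learned theory can be encoded for the sketch interpreter of Section 15.2.1 and then queried to compute diagnoses on demand (an illustration: default literals such as not_flat_tyre(bike4), used above to rule out the negative example, are outside the simplified sketch):

rule(wobbly_wheel(X), [flat_tyre(X)]).        % learned rules
rule(wobbly_wheel(X), [broken_spokes(X)]).
rule(flat_tyre(bike1), []).                   % background facts
rule(circular(bike1), []).
rule(tyre_holds_air(bike3), []).
rule(circular(bike4), []).
rule(tyre_holds_air(bike4), []).
abducible(flat_tyre(_)).
abducible(broken_spokes(_)).
ic([flat_tyre(X), tyre_holds_air(X)]).
ic([circular(X), broken_spokes(X)]).

% ?- demo([wobbly_wheel(bike3)], [], Delta).
% Delta = [broken_spokes(bike3)]
% flat_tyre(bike3) is rejected: together with the fact tyre_holds_air(bike3)
% it would violate the first denial.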

15.4.3 Learning rules with exceptions


The task of learning exceptions to rules is a difficult one, because exceptions limit the generality of the rules, since they represent specific cases. In the following, we discuss how our algorithm performs the task of learning exceptions to rules by using a number of auxiliary abducible predicates, and we show an example of its behaviour.

In order to learn exceptions to rules, when the bias offers no standard literal or abducible literal with a partial definition that makes a rule for a target predicate p/n consistent, the algorithm specializes the rule by adding a new abducible literal not_abnormᵢ(X̄). This addition transforms the rule into a default rule that can be applied in all "normal" (or non-abnormal) cases. The refined rule becomes consistent by abducing abnormᵢ(t̄⁻) for every negative example p(t̄⁻). Positive examples, instead, will be covered by abducing not_abnormᵢ(t̄⁺) for every positive example p(t̄⁺). These assumptions are then added to the training set, and are used to learn a definition for abnormᵢ/n that describes the class of exceptions. If there are exceptions to the exceptions, the system adds a new literal not_abnormⱼ/n to the body of the rule for abnormᵢ/n, and the process is iterated. Therefore, we are able to learn hierarchies of exceptions.
The above technique is implemented by including a number of predicates of the form not_abnormᵢ/n in the bias of each target predicate p/n that may have exceptions. Moreover, abnormᵢ/n and not_abnormᵢ/n are added to the set of abducible predicates, and the constraint

← abnormᵢ(X̄), not_abnormᵢ(X̄).

is added to the (positive version of the) background knowledge. The predicates abnormᵢ/n are considered as target predicates, and a bias must be defined for them. Since we may have exceptions to exceptions, we may also include a number of literals of the form not_abnormⱼ(X̄) in the bias for abnormᵢ/n.
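In the representation of our sketches, the additions for a target predicate p/1 with i = 1 would read as follows (illustrative encoding):

abducible(abnorm1(_)).
abducible(not_abnorm1(_)).
ic([abnorm1(X), not_abnorm1(X)]).   % added to the positive version of BG
% not_abnorm1(X) is also made available in the bias for p/1, and
% abnorm1/1 is itself treated as a target predicate with its own bias.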
The example which follows is inspired by (Dimopoulos and Kakas, 1995), and shows how exceptions are dealt with. Let us consider the following background abductive theory T = (P, A, IC) and training sets:

P = {bird(X) ← penguin(X).
     penguin(X) ← superpenguin(X).
     bird(a).
     bird(b).
     penguin(c).
     penguin(d).
     superpenguin(e).
     superpenguin(f).}
A = {abnorm₁/1, abnorm₂/1}
IC = {}

E⁺ = {flies(a), flies(b), flies(e), flies(f)}

E⁻ = {flies(c), flies(d)}

The positive version of the theory will also contain the constraints:

← abnorm₁(X), not_abnorm₁(X).
← abnorm₂(X), not_abnorm₂(X).

Moreover, let the bias be:

flies(X) ← α where
  α ⊆ {bird(X), penguin(X), superpenguin(X), not_abnorm₁(X)}
abnorm₁(X) ← β where
  β ⊆ {bird(X), penguin(X), superpenguin(X), not_abnorm₂(X)}
abnorm₂(X) ← γ where
  γ ⊆ {bird(X), penguin(X), superpenguin(X)}
The algorithm starts by generating the following rule in the specialization loop (R₁):

flies(X) ← bird(X).

which covers all positive examples, but also all negative ones. In order to rule out the negative examples, the abducible literal not_abnorm₁(X) is added to the body of R₁, obtaining R₂:

flies(X) ← bird(X), not_abnorm₁(X).

Now the theory is correct, and the set of assumptions resulting from the derivations of the positive and (negated) negative examples is

{not_abnorm₁(a), not_abnorm₁(b),
 not_abnorm₁(e), not_abnorm₁(f),
 abnorm₁(c), abnorm₁(d)}.

Since abnorm₁/1 is a target predicate, these assumptions become new training examples, yielding:

E⁺ = {abnorm₁(c), abnorm₁(d)}
E⁻ = {abnorm₁(a), abnorm₁(b), abnorm₁(e), abnorm₁(f)}

Therefore, a new iteration of the covering loop is started, in which the following clause is generated (R₃):

abnorm₁(X) ← penguin(X), not_abnorm₂(X).

The rule is correct, and the set of assumptions resulting from the derivations of the positive and (negated) negative examples is

{not_abnorm₂(c), not_abnorm₂(d),
 abnorm₂(e), abnorm₂(f)}.

Note that no assumptions are generated for the derivations

← not_abnorm₁(a)
← not_abnorm₁(b)

since penguin(a) and penguin(b) are false.

After the addition of the new assumptions, the training sets become

E⁺ = {abnorm₂(e), abnorm₂(f)}
E⁻ = {abnorm₂(c), abnorm₂(d)}

For this training set, the algorithm produces the rule

abnorm₂(X) ← superpenguin(X).

which is correct, and no assumption is generated for covering the examples. The algorithm now ends by producing the following program:

flies(X) ← bird(X), not_abnorm₁(X).
abnorm₁(X) ← penguin(X), not_abnorm₂(X).
abnorm₂(X) ← superpenguin(X).

We try to generalize exceptions in order to treat them as a whole, possibly leading to the discovery of exceptions to exceptions. In this way, we can learn hierarchies of exceptions.
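Once the exception definitions are complete, the learned hierarchy behaves like an ordinary program with negation as failure. The following runnable recast (a simplification in which the not_abnormᵢ default literals are replaced by \+; adequate only because the abnormᵢ definitions are now total) reproduces the intended classifications:

bird(X) :- penguin(X).
penguin(X) :- superpenguin(X).
bird(a).  bird(b).
penguin(c).  penguin(d).
superpenguin(e).  superpenguin(f).

flies(X)   :- bird(X), \+ abnorm1(X).
abnorm1(X) :- penguin(X), \+ abnorm2(X).
abnorm2(X) :- superpenguin(X).

% ?- flies(a).   % true: ordinary birds fly
% ?- flies(c).   % fails: penguins are abnormal birds
% ?- flies(e).   % true: superpenguins are exceptions to the exception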

15.4.4 Learning from integrity constraints on target predicates

We now present a way in which training examples can be extracted from integrity constraints on target predicates. In this way, the algorithm is able to learn not only from examples but also from integrity constraints, as is done in (De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Muggleton, 1995).

Let us consider the abductive program T' generated in the previous section, and add to it the following user-defined constraint I on the target predicates:

← rests(X), plays(X).

Consider now the new training sets:

E⁺ = {plays(a), plays(b), rests(e), rests(f)}
E⁻ = {}

In this case, the information about the target predicates comes not only from the training set, but also from the integrity constraints. These constraints contain target predicates, and therefore they differ from those usually given in the background knowledge, which contain only non-target predicates, either abducible or non-abducible. The generalization process is not limited by negative examples but by the integrity constraints. Suppose that we generalize the two positive examples for plays/1 into plays(X). This means that plays(X) is true for all X. However, this is inconsistent with the integrity constraint I, because plays(X) cannot be true for e and f.

The information contained in this type of integrity constraint must be made available in a form that is exploitable by our learning algorithm, i.e., it must be transformed into new training examples, as is done in theory revision systems (De Raedt and

Bruynooghe, 1992a; Ade et al., 1994). When the knowledge base violates a newly supplied integrity constraint, these systems extract one example from the constraint and revise the theory on the basis of it: in (De Raedt and Bruynooghe, 1992a) the example is extracted by querying the user about the truth values of the literals in the constraint, while in (Ade et al., 1994) the example is automatically selected by the system.

In our approach, one or more examples are generated from the constraints on target predicates using the abductive proof procedure. The consistency of each available example is checked against the constraints, and assumptions are possibly made to ensure consistency. Assumptions about target predicates are considered as new negative or positive examples.

In the previous case, we start an abductive derivation for

← plays(a), plays(b), rests(e), rests(f)

Since plays/1 and rests/1 are abducible, a consistency derivation is started for each atom. Consider plays(a): in order to ensure consistency with the constraint ← plays(X), rests(X), the literal not_rests(a) is abduced. The same is done for the other literals in the goal, obtaining the set of assumptions

{not_rests(a), not_rests(b), not_plays(e), not_plays(f)}

which is then transformed into the set of negative examples

E⁻ = {rests(a), rests(b), plays(e), plays(f)}

Now the learning process applied to the new training set generates the following correct rules:

plays(X) ← bird(X), not_abnorm₁(X).
rests(X) ← superpenguin(X).

In this way, we can learn not only from (positive and negative) examples but also from integrity constraints.
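The extraction step itself is simple. A hypothetical helper in the style of the earlier sketches, for binary denials relating two target literals:

% neg_from_ic(+DenialBody, +Example, -Neg): given the body [A,B] of a
% denial <- A, B on two target literals and a covered positive example
% matching one of them, the other literal becomes a new negative example.
neg_from_ic([A, B], Example, Neg) :-
    (   Example = A, Neg = B
    ;   Example = B, Neg = A
    ).

% ?- neg_from_ic([plays(X), rests(X)], plays(a), Neg).
% Neg = rests(a)
% ?- neg_from_ic([plays(X), rests(X)], rests(e), Neg).
% Neg = plays(e)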

15.5 INTEGRATION OF ABDUCTION AND INDUCTION


In this section, we describe our approach to the integration of abduction and induction, and we relate it to other work.

In our approach, abduction is used in induction for making assumptions about unknown facts in order to cover examples, as proposed in this volume by Abe, Sakama, and Mooney, and also in (De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995; Dimopoulos and Kakas, 1996b; Kanai and Kunifuji, 1997). Abducibles can be present in the body of background rules, or can be added to the body of a target rule for specialization. These assumptions can concern background abducible predicates, with an empty or partial definition, or target predicates, for which a definition must be learned, as in the chapter by Mooney and in (De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995; Kanai and Kunifuji, 1997). In this second case, the assumptions are also added to the training set, so that they become new training examples. Induction is then used in order to generalize the assumptions. In both cases, the set of assumptions is stored and gradually extended in order to ensure consistency among the examples. At the end of the computation, the resulting set of assumptions can be discarded, if we are learning an abductive theory, or added to the theory, if we want to complete an incomplete theory.

In this way, we obtain a particular instantiation of the cycle of abductive and inductive knowledge development described by Flach and Kakas in this volume. In our approach, abduction helps induction by generating suitable background knowledge or training examples, while induction helps abduction by generalizing the assumptions made.
Abe, Sakama, Mooney (this volume) and (Dimopoulos and Kakas, 1996b; De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995; Kanai and Kunifuji, 1997) agree on using abduction for covering examples. What to do with the generated assumptions then depends on the task being performed. In theory revision, one can revise overspecific rules by dropping abducible literals from the body of rules (Mooney, Sakama, this volume) or by adding the explanations to the theory (De Raedt and Bruynooghe, 1992a; Ade et al., 1994). In both theory revision and batch learning, one can learn a definition for the abducible predicates by considering the assumptions as examples for the abducible predicates (Mooney, this volume) and (Kanai and Kunifuji, 1997; De Raedt and Bruynooghe, 1992a; Ade et al., 1994; Ade and Denecker, 1995).

Another approach to the use of abduction in learning is described in (Dimopoulos and Kakas, 1996b), where each example is given together with a set of observations that are related to it. Abduction is then used to explain the observations in order to generate relevant background data for the inductive generalization.
Different positions exist on the treatment of negative observations. According to Abe (p. 177), "it is very rare for hypotheses to be generated when observations are negative"; therefore he does not consider this possibility. To avoid the difficulties of learning from positive examples only, he adopts Abductive Analogical Reasoning to generate abductive hypotheses under similar observations: in this case, the generated hypotheses satisfy the Subset Principle, and it is possible to learn from positive examples only.

Sakama (this volume), instead, revises a database that covers negative observations by revising only the extensional part of the database: by abduction, he finds the facts that are responsible for the coverage of the negative example/observation, and he replaces each such fact A ← with the clause A ← δ, where δ is a new abducible. Semantically, A ← δ (which is equivalent to A ∨ ¬δ) represents two possible worlds, one in which A is true and the other in which A is false. In this way the inconsistency is removed but, differently from systems where the theory is revised by removing A, information about the previous state is kept, thus allowing the database to be restored to its original state.

Our approach to the treatment of negative examples is similar to the work by Mooney (this volume) and (Kanai and Kunifuji, 1997). It differs from Abe's work (this volume), since we do generate explanations for negative examples, and it is a generalization of Sakama's, since we are able to revise not only facts but also rules. The kind of revision we perform is very similar: instead of adding an abnormality literal in the head of the fact, we add a non-abnormality literal to the body. Moreover, Sakama's procedure is effective for dealing with single exceptions rather than classes of exceptions, because it treats each exception singularly by adding a new abducible literal to a fact. Instead, we try to generalize exceptions in order to treat them as a whole, possibly leading to the discovery of a hierarchy of exceptions.

In his chapter, Christiansen proposes a reversible demo predicate that is able to generate the (parts of the) program that are necessary for deriving the goal. Constraint Logic Programming techniques are used for specifying conditions on the missing program parts in a declarative way. The approach is highly general, being able to perform either induction or abduction depending on which program part is missing: general rules or specific facts. The author also shows that the system is able to learn exceptions to rules, though not hierarchies of exceptions.

15.6 CONCLUSIONS AND FUTURE WORK


We have proposed an algorithm where abduction and induction cooperate in order to
improve their power. Abduction helps induction by generating suitable background
knowledge or training examples, while induction helps abduction by generalizing the
assumptions made.
The algorithm solves an extended ILP problem in which both the background and
target theories are abductive theories, and coverage by deduction is replaced with cov-
erage by abduction. The algorithm is obtained from the basic top-down ILP algorithm
by substituting, for the coverage testing, Prolog proof procedure with an abductive
proof procedure. It can be applied to the problems of learning from incomplete knowl-
edge, learning abductive theories and learning rules with exceptions.
Future work will be devoted to evaluate the applicability of these ideas on real world
problems and to extend the system for learning full abductive theories, including as
well integrity constraints. The integration of the algorithm with other systems for
learning constraints, such as Claudien (De Raedt and Bruynooghe, 1993) and ICL (De
Raedt and Van Laer, 1995), as proposed in (Kakas and Riguzzi, 1997), seems very
promising in this respect.
Another interesting future line of research consists in investigating the idea, pro-
posed by Inoue and Haneda in their chapter in this volume, of learning logic programs
with abduction and two kinds of negation (e.g., default negation and explicit negation).

Acknowledgments
This research was partially funded by the MURST Project 40% "Rappresentazione della conoscenza
e meccanismi di ragionamento".

Appendix: Abductive proof procedure


In the following we recall the abductive proof procedure used by our algorithm. The
procedure is taken from (Kakas and Mancarella, 1990c). It is composed by two phases:
abductive derivation and consistency derivation.

Abductive derivation
An abductive derivation from (GI ~t) to (Gn ~n) in (P,Ab,IC) via a selection ruleR
COOPERATION OF ABDUCTION AND INDUCTION IN LOGIC PROGRAMMING 251

is a sequence
(GI ~I),(G2~2), . .. ,(Gn~n)

such that each G; has the form+- L1, ... ,Lt. R(G;) = Lj and (Gi+l ~i+l) is obtained
according to one of the following rules:
(A1) If Lj is not abducible or default, then Gi+l = C and ~i+l =~;where Cis the
resolvent of some clause in P with G; on the selected literal L j;
(A2) If L j is abducible or default and L j E ~i then
Gi+l =+- L1, . .. ,Lj-I,Lj+l•· ·· ,Lk and ~i+l = ~;;

(A3) If Lj is abducible or default, Lj ~~;and Lj ~~;and there exists a consistency


derivation from (L j ~;U {L j}) to ({}~')then Gi+l =+- L1, .. . ,L j-1 ,L j+l, .. . ,Lk
and~i+l =~'.

Steps (A1) and (A2) are SLD-resolution steps with the rules of P and abductive or
default hypotheses, respectively. In step (A3) a new abductive or default hypotheses
is required and it is added to the current set of hypotheses provided it is consistent.

Consistency derivation
A consistency derivation for an abducible or default literal α from (α Δ1) to (Fn Δn)
in (P, Ab, IC) is a sequence

(F1 Δ1), (F2 Δ2), ..., (Fn Δn)

where:

(Ci) F1 is the union of all goals of the form ← L1, ..., Ln obtained by resolving the
abducible or default α with the denials in IC, with no such goal being the empty goal ←;

(Cii) for each i > 1, Fi has the form {← L1, ..., Lk} ∪ Fi′ and, for some j = 1, ..., k,
(Fi+1 Δi+1) is obtained according to one of the following rules:

(C1) If Lj is not abducible or default, then Fi+1 = C′ ∪ Fi′ where C′ is the set
of all resolvents of clauses in P with ← L1, ..., Lk on the literal Lj and
← ∉ C′, and Δi+1 = Δi;

(C2) If Lj is abducible or default, Lj ∈ Δi and k > 1, then
Fi+1 = {← L1, ..., Lj−1, Lj+1, ..., Lk} ∪ Fi′
and Δi+1 = Δi;

(C3) If Lj is abducible or default and its complement L̄j ∈ Δi, then Fi+1 = Fi′ and Δi+1 = Δi;

(C4) If Lj is abducible or default, Lj ∉ Δi and L̄j ∉ Δi, and there exists an abductive
derivation from (← L̄j Δi) to (← Δ′), then Fi+1 = Fi′ and Δi+1 = Δ′.

In case (C1) the current branch splits into as many branches as the number of resolvents
of ← L1, ..., Lk with the clauses in P on Lj. If the empty clause is one of these
resolvents, the whole consistency check fails. In case (C2) the goal under consideration
is made simpler if the literal Lj belongs to the current set of hypotheses Δi. In case
(C3) the current branch is already consistent under the assumptions in Δi, and this
branch is dropped from the consistency checking. In case (C4) the current branch of
the consistency search space can be dropped provided ← L̄j is abductively provable.
Given a query L, the procedure succeeds, and returns the set of abducibles Δ, if there
exists an abductive derivation from (← L {}) to (← Δ). With abuse of terminology,
in this case, we also say that the abductive derivation succeeds.
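To make the interplay between the two phases concrete, the following is a minimal, ground-only Python sketch of the procedure above. It is our own illustration under simplifying assumptions, not the procedure of (Kakas and Mancarella, 1990c) itself: it handles only positive abducibles with paired complements, always selects the leftmost literal (the full procedure may select any), and assumes an acyclic program. All function names are ours.

# A minimal, ground-only sketch of the two-phase procedure above.
# Assumptions (ours): P maps each non-abducible atom to a list of
# bodies (lists of atoms); IC is a list of denials, each a list of
# atoms; is_abducible tests abducibility; the complement of an
# abducible a is modelled as 'not_' + a.

def complement(a):
    return a[4:] if a.startswith('not_') else 'not_' + a

def derive(goal, delta, P, is_abducible, IC):
    """Abductive derivation: returns an extended Delta, or None."""
    if not goal:
        return delta
    L, rest = goal[0], goal[1:]
    if not is_abducible(L):                       # (A1) resolve with P
        for body in P.get(L, []):
            d = derive(body + rest, delta, P, is_abducible, IC)
            if d is not None:
                return d
        return None
    if L in delta:                                # (A2) already assumed
        return derive(rest, delta, P, is_abducible, IC)
    if complement(L) not in delta:                # (A3) assume L, check IC
        d = consistent(L, delta | {L}, P, is_abducible, IC)
        if d is not None:
            return derive(rest, d, P, is_abducible, IC)
    return None

def consistent(a, delta, P, is_abducible, IC):
    """Consistency derivation: every denial containing a must fail."""
    for ic in IC:
        if a not in ic:
            continue
        goal = [l for l in ic if l != a]          # (Ci) resolve a away
        if not goal:
            return None                           # empty goal: a violates IC
        delta = fail_goal(goal, delta, P, is_abducible, IC)
        if delta is None:
            return None
    return delta

def fail_goal(goal, delta, P, is_abducible, IC):
    L, rest = goal[0], goal[1:]
    if not is_abducible(L):                       # (C1) split on P-clauses
        for body in P.get(L, []):
            if not body + rest:
                return None                       # empty resolvent: fail
            delta = fail_goal(body + rest, delta, P, is_abducible, IC)
            if delta is None:
                return None
        return delta
    if L in delta:                                # (C2) simplify the goal
        return fail_goal(rest, delta, P, is_abducible, IC) if rest else None
    if complement(L) in delta:                    # (C3) branch consistent
        return delta
    return derive([complement(L)], delta, P, is_abducible, IC)    # (C4)

# Example: with P = {'p': [['a']]}, IC = [['a', 'b']] and abducibles
# a, b, not_a, not_b,
#   derive(['p'], set(), P,
#          lambda x: x in ('a', 'b', 'not_a', 'not_b'), IC)
# returns {'a', 'not_b'}: a is assumed, and b's complement is assumed
# so that the denial <- a, b keeps failing.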
16 ABDUCTIVE GENERALIZATION
AND SPECIALIZATION
Chiaki Sakama

16.1 INTRODUCTION
Abduction and induction both generate hypotheses to explain observed phenomena
in an incomplete knowledge base, but they are distinguished in the following respects.
Abduction conjectures specific facts accounting for some particular observation.
These assumptions of facts are extracted using causal relations in the background
knowledge base. As there are generally many possible facts which may imply the
observation, candidates for hypotheses are usually pre-specified as abducibles. The
task is then to find the best explanations from those candidates. By contrast, induction
seeks regularities underlying the observed phenomena. The goal is not only to explain
the current observations but to discover new knowledge for future use. Hence
induced hypotheses are general rules rather than specific facts. In constructing general
rules, constraints called biases are often used, but candidates for hypotheses are
not usually given in advance. The task is then to form new hypotheses using information
in the background knowledge base.
Comparing the two kinds of reasoning, abduction can compute explanations efficiently by
specifying possible hypotheses in advance. Induction has a higher reasoning ability than
abduction in the sense that it can produce new hypotheses. However, the computation
of hypotheses requires a large search space and is generally expensive. Thus abduction
and induction exhibit a trade-off between reasoning ability and computational
cost. Integrating the two paradigms and taking advantage of each framework will then
provide a powerful methodology for hypothetical reasoning. Moreover, such transfers
of techniques will benefit both abduction and induction. In abduction, introducing
a mechanism for abducing not only facts but general rules will enhance the reasoning
ability of abduction. In induction, on the other hand, such a transfer provides a method of
computing general rules abductively, which will make induction feasible.
In this chapter we propose new techniques called abductive generalization and abductive
specialization. Abductive generalization provides a mechanism for abducing
not only specific facts but also general rules accounting for positive observations. It is
achieved by computing abductive explanations and extending a knowledge base with
generalized explanations. On the other hand, when a knowledge base is inconsistent
with negative observations, abductive specialization refines the knowledge base to recover
consistency. It is done by abductively finding the sources of inconsistency and
specializing the knowledge base with additional abductive hypotheses. Abductive generalization
and specialization provide methods for computing inductive hypotheses
through abduction, and thus contribute a step towards the integration of abduction and induction in
AI.
This chapter is organized as follows. Section 16.2 introduces an abductive frame-
work used in this chapter. Section 16.3 presents a method of abductive generalization,
and Section 16.4 provides a method of abductive specialization. Section 16.5 discusses
related work and Section 16.6 concludes the chapter.

16.2 PRELIMINARIES

16.2.1 Extended abduction


In this chapter we use the extended framework of abduction proposed by
(Inoue and Sakama, 1995).¹
A knowledge base K is a set of definite clauses

H ← B1, ..., Bn

where H and the Bi (1 ≤ i ≤ n) are atoms. The atom H is the head and the conjunction
B1, ..., Bn is the body of the clause. A clause with an empty body H ← is called
a fact. Each fact H ← is identified with the atom H. A conjunction in the body is
identified with the set of atoms included in it. A clause (atom, literal) is ground if it
contains no variable. Given a knowledge base K, a set of atoms A from the language
of K is called the abducibles. Abducibles specify a set of hypothetical facts. Any instance
A of an element from A is also called an abducible and is written as A ∈ A. Given a
knowledge base K, its associated abducibles A are often omitted when their existence
is clear from the context.
Let O be a set of ground literals. Each positive literal in O represents a positive
observation, while each negative literal in O represents a negative observation. A positive
observation presents evidence that is known to be true, while a negative observation
presents evidence that is known to be false. An individual positive/negative
observation is written o+/o−, and the set of positive/negative observations from O
is written O+/O−, respectively.

¹In (Inoue and Sakama, 1995) the framework is introduced for nonmonotonic theories. Here we use it for
definite Horn theories with multiple observations.

Given a knowledge base K with abducibles A, and observations O, a pair of sets of
atoms (E, F) is an explanation of O in K if it satisfies the following conditions:

1. (K ∪ E) \ F ⊨ o+ for every o+ ∈ O+,

2. ((K ∪ E) \ F) ∪ O− is consistent,

3. both E and F consist of ground instances of elements from A.

That is, the knowledge base (K ∪ E) \ F derives every positive observation and is consistent
with every negative observation.² It should be noted that in this extended framework
hypotheses can not only be added to a knowledge base but also be discarded from
it to explain observations. When O+ contains a single observation and O− and F are
empty, the above definition reduces to the traditional logical framework of abduction
addressed by Flach and Kakas in the introduction of this volume.
An explanation (E, F) is minimal if for any explanation (E′, F′), E′ ⊆ E and F′ ⊆ F
imply E′ = E and F′ = F. It holds that E ∩ F = ∅ for any minimal explanation (E, F).
In this chapter explanations mean minimal explanations unless stated otherwise.

Example 16.1 Let K be the knowledge base

driving(x) ← licensed(x), has-car(x),
licensed(John) ←, licensed(Mary) ←,
has-car(John) ←

with A = { licensed(x), has-car(x) }. Suppose we observe that Mary is driving but
John is not these days. The situation is represented as the set of observations O =
{ driving(Mary), ¬driving(John) }. Then, o+ = driving(Mary) is explained by assuming
has-car(Mary), i.e., she got a car. On the other hand, o− = ¬driving(John)
is explained by removing either has-car(John) or licensed(John) from K, i.e., he lost
his car or his license for some reason. As a result, O has two alternative explanations:

(E1, F1) = ({ has-car(Mary) }, { has-car(John) })

(E2, F2) = ({ has-car(Mary) }, { licensed(John) })
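The conditions above are easy to test mechanically in the ground case. The following small Python sketch is our own illustration (the function names and the spelled-out ground encoding of Example 16.1 are not from the chapter); it checks conditions 1-3 using naive forward chaining for ⊨, with negative observations passed as the atoms claimed to be false.

# A sketch (ours) of the explanation conditions for ground knowledge
# bases: clauses are (head, body) pairs, facts have an empty body.

def least_model(clauses):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def is_explanation(K, E, F, O_pos, O_neg, abducibles):
    if not (E <= abducibles and F <= abducibles):     # condition 3
        return False
    revised = [c for c in K if c not in {(f, ()) for f in F}]
    revised += [(e, ()) for e in E]                   # (K u E) \ F
    m = least_model(revised)
    return (all(o in m for o in O_pos)                # condition 1
            and all(o not in m for o in O_neg))       # condition 2

# Example 16.1 with the ground instances spelled out:
K = [('driving(john)', ('licensed(john)', 'has_car(john)')),
     ('driving(mary)', ('licensed(mary)', 'has_car(mary)')),
     ('licensed(john)', ()), ('licensed(mary)', ()),
     ('has_car(john)', ())]
A = {'licensed(john)', 'licensed(mary)', 'has_car(john)', 'has_car(mary)'}
print(is_explanation(K, {'has_car(mary)'}, {'has_car(john)'},
                     {'driving(mary)'}, {'driving(john)'}, A))    # True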

16.2.2 Our goal


In extended abduction both positive and negative observations are explained by introducing
hypotheses to, or removing them from, a knowledge base. However, explanations are
still selected from the pre-specified abducible facts, and no new rules are constructed
as in induction. Our goal in this chapter is to bridge the gap between abduction and induction,
and to provide a method for abducing new rules which explain observations.
The problem is formally stated as follows.
Given a knowledge base K (with abducibles A) and positive/negative observations
O, abduce a new knowledge base K* such that

²In (Inoue and Sakama, 1995), explanations for a negative observation are called anti-explanations.

1. K* ⊨ o+ for every o+ ∈ O+,

2. K* ∪ O− is consistent.

To obtain K* we use techniques for inductive generalization and specialization.

16.3 GENERALIZING KNOWLEDGE BASES THROUGH ABDUCTION


16.3.1 Abductive generalization
This section considers knowledge bases in which only positive observations are available.
Since we consider monotonic definite theories, removing facts from a knowledge
base does not increase the provable facts. Hence, whenever a positive observation has an
explanation (E, F), F is empty. Thus, an explanation (E, ∅) is simply written as E in
this section.

Example 16.2 One can make a profit if he/she buys a stock and the stock price goes
up. Now there are four persons a, b, c, d, and each one bought a stock e, f, g, h,
respectively. The situation is represented as

K1: profit(x) ← stock(x,y), up(y),
stock(a,e) ←, stock(b,f) ←,
stock(c,g) ←, stock(d,h) ←.

Suppose that the abducibles are specified as A = { stock(x,y), up(y) }. Then, given the
set of positive observations

O+ = { profit(a), profit(b), profit(c) },

abduction computes the explanation

E = { up(e), up(f), up(g) }.

Thus, abduction makes each observation derivable by introducing E into K1. On the
other hand, the observations show that every person except d has already made a
profit. One may then consider that the market is rising and that d will also make a profit.
In this case, one can assume the optimistic rule

profit(x) ← stock(x,y),

rather than computing similar explanations for each observation. This inference is an
inductive generalization, which is obtained from the original rule by dropping conditions
(Michalski, 1983a).
Our goal in this section is to compute such inductive generalizations through abduction.
That is, given a knowledge base and positive observations, we produce a
generalized knowledge base which explains the observations.
Some terminology is introduced from (Plotkin, 1970). Two atoms are compatible
if they have the same predicate and the same number of arguments. Let S be a set
of compatible atoms. For A1, A2 ∈ S, A1 is more general than A2 (written A1 ≤ A2)
if A1θ = A2 for some substitution θ. An atom A is a least generalization³ of S if (i)
A ≤ Ai for every Ai ∈ S, and (ii) if Aj ≤ Ai holds for every Ai ∈ S, then Aj ≤ A. If
A and A′ are two least generalizations of S, then A and A′ are alphabetic variants. Given
a set of atoms S, consider a decomposition S = S1 ∪ ··· ∪ Sk where each Si is a set of
compatible atoms and no two atoms A ∈ Si and B ∈ Sj (i ≠ j) are compatible. When
an atom A is a least generalization of Si, we write lg(Si) = {A}. Then, the least
generalization lg(S) of S is defined as lg(S) = lg(S1) ∪ ··· ∪ lg(Sk).
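For concreteness, the following Python sketch computes the least generalization of a set of compatible atoms by Plotkin's anti-unification; the term encoding and the variable-naming scheme are our own illustrative choices, not part of the chapter.

# A sketch (ours) of Plotkin's least generalization for compatible
# atoms. Terms: plain strings are constants, strings starting with '_'
# are variables, tuples are (functor, arg1, ..., argn).
from functools import reduce

def lgg(s, t, table, counter):
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, table, counter)
                               for a, b in zip(s[1:], t[1:]))
    if (s, t) not in table:          # the same pair always maps to the
        counter[0] += 1              # same variable
        table[(s, t)] = '_V%d' % counter[0]
    return table[(s, t)]

def lg_atoms(atoms):
    # Pairwise reduction; the result is correct up to variable renaming.
    return reduce(lambda a, b: lgg(a, b, {}, [0]), atoms)

# Least generalization of the explanation E in Example 16.2:
E = [('up', 'e'), ('up', 'f'), ('up', 'g')]
print(lg_atoms(E))                   # ('up', '_V1'), i.e. up(y)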

Definition 16.1 Let K be a knowledge base and O+ a set of positive observations.
Then the following procedure computes an abductive generalization K+ of K wrt. O+.
First, put K+ = K.

1. Compute an explanation E of O+ and its least generalization lg(E).

2. For any clause C from K+ whose body has atoms unifiable with atoms in lg(E),
produce a new clause C+ by resolving C with lg(E) on every such atom.⁴

3. If C+θ ⊆ C holds for some substitution θ, replace C by C+ in K+. Otherwise,
add C+ to K+.

The procedure consists of two generalization processes. The first is the generalization
of abduced explanations, and the second is the generalization of the
knowledge base. Abductive generalization weakens the conditions of existing clauses
by the least generalization of the abduced explanations. The knowledge base K+ is
also an inductive generalization of K, which explains the observations O+.

Example 16.3 In Example 16.2, the least generalization of E is lg(E) = { up(y) }. As
the clause C1: profit(x) ← stock(x,y), up(y) contains the atom up(y), resolving C1
with up(y) produces the clause

C1+: profit(x) ← stock(x,y).

Since the original clause C1 is subsumed by the produced clause C1+, K1+ is obtained
from K1 by replacing C1 with C1+:

K1+: profit(x) ← stock(x,y),
stock(a,e) ←, stock(b,f) ←,
stock(c,g) ←, stock(d,h) ←.
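Steps 2 and 3 of the procedure can be sketched in Python as follows. This is our own illustration of the resolution step against lg(E): the subsumption check of step 3 and the renaming-apart of lg(E) variables per use are omitted, and the unification routine has no occurs check (harmless for this example).

# A sketch (ours) of step 2 of Definition 16.1: resolve away every
# body atom that unifies with an atom of lg(E). Terms as in the
# previous sketch.

def walk(t, sub):
    while isinstance(t, str) and t in sub:
        t = sub[t]
    return t

def unify(s, t, sub):
    s, t = walk(s, sub), walk(t, sub)
    if s == t:
        return sub
    if isinstance(s, str) and s.startswith('_'):
        return {**sub, s: t}
    if isinstance(t, str) and t.startswith('_'):
        return {**sub, t: s}
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            sub = unify(a, b, sub)
            if sub is None:
                return None
        return sub
    return None

def subst(t, sub):
    t = walk(t, sub)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, sub) for a in t[1:])
    return t

def generalize_clause(head, body, lge):
    kept, sub = [], {}
    for atom in body:
        for g in lge:
            s = unify(atom, g, sub)
            if s is not None:        # resolve the atom away
                sub = s
                break
        else:
            kept.append(atom)
    return subst(head, sub), [subst(a, sub) for a in kept]

# Example 16.3: lg(E) = {up(w)} and C1 = profit(x) <- stock(x,y), up(y)
print(generalize_clause(('profit', '_x'),
                        [('stock', '_x', '_y'), ('up', '_y')],
                        [('up', '_w')]))
# (('profit', '_x'), [('stock', '_x', '_w')]), a variant of
# profit(x) <- stock(x,y); with lg(E) = {('up','e')} instead, the
# body becomes stock(x,e), as in the single-observation case below.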

A generalized knowledge base K+ can also be regarded as a theory which is obtained
from K by partial evaluation with respect to the abduced explanations. That is, instead
of explicitly introducing abductive hypotheses into a knowledge base, the corresponding
hypotheses are implicitly incorporated in their general forms. As a result, each observation
is derived from K+ without introducing the abduced explanation E.

³In the ILP literature, it is also called a least general generalization. But we use the term from (Plotkin,
1970) in this chapter.
⁴Resolving C with lg(E) means resolution between C and an atom in lg(E).

Theorem 16.1 Let K be a knowledge base, O+ a set of positive observations, and
E an explanation of O+. Then, for any ground atom A such that A ∉ E, K ∪ E ⊨ A
implies K+ ⊨ A.

Proof. Let us identify the knowledge bases K ∪ E and K+ with their ground instances.
Then, any ground clause H ← B from K such that B ∩ E ≠ ∅ is transformed to a
ground clause H ← B′ in K+ where B′ = B \ E. Hence, K ∪ E ⊨ H implies K+ ⊨ H.
Therefore, any ground atom A s.t. A ∉ E, which is derived from K ∪ E, is also derived
from K+. ∎

When A ∈ E, the relation K ∪ E ⊨ A does not necessarily imply K+ ⊨ A. This is
because K+ may have no clause defining A.

Corollary 16.2 For any o+ ∈ O+, K ∪ E ⊨ o+ and o+ ∉ E imply K+ ⊨ o+.
By Theorem 16.1, any fact which is not in an explanation and is derived from the
prior knowledge base together with the explanation is also derived from the generalized
knowledge base. This is especially the case for observations (Corollary 16.2).
Note that since we consider minimal explanations, o+ ∈ E implies E = {o+}. In this
case, o+ is explained by itself, and K+ does not necessarily entail o+ in such a trivial
case.
The converse of Theorem 16.1 or Corollary 16.2 does not hold in general. Indeed,
K+ possibly derives facts that are not derived from K ∪ E. For instance, in Example
16.3, profit(d) is derived from K1+ but not from K1 ∪ E. Such an increase of
provable facts other than observations is called an inductive leap, which is a feature of
inductive generalization.

16.3.2 Some remarks on abductive generalization


Abductive generalization introduces an inductive mechanism into abduction by constructing
general rules which explain observations. From the induction viewpoint, the generalization
K+ is computed by modifying existing clauses in the background knowledge
base K. Restricting the dropped atoms to abducibles is a kind of bias, which reduces
the number of possible generalizations. Dropping abducibles is also semantically justified,
since any rule containing hypotheses is considered incomplete and is subject to
change.
The reliability of abductive generalization increases in proportion to the number of
(compatible) positive observations. When O+ does not have more than one compatible
observation, the procedure generalizes a knowledge base to the smallest extent. For
example, if the single observation O+ = { profit(a) } is given to K1 of Example 16.2,
its explanation is E = { up(e) }. In this case lg(E) = E, and resolving C1: profit(x) ←
stock(x,y), up(y) with lg(E) produces the clause

C′: profit(x) ← stock(x,e),

which presents that one can make a profit if he/she buys the stock e. Since C′ does
not subsume the original clause C1, it is just added to K1 and abductive generalization
produces K1+ = K1 ∪ { C′ }. This is the technique of introduction of clauses, which is
also used in inductive generalization.

Abductive generalization reduces nondeterminism in induction. There may be
many possible inductive generalizations that comply with the observations; abduction
then leads us to the hypotheses on which a knowledge base should be repaired. However,
when the positive observations O+ have multiple explanations E1, ..., En in K, a generalization
K+ exists with respect to each lg(Ei).
Example 16.4 Let K be the knowledge base

p(x) ← q(x), s(x),
q(x) ← r(x), t(x),
s(a) ←, s(b) ←, s(c) ←,
t(a) ←, t(b) ←

with A = { q(x), r(x) }. Given O+ = { p(a), p(b) }, there are two explanations E1 =
{ q(a), q(b) } and E2 = { r(a), r(b) }. Using E1, the abductive generalization becomes

K1+: p(x) ← s(x),
q(x) ← r(x), t(x),
s(a) ←, s(b) ←, s(c) ←,
t(a) ←, t(b) ←.

On the other hand, using E2 it becomes

K2+: p(x) ← q(x), s(x),
q(x) ← t(x),
s(a) ←, s(b) ←, s(c) ←,
t(a) ←, t(b) ←.

Here, p(c) is derived from K1+ but not from K2+.

Thus, there are generalizations according to different explanations, and each generalization
produces different leaps in general. This kind of nondeterminism could
be reduced if further observations on the leaps are available. For instance, if p(c) is
known to be false, K1+ does not reflect the situation and K2+ is chosen.
An additional condition is considered for performing abductive generalization.
Suppose that the positive observations O+ = { profit(k), profit(h) } are given to the
knowledge base K1 of Example 16.2. Since there is no fact on k's and h's stocks, abduction
computes the explanation E = { stock(k,t1), stock(h,t2), up(t1), up(t2) } for some
instances t1 and t2. In this case, by the least generalization lg(E) = { stock(x,y), up(y) },
both stock(x,y) and up(y) are dropped from the body of the clause profit(x) ←
stock(x,y), up(y). The generalized clause then becomes

profit(x) ←,

saying that everyone makes a profit. To avoid such over-generalization, it is effective
to allow the dropping of conditions only when the generalized clauses are range-restricted.⁵

⁵A clause is range-restricted if any variable in the clause occurs in the body.

16.4 SPECIALIZING KNOWLEDGE BASES THROUGH ABDUCTION

16.4.1 Abductive specialization


This section considers the situation where negative observations are given to a knowledge
base. In a definite theory, whenever a negative observation has an explanation
(E, F), E is empty. This is because introducing facts into a definite theory does not help
to recover consistency with respect to negative observations. Thus, an explanation
(∅, F) is simply written as F in this section.

Example 16.5 Consider the knowledge base K2 = K1+ of Example 16.3. When the
negative observation O− = { ¬profit(d) } is provided, K2 ∪ O− is inconsistent. To recover
the consistency of K2 wrt. O−, abduction computes the explanation F = { stock(d,h) }.

Thus, abduction recovers consistency by removing hypothetical facts from a knowledge
base. Our goal in this section is to achieve the same effect not by removing hypotheses
but by specializing clauses. That is, given a knowledge base and negative
observations, we produce a specialized knowledge base which is consistent with the
observations.

Definition 16.2 Let K be a knowledge base and O− a set of negative observations.
Then the following procedure computes an abductive specialization K− of K wrt. O−.
First, put K− = K.

1. Compute an explanation F of O−.

2. For every A ∈ F, replace the corresponding fact C: A ← in K− with the clause

C−: A ← A′

where A′ is a newly introduced abducible uniquely associated with A.

Abductive specialization abductively finds the facts which are the sources of inconsistency.
Those facts are then specialized by introducing newly invented abducibles into
their conditions. The specialized knowledge base K− is consistent with O−.
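In the ground case the procedure amounts to guarding each fact in F with a fresh abducible, as in the following Python sketch (representation and names are ours).

# A sketch (ours) of Definition 16.2 in the ground case: each fact in
# the explanation F gets guarded by a fresh abducible. Clauses are
# (head, body) pairs as in the earlier sketches.

def specialize(K, F):
    K_minus, new_abducibles = [], set()
    for head, body in K:
        if body == () and head in F:
            guard = head + "'"       # the fresh abducible A' for fact A
            K_minus.append((head, (guard,)))
            new_abducibles.add(guard)
        else:
            K_minus.append((head, body))
    return K_minus, new_abducibles

# Example 16.6: F = {stock(d,h)}; the tag stock(d,h)' stands for the
# chapter's stock'(d,h).
K2 = [('profit(x)', ('stock(x,y)',)),     # schematic rule, kept as-is
      ('stock(a,e)', ()), ('stock(b,f)', ()),
      ('stock(c,g)', ()), ('stock(d,h)', ())]
print(specialize(K2, {'stock(d,h)'}))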

Theorem 16.3 Let K be a knowledge base and O− negative observations. If O− has
an explanation F, then K− ∪ O− is consistent.

Proof. For any o− = ¬G from O−, K \ F ⊭ G holds by definition. When K ∪ {o−}
is inconsistent, G is derived from K using each atom A in F. Then, rewriting every
corresponding fact A ← in K to A ← A′ in K−, G is not derived from K−. ∎

Example 16.6 Consider the knowledge base K2 and O− = { ¬profit(d) } of Example
16.5. By the explanation F = { stock(d,h) }, the corresponding fact C2: stock(d,h) ←
in K2 is specialized to

C2−: stock(d,h) ← stock′(d,h).

As a result, K2 becomes

K2−: profit(x) ← stock(x,y),
stock(a,e) ←, stock(b,f) ←, stock(c,g) ←,
stock(d,h) ← stock′(d,h),

where K2− ∪ { ¬profit(d) } is consistent. In the specialized knowledge base K2−, an
additional hypothesis stock′(d,h) is required in order to conclude that d bought a (good) stock
h.

Note that abduction removes explanatory facts from a knowledge base, while abductive
specialization keeps information about them. This is useful for recovering the previous
state of a knowledge base. For instance, if the stock h later rises and profit(d)
turns positive, K2 is reproduced from K2− using abductive generalization, i.e., by dropping
the condition stock′(d,h) in C2−.
Abductive specialization recovers consistency by modifying facts while retaining
general knowledge. This is also the case for updates in deductive databases, where
every fact in a database is considered an abducible which is subject to change (Kakas
and Mancarella, 1990b). On the other hand, when one wants to specialize not only
facts but also rules in a knowledge base, abductive specialization is applied in the following
manner.
Given a knowledge base K with abducibles A, we first select hypothetical clauses
from K which are subject to change. For any hypothetical clause

Ci: H ← B

in K, we consider the clause

Ci′: H ← B, Ai

where Ai is a new abducible uniquely associated with each Ci.⁶ Then we consider the
knowledge base

K′ = (K \ ∪i {Ci}) ∪ ∪i {Ci′} ∪ ∪i {Aiθj ←},

where Aiθj is any ground instantiation of Ai. The abducibles associated with this new
theory K′ are A together with the new abducibles Ai.
Then, we apply abductive specialization to K′ with the following policy: if we want
to specialize Ci and the negative observations O− have an explanation F containing Aiθj,
then we take the explanation F and specialize the corresponding fact Aiθj ← in K′.
The resulting knowledge base K′− has the same effect as specializing Ci in K.

⁶This technique is called naming in (Poole, 1988a). When Ci contains n distinct free variables x = x1, ..., xn,
an abducible Ai = pi(x) is associated with Ci, where pi is an n-ary predicate appearing nowhere in K.

Example 16.7 Let K be the knowledge base

flies(x) ← bird(x),
bird(tweety) ←, bird(polly) ←

with A = { bird(x) }. Suppose that the first clause is a hypothetical clause which we
want to revise. First, K is transformed to K′:

K′: flies(x) ← bird(x), p(x),
p(tweety) ←, p(polly) ←,
bird(tweety) ←, bird(polly) ←

with A′ = { bird(x), p(x) }. Given O− = { ¬flies(tweety) }, it has two explanations
F1 = { bird(tweety) } and F2 = { p(tweety) }. According to the policy, F2 is chosen
and K′− becomes⁷

K′−: flies(x) ← bird(x), p(x),
p(tweety) ← p′(tweety),
p(polly) ←,
bird(tweety) ←, bird(polly) ←.

Note that K′− has the effect of specializing the first clause in K wrt. O−. The revised
knowledge base means that a bird flies if it satisfies an additional property p (normality
or something like it). But tweety fails to satisfy the property by the presence of the unproved
condition p′.

⁷When there are (infinitely) many ground instantiations of p(x), the set of facts p(t) ← other than
p(tweety) ← is written compactly as p(x) ← x ≠ tweety.
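The naming transformation itself is purely syntactic. The following Python sketch is our own illustration of it, in the ground style of Example 16.7; the function name and the string representation are ours.

# A sketch (ours) of the naming transformation: the selected
# hypothetical clause gets a fresh abducible added to its body, plus
# one fact per ground instance.

def name_clause(K, clause_index, name, constants):
    head, body = K[clause_index]
    K_named = list(K)
    K_named[clause_index] = (head, body + ('%s(x)' % name,))
    K_named += [('%s(%s)' % (name, c), ()) for c in constants]
    return K_named

K = [('flies(x)', ('bird(x)',)),
     ('bird(tweety)', ()), ('bird(polly)', ())]
print(name_clause(K, 0, 'p', ['tweety', 'polly']))
# yields flies(x) <- bird(x), p(x) together with p(tweety) <- and
# p(polly) <-; specializing p(tweety) <- (e.g. with the earlier
# specialization sketch) then blocks flies(tweety) while flies(polly)
# still holds.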

16.4.2 Combining abductive generalization and specialization

Finally, we consider combining abductive generalization and specialization in the
presence of both positive and negative observations. Abductive generalization often
produces an overly general theory which is inconsistent with some negative observations.
Let us consider the knowledge base K1 of Example 16.2 in which the observations
O = { profit(a), profit(b), profit(c), ¬profit(d) } are given. From the positive
observations O+ of O, abductive generalization produces K2 = K1+ of Example 16.3,
which explains O+. As K2 is inconsistent with the negative observation O− of O,
abductive specialization produces K2− of Example 16.6, which is consistent with O−.
Thus, in the presence of both positive and negative observations, the idea is first to
generalize the theory to derive the positive observations, and then to specialize the theory
to satisfy the negative observations. Note that in this example each positive observation in
O+ is still derived from K2−. In general, however, the specialization may affect the derivation of
positive observations.
Given a knowledge base K and positive/negative observations O, let K± be a knowledge
base obtained by combining the procedures of Definitions 16.1 and 16.2.

When a generalization K+ is obtained from O+, a necessary set of O+ in K+ is defined as a
minimal set F of facts such that K+ \ F ⊭ G for some G ∈ O+. Such a necessary set
is computed using abduction: taking ¬G as a negative observation in K+, a
necessary set of {G} is obtained as an explanation F of ¬G.
Theorem 16.4 Let K be a knowledge base in which the positive and negative observations
O have an explanation, and O+ ∩ A = ∅. For any necessary set F1 of O+ in K+
and any explanation F2 of O− in K+, suppose F1 ∩ F2 = ∅. Then, there is a knowledge
base K± such that

1. K± ⊨ o+ for every o+ ∈ O+,

2. K± ∪ O− is consistent.

Proof. First, by the transformation from K to K+, K+ ⊨ o+ for every o+ ∈ O+
(Corollary 16.2). Second, by the transformation from K+ to K± using F2, K± ∪ O− is
consistent (Theorem 16.3). In this transformation, each fact A ∈ F2 is transformed to
A ← A′ in K±. As A ∉ F1, this rewriting does not affect the derivation of any o+ from
K+. Therefore, K± ⊨ o+ for every o+ ∈ O+. ∎
When a necessary set of O+ and every explanation of O− in K+ have an atom in
common, the result of Theorem 16.4 does not hold in general.
Example 16.8 Let K be the knowledge base

p(x) ← q(x),
q(x) ← r(x), s(x),
s(a) ←, s(b) ←

with A = { r(x), s(x) }. Given O = { p(a), p(b), ¬q(a) }, the positive observations
O+ = { p(a), p(b) } have the explanation E = { r(a), r(b) } and K is generalized to
K+:

p(x) ← q(x),
q(x) ← s(x),
s(a) ←, s(b) ←.

In this K+, o+ = p(a) has the necessary set F1 = { s(a) }. On the other hand, the negative
observation O− = { ¬q(a) } has the explanation F2 = { s(a) }, which is equal
to F1. Then, the following specialized knowledge base K± does not derive o+:

p(x) ← q(x),
q(x) ← s(x),
s(a) ← s′(a),
s(b) ←.
Note that O has no explanation in K in the above example. Indeed, when some
facts are necessary to explain a positive observation, those facts cannot be removed to
satisfy negative observations. This is a necessary condition which a knowledge base
should satisfy in order to have an explanation for both positive and negative observations. This
condition is expressed as F1 ∩ F2 = ∅ in Theorem 16.4.

16.5 RELATED WORK


Several systems use abduction in the process of induction. Ourston and Mooney introduce
a theory refinement system called EITHER (Ourston and Mooney, 1990). To
generalize a theory, it abductively searches for the cause of failed positive observations,
and generalization is done by inserting a new clause or by dropping most-specific explanations
from the conditions of a clause. Specialization is done by removing a clause
or adding new antecedents to a clause. In this volume, Mooney also presents a theory
refinement algorithm which generalizes a theory by deleting abduced literals or
inducing new clauses. In AUDREY (Wogulis, 1991), abduction is used for identifying
the assumptions on which the domain theory should be repaired. In AUDREY II
(Wogulis and Pazzani, 1993), abduced assumptions are deleted from a theory unless
it covers negative examples. In EITHER the domain theory is a propositional Horn
theory, while AUDREY (II) considers predicate Horn theories. Our use of abduction
is similar to these systems, but differs from them in the following points. First,
we generalize a theory with least-generalized explanations. Second, we use extended
abduction not only for generalizing theories but also for specializing them. In CLINT (De
Raedt and Bruynooghe, 1992b) and RUTH (Ade et al., 1994) abduction supplements
induction by hypothesizing factual knowledge, and is also used for diagnosing a cause
of integrity violation in a knowledge base. SIERES (Wirth and O'Rorke, 1992) uses
abduction to infer training sets of assumptions for inductively invented predicates.
The integration of abduction and induction is also investigated by other researchers.
Ade and Denecker introduce a procedure called SLDNFAI (Ade and Denecker, 1995).
It inductively constructs hypothetical clauses to cover positive observations and uncover
negative observations. Inductive hypotheses constructed in this manner, called
inducibles, are used for explaining observations in the abductive procedure. Dimopoulos
and Kakas construct hypothetical rules to explain both positive and negative observations
(Dimopoulos and Kakas, 1996b). Abduction is used for restricting a background
theory to a relevant part on which induction is based. Abducibles are introduced
into the body of a clause to refine hypotheses. Lamma et al. and Inoue and Haneda
in this volume introduce systems for learning abductive logic programs, which produce
a new abductive program from observations and a background abductive theory.
The systems presented above consider a general induction problem, namely, learning
concepts possibly having no definitions in the background knowledge base. They use
typical induction algorithms like MIS (Shapiro, 1981) or FOIL (Quinlan, 1990), in
which hypotheses are constructed from scratch. However, such general induction algorithms
require exhaustive search and may produce many useless hypotheses. By
contrast, we restricted the problem setting and assumed a prior knowledge base having
imperfect rules defining the concept to be learned. We then used abduction to revise such
incomplete knowledge rather than to induce arbitrary new knowledge.

16.6 CONCLUDING REMARKS


This chapter introduced the new techniques of abductive generalization and abductive specialization.
They provide methods for revising a knowledge base in the face of positive
and negative observations, and compute inductive generalization and specialization
through abduction. Although the proposed techniques are still restrictive compared
with general induction systems, they enhance the reasoning ability of abduction and
also realize efficient induction. Our system is realized using the procedure of extended
abduction (Inoue and Sakama, 1998).
According to Peirce, "if we are ever to learn anything or to understand phenomena
at all, it must be by abduction that this is to be brought about" (Peirce, 1958). In this
respect, abduction is considered a step towards induction. Abductive generalization and
specialization are captured as techniques based on this view, and there are possibilities
of exploiting further techniques in this direction. On the application side, it is known
that abduction is useful for database updates and theory revision, where extensional
facts are subject to change (Kakas and Mancarella, 1990b; Inoue and Sakama, 1995).
By contrast, abductive generalization/specialization constructs intensional rules for
new information, so it has potential applications to rule updates in knowledge bases.
Future research also includes extending the techniques to nonmonotonic knowledge
bases.

Acknowledgments
The author thanks Katsumi Inoue for comments on an earlier draft of this chapter.
17 USING ABDUCTION FOR
INDUCTION BASED ON BOTTOM
GENERALIZATION
Akihiro Yamamoto

17.1 INTRODUCTION
Abduction is to find explanations which account for a given example under a background
theory. Induction, often called inductive inference, is a process of generating
general rules which given examples obey. From these simple definitions, we
may expect an inductive inference procedure that generates rules by modifying
explanations which some abductive inference procedure generates from the input examples. In
this chapter we give such a procedure, with the support of deductive inference and
generalization. The procedure is a refinement of bottom generalization (Yamamoto,
1997; Yamamoto, 1999a), which was invented in the analysis of inverse entailment by
(Muggleton, 1995). Because inverse entailment is an extension of bottom generalization,
the results in this chapter show that inverse entailment also potentially contains abduction.
In Artificial Intelligence research abduction has already been formalized in
various ways (Cox and Pietrzykowski, 1986b; Demolombe and Farinas de Cerro,
1991; Hirata, 1995; Inoue, 1992a; Poole, 1988a; Pople, Jr., 1973). Induction has also
been formalized in various ways, and a comprehensive framework has been proposed
in theoretical computer science (see, for example, (Angluin and Smith, 1983)). In
the last ten years inductive inference based on theorem proving and logic programming
has attracted much attention under the name inductive logic programming (ILP) (Muggleton,
1992; Lavrac and Dzeroski, 1994; Nienhuys-Cheng and de Wolf, 1997).


In this research we adopt the formalization of abduction by (Poole, 1988a), which
was extended by (Inoue, 1992a). This formalization is suitable for our purpose because
it is based on consequence finding in clausal logic, with which we formalized
bottom generalization (Yamamoto, 1997; Yamamoto, 1999a). Moreover, Inoue proposed
SOL-resolution as a procedure for consequence finding and showed how to
implement abduction with it. The inductive inference procedure given in this chapter
uses SOLD-resolution, which is a refinement of SOL-resolution for use with definite
clauses.
Resolution is widely known as a complete procedure for refutation, but it has received
little attention as a procedure for consequence finding, even in the research
area of theorem proving. The completeness of resolution for consequence finding was
first shown by (Lee, 1967; Slagle et al., 1969); a proof can be found in (Kowalski,
1970). (Inoue, 1992a) pointed out the importance of the theorem for various forms of inference:
abduction, prime implicants, ATMS, and circumscription. However, he did not include
inductive inference in this list. In the ILP area some researchers (Nienhuys-Cheng and
de Wolf, 1994; Nienhuys-Cheng and de Wolf, 1995) noticed the importance of the
theorem, calling it the Subsumption Theorem.¹
Bottom generalization was originally intended for inductive inference in non-Horn clausal
logic, but in this chapter we assume Horn clausal logic. By doing so we can not
only use fruitful results from logic programming, but also represent the combination of
abduction and deduction in a very simple form.
This chapter is organized as follows. Precise formalizations of abduction and induction
are given in Section 17.2. By comparing them with each other, we clarify a
problem which appears in constructing inductive inference procedures by means of abductive
inference.
In Section 17.3 we introduce SOLD-resolution and show that it is complete for finding
goal clauses as consequences from pairs of a definite program and a goal clause.
We also give some remarks on the difference between SOLD-resolution and SOL-resolution.
The main result is given in Section 17.4. After recalling the definition of bottom
generalization, we show how to implement it by using SOLD-resolution as abduction
and combining it with deduction and generalization. The obtained inference
procedure is illustrated in Figure 17.1.
The procedure given in Section 17.4 can find at most one definite clause from one
example. In Section 17.5 we try to extend the procedure so that it can infer multiple
clauses from one example. By assuming that every hypothesis consists of unit clauses,
we give an extended procedure based on SOLD-resolution.
In Section 17.6 we give some comparison of our results with other inference methods
developed in the ILP area.

¹We do not use that name in this chapter because it refers to another theorem in the theorem-proving area.

17.2 FROM ABDUCTION TO INDUCTION


The discussion in this section is general and we do not restrict ourselves to Horn
clausal logic here.
In the formalization by (Poole, 1988a) and (Inoue, 1992a), abduction is to find
an explanation H for a given surprising fact E with respect to a given background theory B.
(We use the symbols H and E to avoid confusion in comparing abduction to induction.)
The formulas H, E, and B are assumed to belong to sets LH, LE, and LB,
respectively. Since the three sets are fixed for an inference system, we call the triple
S = (LH, LE, LB) a language structure. The explanation H must satisfy the following
conditions:

(A-1) B ∧ H is consistent.

(A-2) B ∧ H ⊨ E.

Poole and Inoue gave an inference method for abduction in clausal logic by using
resolution as a consequence-finding procedure. Before explaining the method we
give the completeness theorem of resolution for consequence finding.
Definition 17.1 A clause D subsumes a clause C if there is a substitution θ such that
every literal in Dθ occurs in C.

Theorem 17.1 (Lee, 1967; Kowalski, 1970) A clause C is a logical consequence of
a clausal theory T iff there is a clause D which is derivable from T by resolution with
factoring and which subsumes C.

Suppose that LB is a set of conjunctions of clauses and both LE and LH are sets
of ground atoms. Note that both ¬E and ¬H are then clauses in the language structure.
Then any ground atom H satisfying (A-2) can be derived by generating clauses D
from T = B ∧ ¬E with resolution and checking whether or not Dθ = ¬H for some θ.
This method is justified by Theorem 17.1. We can also apply this abduction method in the
case that both LE and LH are sets of negations of clauses. In that case resolution derives
clauses D which subsume the negation of an explanation H satisfying (A-2). The
explanation obtained by negating D is called minimal (Inoue, 1992a). Non-minimal
explanations can be derived by adding some literals to D before negating it.
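In the propositional case this method is only a few lines of code. The following Python sketch is our own illustration (literals encoded as signed integers; all names ours): it saturates B ∧ ¬E under binary resolution and reads candidate explanations off the derived unit clauses.

# A propositional sketch (ours) of abduction by consequence finding.
# Literals are nonzero integers; a negative integer is a negated atom.
from itertools import combinations

def resolvents(c1, c2):
    out = []
    for lit in c1:
        if -lit in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {-lit})))
    return out

def saturate(clauses):
    derived, changed = set(clauses), True
    while changed:
        changed = False
        for c1, c2 in combinations(list(derived), 2):
            for r in resolvents(c1, c2):
                if r not in derived:
                    derived.add(r)
                    changed = True
    return derived

# B: atom 1 implies atom 2 (i.e. {-1, 2}); E is atom 2.
B, notE = [frozenset({-1, 2})], [frozenset({-2})]
for c in saturate(B + notE):
    if len(c) == 1:
        print('derived', set(c), '-> explanation', -next(iter(c)))
# derives {-2} (the trivial explanation, E itself) and {-1}, whose
# negation is atom 1: H = 1 explains E given B.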
We formalize induction from positive examples. An inductive inference system
M sequentially takes as input examples E1, E2, E3, ... and returns hypotheses B1, B2,
B3, .... (We choose the symbol Bi because in our formalization the current hypothesis
corresponds to the background theory in the abduction explained above.) Some initial
hypothesis B0 is given to M. Two sets LE and LB of first-order formulas are fixed, and
each example Ei and each hypothesis Bi must belong to LE and LB, respectively. We
assume the following conditions for Ei and Bi (i = 1, 2, ...):

(I-1) Bi is consistent.

(I-2) Bi ⊨ E1 ∧ E2 ∧ ... ∧ Ei, that is, each example is positive and each hypothesis
explains all the examples already given to M.

(I-3) Bi = Bi−1 ∧ Hi, where Hi is in some set LH. The system M generates the hypothesis
incrementally.

The following two conditions follow directly from (I-1), (I-2) and (I-3):

(I-4) Bi−1 ∧ Hi is consistent.

(I-5) Bi−1 ∧ Hi ⊨ Ei.

This formalization has already been used to design inductive inference systems,
e.g. ITOU (Rouveirol, 1992) and Progol (Muggleton, 1995). (Cohen, 1995)
and (Arimura, 1997; Arimura et al., 1995) used the formalization for theoretical analysis
of inductive inference.
The expectation stated at the beginning of this chapter can now be formalized by comparing
(I-4) and (I-5) with (A-1) and (A-2): we expect an inductive inference system M
to iteratively revise Bi−1 by adding to it an Hi obtained by modifying explanations
which some abductive inference procedure derives from Ei.
Now we consider how to realize this expectation in clausal logic. Assume that LB
is a set of conjunctions of clauses, that LH is a set of clauses, and that we try applying
Poole and Inoue's abduction method to generating Hi from Bi−1 and Ei. Then a
problem arises: we cannot use explanations derived by the abduction method as
Hi, because the condition (I-3) is not satisfied in general. That is, since the abduction
method is based on consequence finding, Hi is the negation of some clause, and
therefore Bi = Bi−1 ∧ Hi does not always belong to LB.
Some readers might think the problem would disappear if we assumed that LH is
a set of ground literals. However, ground literals are not what we expect as the output
of an inductive inference system. We expect the output to be general
rules, but a ground literal can hardly be regarded as a general rule. We should allow LH
to contain clauses which are not ground literals. We therefore have to find some method of
modifying the explanations derived by the abduction method so that we obtain a general rule
which is suitable as an output Hi of an inductive inference procedure. We use saturation
and generalization for this modification.
In the following discussions LE is assumed to be a set of clauses. We adopt
this assumption expecting our results to contribute to the progress of such theories and
practices, because the examples for an inference system are often assumed to be clauses
in both the theory and the practice of inductive inference. If we confined ourselves to
applying the abduction method to inductive inference, LE would have to be a set of negations
of clauses, or a set of ground literals.

17.3 SOLD-RESOLUTION
We now introduce SOLD-resolution, with which we implement Poole and Inoue's
abduction method.
A definite program is a finite conjunction of definite clauses. A definite clause is a
formula of the form

C = ∀x1...xk(A0 ∨ ¬A1 ∨ ... ∨ ¬An),

and a goal clause is of the form

D = ∀x1...xk(¬A1 ∨ ... ∨ ¬An),

where n ≥ 0, the Ai's are all atoms, and x1, ..., xk are all the variables occurring in the atoms.
A Horn clause is either a definite clause or a goal clause. We represent the formulas
in the form of implications:

C = A0 ← A1, A2, ..., An,
D = ← A1, A2, ..., An.

For the definite clause C, C⁺ and C⁻ respectively denote the definite clause A0 ← and a
goal clause ← A1, A2, ..., An. For the goal clause D, |D| denotes the sequence of atoms
A1, A2, ..., An.
For our later discussion, we first give the definition of SLD-resolution in a form
slightly different from that in (Lloyd, 1987). An SLD-refutation is defined under two
assumptions: an atom in a goal clause is selected according to a computation rule, and
the variables in each input clause are standardized apart (Lloyd, 1987),
that is, renamed so that no pair of input clauses shares variables.

Definition 17.2 Let P be a definite program, G a goal, and R a computation rule.
An SLD-refutation of (P, G) is a finite sequence of triples (Gi, θi, Ci) (i = 0, 1, 2, ..., n)
which satisfies the following conditions:

1. Gi is a goal clause, θi is a substitution, and Ci is a variant of a definite clause
in P the variables of which are standardized apart by renaming.

2. G0 = G and Gn = □.

3. For every i = 0, ..., n−1, if Gi = ← A1, ..., Ak and Am is the atom selected from
Gi by R, then Ci+1 is chosen as a clause such that C⁺i+1 and Am are unifiable,
θi+1 is an mgu of them, and

Gi+1 = ← (A1, ..., Am−1, |C⁻i+1|, Am+1, ..., Ak)θi+1.
Now we introduce SOLD-resolution. For SOLD-derivations we assume the same
conditions as for SLD-derivations.

Definition 17.3 Let P be a definite program, G a goal, and R a computation rule.
An SOLD-derivation of (P, G) is a finite sequence of quadruples (Gi, Fi, θi, Ci)
(i = 0, 1, 2, ..., n) which satisfies the following conditions:

1. Gi and Fi are goal clauses, θi is a substitution, and Ci is a variant of a definite
clause in P the variables of which are standardized apart by renaming.

2. G0 = G and F0 = □.

3. Gn = □ and Fn = F.

4. For every i = 0, ..., n−1, if Gi = ← A1, ..., Ak, Fi = ← B1, ..., Bh, and Am is
the atom selected from Gi by R, then (Gi+1, Fi+1, θi+1, Ci+1) is derived from
(Gi, Fi, θi, Ci) by either of the following two operations:

(a) Skip: Put
Gi+1 = ← A1, ..., Am−1, Am+1, ..., Ak,
Fi+1 = ← B1, ..., Bh, Am.
Both θi+1 and Ci+1 can be arbitrary.

(b) Resolve: Choose as Ci+1 a clause such that C⁺i+1 and Am are unifiable, put
θi+1 an mgu of the two atoms, and put
Gi+1 = ← (A1, ..., Am−1, |C⁻i+1|, Am+1, ..., Ak)θi+1, and
Fi+1 = Fiθi+1.

The goal Fn of the last quadruple of an SOLD-derivation is called its consequence.
The first quadruple of an SOLD-derivation is written as (G, □, _, _). Moreover, when
the skipping operation is applied to (Gi, Fi, θi, Ci), the quadruple following it is written
as (Gi+1, Fi+1, _, _).
If the skipping operation is not applied in constructing an SOLD-derivation, Fi is □
in each quadruple and, by deleting Fi from each quadruple, an SLD-refutation of (P, G)
is obtained.
Example 17.1 Let P1 = C1 ∧ C2 ∧ C3 ∧ C4 where

C1 = pet(x) ← cat(x),
C2 = cuddly-pet(x) ← small(x), fluffy(x), pet(x),
C3 = fluffy(cx) ←,
C4 = cat(cx) ←.

Then an example of an SOLD-derivation from G1 = ← cuddly-pet(cx) with P1 is as
follows:

(← cuddly-pet(cx), □, _, _)
(← small(cx), fluffy(cx), pet(cx), □, {x1 := cx}, C2)
(← fluffy(cx), pet(cx), ← small(cx), _, _)
(← pet(cx), ← small(cx), ε, C3)
(← cat(cx), ← small(cx), ε, C1)
(□, ← small(cx), ε, C4)

At the second step we use the skipping operation.
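A propositional rendering of SOLD-derivations makes the Skip/Resolve alternation easy to trace. The following Python sketch is our own illustration (leftmost selection, with a simple per-path loop check; all names ours): it enumerates the consequences of Example 17.1 read at the single constant cx, among them ('small',), corresponding to ← small(cx).

# A propositional sketch (ours) of SOLD-derivations: Skip moves the
# selected atom into F, Resolve replaces it by a clause body; the
# consequences are the possible final F's.

def sold_consequences(P, goal, skipped=(), seen=frozenset()):
    if not goal:
        return {skipped}
    state = (tuple(goal), skipped)
    if state in seen:                 # avoid cycling through p <- p etc.
        return set()
    seen = seen | {state}
    L, rest = goal[0], tuple(goal[1:])
    out = sold_consequences(P, rest, skipped + (L,), seen)     # Skip
    for body in P.get(L, []):                                  # Resolve
        out |= sold_consequences(P, tuple(body) + rest, skipped, seen)
    return out

# Example 17.1 read at the single constant cx:
P1 = {'pet': [['cat']],
      'cuddly_pet': [['small', 'fluffy', 'pet']],
      'fluffy': [[]], 'cat': [[]]}
for F in sorted(sold_consequences(P1, ('cuddly_pet',))):
    print(F)              # includes ('small',), i.e. <- small(cx)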
The completeness theorem for SOLD-resolution was proved by (Nienhuys-Cheng
and de Wolf, 1995), though they called it the completeness theorem of SLD-resolution
for consequence finding.

Proposition 17.2 (Nienhuys-Cheng and de Wolf, 1995) Let P be a definite program
and G a goal clause. Then a goal clause F is a logical consequence of P ∧ G iff
there is an SOLD-derivation of (P, G) whose consequence subsumes F.

In the following we give some remarks on the details of the amalgamation of SOL-resolution
and SLD-resolution; readers who are not familiar with SOL-resolution
can skip them.

Remark 17.1 The resolving operation (4(b)) of SOLD-resolution differs from the
derivation operation of SOL-resolution (Inoue, 1992a) in the following three points:

1. In an SOLD-derivation no variant of G can be used as Ci, while some variant of
G can be used in an SOL-deduction.

2. The literal Amθi is removed in an SOLD-derivation, while it is framed in an SOL-deduction.
Therefore the ancestry rule of SOL-resolution is not necessary for
SOLD-resolution.

3. Neither the factoring rule nor the merging rule of SOL-resolution is used in
constructing SOLD-derivations.

These differences are due to the fact that P and G are respectively a definite program
and a goal clause.

17.4 FINDING DEFINITE CLAUSES


In this section we give a method to derive definite clauses which explain a given example,
using Poole and Inoue's abduction method. As stated in Section 17.2, we
assume a language structure such that LB is the set of all definite programs, LH is the
set of all definite clauses, and LE ⊆ LH.
We start by preparing definitions and some remarks.

Definition 17.4 A clause D is a generalization of a clause C if there is a substitution
θ such that Dθ = C.

Definition 17.5 The complement of a clause C = A0 ← A1, ..., Am is the formula

¬(CσC),

where σC is a substitution which replaces each variable x in C with a Skolem constant
symbol cx. We sometimes write σ instead of σC when this causes no ambiguity. If C is
ground, σC is the empty substitution and therefore ¬(CσC) = ¬C.

Remark 17.2 For a Horn clause C, ¬(C⁻σ) is a ground definite program, ¬(C⁺σ) is
the negation of a ground atom, and ¬(Cσ) = ¬(C⁺σ) ∧ ¬(C⁻σ).

We now give a formal definition of the bottom generalization method.

Definition 17.6 (Yamamoto, 1999a) Let B be a background theory and E an example.
The bottom set (or bottom for short) for E w.r.t. B is the set of literals

Bot(E, B) = { L | L is a ground literal and B ∧ ¬(Eσ) ⊨ ¬L }.

The inference method bottom generalization generates hypotheses H in the following
way:

(BG-1) Choose non-deterministically one positive literal A0 and several negative literals
¬A1, ¬A2, ..., ¬An (n ≥ 0) from Bot(E, B), and put K = A0 ← A1, A2, ..., An.

(BG-2) Non-deterministically choose one of the generalizations of K which contains no
Skolem constants, and then return that clause as H.
Because σ substitutes a Skolem constant for each variable in E, we obtain the following
proposition.

Proposition 17.3 (Yamamoto, 1999b) A clause H obtained by bottom generalization
satisfies B ∧ H ⊨ E.

Proof. Let K be the clause constructed in phase (BG-1). Then, by the definition of the
bottom set, it holds that B ∧ ¬(Eσ) ⊨ ¬K. Since K is ground, we get B ∧ K ⊨ Eσ.
Because every clause H chosen in phase (BG-2) is a generalization of K, we
have B ∧ H ⊨ Eσ. Moreover, we have B ∧ H ⊨ E because H contains no Skolem constants.
∎
Now the problem we must solve is how to derive the clause K at step (BG-1).
Our solution is that K⁺ is generated by Poole and Inoue's abduction with SOLD-resolution,
while the literals in K⁻ are generated by bottom-up resolution from the definite
program P = B ∧ ¬(E⁻σ) (Remark 17.2).
In order to derive the positive literal K⁺ we modify SOLD-resolution. Because it
may derive a consequence ← A1, ..., Am with m ≥ 2, we add a process which checks
whether or not the consequence is subsumed by a goal clause of the form ← A. We
can check this by applying the well-known mgu algorithm to the set {A1, ..., Am}.
Definition 17.7 Let (G, □, _, _), (G1, F1, θ1, C1), ..., (□, Fn, θn, Cn) be an SOLD-derivation
of (P, G) with Fn = ← A1, ..., Am. If the atoms A1, ..., Am are unifiable, an SOLDR-derivation
of (P, G) is defined as the sequence obtained by adding (□, ← A1τ, τ, _) to the SOLD-derivation,
where τ is an mgu of the atoms A1, ..., Am. The goal clause ← A1τ is
called the consequence of the derivation. A goal clause ← A is SOLDR-derivable
from (P, G) if there is an SOLDR-derivation whose consequence is ← A.
Remark 17.3 The check whether or not the atoms A1, ..., Am are unifiable can be regarded
as an iterative application of the reducing operation used in SOL-resolution. We can
embed the reducing operation into SOLD-resolution by replacing condition 4 for an
SOLD-derivation with the following condition 4′:

4′. For every i = 0, ..., n−1, if Gi = ← A1, ..., Ak, Fi = ← B1, ..., Bh, and Am is the
atom selected from Gi by R, then (Gi+1, Fi+1, θi+1, Ci+1) is derived by one of the following
three operations:

(a) Skip: Put
Gi+1 = ← A1, ..., Am−1, Am+1, ..., Ak,
Fi+1 = ← B1, ..., Bh, Am.
Both θi+1 and Ci+1 can be arbitrary.

(b) Reduce: If Fi is ← A, and A and Am are unifiable, put θi+1 an mgu of the
two atoms, and put
Gi+1 = (← A1, ..., Am−1, Am+1, ..., Ak)θi+1, and
Fi+1 = Fiθi+1.
Ci+1 can be arbitrary.

(c) Resolve: Choose as Ci+1 a clause such that C⁺i+1 and Am are unifiable, put
θi+1 an mgu of the two atoms, and put
Gi+1 = ← (A1, ..., Am−1, |C⁻i+1|, Am+1, ..., Ak)θi+1, and
Fi+1 = Fiθi+1.

Now we put the set

SOLDR(P, G) = { A | ← A is a ground instance of a goal ← A′ which is
SOLDR-derivable from (P, G) }.

Theorem 17.4 Let P be a program and G a goal such that P ∧ G is consistent.
Then for any ground atom A, A ∈ SOLDR(P, G) iff P ∧ G ⊨ ¬A.

The theorem is proved with a lemma, called the Switching Lemma, for SLD-refutations.

Lemma 17.5 (Lloyd, 1987) Let (G, _, _), (G1, θ1, C1), ..., (□, θn, Cn) be an SLD-refutation
of (P, G). Suppose that

Gq = ← A1, ..., Ai−1, Ai, ..., Aj−1, Aj, ..., Ak,
Gq+1 = ← (A1, ..., Ai−1, |C⁻q+1|, ..., Aj−1, Aj, ..., Ak)θq+1,
Gq+2 = ← (A1, ..., Ai−1, |C⁻q+1|, ..., Aj−1, |C⁻q+2|, ..., Ak)θq+1θq+2.

Then there is an SLD-refutation of (P, G) in which Aj is selected in Gq instead of Ai
and Ai is selected in G′q+1 instead of Aj. The obtained SLD-refutation is of the form

(G, _, _), ..., (Gq, θq, Cq), (G′q+1, θ′q+1, Cq+2), (G′q+2, θ′q+2, Cq+1),
(G′q+3, θ′q+3, Cq+3), ..., (□, θ′n, Cn),

and every goal G′i, for i = q+2, ..., n, is a variant of Gi.

The proof of Theorem 17.4 now proceeds as follows.

Proof. (if part) Let P ∧ G ⊨ ¬A. Then there is an SLD-refutation (G0, _, _), (G1, θ1, C1),
..., (□, θn, Cn) of (P ∧ (A ←), G). If no Ci is A ←, the refutation is one of (P, G), and
therefore P ∧ G is inconsistent. This contradicts the assumption of the theorem. So
A ← must be used as an input clause at least once. By Lemma 17.5 we can assume
that the input clauses C1, C2, ..., Ck are variants of clauses in P, and the rest
are A ←. Let Gk = ← B1, B2, ..., Bn−k. Since Gk is refuted with only A ←, the
atoms {A, B1, B2, ..., Bn−k} are unifiable. By applying the skipping operation instead
of resolution with A ←, we get an SOLD-derivation of (P, G) whose consequence is
← B1, B2, ..., Bn−k. This concludes that A is in SOLDR(P, G).
(only-if part) Suppose that A is in SOLDR(P, G). Then there is an SOLD-derivation
of (P, G) with consequence ← B1, B2, ..., Bm such that each Bi is unifiable with
A. By replacing each application of the skipping operation with an application of
resolution with A ←, we get an SLD-refutation of (P ∧ (A ←), G), and therefore
P ∧ (A ←) ∧ G is unsatisfiable. This completes the proof. ∎

Procedure Find_DC_with_BG
Input: a definite program B and a definite clause E
Output: a definite clause H
Method:

1. If B ⊨ E then stop. Otherwise go to Step 2.

2. Derive an atom A0 in SOLDR(B ∧ ¬(E⁻σ), ¬(E⁺σ)).

3. Derive some atoms A1, A2, ..., An in M(B ∧ ¬(E⁻σ)) by bottom-up derivation.

4. Return a clause H which is a generalization of A0 ← A1, A2, ..., An.

Figure 17.1 An inference procedure for finding definite clauses.

Now we consider how to generate the negative literals in K.

Definition 17.8 (Rouveirol, 1992) Let A0 be a ground atom, B a background theory,
and E an example. A saturant of A0 w.r.t. E and B is a clause

A0 ← A1, A2, ..., An

where A1, A2, ..., An are ground literals which are logical consequences of B ∧ ¬(Eσ).

From the definition, applying saturation to a ground atom A0 in Bot(E, B) coincides
exactly with the selection of negative literals at step (BG-1) of bottom generalization.
The following lemma shows that the negative literals in Bot(E, B) can be obtained from P
without using G.

Lemma 17.6 Let A be a ground atom, P a definite program, and G a goal. If P ∧ G is
consistent, then P ∧ G ⊨ A iff P ⊨ A.

Proof. It is sufficient to show the only-if part. Let G = ← A1, A2, ..., Am, and let M(P)
be the least Herbrand model of P. If G is true in M(P), then ¬A must be false in M(P) by
the assumption that P ∧ G ∧ ¬A is inconsistent. Therefore A is a logical consequence
of P.
The proof is completed if we show that G is true in M(P). Suppose that G is
false in M(P). Then there is a ground substitution θ such that ¬(Gθ) = A1θ ∧ A2θ ∧
... ∧ Amθ is true in M(P). Since each Aiθ is a logical consequence of P, P ∧ G is
inconsistent, and this contradicts the assumption of the lemma. ∎
It is a well-known result in logic programming theory (Lloyd, 1987) that, for a ground
atom A, P ⊨ A iff A is derivable by resolution and instantiation from P. Therefore we
can adopt, as ¬K⁻, any conjunction A1 ∧ A2 ∧ ... ∧ An of ground atoms which is a
logical consequence of P.
In Figure 17.1 we present the inference procedure Find_DC_with_BG based on
Theorem 17.4 and Lemma 17.6. We state, in the form of a theorem, that this procedure
is a correct implementation of bottom generalization.

Theorem 17.7 Suppose that B ⊭ E. A hypothesis H is generated by Find_DC_with_BG
iff H is obtained by bottom generalization from E w.r.t. B.

Proof. Let Bot⁺(E, B) (Bot⁻(E, B)) be the set of all positive (negative, resp.) literals in
Bot(E, B). Then, by Theorem 17.4 and Lemma 17.6,

Bot⁺(E, B) = SOLDR(B ∧ ¬(E⁻σ), ¬(E⁺σ)), and
Bot⁻(E, B) = { ¬A | A ∈ M(B ∧ ¬(E⁻σ)) }.

Referring to Steps 2, 3 and 4 of Find_DC_with_BG and the definition of bottom
generalization, we see that the theorem holds. ∎
Example 17.2 We illustrate how Find_DC_with_BG works with an example from (Muggleton,
1995). With the clauses in Example 17.1, let us define B1 and E1 as follows:

B1 = C1 ∧ C2,
E1 = cuddly-pet(x) ← fluffy(x), cat(x).

Since

M(B1 ∧ ¬(E1⁻σ)) = { fluffy(cx), cat(cx), pet(cx) }

and

SOLDR(B1 ∧ ¬(E1⁻σ), ¬(E1⁺σ)) = { small(cx), cuddly-pet(cx) },

a definite clause

H = small(x) ← fluffy(x), cat(x), pet(x)

is one of the outputs, because the clause

K = small(cx) ← fluffy(cx), cat(cx), pet(cx)

is an instance of H.
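Step 3 of Find_DC_with_BG on this example can be reproduced by bottom-up (forward) chaining. The following Python sketch is our own illustration of computing M(B1 ∧ ¬(E1⁻σ)), with the rules instantiated at the single Skolem constant cx (which suffices here, since cx is the only constant).

# A sketch (ours) of bottom-up derivation of the least Herbrand model.
# Clauses are (head, body) pairs with ground atoms as strings.

def least_model(clauses):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

# B1 instantiated at cx, plus the facts of the complement of E1's body:
clauses = [('pet(cx)', ('cat(cx)',)),
           ('cuddly_pet(cx)', ('small(cx)', 'fluffy(cx)', 'pet(cx)')),
           ('fluffy(cx)', ()), ('cat(cx)', ())]
print(least_model(clauses))   # {'fluffy(cx)', 'cat(cx)', 'pet(cx)'}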
Step 1 of Find_DC_with_BG is needed for two reasons. The first is the formalization of inductive inference given in Section 17.2. In that formalization, we adopt the standpoint that we need not revise the current background theory Bᵢ if the given example Eᵢ can be explained with Bᵢ. So we need Step 1 of Find_DC_with_BG.
The second reason concerns the completeness of Find_DC_with_BG. The completeness theorem below can be shown directly from previous results (Jung, 1993; Yamamoto, 1997; Yamamoto, 1999a).
Definition 17.9 (Plotkin, 1971) Let H and E be clauses and B a conjunction of clauses. Then H subsumes E relative to B if there is a clause F such that
B ⊨ ∀y₁ ··· yₙ (E′ ↔ F′)
and H subsumes F, where E′ and F′ are obtained by removing the universal quantifiers from E and F respectively, and y₁, ..., yₙ are all variables occurring in E′ and F′.
Theorem 17.8 Let H and E be definite clauses and B a definite program. A hypothesis H is obtained with bottom generalization iff H subsumes E relative to B, whenever E is not a tautology and B ⊭ E.
Since we cannot drop the condition at the end of the theorem, we need Step 1 of Find_DC_with_BG.

17.5 FINDING UNIT PROGRAMS


The procedure Find_DC_with_BG is designed under the assumption that every hypothesis derived by it consists of a single clause. This is accomplished in Find_DC_with_BG because every abductive hypothesis derived by SOLDR-resolution consists of a singleton atom. Therefore one may conjecture that, if we use the original SOLD-resolution instead of SOLDR-resolution, we obtain an inference procedure which derives multiple clauses as one hypothesis. In this section we show that this conjecture is partly true.
A unit program P is a program of the form
(A₁ ←) ∧ (A₂ ←) ∧ ... ∧ (Aₙ ←).
Without loss of generality we can assume that the variables introduced in SOLD-resolution occur in no programs in LH. The next theorem can be proved in the same way as Theorem 17.4.
Theorem 17.9 Let H be a unit program, E a definite clause, and B a definite program. Then B ∧ H ⊨ E iff there is a clause F such that
1. there is an SOLD-derivation of (B ∧ ¬(E⁻σ), ¬(E⁺σ)) whose consequence is F, and
2. there is an SLD-refutation of (H, F).
The operation by which a unit clause Cᵢ ← is derived from ← Aᵢ was called identification in (Muggleton, 1990). The theorem above shows that the hypothesis H can be derived by repeated application of identification. Identification can be realized by the following proposition.
Proposition 17.10 Two atoms A and B which share no variables are unifiable iff B is a generalization of some ground instance of A.
Example 17.3 Two atoms A₁ = p(a,x) and B₁ = p(y,b) are unifiable. In fact B₁ is a generalization of p(a,b), which is a ground instance of A₁. Two atoms A₂ = p(x,x) and B₂ = p(y, f(y)) are not unifiable: B₂ is not a generalization of any ground instance of A₂.
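Proposition 17.10 is easy to check mechanically. The sketch below, which is ours rather than the chapter's, implements syntactic unification with an occurs check and applies it to the two pairs of atoms of Example 17.3; variables are uppercase strings, constants lowercase strings, and compound terms tuples of the form (functor, arg, ...).

def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(t1, t2, s):
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if is_var(t2):
        return unify(t2, t1, s)
    if isinstance(t1, tuple) and isinstance(t2, tuple) \
            and t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

A1 = ("p", "a", "X"); B1 = ("p", "Y", "b")
A2 = ("p", "X", "X"); B2 = ("p", "Y", ("f", "Y"))
print(unify(A1, B1, {}))  # {'Y': 'a', 'X': 'b'}: B1{Y -> a} = p(a,b)
print(unify(A2, B2, {}))  # None: the occurs check rejects Y = f(Y)

As the proposition predicts, the unifiable pair corresponds to a ground instance of A₁ of which B₁ is a generalization, while no ground instance of A₂ is generalized by B₂.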

An inference procedure based on Theorem 17.9 and Proposition 17.10 is illustrated in Figure 17.2.
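The heart of that procedure is its Step 3(a), which hides a genuine choice point: which argument positions of a ground instance of Bᵢ to re-generalize with fresh variables. The small Python sketch below (our illustration, not the chapter's) anticipates Example 17.4, showing how the hypothesis (p(a,y₁) ←) ∧ (p(y₂,b) ←) arises from the goal ← p(a,x₁), p(x₁,b) once x₁ has been grounded; Step 3(b), applying the mgu θᵢ back to F, is omitted here.

from itertools import count

fresh = (f"y{i}" for i in count(1))

def identify(ground_atom, positions):
    # Generalize a ground atom by placing fresh variables at the chosen
    # argument positions (the choice point of Step 3(a)).
    pred, args = ground_atom
    return (pred, tuple(next(fresh) if i in positions else a
                        for i, a in enumerate(args)))

# Goal <- p(a,x1), p(x1,b), grounded with the arbitrary choice x1 -> a:
goal = [("p", ("a", "a")), ("p", ("a", "b"))]
units = [identify(goal[0], {1}), identify(goal[1], {0})]
print(units)  # [('p', ('a', 'y1')), ('p', ('y2', 'b'))]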
Example 17.4 We illustrate how FIND_UP_with_ID works with the example in (Inoue, 1992a). Let us assume that the following background theory B₂ and example E₂ are given:
B₂ = p(x,z) ← p(x,y), p(y,z),
E₂ = p(a,b) ←.
Then the goal clauses which are consequences of some SOLD-derivations for (B₂ ∧ ¬(E₂⁻σ), ¬(E₂⁺σ)) are:
← p(a,b),

Procedure FIND_UP_with_ID
Input: a definite program B and a definite clause E
Output: a unit program H
Method:

1. If B ⊨ E then stop. Otherwise go to Step 2.

2. Derive a goal clause F = ← B₁, B₂, ..., Bₙ which is the consequence of some SOLD-derivation for (B ∧ ¬(E⁻σ), ¬(E⁺σ)).

3. For every i from 1 to n, apply the following:

(a) Make Aᵢ a generalization of a ground instance of Bᵢ such that every variable in Aᵢ is new.
(b) Apply the mgu θᵢ of Aᵢ and Bᵢ to F.

4. Return H = (A₁ ←) ∧ (A₂ ←) ∧ ... ∧ (Aₙ ←).

Figure 17.2 An inference procedure for finding unit programs.

← p(a,x₁), p(x₁,b),
← p(a,x₂), p(x₂,x₃), p(x₃,b),
...
From the goal clauses we get the following hypotheses of unit programs:
p(a,b) ←,
(p(a,y₁) ←) ∧ (p(y₂,b) ←),
(p(a,a) ←) ∧ (p(a,b) ←),
(p(a,b) ←) ∧ (p(b,b) ←),
(p(a,y₁) ←) ∧ (p(y₂,y₃) ←) ∧ (p(y₄,b) ←),
(p(a,a) ←) ∧ (p(a,y₃) ←) ∧ (p(y₄,b) ←),
...
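The goal clauses form an infinite family of chains from a to b, one per derivation depth, and can be enumerated mechanically. A tiny illustrative generator (ours, with variable numbering that differs inessentially from the listing above):

def chain_goals(n_max):
    # <- p(a,x1), p(x1,x2), ..., p(xn,b) for n = 0, 1, ..., n_max
    for n in range(n_max + 1):
        nodes = ["a"] + [f"x{i}" for i in range(1, n + 1)] + ["b"]
        yield "<- " + ", ".join(f"p({s},{t})" for s, t in zip(nodes, nodes[1:]))

for g in chain_goals(2):
    print(g)
# <- p(a,b)
# <- p(a,x1), p(x1,b)
# <- p(a,x1), p(x1,x2), p(x2,b)

Each such goal F yields, via Step 3, the unit-program hypotheses listed above.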

17.6 CONCLUDING REMARKS


In this chapter we have given an inductive inference procedure that combines abduction, deduction and generalization. The abductive inference in it is implemented by SOLD-resolution, which is a modification of SOL-resolution. The deductive inference coincides with the saturation operation proposed in ILP. Since saturation is a procedure which finds consequences iteratively, it could also be implemented with SOL-resolution. Therefore the procedure Find_DC_with_BG can be realized with programs for SOL-resolution and generalization.
Saturation seems a very natural inference mechanism for inductive inference of
definite clauses because several researchers noticed it independently. In the ILP area
it was first proposed by (Rouveirol, 1992), while (Angluin et al., 1992) independently
developed the operation in the Computational Learning area. (Cohen, 1995) uses
saturation in analyzing the complexity of finding definite clauses. (Arimura, 1997)
designed an algorithm which exactly learns definite programs (with some restricted
syntax) in polynomial time.
Inverse entailment also uses the bottom set in deriving a seed K of hypotheses, but it generalizes K by the inverse of logical entailment. Since instantiation is a sub-operation of entailment, inverse entailment is more powerful than bottom generalization. Unfortunately, there is no characterization of the hypotheses generated by inverse entailment. All that we have shown is that it cannot derive all hypotheses H such that B ∧ H ⊨ E (Yamamoto, 1999b).
We conjecture that it is quite difficult to extend the procedure FIND_UP_with_ID so that it can derive conjunctions of definite clauses. For such an extension, we would have to consider, for example, the case where two definite clauses in a hypothesis call each other in an SLD-resolution. At the current stage of our research, we have no solution to the problem of how to treat such a case. However, remember that the procedure Find_DC_with_BG is assumed to be used iteratively in an inductive inference system. Under this assumption, if we gave sufficient and proper examples to the system, hypotheses consisting of more than two clauses could be inferred. So we should carefully consider when we need extensions of FIND_UP_with_ID.
From the construction of Find_DC_with_BG, some readers might consider that we used abduction as a sub-inference of induction. But this is not correct. In the formalization of inductive inference in Section 17.2, a hypothesis derived by Find_DC_with_BG is added to the background theory of the inference system, and this addition is followed by the next call of Find_DC_with_BG. So we can consider that the abduction in the next call of Find_DC_with_BG is induced by induction. That is, abduction and induction are co-routines in our formalization.

Acknowledgments
The main part of this work was accomplished while the author was visiting the Computer Science Department of the Technical University of Darmstadt, Germany. The author wishes to thank Prof. Dr. Wolfgang Bibel and the members of his group for many discussions and suggestions on this subject.
Bibliography

Abe, A. (1997). Hypothesis generation in abduction. In Fifty-fourth Annual Conference of IPSJ, volume 2, 6G-01, pages 155-156. In Japanese.
Abe, A. (1998). Abductive analogical reasoning. Transactions of the IEICE, J81-D-II(6):1285-1292. In Japanese.
Ade, H. and Denecker, M. (1995). AILP: abductive inductive logic programming. In
Proceedings of the Fourteenth International Joint Conference on Artificial Intelli-
gence, pages 1201-1207, California. Morgan Kaufmann.
Ade, H., Malfait, B., and Denecker, M. (1994). RUTH: an ILP theory revision sys-
tem. In Proceedings of the Eighth International Symposium on Methodologies for
Intelligent Systems, volume 869 of Lecture Notes in Artificial Intelligence, pages
336-345. Springer-Verlag, Berlin.
Alchourrón, C., Gärdenfors, P., and Makinson, D. (1985). On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50:510-530.
Aliseda, A. (1996a). Toward a logic of abduction. Technical report, Stanford University, USA.
Aliseda, A. (1996b). A unified framework for abductive and inductive reasoning in philosophy and AI. In (Flach and Kakas, 1996), pages 1-6. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Aliseda, A. (1997). Seeking Explanations: Abduction in Logic, Philosophy of Science and Artificial Intelligence. PhD thesis, Institute for Logic, Language and Computation (ILLC), University of Amsterdam, The Netherlands. Available on-line at http://www.wins.uva.nl/~www/research/illc/wwwdissertations.html.
Anderson, D. ( 1986). The evolution of Peirce's concept of abduction. Transactions of
the Charles S. Peirce Society, 22(2): 145-164.
Anderson, D. (1987). Creativity and the Philosophy of C.S. Peirce, volume 27 of Philosophy Library. Martinus Nijhoff.
Andreasen, T. and Christiansen, H. (1996). Counterfactual exceptions in deductive
database queries. In Proceedings of the Twelfth European Conference on Artificial
Intelligence, pages 340-344.


Andreasen, T. and Christiansen, H. (1998). A practical approach to hypothetical database


queries. In Freitag, B., Decker, H., Kifer, M., and Voronkov, A., editors, Transac-
tions and Change in Logic Databases, volume 1472 of Lecture Notes in Computer
Science, Berlin. Springer-Verlag.
Angluin, D. (1980). Inductive inference of formal languages from positive data. Infor-
mation and Control, 45:117-135.
Angluin, D., Frazier, M., and Pitt, L. (1992). Learning conjunctions of Horn clauses. Machine Learning, 9:147-164.
Angluin, D. and Smith, C. H. (1983). Inductive inference: Theory and methods. Com-
puting Surveys, 15(3):237-269.
Aravindan, C. and Dung, P. M. (1994). Belief dynamics, abduction and databases. In MacNish, C., Pearce, D., and Pereira, L., editors, Logics in Artificial Intelligence. European Workshop JELIA'94, volume 838 of Lecture Notes in Artificial Intelligence, pages 66-85. Springer-Verlag.
Arima, J. (1997). Preduction: A common form of induction and analogy. In Proceed-
ings of the Fifteenth International Joint Conference on Artificial Intelligence, pages
210-215.
Arimura, H. (1997). Learning acyclic first-order hom sentences from implication.
In Proceedings of the Eighth International Workshop on Algorithmic Learning
Theory, volume 1316 of Lecture Notes in Artificial Intelligence, pages 432-445.
Springer-Verlag.
Arimura, H., Ishizaka, H., and Shinohara, T. (1995). Learning unions of tree patterns
using queries. In Proceedings of the Sixth International Workshop on Algorithmic
Learning Theory, volume 997 of Lecture Notes in Artificial Intelligence, pages 66-
79. Springer-Verlag.
Aristotle (1989). Prior Analytics. Hackett Publishing Company, Indianapolis, Indiana.
Translated by R. Smith.
Ayim, M. (1974). Retroduction: The rational instinct. Transactions of the Charles S.
Peirce Society, 10(1):34-43.
Baffes, P. T. (1994). Automatic Student Modeling and Bug Library Construction using
Theory Refinement. PhD thesis, Department of Computer Sciences, University of
Texas, Austin, TX.
Baffes, P. T. and Mooney, R. J. (1993). Symbolic revision of theories with M-of-N
rules. In Proceedings of the Thirteenth International Joint Conference on Artificial
Intelligence, pages 1135-1140, Chambery, France.
Baffes, P. T. and Mooney, R. J. (1996). A novel application of theory refinement to stu-
dent modeling. In Proceedings of the Thirteenth National Conference on Artificial
Intelligence, pages 403-408, Portland, OR.
Bain, M. (1992). Experiments in non-monotonic first-order induction. In (Muggleton,
1992), pages 423-436.
Bain, M. and Muggleton, S. H. (1992). Non-monotonic learning. In (Muggleton, 1992),
pages 145-161.
Baral, C. and Gelfond, M. (1994). Logic programming and knowledge representation.
Journal of Logic Programming, 19/20:73-148.

Bell, J. (1991). Pragmatic logics. In Proceedings of the Second International Confer-


ence on Principles of Knowledge Representation and Reasoning, pages 50-60, San
Mateo. Morgan Kaufmann.
Bergadano, F. and Gunetti, D. (1996). Inductive Logic Programming. MIT Press.
Bergadano, F., Gunetti, D., Nicosia, M., and Ruffo, G. (1996). Learning logic pro-
grams with negation as failure . In (De Raedt, 1996), pages 107-123.
Berwick, R. C. (1986). Learning from positive-only examples: The subset principle
and three case studies. In (Michalski et al., 1986), pages 625-645.
Bessant, B. (1996). The Babelism about induction and abduction. In (Flach and Kakas, 1996), pages 10-13. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Bhaskar, R. (1981). Explanation. In Bynum, W. F., Browne, E. J. , and Porter, R., ed-
itors, Dictionary of the History of Science, pages 140-142. Princeton University
Press.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification
and Regression Trees. Wadsworth and Brooks, Monterey, CA.
Brewka, G. (1989). Preferred subtheories: an extended logical framework for default
reasoning. In Proceedings of the Eleventh International Joint Conference on Artifi-
cial Intelligence, pages 1043-1048.
Brogi, A., Lamma, E., Mancarella, P., and Mello, P. (1997). A unifying view for
logic programming with non-monotonic reasoning. Theoretical Computer Science,
184:1-59.
Brunk, C. A. (1996). An Investigation of Knowledge Intensive Approaches to Con-
cept Learning and Theory Refinement. PhD thesis, Department of Information and
Computer Science, University of California, Irvine, CA.
Buchanan, B. and Shortliffe, E., editors (1984). Rule-Based Expert Systems: The MYCIN
Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Read-
ing, MA.
Bunge, M. (1979). Causality and the Modern Science. Dover.
Buntine, W. L. (1992). Learning classification trees. Statistics and Computing , 2:63-
73.
Buntine, W. L. (1994). Operations for learning with graphical models. Journal of Ar-
tificial Intelligence Research, 2:159-225.
Burks, A. (1946). Peirce's theory of abduction. Philosophy of Science, 13:301-306.
Bylander, T., Allemang, D., Tanner, M., and Josephson, J. R. (1991). The computa-
tional complexity of abduction. Artificial Intelligence, 49:25-60.
Carnap, R. (1950). Logical Foundations ofProbability. Routledge & Kegan Paul, Lon-
don.
Carnap, R. (1952). The Continuum of Inductive Methods. Chicago University Press.
Cayrol, M. (1992). Un modèle logique général pour le raisonnement révisable. Revue d'Intelligence Artificielle, 6(3):255-284.
Charniak, E. (1988). Motivation analysis, abductive unification and nonmonotonic
equality. Artificial Intelligence, 34(3):275-295.

Cheeseman, P. (1990). On finding the most probable model. In Shrager, J. and Lan-
gley, P., editors, Computational Models of Scientific Discovery and Theory Forma-
tion, chapter 3, pages 73-95. Morgan Kaufmann, San Mateo, CA.
Cheeseman, P., Kelly, J., Self, M., Stutz, J., Taylor, W., and Freeman, D. (1988). Auto-
class: A Bayesian classification system. In Proceedings of the Eighth International
Workshop on Machine Learning, pages 54-64, Ann Arbor, MI.
Chinn, C. (1994). Are scientific theories that predict data more believable than theories
that retrospectively explain data? a psychological investigation. In Proceedings of
the XVI Conference of the Cognitive Science Society, pages 177-182.
Christiansen, H. (1992). A complete resolution method for logical meta-programming
languages. In Pettorossi, A., editor, Proceedings of the Third International Work-
shop on Meta-Programming in Logic, volume 649 of Lecture Notes in Computer
Science, pages 205-219.
Christiansen, H. (1998a). Automated reasoning with a constraint-based metainter-
preter. Journal of Logic Programming, 37(1-3):213-254. Special issue on Con-
straint Logic Programming.
Christiansen, H. (1998b). Implicit program synthesis by a reversible metainterpreter.
In Fuchs, N. E., editor, Proceedings of the Seventh International Workshop on Logic
Program Synthesis and Transformation LOPSTR' 97, volume 1463 of Lecture Notes
in Computer Science, pages 87-106, Berlin. Springer-Verlag.
Christiansen, H. (1999). Integrity constraints and constraint logic programming. In
Proceedings of the Twelfth International Conference on Applications of Prolog,
pages 5-12.
Clark, K. L. (1978). Negation as failure. In Gallaire, H. and Minker, J., editors, Pro-
ceedings of the Symposium on Logic and Databases, pages 293-322. Plenum Press,
New York.
Cohen, W. W. ( 1992). Abductive explanation-based learning: a solution to the multiple
inconsistent explanation problem. Machine Learning, 8:167-219.
Cohen, W. W. (1995). PAC-learning recursive logic programs: Efficient algorithms.
Journal of Artificial Intelligence Research, 2:501-539.
Console, L., Giordana, A., and Saitta, L. (1991a). Investigating the relationships be-
tween abduction and inverse resolution in propositional calculus. In Proceedings of
the Fifth International Symposium on Methodologies for Intelligent Systems, vol-
ume 542 of Lecture Notes in Computer Science, pages 316-325. Springer-Verlag.
Console, L., Theseider Dupré, D., and Torasso, P. (1989). Abductive reasoning through
direct deduction from completed domain models. In Ras, Z. W., editor, Proceedings
of the Fourth International Symposium on Methodologies for Intelligent Systems,
pages 175-182. Elsevier.
Console, L., Theseider Dupré, D., and Torasso, P. (1991b). On the relationship be-
tween abduction and deduction. Journal of Logic and Computation, 1(5):661-690.
Console, L. and Torasso, P. (1991). A spectrum of logical definitions of model-based
diagnosis. Computational Intelligence, 7(3):133-141. Also in (Hamscher et al.,
1992).
Cooper, G. G. and Herskovits, E. (1992). A Bayesian method for the induction of
probabilistic networks from data. Machine Learning, 9:309-347.

Copi, I. M. and Cohen, C. (1998). Introduction to Logic. Prentice Hall, 10th edition.
Cox, P. T. and Pietrzykowski, T. (1986a). Causes for events: their computation and
application. In Proceedings of the Eighth International Conference on Automated
Deduction, volume 230 of Lecture Notes in Computer Science, pages 608-621.
Springer-Verlag.
Cox, P. T. and Pietrzykowski, T. (1986b). Incorporating equality into logic program-
ming. Annals of Pure and Applied Logic, 31:177-189.
Cox, P. T. and Pietrzykowski, T. (1987). General diagnosis by abductive inference. In
IEEE Symposium on Logic Programming, pages 183-189, San Francisco.
Dagum, P. and Luby, M. (1997). An optimal approximation algorithm for Bayesian
inference. Artificial Intelligence, 93(1-2): 1-27.
Darden, L. (1991). Theory Change in Science: Strategies from Mendelian Genetics.
Oxford University Press, New York.
de Kleer, J., Mackworth, A. K., and Reiter, R. (1992). Characterizing diagnoses and
systems. Artificial Intelligence, 56(2-3): 197-222. Also in (Hamscher et al., 1992).
de Kleer, J. and Williams, C. R. (1987). Diagnosing multiple faults . Artificial Intelli-
gence, 32:97-130.
De Raedt, L., editor (1996). Advances in Inductive Logic Programming. IOS Press,
Amsterdam.
De Raedt, L. and Bruynooghe, M. (1990). On negation and three-valued logic in in-
teractive concept-learning. In Proceedings of the Ninth European Conference on
Artificial Intelligence, pages 207-212. Pitman.
De Raedt, L. and Bruynooghe, M. (1991). A multistrategy interactive concept-learner
and theory revision system. In Proceedings of the First International Workshop on
Multistrategy Learning, pages 175-190, Harpers Ferry.
De Raedt, L. and Bruynooghe, M. ( 1992a). Belief updating from integrity constraints
and queries. Artificial Intelligence, 53:291-307.
De Raedt, L. and Bruynooghe, M. (1992b). An overview of the interactive concept-
learner and theory revisor CLINT. In (Muggleton, 1992), pages 163-191.
De Raedt, L. and Bruynooghe, M. (1993). A theory of clausal discovery. In Pro-
ceedings of the Thirteenth International Joint Conference on Artificial Intelligence,
pages 1058-1063, Chambery, France.
De Raedt, L. and Lavrac, N. (1993). The many faces of inductive logic programming.
In Komorowski, J., editor, Proceedings of the Seventh International Symposium on
Methodologies for Intelligent Systems, volume 689 of Lecture Notes in Artificial
Intelligence, pages 435-449. Springer-Verlag.
De Raedt, L. and Van Laer, W. (1995). Inductive constraint logic. In Proceedings of
the Sixth International Workshop on Algorithmic Learning Theory, volume 997 of
Lecture Notes in Artificial Intelligence. Springer-Verlag.
Debrock, G. (1997). The artful riddle of abduction (abstract). In (Rayo et al., 1997),
page 230.
Dechter, R. ( 1996). Bucket elimination: A unifying framework for probabilistic infer-
ence. In Horvitz, E. and Jensen, F., editors, Proceedings of the Twelfth Conference
on Uncertainty in Artificial Intelligence (UAI-96), pages 211-219, Portland, OR.

Decker, H. (1996). An extension of SLD by abduction and integrity maintenance for


view updating in deductive databases. In Maher, M., editor, Proceedings of the 1996
Joint International Conference and Symposium on Logic Programming, pages 157-
169. MIT Press.
DeJong, G. and Mooney, R. J. (1986). Explanation-based learning: An alternative
view. Machine Learning, 1:145-176.
Demolombe, R. and Fariñas del Cerro, L. (1991). An inference rule for hypothesis gen-
eration. In Proceedings of the Twelfth International Joint Conference on Artificial
Intelligence, pages 152-157.
Denecker, M. and de Schreye, D. (1992). SLDNFA: An abductive procedure for nor-
mal abductive programs. In Apt, K., editor, Proceedings of the 1992 Joint Inter-
national Conference and Symposium on Logic Programming, pages 686-700. MIT
Press.
Denecker, M. and de Schreye, D. (1998). SLDNFA: An abductive procedure for ab-
ductive logic programs. Journal of Logic Programming, 34(2): 111-167.
Denecker, M., Martens, B., and De Raedt, L. (1996). On the difference between abduction and induction: a model theoretic perspective. In (Flach and Kakas, 1996), pages 19-22. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Desclés, J. (1987). Implication entre concepts: la notion de typicalité. In Riegel, M. and Tamba, I., editors, L'implication dans les langues naturelles et dans les langages artificiels, pages 179-202. Klincksieck, Paris.
Dietterich, T. and Michalski, R. S. (1983). A comparative review of selected methods
for learning from examples. In (Michalski et al., 1983), pages 41-81.
Dimopoulos, Y., Dzeroski, S., and Kakas, A. C. (1997). Integrating explanatory and
descriptive learning in ILP. In Proceedings of the Fifteenth International Joint Con-
ference on Artificial Intelligence, pages 900-905. Morgan Kaufmann.
Dimopoulos, Y. and Kakas, A. C. (1995). Learning non-monotonic logic programs:
learning exceptions. In Lavrac, N. and Wrobel, S., editors, Proceedings of the
Eighth European Conference on Machine Learning, volume 912 of Lecture Notes
in Artificial Intelligence, pages 122-137. Springer-Verlag.
Dimopoulos, Y. and Kakas, A. C. (1996a). Abduction and induction: an AI perspective. In (Flach and Kakas, 1996), pages 68-70. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Dimopoulos, Y. and Kakas, A. C. (1996b ). Abduction and inductive learning. In (De
Raedt, 1996), pages 144-171.
Ennis, R. (1968). Enumerative induction and best explanation. Journal of Philosophy,
LXV(18):523-529.
Eshghi, K. ( 1988). Abductive planning with event calculus. In Kowalski, R. and Bowen,
K., editors, Proceedings of the Fifth International Conference on Logic Program-
ming, pages 562-579. MIT Press.
Eshghi, K. and Kowalski, R. A. (1989). Abduction compared with negation by fail-
ure. In Levi, G. and Martelli, M., editors, Proceedings of the Sixth International
Conference on Logic Programming, pages 234-254. MIT Press.

Esposito, F., Lamma, E., Malerba, D., Mello, P., Milano, M., Riguzzi, F., and Semer-
aro, G. (1996). Learning abductive logic programs. In (Flach and Kakas, 1996),
pages 23-30. Available on-line at http :Ilwww. cs .bris. ac. ukl-flachl
ECAI961.
Evans, C. A. and Kakas, A. C. (1992). Hypothetico-deductive reasoning. In Proceed-
ings of the International Conference on Fifth Generation Computer Systems, pages
546-554. ICOT.
Fann, K. T. ( 1970). Peirce's Theory of Abduction. Martinus Nijhoff, The Hague.
Flach, P. A. (1991). The role of explanations in inductive learning. ITK Research Report 30, Tilburg University, The Netherlands.
Flach, P. A. (1992). Generality revisited. In Proceedings of the ECAI'92 Workshop on
Logical Approaches to Machine Learning, Vienna.
Flach, P. A. (1995). Conjectures-an inquiry concerning the logic of induction. PhD
thesis, Tilburg University.
Flach, P. A. (1996a). Abduction and induction: syllogistic and inferential perspectives. In (Flach and Kakas, 1996), pages 31-35. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Flach, P. A. (1996b). Rationality postulates for induction. In Shoham, Y., editor, Pro-
ceedings of the Sixth International Conference on Theoretical Aspects ofReasoning
and Knowledge (TARK'96), pages 267-281.
Flach, P. A. (2000). Logical characterisations of inductive learning. In Gabbay, D. M.
and Smets, P., editors, Handbook of Defeasible Reasoning and Uncertainty Man-
agement, volume IV: Abduction and Learning, chapter 4. Kluwer Academic Pub-
lishers.
Flach, P. A. and Kakas, A. C., editors (1996). Proceedings of the ECAI'96 Workshop on Abductive and Inductive Reasoning. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Flach, P. A. and Kakas, A. C. (1997a). Abductive and inductive reasoning: report of the ECAI'96 workshop. Logic Journal of the IGPL, 5(5):773-778.
Flach, P. A. and Kakas, A. C., editors (1997b). Proceedings of the IJCAI'97 Workshop on Abduction and Induction in Artificial Intelligence. Available on-line at http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Flach, P. A. and Kakas, A. C. (1998). Abduction and induction in AI: report of the IJCAI'97 workshop. Logic Journal of the IGPL, 6(4):651-656.
Flener, P. (1997). Inductive logic program synthesis with dialogs. In Muggleton, S.,
editor, Inductive Logic Programming: Selected papers from the Sixth International
Workshop, pages 175-198. Springer-Verlag, Berlin.
Frankfurt, H. (1958). Peirce's notion of abduction. Journal of Philosophy, 55:594.
Frege, G. (1893). Grundgesetze der Arithmetik, begriffsschriftlich abgeleitet, Vol. I. Jena, Germany. Reprinted by Olms, Hildesheim (1966).
Frühwirth, T. W. (1995). Constraint handling rules. In Podelski, A., editor, Constraint
Programming: Basics and Trends, volume 910 of Lecture Notes in Computer Sci-
ence, pages 90-107.
Fung, T. and Kowalski, R. A. (1997). The IFF proof procedure for Abductive Logic
Programming. Journal of Logic Programming, 33(2):151-165.

Furukawa, K. and Shimazu, K. (1996). Knowledge discovery in database by Progol


- conception and design. In Tenth Annual Conference of JSAI, pages 49-52. In
Japanese.
Gabbay, D. M. (1985). Theoretical foundations for non-monotonic reasoning in expert
systems. In Apt, K. R., editor, Logics and Models of Concurrent Systems, pages
439-457. Springer-Verlag, Berlin.
Gallagher, J.P. (1993). Tutorial on specialisation of logic programs. In Proceedings of
the ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Pro-
gram Manipulation (PEPM'93), pages 88-98, Copenhagen.
Galliers, J. (1992). Autonomous belief revision and communication. In Gärdenfors, P., editor, Belief Revision, pages 220-246. Cambridge Tracts in Theoretical Computer Science, Cambridge University Press.
Gärdenfors, P. (1988). Knowledge in Flux: Modeling the Dynamics of Epistemic States. MIT Press.
Gärdenfors, P. and Makinson, D. (1988). Revisions of knowledge systems using epistemic entrenchment. In Proceedings of the Second Conference on Theoretical Aspects of Reasoning and Knowledge, pages 83-96.
Gärdenfors, P. and Makinson, D. (1994). Nonmonotonic inferences based on expectations. Artificial Intelligence, 65:197-245.
Gärdenfors, P. and Rott, H. (1995). Belief revision. In Gabbay, D. M., Hogger, C. J., and Robinson, J. A., editors, Handbook of Logic in Artificial Intelligence and Logic Programming, volume IV: Epistemic and Temporal Reasoning, pages 35-132. Oxford University Press.
Gelfond, M. and Lifschitz, V. (1991). Classical negation in logic programs and dis-
junctive databases. New Generation Computing , 9(3/4):365-385.
Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cogni-
tive Science, 7:155-170.
Gentner, D. (1988). Analogical Inference and Analogical Access. Pitman.
Goebel, R. (1989). A sketch of analogy as reasoning with equality hypotheses. In
Proceedings of the International Workshop on Analogical and Inductive Inference,
volume 397 of Lecture Notes in Artificial Intelligence, pages 243-253.
Gorlée, D. (1997). ¡Eureka! La traducción interlingüística como descubrimiento pragmático-abductivo (abstract). In (Rayo et al., 1997), page 230.
Grégoire, E. and Sais, L. (1996). Inductive reasoning is sometimes deductive. In (Flach and Kakas, 1996), pages 36-39. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Haack, S. (1993). Evidence and Inquiry. Towards Reconstruction in Epistemology .
Blackwell, Oxford (UK) and Cambridge, Mass.
Hamfelt, A. and Nilsson, J. F. (1996). Declarative logic programming with primitive
recursive relations on lists. In Maher, M., editor, Proceedings of the 1996 Joint
International Conference and Symposium on Logic Programming, pages 230-243.
MIT Press.
Hamfelt, A. and Nilsson, J. F. (1997). Towards a logic programming methodology
based on higher-order predicates. New Generation Computing, 15:421-228.

Hamscher, W., Console, L., and de Kleer, J., editors (1992). Readings in Model-Based
Diagnosis. Morgan Kaufmann.
Hanson, N. R. (1958). The logic of discovery. Journal of Philosophy, 55(25): 1073-
1089.
Hanson, N. R. (1965). Notes towards a logic of discovery. In Bernstein, R., editor,
Critical Essays on C.S. Peirce. Yale University Press.
Harman, G. (1965). The inference to the best explanation. Philosophical Review,
74:88-95.
Harper, W. and Skyrms, B., editors (1988). Causation, Cause and Credence. Kluwer
Academic Press.
Heckerman, D. (1995). A tutorial on learning Bayesian networks. Technical Report
MSR-TR-95-06, Microsoft Research, Redmond, WA. (Revised November 1996).
Helft, N. ( 1989). Induction as nonmonotonic inference. In Proceedings of the First In-
ternational Conference on Principles of Knowledge Representation and Reasoning,
pages 149-156, Toronto, Canada. Morgan Kaufmann.
Hempel, C. G. (1943). A purely syntactical definition of confirmation. Journal of Symbolic Logic, 8(4):122-143.
Hempel, C. G. (1945). Studies in the logic of confirmation. Mind, 54(213 & 214):1-26
& 97-121.
Hempel, C. G. (1962). Deductive-nomological versus statistical explanation. In Feigl, H. and Maxwell, G., editors, Minnesota Studies in the Philosophy of Science, Vol. III, pages 98-169. University of Minnesota Press.
Hempel, C. G. (1965). Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. Free Press, New York.
Hempel, C. G. (1966). Philosophy of Natural Science. Prentice Hall, Englewood Cliffs, NJ.
Hempel, C. G. and Oppenheim, P. (1948). Studies in the logic of explanation. Philos-
ophy of Science, 15:135-175.
Henglein, F. (1989). Polymorphic type inference and semi-unification. PhD thesis, De-
partment of Computer Science, New York University.
Henrion, M. ( 1988). Propagating uncertainty in Bayesian networks by probabilistic
logic sampling. In Lemmer, J. F. and Kanal, L. N., editors, Uncertainty in Artificial
Intelligence 2, pages 149-163. Elsevier.
Hill, P.M. and Gallagher, J. P. (1994). Meta-programming in logic programming. In
Gabbay, D. M., Hogger, C. J., and Robinson, J. A., editors, Handbook of Logic in
Artificial Intelligence and Logic Programming, volume V. Oxford University Press.
Hill, P. M. and Lloyd, J. W. ( 1989). Analysis of meta-programs. In Meta-programming
in Logic Programming, pages 23-51. MIT Press.
Hirata, K. (1995). A classification of abduction: Abduction for logic programming. In
Machine Intelligence 14, pages 397-424. Oxford University Press.
Hobbs, J., Stickel, M., Appelt, D., and Martin, P. (1990). Interpretation as abduction.
Technical report, SRI International, Menlo Park, Ca.
Hobbs, J., Stickel, M., Appelt, D., and Martin, P. (1993). Interpretation as abduction.
Artificial Intelligence, 63:69-142.

Hobbs, J., Stickel, M., Martin, P., and Edwards, D. (1988). Interpretation as abduction.
In Proceedings of the Twentysixth Annual ACL Meeting, pages 95-103, Buffalo.
Holland, J., Holyoak, K., Nisbett, R., and Thagard, P.R. (1986). Induction: Processes
of inference, learning, and discovery. MIT Press, Cambridge, MA.
Hookway, C. (1992). Peirce. Routledge & Kegan Paul, London.
Horwich, P. (1982). Probability and Evidence. Cambridge University Press.
Inoue, K. (1992a). Linear resolution for consequence finding. Artificial Intelligence,
56:301-353.
Inoue, K. ( 1992b). Studies on abductive and nonmonotonic reasoning. PhD thesis,
Kyoto University, Kyoto.
Inoue, K. (1994). Hypothetical reasoning in logic programs. Journal of Logic Pro-
gramming, 18(3):191-227.
Inoue, K. and Kudoh, Y. (1997). Learning extended logic programs. In Proceedings of
the Fifteenth International Joint Conference on Artificial Intelligence, pages 176-
181. Morgan Kaufmann.
Inoue, K. and Sakama, C. (1995). Abductive framework for nonmonotonic theory
change. In Proceedings of the Fourteenth International Joint Conference on Ar-
tificial Intelligence, pages 204-210. Morgan Kaufmann, California.
Inoue, K. and Sakama, C. (1996). A fixpoint characterization of abductive logic pro-
grams. Journal of Logic Programming, 27(2): 107-136.
Inoue, K. and Sakama, C. (1998). Specifying transactions for extended abduction.
In Proceedings of the Sixth International Conference on Principles of Knowledge
Representation and Reasoning, pages 394-405. Morgan Kaufmann, California.
Jaffar, J. and Maher, M. (1994). Constraint logic programming: A survey. Journal of
Logic Programming, 19-20:503-581.
Jaffar, J., Maher, M., Marriott, K., and Stuckey, P. (1998). Semantics of constraint
logic programs. Journal of Logic Programming, 37:1-46.
Jaynes, E. (1985). Bayesian methods: General background. In J.H. Justice, editor,
Maximum Entropy and Bayesian Methods in Applied Statistics, pages 1-25. Cam-
bridge University Press, Cambridge, England.
Jaynes, E. (1995). Probability Theory: The Logic of Science. Unpublished manuscript. Available on-line at ftp://bayes.wustl.edu/Jaynes.book.
Jensen, F. V. (1996). An Introduction to Bayesian Networks. Springer-Verlag, New
York.
Johnson-Laird, P. (1988). The Computer and the Mind. W. Collins and Co.
Jordan, M. and Bishop, C. (1996). Neural networks. Memo 1562, MIT Artificial In-
telligence Lab, Cambridge, MA.
Josephson, J. R. (1994). Conceptual analysis of abduction. In (Josephson and Joseph-
son, 1994), chapter 1, pages 5-30.
Josephson, J. R. and Josephson, S. G. ( 1994). Abductive Inference: Computation, Phi-
losophy, Technology. Cambridge University Press, New York.
Jung, B. (1993). On inverting generality relations. In Proceedings of the Third Inter-
national Workshop on Inductive Logic Programming, pages 87-101.
Kakas, A. C., Kowalski, R. A., and Toni, F. (1992). Abductive logic programming.
Journal of Logic and Computation, 2(6):719-770.

Kakas, A. C., Kowalski, R. A., and Toni, F. (1997). The role of abduction in logic
programming. In Gabbay, D. M., Hogger, C. J., and Robinson, J. A., editors, Hand-
book of Logic in Artificial Intelligence and Logic Programming, volume 5, pages
233-306. Oxford University Press.
Kakas, A. C. and Mancarella, P. (1990a). Database updates through abduction. In Pro-
ceedings of the Sixteenth International Conference on Very Large Databases, pages
650-661. Morgan Kaufmann, California.
Kakas, A. C. and Mancarella, P. (1990b). Generalized stable models: a semantics for
abduction. In Proceedings of the Ninth European Conference on Artificial Intelli-
gence, pages 385-391. Pitman.
Kakas, A. C. and Mancarella, P. (1990c). On the relation between truth maintenance
and abduction. In Proceedings of the Second Pacific Rim International Conference
on Artificial Intelligence.
Kakas, A. C. and Mancarella, P. (1994). Knowledge assimilation and abduction. In In-
ternational Workshop on Truth Maintenance, Lecture Notes in Computer Science.
Springer-Verlag.
Kakas, A. C. and Michael, A. (1995). Integrating abductive and constraint logic pro-
gramming. In Proceedings of the Twelfth International Conference on Logic Pro-
gramming ICLP-95, pages 399-413.
Kakas, A. C. and Riguzzi, F. (1997). Learning with abduction. In Lavrac, N. and
Dzeroski, S., editors, Proceedings ofthe Seventh International Workshop on Induc-
tive Logic Programming, volume 1297 of Lecture Notes in Artificial Intelligence,
pages 181-188. Springer-Verlag.
Kakas, A. C. and Riguzzi, F. (1999). Abductive concept learning. New Generation
Computing.
Kanai, T. and Kunifuji, S. (1997). Extending inductive generalisation with abduction. In (Flach and Kakas, 1997b), pages 25-30. Available on-line at http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Kapitan, T. (1990). In what way is abductive inference creative? Transactions of the
Charles S. Peirce Society, 26(4):449-512.
Kasahara, K., Matsuzawa, K., Ishikawa, T., and Kawaoka, T. (1996). Viewpoint-based
measurement of semantic similarity between words. In Proceedings of the Fifth In-
ternational Workshop on Artificial Intelligence and Statistics, volume 112 of Lec-
ture Notes in Statistics, pages 433-442.
Kelly, K. and Glymour, C. (1988). Theory discovery from data with mixed quantifiers. Technical Report CMU-PHIL-9, Department of Philosophy, Methodology and Logic, Carnegie-Mellon University, Pittsburgh, PA.
Kemeny, J. (1953). The use of simplicity in induction. Philosophical Review, 62:391-
408.
Kfoury, A., Tiuryn, J., and Urzyczyn, P. (1990). The undecidability of the semi-unification problem. In Proceedings of the Twentysecond Annual ACM Symposium on Theory of Computing, pages 468-476.
Kijsirikul, B., Numao, M., and Shimura, M. (1992). Discrimination-based constructive
induction of logic programs. In Proceedings of the Tenth National Conference on
Artificial Intelligence, pages 44-49, San Jose, CA.

Kitcher, P. (1981 ). Explanatory unification. Philosophy of Science, 48:251-281.


Kodratoff, Y. (1991 ). Induction and the organization of knowledge. In Proceedings of
the First International Workshop on Multistrategy Learning, pages 34-48, Harpers
Ferry.
Kodratoff, Y. and Ganascia, J.-G. (1986). Improving the generalization step in learn-
ing. In (Michalski et al., 1986), pages 215-244.
Konolige, K. (1992). Abduction versus closure in causal theories. Artificial Intelli-
gence, 53(2-3):255-272.
Konolige, K. (1996). Abductive theories in artificial intelligence. In G.Brewka, editor,
Principles of Knowledge Representation. CSLI Publications.
Kowalski, R. A. (1970). The case for using equality axioms in automatic demonstra-
tion. In Proceedings of the Symposium on Automatic Demonstration, volume 125
of Lecture Notes in Mathematics, pages 112-127. Springer-Verlag.
Kraus, S., Lehmann, D., and Magidor, M. (1990). Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence, 44:167-207.
Kruijff, G.-J. (1995). The unbearable demise of surprise: Reflections on abduction in artificial intelligence and Peirce's philosophy. Master's thesis, University of Twente, Enschede, the Netherlands.
Kruijff, G.-J. (1997). Concerning logics of abduction - on integrating abduction and induction. In (Flach and Kakas, 1997b), pages 31-36. Available on-line at http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Kuipers, T. (1998). Abduction aiming at truth approximation. Paper presented at the
"Creativity and Discovery Conference", Gent, Belgium.
Lachiche, N. and Marquis, P. (1997). A model for generalization based on confirma-
tory induction. In van Someren, M. and Widmer, G., editors, Proceedings of the
Ninth European Conference on Machine Learning, volume 1224 of Lecture Notes
in Artificial Intelligence, pages 154-161, Prague, Czech Republic. Springer-Verlag.
Lakatos, I. (1975). Falsification and the methodology of scientific research programmes.
In Lakatos, I. and Musgrave, A., editors, Criticism and the Growth of Knowledge,
pages 91-196. Cambridge University Press.
Lamma, E., Mello, P., Milano, M., and Riguzzi, F. (1997). A system for learning ab-
ductive logic programs. In J. Dix, L. M. Pereira, T. P., editor, Proceedings of the
Workshop on Logic Programming and Knowledge Representation LPKR'97. Uni-
versitat Koblenz-Landau, Technical Report n. 10/97.
Lamma, E., Riguzzi, F., and Pereira, L. M. (1998). Learning with extended logic pro-
grams. In Dix, J. and Lobo, J., editors, Proceedings of the LP Track at the Interna-
tional NMR'98 Workshop. Universität Koblenz.
Laplace, P. (1812). Théorie Analytique des Probabilités. Courcier, Paris.
Lauritzen, S. L. and Spiegelhalter, D. J. (1988). Local computations with probabilities
on graphical structures and their application to expert systems. Journal of the Royal
Statistical Society, Series B, 50(2): 157-224.
Lavrac, N. and Dzeroski, S. (1994). Inductive Logic Programming: Techniques and
Applications. Ellis Horwood.
Leake, D. (1995). Abduction, experience and goals. Journal of Experimental and The-
oretical Artificial Intelligence.

Lee, R. (1967). A Completeness Theorem and Computer Program for Finding Theo-
rems Derivable from Given Axioms. PhD thesis, University of California, Berkeley.
Leiß, H. (1984). Polymorphic recursion and semi-unification. In Lecture Notes in
Computer Science 440, pages 211-224. Springer-Verlag.
Levesque, H. J. (1989). A knowledge-level account of abduction. In Proceedings of
the Eleventh International Joint Conference on Artificial Intelligence, pages 1061-
1067, Detroit.
Levi, I. (1991 ). The Fixation of Belief and its Undoing. Cambridge University Press.
Lewis, D. (1986). Causal explanation. In Philosophical Papers, volume 2. Oxford Uni-
versity Press.
Lipton, P. (1991). Inference to the Best Explanation. Routledge & Kegan Paul, Lon-
don.
Lloyd, J. W. (1987). Foundations of Logic Programming. Springer-Verlag, 2nd edition.
Lobo, J. and Uzcategui, C. (1998). Abductive change operators. Fundamenta Infor-
matica.
Loredo, T. (1990). From Laplace to supernova SN 1987A: Bayesian inference in as-
trophysics. In Fougere, P., editor, Maximum Entropy and Bayesian Methods, pages
81-142. Kluwer Academic Press, Dordrecht, The Netherlands.
Lucas, P. (1998). Analysis of notions of diagnosis. Artificial Intelligence, 105(1-2):289-
337.
Mackworth, A. K. (1978). Vision research strategy: black magic, metaphors, mecha-
nisms, miniworlds and maps. In Hanson, A. R. and Riseman, E. M., editors, Com-
puter Vision Systems, pages 53-61. Academic Press, New York, NY.
Makinson, D. (1989). General theory of cumulative inference. In Reinfrank, M., de
Kleer, J., Ginsberg, M. L., and Sandewall, E., editors, Proceedings ofthe Second In-
ternational Workshop on Non-Monotonic Reasoning, volume 346 of Lecture Notes
in Artificial Intelligence, pages 1-18, Berlin. Springer-Verlag.
Martin, L. and Vrain, C. (1996). A three-valued framework for the induction of general
logic programs. In (De Raedt, 1996), pages 219-235.
Mayer, M. C. and Pirri, F. (1996). Abduction is not deduction-in-reverse. Logic Jour-
nal of the IGPL, 4(1):1-14.
McCarthy, J. (1980). Circumscription: a form of non-monotonic reasoning. Artificial
Intelligence, 13(1-2):27-39.
Meltzer, B. (1970). The semantics of induction and the possibility of complete systems of inductive inference. Artificial Intelligence, 1:189-192.
Michalski, R. S. ( 1983a). A theory and methodology of inductive learning. In (Michal-
ski et al., 1983), pages 83-134.
Michalski, R. S. (1983b). A theory and methodology of inductive learning. Artificial
Intelligence, 20(2): 111-162.
Michalski, R. S. (1987). Concept learning. In Shapiro, S., editor, Encyclopedia of Ar-
tificial Intelligence, pages 185-194. John Wiley, Chicester.
Michalski, R. S. (1991). Inferential learning theory as a basis for multistrategy task-
adaptive learning. In Proceedings of the First International Workshop on Multi-
strategy Learning, pages 3-18, Harpers Ferry.

Michalski, R. S. (1993). Inferential theory of learning as a conceptual basis for multi-


strategy learning. Machine Learning, 11:111-151.
Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., editors (1983). Machine Learn-
ing: an Artificial Intelligence Approach. Morgan Kaufmann, Palo Alto, CA.
Michalski, R. S., Carbonell, J. G., and Mitchell, T. M., editors (1986). Machine Learn-
ing: an Artificial Intelligence Approach, Volume II. Morgan Kaufmann, Palo Alto,
CA.
Mill, J. S. (1843). A System of Logic. Reprinted in The Collected Works of John Stuart Mill, J.M. Robson, editor, Routledge & Kegan Paul, London.
Mitchell, T. M. (1982). Generalization as search. Artificial Intelligence, 18:203-226.
Mitchell, T. M., Keller, R., and Kedar-Cabelli, S. (1986). Explanation-based general-
ization: A unifying view. Machine Learning, 1:47-80.
Mooney, R. J. (1995a). Encouraging experimental results on learning CNF. Machine
Learning, 19(1):79-92.
Mooney, R. J. (1995b). A preliminary PAC analysis of theory revision. In Petsche, T.,
Hanson, S., and Shavlik, J., editors, Computational Learning Theory and Natural
Learning Systems, Vol. 3, pages 43-53. MIT Press, Cambridge, MA.
Morgan, C. (1971 ). Hypothesis generation by machine. Artificial Intelligence, 2: 179-
187.
Mortimer, H. (1988). The logic of induction. Ellis Horwood.
Muggleton, S. H. (1987). DUCE, an oracle based approach to constructive induction.
In Proceedings of the Tenth International Joint Conference on Artificial Intelli-
gence, pages 287-292, Milano.
Muggleton, S. H. (1990). Inductive logic programming. In Arikawa, S., Goto, S.,
Ohsuga, S., and Yokomori, T., editors, Proceedings of the First International Work-
shop on Algorithmic Learning Theory, pages 42-62. JSAI.
Muggleton, S. H. (1991). Inductive logic programming. New Generation Computing,
8( 4):295-318.
Muggleton, S. H., editor (1992). Inductive Logic Programming. Academic Press.
Muggleton, S. H. (1993). Inductive logic programming: Derivations, successes and
shortcomings. In Proceedings ofthe Sixth European Conference on Machine Learn-
ing, pages 21-37, Vienna.
Muggleton, S. H. (1995). Inverse entailment and Progol. New Generation Computing,
13(3-4):245-286.
Muggleton, S. H. and Buntine, W. L. (1988). Machine invention of first-order predi-
cates by inverting resolution. In Proceedings of the Fifth International Conference
on Machine Learning, pages 339-352, San Mateo, CA. Morgan Kaufmann.
Muggleton, S. H. and De Raedt, L. (1994). Inductive logic programming: theory and
methods. Journal of Logic Programming, 19-20:629-679.
Muggleton, S. H. and Feng, C. (1990). Efficient induction of logic programs. In Arikawa,
S., Goto, S., Ohsuga, S., and Yokomori, T., editors, Proceedings of the First Inter-
national Workshop on Algorithmic Learning Theory. JSAI.
Neal, R. (1996). Bayesian Learning for Neural Networks. Springer-Verlag, New York.
Ng, H. T. (1992). A General Abductive System with Applications to Plan Recognition
and Diagnosis. PhD thesis, Department of Computer Sciences, University of Texas,
Austin, TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 92-177.
Ng, H. T. and Mooney, R. J. (1990). On the role of coherence in abductive explanation.
In Proceedings of the Eighth National Conference on Artificial Intelligence, pages
337-342, Boston.
Ng, H. T. and Mooney, R. J. (1991). An efficient first-order Horn-clause abduction
system based on the ATMS. In Proceedings of the Ninth National Conference on
Artificial Intelligence, pages 494-499, Anaheim, CA.
Ng, H. T. and Mooney, R. J. (1992). Abductive plan recognition and diagnosis: A com-
prehensive empirical evaluation. In Proceedings of the Third International Confer-
ence on Principles of Knowledge Representation and Reasoning, pages 499-508,
Cambridge, MA.
Nienhuys-Cheng, S.-H. and de Wolf, R. (1994). The subsumption theorem in inductive
logic programming: Facts and fallacies. In de Raedt, L., editor, Proceedings of the
Fourth International Workshop on Inductive Logic Programming, pages 147-160.
Nienhuys-Cheng, S.-H. and de Wolf, R. (1995). The subsumption theorem revisited:
Restricted to SLD-resolution. In Computer Science in the Netherlands '95.
Nienhuys-Cheng, S.-H. and de Wolf, R. (1997). Foundations of Inductive Logic Pro-
gramming. Number 1228 in Lecture Notes in Artificial Intelligence. Springer-Verlag.
Numao, M. and Shimura, M. ( 1990). Inductive program synthesis by using a reversible
meta-interpreter. In Proceedings of the Second Workshop on Meta-programming in
Logic, pages 123-136.
O'Neill, M. and Chiafari, F. (1989). Escherichia coli promoters. Journal of Biological
Chemistry, 264:5531-5534.
O'Rorke, P. (1994). Abduction and explanation-based learning: case studies in diverse
domains. Computational Intelligence, 10:295-330.
O'Rorke, P., Morris, S., and Schulenburg, D. (1990). Theory formation by abduction:
A case study based on the chemical revolution. In Shrager, J. and Langley, P., ed-
itors, Computational Models of Scientific Discovery and Theory Formation, pages
97-224. Morgan Kaufmann.
Ourston, D. (1991). Using Explanation-Based and Empirical Methods in Theory Re-
vision. PhD thesis, Department of Computer Sciences, University of Texas, Austin,
TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 91-164.
Ourston, D. and Mooney, R. J. (1990). Changing the rules: a comprehensive approach
to theory refinement. In Proceedings of the Eighth National Conference on Artifi-
cial Intelligence, pages 815-820. MIT Press, Cambridge.
Ourston, D. and Mooney, R. J. (1994). Theory refinement combining analytical and
empirical methods. Artificial Intelligence, 66:311-344.
Öztürk, P. (1997). An AI criterion for an account of inference: how to realize a task. In (Flach and Kakas, 1997b), pages 43-48. Available on-line at http://www.cs.bris.ac.uk/~flach/IJCAI97/.
Pagnucco, M. (1996). The Role of Abductive Reasoning within the Process of Belief Revision. PhD thesis, Basser Department of Computer Science, University of Sydney, Australia.

Pearl, J. (1978). On the connection between the complexity and credibility of inferred
models. International Journal of General Systems, 4:255-264.
Pearl, J. (1987). Evidential reasoning using stochastic simulation of causal models.
Artificial Intelligence, 32(2):245-257.
Pearl, J. (1988a). Embracing causation in default reasoning. Artificial Intelligence,
35(2):259-271.
Pearl, J. (1988b ). Probabilistic Reasoning in Intelligent Systems: Networks of Plausi-
ble Inference. Morgan Kaufmann, San Mateo, CA.
Peirce, C. S. (1955a). Abduction and induction. In Buchler, J., editor, Philosophical
Writings of Peirce, chapter 11, pages 150-156. Dover.
Peirce, C. S. (1955b ). The fixation of belief. In Buchler, J., editor, Philosophical Writ-
ings of Peirce. Dover.
Peirce, C. S. ( 1957). Essays in the Philosophy of Science. Liberal Arts Press.
Peirce, C. S. (1958). Collected Papers of Charles Sanders Peirce. Harvard University
Press, Cambridge, Massachusetts. Edited by C. Hartshorne, P. Weiss & A. Burks.
Peng, Y. and Reggia, J. A. (1990). Abductive Inference Models for Diagnostic Problem-
Solving. Springer-Verlag, New York.
Pino Pérez, R. and Uzcátegui, C. (1997). Jumping to explanations vs. jumping to conclusions. Technical Report IT-301, LIFL, University of Lille, France.
Plotkin, G. D. (1970). A note on inductive generalization. In Meltzer, B. and Michie,
D., editors, Machine Intelligence 5, pages 153-163. Edinburgh University Press.
Plotkin, G. D. (1971). Automatic Methods of Inductive Inference. PhD thesis, Edin-
burgh University.
Poole, D. (1988a). A logical framework for default reasoning. Artificial Intelligence,
36(1):27-47.
Poole, D. (1988b ). Representing knowledge for logic-based diagnosis. In Proceedings
of the International Conference on Fifth Generation Computing Systems, pages
1282-1290, Tokyo, Japan.
Poole, D. (1989a). Explanation and prediction: An architecture for default and abduc-
tive reasoning. Computational Intelligence, 5(2):97-110.
Poole, D. (1989b). Normality and faults in logic-based diagnosis. In Proceedings of
the Eleventh International Joint Conference on Artificial Intelligence, pages 1304-
1310, Detroit, MI.
Poole, D. (1990). A methodology for using a default and abductive reasoning system.
International Journal of Intelligent Systems, 5(5):521-548.
Poole, D. (1993). Probabilistic Horn abduction and Bayesian networks. Artificial In-
telligence, 64(1):81-129.
Poole, D. (1994). Representing diagnosis knowledge. Annals of Mathematics and Ar-
tificial Intelligence, 11:33-50.
Poole, D. (1996). Probabilistic conflicts in a search algorithm for estimating posterior
probabilities in Bayesian networks. Artificial Intelligence, 88:69-100.
Poole, D. (1997). The independent choice logic for modelling multiple agents under uncertainty. Artificial Intelligence, 94:7-56. Available on-line at http://www.cs.ubc.ca/spider/poole/abstracts/icl.html.

Poole, D. (1998). Abducing through negation as failure: stable models in the Independent Choice Logic. Journal of Logic Programming. Available on-line at http://www.cs.ubc.ca/spider/poole/abstracts/approx-pa.html.
Poole, D., Goebel, R., and Aleliunas, R. (1987). Theorist: A logical reasoning system
for defaults and diagnosis. In Cercone, N. and McCalla, G., editors, The Knowledge
Frontier: Essays in the Representation of Knowledge, pages 331-352. Springer-
Verlag.
Pople, Jr., H. E. (1973). On the mechanization of abductive logic. In Nilsson, N.J.,
editor, Proceedings of the Third International Joint Conference on Artificial Intel-
ligence, pages 147-152, Stanford, CA. William Kaufmann.
Popper, K. (1959). The Logic of Scientific Discovery. Basic Books, New York.
Psillos, S. (1996). Ampliative reasoning: induction or abduction? In (Flach and Kakas, 1996), pages 56-61. Available on-line at http://www.cs.bris.ac.uk/~flach/ECAI96/.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81-106.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning,
5(3):239-266.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San
Mateo, CA.
Ramachandran, S. (1998). Theory Refinement of Bayesian Networks with Hidden Variables. PhD thesis, Department of Computer Sciences, University of Texas, Austin, TX. Also appears as Artificial Intelligence Laboratory Technical Report AI 98-265 (see http://www.cs.utexas.edu/users/ai-lab).
Ramachandran, S. and Mooney, R. J. (1998). Theory refinement for Bayesian net-
works with hidden variables. In Proceedings of the Fifteenth International Confer-
ence on Machine Learning, Madison, WI. Morgan Kaufmann.
Ramsey, F. P. (1931). General propositions and causality. In Foundations of Mathe-
matics and Other Logical Essays, pages 237-257. Routledge & Kegan Paul.
Rayo, M., Gimate-Welsh, A., and Pellegrino, P., editors (1997). VIth International
Congress of the International Association for Semiotic Studies: Semiotics Bridging
Nature and Culture. Editorial Solidaridad, Mexico.
Reggia, J. A., Nau, D., and Wang, P. (1983). Diagnostic expert systems based on a set
covering model. International Journal of Man-Machine Studies, 19(5):437-460.
Reilly, F. E. (1970). Charles Peirce's Theory of Scientific Method. Fordham University
Press, New York.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13:81-132.
Reiter, R. (1987). A theory of diagnosis from first principles. Artificial Intelligence,
32(1 ):57-96. Also in (Hamscher et al., 1992).
Reiter, R. and de Kleer, J. (1987). Foundation of assumption-based truth maintenance
systems: preliminary report. In Proceedings of the Sixth National Conference on
Artificial Intelligence, pages 183-188.
Roesler, A. (1997). Perception and abduction (abstract). In (Rayo et al., 1997), page
226.
Rouveirol, C. (1992). Extensions of inversion of resolution applied to theory comple-
tion. In (Muggleton, 1992), pages 63-92.
Rouveirol, C. and Puget, J.-F. (1990). Beyond inversion of resolution. In Proceedings of the Seventh International Conference on Machine Learning, pages 122-131, Austin, TX.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing, chapter 8, pages 318-362. MIT Press, Cambridge, MA.
Saitta, L., Botta, M., and Neri, F. (1993). Multi-strategy learning and theory revision.
Machine Learning, 11:153-172.
Salmon, M. (1984a). Introduction to Logic and Critical Thinking. Harcourt, Brace,
Jovanovich.
Salmon, W. C. (1984b). Scientific Explanation and the Causal Structure of the World.
Princeton University Press, Princeton.
Salmon, W. C. (1990). Four Decades of Scientific Explanation. University of Min-
nesota Press, Minneapolis.
Shanahan, M. (1989). Prediction is deduction, but explanation is abduction. In Pro-
ceedings of the Eleventh International Joint Conference on Artificial Intelligence,
pages 1055-1060, Detroit, MI.
Shannon, C. and Weaver, W. (1949). The Mathematical Theory of Communication. University of Illinois Press.
Shapiro, E. Y. (1981). Inductive inference of theories from facts. Research Report 192, Yale University.
Shapiro, E. Y. (1983). Algorithmic Program Debugging. MIT Press.
Shapiro, E. Y. (1991). Inductive inference of theories from facts. In Lassez, J. L.
and Plotkin, G., editors, Computational Logic: Essays in Honor of Alan Robinson,
pages 237-257. MIT Press, Cambridge, Mass.
SICS (1998). SICStus Prolog User's Manual, Release 3.7. Intelligent Systems Lab-
oratory, Swedish Institute of Computer Science, PO Box 1263, SE-164 29 Kista,
Sweden.
Slagle, J., Chang, C., and Lee, R. (1969). Completeness theorems for semantic resolution in consequence-finding. In Proceedings of the First International Joint Conference on Artificial Intelligence, pages 281-285.
Srinivasan, A., Muggleton, S. H., and Bain, M. (1992). Distinguishing exceptions from noise in non-monotonic learning. In Proceedings of the Second International Workshop on Inductive Logic Programming. ICOT.
Stahl, I. (1996). Predicate invention in inductive logic programming. In (De Raedt, 1996), pages 34-47.
Stickel, M. (1988). A Prolog-like inference system for computing minimum-cost abductive explanations in natural language interpretation. Technical Report TN-451, SRI International.
Tarski, A. (1956). On the concept of logical consequence. In Logic, Semantics, Meta-
mathematics, pages 409-420. Clarendon Press, Oxford.
Thagard, P. R. (1981). Peirce on hypothesis and abduction. In C.S. Peirce Bicentennial International Congress. Texas University Press.
Thagard, P. R. (1988). Computational Philosophy of Science. MIT Press, Cambridge, MA.
Thagard, P. R. (1989). Explanatory coherence. Behavioral and Brain Sciences, 12:435-502.
Thagard, P. R. (1992). Conceptual Revolutions. Princeton University Press.
Thagard, P. R. and Shelley, C. (1997). Abductive reasoning: Logic, visual thinking and coherence. In Dalla Chiara, M. L., editor, Logic and Scientific Methods, pages 413-427. Kluwer.
Thompson, C. A. (1993). Inductive learning for abductive diagnosis. Master's thesis,
Department of Computer Sciences, University of Texas.
Thompson, C. A. and Mooney, R. J. (1994). Inductive learning for abductive diagnosis.
In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages
664-669, Seattle, WA.
Towell, G. and Shavlik, J. (1993). Extracting refined rules from knowledge-based neu-
ral networks. Machine Learning, 13(1):71-102.
Tuhrim, S., Reggia, J. A., and Goodall, S. (1991). An experimental study of crite-
ria for hypothesis plausibility. Journal of Experimental and Theoretical Artificial
Intelligence, 3:129-144.
Veloso, M. and Carbonell, J. G. (1991). Automating case generation, storage and retrieval in PRODIGY. In Proceedings of the First International Workshop on Multistrategy Learning, pages 363-377, Harpers Ferry.
Wallace, W. A. (1972). Causality and Scientific Explanation, volume 1, Medieval and Early Classical Science. University of Michigan Press, Ann Arbor.
Wallace, W. A. (1974). Causality and Scientific Explanation, volume 2, Classical and Contemporary Science. University of Michigan Press, Ann Arbor.
Wang, P. (1994). From inheritance relation to nonaxiomatic logic. International Journal of Approximate Reasoning, 11(4):281-319.
Wang, P. (1995). Non-Axiomatic Reasoning System: Exploring the Essence of Intelligence. PhD thesis, Indiana University. Available at http://www.cogsci.indiana.edu/farg/peiwang/papers.html.
Williams, M. A. (1994). Explanation and theory base transmutations. In Proceedings
of the Eleventh European Conference on Artificial Intelligence, pages 341-346.
Wirth, R. and O'Rorke, P. (1991). Constraints on predicate invention. In Proceed-
ings of the Eighth International Workshop on Machine Learning, pages 457-461,
Evanston, IL.
Wirth, R. and O'Rorke, P. (1992). Constraints for predicate invention. In (Muggleton,
1992), pages 299-318.
Wirth, U. (1997). Abductive inference in semiotics and philosophy of language: Peirce's and Davidson's account of interpretation (abstract). In (Rayo et al., 1997), page 232.
Wogulis, J. (1991). Revising relational domain theories. In Proceedings of the Eighth
International Workshop on Machine Learning, pages 462-466. Morgan Kaufmann,
California.
Wogulis, J. (1993). Handling negation in first-order theory revision. In Proceedings of the IJCAI-93 Workshop on Inductive Logic Programming, pages 36-46, Chambéry, France.
Wogulis, J. (1994). An Approach to Repairing and Evaluating First-Order Theories
Containing Multiple Concepts and Negation. PhD thesis, Department of Informa-
tion and Computer Science, University of California, Irvine, CA.
Wogulis, J. and Pazzani, M. J. (1993). A methodology for evaluating theory revision
systems: Results with Audrey II. In Proceedings of the Thirteenth International
Joint Conference on Artificial Intelligence, pages 1128-1134. Morgan Kaufmann,
California.
Wrobel, S. and Dzeroski, S. (1995). The ILP description learning problem: Towards a general model-level definition of data mining in ILP. In Morik, K. and Herrmann, J., editors, Proceedings of FGML'95, Annual Workshop of the GI Special Interest Group Machine Learning (GI FG 1.1.3), Univ. Dortmund, Germany. Research Report 580.
Yamamoto, A. (1997). Which hypotheses can be found with inverse entailment? In Proceedings of the Seventh International Workshop on Inductive Logic Programming, volume 1297 of Lecture Notes in Artificial Intelligence, pages 296-308.
Yamamoto, A. (1999a). An inference method for the complete inverse of relative sub-
sumption. New Generation Computing, 17(1):99-117.
Yamamoto, A. (1999b). Revising the logical foundations of inductive logic programming systems with ground reduced programs. New Generation Computing, 17(1):119-127.
Zelle, J. M. and Mooney, R. J. (1994). Combining top-down and bottom-up meth-
ods in inductive logic programming. In Proceedings of the Eleventh International
Conference on Machine Learning, pages 343-351, New Brunswick, NJ.
Zhang, N. and Poole, D. (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301-328.
Index

abducible, 15, 17-21, 23, 25, 26, 56, 70, 71, 85, 114, 139, 198, 203, 204, 208, 215, 223-229, 234-238, 241-243, 245-252, 254
    invention, 228, 260
    rule, 223-228
abduction
    and Bayes' rule, 159
    and Bayesian networks, 164
    as an epistemic process, 45, 57
    as epistemic change, 54
    as evidential reasoning, 155
    as explanatory inference, 15
    as hypothesis generation, 5, 8
    as reversed deduction, 49, 170
    as selecting an extension, 18
    as synthetic inference, 47
    computational model of, 23
    constructive, 23
    inferential definition of, 7
    simple, 18, 22
abductive
    analogical reasoning, 173, 174, 177, 249
    argument, 36
    coverage, 237
    entailment, 17, 228
    epistemic theory, 55
    expansion, 54-57
    explanation, 11-16, 22-27, 50
    extended logic program, 223, 229
    extension, 19
    formulation, 49, 57
    generalization, 257
    hypotheses, 177
    logic programming, 13, 15, 24, 85, 163, 168, 199, 200, 211, 223, 228, 229, 233, 234
    logical form, 46, 50, 57
    problem, 51
    revision, 54-57
    specialization, 260
    surprise, 45
    taxonomy, 50
    theory, 18, 19, 21, 26, 176, 234, 237-239, 242, 243, 245, 249
    triggers, 50
        anomaly, 50, 51, 57
        novelty, 50, 51, 57
        surprise, 51, 54, 57
    types, 50
Abe, A., 111, 173, 174, 233, 236, 248, 249
abstraction, 117
ACL, 24
ACLP, 23
Ade, H., 24, 25, 177, 210, 247-249, 264
AILP, 177
Alchourrón, C., 45, 52
Aleliunas, R., 77, 133, 147, 150, 171, 183
Aliseda, A., 7, 11, 47, 49, 50, 56, 58, 84, 85
Allemang, D., 72
alternative, 163
ampliative reasoning, 6, 17, 21, 60, 78, 107, 108, 215
analogy, 3, 25, 133, 179, 180, 202, 210
analytic inference, 6
Anderson, D., 48, 49
Andreasen, T., 211
Angluin, D., 92, 178, 267, 280
answer set, 214, 216, 217, 221, 223-225
Appelt, D., 45, 133
Aravindan, C., 54
argument, 3, 5, 9, 35, 60, 91
    abductive, 36
    ampliative, 60, 61
    based on samples, 3
    categorical, 4, 6
    confirmatory, 100
    deductive, 3, 61, 72
    explanatory, 63
    explicative, 61
    from analogy, 3
    hypothetical, 93
    inductive, 3, 4
Arima, J., 180
Arimura, H., 270, 280
Aristotle, 5, 6, 38, 148-151
artificial intelligence, 2, 4, 10, 11, 22, 44, 79, 81, 83, 90, 115, 119, 127-129, 169, 267
AUDREY, 264
automated reasoning, 44, 108, 114, 197
Ayim, M., 48
background knowledge, 1, 13, 14, 16, 21, 26, 68, 78, 79, 90, 94-97, 99, 107-113, 120, 170, 171, 176, 182, 184, 196, 214, 216, 218, 220, 221, 223, 225-228
    inductive, 176
background predicate, 14, 15, 25
background theory, 50, 54, 57, 269
backpropagation, 156
backward reasoning, 34, 35, 39, 78
Bacon, F., 150
Baffes, P.T., 184, 186, 187
Bain, M., 213, 214, 218, 220, 228
Baral, C., 214, 221
Bayes' rule, 159
Bayesian learning, 159
Bayesian network, 161, 162, 187
Bayesian probability, 12, 156
belief, 51, 56, 57
    degree of, 4
    network, 162
    operator, 54
    revision, 45, 57, 86
        theories in AI, 52, 54
    set, 223-225
    state, 52
Bell, J., 101, 106
Bergadano, F., 175, 211, 214, 228, 233, 236, 237, 239
Berwick, R.C., 178
Bessant, B., 3, 47, 78, 107, 109, 111
Bhaskar, R., 37
bias, 18, 40, 42, 43, 114, 139, 159, 161, 203, 204, 209, 218, 220, 237, 238, 240-242, 245, 246, 258
Bishop, C.M., 154
Botta, M., 133, 138, 149
bottom generalization, 273
bottom set, 273
Breiman, L., 156
Brewka, G., 112
Brogi, A., 235
Brunk, C.A., 184, 186, 187
Bruynooghe, M., 16, 103, 112, 114, 133, 214, 217, 228, 241, 247-250, 264
Buchanan, B., 154
Bunge, M., 148
Buntine, W.L., 23, 111, 146, 161, 162, 164-166, 168-170, 175
Burks, A., 65
Bylander, T., 72
C4.5, 156, 187
Carbonell, J., 133
Carnap, R., 4, 82
CART, 156
categorical argument, 4, 6
categorical inductive generalisation, 4, 5, 10, 16
causal
    determinism, 37
    explanation, 35, 68, 138, 149
    hypothesis, 46
    inference, 187
    mechanism, 36
    modelling, 154
    reasoning, 154
    responsibility, 37, 107
    story, 38, 42
    theory, 140
causation, 38, 39, 44, 68, 148, 150, 154, 170
cause, 35, 38, 39, 107, 138, 140, 148, 227
Cayrol, M., 112
Chang, C.L., 268
Charniak, E., 133
Cheeseman, P., 159, 161
chess playing, 9
Chiafari, F., 187
Chinn, C.A., 137
choice space, 163
Christiansen, H., 11, 23, 111, 119, 199, 205, 206, 210, 211, 250
CIGOL, 23, 24
Clark, K.L., 112, 150, 234
classical negation, 214, 216, 228
classification, 89, 90, 95, 105, 169, 171, 182
CLAUDIEN, 112, 241, 250
CLINT, 264
closed world assumption, 101, 114, 214
closed world specialization, 218
CMS, 172
Cohen, W.W., 26, 270, 280
commonsense knowledge, 214, 223, 228
commonsense reasoning, 32, 83
completeness, 72, 73, 90, 97, 99, 102, 104, 106, 206, 217, 236, 237, 239
    of basic forms of abduction and induction, 27
    of SOLD-resolution, 272
completion, 85, 87, 108, 110-116, 227
    predicate, 112, 114, 116
complexity, 59
concept, 134, 136
concept learning, 11, 12, 40, 92, 214, 217
conceptual change, 57
conditional probability, 82, 158
confirmation, 4, 6, 8, 10, 11, 66, 67, 77, 82-84, 87, 97-99, 101, 105, 108, 113, 115, 118, 129
confirmation theory, 4, 8, 9, 82
confirmatory
    argument, 100
    induction, 11, 16, 23, 26, 89, 108, 110, 115
    inference, 83
    reasoning, 89, 91-93, 97, 101, 105, 106
consequence finding, 268
consequence of an SOLD-derivation, 272
consequence relation, 4, 13, 86, 90, 91, 93, 94, 96, 97, 99, 100, 102-105, 107, 109, 111-113
Console, L., 8, 11, 14-16, 20, 23, 69, 71, 77, 85, 112, 133, 134, 138, 139, 144, 146-150, 155, 209
constraint, 111, 114, 197, 199, 203, 205-207
context, 123, 128
contraction, 52
contrast set, 32, 43
Cooper, G.G., 189
Copi, I.M., 34
counterfactual, 211
cover, 217-219, 223
coverage, 36, 109, 115, 116
    abductive, 26, 237
    deductive, 26, 237
covering law, 79, 82
Cox, P.T., 23, 77, 133, 138, 139, 149, 150, 186, 267
cycle of perception, 154
Dagum, P., 162
Darden, L., 36
data mining, 15
de Kleer, J., 77, 133, 134, 139, 147, 149, 172
De Raedt, L., 13, 16, 19, 77, 101, 103, 111, 112, 114, 133, 213, 214, 226-228, 241, 247-250, 264
de Schreye, D., 23, 210
de Wolf, R., 211, 267, 268, 272
Debrock, G., 48
Dechter, R., 162
decision tree, 153, 187
    learning, 159, 161, 166
Decker, H., 210
deductive
    coverage, 237
    explanation, 26, 79
    reasoning, 37, 39, 48, 79, 81-83, 85, 107-109, 111, 112, 114, 128, 199
deductive-nomological explanation, 79
default, 216, 227, 229
    atom, 235
    cancellation rule, 214, 218, 219, 221, 223, 226
    hierarchical, 214, 222, 228
    hypothesis, 251
    logic, 215, 216
    negation, 214, 229
    negation by, 234-236, 243
    reasoning, 164
    rule, 10, 214, 215, 218-222, 226-228, 245
defeasible argument, 4
defeasible reasoning, 126
definite clause, 254, 270
definite program, 270
DeJong, G., 133, 147
Denecker, M., 19, 23-25, 47, 111, 177, 210, 247-249, 264
derivation, 235
Descles, J., 135
descriptive induction, 11, 16, 26, 89, 97, 108-116
design, 169, 170, 172
diagnosis, 11, 12, 36, 66, 72, 77, 79, 133, 138, 147, 153, 169, 188, 234
Dietterich, T., 148
Dimopoulos, Y., 21, 25, 26, 82, 84, 170, 180, 184, 189, 213, 215, 228, 245, 248, 249, 264
discovery, 46, 82
    logic of, 9
    of abducibles, 229
    of exceptions, 247, 250
domain theory, 79, 80, 82
Dung, P.M., 54
Dzeroski, S., 26, 114, 183, 213, 217, 220, 267
Edwards, D., 139
EITHER, 24, 264
empiricism, 84
Ennis, R., 41
enumerative induction, 82
epistemic change, 54
Eshghi, K., 23, 133, 210, 214, 215, 234
Esposito, F., 11, 26, 111, 116, 177, 215, 228, 239
Evans, C.A., 71
event calculus, 70
evidence, 158
evidential modelling, 154
evidential reasoning, 153, 154
exception, 26, 214, 216, 218-221, 223, 226, 228, 234, 237, 241, 245, 247, 249, 250
expansion, 52
    abductive, 54-57
experiment, 169, 170, 196
explanandum, 50, 79
explanans, 50, 79, 82
explanation, 7, 10, 15, 21, 36-38, 41, 64, 65, 67, 68, 78, 79, 83, 84, 87, 89, 93, 94, 96, 105, 107-109, 111, 114-116, 118, 120, 122, 128, 129, 133, 137, 138, 140, 255, 269
    abductive, 11-16, 22-27, 50, 140, 143
    causal, 68, 138, 149
    constructive, 129
    deductive, 26, 79
    deductive-nomological, 79
    descriptive, 129
    inductive, 15, 140, 141, 143
    inductive/abductive, 143
    inference to the best -, 8, 31, 35, 39, 46, 48, 60, 65, 66, 68, 73, 78, 83, 107
    mechanism, 96, 105
    meta-level, 15
    nonmonotonic, 83
    of an observation, 170
    true, 33
explanation-based generalisation, 108
explanation-based learning, 26, 108, 147, 211
explanation-preserving reasoning, 105
explanatory
    argument, 63
    coherence, 172, 173, 177, 179
    hypothesis, 7, 48
    induction, 11, 26, 89, 97, 108-111, 114-116
    inference, 83
    power, 36, 43, 79, 86, 96
    reasoning, 5, 47, 89, 91-95, 97, 101, 105
extended abduction, 254
extended logic program, 214, 216
extension, 15, 21, 112, 119-122, 127, 134, 141, 216, 228
    abductive, 17
    inductive, 19
extensional inference, 127
falsification, 92
falsity-preserving, 84
Fann, K.T., 5, 33, 49, 62, 65
Feng, C., 175
Ferilli, S., 11, 26, 111, 116, 215, 228
first-order logic, 107, 117, 126-128, 197
Flach, P.A., 9, 16, 33, 35, 43, 46, 47, 49, 65, 83-85, 94, 107, 115, 118, 137, 149, 151, 170, 195, 196, 199, 226, 233, 234, 249
Flener, P., 186
FOIL, 175, 185, 264
foreground knowledge, 14, 15, 24
Fox, Richard, 44
Frühwirth, T.W., 207
Frankfurt, H., 48
Frazier, M., 280
Frege, G., 134
Friedman, J.H., 156
Fung, T., 23
Furukawa, K., 177
Gabbay, D.M., 90
Gallagher, J.P., 206
Galliers, J., 57
Ganascia, J.G., 77
Gärdenfors, P., 45, 52, 53, 56, 86
Gelfond, M., 214, 216
generalisation, 1, 3-6, 9-12, 14, 16, 19, 20, 23, 25, 27, 39-43, 60, 62-64, 78, 89, 105, 107, 117, 118, 120, 122, 128, 129, 167, 169-171, 177, 186, 256, 273
    explanation-based, 108
generalised rule, 175
generality, 78, 107-111, 115, 116, 134, 136, 141, 241, 245, 250
Gentner, D., 173
Giordana, A., 146
Glymour, C., 138
Goebel, R., 77, 133, 147, 150, 171, 178, 183
Goodall, S., 190
Gorlee, D., 48
Gregoire, E., 111
Gunetti, D., 175, 214, 228, 233, 236, 237, 239
Haack, S., 56
Haher, M., 205
Hamfelt, A., 211
Hamscher, W., 134
Haneda, H., 11, 196, 210, 233, 250, 264
Hanson, N.R., 9, 65
Harman, G., 31, 32, 40, 46, 63, 66, 151
Harper, W., 148
Heckerman, D., 161, 164, 189
Helft, N., 16, 101, 110, 114, 141
Hempel, C.G., 8, 36, 64, 68, 77, 79, 82, 90, 97, 98, 100, 113, 148
Henglein, F., 207
Henrion, M., 162
Herskovits, E., 189
heuristic, 184
hierarchy, 214, 222, 250
Hill, P.A., 197, 206
Hinton, G.E., 156, 190
Hirata, K., 267
Hobbs, J.R., 45, 133, 139
Holland, J., 46
Holyoak, R., 46
Hookway, C., 47
Horn clause, 183, 197, 198, 271
Horn logic, 6
Horn program, 214, 215
Horn theory, 254, 264
Horwich, P., 137
Hume, D., 40, 150
hypothesis, 6, 31, 48, 59, 62-67, 69, 71, 72, 90, 91, 107-109, 111-114, 115, 129, 134, 140, 175, 176, 179, 215, 217, 220, 224, 227, 229, 269
    evaluation, 4, 6, 8, 9, 12, 27, 31-34, 36, 42-44, 89, 105, 108, 109, 111-116
    explanatory, 7, 48
    generation, 4, 6, 8-10, 12, 27, 33, 34, 36, 44, 46, 89-91, 93, 95, 97, 104, 105, 107, 111, 129, 170, 196
    selection, 12, 18, 36, 44, 46, 108-110, 112, 113, 115, 116
    testing, 8, 27, 169, 170
hypothetical
    argument, 93
    clause, 261
    consequence, 90
    consequence relation, 90, 91, 100
    reasoning, 63, 84, 170, 171, 177, 178
hypothetico-deductive model, 79, 82, 109, 111
hypothetico-nonmonotonic model, 82-84, 87
IBE, 31
ICL, 241, 250
103, 190
IFF, 23
implication, 39
incomplete information, 214, 216, 227-229
incompleteness, 84
    of abducibles, 23
    of domain theory, 17, 26, 27, 180
    of explanations, 37
inconsistency, 36, 43, 92, 94, 99, 102, 106, 249
incremental generation, 270
independent choice logic, 163
individual, 14, 17, 20, 113, 134
inducible, 25, 264
induction
    as hypothesis testing, 8
    as inference of foreground knowledge, 14
    as selecting a collection of extensions, 19, 21
    computational model of, 23
    confirmatory, 11, 16, 23, 26, 89, 108, 110, 115
    descriptive, 11, 16, 26, 89, 97, 108-116
    enumerative, 82
    explanatory, 11, 26, 89, 97, 108-111, 114-116
    inferential definition of, 8
    of abductive theories, 26
    syllogistic definition of, 5
inductive
    background knowledge, 176
    explanation, 15
    extension, 19
    inference, 267
    leap, 227, 258
    logic, 4
    logic programming, 11, 13, 16, 22, 24, 25, 101, 103, 110, 114, 116, 153, 170, 171, 175, 183, 199, 211, 214, 218, 233, 234, 236, 267
    reasoning, 46, 48, 51
    support, 3, 4
    validity, 4
inference, 4, 78, 83-85
    analytic, 6
    confirmatory, 83
    explanatory, 83
    extensional, 127
    intensional, 127
    sample-to-population, 10, 16, 22
    statistical, 40-42
    synthetic, 6
    to the best explanation, 8, 31, 35, 39, 46, 48, 60, 65, 66, 68, 73, 78, 83, 107
informativeness, 134, 136, 141
inheritance, 117, 119-122
Inoue, K., 11, 23, 112, 196, 210, 214-216, 222, 223, 225, 227, 233, 250, 254, 264, 265, 267-269, 273, 278
inquiry, 6, 35, 45, 47, 48, 51, 67, 89, 118
instance knowledge, 14, 17, 21
integrity constraint, 14, 26, 70, 101, 177, 196, 198, 200, 203, 215, 216, 226, 234, 235, 237, 238, 241, 244, 247, 248, 250
intension, 20, 108-110, 112, 119-122, 127, 135, 141
intensional inference, 127
intractable, 60
introduction of clauses, 258
invented abducible, 228, 260
inverse entailment, 22, 23, 34, 111, 114, 280
inverse resolution, 23, 111, 114, 145, 169, 170
Ishikawa, T., 173
Ishizaka, H., 270
ITOU, 270
Jaffar, J., 205
Jaynes, E.T., 159
Jensen, F.V., 162
Johnson-Laird, P., 138
Jordan, M.I., 154
Josephson, J.R., 1, 8, 11, 12, 15, 31, 32, 47, 72-74, 107, 109, 133, 151, 169
Josephson, S.G., 31, 32
Jung, B., 277
Kakas, A.C., 13, 21, 23-26, 33, 35, 45, 49, 51, 54, 69, 71, 82, 84, 107, 115, 118, 133, 170, 180, 183, 184, 189, 195, 196, 199, 210, 211, 213-215, 223, 226, 228, 233-235, 241, 245, 248-250, 261, 264, 265
Kanai, T., 176, 229, 248, 249
Kant, I., 47
Kapitan, T., 48
Kasahara, K., 173
Kawaoka, T., 173
Kedar-Cabelli, S., 133, 147
Keller, R., 133, 147
Kelly, K., 138
Kemeny, J., 134
Kfoury, A.J., 207
Kijsirikul, B., 186
Kitcher, P., 68
knowledge assimilation, 54, 70
knowledge base refinement, 183
knowledge representation, 108, 111, 214, 229
knowledge system, 215, 223-225
Kodratoff, Y., 77, 151
Konolige, K., 69, 112, 133, 150, 155
Kowalski, R.A., 13, 23, 45, 51, 69, 71, 133, 183, 211, 213-215, 234, 268, 269
Kraus, S., 86, 90, 103, 105, 106
Kruijff, G.J., 33, 49
Kudoh, Y., 214, 216, 222
Kuipers, T., 57
Kunifuji, S., 176, 229, 248, 249
LAB, 24, 26
Lachiche, N., 10, 107, 113, 114, 126
LAELP, 215, 227, 228
Lakatos, I., 137
Lamma, E., 11, 26, 111, 116, 177, 196, 210, 215, 216, 222, 228, 234, 235, 239, 264
Laplace, P.S., 159
Lauritzen, S.L., 162
Lavrac, N., 183, 213, 217, 220, 226, 227, 267
Leake, D., 83
learning, 133, 138, 148, 156
    abductive logic programs, 26, 223, 239, 264
    abductive theories, 188, 243
    Bayesian networks, 161
    decision tree, 159, 161, 166
    exceptions, 241
    explanation-based, 108, 147, 211
    from examples, 46, 139, 145
    from incomplete knowledge, 234, 242
    from integrity constraints, 247
    neural networks, 161, 166
    nonmonotonic logic programs, 216
    nonmonotonic theories, 214
    positive-only, 178-180
    probabilities, 160
    supervised, 110
    unsupervised, 40, 153, 161, 166
least generalization, 257
Lee, R., 268, 269
Lehmann, D., 86, 90, 103, 105, 106
Leiß, H., 207
Leibniz, G.W., 39
LELP, 214, 216, 217, 220, 224-228
Levesque, H.J., 54, 133, 183
Levi, I., 57
Lewis, D., 63
Lifschitz, V., 214, 216
Lipton, P., 8, 32, 73, 84
Lloyd, J.W., 197, 271, 275, 276
Lobo, J., 54-56, 58
logic
    first-order, 107, 117, 126-128, 197
    of discovery, 9
    of the finished research report, 9
    predicate, 107, 117, 119, 125-128
    propositional, 117
logic program
    abductive extended, 223, 229
    extended, 214, 216
    normal, 214
logic programming, 2, 5, 6, 11, 13, 23, 69, 133, 161, 163, 196, 205, 209, 213, 214, 233-235
    abductive, 13, 15, 24, 85, 163, 168, 199, 200, 211, 223, 228, 229, 233, 234
    inductive, 11, 13, 16, 22, 24, 25, 101, 103, 110, 114, 116, 153, 170, 171, 175, 183, 199, 211, 214, 218, 233, 234, 236, 267
    probabilistic, 163
Loredo, T.J., 159
Luby, M., 162
Lucas, P.J.F., 36
machine learning, 77, 79, 93-95, 110, 133, 134, 138, 139, 181, 191, 220
Mackworth, A.K., 133, 139, 147, 149, 154
Magidor, M., 86, 90, 103, 105, 106
Makinson, D., 45, 52, 53, 86, 90
Malerba, D., 177, 239
Malfait, B., 24, 25, 247-249
Mancarella, P., 23, 45, 54, 210, 215, 223, 234, 235, 250, 261, 265
Marquis, P., 113, 114
Martens, B., 19, 111
Martin, L., 214, 228
Martin, P., 45, 133, 139
Matsuzawa, K., 173
Mayer, M.C., 84, 86
McCarthy, J., 83
Mello, P., 11, 26, 111, 116, 177, 215, 228, 234, 235, 239
Meltzer, B., 146, 148, 150
meta-interpreter, 205
meta-language, 90, 91, 109, 110, 112, 197, 198
meta-level, 15, 23, 84, 85, 90, 93, 105, 111
meta-logic, 85, 90, 205
Michael, A., 23
Michalski, R.S., 1, 21, 24, 71, 77, 78, 133-135, 138, 145, 146, 148, 150, 183, 256
Milano, M., 177, 215, 228, 234, 239
Mill, J.S., 46, 150
minimal explanation, 255, 269
minimal model, 16, 103
minimality, 103, 114, 134
minimum description length, 161
MIS, 264
Mitchell, T.M., 133, 139, 145, 147
modelling
    causal, 154
    evidential, 154
Mooney, R.J., 11, 24-26, 116, 133, 139, 147, 172, 176, 183, 184, 186-188, 190, 210, 229, 233, 248, 249, 264
Morgan, C., 146
Morris, S., 133
Mortimer, H., 78, 82
most general unifier, 206
Muggleton, S.H., 13, 22, 23, 77, 101, 111, 114, 135, 138, 146, 148, 169, 170, 175, 183, 211, 213, 214, 218, 220, 228, 247, 267, 270, 277, 278
Mycin, 154
natural language, 12, 45, 128, 133, 140, 148, 172
Nau, D.S., 147
Neal, R.M., 161
near-miss example, 177
negation, 216
    as failure, 144, 199, 214, 216, 234, 243
    by default, 214, 229, 234-236, 243
    classical, 214, 216, 228
    explicit, 214, 229
negative observation, 254
Neri, F., 133, 138, 149
neural network, 36, 153, 154, 156, 161, 166, 190
Ng, H.T., 139, 172, 183
Nicosia, M., 214, 228
Nienhuys-Cheng, S.-H., 211, 267, 268, 272
Nilsson, J.F., 211
Nisbett, R., 46
no-good clause, 177
noise, 10, 157, 159, 161, 220
nondeterminism, 226, 228, 229
nondeterministic rule, 215, 220, 221, 224, 225, 227, 228
nonmonotonic
    explanation, 83
    inference, 83
    reasoning, 83, 90, 103, 104, 107-109, 112, 114, 214, 215, 227, 229
normal form, 13, 144
normal logic program, 214
Numao, M., 186, 211
O'Neill, M., 187
O'Rorke, P., 26, 133, 138, 149, 186, 264
observable, 14-22, 24, 199
observation, 6, 11, 13, 14, 78-80, 82, 134
Occam's razor, 139, 172, 183
Olshen, R.A., 156
Oppenheim, P., 148
Ourston, D., 24, 25, 184, 186, 187, 264
overfitting, 160, 161, 220
Pagnucco, M., 54-56, 58
parsimonious, 36, 43, 72, 188
partial evaluation, 257
Pazzani, M.J., 184, 186, 264
Pearl, J., 39, 134, 154-156, 161, 162, 187
Peirce, C.S., 5-11, 13, 16, 33, 45, 47, 48, 51, 93-95, 97, 107, 145, 151, 169, 170, 195, 265
    abduction, 47
    epistemology, 51
    inferential theory, 6, 13, 35, 89, 105, 107, 170, 196
    syllogistic theory, 5, 35, 117-119, 170, 196, 199
Peng, Y., 183, 188, 189
perception, 154
Pereira, L.M., 216, 222
PFOIL, 190
Pietrzykowski, T., 23, 77, 133, 138, 139, 149, 150, 186, 267
Pino Perez, R., 84-86
Pirri, F., 85, 86
Pitt, L., 280
planning, 11, 12, 44, 133, 138, 148
Plotkin, G.D., 133, 135, 256, 277
Poole, D., 11, 12, 77, 112, 133, 134, 138, 139, 147, 148, 150, 155, 162-164, 168, 171, 183, 214, 215, 227, 261, 267-269
Pople Jr., H.E., 111, 150, 169, 170, 183, 267
Popper, K., 137
positive
    observation, 254
positive-only learning, 178-180
possible world, 19, 53, 61, 136, 157, 158, 163, 249
predicate
    clause, 170, 171
    completion, 112, 114, 116
    form, 171
    logic, 107, 117, 119, 125-128
    rule, 177, 179
prediction, 6, 8, 14-16, 20, 32, 39, 42-44, 66, 67, 84, 86, 89, 92, 93, 95, 170, 177
preduction, 180
preference, 31, 34, 36, 46, 65, 71, 85, 103, 108-110, 139, 210
prime implicant, 139
prior probability, 8, 159, 161
probabilistic Horn abduction, 163
probabilistic logic programming, 163
probability, 126, 156
    Bayesian, 12, 156
    conditional, 82, 158
    prior, 8, 159, 161
PROGOL, 22, 25, 111, 175, 177, 270
program generation, 169, 171
Prolog, 164, 198, 206-208, 220, 234, 239, 243, 250
proof predicate, 195, 198, 205, 210
propositional
    clause, 170, 171
    form, 171
    logic, 90, 117, 146
Psillos, S., 11, 15, 33, 47, 78
quantifier, 10, 18, 22, 23, 80, 126, 277
Quinlan, J.R., 156, 175, 185, 187, 190, 264
Ramachandran, S., 187
Ramsey, F.P., 54
range-restricted, 218, 259
rationality, 32, 34, 35, 86, 118, 125
rationality postulate, 91, 97, 98
raven paradox, 83, 126
reasoning
    ampliative, 6, 17, 21, 60, 78, 107, 108, 215
    automated, 44, 108, 114, 197
    backward, 34, 35, 39, 78
    by exclusion, 32, 38, 39
    causal, 154
    commonsense, 32, 83
    confirmatory, 89, 91-93, 97, 101, 105, 106
    deductive, 37, 39, 48, 79, 81-83, 85, 107-109, 111, 112, 114, 128, 199
    evidential, 154
    explanation-preserving, 105
    explanatory, 5, 47, 89, 91-95, 97, 101, 105
    hypothetical, 2, 63, 84, 170, 171, 177, 178
    inductive, 46, 48, 51
    non-deductive, 2, 5, 6, 9, 12, 15, 22, 46, 89
    nonmonotonic, 83, 90, 103, 104, 107-109, 112, 114, 214, 215, 227, 229
    synthetic, 47
recursion, 15, 211
reducing operation, 274
refinement operator, 185
refutation, 32, 42, 115, 268
Reggia, J.A., 147, 183, 188-190
Reilly, F.E., 48
Reiter, R., 133, 139, 147, 149, 172, 183, 215, 216
resolution, 205, 206, 251, 257, 268
retroduction, 6
reversed deduction, 15, 23, 49, 71, 84, 95, 96, 170
revision, 24, 52
    abductive, 54-57
Riguzzi, F., 11, 24, 26, 111, 116, 177, 213, 215, 216, 222, 228, 234, 239, 241, 250
Roesler, A., 49
Rott, H., 52
Rouveirol, C., 146, 270, 276, 280
Ruffo, G., 214, 228
Rumelhart, D.E., 156, 190
Russell, B., 40
RUTH, 24, 25, 264
Sais, L., 111
Saitta, L., 8, 11, 14-16, 20, 133, 138, 146, 149, 209
Sakama, C., 11, 25, 111, 223, 229, 233, 236, 248, 249, 254, 265
Salmon, M.H., 3
Salmon, W.C., 37, 46, 68, 134, 148
sample-to-population inference, 10, 16, 22
sampling, 40-42
satisfiability, 103, 113, 116, 206
saturant, 276
saturation, 279
Schulenburg, D., 133
semantic tableaux, 56
Semeraro, G., 11, 26, 111, 116, 177, 215, 228, 239
semi-unification, 207
set-covering, 185, 188
Shanahan, M., 70, 164
Shannon, C., 136
Shapiro, E.Y., 46, 210, 264
Shavlik, J., 184
Shimazu, K., 177
Shimura, M., 186, 211
Shinohara, T., 270
Shortliffe, E., 154
SIERES, 264
similarity, 85, 110, 112, 113
simulation, 154
skipping operation, 272
skolemisation, 22, 115, 204, 210, 273
Slagle, J., 268
SLD-refutation, 271
SLD-resolution, 272
SLDNFA, 23
SLDNFAI, 264
Smith, C.H., 92, 267
SOL-resolution, 268, 272, 274
SOLD-derivation, 271
SOLD-resolution, 268, 271, 274
SOLDR-derivation, 274
SOLDR-resolution, 278
specialisation, 117, 186, 214, 239, 241-243, 246, 248
    closed world, 218
    open world, 218
specificity, 36, 43, 134, 135, 139, 186
Spiegelhalter, D.J., 162
Srinivasan, A., 214, 220, 228
Stahl, I., 186
statistical inference, 40-42
statistical learning, 153
statistical syllogism, 3
Stickel, M., 45, 133, 134, 139
Stone, C.J., 156
stratification, 204, 228
Subset Principle, 178, 179
subsumption theorem, 268
surprise, 45, 49, 51, 54, 57
switching lemma, 275
syllogism, 5, 13, 14, 36, 117, 119, 149
    statistical, 3
synthetic inference, 6
Tanner, M.C., 72
Tarski, A., 90
taxonomy, 204, 214
testing a hypothesis by experiment, 169, 170
Thagard, P.R., 46, 65, 73, 137, 172
theory
    development, 17, 24, 25, 27
    formation, 17, 22
    refinement, 151, 229, 264
    revision, 22, 24, 54, 151, 242
Theseider Dupre, D., 23, 69, 71, 77, 85, 112, 133, 139, 144, 148, 150, 155
Thompson, C.A., 24, 26, 188
Tiuryn, J., 207
Toni, F., 13, 45, 51, 69, 71, 133, 183, 211, 213, 214, 234
Torasso, P., 23, 69, 71, 77, 85, 112, 133, 138, 139, 144, 147-150, 155
Towell, G., 184
transitivity, 96, 103, 119, 121, 135
truth-preserving, 2, 61, 115
truth-value function, 121, 127
Tuhrim, S., 190
unfolding, 226
unification, 206
uniformity, 84, 85
unit program, 278
unsupervised learning, 40, 153, 161, 166
Urzyczyn, P., 207
Uzcategui, C., 54-56, 58, 84, 85
Van Laer, W., 241, 250
Veloso, M., 133
verification, 8, 86, 92, 93, 95, 104, 196
vision, 154
Vrain, C., 214, 228
Wallace, W.A., 38
Wang, P., 14
Wang, P.Y., 147
Weaver, W., 136
Williams, C.R., 77
Williams, M.A., 54
Williams, R.J., 156, 190
Wirth, R., 186, 264
Wirth, U., 49
Wogulis, J., 184, 186, 264
Wrobel, S., 114
Yamamoto, A., 11, 22, 25, 196, 267, 268, 273, 274, 277, 280
Zelle, J.M., 186
Zhang, N.L., 162
APPLIED LOGIC SERIES

1. D. Walton: Fallacies Arising from Ambiguity. 1996 ISBN 0-7923-4100-7


2. H. Wansing (ed.): Proof Theory of Modal Logic. 1996 ISBN 0-7923-4120-1
3. F. Baader and K.U. Schulz (eds.): Frontiers of Combining Systems. First International
Workshop, Munich, March 1996. 1996 ISBN 0-7923-4271-2
4. M. Marx and Y. Venema: Multi-Dimensional Modal Logic. 1996
ISBN 0-7923-4345-X
5. S. Akama (ed.): Logic, Language and Computation. 1997 ISBN 0-7923-4376-X
6. J. Goubault-Larrecq and I. Mackie: Proof Theory and Automated Deduction. 1997
ISBN 0-7923-4593-2
7. M. de Rijke (ed.): Advances in Intensional Logic. 1997 ISBN 0-7923-4711-0
8. W. Bibel and P.H. Schmitt (eds.): Automated Deduction -A Basis for Applications.
Volume I. Foundations - Calculi and Methods. 1998 ISBN 0-7923-5129-0
9. W. Bibel and P.H. Schmitt (eds.): Automated Deduction -A Basis for Applications.
Volume II. Systems and Implementation Techniques. 1998 ISBN 0-7923-5130-4
10. W. Bibel and P.H. Schmitt (eds.): Automated Deduction- A Basis for Applications.
Volume III. Applications. 1998 ISBN 0-7923-5131-2
(Set vols. I-III: ISBN 0-7923-5132-0)
11. S.O. Hansson: A Textbook of Belief Dynamics. Theory Change and Database Updat-
ing. 1999 Hb: ISBN 0-7923-5324-2; Pb: ISBN 0-7923-5327-7
Solutions to exercises. 1999. Pb: ISBN 0-7923-5328-5
Set: (Hb): ISBN 0-7923-5326-9
Set: (Pb): ISBN 0-7923-5329-3
12. R. Pareschi and B. Fronhöfer (eds.): Dynamic Worlds from the Frame Problem to
Knowledge Management. 1999 ISBN 0-7923-5535-0
13. D.M. Gabbay and H. Wansing (eds.): What is Negation? 1999 ISBN 0-7923-5569-5
14. M. Wooldridge and A. Rao (eds.): Foundations of Rational Agency. 1999
ISBN 0-7923-5601-2
15. D. Dubois, H. Prade and E.P. Klement (eds.): Fuzzy Sets, Logics and Reasoning about
Knowledge. 1999 ISBN 0-7923-5911-1
16. H. Barringer, M. Fisher, D. Gabbay and G. Gough (eds.): Advances in Temporal
Logic. 2000 ISBN 0-7923-6149-0
17. D. Basin, M. D'Agostino, D.M. Gabbay, S. Matthews and L. Viganò (eds.): Labelled
Deduction. 2000 ISBN 0-7923-6237-3
18. P.A. Flach and A.C. Kakas (eds.): Abduction and Induction. Essays on their Relation
and Integration. 2000 ISBN 0-7923-6250-0
19. S. Hölldobler (ed.): Intellectics and Computational Logic. Papers in Honor of
Wolfgang Bibel. 2000 ISBN 0-7923-6261-6

KLUWER ACADEMIC PUBLISHERS - DORDRECHT / BOSTON / LONDON
