Abstract—The advent of artificial neural networks has stirred the imagination of many in the field of knowledge acquisition. There is an expectation that neural networks will play an important role in automating knowledge acquisition and encoding; however, the problem solving knowledge of a neural network is represented at a subsymbolic level and hence is very difficult for a human user to comprehend. One way to provide an understanding of the behavior of neural networks is to extract their problem solving knowledge in terms of rules that can be provided to users. Several papers which propose extracting rules from feedforward neural networks can be found in the literature; however, these approaches can only deal with networks with binary inputs. Furthermore, certain approaches lack theoretical support and their usefulness and effectiveness are debatable. Upon carefully analyzing these approaches, we propose a method to extract fuzzy rules from networks with continuous-valued inputs. The method was tested using a real-life problem (decision-making by pilots involving combat situations) and found to be effective.
I. INTRODUCTION
[...] are tested using a bench-marking problem in Section IV. One drawback with existing rule-extracting approaches is that they can only deal with networks with binary inputs. In Section V, we propose a method to extract fuzzy rules from networks with continuous-valued inputs. The method is tested using a real-life problem involving decision-making by pilots on combat situations. Conclusions are drawn regarding the utility and applicability of this technique.
II. UNDERSTANDING THE BEHAVIOR OF NEURAL NETWORKS: A HUMAN FACTORS PERSPECTIVE
Neural networks are expected to play an important role in the development of AI systems. Many researchers have realized that there is a need for a more explicit consideration of human factors in the AI/expert systems field [3]. Chignell and Peterson [4] have shown that knowledge engineering and the development of expert systems benefit from careful use and application of human factors techniques. This section explains [...]
[...] accessible for people to report [8]. Current knowledge acquisition approaches, e.g., verbal protocol analysis, are inadequate for the task of obtaining complete data on the knowledge involved in the problem solving process because a protocol can only reflect the information available in working memory and verbalization is often much slower than the cognitive processes involved. Much information has to be inferred by analyzing large volumes of poorly articulated data. Since neural computing resembles human cognitive behavior at the micro-level, some researchers suggest that neural networks can be used for automated knowledge acquisition [9]. A neural network can be trained with examples to solve a problem. The knowledge within the trained network would be more objective and reliable than knowledge reasoned out by a specific person. The knowledge of a neural network, however, is implicitly embedded in its connections and weights. Developing a human understanding of the knowledge in neural networks remains an open research topic.
In traditional rule-based expert systems, knowledge is represented symbolically in expressions or data structures. These representations are subsequently manipulated or processed to produce useful results which are the logical result of existing representations. The situation is quite different in neural networks. In neural computing, the processing is the representation. In other words, knowledge is not represented symbolically, but in the form of distributed processing and localized decision rules [10]. This subsymbolic representation makes neural networks more resistant to noise than traditional symbolic representation; however, this representation is very difficult for a human user to comprehend.
Do we really need to know what is going on inside a neural network? This question needs to be answered from a human factors point of view. Madni [11] points out that “the success of expert systems depends not only on the quality and completeness of the knowledge elicited from experts but also on the compatibility of the recommendations and decisions with the user’s conceptualization of the task.” A study conducted by Lehner and Zirk [12] showed that when a human being and an intelligent machine cooperate to solve problems, but where each employs different problem-solving procedures, the user must have an accurate model of how that machine operates. This is because when people deal with complex, interactive systems, they usually build up their own conceptual mental model of the system. The model guides their actions and helps them interpret the system’s behavior. Such a model, when appropriate, can be very helpful or even necessary for dealing successfully with the system. However, if inappropriate or inadequate, it can lead to serious misconceptions or errors [13]. Kidd and Cooper [14] point out that
“If an expert system is to be responsible for complex decision-making and giving advice, then it is vital that there is compatibility, at the cognitive level, between the user’s model of the problem and the system’s. In other words, the knowledge representation and problem solving processes employed by the system must be readily intelligible to the user. Only if this is true will the user both be able to interact competently and efficiently with the system during its reasoning process and also be confident in the system’s reasoning and advice.”
Therefore, it is very important for people to be able to understand the behavior of neural networks. A good explanation facility has been described as having two main functions: 1) to make the system more intelligible to the user and 2) to uncover shortcomings in the knowledge base [15]. To make neural networks fit these requirements, we need to understand how knowledge is represented in neural networks. Ye and Salvendy [9] developed three hypotheses about knowledge representation in neural networks:
1) An entity (object, concept, etc.) is more likely to be locally encoded in a neural network.
2) The knowledge is implemented in an implicit way in the internal structure of the neural network (a group of associated hidden neurons and their connections to entity neurons), not in individual neurons or connections.
3) Different modules of a neural network, which implement different conceptual schema, have similar processing structures, but differ in their input and output.
The authors validated these hypotheses using three experiments based on a task of modulo arithmetic. These three experiments provided some potential insight into neural computing. Specifically, they found that neural computing provides a theoretical and constructive foundation for understanding human cognitive behavior. The fact that the problem solving knowledge is implicitly embedded in the internal structure of a neural network provides a good explanation for the difficulty involved in knowledge elicitation. The knowledge cannot be directly accessed but needs to be reasoned out through problem solving states. To overcome this natural limitation of human ability to verbalize knowledge, one can train a neural network to solve a problem and then obtain knowledge about the problem solving process from the internal structure of the neural network. However, the authors did not address the issue of how to extract the problem solving knowledge from a trained neural network.
Mozer and Smolensky [16] point out that it is much easier to understand the behavior of a feedforward neural network in terms of simple rules than in terms of a large number of weights and activation values. They proposed a skeletonization technique to determine the relevance of individual units in feedforward networks and to remove redundant units. The result of the procedure is a minimal network which consists of only those units that really contribute to the solution. Another weight elimination procedure was proposed by Weigend et al. [17]. Their method begins with a feedforward network that is too large for a given problem. A cost is associated with each connection in the network. If a given level of performance on the training set can be obtained with fewer weights, the cost function will encourage the reduction, and eventually the elimination, of as many connections as possible. Weight elimination is then extended to unit elimination and hence the least important hidden units are removed from the network. This will make the network more transparent and it should thus be easier to understand the behavior of the network. However, the weight elimination method does not provide [...]
HUANG AND ENDSLEY: BEHAVIOR OF FEEDFORWARD NEURAL NETWORKS 467
TABLE I
WEIGHTS FOR THE 4-2-4 ENCODER/DECODER NEURAL NETWORK
TABLE II
WEIGHTS FOR THE 4-4-4 DECODER/ENCODER NEURAL NETWORK
Fig. 5. A TSD [29].
[...] of the fuzzy sets corresponding to the linguistic terms are constructed properly.
The approach can be summarized as follows:
Step 1. Classify the continuous-valued inputs into sets.
Step 2. Represent the sets using a binary scheme.
Step 3. Construct a neural network with binary inputs.
Step 4. Train the network.
Step 5. Identify important inputs using the large-weight approach.
Step 6. Extract fuzzy rules using the subset approach.
As the test case used to examine the previous techniques involved binary inputs, this approach was validated using a real-life problem involving decision-making by pilots regarding combat situations. The data is drawn from the work of Endsley and Smith [28], a study investigating the effectiveness of a tactical situation display (TSD) in the fighter aircraft cockpit. A simple version of a TSD is shown in Fig. 5 [29]. In this study, a TSD was presented on a computer screen. On each trial, between 3 and 12 targets (enemy aircraft) were presented on the display for 5 s, at which point the targets were blanked from the display. The subjects (experienced fighter aircraft pilots) were instructed to verbally report the tactical action they would take if presented with the situation displayed. Each subject was presented with 490 distinct displays, referred to as target sets, in a random order during the test. Ten (10) pilots were used in the experiment.
Sundararajan [29] developed a dual-network approach to model the pilots’ decision making process. The TSD was divided into 12 imaginary sectors of 30° each. For each sector that had at least one target in it (henceforth referred to as a live sector), the range and angular location of all the targets in that sector were integrated into a single value called a threat factor that was used as the input to a primary network. The threat factor for each live sector was calculated according to
(5.1)
where the quantities involved are
Th, the threat factor of the sector;
I, the set of indices of targets in the sector;
the number of targets in the sector;
the total number of targets in the TSD;
the range of each target from the center of the TSD (ownship);
the mean range of targets;
the mean range of targets within the sector.
For a sector that does not have any targets in it (henceforth referred to as a dead sector), the threat factor was defined as 0. In this way, information about the targets was converted into a 12-dimensional input vector.
The primary network had 12 input nodes, as well as 12 output nodes. It was used to decide which sector should be attacked. Each selected sector was divided into six sub-sectors of 5° each. The threat factors of these sub-sectors for each active sector were calculated and fed into a secondary network. This secondary network made the final decision, picking a particular target (or targets) to attack (or not to attack any aircraft). For more details about the networks, readers may refer to Sundararajan [29].
Sundararajan [29] studied the data set and found that there is a probability of 74% that the decision made by one pilot will be exactly duplicated by at least one other pilot (this is a common issue in dealing with multiple experts, particularly in this domain). He thus concluded that if the neural network’s average percentage generalization (PG, defined as the percentage of the network’s decisions that agree with at least one pilot) is equal to at least 74%, then the decisions made by the neural network will be as much in conformance with the expert pilots’ decisions as those of the experts themselves. Sundararajan’s dual-network indeed achieved a PG of 74%. This result confirmed that the pilots’ decisions can be modeled by a neural network. It was also found that the accuracy of the dual-network was heavily dependent on the accuracy of the primary network.
The objective of the present study was to find out what kind of problem solving knowledge is acquired by the neural network through training. Since the accuracy of the network’s decision is heavily dependent on the accuracy of the primary network, only the primary network was investigated in this study. To facilitate rule extraction, the structure of the network was modified. Instead of using a 12-dimensional output vector, a one-dimensional output vector was used. The underlying assumption is that the pilot should use the same set of rules when dealing with each sector. Therefore, the modified [...]
472 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 27, NO. 3, JUNE 1997
TABLE V
INFORMATION ABOUT THE CONVERTED TRAINING PATTERNS
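The conversion of continuous-valued threat factors into binary training patterns (Steps 1 and 2 of the approach) can be sketched as follows. This is only a minimal illustration in Python, not the procedure actually used in the study: the three sets (low, medium, high), the cut points 0.33 and 0.66, the one-hot layout for low and medium, the number of sectors per pattern, and the function names encode_threat_factor and encode_pattern are all assumptions made for the example. The text itself indicates only that a high threat factor is represented as [0, 0, 1].

```python
from typing import List, Sequence

# Hypothetical cut points separating "low", "medium", and "high" threat
# factors; the study's actual classification boundaries are not given here.
LOW_MAX = 0.33
MEDIUM_MAX = 0.66


def encode_threat_factor(value: float) -> List[int]:
    """Step 1 + Step 2: classify one threat factor and return its 3-bit code."""
    if value <= LOW_MAX:
        return [1, 0, 0]  # assumed code for "low"
    if value <= MEDIUM_MAX:
        return [0, 1, 0]  # assumed code for "medium"
    return [0, 0, 1]      # "high", the code mentioned in the text


def encode_pattern(threat_factors: Sequence[float]) -> List[int]:
    """Concatenate the codes of several sectors into one binary input vector."""
    bits: List[int] = []
    for value in threat_factors:
        bits.extend(encode_threat_factor(value))
    return bits


if __name__ == "__main__":
    # Four illustrative threat factors give a 12-bit binary pattern.
    print(encode_pattern([0.1, 0.2, 0.9, 0.5]))
```

Under these assumptions, every third bit of the pattern marks whether the corresponding sector was classified as high.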
The network was used to test decision accuracy for the 490
target sets. In 316 cases, the sector selected to be attacked was
also selected by at least one pilot. This yields a PG of 64.5%,
which is lower than that of Sundararajan’s dual-network. The
reasons are most likely 1) the network is much simpler than
Sundararajan’s dual-network and 2) binary inputs instead of
continuous-valued inputs are used for decision making.
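The percentage generalization (PG) used as the accuracy measure here, i.e., the share of network decisions that agree with at least one pilot, can be computed as in the following sketch. The function name percentage_generalization and the data layout (one network choice and one set of pilot choices per target set) are assumptions for illustration; the toy data are invented and are not taken from the study.

```python
from typing import Hashable, Sequence, Set


def percentage_generalization(
    network_choices: Sequence[Hashable],
    pilot_choices: Sequence[Set[Hashable]],
) -> float:
    """Return the percentage of target sets for which the network's choice
    was also made by at least one pilot."""
    if len(network_choices) != len(pilot_choices):
        raise ValueError("one set of pilot choices is needed per target set")
    if not network_choices:
        raise ValueError("at least one target set is required")
    agreements = sum(
        1
        for choice, chosen_by_pilots in zip(network_choices, pilot_choices)
        if choice in chosen_by_pilots
    )
    return 100.0 * agreements / len(network_choices)


if __name__ == "__main__":
    # Invented example: sector picked by the network for four target sets,
    # and the sectors picked by the pilots for the same target sets.
    network = [3, 7, 1, 12]
    pilots = [{3, 4}, {5}, {1}, {12, 2}]
    print(f"PG = {percentage_generalization(network, pilots):.1f}%")
```

Counting a decision as correct when any one pilot agrees reflects the fact that the pilots themselves do not always choose the same sector.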
From Table VI, one can see that connections [6], [9], and [12] have significantly large weights. According to the large-weight approach, the corresponding inputs are considered to be significant. All the other connection weights were then set to zero, except for these three connections. The modified network was again tested using the 490 target sets. In 302 cases, the sector selected by the new network was also selected by at least one pilot. This yields a PG of 61.6%. This means that those three inputs contribute 95.6% of the decision and indeed are significant inputs. Hence, only inputs 6, 9, and 12 will be considered. Notice that these three inputs are from the sector under consideration, its left most adjacent sector, and its right most adjacent sector, respectively. This confirms Sundararajan’s observation that the most important information comes from the sector under consideration and its most adjacent sectors.
Next, the subset approach for rule extraction was applied. Only the bias and the weights associated with inputs 6, 9, and 12 are considered. The bias is -8.3, while the weights associated with inputs 6, 9, and 12 are -15.9, 23.7, and -18.9, respectively. Since an input is either 1 or 0, the network’s output can be 1 only when the 6th input is 0, the 9th input is 1, and the 12th input is 0. In other words, the network’s output will be 1 only when the threat factor of the sector under consideration is [0, 0, 1] and the threat factors of the two most adjacent sectors are not [0, 0, 1]. Therefore, the following rule can be concluded:
IF (threat factor of the sector is high) AND (threat factors of the two most adjacent sectors are not high) THEN attack.
This fuzzy rule can be used for the purpose of explanation. It can also be used to deal with the original target sets instead of the neural network, provided that the membership functions for the threat factor are constructed properly. The construction of fuzzy membership functions for threat factor is independent of the classification method used for converting continuous-valued inputs to binary inputs. After all, a fuzzy membership is a real number in the range of 0 to 1.
A common method for constructing the fuzzy membership functions for threat factor is shown in Fig. 7. For each sector, the membership for attack was calculated based on the fuzzy rule. Among the 12 sectors, the sector with the highest membership of attack is declared to be attacked. For the 490 target sets, in 329 cases, the sector selected was also selected by at least one pilot. This yields a PG of 67.1% using the explanatory rule generated from the network. The result is even better than that of the network (61.6%) because the original value rather than the converted value of the threat factor was used. In other words, the original information (continuous-valued inputs) rather than the compressed information (binary inputs) was used for decision making.
Fig. 7. Membership function of threat factor.
The fuzzy rule, however, is not quite as effective as Sundararajan’s dual-network, which showed a 74% PG. The main reason is that our network is simpler than the dual-network. Furthermore, it only uses information from the most significant inputs. Despite these disadvantages, the fuzzy rule achieved a reasonably high accuracy.
VI. CONCLUSION
Neural networks have been used successfully to model human decision making processes. However, neural networks act like a black box providing little insight into how decisions are made. The knowledge of a neural network is represented at a subsymbolic level and hence is very difficult for a human user to comprehend. This paper sheds light on the neural network black box by discussing methods for extracting problem solving knowledge of feedforward neural networks in terms of rules. An innovative approach is also presented for extracting fuzzy rules from networks with continuous-valued inputs. The approach is applied to a real-life problem with promising results. However, the approach has some limitations. The approach can be applied only to feedforward neural networks with a sigmoid nodal function. Each neuron of the network must be associated with a certain concept. The approach works well with neural networks that have a simple structure. For networks with a complex structure, the effectiveness of the approach still needs to be assessed. We believe that neural networks and fuzzy systems are closely related. They should be integrated for the purpose of knowledge representation. The integration of neural networks with fuzzy systems shows promise for providing a higher level of understanding of intelligent systems.
REFERENCES
[1] K. G. Coleman and S. Watenpool, “Neural networks in knowledge acquisition,” AI Expert, vol. 7, no. 1, pp. 36–39, 1992.
[2] J. Diederich, “Explanation and artificial neural networks,” Int. J. Man–Mach. Stud., vol. 37, pp. 335–355, 1992.
[3] A. S. Maida, “Selecting a humanly understandable representation for reasoning about knowledge,” Int. J. Man–Mach. Stud., vol. 22, pp. 151–161, 1985.
[4] H. Chignell and J. G. Peterson, “Strategic issues in knowledge engineering,” Human Factors, vol. 30, no. 4, pp. 381–391, 1988.
[5] T. Kohonen, “An introduction to neural computing,” Neural Networks, vol. 1, pp. 3–16, 1988.
[6] D. A. Mitta, “Knowledge acquisition: Human factors issues,” in Proc. Human Factors Soc. 33rd Annu. Meet., 1989, pp. 351–355.
[7] B. R. Gaines, “An overview of knowledge acquisition and transfer,” Int. J. Man–Mach. Stud., vol. 26, pp. 453–472, 1987.
[8] K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[9] N. Ye and G. Salvendy, “Cognitive engineering based knowledge representation in neural networks,” Behav. Inf. Technol., vol. 10, no. 5, pp. 403–418, 1991.
[10] Y.-H. Pao and D. J. Sobajic, “Neural networks and knowledge engineering,” IEEE Trans. Knowledge Data Eng., vol. 3, no. 2, pp. 185–192, 1991.
[11] A. M. Madni, “The role of human factors in expert systems design and acceptance,” Human Factors, vol. 30, no. 4, pp. 395–414, 1988.
[12] P. E. Lehner and D. A. Zirk, “Cognitive factors in user/expert system interaction,” Human Factors, vol. 29, no. 1, pp. 97–109, 1987.
[13] R. M. Young, “The machine inside the machine: Users’ models of pocket calculators,” Int. J. Man–Mach. Stud., vol. 15, pp. 51–85, 1981.
[14] A. L. Kidd and M. B. Cooper, “Man-machine interface issues in the construction and use of an expert system,” Int. J. Man–Mach. Stud., vol. 22, pp. 91–102, 1985.
[15] R. Davis and D. B. Lenat, Knowledge-Based Systems in Artificial Intelligence. New York: McGraw-Hill, 1982.
[16] M. C. Mozer and P. Smolensky, “Using relevance to reduce network size automatically,” Connection Sci., vol. 1, no. 1, pp. 3–16, 1989.
[17] A. S. Weigend, B. A. Huberman, and D. E. Rumelhart, Predicting the Future: A Connectionist Approach. Palo Alto, CA: System Sciences Lab., Xerox Palo Alto Res. Center, 1990.
[18] L. Shastri, Semantic Networks: An Evidential Formalization and its Connectionist Realization. San Mateo, CA: Morgan Kaufmann, 1988.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, 1986, pp. 318–362.
[20] G. G. Towell and J. W. Shavlik, “Extracting refined rules from knowledge-based neural networks,” Mach. Learn., vol. 13, pp. 71–101, 1993.
[21] S. I. Gallant, Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press, 1993.
[22] L.-M. Fu, “Rule learning by searching on adapted nets,” in Proc. 9th Nat. Conf. Artificial Intelligence. Anaheim, CA: AAAI Press, 1991, pp. 590–595.
[23] K. Saito and R. Nakano, “Medical diagnostic expert system based on PDP model,” in Proc. IEEE Int. Conf. Neural Networks, San Diego, CA, 1988, vol. 1, pp. 255–262.
[24] S. Sestito and T. Dillon, “Knowledge acquisition of conjunctive rules using multilayered neural networks,” Int. J. Intelligent Syst., vol. 8, pp. 779–805, 1993.
[25] S. Sestito and T. Dillon, “Using single-layered neural networks for the extracting of conjunctive rules and hierarchical classifications,” J. Appl. Intell., vol. 1, pp. 157–173, 1991.
[26] Y. Yoon, R. W. Brobst, P. R. Bergstresser, and L. L. Peterson, “A desktop neural network for dermatology diagnosis,” J. Neural Network Comput., pp. 43–52, Summer 1989.
[27] J. J. Buckley, Y. Hayashi, and E. Czogala, “On the equivalence of neural nets and fuzzy expert systems,” Fuzzy Sets Syst., vol. 53, pp. 129–134, 1993.
[28] M. R. Endsley and R. P. Smith, “Attention distribution and decision making in tactical air combat,” unpublished.
[29] M. Sundararajan, “A neural network approach to model decision processes,” M.S. thesis, Dept. Indust. Eng., Texas Tech Univ., Lubbock, 1993.
Samuel H. Huang received the B.S. degree in instrument engineering from Zhejiang University, P. R. China, in 1991, and the M.S. and Ph.D. degrees in industrial engineering from Texas Tech University, Lubbock, in 1992 and 1995, respectively.
He is an R&D Engineer with EDS/Unigraphics Computer-Aided Manufacturing, Cypress, CA. His research interests are CAD/CAM, application of artificial intelligence in manufacturing, tolerance analysis, and environmentally conscious manufacturing. His work in these areas has been published in research journals including the International Journal of Production Research, the IEEE TRANSACTIONS ON COMPONENTS, PACKAGING, AND MANUFACTURING TECHNOLOGY, Computers in Industry, and the Journal of Engineering Design and Automation.
Dr. Huang is a member of IIE, SME, and ASME.
Mica R. Endsley received the B.S. degree from Texas Tech University, Lubbock, and the M.S. degree from Purdue University, West Lafayette, IN, both in industrial engineering, and the Ph.D. degree in industrial and systems engineering with a specialization in human factors from the University of Southern California, Los Angeles.
She is currently an Associate Professor of Industrial Engineering at Texas Tech University. She has been working on issues related to situation awareness in high performance aircraft for the past ten years, most recently expanding this research to air traffic control and maintenance for the Federal Aviation Administration. Prior to joining Texas Tech in 1990, she was an Engineering Specialist for the Northrop Corporation, serving as Principal Investigator of a research and development program focused on the areas of situation awareness, mental workload, expert systems, and interface design for the next generation of fighter cockpits. She is the author of over 40 scientific articles and reports on numerous subjects, including the implementation of technological change, the impact of automation, the design of expert system interfaces, new methods for knowledge elicitation for artificial intelligence system development, pilot decision-making, and various aspects of situation awareness.
Dr. Endsley is the recipient of numerous awards for teaching, research, and contributions to system development.