Understanding of FF-ANN

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 27, NO.
3, JUNE 1997 465
Providing Understanding of the Behavior of Feedforward Neural Networks
Samuel H. Huang and Mica R. Endsley
Manuscript received August 12, 1995; revised March 19, 1996.
S. H. Huang is with EDS/Unigraphics Computer-Aided Manufacturing, Cypress, CA 90630 USA (e-mail: huangsh@ug.eds.com).
M. R. Endsley is with the Department of Industrial Engineering, Texas Tech University, Lubbock, TX 79416 USA.
Publisher Item Identifier S 1083-4419(97)02924-5.
1083–4419/97$10.00 © 1997 IEEE

Abstract—The advent of artificial neural networks has stirred the imagination of many in the field of knowledge acquisition. There is an expectation that neural networks will play an important role in automating knowledge acquisition and encoding; however, the problem solving knowledge of a neural network is represented at a subsymbolic level and hence is very difficult for a human user to comprehend. One way to provide an understanding of the behavior of neural networks is to extract their problem solving knowledge in terms of rules that can be provided to users. Several papers that propose extracting rules from feedforward neural networks can be found in the literature; however, these approaches can only deal with networks with binary inputs. Furthermore, certain approaches lack theoretical support, and their usefulness and effectiveness are debatable. Upon carefully analyzing these approaches, we propose a method to extract fuzzy rules from networks with continuous-valued inputs. The method was tested using a real-life problem (decision-making by pilots involving combat situations) and found to be effective.

I. INTRODUCTION

Recently, the advent of artificial neural networks has stirred the imagination of many in the field of knowledge acquisition [1]. Neural networks belong to a family of models that are based on a learning-by-example paradigm, in which problem solving knowledge is automatically generated from actual examples presented to the network. The knowledge, however, is represented at a subsymbolic level in terms of connections and weights. Neural networks act like a black box, providing little insight into how decisions are made. They have no explicit, declarative knowledge structure that allows the representation and generation of explanation structures [2].

Experience with rule-based expert systems has shown that the ability to generate explanations is absolutely crucial for user acceptance of artificial intelligence (AI) systems. Hence, it is very important to understand the behavior of neural networks. One way to generate an understanding of the behavior of neural networks is to extract their problem solving knowledge in terms of rules. This paper discusses the extraction of rules from feedforward neural networks. Section II explains the need for understanding the behavior of neural networks from a human factors point of view. Section III reviews methods for extracting rules from feedforward neural networks, and these approaches are tested using a benchmarking problem in Section IV. One drawback of existing rule-extraction approaches is that they can only deal with networks with binary inputs. In Section V, we propose a method to extract fuzzy rules from networks with continuous-valued inputs. The method is tested using a real-life problem involving decision-making by pilots in combat situations. Conclusions are drawn regarding the utility and applicability of this technique.

II. UNDERSTANDING THE BEHAVIOR OF NEURAL NETWORKS: A HUMAN FACTORS PERSPECTIVE

Neural networks are expected to play an important role in the development of AI systems. Many researchers have realized that there is a need for a more explicit consideration of human factors in the AI/expert systems field [3]. Chignell and Peterson [4] have shown that knowledge engineering and the development of expert systems benefit from careful use and application of human factors techniques. This section explains the need for understanding the behavior of neural networks from a human factors point of view.

Neural networks have been inspired both by biological nervous systems and by mathematical theories of learning. They are “massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which are intended to interact with objects of the real world in the same way as biological nervous systems do” [5]. Neural computing, which may be more similar to human cognition than current computing technology, is expected to facilitate the automation of human cognitive behavioral features such as learning and generalization. Neural networks are expected to be an effective technique for automated knowledge acquisition.

Knowledge acquisition is the process of collecting domain knowledge from the expert and expressing it in the form of facts and rules [4]. Domain knowledge consists of three subsets: general knowledge, working-level knowledge, and expert knowledge. Expert knowledge results from an individual’s extensive problem-solving experience in a specific domain. It is heuristic in nature and has also been described as unwritten or unconscious knowledge [6]. As domain experts achieve greater competency, their ability to explain the fine details associated with problem solving strategies degrades. Intermediate solution steps are unconsciously performed as a matter of routine as strategies are compressed into a few major steps [7]. Thus, not all of the knowledge involved can be decoded from schema to a semantic representation that is available in human working memory, and hence it is not
accessible for people to report [8]. Current knowledge acquisition approaches, e.g., verbal protocol analysis, are inadequate for obtaining complete data on the knowledge involved in the problem solving process, because a protocol can only reflect the information available in working memory, and verbalization is often much slower than the cognitive processes involved. Much information has to be inferred by analyzing large volumes of poorly articulated data. Since neural computing resembles human cognitive behavior at the micro-level, some researchers suggest that neural networks can be used for automated knowledge acquisition [9]. A neural network can be trained with examples to solve a problem. The knowledge within the trained network would be more objective and reliable than knowledge reasoned out by a specific person. The knowledge of a neural network, however, is implicitly embedded in its connections and weights. Developing a human understanding of the knowledge in neural networks remains an open research topic.

In traditional rule-based expert systems, knowledge is represented symbolically in expressions or data structures. These representations are subsequently manipulated or processed to produce useful results, which are the logical consequences of existing representations. The situation is quite different in neural networks. In neural computing, the processing is the representation. In other words, knowledge is not represented symbolically, but in the form of distributed processing and localized decision rules [10]. This subsymbolic representation makes neural networks more resistant to noise than traditional symbolic representation; however, it is very difficult for a human user to comprehend.

Do we really need to know what is going on inside a neural network? This question needs to be answered from a human factors point of view. Madni [11] points out that “the success of expert systems depends not only on the quality and completeness of the knowledge elicited from experts but also on the compatibility of the recommendations and decisions with the user’s conceptualization of the task.” A study conducted by Lehner and Zirk [12] showed that when a human being and an intelligent machine cooperate to solve problems, but each employs different problem-solving procedures, the user must have an accurate model of how that machine operates. This is because when people deal with complex, interactive systems, they usually build up their own conceptual mental model of the system. The model guides their actions and helps them interpret the system’s behavior. Such a model, when appropriate, can be very helpful or even necessary for dealing successfully with the system. However, if inappropriate or inadequate, it can lead to serious misconceptions or errors [13]. Kidd and Cooper [14] point out that

“If an expert system is to be responsible for complex decision-making and giving advice, then it is vital that there is compatibility, at the cognitive level, between the user’s model of the problem and the system’s. In other words, the knowledge representation and problem solving processes employed by the system must be readily intelligible to the user. Only if this is true will the user both be able to interact competently and efficiently with the system during its reasoning process and also be confident in the system’s reasoning and advice.”

Therefore, it is very important for people to be able to understand the behavior of neural networks. A good explanation facility has been described as having two main functions: 1) to make the system more intelligible to the user and 2) to uncover shortcomings in the knowledge base [15]. To make neural networks fit these requirements, we need to understand how knowledge is represented in neural networks. Ye and Salvendy [9] developed three hypotheses about knowledge representation in neural networks:
1) An entity (object, concept, etc.) is more likely to be locally encoded in a neural network.
2) The knowledge is implemented in an implicit way in the internal structure of the neural network (a group of associated hidden neurons and their connections to entity neurons), not in individual neurons or connections.
3) Different modules of a neural network, which implement different conceptual schema, have similar processing structures, but differ in their input and output.
The authors validated these hypotheses using three experiments based on a modulo arithmetic task. These experiments provided some potential insight into neural computing. Specifically, they found that neural computing provides a theoretical and constructive foundation for understanding human cognitive behavior. The fact that the problem solving knowledge is implicitly embedded in the internal structure of a neural network provides a good explanation for the difficulty involved in knowledge elicitation: the knowledge cannot be directly accessed but needs to be reasoned out through problem solving states. To overcome this natural limitation of the human ability to verbalize knowledge, one can train a neural network to solve a problem and then obtain knowledge about the problem solving process from the internal structure of the network. However, the authors did not address the issue of how to extract the problem solving knowledge from a trained neural network.

Mozer and Smolensky [16] point out that it is much easier to understand the behavior of a feedforward neural network in terms of simple rules than in terms of a large number of weights and activation values. They proposed a skeletonization technique to determine the relevance of individual units in feedforward networks and to remove redundant units. The result of the procedure is a minimal network that consists of only those units that really contribute to the solution. A weight elimination procedure was proposed by Weigend et al. [17]. Their method begins with a feedforward network that is too large for a given problem. A cost is associated with each connection in the network. If a given level of performance on the training set can be obtained with fewer weights, the cost function will encourage the reduction, and eventually the elimination, of as many connections as possible. Weight elimination is then extended to unit elimination, so that the least important hidden units are removed from the network. This makes the network more transparent, and it should thus be easier to understand the behavior of the network. However, the weight elimination method does not provide
any explanation facilities for neural networks. Furthermore, this method has advantages only when the network size is reasonably small.

Diederich [2] discusses the ability of neural networks to generate explanations. He concludes that explanation is difficult to realize in unstructured neural networks. He argues that connectionist systems benefit from the explicit coding of relations and the use of highly structured networks in order to allow explanation. He then developed an explanation component for connectionist semantic networks (CSN). CSN are massively parallel systems that allow the drawing of certain kinds of inferences based on conceptual knowledge with extreme efficiency [18]. They are quite different from conventional neural networks. Traditional neural network learning algorithms such as back-propagation learning [19] cannot be applied in CSN, and how to provide an understanding of the behavior of the very popular feedforward neural networks remains an open question.

Fig. 1. An artificial neuron.
III. EXTRACTING RULES FROM FEEDFORWARD NEURAL NETWORKS: A REVIEW

One way of understanding the behavior of neural networks is to extract their problem solving knowledge in terms of rules. By extracting rules from a neural network, an explanation facility can be built, giving a human user some understanding of the network. Several approaches for extracting rules from feedforward neural networks have been proposed in the literature; however, these approaches can only deal with networks with binary inputs. Furthermore, certain approaches lack theoretical support, and their usefulness and effectiveness are debatable.

A. The Subset Approach
One approach for rule extraction is the so-called subset method provided by Towell and Shavlik [20]. This method explicitly searches for subsets of incoming weights that exceed the bias (threshold, denoted as θ) of a neuron. It represents the state of the art in the published literature.

The subset method has a strong theoretical foundation. Recall that the output of an artificial neuron (Fig. 1) is defined in general as

$$ y = \begin{cases} \text{on}, & \mathit{net} > 0 \\ \text{off}, & \mathit{net} \le 0 \end{cases} \tag{3.1} $$

in the binary case and

$$ y = f(\mathit{net}) \tag{3.2} $$

in the continuous case, in which

$$ \mathit{net} = \sum_{i} w_i x_i - \theta \tag{3.3} $$

where x_i are the inputs, w_i the corresponding weights, and θ is the bias of the neuron. Therefore, if a certain combination of weighted input values of a neuron is greater than its bias, then the status of the neuron is on; otherwise its status is off (or, correspondingly for (3.2), a positive output indicates that the neuron is on, while a negative output indicates that the neuron is off).

A simple, breadth-first subset algorithm starts by determining whether any sets containing a single link (connection) are sufficient to guarantee that the bias is exceeded. If so, these sets are written as rules. The search proceeds by increasing the size of the subsets until all possible subsets have been explored. Finally, the algorithm removes redundant rules. Fig. 2 shows an example.

Fig. 2. Rule extracting using the subset algorithm [20].

A problem with the subset algorithm is that the cost of finding all subsets grows exponentially with the number of links. Several heuristic approaches have been proposed to deal with this problem [20]–[23]. Among these approaches, the one proposed by Towell and Shavlik is quite interesting. Their algorithm differs from the subset algorithm in that it explicitly searches for rules of the form:

IF (M of the following N antecedents are true) THEN ...

The idea underlying this algorithm is that individual antecedents (links) do not have a unique importance. Rather, groups of antecedents form equivalence classes in which each antecedent has the same importance as, and is interchangeable with, other members of the class. Compared with the subset algorithm, it generates fewer rules for the same network. An experiment performed by Towell and Shavlik [20] indicated that the accuracy of the rules it derived was approximately equal to that of the network from which the rule set was extracted.

B. The Large-Weight Approach
Another approach for extracting rules from trained neural networks is the large-weight approach. Sestito and Dillon [24], [25] proposed such an approach to extract information from a single-layered network. Their method basically determines
the main contributory inputs of an output by selecting those inputs whose associated weights are within some range of the maximum weight for that particular output. For single-layer neural networks, this is straightforward. Suppose we have n inputs and we are considering output j. First, we need to find w_max such that

$$ w_{\max} = \max_{1 \le i \le n} w_{ij} $$

where w_ij is the weight from input x_i to output j. We can then select all the inputs (x_i's) whose associated weight w_ij is within a particular range. In other words, we select the x_i's such that

$$ w_{ij} \ge \left(1 - \frac{\beta}{100}\right) w_{\max} $$

where β is a positive number representing the percentage of the maximum that the selected inputs have to be within. The rules constructed will then be

IF x_a [and x_b] THEN y_j

where x_a and x_b are selected inputs, y_j is output j, and [ ] denotes an optional statement which can be repeated.

A similar approach can be found in Yoon et al. [26]. The authors argued that neural computing could be viewed as a simple linear discriminant analysis that attempts to use input parameters to discriminate between a finite and mutually exclusive set of output variables. In single-layer networks, the connection weights are roughly equivalent to the coefficients of a discriminant function. Therefore, each weight represents the relative contribution of its associated variable.

In multi-layer networks, a common statistical technique known as factor analysis can be used. The hidden layers are comparable to factors. Although this comparison is far from exact, the analogy does prove useful for knowledge-based interpretation. Factors (i.e., hidden neurons) with large weights are interpreted as important factors. This method was applied to a neural network for dermatology diagnosis and found to be useful for the purpose of explanation.

C. The Similar-Weight Approach
In a multi-layer neural network, tracing the contribution of inputs to an output is much more difficult than in a single-layer one, because there are hidden neurons to deal with and hidden neurons usually do not have any explicit meanings. In order to have some means of determining the associations between the inputs and outputs, Sestito and Dillon [24] decided to extend the input set to include all the desired outputs. This means that outputs are used as additional inputs. The rationale behind this is to enable determination of the direct association between the inputs and the outputs.

Since the inputs and outputs are now at the same level, they can be directly compared. If the weights from an original input to the hidden neurons and those of an augmented input (i.e., an output) are similar (or identical), then it is postulated that there is a close association between the input and the output. On the basis of this postulation, the authors developed a method to extract one conjunctive rule for each output of a multi-layer neural network. The method was then tested in two example domains and found to be useful.

This approach, however, is heuristic in nature and somewhat unique in the literature. It is based on the postulation that if the outgoing weights of two neurons are similar, then these two neurons are closely associated. There is no theoretical support for this postulation. Even if the sufficiency of the postulation could be proved, its necessity might never be proved due to the stochastic nature of neural networks. Hence, the usefulness of this similar-weight approach is debatable. The weakness of this approach will be shown in our case study.

IV. A CASE STUDY: NETWORKS WITH BINARY INPUTS

After reviewing the approaches for extracting rules from feedforward neural networks, a case study was conducted. In the case study, the n-m-n encoder/decoder problem was used to test the approaches. An “n-m-n encoder/decoder” is a three-layer neural network with n neurons in each of the input and output layers and m neurons in the hidden layer. The network is given n distinct input patterns, in each of which only one bit is turned on and all other bits are turned off. The network should duplicate the input pattern in the output layer.

Fig. 3. The 4-2-4 encoder/decoder neural network.

A. Using the Subset and the Large-Weight Approach
A 4-2-4 neural network (Fig. 3) was trained using the error back-propagation algorithm. The input of each neuron was either 1 or -1. An input of 1 means the neuron is on, while an input of -1 means the neuron is off. The training examples were the four input patterns with exactly one bit on. The weights of the trained neural network are shown in Table I. Based on the subset approach, the following rules were obtained:

1. For the hidden layer,
Rule 1. IF 3 of ((not A), B, (not C), D) are true THEN X.
Rule 2. IF 3 of ((not A), (not B), C, D) are true THEN Y.
2. For the output layer,
Rule 3. IF (not X) and (not Y) THEN a.
Rule 4. IF X and (not Y) THEN b.
Rule 5. IF (not X) and Y THEN c.
Rule 6. IF X and Y THEN d.

TABLE I
WEIGHTS FOR THE 4-2-4 ENCODER/DECODER NEURAL NETWORK

Fig. 4. The 4-4-4 encoder/decoder neural network.

TABLE II
WEIGHTS FOR THE 4-4-4 DECODER/ENCODER NEURAL NETWORK

Although these rules are somewhat difficult to comprehend,
they can be used to classify the training examples correctly.
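The claim that these rules classify the training examples correctly can be checked mechanically. The Python sketch below encodes Rules 1 and 2 as "at least 3 of 4" tests and Rules 3 to 6 as a lookup on (X, Y). It assumes, following the n-m-n encoder/decoder definition, that the training examples are the four patterns with exactly one input on, and it reads an input of 1 as True and -1 as False; the function names are illustrative only.

```python
# Minimal check of the subset-derived rules for the 4-2-4 encoder/decoder.
# Inputs are booleans: True for an input of 1 (on), False for -1 (off).


def m_of_n(m, literals):
    """Return True when at least m of the boolean literals are true."""
    return sum(bool(lit) for lit in literals) >= m


def hidden_layer(a, b, c, d):
    # Rule 1: IF 3 of ((not A), B, (not C), D) are true THEN X.
    x = m_of_n(3, [not a, b, not c, d])
    # Rule 2: IF 3 of ((not A), (not B), C, D) are true THEN Y.
    y = m_of_n(3, [not a, not b, c, d])
    return x, y


def output_layer(x, y):
    # Rules 3-6: each combination of X and Y selects one output neuron.
    if not x and not y:
        return "a"  # Rule 3
    if x and not y:
        return "b"  # Rule 4
    if not x and y:
        return "c"  # Rule 5
    return "d"      # Rule 6


def classify(a, b, c, d):
    return output_layer(*hidden_layer(a, b, c, d))


# Assumed training set: the four patterns with exactly one input on.
training = {
    "a": (True, False, False, False),
    "b": (False, True, False, False),
    "c": (False, False, True, False),
    "d": (False, False, False, True),
}

for target, bits in training.items():
    print(bits, "->", classify(*bits), "expected:", target)
```

For instance, with only B on, three of the four Rule 1 antecedents hold, so X is true while Y is false, and Rule 4 then selects output b. Having to trace such intermediate codes by hand is what makes the rule set hard to comprehend, even though it is accurate.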
The large-weight approach is of little use in this case
because most of the weights are of the same
magnitude.
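Why equal magnitudes defeat the large-weight approach can be seen with a small sketch of the selection rule reviewed in Section III. This is a hypothetical Python helper, not Sestito and Dillon's code: it compares weight magnitudes (an assumption, so that a strongly negative weight yields a negated antecedent, as in rules of the form IF (not A) THEN R), and the weights are made-up numbers rather than values from Table I.

```python
def select_large_weights(weights, beta):
    """Hypothetical helper sketching the large-weight selection rule.

    weights: mapping from input name to its weight into one output neuron.
    beta:    percentage of the maximum magnitude that a selected weight
             must lie within (e.g. 20 means "within 20% of the maximum").
    Returns the antecedents of the rule; a large negative weight is
    written as a negated antecedent, "(not X)".
    """
    # Assumption: magnitudes are compared, so strongly negative weights count too.
    w_max = max(abs(w) for w in weights.values())
    threshold = (1 - beta / 100) * w_max
    return [
        name if w > 0 else f"(not {name})"
        for name, w in weights.items()
        if abs(w) >= threshold
    ]


# Made-up weights of similar magnitude: every input passes the cut-off.
flat = {"A": 0.9, "B": -1.0, "C": 0.95, "D": -0.92}
# Made-up weights with one dominant connection: only that input is kept.
peaked = {"A": -3.1, "B": 0.2, "C": -0.1, "D": 0.3}

print(select_large_weights(flat, beta=20))
print(select_large_weights(peaked, beta=20))
```

With the flat weights every input clears the 80% cut-off, so the rule keeps all four antecedents and says nothing about which input matters; only when one weight dominates does the selection become informative.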
To further validate the subset and the large-weight approaches,
the network was trained with all 16 possible input
patterns. The network was required to duplicate the input
patterns in the output layer. As the 4-2-4 network could not
converge when trained with those 16 patterns, the number of
hidden neurons was increased from 2 to 4. The 4-4-4 network
learned all of the 16 patterns. Its structure is shown in Fig. 4
and its weights are shown in Table II.
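A simple counting argument, offered as an illustration rather than as the authors' explanation, is consistent with the 4-2-4 network failing on all 16 patterns while the 4-4-4 network succeeds: if every hidden unit is idealized as strictly on or off, m hidden units can produce at most 2^m distinct internal codes, and each of the 16 patterns needs its own code to be reproduced at the output. The Python sketch below enumerates the 16 bipolar patterns and computes that lower bound; real hidden units are continuous, so the bound is only a heuristic.

```python
import itertools
import math

# All 2**4 bipolar patterns over inputs A, B, C, D (1 = on, -1 = off):
# the training set used for the 4-4-4 network.
all_patterns = list(itertools.product([1, -1], repeat=4))


def min_binary_hidden_units(num_patterns):
    """Smallest m with 2**m >= num_patterns.

    Idealization: each hidden unit is treated as strictly on or off, so m
    units can form at most 2**m distinct internal codes.
    """
    return math.ceil(math.log2(num_patterns))


print(len(all_patterns))                            # size of the full training set
print(min_binary_hidden_units(4))                   # bound for the 4 one-hot patterns
print(min_binary_hidden_units(len(all_patterns)))   # bound for all 16 patterns
```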
The large weights are highlighted in Table II. Based on the
large-weight approach, the following rules were obtained:
For the hidden layer,
Rule 1. IF (not A) THEN R.
Rule 2. IF D THEN X.
Rule 3. IF (not B) THEN Y.
Rule 4. IF (not C) THEN Z.
For the output layer,
Rule 5. IF (not R) THEN a.
Rule 6. IF (not Y) THEN b.
Rule 7. IF (not Z) THEN c.
Rule 8. IF X THEN d.
The same result was obtained when the subset approach was used. By simplifying the eight rules, the following rules are obtained:
a. IF A THEN a (Rule 1 and Rule 5);
b. IF B THEN b (Rule 3 and Rule 6);
c. IF C THEN c (Rule 4 and Rule 7);
d. IF D THEN d (Rule 2 and Rule 8);
which means the 4-4-4 network will always duplicate the input pattern in the output neurons. This result is as expected.

TABLE III
SSE_ab OF THE 4-2-4 ENCODER/DECODER NETWORK

TABLE IV
SSE_ab OF THE 4-2-4 ENCODER/DECODER NETWORK TRAINED WITH NEW PATTERNS

B. Using the Similar-Weight Approach
Sestito and Dillon’s similar-weight approach was then applied to the 4-2-4 encoder/decoder network. The network was trained with four patterns. The sum of squares error measurements SSE_ab were calculated and are shown in Table III.
From Table III, the following rules are obtained:
1) Rule 1. IF A THEN a.
2) Rule 2. IF B THEN b.
3) Rule 3. IF C THEN c.
4) Rule 4. IF D THEN d.
This rule set is exactly the same as that obtained for the 4-4-4 neural network using the subset and the large-weight approaches, and it can be used to duplicate the 16 input patterns. However, this rule set was obtained from the 4-2-4 encoder/decoder network, which could not learn to duplicate all of the 16 input patterns. Obviously, the performance of the 4-2-4 network cannot be the same as that of the obtained rule set. This means the similar-weight approach is not effective in this case.
To further validate the similar-weight approach, the network was retrained with four new patterns.
The desired relation (rule set) is as follows:
1) Rule 1. IF A THEN a.
2) Rule 2. IF A and B THEN b.
3) Rule 3. IF A and B and C THEN c.
4) Rule 4. IF A and B and C and D THEN d.
The SSE_ab values were calculated and are shown in Table IV. From Table IV, we cannot conclude any useful rules, since no similar weights can be found. Therefore, the similar-weight approach is not shown to be useful in this case.

The case study showed that although the three rule extraction approaches can help explain the behavior of a neural network, they are not always useful. Most often, rules obtained using the subset approach are accurate but difficult to comprehend. The large-weight approach failed when all the weights of a network are of the same magnitude. The similar-weight approach is not always useful due to the stochastic nature of neural network learning.

V. EXTRACTING FUZZY RULES FROM NETWORKS WITH CONTINUOUS-VALUED INPUTS

As mentioned previously, the rule extraction approaches reported in the literature can only deal with networks with binary inputs. However, many real-life applications require that the input values be continuous. An approach based on fuzzy logic is proposed to deal with networks with continuous-valued inputs. A study by Buckley et al. [27] showed that, under certain assumptions, a neural network can be approximated to any degree of accuracy by a fuzzy system, and vice versa. Therefore, it is justified to extract fuzzy rules from networks with continuous-valued inputs.
A continuous-valued input can be described using a linguistic term, such as large, medium, or small. Each linguistic term can be represented by a set. A continuous-valued input can thus be classified into a specific set and represented in a binary scheme. In this way, the continuous-valued inputs are converted to binary inputs. The large-weight approach and the subset approach can then be applied for rule extraction. The rules extracted can be used to explain the behavior of the network in understandable linguistic terms. They can also be used for decision-making, provided that membership functions
of the fuzzy sets corresponding to the linguistic terms are constructed properly.

The approach can be summarized as follows:
Step 1. Classify the continuous-valued inputs into sets.
Step 2. Represent the sets using a binary scheme.
Step 3. Construct a neural network with binary inputs.
Step 4. Train the network.
Step 5. Identify important inputs using the large-weight approach.
Step 6. Extract fuzzy rules using the subset approach.

Because the test case used to examine the previous techniques involved only binary inputs, this approach was validated using a real-life problem involving decision-making by pilots in combat situations. The data are drawn from the work of Endsley and Smith [28], a study investigating the effectiveness of a tactical situation display (TSD) in the fighter aircraft cockpit. A simple version of a TSD is shown in Fig. 5 [29]. In this study, a TSD was presented on a computer screen. On each trial, between 3 and 12 targets (enemy aircraft) were presented on the display for 5 s, at which point the targets were blanked from the display. The subjects (experienced fighter aircraft pilots) were instructed to verbally report the tactical action they would take if presented with the situation displayed. Each subject was presented with 490 distinct displays, referred to as target sets, in a random order during the test. Ten (10) pilots were used in the experiment.

Fig. 5. A TSD [29].

Sundararajan [29] developed a dual-network approach to model the pilots’ decision making process. The TSD was divided into 12 imaginary sectors of 30° each. For each sector that had at least one target in it (henceforth referred to as a live sector), the range and angular location of all the targets in that sector were integrated into a single value, called a threat factor, that was used as the input to a primary network. The threat factor for each live sector was calculated according to the following [29]:

(5.1)

where Th denotes the threat factor of the sector and I the set of indices of the targets in the sector; the other quantities in (5.1) are the number of targets in the sector, the total number of targets in the TSD, the range of each target from the center of the TSD (ownship), the mean range of all targets, and the mean range of the targets within the sector.

For a sector that does not have any targets in it (henceforth referred to as a dead sector), the threat factor was defined as 0. In this way, information about the targets was converted into a 12-dimensional input vector.

The primary network had 12 input nodes and 12 output nodes. It was used to decide which sector should be attacked. Each selected sector was divided into six sub-sectors of 5° each. The threat factors of these six sub-sectors for each active sector were calculated and fed into a secondary network. This secondary network made the final decision, picking a particular target (or targets) to attack (or not to attack any aircraft). For more details about the networks, readers may refer to Sundararajan [29].

Sundararajan [29] studied the data set and found that there is a probability of 74% that the decision made by one pilot will be exactly duplicated by at least one other pilot (this is a common issue in dealing with multiple experts, particularly in this domain). He thus concluded that if the neural network’s average percentage generalization (PG, defined as the percentage of the network’s decisions that agree with at least one pilot) is at least 74%, then the decisions made by the neural network will be as much in conformance with the expert pilots’ decisions as those of the experts themselves. Sundararajan’s dual network indeed achieved a PG of 74%. This result confirmed that the pilots’ decisions can be modeled by a neural network. It was also found that the accuracy of the dual network was heavily dependent on the accuracy of the primary network.

The objective of the present study was to find out what kind of problem solving knowledge is acquired by the neural network through training. Since the accuracy of the network’s decision is heavily dependent on the accuracy of the primary network, only the primary network was investigated in this study. To facilitate rule extraction, the structure of the network was modified. Instead of using a 12-dimensional output vector, a one-dimensional output vector was used. The underlying assumption is that the pilot should use the same set of rules when dealing with each sector. Therefore, the modified
network deals with one sector at a time and thus uses a one-dimensional output vector.

TABLE V
INFORMATION ABOUT THE CONVERTED TRAINING PATTERNS

Fig. 6. Network configuration.

TABLE VI
WEIGHTS OF THE NETWORK

A threat factor is a positive real number. It was classified into a set as being high, medium, low, or none based on the following:

high if
medium if
(5.2)
low if
none if

in which, .

The linguistic terms high, medium, low, and none are each represented by a distinct binary vector. Using this set, the continuous-valued inputs (i.e., threat factors) were converted into binary inputs.
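Steps 1 and 2 of the approach (classify each threat factor into a linguistic term, then encode the term in binary) can be sketched as follows. The cut-off values and the four-bit one-hot codes are hypothetical placeholders chosen for illustration; the actual boundaries in (5.2) and the vectors used in the study are not reproduced here.

```python
# Hypothetical cut-offs, checked from the highest term down; a threat factor
# above a bound gets that term. The real boundaries of (5.2) are not given here.
HYPOTHETICAL_CUTOFFS = (("high", 2.0), ("medium", 1.0), ("low", 0.0))

# Hypothetical one-hot codes for the four linguistic terms.
TERM_CODES = {
    "high": (1, 0, 0, 0),
    "medium": (0, 1, 0, 0),
    "low": (0, 0, 1, 0),
    "none": (0, 0, 0, 1),
}


def linguistic_term(threat_factor):
    """Step 1: classify a continuous threat factor into a linguistic term."""
    for term, lower_bound in HYPOTHETICAL_CUTOFFS:
        if threat_factor > lower_bound:
            return term
    return "none"  # e.g. a dead sector, whose threat factor is 0


def encode(threat_factor):
    """Step 2: represent the term with a binary vector."""
    return TERM_CODES[linguistic_term(threat_factor)]


for tf in (0.0, 0.4, 1.3, 2.7):
    print(tf, linguistic_term(tf), encode(tf))
```

Any monotone set of cut-offs fits the same pattern; only the table of bounds would change.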
A neural network was then developed using the binary
inputs as determined from the set. As previously discussed,
one sector at a time will be examined and hence a network with
one output node is required. The dimension of the input vector
needs to be determined. When dealing with a sector, the threat
factor of that sector should be considered, and so should the
threat factors of the other sectors. Although it
seems that all 12 sectors should be taken into consideration,
Sundararajan [29] showed that the most important information
comes from the sector under consideration and its immediately
adjacent sectors, so just three sectors need to be considered in
each decision.
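Assembling a sector's input vector from its own encoding and those of its neighbours can be sketched as follows. Assumptions: sectors are indexed 0 to 11 and wrap around (sector 11 is adjacent to sector 0, since the display covers a full circle), neighbours are concatenated from left to right, and every sector is already encoded as a three-element binary vector. `sector_window` is a hypothetical helper; `m` is the number of neighbours taken on each side, so `m = 1` gives the three sectors suggested by Sundararajan and `m = 2` gives five.

```python
from typing import List, Sequence


def sector_window(encoded: Sequence[Sequence[int]], i: int, m: int) -> List[int]:
    """Concatenate the binary vectors of sectors i-m .. i+m (0-based).

    encoded[j] is the 3-element vector of sector j. Indices wrap around
    the 12-sector display, which is an assumption of this sketch.
    """
    n = len(encoded)
    x: List[int] = []
    for offset in range(-m, m + 1):
        # Python's % keeps the index in 0..n-1 even for negative values.
        x.extend(encoded[(i + offset) % n])
    return x


# Usage: sector 5 is "high" ([0, 0, 1]); all other sectors are [0, 0, 0].
encoded = [[0, 0, 0] for _ in range(12)]
encoded[5] = [0, 0, 1]
x = sector_window(encoded, 5, 2)
print(len(x), x[8])
```

With `m = 2` the vector has 15 entries, and under this layout the last element of the centre sector's vector is the 9th input (index 8), while the last elements of its two immediate neighbours are the 6th and 12th inputs.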
The problem data contain 490 target sets. Each target set consists of 12 sectors, providing 5880 (490 × 12) patterns. The threat factors are converted into binary vectors using (5.2). After this conversion, some input patterns became indistinguishable, yet their corresponding output patterns were not always the same. This problem is inevitable when continuous-valued inputs are converted into binary inputs. The more sectors considered (i.e., the larger the input dimension), the fewer conflicting patterns there will be. Since the TSD is symmetric in nature, the number of sectors considered should always be increased by 2. For example, if the second adjacent sector to the left is under consideration, one should also consider the second adjacent sector to the right. The information about the converted patterns is shown in Table V.

From Table V, it can be seen that if only 3 sectors are considered, the consistency of the decisions is very low (40.6%). If 5 sectors are considered, the consistency increases to 80.6%. Although consistency rises as more sectors are considered, it is also desirable to keep the dimension of the input vector as small as possible, since this facilitates both network training and rule extraction. Therefore, it was decided to consider five sectors. To facilitate rule extraction, a simple feedforward neural network with no hidden layers was used. The network configuration is shown in Fig. 6.

The network was trained using 431 converted binary patterns (182 unique patterns and 249 consistent patterns). The popular back-propagation learning algorithm was applied. The weights of the network are shown in Table VI.

Since the network considers only one sector at a time, the results of all 12 sectors within a target set need to be synthesized to make the final decision. The synthesis procedure is as follows:

Step 1. For each sector $i$ ($i = 1, \ldots, 12$) within the target set, calculate the output of the network, denoted as $O_i$.
Step 2. Find $k = \arg\max_{i} O_i$.
Step 3. Sector $k$ is declared as the sector to be attacked.
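Steps 1 to 3 can be sketched as follows. This assumes a single output node with a sigmoid activation and no hidden layer (the conclusion notes that the approach targets networks with a sigmoid nodal function) and assumes each sector's input window has already been built. The function names and the toy weights in the usage lines are hypothetical, sectors are numbered from 0 rather than 1, and ties are broken in favour of the lowest-numbered sector, a detail the text does not specify.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def network_output(x: Sequence[int], weights: Sequence[float], bias: float) -> float:
    """One sigmoid output node, no hidden layer: O = sigmoid(bias + w . x)."""
    if len(x) != len(weights):
        raise ValueError("input and weight vectors differ in length")
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def select_sector(windows: Sequence[Sequence[int]],
                  weights: Sequence[float], bias: float) -> int:
    """Return the 0-based index k of the sector to be attacked."""
    # Step 1: the same network scores every sector in the target set.
    outputs = [network_output(x, weights, bias) for x in windows]
    # Step 2: k maximizes O_i; max() keeps the first index on ties.
    k = max(range(len(outputs)), key=lambda i: outputs[i])
    # Step 3: sector k is declared as the sector to be attacked.
    return k


# Usage with toy 3-input windows: only sector 7 activates the positive weight.
windows = [[0, 0, 0] for _ in range(12)]
windows[7] = [0, 1, 0]
print(select_sector(windows, [0.0, 2.0, 0.0], -1.0))
```

Reusing one set of weights for all 12 sectors mirrors the assumption that the pilot applies the same rules to every sector; only the argmax step compares sectors with each other.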
The network was used to test decision accuracy for the 490 target sets. In 316 cases, the sector selected to be attacked was also selected by at least one pilot. This yields a PG of 64.5%, which is lower than that of Sundararajan's dual-network. The most likely reasons are that 1) the network is much simpler than Sundararajan's dual-network and 2) binary rather than continuous-valued inputs are used for decision making.
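The PG figures reported in this section count the target sets for which the sector chosen by the model was also chosen by at least one pilot, as a percentage of all target sets. A minimal sketch of that count, assuming each target set is summarized as the model's selected sector together with the set of sectors chosen by the pilots (`pg` is a hypothetical name):

```python
from typing import Iterable, Set, Tuple


def pg(results: Iterable[Tuple[int, Set[int]]]) -> float:
    """Percentage of target sets whose selected sector matches at least one pilot."""
    total = 0
    hits = 0
    for selected, pilot_choices in results:
        total += 1
        if selected in pilot_choices:
            hits += 1
    if total == 0:
        raise ValueError("no target sets given")
    return 100.0 * hits / total


# Usage: two of these three toy target sets match a pilot's choice.
print(pg([(3, {3, 5}), (7, {2}), (0, {0, 4})]))
```

Because a match with any single pilot counts, PG measures agreement with at least one human decision rather than with a consensus.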
From Table VI, one can see that connections 6, 9, and 12 have significantly large weights. According to the large-weight approach, the corresponding inputs are considered to be significant. All the other connection weights were then set to zero, leaving only these three connections. The modified network was again tested using the 490 target sets. In 302 cases, the sector selected by the new network was also selected by at least one pilot. This yields a PG of 61.6%, so the pruned network retains 95.6% of the full network's correct selections (302 of 316); those three inputs indeed are significant inputs. Hence, only inputs 6, 9, and 12 will be considered. Notice that input 9 comes from the sector under consideration, while inputs 6 and 12 come from its two immediately adjacent sectors. This confirms Sundararajan's observation that the most important information comes from the sector under consideration and its immediately adjacent sectors.

Next, the subset approach for rule extraction was applied. Only the bias and the weights associated with inputs 6, 9, and 12 are considered. The bias is -8.3, while the weights associated with inputs 6, 9, and 12 are -15.9, 23.7, and -18.9, respectively. Since an input is either 1 or 0, the network's output can be 1 only when the 6th input is 0, the 9th input is 1, and the 12th input is 0. In other words, the network's output will be 1 only when the threat factor of the sector under consideration is [0, 0, 1] and the threat factors of the two immediately adjacent sectors are not [0, 0, 1]. Therefore, the following rule can be concluded:

IF (threat factor of the sector is high) AND (threat factors of the two immediately adjacent sectors are not high) THEN attack.

This fuzzy rule can be used for the purpose of explanation. It can also be used in place of the neural network to deal with the original target sets, provided that the membership functions for the threat factor are constructed properly. The construction of fuzzy membership functions for the threat factor is independent of the classification method used for converting continuous-valued inputs to binary inputs. After all, a fuzzy membership is a real number in the range of 0 to 1.

A common method for constructing the fuzzy membership functions for the threat factor is shown in Fig. 7. For each sector, the membership for attack was calculated based on the fuzzy rule. Among the 12 sectors, the sector with the highest membership of attack is declared to be attacked. For the 490 target sets, in 329 cases the sector selected was also selected by at least one pilot. This yields a PG of 67.1% using the explanatory rule generated from the network. The result is even better than that of the network (61.6%) because the original value rather than the converted value of the threat factor was used. In other words, the original information (continuous-valued inputs) rather than the compressed information (binary inputs) was used for decision making.

Fig. 7. Membership function of threat factor.

The fuzzy rule, however, is not quite as effective as Sundararajan's dual-network, which showed a 74% PG. The main reason is that our network is simpler than the dual-network. Furthermore, it only uses information from the most significant inputs. Despite these disadvantages, the fuzzy rule achieved reasonably high accuracy.

VI. CONCLUSION

Neural networks have been used successfully to model human decision making processes. However, neural networks act like a black box, providing little insight into how decisions are made. The knowledge of a neural network is represented at a subsymbolic level and hence is very difficult for a human user to comprehend. This paper sheds light on the neural network black box by discussing methods for extracting the problem solving knowledge of feedforward neural networks in terms of rules. An innovative approach is also presented for extracting fuzzy rules from networks with continuous-valued inputs. The approach is applied to a real-life problem with promising results. However, the approach has some limitations. It can be applied only to feedforward neural networks with a sigmoid nodal function, and each neuron of the network must be associated with a certain concept. The approach works well with neural networks that have a simple structure; for networks with a complex structure, its effectiveness still needs to be assessed. We believe that neural networks and fuzzy systems are closely related and should be integrated for the purpose of knowledge representation. The integration of neural networks with fuzzy systems shows promise for providing a higher level of understanding of intelligent systems.

REFERENCES

[1] K. G. Coleman and S. Watenpool, "Neural networks in knowledge acquisition," AI Expert, vol. 7, no. 1, pp. 36–39, 1992.
[2] J. Diederich, "Explanation and artificial neural networks," Int. J. Man–Mach. Stud., vol. 37, pp. 335–355, 1992.
[3] A. S. Maida, "Selecting a humanly understandable representation for reasoning about knowledge," Int. J. Man–Mach. Stud., vol. 22, pp. 151–161, 1985.
[4] H. Chignell and J. G. Peterson, "Strategic issues in knowledge engineering," Human Factors, vol. 30, no. 4, pp. 381–391, 1988.
[5] T. Kohonen, "An introduction to neural computing," Neural Networks, vol. 1, pp. 3–16, 1988.
[6] D. A. Mitta, "Knowledge acquisition: Human factors issues," in Proc. Human Factors Soc. 33rd Annu. Meet., 1989, pp. 351–355.
[7] B. R. Gaines, "An overview of knowledge acquisition and transfer," Int. J. Man–Mach. Stud., vol. 26, pp. 453–472, 1987.
[8] K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[9] N. Ye and G. Salvendy, "Cognitive engineering based knowledge representation in neural networks," Behav. Inf. Technol., vol. 10, no. 5, pp. 403–418, 1991.
[10] Y.-H. Pao and D. J. Sobajic, "Neural networks and knowledge engineering," IEEE Trans. Knowledge Data Eng., vol. 3, no. 2, pp. 185–192, 1991.
[11] A. M. Madni, "The role of human factors in expert systems design and acceptance," Human Factors, vol. 30, no. 4, pp. 395–414, 1988.
[12] P. E. Lehner and D. A. Zirk, "Cognitive factors in user/expert system interaction," Human Factors, vol. 29, no. 1, pp. 97–109, 1987.
[13] R. M. Young, "The machine inside the machine: Users' models of pocket calculators," Int. J. Man–Mach. Stud., vol. 15, pp. 51–85, 1981.
[14] A. L. Kidd and M. B. Cooper, "Man-machine interface issues in the construction and use of an expert system," Int. J. Man–Mach. Stud., vol. 22, pp. 91–102, 1985.
[15] R. Davis and D. B. Lenat, Knowledge-Based Systems in Artificial Intelligence. New York: McGraw-Hill, 1982.
[16] M. C. Mozer and P. Smolensky, "Using relevance to reduce network size automatically," Connection Sci., vol. 1, no. 1, pp. 3–16, 1989.
[17] A. S. Weigend, B. A. Huberman, and D. E. Rumelhart, Predicting the Future: A Connectionist Approach. Palo Alto, CA: System Sciences Lab., Xerox Palo Alto Res. Center, 1990.
[18] L. Shastri, Semantic Networks: An Evidential Formalization and its Connectionist Realization. San Mateo, CA: Morgan Kaufmann, 1988.
[19] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, 1986, pp. 318–362.
[20] G. G. Towell and J. W. Shavlik, "Extracting refined rules from knowledge-based neural networks," Mach. Learn., vol. 13, pp. 71–101, 1993.
[21] S. I. Gallant, Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press, 1993.
[22] L.-M. Fu, "Rule learning by searching on adapted nets," in Proc. 9th Nat. Conf. Artificial Intelligence. Anaheim, CA: AAAI Press, 1991, pp. 590–595.
[23] K. Saito and R. Nakano, "Medical diagnostic expert system based on PDP model," in Proc. IEEE Int. Conf. Neural Networks, San Diego, CA, 1988, vol. 1, pp. 255–262.
[24] S. Sestito and T. Dillon, "Knowledge acquisition of conjunctive rules using multilayered neural networks," Int. J. Intelligent Syst., vol. 8, pp. 779–805, 1993.
[25] S. Sestito and T. Dillon, "Using single-layered neural networks for the extracting of conjunctive rules and hierarchical classifications," J. Appl. Intell., vol. 1, pp. 157–173, 1991.
[26] Y. Yoon, R. W. Brobst, P. R. Bergstresser, and L. L. Peterson, "A desktop neural network for dermatology diagnosis," J. Neural Network Comput., pp. 43–52, Summer 1989.
[27] J. J. Buckley, Y. Hayashi, and E. Czogala, "On the equivalence of neural nets and fuzzy expert systems," Fuzzy Sets Syst., vol. 53, pp. 129–134, 1993.
[28] M. R. Endsley and R. P. Smith, "Attention distribution and decision making in tactical air combat," unpublished.
[29] M. Sundararajan, "A neural network approach to model decision processes," M.S. thesis, Dept. Indust. Eng., Texas Tech Univ., Lubbock, 1993.

Samuel H. Huang received the B.S. degree in instrument engineering from Zhejiang University, P. R. China, in 1991, and the M.S. and Ph.D. degrees in industrial engineering from Texas Tech University, Lubbock, in 1992 and 1995, respectively.
He is an R&D Engineer with EDS/Unigraphics Computer-Aided Manufacturing, Cypress, CA. His research interests are CAD/CAM, application of artificial intelligence in manufacturing, tolerance analysis, and environmentally conscious manufacturing. His work in these areas has been published in research journals including the International Journal of Production Research, the IEEE TRANSACTIONS ON COMPONENTS, PACKAGING, AND MANUFACTURING TECHNOLOGY, Computers in Industry, and the Journal of Engineering Design and Automation.
Dr. Huang is a member of IIE, SME, and ASME.

Mica R. Endsley received the B.S. degree from Texas Tech University, Lubbock, and the M.S. degree from Purdue University, West Lafayette, IN, both in industrial engineering, and the Ph.D. degree in industrial and systems engineering with a specialization in human factors from the University of Southern California, Los Angeles.
She is currently an Associate Professor of Industrial Engineering at Texas Tech University. She has been working on issues related to situation awareness in high performance aircraft for the past ten years, most recently expanding this research to air traffic control and maintenance for the Federal Aviation Administration. Prior to joining Texas Tech in 1990, she was an Engineering Specialist for the Northrop Corporation, serving as Principal Investigator of a research and development program focused on the areas of situation awareness, mental workload, expert systems, and interface design for the next generation of fighter cockpits. She is the author of over 40 scientific articles and reports on numerous subjects, including the implementation of technological change, the impact of automation, the design of expert system interfaces, new methods for knowledge elicitation for artificial intelligence system development, pilot decision-making, and various aspects of situation awareness.
Dr. Endsley is the recipient of numerous awards for teaching, research, and contributions to system development.
