
Chapter XVI

Artificial Neural Networks


Xiaojun Yang
Florida State University, USA

Abstract
Artificial neural networks are increasingly being used to model complex, nonlinear phenomena. The purpose of this chapter is to review the fundamentals of artificial neural networks and their major applications in geoinformatics. It begins with a discussion of the basic structure of artificial neural networks, with a focus on multilayer perceptron networks given their robustness and popularity. This is followed by a review of the major applications of artificial neural networks in geoinformatics, including pattern recognition and image classification, hydrological modeling, and urban growth prediction. Finally, several areas are identified for further research in order to improve the success of artificial neural networks for problem solving in geoinformatics.

INTRODUCTION
An artificial neural network (commonly just neural network) is an interconnected assemblage of artificial neurons that uses a mathematical or computational model of theorized mind and brain activity, attempting to parallel and simulate the powerful capabilities for knowledge acquisition, recall, synthesis, and problem solving. It originated from the concept of the artificial neuron introduced by McCulloch and Pitts in 1943. Over the past six decades, artificial neural networks have evolved from the preliminary development of the artificial neuron, through the rediscovery and popularization of the back-propagation training algorithm, to the implementation of artificial neural networks using dedicated hardware. Theoretically, artificial neural networks are highly robust to data distribution and can handle incomplete, noisy, and ambiguous data. They are well suited for modeling complex, nonlinear phenomena in fields ranging from financial management and hydrological modeling to natural hazard prediction. The purpose of this chapter is to introduce the basic structure of artificial neural networks, review their major applications in geoinformatics, and discuss future and emerging trends.

BACKGROUND
The basic structure of an artificial neural network
involves a network of many interconnected neurons. These neurons are very simple processing
elements that individually handle pieces of a big
problem. A neuron computes an output using an
activation function that considers the weighted
sum of all its inputs. Activation functions can take many different forms, but the logistic sigmoid function is quite common:

f(x) = \frac{1}{1 + e^{-x}} \qquad (1)
where f(x) is the output of a neuron and x represents the weighted sum of inputs to a neuron.
As suggested by Equation 1, the principles of computation at the neuron level are quite simple, and the power of neural computation relies upon the use of distributed, adaptive, and nonlinear computing. The distributed computing environment is realized through the massively interconnected neurons that share the load of the overall processing task. The adaptive property is embedded in the network by adjusting the weights that interconnect the neurons during the training phase. The use of an activation function in each neuron introduces the nonlinear behavior of the network.
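To make the neuron-level computation concrete, the following short Python sketch (an illustration added here rather than part of the original chapter; the helper names are hypothetical) evaluates Equation 1 over the weighted sum of a neuron's inputs:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid activation function (Equation 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    """A neuron's output: the activation of the weighted sum of its inputs."""
    x = np.dot(weights, inputs) + bias
    return sigmoid(x)

# Example: a single neuron with three inputs
print(neuron_output(np.array([0.5, -1.0, 2.0]), np.array([0.8, 0.2, -0.4])))
```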
There are many different types of neural networks, but most fall into one of the five major paradigms listed in Table 1. Each paradigm has advantages and disadvantages depending upon the specific application. A detailed discussion of these paradigms can be found elsewhere (e.g., Bishop, 1995; Rojas, 1996; Haykin, 1999; Principe et al., 2000). This chapter will concentrate upon multilayer perceptron networks due to their technological robustness and popularity (Bishop, 1995).
Figure 1 illustrates a simple multilayer perceptron neural network with a 4-5-4-1 structure. This is a typical feed-forward network, in which connections between neurons flow in one direction. Information flow starts from the neurons in the input layer and then moves along weighted links to neurons in the hidden layers for processing. The weights are normally determined through training. Each neuron contains a nonlinear activation function that combines information from all neurons in the preceding layer. The output layer is a complex function of the inputs and internal network transformations.
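A forward pass through such a 4-5-4-1 network can be sketched in a few lines of Python (illustrative only; the weights here are randomly initialized, whereas in practice they would be determined through training):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

sizes = [4, 5, 4, 1]  # layer sizes of the network in Figure 1

# One weight matrix and bias vector per layer transition
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

def forward(x):
    """Propagate an input vector through the network, layer by layer."""
    for W, b in zip(weights, biases):
        x = sigmoid(W @ x + b)
    return x

print(forward(np.array([0.2, 0.7, -0.1, 0.4])))  # a single output value
```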
The topology of a neural network is critical for neural computing to solve problems with reasonable training time and performance. For any neural computing task, training time is usually the biggest bottleneck, and thus every effort is needed to make training effective and affordable. Training time is a function of the complexity of the network topology, which is ultimately determined by the combination of hidden layers and neurons. A trade-off is needed between the processing power that hidden layers provide and the training time they require. A network without a hidden layer is only able to solve linear problems. To tackle a nonlinear problem, a reasonable number of hidden layers is needed. A network with one hidden layer has the power to approximate any function, provided that the number of neurons and the training time are not constrained (Hornik, 1993). In practice, however, many functions are difficult to approximate with one hidden layer, and thus Flood and Kartam (1994) suggested using two hidden layers as a starting point.


Table 1. Classification of artificial neural networks (Source: Haykin, 1999)

1. Feed-forward neural networks
   - Multi-layer perceptron: consists of multiple layers of processing units that are usually interconnected in a feed-forward way.
   - Radial basis functions: powerful interpolation techniques that replace the sigmoidal hidden-layer transfer function in multi-layer perceptrons.
   - Kohonen self-organizing networks: use a form of unsupervised learning to map points in an input space to coordinates in an output space.
2. Recurrent networks (e.g., simple recurrent networks, Hopfield network)
   - Contrary to feed-forward networks, recurrent neural networks use bi-directional data flow and propagate data from later processing stages back to earlier stages.
3. Stochastic neural networks (e.g., Boltzmann machine)
   - Introduce random variations, often viewed as a form of statistical sampling, into the network.
4. Modular neural networks (e.g., committee of machines)
   - Use several small networks that cooperate or compete to solve problems.
5. Other types
   - Dynamic neural networks: deal not only with nonlinear multivariate behavior but also include learning of time-dependent behavior.
   - Cascading neural networks: begin training without any hidden neurons; when the output error reaches a predefined threshold, the network adds a new hidden neuron.
   - Neuro-fuzzy networks: embed a fuzzy inference system that introduces processes such as fuzzification, inference, aggregation, and defuzzification into a neural network.

Figure 1. A simple multilayer perceptron (MLP) neural network with a 4-5-4-1 structure


The number of neurons in the input and output layers can be defined according to the research problem identified in an actual application. The critical choice concerns the number of neurons in the hidden layers, and hence the number of connection weights. If there are too few neurons in the hidden layers, the network may be unable to approximate very complex functions because of insufficient degrees of freedom. On the other hand, if there are too many neurons, the network tends to have a large number of degrees of freedom, which may lead to overtraining and hence poor generalization performance (Rojas, 1996). Thus, it is crucial to find the optimum number of hidden neurons that adequately captures the relationship in the training data. This optimization can be achieved by trial and error or by systematic approaches such as pruning and constructive algorithms (Reed, 1993; Kwok and Yeung, 1997).
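As a hypothetical illustration of the trial-and-error approach (a sketch assuming scikit-learn is available, with synthetic data standing in for a real geoinformatics data set), one can train networks of increasing hidden-layer size and retain the one that performs best on held-out data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(500, 4))     # synthetic predictors
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]   # synthetic response

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

best_size, best_score = None, -np.inf
for n_hidden in (2, 4, 8, 16, 32):
    model = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                         random_state=0).fit(X_train, y_train)
    score = model.score(X_val, y_val)     # r2 on held-out data
    if score > best_score:
        best_size, best_score = n_hidden, score

print(f"best hidden-layer size: {best_size} (validation r2 = {best_score:.3f})")
```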
Training is a learning process by which the connection weights are adjusted until the network is optimal. This involves the use of training samples, an error measure, and a learning algorithm. Training samples with paired inputs and outputs are presented to the network over many iterations. They should not only be large in size but also be representative of the entire data set to ensure sufficient generalization ability. There are several different error measures, such as the mean squared error (MSE), the mean squared relative error (MSRE), the coefficient of efficiency (CE), and the coefficient of determination (r2) (Dawson and Wilby, 2001); the MSE is the most commonly used. The overall goal of training is to minimize the error through either a local or a global learning algorithm. Local methods adjust the weights of the network by using localized input signals and localized first- or second-order derivatives of the error function. They are computationally efficient for changing the weights in a feed-forward network but are susceptible to local minima in the error surface. Global methods are able to escape local minima in the error surface and thus can find optimal weight configurations (Maier and Dandy, 2000).
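For concreteness, these error measures can be written as short Python functions; the formulations below follow those commonly used in the hydrological modeling literature (e.g., Dawson and Wilby, 2001), though exact definitions vary across studies:

```python
import numpy as np

def mse(obs, pred):
    """Mean squared error."""
    return np.mean((obs - pred) ** 2)

def msre(obs, pred):
    """Mean squared relative error (assumes no zero-valued observations)."""
    return np.mean(((obs - pred) / obs) ** 2)

def ce(obs, pred):
    """Coefficient of efficiency (Nash-Sutcliffe); 1 indicates a perfect fit."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(obs, pred):
    """Coefficient of determination: the squared Pearson correlation."""
    return np.corrcoef(obs, pred)[0, 1] ** 2
```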

By far the most popular algorithm for optimizing feed-forward neural networks is error back-propagation (Rumelhart et al., 1986). This is a first-order local method based on steepest descent, in which the descent direction is the negative of the gradient of the error. The drawback of this method is that its search for the optimal weights can become caught in local minima, resulting in suboptimal solutions. This vulnerability increases when the step size taken in weight space is too small. Increasing the step size can help escape local error minima, but when the step size becomes too large, training can fall into oscillatory traps (Rojas, 1996); the algorithm then diverges and the error increases rather than decreases.
Evidently, it is difficult to find a step size that balances high learning speed against the risk of divergence. Several algorithms have been introduced to adapt step sizes during training (e.g., Maier and Dandy, 2000); in practice, however, a trial-and-error approach has often been used to optimize the step size. Another sensitive issue in back-propagation training is the choice of initial weights. In the absence of any a priori knowledge, random values should be used for the initial weights.
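The steepest-descent update at the heart of back-propagation can be illustrated with a single sigmoid neuron (a minimal sketch rather than the chapter's method; the learning rate lr plays the role of the step size discussed above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_neuron(X, y, lr=0.5, epochs=1000):
    """Gradient descent on the MSE for one sigmoid neuron: w <- w - lr * dE/dw."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[1])   # random initial weights
    b = 0.0
    for _ in range(epochs):
        out = sigmoid(X @ w + b)
        # Error gradient through the sigmoid (up to a constant factor)
        delta = (out - y) * out * (1.0 - out)
        w -= lr * X.T @ delta / len(y)    # step along the negative gradient
        b -= lr * delta.mean()
    return w, b

# Learn the logical OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w, b = train_neuron(X, y)
print(np.round(sigmoid(X @ w + b), 2))
```

In this sketch, setting lr too large produces the oscillatory behavior described above, while setting it too small slows convergence and increases the risk of settling in a poor local minimum.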
The stopping criteria for learning are very important. Training can be stopped when a specified number of iterations or a target error value is reached, or when training reaches the point of diminishing returns. It should be noted that stopping at a low error level is not always safe because of possible overtraining or overfitting: when this happens, the network memorizes the training patterns and loses the ability to generalize. A highly recommended method for stopping the training is cross-validation (e.g., Amari et al., 1997). In this approach, an independent data set is required for testing, and the error in the training set and the test set is closely monitored; once the error in the test set begins to increase, training should be stopped, since the point of best generalization has been reached.
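A cross-validation-based stopping rule can be sketched as follows (illustrative Python assuming scikit-learn; the synthetic data and the patience threshold of five epochs are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(8,), solver="sgd",
                     learning_rate_init=0.05, random_state=0)

best_err, patience, strikes = np.inf, 5, 0
for epoch in range(500):
    model.partial_fit(X_train, y_train)        # one training pass
    test_err = np.mean((model.predict(X_test) - y_test) ** 2)
    if test_err < best_err:
        best_err, strikes = test_err, 0        # still improving
    else:
        strikes += 1                           # test error rose
        if strikes >= patience:                # best generalization passed
            print(f"stopping at epoch {epoch}")
            break

print(f"best test-set MSE: {best_err:.4f}")
```

Many libraries provide this behavior directly (e.g., the early_stopping option of scikit-learn's multilayer perceptron estimators); the explicit loop above simply makes the monitoring of the test-set error visible.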

APPLICATIONS
Artificial neural networks are applicable when a
relationship between the independent variables
and dependent variables exists. They have been
applied for such generic tasks as regression analysis, time series prediction and modeling, pattern
recognition and image classification, and data
processing. The applications of artificial neural
networks in geoinformatics have concentrated
on a few major areas such as pattern recognition
and image classification (Bruzzone et al., 1999),
hydrological modeling (Maier and Dandy, 2000)
and urban growth prediction (Yang, 2009). The
following paragraphs provide a brief review of these areas.
Pattern recognition and image classification
are among the most common applications of
artificial neural networks in remote sensing, and
the documented cases overwhelmingly relied upon
the use of multi-layer perceptron networks. The
major advantages of artificial neural networks over
conventional parametric statistical approaches to
image classification, such as the Euclidean, maximum likelihood (ML), and Mahalanobis distance
classifiers, are that they are distribution-free, requiring less severe statistical assumptions, and that they are well suited for integrating data from various
sources (Foody, 1995). Artificial neural networks
are found to be accurate in the classification of
remotely sensed data, although improvements in
accuracies have generally been small or modest
(Campbell, 2002).
Artificial neural networks are being used increasingly to predict and forecast water resource
variables such as algae concentration, nitrogen
concentration, runoff, total volume, discharge,
or flow (Maier and Dandy, 2000; Dawson and
Wilby, 2001). Most of the documented cases used
a multi-layer perceptron that was trained using the back-propagation algorithm. Based on the results obtained so far, there is little doubt that
artificial neural networks have the potential to be
a useful tool for the prediction and forecasting of
water resource variables.
The application of artificial neural networks
for urban predictive modeling is a new but rapidly
expanding area of research (Yang, 2009). Neural
networks have been used to compute development probability by integrating a set of predictive
variables as the core of a land transformation
model (e.g., Pijanowski et al., 2002) or a cellular
automata-based model (e.g., Yeh and Li, 2003). All
the applications documented so far involved the
use of a multilayer perceptron network, a grid-based modeling framework, and a geographic information system (GIS) that was loosely or tightly integrated with the network for input data preparation, model validation, and analysis.

CONCLUSION AND FUTURE TRENDS
Based on many documented applications within
recent years, the prospect of artificial neural
networks in geoinformatics seems to be quite
promising. On the other hand, the capability of neural networks tends to be oversold, as if they were an all-inclusive black box capable of formulating an optimal solution to any problem regardless of network architecture, system conceptualization, or data quality. Thus, this field has been characterized by inconsistent research design and poor modeling practice. Several researchers have recently emphasized the need to adopt a systematic approach to neural network model development that considers problem conceptualization, data preprocessing, network architecture design, training methods, and model validation in a sequential mode (e.g., Maier and Dandy, 2000; Dawson and Wilby, 2001; Yang, 2009).
There are a few areas where further research is needed. Firstly, there are many arbitrary decisions involved in the construction of a neural network model, and therefore there is a need to develop
guidance that helps identify the circumstances
under which particular approaches should be
adopted and how to optimize the parameters that
control them. For this purpose, more empirical,
inter-model comparisons and rigorous assessment
of neural network performance with different
inputs, architectures, and internal parameters are
needed. Secondly, data preprocessing is an area
where little guidance can be found. There are
many theoretical assumptions that have not been
confirmed by empirical trials. It is not clear how
different preprocessing methods could affect the
model outcome. Future investigation is needed to
explore the impact of data quality and of different methods of data division, data standardization, and data reduction. Thirdly, continuing research is
needed to develop effective strategies and probing tools for mining the knowledge contained in
the connection weights of trained neural network
models for prediction purposes. This can help
uncover the black-box construction of the neural
network, thus facilitating the understanding of
the physical meanings of spatial factors and their
contribution to geoinformatics. This should help
improve the success of neural network applications for problem solving in geoinformatics.

REFERENCES
Amari, S., Murata, N., Muller, K. R., Finke, M., &
Yang, H. H. (1997). Asymptotic statistical theory
of overtraining and cross-validation. IEEE Transactions on Neural Networks, 8(5), 985-996.
Bishop, C. (1995). Neural Networks for Pattern Recognition (p. 504). Oxford: Oxford University Press.
Bruzzone, L., Prieto, D. F., & Serpico, S. B. (1999).
A neural-statistical approach to multitemporal and
multisource remote-sensing image classification.
IEEE Transactions on Geoscience and Remote
Sensing, 37(3), 1350-1359.

Campbell, J. B. (2002). Introduction to Remote Sensing (3rd ed.) (p. 620). New York: The Guilford Press.
Dawson, C. W., & Wilby, R. L. (2001). Hydrological modelling using artificial neural networks.
Progress in Physical Geography, 25(1), 80-108.
Flood, I., & Kartam, N. (1994). Neural networks in civil engineering. II: Systems and application. Journal of Computing in Civil Engineering, 8(2), 149-162.
Foody, G. M. (1995). Land cover classification
using an artificial neural network with ancillary
information. International Journal of Geographical Information Systems, 9, 527- 542.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (p. 842). Prentice Hall.
Hornik, K. (1993). Some new results on neural-network approximation. Neural Networks, 6(8), 1069-1072.
Kwok, T. Y., & Yeung, D. Y. (1997). Constructive algorithms for structure learning in feed-forward neural networks for regression problems. IEEE Transactions on Neural Networks, 8(3), 630-645.
Maier, H. R., & Dandy, G. C. (2000). Neural
networks for the prediction and forecasting of
water resources variables: A review of modeling
issues and applications. Environmental Modelling
& Software, 15, 101-124.
Pijanowski, B. C., Brown, D., Shellito, B., & Manik, G. (2002). Using neural networks and GIS to forecast land use changes: A land transformation model. Computers, Environment and Urban Systems, 26, 553-575.
Principe, J. C., Euliano, N. R., & Lefebvre, W.
C. (2000). Neural and Adaptive Systems: Fundamentals Through Simulations (p. 565). New
York: John Wiley & Sons.


Reed, R. (1993). Pruning algorithms: A survey. IEEE Transactions on Neural Networks, 4(5), 740-747.
Rojas, R. (1996). Neural Networks: A Systematic
Introduction (p. 502). Springer-Verlag, Berlin.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel Distributed Processing. Cambridge: MIT Press.
Yang, X. (2009). Artificial neural networks for urban modeling. In M. Madden (Ed.), Manual of Geographic Information Systems. American Society for Photogrammetry and Remote Sensing (in press).
Yeh, A. G. O., & Li, X. (2003). Simulation of
development alternatives using neural networks,
cellular automata, and GIS for urban planning.
Photogrammetric Engineering and Remote Sensing, 69(9), 1043-1052.

KEY TERMS
Architecture: The structure of a neural
network including the number and connectivity
of neurons. A network generally consists of an
input layer, one or more hidden layers, and an
output layer.


Back-Propagation: A training algorithm for feed-forward, multi-layer perceptron networks that works by propagating errors back through the network and adjusting the weights in the direction opposite to the largest local gradient.
Error Space: The n-dimensional surface in
which the weights in a network are adjusted by the
back-propagation algorithm to minimize model
error.
Feed-Forward: A network in which all the
connections between neurons flow in one direction from an input layer, through hidden layers,
to an output layer.
Multilayer Perceptron: The most popular type of network, consisting of multiple layers of processing units interconnected in a feed-forward way.
Neuron: The basic building block of a neural
network. A neuron sums the weighted inputs,
processes them using an activation function, and
produces an output response.
Pruning Algorithm: A training algorithm
that optimizes the number of hidden layer
neurons by removing or disabling unnecessary
weights or neurons from a large network that is
initially constructed to capture the input-output
relationship.
Training/Learning: The process by which
the connection weights are adjusted until the
network is optimal.
