Chapter XVI
Abstract
Artificial neural networks are increasingly being used to model complex, nonlinear phenomena. The
purpose of this chapter is to review the fundamentals of artificial neural networks and their major
applications in geoinformatics. It begins with a discussion on the basic structure of artificial neural
networks with the focus on the multilayer perceptron networks given their robustness and popularity.
This is followed by a review on the major applications of artificial neural networks in geoinformatics, including pattern recognition and image classification, hydrological modeling, and urban growth
prediction. Finally, several areas are identified for further research in order to improve the success of
artificial neural networks for problem solving in geoinformatics.
INTRODUCTION
An artificial neural network (commonly just
neural network) is an interconnected assemblage
of artificial neurons that uses a mathematical or
computational model of theorized mind and brain
activity, attempting to parallel and simulate the
powerful capabilities for knowledge acquisition.
Copyright 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
[…] neural networks using dedicated hardware. Theoretically, artificial neural networks are highly robust
to variations in data distribution, and can handle
incomplete, noisy and ambiguous data. They are well suited
for modeling complex, nonlinear phenomena,
ranging from financial management and hydrological
modeling to natural hazard prediction. The
purpose of this chapter is to introduce the basic
structure of artificial neural networks, review
their major applications in geoinformatics, and
discuss future and emerging trends.
BACKGROUND
The basic structure of an artificial neural network
involves a network of many interconnected neurons. These neurons are very simple processing
elements that individually handle pieces of a big
problem. A neuron computes an output using an
activation function that considers the weighted
sum of all its inputs. These activation functions
can take many different forms, but the logistic
sigmoid function is quite common:
f(x) = 1 / (1 + e^(-x))        (1)
where f(x) is the output of a neuron and x represents the weighted sum of inputs to a neuron.
As suggested by Equation 1, the principles
of computation at the neuron level are quite
simple, and the power of neural computation
relies upon the use of distributed, adaptive and
nonlinear computing. The distributed computing environment is realized through the massive
interconnected neurons that share the load of the
overall processing task. The adaptive property
is embedded with the network by adjusting the
weights that interconnect the neurons during the
training phase. The use of an activation function
in each neuron introduces the nonlinear behavior
to the network.
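To make the neuron-level computation concrete, here is a minimal sketch (not from the chapter; the function and variable names are illustrative) of a single neuron applying Equation 1 to the weighted sum of its inputs:

```python
import math

def neuron_output(inputs, weights, bias=0.0):
    # Weighted sum of the inputs (plus an optional bias term).
    x = sum(w * v for w, v in zip(weights, inputs)) + bias
    # Logistic sigmoid activation (Equation 1).
    return 1.0 / (1.0 + math.exp(-x))

# With a zero weighted sum, the sigmoid sits at its midpoint.
print(neuron_output([0.0, 0.0], [0.5, -0.3]))  # → 0.5
```

A network is then just many such neurons wired together, with the interconnecting weights adjusted during training.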
There are many different types of neural networks, but most fall into one of five major
types, summarized below.
Type | Example | Brief description
Feed-forward neural network | Multi-layer perceptron | Connections between the neurons do not form cycles; information moves in one direction only, from the inputs through any hidden neurons to the outputs.
Recurrent network | Simple recurrent networks, Hopfield network | They not only deal with nonlinear multivariate behavior, but also include learning of time-dependent behavior.
Stochastic neural networks | Boltzmann machine | They introduce random variations, often viewed as a form of statistical sampling, into the networks.
Modular neural networks | Committee of machines | Several smaller networks cooperate or compete to solve the overall problem.
Other types | Cascading neural networks | They begin their training without any hidden neurons. When the output error reaches a predefined error threshold, the networks add a new hidden neuron.
Other types | Neuro-fuzzy networks | They embed a fuzzy inference system, introducing processes such as fuzzification, inference, aggregation and defuzzification into a neural network.
By far the most popular algorithm for optimizing feed-forward neural networks is error
back-propagation (Rumelhart et al., 1986). This
is a first-order local method. It is based on the
method of steepest descent, in which the descent
direction is equal to the negative of the gradient of
the error. The drawback of this method is that its
search for the optimal weight can become caught
in local minima, thus resulting in suboptimal
solutions. This vulnerability could increase when
the step size taken in weight space becomes too
small. Increasing the step size can help escape local error minima, but when the step size becomes
too large, training can fall into oscillatory traps
(Rojas, 1996). If that happens, the algorithm will
diverge and the error will increase rather than
decrease.
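The step-size trade-off can be seen in a toy sketch (an illustration, not the chapter's method) of steepest descent on the one-dimensional error surface E(w) = w², whose gradient is 2w:

```python
def steepest_descent(w0, step, iters=20):
    # Minimize E(w) = w**2 by repeatedly moving against the gradient dE/dw = 2*w.
    w = w0
    for _ in range(iters):
        w -= step * 2.0 * w  # descent direction = negative of the gradient
    return w

print(abs(steepest_descent(1.0, 0.1)))  # small step: steadily approaches the minimum at 0
print(abs(steepest_descent(1.0, 1.1)))  # oversized step: overshoots, oscillates and diverges
```

Each update multiplies w by (1 - 2*step), so once the step exceeds 1 the factor's magnitude exceeds 1 and the error grows rather than shrinks — the oscillatory trap described above.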
Clearly, it is difficult to find a step size that
balances high learning speed against minimizing the risk of divergence. Recently, several
algorithms have been introduced to help adapt
step sizes during training (e.g., Maier and Dandy,
2000). In practice, however, a trial-and-error
approach has often been used to optimize step
size. Another sensitive issue in back-propagation
training is the choice of initial weights. In the
absence of any a priori knowledge, random values
should be used for initial weights.
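Putting these pieces together — random initial weights, a fixed trial-and-error step size, and steepest-descent updates along the error gradient — a minimal back-propagation loop for a one-hidden-layer perceptron might look as follows (a sketch assuming NumPy; the toy data, layer sizes, and learning rate are arbitrary illustrative choices):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)

# Toy XOR-style data: 4 samples, 2 inputs, 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Random initial weights (no a priori knowledge); one hidden layer of 4 neurons.
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros((1, 1))

step = 0.5  # fixed step size, chosen by trial and error
losses = []
for _ in range(5000):
    # Forward pass through the hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))
    # Backward pass: error gradients via the chain rule.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Steepest-descent updates (negative gradient direction).
    W2 -= step * (h.T @ d_out)
    b2 -= step * d_out.sum(axis=0, keepdims=True)
    W1 -= step * (X.T @ d_h)
    b1 -= step * d_h.sum(axis=0, keepdims=True)

print(losses[-1] < losses[0])  # training reduced the error
```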
The stopping criteria for learning are very important. Training can be stopped
when a specified total number of iterations is completed,
when a targeted error value is reached, or when
training reaches the point of diminishing returns. It should be noted
that a low error level alone is not always a safe criterion
for stopping the training, because of possible overtraining
or overfitting. When this happens, the network
memorizes the training patterns, thus losing the
ability to generalize. A highly recommended
method for stopping the training is cross-validation
(e.g., Amari et al., 1997). In doing so,
an independent data set is required for test purposes,
and close monitoring of the error in both the
training set and the test set is needed. Once the
error in the test set increases, the training should
be stopped.
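The cross-validation stopping rule can be sketched as follows (illustrative only; the per-epoch error values are hypothetical):

```python
def early_stop_epoch(test_errors):
    # Return the epoch at which training should stop: the first epoch
    # whose test-set error exceeds the previous epoch's error.
    for epoch in range(1, len(test_errors)):
        if test_errors[epoch] > test_errors[epoch - 1]:
            return epoch
    return len(test_errors) - 1  # test error never rose: stop at the last epoch

# Hypothetical test-set errors: they fall, then rise as overfitting begins.
errors = [0.52, 0.31, 0.24, 0.21, 0.23, 0.27]
print(early_stop_epoch(errors))  # → 4 (test error first increases at epoch 4)
```

In practice the training-set error would keep falling past epoch 4 while the test-set error rises — the signature of a network that has begun to memorize rather than generalize.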
APPLICATIONS
Artificial neural networks are applicable when a
relationship between the independent variables
and dependent variables exists. They have been
applied for such generic tasks as regression analysis, time series prediction and modeling, pattern
recognition and image classification, and data
processing. The applications of artificial neural
networks in geoinformatics have concentrated
on a few major areas such as pattern recognition
and image classification (Bruzzone et al., 1999),
hydrological modeling (Maier and Dandy, 2000)
and urban growth prediction (Yang, 2009). The
following paragraphs will provide a brief review
on these areas.
Pattern recognition and image classification
are among the most common applications of
artificial neural networks in remote sensing, and
the documented cases overwhelmingly relied upon
the use of multi-layer perceptron networks. The
major advantages of artificial neural networks over
conventional parametric statistical approaches to
image classification, such as the Euclidean, maximum likelihood (ML), and Mahalanobis distance
classifiers, are that they are distribution-free with
less severe statistical assumptions needed and that
they are suitable for data integration from various
sources (Foody, 1995). Artificial neural networks
are found to be accurate in the classification of
remotely sensed data, although improvements in
accuracies have generally been small or modest
(Campbell, 2002).
Artificial neural networks are being used increasingly to predict and forecast water resource
variables such as algae concentration, nitrogen
concentration, runoff, total volume, discharge,
or flow (Maier and Dandy, 2000; Dawson and
Wilby, 2001). Most of the documented cases used
a multi-layer perceptron that was trained by using […]
REFERENCES
Amari, S., Murata, N., Muller, K. R., Finke, M., &
Yang, H. H. (1997). Asymptotic statistical theory
of overtraining and cross-validation. IEEE Transactions on Neural Networks, 8(5), 985-996.
Bishop, C. (1995). Neural Networks for Pattern
Recognition (p. 504). Oxford: Oxford University Press.
Bruzzone, L., Prieto, D. F., & Serpico, S. B. (1999).
A neural-statistical approach to multitemporal and
multisource remote-sensing image classification.
IEEE Transactions on Geoscience and Remote
Sensing, 37(3), 1350-1359.
KEY TERMS
Architecture: The structure of a neural
network including the number and connectivity
of neurons. A network generally consists of an
input layer, one or more hidden layers, and an
output layer.