
ARTIFICIAL NEURAL NETWORKS AS RAINFALL-RUNOFF MODELS

A. W. MINNS & M. J. HALL

International Institute for Infrastructural, Hydraulic and Environmental

Engineering (IHE), PO Box 3015, 2601 DA Delft, The Netherlands

Abstract A series of numerical experiments, in which flow data were

generated from synthetic storm sequences routed through a conceptual

hydrological model consisting of a single nonlinear reservoir, has

demonstrated the closeness of fit that can be achieved to such data sets

using Artificial Neural Networks (ANNs). The application of different

standardization factors to both training and verification sequences has

underlined the importance of such factors to network performance. Trials

with both one and two hidden layers in the ANN have shown that,

although improved performances are achieved with the extra hidden

layer, the additional computational effort does not appear justified for

data sets exhibiting the degree of nonlinear behaviour typical of rainfall

and flow sequences from many catchment areas.


INTRODUCTION

Some thirty years ago, Amorocho & Hart (1964) commented upon the growth

of two distinct approaches to the problem of establishing the relationship

between rainfall and streamflow which they referred to as physical hydrology

and systems investigation. The former term was used to describe investigations

into the behaviour of and interdependence between hydrological processes, the

long term objective being a complete synthesis of the hydrological cycle. The

progress achieved with this approach during the last three decades has

Open for discussion until 1 December 1996


materially assisted with the development of hydrological models that are both

physically-based and spatially-distributed, such as the Système Hydrologique

Européen (Abbott et al., 1986). Nevertheless, there remains a high degree of

empiricism in the representation of certain hydrological processes such that the

ideal of determining parameter values by direct measurement rather than calibration remains some distance away.

In contrast, systems investigation, which Amorocho & Hart (1964)

regarded as being concerned with the direct solution of technological problems

subject only to the constraints imposed by the available data and so not subject

to 'physical' considerations, has recently undergone something of a renaissance,

largely through the adoption of artificial intelligence techniques such as

Artificial Neural Networks (ANNs) and Genetic Algorithms (e.g. Babovic &

Minns, 1994). The particular advantage of the ANN is that, even if the 'exact'

relationship between sets of input and output data is unknown but is acknowledged to exist, the network can be 'trained' to 'learn' that relationship,

requiring no a priori knowledge of the catchment characteristics.

In the hydrological context, the input pattern consists of rainfall depths

and the output the discharges at the catchment outlet. Since the contributions

from different parts of the catchment arrive at the outlet at different times, the

variations in the discharge output may be considered to be determined by the

rainfall depths at both the concurrent and previous time intervals. Preliminary

work (Hall & Minns, 1993) has indicated that the number of antecedent rainfall

ordinates required is broadly related to the lag time of the drainage area. Since

the ANN relates the pattern of inputs to the pattern of outputs, volume continuity is not a constraint. However, care must be taken to avoid the presentation to the ANN of contradictory information. More specifically, the input

pattern may contain many zeros both at the start of the rising limb of the output

hydrograph and during the recession when rainfall has ended and flows are

decreasing. These two situations could be distinguished by providing an extra

input consisting of a binary variable (say, zero for pre-storm and unity for

post-storm conditions), but previous work (Hall & Minns, 1993) has indicated

that antecedent flow ordinates both perform the same function and provide

additional information about the input pattern, i.e. the longer the input rainfalls

remain zero, the more the output decreases. The use of an output variable in

the input is encountered in other applications of ANNs (Hertz et al., 1991) and

is referred to as recurrent back-propagation. The inclusion of the flow at time

t - 1 as an input to determine the flow at time t may appear to introduce an

element of flood routing into the model, but that is not the purpose of the

ANN. Unlike the conventional rainfall-runoff model, the network seeks to learn

patterns and not to replicate in detail the physical processes involved in

transforming input into output. The learning process does not depend upon any

assumptions relating to the form of the input-output transfer function, the

number of (active) parameters or their possible physical interaction. In the

terms of the discussion by Amorocho & Hart (1964), the ANN could perhaps

be regarded as the ultimate black-box model.

An important further consideration is the applicability of ANNs to more complex

'real-world' catchments. However, although the standard solution algorithm for

ANNs will achieve convergence for almost any problem (Rumelhart et al.,

1994), it would appear that the most simple ANN architectures have more

difficulty in learning more nonlinear relationships. This paper therefore

describes a series of numerical experiments that were undertaken with the

specific purpose of evaluating the performance of ANNs on rainfall and runoff

data from theoretical catchments exhibiting a range of behaviour patterns

varying from the linear to the highly (in hydrological terms) nonlinear. Owing

to the virtual impossibility of collecting hydrometric data from catchments that

could be classified a priori as either linear or nonlinear, but were otherwise

identical in catchment characteristics and input rainfall patterns, a well-established conceptual hydrological modelling package, RORB (Mein et al.,

1974), was employed to generate streamflow responses from a synthetic time

series of storm events for representative (linear and nonlinear) catchments. In

this manner, the ANN could be tested solely on its performance in learning the

(linear or nonlinear) relationship between rainfall and runoff, all other factors

being regarded as equal. These numerical experiments are, of course, only the

first step towards testing the generality of ANNs for use on more complex,

real-world catchments, since all the problems of spatial distribution of rainfall

and seasonal changes in catchment response are avoided. The latter effects are

currently the subject of on-going investigation.

ARTIFICIAL NEURAL NETWORKS

The ability of the brain to perform difficult operations and to recognize

complex patterns, even if those patterns are distorted with a high degree of

noise, has fascinated scientists for centuries. The particular ability of the brain

to learn from experience without a predefined knowledge of the underlying

physical relationships makes it an exceptionally flexible and powerful calculating device that scientists would also like to mimic.

Yet other scientists are devoted to reproducing, or modelling, physical

phenomena by making use of electronic computational machines to solve ever more complex partial differential equations and empirical relationships.

These scientists are supported by a rapid increase in the computational capacity

of modern computers and an emerging recognition of the advantages of

massively parallel computation (parallel distributed processing) that performs

the required calculations with ever-increasing speed. However, although the

design and construction of the hardware for parallel computation is relatively

straightforward, the software required for creating algorithms to utilize this

parallel architecture most efficiently is still quite limited.

These two groups of scientists, pursuing what appear to be quite different

goals, have found a common ground in the field of artificial neural networks.


One of the major applications of ANNs is in pattern recognition and classification or, more generally, system identification. In brief, an ANN consists of

layers of processing units (representing biological neurons - see Hopfield,

1994) where each processing unit in each layer is connected to all processing

units in the adjacent layers (representing biological synapses and dendrites).

Many publications describe in much greater detail the architecture of various

types of ANNs (for example, Beale & Jackson, 1990; Aleksander & Morton,

1990; Hertz et al., 1991). The selection of an appropriate architecture for an

ANN will depend upon the problem to be solved and the type of learning

algorithm to be applied. In particular, the use of Kohonen networks for

unsupervised classification of patterns and the use of Hopfield networks for

recalling previously learned patterns are two approaches commonly used in

pattern recognition. For the more general approach to systems identification,

one wishes to train an ANN to provide a correct output response to a given

input stimulus. In particular, for rainfall-runoff modelling, the input stimulus

corresponds to the measured rainfall and the output response to the measured

runoff from a catchment. A multi-layer, feed-forward, perceptron-type ANN

is one of the most suitable types of ANN for learning the stimulus-response

relationship for a given set of measured data. Figure 1 shows a general schematization of a 3-layer, feed-forward ANN of the type that was used in this study.

The working of an ANN can best be described by following the operations involved during training and computation. An input signal, consisting of

an array of numbers xi is introduced to the input layer of processing units or

nodes, as shown in Fig. 1. The signals are carried along connections to each

of the nodes in the adjacent layer and can be amplified or inhibited through

Fig. 1 Schematization of a 3-layer, feed-forward artificial neural network (ANN): an input layer, a hidden layer of internal representation units, and an output layer.


weights, wij, associated with each connection. The nodes in the adjacent layer

act as summation devices for the incoming (weighted) signals (Fig. 2). The

incoming signal is transformed into an output signal, Oj, within the processing

units by passing it through a threshold function. A common threshold function

for the ANN depicted in Fig. 1 is the sigmoid function defined as:

f(x) = 1 / (1 + e^(-x))    (1)

which provides an output in the range 0 < f(x) < 1. In most thresholding

routines, the threshold function usually takes the form of a single-valued, hard delimiter. The sigmoidal threshold function is chosen for mathematical convenience because it resembles a hard-limiting step-function for extremely large

positive and negative values of the incoming signal and also gives useful

information about the response of the processing unit to inputs that are close

to the threshold value. Furthermore, the sigmoid function has a very simple

derivative that makes the subsequent implementation of the learning algorithm

much easier.
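As an illustration (ours, not part of the original paper), equation (1) and its conveniently simple derivative can be written out directly; the function names are assumptions:

```python
import math

def sigmoid(x):
    # Equation (1): f(x) = 1 / (1 + e^(-x)); output confined to (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # The "very simple derivative": f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```

The derivative peaks at 0.25 for x = 0 and vanishes for large positive or negative inputs, which is why inputs close to the threshold value are the most informative during training.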

Fig. 2 Summation and thresholding within a single processing unit: the weighted input pattern is summed and passed through the threshold function to produce the output pattern.

Oj = f( Σi wij xi )    (2)

This output signal is subsequently carried along the weighted connections to the

following layer of nodes and the process is repeated until the signal reaches the

output layer. The one or more layers of processing units located between the

input and output layers have no direct connections to the outside world and are

referred to as hidden layers. The output signal can then be interpreted as the

response of the ANN to the given input stimulus.
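The layer-by-layer forward computation described above can be sketched as follows (a minimal illustration under our own naming, not the simulator used in the study):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_output(inputs, weights):
    # Each node sums its weighted incoming signals and passes the
    # total through the sigmoid threshold, as in equation (2);
    # weights[j][i] connects input i to node j.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def feed_forward(inputs, layers):
    # Propagate the signal through successive weight matrices
    # (hidden layers, then the output layer).
    signal = inputs
    for weights in layers:
        signal = layer_output(signal, weights)
    return signal
```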

The ANN can be trained to produce known or desired output responses

for given input stimuli. The ANN is first initialized by assigning random

numbers to the interconnection weights. An input signal is then introduced to

the input layer and the resulting output signal is compared to the desired output


signal. The interconnection weights are then adjusted to minimize the error

between the ANN output and the desired output. This process is repeated many

times with many different input/output tuples until a sufficient accuracy for all

data sets has been obtained. The adjustment of the interconnection weights

during training employs a method known as error back-propagation in which

the weight associated with each connection is adjusted by an amount proportional to the strength of the signal in the connection and the total measure of

the error (see Rumelhart et al., 1986). The total error at the output layer is

then reduced by redistributing this error value backwards through the hidden

layers until the input layer is reached. The next input/output tuple is then

applied and the connection weights readjusted to minimize this new error. In

this way, the back-propagation algorithm can be seen to be a form of gradient

descent for finding the minimum value of the multi-dimensional error function.

This procedure is repeated until all training data sets have been applied. The

whole process is then repeated starting from the first data set once more and

continued until the total error for all data sets is sufficiently small and

subsequent adjustments to the weights are inconsequential. The ANN is now

said to have learned a relationship between the input and output training data

sets. The exact form of this relationship cannot be extracted from the ANN but

rather is encapsulated in the stored series of weights and connections between

nodes. The absolute values of the individual weights cannot be interpreted to

have any deeper physical meaning (Minns, 1995).
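A bare-bones version of this training procedure, for a 3-layer network with a single output node, might look as follows. This is our own sketch (with bias terms added to every node, an assumption for robustness), not the software package used in the study:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_in, n_hidden, epochs=5000, lr=0.5, seed=0):
    # Error back-propagation for a 3-layer ANN with one output node.
    # `data` is a list of (inputs, target) tuples; each weight row
    # carries a trailing bias weight.
    rng = random.Random(seed)
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in data:
            xb = list(x) + [1.0]                      # forward pass
            h = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                 for row in w_hid]
            hb = h + [1.0]
            o = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
            # output delta: error scaled by the sigmoid derivative
            d_out = (t - o) * o * (1.0 - o)
            # redistribute the error backwards through the hidden layer
            d_hid = [d_out * w_out[j] * h[j] * (1.0 - h[j])
                     for j in range(n_hidden)]
            # gradient-descent weight adjustments
            for j in range(n_hidden + 1):
                w_out[j] += lr * d_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hid[j][i] += lr * d_hid[j] * xb[i]
    return w_hid, w_out

def predict(x, w_hid, w_out):
    xb = list(x) + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb)))
         for row in w_hid] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))
```

Each pass over `data` corresponds to one presentation cycle; the weight updates are proportional to the signal strength in each connection and the back-propagated error, as described above.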

Although the error back-propagation method does not guarantee convergence to an optimal solution since local minima may exist, it appears in

practice that the back-propagation method leads to solutions in almost every

case (Rumelhart et al., 1994). In fact, Hornik et al. (1989) concluded that

standard multi-layer, feed-forward networks are capable of approximating any

measurable function to any desired degree of accuracy. They further state that

errors in representation appear to arise only from having insufficient hidden

units or the relationships themselves being insufficiently deterministic. For this

reason, a standard, multi-layer, feed-forward ANN using standard back-propagation learning techniques was used in this study.

THE ANN

Since the purpose of the numerical experiments reported below was only to

evaluate the ability of an ANN to 'learn' the relationship between the pattern

of inputs provided by a sequence of rainfalls and the outputs in the form of the

pattern of flows generated by a hypothetical (but realistic) catchment from those

rainfalls, the precise form of the model used to generate the runoffs from the

rainfalls was of little importance. The ANN is not being applied to identify this

model, and the principal requirement is only that it should produce responses

typical of those encountered in hydrological work.


Rainfall data

For the purposes of the numerical experiments, six sequences of storm events

of varying duration, total depth and profile, occurring at irregular intervals,

were required that could be routed through simple conceptual hydrological

models with different degrees of nonlinearity in order to produce the

corresponding streamflow outputs. For simplicity, these rainfalls were treated

as areal averages. Since several storm sequences were required, they were

produced using Monte Carlo methods based on the following assumptions:

1. ... standard deviation of 6 h;

2. storm rainfall depths were lognormally-distributed, with a mean of 25 mm and a standard deviation of 2 mm (these statistics imply that the distribution of depths had a coefficient of variation of 0.785 and a skewness coefficient of 2.84);

3. the shapes of the six storm profiles could be described by simple polynomial functions, broadly based on those of the UK Flood Studies Report (Natural Environment Research Council, 1975), and including early-peaked and late-peaked as well as symmetrical events (a constant intensity profile was also included as an extreme case); and

4. the inter-event times were taken as double the previous storm duration minus one hour.
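The quoted storm-depth statistics are internally consistent if the 25 mm is read as the median and the figure of 2 as a geometric standard deviation, so that ln(depth) has standard deviation ln 2: that reading reproduces exactly the quoted coefficient of variation of 0.785 and skewness of 2.84. Under that assumption (ours, not stated explicitly above), a depth sampler might look like:

```python
import math
import random

def storm_depths(n, median_mm=25.0, geometric_sd=2.0, seed=1):
    # Lognormal storm depths: ln(depth) is normal with mean ln(25)
    # and standard deviation ln(2). This parameter reading is an
    # assumption; the source text is ambiguous on the point.
    rng = random.Random(seed)
    sigma = math.log(geometric_sd)
    return [median_mm * math.exp(rng.gauss(0.0, sigma))
            for _ in range(n)]
```

The arithmetic mean of such a distribution is 25·exp((ln 2)²/2) ≈ 31.8 mm, which agrees closely with the 31.6 mm average depth reported for the training sequence.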

Initially, three sequences of 14 storm events were generated, the profile shapes

being selected by sampling from a distribution uniform over the range zero to

six. The first was a training sequence with a total duration of 764 h. Five of

the six profiles were represented, with durations having an average of 19.2 h

and a standard deviation of 6.95 h. The average depth was 31.6 mm, with a

standard deviation of 1.9 mm. The other two sequences were employed for

verification purposes. The first of these verification data sets was generated so

that the maximum values all fell within the range defined by the training

sequence. However, if an ANN were to be applied to a real catchment, even

if the training data included all the available measurements, there is always a

small but non-negligible probability that an extreme event beyond the range of

recorded experience may occur in the future. In order to evaluate the performance of an ANN under these circumstances, the second verification

sequence was generated that contained rainfall maxima outside the range of

those upon which the training data was based.

The two verification sequences had a total duration of 794 h, and all

profiles were represented. In the first data set, individual events had an average

duration of 19.8 h and a standard deviation of 4.9 h, and a mean depth of

24.6 mm with a standard deviation of 2.1 mm. The second verification sequence

was constructed by employing the same seed as that for the first, but assuming

that storm depths were lognormally-distributed with a mean of 25 mm and a

standard deviation of 3 mm, which implies a coefficient of variation of 1.53 and


a skewness coefficient of 8.2. The actual mean and standard deviation of storm

depths produced was 24.3 mm and 3.3 mm respectively.

Runoff data generation

The conceptual hydrological model that was adopted to produce the flow series

corresponding to the storm sequences was the RORB model (Mein et al.,

1974), the basic element of which is a single nonlinear reservoir for which the

relationship between storage, S, and discharge, Q, is given by:

S = Kc Kr Q^m    (3)

where Kc is an empirical coefficient applicable to the entire catchment and Kr is a relative delay time applicable to individual channel reaches within

the network estimated from the expression:

Kr = f Li / Lav    (4)

where Li is the length of the reach represented by the storage element, Lav is the average flow distance of sub-catchment inflows within the channel network, and f is a factor depending upon the type of channel reach, i.e. natural, lined or unlined.

According to Laurenson & Mein (1988), the exponent in equation (3) is

rarely less than 0.6 or greater than 1.0 when modelling catchment runoff

response to rainfall, and a trial value of 0.8 is recommended on beginning a

modelling exercise. A brief review of the available literature shows that the

values adopted for the exponent have ranged from 0.67 (Watt & Kidd, 1975) to 0.8

(Selvalingham et al., 1987), with a predominance of values between 0.7 and

0.8 (Laurenson, 1964; Askew, 1970; Mein et al., 1974; Hong & Mohd Nor,

1988). Three models were therefore adopted to cover the range of possible

catchment behaviour:

(i) m = 0.8 to represent the typical nonlinear relationships encountered in

practice;

(ii) m = 1.0 to represent the extreme linear case; and

(iii) m = 0.5 to represent an extreme nonlinear type of behaviour.
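The behaviour of the single nonlinear reservoir can be sketched by combining equation (3) with the continuity relation dS/dt = I - Q and stepping forward explicitly. This is an illustrative discretization of ours, not the RORB routing scheme itself, and parameter values other than Kc = 20 are arbitrary:

```python
def route(inflow, kc=20.0, kr=1.0, m=0.8, dt=1.0):
    # Single nonlinear reservoir: S = Kc * Kr * Q**m (equation (3)),
    # inverted to Q = (S / (Kc * Kr)) ** (1 / m) after each
    # explicit continuity step S += dt * (I - Q).
    s, q, flows = 0.0, 0.0, []
    k = kc * kr
    for i in inflow:
        s = max(s + dt * (i - q), 0.0)
        q = (s / k) ** (1.0 / m)
        flows.append(q)
    return flows
```

Varying m between 0.5 and 1.0 reshapes the simulated rises and recessions, which is precisely the contrast the three models are designed to explore.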

In order to run the RORB software, a hypothetical catchment area and main

channel length had to be assumed in order to establish the value of Kr. The

chosen values of these characteristics were consistent with those of a rural

drainage area of about 30 km² in southern England. Although these

considerations are not particularly relevant to the learning of patterns, the size

of the catchment broadly determines how many antecedent rainfall depths are

required in developing the ANN and therefore influences the overall size of the

network. For simplicity, no losses were separated and the catchment was considered to have no impervious area. The Kc value was set to 20. The time


series of flows so obtained reflected very well the range in response characteristics represented by the three models, with model (iii) showing rapid rises

and recessions in contrast to the slow rises and sustained recessions of model

(ii). For the purposes of illustration, the rainfall hyetographs and flow

hydrographs generated by model (i) are presented in Fig. 3.

Fig. 3 Rainfall hyetographs and flow hydrographs generated by model (i).

Standardization of data

Prior to presenting the data to the ANN for training, a standardization must be

applied in order to restrict the data range to the interval of zero-to-one,

corresponding to the output limits of the nodes of the network as expressed in

equation (1). The significance of this standardization should not be underestimated. When different standardization factors are applied to the training and

verification sequences, the actual numbers represented by unity in the two data

sets are different. In practice, a trained ANN can only be used in the recall

mode with data that it has 'seen' before; the ANN should not be used for

extrapolation. For example, if the maximum flow that the ANN has learned to predict is 50 m³ s⁻¹ (corresponding to, say, an output from the node of 1.0), it is impossible for the ANN ever to predict a flow value exceeding 50 m³ s⁻¹. The choice of the range for standardization may therefore influence significantly the performance of the ANN. For the experiments used here, the standardization factors adopted were the maximum generated rainfall depths and flow ordinates rounded up to the next highest multiple of 10 mm or 10 m³ s⁻¹ respectively. For the training sequence and the first verification sequence, the rainfall factor was 10 mm. However, the flow factors were found to vary with the m value. For the training data, values of 30, 40 and 50 m³ s⁻¹ were employed for models (ii), (i) and (iii) respectively. The same values were found applicable to the first verification sequence (referred to below as normal verification) except that 60 m³ s⁻¹ was adopted for model (iii).

For the second verification sequence (referred to here as extreme verification because of the wider range of extremes that it contained), the required standardization factors were 20 mm for the rainfall depths, and 60, 90 and 120 m³ s⁻¹ for models (ii), (i) and (iii) respectively. By using these factors, a somewhat unrealistic situation was created in which the ANN was trained with standardized rainfall depths with unity representing 10 mm and standardized flows with unity representing 30, 40 and 50 m³ s⁻¹ for models (ii), (i) and (iii) respectively, but then applied to verification data in which unity corresponded to at least double these values. In practice, a trained ANN would have an associated set of standardization factors, and any new data introduced to the network in recall mode should be standardized using those factors. In order to illustrate the implications of this procedure, an extra verification sequence was developed by applying the factors from the training data to the second verification sequence. For convenience, the latter is referred to below as out-of-range verification.
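The standardization step and its out-of-range consequence can be made concrete with a small sketch (our own illustration; the rounding rule follows the description above):

```python
import math

def standardization_factor(values, step=10.0):
    # Round the series maximum up to the next multiple of `step`
    # (10 mm for rainfall depths, 10 m^3/s for flows).
    return step * math.ceil(max(values) / step)

def standardize(values, factor):
    # Map the data into the 0-1 output range of the network nodes.
    return [v / factor for v in values]
```

A network trained with a flow factor of 50 m³ s⁻¹ can never emit a recalled flow above 50 m³ s⁻¹, because its output node saturates at 1.0; this is exactly the depression of extremes seen in the out-of-range verification.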

ARTIFICIAL NEURAL NETWORK MODELLING

For all three conceptual models, initial trials were carried out with a 3-layer

ANN. The program employed was a BP-simulator produced by IBP-Pietzsch

GmbH of Ettlingen, Germany. In ANN terminology, the problem of rainfall-runoff modelling can be reduced to the problem of pattern recognition. The

ordinates on the rainfall hyetograph quite clearly represent a pattern that is

unique for each rainfall event. The object of ANN modelling is then to relate

each of these patterns to its corresponding runoff hydrograph ordinate. The

network output therefore consisted of a single flow value. The input to the

ANN consisted of the concurrent and different numbers of antecedent rainfall

depths, the number of the latter defining the input window length. The number

of nodes in the intervening hidden layer was chosen to be roughly half the

number of input nodes. These trials confirmed the experience gained in a

previous study with ANNs applied to rainfall and runoff data (Hall & Minns,

1993) that the use of rainfall inputs alone is insufficient, and that antecedent

flows should be employed as additional inputs. The network configurations

finally chosen involved the use of the concurrent and 14 antecedent rainfall

depths and three antecedent flow ordinates.
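Assembling the input/output events from the two time series can be sketched as follows (illustrative code of ours; the window sizes follow the configuration just described):

```python
def build_patterns(rain, flow, n_rain=15, n_flow=3):
    # Each pattern pairs the concurrent and 14 antecedent rainfall
    # depths plus 3 antecedent flow ordinates with the concurrent
    # flow as the single target output.
    patterns = []
    for t in range(max(n_rain - 1, n_flow), len(rain)):
        inputs = rain[t - n_rain + 1 : t + 1] + flow[t - n_flow : t]
        patterns.append((inputs, flow[t]))
    return patterns
```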

In effect, each set of input values and its corresponding output becomes

an event, and the series of events within the storm sequence is presented to the

network in turn. Once the sequence has been exhausted, the network returns


to the first event, and the cycle is repeated. This procedure is continued until

the global error of the network, which is based upon the sums of squares of the

differences between observed and computed values, is driven down to an

acceptable level. In the majority of runs, in order to ensure that the global

error had truly reached its minimum, the training was continued until the

number of events had exceeded 10⁶. Since the global error as implemented in

the software package employed was dependent upon the number of nodes in the

network, a more general fitting criterion was sought. As the review by Diskin

& Simon (1977) has shown, a variety of such indices have been applied in

hydrological modelling, but perhaps the form that has been used most widely

is the coefficient of efficiency defined as one minus the quotient of the mean

square error and the variance of the observed flows, i.e.:

F = 1 - [ Σi=1..m (q̂i - qi)² ] / [ Σi=1..m (qi - q̄)² ]    (5)

where q̂i are the model estimates of the flow ordinates qi, i = 1, 2, ..., m, and q̄ is the mean of the qi. Since the network inputs included the flows at previous time steps, the ANN could be considered to be modelling the change in flows rather than their absolute values. In these circumstances, the variance of the differences in flows, qi - qi-1, could be preferred to the variance of the observed flows in equation (5). However, investigation showed that, for the data sets employed in this study, the variance of the differences was usually of the order of 10⁻² times the variance of the observed flows, but that the mean square error could be as high as 10 times the variance of the differences. In these circumstances, use of the latter would then lead to F values well below minus one, whereas equation (5) remains between zero and one and was therefore preferred.
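Equation (5) is straightforward to compute; a small sketch (function name ours):

```python
def coefficient_of_efficiency(observed, modelled):
    # F = 1 - (sum of squared errors) / (sum of squared deviations
    # of the observed flows about their mean), as in equation (5);
    # F = 1 indicates a perfect fit.
    m = len(observed)
    q_bar = sum(observed) / m
    sse = sum((qm - qo) ** 2 for qm, qo in zip(modelled, observed))
    ss_obs = sum((qo - q_bar) ** 2 for qo in observed)
    return 1.0 - sse / ss_obs
```

Predicting the observed mean everywhere gives F = 0, so values close to one, as in Table 1, indicate a very close fit.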

Training an ANN can take several hours on a powerful, desk-top personal

computer. However, once the weights have been determined the running time

for the model with a new input data sequence is only a few seconds.

In order to demonstrate the degree of fit obtained, two consecutive events

from the training sequence, including the largest of the 14 generated storms,

have been selected for illustration. Figure 4 shows the performance of the 3-layer ANN for each of the three models for these events. In all cases, the

hydrograph from the smaller event is well simulated, but the 3-layer ANN

marginally underestimates the six or seven peak ordinates from the larger

event. In addition, Table 1 summarizes the results from both training and

verifying the 3-layer ANN on the data from each of the three models. In each

case, the ANN was trained on the training sequence and verified on all three

verification sequences as described above.

Table 1 shows that the goodness-of-fit obtained was such that the

majority of the coefficients of efficiency varied only in the third place of

decimals. In both training and verification, the performance of the ANN on the

Fig. 4 Training of 3- and 4-layer ANNs on input and output data from each of three conceptual catchment models: (a) model i; (b) model ii; and (c) model iii. Two events only have been selected for clarity of illustration.


linear case was marginally the best, although there was little to choose between

that and the two nonlinear cases. Comparison of the three verification cases

underlines the importance of the standardization. Whereas the normal and

extreme verification results are comparable for all three models, those for the

out-of-range case are notably poorer, essentially because the ANN depresses

the extremes in the data sets.

Table 1 Coefficients of efficiency for 3-layer ANNs fitted to rainfall and runoff series from three different conceptual hydrological models

Model   Training   Normal verification   Extreme verification   Out-of-range verification
i       0.9955     0.9963                0.9943                 0.7800
ii      0.9973     0.9980                0.9945                 0.8383
iii     0.9942     0.9867                0.9807                 0.7718

In order to illustrate the performance of the ANNs in verification, two consecutive storms, including the largest of the 14 events, have been extracted. Figure 5 shows the results for normal verification

where once again the ANN marginally underestimated the peak ordinates for

the largest event for models (i) and (ii) using the 3-layer network; for model

(iii), this peak was captured but the recession was somewhat delayed. Extreme

verification (Fig. 6) gave very similar results, although the largest peak

generated by model (iii) was overestimated by the 3-layer ANN. The out-ofrange verification of Fig. 7 amply illustrates the inability of the ANN to

produce an output greater than unity, even though the simulation of those

events that are contained within range, e.g. the first storm with the linear

model in Fig. 7(b), was reasonably reproduced.

With the performance of the 3-layer network being marginally poorer

with the data from the two nonlinear models, the exercise was repeated using

four layers, i.e. two hidden layers, in order to determine whether such a

configuration could adapt better to nonlinear relationships. The results, which

are summarized in Figs 4-7 and Table 2, demonstrate that the performances of

the 4-layer ANNs in normal and extreme verification were better in all but the

extreme nonlinear case (see Figs 5(c) and 6(c)), where overestimation of the

largest peak occurred. However, in terms of F values (see Table 2), the

performance of the ANN on extreme verification for the model with m = 0.5

was better than that on normal verification. The training of the 4-layer ANN

(Fig. 4) removed completely the underestimation of the largest peak flow and

differences in F values were only observed in the fourth decimal place. Once

again the failure to simulate the extremes in the data sets for the out-of-range

verification resulted in a notably poorer performance. Nevertheless, Table 2 is

sufficient to indicate that, should the performance of the 3-layer network be


Fig. 5 Normal verification of the 3-layer and 4-layer ANNs using input and output data from each of three conceptual catchment models: (a) model i; (b) model ii; and (c) model iii. Two events only have been selected for clarity of illustration.


Fig. 6 Extreme verification of the 3-layer and 4-layer ANNs using input and output data from each of three conceptual catchment models: (a) model i; (b) model ii; and (c) model iii. Two events only have been selected for clarity of illustration.


Fig. 7 Out-of-range verification of the 3-layer and 4-layer ANNs using input and output data from each of three conceptual catchment models: (a) model i; (b) model ii; and (c) model iii. Two events only have been selected for clarity of illustration.


deemed unsatisfactory, a 4-layer ANN may well bring about some improvement. This conclusion appears valid over the range of linear and nonlinear

behaviour normally encountered in rainfall-runoff modelling, and confirms the

potential of the approach.

Table 2 Coefficients of efficiency for 4-layer ANNs fitted to rainfall and runoff series
from three different conceptual hydrological models

Model   Training   Normal verification   Extreme verification   Out-of-range verification
i       0.9998     0.9996                0.9992                 0.8135
ii      0.9993     0.9993                0.9971                 0.8502
iii     0.9997     0.9836                0.9866                 0.7923

CONCLUDING REMARKS

The results of the numerical experiments summarized above, based upon

rainfall and flow data generated by conceptual catchment models varying from

linear to extremely (in hydrological terms) nonlinear cases, have reinforced the

conclusion reached by Hall & Minns (1993) that ANNs are capable of identifying usable relationships between discharges and antecedent rainfalls. When

the fitted ANNs were verified on storm sequences containing the same range

of extremes as the training data (normal verification), the coefficients of

efficiency were comparable to the second decimal place. The performance of

the ANN deteriorated with increasing nonlinearity, but only in the third

decimal place. In terms of individual storm hydrographs, the largest peaks were

not always reproduced closely. This performance can be expected when the

number of 'high' peaks is small compared with that of 'average' peaks; the

ANN assigns relatively more importance to the latter rather than to matching

the former. These findings are sufficient to suggest that extreme caution should

be exercised if ANNs are to be employed in studies of extreme floods.

When the ANNs were verified on sequences having larger extremes than

the training data (extreme verification), F values were reduced but not as much

as expected. The larger range of standardization introduced a squashing of the

hydrographs such that the sequence provided a surplus of small (on the scale

of zero-to-one) rises on which the ANNs had some difficulty. In contrast, the

larger peaks were simulated rather well and the largest events tended to be

overestimated (Fig. 6).

The out-of-range verification sequences (Fig. 7) serve to emphasize the

care required in choosing standardization factors. Nevertheless, even though the

largest events were significantly in error, those scaled to below unity were

modelled well.


Whether the ability of the ANN to resolve complex patterns could be improved by introducing an

extra hidden layer was investigated. The results showed the anticipated

improvement, but the increased computational effort involved raises some

doubt as to whether the increase in F values is justified by the additional

learning time. An alternative approach might be to increase the number of

nodes in the hidden layer rather than to add another layer to the network.

Indeed, further tests on the extreme nonlinear data set showed that, by

increasing the number of nodes in the hidden layer from 10 to 15, the training

F value was increased from 0.9942 (Table 1) to 0.9966, about half the

improvement obtained by adding the extra hidden layer (0.9997; Table 2).

However, using the normal verification data, the F value rose from 0.9867

(Table 1) to 0.9917, which is higher than that of 0.9836 (Table 2) achieved

with the 4-layer network.
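The computational effort at stake can be gauged from the weight counts of the alternative configurations (a back-of-envelope sketch; the hidden-layer sizes of 10 and 15 follow the text above, while the input window of 10 antecedent rainfall ordinates is an illustrative assumption):

```python
def weight_count(layer_sizes):
    """Number of trainable parameters (weights plus one bias per non-input node)
    in a fully connected feedforward network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Illustrative configurations: 10 input nodes, 1 output node.
three_layer      = weight_count([10, 10, 1])      # one hidden layer of 10 nodes
three_layer_wide = weight_count([10, 15, 1])      # hidden layer widened to 15 nodes
four_layer       = weight_count([10, 10, 10, 1])  # two hidden layers of 10 nodes

print(three_layer, three_layer_wide, four_layer)  # → 121 181 231
```

Every additional weight adds to the cost of each back-propagation pass, which is why the gains in F value in Table 2 have to be weighed against the extra learning time.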

These results tend to support the contention by Rumelhart et al. (1994)

that minimal networks can offer better generalization performance than more

complex networks. The extreme accuracy of the ANNs for the typical nonlinear

case (m = 0.8), which would appear representative of many rainfall-runoff data

sets, indicates that a 3-layer network should be sufficient for the majority of

real-world applications. Nevertheless, several outstanding problems, such as

those of choosing appropriate standardization factors and input window lengths,

remain to be explored before the approach can be widely applied in practice.
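The input window in question can be sketched as a simple pattern-construction step (the window length and the placeholder series are illustrative assumptions; choosing the window length itself is one of the open problems noted above):

```python
import numpy as np

def make_patterns(rainfall, runoff, window):
    """Pair each runoff ordinate with the window of antecedent rainfall depths.

    Input pattern at time t: rainfall[t - window + 1 .. t]; target: runoff[t].
    """
    inputs, targets = [], []
    for t in range(window - 1, len(rainfall)):
        inputs.append(rainfall[t - window + 1 : t + 1])
        targets.append(runoff[t])
    return np.array(inputs), np.array(targets)

rain = np.arange(8, dtype=float)      # placeholder hyetograph ordinates
flow = np.arange(8, dtype=float) / 8  # placeholder hydrograph ordinates
X, y = make_patterns(rain, flow, window=3)
print(X.shape, y.shape)  # → (6, 3) (6,)
```

A longer window lets the network see more of the antecedent storm at the cost of more input weights, which is precisely the trade-off that remains to be explored.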

An ANN has therefore demonstrated its ability to relate a runoff ordinate

to the pattern of antecedent rainfall depths. In hydrological modelling terms,

the ANN does not identify a form of model such as the nonlinear reservoir

model of equation (3). However, a form of model is implicit in the ANN

within the distribution of weights. Moreover, this distribution is obtained

automatically with no user intervention. Since the ANN works with total

rainfalls and total flows there is no necessity to apply loss functions and

baseflow separation techniques as in conventional approaches. The ANN is

indeed the ultimate hydrological black-box. However, in the words of an

anonymous reviewer (gratefully acknowledged), the latency of the model

appears to be a virtue, which is even more dangerous since the model is a

prisoner of its training data.
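For comparison with the implicit model embodied in the ANN weights, the explicit conceptual model used to generate the data, a single nonlinear reservoir, can be sketched as follows (the storage coefficient k, the unit time step, and the explicit Euler solution are illustrative assumptions; m = 0.8 is the typical nonlinear case discussed above):

```python
def route_nonlinear_reservoir(rainfall, k=5.0, m=0.8, dt=1.0):
    """Route a rainfall series through a single nonlinear reservoir.

    Continuity: dS/dt = i - q, with storage S = k * q**m, so that
    q = (S / k)**(1 / m). Solved here by a simple explicit Euler step.
    """
    storage, flows = 0.0, []
    for i in rainfall:
        q = (storage / k) ** (1.0 / m) if storage > 0.0 else 0.0
        storage = max(storage + (i - q) * dt, 0.0)
        flows.append(q)
    return flows

# A short synthetic storm: the flow peak lags the rainfall peak.
hydrograph = route_nonlinear_reservoir([0, 2, 5, 3, 1, 0, 0, 0])
print([round(q, 3) for q in hydrograph])
```

The ANN is trained only on the input and output series of such a reservoir; the form of the storage-discharge relationship itself is never made explicit.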

REFERENCES

Abbott, M. B., Bathurst, J. C., Cunge, J. A., O'Connell, P. E. & Rasmussen, J. (1986) An introduction
to the European Hydrological System - Système Hydrologique Européen, "SHE", 1: History and
philosophy of a physically-based, distributed modelling system. J. Hydrol. 87, 45-59.

Aleksander, I. & Morton, H. (1990) An Introduction to Neural Computing. Chapman and Hall, London.

Amorocho, J. & Hart, W. E. (1964) A critique of current methods in hydrologic systems investigation.

Trans. AGU 45, 307-321.

Askew, A. J. (1970) Derivation of formulae for variable lag time. J. Hydrol. 10, 225-242.

Babovic, V. & Minns, A. W. (1994) Use of computational adaptive methodologies in hydroinformatics. In:

Hydroinformatics '94 (Proc. 1st Internat. Conf. on Hydroinformatics, Delft, The Netherlands) ed.

A. Verwey, A. W. Minns, V. Babovic & C. Maksimovic, 201-210. Balkema, Rotterdam, The

Netherlands.

417

Beale, R. & Jackson, T. (1990) Neural Computing: An Introduction. Institute of Physics, Bristol, UK.

Diskin, M. H. & Simon, E. (1977) A procedure for the selection of objective functions for hydrologic

simulation models. J. Hydrol. 34, 129-149.

Hall, M. J. & Minns, A. W. (1993) Rainfall-runoff modelling as a problem in artificial intelligence:

experience with a neural network. In: Proc. 4th Nat. Hydrol. Symp. (Cardiff, UK), 5.51-5.57. British

Hydrological Society, London, UK.

Hornik, K., Stinchcombe, M. & White, H. (1989) Multilayer feedforward networks are universal

approximators. Neural Networks 2, 359-366.

Hertz, J., Krogh, A. & Palmer, R. G. (1991) Introduction to the Theory of Neural Computation. Addison-Wesley, Redwood City, California, USA.

Hong Kee An & Mohd. Nor Bin Hj. Mohd. Desa (1988) Application of runoff routing model to the Krian

river basin. Bull. Instn. Engrs. Malaysia, 12-17.

Hopfield, J. J. (1994) Neurons, dynamics and computation. Physics Today 47(2), 40-46.

Laurenson, E. M. (1964) A catchment storage model for runoff routing. J. Hydrol. 2, 141-163.

Laurenson, E. M. & Mein, R. H. (1988) RORB Version 4 runoff routing program user manual.

Monash University, Clayton, Australia.

Mein, R. H., Laurenson, E. M. & McMahon, T. A. (1974) Simple nonlinear model for flood estimation.

Proc. Am. Soc. Civ. Engrs, J. Hydraul. Div. 100(HY11), 1507-1518.

Minns, A. W. (1995) Analysis of experimental data using artificial neural networks. In: Hydra 2000 (Proc.

XXVI Congress IAHR), vol. 1, 218-223. Thomas Telford Services Ltd, London, UK.

Natural Environment Research Council (1975) Flood Studies Report, Vol. II, Meteorological studies. Natural

Environment Research Council, London, UK.

Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986) Learning internal representations by error

propagation. In: Parallel Distributed Processing. Explorations in the Microstructure of Cognition,

Vol 1, Foundations, 318-362. The MIT Press, Cambridge, Massachusetts, USA.

Rumelhart, D. E., Widrow, B. & Lehr, M. A. (1994) The basic ideas in neural networks. Communications

of the ACM 37(3), 87-92.

Selvalingham, S., Liong, S. Y. & Manoharan, P. C. (1987) Application of RORB model to a catchment

in Singapore. Wat. Resour. Bull. 23(1), 81-90.

Watt, W. E. & Kidd, C. H. R. (1975) QUURM - a realistic urban runoff model. J. Hydrol. 27, 225-235.

Received 16 August 1995; accepted 12 January 1996
