
2003 IEEE XIII Workshop on Neural Networks for Signal Processing

NEURAL AND FUZZY NEURAL NETWORKS FOR NATURAL GAS CONSUMPTION PREDICTION
Nguyen Hoang Viet
Institute of Fundamental Technological Research, Polish Academy of Sciences,
ul. Świętokrzyska 21, 00-049 Warsaw, Poland. e-mail: viet@ippt.gov.pl

Jacek Mańdziuk*
Faculty of Mathematics and Information Science, Warsaw University of Technology,
Pl. Politechniki 1, 00-661 Warsaw, Poland. e-mail: mandziuk@mini.pw.edu.pl

Abstract. In this work several approaches to prediction of natural gas consumption with neural and fuzzy neural systems for a certain region of Poland are analyzed and tested. Prediction strategies tested in the paper include: a single neural net module approach, a combination of three neural modules, a temperature clusterization based method, and the application of fuzzy neural networks. The results indicate the superiority of the temperature clusterization based method over the modular and fuzzy neural approaches. One of the interesting issues observed in the paper is the relatively good performance of the tested methods in the case of long-term (four-week) prediction compared to mid-term (one-week) prediction. Generally, the results are significantly better than those obtained by the statistical methods currently used in the gas company under consideration.

INTRODUCTION
The paper discusses several neural and fuzzy neural approaches to the problem of prediction of natural gas consumption in a certain region of Poland. The area is mostly rural with several small cities, and therefore the consumers are mostly individual or belong to small industry (bakeries, restaurants, laundries, etc.).
*The work was supported by the grant no. 503G 1120 0009 002 from the Warsaw University of Technology. Support from the Masovian Oil and Gas Company Ltd. is also gratefully acknowledged.

0-7803-8178-5/03/$17.00 © 2003 IEEE


Predicting the consumption of natural gas is an interesting, non-trivial and highly economically-motivated task [1, 2]. The conventional approach to solving it is based on applying statistical methods, which are usually efficient enough only if a large amount of past (historical) data is available. An alternative approach is to exploit soft-computing methods, especially neural and fuzzy neural network models. Neural networks are well known to be universal non-linear approximators [3, 4] and therefore are in general capable of closely approximating the prediction model without the need for its explicit (mathematical) formulation, in contrast to statistical approaches. The other advantage of using neural networks over statistical methods in the application domain considered in this paper is the problem of gas market volatility in Poland in recent years. On one hand, several new consumers are attracted by the relatively low cost of this type of energy, but on the other hand some gas consumers (especially in the rural regions) switch to alternative, even cheaper energy sources, like wood or coal. This type of structural instability of natural gas consumers is very harmful for statistical approaches, which do not implement any adaptation mechanisms.

NETWORK ARCHITECTURES
Feed-forward networks

Feed-forward networks applied here are two hidden layer architectures with sigmoidal units. Each prediction network takes the amount of daily gas consumption together with the daily average temperature measured in the given region as the inputs. Since the gas consumption is highly seasonal, the time factor, coded in a special way, is also included as part of the input. Let us denote by G(t) the actual gas consumption on day t, by Ĝ(t) the prediction of gas consumption made on day t, and by T(t) the average temperature on day t. It should be noted that all references to temperature values in this work concern the observable average temperatures in a given period in the past. No explicit temperature prediction was made, since this kind of prediction was virtually embedded in the gas load prediction network, due to its specific architecture (see description below).

The network architecture used for prediction is schematically presented in Fig. 1. The neurons in the input layer can be divided into three groups. The first k neurons in the input layer represent the daily gas loads. The second group of k neurons refers to the average daily temperatures from the k previous days. The last input group τ1, ..., τn denotes the time factor associated with the considered period (see description below for more details). The first hidden layer also consists of three groups of neurons. Neurons belonging to the first group are fully connected to those in the first group of the input layer (i.e. the gas units). Similarly, neurons in the second group are fully connected to those in the second input group (the temperature inputs).


Figure 1: The architecture of the feed-forward network used for one-day prediction.

The third group neurons are connected to the rest of the inputs. The number of neurons in this group is fixed to two. All the first hidden layer neurons are fully connected to the neurons in the second hidden layer, which in turn are connected to the single neuron in the output layer.

Three types of prediction are considered in this work: one-day prediction (denoted by D-type), one-week prediction (denoted by W-type) and four-week prediction (denoted by 4W-type). The output Ĝ(t+1) of the network is equal to the predicted daily consumption for the next day in the case of D-type prediction, or the average daily consumption for the following week or the following four weeks for W-type and 4W-type prediction, resp. The sizes of the input and the hidden layers were selected experimentally by some preliminary tests, depending on the prediction horizon. The gas loads and the average daily temperatures from the last three, five and seven days were taken as inputs to make D-type, W-type and 4W-type predictions, resp.

Encoding of the last group of inputs representing the time factor requires special care. The main problem that occurs when applying a straightforward encoding scheme (either by the use of one input representing the day of the year, or two inputs denoting the month of the year and the day of the month) is a harmful discontinuity in the New Year period. Therefore a different encoding approach is proposed here: denote by t the day-of-year number of the first day of some n-day period (n ≥ 1, 0 ≤ t ≤ 365). The subsequent days in that period are numbered t+1, t+2, ..., t+n−1. For such a period, the following two time encoding inputs are applied:

τ1 = sin(2πD_c/366) and τ2 = cos(2πD_c/366),    (1)

where D_c is a real value indicating the center day of the period under consideration, i.e.:

D_c = t + (n − 1)/2.    (2)

It can be observed that the use of (1) allows smooth coding of the season of the year, which is especially important in the case of mid-term and long-term prediction. In the case of D-type prediction there is an additional time encoding input τ3, which indicates the type of the next day: τ3 = −1 for working days and τ3 = +1 for non-working days¹. This input can be interpreted as defining the working and non-working day context of the networks. All feed-forward networks described above are trained by the Scaled Conjugate Gradient method [5], which was proven to be a fast and effective training method.
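The time encoding above can be sketched in a few lines. The helper below is an illustrative reconstruction rather than the authors' code; the function name and signature are our own, and day-of-year numbering starting at 0 follows the text.

```python
import math

def encode_time(t, n, next_day_is_working=None):
    """Seasonal time encoding of Eqs. (1)-(2): the center day D_c of the
    n-day period starting at day-of-year t is mapped onto the unit circle,
    which avoids the New Year discontinuity."""
    d_c = t + (n - 1) / 2.0                  # Eq. (2): center day of [t, t+n-1]
    tau1 = math.sin(2 * math.pi * d_c / 366)
    tau2 = math.cos(2 * math.pi * d_c / 366)
    if next_day_is_working is None:          # W- and 4W-type prediction
        return (tau1, tau2)
    # D-type prediction adds tau3: -1 for working days, +1 for non-working days
    tau3 = -1.0 if next_day_is_working else 1.0
    return (tau1, tau2, tau3)
```

Note that the last and first days of the year receive nearly identical (τ1, τ2) pairs, which is exactly the discontinuity the encoding is meant to remove.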
Fuzzy neural networks (FNNs)
Several FNN models have been developed and successfully used in various applications [6, 7]. The advantages of fuzzy over conventional neural networks are their high generalization abilities and the capability of dealing with imprecise data. In the gas load prediction problem the temperature input plays a significant role. The use of a fuzzy neural network for this task is mainly motivated by the fact that the average daily temperature is only an approximation of the exact temperature, which varies throughout the day.

The FNN model. The fuzzy neural network implemented in this work is represented by a single hidden layer feed-forward architecture with N input units, K hidden units and M output ones (Figure 2).

Figure 2: The architecture of the fuzzy neural network used in simulations.

Every connection between input and hidden units has a weight equal to 1. Each unit in the hidden layer represents a fuzzy set over R^N. The output value of the i-th hidden neuron for a given input vector X = [x1, x2, ..., xN]^T can be interpreted as the degree of membership of X in the fuzzy set represented by this neuron. The neurons in the hidden layer are fully connected to the neurons in the output layer. Assuming a Gaussian membership function for the fuzzy sets represented by the hidden neurons, the output of the i-th hidden neuron given X as the input is:
¹Actually, for the sake of simplicity, only weekend days are treated as non-working days. All holidays that occurred on weekdays are treated as working days.


μ_i(X) = exp(−‖X − C_i‖² / σ_i²),    (3)

where ‖·‖ is the Euclidean norm and σ_i, C_i = [c_i1, c_i2, ..., c_iN]^T are parameters associated with the i-th hidden neuron. Denoting by W = [w_km]_{K×M} the matrix of connection weights between the hidden and output layers, the output value of the j-th output layer neuron is defined as:

y_j = (Σ_{k=1}^{K} w_kj μ_k) / (Σ_{k=1}^{K} μ_k).    (4)

Assuming a training sample set T = {(X_i, d_i) : i = 1, ..., P}, the FNNs described above can be trained in supervised mode by minimizing the error function:

E = (1/2) Σ_{i=1}^{P} ‖O(X_i) − d_i‖²,    (5)

where O(X_i) is the output vector yielded by the network for the input vector X_i.
The FNN model described here can be interpreted in terms of an equivalent fuzzy system. For the i-th neuron in the hidden layer (the fuzzification neuron), a fuzzy IF-THEN rule R_i can be extracted:

R_i: IF X is μ_i THEN y_1 = w_i1 AND ... AND y_M = w_iM.

It is clear that the fuzzy properties μ_i as well as the (crisp) outputs w_ij of the fuzzy rules are determined in the training process (certainly, the notion of μ_i indicates only a fuzzy set over the input space, not any linguistic value). The fuzzy rule set is then defined as:

S = {R_i : i = 1, ..., K},

and the inference engine for such a fuzzy system is specified by (4).
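Equations (3)-(4) amount to a normalized, membership-weighted sum, which a small sketch makes concrete. This is our own illustrative code with hypothetical parameter shapes, not the authors' implementation:

```python
import numpy as np

def fnn_forward(X, C, sigma, W):
    """Forward pass of the FNN of Eqs. (3)-(4).
    X: input (N,); C: hidden-unit centers (K, N); sigma: widths (K,);
    W: hidden-to-output weights (K, M). Returns the output vector y (M,)."""
    # Eq. (3): Gaussian degree of membership of X in each hidden fuzzy set
    mu = np.exp(-np.sum((X - C) ** 2, axis=1) / sigma ** 2)
    # Eq. (4): membership-weighted average of the rule consequents
    return (W.T @ mu) / mu.sum()
```

Because the memberships are normalized, an input equidistant from two centers yields the average of their rule outputs, which is the defuzzification behavior described by (4).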

FNNs for gas load prediction. Analogously to the case of feed-forward networks, the fuzzy neural networks used here receive the daily gas loads and the average daily temperatures from the last few days as the input: three days for short-term, five days for mid-term and seven days for long-term prediction. Regardless of the prediction horizon, the number of hidden units is chosen experimentally to be equal to four. Each network contains one output neuron that produces the predicted average daily gas load. An important issue concerning the training of FNNs is the technique for initialization of their parameters. Since each hidden neuron represents a fuzzy set with a Gaussian membership function, the parameter vectors C_i, i = 1, ..., K are initially set close to the medians of some fuzzy sets that cover the input space. A self-organizing map of the training data set is first built using the neural gas technique [8]. The vectors C_i are initialized as the centers of the clusters in that map. The training process then moves these sets to an optimal configuration.
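A minimal version of such a neural-gas clustering pass [8] can be sketched as follows. The annealing schedules and parameter values are our assumptions, since the paper does not list them; the returned centers would then serve as the initial C_i of the hidden fuzzy sets.

```python
import numpy as np

def neural_gas_centers(data, k=4, epochs=20, lam0=2.0, eps0=0.5, seed=0):
    """Minimal neural-gas run: every center is pulled toward each sample with
    a strength that decays exponentially with the center's distance rank.
    data: array of shape (n_samples, n_features). Returns (k, n_features)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            frac = step / n_steps
            lam = lam0 * (0.01 / lam0) ** frac   # annealed neighborhood range
            eps = eps0 * (0.01 / eps0) ** frac   # annealed learning rate
            # rank each center by its distance to the current sample
            ranks = np.argsort(np.argsort(np.linalg.norm(centers - x, axis=1)))
            centers += eps * np.exp(-ranks / lam)[:, None] * (x - centers)
            step += 1
    return centers
```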


DATA COLLECTION AND PREPROCESSING


One of the problems in this particular prediction task is the lack of historical data. The available data set provided by the gas company ranged from Jan. 01, 2000 to Dec. 31, 2002. The set was initially divided into two sets: the first one, covering years 2000-2001, to compose the training set, and the second one, with the 2002 data, to form the test set. Furthermore, 10% of the training data was randomly chosen for validation. Each record of the data represents the daily cumulative load of natural gas provided by the telemetric system as well as the average daily temperature.

Due to the relatively small amount of past data, the sliding window mechanism was used in order to artificially enlarge the data sets. Namely, for W- and 4W-type predictions the target periods are overlapping and subsequent target periods are defined as [t+1, t+n] for prediction made on day t, [t+2, t+n+1] for prediction made on day t+1, [t+3, t+n+2] for prediction made on day t+2, etc., where n = 7 or n = 28 for W-type and 4W-type predictions, resp.

The data, before being input to the network, was scaled to a predefined range. The maximum and minimum temperatures in the training set, t_max and t_min, resp., were determined. Afterwards the temperature range [t_min, t_max] was uniformly extended by 20% from each side into the range [T_min, T_max]. Temperature values were finally scaled to the interval (−1, 1) using T_min and T_max. The daily load values were scaled to (0, 1) by dividing each value G(t) by G_max, where G_max is the extended (as in the case of temperature) maximum daily load over the training data set. The output (target) values in the training set, which are the average daily loads for a given one-day, one-week or four-week period (depending on the type of prediction), were scaled similarly, i.e. were divided by G_max.
EXPERIMENTAL RESULTS
For each prediction horizon the following experiments were performed: naive prediction, prediction using a single neural network module, prediction using a combination of three neural modules, prediction using three neural modules, each of which is devoted to a predefined temperature range, single neural network prediction performed for working days only (concerns D-type prediction only), and prediction based on a single fuzzy neural network module. The test data set (covering the whole year 2002) for W- and 4W-type predictions was also artificially enlarged by using the sliding window mechanism, analogously to the case of the training data. In all tests, the same error measure, the Mean Absolute Percentage Error (MAPE), commonly used in prediction tasks [2, 9], was applied. The following experiments with various neural architectures were performed:

Naive prediction (Naive). In this experiment no neural network is involved. The predicted average daily load in the period [t+1, t+n] is simply assumed to be equal to the average daily load in the preceding period, i.e. [t−n+1, t], where n = 1, 7, 28, resp. for D-, W- and 4W-type prediction.

Prediction using a single neural network (SingleN). In this experiment, for each prediction horizon, 50 neural networks were trained and tested as single predictors.

Prediction using a combination of three neural modules (3AvgN). Here all combinations of three different networks among the above 50 were tested. The final output value was equal to the average value of the three modules' outputs.
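The MAPE measure and the naive baseline can be stated compactly. This is an illustrative sketch with our own day-indexing convention (day 1 is stored at index 0), not code from the paper.

```python
def mape(targets, predictions):
    """Mean Absolute Percentage Error (in percent) over paired lists."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(targets, predictions)) / len(targets)

def naive_forecast(loads, t, n):
    """Naive baseline: the predicted average daily load for days t+1..t+n is
    the average daily load over the preceding period, days t-n+1..t."""
    window = loads[t - n:t]   # days t-n+1 .. t under 1-based day numbering
    return sum(window) / n
```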

Prediction using three neural modules, each of which is devoted to a specific temperature range (3TempN). In this experiment the training set T was divided into three equipotent, overlapping subsets. Namely, for each training sample p, the average daily temperature T̄_p of the days covered by this sample was calculated. Let t1, t2 and t3 be some temperature values where:

t1 < t2 < t3.

Let:

L = {p ∈ T : T̄_p < t2},
M = {p ∈ T : t1 ≤ T̄_p < t3},
H = {p ∈ T : T̄_p ≥ t2}.

The values t1, t2 and t3 were chosen so that:

card(L) = card(M) = card(H).

In other words, the training set was divided according to the temperature context. The subsets L, M and H can be regarded as containing sample data corresponding to the low, medium and high temperatures².

For each of these subsets, a set of 20 neural networks was independently trained. The context-based partition of the training data could facilitate the prediction task within each temperature range. In the test phase, all possible combinations of the three modules (one of each type) were tested (8000 combinations in total). Please note that due to some overlapping areas in the training sets, the effect of gluing the modules was achieved in a straightforward way (cf. [10, 11]). For a given test sample, the corresponding average input temperature was calculated and then, depending on this value, one or (usually) two appropriate networks were activated. The ultimate prediction was the average value of the outputs of the modules involved in prediction.
²Certainly, such notions are only subjective.
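The equipotency constraint card(L) = card(M) = card(H) largely pins down the split points: with t2 the median and t1, t3 the lower and upper quartiles, each subset contains half of the samples. The paper does not state how t1, t2, t3 were actually computed, so the quartile choice in this sketch is our assumption, consistent with that constraint.

```python
def temperature_partition(samples, avg_temps):
    """Split training samples into the three equipotent, overlapping
    temperature-context subsets L, M, H of the 3TempN experiment."""
    ts = sorted(avg_temps)
    n = len(ts)
    # assumed split points: t1 = lower quartile, t2 = median, t3 = upper quartile
    t1, t2, t3 = ts[n // 4], ts[n // 2], ts[3 * n // 4]
    L = [s for s, t in zip(samples, avg_temps) if t < t2]
    M = [s for s, t in zip(samples, avg_temps) if t1 <= t < t3]
    H = [s for s, t in zip(samples, avg_temps) if t >= t2]
    return (t1, t2, t3), L, M, H
```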


TABLE 1: THE AVERAGE, THE MINIMUM AND THE MAXIMUM MAPE (IN PERCENT) AND ITS STANDARD DEVIATION FOR D-TYPE PREDICTIONS.

TABLE 2: THE AVERAGE, THE MINIMUM AND THE MAXIMUM MAPE (IN PERCENT) AND ITS STANDARD DEVIATION FOR W-TYPE PREDICTIONS.

Prediction for the working days only (WorkD). This experiment was analogous to the single network prediction except that it was made for the working days only. The experiment was performed for D-type prediction only.

Prediction based on a fuzzy neural network module (FuzzyN). An ensemble of 50 FNNs was trained and tested analogously to the case of single neural modules.

For each experiment mentioned above, the average, the minimum and the maximum value (in percent) as well as the standard deviation of MAPE over all tested networks (or all their combinations, where applicable) were calculated. The results are shown in Tables 1-3. Examples of W-type and 4W-type predictions are presented in Fig. 3.

CONCLUSIONS
The main conclusions drawn from Tables 1-3 are as follows.

Comparing the average MAPE values, the best performance is observed for the combination of three temperature context based modules. The next one is the combination of three non-temperature-based networks. These two outperform the single neural and the single fuzzy neural approaches.

The efficacies of the single fuzzy neural network and single neural network modules are comparable. It is important to note, however, that for


TABLE 3: THE AVERAGE, THE MINIMUM AND THE MAXIMUM MAPE (IN PERCENT) AND ITS STANDARD DEVIATION FOR 4W-TYPE PREDICTIONS.

each type of prediction, the size of a fuzzy neural network module was much smaller compared to the corresponding crisp neural module.

In the case of D-type prediction an interesting issue is to compare the working days prediction versus the all days prediction. The average results for working days are only slightly better than the respective ones for all days. The main reason for such a small improvement is the rural character of the area with very few industrial consumers. Therefore the consumption profile over the week is more stable than in highly industrialized regions.

In summary, the results achieved by neural networks and fuzzy neural networks are encouraging and acceptable from the natural gas company's viewpoint. Statistical methods used so far by the company yield an average MAPE error for monthly prediction in the period 01.2000-12.2001 equal to 12.86%³. In the numerical evaluation of the results it should be taken into account that this year's winter was unusually cold (cf. Fig. 3), and therefore making predictions for the period Nov.-Dec. was much more difficult in the year 2002 than in the two previous years.

Another comparison can be fairly made with the naive prediction approach, which was clearly outperformed by neural methods in the case of mid-term and long-term predictions. The comparison of our results with the literature is ambiguous, since the data sets used in other works are different from ours and also the consumer profiles may be different. One example is [2], where a MAPE error of 3.78% for D-type prediction involving temperature data is reported.

In the paper, the proposed techniques were applied in a context involving individual and small industrial consumers. It will be interesting to verify the efficacy of these methods in the case of a different consumer profile. When considering highly inhabited and industrialized areas, more accurate and reliable prediction would be necessary.
The techniques presented here could be exploited in cooperation with some other approaches like partially recurrent or RBF neural networks, as well as statistical methods.
³Unfortunately, more recent data is not available.


Figure 3: Test results for W-type and 4W-type predictions in the period Jan.-Dec., 2002. The solid lines and the dotted lines represent the target and predicted values, resp.

REFERENCES
[1] R.H. Brown, L. Martin, P. Kharouf, L.P. Piessens, "Development of artificial neural-network models to predict daily gas consumption", A.G.A. Forecasting Review, vol. 5, pp. 1-22, 1996.
[2] A. Khotanzad, H. Elragal, T.-L. Lu, "Combination of artificial neural-network forecasters for prediction of natural gas consumption", IEEE Transactions on Neural Networks, vol. 11, no. 2, pp. 464-473, 2000.
[3] A.N. Kolmogorov, "On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition", Dokl. Akad. Nauk SSSR, vol. 114, pp. 953-956, 1957.
[4] V. Kurkova, "Rates of approximation by neural networks", in P. Sincak, J. Vascak (eds.), Quo Vadis Computational Intelligence, Springer, Berlin, 2000, pp. 23-36.
[5] M.F. Møller, "A scaled conjugate gradient algorithm for fast supervised learning", Neural Networks, vol. 6, pp. 525-533, 1993.
[6] J.J. Buckley, Y. Hayashi, "Fuzzy neural networks: A survey", Fuzzy Sets and Systems, vol. 66, pp. 1-13, 1994.
[7] M.M. Gupta, D.H. Rao, "On the principles of fuzzy neural networks", Fuzzy Sets and Systems, vol. 61, pp. 1-18, 1994.

[8] T. Martinetz, S. Berkovich, K. Schulten, "Neural-gas network for vector quantization and its application to time series prediction", IEEE Transactions on Neural Networks, vol. 4, pp. 558-569, 1993.
[9] G. Zhang, B.E. Patuwo, M.Y. Hu, "Forecasting with artificial neural networks: The state of the art", International Journal of Forecasting, vol. 14, pp. 35-62, 1998.
[10] A.J.C. Sharkey, "Modularity, combining and artificial neural nets", Connection Science, vol. 9, no. 1, pp. 3-10, 1997.
[11] A. Waibel, "Consonant recognition by modular construction of large phonemic time-delay neural networks", in D. Touretzky (ed.), Advances in NIPS 1, Morgan Kaufmann, 1989, pp. 215-223.

