

International Journal of Forecasting 12 (1996) 255-267

Forecasting consumers' expenditure: A comparison between econometric and neural network models
Keith B. Church a,*, Stephen P. Curram b

a ESRC Macroeconomic Modelling Bureau, University of Warwick, Coventry CV4 7AL, UK
b Warwick Business School, University of Warwick, Coventry CV4 7AL, UK

Abstract

This paper is motivated by the difficulties faced by forecasters in predicting the decline in the growth rate of consumers' expenditure in the late 1980s. The econometric specifications of four competing explanations are replicated and the static forecasts compared with the actual outturns. The same data are then used to estimate neural network models. The main issue is whether the neural network technology can extract any more from the data sets provided than the econometric approach. It is found that the neural network models describe the decline in the growth of consumption since the late 1980s as well as, but no better than, the econometric specifications included in the exercise, and are shown to be robust when faced with a small number of data points. However, whichever approach is adopted, it is the skill of choosing the menu of explanatory variables which determines the success of the final results.
Keywords: Consumers' expenditure; Econometric modelling; Neural networks; Forecasting

1. Introduction

This paper compares the forecasts from models of UK consumers' expenditure which arise from competing econometric and neural network specifications. The motivation for this exercise comes from the difficulty in explaining the persistent downturn in consumers' expenditure that occurred during the late 1980s and early 1990s. The failure to predict these events probably led to policy makers taking decisions that made the subsequent recession deeper and more prolonged than it might otherwise have been.

* Corresponding author. Tel: 01203 523934; fax: 01203 523032; e-mail: mbran@csv.warwick.ac.uk.

Forecasters using econometric and other methods based more on judgement were equally prone to these forecast failures. The use of an econometric model entails an explicit statement of the model determining expenditure and enables comparable neural network models to be constructed. The paper examines whether the estimation of potentially highly non-linear neural network models using the same variables featured in the conventional consumption functions can help explain consumers' behaviour.


The consumers' expenditure equations used in this comparison are based on those estimated by the London Business School (LBS), the National Institute of Economic and Social Research (NIESR) and the Bank of England (BE), three of the large-scale quarterly models of the UK economy regularly deposited at the ESRC Macroeconomic Modelling Bureau, together with a specification based on a savings ratio equation estimated by Goldman Sachs (GS). We have chosen to look at total expenditure rather than the disaggregated components. These equations exactly correspond to those described and evaluated in Church et al. (1994). That paper concludes that, despite recent developments and improvements in modelling consumption, none of the models can completely explain the fall in expenditure growth of the late 1980s and early 1990s.

Neural networks are a family of models which are loosely based on the structure of neurons in the brain. They are made up of simple processing units which connect to form structures that are able to learn relationships between sets of variables. Initial research on neural networks focused on discovering the way that the brain works, but they have since been exploited for their mathematical properties as signal processors and statistical models. In recent years there has been increasing interest in the use of neural networks for forecasting purposes; for further examples see Department of Trade and Industry (1993), Smith (1993) and Azoff (1994).

The paper proceeds as follows. In Section 2 we discuss the theory behind the choice of variables used in both the econometric and neural network models and how the final specifications are chosen. Section 3 presents the results of the estimation, and some properties of the preferred neural network model are examined in Section 4. Some conclusions are drawn in Section 5.

2. Modelling consumers' expenditure

The different models presented in this comparison reflect the differing objectives of the model builders. In the cases of the LBS, NIESR and BE specifications, the equations are designed to form a small part of a large-scale macroeconomic model. The model proprietors are therefore concerned not only with estimating a single relationship that gives a good description of the data, but also with the way that this equation interacts with the others in the model to determine full model properties. For example, the BE equation used in this comparison does not appear in the version of the model currently held by the Bureau because problems in modelling net liquid assets, one of its explanatory variables, lead to unwelcome simulation properties if it is used. In addition to the problem of endogeneity of possible right-hand side variables, the modeller may wish to impose certain features implied by theory at the expense of goodness of fit, static homogeneity of consumption with respect to income and wealth being one example. In contrast, the objective when building the neural network model of consumers' expenditure is simply to produce the best forecasting model, although the modelling process itself can give some guidance as to the relative importance of each of the inputs.

2.1. Econometric approach

Most modern econometric consumption functions originate from the life-cycle theory, which explains how an individual smooths his or her expenditure given the amount of wealth accumulated over a lifetime. Hendry et al. (1990) show how a standard optimization exercise yields a log-linear consumption function depending on factors such as income, wealth and the real rate of return, but then point out that this neglects possible roles for income uncertainty, credit constraints, demographic changes, liquidity and dynamic adjustment. The final specification therefore contains variables that reflect both theoretical considerations and, to a certain extent, the opinions of the modeller.

The most important differences in the models are in the choice of the measure of non-human wealth. The BE model uses net liquid assets as its wealth variable, which is the narrowest definition and is encompassed by the measure of net financial wealth of the personal sector used by all the other models. The NIESR model has a temporary effect from the change in liquid assets. The LBS and GS models also consider physical wealth to be important and include different measures of the value of the housing stock. There is also disagreement on the role of interest rates. They do not feature in the explanations of BE and GS, but the real bank base rate appears in the NIESR specification and the after-tax version of the same variable in the LBS model. Various other variables are used to capture the factors mentioned above which are neglected in the life-cycle theory. The GS model uses differences in the unemployment rate to try to capture income uncertainty. The influence of shifts in the demographic structure is addressed explicitly in the LBS model by a term measuring the proportion of the population aged between 45 and 64.

The estimation method used is OLS, with a two-stage Engle and Granger (1987) technique employed by the LBS and GS models. The two-stage technique involves first estimating a cointegrating relationship, which is then embodied in the second-stage dynamic model. The approach involves examination of the time-series properties of each of the variables. The order of integration of a series is the number of times that differencing is required to make it stationary. The variables in the cointegrating regression should all be integrated of order one, and the cointegrating relationship exists if the residuals from this regression are stationary. These residuals then appear in the second-stage regression, which should only contain stationary variables, using differencing where necessary. Typically the cointegrating relationship embodies the long-run or equilibrium relationship between consumption, income and wealth, but stationary variables from outside the cointegrating regression may also enter the long-run solution of the model. The time-series and cointegrating properties of the variables used in these models are covered in Section 3.
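To make the procedure concrete, a minimal sketch of the two-step estimator is given below in Python using the statsmodels library; the arrays c, y and w stand for logged consumption, income and wealth and are placeholders for illustration rather than the data or software actually used.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import adfuller

    def engle_granger_two_step(c, y, w):
        """Sketch of the Engle-Granger (1987) two-step estimator.

        c, y, w are 1-D numpy arrays holding logged consumption, income and
        wealth, assumed to be I(1) series of equal length."""
        # Step 1: candidate cointegrating (long-run) regression estimated by OLS.
        X1 = sm.add_constant(np.column_stack([y, w]))
        first_stage = sm.OLS(c, X1).fit()
        residuals = first_stage.resid

        # Cointegration requires the first-stage residuals to be stationary; an
        # augmented Dickey-Fuller test on the residuals provides the evidence
        # (the usual ADF critical values are only indicative here, since the
        # residuals come from an estimated regression).
        adf_stat, p_value, *_ = adfuller(residuals, maxlag=4)

        # Step 2: dynamic model in (stationary) first differences, with the
        # lagged residual entering as the error correction term.
        dc, dy, dw = np.diff(c), np.diff(y), np.diff(w)
        ecm = residuals[:-1]
        X2 = sm.add_constant(np.column_stack([dy, dw, ecm]))
        second_stage = sm.OLS(dc, X2).fit()
        return first_stage, (adf_stat, p_value), second_stage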

2.2. Neural network approach

The neural network used for these experiments is the 'multilayer perceptron' developed by Rumelhart et al. (1986). The network takes continuous-valued input variables and 'learns' their relationship to continuous-valued target values, a method known as supervised learning. The network is data driven in that it learns only from the training data presented to it and has no underlying parametric model. This means that the model produced is only as good as the data used, so the choice of explanatory variables is as important as for any other approach. Neural networks do, however, have the ability to ignore variables which do not contribute to the model, so that some experimentation with potential variables can be done, though care must be taken not to swamp the network with too many irrelevant variables.

The basic building block of the neural network is the neuron. A single neuron is a simple processing unit based on the structure of neurons in the brain. Fig. 1 shows a biological neuron. The neuron collects input signals through the dendrites and passes them to the soma, or processor. If the combined signals reach some threshold then the neuron is activated and passes a signal on to other neurons via the axonal path. The sensitivity of a neuron is controlled by this activation threshold. While a single neuron is slow, it is the interconnection of over 100 billion neurons which gives the brain its power.

Fig. 1. A brain neuron.

An artificial neural network is made up of layers of neurons, as shown in Fig. 2. The input layer and output layer represent the input and output variables of the model. Between them lie one or more hidden layers which hold the network's ability to learn non-linear relationships. The greater the number of neurons in the hidden layers, the more the network is able to cope with non-linear relationships.

Fig. 2. A 3:2:1 multilayer perceptron.

Each neuron in a layer has weighted connections to each neuron in the next layer. The weights multiply the signal between pairs of neurons and can take positive and negative values. This means that they can control the strength of the connection and whether there is a positive or negative relationship between neurons. Each neuron also has a bias term which acts like the intercept in a regression. An individual neuron works as follows: the values of the inputs entering the neuron are summed and added to the bias term; this total then passes through an S-shaped squashing (or sigmoid) function to give the activation of the neuron in the range 0 to 1. The activation value is then passed on to all the neurons in the next layer of the network via weighted connections or, in the case of the output layer, represents the output of the network.
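A minimal sketch of this computation for a network with a single hidden layer is given below; the weight matrices and bias vectors are illustrative arguments rather than values estimated in this study.

    import numpy as np

    def sigmoid(z):
        """S-shaped squashing function mapping any real value into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    def forward_pass(x, W_hidden, b_hidden, W_output, b_output):
        """Output of a multilayer perceptron with one hidden layer.

        Each neuron sums its weighted inputs, adds its bias and passes the
        total through the sigmoid; the activations feed the next layer."""
        hidden = sigmoid(W_hidden @ x + b_hidden)      # hidden-layer activations in (0, 1)
        return sigmoid(W_output @ hidden + b_output)   # network output in (0, 1)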

The data to be used with the network are usually scaled. The output from a neuron, and thus the network as a whole, is in the range 0 to 1, so data for the dependent variable must be scaled to within this range. In our case we scaled the dependent variable to the range 0.2 to 0.8. This was done for two reasons. First, as Smith (1993) points out, the sigmoid function becomes increasingly non-linear as it approaches 0 and 1, which makes it increasingly difficult to represent linearly scaled variables as their values approach these extremes, and so slows learning. Second, this regime allows the possibility of forecast values which are outside the range of the original training data, as may be required for new data or sensitivity analysis. The explanatory variables do not need to be scaled, but scaling is usually desirable. Hart (1992) notes that scaling variables to the same order of magnitude prevents those variables with a higher magnitude from 'swamping' the network in the initial stages of training. This swamping effect can slow or inhibit the training of the network. Typical ranges for scaling explanatory variables are 0 to 1 or -1 to 1 (Hart, 1992; Smith, 1993; DTI, 1994a). In our case the range 0.2 to 0.8 was used, since for one model a lagged version of the dependent variable, consumer expenditure, was used as an explanatory variable, and it was thought desirable to maintain a 1:1 relationship between the two.

A general overview of the training process is given below; more technical descriptions of the training algorithm are given by Smith (1993) and summarized by Lippmann (1987). Training involves repeatedly presenting the data to the network. Learning is achieved by altering the values of the weighted connections between neurons and the bias values within neurons to bring the output of the network closer to the desired target value. The overall aim is to reduce the mean-squared error (MSE) for the training data. The errors between output and target values are propagated back through the network to attribute them to the weights in the network. These are then altered using the steepest descent method, which aims to reduce the MSE by following the steepest gradient on the error surface. The rate of learning is controlled by gain and momentum parameters. The gain parameter specifies the magnitude of changes to the weights. A small gain term results in slow network learning, while a large gain term can miss key features on the error surface, leading to oscillation or convergence to local minima. Past changes which have been made to a particular weight are stored as an exponentially smoothed average, known as the momentum. A proportion of this momentum is used in future changes of the weight so as to smooth learning and reduce oscillation.
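The two devices just described, linear scaling of a series into the 0.2 to 0.8 range and a steepest descent update controlled by gain and momentum terms, can be sketched as follows; the default gain and momentum values are illustrative assumptions, not the settings used here.

    import numpy as np

    def scale_to_range(series, lo=0.2, hi=0.8):
        """Linearly rescale a series into [lo, hi]; also return the inverse map
        needed to convert network outputs back into the original units."""
        s_min, s_max = series.min(), series.max()
        scaled = lo + (hi - lo) * (series - s_min) / (s_max - s_min)
        unscale = lambda v: s_min + (v - lo) * (s_max - s_min) / (hi - lo)
        return scaled, unscale

    def update_weight(weight, gradient, previous_change, gain=0.1, momentum=0.9):
        """One steepest descent step on a single weight: move down the error
        gradient scaled by the gain, plus a proportion of the previous change
        (the momentum) to smooth learning and reduce oscillation."""
        change = -gain * gradient + momentum * previous_change
        return weight + change, change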


Two key disadvantages of the multilayer perceptron are its slow learning speed and its tendency to converge to local minima. Both are particularly affected by the choice of value for the gain parameter. To help overcome the difficulty in setting the gain parameter we used the adaptations suggested by Vogl et al. (1988). Here the gain term is not fixed but is allowed to vary depending on the success of the learning. While there is an improvement in the MSE, the gain term is allowed to increase, representing increased confidence in the direction of learning. If the MSE worsens by more than some small given percentage, the gain term is reduced, the momentum terms are ignored so that past weight changes are not allowed to affect the current weight changes, and the calculated weight and bias changes for that iteration are not used. The momentum term is switched back on when a successful learning iteration occurs.
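A sketch of such an adaptive rule is given below; the growth and shrinkage factors and the tolerance are illustrative assumptions rather than the values used in our software.

    def adapt_gain(mse_new, mse_old, gain, grow=1.05, shrink=0.7, tolerance=0.01):
        """Adaptive gain in the spirit of Vogl et al. (1988).

        Returns the updated gain, whether momentum should be applied on the
        next iteration, and whether the weight changes just computed should
        be kept."""
        if mse_new <= mse_old * (1.0 + tolerance):
            # Learning is progressing: raise the gain and keep both momentum
            # and the proposed weight changes.
            return gain * grow, True, True
        # The MSE worsened by more than the tolerance: cut the gain, switch
        # off momentum and discard the weight and bias changes this iteration.
        return gain * shrink, False, False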

A key aspect in setting up the network is deciding on the number and size of the hidden layers. The more complex the interactions between the variables, the more hidden units are required. If too few hidden neurons are used, the network will fail to learn the richness of the relationships; if too many are used then the data may be overfitted, fitting to individual data points rather than the trend and so reducing the network's ability to generalize. There are no hard and fast rules for deciding on the number of hidden neurons to use. It has been found that generally only one hidden layer is required for forecasting problems; however, the actual number of neurons required in that layer must be found by trial and error. As an alternative to optimizing the number of hidden nodes, we used an independent validation step in the training of the network, as described by Hoptroff (1993). Here the network is required to have sufficient nodes to fit the trend but is not streamlined to prevent overfitting. Hoptroff suggests that 10 nodes in the hidden layer are usually sufficient for most forecasting problems. More nodes can be used but usually result in slower learning without an improvement in results.

The approach requires that independent validation data are used to test how well the network is able to generalize to unseen data. The validation data are taken out of the training data and should be representative across the range of outcomes. A larger validation set is likely to be more representative, but it takes data away from the training set, so it is necessary to strike a balance between the training and validation data set sizes. When large amounts of data are available the selection of validation data can be made by simple random choice. In our case, the amount of data was limited, so we used a stratification approach. Here the data were ordered by ascending value of the dependent variable and partitioned into groups of roughly equal size, one for each validation point required. One validation point was then selected at random from each group. This stratification approach tries to ensure that the validation data are representative by choosing across the range of values for the dependent variable. Hoptroff (1993) suggests that the validation data should comprise 10-25% of the available data, and at least 10 data points. We used a more conservative 30% with a minimum of 15 data points. It is important to stress that these validation data were independent: they were not used as training examples, nor were they taken from the test data used for the final forecasting period.
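A sketch of this stratified selection is given below, assuming the values of the dependent variable are held in a numpy array; the function returns the indices of the observations to be held out for validation.

    import numpy as np

    def stratified_validation_indices(targets, n_validation, seed=0):
        """Choose validation observations across the range of the dependent
        variable: order the data by target value, split into roughly
        equal-sized groups and draw one observation at random from each."""
        rng = np.random.default_rng(seed)
        order = np.argsort(targets)                    # indices in ascending target order
        strata = np.array_split(order, n_validation)   # one stratum per validation point
        return np.array([rng.choice(stratum) for stratum in strata])

The observations indexed in this way are removed from the training set and used only to monitor the validation MSE.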


Each iteration of the training process is as follows. The network is presented with the set of training examples, from which weight and bias adjustments are made. The network is then tested using the independent validation data to find its ability to fit unseen data. Training is stopped at the iteration where the MSE for the validation data is minimized, which represents the point in training where the network is best able to generalize. In practice, it has been found that the MSE of the validation data can rise but then improve again. To overcome this, training is usually continued for some time after the optimum point has been reached to ensure that no further reduction in the MSE will occur. In our case, the network weights and biases are saved every time the MSE of the validation set is reduced, and training of the network is stopped if 2000 iterations occur without an improvement. This means that the final saved network represents the optimal training point, while the extra iterations offer a large safety margin to ensure that a better stopping point has not been missed. If the MSE of the validation set does not permanently get worse (i.e. no optimum point in training is reached), this suggests that the network does not contain enough nodes to overfit the data and is unlikely to be able to pick up the full underlying trend. In this situation training is restarted using more nodes in the hidden layer.

The reasoning behind the independent validation approach is that the underlying trend lies closer to the network's starting point than a model which has been fitted to individual data points. Since the steepest descent method tries to minimize the fitting error in as short a distance from the starting point as possible, it will fit the underlying structure before the detail of the individual data points. Thus stopping training at the right time prevents overfitting from occurring. Since the validation data are not used in training, the point where their MSE reaches its minimum should represent the point where the generalizing abilities of the model are maximized.
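A sketch of this stopping rule is given below; the network object and the train_one_pass and validation_mse routines are assumed interfaces standing in for our software, which is not reproduced here.

    def train_with_early_stopping(network, train_one_pass, validation_mse,
                                  patience=2000, max_iterations=100000):
        """Save the weights whenever the validation MSE improves and stop once
        `patience` iterations pass without any improvement."""
        best_mse = float("inf")
        best_weights = network.get_weights()
        iterations_since_improvement = 0

        for _ in range(max_iterations):
            train_one_pass(network)            # one presentation of the training data
            mse = validation_mse(network)      # error on the held-out validation set
            if mse < best_mse:
                best_mse = mse
                best_weights = network.get_weights()   # remember the best network so far
                iterations_since_improvement = 0
            else:
                iterations_since_improvement += 1
                if iterations_since_improvement >= patience:
                    break                      # no improvement for `patience` iterations

        network.set_weights(best_weights)      # restore the optimal training point
        return network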

3. Estimation and forecasting

In this section we present estimates of each of the four consumption functions using the two alternative approaches to model building. The extent to which each specification can explain the downturn in expenditure after 1988 is then examined. The type of econometric model used and its underlying assumptions are all fairly familiar and can, therefore, be seen as a benchmark against which the neural network models can be judged. The functional forms of the econometric specifications are those originally used by the model builders, and so their decisions on choice of variables and dynamic structure are adopted. The models are designed to capture short-run behaviour, so the dependent variable is the change in the log of consumers' expenditure. Since the neural network is not based on an underlying economic model we use the same explanatory variables as the input to the network to enable a direct comparison. The data used are the same for both approaches and are available from the authors on request. The sources for all the data are the Central Statistical Office publications, Economic Trends and the Monthly Digest of Statistics.

3.1. Econometric modelling

The OLS estimates of the first-stage regressions are shown in Table 1, where the two-step method is adopted. Detailed tests of the time-series properties for all the series that appear in the long-run solutions are presented in Church et al. (1994). The tests show that all the variables used in the first-stage regressions apart from the unemployment term in the GS model are integrated of order one. The stationary unemployment term should not really appear in the cointegrating relationship but does form part of the long-run economic relationship. The interest rate variables and LBS demographic term are all stationary and so are consistent with inclusion in the second stage. The Dickey-Fuller (DF) and augmented Dickey-Fuller (ADF) tests quoted in Table 1 indicate some evidence for cointegration in both LBS and GS models, although the case is weaker for the GS specification. The NIESR model imposes the first-stage relationship, and this proves to be the cointegrating vector which shows the strongest evidence that the residuals are stationary.

The second-stage regressions and dynamic model of the BE are given in Table 2. Virtually all the estimated coefficients are at least twice the magnitude of their standard errors and the only diagnostic test failed is that of the BE model for autocorrelation. The fact that all the models pass the Reset test indicates that the log-linear functional form is probably an adequate description of the data, although given the general nature of this test a more complicated model remains a possibility.

Although the equations are replicated on seasonally adjusted 1992 vintage data, the sample period originally used in each case is retained to try to replicate as closely as possible the published equation of each of the institutions represented. More detailed results and testing are contained in Church et al. (1994). One of the conclusions of that paper is that, following the last breakdown in consumers' expenditure equations, as described in Carruth and Henley (1990), the relationships of that era have since been successfully augmented so that the rise in expenditure between 1985 and 1987 is now adequately described. This has been achieved to a certain extent by placing greater emphasis on the role of the housing market. Although these innovations have helped to explain one difficult period, they are found to be unable to capture the fall in expenditure that occurred in the early 1990s. This is demonstrated by estimating the models up until the last quarter of 1988 and then calculating static forecasts. These forecasts, together with the actual changes in expenditure, are shown in Fig. 3.

Table 1
First-stage regressions

                        LBS              GS
Dependent variable      ln(C)            ln(C)
Sample period           69:1-90:1        71:2-89:4
Estimation method       OLS              OLS
Constant                1.31 (0.29)      0.913 (0.332)
ln(Y)                   0.757 (0.039)    0.828 (0.046)
ln(HW)                  0.059 (0.016)    0.046 (0.020)
ln(FW)                  0.093 (0.008)    0.059 (0.010)
Δ(U)                    -                -0.030 (0.008)
R²                      0.991            0.989
σ                       0.016            0.015
DW                      1.06             1.27
DF                      -5.3**           -5.94*
ADF(4)                  -2.9*            -2.69

Notes for Tables 1 and 2:
C       Total consumers' expenditure
Y       Real personal disposable income
HW      Housing wealth
FW      Financial wealth
INT     Interest rate
U       Unemployment
dem     Proportion of the population aged between 45 and 64
wealth  Total wealth
RB      Change in stock of liquid assets
A       Liquid assets
ECM     Error correction mechanism
Rather than estimate an ECM, in the NIESR model the relationship below is used in the second-stage regression:
ECM = ln(C - 100ΔSCC/PC) - ln(Y)
where
SCC     Stock of consumer credit
PC      Consumers' expenditure deflator
** Significant at 1% level
*  Significant at 5% level

3.2. Neural network modelling

The estimation of the neural network models presented here requires an appropriate software package. The software used was developed by the authors and contains the features described in Section 2; details of several packages that are capable of repeating this exercise are given in DTI (1994b). The neural network was used as a regression tool to learn the relationship between a set of explanatory variables and the dependent variable, namely the change in consumer expenditure. Thus the neural network is used in a similar way to the OLS method, but allows highly non-linear relationships to be fitted if they exist.

The set of experiments with the neural networks involved training models using exactly the same explanatory variables and data sets as for each of the econometric models. As a consequence the neural network is subject to the same insufficiencies in the data as the econometric models. The shape of the neural network is dependent on the number of variables, while the hidden layer is required to have sufficient nodes to model the relationship but need not be optimized to prevent overfitting, since independent validation is being used. Our networks used 10 nodes in the hidden layer, which was found to be sufficient for all the models. The final trained network represents the relationships between the explanatory variables and the dependent variable.


Table 2
Second-stage regressions

Dependent variable   Sample period   Estimation method   Constant   Δln(C)_{-4}   Δln(Y)   Δln(Y)_{-1}   Δln(Y)_{-5}
dem   INT   INT_{-1}   ln(wealth)_{-1}   Δln(wealth)

LBS   Δln(C)   69:2-90:1   OLS   0.168 (0.031)   -0.125 (0.055)   0.272 (0.043)   0.163 (0.044)   -0.062 (0.042)   -0.630 (0.122)   -0.288 (0.054)

NIESR   Δln(C)   67:1-87:4   OLS   -0.049 (0.044)

BE   Δln(C)   76:1-91:3   OLS   -0.081 (0.034)

GS   Δln(C)   71:3-89:4   OLS   0.004 (0.001)

0.216 (0.053) 0.189 (0.047)

0.608 (0.105)

0.189 (0.064)

-0.0003 (0.0003) 0.020 (0.009) 0.108 (0.066)

0.393 (0.099) 0.101 (0.030) 0.030 (0.015)

Δln(HW)   Δln(FW)
Δln(RB)
ln(C/Y)_{-1}

0.558 (0.171) -0.347 (0.093) 0.054 (0.028) -2.062 (0.408) -0.032 (0.008) -0.218 (0.079) 0.709 0.009 1.83 9.54* 5.26 1.41 1.24 12.4(12) 0.658 0.008 2.09 6.17 2.82 0.45 0.01 9.23(12)

ln(A/Y)_{-1}   ΔU   ΔΔ(U)


ECM_{-1}

-0.170 (0.059) 0.771 0.007 2.21 3.47 2.22 1.20 0.25 12.7(18)

-0.102 (0.064) 0.752 0.007 2.39 5.14 7.23 0.28 4.92 11.8(17)

Equation diagnostics
R²   σ
DW
Autocorrelation LM(4)
ARCH(4)
Reset (1)   Normality χ²(2)   Heteroskedasticity χ²


Fig. 3. Change in total consumers' expenditure; actual and forecast values.

Fig. 4. LBS model. Change in total consumers' expenditure; actual and forecast values.

There was a wide variation in the number of iterations required to reach the optimal point, from 6789 for the BE model, 4708 for the GS, 578 for the NIESR, and 167 for the LBS. Observation of the errors during training showed that in all models, both training and validation set errors fell quickly during the initial stages of learning. For the NIESR and LBS models, a point was quickly reached where the training error continued to fall but the validation error began to rise steadily, indicating that the network was starting to fit the noise rather than the underlying trend. In the case of the GS and BE models, the reduction of training and validation errors was slow but steady, until the validation error began to slowly rise. The more steady learning suggests that training took longer to fit the trend, but was less affected by noise.

The test data are applied to the network in the same way as the training data, except that the weights and biases are fixed at their final values. The output of the network for each set of variables from the test set is then rescaled to form the forecast value. The static one-step-ahead forecasts from the neural network models are compared to those from the econometric specifications in Figs. 4-7. The econometric models have a slight advantage because the choice of variables is based on the results of estimation over the full sample specified in Table 2. Therefore, the choice of specification is influenced by the data points that are to be forecast.
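A sketch of this forecasting step is given below; the weight matrices and bias vectors are assumed to be those saved at the optimal training point, and unscale denotes the inverse of the 0.2 to 0.8 scaling applied to the dependent variable.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def static_forecasts(test_inputs, W_hidden, b_hidden, W_output, b_output, unscale):
        """One-step-ahead forecasts: pass each test observation through the
        trained network with the weights and biases held fixed, then map the
        output back from the 0.2-0.8 training range into growth rates."""
        forecasts = []
        for x in test_inputs:
            hidden = sigmoid(W_hidden @ x + b_hidden)
            output = sigmoid(W_output @ hidden + b_output)
            forecasts.append(unscale(output))
        return np.asarray(forecasts).ravel()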

Fig. 5. NIESR model. Change in total consumers' expenditure; actual and forecast values.

By contrast, although the neural network model uses the same right-hand side variables, none of the forecast data are used in either training or validating the final specification.

Generally, the network forecasts follow closely those generated by the econometric models. The neural network model using the LBS choice of variables performs even worse than its econometric equivalent, failing to predict any downturn.

Fig. 6. BE model. Change in total consumers' expenditure; actual and forecast values.

Fig. 7. GS model. Change in total consumers' expenditure; actual and forecast values.

In the NIESR example it is the model-based forecast which is slightly better at the start of the forecast period and the neural network which is preferable towards the end. The comparison with the BE model does not allow a clear preference to be made, with both models overpredicting badly from 1989 to the start of 1991. The neural network does come close to capturing the low point of the recession, but both models subsequently return to overprediction.

The GS choice of explanatory variables proves to be the best for both econometric and neural network specifications. Fig. 7 shows that the econometric model picks up the downturn well, actually forecasting a larger downturn in consumption growth than was witnessed between the third quarter of 1989 and the start of 1991. Subsequently, the model is slightly too optimistic about the speed of the recovery. One of the main reasons why the GS model performs better than the other models is the inclusion of terms in the change and rate of change of the unemployment rate. The actual profile of the forecast is sensitive to the functional form that is chosen for the unemployment rate. Similar specifications in Church et al. (1994) which use the logarithm of the rate tend not to capture the initial downturn as accurately, but do not overpredict at the end of the forecast.

The neural network is designed to fit a model to the overall trend shown in the data rather than any noise. In the GS example, the network forecast follows the actual data down without picking up individual peaks and troughs. The model does not underpredict in the way of its econometric equivalent, but nor does it fully pick up the lowest point of the downturn.

It shares the overprediction seen in the final few periods.

A final exercise was conducted which presented the network with the inputs used in all the models (25 in total) to examine whether the provision of all the information available can yield better results than any one of the models. This approach is only valid using the neural network model, as the quality of the forecast is the only concern with this method; the specification of the econometric model is constrained by the need to be consistent with economic theory. The forecasts for the combined model are shown in Fig. 8. This model shares several of the characteristics of the GS specification. The downturn is captured adequately and in this case coincides with the bottom of the cycle. The magnitude of the recovery is again overstated. The results from this final exercise show that the neural network is able to pick out the important features from a large number of variables to produce a model which performs better than any of the encompassed models. However, great care must be taken to use a representative validation set, since the high parameterization offered by the large number of variables leads to a model which is more sensitive to the training process.

There are several points emerging from this section. The first is that neural networks do not require vast quantities of data for effective forecasting performance. The neural network produces similar forecasts to the econometric models and does not outperform them, indicating that the process underlying the behaviour of consumers' expenditure is not highly non-linear.

Fig. 8. Total model. Change in total consumers' expenditure; actual and forecast values.


Finally, if the information required to make an accurate forecast is not contained within the data set, no forecasting method will produce adequate prediction.

4. Properties of a neural network model

The econometric models in this comparison all contain estimates of parameters that will be of interest to economists, the marginal propensity to consume being one example. The sole output of the neural network is the forecast, which reveals nothing about the underlying economic properties. A further comparison between econometric and network models can be made by perturbing or shocking each of the network inputs to gauge the sensitivity with respect to each variable. This involves testing the neural network model with the average values from the complete data set to find the forecast for the dependent variable; each variable is then shocked, one at a time, to find the degree to which the shock affects the forecast of the dependent variable. These results can be directly compared to the elasticities in the econometric specification. Table 3 shows the impact on the change in expenditure in the GS model of increasing each of the logged inputs by one per cent and the unemployment terms by one percentage point from their average values, for both econometric and neural network models.

Table 3
Response of Δln(C) to a one per cent increase in right-hand side variables

                GS        Neural network
ln(C)_{t-1}     -0.218    -0.116
ln(Y)_{t-1}      0.180     0.123
ln(HW)_{t-1}     0.010    -0.017
ln(FW)_{t-1}     0.013     0.020
Δ(U)_{t-1}*     -0.645    -0.906
Δln(Y)           0.189     0.200
Δln(HW)          0.101     0.034
Δln(FW)          0.030     0.018
ΔΔ(U)*          -3.161    -2.018

Note: * One percentage point increase in unemployment rate.

There are several key features shared by the two competing explanations. The only 'coefficient' sign that differs is that for housing wealth in the long run, which has a negative effect in the neural network model. The dynamic terms in income are of very similar size, while changes in wealth have a smaller impact in the network model, but the relative importance of housing and financial wealth is comparable. The magnitudes of the consumption and income responses are smaller in the network model but virtually equal and opposite, suggesting a long-run unit elasticity. Unemployment has already been seen to play an important role in explaining the downturn in expenditure. Changes in the unemployment rate give a larger decrease in expenditure growth in the network model, while the econometric specification is more responsive to the rate of this change. The overall impact of a one percentage point increase is smaller in the network version, which helps to explain why the econometric model actually forecasts a more rapid fall in expenditure growth than the network model (and the actual outcome) from the end of 1989 to the start of 1991.

With a log-linear specification, the econometric models will have constant elasticities and no interactions between variables. However, if the determination of consumers' expenditure is highly non-linear then we might suspect that a 10% shock would not be equal to 10 times a 1% shock. To investigate, variant simulations are conducted on the neural network model where shocks of between minus and plus 20% in unit intervals are applied to the variables entered in logged form. The unemployment rate is decreased by two percentage points and then increased by 0.1 until the rate is two percentage points above base. The results are illustrated in Fig. 9. A sample of four variables from the GS model shows how the growth rate of consumption changes with the size of shock. The curvature seen in the plots illustrates the degree of non-linearity in the relationship.
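A sketch of this perturbation exercise is given below; predict_growth stands in for the trained network's forecast function and, like the variable names, is an assumed interface rather than part of the original software.

    import numpy as np

    def input_sensitivities(predict_growth, names, mean_inputs, logged_names, shock=0.01):
        """Shock each input away from its sample mean, one at a time, and
        record the change in the forecast growth of expenditure (cf. Table 3).

        Inputs named in logged_names are entered in logs and shocked by 1 per
        cent; the remaining (unemployment) terms are shocked by one
        percentage point."""
        base = predict_growth(np.asarray(mean_inputs, dtype=float))
        responses = {}
        for i, name in enumerate(names):
            shocked = np.asarray(mean_inputs, dtype=float).copy()
            if name in logged_names:
                shocked[i] += np.log(1.0 + shock)   # a 1% rise in the underlying level
            else:
                shocked[i] += 1.0                   # one percentage point on the rate
            responses[name] = predict_growth(shocked) - base
        return responses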

Fig. 9. Response of expenditure to changes in explanatory variables.

Typically, for small changes in the explanatory variables the effects are negligible, but when larger shocks are applied the slight curvature does lead to noticeable differences. The income variables show the largest non-linearity, as seen by the difference in the absolute effects when the shocks of plus and minus 20% are applied. Shocks of this size are rarely seen in reality, however. Further experiments where two variables are shocked simultaneously also indicate that the combined effects are little different from the sum of the outcomes of the individual shocks. Overall, the conclusion is that a log-linear econometric model is probably adequate for describing the data in this case.

5. Conclusions

Several points emerge from this paper. The common belief that neural networks require a vast number of data points to be effective can be discounted, and the forecasts produced are no worse than those using standard econometric methods. Indeed, despite the possibility of fitting a highly non-linear model, it appears that the neural network produces a very similar specification to the econometric method. None of the models here can completely explain the downturn in consumers' expenditure after 1988. In the case of the LBS and NIESR equations it appears that the information required to solve the forecasting problem is not in these data sets. The BE and particularly the GS models perform better, and this seems to be due to the inclusion of terms involving the unemployment rate. While the economic model builder is expected to satisfy statistical criteria and tests down from a general to a specific model in which all the coefficients are significantly different from zero, the neural network contains no underlying assumptions and, even when presented with a large number of inputs, can discriminate between them, giving low weight to irrelevant information.

The overall conclusion is that every forecasting method is dependent on the quality and nature of the data used. If the answer is not in the chosen data, then no method can extract it. The neural network technology is easier to apply because the model builder does not need to worry about the economic properties of the final specification. In both methods, the role of judgement in choosing the appropriate explanatory variables is the most important factor. The user of the neural network has the additional task of ensuring that the type and quantity of data used in the validation is sufficient and typical of the data as a whole.

Acknowledgements

The authors are grateful to Ajay Patel, whose MSc dissertation demonstrated the initial feasibility of this project. We also thank Ken Wallis and two anonymous referees for comments on the paper, but as ever we take responsibility for the accuracy and interpretation of all the results.

References
Azoff, E.M., 1994, Neural Network Time Series Forecasting of Financial Markets (Wiley, Chichester).



Carruth, A. and A. Henley, 1990, Can existing consumption functions forecast consumer spending in the late 1980s?, Oxford Bulletin of Economics and Statistics, 52, 211-222.
Church, K.B., P.N. Smith and K.F. Wallis, 1994, Econometric evaluation of consumers' expenditure equations, Oxford Review of Economic Policy, 10, 71-85.
Department of Trade and Industry, 1993, Neural Computing Applications Portfolio (DTI, London).
Department of Trade and Industry, 1994a, Best Practice Guidelines for Developing Neural Computing Applications (DTI, London).
Department of Trade and Industry, 1994b, Directory of Neural Computing Suppliers, Products and Sources of Information (DTI, London).
Engle, R.F. and C.W.J. Granger, 1987, Cointegration and error correction: Representation, estimation and testing, Econometrica, 55, 251-276.
Hart, A., 1992, Using neural networks for classification tasks - some experiments on datasets and practical advice, Journal of the Operational Research Society, 43, 215-226.
Hendry, D.F., J.N.J. Muellbauer and A. Murphy, 1990, The econometrics of DHSY, in: J.D. Hey and D. Winch, eds., A Century of Economics: 100 Years of the Royal Economic Society and the Economic Journal (Basil Blackwell, Oxford) 298-334.
Hoptroff, R.G., 1993, The principles and practice of time series forecasting and business modelling using neural networks, Neural Computing and Applications, 1, 59-66.
Lippmann, R.P., 1987, An introduction to computing with neural nets, IEEE ASSP Magazine, 4(2), 4-22.
Rumelhart, D.E., G.E. Hinton and R.J. Williams, 1986, Learning internal representations by error propagation, in: D.E. Rumelhart, J.L. McClelland and the PDP Research Group, eds., Parallel Distributed Processing (MIT Press, Cambridge, MA) 318-362.
Smith, M., 1993, Neural Networks for Statistical Modeling (Van Nostrand, New York).
Vogl, T.P., J.K. Mangis, A.K. Rigler, W.T. Zink and D.L. Alkon, 1988, Accelerating the convergence of the back-propagation method, Biological Cybernetics, 59, 257-263.

Biographies: Keith CHURCH is a Research Associate in the ESRC Macroeconomic Modelling Bureau at the University of Warwick. His research interests are concerned with explaining the difference in properties across large-scale macroeconometric models of the UK economy.

Stephen CURRAM is a lecturer in Operational Research and Systems at the Warwick Business School. His research interests include the application of neural networks for management decision making and the representation of intelligent decision making in computer simulation models.
