
Bankruptcy Prediction

with Artificial Neural Networks


Eugenio Fernandez*,** and Ignacio Olmeda*

*Dpto. de Fundamentos de Economía e Historia Económica
**Dpto. de Matemáticas
Universidad de Alcalá
Alcalá de Henares 28802 Madrid SPAIN

Abstract: In this paper we compare the forecasting accuracy of feedforward neural networks against various competing
models (C4.5, MARS, Discriminant Analysis and Logit) on the problem of predicting bankruptcy. The neural network model
is found to provide generally better results, though its computational cost is several orders of magnitude higher. We also
consider mixtures of the methods and show that several of these mixtures are consistently more accurate than any single
method. We suggest that an optimal system for risk rating should include two or more of the models considered.

1. Introduction.

Financial agents are increasingly interested in the use of Artificial Neural Networks (ANNs),
as well as other appealing techniques such as Genetic Algorithms or Machine Learning, for
modelling and forecasting purposes. The reason for this is quite obvious: if these "high-tech" tools were
truly more powerful, the competitive advantage from using them would be decisive, at least until these
technologies were adopted by every agent, so that differential benefits were fully arbitraged away. The number of
successful applications of ANNs reported has been so high that a "folk theorem" asserts their
universality and superiority over any other procedure. Considering that the process of developing an
ANN-based Decision Support System (DSS) is considerably more costly than developing a traditional statistical
one, it is crucial, from an economic point of view, to determine the soundness of this belief.

Comparisons of the forecasting accuracy of ANNs against other models in classification
problems are relatively common in the literature. Most of these comparisons consider only a single competing
model (such as Discriminant Analysis), so the appropriateness of ANNs in a general forecasting
context is not resolved. Another salient feature of the studies mentioned is that they employ only single
models as the alternative, not combinations of two or more of them. In this paper we address
both questions by employing recently developed methods as alternatives, as well as mixtures of them.
We will show that although ANN models can be near optimal (under a forecasting criterion) when
compared against the traditional or sophisticated models considered, a combination of the methods
generally provides better results.
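One simple way to combine the binary (failed / non-failed) outputs of several classifiers is a majority vote. The sketch below is purely illustrative — the paper does not specify which combination rule its mixtures use:

```python
def majority_vote(predictions):
    """Combine binary (0 = non-failed, 1 = failed) predictions from
    several classifiers by simple majority vote; ties resolve to 0."""
    return 1 if 2 * sum(predictions) > len(predictions) else 0

# e.g. two models predict failure, one does not -> combined prediction: failure
combined = majority_vote([1, 1, 0])
```

More elaborate mixtures (e.g. weighting each model by its validation accuracy) follow the same pattern of mapping individual predictions to a single consensus label.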

2. Methods compared.

The models chosen for comparison include a standard feedforward neural network with a single
hidden layer trained with backpropagation (NN), two classical statistical techniques, Discriminant
Analysis (DA) [Fisher, 1936] and Logit (Logit), and two recent extensions of the CART algorithm of
Breiman et al. (1984): Multivariate Adaptive Regression Splines (MARS) and C4.5 (C4.5). The first
three approaches are already well known, so we will only briefly describe the last two.
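A single-hidden-layer feedforward network trained with backpropagation can be sketched as follows. The layer size, sigmoid activations, squared-error loss and learning rate are illustrative choices, since the paper does not report its exact configuration:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OneHiddenLayerNet:
    """Minimal feedforward net with one hidden layer, trained with
    plain backpropagation (online gradient descent on squared error)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations, then a single sigmoid output in (0, 1).
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, t, lr=0.5):
        h, y = self.forward(x)
        # Output delta for squared error E = 0.5*(y - t)^2 with sigmoid output.
        d_out = (y - t) * y * (1 - y)
        for j, hj in enumerate(h):
            # Hidden delta uses the pre-update output weight.
            d_h = d_out * self.w2[j] * hj * (1 - hj)
            self.w2[j] -= lr * d_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_h * xi
            self.b1[j] -= lr * d_h
        self.b2 -= lr * d_out
```

In practice the inputs would be the nine financial ratios and the target the failed / non-failed label; the toy dimensions above only demonstrate the mechanics.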

MARS [Friedman, 1991] is a nonparametric technique which exploits the ideas of stepwise
regression and recursive partitioning. The model begins with a simple structure (say, a linear regression
in the explanatory variables) and successively adds new higher-order terms (basis functions), for
example, cross products between variables. Whenever a new function fails to improve the fit to the
data, the model splits the region into different subregions and repeats the procedure. The regions are
subdivided in the following manner: for each predictor variable xi fixed at a given level (called a
knot), MARS considers two subregions, one including the data points with values higher than the knot,
and its complement. After performing a regression in each of these subregions, the fit is computed
and then a different splitting variable or a different knot level is tried. This procedure is repeated and the
best parameters are chosen.
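The knot search described above can be sketched as follows. For simplicity this fits only a constant (the subregion mean) on each side of a candidate knot, rather than MARS's full spline regression, and searches a single variable:

```python
def best_knot(xs, ys):
    """Exhaustive knot search in the spirit of MARS's region splitting:
    for each candidate knot, split the sample into two subregions,
    fit a constant (the mean) in each, and keep the knot with the
    lowest total squared error. Returns (knot, error)."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    for t in sorted(set(xs))[:-1]:  # candidate knots (all but the largest value)
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (t, err)
    return best

# A variable with an obvious break between x <= 4 and x > 4:
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
knot, err = best_knot(xs, ys)  # knot == 4
```

Repeating this search over every predictor, and recursing into the resulting subregions, gives the overall splitting procedure.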

The C4.5 algorithm incorporates several improvements over the well-known ID3 [Quinlan, 1983]. C4.5
generates a decision tree by evaluating the information gain of further partitioning the tree at a given
stage. The algorithm begins with a minimal tree and selects the attribute which produces the most
informative partition of the training cases (which is equivalent to minimizing the entropy of the
partition). Each of the leaves generated is treated again as a new tree, and the procedure iterates until
there are no misclassifications in the training data. The resulting tree is "pruned" to produce a minimal
tree by reducing its complexity while conserving its generalization properties.
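The entropy-minimizing attribute choice can be illustrated directly. The attribute names below are hypothetical, and the attribute values are assumed discrete; C4.5 in fact refines plain information gain with a gain-ratio criterion, omitted here for brevity:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class-label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in entropy from partitioning the rows on one discrete
    attribute -- the quantity ID3/C4.5 maximize when choosing a split."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A (hypothetical) discretized ratio that separates the classes perfectly:
rows = [{"ratio": "low"}, {"ratio": "low"}, {"ratio": "high"}, {"ratio": "high"}]
labels = ["failed", "failed", "non-failed", "non-failed"]
gain = information_gain(rows, labels, "ratio")  # 1.0 bit: a perfect split
```

The attribute with the highest gain becomes the split at the current node, and the procedure recurses on each resulting leaf.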

Though conceptually very similar, these methods differ both in their structure (the MARS
algorithm uses truncated cubic polynomials as basis functions, while C4.5 uses step functions) and
in their performance criterion (the MARS algorithm minimizes a cross-validated error, while C4.5
maximizes an information criterion); consequently, they can lead to different conclusions.

3. Database used and results.

From 1977 to 1985 the Spanish banking system suffered the worst crisis of its whole history,
affecting 52% of the 110 banks that were operating at the beginning of this period. Such concentration
in time offers the opportunity to compare alternative methods for bankruptcy prediction, since the
economic conditions can be considered stable enough to assess the significance of the financial ratios
used. Following previous studies (see Pina, 1989 and references therein), we employ a database
consisting of 66 banks (29 failed and 37 non-failed) and 9 financial and economic ratios (working
capital/total assets, sales/total assets, etc.). This database was randomly split into two sets: Set 1
consisted of 34 banks (15 failed and 19 non-failed) and Set 2 of 32 banks (14 failed and 18 non-failed).
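A random split of this kind can be sketched as follows; the bank identifiers, labels and seed are placeholders, not the paper's actual data, and the paper's exact per-class counts suggest its draw simply happened to fall this way rather than being stratified:

```python
import random

# 66 banks: 29 failed, 37 non-failed (identifiers are hypothetical).
banks = [("bank_%02d" % i, "failed" if i < 29 else "non-failed")
         for i in range(66)]

rng = random.Random(42)  # fixed seed so the split is reproducible (assumption)
rng.shuffle(banks)
set1, set2 = banks[:34], banks[34:]  # estimation set and prediction set
```

Models are then estimated on `set1` and their out-of-sample accuracy measured on `set2`.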

We tried a variety of specifications for each of the models (number of basis functions for
MARS, number of leaves for C4.5, number of hidden nodes for the NN, etc.), always using all the
attributes. For brevity we give only the results for the best model found (full results are
available upon request). First we estimated the models on Set 1 and used them for predicting on Set 2,
