www.elsevier.com/locate/eswa
Department of Business Administration, Hallym University, 39 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-702, Korea
Graduate School of Management, KAIST, 207-43 Cheongryangri-Dong, Dongdaemoon-Gu, Seoul 130-012, Korea
Abstract
Bankruptcy prediction is an important and widely studied topic since it can have significant impact on bank lending decisions and profitability.
Recently, the support vector machine (SVM) has been applied to the problem of bankruptcy prediction. The SVM-based method has been
compared with other methods such as the neural network (NN) and logistic regression, and has shown good results. The genetic algorithm (GA)
has been increasingly applied in conjunction with other AI techniques such as NN and case-based reasoning (CBR). However, few studies have
dealt with the integration of GA and SVM, though there is a great potential for useful applications in this area. This study proposes methods for
improving SVM performance in two aspects: feature subset selection and parameter optimization. GA is used to optimize both a feature subset and
parameters of SVM simultaneously for bankruptcy prediction.
© 2005 Elsevier Ltd. All rights reserved.
Keywords: Support vector machines; Bankruptcy prediction; Genetic algorithms
1. Introduction
Bankruptcy prediction is an important and widely studied
topic since it can have significant impact on bank lending
decisions and profitability. Statistical methods and data mining
techniques have been used for developing more accurate
bankruptcy prediction models. The statistical methods include regression, discriminant analysis, logistic models, factor analysis, etc. The data mining techniques include decision trees, neural networks (NNs), fuzzy logic, genetic algorithms (GA), support vector machines (SVM), etc.
Bankruptcy has long been an important issue in accounting and finance, and various models have been developed to forecast it. The prediction is a binary decision, in terms of two-class pattern recognition. Beaver (1966) originally proposed univariate analysis of financial ratios to predict the problem, and many later studies improved the decision with a variety of statistical methodologies. Linear discriminant analysis (Altman, 1968; Altman, Haldeman, & Narayanan, 1977), multiple regression (Meyer & Pifer, 1970),
* Corresponding author. Address: Department of Business Administration, Hallym University, 39 Hallymdaehak-gil, Chuncheon, Gangwon-do 200-702, Korea. Tel.: +82-33-248-1841; fax: +82-33-256-3424.
E-mail address: shmin@hallym.ac.kr (S.-H. Min).
0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2005.09.070
Fig. 1. Linear separating hyperplanes for the separable case (the support vectors are circled).
Kernel functions:
Simple dot product: $K(x, y) = x \cdot y$
Vovk's polynomial: $K(x, y) = (x \cdot y + 1)^p$
Radial basis function (RBF): $K(x, y) = e^{-\|x - y\|^2 / 2\sigma^2}$
Maximize
$$\sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
subject to
$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0.$$
The solution gives the decision function
$$f(x) = \sum_{i=1}^{l} y_i \alpha_i K(x, x_i) + b.$$
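The RBF kernel above can be sketched in a few lines of NumPy. This is an illustrative helper, not code from the paper; the function and parameter names are ours. It computes the Gram matrix of kernel values for two sets of points:

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gram matrix of the RBF kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    X: (n, d) array, Y: (m, d) array -> (n, m) matrix of kernel values.
    """
    # ||x - y||^2 expanded as ||x||^2 - 2 x.y + ||y||^2, computed for all pairs
    sq_dist = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ Y.T + (Y ** 2).sum(axis=1)[None, :]
    return np.exp(-sq_dist / (2.0 * sigma ** 2))
```

Each diagonal entry of `rbf_kernel(X, X)` is 1, since K(x, x) = e^0.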
2. Research background
In this paper, we define the bankruptcy problem as a non-linear problem and use the RBF kernel to optimize the hyperplane.
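A non-linear SVM of this kind can be sketched with scikit-learn. This is not the authors' setup: the toy data, parameter values, and the mapping of the paper's δ² to scikit-learn's `gamma` (via gamma = 1 / (2δ²)) are our illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

delta_sq = 30.0                                   # plays the role of the paper's delta^2 (assumed)
clf = SVC(kernel="rbf", C=70, gamma=1.0 / (2.0 * delta_sq))

# Two well-separated toy clusters standing in for non-bankrupt / bankrupt firms.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [8.0, 8.0], [8.0, 9.0], [9.0, 8.0], [9.0, 9.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
clf.fit(X, y)
```

In the paper's framing, C and δ² are exactly the quantities the GA searches over, alongside the feature subset.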
Table 1
Steps of GA-SVM
Step 1 Define the string (or chromosome): V1_i = (s, t, ..., r) (features of SVM are encoded into chromosomes); V2_i (parameters of SVM are encoded into chromosomes)
Step 2 Define the population size (N_pop), probability of crossover (P_c) and probability of mutation (P_m)
Step 3 Generate a binary-coded initial population of N_pop chromosomes randomly
Step 4 While the stopping condition is false, repeat the following steps
Step 5 Decode the jth chromosome (j = 1, 2, ..., N_pop) to obtain the corresponding feature subset V1_j and parameters V2_j
Step 6 Apply V1_j and V2_j to the SVM model to compute the output, O_k
Step 7 Evaluate the fitness F_j of the jth chromosome using O_k (fitness function: average predictive accuracy)
Step 8 Calculate the total fitness of the population: TF = Σ_{i=1}^{N_pop} F_i(V1_i, V2_i)
Step 9 Reproduction
9.1 Compute q_i = F_i(V1_i, V2_i) / TF
9.2 Calculate the cumulative probability
9.3 Generate a random number r in [0, 1]. If r < q_1, select the first string (V1_1, V2_1); otherwise, select the jth string such that q_{j-1} < r ≤ q_j
Step 10 Generate the offspring population by performing crossover and mutation on parent pairs
10.1 Crossover: generate a random number r_1 in [0, 1] for a new string; if r_1 < P_c, perform crossover
10.2 Mutation: generate a random number r_2 in [0, 1] and select a bit for mutation at random; if r_2 < P_m, flip that bit
Step 11 Stop the iterative process when the terminal condition is reached
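Steps 8 and 9 amount to roulette-wheel (fitness-proportionate) selection. The following is an illustrative sketch, not the authors' code; names are ours:

```python
import random

def roulette_select(fitnesses, rng=random):
    """Roulette-wheel selection: q_i = F_i / TF, and the first chromosome whose
    cumulative probability exceeds a uniform random r in [0, 1) is selected."""
    total = sum(fitnesses)          # TF, the total fitness of the population
    r = rng.random()
    cumulative = 0.0
    for j, f in enumerate(fitnesses):
        cumulative += f / total     # cumulative probability q_j
        if r < cumulative:
            return j
    return len(fitnesses) - 1       # guard against floating-point round-off
```

Fitter chromosomes occupy a larger slice of [0, 1), so they are drawn proportionally more often while weaker ones still retain a chance.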
656
$$\text{Fitness} = \frac{\sum_{i=1}^{n} H_i}{n}$$
where H_i is 1 if the actual output equals the predicted value of the SVM model, and 0 otherwise.
During the evolution, the simple crossover operator (traditional one-point crossover) is used, and the mutation operator simply flips a specific bit. With the elitist survival strategy, we preserve the elite set not only between generations but also during crossover and mutation, so that the full benefit of the GA operations is retained. The details of the proposed model are given in algorithmic form in Table 1.
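The two operators can be sketched as follows. This is an illustrative implementation under our own naming, matching the description above: one-point crossover applied with probability P_c, and a single randomly chosen bit flipped with probability P_m:

```python
import random

def one_point_crossover(p1, p2, pc, rng=random):
    """Traditional one-point crossover: with probability pc, swap the tails of
    two parent bit-strings at a random cut point."""
    if rng.random() < pc:
        cut = rng.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def bit_flip_mutation(chromosome, pm, rng=random):
    """With probability pm, flip one randomly chosen bit of the chromosome."""
    child = chromosome[:]
    if rng.random() < pm:
        i = rng.randrange(len(child))
        child[i] = 1 - child[i]
    return child
```

Because crossover only exchanges tails, the multiset of bits across the two children always equals that of the two parents.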
4. Experimental design
The research data used in this study were obtained from a commercial bank in Korea. The data set contains 614 externally non-audited medium-sized light-industry firms. Among these, 307 companies filed for bankruptcy between 1999 and 2002. Initially, 32 financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through a literature review and basic statistical methods.
Out of the total of 32 financial ratios, four feature subsets were selected for the experiment. The selected variables and feature subsets are shown in Table 2. In Table 2, 32FS represents all financial ratios. 30FS denotes the 30 financial ratios selected by an independent-sample t-test with each financial ratio as the input variable and bankruptcy or non-bankruptcy as the output variable. 12FS and 6FS represent the feature subsets selected by the stepwise logistic regression and stepwise MDA methods, respectively.
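The t-test screening that produces 30FS can be sketched as below. This is our illustrative implementation, not the authors' procedure in detail; the function name and the significance level are assumptions:

```python
import numpy as np
from scipy import stats

def ttest_filter(X, y, alpha=0.05):
    """Return indices of the columns (ratios) of X whose means differ
    significantly between the bankrupt (y == 1) and non-bankrupt (y == 0)
    groups, per an independent-sample t-test at level alpha."""
    bankrupt, healthy = X[y == 1], X[y == 0]
    keep = []
    for j in range(X.shape[1]):
        _, p = stats.ttest_ind(bankrupt[:, j], healthy[:, j])
        if p < alpha:
            keep.append(j)
    return keep
```

Ratios whose distributions do not separate the two classes are discarded before the heavier wrapper search.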
Table 2
Variables and feature subsets

Categories: stability, profitability, growth, activity, cash flow.

Features (32): Quick ratio; Debt/total asset; Debt repayment coefficient; Debt ratio; Equity capital ratio; Debt/total asset; Cash ratio; Financial expenses to sales; Operating income/net interest expenses; Financial expenses to debt ratio; Net financing cost/sales; Times interest earned (interest cover); Ordinary income to total asset; Return on total asset; (Operating profit + non-operating profit)/capital; Net income/capital; EBIT/interest cost; EBITDA/interest cost; Sales increase ratio; Growth rate of sales; Net profit increase rate; Inventory change/sales; Account receivable change/sales; Working capital change/sales; Operating asset change/sales; Cash operational income/debt; Cash operational income/interest expenses; Debt service coverage ratio; Cash flow from operating activity/debt; Cash flow from operating activity/interest expenses; Cash flow after interest payment/debt; Cash flow after interest payment/interest expenses.

[The indicator columns showing membership in 12FS, 30FS and 32FS, and which features were selected by GA-SVM, did not survive extraction and are omitted.]
Table 3
Classification accuracies (%) of various parameters in pure SVM using various feature subsets

The table reports training (Tr) and validation (Val) accuracy for each feature subset (6FS, 12FS, 30FS, 32FS) over the parameter grid C ∈ {1, 10, 30, 50, 70, 90, 100, 150, 200, 250} and δ² ∈ {1, 10, 30, 50, 100, 200}. [The numeric cells were scrambled during extraction and are omitted.]
5. Experimental results
5.1. Sensitivity of pure SVM to feature sets and parameters
Fig. 5. Accuracy of the 30FS training set (Tr) and validation set (Val) as C varies, when δ² = 30.
Fig. 6. Accuracy of the 30FS training set (Tr) and validation set (Val) as δ² varies, when C = 70.
Table 4
Average prediction accuracy (%)

        LR                     NN                     Pure SVM
        Training  Validation   Training  Validation   Training  Validation
32FS    78.13     68.18        79.57     68.18        82.45     72.22
30FS    80.53     67.68        78.85     69.19        84.86     71.72
12FS    66.83     68.69        79.81     69.19        81.73     72.73
6FS     76.92     70.71        75.48     71.72        81.01     74.75

GA-SVM (with its own selected feature subset): training 86.53%, validation 80.30%.
Table 5
McNemar test results (p-values)

            LR        NN        Pure SVM
NN          0.727
Pure SVM    0.115     0.263
GA-SVM      0.002**   0.004**   0.082*

* Significant at the 10% level; ** significant at the 1% level.
a nonparametric test for two related samples using the chi-square distribution. The McNemar test assesses the significance of the difference between two dependent samples when the variable of interest is dichotomous. It is useful for detecting changes in responses due to experimental intervention in before-and-after designs (Siegel, 1956).
Table 5 shows the results of the McNemar test. As shown in Table 5, GA-SVM outperforms LR and NN at the 1% statistical significance level, and pure SVM at the 10% level, while the other models do not significantly outperform one another.
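The McNemar statistic can be computed from the two off-diagonal counts of the paired-outcome table. The sketch below (with Yates' continuity correction) uses our own names; it is illustrative, not the authors' code:

```python
import math

def mcnemar(b, c):
    """b: cases model A classified correctly and model B did not; c: the reverse.
    Returns the continuity-corrected chi-square statistic and its p-value
    on 1 degree of freedom."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square (1 d.f.) survival function: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p
```

Only the disagreements between the two classifiers enter the statistic; cases they classify identically carry no information about which one is better.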
6. Conclusion
Bankruptcy prediction is an important and widely studied
topic since it can have significant impact on bank lending
decisions and profitability. Recently, the support vector
machine (SVM) has been applied to the problem of bankruptcy
prediction. The SVM-based model has been compared with
other methods such as the neural network (NN) and logistic
regression, and has shown good results. However, few studies have dealt with the integration of GA and SVM, although there is great potential for useful applications in this area. This paper focuses on improving the SVM-based model by integrating GA and SVM.
This study presents methods for improving SVM performance in two aspects: feature subset selection and parameter optimization. GA is used to optimize both the feature subset and the parameters of SVM simultaneously for bankruptcy prediction. This paper applies the proposed GA-SVM model to the bankruptcy prediction problem using a real data set from Korean companies.
We evaluated the proposed model using the real data set
and compared it with other models. The results showed that
the proposed model was effective in finding the optimal
feature subset and parameters of SVM, and that it improved
the prediction of bankruptcy. The results also demonstrate
that the choice of the feature subset has an influence on the
appropriate kernel parameters and vice versa.
For future work, we intend to optimize the kernel function,
parameters and feature subset simultaneously. We would also
like to expand this model to apply to instance selection
problems.
Acknowledgements
This work was supported by the Research Grant from
Hallym University, Korea.