Effectiveness Assessment between Sequential Minimal Optimization and Logistic Classifiers for Credit Risk Prediction

Lakshmi Devasena C
Department of Operations & IT, IBS Hyderabad, IFHE University, India.

International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org  Email: editor@ijaiem.org
Volume 3, Issue 4, April 2014  ISSN 2319-4847

Abstract
Credit risk prediction is an imperative task in any private or public sector bank. Recognizing a likely defaulter before granting a loan is a decisive task for the banker. Classification techniques play an important role in identifying whether a customer is a defaulter or a genuine customer. Determining the best classifier for credit risk prediction is therefore a critical assignment for the banking industry, and it motivates comparing candidate classifiers across different research works. This paper assesses the efficiency of the Sequential Minimal Optimization (SMO) and Logistic classifiers for credit risk prediction and compares them through various measures. The experiments are carried out on the German credit dataset using an open source machine learning tool.

Keywords: Credit Risk Prediction, Logistic Classifier, Performance Evaluation, SMO Classifier.

1. Introduction
The huge volume of transactions makes the automation of information processing a vital factor for cost reduction and for delivering high-quality results at high speed. Automation, and the successes achieved by state-of-the-art computer solutions, have changed the opinions of many skeptics. In the past, people tended to think that financial market analysis required knowledge, intuition and experience, and wondered how this activity could ever be automated. However, with steady scientific and technological advances, the automation of financial market analysis has been achieved. Today, credit risk evaluation and credit defaulter prediction attract a great deal of interest from theorists, regulators and practitioners in the financial industry. In the past, financial institutions relied on rules or principles built by analysts to decide to whom to give credit. With the tremendous increase in the number of applicants, it is impossible, in both economic and manpower terms, to handle all applications this way. Therefore, the credit approval decision process needs to be automated. Automation of credit risk prediction is achieved using a classification technique, and determining which classifier predicts credit risk most efficiently is an important and crucial task. This work evaluates the credit risk prediction performance of two different classifiers, namely the Logistic classifier and Sequential Minimal Optimization (SMO), and compares which provides the more accurate credit risk prediction.

2. Literature Review
Many researchers have performed credit risk prediction using varied computing techniques. A neural network based system for automatic support to credit risk analysis in a real world problem is presented in [2]. An integrated back propagation neural network with a traditional discriminant analysis approach is used to explore the performance of credit scoring in [3]. A comparative study of corporate credit rating analysis using support vector machines (SVM) and back propagation neural networks (BPNN) is presented in [4]. A triple-phase neural network ensemble technique with an uncorrelation maximization algorithm, used in a credit risk evaluation system to discriminate good creditors from bad ones, is explained in [5]. An application of artificial neural networks to credit risk assessment using two different architectures is discussed in [6]. Credit risk analysis using different data mining models such as C4.5, NN, BP, RIPPER, LR and SMO is compared in [7]. The credit risk of a Tunisian bank is analyzed in [8] by modeling the default risk of its commercial loans. Credit risk assessment using a six-stage neural network ensemble learning approach is discussed in [9]. A modeling framework for credit assessment models is constructed using different modeling procedures and its performance is analyzed in [10]. A hybrid method for evaluating credit risk using the Kolmogorov-Smirnov test, the DEMATEL method and a fuzzy expert system is explained in [11]. An artificial neural network based approach for credit risk management is proposed in [12]. An approach combining a feed-forward back propagation neural network with business rules to correctly identify credit defaulters is proposed in [13]. Adeptness evaluation of memory based classifiers for credit risk analysis is experimented with and summarized in [14]. An adeptness comparison of Instance Based and K Star classifiers for credit risk scrutiny is performed and described in [15]. This research work compares the efficiency of the Logistic classifier and the SMO classifier for credit risk prediction.

3. Dataset Used
The German credit data is used for credit risk prediction. It consists of 20 attributes, namely Checking Status, Duration, Credit History, Purpose, Credit Amount, Savings Status, Employment, Installment Commitment, Personal Status, Other Parties, Residence Since, Property Magnitude, Age, Other Payment Plans, Housing, Existing Credits, Job, Num Dependents, Own Telephone and Foreign Worker. The dataset consists of 1000 instances of customer credit data with the class detail. It has two classes, namely good and bad.
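
To make the setup concrete, the following minimal sketch loads the dataset and confirms the 1000 instances and the good/bad class split. It assumes the raw german.data file from the UCI repository has been downloaded locally and uses illustrative column names rather than the official attribute codes; it is not part of the original experimental setup.

```python
# Minimal sketch: loading the UCI German credit data (1000 instances, 20 attributes + class).
# Assumes "german.data" has been downloaded locally; column names are illustrative.
import pandas as pd

columns = [
    "checking_status", "duration", "credit_history", "purpose", "credit_amount",
    "savings_status", "employment", "installment_commitment", "personal_status",
    "other_parties", "residence_since", "property_magnitude", "age",
    "other_payment_plans", "housing", "existing_credits", "job",
    "num_dependents", "own_telephone", "foreign_worker", "class",
]

df = pd.read_csv("german.data", sep=" ", header=None, names=columns)
df["class"] = df["class"].map({1: "good", 2: "bad"})  # 1 = good, 2 = bad in the raw file

print(df.shape)                     # expected: (1000, 21)
print(df["class"].value_counts())   # expected: 700 good, 300 bad
```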

4. Methodology Used
In this work, two different classifiers, namely the Logistic classifier and the Sequential Minimal Optimization (SMO) classifier, are compared for their efficiency in credit risk prediction.

4.1 Logistic Classifier
The Logistic classifier is a generalization of the linear regression classifier [18]. It is fundamentally used for estimating binary or multi-class dependent variables; because the response variable is discrete, it cannot be modeled directly by linear regression, so the discrete outcome is mapped to a continuous value (a probability) that linear modeling can handle. The Logistic classifier is mainly used to classify low dimensional data with non-linear boundaries. It also quantifies how much each explanatory variable contributes to the dependent variable and ranks the individual variables according to their significance, so the main aim of the Logistic classifier is to determine the effect of each variable correctly. The Logistic classifier is also known as the logistic model or logit model, and it predicts a categorical target variable with two categories, such as good and bad.
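
The paper's experiments were run in an open source machine learning tool; purely as an illustration of the same idea, the sketch below fits a logistic (logit) model with scikit-learn on the German credit data loaded in Section 3, one-hot encoding the categorical attributes because a logit model needs numeric inputs. The pipeline and parameter choices are assumptions for illustration, not the authors' actual configuration.

```python
# Illustrative sketch (not the authors' exact setup): a Logistic classifier for the
# German credit data. `df` is the DataFrame loaded in the sketch of Section 3.
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

X = df.drop(columns=["class"])
y = df["class"]

categorical = X.select_dtypes(include="object").columns.tolist()
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",  # numeric attributes pass through unchanged
)

logit = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
logit.fit(X, y)
print("Training-set accuracy: %.3f" % logit.score(X, y))
```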

4.2 SMO Classifier
Sequential Minimal Optimization (SMO) is an algorithm for quickly solving the quadratic programming (QP) optimization problem that arises when training a support vector machine. Consider a binary classification problem with a dataset (x_1, y_1), ..., (x_n, y_n), where x_i is an input vector and y_i ∈ {-1, +1} is the binary label corresponding to it. The dual form of the quadratic programming problem solved by the support vector machine is as follows:

max_α  W(α) = Σ_{i=1..n} α_i - (1/2) Σ_{i=1..n} Σ_{j=1..n} y_i y_j K(x_i, x_j) α_i α_j    (1)

subject to:

0 ≤ α_i ≤ C for i = 1, ..., n, and Σ_{i=1..n} y_i α_i = 0,

where C is a Support Vector Machine hyper-parameter, K(x_i, x_j) is the kernel function supplied by the user, and the variables α_i are Lagrange multipliers.
SMO breaks this problem into a series of smallest possible sub-problems, which are then solved analytically. Because of the linear equality constraint involving the Lagrange multipliers α_i, the smallest possible sub-problem involves two such multipliers. Then, for any two multipliers α_1 and α_2, the constraints reduce to:

0 ≤ α_1, α_2 ≤ C  and  y_1 α_1 + y_2 α_2 = k,    (2)

where k is the negative of the sum of the remaining terms in the equality constraint, which is fixed in each iteration.
The SMO algorithm proceeds as follows:
1. Find a Lagrange multiplier that violates the Karush-Kuhn-Tucker (KKT) conditions for the optimization problem.
2. Choose a second multiplier and optimize the pair (α_1, α_2) analytically.
3. Repeat steps 1 and 2 until the multipliers converge.
The problem is solved when all the Lagrange multipliers satisfy the Karush-Kuhn-Tucker conditions within a user-defined tolerance.
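
As a counterpart to the Logistic sketch, an SVM trained with an SMO-type solver can be set up the same way; scikit-learn's SVC delegates to libsvm, which solves the dual problem in equation (1) with an SMO-style decomposition. The kernel and C value below are assumptions for illustration, not the settings used in the paper, and X and y are the variables prepared in the previous sketch.

```python
# Illustrative sketch: an SVM fitted by an SMO-type solver (libsvm inside sklearn.svm.SVC).
# The kernel and C are assumed values; X and y come from the Logistic sketch above.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

categorical = X.select_dtypes(include="object").columns.tolist()
numeric = X.select_dtypes(exclude="object").columns.tolist()

preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("scale", StandardScaler(), numeric),  # scaling numeric attributes helps the solver converge
])

smo = Pipeline([
    ("prep", preprocess),
    ("clf", SVC(kernel="linear", C=1.0)),  # C is the hyper-parameter in the constraints of (1)
])
smo.fit(X, y)
print("Training-set accuracy: %.3f" % smo.score(X, y))
```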

5. Performance Measures Used
Different measures are used to evaluate the performance of the classifiers.

5.1 Classification Accuracy
Any classification result can have an error rate, and the classifier may occasionally fail to classify an instance correctly. Classification accuracy is calculated as follows:
Accuracy = (Instances Correctly Classified / Total Number of Instances) * 100 %    (3)
For example, 786 correctly classified instances out of 1000 (Table 1, training set) give an accuracy of 78.6%.

5.2 Mean Absolute Error
MAE is the average of the absolute differences between predicted and actual values over all test cases. The formula for calculating MAE is given in the equation below:
MAE = (|a_1 - c_1| + |a_2 - c_2| + ... + |a_n - c_n|) / n    (4)
Here a_i is the actual value and c_i is the predicted value for the i-th test case.

5.3 Root Mean Square Error
RMSE is used to measure the differences between the values actually observed and the values predicted by the model. It is calculated by taking the square root of the mean squared error, as shown in the equation below:
RMSE = sqrt(((a_1 - c_1)^2 + (a_2 - c_2)^2 + ... + (a_n - c_n)^2) / n)    (5)
Here a_i is the actual value and c_i is the predicted value. The mean squared error is a commonly used measure for numeric prediction.

5.4 Confusion Matrix
A confusion matrix contains information about actual and predicted classifications done by a classification system.
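
To make these measures concrete, the short sketch below computes accuracy, MAE, RMSE and a confusion matrix for a small set of made-up predictions, coding good as 1 and bad as 0 so that equations (3) to (5) apply directly; the numbers are purely illustrative and are not taken from the experiments.

```python
# Illustrative computation of the measures in Section 5 on made-up labels.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

actual    = np.array([1, 1, 0, 1, 0, 1, 0, 0])  # 1 = good, 0 = bad (toy data)
predicted = np.array([1, 0, 0, 1, 0, 1, 1, 0])

accuracy = accuracy_score(actual, predicted) * 100        # equation (3)
mae = mean_absolute_error(actual, predicted)              # equation (4)
rmse = np.sqrt(mean_squared_error(actual, predicted))     # equation (5)

print(f"Accuracy: {accuracy:.1f}%  MAE: {mae:.3f}  RMSE: {rmse:.3f}")
# Rows are actual classes, columns are predicted classes, as in Tables 3-24.
print(confusion_matrix(actual, predicted, labels=[1, 0]))
```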

6. Results and Discussion
The performance of both the Logistic and SMO classifiers is evaluated using an open source machine learning tool. The performance is checked using the training set itself and using different cross validation and percentage split methods. The class is predicted by considering the values of all 20 attributes.

6.1 Performance of Logistic Classifier
The overall evaluation summary of the Logistic classifier using the training set and different cross validation methods is given in Table 1. The classification summary of the Logistic classifier for different percentage splits is given in Table 2. The confusion matrix for each test mode is given in Tables 3 to 12. The Logistic classifier gives 78.6% accuracy on the training data set, but for a fair evaluation, testing with held-out data is essential, so various cross validation and percentage split methods are used to test its actual performance. On average, it gives around 75% classification accuracy for credit risk prediction.
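
The test modes summarized in Tables 1 and 2 (k-fold cross validation and percentage split) can be reproduced in outline as in the sketch below. This is only a hedged approximation of the tool's protocol (for instance, its percentage split may not shuffle the data the same way), and logit, X and y are the objects from the sketches in Sections 3 and 4.

```python
# Illustrative sketch of the two evaluation protocols used in Section 6:
# k-fold cross validation and a percentage split (train on p%, test on the rest).
from sklearn.model_selection import cross_val_score, train_test_split

# k-fold cross validation with the fold counts of Table 1
for k in (5, 10, 15, 20, 50):
    scores = cross_val_score(logit, X, y, cv=k, scoring="accuracy")
    print(f"{k}-fold CV accuracy: {scores.mean():.3f}")

# percentage split, e.g. 66% training / 34% testing as in Table 2
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.66, random_state=0)
logit.fit(X_train, y_train)
print(f"66% split accuracy: {logit.score(X_test, y_test):.3f}")
```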

Table 1: Logistic Classifier Overall Evaluation Summary

Test Mode | Correctly Classified Instances | Incorrectly Classified Instances | Accuracy | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Time Taken to Build Model (Sec)
Training Set | 786 | 214 | 78.6% | 0.4563 | 0.2921 | 0.3823 | 0.58
5 Fold CV | 757 | 243 | 75.7% | 0.3943 | 0.3067 | 0.4065 | 0.14
10 Fold CV | 752 | 248 | 75.2% | 0.375 | 0.3098 | 0.4087 | 0.13
15 Fold CV | 757 | 243 | 75.7% | 0.3845 | 0.3103 | 0.4085 | 0.13
20 Fold CV | 754 | 246 | 75.4% | 0.3813 | 0.3106 | 0.4086 | 0.14
50 Fold CV | 752 | 248 | 75.2% | 0.3725 | 0.3116 | 0.4084 | 0.14

Table 2: Logistic Classifier Percentage Split Overall Evaluation Summary

Test Mode | Total Test Instances | Correctly Classified Instances | Incorrectly Classified Instances | Accuracy | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Time Taken to Build Model (Sec)
66% Percentage Split | 340 | 262 | 78 | 77.059% | 0.4064 | 0.2968 | 0.3954 | 0.14
33% Percentage Split | 670 | 477 | 193 | 71.194% | 0.2759 | 0.3159 | 0.4533 | 0.13
75% Percentage Split | 250 | 193 | 57 | 77.2% | 0.4161 | 0.2893 | 0.391 | 0.14
80% Percentage Split | 200 | 153 | 47 | 76.5% | 0.4082 | 0.2936 | 0.3918 | 0.14

Table 3: Confusion Matrix Logistic Classifier on Training Dataset
Good Bad Actual (Total)
Good 626 74 700
Bad 140 160 300
Predicted (Total) 766 234 1000

Table 4: Confusion Matrix Logistic Classifier for 5 Fold Cross Validation
Good Bad Actual (Total)
Good 602 98 700
Bad 145 155 300
Predicted (Total) 747 253 1000


Table 5: Confusion Matrix Logistic Classifier for 10 Fold Cross Validation
Good Bad Actual (Total)
Good 605 95 700
Bad 153 147 300
Predicted (Total) 758 242 1000

Table 6: Confusion Matrix Logistic Classifier for 15 Fold Cross Validation
Good Bad Actual (Total)
Good 610 90 700
Bad 153 147 300
Predicted (Total) 763 237 1000

Table 7: Confusion Matrix Logistic Classifier for 20 Fold Cross Validation
Good Bad Actual (Total)
Good 605 95 700
Bad 151 149 300
Predicted (Total) 756 244 1000

Table 8: Confusion Matrix Logistic Classifier for 50 Fold Cross Validation
Good Bad Actual (Total)
Good 607 93 700
Bad 155 145 300
Predicted (Total) 762 238 1000

Table 9: Confusion Matrix Logistic Classifier for 66% Percentage Split
Good Bad Actual (Total)
Good 212 38 250
Bad 40 50 90
Predicted (Total) 252 88 340

Table 10: Confusion Matrix Logistic Classifier for 33% Percentage Split
Good Bad Actual (Total)
Good 390 100 490
Bad 93 87 180
Predicted (Total) 483 187 670

Table 11: Confusion Matrix Logistic Classifier for 75% Percentage Split
Good Bad Actual (Total)
Good 155 29 184
Bad 28 38 66
Predicted (Total) 183 67 250

Table 12: Confusion Matrix Logistic Classifier for 80% Percentage Split
Good Bad Actual (Total)
Good 122 27 149
Bad 20 31 51
Predicted (Total) 142 58 200


Figure 1 Correctly Classified instances of Logistic Classifier


Figure 2 Classification Accuracy of Logistic Classifier


Figure 3 Classification Accuracy of Logistic Classifier for different Percentage Split

6.2 Performance of SMO Classifier
The overall evaluation summary of the SMO classifier using the training set and different cross validation methods is given in Table 13. The classification summary of the SMO classifier for different percentage splits is given in Table 14. The confusion matrix for each test mode is given in Tables 15 to 24. The charts showing the performance of the SMO classifier with respect to correctly classified instances and classification accuracy under the different test modes are depicted in Fig. 4, Fig. 5 and Fig. 6. The SMO classifier gives 78.4% accuracy on the training data set; under the various cross validation and percentage split test modes, it performs more or less equally to the Logistic classifier. On average, the SMO classifier gives around 75% classification accuracy for credit risk prediction.
Table 13: SMO Classifier Overall Evaluation Summary

Test Mode | Correctly Classified Instances | Incorrectly Classified Instances | Accuracy | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Time Taken to Build Model (Sec)
Training Set | 784 | 216 | 78.4% | 0.4501 | 0.216 | 0.4648 | 2.44
5 Fold CV | 760 | 240 | 76.0% | 0.3939 | 0.24 | 0.4899 | 2.56
10 Fold CV | 751 | 249 | 75.1% | 0.3654 | 0.249 | 0.499 | 2.66
15 Fold CV | 747 | 253 | 74.7% | 0.3499 | 0.253 | 0.503 | 2.52
20 Fold CV | 745 | 255 | 74.5% | 0.3528 | 0.255 | 0.505 | 2.44
50 Fold CV | 745 | 255 | 74.5% | 0.3435 | 0.255 | 0.505 | 2.38

Table 14: SMO Classifier Percentage Split Overall Evaluation Summary

Test Mode | Total Test Instances | Correctly Classified Instances | Incorrectly Classified Instances | Accuracy | Kappa Statistic | Mean Absolute Error | Root Mean Squared Error | Time Taken to Build Model (Sec)
66% Percentage Split | 340 | 261 | 79 | 76.7647% | 0.3695 | 0.2324 | 0.482 | 2.52
33% Percentage Split | 670 | 482 | 188 | 71.9403% | 0.3031 | 0.2806 | 0.5297 | 2.52
75% Percentage Split | 250 | 196 | 54 | 78.4% | 0.4387 | 0.216 | 0.4648 | 2.45
80% Percentage Split | 200 | 155 | 45 | 77.5% | 0.4116 | 0.225 | 0.4743 | 2.53

Table 15: Confusion Matrix SMO on Training Dataset
Good Bad Actual (Total)
Good 626 74 700
Bad 142 158 300
Predicted (Total) 768 232 1000

Table 16: Confusion Matrix SMO for 5 Fold Cross Validation
Good Bad Actual (Total)
Good 610 90 700
Bad 150 150 300
Predicted (Total) 760 240 1000

Table 17: Confusion Matrix SMO for 10 Fold Cross Validation
Good Bad Actual (Total)
Good 610 90 700
Bad 159 141 300
Predicted (Total) 769 231 1000

Table 18: Confusion Matrix SMO for 15 Fold Cross Validation
Good Bad Actual (Total)
Good 612 88 700
Bad 165 135 300
Predicted (Total) 777 223 1000

Table 19: Confusion Matrix SMO for 20 Fold Cross Validation
Good Bad Actual (Total)
Good 626 74 700
Bad 142 158 300
Predicted (Total) 768 232 1000

Table 20: Confusion Matrix SMO for 50 Fold Cross Validation
Good Bad Actual (Total)
Good 610 90 700
Bad 150 150 300
Predicted (Total) 760 240 1000

Table 21: Confusion Matrix SMO for 66% Percentage Split
Good Bad Actual (Total)
Good 218 32 250
Bad 47 43 90
Predicted (Total) 265 75 340

Table 22: Confusion Matrix SMO for 33% Percentage Split
Good Bad Actual (Total)
Good 389 101 490
Bad 87 93 180
Predicted (Total) 476 194 670

Table 23: Confusion Matrix SMO for 75% Percentage Split
Good Bad Actual (Total)
Good 158 26 184
Bad 28 38 66
Predicted (Total) 186 64 250

Table 24: Confusion Matrix SMO for 80% Percentage Split
Good Bad Actual (Total)
Good 126 23 149
Bad 22 29 51
Predicted (Total) 148 52 200



Figure 4 Correctly Classified instances of SMO Classifier


Figure 5 Classification Accuracy of SMO Classifier


Figure 6 Classification Accuracy of SMO Classifier for different Percentage Split

6.3 Comparison of Logistic and SMO Classifiers
The comparison between the Logistic classifier and the SMO classifier is shown in Fig. 7, Fig. 8 and Fig. 9 in terms of classification accuracy and correctly classified instances. The overall ranking is done based on the classification accuracy, correctly classified instances, MAE and RMSE values and other statistics obtained from the training set results, percentage split and cross validation techniques. Based on these, it is observed that the SMO classifier performs more or less equally to the Logistic classifier.



Figure 7 Correctly Classified Instances Comparison between Logistic and SMO Classifier for Percentage Split


Figure 8 Classification Accuracy Comparison between Logistic and SMO Classifier


Figure 9 Classification Accuracy Comparison between Logistic and SMO Classifier for different Percentage Split

7. Conclusion
This work investigated the efficiency of two different classifiers, namely the Logistic classifier and the Sequential Minimal Optimization (SMO) classifier, for credit risk prediction. The experiments were carried out using an open source machine learning tool. The efficiency of the two classifiers was compared by considering different performance evaluation measures. From the experiments, it is observed that the Sequential Minimal Optimization (SMO) classifier performs better than the Logistic classifier for credit risk prediction.

References
[1] John C. Platt, Fast Training of Support Vector Machines using Sequential Minimal Optimization, 2000.
[2] Germano C. Vasconcelos, Paulo J. L. Adeodato and Domingos S. M. P. Monteiro, A Neural Network Based
Solution for the Credit Risk Assessment Problem, Proceedings of the IV Brazilian Conference on Neural Networks -
IV Congresso Brasileiro de Redes Neurais pp. 269-274, July 20-22, 1999.
[3] Tian-Shyug Lee, Chih-Chou Chiu, Chi-Jie Lu and I-Fei Chen, Credit scoring using the hybrid neural discriminant technique, Expert Systems with Applications (Elsevier) 23, pp. 245-254, 2002.
[4] Zan Huang, Hsinchun Chen, Chia-Jung Hsu, Wun-Hwa Chen and Soushan Wu, Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems (Elsevier) 37, pp. 543-558, 2004.
[5] Kin Keung Lai, Lean Yu, Shouyang Wang, and Ligang Zhou, Credit Risk Analysis Using a Reliability-Based Neural Network Ensemble Model, S. Kollias et al. (Eds.): ICANN 2006, Part II, Springer LNCS 4132, pp. 682-690, 2006.
[6] Eliana Angelini, Giacomo di Tollo, and Andrea Roli, A Neural Network Approach for Credit Risk Evaluation, Kluwer Academic Publishers, pp. 1-22, 2006.
[7] S. Kotsiantis, Credit risk analysis using a hybrid data mining model, Int. J. Intelligent Systems Technologies and Applications, Vol. 2, No. 4, pp. 345-356, 2007.
[8] Hamadi Matoussi and Aida Krichene, Credit risk assessment using Multilayer Neural Network Models - Case of a
Tunisian bank, 2007.
[9] Lean Yu, Shouyang Wang and Kin Keung Lai, Credit risk assessment with a multistage neural network ensemble learning approach, Expert Systems with Applications (Elsevier) 34, pp. 1434-1444, 2008.
[10] Arnar Ingi Einarsson, Credit Risk Modeling, Ph.D Thesis, Technical University of Denmark, 2008.
[11] Sanaz Pourdarab, Ahmad Nadali and Hamid Eslami Nosratabadi, A Hybrid Method for Credit Risk Assessment of
Bank Customers, International Journal of Trade, Economics and Finance, Vol. 2, No. 2, April 2011.
[12] Vincenzo Pacelli and Michele Azzollini, An Artificial Neural Network Approach for Credit Risk Management,
Journal of Intelligent Learning Systems and Applications, 3, pp. 103-112, 2011.
[13] A. R. Ghatge and P. P. Halkarnikar, Ensemble Neural Network Strategy for Predicting Credit Default Evaluation, International Journal of Engineering and Innovative Technology (IJEIT), Volume 2, Issue 7, January 2013, pp. 223-225.
[14] Lakshmi Devasena, C., Adeptness Evaluation of Memory Based Classifiers for Credit Risk Analysis, Proc. of International Conference on Intelligent Computing Applications - ICICA 2014 (IEEE Xplore), 6-7 March 2014, pp. 143-147.
[15] Lakshmi Devasena, C., Adeptness Comparison between Instance Based and K Star Classifiers for Credit Risk
Scrutiny, International Journal of Innovative Research in Computer and Communication Engineering, Vol.2,
Special Issue 1, March 2014.
[16] Lakshmi Devasena, C., Adeptness Comparison between Instance Based and K Star Classifiers for Credit Risk Scrutiny, Proc. of International Conference on Intelligent Computing Applications - ICICA 2014, 978-1-4799-3966-4/14 (IEEE Xplore), 6-7 March 2014, pp. 143-147.
[17] UCI Machine Learning Data Repository http://archive.ics.uci.edu/ml/datasets.
[18] De Mantaras and Armengol E, Machine learning from example: Inductive and Lazy methods, Data & Knowledge
Engineering 25: 99-123, 1998.



AUTHOR
Dr. C. Lakshmi Devasena completed her Ph.D. at Karpagam University, Coimbatore. She has 10 years of teaching experience and two years of industrial experience. She has published 35 papers, of which 24 appear in international journals and 11 in national journals and conference proceedings. She has presented 35 papers at various international and national conferences. Her research interests are data mining, medical image analysis, business analytics, evaluation of algorithms, and e-governance. She is presently working in the Operations & IT area, IBS Hyderabad, IFHE University.
