Logistic regression vs. neural network modelling
Group 4
- Rahul Gaikwad (4022/20)
- Sahil Mehta (4043/20)
- Shaurya Varma (4046/20)
Source: Credit-risk Evaluation Of A Tunisian Commercial Bank: Logistic Regression vs Neural Network Modelling, Hamadi Matoussi, University of Manouba, Tunisia (2010); IIMC library portal

The Case
- Short-term loans to Tunisian companies
- Database of 1434 credit files from 2003, 2004, 2005, and 2006

Results
- Best prediction model is the multi-layered NN model
- Classification rate: 97% (training) and 89.8% (validation)
- Bank could have reduced the percentage of loss from 18.7% to 12% in 2007

Introduction
- Basel Committee on Banking Supervision (June 2004)
  - Banking institutions allowed to use their own internal measures for key drivers of credit risk as primary inputs to their minimum regulatory capital requirement
- Choice of two broad methodologies
  - External mapping
  - Internal rating
    - Traditional statistical methods (e.g. logistic regression)
    - Non-parametric statistical methods (k-nearest neighbour, classification trees)
    - Neural networks

Neural Networks
- Structural vs. Empirical modelling
  - Structural: Default is endogenous and related to the capital structure. Default occurs when the asset value of the firm falls below a critical level.
  - Empirical: Instead of modelling the relationship of default with the characteristics of the firm, this relationship is learned from the data.
    - Linear (Z-Score, O-Score)
    - Non-linear (Neural networks)
- Advantages
  - Ability to model complex relationships
  - Benchmarking alternatives
  - Speeding up the decision
- Architecture: Multilayer feed-forward NN
- Dataset: Financial and non-financial data (financial ratios, firm size, industry)
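
Illustration (not from the source paper): a minimal sketch of how a linear empirical model and a multilayer feed-forward NN each turn a firm's indicators into a default probability. It assumes one hidden layer and sigmoid activations, which the slides do not specify. The function names, the three input indicators, and every weight below are hypothetical, chosen only to make the example run.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """Linear empirical model: one weighted sum passed through a sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def feed_forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """Multilayer feed-forward pass: inputs -> hidden layer -> one output."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)

# Hypothetical firm described by three made-up, already-scaled indicators.
firm = [0.4, -1.2, 0.7]

# All weights are arbitrary illustration values, not fitted to any data.
p_logit = logistic_regression(firm, weights=[0.5, -0.8, 0.3], bias=0.1)
p_nn = feed_forward(
    firm,
    hidden_weights=[[0.2, -0.5, 0.1], [-0.7, 0.3, 0.9]],
    hidden_biases=[0.0, 0.1],
    out_weights=[1.5, -2.0],
    out_bias=0.2,
)
print(f"logistic regression: {p_logit:.3f}, feed-forward NN: {p_nn:.3f}")
```

The only structural difference is the hidden layer: it lets the network combine indicators non-linearly before the final sigmoid, which is what the "ability to model complex relationships" refers to.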

Sample and Data (1/3)
- Training Data: Consists of a number of cases containing a range of input and output data
- Variables: Indicators of Default Risk
- Subject: Borrowers
- Statistical Methods: Artificial Neural Network based methods in our case
- Default Risk Levels
- In this case, we use data collected from a large private commercial bank (BIAT) based in Tunisia
- The data was not taken from a public sector bank, to avoid the potential inefficiencies of a public bank
- Although BIAT classifies the data into five risk classes, we divide the final output into two classes only: healthy and risky

Sample and Data (2/3)
- Dependent Variable Composition

Sample and Data (3/3)
- Independent Variable Composition
- We have considered 24 indicators in all: 22 financial indicators and 2 non-financial indicators
- Financial ratio analysis groups the ratios into categories, such as liquidity, operational, leverage and profitability, that describe different facets of a company's finances and operations
- Besides the commonly used financial ratios, two non-financial indicators, namely collateral and firm size, have also been considered as independent variables

Empirical Model
- To test the prediction capacity of the model, we split our sample of the bank's credit files into two sub-samples.
- The first sub-sample is composed of 924 files of short-term loans granted to 231 industrial Tunisian companies in 2003, 2004, 2005 and 2006. The data of this sub-sample are used as a training set (the in-sample set) to construct the prediction models.
- The second sub-sample is composed of 510 files and is used for validation (the out-of-sample set).

Out of Sample Validation
- Out-of-sample validation is done on the second sub-sample, which contains data on 510 files of short-term loans granted to Tunisian companies in 2003, 2004, 2005 and 2006.
- Out-of-sample validation provides a way to assess the practical applicability of the model.
- The NN model performed better than the statistical regression models.

Appendix: Independent Variable Set

Thank You
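
Supplementary sketch (not from the source paper): the in-sample / out-of-sample procedure from the Empirical Model slide, written out as code. The sub-sample sizes 924 and 510 come from the slides. The slides do not say how files were assigned to each sub-sample, so the random shuffle here is an assumption, and the helper names, file identifiers, and toy labels are all hypothetical.

```python
import random

TRAIN_SIZE, VALIDATION_SIZE = 924, 510  # sub-sample sizes quoted in the slides

def split_files(files, train_size, seed=0):
    """Shuffle the credit files and cut them into in-sample and out-of-sample sets.

    Random assignment is an assumption; the slides do not describe the rule used.
    """
    shuffled = files[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

def classification_rate(actual, predicted):
    """Share of files whose predicted class matches the actual class."""
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)

# Placeholder file identifiers; no real bank data is used here.
files = list(range(TRAIN_SIZE + VALIDATION_SIZE))
in_sample, out_of_sample = split_files(files, TRAIN_SIZE)

# Toy labels: 1 = risky, 0 = healthy (made up for illustration).
actual = [1, 0, 0, 1, 0]
predicted = [1, 0, 1, 1, 0]
print(len(in_sample), len(out_of_sample), classification_rate(actual, predicted))
```

A model would be fitted on `in_sample` only, and `classification_rate` would then be computed on both sets. A gap between the two rates, such as the 97% (training) versus 89.8% (validation) reported in the Results, indicates how much performance drops on files the model has not seen.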