Team No 5
Kapil Paniker
Shashi B Pandey
Ragil Ravindran
Georgy K Joseph
Supriyo Chakrabarti
Harshad Suryawanshi
IIM Trichy
1301014
1301041
1301056
1301068
1301102
1301103
Business Context
Banks play a vital role in economies worldwide. Every economy thrives on
investment and savings, and for markets and society to function,
individuals, corporations and governments need access to credit. Banks are
the critical financial intermediaries between savers and investors: they
lend household savings to corporations and governments in the form of loans
for investment. An important task for a bank is to decide whether a customer
should receive financing and, if so, on what terms. These decisions are
vital to a bank's business because credit risk is the most important risk in
the banking sector and dictates financial stability.
Banks use probability-of-default models to decide whether to grant a loan to
a customer. Credit scoring is a mechanism that assigns each borrower a
probability of default based on historical data. Using credit scoring, banks
can identify borrowers with a higher probability of default and make
business decisions accordingly.
Business Objective
The objective of this project is to develop a model that predicts whether a
person in the dataset will experience financial distress in the next two
years. Based on this prediction, a credit score is assigned to each
borrower. The credit score indicates the probability of default: the higher
the credit score, the lower the probability of default, and vice versa.
The benefit of these credit scores is twofold. First, banks will know which
potential customers have the highest probability of default and can avoid
lending to them altogether. Second, banks can price loans by risk: customers
with the lowest probability of default are charged the lowest interest rate,
while those with a moderate probability of default are charged a higher rate
because they pose more credit risk (default risk) to the bank.
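To make the link between probability of default and credit score concrete, here is a minimal sketch. It assumes a log-odds scaling with a fixed number of points per doubling of the odds, which is one common convention and not necessarily the one used in this project. The constants, the cutoffs and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical scaling constants (not from the report): a score of 600
# corresponds to odds of 50:1 (non-default : default), and every
# additional 20 points doubles those odds.
BASE_SCORE = 600.0
BASE_ODDS = 50.0
POINTS_TO_DOUBLE_ODDS = 20.0


def credit_score(prob_default: float) -> float:
    """Map a probability of default in (0, 1) to a score.

    A lower probability of default yields a higher score.
    """
    if not 0.0 < prob_default < 1.0:
        raise ValueError("prob_default must be strictly between 0 and 1")
    odds_good = (1.0 - prob_default) / prob_default
    factor = POINTS_TO_DOUBLE_ODDS / math.log(2.0)
    offset = BASE_SCORE - factor * math.log(BASE_ODDS)
    return offset + factor * math.log(odds_good)


def lending_decision(prob_default: float,
                     low_cutoff: float = 0.05,
                     high_cutoff: float = 0.30) -> str:
    """Three-way decision described in the text; the cutoffs are illustrative."""
    if prob_default >= high_cutoff:
        return "decline"
    if prob_default < low_cutoff:
        return "lend at lowest rate"
    return "lend at higher rate"
```

With these assumed constants, a borrower whose odds of not defaulting are 50:1 scores about 600, and one at 100:1 scores about 620. In practice the cutoffs would be set from the bank's risk appetite rather than fixed in code.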
Mining objectives
Since we have a large dataset, we will try to identify clusters of records
that are homogeneous in certain respects. These clusters could contain
potential customers who have similar credit risk and hence would receive
similar credit ratings. We will also try to identify attributes that are
correlated with each other and attributes that could help us calculate the
probability of default, and we will look at the possibility of deriving new
attributes from existing ones. The end result of this mining exercise will
help us assign a credit score to each customer in the dataset. Based on this
credit score, banks can decide which customers to lend to and which to
avoid. Banks can also charge different interest rates to different customers
based on their credit scores: the higher the credit score, the lower the
interest rate.
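Two of the mining steps above, checking whether attributes are correlated and deriving a new attribute from existing ones, can be sketched as follows. This is an illustration under stated assumptions and not the project's actual procedure. The records are made-up toy values. The short keys (late_30_59, late_60_89, late_90) are hypothetical stand-ins for the past-due counts in the dataset, and total_late is a hypothetical derived attribute.

```python
import math

# Toy records for illustration only: these values are made up and are
# not taken from the project dataset.
borrowers = [
    {"late_30_59": 0, "late_60_89": 0, "late_90": 0},
    {"late_30_59": 1, "late_60_89": 0, "late_90": 0},
    {"late_30_59": 2, "late_60_89": 1, "late_90": 0},
    {"late_30_59": 3, "late_60_89": 2, "late_90": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def column(rows, name):
    """Extract one attribute as a list of values."""
    return [row[name] for row in rows]


# Derived attribute (hypothetical): total number of past-due events.
for row in borrowers:
    row["total_late"] = row["late_30_59"] + row["late_60_89"] + row["late_90"]

# Correlation between two existing attributes on the toy data.
r = pearson(column(borrowers, "late_30_59"), column(borrowers, "late_60_89"))
print(f"correlation(late_30_59, late_60_89) = {r:.3f}")
```

A coefficient close to 1 or -1 suggests that two attributes carry overlapping information, in which case a single combined attribute such as total_late may be enough for the model.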
Specific questions we seek to answer using Data Mining
The project seeks to answer the following questions.
Variable Name | Description | Type
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit except real estate and no installment debt like car loans, divided by the sum of credit limits | percentage
age | Age of borrower in years | integer
NumberOfTime3059DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer
DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income | percentage
MonthlyIncome | Monthly income | real
NumberOfOpenCreditLinesAndLoans | | integer
NumberOfTimes90DaysLate | | integer
NumberRealEstateLoansOrLines | | integer
NumberOfTime6089DaysPastDueNotWorse | | integer
NumberOfDependents | | integer
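The two ratio variables described above, RevolvingUtilizationOfUnsecuredLines and DebtRatio, are simple quotients. The sketch below shows the arithmetic on a hypothetical borrower. The function names, parameter names and amounts are assumptions made for illustration, and the functions return plain fractions. Whether the dataset stores these values as fractions or scaled by 100 is not stated here.

```python
def revolving_utilization(revolving_balance: float,
                          total_credit_limit: float) -> float:
    """Balance on credit cards and personal lines of credit divided by
    the sum of credit limits."""
    if total_credit_limit <= 0:
        raise ValueError("total_credit_limit must be positive")
    return revolving_balance / total_credit_limit


def debt_ratio(monthly_debt_payments: float,
               alimony: float,
               living_costs: float,
               monthly_gross_income: float) -> float:
    """Monthly debt payments, alimony and living costs divided by
    monthly gross income."""
    if monthly_gross_income <= 0:
        raise ValueError("monthly_gross_income must be positive")
    return (monthly_debt_payments + alimony + living_costs) / monthly_gross_income


# Hypothetical borrower, amounts in any single currency.
print(revolving_utilization(2500.0, 10000.0))   # 0.25
print(debt_ratio(1200.0, 0.0, 800.0, 5000.0))   # 0.4
```

Both functions reject a non-positive denominator, so records with a missing or zero income or credit limit have to be handled before the ratios are computed.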