
Larger and Lengthier Loans Seem to Have No Impact on the Interest Rate You Will Have to Pay.

I. Introduction

FICO scores, ranging from 300 to 850, are an indicator of the credit risk level of a credit applicant and are used by lenders as an important determinant of the interest rate that a borrower will ultimately have to pay. FICO scores are based on the following five categories of credit information: Payment history (35 per cent)¹; Amount owed (30 per cent); Length of credit history (15 per cent); New credit (10 per cent); and Types of credit in use (10 per cent).

However, FICO scores are not the only factor determining the level of interest rates. The purpose of this document is to quantify the relative importance of three other variables thought to be significantly associated with the interest rate of a loan: the amount requested, the length of the loan and the monthly income of the applicant. While income is normally constant over the short term, the length of the loan and the amount requested are thought to be controlled by the applicant. Therefore, estimating the individual impact of these variables on the interest rate may be of interest to prospective borrowers.

Section II of this document reviews the methods used, including a brief description of the data set, a summary of the exploratory data analysis conducted and an overview of the statistical modeling undertaken. Section III discusses the results of an estimated multivariate linear regression model and presents some caveats that should be considered when making inferences on the basis of the estimated model. Finally, Section IV summarizes some conclusions that can be drawn from the analysis.

¹ Fair Isaac Corporation (2011). Figures in parentheses indicate approximately how much of a FICO score is based on the corresponding category.

II. Methods

Data set and variables used

The data set used in this document contains information on 2,500 peer-to-peer loans issued by the Lending Club (www.lendingclub.com/home.action). This data set includes both quantitative and categorical variables. These variables are listed below.

Quantitative variables:

i. The amount requested (in USD);
ii. The amount loaned to the applicant (in USD);
iii. The interest rate of the loan (in per cent);
iv. The length of time of the loan (in months);
v. The debt-to-income ratio of the applicant (in per cent);
vi. The monthly income of the applicant (in USD);
vii. The FICO range of the applicant (ranging from 640 to 834 in the sample);
viii. The current number of lines of credit opened by the applicant;
ix. The total amount of outstanding credit of the applicant (in USD);
x. The number of credit inquiries during the six months prior to the application; and
xi. The length of time in the current employment (in years).

Categorical variables:

i. The state of residence of the applicant;
ii. The purpose of the loan (car, credit card, debt consolidation, education, home improvement, house, major purchase, medical, moving, etc.); and
iii. The ownership status of the applicant (e.g., renter, home owner, or paying mortgage).

The data set also includes loan numbers, presumably unique identifiers of loans. The data set was contained in a CSV file and was downloaded from the following URL: https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv. All data analyses in this document were conducted in R, version 2.15.1.

Exploratory data analysis

Exploratory data analysis was used to understand important characteristics (e.g., center, variation and extreme observations) and the nature of the univariate and multivariate distributions of all variables in the data set. In particular, scatter plots, boxplots, histograms and summary tables were used.

Panel (a) in the accompanying figure is a scatter plot showing a negative relationship between the average of the FICO range and the interest rate of loans. It also suggests that both longer loans (highlighted in purple in the figure) and larger amounts requested (proportional to the size of the dots) tend to cluster towards the upper part of the figure, indicating that larger and lengthier loans tend to be associated with greater interest rates.

Panel (b) presents a histogram of the income of applicants, a variable expected to be strongly associated with the interest rate of a loan. As can be observed in panel (b), the distribution of income is right-skewed with three extreme outliers, labeled 41411, 18439 and 54487 (incomes of USD 39,583.33, 65,000 and 102,750, respectively). These three observations were excluded from all subsequent analyses. Furthermore, a boxplot of the distribution of income (not shown) indicated that there are quite a few additional mild outliers, that is, observations greater than the 3rd quartile plus 1.5 times the interquartile range. However, these observations were not as extreme as the three mentioned above and were handled by a logarithmic transformation. Finally, loan number 101596 has incomplete information on some variables, notably monthly income, and was also excluded.

Statistical modeling

A preliminary linear regression modeling the interest rate as a function of all numerical variables in the data set indicated that three variables (the debt-to-income ratio, the amount of outstanding credit and the length of time in the current employment) were not statistically significant at the 5 per cent level (p > 0.05). Consequently, they were excluded from subsequent analyses.
Furthermore, two other variables (the number of credit inquiries and the current number of lines of credit opened by the applicant) are known to be among the elements composing FICO scores and, therefore, are not considered in what follows so as to minimize the risk of overloading the regression model and the effects of multicollinearity.

III. Results

On the basis of 2,496 complete observations, the final estimated model is:

$$IR = \beta_1 + \beta_2\,FICO + \beta_3\,LL + \beta_4\,AR + \beta_5\,\log(I) + \varepsilon,$$

where IR is the interest rate; FICO is the average of FICO scores; LL is the length of the loan in months; AR is the amount requested in USD; and I is the monthly income of the applicant in USD. The model includes an error term, $\varepsilon$, assumed to be a random variable, normally and independently distributed, with zero mean and constant variance². Estimated coefficients (confidence intervals, CI, in parentheses) are:

$\beta_1$ = 7.078e-01 (CI: 6.9e-01, 7.3e-01);
$\beta_2$ = -8.733e-04 (CI: -9.0e-04, -8.5e-04);
$\beta_3$ = 1.357e-03 (CI: 1.3e-03, 1.4e-03);
$\beta_4$ = 1.516e-06 (CI: 1.4e-06, 1.6e-06); and
$\beta_5$ = -3.970e-03 (CI: -5.8e-03, -2.1e-03).
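As a minimal sketch, the fitted equation can be evaluated directly from the point estimates reported above. Python is used purely for illustration (the analysis itself was run in R), and the function name and the example applicant are hypothetical. The logarithm is assumed to be the natural logarithm, which is what R's log() computes by default, and the result is in whatever units the interest rate had when the model was fitted.

```python
import math

# Point estimates as reported in the text (first to fifth coefficient).
INTERCEPT = 7.078e-01
B_FICO = -8.733e-04         # per FICO point
B_LENGTH = 1.357e-03        # per month of loan length
B_AMOUNT = 1.516e-06        # per USD requested
B_LOG_INCOME = -3.970e-03   # per unit of log(monthly income)


def predicted_rate(fico, length_months, amount_usd, monthly_income_usd):
    """Evaluate the fitted linear model for one applicant (hypothetical helper)."""
    if monthly_income_usd <= 0:
        raise ValueError("monthly income must be positive to take its logarithm")
    return (INTERCEPT
            + B_FICO * fico
            + B_LENGTH * length_months
            + B_AMOUNT * amount_usd
            + B_LOG_INCOME * math.log(monthly_income_usd))  # natural log (assumed)


# Made-up applicant for illustration; not a record from the data set.
example = predicted_rate(fico=700, length_months=36, amount_usd=10_000,
                         monthly_income_usd=5_000)
print(round(example, 4))
```

Keeping the coefficients as named constants makes it easy to vary one input at a time while holding the others fixed, which is the comparison the discussion of the results relies on.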

$\beta_1$, the independent term, represents the level of the interest rate when all variables are equal to zero. The coefficient of determination adjusted for the degrees of freedom is 0.75, indicating that approximately 75 per cent of the variance of the interest rate is explained by the fitted model. Note that although all individual coefficients are highly significant (p<0.001) and of the expected sign, they are all likely to have only a marginal impact on the estimated interest rate of the relevant loan. In particular, assuming that two applicants had identical FICO scores and keeping all variables but one constant, it should be expected that, for instance:

i. A 10 per cent positive difference in the monthly income of one of the applicants would produce a marginal reduction (of approximately 0.0004 per cent, or 0.00397 × log(1.1)³) in the interest rate that this applicant would obtain;
ii. Each additional month in the length of the loan of one of the applicants would likely result in an increase in the interest rate of only 0.0014 per cent; and
iii. An increase of USD 100 in the amount requested by one of the applicants would increase the interest rate received by this applicant by barely 0.0002 per cent.
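The comparisons above follow from multiplying each coefficient by the change in its variable. The sketch below, in Python and for illustration only, reproduces that arithmetic from the reported point estimates; the function names are hypothetical and the natural logarithm is assumed. Because the model is linear in loan length and in the amount requested, a change of several months or several hundred dollars scales the per-unit figure proportionally.

```python
import math

# Reported point estimates for the three variables of interest.
B_LENGTH = 1.357e-03        # per month of loan length
B_AMOUNT = 1.516e-06        # per USD requested
B_LOG_INCOME = -3.970e-03   # per unit of log(monthly income)


def income_effect(income_ratio):
    """Change in the fitted rate when income is multiplied by income_ratio."""
    # log(r * I) - log(I) = log(r), so only the ratio matters.
    return B_LOG_INCOME * math.log(income_ratio)


def amount_effect(delta_usd):
    """Change in the fitted rate for delta_usd more requested."""
    return B_AMOUNT * delta_usd


def length_effect(delta_months):
    """Change in the fitted rate for delta_months more loan length."""
    return B_LENGTH * delta_months


print(income_effect(1.10))   # income 10 per cent higher
print(amount_effect(100))    # USD 100 more requested
print(length_effect(1))      # one additional month
```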

Some caveats. Panel (c) shows the relationship between the estimated interest rate and the corresponding residuals, after conditioning on the purpose of the loan. It should be noted that the fitted model seems to be underestimating the interest rate, as can be observed from the many positive residuals in this panel. More importantly, panel (c) suggests that there seems to be a non-linear (presumably quadratic) pattern in the residuals not accounted for by the variables included in the model. Furthermore, the pattern followed by the residuals in this same panel also suggests that the assumption of equal variance does not hold.

² Dickey et al. (1989).
³ Note that, due to the logarithmic transformation, the effect of income on the interest rate is not linear in this model.

IV. Conclusions

Estimated results presented in Section III suggest that, after controlling for the effects of FICO scores, larger and lengthier loans are likely to have almost no impact on the interest rate of loans, other things being equal. However, as the estimated residuals seem to violate some of the assumptions of the linear regression model, this finding should be interpreted with extreme caution. Finally, although all individual coefficients are highly significant (p<0.001) and of the expected sign, their practical significance seems nil.

References

Fair Isaac Corporation (2011). Understanding your FICO Score. www.myfico.com/downloads/files/myfico_uyfs_booklet.pdf (accessed on February 18, 2013).

Dickey, David A., John O. Rawlings and Sastry G. Pantula (1989). Applied Regression Analysis, 2nd Ed., Springer-Verlag: NY.
