Non-Financial Factors in Credit Ratings
a Department of Banking and Finance, University of Mannheim, L 5.2, D-68131 Mannheim, Germany
b Centre for Economic Policy Research (CEPR), London, UK
Received 5 June 2002; accepted 14 January 2004
Available online 20 June 2004
Abstract
Internal credit ratings are expected to gain in importance because of their potential use for
determining regulatory capital adequacy and banks’ increasing focus on the risk–return profile
in commercial lending. Whereas the eligibility of financial factors as inputs for internal credit
ratings is widely accepted, the role of non-financial factors remains ambiguous. Analyzing
credit file data from four major German banks, we find evidence that the combined use of
financial and non-financial factors leads to a more accurate prediction of future default events
than the single use of each of these factors.
© 2004 Elsevier B.V. All rights reserved.
1. Introduction
Similar to capital market investors that rely on credit ratings provided by rating
agencies, banks assign internal credit ratings to appraise the creditworthiness of their
borrowers. In both cases, ratings can be interpreted as a screening technology that is
applied to alleviate asymmetric information problems between borrowers and lend-
ers. Whereas external ratings have been well established since the beginning of the
20th century, internal ratings were adopted increasingly by banks during the nineties
* Corresponding author. Tel.: +49-621-1811536; fax: +49-621-1811534.
E-mail address: norden@[Link] (L. Norden).
0378-4266/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/[Link]fin.2004.05.017
510 J. Grunert et al. / Journal of Banking & Finance 29 (2005) 509–531
(see English and Nelson, 1999; Treacy and Carey, 2000). Internal credit ratings for
corporate borrowers are an aggregated valuation procedure of various financial
and non-financial factors. In banking practice, ratings represent the basis for loan
approval, pricing, monitoring, and loan loss provisioning. While considerable re-
search has proven the suitability of financial factors to predict borrower insolvency
(see, for example, Altman, 1968), the role of non-financial factors remains ambigu-
ous. Although consideration of non-financial factors such as management quality
and industry perspectives is beyond controversy (see Basel Committee on Banking
Supervision, 2000a, 2001; Günther and Grüning, 2000), there is a lack of quantitative
research on this issue. With respect to these ‘‘soft’’ factors, bankers often refer to
their experience and distrust the sole use of financial criteria. A first investigation
of the importance of soft information in borrower–bank relationships is conducted
by Berger et al. (2002) and Stein (2002). Depending on bank size, Berger et al. (2002)
explore a bank’s ability to act in projects that require the evaluation of soft informa-
tion. They find that small banks are more capable of collecting and acting on soft
information than large banks. Stein (2002) points out that decentralized banking
hierarchies are likely to be more attractive when projects’ soft factors are to be evaluated.
This paper explores the role of non-financial factors in internal credit ratings. For
this purpose we examine empirically whether the combined use of financial and non-
financial factors leads to a more accurate prediction of default events than their sin-
gle use. 1 Our study has implications for both banks and bank supervisors: banks
will be able to better understand the role of quantitative and qualitative factors in
internal credit ratings and supervisors will be supported in claiming a ‘‘mixed’’ credit
rating to determine regulatory capital requirements (see Basel Committee on Bank-
ing Supervision, 2001).
The paper proceeds as follows. Section 2 provides an overview of related litera-
ture, in particular on the structure of internal rating systems and the properties of
non-financial sub-ratings. Section 3 describes the data, the variables, and proposes
a testable hypothesis. Section 4 analyzes whether a combination of financial and
non-financial factors leads to a more accurate prediction of default events than
the single use of each of these factors. Afterwards, several types of robustness tests
are performed. The paper concludes with Section 5.
1 The indicator variable for default events defined hereinafter is consistent with the Basel II definition of default (see Basel Committee on Banking Supervision, 2001).
and monitor borrowers at a given cost but they rarely specify the technology that is
or should be applied. Since the latter issue is closely connected to our study, we outline three strands of related literature in the following, focusing on the empirical analysis of default prediction. 2 Firstly, research on the prediction of corporate
bankruptcy on the basis of financial factors is presented. Secondly, empirical and
normative research on banks’ internal credit rating systems is reviewed. Finally, lit-
erature concerning the components of credit ratings, both quantitative and qualita-
tive, is described.
Firstly, our analysis relates to the work on corporate bankruptcy prediction with
financial factors (see Beaver, 1966; Altman, 1968; Altman et al., 1977; Ohlson, 1980;
Platt and Platt, 1990; Baetge, 1998). These factors typically concern the capital struc-
ture, profitability and liquidity of a firm. Models are based on linear discriminant
analysis, on logit and probit regression analysis or, more recent ones, on neural net-
works. Because of their relatively high discriminatory power, these models are widely
accepted but they nevertheless show some disadvantages (see Basel Committee on
Banking Supervision, 2000b, pp. 107–110). Few of them are based on a theory that
explains why and how certain financial factors are linked to corporate bankruptcy.
As financial factors are mostly backward-looking point-in-time measures, these models are inherently constrained, and it is not clear how well they perform out of sample (across time, firms, industries, etc.). This area of research is relatively well developed but still has to overcome the problems mentioned above.
Secondly, we briefly review research on banks’ internal credit rating systems
which is still scarce but growing considerably. It can be divided into an empirical
and a normative part. On the one hand, empirical analyses of banks’ internal rating
systems examine the structure and the use of ratings (see Elsas and Krahnen, 1998;
Machauer and Weber, 1998; English and Nelson, 1999; Treacy and Carey, 2000;
Crouhy et al., 2001; Ewert and Szczesny, 2001; Norden, 2002). These studies and
an overview of international best practice rating standards in the banking industry
(see Basel Committee on Banking Supervision, 2000a) show that internal rating sys-
tems are based on either statistical methods, constrained expert judgment-based
techniques or exclusively expert judgments. These systems tend to include similar
types of risk factors, typically a mix of quantitative and qualitative factors (e.g. lever-
age, profitability and liquidity ratios, management experience, industry perspec-
tives). However, the weighting schemes of these risk factors differ considerably
across banks. Ratings are used for loan approval, management reporting, pricing,
limit setting, and loan loss provisioning. Other studies analyze the frequency and
the extent of banks’ rating disagreement for a given borrower (see Risk Management
Association, 2000; Carey, 2001). In addition to the reasons given in these studies, it might be possible that differences in opinion on borrower quality result from a different evaluation of non-financial factors rather than of financial factors. Using a
similar kind of reasoning, Tabakis and Vinci (2002) compare and combine credit
2 Theory-based models (structural models) build on Merton (1974). Commercial applications of theory-based models are CreditMetrics™ and KMV CreditMonitor™.
accurate forecast of borrower quality? We intend to answer this question in the fol-
lowing sections.
3 In our study we eliminated bank 5 due to a lack of non-financial factors and bank 6 because of a small number of observations. For this reason, credit file information from four banks remains in our data set.
Table 1
Description of variables

Variable                      Description
Default dummy variable
DEF                           = 1 if default occurred in the year following the one of the considered rating
Rating categories
FR                            Financial rating with grades 1–6
NFR                           Non-financial rating with grades 1–6
OR                            Overall rating with grades 1–6
Non-financial factors
MGT                           Non-financial factor ‘‘Management quality’’
MKT                           Non-financial factor ‘‘Market position’’
Financial factors
LTA                           Logarithm of total assets
ER                            Equity-to-assets ratio
CR                            Current ratio
CFNL                          Cash flow-to-net liabilities
CIR                           Capital intensity ratio
ROA                           Return on assets
Bank dummy variables
B1, B2, B3, B4                = 1 if bank 1, 2, 3, 4 is the lender
Year dummy variables
Y1992, Y1993, Y1994, Y1995    = 1 if observation is from 1992, 1993, 1994, 1995
market position) directly from the credit files, whereas we have to compute the financial factors, some of which are integral parts of the financial rating, because only the underlying balance sheet items are in the data set (see Appendix A for detailed definitions). These factors cover all categories of the C’s of credit (except collateral), a familiar credit analysis concept in commercial lending (see Collins, 1966). Dummy variables are created to control for bank- and year-specific effects. Table 2 shows the distribution of the default variable DEF by banks, years, and overall rating classes. Panel A shows that default events are concentrated at bank 2 but quite evenly distributed across banks 1, 3 and 4. Note that this concentration of default events at bank 2 is not a problem because our results are not sensitive to the omission of bank 2 from the sample (see Section 4.2).
Whereas panel B indicates a relatively even distribution of the default events across years, a monotonic increase of the relative default frequency from rating class 1 to 6 can be observed in panel C.
Table 3 displays descriptive statistics of different rating categories. The means of
all three credit rating categories are higher for defaulters than for non-defaulters.
This is a first indication of a robust relation between credit ratings and default
status. The standard deviations of the different rating categories indicate that the dis-
persions of defaulters’ ratings are lower, which may be caused by the fact that default
events occur mainly in the grades 5 and 6. Similar to the study of Weber et al. (1999),
Table 2
Distribution of default events – panels A, B, and C present the distribution of the default variable DEF by banks, years and overall rating classes

Panel A: Default events by banks
Bank     DEF = 0    DEF = 1    Total    % of all obs.
1        66         3          69       16.87
2        110        52         162      39.61
3        84         11         95       23.23
4        80         3          83       20.29
Total    340        69         409      100
Table 3
Descriptive statistics of credit rating categories – credit ratings are assigned in year t, and default status DEF is reported for the following year

                        All                  DEF = 0              DEF = 1
                        Mean    Std. Dev.    Mean    Std. Dev.    Mean    Std. Dev.
Financial rating        3.72    1.58         3.45    1.50         5.07    1.26
Non-financial rating    3.51    1.15         3.30    1.07         4.54    0.96
Overall rating          3.47    1.17         3.27    1.09         4.45    1.01
No. of observations     409                  340                  69

All three ratings are based on a six-grade scale (1 = best, . . ., 6 = worst creditworthiness).
the standard deviation of non-financial ratings is lower than that of financial ratings. Furthermore, non-financial ratings are significantly better than financial ratings at the 0.01-level according to a Wilcoxon signed-rank test. This means that, on average, banks assess the quality of the management and the market position of their borrowers more favorably than their financial situation.
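The paired comparison can be illustrated with a hand-rolled Wilcoxon signed-rank statistic; the ten grade pairs below are made up, not the paper’s data. Since grade 1 is best, a positive difference means the financial grade is worse than the non-financial one.

```python
import numpy as np

fr  = np.array([4, 3, 5, 2, 6, 4, 3, 5, 4, 2])   # hypothetical financial grades
nfr = np.array([3, 3, 4, 2, 5, 4, 4, 4, 4, 2])   # hypothetical non-financial grades

d = fr - nfr
d = d[d != 0]                        # Wilcoxon convention: drop zero differences
order = np.abs(d).argsort(kind="stable")
sorted_abs = np.abs(d)[order]

ranks = np.empty(len(d))
i = 0
while i < len(d):                    # assign average ranks to tied |d|
    j = i
    while j < len(d) and sorted_abs[j] == sorted_abs[i]:
        j += 1
    ranks[order[i:j]] = (i + j + 1) / 2
    i = j

w_plus = ranks[d > 0].sum()          # rank sum where the financial grade is worse
w_minus = ranks[d < 0].sum()         # rank sum where the non-financial grade is worse
```

The test statistic is the smaller of the two rank sums; an exact or normal-approximation p-value would then be computed from it, as in the paper’s 0.01-level result.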
We now turn to the formulation of our hypothesis. Since the objective of assessing
a borrower’s creditworthiness is to specify his or her probability of default over a
given time horizon (usually one year), banks should not only use backward-looking
‘‘hard’’ financial data but also some forward-looking ‘‘soft’’ information. Internal
ratings of banks are usually based on borrowers’ current condition (point-in-time),
whereas rating agencies follow a ‘‘through the cycle’’ approach, projecting the borrowers’ condition over an entire economic cycle (see Treacy and Carey, 2000; Löffler, 2004). Accordingly, we propose the following hypothesis: a combination of financial
and non-financial factors leads to a more accurate prediction of default than the sin-
gle use of either financial or non-financial factors.
4.1. The relation between credit ratings and default events in the following year
4 Due to missing default dates and rating assignment dates during a year, we do not include default events in the same year because we do not know which variable (rating or default) changed first and which followed.
cation basically yields the same results, we use the credit rating variables (coded on a
scale from 1 to 6) in the remainder. In each model we control for bank and year-spe-
cific influences with dummy variables using bank 1 and year 1992 as reference cate-
gories.
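To make the specification concrete, the following sketch estimates a probit with sampling weights by maximum likelihood. The data, the single rating regressor, and the weight values are hypothetical stand-ins for the paper’s sample and full bank/year dummy set.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 409
rating = rng.integers(1, 7, size=n).astype(float)    # grades 1-6
X = np.column_stack([np.ones(n), rating])            # intercept + rating grade
y = (rating + rng.normal(0.0, 2.0, size=n) > 5.5).astype(float)  # synthetic defaults
w = np.where(y == 1, 0.12, 1.18)                     # sampling weights (see text)

def neg_loglik(beta):
    # weighted probit log-likelihood
    p = np.clip(norm.cdf(X @ beta), 1e-9, 1 - 1e-9)
    return -np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
```

With the synthetic data generated this way, the estimated rating coefficient comes out positive: a worse grade raises the fitted probability of default, matching the sign pattern in Table 4.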
Given that our sample includes a relatively high number of defaults due to the oversampling of a distressed sub-sample, we employ sampling weights in all subsequent regression models in order to correct for potentially biased coefficients. Sampling weights represent the inverse of an observation’s probability of being included in the sample. In the estimation procedure, sampling weights put more weight on non-defaulters and less weight on defaulters in order to approximate the default distribution of the underlying population (see Ewert and Szczesny, 2001, p. 15). 5 We assume a population probability of default derived from OECD data on loan loss provisions of German commercial banks, in which the rounded average probability of default for the years 1993–1996 amounts to approximately 2%. 6 In addition, Moody’s Investor Service (2001) uses 1.6% for its RiskCalc™ model for German private firms. For robustness purposes, we also apply probability weights assuming a population probability of default of 1.6%. The results for models 1–3 and regressions 1–2 are very similar to those obtained with 2%: neither the regression coefficients’ magnitude and significance nor the relative model performance changes substantially. Therefore we stick to the previously assumed value of 2%.
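The weight construction can be sketched as follows, using the sample counts from Table 2 and the assumed 2% population default rate. The weight of each group is the ratio of its population share to its sample share, i.e. the inverse of its relative inclusion probability.

```python
# Sampling weights as the inverse inclusion probability, assuming a
# population default rate of 2% (counts taken from Table 2, Panel A).
n_def, n_nondef = 69, 340           # defaulters / non-defaulters in the sample
n = n_def + n_nondef
pi_pop = 0.02                       # assumed population probability of default

p_def_sample = n_def / n            # sample share of defaulters (about 16.9%)
p_nondef_sample = n_nondef / n

w_def = pi_pop / p_def_sample             # down-weight oversampled defaulters
w_nondef = (1 - pi_pop) / p_nondef_sample # up-weight non-defaulters

print(round(w_def, 4), round(w_nondef, 4))   # prints 0.1186 1.1789
```

As a sanity check, the weighted group sizes reproduce the assumed population mix: w_def·69 + w_nondef·340 = 409 with a weighted default share of exactly 2%.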
The models can be evaluated by using different criteria (see Hosmer and Lemeshow, 2000; Deutsche Bundesbank, 2003). We decided to use McFadden’s R2, the Brier Score, the percentage of correctly classified observations, and type I and type II error rates as evaluation criteria because they represent an adequate mix of goodness-of-fit and classification accuracy measures. Since the conventional R2 cannot be calculated for probit and logit models, McFadden’s R2 (Pseudo R2) is employed. It is defined as 1 − (unrestricted log-likelihood/restricted log-likelihood). The Brier Score (BS) is a measure of prediction accuracy that is well known in meteorology and medical science (see Brier, 1950). It is calculated as BS = (1/n) Σ_{i=1}^{n} (h_i − p_i)², where h_i is a binary indicator for the actual realization of the default variable (1 if default, 0 if no default) and p_i is the estimated probability of default. The difference between the Brier Score and the percentage of correctly classified observations is that the former is more sensitive to the level of the estimated probabilities: the Brier Score takes the estimated probabilities directly into account, whereas the percentage of correctly classified observations transforms probabilities that are higher than a specific cutoff point to 1 and all others to 0. In the following,
predicted defaults are those observations with an estimated probability above 0.11,
which is the cutoff point that maximizes the proportion of observations correctly pre-
dicted by model 1. 7 It is important to mention that using the optimal cutoff point of
5 See also Zmijewski (1984), who calls oversampling ‘‘choice-based sample bias’’.
6 See Organisation for Economic Co-operation and Development (2001).
7 See Carey and Hrycay (2001) for a similar criterion for choosing a cutoff point. Zmijewski (1984) provides an overview and several examples of related problems. Ohlson (1980, p. 120) discusses this topic in the same context, emphasizing that 0.5 is not always a reasonable cutoff value.
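The evaluation criteria can be sketched as follows; the default indicators, fitted probabilities, and log-likelihood values are toy numbers, not the paper’s sample, and the 0.11 cutoff is the one quoted in the text.

```python
import numpy as np

h = np.array([0, 0, 1, 0, 1, 0, 0, 1])                          # actual defaults
p = np.array([0.02, 0.40, 0.70, 0.05, 0.60, 0.08, 0.15, 0.30])  # fitted PDs
cutoff = 0.11                                 # cutoff optimized for model 1

brier = np.mean((h - p) ** 2)                 # Brier Score
pred = (p > cutoff).astype(int)               # classify by cutoff
correct = np.mean(pred == h)                  # share correctly classified
type1 = np.mean(pred[h == 1] == 0)            # defaulters missed
type2 = np.mean(pred[h == 0] == 1)            # non-defaulters flagged

ll_u, ll_r = -120.0, -170.0                   # hypothetical log-likelihoods
r2_mcfadden = 1 - ll_u / ll_r                 # McFadden's (Pseudo) R2
```

Note how the Brier Score uses the probabilities themselves, while the last four criteria depend only on which side of the cutoff each probability falls.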
8 The significance test is based on the Williams–Kloot statistic z_wk, which is described in detail by Redelmeier et al. (1991) and Vinterbo and Ohno-Machado (1999).
9 See Carey and Hrycay (2001). Their logit default prediction model (based on four financial factors) produces a type I error of 0.68 in sample and 0.65 out of sample.
Table 4
Regression results and evaluation criteria for models 1–3

Panel A: Regression results
Variable     Model 1 (financial rating)    Model 2 (non-financial rating)    Model 3 (overall rating)
DEF          Coefficient   Robust SE       Coefficient   Robust SE           Coefficient   Robust SE
Rating        0.4439       0.0734           0.6651       0.0984               0.8931       0.1355
B2            1.4741       0.2876           0.9260       0.2631               2.2023       0.4581
B3            1.0654       0.3126           0.3268       0.2768               1.1187       0.4092
B4            0.2701       0.3428          −0.6470       0.3734               0.2628       0.4179
Y1993        −0.4807       0.2760          −0.6122       0.3172              −0.5824       0.3007
Y1994        −0.2550       0.2321          −0.1306       0.2399              −0.2851       0.2449
Y1995         0.3162       0.2073           0.4424       0.2294               0.3956       0.2269
Intercept    −4.8979       0.4887          −5.0936       0.4856              −6.8252       0.8528

Panel B: Evaluation criteria – predicted defaults are those observations with a fitted probability above 0.11, which is the cutoff point that maximizes the proportion of correctly predicted observations by model 1. The null hypotheses BS(model 1) = BS(model 3) and BS(model 2) = BS(model 3) can both be rejected with a p-value of 0.00 using the Williams–Kloot statistic z_wk (two-tailed test)

Evaluation criterion        Model 1 (financial rating)    Model 2 (non-financial rating)    Model 3 (overall rating)
McFadden's R2               0.2680                        0.2938                            0.3599
Brier Score                 0.1301                        0.1257                            0.1043
Obs. correctly classified   0.8875                        0.8900                            0.9169
Type I error rate           0.5217                        0.4783                            0.4058
Type II error rate          0.0294                        0.0353                            0.0176

The sample used in all three probit regressions is the same and consists of 409 observations from the period 1992–1995. The dependent variable, DEF, takes the value one if default occurs in the year following the one of the rating assignment and zero otherwise. In addition to bank and year dummy variables, model 1 uses the financial rating FR, model 2 the non-financial rating NFR, and model 3 the overall rating OR as independent variable (instead of ‘‘Rating’’ as indicated in the first column) to estimate the probability of a default event. Coefficients are estimated using the maximum likelihood method with sampling weights. ***, **, * Significantly different from zero at the 0.01, 0.05, and 0.10-level.
empirical estimate of the sampling distribution of each evaluation criterion in the fol-
lowing manner:
1. Random draw of 409 observations with replacement from the original sample.
2. Estimation of models 1–3 as described above using the sample drawn in step 1.
3. Evaluation of each model’s performance using the same criteria as before.
4. Independent replication of steps 1–3 1000 times.
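A minimal sketch of steps 1–4, using the Brier Score as the criterion. The default indicators and fitted probabilities are synthetic; step 2 of the paper refits probit models 1–3 on each draw, which the sketch abbreviates by re-scoring fixed fitted probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def brier(h, p):
    return np.mean((h - p) ** 2)

# Synthetic stand-ins for default indicators and fitted default probabilities
h = rng.integers(0, 2, size=409).astype(float)
p = np.clip(0.1 + 0.5 * h + rng.normal(0.0, 0.1, size=409), 0, 1)
observed = brier(h, p)

boot = []
for _ in range(1000):                        # step 4: 1000 replications
    idx = rng.integers(0, 409, size=409)     # step 1: draw with replacement
    boot.append(brier(h[idx], p[idx]))       # steps 2-3: re-estimate and evaluate
boot = np.array(boot)

bias = boot.mean() - observed                # average bootstrap estimate minus observed
ratio = abs(bias) / boot.std(ddof=1)         # compare bias to its standard deviation
ci = np.percentile(boot, [2.5, 97.5])        # simple percentile confidence interval
```

The resulting 1000 values approximate the sampling distribution of the criterion, from which the bias, its standard deviation, and confidence intervals are read off.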
sampling distribution. Using this bootstrap distribution we can determine the bias,
calculated as the average difference between the bootstrap estimates and the ob-
served statistic (as shown in Table 4), its standard deviation and confidence intervals.
Efron (1982) suggests that when the ratio of the bias to its standard deviation is less than 0.25, the bias is not a problem, as the random error will surpass it. Table 5 presents the results.
As can be seen, in most of the cases the bias is relatively small in comparison to its
standard deviation. For example, calculating each model’s ratio of the bias to the
estimated standard deviation for the Brier Score (observations correctly classified)
yields a range of 0.14–0.2 (0.14–0.18). However, for the type I error rate (type II er-
ror rate) we obtain values in the range 0.42–0.69 (0.26–0.40). These results indicate
that biased coefficients do not represent a serious problem for most of the evaluation
criteria. Moreover, the reported confidence intervals support the previously found
ranking of the three models.
Table 5
Bootstrapping of evaluation criteria for models 1–3

Evaluation criterion            Model 1 (financial rating)    Model 2 (non-financial rating)    Model 3 (overall rating)
McFadden's R2
  Bootstrap mean                0.2842                        0.3118                            0.3811
  Bias                          0.0162                        0.0180                            0.0212
  Bootstrap Std. Dev.           0.0454                        0.0476                            0.0591
  95% Conf. interval            [0.1818; 0.3485]              [0.2091; 0.3786]                  [0.2232; 0.4557]
[Figure: scatter plot of McFadden's R2 for model 3 (vertical axis) vs. model 1 (horizontal axis) across bootstrap samples.]
10 With regard to this criterion, model 3 performs equally well in 6 of 1000 cases and worse than model 1 in 15 of 1000 cases.
[Figure: scatter plot of the Brier Score for model 3 (vertical axis) vs. model 1 (horizontal axis) across bootstrap samples.]

[Figure: scatter plot of the proportion of correctly classified observations for model 3 (vertical axis) vs. model 1 (horizontal axis) across bootstrap samples.]
Table 6
Relative performance of model 3 vs. models 1 and 2

Evaluation criterion        P (model 3 worse than model 1)    P (model 3 worse than model 2)
McFadden's R2               0.003                             0.022
Brier Score                 0.000                             0.004
Obs. correctly classified   0.015                             0.017
Type I error rate           0.196                             0.173
Type II error rate          0.131                             0.118

Reported values are probabilities of observing a positive/negative difference between the evaluation criteria. Probabilities are obtained from the bootstrap distribution of each model and criterion.
model 3 performing worse than the two other models. The results for the different evaluation criteria are presented in Table 6.
It turns out that model 3 clearly dominates the other two models in terms of
McFadden’s R2 , the Brier Score and the percentage of correctly classified observa-
tions, as the probability of observing the opposite is very small. With respect to type
I and II error rates, model 3 is still, but to a lesser extent, superior. Note that this
deterioration of performance should not be overstated since it can be a consequence
of the fact that type I and II error estimates are relatively more biased than the other
evaluation criteria estimates.
Summarizing, a ‘‘mixed’’ model that includes both financial and non-financial factors leads to a more accurate prediction of default events than a model that is solely based on either type of factor or on naive forecasts. The result is statistically highly significant, which provides strong support for our hypothesis.
In this section we analyze the robustness of our previous finding with respect to several aspects. Firstly, we address the marginal impact of each bank’s rating-default set by successively leaving one bank out of the sample and estimating the same models with the remaining three banks as in Section 4.1. The results confirm our previous findings: model 3 dominates model 1 with respect to all of our five evaluation criteria. 11
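The leave-one-bank-out check can be sketched as follows; the sample and the `evaluate` function are hypothetical stand-ins, since the actual check re-estimates models 1–3 and compares all five evaluation criteria.

```python
# Hypothetical toy sample of (bank, default) pairs
sample = [("B1", 0), ("B2", 1), ("B2", 0), ("B3", 0), ("B4", 1), ("B1", 0)]

def evaluate(obs):
    # stand-in for "estimate the models and compute the evaluation criteria";
    # here it simply returns the subsample default rate
    return sum(d for _, d in obs) / len(obs)

# Successively leave one bank out and re-evaluate on the remaining banks
results = {b: evaluate([o for o in sample if o[0] != b])
           for b in ("B1", "B2", "B3", "B4")}
```

If the model ranking is stable across all four subsamples, no single bank’s rating-default set drives the result.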
Secondly, we carried out regression analyses at the individual bank level. Due to the small number of observations from banks 1 and 4, individual models can only be estimated for banks 2 and 3. The results obtained are consistent with previous ones.
Thirdly, we examine the influence of weighting schemes in internal rating systems.
With regard to previous results, it is not clear why model 3 performs better than
model 1. One reason might be the additional inclusion of non-financial factors.
11 The only exception occurs if bank 2 is left out. In this case, model 3 dominates model 1 for all evaluation criteria except the type II error rate, where both models perform equally.
Another reason might be that the independent variables in both models are based on
different weighting schemes (one optimal and one sub-optimal). In particular, the
fact that we use the financial rating based on a weighting scheme optimized for
the overall rating could be problematic. To investigate the influence of weighting
schemes, we compare a probit model that explains default events on the basis of
financial factors (regression 1) with a probit model that explains default events on
the basis of financial and non-financial factors (regression 2). For this purpose, we
inspect the financial factors to detect outliers. Since the distributions of the current
ratio, cash flow-to-net liabilities and the capital intensity ratio exhibit some extreme
values, we estimate regressions 1 and 2 with raw data and with winsorized data. 12
Results obtained from winsorized data are very close to those from the raw data,
indicating that outliers are not crucial. For this reason, we stick to the raw data in
the remainder. Regression results and the corresponding evaluation criteria are re-
ported in Table 7. 13
Essentially, in regression 1 we find that the equity ratio (ER) has the expected neg-
ative sign and is significant at the 0.01-level. Likewise, the variable return on assets
(ROA) has the expected negative sign and is significant at the 0.05-level. In terms of
economic significance, the coefficient is higher than that of the equity ratio. In regres-
sion 2 the variable ER still has an economic influence but its significance declines to
the 0.10-level. Note that the non-financial factor management quality (MGT) is eco-
nomically and statistically significant at the 0.01-level, whereas the non-financial fac-
tor market position (MKT) is not significant at all. Return on assets also remains
significant at the 0.05-level. As reported in Table 7, panel B, regression 2 produces
a more accurate prediction of default events than regression 1 with respect to most
of the evaluation criteria (except type II error rate), although the cutoff point (here
0.07) was optimized for regression 1. Both regression models are more accurate than
the naive forecast, which leads to a Brier Score of 0.1308. Since the weighting of financial and non-financial factors is not predetermined here (as was the case in Section 4.1) but rather estimated in the regressions, weighting schemes do not seem to be critical: our previous results are reproducible without relying
on sub-ratings that are based on predetermined weighting schemes. Note that the significant influence of the factor management quality (MGT) is consistent with the survey results of Günther and Grüning (2000). Additionally, as stated in Section 4.1, we
bootstrap regressions 1 and 2 to study a potential bias in estimators and the relative
performance of both models. Table 8 summarizes the main results.
Panel A shows that the bias is relatively low in comparison to the bootstrap stan-
dard deviation for observations correctly classified and the type II error. Considering
12 The replacement of extreme values by the minimal/maximal admitted values is called winsorization. We replaced values below (above) the 5% quantile (95% quantile) by the 5%/95% quantile values of a variable’s distribution.
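The winsorization described in footnote 12 can be sketched as follows; the sample vector is made up for illustration.

```python
import numpy as np

def winsorize(x, lower=0.05, upper=0.95):
    # replace values outside the 5%/95% quantiles by the quantile values
    lo, hi = np.quantile(x, [lower, upper])
    return np.clip(x, lo, hi)

x = np.array([-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 900.0])
xw = winsorize(x)
```

Only the two extreme observations are pulled in to the quantile bounds; the interior values pass through unchanged, so the bulk of the distribution is untouched.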
13 Note that due to the lower number of observations, the absolute values of the evaluation criteria are not comparable to those of the regression analyses presented in Section 4.1.
Table 7
Prediction of default events with different factor types

Panel A: Regression results
Variable     Regression 1                 Regression 2
DEF          Coefficient   Robust SE     Coefficient   Robust SE
LTA          −0.0098       0.1070         0.07888      0.1309
ER           −0.0170       0.0055        −0.0105       0.0062
CR            0.0000       0.0000         0.0001       0.0018
CFNL         −0.0000       0.0000        −0.0001       0.0000
CIR           0.0002       0.0003         0.0003       0.0003
ROA          −0.0321       0.0127        −0.0344       0.01490
MKT          –             –             −0.1603       0.1462
MGT          –             –              0.5075       0.1125
B2            0.7146       0.3863         0.4692       0.4403
B3            0.4526       0.4307         0.2667       0.4811
B4           −0.1605       0.5056        −0.5290       0.5900
Y1993        −0.1873       0.3283        −0.2305       0.3724
Y1994         0.2370       0.2981         0.1000       0.3110
Y1995         0.6440       0.3038         0.6105       0.3257
Intercept    −2.4507       1.2802        −4.1844       1.8581

Panel B: Evaluation criteria – predicted defaults are those observations with a fitted probability above 0.07, which is the cutoff point that maximizes the proportion of correctly predicted observations by regression 1. The null hypothesis BS(regression 1) = BS(regression 2) can be rejected with a p-value of 0.00 using the Williams–Kloot statistic z_wk (two-tailed test)

Evaluation criterion        Regression 1    Regression 2
McFadden's R2               0.2288          0.3297
McFadden's R2 adjusted      −0.288          −0.267
Brier Score                 0.1256          0.1042
Obs. correctly classified   0.8777          0.8993
Type I error rate           0.5581          0.4186
Type II error rate          0.0425          0.0425

The dependent variable DEF indicates whether default occurs in the year following the one of the rating assignment. Regression 1 uses all financial factors described in Table 1 as independent variables (LTA, ER, CR, CFNL, CIR, ROA, and dummies for banks and years), whereas regression 2 uses all financial factors (and dummies for banks and years) and the non-financial factors (MKT, MGT). Due to missing data the sample is reduced to 278 observations. Coefficients are estimated using the maximum likelihood method with sampling weights. ***, **, * Significantly different from zero at the 0.01, 0.05, and 0.10-level.
the other three evaluation criteria, the bias is somewhat larger but, in absolute terms, always smaller than the bootstrap standard deviation. Overall, in our opinion these mixed bootstrapping results do not represent clear evidence of a serious bias
in the evaluation criteria. Furthermore, panel B reveals that regression 2 dominates
regression 1 with regard to four of five evaluation criteria. Only the performance
analysis with type II error rates provides a less definite predominance of regression
2. Note that if we bootstrap regressions with variables winsorized at the 5%/95%
quantiles (not reported in Table 8), the probability of regression 2 performing worse
Table 8
Bootstrapping of regressions 1 and 2
Panel A: Bootstrapping results
Evaluation criterion Regression 1 Regression 2
McFadden’s R2 Bootstrap mean 0.2770 0.3926
Bias 0.0481 0.0629
Bootstrap Std. Dev. 0.0625 0.07062
95% Conf. interval [0.1268; 0.3063] [0.2080; 0.4063]
Brier Score Bootstrap mean 0.1188 0.0949
Bias )0.0068 )0.0047
Bootstrap Std. Dev. 0.0170 0.0162
95% Conf. interval [0.1004; 0.1677] [0.0809; 0.1477]
Panel B: Relative performance of regression 2 vs. regression 1 – Reported values are probabilities of
observing a positive/negative difference between the evaluation criteria. Probabilities are obtained
from the bootstrap distribution of each regression and criterion
Evaluation criterion P (regression 2 worse
than regression 1)
McFadden’s R2 0.000
Brier Score 0.000
Obs. correctly classified 0.150
Type I error rate 0.110
Type II error rate 0.415
For each evaluation criterion we report the bootstrap mean, the bias (calculated as the average difference
of the bootstrap estimates and the observed value), the bootstrap standard deviation and a 95% confidence
interval based on the bias-corrected percentile method. Regressions are estimated as described in Table 7.
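As a minimal sketch of the resampling behind Table 8 (the function names are ours; the paper re-estimates each logit regression on every bootstrap sample, a step omitted here for brevity), the bootstrap mean, bias, and standard deviation of the Brier Score could be computed as:

```python
import random

def brier_score(probs, outcomes):
    """Mean squared difference between predicted PDs and realized defaults (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def bootstrap_criterion(probs, outcomes, n_boot=1000, seed=1):
    """Nonparametric bootstrap (Efron, 1979) of the Brier Score.

    Returns (bootstrap mean, bias, bootstrap std. dev.), where the bias is
    the average difference between the bootstrap estimates and the observed
    value, as described in the note to Table 8.
    """
    rng = random.Random(seed)
    observed = brier_score(probs, outcomes)
    n = len(outcomes)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement and re-evaluate the criterion.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(brier_score([probs[i] for i in idx],
                                     [outcomes[i] for i in idx]))
    mean = sum(estimates) / n_boot
    std = (sum((e - mean) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5
    return mean, mean - observed, std
```

The bias-corrected percentile confidence intervals reported in panel A would additionally require sorting the bootstrap estimates and applying the bias correction to the percentile bounds.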
Since we do not know how accurately banks rate firms that were in default during
the preceding year, 14 we investigate this issue for a subsample of our original data.
More specifically, we drop subsequent default observations from firms that exhibit
multiple defaults (for example, from a firm-specific year-default vector (1,1,0,1) we
only use the first and the last observation). This procedure leads to a subsample with
383 observations (of which there are 50 defaults). As in Section 4.1, we then
estimate models 1–3. The resulting rank ordering reveals that model 3 performs better than models 1 and 2 in terms of all evaluation criteria. In addition, applying the
same selection criteria as before, we carry out regressions 1 and 2 with a subsample
of 264 observations (of which there are 35 defaults). Regression 2 leads to a better
forecast of default events than regression 1 with respect to most of the evaluation
criteria (except type II error rate).
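The multiple-defaults filter described above can be sketched as follows. The helper name is ours, and we assume the rule "drop every observation whose preceding year was a default", which reproduces the example in the text: for the year-default vector (1, 1, 0, 1), only the first and the last observation are kept.

```python
def drop_post_default_years(default_flags):
    """Return a keep/drop mask for one firm's yearly default flags (0/1).

    Assumed rule: keep an observation only if the firm was not in default
    in the preceding year; the firm's first observation is always kept.
    """
    return [i == 0 or default_flags[i - 1] == 0
            for i in range(len(default_flags))]
```

Applying the mask to the example vector (1, 1, 0, 1) keeps positions 1 and 4, i.e. the first and the last observation.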
Finally, results may be influenced by a reporting lag due to outdated financial
statements. Since distressed firms stop producing financial statements, financial ratings may be based on the last available report before default. For example, if a firm
defaults early in year t, the bank's financial rating of this firm is still based on the
financial statements for fiscal year t − 2. To address this issue, we create a variable
‘‘time lag’’ that reports the number of days between December 31 of the last avail-
able financial statements and the day of the credit application or review. We cannot
refer to the submission date of a firm’s financial reports because it is not included in
our sample. Then we rerun models 1–3 and regressions 1 and 2, adding the variable
‘‘time lag’’. It turns out that ‘‘time lag’’ is not significant in any regression (with
p-values in a range of 0.2–0.5), the rating variables in models 1–3 remain highly sig-
nificant and the rank ordering of the models according to the five evaluation criteria
has not been altered. These results indicate that timeliness does not seem to be a crit-
ical issue. Nevertheless, this point should be interpreted with care because our
variable ‘‘time lag’’ is only an imperfect approximation of the actual reporting
behavior.
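The construction of the "time lag" variable amounts to simple date arithmetic (hypothetical helper name; we assume financial statements always close on December 31, as the definition in the text implies):

```python
from datetime import date

def time_lag_days(last_statement_year, review_date):
    """Days between December 31 of the fiscal year of the last available
    financial statements and the date of the credit application or review."""
    return (review_date - date(last_statement_year, 12, 31)).days
```

For instance, a review on March 15, 2002 based on statements for fiscal year 2000 yields a lag of 439 days.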
Recapitulating, the results of various robustness tests support the hypothesis that
the combined use of financial and non-financial factors leads to a more accurate pre-
diction of default events than their single use or naive forecasts. This finding is not
sensitive to the omission of any bank from the sample, robust at the individual level
for two banks, not driven by predetermined weighting schemes, not dependent on
whether firms default once or more times, and does not seem to be biased by a
reporting lag.
14. Ratings should reflect a firm's probability of default, which implies that rating changes indicate an
increase or decrease in a firm’s default risk. Considering our definition of default and the fact that banks
actually keep on rating distressed firms, we think it is reasonable to analyze not only situations in which
firms default but also those in which they recover from a previous default. For example, if a bank creates a
specific loan loss provision, it will set the rating to grade 6 directly afterwards. Nonetheless, if the bank
subsequently gets new information that indicates an improved creditworthiness, any further rating will
describe the future (and not the current) status of the borrower.
5. Conclusion
Over the past 10 years, banks' uses of internal credit ratings have multiplied. In
the near future, ratings will be recognized by banking supervisory authorities for
determining banks' capital adequacy, bringing the internal and the external
perspective of credit risk management considerably closer together (see Basel Committee on Banking Supervision
(2001, 2003)). Given this rising importance of credit ratings, the design of sound rating systems is in the interest of banks, borrowers, and supervisors. Whereas the relevance of financial factors for rating purposes is widely accepted, the consideration
of non-financial factors is equally beyond controversy in banking practice, but it has often been justified only holistically.
This paper constitutes a first attempt to explore the role of non-financial factors in
credit ratings. Our main result is that the combined use of financial and non-financial
factors leads to a significantly more accurate default prediction than the single use of
financial or non-financial factors. Default is defined consistently with the definition
of the Basel Committee on Banking Supervision; goodness of fit and accuracy of default prediction are measured using McFadden's R2, the Brier Score, the percentage of
correctly classified observations, and type I/II error rates.
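The five evaluation criteria can be computed from predicted default probabilities and realized defaults as sketched below. The function name and the 0.5 classification cutoff are our assumptions; the paper does not fix a particular threshold in this passage.

```python
def evaluation_criteria(probs, outcomes, loglik_model, loglik_null, cutoff=0.5):
    """Five evaluation criteria from predicted default probabilities,
    realized defaults (0/1), and the log-likelihoods of the fitted and
    the intercept-only (null) logit model."""
    n = len(outcomes)
    # McFadden's R2: improvement of the model log-likelihood over the null.
    mcfadden_r2 = 1.0 - loglik_model / loglik_null
    # Brier Score: mean squared deviation of predicted PDs from outcomes.
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    pred = [1 if p >= cutoff else 0 for p in probs]
    correct = sum(1 for yhat, y in zip(pred, outcomes) if yhat == y) / n
    defaults = sum(outcomes)
    # Type I error: a defaulting firm is classified as a non-defaulter.
    type1 = sum(1 for yhat, y in zip(pred, outcomes)
                if y == 1 and yhat == 0) / defaults
    # Type II error: a non-defaulting firm is classified as a defaulter.
    type2 = sum(1 for yhat, y in zip(pred, outcomes)
                if y == 0 and yhat == 1) / (n - defaults)
    return mcfadden_r2, brier, correct, type1, type2
```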
Although our results are limited in some ways due to the data used, they essen-
tially confirm banking practice (see Basel Committee on Banking Supervision,
2000a; Günther and Grüning, 2000) and show that holistic justifications for the
use of non-financial factors can be confirmed by a quantitative analysis. However,
since only the benefits of non-financial factors have been analyzed, it is not possible
to conclude that their additional use represents a net advantage because we have not
examined the costs of acquiring and processing non-financial information. The latter
may be left to future research, which should proceed with an integrated cost–benefit
analysis of internal credit rating systems at the individual bank level. Using more
extensive data, it would be interesting to differentiate our analysis with respect to
the age and size of borrowing firms since both characteristics might be linked to
the degree to which non-financial factors improve default prediction. Additionally,
in particular for pricing issues, it might be instructive to study whether non-financial
factors in credit ratings can improve the differentiation among borrowers that possess an acceptable degree of creditworthiness. Collecting data from different
financial intermediaries, our results could also be tested with regard to bank size
and organizational structure following Berger et al. (2002) and Stein (2002). Finally,
a promising extension of the research of Carey (2001) and Tabakis and Vinci (2002), as well
as our own, would be to investigate whether and to what extent there is a relationship
between multiple lenders’ rating disagreements for common borrowers and non-
financial factors in credit ratings.
Acknowledgements
We thank the participants of the 9th Annual Meeting of the German Finance Association in Cologne, Germany, for their valuable comments and insights. In addition, we are grateful to the seminar participants at the University of Mannheim for useful conversations. We retain responsibility for any remaining errors.
This table shows the formulae to calculate the financial factors used in Section 4.
The factors ER, CR, CFNL, CIR, and ROA are part of the internal credit rating systems of banks 1–4.
References
Altman, E.I., 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
Journal of Finance 23, 589–609.
Altman, E.I., Haldeman, R.G., Narayanan, P., 1977. ZETA analysis – a new model to identify bankruptcy risk of corporations. Journal of Banking and Finance 1, 29–54.
Baetge, J., 1998. Empirische Methoden zur Früherkennung von Unternehmenskrisen. Opladen/Wiesbaden.
Basel Committee on Banking Supervision, 2000a. Range of Practice in Banks’ Internal Rating Systems.
Discussion Paper, January 2000.
Basel Committee on Banking Supervision, 2000b. Credit ratings and complementary sources of credit
quality information. Working Paper No. 3, August 2000.
Basel Committee on Banking Supervision, 2001. The New Basel Capital Accord. Consultative Document,
January 2001.
Basel Committee on Banking Supervision, 2003. The New Basel Capital Accord. Consultative Document,
April 2003.
Beaver, W.H., 1966. Financial ratios as predictors of failure. Journal of Accounting Research 4, 71–111.
Berger, A.N., Miller, N.H., Petersen, M.A., Rajan, R.G., Stein, J.C., 2002. Does function follow
organizational form? Evidence from the lending practices of large and small banks. Working Paper
No. 8752. National Bureau of Economic Research.
Bhattacharya, S., Thakor, A., 1993. Contemporary banking theory. Journal of Financial Intermediation 3,
2–50.
Blochwitz, S., Eigermann, J., 2000. Unternehmensbeurteilung durch Diskriminanzanalyse mit qualitativen Merkmalen. Zeitschrift für betriebswirtschaftliche Forschung 52, 58–73.
Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review
78, 1–3.
Brunner, A., Krahnen, J.P., Weber, M., 2000. Information production in credit relationships: on the role
of internal ratings in commercial banking. Working Paper No. 2000/10. Center for Financial Studies,
Frankfurt/Main.
Carey, M., 2001. Some evidence on the consistency of banks’ internal credit ratings. Working Paper.
Federal Reserve Board.
Carey, M., Hrycay, M., 2001. Parameterizing credit risk models with rating data. Journal of Banking and
Finance 25, 197–270.
Collins, N.J., 1966. Credit Analysis – Concepts and Objectives. In: Baughn, W.H., Walker, C.E. (Eds.),
The Banker’s Handbook, pp. 279–289.
Crouhy, M., Galai, D., Mark, R., 2001. Prototype risk rating system. Journal of Banking and Finance 25,
47–95.
Deutsche Bundesbank, 2003. Validierungsansätze für interne Ratingsysteme. Monatsbericht (September), 61–74.
Diamond, D.W., 1984. Financial intermediation and delegated monitoring. Review of Economic Studies
51, 393–414.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.
Efron, B., 1982. The Jackknife, the Bootstrap and other Resampling Plans. Society for Industrial and
Applied Mathematics, Philadelphia.
Elsas, R., Henke, S., Machauer, A., Rott, R., Schenk, G., 1998. Empirical analysis of credit relationships
in small firms financing: sampling design and descriptive statistics. Working Paper No. 1998/14. Center
for Financial Studies, Frankfurt/Main.
Elsas, R., Krahnen, J.P., 1998. Is relationship lending special? Evidence from credit-file data in Germany.
Journal of Banking and Finance 22, 1283–1316.
English, W.B., Nelson, W.R., 1999. Bank risk rating of business loans. In: Proceedings of the 35th Annual
Conference on Bank Structure and Competition, May.
Ewert, R., Szczesny, A., 2001. Countdown for the new Basle Capital Accord. Working Paper No. 2001/05.
Center for Financial Studies, Frankfurt/Main.
Günther, T., Grüning, M., 2000. Einsatz von Insolvenzprognoseverfahren bei der Kreditwürdigkeitsprüfung im Firmenkundenbereich. Die Betriebswirtschaft 60, 39–59.
Hesselmann, S., 1995. Insolvenzprognose mit Hilfe qualitativer Faktoren. Aachen.
Hosmer, D.W., Lemeshow, S., 2000. Applied logistic regression. New York.
Krahnen, J.P., Weber, M., 2001. Generally accepted rating principles: A primer. Journal of Banking and
Finance 25, 3–23.
Leland, H.E., Pyle, D.H., 1977. Information asymmetries, financial structure, and financial intermedi-
ation. Journal of Finance 32, 371–387.
Löffler, G., 2004. An anatomy of rating through the cycle. Journal of Banking and Finance 28, 695–720.
Machauer, A., Weber, M., 1998. Bank behavior based on internal credit ratings of borrowers. Journal of
Banking and Finance 22, 1355–1383.
Merton, R.C., 1974. On the pricing of corporate Debt: The risk structure of interest rates. Journal of
Finance 29, 449–470.
Moody’s Investor Service, 2001. Moody’s RiskCalc for Private Companies: The German Model – Rating Methodology, November 2001.
Norden, L., 2002. Spezialbanken und Basel II: Eine empirische Untersuchung interner Ratingsysteme. Die
Betriebswirtschaft 62, 273–288.
Organisation for Economic Co-operation and Development, 2001. Bank Profitability: Financial
Statements of Banks, Paris.
Ohlson, J.A., 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting
Research 18, 109–131.
Platt, H.D., Platt, M.B., 1990. Development of a class of stable predictive variables: The case of
bankruptcy prediction. Journal of Business, Finance and Accounting 17, 31–51.
Redelmeier, D.A., Bloch, D.A., Hickam, D.H., 1991. Assessing predictive accuracy: How to compare Brier scores. Journal of Clinical Epidemiology 44, 1141–1146.
Risk Management Association, 2000. EDF Estimation: A ‘‘Test-Deck’’ Exercise. The Risk Management
Association Journal, 54–61.
Stein, J.C., 2002. Information production and capital allocation: Decentralized versus hierarchical firms.
Journal of Finance 57, 1891–1921.
Tabakis, E., Vinci, A., 2002. Analysing and combining multiple credit assessments of financial institutions.
Working paper No. 123. European Central Bank.
Treacy, W.F., Carey, M., 2000. Credit risk rating systems at large US banks. Journal of Banking and
Finance 24, 167–201.
Vinterbo, S., Ohno-Machado, L., 1999. A recalibration method for predictive models with dichotomous
outcomes. In: Vinterbo, S., Predictive Models in Medicine: Some Methods for Construction and
Adaptation. Ph.D. thesis. Norwegian University of Science and Technology.
Weber, M., Krahnen, J.P., Vossmann, F., 1999. Risikomessung im Kreditgeschäft: Eine empirische Analyse bankinterner Ratingverfahren. Zeitschrift für betriebswirtschaftliche Forschung, Sonderheft 41, 117–142.
Zmijewski, M.E., 1984. Methodological issues related to the estimation of financial distress prediction
models. Journal of Accounting Research 22, 59–82.