Journal of Banking & Finance 29 (2005) 509–531


The role of non-financial factors in internal credit ratings

Jens Grunert a, Lars Norden a,*, Martin Weber a,b

a Department of Banking and Finance, University of Mannheim, L 5.2, D-68131 Mannheim, Germany
b Centre for Economic Policy Research (CEPR), London, UK

Received 5 June 2002; accepted 14 January 2004
Available online 20 June 2004

Abstract

Internal credit ratings are expected to gain in importance because of their potential use for
determining regulatory capital adequacy and banks’ increasing focus on the risk–return profile
in commercial lending. Whereas the eligibility of financial factors as inputs for internal credit
ratings is widely accepted, the role of non-financial factors remains ambiguous. Analyzing
credit file data from four major German banks, we find evidence that the combined use of
financial and non-financial factors leads to a more accurate prediction of future default events
than the single use of each of these factors.
© 2004 Elsevier B.V. All rights reserved.

JEL classification: G21


Keywords: Credit risk; Credit ratings; Debt default; Probit analysis

1. Introduction

Similar to capital market investors that rely on credit ratings provided by rating
agencies, banks assign internal credit ratings to appraise the creditworthiness of their
borrowers. In both cases, ratings can be interpreted as a screening technology that is
applied to alleviate asymmetric information problems between borrowers and lend-
ers. Whereas external ratings have been well established since the beginning of the
20th century, internal ratings were adopted increasingly by banks during the nineties

* Corresponding author. Tel.: +49-621-1811536; fax: +49-621-1811534.
E-mail address: norden@[Link] (L. Norden).

0378-4266/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/[Link]fin.2004.05.017

(see English and Nelson, 1999; Treacy and Carey, 2000). Internal credit ratings for
corporate borrowers are an aggregated valuation procedure of various financial
and non-financial factors. In banking practice, ratings represent the basis for loan
approval, pricing, monitoring, and loan loss provisioning. While considerable re-
search has proven the suitability of financial factors to predict borrower insolvency
(see, for example, Altman, 1968), the role of non-financial factors remains ambigu-
ous. Although consideration of non-financial factors such as management quality
and industry perspectives is beyond controversy (see Basel Committee on Banking
Supervision, 2000a, 2001; Günther and Grüning, 2000), there is a lack of quantitative
research on this issue. With respect to these ‘‘soft’’ factors, bankers often refer to
their experience and distrust the sole use of financial criteria. A first investigation
of the importance of soft information in borrower–bank relationships is conducted
by Berger et al. (2002) and Stein (2002). Depending on bank size, Berger et al. (2002)
explore a bank’s ability to act in projects that require the evaluation of soft informa-
tion. They find that small banks are more capable of collecting and acting on soft
information than large banks. Stein (2002) points out that decentralized banking
hierarchies are likely to be more attractive when projects’ soft factors are to be eval-
uated.
This paper explores the role of non-financial factors in internal credit ratings. For
this purpose we examine empirically whether the combined use of financial and non-
financial factors leads to a more accurate prediction of default events than their sin-
gle use. 1 Our study has implications for both banks and bank supervisors: banks
will be able to better understand the role of quantitative and qualitative factors in
internal credit ratings and supervisors will be supported in claiming a ‘‘mixed’’ credit
rating to determine regulatory capital requirements (see Basel Committee on Bank-
ing Supervision, 2001).
The paper proceeds as follows. Section 2 provides an overview of related litera-
ture, in particular on the structure of internal rating systems and the properties of
non-financial sub-ratings. Section 3 describes the data, the variables, and proposes
a testable hypothesis. Section 4 analyzes whether a combination of financial and
non-financial factors leads to a more accurate prediction of default events than
the single use of each of these factors. Afterwards, several types of robustness tests
are performed. The paper concludes with Section 5.

2. Overview of related literature

In modern theory of financial intermediation, the existence of intermediaries is ex-


plained by an improvement of welfare that results from a reduction in costs of asym-
metric information (see, for example, Leland and Pyle, 1977; Diamond, 1984;
Bhattacharya and Thakor, 1993). Many of these models presume that banks screen

1
The indicator variable for default events defined hereinafter is consistent with the Basel II definition of
default (see Basel Committee on Banking Supervision, 2001).

and monitor borrowers at a given cost but they rarely specify the technology that is
or should be applied. Since the latter issue is closely connected to our study, we out-
line three lines of related literature in the following, focusing on the empirical anal-
ysis of default prediction. 2 Firstly, research on the prediction of corporate
bankruptcy on the basis of financial factors is presented. Secondly, empirical and
normative research on banks’ internal credit rating systems is reviewed. Finally, lit-
erature concerning the components of credit ratings, both quantitative and qualita-
tive, is described.
Firstly, our analysis relates to the work on corporate bankruptcy prediction with
financial factors (see Beaver, 1966; Altman, 1968; Altman et al., 1977; Ohlson, 1980;
Platt and Platt, 1990; Baetge, 1998). These factors typically concern the capital struc-
ture, profitability and liquidity of a firm. Models are based on linear discriminant
analysis, on logit and probit regression analysis or, more recent ones, on neural net-
works. Because of their relatively high discriminatory power, these models are widely
accepted but they nevertheless show some disadvantages (see Basel Committee on
Banking Supervision, 2000b, pp. 107–110). Few of them are based on a theory that
explains why and how certain financial factors are linked to corporate bankruptcy.
As financial factors are mostly backward-looking point-in-time measures, these
models are inherently constrained and it is not clear how well these models perform
out-of-sample (time, firm, industry, etc.). This area of research is relatively well
developed but still has to overcome the above mentioned problems.
Secondly, we briefly review research on banks’ internal credit rating systems
which is still scarce but growing considerably. It can be divided into an empirical
and a normative part. On the one hand, empirical analyses of banks’ internal rating
systems examine the structure and the use of ratings (see Elsas and Krahnen, 1998;
Machauer and Weber, 1998; English and Nelson, 1999; Treacy and Carey, 2000;
Crouhy et al., 2001; Ewert and Szczesny, 2001; Norden, 2002). These studies and
an overview of international best practice rating standards in the banking industry
(see Basel Committee on Banking Supervision, 2000a) show that internal rating sys-
tems are based on either statistical methods, constrained expert judgment-based
techniques or exclusively expert judgments. These systems tend to include similar
types of risk factors, typically a mix of quantitative and qualitative factors (e.g. lever-
age, profitability and liquidity ratios, management experience, industry perspec-
tives). However, the weighting schemes of these risk factors differ considerably
across banks. Ratings are used for loan approval, management reporting, pricing,
limit setting, and loan loss provisioning. Other studies analyze the frequency and
the extent of banks’ rating disagreement for a given borrower (see Risk Management
Association, 2000; Carey, 2001). In addition to the reasons given in these studies, it
might be possible that differences in opinion on borrower quality result from a dif-
ferent evaluation of non-financial factors rather than from financial factors. Using a
similar kind of reasoning, Tabakis and Vinci (2002) compare and combine credit

2 Theory-based (structural) models build on Merton (1974); commercial applications of theory-based models include CreditMetrics and KMV CreditMonitor.

assessments of financial institutions from multiple sources (ratings of agencies, other


credit assessment institutions and banks’ internal ratings). They develop a rating
model that relies on two components: a ‘‘core part’’ which includes easily available
and quantifiable financial data, and an ‘‘analyst’s contribution’’ which includes addi-
tional, more complex information.
On the other hand, Krahnen and Weber (2001) present a normative set of ‘‘Gen-
erally accepted rating principles’’ that points out the necessity of a link between cred-
it rating and probability of default. Requirements concerning completeness,
definition of default, monotonicity, back testing, etc. of a rating system are devel-
oped. They describe credit ratings as being a ‘‘mixture of mathematical models
and management intuition’’, but they say nothing about the risk factors, the factor
weights and the value function to be included in a ‘‘good’’ rating. Based on the first
and second consultation period and several of its own studies, the Basel Committee
on Banking Supervision released a second Consultative Document in January 2001
and recently a third one in April 2003. Both contain the proposal of an internal
ratings-based approach for regulatory capital adequacy and include an extensive list
of the normative requirements banks have to meet if they want to calculate regula-
tory risk weights based on their internal credit ratings.
Thirdly, Günther and Grüning’s (2000) survey reports that 70 of 145 German
banks not only use quantitative but also qualitative factors in credit risk assessment,
with management quality being the most important ‘‘soft’’ factor. 77.6% of these
banks state that the additional inclusion of qualitative factors clearly improves de-
fault prediction. However, nothing is said about the degree of improvement. Hessel-
mann (1995) as well as Blochwitz and Eigermann (2000) incorporate qualitative
variables (for example accounting behavior or discrete cover ratio classes) in discri-
minant analysis to differentiate between subsequently defaulting and non-defaulting
German companies. They find that the use of qualitative variables improves the per-
centage of companies correctly classified. These results support the requirement of
the Basel Committee on Banking Supervision (2001) that banks not only have to
consider quantitative but also qualitative factors; for example, the availability of au-
dited financial statements, depth and skill of management, the position within the
industry and future prospects (see no. 265 in the second Consultative Document).
Furthermore, analyses of quantitative and qualitative ratings using different sets of
credit file data from German banks (see Weber et al., 1999; Brunner et al., 2000)
show that qualitative ratings exhibit significantly better grades with less dispersion
around their mean, that they change less often than quantitative ratings and that rat-
ing changes stem mainly from changes in the quantitative sub-ratings. They leave
open the question of whether the important role of ‘‘soft’’ information in internal
credit ratings is a desirable or problematic feature.
Given this literature, it becomes clear that the specific role of and interaction be-
tween different risk factors in internal credit rating systems has to be analyzed in
more detail. Whereas the importance of financial factors is widely accepted because
its impact is measurable, the relevance of non-financial factors is mainly considered
in a holistic manner. These factors are usually chosen on the basis of experts’ judg-
ments and common industry knowledge but how much do they contribute to an

accurate forecast of borrower quality? We intend to answer this question in the fol-
lowing sections.

3. Data, variables and hypothesis

Originally, our data on bank–borrower relationships was composed of two ran-


domly drawn sub-samples (A and P) that contained credit file information from
six major German banks for 240 borrowers from the period January 1992 to Decem-
ber 1996 (see Elsas et al., 1998 for a detailed description of the original sample). 3
The population was restricted to medium-sized firms with an annual turnover be-
tween EUR 25 and 250 million and a minimum loan size of EUR 1.5 million. To
avoid the influence of the restructuring process in the eastern part of Germany, only
customers of the western part were included. Sample A was randomly drawn from
this population and sample P was randomly drawn from a sub-population which
consisted of borrowers in financial distress during 1992–1996. We merged both
sub-samples, controlling for a potential oversampling bias. More explicitly, we en-
sured that we did not sample the same borrower twice and we used sampling weights
in all regression models (as described in Section 4.1).
A meta rating scale with grades from 1 to 6 was created to make internal ratings comparable across banks (see Elsas et al., 1998). Grade 1 denotes very high creditworthiness, 2 high or above-average, 3 average, 4 below-average, 5 problematic, and 6 highly distressed or defaulted. Some variables were not documented in the credit files because not all
or defaulted. Some variables were not documented in the credit files because not all
relationships lasted for five years and the creditworthiness of high quality borrowers
was not checked annually but every second year at one bank. Since all firms in the
sample borrowed exclusively from one of the four banks (or other banks that are not
in the sample), we could not compare multiple rating assignments for borrowers
from different lenders as done by the Risk Management Association (2000), Carey
(2001) and Tabakis and Vinci (2002).
In our analysis an observation consists of a borrower’s financial, non-financial,
and overall rating and his or her default status in the following year. All variables
used in the further analyses are summarized in Table 1.
The variable DEF is an indicator for default events. Consistent with the definition
given by the Basel Committee on Banking Supervision (2001) (see no. 272 in the sec-
ond Consultative Document), the variable DEF equals 1 if one or more of the fol-
lowing sub-events occur in the year following the one of the rating assignment
and otherwise zero: moratorium, allowance of loan loss provisions, withdrawal of
a credit, disposition of collaterals, liquidation, formation of a bank pool, recapital-
ization. The financial, non-financial and overall ratings are directly adopted from the
original credit files of each bank and transformed accordingly to the overall rating
on the meta rating scale. We obtain the non-financial factors (management quality,

3
In our study we eliminated bank 5 due to a lack of non-financial factors and bank 6 because of a small
number of observations. For this reason credit file information from four banks remains in our data set.

Table 1
Description of variables

Variable  Description

Default dummy variables
DEF  = 1 if default occurred in the year following the one of the considered rating

Rating categories
FR   Financial rating with grades 1–6
NFR  Non-financial rating with grades 1–6
OR   Overall rating with grades 1–6

Non-financial factors
MGT  Non-financial factor ‘‘Management quality’’
MKT  Non-financial factor ‘‘Market position’’

Financial factors
LTA   Logarithm of total assets
ER    Equity-to-assets ratio
CR    Current ratio
CFNL  Cash flow-to-net liabilities
CIR   Capital intensity ratio
ROA   Return on assets

Bank dummy variables
B1, B2, B3, B4  = 1 if bank 1, 2, 3, 4 is the lender

Year dummy variables
Y1992, Y1993, Y1994, Y1995  = 1 if observation is from 1992, 1993, 1994, 1995

market position) directly from the credit files, whereas we have to compute the financial factors, some of which are integral parts of the financial rating, because only the underlying balance sheet items are in the dataset (see Appendix A for detailed definitions). These factors cover all categories of the C’s of credit (except collateral), a
familiar credit analysis concept in commercial lending (see Collins, 1966). Dummy
variables are created to control for bank- and year-specific effects. Table 2 shows the distribution of the default variable DEF by banks, years and overall rating classes. Panel A shows that default events are concentrated at bank 2 but quite evenly distributed across banks 1, 3 and 4. Note that this concentration of default events at bank 2 is not a problem because our results are not sensitive to the omission of bank 2 from the sample (see Section 4.2).
Whereas panel B indicates a relatively even distribution of default events across years, a monotonic increase of the relative default frequency from rating class 1 to 6 can be observed in panel C.
Table 3 displays descriptive statistics of different rating categories. The means of
all three credit rating categories are higher for defaulters than for non-defaulters.
This is a first indication of a robust relation between credit ratings and default
status. The standard deviations of the different rating categories indicate that the dis-
persions of defaulters’ ratings are lower, which may be caused by the fact that default
events occur mainly in the grades 5 and 6. Similar to the study of Weber et al. (1999),

Table 2
Distribution of default events – panels A, B, and C present the distribution of the default variable DEF by banks, years and overall rating classes

Panel A: Default events by banks
Bank   DEF = 0  DEF = 1  Total  % of all obs.
1      66       3        69     16.87
2      110      52       162    39.61
3      84       11       95     23.23
4      80       3        83     20.29
Total  340      69       409    100

Panel B: Default events by years
Year   DEF = 0  DEF = 1  Total  % of all obs.
1992   68       11       79     19.32
1993   94       11       105    25.67
1994   98       18       116    28.36
1995   80       29       109    26.65
Total  340      69       409    100

Panel C: Default events by overall rating classes
Overall rating  DEF = 0  DEF = 1  Relative default frequency (%)  Total  % of all obs.
1               18       0        0.00                            18     4.40
2               61       1        1.61                            62     15.16
3               120      11       8.40                            131    32.03
4               99       25       20.16                           124    30.32
5               36       20       35.71                           56     13.69
6               6        12       66.67                           18     4.40
Total           340      69                                       409    100

DEF takes the value 1 if default occurred in the year following the one of the considered rating assignment and 0 otherwise.
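The relative default frequencies in panel C follow directly from the DEF counts; a quick cross-check in Python (the class counts are transcribed from Table 2, not computed from the underlying dataset):

```python
# Relative default frequencies by overall rating class,
# recomputed from the counts in Table 2, panel C.
counts = {  # rating class: (DEF=0 count, DEF=1 count)
    1: (18, 0), 2: (61, 1), 3: (120, 11),
    4: (99, 25), 5: (36, 20), 6: (6, 12),
}

for grade, (survived, defaulted) in counts.items():
    freq = 100.0 * defaulted / (survived + defaulted)
    print(f"class {grade}: {freq:.2f}%")  # monotonically increasing in grade
```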

Table 3
Descriptive statistics of credit rating categories – credit ratings are assigned in year t, and default status DEF is reported for the following year

                      Full sample         DEF = 0             DEF = 1
                      Mean   Std. Dev.    Mean   Std. Dev.    Mean   Std. Dev.
Financial rating      3.72   1.58         3.45   1.50         5.07   1.26
Non-financial rating  3.51   1.15         3.30   1.07         4.54   0.96
Overall rating        3.47   1.17         3.27   1.09         4.45   1.01
No. of observations   409                 340                 69

All three ratings are based on a six-grade scale (1 = best, …, 6 = worst creditworthiness).

the standard deviation of non-financial ratings is lower than the one of financial rat-
ings. Furthermore, non-financial ratings are significantly better at the 0.01-level than

financial ratings using a Wilcoxon signed-rank test. This means that, on average, banks assess the quality of the management and the market position of their borrowers better than their financial situation.
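A paired comparison of this kind can be run with SciPy’s signed-rank test; the ratings below are synthetic stand-ins constructed so that non-financial ratings are systematically better (lower), merely to illustrate the mechanics, not to reproduce the paper’s data:

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical paired ratings on the 1-6 meta scale (1 = best):
# non-financial ratings drawn systematically better than financial
# ratings, mimicking the pattern reported in the text.
financial = rng.integers(2, 7, size=409)
non_financial = np.clip(financial - rng.integers(0, 3, size=409), 1, 6)

# Wilcoxon signed-rank test on the paired differences
stat, p = wilcoxon(financial, non_financial)
print(f"Wilcoxon signed-rank p-value: {p:.2e}")
```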
We now turn to the formulation of our hypothesis. Since the objective of assessing
a borrower’s creditworthiness is to specify his or her probability of default over a
given time horizon (usually one year), banks should not only use backward-looking
‘‘hard’’ financial data but also some forward-looking ‘‘soft’’ information. Internal
ratings of banks are usually based on borrowers’ current condition (point-in-time),
whereas rating agencies follow a ‘‘through the cycle’’ approach projecting the bor-
rowers’ condition over an entire economic cycle (see Treacy and Carey, 2000; Löffler,
2004). Accordingly, we propose the following hypothesis: a combination of financial
and non-financial factors leads to a more accurate prediction of default than the sin-
gle use of either financial or non-financial factors.

4. Measuring the relation between credit ratings and default events

Our main objective is to find out whether an additional inclusion of non-financial


factors in a bank’s internal credit rating is beneficial or not. It can be deemed ben-
eficial if it leads to a more accurate prediction of default events. In Section 4.1 we test
the above proposed hypothesis by comparing the explanatory power of the overall
rating with that of the financial rating for default events that occur in the year fol-
lowing the one of the rating assignment. In Section 4.2 we implement several tests of
robustness to investigate the sensitivity of our results.

4.1. The relation between credit ratings and default events in the following year

The purpose of a credit rating is to classify prospects and borrowers according to


their probability of default over a given time horizon. As banks typically assign credit
ratings for a one-year horizon (see Treacy and Carey, 2000), we analyze how different
rating categories are related to the default status in the year following a rating assign-
ment. For this purpose, we compare credit ratings assigned in the year t (e.g. calendar
year 1993) with the variable DEF (default in t + 1, hence calendar year 1994 in our example). The exact dates of the rating assignment and the default event are not in the data set. 4
Since descriptive statistics such as rank correlations or concordance coefficients
support our hypothesis (the results are not reported here), we directly estimate probit
regression models with DEF as dependent variable, and the financial rating FR
(model 1), the non-financial rating NFR (model 2), and the overall rating OR (model
3) respectively as independent variables. In a preparatory analysis dummy variables
for the financial, the non-financial and the overall rating were used. As this specification basically yields the same results, we use the credit rating variables (coded on a scale from 1 to 6) in the remainder. In each model we control for bank- and year-specific influences with dummy variables, using bank 1 and year 1992 as reference categories.

4 Due to missing default dates and rating assignment dates during a year, we do not include default events in the same year because we do not know which variable (rating or default) changed first and which followed.
Given that our sample includes a relatively high number of defaults due to the
oversampling of a distressed sub-sample, we employ sampling weights in all subse-
quent regression models in order to correct for potentially biased coefficients. Sam-
pling weights represent the inverse of an observation’s probability of being included
in the sample. In the estimation procedure, sampling weights put more weight on
non-defaulters and less weight on defaulters in order to approximate the default dis-
tribution of the underlying population (see Ewert and Szczesny, 2001, p. 15). 5 We
assume a population probability of default derived from OECD data on loan loss provisions of German commercial banks, for which the rounded average probability of default for the years 1993–1996 amounts to approximately 2%. 6 In addition, Moody’s Investor Service (2001) uses 1.6% for its RiskCalc model for German private firms. For robustness purposes, we also apply probability weights, assuming
a population probability of default of 1.6%. The results for models 1–3 and regressions 1–2 are very similar to those obtained with 2%. Neither the regression coefficients’ magnitude and significance nor relative model performance changes substantially. Therefore we stick to the previously assumed value of 2%.
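The reweighting logic can be checked numerically. The sample counts (69 defaults, 340 non-defaults) and the assumed 2% population default probability come from the text; the scaling itself is the standard inverse-inclusion-probability construction:

```python
# Sampling weights that map the oversampled estimation sample back to
# an assumed 2% population default probability (Section 4.1).
n_def, n_nondef = 69, 340
n = n_def + n_nondef
pd_pop = 0.02

w_def = pd_pop / (n_def / n)              # down-weight defaulters
w_nondef = (1 - pd_pop) / (n_nondef / n)  # up-weight non-defaulters

# The weighted default frequency matches the assumed population value.
weighted_pd = n_def * w_def / (n_def * w_def + n_nondef * w_nondef)
print(f"weighted default frequency: {weighted_pd:.4f}")  # → 0.0200
```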
The models can be evaluated by using different criteria (see Hosmer and Lemeshow, 2000; Deutsche Bundesbank, 2003). We decided to use McFadden’s R², the Brier Score, the percentage of correctly classified observations, and type I and type II error rates as evaluation criteria because they represent an adequate mix of goodness-of-fit and classification accuracy measures. Since the conventional R² cannot be calculated for probit and logit models, McFadden’s R² (pseudo-R²) is employed. It is defined as 1 − (unrestricted log-likelihood/restricted log-likelihood). The Brier Score (BS) is a measure of prediction accuracy that is well known in meteorology and medical science (see Brier, 1950). It is calculated as

BS = (1/n) Σᵢ₌₁ⁿ (hᵢ − pᵢ)²,

where hᵢ is a binary indicator for the actual realization of the default variable (1 if default, 0 if no default) and pᵢ is the estimated probability of default. The difference between the Brier Score and the percentage of correctly classified observations is that the former is more sensitive to the level of the estimated probabilities: the Brier Score takes the estimated probabilities directly into account, whereas the percentage of correctly classified observations transforms probabilities that are higher than a specific cutoff point to 1 and others to 0. In the following,
predicted defaults are those observations with an estimated probability above 0.11,
which is the cutoff point that maximizes the proportion of observations correctly pre-
dicted by model 1. 7 It is important to mention that using the optimal cutoff point of

5
See also Zmijewski (1984) who calls oversampling ‘‘choice-based sample bias’’.
6
See Organisation for Economic Co-Operation and Development (2001).
7
See Carey and Hrycay (2001) for a similar criterion for choosing a cutoff point. Zmijewski (1984)
provides an overview and several examples of related problems. Ohlson (1980, p. 120) discusses this topic in the same context, emphasizing that 0.5 is not always a reasonable cutoff value.

model 1 is conservative because predictions of models 2 and 3 will be based on the same value. If, for example, model 3 outperforms model 1 even in this setting, it certainly will with its own optimal cutoff value.
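As a minimal illustration, the Brier Score and the cutoff-based classification can be written as two short functions; the outcomes and probabilities below are toy numbers, not the paper’s data:

```python
import numpy as np

def brier_score(h, p):
    """Mean squared difference between outcomes h (0/1) and
    estimated default probabilities p."""
    h, p = np.asarray(h, float), np.asarray(p, float)
    return np.mean((h - p) ** 2)

def classify(p, cutoff=0.11):
    """Predicted default = estimated probability above the cutoff
    (0.11 is the value that maximizes model 1's hit rate)."""
    return (np.asarray(p) > cutoff).astype(int)

# Toy illustration:
h = [1, 0, 0, 1, 0]
p = [0.40, 0.05, 0.20, 0.60, 0.02]
print(brier_score(h, p))  # → 0.11258
print(classify(p))        # → [1 0 1 1 0]
```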
Type I error is the percentage of observations classified as ‘‘non-default’’ that actually did default. Type II error is the percentage of observations classified as ‘‘default’’ that actually did not default. Note that in commercial banking the type I error is more important than the type II error because of its higher costs. We compare the accuracy measures with those of a naive forecast and between models. The Brier Score of a naive forecast is calculated by taking the average relative default frequency (ADF) of the entire sample as the default probability for each individual observation:

BS = (1/n) [n(DEF=1) · (ADF − 1)² + n(DEF=0) · (ADF − 0)²].
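Plugging the sample totals from Table 2 (69 defaults, 340 non-defaults) into this naive-forecast formula reproduces the Brier Score of 0.1402 that the text reports for the naive forecast:

```python
# Naive-forecast Brier Score: every observation is assigned the
# sample's average default frequency as its default probability.
n_def, n_nondef = 69, 340  # totals from Table 2
n = n_def + n_nondef
adf = n_def / n            # average default frequency

bs_naive = (n_def * (adf - 1) ** 2 + n_nondef * (adf - 0) ** 2) / n
print(round(bs_naive, 4))  # → 0.1402
```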
Regression results and evaluation criteria for models 1–3 are presented in Table 4.
All three rating variables have, as expected, positive coefficients and are significant at
the 0.01-level. The coefficients of the rating variables are highly economically significant, indicating the strong relation between future default events and credit ratings.
With respect to the dummy variables, bank 2 and bank 3 have significant influence
on the prediction of default, which is consistent with the fact that these two banks
show higher average default frequencies than the two other banks. None of the year
dummies are significant at the 0.05-level, which is consistent with the relatively even
distribution of default events over time. Note that all models are more accurate than
the naive forecast, which leads to a Brier Score of 0.1402. Finally, model evaluation
results shown in panel B reveal that model 3 is superior to models 1 and 2 with re-
spect to all criteria. We find that the Brier Score of model 3 is significantly lower than
the ones of both other models. 8 Moreover, the type I error rate of model 3 (0.4058) is lower than that of model 1 (0.5217). 9 This means, in terms of economic significance, that a bank relying on a model 3-type rating system will incur considerably lower losses (roughly 10%) due to erroneously accepted borrowers that subsequently default.
Although we control for an oversampling bias, probit estimates could be biased
because of violated distributional assumptions or correlated regressors and error
terms. To address this concern we apply the bootstrap methodology (see Efron,
1979) to the different evaluation criteria. Bootstrapping is a general resampling tech-
nique that helps to answer the question of whether sample statistics or estimated
regression parameters are biased. It does not depend on specific distributional forms
and can be implemented with Monte Carlo sampling. The only condition to be ful-
filled is that the sample has to be representative. As explained above, the latter is met
by estimating all regressions with sampling weights. Subsequently, we generate an

8
The significance test is based on the Williams–Kloot statistic z_wk, which is described in detail by Redelmeier et al. (1991) and Vinterbo and Ohno-Machado (1999).
9
See Carey and Hrycay (2001). Their logit default prediction model (based on four financial factors)
produces a type I error of 0.68 in the sample and 0.65 out of the sample.
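The resampling idea behind this robustness check can be sketched as follows; the outcomes and fitted probabilities here are synthetic, and the 1000-replication count is an arbitrary choice for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcomes and fitted default probabilities; the
# resampling logic, not the data, is the point of this sketch.
h = rng.integers(0, 2, size=409).astype(float)
p = np.clip(h * 0.3 + rng.uniform(0, 0.4, size=409), 0, 1)

def brier(h, p):
    return np.mean((h - p) ** 2)

# Bootstrap: resample observations with replacement and recompute
# the evaluation criterion on each resample.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(h), size=len(h))
    boot.append(brier(h[idx], p[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"bootstrap 95% interval for the Brier Score: [{lo:.3f}, {hi:.3f}]")
```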

Table 4
Regression results and evaluation criteria for models 1–3

Panel A: Regression results
Variable   Model 1 (financial rating)     Model 2 (non-financial rating)  Model 3 (overall rating)
DEF        Coefficient  Robust Std. Err.  Coefficient  Robust Std. Err.   Coefficient  Robust Std. Err.
Rating     0.4439       0.0734            0.6651       0.0984             0.8931       0.1355
B2         1.4741       0.2876            0.9260       0.2631             2.2023       0.4581
B3         1.0654       0.3126            0.3268       0.2768             1.1187       0.4092
B4         0.2701       0.3428            −0.6470      0.3734             0.2628       0.4179
Y1993      −0.4807      0.2760            −0.6122      0.3172             −0.5824      0.3007
Y1994      −0.2550      0.2321            −0.1306      0.2399             −0.2851      0.2449
Y1995      0.3162       0.2073            0.4424       0.2294             0.3956       0.2269
Intercept  −4.8979      0.4887            −5.0936      0.4856             −6.8252      0.8528

Panel B: Evaluation criteria – predicted defaults are those observations with a fitted probability above 0.11, which is the cutoff point that maximizes the proportion of correctly predicted observations by model 1. The null hypotheses BS(model 1) = BS(model 3) and BS(model 2) = BS(model 3) can both be rejected with a p-value of 0.00 using the Williams–Kloot statistic z_wk (two-tailed test)
Evaluation criterion       Model 1 (financial rating)  Model 2 (non-financial rating)  Model 3 (overall rating)
McFadden’s R²              0.2680                      0.2938                          0.3599
Brier Score                0.1301                      0.1257                          0.1043
Obs. correctly classified  0.8875                      0.8900                          0.9169
Type I error rate          0.5217                      0.4783                          0.4058
Type II error rate         0.0294                      0.0353                          0.0176

The sample used in all three probit regressions is the same and consists of 409 observations from the period 1992–1995. The dependent variable, DEF, takes the value one if default occurs in the year following the one of the rating assignment and zero otherwise. In addition to bank and year dummy variables, model 1 uses the financial rating FR, model 2 the non-financial rating NFR, and model 3 the overall rating OR as independent variables (instead of ‘‘Rating’’ as indicated in the first column) to estimate the probability of a default event. Coefficients are estimated using the maximum likelihood method with sampling weights.
***, **, * Significantly different from zero at the 0.01, 0.05, and 0.10-level.

empirical estimate of the sampling distribution of each evaluation criterion in the fol-
lowing manner:

1. Random draw of 409 observations with replacement from the original sample.
2. Estimation of models 1–3 in the way described above using the sample drawn in
step 1.
3. Evaluation of each model’s performance using the same previous criteria.
4. Independent replication of steps 1–3 for 1000 times.
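The four steps above can be sketched as a generic resampling loop; `fit_and_evaluate` is a placeholder of ours standing in for the probit estimation and the evaluation criteria of Table 4:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_criteria(data, fit_and_evaluate, n_boot=1000):
    """Steps 1-4 of the bootstrap (Efron, 1979): repeatedly resample the
    data with replacement, re-estimate the model, and collect the
    evaluation criteria."""
    n = len(data)
    results = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)             # step 1: n draws with replacement
        results.append(fit_and_evaluate(data[idx]))  # steps 2-3: refit and evaluate
    return results                                   # step 4: n_boot replications

# Toy run: treat the column mean as the "evaluation criterion".
data = np.arange(10.0).reshape(-1, 1)
draws = bootstrap_criteria(data, lambda s: float(s.mean()), n_boot=200)
```

The collection of returned values is the empirical sampling distribution from which bias, standard deviation, and confidence intervals are read off.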

Placing a probability of 1/1000 at each point, we obtain a relative frequency distribution of each criterion which represents a non-parametric estimate of the sampling distribution. Using this bootstrap distribution we can determine the bias,
calculated as the average difference between the bootstrap estimates and the ob-
served statistic (as shown in Table 4), its standard deviation and confidence intervals.
Efron (1982) suggests that when the ratio of the bias to its standard deviation is less
than 0.25, the bias is not a problem as the random error will surpass it. Table 5 pre-
sents results.
As can be seen, in most of the cases the bias is relatively small in comparison to its
standard deviation. For example, calculating each model’s ratio of the bias to the
estimated standard deviation for the Brier Score (observations correctly classified)
yields a range of 0.14–0.2 (0.14–0.18). However, for the type I error rate (type II er-
ror rate) we obtain values in the range 0.42–0.69 (0.26–0.40). These results indicate
that biased coefficients do not represent a serious problem for most of the evaluation
criteria. Moreover, the reported confidence intervals support the previously found
ranking of the three models.

Table 5
Bootstrapping of evaluation criteria for models 1–3

Evaluation criterion            Model 1 (financial rating)   Model 2 (non-financial rating)   Model 3 (overall rating)
McFadden's R2
  Bootstrap mean                0.2842                       0.3118                           0.3811
  Bias                          0.0162                       0.0180                           0.0212
  Bootstrap Std. Dev.           0.0454                       0.0476                           0.0591
  95% Conf. interval            [0.1818; 0.3485]             [0.2091; 0.3786]                 [0.2232; 0.4557]
Brier Score
  Bootstrap mean                0.1280                       0.1230                           0.1014
  Bias                         −0.0021                      −0.0027                          −0.0029
  Bootstrap Std. Dev.           0.0146                       0.0143                           0.0139
  95% Conf. interval            [0.1037; 0.1637]             [0.1011; 0.1579]                 [0.0823; 0.1413]
Obs. correctly classified
  Bootstrap mean                0.8792                       0.8835                           0.9088
  Bias                         −0.0084                      −0.0065                          −0.0081
  Bootstrap Std. Dev.           0.0451                       0.0449                           0.0444
  95% Conf. interval            [0.6968; 0.9169]             [0.6944; 0.9218]                 [0.7237; 0.9438]
Type I error rate
  Bootstrap mean                0.4542                       0.4489                           0.3717
  Bias                         −0.0676                      −0.0294                          −0.0341
  Bootstrap Std. Dev.           0.0987                       0.0890                           0.0798
  95% Conf. interval            [0.4203; 0.8116]             [0.3478; 0.7246]                 [0.2899; 0.6232]
Type II error rate
  Bootstrap mean                0.0532                       0.0491                           0.0343
  Bias                          0.0238                       0.0138                           0.0167
  Bootstrap Std. Dev.           0.0546                       0.0533                           0.0539
  95% Conf. interval            [0.0029; 0.0529]             [0.0176; 0.2706]                 [0.0029; 0.0411]

For each evaluation criterion we report the bootstrap mean, the bias (calculated as the average difference
between the bootstrap estimates and the observed value), the bootstrap standard deviation and a 95% confidence
interval based on the bias-corrected percentile method. Models are estimated as described in Table 4.
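The bias and standard deviation entries of Table 5, together with Efron's (1982) rule of thumb, follow directly from the bootstrap draws; a minimal sketch with our own naming:

```python
import numpy as np

def bias_diagnostics(observed, boot_values):
    """Bootstrap bias of an evaluation criterion and Efron's (1982) rule of
    thumb: if |bias| / std < 0.25, the bias is dominated by random error."""
    boot_values = np.asarray(boot_values, dtype=float)
    bias = boot_values.mean() - observed   # average bootstrap-vs-observed difference
    std = boot_values.std(ddof=1)          # bootstrap standard deviation
    return {"bias": bias, "std": std, "negligible": bool(abs(bias) / std < 0.25)}
```

For example, for model 3's Brier Score the observed value 0.1043 and bootstrap mean 0.1014 give a bias of −0.0029; with a standard deviation of 0.0139, the ratio is about 0.21, below the 0.25 threshold.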

[Figure: scatter plot of 1000 bootstrap pairs – McFadden's R2 of model 1 on the horizontal axis and McFadden's R2 of model 3 on the vertical axis, both scaled from 0.00 to 1.00.]

Fig. 1. McFadden's R2 pairs for models 1 and 3.

Furthermore, the bootstrap procedure enables us to verify the number of cases in which model 3 exhibits a better fit than model 1. Using a diagram in which McFadden's R2 of model 1 is indicated on the horizontal axis and that of model 3 on the vertical axis, we obtain 1000 dots of comparison pairs. Note that the McFadden's R2 values of models 1 and 3 stem from different regressions that are estimated with the same observations. The 45° line indicates model pairs of equal goodness of fit. Fig. 1 illustrates this analysis.
As a result, model 3 displays a better goodness of fit in 997 of 1000 cases, which is clear evidence for our hypothesis. Likewise, we can compare the bootstrap Brier Score of model 1 with that of model 3 (see Fig. 2).
Fig. 2 strongly supports our hypothesis because the Brier Score of model 3 is low-
er than that of model 1 in each of the 1000 cases, indicating that the use of the overall
rating – instead of the pure financial rating – results in a higher predictive accuracy.
Finally, plotting the bootstrap values of the percentage of correctly classified obser-
vations of model 1 against that of model 3, we obtain an alternative impression of
the prediction accuracy of the models (see Fig. 3).
The pattern of dots in Fig. 3 mirrors the fact that the number of correctly classified observations is an integer because the cutoff value transforms fitted probabilities into binary predictions (one if default, zero otherwise). Therefore, the percentage values cluster on the corresponding integer numbers of predictions. Model 3 leads to a higher number of correctly classified firms than model 1 in 979 of 1000 cases. 10

10 With regard to this criterion, model 3 performs equally well in 6 of 1000 cases and worse than model 1 in 15 of 1000 cases.

[Figure: scatter plot of 1000 bootstrap pairs – Brier Score of model 1 on the horizontal axis and Brier Score of model 3 on the vertical axis, both scaled from 0.00 to 0.20.]

Fig. 2. Brier Score pairs for models 1 and 3.

[Figure: scatter plot of 1000 bootstrap pairs – proportion of correctly classified observations for model 1 on the horizontal axis and for model 3 on the vertical axis, both scaled from 0.80 to 1.00.]

Fig. 3. Pairs of correctly classified observations for models 1 and 3.

A similar way of interpreting model performance is to analyze the differences between the bootstrapped evaluation criteria. For this purpose, we calculate for each bootstrap run the pairwise difference between the evaluation criteria of model 1 or 2 and model 3. From the distribution of differences we obtain the probability of

Table 6
Relative performance of model 3 vs. models 1 and 2

Evaluation criterion         P(model 3 worse than model 1)   P(model 3 worse than model 2)
McFadden's R2                0.003                           0.022
Brier Score                  0.000                           0.004
Obs. correctly classified    0.015                           0.017
Type I error rate            0.196                           0.173
Type II error rate           0.131                           0.118

Reported values are probabilities of observing a positive/negative difference between the evaluation cri-
teria. Probabilities are obtained from the bootstrap distribution of each model and criterion.

model 3 performing worse than the two other models. The results of the different
evaluation criteria are presented in Table 6 below.
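The probabilities in Table 6 are simply tail frequencies of the bootstrap distribution of pairwise differences; a minimal sketch (function and argument names are ours):

```python
import numpy as np

def prob_worse(crit_m3, crit_other, higher_is_better):
    """Probability that model 3 performs worse than a competitor, computed
    as the share of bootstrap runs with an unfavourable pairwise difference."""
    diff = np.asarray(crit_m3, dtype=float) - np.asarray(crit_other, dtype=float)
    # For fit measures (e.g. McFadden's R2) "worse" means a lower value;
    # for loss measures (e.g. Brier Score, error rates) a higher value.
    return float(np.mean(diff < 0) if higher_is_better else np.mean(diff > 0))
```

For McFadden's R2 one would call `prob_worse(r2_m3, r2_m1, True)`, and for the Brier Score `prob_worse(bs_m3, bs_m1, False)`.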
It turns out that model 3 clearly dominates the other two models in terms of
McFadden’s R2 , the Brier Score and the percentage of correctly classified observa-
tions, as the probability of observing the opposite is very small. With respect to type
I and II error rates, model 3 is still, but to a lesser extent, superior. Note that this
deterioration of performance should not be overstated since it can be a consequence
of the fact that type I and II error estimates are relatively more biased than the other
evaluation criteria estimates.
Summarizing, a "mixed" model that includes both financial and non-financial factors leads to a more accurate prediction of default events than a model based solely on either type of factor, or than naive forecasts. The result is statistically highly significant, which provides strong support for our hypothesis.

4.2. Tests of robustness

In this section we analyze the robustness of our previous finding with respect to several aspects. Firstly, we address the marginal impact of each bank's rating-default set by successively leaving out one bank from the sample and estimating the same models as in Section 4.1 with the remaining three banks. The results confirm our previous findings: model 3 dominates model 1 with respect to all of our five evaluation criteria. 11
Secondly, we carried out regression analyses at the individual bank level. Due to the small number of observations from banks 1 and 4, individual models can only be estimated for banks 2 and 3. The results obtained are consistent with the previous ones.
Thirdly, we examine the influence of weighting schemes in internal rating systems.
With regard to previous results, it is not clear why model 3 performs better than
model 1. One reason might be the additional inclusion of non-financial factors.

11 The only exception occurs if bank 2 is left out. In this case, model 3 dominates model 1 for all evaluation criteria except the type II error rate, where both models perform equally well.

Another reason might be that the independent variables in both models are based on
different weighting schemes (one optimal and one sub-optimal). In particular, the
fact that we use the financial rating based on a weighting scheme optimized for
the overall rating could be problematic. To investigate the influence of weighting
schemes, we compare a probit model that explains default events on the basis of
financial factors (regression 1) with a probit model that explains default events on
the basis of financial and non-financial factors (regression 2). For this purpose, we
inspect the financial factors to detect outliers. Since the distributions of the current
ratio, cash flow-to-net liabilities and the capital intensity ratio exhibit some extreme
values, we estimate regressions 1 and 2 with raw data and with winsorized data. 12
Results obtained from winsorized data are very close to those from the raw data,
indicating that outliers are not crucial. For this reason, we stick to the raw data in
the remainder. Regression results and the corresponding evaluation criteria are re-
ported in Table 7. 13
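The winsorization described in footnote 12 can be sketched as follows (our implementation):

```python
import numpy as np

def winsorize(x, lower_q=0.05, upper_q=0.95):
    """Replace values below the 5% quantile and above the 95% quantile by
    the respective quantile values of the variable's distribution."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.quantile(x, [lower_q, upper_q])
    return np.clip(x, lo, hi)  # caps extreme values at the quantile bounds
```

Applied to variables such as the current ratio, this limits the influence of extreme observations without dropping them from the sample.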
Essentially, in regression 1 we find that the equity ratio (ER) has the expected neg-
ative sign and is significant at the 0.01-level. Likewise, the variable return on assets
(ROA) has the expected negative sign and is significant at the 0.05-level. In terms of
economic significance, the coefficient is higher than that of the equity ratio. In regres-
sion 2 the variable ER still has an economic influence but its significance declines to
the 0.10-level. Note that the non-financial factor management quality (MGT) is eco-
nomically and statistically significant at the 0.01-level, whereas the non-financial fac-
tor market position (MKT) is not significant at all. Return on assets also remains
significant at the 0.05-level. As reported in Table 7, panel B, regression 2 produces
a more accurate prediction of default events than regression 1 with respect to most
of the evaluation criteria (except type II error rate), although the cutoff point (here
0.07) was optimized for regression 1. Both regression models are more accurate than
the naive forecast which leads to a Brier Score of 0.1308. Since the weighting of
financial and non-financial factors is not predetermined here (as it was in the case
in Section 4.1) but rather estimated in the regressions, weighting schemes do not
seem to be critical because our previous results are reproducible, without relying
on sub-ratings that are based on predetermined weighting schemes. Note that the significant influence of the factor management quality (MGT) is consistent with the survey results of Günther and Grüning (2000). Additionally, as stated in Section 4.1, we bootstrap regressions 1 and 2 to study a potential bias in estimators and the relative performance of both models. Table 8 summarizes the main results.
Panel A shows that the bias is relatively low in comparison to the bootstrap stan-
dard deviation for observations correctly classified and the type II error. Considering

12 The replacement of extreme values by the minimal/maximal admitted values is called winsorization. We replaced values below (above) the 5% quantile (95% quantile) by the 5%/95% quantile values of a variable's distribution.
13 Note that due to the lower number of observations, the absolute values of the evaluation criteria are not comparable to the regression analyses presented in Section 4.1.

Table 7
Prediction of default events with different factor types

Panel A: Regression results

Variable     Regression 1                    Regression 2
DEF          Coefficient  Robust Std. Err.   Coefficient  Robust Std. Err.
LTA          −0.0098      0.1070              0.0789      0.1309
ER           −0.0170      0.0055             −0.0105      0.0062
CR            0.0000      0.0000              0.0001      0.0018
CFNL         −0.0000      0.0000             −0.0001      0.0000
CIR           0.0002      0.0003              0.0003      0.0003
ROA          −0.0321      0.0127             −0.0344      0.0149
MKT           –           –                  −0.1603      0.1462
MGT           –           –                   0.5075      0.1125
B2            0.7146      0.3863              0.4692      0.4403
B3            0.4526      0.4307              0.2667      0.4811
B4           −0.1605      0.5056             −0.5290      0.5900
Y1993        −0.1873      0.3283             −0.2305      0.3724
Y1994         0.2370      0.2981              0.1000      0.3110
Y1995         0.6440      0.3038              0.6105      0.3257
Intercept    −2.4507      1.2802             −4.1844      1.8581

Panel B: Evaluation criteria – predicted defaults are those observations with a fitted probability above
0.07, which is the cutoff point that maximizes the proportion of correctly predicted observations by
regression 1. The null hypothesis BS(regression 1) = BS(regression 2) can be rejected with a p-value
of 0.00 using the Williams–Kloot statistic zwk (two-tailed test)

Evaluation criterion         Regression 1   Regression 2
McFadden's R2                 0.2288         0.3297
McFadden's R2 adjusted       −0.288         −0.267
Brier Score                   0.1256         0.1042
Obs. correctly classified     0.8777         0.8993
Type I error rate             0.5581         0.4186
Type II error rate            0.0425         0.0425

The dependent variable DEF indicates whether default occurs in the year following the one of the rating
assignment. Regression 1 uses all financial factors described in Table 1 as independent variables (LTA,
ER, CR, CFNL, CIR, ROA, and dummies for banks and years), whereas regression 2 uses all financial
factors (and dummies for banks and years) plus the non-financial factors (MKT, MGT). Due to missing
data, the sample is reduced to 278 observations. Coefficients are estimated using the maximum likelihood
method with sampling weights. ***, **, * Significantly different from zero at the 0.01, 0.05, and 0.10-level.

the other three evaluation criteria, the bias is somehow larger but, in absolute terms,
always smaller than the bootstrap standard deviation. Overall, in our opinion these
mixed bootstrapping results do not represent clear evidence in favor of a serious bias
in the evaluation criteria. Furthermore, panel B reveals that regression 2 dominates
regression 1 with regard to four of five evaluation criteria. Only the performance
analysis with type II error rates provides a less definite predominance of regression
2. Note that if we bootstrap regressions with variables winsorized at the 5%/95%
quantiles (not reported in Table 8), the probability of regression 2 performing worse

Table 8
Bootstrapping of regressions 1 and 2

Panel A: Bootstrapping results

Evaluation criterion            Regression 1        Regression 2
McFadden's R2
  Bootstrap mean                0.2770              0.3926
  Bias                          0.0481              0.0629
  Bootstrap Std. Dev.           0.0625              0.0706
  95% Conf. interval            [0.1268; 0.3063]    [0.2080; 0.4063]
Brier Score
  Bootstrap mean                0.1188              0.0949
  Bias                         −0.0068             −0.0047
  Bootstrap Std. Dev.           0.0170              0.0162
  95% Conf. interval            [0.1004; 0.1677]    [0.0809; 0.1477]
Obs. correctly classified
  Bootstrap mean                0.8730              0.8915
  Bias                         −0.0047             −0.0077
  Bootstrap Std. Dev.           0.0569              0.0564
  95% Conf. interval            [0.6115; 0.9136]    [0.6511; 0.9388]
Type I error rate
  Bootstrap mean                0.4656              0.3579
  Bias                         −0.0924             −0.0607
  Bootstrap Std. Dev.           0.1248              0.0898
  95% Conf. interval            [0.4186; 0.8837]    [0.3255; 0.7209]
Type II error rate
  Bootstrap mean                0.0649              0.0628
  Bias                          0.0224              0.0203
  Bootstrap Std. Dev.           0.0707              0.0682
  95% Conf. interval            [0.0085; 0.0808]    [0.0213; 0.0808]

Panel B: Relative performance of regression 2 vs. regression 1 – reported values are probabilities of
observing a positive/negative difference between the evaluation criteria. Probabilities are obtained
from the bootstrap distribution of each regression and criterion

Evaluation criterion         P(regression 2 worse than regression 1)
McFadden's R2                0.000
Brier Score                  0.000
Obs. correctly classified    0.150
Type I error rate            0.110
Type II error rate           0.415

For each evaluation criterion we report the bootstrap mean, the bias (calculated as the average difference
between the bootstrap estimates and the observed value), the bootstrap standard deviation and a 95% confidence
interval based on the bias-corrected percentile method. Regressions are estimated as described in Table 7.

than regression 1, according to the type II error criterion, decreases considerably (from 0.415 to 0.172) and the values for the other criteria decrease slightly.
Fourthly, we examine whether model performance is driven by the structure of
our data set. In our sample, firms are in default in a particular year if their status
meets our definition of default. Some firms exhibit a continuous or interrupted se-
quence of multiple defaults (for example, they are in default at the beginning of
the sampling period, they recover in the meantime and default again two years later).

Since we do not know how accurately banks rate firms that were in default during
the preceding year, 14 we investigate this issue for a subsample of our original data.
More specifically, we drop subsequent default observations from firms that exhibit
multiple defaults (for example, from a firm-specific year-default vector (1,1,0,1) we
only use the first and the last observation). This procedure leads to a subsample with
383 observations (of which 50 are defaults). Following Section 4.1, we then
estimate models 1–3. The resulting rank ordering reveals that model 3 performs bet-
ter than models 1 and 2 in terms of all evaluation criteria. In addition, applying the
same selection criteria as before, we carry out regressions 1 and 2 with a subsample
of 264 observations (of which there are 35 defaults). Regression 2 leads to a better
forecast of default events than regression 1 with respect to most of the evaluation
criteria (except type II error rate).
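One way to read the subsampling rule, consistent with the (1,1,0,1) example above, is to keep an observation only if the firm was not in default in the preceding year; this is our reconstruction, not the authors' code:

```python
def drop_subsequent_defaults(defaults):
    """Indices of retained observations: the first observation of a firm,
    plus any observation not preceded by a default year.

    For the year-default vector (1, 1, 0, 1) this keeps only the first and
    the last observation, as in the paper's example."""
    return [i for i in range(len(defaults)) if i == 0 or defaults[i - 1] == 0]
```

Applying this per firm and pooling the retained observations yields the reduced subsample used in the robustness check.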
Finally, results may be influenced by a reporting lag due to outdated financial
statements. Since distressed firms stop producing financial statements, financial rat-
ings may be based on the last available report before default. For example, if a firm
defaults early in year t, the bank's financial rating of this firm is still based on the
financial statements for fiscal year t − 2. To address this issue, we create a variable
‘‘time lag’’ that reports the number of days between December 31 of the last avail-
able financial statements and the day of the credit application or review. We cannot
refer to the submission date of a firm’s financial reports because it is not included in
our sample. Then we rerun models 1–3 and regressions 1 and 2, adding the variable "time lag". It turns out that "time lag" is not significant in any regression (with p-values in a range of 0.2–0.5), the rating variables in models 1–3 remain highly significant and the rank ordering of the models according to the five evaluation criteria has not been altered. These results indicate that timeliness does not seem to be a critical issue. Nevertheless, this point should be interpreted with care because our variable "time lag" is only an imperfect approximation of the actual reporting behavior.
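The "time lag" variable can be computed as follows (a sketch; the argument names are ours):

```python
from datetime import date

def time_lag(last_statement_year, review_date):
    """Number of days between December 31 of the last available financial
    statements and the day of the credit application or review."""
    return (review_date - date(last_statement_year, 12, 31)).days

# Example: statements for fiscal year 1992, review on 1 March 1994.
lag = time_lag(1992, date(1994, 3, 1))
```

A firm defaulting early in year t with a rating based on statements for fiscal year t − 2 would thus show a lag of well over a year.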
Recapitulating, the results of various robustness tests support the hypothesis that the combined use of financial and non-financial factors leads to a more accurate prediction of default events than their single use or naive forecasts. This finding is not sensitive to the omission of any bank from the sample, is robust at the individual level for two banks, is not driven by predetermined weighting schemes, does not depend on whether firms default once or several times, and does not seem to be biased by a reporting lag.

14 Ratings should reflect a firm's probability of default, which implies that rating changes indicate an increase or decrease in a firm's default risk. Considering our definition of default and the fact that banks actually keep on rating distressed firms, we think it is reasonable to analyze not only situations in which firms default but also those in which they recover from a previous default. For example, if a bank creates a specific loan loss provision, it will set the rating to grade 6 directly afterwards. Nonetheless, if the bank subsequently gets new information that indicates an improved creditworthiness, any further rating will describe the future (and not the current) status of the borrower.

5. Conclusion

Over the past 10 years, banks' uses of internal credit ratings have multiplied. In the near future, ratings will be recognized by banking supervision authorities to determine banks' capital adequacy, considerably converging the internal and external perspectives of credit risk management (see Basel Committee on Banking Supervision, 2001, 2003). Given this rising importance of credit ratings, the design of sound rating systems is in the interest of banks, borrowers, and supervisors. Whereas the relevance of financial factors for rating purposes is widely accepted, the consideration of non-financial factors is equally undisputed, but it has often been justified only holistically.
This paper constitutes a first attempt to explore the role of non-financial factors in
credit ratings. Our main result is that the combined use of financial and non-financial
factors leads to a significantly more accurate default prediction than the single use of
financial or non-financial factors. Default is defined consistently with the definition
of the Basel Committee on Banking Supervision; goodness of fit and accuracy of de-
fault prediction is measured using McFadden’s R2 , the Brier Score, the percentage of
correctly classified observations and type I/II error rates.
Although our results are limited in some ways due to the data used, they essentially confirm banking practice (see Basel Committee on Banking Supervision, 2000a; Günther and Grüning, 2000) and show that holistic justifications for the use of non-financial factors can be confirmed by a quantitative analysis. However,
since only the benefits of non-financial factors have been analyzed, it is not possible
to conclude that their additional use represents a net advantage because we have not
examined the costs of acquiring and processing non-financial information. The latter
may be left to future research, which should proceed with an integrated cost-benefit analysis of internal credit rating systems at the individual bank level. Using more
extensive data, it would be interesting to differentiate our analysis with respect to
the age and size of borrowing firms since both characteristics might be linked to
the degree to which non-financial factors improve default prediction. Additionally,
in particular for pricing issues, it might be instructive to study whether non-financial factors in credit ratings can improve the differentiation among those borrowers that possess an acceptable degree of creditworthiness. Collecting data from different
financial intermediaries, our results could also be tested with regard to bank size
and organizational structure following Berger et al. (2002) and Stein (2002). Finally,
a promising extension of the research of Carey (2001) and Tabakis and Vinci (2002), as well as our own, could be to investigate whether and to what extent there is a relationship
between multiple lenders’ rating disagreements for common borrowers and non-
financial factors in credit ratings.

Acknowledgements

We thank Thomas Langer, Gunter Löffler, Achim Machauer, Steven Ongena, and participants of the 1st C.R.E.D.I.T. Conference in Venice, Italy, as well as the

participants of the 9th Annual Meeting of the German Finance Association in Co-
logne, Germany, for their valuable comments and insights. In addition, we are grate-
ful to the seminar participants at the University of Mannheim for useful
conversations. We retain responsibility for any remaining errors.

Appendix A. Definitions of financial factors

This table shows the formulae to calculate the financial factors used in Section 4.

Variable and formula

Logarithm of total assets (LTA) = log(total assets)
Equity-to-assets ratio (ER) = (equity / total assets) × 100
Current ratio (CR) = (current assets / current liabilities) × 100
Cash flow-to-net liabilities (CFNL) = (cash flow / (total liabilities − current assets)) × 100
Capital intensity ratio (CIR) = (fixed assets / (equity + long-term liabilities)) × 100
Return on assets (ROA) = (net earnings / total assets) × 100

The factors ER, CR, CFNL, CIR, and ROA are parts of the internal credit rat-
ings systems of banks 1–4.
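For illustration, the formulae above translate directly into code; the field names and the base-10 logarithm are our assumptions, since the paper does not specify the log base:

```python
import math

def financial_factors(b):
    """Financial factors of Appendix A from a dict of balance-sheet items.
    Field names are illustrative; the log base is assumed to be 10."""
    net_liabilities = b["total_liabilities"] - b["current_assets"]
    return {
        "LTA": math.log10(b["total_assets"]),
        "ER": 100 * b["equity"] / b["total_assets"],
        "CR": 100 * b["current_assets"] / b["current_liabilities"],
        "CFNL": 100 * b["cash_flow"] / net_liabilities,
        "CIR": 100 * b["fixed_assets"] / (b["equity"] + b["long_term_liabilities"]),
        "ROA": 100 * b["net_earnings"] / b["total_assets"],
    }

# Hypothetical firm with round numbers for easy checking.
firm = {
    "total_assets": 1000.0, "equity": 300.0, "current_assets": 400.0,
    "current_liabilities": 200.0, "total_liabilities": 700.0,
    "cash_flow": 50.0, "fixed_assets": 600.0,
    "long_term_liabilities": 300.0, "net_earnings": 80.0,
}
factors = financial_factors(firm)
```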

References

Altman, E.I., 1968. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy.
Journal of Finance 23, 589–609.
Altman, E.I., Haldeman, R.G., Narayanan, P., 1977. ZETA analysis – a new model to identify bankruptcy risk of corporations. Journal of Banking and Finance 1, 29–54.
Baetge, J., 1998. Empirische Methoden zur Früherkennung von Unternehmenskrisen. Opladen/Wiesbaden.
Basel Committee on Banking Supervision, 2000a. Range of Practice in Banks’ Internal Rating Systems.
Discussion Paper, January 2000.
Basel Committee on Banking Supervision, 2000b. Credit ratings and complementary sources of credit
quality information. Working Paper No. 3, August 2000.
Basel Committee on Banking Supervision, 2001. The New Basel Capital Accord. Consultative Document,
January 2001.
Basel Committee on Banking Supervision, 2003. The New Basel Capital Accord. Consultative Document,
April 2003.
Beaver, W.H., 1966. Financial ratios as predictors of failure. Journal of Accounting Research 4, 71–111.
Berger, A.N., Miller, N.H., Petersen, M.A., Rajan, R.G., Stein, J.C., 2002. Does function follow
organizational form? Evidence from the lending practices of large and small banks. Working Paper
No. 8752. National Bureau of Economic Research.

Bhattacharya, S., Thakor, A., 1993. Contemporary banking theory. Journal of Financial Intermediation 3,
2–50.
Blochwitz, S., Eigermann, J., 2000. Unternehmensbeurteilung durch Diskriminanzanalyse mit qualitativen Merkmalen. Zeitschrift für betriebswirtschaftliche Forschung 52, 58–73.
Brier, G.W., 1950. Verification of forecasts expressed in terms of probability. Monthly Weather Review
78, 1–3.
Brunner, A., Krahnen, J.P., Weber, M., 2000. Information production in credit relationships: on the role
of internal ratings in commercial banking. Working Paper No. 2000/10. Center for Financial Studies,
Frankfurt/Main.
Carey, M., 2001. Some evidence on the consistency of banks’ internal credit ratings. Working Paper.
Federal Reserve Board.
Carey, M., Hrycay, M., 2001. Parameterizing credit risk models with rating data. Journal of Banking and
Finance 25, 197–270.
Collins, N.J., 1966. Credit Analysis – Concepts and Objectives. In: Baughn, W.H., Walker, C.E. (Eds.),
The Banker’s Handbook, pp. 279–289.
Crouhy, M., Galai, D., Mark, R., 2001. Prototype risk rating system. Journal of Banking and Finance 25,
47–95.
Deutsche Bundesbank, 2003. Validierungsansätze für interne Ratingsysteme. Monatsbericht (September),
61–74.
Diamond, D.W., 1984. Financial intermediation and delegated monitoring. Review of Economic Studies
51, 393–414.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26.
Efron, B., 1982. The Jackknife, the Bootstrap and other Resampling Plans. Society for Industrial and
Applied Mathematics, Philadelphia.
Elsas, R., Henke, S., Machauer, A., Rott, R., Schenk, G., 1998. Empirical analysis of credit relationships
in small firms financing: sampling design and descriptive statistics. Working Paper No. 1998/14. Center
for Financial Studies, Frankfurt/Main.
Elsas, R., Krahnen, J.P., 1998. Is relationship lending special? Evidence from credit-file data in Germany.
Journal of Banking and Finance 22, 1283–1316.
English, W.B., Nelson, W.R., 1999. Bank risk rating of business loans. In: Proceedings of the 35th Annual
Conference on Bank Structure and Competition, May.
Ewert, R., Szczesny, A., 2001. Countdown for the new Basle Capital Accord. Working Paper No. 2001/05.
Center for Financial Studies, Frankfurt/Main.
Günther, T., Grüning, M., 2000. Einsatz von Insolvenzprognoseverfahren bei der Kreditwürdigkeitsprüfung im Firmenkundenbereich. Die Betriebswirtschaft 60, 39–59.
Hesselmann, S., 1995. Insolvenzprognose mit Hilfe qualitativer Faktoren. Aachen.
Hosmer, D.W., Lemeshow, S., 2000. Applied logistic regression. New York.
Krahnen, J.P., Weber, M., 2001. Generally accepted rating principles: A primer. Journal of Banking and
Finance 25, 3–23.
Leland, H.E., Pyle, D.H., 1977. Information asymmetries, financial structure, and financial intermedi-
ation. Journal of Finance 32, 371–387.
Löffler, G., 2004. An anatomy of rating through the cycle. Journal of Banking and Finance 28, 695–720.
Machauer, A., Weber, M., 1998. Bank behavior based on internal credit ratings of borrowers. Journal of
Banking and Finance 22, 1355–1383.
Merton, R.C., 1974. On the pricing of corporate Debt: The risk structure of interest rates. Journal of
Finance 29, 449–470.
Moody's Investor Service, 2001. Moody's RiskCalc for Private Companies: The German Model – Rating Methodology, November 2001.
Norden, L., 2002. Spezialbanken und Basel II: Eine empirische Untersuchung interner Ratingsysteme. Die
Betriebswirtschaft 62, 273–288.
Organisation for Economic Co-operation and Development, 2001. Bank Profitability: Financial
Statements of Banks, Paris.

Ohlson, J.A., 1980. Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting
Research 18, 109–131.
Platt, H.D., Platt, M.B., 1990. Development of a class of stable predictive variables: The case of
bankruptcy prediction. Journal of Business, Finance and Accounting 17, 31–51.
Redelmeier, D.A., Bloch, D.A., Hickam, D.H., 1991. Assessing predictive accuracy: How to compare brier
scores. Journal of Clinical Epidemiology 44, 1141–1146.
Risk Management Association, 2000. EDF Estimation: A "Test-Deck" Exercise. The Risk Management Association Journal, 54–61.
Stein, J.C., 2002. Information production and capital allocation: Decentralized versus hierarchical firms.
Journal of Finance 57, 1891–1921.
Tabakis, E., Vinci, A., 2002. Analysing and combining multiple credit assessments of financial institutions.
Working paper No. 123. European Central Bank.
Treacy, W.F., Carey, M., 2000. Credit risk rating systems at large US banks. Journal of Banking and
Finance 24, 167–201.
Vinterbo, S., Ohno-Machado, L., 1999. A recalibration method for predictive models with dichotomous
outcomes. In: Vinterbo, S., Predictive Models in Medicine: Some Methods for Construction and
Adaptation. Ph.D. thesis. Norwegian University of Science and Technology.
Weber, M., Krahnen, J.P., Vossmann, F., 1999. Risikomessung im Kreditgeschäft: Eine empirische Analyse bankinterner Ratingverfahren. Zeitschrift für betriebswirtschaftliche Forschung, Sonderheft 41, 117–142.
Zmijewski, M.E., 1984. Methodological issues related to the estimation of financial distress prediction
models. Journal of Accounting Research 22, 59–82.
