Credit scoring using data mining techniques.


Abstract

In its simplest form, credit scoring can be defined as a technique that helps credit providers decide whether to grant credit to customers. This paper discusses the benefits and applications of credit scoring and the construction of credit scoring models. It also reviews the data mining methodology and identifies data mining techniques that can be used to construct these models. Finally, the paper illustrates the use of data mining techniques to construct credit scoring models and highlights the prerequisites and limitations of the data mining approach.

Key words: Credit scoring; data mining; predictive modelling; credit scoring models; decision trees.

**********

The last two decades have seen rapid growth in both the availability and the use of consumer credit. Until recently, the decision to grant credit was based on human judgement of the risk of default. The growth in the demand for credit, however, has led to greater use of more formal and objective methods (generally known as credit scoring) to help credit providers decide whether to grant credit to an applicant. This approach was first introduced in the 1940s and has evolved significantly over the years. In recent years, progress in credit scoring has been fuelled by increased competition in the financial industry, advances in computer technology, and the exponential growth of large databases.

Since the mid-1990s, three new interrelated areas that emphasise obtaining more information from data have emerged strongly in information systems and information technology. They are data warehousing, knowledge management, and data mining, the last of which aims to identify valid, novel, potentially useful and understandable correlations and patterns in data (Chung and Gray, 1999). With advances in both computer hardware and software, many data mining applications are now more accessible and affordable to businesses than before.

This paper reviews the credit scoring and data mining literature and illustrates the use of data mining techniques in the construction of credit scoring models. The first section provides a formal definition of credit scoring, describes its usefulness and advantages, and lists some of its applications. It also discusses the construction of credit scoring models by looking at the methodology and techniques commonly used. The second section reviews the data mining literature and identifies data mining techniques that can be used to construct credit scoring models. The third section illustrates the use of data mining techniques to construct credit scoring models. Finally, the concluding section highlights the limitations of credit scoring as well as the prerequisites and limitations of the data mining approach to the construction of credit scoring models.

Credit Scoring

Credit scoring can be formally defined as a statistical (or quantitative) method that is used to predict the probability that a loan applicant or existing borrower will default or become delinquent (Mester, 1997). The objective of credit scoring is to help credit providers quantify and manage the financial risk involved in providing credit so that they can make lending decisions quickly and more objectively. The following paragraphs trace the development of credit scoring over the years.

In 1936, Fisher introduced the idea of discriminating between groups in a population (for example, between two species of iris, using measurements of the physical size of the plants). In 1941, Durand, who was working on a research project for the US National Bureau of Economic Research, realised that Fisher's discriminant analysis could be used to differentiate between good and bad loans. For many years, decisions to grant loans had been made judgmentally by credit analysts. Because of a shortage of credit analysts during World War II, many organisations had their analysts write down the rules they used to assess a loan applicant's creditworthiness (Johnson, 2004). Credit decisions were then made with the help of these rules. After the war, people linked these two developments and began to see the advantages of using statistically derived models when deciding on loan applications.

In the 1960s, with the creation of credit cards, banks and other credit card issuers realised the advantages of credit scoring. As the number of people applying for credit cards increased, it became urgent to automate the credit granting process. Organisations that used credit scoring also found that the scores predicted default better than any judgmental method (Myers, 1963), and the scores helped them to reduce delinquency rates. The Equal Credit Opportunity Acts passed in the US in 1975 and 1976 marked an important milestone, as they signified the acceptance of credit scoring in lending decisions while safeguarding consumers against unfair treatment.

In the 1980s, the success of credit scoring for credit cards prompted banks to use credit scoring for other purposes (for example, personal loan applications). The growth of direct marketing in the 1990s also led to the use of credit scoring methodology to increase the response rate to advertising campaigns. In recent years, credit scoring has been used for home loans, small business loans, and insurance applications and renewals. The focus has also shifted from reducing the delinquency rate of loan applicants to increasing the profit earned from customers (Thomas, 2000).

Benefits of Credit Scoring

Credit scoring has many benefits that accrue not only to lenders but also to borrowers. For example, credit scores help to reduce discrimination because credit scoring models provide an objective analysis of a consumer's creditworthiness. This enables credit providers to focus only on information that relates to credit risk and to avoid the personal subjectivity of a credit analyst or underwriter. In the United States, under the Equal Credit Opportunity Act, variables that would constitute overt discrimination, such as race, sex, religion and age, cannot be included in credit scoring models. Only information that is non-discriminatory in nature and that has been proven to be predictive of payment performance can be included in the models.

Credit scoring also increases the speed and consistency of the loan application process and allows the lending process to be automated. As such, it greatly reduces the need for human intervention in credit evaluation and the cost of delivering credit (Barefoot, 1995). With the help of credit scores, financial institutions are able to quantify the risks associated with granting credit to a particular applicant in a shorter time. Leonard's (1995) study of a Canadian bank found that the time taken to process a consumer loan application fell from nine days to three days after credit scoring was introduced. The time saved in processing loans can be used to address more complex issues. Banaslak and Kiely (2000) concluded that, with the help of credit scores, financial institutions are able to make faster, better and higher-quality decisions.

Further, credit scores can help financial institutions determine the interest rate they should charge consumers and price their portfolios (Avery et al, 2000): higher-risk consumers are charged higher interest rates, and lower-risk consumers lower rates. Based on consumers' credit scores, financial institutions are also able to determine the credit limits to set for them (Sandler et al, 2000). This helps financial institutions manage their accounts more effectively and profitably. As an extension, profit scoring can be used to maximise profits across a range of products (Thomas, 2000).

Credit scoring models have also enabled the development of the sub-prime lending industry. Sub-prime consumers have poor credit records and fall short of conventional standards for credit acceptance. They may not meet the requirements for traditional financing because of credit impairment, missing data in their credit histories or difficulty in validating their income (Quittner, 2003). One of the major factors in the progress of sub-prime lending has been automated underwriting, which allows sub-prime mortgage loans to be packaged and sold as investment securities. The initial success of specialised financial institutions in this market has driven more financial institutions to enter the sub-prime lending market, which is expected to grow as credit scoring technology advances (Perin, 1998).

Finally, because of advances in technology, more intelligent credit scoring models are being developed. Consequently, credit card issuers are able to use the information generated by the models to formulate better collection strategies and hence use their resources more effectively. Lucas (2000) reported that recovery rates averaged 15.9 per cent in 1999, up from 12.1 per cent in the previous year and 9.1 per cent in 1997. Further, the insurance industry has used credit scoring to streamline the insurance application and renewal process. In particular, credit scores help insurance companies to predict claims better and control risk more effectively. They also make pricing more accurate. This enables insurance companies to offer more insurance coverage to more consumers at a more equitable cost, react quickly to market changes and gain a competitive edge (Kellison and Brockett, 2003).

Credit Scoring Applications

In the early years, financial institutions used credit scoring mainly to make credit decisions for loan applications. Over the past 25 years, the application of credit scoring has grown from making credit decisions to making decisions related to housing, insurance, basic utility services, and even employment. However, not all of these applications are equally widely used.

The most common use of credit scores is in making credit decisions for loan applications. In addition to decisions on personal loan applications, financial institutions now use credit scores to help set credit limits, manage existing accounts and forecast the profitability of consumers and customers (Punch, 2000). For example, the Australia and New Zealand Banking Group uses credit scoring to help it identify applicants who should receive credit, determine how much credit those applicants should receive, and decide what steps should be taken if loans are not repaid (see www.sas.com/success/anzcredit.html). Credit card issuers also use credit scores as a decision support tool to identify their target market for credit cards (Punch, 2000). In recent years, credit scores have also been used as part of the decision process for providing credit to small businesses (Rowland, 2003). For example, the Fleet Financial Group uses credit scoring for loans under US$100,000 (Zuckeman, 1996).

Credit scoring models have also been used in the insurance industry (for example, for mortgage and automobile insurance) to decide on applications for new insurance policies and renewals of existing policies. The premise is that there is a direct relationship between financial stability and risk. It has been argued that there is a strong relationship between credit rating and loss ratios in both automobile and mortgage insurance. Statistical evidence has also shown that relative loss ratios (which are a function of both claim frequency and cost) decrease as credit rating improves (Schiff, 2003). GE Capital Mortgage Corporation uses credit scoring to help it screen mortgage insurance applications (Prakash, 1995). Credit scores are also used as a basis for adjusting premiums. Generally, consumers with bad credit scores are more likely to file insurance claims than consumers with good credit scores and are therefore charged a higher premium. Credit information is also used to assess a consumer's accountability and performance under the conditions of an insurance policy.

In 2002, the Consumer Federation of America also reported other credit scoring applications. For example, landlords can use credit scores to determine whether potential tenants are likely to pay their rent on time. Some utility suppliers in the United States have also used credit scores to determine whether to provide their services to consumers. Finally, some employers use credit history and credit scores to decide whether to hire a potential employee, especially for posts where employees need to handle large amounts of money. The implication is that a person's trustworthiness, and hence personal character, can be assessed through his or her credit score.

Construction of Credit Scoring Models

The methodology of constructing credit scoring models generally involves the following process. First, a sample of previous customers is selected and classified as "good" and "bad" depending on their repayment performance over a given period (for simplicity, only a dichotomy is used here). Next, data are compiled from loan applications, personal and/or business credit records and various sources if available (for example, credit bureau reports). Finally, statistical or other quantitative analysis is performed on the data to derive a credit scoring model. The model will comprise weights to apply to the different variables or attributes in the data and a cut-off point. The sum of the weights applied to the variables for an individual consumer or customer constitutes the credit score. The cut-off point determines if this consumer or customer should be classified as "good" or "bad". The probability associated with this classification can also be generated. It is noted that different models can be constructed for different segments of the data (for example, for different products).
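The scoring step described above (sum the weights of an applicant's attributes, then compare the total with a cut-off point) can be sketched as a small points-based scorecard. This is a minimal illustration under stated assumptions, not a model from this paper: the attribute bands, the point values in `WEIGHTS` and the cut-off of 60 are all hypothetical, whereas in practice they would be derived from the statistical analysis of the sample of good and bad customers.

```python
# Hypothetical attribute weights (points); NOT taken from the paper.
WEIGHTS = {
    ("age_band", "under_25"): 10,
    ("age_band", "25_to_40"): 25,
    ("age_band", "over_40"): 35,
    ("has_mortgage", True): 15,
    ("has_mortgage", False): 5,
    ("other_cards", "0_to_1"): 30,
    ("other_cards", "2_or_more"): 10,
}
CUT_OFF = 60  # hypothetical cut-off point


def credit_score(applicant: dict) -> int:
    """Credit score = sum of the weights of the applicant's attributes."""
    return sum(WEIGHTS[(field, value)] for field, value in applicant.items())


def classify(applicant: dict, cut_off: int = CUT_OFF) -> str:
    """Scores at or above the cut-off are classed as 'good', otherwise 'bad'."""
    return "good" if credit_score(applicant) >= cut_off else "bad"


applicant = {"age_band": "25_to_40", "has_mortgage": True, "other_cards": "0_to_1"}
print(credit_score(applicant), classify(applicant))  # 70 good
```

Moving the cut-off trades accepted bad risks against rejected good ones, which is why it is chosen separately from the weights.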

To date, several techniques have been used in the construction of credit scoring models. The most common are traditional statistical methods. For example, some of the earliest credit scoring models used discriminant analysis. However, discriminant analysis requires rather restrictive statistical assumptions (for example, normally distributed variables with equal covariance matrices in the good and bad groups) that are seldom satisfied in real life. Consequently, logistic regression, which is less restrictive, has been proposed as an alternative to discriminant analysis. Other techniques that have been used, though rather infrequently, to construct credit scoring models include genetic algorithms, k-nearest neighbour, linear programming and expert systems.
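As a rough sketch of how logistic regression estimates its weights, the following fits a two-class (good/bad) model by batch gradient descent on the mean log-loss. The six applicants, the two input variables (annual income in tens of thousands of dollars and number of other credit cards), the learning rate and the number of epochs are all hypothetical; a real project would use a statistics or data mining package rather than this hand-rolled loop.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Estimate weights w and intercept b by gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            # Error = estimated P(bad) minus the observed label (1 = bad, 0 = good).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def prob_bad(w, b, x):
    """Estimated probability that applicant x is a bad risk."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical sample: [annual income in $10,000s, number of other cards]; 1 = bad.
X = [[1.5, 4], [2.0, 3], [1.8, 5], [3.5, 1], [4.0, 0], [3.0, 1]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(prob_bad(w, b, [1.6, 4]), prob_bad(w, b, [3.8, 0]))
```

Unlike discriminant analysis, nothing here assumes normally distributed inputs; the model only assumes that the log-odds of being bad are linear in the inputs.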

In recent years, data mining techniques have been increasingly used to construct credit scoring models. In particular, the decision tree approach has become a popular technique for developing credit scoring models, as the resulting decision trees are easy to interpret and visualise. Neural networks are also commonly used. These techniques are discussed in detail below. Empirical studies of credit scoring models include Lee and Jung (1999/2000) and West (2000).

Data Mining

Data mining can be considered a relatively recently developed methodology and technology, coming into prominence only in 1994 (Trybula, 1997). It aims to identify valid, novel, potentially useful, and understandable correlations and patterns in data (Chung and Gray, 1999). Data mining can also be considered a process and a technology to detect the previously unknown in order to gain competitive advantage. In data mining, there is a strong emphasis on combing through copious data sets to sniff out patterns that are too subtle or complex for humans to detect (Kreuze, 2001).

Data Mining Methodology

CRISP-DM (Cross-Industry Standard Process for Data Mining, see www.crisp-dm.org) proposes the following methodology for data mining: (1) business understanding, (2) data understanding and data preparation, (3) modelling, (4) evaluation, and (5) deployment. Business understanding is critical as it identifies the business objectives and hence the success criteria of data mining projects. Further, as the term "data mining" implies, data is a crucial component, that is, no data means no mining. Hence, CRISP-DM includes data understanding and data preparation (for example, sampling and data transformation) as an essential antecedent for modelling.

The modelling stage is the actual data analysis. Most data mining software includes OLAP (on-line analytical processing), traditional statistical methods (for example, cluster analysis, discriminant analysis and regression analysis) and non-traditional methods (such as neural networks, decision trees, link analysis and association analysis). This extensive range of techniques is not surprising given that data mining has been viewed as the offspring of three different disciplines, namely database management, statistics and computer science.

The evaluation stage allows the comparison of models and results from any data mining model by using a common yardstick (for example, lift charts, profit charts or diagnostic classification charts). Finally, deployment relates to the actual implementation and operationalisation of the data mining models.
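Lift, one of the yardsticks mentioned above, compares the proportion of target cases (for example, bad risks) among the highest-scoring cases with the proportion in the whole sample. The function below is a minimal sketch; the scores and outcomes in the example are made up.

```python
def lift(scores, is_target, fraction):
    """Lift in the top `fraction` of cases ranked by descending model score."""
    ranked = sorted(zip(scores, is_target), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(t for _, t in ranked[:k]) / k        # target rate in the top k
    base_rate = sum(is_target) / len(is_target)         # target rate overall
    return top_rate / base_rate


# Hypothetical model scores (higher = more likely bad) and actual outcomes (1 = bad).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(lift(scores, actual, 0.2))  # top 20%: 2 of 2 are bad vs 3 of 10 overall
print(lift(scores, actual, 0.5))
```

A lift of about 3.3 in the top 20 per cent means the model concentrates bad risks there at roughly three times the sample rate; plotting lift against `fraction` for each candidate model gives a lift chart on which the models can be compared.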

For the purpose of this paper, the business objective of the data mining application is credit scoring.

Data Mining Techniques

Data mining techniques can be broadly classified by what they can do, namely: (1) description and visualisation; (2) association and clustering; and (3) classification and estimation (that is, prediction). Description and visualisation can contribute greatly towards understanding a data set, especially a large one, and towards detecting hidden patterns in data, especially complicated data containing complex and non-linear interactions. They are usually performed before modelling is attempted and correspond to data understanding in the CRISP-DM methodology.

In association, the objective is to determine which variables go together. For example, market basket analysis refers to a technique that generates probabilistic statements such as: if customers purchase coffee, there is a 0.35 probability that they also purchase bread. Such information can be useful for store layout, item bundling, and discount and promotion decisions. Market basket analysis can be applied not only to items purchased concurrently but also to items purchased sequentially. In clustering, the objective is to group objects in such a way that objects belonging to the same cluster are similar and objects belonging to different clusters are dissimilar. As an application, clustering can be used for market segmentation to group consumers and customers.
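The probabilistic statement in the coffee example is what association-rule tools call the confidence of a rule: among baskets containing the antecedent (coffee), the share that also contain the consequent (bread). A minimal sketch with made-up baskets follows; full market basket tools (for example, Apriori-style algorithms) also search systematically over candidate rules, which is not shown here.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


baskets = [  # hypothetical transactions
    {"coffee", "bread"},
    {"coffee", "milk"},
    {"coffee", "bread", "milk"},
    {"bread"},
    {"coffee"},
    {"milk"},
]
print(confidence(baskets, {"coffee"}, {"bread"}))  # 2 of the 4 coffee baskets contain bread
```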

The most common and important applications in data mining probably involve prediction, sometimes referred to as modelling. Classification refers to the prediction of a target variable that is categorical in nature (for example, predicting fraud versus non-fraud, high-risk versus low-risk or purchaser versus non-purchaser). Estimation, on the other hand, refers to the prediction of a target variable that is metric (interval) in nature (for example, predicting the amount spent, duration of a call or the account balance). To construct credit scoring models, predictive modelling techniques are the most relevant.

For predictive modelling, data mining techniques include traditional statistical methods such as multiple discriminant analysis and logistic regression analysis. More importantly, they also include non-traditional methods developed in the areas of artificial intelligence and machine learning, the two most important of which are neural networks and decision trees. As traditional statistical methods are not new and can be found in standard texts in the area, they are not discussed here. Instead, the following paragraphs discuss neural networks and decision trees. More details can be found in Berry and Linoff (1997).

Neural networks are useful for recognising patterns in data, especially when the form of the relationship between the target variable (for example, credit risk) and the input variables (for example, demographic characteristics) is unknown and/or complex. They are modelled after the human brain, which can be perceived as a highly connected network of neurons or nodes. Each node receives inputs from at least one node in the previous layer, combines them, and sends its output to at least one node in the next layer. Generally, the input variables comprise the input layer and the target variable comprises the output layer. Between the input and output layers, there may be one or more hidden layers of nodes.

Each node performs a computation to combine its inputs and a transformation to generate an output. Each connection between two nodes has a weight that determines how the input from the prior node is combined with other inputs to generate the output passed to the next node. The final neural network model, comprising the final weights, is derived by training the network to find weights such that the outputs of the network (for example, credit scores) are as close as possible to the desired outputs (for example, actual credit risk) for the consumers or customers in the sample.
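To make the node computations concrete, the sketch below runs one forward pass through a network with a single hidden layer: each hidden node forms a weighted sum of its inputs and applies a sigmoid transformation, and the output layer applies a softmax so that the three outputs can be read as class probabilities. The weights are arbitrary placeholders, and the choice of sigmoid and softmax is an assumption (other transformations are possible). Training would repeatedly adjust the weights, for example by backpropagation, to reduce an error such as the squared difference computed at the end; that step is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Turn output-layer scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One pass through a network with a single hidden layer."""
    # Each hidden node: weighted sum of the inputs plus bias, then a sigmoid.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Each output node: weighted sum of the hidden outputs; softmax across the layer.
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(scores)


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 3 outputs
# (bad loss, bad profit, good risk).
w_hidden = [[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8]]
b_hidden = [0.0, -0.5, 0.1]
w_out = [[1.0, -0.5, 0.3], [0.2, 0.4, -0.6], [-0.8, 1.1, 0.5]]
b_out = [0.0, 0.1, -0.1]

probs = forward([0.4, 0.9], w_hidden, b_hidden, w_out, b_out)
desired = [0.0, 0.0, 1.0]  # e.g. the applicant is actually a good risk
error = sum((p - d) ** 2 for p, d in zip(probs, desired))
print(probs, error)
```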

The objective of decision trees is prediction and/or classification by dividing observations into mutually exclusive and collectively exhaustive subgroups. The division is based on the levels of particular input variables (for example, demographic characteristics) that have the strongest association with the target variable (for example, credit risk). In its basic form, the decision tree approach begins by searching for the input variable that divides the sample in such a way that the difference with respect to the target variable is greatest among the divided subgroups. At the next stage, each subgroup is further divided into sub-subgroups by searching for the input variable that divides the subgroup in such a way that the difference with respect to the target variable is greatest among the divided sub-subgroups. The input variable selected need not be the same for each subgroup. This process of division or splitting usually continues until either no further splitting can produce statistically significant differences in the target variable in the new subgroups or the subgroups are too small for any further meaningful division. The subgroups and sub-subgroups are usually referred to as nodes. The end product can be graphically represented by a tree-like structure. More information on decision trees can be found in Lehmann et al (1998).
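The first step of this search can be sketched for a single metric input variable: try each candidate threshold and keep the one that makes the two subgroups most homogeneous with respect to the target. The sketch measures homogeneity with the Gini impurity (the criterion used in CART); this is an assumption, since C5.0 uses an information-based gain ratio and other tree algorithms use significance tests. The incomes and risk labels are made up. A full tree would repeat the search over every input variable and then recurse into each subgroup until a stopping rule is met.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure group, larger for mixed groups."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, impurity decrease) for the best split value <= t vs > t."""
    parent = gini(labels)
    n = len(labels)
    best = (None, 0.0)
    for t in sorted(set(values))[:-1]:  # the largest value would leave one side empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


incomes = [18, 21, 24, 27, 30, 35]  # hypothetical annual incomes in $1,000s
risk = ["bad", "bad", "bad", "good", "good", "good"]
print(best_threshold(incomes, risk))  # splits between 24 and 27
```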

Using Data Mining Techniques for Credit Scoring

To illustrate the use of data mining techniques for credit scoring, consider a credit card issuer that is interested in developing a credit scoring model to predict whether credit card applicants will be bad loss, bad profit or good risk. The credit card issuer intends to deploy the model at the time credit card applications are processed. Assume that all applicants provide the following information on the application form:

(1) age;

(2) annual income;

(3) gender;

(4) marital status;

(5) number of children;

(6) number of other credit cards held; and

(7) whether the applicant has an outstanding mortgage loan.

Given the above, the target variable is credit risk and the input variables are the seven variables listed above, from age to whether the applicant has an outstanding mortgage loan. Before developing this application, the credit card issuer categorised a representative sample of 4,117 credit card holders, each of whom had held a card for one year, into three groups (bad loss, bad profit and good risk). Also, as a routine practice, all information provided on the application form is captured electronically.

Constructing the credit scoring model requires predictive modelling. For this purpose, three data mining techniques are appropriate, namely logistic regression, neural networks and decision trees. SPSS Clementine 7.2 (a data mining package) is used in this illustration. The data mining diagram associated with the illustration is given in Figure 1. Both description and visualisation and predictive modelling are incorporated into the illustration; association and clustering are not relevant for this credit scoring application. A snapshot of the sample data is shown in Figure 2.

[FIGURES 1-2 OMITTED]

Description and Visualisation Results

As mentioned earlier, description and visualisation are useful for understanding the data and, in the initial modelling stage, for exploring patterns, trends and relationships. Several description and visualisation tools are used in the illustration, and some of the results are summarised in Figure 3. For example, descriptive statistics derived using the Statistics node in Clementine indicate that in the sample, the mean age is 31.82 years, the mean annual income is $25,580 and the mean number of children is 1.45 (see left panel of Figure 3). In addition, 3,200 (77.73 per cent) of the credit card holders have an outstanding mortgage loan (see top central panel of Figure 3). Although not shown, the mean number of other credit cards held is 2.43, 2,077 (50.45 per cent) of the holders are female, and 2,089 (50.74 per cent) are married. As for the target variable, credit risk, 906 (22.01 per cent) are bad loss, 2,407 (58.46 per cent) are bad profit, and 804 (19.53 per cent) are good risk. Such description aids in understanding the data (that is, the credit card applicants and holders).
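The percentages quoted above follow directly from the counts and the sample size of 4,117. A quick check, using only figures stated in the text:

```python
N = 4117  # sample size from the text

counts = {
    "outstanding mortgage": 3200,
    "female": 2077,
    "married": 2089,
    "bad loss": 906,
    "bad profit": 2407,
    "good risk": 804,
}
# Percentage of the sample in each group, rounded to two decimal places.
percent = {name: round(100 * c / N, 2) for name, c in counts.items()}
for name, p in percent.items():
    print(f"{name}: {p}%")

# The three credit risk groups should account for the whole sample.
assert counts["bad loss"] + counts["bad profit"] + counts["good risk"] == N
```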

[FIGURE 3 OMITTED]

To visualise the data using the Plot and Histogram nodes in Clementine, a plot of age against annual income and a histogram of the number of other credit cards held are generated (see centre and central right panels of Figure 3 respectively). Credit risk is overlaid on both diagrams to relate the visualisation to the target variable. The results show that higher age and annual income, as well as fewer other credit cards held, are associated with a more favourable credit risk. Finally, a Web graph (via the Web node in Clementine) is drawn showing the links among gender, marital status, mortgage loan and credit risk (see bottom panel of Figure 3). Stronger relationships are shown by stronger lines. Links below a user-defined threshold are not included in the Web graph (for example, between good risk and marital status). The Web graph suggests that bad loss is moderately associated with having an outstanding mortgage loan and weakly associated with being female and being married. As noted earlier, description and visualisation can be useful for modelling purposes.

Predictive Modelling Results

In this illustrative credit scoring application, predictive modelling is the most important analysis. In particular, logistic regression, neural network and decision tree models can be used to construct the credit scoring model. Before predictive modelling is performed, the sample data are partitioned into a construction/training sample (about 75 per cent) and a validation/test sample (about 25 per cent). For simplicity, the overall accuracy rate is assumed to be the primary performance indicator of the prediction models; that is, it is the criterion used to assess each model and to compare models.
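A minimal sketch of the partition and the evaluation criterion, assuming a simple shuffle-and-split of the 4,117 record indices. The text does not say exactly how Clementine assigned records (its "about 75 per cent" suggests each record may have been assigned at random), and the seed is arbitrary; a stratified split that preserves the proportions of the three risk groups would be a reasonable alternative. The overall accuracy rate is simply the share of cases whose predicted class matches the actual class.

```python
import random


def partition(records, train_fraction=0.75, seed=2024):
    """Shuffle the records and split them into training and test samples."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def overall_accuracy(actual, predicted):
    """Share of cases whose predicted class equals the actual class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


train, test = partition(range(4117))
print(len(train), len(test))  # 3088 1029
print(overall_accuracy(["good risk", "bad loss", "bad profit"],
                       ["good risk", "bad profit", "bad profit"]))
```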

Figures 4 and 5 show portions of the logistic regression, neural network and decision tree results derived from the Logistic Regression, Neural Net and C5.0 (decision tree) nodes in Clementine. The logistic regression results indicate that the model is statistically significant (at the 0.05 significance level). In addition, as shown in the bottom left panel of Figure 4, the following input variables are statistically significant in predicting credit risk: age, annual income, number of children, number of other credit cards held, marital status, and whether the applicant has an outstanding mortgage loan. Gender is not statistically significant. The detailed results of the model are summarised in the right panel of Figure 4. Finally, the overall accuracy rate of the logistic regression model on the construction/training sample is 72.7 per cent, which is deemed sufficient for the purpose of this illustration.

[FIGURES 4-5 OMITTED]

Figure 5 (left panel) shows a relatively simple decision tree model with nine terminal nodes (predicting bad loss, bad profit or good risk) and five important input variables: annual income, age, number of children, number of other credit cards held and marital status. A graphical representation of the decision tree model is given in Figure 6. As can be seen, the decision tree can be interpreted visually and also in terms of rules. For example, good risk credit card holders are likely to be those with an annual income above $25,049 and not more than one child, as well as those with an annual income of not more than $25,049 who are above 39 years old and single. The overall accuracy of the decision tree model on the construction/training sample is 76.0 per cent, which is deemed sufficient for this illustration.
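The two good-risk rules quoted above can be written as a simple function. This covers only those two rules: the other terminal nodes of the tree in Figure 6 are not reproduced in the text, so `False` here means only "not matched by either quoted rule". The function name, argument names and the string "single" are illustrative.

```python
def matches_good_risk_rule(annual_income, children, age, marital_status):
    """True if the applicant satisfies either good-risk rule quoted from the tree."""
    if annual_income > 25049:
        # Rule 1: annual income above $25,049 and not more than one child.
        return children <= 1
    # Rule 2: annual income not more than $25,049, older than 39 and single.
    return age > 39 and marital_status == "single"


print(matches_good_risk_rule(30000, 1, 28, "married"))  # rule 1 applies
print(matches_good_risk_rule(20000, 0, 45, "single"))   # rule 2 applies
print(matches_good_risk_rule(20000, 0, 30, "single"))   # neither rule applies
```

Rules like these can be deployed directly in an application-processing system, which is part of the appeal of decision trees noted later in the paper.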

[FIGURE 6 OMITTED]

Finally, Figure 5 shows that the neural network model has nine neurons in the input layer, three neurons in the hidden layer, and three neurons in the output layer (bad loss, bad profit and good risk). The nine input neurons correspond to four metric input variables (age, annual income, number of children, and number of other credit cards held) plus five dummy variables derived from three nonmetric variables (gender, marital status, and mortgage loan). In descending order of importance in the neural network, the input variables are: annual income, number of other credit cards held, marital status, age, number of children, whether the applicant has an outstanding mortgage loan, and gender. The overall accuracy rate of the neural network model on the construction/training sample is 76.6 per cent, which is deemed sufficient for the purpose of this illustration.
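The count of nine input neurons can be reproduced with a simple encoding, under assumptions not stated in the text: gender and mortgage status are treated as 0/1 flags (one neuron each) and marital status as a three-category variable with one indicator per category, giving 2 + 3 = 5 dummy inputs alongside the four metric inputs. The field names and the third marital category ("other") are hypothetical, and in practice the metric inputs would usually be rescaled before training.

```python
MARITAL_CATEGORIES = ("married", "single", "other")  # third label is hypothetical


def encode(applicant):
    """Map one application record to the nine network inputs."""
    metric = [
        float(applicant["age"]),
        float(applicant["annual_income"]),
        float(applicant["children"]),
        float(applicant["other_cards"]),
    ]
    # Binary variables: one 0/1 input each.
    flags = [
        1.0 if applicant["gender"] == "female" else 0.0,
        1.0 if applicant["has_mortgage"] else 0.0,
    ]
    # Three-category variable: one indicator input per category.
    marital = [1.0 if applicant["marital_status"] == c else 0.0
               for c in MARITAL_CATEGORIES]
    return metric + flags + marital


inputs = encode({
    "age": 34, "annual_income": 27500, "children": 1, "other_cards": 2,
    "gender": "female", "has_mortgage": True, "marital_status": "single",
})
print(len(inputs), inputs)
```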

The results presented above suggest that the neural network model is the most accurate. However, because the same observations are used both to construct and to evaluate the models, their performance on the construction/training sample is biased upwards. It is therefore important to assess the performance of the models on the validation/test sample.

On the validation/test sample, the overall accuracy rates are: (1) logistic regression model: 71.1 per cent, (2) decision tree model: 74.2 per cent, and (3) neural network model: 73.4 per cent. The decision tree model is therefore the most accurate, followed by the neural network model and the logistic regression model. Based on the evaluation criterion, the decision tree model is the best prediction model and can be used to predict the credit risk of credit card applicants. A decision tree model is also easy to interpret, as evidenced by the simple rules reflected in Figure 6.
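The comparison can be tabulated directly from the reported accuracy rates. The snippet below only restates the paper's figures: the drop from training to validation accuracy is largest for the neural network, which is why it no longer ranks first on the validation/test sample.

```python
# Overall accuracy rates (per cent) reported in the text: (training, validation).
accuracy = {
    "logistic regression": (72.7, 71.1),
    "decision tree": (76.0, 74.2),
    "neural network": (76.6, 73.4),
}

best_on_training = max(accuracy, key=lambda m: accuracy[m][0])
best_on_validation = max(accuracy, key=lambda m: accuracy[m][1])
# Training minus validation accuracy: how much each training figure overstated performance.
optimism = {m: round(tr - va, 1) for m, (tr, va) in accuracy.items()}

print("best on training:", best_on_training)
print("best on validation:", best_on_validation)
print("training minus validation:", optimism)
```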

Conclusion

In recent years, data mining has gained widespread attention and increasing popularity in the commercial world. Besides credit scoring, there are other potential data mining applications. For example, data mining can be used to: (1) perform churn modelling to identify customers who are likely to churn, (2) construct fraud detection models to give early warning of possibly fraudulent transactions, (3) understand consumers and customers better, (4) segment customers, or (5) construct models to predict the probability of purchasing certain products or services in order to facilitate cross-selling or up-selling. The findings can then be used, for example, to prepare mail catalogues and to target advertising and promotion campaigns. However, data mining is not without limitations.

Limitations of Data Mining

First, the quality of data mining results and applications depends on the availability and quality of data (Chopoorian et al, 2001). For example, to construct a credit scoring model, sufficient "good" and "bad" cases must be available. In addition, problems in the available data, such as missing, corrupted or inconsistent data, have to be resolved before data mining is done. It has been estimated that data preparation accounts for about 75 per cent of the effort in a data mining project.
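A first pass over the data can check both prerequisites mentioned above: whether enough "good" and "bad" cases are available, and how much of the data is missing or inconsistent. The sketch below assumes that applicant records are held as Python dictionaries. The field names and the validity rules (age between 18 and 100, non-negative income) are hypothetical examples and are not taken from the paper.

```python
from collections import Counter

# Hypothetical predictor fields and target name; adapt to the actual data set.
FIELDS = ("age", "income", "num_cards", "marital")
TARGET = "risk"


def audit(records):
    """Return class counts, missing values per field and indices of inconsistent records."""
    class_counts = Counter(r.get(TARGET) for r in records)
    missing = {f: sum(1 for r in records if r.get(f) is None) for f in FIELDS}
    inconsistent = []
    for i, r in enumerate(records):
        age, income = r.get("age"), r.get("income")
        # Hypothetical validity rules, for illustration only.
        if (age is not None and not 18 <= age <= 100) or (income is not None and income < 0):
            inconsistent.append(i)
    return class_counts, missing, inconsistent


records = [
    {"age": 34, "income": 42000, "num_cards": 2, "marital": "married", "risk": "good"},
    {"age": None, "income": 38000, "num_cards": 1, "marital": "single", "risk": "bad"},
    {"age": 230, "income": 51000, "num_cards": None, "marital": "married", "risk": "good"},
    {"age": 45, "income": -100, "num_cards": 3, "marital": None, "risk": "good"},
]
class_counts, missing, inconsistent = audit(records)
print(dict(class_counts))  # {'good': 3, 'bad': 1}
print(missing)             # {'age': 1, 'income': 0, 'num_cards': 1, 'marital': 1}
print(inconsistent)        # [2, 3]
```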

Second, a sufficiently exhaustive mining of data will almost certainly throw up patterns that are merely a product of random fluctuations (Hand, 1998). This is especially so for large data sets with many variables. Hence, many apparently interesting or significant patterns and relationships found through data mining may not be useful. Further, from a statistical perspective, data mining is well developed for modelling but less well developed for assessing effects. Murray (1997) and Hand (1998) have warned against using data mining for data dredging or fishing (randomly trawling through data in the hope of identifying patterns) because of the statistical problems involved. The central problem is multiplicity: when many relationships are tested, some will appear significant by chance alone.
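The multiplicity problem can be demonstrated with a small simulation. It generates a target and 200 candidate predictors that are all pure noise, then counts how many predictors nonetheless show a "significant" correlation with the target at the 5 per cent level. The significance test uses a large-sample approximation: with no true association, the sample correlation is roughly normal with standard deviation 1/sqrt(n). For comparison, the sketch also applies a Bonferroni correction. This is a standard multiple-testing adjustment, but it is not one discussed in the paper. The sample sizes and seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)
n, m, alpha = 500, 200, 0.05
target = [rng.gauss(0, 1) for _ in range(n)]
# m candidate predictors that are pure noise, unrelated to the target.
predictors = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]

# Approximate critical values for |r|, using r ~ N(0, 1/n) under no association.
z = NormalDist()
naive_cut = z.inv_cdf(1 - alpha / 2) / n ** 0.5
bonf_cut = z.inv_cdf(1 - alpha / (2 * m)) / n ** 0.5  # Bonferroni: alpha shared across m tests

rs = [abs(pearson_r(p, target)) for p in predictors]
naive_hits = sum(r > naive_cut for r in rs)
bonf_hits = sum(r > bonf_cut for r in rs)
print(f"'significant' at alpha={alpha}: {naive_hits} of {m}; after Bonferroni: {bonf_hits}")
```

Roughly 5 per cent of the noise predictors can be expected to pass the unadjusted test. This is the kind of spurious pattern Hand and Murray warn about.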

Third, successful application of data mining requires the user to be knowledgeable both in the domain area of application and in the data mining methodology and tools. Without sufficient knowledge of data mining, the user may be unaware of its pitfalls or unable to avoid them (see, for example, McQueen and Thorley, 1999). Collectively, the data mining team should possess domain knowledge, statistical and research expertise, and IT and data mining knowledge and skills.

Finally, businesses developing data mining applications also need to make a substantial investment of their resources in data mining. It should be borne in mind that data mining projects can fail for a variety of reasons (for example, lack of management support, unrealistic user expectations, poor project management, inadequate data mining expertise).

Limitations of Credit Scoring

In this concluding section, it is appropriate to discuss the limitations of credit scoring. One of the major problems that can arise when constructing a credit scoring model is that the model may be built using a biased sample made up only of consumers and customers who have been granted credit (Hand, 2001). This happens because rejected applicants are not included in the data used to construct the model: they were never granted credit, so there is no opportunity to ascertain their creditworthiness. The sample is therefore biased, with good customers overrepresented. A credit scoring model built using this sample will generally not perform well on the entire applicant population, because the data used to build the model differ from the data to which the model will be applied.
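A small simulation can illustrate this sampling bias. Everything in it is hypothetical: a latent risk variable, a noisy application score, a cutoff above which applicants are rejected, and a default outcome that the lender only ever observes for accepted applicants. The bad rate among accepted applicants (the only sample available for modelling) is lower than the bad rate in the full applicant population.

```python
import random

rng = random.Random(1)
CUTOFF = 0.5  # hypothetical: applicants scoring above this are rejected

population = []
for _ in range(20000):
    risk = rng.gauss(0, 1)                   # latent riskiness (higher = riskier)
    score = risk + rng.gauss(0, 0.7)         # noisy application score seen by the lender
    is_bad = risk + rng.gauss(0, 0.5) > 1.0  # eventual outcome
    population.append((score, is_bad))

# Outcomes are only observed for accepted applicants, so this is the modelling sample.
accepted = [is_bad for score, is_bad in population if score <= CUTOFF]

pop_bad_rate = sum(is_bad for _, is_bad in population) / len(population)
acc_bad_rate = sum(accepted) / len(accepted)
print(f"bad rate, all applicants: {pop_bad_rate:.3f}; accepted only: {acc_bad_rate:.3f}")
```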

The second problem that can arise when building credit scoring models is that patterns change over time. The key assumption of any predictive modelling is that the past can predict the future (Berry and Linoff, 2000). In credit scoring, this means that the characteristics of past applicants who were subsequently classified as "good" or "bad" creditors can be used to predict the credit status of new applicants. Sometimes, however, the distribution of these characteristics changes so quickly that the credit scoring model must be refreshed constantly to stay relevant.
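The paper does not discuss the population stability index (PSI), but it is one common way to detect such shifts. PSI compares the share of applicants in each band of a characteristic at model development (e_i) with the share among recent applicants (a_i), summing (a_i - e_i) ln(a_i / e_i) over the bands. An often-quoted industry rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift, although lenders set their own thresholds. The band shares below are hypothetical.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population stability index between two binned distributions (proportions per bin)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total


baseline = [0.25, 0.25, 0.25, 0.25]  # hypothetical: share of applicants per income band at development
recent = [0.10, 0.20, 0.30, 0.40]    # hypothetical: shares among recent applicants
print(round(psi(baseline, recent), 3))  # 0.228
```

Under the rule of thumb above, a value of about 0.23 would signal a moderate shift. That would be a prompt to investigate, and possibly to refresh the model.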

Another problem that is prevalent in predictive modelling is the omission of important variables or attributes from the model (Avery et al, 2000). Credit scoring models rely primarily on information about an individual's payment and credit history, which may not be sufficient to assess that individual's creditworthiness. In the illustrative credit scoring model, an applicant's credit rating is predicted as "bad" if his attributes are similar to the observable characteristics of "bad" customers. However, credit default may be driven by unobservable (that is, unmeasured) characteristics such as employment status and current economic status. Further, the accuracy of credit scores depends critically on the data used to construct the model and on the data to which the constructed model is applied. The prevalence of errors in credit reports could also leave consumers on the losing end and put credit providers at financial risk (Collins, 2003).

Related to the above, credit scoring requires an individual to have sufficient credit history and activity before a score can be calculated. Hence, lenders whose new applicants have not yet accumulated any credit activity may not be able to use credit scoring to assess those applicants' creditworthiness. There have been reported instances where new applications for insurance were denied outright (Eldred, 2002).

One consequence of credit scoring is that end-users may become so reliant on the technology that they no longer exercise prudent judgement or apply their knowledge to special cases. In other instances, end-users unintentionally apply more resources than necessary to work the entire portfolio. This runs the risk of a self-fulfilling prophecy (Lucas, 2002). In the United States, a new industry has emerged that is dedicated to helping borrowers improve their scores by rearranging their finances (Timmons, 2002), rather than by following the simple rule: pay your bills on time and keep your debt low. Such score-polishing actions could potentially distort the patterns of credit default.

Finally, in insurance, critics have alleged that credit information can be misused, that it has become the sole determinant in some cases, and that it may act as a substitute for race and income data, which cannot be used to set insurance rates. However, this is inevitable, as credit information contains other attributes that are highly correlated with race and income. In credit scoring, consumers have complained that credit scores are discriminatory. In general, minorities have lower credit scores than white applicants. Scoring industry representatives say that this is because factors that affect a borrower's ability to meet financial obligations, such as income, property, education and employment, are not equally distributed by race or national origin in the United States (Wasserman, 2000). In other words, the observable attributes used in scoring are related to attributes, such as race, that are not observed in the model.

Despite the limitations highlighted above, there is no doubt that credit scoring will continue to be a major tool for predicting credit risk in consumer lending. It is envisaged that organisations using credit scoring appropriately will gain an important strategic advantage and a competitive edge over their rivals.
Figure 4: Logistic Regression Results

Model Fitting Information

                  -2 Log
Model            Likelihood   Chi-Square   df    Sig

Intercept Only   5888.387        --        --    --
Final            4324.927     1563.460     16    .000

Likelihood Ratio Tests

Effect      Chi-Square   df   Sig

Intercept        .00     0     --
AGE           181.664    2    .000
INCOME        220.006    2    .000
NUMKIDS        23.519    2    .000
NUMCARDS      103.848    2    .000
GENDER           .041    2    .980
MARITAL       226.347    4    .000
MORTGAGE        9.881    2    .007

             Risk                      B       Wald    df   Sig

Bad Loss     Intercept               -.370     1.027   1    .311
             AGE                      .006      .341   1    .559
             INCOME                   .000    73.369   1    .000
             NUMKIDS                  .460    21.467   1    .000
             NUMCARDS                 .730    78.920   1    .000
             [GENDER=f]              -.005      .001   1    .972
             [MARITAL=divsepwid]    -2.936    49.203   1    .000
             [MARITAL=married]       1.141    24.282   1    .000
             [MORTGAGE=n]             .303     2.547   1    .111

Bad Profit   Intercept               6.073   372.297   1    .000
             AGE                      .085    85.881   1    .000
             INCOME                   .000   185.130   1    .000
             NUMKIDS                  .193     4.430   1    .004
             NUMCARDS                 .199     8.364   1    .035
             [GENDER=f]               .015      .018   1    .892
             [MARITAL=divsepwid]     -.700     3.303   1    .069
             [MARITAL=married]        .724    16.081   1    .000
             [MORTGAGE=n]             .506     9.412   1    .002

Note: The reference groups are GENDER = m, MARITAL = single and
MORTGAGE = y.
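The Sig values in the model fitting and likelihood ratio tables of Figure 4 are upper-tail probabilities of a chi-square distribution. When the degrees of freedom are even, this tail has a closed form, so the reported values can be checked without a statistics library. The sketch below is intended only as a verification aid. It does not cover the one-degree-of-freedom Wald tests in the parameter table, which would need the general chi-square distribution from a statistics package.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, for even df only.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Model fitting information: the chi-square is the drop in -2 log-likelihood.
lr_stat = 5888.387 - 4324.927
print(round(lr_stat, 3), chi2_sf_even_df(lr_stat, 16))

# Likelihood ratio tests for three of the effects.
for name, chi2, df in [("GENDER", 0.041, 2), ("MORTGAGE", 9.881, 2), ("MARITAL", 226.347, 4)]:
    print(name, round(chi2_sf_even_df(chi2, df), 3))
```

The printed values agree with the table: the model chi-square of 1563.460, and Sig values of .980 for GENDER, .007 for MORTGAGE and .000 for MARITAL.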


References

Altman EI, 2001. "Managing credit risk: A challenge for the new millennium". Working Paper. New York University.

Avery RB, RW Bostic, PS Calem and GB Canner, 2000. "Credit scoring: Statistical issues and evidence from credit-bureau files". Real Estate Economics, 28(3): 523-547.

Banasik J, JN Crook and LC Thomas, 1999. "Not if but when the borrowers will default". Journal of the Operational Research Society, 50(12): 1185-1190.

Banaslak MJ and GL Kiely, 2000. "Predictive collection score technology". Business Credit, 102(2): 18-20.

Barefoot AS, 1996. "Credit scoring at a crossroad". ABA Banking Journal, 88(6): 26-31.

Berry MJA and GS Linoff, 1997. Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, Inc, New York.

--, 2000. Mastering Data Mining: The Art and Science of Customer Relationship Management. John Wiley & Sons, Inc, New York.

Chopoorian JA, R Witherell, OEM Khalil and M Ahmed, 2001. "Mind your own business by mining your data". SAM Advanced Management Journal, 66(2): 45-51.

Chung HM and P Gray, 1999. "Data mining". Journal of Management Information Systems, 16(1): 11-16.

Collins B, 2003. "Some may be misclassified as B&C". Origination News, 12(4): 35.

Consumer Federation of America, 2002. Credit Scores Accuracy and Implications for Consumers. December, Consumer Federation of America and National Credit Reporting Association, New York.

Durand D, 1941. Risk Elements in Consumer Instalment Financing, National Bureau of Economic Research. New York.

Eldred T, 2002. "Reducing use of credit scores for policies proposed". Delaware Capital Review, 25(42): 3.

Fisher RA, 1936. "The use of multiple measurements in taxonomic problems". Annals of Eugenics, 7: 179-188.

Hand DJ, 1998. "Data mining: Statistics and more?". The American Statistician, 52(2): 112-118.

--, 2001. "Modelling consumer credit risk". IMA Journal of Management Mathematics, 12: 139-155.

Hicks B, 2002. "Credit-based scoring". Rough Notes, 145(8): 88-89.

Johnson RW, 2004. "Legal, social and economic issues implementing scoring in the US" in LC Thomas, DB Edelman and JN Crook (eds), Credit Scoring--Recent Developments, Advances and Aims. Oxford University Press, Oxford: forthcoming.

Kahn VM, 2000. "Credit scoring: What your lender won't tell you". Business Week, May 22: 30-31.

Kellison B and P Brockett, 2003. "Check the score: credit scoring and insurance losses: Is there a connection?". Texas Business Review, Special (2003): 1-6.

Kreuze D, 2001. "Debugging hospitals". Technology Review, 104(2): 32.

Lee TH and SH Jung, 1999/2000. "Forecasting creditworthiness: logistic vs artificial neural net". The Journal of Business Forecasting Methods & Systems, 18(4): 28-30.

Lehmann DR, S Gupta and JH Steckel, 1998. Marketing Research. Addison-Wesley Educational Publishers, Inc, Massachusetts.

Leonard KJ, 2000. "The development of credit scoring quality measures for consumer credit applications". International Journal of Quality and Reliability Management, 12(4): 79-85.

Lewis EM, 1992. An Introduction to Credit Scoring. Athena Press, San Rafael.

Loretta JM, 1997. "What's the point of credit scoring?". Business Review, September/ October: 3-16.

Lucas P, 2000. "Why recoveries are on the rise". Credit Card Management, 13(7): 71-76.

--, 2002. "Score updates". Collections & Credit Risk, 7(10): 22-25.

McQueen G and S Thorley, 1999. "Mining fool's gold". Financial Analysts Journal, 55(2): 61-72.

Murray LR, 1997. "Lies, damned lies and more statistics: The neglected issue of multiplicity in accounting research". Accounting and Business Research, 27(3): 243-258.

Perin M, 1998. "Risky business: Sub-prime market growth attracts host of new players". Houston Business Journal, August, http://www.bisjournals.com/houston/stories/1998/08/31/newscolumn4.html.

Prakash S, 1995. "Mortgage lenders see credit scoring as key to hacking through red tape". American Banker, 160(161): 1-2.

Punch L, 2000. "Shedding light on credit scores". Credit Card Management, August: 78-80.

Quinn LR, 2000. "Credit scores scrutiny". Mortgage Banking, 60(12): 50-55.

Quittner J, 2003. "Credit cards: subprime's tech dilemma: with delinquencies and charge-offs on the rise, the industry examines the role of automated decisioning". Bank Technology News, 16(1): 19, 23.

Robida C and G Gilkerson, 2000. "How many scorecards do I need for my business lending environment?". Business Credit, 102(6): 36-38.

Rowland JB, 2003. "Confidently evaluate small businesses with credit scoring". Business Credit, 105(3): 26-31.

Ryman-Tubb N, 2000. "Impact of e-commerce on credit scoring". Credit Control, 21(3): 11-14.

Sandler AL, SE McGinn and JL Barloon, 2000. "Fair lending scrutiny of credit score-based underwriting systems". ABA Bank Compliance, 21(3): 37-43.

Schiff S, 2003. "Two views on 2003: conning--PC results improving I.I.I.--credit scoring likely to be year's 'hot topic'". Rough Notes, 146(1): 104-105.

Stepanova M and LC Thomas, 2002. "Survival analysis methods for personal loan data". Operations Research, 50(2): 277-289.

Thomas LC, 2000. "A survey of credit and behavioral scoring; forecasting financial risk of lending to consumers". International Journal of Forecasting, 16(2): 149-172.

Timmons H, 2002. "The cracks in credit scoring: As loan default rise, worries grow about how creditors pick borrowers". Business Week, November 25: 136-137.

Trybula WJ, 1997. "Data mining and knowledge discovery". Annual Review of Information Science and Technology, 32: 197-229.

Wasserman M, 2000. "Mining data". Regional Review, 10(3).

West D, 2000. "Neural network credit scoring models". Computers & Operations Research, 27(11 and 12): 1131-1152.

Zmiewski M, 2000. "Small business credit scores in good times ... and bad". Journal of Lending & Credit Risk Management, 82(7): 74-79.

Zuckerman S, 1996. "Taking small business competition nationwide". US Banker, 106 (8): 24-28.

Koh Hian Chye

Nanyang Business School

Nanyang Technological University

Tan Wei Chin

Goh Chwee Peng

National Computer Systems Pte Ltd
COPYRIGHT 2004 Singapore Institute of Management
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2004, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.


Article Details
Author: Peng, Goh Chwee
Publication: Singapore Management Review
Date: Jul 1, 2004
Words: 7107