
Data mining and customer relationship marketing in the banking industry.

Advances in computer hardware and data mining software have made data mining accessible and affordable to many businesses. Hence, it is no surprise that data mining has gained widespread attention and increasing popularity in the commercial world in recent years. Data mining provides the technology to analyse massive volumes of data and/or detect hidden patterns in data to convert raw data into valuable information. This paper discusses the potential usefulness of data mining for customer relationship management (CRM) in the banking industry. First, the paper introduces the CRM concept and summarises the data mining methodology and tools. Second, it discusses the data mining literature, particularly its applications in banks. Third, it illustrates a possible CRM application of data mining in banking. Finally, it suggests other potential data mining banking applications and highlights some of the limitations of data mining.


Since the mid-1990s, three new interrelated areas that emphasise obtaining more information from data have emerged strongly in information systems and information technology: data warehousing, knowledge management, and data mining. With advances in both computer hardware and software, applications in these areas have become more accessible and affordable to businesses than before. This paper focuses on data mining, which aims to identify valid, novel, potentially useful and understandable correlations and patterns in data (Chung and Gray, 1999). In particular, the paper explores the potential usefulness of data mining in banks in the area of customer relationship management (CRM). Although the paper focuses mainly on the banking industry, the issues and applications discussed are applicable to other industries as well. See, for example, Koh and Low (2001) on data mining applications in the insurance industry and Koh and Leong (2001) on data mining applications in the healthcare industry.

The remainder of the paper is organised into five sections. The first section introduces customer relationship management in general. The second section discusses the data mining methodology and tools, and the data mining literature. The third section illustrates possible banking applications and examples of data mining in the literature, both in the context of customer relationship management. The fourth section gives an illustration of how data mining can be applied to churn modelling (that is, the prediction of customer turnover) in banks. Finally, the concluding section highlights the limitations of data mining and suggests other possible CRM applications of data mining in the banking industry.

Customer Relationship Management

CRM can be defined as the process of predicting customer behaviour and selecting actions to influence that behaviour to benefit the company (Jenkins, 1999), usually by leveraging information technology and database-related tools. This important concept has been given a new lease of life by the growth of the Internet and E-business. CRM is crucial in the on-line business environment because face-to-face contact is impossible on the net and customer loyalty can waver easily. As gaining customer loyalty becomes the focus in an E-business environment, it is not surprising that analysts have referred to CRM services as one of the hottest enterprise services today. Statistics from the International Data Corporation predict that worldwide revenues in this market will explode at a compound annual growth rate of 29 per cent, from US$34.4 billion in 1999 to US$125.2 billion in 2004 (Chin, 2000). This projected growth illustrates the increasing popularity of CRM among businesses.
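The quoted growth rate can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1/years) - 1. The short Python sketch below applies it to the IDC figures cited above, treating 1999 to 2004 as five years of growth; the function name `cagr` is chosen only for illustration.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# IDC figures cited in the text, in US$ billion; 1999 -> 2004 spans five years.
rate = cagr(34.4, 125.2, 5)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 29.5%"

# Projecting forward at exactly 29 per cent gives a slightly lower 2004 figure.
print(f"34.4 grown at 29% for 5 years: {34.4 * 1.29 ** 5:.1f}")  # prints 122.9
```

The implied rate of about 29.5 per cent is consistent with the quoted 29 per cent once rounding is allowed for.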

CRM initiatives usually seek to fulfil several objectives. One of the objectives is to get closer to the customer by utilising the data "hidden" in scattered enterprise databases. Examining and analysing the data can turn raw data into valuable information about customers' needs. By predicting customer needs in advance, businesses can then market the right products to the right segments at the right time through the right delivery channels. Customer satisfaction can also be improved through more effective marketing.

Another objective of CRM initiatives is to transform the company into a customer-centric organisation with a greater focus on customer profitability than on line profitability. The insights gained from CRM enable companies to calculate or estimate the profitability of individual accounts, so that they can differentiate their customers correctly with respect to profitability. From such insights, companies can build predictive churn models to retain their best customers by identifying telltale symptoms of dissatisfaction and churning. For less profitable accounts, efforts can be directed at switching them to lower-cost service delivery channels.

Other CRM objectives include increased cross-selling possibilities, better lead management, better customer response and improved customer loyalty (Chin, 2000).

CRM and Data Mining

While the tremendous business value of customer-centric marketing and management strategies is intuitive, CRM initiatives have only been popularised by recent developments in technology, particularly in data storage capabilities, data warehousing applications, and data mining techniques (Berry and Linoff, 1997). Although a large part of CRM is technologically driven, it is not just about computer software and hardware. For most small businesses, CRM occurs naturally (Coyle, 1999): customer loyalty and profitability are derived from the close-knit relationships that small community businesses have with their customers. As businesses expand, however, that degree of intimacy is no longer available.

As it is neither realistic nor cost effective for big corporations to know each customer individually, such organisations must achieve CRM indirectly. They must predict the behaviour of individual customers from the transactional, operational and other customer information available to them.

Data mining uses sophisticated statistical processing or artificial intelligence algorithms to discover useful trends and patterns in the extracted data. It can yield important insights, including prediction models and associations, that help companies understand their customers better. Front-office applications can then give marketing personnel dynamic access to these decision support models through different delivery channels.

Data Mining Methodology and Tools

Data mining can be considered a recently developed methodology and technology, which came into prominence in 1994 (Trybula, 1997). The SAS Institute defines data mining as the process of selecting, exploring and modelling large amounts of data to uncover previously unknown patterns (SAS Institute, 1998). Accordingly, data mining can be considered a process and a technology for detecting the previously unknown in order to gain competitive advantage. The SAS data mining methodology comprises the following five stages: Sample, Explore, Modify, Model, and Assess (SEMMA).

Sampling is desirable if the data for analysis are too voluminous to process in a reasonable time, or if the data are to be divided into different sets for model construction and model validation to avoid problems of generalisation. Exploration and modification refer, respectively, to the review of data to enhance understanding of it (for example, by examining summary measures) and to the transformation of data (for example, to induce a linear relationship or a normal distribution). Not every data mining project needs sampling or modification of the data. However, exploration is usually useful and is done as a form of preliminary analysis.
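As a rough illustration of the Sample, Explore and Modify stages, the Python sketch below draws a random sample from a simulated, right-skewed balance variable, compares its mean and median as a simple summary measure, and applies a log transform to bring it closer to a normal distribution. The lognormal data, seed and sample size are assumptions made purely for illustration.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical right-skewed variable, e.g. account balances in $'000 (lognormal by construction).
population = [random.lognormvariate(2.0, 1.0) for _ in range(10_000)]

# Sample: a random subset keeps processing time manageable on large data.
sample = random.sample(population, k=1_000)

# Explore: a mean well above the median is a simple sign of right skew.
print("raw:    mean", round(statistics.mean(sample), 2), "median", round(statistics.median(sample), 2))

# Modify: the log of a lognormal variable is normally distributed, so mean and median converge.
logged = [math.log(x) for x in sample]
print("logged: mean", round(statistics.mean(logged), 2), "median", round(statistics.median(logged), 2))
```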

The modelling stage is the actual data analysis. Most data mining software includes traditional statistical methods (for example, cluster analysis, discriminant analysis, and regression analysis) as well as non-traditional methods such as neural networks, decision trees, link analysis, and association analysis. Finally, the assessment stage allows models and their results to be compared using a common yardstick (for example, lift charts, profit charts or diagnostic classification charts).

Classification of Data Mining Tools

Data mining tools can be broadly classified based on what they can do:

* description and visualisation;

* association and clustering; and

* classification and estimation (prediction).

Description and Visualisation

Description and visualisation can contribute greatly towards understanding a data set and detecting hidden patterns in data, especially complicated data containing complex and non-linear interactions. They are usually performed before modelling is attempted and represent exploration in the SEMMA methodology. Standard description tools include summary statistics (for example, measures of central tendency and measures of dispersion) and graphical representations (for example, distributions and plots). Visualisation can be considered an enhanced graphical approach that allows user input and interaction. An example is a rotating multidimensional plot that permits the user to define the multiple dimensions (multiple variables) in the plot as well as the direction and angle of rotation, to facilitate viewing complex relationships. Colours can also enhance visualisation tools. In the data mining context, description and visualisation tools can be used to understand people, products and processes and to study the relationships among variables. The results of such analyses are seldom an end in themselves; they are usually a means to construct better data mining models (to predict certain target variables).

Association and Clustering

In association, the objective is to determine which variables go together. For example, market basket analysis refers to a technique that generates probabilistic statements such as: if customers purchase coffee, there is a 0.35 probability that they also purchase bread. Such information can be useful for store layout, item bundling, and discount and promotion decisions. Market basket analysis can be applied not only to items purchased concurrently but also to items purchased sequentially. Another data mining tool, link analysis, can also be considered an association technique. It looks at connection relationships: how people, places and things are connected (for example, call patterns in telecommunications).
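The probabilistic statement above is the confidence of an association rule: among baskets containing the antecedent (coffee), the share that also contain the consequent (bread). The Python sketch below computes support and confidence over a handful of made-up baskets; in this toy data the confidence of "if coffee, then bread" is 0.5 rather than the 0.35 used in the text.

```python
# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"coffee", "bread", "milk"},
    {"coffee", "sugar"},
    {"bread", "milk"},
    {"coffee", "bread"},
    {"tea", "milk"},
    {"coffee", "milk", "sugar"},
]


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule: antecedent implies consequent."""
    n_antecedent = sum(1 for b in baskets if antecedent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / len(baskets)  # share of all baskets containing both item sets
    confidence = n_both / n_antecedent if n_antecedent else 0.0  # estimate of P(consequent | antecedent)
    return support, confidence


support, confidence = rule_stats(baskets, {"coffee"}, {"bread"})
print(f"support={support:.2f}, confidence={confidence:.2f}")  # support=0.33, confidence=0.50
```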

In clustering, the objective is to group objects in such a way that objects belonging to the same cluster are similar and objects belonging to different clusters are dissimilar. The two most common data mining tools for clustering are cluster analysis and self-organising maps (or Kohonen networks). As an application, clustering can be used for market segmentation to group consumers and customers. Clustering can also be used to generate cluster membership, which in turn can be used as an input variable in a prediction model; for example, consumers belonging to particular clusters may be more inclined to respond favourably to a particular mailing campaign.
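One common cluster-analysis algorithm is k-means (Lloyd's algorithm), which alternates between assigning each object to its nearest cluster centre and moving each centre to the mean of its members. The Python sketch below runs it on a few invented (age, income in $'000) points from chosen starting centres; it is not the method of any particular package, and in practice variables would normally be standardised first so that one scale does not dominate the distance. The resulting labels are the kind of cluster membership that could feed a prediction model.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j])) for p in points]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means starting from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; an empty cluster keeps its previous centre.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old)
        if updated == centroids:  # converged: no centre moved
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical customers: (age in years, annual income in $'000).
customers = [(30, 80), (32, 85), (35, 78), (55, 40), (58, 42), (60, 38)]
labels, centres = kmeans(customers, centroids=[(30, 80), (60, 38)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```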

Classification and Estimation (Prediction)

The most common and important applications in data mining probably involve prediction. Classification refers to the prediction of a target variable that is categorical in nature (for example, predicting fraud versus non-fraud, high-risk versus low-risk, or purchaser versus non-purchaser). Estimation, on the other hand, refers to the prediction of a target variable that is metric in nature (for example, predicting the amount spent, duration of a call, or the account balance).

To construct prediction models, at least one of the following data mining tools is usually used: multiple or logistic regression, neural networks and decision trees. Logistic regression is a traditional statistical method similar to multiple regression, except that it handles categorical target variables. Neural networks are useful for recognising patterns in the data and are modelled after the human brain, which can be perceived as a highly connected network of neurons.
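To make the logistic regression description concrete, the Python sketch below fits a binary model, p(churn) = 1 / (1 + exp(-(b0 + b1 x))), by batch gradient ascent on the log-likelihood. The single input (average transactions per day) and the eight records are invented; the illustration later in the paper has three target categories, which would need the multinomial extension, and statistical packages typically use Newton-type methods rather than plain gradient ascent.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # log-likelihood gradient w.r.t. the linear score
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: average transactions per day, and 1 = churned, 0 = stayed.
trans_dy = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
churned = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = fit_logistic(trans_dy, churned)
for x in (0.5, 2.25, 4.0):
    print(f"trans_dy={x}: p(churn)={sigmoid(b0 + b1 * x):.2f}")
```

The negative slope means less active customers get a higher predicted churn probability; because the invented data are symmetric about 2.25 transactions per day, the fitted probability there is close to 0.5.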

The objective of decision trees is estimation and/or classification by dividing observations into mutually exclusive and exhaustive subgroups. The end product can be graphically represented by a tree-like structure. In applying the decision tree methodology, each observation is eventually assigned to a node that has a predicted value or classification. Of the three prediction models, decision trees are the most interpretable in that they can be translated into decision rules. Further, as in the case of neural networks, decision trees can be used to model complex non-linear and interaction relationships.
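Decision tree algorithms grow the tree by repeatedly choosing the split that makes the resulting subgroups purest. The Python sketch below scores every candidate threshold on one numeric input using information gain (the reduction in entropy), the criterion underlying the C4.5/C5.0 family; C5.0 itself refines this with the gain ratio and adds pruning, neither of which is shown. The balances and labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, information gain) of the best binary split 'x <= threshold'."""
    parent = entropy(ys)
    best_t, best_gain = None, 0.0
    for t in sorted(set(xs))[:-1]:  # splitting at the maximum would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        gain = parent - children
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Hypothetical investment balances ($'000) and churn labels.
invt_acc = [1, 2, 3, 4, 6, 7, 8, 9]
cust_chn = ["current"] * 4 + ["vol_chn"] * 4
print(best_threshold(invt_acc, cust_chn))  # (4, 1.0)
```

Applied recursively to each resulting subgroup, this choice of splits produces the tree, and each root-to-leaf path reads as a decision rule.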

Data Mining and CRM Literature

According to the professional and trade literature, more companies are using data mining as the foundation for strategies that help them outsmart competitors, identify new customers and lower costs (Davis, 1999). In particular, data mining is widely used in marketing, risk management and fraud control (Kuykendall, 1999). For example, the Farmers Insurance Group mines customer information to develop competitive rates; Foote Cone & Belding analyses data mined from operational and transactional systems to refine clients' direct mailing and advertising campaigns and improve catalogue sales; and Axios Data Analysis Systems uses data mining to help identify possibly fraudulent health insurance claims for one of its clients. Other successful users of data mining include Fingerhut, American Century Investments, Charles Schwab & Company, Chase Manhattan Bank, Bank of America, US West, Bell Atlantic, Alltel, Wal-Mart and Boots PLC (Lach, 1999; Scholber, 1999; Stedman, 1998; Brabazon, 1997). Total spending by US banks on CRM, including technology and non-technology outlays, has been estimated to grow at a compound rate of 11 per cent (Kiesnoski, 1999).

One possible data mining application in banks is risk management, such as credit risk assessment or credit scoring. In the past, assessing credit risk (for example, in loan approval and overdraft facilities) had mostly been a rule-based affair, with the rules usually derived from long-established industry norms. In the last couple of years, more accessible and easier-to-use data mining software has made it possible to apply powerful data mining techniques to risk assessment and other banking-related business problems (Berger, 1999). For example, a decision tree solution for credit risk assessment produces credit-scoring rules for all the accounts in a bank database, and credit jeopardy lists can be drawn up using multiple database queries. This scoring or classification into high or low risk is based on the attributes of each customer account, such as overdraft records, outstanding loans, history of derogatory credit reports, account type, income levels, and other information.

Real-world examples include Corestates Bank, whose Retail Credit Information System (RCRIS) allows the bank to analyse its customer and credit portfolios accurately to reduce its credit risk and monitor high-risk accounts (Varney, 1996). Another example is the Bank of Montreal, which analyses mortgage customers' transactional history in checking, savings and other accounts for insight into their risk of default (Fabris, 1998). Similarly, the Bank of America's mortgage division has used data mining on customer behaviour data to estimate bad loans, so that credit risk managers can allocate optimal loan loss reserves, which directly affect profitability (Fabris, 1998).

Another possible banking application of data mining techniques is customer acquisition. Traditionally, database marketers have made important marketing decisions based on simple one-dimensional queries (which underutilise the available data), or even on gut feeling and intuition (Berger, 1999). Today, exploratory data mining methods, such as automatic cluster detection and market basket analysis, can be used to discover attributes in customer databases that predict response rates to the bank's marketing campaigns (Peacock, 1998). Attributes identified as campaign friendly can then be matched to new lists of non-customers to increase the effectiveness of the marketing campaign.

Coyle (1999) reported that, before data mining caught on several years ago, a direct mail campaign was thought to be successful if it achieved a response rate of 6 to 7 per cent. In 1998, the Canadian Imperial Bank of Commerce utilised CRM and data mining to achieve a phenomenal response rate of 47 per cent. Much of the success was attributed to targeting the right customers and being able to predict their responses. Fleet Bank also used data mining to identify the best prospects for marketing its mutual funds based on customer demographics and account data (Fabris, 1998).

Another data mining application in the Bank of America's west coast customer service call centre focuses on marketing and cross-selling opportunities (Fabris, 1998). Instead of mass pitching a certain "hot" product, the bank's customer service representatives are equipped with customer profiles enriched by data mining that help them to identify which products and services are most relevant to callers.

Data mining can also be used in customer retention applications (for example, through churn modelling). In a typical application, data mining identifies customers who are profitable and who are likely to leave or churn. With this information, the bank can target these valuable but vulnerable customers with extra value-added customer services, special offers and loyalty incentives (Peacock, 1998). The Chase Manhattan Bank in New York uses data mining to model customer churning (Fabris, 1998). As a result of its data mining efforts, the bank took the unusual step of reducing the required minimum balance in customers' checking accounts for two consecutive years. The result was that the proportion of profitable customers among all customers improved.

An Illustrative Data Mining Application in Banking: Churn Modelling

To set the context for a possible data mining application in banking, consider a customer retention (churn modelling) application for a fictitious bank, ZBANK, which is facing increasing competitive challenges from other financial institutions. ZBANK has been encountering customer defections in its home loans, which form one of its most highly valued customer bases. As a marketing strategy, ZBANK gives new home loan customers many incentives (such as free electrical appliances and furniture vouchers). Thus, it has a comparatively higher initial cost of acquisition than its competitors. However, market dominance in this type of loan has given ZBANK a lower risk exposure due to the home mortgages and a strong strategic position for cross-selling other services such as future home loans or home insurance.

Besides maintaining its strategic market dominance, predicting churn likelihood is also important to ZBANK for reducing the number of new customers who defect soon after being acquired. It is noted that ZBANK has a customer database that consists of transactional and demographic information pertaining to its home loan customers.

Data and Data Mining Tools

Assume that ZBANK captures the following data:

(1) customer identification [cust_num],

(2) balance in the savings account [savg_acc {$'000}],

(3) balance in the current account [curr_acc {$'000}],

(4) balance in the investment account [invt_acc {$'000}],

(5) average number of transactions per day [trans_dy],

(6) mode of credit card payment [card_pay {giro, cheque, other accounts}],

(7) whether there are other mortgage loans [mortg_ln],

(8) whether there are credit lines [cdt_line],

(9) customer age [cust_age],

(10) customer gender [cust_sex],

(11) customer marital status [cust_mar],

(12) customer number of children [cust_chd],

(13) customer income per annum [cust_inc],

(14) whether customer has more than one car [cust_car], and

(15) customer churn status [cust_chn].

Assume further that the objective of the illustrative data mining application is to construct a churn prediction model to predict the probability that a current customer will churn in the next six months. The prediction will be made based on thirteen of the variables listed above (that is, from savg_acc to cust_car). The target variable (cust_chn) is captured as a multichotomous variable with three categories: current customer, involuntary churn, and voluntary churn. Involuntary churn is probably the least interesting to ZBANK as it mostly reflects customers who have sold their homes within the loan period and who therefore no longer require the home loans. Voluntary churn refers to customers who defect to ZBANK's competitors and is the primary concern of the bank.

Prior to developing this application, ZBANK had categorised all its existing customers into the above three groups. Also, as a routine practice, all demographic information (that is, from cust_age to cust_car) is updated every six months, while transactional information (that is, from savg_acc to cdt_line) is updated in real time. To enable the prediction model to provide early indicators so that remedial actions can be taken, a lag of six months between the target (that is, dependent) variable and the input (that is, independent) variables is decided upon. That is, the input variables are collected six months prior to categorising the customers' churn status; thus, the model predicts churn six months in advance.
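The six-month lag amounts to pairing each customer's input snapshot with the churn status recorded six months later. The Python sketch below shows that join on three invented customers; the dictionary layout, the helper names (`add_months`, `build_training_rows`) and the status labels are assumptions for illustration, not ZBANK's actual schema.

```python
from datetime import date

LAG_MONTHS = 6


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date forward by a whole number of months."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


# Hypothetical input snapshots keyed by (cust_num, snapshot month).
inputs = {
    (101, date(2000, 1, 1)): {"savg_acc": 12.5, "trans_dy": 3.1},
    (102, date(2000, 1, 1)): {"savg_acc": 4.0, "trans_dy": 0.4},
    (103, date(2000, 3, 1)): {"savg_acc": 7.2, "trans_dy": 1.8},
}
# Hypothetical churn status keyed by (cust_num, month observed).
status = {
    (101, date(2000, 7, 1)): "current",
    (102, date(2000, 7, 1)): "vol_chn",
}


def build_training_rows(inputs, status, lag=LAG_MONTHS):
    """Attach the status observed `lag` months after each snapshot; skip snapshots with no outcome yet."""
    rows = []
    for (cust, month), features in inputs.items():
        target = status.get((cust, add_months(month, lag)))
        if target is not None:
            rows.append({"cust_num": cust, **features, "cust_chn": target})
    return rows


for row in build_training_rows(inputs, status):
    print(row)
```

Customer 103 is left out because its outcome month has not yet been observed, which is exactly the situation in which the fitted model would later be used to score it.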

For predictive modelling, three data mining tools are usually appropriate, namely logistic regression, neural networks and decision trees. SPSS Clementine, a data mining package, is used in this illustration. The data mining diagram associated with the illustration is given in Figure 1. Description and visualisation, association and clustering, and predictive modelling are all incorporated into the illustration. A snapshot of the sample data is shown in Figure 2.

Description and Visualisation Results

As mentioned earlier, description and visualisation are useful for understanding the data and, in the initial modelling stage, for exploring patterns, trends and relationships. Several description and visualisation tools are used in the illustration. For example, descriptive statistics are derived using the Statistics and Distribution nodes in Clementine. Some results are shown in Figure 3: the mean age of home loan customers is 57.4 years (left panel), and 720 customers, or 50.7 per cent, are female (top right panel). Such description aids in understanding the data. To visualise the data using the Plot and Histogram nodes, a plot of customer income against customer age and a histogram of the average number of transactions per day are generated (centre panel and middle right panel, respectively). Further, to relate the visualisation to the target variable, customer churn status is overlaid on the different graphs. For example, the distribution of customer, invol_chn and vol_churn among female and male customers and at each level of trans_dy is incorporated into the graphs. This preliminary assessment of the relationships can be useful for modelling purposes. In particular, the results suggest that voluntary churn is proportionately more common among female customers as well as among less active customers (as measured by trans_dy).
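Overlaying churn status on a graph amounts to comparing the share of each churn category within each level of an input variable. The Python sketch below computes the voluntary-churn rate by gender for a few invented records; the labels and figures are illustrative and say nothing about ZBANK's actual data.

```python
from collections import Counter

# Hypothetical (cust_sex, churn status) pairs.
records = [
    ("F", "vol_chn"), ("F", "current"), ("F", "vol_chn"), ("F", "invol_chn"),
    ("M", "current"), ("M", "current"), ("M", "vol_chn"), ("M", "current"),
]


def rate_by_group(records, status="vol_chn"):
    """Proportion of records with the given status within each group."""
    totals = Counter(group for group, _ in records)
    hits = Counter(group for group, s in records if s == status)
    return {group: hits[group] / totals[group] for group in totals}


print(rate_by_group(records))  # {'F': 0.5, 'M': 0.25}
```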

Finally, a Web graph (via the Web node in Clementine) is drawn showing the links among cust_sex, cust_mar, card_pay and cust_chn (see bottom right panel in Figure 3). Stronger relationships are shown by thicker lines. Links below a threshold level (as defined by the user) are not included in the web graph (for example, those between invol_chn and the selected input variables). The web graph suggests that existing customers (that is, non-churners) tend to be those who are married and male and those who make their credit card payments with other accounts. Note that customer churn status lags the input variables by six months, as discussed earlier.

Association and Clustering Results

To further understand the home loan customers, clustering can be performed. The results obtained from running the TwoStep clustering node are summarised in Figure 4. As shown, the customers seem to fall into seven natural clusters. The cluster profiles and characteristics generated can help to define and understand each cluster as well as the differences among clusters. For example, comparing Cluster 1 and Cluster 4, Cluster 1 consists only of female customers, who are relatively younger, mostly married (92.2 per cent) and have relatively higher annual income. In comparison, Cluster 4 consists only of male customers, who are relatively older (by about 5 years on average), about 59.8 per cent of whom are married, and who have relatively lower annual income (by almost $4,000 on average). Clustering results are very useful for market profiling and segmentation studies but are less relevant for predictive modelling.

In this illustration, association analysis is used to generate rules that indicate the relationship between the input variables and the target variable. These rules are important not only for discovering patterns, relationships and trends but also for predictive modelling (for example, deciding which input variables to include in or exclude from the model). The GRI (generalised rule induction) node in Clementine is used to perform association analysis, and the results are summarised in Figure 5. To interpret the results, the first association rule indicates that there are 156 home loan customers (11.0 per cent) whose balance in their investment account is less than $4,988; of these, 81.0 per cent are involuntary churners. Similarly, the third association rule indicates that there are 198 home loan customers (13.9 per cent) whose balance in their current account is more than $1,017; of these, 81.0 per cent are voluntary churners. The other association rules can be interpreted similarly. The association rules show how the transactional and demographic information is associated with customer churn status. Note that customer churn status lags the input variables by six months.

Predictive Modelling Results

In this illustrative data mining application for ZBANK, predictive modelling is the most important analysis. In particular, logistic regression, neural networks and decision trees can be used to model customer churn in home loans. Before predictive modelling is performed, the sample data are partitioned into a construction/training sample (approximately 75 per cent) and a validation/test sample (approximately 25 per cent). Extracts from the two samples are shown in Figure 6.
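A random partition of this kind can be sketched in a few lines of Python. The version below shuffles the records with a fixed seed and cuts them at 75 per cent, which gives an exact split; a per-record random assignment, which the word "approximately" may suggest, would give sample sizes that only approximate 75/25. The record identifiers and seed are arbitrary.

```python
import random


def partition(records, train_fraction=0.75, seed=42):
    """Shuffle records reproducibly and split them into construction and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical customer numbers standing in for full records.
construction, validation = partition(range(1, 1001))
print(len(construction), len(validation))  # 750 250
```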

Figures 7 and 8 show portions of the logistic regression, neural network and decision tree results derived from the Logistic Regression, Train Net and Build C5.0 nodes in Clementine. As can be seen, the logistic regression model is statistically significant, and its goodness-of-fit Chi-square statistic has a p-value of 1.000, indicating no evidence of lack of fit (see Figure 7). In addition, the following input variables are statistically significant in predicting customer churn status at the 0.05 level of significance: savg_acc (p-value = 0.000), curr_acc (p-value = 0.000), cust_age (p-value = 0.002), cust_inc (p-value = 0.033) and cust_sex (p-value = 0.000).

Figure 8 shows that the neural network model has 15 neurons in the input layer, five neurons in the hidden layer, and three neurons in the output layer. In addition, the five most important input variables are (in descending order of importance): curr_acc, cust_chd, savg_acc, invt_acc and cust_mar. Finally, the decision tree model is relatively simple, with four terminal nodes and only three important input variables (in descending order of importance): invt_acc, cust_sex and cust_age. A graphical representation of the decision tree model is given in Figure 9.

That each prediction model is significant can be seen from the lift charts generated with the Evaluation node in Figure 10 (for logistic regression, decision tree and neural network, from left to right). The lift charts plot the cumulative lift value against percentiles of the sample (in this case, the construction/training sample). The benchmark (that is, the threshold for evaluating each model) is 1, which corresponds to the rate of "hitting" existing customers when records are selected from the sample at random. The lift value measures how much more successful (that is, accurate) the prediction model is in "hitting" existing customers when records are taken in descending order of their predicted probability of being an existing customer. As can be seen in Figure 10, the lift value for each model is above the benchmark of 1 and converges to 1 at the 100th percentile. Hence, it can be concluded that each prediction model is significant in that it predicts the target variable (at least existing customers versus non-existing customers) more accurately than random selection.
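The cumulative lift described above can be computed directly: rank records by predicted probability of the target class, and at each percentile divide the hit rate among the top-ranked records by the overall hit rate. The Python sketch below does this for ten invented scores; at the 100th percentile every record is included, so the lift is always 1, which is why the curves converge there.

```python
import math


def cumulative_lift(scores, hits, percentiles=range(10, 101, 10)):
    """Cumulative lift by percentile.

    scores: predicted probability that each record belongs to the target class
    hits:   1 if the record actually belongs to the target class, else 0
    """
    ranked = [h for _, h in sorted(zip(scores, hits), key=lambda pair: pair[0], reverse=True)]
    base_rate = sum(hits) / len(hits)  # hit rate of random selection (lift = 1)
    lifts = {}
    for p in percentiles:
        k = math.ceil(len(ranked) * p / 100)  # number of top-ranked records in this percentile
        lifts[p] = (sum(ranked[:k]) / k) / base_rate
    return lifts


# Hypothetical scores and outcomes (1 = existing customer).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
hits = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
for p, lift in cumulative_lift(scores, hits).items():
    print(f"{p:3d}th percentile: lift {lift:.2f}")
```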

The prediction models obtained from logistic regression, neural networks and decision trees are not identical. Hence, it is important to compare the performance of the three models not only on the construction/training sample but also (and more importantly) on the validation/test sample. For these prediction models, the best way to evaluate comparative performance is probably to look at the accuracy rates of the models in predicting the target variable (that is, customer churn status). For the purpose of this illustration and for simplicity, the overall accuracy rate is assumed to be the evaluation criterion for comparing the performance of the different prediction models. The results (that is, classification tables) are captured in Figure 11.
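A classification table cross-tabulates actual against predicted categories, and the overall accuracy rate is the share of records on its diagonal. The Python sketch below computes both and picks the model with the highest accuracy as the champion; the six validation records and the two sets of predictions are invented, not the Figure 11 results.

```python
LABELS = ["current", "invol_chn", "vol_chn"]


def classification_table(actual, predicted, labels=LABELS):
    """Counts indexed as table[actual][predicted]."""
    table = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        table[a][p] += 1
    return table


def overall_accuracy(actual, predicted):
    """Share of records whose predicted category matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation records and predictions from two models.
actual = ["current", "current", "vol_chn", "invol_chn", "vol_chn", "current"]
predictions = {
    "tree": ["current", "vol_chn", "vol_chn", "invol_chn", "current", "current"],
    "logit": ["current", "current", "current", "invol_chn", "current", "vol_chn"],
}

for name, predicted in predictions.items():
    print(name, round(overall_accuracy(actual, predicted), 3), classification_table(actual, predicted)["vol_chn"])

champion = max(predictions, key=lambda name: overall_accuracy(actual, predictions[name]))
print("champion:", champion)
```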

As shown in the left panel of Figure 11, predictions of the decision tree model ($C-cust_chn) are most accurate at an overall accuracy rate of 81.6 per cent, followed by those of the logistic regression model ($L-cust_chn; overall accuracy rate = 80.0 per cent) and neural network model ($N-cust_chn; overall accuracy rate = 77.9 per cent). Hence, based on the evaluation criterion, the decision tree model is the best (or champion) prediction model and should be used for predicting churn in the home loans of ZBANK. It is also noted that a decision tree model is easy to interpret, as evidenced by the simple rules reflected in Figure 9. In particular, the results indicate that home loan customers in ZBANK who churn voluntarily are likely to be female customers above the age of 39 and who have more than $4,976 in their investment accounts. Note that the target variable lags the input variables by six months.
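Because a decision tree reduces to rules, the voluntary-churn path reported above can be written as a scoring function. The Python sketch below encodes only that one path, since the other terminal nodes are not described here. It assumes that the $4,976 threshold is in dollars, so it becomes 4.976 on the $'000 scale of invt_acc, and that cust_sex is coded "F"/"M"; the customer records are invented.

```python
def flags_voluntary_churn(cust_sex: str, cust_age: float, invt_acc: float) -> bool:
    """True if a customer falls on the voluntary-churn path of the reported tree.

    Assumes cust_sex is coded "F"/"M" and invt_acc is in $'000 (so $4,976 -> 4.976).
    """
    return cust_sex == "F" and cust_age > 39 and invt_acc > 4.976


# Hypothetical customers to score: (cust_num, cust_sex, cust_age, invt_acc).
customers = [
    (201, "F", 45, 12.0),
    (202, "F", 35, 20.0),
    (203, "M", 50, 8.0),
    (204, "F", 60, 3.5),
]
at_risk = [num for num, sex, age, invt in customers if flags_voluntary_churn(sex, age, invt)]
print(at_risk)  # [201]
```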

From the results presented so far, it is expected that the decision tree churn prediction model can add value by identifying churners and non-churners more accurately from their transactional and demographic information. This being the case, the decision tree model can be used to help ZBANK identify which customers are inclined to churn voluntarily. ZBANK can then offer them incentive packages or take other preventive actions. Similarly, the churn model can help identify which low-churn-risk home loan applicants to acquire. In data mining terminology, the decision tree model is deployed by using it to score existing customers and new home loan applicants.

Finally, it should be pointed out that, for this illustration, the overall classification accuracy rate of each model is computed in a relatively simple manner. In practice, it is appropriate also to consider the misclassifications and their relative costs, as well as the relative proportions of churners and non-churners in both the sample and the population (see, for example, Koh, 1992).
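To see why misclassification costs matter, the Python sketch below compares two invented classification tables that have the same overall accuracy (88 per cent) but very different total costs, once missing a voluntary churner is assumed to cost ten times as much as a wasted retention offer. The cost figures are assumptions for illustration only, and adjusting for differing churn proportions in the sample and the population is not shown.

```python
# Hypothetical cost matrix COST[actual][predicted], in arbitrary units.
COST = {
    "current": {"current": 0, "invol_chn": 1, "vol_chn": 1},   # unnecessary retention effort
    "invol_chn": {"current": 1, "invol_chn": 0, "vol_chn": 1},
    "vol_chn": {"current": 10, "invol_chn": 5, "vol_chn": 0},  # a missed defector is expensive
}


def accuracy(table):
    """Share of records on the diagonal of table[actual][predicted]."""
    total = sum(sum(row.values()) for row in table.values())
    return sum(table[a][a] for a in table) / total


def total_cost(table, cost=COST):
    """Sum of count x unit cost over every cell of the classification table."""
    return sum(n * cost[a][p] for a, row in table.items() for p, n in row.items())


# Hypothetical validation results for two models (100 customers each).
model_a = {
    "current": {"current": 80, "invol_chn": 0, "vol_chn": 10},
    "invol_chn": {"current": 2, "invol_chn": 3, "vol_chn": 0},
    "vol_chn": {"current": 0, "invol_chn": 0, "vol_chn": 5},
}
model_b = {
    "current": {"current": 85, "invol_chn": 0, "vol_chn": 5},
    "invol_chn": {"current": 2, "invol_chn": 3, "vol_chn": 0},
    "vol_chn": {"current": 5, "invol_chn": 0, "vol_chn": 0},
}

for name, table in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(table), total_cost(table))  # A 0.88 12, then B 0.88 57
```

On accuracy alone the two models tie; on cost, model B is far worse because it misses every voluntary churner.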


In recent years, data mining has gained widespread attention and increasing popularity in the commercial world. Successful data mining applications have been reported at the Farmers Insurance Group, Foote Cone & Belding, Axios Data Analysis Systems, Fingerhut, American Century Investments, Charles Schwab & Company, Chase Manhattan Bank, Bank of America, US West, Bell Atlantic, Alltel, Wal-Mart and Boots PLC. It is thus not surprising that recent surveys have found data mining to be growing in both usage and effectiveness. Professional bodies (see Freedman, 1997; AICPA, 1999) have also identified data mining as an important technology for the twenty-first century.

This paper looks at the potential usefulness of data mining in the banking industry and presents an illustrative application focused on churn modelling. It also discusses the data mining methodology and its tools and summarises the data mining and CRM literature.

Other Potential Data Mining Applications in the Banking Industry

Besides churn modelling, there are other potential data mining applications for banks. For example, data mining can be used to:

(1) construct credit scoring models to assess the credit risk of loan applicants or credit card applicants;

(2) construct fraud detection models to give early warning signals of possible fraudulent transactions;

(3) understand consumers and customers better (for example, via market basket analysis);

(4) segment customers (for example, via clustering), with the findings then used, say, to prepare mail catalogues or to target advertising and promotion campaigns; and

(5) construct models to predict the probability of purchasing certain products or services in order to facilitate cross-selling or up-selling.
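To make the market basket analysis in item (3) concrete, the Python sketch below derives simple "customers who hold product A also hold product B" rules from product holdings, scoring each rule by support (share of customers holding both products) and confidence (share of A-holders who also hold B). It is a brute-force count over product pairs rather than a full association-rule algorithm such as Apriori; the product names, holdings and thresholds are hypothetical. Rules of this kind could also inform the cross-selling in item (5).

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples
    for single-product rules A -> B that meet both thresholds."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)                   # customers holding each product
        pair_counts.update(combinations(items, 2))  # customers holding each pair
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):           # try the rule in both directions
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical product holdings, one set per customer.
holdings = [
    {"savings", "credit_card", "home_loan"},
    {"savings", "credit_card"},
    {"savings", "home_loan", "investment"},
    {"savings", "credit_card", "investment"},
    {"current", "credit_card"},
]
for lhs, rhs, s, c in pair_rules(holdings):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
# credit_card -> savings: support=0.60, confidence=0.75
# home_loan -> savings: support=0.40, confidence=1.00
# investment -> savings: support=0.40, confidence=1.00
# savings -> credit_card: support=0.60, confidence=0.75
```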

Limitations of Data Mining

It is appropriate in this concluding section to highlight some limitations of data mining. First, a sufficiently exhaustive mining of data will certainly throw up patterns of some kind that are merely a product of random fluctuations (Hand, 1998). This is especially so for large data sets with many variables. Hence, many of the interesting and/or statistically significant patterns and relationships found by data mining may not be useful. Second, from a statistical perspective, while data mining is well developed for modelling, it is not as well developed for effect assessment. Murray (1997) and Hand (1998) have warned against using data mining for data dredging or fishing (that is, trawling through data in the hope of identifying patterns) because of the statistical problems involved, notably the multiplicity of tests performed.
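The first limitation can be demonstrated with a small simulation, sketched here in Python under stated assumptions. A random "churn" flag is correlated with many features that are pure noise, so every correlation flagged as significant is a false discovery; at a 5 per cent level, roughly 5 per cent of the noise features are expected to be flagged. The critical value uses a large-sample normal approximation, and the Bonferroni correction shown is one standard remedy for multiplicity, not one the paper prescribes. All sizes and the random seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_patterns(n_customers=300, n_features=400, alpha=0.05, seed=1):
    """Correlate a random churn flag with many random features.

    Every feature is noise, so each 'significant' correlation is a false
    discovery. Returns (hits at alpha, hits after a Bonferroni correction
    for n_features tests)."""
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(n_customers)]
    z = NormalDist()
    # Large-sample approximation: with no true association, r is roughly
    # normal with standard deviation 1 / sqrt(n).
    crit = z.inv_cdf(1 - alpha / 2) / math.sqrt(n_customers)
    crit_bonf = z.inv_cdf(1 - alpha / (2 * n_features)) / math.sqrt(n_customers)
    hits = hits_bonf = 0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_customers)]
        r = abs(pearson_r(feature, target))
        hits += r > crit
        hits_bonf += r > crit_bonf
    return hits, hits_bonf


hits, hits_bonf = count_spurious_patterns()
print(f"{hits} of 400 noise features look significant at 5%; "
      f"{hits_bonf} survive a Bonferroni correction")
```

Around 20 spurious "patterns" are expected in a run like this, even though no feature carries any information about churn.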

Third, successful application of data mining requires the user to be knowledgeable both in the domain area of application and in the data mining methodology and tools. Without sufficient knowledge of data mining, the user may be unaware of, or unable to avoid, its pitfalls (see, for example, McQueen and Thorley, 1999). Collectively, the data mining team should possess domain knowledge, statistical and research expertise, and IT and data mining knowledge and skills. Finally, businesses developing data mining applications also need to invest substantial resources (that is, time and effort) in data mining. It should be borne in mind that data mining projects can fail for a variety of reasons, for example lack of management support, unrealistic user expectations, poor project management or inadequate data mining expertise.

To conclude, there is no doubt that data mining is potentially useful in the banking industry. It is envisaged that a bank that can realise the potential of data mining to transform raw data into valuable information will gain an important strategic advantage and a competitive edge over its rivals.


American Institute of Certified Public Accountants (AICPA) (1999). "Top 10 technologies--plus 5 for tomorrow". Journal of Accountancy, 187(5): 16-17.

Berger C (1999). "Data mining to reduce churn". Target Marketing, 22(8): 26-28.

Berry MJA and GS Linoff (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, Inc.: New York, New York.

Brabazon T (1997). "Data mining: A new source of competitive advantage?" Accountancy Ireland, 29(3): 30-31.

Chin J (2000). "It's important to do it well". Straits Times--Computer Times, 8 Nov, 2000: 14-16.

Chung HM and P Gray (1999). "Data mining". Journal of Management Information Systems, 16(1): 11-13.

Coyle T (1999). "Finding your best customers". America's Community Banker, 8(9): 26-29.

Davis B (1999). "Data mining transformed". InformationWeek, 751: 86-88.

Decker P (1998). "Data mining's hidden dangers". Banking Strategies, 74(2): 6-14.

Fabris P (1998). "Advanced Navigation". CIO, 11(15): 50-55.

Freedman J (1997). "IIA announces 1997 research priorities". Management Accounting, 78(1): 65-66.

Hand DJ (1998). "Data mining: Statistics and more?" The American Statistician, 52(2): 112-118.

Jenkins D (1999). "Customer relationship management and the data warehouse". Call Center Solutions, 18(2): 88-92.

Kiesnoski K (1999). "Customer relationship management". Bank Systems & Technology, 36(2): 30-34.

Koh HC (1992). "The sensitivity of optimal cut-off points to misclassification costs of Type I and Type II errors in the going-concern prediction context". Journal of Business Finance & Accounting, 19(2): 187-197.

Koh HC and SK Leong (2001). "Data Mining Applications in the Context of Casemix". Annals, Academy of Medicine (Singapore), 30(4, Supplement): 41-49.

Koh HC and CK Low (2001). "Using data mining in insurance companies". Singapore International Insurance and Actuarial Journal, 4(2): 51-62.

Kuykendall L (1999). "The data-mining toolbox". Credit Card Management, 12(6): 30-40.

Lach J (1999). "Data mining digs in". American Demographics, 21(7): 38-45.

McQueen G and S Thorley (1999). "Mining fool's gold". Financial Analysts Journal, 55(2): 61-72.

Murray LR (1997). "Lies, damned lies and more statistics: The neglected issue of multiplicity in accounting research". Accounting and Business Research, 27(3): 243-258.

Peacock PR (1998). "Data mining in marketing: Part 1". Marketing Management, 6(4): 8-18.

SAS Institute (1998). From Data to Business Advantages: Data Mining, The SEMMA Methodology and SAS Software. SAS Institute: Cary, North Carolina.

Sargeant A and J McKenzie (1999). "The lifetime value of donors: Gaining insight through CHAID". Fund Raising Management, 30(1): 22-27.

Schober D (1999). "Data detectives". Telephony, 237(9): 20-24.

Stedman C (1998). "Data mining despite the dangers". Computerworld, 32(1): 61-62.

Trybula WJ (1997). "Data mining and knowledge discovery". Annual Review of Information Science and Technology, 32: 197-229.
COPYRIGHT 2002 Singapore Institute of Management

Article Details
Author: Leong Gerry, Chan Kin
Publication: Singapore Management Review
Geographic Code: 1USA
Date: Jul 1, 2002