Using Data Mining and Warehousing for Knowledge Discovery
Real strategic value comes from understanding customer behavior and being able to model alternative actions. The knowledge required to anticipate behavior could be discovered by many users running numerous traditional queries against data warehouses, but that presupposes that the right questions are known in advance and that there is time to complete the analysis.
Knowledge Discovery is a powerful new solution to information overload. It enables an organization to better understand the business processes at work by automatically searching through huge amounts of data for patterns of events and presenting them to the business in an easy-to-understand graphical form. These systems are tireless and do not forget; they free up skilled staff and can find answers to important questions that users may never have thought to ask.
Organizations need these new solutions if they are to remain competitive. According to "Eight Trends in IT," published in the Gartner Research Review in 1998, trend one is: "From Data to Decisions. Rather than using IT as a means to collect and present data for users to make decisions, technology will continue to automate more of the burden of the decision making process itself (e.g., through data mining, expert systems, and agents)."
A clear and fast-growing market for Knowledge Discovery solutions exists. The Meta Group has estimated the size of the Knowledge Discovery market, forecasting compound growth of 40 percent to reach $800 million in the year 2000.
What Is Knowledge Discovery?
Knowledge Discovery is about understanding a business. It is a process that solves business problems by analyzing the data to identify patterns and relationships that can explain and predict behavior.
The process is well defined and can be broken into six phases, as shown in Fig 1. It is cyclical and iterative, with the results of one exercise driving requirements for further analysis. The Knowledge Discovery process should not be confused with data mining, which is the application of specific algorithms for finding patterns in data. Although data mining can be considered the key step in the overall Knowledge Discovery process ("modeling" in the figure), the other stages are essential to ensure that knowledge is successfully extracted from the data.
It should be possible to use the identified knowledge to, for example:
* Make predictions about new data;
* Identify and explain hidden patterns and trends in existing data;
* Summarize the contents of large databases to facilitate understanding and decision making.
The Knowledge Discovery Process
Implementing Knowledge Discovery as a process within an organization has clear benefits. It creates a common framework understood by everyone involved in decision making, which allows Knowledge Discovery exercises to become an integral part of the ongoing management process.
The first phase, business understanding, focuses on understanding the objectives of the project or exercise from a business perspective. These objectives must then be converted into a Knowledge Discovery problem definition, so that a preliminary plan can be designed to achieve them.
The data understanding phase starts with initial data collection: the organization must identify what data is required, where it can be found, what format it is in, and what external sources are available for any missing data. This phase provides first insights into the data and must identify, and find solutions to, any data quality issues. Data quality should not be a problem if the source is a data warehouse, as the data should have been "cleaned" before being loaded into it.
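Data quality checks of this kind can start very simply. The Python sketch below is a minimal illustration, not a prescribed method: it counts missing values per field and flags duplicate keys in a handful of customer records. The field names (`customer_id`, `region`, `annual_spend`) and the helper `profile_records` are hypothetical.

```python
from collections import Counter

def profile_records(records, key):
    """Report simple data quality issues in a list of dict records (hypothetical helper)."""
    fields = sorted({f for r in records for f in r})
    # Count records where each field is absent or explicitly None.
    missing = {f: sum(1 for r in records if r.get(f) is None) for f in fields}
    # Find key values that occur in more than one record.
    key_counts = Counter(r.get(key) for r in records)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {"rows": len(records), "missing": missing, "duplicate_keys": duplicates}

# Hypothetical extract from an operational system.
records = [
    {"customer_id": 1, "region": "East", "annual_spend": 1200.0},
    {"customer_id": 2, "region": None, "annual_spend": 310.5},
    {"customer_id": 2, "region": "West", "annual_spend": None},
    {"customer_id": 3, "region": "East"},
]
print(profile_records(records, key="customer_id"))
```

A real profile would add range and format checks, but even these two counts show where the data preparation phase will need to intervene.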
Data preparation covers the collection, cleaning, and transformation of the data to get it ready for the modeling phase. These tasks are likely to be performed multiple times and in no prescribed order.
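As a sketch of what cleaning and transformation can involve, the following Python fills missing numeric values with the column median, scales numeric fields to the range 0 to 1, and one-hot encodes categorical fields. These are common choices assumed here for illustration rather than steps the article prescribes, and the helper `prepare` is hypothetical.

```python
from statistics import median

def prepare(records, numeric, categorical):
    """Clean and transform dict records into numeric feature rows (illustrative only)."""
    # Cleaning: replace missing numeric values with the column median.
    medians = {f: median(r[f] for r in records if r.get(f) is not None) for f in numeric}
    cleaned = [{**r, **{f: r[f] if r.get(f) is not None else medians[f] for f in numeric}}
               for r in records]
    # Transformation: min-max scale each numeric field to [0, 1].
    bounds = {f: (min(r[f] for r in cleaned), max(r[f] for r in cleaned)) for f in numeric}
    # Transformation: one-hot encode categorical fields; missing or empty values become "UNKNOWN".
    levels = {f: sorted({r.get(f) or "UNKNOWN" for r in cleaned}) for f in categorical}
    rows = []
    for r in cleaned:
        row = []
        for f in numeric:
            lo, hi = bounds[f]
            row.append((r[f] - lo) / (hi - lo) if hi > lo else 0.0)
        for f in categorical:
            value = r.get(f) or "UNKNOWN"
            row.extend(1.0 if value == level else 0.0 for level in levels[f])
        rows.append(row)
    return rows

records = [
    {"region": "East", "annual_spend": 1200.0},
    {"region": None, "annual_spend": 300.0},
    {"region": "West", "annual_spend": None},
]
for row in prepare(records, numeric=["annual_spend"], categorical=["region"]):
    print(row)
```

Each output row has one column per numeric field followed by one column per category level, a layout many modeling techniques expect.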
In the modeling phase, various techniques can be applied, and it is essential that the Knowledge Discovery tool allows the best technique for the specific problem to be used. Some techniques require data in particular forms, so stepping back to data preparation is often necessary. This stage is often referred to as "data mining."
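For instance, a technique that accepts only categorical inputs may need a numeric field such as annual spend to be binned first, which sends the analyst back to data preparation. A minimal equal-width binning sketch follows; the function name and bin count are illustrative assumptions, and equal-width binning is only one of several possible schemes.

```python
def equal_width_bins(values, n_bins):
    """Map numeric values to bin indices 0..n_bins-1 of equal width (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero when all values are equal
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

spend = [120.0, 450.0, 980.0, 300.0, 1500.0]
print(equal_width_bins(spend, 3))
```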
In the evaluation phase, the results of the modeling exercise are reviewed. The user has to confirm that the model built solves the original business problem and, if it does not, identify what has been missed. At the end of this phase, a decision should be reached on how to use the results within the business.
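One simple way to check a model during evaluation is to score it on records held back from modeling and compare it with a trivial baseline. The sketch below assumes a hypothetical churn rule and hold-out set; accuracy is just one possible measure, and the right test depends on the original business objective.

```python
from collections import Counter

def accuracy(predict, examples):
    """Fraction of held-out (features, label) examples the model labels correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)

def majority_baseline(train_labels):
    """Trivial model that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda features: most_common

# Hypothetical rule produced by the modeling phase.
def churn_rule(features):
    return "leave" if features["complaints"] >= 2 else "stay"

train_labels = ["stay", "stay", "stay", "leave"]
holdout = [
    ({"complaints": 3}, "leave"),
    ({"complaints": 0}, "stay"),
    ({"complaints": 2}, "leave"),
    ({"complaints": 1}, "leave"),
    ({"complaints": 0}, "stay"),
]
model_acc = accuracy(churn_rule, holdout)
baseline_acc = accuracy(majority_baseline(train_labels), holdout)
print(f"model {model_acc:.2f} vs baseline {baseline_acc:.2f}")
```

If the model cannot clearly beat the baseline, that is a signal to revisit what has been missed before any decision on deployment.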
Finally, in the deployment phase, the knowledge gained is organized and presented in a way the business can use in its operations. Deployment may be as simple as a report, or it may require the generated model to be embedded in end-user applications.
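The cyclical, iterative flow of the six phases described above can be sketched as a small state machine. This is an illustration under stated assumptions: the phase names follow the widely used CRISP-DM model, which this six-phase description closely resembles, and the step-back rules (modeling back to data preparation, a failed evaluation back to the business objectives, deployment feeding new questions) are one hypothetical reading of the text.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6

# Assumed step-back rules: modeling may need data in another form, and a
# model that fails evaluation sends the team back to the business objectives.
STEP_BACK = {
    Phase.MODELING: Phase.DATA_PREPARATION,
    Phase.EVALUATION: Phase.BUSINESS_UNDERSTANDING,
}

def next_phase(current, succeeded):
    """Return the phase to run after `current` (hypothetical transition rule)."""
    if current is Phase.DEPLOYMENT:
        # The cycle restarts: deployed results drive new business questions.
        return Phase.BUSINESS_UNDERSTANDING
    if not succeeded:
        # Step back where the text describes a loop; otherwise repeat the phase.
        return STEP_BACK.get(current, current)
    return Phase(current.value + 1)

def trace(outcomes):
    """List the phases visited for a sequence of phase outcomes (True = succeeded)."""
    phase = Phase.BUSINESS_UNDERSTANDING
    visited = [phase]
    for ok in outcomes:
        phase = next_phase(phase, ok)
        visited.append(phase)
    return [p.name for p in visited]

# Modeling fails once because the technique needs differently prepared data.
print(trace([True, True, True, False, True, True, True]))
```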
Knowledge Discovery and data warehousing are complementary solutions to the organization's demand for information. An existing data warehouse provides a rich source of data for Knowledge Discovery, although it will probably still need to be augmented with data from operational systems and external sources.
The need for data warehousing was driven by organizations' requirement to better understand the data they already held in their various transaction processing systems, and to gain a competitive edge by using it more effectively. In particular, the need to integrate customer data from different transaction processing systems for decision support drove the requirement to centralize that data in a form the business could easily understand.
Data warehouse volumes now commonly exceed 100 GB, and the number of systems larger than 1 TB is growing rapidly. OLAP systems commonly handle 10 to 20 GB of data, with many at 100 GB. As volumes increase, the number of possible combinations of data relationships grows exponentially. The volumes become too great for users to explore and analyze, so potentially important patterns and relationships are missed.
A data warehouse can provide a great deal of information, but the considerable investment made in it yields a return only if users exploit it effectively to create a competitive advantage. Traditional queries help a company control its business but offer little help in identifying opportunities, which is why Knowledge Discovery is required.
The questions below demonstrate the difference between the questions a data warehouse or OLAP tool can answer and those that Knowledge Discovery can answer. The two are complementary, and while Knowledge Discovery should be part of every organization's data warehouse strategy, it can also be used independently to great effect.
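A rough count shows why manual exploration breaks down. Assuming, for illustration, that each candidate relationship is a combination of attributes a user might cross-tabulate, a table with n attributes has n(n-1)/2 pairs but 2^n - 1 non-empty combinations. The growth in this simplified count comes from the number of attributes, so the sketch assumes that larger warehouses also carry more attributes.

```python
from math import comb

def attribute_pairs(n_attributes):
    """Number of two-attribute relationships: C(n, 2)."""
    return comb(n_attributes, 2)

def attribute_combinations(n_attributes):
    """Number of non-empty attribute subsets: sum of C(n, k) for k = 1..n, which equals 2**n - 1."""
    return sum(comb(n_attributes, k) for k in range(1, n_attributes + 1))

for n in (10, 20, 30, 50):
    print(f"{n} attributes: {attribute_pairs(n):,} pairs, {attribute_combinations(n):,} combinations")
```

Going from 10 to 20 attributes roughly quadruples the pairs (45 to 190) but multiplies the full set of combinations by about a thousand (1,023 to 1,048,575).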
Typical OLAP versus Knowledge Discovery questions:
1. Which customers spent most with us last year? [OLAP]
Which customers should we target with our next promotion? [Knowledge Discovery]
2. How many customers closed their accounts in the previous six months compared to the same period last year? [OLAP]
Which customers will switch their accounts to a competitor in the next six months? [Knowledge Discovery]
3. Which stores failed to meet target last month? [OLAP]
What is the optimum size and location of our next store? [Knowledge Discovery]
4. What were our top five selling products by revenue? [OLAP]
Which additional products are most likely to be sold with a specific purchase? [Knowledge Discovery]
5. How much did we lose on failed loans in the last year? [OLAP]
Which customers are most likely to default on a loan? [Knowledge Discovery]
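Question 4 illustrates the difference in kind. The OLAP question is an aggregation, while the Knowledge Discovery question asks how often products occur together. The sketch below, using hypothetical baskets and helper names, computes both: total revenue per product, and a simple single-product form of association-rule confidence (the share of baskets containing a given product that also contain each other product). Full association-rule miners such as Apriori generalize this to larger itemsets and add a minimum-support threshold.

```python
from collections import Counter

# Hypothetical transactions: each basket is a list of (product, price) pairs.
baskets = [
    [("bread", 2.0), ("butter", 3.0), ("milk", 1.5)],
    [("bread", 2.0), ("butter", 3.0)],
    [("milk", 1.5), ("cereal", 4.0)],
    [("bread", 2.0), ("jam", 3.5)],
    [("bread", 2.0), ("butter", 3.0), ("jam", 3.5)],
]

def top_products_by_revenue(baskets, n):
    """OLAP-style question: aggregate revenue per product and rank it."""
    revenue = Counter()
    for basket in baskets:
        for product, price in basket:
            revenue[product] += price
    return revenue.most_common(n)

def co_purchase_rules(baskets, antecedent, min_confidence):
    """Knowledge Discovery-style question: which products tend to be bought with `antecedent`?
    Confidence = fraction of baskets containing `antecedent` that also contain the product."""
    item_sets = [{product for product, _ in basket} for basket in baskets]
    with_antecedent = [items for items in item_sets if antecedent in items]
    counts = Counter(p for items in with_antecedent for p in items if p != antecedent)
    rules = [(p, c / len(with_antecedent)) for p, c in counts.items()]
    # Highest confidence first; ties broken alphabetically for a stable result.
    return sorted((r for r in rules if r[1] >= min_confidence), key=lambda r: (-r[1], r[0]))

print(top_products_by_revenue(baskets, 2))
print(co_purchase_rules(baskets, "bread", min_confidence=0.5))
```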
Knowledge Discovery is the natural evolution of the reporting and OLAP systems deployed over the last ten years. OVUM summarized the relationship as shown in the chart (Fig 2).
Knowledge Discovery tools can analyze the same operational or data warehouse data that populates an OLAP system. Bear in mind, however, that Knowledge Discovery processes often require data preparation specific to the algorithm to be used, and that these preparation operations will be needed on both operational and data warehouse sources (Fig 3). On very large data warehouses, Knowledge Discovery may be employed to select the information required for further OLAP analysis, as it may not be feasible to load all of the original data into the OLAP system, or even to know which information is required.
Creating a competitive edge is the goal of all organizations employing Knowledge Discovery for decision support. They must constantly seek information that enables better decisions, which in turn increase revenues, reduce costs, or improve product quality and customer service. Knowledge Discovery provides unique benefits over alternative decision support techniques, as it uncovers relationships and rules, not just data. These hidden relationships and rules exist empirically in the data because they arise from the way the business and its market work.
Extracting and acting on them helps to create a series of best practices that offer huge potential benefits. Knowledge Discovery and data mining have been employed across many industry sectors; the table identifies some of the information obtained.
Using Knowledge Discovery as a tool to unearth the knowledge vital for running a business is only the first step. With the massive increase in data being collected and the demands of a new breed of intelligent applications such as customer relationship management, demand planning, and predictive forecasting, OLAP and Knowledge Discovery technologies must come together to provide a high-performance, feature-rich Intelligent Application Server.
To process millions of records efficiently, the Knowledge Discovery algorithms should be placed as close as possible to the OLAP data storage, if not fully integrated within it. To guarantee the response times needed for real-time mining, decision tree induction needs to be an integral part of the OLAP function, just as pivot and drill-down are today.
Knowledge Discovery harnesses the latest advances in information technology to offer a level of analysis previously available only to a few statistical specialists. The example application areas in the table represent only the beginning of the opportunities that Knowledge Discovery technology presents to integrators who can provide such rich information systems.
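One reason tight integration is plausible: a decision tree chooses each split by scoring candidate attributes, and with an information-gain criterion the score needs only per-value class counts, which is exactly the kind of aggregate a pivot over an OLAP cube returns. The sketch below assumes the ID3-style information-gain criterion (the article does not name an algorithm); the pivot counts and attribute names are hypothetical.

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy in bits of a class distribution given as counts."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c)

def information_gain(pivot):
    """Gain from splitting on an attribute, given its pivot table
    {attribute_value: [count_class_0, count_class_1, ...]}."""
    totals = [sum(col) for col in zip(*pivot.values())]
    n = sum(totals)
    # Weighted entropy remaining after the split.
    remainder = sum(sum(row) / n * entropy(row) for row in pivot.values())
    return entropy(totals) - remainder

# Hypothetical pivot results: customers who [stayed, left], by attribute value.
pivots = {
    "contract_type": {"monthly": [30, 30], "annual": [60, 0]},
    "region": {"East": [45, 15], "West": [45, 15]},
}
for attribute, pivot in pivots.items():
    print(attribute, round(information_gain(pivot), 3))
best = max(pivots, key=lambda a: information_gain(pivots[a]))
print("split on:", best)
```

Because only the counts are needed, the engine can score every candidate split from cube aggregates without re-reading the underlying records.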
Ian Rawlings is managing director of K-wiz Solutions Inc. (Boston, MA).
Application area | What Knowledge Discovery can determine
Marketing | What other products can I sell to existing customers?
Insurance | Which customers are likely to have claims next year?
Sales | Which car fits the requirements of this customer best?
Personnel | What are the characteristics of a successful salesperson?
Medical | Who are good candidates for a certain medical procedure?
Fraud Detection | Which trades are based upon insider information?
Customer Care | Which customers are going to leave for a competitor?
Source: Codd and Date, 1997
Publication: Computer Technology Review
Date: Sep 1, 1999