Using Data Mining and Warehousing For Knowledge Discovery

Throughout the 1980s and early 1990s, major corporations adopted Business Intelligence (BI) tools such as report writers and spreadsheets. The exponential increase in information, primarily due to the electronic capture of data and its storage in vast data warehouses, has reduced the benefit of these BI tools dramatically. These systems, although essential for monitoring and planning, are unable to cope with the volumes of data or the sophisticated analysis required for strategic decision making. Real strategic value comes from understanding customer behavior and being able to model alternative actions. The knowledge required to anticipate behavior could be discovered from many users running several traditional queries against warehouses of data, but that supposes the questions are known and the time is available to complete the analysis.

Knowledge Discovery is a powerful new solution to information overload. It enables an organization to better understand the business process at work by searching automatically through huge amounts of data, looking for patterns of events, and presenting these to the business in an easy-to-understand graphical form. These systems are tireless, they do not forget, they free up skilled human resources, and they find answers to important questions that users may never have asked. Organizations need these new solutions if they are to remain competitive.
According to Eight Trends in IT, in the Gartner Research Review from 1998, trend one is: "From Data to Decisions. Rather than using IT as a means to collect and present data for users to make decisions, technology will continue to automate more of the burden of the decision making process itself (e.g., through data mining, expert systems, and agents)." A clear and fast growing market for Knowledge Discovery solutions exists. The Meta Group has calculated the size of the Knowledge Discovery market, forecasting a market with a compound growth of 40 percent to reach $800 million in the year 2000.

What Is Knowledge Discovery?

Knowledge Discovery is about understanding a business. It is a process that solves business problems by analyzing the data to identify patterns and relationships that can explain and predict behavior. The process is well defined and can be broken into six phases, as shown in Fig 1. The process is cyclical and iterative, with the results of one exercise driving requirements for further analysis.
The Knowledge Discovery process should not be confused with data mining, which is the application of specific algorithms for finding patterns in data. Although data mining can be considered the key step in the overall Knowledge Discovery process ("modeling" in Fig 1), the other stages are essential in ensuring that knowledge is successfully extracted from the data. It should be possible to use the identified knowledge to, for example:

* Make predictions about new data;
* Identify and explain hidden patterns and trends in existing data;
* Summarize the contents of large databases to facilitate understanding and decision making.

The Knowledge Discovery Process

Implementing Knowledge Discovery as a process within an organization has clear benefits. It creates a common framework that is understood by all involved in decision making, which enables Knowledge Discovery exercises to be an important part of the ongoing management process.

Business Understanding

This initial phase focuses on understanding the objectives of the project or exercise from a business perspective. These objectives have to be converted into a Knowledge Discovery problem definition, so that a preliminary plan can be designed to achieve them.

Data Understanding

Starting with initial data collection, the organization must identify what data is required, where it can be found, what format it is in, and what external sources of any missing data are available. This stage gains first insights into the data and must identify and find solutions to any data quality issues.
Data quality should not be a problem if the source is a data warehouse, as the data should have been "cleaned" before being loaded into it.

Data Preparation

This is the collection, cleaning, and transformation of the data to get it ready for the modeling phase. These tasks are likely to be performed multiple times and not in any prescribed order.

Modeling

Various modeling techniques can be applied, and it is essential that the Knowledge Discovery tool allows the best technique for the specific problem to be applied. Some techniques require certain forms of data, so stepping back to data preparation is often required. This stage is often referred to as "data mining."

Evaluation

The results of the modeling exercise need to be reviewed. The user has to confirm that the model built solves the original problem. If not, they must identify what has been missed. At the end of the evaluation phase, a decision should be reached on how to use the results within the business.

Deployment

The knowledge gained needs to be organized and presented in a way the business can use in its operations. The deployment may be as simple as a report, or it may require the generated model to be embedded in the end user applications.

Knowledge Discovery and data warehousing are complementary solutions to the organization's demand for information. An existing data warehouse provides a rich source of data for Knowledge Discovery, but this will still probably need to be augmented by data provided by operational systems and external sources.
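The six phases and the step-backs between them can be sketched as a small state machine. This is only an illustration: the names `Phase` and `next_phase` are hypothetical, and the choice to return to business understanding when evaluation fails is an assumption (the text says only that the user must identify what has been missed).

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = "business understanding"
    DATA_UNDERSTANDING = "data understanding"
    DATA_PREPARATION = "data preparation"
    MODELING = "modeling"
    EVALUATION = "evaluation"
    DEPLOYMENT = "deployment"


# Enum members iterate in definition order, which is the nominal phase order.
ORDER = list(Phase)


def next_phase(current, needs_different_data=False, problem_solved=True):
    """Return the phase that follows `current`, or None once deployment is done."""
    # Some modeling techniques need the data in another form: step back.
    if current is Phase.MODELING and needs_different_data:
        return Phase.DATA_PREPARATION
    # Assumption: a model that misses the business problem restarts the cycle.
    if current is Phase.EVALUATION and not problem_solved:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return None
    return ORDER[ORDER.index(current) + 1]
```

For example, `next_phase(Phase.MODELING, needs_different_data=True)` returns `Phase.DATA_PREPARATION`, which mirrors the loop between preparation and modeling described above.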
The need for data warehousing was driven by the requirement for organizations to better understand the data that they already had in their different transaction processing systems, as well as to gain a competitive edge by making more effective use of it. The need to integrate customer data held in different transaction processing systems for decision support drove the requirement to centralize the data in a form that was easy for the business to understand.

Data warehouse volumes now commonly exceed 100GB, and the number of systems greater than 1TB is growing rapidly. OLAP systems commonly handle 10-20GB of data, with many at 100GB. As volumes increase, the number of possible permutations of data relationships grows exponentially. The volumes become too great for users to explore and analyze, so potentially important patterns and relationships are missed.

A data warehouse can provide much information, but the considerable investment made in it only yields a return if users effectively exploit it to create a competitive advantage. Traditional queries will help to control a company, but offer little to identify opportunities, which is why Knowledge Discovery is required.
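The remark about relationships growing exponentially can be made concrete with a short calculation. The sketch below assumes one particular reading of "data relationships": every group of two or more attributes (columns) that an analyst could examine together. The function name `candidate_relationships` is hypothetical.

```python
from math import comb


def candidate_relationships(num_attributes, max_size=None):
    """Count the attribute groups of size 2..max_size that could be examined together."""
    top = num_attributes if max_size is None else min(max_size, num_attributes)
    return sum(comb(num_attributes, k) for k in range(2, top + 1))


# Pairs only: grows quadratically with the number of attributes.
print(candidate_relationships(10, max_size=2))   # 45
# Groups of any size: grows as 2**n, so doubling the attributes squares the work.
print(candidate_relationships(10))               # 1013
print(candidate_relationships(20))               # 1048555
```

Under this assumption, going from 10 to 20 attributes raises the number of candidate groupings from about a thousand to over a million, which is more than users can explore with hand-written queries.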
The questions below demonstrate the difference between what a data warehouse or OLAP tool can answer and what Knowledge Discovery can solve. They are complementary, and while Knowledge Discovery should be part of every organization's data warehouse strategy, it can be used independently with great effect.

Typical OLAP versus Knowledge Discovery questions:

1. Which customers spent most with us last year? [OLAP]
   Which customers should we target with our next promotion? [Knowledge Discovery]
2. How many customers closed their accounts in the previous six months compared to the same period last year? [OLAP]
   Which customers will switch their accounts to a competitor in the next six months? [Knowledge Discovery]
3. Which stores failed to meet target last month? [OLAP]
   What is the optimum size and location of our next store? [Knowledge Discovery]
4. What were our top five selling products by revenue? [OLAP]
   Which additional products are most likely to be sold with a specific purchase? [Knowledge Discovery]
5. How much did we lose on failed loans in the last year? [OLAP]
   Which customers are most likely to default on a loan? [Knowledge Discovery]

Knowledge Discovery is the natural evolution of the reporting and OLAP systems deployed over the last ten years. OVUM summarized the relationship as shown in the chart (Fig 2).

Knowledge Discovery tools can analyze the same operational or data warehouse data that populates an OLAP system. It should be borne in mind that Knowledge Discovery processes often require data preparation specific to the form of algorithm to be used, and that these data preparation operations will be needed on both operational and data warehouse sources (Fig 3).
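The contrast between the two kinds of question listed above can be sketched in a few lines. The data, the customer names, and the function names here are all hypothetical, and the single-threshold rule is a deliberately minimal stand-in for a real modeling technique such as a decision tree: an OLAP-style question summarizes what has already happened, while a Knowledge Discovery question learns a rule from history and applies it to new cases.

```python
from collections import defaultdict

# Hypothetical toy data: (customer, amount spent) and (complaints filed, account closed?).
SALES = [("ann", 120.0), ("bob", 80.0), ("ann", 60.0), ("cat", 200.0)]
CHURN_HISTORY = [(0, False), (1, False), (2, False), (3, True), (4, True),
                 (1, False), (5, True), (2, False), (0, True)]


def top_spender(sales):
    """OLAP-style question: aggregate past facts and report the largest total."""
    totals = defaultdict(float)
    for customer, amount in sales:
        totals[customer] += amount
    return max(totals, key=totals.get)


def learn_complaint_threshold(history):
    """Knowledge Discovery-style step: find the complaint count that best separates
    customers who closed their accounts from those who stayed."""
    def correct(threshold):
        return sum((complaints >= threshold) == closed for complaints, closed in history)

    candidates = sorted({complaints for complaints, _ in history})
    return max(candidates, key=correct)


def at_risk(current_customers, threshold):
    """Apply the learned rule to customers whose outcome is not yet known."""
    return sorted(name for name, complaints in current_customers.items()
                  if complaints >= threshold)


threshold = learn_complaint_threshold(CHURN_HISTORY)
print(top_spender(SALES))                        # cat
print(threshold)                                 # 3
print(at_risk({"fay": 4, "gus": 1}, threshold))  # ['fay']
```

The first function could be written as a single query against a warehouse. The second and third need a model fitted to historical outcomes, which is why the two approaches complement rather than replace each other.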
On very large data warehouses, Knowledge Discovery may be employed to select the information required for further OLAP analysis, as it may not be feasible to load all of the original data into the OLAP system or even to know which information is required.

Creating a competitive edge is the goal of all organizations employing Knowledge Discovery for decision support. They need to constantly seek information that will enable better decisions that in turn generate greater revenues, reduce costs, or increase product quality and customer service. Knowledge Discovery provides unique benefits over alternative decision support techniques, as it uncovers relationships and rules, not just data. These hidden relationships and rules exist empirically in the data because they have been derived from the way the business and its market work. Extracting and acting on them helps to create a series of best practices that offer huge potential benefits.

Knowledge Discovery and data mining have been employed across many industry sectors; the table identifies some of the information obtained.

Using Knowledge Discovery as a tool to unearth the knowledge vital for running a business is only the first step. With the massive increase in data being collected and the demands of a new breed of Intelligent Applications such as customer relationship management, demand planning, and predictive forecasting, the OLAP and Knowledge Discovery technologies must come together to provide a high performance and feature rich Intelligent Application Server.
The way to process millions of records is to have the Knowledge Discovery algorithms as close as possible to, if not totally integrated within, the OLAP data storage. To guarantee the response necessary for real-time mining, the decision tree inquiry needs to be an integral part of the OLAP function, just as pivot and drill-down are today.

Knowledge Discovery harnesses the latest advances in information technology to offer a new level of analysis previously available only to a few statistical specialists. The example application areas represent only the beginning of the potential opportunities that Knowledge Discovery technology presents to the integrators who can provide such rich informational systems.

Ian Rawlings is managing director of K-wiz Solutions Inc. (Boston, MA).

Application area    What Knowledge Discovery can determine
Marketing           What other products can I sell to existing customers?
Insurance           Which customers are likely to have claims next year?
Sales               Which car fits the requirements of this customer best?
Personnel           What are the characteristics of a successful salesperson?
Medical             Who are good candidates for a certain medical procedure?
Fraud Detection     Which trades are based upon insider information?
Customer Care       Which customers are going to leave for a competitor?

Source: Codd and Date, 1997