By Barry Keating
Data mining is a way to gain market intelligence from a huge amount of data ... the problem today is not the lack of data, but how to learn from it ... in data mining, the data tell the story, but it is up to you how to use that information.
stories, or any other kind); and so on. With this type of information available, decision makers will make better choices. Human resource staff will hire the right individuals. Credit departments will target prospective customers who are less likely to become delinquent and/or to engage in fraudulent activity.
Direct marketers will target those customers who are more likely to purchase their products. With the insight gained from data mining, businesses may wish to reconfigure their product offerings and/or emphasize specific features of a product. These are not the only uses of data mining. Police use this tool to determine when and where a crime is likely to occur, and what the nature of that crime would be. Organized stock exchanges detect fraudulent activities with data mining. Pharmaceutical companies mine data to predict the efficacy of compounds as well as to uncover new chemical entities that may be useful for a particular disease. The airline industry uses it to predict which flights are likely to be delayed (well before the flight is scheduled to depart). Weather analysts use data mining to identify weather patterns and predict when there will be rain, sunshine, a hurricane, or snow. Nonprofit organizations use data mining to predict the likelihood of individuals making a donation to a certain cause. The uses of data mining are far reaching, and its benefits may be quite significant.
Data mining is used to search the mounds of data collected over time for valuable information that can be used in decision making. The information may take the form of patterns and/or relationships that exist in the data.
With data mining, a retail store may find that certain products sell more in one channel of distribution than in others; certain products are sold together; certain products sell more in one geographical location than in others; and certain products sell when a certain event occurs. Wal-Mart, for example, has found that the sales of beer increase when a hurricane is imminent. This means that it has to hold more than the usual supply of beer when a hurricane is expected. With data mining, a financial analyst would like to know the characteristics of a company that is becoming insolvent; human resource managers would like to know the characteristics of a successful prospective employee; credit card departments would like to know which potential customers are more likely to pay back their debt and, when a credit card is swiped, which transaction is fraudulent and which one is legitimate; direct marketers would like to know which customers purchase which types of products; booksellers like Amazon would like to know which customers purchase which types of books (fiction, detective
BARRY KEATING Dr. Keating is the Jesse H. Jones Professor of Business Economics at the University of Notre Dame. He specializes in understanding how not-for-profit organizations function; more specifically, how they respond to incentives, changes in revenue and cost conditions, and changes in regulatory mechanisms. He is widely published and is the co-author of the book Business Forecasting, published by McGraw-Hill. He was a Heritage Foundation Fellow (1992-1996), is a Heartland Institute Research Fellow, and serves on the Board of Advisors of both the Indiana Policy Review Group and the Institute of Business Forecasting.
patterns in early election returns, in global temperature changes, and in the sales data of new and mature products. Over the last 25 years or so, there has been a gradual evolution from data processing to data mining. In the 1960s, businesses routinely collected data and processed it using database management techniques that allowed an orderly listing and tabulation of the data as well as some query activity. On-line Transaction Processing (OLTP) became routine, data retrieval from stored data became faster and more efficient because of the availability of new and better storage devices, and data processing became quicker and more efficient because of advancements in computer technology. Database management advanced rapidly to include highly sophisticated query systems, and became popular not only in business applications but also in scientific inquiries. Databases began to grow at previously unheard-of rates. The amount of data in all of the world's databases is now estimated to double in less than two years.
Businesses currently deploy what we call data warehouses and data marts. A "data warehouse" is a firm's repository of historic data, containing information on every relevant activity that occurred in the past. A "data mart," on the other hand, is a subset of a data warehouse. It holds some special information, or information that has been grouped to help businesses make better decisions. The data in a data mart are usually derived from a data warehouse. The first organized use of such large databases started with Online Analytical Processing (OLAP). Data mining tools use and analyze the data that exist in databases, data marts, and data warehouses.
Researchers have been doing data mining for a long time, though they called it by different names. Some called it Exploratory Data Analysis; others called it Business Intelligence, Data Driven Discovery, Deductive Learning, Discovery Science, or Knowledge Discovery in Databases (KDD).
story rather than impose a model on the data that we feel will replicate the actual patterns. Perhaps the most common misconception about data mining is that it will automatically extract all the valuable information embedded in a database without any intervention on the part of the researcher. In fact, every large database contains numerous sets of patterns, which may very well be as many in number as the items in the database itself. But most of those patterns could be irrelevant to the researcher's task. So the researcher, before he or she starts the data mining process, sets goals and research parameters. This way he or she will eliminate many patterns that are irrelevant to the task and concentrate on the ones that are important and pertinent. As in traditional statistical forecasting, the researcher remains an important part of the analysis. Data mining usually uses very large datasets, oftentimes far larger than the datasets used in business forecasting. The tools also differ: you may well be familiar with many of the statistical tools available to us, but the tools used in data mining, and the way they are used, are somewhat different from the ones used in traditional business forecasting. Tools used in data mining are discussed in the next section. The premise of data mining is that there is a great deal of information locked up in a database; it's up to the researcher to unlock it. Data mining tools help to unlock that information.
Prediction Tools: These are the methods derived from traditional statistical forecasting for predicting a variable's value.
Classification Tools: The most commonly used tools in data mining, classification tools attempt to distinguish different classes of objects or actions. For instance, a particular credit card transaction may be either normal or fraudulent. These tools could classify it as one or the other, thereby saving the credit card company a considerable amount of money. In another instance, an advertiser may want to know which aspect of its promotion is most appealing to consumers. Is it the price, quality, and/or reliability of a product? Maybe it is a special feature that is missing from competing products. Classification tools help provide such information on all the products, making it possible to use the advertising budget in the most effective manner.
Clustering Analysis Tools: These are very powerful tools for clustering products into groups that naturally fall together. These groups are identified by the program and not by the researchers. Most of the clusters discovered may not be useful in business decisions. However, researchers may find one or two that are extremely important, the ones the company can take advantage of. The most common use for clustering tools is probably in what economists refer to as "market segmentation." In market segmentation, a company divides the customer base into segments according to characteristics such as income, wealth, geographic location, lifestyle, and so on. Each segment is then treated with a different marketing approach, one suited precisely to that particular segment.
Association Rules Discovery: Here the data mining tools discover associations; e.g., what kinds of books certain groups of people read, what products certain groups of people purchase, what movies certain groups of people watch, etc. Businesses use this information in targeting their markets. Netflix, for example, recommends movies based on movies people have watched and rated in the past. Amazon does much the same thing in recommending books.
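To make the idea of association rules a little more concrete, the following is a minimal sketch in Python, not a description of how any particular retailer or software package works. It counts how often pairs of items appear together in a set of shopping baskets and reports two standard measures: support (the share of all baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second). The baskets, the function name `pair_rules`, and the threshold values are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Find simple one-to-one association rules (lhs -> rhs) between items.

    Returns a list of (lhs, rhs, support, confidence) tuples, sorted with
    the highest-confidence rules first.
    """
    n = len(transactions)
    baskets = [set(t) for t in transactions]  # ignore duplicate items

    # How many baskets contain each item, and each unordered pair of items.
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to be of interest
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))

    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical baskets, invented purely for illustration.
transactions = [
    ["beer", "batteries", "water"],
    ["beer", "water"],
    ["bread", "milk"],
    ["beer", "batteries"],
    ["bread", "water"],
]

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this invented data, every basket that contains batteries also contains beer, so the rule "batteries -> beer" comes out with the highest confidence. As the article stresses, it remains up to the researcher to decide which of the discovered rules are relevant; the thresholds here simply filter out pairs that occur too rarely to matter.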