
DATA MINING: Applications and Trends

Data mining has attracted a great deal of attention in the information industry
and in society as a whole in recent years, due to the availability of huge amounts
of data and the imminent need for turning such data into useful information and
knowledge. Today as more data are gathered, with the amount of data doubling
every three years, Data Mining is becoming an increasingly important tool to
transform these data into information. It is commonly used in a wide range of
profiling practices, such as marketing, surveillance, fraud detection and
scientific discovery.

INTRODUCTION

Fig.1 Data Mining: Discovering hidden value in your data warehouse

Data mining is the exploration and analysis of large data sets in order to
discover meaningful patterns and rules. The key idea is to find effective ways
to combine the computer’s power to process data with the human eye’s ability to
detect patterns. Data mining techniques are designed for, and work best with,
large data sets.

Data mining is the process of extracting interesting (nontrivial, implicit,
previously unknown and potentially useful) patterns or knowledge from huge
amounts of data. It is the set of activities used to find new, hidden,
unexpected or unusual patterns in data. Using information contained within a
data warehouse, data mining can often provide answers to questions about an
organization that a decision maker has not previously thought to ask, for
example:

• Which products should be promoted to a particular customer?


• What is the probability that a certain customer will respond to a planned
promotion?
• Which securities will be most profitable to buy or sell during the next trading
session?
• What is the likelihood that a certain customer will default or pay back on
schedule?
• What is the appropriate medical diagnosis for this patient?
These types of questions can be answered surprisingly easily if the information
hidden among the data in your databases can be located and utilized.

The importance of collecting data that reflect your business or scientific
activities to achieve competitive advantage is now widely recognized. Powerful
systems for collecting data and managing it in large databases are in place in
most large and mid-range companies. However, the bottleneck in turning this
data into success is the difficulty of extracting knowledge about the system
you study from the collected data.

Human analysts with no special tools can no longer make sense of the enormous
volumes of data that require processing in order to make informed business
decisions.

Data mining automates the process of finding relationships and patterns in raw
data and delivers results that can be either utilized in an automated decision
support system or assessed by a human analyst.



1. HISTORIC DEVELOPMENT
(EVOLUTION)

Data mining techniques are the result of a long process of research and product
development. This evolution began when business data was first stored on
computers, continued with improvements in data access, and more recently,
generated technologies that allow users to navigate through their data in real
time. Data mining takes this evolutionary process beyond retrospective data
access and navigation to prospective and proactive information delivery. From
the user’s point of view, the following four steps were revolutionary because
they allowed new business questions to be answered accurately and quickly.

Data Collection (1960s) -> Data Access (1980s) -> Data Warehousing & Decision
Support (1990s) -> Data Mining (Emerging Today)

Fig 2: Evolutionary Stages of Data Mining

Evolutionary Stages of Data Mining:

• Data Collection (1960s)
  - Business question: "What was my total revenue in the last five years?"
  - Enabling technologies: Computers, tapes, disks.
  - Product providers: IBM, CDC.
  - Characteristics: Retrospective, static data delivery.

• Data Access (1980s)
  - Business question: "What were unit sales in New England last March?"
  - Enabling technologies: Relational databases (RDBMS), Structured Query
    Language (SQL), ODBC.
  - Product providers: Oracle, Sybase, Informix, IBM, Microsoft.
  - Characteristics: Retrospective, dynamic data delivery at record level.

• Data Warehousing & Decision Support (1990s)
  - Business question: "What were unit sales in New England last March?
    Drill down to Boston."
  - Enabling technologies: On-line analytic processing (OLAP),
    multidimensional databases, and data warehouses.
  - Product providers: Pilot, Comshare, Arbor, Cognos, MicroStrategy.
  - Characteristics: Retrospective, dynamic data delivery at multiple levels.

• Data Mining (Emerging Today)
  - Business question: "What's likely to happen to Boston unit sales next
    month? Why?"
  - Enabling technologies: Advanced algorithms, multiprocessor computers,
    massive databases.
  - Product providers: Pilot, Lockheed, IBM, SGI, numerous startups
    (nascent industry).
  - Characteristics: Prospective, proactive information delivery.

The core components of data mining technology have been under development for
decades, in research areas such as statistics, artificial intelligence, and
machine learning. Today, the maturity of these techniques, coupled with
high-performance relational database engines and broad data integration
efforts, makes these technologies practical for current data warehouse
environments.

1.1 THE PRESENT AND THE FUTURE

The field of data mining has been growing by leaps and bounds, and many
industry analysts and research firms project a bright future for it and for
the related area of customer relationship management (CRM). Growth in the CRM
analytic application market approached 54.1% per year through 2003. In
addition, data mining projects grew by more than 300% by 2002, and by 2003
over 90% of consumer-based industries with an e-commerce orientation had used
some kind of data mining model. The field is very broad, and many methods and
technologies have become dominant in it.

1.2 THE SCOPE OF DATA MINING


Data mining derives its name from the similarities between searching for
valuable business information in a large database and mining a mountain for
a vein of valuable ore. Both processes require either sifting through an
immense amount of material, or intelligently probing it to find exactly where
the value resides. Given databases of sufficient size and quality, data mining
technology can generate new business opportunities by providing these
capabilities:

• Automated prediction of trends and behaviors: Data mining
automates the process of finding predictive information in large
databases. Questions that traditionally required extensive hands-on
analysis can now be answered directly from the data — quickly. A
typical example of a predictive problem is targeted marketing. Data
mining uses data on past promotional mailings to identify the targets
most likely to maximize return on investment in future mailings. Other
predictive problems include forecasting bankruptcy and other forms of
default, and identifying segments of a population likely to respond
similarly to given events.
• Automated discovery of previously unknown patterns. Data mining
tools sweep through databases and identify previously hidden patterns in
one step. An example of pattern discovery is the analysis of retail sales
data to identify seemingly unrelated products that are often purchased
together. Other pattern discovery problems include detecting fraudulent
credit card transactions and identifying anomalous data that could
represent data entry keying errors.
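
The retail example above (products that are often purchased together) can be
sketched as a simple pair count. This is only a minimal illustration under
stated assumptions: the function name `frequent_pairs`, the `min_support`
threshold and the sample baskets are all invented here, and real tools use far
more scalable algorithms.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count how often each pair of products appears in the same basket.

    Returns a dict mapping each pair (a sorted tuple) to its support, the
    fraction of baskets containing both products, for pairs whose support
    is at least min_support.
    """
    pair_counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread") match.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


# Hypothetical retail transactions, invented for illustration only.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["beer", "diapers", "milk"],
    ["bread", "milk"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

Pairs that clear the threshold are candidates for joint promotion; whether a
pair is genuinely interesting still has to be judged by an analyst.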

Data mining techniques can yield the benefits of automation on existing
software and hardware platforms, and can be implemented on new systems
as existing platforms are upgraded and new products developed. When data
mining tools are implemented on high performance parallel processing
systems, they can analyze massive databases in minutes. Faster processing
means that users can automatically experiment with more models to
understand complex data. High speed makes it practical for users to analyze
huge quantities of data. Larger databases, in turn, yield improved predictions.

1.3 TECHNIQUES OF DATA MINING


The most commonly used techniques in data mining are:

• Artificial neural networks: Non-linear predictive models that learn
through training and resemble biological neural networks in structure.

• Decision trees: Tree-shaped structures that represent sets of decisions.
These decisions generate rules for the classification of a dataset. Specific
decision tree methods include Classification and Regression Trees
(CART) and Chi Square Automatic Interaction Detection (CHAID).

• Genetic algorithms: Optimization techniques that use processes such as
genetic combination, mutation, and natural selection in a design based
on the concepts of evolution.

• Nearest neighbor method: A technique that classifies each record in a
dataset based on a combination of the classes of the k record(s) most
similar to it in a historical dataset (where k ≥ 1). Sometimes called the
k-nearest neighbor technique.

• Rule induction: The extraction of useful if-then rules from data based
on statistical significance.
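
To make the nearest neighbor method concrete, here is a minimal sketch. It
assumes numeric features, Euclidean distance and a simple majority vote; the
function name `knn_classify` and the sample records are hypothetical and not
taken from any particular tool.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest historical records.

    `history` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort historical records by Euclidean distance to the query.
    by_distance = sorted(history, key=lambda rec: math.dist(rec[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: (age, yearly purchases) -> responded to a promotion?
history = [
    ((25, 2), "no"),
    ((30, 3), "no"),
    ((45, 12), "yes"),
    ((50, 15), "yes"),
    ((48, 10), "yes"),
]
print(knn_classify(history, (47, 11), k=3))
```

In practice the features would usually be rescaled first, since an attribute
with a large numeric range otherwise dominates the distance.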



Many of these technologies have been in use for more than a decade in
specialized analysis tools that work with relatively small volumes of data.
These capabilities are now evolving to integrate directly with industry-
standard data warehouse and OLAP platforms.
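
As a further illustration of the techniques listed above, rule induction can
be sketched with the simple 1R heuristic. Note the substitution: 1R picks
rules by their error count on the training data rather than by a statistical
significance test, so this is a simplified stand-in. The field names and
records are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(records, attributes, target):
    """Return (best_attribute, rules, errors) using the 1R heuristic.

    For each attribute, map every value to its most common target class,
    then keep the attribute whose rules misclassify the fewest records.
    """
    best = None
    for attr in attributes:
        class_counts = defaultdict(Counter)
        for rec in records:
            class_counts[rec[attr]][rec[target]] += 1
        rules = {value: counts.most_common(1)[0][0]
                 for value, counts in class_counts.items()}
        errors = sum(rec[target] != rules[rec[attr]] for rec in records)
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


# Hypothetical promotion-response records, invented for illustration.
records = [
    {"segment": "student", "channel": "email", "responded": "no"},
    {"segment": "student", "channel": "mail", "responded": "no"},
    {"segment": "family", "channel": "email", "responded": "yes"},
    {"segment": "family", "channel": "mail", "responded": "yes"},
    {"segment": "retired", "channel": "mail", "responded": "yes"},
    {"segment": "retired", "channel": "email", "responded": "no"},
    {"segment": "retired", "channel": "mail", "responded": "yes"},
]
attribute, rules, errors = one_r(records, ["segment", "channel"], "responded")
for value, outcome in rules.items():
    print(f"IF {attribute} = {value} THEN responded = {outcome}")
```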

1.4 THE TEN STEPS OF DATA MINING


Here is a process for extracting hidden knowledge from your data warehouse,
your customer information file, or any other company database.

1. Identify The Objective -- Before you begin, be clear on what you hope
to accomplish with your analysis. Know in advance the business goal of the
data mining. Establish whether or not the goal is measurable. Some possible
goals are to

• Find sales relationships between specific products or services
• Identify specific purchasing patterns over time
• Identify potential types of customers
• Find product sales trends.

2. Select The Data -- Once you have defined your goal, your next step is
to select the data to meet this goal. This may be a subset of your data
warehouse or a data mart that contains specific product information. It may
be your customer information file. Narrow the scope of the data to be mined
as much as possible. Here are some key issues.

• Are the data adequate to describe the phenomena the data mining
analysis is attempting to model?

• Can you enhance internal customer records with external lifestyle and
demographic data?
• Are the data stable—will the mined attributes be the same after the analysis?
• If you are merging databases can you find a common field for linking them?
• How current and relevant are the data to the business goal?

3. Prepare The Data -- Once you've assembled the data, you must
decide which attributes to convert into usable formats. Consider the input of
domain experts (the creators and users of the data).

• Establish strategies for handling missing data, extraneous noise,
and outliers.
• Identify redundant variables in the dataset and decide which
fields to exclude.
• Decide on a log or square transformation, if necessary.
• Visually inspect the dataset to get a feel for the database.
• Determine the distribution frequencies of the data.

You can postpone some of these decisions until you select a data-mining
tool. For example, if you need a neural network or polynomial network you
may have to transform some of your fields.
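
The preparation decisions in this step can be sketched for a single numeric
field. The choices below (median imputation, a 1.5 x IQR outlier rule, a
natural-log transform) are common conventions assumed here for illustration,
not prescriptions from the text, and `prepare_column` is a hypothetical helper.

```python
import math
import statistics


def prepare_column(values):
    """Fill missing values, flag outliers, and log-transform one numeric field.

    `values` is a list of positive numbers with None marking missing entries.
    Returns (log_values, outlier_flags).
    """
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    # Strategy for missing data: replace with the median of observed values.
    filled = [median if v is None else v for v in values]

    # Strategy for outliers: flag values beyond 1.5 * IQR from the quartiles.
    q1, _, q3 = statistics.quantiles(observed, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    outlier_flags = [not (low <= v <= high) for v in filled]

    # Log transformation compresses a long right tail (requires values > 0).
    log_values = [math.log(v) for v in filled]
    return log_values, outlier_flags


# Hypothetical income field (in thousands) with one gap and one extreme value.
incomes = [30, 32, None, 35, 31, 400, 33]
logs, flags = prepare_column(incomes)
print(flags)
```

Flagging rather than deleting outliers keeps the decision with the domain
experts mentioned above.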

4. Audit The Data -- Evaluate the structure of your data in order to
determine the appropriate tools.

• What is the ratio of categorical/binary attributes in the database?
• What is the nature and structure of the database?
• What is the overall condition of the dataset?
• What is the distribution of the dataset?

Balance the objective assessment of the structure of your data against your
users' need to understand the findings. Neural nets, for example, don't explain
their results.
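
The audit questions above can be answered with a short summary pass over the
records. This sketch assumes the data is a list of dictionaries with the same
keys and treats booleans as categorical; the function name `audit` and the
sample fields are hypothetical.

```python
from collections import Counter


def audit(records):
    """Summarize each attribute of a list of dict records.

    An attribute is treated as numeric only if every non-missing value is
    an int or float (booleans count as categorical/binary).
    Returns (report, ratio_of_categorical_attributes).
    """
    report = {}
    for name in records[0]:
        values = [rec[name] for rec in records if rec[name] is not None]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                      for v in values)
        report[name] = {
            "kind": "numeric" if numeric else "categorical",
            "missing": len(records) - len(values),
            "distinct": len(set(values)),
            # Frequency distribution is only listed for categorical fields.
            "distribution": None if numeric else dict(Counter(values)),
        }
    categorical = sum(r["kind"] == "categorical" for r in report.values())
    return report, categorical / len(report)


# Hypothetical customer records, invented for illustration.
customers = [
    {"age": 34, "region": "north", "churned": True},
    {"age": None, "region": "south", "churned": False},
    {"age": 51, "region": "north", "churned": False},
]
report, categorical_ratio = audit(customers)
print(report["region"], categorical_ratio)
```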

Steps of Data Mining: 1. Identify the objective, 2. Select the data,
3. Prepare the data, 4. Audit the data, 5. Select the tools, 6. Format the
solution, 7. Construct the solution, 8. Validate the findings, 9. Deliver the
findings, 10. Integrate the solution

5. Select The Tools -- Two concerns drive the selection of the
appropriate data-mining tool: your business objectives and your data
structure. Both should guide you to the same tool. Consider these questions
when evaluating a set of potential tools.

• Is the data set heavily categorical?
• What platforms do your candidate tools support?
• Are the candidate tools ODBC-compliant?
• What data format can the tools import?

No single tool is likely to provide the answer to your data-mining project.
Some tools integrate several technologies into a suite of statistical analysis
programs, a neural network, and a symbolic classifier.

6. Format The Solution -- In conjunction with your data audit, your
business objective and the selection of your tool determine the format of your
solution. The key questions are:

• What is the optimum format of the solution: decision tree, rules, C
code, SQL syntax?
• What are the available format options?
• What is the goal of the solution?
• What do the end-users need—graphs, reports, code?

7. Construct The Model -- It is at this point that the data mining
process begins. Usually the first step is to use a random number seed to split
the data into a training set and a test set, and then to construct and
evaluate a model. The generation of classification rules, decision trees,
clustering sub-groups, scores, code, weights and evaluation data/error rates
takes place at this stage. Resolve these issues:

• Are error rates at acceptable levels? Can you improve them?
• What extraneous attributes did you find? Can you purge them?
• Is additional data or a different methodology necessary?
• Will you have to train and test a new data set?
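
The seeded split and error-rate check described in this step can be sketched
as follows. The helper names, the 30% test fraction and the majority-class
baseline model are assumptions made for illustration; any real model would
replace `majority_model`.

```python
import random
from collections import Counter


def split(records, test_fraction=0.3, seed=42):
    """Shuffle with a fixed seed so the split is reproducible, then cut."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def error_rate(predict, test_set):
    """Fraction of test records whose label the model gets wrong."""
    wrong = sum(predict(features) != label for features, label in test_set)
    return wrong / len(test_set)


def majority_model(training_set):
    """Baseline model: always predict the most common training label."""
    label = Counter(lbl for _, lbl in training_set).most_common(1)[0][0]
    return lambda features: label


# Hypothetical labeled records: one numeric feature and a yes/no label.
records = [((i,), "yes" if i % 4 == 0 else "no") for i in range(20)]
train, test = split(records)
model = majority_model(train)
print(f"baseline error rate: {error_rate(model, test):.2f}")
```

A candidate model whose test error is no better than this baseline has not
learned anything useful, which is one way to judge whether error rates are at
acceptable levels.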

8. Validate The Findings -- Share and discuss the results of the
analysis with the business client or domain expert. Ensure that the findings
are correct and appropriate to the business objectives.

• Do the findings make sense?
• Do you have to return to any prior steps to improve results?
• Can you use other data mining tools to replicate the findings?

9. Deliver The Findings -- Provide a final report to the business unit
or client. The report should document the entire data mining process
including data preparation, tools used, test results, source code, and rules.
Some of the issues are:



• Will additional data improve the analysis?
• What strategic insight did you discover and how is it applicable?
• What proposals can result from the data mining analysis?
• Do the findings meet the business objective?

10. Integrate The Solution -- Share the findings with all interested
end-users in the appropriate business units. You might wind up incorporating
the results of the analysis into the company's business procedures. Some of
the data mining solutions may involve

• SQL syntax for distribution to end-users
• C code incorporated into a production system
• Rules integrated into a decision support system.
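
As a rough sketch of the first option, a mined rule can be rendered as SQL
that end-users run against their own tables. Everything named here is
hypothetical (the `rule_to_sql` helper, the `customers` table and its
columns), and the table and column names are assumed to come from trusted
code, since only the values are passed as parameters.

```python
import sqlite3


def rule_to_sql(table, conditions, columns="*"):
    """Render a mined rule (a dict of column -> required value) as a SELECT.

    Values are emitted as '?' placeholders so the caller can pass them to a
    database driver separately instead of pasting them into the SQL text.
    """
    where = " AND ".join(f"{column} = ?" for column in conditions)
    sql = f"SELECT {columns} FROM {table} WHERE {where}"
    return sql, list(conditions.values())


# Hypothetical in-memory table used only to show the generated query running.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, segment TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ann", "family", "north"), ("Bob", "student", "north"),
     ("Cy", "family", "south")],
)
sql, params = rule_to_sql(
    "customers", {"segment": "family", "region": "north"}, columns="name"
)
print(sql)
print(conn.execute(sql, params).fetchall())
```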

Although data mining tools automate database analysis, they can lead to faulty
findings and erroneous conclusions if you're not careful. Bear in mind that data
mining is a business process with a specific goal—to extract a competitive
insight from historical records in a database.

2. DATA MINING – IMPACT ON EMPLOYEES AND INDUSTRY

• For Financial Data Analysis


Most banks and financial institutions offer a wide variety of banking services
(such as checking, saving, and business and individual customer transactions),
credit (such as business, mortgage, and automobile loans), and investment
services (such as mutual funds). Some also offer insurance services and stock
services. Financial data collected in the banking and financial industry is
often relatively complete, reliable and of high quality, which facilitates
systematic data analysis and data mining. For example, data mining can help in
fraud detection by identifying a group of people who stage accidents to
collect insurance money.

• For Retail Industry

The retail industry collects huge amounts of data on sales, customer shopping
history, goods transportation, consumption and service records, and so on.
The quantity of data collected continues to expand rapidly, especially due to
the increasing ease, availability and popularity of business conducted on the
web, or e-commerce. The retail industry therefore provides a rich source for
data mining. Retail data mining can help identify customer behavior, discover
customer shopping patterns and trends, improve the quality of customer
service, achieve better customer retention and satisfaction, enhance goods
consumption ratios, design more effective goods transportation and
distribution policies, and reduce the cost of business.

• For Telecommunication Industry

The telecommunication industry has quickly evolved from offering local and
long distance telephone services to providing many other comprehensive
communication services including voice, fax, pager, cellular phone, images,
e-mail, computer and web data transmission and other data traffic. The
integration of telecommunication, computer networks, the Internet and numerous
other means of communication and computing is underway. Moreover, with the
deregulation of the telecommunication industry in many countries and the
development of new computer and communication technologies, the
telecommunication market is rapidly expanding and highly competitive. This
creates a great demand for data mining in order to help understand the
business involved, identify telecommunication patterns, catch fraudulent
activities, make better use of resources, and improve the quality of services.

• Text Mining and Web Mining

Text mining is the process of searching large volumes of documents for
certain keywords or key phrases. By searching literally thousands of
documents, various relationships between the documents can be established.
Applied to free-text survey comments, for example, text mining can derive
patterns that may help identify a common set of customer perceptions not
captured by the other survey questions. An extension of text mining is web
mining. Web mining is an exciting new field that integrates data and text
mining within a website. It enhances the website with intelligent behavior,
such as suggesting related links or recommending new products to the consumer.
Web mining is especially exciting because it enables tasks that were
previously difficult to implement. Web mining tools can be configured to
monitor and gather data from a wide variety of locations and can analyze the
data across one or multiple sites. Search engines, for example, work on data
mining principles.
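
The keyword search at the heart of text mining can be sketched in a few lines.
This is a deliberately naive illustration: the function name `keyword_hits`,
the tokenizing pattern and the sample comments are assumptions, and real text
mining adds stemming, phrase matching and much more.

```python
import re
from collections import Counter


def keyword_hits(documents, keywords):
    """Count, for each keyword, how many documents mention it at least once."""
    hits = Counter()
    for text in documents:
        # Lowercase and split into words so matching ignores case and punctuation.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for keyword in keywords:
            if keyword in words:
                hits[keyword] += 1
    return hits


# Hypothetical free-text survey comments, invented for illustration.
comments = [
    "Delivery was slow and the box was damaged",
    "Great price, slow delivery",
    "Friendly staff and great price",
]
print(keyword_hits(comments, ["slow", "price", "damaged", "refund"]))
```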

• Healthcare

The past decade has seen explosive growth in biomedical research, ranging
from the development of new pharmaceuticals and cancer therapies to the
identification and study of the human genome by discovering large-scale
sequencing patterns and gene functions. Recent research in DNA analysis has
led to the discovery of genetic causes for many diseases and disabilities, as
well as approaches for disease diagnosis, prevention and treatment.

3. EXAMINATION OF CURRENT TRENDS

The diversity of available data and of data mining approaches poses many
challenging research issues in data mining. The design of standard data
mining languages, the development of effective and efficient data mining
methods and systems, the construction of interactive and integrated data
mining environments, and the application of data mining to large application
problems are important tasks for data mining researchers and for data mining
system and application developers. Here we discuss some of the trends in data
mining that reflect the pursuit of these challenges:

• Application Exploration:

Earlier, data mining was mainly used to help businesses gain a competitive
edge. As data mining becomes more popular, it is gaining wide acceptance in
other fields as well, such as biomedicine, the stock market, fraud detection,
telecommunication and many more, and many new applications are being explored.
In addition, data mining for business continues to expand as e-commerce and
marketing become mainstream elements of the retail industry. As generic data
mining systems may have limitations in dealing with application-specific
problems, we may see a trend toward the development of more
application-specific data mining systems.

• Scalable data mining methods

Current data mining methods are capable of handling only a particular type of
data and a limited amount of data. As data is expanding at a massive rate,
there is a need to develop new data mining methods that are scalable and can
handle different types and large volumes of data. Data mining methods should
also be more interactive and user friendly. One important direction towards
improving the overall efficiency of the mining process while increasing user
interaction is constraint-based mining. This gives users more control by
allowing the specification and use of constraints to guide data mining
systems in their search for interesting patterns.
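
To illustrate the idea of constraint-based mining, the sketch below lets the
user state three constraints (a required item, a maximum itemset size and a
minimum support) and uses the required item to shrink the search. It is a
brute-force enumeration written for clarity, not one of the published
constraint-based algorithms, and all names and data are hypothetical.

```python
from itertools import combinations


def constrained_itemsets(baskets, min_support, must_include, max_size=3):
    """Return {itemset: support} for itemsets that satisfy the user's constraints.

    Constraints: the itemset contains `must_include`, has at most `max_size`
    items, and appears in at least `min_support` (a fraction) of the baskets.
    """
    if not baskets:
        return {}
    # Push the item constraint into the search: baskets without the required
    # item can never support a qualifying itemset, so they are skipped.
    relevant = [set(b) for b in baskets if must_include in b]
    others = sorted(set().union(*relevant) - {must_include}) if relevant else []
    results = {}
    for size in range(0, max_size):
        for extra in combinations(others, size):
            itemset = frozenset((must_include,) + extra)
            count = sum(itemset <= basket for basket in relevant)
            if count / len(baskets) >= min_support:
                results[itemset] = count / len(baskets)
    return results


# Hypothetical retail transactions, invented for illustration only.
baskets = [
    ["bread", "milk", "diapers"],
    ["beer", "diapers", "bread"],
    ["beer", "diapers", "milk"],
    ["bread", "milk"],
]
found = constrained_itemsets(baskets, 0.5, must_include="diapers", max_size=2)
for itemset, support in found.items():
    print(sorted(itemset), support)
```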
• Combination of data mining with database systems, data warehouse systems,
and web database systems

Database systems, data warehouse systems, and the WWW are loaded with huge
amounts of data and have thus become the major information processing
systems. It is important to make sure that data mining serves as an essential
data analysis component that can be easily included in such an
information-processing environment. The desired architecture for a data
mining system is tight coupling with database and data warehouse systems.
Transaction management, query processing, online analytical processing and
online analytical mining should be integrated into one unified framework.

• Standardization of data mining language:


Today a few data mining products are commercially available, such as
Microsoft SQL Server 2005, IBM Intelligent Miner, SAS Enterprise Miner,
SGI MineSet, Clementine, DBMiner and many more. A standard data mining
language or other standardization efforts would support the orderly
development of data mining solutions and improved interoperability among
multiple data mining systems and functions.

• Visual data mining

It is rightly said that a picture is worth a thousand words. If the results
of mining can be shown in visual form, the worth of the mined data is further
enhanced. Visual data mining is an effective way to discover knowledge from
huge amounts of data. The systematic study and development of visual data
mining techniques will promote the use of data mining for data analysis.

• New methods for mining complex types of data

Complex types of data such as geospatial, multimedia, time series, sequence
and text data pose an important research area in the field of data mining.
There is still a huge gap between the needs of these applications and the
available technology.

• Web mining

The World Wide Web is a huge, globally distributed collection of news,
advertisements, consumer records, financial, education, government and
e-commerce information, and many other services. The WWW also contains a huge
and dynamic collection of hyperlinked information, providing a rich source
for data mining. For the same reasons, the Web also poses great challenges for
efficient resource and knowledge discovery.

• Biological data mining:

Although biological data mining can be considered under “application
exploration”, the unique combination of complexity, richness, size, and
importance of biological data warrants special attention in data mining.
Mining DNA and protein sequences and mining high-dimensional microarray data
are some of the interesting topics for biological data mining research.

• Data mining and software engineering:

As software programs become increasingly bulky in size and sophisticated in
complexity, and tend to originate from the integration of multiple components
developed by different software teams, it is an increasingly challenging task
to ensure software robustness and reliability. The analysis of the executions
of a buggy software program is essentially a data mining process: tracing the
data generated during program executions may disclose important patterns and
outliers that may lead to the eventual automated discovery of software bugs.

• Distributed data mining:

Traditional data mining methods, designed to work at a centralized location,
do not work well in many of the distributed computing environments present
today (e.g., the Internet, intranets, local area networks). Advances in
distributed data mining methods are expected.
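
One simple way to picture distributed data mining is to let each site compute
a small local summary and combine only the summaries centrally. The sketch
below does this for item counts; the function names and the two sites' data
are hypothetical, and real distributed methods must also deal with
communication cost, privacy and inconsistent data.

```python
from collections import Counter


def local_counts(partition):
    """Count, at one site, how many local baskets contain each item."""
    return Counter(item for basket in partition for item in set(basket))


def merge_counts(partial_counts):
    """Combine the per-site summaries into global item counts."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


# Hypothetical data held at two separate sites.
site_a = [["bread", "milk"], ["beer", "bread"]]
site_b = [["milk"], ["bread", "milk", "beer"]]
merged = merge_counts([local_counts(site_a), local_counts(site_b)])
print(merged)
```

Only the counters cross the network, never the raw baskets, and the merged
result equals what a centralized count over all the data would give.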

• Real time data mining:

Many applications involving stream data (such as e-commerce, web mining and
stock analysis) require dynamic data mining models to be built in real time.
Additional development is needed in this area.
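
A building block for real-time mining is a statistic that is updated one
record at a time without storing the stream. The sketch below uses Welford's
method for a running mean and variance; the class name `RunningStats` and the
sample values are chosen here for illustration.

```python
class RunningStats:
    """Incrementally updated mean and variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary in constant time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; reported as 0.0 for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical stream of order values arriving one at a time.
stats = RunningStats()
for value in [10, 12, 14]:
    stats.update(value)
print(stats.mean, stats.variance)
```

Because each update needs only the previous summary, the same object can keep
running for as long as the stream does.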



4. CONCLUSION
Comprehensive data warehouses that integrate operational data with customer,
supplier, and market information have resulted in an explosion of information.
Competition requires timely and sophisticated analysis on an integrated view of
the data. However, there is a growing gap between more powerful storage and
retrieval systems and the users’ ability to effectively analyze and act on the
information they contain.

Both relational and OLAP technologies have tremendous capabilities for
navigating massive data warehouses, but brute force navigation of data is not
enough. A new technological leap is needed to structure and prioritize
information for specific end-user problems. The data mining tools can make this
leap. Quantifiable business benefits have been proven through the integration of
data mining with current information systems, and new products are on the
horizon that will bring this integration to an even wider audience of users.



Since data mining is a young discipline with wide and diverse applications,
there is still a nontrivial gap between general principles of data mining and
domain specific, effective data mining tools for particular applications.

We have discussed a few application domains of data mining (such as finance,
the retail industry and telecommunication) and trends in data mining, which
include further efforts towards the exploration of new application areas, new
methods for handling complex data types, algorithm scalability,
constraint-based mining and visualization methods, the integration of data
mining with data warehousing and database systems, the standardization of
data mining languages, and data privacy protection and security.


