
Data Mining:

Knowledge discovery in databases

Presented by: BALAJI

October 24, 2012

Introduction

Motivation: Why data mining?

What is data mining?


Evolution of Data Mining

KDD process
Architecture: Typical Data Mining System

What can Data mining do?


Data mining technologies & functionalities

Necessity Is the Mother of Invention

Data explosion problem

Automated data collection tools and mature database technology lead to tremendous amounts of data accumulated and/or to be analyzed in databases, data warehouses, and other information repositories.

We are drowning in data, but starving for knowledge!

Solution: data warehousing and data mining

Data warehousing and on-line analytical processing (OLAP)

Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases


What Is Data Mining?

Data mining (knowledge discovery from data)

Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

Data mining: a misnomer? (What is mined is knowledge, not data.)

Alternative names

Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: Is everything data mining?

No: (deductive) query processing, expert systems, and small statistical programs are not data mining.



Evolution of Data Mining


Data Collection (1960s)
  Business question: "What was my total revenue in the last five years?"
  Enabling technology: computers, tapes, disks

Data Access (1980s)
  Business question: "What were unit sales in New England last March?"
  Enabling technology: faster and cheaper computers with more storage, relational databases

Data Warehousing and Decision Support
  Business question: "What were unit sales in New England last March? Drill down to Boston."
  Enabling technology: faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses

Data Mining
  Business question: "What's likely to happen to Boston unit sales next month? Why?"
  Enabling technology: faster and cheaper computers with more storage, advanced computer algorithms
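The step from data access to OLAP can be sketched in a few lines of Python. This is an illustration only: the sales records and the helper names `total_units` and `drill_down_by_city` are hypothetical. The first function answers the 1980s-style question with a single aggregate; the second drills the same figure down to individual cities. Neither answers the data mining question about next month, which needs a predictive model.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, month, units). Invented for illustration.
SALES = [
    ("New England", "Boston", "2012-03", 120),
    ("New England", "Hartford", "2012-03", 45),
    ("New England", "Boston", "2012-02", 110),
    ("Midwest", "Chicago", "2012-03", 200),
]


def total_units(records, region, month):
    """Data access: one aggregate answer, like a SUM query with a WHERE clause."""
    return sum(u for r, _, m, u in records if r == region and m == month)


def drill_down_by_city(records, region, month):
    """OLAP-style drill-down: split the same total into finer-grained city totals."""
    by_city = defaultdict(int)
    for r, city, m, u in records:
        if r == region and m == month:
            by_city[city] += u
    return dict(by_city)


if __name__ == "__main__":
    print(total_units(SALES, "New England", "2012-03"))         # 165
    print(drill_down_by_city(SALES, "New England", "2012-03"))  # {'Boston': 120, 'Hartford': 45}
```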


Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process

[Figure: KDD process flow: Databases -> Data cleaning and data integration -> Data warehouse -> Selection -> Task-relevant data -> Data mining -> Pattern evaluation]
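The KDD flow can be sketched as a chain of plain Python functions, one per stage. This is a toy illustration under stated assumptions: the records, the stage functions (`clean`, `integrate`, `select`, `mine`, `evaluate`) and the minimum-count threshold are all hypothetical, and the "mining" step here is nothing more than item counting.

```python
from collections import Counter


def clean(records):
    """Data cleaning: drop records that have a missing (None) field."""
    return [r for r in records if None not in r.values()]


def integrate(*sources):
    """Data integration: merge several sources into one list (our toy 'warehouse')."""
    warehouse = []
    for source in sources:
        warehouse.extend(source)
    return warehouse


def select(warehouse, predicate):
    """Selection: keep only the task-relevant data."""
    return [r for r in warehouse if predicate(r)]


def mine(task_data):
    """Data mining (toy version): count how often each item occurs."""
    return Counter(r["item"] for r in task_data)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns meeting an interestingness threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}


# Hypothetical source databases.
store_a = [{"item": "milk", "city": "Boston"}, {"item": "bread", "city": "Boston"},
           {"item": None, "city": "Boston"}]
store_b = [{"item": "milk", "city": "Boston"}, {"item": "milk", "city": "Chicago"}]

warehouse = integrate(clean(store_a), clean(store_b))
task_data = select(warehouse, lambda r: r["city"] == "Boston")
knowledge = evaluate(mine(task_data), min_count=2)
print(knowledge)  # {'milk': 2}
```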

Steps of a KDD Process

Learning the application domain: relevant prior knowledge and goals of the application

Creating a target data set: data selection

Data cleaning and preprocessing (may take 60% of the effort!)

Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation

Choosing functions of data mining: summarization, classification, regression, association, clustering

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge
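The preprocessing steps (cleaning, reduction, transformation) can be illustrated with a minimal sketch. The specific choices are assumptions, not prescribed by the slides: missing values are filled with the column mean, constant columns are dropped as a simple form of reduction, and min-max scaling is used as the transformation. The column names and values are hypothetical.

```python
from statistics import mean


def fill_missing(column):
    """Cleaning: replace None by the mean of the observed values (needs at least one)."""
    observed = [v for v in column if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in column]


def drop_constant(columns):
    """Reduction: remove features that take a single value (they carry no information)."""
    return {name: col for name, col in columns.items() if len(set(col)) > 1}


def min_max(column):
    """Transformation: rescale values linearly into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical raw columns.
raw = {"age": [20, None, 40], "flag": [1, 1, 1]}
cleaned = {name: fill_missing(col) for name, col in raw.items()}
reduced = drop_constant(cleaned)
prepared = {name: min_max(col) for name, col in reduced.items()}
print(prepared)  # {'age': [0.0, 0.5, 1.0]}
```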



Architecture: Typical Data Mining System

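[Figure: layers of a typical data mining system, from top to bottom: Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning & data integration (from the databases) and filtering (from the data warehouse); Databases and data warehouse. A knowledge base sits alongside these layers.]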
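The layered architecture can be mirrored by a few small Python classes, one per component. This is a sketch under assumptions: the class names, the knowledge-base fields (`min_support`, `ignored_items`) and the transactions are hypothetical, the mining engine only counts item frequencies, and a text formatter stands in for the graphical user interface.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class KnowledgeBase:
    """Domain knowledge that guides filtering and judges interestingness (hypothetical fields)."""
    min_support: int = 2
    ignored_items: frozenset = frozenset()


class WarehouseServer:
    """Database or data warehouse server: serves cleaned, filtered data."""

    def __init__(self, transactions):
        self._transactions = [t for t in transactions if t]  # cleaning: drop empty records

    def fetch(self, kb):
        # Filtering: remove items the knowledge base marks as uninteresting.
        return [[i for i in t if i not in kb.ignored_items] for t in self._transactions]


class MiningEngine:
    """Data mining engine: counts in how many transactions each item appears."""

    def run(self, data):
        return Counter(item for t in data for item in set(t))


class PatternEvaluator:
    """Pattern evaluation: keeps patterns that satisfy the knowledge base threshold."""

    def __init__(self, kb):
        self.kb = kb

    def filter(self, counts):
        return {item: n for item, n in counts.items() if n >= self.kb.min_support}


def user_interface(patterns):
    """Stand-in for the graphical user interface: format results as text lines."""
    return [f"{item}: {n}" for item, n in sorted(patterns.items())]


kb = KnowledgeBase(min_support=2, ignored_items=frozenset({"bag"}))
server = WarehouseServer([["milk", "bread", "bag"], ["milk", "bag"], [], ["bread", "eggs"]])
patterns = PatternEvaluator(kb).filter(MiningEngine().run(server.fetch(kb)))
print(user_interface(patterns))  # ['bread: 2', 'milk: 2']
```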

What can Data mining do?

Automated prediction of trends and behaviors: data mining automates the process of finding predictive information in large databases.

Automated discovery of previously unknown patterns: data mining tools sweep through databases and identify previously hidden patterns in one step.
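As a very small example of automated trend prediction, the sketch below fits a straight line to monthly unit sales by ordinary least squares and extrapolates one month ahead, echoing the question "What's likely to happen to Boston unit sales next month?". The linear-trend model is our own simple choice of predictive technique, and the monthly figures are hypothetical. It predicts the "what" but says nothing about the "why".

```python
def fit_trend(values):
    """Fit y = a + b*t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    sxx = sum((t - t_mean) ** 2 for t in ts)
    b = sxy / sxx          # slope: change in units per month
    a = y_mean - b * t_mean  # intercept: fitted value at t = 0
    return a, b


def predict_next(values):
    """Extrapolate the fitted line one step past the last observation."""
    a, b = fit_trend(values)
    return a + b * len(values)


# Hypothetical monthly unit sales for Boston.
boston_units = [100, 110, 120, 130]
print(predict_next(boston_units))  # 140.0
```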


Data Mining Technologies & Functionality

Commonly used techniques in data mining are artificial neural networks, decision trees, genetic algorithms, rule induction, data visualization, and the nearest neighbor method.

Classes: stored data is used to locate data in predetermined groups.

Clusters: data items are grouped according to logical relationships or consumer preferences.
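Classes and clusters differ in whether the groups are known in advance. The sketch below contrasts them: a k-nearest-neighbor vote (the nearest neighbor method named above) assigns a point to a predetermined group, while k-means, our own choice of clustering technique since the slide names none, discovers groups in unlabelled points. The customer coordinates, labels and starting centers are hypothetical.

```python
import math
from collections import Counter


def nearest_neighbor_label(train, point, k=3):
    """Classes: assign `point` to a predetermined group by majority vote of its k nearest examples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def k_means(points, centers, iterations=10):
    """Clusters: group unlabelled points around len(centers) centroids (plain Lloyd iterations)."""
    centers = [tuple(c) for c in centers]
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            closest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[closest].append(p)
        # Move each center to the mean of its group; keep it in place if the group is empty.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical labelled customers: (visits, spend) -> group.
train = [((1, 1), "low spender"), ((1, 2), "low spender"),
         ((8, 9), "high spender"), ((9, 8), "high spender"), ((9, 9), "high spender")]
print(nearest_neighbor_label(train, (8, 8)))  # high spender

centers, groups = k_means([(1, 1), (1, 2), (8, 8), (9, 9)], [(0, 0), (10, 10)])
print(centers)  # [(1.0, 1.5), (8.5, 8.5)]
```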

Associations: data can be mined to identify associations between items, such as products that are frequently bought together.
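Associations are usually scored by support (how often items occur together) and confidence (how often the right-hand item appears given the left-hand one). The sketch below is a brute-force count over item pairs, not an optimized algorithm such as Apriori; the function name, the thresholds and the shopping baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Find rules lhs -> rhs between single items, scored by support and confidence."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets.
baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}, {"bread", "butter"}]
print(association_rules(baskets))  # [('eggs', 'milk', 0.5, 1.0)]
```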

Sequence patterns: data is mined to anticipate behavior patterns and trends.
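Unlike associations, sequence patterns respect order: buying a laptop and later a mouse is a different pattern from the reverse. The sketch below is a deliberately simplified version that only finds ordered pairs ("a occurs before b") present in at least a given fraction of customer histories; dedicated sequential-pattern algorithms handle longer patterns. The function name, threshold and purchase histories are hypothetical.

```python
from collections import Counter


def frequent_sequences(sequences, min_support=0.5):
    """Return ordered pairs (a, b), meaning 'a occurs before b', with their support."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen.add((a, b))
        counts.update(seen)  # each sequence counts a pattern at most once
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical purchase histories, one list per customer, in time order.
histories = [["laptop", "mouse", "bag"], ["laptop", "bag"], ["phone", "case"], ["laptop", "mouse"]]
print(frequent_sequences(histories))
```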




Where to Find the Set of Slides?

Book page (MS PowerPoint files):

www.cs.uiuc.edu/~hanj/dmbook

Updated course presentation slides (.ppt):

www-courses.cs.uiuc.edu/~cs497jh/

Research papers, DBMiner system, and other related information:

www.cs.uiuc.edu/~hanj or www.dbminer.com


Thank you!


