

Data Mining:
Intelligent methods are applied to extract the
useful information or patterns
Data Mining: A KDD Process:
Data mining: the core of knowledge discovery
Steps of a KDD Process
Data Cleaning
Handles Noisy, Inconsistent, Incomplete data
Missing Values
Noisy data
Binning, Clustering etc.
Tools, functional dependencies
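The cleaning techniques named above can be sketched in a few lines. This is a minimal illustration, assuming missing values are filled with the attribute mean and noisy values are smoothed by equal-frequency bin means; the function names and the sample values are hypothetical and not part of the slides.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values (one common strategy)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, replace each value by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Illustrative sample data (assumed, not from the slides)
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
print(fill_missing_with_mean([10, None, 20]))
```

Other choices (bin medians, bin boundaries, a global constant for missing values) would follow the same pattern.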

Data Integration
Schema Integration

Entity Identification problem

Correlation Analysis
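Correlation analysis during integration is typically used to spot redundant attributes. A minimal sketch, assuming numeric attributes and the Pearson correlation coefficient as the measure (the slides do not name a specific one); the attribute names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical attributes from two sources: one is a rescaled copy of the other
price_eur = [1, 2, 3, 4]
price_cents = [2, 4, 6, 8]
print(pearson(price_eur, price_cents))  # close to 1.0, so one attribute is redundant
```

A coefficient near +1 or -1 suggests one of the two attributes can be dropped after integration.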

Data Selection
Select only the task relevant data

Data Transformation
Transform or consolidate data
Smoothing, Normalization, Feature Construction
Data Reduction, Compression
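Normalization is the transformation that is easiest to show concretely. The sketch below assumes two common variants, min-max scaling and z-score standardization; the slides do not say which is intended, and the function names are illustrative.

```python
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


incomes = [10, 20, 30]  # made-up values
print(min_max(incomes))
print(z_score(incomes))
```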

Pattern Evaluation
Interestingness Measures

Knowledge Presentation

Data Mining Functionalities:

Descriptive tasks: characterize general properties of the data

Predictive tasks: perform inference on the data to make predictions

Patterns can be mined at various granularities

Concept/class description
Association Analysis
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis

Concept/ Class Description:

Data can be associated with Classes / Concepts

Computers, Printers
BigSpenders Vs BudgetSpenders

Class / Concept Description

Classes and concepts can be summarized in concise and precise terms

Data Characterization
Data Discrimination

Data Characterization:

Summarization of the general characteristics

Data collected and aggregated
OLAP roll up operation
Attribute Oriented Induction
Results: charts, cubes, rules
Characteristics of Customers
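A roll-up aggregates a measure from a finer level to a coarser one. The following sketch assumes a tiny in-memory fact table with a city-to-country hierarchy; the rows, keys, and function name are hypothetical and stand in for what an OLAP engine would do.

```python
from collections import defaultdict


def roll_up(rows, group_key, measure):
    """Sum the measure over all rows that share the same value of group_key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[measure]
    return dict(totals)


# Made-up sales facts at city level
sales = [
    {"city": "Lyon", "country": "France", "amount": 120.0},
    {"city": "Paris", "country": "France", "amount": 80.0},
    {"city": "Madrid", "country": "Spain", "amount": 50.0},
]

# Rolling up from city to country
print(roll_up(sales, "country", "amount"))
```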

Data Discrimination:

Compare target class and contrasting classes

May be user-specified
Products whose sales increased Vs decreased
Regular Shoppers Vs Occasional Shoppers

Output includes Comparative measures
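One simple comparative measure is the share of each attribute value in the target class next to its share in the contrasting class. This sketch assumes that measure; the records, attribute name, and function names are invented for illustration.

```python
from collections import Counter


def value_shares(records, attribute):
    """Fraction of records holding each value of the given attribute."""
    counts = Counter(r[attribute] for r in records)
    total = len(records)
    return {value: count / total for value, count in counts.items()}


def compare_classes(target, contrast, attribute):
    """Map each attribute value to (share in target class, share in contrasting class)."""
    t = value_shares(target, attribute)
    c = value_shares(contrast, attribute)
    return {v: (t.get(v, 0.0), c.get(v, 0.0)) for v in sorted(set(t) | set(c))}


# Hypothetical customer records
regular = [{"age_group": "30-39"}] * 3 + [{"age_group": "20-29"}]
occasional = [{"age_group": "20-29"}] * 3 + [{"age_group": "30-39"}]
print(compare_classes(regular, occasional, "age_group"))
```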

Association Analysis:

Discovery of association rules

Form: X ⇒ Y
Age(X, 20-29) ⇒ buys(X, Laptop)
Single Dimensional
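Association rules are usually ranked by support and confidence. The sketch below assumes those two standard measures and reads the age range in the rule above as 20 to 29; the transactions and the item encoding are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical customer facts encoded as item sets
transactions = [
    {"age:20-29", "buys:laptop"},
    {"age:20-29", "buys:laptop"},
    {"age:20-29"},
    {"age:30-39", "buys:laptop"},
]

print(support(transactions, {"age:20-29", "buys:laptop"}))
print(confidence(transactions, {"age:20-29"}, {"buys:laptop"}))
```

A rule is typically kept only if both values exceed user-chosen minimum thresholds.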

Classification and Prediction:

Finds models that describe and differentiate classes or concepts
Predicts class
Training data
Models: rules, decision trees, neural networks (NN), formulae
Preceded by relevance analysis (to eliminate irrelevant attributes)
Derived model is used for prediction
Data value prediction
Class label prediction (Classification)
Trend identification
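To make the train-then-predict cycle concrete, here is a minimal sketch of a one-level decision tree (a decision stump) on a single numeric attribute. It is deliberately simplistic; the attribute, labels, and training pairs are hypothetical, and at least two distinct attribute values are assumed.

```python
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples):
    """samples: list of (numeric_value, label). Pick the split with the fewest training errors."""
    values = sorted({v for v, _ in samples})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in samples if v <= threshold]
        right = [lab for v, lab in samples if v > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict(stump, value):
    """Class label prediction for a new attribute value."""
    threshold, left_label, right_label = stump
    return left_label if value <= threshold else right_label


# Made-up training data: yearly spend and class label
training = [(200, "BudgetSpender"), (350, "BudgetSpender"),
            (900, "BigSpender"), (1200, "BigSpender")]
stump = train_stump(training)
print(stump, predict(stump, 1000))
```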
Cluster Analysis:

Class labels are missing in the training set
Maximize Intra-class similarity
Minimize Inter-class similarity
Hierarchy of classes
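Grouping unlabeled data can be illustrated with k-means, used here as one assumed example of a clustering method (the slides do not name an algorithm). The sketch works on one-dimensional points with caller-supplied starting centroids; all values are invented.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


points = [1, 2, 3, 10, 11, 12]  # no class labels available
centroids, clusters = k_means_1d(points, [1, 12])
print(centroids, clusters)
```

Points within a cluster end up close to each other and far from the other cluster's centroid, which is the similarity objective stated above.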

Outlier Analysis:

Objects that do not comply with the general behavior

Noise Vs Rare events
Fraud detection
Statistical tests
Deviation based methods
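A very simple statistical test flags values that lie many standard deviations from the mean. The sketch assumes a z-score rule with a threshold of 2.0; both the rule and the threshold are illustrative choices, and the transaction amounts are made up.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical card transaction amounts; one is suspiciously large
amounts = [50, 52, 49, 51, 48, 50, 500]
print(z_score_outliers(amounts))
```

Whether a flagged value is noise or a rare event of interest, such as fraud, still has to be decided by domain knowledge.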

Evolution Analysis:

Trend detection
Time series data
Involves other functionalities
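Trend detection on a time series can be sketched with a moving average, assumed here as one simple smoothing choice; the series and window size are invented for illustration.

```python
def moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def trend_direction(series, window):
    """Compare the first and last smoothed values to name the overall direction."""
    smoothed = moving_average(series, window)
    if smoothed[-1] > smoothed[0]:
        return "up"
    if smoothed[-1] < smoothed[0]:
        return "down"
    return "flat"


monthly_sales = [10, 12, 14, 13, 15, 17]  # made-up time series
print(moving_average(monthly_sales, 3))
print(trend_direction(monthly_sales, 3))
```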