Introduction
KDD process
Architecture: Typical Data Mining System
repositories
We are drowning in data, but starving for knowledge! Solution: Data warehousing and data mining
Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
Alternative names
Data mining: a misnomer? Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
What is not data mining: (deductive) query processing; expert systems or small statistical programs
Data Access (1980s)
Business question: What were unit sales in New England last March?
Enabling technologies: faster and cheaper computers with more storage, relational databases

Business question: What were unit sales in New England last March? Drill down to Boston.
Enabling technologies: faster and cheaper computers with more storage, on-line analytical processing (OLAP), multidimensional databases, data warehouses

Enabling technologies: faster, cheaper computers with more storage, advanced computer algorithms
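The difference between the two business questions above is one of granularity: the first asks for a single regional total, the second breaks that same total out by city. The following is a minimal sketch in plain Python, not how a relational database or OLAP server would actually be queried. The `SALES` records, the function names, and all figures are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (region, city, month, units).
# The values are made up for illustration only.
SALES = [
    ("New England", "Boston", "March", 120),
    ("New England", "Boston", "March", 80),
    ("New England", "Hartford", "March", 50),
    ("New England", "Boston", "April", 70),
    ("Midwest", "Chicago", "March", 200),
]


def unit_sales(records, region, month):
    """Data-access style question: total units for one region and month."""
    return sum(units for r, _, m, units in records if r == region and m == month)


def drill_down(records, region, month):
    """Drill-down style question: the same total, broken out by city."""
    by_city = defaultdict(int)
    for r, city, m, units in records:
        if r == region and m == month:
            by_city[city] += units
    return dict(by_city)


total = unit_sales(SALES, "New England", "March")
per_city = drill_down(SALES, "New England", "March")
```

The per-city figures always sum to the regional total, which is what makes drilling down a refinement of the original question and not a new one.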
[Figure: the KDD process. Stages shown: databases, data cleaning, data integration, data mining, pattern evaluation]
October 24, 2012
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of the effort!)
Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation
Choosing the data mining function: summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
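The steps above can be sketched as a chain of small functions, each feeding the next. This is only an illustrative toy under stated assumptions: the `RAW` records, the age threshold of 40, the function names, and the `min_support` cutoff are all hypothetical, and the "mining" step is a simple frequency count standing in for a real algorithm.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "age": 23, "income": 30, "bought": "yes"},
    {"id": 2, "age": 25, "income": None, "bought": "yes"},
    {"id": 3, "age": 47, "income": 80, "bought": "no"},
    {"id": 4, "age": None, "income": 55, "bought": "no"},
    {"id": 5, "age": 52, "income": 90, "bought": "no"},
    {"id": 6, "age": 21, "income": 28, "bought": "yes"},
]


def select(records, fields):
    """Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Data reduction: replace the numeric age by a coarse age group."""
    return [("young" if r["age"] < 40 else "older", r["bought"]) for r in records]


def mine(items):
    """Data mining (stand-in): count how often each combination occurs."""
    return Counter(items)


def evaluate(patterns, n, min_support):
    """Pattern evaluation: keep patterns whose relative frequency is high enough."""
    return {p: c / n for p, c in patterns.items() if c / n >= min_support}


target = select(RAW, ["age", "bought"])
cleaned = clean(target)
reduced = transform(cleaned)
patterns = mine(reduced)
interesting = evaluate(patterns, len(reduced), min_support=0.5)
```

Note that selection happens before cleaning here, so a record with a missing `income` survives because that attribute was never selected; the order of the early steps affects how much data reaches the mining step.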
[Figure: architecture of a typical data mining system. Components shown: pattern evaluation, knowledge base, filtering, databases, data warehouse]
Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in large databases.
Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns in one step.
Commonly used techniques in data mining are artificial neural networks, decision trees, genetic algorithms, rule induction, data visualization, and the nearest neighbor method.
Classes: Stored data is used to locate data in predetermined groups.
Clusters: Data items are grouped according to logical relationships or consumer preferences.
Associations:
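Of the techniques listed above, the nearest neighbor method is the simplest to show in a few lines: a new item is assigned the class of the most similar stored example. The sketch below assumes Euclidean distance as the similarity measure and uses a hypothetical `TRAINING` set with made-up feature values and labels; it is an illustration, not a tuned classifier.

```python
import math

# Hypothetical labelled examples: (feature vector, class label).
TRAINING = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((5.0, 5.0), "high"),
    ((6.0, 5.5), "high"),
]


def nearest_neighbor(training, point):
    """Return the label of the stored example closest to `point`."""
    _, label = min(training, key=lambda example: math.dist(example[0], point))
    return label


label_a = nearest_neighbor(TRAINING, (1.2, 1.1))
label_b = nearest_neighbor(TRAINING, (5.5, 5.0))
```

This corresponds to the "classes" case, where the groups are predetermined by the labels; clustering differs in that the groups themselves have to be found from the data.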
R. Agrawal, J. Han, and H. Mannila. Readings in Data Mining: A Database Perspective. Morgan Kaufmann (in preparation)
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001
www.cs.uiuc.edu/~hanj/dmbook
www-courses.cs.uiuc.edu/~cs497jh/
www.cs.uiuc.edu/~hanj or www.dbminer.com