Chapter 1
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
© 2013 Han, Kamber & Pei. All rights reserved.
Introduction
- Alternative names
[Figure: process diagram. Labels: Databases, Data Cleaning, Data Integration, Selection]
[Figure: layered diagram, top to bottom]
- Decision Making
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Summary, Querying, and Reporting
- Data Preprocessing/Integration, Data Warehouses
- Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles shown beside the layers, top to bottom: End User, Business Analyst, Data Analyst, DBA
[Figure: processing pipeline]
Input Data -> Data PreProcessing -> Data Mining -> PostProcessing
- Data PreProcessing: data integration, normalization, feature selection, dimension reduction
- Data Mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
- PostProcessing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
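The three stages named in the pipeline above can be sketched end to end on a toy example. This is only an illustration, not the authors' method: the input values, the function names, and the choice of min-max normalization plus a z-score rule for outlier analysis are all assumptions made for this sketch.

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Preprocessing: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def find_outliers(values, threshold=2.0):
    """Mining: indices of values more than `threshold` population
    standard deviations away from the mean (an assumed, simple rule)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def summarize(values, outlier_indices):
    """Post-processing: present the mined result in readable form."""
    return [f"record {i}: value {values[i]} looks unusual" for i in outlier_indices]

raw = [10, 12, 11, 13, 12, 11, 95]   # hypothetical input data
scaled = min_max_normalize(raw)      # Data PreProcessing
outliers = find_outliers(scaled)     # Data Mining (outlier analysis)
report = summarize(raw, outliers)    # PostProcessing
print(report)
```

Each stage is a separate function so that one step (for example, a different normalization) can be swapped without touching the others.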
Data to be mined
- Database data (extended-relational, object-oriented, heterogeneous, legacy), data warehouse, transactional data, stream, spatiotemporal, time-series, sequence, text and web, multi-media, graphs & social and information networks
Knowledge to be mined (or: Data mining functions)
- Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
- Descriptive vs. predictive data mining
- Multiple/integrated functions and mining at multiple levels
Techniques utilized
- Data-intensive, data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance, etc.
Applications adapted
- Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
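To make one of the listed functions concrete, the sketch below illustrates association on transactional data by computing support for item pairs and confidence for a simple rule. The transactions and the helper names `pair_support` and `confidence` are hypothetical and chosen only for illustration; practical association mining uses more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical retail transactions (each one is a set of purchased items).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def pair_support(transactions):
    """Fraction of transactions that contain each pair of items."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}

def confidence(transactions, antecedent, consequent):
    """Among transactions containing `antecedent`, the fraction that
    also contain `consequent` (the rule antecedent -> consequent)."""
    matching = [items for items in transactions if antecedent in items]
    if not matching:
        return 0.0
    return sum(consequent in items for items in matching) / len(matching)

support = pair_support(transactions)
print(support[("bread", "milk")])                   # support of {bread, milk}
print(confidence(transactions, "bread", "milk"))    # confidence of bread -> milk
```

This is descriptive mining: it summarizes co-occurrence in the data at hand rather than predicting a label for new records.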
Multimedia database
Text databases
Typical methods
Typical applications:
Graph mining
- Finding frequent subgraphs (e.g., chemical compounds), trees (XML), substructures (web fragments)
Information network analysis
- Social networks: actors (objects, nodes) and relationships (edges)
  - e.g., author networks in CS, terrorist networks
- Multiple heterogeneous networks
  - A person could be in multiple information networks: friends, family, classmates, etc.
- Links carry a lot of semantic information: Link mining
Web mining
- Web is a big information network: from PageRank to Google
- Analysis of Web information networks
  - Web community discovery, opinion mining, usage mining, etc.
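Since the slides mention PageRank as an example of treating the Web as an information network, here is a minimal power-iteration sketch of the idea. The four-page link graph, the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions for illustration and not details given in the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively score nodes of a directed graph by link structure.

    `links` maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this page's rank evenly over its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical tiny web: pages are nodes, hyperlinks are directed edges.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, which shows how links carry information about importance.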
Evaluation of Knowledge
Introduction
Summary
[Figure: Data Mining and related fields: Applications, Algorithm, Pattern Recognition, Database Technology, Statistics, Visualization, High-Performance Computing]
Mining Methodology
User Interaction
- Interactive mining
PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), WSDM (2008), etc.
KDD Conferences
- ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
- SIAM Data Mining Conf. (SDM)
- (IEEE) Int. Conf. on Data Mining (ICDM)
- European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
- Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Int. Conf. on Web Search and Data Mining (WSDM)
Other related conferences
- DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, etc.
- Web and IR conferences: WWW, SIGIR, WSDM
- ML conferences: ICML, NIPS
- PR conferences: CVPR, etc.
Journals
- Data Mining and Knowledge Discovery (DAMI or DMKD)
- IEEE Trans. on Knowledge and Data Eng. (TKDE)
- KDD Explorations
Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
Statistics
Web and IR
Visualization
S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002.
T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003.
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996.
U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001.
J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann, 2011.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, 2009.
Y. Sun and J. Han. Mining Heterogeneous Information Networks. Morgan & Claypool, 2012.
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Wiley, 2005.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed. Morgan Kaufmann, 2005.
Summary