Indriana Hidayah
References
1. Witten, Ian H., and Eibe Frank. Data Mining: Practical
Machine Learning Tools and Techniques, 2nd edition.
Morgan Kaufmann Publishers. 2005.
2. Han, Jiawei, Micheline Kamber, and Jian Pei. Data
Mining: Concepts and Techniques, 3rd edition. Morgan
Kaufmann Publishers. 2012.
3. Liu, Bing. Web Data Mining: Exploring Hyperlinks,
Contents, and Usage Data. Springer. 2007.
Lecture plan
RPKPS (Rencana Program Kegiatan Pembelajaran Semester,
the semester learning activity program plan)
[Figure: data mining process diagram; surviving labels: Databases, Data Warehouse, Pattern evaluation]
9/13/2013 Data Mining: Concepts and Techniques 18
Another example: Directed marketing
(S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology.)
• Problem:
– A vast and increasing number of marketing campaigns
– Global competitive world
– Mass campaigns are ineffective
• Solution:
– Directed campaigns with a strict and rigorous selection of
contacts.
• Focus on targets that are presumably keener on that specific
product/service
• More efficient, reduction in costs and time
• The dataset:
– A Portuguese marketing campaign for bank deposit subscriptions.
– The collected dataset covers 17 campaigns that occurred between
May 2008 and November 2010, corresponding to a total of
79354 contacts.
– For each contact, the following were recorded:
• a large number of attributes
• the target variable (class attribute)
• There were 6499 successes (8% success rate).
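The quoted success rate can be checked directly from the two counts above. A minimal sketch in Python (the variable names are illustrative, not from the paper):

```python
# Counts taken from the dataset description above.
total_contacts = 79354
successes = 6499

# Fraction of contacts that ended in a deposit subscription.
success_rate = successes / total_contacts
print(f"{success_rate:.1%}")  # prints 8.2%
```

The slide rounds this to 8%. The classes are strongly imbalanced, which is worth keeping in mind when the model is evaluated.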
Steps
1. Goal definition
– To predict whether a client will subscribe to the deposit
– Classification task
2. Simple data pre-processing (Data Preparation phase)
– Non-conclusive instances were discarded, leading to a total of 55817
contacts.
– Attribute reduction, leading to 29 attributes and 1 class attribute
– Instances that contained missing values were discarded, leading to 45211
instances (5289 of which were successful, an 11.7% success rate).
3. Data mining step (Modeling phase), using Naive Bayes (NB), decision trees (DT) and support vector machines (SVM)
– dataset was randomly divided into training (2/3) and test (1/3) sets
4. Evaluation of the model
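The missing-value filter in step 2 and the random 2/3 to 1/3 split in step 3 might be sketched as follows. This is only an illustration under stated assumptions: the helper names `drop_missing` and `split_train_test`, the fixed seed, and the toy records are all hypothetical and are not taken from the paper or from the bank dataset.

```python
import random

def drop_missing(records):
    """Keep only records in which every attribute has a value."""
    return [r for r in records if all(v is not None for v in r.values())]

def split_train_test(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records, made up for illustration (not real campaign data).
records = [
    {"duration": 120, "month": "may", "subscribed": "no"},
    {"duration": 640, "month": "jun", "subscribed": "yes"},
    {"duration": None, "month": "jul", "subscribed": "no"},  # missing value
    {"duration": 85, "month": "aug", "subscribed": "no"},
    {"duration": 410, "month": "mar", "subscribed": "yes"},
    {"duration": 200, "month": None, "subscribed": "no"},    # missing value
    {"duration": 95, "month": "nov", "subscribed": "no"},
    {"duration": 30, "month": "may", "subscribed": "no"},
]

complete = drop_missing(records)
train, test = split_train_test(complete)
print(len(complete), len(train), len(test))  # prints 6 4 2
```

The test set is held back until step 4, so that the evaluation measures performance on contacts the model has not seen during training.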
Conclusion
• Call duration is the most
relevant feature, meaning
that longer calls tend to
increase successes.
• In second place comes the
month of contact.
• Success is most likely to
occur in the last month of
each trimester (March, June,
September and December).
• Such knowledge can be
used to shift campaigns to
occur in those months.
Data Mining: On What Kind of
Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
Functionality
Knowledge produced by data mining
In data mining terms, knowledge means a useful pattern
The pattern should be
Useful
Valid
Understandable
Pattern types that can be produced by data mining
methods:
Frequent pattern, association, correlation
Data characterization and discrimination
Classification and prediction
Cluster analysis
Frequent pattern, association,
correlation