
Data Mining:

Concepts and Techniques


Text Book:
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann/Elsevier Science publishers

July 2010

Data Mining: Concepts and Techniques


Why Data Mining?

- The explosive growth of data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks
    - Science: remote sensing, bioinformatics, scientific simulation
    - Society and everyone: news, digital cameras, YouTube
- We are drowning in data, but starving for knowledge!
- "Necessity is the mother of invention": data mining is the automated analysis of massive data sets


Syllabus
MCOMP - 502: DATA WAREHOUSING & DATA MINING

Unit-I

Introduction:
- Motivations
- Data mining on different kinds of data
- Data mining functionalities
- Data mining task primitives
- Classification of data mining systems
- Major issues in data mining

Data Preprocessing:
- Need for data preprocessing
- Descriptive data summarization
- Data cleaning
- Data integration and transformation
- Data reduction
- Data discretization and concept hierarchy generation

Unit-II

Data Warehouse and OLAP Technology for Data Mining:
- Definition of data warehouse
- A multidimensional data model
- Data warehouse architecture
- Data warehouse implementation
- From data warehousing to data mining

Data Cube Computation and Data Generalization:
- Efficient methods for data cube computation
- Further development of data cube and OLAP technology
- Attribute-oriented induction: an alternative method for data generalization and concept description

Unit-III

Mining Frequent Patterns, Associations and Correlations:
- Basic concepts
- Efficient and scalable frequent itemset mining methods
- Mining various kinds of association rules
- From association mining to correlation analysis
- Constraint-based association mining

Unit-IV

Classification and Prediction:
- Definition of classification and prediction
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification
- Classification by backpropagation
- Classification by association rule analysis
- Lazy learners
- Other classification methods
- Prediction
- Classification accuracy and error measures

Cluster Analysis:
- Definition of cluster
- Types of data in cluster analysis
- A categorization of major clustering methods
- Partitioning methods
- Hierarchical methods
- Density-based methods
- Grid-based methods
- Model-based clustering methods
- Outlier analysis

Unit-V

Applications and Trends in Data Mining:
- Mining data streams
- Mining time-series data
- Mining sequence patterns in transactional databases
- Mining sequence patterns in biological data
- Graph mining
- Spatial data mining
- Multimedia data mining
- Text mining
- Mining the World Wide Web
- Data mining applications and trends

Chapter 1. Introduction

- Motivation: Why Data Mining?
- What Is Data Mining?
- Data Mining: On What Kind of Data?
- A Multi-Dimensional View of Data Mining
- Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
- Evaluation of Knowledge
- Data Mining Task Primitives
- Classification of Data Mining Systems
- Major Issues in Data Mining
- Applications of Data Mining
- Major Challenges in Data Mining
- A Brief History of Data Mining and Data Mining Society
- Summary

What Is Data Mining?

- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
- Watch out: Is everything "data mining"?
  - Simple search and query processing
  - (Deductive) expert systems


Evolution of Sciences

- Before 1600: empirical science
- 1600-1950s: theoretical science
  - Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
- 1950s-1990s: computational science
  - Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
  - Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
- 1990-now: data science
  - The flood of data from new scientific instruments and simulations
  - The ability to economically store and manage petabytes of data online
  - The Internet and computing Grid that makes all these archives universally accessible
  - Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
  - Data mining is a major new challenge!

Evolution of Database Technology

- 1960s:
  - Primitive file processing system
- 1970s:
  - Hierarchical and network database systems, data collection, database creation, relational database system
  - ER model, indexing, accessing, query language, forms, reports, and OLTP
  - Relational data model, relational DBMS implementation
- 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
- 2000s:
  - Stream data management and mining
  - Data mining and its applications


Knowledge Discovery (KDD) Process

- This is a view from typical database systems and data warehousing communities
- Data mining plays an essential role in the knowledge discovery process

[Figure: KDD process flow: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection and Transformation -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

KDD Process: A Typical View from ML and Statistics

[Figure: Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing]

- Data pre-processing: data integration, normalization, feature selection, dimension reduction
- Data mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
- Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization

This is a view from typical machine learning and statistics communities
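
The stages in this view can be sketched as a small Python pipeline. This is only a minimal sketch under assumptions: the record layout, the function names (`preprocess`, `mine`, `postprocess`), and the use of simple frequency counting as the "mining" step are all hypothetical and stand in for whatever techniques a real project would use.

```python
from collections import Counter

def preprocess(records):
    """Pre-processing: drop incomplete records and normalize text values."""
    return [
        {key: value.strip().lower() for key, value in record.items()}
        for record in records
        if all(value is not None and value.strip() for value in record.values())
    ]

def mine(records, field):
    """Mining (illustrative only): count how often each value of `field` occurs."""
    return Counter(record[field] for record in records)

def postprocess(counts, min_count):
    """Post-processing: keep only the patterns frequent enough to report."""
    return {value: n for value, n in counts.items() if n >= min_count}

# Hypothetical input data with inconsistent formatting and one missing value.
raw = [
    {"city": "Pune ", "item": "Laptop"},
    {"city": "pune", "item": "laptop"},
    {"city": "Delhi", "item": None},
    {"city": "Delhi", "item": "Phone"},
]

clean = preprocess(raw)
patterns = postprocess(mine(clean, "item"), min_count=2)
print(patterns)
```

Keeping each stage as a separate function mirrors the slide: the mining step can be swapped for classification or clustering without touching the cleaning or reporting code.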



Example: A Web Mining Framework

- Web mining usually involves
  - Data cleaning
  - Data integration from multiple sources
  - Warehousing the data
  - Data cube construction
  - Data selection for data mining
  - Data mining
  - Presentation of the mining results
  - Patterns and knowledge to be used or stored into a knowledge base

Data Mining in Business Intelligence

[Figure: layered pyramid; potential to support business decisions increases from bottom to top. Layers from top to bottom, with the typical role for each:]

- Decision Making (End User)
- Data Presentation: Visualization Techniques (Business Analyst)
- Data Mining: Information Discovery (Data Analyst)
- Data Exploration: Statistical Summary, Querying, and Reporting (Data Analyst)
- Data Preprocessing/Integration, Data Warehouses (DBA)
- Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)

Example: Mining vs. Data Exploration

- Business intelligence view
  - Warehouse, data cube, reporting but not much mining
- Business objects vs. data mining tools
- Supply chain example: tools
- Data presentation
- Exploration

Example: Medical Data Mining

- Health care & medical data mining often adopted such a view in statistics and machine learning
- Preprocessing of the data (including feature extraction and dimension reduction)
- Classification and/or clustering processes
- Post-processing for presentation

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Machine Learning, Pattern Recognition, Statistics, Applications, Visualization, Algorithm, Database Technology, and High-Performance Computing]

Why Confluence of Multiple Disciplines?

- Tremendous amount of data
  - Algorithms must be highly scalable to handle data volumes such as terabytes
- High dimensionality of data
  - Micro-array data may have tens of thousands of dimensions
- High complexity of data
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data
  - Structured data, graphs, social networks and multi-linked data
  - Heterogeneous databases and legacy databases
  - Spatial, spatiotemporal, multimedia, text and Web data
  - Software programs, scientific simulations
- New and sophisticated applications

Applications of Data Mining (continued)

- Detecting inappropriate medical treatment
  - The Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving Australian $1m/yr).
- Detecting telephone fraud
  - Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
  - British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
- Retail
  - Analysts estimate that 38% of retail shrink is due to dishonest employees.

Applications of Data Mining

- Web page analysis: from web page classification, clustering to PageRank & HITS algorithms
- Collaborative analysis & recommender systems
- Basket data analysis to targeted marketing
- Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis
- Data mining and software engineering (e.g., IEEE Computer, Aug. 2009 issue)
- From major dedicated data mining systems/tools (e.g., SAS, MS SQL Server Analysis Manager, Oracle Data Mining Tools) to invisible data mining


Data Mining: On What Kinds of Data?

- Database-oriented data sets and applications
  - Relational database, data warehouse, transactional database
- Advanced data sets and advanced applications
  - Data streams and sensor data
  - Time-series data, temporal data, sequence data (incl. bio-sequences)
  - Structured data, graphs, social networks and multi-linked data
  - Object-relational databases
  - Heterogeneous databases and legacy databases
  - Spatial data and spatiotemporal data
  - Multimedia databases
  - Text databases
  - The World-Wide Web


Multi-Dimensional View of Data Mining

- Knowledge to be mined (or: data mining functions)
  - Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  - Descriptive vs. predictive data mining
  - Multiple/integrated functions and mining at multiple levels
- Data to be mined
  - Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
- Techniques utilized
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
- Applications adapted
  - Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.


Data Mining Function: (1) Generalization

- Information integration and data warehouse construction
  - Data cleaning, transformation, integration, and multidimensional data model
- Data cube technology
  - Scalable methods for computing (i.e., materializing) multidimensional aggregates
  - OLAP (online analytical processing)
- Multidimensional concept description: characterization and discrimination
  - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet region
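
To make "materializing multidimensional aggregates" concrete, here is a minimal Python sketch of a roll-up over a tiny fact table. The table contents, the `roll_up` helper, and the choice of `sum` as the aggregate are assumptions made for illustration; real data cube systems use far more scalable methods.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, sales amount).
sales = [
    ("North", "Q1", 100),
    ("North", "Q2", 150),
    ("South", "Q1", 80),
    ("South", "Q2", 120),
]

def roll_up(facts, dims):
    """Sum the measure while keeping only the dimensions listed in `dims`.

    `dims` holds the positions of the dimensions to keep; every other
    dimension is aggregated away (rolled up to "all").
    """
    totals = defaultdict(int)
    for *keys, measure in facts:
        totals[tuple(keys[i] for i in dims)] += measure
    return dict(totals)

by_region = roll_up(sales, dims=[0])    # one cell per region
by_quarter = roll_up(sales, dims=[1])   # one cell per quarter
grand_total = roll_up(sales, dims=[])   # the apex of the cube
print(by_region, by_quarter, grand_total)
```

Each call computes one cuboid of the cube; an OLAP system would precompute or cache some of these so that summaries at different levels can be answered quickly.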

Data Mining Function: (2) Association and Correlation Analysis

- Frequent patterns (or frequent itemsets)
  - What items are frequently purchased together in your Walmart?
- Association, correlation vs. causality
  - A typical association rule
    - Computer => Antivirus software [support = 2%; confidence = 60%]
  - Are strongly associated items also strongly correlated?
- How to mine such patterns and rules efficiently in large datasets?
- How to use such patterns for classification, clustering, and other applications?
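
The support and confidence of a rule such as Computer => Antivirus software can be computed directly from transaction data. The sketch below is a minimal illustration on a hypothetical five-basket data set (so the numbers differ from the 2% and 60% on the slide). It also computes lift, a commonly used correlation measure that the slides do not name, as one possible way to ask whether associated items are also correlated.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; 1.0 suggests independence."""
    return confidence(transactions, antecedent, consequent) / support(
        transactions, consequent
    )

# Hypothetical market baskets.
baskets = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "printer"},
    {"computer", "printer"},
    {"computer"},
    {"printer"},
]

rule_support = support(baskets, {"computer", "antivirus"})
rule_confidence = confidence(baskets, {"computer"}, {"antivirus"})
rule_lift = lift(baskets, {"computer"}, {"antivirus"})
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 hints at positive correlation between the two sides of the rule; it says nothing about causality, which is the distinction the slide draws.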



Data Mining Function: (3) Classification

- Classification and label prediction
  - Construct models (functions) based on some training examples
  - Describe and distinguish classes or concepts for future prediction
    - E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  - Predict some unknown class labels
- Typical methods
  - Decision trees, naive Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, ...
- Typical applications:
  - Credit card fraud detection, direct marketing, classifying stars, diseases, web pages, ...


Data Mining Function: (4) Cluster Analysis

- Unsupervised learning (i.e., class label is unknown)
- Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns
- Principle: maximizing intra-class similarity & minimizing interclass similarity
- Many methods and applications
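
One way to see the clustering principle in action is a small k-means sketch in Python. K-means is only one of the many methods the slide alludes to; the house coordinates, the choice of k = 2, and the deterministic initialization (the first k points) are assumptions made to keep the example short and repeatable.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10):
    """Group `points` into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # simplistic initialization, an assumption
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [
            centroid(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical house locations on a map grid.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

centroids, clusters = kmeans(houses, k=2)
print(clusters)
```

No class labels are supplied anywhere: the two groups emerge purely from the distances between points, which is what makes this unsupervised.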


Data Mining Function: (5) Outlier Analysis

- Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - Noise or exception? One person's garbage could be another person's treasure
  - Methods: by-product of clustering or regression analysis, ...
  - Useful in fraud detection, rare events analysis
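
The idea of finding outliers as a by-product of regression analysis can be sketched in Python as follows: fit a least-squares line, then flag points whose residual is unusually large. The data values, the `factor` threshold of twice the root-mean-square residual, and the function names are assumptions chosen for illustration, not a recommended detection rule.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual_outliers(xs, ys, factor=2.0):
    """Return the points whose residual exceeds `factor` times the RMS residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rms = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return [
        (x, y)
        for x, y, r in zip(xs, ys, residuals)
        if abs(r) > factor * rms
    ]

# Hypothetical measurements that follow y = 2x except for one point.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 30, 12, 14, 16]

outliers = residual_outliers(xs, ys)
print(outliers)
```

Whether the flagged point is noise to discard or an exception worth investigating (for example, a fraudulent transaction) is a judgment the method itself cannot make.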


Evaluation of Knowledge

- Is all mined knowledge (patterns) interesting?
  - One can mine a tremendous amount of patterns and knowledge
  - Some may fit only a certain dimension space (time, location, ...)
  - Some may not be representative, may be transient, ...
- Interestingness measures:
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm

Evaluation of Knowledge (continued)

- Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
- Evaluation of mined knowledge: directly mine only interesting knowledge?
  - Descriptive vs. predictive
  - Coverage
  - Typicality vs. novelty
  - Accuracy
  - Timeliness
  - etc.
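
Objective measures lend themselves to simple filtering. The Python sketch below keeps only the rules that meet minimum support and confidence thresholds; the `Rule` record, the sample rules, and the threshold values are hypothetical and serve only to illustrate the idea.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    """An association rule with its objective interestingness measures."""
    antecedent: str
    consequent: str
    support: float
    confidence: float

def interesting(rules, min_support, min_confidence):
    """Keep the rules that satisfy both objective thresholds."""
    return [
        rule
        for rule in rules
        if rule.support >= min_support and rule.confidence >= min_confidence
    ]

# Hypothetical mined rules.
rules = [
    Rule("computer", "antivirus software", 0.02, 0.60),
    Rule("computer", "printer", 0.01, 0.30),
    Rule("milk", "bread", 0.05, 0.40),
]

strong = interesting(rules, min_support=0.02, min_confidence=0.50)
print(strong)
```

Subjective measures such as unexpectedness or actionability depend on what the user already believes, so they cannot be reduced to fixed thresholds like these.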

Example:
Suppose, as a marketing manager of AllElectronics, you would like to classify customers based on their buying patterns. You are especially interested in those customers whose salary is no less than $40,000, and who have bought more than $1,000 worth of items, each of which is priced at no less than $100. In particular, you are interested in the customer's age, income, the types of items purchased, the purchase location, and where the items were made. You would like to view the resulting classification in the form of rules. This data mining query is expressed in DMQL as follows.

DMQL (Data Mining Query Language)

use database AllElectronics_db
use hierarchy location_hierarchy for T.branch, age_hierarchy for C.age
mine classification as promising_customers
in relevance to C.age, C.income, I.type, I.place_made, T.branch
from customer C, item I, transaction T
where I.item_ID = T.item_ID and C.cust_ID = T.cust_ID
    and C.income ≥ 40,000 and I.price ≥ 100
group by T.cust_ID
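
For readers more familiar with general-purpose code, the data selection that this query describes can be approximated in plain Python. This is a rough sketch, not an implementation of DMQL: the sample tables, the `task_relevant_data` function, and its parameter names are hypothetical, and the spending threshold is taken from the prose description of the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the AllElectronics tables.
customers = {  # cust_ID -> customer record
    "c1": {"age": 34, "income": 55000},
    "c2": {"age": 45, "income": 30000},
    "c3": {"age": 29, "income": 72000},
}
items = {  # item_ID -> item record
    "i1": {"type": "laptop", "price": 900, "place_made": "Japan"},
    "i2": {"type": "cable", "price": 20, "place_made": "China"},
    "i3": {"type": "camera", "price": 400, "place_made": "Germany"},
}
transactions = [  # (cust_ID, item_ID, branch)
    ("c1", "i1", "Pune"),
    ("c1", "i3", "Pune"),
    ("c1", "i2", "Mumbai"),
    ("c2", "i1", "Delhi"),
    ("c3", "i3", "Delhi"),
]

def task_relevant_data(min_income=40000, min_price=100, min_total=1000):
    """Join the three tables, apply the filters, and group rows by customer."""
    grouped = defaultdict(list)
    for cust_id, item_id, branch in transactions:
        customer, item = customers[cust_id], items[item_id]
        if customer["income"] >= min_income and item["price"] >= min_price:
            grouped[cust_id].append({
                "age": customer["age"],
                "income": customer["income"],
                "type": item["type"],
                "place_made": item["place_made"],
                "branch": branch,
                "price": item["price"],
            })
    # Keep customers who bought more than `min_total` worth of qualifying items.
    return {
        cust_id: rows
        for cust_id, rows in grouped.items()
        if sum(row["price"] for row in rows) > min_total
    }

relevant = task_relevant_data()
print(relevant)
```

The resulting rows carry the attributes named in the "in relevance to" clause and would be the input to a classifier; the mining step itself is not shown here.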


Classification of data mining systems

- General functionality
  - Descriptive data mining
  - Predictive data mining
- Different views, different classifications
  - Kinds of databases to be mined
  - Kinds of knowledge to be discovered
  - Kinds of techniques utilized
  - Kinds of applications adapted


Major issues in data mining

- Issues relating to the diversity of data types
  - Handling relational and complex types of data
  - Mining information from heterogeneous databases and global information systems (WWW)
- Issues related to applications and social impacts
  - Application of discovered knowledge
    - Domain-specific data mining tools
    - Intelligent query answering
    - Process control and decision making
  - Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
  - Protection of data security, integrity, and privacy

Integration of a Data Mining System

Integration of a Data Mining System with a Database or Data Warehouse System

- No coupling
- Loose coupling
- Semitight coupling
- Tight coupling


Major Challenges in Data Mining

- Efficiency and scalability of data mining algorithms
- Parallel, distributed, stream, and incremental mining methods
- Handling high dimensionality
- Handling noise, uncertainty, and incompleteness of data
- Incorporation of constraints, expert knowledge, and background knowledge in data mining
- Pattern evaluation and knowledge integration
- Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering, information networks
- Application-oriented and domain-specific data mining
- Invisible data mining (embedded in other functional modules)
- Protection of security, integrity, and privacy in data mining


A Brief History of Data Mining Society

- 1989 IJCAI Workshop on Knowledge Discovery in Databases
  - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
- 1991-1994 Workshops on Knowledge Discovery in Databases
  - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
- 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  - Journal of Data Mining and Knowledge Discovery (1997)
- ACM SIGKDD conferences since 1998 and SIGKDD Explorations
- More conferences on data mining
  - PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc.
- ACM Transactions on KDD starting in 2007

Conferences and Journals on Data Mining

- KDD conferences
  - ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  - SIAM Data Mining Conf. (SDM)
  - (IEEE) Int. Conf. on Data Mining (ICDM)
  - Conf. on Principles and Practices of Knowledge Discovery and Data Mining (PKDD)
  - Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
- Other related conferences
  - ACM SIGMOD
  - VLDB
  - (IEEE) ICDE
  - WWW, SIGIR
  - ICML, CVPR, NIPS
- Journals
  - Data Mining and Knowledge Discovery (DAMI or DMKD)
  - IEEE Trans. on Knowledge and Data Eng. (TKDE)
  - KDD Explorations
  - ACM Trans. on KDD

Summary

- Data mining: discovering interesting patterns from large amounts of data
- A natural evolution of database technology, in great demand, with wide applications
- A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
- Mining can be performed in a variety of information repositories
- Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
- Data mining systems and architectures
- Major issues in data mining

- What are the motivations for data mining?
- What are the challenges in data mining? Explain in detail.
- Discuss and explain the terms discrimination, generalization, and characterization.
- Explain the architecture of a typical DM system with a neat diagram.
- Explain the taxonomy of data mining tasks.
- Define each of the following data mining functionalities: characterization, discrimination, association and correlation analysis, classification, prediction, clustering, and evolution analysis.
- What are the various data mining functionalities?
- What are the measures of pattern interestingness?
- Explain various data mining task primitives, OR explain the different ways of user interaction with the data mining system.
- Discuss the issues related to integration of a data mining system with a database or data warehouse system.
- What is a data warehouse? How is a data warehouse different from a database? How are they similar?
- Briefly describe the following advanced database systems and applications: relational databases, transactional databases, object-relational databases, spatial databases, text databases, multimedia databases, stream data, the World Wide Web.
- Describe why concept hierarchies are important and useful in data mining.
- Discuss the differences between the following approaches: no coupling, loose coupling, semitight coupling, and tight coupling.
