
APPLICATIONS OF DATA MINING AND ISSUES IN DATA MINING
Applications:

Financial Data Analysis
Retail Industry
Telecommunication Industry
Biomedical and DNA Data Analysis
Intrusion Detection
Other Scientific Applications
Financial Data Analysis:
Financial Data
Collected from Banks and Financial Institutions
Usually complete and reliable
Design and construction of data warehouses for multi-dimensional data analysis and mining
Analyze changes by month, by region, and by sector, using measures such as max, min, total, average, and trend
Characteristic and comparative analysis, outlier analysis
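The roll-up described above (summarizing by month, region, or sector with max, min, total, and average) can be sketched in a few lines. This is a minimal illustration only: the records, the DIMENSIONS mapping, and the roll_up helper are hypothetical names, not part of the original material.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction records: (month, region, sector, amount).
records = [
    ("2024-01", "North", "Retail", 120.0),
    ("2024-01", "North", "Energy", 80.0),
    ("2024-01", "South", "Retail", 200.0),
    ("2024-02", "North", "Retail", 150.0),
    ("2024-02", "South", "Energy", 50.0),
]

# Position of each dimension inside a record (an assumption of this sketch).
DIMENSIONS = {"month": 0, "region": 1, "sector": 2}


def roll_up(rows, dims):
    """Group rows by the chosen dimensions and summarize the amounts."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        groups[key].append(row[3])
    return {
        key: {"max": max(v), "min": min(v), "total": sum(v), "average": mean(v)}
        for key, v in groups.items()
    }


by_month = roll_up(records, ["month"])
by_month_region = roll_up(records, ["month", "region"])
print(by_month[("2024-01",)])
print(by_month_region[("2024-01", "North")])
```

Passing a different list of dimensions changes the level of aggregation, which is the basic idea behind moving up and down a data cube.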
Loan payment and customer credit policy analysis
Feature selection and attribute relevance ranking (debt ratio, credit history, income, education level)
Loan granting policy can be adjusted
Low-risk customers are granted loans
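One common way to rank attribute relevance is information gain; the notes do not name a specific measure, so this choice is an assumption. The loan records and attribute values below are invented for illustration.

```python
from collections import Counter
from math import log2

# Hypothetical loan records; "repaid" is the class label.
loans = [
    {"debt_ratio": "low", "credit_history": "good", "repaid": "yes"},
    {"debt_ratio": "low", "credit_history": "bad", "repaid": "no"},
    {"debt_ratio": "high", "credit_history": "good", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "bad", "repaid": "no"},
    {"debt_ratio": "low", "credit_history": "good", "repaid": "yes"},
    {"debt_ratio": "high", "credit_history": "good", "repaid": "no"},
]


def entropy(rows, label):
    """Shannon entropy of the class label over the given rows."""
    counts = Counter(r[label] for r in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(rows, attribute, label):
    """Reduction in label entropy after splitting on one attribute."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset, label)
    return entropy(rows, label) - remainder


# Rank attributes from most to least relevant for predicting repayment.
ranking = sorted(
    ["debt_ratio", "credit_history"],
    key=lambda a: information_gain(loans, a, "repaid"),
    reverse=True,
)
print(ranking)
```

In this toy data credit history separates repaid from unrepaid loans better than debt ratio, so it ranks first.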

Classification and clustering of customers for targeted marketing
Customer group identification
Multidimensional clustering techniques
Can associate new customer with existing groups
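Associating a new customer with an existing group can be sketched as a nearest-centroid lookup. The centroids, feature choice, and group names here are hypothetical, and the clustering step that would produce them is assumed to have run already.

```python
from math import dist

# Hypothetical centroids from an earlier clustering run:
# (annual income in thousands, purchases per month).
centroids = {
    "budget": (25.0, 2.0),
    "regular": (55.0, 6.0),
    "premium": (120.0, 12.0),
}


def assign_group(customer, centroids):
    """Return the name of the group whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda name: dist(customer, centroids[name]))


print(assign_group((60.0, 5.0), centroids))
```

In practice the features would usually be scaled first, since income dominates the distance in this unscaled sketch.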
Detection of money laundering and financial crimes
Data from several sources is integrated
Data analysis tools can be used to detect unusual patterns
Data Visualization tools, Linkage Analysis tools
Classification tools, Clustering tools
Outlier Analysis tools
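As one possible outlier analysis tool, the sketch below flags unusual transfer amounts with a modified z-score based on the median absolute deviation. The notes do not prescribe a method; the 0.6745 constant and the 3.5 cutoff are a commonly used convention, and the transfer amounts are invented.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so nothing can be ranked as unusual.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical transfer amounts for one account.
transfers = [120, 95, 130, 110, 105, 9800, 115, 100]
print(mad_outliers(transfers))
```

The median-based score is used here because a single very large transfer would inflate an ordinary mean and standard deviation and partly hide itself.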

Retail Industry:
Sales data, customer shopping history, goods transportation, e-commerce
Mining can help to
Identify buying behaviour, discover shopping trends
Improve the quality of customer service, retain customers
Design and Construction of data warehouses
Several ways to design a warehouse
Entities involved: sales, customers, employees, goods transportation
Preliminary data mining exercises can help to guide the design process
They indicate which dimensions and levels to include and what preprocessing to perform

Multi-dimensional analysis of sales, customers, products, time and region
Multi-feature data cube
Visualization tools
Analysis of effectiveness of sales campaigns
Compare sales and transaction volume
Multidimensional analysis
Compare sales amount and number of transactions containing the same items before and after the campaign
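The before-and-after comparison can be sketched as a filter over transactions that contain the promoted items. The transaction layout and the campaign_summary helper are assumptions made for this example.

```python
# Hypothetical transactions: (period, set of items, sale amount).
transactions = [
    ("before", {"tea", "biscuits"}, 6.0),
    ("before", {"tea"}, 3.0),
    ("before", {"coffee"}, 4.0),
    ("after", {"tea", "biscuits"}, 5.5),
    ("after", {"tea", "biscuits", "milk"}, 7.0),
    ("after", {"tea"}, 2.5),
    ("after", {"coffee"}, 4.0),
]


def campaign_summary(rows, period, items):
    """Count and total the transactions in a period that contain all items."""
    matching = [
        amount for p, basket, amount in rows if p == period and items <= basket
    ]
    return {"transactions": len(matching), "sales": sum(matching)}


before = campaign_summary(transactions, "before", {"tea"})
after = campaign_summary(transactions, "after", {"tea"})
print(before, after)
```

A real comparison would also normalize for the length of each period and for seasonal effects, which this sketch ignores.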
Association Analysis

Identify items likely to be purchased together
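A minimal sketch of association analysis over item pairs, using the usual support and confidence measures. The baskets and thresholds are invented, and a full algorithm such as Apriori would also handle larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Support measures how often the pair appears at all; confidence measures how often the right-hand item appears given the left-hand one.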

Customer retention
Customer loyalty and trends
Sequential pattern mining
Adjust pricing strategy and goods range
Purchase recommendation and cross-reference of items
Recommender systems
Sales promotion by displaying deal information in association with items of interest

Telecommunication Industry:
Computer and Web data transmission, fax, mobile phone, telephone services
Multidimensional analysis of telecommunication data
Helps to identify and compare data traffic, system workload, resource usage, user group behavior, profit, etc.
Time-of-day usage patterns

Fraudulent pattern analysis
Identify fraudulent users and atypical usage patterns
Illegal customer account access
Automatic dial-out equipment
Switch and route congestion patterns

Multidimensional association and sequential pattern analysis
Usage patterns for a set of communication services by customer group and time of day
Sales Promotion
Mobile Telecommunication Services

Spatio-temporal data mining

Use of visualization tools

Biomedical and DNA Data Analysis:


Research in DNA analysis has led to
Development of new drugs
Cancer therapies
Human genome study
Discovery of genetic causes for many diseases
Genome research
Study of DNA sequences
Four nucleotides: adenine, cytosine, guanine, thymine
About 100,000 genes, each containing hundreds of nucleotides that can be combined in a very large number of ways
Identifying Gene Sequence patterns is challenging

Semantic integration of heterogeneous, distributed genome databases
Highly distributed generation and use of DNA data
Integrated data warehouses and distributed federated databases
Efficient Data Cleaning and Integration methods
Similarity search and comparison among DNA sequences
Gene sequences isolated from healthy and diseased tissues
Compare frequently occurring patterns in each class
Helps to identify the genetic factors of the disease and immune factors
Non-numeric nature of data poses difficulties
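Because the sequences are non-numeric, one simple workaround is to compare them through their k-mers (substrings of length k). The sketch below is an illustration under that assumption, with made-up sequences; real similarity search typically relies on alignment tools.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Count every substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def jaccard(seq_a, seq_b, k=3):
    """Similarity of two sequences as the overlap of their k-mer sets."""
    a, b = set(kmers(seq_a, k)), set(kmers(seq_b, k))
    return len(a & b) / len(a | b) if a | b else 0.0


def class_patterns(sequences, k=3, min_fraction=1.0):
    """k-mers present in at least min_fraction of the sequences of a class."""
    presence = Counter(kmer for s in sequences for kmer in set(kmers(s, k)))
    return {
        kmer for kmer, c in presence.items() if c / len(sequences) >= min_fraction
    }


# Hypothetical sequences from healthy and diseased tissue samples.
healthy = ["ATGCGTAC", "ATGCGTTC"]
diseased = ["ATGAGTAC", "TTGAGTAC"]

only_in_diseased = class_patterns(diseased) - class_patterns(healthy)
print(sorted(only_in_diseased))
print(jaccard(healthy[0], healthy[1]))
```

Patterns that are frequent in one class but not the other are candidates for further study, not evidence of a causal link.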
Association analysis: identification of co-occurring gene sequences
Diseases triggered by a combination of genes acting together
Association analysis helps to detect the kinds of genes that may co-occur
Study interactions and relationships between them

Path analysis: linking genes to different stages of disease development
Different genes become active at different stages of the disease
Develop drug interventions that target specific stages
Visualization tools and genetic data analysis
Complex gene structures presented as graphs, trees, and cuboids with visualization tools
Better understanding and support for interactive data exploration

Intrusion Detection:
Intrusions
Any set of actions that threaten the integrity, availability, or confidentiality of a network resource
Misuse detection: use patterns of well-known attacks to identify intrusions
Signatures must be updated
Classification based on known intrusions
E.g., three consecutive login failures indicate password guessing
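The login-failure signature above can be sketched as a per-user streak counter. The event format and the function name are assumptions for this example, not a real intrusion detection API.

```python
def password_guessing_suspects(events, max_failures=3):
    """Flag users with max_failures consecutive failed logins.

    events: iterable of (user, outcome) in time order, outcome "ok" or "fail".
    """
    streak = {}
    suspects = set()
    for user, outcome in events:
        if outcome == "fail":
            streak[user] = streak.get(user, 0) + 1
            if streak[user] >= max_failures:
                suspects.add(user)
        else:
            # A successful login resets the streak for that user.
            streak[user] = 0
    return suspects


# Hypothetical authentication log.
log = [
    ("alice", "fail"),
    ("bob", "fail"),
    ("alice", "ok"),
    ("bob", "fail"),
    ("alice", "fail"),
    ("bob", "fail"),
    ("alice", "fail"),
]
print(password_guessing_suspects(log))
```

Like any signature, this only catches the pattern it encodes, which is why the notes stress that signatures must be kept up to date.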

Anomaly detection: use deviation from normal usage patterns to identify intrusions
Any significant deviations from the expected behavior are reported as possible attacks

Data mining algorithms
Misuse detection
Training data labeled normal / intrusion
Classifier can be used to detect known intrusions
Classification algorithms, association rule mining
Anomaly detection
Builds models of normal behavior and detects significant deviations
Supervised: trained on data known to be normal
Unsupervised: no information about the training data
Classification, clustering

Association and correlation analysis
Finds relationships between system attributes describing the network data
Helps in selection of useful attributes
Analysis of stream data
Transient and dynamic nature of intrusions
An event may be normal on its own but malicious when viewed as part of a sequence
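The point that individually normal events can form a malicious sequence can be illustrated with an n-gram profile of normal sessions. This is a sketch under assumed event names and session data; the notes do not specify a sequence model.

```python
def ngrams(events, n=3):
    """Set of all runs of n consecutive events."""
    return {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}


def train_normal_profile(sessions, n=3):
    """Collect every n-gram observed in sessions known to be normal."""
    profile = set()
    for session in sessions:
        profile |= ngrams(session, n)
    return profile


def anomaly_score(session, profile, n=3):
    """Fraction of the session's n-grams never seen in normal behavior."""
    grams = ngrams(session, n)
    if not grams:
        return 0.0
    return len(grams - profile) / len(grams)


# Hypothetical sessions; every event name is an assumption of this sketch.
normal_sessions = [
    ["login", "read", "write", "logout"],
    ["login", "read", "read", "logout"],
    ["login", "write", "read", "logout"],
]
profile = train_normal_profile(normal_sessions)

# Each event below appears in normal sessions, but never in this order.
suspicious = ["login", "write", "write", "write", "logout"]
print(anomaly_score(suspicious, profile))
print(anomaly_score(normal_sessions[0], profile))
```

On a live stream the same score could be computed over a sliding window of recent events instead of a whole session.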
Distributed Data Mining
Analysis of data from several locations
Visualization and Querying tools

Data Mining in other Scientific Applications:

Old scenario: small, homogeneous data sets
Formulate hypothesis, build model, evaluate results

Current scenario: high-dimensional data, stream data, heterogeneous data (spatial, temporal)
Collect and store data, mine for new hypotheses, confirm with data or experimentation

Vast amounts of data have been collected from scientific domains
Climate and ecosystem modeling, chemical engineering, fluid dynamics, structural mechanics

Data warehouses and data preprocessing
Scientific applications need methods for integrating data from heterogeneous sources (e.g., a geospatial data warehouse) and for identifying events (e.g., in climate and ecosystem data)

Mining complex data types
Scientific data: semi-structured and unstructured
Multimedia and spatial data

Graph-based mining
Labeled graphs capture spatial, topological, geometric and other relational characteristics present in scientific data
Nodes: objects to be mined; edges: relationships
Scalable and efficient mining methods are needed
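A labeled graph can be represented as node labels plus labeled edges, and the simplest graph pattern to mine is a single labeled edge. The molecule-like graphs and the support threshold below are invented, and edges are assumed to be undirected.

```python
from collections import Counter

# Hypothetical labeled graphs: node id -> label, plus (u, v, edge label) edges.
graphs = [
    {"nodes": {1: "C", 2: "O", 3: "H"},
     "edges": [(1, 2, "double"), (1, 3, "single")]},
    {"nodes": {1: "C", 2: "O", 3: "N"},
     "edges": [(1, 2, "double"), (1, 3, "single")]},
    {"nodes": {1: "C", 2: "H"},
     "edges": [(1, 2, "single")]},
]


def edge_patterns(graph):
    """Set of (node label, edge label, node label) triples in one graph."""
    patterns = set()
    for u, v, label in graph["edges"]:
        # Sort the endpoint labels so that direction does not matter.
        a, b = sorted((graph["nodes"][u], graph["nodes"][v]))
        patterns.add((a, label, b))
    return patterns


def frequent_edge_patterns(graphs, min_support=2):
    """Edge patterns that occur in at least min_support graphs."""
    support = Counter(p for g in graphs for p in edge_patterns(g))
    return {p: c for p, c in support.items() if c >= min_support}


frequent = frequent_edge_patterns(graphs)
print(frequent)
```

Full frequent-subgraph mining grows such one-edge patterns into larger subgraphs, which is where the scalability concerns noted above come from.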

Visualization tools and domain-specific knowledge
High-level GUIs and visualization tools are needed
Integrated with existing domain-specific systems and database systems

Issues in Data Mining:


Mining methodology and user interaction
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad-hoc data mining
Expression and visualization of data mining results
Handling noise and incomplete data
Pattern evaluation

Issues relating to the diversity of data types
Handling relational and complex types of data
Mining information from heterogeneous databases and global information systems (WWW)

Performance and scalability
Efficiency and scalability of data mining algorithms
Parallel, distributed and incremental mining methods
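Incremental mining means updating results as new data arrives instead of rescanning everything. A minimal sketch for frequent single items follows; the class name and the batches are hypothetical.

```python
from collections import Counter


class IncrementalItemCounter:
    """Keeps item counts up to date as new batches of transactions arrive."""

    def __init__(self):
        self.counts = Counter()
        self.transactions = 0

    def update(self, batch):
        """Fold a new batch into the running counts without a rescan."""
        for basket in batch:
            # Count each item once per transaction.
            self.counts.update(set(basket))
            self.transactions += 1

    def frequent_items(self, min_support):
        """Items whose support over all data seen so far meets the threshold."""
        if self.transactions == 0:
            return set()
        return {
            item
            for item, c in self.counts.items()
            if c / self.transactions >= min_support
        }


miner = IncrementalItemCounter()
miner.update([["a", "b"], ["a", "c"]])
first = miner.frequent_items(0.6)
miner.update([["b", "c"], ["b"]])
second = miner.frequent_items(0.6)
print(first, second)
```

The same idea extends to parallel and distributed settings, since counts computed on separate partitions can be merged by adding the Counter objects together.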
