
Gene Classification using Pattern Discovery based Classifier Rules

Submitted by:
Monika K (731613104019)
Sumithra P (731613104042)
Vidhya D (731613104045)
Vinodini K (731613104050)

Under the guidance of:
Dr. G. Malathy, Associate Professor, Department of CSE

Data Mining
Objective: Fit data to a model
Potential Result: Higher-level meta information that
may not be obvious when looking at raw data
Similar terms
Exploratory data analysis
Data driven discovery
Deductive learning

Abstract
An associative classification framework for label
assignment is used to categorize high-dimensional
data.
Pattern-based sequence classification is carried out on
gene expression data.
Sequence Classification based on Refined Interesting
Patterns (SCRIP) is constructed to perform the category
assignment process.
Rule summary analysis, dimensionality reduction and
redundant timestamp analysis techniques are integrated
into the SCRIP scheme.
Introduction
Discover the frequent patterns using association rule
mining.
Frequent rules are discovered with minimum support and
confidence levels and frequent patterns are used as the
classifier rules.
Classification techniques are employed to assign
category information for the transactions.
The associative classification method integrates rule
mining and classification techniques in a single process.
Unlabeled transactions are analyzed with classifier rules.
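
The support and confidence measures mentioned above can be illustrated with a minimal sketch. The transactions, item names (such as "g1=high") and function names below are hypothetical and chosen only for illustration; they are not taken from the project.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / base


# Hypothetical labeled transactions: discretized gene values plus a class item.
transactions = [
    frozenset({"g1=high", "g2=low", "class=A"}),
    frozenset({"g1=high", "g2=high", "class=A"}),
    frozenset({"g1=low", "g2=low", "class=B"}),
    frozenset({"g1=high", "g2=low", "class=B"}),
]

# Rule under test: g1=high -> class=A
rule_support = support(frozenset({"g1=high", "class=A"}), transactions)
rule_confidence = confidence(
    frozenset({"g1=high"}), frozenset({"class=A"}), transactions
)
```

A rule is treated as frequent only when both values reach the chosen minimum thresholds.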
Literature Survey
Title: Enabling New Mobile Applications with Location Proofs
Authors & Year: S. Saroiu and A. Wolman, 2009
Mechanism: Killer application as location-enabled mobile handheld devices proliferate
Contribution: Sequence classification is performed with item sets
Limitations: Redundant timestamps are not handled

Title: Vogue: A Variable Order Hidden Markov Model with Duration based on Frequent Sequence Mining
Authors & Year: M.J. Zaki, C. D. Carothers and B. K. Szymanski, 2010
Mechanism: Variable Order Hidden Markov Model
Contribution: Performs the frequent sequence mining operations
Limitations: Classification operations are not handled

Title: Temporal Skeletonization on Sequential Data: Patterns, Categorization and Visualization
Authors & Year: Chuanren Liu, Kai Zhang, Hui Xiong, Guofei Jiang and Qiang Yang, 2016
Mechanism: Temporal skeletonization
Contribution: Data visualization and categorization
Limitations: Dimensionality detection is not performed

Title: Hierarchical Spatio-Temporal Pattern Discovery and Predictive Modeling
Authors & Year: Chung-Hsien Yu, Wei Ding, Melissa Morabito and Ping Chen, 2016
Mechanism: Ensemble Spatio-temporal Pattern (ESTP)
Contribution: Pattern discovery and prediction are performed on location and time based data values
Limitations: Redundant timestamps are not handled

Title: Mining Sequential Patterns for Classification
Authors & Year: D. Fradkin and F. Morchen, 2015
Mechanism: Branch-and-bound methods
Contribution: Classification is performed using sequential patterns
Limitations: High computational overhead

Title: Learning Sequential Classifiers from Long and Noisy Discrete-Event Sequences Efficiently
Authors & Year: G. Dafe, A. Veloso, M. Zaki and W. Meira Jr, 2014
Mechanism: Hidden Markov Models
Contribution: Sequence classification is performed in a noisy data environment
Limitations: Redundant timestamps are not handled

Title: CBC: An Associative Classifier with a Small Number of Rules
Authors & Year: H. Deng, G. Runger, E. Tuv and W. Bannister, 2014
Mechanism: Condition-based Tree (CBT)
Contribution: Association rule based classification is performed with a minimum ruleset
Limitations: High-dimensional data values are not handled

Title: Mining Compressing Sequential Patterns
Authors & Year: H.T. Lam, F. Moerchen, D. Fradkin and T. Calders, 2014
Mechanism: Encoding scheme
Contribution: The system discovers compressing sequential patterns
Limitations: Classification classes are not supported

Title: Mining User-Aware Rare Sequential Topic Patterns in Document Streams
Authors & Year: Jiaqi Zhu, Kaijun Wang, Yunkun Wu, Zhongyi Hu and Hongan Wang, 2016
Mechanism: User-Aware Rare Sequential Topic Patterns (URSTP)
Contribution: Sequential topic patterns are discovered in text documents
Limitations: The classification process is not handled

Title: Classification based on Association Rules: A Lattice-Based Approach
Authors & Year: L.T. Nguyen, B. Vo, T.-P. Hong and H. C. Thanh, 2012
Mechanism: Class Association Rule Miner
Contribution: Class association rules are discovered for data categorization
Limitations: Sequential classification is not supported
Existing System

The sequence classification methods are applied to
assign class labels for data sequences
The Sequence Classification based on Interesting
Patterns (SCIP) scheme is adapted to perform
classification on sequential data elements
Support and cohesion measures are estimated to
discover the interesting rules.
The SCIP scheme is built in four major steps:
interesting item set discovery, interesting
subsequence identification, rule pruning and
classifier building
The class rules are updated in the classifier to
perform the unlabeled transaction analysis tasks
The classifier is used in the class assignment process
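
How support and cohesion can combine into an interestingness score is sketched below. The cohesion definition used here (item set size divided by the length of the shortest window of the sequence that contains every item) is our reading of the pattern-based sequence classification literature and should be treated as an assumption; the function names and toy sequences are hypothetical.

```python
def shortest_window(itemset, sequence):
    """Length of the shortest contiguous window of `sequence` that contains
    every item of `itemset`, or None if some item never occurs."""
    best = None
    for start in range(len(sequence)):
        if sequence[start] not in itemset:
            continue
        seen = set()
        for end in range(start, len(sequence)):
            if sequence[end] in itemset:
                seen.add(sequence[end])
                if len(seen) == len(itemset):
                    length = end - start + 1
                    if best is None or length < best:
                        best = length
                    break
    return best


def interestingness(itemset, sequences):
    """Support times average cohesion (assumed definition, see lead-in)."""
    windows = [shortest_window(itemset, s) for s in sequences]
    windows = [w for w in windows if w is not None]
    if not windows:
        return 0.0
    support = len(windows) / len(sequences)
    cohesion = sum(len(itemset) / w for w in windows) / len(windows)
    return support * cohesion


# Toy sequences of symbols; positions stand in for timestamps.
sequences = [list("abc"), list("axxb"), list("cde")]
score = interestingness({"a", "b"}, sequences)
```

In this toy data the item set {a, b} occurs in two of three sequences, tightly in the first and spread out in the second, so the score is lower than its support alone.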
Drawbacks

Dimensionality reduction operations are not supported
in the system
Redundant timestamp values are not handled in the
sequence identification process
Rule refinement process is not handled
Classifier is built with noisy rule information
Proposed System

The Sequence Classification based on Refined
Interesting Patterns (SCRIP) scheme is employed for the
gene data classification process
Attribute groups are constructed to support the
dimensionality reduction process
The sequences are built with redundant timestamp
analysis
Support distribution based rule summarization model is
proposed for frequent item-set identification
Module Description
The system is designed to analyze yeast gene
expression data, and supports the analysis of
high-dimensional, dense data.
The system is divided into six major modules:
Gene Data Analysis
Relationship Analysis
Pattern Mining Process
Rule Summary Analysis
Classifier Building Process
Classification Process
Gene Data Analysis
This module analyzes yeast gene expression data values;
data cleaning is performed to correct noisy data
values.
The optimized gene data values are prepared with the
attribute group information
Values are extracted and then optimized using their
timestamps.
Redundant timestamp details are considered in the
sequence preparation process
The attribute summary shows the attribute group
names and attribute count details
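
The slides do not spell out how redundant timestamps are treated. One plausible reading, shown here purely as an assumption, is that consecutive time points carrying the same discretized expression value add no information and can be collapsed to the first of each run. The function name and the sample series are hypothetical.

```python
from itertools import groupby


def drop_redundant_timestamps(series):
    """Keep the first (timestamp, value) pair of each run of equal
    consecutive values; later pairs in the run are treated as redundant."""
    return [next(run) for _, run in groupby(series, key=lambda pair: pair[1])]


# Hypothetical expression profile of one gene: (time in minutes, level).
series = [(0, "low"), (10, "low"), (20, "high"), (30, "high"), (40, "low")]
compact = drop_redundant_timestamps(series)
```

The compacted series keeps only the time points at which the level changes, which shortens the sequences passed to the later mining steps.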
Relationship Analysis

The candidate set and item set values are
prepared with labels.
Attribute name, value and transaction labels
are used to prepare candidate sets.
Item sets are prepared using the combination
of candidate sets.
Frequency values are updated for each
candidate set and item set.
Candidate and item set details
Labeled Details
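
The candidate set and item set preparation described in this module can be sketched as follows. This is a minimal illustration, assuming each transaction is a dictionary of attribute values plus a class label; the data layout, the `max_size` parameter and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_item_sets(transactions, max_size=2):
    """Count how often each labeled item set occurs.

    Candidates are "attribute=value" strings; item sets are combinations
    of the candidates found in one transaction, keyed together with the
    transaction label.
    """
    counts = Counter()
    for attributes, label in transactions:
        candidates = sorted(f"{name}={value}" for name, value in attributes.items())
        for size in range(1, max_size + 1):
            for combo in combinations(candidates, size):
                counts[(combo, label)] += 1
    return counts


transactions = [
    ({"g1": "high", "g2": "low"}, "A"),
    ({"g1": "high", "g2": "high"}, "A"),
    ({"g1": "low", "g2": "low"}, "B"),
]
counts = count_item_sets(transactions)
```

Sorting the candidates gives every item set a single canonical key, so the same combination is never counted under two different orderings.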
Pattern Mining Process

The pattern mining process is applied to extract
frequent patterns
Minimum support and minimum confidence values
are used to fetch frequent rules
Rules are filtered on labeled patterns
The interesting pattern extraction also uses the
support and cohesion values
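
Filtering labeled patterns by minimum support and minimum confidence might look like the sketch below. The input format (a mapping from a pattern and label to its occurrence count), the thresholds and the function name are assumptions made for illustration.

```python
def frequent_rules(pattern_label_counts, total, min_support, min_confidence):
    """Return (pattern, label, support, confidence) for each rule that
    meets both thresholds. `total` is the number of transactions."""
    pattern_totals = {}
    for (pattern, _label), n in pattern_label_counts.items():
        pattern_totals[pattern] = pattern_totals.get(pattern, 0) + n

    rules = []
    for (pattern, label), n in pattern_label_counts.items():
        sup = n / total                     # share of all transactions
        conf = n / pattern_totals[pattern]  # share of the pattern's occurrences
        if sup >= min_support and conf >= min_confidence:
            rules.append((pattern, label, sup, conf))
    return rules


# Hypothetical counts over 5 labeled transactions.
counts = {
    (("g1=high",), "A"): 3,
    (("g1=high",), "B"): 1,
    (("g2=low",), "B"): 1,
}
rules = frequent_rules(counts, total=5, min_support=0.4, min_confidence=0.7)
```

Only the rule g1=high -> A survives here: the other two patterns occur too rarely to reach the support threshold.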
Rule Summary Analysis

Rule summary analysis is carried out on item sets
with support and confidence values
Support distribution is prepared with the support ratio
values
Rule summary is prepared for each support level
The rule refinement is carried out with rule summary
information
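
A support distribution with one summary per support level could be built along the following lines. The slides do not define the levels, so the fixed-width percentage bins, the summary fields and the function name are all assumptions.

```python
from collections import defaultdict


def rule_summary(rules, bin_percent=10):
    """Group rules into support levels of `bin_percent` points each and
    report the rule count and mean confidence per level."""
    levels = defaultdict(list)
    for _pattern, _label, sup, conf in rules:
        # Work in whole percentage points to avoid float binning errors.
        level = (round(sup * 100) // bin_percent) * bin_percent
        levels[level].append(conf)
    return {
        level: {"rules": len(confs), "mean_confidence": sum(confs) / len(confs)}
        for level, confs in sorted(levels.items())
    }


# Hypothetical rules: (pattern, label, support, confidence).
rules = [
    (("g1=high",), "A", 0.62, 0.9),
    (("g1=high", "g2=low"), "A", 0.65, 0.7),
    (("g2=low",), "B", 0.31, 0.8),
]
summary = rule_summary(rules)
```

Each key of `summary` is the lower bound of a support level in percent, so the first two rules fall into level 60 and the third into level 30.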
Classifier Building Process
The classifier construction process is used to build the
classifier with discriminative rules
The sequence construction process and rule
identification operation are called on the interesting
item set values
The classifier is constructed in two ways: with the
Sequence Classification based on Interesting Patterns
(SCIP) scheme and with the Sequence Classification
based on Refined Interesting Patterns (SCRIP) scheme
The Interesting Pattern (IP) method is applied to fetch
the patterns that are used for the classifier rules
The Refined Interesting Patterns (RIP) method is
used to discover the patterns for classifier rules
The RIP method eliminates the infrequent and
irrelevant rules from the discovered patterns
Support maximization threshold is used for the
refinement process
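
The refinement step is described only at a high level, so the following is a sketch under stated assumptions rather than the RIP method itself. Here "infrequent" is taken to mean support below a threshold, and "irrelevant" is taken to mean that a more general rule for the same label already has at least the same confidence. Both interpretations and all names are hypothetical.

```python
def refine_rules(rules, min_support):
    """Drop infrequent rules and rules made redundant by a more general
    rule with the same label and equal or higher confidence."""
    kept = []
    for pattern, label, sup, conf in rules:
        if sup < min_support:
            continue  # infrequent rule
        covered = any(
            other_label == label
            and other_sup >= min_support
            and set(other) < set(pattern)  # strictly more general pattern
            and other_conf >= conf
            for other, other_label, other_sup, other_conf in rules
        )
        if not covered:
            kept.append((pattern, label, sup, conf))
    return kept


# Hypothetical rules: (pattern, label, support, confidence).
rules = [
    (("g1=high",), "A", 0.6, 0.9),
    (("g1=high", "g2=low"), "A", 0.4, 0.85),
    (("g2=low",), "B", 0.1, 0.95),
    (("g1=low", "g2=low"), "B", 0.3, 1.0),
]
refined = refine_rules(rules, min_support=0.2)
```

In this example the second rule is dropped because the first is more general and more confident, and the third is dropped for low support, leaving two rules.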
Classification Process
The classification process is designed to perform the gene
expression categorization process
The Sequence Classification based on Interesting Patterns
(SCIP) and Sequence Classification based on Refined
Interesting Patterns (SCRIP) schemes are used in the
classification process
Redundant timestamp information is analyzed to correct
the sequence construction process
The pattern refinement technique reduces the classifier
rules
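
Applying classifier rules to an unlabeled transaction can be sketched as below. The choice to follow the single matching rule with the highest confidence (ties broken by support) and to fall back to a default class is one common strategy in associative classification; it is an assumption here, as are the names and sample rules.

```python
def classify(items, rules, default_label):
    """Label a transaction with the best matching rule, or with
    `default_label` when no rule pattern is contained in `items`."""
    item_set = set(items)
    matching = [rule for rule in rules if set(rule[0]) <= item_set]
    if not matching:
        return default_label
    # Rank by confidence first, then by support.
    best = max(matching, key=lambda rule: (rule[3], rule[2]))
    return best[1]


# Hypothetical rules: (pattern, label, support, confidence).
rules = [
    (("g1=high",), "A", 0.6, 0.9),
    (("g2=low",), "B", 0.3, 0.7),
]
label_both = classify(["g1=high", "g2=low"], rules, default_label="A")
label_one = classify(["g1=low", "g2=low"], rules, default_label="A")
label_none = classify(["g1=low", "g2=high"], rules, default_label="B")
```

A smaller, refined rule set shortens the matching loop, which is one way the pattern refinement step can reduce classification time.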
Performance Analysis

The SCIP method uses the labeled patterns for the
classification process.
Rule summary analysis is adapted in the SCRIP
method.
The gene expression classification is analyzed with
SCIP and SCRIP techniques.
The system is tested with three performance measures:
rule retrieval count, classification accuracy and time
complexity.
The rules are compared for the same support and
confidence values. The IP and RIP extraction methods
are analyzed with the rule retrieval rate measure.
The Refined Interesting Patterns (RIP) model reduces
irrelevant rules by 45% compared with the Interesting
Patterns (IP) model
The SCRIP model improves classification accuracy by
15% over the SCIP model
The SCRIP model reduces time complexity by 20%
compared with the SCIP model
Advantages

Rule retrieval rate is increased in the system


The classification process minimizes the time
complexity levels
The classification accuracy is improved in the system
The system supports redundant timestamp information
in interesting subsequence identification process
Applications

Healthcare data analysis


Bioinformatics
Market data analysis
Web data analysis
Conclusion

Pattern based sequence classification models are adapted
to classify gene expressions
Sequence Classification with Refined Interesting
Patterns (SCRIP) mechanism improves the classification
results
Dimensionality reduction and redundant time stamp
analysis operations are supported in the system
Time and accuracy parameters are estimated and
analyzed in the classification process
Future Work
The associative classification scheme can be
improved with weighted rule mining methods
The system can be upgraded with fuzzy logic,
Genetic Algorithm (GA) and optimization based
techniques
The system can be enhanced to support incremental
mining tasks
The classification scheme can be adapted to
categorize the text and web data values
References
Cheng Zhou, Boris Cule and Bart Goethals, Pattern Based Sequence Classification, IEEE Transactions on Knowledge and Data Engineering, May 2016
Chung-Hsien Yu, Wei Ding, Melissa Morabito and Ping Chen, Hierarchical Spatio-Temporal Pattern Discovery and Predictive Modeling, IEEE Transactions on Knowledge and Data Engineering, April 2016
D. Fradkin and F. Morchen, Mining Sequential Patterns for Classification, Knowledge and Information Systems, pp. 1-19, 2015
G. Dafe, A. Veloso, M. Zaki and W. Meira Jr, Learning Sequential Classifiers from Long and Noisy Discrete-Event Sequences Efficiently, Data Mining and Knowledge Discovery, pp. 1-24, 2014
H. Deng, G. Runger, E. Tuv and W. Bannister, CBC: An Associative Classifier with a Small Number of Rules, Decision Support Systems, Vol. 59, pp. 163-170, 2014
H.T. Lam, F. Moerchen, D. Fradkin and T. Calders, Mining Compressing Sequential Patterns, Statistical Analysis and Data Mining, 2014
P. Fournier-Viger, A. Gomariz, M. Campos and R. Thomas, Fast Vertical Sequential Pattern Mining Using Co-Occurrence Information, in Proc. 18th Pacific-Asia Conf. Knowl. Discovery Data Mining, 2014
