Académique Documents
Professionnel Documents
Culture Documents
RNA
Texts
Outline
Graph Classification
Graph pattern-based approach
Machine Learning approaches
Graph Clustering
Link-density-based approach
(A)
(B)
(C)
FREQUENT PATTERNS
(MIN SUPPORT IS 2)
(1)
(2)
Example
GRAPH DATASET
FREQUENT PATTERNS
(MIN SUPPORT IS 2)
Support calculation
embedding store or not
Discover order of patterns
path tree graph
9
Apriori-Based Approach
k-edge
(k+1)-edge
G1
G1
G
G2
G
Gn
G
Join
Gn
Subgraph
isomorphism
test
NP-complete
Prune
check the frequency of
each candidate
10
k-edge
duplicate
graph
G2
Gn
12
support is 5%
13
as G
Lossless compression
Still ensures that the mining result is complete
Graph Search
Querying graph databases:
Given a graph database and a query graph, find all the
graphs containing this query graph
query graph
graph database
15
Scalability Issue
Nave solution
Sequential scan (Disk I/O)
Subgraph isomorphism test (NP-complete)
16
Indexing Strategy
Graph (G)
If graph G contains query
graph Q, G should contain
any substructure of Q
Substructure
Remarks
17
Indexing Framework
Two steps in processing graph queries
substructures
Size-increasing support threshold
support
minimum
support threshold
size
19
(a) caffeine
(b) diurobromine
(c) sildenafil
QUERY GRAPH
20
corresponding vectors
Advantages
Easy to index
Fast
Rough measure
21
22
Approximate Search
Hard to build indices covering similar subgraphs
explosive number of subgraphs in databases
23
Outline
Graph Classification
Graph pattern-based approach
Machine Learning approaches
Graph Clustering
Link-density-based approach
F {g1,..., g n }
xi
is the frequency of
x {x1 ,..., xn },
g i in that graph
Extensions
Mining top-k discriminative patterns
Mining approximate/weighted discriminative patterns
Graph Kernels
Motivation:
Kernel based learning methods doesnt need to access data
points
They rely on the kernel function between the data points
Basic idea:
Map each graph to some significant set of patterns
Define a kernel on the corresponding sets of patterns
27
Kernel-based Classification
Random walk
Basic Idea: count the matching random walks between the two graphs
Marginalized Kernels
Grtner 02, Kashima et al. 02, Mah et al.04
and
and
features
A rule is a tuple
If a molecule contains substructure
Gain
Applying boosting
, it is classified as
Outline
Graph Classification
Graph pattern-based approach
Machine Learning approaches
Graph Clustering
Link-density-based approach
Graph Compression
Extract common subgraphs and simplify graphs by
these structures.
How can the structure be made clear?
An Example of Networks
How many clusters?
What size should they be?
What is the best
partitioning?
Should some points be
segregated?
society
E.g., Hermits know few people and belong to no group
Structure Similarity
The desired features tend to be captured by a measure
| (v) ( w) |
(v, w)
| (v) || ( w) |
Structural similarity is large for members of a clique and
Graph Mining
Frequent Subgraph
Mining (FSM)
Apriori
based
AGM
FSG
PATH
Pattern
Growth
based
gSpan
MoFa
GASTO
N FFSM
SPIN
Variant Subgraph
Pattern Mining
Applications of
Frequent Subgraph
Mining
Indexing
and
Search
Clustering
Coherent
Subgraph
mining
Closed
Dense
Classification
Subgraph CSA
Subgraph
CLAN
mining
Mining
Approximate
methods
SUBDUE
GBI
CloseGraph
CloseCut
Splat
CODENSE
Kernel Methods
(Graph Kernels)
GraphGrep
Daylight
gIndex
( Grafil)
37