
2010/03/17 Yi-Xian Lin

A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification
Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee
Accepted by IEEE Transactions on Knowledge and Data Engineering
Reporter: Yi-Xian Lin
National University of Tainan
Outline
Motivation & Objective
Feature Reduction
Feature Clustering
Fuzzy Feature Clustering
Text Classification
Experimental results
Advantages
Motivation & Objective
In text classification, the dimensionality of the feature vector is usually huge.
Problems of the existing feature clustering methods:
The desired number of extracted features has to be specified in advance.
When calculating similarities, the variance of the underlying cluster is not considered.
Goal: reduce the dimensionality of feature vectors for text classification and run faster.
Feature Reduction
Purpose
Reduce the classifier's computation load
Increase data consistency
Techniques
To eliminate redundant data
To find representative data
To reduce the dimensions of the feature sets
To find the set of vectors that best separates the patterns
Two ways of doing feature reduction: feature selection and feature extraction
Feature Reduction
Feature selection
Let the word set W = {w_1, w_2, ..., w_m} be the feature vector of the document set.
Find a new word set W' = {w'_1, w'_2, ..., w'_k}, k < m, with W' a subset of W.
W' is then used as the input for classification tasks.
Feature extraction
Extracted features are obtained by a projecting process through algebraic transformations.
Let a corpus of documents be represented as an n x m matrix X in R^{n x m}.
Find an optimal transformation matrix F* in R^{m x k}.
Feature Clustering
Feature clustering is an efficient approach for feature reduction.
It groups all features into clusters, where features in a cluster are similar to each other.
Let D be the matrix consisting of all the original documents with m features, and D' be the matrix consisting of the converted documents with the new k features.
The new feature set W' = {w'_1, w'_2, ..., w'_k} corresponds to a partition {W_1, W_2, ..., W_k} of the original feature set W.
Fuzzy Feature Clustering
A document set D of n documents d_1, d_2, ..., d_n.
Feature vector W of m words w_1, w_2, ..., w_m.
p classes c_1, c_2, ..., c_p.
Construct one word pattern for each word in W:
x_i = <x_{i1}, x_{i2}, ..., x_{ip}> = <P(c_1|w_i), P(c_2|w_i), ..., P(c_p|w_i)>
where
P(c_j|w_i) = ( Σ_{q=1}^{n} d_{qi} · δ_{qj} ) / ( Σ_{q=1}^{n} d_{qi} ), for 1 ≤ j ≤ p
with d_{qi} the number of occurrences of w_i in document d_q, and δ_{qj} = 1 if d_q belongs to class c_j, 0 otherwise.
Fuzzy Feature Clustering
Example:
x_6 = <P(c_1|w_6), P(c_2|w_6)>
P(c_2|w_6) = (1·0 + 2·0 + 0·0 + 1·0 + 1·1 + 1·1 + 1·1 + 1·1 + 0·1) / (1 + 2 + 0 + 1 + 1 + 1 + 1 + 1 + 0) = 4/8 = 0.50
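The word-pattern construction above can be sketched in a few lines of Python. The helper name and input layout are assumptions for illustration, not from the paper; the numbers reproduce the slide's worked value P(c_2|w_6) = 0.50.

```python
# Sketch of the word-pattern entry P(c_j | w_i); hypothetical helper name.
# d_counts[q] = occurrences of word w_i in document d_q,
# delta[q]    = 1 if document d_q belongs to class c_j, else 0.
def class_probability(d_counts, delta):
    """P(c_j | w_i) = sum_q d_qi * delta_qj / sum_q d_qi."""
    num = sum(d * dl for d, dl in zip(d_counts, delta))
    den = sum(d_counts)
    return num / den

# Numbers from the slide's example for P(c_2 | w_6):
counts = [1, 2, 0, 1, 1, 1, 1, 1, 0]
delta  = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(class_probability(counts, delta))  # 0.5
```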
Fuzzy Feature Clustering
Let G be a cluster containing q word patterns x_1, x_2, ..., x_q, where x_j = <x_{j1}, x_{j2}, ..., x_{jp}>, 1 ≤ j ≤ q.
The mean: m = <m_1, m_2, ..., m_p>, with m_i = ( Σ_{j=1}^{q} x_{ji} ) / |G|
The deviation: σ = <σ_1, σ_2, ..., σ_p>, with σ_i = sqrt( Σ_{j=1}^{q} (x_{ji} - m_i)² / |G| ), for 1 ≤ i ≤ p
The fuzzy similarity of a word pattern x to cluster G:
μ_G(x) = Π_{i=1}^{p} exp( -((x_i - m_i) / σ_i)² )
Fuzzy Feature Clustering
A word pattern close to the mean of a cluster is regarded as very similar to this cluster.
Example: suppose m_1 = <0.4, 0.6>, σ_1 = <0.3, 0.5>, and x = <0.2, 0.8>. Then
μ_{G_1}(x) = exp( -((0.2 - 0.4) / 0.3)² ) × exp( -((0.8 - 0.6) / 0.5)² ) = 0.6412 × 0.8521 = 0.5464
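The fuzzy similarity above is easy to sketch; the function name is a placeholder, and the call reproduces the slide's worked example.

```python
import math

def fuzzy_similarity(x, m, sigma):
    """mu_G(x) = prod_i exp(-((x_i - m_i) / sigma_i)^2)."""
    return math.prod(math.exp(-((xi - mi) / si) ** 2)
                     for xi, mi, si in zip(x, m, sigma))

# Slide example: m_1 = <0.4, 0.6>, sigma_1 = <0.3, 0.5>, x = <0.2, 0.8>
mu = fuzzy_similarity([0.2, 0.8], [0.4, 0.6], [0.3, 0.5])
print(round(mu, 4))  # 0.5464
```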
Fuzzy Feature Clustering
A predefined threshold ρ, 0 ≤ ρ ≤ 1.
If μ_{G_j}(x_i) ≥ ρ, x_i passes the similarity test on cluster G_j.
If the user intends to have larger clusters, give a smaller threshold.
Two cases may occur:
No existing fuzzy cluster on which x_i has passed the similarity test:
create a new cluster G_h, h = k + 1 (k is the number of currently existing clusters), with m_h = x_i and σ_h = σ_0 = <σ_0, ..., σ_0>, where σ_0 is a user-defined constant.
Fuzzy Feature Clustering
If there are existing clusters on which x_i has passed the similarity test, let cluster G_t be the cluster with the largest membership degree:
t = arg max_{1≤j≤k} μ_{G_j}(x_i)
Modification to cluster G_t, for 1 ≤ j ≤ p:
m_{tj} = (S_t · m_{tj} + x_{ij}) / (S_t + 1)
σ_{tj} = sqrt(A - B) + σ_0
where
A = ( S_t · (σ_{tj} - σ_0)² + S_t · m_{tj}² + x_{ij}² ) / (S_t + 1)
B = ( (S_t · m_{tj} + x_{ij}) / (S_t + 1) )²
and finally S_t = S_t + 1.
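The update equations can be checked with a small sketch. The function name is hypothetical, and the max(..., 0.0) guard against floating-point round-off is an addition, not in the slides; the call reproduces the worked example on the next slide (x_7 folded into G_1).

```python
import math

def update_cluster(m, sigma, S, x, sigma0):
    """Fold pattern x into a cluster with mean m, deviation sigma, size S."""
    new_m, new_sigma = [], []
    for mj, sj, xj in zip(m, sigma, x):
        A = (S * (sj - sigma0) ** 2 + S * mj ** 2 + xj ** 2) / (S + 1)
        B = ((S * mj + xj) / (S + 1)) ** 2
        new_m.append((S * mj + xj) / (S + 1))
        # A - B can dip slightly below 0 from round-off, so clamp it.
        new_sigma.append(math.sqrt(max(A - B, 0.0)) + sigma0)
    return new_m, new_sigma, S + 1

# G_1 = (m = <1.00, 0.00>, sigma = <0.5, 0.5>, S = 1), x_7 = <1.00, 0.00>
m, sigma, S = update_cluster([1.0, 0.0], [0.5, 0.5], 1, [1.0, 0.0], 0.5)
print(m, sigma, S)  # [1.0, 0.0] [0.5, 0.5] 2
```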
Fuzzy Feature Clustering
The order in which the word patterns are fed in influences the clusters obtained.
Sort all the patterns, in decreasing order, by their largest components.
Example: let x_1 = <0.1, 0.3, 0.6>, x_2 = <0.3, 0.3, 0.4>, x_3 = <0.8, 0.1, 0.1>.
The largest components in these word patterns are 0.6, 0.4, and 0.8.
The sorted list is 0.8, 0.6, 0.4, so the order of feeding is x_3, x_1, x_2.
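The sorting step is one line in Python; the example reuses the slide's three patterns.

```python
# Sort word patterns by their largest component, in decreasing order.
patterns = {
    "x1": [0.1, 0.3, 0.6],
    "x2": [0.3, 0.3, 0.4],
    "x3": [0.8, 0.1, 0.1],
}
order = sorted(patterns, key=lambda name: max(patterns[name]), reverse=True)
print(order)  # ['x3', 'x1', 'x2']
```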
Fuzzy Feature Clustering
Example: the order of feeding is x_5, x_7, x_10, x_1, x_4, x_9, x_2, x_3, x_8, x_6.
No clusters exist at the beginning (k = 0). Set σ_0 = 0.5 and ρ = 0.64.
Feeding x_5 creates cluster G_1:
cluster G_1: size S = 1, mean m = <1.00, 0.00>, deviation σ = <0.5, 0.5>
Fuzzy Feature Clustering
Feeding x_7 = <1.00, 0.00>: μ_{G_1}(x_7) = 1 ≥ ρ, so G_1 is updated:
m_11 = (1 · 1.00 + 1.00) / (1 + 1) = 1.00, m_12 = (1 · 0.00 + 0.00) / (1 + 1) = 0.00
m_1 = <1.00, 0.00>
A_11 = (1 · (0.5 - 0.5)² + 1 · 1.00² + 1.00²) / (1 + 1) = 1.00, B_11 = ((1 · 1.00 + 1.00) / (1 + 1))² = 1.00
A_12 = (1 · (0.5 - 0.5)² + 1 · 0.00² + 0.00²) / (1 + 1) = 0.00, B_12 = ((1 · 0.00 + 0.00) / (1 + 1))² = 0.00
σ_11 = sqrt(A_11 - B_11) + 0.5 = 0.5, σ_12 = sqrt(A_12 - B_12) + 0.5 = 0.5
σ_1 = <0.5, 0.5>, S_1 = 1 + 1 = 2
Fuzzy Feature Clustering
After self-constructing clustering
Similarities of patterns to clusters
Fuzzy Feature Clustering
Data transformation: D' = D T, where
D = [d_1 d_2 ... d_n]^T, D' = [d'_1 d'_2 ... d'_n]^T, and T is the m × k weighting matrix.
H-FFC (hard weighting): each word is only allowed to belong to one cluster, and so it only contributes to one new extracted feature:
t_ij = 1 if j = arg max_{1≤l≤k} μ_{G_l}(x_i), and t_ij = 0 otherwise.
Fuzzy Feature Clustering
H-FFC :
Fuzzy Feature Clustering
S-FFC (soft weighting): each word is allowed to contribute to all new extracted features, with the degrees depending on the values of the membership functions:
t_ij = μ_{G_j}(x_i)
M-FFC (mixed weighting): a combination of the hard-weighting approach and the soft-weighting approach:
t_ij = γ · t^H_ij + (1 - γ) · t^S_ij
where γ is a user-defined constant lying between 0 and 1.
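The three weighting schemes and the transformation D' = D T can be sketched as below. All names are hypothetical, and clusters are assumed to be given as (mean, deviation) pairs; this is an illustration of the formulas, not the authors' implementation.

```python
import math

def weighting_matrix(patterns, clusters, gamma):
    """Build hard (H), soft (S), and mixed (M) weighting matrices T (m x k)."""
    def mu(x, c):  # fuzzy similarity of pattern x to cluster c = (mean, dev)
        return math.prod(math.exp(-((xi - mi) / si) ** 2)
                         for xi, mi, si in zip(x, c[0], c[1]))
    T_S = [[mu(x, c) for c in clusters] for x in patterns]       # t_ij = mu_Gj(x_i)
    T_H = [[1.0 if j == max(range(len(row)), key=row.__getitem__) else 0.0
            for j in range(len(row))] for row in T_S]            # one-hot argmax
    T_M = [[gamma * h + (1 - gamma) * s for h, s in zip(hr, sr)]
           for hr, sr in zip(T_H, T_S)]                          # mixed weighting
    return T_H, T_S, T_M

def transform(D, T):
    """D' = D T: convert an n x m document matrix into an n x k one."""
    k = len(T[0])
    return [[sum(d[i] * T[i][j] for i in range(len(d))) for j in range(k)]
            for d in D]

# Hypothetical tiny example: 3 word patterns, 2 clusters.
clusters = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.5, 0.5])]
patterns = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
T_H, T_S, T_M = weighting_matrix(patterns, clusters, gamma=0.5)
print(transform([[1.0, 2.0, 3.0]], T_H))  # [[4.0, 2.0]]
```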
Fuzzy Feature Clustering
S-FFC :
Fuzzy Feature Clustering
M-FFC :
Text Classification
Training phase (p classifiers are constructed):
The training document data set goes through feature reduction and is split into training data sets for class 1, ..., class p.
The i-th classifier (SVM) is trained on the training data set for class i.
Classification phase:
An unknown pattern goes through the same feature reduction before being classified.
Text Classification
Training data set and target sets for SVMs:

class  target 1  target 2
C1     +1        -1
C1     +1        -1
C1     +1        -1
C1     +1        -1
C2     -1        +1
C2     -1        +1
C2     -1        +1
C2     -1        +1
C2     -1        +1

Target 1 is the training target set for class C1; target 2 is the training target set for class C2.
Text Classification
Training classifiers:
The reduced training matrix D'_H together with target set 1 is used to train classifier SVM1.
The reduced training matrix D'_H together with target set 2 is used to train classifier SVM2.
Feature reduction is likewise applied to an unknown pattern before it is fed to the trained classifiers.
Text Classification
Classify the unknown pattern d:
Trained classifier SVM1 outputs -1; trained classifier SVM2 outputs +1.
The unknown pattern d is therefore classified to class C2.
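The decision step above amounts to picking the class whose classifier responds most positively. A minimal sketch, assuming the p classifier outputs are already available (function and variable names are hypothetical):

```python
# One-vs-rest decision rule: assign d to the class whose SVM output is largest.
def classify(svm_outputs, class_names):
    best = max(range(len(svm_outputs)), key=lambda i: svm_outputs[i])
    return class_names[best]

# Slide example: SVM1 -> -1, SVM2 -> +1  =>  class C2
print(classify([-1, +1], ["C1", "C2"]))  # C2
```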
Experimental results
Performance measures:
p: number of classes
TP_i: true positives w.r.t. the i-th class
TN_i: true negatives w.r.t. the i-th class
FP_i: false positives w.r.t. the i-th class
FN_i: false negatives w.r.t. the i-th class

MicroP = Σ_{i=1}^{p} TP_i / Σ_{i=1}^{p} (TP_i + FP_i)
MicroR = Σ_{i=1}^{p} TP_i / Σ_{i=1}^{p} (TP_i + FN_i)
MicroF1 = 2 · MicroP · MicroR / (MicroP + MicroR)
MicroAcc = Σ_{i=1}^{p} (TP_i + TN_i) / Σ_{i=1}^{p} (TP_i + TN_i + FP_i + FN_i)
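The micro-averaged measures can be computed directly from per-class counts; the function name and the example counts below are made up for illustration.

```python
def micro_metrics(tp, tn, fp, fn):
    """Micro-averaged precision, recall, F1, and accuracy over p classes."""
    TP, TN, FP, FN = sum(tp), sum(tn), sum(fp), sum(fn)
    micro_p = TP / (TP + FP)
    micro_r = TP / (TP + FN)
    micro_f1 = 2 * micro_p * micro_r / (micro_p + micro_r)
    micro_acc = (TP + TN) / (TP + TN + FP + FN)
    return micro_p, micro_r, micro_f1, micro_acc

# Hypothetical counts for p = 2 classes:
p, r, f1, acc = micro_metrics(tp=[8, 6], tn=[10, 12], fp=[2, 2], fn=[2, 2])
print(round(p, 4), round(acc, 4))  # 0.7778 0.8182
```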
Experimental results
20 news groups data set
Number of classes 20
Number of
documents
20000
Proportion of
training documents
2/3
Proportion of
testing documents
1/3
Number of features 25718
Experimental results
Execution time (sec) of different methods on the 20 Newsgroups data
Experimental results
Microaveraged accuracy (%) of different methods on the 20 Newsgroups data
Experimental results
Microaveraged F1 (%) of M-FFC with different γ values for the 20 Newsgroups data
Advantages
FFC is a fuzzy self-constructing feature clustering algorithm, an incremental clustering approach for reducing the dimensionality of the features in text classification. It:
Determines the number of extracted features automatically
Matches membership functions closely with the real distribution of the training data
Runs faster than other methods
Extracts better features than other methods