k m
R F
*
2010 / 03 / 17 Yi - Xian Lin 6
Feature Clustering
Feature clustering is an efficient approach to feature reduction.
It groups all features into clusters such that the features within a cluster are similar to each other.
Let D be the matrix consisting of all the original documents with m features, and let D' be the matrix consisting of the converted documents with k new features.
Membership function of a cluster G for a word pattern x = <x_1, ..., x_p>, with cluster mean m and deviation σ:
$$\mu_G(\mathbf{x}) = \prod_{i=1}^{p} \exp\left[-\left(\frac{x_i - m_i}{\sigma_i}\right)^2\right]$$
(Cluster-update equations and a worked numerical example followed on these slides.)
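A minimal sketch of the cluster membership function in plain Python, assuming the product-of-exponentials form with a per-dimension mean and deviation. The function name `membership` and the sample values are hypothetical and chosen only for illustration.

```python
import math

def membership(x, mean, sigma):
    """Product over dimensions of exp(-((x_i - m_i) / sigma_i) ** 2)."""
    value = 1.0
    for x_i, m_i, s_i in zip(x, mean, sigma):
        value *= math.exp(-((x_i - m_i) / s_i) ** 2)
    return value

# Hypothetical two-dimensional word pattern and cluster parameters.
at_mean = membership([0.5, 0.5], [0.5, 0.5], [0.5, 0.5])   # pattern equal to the mean
off_mean = membership([1.0, 0.0], [0.5, 0.5], [0.5, 0.5])  # one deviation away per dimension
print(at_mean, off_mean)
```

The membership is 1 when the pattern coincides with the cluster mean and decays toward 0 as the pattern moves away from it.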
Fuzzy Feature Clustering
After self-constructing clustering
Similarities of patterns to clusters
Fuzzy Feature Clustering
Data transformation
H-FFC (hard weighting)
each word belongs to exactly one cluster, so it contributes to only one new extracted feature
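$$D' = DT$$
$$D = [\mathbf{d}_1\ \mathbf{d}_2\ \cdots\ \mathbf{d}_n]^T, \quad D' = [\mathbf{d}'_1\ \mathbf{d}'_2\ \cdots\ \mathbf{d}'_n]^T$$
$$t_{ij} = \begin{cases} 1, & \text{if } j = \arg\max_{1 \le \alpha \le k} \mu_{G_\alpha}(\mathbf{x}_i) \\ 0, & \text{otherwise} \end{cases}$$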
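The hard-weighting transformation can be sketched as follows, assuming the membership degree of every word in every cluster is already available as a nested list. The names `hard_weights` and `transform` and the small matrices are hypothetical; ties are broken in favor of the first maximal cluster, which is an assumption of this sketch.

```python
def hard_weights(memberships):
    """memberships[i][j] is the membership of word i in cluster j.
    Returns the 0/1 weighting matrix T with a single 1 per row."""
    T = []
    for row in memberships:
        best = max(range(len(row)), key=lambda j: row[j])  # first argmax on ties
        T.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return T

def transform(D, T):
    """Compute D' = D T for D (n x m) and T (m x k) stored as nested lists."""
    k = len(T[0])
    return [[sum(d[i] * T[i][j] for i in range(len(T))) for j in range(k)]
            for d in D]

# Hypothetical data: 3 words, 2 clusters, 2 documents.
G = [[0.9, 0.2], [0.1, 0.7], [0.6, 0.4]]
D = [[1, 0, 2], [0, 3, 1]]
T_hard = hard_weights(G)
D_new = transform(D, T_hard)
print(T_hard)
print(D_new)
```

Each reduced feature is then the sum of the original word counts assigned to that cluster.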
Fuzzy Feature Clustering
H-FFC :
Fuzzy Feature Clustering
S-FFC (soft weighting)
each word is allowed to contribute to all new extracted features, with the degrees depending on the values of the membership functions
$$t_{ij} = \mu_{G_j}(\mathbf{x}_i)$$
M-FFC (mixed weighting)
a combination of the hard-weighting approach and the soft-weighting approach
$$t_{ij} = \gamma\, t^{H}_{ij} + (1 - \gamma)\, t^{S}_{ij}$$
γ is a user-defined constant lying between 0 and 1
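A short sketch of the soft and mixed weighting schemes, assuming membership degrees are given as a nested list and that the mixing constant is passed as `gamma`. All function names and sample values here are hypothetical.

```python
def hard_weights(memberships):
    """One 1 per row, at the cluster with the largest membership."""
    T = []
    for row in memberships:
        best = max(range(len(row)), key=lambda j: row[j])
        T.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return T

def soft_weights(memberships):
    """Soft weighting uses the membership degrees themselves."""
    return [list(row) for row in memberships]

def mixed_weights(memberships, gamma):
    """Convex combination gamma * hard + (1 - gamma) * soft."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    H = hard_weights(memberships)
    S = soft_weights(memberships)
    return [[gamma * h + (1.0 - gamma) * s for h, s in zip(h_row, s_row)]
            for h_row, s_row in zip(H, S)]

# Hypothetical memberships of 2 words in 2 clusters.
G = [[0.9, 0.2], [0.1, 0.7]]
print(mixed_weights(G, 0.5))
```

With `gamma = 1` the mixed scheme reduces to hard weighting, and with `gamma = 0` it reduces to soft weighting.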
Fuzzy Feature Clustering
S-FFC :
Fuzzy Feature Clustering
M-FFC :
Text Classification
Training: training document data set → feature reduction → training data sets for class 1, ..., class p → train 1st, ..., p-th classifier (SVM)
Classification: unknown pattern → feature reduction → trained classifiers
p classifiers are constructed.
Text Classification
Training data set and target sets for SVMs
Class   Target 1   Target 2
C1      +1         -1
C1      +1         -1
C1      +1         -1
C1      +1         -1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1
Target 1: training target set for class C1
Target 2: training target set for class C2
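The target sets above follow a one-against-rest scheme: for each class, documents of that class get +1 and all others get -1. A minimal sketch, where the helper name `one_vs_rest_targets` is hypothetical:

```python
def one_vs_rest_targets(labels, classes):
    """Return, for each class, a list of +1/-1 targets aligned with labels."""
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

# Four documents of class C1 followed by five of class C2, as in the table.
labels = ["C1"] * 4 + ["C2"] * 5
targets = one_vs_rest_targets(labels, ["C1", "C2"])
print(targets["C1"])
print(targets["C2"])
```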
Text Classification
Training classifiers
$D'_H$ + target 1 → training classifier (SVM1)
$D'_H$ + target 2 → training classifier (SVM2)
Feature reduction for unknown pattern
Unknown pattern → unknown pattern after feature reduction
Text Classification
Classify the unknown pattern
Unknown pattern d → trained classifier (SVM1) outputs -1
Unknown pattern d → trained classifier (SVM2) outputs +1
Unknown pattern d is classified to class C2
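The decision step can be sketched as collecting every class whose classifier outputs +1. This assumes each trained classifier returns exactly +1 or -1 for the reduced pattern; the function name `assign_classes` is hypothetical, and the outputs are hard-coded to mirror the slide rather than produced by real SVMs.

```python
def assign_classes(outputs):
    """outputs maps a class name to the +1/-1 output of that class's classifier.
    Returns the classes whose classifier accepted the pattern."""
    return [c for c, o in outputs.items() if o == 1]

# Outputs for the unknown pattern d as shown on the slide: SVM1 -> -1, SVM2 -> +1.
assigned = assign_classes({"C1": -1, "C2": 1})
print(assigned)
```

Returning a list rather than a single label leaves room for a pattern accepted by several classifiers or by none.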
Experimental results
Performance measures
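$TP_i$: true positives with respect to the i-th class.
$TN_i$: true negatives with respect to the i-th class.
$FP_i$: false positives with respect to the i-th class.
$FN_i$: false negatives with respect to the i-th class.
$p$: number of classes.
$$\mathrm{MicroP} = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FP_i)}, \quad \mathrm{MicroR} = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FN_i)}$$
$$\mathrm{MicroF1} = \frac{2 \cdot \mathrm{MicroP} \cdot \mathrm{MicroR}}{\mathrm{MicroP} + \mathrm{MicroR}}, \quad \mathrm{MicroAcc} = \frac{\sum_{i=1}^{p} (TP_i + TN_i)}{\sum_{i=1}^{p} (TP_i + TN_i + FP_i + FN_i)}$$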
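The micro-averaged measures can be computed by summing the per-class counts before dividing. A minimal sketch, assuming the counts are supplied as (TP, TN, FP, FN) tuples and that no denominator is zero; the function name and the sample counts are hypothetical, not results from the experiments.

```python
def micro_metrics(counts):
    """counts: list of (TP, TN, FP, FN) tuples, one per class.
    Returns (MicroP, MicroR, MicroF1, MicroAcc)."""
    tp = sum(c[0] for c in counts)
    tn = sum(c[1] for c in counts)
    fp = sum(c[2] for c in counts)
    fn = sum(c[3] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

# Hypothetical counts for two classes.
micro_p, micro_r, micro_f1, micro_acc = micro_metrics([(8, 10, 2, 0), (7, 9, 1, 3)])
print(micro_p, micro_r, micro_f1, micro_acc)
```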