Article info
Article history:
Received 2 February 2012
Received in revised form 24 May 2012
Accepted 31 August 2012
Available online 23 September 2012
Keywords:
Meta-cognitive learning
Self-regulatory thresholds
Radial basis function network
Multi-category classification
Projection Based Learning
Abstract
This paper proposes a Meta-cognitive Radial Basis Function Network (McRBFN) and its Projection Based Learning (PBL) algorithm for classification problems in a sequential framework, referred to as PBL-McRBFN. McRBFN is inspired by human meta-cognitive learning principles and has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single-hidden-layer radial basis function network with an evolving architecture. In the cognitive component, the PBL algorithm computes the optimal output weights with least computational effort by finding the analytical minima of the nonlinear energy function. The meta-cognitive component controls the learning process in the cognitive component by choosing the best learning strategy for the current sample, and adapts the learning strategies by implementing self-regulation. In addition, sample overlapping conditions are considered for proper initialization of new hidden neurons, thus minimizing misclassification. The interaction of the cognitive and meta-cognitive components addresses the what-to-learn, when-to-learn and how-to-learn principles of human learning efficiently. The performance of PBL-McRBFN is evaluated on a set of benchmark classification problems from the UCI machine learning repository and on two practical problems, viz., acoustic emission signal classification and mammogram-based cancer classification. The statistical performance evaluation on these problems demonstrates the superior performance of the PBL-McRBFN classifier over results reported in the literature.
© 2012 Elsevier B.V. All rights reserved.
1. Introduction
Neural networks are powerful tools that can efficiently approximate complex nonlinear input-output relationships. Hence, over the last few decades neural networks have been extensively employed to solve real-world classification problems [1]. In a classification problem, the objective is to learn a decision surface that accurately maps an input feature space to an output space of class labels. Learning algorithms for different neural network architectures have been used in various problems in science, business, industry and medicine, including handwritten character recognition [2], speech recognition [3], biomedical diagnosis [4], bankruptcy prediction [5], text categorization [6] and information retrieval [7]. Among the various architectures reported in the literature, the Radial Basis Function (RBF) network has been gaining attention due to the localization property of the Gaussian function, and it is widely used in classification problems. Significant contributions to RBF learning algorithms for classification problems fall broadly into two categories. (a) Batch learning algorithms: gradient-descent-based learning was used to determine the network parameters [8]. Here, the complete training data are presented multiple times until the training error reaches its minimum. Alternatively, one can implement random input parameter selection with a least-squares solution for the output weights [9,10]. In both cases, the number of Gaussian functions required to approximate the true function is determined heuristically. (b) Sequential learning algorithms: the number of Gaussian neurons required to approximate the input-output relationship is determined automatically [11-15]. Here, the training samples are presented one-by-one and discarded after learning. The Resource Allocation Network (RAN) [11] was the first sequential learning algorithm introduced in the literature. RAN evolves the network architecture required to approximate the true function using a novelty-based neuron growth criterion. The Minimal Resource Allocation Network (MRAN) [12] uses a similar approach, but incorporates an error-based neuron growing/pruning criterion. Hence, MRAN determines a more compact network architecture than the RAN algorithm. The Growing and Pruning Radial Basis Function Network [13] selects growing/pruning criteria based on the significance of a neuron. A sequential learning algorithm using recursive least squares is presented in [14], referred to as the On-line Sequential Extreme Learning Machine (OS-ELM). OS-ELM chooses input weights randomly with a fixed number of hidden neurons and analytically determines the output weights using minimum-norm least squares. In the case of sparse and imbalanced data sets, the random
Fig. 1. (a) Nelson and Narens model of meta-cognition and (b) McRBFN model.
knowledge stored in the network. Similar works using meta-cognition in the complex domain are reported in [24,25]. The recently proposed Projection Based Learning in a meta-cognitive radial basis function network [26] addresses the above issues in batch mode, except for proper utilization of the past knowledge stored in the network, and has been applied to solve biomedical problems in [27-29]. In this paper, we propose a meta-cognitive radial basis function network and its fast and efficient projection based sequential learning algorithm.

There are several meta-cognition models available in human physiology, and a brief survey of various meta-cognition models is reported in [30]. Among the various models, the model proposed by Nelson and Narens [31] is simple and clearly highlights the various actions in human meta-cognition, as shown in Fig. 1(a). The model is analogous to meta-cognition in human beings and has two components, the cognitive component and the meta-cognitive component. The information flow from the cognitive component to the meta-cognitive component is considered monitoring, while the information flow in the reverse direction is considered control. The information flowing from the meta-cognitive component to the cognitive component either changes the state of the cognitive component or changes the cognitive component itself. Monitoring informs the meta-cognitive component about the state of the cognitive component, thus continuously updating the meta-cognitive component's model of the cognitive component, including the case of no change in state.
McRBFN is developed based on the Nelson and Narens meta-cognition model [31], as shown in Fig. 1(b). Analogous to that model, McRBFN has two components, namely the cognitive component and the meta-cognitive component. The cognitive component is a single-hidden-layer radial basis function network with an evolving architecture. The cognitive component learns from the training data by adding new hidden neurons and updating the output weights of hidden neurons to approximate the true function. The input weights of hidden neurons (center and width) are determined based on the training data, and the output weights of hidden neurons are estimated using the projection based sequential learning algorithm. When a neuron is added to the cognitive component, the input/hidden layer parameters are fixed based on the input of the sample, and the output weights are estimated by minimizing an energy function given by the hinge loss error as in [32]. The problem of finding the optimal weights is first formulated as a linear programming problem using the principles of minimization and real calculus [33,34]. The Projection Based Learning (PBL) algorithm then converts the linear programming problem into a system of linear equations and provides a solution for the optimal weights, corresponding to the minimum energy point of the energy function. The meta-cognitive component of McRBFN contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of the four strategies for each sample in the training data set.
sample, and c^t \in \{1, \dots, n\} is its class label, where n is the total number of classes. The coded class labels y^t = [y_1^t, \dots, y_j^t, \dots, y_n^t]^T \in R^n are given by

y_j^t = \begin{cases} 1 & \text{if } c^t = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \dots, n \qquad (1)
The objective of the McRBFN classifier is to approximate the underlying decision function that maps x^t \in R^m \to y^t \in R^n. McRBFN begins with zero hidden neurons and selects a suitable strategy for each sample to achieve this objective. In the next section, we describe the architecture of McRBFN and discuss each of these learning strategies in detail.
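The label coding in Eq. (1) can be sketched as follows; this is a minimal illustration, and the function name and the sample labels used below are ours, not the paper's.

```python
import numpy as np

def code_class_labels(c, n_classes):
    """Map integer class labels c^t in {1, ..., n} to coded targets
    y^t in {-1, +1}^n: y_j = 1 if c == j, else -1 (Eq. (1))."""
    y = -np.ones((len(c), n_classes))
    for t, ct in enumerate(c):
        y[t, ct - 1] = 1.0  # class labels are 1-indexed in the paper
    return y

# Example: three samples from a 3-class problem
Y = code_class_labels([1, 3, 2], 3)
```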
2.2. McRBFN architecture
McRBFN has two components, namely the cognitive component and the meta-cognitive component, as shown in Fig. 2. The cognitive component is a single-hidden-layer radial basis function network with an evolving architecture, starting from zero hidden neurons. The meta-cognitive component of McRBFN contains a dynamic model of the cognitive component, knowledge measures and self-regulated thresholds. The meta-cognitive component controls the learning process of the cognitive component by choosing one of four strategies for each sample in the training data set. When a new training sample is presented to McRBFN, the meta-cognitive component estimates the knowledge present in the new training sample with respect to the cognitive component. Based on this information, the meta-cognitive component controls the learning process of the cognitive component by selecting a suitable strategy for the current training sample, to properly address what-to-learn, when-to-learn and how-to-learn.
We present a detailed description of the cognitive and the metacognitive components of McRBFN in the following sections:
2.2.1. Cognitive component of McRBFN
The cognitive component of McRBFN is a single-hidden-layer feedforward radial basis function network with linear input and output layers. The neurons in the hidden layer of the cognitive component employ the Gaussian activation function. Without loss of generality, we assume that McRBFN has built K Gaussian neurons from t - 1 training samples. For a given input x^t, the predicted output of the jth output neuron (\hat{y}_j^t) of McRBFN is

\hat{y}_j^t = \sum_{k=1}^{K} w_{kj} h_k^t, \quad j = 1, \dots, n \qquad (2)

where w_{kj} is the weight connecting the kth hidden neuron to the jth output neuron, and h_k^t is the response of the kth hidden neuron to the input x^t, given by

h_k^t = \exp\left( -\frac{\| x^t - \mu_k^l \|^2}{(\sigma_k^l)^2} \right) \qquad (3)

where \mu_k^l \in R^m is the center and \sigma_k^l \in R^+ is the width of the kth hidden neuron. Here, the superscript l represents the corresponding class of the hidden neuron.
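The forward pass in Eqs. (2) and (3) can be sketched as follows; the centers, widths and weights below are illustrative values, not taken from the paper.

```python
import numpy as np

def rbf_response(x, centers, widths):
    """Gaussian response of each hidden neuron (Eq. (3)):
    h_k = exp(-||x - mu_k||^2 / sigma_k^2)."""
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / widths ** 2)

def predict_output(x, centers, widths, W):
    """Predicted outputs (Eq. (2)): y_hat_j = sum_k w_kj h_k."""
    return rbf_response(x, centers, widths) @ W

# K = 2 hidden neurons in a 2-D input space, n = 2 output classes
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 1.0])
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
y_hat = predict_output(np.array([0.0, 0.0]), centers, widths, W)
```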
The cognitive component uses the Projection Based Learning (PBL) algorithm for its learning process. The strategy proposed here is similar to the fast learning algorithm for single-layer neural networks in [33,34]. The PBL algorithm is described as follows.

Projection Based Learning algorithm: the PBL algorithm works on the principle of minimization of an energy function and finds the optimal network output parameters for which the energy function is minimum, i.e., the network reaches the minimum energy point of the energy function.
The energy function considered is the sum of the squared hinge loss errors at the McRBFN output neurons. The energy function for the ith sample is defined as

J_i = \frac{1}{2} \sum_{j=1}^{n} (e_j^i)^2, \quad i = 1, \dots, t \qquad (4)

where the hinge loss error e_j^i is

e_j^i = \begin{cases} 0 & \text{if } y_j^i \hat{y}_j^i > 1 \\ y_j^i - \hat{y}_j^i & \text{otherwise} \end{cases}, \quad j = 1, \dots, n \qquad (5)

The overall energy function is the sum of the sample energy functions over all t training samples:

J(W) = \frac{1}{2} \sum_{i=1}^{t} \sum_{j=1}^{n} (e_j^i)^2 \qquad (6)

Substituting \hat{y}_j^i = \sum_{k=1}^{K} w_{kj} h_k^i gives

J(W) = \frac{1}{2} \sum_{i=1}^{t} \sum_{j=1}^{n} \left( y_j^i - \sum_{k=1}^{K} w_{kj} h_k^i \right)^2 \qquad (7)

where h_k^i is the response of the kth hidden neuron for the ith training sample. The optimal output weights W^* \in R^{K \times n} are estimated such that the total energy reaches its minimum:

W^* := \arg \min_{W \in R^{K \times n}} J(W) \qquad (8)

Equating the first-order partial derivatives of J(W) with respect to the output weights to zero,

\frac{\partial J(W)}{\partial w_{pj}} = 0, \quad p = 1, \dots, K; \; j = 1, \dots, n \qquad (9)

i.e.,

-\sum_{i=1}^{t} h_p^i \left( y_j^i - \sum_{k=1}^{K} w_{kj} h_k^i \right) = 0 \qquad (10)

and rearranging yields the system of linear equations

\sum_{k=1}^{K} \left( \sum_{i=1}^{t} h_k^i h_p^i \right) w_{kj} = \sum_{i=1}^{t} h_p^i y_j^i, \quad p = 1, \dots, K; \; j = 1, \dots, n \qquad (11)

which can be written in matrix form as

A W = B \qquad (12)

where the projection matrix A \in R^{K \times K} has elements

a_{kp} = \sum_{i=1}^{t} h_k^i h_p^i, \quad k = 1, \dots, K; \; p = 1, \dots, K \qquad (13)

and the output matrix B \in R^{K \times n} has elements

b_{pj} = \sum_{i=1}^{t} h_p^i y_j^i, \quad p = 1, \dots, K; \; j = 1, \dots, n \qquad (14)

The optimal output weights corresponding to the minimum energy point of the energy function are then

W^* = A^{-1} B \qquad (15)
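The solution of Eqs. (12)-(15) amounts to solving the normal equations of a linear least-squares problem; a minimal sketch follows, with the hidden-response matrix `H` and targets `Y` generated synthetically for illustration.

```python
import numpy as np

def pbl_output_weights(H, Y):
    """Projection Based Learning solution (Eqs. (12)-(15)):
    A = H^T H (projection matrix), B = H^T Y (output matrix),
    W* = A^{-1} B gives the minimum of the energy function."""
    A = H.T @ H               # Eq. (13)
    B = H.T @ Y               # Eq. (14)
    return np.linalg.solve(A, B)  # W*, Eq. (15)

rng = np.random.default_rng(0)
H = rng.random((20, 4))              # responses of K=4 neurons to t=20 samples
W_true = rng.standard_normal((4, 3))
Y = H @ W_true                       # targets exactly realizable by the network
W = pbl_output_weights(H, Y)
```

Since the targets here are realizable, the recovered weights match `W_true` up to numerical precision, which illustrates that W* is the exact minimizer of the energy function.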
But the center vectors \mu_k and \mu_p are allocated based on the significant training samples selected for the addition of neurons; these significant samples are selected using the neuron growth criterion in Eq. (33). The neuron growth criterion uses the maximum hinge error (E^t) and the class-wise significance (\psi_c). \psi_c is defined such that a new neuron is added only when no existing neuron near the current sample produces a significant output for it. Hence, no two neuron centers are equal, and the responses of the kth and pth hidden neurons are not identical over all samples.

Proposition 2. The response of each hidden neuron is non-zero for at least a few samples.

Proof. Assume that the response of the kth hidden neuron is 0, i.e., h_k^i = 0 for all x^i. This is possible if and only if x^i \to \infty, or \mu_k^l \to \infty, or \sigma_k^l \to 0. The input variables x^i are normalized within a circle of radius 1 such that |x_j| < 1, j = 1, \dots, m. As shown in the overlapping conditions of the growth strategy in Section 2.2.3, hidden neuron centers are allocated based on the selected significant training samples, and widths are determined from inter/intra-class nearest-neuron distances, which are non-zero positive values. Hence, the response of each hidden neuron is non-zero for at least a few samples.

We state the following theorem using Propositions 1 and 2.
Theorem 1. The projection matrix A is a positive definite symmetric matrix, and hence it is invertible.

Proof. From the definition of the projection matrix A given in Eq. (13),

A_{pk} = \sum_{i=1}^{t} h_p^i h_k^i, \quad p = 1, \dots, K; \; k = 1, \dots, K \qquad (16)

The diagonal elements are

A_{kk} = \sum_{i=1}^{t} h_k^i h_k^i, \quad k = 1, \dots, K \qquad (17)

= \sum_{i=1}^{t} |h_k^i|^2 > 0 \qquad (18)

by Proposition 2. For the off-diagonal elements,

\sum_{i=1}^{t} h_k^i h_j^i = \sum_{i=1}^{t} h_j^i h_k^i \qquad (19)

From Eqs. (17) and (19), it can be inferred that the projection matrix A is a symmetric matrix.

A symmetric matrix is positive definite iff q^T A q > 0 for any q \neq 0. Consider the unit basis vector q_1 \in R^{K \times 1} such that q_{11} = 1 and q_{12} = \dots = q_{1K} = 0, i.e., q_1 = [1 \; 0 \; \dots \; 0]^T. Therefore q_1^T A q_1 = A_{11}. In Eq. (17) it was shown that A_{kk} > 0 for k = 1, \dots, K; therefore A_{11} > 0 and q_1^T A q_1 > 0. Similarly, for the unit basis vector q_k = [0 \; \dots \; 1 \; \dots \; 0]^T, the product q_k^T A q_k is given by

q_k^T A q_k = A_{kk} > 0, \quad k = 1, \dots, K \qquad (20)

For a vector expressed in the unit basis with coefficients t_k,

\left( \sum_{k=1}^{K} q_k t_k \right)^T A \left( \sum_{k=1}^{K} q_k t_k \right) = \sum_{k=1}^{K} |t_k|^2 A_{kk} \qquad (21)

As shown in Eq. (17), A_{kk} > 0, and |t_k|^2 > 0 is evident. Hence,

\sum_{k=1}^{K} |t_k|^2 A_{kk} > 0 \qquad (22)

Therefore A is a positive definite symmetric matrix, and hence it is invertible. Furthermore, the second-order partial derivative of the energy function is

\frac{\partial^2 J(W)}{\partial w_{pj}^2} = \sum_{i=1}^{t} h_p^i h_p^i = \sum_{i=1}^{t} |h_p^i|^2 > 0 \qquad (23)

so the solution W^* = A^{-1} B in Eq. (15) corresponds to the minimum of the energy function.
The predicted class label \hat{c}^t is obtained from the predicted output as

\hat{c}^t = \arg \max_{j \in 1, \dots, n} \hat{y}_j^t \qquad (25)

A classifier developed using the hinge loss error estimates the posterior probability more accurately than a classifier developed using the mean square error. Hence, in McRBFN, we use the hinge loss error. The maximum hinge error is

E^t = \max_{j \in 1, 2, \dots, n} | e_j^t | \qquad (26)
Confidence of the classifier (\hat{p}(c^t | x^t)): the confidence level of classification, or predicted posterior probability, is given as

\hat{p}(j | x^t) = \frac{\min(1, \max(-1, \hat{y}_j^t)) + 1}{2}, \quad j = \hat{c}^t \qquad (27)
Spherical potential (\psi(x^t)): following the kernel-based novelty measure in [35], the knowledge contained in a new sample x^t with respect to the K hidden neurons is measured by its spherical potential

\psi(x^t) = \left\| \varphi(x^t) - \frac{1}{K} \sum_{k=1}^{K} \varphi(\mu_k^l) \right\|^2 \qquad (28)

which, in terms of the Gaussian kernel h(\cdot, \cdot), expands to

\psi(x^t) = h(x^t, x^t) - \frac{2}{K} \sum_{k=1}^{K} h(x^t, \mu_k^l) + \frac{1}{K^2} \sum_{k,r=1}^{K} h(\mu_k^l, \mu_r^l) \qquad (29), (30)

From the above equation, we can see that for the Gaussian function the first term (h(x^t, x^t)) and the last term (\frac{1}{K^2} \sum_{k,r=1}^{K} h(\mu_k^l, \mu_r^l)) are constants. Since the potential is a measure of novelty, these constants may be discarded and the potential can be reduced to

\psi(x^t) = \frac{1}{K} \sum_{k=1}^{K} h(x^t, \mu_k^l) \qquad (31)

The class-wise significance \psi_c measures the potential of the sample with respect to the K_c hidden neurons belonging to its own class c^t:

\psi_c(x^t) = \frac{1}{K_c} \sum_{k=1}^{K_c} h(x^t, \mu_k^c) \qquad (32)
The spherical potential explicitly indicates the knowledge contained in the sample: a higher value of spherical potential (close to one) indicates that the sample is similar to the existing knowledge in the cognitive component, and a smaller value (close to zero) indicates that the sample is novel.
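The class-wise significance in Eq. (32) can be sketched as follows; the kernel width and the centers below are illustrative, not taken from the paper.

```python
import numpy as np

def gauss_kernel(a, b, sigma=1.0):
    """Gaussian kernel h(a, b) = exp(-||a - b||^2 / sigma^2)."""
    return np.exp(-np.sum((a - b) ** 2) / sigma ** 2)

def class_wise_significance(x, class_centers, sigma=1.0):
    """Reduced spherical potential over the K_c same-class centers
    (Eq. (32)): psi_c = (1/K_c) * sum_k h(x, mu_k^c)."""
    return np.mean([gauss_kernel(x, mu, sigma) for mu in class_centers])

centers = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
psi_near = class_wise_significance(np.array([0.0, 0.0]), centers)
psi_far = class_wise_significance(np.array([10.0, 10.0]), centers)
```

A sample near the existing centers yields a potential close to one (already-learned knowledge), while a distant sample yields a potential close to zero (novel).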
2.2.3. Learning strategies
The meta-cognitive component devises various learning strategies using the knowledge measures and self-regulated thresholds, which directly address the basic principles of self-regulated human learning (i.e., what-to-learn, when-to-learn and how-to-learn). The meta-cognitive part controls the learning process in the cognitive component. A new hidden neuron is added when

\hat{c}^t \neq c^t \;\; \text{OR} \;\; \left( E^t \geq \beta_a \;\; \text{AND} \;\; \psi_c(x^t) \leq \beta_c \right) \qquad (33)

where \beta_c is the meta-cognitive knowledge measurement threshold and \beta_a is the self-adaptive meta-cognitive addition threshold. The thresholds \beta_c and \beta_a allow samples with significant knowledge to be learned first, with the remaining samples used for fine tuning. If \beta_c is chosen close to zero and the initial value of \beta_a is chosen close to the maximum value of the hinge error, then very few neurons will be added to the network; such a network will not approximate the function properly. If \beta_c is chosen close to one and the initial value of \beta_a is chosen close to the minimum value of the hinge error, then the resultant network may contain many neurons with poor generalization ability. Hence, the meta-cognitive knowledge measurement threshold can be selected in the interval [0.3, 0.7], and the initial value of the self-adaptive meta-cognitive addition threshold is chosen accordingly.
(34)
(35)
The distances of the new training sample to the nearest intra-class and inter-class neuron centers are used to detect overlap; the inter-class distance is

d_I = \| x^t - \mu_{nrI} \| \qquad (36)

No overlapping with the inter-class: in this case, the new hidden neuron is initialized at the sample itself:

\mu_{K+1}^{c} = x^t \qquad (37)

\sigma_{K+1}^{c} = \| x^t - \mu_{nrS}^{c} \| \qquad (38)

where \mu_{nrS}^{c} is the center of the nearest intra-class neuron and \mu_{nrI} is the center of the nearest inter-class neuron.
Minimum overlapping with the inter-class: when a new training sample is close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is in the range 1-1.5, then the sample has minimum overlapping with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and towards the intra-class nearest neuron, and is initialized as

\mu_{K+1}^{c} = x^t + \zeta (\mu_{nrS}^{c} - \mu_{nrI}); \quad \sigma_{K+1}^{c} = \| \mu_{K+1}^{c} - \mu_{nrS}^{c} \| \qquad (39)

where \zeta is the center shift factor, which determines how far the center is shifted from the new training sample location. In our simulation studies the value of \zeta is fixed at 0.1.

Significant overlapping with the inter-class: when a new training sample is very close to the inter-class nearest neuron compared to the intra-class nearest neuron, i.e., the intra/inter-class distance ratio is more than 1.5, then the sample has significant overlapping with the other class. In this case, the center of the new hidden neuron is shifted away from the inter-class nearest neuron and is initialized as

\mu_{K+1}^{c} = x^t - \zeta (\mu_{nrI} - x^t) \qquad (40)

\sigma_{K+1}^{c} = \| \mu_{K+1}^{c} - \mu_{nrI} \| \qquad (41)

When the new hidden neuron (K + 1) is added, the projection matrix A and the output matrix B are augmented accordingly. The new elements of A are

a_{K+1,p} = \sum_{i=1}^{t} h_{K+1}^{i} h_p^{i}, \quad p = 1, \dots, K, \quad \text{where } h_p^{i} = \exp\left( -\frac{\| x^i - \mu_p^l \|^2}{(\sigma_p^l)^2} \right) \qquad (42)

a_{K+1,K+1} = \sum_{i=1}^{t} h_{K+1}^{i} h_{K+1}^{i} \qquad (43)

The output matrix B^t \in R^{(K+1) \times n} is updated as

B^t_{(K+1) \times n} = \begin{bmatrix} B^{t-1}_{K \times n} + (h^t)^T (y^t)^T \\ b_{K+1} \end{bmatrix} \qquad (44)

where

b_{K+1,j} = \sum_{i=1}^{t} h_{K+1}^{i} \tilde{y}_j^{i}, \quad j = 1, \dots, n \qquad (45)

and the pseudo class labels \tilde{y}_j^{i} are

\tilde{y}_j^{i} = \begin{cases} 1 & \text{if } c^i = j \\ -1 & \text{otherwise} \end{cases}, \quad j = 1, \dots, n \qquad (46)

The output weights are then estimated as

\begin{bmatrix} W_K^t \\ w_{K+1}^t \end{bmatrix} = \left( A^t_{(K+1) \times (K+1)} \right)^{-1} B^t_{(K+1) \times n} \qquad (47)

where W_K^t is the output weight matrix for the K existing hidden neurons and w_{K+1}^t is the vector of output weights for the new hidden neuron after its addition.
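A minimal sketch of the overlap-conditioned center/width initialization above; the distance-ratio thresholds follow the text, while the function and variable names are ours. The no-overlap case (ratio below 1) follows Eqs. (37) and (38).

```python
import numpy as np

def init_new_neuron(x, mu_intra, mu_inter, zeta=0.1):
    """Overlap-based initialization of a new hidden neuron (Eqs. (37)-(41)).
    mu_intra: nearest intra-class center; mu_inter: nearest inter-class center;
    zeta: center shift factor (0.1 in the paper's simulation studies)."""
    ratio = np.linalg.norm(x - mu_intra) / np.linalg.norm(x - mu_inter)
    if ratio < 1.0:
        # no overlapping: place the neuron on the sample itself
        center = x.copy()
        width = np.linalg.norm(x - mu_intra)
    elif ratio <= 1.5:
        # minimum overlapping: shift towards the intra-class neighbor
        center = x + zeta * (mu_intra - mu_inter)
        width = np.linalg.norm(center - mu_intra)
    else:
        # significant overlapping: shift away from the inter-class neighbor
        center = x - zeta * (mu_inter - x)
        width = np.linalg.norm(center - mu_inter)
    return center, width

c1, w1 = init_new_neuron(np.array([0.0, 0.0]),
                         np.array([0.1, 0.0]), np.array([5.0, 0.0]))
```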
For sequential learning, the inverse of the projection matrix is maintained recursively rather than recomputed from scratch. When a new neuron is added, the output weights W_K^t of the existing K neurons and the weights w_{K+1}^t of the new neuron are obtained recursively from W_K^{t-1}, (A^{t-1}_{KK})^{-1}, a_{K+1} and b_{K+1}, so that the full inverse in Eq. (47) need not be recomputed (Eqs. (50)-(53)).

When the sample is used to update the cognitive component parameters without adding a neuron, the projection matrix is updated as

A^t_{KK} = A^{t-1} + (h^t)^T h^t \qquad (48)

and its inverse (A^t_{KK})^{-1} is calculated using the matrix inversion lemma:

\left( A^t_{KK} \right)^{-1} = \left( A^{t-1} \right)^{-1} - \frac{ \left( A^{t-1} \right)^{-1} (h^t)^T h^t \left( A^{t-1} \right)^{-1} }{ 1 + h^t \left( A^{t-1} \right)^{-1} (h^t)^T } \qquad (49)

By substituting B^{t-1} = A^{t-1} W_K^{t-1} and A^t = A^{t-1} + (h^t)^T h^t \qquad (57)

into Eq. (56), the output weight update reduces to

W_K^t = W_K^{t-1} + \left( A^t \right)^{-1} (h^t)^T (e^t)^T \qquad (58)
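The rank-one inverse update in Eqs. (48)-(49) is an instance of the Sherman-Morrison formula; a minimal sketch follows, with illustrative matrix sizes and `numpy` assumed.

```python
import numpy as np

def sm_update_inverse(A_inv, h):
    """Update A^{-1} for A^t = A^{t-1} + h^T h (Eqs. (48)-(49)) via the
    matrix inversion (Sherman-Morrison) lemma. h is a row vector (1, K)."""
    Ah = A_inv @ h.T                          # (K, 1)
    denom = 1.0 + (h @ Ah).item()             # 1 + h A^{-1} h^T (scalar)
    return A_inv - (Ah @ (h @ A_inv)) / denom

rng = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng.random((3, 3))
A = A + A.T                                   # symmetric, diagonally dominant
h = rng.random((1, 3))
A_inv_new = sm_update_inverse(np.linalg.inv(A), h)
```

This costs O(K^2) per sample instead of the O(K^3) of a fresh inversion, which is what makes the sequential update cheap.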
where e^t is the hinge loss error for the tth sample, obtained from Eq. (5).

Sample reserve strategy: if the new training sample does not satisfy the deletion, neuron growth, or cognitive component parameter update criteria, then the current sample is pushed to the rear of the training sequence. Since McRBFN adapts its strategies based on the current sample knowledge, these samples may be used at a later stage.

Ideally, the training process stops when no further samples are available in the data stream. In practice, training stops when the set of samples in the reserve remains the same.
2.3. PBL-McRBFN classification algorithm

To summarize, the PBL-McRBFN algorithm is given in pseudo-code form in Pseudocode 1.

Pseudocode 1. PBL-McRBFN classification algorithm.

Sample Delete Strategy:
IF c-hat^t == c^t AND p-hat(c^t|x^t) >= beta_d THEN
    Delete the sample from the sequence without learning.
Neuron Growth Strategy:
ELSEIF c-hat^t != c^t OR (E^t >= beta_a AND psi_c(x^t) <= beta_c) THEN
    Add a neuron to the network (K = K + 1).
    Choose the parameters of the new hidden neuron using Eqs. (37) to (52).
    Update the self-adaptive meta-cognitive addition threshold according to Eq. (34).
Parameters Update Strategy:
ELSEIF c-hat^t == c^t AND E^t >= beta_u THEN
    Update the parameters of the cognitive component using Eq. (58).
    Update the self-adaptive meta-cognitive update threshold according to Eq. (54).
Sample Reserve Strategy:
ELSE
    The current sample (x^t, y^t) is pushed to the rear of the sample stack,
    to be used later to fine-tune the cognitive component parameters.
ENDIF
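The strategy-selection step of Pseudocode 1 can be sketched as a small pure function; the threshold names (`beta_*`) follow the self-regulated thresholds in the text, and the condition grouping assumes standard operator precedence (AND binding tighter than OR), which matches the growth criterion in Eq. (33).

```python
def select_strategy(c_true, c_pred, conf, E, psi_c,
                    beta_d, beta_a, beta_c, beta_u):
    """Meta-cognitive strategy selection for one sample (Pseudocode 1).
    conf: predicted posterior p(c^t|x^t); E: maximum hinge error;
    psi_c: class-wise spherical potential (low value = novel sample)."""
    if c_pred == c_true and conf >= beta_d:
        return "delete"          # sample carries no new knowledge
    if c_pred != c_true or (E >= beta_a and psi_c <= beta_c):
        return "grow"            # significant, novel sample: add a neuron
    if c_pred == c_true and E >= beta_u:
        return "update"          # refine output weights via Eq. (58)
    return "reserve"             # push to the rear for possible later use
```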
The energy function including the current (tth) sample is

J(W) = \frac{1}{2} \sum_{i=1}^{t} \sum_{j=1}^{n} \left( y_j^i - \sum_{k=1}^{K} w_{kj} h_k^i \right)^2 \qquad (55)

Equating the first partial derivative to zero and re-arranging Eq. (55), we get

\left( A^{t-1} + (h^t)^T h^t \right) W_K^t - \left( B^{t-1} + (h^t)^T (y^t)^T \right) = 0 \qquad (56)
In PBL-McRBFN, the sample delete strategy addresses what-to-learn by deleting insignificant samples from the training data set.
Table 1
Description of benchmark data sets selected from the UCI machine learning repository for the performance study.

Data set                No. of     No. of     No. of samples        I.F.
                        features   classes    Training   Testing    Training   Testing
IS                      19         7          210        2100       0          0
IRIS                    4          3          45         105        0          0
WINE                    13         3          60         118        0          0.29
VC                      18         4          424a       422        0.1        0.12
GI                      9          6          109a       105        0.68       0.77
HEART                   13         2          70         200        0.14       0.1
Liver disorders (LD)    6          2          200        145        0.17       0.14
PIMA                    8         2          400        368        0.22       0.39
Breast cancer (BC)      9          2          300        383        0.26       0.33
Ionosphere (ION)        34         2          100        251        0.28       0.28
The imbalance factor (I.F.) of a data set is defined as

\text{I.F.} = 1 - \frac{n}{N} \min_{j=1,\dots,n} N_j \qquad (59)

where N is the total number of samples and N_j is the number of samples belonging to class j. The class-level performance is measured using the percentage classification efficiency

\eta_j = \frac{q_{jj}}{N_j} \times 100\% \qquad (60)

where q_{jj} is the number of correctly classified samples of class j in the confusion matrix. The average and overall classification efficiencies are

\eta_a = \frac{1}{n} \sum_{j=1}^{n} \eta_j, \qquad \eta_o = \frac{\sum_{j=1}^{n} q_{jj}}{N} \times 100\% \qquad (61)
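The efficiencies in Eqs. (60) and (61) can be computed directly from a confusion matrix; the matrix below is an illustrative example, not taken from the paper.

```python
import numpy as np

def efficiencies(Q):
    """Average and overall testing efficiencies (Eqs. (60)-(61)) from an
    n x n confusion matrix Q (rows: true class, columns: predicted class)."""
    per_class = 100.0 * np.diag(Q) / Q.sum(axis=1)  # eta_j, Eq. (60)
    eta_a = per_class.mean()                         # average efficiency
    eta_o = 100.0 * np.trace(Q) / Q.sum()            # overall efficiency
    return eta_a, eta_o

Q = np.array([[45, 5],
              [10, 40]])
eta_a, eta_o = efficiencies(Q)
```

For imbalanced data sets the two measures diverge, which is why both are reported in Tables 2 and 3.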
Under the null hypothesis that all L classifiers are equivalent, their average ranks R_j = (1/M) \sum_i r_i^j over all M data sets should be equal; the Friedman statistic is given by

\chi_F^2 = \frac{12M}{L(L+1)} \left[ \sum_{j=1}^{L} R_j^2 - \frac{L(L+1)^2}{4} \right] \qquad (62)

The Iman-Davenport correction

F_F = \frac{(M-1) \chi_F^2}{M(L-1) - \chi_F^2} \qquad (63)

is distributed according to the F-distribution with L - 1 and (L - 1)(M - 1) degrees of freedom [46]. If the null hypothesis is rejected, a post-hoc test is used: the performance of two classifiers is significantly different if their average ranks differ by at least the critical difference

CD = q_{\alpha} \sqrt{\frac{L(L+1)}{6M}} \qquad (64)

where the critical values q_{\alpha} are based on the Studentized range statistic divided by \sqrt{2}, as given in [37].
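Eqs. (62) and (64) can be sketched as follows; the average ranks are taken from the η_o columns of Table 3 (L = 4 classifiers, M = 10 data sets), and the value used for q_alpha is our assumption of the standard α = 0.05 Studentized-range-based constant for four classifiers.

```python
import math

def friedman_statistic(R, M):
    """Friedman statistic (Eq. (62)) from the average ranks R of L
    classifiers computed over M data sets."""
    L = len(R)
    return (12.0 * M / (L * (L + 1))) * (sum(r * r for r in R)
                                         - L * (L + 1) ** 2 / 4.0)

def critical_difference(L, M, q_alpha):
    """Nemenyi critical difference (Eq. (64))."""
    return q_alpha * math.sqrt(L * (L + 1) / (6.0 * M))

# Average ranks of PBL-McRBFN, SRAN, ELM, SVM over the 10 data sets (eta_o)
chi2_F = friedman_statistic([1.1, 2.6, 3.15, 3.15], 10)
CD = critical_difference(4, 10, q_alpha=2.569)  # assumed q_0.05 for L = 4
```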
3.2. Performance evaluation on UCI benchmark data sets
The class-wise performance measures, the (average/overall) testing efficiencies, the number of hidden neurons and the number of samples used for the PBL-McRBFN, SRAN, ELM and SVM classifiers are reported in Table 2, which contains results for both the binary and the multi-category classification data sets from the UCI machine learning repository. From Table 2, we can see that the PBL-McRBFN classifier performs slightly better than the best-performing SRAN classifier and significantly better than the ELM and SVM classifiers on all 10 data sets. In addition, the proposed PBL-McRBFN classifier requires fewer samples to learn the decision function and develops a compact neural architecture to achieve better generalization performance.
Well-balanced data sets: on the IS, IRIS and WINE data sets, the generalization performance of PBL-McRBFN is approximately 2% higher than the SRAN classifier and 3-4% higher than the ELM and SVM classifiers. On the IS data set, the proposed PBL-McRBFN uses fewer samples to achieve a 2% improvement over SRAN and approximately a 3-4% improvement over the ELM and SVM classifiers. Similarly, on the IRIS and WINE data sets, PBL-McRBFN uses fewer samples and fewer neurons to achieve better generalization performance. The PBL-McRBFN classifier achieves this through its meta-cognitive learning algorithm, which selects appropriate samples for learning based on the current knowledge and deletes many redundant samples to avoid over-training. For example, on the IS data set, PBL-McRBFN uses only 89 of the 210 training samples to build the best classifier.
To highlight the above-mentioned advantages of the proposed PBL-McRBFN classifier, we conducted a simulation study in which the ELM classifier was trained with only the training samples used by PBL-McRBFN. On the IS data set, the PBL-McRBFN classifier selects the best 89 samples for training; these samples are used in the batch-learning ELM algorithm, and we refer to the resulting classifier as ELM*.

The testing performance of the ELM* classifier (which uses the best 89-sample sequence) is better than that of the original ELM classifier developed using all 210 training samples. ELM* also achieves better generalization performance with a smaller number of hidden neurons (ELM* requires only 32 hidden neurons to achieve 92.14% testing efficiency, whereas ELM requires 49 hidden neurons to achieve 90.23%). This study clearly indicates that the sample deletion strategy in PBL-McRBFN helps achieve better decision-making ability.
Imbalanced data sets: on the VC, GI, HEART, LD, PIMA, BC and ION data sets, the generalization performance of PBL-McRBFN is approximately 2-10% higher than the SRAN classifier and 2-15% higher than the ELM and SVM classifiers. In the case of imbalanced data sets, PBL-McRBFN requires more neurons to approximate the decision surface with minimal samples.
Table 2
Performance comparison of PBL-McRBFN with SRAN, ELM and SVM.

           PBL-McRBFN                         SRAN                               ELM                       SVM
Data set   K     Samples  Testing            K     Samples  Testing             K     Testing             SVs   Testing
                 used     ηo      ηa               used     ηo      ηa                ηo      ηa                ηo      ηa
IS         50    89       94.19   94.19      47    113      92.29   92.29       49    90.23   90.23       127   91.38   91.38
IRIS       6     20       98.10   98.10      8     29       96.19   96.19       10    96.19   96.19       13    96.19   96.19
WINE       11    29       98.31   98.69      12    46       96.61   97.19       10    97.46   98.04       36    97.46   98.04
VC         175   318      78.91   79.09      113   437      75.12   76.86       150   77.01   77.59       340   70.62   68.51
GI         71    115      84.76   92.72      59    159      86.21   80.95       80    81.31   87.43       183   70.47   75.61
HEART      20    69       81.50   81.47      28    56       78.50   77.53       36    76.50   75.91       42    75.50   75.10
LD         87    116      73.10   72.63      91    151      66.90   65.78       100   72.41   71.41       141   71.03   70.21
PIMA       100   162      79.62   76.67      97    230      78.53   74.90       100   76.63   75.25       221   77.45   76.43
BC         13    45       97.39   97.85      7     91       96.87   97.26       66    96.35   96.48       24    96.61   97.06
ION        18    58       96.41   96.47      21    86       90.84   91.88       32    89.64   87.52       43    91.24   88.51
Table 3
Ranks based on the overall (ηo) and average (ηa) testing efficiencies.

                  PBL-McRBFN     SRAN           ELM            SVM
Data set          ηo     ηa      ηo     ηa      ηo     ηa      ηo     ηa
IS                1      1       2      2       4      4       3      3
IRIS              1      1       3      3       3      3       3      3
WINE              1      1       4      4       2.5    2.5     2.5    2.5
VC                1      1       3      3       2      2       4      4
GI                2      1       1      3       3      2       4      4
HEART             1      1       2      2       3      3       4      4
LD                1      1       4      4       2      2       3      3
PIMA              1      1       2      4       4      3       3      2
BC                1      1       2      2       4      4       3      3
ION               1      1       3      2       4      4       2      3
Average rank (Rj) 1.1    1       2.6    2.9     3.15   2.95    3.15   3.15
Performance comparison of PBL-McRBFN with SRAN, ELM and SVM on the acoustic emission signal classification problem.

Classifier    Hidden     Samples   Testing
              neurons    used      ηo       ηa
PBL-McRBFN    5          9         99.27    98.91
SRAN          10         39        99.27    98.91
ELM           10         62        99.27    98.91
SVM           22a        62        98.54    97.95
Performance comparison of PBL-McRBFN with SRAN, ELM and SVM on the mammogram classification problem.

Classifier    Hidden     Samples   Testing
              neurons    used      ηo       ηa
PBL-McRBFN    22         60        100      100
SRAN          25         45        90.91    91.67
ELM           30         97        90.91    90.0
SVM           261        97        90.91    91.67

Acknowledgements
References

[1] G.B. Zhang, Neural networks for classification: a survey, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews 30 (4) (2000) 451-462.
[2] Y. LeCun, B. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation 1 (1989) 541-551.
[3] F.F. Li, T.J. Cox, A neural network model for speech intelligibility quantification, Applied Soft Computing 7 (1) (2007) 145-155.
[4] S. Ari, G. Saha, In search of an optimization technique for artificial neural network to classify abnormal heart sounds, Applied Soft Computing 9 (1) (2009) 330-340.
[5] V. Ravi, C. Pramodh, Threshold accepting trained principal component neural network and feature subset selection: application to bankruptcy prediction in banks, Applied Soft Computing 8 (4) (2008) 1539-1548.
[6] M.E. Ruiz, P. Srinivasan, Hierarchical text categorization using neural networks, Information Retrieval 5 (2002) 87-118.
[7] M. Khan, S.W. Khor, Web document clustering using a hybrid neural network, Applied Soft Computing 4 (4) (2004) 423-432.
[8] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533-536.
[9] G.-B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, IEEE International Joint Conference on Neural Networks, Proceedings 2 (2004) 985-990.
[10] G.-B. Huang, X. Ding, H. Zhou, Optimization method based extreme learning machine for classification, Neurocomputing 74 (1-3) (2010) 155-163.
[11] J.C. Platt, A resource-allocating network for function interpolation, Neural Computation 3 (2) (1991) 213-225.
[12] L. Yingwei, N. Sundararajan, P. Saratchandran, A sequential learning scheme for function approximation using minimal radial basis function neural networks, Neural Computation 9 (2) (1997) 461-478.
[13] G.-B. Huang, P. Saratchandran, N. Sundararajan, An efficient sequential learning algorithm for growing and pruning RBF (GAP-RBF) networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 34 (6) (2004) 2284-2292.
[14] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Transactions on Neural Networks 17 (6) (2006) 1411-1423.
[15] S. Suresh, N. Sundararajan, P. Saratchandran, A sequential multi-category classifier using radial basis function networks, Neurocomputing 71 (1) (2008) 1345-1358.
[16] S. Suresh, R.V. Babu, H.J. Kim, No-reference image quality assessment using modified extreme learning machine classifier, Applied Soft Computing 9 (2) (2009) 541-552.
[17] N. Kasabov, Evolving fuzzy neural networks for supervised/unsupervised online knowledge-based learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31 (6) (2001) 902-918.
[18] W.P. Rivers, Autonomy at all costs: an ethnography of metacognitive self-assessment and self-management among experienced language learners, The Modern Language Journal 85 (2) (2001) 279-290.
[19] R. Isaacson, F. Fujita, Metacognitive knowledge monitoring and self-regulated learning: academic success and reflections on learning, Journal of the Scholarship of Teaching and Learning 6 (1) (2006) 39-55.
[20] S. Suresh, K. Dong, H.J. Kim, A sequential learning algorithm for self-adaptive resource allocation network classifier, Neurocomputing 73 (16-18) (2010) 3012-3019.
[21] S. Suresh, R. Savitha, N. Sundararajan, A sequential learning algorithm for complex-valued self-regulating resource allocation network-CSRAN, IEEE Transactions on Neural Networks 22 (7) (2011) 1061-1072.
[22] G. Sateesh Babu, S. Suresh, Meta-cognitive neural network for classification problems in a sequential learning framework, Neurocomputing 81 (2012) 86-96.
[23] K. Subramanian, S. Suresh, A meta-cognitive sequential learning algorithm for neuro-fuzzy inference system, Applied Soft Computing 12 (11) (2012) 3603-3614.
[24] R. Savitha, S. Suresh, N. Sundararajan, Metacognitive learning in a fully complex-valued radial basis function neural network, Neural Computation 24 (5) (2012) 1297-1328.
[25] R. Savitha, S. Suresh, N. Sundararajan, A meta-cognitive learning algorithm for a Fully Complex-valued Relaxation Network, Neural Networks 32 (2012) 209-218.
[26] G. Sateesh Babu, R. Savitha, S. Suresh, A projection based learning in meta-cognitive radial basis function network for classification problems, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 2907-2914.
[27] G. Sateesh Babu, S. Suresh, B.S. Mahanand, Alzheimer's disease detection using a Projection Based Learning Meta-cognitive RBF Network, in: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012, pp. 408-415.
[28] G. Sateesh Babu, S. Suresh, K. Uma Sangumathi, H. Kim, A Projection Based Learning Meta-cognitive RBF network classifier for effective diagnosis of Parkinson's disease, in: J. Wang, G. Yen, M. Polycarpou (Eds.), Advances in Neural Networks - ISNN 2012, vol. 7368 of Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2012, pp. 611-620.
[29] G. Sateesh Babu, S. Suresh, Parkinson's disease prediction using gene expression - a projection based learning meta-cognitive neural classifier approach, Expert Systems with Applications (2012), http://dx.doi.org/10.1016/j.eswa.2012.08.070
[30] M.T. Cox, Metacognition in computation: a selected research review, Artificial Intelligence 169 (2) (2005) 104-141.
[31] T.O. Nelson, L. Narens, Metamemory: A Theoretical Framework and New Findings, Allyn and Bacon, Boston, USA, 1992.
[32] S. Suresh, N. Sundararajan, P. Saratchandran, Risk-sensitive loss functions for sparse multi-category classification problems, Information Sciences 178 (12) (2008) 2621-2638.
[34] E. Castillo, B. Guijarro-Berdiñas, O. Fontenla-Romero, A. Alonso-Betanzos, A very fast learning method for neural networks based on sensitivity analysis, Journal of Machine Learning Research 7 (2006) 1159-1182.
[35] H. Hoffmann, Kernel PCA for novelty detection, Pattern Recognition 40 (3) (2007) 863-874.
[36] C. Blake, C. Merz, UCI repository of machine learning databases, University of California, Irvine, Department of Information and Computer Sciences, 1998, http://archive.ics.uci.edu/ml/
[37] J. Demšar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006) 1-30.
[38] S.N. Omkar, S. Suresh, T.R. Raghavendra, V. Mani, Acoustic emission signal classification using fuzzy C-means clustering, in: Proceedings of ICONIP '02, 9th International Conference on Neural Information Processing 4 (2002) 1827-1831.
[39] C. Aize, Q. Song, X. Yang, S. Liu, C. Guo, Mammographic mass detection by vicinal support vector machine, in: Proceedings of ICNN '04, International Conference on Neural Networks 3 (2004) 1953-1958.
[40] T. Zhang, Statistical behavior and consistency of classification methods based on convex risk minimization, Annals of Statistics 32 (1) (2004) 56-85.
[41] B. Schölkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[42] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3) (1995) 273-297.
[43] J. Suckling, J. Parker, D.R. Dance, S. Astley, I. Hutt, C. Boggis, I. Ricketts, E. Stamatakis, N. Cerneaz, S. Kok, et al., The mammographic image analysis society digital mammogram database, Excerpta Medica International Congress Series 1069 (1994) 375-378.
[44] S. Suresh, S.N. Omkar, V. Mani, T.N.G. Prakash, Lift coefficient prediction at high angle of attack using recurrent neural network, Aerospace Science and Technology 7 (8) (2003) 595-602.
[45] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2 (2011) 27:1-27:27, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[46] R.L. Iman, J.M. Davenport, Approximations of the critical region of the Friedman statistic, Communications in Statistics (1980) 571-595.
[47] J.H. Zar, Biostatistical Analysis, 4th ed., Prentice-Hall, Englewood Cliffs, New Jersey, 1999.
[48] O.J. Dunn, Multiple comparisons among means, Journal of the American Statistical Association 56 (293) (1961) 52-64.
Giduthuri Sateesh Babu received the B.Tech. degree in electrical and electronics engineering from Jawaharlal Nehru Technological University, India, in 2007, and the M.Tech. degree in electrical engineering from the Indian Institute of Technology Delhi, India, in 2009. From 2009 to 2010, he worked as a senior software engineer at the Samsung R&D centre, India. He is currently a Ph.D. student in the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include machine learning, cognitive computing, neural networks, control systems, optimization and medical informatics.