Abstract-Radial basis functions (RBF's) consist of a two-layer neural network, where each hidden unit implements a kernel function. Each kernel is associated with an activation region from the input space and its output is fed to an output unit. In order to find the parameters of a neural network which embeds this structure we take into consideration two different statistical approaches. The first approach uses classical estimation in the learning stage and it is based on the learning vector quantization algorithm and its second-order statistics extension. After the presentation of this approach, we introduce the median radial basis function (MRBF) algorithm based on robust estimation of the hidden unit parameters. The proposed algorithm employs the marginal median for kernel location estimation and the median of the absolute deviations for the scale parameter estimation. A histogram-based fast implementation is provided for the MRBF algorithm. The theoretical performance of the two training algorithms is comparatively evaluated when estimating the network weights. The network is applied in pattern classification problems and in optical flow segmentation.

I. INTRODUCTION

The RBF neural networks can be used to model the probability density functions (PDF's) in nonparametric classification tasks [10]-[12]. The basis functions, when used as activation functions for hidden units, provide the network with the capability of forming complex separation boundaries between classes, which is equivalent to what perceptron networks can provide through an intermediate mapping. The main applications for the RBF have been so far in pattern classification, where the network approximates the Bayesian classifier [12]-[14], and in system modeling [7], [15], [16]. In both areas, RBF networks gave better results when compared to other methods. The RBF network requires less computation time for the learning [7] and a more compact topology than other neural networks [17]. Various learning algorithms have been used in order to find the most appropriate parameters for the RBF decomposition. They can be classified in two major branches: batch learning, where the learning is done on groups of patterns [6], [13], [14], [18], and on-line learning, where the learning is adaptive, on a per pattern basis [7], [15].
middle of the window. The number of patterns to be taken into account by the algorithm depends on how fast the distribution of the data changes in time. A fast computing algorithm based on data sample histogram analysis is derived for the MRBF in Section III-C. This implementation is very useful in the case when data have discrete values, e.g., in image processing and computer vision applications. In Section IV, the expected stationary values are derived in the case when we estimate the parameters from a mixture of one-dimensional (1-D) Gaussian functions. We provide the theoretical bounds for mean and variance estimators in the case when we use either classical or robust estimation. When estimating the parameters of each function from a mixture of distributions, we investigate the parameter convergence to the stationary values. In Section V-A, both algorithms are applied in artificially generated data classification problems. In this application, the networks model the underlying probability for each class using the decomposition in RBF kernels. In order to decide the class for a new data sample, both Euclidean and Mahalanobis [8], [25] distances are used. The figures of merit are the classification error, the capability of functional approximation, as well as the estimation of the optimal boundary between the classes. In Section V-B, the proposed algorithm is applied for optical flow segmentation and in Section VI we draw the conclusions of the present study.

In supervised learning, the network is provided with a training set of patterns consisting of vectors and their corresponding classes. Each pattern is considered assigned only to one class C_k, according to an unknown mapping. After an efficient learning stage, the network implements the mapping rule and generalizes it for patterns which are not from the training set. According to Bayes' theorem [25], we can express the relation among the a posteriori probabilities P(C_k|X) of different classes by using their a priori probabilities P(C_k)

    P(C_k|X) = p(X|C_k) P(C_k) / Σ_{i=1}^{M} p(X|C_i) P(C_i)   (1)

where M is the number of classes and X is an N-dimensional vector denoting a pattern. Providing their capabilities of approximation [1]-[3], RBF networks can be used to describe the underlying probability as a sum of components with respect to a base (denoted by the function family φ)

    p_k(X) = Σ_{j=1}^{L} λ_kj φ_j(X)   (3)

where L is the number of kernel functions and λ_kj are the weights of the hidden unit to output connection.

Each hidden unit implements a Gaussian function

    φ_j(X) = exp[-(X - μ̂_j)^T Σ̂_j^{-1} (X - μ̂_j) / 2]   (4)
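The decomposition (3) with Gaussian kernels (4) can be sketched in the 1-D, scalar-dispersion case as follows. This is a minimal illustration with hypothetical centres, widths, and weights, not fitted network parameters:

```python
import math

def gaussian_kernel(x, mu, sigma):
    # phi_j(X) of (4) in the 1-D, scalar-dispersion case
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def rbf_output(x, centres, sigmas, weights):
    # p_k(X) = sum_j lambda_kj * phi_j(X), the decomposition (3)
    return sum(w * gaussian_kernel(x, m, s)
               for w, m, s in zip(weights, centres, sigmas))

# two kernels; at x = 5 the first kernel responds with phi = 1
p = rbf_output(5.0, centres=[5.0, 10.0], sigmas=[2.0, 2.0], weights=[0.7, 0.3])
```

At the first kernel centre the kernel response is exactly one, so the output is dominated by that kernel's weight plus the small overlap contribution of the second kernel.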
The activation region for a neuron is similar to the Voronoi neighborhood for a vector quantizer [26]. Let us denote by V_j the activation region of the jth kernel with respect to a metric distance

    V_j = {X ∈ R^N : ||X - μ̂_j|| < ||X - μ̂_i||, i = 1, ..., L, i ≠ j}   (5)

where || · || represents a distance metric, e.g., Euclidean. The separating boundary between two classes is the location of the vectors which have the same a posteriori probabilities for both classes. If we consider 1-D data, then we can express V_j = [τ_j, τ_{j+1}). Each output implements a weighted sum of kernels as given by (3). Classes can be coded in different ways by the outputs. For a more accurate representation of the classes we choose the number of outputs as equal to the number of classes. In this case, the class decision is assigned to the maximally activated output unit (winner take all). The sigmoidal function is used in order to limit the output values to the interval (0, 1)

    Y_k(X) = 1 / (1 + exp[-p_k(X)])   (7)

for k = 1, ..., M, where M represents the number of outputs and p_k(X) is given by (3). The sign of the weights λ_kj shows the activation or the inhibition of the hidden unit to output connection. If the sign of the weight λ_kj is positive, then the activation region of the kernel j corresponds to the class k; otherwise, it is not associated with the class k.

III. LEARNING IN RBF NETWORKS

A. Classical Statistics Approach

A combined unsupervised-supervised learning technique has been widely used [7], [16], [21] in order to estimate the RBF weights. This is an on-line technique which employs the LVQ algorithm [9], [20] in order to find the input to hidden unit weights in the unsupervised part and the least mean squares [27] for finding the λ_kj weights in the supervised part. At each iteration we first update the kernel parameters and afterwards the output weights. The unsupervised part of the learning stage is based on classical statistics assumptions.

In the classical statistics approach, the estimation of the mean and of the covariance matrix for a given population of data samples is given by

    μ̂_j = (1/n_j) Σ_{i=0}^{n_j - 1} X_i   (8)

    Σ̂_j = (1/(n_j - 1)) Σ_{i=0}^{n_j - 1} (X_i - μ̂_j)(X_i - μ̂_j)^T   (9)

where n_j is the number of data samples from the given data population [25].

In order to decide which class center will be updated, in LVQ the Euclidean distance is computed between the data sample and each center

    if ||X_t - μ̂_j||² = min_{k=1,...,L} ||X_t - μ̂_k||², then X_t ∈ C_j   (10)

where C_j is the winner class and || · || denotes the Euclidean distance. The LVQ algorithm is the adaptive version of (8), computed for patterns assigned to an activation region according to (10). In the original LVQ algorithm, the winner center is updated as

    μ̂_j(t+1) = μ̂_j(t) + η [X_t - μ̂_j(t)]   (11)

where η is the learning rate and μ̂_j(t) is the center vector estimate at the moment t. Various decaying rules for the learning rate were tested for the LVQ algorithm [28]. The learning rate which achieves the minimum output variance [29] is

    η = 1/n_j   (12)

where n_j is the number of samples assigned to the cluster j. For the covariance matrix calculation we use the extension of the LVQ algorithm for second-order statistics [16], [19]

    Σ̂_j(t+1) = ((n_j - 2)/(n_j - 1)) Σ̂_j(t) + (1/(n_j - 1)) [X_t - μ̂_j(t+1)][X_t - μ̂_j(t)]^T   (13)

where Σ̂_j(t) is the covariance matrix estimate at the moment t. We can observe that the formulas (11) and (13) are the adaptive versions of (8) and (9).

In some applications, it is worthwhile to use the Mahalanobis distance instead of the Euclidean one for the choice of the winner class. The Mahalanobis distance takes into consideration the covariance matrix of each basis function

    if (μ̂_j - X_t)^T Σ̂_j^{-1} (μ̂_j - X_t) = min_{k=1,...,L} [(μ̂_k - X_t)^T Σ̂_k^{-1} (μ̂_k - X_t)], then X_t ∈ C_j.   (14)

However, at the start of the learning algorithm, an imprecision in estimating the covariance parameters may occur and this can lead to a singular covariance matrix. Thus, for the first few data samples we can use the Euclidean distance (10) and afterwards employ the Mahalanobis distance (14). The initial values for the centers μ̂ are randomly generated and the covariance matrices are initialized with zero.
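The classical updates (10)-(13) can be sketched in the 1-D case as follows. This is an illustrative sketch with hypothetical function names, not the authors' implementation; the variance recursion is the scalar analogue of (13):

```python
def nearest_centre(x, centres):
    # winner selection by the Euclidean rule (10), 1-D case
    return min(range(len(centres)), key=lambda j: (x - centres[j]) ** 2)

def lvq_estimates(samples):
    # incremental centre update (11) with the learning rate eta = 1/n of
    # (12), and the recursive sample-variance update, the 1-D analogue
    # of the second-order extension (13); single unit, 1-D sketch
    mu, var, n = 0.0, 0.0, 0
    for x in samples:
        n += 1
        mu_old = mu
        mu += (x - mu) / n
        if n >= 2:
            var = ((n - 2) * var + (x - mu_old) * (x - mu)) / (n - 1)
    return mu, var
```

After the whole sequence has been seen, the recursion reproduces the batch estimates (8) and (9): for the samples 4, 6, 5, 5 it returns a centre of 5 and the unbiased sample variance 2/3.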
accurate estimates when data are contaminated with outliers or have long-tailed distributions [22], [23]. They are insensitive to extreme observations and this makes them attractive for parameter estimation. In the marginal median LVQ algorithm [24], [31], the data samples are marginally ordered and the centroid is taken as the marginal median [23]

    μ̂_j = med{X_0, X_1, ..., X_{n-1}}   (15)

where X_{n-1} is the last pattern assigned to the jth neuron. In order to avoid an excessive computational complexity, the median operation can be done on a finite set of data, extracted through a moving window that contains only the last W data samples assigned to the hidden unit j

    μ̂_j = med{X_0, X_1, ..., X_{n-1}}              if n < W
    μ̂_j = med{X_{n-W}, X_{n-W+1}, ..., X_{n-1}}    if n ≥ W   (16)

where X_k, k = n-W, ..., n-1, are the data samples assigned to the jth neuron according to (10) or (14). Window size W is small if the statistics of the sample population change rapidly in time, and large if the data sample statistics are relatively unchanged in time and a more accurate median estimate of the given data population is desired. Unlike in image filtering, where the window is rather small, for a good median estimate of the data samples a rather big window should be employed.

For the dispersion vector associated with a kernel function, we use the MAD estimator

    σ̂_j = med{|X_0 - μ̂_j|, ..., |X_{n-1} - μ̂_j|} / 0.6745   (17)

where 0.6745 is the scaling parameter in order to make the estimator Fisher consistent for the normal distribution [22]. MAD calculation is performed along each data dimension, independently. The same set of data samples can be taken into account in (17) as for the marginal median in (16).

The off-diagonal components of the covariance matrix can be calculated based on robust statistics as well [22]. We consider two arrays containing the sum and the difference of each two different components for a data sample from the moving window

    Z⁺_{i,hl} = X_{i,h} + X_{i,l}   (18)

    Z⁻_{i,hl} = X_{i,h} - X_{i,l}   (19)

for i = n-W, ..., n-1. First, the median of these new data populations is calculated according to (16). The squares of the corresponding MAD estimates (17) for the arrays Z⁺_{hl} and Z⁻_{hl} represent their variances and they are denoted as V⁺_{j,hl} and V⁻_{j,hl}. The off-diagonal components of the covariance matrix are derived as

    Σ̂_{j,hl} = (V⁺_{j,hl} - V⁻_{j,hl}) / 4.   (20)

In marginal median LVQ, both the Euclidean (10) and the Mahalanobis (14) distances can be used. In the case of the Mahalanobis distance, a good estimation is desired for the covariance matrix in order to be appropriately used for winner class selection. By using a robust estimation of the covariance matrix as in (17)-(20), we can be confident in the evaluation of the Mahalanobis distance. The order of RBF network weight updating is well defined: the kernel center, the covariance matrix which uses the previously estimated center, and afterwards the weights corresponding to the hidden unit to output connection. The network found by means of the proposed training algorithm is called the MRBF neural network.

The second layer is used in order to group the clusters found in the unsupervised stage into classes. The output weights are updated as follows:

    λ_kj(t+1) = λ_kj(t) + η₂ [F_k(X) - Y_k(X)] Y_k(X) [1 - Y_k(X)] φ_j(X)   (21)

for k = 1, ..., M and j = 1, ..., L, where the learning rate is η₂ ∈ (0, 1]. F_k(X) is the desired output for the pattern vector X and it is coded as

    F_k(X) = 1 if X ∈ C_k;  F_k(X) = 0 otherwise   (22)

for k = 1, ..., M. The formula (21) corresponds to the backpropagation [32], [33] for the output weights of an RBF network with respect to the mean square error cost function [4].

The network topology represents the number of neurons on each layer. The number of inputs and outputs can be set up from the given supervised problem. For evaluating the number of hidden units we can use various approaches: growing architecture, decreasing the number of hidden units, or a combination of these two. When the performance of the network is poor, the number of hidden units should be increased [14], [15], [17]. If some hidden units are not relevant for the classification, or their activation fields are overlapping, the network should be pruned [13]. The relevance of the hidden units is calculated based on the ratio between the number of data samples contained in their activation field and the total number of data samples. The overlapping of the activation fields can be evaluated by clustering similarity measures [8].

C. Fast Training in Median RBF Based on Data Sample Histograms

When the data samples are distributed in a discrete range of values we can find solutions for a fast MRBF training stage. A fast implementation for the median algorithm based on histogram updating, used in image filtering, was proposed in [34]. The first data sample assigned to a unit becomes the starting point in finding the median. In the updating stage we take into consideration pairs of data samples X_i and X_{i+1} assigned to the same unit according to either (10) or (14). We build up the marginal histogram associated with each activation region, denoted here as H_{jh}[k], where j is the hidden unit, h is the data sample entry, and k represents the histogram level. We denote by μ̂_{j,h}(t) the center estimate at instant t, and let us assume X_{i,h} < X_{i+1,h}. Median updating can be performed according to the rank of the incoming data samples

BORS AND PITAS: MEDIAN RADIAL BASIS FUNCTION 1355

where K is the number of histogram levels necessary to add or subtract in order to obtain the new location for the median. K is evaluated on the condition that the median is located where the data marginal histogram splits into two sides containing an equal number of samples.
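The robust estimators (15)-(17) and the histogram splitting condition above can be sketched as follows. This is an illustrative sketch with hypothetical function names; the histogram median here is a simplified, non-incremental version of the running update of Section III-C:

```python
def windowed_median(samples, W):
    # marginal median over the last W assigned samples, as in (16)
    window = sorted(samples[-W:])
    n = len(window)
    return window[n // 2] if n % 2 else 0.5 * (window[n // 2 - 1] + window[n // 2])

def mad_scale(samples, W, centre):
    # MAD scale estimate (17); dividing by 0.6745 makes the estimator
    # Fisher consistent for the normal distribution
    devs = [abs(x - centre) for x in samples[-W:]]
    return windowed_median(devs, W) / 0.6745

def histogram_median(values, levels=256):
    # median of discrete-valued data via its histogram: the level where
    # the histogram splits into two sides with equal sample counts
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    half, cum = (len(values) + 1) // 2, 0
    for level, count in enumerate(hist):
        cum += count
        if cum >= half:
            return level
```

For discrete data such as 8-bit image values, the histogram form avoids re-sorting the window for every incoming sample, which is the point of the fast implementation described above.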
[Figure: convergence of the center estimates (median versus mean) and of the scale estimate (MAD) as a function of the number of samples.]

    erf(y) = (1/√(2π)) ∫₀^y exp(-t²/2) dt   (34)

where E_n[μ̂_j] is the median stationary estimate of the given data. The stationary value of the jth distribution center estimate using the median estimator is obtained after inserting
Fig. 5. Theoretical analysis of the bias for median and classical statistics estimators in evaluating the RBF parameters: (a) center for N(5, σ) in the distribution (48), (b) center for N(5, σ) in the distribution (49), (c) scale parameter for N(5, σ) in (48), and (d) scale parameter for N(5, σ) in (49).
where E_n[μ̂_j] can be calculated from (37). E_n[σ̂_j] can be derived from

where f_{i+1}(X) is given in (32), E_med[μ̂_j] is evaluated after inserting (35) in (31), and c = 0.6745. By taking into account the median property of splitting the data distribution into two equal areas as in (36), we obtain
1358 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 6, NOVEMBER 1996
Fig. 6. Samples from Distribution I. The boundaries between classes are marked with: "-" for the optimal classifier, "- -" for MRBF, and "- ." for RBF.
    f(X) = N(5, σ) + N(10, σ)   (48)
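The bias behavior analyzed for data drawn from (48) can be reproduced with a small numerical experiment: draw samples from an equally weighted mixture of N(5, 2) and N(10, 2), keep those assigned to the centre at 5 by the nearest-centre rule, and compare the mean and median of the assigned samples. The overlap with N(10, 2) acts as outlier contamination. Equal component weights and known true centres are assumptions of this sketch, which is an illustration rather than the paper's derivation:

```python
import random

# draw from the mixture (48) with sigma = 2, components equally likely
rng = random.Random(3)
samples = [rng.gauss(5.0 if rng.random() < 0.5 else 10.0, 2.0)
           for _ in range(20000)]

# nearest-centre assignment, as in (10), with the true centres 5 and 10
cluster = [x for x in samples if abs(x - 5.0) < abs(x - 10.0)]

mean_est = sum(cluster) / len(cluster)
median_est = sorted(cluster)[len(cluster) // 2]
```

The mean of the assigned samples is pulled noticeably below 5 by the combination of truncation and contamination, while the median stays much closer to the true centre, in line with the theoretical comparison.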
[Fig. 7: samples from Distribution II, with the class boundaries found by the networks.]

[Fig. 8: learning curves (MSE versus the number of drawn samples) for the RBF and MRBF networks using the Mahalanobis distance.]
The convergence is the property of a neural network to achieve a stable state after a finite number of iterations. The convergence can be defined individually for a weight or globally, expressing the state of the network by a cost function. In the following we analyze the capacity of various weights to achieve a stable state. In the example (48) let us assume σ = 2. We use MRBF to estimate the parameters of the distribution N(5, 2). We find the expectation for the median by replacing the formula (35) in (31) and computing the integral numerically. In Fig. 4(a) we compare the expected bias for the marginal median against the bias of the stationary estimate of the mean, when estimating the Gaussian center. In Fig. 4(b) we provide a comparison between the expected bias of the MAD estimator (41) and that of the stationary estimate of the classical estimator for scale, which can be derived from (38). The expectation for the scale parameter using
Fig. 10. The optical flow corresponding to the movement between the first and third frames: (a) the optical flow provided by the block matching algorithm, (b) the optical flow after it was smoothed by the MRBF network, and (c) the optical flow after it was smoothed by the RBF network.

distribution (48) and in Fig. 5(b) for the distribution (49), both with respect to the assumed dispersion (scale parameter) σ. The comparison results for estimating the stationary state of the bias for the scale parameter E[σ̂_j] - σ are given in Fig. 5(c) and (d). From these plots it is evident that if certain overlaps occur among various Gaussian functions from the mixture, the respective amount of data samples contains outliers, while the median and MAD estimators provide smaller bias than the mean and classical sample deviation estimators. If the Gaussian functions are far away from each other with respect to their dispersions, the amount of outliers decreases and both algorithms provide similar results. However, if the isolated Gaussian functions are truncated, e.g., due to the decision (10), the robust estimators are more accurate than those based on classical statistics.

TABLE II
OPTICAL FLOW SEGMENTATION RESULTS

                 MRBF                                RBF
Network     MAE    MSE     TIME (s)             MAE    MSE     TIME (s)
Topology                   First   Total                       First   Total
N-L-M       1.53   8.57    0.21    0.37         2.39   17.60   0.30    0.46
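The per-unit error criteria reported in Table II measure the dispersion of the samples assigned to a hidden unit around the selected centre. A minimal 1-D sketch with illustrative data (the actual criteria operate on 2-D motion vectors):

```python
def centre_errors(samples, centre):
    # mean absolute error and mean square error of the samples assigned
    # to a unit, taken with respect to the selected centre
    n = len(samples)
    mae = sum(abs(x - centre) for x in samples) / n
    mse = sum((x - centre) ** 2 for x in samples) / n
    return mae, mse

mae, mse = centre_errors([1.0, 2.0, 4.0], 2.0)
```

A smaller MAE and MSE indicate that the chosen centres represent their assigned samples more tightly, which is how the two networks are compared in Table II.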
V. SIMULATION RESULTS

A. Estimating Probability Density Functions

In the previous section we have evaluated the theoretical performance in parameter estimation for both algorithms described in Section III. In this section, we test these algorithms for the estimation of mixed bivariate normal and contaminated normal distributions.

The problem of finding the parameters for the Gaussian functions is seen as a supervised learning task. We consider four artificially generated distributions, each containing two-dimensional (2-D) clusters. This problem can be considered as a 2-D extension of the mixture of Gaussians model analyzed in Section IV. A 2-D Gaussian distribution is denoted by N(μ_1, μ_2; σ_1, σ_2). The Gaussian clusters are grouped in two classes in order to form more complex distributions.

Distribution I:

    P_1^I(X) = N(2, 1; 3, 1) + N(8, 7; 3, 1)
    P_2^I(X) = N(8, 2; 1, 3) + N(2, 6; 1, 3).   (50)

Distribution II:

    P_1^II(X) = N(6, 0; 4, 1) + N(0, 6; 1, 4)
    P_2^II(X) = N(6, 6; 2, 2).   (51)

Two more distributions are obtained from the first two by adding uniformly distributed data samples.

Distribution III:

    P_k^III(X) = ε P_k^I(X) + (1 - ε) U([-5, 15], [-5, 15]).   (52)

Distribution IV:

    P_k^IV(X) = ε P_k^II(X) + (1 - ε) U([-5, 15], [-5, 15])   (53)

where k ∈ {1, 2} and ε = 0.9. We denote by U([-5, 15], [-5, 15]) a uniform distribution having the domain [-5, 15] × [-5, 15].

For MRBF we consider a running window of W = 401 samples when evaluating the median estimator according to (16). Both Euclidean and Mahalanobis distances were considered in order to decide which neuron is to be updated for a new data sample. The same data were used for training both algorithms. We have tested the ability of classification for both networks after the learning stage was concluded. The misclassification error compares the true output F_k(X) with the output Y_k(X) given by the network and is represented as a fraction of the total number of samples. The second comparison criterion is the approximation of the PDF functions by the networks. The optimal network is obtained when the network weights are equal to the parameters of the Distribution I or II (50) and (51). The mean square error calculated between the ideal function and the estimated one is defined as

    MSE = (1/M) Σ_{k=1}^{M} (1/n) Σ_{i=1}^{n} [P_k(X_i) - ŷ_k(X_i)]²   (54)

where ŷ_k(X) is the hypersurface modeled by the kth output unit. This constitutes a global performance estimation measure.

The comparison results provided by the networks are given in Table I. Each experiment was repeated many times for different data, drawn from the same distributions. Patterns from the first two distributions are represented in Figs. 6 and 7. The two figures also display the boundaries found by means of the neural networks as well as the optimal boundaries. The same number of hidden units is assumed for each network. From these figures it is evident that MRBF approximates the class boundaries better than RBF. The advantage is clear for MRBF in all the cases considered in Table I. However, when the mixture of bivariate normal distributions is contaminated with uniformly distributed patterns (e.g., in Distributions III and IV), the difference becomes very large because robust-type learning is insensitive to extreme observations. By using the Mahalanobis distance (14) instead of the Euclidean one we obtain better results for both algorithms, except for the case when we use the classical estimators for the uniform contaminated model (52) and (53). In this case, because of the noise corruption, the estimation of the covariance matrices is poor. The MRBF algorithm based on the Mahalanobis data assignment rule gives the best results in all the assumed cases, as can be seen from Table I.

In Fig. 8 we evaluate the global convergence of the algorithms in the case when the data are drawn from Distribution I. The learning curves represent the estimation of the PDF functions given by the MSE (54), with respect to the number of drawn samples. From this plot it is clear that the MRBF network provides a smaller MSE when compared to the classical RBF network. The improvement produced when using the Mahalanobis distance is evident from this plot as well.

B. Optical Flow Segmentation

Motion representation and modeling is an important step toward dynamic image understanding. The optical flow field consists of the distribution of the velocities associated with the image element displacement. A variety of motion estimation
TABLE III
EVALUATION OF THE SPEED (IN PIXELS/FRAME) FOR VARIOUS MOVING OBJECTS
techniques exists [35]. Block matching motion estimation techniques are widely used in video coding. A block matching algorithm assumes that the object displacement is constant within a small block of pixel elements. The block displacement is estimated by using correlation or matching techniques. The 2-D vector is chosen such that it minimizes the mean absolute error or mean square error between two blocks from a certain neighborhood from two different frames [35]. The best results are obtained when a full search is employed. This method takes into consideration all the possible pixel blocks within a region around the original block. The search region is chosen according to the expected maximal speed in the sequence. By employing block matching techniques, good results can be obtained in the regions having many details. However, in regions with almost constant pixel intensity, this algorithm usually gives a certain number of erroneous decisions.

Optical flow segmentation algorithms identify the regions having similar motion vectors. Various algorithms based on clustering were proposed to be used in optical flow segmentation [36], [37]. When applying RBF networks for optical flow segmentation, the centers of the hidden units represent groups of motion vectors [38]. Each set of vectors corresponds to an object or to a part of an object, moving with a certain velocity.

We have applied the algorithms presented in Section III to the "Hamburg taxi" sequence. The first and third frames of this sequence are shown in Fig. 9(a) and (b). Their frame size is 256 x 190 and they contain three important moving objects: a taxi turning around the corner, a car in the lower left moving from left to right, and a van in the lower right moving from right to left. In the first processing stage we have estimated the optical flow by using the full search block matching algorithm, when assuming blocks of 4 x 4 pixels. The block matching search region is taken [-8, 8] × [-8, 8] pixels wide. The motion field provided by the block matching is shown in Fig. 10(a). The optical flow histogram is represented in Fig. 11(a). The four moving objects (including the background) can be easily identified as histogram peaks (a concentration of motion vectors with similar velocity) in Fig. 11(a).

We have employed both RBF and MRBF neural networks for optical flow segmentation. The input to hidden unit weights are calculated in an unsupervised manner as presented in Section III. We have considered a second-level clustering algorithm for finding the output weights λ_kj. Each cluster of motion vectors which activates a hidden unit is assigned to an output unit based on the Euclidean distance between each two hidden unit centers.

We have evaluated the performance of the algorithms in terms of mean square error (MSE) as well as mean absolute error (MAE) of the optical flow with respect to the center selection

    MAE = (1/n_j) Σ_{i=1}^{n_j} ||X_i - μ̂_j||   (55)

where X_i, i = 1, ..., n_j, are data samples assigned to the jth unit. Image sequence processing needs fast algorithms in most applications. Thus, we have implemented the histogram-based algorithm for MRBF as presented in Section III-C. Both algorithms were tested in the same conditions. The hidden units which have only a very small amount of motion vectors assigned are pruned out. Only one pass through the data is enough in order to achieve a good motion smoothing. The comparison results in terms of MAE, MSE, and training time are shown in Table II. The time for the first layer updating corresponds to the calculation of the hidden unit weights. The total time also includes the output weights calculation. All these times correspond to an implementation on a Silicon Graphics Indigo Workstation. The implementation algorithm proposed in Section III-C for MRBF parameter evaluation proved to be fast.

We have also evaluated the speed (in pixels/frame) for the corresponding moving objects. The optimal velocity was calculated as the average of the clear feature displacements from each moving object, obtained independently, in a semiautomatic way. The comparison results between the real speed of the objects and the speed obtained by means of the RBF and MRBF algorithms are given in Table III. The velocity vectors found in the "Hamburg taxi" sequence by the MRBF and RBF algorithms are displayed in Fig. 10(b) and (c). The smoothing obtained after processing the optical flow by using either the MRBF or RBF algorithm is clear from these figures. A more complex criterion taking into account the block average graylevel and the position of the blocks would provide better moving object segmentation results [38].

The histograms representing the optical flow modeled by means of the MRBF and RBF networks as |λ_kj| φ_j(X) are shown in Fig. 11(b) and (c), respectively. Comparing these histograms to the initial optical flow histogram from Fig. 11(a)
VI. CONCLUSIONS
In this paper we present a novel algorithm for estimating
the RBF weights based on robust estimation and called median
RBF. This algorithm is presented in comparison with a classi-
cal approach for training an RBF network. We have employed
the marginal median estimator for evaluating the basis function
centers and the MAD for estimating the dispersion parameters.
We propose an implementation for MRBF network based on
data histogram updating which proved to be fast. We provide
the theoretical evaluation of the bias for both aligorithms in
the case when estimating overlapping Gaussian distributions.
The MRBF-based training is less biased by the presence of
the outliers in the training set and was proved to provide
an accurate estimation of the implied probabilities. Both RBF
and MRBF algorithms were compared in PDF estimation of
artificially generated data as well as in motion segmentation
of a real-life image sequence. In both cases, the MRBF gave
better estimation of the implied PDF’s and has shown better
classification capabilities.
Fig. 11. The optical flow histograms corresponding to the movement between the first and third frames. From left to right, the peaks in the histograms represent the van, the taxi, the background, and the left car: (a) the histogram of the optical flow when using the full-search block-matching algorithm; (b) the histogram represented by means of the MRBF network; and (c) the histogram represented by means of the RBF network.

We can observe that the MRBF network approximates it better than the RBF network. From Figs. 10(c) and 11(c), as well as from Table III, we can see that the algorithm based on classical training was not able to correctly identify the movement of the "taxi" moving object. According to these experiments, the proposed MRBF learning algorithm provides a better estimation of the desired parameters when compared with the classical statistics-based training of the RBF network.

REFERENCES

[1] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, no. 9, pp. 1481-1497, Sept. 1990.
[2] E. J. Hartman, J. D. Keeler, and J. M. Kowalski, "Layered neural networks with Gaussian hidden units as universal approximations," Neural Computa., vol. 2, pp. 210-215, 1990.
[3] J. Park and I. W. Sandberg, "Universal approximation using radial basis functions network," Neural Computa., vol. 3, pp. 246-257, 1991.
[4] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Stat., vol. 33, pp. 1065-1076, 1962.
[5] R. M. Sanner and J.-J. E. Slotine, "Gaussian networks for direct adaptive control," IEEE Trans. Neural Networks, vol. 3, pp. 837-863, Nov. 1992.
[6] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321-355, 1988.
[7] J. Moody and C. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computa., vol. 1, no. 2, pp. 281-294, 1989.
[8] R. J. Schalkoff, Pattern Recognition: Statistical, Structural, and Neural Approaches. New York: Wiley, 1992.
[9] T. K. Kohonen, Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1989.
[10] D. F. Specht, "Probabilistic neural networks and the polynomial adaline as complementary techniques for classification," IEEE Trans. Neural Networks, vol. 1, pp. 111-121, Mar. 1990.
[11] D. F. Specht, "A general regression neural network," IEEE Trans. Neural Networks, vol. 2, pp. 568-576, Nov. 1991.
[12] H. G. C. Träven, "A neural-network approach to statistical pattern classification by semiparametric estimation of probability density functions," IEEE Trans. Neural Networks, vol. 2, pp. 366-377, May 1991.
[13] M. T. Musavi, W. Ahmed, K. H. Chan, K. B. Faris, and D. M. Hummels, "On the training of radial basis functions classifiers," Neural Networks, vol. 5, pp. 595-603, 1992.
[14] A. G. Bors and M. Gabbouj, "Minimal topology for a radial basis functions neural network for pattern classification," Digital Signal Processing: A Rev. J., vol. 4, no. 3, pp. 173-188, July 1994.
[15] J. Platt, "A resource-allocating network for function interpolation," Neural Computa., vol. 3, no. 2, pp. 213-225, 1991.
[16] S. Chen, B. Mulgrew, and P. M. Grant, "A clustering technique for digital communications channel equalization using radial basis function networks," IEEE Trans. Neural Networks, vol. 4, pp. 570-579, July 1993.
[17] S. Lee and R. M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning," Neural Networks, vol. 4, pp. 207-224, 1991.
[18] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis functions networks," IEEE Trans. Neural Networks, vol. 2, pp. 302-309, Mar. 1991.
1364 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 6, NOVEMBER 1996
[19] S. Chen and B. Mulgrew, "Overcoming cochannel interference using an adaptive radial basis functions equalizer," Signal Processing, vol. 28, pp. 91-107, July 1992.
[20] T. K. Kohonen, "The self-organizing map," Proc. IEEE, vol. 78, no. 9, pp. 1464-1480, Sept. 1990.
[21] L. Xu, A. Krzyzak, and E. Oja, "Rival penalized competitive learning for clustering analysis, RBF net, and curve detection," IEEE Trans. Neural Networks, vol. 4, pp. 636-649, July 1993.
[22] G. Seber, Multivariate Observations. New York: Wiley, 1986.
[23] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications. Norwell, MA: Kluwer, 1990.
[24] I. Pitas and P. Kiniklis, "Median learning vector quantizer," in Proc. SPIE, Nonlinear Image Processing, San Jose, CA, vol. 2180, Feb. 7-9, 1994, pp. 23-34.
[25] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1984.
[26] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer, 1992.
[27] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[28] J. A. Kangas, T. K. Kohonen, and J. T. Laaksonen, "Variants of self-organizing maps," IEEE Trans. Neural Networks, vol. 1, pp. 93-99, Mar. 1990.
[29] E. Yair, K. Zeger, and A. Gersho, "Competitive learning and soft competition for vector quantizer design," IEEE Trans. Signal Processing, vol. 40, no. 2, pp. 294-309, Feb. 1992.
[30] A. G. Bors and I. Pitas, "Robust estimation of radial basis functions," in Proc. IEEE Wkshp. Neural Networks for Signal Processing, Ermioni, Greece, Sept. 6-8, 1994, pp. 105-114.
[31] I. Pitas, C. Kotropoulos, N. Nikolaidis, R. Yang, and M. Gabbouj, "A class of order statistics learning vector quantizers," in Proc. IEEE Int. Symp. Circuits Syst., London, 1994, pp. VI-387-VI-390.
[32] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[33] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. New York: Addison-Wesley, 1991.
[34] T. S. Huang, G. J. Yang, and G. Y. Tang, "A fast two-dimensional median filtering algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 13-18, 1979.
[35] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression. New York: Plenum, 1988.
[36] M. M. Chang, A. M. Tekalp, and M. I. Sezan, "Motion-field segmentation using an adaptive MAP criterion," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Minneapolis, MN, Apr. 1993, pp. V-33-V-36.
[37] D. P. Kottke and Y. Sun, "Motion estimation via cluster matching," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 1128-1132, Nov. 1994.
[38] A. G. Bors and I. Pitas, "Median radial basis function network for optical flow processing," in Proc. IEEE Wkshp. Nonlinear Signal and Image Processing, Neos Marmaras, Greece, June 1995, pp. 702-705.

Adrian G. Bors was born in Piatra Neamț, Romania, on November 3, 1967. He received the M.S. degree in electronics engineering from the Polytechnic University of Bucharest, Romania, in 1992. He is currently working toward the Ph.D. degree.
During 1992 to 1993, he was a Visiting Researcher at the Signal Processing Laboratory, Tampere University of Technology, Finland. Since 1993, he has been with the University of Thessaloniki, Greece. His research interests include neural networks, computer vision, pattern recognition, and nonlinear digital signal processing.

Ioannis Pitas (SM'94) received the Diploma degree in electrical engineering in 1980 and the Ph.D. degree in electrical engineering in 1985, both from the University of Thessaloniki, Greece.
From 1980 to 1993, he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering, University of Thessaloniki. He served as a Visiting Research Associate at the University of Toronto, Canada, the University of Erlangen-Nuernberg, Germany, and Tampere University of Technology, Finland, as well as a Visiting Assistant Professor at the University of Toronto. He was a Lecturer in short courses for continuing education. Since 1994, he has been a Professor at the Department of Informatics at the University of Thessaloniki. His current interests are in the areas of digital image processing, multidimensional signal processing, and computer vision. He has published more than 190 papers and contributed to eight books in his area of interest. He is the coauthor of the book Nonlinear Digital Filters: Principles and Applications (Boston, MA: Kluwer, 1990), the author of the book Digital Image Processing Algorithms (Englewood Cliffs, NJ: Prentice-Hall, 1993), and the editor of the book Parallel Algorithms and Architectures for Digital Image Processing, Computer Vision, and Neural Networks (New York: Wiley, 1993).
Dr. Pitas has been a member of the European Community ESPRIT Parallel Action Committee. He has also been an invited speaker and/or a member of the program committee of several scientific conferences and workshops. He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS and a coeditor of Multidimensional Systems and Signal Processing. He was chair of the 1995 IEEE Workshop on Nonlinear Signal and Image Processing (NSIP'95).