Abstract-Radial basis functions (RBF's) consist of a two-layer neural network, where each hidden unit implements a kernel function. Each kernel is associated with an activation region from the input space and its output is fed to an output unit. In order to find the parameters of a neural network which embeds this structure we take into consideration two different statistical approaches. The first approach uses classical estimation in the learning stage and it is based on the learning vector quantization algorithm and its second-order statistics extension. After the presentation of this approach, we introduce the median radial basis function (MRBF) algorithm based on robust estimation of the hidden unit parameters. The proposed algorithm employs the marginal median for kernel location estimation and the median of the absolute deviations for the scale parameter estimation. A histogram-based fast implementation is provided for the MRBF algorithm. The theoretical performance of the two training algorithms is comparatively evaluated when estimating the network weights. The network is applied in pattern classification problems and in optical flow segmentation.

I. INTRODUCTION

The RBF neural networks can be used to model the probability density functions (PDF's) in nonparametric classification tasks [10]-[12]. The basis functions, when used as activation functions for hidden units, provide the network with the capability of forming complex separation boundaries between classes, which is equivalent to what perceptron networks can provide through an intermediate mapping. The main applications for the RBF have been so far in pattern classification, where the network approximates the Bayesian classifier [12]-[14], and in system modeling [7], [15], [16]. In both areas, RBF networks gave better results when compared to other methods. The RBF network requires less computation time for the learning [7] and a more compact topology than other neural networks [17]. Various learning algorithms have been used in order to find the most appropriate parameters for the RBF decomposition. They can be classified in two major branches: batch learning, where the learning is done on groups of patterns [6], [13], [14], [18], and on-line learning, where the learning is adaptive, on a per pattern basis [7], [15].
middle of the window. The number of patterns to be taken into account by the algorithm depends on how fast the distribution of the data changes in time. A fast computing algorithm based on data sample histogram analysis is derived for the MRBF in Section III-C. This implementation is very useful in the case when data have discrete values, e.g., in image processing and computer vision applications. In Section IV, the expected stationary values are derived in the case when we estimate the parameters from a mixture of one-dimensional (1-D) Gaussian functions. We provide the theoretical bounds for mean and variance estimators in the case when we use either classical or robust estimation. When estimating the parameters of each function from a mixture of distributions, we investigate the parameter convergence to the stationary values. In Section V-A, both algorithms are applied in artificially generated data classification problems. In this application, the networks model the underlying probability for each class using the decomposition in RBF kernels. In order to decide the class for a new data sample, both Euclidean and Mahalanobis [8], [25] distances are used. The figures of merit are the classification error, the capability of functional approximation, as well as the estimation of the optimal boundary between the classes. In Section V-B, the proposed algorithm is applied for optical flow segmentation and in Section VI we draw the conclusions of the present study.

In supervised learning, the network is provided with a training set of patterns consisting of vectors and their corresponding classes. Each pattern is considered assigned only to one class C_k, according to an unknown mapping. After an efficient learning stage, the network implements the mapping rule and generalizes it for patterns which are not from the training set. According to Bayes' theorem [25], we can express the relation among the a posteriori probabilities P(C_k|X) of different classes by using their a priori probabilities P(C_k)

    P(C_k|X) = p(X|C_k) P(C_k) / Σ_{i=1}^{M} p(X|C_i) P(C_i)   (1)

where M is the number of classes and X is an N-dimensional vector denoting a pattern. Providing their capabilities of approximation [1]-[3], RBF networks can be used to describe the underlying probability as a sum of components with respect to a base (denoted by the function family φ)

    p_k(X) = Σ_{j=1}^{L} λ_kj φ_j(X)   (3)

where L is the number of kernel functions and λ_kj are the weights of the hidden unit to output connection.

Each hidden unit implements a Gaussian function

    φ_j(X) = exp[-(X - μ̂_j)^T Σ̂_j^{-1} (X - μ̂_j) / 2]   (4)
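The decomposition (3) with Gaussian kernels (4) can be sketched in the 1-D, scalar-dispersion case as follows. This is a minimal illustration with hypothetical centres, widths, and weights, not fitted network parameters:

```python
import math

def gaussian_kernel(x, mu, sigma):
    # phi_j(X) of (4) in the 1-D, scalar-dispersion case
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def rbf_output(x, centres, sigmas, weights):
    # p_k(X) = sum_j lambda_kj * phi_j(X), the decomposition (3)
    return sum(w * gaussian_kernel(x, m, s)
               for w, m, s in zip(weights, centres, sigmas))

# two kernels; at x = 5 the first kernel responds with phi = 1
p = rbf_output(5.0, centres=[5.0, 10.0], sigmas=[2.0, 2.0], weights=[0.7, 0.3])
```

At the first kernel centre the kernel response is exactly one, so the output is dominated by that kernel's weight plus the small overlap contribution of the second kernel.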
The activation region for a neuron is similar to the Voronoi neighborhood for a vector quantizer [26]. Let us denote by V_j the activation region of the jth kernel with respect to a metric distance

    V_j = {X ∈ R^N : ||X - μ̂_j|| < ||X - μ̂_i||, i = 1, ..., L, i ≠ j}   (5)

where || · || represents a distance metric, e.g., Euclidean. The separating boundary between two classes is the location of the vectors which have the same a posteriori probabilities for both classes. If we consider 1-D data, then we can express V_j = [τ_j, τ_{j+1}). Each output implements a weighted sum of kernels as given by (3). Classes can be coded in different ways by the outputs. For a more accurate representation of the classes we choose the number of outputs as equal to the number of classes. In this case, the class decision is assigned to the maximally activated output unit (winner take all). The sigmoidal function is used in order to limit the output values to the interval (0, 1)

    Y_k(X) = 1 / (1 + exp[-p_k(X)])   (7)

for k = 1, ..., M, where M represents the number of outputs and p_k(X) is given by (3). The sign of the weights λ_kj shows the activation or the inhibition of the hidden unit to output connection. If the sign of the weight λ_kj is positive, then the activation region of the kernel j corresponds to the class k; otherwise, it is not associated with the class k.

III. LEARNING IN RBF NETWORKS

A. Classical Statistics Approach

A combined unsupervised-supervised learning technique has been widely used [7], [16], [21] in order to estimate the RBF weights. This is an on-line technique which employs the LVQ algorithm [9], [20] in order to find the input to hidden unit weights in the unsupervised part and the least mean squares [27] for finding the λ_kj weights in the supervised part. At each iteration we first update the kernel parameters and afterwards the output weights. The unsupervised part of the learning stage is based on classical statistics assumptions.

In the classical statistics approach, the estimation of the mean and of the covariance matrix for a given population of data samples is given by

    μ̂_j = (1/n_j) Σ_{i=0}^{n_j - 1} X_i   (8)

    Σ̂_j = (1/(n_j - 1)) Σ_{i=0}^{n_j - 1} (X_i - μ̂_j)(X_i - μ̂_j)^T   (9)

where n_j is the number of data samples from the given data population [25].

In order to decide which class center will be updated, in LVQ the Euclidean distance is computed between the data sample and each center

    if ||X_t - μ̂_j||² = min_{k=1,...,L} ||X_t - μ̂_k||², then X_t ∈ C_j   (10)

where C_j is the winner class and || · || denotes the Euclidean distance. The LVQ algorithm is the adaptive version of (8), computed for patterns assigned to an activation region according to (10). In the original LVQ algorithm, the winner center is updated as

    μ̂_j(t+1) = μ̂_j(t) + η [X_t - μ̂_j(t)]   (11)

where η is the learning rate and μ̂_j(t) is the center vector estimate at the moment t. Various decaying rules for the learning rate were tested for the LVQ algorithm [28]. The learning rate which achieves the minimum output variance [29] is

    η = 1/n_j   (12)

where n_j is the number of samples assigned to the cluster j. For the covariance matrix calculation we use the extension of the LVQ algorithm for second-order statistics [16], [19]

    Σ̂_j(t+1) = ((n_j - 2)/(n_j - 1)) Σ̂_j(t) + (1/(n_j - 1)) [X_t - μ̂_j(t+1)][X_t - μ̂_j(t)]^T   (13)

where Σ̂_j(t) is the covariance matrix estimate at the moment t. We can observe that the formulas (11) and (13) are the adaptive versions of (8) and (9).

In some applications, it is worthwhile to use the Mahalanobis distance instead of the Euclidean one for the choice of the winner class. The Mahalanobis distance takes into consideration the covariance matrix of each basis function

    if (μ̂_j - X_t)^T Σ̂_j^{-1} (μ̂_j - X_t) = min_{k=1,...,L} [(μ̂_k - X_t)^T Σ̂_k^{-1} (μ̂_k - X_t)], then X_t ∈ C_j.   (14)

However, at the start of the learning algorithm, an imprecision in estimating the covariance parameters may occur and this can lead to a singular covariance matrix. Thus, for the first few data samples we can use the Euclidean distance (10) and afterwards employ the Mahalanobis distance (14). The initial values for the centers μ̂ are randomly generated and the covariance matrices are initialized with zero.
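The classical updates (10)-(13) can be sketched in the 1-D case as follows. This is an illustrative sketch with hypothetical function names, not the authors' implementation; the variance recursion is the scalar analogue of (13):

```python
def nearest_centre(x, centres):
    # winner selection by the Euclidean rule (10), 1-D case
    return min(range(len(centres)), key=lambda j: (x - centres[j]) ** 2)

def lvq_estimates(samples):
    # incremental centre update (11) with the learning rate eta = 1/n of
    # (12), and the recursive sample-variance update, the 1-D analogue
    # of the second-order extension (13); single unit, 1-D sketch
    mu, var, n = 0.0, 0.0, 0
    for x in samples:
        n += 1
        mu_old = mu
        mu += (x - mu) / n
        if n >= 2:
            var = ((n - 2) * var + (x - mu_old) * (x - mu)) / (n - 1)
    return mu, var
```

After the whole sequence has been seen, the recursion reproduces the batch estimates (8) and (9): for the samples 4, 6, 5, 5 it returns a centre of 5 and the unbiased sample variance 2/3.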
accurate estimates when data are contaminated with outliers or have long-tailed distributions [22], [23]. They are insensitive to extreme observations and this makes them attractive for parameter estimation. In the marginal median LVQ algorithm [24], [31], the data samples are marginally ordered and the centroid is taken as the marginal median [23]

    μ̂_j = med{X_0, X_1, ..., X_{n-1}}   (15)

where X_{n-1} is the last pattern assigned to the jth neuron. In order to avoid an excessive computational complexity, the median operation can be done on a finite set of data, extracted through a moving window that contains only the last W data samples assigned to the hidden unit j

    μ̂_j = med{X_0, X_1, ..., X_{n-1}}              if n < W
    μ̂_j = med{X_{n-W}, X_{n-W+1}, ..., X_{n-1}}    if n ≥ W   (16)

where X_k, k = n-W, ..., n-1, are the data samples assigned to the jth neuron according to (10) or (14). Window size W is small if the statistics of the sample population change rapidly in time, and large if the data sample statistics are relatively unchanged in time and a more accurate median estimate of the given data population is desired. Unlike in image filtering, where the window is rather small, for a good median estimate of the data samples a rather big window should be employed.

For the dispersion vector associated with a kernel function, we use the MAD estimator

    σ̂_j = med{|X_0 - μ̂_j|, ..., |X_{n-1} - μ̂_j|} / 0.6745   (17)

where 0.6745 is the scaling parameter in order to make the estimator Fisher consistent for the normal distribution [22]. MAD calculation is performed along each data dimension, independently. The same set of data samples can be taken into account in (17) as for the marginal median in (16).

The off-diagonal components of the covariance matrix can be calculated based on robust statistics as well [22]. We consider two arrays containing the sum and the difference of each two different components for a data sample from the moving window

    Z⁺_{i,hl} = X_{i,h} + X_{i,l}   (18)

    Z⁻_{i,hl} = X_{i,h} - X_{i,l}   (19)

for i = n-W, ..., n-1. First, the median of these new data populations is calculated according to (16). The squares of the corresponding MAD estimates (17) for the arrays Z⁺_{hl} and Z⁻_{hl} represent their variances and they are denoted as V⁺_{j,hl} and V⁻_{j,hl}. The off-diagonal components of the covariance matrix are derived as

    Σ̂_{j,hl} = (V⁺_{j,hl} - V⁻_{j,hl}) / 4.   (20)

In marginal median LVQ, both the Euclidean (10) and the Mahalanobis (14) distances can be used. In the case of the Mahalanobis distance, a good estimation is desired for the covariance matrix in order to be appropriately used for winner class selection. By using a robust estimation of the covariance matrix as in (17)-(20), we can be confident in the evaluation of the Mahalanobis distance. The order of RBF network weight updating is well defined: the kernel center, the covariance matrix which uses the previously estimated center, and afterwards the weights corresponding to the hidden unit to output connection. The network found by means of the proposed training algorithm is called the MRBF neural network.

The second layer is used in order to group the clusters found in the unsupervised stage into classes. The output weights are updated as follows:

    λ_kj(t+1) = λ_kj(t) + η₂ [F_k(X) - Y_k(X)] Y_k(X) [1 - Y_k(X)] φ_j(X)   (21)

for k = 1, ..., M and j = 1, ..., L, where the learning rate is η₂ ∈ (0, 1]. F_k(X) is the desired output for the pattern vector X and it is coded as

    F_k(X) = 1 if X ∈ C_k;  F_k(X) = 0 otherwise   (22)

for k = 1, ..., M. The formula (21) corresponds to the backpropagation [32], [33] for the output weights of an RBF network with respect to the mean square error cost function [4].

The network topology represents the number of neurons on each layer. The number of inputs and outputs can be set up from the given supervised problem. For evaluating the number of hidden units we can use various approaches: growing architecture, decreasing the number of hidden units, or a combination of these two. When the performance of the network is poor, the number of hidden units should be increased [14], [15], [17]. If some hidden units are not relevant for the classification, or their activation fields are overlapping, the network should be pruned [13]. The relevance of the hidden units is calculated based on the ratio between the number of data samples contained in their activation field and the total number of data samples. The overlapping of the activation fields can be evaluated by clustering similarity measures [8].

C. Fast Training in Median RBF Based on Data Sample Histograms

When the data samples are distributed in a discrete range of values we can find solutions for a fast MRBF training stage. A fast implementation for the median algorithm based on histogram updating, used in image filtering, was proposed in [34]. The first data sample assigned to a unit becomes the starting point in finding the median. In the updating stage we take into consideration pairs of data samples X_i and X_{i+1} assigned to the same unit according to either (10) or (14). We build up the marginal histogram associated with each activation region, denoted here as H_{jh}[k], where j is the hidden unit, h is the data sample entry, and k represents the histogram level. We denote by μ̂_{j,h}(t) the center estimate at instant t, and let us assume X_{i,h} < X_{i+1,h}. Median updating can be performed according to the rank of the incoming data samples

BORS AND PITAS: MEDIAN RADIAL BASIS FUNCTION 1355

where K is the number of histogram levels necessary to add or subtract in order to obtain the new location for the median. K is evaluated on the condition that the median is located where the data marginal histogram splits into two sides containing an equal number of samples.
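The robust estimators (15)-(17) and the histogram splitting condition above can be sketched as follows. This is an illustrative sketch with hypothetical function names; the histogram median here is a simplified, non-incremental version of the running update of Section III-C:

```python
def windowed_median(samples, W):
    # marginal median over the last W assigned samples, as in (16)
    window = sorted(samples[-W:])
    n = len(window)
    return window[n // 2] if n % 2 else 0.5 * (window[n // 2 - 1] + window[n // 2])

def mad_scale(samples, W, centre):
    # MAD scale estimate (17); dividing by 0.6745 makes the estimator
    # Fisher consistent for the normal distribution
    devs = [abs(x - centre) for x in samples[-W:]]
    return windowed_median(devs, W) / 0.6745

def histogram_median(values, levels=256):
    # median of discrete-valued data via its histogram: the level where
    # the histogram splits into two sides with equal sample counts
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    half, cum = (len(values) + 1) // 2, 0
    for level, count in enumerate(hist):
        cum += count
        if cum >= half:
            return level
```

For discrete data such as 8-bit image values, the histogram form avoids re-sorting the window for every incoming sample, which is the point of the fast implementation described above.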
[Figure: convergence of the center estimates (median versus mean) and of the scale estimate (MAD) as a function of the number of samples.]

    erf(y) = (1/√(2π)) ∫₀^y exp(-t²/2) dt   (34)

where E_n[μ̂_j] is the median stationary estimate of the given data. The stationary value of the jth distribution center estimate using the median estimator is obtained after inserting
Fig. 5. Theoretical analysis of the bias for median and classical statistics estimators in evaluating the RBF parameters: (a) center for N(5, σ) in the distribution (48), (b) center for N(5, σ) in the distribution (49), (c) scale parameter for N(5, σ) in (48), and (d) scale parameter for N(5, σ) in (49).
where E_n[μ̂_j] can be calculated from (37). E_n[σ̂_j] can be derived from

where f_{i+1}(X) is given in (32), E_med[μ̂_j] is evaluated after inserting (35) in (31), and c = 0.6745. By taking into account the median property of splitting the data distribution into two equal areas as in (36), we obtain
1358 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 6, NOVEMBER 1996
Fig. 6. Samples from Distribution I. The boundaries between classes are marked with: "-" for the optimal classifier, "- -" for MRBF, and "- ." for RBF.
    f(X) = N(5, σ) + N(10, σ)   (48)
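The bias behavior analyzed for data drawn from (48) can be reproduced with a small numerical experiment: draw samples from an equally weighted mixture of N(5, 2) and N(10, 2), keep those assigned to the centre at 5 by the nearest-centre rule, and compare the mean and median of the assigned samples. The overlap with N(10, 2) acts as outlier contamination. Equal component weights and known true centres are assumptions of this sketch, which is an illustration rather than the paper's derivation:

```python
import random

# draw from the mixture (48) with sigma = 2, components equally likely
rng = random.Random(3)
samples = [rng.gauss(5.0 if rng.random() < 0.5 else 10.0, 2.0)
           for _ in range(20000)]

# nearest-centre assignment, as in (10), with the true centres 5 and 10
cluster = [x for x in samples if abs(x - 5.0) < abs(x - 10.0)]

mean_est = sum(cluster) / len(cluster)
median_est = sorted(cluster)[len(cluster) // 2]
```

The mean of the assigned samples is pulled noticeably below 5 by the combination of truncation and contamination, while the median stays much closer to the true centre, in line with the theoretical comparison.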
[Fig. 7: samples from Distribution II, with the class boundaries found by the networks.]

[Fig. 8: learning curves (MSE versus the number of drawn samples) for the RBF and MRBF networks using the Mahalanobis distance.]
The convergence is the property of a neural network to achieve a stable state after a finite number of iterations. The convergence can be defined individually for a weight or globally, expressing the state of the network by a cost function. In the following we analyze the capacity of various weights to achieve a stable state. In the example (48) let us assume σ = 2. We use MRBF to estimate the parameters of the distribution N(5, 2). We find the expectation for the median by replacing the formula (35) in (31) and computing the integral numerically. In Fig. 4(a) we compare the expected bias for the marginal median against the bias of the stationary estimate of the mean, when estimating the Gaussian center. In Fig. 4(b) we provide a comparison between the expected bias of the MAD estimator (41) and that of the stationary estimate of the classical estimator for scale, which can be derived from (38). The expectation for the scale parameter using
Fig. 10. The optical flow corresponding to the movement between the first and third frames: (a) the optical flow provided by the block matching algorithm, (b) the optical flow after it was smoothed by the MRBF network, and (c) the optical flow after it was smoothed by the RBF network.

distribution (48) and in Fig. 5(b) for the distribution (49), both with respect to the assumed dispersion (scale parameter) σ. The comparison results for estimating the stationary state of the bias for the scale parameter E[σ̂_j] - σ are given in Fig. 5(c) and (d). From these plots it is evident that if certain overlaps occur among various Gaussian functions from the mixture, the respective amount of data samples contains outliers, while the median and MAD estimators provide smaller bias than the mean and classical sample deviation estimators. If the Gaussian functions are far away from each other with respect to their dispersions, the amount of outliers decreases and both algorithms provide similar results. However, if the isolated Gaussian functions are truncated, e.g., due to the decision (10), the robust estimators are more accurate than those based on classical statistics.

TABLE II
OPTICAL FLOW SEGMENTATION RESULTS

                 MRBF                                RBF
Network     MAE    MSE     TIME (s)             MAE    MSE     TIME (s)
Topology                   First   Total                       First   Total
N-L-M       1.53   8.57    0.21    0.37         2.39   17.60   0.30    0.46
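The per-unit error criteria reported in Table II measure the dispersion of the samples assigned to a hidden unit around the selected centre. A minimal 1-D sketch with illustrative data (the actual criteria operate on 2-D motion vectors):

```python
def centre_errors(samples, centre):
    # mean absolute error and mean square error of the samples assigned
    # to a unit, taken with respect to the selected centre
    n = len(samples)
    mae = sum(abs(x - centre) for x in samples) / n
    mse = sum((x - centre) ** 2 for x in samples) / n
    return mae, mse

mae, mse = centre_errors([1.0, 2.0, 4.0], 2.0)
```

A smaller MAE and MSE indicate that the chosen centres represent their assigned samples more tightly, which is how the two networks are compared in Table II.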
V. SIMULATION RESULTS

A. Estimating Probability Density Functions

In the previous section we have evaluated the theoretical performance in parameter estimation for both algorithms described in Section III. In this section, we test these algorithms for the estimation of mixed bivariate normal and contaminated normal distributions.

The problem of finding the parameters for the Gaussian functions is seen as a supervised learning task. We consider four artificially generated distributions, each containing two-dimensional (2-D) clusters. This problem can be considered as a 2-D extension of the mixture of Gaussians model analyzed in Section IV. A 2-D Gaussian distribution is denoted by N(μ_1, μ_2; σ_1, σ_2). The Gaussian clusters are grouped in two classes in order to form more complex distributions.

Distribution I:

    P_1^I(X) = N(2, 1; 3, 1) + N(8, 7; 3, 1)
    P_2^I(X) = N(8, 2; 1, 3) + N(2, 6; 1, 3).   (50)

Distribution II:

    P_1^II(X) = N(6, 0; 4, 1) + N(0, 6; 1, 4)
    P_2^II(X) = N(6, 6; 2, 2).   (51)

Two more distributions are obtained from the first two by adding uniformly distributed data samples.

Distribution III:

    P_k^III(X) = ε P_k^I(X) + (1 - ε) U([-5, 15], [-5, 15]).   (52)

Distribution IV:

    P_k^IV(X) = ε P_k^II(X) + (1 - ε) U([-5, 15], [-5, 15])   (53)

where k ∈ {1, 2} and ε = 0.9. We denote by U([-5, 15], [-5, 15]) a uniform distribution having the domain [-5, 15] × [-5, 15].

For MRBF we consider a running window of W = 401 samples when evaluating the median estimator according to (16). Both Euclidean and Mahalanobis distances were considered in order to decide which neuron is to be updated for a new data sample. The same data were used for training both algorithms. We have tested the ability of classification for both networks after the learning stage was concluded. The misclassification error compares the true output F_k(X) with the output Y_k(X) given by the network and is represented as a fraction of the total number of samples. The second comparison criterion is the approximation of the PDF functions by the networks. The optimal network is obtained when the network weights are equal to the parameters of the Distribution I or II (50) and (51). The mean square error calculated between the ideal function and the estimated one is defined as

    MSE = (1/M) Σ_{k=1}^{M} (1/n) Σ_{i=1}^{n} [P_k(X_i) - ŷ_k(X_i)]²   (54)

where ŷ_k(X) is the hypersurface modeled by the kth output unit. This constitutes a global performance estimation measure.

The comparison results provided by the networks are given in Table I. Each experiment was repeated many times for different data, drawn from the same distributions. Patterns from the first two distributions are represented in Figs. 6 and 7. The two figures also display the boundaries found by means of the neural networks as well as the optimal boundaries. The same number of hidden units is assumed for each network. From these figures it is evident that MRBF approximates the class boundaries better than RBF. The advantage is clear for MRBF in all the cases considered in Table I. However, when the mixture of bivariate normal distributions is contaminated with uniformly distributed patterns (e.g., in Distributions III and IV), the difference becomes very large because robust-type learning is insensitive to extreme observations. By using the Mahalanobis distance (14) instead of the Euclidean one we obtain better results for both algorithms, except for the case when we use the classical estimators for the uniform contaminated model (52) and (53). In this case, because of the noise corruption, the estimation of the covariance matrices is poor. The MRBF algorithm based on the Mahalanobis data assignment rule gives the best results in all the assumed cases, as can be seen from Table I.

In Fig. 8 we evaluate the global convergence of the algorithms in the case when the data are drawn from Distribution I. The learning curves represent the estimation of the PDF functions given by the MSE (54), with respect to the number of drawn samples. From this plot it is clear that the MRBF network provides a smaller MSE when compared to the classical RBF network. The improvement produced when using the Mahalanobis distance is evident from this plot as well.

B. Optical Flow Segmentation

Motion representation and modeling is an important step toward dynamic image understanding. The optical flow field consists of the distribution of the velocities associated with the image element displacement. A variety of motion estimation
TABLE III
EVALUATION OF THE SPEED (IN PIXELS/FRAME) FOR VARIOUS MOVING OBJECTS
techniques exists [35]. Block matching motion estimation techniques are widely used in video coding. A block matching algorithm assumes that the object displacement is constant within a small block of pixel elements. The block displacement is estimated by using correlation or matching techniques. The 2-D vector is chosen such that it minimizes the mean absolute error or mean square error between two blocks from a certain neighborhood from two different frames [35]. The best results are obtained when a full search is employed. This method takes into consideration all the possible pixel blocks within a region around the original block. The search region is chosen according to the expected maximal speed in the sequence. By employing block matching techniques, good results can be obtained in the regions having many details. However, in regions with almost constant pixel intensity, this algorithm usually gives a certain number of erroneous decisions.

Optical flow segmentation algorithms identify the regions having similar motion vectors. Various algorithms based on clustering were proposed to be used in optical flow segmentation [36], [37]. When applying RBF networks for optical flow segmentation, the centers of the hidden units represent groups of motion vectors [38]. Each set of vectors corresponds to an object or to a part of an object, moving with a certain velocity.

We have applied the algorithms presented in Section III to the "Hamburg taxi" sequence. The first and third frames of this sequence are shown in Fig. 9(a) and (b). Their frame size is 256 x 190 and they contain three important moving objects: a taxi turning around the corner, a car in the lower left moving from left to right, and a van in the lower right moving from right to left. In the first processing stage we have estimated the optical flow by using the full search block matching algorithm, when assuming blocks of 4 x 4 pixels. The block matching search region is taken [-8, 8] × [-8, 8] pixels wide. The motion field provided by the block matching is shown in Fig. 10(a). The optical flow histogram is represented in Fig. 11(a). The four moving objects (including the background) can be easily identified as histogram peaks (a concentration of motion vectors with similar velocity) in Fig. 11(a).

We have employed both RBF and MRBF neural networks for optical flow segmentation. The input to hidden unit weights are calculated in an unsupervised manner as presented in Section III. We have considered a second-level clustering algorithm for finding the output weights λ_kj. Each cluster of motion vectors which activates a hidden unit is assigned to an output unit based on the Euclidean distance between each two hidden unit centers.

We have evaluated the performance of the algorithms in terms of mean square error (MSE) as well as mean absolute error (MAE) of the optical flow with respect to the center selection

    MAE = (1/n_j) Σ_{i=1}^{n_j} ||X_i - μ̂_j||   (55)

where X_i, i = 1, ..., n_j, are data samples assigned to the jth unit. Image sequence processing needs fast algorithms in most applications. Thus, we have implemented the histogram-based algorithm for MRBF as presented in Section III-C. Both algorithms were tested in the same conditions. The hidden units which have only a very small amount of motion vectors assigned are pruned out. Only one pass through the data is enough in order to achieve a good motion smoothing. The comparison results in terms of MAE, MSE, and training time are shown in Table II. The time for the first layer updating corresponds to the calculation of the hidden unit weights. The total time also includes the output weights calculation. All these times correspond to an implementation on a Silicon Graphics Indigo Workstation. The implementation algorithm proposed in Section III-C for MRBF parameter evaluation proved to be fast.

We have also evaluated the speed (in pixels/frame) for the corresponding moving objects. The optimal velocity was calculated as the average of the clear feature displacements from each moving object, obtained independently, in a semiautomatic way. The comparison results between the real speed of the objects and the speed obtained by means of the RBF and MRBF algorithms are given in Table III. The velocity vectors found in the "Hamburg taxi" sequence by the MRBF and RBF algorithms are displayed in Fig. 10(b) and (c). The smoothing obtained after processing the optical flow by using either the MRBF or RBF algorithm is clear from these figures. A more complex criterion taking into account the block average graylevel and the position of the blocks would provide better moving object segmentation results [38].

The histograms representing the optical flow modeled by means of the MRBF and RBF networks as |λ_kj| φ_j(X) are shown in Fig. 11(b) and (c), respectively. Comparing these histograms to the initial optical flow histogram from Fig. 11(a)
VI. CONCLUSIONS
In this paper we present a novel algorithm for estimating
the RBF weights based on robust estimation and called median
RBF. This algorithm is presented in comparison with a classi-
cal approach for training an RBF network. We have employed
the marginal median estimator for evaluating the basis function
centers and the MAD for estimating the dispersion parameters.
We propose an implementation for MRBF network based on
data histogram updating which proved to be fast. We provide
the theoretical evaluation of the bias for both aligorithms in
the case when estimating overlapping Gaussian distributions.
The MRBF-based training is less biased by the presence of
the outliers in the training set and was proved to provide
an accurate estimation of the implied probabilities. Both RBF
and MRBF algorithms were compared in PDF estimation of
artificially generated data as well as in motion segmentation
of a real-life image sequence. In both cases, the MRBF gave
better estimation of the implied PDF’s and has shown better
classification capabilities.
Fig. 11. The optical flow histograms corresponding to the movement between the first and third frames. From left to right, the peaks in the histograms represent the van, the taxi, the background, and the left car: (a) the histogram of the optical flow when using the full-search block-matching algorithm; (b) the histogram represented by means of the MRBF network; and (c) the histogram represented by means of the RBF network.

We can observe that the MRBF network approximates it better than the RBF network. From Figs. 10(c) and 11(c), as well as from Table III, we can see that the algorithm based on classical training was not able to correctly identify the movement of the "taxi" moving object. According to these experiments, the proposed MRBF learning algorithm provides a better estimation of the desired parameters when compared with the classical statistics-based training of the RBF network.

REFERENCES

[1] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, no. 9, pp. 1481-1497, Sept. 1990.
[2] E. J. Hartman, J. D. Keeler, and J. M. Kowalski, "Layered neural networks with Gaussian hidden units as universal approximations," Neural Computa., vol. 2, pp. 210-215, 1990.
[3] J. Park and I. W. Sandberg, "Universal approximation using radial basis functions network," Neural Computa., vol. 3, pp. 246-257, 1991.
[4] E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Stat., vol. 33, pp. 1065-1076, 1962.
[5] R. M. Sanner and J.-J. E. Slotine, "Gaussian networks for direct adaptive control," IEEE Trans. Neural Networks, vol. 3, pp. 837-863, Nov. 1992.
[6] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321-355, 1988.
[7] J. Moody and C. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computa., vol. 1, no. 2, pp. 281-294, 1989.
[8] R. J. Schalkoff, Pattern Recognition: Statistical, Structural, and Neural Approaches. New York: Wiley, 1992.
[9] T. K. Kohonen, Self-Organization and Associative Memory. Berlin: Springer-Verlag, 1989.
[10] D. F. Specht, "Probabilistic neural networks and the polynomial adaline as complementary techniques for classification," IEEE Trans. Neural Networks, vol. 1, pp. 111-121, Mar. 1990.
[11] D. F. Specht, "A general regression neural network," IEEE Trans. Neural Networks, vol. 2, pp. 568-576, Nov. 1991.
[12] H. G. C. Träven, "A neural-network approach to statistical pattern classification by semiparametric estimation of probability density functions," IEEE Trans. Neural Networks, vol. 2, pp. 366-377, May 1991.
[13] M. T. Musavi, W. Ahmed, K. H. Chan, K. B. Faris, and D. M. Hummels, "On the training of radial basis functions classifiers," Neural Networks, vol. 5, pp. 595-603, 1992.
[14] A. G. Bors and M. Gabbouj, "Minimal topology for a radial basis functions neural network for pattern classification," Digital Signal Processing: A Rev. J., vol. 4, no. 3, pp. 173-188, July 1994.
[15] J. Platt, "A resource-allocating network for function interpolation," Neural Computa., vol. 3, no. 2, pp. 213-225, 1991.
[16] S. Chen, B. Mulgrew, and P. M. Grant, "A clustering technique for digital communications channel equalization using radial basis function networks," IEEE Trans. Neural Networks, vol. 4, pp. 570-579, July 1993.
[17] S. Lee and R. M. Kil, "A Gaussian potential function network with hierarchically self-organizing learning," Neural Networks, vol. 4, pp. 207-224, 1991.
[18] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis functions networks," IEEE Trans. Neural Networks, vol. 2, pp. 302-309, Mar. 1991.
1364 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 7, NO. 6, NOVEMBER 1996
[19] S. Chen and B. Mulgrew, "Overcoming cochannel interference using an adaptive radial basis functions equalizer," Signal Processing, vol. 28, pp. 91-107, July 1992.
[20] T. K. Kohonen, "The self-organizing map," Proc. IEEE, vol. 78, no. 9, pp. 1464-1480, Sept. 1990.
[21] L. Xu, A. Krzyzak, and E. Oja, "Rival penalized competitive learning for clustering analysis, RBF net, and curve detection," IEEE Trans. Neural Networks, vol. 4, pp. 636-649, July 1993.
[22] G. Seber, Multivariate Observations. New York: Wiley, 1986.
[23] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications. Norwell, MA: Kluwer, 1990.
[24] I. Pitas and P. Kiniklis, "Median learning vector quantizer," in Proc. SPIE, Nonlinear Image Processing, San Jose, CA, vol. 2180, Feb. 7-9, 1994, pp. 23-34.
[25] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1984.
[26] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer, 1992.
[27] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[28] J. A. Kangas, T. K. Kohonen, and J. T. Laaksonen, "Variants of self-organizing maps," IEEE Trans. Neural Networks, vol. 1, pp. 93-99, Mar. 1990.
[29] E. Yair, K. Zeger, and A. Gersho, "Competitive learning and soft competition for vector quantizer design," IEEE Trans. Signal Processing, vol. 40, no. 2, pp. 294-309, Feb. 1992.
[30] A. G. Bors and I. Pitas, "Robust estimation of radial basis functions," in Proc. IEEE Wkshp. Neural Networks for Signal Processing, Ermioni, Greece, Sept. 6-8, 1994, pp. 105-114.
[31] I. Pitas, C. Kotropoulos, N. Nikolaidis, R. Yang, and M. Gabbouj, "A class of order statistics learning vector quantizers," in Proc. IEEE Int. Symp. Circuits Syst., London, 1994, pp. VI-387-VI-390.
[32] D. E. Rumelhart and J. L. McClelland, Parallel Distributed Processing. Cambridge, MA: MIT Press, 1986.
[33] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. New York: Addison-Wesley, 1991.
[34] T. S. Huang, G. J. Yang, and G. Y. Tang, "A fast two-dimensional median filtering algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 13-18, 1979.
[35] A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and Compression. New York: Plenum, 1988.
[36] M. M. Chang, A. M. Tekalp, and M. I. Sezan, "Motion-field segmentation using an adaptive MAP criterion," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Minneapolis, MN, Apr. 1993, pp. V-33-V-36.
[37] D. P. Kottke and Y. Sun, "Motion estimation via cluster matching," IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 1128-1132, Nov. 1994.
[38] A. G. Bors and I. Pitas, "Median radial basis function network for optical flow processing," in Proc. IEEE Wkshp. Nonlinear Signal and Image Processing, Neos Marmaras, Greece, June 1995, pp. 702-705.

Adrian G. Bors was born in Piatra Neamț, Romania, on November 3, 1967. He received the M.S. degree in electronics engineering from the Polytechnic University of Bucharest, Romania, in 1992. He is currently working toward the Ph.D. degree.
During 1992 to 1993, he was a Visiting Researcher at the Signal Processing Laboratory, Tampere University of Technology, Finland. Since 1993, he has been with the University of Thessaloniki, Greece. His research interests include neural networks, computer vision, pattern recognition, and nonlinear digital signal processing.

Ioannis Pitas (SM'94) received the Diploma degree in electrical engineering in 1980 and the Ph.D. degree in electrical engineering in 1985, both from the University of Thessaloniki, Greece.
From 1980 to 1993, he served as Scientific Assistant, Lecturer, Assistant Professor, and Associate Professor in the Department of Electrical and Computer Engineering, University of Thessaloniki. He served as a Visiting Research Associate at the University of Toronto, Canada, the University of Erlangen-Nuernberg, Germany, and Tampere University of Technology, Finland, as well as a Visiting Assistant Professor at the University of Toronto. He was a Lecturer in short courses for continuing education. Since 1994, he has been a Professor at the Department of Informatics at the University of Thessaloniki. His current interests are in the areas of digital image processing, multidimensional signal processing, and computer vision. He has published more than 190 papers and contributed to eight books in his area of interest. He is the coauthor of the book Nonlinear Digital Filters: Principles and Applications (Boston, MA: Kluwer, 1990), the author of the book Digital Image Processing Algorithms (Englewood Cliffs, NJ: Prentice-Hall, 1993), and the editor of the book Parallel Algorithms and Architectures for Digital Image Processing, Computer Vision, and Neural Networks (New York: Wiley, 1993).
Dr. Pitas has been a member of the European Community ESPRIT Parallel Action Committee. He has also been an invited speaker and/or a member of the program committee of several scientific conferences and workshops. He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS and a coeditor of Multidimensional Systems and Signal Processing. He was chair of the 1995 IEEE Workshop on Nonlinear Signal and Image Processing (NSIP'95).