
A Perceptron-Like Online Algorithm for Tracking the Median

Tom Bylander and Bruce Rosen
Division of Computer Science, University of Texas at San Antonio
San Antonio, TX 78249 USA
{bylander,rosen}@cs.utsa.edu

Abstract
We present an online algorithm for tracking the median of a series of values. The algorithm updates its current estimate of the median by incrementing or decrementing it by a fixed value, which is analogous to perceptron updating. The median value of a sequence minimizes the absolute loss, i.e., the sum of absolute deviations. Our analysis shows that the worst-case absolute loss of our algorithm is comparable to the absolute loss of any sequence of target medians, given restrictions on how much the target can change per trial.

Introduction

The median is the value at the halfway point in a set of values or a probability distribution. The median is a useful order statistic [6]. For example, we have been working in the domain of fetal heart rate interpretation. In this domain, the median is better than the mean for estimating the baseline heart rate because the median is less sensitive to extreme values, e.g., outliers caused by arrhythmias or noise [7]. When the median of a time series is stationary or nearly so, the median of the previous n values is efficient to determine and provides a good estimate of the median. The sample median is typically computed by sorting the data and accessing the middle value, a process which is O(n log n), though it should be noted that more efficient algorithms are available [2].

When the median of a time series is nonstationary, it is not clear how to estimate the median. For example, the baseline fetal heart rate can change; in particular, the onset of a falling baseline is especially important to detect. It might be possible to employ traditional techniques from time-series analysis [3], but strong assumptions about the underlying process are often required. A different approach is to use an algorithm that is guaranteed to have good performance even in the worst case.

In our paper, we present an online algorithm for tracking the median. The algorithm stores a current estimate of the median. Before receiving a datum, the algorithm uses its current estimate as the prediction of the next median. After receiving the datum, the current estimate is updated by incrementing or decrementing a fixed value according to whether the prediction is lower or higher than the datum, respectively.

It is important to note that the estimate of the median at time point t is given before the datum is received. Of course, a good estimate of the median can be given by sampling a window around time t, but this requires data at and after time t. A good estimate might also be given by sampling a window before time t. Our algorithm then provides a performance standard for empirical comparison.

The update of our algorithm is a specialization of the perceptron update rule, i.e., a perceptron unit that simply consists of a bias weight with no thresholding. The single input to the unit is always 1. The bias weight corresponds to the current estimate of the median. The learning rate corresponds to the fixed increment/decrement value.

Our worst-case analysis is based on the fact that the median minimizes the sum of absolute deviations, i.e., the absolute loss. In particular, we show that the worst-case absolute loss of our algorithm is comparable to the absolute loss of the optimal sequence of target medians, i.e., at each time point, the target median can be thought of as the median of the distribution at that time point. The only restriction we make is a limit on the amount that the target median can change per time step. No other assumptions are made regarding the distribution of the data.

Preliminaries

A trial consists of a datum $x$ from $\mathbb{R}$, where $x$ corresponds to the next value in some time series. For each trial, our online algorithm makes a prediction $\hat{m}$, i.e., its current estimate of the median. Then the algorithm receives the datum $x$ and modifies its estimated median. Over a sequence $S$ of $k$ trials, the performance of our online algorithm is measured by comparing the sum of its absolute losses $\sum_{t=1}^{k} |x_t - \hat{m}_t|$ with the sum of the absolute losses of the target median $\sum_{t=1}^{k} |x_t - m|$, or with the sum of the absolute losses of a sequence of target medians $\sum_{t=1}^{k} |x_t - m_t|$. The notation $\mathrm{Loss}(\cdot\,; S)$ will be used to refer to the total loss of an algorithm or target median(s) on a given sequence $S$. The target median $m$ or sequence of target medians $(m_1, \ldots, m_k)$ minimizes this loss, subject to a limit on how much two consecutive target medians can differ. There are no other restrictions on the distribution of trials, and we do not assume that the distribution is stationary.

It might appear that we are assuming that the trials and the target median(s) are known in advance. This is not the case. We merely rely on the fact that some sequence of trials will occur and that some target median(s) will be optimal for the trial sequence after the trial sequence has been seen. Our online algorithm makes a prediction before each trial and never receives the target median(s). Even so, the performance of the online algorithm is comparable to what is optimal after the trial sequence is known.

A well-developed methodology for analyzing this kind of online algorithm is to find a distance function $d(m, \hat{m})$ that changes in correspondence to the loss of the algorithm and the loss of the target [1, 4, 5]. In our case, we use $d(m, \hat{m}) = (m - \hat{m})^2$. Based on this methodology, we analyze an online algorithm for estimating the median.
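As a concrete illustration of this loss comparison, here is a minimal Python sketch (the function name `absolute_loss` and the toy data are ours, not from the paper); it checks that a fixed target median incurs less absolute loss than a naive online predictor on a small series:

```python
from statistics import median

def absolute_loss(xs, preds):
    """Sum of absolute deviations between data and predictions."""
    return sum(abs(x - p) for x, p in zip(xs, preds))

xs = [10, 12, 9, 11, 10, 30, 10]         # a short series with one outlier (30)
m = median(xs)                            # the best fixed target median
target_loss = absolute_loss(xs, [m] * len(xs))

# A naive online predictor: predict the previous datum (first prediction 0).
naive_preds = [0] + xs[:-1]
naive_loss = absolute_loss(xs, naive_preds)

# The fixed median minimizes absolute loss over all constant predictions,
# so it can only do at least as well as the naive predictor's constant rival.
assert target_loss <= naive_loss
```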
Algorithm


Algorithm OM($s$, $\eta$)

Parameters:
  $s$: initial estimate of the median.
  $\eta$: amount to increment or decrement the estimate, $\eta > 0$.

Initialization: Before the first trial, set $\hat{m}_1$ to $s$.

Update: After receiving the $t$th datum $x_t$, update the estimate as follows:

$$\hat{m}_{t+1} = \begin{cases} \hat{m}_t + \eta & \text{if } x_t > \hat{m}_t \\ \hat{m}_t & \text{if } x_t = \hat{m}_t \\ \hat{m}_t - \eta & \text{if } x_t < \hat{m}_t \end{cases}$$

Figure 1. Online Algorithm for Tracking the Median

Figure 1 displays the OM (Online Median) algorithm, with parameters $s$ and $\eta$, where $\eta > 0$. The first median estimate $\hat{m}_1$ is $s$. Thereafter, if the next datum is higher/lower than the current estimate, then the estimate is incremented/decremented by $\eta$. Clearly, the OM algorithm can be implemented by a simple perceptron-like computing unit, i.e., a perceptron with just a bias weight and no thresholding.

The OM algorithm can be modified for any quantile (the median is the 0.5 quantile). If instead an online estimate of the $p$th quantile is desired, then the increment should be $2\eta p$ and the decrement should be $2\eta(1 - p)$. However, our analysis below only applies to the median.

Consider the following example. Suppose that a time series begins with a sequence of ten 10s, and then alternates between 9 and 11. Then OM(0, 1) starts with an initial estimate $\hat{m}_1 = 0$. This estimate is incremented by 1 for each of the first 10 trials, leading to $\hat{m}_{11} = 10$. Then, $x_{11} = 9$ results in a decrement, i.e., $\hat{m}_{12} = 9$. The next datum $x_{12} = 11$ results in an increment, i.e., $\hat{m}_{13} = 10$. The median estimate continues to alternate between 9 and 10. This example, though simple, illustrates the three sources of loss for the OM algorithm. The loss of the OM algorithm is limited to the sum of the following.
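The update in Figure 1 and the quantile generalization take only a few lines. The following Python sketch (function names are ours, not from the paper) replays the worked example above with OM(0, 1):

```python
def om_update(est, x, eta):
    """One OM step: move the estimate by eta toward the datum."""
    if x > est:
        return est + eta
    if x < est:
        return est - eta
    return est

def quantile_update(est, x, eta, p):
    """Generalization to the p-th quantile (p = 0.5 recovers the OM step)."""
    if x > est:
        return est + 2 * eta * p
    if x < est:
        return est - 2 * eta * (1 - p)
    return est

# Replay the example: ten 10s, then alternating 9, 11, using OM(0, 1).
xs = [10] * 10 + [9, 11, 9, 11]
est = 0
history = []
for x in xs:
    history.append(est)   # the prediction is made before seeing the datum
    est = om_update(est, x, 1)

assert history[10] == 10  # m-hat_11 = 10 after the ten 10s
assert est == 10          # the estimate settles into alternating 9, 10
```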

First, the OM algorithm incurs a loss during the beginning of the sequence as the median estimate moves from its initial value toward the target median 10. This source adds a constant to the total loss because it only occurs on the initial part of the sequence. Second, if the current median estimate happens to equal the target median, the next increment or decrement causes the estimate to deviate from the target. This source results in a per-trial loss. Third, the target median 10 itself has a loss. In the worst case, the OM algorithm cannot do better than the target.

Analysis of Stationary Median

Before analyzing the general case of a nonstationary median, it is useful to consider the stationary median first.

Theorem 1  Let $S = (x_1, \ldots, x_k)$ be an arbitrary trial sequence with target median $m$, let $s$ be an arbitrary real number, and let $\eta$ be an arbitrary positive real number. Then:

$$\mathrm{Loss}(\mathrm{OM}(s, \eta); S) \le \mathrm{Loss}(m; S) + \frac{(m - s)^2}{2\eta} + \frac{k\eta}{2} \quad (1)$$

Proof: If we can show that the following inequality holds for each trial $t$:

$$|x_t - \hat{m}_t| - |x_t - m| \le \frac{(m - \hat{m}_t)^2}{2\eta} - \frac{(m - \hat{m}_{t+1})^2}{2\eta} + \frac{\eta}{2} \quad (2)$$

then by summing over all trials, we can obtain the inequality of the theorem:

$$\mathrm{Loss}(\mathrm{OM}(s, \eta); S) - \mathrm{Loss}(m; S) = \sum_{t=1}^{k} |x_t - \hat{m}_t| - \sum_{t=1}^{k} |x_t - m|$$
$$\le \sum_{t=1}^{k} \left[ \frac{(m - \hat{m}_t)^2}{2\eta} - \frac{(m - \hat{m}_{t+1})^2}{2\eta} + \frac{\eta}{2} \right] = \frac{(m - \hat{m}_1)^2}{2\eta} - \frac{(m - \hat{m}_{k+1})^2}{2\eta} + \frac{k\eta}{2} \le \frac{(m - s)^2}{2\eta} + \frac{k\eta}{2}$$

So consider the following cases.

Case 1: $x_t > \hat{m}_t$. In this case, $\hat{m}_{t+1} = \hat{m}_t + \eta$. We can derive:

$$|x_t - \hat{m}_t| - |x_t - m| = (x_t - \hat{m}_t) - |x_t - m| \le (x_t - \hat{m}_t) - (x_t - m) = m - \hat{m}_t$$

and the right-hand side of Inequality 2 is equal to the same quantity:

$$\frac{(m - \hat{m}_t)^2}{2\eta} - \frac{(m - \hat{m}_{t+1})^2}{2\eta} + \frac{\eta}{2} = \frac{(m - \hat{m}_t)^2 - (m - \hat{m}_t - \eta)^2}{2\eta} + \frac{\eta}{2} = \frac{2\eta(m - \hat{m}_t) - \eta^2}{2\eta} + \frac{\eta}{2} = m - \hat{m}_t$$

so Inequality 2 follows.

Case 2: $x_t = \hat{m}_t$. In this case, $\hat{m}_{t+1} = \hat{m}_t$. Clearly the left-hand side of Inequality 2 is nonpositive, and the right-hand side is nonnegative, so the inequality follows.

Case 3: $x_t < \hat{m}_t$. In this case, $\hat{m}_{t+1} = \hat{m}_t - \eta$. We can derive:

$$|x_t - \hat{m}_t| - |x_t - m| = (\hat{m}_t - x_t) - |x_t - m| \le (\hat{m}_t - x_t) - (m - x_t) = \hat{m}_t - m$$

Similar to Case 1, the right-hand side of Inequality 2 turns out to be equal to:

$$\frac{(m - \hat{m}_t)^2 - (m - \hat{m}_t + \eta)^2}{2\eta} + \frac{\eta}{2} = \frac{-2\eta(m - \hat{m}_t) - \eta^2}{2\eta} + \frac{\eta}{2} = \hat{m}_t - m$$

Inequality 2 holds for all three cases, thus the inequality of the theorem follows.

Thus, the loss of the OM algorithm can exceed the loss of the target median $m$ by at most two terms. One term, which is a constant regardless of the length of the sequence, is quadratic in $|m - s|$ and linear in $1/\eta$. The other term is an additional loss per trial of at most $\eta/2$. This suggests a tradeoff in the choice of $\eta$. A larger $\eta$ decreases $(m - s)^2/(2\eta)$ but increases $k\eta/2$. A smaller $\eta$ has the opposite effect.

Inequality 1 is a tight worst-case bound for the OM algorithm. Let $c$ be a positive integer. Suppose that the initial estimate $s$ is 0, the increment $\eta$ is 1, and that the datum $x_t = c$ for $1 \le t \le c$, and then alternates between $c - 1$ and $c + 1$. The target median $m = c$ has a loss of $k - c$. The loss of the OM algorithm varies from $c$ on the first trial down to 1 on the $c$th trial, and then alternates between 1 and 2 for the remaining $k - c$ trials. Assuming that $k - c$ is an even number, the total loss of the OM algorithm is:

$$\sum_{t=1}^{c} (c - t + 1) + \sum_{t=c+1}^{k} \frac{3}{2} = \frac{c(c + 1)}{2} + \frac{3(k - c)}{2}$$

which is equal to:

$$(k - c) + \frac{c^2 + k}{2}$$

which makes Inequality 1 an equality for this case.
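The tight-bound construction can be checked numerically. This sketch (under the stated assumptions s = 0, eta = 1, and k − c even; the function name is ours) verifies that the loss of OM on the adversarial sequence equals the bound of Inequality 1:

```python
def om_loss_on_adversarial_sequence(c, k):
    """Total absolute loss of OM(0, 1) on the sequence from the text:
    x_t = c for t <= c, then alternating c-1, c+1."""
    xs = [c] * c + [c - 1 if i % 2 == 0 else c + 1 for i in range(k - c)]
    est, loss = 0, 0
    for x in xs:
        loss += abs(x - est)
        est += 1 if x > est else (-1 if x < est else 0)
    return loss

c, k = 5, 13                              # k - c = 8, an even number
lhs = om_loss_on_adversarial_sequence(c, k)
rhs = (k - c) + (c * c + k) / 2           # bound of Inequality 1 with m = c
assert lhs == rhs                         # the bound is met with equality
```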

Analysis of Nonstationary Median

Now we consider the situation where there is a sequence of target medians $(m_1, \ldots, m_k)$. The only constraint we impose is that the target median does not change by more than $\delta$ from trial to trial. That is, for $1 \le t \le k - 1$, we require:

$$m_t - \delta \le m_{t+1} \le m_t + \delta$$

As long as the increment $\eta$ exceeds $\delta$, the loss of the OM algorithm is comparable to the loss of the target medians.

Theorem 2  Let $S = (x_1, \ldots, x_k)$ be an arbitrary trial sequence with a sequence of target medians $M = (m_1, \ldots, m_k)$, where consecutive target medians differ by at most $\delta$, $\delta \ge 0$. Let $s$ be an arbitrary real number, and let $\eta$ be an arbitrary real number such that $\eta > \delta$. Then:

$$\mathrm{Loss}(\mathrm{OM}(s, \eta); S) \le \frac{\eta + \delta}{\eta - \delta}\,\mathrm{Loss}(M; S) + \frac{(m_1 - s)^2}{2(\eta - \delta)} + \frac{k(\eta + \delta)^2}{2(\eta - \delta)} \quad (3)$$

Proof: If we can show that the following inequality holds for each trial $t$:

$$2(\eta - \delta)|x_t - \hat{m}_t| - 2(\eta + \delta)|x_t - m_t| \le (m_t - \hat{m}_t)^2 - (m_{t+1} - \hat{m}_{t+1})^2 + (\eta + \delta)^2 \quad (4)$$

then by summing over all trials, we can obtain the inequality of the theorem:

$$2(\eta - \delta)\,\mathrm{Loss}(\mathrm{OM}(s, \eta); S) - 2(\eta + \delta)\,\mathrm{Loss}(M; S) = 2(\eta - \delta)\sum_{t=1}^{k} |x_t - \hat{m}_t| - 2(\eta + \delta)\sum_{t=1}^{k} |x_t - m_t|$$
$$\le \sum_{t=1}^{k} \left[ (m_t - \hat{m}_t)^2 - (m_{t+1} - \hat{m}_{t+1})^2 + (\eta + \delta)^2 \right] = (m_1 - s)^2 - (m_{k+1} - \hat{m}_{k+1})^2 + k(\eta + \delta)^2 \le (m_1 - s)^2 + k(\eta + \delta)^2$$

We proceed by cases.

Case 1: $x_t > \hat{m}_t$. In this case, $\hat{m}_{t+1} = \hat{m}_t + \eta$. Suppose $m_{t+1} = m_t + \delta_t$, where $-\delta \le \delta_t \le \delta$. In this case, we can derive:

$$(m_t - \hat{m}_t)^2 - (m_{t+1} - \hat{m}_{t+1})^2 + (\eta + \delta)^2 = (m_t - \hat{m}_t)^2 - (m_t + \delta_t - \hat{m}_t - \eta)^2 + (\eta + \delta)^2$$
$$= 2(\eta - \delta_t)(m_t - \hat{m}_t) - (\eta - \delta_t)^2 + (\eta + \delta)^2 \ge 2(\eta - \delta_t)(m_t - \hat{m}_t)$$
$$\ge \begin{cases} 2(\eta - \delta)(m_t - \hat{m}_t) & \text{if } m_t \ge \hat{m}_t \\ 2(\eta + \delta)(m_t - \hat{m}_t) & \text{if } m_t < \hat{m}_t \end{cases}$$

Based on the last inequality, we consider two subcases.

Case 1a: When $m_t \ge \hat{m}_t$, then:

$$2(\eta - \delta)|x_t - \hat{m}_t| - 2(\eta + \delta)|x_t - m_t| \le 2(\eta - \delta)(x_t - \hat{m}_t) - 2(\eta - \delta)(x_t - m_t) = 2(\eta - \delta)(m_t - \hat{m}_t)$$

and Inequality 4 follows.

Case 1b: When $m_t < \hat{m}_t$, then:

$$2(\eta - \delta)|x_t - \hat{m}_t| - 2(\eta + \delta)|x_t - m_t| \le 2(\eta + \delta)(x_t - \hat{m}_t) - 2(\eta + \delta)(x_t - m_t) = 2(\eta + \delta)(m_t - \hat{m}_t)$$

and again Inequality 4 follows.

Case 2: $x_t = \hat{m}_t$. In this case, $\hat{m}_{t+1} = \hat{m}_t$. Clearly the left-hand side of Inequality 4 is nonpositive, and the right-hand side is nonnegative, so Inequality 4 follows.

Case 3: $x_t < \hat{m}_t$. The proof of this case is similar to Case 1.

Inequality 4 holds for all three cases, thus the inequality of the theorem follows.

Note that if $\delta = 0$, i.e., the target median is stationary, then Inequality 3 is identical to Inequality 1. This implies that Theorem 2 is a strict generalization of Theorem 1. Also, the inequality of Theorem 2 has a form similar to that of Theorem 1. The loss of the OM algorithm is bounded by the sum of three terms: a ratio times the loss of the optimal sequence of target medians, a constant regardless of the length of the sequence, and an additional loss per trial.

For a nonstationary target median, Theorem 2 suggests the following considerations for choosing the increment $\eta$. Setting $\eta = 3\delta$, i.e., setting $\eta$ to three times the maximum change in consecutive target medians, minimizes the additional loss per trial $(\eta + \delta)^2/(2(\eta - \delta))$. A smaller value for $\eta$ increases the values of all terms in Inequality 3, so setting $\eta$ too close to $\delta$ might seriously degrade performance. A larger value for $\eta$ decreases the loss ratio $(\eta + \delta)/(\eta - \delta)$ and the constant $(m_1 - s)^2/(2(\eta - \delta))$ at the expense of increasing the additional loss per trial.
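The claim that eta = 3*delta minimizes the additional loss per trial can be verified numerically; this sketch (the grid and the value of delta are ours) scans eta over a fine grid strictly above delta:

```python
def per_trial_penalty(eta, delta):
    """Additional loss per trial in Inequality 3: (eta+delta)^2 / (2(eta-delta))."""
    return (eta + delta) ** 2 / (2 * (eta - delta))

delta = 0.5
etas = [delta + 0.001 * i for i in range(1, 10000)]
best_eta = min(etas, key=lambda e: per_trial_penalty(e, delta))

assert abs(best_eta - 3 * delta) < 0.01            # minimizer is eta = 3*delta
assert per_trial_penalty(3 * delta, delta) == 4 * delta  # minimum value is 4*delta
```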

Example

In this section, we illustrate the OM algorithm on a fetal heart rate (FHR) tracing. Originally, the FHR values are given in sample-and-hold form. The FHR signal consists of a sequence of values given in terms of beats per minute; each FHR value $x$ corresponds to a duration of $60/x$ seconds between consecutive heart beats. In our example, these FHR values are sampled every 4 seconds, resulting in a uniform time step. For the OM algorithm, the starting value $s$ is 150 and the increment $\eta$ is 1.

Figure 2 illustrates what happens on one sequence. It can be seen that the FHR data (dotted line) varies considerably, with many extreme values. Nevertheless, the online median (solid line) remains well within the data except when there are large changes.

Over the same sequence, Figure 3 compares the online median (solid line) to a locally computed median (dashed line), i.e., the median of the 2-minute window surrounding each datum (1 minute in the past and 1 minute in the future). The absolute loss of the online median is only 32% higher than that of the local median, even though the local median has a distinct advantage because it uses current and future values. Again, it can be seen that the online median lags behind when there are large changes in the data. Increasing the increment $\eta$ would allow the OM algorithm to track large changes more closely, but at the expense of poorer performance when the median value is stable.

The online median compares well to estimating the median with a window before the current value. The median of the 2-minute window before each datum (the local median in Figure 3 shifted one minute to the right) has an 8% higher absolute loss than the online median. The median of the 1-minute window before each datum has a 5% lower absolute loss than the online median. Similar results appear to hold for other fetal heart rate tracings.

Because the baseline fetal heart rate does not change by more than a few heart beats per minute, the resistance of the online median to large changes suggests that the online median might be a good choice for detecting a falling baseline. We are continuing to study the potential of this approach.
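The lag-versus-tracking behavior described above can be reproduced on synthetic data (this toy series and its levels are ours, not the FHR data): when the underlying median jumps, OM closes the gap at one unit per trial and then hovers at the new level.

```python
def om_update(est, x, eta):
    """One step of the OM rule: move the estimate by eta toward the datum."""
    if x > est:
        return est + eta
    if x < est:
        return est - eta
    return est

# Synthetic series: the true median sits at 10 for 50 trials, then jumps to 20;
# each datum alternates one unit below and one unit above the current level.
xs = [10 + (-1 if t % 2 == 0 else 1) for t in range(50)]
xs += [20 + (-1 if t % 2 == 0 else 1) for t in range(50)]

est = 10
for x in xs:
    est = om_update(est, x, 1)

# After the level shift, the estimate climbs one unit per trial and then
# hovers around the new level, ending at 20 on this series.
assert est == 20
```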


Figure 2. Example of OM Algorithm on a Fetal Heart Rate Tracing


Figure 3. Comparison of OM Algorithm to Locally Computed Median

Summary

We have presented a new online algorithm for tracking the median. The algorithm performs a perceptron-like update to modify its current estimate of the median. A theoretical analysis shows that the algorithm is guaranteed to have an absolute loss comparable to that of the optimal target median, even if the target median is allowed to change during the sequence. Our future research will continue to apply the OM algorithm to real applications such as fetal heart rate interpretation, and will empirically test the effectiveness of the OM algorithm for estimation and learning.

Acknowledgments

This research has been funded in part by grant NIH/MBRS S06GM 08194-16.

References
[1] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for a generalization of the Widrow-Hoff rule. IEEE Transactions on Neural Networks, 7:604–619, 1996.
[2] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts, 1990.
[3] P. J. Diggle. Time Series: A Biostatistical Introduction. Oxford University Press, London, 1990.
[4] J. Kivinen and M. K. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Technical Report UCSC-CRL-94-16, Univ. of Calif. Computer Research Lab, Santa Cruz, California, 1994. An extended abstract appeared in STOC '95, pp. 209–218.
[5] N. Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.
[6] F. Mosteller and R. E. K. Rourke. Sturdy Statistics. Addison-Wesley, Reading, Massachusetts, 1973.
[7] B. S. Schifrin. Exercises in Fetal Monitoring: Volume 1. Los Angeles, 1990.
