

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 49, NO. 9, SEPTEMBER 2001

Analysis of the Partitioned Frequency-Domain Block LMS (PFBLMS) Algorithm


Kheong Sann Chan and Berhouz Farhang-Boroujeny, Senior Member, IEEE
Abstract—In this paper, we present a new analysis of the partitioned frequency-domain block least-mean-square (PFBLMS) algorithm. We analyze the matrices that control the convergence rates of the various forms of the PFBLMS algorithm and evaluate their eigenvalues for both white and colored input processes. Because of the complexity of the problem, the detailed analyses are only given for the case where the filter input is a first-order autoregressive (AR-1) process. However, the results are then generalized to arbitrary processes in a heuristic way by looking into a set of numerical examples. An interesting finding (that is consistent with earlier publications) is that the unconstrained PFBLMS algorithm suffers from slow modes of convergence that the FBLMS algorithm does not have. Fortunately, however, these modes are not present in the constrained PFBLMS algorithm. A simplified version of the constrained PFBLMS algorithm, which is known as the schedule-constrained PFBLMS algorithm, is also discussed, and the reason for its behavior being similar to that of its fully constrained version is explained.

Index Terms—Adaptive filters, block LMS, FBLMS, frequency domain, partitioned FBLMS.

I. INTRODUCTION

IN the realization of adaptive filters, the least-mean-square (LMS) algorithm has always been one of the most popular adaptation schemes. Its conventional form, which was first proposed by Widrow and Hoff [1], has been very well analyzed and understood. Its main drawback is that it does not perform very well in the presence of a highly colored filter input. In the past, researchers have developed many variations of the LMS algorithm, reducing the complexity, increasing the convergence rate, and tailoring it for certain specific applications [2]-[16]. The frequency-domain block LMS (FBLMS) algorithm (also known as the fast block LMS algorithm) was initially proposed by Ferrara [2] to cut down the computational complexity of the algorithm. It takes advantage of the existence of efficient algorithms for the computation of the discrete Fourier transform (DFT) and the fact that point-wise multiplication in the frequency domain is equivalent to circular convolution in the time domain [17]. In this way, a block of L outputs of the filter is calculated simultaneously. The block size L is typically (although not necessarily) chosen to be the same as or close to the filter length N.
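The equivalence that the FBLMS algorithm exploits — point-wise multiplication of DFTs corresponding to circular convolution in the time domain — is easy to verify numerically. The following sketch (our own illustration, not part of the paper) checks it with NumPy, and also shows how zero padding turns the circular result into the linear convolution that block filtering by the overlap-save method actually needs:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(4)          # filter impulse response
x = rng.standard_normal(4)          # one block of input samples

# Point-wise multiplication in the frequency domain ...
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

# ... equals circular convolution in the time domain.
direct = np.array([sum(x[(n - i) % 4] * h[i] for i in range(4)) for n in range(4)])
assert np.allclose(circ, direct)

# Zero padding both sequences to a length >= 4 + 4 - 1 removes the wrap-around,
# giving the linear convolution used in overlap-save block filtering.
N = 8
lin = np.real(np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)))[:7]
assert np.allclose(lin, np.convolve(x, h))
```

The first pair of transforms exhibits the wrap-around aliasing that the constraining (windowing) operations in the FBLMS family exist to control; the zero-padded pair shows why FFT lengths of about twice the block size appear throughout the paper.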
Manuscript received May 2, 2000; revised May 18, 2001. The associate editor coordinating the review of this paper and approving it for publication was Dr. Ali H. Sayed. K. S. Chan is with the Data Storage Institute, National University of Singapore, Singapore. B. Farhang-Boroujeny is with the Department of Electrical Engineering, University of Utah, Salt Lake City, UT 84112-9206 USA and also with the National University of Singapore, Singapore (e-mail: farhang@ee.utah.edu). Publisher Item Identifier S 1053-587X(01)07043-X.

Ferrara's algorithm was initially proposed as an exact but fast implementation of the time-domain block LMS (BLMS) algorithm. However, Mansour and Gray [16] subsequently showed that it is also possible to omit an operation that constrains certain time-domain quantities, at the cost of an increase in the misadjustment. This algorithm, which is referred to as the unconstrained FBLMS algorithm, saves two out of the five fast Fourier transforms (FFTs) required in the original (constrained) FBLMS. However, it is no longer an exact implementation of the time-domain block LMS algorithm. It was also found that since the transformed samples of the filter input (known as frequency bins) are almost uncorrelated from one another, one may use a separate normalized step-size parameter for each of the frequency bins [18], [19], thereby equalizing the convergence rates of all frequency bins. The step-normalization procedure resolves the problem of slow modes of the LMS algorithm and results in an algorithm that converges faster than the conventional LMS algorithm [18], [20]. Analysis of the FBLMS algorithm has shown that as the block size grows, the eigenvalues that control the convergence behavior of the FBLMS algorithm all asymptotically approach the same value, resulting in the fastest possible convergence rate for a given filter length [20]. The shortcoming lies in the fact that as the filter length grows, more time must be spent accumulating the input data before any processing can begin. Therefore, when the number of taps N of the adaptive filter is large, the FBLMS algorithm suffers a significant delay from the time the first datum of the current block is collected until the processing for the current block is completed. This delay is referred to as the algorithm latency. One solution is simply to use a smaller block size, but this results in an algorithm that is computationally less efficient.

Asharif et al. [3] have suggested a solution to this problem that involves partitioning the time-domain convolution into smaller-sized convolutions and performing each of the convolutions in the frequency domain. This algorithm is referred to here as the partitioned FBLMS (PFBLMS) algorithm; it solves the problem of large latency without sacrificing computational efficiency, and the FBLMS algorithm is a special case of the PFBLMS algorithm with a single partition. The analysis of the PFBLMS algorithm, however, is more complex than that of the FBLMS algorithm. Moulines et al. have analyzed the PFBLMS algorithm in [21] (which they refer to as a multidelay adaptive filter). They have performed first- and second-order analyses of the underlying matrices. Their analysis resembles that done by Lee and Un in [20], in which they derive the matrices that control the various implementations of the FBLMS algorithm. They show that in

1053-587X/01$10.00 © 2001 IEEE


the cases of the normalized constrained/unconstrained FBLMS algorithms and the normalized constrained PFBLMS algorithm, the matrices will be asymptotically equivalent to some other matrices that have identical eigenvalues. Therefore, in the limit as the matrix dimension grows to infinity, the eigenvalue spread will tend toward 1. However, the unconstrained PFBLMS algorithm is found to suffer from slow modes of convergence. Beaufays has shown in [22] that although the asymptotic equivalence of two matrices may mean that the eigenvalue moments will be the same (as proven by Gray [23]), there may still exist individual eigenvalues that do not asymptotically converge to the same values. In addition, in [7], it has been noted that the overlapping of successive partitions in the PFBLMS algorithm degrades its unconstrained convergence behavior. A solution to this problem was then given based on a simplified analysis of the algorithm. It was noted that by reducing the amount of overlap between successive partitions, the slow convergence of the PFBLMS algorithm is resolved to some extent. Further work along this line has been done in [24]. In [25], McLaughlin has proposed an alternative algorithm where the constraining operation is applied to the various partitions on a scheduled basis, with specific reference to the application of acoustic echo cancellation. This scheduling of the constraint results in a significant reduction in the computational complexity of the algorithm while keeping the convergence at almost the same rate as the fully constrained PFBLMS algorithm. This method is therefore referred to as the schedule-constrained PFBLMS algorithm. Analysis of this algorithm has yet to be reported. In this paper, we present an alternative analysis of the PFBLMS algorithm based on a direct evaluation and analysis of the underlying convergence controlling matrices, without relying on the concept of asymptotic equivalence. 
We use Gerschgorin's theorem to derive bounds on the distribution of the eigenvalues of these matrices. To make the analysis tractable, we first consider the case where the filter input is modeled as a first-order autoregressive process (AR-1). We then show through numerical examples that the matrices evaluated for the AR-1 case are similar to those of more general processes as well. We also provide an overview of the less-understood schedule-constrained PFBLMS algorithm based on the matrices we have developed in this paper.

II. PFBLMS ALGORITHM

We consider the implementation of an adaptive transversal filter with N taps. The filter output y(n) is related to its input x(n) by

y(n) = \sum_{i=0}^{N-1} w_i(n) x(n - i)   (1)

where the w_i(n)'s are the adaptive filter tap weights. The convolution sum of (1) may be partitioned into P smaller-sized convolution sums of length M = N/P according to

y(n) = \sum_{p=0}^{P-1} y_p(n)   (2)

where

y_p(n) = \sum_{i=0}^{M-1} w_{pM+i}(n) x(n - pM - i)   (3)

and each of these smaller convolutions is then performed efficiently in the frequency domain and summed to give y(n) as per (2). Then, the input and tap-weight vectors for the pth partition of the algorithm are

x_p(k) = [x(kL - pM - M), ..., x(kL - pM + L - 1)]^T
w_p(k) = [w_{pM}(k), ..., w_{pM+M-1}(k), 0, ..., 0]^T   (4)

respectively, where L is the block size, and k is the block index. We also note that there is an overlap of L data samples between the input vectors of successive partitions and that the addition of L zeros at the end of w_p(k) is to allow the computation of the filter output, based on the overlap-save method [17], which can be implemented in the frequency domain. The frequency-domain input and tap-weight vectors used in the PFBLMS algorithm are the DFT of x_p(k) and w_p(k)

X_p(k) = F x_p(k),   W_p(k) = F w_p(k)   (5)

where F is the (M + L) x (M + L) DFT matrix. The frequency-domain output vector is given by

Y(k) = \sum_{p=0}^{P-1} X_p(k) \odot W_p(k) = \sum_{p=0}^{P-1} \mathcal{X}_p(k) W_p(k)   (6)

where \odot denotes point-wise multiplication of vectors, and \mathcal{X}_p(k) is a diagonal matrix taking the elements of X_p(k) along its diagonal. The block of L time-domain filter output samples y(k) are the last L elements of F^{-1} Y(k). The corresponding time-domain errors are

e(n) = d(n) - y(n),   n = kL, kL + 1, ..., kL + L - 1   (7)

where the d(n)'s are the elements of the time-domain training sequence. The frequency-domain error vector E(k) is given as^1

E(k) = F [0 ... 0  e(kL) ... e(kL + L - 1)]^T   (8)

where the error block is padded with M zeros at the beginning. This may also be written entirely in the frequency domain as

E(k) = G(D(k) - Y(k))   (9)

where D(k) is the DFT of the training signal padded with M zeros at the beginning, and

G = F G_0 F^{-1}   (10)

^1 We note that for a block of length L and partition length M, the length of x_p(k) need only be M + L - 1. The choice of M + L is, however, more common, as it greatly simplifies the implementation [19] and analysis.

G_0 is a windowing matrix that forces the first M time-domain elements to zero [19]. The vector E(k) is then used to update the frequency-domain tap-weight vectors of each partition according to

W_p(k+1) = W_p(k) + 2\mu G' \Lambda^{-1}(k) \mathcal{X}_p^*(k) E(k),   p = 0, 1, ..., P - 1   (11)

where * denotes complex conjugation

G' = F G'_0 F^{-1}   (12)

G'_0 is the windowing matrix that forces the last L time-domain elements to zero, \mu is the unnormalized step size, and \Lambda(k) is a diagonal matrix of the powers at each of the frequency bins. The powers of the frequency bins (the diagonal elements \lambda_i(k) of \Lambda(k)) are usually updated using the running average algorithm

\lambda_i(k) = \beta \lambda_i(k-1) + (1 - \beta) |X_i(k)|^2

with X_i(k) denoting the ith frequency bin of the current block of input samples

where \beta, 0 < \beta < 1, is a parameter that determines how much weight is given to previous powers. In the subsequent discussion, we choose to ignore the time dependence of \Lambda(k) and assume that it is equal to a fixed diagonal matrix \Lambda holding the powers at each frequency bin along its diagonal. Comments on the range of \mu that guarantees the stability of the PFBLMS algorithm may be found in [21].

We may write the equations for the PFBLMS algorithm in a more compact form by defining the super matrix

\mathcal{X}(k) = [\mathcal{X}_0(k)  \mathcal{X}_1(k)  ...  \mathcal{X}_{P-1}(k)]   (13)

and the super vector

W(k) = [W_0^T(k)  W_1^T(k)  ...  W_{P-1}^T(k)]^T   (14)

With these definitions, (6) is more simply written as

Y(k) = \mathcal{X}(k) W(k)   (15)

The error vector is still given by (9), but the update equation (11) becomes^2

W(k+1) = W(k) + 2\mu \tilde{G}' \tilde{\Lambda}^{-1} \mathcal{X}^H(k) E(k)   (16)

where superscript H denotes the matrix Hermitian (conjugate transpose), and \tilde{G}' and \tilde{\Lambda} are block-diagonal supermatrices with G' and \Lambda, respectively, repeated along their diagonal blocks.

The matrix G or G' performs the constraining operation, and it is thus omitted in the unconstrained algorithm. The step-normalizing matrix \Lambda^{-1} is omitted for the unnormalized PFBLMS algorithm. The parameter \mu is the unnormalized step size and is used to control how fast the algorithm converges and the misadjustment level after convergence. Typically, \mu is chosen such that the final misadjustment is on the order of 10%. In the rest of this paper, we limit our discussion to the analysis of the step-normalized versions of the PFBLMS algorithm. The constrained PFBLMS algorithm without step normalization is an alternative (fast) implementation of the conventional block LMS algorithm, whose performance is well understood [19]. It performs very similarly to the conventional LMS algorithm. The unconstrained PFBLMS algorithm without step normalization, on the other hand, is difficult to analyze, and to the best of our knowledge, such an analysis has yet to be reported. The limited reports on the analysis of the PFBLMS algorithm, as in this paper, have concentrated mainly on the step-normalized forms of the algorithm. One reason for this is that the PFBLMS algorithm without step normalization is hardly ever used in practice. As mentioned in Section I, the step normalization resolves the well-known problem of slow convergence modes in the LMS algorithm. In the FBLMS and PFBLMS algorithms, this solution comes at a minimal computational cost.

III. ANALYSIS OF THE PFBLMS ALGORITHM

We assume a model for the desired signal that consists of passing the input through a fixed FIR filter (the plant) and adding independent zero-mean white noise to the output. The coefficients of this fixed filter will be the optimum tap weights for the adaptive filter that minimize the power of the error vector. As such, we may write (9) as

E(k) = G(\mathcal{X}(k) W_o + E_o(k) - \mathcal{X}(k) W(k))   (17)

where W_o is the optimum tap-weight vector, defined in a way similar to that done in (4), (5), and (14), and E_o(k) is the error (noise) added to the output of the plant. Substituting (17) into (16), we obtain

W(k+1) = W(k) + 2\mu \tilde{G}' \tilde{\Lambda}^{-1} \mathcal{X}^H(k) G (\mathcal{X}(k) V(k) + E_o(k))   (18)

where V(k) = W_o - W(k) is the tap-weight error vector. Subtracting both sides of (18) from W_o, taking the expectation, and considering the commonly used independence assumption [19], we get

E\{V(k+1)\} = (I - 2\mu \tilde{G}' \tilde{\Lambda}^{-1} S) E\{V(k)\}   (19)
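To make the recursions of Section II concrete, here is a minimal NumPy sketch of a constrained, step-normalized PFBLMS filter in the spirit of (4)-(11). The function name, variable names, the power-estimate prior, and the choice of normalizing by the spectrum of the most recent partition are our own assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def pfblms(x, d, P, M, L, mu=0.1, beta=0.9, eps=1e-8):
    """Constrained, step-normalized PFBLMS adaptive filter (a sketch).
    P partitions of M taps each, block size L, FFT length M + L."""
    nfft = M + L
    W = np.zeros((P, nfft), dtype=complex)     # frequency-domain weights, one row per partition
    power = np.full(nfft, float(nfft))         # per-bin power estimate (prior for unit-power input)
    xpad = np.concatenate([np.zeros(P * M), np.asarray(x, float)])
    nblocks = len(x) // L
    e_out = np.zeros(nblocks * L)
    for k in range(nblocks):
        X = np.empty((P, nfft), dtype=complex)
        for p in range(P):                     # partition p sees the input delayed by p*M samples
            start = P * M + k * L - p * M - M
            X[p] = np.fft.fft(xpad[start:start + nfft])
        power = beta * power + (1 - beta) * np.abs(X[0]) ** 2   # running power average
        y = np.real(np.fft.ifft(np.sum(X * W, axis=0)))[-L:]    # overlap-save: keep last L samples
        e = d[k * L:(k + 1) * L] - y
        E = np.fft.fft(np.concatenate([np.zeros(M), e]))        # error, zero-padded at the front
        for p in range(P):
            g = np.real(np.fft.ifft(np.conj(X[p]) * E / (power + eps)))
            g[M:] = 0.0                        # the constraint: zero the last L time-domain taps
            W[p] += 2 * mu * np.fft.fft(g)
        e_out[k * L:(k + 1) * L] = e
    return e_out
```

With a noise-free plant of P*M taps and a white input, the error decays toward zero; the line g[M:] = 0.0 is the time-domain constraining operation G' that the unconstrained variant omits, and deleting it reproduces the slow modes analyzed below.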

^2 It is known that in a finite-precision implementation, (16) may face some numerical stability problems due to round-off errors [19]. The algorithm will work better if \tilde{G}' is applied to the entire right-hand side of (16). For the purposes of analysis, however, (16) is adequate as it stands.

where

S = E\{\mathcal{X}^H(k) G \mathcal{X}(k)\}   (20)


and we have set E\{\mathcal{X}^H(k) G E_o(k)\} = 0, which readily follows from the independence of \mathcal{X}(k) and E_o(k) and the fact that E\{E_o(k)\} = 0. We note that (19) is the equation for the constrained normalized PFBLMS algorithm. The analysis for the unconstrained and/or unnormalized PFBLMS algorithms follows analogously by removing \tilde{G}' and/or \tilde{\Lambda}^{-1} from the equation, respectively. Equation (19) shows that the modes of convergence of the PFBLMS algorithm are controlled by the properties of the matrix \tilde{G}' \tilde{\Lambda}^{-1} S. We therefore begin with an analysis of this matrix.

A. Matrix S

Substituting (13) into (20), we obtain S as the P x P block matrix

S = [S_{pq}],   p, q = 0, 1, ..., P - 1   (21)

where

S_{pq} = E\{\mathcal{X}_p^H(k) G \mathcal{X}_q(k)\}   (22)

and we have used the fact that \mathcal{X}_p^H(k) = \mathcal{X}_p^*(k) for the diagonal matrices \mathcal{X}_p(k). Since G is Hermitian, we see that S is also Hermitian, and therefore, all of its eigenvalues will be real. S_{00} has been shown to be asymptotically diagonal, and in the case when the number of partitions P = 1 (i.e., the FBLMS), the unconstrained normalized algorithm performs sufficiently well as the filter length increases, even in the presence of a colored input [20], [26]. When P > 1, however, we have to consider the effects of the off-diagonal submatrices S_{pq}, p ≠ q, on the matrix as well.

From (22), we see that since \mathcal{X}_p(k) is diagonal, premultiplying G by \mathcal{X}_p^H(k) is equivalent to multiplying the rows of G by the corresponding diagonal elements, which are the elements of X_p^*(k). Similarly, postmultiplying by the diagonal matrix \mathcal{X}_q(k) is equivalent to multiplying the columns of G by the corresponding diagonal elements, i.e., the elements of X_q(k). Noting these facts, we obtain the elements of S_{pq} as

(23)

where

R_{pq} = E\{x_p(k) x_q^T(k)\}   (24)

is the correlation matrix between the inputs of partitions p and q. Using (23) in (21), we get

(25)

Now, we make the extension to the matrix of the normalized PFBLMS algorithm, which, from (19), is

S^n = \tilde{\Lambda}^{-1} S   (26)

Here and in the subsequent equations, we use the superscript n to indicate normalization. It turns out that the structure of the time-domain matrix

T = \tilde{F}^{-1} S^n \tilde{F}   (27)

where \tilde{F} is a block-diagonal supermatrix whose diagonal block elements are the DFT matrix F, is simpler than that of its frequency-domain counterpart. From the theory of eigenvalues of matrices, it is known that T and S^n share the same set of eigenvalues. We thus proceed with the analysis of T and its submatrices T_{pq}.

B. Matrix T

For our analysis, we assume a first-order autoregressive input process (AR-1) with coloring parameter a. Such a process is generated by passing white noise through a filter with transfer function 1/(1 - a z^{-1}). The normalized submatrices T_{pq} are readily obtained from (23), (25), (26), and (27) as

(28)

A procedure for the derivation of the elements of T_{pq} for the case when M = L is given in Appendix A. The expressions obtained are cumbersome for the cases with general M and L. When M = L, examination of (4) reveals that inputs at different partitions will have significant correlation only when their partition numbers do not differ by more than 1. That is, T_{p,p} and T_{p,p±1} will have significant values, but T_{p,p±2} to T_{p,p±(P-1)} will all have norms that are close to zero, given that the input is not too highly colored and that the filter length is large enough. The matrix T is thus asymptotically block tridiagonal. We therefore summarize our results for the three main submatrices in which we are interested, i.e., T_{00}, T_{01}, and T_{10}. These are given in Tables I-III, where [.]_{ij} refers to the ijth element of the specified matrix, and δ_{ij} is the Kronecker delta function. More detailed derivations that include general elements of T can be found in [27]. The matrix controlling the convergence of the normalized unconstrained PFBLMS algorithm may then be explicitly written out as the block-tridiagonal arrangement

T ≈
[ T_{00}  T_{01}   0      ...   0
  T_{10}  T_{00}  T_{01}  ...   0
   ...     ...     ...    ...  ...
   0       0      ...  T_{10}  T_{00} ]   (29)

C. Numerical Examples

At this stage in our study, it is instructive to look at the results of Tables I-III through some numerical examples. The numerical results presented here serve two purposes. First, they are used to confirm the accuracy of our derivations, as we have made an approximation in Appendix A in evaluating certain diagonal elements. Second, the numerical examples serve


TABLE I ELEMENTS OF THE MATRIX

TABLE II ELEMENTS OF THE MATRIX

TABLE III ELEMENTS OF THE MATRIX

to fill the gap between the derived theoretical results for the AR-1 input and the more general results associated with an arbitrary input process. Fig. 1 presents a pictorial representation of the time-domain matrices for a white input process, with the correlation submatrices in the left column and the normalized submatrices in the right column. Fig. 2 shows six similar matrices that were generated when the input is an AR-1 process with parameter a = 0.85. The relative magnitudes of the elements are represented in varying shades of gray, as indicated in the corresponding legends. Darker squares correspond to elements with larger magnitude, whereas lighter squares correspond to elements with smaller magnitude. The submatrices R_{pq} were evaluated from knowledge of the autocorrelation function for an AR-1 process (with a = 0 for white input), and the submatrices T_{pq} were computed from R_{pq} using (28). The exact diagonal elements were used for the elements of \Lambda instead of the approximations employed in Appendix A. The numerical elements of these submatrices were compared with those obtained using Tables I-III and found to be in close agreement. This confirms the validity of the simplifying assumptions used in Appendix A, which gave rise to the results of Tables I-III. According to (28), T_{pq} is derived from R_{pq} by first converting it to the frequency domain, taking the complex conjugate, point-wise multiplying, normalizing the matrix, and then converting it back into the time domain. Comparing Figs. 1 and 2, we see that although the original time-domain matrices R_{pq} differ quite substantially when the input is colored, as compared to when it is white, the matrices T_{pq} look fairly similar for the two cases. This implies that the normalized PFBLMS algorithm ought to perform similarly, regardless of whether the input process is white or AR-1, provided that the block size and the partition size are sufficiently large. With reference to Figs. 1 and 2, we make the following observations.

1) The matrix T_{00} is close to diagonal. We recall that T_{00} is the correlation matrix that governs the convergence behavior of the FBLMS algorithm, i.e., the case when P = 1. Furthermore, the asymptotic diagonality of this matrix has been proven in the context of the FBLMS algorithm [20], [26].

2) The significant nonzero elements of T_{01} and T_{10} are mainly along the diagonals of the upper-right and lower-left quadrants. The forms of these matrices may be understood when we see that their frequency-domain equivalent is generated by the point-wise multiplication of two frequency-domain matrices, (23). This corresponds to a two-dimensional (2-D) circular convolution in the time domain. When the input is white, one factor is a rectangular pulse along the diagonal of the top-right quadrant, as shown in Fig. 1(b), and the other is a rectangular pulse along the diagonal of the bottom-right quadrant.
Circularly convolving the two rectangular pulses together produces a triangular pulse that spans the top-right and bottom-left quadrants, as shown in Fig. 1(e). These two observations, as we will see, are fundamental to the convergence behavior of the various versions of the PFBLMS algorithm.

In Fig. 3, we have evaluated all the submatrices and presented a pictorial representation of the matrix T for four different input processes:

i) a white input process;
ii) a lowpass AR-1 process;
iii) an autoregressive moving average (ARMA) highpass process generated by a coloring filter;
iv) a moving average (MA) bandpass process generated by passing a white process through a bandpass filter.

The interesting observation in Fig. 3 is that the general pattern of T is almost independent of the nature of the input process. We


Fig. 1. Pictorial representation of the submatrices (a)-(c) R_{00}, R_{01}, R_{10} and (d)-(f) the corresponding normalized submatrices, for a white input process.

thus may conclude that the behavior of the PFBLMS algorithm is almost independent of the coloring of the filter input.
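The AR-1 input model used throughout this section is easy to reproduce. The following short sketch (our own illustration) generates such a process with a = 0.85, as in Fig. 2, and checks that its autocorrelation decays as a^l, the property that shapes the correlation submatrices above:

```python
import numpy as np

# Generate an AR-1 process x(n) = a*x(n-1) + w(n) by filtering unit-variance
# white noise through 1/(1 - a z^-1), then estimate its autocorrelation.
a = 0.85
rng = np.random.default_rng(1)
n = 200_000
w = rng.standard_normal(n)
x = np.empty(n)
x[0] = w[0]
for i in range(1, n):
    x[i] = a * x[i - 1] + w[i]
x = x / np.sqrt(np.mean(x * x))            # normalize to unit power

r = np.array([np.mean(x[:n - l] * x[l:]) if l else 1.0 for l in range(5)])
# For an AR-1 process, r(l) should be close to a**l (0.85, 0.72, 0.61, ...).
assert np.allclose(r, a ** np.arange(5), atol=0.03)
```

The slow geometric decay of r(l) is what makes neighboring partitions of the input significantly correlated and motivates the block-tridiagonal approximation used in the analysis.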

D. Matrix \tilde{G}'_0 T

From (19), we observe that the convergence of the constrained PFBLMS algorithm is governed by the eigenvalues of \tilde{G}' S^n or, equivalently, of its time-domain counterpart \tilde{G}'_0 T, where \tilde{G}'_0 is the block-diagonal supermatrix built from the windowing matrix G'_0. Recalling the form of T, it is easy to see that

(30)


Fig. 2. Pictorial representation of the submatrices (a)-(c) R_{00}, R_{01}, R_{10} and (d)-(f) the corresponding normalized submatrices, for an AR-1 input process with parameter a = 0.85.

The matrix \tilde{G}'_0 sets the bottom half of each submatrix to zero. For M = L, this has the effect of zeroing out exactly half of the eigenvalues of \tilde{G}'_0 T, as the matrix is then only of rank PM. These zero eigenvalues correspond to the convergence modes of the second half of each partition, which are forced to zero at each iteration by the constraining operation. They are thus automatically set to their optimum values (zero) at each iteration and, therefore, do not need to converge. The convergence of the constrained PFBLMS algorithm is therefore determined by the remaining nonzero eigenvalues of \tilde{G}'_0 T.

At this point, it is instructive to examine the structure of \tilde{G}'_0 T in more detail and to determine the effect of this structure on the distribution of its eigenvalues. In Fig. 4, we present a pictorial representation of \tilde{G}'_0 T for the AR-1 process, with


Fig. 3. Pictorial representation of the matrices (time-domain) for (a) white input, (b) AR-1 input, (c) ARMA input, and (d) MA input. P = 4; L = M = 16. The three colored processes were chosen such that the original eigenvalue spread is around 100.

a = 0.85. As can be seen from (30) as well as from Fig. 4, the matrix \tilde{G}'_0 sets the bottom half of each submatrix to zero. Thus, only the top half of each submatrix will affect the eigenvalues of \tilde{G}'_0 T. Furthermore, each eigenvalue and associated eigenvector of \tilde{G}'_0 T satisfy the eigenvalue equation. Since the bottom half of each partition is zero, the corresponding portions of the eigenvector must also be zero, and therefore, only the left half of each submatrix will affect the eigenvalues. The only portions of the matrices that affect the eigenvalues of \tilde{G}'_0 T are therefore the top-left quadrants of the T_{pq}. These portions are obtained from Tables I-III and are summarized here for M = L.

(31)

The presence of the Kronecker delta in these expressions makes these portions of the matrices quite simple. In particular, we note that only the first row of the top-left quadrants of T_{01} and T_{10} contains nonzero elements. This closely matches what is observed in Figs. 2 and 4.

E. Convergence Modes of the Constrained and Unconstrained PFBLMS Algorithms

In this subsection, we wrap up the results developed so far and comment on the various implementations of the PFBLMS algorithm. Obviously, without step normalization, the convergence behavior of the PFBLMS algorithm is highly dependent on the power spectral density of the filter input, and the algorithm performs poorly when the input is highly colored. For this reason, as noted earlier, we only consider the step-normalized versions of the PFBLMS algorithm. We therefore focus only on the matrices T and \tilde{G}'_0 T, which determine the convergence behavior of the unconstrained and constrained versions of the PFBLMS algorithm, respectively. The analysis of these matrices relies heavily on the following result from matrix theory [28].

Gerschgorin's Theorem: Let \lambda be an eigenvalue of an arbitrary n x n matrix A. Then, there exists some integer j, 1 ≤ j ≤ n, such that

|\lambda - a_{jj}| ≤ \sum_{i=1, i≠j}^{n} |a_{ij}|   (32)

Gerschgorin's theorem says that each eigenvalue of A is close to one of its diagonal elements. How close is quantified by r_j = \sum_{i≠j} |a_{ij}|, which is the summation of the absolute values of the off-diagonal elements of the jth column of A. As the determinants and, therefore, the eigenvalues of A and A^T are identical, we see that Gerschgorin's theorem may equally well be applied by forming the summation over the rows instead of the columns. Since Gerschgorin's theorem provides a bound for a specific eigenvalue, it is natural to choose the tighter of the two bounds, i.e., to choose r_j to be the smaller of the two possible summations arising from the row or column summations. If we define the Gerschgorin disks D_j in the complex plane, each centered at a_{jj} and of radius r_j, then the eigenvalues of A are distributed such that they all lie within the union of the Gerschgorin disks. A further extension of Gerschgorin's theorem states that when the Gerschgorin disks are disjoint, there will be one eigenvalue in each disk.

Examining the elements of \tilde{G}'_0 T as described in (31) and in Fig. 4, we see that we may apply Gerschgorin's theorem to the columns of \tilde{G}'_0 T. The Kronecker delta term guarantees that the only nonzero
A further extension of Gerschgorins theorem states that when the Gerschgorin disks are disjoint, there will be one eigenvalue in each disk. as described in (31) and in Examining the elements of Fig. 4, we see that we may apply Gerschgorins theorem to the . The term guarantees that the only nonzero columns of

Fig. 4. The constrained matrix for an AR-1 input process, with a = 0.85.

elements in each column will be in the first row; therefore, we only need to consider the first-row element of each submatrix to obtain the radius of each of the Gerschgorin disks. From (31), we observe that these elements all have in common a factor that vanishes as the FFT length grows. This clearly shows that as the FFT length M + L goes to infinity,^3 the radii of the Gerschgorin disks go to zero, and therefore, the nonzero eigenvalues of \tilde{G}'_0 T all tend toward their diagonal values of a half.

When the constraint is not applied, looking at Figs. 2 and 3, it is apparent that summing over the relevant rows or columns of T will give rise to Gerschgorin disks with significant (nondecaying) radii because of the significant elements in the top-right and bottom-left quadrants of T. The eigenvalues of T are in fact widely spread, as we now proceed to demonstrate with the following numerical example. Consider the case where the input x(n) is white, with M = L = 2 and P = 2. Then

The eigenvalues of this matrix may be evaluated by subtracting \lambda from each of the diagonal elements and then performing elementary row operations to reduce it to upper triangular form. The eigenvalues will then be located along
3We note that a choice of a large value of M may be undesirable in some applications as it will result in a latency of at least M samples at the filter output.


TABLE IV
EIGENVALUE SPREAD FOR WHITE INPUT

TABLE V
EIGENVALUE SPREAD FOR AR-1 INPUT WITH PARAMETER

the main diagonal. Doing so produces the eigenvalues, which are found to be widely spread.^4 In general, the matrix T has eigenvalues that are widely spread. One point of interest is the zero eigenvalue, which implies that the eigenvalue spread is infinite. The zero eigenvalue corresponds to a degree of redundancy incorporated into the unconstrained PFBLMS algorithm. This redundancy is also best illustrated by way of example. We write out the output vectors for each partition for our example of M = L = 2 and P = 2.

where the unspecified entries denote nonrelevant items; these are later set to zero by the constraining matrix. Performing the circular convolution between the input and tap-weight vectors, we see that an arbitrary variable may be added to and subtracted from the tap-weight vector in the positions shown, without affecting the output. This redundancy manifests itself as a zero eigenvalue, as energy may be exchanged arbitrarily between tap 2 of one partition and tap 0 of the next partition. Obviously, this does not contribute toward the convergence of the PFBLMS algorithm. It simply appears as a kind of random-walk stochastic process. This may present practical problems with numerical overflow, as the variance of a random-walk process goes to infinity as time goes to infinity. It is therefore possible that the redundant tap weights may become arbitrarily large. Furthermore, we note that this is not a problem when the constrained PFBLMS is used because the redundant tap, being in the second half of the first partition, is constrained to zero, forcing its counterpart in the next partition to bear the full responsibility for that tap. Continuing with the analysis of our example, we see that when the constrained matrix is used, we get
^4 Similar, but slightly different, results have also been reported in [21]. In particular, the results given in [21] do not contain any zero eigenvalue. This difference is because of the choice of the FFT length. Here, we use FFTs that are of length M + L. In [21], an FFT length of M + L - 1 is assumed. See also Footnote 1 after (7).

The matrix \tilde{G}'_0 has the effect of zeroing out the second half of each partition, resulting in four eigenvalues that are identically zero. The remaining four eigenvalues may be immediately obtained by using Gerschgorin's theorem along columns 0, 1, 4, and 5, giving four eigenvalues all equal to 0.5, as all of the Gerschgorin disks in this case have radii of zero. The eigenvalues of the constrained matrix are therefore 0.5, 0.5, 0.5, 0.5, 0, 0, 0, and 0. Ignoring the zero eigenvalues, we thus see that the constraining operation has equalized all the eigenvalues.

To further support the conclusions drawn in this section, we present Tables IV-VI. In Tables IV and V, the eigenvalue spread of the unconstrained matrix T is given for a white input process and for an AR-1 input process, respectively, for various values of M and P. In Table VI, the eigenvalue spread of the constrained matrix is given for the AR-1 input for various M and P. Since the eigenvalue spread of the constrained matrix is always 1 when the input process is white, that table has been omitted. The matrices were evaluated without the approximation employed in Appendix A. These tables support


TABLE VI
EIGENVALUE SPREAD FOR AR-1 INPUT WITH PARAMETER a = 0.85

our conclusion that the constrained PFBLMS algorithm has an eigenvalue spread that asymptotically tends to 1 as the FFT length grows. We can also see that this does not happen with the unconstrained algorithm, even for the case of white input.
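Gerschgorin's theorem, the workhorse of the preceding arguments, is straightforward to check numerically. The sketch below (our own construction, with an arbitrary test matrix) computes the column Gerschgorin disks and verifies that every eigenvalue falls inside their union:

```python
import numpy as np

def gerschgorin_disks(A):
    """Return (centers, radii) of the column Gerschgorin disks of A."""
    A = np.asarray(A)
    centers = np.diag(A)
    radii = np.abs(A).sum(axis=0) - np.abs(centers)   # off-diagonal column sums
    return centers, radii

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A += np.diag(10.0 * np.arange(6))      # a strong diagonal separates some of the disks
centers, radii = gerschgorin_disks(A)
eigenvalues = np.linalg.eigvals(A)
# Every eigenvalue lies in at least one disk |lambda - a_jj| <= r_j.
assert all(np.any(np.abs(lam - centers) <= radii + 1e-9) for lam in eigenvalues)
```

Applying the same function to the transpose gives the row disks, which, as noted above, bound the same eigenvalues because A and its transpose share a characteristic polynomial.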

IV. SCHEDULE-CONSTRAINED PFBLMS ALGORITHM

From the analysis in the previous section, we see that the unconstrained PFBLMS algorithm suffers from slow convergence modes, whereas the convergence of the constrained PFBLMS algorithm asymptotically approaches a single mode at the expense of significantly increased computational complexity. The unconstrained PFBLMS algorithm only requires three FFT/IFFTs per iteration, whereas the constraining operation requires an additional FFT and IFFT per iteration for each of the partition tap-weight vectors. For a large number of partitions, this complexity may not be acceptable. As was mentioned in Section I, McLaughlin [25] has proposed a solution in which the tap weights are not all constrained at each iteration. Instead, a scheduling is assigned in which different tap weights are constrained at each iteration. A simple schedule could be to apply the constraining operation to each partition tap-weight vector on a rotational basis. As is noted in [25] and shown through computer simulation in Section V, the schedule-constrained PFBLMS algorithm performs almost as well as the fully constrained PFBLMS algorithm. In this section, we use the results derived in the previous sections to explain the high convergence rate of the schedule-constrained PFBLMS algorithm.

The schedule-constrained PFBLMS algorithm takes advantage of the fact that, after being constrained, the tap weights in each partition remain approximately constrained for the next few iterations. We define those time-domain taps that are set to zero by the constraining operation as the constrained taps, whereas those taps that are not influenced by the constraint are referred to as the free taps. In the implementation of the PFBLMS algorithm with 50% overlap between input partitions, the free taps correspond to the first half of each partition, whereas the constrained taps correspond to the second half. In terms of these definitions, the constrained taps are set to zero every time the constraint is applied to a tap-weight vector. They will remain close to zero over the course of the next few iterations as the unconstrained PFBLMS algorithm is run but will grow slowly. However, they will be set back to zero at the next scheduled constraint and, therefore, never have the opportunity to reach a significantly large magnitude.

Using the matrices developed in Appendix A and summarized in Tables I–III, we can understand the behavior of the schedule-constrained PFBLMS algorithm and show that it converges with almost no degradation in performance as compared to its fully constrained counterpart. When the tap-weight super-vector is a constrained vector, we know that the algorithm will asymptotically converge with a single mode. This can also be seen from (19), which determines the convergence of the tap-error super-vector. Removing the constraining matrix and premultiplying (19) by the inverse DFT matrix to convert the equation into the time domain, we get

(33)

From the way the super-vector of tap weights was defined in (14), we see that the time-domain super-vector holds the tap-error weights for all the partitions strung head to tail. Equation (33) is the update equation for the normalized unconstrained PFBLMS algorithm, and its convergence is controlled by the matrix in (29), which has the submatrices summarized in Tables I–III and depicted in Fig. 3 for various input processes. As we wish to differentiate between constrained taps and free taps, we split up each submatrix into four quadrants, corresponding to its top-left, top-right, bottom-left, and bottom-right portions, respectively. We can write the tap errors in terms of their free taps and constrained taps, with each partition contributing its own tap-error vector. With these new terms, (33) is written as

(34)

where, due to space restrictions, we have written the equation out for a small number of partitions, although it extends simply to the case of a general number of partitions.
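The rotational schedule described in this section can be sketched in code. The constraint on a single partition is the standard FBLMS projection: transform the tap-weight vector to the time domain, zero the second half (the constrained taps), and transform back; the schedule then applies this projection to only one partition per iteration. Function names and the list-of-partitions representation are our own illustration, assuming 50% overlap (length-2M partitions).

```python
import numpy as np

def constrain(W):
    """FBLMS-style constraint: zero the last half of the time-domain
    response of a length-2M frequency-domain tap-weight vector W."""
    w = np.fft.ifft(W)
    w[len(w) // 2:] = 0.0          # constrained taps set to zero
    return np.fft.fft(w)

def schedule_constrain(W_parts, k):
    """Rotational schedule: at iteration k, constrain only partition
    (k mod P) instead of all P partitions as in the fully constrained
    algorithm."""
    P = len(W_parts)
    W_parts[k % P] = constrain(W_parts[k % P])
    return W_parts
```

With P partitions, each tap-weight vector is thus constrained once every P iterations, so the per-iteration cost drops from P extra FFT/IFFT pairs to a single pair, while each partition's constrained taps stay close to zero between its scheduled constraints.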

CHAN AND FARHANG-BOROUJENY: PARTITIONED FREQUENCY-DOMAIN BLOCK LMS ALGORITHM

1871

From (34), we can see that when the constrained taps are zero, the update equation for the free taps is governed solely by the free-tap (top-left) quadrants of the submatrices, and (34) reduces to

(35)

From Figs. 1–3 and Tables I–III, we know that the off-diagonal quadrants are close to zero and that the free-tap quadrant is asymptotically diagonal. It then follows that the matrix that controls the convergence of the free taps when the constrained taps are zero is also asymptotically diagonal. As was noted before, the constrained taps are never allowed to grow very large, due to the periodic application of the constraint. We therefore expect the schedule-constrained PFBLMS algorithm to perform almost as well as the fully constrained PFBLMS algorithm. This conclusion is verified by the simulations.

V. SIMULATION RESULTS

Figs. 5 and 6 show the learning curves for the PFBLMS algorithm with N = 1024, P = 16, and M = 64 for the white-input and AR-1 input cases, respectively. The curves were ensemble averaged over 500 individual runs each and time averaged over each block. The step-size parameter was chosen, from the equations relating step size to misadjustment, so that the final misadjustment would be around 10% in all cases; the superscripts on the step sizes, as before, refer to the constrained, unconstrained, and step-normalized algorithms, respectively. The forgetting factor was 0.95. The three curves in each figure correspond to the unconstrained, schedule-constrained, and fully constrained algorithms, respectively. Similar curves are also obtained for other colored inputs, including the third and fourth processes presented in Section III-C. We can see from these curves that, as predicted by the theory, the schedule-constrained algorithm performs almost identically to the fully constrained PFBLMS algorithm. The unconstrained algorithm, however, shows a slow mode of convergence, as predicted by the large associated eigenvalue spreads. A further observation from these results is that the convergence modes are more or less independent of the type of input coloring, which is, again, in agreement with what is observed from the matrices in Fig. 3.

VI. CONCLUSION

In this paper, we evaluated the matrices that control the convergence rates of the constrained and unconstrained normalized PFBLMS algorithms. We proceeded to calculate each element of the matrices for the case of an AR-1 input.
From these, we showed that, as the size of the discrete Fourier transform is increased, the eigenvalues associated with the constrained PFBLMS algorithm matrix all tend to the same value, and therefore, the eigenvalue spread asymptotically tends toward 1. We also showed through numerical examples and simulation

Fig. 5. Learning curves for the (a) unconstrained, (b) schedule-constrained, and (c) fully constrained PFBLMS algorithm with white input. N = 1024; P = 16; M = 64.

Fig. 6. Learning curves for the (a) unconstrained, (b) schedule-constrained, and (c) fully constrained PFBLMS algorithm with AR-1 input. N = 1024; P = 16; M = 64.

results that the eigenvalue spread associated with the unconstrained PFBLMS algorithm is wide and, therefore, that the unconstrained PFBLMS algorithm suffers from slow modes of convergence. We also looked at the schedule-constrained PFBLMS algorithm, which has neither the high computational complexity of the fully constrained PFBLMS algorithm nor the slow modes of convergence of the unconstrained PFBLMS algorithm. This not-so-well-known algorithm was thus identified as the best compromise implementation of the PFBLMS algorithm.

APPENDIX A
EVALUATION OF THE CONVERGENCE MATRICES FOR AN AR-1 INPUT

Since we frequently need to evaluate matrices obtained by pre- and postmultiplying a given matrix by the DFT matrix and its conjugate transpose, it is useful to have an explicit derivation of the elements of such a matrix. When the matrices involved are of compatible dimensions, the inner matrix is arbitrary, and the transform is the normalized DFT matrix, it is simple to work out the elements of the transformed matrix as

(36)
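The element-wise expression in (36) is simply the double summation obtained by writing out the matrix product with the normalized DFT matrix. As a sanity check (our own, assuming the standard normalized DFT with elements e^{-j2πmn/N}/√N, consistent with the text), the double sum can be compared against a direct matrix product:

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))     # arbitrary N x N inner matrix

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # normalized DFT

B = F @ A @ F.conj().T              # transformed matrix, product form

# Element (m, n) written out as the explicit double summation of (36)
m, n = 2, 5
s = sum(F[m, p] * A[p, q] * np.conj(F[n, q])
        for p in range(N) for q in range(N))
assert np.isclose(B[m, n], s)
```

The normalization by √N makes F unitary, so the transform preserves eigenvalues; this is why the spectra discussed in the body of the paper can be read off the transformed matrices directly.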

We now make the simplifying assumption that the matrix size is large enough, and the AR-1 parameter small enough, that the truncation terms in the above summation may be neglected for the significant terms. Defining the diagonal elements of the resulting matrix accordingly, we thus obtain

(37)

and similarly for the companion quantity. Here, we distinguish the symbol used for this quantity from the variable used as an index to the columns of a matrix. In this Appendix, we assume that the DFT length is twice the partition length. Applying (36) to the matrix of interest, we get the expression for that matrix; again, making the same approximation for the significant terms, we get

(38)

The main objective of this Appendix is to evaluate the elements of the convergence matrix. Writing out explicitly the input vectors for the input partitions from (4), we see that for an AR-1 input process the elements of the input correlation matrix may be written as

(39)

Applying (36) to this gives the corresponding frequency-domain expression. We now need to evaluate the diagonal matrix holding the powers of each frequency bin of the input. These are simply the diagonal elements of the transformed input correlation matrix, which, from (36), are given by the corresponding double summation. We only need to consider the diagonal elements and, due to the Toeplitz nature of the correlation matrix, we may rewrite this as a single summation. Point-wise multiplying the frequency-domain expression with the inverse of these bin powers, we get

(40)

Multiplying (40) out and combining terms, then applying (37), gives (41), shown at the top of the next page. The first summation in parentheses in (41), after factoring out the denominator, may be expanded as

(42)

Considering the valid ranges of the variables, the summations of exponents can be rewritten in terms of the Kronecker delta function. Simplifying (41) then gives

(43)
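The asymptotic diagonality invoked throughout this derivation rests on Gray's result [23] that the DFT asymptotically diagonalizes Toeplitz matrices. The sketch below is an illustration of that property (not the paper's exact matrices): it measures the ratio of off-diagonal to diagonal Frobenius energy of the transformed AR-1 correlation matrix as the transform size grows.

```python
import numpy as np

def offdiag_ratio(N, a=0.85):
    """Ratio of off-diagonal to diagonal Frobenius energy of F R F^H,
    where R is the AR-1 Toeplitz correlation matrix r[k] = a^{|k|}."""
    R = a ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # normalized DFT matrix
    Lam = F @ R @ F.conj().T                 # transformed correlation matrix
    off = Lam - np.diag(np.diag(Lam))        # off-diagonal part
    return np.linalg.norm(off) / np.linalg.norm(np.diag(Lam))

for N in (16, 64, 256):
    print(N, offdiag_ratio(N))
```

The ratio shrinks as the transform size grows, which is the numerical counterpart of the approximations made above when off-diagonal terms are neglected.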


(41)

Expanding the brackets will give an expression with 14 terms in it, each of the form

We can treat each of these summations separately. Evaluation of these summations is a matter of carefully checking the ranges of the summations and variables (parameters) used and counting the number of elements of the summation that are nonzero. The valid ranges of the summations are different, depending on which portion of the matrix is being examined. The expression has been evaluated and presented for the three main submatrices of interest in Tables I–III. A more detailed evaluation, including a general expression, is available in [27].

ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their many constructive comments.

REFERENCES
[1] B. Widrow and M. E. Hoff, Jr., "Adaptive switching circuits," in IRE WESCON Conv. Rec., 1960, pp. 96–104.
[2] E. R. Ferrara, "Fast implementation of LMS adaptive filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 474–475, Aug. 1980.
[3] M. R. Asharif, T. Takebayashi, T. Chugo, and K. Murano, "Frequency-domain noise canceler: Frequency-bin adaptive filtering (FBAF)," in Proc. ICASSP, 1986, pp. 41.22.1–41.22.4.
[4] M. R. Asharif and F. Amano, "Acoustic echo-canceler using the FBAF algorithm," IEEE Trans. Commun., vol. 42, pp. 3090–3094, Dec. 1994.
[5] J. S. Soo and K. K. Pang, "Multidelay block frequency-domain adaptive filter," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 373–376, Feb. 1990.
[6] C. H. Yon and C. K. Un, "Fast multidelay block transform-domain adaptive filters based on a two-dimensional optimum block algorithm," IEEE Trans. Circuits Syst. II, vol. 41, pp. 337–345, May 1994.
[7] B. Farhang-Boroujeny, "Analysis and efficient implementation of partitioned block LMS adaptive filters," IEEE Trans. Signal Processing, vol. 44, pp. 2865–2868, Nov. 1996.
[8] W. Kellermann, "Analysis and design of multirate systems for cancellation of acoustic echoes," in Proc. IEEE ICASSP, 1988, pp. 2570–2573.
[9] B. Farhang-Boroujeny and Z. Wang, "Adaptive filtering in subbands: Design issues and experimental results for acoustic echo cancellation," Signal Process., vol. 61, pp. 213–223, 1997.
[10] S. S. Narayan, A. M. Peterson, and M. J. Narasimha, "Transform domain LMS algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 609–615, June 1983.
[11] B. Farhang-Boroujeny and S. Gazor, "Selection of orthonormal transforms for improving the performance of the transform domain normalized LMS algorithm," Proc. Inst. Elect. Eng. F, vol. 139, pp. 327–335, Oct. 1992.
[12] D. F. Marshall and W. K. Jenkins, "A fast quasi-Newton adaptive filtering algorithm," IEEE Trans. Signal Processing, vol. 40, pp. 1652–1662, July 1992.
[13] C. E. Davila, "A stochastic Newton algorithm with data-adaptive step size," IEEE Trans. Acoust., Speech, Signal Processing, vol. 38, pp. 1796–1798, Oct. 1990.
[14] B. Farhang-Boroujeny, "Fast LMS/Newton algorithms based on autoregressive modeling and their applications to acoustic echo cancellation," IEEE Trans. Signal Processing, vol. 45, pp. 1987–2000, Aug. 1997.
[15] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[16] D. Mansour and A. H. Gray, Jr., "Unconstrained frequency-domain adaptive filter," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 726–734, Oct. 1982.
[17] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[18] G. A. Clark, S. R. Parker, and S. K. Mitra, "A unified approach to time- and frequency-domain realization of FIR adaptive digital filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1073–1083, Oct. 1983.
[19] B. Farhang-Boroujeny, Adaptive Filters: Theory and Applications. Chichester, U.K.: Wiley, 1998.
[20] J. C. Lee and C. K. Un, "Performance analysis of frequency-domain block LMS adaptive digital filters," IEEE Trans. Circuits Syst., vol. 36, pp. 173–189, Feb. 1989.
[21] E. Moulines, O. A. Amrane, and Y. Grenier, "The generalized multidelay adaptive filter: Structure and convergence analysis," IEEE Trans. Signal Processing, vol. 43, pp. 14–28, Jan. 1995.
[22] F. Beaufays, "Transform domain adaptive filters: An analytic approach," IEEE Trans. Signal Processing, vol. 43, pp. 422–431, Feb. 1995.
[23] R. M. Gray, "On the asymptotic eigenvalue distribution of Toeplitz matrices," IEEE Trans. Inform. Theory, vol. IT-18, pp. 725–729, Nov. 1972.
[24] K. S. Chan and B. Farhang-Boroujeny, "Lattice implementation: Fast converging structure for efficient implementation of frequency-domain adaptive filters," Signal Process., vol. 78, pp. 79–89, 1999.
[25] H. J. McLaughlin, "System and method for an efficiently constrained frequency-domain adaptive filter," U.S. Patent 5 526 426, June 11, 1996.
[26] B. Farhang-Boroujeny and K. S. Chan, "Analysis of the frequency-domain block LMS algorithm," IEEE Trans. Signal Processing, vol. 48, pp. 2332–2342, Aug. 2000.
[27] K. S. Chan, "Fast block LMS algorithms and analysis," Ph.D. dissertation, Nat. Univ. Singapore, Singapore, 2000.
[28] J. H. Wilkinson, The Algebraic Eigenvalue Problem (Monographs on Numerical Analysis). Oxford, U.K.: Oxford Univ. Press, 1965.

Kheong Sann Chan was born in Melbourne, Australia, in 1972 and grew up in Singapore. He received the B.A. degree in mathematics and physics in 1994 and the B.Sc. degree in electrical engineering in 1996 from Northwestern University, Evanston, IL. He then returned to Singapore to pursue the Ph.D. degree at the National University of Singapore. His research interests include analysis and implementation of adaptive filtering algorithms in the frequency domain and partial response channel equalization. He is currently working at the Data Storage Institute, National University of Singapore.


Berhouz Farhang-Boroujeny (SM'98) received the B.Sc. degree in electrical engineering from Teheran University, Teheran, Iran, in 1976, the M.Eng. degree from the University of Wales Institute of Science and Technology, Cardiff, U.K., in 1977, and the Ph.D. degree from Imperial College, University of London, London, U.K., in 1981. From 1981 to 1989, he was with Isfahan University of Technology, Isfahan, Iran. From September 1989 to August 2000, he was with the Electrical Engineering Department, National University of Singapore. He recently joined the Department of Electrical Engineering, University of Utah, Salt Lake City. His current scientific interests are adaptive filter theory and applications, multicarrier modulation for wired and wireless channels, code division multiple access, and recording channels. Dr. Farhang-Boroujeny received the UNESCO Regional Office of Science and Technology for South and Central Asia Young Scientists Award in 1987 in recognition of his outstanding contribution in the field of computer applications and informatics. He is the author of the book Adaptive Filters: Theory and Applications (New York: Wiley, 1998) and the coauthor of an upcoming book Toeplitz Matrices: Algebra, Algorithms and Analysis (Boston, MA: Kluwer, to be published). He has been an active member of the IEEE. He has served as a member of the Signal Processing, Circuits and Systems, and Communications chapters in Singapore and Utah. He has also served on organizing committees of many international conferences, including Globecom '95 in Singapore and ICASSP 2001 in Salt Lake City, UT.
