Printer Friendly

Output signal based combination of two NLMS adaptive filters--transient analysis/Valjundsignaalidel baseeruv adaptiivsete NLMS-filtrite kombinatsioon: koonduvusanaluus.


When designing an adaptive algorithm, one faces a trade-off between the initial convergence speed and the mean-square error in steady state. In case of algorithms belonging to the least mean square (LMS) family this trade-off is controlled by the step-size parameter. A large step size leads to a fast initial convergence but the algorithm also exhibits a large mean-square error in the steady state, and on the contrary, a small step size slows down the convergence but results in a small steady state error [1,2].

Variable step size adaptive schemes offer a possible solution allowing achieving both fast initial convergence and low steady state misadjustment [3-7]. How successful these schemes are depends on how well the algorithm is able to estimate the distance of the adaptive filter weights from the optimal solution. The variable step size algorithms use different criteria for calculating the proper step size at any given time instance. For example, squared instantaneous errors have been used in [4] and the squared autocorrelation of errors at adjacent time instances have been used in [6]. Paper [5] investigates an algorithm that changes the time-varying convergence parameters in such a way that the change is proportional to the negative of the gradient of the squared estimation error with respect to the convergence parameter. In [7] the norm of the projected weight error vector is used as a criterion to determine how close the adaptive filter is to its optimum performance.

Recently there has been an interest in a combination scheme that is able to optimize the trade-off between the convergence speed and the steady state error [8]. The scheme consists of two adaptive filters that are simultaneously applied to the same inputs as depicted in Figure 1. One of the filters has a large step size allowing fast convergence and the other one has a small step size for a small steady state error. The outputs of the filters are combined through a mixing parameter [lambda]. The performance of this scheme has been studied for some parameter update schemes [9-11]. Paper [9] uses convex combination, i.e., [lambda] is constrained to lie between 0 and 1. Paper [10] presents a transient analysis of a slightly modified version of this scheme. The parameter [lambda] is in those studies found by using an LMS-type adaptive scheme and computing the sigmoidal function of the result. Another approach by computing the mixing parameter using an affine combination is provided in [11]. This paper uses the ratio of time averages of the instantaneous errors of the filters. The error function of the ratio is computed to obtain [lambda].


In [12] a convex combination of two adaptive filters with different adaptation schemes has been investigated with the aim of improving the steady state characteristics. One of the adaptive filters uses the LMS algorithm and the other one the Generalized Normalized Gradient Decent algorithm. The combination parameter [lambda] is computed, using stochastic gradient adaptation. In [13] the convex combination of two adaptive filters is applied in a variable filter length scheme to gain improvements in low signal to noise ratio conditions. In [14] the combination has been used to join two affine projection filters with different regularization parameters. Paper [15] uses the combination on parallel binary structured LMS algorithms. These three works use the LMS-like scheme of [16] to compute [lambda].

It should be noted that schemes involving two filters have been proposed earlier [17,18]. However, in those early schemes only one of the filters was adaptive, while the other used fixed filter weights. The updating of the fixed filter was accomplished by copying all coefficients from the adaptive filter, when the adaptive filter was performing better than the fixed one.

In the present paper the mixing parameter [lambda] is computed from output signals of the individual filters. The way of calculating the mixing parameter is optimal in the sense that it results from minimization of the mean-squared error of the combined filter. The scheme was independently proposed in [19] and [20]; the steady state performance of it was investigated in [21] and the tracking performance in [22]. In [23] the output signal based combination was used in the adaptive line enhancer. Those papers investigate the behaviour of the adaptive combination scheme in steady state, i.e., in the situation when discrete time n approaches infinity. In the main body of this paper, on the other hand, a transient analysis of the algorithm is given, i.e., the formulae, predicting the entire course of adaptation, not only steady state, are derived.

It will be assumed throughout the paper that the signals are complex-valued and the combination scheme uses two normalized LMS (NLMS) adaptive filters. The italic, boldface lower-case, and boldface upper case letters will be used for scalars, column vectors, and matrices, respectively. The superscript T denotes transposition of a matrix, the operator E[x] denotes mathematical expectation, and tr[x] stands for trace of a matrix.


Let us consider two adaptive filters, as shown in Figure 1, each of them updated using the NLMS adaptation rule

[w.sub.i](n)= [w.sub.i](n - 1) + [[mu].sub.i]/[x.sub.T](n)x(n) [e.sup.*.sub.i](n)x(n), (1)

[e.sub.i](n) = d(n) - [w.sup.H.sub.i] (n - 1)x(n), (2)

d(n)= [w.sup.H.sub.o] x(n) + v(n). (3)

In the above, [w.sub.i](n) is the N vector of coefficients of the ith adaptive filter, with i = 1,2. The vector [w.sub.o] is the true weight vector we aim to identify with our adaptive scheme and x(n) is the known N input vector, common for both adaptive filters. The input process is assumed to be a zero mean wide sense stationary Gaussian process. The desired signal d(n) is a sum of the output of the filter to be identified and the Gaussian, zero mean independent an identically distributed (i.i.d.) measurement noise that is statistically independent of all other signals. The step size of the ith adaptive filter is denoted by [[mu].sub.i]. We assume without loss of generality that [[mu].sub.1] > [[mu].sub.2]. The case [[mu].sub.1] = [[mu].sub.2] is not interesting because the two filters remain equal and the combination renders to a single filter.

The outputs of the two adaptive filters are combined according to

y(n) = [lambda](n)[y.sub.1](n) + [1 - [lambda](n)][y.sub.2](n), (4)

where [y.sub.i](n) = [w.sup.H.sub.i] (n - 1)x(n) and the mixing parameter [lambda] can be any real number.

We define the a priori system error signal as difference between the output signal of the true system at time n, given by [y.sub.o](n) = [w.sup.H.sub.o] x(n) = d(n) - v(n), and the output signal of our adaptive scheme y(n)

[e.sub.a](n) = [y.sub.o](n) - [lambda](n)[y.sub.1](n) - (1 - [lambda](n))[y.sub.2](n). (5)

Let us now find [lambda](n) by minimizing the mean square of the a priori system error. The derivative of E[[[absolute value of [e.sub.a](n)].sup.2]] with respect to [lambda](n) reads

[partial derivative]E[[absolute value of [e.sub.a](n)].sup.2]]/[partial derivative][lambda](n) = 2E[Re{([y.sub.o](n) - [y.sub.2](n)) - [y.sub.2](n))([y.sub.2]))([y.sub.2](n) - [y.sub.1](n))*} + [lambda](n)[absolute value of [y.sub.2](n) - [y.sub.1](n))].sup.2]. (6)

Setting the derivative to zero results in

[lambda](n) = E[Re{(d(n) - [y.sub.2](n))([y.sub.1](n) - [y.sub.2](n))*}]/ e[[[absolute value of ([y.sub.1](n) - [y.sub.2](n))].sup.2] (7)

where we have replaced the true system output signal [y.sub.o](n) by its observable noisy version d(n). Note, however, that because we have made the standard assumption that the input signal x(n) and measurement noise v(n) are independent random processes, this can be done without introducing any error into our calculations.

The denominator of equation (7) comprises expectation of the squared difference of the two filter output signals. This quantity can be very small or even zero, particularly at the beginning of adaptation if the two step sizes are close to each other. Correspondingly [lambda] computed directly from (7) may be large. To avoid this from happening, we add a small regularization constant e to the denominator of (7). The constant e should be selected small compared to E[[x.sup.T](n)x(n)] but large enough to prevent division by zero in given arithmetic.


In this section we are interested in finding expressions that characterize transient performance of the combined algorithm, i.e., we intend to derive formulae that characterize the entire course of adaptation of the algorithm. Before we can proceed we need, however, to introduce some notations. First, let us denote the weight error vector of the ith filter as

[[??].sub.i](n)= [w.sub.o] - [w.sub.i](n). (8)

Then the equivalent weight error vector of the combined adaptive filter will be

[??](n)= [lambda][w.sub.1](n) + (1 - [lambda]) [[??].sub.2](n). (9)

The mean square deviation (MSD) of the combined filter is given by


The a priori estimation error of an individual filter is defined as

[e.sub.i,a] (n)= [[??].sup.H.sub.i] (n - 1)x(n). (11)

It follows from (5) that we can express the a priori error of the combination as

[e.sub.a](n) = X(n)[e.sub.1,a](n) + (1 - [lambda](n))[e.sub.2,a](n) (12)

and because [lambda](n) is, according to (7), a ratio of mathematical expectations and, hence, deterministic, we have for the excess mean square error (EMSE) of the combination


As [e.sub.i,a](n) = [[??].sup.H.sub.i](n - 1)x(n), the expression of the EMSE becomes


In what follows we often drop the explicit time index n as we have done in (14), if it is not necessary to avoid a confusion.

Noting that [y.sub.i](n) = [w.sup.H.sub.i] (n - 1)x(n), we can rewrite the expression for [lambda](n) in (7) as


We thus need to investigate the evolution of terms of the type [EMSE.sub.k,l] = E[[[??].sup.H.sub.k] (n - 1)x(n)[x.sup.H] (n)[[w.sub.i](n - 1)] in order to reveal the time evolution of EMSE(n) and [lambda](n). To do so, we concentrate first on the mean square deviation defined in (10).

For a single NLMS filter we have after subtraction of (1) from [w.sub.o] and expressing [e.sub.i](n) through the error of the corresponding Wiener filter [e.sub.o](n)

[[??].sup.i](n) = (I [mu]/[x.sup.H]x [xx.sup.H] [[??].sup.i](n - 1) - [mu]/[x.sup.H]x x[e.sup.*.sub.o](n). (16)

At this point we make two approximations. First, we approximate the outer product of input signal vectors by its correlation matrix [[xx.sup.H] [approximately equal to] [R.sub.x]. Second, we approximate the inner product of the input signal vectors by N times of its power [x.sup.H]x [approximately equal to] N [[sigma].sup.2.sub.x]. With those approximations we have

[??](n) [approximately equal to] (I - [mu]/N [[sigma].sup.2.sub.x] [R.sub.x]) [??](n - 1) - [mu]/N[[sigma].sup.2.sub.x] x [e.sup.*.sub.o](n). (17)

This means, in fact, that we apply the small step size theory [2] even if the assumption of a small step size is not really true for the fast adapting filter. In our simulation study we will see, however, that the assumption works rather well in practice.

Let us now define the eigendecomposition of the correlation matrix as

[Q.sup.H] [R.sub.x]Q = [OMEGA], (18)

where Q is a unitary matrix whose columns are the orthogonal eigenvectors of [R.sub.x] and [OMEGA] is a diagonal matrix having eigenvalues associated with the corresponding eigenvectors on its main diagonal. We also define the transformed weight error vector as

v(n) = [Q.sup.H] [??](n) (19)

and the transformed last term of equation (17) as

P(n) = [mu]/N[[sigma].sup.2.sub.x] [Q.sup.H]x [e.sup.*.sub.o](n). (20)

Then we can rewrite equation (17) after multiplying both sides by [Q.sup.H] from the left as

v(n) = (I - [mu]/N[[sigma].sup.2.sub.x] [OMEGA]) v(n - 1) - P(n). (21)

We note that the mean of p is zero by the orthogonality theorem and its correlation matrix equals

E[[PP.sup.H]] = [[mu].sup.2]/[N.sup.2][[sigma].sup.4.sub.x] [Q.sup.H] E[x[e.sup.*.sub.o](n)[e.sub.o](n)[x.sup.H]]Q. (22)

We now invoke the Gaussian moment factoring theorem to write

E[x[e.sup.*.sub.o](n)[e.sub.o](n)[x.sup.H]] = E[x[e.sup.*.sub.o](n)]E[[e.sub.o](n)[x.sup.H]] + E[[xx.sup.H]]E[[absolute value [e.sub.o]].sup.2]. (23)

The first term in the above is zero due to the principle of orthogonality and the second term equals [RJ.sub.min]. Hence we are left with

E[[PP.sup.H]] = [[mu].sup.2]/[N.sup.2][[sigma].sup.4.sub.x] [J.sub.min] [OMEGA], (24)

where [J.sub.min] = E[[[absolute value of [e.sub.o].sup.2]] is the minimum mean square error produced by the corresponding Wiener filter. As the matrices I and [OMEGA] in (21) are diagonal, it follows that the mth element of vector v(n) is given by


where [[omega].sub.m] is the mth eigenvalue of [R.sub.x] and [v.sub.m] and [p.sub.m] are the mth components of the vectors v and p, respectively.

We can now express the MSD and its individual components in (10) through the transformed weight error vectors as N 1

E[[[??].sup.H](n)[??](n)] = E[v.sup.H](n)v(n)] = [N-1.summation over (m=0)] E[[v.sub.m](n)[v.sup.*.sub.m](n)]. (26)

Let us concentrate on the mth component in the sum above corresponding to the cross term and denote it as [Y.sub.m] = E[[v.sub.k,m](n)[v.sup.*.sub.l,m](n)]. The expressions for the component filters follow as special cases. Substituting (25) into the expression of [Y.sub.m] above, taking the mathematical expectation, and noting that the vector p is independent of v(0) results in


We now note that most likely the two component filters are initialized to the same value

[v.sub.k,m](0) = [v.sub.l,m](0) = [v.sub.m](0) (28)

and that


We then have for the mth component of MSD


The sum over i above can be recognized as a geometric series with n terms. The first term is equal to 1 and the geometric ratio equals [(1 - [[mu].sub.k]/N[[sigma].sup.2.sub.x] [[omega].sub.m]).sup.-1] [(1 - [[mu].sub.l]/N[[sigma].sup.2.sub.x][[omega].sub.m]).sup.-1]. Hence we have


After substitution of the above into (30) and simplification we are left with


which is our result for a single entry to the MSD cross term vector. It is easy to see that for the terms involving a single filter we get expressions that coincide with the one available in the literature [2].

Let us now focus on the cross term

[EMSE.sub.k,l] = E[[[??].sup.H.sub.k] (n - 1)x(n)[x.sup.H](n)[[??].sub.l](n - 1)] , (33)

appearing in the EMSE equation (14). Due to the independence assumption we can rewrite this by using the properties of the trace operator as


Let us now recall that for any of the filters [[??].sub.i](n) = [Qv.sub.i] (n) to write


The EMSE of the combined filter can now be computed as

EMSE = [N - 1.summation over (i=0)] [[omega].sub.i]E[[[absolute value of [lambda](n)[v.sub.k,i](n - 1)] + (1 - [lambda](n) [v.sub.l,i](n - 1)].sup.2]. (36)

where the components of type E[[v.sub.k,i](n - 1)[v.sub.l,i](n - 1)] are given by (32). To compute [lambda](n), we use (15), substituting (35) for its individual components.


A simulation study was carried out with the aim of verifying the approximations made in Section 3. In particular we are interested in how well the small step-size theory applies to our combination scheme of two adaptive filters.

We have combined two 64 tap long adaptive filters. In order to obtain a practical algorithm, the expectation operators in both the numerator and denominator of (7) have been replaced by exponential averaging of the type

[P.sub.u](n) = (1 - [gamma])[P.sub.u](n - 1) + [gamma][u.sup.2](n), (37)

where u(n) is the signal to be averaged, [P.sub.u](n) is the averaged quantity, and [gamma] = 0.01. The averaged quantities were then used in (7) to obtain [lambda]. We have selected the sample echo path model number one from [24], to be the unknown system to identify. The impulse response of the echo path is shown in Figure 2.

The curves in Figures 3-6 are averaged over 100 trials. The input signal x is formed from the Gaussian white noise with unity variance by passing it through the filter with the transfer function

H(z) = 1/1 - 0.5[z.sup.-1] - 0.1[z.sup.-2] (38)

to get a coloured input signal. The measurement noise is Gaussian white noise, statistically independent of x. The solid lines represent the simulation results and the dashed lines are the theoretical results. The theory predicts the simulation results rather well and because of that in Figs 3-6 the theoretical curve overlaps with the simulation result.


As seen from Figure 3, there is a a rapid convergence in the beginning, followed by a stabilization period. When the slow adapting filter gets better than the fast one between sample times 15 000 and 20 000, a second convergence occurs. One can observe a good resemblance of simulation and theoretical curves.

Figure 4 depicts the time evolution of the combination parameter [lambda] in this simulation. At the beginning of the test case the combination parameter is close to one. Correspondingly the output signal of the fast filter is used as the output of the combination. After a while, when the slow filter catches up the fast one and becomes better, [lambda] changes towards zero and eventually becomes a small negative number. In this state the slow but more accurate filter determines the combined output. Again, one can see that there is a clear similarity between the lines.



The reason for [lambda] becoming negative at the end of adaptation lies in that we have two adaptive filters working in parallel on the same input signal and the criterion for selecting [lambda] is minimization of the mean square error of the output signal. At the end of adaptation some of the additive noise v(n) will be cancelled out if [lambda] is allowed to become negative to provide the smallest possible mean square error.

In Figure 5 we have made the difference between the step sizes small. One can see that the characteristic horizontal part of the learning curve has almost disappeared. We have also increased the measurement noise level. The simulation and theoretical curves show a good match.



In Figure 6 we have increased the measurement noise level even more. One can see that the theoretical simulation results agree well.


We have investigated a combination of two NLMS adaptive filters that are simultaneously applied to the same input signals. The mixing parameter X was computed, using the output signals of the individual filters and the desired signal. The transient behaviour of the algorithm was investigated, using the assumption of a small step size, and the expressions for evolution of EMSE(n) and X(n) were derived. Finally, it was shown in the simulation study that the derived formulae fit the simulation results well.

doi: 10.3176/proc.2011.4.06


[1.] Sayed, A. H. Adaptive Filters. John Wiley and Sons, Hoboken, NJ, USA, 2008.

[2.] Haykin, S. Adaptive Filter Theory, Fourth Edition. Prentice Hall, 2002.

[3.] Harris, R. W., Chabries, D. M., and Bishop, F. A. Variable step (vs) adaptive filter algorithm. IEEE Trans. Acoust. Speech Signal Process., 1986, 34, 309-316.

[4.] Kwong, R. H. and Johnston, E. W. A variable step size LMS algorithm. IEEE Trans. Signal Process., 1992, 40, 1633- 1642.

[5.] Matthews, V. J. and Xie, Z. A stochastic gradient adaptive filter with gradient adaptive step size. IEEE Trans. Signal Process., 1993, 41, 2075-2087.

[6.] Arenas-Garcia, J., Figueiras-Vidal, A. R., and Sayed, A. H. A robust variable step-size LMS-type algorithm: Analysis and simulations. IEEE Trans. Signal Process., 1997, 45, 631-639.

[7.] Shin, H. C. and Sayed, A. H. Variable step-size NLMS and affine projection algorithms. IEEE Signal Process. Lett., 2004, 11, 132-135.

[8.] Martinez-Ramon, M., Arenas-Garcia, J., Navia-Vazquez, A., and Figueiras-Vidal, A. R. An adaptive combination of adaptive filters for plant identification. In Proc. 14th International Conference on Digital Signal Processing. Santorini, Greece, 2002, 1195-1198.

[9.] Arenas-Garcia, J., Figueiras-Vidal, A. R., and Sayed, A. H. Mean-square performance of convex combination of two adaptive filters. IEEE Trans. Signal Process., 2006, 54, 1078-1090.

[10.] Silva, M. T. M., Nascimento, V. H., and Arenas-Garcia, J. A transient analysis for the convex combination of two adaptive filters with transfer of coefficients. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Dallas, TX, USA, 2010, 3842-3845.

[11.] Bershad, N. J., Bermudez, J. C., and Tourneret, J. H. An affine combination of two LMS adaptive filters - transient mean-square analysis. IEEE Trans. Signal Process., 2008, 56, 1853-1864.

[12.] Mandic, D., Vayanos, P., Boukis, C., Jelfs, B., Goh, S. I., Gautama, T., and Rutkowski, T. Collaborative adaptive learning using hybrid filters. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Honolulu, Hawaii, 2007, 901-924.

[13.] Zhang, Y. and Chambers, J. A. Convex combination of adaptive filters for a variable tap-length LMS algorithm. IEEE Signal Process. Lett., 2006, 10, 628-631.

[14.] Kim, K., Choi, Y., Kim, S., and Song, W. Convex combination of affine projection filters with individual regularization. In Proc. 23rd International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC). Shimonoseki, Japan, 2008, 901-904.

[15.] Fathiyan, A. and Eshghi, M. Combining several PBS-LMS filters as a general form of convex combination of two filters. J. Appl. Sci., 2009, 9, 759-764.

[16.] Azpicueta-Ruiz, L. A., Figueiras-Vidal, A. R., and Arenas-Garcia, J. A normalized adaptation scheme for the convex combination of two adaptive filters. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing. Las Vegas, Nevada, 2008, 3301-3304.

[17.] Ochiai, K. Echo canceller with two echo path models. IEEE Trans. Commun., 1977, 25, 589-594.

[18.] Armbruster, W. Wideband acoustic echo canceller with two filter structure. In Signal Processing VI, Theories and Applications (Vanderwalle, J., Boite, R., Moonen, M., and Oosterlinck, A., eds). Elsevier Science Publishers B.V, 1992, 1611-1617.

[19.] Trump, T. An output signal based combination of two NLMS adaptive algorithms. In Proc. 16th International Conference on Digital Signal Processing. Santorini, Greece. IEEE 2009, 6 pp.

[20.] Azpicueta-Ruiz, L. A., Figueiras-Vidal, A. R., and Arenas-Garcia, J. A new least squares adaptation scheme for the affine combination of two adaptive filters combination of two adaptive filters. In Proc. IEEE International Workshop on Machine Learning for Signal Processing. Cancun, Mexico, 2008, 327-332.

[21.] Trump, T. Steady state analysis of an output signal based combination of two NLMS adaptive filters. In Proc. 17th European Signal Processing Conference. Glasgow, Scotland, 2009, 1720-1724.

[22.] Trump, T. Tracking performance of a combination of two NLMS adaptive filters. In Proc. IEEE Workshop on Statistical Signal Processing. Cardiff, UK. IEEE 2009, 181-184.

[23.] Trump, T. Output statistics of a line enhancer based on a combination of two adaptive filters. Central European J. Engineering, 2011, 1, 244-252.

[24.] ITU-T Recommendation G.168 Digital Network Echo Cancellers. ITU-T, 2009; (accessed 27 July 2011).

Tonu Trump

Department of Radio and Telecommunication Engineering, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia;

Received 11 October 2010, accepted 20 December 2010
COPYRIGHT 2011 Estonian Academy Publishers
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2011 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Title Annotation:SIGNAL PROCESSING; normalized least mean square
Author:Trump, Tonu
Publication:Proceedings of the Estonian Academy of Sciences
Article Type:Report
Geographic Code:4EXES
Date:Dec 1, 2011
Previous Article:Generalized Sasakian space forms with semi-symmetric non-metric connections/Poolsummeetrilise mittemeetrilise seostusega uldistatud Sasaki...
Next Article:On dispersion properties of surface motions in the Gulf of Finland/ Soome lahe pinnakihi hoovuste dispersioonist.

Terms of use | Privacy policy | Copyright © 2019 Farlex, Inc. | Feedback | For webmasters