1. INTRODUCTION
Blind signal separation (BSS) refers to performing inverse channel estimation without any knowledge of the true channel (or mixing filter) [1,2,3,4,5]. BSS methods based on independent component analysis (ICA) have been found effective and are therefore widely used. A limitation of the ICA technique is the need for long unmixing filters in order to estimate the inverse channels [1]. Here, we propose the use of noncausal filters [10] to shorten the filter length. In addition, using noncausal filters in the feedback network allows good separation even if the direct-channel filters do not have stable inverses. A variable step-size parameter for adaptation of the learning process is used to improve convergence.
FPGA (Field Programmable Gate Array) architecture allows the optimal parallelism needed to handle the high …

Fig. 1. Torkkola's feedback network for BSS.

2.2 Improved ICA Based BSS Method
Torkkola's algorithm works only when the stable inverse of the direct-channel filters exists, which is not guaranteed. It has been shown that the algorithm can be modified to use noncausal filters. The relationships between the signals then become:
$$u_1(t) = x_1(t+M) + \sum_{k=-M}^{M} w_{12}(k)\,u_2(t-k) \tag{4}$$

$$u_2(t) = x_2(t+M) + \sum_{k=-M}^{M} w_{21}(k)\,u_1(t-k) \tag{5}$$

where M is half of the filter length L, i.e., L = 2M + 1, and the learning rule is

$$\Delta w_{ij}(t_1 - p_1 + M) = \Delta w_{ij}(t_0 - p_0 + M) + K\big(u_i(t_0)\big)\,u_j(p_0) \tag{6}$$

where

$$K\big(u_i(t_0)\big) = \mathrm{stepsize}\cdot\big(1 - 2\,y_i(t_0)\big) \tag{7}$$

$$y_i(t_0) = \frac{1}{1 + e^{-u_i(t_0)}} \tag{8}$$

and $t_1 = t_0 + 1$, $p_0 = t_0 - k$ and $p_1 = t_1 - k$ for $k = -M, -M+1, \ldots, M$.
The variable learning step size, stepsize, in Equation 7 is explained in more detail later.

3. ARCHITECTURE OF FPGA DESIGN FOR BSS
In this section, we describe the architecture of the FPGA design of the ICA-based BSS algorithm using Torkkola's feedback network. The system-level design is presented first, followed by detailed FPGA simulations on real ECG signals. Then, the hardware realization of the FPGA design is discussed and the FPGA synthesis results are given.
In our work, the FPGA design tools used were Xilinx™ System Generator version 2.3 [6] and MATLAB™ version 6.5 from MathWorks. The FPGA synthesis tool used was Xilinx™ ISE 5.2i. System Generator provides bit-true and cycle-true FPGA blocksets for simulation under MATLAB Simulink™, thus offering a convenient and realistic system-level FPGA simulation.

3.1 Practical Implementation of Torkkola's Network for FPGA Realization
As a result of our earlier experimentation [7][10], we propose the specifications below, both to minimize the FPGA resources needed and to ensure real-time BSS separation given the limited FPGA clock speed. Subsections 3.1.1 to 3.1.5 explain the impact of each parameter on the hardware requirement.
• Filter length, L = 321 taps,
• Buffer size for iterative convolution, N = 640 (implemented using an overlapped window to shorten the latency; see Subsection 3.1.1),
• Maximum number of iterations, I = 200,
• Approximation of the exponential learning step size using linear piecewise approximation.
The linear piecewise approximation is used to avoid the complex circuitry needed to implement the exponential function in hardware (see Subsection 3.1.4 for more explanation). The MATLAB simulation in our earlier work shows that using the piecewise approximation does not significantly affect the performance of the BSS algorithm [6].

3.1.1 Three-buffer technique
In a real-time hardware implementation, uninterrupted processing requires the hardware to handle the input and output as continuous streams of samples. This conflicts with the batch processing required by the BSS algorithm: to perform the separation, a block of buffered data has to be filtered iteratively. Here, we implement a buffering mechanism using three 640-sample (N = 640) buffers per input source. While one buffer is being filled with the input, the second buffer is being filtered, and the third is being streamed out.
A side effect of this three-buffer technique is that the system introduces a processing delay equal to twice the time needed to fill a buffer. For example, if the signal sampling frequency is 100 Hz, filling one buffer takes 640/100 = 6.4 s. The system then needs another 6.4 s of processing before the result is ready for output, so the total delay is 6.4 + 6.4 = 12.8 s. This delay is too long for practical real-time ECG monitoring, and we therefore applied an overlapped window technique. In our implementation, the 640-sample block is sampled with an overlap of 32 samples. In this case the processing delay is reduced to (64/100) × 2 = 1.28 s.

3.1.2 Implementation of mechanism for the feedback network
According to Equations 4 and 5, the values $w_{12}(i)$ must be read from negative addresses when $i < 0$. The equation can be modified to include only non-negative addresses:

$$u_1(t) = x_1(t+M) + \sum_{i=-M}^{M} w_{12}(i+M)\,u_2(t-i) \tag{9}$$

Equation 9 performs the same noncausal filtering on $u_2$ as Equation 4 without the need for negative addressing of $w_{12}$. Equation 5 is modified accordingly.

Fig. 2. Implementation of (9) for Torkkola's feedback network

The block diagram shown in Fig. 2 depicts the hardware implementation of Equation 9. Note that the FIR filtering of $w_{12}$ is implemented with a multiply-accumulate (MAC) unit, which significantly reduces the number of multipliers and adders needed compared to a direct implementation (see Subsection 3.1.5).
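As a software illustration of Equation 9 (and, by symmetry, the modified Equation 5), the sketch below stores the 2M + 1 weights at addresses 0 to 2M and evaluates each output sample with one serial multiply-accumulate loop, in the spirit of the MAC-based FIR filtering described above. It is a minimal Python sketch, not the FPGA design. The function name `feedback_filter` is hypothetical, samples outside the buffer are assumed to be zero, and `u_other` is assumed to be the other channel's output from the previous iteration. That last assumption is needed because the noncausal taps (i < 0) read future samples of the other channel within the block.

```python
def feedback_filter(x, u_other, w, M):
    """Evaluate Eq. 9 for one output channel over a buffer of samples.

    x       -- this channel's input buffer x(t)
    u_other -- the other channel's output buffer (assumed: previous iteration)
    w       -- 2*M + 1 weights at non-negative addresses, w[i + M] holds w12(i)
    Samples outside the buffer are taken as zero (assumption).
    """
    n = len(x)
    out = []
    for t in range(n):
        # Direct path advanced by M samples: x(t + M)
        acc = x[t + M] if t + M < n else 0.0
        # One multiply-accumulate per tap, i = -M .. M (serial MAC)
        for i in range(-M, M + 1):
            src = t - i                      # index of u_other(t - i)
            if 0 <= src < n:
                acc += w[i + M] * u_other[src]
        out.append(acc)
    return out


# Zero weights: the output is x advanced by M = 1 sample.
print(feedback_filter([1.0, 2.0, 3.0, 4.0], [0.0] * 4, [0.0] * 3, 1))
# -> [2.0, 3.0, 4.0, 0.0]

# Impulse at index 1 of u_other: taps w[0], w[1], w[2] appear at t = 0, 1, 2.
print(feedback_filter([0.0] * 5, [0.0, 1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.125], 1))
# -> [0.5, 0.25, 0.125, 0.0, 0.0]
```

In the impulse example, tap w[0] (lag i = -M) responds one sample before the impulse. This is the noncausal behaviour that Equation 9 provides without negative addressing. At N = 640 and L = 321, one pass of this loop over a buffer performs at most 640 × 321 = 205,440 multiply-accumulates per channel.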
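Equations 6 to 8 describe an accumulation. Because $p_0 = t_0 - k$, the address $t_0 - p_0 + M$ equals $k + M$, so at every time step $t_0$ each lag k adds $K(u_i(t_0))\,u_j(t_0 - k)$ to the increment stored at address k + M. The sketch below accumulates these increments over one buffer, under several assumptions. It uses the exact logistic function of Equation 8 rather than the piecewise approximation. It holds stepsize constant for the whole buffer, so the paper's variable step size is not reproduced. It skips terms whose $p_0$ falls outside the buffer. It also leaves open how the accumulated increments are applied to the weights. The names `logistic` and `accumulate_update` are hypothetical.

```python
import math


def logistic(u):
    # Eq. 8: y = 1 / (1 + e^(-u)), written to avoid overflow for large |u|
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def accumulate_update(u_i, u_j, M, stepsize):
    """Accumulate the Delta-w_ij increments of Eq. 6 over one buffer.

    Entry k + M of the returned list holds the increment for lag k,
    k = -M .. M. Terms with p0 outside the buffer are skipped (assumption).
    """
    n = len(u_i)
    dw = [0.0] * (2 * M + 1)
    for t0 in range(n):
        # Eq. 7: K(u_i(t0)) = stepsize * (1 - 2 * y_i(t0))
        gain = stepsize * (1.0 - 2.0 * logistic(u_i[t0]))
        for k in range(-M, M + 1):
            p0 = t0 - k
            if 0 <= p0 < n:
                # Eq. 6: address t0 - p0 + M equals k + M for every t0
                dw[k + M] += gain * u_j[p0]
    return dw


print(accumulate_update([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], 1, 0.1))
print(accumulate_update([1.0], [1.0], 0, 1.0))
```

Since $1 - 2y = -\tanh(u/2)$, the gain is zero when $u_i(t_0) = 0$ and its sign is opposite to that of $u_i(t_0)$. For this reason, the first call above returns all zeros.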
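The breakpoints of this design's piecewise approximation (Subsection 3.1.4) are not given in this section. To illustrate the idea behind the linear piecewise approximation, the sketch below substitutes a different, published scheme: the PLAN approximation of Amin, Curtis and Hayes-Gill. Here the logistic function of Equation 8 is replaced by straight-line segments whose slopes are powers of two, so each segment needs only a shift and an add instead of an exponential. The segment values are PLAN's and are not necessarily those used in this FPGA design. The names `plan_sigmoid` and `max_abs_error` are hypothetical.

```python
import math


def plan_sigmoid(x):
    """PLAN piecewise-linear approximation of 1 / (1 + e^(-x)).

    Slopes 1/4, 1/8 and 1/32 are powers of two, so in hardware each
    segment reduces to a shift and an add.
    """
    a = abs(x)
    if a >= 5.0:
        y = 1.0
    elif a >= 2.375:
        y = 0.03125 * a + 0.84375
    elif a >= 1.0:
        y = 0.125 * a + 0.625
    else:
        y = 0.25 * a + 0.5
    # The logistic function satisfies y(-x) = 1 - y(x)
    return y if x >= 0 else 1.0 - y


def max_abs_error(lo=-8.0, hi=8.0, steps=16001):
    """Largest deviation from the exact logistic on a uniform grid."""
    worst = 0.0
    for n in range(steps):
        x = lo + (hi - lo) * n / (steps - 1)
        exact = 1.0 / (1.0 + math.exp(-x))
        worst = max(worst, abs(plan_sigmoid(x) - exact))
    return worst


print(max_abs_error())
```

On this grid, the worst-case error is just under 0.019, at |x| = 1. Whether an error of this size is acceptable for the learning rule is the kind of question the MATLAB study cited in Section 3.1 addresses for the design's own segments.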
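The three-buffer scheme of Subsection 3.1.1 can be modelled in software to check its latency claim. In the hypothetical `ThreeBuffer` class below, `process` stands in for the iterative BSS filtering and is applied to a whole buffer when the buffer roles rotate. In the FPGA, the filtering instead runs concurrently while the next buffer fills. The hypothetical helper `processing_delay` reproduces the delay arithmetic. It reads the 1.28 s figure as implying that each new block advances by 64 samples.

```python
from collections import deque


class ThreeBuffer:
    """Software model of the fill / filter / stream-out buffer rotation."""

    def __init__(self, n, process):
        self.n = n                  # samples per buffer (N)
        self.process = process      # stand-in for the iterative BSS filtering
        self.filling = []           # buffer receiving input samples
        self.filtering = None       # full buffer handed to the filter stage
        self.streaming = deque()    # processed samples waiting to be output

    def push(self, sample):
        """Accept one input sample; return one output sample or None."""
        self.filling.append(sample)
        out = self.streaming.popleft() if self.streaming else None
        if len(self.filling) == self.n:
            # Rotate roles: filtered buffer -> output, filled buffer -> filter
            if self.filtering is not None:
                self.streaming = deque(self.process(self.filtering))
            self.filtering = self.filling
            self.filling = []
        return out


def processing_delay(samples_per_block, fs_hz):
    """Delay of two block periods, as in Subsection 3.1.1 (seconds)."""
    return 2.0 * samples_per_block / fs_hz


buf = ThreeBuffer(3, lambda block: block)   # identity "separation"
print([buf.push(s) for s in range(10)])
print(processing_delay(640, 100), processing_delay(64, 100))
```

With N = 3, the first output sample appears on the seventh push, a delay of 2N samples or twice the buffer-fill time. At N = 640 and 100 Hz, this corresponds to the 12.8 s figure quoted above.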
… (if not heavily pipelined) which will result in the need for …

… signals, L is the tap length of the FIR filter, and I is the number of iterations.
Fig. 6. Result of FPGA simulation: (a) separated MECG and (b) separated FECG.

5. FPGA SYNTHESIS RESULTS
After the successful simulation, the VHDL code was automatically generated from the design using System Generator. The VHDL code was then synthesized using Xilinx ISE 5.2i, targeting a 600,000-gate Virtex-E device.

7. REFERENCES
[1] T.-W. Lee, "Independent Component Analysis: Theory and Applications", Kluwer Academic Publishers, 1998.
[2] R. M. Gray, "Entropy and Information Theory", New York: Springer-Verlag, 1990.
[3] P. Comon, "Independent component analysis, a new concept?", Signal Processing, vol. 36, 1994, pp. 287-314.
[4] K. Torkkola, "Blind source separation for audio signals: Are we there yet?", IEEE Workshop on Independent Component Analysis and Blind Signal Separation, Aussois, France, Jan. 1999.
[5] T.-W. Lee, A. J. Bell, and R. Orglmeister, "Blind source separation of real world signals", Proc. IEEE Int. Conf. Neural Networks, Houston, June 1997, pp. 2129-2135.
[6] Xilinx Inc., "Xilinx System Generator v2.3 for The MathWorks Simulink: Quick Start Guide", February 2002.
[7] F. Sattar and C. Charayaphan, "Low-cost design and implementation of an ICA-based blind source separation algorithm", IEEE ASIC/SoC Conference, Rochester, NY, Sept. 25-28, 2002, pp. 15-19.
[8] D. Adam and D. Shavit, "Complete foetal ECG morphology recording by synchronized adaptive filtration", Medical and Biological Engineering and Computing, vol. 28, 1990, pp. 287-292.
[9] A. Kam and A. Cohen, "Maternal ECG elimination and foetal ECG detection: Comparison of several algorithms", Proc. 20th Ann. Int. Conf. IEEE EMBS, Hong Kong, 1998.
[10] C. Charoensak and F. Sattar, "Hardware for real-time ICA-based blind source separation", Proc. 15th IEEE Int. Conf. SOCC, Sept. 12-15, 2004.