Vous êtes sur la page 1sur 32

Design & Implementation of an

Area Efficient FIR Filter Using


FPGA & VHDL

Presented By:
K.Praveen
V.Sathish kumar
A.Yasar Arafath
R.Venkateshwaran
Outline
 Introduction
 FIR filter implementation
 MAG algorithm
 RSG Multiplier block synthesis algorithm
 Generation & Presentation of results
 Overview
 Advantages
 Applications
 Conclusion
Introduction

High-speed digital signal processor (DSP) systems are
increasingly being implemented on field programmable gate array
(FPGA) hardware platforms.
 The techniques in use are applicable to implementation on
application-specific integrated circuit (ASIC) and Structured ASIC
technologies, analysis is performed using field programmable gate
array (FPGA) hardware.
 The FPGA platform provides high performance and flexibility with
the option to reconfigure and is the technology focused on for the
remainder of this paper.
 Finite impulse response (FIR) digital filters are common DSP
functions and are widely used in FPGA implementations. If very high
sampling rates are required, full-parallel hardware must be used where
every clock edge feeds a new input sample and produces a new output
sample.
continued…….
 Such filters can be implemented on FPGAs using combinations of the
general purpose logic fabric, on-board RAM and embedded arithmetic
hardware.
 Full-parallel filters cannot share hardware over multiple clock cycles and
so tend to occupy large amounts of resource. Hence, efficient
implementation of such filters is important to minimize hardware
requirement.
 When implementing a DSP system on a platform containing dedicated
arithmetic blocks, it is normal practice to utilize such blocks as far as
possible in preference to any general purpose logic fabric.
FIR Filter
 A finite impulse response (FIR) filter is a filter structure that
can be used to implement almost any sort of frequency
response digitally. An FIR filter is usually implemented by
using a series of delays, multipliers, and adders to create the
filter's output.
 The general formula of FIR filter is
Y(n) =∑ x(k) h(n-k) where k = 0,1,2…..L-1
FIR Filter Add/Shift Implementation
Replacing Constant Multiplication by Multiplier Block

X [n]
X[n]

+ + + +
Multiplier
Block
+ + +

y0 y1 y2 yL-1

x hL-1 x
Delay
Block Z-1 + Z-1 + Z-1 Z-1 + y[n]
FIR Filter Add/Shift Implementation
Registered Adder at no Additional Cost
Shift/Add Operation
RSG Algorithm
 Reduced Slice Graph Algorithm algorithm performing
synthesis for low FPGA area. This can be achieved
by synthesising low-logic depth multiplier blocks.
 Considering above multiplier block diagram for RSG
synthesis, that multiplies the input by 3, 13, 21,37 in 2
clock cycles. input is fed to the ‘3’ adder untouched
and after being left shifted once (multiplied by 2).
Hence, the output of the ‘3’ adder is 2x + x = 3x
required. This product can be used as a graph output
be fed to the filter summation chain.
Contd..
 It generate +ve, odd integers since negative filter
coefficient weightings can be restored at the
summation chain by subtracting the +ve equivalent
generated by the multiplier block.
 Once a multiplier has been synthesised, the multiplier
& any additional values used to generate it are placed
in a set called ‘Graph Set’. RSG stores data that
describes the synthesised graph topology & the
mathematical association between the vertices and
edges (shifts)
Contd…
 RSG requires the least FPGA area with the slice and flip-flop cost.
 RSG takes the approach , starts with the highest cost values and simply
inserts the ‘best graph’ required for each, ensuring no duplicate adders
are created and that adder outputs are shared as far as possible.
 As the number of slices reduced compared to MAG algorithm, RSG
synthesis is followed to reduce filter complexity in verl large scale
integration (VLSI) design.
RSG Multiplier Block
Synthesis Algorithm
RSG coding:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity test is
Port ( clk,reset : in std_logic;
xin : in std_logic_vector(3 downto 0);
yout3 : out std_logic_vector(5 downto 0);
yout13: out std_logic_vector(7 downto 0);
yout21: out std_logic_vector(8 downto 0);
yout37: out std_logic_vector(9 downto 0)
);
end test;

architecture bhv of test is


signal add_in,s1 :std_logic_vector(5 downto 0):=(others=>'0');
Contd…
signal add_3,add_21,mul3:std_logic_vector(8 downto 0):=(others=>'0');
signal s:std_logic_vector(8 downto 0):=(others=>'0');
signal sh_5,add_5_0,add_37 ,xin_5 :std_logic_vector(9 downto 0):=(others=>'0');
signal sh_2,add_5:std_logic_vector(6 downto 0):=(others=>'0');
signal sh_22, xin_0 :std_logic_vector(7 downto 0):=(others=>'0');
begin
process(clk,reset)
begin

if reset='1' then
s1<=(others=>'0');
s<=(others=>'0');
yout3<=(others=>'0');
add_5<=(others=>'0');
add_5_0<=(others=>'0');
Contd…
elsif clk'event and clk='1' then
s1<= '0' & xin(3 downto 0) & '0' + ("00" & xin );
yout3<=s1;
s<="000" & s1;
add_5<=('0' & xin(3 downto 0) & "00") + ("000" & xin);--7
add_5_0<="000" & add_5;--10
end if;
end process;

process(clk,reset)
begin
if reset='1' then
yout21<=(others=>'0');
xin_5<=(others=>'0');
yout37<=(others=>'0');
sh_22<=(others=>'0');
yout13<=(others=>'0');
Contd..
xin_0<=(others=>'0');
yout13<=(others=>'0');
elsif clk'event and clk='1' then

yout21<=( s(5 downto 0) & "000") - s;

xin_5<="000000" & xin;


yout37<= xin_5(4 downto 0) & "00000" + add_5_0;--10

sh_22<= s1(5 downto 0) & "00";--8


xin_0<="0000" & xin;--8
yout13<=sh_22 + xin_0;--8

end if;
end process;
end bhv;
MAG algorithm
 The minimised adder graph (MAG) algorithm was defined
byDempster and Macleod.
 It generates minimum adder graphs consisting of
shifts,adds&subtracts for implementing constant integer
multiplication of individual values up to12-bits.
 MAG is required for recent FPGA architectures to further
reduce the area requirement of conventional constant
coefficient multipliers.
 MAG finds numerous graphs for each value that implement
the required multiplication with the number of
adders/vertices.
Contd..
 This algorithm attempts to synthesise all required
multiplications by initially placing coefficients of adder cost 1
into the multiplier block
 the ‘3’ adder is synthesised first and all other coefficients are
built from it. This approach leads to high logic depth blocks and
hence pushes up FPGA area requirement.
 calculating the number of single-bit full adders required to
implement each graph and selecting the graph requiring the
least. Minimise VLSI area consumed when implementing the
multiplication. MAG also generates a table containing the adder
cost of each.
Multiplier block with 4 adders,
4 pipeline stages costing 44 slices
MAG Coding:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity mag is
Port ( clk,reset : in std_logic;
xin : in std_logic_vector(3 downto 0);
yout3 : out std_logic_vector(5 downto 0);
yout21: out std_logic_vector(8 downto 0);
yout13: out std_logic_vector(7 downto 0);
yout37: out std_logic_vector(9 downto 0)
);
end mag;

architecture bhv of mag is


signal add_in,s1 :std_logic_vector(5 downto 0):=(others=>'0');
signal add_3,add_21,mul3:std_logic_vector(8 downto 0):=(others=>'0');
Contd..
signal s:std_logic_vector(8 downto 0):=(others=>'0');
signal sh_5,add_5_0,add_37 ,xin_5 :std_logic_vector(9 downto 0):=(others=>'0');
signal sh_2,add_5:std_logic_vector(6 downto 0):=(others=>'0');
signal sh_22, xin_0 :std_logic_vector(7 downto 0):=(others=>'0');
begin

process(clk,reset)-----3
begin
if reset='1' then
s1<=(others=>'0');
yout3<=(others=>'0');
elsif clk'event and clk='1' then
s1<= '0' & xin(3 downto 0) & '0' + ("00" & xin );
yout3<=s1;
end if;
end process;
Contd..
process(clk,reset)----13
begin
if reset='1' then
sh_22<=(others=>'0');
yout13<=(others=>'0');
elsif clk'event and clk='1' then
sh_22<= s1(5 downto 0) & "00";--8
xin_0<="0000" & xin;--8
yout13<=sh_22 + xin_0;--8
end if;
end process;

process(clk,reset)----21
begin
if reset='1' then
s<=(others=>'0');
yout21 <=(others=>'0');
elsif clk'event and clk='1' then
Contd..
s<="000" & s1;
yout21<=( s(5 downto 0) & "000") - s;
end if;
end process;

process(clk,reset)---37
begin
if reset='1' then
add_5<=(others=>'0');
add_5_0<=(others=>'0');
yout37<=(others=>'0');
elsif clk'event and clk='1' then
add_5<=('0' & xin(3 downto 0) & "00") + ("000" & xin);--7
add_5_0<="000" & add_5;--10
xin_5<="000000" & xin;
yout37<= xin_5(4 downto 0) & "00000" + add_5_0;--10
end if;
end process;
end bhv;
Generation & presentation of
Results
 VHDL filters were generated to compare RSG, RAG and
C1, coefficient bitwidth varied from 1 to 20 and filter
length fixed at 51taps.
 Fig 4a demonstrates that RSG requires less FPGA area
in general, although fig 4b shows that RSG uses more
adders in all cases,fig 4c confirms that FPGA area is
correlated with flip-flop usage,fig 4d RSG is lowest as
expected and shift-register usage.
 Fig 4e LUT data is correlated with adder usage with
RSG using fractionally more LUTs in general.
Contd..
Device Utilization of RSG synthesis multiplier block
---------------------------------------------------------------------
Resource Used Available Utilization
----------------------------------------------------------------------
IOs 39 86 45.35%
Function Generators 27 384 7.03%
CLB Slices 25 192 13.02%
Dffs or Latches 50 672 7.44%
Overview….
 Minimising multiplier block adder cost has been demonstrated
not to minimise FPGA hardware for full-parallel pipelined FIR
filters.
 Reducing flip-flop count through minimising multiplier logic
depth has instead been shown to yield the lowest area
solutions.
 The results presented establish a clear area advantage of RSG
over prior algorithms for typical filter parameters with
comparablemaximum clock rates.
 In addition, the industrial relevance of the transposed FIR with
multiplier block architecture & the RSG algorithm has been
established through comparison with filters implemented using
the DA technique.
Contd..
 RSG is shown to consume more LUTs which is the primary
cause of reducing RSG area advantage with increasing bit-
width.
 Maximumclock rates were generally well into the full-
parallelrange although for bit-widths 18 and 20, RSG filter
performance drops as filter length increases.
 RSG filters provide lowest area implementations capable of
being clocked at full-parallel rates.
Advantages
 There is a constant requirement for efficient use of FPGA
resources where occupying less hardware for a given system
can yield significant cost-related benefits:
1.Reduced power consumption.
2. Area for additional application functionality.
3. Potential to use a smaller, cheaper FPGA.
(i)No input sample shift registers are required since each
sample is fed to each tap simultaneously.
(ii) the pipelined addition chain maps efficiently.
(iii) filter latency is reduced.
(iv) identical tap coefficient magnitudes can share multiplication
hardware because taps receive the input sample
simultaneously.
Applications:
 Extensive use of FPGAs in computationally intensive
applications such as DSP
 More available logic resources in current FPGAs
 Broad applications of FIR filters in multimedia and
Audio communication.
 Need to efficient design methods to save area/power.
 Parameterizing FIR filters in VHDL by using required
multiplier block synthesis algorithms.
 Minimise VLSI area consumed when implementing
the synthesis of multiplier blocks.
Conclusion
 Presented a multiplierless technique, based on the add and shift
method.

 And Common sub expression elimination for low area, low power and
high speed implementations of FIR filters.

 FIR filters can perform at sample rates greatly exceeding those of a


state-of-the-art programmable DSP.

 The FPGA solution offers complete flexibility in the design and, by


reducing chip count, improves the overall reliability of the system.

Vous aimerez peut-être aussi