ASICs: Full-Custom ASICs, Semi-Custom ASICs
User Programmable: PLD, FPGA
Programming technologies: fuses (destroy internal links with current), anti-fuses (grow internal links), PROM, EPROM, EEPROM, Flash, SRAM (volatile)
Reprogrammable: EPROM, EEPROM, Flash, SRAM
What is an FPGA?
[Figure: FPGA array of Configurable Logic Blocks with columns of Block RAMs]
ASICs: high performance, low power, low cost in high volumes
FPGAs: off-the-shelf, low development cost, short time to market, reconfigurability
The ASIC manufacturing cycle is very costly and lengthy and requires a lot of manpower.
Mistakes not detected at design time have a large impact on development time and cost. FPGAs are therefore well suited to rapid prototyping of digital circuits.
Reconfigurable computing
Xilinx
Foundries: UMC (Taiwan; Xilinx acquired an equity stake in UMC in 1996), Seiko Epson (Japan), TSMC (Taiwan)
Old families
XC3000, XC4000, XC5200: 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
High-performance families
Virtex (0.22 µm); Virtex-E, Virtex-EM (0.18 µm); Virtex-II, Virtex-II PRO (0.13 µm)
Low-cost families
Spartan/XL (derived from XC4000); Spartan-II (derived from Virtex); Spartan-IIE (derived from Virtex-E); Spartan-3
CLB Structure
[Figure: CLB with two slices; each slice contains two look-up tables (inputs F1-F4 and G1-G4), two storage elements (D, CK, EC, S/R), carry logic (CIN, COUT) and the F5 multiplexer (F5IN)]
Each slice has two LUT-FF pairs with associated carry logic. Two 3-state buffers (BUFTs) are associated with each CLB and are accessible by all CLB outputs.
Four-input LUT: any 4-input logic function, a 16 x 1-bit synchronous RAM, or a 16-bit shift register.
Carry and control: fast arithmetic logic, multiplier logic, multiplexer logic.
Storage element: latch or flip-flop; set and reset; true or inverted inputs; synchronous or asynchronous control.
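LUT
Example: a 4-input LUT with inputs x1, x2, x3, x4 and output y, implementing this truth table:
x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 0
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 0
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 0
 1  0  1  1 | 0
 1  1  0  0 | 1
 1  1  0  1 | 1
 1  1  1  0 | 0
 1  1  1  1 | 0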
Look-up tables are the primary elements for logic implementation. Each LUT can implement any function of 4 inputs.
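A LUT can be modelled in software as 16 stored bits addressed by its four inputs. The sketch below is a minimal Python illustration under stated assumptions: `make_lut4` is a hypothetical helper, and x1 is taken as the most significant address bit so that the address matches the row order of the truth table above (real devices define their own pin-to-address mapping).

```python
# Minimal software model of a 4-input LUT: the LUT is just 16 stored bits,
# and the four inputs form the address of the bit to read out.
# make_lut4 and the x1-as-MSB address order are illustrative assumptions.

def make_lut4(truth_table):
    """Return (init, lut) for a list of 16 output bits.

    truth_table[i] is the output for inputs (x1, x2, x3, x4) whose binary
    value x1x2x3x4 equals i, i.e. the row order of the truth table.
    """
    if len(truth_table) != 16 or any(bit not in (0, 1) for bit in truth_table):
        raise ValueError("a 4-input LUT stores exactly 16 output bits")
    init = 0
    for index, bit in enumerate(truth_table):
        init |= bit << index          # pack column y into a 16-bit word

    def lut(x1, x2, x3, x4):
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return (init >> address) & 1  # read the stored bit at that address

    return init, lut


# Column y of the truth table, rows 0000 .. 1111
Y = [0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
init, lut = make_lut4(Y)
print(hex(init))        # 0x32a2
print(lut(1, 1, 0, 1))  # 1
```

Implementing a different 4-input function only changes the 16 stored bits; the addressing logic stays the same, which is why a LUT can hold any function of 4 inputs.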
One CLB slice can implement any function of 5 inputs. The logic function is partitioned between the two LUTs, and the F5 multiplexer selects which LUT output is used.
[Figure: two LUTs (address inputs A1-A4 driven by F1-F4 and G1-G4, write inputs WS and DI) whose outputs feed the F5 multiplexer, selected by BX]
MUXF5 combines two LUTs to create any 5-input function (LUT5), selected functions of up to 9 inputs, or a 4:1 multiplexer.
MUXF6 combines two slices to form any 6-input function (LUT6), selected functions of up to 19 inputs, or an 8:1 multiplexer.
Dedicated multiplexers are faster and more space-efficient than equivalent LUT logic.
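Combining two LUTs with MUXF5 amounts to a Shannon expansion on the fifth input: one LUT holds the function with that input at 0, the other with it at 1, and the multiplexer picks between them. The Python sketch below illustrates this under assumptions; `lut4_from_function`, `lut5` and `mux4` are hypothetical helpers, not vendor primitives. It also shows the 4:1 multiplexer case, where each LUT acts as a 2:1 multiplexer and MUXF5 is driven by the upper select bit.

```python
# Sketch of how MUXF5 extends two 4-input LUTs (Shannon expansion on the
# fifth input). All function names here are hypothetical.
from itertools import product

def lut4_from_function(f):
    """Fill a 16-entry table from a Python function of four bits."""
    table = [f(*bits) & 1 for bits in product((0, 1), repeat=4)]
    def lut(a, b, c, d):
        return table[(a << 3) | (b << 2) | (c << 1) | d]
    return lut

def lut5(f):
    """Build a 5-input function f(a, b, c, d, e) from two LUTs and a MUXF5."""
    lut_f = lut4_from_function(lambda a, b, c, d: f(a, b, c, d, 0))  # cofactor e = 0
    lut_g = lut4_from_function(lambda a, b, c, d: f(a, b, c, d, 1))  # cofactor e = 1
    def muxf5(a, b, c, d, e):
        return lut_g(a, b, c, d) if e else lut_f(a, b, c, d)  # e drives the select
    return muxf5

# Any 5-input function, e.g. odd parity, which no single 4-input LUT can hold
parity5 = lut5(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)

# 4:1 multiplexer: each LUT is a 2:1 mux on s0 (fourth LUT input unused)
MUX2_LUT = lut4_from_function(lambda d_lo, d_hi, s0, unused: d_hi if s0 else d_lo)

def mux4(d0, d1, d2, d3, s1, s0):
    low = MUX2_LUT(d0, d1, s0, 0)
    high = MUX2_LUT(d2, d3, s0, 0)
    return high if s1 else low  # MUXF5 selected by s1

print(parity5(1, 1, 0, 1, 1))  # 0
print(mux4(0, 0, 1, 0, 1, 0))  # 1 (s1 s0 = 10 selects d2)
```

The 4:1 multiplexer uses six inputs in total, which is one of the "selected functions" with more than 5 inputs that fit in two LUTs plus MUXF5.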
Distributed RAM
[Figure: one LUT configured as RAM16X1S (inputs D, WE, WCLK, A0-A3; output O), and two LUTs as RAM32X1S (address A0-A4)]
A LUT is equivalent to a 16 x 1 RAM. LUTs implement single- and dual-port RAMs, and LUTs can be cascaded to increase RAM size.
[Figure: two LUTs configured either as RAM16X2S (inputs D0, D1, WE, WCLK, A0-A3; outputs O0, O1) or as dual-port RAM16X1D (write address A0-A3 with output SPO, read address DPRA0-DPRA3 with output DPO)]
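The dual-port configuration can be summarised behaviourally: one port writes and reads at address A, while a second address DPRA reads the same contents. The Python sketch below assumes, as in Xilinx distributed RAM, a synchronous write and asynchronous (combinational) reads; the class and method names are hypothetical and only the port roles (D, WE, A, SPO, DPRA, DPO) follow the figure. In hardware the dual-port version uses two LUTs written with the same data, which the model collapses into one bit array.

```python
# Behavioural sketch of a 16 x 1 dual-port distributed RAM in the spirit of
# RAM16X1D: one synchronous write port, two combinational read ports.
# Class and method names are hypothetical.

class DualPortRam16x1:
    def __init__(self):
        self.bits = [0] * 16              # the 16 storage bits of a LUT

    def clock_edge(self, d, we, a):
        """Rising edge of WCLK: store D at address A when WE is high."""
        if we:
            self.bits[a & 0xF] = d & 1

    def spo(self, a):
        """Single-port output: read at the write address A."""
        return self.bits[a & 0xF]

    def dpo(self, dpra):
        """Dual-port output: read at the independent address DPRA."""
        return self.bits[dpra & 0xF]


ram = DualPortRam16x1()
ram.clock_edge(d=1, we=1, a=5)
ram.clock_edge(d=1, we=0, a=6)   # ignored because WE is low
print(ram.spo(5), ram.dpo(6))    # 1 0
```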
Shift Register
Dynamically addressable delay of up to 16 cycles, for programmable pipelines. Cascade LUTs for greater delays, and use the CLB flip-flops to add depth.
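[Figure: LUT shift register with inputs IN, CE, CLK and delay select DEPTH[3:0], followed by CLB flip-flops (D, CE) driving OUT]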
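The addressable delay can be pictured as 16 shifting stages with a tap selected by DEPTH[3:0]. The Python sketch below is a behavioural model under an assumed convention, DEPTH = n giving a delay of n + 1 clock cycles (so the maximum is 16); the class name is hypothetical.

```python
# Sketch of a LUT-based addressable shift register: 16 stages shift on each
# enabled clock edge, and DEPTH[3:0] selects which stage drives OUT.
# Assumption: DEPTH = n gives a delay of n + 1 clock cycles (n = 0..15).
from collections import deque

class AddressableShiftRegister16:
    def __init__(self):
        self.stages = deque([0] * 16, maxlen=16)  # stages[0] holds the newest bit

    def clock_edge(self, data_in, ce=1):
        if ce:                                    # shift only when CE is high
            self.stages.appendleft(data_in & 1)   # the oldest bit falls off the end

    def out(self, depth):
        return self.stages[depth & 0xF]           # dynamic tap selection


# Delay a bit stream by 4 cycles (DEPTH = 3)
srl = AddressableShiftRegister16()
inputs = [1, 0, 1, 1, 0, 0, 1, 0]
outputs = []
for bit in inputs:
    srl.clock_edge(bit)
    outputs.append(srl.out(3))
print(outputs)  # [0, 0, 0, 1, 0, 1, 1, 0]
```

Because DEPTH is an ordinary input rather than a configuration setting, the delay can be changed at run time, which is what makes the pipeline programmable.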
Shift Register
[Figure: delay balancing in a register-rich FPGA: parallel 64-bit paths through Operations A, B and C (latencies of 12, 4, 8 and 3 cycles are shown) leave a 9-cycle imbalance, which a shift register can absorb]
Dedicated carry logic increases the efficiency and performance of adders, subtractors, accumulators, comparators and counters.
Each CLB contains separate logic and routing for the fast generation of sum and carry signals; the carry propagates from the LSB to the MSB.
All major synthesis tools can infer carry logic for arithmetic functions:
addition (SUM <= A + B)
subtraction (DIFF <= A - B)
comparators (if A < B then)
counters (count <= count + 1)
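The carry chain computes, bit by bit from LSB to MSB, a sum bit and a carry that feeds the next position. The Python sketch below is a software model of that ripple-carry computation only; the function name and bit widths are illustrative. It also shows subtraction on the same chain as A + not(B) + 1.

```python
# Bit-level model of what the dedicated carry chain computes for SUM <= A + B:
# each position produces a sum bit and a carry that ripples from LSB to MSB.
# Function name and widths are illustrative.

def ripple_carry_add(a, b, width, carry_in=0):
    """Return (sum, carry_out) for two unsigned width-bit numbers."""
    carry = carry_in & 1
    total = 0
    for i in range(width):                          # i = 0 is the LSB
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i             # sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))     # generate | (propagate & carry)
    return total, carry


print(ripple_carry_add(13, 7, 4))   # (4, 1): 13 + 7 = 20 = 0b1_0100

# Subtraction A - B reuses the same chain as A + not(B) + 1
diff, no_borrow = ripple_carry_add(9, (~5) & 0xF, 4, carry_in=1)
print(diff, no_borrow)              # 4 1
```

In the FPGA the per-bit carry runs on dedicated fast routing instead of general interconnect, which is where the speed advantage comes from.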
Block RAM
[Figure: Spartan-II true dual-port block RAM with independent ports A and B (port B signals include WEB, ENB and DOB[15:0])]
Each port can be configured with a different data bus width, which provides easy data width conversion without any additional logic.
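Width conversion works because both ports address the same storage bits, each with its own word width. The Python sketch below models a 4-Kbit block RAM (port widths 1, 2, 4, 8 or 16, as in Spartan-II) under an assumed bit mapping: a word of width w at address n occupies bits n*w to n*w + w - 1. The class name is hypothetical.

```python
# Sketch of data-width conversion in a 4-Kbit true dual-port block RAM:
# both ports view the same 4096 storage bits with their own word width.
# Assumption: a w-bit word at address n occupies bits n*w .. n*w + w - 1.

class DualWidthBlockRam:
    TOTAL_BITS = 4096
    WIDTHS = (1, 2, 4, 8, 16)

    def __init__(self):
        self.bits = [0] * self.TOTAL_BITS

    def _base(self, width, addr):
        if width not in self.WIDTHS or not 0 <= addr < self.TOTAL_BITS // width:
            raise ValueError("unsupported width or address out of range")
        return addr * width

    def write(self, width, addr, value):
        base = self._base(width, addr)
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, addr):
        base = self._base(width, addr)
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualWidthBlockRam()
ram.write(16, 0, 0xBEEF)                        # port A configured as 256 x 16
print([hex(ram.read(4, n)) for n in range(4)])  # port B as 1024 x 4: ['0xf', '0xe', '0xe', '0xb']
```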
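To use one dual-port block RAM as two independent single-port RAMs, tie the MSB address bit of one port to logic low and the MSB address bit of the other port to logic high; each port then accesses its own half of the memory.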
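Tying the MSB address bits partitions the address space so that the two ports can never touch the same word. The Python sketch below illustrates this for an assumed 256 x 16 configuration (8 address bits); `port_address`, `write` and `read` are hypothetical helpers.

```python
# Sketch: one dual-port block RAM used as two independent single-port RAMs.
# Port A ties the address MSB low and port B ties it high, so each port only
# sees its own half. The 256 x 16 size is an assumption.
DEPTH = 256
ADDR_BITS = 8
memory = [0] * DEPTH

def port_address(msb, local_addr):
    """Full address = tied MSB followed by the 7 user address bits."""
    low_mask = (1 << (ADDR_BITS - 1)) - 1
    return (msb << (ADDR_BITS - 1)) | (local_addr & low_mask)

def write(msb, local_addr, value):
    memory[port_address(msb, local_addr)] = value

def read(msb, local_addr):
    return memory[port_address(msb, local_addr)]


write(0, 5, 0x1234)   # port A (MSB tied low)
write(1, 5, 0xABCD)   # port B (MSB tied high), same local address
print(hex(read(0, 5)), hex(read(1, 5)))  # 0x1234 0xabcd
```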
I/O Banking
[Figure: IOB output path with output flip-flop (D, Q, EC, SR), output-enable flip-flop and three-state control]
IOB Functionality
The IOB provides the interface between the package pins and the CLBs. Each IOB can work as a unidirectional or bidirectional I/O. Outputs can be forced into high impedance, and inputs and outputs can be registered.
Routing Resources
[Figure: array of CLBs interconnected through Programmable Switch Matrices (PSMs)]
Clock Distribution
FPGA Nomenclature
Altera
FLEX10K Architecture
Stratix Architecture
Feature | EP1S40 | EP1S80
Logic elements (LEs) | 41,250 | 79,040
M512 RAM blocks (512 bits + parity) | 384 | 767
M4K RAM blocks (4 Kbits + parity) | 183 | 364
M-RAM blocks (512 Kbits + parity) | 4 | 9
Total RAM bits | 3,423,744 | 7,427,520
DSP blocks | 14 | 22
Embedded multipliers | 112 | 176
PLLs | 12 | 12
Maximum user I/O pins | 822 | 1,238
Engineering sample availability | Now | Now
Production device availability | March 2003 | January 2003
Partial data for a further device (name not recoverable): 12 DSP blocks, 96 embedded multipliers, 10 PLLs, 726 maximum user I/O pins; remaining cells: N/A, Now, 2003.
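The "Total RAM bits" row follows from the block counts once parity bits are included. The Python sketch below checks the arithmetic, assuming the usual Stratix block sizes with one parity bit per data byte: M512 = 576 bits, M4K = 4,608 bits, M-RAM = 589,824 bits.

```python
# Arithmetic check of the "Total RAM bits" row, assuming one parity bit per
# 8 data bits: M512 = 512 + 64, M4K = 4096 + 512, M-RAM = 512K + 64K bits.
M512_BITS = 512 * 9 // 8           # 576
M4K_BITS = 4096 * 9 // 8           # 4,608
MRAM_BITS = 512 * 1024 * 9 // 8    # 589,824

def total_ram_bits(m512, m4k, mram):
    return m512 * M512_BITS + m4k * M4K_BITS + mram * MRAM_BITS


print(total_ram_bits(384, 183, 4))   # EP1S40 -> 3423744
print(total_ram_bits(767, 364, 9))   # EP1S80 -> 7427520
```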
[Figure: FPGA evolution, 1995 to 2004 (2004 marked "?"): process technology from 0.6 µm through 0.13 µm to 0.07 µm; gate count 25K, 100K, 250K, 1M; transistor count 3.5M, 12M, 23M, 75M, 430M, 1B; one device annotated "100K LC*, 8 Mb RAM, 400 18x18 multipliers"]
More guts
Additional components: RAM blocks, dedicated multipliers, tri-state buffers, transceivers, processor cores, DSP blocks.
[Figure: QuickLogic, Altera, Xilinx]
Processor Cores
ARM in Excalibur
Industry-standard ARM922T 32-bit RISC processor core operating at up to 200 MHz
ARMv4T instruction set with Thumb extensions
Memory management unit (MMU) included for real-time operating system (RTOS) support
Harvard cache architecture with separate 8-Kbyte instruction and 8-Kbyte data caches, each 64-way set-associative
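A worked example of what "8-Kbyte, 64-way set-associative" means for address decoding. The Python sketch below assumes a 32-byte (8-word) cache line, which is not stated above; with that assumption each cache has 256 lines grouped into 4 sets, and a 32-bit address splits into a 25-bit tag, a 2-bit set index and a 5-bit byte offset.

```python
# Worked example: splitting a 32-bit address for one 8-Kbyte, 64-way
# set-associative cache. Assumption (not stated above): 32-byte (8-word) lines.
CACHE_BYTES = 8 * 1024
WAYS = 64
LINE_BYTES = 32                                  # assumed line size

lines = CACHE_BYTES // LINE_BYTES                # 256 lines in total
sets = lines // WAYS                             # 4 sets of 64 ways each
offset_bits = LINE_BYTES.bit_length() - 1        # 5 bits select a byte in the line
index_bits = sets.bit_length() - 1               # 2 bits select the set
tag_bits = 32 - index_bits - offset_bits         # 25 bits are compared as the tag

def split_address(addr):
    """Return (tag, set index, byte offset) for a 32-bit address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


print(lines, sets, offset_bits, index_bits, tag_bits)  # 256 4 5 2 25
print(split_address(0x0000_1234))                      # (36, 1, 20)
```

With so many ways and so few sets, almost all address bits end up in the tag, and a lookup compares the tag against all 64 lines of the selected set.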
Watchdog timer
FPGA Tools
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input  : in  std_logic_vector(31 downto 0);
    data_output : out std_logic_vector(31 downto 0);
    out_full    : in  std_logic;
    key_input   : in  std_logic_vector(31 downto 0);
    key_read    : out std_logic    -- no semicolon after the last port
  );
end RC5_core;                      -- must repeat the entity name
Design flow steps: functional simulation, synthesis, post-synthesis simulation, timing simulation
Simulation Tools: Active-HDL
Synthesis Tools
Logic Synthesis
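VHDL description:
architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;                      -- A, optionally inverted
  signal B1 : STD_LOGIC;                      -- B, optionally inverted
  signal Y1 : STD_LOGIC;                      -- result before output inversion
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
  signal SEL : STD_LOGIC_VECTOR(1 downto 0);  -- operation select L1 & L0
begin
  A1 <= A when (NEG_A = '0') else not A;
  B1 <= B when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- a named signal avoids an ambiguous type for the selector expression
  SEL <= L1 & L0;
  with SEL select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;
[Figure: the resulting circuit netlist]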
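Before synthesis it can help to have a reference model of the MLU for generating expected testbench values. The Python sketch below mirrors the VHDL architecture bit by bit; the function name `mlu` and its argument order are hypothetical.

```python
# Bit-level reference model of MLU_DATAFLOW: optional inversion of A, B and
# the result, with (L1, L0) selecting AND, OR, XOR or XNOR.
# The function name and argument order are hypothetical.
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    a1 = a ^ neg_a               # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b               # B1 <= B when NEG_B = '0' else not B
    results = {
        (0, 0): a1 & b1,         # MUX_0
        (0, 1): a1 | b1,         # MUX_1
        (1, 0): a1 ^ b1,         # MUX_2
        (1, 1): 1 - (a1 ^ b1),   # MUX_3 (xnor)
    }
    y1 = results[(l1, l0)]
    return y1 ^ neg_y            # Y <= Y1 when NEG_Y = '0' else not Y1


# NAND of A and B: AND selected, output inverted
print([mlu(a, b, 0, 0, 1, 0, 0) for a in (0, 1) for b in (0, 1)])  # [1, 1, 1, 0]
```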
Synthesis tools interpret RTL code, produce a synthesized circuit netlist in the standard EDIF format, and give preliminary performance estimates. Some can display circuit schematics corresponding to the EDIF netlist.
Implementation
After synthesis, the entire implementation process is performed by FPGA vendor tools:
Xilinx ISE Foundation 6.2i (the Alliance version is used with 3rd-party tools)
Altera Quartus II 4.0
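Circuit Compilation
1. Technology mapping
2. Placement
3. Routing
[Figure: the logic is mapped onto LUTs, the LUTs are placed on the FPGA, and the connections between them are routed]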
Routing Example
[Figure: FPGA with routed signals passing through programmable connections]
The timing analyzer performs a static analysis of circuit performance: it reports critical paths with all sources of delay and determines the maximum clock frequency.
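Critical Path
The critical path is the longest path from the outputs of registers to the inputs of registers.
[Figure: register → combinational logic (delay $t_{P,logic}$) → register, with input in, clock clk and output out]
$t_{critical} = t_{P,FF} + t_{P,logic} + t_{S,FF}$
where $t_{P,FF}$ is the clock-to-output propagation delay of the source flip-flop, $t_{P,logic}$ the propagation delay of the logic, and $t_{S,FF}$ the setup time of the destination flip-flop.
The minimum clock period equals the delay of the critical path, and the maximum clock frequency is its reciprocal: $T_{min} = t_{critical}$, $f_{max} = 1 / T_{min}$.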
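The core of this analysis is a longest-path computation over the logic between registers. The Python sketch below propagates arrival times in topological order, starting at $t_{P,FF}$ for register outputs, then adds $t_{S,FF}$ at register inputs; the graph, node names and delays (in ns) are made up for illustration, and real timing analyzers also handle clock skew, multiple clocks and min/max corners.

```python
# Sketch of the core of static timing analysis: latest arrival time through
# combinational logic between registers, then f_max = 1 / t_critical.
# Graph, node names and delays (ns) are made up for illustration.
from graphlib import TopologicalSorter

def critical_path(fanin, logic_delay, reg_inputs, t_p_ff, t_s_ff):
    """fanin maps each node to its predecessors; nodes without fanin are
    register outputs, valid t_p_ff after the clock edge."""
    arrival = {}
    for node in TopologicalSorter(fanin).static_order():
        preds = fanin.get(node, [])
        start = max((arrival[p] for p in preds), default=t_p_ff)
        arrival[node] = start + logic_delay.get(node, 0.0)
    t_critical = max(arrival[r] for r in reg_inputs) + t_s_ff  # add setup time
    return t_critical, 1000.0 / t_critical                      # ns -> MHz


fanin = {
    "g1": ["q_a", "q_b"],   # gate fed by two register outputs
    "g2": ["g1", "q_c"],
    "g3": ["q_b"],
    "d_x": ["g2"],          # register inputs
    "d_y": ["g3"],
}
delays = {"g1": 1.5, "g2": 2.0, "g3": 0.8}
print(critical_path(fanin, delays, ["d_x", "d_y"], t_p_ff=1.0, t_s_ff=0.5))
# (5.0, 200.0): q_a/q_b -> g1 -> g2 -> d_x is the critical path
```

Here $t_{critical} = 1.0 + (1.5 + 2.0) + 0.5 = 5.0$ ns, so the minimum clock period is 5 ns and $f_{max}$ is 200 MHz.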
Configuration
Once a design is implemented, you must create a file that the FPGA can understand.
This BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file that stores the programming information.