
Introduction to FPGA

Technology, Devices and Tools

FPGA Devices & Technology

World of Integrated Circuits

Full-Custom ASICs

Semi-Custom ASICs

User Programmable

PLD

FPGA

ASIC: Application-Specific Integrated Circuit

Designed all the way from behavioral description to physical layout
Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

FPGA: Field-Programmable Gate Array

Small development overhead
No NRE (non-recurring engineering) costs
Quick time to market
No minimum quantity order
Reprogrammable

How can we make a programmable logic?

One-time programmable

Fuses (destroy internal links with current)
Anti-fuses (grow internal links)
PROM

Reprogrammable

EPROM
EEPROM
Flash
SRAM (volatile)

What is an FPGA?
[Figure: FPGA floorplan showing Configurable Logic Blocks, Block RAMs and I/O Blocks]

Which Way to Go?


ASICs

High performance
Low power
Low cost in high volumes

FPGAs

Off-the-shelf
Low development cost
Short time to market
Reconfigurability

Other FPGA Advantages

The manufacturing cycle for an ASIC is very costly, lengthy, and engages a lot of manpower

Mistakes not detected at design time have a large impact on development time and cost
FPGAs are perfect for rapid prototyping of digital circuits

Easy upgrades, as in the case of software
Unique applications: reconfigurable computing

Major FPGA Vendors


SRAM-based FPGAs

Xilinx, Inc.
(share over 60% of the market)
Altera Corp.
Atmel
Lattice Semiconductor

Flash & antifuse FPGAs

Actel Corp.
QuickLogic Corp.

XILINX

Xilinx

Primary products: FPGAs and the associated CAD software

Programmable Logic Devices

ISE Alliance and Foundation Series Design Software

Main headquarters in San Jose, CA
Fabless* Semiconductor and Software Company

UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
Seiko Epson (Japan)
TSMC (Taiwan)

Xilinx FPGA Families

Old families

XC3000, XC4000, XC5200
Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.

High-performance families

Virtex (0.22 µm)
Virtex-E, Virtex-EM (0.18 µm)
Virtex-II, Virtex-II PRO (0.13 µm)

Low Cost Family

Spartan/XL: derived from XC4000
Spartan-II: derived from Virtex
Spartan-IIE: derived from Virtex-E
Spartan-3


Basic Spartan-II FPGA Block Diagram

CLB Structure
[Figure: one CLB made of two slices. Each slice contains two look-up tables (inputs G4-G1 and F4-F1, output O), carry and control logic, and two storage elements; slice signals include CIN, COUT, CLK, CE, SR, BY, F5IN, X, XB, Y, YB.]

Each slice has 2 LUT-FF pairs with associated carry logic
Two 3-state buffers (BUFT) are associated with each CLB, accessible by all CLB outputs

CLB Slice Structure

Each slice contains two sets of the following:

Four-input LUT
- Any 4-input logic function,
- or 16-bit x 1 sync RAM,
- or 16-bit shift register

Carry & Control
- Fast arithmetic logic
- Multiplier logic
- Multiplexer logic

Storage element
- Latch or flip-flop
- Set and reset
- True or inverted inputs
- Sync. or async. control

LUT (Look-Up Table) Functionality


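Look-Up Tables are the primary elements for logic implementation
Each LUT can implement any function of 4 inputs

First example (the output depends only on x1 and x2; it is 0 only when both are 1):

x1 x2 x3 x4 | y
 0  0  0  0 | 1
 0  0  0  1 | 1
 0  0  1  0 | 1
 0  0  1  1 | 1
 0  1  0  0 | 1
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 1
 1  0  0  0 | 1
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 1
 1  1  0  0 | 0
 1  1  0  1 | 0
 1  1  1  0 | 0
 1  1  1  1 | 0

Second example:

x1 x2 x3 x4 | y
 0  0  0  0 | 0
 0  0  0  1 | 1
 0  0  1  0 | 0
 0  0  1  1 | 0
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 0
 0  1  1  1 | 1
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 0
 1  0  1  1 | 0
 1  1  0  0 | 1
 1  1  0  1 | 1
 1  1  1  0 | 0
 1  1  1  1 | 0

[Figure: a LUT with inputs x1, x2, x3, x4 and output y, shown next to the equivalent gate with inputs x1, x2 and output y]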
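A LUT is essentially a 16-entry memory addressed by its four inputs. The following is a minimal behavioral sketch in Python, not a hardware description; the function name `make_lut4` and the choice of x1 as the most significant address bit are assumptions made here to match the row order of the truth tables above.

```python
def make_lut4(truth_column):
    """Return a function of (x1, x2, x3, x4) backed by a 16-entry table."""
    if len(truth_column) != 16 or any(bit not in (0, 1) for bit in truth_column):
        raise ValueError("a 4-input LUT needs exactly 16 bits, each 0 or 1")
    table = tuple(truth_column)

    def lut(x1, x2, x3, x4):
        # Assumption: x1 is the most significant address bit, so the address
        # counts through the rows in the same order as the truth table.
        address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return table[address]

    return lut


# y column 0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0 from the slide, rows 0000 to 1111
lut = make_lut4([0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0])

# y column that is 0 only when x1 = x2 = 1 (twelve ones, then four zeros)
nand_lut = make_lut4([1] * 12 + [0] * 4)
```

Changing the 16 stored bits changes the logic function without changing the circuit, which is what makes the block programmable.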

5-Input Functions implemented using two LUTs


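One CLB slice can implement any function of 5 inputs
The logic function is partitioned between two LUTs
The F5 multiplexer selects the LUT

[Figure: two LUTs (each usable as LUT, ROM or RAM, with address inputs A4-A1, write strobe WS and data input DI) feeding the F5 multiplexer; the F4-F1 inputs drive the LUT addresses and BX selects between the two LUT outputs.]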

5-Input Functions implemented using two LUTs


X5 X4 X3 X2 X1 | Y
 0  0  0  0  0 | 0
 0  0  0  0  1 | 1
 0  0  0  1  0 | 0
 0  0  0  1  1 | 0
 0  0  1  0  0 | 1
 0  0  1  0  1 | 1
 0  0  1  1  0 | 0
 0  0  1  1  1 | 0
 0  1  0  0  0 | 1
 0  1  0  0  1 | 0
 0  1  0  1  0 | 0
 0  1  0  1  1 | 1
 0  1  1  0  0 | 1
 0  1  1  0  1 | 1
 0  1  1  1  0 | 1
 0  1  1  1  1 | 1
 1  0  0  0  0 | 0
 1  0  0  0  1 | 0
 1  0  0  1  0 | 0
 1  0  0  1  1 | 0
 1  0  1  0  0 | 0
 1  0  1  0  1 | 0
 1  0  1  1  0 | 0
 1  0  1  1  1 | 1
 1  1  0  0  0 | 0
 1  1  0  0  1 | 1
 1  1  0  1  0 | 0
 1  1  0  1  1 | 1
 1  1  1  0  0 | 0
 1  1  1  0  1 | 1
 1  1  1  1  0 | 0
 1  1  1  1  1 | 0

[Figure: the truth table split between two LUTs whose outputs are multiplexed onto OUT]
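The partitioning of a 5-input function between two LUTs can be sketched in Python as follows. This is an illustrative model only: it assumes X5 drives the multiplexer select and that the upper half of the truth table (X5 = 0) goes into one LUT and the lower half (X5 = 1) into the other; the names `LUT_LOW`, `LUT_HIGH` and `function5` are made up for this example.

```python
def lut4(table, a4, a3, a2, a1):
    """Read one bit from a 16-entry table, a4 being the most significant bit."""
    return table[(a4 << 3) | (a3 << 2) | (a2 << 1) | a1]


# Y column of the 32-row truth table above, rows 00000 to 11111
Y = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

LUT_LOW = Y[:16]    # rows with X5 = 0
LUT_HIGH = Y[16:]   # rows with X5 = 1


def function5(x5, x4, x3, x2, x1):
    # Both LUTs see the same four inputs and evaluate in parallel.
    low = lut4(LUT_LOW, x4, x3, x2, x1)
    high = lut4(LUT_HIGH, x4, x3, x2, x1)
    # The F5 multiplexer picks one LUT output using the fifth input.
    return high if x5 else low
```

Because both halves are looked up at the same time, the fifth input only adds the delay of one dedicated multiplexer rather than another level of general-purpose logic.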

Dedicated Expansion Multiplexers

MUXF5 combines 2 LUTs to create
- Any 5-input function (LUT5)
- Or selected functions up to 9 inputs
- Or 4x1 multiplexer

MUXF6 combines 2 slices to form
- Any 6-input function (LUT6)
- Or selected functions up to 19 inputs
- Or 8x1 multiplexer

Dedicated muxes are faster and more space efficient

[Figure: a CLB with two slices; each slice holds two LUTs and a MUXF5, and a MUXF6 combines the two slices]

Distributed RAM
CLB LUT configurable as Distributed RAM

A LUT equals a 16x1 RAM
Implements Single and Dual-Ports
Cascade LUTs to increase RAM size

Synchronous write
Synchronous/Asynchronous read
Accompanying flip-flops used for synchronous read

[Figure: LUTs mapped to distributed RAM primitives, with their ports]
RAM16X1S: D, WE, WCLK, A0-A3, O
RAM32X1S: D, WE, WCLK, A0-A4, O
RAM16X2S: D0, D1, WE, WCLK, A0-A3, O0, O1
RAM16X1D: D, WE, WCLK, A0-A3, DPRA0-DPRA3, SPO, DPO
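The difference between the synchronous write and the asynchronous read of a 16x1 distributed RAM can be illustrated with a small Python model. It is a sketch under simplifying assumptions: one call to `clock` stands for one rising edge of WCLK, and the class and method names are invented for this example.

```python
class DistributedRAM16x1:
    """Behavioral model of a 16 x 1 RAM held in one LUT."""

    def __init__(self):
        self.cells = [0] * 16

    def read(self, address):
        # Asynchronous read: the output follows the address immediately,
        # with no clock edge involved.
        return self.cells[address & 0xF]

    def clock(self, address, data_in, write_enable):
        # Synchronous write: the cell changes only on a clock edge
        # and only when write enable is asserted.
        if write_enable:
            self.cells[address & 0xF] = data_in & 1


ram = DistributedRAM16x1()
ram.clock(address=5, data_in=1, write_enable=1)   # stored
ram.clock(address=6, data_in=1, write_enable=0)   # ignored, WE is low
```

A synchronous read would be obtained by capturing the value returned by `read` in a register on the next clock edge, which is the role of the accompanying flip-flop.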

Shift Register
Each LUT can be configured as a shift register
- Serial in, serial out

Dynamically addressable delay up to 16 cycles
- For programmable pipeline

Cascade for greater cycle delays
Use CLB flip-flops to add depth

[Figure: a LUT drawn as a chain of D flip-flops with clock enable; inputs IN, CE and CLK; DEPTH[3:0] selects which stage drives OUT]
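A LUT used as a shift register behaves like a 16-stage delay line with a selectable tap. The sketch below models that behavior in Python; it assumes that a DEPTH value of 0 selects the most recently stored bit (a delay of one clock) and that the register powers up as all zeros. The class name `ShiftRegisterLUT` is hypothetical.

```python
class ShiftRegisterLUT:
    """Behavioral model of a 16-stage, dynamically addressable shift register."""

    def __init__(self):
        # stages[0] holds the most recently shifted-in bit
        self.stages = [0] * 16

    def clock(self, data_in, ce=1):
        # On a clock edge with CE high, every bit moves one stage deeper
        # and the oldest bit falls off the end.
        if ce:
            self.stages = [data_in & 1] + self.stages[:-1]

    def out(self, depth):
        # DEPTH[3:0] picks the tap; changing it does not disturb the contents.
        return self.stages[depth & 0xF]


srl = ShiftRegisterLUT()
for bit in (1, 0, 1, 1):
    srl.clock(bit)
taps = [srl.out(d) for d in range(5)]   # newest bit first
```

Since the tap address is an ordinary input, the delay can be changed while the design is running, which is what makes the pipeline "programmable".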

Shift Register
Register-rich FPGA
- Allows for addition of pipeline stages to increase throughput

Data paths must be balanced to keep desired functionality

[Figure: two 64-bit data paths. One path runs Operation A (4 cycles) followed by Operation B (8 cycles), 12 cycles in total; the other runs Operation C (3 cycles), leaving a 9-cycle imbalance.]
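The balancing arithmetic is simple enough to write down. The sketch below uses the 12-cycle and 3-cycle path latencies and the 64-bit width shown on the slide; the function names, the path labels, and the assumption that one LUT provides up to 16 cycles of delay per bit are choices made for this example.

```python
import math


def balancing_delays(latencies):
    """Extra cycles each path needs so that all paths match the longest one."""
    longest = max(latencies.values())
    return {name: longest - cycles for name, cycles in latencies.items()}


def luts_needed(delay_cycles, width_bits, cycles_per_lut=16):
    """LUT-based shift registers needed to delay a bus by delay_cycles."""
    return math.ceil(delay_cycles / cycles_per_lut) * width_bits


delays = balancing_delays({"path_a_then_b": 12, "path_c": 3})
lut_count = luts_needed(delays["path_c"], width_bits=64)
```

Under these assumptions the 9-cycle imbalance on a 64-bit bus fits in 64 LUTs, one per bit, whereas a chain of discrete flip-flops would need 9 x 64 = 576 of them.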

Carry & Control Logic


[Figure: slice diagram highlighting the carry and control logic between each look-up table (inputs G4-G1 and F4-F1) and its storage element; the carry enters at CIN and leaves at COUT.]

Fast Carry Logic

Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters

Carry logic is independent of normal logic and routing resources

Each CLB contains separate logic and routing for the fast generation of sum & carry signals

[Figure: carry logic routing through a column of CLBs, from LSB to MSB]

Accessing Carry Logic

All major synthesis tools can infer carry logic for arithmetic functions

Addition (SUM <= A + B)
Subtraction (DIFF <= A - B)
Comparators (if A < B then)
Counters (count <= count + 1)
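To show what the inferred carry logic computes, here is a bit-level Python sketch of an adder built the way carry chains are commonly organized: a LUT produces a propagate signal, a dedicated multiplexer passes or generates the carry, and a dedicated XOR forms the sum bit. This structure is an assumption for illustration; the slides do not detail the internals, and `carry_chain_add` is a made-up name.

```python
def carry_chain_add(a, b, width, carry_in=0):
    """Add two width-bit values bit by bit; return (sum, carry_out)."""
    carry = carry_in
    total = 0
    for i in range(width):                 # LSB first, the carry ripples upward
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        propagate = a_bit ^ b_bit          # computed in the LUT
        sum_bit = propagate ^ carry        # dedicated XOR gate
        # Dedicated carry multiplexer: pass the incoming carry when the bits
        # differ, otherwise the carry out equals either operand bit.
        carry = carry if propagate else a_bit
        total |= sum_bit << i
    return total, carry


sum_4bit = carry_chain_add(13, 7, width=4)
# Subtraction A - B as A + (not B) + 1 in two's complement
diff_4bit = carry_chain_add(9, ~4 & 0xF, width=4, carry_in=1)
```

The same chain serves subtractors, comparators and counters, which is why a synthesis tool can map all four constructs above onto it.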

Block RAM
[Figure: Spartan-II true dual-port block RAM with Port A and Port B]

Most efficient memory implementation
- Dedicated blocks of memory

Ideal for most memory requirements
- 4 to 14 memory blocks
- 4096 bits per block

Use multiple blocks for larger memories

Builds both single and true dual-port RAMs

Dual Port Block RAM

Dual-Port Bus Flexibility


RAMB4_S4_S16

Port A: 1K-bit depth in, 4-bit width out
WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]

Port B: 256-bit depth in, 16-bit width out
WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0]

Each port can be configured with a different data bus width
Provides easy data width conversion without any additional logic
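The width conversion can be pictured as two views of the same 4096 storage bits. The Python sketch below assumes a linear layout in which word `address` of a `width`-bit port occupies bits `address * width` up to `address * width + width - 1`, least significant bit first; this layout and the class name `DualWidthBlockRAM` are assumptions for illustration, so the vendor documentation should be checked for the exact mapping.

```python
class DualWidthBlockRAM:
    """One 4096-bit array seen through ports of different widths."""

    TOTAL_BITS = 4096

    def __init__(self):
        self.bits = [0] * self.TOTAL_BITS

    def _base(self, width, address):
        base = address * width
        if address < 0 or base + width > self.TOTAL_BITS:
            raise IndexError("address out of range for this port width")
        return base

    def write(self, width, address, value):
        base = self._base(width, address)
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width, address):
        base = self._base(width, address)
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualWidthBlockRAM()
ram.write(16, 0, 0xABCD)                           # one write on the 16-bit port
nibbles = [ram.read(4, a) for a in range(4)]       # four reads on the 4-bit port
```

With 4-bit words the array holds 1024 entries (10 address bits) and with 16-bit words 256 entries (8 address bits), matching the ADDRA[9:0] and ADDRB[7:0] buses of RAMB4_S4_S16.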

Two Independent Single-Port RAMs


RAMB4_S1_S1

Port A: 2K-bit depth in, 1-bit width out; address = VCC, ADDR[10:0]
WEA, ENA, RSTA, CLKA, ADDRA[10:0], DIA[0], DOA[0]

Port B: 2K-bit depth in, 1-bit width out; address = GND, ADDR[10:0]
WEB, ENB, RSTB, CLKB, ADDRB[10:0], DIB[0], DOB[0]

Added advantage of True Dual-Port
- No wasted RAM bits

Can split a Dual-Port 4K RAM into two Single-Port 2K RAMs
- Simultaneous independent access to each RAM

To access the lower RAM
- Tie the MSB address bit to Logic Low

To access the upper RAM
- Tie the MSB address bit to Logic High

I/O Banking

Basic I/O Block Structure


[Figure: I/O block with three paths. Three-state control: three-state flip-flop (D, Q, EC, SR) with enable, clock and set/reset. Output path: output flip-flop (D, Q, EC, SR) with enable. Input path: input flip-flop (D, Q, EC, SR) with enable, offering direct input and registered input.]

IOB Functionality

IOB provides the interface between the package pins and CLBs
Each IOB can work as uni- or bi-directional I/O
Outputs can be forced into High Impedance
Inputs and outputs can be registered
- advised for high-performance I/O

Inputs can be delayed

Routing Resources
[Figure: array of CLBs interconnected through Programmable Switch Matrices (PSM)]

Clock Distribution

FPGA Nomenclature

ALTERA

Device Families & Tools

Logic Element: FLEX10K

Logic Array Block: FLEX10K

FLEX10K Architecture

Stratix Architecture

Stratix Device Family

Feature | EP1S10 | EP1S20 | EP1S25 | EP1S30 | EP1S40 | EP1S60 | EP1S80 | EP1S120
Logic Elements (LEs) | 10,570 | 18,460 | 25,660 | 32,470 | 41,250 | 57,120 | 79,040 | 114,140
M512 RAM Blocks (512 Bits + Parity) | 94 | 194 | 224 | 295 | 384 | 574 | 767 | 1,118
M4K RAM Blocks (4 Kbits + Parity) | 60 | 82 | 138 | 171 | 183 | 292 | 364 | 520
M-RAM Blocks (512 Kbits + Parity) | 1 | 2 | 2 | 4 | 4 | 6 | 9 | 12
Total RAM bits | 920,448 | 1,669,248 | 1,944,576 | 3,317,184 | 3,423,744 | 5,215,104 | 7,427,520 | 10,118,016
DSP Blocks | 6 | 10 | 10 | 12 | 14 | 18 | 22 | 28
Embedded Multipliers | 48 | 80 | 80 | 96 | 112 | 144 | 176 | 224
PLLs | 6 | 6 | 6 | 10 | 12 | 12 | 12 | 12
Maximum User I/O Pins | 426 | 586 | 706 | 726 | 822 | 1,022 | 1,238 | 1,314
Engineering Sample Availability | Now | Use Production | Use Production | N/A | Now | N/A | Now | 2003
Production Device Availability | March 2003 | Now | Now | Now | March 2003 | April 2003 | January 2003 | 2003

FPGA Technology Roadmap

Year | 1995 | 1996 | 1997 | 2000 | 2003 | 2004 ?
Technology (µm) | 0.6 | 0.35 | 0.25 | 0.18 | 0.13 | 0.07
Gate count | 25K | 100K | 250K | 1M | 100K LC*, 8Mb RAM, 400 18X18 multipliers |
Transistor count | 3.5M | 12M | 23M | 75M | 430M | 1B

*note: Xilinx Virtex-II Pro XC2VP100 (9/16/2003)

Advanced architecture of modern FPGAs

More guts

Additional components

RAM blocks
Dedicated multipliers
Tri-state buffers
Transceivers
Processor cores
DSP blocks

Dedicated Arithmetic Blocks

QuickLogic

Altera Xilinx

Processor Cores

PowerPC on Virtex-II Pro

Embedded 300+ MHz Harvard Architecture Core
Low Power Consumption: 0.9 mW/MHz
Five-Stage Data Path Pipeline
Hardware Multiply/Divide Unit
Thirty-Two 32-bit General Purpose Registers
16 KB Two-Way Set-Associative Instruction Cache
16 KB Two-Way Set-Associative Data Cache
Memory Management Unit (MMU)
- 64-entry unified Translation Look-aside Buffers (TLB)
- Variable page sizes (1 KB to 16 MB)
Dedicated On-Chip Memory (OCM) Interface
Supports IBM CoreConnect Bus Architecture
Debug and Trace Support
Timer Facilities

ARM in Excalibur
Industry-standard ARM922T 32-bit RISC processor core operating at up to 200 MHz
ARMv4T instruction set with Thumb extensions
Memory management unit (MMU) included for real-time operating system (RTOS) support
Harvard cache architecture with 64-way set-associative separate 8-Kbyte instruction and 8-Kbyte data caches

Embedded programmable on-chip peripherals

ETM9 embedded trace module to assist software debugging
Flexible interrupt controller
Universal asynchronous receiver/transmitter (UART)
General-purpose timer
Watchdog timer

FPGA Tools

Design process (1)


Specification (Lab Experiments)

Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in Experiment 5, this time your unit must be able to perform the encryption algorithm by itself, executing 32 rounds.

VHDL description (Your Source Files)

    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port(
        clock, reset, encr_decr : in  std_logic;
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic
      );
    end RC5_core;

Functional simulation

Synthesis

Post-synthesis simulation

Design process (2)


Implementation

Timing simulation

Configuration

On-chip testing

Active-HDL

Simulation Tools

Synthesis Tools

Logic Synthesis
VHDL description

    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;

Circuit netlist

[Figure: circuit netlist produced from the VHDL description]

Features of synthesis tools


Interpret RTL code
Produce a synthesized circuit netlist in the standard EDIF format
Give preliminary performance estimates
Some can display circuit schematics corresponding to the EDIF netlist

Implementation

After synthesis, the entire implementation process is performed by FPGA vendor tools
- Xilinx ISE Foundation 6.2i
- Altera Quartus II 4.0
- 3rd party tools for the Alliance version

Circuit Compilation
1. Technology Mapping

[Figure: logic mapped onto LUTs]

2. Placement

Assign a logical LUT to a physical location.

3. Routing

Select wire segments and switches for interconnection.

Routing Example
FPGA
Programmable Connections

Static Timing Analyzer

Performs static analysis of the circuit performance
Reports critical paths with all sources of delays
Determines maximum clock frequency

Static Timing Analysis

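Critical Path: the longest path from outputs of registers to inputs of registers

[Figure: two registers clocked by clk with combinational logic of delay $t_{P,\text{logic}}$ between them; signals in and out]

$t_{\text{Critical}} = t_{P,\text{FF}} + t_{P,\text{logic}} + t_{S,\text{FF}}$

Min. Clock Period = Length of the Critical Path
Max. Clock Frequency = 1 / Min. Clock Period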
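As a worked instance of the two relations above, the following Python sketch turns three delays into a maximum clock frequency. The delay values are purely hypothetical numbers chosen for the example, and the function name is made up; a real static timing analyzer reports the actual figures for a placed and routed design.

```python
def max_clock_frequency_mhz(t_p_ff_ns, t_p_logic_ns, t_s_ff_ns):
    """Return the maximum clock frequency in MHz for delays given in ns."""
    # Critical path = flip-flop propagation + logic propagation + flip-flop setup
    t_critical_ns = t_p_ff_ns + t_p_logic_ns + t_s_ff_ns
    # A period of 1 ns corresponds to 1000 MHz
    return 1000.0 / t_critical_ns


# Hypothetical delays: 1.0 ns clock-to-output, 7.5 ns of logic, 1.5 ns setup
f_max = max_clock_frequency_mhz(1.0, 7.5, 1.5)
```

With these assumed numbers the critical path is 10 ns, so the minimum clock period is 10 ns and the maximum clock frequency is 100 MHz.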

Configuration

Once a design is implemented, you must create a file that the FPGA can understand

This file is called a bitstream: a BIT file (.bit extension)

The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
