
Digital System Design with PLDs and FPGAs


Field Programmable Gate Arrays

Kuruvilla Varghese
DESE
Indian Institute of Science

Topics

FPGA Architecture (Xilinx, Altera, Actel)


FPGA related Design issues
FPGA related Timing issues
Tool Flow
FPGA Configuration
SoPC
Debugging
Case Study

Field Programmable Gate Arrays

ASIC, MPGA/Standard Cell, FPGA


Volumes, NRE cost, turnaround time
Array of logic resources with programmable interconnections
Logic resources (Combinational, Flip flops)
Combinational: LUT, Multiplexers, Gates
Programmable interconnections: SRAM, Flash, Anti-fuse
Special Resources: PLL/DLL, RAMs, FIFOs,
Memory Controllers, Network Interfaces, Processors

Commercial FPGAs

Xilinx
Spartan-3, Spartan-6
Virtex-4, Virtex-5, Virtex-6
Artix-7, Kintex-7, Virtex-7, Zynq
Altera
Cyclone, Cyclone II, Cyclone III, Cyclone IV, Cyclone V
Arria II, Arria V
Stratix II, Stratix III, Stratix IV, Stratix V

Commercial FPGAs

Actel
Axcelerator (Antifuse)
IGLOO, IGLOOE (Flash)
ProASIC Plus (Flash)
ProASIC3, ProASIC3E (Flash)
RTAX (Radiation Tolerant, Anti-fuse)
RTSX-SU (Radiation Tolerant, Anti-fuse)
SmartFusion, SmartFusion2 (ARM Cortex-M3)

Structure of an FPGA

Structure of an FPGA

Source: Xilinx Data Sheets

Detailed View

(Figure: rows of CLBs with switch blocks (SB) between them)

Switch Block

Types of switch blocks

FPGA

Source: Xilinx Data Sheets

FPGA

I/O Blocks (Tri-state output / Input, Synchronizing Flip-flops)
Array of Configurable Logic Blocks
Horizontal and Vertical wires with programmable switches
in between
Single length, Double length, Quad, Hex and Long lines
Resources available to user
Resources for configuring programmable switches in the
interconnect structures and Logic blocks

Programmable Connections

SRAM (Pass Transistor)


Flash
Antifuse

SRAM (Pass Transistor)

Source: Xilinx Data Sheets

Pass Transistor with configuration cell
(Figure: configuration flip-flop, write transistor and pass transistor)

Flip-flop to store the switch status (4 transistors)
Write transistor to write the configuration status
Pass transistor acting as the switch
Total: 6 transistors
The flip-flops controlling the switches are organized as an SRAM, hence the name

Flash Transistor

MOS transistor with a floating gate


Conducts when not programmed off
Can be electrically programmed off or on

Flash Transistor

Flash Cell Write

Flash Cell Erase

Anti-fuse

Programmable Connections

Name       Volatile  Re-programmable  Delay  Area
Flash      No        In-circuit       Large  Medium
SRAM       Yes       In-circuit       Large  Large
Anti-fuse  No        No               Small  Small

Logic Block size

Coarse grain
Owing to the area of SRAM interconnections (6 transistors per switch), the logic blocks are made large in SRAM-based FPGAs
High utilization is achieved through configurability within the logic block
Fine grain
Since an antifuse occupies less area and has less delay, antifuse-based FPGAs employ smaller logic blocks

Logic Cell Structure Coarse Grain

Source: Xilinx Data Sheets

Logic Cell Structure Fine Grain

Design Methodology

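(Flow diagram of the design methodology)
HDL Source -> Functional Simulation
HDL Source -> Synthesis -> Equations/Netlists -> Logic Simulation
Equations/Netlists + Constraints -> PAR/Fitting -> Static Timing Analysis
PAR/Fitting -> Configuration File -> Programming
PAR/Fitting -> Timing Model -> Timing Simulation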

Structure of an FPGA

Source: Xilinx Data Sheets

Commercial Tools

Simulators
ModelSim (Mentor Graphics)
Active HDL (Aldec)
Synthesis Tools
Synplify Pro (Synopsys)
Precision Synthesis (Mentor Graphics)
Vendor Tools
Xilinx ISE (Synthesis, Simulation, PAR, Programming)
Xilinx Vivado (Synthesis, Simulation, PAR, Programming)
Altera Quartus II (Synthesis, Simulation, PAR, Programming)
Actel Libero (Synthesis, Simulation, PAR, Programming)

Commercial Tools

Cadence Suite
Synopsys Suite
Mentor Graphics Suite

Xilinx Virtex FPGA

SRAM based programmable connections, configuration


LUT based combinational Logic
Flip-Flops with sync/async reset/preset
Large Configurable Logic Blocks (CLBs)
Block RAM (SPRAM, DPRAM, FIFO)
LUT as Distributed RAM
Low skew clock trees, DLL, Tri-state gates for Buses
Carry Chains / Cascade Chains
JTAG, Serial, and Parallel Configuration schemes
I/O Blocks (Registered / Non-registered)
Multiple I/O standards

Xilinx Virtex FPGA

Present-day FPGAs use PLLs instead of DLLs, and have DSP blocks for fixed-point arithmetic

Source: Xilinx Data Sheets

Virtex CLB

Source: Xilinx Data Sheets

LUT

(Figure: a 2-input LUT with X on address A1, Y on address A0 and output D0 = X XOR Y)
A1 A0 | D0
 0  0 |  0
 0  1 |  1
 1  0 |  1
 1  1 |  0

Address lines as inputs, data line as output (read mode)


Truth table written during configuration (write)
4 input, 6 input LUTs
Fixed AND, Programmable OR
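As a sketch of the read mode described above, the VHDL below models a 4-input LUT as a 16-entry truth table addressed by its inputs. The entity name `lut4_model` and the generic `INIT` are illustrative assumptions, not the vendor primitive; the default contents implement the X XOR Y example with X on i(1) and Y on i(0), leaving i(3) and i(2) as don't-cares.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural model of a 4-input LUT (illustrative, not the vendor primitive).
-- Read mode: the inputs form the address, the output is the addressed bit
-- of the 16-bit truth table INIT, which configuration would normally write.
entity lut4_model is
  generic (INIT : std_logic_vector(15 downto 0) := x"6666");  -- i(1) xor i(0)
  port (i : in  std_logic_vector(3 downto 0);   -- address lines I3..I0
        o : out std_logic);                     -- data line
end entity lut4_model;

architecture rtl of lut4_model is
begin
  o <= INIT(to_integer(unsigned(i)));
end architecture rtl;
```

Changing only `INIT` changes the implemented function, which is what writing the truth table during configuration amounts to.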

FPGA Configuration / Programming

Writing to configuration memory


Configuring options in Logic blocks
Writing LUTs with truth tables
Combining LUTs
Using LUTs as memory
Selecting clocks, Set/Reset for FFs
Configuring Various Muxes in Slices
Using special resources (RAM, FIFOs, PLLs)
Programming Switch matrices
Programming I/O blocks

Virtex Family

Source: Xilinx Data Sheets

Important Specifications

CLB Array, Block RAM Bits


User I/O, Differential I/O
Distributed RAM bits can be calculated from the number of CLBs (multiply by 64: four 16 x 1 LUTs per CLB)
System gate and logic gate counts are not very useful: they are equivalent gate counts, so comparing them across vendors is meaningless

Structure of an FPGA

Virtex CLB

Source: Xilinx Data Sheets

4 input LUT and Flip-Flops

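(Figure: a 4-input LUT with inputs I3-I0 and output O next to a flip-flop with D, Q, clock CK, set S and asynchronous reset AR; drawn first unconnected, then with O driving D)

Use LUT and FFs independently
Use LUT followed by FFs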

4 input LUT and Flip-Flops

Independent LUT Outputs: X, Y


Dedicated inputs to FF: BX, BY

5 input LUT

(Figure: two 4-input LUTs sharing inputs I3-I0; their outputs O feed the F5 mux, whose select is I4)

Two 4-input LUTs are multiplexed by the F5 mux to form a 5-input LUT.
The select line is connected to BX, so the bottom FF cannot be used
independently; the F5 mux output is connected to this FF.

6 Input LUT
(Figure: four 4-input LUTs sharing I3-I0; two F5 muxes selected by I4; the F6 mux, selected by I5, combines the two F5 outputs)

Two 5-input LUTs are multiplexed by the F6 mux to form a 6-input LUT. The select line is
connected to BY, so the top FF cannot be used independently; the F6 mux
output is connected to this FF.

Cascading LUTs

5-input and 6-input LUTs built with the F5 and F6 muxes are required in the most general case, considering all possible minterms
But in specific cases, a 6-input function can be implemented using a cascade of two LUTs

6 inputs using 2 LUTs

Y = ABCDE or ABCDF
Y = (ABCD) and (E or F)
X = ABCD
Y = X and (E or F)
(Figure: LUT 1 with inputs A, B, C, D produces X; LUT 2 with inputs X, E, F produces Y)
Truth table of LUT 1: ABCD
Truth table of LUT 2: X and (E or F)
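A minimal VHDL sketch of this decomposition, using the hypothetical entity name `cascade6`. The two concurrent assignments correspond to the two truth tables, although the synthesis and mapping tools make the final decision about how the logic is packed into LUTs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Y = ABCDE or ABCDF split across two cascaded LUTs.
-- LUT 1 sees only A, B, C, D; LUT 2 sees X, E, F (3 of its 4 inputs).
entity cascade6 is
  port (a, b, c, d, e, f : in  std_logic;
        y                : out std_logic);
end entity cascade6;

architecture rtl of cascade6 is
  signal x : std_logic;  -- output of the first LUT
begin
  x <= a and b and c and d;   -- LUT 1: truth table of ABCD
  y <= x and (e or f);        -- LUT 2: truth table of X and (E or F)
end architecture rtl;
```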

5 inputs using 2 cascaded LUTs

Y = ABCDE
Y = (ABCD) and E
X = ABCD
Y = X and E
(Figure: LUT 1 with inputs A, B, C, D produces X; LUT 2 with inputs X, E produces Y)
Truth table of LUT 1: ABCD
Truth table of LUT 2: X and E

5 inputs using 2 cascaded LUTs

Y = ABCDE or AB/CDE/
Y = (ABCD) and E
ABCD = X
Y = X and E
(Figure: LUT 1 with inputs A, B, C, D produces X; LUT 2 with inputs X, E produces Y)
Truth table of LUT 1: ABCD
Truth table of LUT 2: X and E

5 inputs using 5 input LUT

Y = ABCD xor E
Z = ABCD
Y = ZE/ or Z/E
(Figure: two 4-input LUTs, each with A, B, C, D on I3, I2, I1, I0, feeding the F5 mux whose output is Y)
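The F5 construction for Y = ABCD xor E can be sketched in VHDL with the two truth tables written out as constants. As an illustrative assumption, E drives the F5 select, so one LUT holds Z = ABCD (used when E = 0) and the other holds Z/ (used when E = 1); entity and signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Y = ABCD xor E as two 4-input LUTs sharing A, B, C, D, combined by
-- an F5-style multiplexer selected by E.
entity xor5_f5 is
  port (a, b, c, d, e : in  std_logic;
        y             : out std_logic);
end entity xor5_f5;

architecture rtl of xor5_f5 is
  -- Truth tables indexed by ABCD (A is the MSB, on I3)
  constant LUT_E0 : std_logic_vector(15 downto 0) := x"8000";  -- Z  = ABCD
  constant LUT_E1 : std_logic_vector(15 downto 0) := x"7FFF";  -- Z/ = not ABCD
  signal addr       : std_logic_vector(3 downto 0);
  signal f_e0, f_e1 : std_logic;
begin
  addr <= a & b & c & d;
  f_e0 <= LUT_E0(to_integer(unsigned(addr)));
  f_e1 <= LUT_E1(to_integer(unsigned(addr)));
  -- F5 mux: E = 0 passes Z (the ZE/ term), E = 1 passes Z/ (the Z/E term)
  y <= f_e1 when e = '1' else f_e0;
end architecture rtl;
```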

Virtex CLB: LUT

LUT and FF can be used separately or together
Four 4-input LUTs
5-input LUT from two 4-input LUTs using the F5 mux
6-input LUT from two 5-input LUTs through the F6 mux
Four 4-input LUTs / two 5-input LUTs / one 6-input LUT
FF: sync/async set-reset, clock enable
Since both set and reset are available, registers can be initialized to any value without extra overhead.

LUT as RAM

(Figure: a 4-input LUT with inputs I3-I0 and output O, plus the LUT RAM write control)
Write

General routing lines can be used to write LUT through the LUT
RAM write control circuit to use LUT as Distributed RAM

Virtex CLB: LUT as distributed RAM

When the LUT is used for logic implementation, it is written while configuring the FPGA.
Write control signals can be connected to routing wires, so that the LUT can be used as RAM when it is not used for logic implementation.
Four 16x1 distributed RAMs per CLB
These can be combined to make various memory sizes and data widths.
Since it is spread across CLBs, it is called distributed RAM.
Since it is spread across the chip, access latency can vary; be careful if you use it without registering the read data.
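A sketch of a 16 x 1 RAM written in the style synthesis tools commonly map onto a LUT used as distributed RAM: synchronous write, asynchronous read, plus a registered copy of the read data following the advice above. The `ram_style` attribute is an assumed Xilinx synthesis hint (other vendors use different attributes) and is ignored by simulators; entity and port names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 16 x 1 RAM: synchronous write, asynchronous read (as a LUT would give),
-- and a registered read output to remove the routing-dependent latency.
entity ram16x1 is
  port (clk    : in  std_logic;
        we     : in  std_logic;
        addr   : in  std_logic_vector(3 downto 0);
        din    : in  std_logic;
        dout   : out std_logic;   -- asynchronous read
        dout_r : out std_logic);  -- read data registered on clk
end entity ram16x1;

architecture rtl of ram16x1 is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');
  -- Assumed Xilinx synthesis hint asking for LUT (distributed) RAM
  attribute ram_style : string;
  attribute ram_style of ram : signal is "distributed";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;   -- write on the clock edge
      end if;
      dout_r <= ram(to_integer(unsigned(addr)));  -- registered read
    end if;
  end process;

  dout <= ram(to_integer(unsigned(addr)));
end architecture rtl;
```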

Carry Chain

Adder
$S_i = A_i \oplus B_i \oplus C_i$
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$

Requires two lookup tables (Si and Ci+1) at each stage.
This, along with routing, makes the adder big and slow.
Hence a dedicated carry chain implements Ci+1 to make the adder faster.

Carry Chain

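(Figure: a LUT with inputs Ai and Bi drives the select of a 2:1 carry mux (inputs 0 and 1) that produces Ci+1 from Ci; the sum output is Si)

$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$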

Source: Xilinx Data Sheets

Carry Chain

For adders, use the + operator so that the carry chains can be used.
For higher-level functions such as counters, synthesis tools infer and use carry chains.
The AND gate combining Ai and Bi shown in the slice diagram is for partial product generation in multipliers.
In some FPGAs, the carry chain can also cascade (AND/OR) the LUT outputs.
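A sketch of an 8-bit adder written with the `+` operator from `ieee.numeric_std`; the extra ninth bit captures the carry out. Whether the carry term is actually placed on the dedicated carry chain is decided by the synthesis tool; the entity name `adder8` is illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 8-bit adder described with "+", leaving the synthesis tool free to map
-- the carry term C(i+1) onto the dedicated carry chain.
entity adder8 is
  port (a, b : in  std_logic_vector(7 downto 0);
        sum  : out std_logic_vector(7 downto 0);
        cout : out std_logic);
end entity adder8;

architecture rtl of adder8 is
  signal s9 : unsigned(8 downto 0);  -- one extra bit for the carry out
begin
  s9   <= ('0' & unsigned(a)) + ('0' & unsigned(b));
  sum  <= std_logic_vector(s9(7 downto 0));
  cout <= s9(8);
end architecture rtl;
```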

Control of Sequential Circuits

(Figure: an FSM/controller drives the enable en (RA_L) of a register/counter; both are clocked by clk)

Clock Gating

(Figure: register D7:0 clocked by RA_E, a clock obtained by gating CLK with RA_L; timing diagram of CLK, RA_L and the gated clock)

Re-circulating Multiplexer

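(Figure: a 2:1 mux with inputs 0 and 1, selected by RA_L, drives D7:0 of a register clocked by CLK; the register output Q recirculates to the mux; timing diagram of CLK and RA_L)

Register write on the clock edge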

Re-circulating Multiplexer

(Figure: the recirculating mux in front of a D flip-flop is equivalent to a flip-flop with clock enable CE)

if (clk'event and clk = '1') then
  if (cntrl_sig = '1') then
    q <= d;  -- load only when enabled; otherwise q keeps its value
  end if;
end if;

Clock Gating for low power

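(Figure: clock gating for low power: register D7:0 with gated clock RA_E, the enable RA_L passed through a flip-flop; timing diagram of CLK, RA_L, CLK1 and CLK2)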

Combinational Circuit Mapping

Comb -> one or more LUTs

Sequential Circuit Mapping

FF -> Comb -> FF
The flip-flops map to one or more FFs; the combinational logic maps to one or more LUTs

Counter, FSM Mapping

NSL -> FF -> OL
NSL and OL map to one or more LUTs; the state register maps to one or more FFs

Virtex IOB

Source: Xilinx Data Sheets

Virtex IOB

Three paths: Output, Input, Tri-state enable


Direct or Through Flip-flops (synchronization)
Flip-Flops: Set/reset, Clock enable, Clock selection
Programmable delay at input to make hold time zero (not an
issue once registered at IOB, as tcq > th)
Programmable pull-up, pull-down, hold (weak keeper), slew rate
PAR tool may move some of the input/output registers to IOB

Virtex IOB

Various IO standards
LVTTL
LVCMOS33, LVCMOS25
LVCMOS18, LVCMOS15, LVCMOS12
PCI33, PCI66

Some I/O standards require a reference voltage for inputs
I/O pins are grouped in banks; each bank supports some of the I/O standards

Weak keeper (Hold)

(Figure: weak keeper on a bus)

The hold circuit holds the previous state of the bus, but with a weak drive, so that other drivers can still drive the bus to 0 or 1.
This avoids unnecessary switching of inputs due to noise, which could happen if the bus were left in high impedance.

Detailed View

(Figure: rows of CLBs with switch blocks (SB) between them)

Virtex Routing

Source: Xilinx Data Sheets

Virtex Routing

Direct connection to adjacent CLB


24 single length lines (per GRM in each direction)
72 buffered Hex lines (per 6th GRM in each direction)
12 buffered long lines (horizontal & vertical)
4 tri-state lines (horizontal & vertical)

Bus Lines

For busing and multiplexing, it is better to use tri-state gates than multiplexers
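A sketch of busing two sources through tri-state drivers onto one internal signal; the names are illustrative. Note that many newer FPGA families have no internal tri-state buffers, and synthesis tools then convert such code into multiplexers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two sources share an internal bus through tri-state drivers: at most
-- one driver is enabled, the other drives 'Z' (high impedance).
entity tbuf_bus is
  port (d0, d1 : in  std_logic_vector(7 downto 0);
        sel    : in  std_logic;
        bus_o  : out std_logic_vector(7 downto 0));
end entity tbuf_bus;

architecture rtl of tbuf_bus is
  signal bus_i : std_logic_vector(7 downto 0);  -- resolved type: several drivers allowed
begin
  bus_i <= d0 when sel = '0' else (others => 'Z');  -- tri-state driver 0
  bus_i <= d1 when sel = '1' else (others => 'Z');  -- tri-state driver 1
  bus_o <= bus_i;
end architecture rtl;
```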

Source: Xilinx Data Sheets

Fitting Example: FSM

FSM with 2 inputs, 3 states and 2 Mealy outputs. How many CLBs does it need?
State variables: 2 flip-flops (3 states)
NSL: 2 state variables + 2 inputs = 4 inputs
OL: 2 inputs + 2 state variables = 4 inputs
2 LUTs for NSL
2 FFs for state variables
2 LUTs for OL
This requires one CLB, with two FFs to spare. In fact, even if the outputs are registered, the design can still be accommodated in one CLB.
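One possible VHDL description of such an FSM is sketched below. The transitions and output conditions are hypothetical, chosen only so that each next-state bit and each Mealy output depends on the two state bits and the two inputs, matching the count above. With an enumerated type, the synthesis tool may pick a different encoding (for example one-hot) than the two-flip-flop encoding assumed in the count.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FSM: inputs x, y; 3 states; Mealy outputs p, q.
entity fsm3 is
  port (clk, rst : in  std_logic;
        x, y     : in  std_logic;
        p, q     : out std_logic);
end entity fsm3;

architecture rtl of fsm3 is
  type state_t is (s0, s1, s2);  -- 3 states, 2 state flip-flops if binary encoded
  signal pr_state, nx_state : state_t;
begin
  -- state register
  process (clk, rst)
  begin
    if rst = '1' then
      pr_state <= s0;
    elsif rising_edge(clk) then
      pr_state <= nx_state;
    end if;
  end process;

  -- next state logic (NSL) and Mealy output logic (OL)
  process (pr_state, x, y)
  begin
    nx_state <= pr_state;
    p <= '0';
    q <= '0';
    case pr_state is
      when s0 =>
        if x = '1' then nx_state <= s1; end if;
      when s1 =>
        if y = '1' then
          nx_state <= s2;
          p <= '1';
        else
          nx_state <= s0;
        end if;
      when s2 =>
        if x = '0' then
          nx_state <= s0;
          q <= '1';
        end if;
    end case;
  end process;
end architecture rtl;
```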

CLBs, FSM

(Figure: NSL, FF and OL mapped onto the CLB)

Source: Xilinx Data Sheets

Fitting Example: Counter

8-bit up counter with parallel load
State variables: 8 flip-flops
Incrementer uses the carry chain
NSL: 1 state variable + load + 1 din = 3 inputs per state variable
NSL requires 8 LUTs
This requires 2 CLBs (4 slices)
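A VHDL sketch of this counter; the port names and the asynchronous reset are assumptions. The `+ 1` is the incrementer that the tools can map onto the carry chain, and the per-bit choice between the incremented value and `din` is the next-state logic counted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 8-bit up counter with parallel load.
entity counter8 is
  port (clk, rst, load : in  std_logic;
        din            : in  std_logic_vector(7 downto 0);
        q              : out std_logic_vector(7 downto 0));
end entity counter8;

architecture rtl of counter8 is
  signal cnt : unsigned(7 downto 0);  -- the 8 state flip-flops
begin
  process (clk, rst)
  begin
    if rst = '1' then
      cnt <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        cnt <= unsigned(din);  -- parallel load
      else
        cnt <= cnt + 1;        -- incrementer (carry chain)
      end if;
    end if;
  end process;
  q <= std_logic_vector(cnt);
end architecture rtl;
```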

CLB, Counter

(Figure: incrementer (+1) feeding the FFs)

Signal Paths in CLB

library ieee;
use ieee.std_logic_1164.all;

entity test is
  port (a, b, c, d, e, f, g, h: in std_logic; z: out std_logic);
end entity test;

architecture arch_test of test is
begin
  process (a, b)
  begin
    if (a = '1') then z <= '0';             -- a: asynchronous reset
    elsif (b'event and b = '1') then        -- b: clock
      if (c = '1') then                     -- c: clock enable
        z <= (d and e and f and g) xor h;   -- 5-input function, registered
      end if;
    end if;
  end process;
end arch_test;

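(Figure: d, e, f, g drive two LUTs whose outputs are combined by the F5 mux selected by h; the result goes to the FF with asynchronous reset a, clock b and clock enable c, whose output is z)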

Virtex DPRAM

Source: Xilinx Data Sheets

Virtex DPRAM

True dual-port memory
Each port can be configured as read/write, read-only or write-only
Synchronous reads and writes
Can be combined for larger widths and depths
Instantiated through the Core Generator tool
Conflict on simultaneous read/write to the same location: the read data could be wrong
Can be initialized in VHDL code
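A sketch of an inferred RAM with initial contents given in the VHDL code. For brevity it is a simple dual-port RAM (one write port, one read port) rather than the full true dual-port memory; names, sizes and initial values are illustrative, and `ram_style` is an assumed Xilinx synthesis hint. On a simultaneous write and read of the same address, this model returns the old data, one concrete case of the read/write conflict noted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 16 x 8 simple dual-port RAM: synchronous write and read, initialized in code.
entity sdp_ram16x8 is
  port (clk   : in  std_logic;
        we    : in  std_logic;
        waddr : in  std_logic_vector(3 downto 0);
        raddr : in  std_logic_vector(3 downto 0);
        din   : in  std_logic_vector(7 downto 0);
        dout  : out std_logic_vector(7 downto 0));
end entity sdp_ram16x8;

architecture rtl of sdp_ram16x8 is
  type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
  -- Initial contents (example values), loaded as part of configuration
  signal ram : ram_t := (0 => x"11", 1 => x"22", others => (others => '0'));
  -- Assumed Xilinx synthesis hint asking for block RAM
  attribute ram_style : string;
  attribute ram_style of ram : signal is "block";
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(waddr))) <= din;
      end if;
      -- Synchronous read; reading the address being written returns old data
      dout <= ram(to_integer(unsigned(raddr)));
    end if;
  end process;
end architecture rtl;
```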

Metastability

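(Figure: D flip-flop with inputs D and CLK and output Q; timing diagram marking ts and th around the active CLK edge and tco from the edge to Q)

ts: Setup time: minimum time the input must be valid before the active clock edge
th: Hold time: minimum time the input must remain valid after the active clock edge
tco: Propagation delay for the input to appear at the output from the active clock edge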

Minimum Clock period

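(Figure: data path from a flip-flop through combinational logic Comb to a second flip-flop, both clocked by clk)

$t_{clk} > t_{co} + t_{comb} + t_{setup}$
$t_{co(min)} + t_{comb(min)} > t_{h(max)}$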

Here we consider the data path from the first flip-flop to the next, and estimate the minimum clock period for the data to be latched properly into the second flip-flop
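A worked example of the two conditions, using assumed delays chosen only for illustration (not data sheet values): tco = 1.0 ns, tcomb = 3.0 ns and tsetup = 0.5 ns for the setup check; tco(min) = 0.4 ns, tcomb(min) = 0.6 ns and th(max) = 0.2 ns for the hold check.

```latex
\begin{aligned}
t_{clk} &> t_{co} + t_{comb} + t_{setup} = 1.0 + 3.0 + 0.5 = 4.5\ \text{ns}
  \;\Rightarrow\; f_{max} = \frac{1}{4.5\ \text{ns}} \approx 222\ \text{MHz}\\
t_{co(min)} + t_{comb(min)} &= 0.4 + 0.6 = 1.0\ \text{ns} > t_{h(max)} = 0.2\ \text{ns} \quad \text{(hold met)}
\end{aligned}
```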

Minimum Clock period

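Sequential Circuit / FSM
(Figure: the inputs and the present state PS feed combinational logic Comb, which produces the outputs and the next state NS; NS drives D of the state register (clock CK, asynchronous reset AR), whose Q is PS)

$t_{clk} > t_{co} + t_{comb} + t_{setup}$
$t_{co(min)} + t_{comb(min)} > t_{h(max)}$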

Clock skew

The previous analysis assumes that the clock reaches all flip-flops at the same time. This is not true in practice, as wire and buffer delays get added.
This creates relative delays between pairs of flip-flops or registers.
For analysis, it is important to consider the clock skew between flip-flops/registers that have a data path between them.
Clock skew:
Difference in the arrival time of the clock at the flip-flops

Max Path and Min Path

(Figure: a clock distributed across the chip, with a min path and a max path)

Clock Skew: Max path

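(Figure: FF1 clocked by CLK1 drives Comb into FF2 clocked by CLK2; CLK2 arrives tskew earlier than CLK1, so tco, tcomb and ts must fit within tclk - tskew, leaving the slack)

$t_{clk} - t_{skew} > t_{co,max} + t_{comb,max} + t_{setup}$
$t_{clk} > t_{co,max} + t_{comb,max} + t_{setup} + t_{skew}$
$\text{slack} = t_{clk} - (t_{co,max} + t_{comb,max} + t_{setup} + t_{skew})$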

Clock Skew: Max path

Analysis for the data path from the first flip-flop to the next
We assume tco + tcomb is greater than the hold time of the flip-flop
Hence, when a clock edge reaches both flip-flops, the new data from the first flip-flop arrives at the second flip-flop after the clock edge, even after the hold time, and won't get latched in the second flip-flop
But we choose the clock period such that, when the next clock edge reaches the second flip-flop, the data launched from the first flip-flop by the current clock edge gets latched in the second flip-flop

Clock Skew: Max path

Since the clock to the second flip-flop is skewed, i.e. it arrives early compared to the first, the clock period has to accommodate this skew, requiring a larger clock period than if there were no skew
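Continuing with illustrative assumed delays (tco,max = 1.0 ns, tcomb,max = 3.0 ns, tsetup = 0.5 ns), plus an assumed skew tskew = 0.3 ns and a target period of 5 ns, the max path condition and the slack work out as follows; without skew the same path would need only 4.5 ns.

```latex
\begin{aligned}
t_{clk} &> t_{co,max} + t_{comb,max} + t_{setup} + t_{skew} = 1.0 + 3.0 + 0.5 + 0.3 = 4.8\ \text{ns}\\
\text{slack} &= t_{clk} - (t_{co,max} + t_{comb,max} + t_{setup} + t_{skew}) = 5.0 - 4.8 = 0.2\ \text{ns}
\end{aligned}
```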

Clock Skew: Min path

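(Figure: FF1 clocked by CLK1 drives Comb into FF2 clocked by CLK2; CLK2 arrives tskew later than CLK1; the same edge and the next edge at FF2 are marked, along with tco, tcomb and th)

Same edge: $t_{co,min} + t_{comb,min} > t_{skew,max} + t_{hold}$
Next edge: $t_{clk} > t_{co} + t_{comb} + t_{setup} - t_{skew}$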

Clock Skew: Min path

Here, an analysis like the max path case (i.e. from one clock edge at the first flip-flop to the next clock edge at the second flip-flop) would result in a smaller clock period, as the clock edge arrives late at the second flip-flop
But now the real danger is that the data launched from the first flip-flop by the current edge appears within the hold-time window of the current edge at the second flip-flop
If that happens, the only solutions are to add extra delay to the data path between these flip-flops, or to route the clock in the opposite direction
In practice, this can happen in shift registers, as there may be no combinational delay between the flip-flops
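A worked hold check for a shift register, with assumed illustrative values tco,min = 0.4 ns, tcomb,min = 0 (no logic between the flip-flops), tskew,max = 0.5 ns and thold = 0.1 ns. The condition fails by 0.2 ns, so more than 0.2 ns of extra data-path delay (or reversing the clock routing direction) would be needed.

```latex
\begin{aligned}
t_{co,min} + t_{comb,min} &= 0.4 + 0 = 0.4\ \text{ns}\\
t_{skew,max} + t_{hold} &= 0.5 + 0.1 = 0.6\ \text{ns}\\
0.4\ \text{ns} &\not> 0.6\ \text{ns} \;\Rightarrow\; \text{hold violation by } 0.2\ \text{ns}
\end{aligned}
```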

Clock routing

Requirement
Minimum relative delay between any 2 flip-flops, at least between flip-flops with a data path between them
Solution
Balance the number of buffers and approximately equalize the wire length from the clock input to the flip-flops
H Clock Tree

Virtex Clock Tree

Source: Xilinx Data Sheets

DLL

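(Figure: DLL with clock input CLKI (CLKIN), clock output CLKO (CLKOUT) and feedback input CLKFB; timing diagram of CLKIN and CLKOUT showing tskew and tadd)

The DLL delays CLKOUT by tadd so that the clock edges of CLKIN and CLKOUT match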

DLL / PLL

In a DLL, the input clock is delayed for de-skew
In a PLL, a VCO synthesizes a clock synchronous to the input clock
A DLL adjusts the phase of the input clock.
A PLL synthesizes a clock with the same phase and frequency as the input clock.
A PLL works only over a limited range of frequencies, but in FPGAs the clock frequency does not change in most cases.
A PLL also cleans up input jitter.
Xilinx Virtex-5 has PLL blocks in addition to the DLL in the DCM.

Current FPGAs

PLL
Digital Clock Manager (DCM)
DLL for de-skewing
Phase shifter
Frequency multiplication / division
Clock Buffers, Muxes (Glitchless)
All these can be connected in clock path
Clock pins, Clock tree

Special Resources Usage

Resources
Buffers
DLL / PLL
Block RAMs
DSP Blocks
Usage
Vendor library components
Inferred by synthesis tool, when possible
VHDL attributes with code

Virtex Configuration

JTAG: Prototyping (PC Board)


Master Serial:
Configuring from a Serial PROM
Embedded boards
Slave Serial
Works as a slave to master FPGA connected to a serial PROM
SelectMAP
8/16-bit wide synchronous slave configuration of the FPGA
Suitable when the FPGA interfaces to a CPU

Virtex Configuration: Serial PROM

Source: Xilinx Data Sheets

Serial Configuration

Multiple FPGAs are configured from a single serial (Flash) PROM.
The master FPGA supplies the clock to the PROM and the slave FPGAs
Master and slave FPGAs are daisy-chained.
After power-on or after a PROGRAM request, the configuration memory of all FPGAs is cleared.
Init-phase synchronization is done through the INIT I/O pin
The master FPGA is programmed first, sending out 1s on DOUT while the slave FPGAs wait.
Once the master FPGA is configured, it passes on the configuration stream for the first slave, and so on.
DONE synchronization is done through the open-drain DONE outputs, which form a wired-AND

SelectMAP Scheme

Source: Xilinx Data Sheets

SelectMAP Configuration: Timing

Source: Xilinx Data Sheets

FPGA Controls while configuring

While the FPGA is being configured, its internal state and pin levels are undefined.
Xilinx FPGAs have two internal signals to keep the FPGA state sane during and after configuration.
GTS: drives all FPGA outputs to tri-state
GSR: drives the set/reset of all flip-flops and holds each flip-flop in its specified reset state (set or reset)
Once the FPGA is configured, these signals are released.
Use separate user resets for normal reset operation.

Spartan 6: Configuration

Boundary Scan / JTAG / TAP / IEEE 1149.1


Single Device, Chain
Master Serial (Chain, Ganged) (SPI: x1, X2, X4)
Slave Serial (SPI: x1, X2, X4)
Master SelectMAP (x8, x16)
Single Device, Chain, Ganged
Slave SelectMAP (x8, x16)

Spartan 6: Bit Stream encryption

The bit stream is AES encrypted with a 256-bit key using the BitGen tool
The encryption key is programmed into the FPGA device through JTAG for decryption.
Once programmed, the FPGA can be configured to disallow readback,
so the configuration cannot be read back either.
The AES key can be permanently fused in the FPGA, or stored in an SRAM with external battery backup

Spartan 6: Bit Stream compression

The bit stream can be compressed when a lot of resources are unused
Less memory for storage
Less configuration time

Spartan 6: Multi Boot

Multiple Configuration Images in Program Flash


At least, one Main configuration and one fallback/golden
configuration
During configuration, if a CRC error in the bit stream occurs or sync word detection times out (WDT), configuration falls back to the fallback configuration
Supported in SPI (x1, x2, x4) and BPI Modes

Spartan 6: DSP Slices

Slices to support DSP computations


18 bit 2s complement pre-adder
18 x 18 bit Multiplier, 36 bit result
Result is sign extended to 48 bit
48 bit 2s complement adder/subtracter
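A sketch of a multiply-accumulate written so that a synthesis tool can map it onto one DSP slice: a signed 18 x 18 multiply giving a 36-bit product, sign-extended with `resize` to 48 bits and added into a 48-bit accumulator. It does not use the pre-adder; entity and port names are illustrative, and whether the logic is actually packed into a DSP48A1 depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- 18 x 18 signed multiply, product sign-extended to 48 bits and accumulated.
entity mac18 is
  port (clk, clr : in  std_logic;
        a, b     : in  signed(17 downto 0);
        acc      : out signed(47 downto 0));
end entity mac18;

architecture rtl of mac18 is
  signal acc_r : signed(47 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clr = '1' then
        acc_r <= (others => '0');
      else
        acc_r <= acc_r + resize(a * b, 48);  -- 36-bit product, sign-extended
      end if;
    end if;
  end process;
  acc <= acc_r;
end architecture rtl;
```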

Spartan 6: DSP48A1 Slice

Source: Xilinx Data Sheets

Debug: Internal Signal Probing

Probing the internal signals in FPGA for debug.


Signal Probe / Logic Analysis
Use a Signal Capture IP
Interface this IP to the JTAG port
PC based software to configure signal capture IP
and display the signal waveforms
Xilinx: ChipScope Pro
Altera: Signal Probe

Xilinx ChipScope Pro

Source: Xilinx Data Sheets

Virtex Pins

Source: Xilinx Data Sheets

Virtex Pins

Source: Xilinx Data Sheets

One hot encoding

(Figure: two FSM structures: next state logic, state register (D/Q, clock CK, asynchronous reset AR) and output logic; and a single logic block producing both NS and the outputs)

$t_{clk} > t_{co} + t_{logic} + t_{setup}$

One hot encoding

e.g. FSM with 5 inputs, 18 states and 6 outputs
NSL: 5 state variables + 5 inputs = 10 inputs (worst case)
For Virtex (worst case)
Basic block: 4-input LUT
1 CLB: 6-input LUT
16 CLBs for a 10-input LUT
The NSL would be spread over many CLBs, increasing the delay and bringing down the clock frequency of the FSM.
Solution: one-hot encoding, where each state is encoded using its own flip-flop.

One hot encoding

(Figure: state Si goes to state Sj on condi; Sj stays in Sj on condj)

$D_j = cond_i \cdot Q_i + cond_j \cdot Q_j$
NSL: 5 + 2 inputs (worst case)

One-hot encoding Output logic

Most Moore outputs are a direct decode of one state or a decode of more than one state
If output is a decode of a single state, then that state flip-flop
output is the output signal
In case of multiple states produce an output, the output
signal is the logical OR of all those state flip-flops
Thus, one-hot encoding reduces the output logic also, at the
cost of extra state flip-flops
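A sketch of one-hot next-state and output logic written directly as equations, for a hypothetical three-state machine S0 -> S1 (on `go`) -> S2 (on `done`) -> S0. Each D input has the form Dj = condi . Qi + condj . Qj, the Moore output `busy` (active in S1 and S2) is just the OR of those two state flip-flops, and the reset value follows the one-hot-one style. All names are illustrative.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One-hot FSM: one flip-flop per state, q(0) = S0, q(1) = S1, q(2) = S2.
entity onehot3 is
  port (clk, rst : in  std_logic;
        go, done : in  std_logic;
        state    : out std_logic_vector(2 downto 0);
        busy     : out std_logic);
end entity onehot3;

architecture rtl of onehot3 is
  signal q, d : std_logic_vector(2 downto 0);
begin
  -- next state logic: one small equation per state flip-flop
  d(0) <= (q(0) and not go) or q(2);             -- stay in S0, or S2 -> S0
  d(1) <= (q(0) and go) or (q(1) and not done);  -- S0 -> S1, or stay in S1
  d(2) <= q(1) and done;                         -- S1 -> S2

  process (clk, rst)
  begin
    if rst = '1' then
      q <= "001";  -- one-hot one: only the S0 flip-flop is set
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;

  state <= q;
  busy  <= q(1) or q(2);  -- output decoded from two states: OR of their FFs
end architecture rtl;
```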

One hot encoding

State encoding
Sequential, gray, one-hot-one, one-hot-zero

User defined attributes (state encoding)


attribute state_encoding of type_name: type is value;
(value: sequential, gray, one_hot_one, one_hot_zero)
attribute state_encoding of statetype: type is gray;
attribute enum_encoding of type_name: type is string;
attribute enum_encoding of statetype: type is "00 01 11 10";

One-hot one, One-hot zero

One-hot one    One-hot zero (almost one-hot)
00001          0000
00010          0001
00100          0010
01000          0100
10000          1000
Easy to initialize (reset all flip-flops)
Starting state is never revisited

One hot encoding

Explicit declaration of states


signal pr_state, nx_state: std_logic_vector(3 downto 0);
constant a: std_logic_vector(3 downto 0) := "0001";
constant b: std_logic_vector(3 downto 0) := "0010";
constant c: std_logic_vector(3 downto 0) := "0100";
constant d: std_logic_vector(3 downto 0) := "1000";

Altera Stratix

Two levels of interconnections


SRAM based programmable connections
Logic Array Block (10 LEs)
LUT as combinational Logic
Flip-Flops with sync/async reset/preset
RAM Block (SPRAM, DPRAM, FIFO)
Low skew clock trees, PLL
Carry, Cascade chains
DSP Blocks (Multipliers, Shift Registers)
I/O Blocks (Registered / Non-registered)
Multiple I/O standards
JTAG, Parallel, and Serial Configurations

Altera Stratix

Altera Stratix

Source: Altera Data Sheets

Actel 54SX-A

Antifuse based programmable interconnections


Simple Combinational and Registered cells
Simple I/O Blocks
Low skew Clock trees
Multiple I/O standards
Hardware probe pins

Actel 54SX-A, C Cell

Source: Actel Data Sheets

Actel 54SX-A, R Cell

Source: Actel Data Sheets

Actel 54SX-A

Source: Actel Data Sheets

Actel 54SX-A Routing

Source: Actel Data Sheets

Actel 54SX-A Probe

Source: Actel Data Sheets

Actel ProASIC Plus

Source: Actel Data Sheets

ProASIC Plus, Logic Tile

Source: Actel Data Sheets

Latch / FF
(Figure: a latch built from a 2:1 mux selected by clk, with D on input 1 and Q fed back to input 0; a flip-flop built from two cascaded D latches clocked by CLK)

ProASIC Plus Routing

Fast Connect
Short Lines (1, 2, 4), Long Lines
Clock Tree
Pad Ring (Pin Locking)
SRAM Blocks
Programming Tech: Flash
Non-volatile

CPLD vs FPGA

Features                  CPLD     FPGA
Logic                     AND-OR   Mux / LUT / Gates
Register to logic ratio   Small    Large
Timing                    Simple   Complex
Architecture variation    Small    Large
Programming technology    Flash    SRAM, Anti-fuse, Flash
Capacity                  10 K     2 M LUT + RAM

Static Timing Analysis (STA)

Timing simulation: simulates the real-time operation of the circuit, with timing models of the blocks, for the specified test vectors
Time consuming for exhaustive simulation
Static timing analysis analyzes the various path delays from block and wire delays
It can make mistakes, as it is not aware of the real-time behavior of the circuit (inputs, FSM/controller behavior)
A path that is never used in circuit operation may be reported (false paths)
Registers that are not enabled every clock cycle may be reported (multi-cycle paths)

STA: Sequential Circuit

(Figure: Input -> FF -> Comb -> FF -> Output, all clocked by CLK; the three marked paths are setup to clock (input to the first FF), register to register (clock to setup), and clock to output)

The register-to-register path decides the clock frequency. But if either of the other two exceeds it, one needs to choose the maximum value as the minimum clock period.
In real life this is often not a great concern: many a time we are designing IPs that go inside the chip and interface to other blocks close by. Even when inputs and outputs are brought to external pins, proper placement should take care of these delays.

Static Timing Analysis: Sequential Circuit

Clock to Setup: Register to register path with longest delay


Clock to Setup on destination clock <clk_signal>
Clock to Pad: FF output delay - from FF output to chip
output pin
Clock <clk_signal> to Pad
Setup to Clock: Setup / Hold time of FF with respect to
input pin/pad
Setup/Hold to clock <clk_signal>

Static Timing Analysis

Take the maximum of the three to find the maximum clock frequency for timing simulation
But the actual throughput is given by Clock to Setup (the register-to-register path with the longest delay)
In most cases, the Clock to Pad of a module is not of consequence, as these outputs, when used in the top-level module, go as inputs to a nearby module.

False Paths

Improbable Paths
Static Paths (e.g. Input Registers)
Paths between clock domains

Multi-cycle path

(Figure: FF with clock enable CE1 -> Comb -> FF with clock enable CE2, both clocked by clk)

Clock enable CE2 comes 3 clock cycles after CE1
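Because CE2 is asserted 3 clock cycles after CE1, the data launched by the first flip-flop has three clock periods to reach the second one. With assumed illustrative values tclk = 5 ns, tco = 1.0 ns and tsetup = 0.5 ns (skew ignored), the allowed combinational delay becomes 13.5 ns, compared with 5 - 1.0 - 0.5 = 3.5 ns if STA treats the path as single-cycle, which is why such paths are declared as multi-cycle paths in the constraints.

```latex
\begin{aligned}
t_{co} + t_{comb} + t_{setup} &< 3\,t_{clk}\\
t_{comb} &< 3 \times 5 - 1.0 - 0.5 = 13.5\ \text{ns}
\end{aligned}
```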

Critical Path

(Figure: FF1 with clock enable CE1 -> combinational blocks C1 and C2 -> FF2 with clock enable CE2, both clocked by clk)

Critical path delay = tCO + tC1 + tC2 + tS

Constraint driven PAR

Constraint editor
I/O constraints
I/O locations
I/O standards (LVTTL, PCI66-3, LVDS ..)
Drive strength (current)
Slew rate
I/O termination (pull up, pull down, hold)
Input delay

Timing constraints

Global
Clock period, pad to setup, clock to pad
Per port
pad to setup, clock to pad
Per group (by net and clock)
Pad to setup, Clock to pad
FROM TO, FROM THRU TO

False Paths
Multi-cycle paths
