FPGA Design Flow
Bharathwaj Muthuswamy
muthuswamy@msoe.edu
EE3921 Fall Quarter (Oct. 2009)
References:
1. Digital Systems Design Using VHDL, Chapter 6 (your book)
2. Dr. Sheila Ross’ EE3921 Slides
3. UC Berkeley CS150 Spring 2009 Prof. Wawrzynek slides
4. Altera FLEX10K datasheet: http://www.altera.com/literature/ds/dsf10k.pdf
Administrivia
• Feedback on summaries
– Related note: project proposals and project
reports
• Schedule for the rest of the quarter
(reminder)
EE3921 Design Methodology
Steps in Design Flow
To take a design described in VHDL and
implement it on the FPGA board, the compiler
performs the following steps:
1. Synthesis
2. Optimization
3. Mapping
4. Placement and Routing
Logic Synthesis
• VHDL (and Verilog, etc.) started out as simulation
languages, but people quickly wrote programs to
automatically convert VHDL “code” into low‐level
circuit descriptions (netlists).
• Synthesis converts VHDL descriptions to
implementation technology specific primitives:
– FPGAs: LUTs, flip‐flops and RAM blocks
– ASICs: standard cell gate and flip‐flop libraries;
memory blocks
More on Logic Synthesis
• Logic operators map into primitive logic gates
• Arithmetic operators map into adders,
subtractors, …
• Relational operators generate comparators
• Conditional expressions generate logic or a
MUX
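As a rough illustration of the mappings listed above, the sketch below uses one operator of each kind. The entity name, port names, and 8‐bit width are assumptions made for this example, and the hardware noted in each comment is what a synthesizer would typically produce, not a guarantee.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: one output per operator category.
entity op_map is
  port (
    x, y : in  unsigned(7 downto 0);
    s    : in  std_logic;
    g    : out unsigned(7 downto 0);  -- result of a logic operator
    sum  : out unsigned(7 downto 0);  -- result of an arithmetic operator
    lt   : out std_logic;             -- result of a relational operator
    m    : out unsigned(7 downto 0)   -- result of a conditional expression
  );
end entity op_map;

architecture dataflow of op_map is
begin
  g   <= x and y;                   -- logic operator: eight AND gates
  sum <= x + y;                     -- arithmetic operator: 8-bit adder (carry out dropped)
  lt  <= '1' when x < y else '0';   -- relational operator: magnitude comparator
  m   <= x when s = '1' else y;     -- conditional expression: 2-to-1 MUX per bit
end architecture dataflow;
```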
Why synthesis?
• Automatically manages many details of the
design process:
– Fewer bugs
– Improved productivity
• Abstracts design data (HDL description) from
any particular implementation technology
• In many cases, leads to a more optimal design
than could be achieved by manual means.
• BE CAREFUL WITH SYNTHESIS:
– SYNTHESIZED HARDWARE MAY NOT MEET THE
DESIGN SPECIFICATION!
Example 1: 4-1 MUX
• Structural
• Selected Signal Assignment
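The code from the original slide is not part of this text. As a minimal sketch, a 4‐to‐1 MUX written with a selected signal assignment could look like the following; the entity and port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and port names (not taken from the slides).
entity mux4to1 is
  port (
    i0, i1, i2, i3 : in  std_logic;                     -- data inputs
    sel            : in  std_logic_vector(1 downto 0);  -- select lines
    f              : out std_logic                      -- selected output
  );
end entity mux4to1;

architecture selected_assign of mux4to1 is
begin
  -- Selected signal assignment: exactly one choice matches sel.
  with sel select
    f <= i0 when "00",
         i1 when "01",
         i2 when "10",
         i3 when others;  -- "11", plus simulation metavalues such as 'X' or 'U'
end architecture selected_assign;
```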
Example 2: case Statement
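The slide's listing is missing from this text, so the following is only a sketch of how the same 4‐to‐1 MUX might be described with a case statement inside a process. The entity and port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity and port names.
entity mux4to1_case is
  port (
    i0, i1, i2, i3 : in  std_logic;
    sel            : in  std_logic_vector(1 downto 0);
    f              : out std_logic
  );
end entity mux4to1_case;

architecture behavioral of mux4to1_case is
begin
  -- Every signal read in the process is in the sensitivity list, and
  -- every branch assigns f, so the result is purely combinational.
  process (i0, i1, i2, i3, sel)
  begin
    case sel is
      when "00"   => f <= i0;
      when "01"   => f <= i1;
      when "10"   => f <= i2;
      when others => f <= i3;
    end case;
  end process;
end architecture behavioral;
```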
Example 3: if Statement
Example: if Statement
This MUX implementation
works because the test
conditions are simple:
if A = ‘1’ then …
elsif B = ‘0’ then …
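The branch bodies are elided on the slide. To show the overall shape only, here is a sketch with hypothetical data inputs x, y, z and output f; the tests on A and B are the ones quoted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity if_mux is
  port (
    A, B    : in  std_logic;  -- test signals, named as on the slide
    x, y, z : in  std_logic;  -- hypothetical data inputs
    f       : out std_logic   -- hypothetical output
  );
end entity if_mux;

architecture behavioral of if_mux is
begin
  process (A, B, x, y, z)
  begin
    if A = '1' then        -- tested first, so it has priority
      f <= x;
    elsif B = '0' then     -- reached only when A is not '1'
      f <= y;
    else                   -- final else: f is assigned on every path
      f <= z;
    end if;
  end process;
end architecture behavioral;
```

Because each test compares a single bit with a constant, the conditions can drive MUX select logic directly, with no comparator needed.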
Example: if Statement
• More complicated condition tests, such as
“less than” or “not equal” would require
comparator hardware.
Inferred Memory
• Signals inside a process can change value when the
sensitivity list is activated and the process runs.
• If the process has run through, and a signal has
not been assigned a new value during the run, it
must retain its previous value.
• The compiler then “infers” that the circuit has
memory. The type of memory (flip‐flop, RAM, etc.)
chosen depends on the way the process is written.
• For example, if a signal changes value only on a
rising clock edge, the compiler will typically infer
an edge‐triggered flip‐flop rather than a
level‐sensitive latch.
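To make the contrast concrete, the sketch below puts two processes side by side: one that leaves its output unassigned on some paths, and one that assigns only on a clock edge. All names are assumptions, and the inferred element noted in each comment is the usual outcome, which can vary by tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity used only to contrast the two inference cases.
entity latch_vs_ff is
  port (
    clk, en, d : in  std_logic;
    q_latch    : out std_logic;
    q_ff       : out std_logic
  );
end entity latch_vs_ff;

architecture behavioral of latch_vs_ff is
begin
  -- No else branch: when en = '0', q_latch is not assigned and must
  -- hold its value, so a level-sensitive latch is typically inferred.
  process (en, d)
  begin
    if en = '1' then
      q_latch <= d;
    end if;
  end process;

  -- Assignment happens only on the rising clock edge, so an
  -- edge-triggered flip-flop is typically inferred.
  process (clk)
  begin
    if rising_edge(clk) then
      q_ff <= d;
    end if;
  end process;
end architecture behavioral;
```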
Flip‐Flop vs. RAM
• One can read the value of a flip‐flop anytime. It
allows asynchronous read.
• One can only write to a flip‐flop at a clock edge.
It only allows synchronous write.
• RAM is set up so that reads and writes are
synchronous.
– Bit lines (output lines) usually need to be charged to
halfway between logic 1 and logic 0 before
attempting a read. Thus, timing is necessary.
Inferred Flip‐Flop
The line which writes input data to
memory is inside the if statement. Write
can occur only on positive edge.
The data is always available for reading.
Output is hardwired to data (outside
process).
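The listing these annotations refer to did not survive in this text. A minimal sketch consistent with them might look like this; the entity name, port names, and 8‐bit width are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register matching the annotations.
entity reg8 is
  port (
    clk   : in  std_logic;
    d_in  : in  std_logic_vector(7 downto 0);
    d_out : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
  signal data : std_logic_vector(7 downto 0);  -- the stored value
begin
  process (clk)
  begin
    if rising_edge(clk) then
      data <= d_in;   -- write: inside the if, so only on the positive edge
    end if;
  end process;

  d_out <= data;      -- read: outside the process, always available
end architecture behavioral;
```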
Inferred RAM
Both the read and write sections are
inside the if statement, and thus can occur
only on a clock edge.
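The original listing is not included in this text. The sketch below shows one common way to write a memory whose read and write both sit inside the clocked if statement. The names, the 16‐word depth, and the 8‐bit width are assumptions, and whether a given tool maps this onto a RAM block depends on the tool and the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16 x 8 synchronous memory.
entity sync_ram is
  port (
    clk      : in  std_logic;
    we       : in  std_logic;                     -- write enable
    addr     : in  unsigned(3 downto 0);          -- 16 words (assumed depth)
    data_in  : in  std_logic_vector(7 downto 0);  -- assumed width
    data_out : out std_logic_vector(7 downto 0)
  );
end entity sync_ram;

architecture behavioral of sync_ram is
  type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal ram : ram_t;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= data_in;   -- synchronous write
      end if;
      data_out <= ram(to_integer(addr));    -- synchronous read (registered output)
    end if;
  end process;
end architecture behavioral;
```

With this style, a read at the address being written in the same cycle returns the old contents, because the signal assignment to ram takes effect after the process suspends.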
Optimization
• In EE 2901, you learned about design
tradeoffs:
– Ripple‐carry adder vs. carry lookahead adder
– K‐map minimized sum of products or
multistage/xor implementation
– One‐hot, straight binary, or minimal Hamming
distance encoding of state machines
• The netlist can be optimized to meet a user’s
goal of minimum power, area, or delay
State Assignment Optimization
• For every flip‐flop in our FPGA, we have a
small LUT to implement a logic function.
• One‐hot state assignment uses more flip‐
flops, but the logic equations are simpler, so
we may end up using fewer logic blocks
• Microprogramming typically uses straight
binary state assignment
• Other state assignment heuristics exist which
may use fewer flip‐flops than one‐hot and smaller
logic than straight binary (Chapter 1)
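To illustrate the flip‐flop count tradeoff, here is a sketch of the two encodings for a hypothetical 4‐state machine (states S0 to S3). The package and constant names are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: state codes for a 4-state machine.
package state_codes is
  -- Straight binary: 2 flip-flops; detecting a state decodes both bits.
  constant BIN_S0 : std_logic_vector(1 downto 0) := "00";
  constant BIN_S1 : std_logic_vector(1 downto 0) := "01";
  constant BIN_S2 : std_logic_vector(1 downto 0) := "10";
  constant BIN_S3 : std_logic_vector(1 downto 0) := "11";

  -- One-hot: 4 flip-flops; "in state Sk" is a test of the single bit k.
  constant HOT_S0 : std_logic_vector(3 downto 0) := "0001";
  constant HOT_S1 : std_logic_vector(3 downto 0) := "0010";
  constant HOT_S2 : std_logic_vector(3 downto 0) := "0100";
  constant HOT_S3 : std_logic_vector(3 downto 0) := "1000";
end package state_codes;
```

Many synthesis tools can also choose the encoding automatically when the states are declared as an enumerated type; how that choice is controlled is tool‐specific.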
Mapping
• The mapping step is where we move from
hardware‐independent primitives to a circuit
that will fit the particular board
• The FLEX10K has 4‐input LUTs
• Shannon’s theorem is used to break up large
logic functions
• Carry/Cascade Chains and other dedicated
hardware are used to save on logic blocks
Shannon’s Theorem
• Shannon’s theorem allows us to break down
Boolean functions of many variables.
• Suppose we wanted to implement the
following 6‐variable function on the FLEX10K:
Z = abcd′ef′ + a′b′c′def′ + b′cde′f

• The FLEX10K has only 4‐input LUTs.
• We need to rewrite Z using a combination of
4‐variable functions.
Shannon’s Theorem
• We can write Z like this
Z = a′Z0 + aZ1
where Z0 and Z1 are functions of b, c, d, e, f.
• To obtain Z0, substitute 0 for a in your original
equation. Simplify, and that is Z0.
• To obtain Z1, substitute 1 for a and simplify.
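For reference, the general form of the expansion can be written as follows. This is the standard statement of Shannon's expansion, not text from the slides; the names f and x_1 through x_n are generic, and Z_0, Z_1 correspond to the slides' Z0 and Z1.

```latex
% Shannon expansion about x_1 (x' denotes the complement of x)
\[
f(x_1, x_2, \dots, x_n) = x_1' \, f(0, x_2, \dots, x_n) + x_1 \, f(1, x_2, \dots, x_n)
\]

% Applied to Z with x_1 = a
\[
Z_0 = Z\big|_{a=0}, \qquad Z_1 = Z\big|_{a=1}, \qquad Z = a' Z_0 + a Z_1
\]
```

Each application removes one variable from the sub‐functions, so repeating it eventually yields pieces small enough for a 4‐input LUT.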
Shannon’s Theorem
Z0 = 0bcd′ef′ + 1b′c′def′ + b′cde′f
Z0 = b′c′def′ + b′cde′f
Z1 = 1bcd′ef′ + 0b′c′def′ + b′cde′f
Z1 = bcd′ef′ + b′cde′f
Shannon’s Theorem
• We can look at Z0 and Z1 as “sub‐functions”.
• Z is now a 3‐input function, which uses two 5‐
input sub‐functions.
Z = a′Z0 + aZ1
• Since our FLEX10K can only handle 4 inputs,
we must break down the sub‐functions.
Breaking Down Z0
• To obtain Z00, substitute 0 for b in Z0; to obtain
Z01, substitute 1 for b:
Z00 = 1c′def′ + 1cde′f
Z00 = c′def′ + cde′f
Z01 = 0c′def′ + 0cde′f
Z01 = 0
Breaking Down Z1
Z10 = 0cd′ef′ + 1cde′f
Z10 = cde′f
Z11 = 1cd′ef′ + 0cde′f
Z11 = cd′ef′
Implementation in 4‐input LUT’s
• We then implement each of the following
functions using a 4‐input LUT:
Z = a′Z0 + aZ1
Z0 = b′Z00 + bZ01
Z1 = b′Z10 + bZ11
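The decomposition can be written out directly in VHDL, with one signal per sub‐function so that each assignment depends on at most four signals. This is only a sketch: the entity and architecture names are assumptions, and a real synthesizer is free to restructure the logic during optimization and mapping.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for the 6-variable function Z.
entity shannon_z is
  port (
    a, b, c, d, e, f : in  std_logic;
    z                : out std_logic
  );
end entity shannon_z;

architecture lut4 of shannon_z is
  signal z00, z10, z11 : std_logic;  -- functions of c, d, e, f only
  signal z0, z1        : std_logic;  -- functions of b and the signals above
begin
  -- Leaf sub-functions: four inputs each.
  z00 <= (not c and d and e and not f) or (c and d and not e and f);
  z10 <= c and d and not e and f;
  z11 <= c and not d and e and not f;

  -- Recombine about b. Z01 = 0, so the b*Z01 term drops out of Z0.
  z0  <= not b and z00;
  z1  <= (not b and z10) or (b and z11);

  -- Recombine about a: three inputs (a, z0, z1).
  z   <= (not a and z0) or (a and z1);
end architecture lut4;
```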

Carry Chain
• To save logic blocks and common routing lines,
many FPGAs include dedicated carry hardware.
Cascade Chain
• Since sum‐of‐products are common, cascade chains
are included to OR together many logic blocks.
• Fewer logic blocks and routing lines are used up
for Boolean function implementation.
Place and Route
• In the final step, the compiler decides how the
logic blocks will be arranged (place) and sets
the interconnections (route)
• Finding the lowest “cost” solution involves
non‐convex optimization
– Hard to do: there are local minima, often O(2^n)
– Greedy algorithm: Accept only “better” moves
• Can get stuck in local minimum
– Simulated annealing: In the beginning, accept
some “bad” moves, then fewer as time goes on
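One common way to formalize "accept some bad moves, then fewer as time goes on" is the Metropolis acceptance rule. The slides do not say which rule the compiler uses, so treat this as an assumed, textbook form; the symbols ΔC (change in cost caused by a move) and T (a "temperature" lowered as iterations proceed) are introduced here for illustration.

```latex
\[
P(\text{accept move}) =
\begin{cases}
1, & \Delta C \le 0 \quad \text{(cost does not increase)} \\
e^{-\Delta C / T}, & \Delta C > 0 \quad \text{(a ``bad'' move)}
\end{cases}
\]
```

Early on, T is large and bad moves are accepted often, which helps the search escape local minima. As T approaches 0, the probability of accepting a bad move approaches 0, and the procedure behaves like the greedy algorithm.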
Routing Optimization
Greedy Algorithm
[Figure: greedy algorithm, iteration 43]
