
Mentor Graphics Tutorial: Schematic Capture, Simulation, & Placement/Routing

1.0 Introduction
This tutorial demonstrates a simple VLSI circuit design process, from concept to chip
layout, for an 8-bit Modified Booth Multiplier on a 0.5 µm process using software from
Mentor Graphics Corp. The topics covered in this tutorial include schematic capture &
design, simulation, and placement & routing.
2.0 Schematic Capture & Design
The specification for the design calls for an 8-bit unsigned multiplier that accepts two
8-bit unsigned inputs and produces a 16-bit unsigned output as a result. Three additional
control signals are required: DONE, START, and CLOCK. The
implementation presented in this tutorial meets these baseline requirements and adds
a RESET control signal. In addition, the multiplier accepts signed inputs and
therefore performs signed multiplication. We use Booth's Modified Algorithm as the
underlying architecture for the design because it produces a result quickly and
reliably. To clarify the derivation of the design, a brief description
of Booth's Modified Algorithm follows.
2.1 Booth's Modified Algorithm
On average, Booth's Modified Algorithm produces results in approximately half the
time of the traditional add-and-shift multiplier. This is because it examines
strings of three bits at a time, with a one-bit overlap between successive
strings, in order to decide what to do next. Since one bit of each triplet
overlaps with the previous triplet, two new bits are effectively considered
during each clock cycle. In the case of an 8-bit multiplier, this means that calculations
can optimally be performed in 4 clock cycles, excluding additional control states. The
following is a more formal definition of the Modified Booth Algorithm. Let x be the
multiplier and m be the multiplicand. Let two bits of x plus the last bit from the previous
two bits represent the triplet xL. Assume x and m are n-bit signed binary numbers. The
triplet xL can be represented by the following vector:
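$$x_L = \left( x_{2y+1},\; x_{2y},\; x_{2y-1} \right) \tag{1}$$

where $y = 0, 1, 2, \ldots, \frac{n}{2} - 1$; $x_{2y+1}$ is the first bit of the triplet, $x_{2y}$ is the second bit of the triplet,
and $x_{2y-1}$ is the overlapped bit from the previous triplet. Letting $x_i$ be the $i$th bit of $x$ and
letting $x_{-1} = 0$, the two's complement value of $x$ can be written as

$$x = -x_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i \tag{2}$$

$$\phantom{x} = \sum_{y=0}^{\frac{n}{2}-1} 2^{2y} \left( -2x_{2y+1} + x_{2y} + x_{2y-1} \right) \tag{3}$$

The product $p(m, x)$ can be expressed as

$$p(m, x) = m \sum_{y=0}^{\frac{n}{2}-1} 2^{2y} \left( -2x_{2y+1} + x_{2y} + x_{2y-1} \right) \tag{4}$$

$$\phantom{p(m, x)} = \sum_{y=0}^{\frac{n}{2}-1} 2^{2y}\, \beta(x_y, m) \tag{5}$$

where $x_y$ is the triplet $x_L$ for index $y$ and $\beta(x_y, m)$ represents the Modified Booth recoding function, defined by the
piecewise function:

$$\beta(x_y, m) = \begin{cases}
0, & x_y = 0,0,0 \\
+m, & x_y = 0,0,1 \\
+m, & x_y = 0,1,0 \\
+2m, & x_y = 0,1,1 \\
-2m, & x_y = 1,0,0 \\
-m, & x_y = 1,0,1 \\
-m, & x_y = 1,1,0 \\
0, & x_y = 1,1,1
\end{cases} \tag{6}$$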

In order to implement this algorithm in hardware, several conclusions must be drawn
from the relationships above. From equation 5, it follows that an n/2-operand adder is
necessary to cumulatively sum the n/2 terms during each successive multiplication. For
the Modified Booth recoding function defined in equation 6, a combinational logic
network is necessary in order to produce the recoded multipliers 0, ±m, and ±2m from m,
the multiplicand. This algorithm reduces the total number of additions from n to n/2 at
the cost of extra logic to generate and select the necessary recoded multipliers [1,2].
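
The recoding described above can be checked numerically. The following is a minimal Python sketch, not part of the original design flow; the function names booth_digit and booth_multiply are illustrative assumptions. It sums the n/2 recoded terms of equation 5 and should agree with ordinary signed multiplication.

```python
def booth_digit(x_hi, x_mid, x_lo):
    """Radix-4 Booth digit for one triplet: -2*x_hi + x_mid + x_lo."""
    return -2 * x_hi + x_mid + x_lo


def booth_multiply(m, x, n=8):
    """Multiply signed n-bit integers m and x by summing n/2 recoded terms."""
    if n % 2:
        raise ValueError("n must be even")
    bits = x & ((1 << n) - 1)      # two's complement bit pattern of x
    overlap = 0                    # x_{-1} = 0
    product = 0
    for y in range(n // 2):
        x_mid = (bits >> (2 * y)) & 1
        x_hi = (bits >> (2 * y + 1)) & 1
        # Each term is one of 0, +/-m, +/-2m, weighted by 2^(2y).
        product += booth_digit(x_hi, x_mid, overlap) * m * 4 ** y
        overlap = x_hi             # overlapped bit of the next triplet
    return product
```

For an 8-bit multiplier the loop runs four times, which matches the four add-and-shift clock cycles mentioned earlier.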
2.2 Design & Methodology

The Mentor Graphics Design Architect tool is used in this tutorial for schematic capture
and design. Due to the nature of schematic capture, a hierarchy consisting of
encapsulation and abstraction is used to make the design more modular and
comprehensible. The ADK libraries consist of all the necessary standard cells needed to
build each functional unit comprising the circuit, and therefore they will be used
extensively. The top level circuit schematic, presented in figure 1, consists of 7 primary
functional units. In total, there are 11 functional units that encompass the design.

Figure 1. Top-Level Circuit Schematic


Some components contain other custom symbols. These symbols cannot be found in the
ADK libraries; they must be imported by clicking the CHOOSE SYMBOL button in the
ADD/ROUTE schematic palette window and navigating to the directory containing the
symbol. For each functional unit, a symbol must be generated in order for the
component to be used elsewhere. As stated earlier, the use of symbols makes the
schematic more comprehensible.

In addition, the design makes extensive use of data buses. Buses allow bits to be
grouped, which avoids routing a separate wire for each bit across the schematic.
Grouping bits into a bus simplifies the overall look of the design and is therefore
recommended. A bus can be instantiated by clicking the ADD BUS/BUNDLE button in the
ADD/ROUTE schematic palette window. After a bus route has been placed, its name and
size must be specified. To do this, select the bus by clicking on it with the LMB. Choose the
NetName Nets popup menu item with the RMB. Enter the name that defines the bus in
the Property Value text box. Click the OK button. Move the cursor in the schematic
window. Drag the bus name to the location where you want it displayed. Click the LMB to fix
the text position. The bus name should conform to the following format:
bus_net_name(msb:lsb), where bus_net_name is any name of your choice, msb is the
most significant bit, and lsb is the least significant bit. Next, add a wire to the bus by
clicking the ADD WIRE button in the ADD/ROUTE schematic palette window and
attaching it to the bus. A pop-up menu will appear asking which bit you would like to
use. A bus does not have to be named when it connects two symbols with common bus
sizes.

Ports (inputs/outputs), GND & VDD, basic logic gates, flip-flops, transistors, pads,
etc. can be added by navigating to the ADK libraries under the
Libraries pull-down menu. This displays the ADK libraries palette menu at the
right of the screen. This concludes the background needed to reproduce
the multiplier presented in this tutorial. The tutorial proceeds as follows: each
functional unit is presented along with a brief description.
2.2.1 The Control Unit

Figure 2 presents the Design Architect circuit schematic for the control unit. The main
control unit is implemented as a finite state machine consisting of eight states.

Figure 2. Control Unit Circuit Schematic (control)


Figure 3 presents the state diagram flow chart and the corresponding state table for the
control unit. From the diagram, the reset and ready signals control which state the
multiplier is in. The reset signal has the effect of clearing all storage units in the
multiplier in addition to reinitializing the current state to zero, while the ready signal only
has an effect in states zero and seven.

[State diagram: states 0 through 7, with transitions labeled X1, X0, 10, and 00]

CONTROL STATES    STATE DEFINITION
0  0  0           CLEAR / WAIT FOR RDY SIGNAL
0  0  1           RDY ASSERTED / LOAD MULTIPLICAND
0  1  0           LOAD MULTIPLIER
0  1  1           ADD RECODED MULTIPLIER AND SHIFT BY 2
1  0  0           ADD RECODED MULTIPLIER AND SHIFT BY 2
1  0  1           ADD RECODED MULTIPLIER AND SHIFT BY 2
1  1  0           ADD RECODED MULTIPLIER AND SHIFT BY 2
1  1  1           WAIT FOR RDY SIGNAL AND ASSERT DONE

Figure 3. Control Unit State Definitions


The primary function of the control unit is to manage the state of the multiplier. The
inputs to the control unit consist of a ready signal, which signals the start of a
multiplication cycle; a clock signal; and a clear signal, which resets the state of the entire
multiplier. The output signals of the control unit consist of a 3-bit bus, which defines the
current state of the multiplier; a done signal, which signifies that the multiplication has
finished; and an enable signal, which enables storage of the multiplicand into an 8-bit
register for recoded multiplier generation. The control unit was designed using
D-Flip-Flops, and the equations were derived from the state table in figure 4 below.
PRESENT        INPUT     NEXT
QS2 QS1 QS0    RDY RST   Q'S2 Q'S1 Q'S0
 0   0   0      0   0     0    0    0
 0   0   0      1   0     0    0    1
 0   0   1      0   0     0    1    0
 0   0   1      1   0     0    1    0
 0   1   0      0   0     0    1    1
 0   1   0      1   0     0    1    1
 0   1   1      0   0     1    0    0
 0   1   1      1   0     1    0    0
 1   0   0      0   0     1    0    1
 1   0   0      1   0     1    0    1
 1   0   1      0   0     1    1    0
 1   0   1      1   0     1    1    0
 1   1   0      0   0     1    1    1
 1   1   0      1   0     1    1    1
 1   1   1      0   0     1    1    1
 1   1   1      1   0     0    0    1
 X   X   X      X   1     0    0    0

Figure 4. Control Unit State Table


To minimize the number of logic gates required to implement the state
machine, the above table was placed into three 4-variable Karnaugh maps. The following
figures present the Karnaugh maps obtained from the state table shown in figure 4.

DS2: QS2,QS1 (rows) / QS0,RDY (columns)
      00  01  11  10
00     0   0   0   0
01     0   0   1   1
11     1   1   0   1
10     1   1   1   1

DS1: QS2,QS1 (rows) / QS0,RDY (columns)
      00  01  11  10
00     0   0   1   1
01     1   1   0   0
11     1   1   0   1
10     0   0   1   1

DS0: QS2,QS1 (rows) / QS0,RDY (columns)
      00  01  11  10
00     0   1   0   0
01     1   1   0   0
11     1   1   1   1
10     1   1   0   0

Figure 5. Control Unit Karnaugh Maps


Following the process of map simplification, the equations listed below are obtained.
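$$D_{S2} = Q_{S2}\,\overline{Q_{S0}} + Q_{S2}\,\overline{RDY} + \overline{Q_{S2}}\,Q_{S1}\,Q_{S0} + Q_{S2}\,\overline{Q_{S1}}$$

$$D_{S1} = \overline{Q_{S1}}\,Q_{S0} + Q_{S1}\,\overline{Q_{S0}} + Q_{S2}\,Q_{S0}\,\overline{RDY}$$

$$D_{S0} = Q_{S1}\,\overline{Q_{S0}} + Q_{S2}\,\overline{Q_{S0}} + \overline{Q_{S0}}\,RDY + Q_{S2}\,Q_{S1}$$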
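
As a sanity check on the simplification, the next-state equations can be compared against the state table of figure 4 for every state and RDY value. The sketch below is illustrative only: the function names are assumptions, the complemented literals are read as shown in the comments, and RST is left out because it simply forces state 0.

```python
def next_state_from_table(state, rdy):
    """Next state per figure 4 with RST = 0."""
    if state == 0:
        return 1 if rdy else 0     # wait for RDY
    if state == 7:
        return 1 if rdy else 7     # hold the result until RDY
    return state + 1               # states 1 through 6 always advance


def next_state_from_equations(qs2, qs1, qs0, rdy):
    """Evaluate the three D-input equations; n(b) is the complement of b."""
    n = lambda b: 1 - b
    # DS2 = QS2 QS0' + QS2 RDY' + QS2' QS1 QS0 + QS2 QS1'
    ds2 = (qs2 & n(qs0)) | (qs2 & n(rdy)) | (n(qs2) & qs1 & qs0) | (qs2 & n(qs1))
    # DS1 = QS1' QS0 + QS1 QS0' + QS2 QS0 RDY'
    ds1 = (n(qs1) & qs0) | (qs1 & n(qs0)) | (qs2 & qs0 & n(rdy))
    # DS0 = QS1 QS0' + QS2 QS0' + QS0' RDY + QS2 QS1
    ds0 = (qs1 & n(qs0)) | (qs2 & n(qs0)) | (n(qs0) & rdy) | (qs2 & qs1)
    return (ds2 << 2) | (ds1 << 1) | ds0


def equations_match_table():
    """True if the equations reproduce all 16 table rows with RST = 0."""
    for state in range(8):
        qs2, qs1, qs0 = (state >> 2) & 1, (state >> 1) & 1, state & 1
        for rdy in (0, 1):
            expected = next_state_from_table(state, rdy)
            if next_state_from_equations(qs2, qs1, qs0, rdy) != expected:
                return False
    return True
```

Calling equations_match_table() exercises all eight states with both RDY values.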

The following is a brief explanation of the derivation of the state table listed in figure 3.
Upon initialization, the state of the multiplier is unknown. Therefore, to ensure that the
multiplier is in a known state, the external reset signal must be asserted to clear the
contents of the state machine as well as the result register. When this is done, the
multiplier enters state 0 and stays in this state until the ready signal is asserted. When the
ready signal is asserted, the multiplier enters state 1, and it loads the multiplicand into an
8-bit storage register for recoded multiplier calculations. At this time, the state machine
asserts the enable signal to allow the multiplicand to be loaded and it then de-asserts the
enable signal on the next clock edge. The state machine enters state 2 during the next
clock cycle, where it loads the multiplier into the lower byte of the result register from
the input. Due to external pin limitations, the two 8-bit input operands (the multiplicand
and the multiplier) are loaded sequentially with the same 8-bit input bus. The following
four states are required to produce the product. The product is calculated by adding the
proper recoded multiplier to the upper byte of the result register and then performing a
right 2-bit shift during each clock cycle. During state 7, the done signal is asserted and
the content of the result register is stable and valid. The multiplier stays in this state and
the output is valid until the ready signal is asserted again. The figure below presents a
timing diagram of the circuit.

[Timing diagram: states S0 through S7 along the time axis; traces for CLOCK, A(7:0), B(7:0), RESET, DONE, READY, and R(15:0)]

Figure 6. Timing Diagram


2.2.2 An 8-bit Multiplicand Register

An 8-bit register was designed with D-Flip-Flops with the sole purpose of storing the
input multiplicand for recoded multiplier calculations. Figure 7 presents the circuit
schematic for this component.

Figure 7. Multiplicand Register Circuit Schematic (8register)


Booth's Modified Algorithm works by adding these so-called recoded multipliers to the
partial sum in the upper byte of the result register. The recoded multipliers are generated
according to the Modified Booth recoding function defined in equation 6. The register
has an enable signal, which is asserted only during state 1 in order to load the
multiplicand. The contents of the register are held constant throughout the rest of the
multiplication cycle.

2.2.3 Multiplicand Select Decoder Unit

Figure 8 presents the circuit schematic for this component. The purpose of this
component is to select the appropriate recoded multiplier based on the lower 2 bits of the
result register and the carry out bit. The input to this component consists of a 3-bit bus
containing the result bits just described, and the 4-bit output bus contains an enable bit
signal for each of the four non-zero recoded multipliers.

Figure 8. Multiplicand Select Decoder Circuit Schematic (mdecoder)
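
The decoder's behavior can be sketched as a small truth-table function. This is a hedged illustration rather than the schematic itself: the function name select_decoder and the ordering of the four enable bits (+m, -m, +2m, -2m) are assumptions made for the example.

```python
def select_decoder(x_hi, x_mid, x_lo):
    """Map a Booth triplet to one-hot enables (+m, -m, +2m, -2m).

    x_hi and x_mid stand for the lower two bits of the result register;
    x_lo stands for the bit shifted out on the previous cycle.
    All four enables are 0 when the recoded multiplier is zero.
    """
    digit = -2 * x_hi + x_mid + x_lo
    return (int(digit == 1), int(digit == -1),
            int(digit == 2), int(digit == -2))
```

At most one enable is ever set, which is the property the addend unit relies on.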


2.2.4 The Recoded Multiplier Unit

As the title of this section suggests, this component generates each of the four
non-zero recoded multipliers. Figure 9 presents the circuit
schematic for this component. One of the four multipliers is selected based on a 4-bit
input signal generated by the multiplicand select decoder. The contents of the 8-bit
multiplicand register serve as the other input to this component. The outputs consist of
four 10-bit recoded multipliers, of which only one is active at a time,
depending on the input select signal from the multiplicand select decoder.

Figure 9. Recoded Multiplier Circuit Schematic (recodemult)
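
A short Python sketch shows why the recoded multipliers are 10 bits wide: for m = -128 the term -2m equals +256, which does not fit in 9 signed bits. The function name and the output ordering (+m, -m, +2m, -2m) are assumptions for illustration.

```python
WIDTH = 10
MASK = (1 << WIDTH) - 1


def recoded_multipliers(m8):
    """Return (+m, -m, +2m, -2m) as 10-bit two's complement words.

    m8 is the raw 8-bit content of the multiplicand register (0..255),
    interpreted as a signed two's complement value.
    """
    m = m8 - 256 if m8 & 0x80 else m8
    return tuple(value & MASK for value in (m, -m, 2 * m, -2 * m))
```

For example, a register content of 0x80 (m = -128) gives the words 0x380, 0x080, 0x300, and 0x100.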


2.2.5 An 8-bit 2's Complement Unit

This unit simply produces the 2's complement of the input. The input is 8 bits wide, and
the output is 8 bits wide with two additional bits for sign extension. Additionally, a sign
output bit gives the polarity of the result. This component is used in generating the
recoded multipliers for the recoded multiplier unit. The circuit schematic for this
component is presented in figure 10.

Figure 10. An 8-bit 2's Complement Unit Circuit Schematic (82scomplement)

2.2.6 The Addend Unit

This unit simply consists of ten 4-input OR gates. The gates are reconvergent so as to allow
one of the four recoded multipliers to pass as input to the partial sum of the product
register. The multiplicand select decoder enables only one of the recoded multipliers by
AND-ing that particular multiplier with a logical high and AND-ing the remaining three
recoded multipliers with a logical low. Therefore each of the 4-input OR gates will have
three guaranteed logic 0 inputs, while the remaining input will be the ith bit of the enabled
recoded multiplier. The output is 10 bits wide. The circuit schematic for this component
is shown in figure 11.

Figure 11. Addend Circuit Schematic (addend)
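
The AND-OR selection described above amounts to a one-hot multiplexer. A minimal Python sketch follows; the function name addend and its argument layout are assumptions for illustration. It operates on whole 10-bit words at once instead of on individual gates.

```python
def addend(words, enables):
    """Pass the single enabled 10-bit word, or 0 if none is enabled.

    words:   four 10-bit recoded multipliers
    enables: four enable bits from the select decoder (at most one set)
    """
    out = 0
    for word, enable in zip(words, enables):
        gated = word if enable else 0   # AND every bit with the enable line
        out |= gated                    # one 4-input OR per bit position
    return out & 0x3FF
```

Because three of the four gated words are always zero, the OR simply forwards the enabled word.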


2.2.7 A 10-bit Full Adder

A 10-bit full adder is used in computing the partial sum of the upper byte of the result
register. The inputs to the adder consist of the upper byte of the result register including
two bits for sign extension, and the other 10-bit input comes from the addend unit. The
10-bit result is re-deposited into the upper byte and sign bit of the result register. The
circuit schematic for this component is shown in figure 12.

Figure 12. 10-bit Full Adder Circuit Schematic (8fadder)


2.2.8 A 16-bit Result Register

This component is by far the most complex logic block in the circuit. The result register
is 16 bits wide and contains three additional D-Flip-Flops: two for sign extension and
one to store the shift-out bit. Therefore, the result register is actually 19 bits wide, but
only 16 bits are available to the user. The result register can be in one of three modes:
load, hold, or shift. The load mode has two separate contexts for the upper and lower
bytes of the result register. For the lower byte, the load mode loads the input multiplier.
For the upper byte, the load mode simply loads 0, effectively clearing the byte. This
mode corresponds to state 2 of the finite state machine. The hold mode keeps the
contents of the result register regardless of changes at the input. This mode is valid for
state 7 of the finite state machine. The shift mode shifts the contents of the result register
to the right by two bits during each successive clock cycle. This mode is valid for states
3, 4, 5, and 6 of the finite state machine. The circuit schematic for this component is
presented in figure 13.

Figure 13. The 16-bit Result Register Circuit Schematic (16shft2reg)
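
The load and shift modes can be illustrated with a cycle-level Python sketch of the datapath. It is a behavioral model under stated assumptions, not a transcription of the schematic: the name booth_datapath is hypothetical, and the upper byte with its sign-extension bits is kept as a signed Python integer instead of ten individual flip-flops.

```python
def booth_datapath(m, x):
    """Cycle-level sketch for signed 8-bit m and x; returns the product."""
    upper = 0              # upper byte plus sign-extension bits, kept signed
    lower = x & 0xFF       # state 2: load the multiplier into the lower byte
    shift_out = 0          # bit shifted out on the previous cycle
    for _ in range(4):     # states 3, 4, 5 and 6
        x_hi, x_mid = (lower >> 1) & 1, lower & 1
        digit = -2 * x_hi + x_mid + shift_out
        total = upper + digit * m                     # add recoded multiplier
        shift_out = x_hi
        lower = ((total & 0b11) << 6) | (lower >> 2)  # shift right by 2
        upper = total >> 2                            # arithmetic shift
    return upper * 256 + lower                        # state 7: held result
```

Each loop iteration corresponds to one clock cycle in which the adder output is written back and the register contents move right by two bits.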


In order to implement the result register, special D-Flip-Flops were designed to perform
these modes of operation. The next section discusses these special D-Flip-Flops.
2.2.9 D-Flip-Flops with Load, Hold, & Shift

Load, hold, & shift modes of operation were added to standard cell D-Flip-Flops in order
to avoid including an additional 16-bit register to store a stable and valid result upon the
completion of a multiplication cycle. The need for these special flip-flops stems from the
fact that three inputs needed to be multiplexed with each D-Flip-Flop in the result
register. Some control logic was needed in order to properly multiplex the inputs with the
select signals. The corresponding states of the result register in response to each of the 8
states generated by the state machine are listed in the table of figure 14.
REGISTER CONTROL    DEFINITION
0  0  0             CLEAR REGISTERS
0  0  1             CLEAR MULTIPLICAND
0  1  0             LOAD MULTIPLIER
0  1  1             SHIFT BY 2
1  0  0             SHIFT BY 2
1  0  1             SHIFT BY 2
1  1  0             SHIFT BY 2
1  1  1             HOLD / QNEXT = QPRESENT

Figure 14. Result Register State Definitions


The state table, shown in figure 15, was derived from the result register state definition
table in figure 14.

QS2 QS1 QS0    LOAD HOLD SHIFT
 0   0   0      0    0    0
 0   0   1      0    0    0
 0   1   0      1    0    0
 0   1   1      0    0    1
 1   0   0      0    0    1
 1   0   1      0    0    1
 1   1   0      0    0    1
 1   1   1      0    1    0

Figure 15. Result Register State Table


Placing the above table into three separate Karnaugh maps yields the diagrams shown
below in figure 16.

LOAD: QS2 (rows) / QS1,QS0 (columns)
     00  01  11  10
0     0   0   0   1
1     0   0   0   0

SHIFT: QS2 (rows) / QS1,QS0 (columns)
     00  01  11  10
0     0   0   1   0
1     1   1   0   1

HOLD: QS2 (rows) / QS1,QS0 (columns)
     00  01  11  10
0     0   0   0   0
1     0   0   1   0

Figure 16. Result Register Karnaugh Maps


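The equations governing the control logic needed to implement the special flip-flops
were derived from the Karnaugh maps shown in the figure above and are listed below.

$$\mathit{LOAD} = \overline{Q_{S2}}\,Q_{S1}\,\overline{Q_{S0}}$$

$$\mathit{SHIFT} = Q_{S2}\,\overline{Q_{S1}} + \overline{Q_{S2}}\,Q_{S1}\,Q_{S0} + Q_{S2}\,\overline{Q_{S0}}$$

$$\mathit{HOLD} = Q_{S2}\,Q_{S1}\,Q_{S0}$$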
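
These mode equations can be checked against the state table of figure 15 with a few lines of Python. The sketch is illustrative; the function name register_mode is an assumption, and the complemented literals are read as noted in the comments.

```python
def register_mode(qs2, qs1, qs0):
    """Return (LOAD, HOLD, SHIFT) for a control state; n(b) complements b."""
    n = lambda b: 1 - b
    load = n(qs2) & qs1 & n(qs0)                    # QS2' QS1 QS0'
    shift = ((qs2 & n(qs1)) | (n(qs2) & qs1 & qs0)  # QS2 QS1' + QS2' QS1 QS0
             | (qs2 & n(qs0)))                      # + QS2 QS0'
    hold = qs2 & qs1 & qs0                          # QS2 QS1 QS0
    return load, hold, shift


# Expected (LOAD, HOLD, SHIFT) per state, copied from figure 15.
FIGURE_15 = {
    0: (0, 0, 0), 1: (0, 0, 0), 2: (1, 0, 0), 3: (0, 0, 1),
    4: (0, 0, 1), 5: (0, 0, 1), 6: (0, 0, 1), 7: (0, 1, 0),
}


def modes_match_table():
    """True if the equations agree with figure 15 for all eight states."""
    return all(
        register_mode((s >> 2) & 1, (s >> 1) & 1, s & 1) == expected
        for s, expected in FIGURE_15.items()
    )
```

No state asserts more than one mode, so the three select lines never conflict.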


This control logic is embedded within the special D-Flip-Flops, and they are in turn
encapsulated in the register mode decoder, which is the subject of the next section.
Figure 17 presents a circuit schematic for this component.

Figure 17. D-Flip-Flops with Load, Hold, & Shift Circuit Schematic (dfflhs)
2.2.10 The Register Mode Decoder

This is the last significant logic block to be discussed. This logic block decodes the
current state of the control unit into select lines that are used to multiplex the
D-Flip-Flops with load, hold, and shift capabilities. The input consists of the 3-bit state of the
control unit (this serves as the select input) as well as the inputs for the load, shift, and
hold lines. The output is one of the load, hold, or shift lines. Figure 18 shows the
schematic diagram for this component.

Figure 18. The Register Mode Decoder Circuit Schematic (rdecoder)


Assuming some familiarity with the Mentor Graphics Design Architect tool, and having
presented each component that comprises the design, in addition to including circuit
schematics, one should be able to reproduce this implementation of a Modified Booth
Multiplier with little effort.
3.0 Simulation

The Modified Booth multiplier inputs (multiplier and multiplicand) need to have
binary numbers sent to them individually. If the pattern generator is set to count from 0
to 255 on two separate 8-bit buses, it will not iterate through the operand pairs as
expected. For this reason, the 8-bit multiplier and multiplicand should be combined into
a 16-bit bus. If the pattern generator is set to count a 16-bit binary number from 0 to
65535, then the multiplier counts from 0 to 255 each time the multiplicand increments
by one. In this implementation, we make use of buffers to combine the
multiplier and multiplicand. A 16-bit input bus goes into the bus combine component and
is split into two 8-bit buses (one goes to the multiplier and the other to the multiplicand).

Figure 19. Bus combine Circuit Schematic
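
The effect of the 16-bit counter can be sketched in Python. Consistent with the description above, the sketch assumes the multiplier takes the low byte (which changes fastest) and the multiplicand takes the high byte; the function name split_pattern is hypothetical.

```python
def split_pattern(value16):
    """Split a 16-bit pattern into (multiplicand, multiplier) bytes."""
    multiplicand = (value16 >> 8) & 0xFF   # high byte, changes every 256 counts
    multiplier = value16 & 0xFF            # low byte, changes every count
    return multiplicand, multiplier
```

Counting from 0 to 65535 therefore visits every one of the 65536 operand pairs exactly once.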


3.1 Adding Forces

For this simulation, the clock is set to the maximum operating frequency of 30.3 MHz, for
an equivalent period of 33 ns. This is done by clicking on the clock input line and adding
a clock force. The clock period should be 33 ns (see figure 20).

Figure 20. Clock signal window

Next, the multiplier needs to be cleared on the first clock cycle. This is done by
clicking on the reset input line and adding a force to the reset line. The reset line will
have a value of 1 at time 0 ns and a value of 0 at 33 ns. The multiplier should be ready
to multiply all 65536 operand pairs after the reset cycle. To set the multiplier to be ready,
click on the ready input line and add a force to the ready line. The ready line will
have a value of 0 at time 0 ns and a value of 1 at 33 ns.

Figure 21. Force signal window

3.2 Pattern Generation

The input pattern for the multiplier and multiplicand must be created. To do this, click on
the 16-bit input bus going into the bus combine component. Next, click on the
PATTERN GENERATOR icon. The pattern generation should begin 33 ns after the
assertion of the reset signal. Since the multiplier needs 7 cycles to complete a multiplication,
7 × 33 ns (or 231 ns) is needed per multiplication cycle. A total of 65536 patterns are
needed, so the entire pattern sequence should be 65536 × 231 ns, or
15,138,816 ns (15.1 ms), long.

Figure 22. Pattern Generator window
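
The timing figures above follow from simple arithmetic, reproduced here as a small Python sketch. The constant and function names are illustrative assumptions; the numbers are the ones given in the text.

```python
CLOCK_PERIOD_NS = 33        # 33 ns period, roughly 30.3 MHz
CYCLES_PER_MULTIPLY = 7     # clock cycles per multiplication
NUM_PATTERNS = 2 ** 16      # all pairs of 8-bit operands


def pattern_timing():
    """Return (ns per multiplication, ns for the whole pattern sequence)."""
    per_multiply = CLOCK_PERIOD_NS * CYCLES_PER_MULTIPLY
    total = per_multiply * NUM_PATTERNS
    return per_multiply, total


def clock_frequency_mhz():
    """Clock frequency in MHz for the period above."""
    return 1000 / CLOCK_PERIOD_NS
```

The total of 15,138,816 ns is why the simulation run time in the next section is set slightly above 15.1 ms.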


3.3 Running the Simulation and Viewing Results

To view the relevant traces, select the multiplicand, multiplier, and result lines. Next,
add the selected traces in hexadecimal format. The other single-bit lines can be directly
added by clicking on the TRACE or LIST button. At this point, the inputs
are set and added to the list; however, the simulation still needs to be run.
The simulation should be allowed to run slightly longer than the total time of the pattern
generation. Type run 15139000 to run the pattern sequence.

Figure 23. Trace results


After the simulation has completed, the data in the trace list should be exported. To do
this, click on the LIST icon, and then go to the file menu and export report.
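
Once the data has been exported, each row can be checked against the expected signed product. The sketch below is hypothetical: it assumes the rows have already been parsed into hexadecimal strings for the multiplicand, multiplier, and result, and it makes no assumption about the layout of the exported report itself.

```python
def to_signed(hex_text, bits):
    """Interpret a hexadecimal string as a signed two's complement value."""
    value = int(hex_text, 16)
    return value - (1 << bits) if value >> (bits - 1) else value


def check_row(a_hex, b_hex, r_hex):
    """True if the 16-bit result equals the signed product of the inputs."""
    return to_signed(a_hex, 8) * to_signed(b_hex, 8) == to_signed(r_hex, 16)
```

For instance, a multiplicand of 03, a multiplier of FB (-5), and a result of FFF1 (-15) pass the check.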
4.0 Placement & Routing
4.1 Pad Placement

To start the creation of an IC design, the 88mult component must be connected to pads for
I/O and power. Certain pads may be reserved for VDD and ground depending on your
chip layout type. AMI 0.5 µm technology is used to create the pad layout for this
multiplier.

Figure 24. Pad Layout


4.2 Internal Core Layout

The core layout is generated automatically, as shown in figure 25. Some design
rules may be violated in the creation process and must be manually corrected. See Dr.
Milenkovic's website for some tutorial tips. Typing the command Peek followed by a
number reveals that many hidden layers so that errors can be more easily fixed.

Figure 25. Core Internal Routing


4.3 Design Rule Check Errors

You might encounter one or more of the following errors when performing a design rule
check. It is important that all DRC errors be corrected in order to increase the probability
that the chip will perform as expected after fabrication.
4.3.1 Via must NOT be stacked with contact

Select the via, right click, and select Edit > Move > Unconstrained. Move the via over
by 2λ. Then add metal layer 1 (or metal 2) between the via and the trace. Use the same
metal layer that the trace is made of. Right click, select Add > Shape, and click
Options on the pop-up menu that appears. Select metal 1 (or metal layer 2) to add between
the via and trace. Figure 26 shows this DRC error and the corresponding correction.

Figure 26. Via must NOT be stacked with contact

4.3.2 Port must be completely covered with Metal

Add metal layer 1 (or metal 2) over the white indicator box. Use the same metal layer
that the trace is made of. Right click, select Add > Shape, click Options on the
pop-up menu, and select metal 1 (or metal layer 2) to cover the white indicator box with
metal. Figure 27 shows an example of the DRC error.

Figure 27. Port must be completely covered with Metal


4.3.3 Metal spacing = 4L

Add metal layer 1 (or metal 2) between the white indicator lines. Use the same metal
layer that the trace is made of, and extend the metal coverage only to the length of the
shorter white line. Right click, select Add > Shape, click Options on the pop-up menu, and
select metal 1 (or metal layer 2) to cover the gap between the two white indicator lines.
Figure 28 shows this particular DRC error and a typical fix.

Figure 28. Metal spacing = 4L

4.4 Core Placement and Routing

The core of the chip should be centered inside the pad frame. Traces should connect the
core to the pad frame. The traces have two layers. If manual routing is required, then the
route should be placed to connect the core to the frame by changing layers when
necessary to avoid inadvertent connections between two separate traces. You may find
the preferred route facility of the auto-route feature useful in performing manual
routing.

Figure 29. Core Placement and Routing


5.0 Conclusion

In this paper, we have endeavored to present the design of an 8-bit Modified
Booth multiplier in an effort to demonstrate an approach that can be taken toward the
design and verification process of a VLSI circuit with the Mentor Graphics tools. We
have included circuit schematics, timing diagrams, as well as simulation procedures in
this technical paper in an effort to convince the reader of the correctness of the design.
Using design techniques such as those presented in this paper, any VLSI circuit, whether
complex or simple, can be successfully realized.
