
Logic BIST for Large Industrial Designs: Real Issues and Case Studies

Graham Hetherington and Tony Fryars
Texas Instruments, Ltd., 800 Pavilion Drive, Northampton, UK NN4 7YL

Nagesh Tamarapalli, Mark Kassab, Abu Hassan, and Janusz Rajski
Mentor Graphics Corporation, 8005 S.W. Boeckman Road, Wilsonville, OR 97070, USA

Abstract
This paper discusses practical issues involved in applying logic built-in self-test (BIST) to four large industrial designs. These multi-clock designs, ranging in size from 200K to 800K gates, pose significant challenges to logic BIST methodology, flow, and tools. The paper presents the process of generating a BIST-compliant core along with the logic BIST controller for at-speed testing. Comparative data on fault grades and area overhead between automatic test pattern generation (ATPG) and logic BIST are reported. The experimental results demonstrate that, with automation of the proposed solutions, logic BIST can achieve test quality approaching that of ATPG with minimal area overhead and few changes to the design flow.
I. INTRODUCTION
Most large application-specific integrated circuits (ASICs) use scan as the fundamental design for test (DFT) methodology. It has been observed that the amount of test data required to test one gate in a large design can exceed 1 Kb. This depends on several factors, such as the design style, the fault models used, and the capabilities of the automatic test pattern generation (ATPG) tool. Even with state-of-the-art ATPG tools, several gigabits of test data may be required for a multi-million-gate design. Testers have a limited number of channels designed to drive scan chains, typically around 8, and the loading speed is limited by the maximum scan frequency, usually around 10 to 50 MHz. The large volume of test data therefore creates two problems for testers: capacity and test application time. Very often, testers do not have enough memory to store the entire test set covering stuck-at, transition, and path delay faults. In some cases, the available memory is not even large enough to store a complete test set for stuck-at faults. Either very time-consuming reloads are then required, or only a subset of the test vectors is applied, with a corresponding reduction in fault coverage. The volume of test data also directly impacts test application time: the increasingly large volume of test data and the limited throughput of the scan interface between the tester and the design create a bottleneck. Even today, test application time can be several seconds. The cost of tester time is typically 25 to 50 cents per second. For high-end testers used to test state-of-the-art microprocessors, it has been reported that the tester amortization cost could be as high as $6000 per hour [1]. Note that manufacturing test is applied to every device multiple times, at different voltage levels, at the wafer and packaged-device stages, etc. The manufacturing test cost is incurred for every manufactured device and might be as high as 25-30% of the total manufacturing cost.

Logic built-in self-test (BIST) is based on scan as the fundamental DFT methodology [2,3,4,5]. Initially, the most compelling reason for adopting BIST was the requirement to perform in-field testing. Recently, there has been growing interest in BIST because it can reduce the cost of manufacturing test and improve test quality by providing at-speed testing capability. In BIST, pseudo-random patterns are generated on chip, the responses are compacted on chip, and the control signals are driven by an on-chip controller. The amount of test data exchanged with the tester is therefore drastically reduced. In addition, the scan cells are configured into a large number of relatively short scan chains, which reduces the time required to apply a single test pattern. The low memory and performance requirements on the tester allow very low-cost testers to be used for manufacturing test of designs with logic BIST.

Because logic BIST relies on pseudo-random patterns and compacts the test responses, it imposes more stringent design rules on the BISTed logic than scan with stored patterns does. Logic BIST requires that bus conflicts be eliminated, that sources of X states be properly bounded to prevent corruption of the signatures, that the circuit be random-pattern testable, etc. In many cases, the original design does not satisfy many of these requirements, which poses barriers to BIST. In those cases, and in general, the only practical way to implement BIST is to automate the design tasks and integrate them into the overall methodology and design flow.

The introduction of logic BIST at the Texas Instruments MOS design center is driven by limitations of the currently used test equipment and by a number of specific goals. In particular, the testers currently in use already limit the ability to run the available tests in the following ways:
1. Scan operates at a maximum frequency of 50 MHz.
2. Tester scan memory is usually full.
3. The tester supports a maximum of 8 scan chains, resulting in a long test application time for large designs.
4. Tester functional test memory is also full, so that as little as 10% of the available functional tests can be used.
5. Transition fault and path delay scan ATPG patterns are not used because of a lack of tester memory.

All these problems are constantly getting worse. They could be solved by investing in tester technology; however, logic BIST is an attractive alternative because it removes most of the tester limitations. Given the current design environment and ATPG practice, the following basic logic BIST goals were derived:
G1. Eliminate tester memory and frequency limitations.
G2. Solution provides at-speed scan testing.
G3. Solution works for 1-2 million logic gate designs.
G4. BIST stuck-at grade 95%.
G5. Logic BIST area overhead 2% of logic.
G6. Silicon BIST run time < 1 second.
G7. Engineering effort < 2 person-months per design.
It is also important that logic BIST fit seamlessly into the current design process and that the overnight design synthesis time not be compromised; hence the following additional flow-related goals:
G8. Ability to use ATPG or logic BIST tests.
G9. Minimal impact on current design methodology.
G10. Automation of the logic BIST flow.
G11. Additional RTL-to-gates run time < 2 hours.
G12. Logic BIST fault grade run time < 12 hours.
G13. Logic BIST Ikos™ simulation time < 12 hours.
G14. BIST can be run on a very low-cost tester.

Section II presents the logic BIST architecture, with particular emphasis on the controller and its ability to support multi-clock, multi-frequency designs. Section III covers generation of the BIST-ready core, including insertion of test points, bounding of X generators, and handling of primary inputs and outputs. Section IV presents four case studies in detail. Finally, conclusions are presented in Section V.

Paper 14.2, p. 358. ITC INTERNATIONAL TEST CONFERENCE. 0-7803-5753-1/99 $10.00 1999 IEEE

II. LOGIC BIST ARCHITECTURE
A. Generic scan-based logic BIST architecture
A generic single-clock logic BIST architecture based on the well-known STUMPS technique [6] is illustrated in Figure 1. The figure depicts the circuit-under-test, or core, and the logic BIST controller in the highlighted area. The circuit is composed of combinational logic, and possibly embedded memories, separated by multiple scan chains. The components of the logic BIST controller include the test pattern generation block, composed of the pseudo-random pattern generator (PRPG) and the phase shifter circuit, and the output response analysis block, composed of the multiple-input signature register (MISR), the space compactor, and optional AND gates. In addition, there are two counters: the pattern counter, and the shift counter, which for each pattern keeps track of the number of cycles required to fill the scan chains. The decoder block shown in the figure drives the test points. Finally, the multiplexers between the phase shifter and the scan inputs are used to concatenate several shallow BIST-mode scan chains into a few deep ATPG-mode scan chains, accessed directly from the chip pins, in case top-up ATPG is used to improve the fault coverage obtained by BIST.

Figure 1: Generic scan-based logic BIST architecture (PRPG, phase shifter, scan chains 1 to N, space compactor, MISR, mask AND gates, pattern counter, shift counter, decoder, clock, BIST mode and scan enable control).
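To make the test pattern generation block concrete, here is a minimal sketch of a PRPG feeding a phase shifter. It assumes a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11 (the polynomial x^16 + x^14 + x^13 + x^11 + 1, which is maximal length) and a hypothetical phase shifter in which each scan-chain input is the XOR of three LFSR stages. The tap sets in `TAPS` are illustrative only; they are not the systematically designed phase shifter of [6,7], which chooses taps so that the channel sequences are separated by guaranteed distances. The sketch also shows the structural dependency the phase shifter exists to break: chains wired directly to adjacent LFSR stages receive the same sequence, shifted by one cycle.

```python
from typing import List, Sequence, Tuple

def lfsr16_step(state: int) -> int:
    """Shift a 16-bit Fibonacci LFSR once (taps 16, 14, 13, 11: maximal length)."""
    feedback = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)

def stage_streams(seed: int, stages: Sequence[int], cycles: int) -> List[List[int]]:
    """Bit stream seen by a scan chain wired directly to each listed LFSR stage."""
    state, streams = seed, [[] for _ in stages]
    for _ in range(cycles):
        for stream, k in zip(streams, stages):
            stream.append((state >> k) & 1)
        state = lfsr16_step(state)
    return streams

# Hypothetical phase-shifter taps for 6 scan chains (3 LFSR stages per XOR gate).
TAPS: List[Tuple[int, ...]] = [(0, 4, 9), (1, 6, 12), (2, 7, 15),
                               (3, 8, 13), (5, 10, 14), (1, 11, 15)]

def phase_shifter_streams(seed: int, cycles: int) -> List[List[int]]:
    """Bit stream for each scan chain when every chain input is an XOR of LFSR stages."""
    state, streams = seed, [[] for _ in TAPS]
    for _ in range(cycles):
        for stream, taps in zip(streams, TAPS):
            bit = 0
            for k in taps:
                bit ^= (state >> k) & 1
            stream.append(bit)
        state = lfsr16_step(state)
    return streams

if __name__ == "__main__":
    raw = stage_streams(0xACE1, [0, 1], cycles=12)
    # Adjacent stages carry the same sequence, one cycle apart.
    print(raw[0][1:] == raw[1][:-1])    # True
    chains = phase_shifter_streams(0xACE1, cycles=12)
    print(len(chains), len(chains[0]))  # 6 12
```

With a right-shifting register, stage k at cycle t+1 holds what stage k+1 held at cycle t, which is why direct wiring produces shifted copies; XORing several stages per channel is what a phase shifter uses to move the channels far apart in the LFSR sequence.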

The BIST can be initiated either through a boundary scan TAP controller or, if a stand-alone logic BIST controller is implemented, by appropriately asserting a set of new primary inputs. Prior to running the actual test, controller components such as the PRPG, the MISR, and the pattern counter need to be initialized. The internal scan chains can optionally be initialized as well. The actual test of the circuit, consisting of several patterns, then begins. For each pattern the shift counter counts (N_SC + N_CC) cycles, where N_SC, the number of cycles in the shift window, is equal to the length of the longest scan chain, and N_CC, the number of cycles in the capture window, is typically equal to one for a simple capture window. Hence, to reduce test application time, the scan cells must be configured into a large number of shallow scan chains. A systematically designed phase shifter circuit [6,7] is placed between the LFSR and the scan chain inputs to eliminate structural dependencies and to allow a large number of scan chains to be driven by a relatively short LFSR. Similarly, an XOR structure called a space compactor is required to compact the large number of scan outputs before feeding them to a small MISR. As with the phase shifter, care must be taken in designing the space compactor to avoid loss of test coverage due to fault masking. During the shift window of a pattern, new pseudo-random values from the PRPG are loaded into the scan chains while the circuit's response to the previous pattern is simultaneously unloaded and compacted into the MISR. If the internal scan chains are not initialized, their unknown contents during the first pattern can be blocked by the AND gates in front of the MISR, as shown in Figure 1. After the scan chains are completely loaded, the multiplexers in the scan cells are placed in system mode for one cycle to capture the circuit's response. This sequence of events continues for each pattern.
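The N_SC + N_CC cycle count can be made concrete with a short worked example. It uses the stored-pattern figures reported for the largest designs in Section IV (40K scan cells, about 5K patterns, 8 scan chains, 20-50 MHz scan, 3 bits of tester data per scan cell per pattern) and assumes perfectly balanced chains and N_CC = 1. The 400-chain configuration, and applying the same 5K patterns in BIST, are hypothetical; pseudo-random BIST usually needs many more patterns than ATPG, so the real saving is smaller than this sketch suggests.

```python
def longest_chain(num_cells: int, num_chains: int) -> int:
    """N_SC for perfectly balanced chains: ceil(cells / chains)."""
    return -(-num_cells // num_chains)

def session_cycles(num_patterns: int, n_sc: int, n_cc: int = 1) -> int:
    """Shift-counter cycles for a whole session: each pattern takes N_SC + N_CC."""
    return num_patterns * (n_sc + n_cc)

def run_time_s(cycles: int, clock_hz: float) -> float:
    """Wall-clock time to apply `cycles` shift/capture cycles at `clock_hz`."""
    return cycles / clock_hz

def scan_data_bits(num_cells: int, num_patterns: int, bits_per_cell: int = 3) -> int:
    """Tester memory needed for stored scan patterns (stimulus, response, mask)."""
    return num_cells * num_patterns * bits_per_cell

CELLS, PATTERNS = 40_000, 5_000   # largest designs, Section IV

if __name__ == "__main__":
    print(scan_data_bits(CELLS, PATTERNS))             # 600000000 bits = 600 Mb
    n_sc = longest_chain(CELLS, 8)                     # 8 deep chains: N_SC = 5000
    cycles = session_cycles(PATTERNS, n_sc)            # 25,005,000 cycles
    print(run_time_s(cycles, 20e6), run_time_s(cycles, 50e6))
    n_sc = longest_chain(CELLS, 400)                   # hypothetical: N_SC = 100
    print(run_time_s(session_cycles(PATTERNS, n_sc), 50e6))
```

The first configuration gives about 1.25 s at 20 MHz and 0.50 s at 50 MHz, matching the 1.3 to 0.5 s reported in Section IV; spreading the same cells over 400 chains cuts each pattern from 5001 to 101 cycles, which is the motivation for many shallow chains.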
In addition, if a multi-phase test scheme is used [8], at the beginning of each new test phase the test points decoder establishes a predetermined set of values at the control points, which remain fixed for that phase. Once all the test phases have been applied, the contents of the MISR, i.e., the signature, can either be scanned out and compared externally or be compared with an on-chip reference signature to determine the status of the circuit.
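The output side of Figure 1 can be sketched the same way: a space compactor (XOR tree) feeding a small MISR, with optional AND gates in front. This is a toy model under stated assumptions: 8 scan chains, a hypothetical 4-bit MISR whose feedback taps `TAPS` are illustrative (they include the last stage, so the register's state transition is invertible and no error is ever shifted out), `None` standing for an X, and a mask applied to every cycle for simplicity rather than only during the first pattern. It shows four behaviors described in the text: a single-bit error changes the signature, two errors in the same XOR group mask each other, an unmasked X corrupts the signature, and the AND gates keep it out.

```python
from typing import List, Optional

Bit = Optional[int]  # 0, 1, or None for an unknown (X) value

def xor(a: Bit, b: Bit) -> Bit:
    """Three-valued XOR: an X on either input gives an X."""
    return None if a is None or b is None else a ^ b

def and_mask(value: Bit, enable: int) -> Bit:
    """AND gate in front of the compactor: enable=0 forces 0, even for an X."""
    return 0 if enable == 0 else value

def space_compactor(outs: List[Bit], groups: List[List[int]]) -> List[Bit]:
    """XOR tree: each MISR input is the XOR of one group of scan-chain outputs."""
    result: List[Bit] = []
    for group in groups:
        value: Bit = 0
        for chain in group:
            value = xor(value, outs[chain])
        result.append(value)
    return result

def misr_step(state: List[Bit], inputs: List[Bit], taps: List[int]) -> List[Bit]:
    """Shift one stage with XOR feedback from `taps`, then XOR in the parallel inputs."""
    feedback: Bit = 0
    for t in taps:
        feedback = xor(feedback, state[t])
    shifted = [feedback] + state[:-1]
    return [xor(s, d) for s, d in zip(shifted, inputs)]

def signature(unload: List[List[Bit]], groups: List[List[int]], taps: List[int],
              mask: Optional[List[int]] = None) -> List[Bit]:
    """Compact one scan-out vector per shift cycle into the MISR, starting from 0."""
    state: List[Bit] = [0] * len(groups)
    for outs in unload:
        if mask is not None:
            outs = [and_mask(v, m) for v, m in zip(outs, mask)]
        state = misr_step(state, space_compactor(outs, groups), taps)
    return state

def with_value(unload: List[List[Bit]], cycle: int, chain: int, value: Bit) -> List[List[Bit]]:
    """Copy of the unload data with one scan-out bit replaced."""
    copy = [list(row) for row in unload]
    copy[cycle][chain] = value
    return copy

GROUPS = [[0, 1], [2, 3], [4, 5], [6, 7]]   # 8 chains -> 4 MISR inputs
TAPS = [3, 2]                               # hypothetical feedback taps
good = [[(3 * cycle + chain) % 5 % 2 for chain in range(8)] for cycle in range(10)]

if __name__ == "__main__":
    ref = signature(good, GROUPS, TAPS)
    one = with_value(good, 4, 2, 1 - good[4][2])
    print(signature(one, GROUPS, TAPS) != ref)        # single error detected
    two = with_value(one, 4, 3, 1 - good[4][3])
    print(signature(two, GROUPS, TAPS) == ref)        # masked in the XOR tree
    unknown = with_value(good, 0, 5, None)
    print(signature(unknown, GROUPS, TAPS))           # contains None (X)
    mask = [1, 1, 1, 1, 1, 0, 1, 1]
    print(signature(unknown, GROUPS, TAPS, mask))     # X blocked by AND gate
```

Because the MISR is linear and its transition is invertible, a single erroneous bit can never cancel out; the two-error case shows why the text warns that the space compactor must be designed to limit fault masking.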

B. Multi-clock logic BIST scheme
This section discusses an at-speed, multi-clock, multi-frequency logic BIST scheme that tests the logic in every clock domain, as well as between domains, at their respective system speeds. The speed of scan loading is separated from the speed of circuit operation to provide a simple mechanism for controlling power dissipation during BIST and to reduce the impact on scan chain design. A programmable capture window allows captures in interacting domains to take place at different times, eliminating the clock skew problem. The scheme presented in this section is the subject of a patent-pending invention [9].

Figure 2 illustrates a multi-frequency logic BIST scheme based on the STUMPS architecture. Like the architecture shown in Figure 1, this figure depicts two parts: the circuit-under-test and the logic BIST controller. However, the circuit-under-test in this case has multiple scan chains operated by different clocks, possibly running at different frequencies. These frequencies can be sub-multiples of each other, for example F1, F2 = F1/2, F3 = F1/4; related but not sub-multiples of each other, such as F1, F2 = F1/3, F3 = F1/5; or totally unrelated, such as F1 = 155 MHz, F2 = 66 MHz, F3 = 41 MHz. In the following, for simplicity, a logic BIST scheme is discussed for a circuit containing three clocks that are sub-multiples of each other. The proposed techniques can be applied just as easily to circuits containing related but non-sub-multiple clocks, or totally unrelated clocks. The circuit in Figure 2 contains three scan chains, SC1, SC2, and SC3, operated by three different clocks, clk1, clk2, and clk3, respectively. In the normal system mode, clk1, clk2, and clk3 are driven by sys_clk1, sys_clk2, and sys_clk3, respectively, which are generated by an on-chip phase-locked loop (PLL).

The frequencies F2 and F3 of clocks sys_clk2 and sys_clk3 are assumed to be one half and one quarter, respectively, of the sys_clk1 frequency F1. In the multi-frequency LBIST test, these functional clocks are modified in different ways, as explained below, to achieve an at-speed test. Each of the three clock domains clk1, clk2, and clk3 has a dedicated scan enable signal, Sen1, Sen2, and Sen3, respectively. In addition to the components of the single-clock logic BIST controller, the multi-clock logic BIST controller contains a micro-controller block that generates the scan enable and clock control signals. The controller produces the at-speed test by manipulating the clocks through clock suppression and multiplexing. The key idea behind the proposed scheme is to decouple the shift window from the capture window of a test pattern. It is shown that, unlike in previous methods [10], at-speed testing of the circuit does not require at-speed shifting of the scan chains. In fact, only the events in the capture window are crucial to at-speed testing of the circuit.

Figure 2: Multi-frequency logic BIST controller (PLL outputs sys_clk1 to sys_clk3; BIST controller driving clk1 to clk3 and sen1 to sen3; scan chains SC1 to SC3; PRPG, phase shifter, space compactor, MISR).

The timing diagram in Figure 3 illustrates the partitioning of a test pattern into a shift window and a programmable capture window. The shift window comprises the multiple shift operations required to load/unload the scan chains. These shift operations can be performed at the frequency of any of the three clocks or their sub-multiples. This freedom in selecting the shift frequency provides a trade-off between scan chain design and test application time. In the example timing diagram in Figure 3, the scan chains are shifted at frequency F2 of clock clk2. Note that the memory elements in scan chain SC1 use the faster frequency F1 during functional operation. This frequency is reduced to F2 in the shift window through clock suppression, which in this case suppresses every other pulse of sys_clk1 to generate the slower clock clk1 of frequency F2. Scan chain SC2 is clocked by clk2, which is driven by sys_clk2. Since the frequency of sys_clk2 is F2, no modification of this clock is necessary for the shift window. Finally, scan chain SC3 is driven by the slower clock sys_clk3 in system mode. Clock multiplexing is used to drive clk3 with sys_clk2, of frequency F2, during the shift window; the timing diagram of clk3 shows the effect of multiplexing in the faster clock. The programmable capture window consists of captures in the different clock domains and some shift operations that create inter-domain at-speed captures. The functional clock of each domain is used to obtain a shift followed by a capture. These two consecutive events using the functional clock guarantee that every intra-domain path can be tested at speed; i.e., the time between the launch and capture events is equal to one functional clock period. To test inter-domain paths, at-speed clock edges are also placed appropriately, as shown in Figure 3. The exact placement of clock edges for at-speed test of all nine launch/capture relations is detailed in Table I. Each table cell T[i, j], corresponding to launch clock clki and capture clock clkj, lists the position of the launch pulse of clki followed by the capture pulse of clkj. All positions are expressed in pulses of the fastest clock, sys_clk1. For example, the capture edge for clk3 is the rising edge of the second clock pulse of clk3 in the capture window, which is equivalent to the rising edge of the fifth clock pulse of sys_clk1. As can be seen, in the capture window clock suppression is used to suppress some pulses of clk1 and clk2, while no pulses of clk3 are suppressed. The scan enable signals for each clock domain switch to system mode before their respective capture edges. Since the scan enable signals have to be routed to all the scan cells in the circuit, their design constraints can be relaxed by opting for slow scan enables. In the example timing diagram shown in Figure 3, scan enable Sen1 is designed to be fast, i.e., it has half a cycle of the fastest clock to settle, whereas scan enables Sen2 and Sen3 are designed to be slow, i.e., they have one and a half cycles of the fastest clock to settle.

Figure 3: Multi-frequency logic BIST timing diagram (load-unload window and programmable capture window for vector N; waveforms sys_clk1, clk1, sen1, sys_clk2, clk2, sen2, sys_clk3, clk3, sen3).
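The edge placements of Table I can be checked mechanically. The sketch below reads the table as rows = launch clock and columns = capture clock, with periods of 1, 2, and 4 sys_clk1 pulses for clk1, clk2, and clk3 (F2 = F1/2, F3 = F1/4 as assumed in the text), and models only the first five pulses of the capture window. It derives which pulses each domain clock must pass, which functional pulses the controller suppresses, and whether every capture edge is free of active edges in other domains. The "launch-to-capture spacing equals the shorter period" check is an observation that holds for these nine entries, not a rule stated by the authors.

```python
from typing import Dict, Set, Tuple

# Clock periods in pulses of the fastest clock sys_clk1 (F2 = F1/2, F3 = F1/4).
PERIOD: Dict[str, int] = {"clk1": 1, "clk2": 2, "clk3": 4}

# Table I, read as rows = launch clock and columns = capture clock;
# each entry is (launch pulse, capture pulse) within the capture window.
EDGES: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("clk1", "clk1"): (1, 2), ("clk1", "clk2"): (2, 3), ("clk1", "clk3"): (4, 5),
    ("clk2", "clk1"): (1, 2), ("clk2", "clk2"): (1, 3), ("clk2", "clk3"): (3, 5),
    ("clk3", "clk1"): (1, 2), ("clk3", "clk2"): (1, 3), ("clk3", "clk3"): (1, 5),
}

def active_pulses() -> Dict[str, Set[int]]:
    """sys_clk1 pulse positions at which each domain clock must be let through."""
    pulses: Dict[str, Set[int]] = {clk: set() for clk in PERIOD}
    for (launch, capture), (t_launch, t_capture) in EDGES.items():
        pulses[launch].add(t_launch)
        pulses[capture].add(t_capture)
    return pulses

def pulses_on_grid() -> bool:
    """Every active pulse is a pulse of the domain's own functional clock."""
    return all((t - 1) % PERIOD[clk] == 0
               for clk, ts in active_pulses().items() for t in ts)

def suppressed_pulses(window: int = 5) -> Dict[str, Set[int]]:
    """Pulses of each free-running functional clock (first pulse at 1) that are gated off."""
    active = active_pulses()
    return {clk: set(range(1, window + 1, p)) - active[clk] for clk, p in PERIOD.items()}

def captures_are_skew_safe() -> bool:
    """True if no other domain has an active edge at any capture pulse."""
    active = active_pulses()
    for (_, capture), (_, t_capture) in EDGES.items():
        if any(t_capture in ts for clk, ts in active.items() if clk != capture):
            return False
    return True

def intervals_are_at_speed() -> bool:
    """Launch-to-capture spacing equals the shorter of the two clock periods."""
    return all(t_c - t_l == min(PERIOD[l], PERIOD[c])
               for (l, c), (t_l, t_c) in EDGES.items())

if __name__ == "__main__":
    print(suppressed_pulses())  # {'clk1': {3, 5}, 'clk2': {5}, 'clk3': set()}
    print(pulses_on_grid(), captures_are_skew_safe(), intervals_are_at_speed())
```

The result agrees with the text: some pulses of clk1 and clk2 are suppressed, none of clk3, and each capture edge (pulse 2 for clk1, 3 for clk2, 5 for clk3) occurs when no other domain is clocked, which is what makes the scheme insensitive to inter-domain skew.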
Table I: Edge placement for intra- and inter-domain at-speed test. Each entry gives the launch pulse and capture pulse, counted in pulses of sys_clk1.

Launch clock    Capture clk1    Capture clk2    Capture clk3
clk1            1-2             2-3             4-5
clk2            1-2             1-3             3-5
clk3            1-2             1-3             1-5

One of the important advantages of the programmable capture window is its robustness against clock skew. Figure 3 illustrates that whenever any clock domain is capturing data, the other clock domains do not have an active edge. Thus, unlike in previously proposed solutions, the capture edge is not susceptible to inter-domain clock skew. In addition, the capture window can be programmed to perform multiple captures in each domain as well as to control slow scan enables. Performing multiple captures reduces the risk of delay test invalidation and of false paths, which might occur due to illegal states in the scan chains resulting from filling them with pseudo-random values from the PRPG. Slow scan enables, by providing multiple cycles of the fastest clock for the scan enable signals to settle, reduce the constraints on their design: they no longer need to be routed as clock signals. Note that the programmability of the capture window can also be used to handle a circuit containing multiple clock domains of the same frequency. To generate the appropriate clock control and scan enable signals, the pattern and shift counters have to operate on the fastest clock. Also, unlike in the previous controller, the number of cycles in the shift window, N_SC, is not necessarily equal to the length of the longest scan chain: N_SC depends on the longest effective scan length as determined by the frequency used for shifting the scan chains. For example, when the chains are shifted at F2 = F1/2, each shift takes two fastest-clock cycles, so N_SC is twice the length of the longest chain. Similarly, N_CC, the number of fastest-clock cycles in the capture window, is usually more than one. On completion of scan chain loading, a sequence of events is launched in the capture window to perform at-speed testing of intra- and inter-domain logic. Once a predetermined number of patterns has been applied, the contents of the MISR can, as explained earlier, be scanned out and compared externally or compared with an on-chip golden signature.

III. GENERATING A BIST-READY CIRCUIT
In addition to having scan, a BIST-ready circuit should be random-pattern testable and should have no unknown values propagating to observable points. In this section, these and other barriers to the implementation of logic BIST are discussed, and automated solutions to overcome them are presented.

A. Random-pattern resistance
Logic BIST is generally based on pseudo-random patterns. Most circuits, however, have inherent random-pattern resistance, which results in relatively poor test coverage. To achieve test coverage approaching that of ATPG, control and observe points are added to the circuit to increase its susceptibility to random-pattern testing.

Control points
A control point is inserted on a signal that has a very high probability of being logic 0 or 1 if this predominant value causes poor controllability or observability of a sufficiently large number of faults. Several types of control points are commonly used:
1. A MUX-type control point provides pseudo-random values while blocking the propagation of any faults through it. For that reason it is used primarily for bounding logic that generates X (unknown) states. The MUX is typically driven by either an existing scan cell or a dedicated scan cell for the control point. Driving control points from existing scan cells reduces the area overhead, since only one MUX is used per control point. It also decreases routing overhead, since a nearby scan cell can drive the control point.
2. An XOR control point can be controlled by a source of pseudo-random patterns in test mode, so that random-pattern resistance is reduced without blocking faults propagating to the control point site. An AND or OR gate can be used similarly.
3. AND or OR gates can be inserted to force a constant 0 or 1, respectively, when activated. The value opposite to the predominant value on the signal is forced. This type of control point has low area overhead, but it can block other faults from propagating if activated for the entire test session.

Observe points
Observe points can be added to enhance the circuit's random-pattern testability by making more nodes in the circuit easily observable. Although observe points are less intrusive than control points, they can increase the capacitive load on the driving gates. Observe points can be merged through XOR trees before being connected to scan cells, to reduce the number of scan cells required and hence the hardware overhead. Special care, however, must be taken to merge only observe points that are close together in the circuit. Merging observe points in different blocks can lead to long interconnects, which may cause timing problems in at-speed testing. Furthermore, the number of observe points merged should be limited to reduce the possible loss of test coverage due to fault masking.
An attractive solution to both of those issues is to connect observe points to existing nearby scan cells rather than create new scan cells. The observe points are connected to the scan cell such that, during capture, the functional input is XORed with the observe points connected to that scan cell, as illustrated in Figure 4. This method reduces the hardware overhead, since no new scan cells need to be added. Note, however, that an additional multiplexer is added along the functional path.

Figure 4: Connecting observe points to existing scan cells (functional input, inputs from observe points through additional logic, BIST_mode, Scan_enable, and the input from the previous scan cell feed the scan cell FF).

Test point selection
The method used for test point selection is Multi-phase Test Point Insertion (MTPI) [8]. MTPI divides the test into multiple phases, asserting a subset of the control points in each phase. The control points are AND or OR control points, which force a constant 0 or 1, respectively, when activated. Each control point requires one gate and is controlled from the BIST controller, so the hardware overhead is minimal. Although control points force a constant value, they are only asserted in certain phases, so they do not block fault propagation for the entire test session. In addition to the low area overhead of MTPI control points, typically fewer of them are required than when test points are selected using other methods. MTPI's control points are also unlikely to introduce timing problems in test mode, since they force constant values when activated and the signals driving them change only during the relatively long scan load/unload cycle. Note that inserting control points can affect a circuit's timing, since they add delays along functional paths. However, insertion of control points along critical paths or blocks can be prevented.
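Why MTPI activates control points only in some phases can be seen on a tiny example. The circuit below is hypothetical: z = AND(a0..a7) is almost always 0, it drives w = z AND b, and an OR control point on z forces z to 1 when active. Instead of fault simulating random patterns, the sketch enumerates all 2^9 input patterns and reports the fraction that detect each of two stuck-at faults at output w; this is an illustration of the phase trade-off only, not the MTPI selection algorithm of [8].

```python
from itertools import product
from typing import Optional, Tuple

def circuit(a: Tuple[int, ...], b: int, cp_on: bool, fault: Optional[str] = None) -> int:
    """Hypothetical block: z = AND(a0..a7), OR control point on z, output w = z AND b."""
    a_vals = list(a)
    if fault == "a0/0":        # input a0 stuck-at-0
        a_vals[0] = 0
    if fault == "b/0":         # input b stuck-at-0
        b = 0
    z = int(all(a_vals))
    if cp_on:                  # OR control point activated in this phase: z forced to 1
        z = 1
    return z & b

def detection_probability(fault: str, cp_on: bool) -> float:
    """Fraction of all 2^9 input patterns whose output differs from the fault-free one."""
    patterns = list(product((0, 1), repeat=9))
    hits = sum(circuit(p[:8], p[8], cp_on) != circuit(p[:8], p[8], cp_on, fault)
               for p in patterns)
    return hits / len(patterns)

if __name__ == "__main__":
    for phase, cp_on in (("phase 0, control point off", False),
                         ("phase 1, control point on", True)):
        print(phase, {f: detection_probability(f, cp_on) for f in ("a0/0", "b/0")})
```

With the control point off, both faults are detected by only 1 pattern in 512. Activating it raises the detection probability of b/0 to 0.5 but makes a0/0 undetectable, because the forced value blocks propagation through z. Asserting the control point in one phase and releasing it in another lets each fault be targeted in some phase; in a real design further control points would be added for faults such as a0/0, which this toy does not model.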

B. X generators
An essential requirement for a BIST-ready circuit is that it should not generate any observable unknown states. If an X propagates to the MISR, it corrupts the signature and makes it impossible to distinguish faulty from fault-free circuits. Therefore, test logic must be inserted to suppress unknown states or to prevent them from propagating to an observable point. Typical potential X generators include the following:
1. Non-scan flip-flops (FFs).
2. RAMs and CAMs.
3. Combinational loops.
4. Undriven primary inputs.
5. Bus contention.
6. Violations on wired gates.
Potential X generators can be identified by a design rule checker. Preventing these X sources from propagating to the MISR can be accomplished using several methods, which trade off area overhead against loss of test coverage.

Bounding X generators
After potential X generators have been identified, analysis is performed to determine which of them need to be bounded. An X generator only needs to be bounded if its value can propagate to an observable point, or if an observe point could be inserted such that the X generator becomes observable. A trade-off can be used to avoid bounding X sources that are already blocked at a nearby location. In this case, the X generator will only be observable if an observe point is added between the X source and the locations at which it is blocked, so simply excluding all gates in this region from consideration as observe point candidates eliminates the problem. The user can set the threshold used to decide whether to re-bound a blocked X generator or to exclude its blocked fanout region from consideration for observe points. X generators that must be bounded can be handled by inserting one or more control points before the X can propagate to an observable point. For example, if a non-scan FF has two outputs (Q and Q̄), one control point can be inserted on each output and activated in test mode. Alternatively, if the FF has asynchronous set/reset pins, a control point can be added to force the FF to 0 or 1 during the test. Although a control point can be added to force a constant value, for higher test coverage it is recommended to insert a MUX control point driven by a nearby existing scan cell, as explained in the control points section. This method of X bounding ensures that no Xs will be observed. However, it does not provide a means of observing faults that can only propagate to an observable point through the now-blocked X source, which can result in loss of test coverage. If the number of such faults for a given bounded X generator justifies the cost, one or more observe points can be added before the X source to provide an observation point to which those faults can propagate.

Handling embedded memories
Embedded memories, typically RAMs, can act as X generators. However, bounding their outputs can severely impact test coverage, since faults that only propagate to the RAM will not be testable. This includes faults propagating to the RAM's data inputs as well as to its address and control lines. The preferred method for handling embedded RAMs is to bypass them in test mode. The RAM inputs are connected to scan cells for observation. The inputs can be connected to space compactors (XOR trees) before being connected to the scan cells, to reduce the number of scan cells required. Those same scan cells are used to drive the outputs of the RAMs in test mode.
Therefore, in test mode, the RAM's inputs and outputs become pseudo-primary outputs and pseudo-primary inputs, respectively, of the surrounding logic. This is illustrated in Figure 5. It is assumed that some other DFT methodology, typically memory BIST, is used to test the RAM itself.

Figure 5: RAM bypass (RAM inputs observed by scan cells (SC), which drive the RAM outputs in Test_mode).

If the output multiplexers are not acceptable for timing or introduce unacceptable hardware overhead, an alternative is to freeze the RAM after it is initialized. If the logic BIST session follows a memory BIST run, the RAM can be disabled at the end of the memory BIST session. This forces the outputs of the RAM to constant values throughout the logic BIST run. While this method has low area overhead, faults propagating to the RAM will be blocked if no observe points are inserted on the RAM's inputs. Furthermore, constant values will be applied from the RAM, which can decrease the testability of faults in the logic driven by the RAM. It may also be possible to bypass some RAMs with low hardware overhead and without adding any logic on their inputs and outputs. If the RAM supports pass-through, where the same address is written and read simultaneously, this mode can be used to make the memory transparent. Test logic would be required to force the memory into this mode during the BIST session. The main disadvantage of this method is that, while it allows the data inputs to pass through, faults propagating to the address lines may not be tested. Furthermore, if multiple RAMs operate in this mode, combinational loops may form. It is therefore recommended to use the RAM bypass method discussed above.

C. Handling of primary inputs and outputs
In logic BIST, only the scan chains are controlled and observed by default. Since the tester does not drive the test, it neither drives the primary inputs (PIs) nor observes the primary outputs (POs). If POs are not observed, test coverage is lost, since faults that propagate only to POs and not to scan cells will not be tested. More importantly, PIs must be driven: in addition to the loss of coverage, a floating PI is an X generator. Control points can be added on the PIs to force them to constant values during the BIST session. While this prevents the PIs from generating Xs, coverage may be lost due to the constant values forced. The recommended solution is to use MUX control points and drive the PIs from nearby existing scan cells; PIs are therefore handled exactly like X generators. Only PIs that are directly driven by the BIST controller do not need to be bounded. To observe POs during the BIST session, observe points are used to connect them to scan cells.

IV. PRACTICAL EXPERIENCE WITH LOGIC BIST
A. Background and motivation
This section describes the practical aspects of introducing logic BIST into a department that designs large ASICs. The designs have 200-800K NAND2 gate equivalents of logic plus an equivalent area of RAMs. There are often multiple clock domains, with frequencies ranging from 2.5 MHz to 150 MHz. Register transfer level (RTL) VHDL is the design sourcing language. A simplified diagram of the overall design flow is shown in Figure 6. One of the key design processes is the daily execution of several design flows: design synthesis to gates including scan insertion, RTL simulation regression, and Ikos™ gate-level simulation regression. These are shown highlighted in Figure 6. Daily execution of these flows requires them to run overnight, and therefore in less than 12 hours. The DFT used is nearly full scan, muxed-scan style, with scan insertion performed on the gate-level core netlist. Scan insertion currently takes 1-2 hours of the 12 hours allotted to the flow. While ATPG is important, it is not a critical-path flow; ATPG with pattern compression takes approximately one day. Normally, a 10-20% sample of the ATPG scan patterns is serially simulated on an Ikos™ simulator, which takes approximately 30 hours. Silicon testing uses a suite of parametric tests followed by functional tests and the scan ATPG tests. Stuck-at ATPG grades are approximately 97%, and pseudo stuck-at IDDQ grades are typically 80% with 10 stops. The largest designs have 40K scan cells, and ATPG of these designs generates approximately 5K scan patterns. Assuming that each scan cell generates 3 bits of scan test data per scan pattern, these 5K scan patterns translate to 600 Mb (40K x 5K x 3 bits) of scan test data. These designs have 8 parallel scan chains, and ATPG patterns are applied at 20-50 MHz, giving silicon run times of 1.3 to 0.5 seconds. Scan overhead is approximately 9% extra logic, which translates to a 4% chip area overhead.

Figure 6: Design flow (VHDL RTL, RTL simulation, synthesis, scan insertion, top-level assembly, chip netlist, layout, delay SDF, timing analysis, functional and scan-test gate-level simulation, ATPG, functional test, scan test).

B. BIST implementation
Logic BIST was implemented on a trial basis in four designs. These are large designs with multiple clocks; Table II gives some vital statistics of the designs. The 75 MHz and 150 MHz clocks are generated within the designs. Although there are lower-frequency clocks, all logic works at 75 MHz. For ATPG testing, all clocks run at 50 MHz, and this single-frequency, multi-clock test mode was used for the current logic BIST implementation. ASIC1 clocks run at 125 MHz; the other ASICs' clocks run at 75 MHz. Logic BIST was implemented using the STUMPS architecture described in Section II.

Table II: Design data.
Design                   ASIC1   ASIC2   ASIC3   ASIC4
Core comb. gate count    180K    356K    558K    748K
Number of RAMs           16      10      10      11
75 MHz clocks            0       1       1       1
125 MHz clocks           9       0       0       0
150 MHz clocks           0       1       1       1
25/2.5 MHz clocks        0       0       16      16
125/25 MHz clocks        0       4       2       2
50 MHz clocks            1       0       0       1

Moving from a scan ATPG methodology to a STUMPS logic BIST methodology is a small step. However, additional design work arises as follows:
1. Generation of a BIST controller.
2. Multiplexing and balancing of clocks.
3. Insertion of many short scan chains for use in BIST mode, and the ability to reconfigure them into relatively few long scan chains for use in ATPG mode.
4. Test point insertion.
5. Handling of observable X generators.
6. Bounding module inputs.
7. Fault simulation to measure the fault grade and compute the MISR signature.
8. Timing analysis (TA) and resolution of any test-point-related TA issues.
9. Gate-level timing simulation verification of BIST.
In what follows, issues related to clocks, test point insertion, and X generators are described.

Multi-clock designs require careful balancing of the clock trees. Clock skew within a clock tree and between clock trees must typically be reduced to 0.3 ns at clock speeds of 75 MHz. Such clock control can be achieved with ASIC layout clock tree synthesis (CTS) tools. The clock multiplexing inherent in a multi-clock logic BIST controller is therefore safe as long as the ASIC CTS macros are placed directly on the outputs of the BIST controller clock multiplexers.

MTPI test point insertion uses simple AND or OR gates for control points, driven by the test phase control signal. Selecting gates with sufficient drive ensures correct operation of these static signals. MTPI observe points are implemented as new output signals, new scan cells, or connections into existing scan cells. Each of these three observe point types also has the option of observe point sharing via XOR trees used as space compactors. Observe point signals must operate at speed, so they must be captured close to their source, typically within a 20K-gate region. The sparsity of observe points within 400-800K gate designs is such that observe point sharing would require non-local XOR compaction trees, which would not work at speed. For large designs, observe points must therefore either be connected into local pre-existing scan cells or into new additional local scan cells. The current logic BIST implementation utilized new scan cell observe points for ASIC1, and
Logic BIST was implemented on a trial basis into four designs. These are large designs with multiple clocks. Table II gives some vital statistics of the designs. The 75 MHz and 150 MHz clocks are generated within the designs. Although there are lower frequency clocks, all logic works at 75 MHz. For ATPG testing, all clocks run at 50 MHz and this single-frequency multi-clock test mode was used for the current logic BIST implementation. ASIC1 clocks run at 125 MHz. The other ASICs clocks run at 75 MHz. Logic BIST was implemented using the STUMPS architecture, described in Section II. Moving from a scan ATPG methodology to a STUMPS logic BIST methodol-

Paper 14.2 36 4

observe points connected into pre-existing scan cells (using the XOR/MUX circuit of Figure 4) for the larger ASICs. The RAMs in the designs can shadow up to 3% of the faults, so the RAM bypass mode shown in Figure 5 was used. Using 10 scan cells per RAM leads to a bypass cost of approximately 125 gates per RAM. The other X generators can be removed either by manually altering the source VHDL or by automatic bounding, as described in Section III. Manually fixing X generators in the VHDL source is not practical within the 1-2 person-month resource limit, so automatic X bounding was used. The only practical way to implement logic BIST is to automate the design tasks and combine them with a methodology that minimizes the probability of failing timing analysis and simulation. Our STUMPS implementation therefore used a TAP controller whose RTL was automatically generated. The BIST controller described in Section II was used, and its RTL was also automatically generated. Finally, the automatic multi-phase test point insertion, X bounding, and module input bounding described in Section III were used. This combination allowed complete automation of the logic BIST implementation, freeing the 1-2 person months of resource to handle timing analysis and simulation verification issues.
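Why X generators must be bounded can be illustrated with a small three-valued model of a MISR. The sketch below is a minimal illustration, not the MISR used in these designs. It assumes a 4-bit MISR with a hypothetical feedback tap and made-up response slices; Table III lists the real MISR lengths, and the feedback polynomials are not given here. The names `xor3`, `misr_signature`, and `bound_x` are illustrative only. A single unknown value captured in one cycle keeps circulating through the MISR feedback and leaves the final signature undefined. Forcing the X source to a known value in test mode, which is what bounding does, restores a fully defined signature.

```python
from typing import List, Optional, Sequence

Bit = Optional[int]  # 0, 1, or None for an unknown (X) value


def xor3(a: Bit, b: Bit) -> Bit:
    """Three-valued XOR: any unknown operand makes the result unknown."""
    if a is None or b is None:
        return None
    return a ^ b


def misr_signature(responses: Sequence[Sequence[Bit]], width: int,
                   taps: Sequence[int]) -> List[Bit]:
    """Compact one response slice per clock into a `width`-bit MISR.

    The last stage feeds back into stage 0 and into every stage listed in
    `taps` (hypothetical positions, each >= 1). Each stage also XORs in one
    bit of the current slice, e.g. one STUMPS channel output.
    """
    state: List[Bit] = [0] * width
    for slice_bits in responses:
        feedback = state[-1]
        nxt: List[Bit] = []
        for i in range(width):
            bit = feedback if i == 0 else state[i - 1]
            if i in taps:
                bit = xor3(bit, feedback)
            nxt.append(xor3(bit, slice_bits[i]))
        state = nxt
    return state


def bound_x(responses: Sequence[Sequence[Bit]], fill: int = 0) -> List[List[int]]:
    """Model X bounding: in test mode the X source is replaced by a known value."""
    return [[fill if b is None else b for b in s] for s in responses]


clean = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
with_x = [[1, 0, 1, 1], [0, 1, None, 0], [1, 1, 0, 0], [0, 0, 1, 1]]

print(misr_signature(clean, 4, taps=[2]))
print(misr_signature(with_x, 4, taps=[2]))           # contains unknown bits
print(misr_signature(bound_x(with_x), 4, taps=[2]))  # fully defined again
```

The X never leaves the register. Once it reaches the last stage, it is fed back into stage 0 and the tap stage, so no amount of additional patterns can recover a usable signature.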

C. BIST implementation results
The basic results of the logic BIST implementation are given in Table III. These results represent typical achieved values, not necessarily the optimum possible. Fault grades are quoted for the design core only, but all logic in the design core is counted, including the bounding multiplexors and test point logic. The fault list used is the same as that used in ATPG, so all faults in the core are included. Thus, no credit is given for possibly detected faults, scan enable faults are not implicitly detected, and faults associated with tied logic are included as not detected. As can be seen, BIST fault grades of 95-96% are achievable with approximately 2% logic overhead. BIST fault simulation time is within the goal. However, scan and test point insertion time ranges from 1 to 14 hours, giving an additional RTL-to-gates time of 0.5-12.5 hours versus the goal of 2 hours. This additional time is mainly spent performing test point insertion, which includes fault grading. ATPG is performed on the BISTed cores under the same conditions in which BIST is run. The comparative ATPG grades of 97-98% indicate the expected grade shortfall when using logic BIST. Note that while the BIST silicon run times are within the 1 second goal, they are 2-3 times longer than the ATPG pattern silicon run times. In the future, as ASIC clock frequencies rise to 0.5 GHz and beyond, BIST silicon run times will become shorter than ATPG pattern run times. The logic BIST test can be topped up with ATPG of the residual faults. For these designs, the ATPG top-up pattern volume is 25-65% of a full ATPG test. A breakdown of the BIST overhead for ASIC4 is given in Table IV. The biggest contributors are the observe points and the BIST controller.

Table III: Summary of logic BIST results.

Item                                   ASIC1   ASIC2   ASIC3   ASIC4
Raw netlist gate count                 180K    356K    550K    748K
No. scan cells                         9K      20K     33K     41K
Percent scan (%)                       100     99.8    98.8    99.1
Scan overhead (%)                      8.1     9.2     9.3     8.7
No. BIST scan chains                   80      128     128     120
Bit length of PRPG                     31      31      31      31
Bit length of MISR                     80      80      80      31
No. control points added               50      337     444     592
No. observe points added               70      800     1200    1200
BIST pattern count                     65K     262K    262K    262K
BIST stuck-at fault grade (%)          96.0    95.7    95.3    95.6
BIST gates overhead (%)                3.4     2.6     2.1     1.58
BIST chip area overhead (%)            1.3     1.3     1.0     0.8
Scan + test point insertion time (hr)  0.7     3.3     6.7     14.5
Fault simulation time (hr)             0.9     3.4     5.2     4.0
Ikos™ simulation time (hr)             21      n/a     n/a     n/a
BIST frequency (MHz)                   125     75      75      75
BIST silicon run time (sec)            0.06    0.58    0.93    1.2
ATPG grade (%)                         97.8    97.8    97.2    97.9
ATPG pattern volume (Mb)               24      344     451     828
ATPG top-up pattern volume (Mb)        6       159     293     504
ATPG frequency (MHz)                   50      40      20      50
ATPG silicon run time (sec)            0.02    0.36    0.94    0.7
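The test data volumes and silicon run times quoted in the text and in Table III follow from simple shift-cycle arithmetic. The sketch below is a first-order model under stated assumptions, not the authors' calculation. It uses the text's figure of 3 bits of tester data per scan cell per pattern. It also assumes that each pattern costs one load of the longest chain (unload overlapped with the next load, capture cycles ignored), that chains are balanced, and that "65K" and "262K" patterns mean 2^16 and 2^18. The function names are hypothetical.

```python
import math


def scan_data_volume_bits(scan_cells: int, patterns: int, bits_per_cell: int = 3) -> int:
    """Tester data volume, using the text's assumption of 3 bits per scan cell per pattern."""
    return scan_cells * patterns * bits_per_cell


def scan_run_time_s(scan_cells: int, chains: int, patterns: int, clock_hz: float) -> float:
    """Shift-dominated test time: one load of the longest chain per pattern.

    Capture cycles and the final unload are ignored, and unloading one
    pattern is assumed to overlap loading the next (balanced chains assumed).
    """
    longest_chain = math.ceil(scan_cells / chains)
    return patterns * longest_chain / clock_hz


# ATPG case from the text: 40K cells, 5K patterns, 8 chains, 20-50 MHz.
print(scan_data_volume_bits(40_000, 5_000) / 1e6, "Mb")
print(scan_run_time_s(40_000, 8, 5_000, 20e6), scan_run_time_s(40_000, 8, 5_000, 50e6))

# BIST cases from Table III (scan-cell counts there are rounded).
print(round(scan_run_time_s(9_000, 80, 2**16, 125e6), 2))   # ASIC1
print(round(scan_run_time_s(41_000, 120, 2**18, 75e6), 2))  # ASIC4

# ATPG top-up volume as a fraction of the full ATPG volume (Table III, Mb).
topup = {"ASIC1": (6, 24), "ASIC2": (159, 344), "ASIC3": (293, 451), "ASIC4": (504, 828)}
fractions = {name: t / full for name, (t, full) in topup.items()}
print({name: round(f, 2) for name, f in fractions.items()})
```

The model lands on the 600 Mb and 1.3-0.5 s quoted for ATPG and close to Table III's 0.06 s and 1.2 s for BIST. The top-up fractions reproduce the 25-65% range given in the text. It also shows why BIST is slower on the tester despite its short chains: BIST needs far more patterns (262K versus roughly 5K).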
Table IV: BIST overhead for ASIC4.

BIST component          NAND2 gate equivalents   Overhead as % of scan-inserted netlist
BIST controller         3489                     0.43%
Core input bounding     304                      0.04%
CAM bounding            1246                     0.15%
RAM bounding            1140                     0.14%
X bounding              290                      0.04%
592 control points      1193                     0.15%
1200 observe points     5100                     0.63%
Total                   12762                    1.58%
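The rows of Table IV can be cross-checked in a few lines. The sketch below copies the table values. The scan-inserted netlist size it derives from the 1.58% total is an inference, not a number stated in the paper. The per-test-point costs are simple averages. The sketch confirms that the components sum to the Total row and that the percentage column is consistent with a single netlist size of roughly 808K gate equivalents. It also shows that the two largest contributors are the observe points and the BIST controller, as stated in the text.

```python
# NAND2 gate equivalents per BIST component for ASIC4, copied from Table IV.
overhead_ge = {
    "BIST controller": 3489,
    "Core input bounding": 304,
    "CAM bounding": 1246,
    "RAM bounding": 1140,
    "X bounding": 290,
    "592 control points": 1193,
    "1200 observe points": 5100,
}
# Percentage column of Table IV.
table_pct = {"BIST controller": 0.43, "Core input bounding": 0.04, "CAM bounding": 0.15,
             "RAM bounding": 0.14, "X bounding": 0.04, "592 control points": 0.15,
             "1200 observe points": 0.63}

total_ge = sum(overhead_ge.values())
# Scan-inserted netlist size implied by the 1.58% total (an inference, not a table value).
implied_netlist_ge = total_ge / 0.0158
recomputed_pct = {name: round(100 * ge / implied_netlist_ge, 2)
                  for name, ge in overhead_ge.items()}
shares = {name: ge / total_ge for name, ge in overhead_ge.items()}

print(total_ge, round(implied_netlist_ge))
print(recomputed_pct == table_pct)
print(overhead_ge["1200 observe points"] / 1200)  # average gate equivalents per observe point
print(overhead_ge["592 control points"] / 592)    # average gate equivalents per control point
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:2]:
    print(f"{name}: {share:.0%} of the BIST overhead")
```

An observe point costs about twice as much as a control point here, which is consistent with observe points being implemented as extra scan cells or XOR/MUX logic rather than a single AND or OR gate.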

Sensitivity of the BIST fault grades to the number of added test points is shown in Figure 7 for design ASIC3. The BIST grades rise sharply as control and observe points are added. In the 94-96% region, the grade is relatively insensitive to the number of control points but rises significantly as observe points are added. Sensitivity of the grade to the number of BIST patterns is shown in Figure 8 for design ASIC4. Significant grade increases occur for pattern counts up to 256K patterns and beyond.
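The behaviour in Figures 7 and 8 fits the standard textbook model of random-pattern-resistant faults. This is not an analysis from this paper. The model assumes that a fault is activated only when a k-input AND gate outputs 1. It also assumes independent, equiprobable pseudorandom inputs and that the fault effect is always observed once activated, which is the role of observe points. Under these assumptions, the per-pattern detection probability is 2^-k and the escape probability after N patterns is (1 - 2^-k)^N. The control point scenario below is hypothetical: an OR-type control point forces 8 of the 20 inputs to 1, and it is active in one of four equal MTPI phases. The function names are illustrative.

```python
import math


def and_activation_probability(inputs: int, forced_ones: int = 0) -> float:
    """Chance that a random pattern drives all AND inputs to 1.

    Assumes independent, equiprobable inputs; `forced_ones` inputs are held
    at 1 by an (illustrative) OR-type control point while it is active.
    """
    return 2.0 ** -(inputs - forced_ones)


def escape_probability(p_detect: float, patterns: int) -> float:
    """Probability that a fault with per-pattern detection probability p escapes all patterns."""
    return math.exp(patterns * math.log1p(-p_detect))


def mtpi_escape(inputs: int, forced_ones: int, patterns: int, phases: int) -> float:
    """Escape probability when the control point is active in one of `phases` equal phases."""
    active = patterns // phases
    p_on = and_activation_probability(inputs, forced_ones)
    p_off = and_activation_probability(inputs)
    return escape_probability(p_on, active) * escape_probability(p_off, patterns - active)


# A fault behind a 20-input AND with 65K and 262K patterns (cf. Figure 8).
for n in (2**16, 2**18):
    print(n, round(escape_probability(and_activation_probability(20), n), 3))

# The same fault with the hypothetical control point active in 1 of 4 phases.
print(mtpi_escape(20, 8, 2**18, 4))
```

Without the control point, such a fault most likely escapes even 262K patterns, which is why coverage keeps creeping up with pattern count. With the control point, the escape probability becomes negligible. `math.log1p` is used because p is tiny, and computing `1 - p` directly would lose precision.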

Figure 7: ASIC3 logic BIST grades versus test points (fault grade, 90-97%, plotted against the number of control points and the number of observe points, 0-2400).

Figure 8: ASIC4 logic BIST grades versus patterns (fault grade, 92-96%, versus number of BIST patterns, 0-600,000).

Pre- and post-layout timing analysis of ASIC1 uncovered the following:
1. Paths from the STUMPS channel scan-out points to the MISR had setup violations. These were eliminated by reducing the depth of the XOR space compactor from three XOR levels to one.
2. A small part of the ASIC1 logic did not run at speed. This was not a functional issue, but it meant that special handling was needed to run logic BIST at speed. Paths from two inputs into this logic were slow. The solution adopted was to source these two slow inputs from the MTPI phase control signals during logic BIST. This makes these inputs pseudo-static, at the cost of a small fault grade reduction because they toggle only once during logic BIST.
Notably, the ASIC1 test points gave rise to no timing issues. The ASIC1 Ikos™ simulation time was 21 hours.

This is for a pre- or post-layout gate-level timing simulation of all 65K BIST patterns in full serial mode. Because of this long simulation time, any debug had to be done with a BIST controller set up for just a few patterns. Automatic checking of expected against simulated values at key points in the BISTed design also facilitated debug. The key points are the PRPG state, the STUMPS scan-in points, the STUMPS scan-out points, and the MISR signature. The ASIC1 Ikos™ simulation found one timing issue. At the end of the BIST run, the MISR signature is scanned out through a relatively slow chip pin driver. Driver delay variation between min and max timing conditions made the MISR signature slip by one cycle between these conditions. The solution was to slow the clock to 50 MHz during scan-out of the MISR.

Addressing the design process goals of logic BIST, the new design flow is shown in Figure 9, with the changed components highlighted. From the viewpoint of the main design processes, this new design flow is essentially unchanged. The only change to these main processes is the addition of automatic bounding and test point insertion to the scan insertion step. The DFT engineer, however, has the extra tasks of generating the RTL VHDL of the TAP and BIST controllers and of obtaining satisfactory results from gate-level BIST fault simulation and from functional and timing simulation. Finally, any timing issues with the inserted test points will also result in additional engineering time.
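The automatic key-point checking used to debug the ASIC1 BIST simulation, and the one-cycle MISR slip it exposed, can be sketched as two small comparisons. This is an illustrative sketch only. The paper does not describe how the checks were implemented, and `mismatched_key_points`, `find_slip`, the key-point names, and the 8-bit signature are all hypothetical. The first function reports which key points (such as the PRPG state or MISR signature) disagree with the expected values. The second searches small cycle offsets to tell a genuine signature mismatch from a scan-out that merely arrives late.

```python
from typing import Dict, List, Optional, Sequence


def mismatched_key_points(expected: Dict[str, int], simulated: Dict[str, int]) -> List[str]:
    """Names of key points (e.g. PRPG state, MISR signature) whose simulated value differs."""
    return sorted(name for name, value in expected.items() if simulated.get(name) != value)


def find_slip(expected: Sequence[int], observed: Sequence[int], max_slip: int = 2) -> Optional[int]:
    """Cycle offset at which a serially scanned-out signature matches, or None.

    Offset 0 is an exact match; offset k means the observed stream lags the
    expected one by k clock cycles, as with a slow output pin driver.
    """
    n = len(expected)
    for offset in range(max_slip + 1):
        if list(observed[offset:offset + n]) == list(expected):
            return offset
    return None


expected_sig = [1, 0, 1, 1, 0, 0, 1, 0]
print(find_slip(expected_sig, expected_sig + [0]))  # signature captured on time
print(find_slip(expected_sig, [0] + expected_sig))  # signature captured one cycle late
print(mismatched_key_points({"prpg": 0x5A, "misr": 0x3C}, {"prpg": 0x5A, "misr": 0x1E}))
```

A nonzero offset points to a timing problem on the scan-out path rather than a logic error in the core, which is the distinction that led to the 50 MHz MISR scan-out fix.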

Figure 9: Design flow with logic BIST.

An assessment of the success of the current logic BIST implementations in the designs is presented in Table V. Most of the goals were achieved. However, the run time for the design compile is too long, and work is underway to address this through design partitioning and distributed processing. Ikos™ simulation times are also over the goal, but in retrospect this goal was impractical. Future simulations will mostly be partial simulations, as is our current practice with ATPG pattern Ikos™ simulations. Confidence in using logic BIST is now high enough that it will actually be used in new designs.
Table V: Status of logic BIST goals.

Goal  Description                                         Status
G1    Eliminate tester memory and frequency limitations   Achieved
G2    Provides at-speed scan testing                      Achieved
G3    Works for 1-2 million gate designs                  Expected
G4    BIST stuck-at grade 95%                             Achieved
G5    Logic BIST area 2% of logic                         Achieved
G6    Silicon BIST run time < 1 sec                       Achieved
G7    Effort < 2 person months per design                 Expected
G8    Ability to use ATPG or logic BIST                   Achieved
G9    Minimal impact on design methodology                Achieved
G10   Automation of the logic BIST flow                   Achieved
G11   Additional RTL-to-gates time < 2 hrs                Expected
G12   Logic BIST fault grade < 12 hrs                     Achieved
G13   Logic BIST Ikos™ simulation < 12 hrs                Goal Revised
G14   BIST can be run on very low cost tester             Achieved

V. CONCLUSIONS
In this paper, a practical logic BIST solution for large and complex industrial digital designs has been presented. The challenges in making logic BIST a viable test solution include making a design BIST-ready, achieving high test quality, automating logic BIST, and integrating logic BIST into the overall design flow without impacting the product schedule. Techniques such as automatically identifying and bounding X generators, bypassing RAMs, bounding I/Os, and test point insertion have been proposed and discussed to make a design BIST-ready. The multi-phase test point insertion technique has been used to improve the random pattern testability of the designs and to bring BIST test coverage close to that of ATPG. A novel BIST controller has been proposed to handle at-speed testing of multi-frequency designs. This multi-frequency BIST scheme is designed to test various intra- and inter-clock domain paths at speed, thereby increasing the quality of test, without requiring that scan shifting be performed at speed. The results of implementing the logic BIST solution on four industrial designs have been reported. The solution embodies the techniques described above, and a number of tools have been used to automate the BIST flow. These tools and techniques have made logic BIST a feasible solution for such large and complex industrial designs. The results presented demonstrate that most of the objectives set for logic BIST have been satisfied for the four designs. It has also been shown that logic BIST can be implemented in these designs with low area overhead and high stuck-at fault coverage. Both the test application time and the fault simulation time were shown to be low. Finally, with the use of automation, it has been possible to implement logic BIST without impacting the product schedule. The proposed scheme, together with the implementation experience reported, shows that logic BIST is a viable and acceptable test solution for large industrial designs. Future work will report on practical issues of implementing multi-frequency at-speed logic BIST and on measuring the effectiveness of the logic BIST test.

ACKNOWLEDGEMENTS
The authors would like to thank Theo Powell and the MOS design center engineers of Texas Instruments, as well as Ian Burgess, Ralph Sanchez, and Kelly Scott of Mentor Graphics, for their contributions and support.

REFERENCES
[1] B. Bottoms, "The Third Millennium's Test Dilemma," IEEE Design & Test of Computers, Vol. 15, No. 4, pp. 7-11, Fall 1998.
[2] E. J. McCluskey, "Built-In Self-Test Techniques," IEEE Design & Test of Computers, Vol. 2, No. 2, pp. 21-28, April 1985.
[3] W. Needham and N. Gollakota, "DFT Strategy for Intel Microprocessors," Proc. of International Test Conference, pp. 396-399, 1996.
[4] T. Foote, D. Hoffman, W. Houtt and M. Kusko, "Testing the 400-MHz IBM Generation-4 CMOS Chip," Proc. of International Test Conference, pp. 106-114, 1997.
[5] C.-J. Lin, Y. Zorian and S. Bhawmik, "PSBIST: A Partial-Scan Based Built-In Self-Test Scheme," Proc. of International Test Conference, 1993.
[6] P. H. Bardell, W. H. McAnney, and J. Savir, Built-In Test for VLSI: Pseudorandom Techniques, John Wiley and Sons, New York, 1987.
[7] J. Rajski, N. Tamarapalli, and J. Tyszer, "Automated Synthesis of Large Phase Shifters for Built-In Self-Test," Proc. of International Test Conference, pp. 1047-1056, 1998.
[8] N. Tamarapalli and J. Rajski, "Constructive Multi-Phase Test Point Insertion for Scan-Based BIST," Proc. of International Test Conference, pp. 649-658, 1996.
[9] A. Hassan, J. Rajski, R. Thompson and N. Tamarapalli, "Method and Apparatus for At-Speed Testing of Digital Circuits," US patent pending.
[10] B. Nadeau-Dostie, D. Burek and A. Hassan, "ScanBIST: A Multifrequency Scan-Based BIST Method," IEEE Design & Test of Computers, Vol. 11, No. 1, pp. 7-17, Spring 1994.
