
Ten Commandments of Scan Design
Ken Jaramillo Subbu Meiyappan
VLSI Technology, Inc. (A subsidiary of Philips Semiconductors)
ABSTRACT Although scan design methodologies have been around for several years, many companies are just starting to explore them. This is especially true as companies move into the System On a Chip (SOC) arena. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to achieve high fault coverage production tests without using scan techniques. This paper is meant to help those just starting out in scan designs. It provides useful design tips to ensure successful adoption of scan design methodologies within your company or design group. By adhering to these commandments you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage.
1.0 Introduction
Although scan design methodologies have been around for several years, many companies are just starting to explore them. This is especially true as companies move into the System-On-Chip (SOC) arena. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to produce high fault coverage production tests without using scan techniques. This paper is meant to help those just starting out in scan designs. It provides useful design tips to ensure successful adoption of scan design methodologies within your company or design group. By adhering to these commandments you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage.
2.0 Glossary
AMBA: Advanced Microcontroller Bus Architecture. The AMBA specification describes three bus interfaces commonly used in ARM processor based systems: the Advanced High-performance Bus (AHB), the Advanced System Bus (ASB), and the Advanced Peripheral Bus (APB).

ATPG: Automated Test Pattern Generation. The process of generating scan based production test patterns automatically via a CAD tool.

Capture Cycle: The clock cycle during scan mode in which the scan flip flop muxed inputs are switched to select the normal functional inputs rather than the scan inputs. The scan flops are then said to capture functional data after a scan pattern has been shifted into the device under test.

Combinational/Sequential ATPG: The distinction between combinational and sequential scan concerns how the ATPG tool handles scan data during capture cycles. It matters most if you have multiple clock domains and the logic between domains interacts during capture cycles. When this is true you normally stagger your clocks for pattern generation. For example, if you have two clock domains, clk1 and clk2, you'd assert clk1 at time 200 and de-assert it at time 250 while asserting clk2 at time 300 and de-asserting it at time 350. This staggering of clocks is used to avoid any hold time violations between clock domains. Combinational scan tools cannot handle this situation. They assume that capture data from one clock domain does not affect the capture data of another domain. If your ATPG tool does not support sequential scan, and if your design has interaction between the clock domains, then you must tell the tool to generate only one scan clock at a time during capture cycles. This makes the tool's job more difficult because it has to generate more patterns to get the same level of fault coverage it could have gotten had it been able to assert multiple clocks during capture cycles. Sequential scan tools can handle this situation. The tool knows that data from clk1 changes during the capture cycle and that this in turn changes the capture data of some flops in the clk2 scan chain.
SNUG San Jose 2000
Fault Grading: The process of determining what percentage of manufacturing faults can be detected within a chip by a set of test patterns.

Production Testing: The process of verifying that a chip was manufactured correctly. This is done by creating a set of test vectors that are run on a tester which tests packaged parts.

Sequential ATPG: See the definition of Combinational/Sequential ATPG.

Shift Mode: The clock cycles during scan mode in which the scan flip flop muxed inputs are switched to select the scan inputs rather than the normal functional inputs. This allows us to shift in the current test pattern prior to performing a capture cycle.
3.0 Why is Production Testing Important?
Designers create functional simulations to verify the proper operation of their design. For example, a designer of a memory controller creates simulations to verify that the design operates correctly within the system. This is fine in the virtual world where the chip is only bits and pieces of HDL coding, but what about when the chip is actually manufactured? How do we verify that the chip was manufactured correctly? The answer is production testing. Production tests are used to verify that the design was manufactured correctly and is free from flaws such as power and ground shorts, open interconnections due to dust particles, and other types of manufacturing flaws. In short, production testing ensures that high quality parts with low failure rates are delivered to customers.
4.0 Why is Scan Important?
In the past, functional simulations were used to generate test vectors, which in turn would be used to verify newly manufactured chips on a tester. But because of the high gate counts and extreme complexity of today's SOC designs, these production test techniques are quickly running out of steam. Functional simulations are still used to verify the operation of designs, but it is becoming increasingly difficult to produce enough simulations to provide high fault coverage. For example, consider a 500,000 gate design containing an embedded ARM processor, an embedded OAK DSP, complex memory controllers, and several high gate count peripherals such as Firewire, USB, Ethernet, etc. Although design teams can produce enough functional simulations to verify the chip, these functional simulations would probably provide only about 80% fault coverage if they were used to produce production test vectors. Creating the additional simulations needed to reach high fault coverage (say 95%) would be extremely difficult and time consuming. With the advent of scan design techniques and Automated Test Pattern Generation (ATPG) tools, however, we can take this same design and quickly produce several thousand production test vectors which provide high fault coverage. The use of scan design techniques simplifies the problem of test pattern generation by reducing the design, or sections of a design, into purely combinational logic. This allows the designer to use fast and efficient ATPG algorithms
developed for combinational logic (provided by the ATPG tool) to generate high fault coverage vectors.
5.0 What is Scan?
Before we go into the concept of scan, let's review what we're trying to do. We want to verify that our chip was manufactured correctly. Consider the simple circuit below (Figure 1). If we want to verify that node A is not stuck at 0 (due to a manufacturing flaw that shorts it to ground) we can create the vector (A=1, B=0, C=0). Setting B and C low allows A to directly control the output at D. If A is stuck at 0 then D will be 0 no matter what value is driven at A. Similarly we can create other vectors to verify that each node is not stuck high or low. These are simple test vectors that can be manually created without much effort. But what if the design is more complicated? What if it contains thousands of flip flops and hundreds of thousands of combinational logic gates?
Figure 1 Example Circuit to Verify (inputs A, B, C; output D; node A marked stuck at 0)
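To make the stuck-at idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the circuit of Figure 1 behaves like a three-input OR gate (D = A OR B OR C); the actual gate structure is not given in the text, and all function names are hypothetical. The sketch injects a single stuck-at fault, checks whether a vector exposes it, and computes the fraction of faults a set of vectors detects, which is the fault grading idea from the glossary.

```python
# Assumption: D = A OR B OR C stands in for the circuit of Figure 1.
NODES = ("A", "B", "C", "D")


def circuit(a, b, c, fault=None):
    """Return D for inputs (a, b, c); fault is None or (node, stuck_value)."""
    def force(node, value):
        # A faulty node ignores its driven value and reads as the stuck value.
        if fault is not None and fault[0] == node:
            return fault[1]
        return value

    a, b, c = force("A", a), force("B", b), force("C", c)
    return force("D", a | b | c)


def detects(vector, fault):
    """A vector detects a fault when the good and faulty outputs differ."""
    return circuit(*vector) != circuit(*vector, fault=fault)


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by at least one vector."""
    faults = [(node, value) for node in NODES for value in (0, 1)]
    detected = [f for f in faults if any(detects(v, f) for v in vectors)]
    return len(detected) / len(faults)


if __name__ == "__main__":
    print(detects((1, 0, 0), ("A", 0)))  # vector from the text vs. A stuck at 0
    print(fault_coverage([(1, 0, 0)]))
    print(fault_coverage([(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]))
```

Under this assumed circuit, the single vector (A=1, B=0, C=0) exposes only two of the eight single stuck-at faults, while the four-vector set exposes all of them. Hand-writing such vectors stops being practical long before a design reaches thousands of flops.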
The use of scan design techniques allows us to use all of the flip flops in a design as a big shift register during scan testing. That way we can shift patterns into the chip (e.g. to drive the inputs of this simple circuit: A=1, B=0, C=0), capture the functional data resulting from the test pattern into the flops (e.g. to capture the value at D, which is 1), and then shift the results out.

Using internal scan, controllability and observability of internal nodes are increased by connecting storage cells into a long shift register (or scan chain), and by enhancing the logic of these cells to support a scan shift mode in which the contents of the scan chain may be serially loaded and unloaded. In normal operational mode, the scan chain does not affect system operation. When scan mode is selected, however, the outputs of each storage cell in the scan design become primary inputs to the combinational logic (increasing controllability), and the inputs of each scan storage cell allow registering of the outputs of the combinational logic (increasing observability).

ATPG tools are proficient at generating test patterns to provide high fault coverage for combinational logic. Scan gives the tools easy access to all the combinational logic in the design. The figure below (Figure 2) shows a simple circuit without scan circuitry. The circuit contains three flip flops and some combinational logic. This could represent a simple state machine with a registered output.
Figure 2 Design Before Scan Insertion
We can take this same design and add scan to it to create the design in the figure below (Figure 3).
Figure 3 Design After Scan Insertion (signals shown: Functional Inputs, Scan Input, Scan Enable, Scan Output)
In this simple example, we only need to add three additional pins for scan testing: a serial data input Scan Input, a serial data output Scan Output, and a scan mode control pin Scan Enable (the scan clock being shared with the system clock). Often these pads can be combined with system operation pads using multiplexers to reduce the I/O overhead of scan. A serial scan chain has been formed from the Scan Input pad, connecting each flip-flop into a scan register three bits long. The output of the final flip-flop is connected to the Scan Output pad. Note that
each of the flip-flops was replaced with a flip-flop with a muxed input. The Scan Enable signal selects between the normal functional data input (coming from the combinational logic clouds) and the scan data (coming from the Scan Input or the previous flip flop). The figure below (Figure 4) illustrates the timing for scan testing of the circuit.
(Timing diagram. Signals: Scan Clock, Scan Input, Scan Enable, Scan Output. Phases: shift in 1st vector; normal mode; shift out result of 1st vector and shift in 2nd vector.)
s1, s2, s3 = Scan Data for the first test vector
s4, s5, s6 = Scan Data for the second test vector
c1, c2, c3 = Capture Data from the first test vector
Figure 4 Simple Timing Diagram for Scan Operation
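The shift, capture, and unload phases of this timing can also be mimicked in software. The following Python sketch models the three muxed flip flops as a list. The next-state function `example_logic` is a made-up stand-in for the combinational logic clouds, and all function names are hypothetical; it is an illustration of the sequence, not a model of any particular tool.

```python
def shift(chain, scan_in_bits):
    """Shift mode (Scan Enable = 1): one clock per bit.

    chain[0] is the flop fed by Scan Input; chain[-1] drives Scan Output.
    Returns the bits observed at Scan Output, in the order they appear.
    """
    observed = []
    for bit in scan_in_bits:
        observed.append(chain[-1])      # value visible before the clock edge
        chain[:] = [bit] + chain[:-1]   # each flop takes its predecessor's value
    return observed


def capture(chain, logic, primary_inputs):
    """Capture cycle (Scan Enable = 0): one clock loads the functional data."""
    chain[:] = logic(list(chain), primary_inputs)


def example_logic(q, pi):
    # Hypothetical combinational logic; not taken from the paper's figures.
    return [pi[0] ^ q[2], q[0] & pi[1], q[0] | q[1]]


if __name__ == "__main__":
    chain = [0, 0, 0]
    shift(chain, [1, 1, 0])                       # s1, s2, s3
    print("loaded  :", chain)
    capture(chain, example_logic, [1, 1])
    print("captured:", chain)                     # capture data held in the flops
    print("unloaded:", shift(chain, [0, 0, 0]))   # s4..s6 go in as results come out
```

Note that the first bit shifted in ends up in the flop nearest Scan Output, and that the captured values leave the chain in the reverse of their position order, starting with the last flop.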
There are three stages to this sequence:
1. Scan mode is selected (Scan Enable = 1) and data is serially loaded into the scan chain from the Scan Input signal.
2. Once the scan chain has been loaded (one scan clock for each storage cell in the scan chain), normal system mode is selected (Scan Enable = 0), data is applied at the primary inputs of the chip and observed at the primary outputs of the chip, and one system clock is applied. This captures data from the combinational logic elements of the design into the scan storage cells. Notice that the Scan Enable signal is asserted and de-asserted on the falling edge of the clock. This makes timing easier to meet, especially hold time constraints.
3. Finally, scan mode is selected again and the scan clock is used to unload the scan chain through the Scan Output, where data is checked against expected values. While captured data is being shifted out of the scan chain, input data from the next scan test pattern may be loaded.

The next example (Figure 5) shows a circuit using two clock domains. The scan circuitry has already been included. The upper portion of the figure shows clock domain 1 using clk1. It consists of two flip flops and some combinational logic. The bottom portion of the figure shows clock domain 2 using clk2. It consists of three flip flops and some combinational logic. The circuit could be thought of as two state machines with a single signal going between the two for communication, which is synchronized by a single flop. In this example, we include two scan chains, one for each clock domain. The first starts with the Scan Input 1 signal, goes through flops 1 and 2, and is output as Scan Output 1. The second scan chain starts with the Scan Input 2 signal, goes through flops 3, 4, and 5, and is output as Scan Output 2. Notice that there is still only one Scan Enable signal for the circuit even though there are two scan chains. Because there is interaction between the two scan chains (the functional path from flop 1 to flop 5) we'll have to be careful when we assert the clocks during the capture cycle, or else we could end up with a hold time violation at flop 5.
Figure 5 Scan Design with Multiple Clock Domains (flops 1 and 2 on clk1 between Scan Input 1 and Scan Output 1; flops 3, 4, and 5 on clk2 between Scan Input 2 and Scan Output 2; one shared Scan Enable)
The figure below (Figure 6) illustrates the timing for scan testing of this circuit. Similar to the first timing diagram example, there are three stages to this sequence:
1. Scan mode is selected (Scan Enable = 1) and data is serially loaded into the scan chains from the Scan Input 1 and Scan Input 2 inputs. Notice that the scan chains are loaded in parallel. The length of the longest scan chain determines the length of the scan shift operation. Scan chain 2 is 3 flops long, so the scan shift operation takes 3 clocks. But what about scan chain 1, which is only 2 flops long? We still shift 3 data values into this chain, but the first value shifted in is a don't care.
2. Once the scan chains have been loaded, normal mode is selected (Scan Enable = 0), data is applied at the primary inputs of the chip and observed at the primary outputs of the chip, and one system clock is applied. This captures data from the combinational logic elements of the design into the scan storage cells. This capture cycle is a little different from the first example, which had only one clock domain. Because we have two clock domains, and because there is interaction between the two domains, the assertion of the clocks must be staggered during the capture cycle. This prevents any potential timing problems with the data crossing the clock domains. Scan chain 1 is clocked first in the capture cycle (first clock after Scan Enable goes low) to capture data into the clk1 based flops. Scan chain 2 is clocked next (second clock after Scan Enable goes low) to capture data into the clk2 based flops. Note that after capture data is latched into the flops on scan chain 1, the functional inputs of the second scan chain's flops will change (really only the input to flop 5). Only sequential ATPG tools can handle this situation; purely combinational ATPG tools cannot. With a combinational ATPG tool, one would tell the tool to assert only one of the clocks, clk1 or clk2, during each capture cycle. This results in a higher number of patterns to achieve the same level of fault coverage.
3. Finally, scan mode is selected again and the scan clocks are used to unload the scan chains through the scan output pins, where data is checked against expected values. While capture data is being shifted out of the scan chains, input data from the next scan test pattern may be loaded. Note that only the first cycle of the shift is shown.
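The effect of staggering the capture clocks can be sketched in Python. The flop numbering follows Figure 5 and the flop 1 to flop 5 path comes from the text; every other connection in `next_state` is a hypothetical placeholder, as are the function names.

```python
# Flops 1-2 are clocked by clk1, flops 3-5 by clk2 (numbering as in Figure 5).
DOMAIN = {"clk1": (1, 2), "clk2": (3, 4, 5)}


def next_state(q):
    """D input of every flop for the current flop outputs q (dict: flop -> bit)."""
    return {
        1: 1 - q[2],  # hypothetical clk1-domain logic
        2: q[1],      # hypothetical clk1-domain logic
        3: q[5],      # hypothetical clk2-domain logic
        4: q[3],      # hypothetical clk2-domain logic
        5: q[1],      # cross-domain path from flop 1 to flop 5 (from the text)
    }


def capture(q, pulses):
    """Apply capture clock pulses in order; each pulse updates one domain only."""
    q = dict(q)
    for clk in pulses:
        d = next_state(q)  # D inputs settle before this clock edge
        for flop in DOMAIN[clk]:
            q[flop] = d[flop]
    return q


if __name__ == "__main__":
    loaded = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1}
    print(capture(loaded, ["clk1", "clk2"]))  # staggered clocks
    print(capture(loaded, ["clk2"]))          # clk2 only
```

In this sketch flop 5 ends up with a different value in the two runs, because with staggered clocks it samples the value flop 1 has just captured. That dependency is what a sequential ATPG tool has to account for, and what a combinational tool avoids by pulsing only one clock per capture cycle.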
(Timing diagram. Signals: Scan Enable, clk1, Scan Input 1, Scan Output 1, clk2, Scan Input 2, Scan Output 2. Phases: shift in 1st vector; capture cycle; shift out result of 1st vector and shift in 2nd vector.)
s2_1, s2_2, s2_3 = Scan Chain 2 Data for the first test vector
s1_1, s1_2 = Scan Chain 1 Data for the first test vector
c2_1, c2_2, c2_3 = Scan Chain 2 Capture Data for the first test vector
c1_1, c1_2 = Scan Chain 1 Capture Data for the first test vector
s2_4, s2_5, s2_6 = Scan Chain 2 Data for the second test vector
s1_3, s1_4 = Scan Chain 1 Data for the second test vector
Figure 6 Timing Diagram For Scan Operation Including Multiple Scan Chains
5.1 Scan Techniques
All the examples so far have shown scan storage elements as flip flops with muxed inputs. These muxed flip flops are only one type of scan storage element. The other types are clocked scan elements and Level-Sensitive Scan Design (LSSD) elements. Each type of scan element provides its own benefits. Muxed Flip Flop and Clocked Scan techniques are better suited for designs containing edge triggered flip flops. LSSD techniques are normally used on latch-based designs. The type of scan element you decide to use depends on your design and upon your ASIC vendor. This paper focuses on the Muxed Flip Flop technique.
5.1.1 Muxed Flip Flop

A Muxed Flip Flop scan element contains a single D type flip flop with multiplexed inputs that allows selection of either normal functional data or scan input data. The figure below (Figure 7) shows a muxed flip flop scan element. In normal mode (Scan Enable=0) the system data
(functional input) goes through to the flip flop and is registered. In scan mode (Scan Enable=1) scan data goes through to the flip flop and is registered.
Figure 7 Muxed Flip Flop Architecture (muxed flip flop scan element: original D flip flop with a 2:1 input mux; signals: Functional Input, Scan Input, Scan Enable, Clock)
5.1.2 Clocked Scan

Clocked Scan elements are very similar to Muxed Flip Flop elements but use a dedicated test clock, rather than a multiplexer, to register scan data into the flop. During normal operation, the system clock registers the system data (functional input) into the flop. During scan mode, the scan clock registers the scan data into the flop.
Figure 8 Clocked Scan Architecture (clocked scan element: original D flip flop; signals: Functional Input, Scan Input, System Clock, Scan Clock)
5.1.3 LSSD
LSSD uses three independent clocks to capture data into the two latches contained within the scan cell. During normal mode, the master latch uses the system clock to latch system data (functional input) and output it to the normal functional data output path. During scan mode, the two scan clocks control the latching of data through the master and slave latches to generate the scan data output.
Figure 9 LSSD Scan Architecture (LSSD element; labels: Original Latch, Master Latch, Slave Latch; signals: Functional Input, System Clock, Scan Input, Scan Clock 1, Scan Clock 2, Functional Output, Scan Output)
5.1.4 Lock-Up Latches

Scan chains are particularly vulnerable to clock skew problems. There are two main causes of clock skew:
1. The same clock may be used for hundreds or thousands of scan storage cells with no circuitry between them. Logically adjacent storage cells in the scan chain may be physically separated in the layout. Clock skew between successive scan storage cells must be less than the propagation delay between the scan out of the first storage cell and the scan in of the next storage cell. Otherwise data slippage may occur. This means that the data latched into the first scan cell will also be latched into the second scan cell. This is an error, since the second scan cell should have latched the first scan cell's old data rather than its new data. Figure 10 demonstrates this. In the figure, the path delay for the data is less than that of the clock. Because of this, new data at Da passes all the way through to Dd in one clock period. The second flip flop should have latched the old value at Dc (a logic high) rather than the new value.
Figure 10 Clock skew causing data slippage (signals: CLKa, Da, Db, Dc, CLKb, Dd). Path delay of the data is less than that of the clock. This allows the data at Da to pass all the way through to Dd on a single clock edge.
2. You'll see from later sections that scan chains are separated by clock domains. For example, all the flip flops from the clk1 clock domain in the previous example are linked in the same scan chain. Likewise all the flip flops from the clk2 clock domain form a second scan chain. If we want to link these two scan chains together to form a single scan chain, we could have timing problems, because the two clocks are generated by two different clock trees which will introduce some amount of skew between them. We cannot link the two scan chains together unless we handle this clock skew problem. The timing for this scenario would be very much the same as in the previous example, except there would be two separate clocks rather than a single clock.

Lock-up latches are nothing more than transparent latches. They are used to connect two scan storage elements together in a scan chain where excessive clock skew exists. The figure below (Figure 11) illustrates the use of lock-up latches. It contains two flip flops. Flip flop 1 represents the end of the scan chain containing only elements that are in the clk1 clock domain. Flip flop 2 represents the beginning of the scan chain containing only elements that are in the clk2 clock domain. Note that we're not showing the fact that these flops really have multiplexer inputs. The inputs of these flops really represent the scan inputs of the multiplexers. The latch has an active high enable and only becomes transparent when clk1 goes low. It effectively adds a half clock of hold time to the output of flop 1. In this figure we assume that clk1 and clk2 are asserted synchronously, as would be the normal case during scan mode operation. Note that even though they are asserted synchronously, there will still be some amount of clock skew between them, as they are generated from different clock trees.
Figure 11 Lock-up Latch Technique (flop 1 on CLK1, lock-up latch with enable EN, flop 2 on CLK2; signals: CLK1, Da, Db, Dc, EN, Dd, CLK2, De)
The figure above shows lock-up latches being used to connect scan chains from different clock domains. They can just as easily be used to connect the scan segments of various blocks within a chip which, although on the same scan chain, are physically remote from each other on the die. Note that we want to make the latch transparent during the inactive part of the clock. For example, both flops above are triggered on the rising edge of the clock. Therefore we want to make the lock-up latch transparent during the low period of the clock. If the flops were triggered on the falling edge of the clock, we'd want the latch to be transparent when the clock was high.
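The skew condition and the effect of a lock-up latch can be expressed as a small timing check. This Python sketch is a deliberate simplification: it considers a single rising shift edge, ignores the latch's own delay and all setup and hold windows, and uses made-up delay numbers and a hypothetical function name.

```python
def downstream_sample(old, new, data_delay, skew, lockup_half_period=None):
    """Value captured by the downstream flop on one shift clock edge.

    The upstream flop launches `new` at t = 0, and it reaches the downstream
    scan input after `data_delay`. The downstream clock edge arrives at
    t = `skew`. With a lock-up latch (transparent only while the upstream
    clock is low), `new` cannot pass before t = lockup_half_period.
    """
    arrival = data_delay
    if lockup_half_period is not None:
        arrival = max(data_delay, lockup_half_period)
    # Correct shifting requires the downstream flop to sample `old`.
    return new if skew > arrival else old


if __name__ == "__main__":
    # Times in arbitrary units; old value 1, new value 0.
    print(downstream_sample(1, 0, data_delay=0.2, skew=0.5))   # skew > delay
    print(downstream_sample(1, 0, data_delay=0.6, skew=0.5))   # skew < delay
    print(downstream_sample(1, 0, data_delay=0.2, skew=0.5,
                            lockup_half_period=5.0))           # with lock-up latch
```

In the first call the downstream flop picks up the new value, which is the data slippage described above. In the third call the latch holds the old value for half a clock, so the same skew no longer causes slippage as long as it stays below the half period.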
6.0 Ten Commandments of Scan
Now that you know the basics of scan, what are the most important issues to be aware of to guarantee successful adoption of scan techniques within your company or design group?
1. Handle internal tristate busses with care. Avoid bus contention by design.
2. All clocks and asynchronous resets must come from chip pins during scan mode.
3. All scan elements on a scan chain should be in the same clock domain.
4. Know the requirements and limitations of your chip testers.
5. Handle mixing flip flops triggered off different edges of the clock with care.
6. Break all combinational logic feedback loops.
7. Handle all non-scan elements with care.
8. Avoid design practices that lead to non-scannable elements.
9. Handle multiple clock domains with care to avoid potential timing problems.
10. Plan chip level scan issues before you start block level design.
6.1 Commandment #1 - Handle Internal Tristate Busses With Care
Without a doubt, the single biggest hurdle to overcome in SOC designs with respect to ATPG is the proper control of internal tristate bus structures. Here is a simple rule that should always be followed, if possible: do not implement designs with internal tristate bus structures. If this is not possible, then fall back to this position: implement the minimum possible number of internal tristate bus structures, and guarantee by design that there can be no bus contention on any internal bus during scan testing.

There are two control problems that must be taken care of by design. First, the designer should ensure that there will be no contention on the tristate busses during scan shift operations. This can be done automatically during the scan insertion phase by most scan insertion tools. Second, and most importantly, the designer must ensure that there is no possible contention on the internal tristate busses during the capture cycles of scan testing.
With most designs it is possible to generate a scan test pattern that would cause bus contention on some internal busses. Several ATPG tools are intelligent enough to avoid generating patterns that cause bus contention. The problem is that avoiding contention takes much more CPU effort, and depending on the design, the result may be much longer run times, fewer patterns generated, and lower fault coverage. If the ATPG tool is incapable of identifying scan test patterns that create bus contention, and those vectors are used to test the device, then the part may be stressed during production test to the point that it fails on the tester, is damaged, or suffers a shortened life as a result. Therefore avoiding contention on internal tristate busses is very important.

Issues with bus contention in SOC designs can occur at two levels. The first is within a design block that contains multiple drivers to a tristate port. The second is at the chip level, where multiple blocks interface to the same bus. Consider the case of an internal PCI bus structure. In normal operation, only one master can control the bus at a given time. This is guaranteed by the bus arbitration logic via the request/grant pairs. During scan testing, however, the ATPG tool can easily generate test patterns which would turn on multiple requests, grants, and output enable signals for bus transceivers, thus forcing multiple devices onto the tristate bus at once.

Consider the following figure. It represents two blocks which both drive a common internal tristate bus. The figure represents a single bit of the bus. In each block, the output enables for the bus transceivers are controlled by scan flops. The figure shows the last flop in Block A's scan chain driving the first flop in Block B's scan chain. If the ATPG tool generates a pattern which causes both flip flops to shift in values of 0, then we'd have bus contention on this bit of the bus.
Figure 12 Example Bus Contention During Scan Testing (Block A and Block B each contain a scan flop controlling a tristate driver on the shared bus; signals per flop: Functional Input, Scan Input, Scan Enable, Clock)
There are several potential solutions to these types of problems (see Appendix A, Internal PCI Bus Contention Solution), and they are generally simple to devise. The important thing is to recognize that if you use internal busses, you must guarantee by design that bus contention is not possible during scan testing.
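A contention check of this kind is easy to state in code. The Python sketch below treats output enables as active low, matching the example above in which two flops loaded with 0 both drive the bus. The gating function is only one hypothetical way to guarantee a single driver in scan mode; it is not the Appendix A solution, and the block and function names are made up.

```python
from itertools import product


def enabled_drivers(enable_n):
    """Names of drivers whose active-low output enable is asserted (0)."""
    return [name for name, en_n in enable_n.items() if en_n == 0]


def has_contention(enable_n):
    """Contention is possible when more than one driver is enabled."""
    return len(enabled_drivers(enable_n)) > 1


def gate_for_scan(enable_n, priority):
    """Hypothetical scan-mode gating: only the first requesting driver in
    `priority` keeps its enable; all others are forced off (1)."""
    gated, granted = {}, False
    for name in priority:
        drive = enable_n[name] == 0 and not granted
        gated[name] = 0 if drive else 1
        granted = granted or drive
    return gated


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        raw = {"block_a": a, "block_b": b}
        gated = gate_for_scan(raw, ["block_a", "block_b"])
        print(raw, has_contention(raw), gated, has_contention(gated))
```

Enumerating every enable pattern this way shows the point of the commandment: with the raw enables at least one scan pattern produces contention, while with gating in place no pattern can, so the ATPG tool does not have to spend effort avoiding it.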
6.2 Commandment #2 - All Clocks and Asynchronous Resets Must Come From Chip Pins
The next biggest issue when it comes to achieving high fault coverage is to ensure that all clocks and asynchronous resets come from chip pins during scan testing. This allows the ATPG tool to control clocks and resets in the design. Neglecting this fact will cause the ATPG tool to consider each potential scan element that does not have a clock or reset coming from a chip pin to be unscannable. All unscannable cells will be considered unknown during pattern generation resulting in reduced fault coverage. What do we mean by this commandment? Do we mean that all clocks and resets must come directly from pins? No. We mean that the ATPG tool must have total control of scan element clock and reset signals. It must be able to totally control the clocks and be able to de-assert the resets. The following examples demonstrate this commandment.
6.2.1 Must Be Able to Disable Flip Flop Asynchronous Inputs Via Chip Level Reset Pin
The figure below (Figure 13) shows one flop in the scan chain driving the asynchronous set or clear of another flop. This design practice must be avoided. The problem is that as data is being shifted around the scan chain, the second flip flop will be set or cleared depending on the shift data. The ATPG tools cannot produce useful scan patterns if this type of circuit exists. Because of this, the second flop will not be included on the scan chain and will be considered an X during pattern generation, resulting in a loss of fault coverage. If this type of logic exists, then a mux should be inserted in the reset path of the second flip flop which (only during scan test mode) either allows the reset signal to be controlled via a chip pin or disables the reset pin of the flop altogether.
Figure 13 Example: Asynchronous Flip Flop Inputs (a scan flop with Functional Input, Scan Input, Scan Enable, and Clock drives the asynchronous input of a second flop with its own Functional Input and Clock)
The example above showed a flop directly driving the asynchronous input to another flop. To be more generic, we want to avoid any asynchronous flop input that can't be disabled by a chip level reset pin. Therefore, if a flop has its asynchronous reset input tied to the output of some combinational logic (which may have scan flop outputs or even chip level inputs as its inputs) that cannot be disabled by a chip level reset pin, then mux circuitry will have to be inserted just as in the example above. Note that if the offending signals which prevent the flop's asynchronous input from being disabled by a single chip reset pin are themselves chip pins, then we can solve the problem by forcing the ATPG tool to drive these pins to constant values during pattern generation. This is easier than adding mux circuitry.
6.2.2 Must Be Able To Completely Control Flop Clock Inputs Via Chip Level Pin
The figure below (Figure 14) shows one flop in the scan chain driving the clock input of another flop. This design practice must be avoided if possible. The problem is that as data is being shifted around the scan chain, the second flip flop's clock will toggle depending on the shift data. The ATPG tools cannot produce useful scan patterns if this type of circuit exists. Because of this, the second flop will not be included on the scan chain and will be considered an X during pattern generation, resulting in a loss of fault coverage. This type of design will exist for circuits such as clock dividers. Therefore, if this type of logic exists, do one of the following:
1) Insert a mux in the clock path of the second flip flop such that the clock input is tied (only during scan test mode) to one of the scan clocks. Note that since we are introducing logic in the clock path, the clocks between the flops will no longer be considered synchronous. Therefore a lock-up latch should be inserted in the scan chain before and after the second flip flop to avoid any potential hold time problems. If several instances of this circuit exist, you may wish to create a special clock just for this situation that all these flops use during scan test mode. In this case you would only need to place a lock-up latch before the first of these flops and after the last of them.
2) Insert a mux in the asynchronous reset path of the second flip flop such that it is tied active (only during scan test mode), holding the flop in reset. This isn't as effective as the first solution but is better than having the ATPG tool consider the flop an unknown.
[Figure labels: Functional Input, Functional Input, 0/1 mux, Scan Input, Scan Enable, Clock]
Figure 14
Example: Flip Flop Clock Pin Controllability
SNUG San Jose 2000
The example above showed a flop directly driving the clock input to another flop. To be more generic, we want to avoid any clock input that can't be totally controlled by a single chip level clock pin. Therefore if a flop has its clock input tied to the output of some combinational logic (which may have scan flop outputs, chip level inputs, and even chip level clock inputs as its inputs) which cannot be totally controlled by a single chip level clock pin, then mux circuitry will have to be inserted just like the example given above. Note that if the offending signals which prevent the flop's clock input from being totally controlled by a single chip clock pin are themselves chip pins, then we can solve the problem by forcing the ATPG tool to drive these pins to constant values during pattern generation. This is easier than adding mux circuitry.
6.3
Commandment # 3 - All Scan Elements On a Scan Chain Should Be In the Same Clock Domain
There are several factors that determine the number of scan chains in a given design. In general, you want to divide scan chains by clock domain. All flops in a given scan chain should use the exact same clock. But there are factors which might make this selection criterion undesirable.
1) Each scan chain must have its own scan input and scan output pin. The more scan chains you have, the more pins you must set aside for test. If you don't dedicate pins for test, you must dedicate mux logic to mux the scan inputs and outputs with other chip pins.
2) The production tester the chips will be tested on has limitations which affect the number of scan chains a given design can support. See the next commandment for more information.
3) It is generally a good idea to equalize scan chain lengths. Remember that each scan pattern is as long as the longest scan chain. For example, consider a design with 10 scan chains, one chain being 1000 flops long and the rest being 2 flops apiece. Every chain requires 1000 bits per test pattern, even though for the 2-flop chains 998 of those bits are don't care bits and only 2 are real test bits. So it may be wise to break up some of the longest chains into multiple chains.
If you decide to combine scan chains based on different clocks (attempting to equalize the scan chain lengths, reacting to tester limitations, etc.) make sure you place lockup latches in between the scan chains to avoid potential hold time problems. It is also a good idea to include lockup latches between chip level blocks even if they are in the same clock domain. This isn't strictly needed if an accurate static timing analysis has been done, but it makes it much less likely you'll have hold time problems between blocks. This is important because it's pretty common for the bulk of the effort relating to scan to begin after a final netlist has been delivered for place and route of the chip. Static timing scripts for scan paths are typically developed after the scripts to verify functional paths. Adding these lockup latches doesn't add many gates to the design but it avoids potential timing problems that might not be found until late in the design cycle.
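The effect of unequal chain lengths can be checked with a few lines of arithmetic. The sketch below is only an illustration in Python: the function names are made up for this example, and the `balance` helper assumes flops can be moved freely between chains, ignoring the clock domain and lockup latch considerations discussed above.

```python
def shift_cycles_per_pattern(chain_lengths):
    """Each pattern needs as many shift cycles as the longest chain has flops."""
    return max(chain_lengths)


def padding_bits(chain_lengths):
    """Don't care bits shifted per pattern, summed over all chains."""
    longest = max(chain_lengths)
    return sum(longest - length for length in chain_lengths)


def balance(total_flops, num_chains):
    """Split total_flops as evenly as possible across num_chains (hypothetical helper)."""
    base, extra = divmod(total_flops, num_chains)
    return [base + 1 if i < extra else base for i in range(num_chains)]


# The example from the text: one chain of 1000 flops and nine chains of 2 flops.
unbalanced = [1000] + [2] * 9
balanced = balance(sum(unbalanced), len(unbalanced))

print(shift_cycles_per_pattern(unbalanced), padding_bits(unbalanced))
print(shift_cycles_per_pattern(balanced), padding_bits(balanced))
```

With the same 1018 flops and the same 10 scan pins, the balanced arrangement needs roughly a tenth of the shift cycles per pattern, which is why splitting the longest chains is usually worth considering.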
6.4
Commandment # 4 - Know the Requirements and Limitations of Your Chip Testers

You have to know the limitations of your production tester before you can plan an effective strategy for scan. There are two limits that impact test. The first is test time. In general, production tests should be designed to operate in less than three seconds, roughly the cycle time of the device handler. Test times that take longer than three seconds result in excess cost per chip for the extra testing time. The second limit to keep in mind is tester memory. The entire test program must fit in the available memory of the tester. Reload of test memory mid-test is never permitted. Most testers are limited to a certain number of scan chains due to dedicated scan hardware constraints. After this, the limitation is based on total amount of tester memory. A tester with 128 Mbits of memory could be configured as follows:
Max Number of Scan Chains    Available Memory Per Chain
1                            128 Mbits
2                            64 Mbits
4                            32 Mbits
8                            16 Mbits
16                           8 Mbits
32                           4 Mbits
In this example, the tester supports a maximum of 32 scan chains. The number of scan chains chosen determines how much memory you'll have to work with, which directly impacts the number of test patterns you can support. Most testers work on even numbers of scan chains. For example, if a design had 9 scan chains, the available memory per chain would still be 8 Mbits (the 16 chain configuration); the remaining memory would be inaccessible for the scan test. The following is an example of how to determine the number of allowable test patterns you can generate based on the tester's memory limits.
Tester_Memory_Per_Chain > (#Scan_Patterns * Max_Scan_Chain_Length) + Max_Scan_Chain_Length
Tester_Memory_Per_Chain = Total_Tester_Memory / Number_Of_Scanchains
#Scan_Patterns < (Tester_Memory_Per_Chain - Max_Scan_Chain_Length) / Max_Scan_Chain_Length
Number_Of_Scanchains is in multiple of two increments (except if you have only 1 scan chain).
1 Meg of memory = 1,048,576 bits
For example, if the amount of memory available on the tester is 256 Mbits, your design has 8 scan chains, and your longest scan chain is 3000 flops long, then
Tester_Memory_Per_Chain = 256 Mbits / 8 = 32 Mbits = 33,554,432 bits
#Scan_Patterns < (33,554,432 - 3000) / 3000 = 11183.8, so at most 11183 ATPG patterns.
Note that this is one example of calculating tester memory limitations. These calculations depend on the type of tester used. One should consult their test engineering personnel for the details on their particular testers.
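The calculation above is easy to script when exploring different chain counts. The following Python sketch simply restates the formulas given in this section; the function name and its parameters are chosen for this illustration, and, as noted above, a real tester may impose different rules.

```python
BITS_PER_MBIT = 1_048_576  # "1 Meg of memory = 1,048,576 bits"


def max_scan_patterns(total_tester_mbits, num_chains, max_chain_length):
    """Largest pattern count satisfying
    #Scan_Patterns < (Tester_Memory_Per_Chain - Max_Scan_Chain_Length) / Max_Scan_Chain_Length.
    num_chains is the tester's configured chain count (1, 2, 4, ... in the example table).
    """
    memory_per_chain = total_tester_mbits * BITS_PER_MBIT // num_chains
    usable_bits = memory_per_chain - max_chain_length
    # Subtract 1 before dividing so the inequality stays strict
    # even when usable_bits is an exact multiple of the chain length.
    return (usable_bits - 1) // max_chain_length


# Worked example from the text: 256 Mbits, 8 chains, longest chain 3000 flops.
print(max_scan_patterns(256, 8, 3000))
```

Doubling the number of chains halves the memory per chain, but if it also halves the longest chain length the pattern budget stays roughly the same while test time drops.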
6.5
Commandment # 5 - Handle Mixing of Flip Flops Triggered Off Different Edges of the Clock With Care
ATPG tools require that all falling edge triggered flip flops be placed at the front of a scan chain. If a falling edge triggered flip flop were placed after a rising edge triggered flip flop in the scan
chain, then scan data would be clocked through both flip flops in a single clock cycle. This would cause some loss of coverage since the two flops would always have the same scan data value after a shift cycle. How do we handle the fact that several blocks within a chip may have falling edge triggered flip flops? Do we have to place all falling edge triggered flops at the front of the entire scan chain (consisting of multiple chip level blocks)? The answer is no. Whenever a falling edge triggered flip flop follows a rising edge triggered flip flop in a scan chain, a lockup latch must be inserted between the two. The lockup latch will prevent data from shifting through both flip flops in one clock cycle. To avoid having an excessive number of lockup latches, it is still advisable to place all the falling edge triggered flip flops at the beginning of the scan chain for each block. Then we only need lockup latches to be placed between blocks. The figure below illustrates this point. It shows two chip level blocks (Block A and Block B) each containing falling edge triggered flops. The blocks' scan ports are connected together via lockup latches at the chip level.
[Figure labels: Scan Input, Block A, Lockup Latch (EN), Block B, Clock]
Figure 15
Scan Routing of Flip Flops Triggered Off Different Edges of the Clock
Be warned that a few ATPG tools have difficulty handling falling edge triggered flops during capture cycles and may require special commands to inform them how to handle the flops. You will want to investigate how your ATPG tool handles this situation. You may even consider changing the circuit such that these flip flops are triggered off the rising edge of the clock during scan mode rather than off the falling edge. But remember that any modification of the clock inputs of these flops effectively places them in a different clock domain. They will require lockup latches to be placed before and after them in the scan chain.
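The shoot-through problem and the lockup latch fix can be demonstrated with a very small behavioral model. The Python sketch below is an illustration only and makes several simplifying assumptions that are not from the paper: zero gate delay, a clock that rises and then falls once per shift cycle, a scan input that is stable for the whole cycle, and a lockup latch that is transparent while the clock is low. The element codes "R", "F", and "L" are invented for this example.

```python
def shift_cycle(chain, values, scan_in):
    """Advance the chain by one clock cycle (rising edge, then falling edge).
    chain holds "R" (rising edge flop), "F" (falling edge flop), or "L" (lockup latch)."""
    def inputs(vals):
        # What each element sees on its scan input: the previous element's output.
        return [scan_in] + vals[:-1]

    # Rising edge: rising edge flops sample; latches close and hold their value.
    seen = inputs(values)
    values = [seen[i] if kind == "R" else values[i] for i, kind in enumerate(chain)]
    # Falling edge: falling edge flops sample.
    seen = inputs(values)
    values = [seen[i] if kind == "F" else values[i] for i, kind in enumerate(chain)]
    # Clock is low again: latches are transparent and follow their inputs.
    for i, kind in enumerate(chain):
        if kind == "L":
            values[i] = scan_in if i == 0 else values[i - 1]
    return values


def shift(chain, scan_bits):
    """Shift scan_bits in, first bit first; return the final flop values (latches omitted)."""
    values = [0] * len(chain)
    for bit in scan_bits:
        values = shift_cycle(chain, values, bit)
    return [v for kind, v in zip(chain, values) if kind != "L"]


print(shift(["R", "F"], [1, 0]))       # falling edge flop after rising edge flop
print(shift(["R", "L", "F"], [1, 0]))  # same order with a lockup latch between them
print(shift(["F", "R"], [1, 0]))       # falling edge flop placed first
```

In this model the first arrangement always leaves both flops holding the same value, while the other two leave each flop holding its own scan bit, which matches the ordering rule and the lockup latch remedy described above.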
6.6
Commandment # 6 - Break All Combinational Logic Feedback Loops
Designs containing combinational feedback loops have inherent testability problems. Combinational feedback loops may introduce internal logic states into a design that cannot be controlled via scan storage elements. Consider Figure 16. It shows a circuit with three flip flops and a combinational feedback loop from U6 to U3. If the flip flops were initialized to values of U1=0, U2=0, and U4=1, then the output at U6 would be a stable high. If flip flop U2 were to change to a logic high, then the output at U6 would begin to oscillate between 0 and 1. Because of this, ATPG tools cannot predict the operation of the circuit. In order to generate patterns, the ATPG tool would have to break this loop, which would result in a reduction in overall fault coverage. ATPG tools have a few different methods at their disposal for breaking combinational feedback loops. Some are less harmful to fault coverage than others. But all methods result in some loss of coverage. Therefore one should avoid combinational feedback loops whenever
possible. Most ATPG tools will inform the user of all the combinational feedback loops present in the design. If you cannot avoid these feedback loops, then follow these guidelines:
1) Break the feedback loop with an additional flip flop inserted in the feedback path that is in the path only during scan test mode. This modification will result in the highest fault coverage.
2) If you cannot insert a flip flop, then insert a mux in the feedback path that drives a constant value during scan test mode. This will result in lower coverage than the flip flop solution but higher coverage than allowing the tool to break the loop by assuming an unknown value as a result of the loop.
The figure below shows an example circuit with a combinational logic feedback loop. The feedback loop from U6 to U3 is the problem. We must break this loop by inserting a mux in the feedback path which drives either a constant value or the output of an additional scan flip flop. Note that this logic is only active during scan mode. During normal operation the feedback path should be as it was originally designed.
[Figure labels: U1, U2, U3, U4, U5, U6]
Figure 16
Combinational Feedback Loops
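Finding such loops before running ATPG amounts to cycle detection on the netlist graph, with flip flops treated as points where a combinational path ends. The Python sketch below illustrates the idea with a depth-first search. The netlist format, the cell names, and the wiring are hypothetical and are not taken from Figure 16; real tools report loops through their own design rule checks.

```python
def find_combinational_loop(fanout, sequential):
    """Return one loop of combinational cells as a list of names, or None.
    fanout maps each cell to the cells it drives; sequential is the set of flip flops."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(cell):
        state[cell] = ACTIVE
        path.append(cell)
        for driven in fanout.get(cell, ()):
            if driven in sequential:
                continue  # a flip flop ends the combinational path
            if state.get(driven, UNSEEN) == ACTIVE:
                return path[path.index(driven):]  # reached a cell already on the path
            if state.get(driven, UNSEEN) == UNSEEN:
                loop = visit(driven)
                if loop:
                    return loop
        path.pop()
        state[cell] = DONE
        return None

    for cell in fanout:
        if cell not in sequential and state.get(cell, UNSEEN) == UNSEEN:
            loop = visit(cell)
            if loop:
                return loop
    return None


# Hypothetical netlist: three gates in a ring fed by two flops and captured by a third.
flops = {"ff1", "ff2", "ff3"}
looped = {"ff1": ["g1"], "ff2": ["g1"], "g1": ["g2"], "g2": ["g3"],
          "g3": ["g1", "ff3"], "ff3": []}
print(find_combinational_loop(looped, flops))

# Same netlist with a scan-mode flip flop inserted in the feedback path.
fixed = dict(looped, g3=["ff_break", "ff3"], ff_break=["g1"])
print(find_combinational_loop(fixed, flops | {"ff_break"}))
```

The second call models guideline 1) above: once a flip flop sits in the feedback path during scan test mode, no purely combinational cycle remains.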
6.7
Commandment # 7 - Handle All Non-Scan Elements With Care
Scan insertion tools consider all cells that do not have a scan equivalent cell as black boxes and will not insert them into a scan chain. ATPG tools consider sequential cells that are not on scan chains as being black boxes. Therefore we must treat all non-scan sequential elements with care
to avoid loss of fault coverage. Examples of non-scan elements are as follows: latches, RAMs, blocks in the design which do not include scan, etc.
6.7.1 Latches
Latch based designs, while popular for gate and power savings, are not handled optimally by most scan/ATPG tools. ATPG tools are capable of understanding the behavior of latches that are held transparent. The behavior of latches when they are not transparent is usually modeled as driving unknowns. If the latch data is fed into other logic that is then captured into a scanned register, then poor fault coverage could result. In general you want to keep all latches transparent during scan testing, but you should investigate how your scan/ATPG tool handles latches.
6.7.2 RAM
RAM cells have more complex failure modes than the simple stuck-at modes of standard cell logic (flip flops, latches, and combinational logic gates). Because of this, scan techniques are not used to verify RAM circuits during production testing. Instead, a technique known as RAM BIST (Built In Self Test) is used to verify RAM cells. This technique involves writing several patterns into the RAM array to check for the various failure modes of RAM cells. BIST is a well known technique. Refer to the Mentor Graphics ASIC/IC Design-for-Test Process Guide for more information. It provides a good introduction to BIST and to the testing of RAM and ROM memories. Because RAMs are tested via BIST (achieving very high fault coverage), they do not need to be tested and fault graded by ATPG tools. But even when RAM is made fully testable (via BIST logic), a significant reduction in test coverage of the surrounding logic may result (commonly referred to as the shadow effect). Imagine a FIFO array with data being pushed in one side and pulled out on the other. FIFOs such as this are typical in networking designs such as Firewire, communication designs such as Satellite Modem Receive and Transmit Buffers, etc. In these cases, the logic surrounding the FIFO array will not be tested unless special techniques are used to make the FIFO ATPG tool friendly. Figure 17 shows an example of a RAM array (FIFO) used in a networking application. The logic in A and B is responsible for grabbing the data off the network and placing it in system memory. If we don't handle the FIFO carefully we could lose a significant amount of fault coverage because of the FIFO. We need to design the FIFO such that we can observe the outputs from the logic in B and control the inputs to logic A during scan test mode.
[Figure labels: Network (IEEE 1394 Firewire, Ethernet, etc.), Logic needed to take data from network and place in System Memory, FIFO, local memory bus (ex. PCI), System Memory]
Figure 17 Simple example of RAM (FIFO) used in networking design
Several techniques can be used to increase observability of logic immediately before the RAM and increase the controllability of logic immediately after the RAM. Support for these techniques varies considerably between the various ATPG tools. So investigate your ATPG tool's capabilities before you decide how to handle RAM. Consider the following suggestions. Keep in mind that there are all sorts of RAM blocks out there. Some have bi-directional data busses while others have uni-directional data busses. Some are synchronous while others are asynchronous. Which one you use depends on your application and upon your vendor library.
1) Isolate the RAM block by de-asserting its output enable signal during scan mode.
This is one of the easiest solutions. It doesn't add any observability or controllability for the RAM, but it does get the RAM off the bus so that it does not interfere with the other blocks on the data bus. The implementation of this depends on the type of RAM the design uses. Figure 18 shows the logic necessary to isolate a RAM which has a bi-directional bus using the RAM's output enable signal. Figure 19 is similar to Figure 18 except the RAM has separate data in
and data out busses, and these busses are used separately in the design. Figure 20 is similar to Figure 19 in that the RAM has separate data in and data out busses, but this implementation combines the data busses into a single bi-directional data bus.
[Figure labels: design signals oe_n, scantestmode, cs_n, we_n, Addr[m:0], Data[n:0]; RAM (with bidirectional data bus) pins oe_n, cs_n, we_n, Addr, D]
Figure 18 Isolating RAM by de-asserting output enable (RAM with bidirectional data)
[Figure labels: design signals oe_n, scantestmode, cs_n, we_n, Addr[m:0], Data_in[n:0], Data_out[n:0]; RAM (with uni-directional data bus) pins oe_n, cs_n, we_n, Addr, Din, Dout]
Figure 19 Isolating RAM by de-asserting output enable (RAM with uni-directional data)
[Figure labels: design signals cs_n, we_n, Addr[m:0], Data[n:0], oe_n, scantestmode; RAM pins cs_n, we_n, Addr, Din, Dout]
Figure 20 Isolating RAM by de-asserting output enable (RAM with uni-directional data)
2) Isolate the RAM block by inserting a multiplexer to drive the data signals during scan mode.
The values driven may be constant or may be some combination of the input control signals. Note that this is only useful for RAM blocks with unidirectional data busses (read bus and write bus). This is the next step up from simply disabling the RAM as in the previous examples. In this case we at least allow a constant pattern to be driven onto the output data bus. This adds controllability to the logic immediately after the RAM because the ATPG tool can affect the output data of the RAM (although only in a simple way). Figure 21 shows an implementation which uses uni-directional data busses while Figure 22 shows one which uses a bi-directional data bus. Both figures show a RAM block which has unidirectional data busses.
[Figure labels: design signals oe_n, cs_n, we_n, Addr[m:0], Data_in[n:0], Data_out[n:0], scantestmode, Constant Value; RAM pins oe_n, cs_n, we_n, Addr, Din, Dout]
Figure 21 Isolate RAM by driving output data to constant value (uni-directional data busses)
[Figure labels: design signals cs_n, we_n, Addr[m:0], Data[n:0], oe_n, scantestmode, Constant Value, 0/1 mux; RAM pins cs_n, we_n, Addr, Din, Dout]
Figure 22 Isolate RAM by driving output data to constant value (bi-directional data busses)
3) Place the RAM block into a transparent mode during scan test.
In this mode you essentially route data in to data out. Note that this is only useful for RAM blocks with unidirectional data busses (read bus and write bus) and designs which use the busses separately. While the last solution provided some amount of controllability to the logic immediately following the RAM, this solution provides both observability of the logic immediately before the RAM and controllability of the logic immediately after the RAM.
[Figure labels: design signals oe_n, cs_n, we_n, Addr[m:0], Data_in[n:0], Data_out[n:0], scantestmode, 1/0 mux; RAM pins oe_n, cs_n, we_n, Addr, Din, Dout]
Figure 23 Placing RAM in transparent mode
4) Write RAM data prior to scan test and use the RAM contents to generate test data for the surrounding logic during scan test.
To avoid disturbing the RAM contents during scan test, disable the RAM write signal during scan test mode. This method requires the ATPG tool to support a functional RAM model. Some ATPG tools allow for only a partial initialization of the RAM array. This solution adds a lot of controllability to the logic immediately after the RAM but no observability to the logic before the RAM. One of the drawbacks (besides the observability problem) is the length of time that might be required to initialize the RAM array. The need for a functional RAM model is another potential drawback, but many ATPG tools support RAM models, so this may not be an issue. Figure 24 shows the implementation where we must initialize the entire RAM prior to scan testing and then allow the contents of the RAM to test the surrounding logic. Figure 25 shows the implementation where we only need to initialize a single location within the RAM prior to scan testing. This saves time on initialization but will not provide as much controllability as the logic shown in Figure 24. It is feasible to create a solution somewhere in between these two that only requires initialization of a certain number of locations (e.g., the bottom 1K of memory).
[Figure labels: design signals we_n, scantestmode, cs_n, oe_n, Addr[m:0], Data_in[n:0], Data_out[n:0]; RAM pins we_n, cs_n, oe_n, Addr, Din, Dout]
Figure 24 Pre-initialize entire RAM and protect it from being written during scan test
[Figure labels: design signals we_n, scantestmode, cs_n, oe_n, Addr[m:0], constant 0, 0/1 mux, Data_in[n:0], Data_out[n:0]; RAM pins we_n, cs_n, oe_n, Addr, Din, Dout]
Figure 25 Pre-initialize one RAM location and protect it from being written during scan test
5) Leave the RAM as is and let the ATPG tool exercise it functionally to generate logic values to test the surrounding logic during scan test.
This requires the ATPG tool to support a functional RAM model. This solution requires no changes to the hardware and provides the most observability of logic before the RAM and
controllability of logic after the RAM. But it requires an ATPG tool capable of modeling RAMs and a little more effort to learn how to use it. Which solution you choose depends on your design architecture (single bi-directional bus or two uni-directional busses), your timing budget (can you withstand logic added to the data paths), and your ATPG tool (does it support modeling of RAMs). Solution 1) is the easiest but offers no help as far as fault coverage is concerned. Solution 3) adds observability and controllability but adds delay to the data path. Solution 5) also adds observability and controllability but requires a little more from your ATPG tool and the person using it. All solutions (1-5) are valid. Which one you use depends on your situation.
6.7.3 Non-Scanned Blocks

There may be portions of your chip which are not scannable. Examples are older versions of blocks, 3rd party IP, etc. These blocks may be tested with canned test vectors, logic BIST, or other testing methods. But you need to be careful that the lack of scan in these blocks does not hurt the ability to test other blocks in the chip. Multiplexer isolation techniques can be used to separate any non-scan blocks from the scan section of the design during scan mode. Well designed scan and non-scan isolation with appropriate control logic will result in fast and trouble free test program generation with high fault coverage. To increase the fault coverage obtained from scan ATPG software, the isolation circuitry of non-scanned blocks should set the outputs of such blocks to known logic states during scan mode.
6.7.4 Non-Scanned Flip Flops

Examples of these are flops that have no scan equivalent cell and so cannot be included on the scan chain, and flops that were designed into the circuit such that they cannot be placed on a scan chain, for example because of improper generation of their clock or reset inputs (Commandment #2). The first choice should be to fix the reason for the problem. For example, if the flop has no scan equivalent cell in the ASIC library, then change the design if possible such that it can use a scan type flip flop. If there are problems with the generation of the clock or reset inputs, then change the design as per the Commandment #2 recommendations. If the design cannot be modified to fix the reason for the scan problems, then the last resort is as follows. Design access to the preset or clear inputs such that the flip flops are held in known states during scan mode (either hold preset active or clear active). This reduces fault coverage somewhat since it limits the controllability of the input nodes of the flip flop itself and also of the logic downstream from the flip flop.
6.8
Commandment # 8 - Avoid Design Practices Which Lead To Non-Scannable Elements
The ASIC vendor and ASIC library you choose will dictate which types of scan equivalent cells you can design with. Typically an engineer writes HDL code to produce sequential logic without paying much attention to what types of cells are available. This is because most vendor libraries have a rich variety of standard logic cells to choose from. But the scan insertion tools will pick flip flops that come from a subset of this library. Currently there is only one cell that seems to be universally lacking in vendor libraries when it comes to scan: flip flops with both asynchronous set and asynchronous clear inputs. Therefore, avoid designing functionality that requires these
types of cells. There are usually scan flops that have either an asynchronous set or an asynchronous clear, but not both.
6.9
Commandment # 9 - Handle multiple clock domains with care to avoid potential timing problems.
It is extremely important in scan designs to handle multiple clock domains with care. What is considered a clock domain? A clock domain is a grouping of sequential elements all tied to the same clock line. This clock line must have been generated from the same clock tree. If two different flops use clocks that come from the same clock tree but a different branch of the tree, they are still considered within the same clock domain as long as you watch your clock skew carefully. But if one flip flop takes the clock right from the clock tree and another flip flop has to modify the clock via combinational logic, then the two flops are considered to be in two different clock domains. The only way they would be considered within the same clock domain is if clock skew between them (assuming they are next to each other in the scan chain) is watched very carefully. Why is this important? While it's usually pretty easy to meet setup timing requirements during scan testing (due to the slow scan clock frequency), hold timing problems are common. As long as blocks have their scan chains routed internally (meaning that scan chains are routed at a block level and connected at the chip level), hold time problems aren't prevalent for flops in the same clock domain. Timing problems usually occur between blocks (due to the logically adjacent scan flops being physically distant, causing excessive clock skew) and within blocks where some sort of clock gating was performed. Any sort of gating of the clock (gating for power savings, muxing to handle issues with Commandment #2, etc.) introduces skew in the clock line. To avoid potential hold time problems it is suggested that a scan chain only consist of flip flops from the same clock domain. If this is not feasible then you should add lockup latches between the adjacent flops on a scan chain that are in different clock domains.
To avoid having an excessive number of lockup latches and a really confusing scan chain interconnect, it is wise to analyze your designs beforehand to avoid any gating of clocks. If you can't avoid gating of clocks, then attempt to minimize it, and try to come up with a scheme where you localize all of the clock gating such that these flops use the same clock during scan mode (meaning they are within the same clock domain for scan purposes). That way you can place them together and only need lockup latches to be placed before the first flop in this group and after the last one.
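The benefit of grouping gated-clock flops can be seen by counting where lockup latches would be needed along a chain. The Python sketch below is a simple illustration: the chain representation, the flop names, and the domain labels `clk` and `gclk` are all invented for the example, and the rule applied is just the one stated above (a lockup latch between adjacent scan flops that are in different clock domains).

```python
def add_lockup_latches(chain):
    """chain is a list of (flop_name, scan_clock_domain) in scan order.
    Returns the scan order with a "LOCKUP" marker wherever the domain changes."""
    stitched = []
    previous_domain = None
    for name, domain in chain:
        if previous_domain is not None and domain != previous_domain:
            stitched.append("LOCKUP")
        stitched.append(name)
        previous_domain = domain
    return stitched


# Hypothetical chain with gated-clock flops (gclk) scattered among ungated ones (clk).
scattered = [("a", "clk"), ("b", "gclk"), ("c", "clk"), ("d", "gclk"), ("e", "clk")]
# The same flops with the gated ones placed together.
grouped = [("a", "clk"), ("c", "clk"), ("b", "gclk"), ("d", "gclk"), ("e", "clk")]

print(add_lockup_latches(scattered).count("LOCKUP"))
print(add_lockup_latches(grouped).count("LOCKUP"))
print(add_lockup_latches(grouped))
```

Grouping leaves one latch before the first gated flop and one after the last, regardless of how many gated flops are in the group.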
6.10 Commandment # 10 - Plan Chip Level Scan Issues Before You Start Block Level Design
This is really a collection of issues that one needs to think of at the chip level to make sure both chip level and block level scan issues are handled correctly.
1) Route the scan chains at the block level and connect them at the chip level. To avoid potential hold time problems consider using lockup latches at the chip level to hook the scan chains up between blocks.
2) You must either dedicate pins to handle the scan chains (scan enable, scan inputs, scan outputs, scan test mode) or you must design mux logic to mux the scan pins with the normal functional I/O.
3) You must preplan all the various test modes you're going to need and how you're going to get the chip into those modes. You can either use spare pins for this or design in logic such as Test Access Port (TAP) Controllers (see the IEEE 1149.1 specification for more information). This test logic will need to at least generate a scantestmode signal to alert logic in the chip when scan test mode is active. This is used for all the various muxing logic that has been mentioned throughout the paper to bypass non-scan blocks, mux clock signals to clock pins of flip flops, mux reset signals to set/clear pins of flip flops, etc.
4) Buffer the Scan Enable signal to provide maximum scan testing frequency. Remember that Scan Enable is used by every scan flip flop in the chip. If you're not careful you could end up with a large ramp time on this signal. It shouldn't require too much buffering; a few 4x drive buffers in parallel should be capable of driving upwards of 20,000 gates sufficiently to run the scan vectors at 20 MHz. If higher speed or more flops are involved then a little more buffering will be required. So why is this needed? Although 1 MHz is a typical frequency at which scan is run, it may be necessary to run scan much faster (10 MHz, 50 MHz, etc.) in order to have the production test vectors run in a reasonable amount of time. This is dependent on the complexity of the design, how scan friendly it is, and how many scan patterns are required to achieve the desired fault coverage.
5) Handle bi-directional I/O with care. Bi-directional I/Os can cause problems on testers depending on how liberally the ATPG tool chooses to operate them. Frequently the default setting of the ATPG tool allows it to generate vectors in which the bi-directional I/Os change direction as a result of the capture clock. This activity is generally not supported on production testers. To be safe, it is advised that you instruct the ATPG tool to generate scan patterns which do not change the direction of bi-directional I/Os as the result of a capture cycle, or which at least do not cause any contention as a result of the bi-directional I/Os changing to outputs.
7.0
Summary
Although scan design methodologies have been around for several years, many companies are just starting to explore them. With gate counts increasing at an enormous rate, it is becoming increasingly difficult to achieve high fault coverage production tests without using scan techniques. By adhering to these commandments you will be able to produce chips which can easily be processed by current ATPG tools to generate scan based test vectors providing high fault coverage. What can you take from this paper?
Scan is becoming a necessary design methodology to produce high quality chips.
Research scan design methodologies prior to starting a project. Read as much as you can. The references listed in this paper are a great introduction to scan.
Don't just expect the scan expert of the project to learn scan techniques. Get everyone involved. The basic principles of scan design are pretty obvious when you learn them. The more the designers know, the easier it is for them to produce scan friendly designs.
Don't underestimate the amount of time it will take to produce a scan design. If your company is just starting out, you will require a significant amount of time. There will be many pitfalls and unexpected problems. Your managers will want to send the chip to the fab without allowing you much time to simulate the test patterns produced by the ATPG tools. They won't put much emphasis on chip level static timing involving the scan chains. But if you don't run back annotated simulations of the test patterns then don't expect them to work. These simulations will alert you to timing problems, tool problems, and functional problems.
The scan insertion and ATPG tools can produce Design Rule Check (DRC) reports. Read these carefully. They tell you what the tool thinks of your design and any potential problems it sees.
Create hardware solutions for all internal tristate busses. Implementing a solution in hardware, rather than expecting the ATPG tool to check for contention, will yield significant improvements to fault coverage. This is probably the most important of the commandments.
8.0
References
Synopsys Scan Synthesis User Guide. This document is available on-line via the Synopsys Online Documentation (SOLD) system. This is a very good source to learn the basic concepts of scan design.
Mentor Graphics ASIC/IC Design-for-Test Process Guide. This document is also a great source to learn the basic concepts of scan design.
9.0
Appendix A PCI Bus Contention Solution
Please note that VLSI Technology Inc. (a Subsidiary of Philips Semiconductors) has filed a patent application for the specific implementation detailed within this appendix. Also note, however, that there are plenty of other implementations that can be dreamed up which would not infringe on this patent application. One example might be to have external pins decide which PCI device drives the bus at a given time rather than the internal PCI arbiter. Therefore, use this appendix as one example of how to avoid bus contention on internal busses during scan testing.
The internal PCI bus is a source of problems when it comes to Mentor Fastscan and ATPG pattern generation. Fastscan has problems resolving potential bus contention on the PCI bus. Fastscan targets faults for each scan cycle, generates the necessary scan data for each scan cycle to target these faults, then simulates this data to see if it causes bus contention. As a result, Fastscan ends up generating lots of vectors that it can't use due to bus contention. This results in extremely long run times and potentially reduced coverage. The following solution guarantees that the PCI bus will never have bus contention during scan test, thereby reducing Fastscan ATPG generation times and potentially producing better fault coverage.
The solution is to use the PCI bus arbiter to grant the bus to one of the PCI devices. Since the flip flops which are used to generate the bus grants are on the scan chain, Fastscan can force scan data such that the appropriate PCI device drives the PCI bus as desired. This general idea isn't
perfect. What if we have Target only devices which don't use bus grant? What if Fastscan attempts to assert multiple bus grants? The Internal PCI Bus Arbiter acts as the central resource for enabling each internal PCI device's tristate drivers during scan test mode. Any PCI device which has its bus grant asserted during scan test shall drive the PCI bus (AD, CBE, PAR, PERR#, SERR#). This includes PCI Target only devices. Note that bus grant is a new signal that must be added to Target only devices. Note also that these bus grants for Target only devices are new outputs from the arbiter that only function during scan test mode. In the event that a Target only device is selected, the CBE signals will be tristated for the duration of the scan capture cycle (one clock), because a Target only device has no ability to drive the PCI CBE signals. During scan test the PCI Bus Arbiter is responsible for asserting one and only one PCI bus grant. The flip flops in the arbiter responsible for generating PCI bus grants are on the scan chain, so Fastscan can shift data into them to grant the bus to whichever PCI device it desires. But Fastscan may also attempt to assert multiple bus grants. The arbiter must still guarantee that only one PCI device is selected. In the event that no device is selected, the arbiter grants the bus to the default PCI device. This default PCI device is the ASB2IPCI bridge.
Note: This solution assumes that the PCI control signals FRAME#, IRDY#, TRDY#, STOP#, DEVSEL#, REQ#(0:N), and INT(A:D) are not tristated. It assumes some sort of ORing logic is used for these signals.
Figure 26 shows the PCI Bus Arbiter. It only focuses on the modifications to the grant logic. The figure shows an example arbiter with 4 PCI Master devices and 2 target only devices. The "target grant" signals are shown as TGNT_N(1:0). The only additions to the logic of the normal PCI Bus Arbiter are the addition of the flip flops to drive the "target grant" signals and the combinational logic to guarantee that only one grant is asserted during scan test. During normal operation (SCANTESTMODE=0) the PCI bus grants, GNT_N(3:0) are driven straight from the flip flops and the "target grants" are deasserted. During scan test (SCANTESTMODE=1) the GNT_N and TGNT_N outputs are driven from the flip flops (i.e., by Fastscan) unless multiple grants are asserted. If multiple grants are asserted by the flip flops then the combinational logic must choose one of the grants to assert while deasserting all others. If no grants are asserted by the flip flops, then the combinational logic must choose one of the grants to assert (probably the ASB2IPCI bridge grant) while deasserting all others.
[Block diagram; block shown: Normal Arbiter Logic (excluding GNT_N register); signals shown: TGNT_N(1), TGNT_N(0), GNT_N(3), GNT_N(2), GNT_N(1), GNT_N(0), SCANTESTMODE]
Figure 26
PCI Bus Arbiter with Bus Contention Solution Additions
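The grant behavior described for Figure 26 can be modeled in a few lines. The following Python sketch is only a behavioral illustration, not the logic from the figure: the function name `resolve_grants`, the lowest-index-wins choice when several grants are asserted, and the assumption that index 0 is the ASB2IPCI bridge grant are all hypothetical. All grants are treated as active low, as the `_N` suffix suggests.

```python
def resolve_grants(scan_test_mode, gnt_n_ff, tgnt_n_ff, default_index=0):
    """Behavioral model of the modified grant outputs (active low: 0 = granted).

    gnt_n_ff / tgnt_n_ff are the flip flop values for the master grants and
    the "target grants". Returns the (GNT_N, TGNT_N) output tuples.
    default_index is a hypothetical position for the ASB2IPCI bridge grant.
    """
    if not scan_test_mode:
        # Normal operation: master grants come straight from the flops,
        # target grants are held deasserted.
        return tuple(gnt_n_ff), tuple(1 for _ in tgnt_n_ff)

    flops = list(gnt_n_ff) + list(tgnt_n_ff)
    asserted = [i for i, bit in enumerate(flops) if bit == 0]

    # One grant asserted: it passes through unchanged.
    # Several asserted: keep the lowest index (an assumed priority scheme).
    # None asserted: fall back to the default device.
    winner = asserted[0] if asserted else default_index

    outputs = [0 if i == winner else 1 for i in range(len(flops))]
    n_masters = len(gnt_n_ff)
    return tuple(outputs[:n_masters]), tuple(outputs[n_masters:])


if __name__ == "__main__":
    # Scan data asserts GNT_N(1), GNT_N(3) and TGNT_N(0) at the same time.
    print(resolve_grants(True, (1, 0, 1, 0), (0, 1)))
```

Whatever priority scheme is chosen, the property that matters for the ATPG tool is that exactly one output is asserted for every possible flip flop state during scan test.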
Figure 27 shows the typical logic used to generate the output enable for the PCI address/data bus. It only shows one output enable being generated for the entire bus. It is common to have multiple flops generating output enables for different portions of the bus, but this is a simple extension to this example.
[Schematic; signals shown: CR_AD_OE_N, AD(31:0), CR_AD(31), CR_AD(30), CR_AD(0)]
Figure 27
Typical PCI Device AD Output Enable Logic
Figure 28 shows the logic needed by the PCI device to guarantee that we never have bus contention during scan test mode. During normal operation (SCANTESTMODE=0) the PCI device's normal output enable signal, CR_AD_OE_N, is used to enable its output drivers. But during scan test (SCANTESTMODE=1) the grant signal, GNT_N, shall be used to enable the output drivers. This logic assumes that the output enables are active low.
[Schematic; block shown: scan_oe_control module; signals shown: CR_AD_OE_N, GNT_N, SCANTESTMODE, AD(31:0), CR_AD(31), CR_AD(30), CR_AD(0)]
Figure 28
Modifications to PCI Device Output Enable Logic to Prevent Bus Contention
Figure 29 shows the logic if the output enables are active high.
[Schematic; block shown: scan_oe_control module; signals shown: OE, OE_MOD, GNT_N, SCANTESTMODE]
Figure 29
Logic for Active High Output Enable Circuit
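The output enable override described for Figures 28 and 29 amounts to a two-input mux per device. The Python sketch below is a behavioral model under stated assumptions, not the gate-level circuit from the figures: the function names `oe_n_mod` and `oe_mod` are hypothetical, GNT_N is taken to be active low, and signals are modeled as the integers 0 and 1.

```python
def oe_n_mod(scan_test_mode, cr_ad_oe_n, gnt_n):
    """Active-low output enable (the Figure 28 case).

    Normal operation passes the device's own enable through; scan test
    replaces it with the active-low grant from the arbiter.
    """
    return gnt_n if scan_test_mode else cr_ad_oe_n


def oe_mod(scan_test_mode, oe, gnt_n):
    """Active-high output enable (the Figure 29 case).

    The active-low grant is inverted before it replaces the enable.
    """
    return (1 - gnt_n) if scan_test_mode else oe


if __name__ == "__main__":
    # During scan test, two devices whose own enables are both asserted
    # still resolve to a single driver because only one grant is low.
    grants = [0, 1]  # device 0 granted, device 1 not granted
    enables = [oe_n_mod(1, 0, g) for g in grants]
    print(enables)
```

Because every device's drivers follow its grant during scan test, a one-and-only-one guarantee on the grants carries over directly to the bus drivers.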
10.0 Appendix B Simple Case Study VLSI Catalina 7 ASIC

This appendix presents a simple case study so that the reader can get a feel for the complexity of scan design. The case study presented is a chip the authors worked on in the first quarter of 1999. It was the first time scan had been implemented by the authors and the first time scan had been implemented on such a complicated design at the design site. This paper is a result of the experiences and lessons learned on that project, and the desire to teach those within the company the basics of scan design. Figure 30 shows the block diagram of the Catalina 7 chip. This chip contained an ARM7TDMI processor (along with the various components of the ARM's subsystem), cache memory for the ARM, a math coprocessor, high speed memory interfaces to external SDRAM, FLASH, SRAM, and ROM, an on-chip Advanced System Bus (ASB), bridges to both internal and external PCI busses, a plethora of simple peripherals such as GPIO, serial ports, IrDA, interrupt controllers, timers, and a Real Time Clock, a PCI based IDE controller, USB Master and Slave interfaces, and a parallel port (IEEE 1284). The total gate count was approximately 450,000 gates.
[Block diagram; blocks shown: JTAG, ARM7 (.2u), EBIU (Flash & SRAM), ARM-ASB, ASB, VPB, ASB to VPB, SDRAM Control, ASB to ePCI, ASB to iPCI, i-PCI, Arbiter (x2), PLLs, Real Time Clock, GPIO, Interrupt Control, Timer, 16550 UART, 16550 UART w IR, USB Slave, I2C, Bus Mastering IDE, Printer Port #1, USB Master]
Figure 30
Example: VLSI Technology, Inc. Catalina 7 ASIC Design
The design contained over 15 clock domains and three major internal tristate busses (ASB, PCI, and VLSI Peripheral Bus). Although the VPB and ASB bus components were designed for the most part with scan in mind, the PCI bus based components (namely USB and the parallel port) were old legacy blocks that were very scan unfriendly.
Now that you know what the chip looks like and how complex it was, what was done for scan? This can be divided up into three areas: number of scan chains, test logic added to control scan testing, BIST logic.
10.1 Scan Chains

The chip was divided up into 8 scan chains as follows:
1) ARM processor boundary scan chain = 105 scan elements. This boundary scan chain was used as an extra internal scan chain.
2) Internal PCI Bus scan chain #1 = 2762 scan elements
3) Internal PCI Bus scan chain #2 = 3342 scan elements
4) Internal ASB Bus clk1 scan chain = 2217 scan elements
5) Internal ASB Bus clk2 scan chain = 1651 scan elements
6) Internal VPB bus clock scan chain = 2865 scan elements
7) Peripheral Clock scan chain = 1039 scan elements
8) External PCI bus clock scan chain = 1387 scan elements
The first thing you notice about this is that the scan chains are not very evenly matched in length. Although evenly matched chains are recommended in scan design to achieve the smallest number of scan vectors, we wanted to create a methodology which supported extremely quick turnaround times for simple chip modifications such as adding or removing blocks (while keeping the base design intact). With this in mind, we created the scan chains based on clock domain, block level granularity, and tester limitations. Due to tester limitations (our tester had 256 Mbits of scan memory) and rough estimates based on 15 scan chains (based only on clock domains), it came out that we had to limit ourselves to 8 scan chains. We had over 15 clock domains in the chip. The good news was that we only had 10 major clocks in the chip. The other 5 clocks were internally generated clocks within some of the design blocks (especially the USB and printer port blocks). Therefore we mandated that each block must comply with these 10 major clocks when creating its scan chains. If a block had clocks that weren't among the major clocks, it had to create internal scan chains based on its internally generated clocks and connect those scan chains to one of its major scan chains by using lockup latches. For example, the USB Master block itself had about 6 internal clocks, only 2 of which were major clocks (Internal PCI clock and Peripheral clock). Therefore the USB Master block had to create 4 internal scan chains based on its internally generated clocks (each scan chain based on one clock) and hook these scan chains up to one of the two major clock based scan chains via lockup latches. This methodology created 10 scan chains. We knew that two of those scan chains would always be pretty small. Therefore we decided to add them onto the end of the External PCI bus scan chain via lockup latches. That left 8 total scan chains.
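To get a rough feel for what the uneven chain lengths cost, the eight chain lengths listed above can be compared against a perfectly balanced split. The Python sketch below uses the chain lengths from this appendix and the figure of about 4000 vectors reported in the summary. The cycle estimate is a common textbook approximation and an assumption here, not a number from the project: each pattern takes one shift per element of the longest chain plus one capture cycle, and the final unload adds one more pass through the longest chain. The name `shift_cycles` is hypothetical.

```python
from math import ceil

# Scan chain lengths as listed in this appendix (scan elements per chain).
chain_lengths = {
    "ARM boundary scan": 105,
    "Internal PCI #1": 2762,
    "Internal PCI #2": 3342,
    "ASB clk1": 2217,
    "ASB clk2": 1651,
    "VPB": 2865,
    "Peripheral clock": 1039,
    "External PCI": 1387,
}


def shift_cycles(longest_chain, patterns):
    """Approximate tester cycles: shift plus capture per pattern, plus final unload."""
    return patterns * (longest_chain + 1) + longest_chain


total = sum(chain_lengths.values())
longest = max(chain_lengths.values())
balanced = ceil(total / len(chain_lengths))  # ideal length if chains were equal

if __name__ == "__main__":
    patterns = 4000  # "about 4000 vectors", taken from the summary
    print("total scan elements:", total)
    print("longest chain:", longest, "balanced chain:", balanced)
    print("cycles as built:", shift_cycles(longest, patterns))
    print("cycles if balanced:", shift_cycles(balanced, patterns))
```

Under this approximation the longest chain alone sets the shift time, which is why balanced chains are normally recommended and why the imbalance here was a deliberate trade for faster design turnaround.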
10.2 Test Logic To Control Scan Testing

We had to add test logic to control the chip during scan testing. The ARM7TDMI contains a Test Access Port (TAP) controller. The ARM's TAP controller was used to control its boundary scan
chain. Unfortunately the ARM's TAP controller could not be used to control the chip level boundary scan chain. Therefore a primary TAP controller was added to control the chip's boundary scan chain logic and to control the various logic in the chip for scan testing. A special scan test mode command was added to alert the chip when scan testing was active. The scan related pins, such as scan enable, scan data in, scan data out, and all the scan clocks, were multiplexed with other functional pins. Therefore test muxing logic had to be added to get these scan related signals on and off chip using the scan test mode signal from the primary TAP controller. The chip contains two internal PLLs to generate the normal operation system clocks. These PLL outputs had to be multiplexed such that chip level pins could be used to drive scan clocks during scan testing. The chip also contains simple reset circuitry using two reset signals (powergood and power_up_reset) to reset the chip. This circuitry had de-glitching hardware on the powergood and power_up_reset signals, which had to be bypassed during scan testing so that the powergood and power_up_reset pins could drive the internal reset signals directly.
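The test muxing described above follows one pattern throughout: the scan test mode signal selects a directly pin-driven source in place of the normal internal source. The Python sketch below models that pattern for a clock and a reset. It is an illustration only; the function names and the `pll_clk`, `scan_clk_pin`, `deglitched_reset` and `reset_pin` parameters are hypothetical and do not come from the Catalina 7 design.

```python
def internal_clock(scan_test_mode, pll_clk, scan_clk_pin):
    """Clock mux: a chip pin drives the clock tree in scan test, the PLL otherwise."""
    return scan_clk_pin if scan_test_mode else pll_clk


def internal_reset(scan_test_mode, deglitched_reset, reset_pin):
    """Reset mux: the de-glitch circuit is bypassed in scan test."""
    return reset_pin if scan_test_mode else deglitched_reset


if __name__ == "__main__":
    # In scan test the tester owns both the clock and the reset.
    print(internal_clock(1, pll_clk=0, scan_clk_pin=1))
    print(internal_reset(1, deglitched_reset=1, reset_pin=0))
```

The point of both muxes is controllability: the ATPG tool can only generate valid patterns if clocks and resets are driven from pins it controls rather than from free-running or filtered internal sources.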
10.3 BIST
Built In Self Test (BIST) was used to increase the fault coverage of specific blocks within the chip. RAM BIST was used for all the internal memories (Cache, FIFOs, register banks). Software BIST was used to test the Cache memory and hardware BIST was used to test the other memories.
10.4 Summary
Now that you know how complex the chip was and what circuitry was added for scan, what were the results of this effort? This effort required approximately 26 manweeks of effort. Some of this effort came from the simple fact that the chip itself took a while to produce. The schedule could have been reduced by 3 or 4 manweeks had the chip been ahead of schedule. This was also the first time scan had been implemented by the authors and the first time scan had been implemented at the site with designs of any complexity. The ATPG tools (Mentor Fastscan in this case) took approximately 2 days to run, produced about 4000 vectors, and yielded a total fault coverage of over 95% (including BIST). The biggest hurdles to overcome were 1) inexperience and 2) internal tristate busses. The biggest hurdle was lack of experience. There was a huge learning curve involved due to the lack of previous scan experience at the site. With the experience gained on this project, future projects could reduce the schedule to half or less what was required for this project. The second biggest hurdle was the lack of a solution for internal tristate busses. It was known that the ATPG tools could handle bus contention and produce scan patterns that would prevent it. But it was not known how difficult and time consuming it was for the tools to achieve this. When we started the project, we did not have any built in solution to prevent contention on the ASB or PCI busses. At
that time we achieved 77% fault coverage and the tool required about 1 week to run. As soon as we created solutions for bus contention and added them to the design our coverage shot to over 95% and only required about 2 days to generate patterns.
