IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 2, FEBRUARY 2013

A Built-In Repair Analyzer With Optimal Repair Rate for Word-Oriented Memories
Jaeyong Chung, Member, IEEE, Joonsung Park, Member, IEEE, and Jacob A. Abraham, Fellow, IEEE
Abstract: This paper presents a built-in self repair analyzer with the optimal repair rate for memory arrays with redundancy. The proposed method requires only a single test, even in the worst case. By performing the must-repair analysis on the fly during the test, it selectively stores fault addresses, and the final analysis to find a solution is performed on the stored fault addresses. To enumerate all possible solutions, existing techniques use depth-first search using a stack and a finite-state machine. Instead, we propose a new algorithm and its combinational circuit implementation. Since our formulation for the circuit allows us to use the parallel prefix algorithm, it can be configured in various ways to meet area and test time requirements. The total area of our infrastructure is dominated by the number of content addressable memory entries to store the fault addresses, and it only grows quadratically with respect to the number of repair elements. The infrastructure is also extended to support various types of word-oriented memories.

Index Terms: Built-in self repair (BISR), memory test, redundancy allocation, repair analysis, spare allocation.
Fig. 1. Required number of CAM entries and worst-case test sessions of each analyzer for several redundancy configurations.

Manuscript received June 17, 2011; revised October 18, 2011; accepted December 07, 2011. Date of publication January 31, 2012; date of current version January 17, 2013. This work was supported in part by Samsung Electronics Co., Ltd. J. Chung was with the Department of Electrical and Computer Engineering, The University of Texas, Austin, TX 78712 USA. He is now with Synopsys, Inc., Mountain View, CA 94043 USA (e-mail: chung@cerc.utexas.edu). J. Park was with the Department of Electrical and Computer Engineering, The University of Texas, Austin, TX 78712 USA. He is now with Texas Instruments, Inc., Dallas, TX 75243 USA (e-mail: parkjs@cerc.utexas.edu). J. A. Abraham is with the Department of Electrical and Computer Engineering, The University of Texas, Austin, TX 78712 USA (e-mail: jaa@cerc.utexas.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2011.2182217

I. INTRODUCTION

Today's system-on-chip (SoC) environment requires significant changes in testing methodologies for memory arrays. The failure of embedded memories in a SoC is more expensive than that of commodity memories because a relatively large die is wasted. Due to the large die size and the complex fabrication process for combining memories and logic, SoCs suffer from relatively lower yield, necessitating yield optimization techniques [1]. At present, the area occupied by the embedded memories takes more than half of the total area of a typical SoC, and the ratio is expected to keep increasing in the future [2]. The defects are thus likely to affect the functionality of the memory arrays rather than that of logic. In addition, the aggressive design rules make the memory arrays prone to defects [3]. Therefore, the overall SoC yield is dominated by the memory yield, and optimizing the memory yield plays a crucial role in the SoC environment.

To improve the yield, memory arrays are usually equipped with spare elements, and external testers have been used to test the memory arrays and configure the spare elements. However, in the SoC environment, the overall test time increases prohibitively if the test response data from the memory arrays are sent to the external testers. On the other hand, the SoC environment, combined with shrinking technology, allows us more area for on-chip test infrastructure at lower cost than before, which makes feasible a variety of built-in self test (BIST) and built-in self-repair (BISR) techniques for reducing the test time. In accordance with this trend, built-in redundancy allocation (BIRA) approaches have been proposed as part of BISR.

In [4], Kawagoe et al. propose a pioneering BIRA approach, CRESTA. They use parallel sub-analyzers, each of which evaluates a solution candidate. CRESTA has the sub-analyzers for all solution candidates, which provides the optimal repair rate with a single test. The sub-analyzer consists of a row content addressable memory (CAM) with $R_s$ entries ($R_s$ is the number of repair rows) and a column CAM with $C_s$ entries ($C_s$ is the number of repair columns), and CRESTA requires $\binom{R_s+C_s}{R_s}$ sub-analyzers. Since this may not be affordable in memories with many spare elements, subsequent studies have focused on reducing hardware complexity. In [5], the authors provide a formal basis for design of BIRA algorithms from prime repair algorithms, which correspond to the sub-analyzers. They evaluate particular combinations of the sub-analyzers and show that the area overhead can be reduced with slight degradation of the repair rate. In order to lower the

1063-8210/$31.00 2012 IEEE


Fig. 2. Proposed on-chip infrastructure and must repair analyzer details.

hardware complexity and still guarantee the optimal repair rate, another approach is proposed in [6] and [7]. This evaluates each possible solution one by one and thus does not require the parallel sub-analyzers. Such serial implementations may increase the overall test time, but the number of possible solutions is reduced using the must-repair analysis. Also, the hardware complexity for the must-repair analysis is only quadratic with respect to the number of repair elements. However, in this method, repeating the complete test for possible solutions may lead to a large increase in test time [8].

For existing optimal analyzers [4], [6], as well as our analyzer, Fig. 1 shows the number of test sessions and CAM entries required for the repair analysis. The analyzers all use CAM arrays and the area is dominated by the number of CAM entries. With $R_s$ repair rows and $C_s$ repair columns, CRESTA consists of $\binom{R_s+C_s}{R_s}$ sub-analyzers, each of which has a row CAM with $R_s$ entries and a column CAM with $C_s$ entries. Thus the total number of CAM entries is $\binom{R_s+C_s}{R_s}(R_s+C_s)$. In [6], Oehler et al. use two CAMs for the must-repair analysis and each CAM has $2R_sC_s$ entries. Although they did not mention this explicitly, $R_s+C_s$ additional CAM entries are required to store a repair solution and check if the faults are covered by the solution.

The major contributions of this paper can be summarized as follows. Our infrastructure provides the optimal repair rate with a single test as in CRESTA and has the same requirements for the number of CAM entries as [6]. Instead of a stack and a finite-state machine (FSM) used to enumerate all possible solutions in [6], we propose a combinational circuit, which can be configured in various ways to meet the requirements for area and test time. For the fastest configuration, it can generate the next solution candidate in a single cycle. Unlike most repair analysis studies [6], [9], [10], we show that the proposed method can work for word-oriented memories.

The rest of this paper is organized as follows.
In Section II, we introduce word-oriented repairable memories and basic terms for repair analysis. In Section III, we propose an on-chip infrastructure for repair analysis. In Section IV, the on-chip infrastructure is extended for word-oriented memories. In Section V, we explain in detail the combinational circuit that enumerates all possible solutions. Experimental results are presented in Section VI. Finally, Section VII concludes this paper.

II. PRELIMINARIES

In the classical spare allocation problem, we consider a bit-oriented memory array with $R_s$ spare (repair) rows and $C_s$ spare (repair) columns. Any faulty row (column) can be replaced with a spare row (column). A repair solution is a set of at most $R_s$ row addresses and at most $C_s$ column addresses that cover all faults (all faults are on the addresses). If a repair solution exists for a memory array, the memory array is repairable. A repair strategy is a string over the alphabet {R, C} such that R occurs $R_s$ times and C occurs $C_s$ times. Thus there are $\binom{R_s+C_s}{R_s}$ repair strategies. For example, if $R_s = 2$ and $C_s = 2$, the set of all possible repair strategies is {RRCC, RCRC, RCCR, CRRC, CRCR, CCRR}.

Given a sequence of fault addresses, a repair strategy can generate a repair solution candidate, and the solution space consists of one solution candidate per repair strategy. For example, given a sequence of fault addresses

a repair strategy RRCC generates a solution
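The way a repair strategy turns a fault sequence into a solution candidate can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the fault sequence are hypothetical, and each fault that is not yet covered is assumed to consume the next letter of the strategy (R repairs its row, C repairs its column).

```python
from itertools import permutations


def repair_strategies(rs, cs):
    """All distinct strings with rs letters 'R' and cs letters 'C'."""
    return sorted({"".join(p) for p in permutations("R" * rs + "C" * cs)}, reverse=True)


def solution_candidate(strategy, faults):
    """Apply a repair strategy to a sequence of (row, column) fault addresses.

    Returns (rows, cols), or None if the strategy runs out of letters
    before every fault is covered.
    """
    rows, cols = [], []
    letters = iter(strategy)
    for row, col in faults:
        if row in rows or col in cols:
            continue  # already covered by an earlier choice
        letter = next(letters, None)
        if letter is None:
            return None  # no spare element left for this fault
        if letter == "R":
            rows.append(row)
        else:
            cols.append(col)
    return rows, cols


# Hypothetical fault sequence, chosen only for illustration.
faults = [(0, 1), (0, 3), (2, 1), (3, 3), (4, 4)]
strategies = repair_strategies(2, 2)
print(strategies)
print(solution_candidate("RRCC", faults))
print(solution_candidate("CCRR", faults))
```

Different strategies can yield different candidates for the same faults, which is why all strategies may have to be examined before a memory is declared unrepairable.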

The problem of finding a repair solution is known as the constraint bipartite vertex cover (CBVC) problem, which is proven NP-complete in [11].

Definition 1: If a row (column) in a memory array has more faults than the number of spare columns (rows), $C_s$ ($R_s$), the row (column) is a must-repair row (column) [11]. This follows from the fact that if a row (column) has more than $C_s$ ($R_s$) faults and a repair row (column) is not used for the

CHUNG et al.: BUILT-IN REPAIR ANALYZER WITH OPTIMAL REPAIR RATE FOR WORD-ORIENTED MEMORIES


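row (column), the memory array is not repairable. The repair rate and the normalized repair rate are defined as follows [12]:

$$\text{Repair rate} = \frac{\text{number of repaired memories}}{\text{number of total memories}} \tag{1}$$

$$\text{Normalized repair rate} = \frac{\text{number of repaired memories}}{\text{number of repairable memories}} \tag{2}$$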
Fig. 3. Column circuitry of a word-oriented memory of type A.

The 100% normalized repair rate is called the optimal repair rate. The following propositions have been introduced in several places in the literature [6], [12], [19]. Here we state them with new, shorter proofs using the pigeonhole principle.

Lemma 1: If a memory array is repairable and the number of faults on the memory array is greater than $2R_sC_s$, where $R_s$ and $C_s$ are the numbers of spare rows and spare columns, there exists at least one must-repair row or column [19].

Proof: Let $r$ and $c$ be the number of row and column addresses in a repair solution for the memory array, respectively. Also let $F^{R}_i$ and $F^{C}_j$ be sets that contain the faults covered by the $i$th repair row and the $j$th repair column, respectively. Suppose that the must-repair condition is satisfied for none of the repair rows and columns. Then $|F^{R}_i| \le C_s$ for all $i$ and $|F^{C}_j| \le R_s$ for all $j$. Since the memory array is repairable, the total number of faults is

$$\Bigl|\bigcup_{i=1}^{r} F^{R}_i \cup \bigcup_{j=1}^{c} F^{C}_j\Bigr| \le \sum_{i=1}^{r} |F^{R}_i| + \sum_{j=1}^{c} |F^{C}_j| \le r C_s + c R_s \le 2 R_s C_s.$$

Fig. 4. Column circuitry of a word-oriented memory of type B.

This is a contradiction. Therefore, for at least one repair row or column, the must-repair condition is satisfied.

Corollary 1: If the number of faults on a memory array is greater than $2R_sC_s$ and the must-repair condition is satisfied for none of the rows and columns, the memory array is not repairable [12].

Corollary 2: If a memory array is repairable, the number of faults captured in the (unbounded) fault-list is at most $2R_sC_s$ [6], [12].

Proof: Let $r_m$ and $c_m$ be the number of must-repair rows and columns in the memory array, respectively. All faults which are neither on the must-repair rows nor columns should be covered by at most $R_s - r_m$ repair rows and at most $C_s - c_m$ repair columns. The number of these faults is at most $2(R_s - r_m)(C_s - c_m)$ by the same argument as Lemma 1. The fault-list has at most $r_m C_s + c_m R_s$ faults for the must-repair rows and columns. The number of faults captured in the fault-list is therefore at most $2(R_s - r_m)(C_s - c_m) + r_m C_s + c_m R_s \le 2R_sC_s$ since $r_m \le R_s$ and $c_m \le C_s$.

There are various types of word-oriented repairable memories, and they impose different constraints on the spare allocation problem. Since it is difficult to capture all the different types of repairable memories into a generalized model and to

design a universal repair analyzer, we categorize them into three types, which will be called type A, type B, and type C, respectively. Typically, a faulty row is replaced with a spare row, but the way to replace a faulty column varies, and the types are classified accordingly.

Fig. 3 illustrates the column circuitry of a word-oriented memory of type A. In word-oriented memories, the data in a word is usually not placed in adjacent locations due to several issues such as the coupling effect, and the columns associated with the same bit position are clustered together. In type A memories, there are $C_s$ spare column-groups of $w$ columns each, where $w$ is the word size. A group of columns associated with a word is replaced with a spare column-group. In other words, the column replacement is performed on a column group basis. For example, if the first bit line in group 0 is faulty, and it is replaced with the first spare column in group 0, then the first bit lines in the other groups are also replaced with the associated first spare columns, respectively. The spare allocation problem in this type can be reduced to the conventional spare allocation problem for bit-oriented memories [13]. Most repair analyzers in the literature are developed for bit-oriented memories.

In type B memories, a faulty column is replaced with a spare column, but among a group of columns associated with a word, only one column can be replaced. Fig. 4 shows where the restriction comes from. A word-oriented memory of type B has only $C_s$ spare columns unlike that of type A. Each spare column is selected when a programmed column address is accessed. Up to $C_s$ faulty columns can be replaced, but columns that constitute a word cannot be replaced together. An efficient implementation of a word-oriented memory of this type is proposed in [14].

Fig. 5 illustrates a memory of type C, where any faulty column can be replaced with an available spare column without any restriction. Various implementations for this type are proposed in [15]-[17].

In order to generalize the constraint that arises in type B, we define a new term. In a memory, if up to $m$ columns out of those associated with a word can be replaced with spare columns, the memory is $m$ column-per-word replaceable. Thus, memories of type B are 1 column-per-word replaceable.


Fig. 5. Column circuitry of a word-oriented memory of type C.

III. PROPOSED INFRASTRUCTURE

In this section, we propose an on-chip infrastructure for bit-oriented memories. This infrastructure will be extended for word-oriented memories later. Our repair analyzer requires only a single test and provides the optimal repair rate. Our infrastructure does not depend on BIST engines, and we assume that an arbitrary BIST engine tests a memory array and provides fault addresses whenever detected. Our infrastructure adopts the framework of the Kuo-Fuchs algorithm [11] and consists of must-repair analysis and final analysis. The must-repair analysis identifies must-repair rows and columns, and the final analysis searches for a repair solution. The must-repair analysis is performed concurrently with the test, while the final analysis is done after the test is completed.

The must repair analyzer (MRA) is shown in Fig. 2. The MRA consists of a pair of CAMs for fault addresses, called the fault-list, and a pair of CAMs for a repair solution, called the solution record. In the fault-list, each CAM has one extra valid bit for each word, and the valid bits are initialized to 0 in the beginning. Since the CAMs assert 1 at the valid bit position for write and match operations, only written entries can be matched. During the test, if the BIST engine detects a fault, it sends the fault address to the MRA on the fly through BIST_R_DUTAddr and BIST_C_DUTAddr, and continues the test. The row (column) fault address is compared against row (column) CAM entries, and the number of matched entries is efficiently counted by a parallel counter [18]. If the number of matched entries in the row (column) CAM equals the number of spare columns (rows), $C_s$ ($R_s$), the row (column) indicated by the fault address satisfies the must-repair condition and the R_MustRepair (C_MustRepair) signal is asserted. If the fault address triggers neither the row nor the column must-repair condition, the MRA writes the row and column address in the row and column CAMs, respectively.

Due to Corollary 2, we can limit the size of the fault-list to $2R_sC_s$, and if an overflow of the fault-list occurs, the memory array can be determined as unrepairable, and the test can be terminated early. If a particular row or column is identified as must-repair, the row or column address must be part of the solution. Thus the MRA writes the row or column address in the solution record. The L registers are used as valid bits for the solution record and also determine the next available CAM entry. Since a must-repair row and a must-repair column can be identified by a fault at the same time, the MRA should be able to write a pair of row and column addresses simultaneously. Once a row or column address is stored as part of the solution by the must-repair condition, then all solution candidates considered by the SOLVER

include the address, and faults on the address do not affect the final analysis any more. Therefore, such faults do not need to be stored, and we can collect all necessary information for the final analysis during a single test.

Once the test is completed (thus the must-repair analysis is done), the BIST_Done signal is asserted and the final analysis is started. In the final analysis, the SOLVER module controls the MRA. The operation of the SOLVER and the MRA in the final analysis phase is illustrated in Fig. 6. The SOLVER generates repair strategies one by one and checks whether each repair strategy can fix all the faults captured in the fault-list. If R and C are mapped to 1 and 0, respectively, then a repair strategy can be represented by an $(R_s+C_s)$-bit word as shown in Table I. The RepairStrategy module comprises an $(R_s+C_s)$-bit register and stores the repair strategy being tested currently. The first repair strategy is generated depending on the numbers of must-repair rows and columns, or UsedMustRepairRows and UsedMustRepairCols. For example, although $R_s = 2$ and $C_s = 2$, if one repair row and one repair column are used as must-repair, only the two repair strategies RC and CR should be generated, as if $R_s = 1$ and $C_s = 1$. After a repair strategy is tested, the state of the MRA should be reverted to that right after the must-repair analysis, so the values in the L registers are copied to the L_save registers before the final analysis begins.

The SOLVER generates the first repair strategy and the MRA reads each fault address in the fault-list in order until there is no more fault address or the RESTART signal arrives. The MRA checks if each fault is covered by the current solution, stored in the solution record, and asserts R_Covered or C_Covered. If both signals are low, the fault should be covered by a new repair row or column. The SOLVER determines whether a repair row or column is used for the uncovered fault, and asserts R_Insert or C_Insert.
If R_Insert (C_Insert) is high, the fault row (column) address is written in the row (column) CAM of the solution record. If the CAM is full, the memory array cannot be repaired by the current repair strategy, and the SOLVER generates the next repair strategy and asserts the RESTART signal. The next repair strategy can be generated directly from the current repair strategy by a combinational circuit called the K-subset enumerator, which will be explained in detail in Section V. When the RESTART signal becomes high, the MRA restores the initial state, and the next repair strategy starts being evaluated. In this way, the SOLVER explores the solution space and can find a solution if one exists.

In order to test the optimality of a solution, we can define a cost function as in [11]. In our implementation, the cost is defined as the number of used spare elements. The SOLVER has a register to store the cost of the current repair strategy, or UsedRepairElements. The SOLVER also has registers to store the repair strategy with minimum cost so far and the minimum cost, or RepairStrategyOpt and UsedRepairElOpt. The current cost is compared against the minimum cost so far, which generates the Better signal. If the Better signal goes down during the evaluation of the current repair strategy, the SOLVER immediately asserts the RESTART signal and moves on to the next repair strategy. If the Better signal stays at 1 until the end of the evaluation, the SOLVER saves the current repair strategy and its cost.
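The two phases described in this section can be modelled in software roughly as below. This is a behavioural sketch under stated assumptions, not the hardware: Python lists stand in for the CAMs, the function names are hypothetical, strategies are enumerated with `itertools.combinations` rather than in the order produced by the K-subset enumerator, a fault on an already recorded must-repair line is assumed to be skipped, and the best solution is stored directly whereas the SOLVER stores only the best strategy.

```python
from itertools import combinations


def must_repair_analysis(faults, rs, cs):
    """Phase 1: consume (row, col) fault addresses as the test produces them."""
    fault_list, must_rows, must_cols = [], [], []
    for row, col in faults:
        if row in must_rows or col in must_cols:
            continue  # assumption: faults on recorded lines are not stored
        row_must = sum(r == row for r, _ in fault_list) == cs
        col_must = sum(c == col for _, c in fault_list) == rs
        if row_must:
            must_rows.append(row)
        if col_must:
            must_cols.append(col)
        if not (row_must or col_must):
            fault_list.append((row, col))
        if len(must_rows) > rs or len(must_cols) > cs or len(fault_list) > 2 * rs * cs:
            return None  # overflow: unrepairable, the test can stop early
    return fault_list, must_rows, must_cols


def final_analysis(fault_list, must_rows, must_cols, rs, cs):
    """Phase 2: evaluate strategies on the stored faults, keep the cheapest."""
    free_rows = rs - len(must_rows)
    n = free_rows + (cs - len(must_cols))
    best, best_cost = None, None
    for row_slots in combinations(range(n), free_rows):
        rows, cols = list(must_rows), list(must_cols)  # state restored on RESTART
        step, ok = 0, True
        for row, col in fault_list:
            if row in rows or col in cols:
                continue  # R_Covered or C_Covered
            cost = len(rows) + len(cols)
            if step == n or (best_cost is not None and cost + 1 >= best_cost):
                ok = False  # spares exhausted, or the Better signal went low
                break
            if step in row_slots:
                rows.append(row)
            else:
                cols.append(col)
            step += 1
        if ok and (best_cost is None or len(rows) + len(cols) < best_cost):
            best, best_cost = (rows, cols), len(rows) + len(cols)
    return best


# Hypothetical fault stream for a memory with 2 spare rows and 2 spare columns.
stream = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 3), (3, 3), (4, 5)]
state = must_repair_analysis(stream, 2, 2)
print(state)
print(final_analysis(*state, 2, 2))
```

In this example the third fault on row 0 and the third fault on column 3 trigger the must-repair conditions, so only five of the seven faults are stored and only two strategies remain to be evaluated.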


Fig. 6. Solver details and MRA operation in the final analysis phase: If the fault address being read is not covered by the current solution, depending on R_Insert or C_Insert, the row or column fault address is added to the current solution.

TABLE I BIT REPRESENTATIONS OF REPAIR STRATEGIES

Since the SOLVER continues to search for a better solution even after finding a solution, the MRA may not hold the optimal solution after the last repair strategy is evaluated. To reduce area, we store the optimal repair strategy instead of the optimal solution. If the solution were stored directly, the size of the solution record would have to be doubled. Thus we need to recover the solution from the repair strategy, and the SOLVER goes into the recovery phase. In the recovery phase, the optimal repair strategy, stored in RepairStrategyOpt, is loaded into the RepairStrategy register and evaluated again, so that the final analysis ends with the optimal solution.

IV. EXTENSION FOR WORD-ORIENTED MEMORIES

In this section, we will extend the proposed infrastructure for word-oriented memories, which are more common than bit-oriented memories in practice. Unlike the bit-oriented memory case, our infrastructure takes as input from the BIST engine a triplet ($r$, $c$, S), where $r$ ($c$) is the row (column) address, and S is the failure syndrome, which is the exclusive OR of the test response and the expected output of the word at ($r$, $c$). For word-oriented memories of type A, we can discard the failure syndrome and input only the row and column addresses to the proposed infrastructure. Then, without any modification, it will perform repair analysis for type A word-oriented memories.

A. Dealing With Type B

For a word-oriented memory of type B, our repair analyzer is modified as follows. Let $w$ be the word size of the device under test (DUT). We will map the word-oriented memory to

a bit-oriented memory. Since every bit should be addressable in the bit-oriented memory, we expand the width of the column address by $\lceil \log_2 w \rceil$ bits, where $w$ is the word size, to distinguish each column within a word. We call the extended address the virtual column address. In this case, a triplet can generate up to $w$ virtual column addresses for the bit-oriented memory. However, in the case that the number of 1s in S is greater than 1, it is obvious that the row being tested is a must-repair row since the DUT is 1 column-per-word replaceable. Thus, if this case is handled separately, one triplet will generate only one virtual column address. The pair of the incoming row address and the virtual column address is fed into the proposed infrastructure, which will then work with the word-oriented memory of type B.

Let us extend this scheme for an $m$ column-per-word replaceable memory (up to $m$ columns of a word can be replaced) where $m > 1$. In this memory, a triplet can generate $m$ virtual column addresses. It is common to perform memory BIST at-speed for higher test quality, which means that our infrastructure may receive a triplet at every cycle, so the virtual column addresses may need to be handled in one cycle. They can be placed in a pipeline, but this does not prevent the BIST from being stopped. Thus, in order to enable at-speed BIST together with BISR, it is necessary to handle this memory in a different way from that of type B. Note that $w$-bit-word-oriented memories of type C are $w$ column-per-word replaceable. If we can deal with type C, any $m$ column-per-word replaceable memory with $m > 1$ can also be handled easily.

B. Dealing With Type C

We modify the MRA to support word-oriented memories of type C. To begin with, we define several terms. In word-oriented memories, $w$ bits (columns) have the same address. However, in order to repair such memories on a column basis, we need to distinguish each column anyway, so we define the extended column address as a pair of a column address and a word of $w$ bits, each of which corresponds to one among the $w$ columns indicated by the column address.
The extended column address can indicate multiple columns within a word. Also, we call a pair of a row address and an extended column address the extended fault address. Multiple extended fault addresses indicating each fault


Fig. 7. Modified must repair analyzer for word-oriented memories of type C.

within a word can be combined into a single extended fault address. For example, two extended fault addresses with the same row and column address can be combined into one whose bit word marks both faulty columns. The triplet coming from the BIST can also be represented by an extended fault address. Thus we say that our RA engine receives one extended fault address per cycle from the BIST engine.

The modified MRA is illustrated in Fig. 7. In the MRA, the column CAM in the fault-list needs to store the extended column addresses, and each entry has $w$ additional bits, where $w$ is the word size. Each bit has its own match signal, and thus each entry has $w + 1$ match signals including the original match signal. The new match signals of each entry, excluding the original one, are fed into the logic for generating the R_MustRepair signal. This logic comprises parallel counters (PCs) and the final adder to count the number of faulty cells in a row across the word boundaries. The number of the faulty cells is added to the number of 1s in the incoming fault syndrome, and if it is greater than the number of spare columns $C_s$, then the must-repair condition is satisfied and the R_MustRepair signal is asserted. The new match signals are also used to generate the per-column must-repair signals, each of which is asserted when the corresponding column within a word satisfies the must-repair condition. Columns in a word are handled in parallel using the PCs. Since our architecture is designed to share storage elements as much as possible, this parallel operation is implemented using little extra hardware.

The solution record in the MRA is extended accordingly and it will generate per-column covered signals. As in the bit-oriented case, each signal indicates whether the corresponding column within a word is already included in the solution record (so will be repaired by a spare column). If a fault, not covered by the current partial solution, triggers the must-repair condition for the row (column), the row (extended column) address should be inserted into the solution record.
If a fault is neither covered by the current partial solution nor triggers the must-repair condition, then the extended fault address of the fault is added to the fault-list. For up to $w$ faults in a word, this operation should be performed in parallel. Some fault addresses can be inserted into the solution record, and at the same time others can be added to the fault-list. In this process, the fault addresses to be inserted into the solution record are combined. If more than one column satisfies the must-repair condition for the first time by the incoming triplet, the extended column addresses to be added to the solution record should also be combined. Thus one triplet adds at most one entry to the solution record and the fault-list.
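The two address mappings used in this section, the virtual column address for type B and the extended fault address for type C, can be sketched as follows. This is an illustrative sketch only: the function names, the tuple layout `(row, (col, bits))`, and the choice of placing the bit index in the low-order address bits are assumptions not taken from the paper.

```python
def virtual_column_address(col, syndrome, word_bits):
    """Type B: map a column address and a one-hot syndrome to a virtual column.

    Returns None when more than one bit of the word fails; the text handles
    that case separately as a must-repair row.
    """
    if syndrome == 0:
        raise ValueError("syndrome of a failing word must be nonzero")
    if syndrome & (syndrome - 1):
        return None  # more than one failing bit in the word
    bit = syndrome.bit_length() - 1
    extra = (word_bits - 1).bit_length()  # ceil(log2(word_bits)) added bits
    return (col << extra) | bit


def combine(a, b):
    """Type C: merge two extended fault addresses that refer to the same word."""
    (row_a, (col_a, bits_a)), (row_b, (col_b, bits_b)) = a, b
    if (row_a, col_a) != (row_b, col_b):
        raise ValueError("addresses refer to different words")
    return (row_a, (col_a, bits_a | bits_b))


print(virtual_column_address(3, 0b10, 2))
print(combine((1, (0, 0b01)), (1, (0, 0b10))))
```

Keeping one bit per column in the extended address is what allows a whole word to be handled as a single entry instead of up to `word_bits` separate entries.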

Fig. 8. Part of control logic for writing extended column addresses to the fault-list and the solution record: the other two inputs of the lower AND gates are connected to R_Covered and one of the per-column covered signals, respectively.

The logic to implement this function is illustrated in Fig. 8. Suppose this figure shows the first empty entries in the fault-list and the solution record, which are indicated by FLEnd and SolEnd, respectively. Thus the two signals in this figure are considered to be asserted. Readers can see that the control logic is instantiated $w$ times, once per column of a word, for the parallel operation. The SOLVER also tries to write an extended column address to the solution record in the final analysis phase, when the signal C_Insert is used.

To demonstrate the operation of the modified MRA, we will consider an example. Suppose that we are given a DUT with 3 rows and each row has 2 words of 2 bits each. The DUT is equipped with 2 spare rows and 2 spare columns. Fig. 9 depicts the faults on the DUT. At cycle 0, our RA engine takes the first extended fault address from the BIST engine and this is added to the fault-list. At cycle 1, the next extended fault address is received, and R_MustRepair is asserted. Thus the row address is inserted into the solution record. At cycles 2 and 3, each fault address is added to the fault-list without triggering the must-repair condition. At cycle 4, the left column in the addressed word triggers the column must-repair condition, but the right column does not. Thus, the extended column address of the left column is inserted into the solution record and the extended fault address of the right column is inserted into the fault-list. Finally, the last extended fault address is added to the fault-list. The fault-list after the test is shown in Table II. Actually, row 2 and the right column of one word are a must-repair row and a must-repair column, respectively, but these are not


TABLE II FAULT-LIST AFTER THE TEST OF THE DUT IS COMPLETED

Fig. 9. Faults on a DUT.

TABLE III CHANGE OF THE SOLUTION RECORD WHEN RC IS USED

detected by the MRA. Note that the main goal of the MRA is not to detect all must-repair conditions but to capture all necessary information within the bounded number of fault-list entries.

As in the bit-oriented case, once the must-repair analysis is completed, the final analysis is started. The final analysis is an iterative process over the entries in the fault-list and is summarized in Algorithm 1. The overall procedure is similar to the final analysis in the bit-oriented case. The reason why the final analysis can be performed with minor modifications is that, given multiple faults within a word, there are still only two possible (meaningful) ways to fix them: a repair row is used, or all faults are fixed by repair columns. It is obvious that the other ways lead to suboptimal results. We will continue to use the example for the final analysis.

Algorithm 1 Final Analysis
1: Read an extended fault address (row address, (column address, S)) from the fault list
2: From S and the covered signals of the solution record, compute the word that represents the faults in S not covered by the current partial solution
3: Let $n$ be the number of 1s in this word
4: if $n = 0$ then
5: Go to line 1
6: else
7: if a repair row should be used according to the current repair strategy then
8: if no repair row is available then
9: Restore the state of the MRA right after the must-repair analysis is done
10: Obtain the next repair strategy, initialize the pointer of the fault list and go to line 1
11: else
12: Insert the row address into the solution record
13: end if
14: end if
15: if a repair column should be used according to the current repair strategy then
16: Let $c_{rem}$ be the number of remaining repair columns
17: if $n > c_{rem}$ then
18: Perform the same procedure as lines 9-10
19: else
20: Insert the extended column address of the uncovered faults into the solution record
21: end if
22: end if
23: end if
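One pass of the final analysis for a single repair strategy might look as follows in Python. This is a hedged sketch, not the authors' circuit: the names and data layout are hypothetical, the fault-list in the usage example is invented rather than taken from Table II, and two details are assumptions: each uncovered entry consumes one letter of the strategy, and a column repair of an entry with several uncovered bits consumes that many spare columns.

```python
def popcount(value):
    return bin(value).count("1")


def evaluate_strategy(fault_list, strategy, rows, cols, free_rows, free_cols):
    """Evaluate one strategy on entries of the form (row, (col, syndrome)).

    `rows` is the set of repaired rows and `cols` maps a column address to the
    bit mask of repaired columns within that word (the must-repair results).
    Returns the completed (rows, cols), or None if the strategy fails.
    """
    rows, cols = set(rows), dict(cols)  # work on copies so the caller can restart
    letters = iter(strategy)
    for row, (col, syndrome) in fault_list:
        if row in rows:
            continue  # covered by a repair row
        uncovered = syndrome & ~cols.get(col, 0)
        n = popcount(uncovered)
        if n == 0:
            continue  # every faulty column of this word is already repaired
        letter = next(letters, None)
        if letter == "R" and free_rows > 0:
            rows.add(row)
            free_rows -= 1
        elif letter == "C" and n <= free_cols:
            cols[col] = cols.get(col, 0) | uncovered
            free_cols -= n
        else:
            return None  # restart with the next repair strategy
    return rows, cols


# Hypothetical state after the must-repair analysis: row 2 and bit 0 of
# column address 0 are already in the solution record; one spare row and
# one spare column remain.
entries = [(2, (0, 0b10)), (0, (0, 0b01)), (1, (1, 0b10)), (0, (1, 0b01)), (1, (0, 0b10))]
print(evaluate_strategy(entries, "RC", {2}, {0: 0b01}, 1, 1))
print(evaluate_strategy(entries, "CR", {2}, {0: 0b01}, 1, 1))
```

With these invented entries the strategy RC succeeds while CR runs out of spare elements on the last entry, which mirrors the kind of restart described in the text.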

TABLE IV CHANGE OF THE SOLUTION RECORD WHEN CR IS USED

Table III shows the change of the solution record in the example. Initially, the solution record contains and that are detected as must-repair during the test. Except these, only two spare elements are remaining. Suppose that the current repair strategy is . At cycle 0 and 1, the rst and second entries are read from the fault-list in Table II. These are already covered by the current partial solution and nothing happens. At cycle 2, the third entry is loaded and according to the current repair strategy, a repair row is used for the fault. The row 1 is added to the solution record. At cycle 3, is read and a spare column is used to x this fault. At cycle 4, since there is no remaining spare element for the last entry, the current strategy cannot x the DUT. Now the current repair strategy becomes . Table IV shows the change of the solution record in this case. The internal state of the MRA is restored to that before the nal analysis begins, and the solution record consists of and again. The cycle 0 and 1 are the same as before. At cycle 2, a repair column is used at this time, which adds to the solution record. At cycle 3, the fault is xed by a repair row. At cycle 4, since the last entry is already covered by the repair row, nothing happens. Since every fault in the fault-list is covered by the current partial solution, we obtain a solution. V. K-SUBSET ENUMERATOR A repair strategy is represented a -bit word. Since the words contain 1s and 0s, enumerating all repair strategies is equivalent to enumerating -bit words that contains 1s. This problem is also equivalent to enumerating -subsets, which are subsets of size , given a set of size , where and . Usually the enumeration can be done by using the depth-rst search, which can be implemented by using a stack. A software implementation using recursion is shown in [20]. Note that recursive algorithms use the call stack. 
A hardware implementation using a stack and counters is shown in [6], which requires times 4 storage elements (RAM or flip-flops) and an FSM, and the number of cycles to derive the next repair strategy varies. In [21], the authors use a linear feedback shift register (LFSR) and show the feedback polynomials and initial seed values to enumerate all repair strategies for up to 2 repair rows and 2 repair columns. However, this approach is not general for arbitrary and values. A maximal LFSR may be used with a parallel counter to count the number of 1s. If the number of 1s equals , the LFSR value is a repair strategy. However, the numbers of available repair rows and columns are different after the must-repair phase, and during the enumeration of the LFSR, it is difficult to detect every repair strategy for the available repair rows and columns only once using a simple circuit. We therefore propose an efficient algorithm and a combinational circuit implementation. It is important to note that the problem of enumerating all repair strategies also corresponds to generating constant weight vectors, which are often used as test patterns [22]. A method to generate constant weight vectors is presented in [22]; that scheme, however, requires an external tester to input a particular sequence.

Enum_K_Subset, summarized in Algorithm 2, consists of two distinct operations. If the LSB is 0 (line 2), Enum_K_Subset performs a move operation (lines 3-6), which finds the trailing 1 and moves it towards the LSB by one bit. For example, if is 001100, the next k-subset representation becomes 001010. If the LSB is 1, Enum_K_Subset performs a borrow operation (lines 8-12), which finds the trailing 1 excluding the one or more consecutive 1s ending in the LSB and moves it towards the LSB by one bit. We call such a trailing 1 a pivot. After the pivot, it writes the same number of 1s as the consecutive 1s and writes 0s until the LSB. This is equivalent to flipping the group of bits in the position lower than the pivot. For example, if is 00100011, the next k-subset representation becomes 00011100.
It is interesting to note that both operations in Enum_K_Subset preserve the number of 1s in .

Algorithm 2 Enum_K_Subset
Require:
1: while is not a string of one or more consecutive 0s followed by one or more consecutive 1s do
2:   if then
3:     // move operation
4:     Let the bit index of the trailing 1
5:
6:
7:   else
8:     // borrow operation
9:     Let the bit index of the first 1 followed by one or more consecutive 0s and one or more consecutive 1s ending in LSB
10:
11:
12:
13:  end if
14: end while
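The move and borrow operations described above can be sketched in software as follows. This is a minimal Python illustration written from the prose description and the two worked examples (001100 to 001010, and 00100011 to 00011100), not a transcription of Algorithm 2. The function names are hypothetical, and the termination test is slightly generalized so that it also stops on an all-ones or all-zero word.

```python
from typing import Iterator, Optional

def next_k_subset(x: int) -> Optional[int]:
    """Return the next k-subset word after x, or None at the end."""
    k = bin(x).count("1")
    if x == (1 << k) - 1:
        return None                    # all 1s packed at the LSB: done
    if x & 1 == 0:
        # Move operation: shift the trailing 1 one bit towards the LSB.
        trailing = x & -x              # isolates the lowest set bit
        return (x ^ trailing) | (trailing >> 1)
    # Borrow operation.
    ones = x & ~(x + 1)                # mask of the 1s ending in the LSB
    t = ones.bit_length()              # how many such 1s there are
    rest = x ^ ones                    # word with those 1s cleared
    pivot = rest & -rest               # lowest remaining 1 is the pivot
    p = pivot.bit_length() - 1         # bit index of the pivot
    # Pivot moves to p-1, followed by t more 1s, then 0s down to the LSB.
    block = ((1 << (t + 1)) - 1) << (p - 1 - t)
    return (rest ^ pivot) | block

def enumerate_k_subsets(initial: int) -> Iterator[int]:
    """Yield the initial word and every word that follows it."""
    x: Optional[int] = initial
    while x is not None:
        yield x
        x = next_k_subset(x)

for word in enumerate_k_subsets(0b001100):
    print(format(word, "06b"))
```

Both branches keep the number of 1s unchanged, and the words are visited in decreasing numeric order, so starting from the word with all 1s at the top of the used bit range visits every word of that weight exactly once.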

If Enum_K_Subset takes a -bit string and we need to enumerate all k-subsets of a set of size , then the initial string should be consecutive 0s and consecutive 1s followed by consecutive 0s. For example, given , , and , the initial string is 001000 and Enum_K_Subset enumerates the sequence . Note that can be greater than . Thus, in the proposed BIRA, can be greater than . This is a crucial feature because the available repair elements after the must-repair analysis can be fewer than even if we set .

A. Combinational Circuit Implementation

In this subsection, we propose a combinational circuit implementation of Enum_K_Subset that consists of two stages. In the first stage, the circuit generates borrow signals defined as follows: if and an index such that and

otherwise. The borrow signals indicate the position of the pivot. In the second stage, the next k-subset representation, , is generated using the borrow signals. To derive the borrow signals from , we define two types of intermediate signals and . The signal is high if is a string of consecutive 1s. The signal is high if is a string of one or more consecutive 0s followed by one or more consecutive 1s. Then, we can write

Using , the borrow signal can be written as

Owing to the dependency of on , the borrow signals may be difficult to compute in a single cycle. We thus generalize these signals to break the dependency. This allows us to use the parallel prefix algorithm, which is used for high-speed tree adders such as the Kogge-Stone and Brent-Kung adders. asserts if it is a string of consecutive 0s, if it is a string of consecutive 1s, and if it is a string of one or more consecutive 0s followed by one or more consecutive 1s. So we write If

If

for an integer such that


TABLE V BORROW OPERATION CAN BE IMPLEMENTED BY COMBINING A BYPASS OPERATION AND A SHIFT OPERATION

Fig. 10. 8-bit k-subset enumerator with Kogge-Stone style configuration.
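The group signals of this subsection lend themselves to a parallel prefix computation, which the following Python sketch illustrates. The signal names L, H, and T are an assumption suggested by the term "group LHT cell" (all 0s, all 1s, and 0s followed by 1s, respectively), and the combination rule in `combine` is derived here from those three definitions rather than copied from the paper's formulas. The scan order imitates a Kogge-Stone arrangement; all function names are hypothetical.

```python
from typing import List, Tuple

Group = Tuple[bool, bool, bool]   # assumed (L, H, T) of a contiguous bit group

def leaf(bit: int) -> Group:
    """Signals of a single bit: it is all-0s or all-1s, never 0s-then-1s."""
    return (bit == 0, bit == 1, False)

def combine(upper: Group, lower: Group) -> Group:
    """Signals of the concatenation of a more and a less significant group."""
    lu, hu, tu = upper
    ll, hl, tl = lower
    low = lu and ll                                   # still all 0s
    high = hu and hl                                  # still all 1s
    trans = (tu and hl) or (lu and tl) or (lu and hl)  # 0s followed by 1s
    return (low, high, trans)

def prefix_groups(x: int, width: int) -> List[Group]:
    """g[i] describes bits i..0, built in about log2(width) combine stages."""
    g = [leaf((x >> i) & 1) for i in range(width)]
    d = 1
    while d < width:
        g = [combine(g[i], g[i - d]) if i >= d else g[i] for i in range(width)]
        d *= 2
    return g

def borrow_signals(x: int, width: int) -> int:
    """Bit i is set when bit i of x is the pivot of a borrow operation."""
    g = prefix_groups(x, width)
    b = 0
    for i in range(1, width):
        if (x >> i) & 1 and g[i - 1][2]:   # a 1 sitting above a 0s-then-1s group
            b |= 1 << i
    return b

print(format(borrow_signals(0b00100011, 8), "08b"))
```

Because `combine` is associative, the same cells could be wired in a ripple or Brent-Kung arrangement instead, trading depth for area in the same way as in adders.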

triangle. However, since the conditions for are mutually exclusive, we can directly implement the formula for by using AOI gates in an implementation of small size such as 8-bit. Besides, the bypass operation is dominant in the high order bits and the area overhead is small.

VI. EXPERIMENTAL RESULTS

We implemented the proposed infrastructure in 130-nm technology for a memory array with four repair rows and four repair columns, and the operating frequency is 400 MHz. We custom-designed the CAMs and synthesized the remaining logic using Synopsys Design Compiler, except for an 8-bit k-subset enumerator. Major evaluation factors of BIRA performance include analysis time, area, and repair rate. Table VI compares our method to CRESTA and the intelligentSolveFirst proposed in [6]. Since all these methods provide the optimal repair rate, the repair rates are not presented. The number of tests and the number of CAM entries dominate the analysis time and the area, respectively. Thus, the test time in the worst case and the area can be estimated using Table VI. As mentioned earlier, CRESTA performs repair analysis in parallel with the test and evaluates all solution candidates simultaneously using multiple sub-analyzers, requiring only one test irrespective of the number of repair elements. The test and repair analysis finish at the same time, and one of the sub-analyzers contains the optimal solution. Since no extra cycle after the test is required, the analysis time equals the test time. In the case that or as in the third row in Table VI, the number of possible solution candidates increases linearly in the number of repair elements, and the spare allocation problem becomes relatively easy. However, it is known that the repair rate of and is worse than that of and [4]. The methods proposed in [6] reduce the number of required CAM entries at the cost of the analysis time.
TheBasicSolve in [6] performs an exhaustive search and requires tests (or restarts) if the optimal solution is necessary in terms of the number of repair elements used. If the number of repair elements used does not matter, it may find a solution with a few tests [6], but the worst-case bound is still . The number of tests required by the intelligentSolve and the intelligentSolveFirst in the worst case seems to be much smaller in simulation, but this is not proven theoretically. In our proposed method, a restart of the test does not happen in any case. This comes at the cost of a few extra cycles after the single test for the final analysis. In the final analysis phase, the read operation of the fault-list takes less than 1.7 ns as shown in Fig. 11, and the subsequent logic uses 0.8 ns. Thus the SOLVER can evaluate a repair

Note that and . Fig. 10 shows a group LHT cell to generate and signals. Similar to carry signals of adders, there are various configurations to generate the final , , and by using the group LHT cells, which provide a trade-off between speed, area, and complexity. Such configurations for adders are shown in [23] and can be directly converted into ones for k-subset enumerators. Since the SOLVER takes some time to consume a repair strategy, a slow configuration may not be critical in the overall performance. However, since the SOLVER can request the next repair strategy in less than cycles in some cases, we can consider a fast configuration, provided that area is not a crucial factor. Fig. 10 also shows the Kogge-Stone style configuration that provides the best speed, and if the smallest area is desired, the ripple-carry style configuration can be used. For a -bit k-subset enumerator, indicates that is the last one (i.e., the end of the enumeration). Once , and signals are obtained from a configuration, the next k-subset representation, , can be derived from the following formulas. For all

For all such that if if if if if otherwise. For or , can be derived similarly. The first line of implements the move operation. For the borrow operation, the previous equation can be represented again in Table V when . This shows that the borrow operation can be implemented by combining two distinct operations, the bypass operation for the upper triangle and the shift operation for the lower triangle. Thus we call this the partial bypass-and-shift. To select one of the two operations, we can use . We may need to use a barrel shifter for the lower


TABLE VI NUMBERS OF TEST SESSIONS IN THE WORST CASE AND THE NUMBERS OF CAM ENTRIES

TABLE VII COMBINATIONAL LOGIC SUMMARY

Fig. 11. Read operation of the row CAM in the fault-list.

The k-subset enumerator has much slack in timing, so if it is replaced by a slower configuration, we can implement it using a smaller number of cells. As mentioned earlier, the number of cycles to generate repair strategies varies in the stack-based implementation, unlike the k-subset enumerator, which could lengthen the final analysis time. However, this may be marginal compared to the long test time. Rather, the k-subset enumerator is beneficial for its simplicity as well as its smaller area. Due to the varying generation time, the stack-based implementation complicates the control logic, while our proposed enumerator does not require it and is easy to verify once implemented.

VII. CONCLUSION

In this paper, we have proposed an on-chip infrastructure for repair analysis with the optimal repair rate. Our infrastructure requires a single test and a few extra cycles, about 600 cycles for a memory array with four repair rows and four repair columns. Most built-in repair analyzers are developed for bit-oriented memories, whereas our repair analyzer also aims at various types of word-oriented memories. To achieve this, we have extensively studied existing word-oriented repairable memories and have classified them into three types. For each type, we have shown how the bit-oriented version can be extended. As part of our repair analyzer, we have also developed a novel combinational circuit for enumerating constant-weight vectors.

ACKNOWLEDGMENT

The authors would like to thank E. Byun and C.-J. Woo at Samsung Electronics for helpful discussions.

REFERENCES
[1] R. Rajsuman, "Design and test of large embedded memories: An overview," IEEE Design Test Comput., vol. 18, no. 3, pp. 16-23, May 2001.
[2] S. Hamdioui, G. Gaydadjiev, and A. van de Goor, "The state-of-art and future trends in testing embedded memories," in Proc. Records Int. Workshop Memory Technol., Design, Test., 2004, pp. 54-59.
[3] Y. Zorian and S. Shoukourian, "Embedded-memory test and repair: Infrastructure IP for SOC yield," IEEE Design Test Comput., vol. 20, no. 3, pp. 58-66, May/Jun. 2003.

strategy within cycles, which includes one extra cycle to move on to the next repair strategy. The SOLVER should evaluate repair strategies in the worst case and one extra repair strategy for recovering the optimal solution, so the final analysis takes less than cycles. This amount of time is negligible compared to a single test time, which is typically over a few hundred million cycles [9], [10]. The storage requirement of the proposed method remains the same as in [6]. Even in the extension for word-oriented memories, the results in Table VI do not change, although each entry in the CAMs for the column address requires additional bits. Since our infrastructure requires entries for each CAM in the fault-list and entries for the solution record, if the repair elements are evenly divided between rows and columns, the total number of CAM entries grows quadratically with respect to the number of repair elements, even though the redundancy allocation problem is NP-complete. Actually this is no surprise, since the problem kernel of the redundancy allocation is known to be small [24]. The extra cycles clearly grow exponentially. However, we have tried to minimize the constant that is multiplied by the exponential term in the complexity representation of the test time. As a result, the extra cycles become insignificant for the small number of repair elements used in practice. Our implementation of the proposed method uses 94 flip-flops, and the number of flip-flops grows at most linearly with respect to the number of repair elements. Also, it uses about 800 combinational cells, including an 8-bit k-subset enumerator with Kogge-Stone style configuration, as shown in Table VII.
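The figures quoted in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (400 MHz, 1.7 ns, 0.8 ns, about 600 extra cycles, four repair rows and four repair columns). It assumes that a repair strategy for this configuration is an 8-bit word with four 1s, and it takes 100 million cycles as an illustrative stand-in for "a few hundred million"; the per-strategy cycle count is not assumed.

```python
from math import comb, isclose

CLOCK_MHZ = 400.0
cycle_ns = 1e3 / CLOCK_MHZ          # clock period in nanoseconds

# Fault-list read (under 1.7 ns) plus subsequent logic (0.8 ns).
path_ns = 1.7 + 0.8
fits_one_cycle = path_ns <= cycle_ns or isclose(path_ns, cycle_ns)

# Assumed: strategies are 8-bit words containing exactly four 1s.
strategies = comb(4 + 4, 4)

extra_cycles = 600                  # "about 600 cycles" from the conclusion
extra_time_us = extra_cycles * cycle_ns / 1e3

test_cycles = 100_000_000           # illustrative value, not from the paper
overhead = extra_cycles / test_cycles

print(f"clock period      : {cycle_ns} ns")
print(f"read + logic fits : {fits_one_cycle}")
print(f"repair strategies : {strategies}")
print(f"final analysis    : {extra_time_us} us")
print(f"relative overhead : {overhead:.1e}")
```

Under these assumptions the read and logic delays together just fit in one 2.5 ns clock period, and the final analysis adds roughly 1.5 microseconds to a test that runs for a large fraction of a second.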


[4] T. Kawagoe, J. Ohtani, M. Niiro, and T. Ooishi, "A built-in self-repair analyzer (CRESTA) for embedded DRAMs," in Proc. Int. Test Conf., 2000, pp. 567-574.
[5] S. Shoukourian, V. A. Vardanian, and Y. Zorian, "A methodology for design and evaluation of redundancy allocation algorithms," in Proc. VLSI Test Symp., 2004, pp. 249-255.
[6] P. Oehler, S. Hellebrand, and H.-J. Wunderlich, "An integrated built-in test and repair approach for memories with 2D redundancy," in Proc. Eur. Test Symp., 2007, pp. 91-96.
[7] P. Oehler, S. Hellebrand, and H.-J. Wunderlich, "Analyzing test and repair times for 2D integrated memory built-in test and repair," in Proc. Design Diag. Electron. Circuits Syst., 2007, pp. 1-6.
[8] P. Oehler, A. Bosio, G. D. Natale, and S. Hellebrand, "A modular memory BIST for optimized memory repair," in Proc. Int. On-Line Test. Symp., 2008, pp. 171-172.
[9] W. Jeong, I. Kang, K. Jin, and S. Kang, "A fast built-in redundancy analysis for memories with optimal repair rate using a line-based search tree," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 12, pp. 1665-1678, Dec. 2009.
[10] W. Jeong, J. Lee, T. Han, K. Lee, and S. Kang, "An advanced BIRA for memories with an optimal repair rate and fast analysis speed by using a branch analyzer," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 12, pp. 2014-2026, Dec. 2010.
[11] S.-Y. Kuo and W. Fuchs, "Efficient spare allocation for reconfigurable arrays," IEEE Design Test Comput., vol. 4, no. 1, pp. 24-31, Feb. 1987.
[12] C.-T. Huang, C.-F. Wu, J.-F. Li, and C.-W. Wu, "Built-in redundancy analysis for memory yield improvement," IEEE Trans. Reliab., vol. 52, no. 4, pp. 386-399, Dec. 2003.
[13] A. Sehgal, A. Dubey, E. Marinissen, C. Wouters, H. Vranken, and K. Chakrabarty, "Redundancy modelling and array yield analysis for repairable embedded memories," IEE Proc. Comput. Digit. Techn., vol. 152, no. 1, pp. 97-106, 2005.
[14] A. Ferris and G. Work, "Memory circuit capable of replacing a faulty column with a spare column," U.S. Patent 5 163 023, Nov. 10, 1992.
[15] B. Fitzgerald and E. Thoma, "Circuit implementation of fusible redundant addresses on RAMs for productivity enhancement," IBM J. Res. Develop., vol. 24, no. 3, pp. 291-298, 1980.
[16] N. MacDonald, "Memory array of integrated circuits capable of replacing faulty cells with a spare," U.S. Patent 5 406 565, Apr. 11, 1995.
[17] C. Wu and C. Wong, "Dynamic spare column replacement memory system," U.S. Patent 5 781 717, July 14, 1998.
[18] E. E. Swartzlander, "Parallel counters," IEEE Trans. Comput., vol. 22, no. 11, pp. 1021-1024, Nov. 1973.
[19] D. K. Bhavsar, "An algorithm for row-column self-repair of RAMs and its implementation in the Alpha 21264," in Proc. Int. Test Conf., 1999, pp. 311-318.
[20] J. Loughry, J. I. Hemert, and L. Schoofs, "Efficiently enumerating the subsets of a set," 2000. [Online]. Available: applied-math.org/subset.pdf
[21] X. Du and W.-T. Cheng, "At-speed built-in self-repair analyzer for embedded word-oriented memories," in Proc. Int. Conf. VLSI Design, 2004, pp. 895-900.
[22] D. Tang and L. Woo, "Exhaustive test pattern generation with constant weight vectors," IEEE Trans. Comput., vol. C-32, no. 12, pp. 1145-1150, Dec. 1983.
[23] N. H. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Boston, MA: Addison-Wesley, 2005.
[24] G. Bai and H. Fernau, "Constraint bipartite vertex cover: Simpler exact algorithms and implementations," in Proc. Int. Workshop Frontiers in Algorithmics, 2008, pp. 67-78.

Jaeyong Chung (S'09-M'11) received the B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 2006, and the M.S. and Ph.D. degrees in electrical and computer engineering from the Department of Electrical and Computer Engineering, University of Texas, Austin, in 2008 and 2011, respectively. He joined Synopsys, Inc., Mountain View, CA, in 2011. He is currently a member of the Design Compiler Team, Synopsys, Inc., where he focuses on research and development in logic synthesis and optimization topics. His current research interests include combinational optimization, timing analysis, yield optimization, and very large scale integration testing. Dr. Chung was the recipient of Best Paper Award nominations at the International Conference on Computer-Aided Design in 2009 and the Asia and South Pacific Design Automation Conference in 2011. One of his co-authored papers was selected in the Asian Test Symposium 20th Anniversary Compendium.

Joonsung Park (S'07-M'09) received the B.Eng. degree in electrical engineering from the Korea University, Seoul, Korea, in 2003, the M.Sci. degree in electrical engineering from the University of Michigan, Ann Arbor, in 2005, and the Ph.D. degree in electrical and computer engineering from the University of Texas at Austin, Austin, in 2009. He is currently an Electrical Design Engineer with Texas Instruments, Inc., Dallas. His research interests include VLSI and mixed-signal circuit design and test methodology.

Jacob A. Abraham (S'71-M'74-SM'84-F'85) received the Ph.D. degree in electrical engineering and computer science from Stanford University, Stanford, CA, in 1974. He is currently a Professor with the Department of Electrical and Computer Engineering, University of Texas, Austin. He is also the Director of the Computer Engineering Research Center and holds the Cockrell Family Regents Chair in Engineering. From 1975 to 1988, he was on the faculty of the University of Illinois at Urbana-Champaign, Urbana. He has published extensively and is included in the Thomson Reuters List of Highly Cited Researchers. He has supervised more than 80 Ph.D. dissertations and is particularly proud of the accomplishments of his students, many of whom occupy senior positions with academia and industry. His current research interests include VLSI design and test, formal verification, and fault-tolerant computing. Dr. Abraham is a fellow of the ACM and was the recipient of the 2005 IEEE Emanuel R. Piore Award.
