I. INTRODUCTION
is used in this regard for storage and faster access (wire-speed access) [1]–[4]. In a conventional direct-mapped cache, there is a probability of a high cache miss rate due to the continuous refresh in the cache manager. Registers and the level 1 (L1) cache are the fastest, and the performance degrades for the level 2 (L2) cache and the main memory. The moderate-sized L2 cache is often used for accessing frequently searched information. In a conventional search, the cache controller provides the address of frequently searched data to the cache rather than the main memory for faster data access.

A fully associative cache must be used so that any location in main memory can be associated with the cache and the issue of contention for cache locations can be solved, but this requires a (serial) search of the entire cache tag. The software match routine is slow in spite of using faster matching algorithms. So, a content addressable memory (CAM) is often used in place of the software cache tag, as presented in Fig. 1. It is often called a hardware cache tag and performs the search in a single clock cycle, but at the cost of additional storage area.

Unlike a random access memory (RAM), a CAM renders an accelerated data search medium by comparing the search data with prestored contents in a single clock cycle. In addition to the basic CAM, a ternary CAM (TCAM), also called threefold memory, uses a supplementary don’t care (or “X”) state. During the search operation, the input is prefetched to the match index and a simultaneous comparison is carried out with previously loaded data. TCAM is an efficient search engine, which makes it suitable for asynchronous transfer mode switching and fast lookup of network routing [5]–[9]. Besides the fast searching, the large number of storage cells and interconnections occupies substantial design area and makes a TCAM more power hungry. Thus, efficient low-power techniques and a high-density storage approach must be employed in designing a TCAM.

Algorithmic approaches have been implemented to reduce the TCAM lookup [10], [11]. These techniques help in reducing

Manuscript received January 19, 2016; revised April 26, 2016; accepted July 11, 2016. Date of current version October 25, 2016. This work was supported in part by the Ministry of Human Resource Development, Government of India. This paper was recommended by Associate Editor V. Erraguntla.
The authors are with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya, Shillong 793003, India (e-mail: ssandeep.mmishra@nitm.ac.in; telajalamahendra@nitm.ac.in; anup.dandapat@nitm.ac.in).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSI.2016.2592182
1549-8328 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
MISHRA et al.: A 9-T 833-MHz 1.72-fJ/BIT/SEARCH QUASI-STATIC TERNARY FULLY ASSOCIATIVE CACHE TAG 1911
Fig. 2. TCAM architectures. (a) 18-T conventional swapped-XOR TCAM. (b) 12-T compact TCAM comprising two 4-T static storage. (c) Proposed 9-T quasi-
static TCAM comprising 4-T static storage and dynamic mask storage.
the power consumption but at the cost of performance degradation. In [12], a unique choking current method has been proposed to reduce the power consumption with speed boosting. Dynamic designs have been presented for high-density storage with low leakage requirements, but proper synchronization between the data retention and refresh cycle is too complex and increases the energy dissipation [13]–[16]. Tsai et al. have used a reflex charge equating scheme to minimize the power consumed by the matchline (ML) [17].

In [18], a NAND-type circuit has been partitioned into two segments with different capacities that operate consecutively, resulting in lower power consumption. But the matching probability and the design of the pre-computation circuitry decide the power consumption in the above technique. Low-power designs have been proposed based on power reduction in the highly capacitive matchlines [19]–[23]. A pai-sigma ML scheme has been used in [19], where NAND and NOR cells have been realized by the pai and sigma segments, respectively. An interfacing logic has been used between these segments to avoid the short-circuit current. The voltage detector current has been recycled to charge up the ML for reducing energy [21].

An efficient high-density cache tag must be designed that performs the matchline charge sense (match) in near zero time. A dynamic CAM (DCAM) is the best quick fix but has a low data retention time that requires proper synchronization between the refresh and search cycles to function [13]–[16]. High-density static CAMs (SCAMs) do not suffer from this issue, but the power consumption is significant [6], [24]. In [24], two latches have been used to store three logic values, similar to the design presented in [6]. These designs take fewer transistors than the other available TCAMs. Two metal rails, $V_{DDML}$ and $V_{DD}$, have been introduced to power up the data and mask storage cell [25].

Unique arrangements in the conventional TCAMs have been carried out to reduce the energy dissipation [26], [27]. The mask bits with only logic 1 value have been separated from those having only logic 0 value [26]. All other cells except the boundary cells in different segments have been self gated. The non-segmented architectures face the challenge of high leakage power consumption [28], [29]. The segmented architecture resolves this issue, but the cell count remains the same. CAM cells are arranged in an alternate fashion in a dual bit content addressable memory (DBCAM) to store logic 0 and 1 separately [30], [31]. The storage cell requirement is reduced by half in comparison with the conventional TCAM presented in [32], yet both of these suffer from complex matchline control and lead to a smaller hit rate.

The proposed architecture provides a high-density fully associative cache tag that reduces lookup load by using a hardware cache tag prior to the L2 cache storage. The use of cross-coupled inverters for data storage in the conventional TCAM leads to additional leakage. Therefore, a faster quasi-static TCAM approach has been employed in designing the cache tag that also helps in reducing the design area. The rest of this paper is structured as follows: Section II describes the background on high-density TCAMs. Next, we introduce the quasi-static TCAM followed by the selective matchline evaluation scheme in Section IV. Analysis of the measured results is presented in Section V and we conclude the paper in Section VI.

II. BACKGROUND: HIGH-DENSITY TERNARY CONTENT ADDRESSABLE MEMORIES

Set associative storage cells have been popular among designers for high-density memory architectures. Asymmetric static storage requires less design area with performance results similar to those of conventional static TCAMs (SCAMs). This approach consumes considerable power, while the complementary data search provides a faster matchline charge-up or charge-down. Separate storage cells for content and mask have been used in the conventional swapped-XOR TCAM as shown in Fig. 2(a). The mask value ($M$) and evaluation result ($E$) go through NAND-based ML sensing. A conventional TCAM writes and reads through three wordlines [data wordline (DWL), mask wordline (MWL), and read wordline (RWL)]. This approach (decoupled data and search line with separate wordlines) increases the interconnection length and size of the TCAM macro.
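As a rough behavioural illustration of the content-plus-mask organization described above, the sketch below models one ternary word in Python. It is only a software analogy, not the circuit of Fig. 2(a): the class `TernaryWord`, the `care` bit vector (1 = compare, 0 = don't care), and the MSB-first pattern strings are assumptions introduced here, and the mask polarity in a real cell may be the opposite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TernaryWord:
    """One TCAM word held as two bit vectors (hypothetical encoding)."""

    content: int  # stored data bits
    care: int     # assumed polarity: 1 = compare this bit, 0 = don't care (X)
    width: int    # number of bits in the word

    @classmethod
    def from_pattern(cls, pattern: str) -> "TernaryWord":
        """Build a word from a string such as '10X1' (MSB first)."""
        content = 0
        care = 0
        for symbol in pattern:
            if symbol not in "01X":
                raise ValueError(f"unexpected symbol {symbol!r}")
            content = (content << 1) | int(symbol == "1")
            care = (care << 1) | int(symbol != "X")
        return cls(content, care, len(pattern))

    def matches(self, key: int) -> bool:
        """Return True when every cared-for bit equals the key bit."""
        evaluation = self.content ^ key       # per-bit XOR result (E)
        return (evaluation & self.care) == 0  # masked bits cannot mismatch
```

For example, `TernaryWord.from_pattern("10X1").matches(0b1011)` evaluates to `True`, because the only differing bit falls on the don't care position, whereas the key `0b1101` is rejected.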
1912 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 63, NO. 11, NOVEMBER 2016
TABLE I
STATE TABLE COMPARISON OF TCAM ARCHITECTURES (Q: STORAGE NODE; SL: SEARCHLINE; E: EVALUATION STATE; ML: MATCHLINE; XM: MASK INPUT; BL: BITLINE/SEARCHLINE OF PROPOSED DESIGN)
A state-of-the-art quasi-static TCAM (QSTCAM) is presented in Fig. 2(c). The storage cell provides two complementary soft nodes ($Q$ and $\overline{Q}$) similar to the conventional TCAM (hard) storage. The additional “$\overline{Q}$” does not contribute to the evaluation of search data but helps in maintaining logic 0 and 1 values at node “$Q$.” The static storage exhibits dc characteristics with a static noise margin of 752 mV and 126 mV, with a trivial variation of 21.9% in the threshold voltage over process corner variation (presented in the Appendix). A coupled data and searchline (BL/$\overline{BL}$) is provided to the sources of transistors $T_5$ and $T_6$. The data wordline (DWL) is used to write the data values (BL and $\overline{BL}$), and the mask wordline (MWL) is provided to write the mask (XM) through transistor $T_8$. A dynamic masking approach is employed in which the mask storage net ($N_1$) is destructive, but the separate MWL can be activated at any time without changing the storage node ($Q$ and $\overline{Q}$) values. MWL is activated during each precharge phase to ensure a valid mask value at $N_1$ during search, which is controlled through the maskline driver. The conventional swapped-XOR TCAM [23] uses 18 transistors with 7 I/O ports (18-T–7-I/O), while the compact TCAM [24] takes a minimum of 12-T–6-I/O. The proposed quasi-static approach (static content and dynamic mask storage) requires only 9 transistors with 6 ports (9-T–6-I/O), making it suitable for high-density design requirements.

The state table comparison of these designs is summarized in Table I. The masking in the proposed QSTCAM is very similar to the conventional design with the exception of complementary mask values. A low value at net $N_2$ [“0” or small charge (L)] keeps the ML transistor ($T_9$) in the cut-off state, while a high value [“1” or near $V_{DD}$ (H)] discharges the ML to ground. Dissimilar logic values at SL and H result in a match in the compact TCAM. From the state table it is clear that the compact and proposed TCAMs have weak logic values at various nets, which results in higher power consumption.

The searchline capacitance ($C_{SL}$), the matchline capacitance ($C_{ML}$) at each mismatch state, and the matchline swing primarily affect the overall power consumption of a TCAM. The searchline power for $S$ searchlines is

$$P_{\text{search}} = \frac{1}{2} S C_{SL} V_{DD}^{2} f. \tag{1}$$

Considering a TCAM array of $m$ words × $n$ bits, (1) can be written as

$$P_{\text{search}} = \frac{1}{2} m n C_{SL} V_{DD}^{2} f. \tag{2}$$

Similarly

$$P_{ML} = m n C_{ML} V_{DD}^{2} f. \tag{3}$$

With unequal matchline swing

$$P_{ML} = m n C_{ML} V_{DD} V_{MLswing} f. \tag{4}$$

The presented design does not provide control over the matchline swing for maintaining lower interconnects. Therefore, the parameters ($C_{SL}$, $C_{ML}$, and $S$) become the deciding factors in the power reduction. The use of a single static storage cell reduces the number of searchline pairs to half; the coupled data and searchline increases the power but at one end. The use of a single matchline discharge transistor also helps in reducing the ML power consumption.

The performance speed of the TCAM depends on the time it takes to change the matchline charge. The approximated small-signal model of the proposed TCAM is presented in Fig. 3 for the calculation of the discharging time constant for a mismatch case. The model is valid for the search phase where the nets, ports, and nodes with constant voltage level during the phase are grounded. For a simplified analysis the following assumptions have been considered:
1) Neglecting the effects of conductances ($G_{BD}$ and $G_{BS}$) and resistances ($R_{DB}$ and $R_{SB}$).
2) Neglecting the effects of dependent current sources ($G_{MBS} V_{BS}$, $I_{NRD}$, $I_{D}$, and $I_{NRS}$).
3) Equating the charge effects of DWL, PRE, XM, BL, $\overline{BL}$, $Q$, $\overline{Q}$, MWL, and $V_{DD}$ to zero.

A two-port model has been designed from the evaluation circuit as the “$Q$” charge is constant throughout the precharge and search phase. The Thevenin’s equivalent resistance ($R_{Th}$) or the voltage gain can be used for the calculation of the discharging time constant. The voltage gain in the model can be calculated by using the ML port as output and node $E$ as input. The gate of $T_8$ (MWL) and the source (XM) are at the same voltage level, thereby isolating its effect from the decision path. The ML delay depends upon the discharging time of the matchline, where the evaluation result decides the discharge of the ML voltage.

The drain-gate capacitances of $T_9$ and $T_7$ ($C_{DG9}$ and $C_{DG7}$) and the effective drain resistances ($R_{DN9}$ and $R_{DP7}$) mainly contribute to the discharging time. The output voltage at port ML ($V_{OUT}$) can be written as a function of the voltage $V_{N2}$ as

$$V_{OUT} = \frac{V_{N2} R_{DN9}}{\dfrac{1}{C_{DG9} S} + R_{DN9}}. \tag{5}$$

The voltage at net $N_2$ can be expressed as

$$V_{N2} = \frac{V_{IN} \dfrac{1}{C_{DG7} S}}{\dfrac{1}{C_{DG7} S} + R_{DP7}} = \frac{V_{OUT} \left( \dfrac{1}{C_{DG9} S} + R_{DN9} \right)}{R_{DN9}}. \tag{6}$$

Therefore, the transfer function for the evaluation circuit can be written as

$$\frac{V_{OUT}}{V_{IN}} = \frac{C_{DG9} R_{DN9} S}{(1 + C_{DG9} R_{DN9} S)(1 + C_{DG7} R_{DP7} S)}. \tag{7}$$

The discharging time can be calculated from (7) or from $R_{Th}$ of the approximate model shown in Fig. 3.

Fig. 4. (a) TCAM array mask distribution (global and local masking). (b) Net and port charge variation in the proposed QSTCAM.

IV. SELECTIVE MATCHLINE EVALUATION SCHEME

The matchline charge-up or charge-down depends upon the state of the matchline control transistors ($T_{17}$ and $T_{18}$ in the conventional TCAM; $T_5$ and $T_6$ or $T_{11}$ and $T_{12}$ in the compact TCAM). In the proposed TCAM, the state of transistor $T_9$ decides the matchline value. In the proposed design depicted in Fig. 2(c), a voltage level at $N_2$ sufficiently below the threshold of $T_9$ ensures the match state. Circuit behavior under various masking conditions for pattern matching is discussed first, followed by an explanation of the normal match and mismatch states.

A. Pattern Matching Implementation

The additional don’t care state (X) of TCAM, as summarized in sequences 5 and 6 in Table I, is significant in longest prefix matching, pattern matching, and fast lookup of network routing [5]–[9]. Local masking provides a strong “0” and global masking supplies a weak but low logic $N_2$ to the gate of the ML control transistor $T_9$ shown in Fig. 2(c). At this $N_2$ signal level, the matchline remains charged at $V_{DD}$. The TCAM array mask distribution shown in Fig. 4(a) considers a 128-word × 32-bit macro with global masking at bit 4 ($BL_4 = \overline{BL_4} = 0$) and local masking (X) at the shown positions.

A 3-search timing diagram of a 1-bit TCAM cell is presented in Fig. 4(b). DWL is kept high to write the data, and MWL is kept high for writing the mask value (XM) to the net $N_1$. The masking bits are stored in the mask-bit register and driven through the mask-bit driver. The writing strategy of the mask bits into the TCAM cell is very similar to the conventional mechanisms except for the timing. The two storage nodes ($Q$ and $\overline{Q}$) are kept at the same voltage level irrespective of the mask value. This property of separating data storage (static) and mask storage (dynamic) reduces the dynamic power consumption due to alternate storage of data and mask values at the same storage cell. The reason for separating the wordlines for data and mask storage is to achieve parallel data and mask write. This increases the frequency of operation by enabling a simultaneous search after precharge. During global masking, the sources of both evaluation transistors ($T_5$ and $T_6$) are provided with logic 0, which matches all “$Q$” values.
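The effect of global and local masking on a search can be mimicked in software. The following Python sketch is a behavioural analogy under stated assumptions, not a model of the QSTCAM circuit: the function `search`, the string encoding ('0', '1', and 'X' for a locally masked cell), and the left-to-right bit numbering are hypothetical choices made for illustration. A globally masked column is skipped for every word, in the spirit of driving both search lines of that bit low.

```python
def search(words, key, global_mask=frozenset()):
    """Return indices of stored words whose matchline would stay charged.

    words       -- list of equal-length strings over '0', '1', 'X'
                   ('X' marks a locally masked cell; assumed encoding)
    key         -- search string over '0' and '1'
    global_mask -- bit positions (0 = leftmost, an assumption) that are
                   ignored for every word
    """
    hits = []
    for index, word in enumerate(words):
        if len(word) != len(key):
            raise ValueError("word and key widths differ")
        mismatch = any(
            stored != "X"                    # local mask: cell cannot discharge ML
            and position not in global_mask  # global mask: column is ignored
            and stored != searched           # ordinary bit comparison
            for position, (stored, searched) in enumerate(zip(word, key))
        )
        if not mismatch:
            hits.append(index)
    return hits
```

With the stored words `["1010", "10X0", "0110", "1000"]` and the key `"1010"`, the sketch reports words 0 and 1; adding `global_mask={2}` also admits word 3, since its only mismatch sits in the masked column.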
Fig. 7. Measurement results of the extracted layout netlist of a 128 × 32-bit QSTCAM (α: Considering full matchline charge; β: Considering acceptable
matchline charge.)
Fig. 9. Power analysis of compared TCAMs. (a) and (b) Peak and average power comparison, respectively, at varying temperature from −20 °C to 100 °C and $V_{DD}$ of 1 V. (c) Average power comparison for supply voltage scaling from 1.2 to 0.7 V.
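The supply-voltage dependence of the dynamic power can be explored numerically with the searchline and matchline expressions (1), (3), and (4) given earlier. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the capacitance and frequency values a caller passes in are placeholders rather than extracted figures, and leakage is ignored, so it does not reproduce the measured curves.

```python
def searchline_power(num_searchlines, c_sl, vdd, freq):
    """Expression (1): P_search = 1/2 * S * C_SL * VDD^2 * f."""
    return 0.5 * num_searchlines * c_sl * vdd ** 2 * freq


def matchline_power(words, bits, c_ml, vdd, freq, ml_swing=None):
    """Expressions (3) and (4): P_ML = m * n * C_ML * VDD * V_MLswing * f.

    When ml_swing is None the swing is taken equal to VDD, which
    reduces (4) to the full-swing expression (3).
    """
    swing = vdd if ml_swing is None else ml_swing
    return words * bits * c_ml * vdd * swing * freq


def relative_dynamic_power(vdd, vdd_ref=1.2):
    """Full-swing dynamic power at vdd relative to vdd_ref.

    Capacitance and frequency are assumed unchanged, so only the
    quadratic VDD term of (1) and (3) remains.
    """
    return (vdd / vdd_ref) ** 2
```

Under this simplified model, `relative_dynamic_power(0.7)` is about 0.34, that is, scaling the supply from 1.2 V to 0.7 V leaves roughly a third of the full-swing dynamic power at the same frequency.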
Fig. 10. Matchline charge variation of compared TCAMs at mismatch state. An equal rise and fall time of 10 ps has been considered.

delay as discussed in Section III. Transistors functioning in this decision path (conventional TCAM: $TG_7$, $TG_8$, $T_{17}$, and $T_{18}$; compact TCAM: $T_5$ and $T_6$ or $T_{11}$ and $T_{12}$; proposed QSTCAM: $T_5$, $T_7$, and $T_9$ or $T_6$, $T_7$, and $T_9$) contribute to the search time. As an approximation, a higher number of transistors in the decision path leads to a higher matchline delay.

The ML delay variation over the given temperature range follows the peak power variation. The higher the peak a design reaches, the longer it takes to settle down completely (discharge). A lower average rate of change of 10.65% is found in the ML delay of the conventional TCAM, but the proposed design searches faster in the temperature range from −20 °C to 100 °C. Performance of the proposed design degrades at a lower rate compared to the compact TCAM as the supply voltage is scaled down.

An ML delay increment of 89% and 90.7% for the proposed and compact TCAM, respectively, has been found at $V_{DD}$ of 0.6 V, and hence this supply voltage is not considered in the comparison presented. The matchline charge variation of the compared designs for a mismatch case at 27 °C and $V_{DD}$ of 1 V is shown in Fig. 10. The proposed design clearly settles down faster than the compared designs. The comparable energy dissipation and small ML delay provide the best energy delay product (EDP) among the referred TCAMs.

Energy dissipation for various TCAM macros of 32 bits is compared in Fig. 11(a). A similar energy dissipation metric in all macro sizes secures the cascadability of the proposed QSTCAM for forming larger TCAM arrays. The power consumption distribution at various operational phases is presented in Fig. 11(b). The low power consumption of the compact TCAM during the write and precharge phases provides a design advantage at low operating frequency, but the overwhelming value during the search limits its use. The proposed QSTCAM provides a trade-off between static and matchline energy dissipation. The two storage nodes ($Q$ and $\overline{Q}$) increase the static energy dissipation, but it remains well below that of the conventional design.

C. Device Scaling and Process Corner Variation

The stability of the compared designs has been tested at various process corners (threshold voltages at various corners are presented in the Appendix). The proposed design dissipates less at slow corners (FS and SS), while the evaluation result pass transistor ($T_7$) contributes to the smaller ML delay at the fast corner (FF). The compact TCAM advances at the FS corner, but the higher energy dissipation provides the worst EDP at the fast corner. The matchline delay of the conventional TCAM is more sensitive to the process corner variation, whereas an average variation of 15% in the EDP makes the proposed design the best, as shown in Fig. 11(c).

The effect of device scaling at the typical corner (TT) is summarized in Table II. CMOS circuits are more prone to leakage as the technology scales down [considering relative threshold voltage ($V_{TH}$) reduction]. The threshold voltages, however, are not scaled down in that proportion. The leaky static storage cells in the conventional and compact designs are the reason behind the high average power consumption at lower device sizes. The proposed design therefore performs better in all respects as the technology is scaled down and can work further (below 45 nm) by specifying a proper $V_{DD}/V_{TH}$ ratio.
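The energy delay product used in this comparison is simply the product of the energy per search and the ML delay, and the operating frequency follows from the cycle time. The short Python sketch below illustrates this arithmetic. The helper names are hypothetical, and any delay or ratio values passed in are illustrative inputs rather than measured results; only the 1.2 ns cycle time and the 1.72 fJ/bit/search figure are quoted from the paper.

```python
def operating_frequency_hz(cycle_time_s):
    """Operating frequency implied by a cycle time."""
    return 1.0 / cycle_time_s


def energy_delay_product(energy_j, delay_s):
    """EDP as energy per search multiplied by matchline delay."""
    return energy_j * delay_s


def edp_improvement(energy_ratio, delay_ratio):
    """Fractional EDP reduction of a design relative to a reference.

    Both arguments are design/reference ratios (assumed inputs), so a
    value of 1.0 means no change in that quantity.
    """
    return 1.0 - energy_ratio * delay_ratio
```

For instance, `operating_frequency_hz(1.2e-9)` gives roughly 833 MHz, consistent with the cycle time and operating frequency quoted for the proposed design, and `edp_improvement(1.0, 0.67)` shows that a 33% shorter delay at equal energy would by itself give a 33% lower EDP.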
Fig. 11. (a) Energy dissipation comparison at various TCAM macros. (b) Average power distribution among various operation phases. (c) Design sensitivity to
energy-delay at various process corners.
TABLE III
IMPACT OF PROCESS CORNER VARIATION ON PROPOSED QSTCAM PERFORMANCE AT VARIOUS SUPPLY VOLTAGES (FF: FAST CORNER; FS: FAST nMOS, SLOW pMOS; SF: SLOW nMOS, FAST pMOS; SS: SLOW CORNER; TT: TYPICAL CORNER)
TABLE IV
TCAM SUSTAINABILITY, ENERGY DISSIPATION AND MATCHLINE LOW LOGIC VOLTAGE COMPARISON AT VARIOUS FREQUENCY OF OPERATION
Fig. 12. (a) Matchline delay performance on voltage scaling. (b) EDP comparison for supply voltage scaling from 1.2 to 0.7 V.
TABLE II
IMPACT OF DEVICE SCALING ON VARIOUS TCAM PERFORMANCE AT TYPICAL CORNER (CYCLE TIME OF 90 ns HAS BEEN CONSIDERED IN THIS COMPARISON)
Fig. 13. Measurement waveforms and performance results of the extracted layout netlist of the proposed QSTCAM.
TABLE V
PERFORMANCE COMPARISON SUMMARY OF REFERRED DESIGNS
TABLE VI
PERFORMANCE COMPARISON SUMMARY OF COMPARED 32 × 64-BIT MACROS AT 27 °C AND $V_{DD}$ OF 1.0 V
TABLE VII
PREDICTIVE MODELS OF THE GENERIC PROCESS DESIGN KIT (GPDK) REPRESENTING THE THRESHOLD VOLTAGES [V] AT VARIOUS CORNERS
TABLE VIII
PREDICTIVE MODELS OF THE GENERIC PROCESS DESIGN KIT (GPDK) REPRESENTING THE DEVICE SIZES [nm]

scheme perform faster search at low average energy dissipation change of 12.7% over supply voltage scaling. The design reaches a larger peak during the phase change from precharge to search compared to the conventional TCAM due to the use of two soft storage nodes, but the higher matchline discharge rate renders a low settling time. The proposed 9-T–6-I/O QSTCAM can be used in applications with low-power and high-density storage requirements. The design dissipates 1.72 fJ/bit/search at 1 V and can perform at a cycle time of 1.2 ns. The results show that a 33% reduction in the matchline delay with an average improvement of 62% in the energy delay product has been achieved over the compared architectures.

APPENDIX

Predictive models of the generic process design kit (GPDK) representing the threshold voltages and device sizes are shown in Tables VII and VIII, respectively.

REFERENCES

[1] K. Zheng, C. Hu, H. Lu, and B. Liu, “A TCAM-based distributed parallel IP lookup scheme and performance analysis,” IEEE/ACM Trans. Netw., vol. 14, no. 4, pp. 863–875, Aug. 2006.
[2] Y. D. Kim, H. S. Ahn, S. Kim, and D. K. Jeong, “A high-speed range-matching TCAM for storage-efficient packet classification,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 6, pp. 1221–1230, Jun. 2009.
[3] I. Arsovski, T. Hebig, D. Dobson, and R. Wistort, “A 32 nm 0.58-fJ/bit/search 1-GHz ternary content addressable memory compiler using silicon-aware early-predict late-correct sensing with embedded deep-trench capacitor noise mitigation,” IEEE J. Solid-State Circuits, vol. 48, no. 4, pp. 932–939, Apr. 2013.
[4] P.-T. Huang and W. Hwang, “A 65 nm 0.165 fJ/bit/search 256 × 144 TCAM macro design for IPv6 lookup tables,” IEEE J. Solid-State Circuits, vol. 46, no. 2, pp. 507–519, Feb. 2011.
[5] S. K. Maurya and L. T. Clark, “A dynamic longest prefix matching content addressable memory for IP routing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 6, pp. 963–972, Jun. 2011.
[6] I. Arsovski, T. Chandler, and A. Sheikholeslami, “A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme,” IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[7] H. Che, Z. Wang, and K. Zheng, “DRES: Dynamic range encoding scheme for TCAM coprocessors,” IEEE Trans. Comput., vol. 57, no. 7, pp. 902–915, Jul. 2008.
[8] P. Maffezzoni, B. Bahr, Z. Zhang, and L. Daniel, “Oscillator array models for associative memory and pattern recognition,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 6, pp. 1591–1598, Jun. 2015.
[9] Y. Sun and M. S. Kim, “A hybrid approach to CAM-based longest prefix matching for IP route lookup,” in Proc. IEEE GLOBECOM, 2010, pp. 1–5.
[10] L. Kosmidis, J. Abella, E. Quiñones, and F. J. Cazorla, “Efficient cache designs for probabilistically analysable real-time systems,” IEEE Trans. Comput., vol. 63, no. 12, pp. 2998–3011, Dec. 2014.
[11] I. Hayashi, T. Amano, N. Watanabe, Y. Yano, Y. Kuroda, M. Shirata, K. Dosaka, K. Nii, H. Noda, and H. Kawai, “A 250-MHz 18-Mb full ternary CAM with low-voltage matchline sensing scheme in 65-nm CMOS,” IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671–2680, Nov. 2013.
[12] C. Wang, C. Hsu, C. Huang, and J. Wu, “A self-disabled sensing technique for content-addressable memories,” IEEE Trans. Circuits Syst. II, Express Briefs, vol. 57, no. 1, pp. 31–35, Jan. 2010.
[13] V. Lines, A. Ahmed, P. Ma, and S. Ma, “66 MHz 2.3 M ternary dynamic content addressable memory,” in Proc. Record IEEE Int. Workshop Memory Technol., Design Testing, 2000, pp. 101–105.
[14] Y. H. Gong and S. Chung, “Exploiting refresh effect of DRAM read operations: A practical approach to low-power refresh,” IEEE Trans. Comput., vol. 65, no. 5, pp. 1507–1517, May 2016.
[15] M. Chae, J. W. Lee, and S. H. Hong, “Decoupled 4T dynamic CAM suitable for high density storage,” Electron. Lett., vol. 47, no. 7, pp. 434–436, Mar. 2011.
[16] V. Vinogradov, J. Ha, C. Lee, A. Molnar, and S. H. Hong, “Dynamic ternary CAM for hardware search engine,” Electron. Lett., vol. 50, no. 4, pp. 256–258, Feb. 2014.
[17] K. L. Tsai, Y. J. Chang, and Y. C. Cheng, “Automatic charge balancing content addressable memory with self-control mechanism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 10, pp. 2834–2841, Oct. 2014.
[18] N. Onizawa, S. Matsunaga, V. C. Gaudet, W. J. Gross, and T. Hanyu, “High-throughput low-energy self-timed CAM based on reordered overlapped search mechanism,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 3, pp. 865–876, Mar. 2014.
[19] S. H. Yang, Y. J. Huang, and J. F. Li, “A low-power ternary content addressable memory with pai-sigma matchlines,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
[20] N. Mohan, W. Fung, D. Wright, and M. Sachdev, “A low-power ternary CAM with positive-feedback match-line sense amplifiers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566–573, Mar. 2009.
[21] J. W. Zhang, Y. Z. Ye, and B. D. Liu, “A current-recycling technique for shadow-match-line sensing in content-addressable memories,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 6, pp. 677–682, Jun. 2008.
[22] B. D. Yang, Y. K. Lee, S. W. Sung, J. J. Min, J. M. Oh, and H. J. Kang, “A low-power content addressable memory using low swing search lines,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 12, pp. 2849–2858, Dec. 2011.
[23] A. Agarwal, S. Hsu, S. Mathew, M. Anders, H. Kaul, F. Sheikh, and R. Krishnamurthy, “A 128 × 128b high-speed wide-AND match-line content addressable memory in 32 nm CMOS,” in Proc. 2011 ESSCIRC, 2011, pp. 83–86.
[24] C. C. Wang, J. S. Wang, and C. Yeh, “High-speed and low-power design techniques for TCAM macros,” IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 530–540, Feb. 2008.
[25] A. T. Do, S. Chen, Z. H. Kong, and K. S. Yeo, “A high speed low-power CAM with a parity bit and power-gated ML sensing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 151–156, Jul. 2013.
[26] Y. J. Chang, K. L. Tsai, and H. J. Tsai, “Low leakage TCAM for IP lookup using two-side self-gating,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[27] N. Onizawa, S. Matsunaga, V. C. Gaudet, and T. Hanyu, “High-throughput low-energy content-addressable memory based on self-timed overlapped search mechanism,” in Proc. 18th IEEE Int. Symp. ASYNC, 2012, pp. 41–48.
[28] A. Wiltgen, K. Escobar, A. Reis, and R. Ribas, “Power consumption analysis in static CMOS gates,” in Proc. 26th SBCCI, 2013, pp. 1–6.
[29] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, “Leakage current: Moore’s law meets static power,” IEEE Comput., vol. 36, no. 12, pp. 68–75, Dec. 2003.
[30] D. Kayal, A. Dandapat, and C. Sarkar, “Design of a high performance memory using a novel architecture of double bit CAM and SRAM,” Int. J. Electron., vol. 99, no. 12, pp. 1691–1702, Jun. 2012.
[31] S. Mishra and A. Dandapat, “EMDBAM: A low-power dual bit associative memory with match error and mask control,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 6, pp. 2142–2151, Jun. 2016.
[32] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.

Sandeep Mishra (M’14) received the B.Tech and M.Tech degrees in electronics and communication engineering from the Biju Patnaik University of Technology, Rourkela, India, in 2011 and 2013, respectively. He is presently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya at Shillong, India.
His research area of interest covers low-power memory design, high-speed sense amplifier, and intelligent transportation system.

Telajala Venkata Mahendra received the B.Tech degree in electronics and communication engineering from the Jawaharlal Nehru Technological University, Kakinada, India, in 2013, and the M.Tech degree in VLSI Design in 2016 from the National Institute of Technology Meghalaya, Shillong, India.
He is currently a Junior Research Fellow at the National Institute of Technology Meghalaya. His research interests include design of low-power VLSI circuits, content-addressable memories, volatile memories, and FPGA-based implementations.

Anup Dandapat (M’10–SM’15) received the Ph.D. degree in digital VLSI design from Jadavpur University, Kolkata, India, in 2008.
He is presently an Associate Professor with the Department of Electronics and Communication Engineering, National Institute of Technology Meghalaya at Shillong, India. He has authored over 50 national and international journal papers. His current research interests include low-power VLSI design, low-power memory design, and low-power digital design.