IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 43, NO. 2, FEBRUARY 2008

High-Speed and Low-Power Design Techniques for TCAM Macros
Chao-Ching Wang, Jinn-Shyan Wang, Member, IEEE, and Chingwei Yeh

Abstract: Ternary content addressable memory (TCAM) is an important component for many applications. For TCAM-based networking systems, the rapidly growing size of routing tables brings with it the challenge of designing for higher search speeds and lower power consumption. In this work, two techniques are proposed to realize high-performance and low-power TCAM for IP address lookup. One technique is the tree AND-type match-line scheme for high search speed. The other technique is the segmented search-line scheme for low power. The implemented 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a 1.56 ns search time with an energy of 1.42 fJ/bit/search.

Index Terms: Associative memories, content-addressable memory, high speed, low power, PF-CDPD, pseudo-footless, segmented search line, tree match line.
Fig. 1. A search engine realized by a TCAM.

I. INTRODUCTION

The content addressable memory (CAM) is an important component for accelerating data search operations in many applications such as database access [1], pattern matching [2], signal processing [3], and networking IP address lookup [4]–[6]. In some applications such as IP address lookup, ternary content addressable memory (TCAM) is required to implement the masking function by storing "X" (don't care) in the TCAM cell. When "X" is stored in a TCAM cell, the cell datum is always regarded as matching the search datum, no matter what the search datum is. Fig. 1 shows a search engine realized with a TCAM. The input data are fed into the search lines through the search-line buffers, and are then compared simultaneously with all the stored data in the TCAM array. Row-based data search is performed to generate matching results through match-line circuits. Previous works [7]–[14] have demonstrated that the design of the match-line circuit has a major impact on search speed and power consumption.
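The ternary matching rule described above can be illustrated with a short behavioral sketch. It is not a model of the circuit: the string encoding of the stored words, the function names, and the first-match priority rule are assumptions made only for this illustration.

```python
from typing import List, Optional


def ternary_match(stored: str, search: str) -> bool:
    """Return True if a stored ternary word matches a binary search word.

    `stored` uses '0', '1', and 'X' (don't care); `search` uses '0' and '1'.
    """
    if len(stored) != len(search):
        raise ValueError("stored and search words must have equal width")
    return all(s == "X" or s == q for s, q in zip(stored, search))


def search_tcam(table: List[str], search: str) -> List[bool]:
    """Compare the search word with every row (one result per match line)."""
    return [ternary_match(row, search) for row in table]


def first_match(table: List[str], search: str) -> Optional[int]:
    """Index of the first matching row, or None (assumed priority rule)."""
    for index, hit in enumerate(search_tcam(table, search)):
        if hit:
            return index
    return None


table = ["1011", "10XX", "XXXX"]
print(search_tcam(table, "1001"))  # [False, True, True]
print(first_match(table, "1001"))  # 1
```

A row consisting only of "X" cells matches every search word, which is why such cells need not take part in the comparison.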
It is generally recognized that NOR-type match-line circuits [7]–[9] achieve high search speed but at the expense of high power consumption, while NAND-type match-line circuits [10], [11] are power efficient with the penalty of low speed.
Recently, an AND-type match-line circuit [12] constructed
with the pseudo-footless clock-and-data pre-charged dynamic
(PF-CDPD) logic was proposed to achieve not only high speed
but also low power.
Manuscript received February 24, 2007; revised August 27, 2007. This work
was supported by the National Science Council, the Ministry of Economic Affairs, and the National Si-Soft Project of Taiwan.
The authors are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail:
ieegsw@ccu.edu.tw).
Digital Object Identifier 10.1109/JSSC.2007.914330

The works in [9]–[12] show that the power consumption of match-line circuits has been greatly reduced by the advancement in match-line circuit techniques. The design in [13], intended for the IP address lookup application, used low-swing search-line circuits for further power reduction, and added the pipelining technique to the NOR-type match-line circuit [9] for enhancing throughput. This indeed increases the throughput, but the area
overhead resulting from the flip-flops and the clock driver
for pipelining makes this design not cost effective, and the
extra power consumption required for these added components
nullifies any power saving from new search-line circuits. On
the other hand, the research in [14] added the non-pipelined
split-path technique on top of the AND-type match-line circuit
[12] to achieve over 50% search speed improvement compared
to the pipelined NOR-type match-line scheme [13]. However,
compared to the original AND-type match-line design [12], the
power efficiency of the split-path AND-type match-line scheme
is sacrificed due to a much larger clock loading.
This work proposes both high-speed and low-power design
techniques [15] for TCAM macros. The speed enhancement
technique is a tree AND-type match-line scheme, which can efficiently speed up search operations with only a slight sacrifice
in energy efficiency due to a slightly more complex interconnection. In addition, total power consumption can be reduced
through the proposed segmented search-line scheme by utilizing
the specific feature of IP address lookup.
The rest of the paper is organized as follows. Section II describes the tree match-line circuitry, and Section III describes
the segmented search-line circuitry. Other design considerations
are described in Section IV. Test chip implementation and experimental results are presented in Section V. Finally, conclusions are drawn in Section VI.

0018-9200/$25.00 © 2008 IEEE

WANG et al.: HIGH-SPEED AND LOW-POWER DESIGN TECHNIQUES FOR TCAM MACROS


Fig. 2. (a) The original cascaded AND-type match-line circuit. (b) Logic transformation.

II. TREE MATCH-LINE CIRCUITRY


In the first part of this section, we will point out the problems with the split-path AND-type match-line circuit [14]. The
analysis emphasizes the design concept of the proposed tree
AND-type match-line circuitry, which will be described in the
second part of this section.
A. Problems With the Split-Path AND-Type Match-Line
An original m-stage PF-CDPD AND-type match-line circuit [12] is shown in the upper part of Fig. 2(a), while the lower part of Fig. 2(a) depicts the same circuit represented by logic symbols. Except for the first gate, all other gates perform a p-input AND function. The evolution from the cascaded AND-type match line to the split-path AND-type match line is shown in Fig. 2(b). The split-path AND-type match line consists of separated p-input AND gates. It was shown in [14] that a 23.33% speed gain (delay reduced from 2.1 ns to 1.61 ns) is obtained by the logic transformation, mainly due to a much simpler critical-path circuitry.
However, the speed enhancement comes at a cost. For a 256 × 128b BiCAM macro designed in a 0.18 μm CMOS technology [16], the energy efficiency deteriorates substantially, from 2.33 fJ/bit/search to 4.83 fJ/bit/search [14]. Our analysis indicates that the deterioration has three causes. First, the clock driver needs to be enlarged because all the separated gates need to be triggered by the clock signal, which increases the power consumption of the clock driver. Second, all separated p-input AND gates are evaluated independently. This means that the evaluation of a gate does not depend on the evaluation results of any other gate, so the switching activity of these gates will be higher than that of the p-input AND gates in the cascaded design. Third, the number of logic gates is increased, and hence the interconnections among the logic gates and the parasitic capacitance are correspondingly increased.
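The trade-off quoted above can be checked with a few lines of arithmetic. The sketch below uses only the delay and energy figures cited from [14]. The energy-delay product is an added figure of merit, not one used in the text, and it assumes that both pairs of numbers describe the same two designs.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# Figures quoted in the text for the cascaded and split-path designs [14].
delay_cascaded_ns, delay_split_ns = 2.1, 1.61
energy_cascaded_fj, energy_split_fj = 2.33, 4.83

speed_gain = -relative_change(delay_cascaded_ns, delay_split_ns)
energy_ratio = energy_split_fj / energy_cascaded_fj
# Energy-delay product ratio (assumed figure of merit, larger is worse).
edp_ratio = (delay_split_ns * energy_split_fj) / (
    delay_cascaded_ns * energy_cascaded_fj
)

print(f"speed gain:   {speed_gain:.2%}")     # 23.33%
print(f"energy ratio: {energy_ratio:.2f}x")  # 2.07x
print(f"EDP ratio:    {edp_ratio:.2f}x")     # 1.59x
```

Under these assumptions the split-path design roughly doubles the energy per search for a delay reduction of less than a quarter, so its energy-delay product is worse than that of the cascaded design.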
B. The Proposed Tree AND-Type Match-Line Circuitry
The basic concept behind the split-path AND-type match-line
circuit is that it tries a different way to implement a big AND
function originally realized by m cascaded AND gates. However, there are several ways to achieve the same goal, and three
of them are shown in Fig. 3 (assuming 64b in each half plane).
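A small behavioral sketch may help to show why different gate arrangements realize the same wide AND function with different logic depths. The gate sizes used here (8-input gates over a 64b half plane) are hypothetical and are not the sizes of the circuits in Fig. 3. The functions only count gates on the longest path and say nothing about electrical delay or power.

```python
from typing import List, Sequence, Tuple


def chunks(signals: Sequence[bool], size: int) -> List[Sequence[bool]]:
    """Split a list of signals into consecutive groups of at most `size`."""
    return [signals[i:i + size] for i in range(0, len(signals), size)]


def cascaded_and(cell_matches: Sequence[bool], p: int) -> Tuple[bool, int]:
    """Chain of gates: each stage ANDs the previous result with p cell signals.

    Returns (match result, number of gates on the critical path).
    """
    if p < 1 or not cell_matches:
        raise ValueError("need p >= 1 and at least one cell signal")
    result, depth = True, 0
    for group in chunks(cell_matches, p):
        result = result and all(group)
        depth += 1
    return result, depth


def tree_and(cell_matches: Sequence[bool], fan_in: int) -> Tuple[bool, int]:
    """Tree of gates: groups are evaluated in parallel and merged level by level."""
    if fan_in < 2 or not cell_matches:
        raise ValueError("need fan_in >= 2 and at least one cell signal")
    signals = list(cell_matches)
    depth = 0
    while len(signals) > 1:
        signals = [all(group) for group in chunks(signals, fan_in)]
        depth += 1
    return signals[0], depth


half_plane = [True] * 64           # every cell matches
print(cascaded_and(half_plane, 8))  # (True, 8)
print(tree_and(half_plane, 8))      # (True, 2)

half_plane[37] = False             # one mismatching cell
print(cascaded_and(half_plane, 8))  # (False, 8)
print(tree_and(half_plane, 8))      # (False, 2)
```

Both arrangements compute the same result, but the tree places far fewer gates in series, at the price of the extra interconnection needed to merge the branches.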

Fig. 3. (a) Parallel, (b) 3-level tree, and (c) 2-level tree AND-type match lines.

The design in Fig. 3(a) uses two short parallel match lines in
each half plane and merges the outputs from both planes into
a 4-input AND gate to generate the final matching result. On
the other hand, the designs in Fig. 3(b) and (c) adopt a 3-level and a 2-level tree match-line circuit, respectively, in each half plane, and use an 8-input and a 4-input AND gate, respectively,
to generate the final matching results. The electrical behaviors,
including delay time and power consumption, are used to determine the final choice. The evaluation results are described as
follows.
Let us take the design of a 0.18 μm 128b TCAM match line
as an example. Post-layout evaluation results of different implementations are listed in Table I. All the designs use the same
TCAM cell for a fair evaluation, and the cell layout is shown in
Fig. 4. The impacts of the TCAM cell design and the cell layout
design will be described in Section IV.
The following are the observations from the extracted features and parameters.
1) Both designs 1 and 2 have the deepest logic depth, but
design 1 performs a more complex function in the critical
path than design 2. So, design 1 has the longest search
delay.

2) Designs 3, 4, and 5 show nearly 30% improvement in search speed compared to design 1.
3) The delay times of designs 3, 4, and 5 differ by no more than 1.5%. Therefore, the final decision can be made based on the power consumption.
4) Compared to the cascaded design, whose power consumption is 233.6 (in the units of Table I), the split-path design consumes 85% more power. On the other hand, compared to the cascaded design, the parallel design and the 3-level tree design consume about 20% more power, and the 2-level tree design consumes only 9% more power.
5) Therefore, we adopt the 2-level tree match-line circuitry in the TCAM design.

TABLE I
PERFORMANCE COMPARISONS BETWEEN DIFFERENT MATCH LINES

Fig. 4. The TCAM cell layout used for match-line evaluation.

III. SEGMENTED SEARCH-LINE CIRCUITRY
In order to reduce power consumption, we must be aware that a TCAM macro consumes power mainly in three parts: the clock driver, the match lines, and the search lines. Due to the advancement of match-line circuit techniques, the power consumption of both the match-line circuit and the clock driver has been greatly reduced. According to the data published in [9], [11], and [12], the search time and the energy index breakdown normalized to a 1.2 V 0.1 μm technology are shown in Fig. 5. We find that the search lines account for about 54%, 71%, and 82% of the total power consumption of the CAM designs [9], [11], and [12], respectively. To reduce the power consumption of the search lines, this study proposes the segmented search-line technique for the TCAM macro in the application of IP address lookup. In the following, we will first describe the attributes of TCAM for IP address lookup, and then describe the design of the segmented search-line circuitry.
A. Attributes of TCAM for IP Address Lookup
In Internet Protocol version 6 (IPv6), the length of an IP address extends to 128 bits. In a routing table, the prefix region stores either 0 or 1, and the rest stores "X". The statistical prefix-length distribution observed at a specific router [17] is shown in Fig. 6(a). We find that more than 90% of the IP address prefixes are shorter than 64 bits. Therefore, when the routing table is constructed with a TCAM array, a large portion of the array contains the mask bits (i.e., the "X" bits), as shown in Fig. 6(b).
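The relation between prefix length and the share of don't-care cells can be sketched as follows. The prefix lengths in the example are hypothetical and are not taken from the distribution in Fig. 6(a). The string representation of a TCAM row is also only an illustrative convention.

```python
from typing import List


def prefix_to_entry(prefix_bits: str, width: int = 128) -> str:
    """Pad a binary prefix with 'X' (don't care) cells up to the word width."""
    if len(prefix_bits) > width or set(prefix_bits) - {"0", "1"}:
        raise ValueError("prefix must be a binary string no longer than width")
    return prefix_bits + "X" * (width - len(prefix_bits))


def dont_care_fraction(entries: List[str]) -> float:
    """Fraction of all cells in the table that store 'X'."""
    total_cells = sum(len(entry) for entry in entries)
    return sum(entry.count("X") for entry in entries) / total_cells


# Hypothetical prefix lengths, all no longer than 64 bits.
prefix_lengths = [32, 48, 48, 64]
table = [prefix_to_entry("1" * n) for n in prefix_lengths]
print(dont_care_fraction(table))  # 0.625
```

With prefixes of 64 bits or fewer in a 128b-wide array, at least half of the cells of every row store "X", which is the property the segmented search-line scheme relies on.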
B. The Proposed Segmented Search-Line Circuitry
Since the "X" cells in Fig. 6(b) do nothing but pass matching signals, they do not have to be involved in the search operation. This property, when combined with the progressive layout pattern, indicates that the search lines behind the "X" cells can be turned off to save energy. The idea leads to the segmented search-line design shown in Fig. 7. Many segmentation entries (SEs) are inserted into the cell array. A segmentation entry contains a row of segmentation cells (SCs), and the SCs are used to control signal propagation in the search lines.
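The effect of the segmentation entries can be estimated with a simple counting model. The sketch assumes that rows are sorted with the longest prefix next to the search-line buffers, that an SC blocks a column whenever the cell just before it stores "X", and that search-line energy is proportional to the number of cells attached to driven segments. All three are simplifications introduced for illustration, and the table used in the example is hypothetical.

```python
from typing import List


def driven_cells(prefix_lengths: List[int], width: int, se_rows: List[int]) -> int:
    """Count the cells attached to search-line segments that are actually driven.

    prefix_lengths: prefix length per row, longest first (row 0 is next to
                    the search-line buffers).
    se_rows:        an SE sits just before each listed row index.
    """
    if prefix_lengths != sorted(prefix_lengths, reverse=True):
        raise ValueError("rows must be sorted by decreasing prefix length")
    boundaries = [0] + sorted(se_rows) + [len(prefix_lengths)]
    driven = 0
    for column in range(width):
        for start, end in zip(boundaries, boundaries[1:]):
            # Column `column` stores X in a row whose prefix length <= column.
            if start > 0 and prefix_lengths[start - 1] <= column:
                break  # the cell before this SE stores X: the SC blocks the signal
            driven += end - start
    return driven


lengths = [8, 6, 4, 4, 2, 2, 1, 1]    # hypothetical 8-row, 8b-wide table
total = len(lengths) * 8
print(driven_cells(lengths, 8, []))   # 64 (no SE: every segment is driven)
print(driven_cells(lengths, 8, [4]))  # 48 (one SE before row 4)
print(1 - driven_cells(lengths, 8, [4]) / total)  # 0.25
```

In this toy table a single SE removes a quarter of the driven search-line load. The saving in a real array depends on the prefix-length statistics and on where the SEs are placed.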
The circuit containing an SC and two TCAM cells is shown in Fig. 8. The SC is composed of a dummy cell and a path-control switch. The word line (WL) for the upper TCAM cell is also applied to the dummy cell. When writing an "X" into the upper TCAM cell, both the WBLP and WBLN lines are raised high. In that case, the output of the dummy cell receives a low to cut off the signal propagation, and the upper segment of the search lines (SBLNu and SBLPu) will be pulled down to ground. In other words, when the ternary cell above the segmentation cell (SC) stores an "X", the segmentation cell automatically blocks the search data from propagating forward and so saves energy.
The number and locations of segmentation entries can be decided by the statistical features of the routing table. Once the TCAM array has been designed, the segmentation entries cannot be changed for a specific embedded application. If an entry needs to be added to the look-up table, the table should be re-sorted at the system level first, and then write operations are performed to update the table. A design example and experimental results will be described in Sections IV and V.

Fig. 5. Search time and energy index of conventional CAM macros.

Fig. 6. (a) The prefix length distribution of IP addresses, and (b) the corresponding TCAM array.

Fig. 7. Concept of the segmented search-line scheme.

Fig. 8. The circuit showing the relationship between the SC and neighboring TCAM cells.

IV. OTHER DESIGN CONSIDERATIONS
In this section, we describe the physical design considerations for a 1.8 V 0.18 μm 256 × 128b TCAM macro, including the TCAM cell design, the interconnections among TCAM cells, the TCAM cell layout, and the design of the segmentation entries.
A. TCAM Cell Design
Fig. 9 depicts the schematic of the proposed TCAM cell for the AND-type match line. This cell uses two independent latches for storing three possible kinds of data, similar to the TCAM cell [8] used for the NOR-type match line. QN and QP are the storage nodes, and they store complementary values when the stored datum is either 0 or 1. MN and MP perform the comparison (XOR) function between (QN, QP) and (SBLN, SBLP). When the cell needs to store "X", both QN and QP should be written 0 to turn off MN and MP and so disable the XOR operation. In the AND-type match line, transistor MC should always be turned on in this case, and MX1 and MX2 are used to charge node cell_out to a high voltage level VDDC for this purpose.
This cell is almost identical to the TCAM cell used in [12] except for the level of VDDC, which here is generated by a diode (see Fig. 14). This change considers the charge sharing effect (CSE). We adopt the same simulation model [Fig. 10(a)] as that used in [12] to observe the worst CSE of the proposed design. During the evaluation period, the voltage at node out should be kept sufficiently low to guarantee a correct function. However, due to the CSE, there will be a ripple or even a logic change at node out. The simulation results reflecting the relationship between the undesired ripple voltage at node out and the channel length (L) of the feedback pMOS at the typical (TT) and worst (SF) process corners are shown in Fig. 10(b). The results indicate that with the VDDC level used in [12], the adjustable range of L for a maximal ripple voltage not exceeding 0.4 V is very limited. Moreover, for the same ripple voltage, say 0.15 V, the design with the proposed VDDC level can use a longer L (0.5 μm) and obtain a shorter gate delay (188 ps), while the design with the VDDC level of [12] should use a shorter L (0.21 μm) and get a larger gate delay (379 ps).

Fig. 9. The proposed TCAM cell.

TABLE II
PERFORMANCE COMPARISON BETWEEN DIFFERENT INTERCONNECTION MANNERS

B. Interconnections Among TCAM Cells
All the gates in the proposed 2-level tree match line [Fig. 3(c)]
should be arranged in one row in the memory array. Therefore,
it is necessary to make a long interconnection to link two
branches of the tree. The way an interconnection is made influences the amount of parasitic capacitance and in turn influences
both search speed and power consumption. We have studied
two interconnection methods for performance evaluation.
Fig. 11(a) and (b) show the conceptual and layout diagrams
of straightforward and leap-frog interconnection methods,
respectively. Post-layout simulation results are summarized in
Table II.
The simulation data in Table I are based on the leap-frog interconnection. The data in Table II reveal that if a straightforward interconnection is adopted, not only will the search be delayed but also the power consumption will increase. This effect
is mainly because the long interconnection in the straightforward manner lies in the critical path and results in a larger RC
product.
C. TCAM Cell Layout
Both evaluation results in Tables I and II are based on the
TCAM cell shown in Fig. 4. In the following, we show performance evaluations based on different cell layouts. Fig. 4 shows a TCAM cell layout with an aspect ratio of 1.17. We designed two other cell layouts with smaller aspect ratios, as shown in Fig. 12.
Table III summarizes the post-layout evaluation results for a
128b 2-level tree match line (ML).
The data in Table III show that the smaller the aspect ratio
of the TCAM cell, the longer the search delay, the larger the
power consumption of the match-line but the smaller the capacitance on the search lines (SL). A good tradeoff is to use the design of Fig. 12(a) because it only sacrifices 1.97% search delay
but obtains 33% SL capacitance reduction. The overall power
reduction from 33% SL capacitance reduction will more than
compensate for the 4% ML power increase.
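The claim that the search-line saving outweighs the match-line increase can be checked with a first-order estimate. The sketch assumes that search-line power scales linearly with search-line capacitance and, pessimistically, that all power not spent on the search lines is match-line power. The search-line shares used in the example are the ones quoted in Section III for the earlier designs [9], [11], and [12]. They are illustrative inputs and not measurements of the proposed macro.

```python
def net_power_change(
    sl_share: float,
    ml_share: float,
    sl_cap_reduction: float = 0.33,
    ml_power_increase: float = 0.04,
) -> float:
    """Fractional change in total power (negative means a net saving).

    Assumes search-line power is proportional to search-line capacitance.
    """
    return -sl_share * sl_cap_reduction + ml_share * ml_power_increase


def break_even_sl_share(
    sl_cap_reduction: float = 0.33, ml_power_increase: float = 0.04
) -> float:
    """Search-line share above which the layout change saves power,
    assuming everything that is not search-line power is match-line power."""
    return ml_power_increase / (sl_cap_reduction + ml_power_increase)


for share in (0.54, 0.71, 0.82):
    change = net_power_change(share, 1.0 - share)
    print(f"SL share {share:.0%}: total power change {change:+.1%}")

print(f"break-even SL share: {break_even_sl_share():.1%}")
```

Under these assumptions the layout of Fig. 12(a) gives a net saving whenever the search lines account for more than roughly a tenth of the total power.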


Fig. 10. (a) Simulation model for observing the CSE and (b) simulation results.

Fig. 11. (a) Straightforward and (b) leap-frog interconnection manners.


TABLE III
IMPACT OF DIFFERENT CELL LAYOUTS

D. Design of Segmentation Entries


Although we can save more power if more SEs are used, the propagation delay of the search signal will be strongly affected by the sizing of the SC and the number of series SCs along the search line. To balance search speed against power consumption, we design the SEs with the following steps. First, assume there is only one SC along the search line, and size the transmission gate (TG) in the SC. For a 256-entry TCAM macro, the TG should be sized up to 2.4 times the minimal size so that the voltage at node cell_out in the TCAM cell can be pulled up to its high level from 0 V or pulled down to 0 V from its high level within the first half clock cycle to guarantee a safe match operation in the second half clock cycle. We also found that, with this sizing, the operation at cell_out becomes too slow even if there are only two SCs along the search path, as shown in Fig. 13. Second, increase the number of SEs along the search line for a larger power saving, but keep the above sizing and avoid further speed loss with the aid of floorplan design. Fig. 14 shows the final floorplan of the 256 × 128b TCAM macro with two SEs. One SE is located at the quarter and the other at the half of the search line. The write and search buffers are located at the center of the array so that they can drive the search lines in the upper and the lower half arrays simultaneously. With this design, each search signal will pass only one SC although one 256b search line has two SCs.

Fig. 12. (a) The second style, and (b) the third style TCAM cell layouts.

Fig. 13. Simulation waveforms under different numbers of segmentation entries.
In Fig. 14 we also show the diode used to generate VDDC.
The large diode is realized by many distributed small diodes
located at top and bottom of the cell array.
V. EXPERIMENTAL RESULTS
We implemented a 1.8 V 0.18 μm TCAM test chip for verifying the proposed design techniques. The critical-path circuit
of the TCAM macro is shown in Fig. 15(a). Before we can use
the TCAM for searching purposes, the TCAM array should be
filled with data using the write operation. The timing waveforms
for the write mode are shown in Fig. 15(b). When writing a don't care, the corresponding mask bit will be set to 0, and the corresponding bitline enable signal and write bitlines (WBLP, WBLN) will be pulled low and high, respectively. So, both storage nodes (QP and QN) of a TCAM cell will be written a 0 and the inner node cell_out will be pulled up to the voltage level of VDDC as described earlier. When writing 1 or 0, the bitline enable signal goes high and one of the write bitlines will be pulled low according to the input datum.

Fig. 14. The floorplan of the 256 × 128b TCAM macro.
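The storage convention used by the write operation can be summarized in a cell-level behavioral sketch. The text fixes only that "X" is stored as QN = QP = 0 and that 0 and 1 use complementary values. The particular assignment of 0 and 1 to (QN, QP) below is an assumption, as are the function names.

```python
from typing import Tuple

# (QN, QP) per stored datum. Only the X row is given by the text; the
# polarity chosen for '0' and '1' is an assumption for this sketch.
ENCODING = {"0": (1, 0), "1": (0, 1), "X": (0, 0)}


def write_cell(datum: str) -> Tuple[int, int]:
    """Return the storage-node values (QN, QP) written for '0', '1', or 'X'."""
    return ENCODING[datum]


def cell_out(qn: int, qp: int, search_bit: int) -> int:
    """Return 1 if the cell lets the match signal pass (match or don't care)."""
    if (qn, qp) == (0, 0):
        return 1  # X: comparison disabled, cell_out is held high
    if (qn, qp) == (1, 1):
        raise ValueError("QN = QP = 1 is not a valid stored state")
    stored_bit = 1 if qp else 0
    return 1 if stored_bit == search_bit else 0


for datum in ("0", "1", "X"):
    qn, qp = write_cell(datum)
    print(datum, [cell_out(qn, qp, bit) for bit in (0, 1)])
# 0 [1, 0]
# 1 [0, 1]
# X [1, 1]
```

The last row shows the masking behavior: a cell holding "X" passes the match signal for either value of the search bit.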
The timing waveforms for the search operation are shown
in Fig. 15(c). The internal clock signal for the match circuit is the complement of the external clock signal clk. When the internal clock goes low, the match-line circuit enters the pre-charge phase, and the external datum and its complement are fetched by the rising clk into the search lines through the search-line buffers. In this phase, the datum on the search lines is compared with all the data previously written and stored into the TCAM array, and the voltage at node cell_out of each memory cell goes toward its final value. When the internal clock goes high, the match-line circuit enters the evaluation phase. Please refer to [12] for the detailed operation of the PF-CDPD match-line circuit. All match lines are evaluated at the same time, and each match line generates an output that goes high if the search data match the stored data.

Fig. 15. (a) Schematic of the critical-path circuit, (b) waveforms of the write operation, and (c) waveforms of the search operation.
The block diagram of the test chip is shown in Fig. 16(a). The
TCAM macro contains two segmentation entries with the prefix
length of each being equal to 64 bits and 32 bits, respectively.
A voltage controlled oscillator (VCO) and a divide-by-two circuit are used to generate the clock signals with a 50% duty
cycle. The clock frequency range can be adjusted from 200 MHz
to 600 MHz. A dummy clock buffer synchronizes the rising (falling) edge of the clock clkt for the peripheral circuits with the falling (rising) edge of the internal clock for the TCAM core.
The pre-stored data and the search data are generated by four
32b linear feedback shift registers (LFSRs), and the seed for
the LFSRs can be controlled for varying the data sequence. The
mask-bit control circuit is used to help generate the progressive data pattern. The 8b counter is used for generating the address for the write operation. At the beginning of the measurement, the 4 × 32b LFSRs generate a random pattern, which is ANDed with the pattern output by the mask-bit control circuit to generate the progressive pattern for the look-up table. When doing matching operations, the random search data are generated by the LFSRs again with the same seed, and therefore each search datum will match one of the stored data. In this sense, the power is measured with one and only one match output per clock cycle. The timing diagrams of the test chip are shown in Fig. 16(b). From the timing diagrams, the search time can be calculated from the measured clock cycle time and the simulated set-up time of the flip-flop (5.2 ps).
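The test-pattern generation described above, in which LFSR data are ANDed with a mask pattern and then replayed from the same seed as search data, can be sketched in software. The feedback polynomial, the seeds, the prefix lengths, and the (value, mask) representation of a stored row are all assumptions made for this sketch. The text does not specify the taps of the on-chip LFSRs.

```python
from typing import Iterator, List, Sequence, Tuple

# Galois-form taps for x^32 + x^22 + x^2 + x + 1 (assumed polynomial).
TAPS_32 = 0x80200003
WIDTH = 128


def lfsr32(seed: int) -> Iterator[int]:
    """Yield successive 32b states of a Galois LFSR started from `seed`."""
    state = seed & 0xFFFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= TAPS_32
        yield state


def words_128(seeds: Sequence[int], count: int) -> List[int]:
    """Concatenate four 32b LFSR outputs into `count` 128b words."""
    generators = [lfsr32(seed) for seed in seeds]
    words = []
    for _ in range(count):
        word = 0
        for generator in generators:
            word = (word << 32) | next(generator)
        words.append(word)
    return words


def prefix_mask(prefix_len: int) -> int:
    """128b mask with ones in the prefix region and zeros in the X region."""
    return ((1 << prefix_len) - 1) << (WIDTH - prefix_len)


def build_table(seeds: Sequence[int], prefix_lens: Sequence[int]) -> List[Tuple[int, int]]:
    """Stored rows as (value, mask): random data ANDed with the mask pattern."""
    data = words_128(seeds, len(prefix_lens))
    return [(d & prefix_mask(n), prefix_mask(n)) for d, n in zip(data, prefix_lens)]


def matches(entry: Tuple[int, int], word: int) -> bool:
    value, mask = entry
    return (word & mask) == value


seeds = (1, 2, 3, 4)             # hypothetical seeds
prefix_lens = [64, 48, 32, 16]   # hypothetical progressive pattern
table = build_table(seeds, prefix_lens)
searches = words_128(seeds, len(prefix_lens))  # same seeds, same sequence
print(all(matches(row, word) for row, word in zip(table, searches)))  # True
```

Because the search words are regenerated from the same seeds, the i-th search word agrees with the i-th stored row on every unmasked bit, which is what guarantees a match in every cycle.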
The photograph of the test chip is shown in Fig. 17(a), and
the measured waveforms are shown in Fig. 17(b). The test chip
can run at a minimal clock cycle time of 3.12 ns at the typical supply voltage of 1.8 V. In this case, the search time is calculated as 1.56 ns. The lowest working supply voltage is 1.2 V, as shown in Fig. 17(c), and the corresponding search time is 5.63 ns. The energy indexes of the TCAM macro are measured to be 1.42 and 0.63 fJ/bit/search at 1.8 V and 1.2 V, respectively.
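The measured energy indexes can be put into perspective with some simple arithmetic. The sketch assumes the usual convention that the energy index is the energy of one search divided by the number of bits in the array, and that one search is issued in every clock cycle. The resulting power figure is therefore an estimate and not a measured value. The last lines compare the two measured indexes with an ideal quadratic dependence on supply voltage.

```python
ROWS, BITS_PER_ROW = 256, 128


def energy_per_search_pj(index_fj_per_bit: float) -> float:
    """Energy of one search over the whole array, in pJ (assumed convention)."""
    return index_fj_per_bit * ROWS * BITS_PER_ROW / 1000.0


def average_power_mw(energy_pj: float, cycle_ns: float) -> float:
    """Average power in mW if one search is issued per clock cycle."""
    return energy_pj / cycle_ns  # pJ / ns = mW


energy_1v8 = energy_per_search_pj(1.42)
power_1v8 = average_power_mw(energy_1v8, 3.12)
print(f"{energy_1v8:.2f} pJ per search at 1.8 V")        # 46.53 pJ
print(f"{power_1v8:.1f} mW at the 3.12 ns cycle time")   # 14.9 mW

# Ideal CV^2 scaling of the 1.8 V index down to 1.2 V.
scaled_index = 1.42 * (1.2 / 1.8) ** 2
print(f"{scaled_index:.2f} fJ/bit/search predicted at 1.2 V")  # 0.63
```

The quadratic estimate reproduces the measured 0.63 fJ/bit/search at 1.2 V, which suggests that the search energy is dominated by dynamic switching energy.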
Chip features extracted from the implemented chip are listed in Table IV. We obtained eight samples from an educational program, and all of them functioned correctly. The standard deviations of the search time and the energy index of these eight chips are 0.021 ns and 0.012 fJ/bit/search, respectively. This result implies that the proposed design techniques are robust to process variations.

Fig. 16. (a) The block diagram and (b) the timing diagrams of the test chip.

Fig. 17. (a) Photograph, (b) measurement waveforms, and (c) shmoo chart of the test chip.

TABLE IV
FEATURES SUMMARY OF THE TEST CHIP

Performance comparisons are given in Table V. Even though the proposed TCAM cell is 11% larger than the conventional TCAM cell used in [13], the normalized area per bit of the proposed design is 60% smaller than that of the conventional high-speed pipelined design [13]. When adopting the tree AND-type match-line circuitry and the segmented search-line technique, the 256 × 128b TCAM macro achieves a search time of 1.56 ns and an energy index of 1.42 fJ/bit/search at 1.8 V. This achievement represents a 51% improvement in the energy index compared to the TCAM design in [13]. It also represents a 38% reduction in the minimal clock cycle time and a 39% improvement in the energy index compared to the BiCAM design in [12]. Because the segmented search line (SSL) is an application-specific technique, we also show the energy index in Table V for the proposed TCAM without using the SSL technique. Compared to the TCAM design [13], the proposed design still shows a 25% improvement in the energy index.

TABLE V
FEATURES SUMMARY AND PERFORMANCE COMPARISON

TABLE VI
OTHER PERFORMANCE COMPARISONS
In order to see how the speed and power are affected by the bit width and CMOS technology, we have also implemented a 0.18 μm 1.8 V 256 × 144b TCAM macro and a 0.13 μm 1.2 V 256 × 128b TCAM macro. Table VI summarizes the design features. When realizing a 144b match line, a four-input PF-CDPD AND gate is added at the end of each branch of the 2-level tree AND-type match line [refer to Fig. 3(c)]. Compared to the 128b-wide TCAM macro, the search delay and the energy index of the 144b-wide TCAM match line increase by 12% and 23%, respectively. On the other hand, comparing the 0.13 μm 1.2 V 256 × 128b TCAM design to the 0.18 μm 1.8 V 256 × 128b TCAM design, the search delay and the energy index improve by 29% and 75%, respectively. The results indicate the benefits of technology scaling.
VI. CONCLUSION
In this work, the tree AND-type match-line scheme is proposed for its high search speed, and the segmented search-line scheme for its high energy efficiency in the TCAM-based application of IP address lookup. The design of the TCAM cell, the interconnections among TCAM cells, the TCAM cell layout, and the segmentation entries are also described. The realized 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a search time of 1.56 ns with an energy of 1.42 fJ/bit/search.
ACKNOWLEDGMENT
The authors thank the Chip Implementation Center for supporting the chip fabrication.
REFERENCES
[1] K.-J. Lin and C.-W. Wu, "A low-power CAM design for LZ data compression," IEEE Trans. Comput., vol. 49, pp. 1139–1145, Oct. 2000.
[2] F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit rate packet pattern-matching using TCAM," in Proc. IEEE ICNP, 2004, pp. 174–183.
[3] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536–544, Apr. 2000.
[4] R. Sangireddy and A. K. Somani, "High-speed IP routing with binary decision diagrams based hardware address lookup engine," IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 513–521, May 2003.
[5] T. Hayashi and T. Miyazaki, "High-speed table lookup engine for IPv6 longest prefix match," in Proc. GLOBECOM'99, 1999, vol. 2, pp. 1586–1571.
[6] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. GLOBECOM 2001, vol. 3, pp. 1877–1881.
[7] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.
[8] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[9] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
[10] F. Shafai, K. J. Schultz, G. F. R. Gibson, A. G. Bluschke, and D. E. Somppi, "Fully parallel 30-MHz, 2.5-Mb CAM," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1690–1696, Nov. 1998.
[11] S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2 ns search time, hybrid type TCAM architecture," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 254–260, Jan. 2005.
[12] H.-Y. Li, C.-C. Chen, J.-S. Wang, and C. Yeh, "An AND-type match-line scheme for high-performance energy-efficient content addressable memories," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.
[13] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[14] C.-C. Chen, H.-Y. Li, and J.-S. Wang, "The split-path AND-type match-line scheme for very high-speed content addressable memories," in Proc. Asian Solid-State Circuits Conf., 2005, pp. 525–528.
[15] J.-S. Wang, C.-C. Wang, and C. Yeh, "TCAM for IP-address lookup using tree-style AND-type match lines and segmented search lines," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2006, pp. 577–586.
[16] "TSMC 0.18um mixed signal 1P6M + MIM salicide 1.8 V/3.3 V process documents," Taiwan Semiconductor Manufacturing Co., Ltd., T-018-MM-TM-002.
[17] BGP Table Statistics. 2006 [Online]. Available: http://bgp.potaroo.net & http://bgpview.6test.edu.cn

Chao-Ching Wang was born in Taiwan, R.O.C., in 1981. He received the B.S. degree in electrical engineering from National Chung Cheng University, Taiwan, in 2003. He is currently working toward the Ph.D. degree at the Institute of Electrical Engineering, National Chung Cheng University.
His research interests include high-speed, low-leakage, and low-power memory designs.

Jinn-Shyan Wang (S'85–M'88) was born in Taiwan, R.O.C., in 1959. He received the B.S. degree in electrical engineering from National Cheng-Kung University, Tainan, Taiwan, in 1982, and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively.
He was with Industrial Technology Research Institute (ITRI) from 1988 to 1995, engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures.

Chingwei Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1986, and the Ph.D. degree in electrical and computer engineering from the University of California at San Diego in 1992.
Since then, he has been a faculty member with the Electrical Engineering Department, National Chung-Cheng University, Taiwan. His research interests include digital VLSI design and CAD.
