
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 7, JULY 2010, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 44

Improved Compact State Machines for High Performance Pattern Matching
Shanthi Makka
Dept. of Computer Science and Engineering, ITS Engineering College
Plot No 46, Greater Noida, INDIA

Abstract- Pattern matching is the task of finding all occurrences of a pattern in a text. Pattern matching is used in a wide range of applications, such as searching for particular patterns in DNA sequences, network intrusion detection, and virus scanning. Internet search engines also use it to find web pages relevant to queries. Recently, parallel pattern matching engines, based on ASICs, FPGAs or network processors, perform pattern matching by using multiple finite state machines. The state migration in the matching procedure incurs intensive memory accesses. Thus, it is difficult to minimize the storage of state machines so that they fit into on-chip or other fast memory modules to achieve high-speed pattern matching. Since the storage space must be minimized, this paper proposes optimization techniques, namely state re-labeling and memory partition, to reduce state machine storage. The paper also presents architectural designs based on the optimization strategy. We evaluate our design using realistic pattern sets, and the results show state machine memory reduction of up to 90.1%.

General Terms
Algorithms, Design, Performance, Security

Keywords
Pattern Matching, Parallel Processing, ASICs, FPGAs, Network Processors

1. INTRODUCTION

1.1 Pattern Matching
A pattern can be defined as a sub-part of a text or a string, which may or may not occupy contiguous positions. Pattern matching searches for predefined patterns in data streams. A pattern set (or database) can contain thousands of patterns and keeps expanding to enforce new security policies or capture new viruses. Pattern matching algorithms, such as Aho-Corasick [1] and D2FA [6], rely on state machines whose size is proportional to the size of the pattern database. Pattern matching starts at the beginning of a data stream and follows the state migration until the input stream is exhausted. Several studies and products are available for performing pattern searching with ASICs [3, 10] and FPGAs [2, 4]. Recently, network processors have also been introduced as programmable processing units for performing pattern matching.

1.2 Finite State Machines
Finite state machines are normally stored in a linear memory space, and pattern matching procedures perform intensive memory accesses while traversing state machines. Fast memory access time can therefore increase the performance of pattern matching significantly. Thus, it is necessary to minimize the storage space of state machines such that they fit into on-chip or other fast memory modules to achieve high-speed pattern matching.

1.3 Schemes Proposed
Generally, state machines stored in a linear memory space require considerable memory. This paper proposes schemes to reduce the memory required to store them. One scheme is to re-label the states of a machine so that the matching states are clustered at the beginning of the memory space, which leads to compact storage. The other scheme is to divide the state machines and provide separate memory modules to store keyword-ID and next-state pointers. We then present an architectural design using the proposed schemes. The experimental results show a significant reduction (up to 90.1%) in the memory consumption of state machines.

1.4 Organization of the Paper
The paper is organized as follows. Section 2 introduces the background of a parallel pattern matching architecture, and Section 3 motivates our research. Section 4 describes the proposed optimization schemes. Section 5 presents the architectural design using the schemes. Section 6 presents a comparison with existing work. Finally, Section 7 concludes.

2. BACKGROUND
The storage requirement of state machines significantly affects the performance of pattern matching. Pattern matching procedures perform intensive memory accesses while traversing state machines.
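As a concrete illustration of the state machines discussed above, the following Python sketch builds the goto-trie underlying an Aho-Corasick-style machine (goto function only, without failure links). The function and variable names are our own choices for illustration, not code from the paper:

```python
# Sketch: build the goto-trie underlying an Aho-Corasick state machine.
# State identifiers are assigned in creation order, as in the classical
# AC construction; matching (double-circled) states carry pattern IDs.

def build_trie(patterns):
    goto = [{}]          # goto[s] maps an input character to the next state
    output = [set()]     # output[s] holds IDs of patterns ending at state s
    for pid, pat in enumerate(patterns):
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(pid)   # s becomes a matching (output) state
    return goto, output

if __name__ == "__main__":
    goto, output = build_trie(["he", "she", "his"])
    print(len(goto))                                   # -> 8 states
    print([s for s in range(len(goto)) if output[s]])  # -> [2, 5, 7]
```

Note that even for three short patterns, the matching states (2, 5, 7) end up interleaved with the intermediate states, which is exactly the layout problem the re-labeling scheme in Section 4 addresses.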

So it is desirable that state machines be small enough to fit into high-speed memory devices, reducing the memory access time. There exist research works addressing the state machine storage issue [10, 8]. In particular, Tan and Sherwood proposed to leverage eight bit-level state machines to match the eight individual bits of a byte in parallel, instead of matching byte by byte. Such bit-level state machines reduce the number of next states of a state to two (2^1), whereas a state in a byte-level machine can have up to 256 (2^8) next states.

Fig. 1 depicts a generic architecture of parallel pattern matching. The pattern matching machine consists of N parallel pattern matching engines. Each engine is instructed to search for a subset of the patterns and is therefore called a Subset Engine (SE). An SE consists of multiple processing elements (PEs) that work in parallel to detect patterns [10, 8]. The results from the PEs are ANDed to generate the final result of the SE. Such a pattern matching machine, along with its subset engines, is technology-independent, i.e., it can be based on different technologies such as ASIC, FPGA or NP. SEs store state machines in either their local memory or a shared global memory module.

Figure 1: A generic parallel pattern matching architecture.

3. MOTIVATION OF THE RESEARCH
We are interested in studying the memory usage of the keyword-ID and next-state matrices, and in finding out how these two fields contribute to the total memory usage of state machines. We again use the Snort pattern set and divide it into subsets in our study. The number of subsets, and subsequently the number of parallel matching engines, is inversely related to the subset size k (i.e. the number of patterns in a subset). For each subset, a state machine is constructed and stored in the memory space, and a pattern matching engine is assigned to it. Note that ASIC and FPGA based designs have to handle the worst-case state machine, i.e., their memory modules have to be large enough to accommodate a state machine with the maximal number of states.

4. OPTIMIZATION SCHEMES
In this section, we present an optimization technique to organize state machine memory in a compact fashion. The technique consists of two steps: state re-labeling and memory partition.

4.1 State Re-labeling
Revisiting the concepts above, we find that the keyword-ID vectors actually contain a large number of zero vectors, indicating no match at all. These vectors correspond to the intermediate nodes along the trie before any double-circled node is reached. In the classical Aho-Corasick (AC) algorithm, all the nodes in the trie are labeled in the sequence in which they are created. As a result, the zero vectors are interleaved with the non-zero vectors. As we can see, the percentage of zero vectors is consistently high, 49.1% and more, throughout the range of k under study. This implies that compressing or eliminating such zero vectors can significantly reduce the memory usage. We propose to re-label the states after the trie is constructed. We re-label the double-circled nodes (matching nodes, or output states) in the trie such that none of their identifiers is larger than those of the single-circled intermediate nodes. As a result, when the state machine is stored in a linear memory space, the keyword-ID vectors corresponding to the matching nodes are clustered at the beginning of the keyword-ID matrix. At the same time, the structure of the trie is maintained to ensure proper state migration.

The state re-labeling is performed after the initial trie structure is created by the Aho-Corasick algorithm, or indeed by any other state machine based algorithm. The input of the re-labeling algorithm is a table mapping patterns to states, each state being either an intermediate state or a matching state; such a table is provided by the original AC or other state machine based algorithms. The output of re-labeling is an updated mapping table. It can be seen that the complexity of the re-labeling process is O(n), i.e. re-labeling introduces little overhead. The following pseudo code shows a baseline shift-table pattern matching procedure over such state tables.

Pattern Matching Algorithm
begin
1  for (each char1 ∈ Σ) do
2    for (each char2 ∈ Σ) do next[char1, char2] ← m+1;
3  for (each char1 ∈ Σ) do next[char1, pattern[0]] ← m;
4  for (i=0; i<=m-2; i++) do
     next[pattern[i], pattern[i+1]] ← m-i-1;
5  for (j←0; j<=n-m;) do
   begin
6    i ← m-1;
7    while (i>=0 and pattern[i] = text[i+j]) do i ← i-1;
8    if (i < 0) then output(match at location j);
9    j ← j + next[text[j+m-1], text[j+m]];
   end
end

4.2 Memory Partition
After re-labeling, we propose to separate the storage of the keyword-ID matrix from the storage of the next-state matrix. This is based on the observation that the number of non-zero keyword-ID vectors is bounded by the number of patterns in the subset. When the AC algorithm constructs a trie, each pattern is mapped to one double-circled node, and only to that node (i.e. the matching node). That is, the following assertion holds: N_keywordID ≤ N_rules. Therefore, only a k × k matrix is needed to store the keyword-ID vectors, where k is the number of patterns in the subset. This observation is particularly helpful in reducing the local memory needed for the parallel subset engines shown in Fig. 1. On the other hand, non-null next-state vectors can be arbitrarily more numerous than non-zero keyword-ID vectors, since there can be an arbitrary number of intermediate (non-matching) nodes in the trie. While these intermediate nodes are necessary to maintain the trie structure, their associated keyword-ID vectors are all zeros. In a regular memory organization, the keyword-IDs and next states are tightly coupled. We propose to separate the storage of the keyword-ID matrix from the next-state matrix: a k × k bit matrix stores the keyword-ID matrix, while the next-state matrix resides in an independent memory module.

Thus, the zero keyword-ID vectors of the intermediate states are eliminated. We call this technique "k-Square". In fact, the k × k bit matrix can still contain zero vectors, because some matching nodes can contain more than one pattern. In such a case, we need only λ matching nodes, where λ ≤ k. This gives an opportunity to further reduce the memory usage: we can store only the λ non-zero keyword-ID vectors in one shared global memory module. We call this technique "k-Lambda".
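To make the savings of Section 4.2 concrete, the following sketch (our own illustration, not from the paper) compares the bits needed for the keyword-ID storage under the coupled baseline, k-Square, and k-Lambda organizations, assuming one k-bit keyword-ID vector per stored row:

```python
# Sketch: keyword-ID storage cost under three memory organizations.
# Assumes each keyword-ID vector is k bits wide (k patterns per subset).

def keyword_id_bits(num_states, k, lam):
    coupled  = num_states * k   # baseline: one k-bit vector per state
    k_square = k * k            # k-Square: only k rows are kept
    k_lambda = lam * k          # k-Lambda: only the lambda non-zero rows
    return coupled, k_square, k_lambda

if __name__ == "__main__":
    # Hypothetical numbers: 1000 states, k = 16 patterns per subset,
    # lambda = 12 distinct matching nodes (some match several patterns)
    print(keyword_id_bits(1000, 16, 12))   # -> (16000, 256, 192)
```

The gap between the first value and the last two is driven entirely by the intermediate states whose keyword-ID vectors are all zeros, which is why eliminating them dominates the reported memory reduction.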
5. ARCHITECTURE
We present the architectural design based on the above optimization techniques in this section. The State Processing Element (SPE) is shown in Fig. 2. An SPE consists of a Next State Decision Unit (NSDU), an Output State Detector (OSD), and memory modules storing the next-state matrix and the keyword-ID matrix. The NSDU supplies the current state to the next-state matrix to obtain the next state based on the input bit. Meanwhile, the keyword-ID vector corresponding to the current state is fetched from the keyword-ID matrix. The OSD determines whether to output the keyword-ID vector, i.e. whether it is a valid one.

Figure 2: State Processing Element.

Fig. 3 (left) depicts the Next State Decision Unit, which is responsible for reaching the next state based on the current state and input bits. The Current State Register holds the current state. Its value is used as an address to access the corresponding row in the next-state matrix memory. In the next cycle, the next-state matrix gives the next states (two next states in this example) as its output, which are fed to the Holding Register. The multiplexer in the figure determines the next state based on the 1-bit input from the data stream. Re-labeling may change the label of the entry state from its default value '0'.

Fig. 3 (right) depicts the Output State Detector. After re-labeling, the labels of matching states (double-circled nodes) are always less than k. For example, for k = 16, their label values are 0, 1, 2, up to 15. With this property, we can use a comparator to detect whether the current state is a matching state. If so, the Output State Detector enables the keyword-ID matrix. If not, the detector resets the Output Register, indicating no match.

Figure 3: Next State Decision Unit, and Output State Detector.

The benefit is that the detector reduces the number of accesses to the keyword-ID matrix memory, leading to lower power consumption and a smaller likelihood of a performance bottleneck. As a result, the keyword-ID matrix is well suited to being placed in a shared global memory.

6. COMPARISON WITH EXISTING WORK
We compare the memory usage of our optimized storage of state machines with the results from previous research, namely Tan-Sherwood's [10] and Brodie-Cytron-Taylor's [3]. The approach in [3] is based on a pipelined state machine and appeared recently; it shows a significant improvement in memory density (characters/mm^2) over Tan's (it uses, however, newer 65nm SRAM technology instead of the 130nm in Tan's). We compare our compact state machines with Tan's and Brodie's methods. We assume the same SRAM technology and translate Tan's and Brodie's results into the bytes needed for the Snort pattern set. The memory required by Tan's approach is estimated by simulating their architecture. For Brodie's, we estimate from the density (423 characters/mm^2) and the 6T SRAM cell area (2 x 10^-6 mm^2/bit), both of which are from [3]. Hence, Brodie's approach supports 846 x 10^-6 chars/bit. With the "k-Square" matrix, the memory consumption is within 46% of [10] and 20.1% of [3], respectively. The "k-Lambda" matrix brings the memory usage to 38.7% of [10] and 16.9% of [3], respectively. The results can be explained as follows. The hardware design in [10] requires all the "tiles" used to store keyword-ID vectors to have the same number of rows as the next-state matrix; in fact, however, a large percentage of the tiles are wasted on zero vectors. In contrast, our approach uses keyword-ID matrix memory only as needed, i.e. no memory is wasted.

7. CONCLUSION
This paper proposes novel schemes to reduce the memory required to store state machines in a linear memory space for pattern matching. We propose to re-label the states in a state machine such that the matching states are clustered at the beginning of the memory space, giving opportunities for compact storage. We then partition the state machines and use separate memory modules to store the keyword-ID and next-state matrices. We present an architectural design using the proposed schemes and evaluate the performance of the optimization strategies using realistic pattern sets from Snort [9]. The experimental results show a significant reduction (up to 90.1%) in the memory consumption of state machines. This research can benefit the design of high-speed pattern matching machines and pattern matching coprocessors for a wide range of applications, such as network intrusion detection and virus scanning.

8. REFERENCES
[1] A. V. Aho and M. J. Corasick. Efficient string matching: an aid to bibliographic search. Communications of the ACM, 18(6):333-340, 1975.
[2] S. Artan and J. Chao. Multi-packet Signature Detection using Prefix Bloom Filters. In IEEE Global Telecommunications Conference, Nov 2005.
[3] B. C. Brodie, R. K. Cytron, and D. E. Taylor. A Scalable Architecture for High-Throughput Regular-Expression Pattern Matching. In ISCA, Boston, MA, June 2006.
[4] C. R. Clark and D. E. Schimmel. Modeling the Data-Dependent Performance of Pattern-Matching Architectures. In International Symposium on Field Programmable Gate Arrays, Monterey, CA, Feb 2006.
[5] Intel Corp. Intel IXP2400 Network Processor Product Brief, 2005.
[6] S. Kumar, S. Dharmapurikar, P. Crowley, J. Turner, and F. Yu. Algorithms to Accelerate Multiple Regular Expression Matching for Deep Packet Inspection. In SIGCOMM 2006, Pisa, Italy, 2006.
[7] Linux Layer 7 Packet Classifier. http://sourceforge.net/projects/l7-filter/.
[8] P. Piyachon and Y. Luo. Efficient memory utilization on network processors for deep packet inspection. In Proceedings of ACM Symposium on Architectures for Networking and Communications Systems, Dec 2006.
[9] Snort. www.snort.org, 2003.
[10] L. Tan and T. Sherwood. Architectures for Bit-Split String Scanning in Intrusion Detection. IEEE Micro, Jan-Feb 2006.
[11] P. Piyachon and Y. Luo. Compact State Machines for High Performance Pattern Matching. In Design Automation Conference (DAC '07), 44th ACM/IEEE, June 2007.
