
Pattern Matching

Pattern matching can be defined as the process of checking if a character string is part of a longer sequence
of characters.
Pattern matching is used in a large variety of fields in computer science. In text processing programs such
as Microsoft Word, pattern matching is used in the search function. The purpose is to match the keyword
being searched against a sequence of characters that build the complete text.
In database information retrieval, the content of different fields of a given database entry are matched
against the sequence of characters that build the user request.
Speech recognition and other pattern recognition tools also use pattern matching as a basic operation on top of which complex intelligent functions may be built, for example to better classify audio sequences.
Other applications using pattern matching are:
dictionary implementation, spam avoidance, network intrusion detection and content surveillance.
The first advantage of using the reconfigurable device here is the inherent fine-grained parallelism that
characterizes the search. Many different words can be searched for in parallel by matching the input text
against a set of words on different paths. The second advantage is the possibility of quickly exchanging the
list of searched words by means of reconfiguration.
We define the capacity of a search engine as the number of words that can be searched for in parallel. A
large capacity also means a high complexity of the function to be implemented in hardware, which in turn
means a large amount of resources. The goal in implementing a search engine in hardware is to have a
maximal hardware utilization, i.e. as many words as possible that can be searched for in parallel.
The different hardware implementations of pattern matching are:
1. Sliding Window Approach
2. Hash Table Based Text Searching
3. Automaton-Based Text Searching
The Sliding Window Approach
One approach for text searching is the so-called sliding window. In the 1-keyword version, the target word
is stored in one register, each character being stored in one register field consisting of a byte. The length of
the register is equal to the length of the word it contains. The text is streamed through a separate shift
register, whose length is the same as that of the keyword. For each character of the given keyword stored as
a byte in one register field, an 8-bit comparator is used to compare this character with the corresponding
character of the text, which streams through the shift register. A hit occurs when all the comparators return
the value true.
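The register-and-comparator arrangement above can be sketched in software. This is a behavioural sketch only: the deque stands in for the hardware shift register, and the per-position equality tests stand in for the 8-bit comparators (all names are illustrative):

```python
# Behavioural sketch of the sliding-window matcher: the keyword sits in
# a fixed "register" (a string), the text shifts through a window of the
# same length, and one comparator per position checks for equality.
# A hit occurs when all comparators return true.
from collections import deque

def sliding_window_search(text, keyword):
    """Yield start positions at which `keyword` matches the streamed text."""
    window = deque(maxlen=len(keyword))   # the shift register
    for pos, ch in enumerate(text):
        window.append(ch)                 # shift one character in
        if len(window) == len(keyword):
            # one comparator per register field, ANDed together
            if all(a == b for a, b in zip(window, keyword)):
                yield pos - len(keyword) + 1

print(list(sliding_window_search("the cat sat on the mat", "the")))  # [0, 15]
```

In hardware all comparators evaluate in the same clock cycle; the sequential loop here only models the shift behaviour.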
The sliding window can be extended to check a match of multiple patterns in parallel. Each target word
will then be stored in one register and will have as many comparators as required.
The main advantage of this approach resides in the reconfiguration. Because each keyword is stored in an
independent register, the reconfiguration can happen without affecting the other words, thus providing the
possibility to gradually and quickly modify the dictionary.

Automaton-Based Text Searching


It is well known that any regular grammar can be recognized by a deterministic finite state machine (FSM).
In an automaton-based search algorithm, a finite state machine is built on the basis of the target words. The
target words define a regular grammar that is compiled in an automaton acting as a recognizer for that
grammar. When scanning a text, the automaton changes its state as characters arrive. Upon reaching an end state, a hit occurs and the corresponding word is reported as found.
The advantage of the FSM-based search machine is the elimination of the preprocessing step done in many
other methods to remove stop words (such as the, to, for etc., which do not affect the meaning of
statements) from documents.
When taking into consideration the common prefix of words, it is possible to save a considerable amount of
flip flops. For this purpose, the search machine could be implemented as an automaton recognizer,
common to all the target words. As shown in the figure below, the resulting structure is a tree in which a path
from the root to a leaf determines the appearance of a corresponding key word in the streamed text. Words

that share a common prefix use a common path from the root corresponding to the length of their common
prefix. A split occurs at the node where the common prefix ends.
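The prefix tree can be sketched in software as follows. This is an illustrative model rather than the hardware mapping: dictionary nodes stand for the flip-flop states, a "$" entry marks an end state, and all names are assumptions:

```python
# Sketch of the prefix-tree recognizer: words sharing a common prefix
# share a common path from the root; a split occurs at the node where
# the common prefix ends. An end state reports a hit for its keyword.
def build_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})  # shared prefixes reuse nodes
        node["$"] = w                        # end state: the recognized word
    return root

def scan(text, root):
    """Restart the recognizer at every text position; yield (pos, word) hits."""
    active = []                     # states currently alive in the automaton
    for pos, ch in enumerate(text):
        active.append(root)         # a new match may start at each character
        next_active = []
        for node in active:
            if ch in node:
                child = node[ch]
                if "$" in child:    # reached an end state: a hit occurs
                    yield pos, child["$"]
                next_active.append(child)
        active = next_active

trie = build_trie(["car", "card", "care", "cat"])
print([w for _, w in scan("a card and a cat", trie)])  # ['car', 'card', 'cat']
```

Note how "car", "card", "care" and "cat" share nodes for their common prefixes, mirroring the flip-flop sharing described above.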

In the hardware implementation of a group of words with a common prefix, common flip flops will be used
to implement the common path (figure 9.3a). For each character in the target alphabet, only one comparator
is needed. The comparison occurs in this case only once, at a particular location in the device. Incoming
characters can be directly sent to a set of all possible comparators. Upon matching a particular one, a
corresponding signal is sent to all the flip flops which need the result of that comparison. This method
reduces the number of comparators needed by almost the sum of the lengths of all the target words.

The overall structure of the search engine previously explained is shown in the figure above. It consists of an
array of comparators to decode the characters of the FSM alphabet, a state decoder that moves the state
machine into its next states, and a preprocessor to map incoming 8-bit characters to 5-bit characters, thus
mapping all the characters to lower case.
As case insensitivity is assumed in most applications in information retrieval, the preprocessor is designed
to map upper- and lower-case characters to the binary codes 1 to 26 and the rest of the characters to 0.
Characters are streamed to the device in ASCII notation. An incoming 8-bit character triggers the clock and
is mapped to a 5-bit character that is sent to the comparator array. All the 5-bit signals are distributed to all
the comparators that operate in parallel to check if a match of the incoming character with an alphabet
character occurs. If a match occurs for comparator k, the output signal char_k is set to one and all the
others are set to zero. If no match occurs, all the output signals are set to zero.
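The preprocessor and comparator array can be modelled as follows. This is a behavioural sketch, with the list of 26 outputs standing in for the parallel comparator signals char_1 to char_26:

```python
# Sketch of the preprocessor and comparator array: an incoming 8-bit
# ASCII character is mapped case-insensitively to a 5-bit code (1..26
# for letters, 0 for everything else), then compared in parallel against
# every alphabet character, producing a one-hot output vector.
def preprocess(byte):
    """Map an 8-bit ASCII code to a 5-bit character code."""
    ch = chr(byte)
    if ch.isascii() and ch.isalpha():
        return ord(ch.lower()) - ord('a') + 1   # 'a'/'A' -> 1 ... 'z'/'Z' -> 26
    return 0                                     # non-letters map to 0

def comparator_array(code):
    """One comparator per alphabet character; at most one output is set."""
    return [1 if code == k else 0 for k in range(1, 27)]

assert preprocess(ord('A')) == preprocess(ord('a')) == 1   # case-insensitive
assert preprocess(ord('!')) == 0
outputs = comparator_array(preprocess(ord('C')))
print(outputs.index(1) + 1)   # the comparator for 'c' fires -> 3
```

In hardware the 26 comparators evaluate simultaneously; the list comprehension only models their combined one-hot output.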

Applications of Reconfigurable Computing

Pattern Matching
Video Streaming
Distributed Arithmetic
Adaptive Controller
Adaptive Cryptographic Systems
Software Defined Radio
High-Performance Computing

Rapid Prototyping, Frequently Reconfigurable and Non-Frequently Reconfigurable Devices
1. Rapid prototyping:
In this case, the reconfigurable device is used as an emulator for another digital device, usually an
ASIC.
An emulator in embedded systems refers to hardware/software that duplicates the functions of one
embedded system (the guest) in another embedded system (the host), different from the first one,
so that the emulated behaviour closely resembles the behaviour of the real system.
The emulation process allows the ASIC device to be functionally tested for correctness, sometimes
under real operating and environmental conditions, before production.
The reconfigurable device is only reconfigured to emulate a new implementation of the ASIC
device.
Examples: the APTIX System Explorer and the ITALTEL Flexbench systems
2. Non-frequently reconfigurable systems:
The Reconfigurable device is integrated in a system where it can be used as an application-specific
processor.
Such systems are used as prototyping platforms, but can be used as running environments as well.
These systems are usually stand-alone systems.
The reconfiguration is used for testing and initialization at start-up and for upgrading purposes.
The device configuration is usually stored in an EEPROM or Flash memory, from which it is downloaded
at start-up to reconfigure the device.
No configuration happens during operation.
Examples: the RABBIT system and the Celoxica RC100, RC200 and RC300
3. Frequently reconfigurable systems:
In this category, the reconfigurable devices are frequently reconfigured.
These systems are usually coupled with a host processor, which is used to reconfigure the device
and control the complete system.
The reconfigurable device (RD) is used as an accelerator for time-critical parts of applications.
The host processor accesses the RD using function calls.
The reconfigurable part is usually a PCI board attached to the PCI bus, which is used for
configuration and data exchange.
Examples: the Raptor 2000 and the Celoxica RC1000 and RC2000

Runtime Reconfigurable Systems & Their Hardware


Depending on the time at which the reconfiguration sequences are defined, reconfigurable devices can be
classified as:
A. Static / Compile-time Reconfigurable Devices: The Computation and Configuration sequences as
well as the Data exchange are defined at compile time and never change during a computation.
B. Dynamic/Run-time Reconfigurable Devices: The computation and configuration sequences are
not known at compile time. Requests to implement a given task become known at run-time and must be
handled dynamically.

The management of the reconfigurable device is usually done by a scheduler and a placer that can
be implemented as part of an operating system running on a processor.
The processor can either reside inside or outside the reconfigurable chip.
The scheduler manages the tasks and decides when a task should be executed.
The tasks that are available as configuration data in a database are characterized through their
bounding box and their run-time. The bounding box defines the area that a task occupies on the
device.
The scheduler determines which task should be executed on the RPU and then gives the task to the
placer that will try to place it on the device, i.e. allocate a set of resources for the implementation
of that task.
If the placer is not able to find a site for the new task, then it will be sent back to the scheduler that
can then decide to send it later and to send another task to the placer. In this case, we say that the
task is rejected.
The host CPU is used for device configuration and data transfer.
Usually, the reconfigurable device and the host processor communicate through a bus that is used
for the data transfer between the processor and the reconfigurable device.
The RPU acts like a coprocessor with varying instruction sets accessible by the processor in a
function call.
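The scheduler/placer interaction described above can be caricatured in software. This is a deliberately simplified sketch: the placer models free device area as a single number rather than performing two-dimensional bounding-box placement, and all task names and parameters are illustrative:

```python
# Toy sketch of the scheduler/placer interaction: tasks are characterized
# by a bounding box (here collapsed to a scalar area) and a run-time.
# The placer tries to allocate resources; tasks that do not fit are
# rejected and could be resubmitted by the scheduler later.
class Task:
    def __init__(self, name, area, runtime):
        self.name, self.area, self.runtime = name, area, runtime

class Placer:
    def __init__(self, device_area):
        self.free = device_area          # simplified 1-D model of free area
    def place(self, task):
        if task.area <= self.free:       # a site is available on the device
            self.free -= task.area
            return True
        return False                     # no site found: the task is rejected

def schedule(tasks, placer):
    """Scheduler: decide execution order and hand each task to the placer."""
    placed, rejected = [], []
    for t in tasks:
        (placed if placer.place(t) else rejected).append(t.name)
    return placed, rejected

tasks = [Task("fft", 40, 5), Task("fir", 30, 3), Task("aes", 50, 8)]
print(schedule(tasks, Placer(device_area=100)))  # (['fft', 'fir'], ['aes'])
```

A real placer must also find a contiguous rectangular region matching the task's bounding box and handle fragmentation; the scalar model above only captures the accept/reject decision.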

Differences between reconfigurable machines and conventional processors:


Instruction distribution: Rather than broadcasting a new instruction to the functional units on every
cycle, instructions are locally configured, allowing the reconfigurable device to compress instruction
stream distribution and effectively deliver more instructions into active silicon on each cycle.
Spatial routing of intermediates: As space permits, intermediate values are routed in parallel from
producing function to consuming function rather than forcing all communication to take place in time
through a central resource bottleneck.
More, often finer-grained, separately programmable building blocks: Reconfigurable devices provide
a large number of separately programmable building blocks, allowing a greater range of computations to
occur per time step. This effect is largely enabled by the compressed instruction distribution.
Distributed, deployable resources, eliminating bottlenecks: Resources such as memory, interconnect,
and functional units are distributed and deployable based on need rather than being centralized in large
pools. Independent, local access allows reconfigurable designs to take advantage of high, local, parallel
on-chip bandwidth, rather than creating a central resource bottleneck.

DPGA (Dynamically Programmable Gate Array)

A fine-grained computing device which adds small, on-chip instruction memories to FPGAs.
Used for typical logic applications and finite-state machines.
A DPGA implementation can be one-third the size of an equivalent FPGA implementation.

Each compute and interconnect resource has its own small memory for describing its behavior.
These instruction memories are read in parallel whenever a context (instruction) switch is indicated.

The DPGA exploits two facts:


1. The description of an operation is much smaller than the active area necessary to perform the
operation.
2. It is seldom necessary to evaluate every gate or bit computation in a design simultaneously in
order to achieve the desired task latency or throughput.

Programmable, Configurable and Fixed Function Devices


Programmable: We will use the term programmable to refer to architectures which heavily and rapidly reuse a
single piece of active circuitry for many different functions. The canonical example of a programmable device is a
processor, which may perform a different instruction on its ALU on every cycle. All processors, be they microcoded,
SIMD, vector, or VLIW, are included in this category.
Configurable: We use the term configurable to refer to architectures where the active circuitry can perform any of
a number of different operations, but the function cannot be changed from cycle to cycle. FPGAs are our canonical
example of a configurable device.
Fixed function, limited operation diversity, high throughput: When the function and data granularity to be
computed are well understood and fixed, and when the function can be economically implemented in space,
dedicated hardware provides the most computational capacity per unit area to the application.
Variable function, low diversity: If the function required is unknown or varying, but the instruction or data
diversity is low, the task can be mapped directly to a reconfigurable computing device, which efficiently extracts
high computational density.
Space limited, high entropy: If we are limited spatially and the function to be computed has high operation
and data diversity, we are forced to reuse the limited active space heavily and accept limited instruction and data
bandwidth. In this regime, conventional processor organizations are most effective, since they dedicate considerable
space to on-chip instruction storage in order to minimize off-chip instruction traffic while executing descriptively
complex tasks.

General-Purpose Computing Issues


There are two key features associated with general-purpose computers which distinguish them from their specialized
counterparts. The way these aspects are handled plays a large role in distinguishing various general-purpose
computing architectures.
Interconnect
In general-purpose machines, the datapaths between functional units cannot be hardwired. Different tasks will
require different patterns of interconnect between the functional units. Within a task individual routines and
operations may require different interconnectivity of functional units. General-purpose machines must provide the
ability to direct data flow between units. In the extreme of a single functional unit, memory locations are used to
perform this routing function. As more functional units operate together on a task, spatial switching is required to
move data among functional units and memory. The flexibility and granularity of this interconnect is one of the big
factors determining yielded capacity on a given application.
Instructions
Since general-purpose devices must provide different operations over time, either within a computational task or
between computational tasks, they require additional inputs, instructions, which tell the silicon how to behave at
any point in time. Each general-purpose processing element needs one instruction to tell it what operation to perform
and where to find its inputs. As we will see, the handling of this additional input is one of the key distinguishing
features between different kinds of general-purpose computing structures. When the functional diversity is large and
the required task throughput is low, it is not efficient to build up the entire application dataflow spatially in the
device. Rather, we can realize applications, or collections of applications, by sharing and reusing limited hardware
resources in time (See Figure 2.1) and only replicating the less expensive memory for instruction and intermediate
data storage.
