

Vol. V, No. I, February 2018 www.ijrtonline.org

Review on Hybrid LUT/Multiplexer Combinational Logic Block Architecture
Shaili Jain, Shashilata Rawat

Abstract: Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers are evaluated toward the goal of higher logic density and area reduction. Technology mapping optimizations that target the proposed architectures are also implemented within the circuit. The evaluation accounts for both complex logic block area and routing area while maintaining mapping depth. For fracturable architectures, this paper analyzes the logic size, area, and power consumption of the proposed architecture using Xilinx 14.2.
Keywords: Hybrid System, Combinational Logic Block (CLB), LUT, Multiplexer.

Manuscript received on February, 2018.
Shaili Jain, Research Scholar, Department of Electronics & Communication Engineering, Lakshmi Narain College of Technology, Bhopal, M.P., India.
Prof. Shashilata Rawat, Asst. Professor, Department of Electronics & Communication Engineering, Lakshmi Narain College of Technology, Bhopal, M.P., India.

I. INTRODUCTION
A field-programmable gate array (FPGA) is a block of programmable logic that can implement multi-level logic functions. FPGAs are most commonly used as separate commodity chips that can be programmed to implement large functions. However, small blocks of FPGA logic can be useful on-chip components that allow the user of the chip to customize part of the chip's logical function. An FPGA block must implement both combinational logic functions and interconnect to be able to construct multi-level logic functions. There are several different technologies for programming FPGAs, but most logic processes are unlikely to implement antifuses or similar hard programming technologies.
Throughout the history of FPGAs, lookup tables (LUTs) have been the primary logic element (LE) used to realize combinational logic. A K-input LUT is generic and very flexible, able to implement any K-input Boolean function. The use of LUTs simplifies technology mapping, as the problem is reduced to a graph covering problem. However, an exponential area price is paid as larger LUTs are considered. Values of K between 4 and 6 are typically seen in industry and academia, and this range has been demonstrated to offer a good area/performance compromise. Recently, a number of other works have explored alternative FPGA LE architectures for performance improvement to close the large gap between FPGAs and application-specific integrated circuits (ASICs).

Look up Tables:
The basic method used to build a combinational logic block (CLB), also called a logic element, in an SRAM-based FPGA is the lookup table (LUT). As shown in Fig. 1, the lookup table is an SRAM that is used to implement a truth table. Each address in the SRAM represents a combination of inputs to the logic element. The value stored at that address represents the value of the function for that input combination. An n-input function requires an SRAM with 2^n locations.

Fig. 1: Lookup Tables

II. LITERATURE REVIEW
Stephen Alexander Chin et al. [1]: Hybrid configurable logic block architectures for field-programmable gate arrays that contain a mixture of lookup tables and hardened multiplexers are evaluated toward the goal of higher logic density and area reduction. Multiple hybrid configurable logic block architectures, both non-fracturable and fracturable, with varying MUX:LUT logic element ratios, are evaluated across two benchmark suites (VTR and CHStone) using a custom tool flow consisting of LegUp-HLS, Odin-II front-end synthesis, ABC logic synthesis and technology mapping, and VPR for packing, placement, routing, and architecture exploration. Technology mapping optimizations that target the proposed architectures are also implemented within ABC. Experimentally, we show that for non-fracturable architectures, without any mapper optimizations, we naturally save up to ~8% area post-place-and-route, accounting for both complex logic block and routing area, while maintaining mapping depth. With architecture-aware technology mapper optimizations in ABC, additional area is saved post-place-and-route. For fracturable architectures, experiments show that only marginal gains, up to ~2%, are seen after place-and-route. For both non-fracturable and fracturable architectures, we see

Impact Factor: 4.012 5

Published under
Asian Research & Training Publication
ISO 9001:2015 Certified
minimal impact on timing performance for the architectures with the best area efficiency.
Rose et al. [2]: Recent works have shown that heterogeneous architectures and synthesis methods can have a significant impact on improving logic density and delay, narrowing the ASIC–FPGA gap. Works by Anderson and Wang with "gated" LUTs, then with asymmetric LUT LEs, show that the LUT elements present in commercial FPGAs provide unnecessary flexibility. Toward improved delay and area, macrocell-based FPGA architectures have been proposed. These studies describe significant changes to the traditional FPGA architectures, whereas the changes proposed here build on architectures used in industry and academia. Similarly, and-inverter cones have been proposed as replacements for the LUTs, inspired by and-inverter graphs (AIGs).
Y. Hara et al. [3]: Purnaprajna and Ienne explored the possibility of repurposing the existing MUXs contained within the Xilinx Logic Slices. Similar to this work, they use the ABC priority cut mapper as well as VPR for packing, placement, and routing. However, their work is primarily delay based, showing an average speedup of 16% using only ten of 19 VTR7 benchmarks. In this article, we study the technology mapping problem for a novel field-programmable gate array (FPGA) architecture that is based on k-input single-output programmable logic array- (PLA-) like cells, or k/m-macrocells. Each cell in this architecture can implement a single-output function of up to k inputs and up to m product terms. We develop a very efficient technology mapping algorithm, km flow, for this new type of architecture.
A. Canis et al. [4]: The experimental results show that our algorithm can achieve depth-optimality on almost all the test cases in a set of 16 Microelectronics Center of North Carolina (MCNC) benchmarks. Furthermore, it is shown that on this set of benchmarks, with only a relatively small number of product terms (m ≤ k+3), the k/m-macrocell-based FPGAs can achieve the same or similar mapping depth compared with the traditional k-input single-output lookup table- (k-LUT-) based FPGAs. We also investigate the total area and delay of k/m-macrocell-based FPGAs and compare them with those of the commonly used 4-LUT-based FPGAs. The experimental results show that k/m-macrocell-based FPGAs can outperform 4-LUT-based FPGAs in terms of both delay and area after placement and routing by VPR on this set of benchmarks.
E. Ahmed et al. [5]: This paper presents experimental measurements of the differences between a 90-nm CMOS field-programmable gate array (FPGA) and 90-nm CMOS standard-cell application-specific integrated circuits (ASICs) in terms of logic density, circuit speed, and power consumption for core logic. We are motivated to make these measurements to enable system designers to make better informed choices between these two media and to give insight to FPGA makers on the deficiencies to attack and, thereby, improve FPGAs. We describe the methodology by which the measurements were obtained and show that, for circuits containing only look-up table-based logic and flip-flops, the ratio of silicon area required to implement them in FPGAs and ASICs is on average 35. Modern FPGAs also contain "hard" blocks such as multiplier/accumulators and block memories. We find that these blocks reduce this average area gap significantly, to as little as 18 for our benchmarks, and we estimate that extensive use of these hard blocks could potentially lower the gap to below five. The ratio of critical-path delay, from FPGA to ASIC, is roughly three to four, with less influence from block memory and hard multipliers.
J. Rose et al. [6]: The dynamic power consumption ratio is approximately 14 times and, with hard blocks, this gap generally becomes smaller. New architectural proposals are routinely generated in both academia and industry. For FPGAs to continue to grow, it is important that these new architectural ideas are fairly and accurately evaluated, so that the worthy ideas can be included in future chips. Typically, this evaluation is done using experimentation. However, the use of experimentation is dangerous, since it requires making assumptions regarding the tools and architecture of the device in question. If these assumptions are not accurate, the conclusions from the experiments may not be meaningful. In this paper, we investigate the sensitivity of FPGA architectural conclusions to experimental variations. To make our study concrete, we evaluate the sensitivity of four previously published and well-known FPGA architectural results: lookup-table size, switch block topology, cluster size, and memory size. It is shown that these experiments are significantly affected by the assumptions, tools, and techniques used in the experiments.

III. PROBLEM FORMULATION
A K-input LUT is generic and very flexible, able to implement any K-input Boolean function. The use of LUTs simplifies technology mapping, as the problem is reduced to a graph covering problem. However, an exponential area price is paid as larger LUTs are considered. Values of K between 4 and 6 are typically seen in industry and academia, and this range has been demonstrated to offer a good area/performance compromise [4], [5]. Recently, a number of other works have explored alternative FPGA LE architectures for performance improvement [6]–[10] to close the large gap between FPGAs and application-specific integrated circuits (ASICs) [11]. In this paper, we propose incorporating (some) hardened multiplexers (MUXs) in the FPGA logic blocks as a means of increasing silicon area efficiency and logic density.
MUX-based logic blocks for FPGAs saw success in early commercial architectures, such as the Actel ACT-1/2/3 architectures, and efficient mapping to these structures was studied [12] in the early 1990s. However, their use in commercial chips has waned, perhaps partly due to the ease with which logic functions can be mapped into LUTs, which simplifies the entire computer-aided design (CAD) flow. Nevertheless, it is widely understood that LUTs are inefficient at implementing MUXs, and that MUXs are frequently used in logic circuits.
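The exponential storage cost of a K-input LUT, and the reason a LUT is a costly way to build a multiplexer, can be illustrated with a minimal Python sketch. It assumes a LUT is modeled as a plain list of 2^K stored bits and a 4-to-1 MUX as a six-input function (two selects, four data inputs). The helper names (lut_bits, make_lut, lut_eval, mux4, two_input_on_mux4) are hypothetical and are not taken from ABC, VPR, or any other tool mentioned in this paper. The sketch counts stored bits only and does not model silicon area.

```python
from itertools import product


def lut_bits(k):
    """SRAM cells needed by a K-input LUT: one per truth-table row."""
    return 2 ** k


def make_lut(func, k):
    """Tabulate a K-input Boolean function as a list of 2**k stored bits.

    Entry i holds the output for the input combination whose bits spell i
    in binary, with the first argument of func as the most significant bit.
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_eval(table, inputs):
    """Read a LUT: the input bits form the address of the stored value."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return table[address]


def mux4(s1, s0, d0, d1, d2, d3):
    """4-to-1 multiplexer: selects (s1, s0) pick one of four data inputs."""
    return (d0, d1, d2, d3)[(s1 << 1) | s0]


def two_input_on_mux4(truth, a, b):
    """Realize a 2-input function on a MUX4 (illustrative assumption).

    The function inputs drive the select lines and the four truth-table
    constants, ordered [f(0,0), f(0,1), f(1,0), f(1,1)], drive the data inputs.
    """
    return mux4(a, b, *truth)


# Storage grows exponentially with K.
sizes = {k: lut_bits(k) for k in (4, 5, 6)}

# A 4-to-1 MUX has six inputs, so a LUT implementation occupies a full 6-LUT.
mux4_as_lut = make_lut(mux4, 6)

if __name__ == "__main__":
    print(sizes)             # {4: 16, 5: 32, 6: 64}
    print(len(mux4_as_lut))  # 64
    xor_truth = [0, 1, 1, 0]
    print([two_input_on_mux4(xor_truth, a, b)
           for a, b in product((0, 1), repeat=2)])  # [0, 1, 1, 0]
```

In this model a single 4-to-1 MUX consumes all 64 stored bits of a 6-LUT, which gives some intuition for the inefficiency noted above. Actual area depends on the circuit-level implementation and is outside the scope of the sketch.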

The MUX4 LE shown in Fig. 2 consists of a 4-to-1 MUX with optional inversion on its inputs that allows the realization of any {2, 3}-input function, some {4, 5}-input functions, and one 6-input function: a 4-to-1 MUX itself with optional inversion on the data inputs. A 4-to-1 MUX matches the input pin count of a 6-LUT, allowing for fair comparisons with respect to the connectivity and intracluster routing.
Naturally, any two-input Boolean function can be easily implemented in the MUX4: the two function inputs can be tied to the select lines and the truth table values (logic-0 or logic-1) can be routed to the data inputs accordingly. Alternatively, Shannon decomposition can be performed about one of the two variables; that variable can then feed a select input. The Shannon cofactors will contain at most one variable and can, therefore, be fed to the data inputs (the optional inversion may be needed).

Fig. 2: MUX4 LE depicting optional data input inversions.

Hybrid Complex Logic Block:
A variety of different architectures were considered, the first being a non-fracturable architecture. In the non-fracturable architecture, the CLB has 40 inputs and ten basic LEs (BLEs), with each BLE having six inputs and one output, following empirical data in prior work. Fig. 3 shows this non-fracturable CLB architecture with BLEs that contain an optional register. We vary the ratio of MUX4s to LUTs within the ten-element CLB from 1:9 to 5:5 MUX4s:6-LUTs. The MUX4 element is proposed to work in conjunction with 6-LUTs, creating a hybrid CLB with a mixture of 6-LUTs and MUX4s (or MUX4 variants). Fig. 3 shows the organization of our CLB and internal BLEs.

Fig. 3: Hybrid CLB

IV. CONCLUSION
In this paper, we proposed a new hybrid CLB architecture containing MUX4 hard MUX elements and showed techniques for efficiently mapping to these architectures. We also provided an analysis of the benchmark suites post-mapping, discussing the distribution of functions within each benchmark suite. The area reduction for non-fracturable architectures is 8% with a MUX4:LUT ratio of 4:6, and for fracturable architectures the area reduction is 2%. The CHStone benchmarks, being high-level synthesized with LegUp-HLS, also showed marginally better performance, which could be due to the way LegUp performs HLS on the CHStone benchmarks themselves. Overall, the addition of MUX4s to FPGA architectures minimally impacts FMax and shows potential for improving logic density in non-fracturable architectures and modest potential for improving logic density in fracturable architectures.

REFERENCES
[1] S. A. Chin, J. Luu, S. Huda, and J. H. Anderson, "Hybrid LUT/Multiplexer FPGA Logic Architectures," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, Feb. 2017.
[2] J. Rose et al., "The VTR project: Architecture and CAD for FPGAs from verilog to routing," in Proc. ACM/SIGDA FPGA, 2012, pp.
[3] Y. Hara, H. Tomiyama, S. Honda, and H. Takada, "Proposal and quantitative analysis of the CHStone benchmark program suite for practical C-based high-level synthesis," J. Inf. Process., vol. 17, pp. 242–254, Oct. 2009.
[4] A. Canis et al., "LegUp: High-level synthesis for FPGA-based processor/accelerator systems," in Proc. ACM/SIGDA FPGA, 2011, pp. 33–36.
[5] E. Ahmed and J. Rose, "The effect of LUT and cluster size on deep submicron FPGA performance and density," IEEE Trans. Very Large Scale Integr. (VLSI), vol. 12, no. 3, pp. 288–298, Mar. 2004.
[6] J. Rose, R. Francis, D. Lewis, and P. Chow, "Architecture of field programmable gate arrays: The effect of logic block functionality on area efficiency," IEEE J. Solid-State Circuits, vol. 25, no. 5, pp. 1217–1225, Oct. 1990.
[7] H. Parandeh-Afshar, H. Benbihi, D. Novo, and P. Ienne, "Rethinking FPGAs: Elude the flexibility excess of LUTs with and-inverter cones," in Proc. ACM/SIGDA FPGA, 2012, pp. 119–128.
[8] J. Anderson and Q. Wang, "Improving logic density through synthesis inspired architecture," in Proc. IEEE FPL, Aug./Sep. 2009, pp. 105–111.
[9] J. Anderson and Q. Wang, "Area-efficient FPGA logic elements: Architecture and synthesis," in Proc. ASP DAC, 2011, pp. 369–375.
[10] J. Cong, H. Huang, and X. Yuan, "Technology mapping and architecture evaluation for k/m-macrocell-based FPGAs," ACM Trans. Design Autom. Electron. Syst., vol. 10, no. 1, pp. 3–23, Jan.
[11] Y. Hu, S. Das, S. Trimberger, and L. He, "Design, synthesis and evaluation of heterogeneous FPGA with mixed LUTs and macro-gates," in Proc. IEEE ICCAD, Nov. 2007, pp. 188–193.
[12] I. Kuon and J. Rose, "Measuring the gap between FPGAs and ASICs," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 2, pp. 203–215, Feb. 2007.
