
AN ENVIRONMENT FOR ENERGY CONSUMPTION ANALYSIS OF CACHE MEMORIES IN SOC PLATFORMS

Cordeiro, F.R.; Silva-Filho, A.G.; Araujo, C.C.; Gomes, M.; Barros, E.N.S. and Lima, M.E.

Informatics Center (CIn), Federal University of Pernambuco (UFPE), Av. Prof. Luiz Freire s/n, Cidade Universitária, Recife/PE, Brasil
email: {frc, agsf, cca2, maag, ensb, mel}@cin.ufpe.br
ABSTRACT
Tuning the cache architecture in platforms for embedded systems applications can dramatically reduce energy consumption. Existing cache exploration environments constrain the designer to analyzing cache energy consumption on single-processor systems and, worse, on systems based on a single processor type. This paper presents PCacheEnergyAnalyzer, an environment for energy consumption analysis of cache memories on SoC platforms. This powerful environment combines efficient tools that provide static and dynamic energy consumption analysis, the flexibility to support cache architecture exploration on platforms that are not bound to a specific processor, and fast simulation techniques. The proposed environment has been integrated into the SoC modeling framework PDesigner, providing a user-friendly graphical interface for the integrated modeling and cache energy analysis of SoCs. PCacheEnergyAnalyzer has been validated with four applications of the Mediabench benchmark suite.

1. INTRODUCTION
Currently, the memory hierarchy can account for up to 50% of the total energy consumed by microprocessor-based architectures [1][2]. This becomes more critical with the emergence of SoCs, since a large share of these integrated circuits contain heterogeneous processors and, often, cache memories. Moreover, current semiconductor technologies have raised the static share of memory energy consumption from negligible to up to 30%, yet many approaches do not take this into consideration. Many efforts have been made to reduce energy consumption by adjusting cache parameters to the needs of a particular application [3][4][5][6]. However, since the fundamental purpose of the cache subsystem is to provide high-performance memory access, cache optimization techniques should be driven not only by energy savings but also by the need to avoid degrading application performance.
No single combination of cache parameters (total size, line size and associativity), also known as a cache configuration, is ideal for all applications. Therefore, cache subsystems have been customized to deal with the specific characteristics of a particular application and to optimize its energy consumption. By adjusting the parameters of a cache memory to a specific application, it is possible to save 60% of energy consumption on average [3]. Nevertheless, finding a suitable cache configuration for a specific application can be a complex task, and simulation and analysis may take a long time. Most tools use exhaustive or heuristic-based exploration [4][5][6] of the cache configuration space. These are costly approaches that can lead to unacceptable exploration times. Existing tools for cache analysis also lack the resources designers need to perform energy consumption analysis effectively and efficiently. Some do not take static energy consumption into consideration. These tools are also inflexible: they are normally bound to a specific processor, so designers face several difficulties when they need to analyze caches on a platform whose processors differ from those supported by the tools. Some environments have been developed for cache parameter exploration [4][5][7]. However, none of them uses cache memory energy models that consider both energy components, static and dynamic. Silva-Filho et al. [6] consider static and dynamic energy components; however, that work focuses on a single platform and is not integrated into a graphical environment for platform analysis. Although cache memory exploration considering energy consumption is not a new issue in Design Space Exploration (DSE), this work contributes a new approach for exploring platforms with cache memory architectures in terms of energy consumption.
Unlike other approaches, whose analysis is intended for only one processor, this paper presents an energy consumption analysis environment called PCacheEnergyAnalyzer. The environment supports the exploration of cache memory configurations in terms of static and dynamic energy consumption. Moreover, it uses a fast exploration strategy based on single-pass simulation, which simulates multiple cache configurations simultaneously. It also supports cache exploration on platforms that are not bound to a specific processor. All these features are integrated into an easy-to-use graphical environment in the PDesigner framework. The rest of this paper is structured as follows. Section 2 discusses recent related work. Section 3 presents the proposed approach for a cache energy analysis environment. Section 4 presents results comparing two different processors across several applications using the PCacheEnergyAnalyzer environment. Finally, Section 5 discusses the conclusions and future directions.

2. RELATED WORK
Some existing methods still apply exhaustive search to find the optimal cache configuration in the design space. However, the time required for such an exhaustive search is often prohibitive. Platune [8] is an example of a framework for tuning configurable System-on-Chip (SoC) platforms that uses exhaustive search for one-level caches and a single type of processor (a MIPS core). It is suitable only when the number of possible configurations is small [8]; for a large design space, a long exploration time would be required. Even heuristics may be unsuitable when each simulation is long. Palesi et al. [9] reduce the configuration space using a genetic algorithm and produce results faster than the Platune approach. Zhang et al. [3] developed a heuristic based on the influence of each cache parameter (cache size, line size and associativity) on overall energy consumption. However, the simulation mechanism used by these approaches is based on the SimpleScalar [10] and CACTI [11] tools. SimpleScalar is a command-line microprocessor simulator that reports application performance results.
The CACTI tool computes the energy consumed per access for a given cache configuration. With these tools, simulating different configurations for the same application may take a long time. Prete et al. [12] proposed ChARM, a simulation tool for tuning ARM-based embedded systems that include cache memories. It provides parametric, trace-driven simulation for tuning the system configuration. Unlike the previous approaches, it offers a graphical interface that allows designers to configure component parameters, evaluate execution time, run a set of simulations and analyze the results. However, it does not provide energy results.
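To make the cost contrast between exhaustive and heuristic exploration concrete, the following minimal sketch counts how many configurations each strategy has to evaluate. The greedy variant tunes one parameter at a time (size, then line size, then associativity), loosely in the spirit of the parameter-influence heuristic of Zhang et al. [3]; the exact ordering and stopping rules of [3] are not reproduced, and `toy_energy` is a hypothetical stand-in for "simulate the application and estimate energy".

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int, int]  # (total size in bytes, line size in bytes, associativity)
EnergyFn = Callable[[Config], float]

SIZES: List[int] = [256, 512, 1024, 2048, 4096, 8192]
LINES: List[int] = [16, 32, 64]
ASSOCS: List[int] = [1, 2, 4]


def toy_energy(cfg: Config) -> float:
    """Hypothetical stand-in for simulation + energy estimation; minimum at (2048, 32, 2)."""
    size, line, assoc = cfg
    return abs(math.log2(size) - 11) + abs(math.log2(line) - 5) + abs(math.log2(assoc) - 1)


def exhaustive_search(sizes: List[int], lines: List[int], assocs: List[int],
                      energy: EnergyFn) -> Tuple[Config, int]:
    """Evaluate every combination; the count grows multiplicatively with each range."""
    configs = list(product(sizes, lines, assocs))
    return min(configs, key=energy), len(configs)


def greedy_search(sizes: List[int], lines: List[int], assocs: List[int],
                  energy: EnergyFn) -> Tuple[Config, int]:
    """Tune one parameter at a time while the others stay fixed (illustrative heuristic)."""
    memo: Dict[Config, float] = {}

    def cost(cfg: Config) -> float:
        if cfg not in memo:  # each distinct configuration is evaluated only once
            memo[cfg] = energy(cfg)
        return memo[cfg]

    size, line, assoc = sizes[0], lines[0], assocs[0]
    size = min(sizes, key=lambda s: cost((s, line, assoc)))
    line = min(lines, key=lambda ln: cost((size, ln, assoc)))
    assoc = min(assocs, key=lambda a: cost((size, line, a)))
    return (size, line, assoc), len(memo)


if __name__ == "__main__":
    print(exhaustive_search(SIZES, LINES, ASSOCS, toy_energy))
    print(greedy_search(SIZES, LINES, ASSOCS, toy_energy))
```

On this separable toy function both strategies reach (2048, 32, 2), but the greedy search needs 10 evaluations instead of 54. On real applications, interactions between parameters can make one-at-a-time tuning miss the optimum, which is the accuracy price paid for the shorter exploration time.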

On the other hand, Silva-Filho [6] takes static and dynamic energy consumption estimates into account with the TECH-CYCLES heuristic. This heuristic uses the eCACTI [13] cache memory model to determine the energy consumption of the hierarchy. Unlike other approaches, eCACTI considers both energy components, static and dynamic. The static component, negligible in earlier technologies, represents up to 30% of the energy of CMOS circuits in recent technologies [14]. eCACTI is an up-to-date cache memory model extended from the original CACTI model [11]. The original CACTI tool does not consider the static component of energy. It also assumes that the transistor width of the various devices is constant (except for wordlines) when analyzing power and delay. This assumption no longer holds [11], because transistor widths in actual cache designs change according to their capacitive load, which leads to significant inaccuracies in CACTI power estimates. The PDesigner framework is an Eclipse-based framework [15] that supports the modeling and simulation of SoC and MPSoC platforms. Using this framework, the platform designer can build the platform graphically and generate an executable simulator. PDesigner is currently a free solution and supports modeling platforms with different components, such as processors, cache memories, memories, buses and connections. It provides performance results; however, energy results are not supported. As Table 1 shows, no existing environment combines the flexibility to model multiple platforms with caches, an approach based on a single simulation, the capability to estimate both dynamic and static energy consumption of cache memories, and the possibility to explore the platform configuration design space graphically.

Table 1. Comparison of related studies.
Columns: Multi-Platform Modeling; Single Simulation; Dynamic Consumption; Static Consumption; Graphical Exploration.
Rows: Zhang; Palesi; Silva-Filho; Platune; SimpleScalar; ChARM; PDesigner.

3. PROPOSED APPROACH
In this paper, we propose a cache energy consumption estimation tool that implements an energy consumption analysis flow and is integrated as a plugin into the PDesigner framework. The plugin, called PCacheEnergyAnalyzer, provides dynamic and static energy consumption statistics for the cache memory components of a SoC. It is also an interactive environment with a user-friendly graphical interface for cache analysis and for interaction with the platform model already provided by PDesigner. The proposed approach is depicted in Figure 1. The first step was the definition of a cache energy analysis flow. To implement the flow, a new SystemC component that generates traces of memory accesses was created and added to the PDesigner library. Two additional tools were also created: an interactive graphical environment for controlling the analysis and viewing its results, and a tool for dynamic and static energy consumption estimation based on the eCACTI model. Together, these two tools make up the PCacheEnergyAnalyzer plugin. The plugin allows the designer to select a cache on the platform, define the design space to be explored, visualize the results in charts, select the desired cache configuration from the chart and reflect that decision on the platform. Finally, the updated library and the PCacheEnergyAnalyzer plugin were integrated into the PDesigner framework. The result is a powerful tool that supports both platform modeling and cache architecture exploration. The rest of this section explains the analysis flow, its implementation in PCacheEnergyAnalyzer and its integration into PDesigner.
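Defining the design space to be explored amounts to enumerating the combinations allowed by the parameter ranges the designer enters. The sketch below assumes power-of-two parameter values and treats any configuration with fewer than one set (line size × associativity > total size) as invalid; `CacheConfig` and `exploration_space` are hypothetical names used only for illustration, not part of the PDesigner API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheConfig:
    size: int       # total capacity in bytes
    line_size: int  # bytes per cache line
    assoc: int      # number of ways per set

    @property
    def num_sets(self) -> int:
        return self.size // (self.line_size * self.assoc)


def powers_of_two(lo: int, hi: int) -> List[int]:
    """All powers of two from lo up to hi (lo is assumed to be a power of two)."""
    values, v = [], lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


def exploration_space(min_size: int, max_size: int, min_line: int, max_line: int,
                      max_assoc: int) -> List[CacheConfig]:
    """Enumerate every (size, line, assoc) combination in the designer's ranges,
    dropping configurations that would have fewer than one set."""
    space = []
    for size in powers_of_two(min_size, max_size):
        for line in powers_of_two(min_line, max_line):
            for assoc in powers_of_two(1, max_assoc):  # only a maximum is given for associativity
                if line * assoc <= size:
                    space.append(CacheConfig(size, line, assoc))
    return space


if __name__ == "__main__":
    space = exploration_space(256, 8192, 16, 64, 4)
    print(len(space), space[0], space[0].num_sets)
```

Giving associativity only a maximum, as the Properties window does, maps directly to the single `max_assoc` argument, with the minimum implicitly fixed at direct-mapped (1 way).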
[Figure 1: blocks labeled Analysis Flow, Library Extension, Dynamic & Static Energy Estimation, Interactive Visual Environment, Environment Interaction and Integration in PDesigner.]
Fig. 1. Proposed approach.

3.1. Cache Energy Consumption Analysis Flow
Figure 2 shows the flow used to analyze energy consumption in cache memories. Each step is detailed in this section.

[Figure 2: flow inside the PCacheEnergyAnalyzer plugin, with the Application as input: Platform Mapping, Select Cache for Energy Analysis, Define Exploration Space, Simulate, Define Configuration Space, Define Transistor Technology, Calculate Energy, View Results, Select Configuration, Update Platform.]
Fig. 2. Energy consumption analysis flow.

Initially, the desired platform is constructed graphically from the components available in the PDesigner component library. System designers model the architecture by dragging and dropping components from the component palette, which offers the following component types: processor, bus, device, memory and cache memory. Figure 3 shows an example of a platform composed of a MIPS processor, a cache memory, a bus and a main memory. The master and slave protocol ports of the components are linked through connections. The designer can also change component parameters by selecting a component and using the properties view (lower part of Figure 3). The application is a binary compiled for the target processor. The designer selects the processor and associates the binary file with the triple {processor, memory, load address}. To perform energy analysis on a cache memory, the designer right-clicks the cache component and selects the PCacheEnergyAnalyzer option, which enables energy consumption exploration for that cache component. Once the cache component has been selected, the designer can change its properties. In the Properties window shown in Figure 3, the designer defines the exploration space of the cache memory component by setting minimum and maximum values for each cache parameter: cache size, cache line size and associativity. For associativity, only the maximum value is specified. Next, an executable simulator of the platform is generated. The simulator performs a single simulation and generates miss and hit statistics for the entire exploration space defined by the designer. The result of the simulator execution is an XML file that contains, for each configuration, the cache configuration ID, the cache parameters (size, line size and associativity), the number of accesses and the miss rate. A simulation mechanism using a single-pass simulation technique, based on the work in [16], has been adopted. Simulations using this method are usually trace-based and require more than one simulation run [16][17]. For instance, the single-pass cache evaluation mechanism proposed in [16] is 70 times faster than a simulation-based mechanism for the ADPCM application from Mediabench.
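The core idea of collecting statistics for many configurations from a single run can be illustrated with a deliberately naive sketch: the memory-address trace is read once, and every candidate configuration is updated on each access. This is not the mechanism of [16] or [17], which share work across configurations (for instance, the binomial-tree structures named in [17]) instead of keeping one independent cache model per configuration; LRU replacement and tag-only modeling are further assumptions made here.

```python
from collections import OrderedDict
from typing import Dict, Iterable, List, Tuple

Config = Tuple[int, int, int]  # (total size in bytes, line size in bytes, associativity)


class LRUCache:
    """Set-associative cache model with LRU replacement; it tracks tags only."""

    def __init__(self, size: int, line: int, assoc: int):
        self.line = line
        self.assoc = assoc
        self.num_sets = size // (line * assoc)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        self.accesses += 1
        block = addr // self.line
        lines = self.sets[block % self.num_sets]
        tag = block // self.num_sets
        if tag in lines:
            lines.move_to_end(tag)          # hit: mark as most recently used
        else:
            self.misses += 1
            if len(lines) >= self.assoc:
                lines.popitem(last=False)   # evict the least recently used tag
            lines[tag] = True


def simulate_all(trace: Iterable[int], configs: List[Config]) -> Dict[Config, Tuple[int, int]]:
    """Read the trace exactly once and return (accesses, misses) per configuration."""
    caches = {cfg: LRUCache(*cfg) for cfg in configs}
    for addr in trace:
        for cache in caches.values():
            cache.access(addr)
    return {cfg: (c.accesses, c.misses) for cfg, c in caches.items()}


if __name__ == "__main__":
    print(simulate_all([0, 128, 0, 128], [(128, 64, 1), (128, 64, 2)]))
```

For the alternating trace in the usage line, the direct-mapped configuration misses on every access while the 2-way configuration misses only twice. The per-configuration accesses and misses (miss rate = misses / accesses) are the same kind of statistics the simulator writes to its XML output.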

[Figure 3: PDesigner screenshot showing the Component Palette, a platform with Processor, Cache Memory, Bus and Main Memory, the processor Load Address property and the cache Exploration Space properties.]
Fig. 3. PDesigner, Architecture Modeling, Component Palette and Configuration Space.

The exploration space may contain cache configurations that are invalid or of no interest to the designer. After simulation, the designer can select some or all configurations for energy analysis through a Configuration Selection Window, thereby defining the configuration space that contains all the desired cache configurations. This window also allows the designer to select the transistor technology size. After the configuration space has been defined, the energy module calculates the energy consumption and number of cycles for each selected configuration. The cache memory energy consumption calculation flow is depicted in Figure 4. A parser receives as input the selected configuration space saved in the XML file and separates it into two sets of information. The first set contains the cache parameters and technology information, which are provided to the eCACTI tool to calculate the dynamic and static energy per access. The second set contains the number of misses, the number of accesses and the cache parameters of the chosen configuration. This information, together with the dynamic and static energy provided by eCACTI, is used to calculate the total static and dynamic energy consumed by the cache memory for the application. The total number of cycles needed to run the application is also calculated in this step. Once these values have been calculated, another parser writes the energy estimation results for each configuration to an XML file.

[Figure 4: the Selected Configuration Space (.XML) feeds a parser; cache parameters and technology go to eCACTI, which returns the dynamic and static energy per access; these values, together with the cache parameters, # misses and # accesses, feed the Energy, Cycles Calculation block, whose output passes through a results parser to produce the Energy Consumption Estimation Results (.XML).]
Fig. 4. Energy Calculation Flow
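The two branches of the calculation flow can be sketched as follows. Everything in this sketch is an assumption for illustration, since the paper does not give its exact equations: the XML schema (`config` elements with `id`, `size`, `line`, `assoc`, `accesses` and `missrate` attributes), the per-access and per-cycle energy values (which in the real flow would come from eCACTI for each configuration and transistor technology), the per-miss energy and the cycle model (one cycle per hit plus a fixed miss penalty).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimResult:
    cfg_id: str
    size: int
    line: int
    assoc: int
    accesses: int
    misses: int


@dataclass
class EnergyParams:
    """Per-configuration inputs; in the real flow the energy values would come from eCACTI."""
    e_dyn_access: float     # dynamic energy per cache access (J), assumed
    e_static_cycle: float   # static (leakage) energy per clock cycle (J), assumed
    e_miss: float           # extra energy per miss, e.g. line fill (J), assumed
    hit_cycles: int = 1     # assumed hit latency in cycles
    miss_penalty: int = 20  # assumed extra cycles per miss


def parse_simulation_xml(text: str) -> List[SimResult]:
    """Parse a hypothetical simulator output with one <config> element per configuration."""
    results = []
    for node in ET.fromstring(text).iter("config"):
        accesses = int(node.get("accesses"))
        misses = round(accesses * float(node.get("missrate")))
        results.append(SimResult(node.get("id"), int(node.get("size")), int(node.get("line")),
                                 int(node.get("assoc")), accesses, misses))
    return results


def estimate(r: SimResult, p: EnergyParams) -> Tuple[float, float, int]:
    """Return (dynamic energy, static energy, cycles) under the assumed model."""
    cycles = r.accesses * p.hit_cycles + r.misses * p.miss_penalty
    e_dyn = r.accesses * p.e_dyn_access + r.misses * p.e_miss
    e_static = cycles * p.e_static_cycle  # leakage accumulates over the whole run
    return e_dyn, e_static, cycles


SAMPLE = """<configurations>
  <config id="c1" size="1024" line="32" assoc="2" accesses="1000" missrate="0.05"/>
</configurations>"""

if __name__ == "__main__":
    params = EnergyParams(e_dyn_access=1e-9, e_static_cycle=1e-10, e_miss=5e-9)
    for r in parse_simulation_xml(SAMPLE):
        print(r.cfg_id, estimate(r, params))
```

Tying static energy to the cycle count rather than to the access count reflects that leakage is consumed for as long as the application runs, which is why slower configurations can lose on energy even when their per-access cost is low.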

A cost function F = Energy × Cycles is also calculated. Minimizing this cost function yields cache configurations close to Pareto-optimal [8], that is, configurations that present a good tradeoff between performance and energy consumption. The configuration with the lowest Energy × Cycles cost is also identified. Once the energy calculation flow is complete, the user visualizes the results of the cache energy analysis graphically. The energy consumption estimate of each configuration in the configuration space is displayed in an interactive chart, as depicted in Figure 5. The chart plots the energy consumed on the y-axis and the performance, in clock cycles, on the x-axis. Each point on the chart corresponds to one configuration in the configuration space. The chart is interactive: the user can select a point to display information about it in two forms. The first is a tool tip, depicted by the rectangle in Figure 5, which contains the number of cycles and the energy consumed by the selected configuration; the second is the properties view, also shown in Figure 5.
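The cost function and its relation to Pareto optimality can be made concrete with a small sketch over (cycles, energy) pairs. The configuration names and values below are made up for illustration; `lowest_cost` applies F = Energy × Cycles, and `pareto_front` returns the configurations that no other configuration beats in both objectives.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (cycles, energy in arbitrary units) for one configuration


def lowest_cost(points: Dict[str, Point]) -> str:
    """Name of the configuration minimizing F = Energy x Cycles."""
    return min(points, key=lambda name: points[name][0] * points[name][1])


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: Dict[str, Point]) -> List[str]:
    """Names of the configurations not dominated by any other configuration."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


EXAMPLE: Dict[str, Point] = {
    "A": (1000, 5.0),  # fast, high energy
    "B": (2000, 2.0),
    "C": (1500, 6.0),  # dominated by A
    "D": (3000, 1.8),  # slow, lowest energy
}

if __name__ == "__main__":
    print(lowest_cost(EXAMPLE), pareto_front(EXAMPLE))
```

Because both objectives are positive, a configuration that minimizes F can never be dominated: any configuration that is no worse in both objectives and strictly better in one would have a strictly smaller product. The lowest-cost point therefore always lies on the Pareto front, which makes it a reasonable single reference among the tradeoff configurations.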

Fig. 5. Energy estimation interactive chart.

The properties view displays the cache configuration parameter values, the miss rate, the number of accesses, the cost value computed by the cost function, the dynamic and static energy consumption, the total cycles required to run the application and the total energy consumption. The configuration with the lowest calculated cost is shown in a different color in the interactive chart. The user can use this configuration as a reference but is not obliged to choose it as the optimal configuration. The user can also interact with the chart to view the properties of any particular cache configuration. In the Select Configuration step, the designer chooses a configuration that meets his/her performance and energy consumption requirements by simply clicking on its point in the chart. In the Update Platform step, the designer updates the platform by replacing the current cache component parameters with the values of the selected configuration. The PCacheEnergyAnalyzer plugin performs this substitution automatically by interacting with the PDesigner framework.

4. RESULTS
The PCacheEnergyAnalyzer tool has been used to explore the cache memory design space for four applications of the Mediabench benchmark suite [18]: fft, timing, rawcaudio and rawdaudio. Besides the processor, the architecture is composed of a SimpleBus interconnection structure, one cache memory and a RAM memory. The cache memory parameters are varied, and the exploration is performed for two processors (MIPS and SPARCV8) and the four applications. Figure 6 summarizes, for each application running on the MIPS and SPARCV8 processors, the estimated energy consumption of the configuration with the best cost function and of the configuration with the lowest energy consumption in the configuration space.
Although the two processors have similar architectures, their compilers and compilation optimizations differ in some cases. The chart shows that the MIPS processor achieves much lower energy consumption than the SPARCV8 for the timing application, and slightly higher energy consumption for the other applications. The configuration space comprises 50 different configurations for each application, and the selected technology was 0.18 µm. The cache size varies from 256 to 8192 bytes, the cache line size from 16 to 64 bytes, and the associativity from 1 to 4. Energy consumption and performance were estimated using the flow depicted in Figure 4, and the results were displayed in the energy estimation interactive chart of Figure 5.
[Figure 6: energy (Joules, axis from 0 to 0.12) for the Timing, Rawcaudio, Rawdaudio and FFT applications; series: MIPS (Cost Function), MIPS (Lowest Energy), SPARC (Cost Function), SPARC (Lowest Energy).]
Fig. 6. Energy estimation for different applications.

Additionally, the proposed approach was compared with existing work using the basicmath_small application from the MiBench suite [19]. SimpleScalar and PCacheEnergyAnalyzer (PCEA) were compared in terms of fidelity by analyzing the energy consumption for several cache configurations. Each point in Figure 7 represents the energy consumption for a given cache configuration (cache size, cache line size, associativity).
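One simple way to quantify fidelity is to check whether two estimators rank the cache configurations in the same order, even if their absolute values differ. The sketch below normalizes each energy series to its maximum, as in a normalized-energy plot, and computes the Spearman rank correlation. This metric is an assumption chosen for illustration (the comparison reported here is based on Figure 7), and the sample energies are made up.

```python
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale a series so that its largest value is 1.0."""
    peak = max(values)
    return [v / peak for v in values]


def ranks(values: Sequence[float]) -> List[int]:
    """1-based rank of each value (smallest = 1); ties are not handled."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation for series without ties: 1.0 means identical ordering."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    est_a = [4.0, 2.0, 8.0, 1.0]  # made-up energies from estimator A, one per configuration
    est_b = [3.0, 1.5, 7.0, 1.2]  # made-up energies from estimator B, same configurations
    print(normalize(est_a), normalize(est_b), spearman(est_a, est_b))
```

Normalization does not change the ranks, so a correlation of 1.0 means both estimators would lead the designer to the same ordering of configurations even when their absolute energy values differ.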

Fig. 7. Normalized energy comparison for the SimpleScalar and PCEA approaches.

Although SimpleScalar does not support energy consumption analysis, its energy figures were calculated with an approach based on Zhang's work [3], using a one-level cache and the eCACTI cache memory energy model. For simplicity, the data and instruction cache configurations are assumed to be the same. The results shown in Figure 7 indicate that both approaches present good fidelity. We believe that the difference in precision shown in Figure 7 is due to the compilers and compilation optimizations used.

5. CONCLUSION
This work has presented PCacheEnergyAnalyzer, an environment for energy consumption analysis that supports cache memory energy consumption estimation on SoC platforms. The initial studies focused on one-level caches; however, the approach can easily be extended to more levels. Results have shown that it is a powerful tool for helping users find suitable cache configurations for a particular application, considering not only performance but also the best tradeoff between performance and energy consumption. PCacheEnergyAnalyzer fills the gaps in existing tools by simultaneously providing multiplatform support, extensibility, dynamic and static energy consumption estimation and a graphical environment.

6. REFERENCES
[1] H. Chang, L. Cooke, M. Hunt, G. Martin, A.J. McNelly and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design, Kluwer Academic Publishers, 1st ed., 1999.
[2] A. Malik, B. Moyer and D. Cermak, A Low Power Unified Cache Architecture Providing Power and Performance Flexibility, Int. Symp. on Low Power Electronics and Design, June 2000, pp. 241-243.
[3] C. Zhang and F. Vahid, Cache Configuration Exploration on Prototyping Platforms, 14th IEEE International Workshop on Rapid System Prototyping, June 2003, p. 164.
[4] A. Gordon-Ross, F. Vahid and N. Dutt, Automatic Tuning of Two-Level Caches to Embedded Applications, DATE, Feb. 2004, pp. 208-213.
[5] A. Gordon-Ross et al., Fast Configurable-Cache Tuning with a Unified Second-Level Cache, ISLPED, 2005.
[6] A.G. Silva-Filho, F.R. Cordeiro, R.E. Sant'Anna and M.E. Lima, Heuristic for Two-Level Cache Hierarchy Exploration Considering Energy Consumption and Performance, PATMOS 2006, Montpellier, France, September 13-15, 2006, pp. 75-83.
[7] A. Halambi et al., EXPRESSION: A Language for Architecture Exploration through Compiler/Simulator Retargetability, DATE, March 1999, pp. 485-491.
[8] T. Givargis and F. Vahid, Platune: A Tuning Framework for System-on-a-Chip Platforms, IEEE Trans. Computer-Aided Design, vol. 21, Nov. 2002, pp. 1-11.
[9] M. Palesi and T. Givargis, Multi-Objective Design Space Exploration Using Genetic Algorithms, International Workshop on Hardware/Software Codesign, May 2002.
[10] D. Burger and T.M. Austin, The SimpleScalar Tool Set, Version 2.0, Computer Architecture News, vol. 25(3), June 1997, pp. 13-25.
[11] P. Shivakumar and N.P. Jouppi, CACTI 3.0: An Integrated Cache Timing, Power and Area Model, WRL Research Report 2001/2.
[12] C.A. Prete, M. Graziano and F. Lazzarini, The ChARM Tool for Tuning Embedded Systems, IEEE Micro, vol. 17, 1997, pp. 67-76.
[13] N. Dutt and M. Mamidipaka, eCACTI: An Enhanced Power Estimation Model for On-chip Caches, TR 04-28, Sept. 2004.
[14] E. Macii et al., Energy-Aware Design of Embedded Memories: A Survey of Technologies, Architectures and Optimization Techniques, ACM Transactions on Embedded Computing Systems, vol. 2, no. 1, Feb. 2003, pp. 5-32.
[15] Eclipse, available at http://www.eclipse.org.
[16] P. Viana et al., Cache-Analyzer: Design Space Evaluation of Configurable-Caches in a Single-Pass, International Workshop on Rapid System Prototyping, May 2007, pp. 3-9.
[17] R.A. Sugumar and S.G. Abraham, Efficient Simulation of Multiple Cache Configurations Using Binomial Trees, CSE-TR-111-91, CSE Div., Univ. of Michigan, 1991. Available at: http://citeseer.ist.psu.edu/sugumar91efficient.html.
[18] Mediabench, http://cares.icsl.ucla.edu/MediaBench/, 2006.
[19] M.R. Guthaus et al., MiBench: A Free, Commercially Representative Embedded Benchmark Suite, IEEE 4th Annual Workshop on Workload Characterization, Dec. 2001, pp. 1-12.
