POWER REDUCTION
Wing-Shan (Emily) Chan
wech592@cse.unsw.edu.au
18th May 2004
Supervisor: Annie Guo
huig@cse.unsw.edu.au
Table of Contents
Abstract
1. Introduction
2. Review of Prior Work
2.1 Prior Related Work
2.1.1 Bai et al. [1]
2.1.2 Maro et al. [4]
2.1.3 Bahar et al. [5]
3. Proposal
3.1 Introduction
3.2 Design
3.2.1 Floating Point and Integer Clusters
3.2.2 Ready and Non-Ready FIFOs
3.3 Implementation
3.3.1 Performance Monitors
3.3.2 Power Estimates and Tools
3.4 Alternate proposal
3.5 Schedule
4.0 Conclusion
5.0 References
Abstract
Power dissipation has become a vital issue in the design of modern computer
architectures. In this paper, I examine a number of previous studies on processor
power-saving techniques and analyze the constraints of each. Based on these
studies, I propose a solution towards the end of this paper.
1. Introduction
For years, research focused only on ways to maximize processor performance.
These two clusters share the same data cache, the instruction
fetch/rename unit and the commit unit. I then divide the issue queue in each cluster
into two major parts: the ready and non-ready instruction queues. Non-ready
instructions are those that have at least one operand pending. Within each part,
the sub-queue is further partitioned into several sets (FIFOs). Only the head of
each FIFO is visible to the request and selection/arbiter logic, so instructions
within a FIFO issue in order.
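The head-only visibility described above can be illustrated with a minimal Python sketch. It is not the proposed hardware design: the class name `FifoIssueQueue`, the first-fit steering policy in `dispatch`, and the `is_ready` callback are all assumptions introduced for illustration.

```python
from collections import deque


class FifoIssueQueue:
    """Issue queue split into FIFOs; only each FIFO's head can be selected."""

    def __init__(self, num_fifos, fifo_depth):
        self.fifos = [deque() for _ in range(num_fifos)]
        self.fifo_depth = fifo_depth

    def dispatch(self, instr):
        """Place instr in the first FIFO with free space (hypothetical policy)."""
        for fifo in self.fifos:
            if len(fifo) < self.fifo_depth:
                fifo.append(instr)
                return True
        return False  # every FIFO is full, so dispatch would have to stall

    def visible_heads(self):
        """Only these entries are seen by the selection/arbiter logic."""
        return [fifo[0] for fifo in self.fifos if fifo]

    def select(self, is_ready, issue_width):
        """Issue at most issue_width ready heads; entries behind a head wait."""
        issued = []
        for fifo in self.fifos:
            if len(issued) == issue_width:
                break
            if fifo and is_ready(fifo[0]):
                issued.append(fifo.popleft())
        return issued
```

With two FIFOs of depth two holding `a, b` and `c, d`, an instruction `b` that is ready still cannot issue while the head `a` is waiting, which is the in-order behaviour within a FIFO.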
My contribution is to show the potential for power saving through dynamically
reconfiguring the issue logic as well as the functional units. According to the
feedback from the hardware performance monitors, I dynamically modify the size
and number of the existing FIFOs. Note that each cluster can be reconfigured
independently of the other, and this independence also holds between the
ready and non-ready queues. In addition, I modify the number of
available functional units on the fly, based on the feedback provided by the
hardware performance monitors.
The layout of the rest of the paper is as follows. Section 2 discusses
previously taken approaches. Section 3 explains my proposed scheme in
detail and also presents my schedule of implementation. Section 4 offers
conclusions.
The hardware performance monitors used here are implemented by Maro et al. in [4].
These monitors are mainly composed of simple counters and comparators; hence,
the power consumed by these components can be neglected [5]. The figure below
shows an example of the possible operating modes of the issue queue under
Scheme#1.
If the hardware
to gain better control over the type of applications. For instance, the processor can
deal with the two types of instruction separately, which gives it greater
flexibility to adapt to changes in the resource needs of each instruction type.
In addition, further power saving can be achieved by restricting the broadcast
of a just-computed result to the non-ready instructions only. Applications with a
large degree of ILP will benefit the most from this restriction.
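A rough way to see where the saving comes from is to count tag comparisons. The sketch below assumes each instruction is a dictionary with a `pending` set of operand tags; the function name `wakeup` and this data layout are illustrative assumptions, not part of any cited design.

```python
def wakeup(non_ready, ready, result_tag):
    """Broadcast result_tag to non-ready entries only.

    Returns the number of tag comparisons performed. Entries already in
    `ready` are never compared, which is where the saving comes from.
    """
    comparisons = 0
    still_waiting = []
    for instr in non_ready:
        comparisons += len(instr["pending"])  # one compare per pending operand
        instr["pending"].discard(result_tag)
        if instr["pending"]:
            still_waiting.append(instr)
        else:
            ready.append(instr)  # all operands available: move to ready queue
    non_ready[:] = still_waiting
    return comparisons
```

The more instructions already sit in the ready queue, the fewer entries take part in each broadcast, which matches the observation that high-ILP applications benefit most.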
2.1.2 Maro et al. [4]
Maro et al. [4] implemented a hardware performance monitoring mechanism
whose monitors provide feedback on whether or not to disable part of the Integer
and/or Floating Point pipelines at runtime in order to save power. The basic
multi-pipelined processor is shown below:
In that work, a number of low-power operating modes are defined:
Figure 4 Possible Operating Modes for the processor. Table taken from [4]
Both entering and exiting these modes depend on feedback from the hardware
performance monitors, while exiting also depends on trigger events such as
data/instruction cache misses and floating point activity.
This approach provides greater flexibility in handling instructions according to
their types. Nevertheless, shrinking the overall size of the issue queue when
entering some of the operating modes limits the exposure of ILP. Scheme#1
described in 2.1.1 has the same negative effect, as both approaches attempt
to alter the total size of the issue queue during the run-time of programs. Moreover,
the select and wake-up logic has no way to distinguish between ready and non-ready
instructions. The system is therefore power inefficient, because the selection and
wake-up signals of all entries in the issue queue have to be updated every cycle,
even for an instruction that is not ready to be issued.
2.1.3 Bahar et al. [5]
Bahar et al. [5] proposed a technique called Pipeline Balancing (PLB), which
allows a cluster, or part of a cluster, of functional units to be disabled by varying
the issue width. The pipeline organization of an 8-wide issue processor is shown
below:
Figure 7 Entering and Exiting Conditions for each state of the processor.
Table taken from [5].
It is important to ensure that these threshold values allow the system to respond
to changes in program needs effectively and efficiently. For example, a program
that has a burst of floating point instructions for some portions of its execution time
will suffer when the processor fails to return to normal mode (rebalancing the
structure) promptly.
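The sensitivity to threshold values can be sketched with a two-state controller that uses separate entering and exiting thresholds. This is only an illustration: the class name `ModeController`, the floating point issue rate used as the monitored metric, and the threshold values are assumptions, not the conditions tabulated in [5].

```python
class ModeController:
    """Two-state (normal / low-power) controller with separate thresholds."""

    def __init__(self, enter_below, exit_above):
        # A gap between the two thresholds prevents oscillation between modes.
        assert enter_below < exit_above
        self.enter_below = enter_below
        self.exit_above = exit_above
        self.low_power = False

    def end_of_window(self, fp_issued, window_cycles):
        """Update the mode from one monitoring window; return True if low-power."""
        fp_ipc = fp_issued / window_cycles
        if not self.low_power and fp_ipc < self.enter_below:
            self.low_power = True
        elif self.low_power and fp_ipc > self.exit_above:
            self.low_power = False
        return self.low_power
```

If `exit_above` is set too high, a floating point burst may end before the controller leaves low-power mode, which is the failure case described in the text.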
3. Proposal
3.1 Introduction
Bai et al. [1] state that a good design strategy should be flexible enough to
dynamically reconfigure available resources according to the program's needs.
This statement became the Golden Rule during the design phase of my strategy.
In my proposed scheme, I try to provide as much flexibility as possible for a system
to adapt to changes in program needs, so that it reacts effectively and
efficiently and fulfills the ultimate goal of power saving in processors.
3.2 Design
The fundamental configuration of my proposed scheme is as follows:
Parameter
Configuration
128 entries
Machine Width
Functional Units
64 entries
8 entries
56 entries
instruction fetch/rename unit, data cache and the commit unit. Note that this is only
a preliminary stage of design; the architecture described above may be changed
in response to issues that arise during the implementation stage.
Below is a figure showing the new pipeline structure proposed:
especially for applications exhibiting high ILP. The figure below shows the
structure of the newly proposed issue queue within a cluster:
An important property is that the total numbers of entries of both component
queues remain the same at all times for all applications. This avoids the
negative effect of limiting the exposure of ILP that comes from shrinking the
issue queue size.
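One reading of this property is that a component queue changes only the number and depth of its FIFOs while their product stays fixed. Under that assumption, the legal configurations can be enumerated as below; the function name `fifo_configurations` and the example queue size are hypothetical.

```python
def fifo_configurations(total_entries):
    """List every (num_fifos, fifo_depth) pair whose product is total_entries."""
    return [
        (num_fifos, total_entries // num_fifos)
        for num_fifos in range(1, total_entries + 1)
        if total_entries % num_fifos == 0
    ]


# Example with a hypothetical 8-entry queue:
# fifo_configurations(8) -> [(1, 8), (2, 4), (4, 2), (8, 1)]
```

Since only FIFO heads are visible to the selection logic, more and shallower FIFOs expose more candidates per cycle, while fewer and deeper FIFOs expose fewer; the entry count is the same in every case.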
In addition to the above modifications, I also propose to monitor the usage of
the functional units. More power can be saved by disabling some of the functional
units when they are under-utilized. This again is implemented independently in
each of the clusters, i.e. disabling a Floating Point functional unit will not affect the
operation of the Integer cluster.
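A possible per-cluster decision rule is sketched below. It is an assumption made for illustration only: the function name `units_to_enable`, the busy-cycle counter it reads, and the 50% target utilization are not taken from the proposal.

```python
import math


def units_to_enable(busy_unit_cycles, window_cycles, total_units,
                    target_utilization=0.5):
    """Choose how many functional units to keep enabled in one cluster.

    busy_unit_cycles counts, over the monitoring window, one for every unit
    that was busy in a cycle. The target_utilization value is hypothetical.
    """
    average_busy = busy_unit_cycles / window_cycles
    needed = math.ceil(average_busy / target_utilization)
    # Keep at least one unit on and never exceed what the cluster has.
    return max(1, min(total_units, needed))
```

Calling the function once with the Integer cluster's counter and once with the Floating Point cluster's counter keeps the two decisions independent, as the text requires.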
3.3 Implementation
3.3.1 Performance Monitors
Similar to [1, 4, 5], reconfigurations of the system are carried out according to
the feedback from the hardware performance monitors. As stated before, these
monitors are mainly composed of simple counters and comparators; therefore, the
power consumed by these parts can be neglected [5]. The cycle window [1] that I
set is either 512 or 1024 cycles at this stage. I will further investigate the feasibility
and effects of having different cycle window sizes for different monitors, which
would make the system more flexible in responding to feedback from
different monitors. I implement the following hardware performance monitors in
the system:
scheme will help me to analyze how the separation of the Integer and Floating Point
clusters impacts the overall performance. Because the two approaches are similar,
the details of this alternate approach are not covered here; more information will be
provided in the final report.
3.5 Schedule
DESIGN PHASE (3 WEEKS)
Eliminate any uncertainty in design through research
Modify design when necessary
Consolidate design, including determining parameter values for processor architecture
Document any changes with reasons
OPTIONAL PHASE
Implement the alternate approach proposed in the design document
Implement the proposed processor using the Wattch Tool
Investigate the effect of changing the issue width while reconfiguring the functional units
4.0 Conclusion
Due to the rapidly rising awareness of the importance of considering power
in the design phase of processors, much research has been carried out. This
prior work has presented many ways to achieve power savings while minimizing the
impact on overall performance. Based on the previous studies, I propose a
strategy focusing mainly on the issue logic design as well as the usage of functional
units. The aim of my study is to show that dynamically reconfiguring the internal
structure of my processor according to different sources of feedback achieves
savings in power consumption, and that my proposed processor has maximum
flexibility in responding to different program needs and therefore
fulfills the Golden Rule stated in [1].
5.0 References
[1] Yu Bai and R. Iris Bahar. A Dynamically Reconfigurable Mixed In-Order/Out-of-Order Issue Queue for Power-Aware Microprocessors.
Division of
[3] K. Wilcox and S. Manne. Alpha processors: A history of power issues and a look
to the future. In Cool-Chips Tutorial, November 1999. Held in conjunction
with the 32nd International Symposium on Microarchitecture.
[4]
Held in
[5] R. I. Bahar and S. Manne. Power and energy reduction via pipeline balancing. In
Proceedings of the 28th International Symposium on Computer Architecture,
July 2001.
[6]
Technology.