Simulation
Simulator program (flexible): a program that simulates the design with and without the proposed feature of interest.
Workload: a program or a multiprogram workload.
Outputs: performance impact and cost (cost = hardware + design time).
Limitations of simulation: resource intensive; requires a lot of memory and time.
Scaling Workloads and Machines:
Basic Measures of Multiprocessor Performance
Two performance characteristics:
1. Absolute performance
Important to the end user or buyer.
Measured as work done per unit time for a given input configuration (problem size), whether fixed up front or a continuous stream of inputs.
Performance = 1 / execution time, or explicit work per unit time that is meaningful for the application, e.g., the number of transactions serviced per second or the number of chemical bonds computed per second.
2. Performance improvement due to parallelism (speedup)
Speedup(p) = absolute performance on p processors / absolute performance on one processor
With execution time as the performance metric:
Speedup(p) = Time(1 processor) / Time(p processors)
(In TPC benchmarks, the number of user terminals and the size of the database scale with computing power.)
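As a tiny illustration (not from the original notes; the timing values are made up), the execution-time form of speedup can be computed directly from measured times:

```python
def speedup(time_1proc, time_p_procs):
    """Speedup(p) = Time(1 processor) / Time(p processors)."""
    return time_1proc / time_p_procs

# Hypothetical measurements: 120 s on one processor, 10 s on 16 processors.
print(speedup(120.0, 10.0))  # 12.0 -> a speedup of 12 on 16 processors
```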
Many efforts have been made to define standard benchmark suites of parallel
applications to facilitate workload-driven architectural evaluation.
For now, let us assume that a parallel program has been chosen as a workload, and
see how we might use it to evaluate a real machine. We first keep the number of
processors fixed, which both simplifies the discussion and also exposes the
important interactions more cleanly. Then, we discuss varying the number of
processors as well.
Workload/Benchmark Suites
NAS (Numerical Aerodynamic Simulation): originally pencil-and-paper benchmarks
SPLASH/SPLASH-2: shared-address-space parallel programs
ParkBench: message-passing parallel programs
ScaLAPACK: message-passing kernels
TPC: transaction processing
SPEC-HPC: ...
Choosing Performance Metrics
1. Absolute performance
User time or wall-clock time; average or maximum time.
2. Performance improvement, or speedup.
3. Processing rate
Number of computer operations executed per unit time (MFLOPS, MIPS).
4. Utilization
Fraction of time processors are kept busy executing instructions.
5. Problem size
Smallest problem size of a given application that obtains a specified parallel efficiency (speedup divided by the number of processors); efficiency-constrained scaling.
6. Percentage improvement in performance
Improvement in performance due to an architectural feature.
In evaluating how well a machine scales as resources are
added, it is not only how performance increases that
matters but also how cost increases.
1. Absolute Performance
Suppose that execution time is our absolute performance metric. Time can be measured in different ways.
First, there is a choice between user time and wall-clock time:
User time is the time the machine spent executing code from the particular workload or program in question, excluding system activity and other programs that might be timesharing the machine.
Wall-clock time is the total elapsed time for the workload, including all intervening activity, as measured by a clock hanging on the wall.
Second, there is the issue of whether to use the average or the maximum execution time over all processes of the program.
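As an illustration (not part of the original notes), both notions of time can be observed with Python's standard timers, and the average-versus-maximum choice applied to invented per-process times:

```python
import time

# time.process_time() counts CPU time consumed by this process (an
# approximation of "user time": it excludes sleeps and other programs
# timesharing the machine), while time.perf_counter() measures elapsed
# wall-clock time.
start_cpu = time.process_time()
start_wall = time.perf_counter()

total = sum(i * i for i in range(500_000))  # some actual computation
time.sleep(0.1)                             # intervening activity

cpu_time = time.process_time() - start_cpu
wall_time = time.perf_counter() - start_wall
print(f"user-like time: {cpu_time:.3f} s, wall-clock time: {wall_time:.3f} s")

# For a parallel program, one time is collected per process; the maximum
# reflects when the program as a whole actually finishes. Values made up.
per_process = [9.8, 10.0, 10.4, 12.1]
avg_time = sum(per_process) / len(per_process)
max_time = max(per_process)
print(f"average: {avg_time:.2f} s, maximum: {max_time:.2f} s")
```

The wall-clock measurement includes the sleep; the process-time measurement largely does not, which is exactly the distinction drawn above.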
2. Performance Improvement or Speedup
A key question is what the denominator in the speedup ratio, performance on one processor, should actually measure. There are four choices:
1. Performance of the parallel program on one
processor of the parallel machine
2. Performance of a sequential implementation of
the same algorithm on one processor of the
parallel machine
3. Performance of the best sequential algorithm
and program for the same problem on one
processor of the parallel machine
4. Performance of the best sequential program on
an agreed-upon standard machine.
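A hedged sketch of how these four baseline choices change the reported speedup; every timing here is hypothetical:

```python
# Hypothetical execution times (seconds) for one fixed problem.
t_parallel_on_p = 2.0  # the parallel program on p processors

# The four candidate denominators, as enumerated above. A slower baseline
# yields a larger (more flattering) speedup: choice 1 is the most
# generous, choices 3 and 4 the most demanding.
baselines = {
    "1. parallel program, one processor": 40.0,
    "2. sequential version of same algorithm": 36.0,
    "3. best sequential program, same machine": 30.0,
    "4. best sequential program, standard machine": 25.0,
}

speedups = {name: t / t_parallel_on_p for name, t in baselines.items()}
for name, s in speedups.items():
    print(f"{name}: speedup = {s:.1f}")
```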
3. Processing Rate
A metric that is often quoted to characterize the
performance of machines is the number of computer
operations that they execute per unit time (as opposed to
operations that have meaning at application level, such
as transactions or chemical bonds).
Classic examples are MFLOPS (millions of floating point
operations per second) for numerically intensive
programs and MIPS (millions of instructions per second)
for general programs.
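As a small sketch (assumed, not from the notes), a processing rate such as MFLOPS is just an operation count divided by time; the operation count here uses the standard 2n^3 estimate for dense matrix multiply:

```python
def mflops(flop_count, seconds):
    """Millions of floating-point operations per second."""
    return flop_count / seconds / 1e6

# Hypothetical: an n x n matrix multiply performs roughly 2*n^3 flops.
n = 1000
flop_count = 2 * n ** 3          # 2 billion floating-point operations
print(mflops(flop_count, 4.0))   # 500.0 -> 500 MFLOPS if it took 4 s
```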
4. Utilization
Architects sometimes measure success by how well (what
fraction of the time) they are able to keep their
processing engines busy executing instructions rather
than stalled due to various overheads.
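A minimal sketch of the utilization metric, using invented per-processor busy times:

```python
def utilization(busy_times, elapsed, num_procs):
    """Fraction of total processor-time spent executing instructions
    rather than stalled on overheads (busy_times: seconds per processor)."""
    return sum(busy_times) / (elapsed * num_procs)

# Hypothetical: a 4-processor run that takes 10 s of wall-clock time.
u = utilization([9.0, 8.5, 7.0, 7.5], elapsed=10.0, num_procs=4)
print(u)  # 0.8, i.e. the engines were busy 80% of the time
```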
5. Problem Size
The smallest problem size of a given application that obtains a specified parallel efficiency, which is defined as speedup divided by the number of processors.
Keeping parallel efficiency fixed as the number of processors increases in effect introduces a new scaling model that we might call efficiency-constrained scaling, and with it a performance metric: the smallest problem size needed to achieve the specified efficiency.
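The efficiency-constrained scaling idea can be sketched as follows; the timing table is entirely hypothetical:

```python
def efficiency(t1, tp, p):
    """Parallel efficiency: speedup divided by the number of processors."""
    return (t1 / tp) / p

def smallest_size_for_efficiency(timings, p, target):
    """Smallest problem size whose parallel efficiency on p processors
    meets the target; timings maps size -> (time on 1 proc, time on p)."""
    for size in sorted(timings):
        t1, tp = timings[size]
        if efficiency(t1, tp, p) >= target:
            return size
    return None  # no measured size reaches the target efficiency

# Hypothetical measurements on p = 8 processors: larger problems
# amortize overheads better, so efficiency grows with problem size.
timings = {100: (1.0, 0.5), 400: (4.0, 0.8), 1600: (16.0, 2.5)}
print(smallest_size_for_efficiency(timings, p=8, target=0.75))  # 1600
```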
6. Percentage Improvement in Performance
The improvement in performance due to an architectural feature, expressed as a percentage of the baseline performance.
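A one-function sketch of this metric (the timings are made up). Note that because performance is the reciprocal of execution time, cutting time from 8 s to 5 s is a 60% improvement, not 37.5%:

```python
def percent_improvement(perf_baseline, perf_with_feature):
    """Percentage improvement in performance due to an architectural feature."""
    return 100.0 * (perf_with_feature - perf_baseline) / perf_baseline

# Hypothetical: a feature cuts execution time from 8 s to 5 s, so
# performance (1 / execution time) rises from 0.125 to 0.2.
print(round(percent_improvement(1 / 8, 1 / 5), 6))  # 60.0
```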