Optimization Techniques
1
Session Topics
Helping The Compiler - Control flow
Dead code elimination
2
Session Objectives
To learn how to optimize C code by helping the
compiler through control flow statements
To study the different ways of eliminating dead
code
3
Helping The Compiler - Control flow
Profiling Tools 4
Faster for Loop
Ordinarily, you would code a simple for() loop like this:
for (i = 0; i < 10; i++) { ... }
i loops through the values 0,1,2,3,4,5,6,7,8,9.
If you don't care about the order of the loop counter,
you can do this instead: for (i = 10; i--; ) { ... }
Using this code, i loops through the values
9,8,7,6,5,4,3,2,1,0, and on many targets the loop is faster.
This works because it is quicker to process "i--" as the
test condition, which says "is i non-zero? If so,
continue", decrementing i as a side effect.
For the original code, the processor has to calculate
"compare i with 10. Is i smaller? If so, continue",
and then increment i as a separate step.
An optimizing compiler may already make this change itself,
so measure before relying on it.
5
Faster for Loop
The third expression in the for statement is optional (an
infinite loop would be written as "for( ; ; )").
A similar effect can be gained by
coding: for (i = 10; i; i--) { }
or (to expand it further)
for (i = 10; i != 0; i--) { }
Note that in these two forms the body sees i = 10 down to 1,
not 9 down to 0.
The only things you have to be careful of are
remembering that the loop stops at 0 (so if you
wanted to loop from 50 to 80, this wouldn't work),
and that the loop counter goes backwards. It's easy
to get caught out if your code relies on an
ascending loop counter.
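A minimal sketch of the two loop forms, assuming a hypothetical array-sum helper (the function names are illustrative and not from the slides). Both versions visit the same elements, only in opposite order. Whether the count-down form is actually faster depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Ascending loop: i takes the values 0, 1, ..., n-1. */
int sum_ascending(const int *data, int n)
{
    assert(data != NULL && n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Count-down loop: the test "i--" compares i with zero and then
 * decrements it, so the body sees n-1, n-2, ..., 0. */
int sum_descending(const int *data, int n)
{
    assert(data != NULL && n >= 0);
    int total = 0;
    for (int i = n; i--; ) {
        total += data[i];
    }
    return total;
}
```

The count-down form is only a drop-in replacement when the loop body does not depend on the order in which elements are visited, as is the case for a sum.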
6
If..else…
From an optimization and run-time point of view, it is
recommended to put the more probable
case in the if branch.
For an if/else statement inside a loop, the
more probable case is put in the else
branch.
7
If..else…
In the code fragment below, the conditional clause of the nested
if statement can be eliminated.
void f (int *p)
{
    if (p)
    {
        g(1);
        if (p) g(2);
        g(3);
    }
    return;
}
8
If..else…
In the code fragment below, the two IF
statements can be combined into one IF
statement.
9
If..else…
Below is the code fragment after the two IF statements
have been combined into one IF statement.
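The slide's own fragment is not reproduced in this text, so the following is only a hypothetical illustration of the idea. The function names and the arithmetic are assumptions, not the slide's code. Two consecutive if statements that test the same unchanged condition can share one test:

```c
#include <assert.h>
#include <stddef.h>

/* Before: two consecutive if statements test the same condition. */
void adjust_separate(int enabled, int *value)
{
    assert(value != NULL);
    if (enabled) {
        *value += 1;
    }
    if (enabled) {
        *value *= 2;
    }
}

/* After: one test guards both actions. This is valid only because the
 * first action does not modify the condition being tested. */
void adjust_combined(int enabled, int *value)
{
    assert(value != NULL);
    if (enabled) {
        *value += 1;
        *value *= 2;
    }
}
```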
10
If..else…
Below is the code fragment after the nested
conditional clause has been eliminated.
void f (int *p)
{
    if (p)
    {
        g(1);
        g(2);
        g(3);
    }
    return;
}
11
Place Case Labels In Narrow Range
If the case labels are in a narrow range, the compiler
typically does not generate an if-else-if cascade for the switch
statement
Instead it generates a jump table of case labels and
adjusts the value of the switch so that it indexes the table
The generated code is faster than the if-else-if cascade
that is generated when the case labels
are far apart
The performance of a jump-table-based switch statement
is independent of the number of case entries in the switch
statement
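A short sketch of a switch whose labels fall in a narrow range. The enum and function names are hypothetical. With consecutive values 0..3 a compiler is free to index a jump table, although for so few cases it may still choose compares, so inspect the generated code if it matters.

```c
#include <assert.h>

/* Hypothetical opcodes with consecutive values 0..3. */
enum Opcode { OP_ADD, OP_SUB, OP_MUL, OP_DIV };

int apply_op(enum Opcode op, int lhs, int rhs)
{
    switch (op) {
    case OP_ADD: return lhs + rhs;
    case OP_SUB: return lhs - rhs;
    case OP_MUL: return lhs * rhs;
    case OP_DIV:
        assert(rhs != 0);  /* division by zero is undefined */
        return lhs / rhs;
    }
    return 0;  /* not reached for valid opcodes */
}
```

Had the labels been assigned scattered values such as 1, 100, 5000 and 70000, the same switch would more likely compile to a chain of comparisons.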
12
Place Frequent Case Labels First
If the case labels are placed far apart, the compiler will
generate an if-else-if cascade that compares against
each case label and jumps to the matching action leg on
a label match
By placing the frequent case labels first, you can
reduce the number of comparisons that will be
performed for frequently occurring scenarios
Typically this means that cases corresponding to the
success of an operation should be placed before
cases of failure handling
13
Break Big Switch Statements Into Nested Switches
The previous technique does not work for some
compilers, as they do not generate the if-else-if
cascade in the order specified in the switch statement
In such cases nested switch statements can be used to get
the same effect
To reduce the number of comparisons being
performed, judiciously break big switch statements into
nested switches
Put the frequently occurring case labels into one switch
and keep the rest of the case labels in another switch,
which is the default leg of the first switch
14
Break Big Switch Statements Into Nested Switches
/* Splitting a switch statement: this switch statement performs a
   switch on frequent messages and handles the infrequent
   messages with another switch statement in the default leg of the
   outer switch statement */
pMsg = ReceiveMessage();
switch (pMsg->type)
{
    case FREQUENT_MSG1:
        handleFrequentMsg1();
        break;
    case FREQUENT_MSG2:
        handleFrequentMsg2();
        break;
    ...
    case FREQUENT_MSGn:
        handleFrequentMsgn();
        break;
    default:
        // Nested switch statement for handling infrequent messages.
        switch (pMsg->type)
        {
            case INFREQUENT_MSG1:
                handleInfrequentMsg1();
                break;
            case INFREQUENT_MSG2:
                handleInfrequentMsg2();
                break;
            ...
            case INFREQUENT_MSGm:
                handleInfrequentMsgm();
                break;
        }
}
16
Switch
Table lookups:
Each test and jump in the machine-language
implementation of a switch uses up valuable processor time simply
deciding what work should be done next
To speed things up, try to put the individual cases in order by
their relative frequency of occurrence
If there is a lot of work to be done within each case, it
might be more efficient to replace the entire switch
statement with a table of pointers to functions
enum NodeType { NodeA, NodeB, NodeC };
switch (getNodeType())
{
    case NodeA:
        ...
    case NodeB:
        ...
    case NodeC:
        ...
}
17
Switch
• The first part is the setup: the creation of an array
of function pointers
• The second part is a one-line replacement for the
switch statement that executes more efficiently
int processNodeA(void);
int processNodeB(void);
int processNodeC(void);
/* Establishment of a table of pointers to functions. */
int (*nodeFunctions[])(void) = { processNodeA, processNodeB,
                                 processNodeC };
...
/* The entire switch statement is replaced by the next line. */
status = nodeFunctions[getNodeType()]();
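The two parts above can be put together as a compilable sketch. The handler bodies, their return values, the NodeTypeCount sentinel and the dispatchNode wrapper are assumptions added for illustration. The bounds check matters because an out-of-range index into the table is undefined behavior, whereas a switch would simply fall through to no case.

```c
#include <assert.h>

enum NodeType { NodeA, NodeB, NodeC, NodeTypeCount };

/* Placeholder handlers: the return values are arbitrary. */
static int processNodeA(void) { return 10; }
static int processNodeB(void) { return 20; }
static int processNodeC(void) { return 30; }

/* Table of pointers to functions, indexed by NodeType. */
static int (*const nodeFunctions[NodeTypeCount])(void) = {
    processNodeA, processNodeB, processNodeC
};

/* Replaces the switch: one bounds check, one indexed call. */
int dispatchNode(enum NodeType type)
{
    assert((unsigned)type < (unsigned)NodeTypeCount);
    return nodeFunctions[type]();
}
```

The table order must match the enum order, so adding a node type means updating both in step.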
18
switch() Instead of if...else...
For large decisions involving if...else...else..., like this:
if (val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();
it may be faster to use a switch:
switch (val)
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}
In the if() statement, if the last case is required, all the previous ones will
be tested first. The switch can let the compiler cut out this extra work. If you have to
use a big if..else.. statement, test the most likely cases first.
19
Loop Breaking
It is often not necessary to process the entirety of a loop. For
example, if you are searching an array for a particular item, break
out of the loop as soon as you have got what you need.
Example: This loop searches a list of 10000 numbers to see if
there is a -99 in it.
found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
    }
}
if (found) printf("Yes, there is a -99. Hooray!\n");
20
Loop Breaking
This works, but it will process the entire array, no matter where
the search item occurs in it.
A better way is to abort the search as soon as you've found the
desired entry.
found = FALSE;
for (i = 0; i < 10000; i++)
{
    if (list[i] == -99)
    {
        found = TRUE;
        break;
    }
}
if (found) printf("Yes, there is a -99. Hooray!\n");
If the item is at, say, position 23, the loop will stop there and then,
and skip the remaining 9977 iterations.
21
goto Statements
As with global variables, good software
engineering practice advises against the
use of this technique
But in a pinch, goto statements can be
used to simplify complicated control
structures or to share a block of
often-repeated code
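One common case of sharing a block of repeated code is a single cleanup section reached from several failure points. The routine below is a hypothetical sketch (its name and what it does with the buffers are assumptions), not code from the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies src through two scratch buffers. Returns 0 on success and
 * -1 on allocation failure. Every exit path runs the same cleanup
 * block instead of repeating the free() calls. */
int process_with_buffers(const char *src)
{
    assert(src != NULL);
    int status = -1;
    char *first = NULL;
    char *second = NULL;
    size_t len = strlen(src) + 1;

    first = malloc(len);
    if (first == NULL)
        goto cleanup;
    second = malloc(len);
    if (second == NULL)
        goto cleanup;

    memcpy(first, src, len);
    memcpy(second, first, len);
    status = (strcmp(second, src) == 0) ? 0 : -1;

cleanup:
    free(second);  /* free(NULL) is a no-op */
    free(first);
    return status;
}
```

Without the goto, each failure check would need its own copy of the free() calls, and the copies tend to drift apart as the function grows.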
22
Example in Keil
[Keil screenshots not reproduced. Recoverable measurements: one version of the example is 10 instructions and runs in 37 microseconds; another is 14 instructions and runs in 34 microseconds.]
28
Helping The Compiler - Libraries
29
Avoid Standard Library Routines
Many of the largest library routines are expensive only because they
try to handle all possible cases
It might be possible to implement a subset of the
functionality yourself with significantly less code
For example, the standard C library's sprintf
routine is notoriously large
Much of this bulk is located within the floating-point
manipulation routines on which it depends
But if you don't need to format and display floating-
point values (%f, %e or %g), you could write your own
integer-only version of sprintf and save several
kilobytes of code space
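As a rough sketch of what such a subset could look like, the routine below formats a single signed decimal integer and nothing else. The name format_int and its interface are assumptions made for illustration, and the scratch buffer size assumes int is at most 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the decimal form of value into buf, which holds size bytes
 * including the terminating '\0'. Returns the number of characters
 * written, or -1 if the buffer is too small. */
int format_int(char *buf, size_t size, int value)
{
    assert(buf != NULL);
    char digits[12];  /* assumes int <= 32 bits: 10 digits + sign */
    size_t count = 0;
    /* Work in unsigned so that INT_MIN negates without overflow. */
    unsigned int magnitude = (value < 0) ? 0u - (unsigned int)value
                                         : (unsigned int)value;
    do {
        digits[count++] = (char)('0' + magnitude % 10u);
        magnitude /= 10u;
    } while (magnitude != 0u);
    if (value < 0)
        digits[count++] = '-';
    if (count + 1 > size)
        return -1;
    for (size_t i = 0; i < count; i++)
        buf[i] = digits[count - 1 - i];  /* digits were built in reverse */
    buf[count] = '\0';
    return (int)count;
}
```

How much code space this saves over sprintf depends on the toolchain and on whether anything else in the program still pulls in the full formatter.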
30
Misc
Declare anything at file scope (external to
functions) as static, unless it is intended to be
global.
Use word-size variables if you can, as the
machine can work with these better (instead
of char, short, double, bitfields etc.).
Compilers can often optimise a whole file:
avoid splitting closely related functions into
separate files, as the compiler will do better if it can
see both of them together (it might be able to
inline the code, for example).
31
Misc
Binary/unformatted file access is faster than formatted access, as
the machine does not have to convert between human-readable
ASCII and machine-readable binary. If you don't actually need to
read the data in a file yourself, consider making it a binary file.
If your library supports the mallopt() function (for controlling
malloc), use it. The M_MXFAST setting can make significant
improvements to code that does a lot of malloc work. If a
particular structure is created/destroyed many times a second, try
setting the mallopt options to work best with that size.
Last but definitely not least: turn compiler optimisation on!
Seems obvious, but is often forgotten in that last minute rush to
get the product out on time. The compiler will be able to optimise
at a much lower level than can be done in the source code, and
perform optimisations specific to the target processor.
32
Optimizing Compilers: Conclusions
Some processor-specific options still do
not appear to be a major factor in
producing fast code
More optimizations do not guarantee
faster code
Different algorithms are most effective
with different optimizations
Idea: use statistics gathered by a
profiler as input for the compiler/linker
33
Dead code
34
Standard Compiler Optimizations
Dead-Code Elimination
If code is definitely not going to be executed during any run of
a program, then it is called dead code and can be removed.
Example:
debug = 0;
...
if (debug) {
    print ...
}
You can help by using ASSERTs and #ifdefs to tell the
compiler about dead code
It is often difficult for the compiler to identify dead code
itself
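A small sketch of the #ifdef approach. DEBUG_TRACE, TRACE and clamp_to_byte are hypothetical names chosen for this example. When the flag is not defined, the preprocessor removes the tracing call before the compiler sees it, so no run-time test of a debug variable remains.

```c
#include <assert.h>
#include <stdio.h>

/* DEBUG_TRACE is a hypothetical build flag: define it (for example
 * with -DDEBUG_TRACE) to keep the tracing code in the build. */
#ifdef DEBUG_TRACE
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0)
#endif

int clamp_to_byte(int value)
{
    TRACE("clamp_to_byte called");
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    assert(value >= 0 && value <= 255);  /* states the invariant */
    return value;
}
```

The assert documents a condition that is always true at that point. Standard assert() compiles to nothing when NDEBUG is defined, so it costs nothing in a release build.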
35
Standard Compiler Optimizations
Common Sub-expression Elimination
Formally, “An occurrence of an expression E is called a
common sub-expression if E was previously computed, and
the values of variables in E have not changed since the
previous computation.”
You can avoid re-computing the expression if you can use the
previously computed one.
Benefit: less code to be executed
Before:
b:
    t6 = 4 * i
    x = a[t6]
    t7 = 4 * i
    t8 = 4 * j
    t9 = a[t8]
    a[t7] = t9
    t10 = 4 * j
    a[t10] = x
    goto b
After:
b:
    t6 = 4 * i
    x = a[t6]
    t8 = 4 * j
    t9 = a[t8]
    a[t6] = t9
    a[t8] = x
    goto b
36
Code Transformations
Code rewriting to improve access locality and
regularity
Data dependencies (true or false?)
Consumption time minus production time gives the
data lifetime
Single assignment improves freedom
37
Data-flow Chains Removal
To remove unnecessary storage
Data-dependencies are what matters
code can be rewritten with different internal data-flow and
control flow without changing the algorithm's result
Data-flow transformations
eliminate redundant transfers and storage
enable subsequent steps such as loop and control
transformations
Expected influence
reduce number of memory accesses from CPU and in all
memory hierarchy stages
39
Example
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*y;
}
return x;
}
41
Example
test:
        subu  $fp, 16
        sw    $zero, 0($fp)     # x = 0
        sw    $zero, 4($fp)     # y = 0
        sw    $zero, 8($fp)     # i = 0
lab1:                           # for(i=0; i<=N; i++)
        mul   $t0, $a0, 4       # a*4
        div   $t1, $t0, $a1     # a*4/b
        lw    $t2, 8($fp)       # i
        mul   $t3, $t1, $t2     # a*4/b*i
        lw    $t4, 8($fp)       # i
        addiu $t4, $t4, 1       # i+1
        lw    $t5, 8($fp)       # i
        addiu $t5, $t5, 1       # i+1
        mul   $t6, $t4, $t5     # (i+1)*(i+1)
        addu  $t7, $t3, $t6     # a*4/b*i + (i+1)*(i+1)
        lw    $t8, 0($fp)       # x
        add   $t8, $t7, $t8     # x = x + a*4/b*i + (i+1)*(i+1)
        sw    $t8, 0($fp)
        lw    $t0, 4($fp)       # y
        mul   $t1, $t0, $a1     # b*y
        lw    $t2, 0($fp)       # x
        add   $t2, $t2, $t1     # x = x + b*y
        sw    $t2, 0($fp)
        lw    $t0, 8($fp)       # i
        addiu $t0, $t0, 1       # i+1
        sw    $t0, 8($fp)
        ble   $t0, $a2, lab1    # loop while i <= N
        lw    $v0, 0($fp)
        addu  $fp, 16
        jr    $ra
42
Let's Optimize...
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*y;
}
return x;
}
43
Constant Propagation
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*y;
}
return x;
}
44
Constant Propagation
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*0;
}
return x;
}
46
Algebraic Simplification
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*0;
}
return x;
}
47
Algebraic Simplification
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + 0;
}
return x;
}
49
Algebraic Simplification
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x;
}
return x;
}
52
Copy Propagation
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x;
}
return x;
}
53
Copy Propagation
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
}
return x;
}
54
Common Sub-expression
Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
}
return x;
}
55
Common Sub-expression
Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, y, t;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
t = i+1;
x = x + (4*a/b)*i + (i+1)*(i+1);
}
return x;
}
57
Common Sub-expression
Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, y, t;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
t = i+1;
x = x + (4*a/b)*i + t*t;
}
return x;
}
58
Dead Code Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, y, t;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
t = i+1;
x = x + (4*a/b)*i + t*t;
}
return x;
}
59
Dead Code Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, y, t;
x = 0;
61
Dead Code Elimination
int sumcalc(int a, int b, int N)
{
int i;
int x, t;
x = 0;
62
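The last two listings are cut off in this text, so the following is a reconstruction under assumptions, not the slide's own final code. It places the function as first shown next to the form the steps up to dead-code elimination appear to lead to (y removed, i+1 computed once), so the two can be compared. The name sumcalc_optimized is hypothetical.

```c
#include <assert.h>

/* The function as first shown on the slides. */
int sumcalc(int a, int b, int N)
{
    assert(b != 0);
    int x = 0, y = 0;
    for (int i = 0; i <= N; i++) {
        x = x + (4*a/b)*i + (i+1)*(i+1);
        x = x + b*y;
    }
    return x;
}

/* Assumed state after constant propagation, algebraic simplification,
 * copy propagation, common sub-expression elimination and dead-code
 * elimination: y is gone and i+1 is computed once per iteration. */
int sumcalc_optimized(int a, int b, int N)
{
    assert(b != 0);
    int x = 0;
    for (int i = 0; i <= N; i++) {
        int t = i + 1;
        x = x + (4*a/b)*i + t*t;
    }
    return x;
}
```

Each transformation preserves the result, so both functions return the same value for the same arguments. The loop-invariant 4*a/b is still recomputed on every iteration at this stage.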
Summary
The ways to help the compiler through control flow
Simple design will often prevent extra branches
If..else…
Switch
Loop breaking
63
Summary
Improving program performance
Standard compiler optimizations
Common sub-expression elimination
Dead-code elimination
Induction variables
64