C-Programming-Optimization Techniques Class 3

Session-3
Optimization Techniques
Session Topics
• Helping The Compiler - Control flow
• Dead code elimination
Session Objectives
• To learn how to optimize C code by helping the compiler through control-flow statements
• To study the different ways of eliminating dead code
Helping The Compiler - Control flow
Faster for Loop
• Ordinarily, you would code a simple for() loop like this:
for( i=0; i<10; i++ ){ ... }
• i loops through the values 0,1,2,3,4,5,6,7,8,9
• If you don't care about the order of the loop counter, you can do this instead:
for( i=10; i--; ) { ... }
• Using this code, i loops through the values 9,8,7,6,5,4,3,2,1,0, and the loop may be faster, depending on the compiler and target.
• This works because "i--" as the test condition is quick to process: it says "is i non-zero? If so, decrement it and continue."
• For the original code, the processor has to calculate "subtract i from 10. Is the result non-zero? If so, increment i and continue."
Faster for Loop
• All three expressions in the for statement are optional (an infinite loop would be written as "for( ; ; )").
• A similar count-down loop can be written as: for(i=10; i; i--){}
• or (to expand it further): for(i=10; i!=0; i--){}
• Note that in these two forms i takes the values 10 down to 1 inside the loop body, not 9 down to 0.
• The only things you have to be careful of are remembering that the loop stops at 0 (so if you wanted to loop from 50 to 80, this wouldn't work), and that the loop counter goes backwards. It's easy to get caught out if your code relies on an ascending loop counter.
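The two loop forms can be compared side by side. This is a minimal sketch, assuming two hypothetical helpers (sum_ascending and sum_descending) that sum an array; both visit every element and only the order differs. Whether the count-down form is actually faster has to be measured on the target compiler.

```c
#include <stddef.h>

/* Ascending loop: i takes the values 0, 1, ..., n-1. */
long sum_ascending(const int *data, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Descending loop: i takes the values n-1, ..., 1, 0.
 * The test "i--" checks the old value of i against zero, then decrements,
 * so the body never sees i == n and the loop ends after the body ran with 0. */
long sum_descending(const int *data, size_t n)
{
    long total = 0;
    size_t i;
    for (i = n; i--; ) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same total for the same input, which is only safe to rely on because addition does not depend on the visiting order.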
If..else…
• From an optimization and run-time point of view, it is recommended to move the more probable case into the if-branch.
• For an if/else statement inside a loop, the more probable case is put into the else-branch.
If..else…
• In the code fragment below, the conditional clause of the nested if statement can be eliminated.
void f (int *p)
{
    if (p)
    {
        g(1);
        if (p) g(2);
        g(3);
    }
    return;
}
If..else…
• In the code fragment below, the two if statements can be combined into one if statement.
void f (int *p)
{
    if (p) g(1);
    if (p) g(2);
    return;
}
If..else…
• Below is the code fragment after the two if statements have been combined into one if statement.
void f (int *p)
{
    if (p)
    {
        g(1);
        g(2);
    }
    return;
}
If..else…
• Below is the code fragment after the nested conditional clause has been eliminated.
void f (int *p)
{
    if (p)
    {
        g(1);
        g(2);
        g(3);
    }
    return;
}
Place Case Labels In Narrow Range
• If the case labels are in a narrow range, the compiler does not need to generate an if-else-if cascade for the switch statement
• Instead, it can generate a jump table of case labels and use the value of the switch expression to index the table
• The generated code is faster than the if-else-if cascade that is generated when the case labels are far apart
• The performance of a jump-table-based switch statement is independent of the number of case entries in the switch statement
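A small sketch of a switch with a narrow label range, assuming a hypothetical opcode enum and an apply() function invented for illustration. The labels are consecutive (0 to 3), which is the situation in which a compiler can choose a jump table; whether it does so depends on the compiler and its options.

```c
/* Hypothetical opcodes with consecutive values 0..3: a dense range. */
enum opcode { OP_ADD = 0, OP_SUB = 1, OP_MUL = 2, OP_DIV = 3 };

int apply(enum opcode op, int a, int b)
{
    switch (op) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    case OP_DIV: return (b != 0) ? a / b : 0;  /* guard against division by zero */
    default:     return 0;                     /* unknown opcode */
    }
}
```

Had the labels been values such as 1, 100 and 10000, the same switch would more likely be compiled into a chain of comparisons.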
Place Frequent Case Labels First
• If the case labels are far apart, the compiler will generate if-else-if cascaded code that compares against each case label in turn and jumps to the action for that leg on a match
• By placing the frequent case labels first, you can reduce the number of comparisons that will be performed for frequently occurring scenarios
• Typically this means that cases corresponding to the success of an operation should be placed before cases that handle failures
Break Big Switch Statements Into Nested Switches
• The previous technique does not work for some compilers, as they do not generate the if-else-if cascade in the order specified in the switch statement
• In such cases nested switch statements can be used to get the same effect
• To reduce the number of comparisons being performed, judiciously break big switch statements into nested switches
• Put the frequently occurring case labels into one switch and keep the rest of the case labels in another switch, which forms the default leg of the first switch
Nested Switches
/* Splitting a Switch Statement: This switch statement performs a
   switch on frequent messages and handles the infrequent
   messages with another switch statement in the default leg of the
   outer switch statement */
pMsg = ReceiveMessage();
switch (pMsg->type)
{
    case FREQUENT_MSG1:
        handleFrequentMsg1();
        break;
    case FREQUENT_MSG2:
        handleFrequentMsg2();
        break;
    ...
    case FREQUENT_MSGn:
        handleFrequentMsgn();
        break;
    default:
        // Nested switch statement for handling infrequent messages.
        switch (pMsg->type)
        {
            case INFREQUENT_MSG1:
                handleInfrequentMsg1();
                break;
            case INFREQUENT_MSG2:
                handleInfrequentMsg2();
                break;
            ...
            case INFREQUENT_MSGm:
                handleInfrequentMsgm();
                break;
        }
}
Switch
• Table lookups:
• Each test and jump in the machine-language implementation of a switch statement uses up valuable processor time simply deciding what work should be done next
• To speed things up, try to put the individual cases in order by their relative frequency of occurrence
• If there is a lot of work to be done within each case, it might be more efficient to replace the entire switch statement with a table of pointers to functions
enum NodeType { NodeA, NodeB, NodeC };
switch (getNodeType())
{
    case NodeA:
        ..
    case NodeB:
        ..
    case NodeC:
        ..
}
Switch
• The first part is the setup: the creation of an array
of function pointers
• The second part is a one-line replacement for the
switch statement that executes more efficiently
int processNodeA(void);
int processNodeB(void);
int processNodeC(void);
/* Establishment of a table of pointers to functions. */
int (*nodeFunctions[])(void) = { processNodeA, processNodeB, processNodeC };
.
.
/* The entire switch statement is replaced by the next line. */
status = nodeFunctions[getNodeType()]();
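The table lookup has no equivalent of a default leg, so an out-of-range value would index past the end of the array. The sketch below adds a bounds check. It assumes placeholder handler bodies, an extra NodeTypeCount enumerator and a dispatchNode() wrapper, none of which are in the slides.

```c
enum NodeType { NodeA, NodeB, NodeC, NodeTypeCount };

/* Placeholder handlers; the return values are arbitrary for illustration. */
static int processNodeA(void) { return 10; }
static int processNodeB(void) { return 20; }
static int processNodeC(void) { return 30; }

/* The table order must match the order of the enum values. */
static int (*const nodeFunctions[NodeTypeCount])(void) = {
    processNodeA, processNodeB, processNodeC
};

/* Returns the handler's status, or -1 for an out-of-range node type. */
int dispatchNode(enum NodeType type)
{
    if ((unsigned)type >= (unsigned)NodeTypeCount) {
        return -1;
    }
    return nodeFunctions[type]();
}
```

The single comparison replaces the default leg of the switch, and the call cost stays the same no matter how many node types are added.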
switch() Instead of if...else...
• For large decisions involving if...else...else..., like this:
if( val == 1)
    dostuff1();
else if (val == 2)
    dostuff2();
else if (val == 3)
    dostuff3();
• it may be faster to use a switch:
switch( val )
{
    case 1: dostuff1(); break;
    case 2: dostuff2(); break;
    case 3: dostuff3(); break;
}
• In the if() statement, if the last case is required, all the previous ones will be tested first. The switch lets us cut out this extra work. If you have to use a big if..else.. statement, test the most likely cases first.
Loop Breaking
• It is often not necessary to process the entirety of a loop. For example, if you are searching an array for a particular item, break out of the loop as soon as you have got what you need.
• Example: This loop searches a list of 10000 numbers to see if there is a -99 in it.
found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");
Loop Breaking
• This works well, but will process the entire array, no matter where the search item occurs in it.
• A better way is to abort the search as soon as you've found the desired entry.
found = FALSE;
for(i=0; i<10000; i++)
{
    if( list[i] == -99 )
    {
        found = TRUE;
        break;
    }
}
if( found ) printf("Yes, there is a -99. Hooray!\n");
• If the item is, say, the 23rd entry, the loop will stop there and then, and skip the remaining 9977 iterations.
goto Statements
• As with global variables, good software engineering practice dictates against the use of this technique
• But in a pinch, goto statements can be used to remove complicated control structures or to share a block of oft-repeated code
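One common way to share a block of repeated code with goto is a single cleanup label. The sketch below is an illustration only: the process() function and its two buffers are hypothetical, chosen to show several failure exits jumping to one shared cleanup block.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 on failure.
 * Every exit path goes through the same cleanup block. */
int process(size_t n)
{
    int status = -1;
    char *first = NULL;
    char *second = NULL;

    if (n == 0) {
        goto cleanup;
    }
    first = malloc(n);
    if (first == NULL) {
        goto cleanup;
    }
    second = malloc(n);
    if (second == NULL) {
        goto cleanup;
    }

    memset(first, 0, n);       /* stand-in for the real work */
    memcpy(second, first, n);
    status = 0;

cleanup:
    free(second);              /* free(NULL) is a no-op */
    free(first);
    return status;
}
```

Without the label, the two free() calls would have to be repeated in each failure branch.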
Example in Keil
[Six slides of Keil screenshots; the images are not recoverable. Surviving captions: 10 instructions, 37 microseconds (two slides); 14 instructions, 34 microseconds (one slide).]
Helping The Compiler - Libraries
Avoid Standard Library Routines
• Many of the largest library routines are expensive only because they try to handle all possible cases
• It might be possible to implement a subset of the functionality yourself with significantly less code
• For example, the standard C library's sprintf routine is notoriously large
• Much of this bulk is located within the floating-point manipulation routines on which it depends
• But if you don't need to format and display floating-point values (%f, %e or %g), you could write your own integer-only version of sprintf and save several kilobytes of code space
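As an illustration of the integer-only idea, here is a minimal sketch of a decimal formatter. The name format_int and its signature are assumptions made for this example, not a standard routine, and the temporary buffer size assumes that int is at most 32 bits wide.

```c
#include <stddef.h>

/* Writes the decimal form of value into buf, which has room for cap bytes
 * including the terminating NUL. Returns the number of characters written
 * (excluding the NUL), or -1 if the buffer is too small. */
int format_int(char *buf, size_t cap, int value)
{
    char tmp[12];   /* assumption: int <= 32 bits, so sign + 10 digits fit */
    size_t len = 0;
    size_t pos = 0;
    /* Work on the magnitude as unsigned so the most negative int is safe. */
    unsigned int mag = (value < 0) ? 0u - (unsigned int)value
                                   : (unsigned int)value;

    do {            /* digits come out least significant first */
        tmp[len++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);

    if (value < 0) {
        tmp[len++] = '-';
    }
    if (len + 1 > cap) {
        return -1;
    }
    while (len > 0) {   /* reverse into the caller's buffer */
        buf[pos++] = tmp[--len];
    }
    buf[pos] = '\0';
    return (int)pos;
}
```

A routine like this pulls in no floating-point support at all, which is where the code-size saving comes from.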
Misc
• Declare anything within a file (external to functions) as static, unless it is intended to be global.
• Use word-size variables if you can, as the machine can work with these better (instead of char, short, double, bitfields etc.).
• Compilers can often optimise a whole file. Avoid splitting off closely related functions into separate files; the compiler will do better if it can see both of them together (it might be able to inline the code, for example).
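A short sketch of the first and third points, using hypothetical names (square, call_count, sum_of_squares). The helper and the counter are static, so they are private to the file, and because the helper sits in the same file as its caller the compiler is free to inline it.

```c
/* Internal helper: static gives it internal linkage, so it is invisible
 * to other files and a candidate for inlining into its caller. */
static int square(int x)
{
    return x * x;
}

/* File-private state: not reachable from other translation units. */
static unsigned long call_count = 0;

/* Public entry point. */
int sum_of_squares(int a, int b)
{
    call_count++;
    return square(a) + square(b);
}

/* Public accessor for the private counter. */
unsigned long sum_of_squares_calls(void)
{
    return call_count;
}
```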
Misc
• Binary/unformatted file access is faster than formatted access, as the machine does not have to convert between human-readable ASCII and machine-readable binary. If you don't actually need to read the data in a file yourself, consider making it a binary file.
• If your library supports the mallopt() function (for controlling malloc), use it. The M_MXFAST setting can make significant improvements to code that does a lot of malloc work. If a particular structure is created/destroyed many times a second, try setting the mallopt options to work best with that size.
• Last but definitely not least: turn compiler optimisation on! It seems obvious, but it is often forgotten in that last-minute rush to get the product out on time. The compiler will be able to optimise at a much lower level than can be done in the source code, and perform optimisations specific to the target processor.
Optimizing Compilers: Conclusions
• Some processor-specific options still do not appear to be a major factor in producing fast code
• More optimizations do not guarantee faster code
• Different algorithms are most effective with different optimizations
• Idea: use statistics gathered by the profiler as input for the compiler/linker
Dead code
Standard Compiler Optimizations
• Dead-Code Elimination
  • If code is definitely not going to be executed during any run of a program, then it is called dead code and can be removed.
  • Example:
debug = 0;
...
if (debug){
    print .....
}
  • You can help by using ASSERTs and #ifdefs to tell the compiler about dead code
  • It is often difficult for the compiler to identify dead code itself
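The #ifdef approach can be sketched as follows. DEBUG_TRACE is a hypothetical macro name and scale() a hypothetical function; when the macro is not defined, the preprocessor removes the tracing call before the compiler ever sees it, so nothing is left for the optimizer to prove dead.

```c
#include <stdio.h>

/* Define DEBUG_TRACE (for example with -DDEBUG_TRACE on the compiler
 * command line) to compile the tracing code in. */
int scale(int value, int factor)
{
    int result = value * factor;
#ifdef DEBUG_TRACE
    fprintf(stderr, "scale(%d, %d) = %d\n", value, factor, result);
#endif
    return result;
}
```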
Standard Compiler Optimizations
• Common Sub-expression Elimination
  • Formally, "An occurrence of an expression E is called a common sub-expression if E was previously computed, and the values of variables in E have not changed since the previous computation."
  • You can avoid re-computing the expression if you can use the previously computed one.
  • Benefit: less code to be executed
Before:
b:
    t6 = 4 * i
    x = a[t6]
    t7 = 4 * i
    t8 = 4 * j
    t9 = a[t8]
    a[t7] = t9
    t10 = 4 * j
    a[t10] = x
    goto b
After:
b:
    t6 = 4 * i
    x = a[t6]
    t8 = 4 * j
    t9 = a[t8]
    a[t6] = t9
    a[t8] = x
    goto b
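The same idea can be shown at source level. This is a small sketch with hypothetical function names; the arithmetic is invented for illustration and is not taken from the three-address code above.

```c
/* Before: the sub-expression (w * h) is written out twice. */
int volume_plus_base_before(int w, int h, int d)
{
    return (w * h) * d + (w * h);
}

/* After: the common sub-expression is computed once and reused. */
int volume_plus_base_after(int w, int h, int d)
{
    int base = w * h;
    return base * d + base;
}
```

An optimizing compiler will normally do this by itself; writing it by hand mainly helps when the compiler cannot prove that the operands are unchanged between the two uses.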
Code Transformations
• Code rewriting to improve access locality and regularity
• Data dependencies (true or false?)
• Consumption time minus production time gives the data lifetime
• Single assignment improves freedom
Data-flow Chains Removal
• To remove unnecessary storage
• Data dependencies are what matter
  • Code can be rewritten with different internal data flow and control flow without changing the algorithm's result
• Data-flow transformations
  • Eliminate redundant transfers and storage
  • Enable subsequent steps such as loop and control transformations
• Expected influence
  • Reduce the number of memory accesses from the CPU and in all memory hierarchy stages
Example
int sumcalc(int a, int b, int N)
{
    int i;
    int x, y;
    x = 0;
    y = 0;
    for(i = 0; i <= N; i++) {
        x = x + (4*a/b)*i + (i+1)*(i+1);
        x = x + b*y;
    }
    return x;
}
Example
test:
        subu  $fp, 16
        sw    $zero, 0($fp)      # x = 0
        sw    $zero, 4($fp)      # y = 0
        sw    $zero, 8($fp)      # i = 0
lab1:                            # for(i=0; i<=N; i++)
        mul   $t0, $a0, 4        # a*4
        div   $t1, $t0, $a1      # a*4/b
        lw    $t2, 8($fp)        # i
        mul   $t3, $t1, $t2      # a*4/b*i
        lw    $t4, 8($fp)        # i
        addiu $t4, $t4, 1        # i+1
        lw    $t5, 8($fp)        # i
        addiu $t5, $t5, 1        # i+1
        mul   $t6, $t4, $t5      # (i+1)*(i+1)
        addu  $t7, $t3, $t6      # a*4/b*i + (i+1)*(i+1)
        lw    $t8, 0($fp)        # x
        add   $t8, $t7, $t8      # x = x + a*4/b*i + (i+1)*(i+1)
        sw    $t8, 0($fp)
        lw    $t0, 4($fp)        # y
        mul   $t1, $t0, $a1      # b*y
        lw    $t2, 0($fp)        # x
        add   $t2, $t2, $t1      # x = x + b*y
        sw    $t2, 0($fp)
        lw    $t0, 8($fp)        # i
        addiu $t0, $t0, 1        # i+1
        sw    $t0, 8($fp)
        ble   $t0, $a2, lab1     # loop while i <= N
        lw    $v0, 0($fp)        # return x
        addu  $fp, 16
        jr    $ra
Let's Optimize...
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*y;
}
return x;
}
Constant Propagation
• y is assigned the constant 0 and never changes, so every use of y can be replaced by 0.
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + b*0;
}
return x;
}
Algebraic Simplification
• b*0 is always 0, so the second statement becomes x = x + 0:
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x + 0;
}
return x;
}
• x + 0 is always x, so the statement becomes x = x:
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
x = x;
}
return x;
}
Copy Propagation
• The assignment x = x copies a variable to itself and has no effect, so it can be removed.
{
int i;
int x, y;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
x = x + (4*a/b)*i + (i+1)*(i+1);
}
return x;
}
Common Sub-expression Elimination
• (i+1) is computed twice in the same expression. Compute it once into a temporary t and use t in both places.
{
int i;
int x, y, t;
x = 0;
y = 0;
for(i = 0; i <= N; i++) {
t = i+1;
x = x + (4*a/b)*i + t*t;
}
return x;
}
Dead Code Elimination
• y is assigned but never read, so the assignment y = 0 can be removed, and then the declaration of y as well.
{
int i;
int x, t;
x = 0;
for(i = 0; i <= N; i++) {

t = i+1;
x = x + (4*a/b)*i + t*t;
}
return x;
}
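To check that the steps above preserve behaviour, the original and the final version can be placed side by side. This is a sketch: the name sumcalc_opt is introduced here for the optimized version, and b is assumed to be non-zero because both versions divide by it.

```c
/* Original version from the example. */
int sumcalc(int a, int b, int N)
{
    int i;
    int x, y;
    x = 0;
    y = 0;
    for (i = 0; i <= N; i++) {
        x = x + (4*a/b)*i + (i+1)*(i+1);
        x = x + b*y;
    }
    return x;
}

/* Version after constant propagation, algebraic simplification,
 * copy propagation, common sub-expression elimination and
 * dead code elimination. */
int sumcalc_opt(int a, int b, int N)
{
    int i;
    int x, t;
    x = 0;
    for (i = 0; i <= N; i++) {
        t = i + 1;
        x = x + (4*a/b)*i + t*t;
    }
    return x;
}
```

Both functions should return the same value for any inputs with b non-zero, since each step only removed work that could not affect x.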
Summary
• The ways to help the compiler through control flow
  • Simple design will often prevent extra branches
  • Fewer branches lead to more effective branch prediction
  • Faster for loop
  • If..else…
  • Switch
  • Loop breaking
Summary
• Improving program performance
• Standard compiler optimizations
  • Common sub-expression elimination
  • Dead-code elimination
  • Induction variables
• Aggressive compiler optimizations
  • In-lining of functions
  • Loop unrolling
• Architectural code optimizations
