
The lesser known GCC optimizations Mark Veltzer mark@veltzer.


Using profile information for optimization

The CPU pipeline

Most CPUs today are built using pipelines. This enables the CPU to run at a higher clock rate. It also enables the CPU to use more of its infrastructure at the same time and so utilize the hardware better. It also speeds things up because the stages of several instructions are processed in parallel. This is even more so in multi-core CPUs.

The problems with pipelines

The pipeline idea is heavily tied to the idea of branch prediction. If the CPU has not yet finished certain instructions and sees a branch following them, it needs to be able to guess where the branch will go. Otherwise the pipeline idea itself becomes problematic, because the parallel execution of instructions at the CPU level would be halted every time there is a branch.

Hardware prediction

The hardware itself does a rudimentary form of prediction. The assumption of the hardware is that what happened before will happen again, in the same order. This is OK for most code, and the hardware does a good job even without assistance. It also means that a program whose branches behave randomly will make the hardware mispredict and will cause execution speed to go down. Example (a coin-flip condition): "if(random()<0.5) {"
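The difference between a predictable and an unpredictable branch can be sketched with one loop run over two kinds of input. This is a minimal illustration, not a benchmark: the function names and the value range are assumptions made for the example. An optimizing compiler may also replace the branch with branch-free code, in which case the effect disappears.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts the elements below `threshold`. The `if` inside the loop is the
 * branch that the hardware predictor has to guess on every iteration. */
size_t count_below(const int *data, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] < threshold)
            count++;
    }
    return count;
}

/* Fills `data` with pseudo-random values in [0, 100). With a threshold
 * of 50 the branch above has no pattern the hardware could learn. */
void fill_random(int *data, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        data[i] = rand() % 100;
}

/* Fills `data` with ascending values in [0, 100). The branch above is
 * then taken for one long run and not taken for the rest, which matches
 * the "what happened before will happen again" assumption. */
void fill_sorted(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (int)(i * 100 / n);
}
```

Timing count_below() on a large array filled by each helper is one way to observe the cost of misprediction on a given machine.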

Software prediction

The hardware manufacturers allow the software layer to tell the hardware which way a branch is likely to go using hints left inside the assembly. Special instructions were created for this by the hardware manufacturers. The compiler decides whether to use hinted branches or non-hinted ones. It will use the hinted ones only when it can guess where the branch will go. For instance, in a loop the branch will tend to go back to the start of the loop.

The problem of branching

When the compiler sees a branch that is not part of a loop, such as "if(condition)", it does not know the chances that the condition will evaluate to true. Therefore it will usually use a hint-less branch instruction, unless you tell it otherwise. There are two ways to tell the compiler which way the branch will go.

First way - explicit hinting in the software

You can use the __builtin_expect construct to hint at the likely path. Instead of writing "if(x) {" you write "if(__builtin_expect((x),1)) {". The second argument is the value the expression is expected to have, so x should evaluate to exactly 0 or 1. You can wrap this in a nice macro like the Linux kernel folk did with likely() and unlikely(). The compiler will plant a hint to the CPU telling it that the branch is likely to succeed. See example.
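A minimal sketch of such macros follows. The macro definitions have the same shape as the Linux kernel's likely() and unlikely(); the function sum_non_negative() is a hypothetical example invented here to show where a hint might go.

```c
#include <stddef.h>

/* !!(x) normalizes any nonzero value to 1 so that it can be compared
 * with the expected value given as the second argument. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical example: sums a buffer and treats a negative entry as a
 * rare error. Returns -1 as soon as a negative entry is found. */
long sum_non_negative(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(values[i] < 0))
            return -1;          /* rare error path */
        total += values[i];     /* common path */
    }
    return total;
}
```

The hint only pays off when it matches reality. If negative entries turn out to be common, the hint works against the program, which is one argument for the profile-based approach.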

Second way - using profile information

You leave your code alone. You compile the code with -fprofile-arcs. Then you run it on a typical scenario, which creates files recording which way the branches went (auxname.gcda for each source file). Then you compile your code again with -fbranch-probabilities, which uses the gathered data to plant the right hints. In GCC you must compile both phases using otherwise identical flags (otherwise expect problems). See example.
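The three steps can be sketched as follows, with the build commands kept in the source comment. The file names branch.c, main.c and app are placeholders, and a small main.c driver that feeds typical input to the function is assumed. The function itself is a hypothetical example whose branch frequencies depend entirely on the input.

```c
/*
 * Build steps (all file names are placeholders):
 *
 *   gcc -O2 -fprofile-arcs branch.c main.c -o app         # instrumented build
 *   ./app                                                  # typical run, writes .gcda files
 *   gcc -O2 -fbranch-probabilities branch.c main.c -o app  # rebuild using the profile
 */
#include <stddef.h>

/* Counts the lines in `text` that are longer than `limit` characters.
 * Whether long lines are common or rare is a property of the input,
 * which is exactly what the profiling run records. */
size_t count_long_lines(const char *text, size_t limit)
{
    size_t long_lines = 0;
    size_t current = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            if (current > limit)
                long_lines++;
            current = 0;
        } else {
            current++;
        }
    }
    if (current > limit)    /* last line without a trailing newline */
        long_lines++;
    return long_lines;
}
```

The profiling run should resemble real usage; a profile gathered on unusual input would steer the compiler toward the wrong path.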

Second way - PGO

This whole approach is a subset of a bigger concept called PGO (Profile-Guided Optimization). This includes the branch prediction we saw before but also other types of optimization (reordering of switch cases as an example). This is why you should use the more general flags -fprofile-generate and -fprofile-use, which imply the previous flags and add even more profile-guided optimizations.
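A switch statement whose case frequencies are only known at run time is one place where the more general flags may help. The sketch below is a hypothetical miniature interpreter invented for illustration; the file names in the comment are placeholders and a main.c driver that runs a typical program is assumed.

```c
/*
 * Build steps (all file names are placeholders):
 *
 *   gcc -O2 -fprofile-generate opcodes.c main.c -o app  # instrumented build
 *   ./app                                                # typical run, writes .gcda files
 *   gcc -O2 -fprofile-use opcodes.c main.c -o app        # rebuild using the profile
 */
#include <stddef.h>

enum opcode { OP_NOP, OP_ADD, OP_SUB, OP_HALT };

/* Applies one operation to the accumulator. How often each case is hit
 * is unknown at compile time without a profile. */
int apply(enum opcode op, int acc, int operand)
{
    switch (op) {
    case OP_ADD:  return acc + operand;
    case OP_SUB:  return acc - operand;
    case OP_NOP:  return acc;
    case OP_HALT: return acc;
    }
    return acc;
}

/* Runs up to `n` operations, stopping early at OP_HALT, and returns
 * the final accumulator value. */
int run(const enum opcode *ops, const int *operands, size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_HALT)
            break;
        acc = apply(ops[i], acc, operands[i]);
    }
    return acc;
}
```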