GLOBAL VALUE NUMBERING WITH
STATIC SINGLE ASSIGNMENT FORM
A THESIS
Submitted by
AKHILA C M
In partial fulfilment for the award of the degree of
MASTER OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
Under the guidance of
Dr. SALEENA N
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

NATIONAL INSTITUTE OF TECHNOLOGY CALICUT
NIT CAMPUS PO, CALICUT
KERALA, INDIA 673601
June 01 2016
ACKNOWLEDGEMENTS
First and foremost, I thank God Almighty for allowing me to complete the
project successfully, and for all the blessings he showered on me during the course
of the project.
It is with immense gratitude that I acknowledge my guide Dr. Saleena N for
introducing me to the topic and for the motivation and support she has given me
throughout in connection with the project.
I am deeply grateful to Dr. Abdul Nazeer K A, Head of the Department, for
allowing me to use all the facilities of the Department, which I required for this
project.
I would like to thank the members of my evaluation panel, Dr. Vineeth Paleri,
Ms. Nadia T T, Ms. Sreeja M, and Mr. Aswin Jacob, for the thoughts and advice
that helped me improve the outcome of the project.
Last but not least, I would also like to thank everyone, especially my family
and my friends, without whose moral and intellectual support this work would
not have been completed.
Akhila C M
DECLARATION
“I hereby declare that this submission is my own work and that, to the best of my
knowledge and belief, it contains no material previously published or written by
another person nor material which has been accepted for the award of any other
degree or diploma of the university or other institute of higher learning, except
where due acknowledgment has been made in the text”.
Place: Signature :
Date: Name :
Reg.No:
CERTIFICATE
This is to certify that the thesis entitled “GLOBAL VALUE NUMBERING WITH
STATIC SINGLE ASSIGNMENT FORM”, submitted by Sri/Smt/Ms AKHILA C M to
National Institute of Technology Calicut towards partial fulfilment of the
requirements for the award of the Degree of Master of Technology in Computer
Science and Engineering, is a bona fide record of the work carried out by
him/her under my/our supervision and guidance.
Signed by Thesis Supervisor(s) with name(s) and date
Place:
Date:
Signature of Head of the Department
Office Seal
Contents
1 Problem Definition
2 Introduction
2.1 Simple Algorithm for Global Value Numbering
2.2 Static Single Assignment Form
2.3 Need for SSA Version of GVN Algorithm
3 Literature Survey
4 Modified Algorithm for SSA
5 Implementation
5.1 LLVM
5.2 Data Structures
5.3 Transfer Function
5.4 Confluence
6 Results
7 Conclusions
Bibliography
Abstract
Global Value Numbering (GVN) is a method to detect equivalent expressions
in a program. This project aims to detect the equivalent expressions in the SSA
form of a program using global value numbering. For this, we extend the Simple
Algorithm for Global Value Numbering [Saleena and Paleri, 2014] to SSA form.
We make use of the φ-functions present at the join points to compute the confluence
operation. The algorithm is implemented using the LLVM compiler infrastructure.
We measured the number of redundancies detected by the algorithm for the SPEC
CPU2006 benchmark programs and compared it with the number of lexical
redundancies for the same benchmark programs.

Tables
6.1 Number of redundancies detected

Figures
2.1 Expression pool computed by GVN algorithm for non-SSA code
2.2 Non-SSA code
2.3 SSA code
2.4 Expression pool computed by GVN algorithm for SSA code
4.1 Computing confluence
5.1 Computing confluence : Implementation

Chapter 1
Problem Definition
The problem is to extend the GVN algorithm by Saleena and Paleri[7] to
Static Single Assignment (SSA) form and to implement the extended algorithm
using LLVM.
SSA form is an intermediate representation in which every variable in a
program has only one definition. SSA makes data flow analyses and optimizations
simpler and more effective[6], so many compilers represent their intermediate
code in SSA form. We therefore expect an implementation based on SSA code to
be useful. If an intermediate representation in SSA form is given as input to
the GVN algorithm by Saleena and Paleri[7], it misses some of the redundancies.
Hence, we aim to extend the algorithm to SSA form so that it can detect those
missed redundancies.
Chapter 2
Introduction
Global Value Numbering (GVN) is a method used to detect equivalent expressions
in programs. “Two expressions are said to be equivalent if we can statically
determine that both the expressions will have the same value during execution”[7].
If we can identify that two expressions in a program are equivalent, then we can
replace the second expression by the value of the first. GVN can be used to
eliminate redundant code, including redundancies that cannot be eliminated by
Common Subexpression Elimination.
2.1 Simple Algorithm for Global Value Numbering
Value numbering within a basic block, also known as Local Value Numbering,
assigns a value number to every expression in the block such that equivalent
expressions receive the same number. Global Value Numbering extends this idea
to the whole program. In the GVN algorithm by Saleena and Paleri[7], a value
expression is used to represent a set of equivalent expressions. A value
expression is an expression in which the operands are value numbers[7]. For each
expression of the form x op y, we obtain a value expression by replacing the
operands by their corresponding value numbers. The set of expressions that are
equivalent at a program point is represented by an expression pool. Each class
in the expression pool contains expressions that are equivalent to one another.
In this way, a single value expression compactly represents a whole set of
expressions.
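As a rough illustration of value expressions, the following Python sketch represents an expression pool as a dictionary from value numbers to sets of equivalent members. The representation and the helper names `value_number_of` and `value_exp` are assumptions made for this example; they are not the data structures of the thesis implementation.

```python
# Illustrative sketch only; not the data structures used in the thesis.
# An expression pool is a dict: value number -> set of equivalent members.
# An operator expression is a tuple (op, left, right).

def value_number_of(pool, term):
    """Return the value number of the class containing `term`, else None."""
    for vn, members in pool.items():
        if term in members:
            return vn
    return None

def value_exp(pool, expr):
    """Replace the operands of (op, x, y) by their value numbers.

    Any other expression (a variable or constant) is returned unchanged.
    Assumes both operands already belong to some class of the pool.
    """
    if isinstance(expr, tuple):
        op, x, y = expr
        return (op, value_number_of(pool, x), value_number_of(pool, y))
    return expr

# Pool at the output of node (1) in Figure 2.1: {[v1, x, 1], [v2, y, 2]}
pool = {"v1": {"x", "1"}, "v2": {"y", "2"}}
z_value_exp = value_exp(pool, ("+", "x", "y"))
print(z_value_exp)
```

With this pool, the expression x + y is represented by the value expression v1 + v2, which is the form stored for z in node (3) of Figure 2.1.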
Node (1)
  EIN : ∅
  x = 1
  y = 2
  EOUT: {[v1, x, 1], [v2, y, 2]}
Node (2), successor of (1)
  EIN : {[v1, x, 1], [v2, y, 2]}
  x = 3
  z = x + y
  EOUT: {[v1, 1], [v2, y, 2], [v3, x, 3], [v4, z, v3 + v2]}
Node (3), successor of (1)
  EIN : {[v1, x, 1], [v2, y, 2]}
  z = x + y
  EOUT: {[v1, x, 1], [v2, y, 2], [v5, z, v1 + v2]}
Node (4), join of (2) and (3)
  EIN : {[v1, 1], [v2, y, 2], [v6, x], [v7, z, v6 + v2]}
  p = x + y
  EOUT: {[v1, 1], [v2, y, 2], [v6, x], [v7, z, v6 + v2, p]}
Figure 2.1: Expression pool computed by GVN algorithm for non-SSA code
“The points at which multiple control flow paths join are called confluence
points”[7]. Suppose that, at a confluence point, the value expression from the
left branch is vi1 + vi2 and that from the right branch is vj1 + vj2. The GVN
algorithm by Saleena and Paleri[7] then checks whether the classes of vi1 and
vj1 have a common element, and likewise the classes of vi2 and vj2. If both
pairs have common elements, a new value expression is generated to represent
the common expression reaching the confluence point.
Figure 2.1 shows the expression pool computed by the GVN algorithm at the
input and output points of each basic block in a sample non-SSA code. The
expression x+y at node (4) can be identified as redundant because the value
expression corresponding to x+y, i.e., v6 + v2, is available at the input point
of the node.
2.2 Static Single Assignment Form
“Static Single Assignment form is an intermediate representation in which
each variable has only one definition in the program text”[6]. In addition,
every definition of a variable dominates its uses. “In translating to SSA form,
the standard mechanism is to subscript each of the variables”[6]. We also have
to insert φ-functions at a join point if a variable is assigned in one of the
branches. If the definitions xi and xj reach the join point from two different
branches, then xk = φ(xi, xj) is introduced at the join.
Figure 2.2 shows the control flow graph (CFG) of a sample non-SSA code.
The corresponding SSA code is shown in Figure 2.3. In the non-SSA code, there
is more than one definition of each of the variables x and z. While converting
the code into SSA form, when the first definition of x is seen, we replace it
by the variable x1 and replace all uses of x by x1 until the next definition of
x is seen. When the second definition of x is seen, we replace it by x2. The
variable z is treated similarly. At the join point of node (2) and node (3) in
Figure 2.3, the definitions of both x1 and x2 reach. In this case, we introduce
a pseudo-assignment, x3 = φ(x2, x1), at the join point. Similarly, since the
definitions of z1 and z2 reach the join point, we introduce z3 = φ(z1, z2) at
that point.
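Node (1)
  x = 1
  y = 2
Node (2), successor of (1)
  x = 3
  z = x + y
Node (3), successor of (1)
  z = x + y
Node (4), join of (2) and (3)
  p = x + y
Figure 2.2: Non-SSA code
Node (1)
  x1 = 1
  y1 = 2
Node (2), successor of (1)
  x2 = 3
  z1 = x2 + y1
Node (3), successor of (1)
  z2 = x1 + y1
Node (4), join of (2) and (3)
  x3 = φ(x2, x1)
  z3 = φ(z1, z2)
  p1 = x3 + y1
Figure 2.3: SSA code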
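The subscripting step can be sketched for a single straight-line path as follows. This is a simplified Python illustration that handles only one path and does not insert φ-functions; the function name `rename_straight_line` and the (target, operands) encoding of assignments are assumptions made for this example.

```python
def rename_straight_line(assignments):
    """Subscript variables in a straight-line list of (target, operands)."""
    version = {}      # variable name -> latest subscript on this path
    renamed = []
    for target, operands in assignments:
        # A use refers to the most recent definition. Names that have no
        # definition on this path (such as constants) are left unchanged.
        new_ops = [f"{v}{version[v]}" if v in version else v for v in operands]
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}{version[target]}", new_ops))
    return renamed

# Path (1) -> (2) of Figure 2.2: x = 1; y = 2; x = 3; z = x + y
path = [("x", ["1"]), ("y", ["2"]), ("x", ["3"]), ("z", ["x", "y"])]
ssa_path = rename_straight_line(path)
for target, operands in ssa_path:
    print(target, "=", " + ".join(operands))
```

For this path the sketch yields x1, y1, x2 and z1 = x2 + y1, matching nodes (1) and (2) of Figure 2.3. Numbering definitions consistently across several paths and placing φ-functions at join points require the full SSA construction algorithm.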

2.3 Need for SSA Version of GVN Algorithm
If SSA code is given as input to the GVN algorithm by Saleena and Paleri[7],
it fails to detect some of the redundancies that it detects when the
corresponding non-SSA code is given as input. This is because the algorithm
does not handle the φ-functions in SSA code.
Node (1)
  EIN : ∅
  x1 = 1
  y1 = 2
  EOUT: {[v1, x1, 1], [v2, y1, 2]}
Node (2), successor of (1)
  EIN : {[v1, x1, 1], [v2, y1, 2]}
  x2 = 3
  z1 = x2 + y1
  EOUT: {[v1, x1, 1], [v2, y1, 2], [v3, x2, 3], [v4, z1, v3 + v2]}
Node (3), successor of (1)
  EIN : {[v1, x1, 1], [v2, y1, 2]}
  z2 = x1 + y1
  EOUT: {[v1, x1, 1], [v2, y1, 2], [v5, z2, v1 + v2]}
Node (4), join of (2) and (3)
  x3 = φ(x2, x1)
  z3 = φ(z1, z2)
  EIN : {[v1, x1, 1], [v2, y1, 2]}
  p1 = x3 + y1
  EOUT: {[v1, 1], [v2, y1, 2], [v6, x3], [v7, v6 + v2, p1]}
Figure 2.4: Expression pool computed by GVN algorithm for SSA code
Figure 2.4 shows the expression pool computed by the GVN algorithm at the
input and output points of each basic block in the SSA form of the code given
in Figure 2.1. The expression x3 + y1 in node (4) is actually redundant, but
this redundancy is not detected by the GVN algorithm. In Figure 2.1, we saw
that the expression x+y in node (4) is identified as redundant when non-SSA
code is given as input. When SSA code is given as input, however, the GVN
algorithm fails to detect the redundancy of the corresponding expression in
node (4). This project aims to extend the GVN algorithm by Saleena and
Paleri[7] to work with SSA form, so that it detects all the redundancies in SSA
code that are detected by the GVN algorithm when the corresponding non-SSA
code is given as input. For this, we make use of the φ-functions present at the
join points to compute the confluence operation.

Chapter 3
Literature Survey
The fundamentals of compiler optimization are given by Lam, Sethi, Ullman
and Aho[5]. Data flow analysis refers to a body of techniques that derive
information about the flow of data along program execution paths. The
foundations of data flow analysis and the iterative solution to a general data
flow framework are explained in [5].
A simple algorithm for detecting equivalent expressions in a program using
Global Value Numbering is given by Saleena and Paleri[7]. The basic idea of the
algorithm is to use the concept of a value expression, which is a compact
representation of a set of expressions.
SSA form and the concept of φ-functions are explained in [4]. SSA conversion
algorithms are given by Muchnick[6] and Appel[4]. Appel uses the concept of
dominance frontiers to calculate a minimal set of φ-functions. “The dominance
frontier of a node x is the set of all nodes w such that x dominates a
predecessor of w, but does not strictly dominate w”[4]. According to Appel,
when a definition of some variable x is seen in a node n, a φ-function for x
is to be inserted at every node in the dominance frontier of n.
An SSA-based algorithm to detect equality of variables in a program is given
by Alpern, Wegman and Zadeck[3]. They use an auxiliary structure called the
value graph, which represents the symbolic execution of the program. “This
approach is conservative in that any variables detected to be equivalent will
in fact be equivalent, but not all equivalences are detected”[3].
Details on the LLVM compiler framework are available at [2], which explains
the intermediate representation of a program in LLVM. The steps to write and
execute an LLVM pass are also available at [2].

Chapter 4
Modified Algorithm for SSA
As noted in the Introduction, the GVN algorithm by Saleena and Paleri[7]
misses some of the redundancies when SSA code is given as input. Hence, we have
modified the algorithm to work with SSA code. The modification assumes that the
information about the φ-functions is available at the confluence points in a
program.
Algorithm 1: EOUTn = fn(EINn), for a node n containing the assignment xi = e
  Et = EINn;
  if (xj is in a class Cxj ∈ Et) then
    remove xj from Cxj;
  end
  e′ = valueExp(e);
  if (e′ is in a class Ce′ ∈ Et) then
    add xi to Ce′;
  else
    create a new class [vk, xi, e′] and add it to Et;
    /* vk is a new value number */
  end
  EOUTn = Et;
  return EOUTn;
Algorithm 1 shows the transfer function for a node n containing the assignment
xi = e, where xi is a variable and e is an expression. “The function valueExp(e)
returns the value expression of e, if e is of the form x op y, and returns e itself
otherwise”[7]. When an assignment xi = e is seen, xj (another version of the
same variable that is present in the pool) is killed, i.e., removed from its
class. Omitting this kill step does not affect the correctness of the
algorithm.
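The transfer function can be sketched in Python as follows. This is an illustration under stated assumptions, not the thesis implementation: the pool is a dictionary from value numbers to sets, operator expressions are tuples, the `killed` parameter is a hypothetical way of naming xj, and fresh value numbers are allocated locally, so their labels may differ from those in the figures.

```python
def class_of(pool, term):
    """Value number of the class containing `term`, or None."""
    for vn, members in pool.items():
        if term in members:
            return vn
    return None

def value_exp(pool, expr):
    """valueExp(e): operands of (op, x, y) replaced by value numbers."""
    if isinstance(expr, tuple):
        op, x, y = expr
        return (op, class_of(pool, x), class_of(pool, y))
    return expr

def transfer(ein, target, expr, killed=None):
    """EOUT for the assignment `target = expr`, following Algorithm 1.

    `killed` is a hypothetical parameter naming the version xj to remove;
    the thesis notes that skipping this step does not affect correctness.
    """
    et = {vn: set(members) for vn, members in ein.items()}   # Et = EIN
    if killed is not None:
        vn = class_of(et, killed)
        if vn is not None:
            et[vn].discard(killed)
    e = value_exp(et, expr)
    vn = class_of(et, e)
    if vn is not None:
        et[vn].add(target)                    # e' already has a class
    else:
        # Fresh number by counting classes: a simplification of this sketch,
        # valid because classes are named v1..vn and never deleted here.
        et[f"v{len(et) + 1}"] = {target, e}
    return et

ein = {"v1": {"x1", "1"}, "v2": {"y1", "2"}}
eout = transfer(ein, "z2", ("+", "x1", "y1"))    # z2 = x1 + y1
eout2 = transfer(eout, "w1", ("+", "x1", "y1"))  # hypothetical second use
```

In this run the first assignment creates a new class for v1 + v2, and the hypothetical second assignment w1 = x1 + y1 joins that class, which is how a redundancy shows up in the pool.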
Algorithm 2: Computing Ci ⊓ Cj
  /* A class with value number vn is denoted by Cn and vice-versa */
  Ck = Ci ∩ Cj;
  foreach pair of variables xp ∈ Ci − Ck and xq ∈ Cj − Ck do
    var1 = PhiMap(xp);
    var2 = PhiMap(xq);
    if (var1 = var2 and var1 ≠ NULL) then
      add var1 to Ck;
    end
  end
  if (Ci and Cj have different value expressions) then
    /* let vi1 + vi2 and vj1 + vj2 be the value expressions in Ci and
       Cj respectively */
    Ck1 = Ci1 ⊓ Cj1;
    Ck2 = Ci2 ⊓ Cj2;
    if (Ck1 ≠ ∅ and Ck2 ≠ ∅) then
      add the value expression vk1 + vk2 to Ck;
    end
  end
  if (Ck ≠ ∅ and Ck does not have a value number) then
    add a new value number, say vk, to Ck;
  end
  return Ck;
Algorithm 2 is the modified algorithm for computing Ci ⊓ Cj. It computes the
expressions that are equivalent in both Ci and Cj, using the φ-functions at the
confluence point, if any are present. In the algorithm, a class with value
number vn is denoted by Cn and vice-versa. It is assumed that the information
about the φ-functions at a confluence point is available while computing the
confluence of expression pools at that point. If there is a pseudo-assignment
of the form xr = φ(xp, xq), both PhiMap(xp) and PhiMap(xq) return xr. For a
variable that is not an argument of any φ-function at the confluence point,
PhiMap returns NULL. Compared with the computation of Ci ⊓ Cj in the original
GVN algorithm[7], the modified algorithm has one additional step: apart from
adding the common expressions in Ci and Cj to a new class Ck, it adds a
variable xr to Ck if xr = φ(xp, xq) is present at the confluence point with
xp ∈ Ci and xq ∈ Cj.
E1 (from the left branch) : {[v1, x1, 5], [v2, y1, b], [v3, v1 + v2]}
E2 (from the right branch): {[v4, x2, p], [v5, y2, q], [v6, v4 + v5]}
At the confluence point:
  x3 = φ(x1, x2)
  y3 = φ(y1, y2)
E3: {[v7, x3], [v8, y3], [v9, v7 + v8]}
Figure 4.1: Computing confluence
In Figure 4.1, the expression pool from the left branch is E1 and that from
the right branch is E2. When C1 ⊓ C4 is computed, since x3 = φ(x1, x2) is
present at the confluence point and the classes C1 and C4 contain x1 and x2
respectively, x3 is added to a new class, which is assigned the value number
v7. The resulting class is then added to the expression pool E3. Similarly,
when C2 ⊓ C5 is computed, y3 is added to a new class with value number v8, and
the class C8 is added to E3. When C3 ⊓ C6 is computed, since C1 ⊓ C4 and
C2 ⊓ C5 are non-empty and the value numbers of their resulting classes are v7
and v8 respectively, we add v7 + v8 to another class with a new value number
v9 (Cn is a class with value number vn). Then, C9 is also added to E3.
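The confluence of Figure 4.1 can be reproduced with a short Python sketch. It is an illustration under assumptions and not the thesis code: pools are dictionaries from value numbers to sets, value expressions are tuples (op, left, right), `phi_map` is a plain dictionary in which a missing key plays the role of NULL, results are cached per pair of value numbers, and fresh value numbers continue after the largest number used in the two input pools.

```python
import itertools

def confluence(e1, e2, phi_map):
    """Class-wise confluence of two pools (value number -> set of members)."""
    used = [int(vn[1:]) for vn in list(e1) + list(e2)]
    fresh = itertools.count(max(used, default=0) + 1)
    memo = {}      # (vi, vj) -> value number of the result, or None if empty
    result = {}

    def intersect(vi, vj):
        if (vi, vj) in memo:
            return memo[(vi, vj)]
        ci, cj = e1[vi], e2[vj]
        ck = ci & cj
        only_i, only_j = ci - ck, cj - ck
        # Additional SSA step: xp in Ci, xq in Cj, and xr = phi(xp, xq).
        for xp in only_i:
            for xq in only_j:
                merged = phi_map.get(xp)
                if merged is not None and merged == phi_map.get(xq):
                    ck.add(merged)
        # Value expressions with the same operator: recurse on the operands.
        for ei in (m for m in only_i if isinstance(m, tuple)):
            for ej in (m for m in only_j if isinstance(m, tuple)):
                if ei[0] == ej[0]:
                    k1 = intersect(ei[1], ej[1])
                    k2 = intersect(ei[2], ej[2])
                    if k1 is not None and k2 is not None:
                        ck.add((ei[0], k1, k2))
        if ck:
            vk = vi if vi == vj else f"v{next(fresh)}"
            result[vk] = ck
        else:
            vk = None
        memo[(vi, vj)] = vk
        return vk

    for vi in e1:
        for vj in e2:
            intersect(vi, vj)
    return result

e1 = {"v1": {"x1", "5"}, "v2": {"y1", "b"}, "v3": {("+", "v1", "v2")}}
e2 = {"v4": {"x2", "p"}, "v5": {"y2", "q"}, "v6": {("+", "v4", "v5")}}
phi = {"x1": "x3", "x2": "x3", "y1": "y3", "y2": "y3"}
e3 = confluence(e1, e2, phi)
```

With the φ information the result contains the classes [v7, x3], [v8, y3] and [v9, v7 + v8], as in Figure 4.1. With an empty `phi_map` the result is empty, which corresponds to the redundancy missed by the unmodified algorithm on SSA code.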
Chapter 5
Implementation
The modified algorithm is implemented in LLVM[2]. The implementation considers
only integer arithmetic instructions. The IR code in SSA form is given as the
input to the pass, which computes the expression pool at the input and output
points of each instruction. The number of redundancies in each function is also
computed.
5.1 LLVM
The LLVM compiler infrastructure project is a collection of modular and
reusable compiler and toolchain technologies[2]. It has a well-defined
intermediate representation based on SSA. LLVM is written in C++. To write a
pass in LLVM, the runOnFunction method of FunctionPass is overridden. We use
the opt command to run an LLVM program through the pass[2]. To convert the IR
of a program to SSA form, the mem2reg pass is used. This pass promotes memory
references to register references; it is just the standard SSA construction
algorithm, used to construct pruned SSA form[2]. Using the command line option
-time-passes, we can obtain the running time of our pass.

5.2 Data Structures
This section describes the data structures used in our implementation.
• ExPool : map<string, string>
An expression pool is represented by a string-to-string map. It maps
a variable, constant or value expression to the corresponding value number.
For each instruction, two instances of this map, namely EIN and EOUT,
are created to store the expression pool at the input and output points of
the instruction respectively. Expressions mapped to the same value number
are considered equivalent. For example, the expression pool
{[v1, x1, 5], [v2, y1, b], [v3, v1 + v2]} will be stored as
x1 → v1
5 → v1
y1 → v2
b → v2
v1 + v2 → v3
In this example, since y1 and b have the same value number, they are
members of the equivalence class with value number v2.
• map<Instruction*, ExPool>
It maps an Instruction pointer to an expression pool. Two instances of
this map are created. The first maps each instruction pointer to the
corresponding EIN, i.e., the expression pool at the input point of the
instruction. The second maps each instruction pointer to the corresponding
EOUT, i.e., the expression pool at the output point of the instruction.
• BB_ExPool : map<string, set<string>>
At the input and output points of each basic block, a map is created from
each value number to the set of expressions that correspond to that value
number. It represents the expression pool at the input or output point of a
basic block. This map is created for convenience in computing confluence.
The expression pool {[v1, x1, 5], [v2, y1, b], [v3, v1 + v2]} will be stored
in this map as
v1 → {x1, 5}
v2 → {y1, b}
v3 → {v1 + v2}
• map<BasicBlock*, BB_ExPool>
This maps a basic block pointer to the expression pool of the basic block.
Two instances of this map are created: one stores the expression pool at the
input point of each basic block, and the other stores the expression pool at
the output point of each basic block. Together, these two instances contain
EIN and EOUT of every basic block.
• PHIMAP : map<string, string>
This data structure stores the information about the φ-functions and is used
when computing the confluence operation. In the first iteration of the
algorithm, this map is created to store the φ-functions of each basic block.
Suppose there are two φ-functions, say x3 = φ(x1, x2) and y3 = φ(y1, y2), at
the confluence point. Then, they will be stored in PHIMAP as
x1 → x3
x2 → x3
y1 → y3
y2 → y3
• map<BasicBlock*, PHIMAP>
This maps a basic block pointer to the PHIMAP of the basic block, and so
contains the details of the φ-functions in each basic block. It avoids
recomputing PHIMAP in every iteration.
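The relationship between the two pool representations above can be illustrated with a small Python sketch, in which a dict and a set stand in for the C++ map and set. The function name `to_bb_expool` is an assumption made for this example; the thesis does not describe how the conversion is coded.

```python
from collections import defaultdict

def to_bb_expool(expool):
    """Invert expression -> value number into value number -> expressions."""
    bb = defaultdict(set)
    for expr, vn in expool.items():
        bb[vn].add(expr)
    return dict(bb)

# The ExPool form of {[v1, x1, 5], [v2, y1, b], [v3, v1 + v2]}
expool = {"x1": "v1", "5": "v1", "y1": "v2", "b": "v2", "v1+v2": "v3"}
bb_expool = to_bb_expool(expool)
for vn, members in bb_expool.items():
    print(vn, "->", sorted(members))
```

The per-instruction form makes it cheap to look up the value number of an expression, while the per-block form makes it cheap to fetch a whole class by its value number, which is what the confluence operation needs.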
5.3 Transfer Function
Using the data structures described above, the Transfer Function of the
modified GVN algorithm is implemented in LLVM. Algorithm 1 is invoked for each
instruction in the IR of the program. Suppose an arithmetic instruction, say
z = x op y, where op is an arithmetic operator, is seen. The Transfer Function
takes EIN as input and returns EOUT. EIN and EOUT are string-to-string maps
that map an expression to its corresponding value number; they represent the
expression pool at the input and output points of the instruction respectively.
First, the Transfer Function initializes EOUT with EIN and checks whether x
is already present in EOUT using the map's built-in function find(x). If x is
not present, x is inserted into EOUT with a value number. If it is the first
iteration, a new value number is generated; otherwise, the value number that x
had in the previous iteration is reused. The operand y is handled similarly.
The expression pool is implemented as a class, and its member function
valueExpression(x, op, y) is invoked to obtain the value expression of x op y
as a string. The Transfer Function then checks whether this value expression
is present in EOUT. If it is present, z is inserted into EOUT with the same
value number as that of the value expression. If it is not present, the value
expression of x op y is inserted into EOUT with a value number assigned in the
same manner as for x, and z is inserted into EOUT with the same value number.
Finally, EOUT is returned.
At every function call (Call instruction), the expression pool is reset to
empty. For any instruction other than an arithmetic instruction or a Call
instruction, the Transfer Function returns EIN itself as EOUT¹. After calling
the Transfer Function, the instruction pointer is mapped to EIN in one map and
to EOUT in another map. In LLVM, some optimizations are performed when the IR
is converted into SSA code. As a result, variables are represented in the SSA
code in such a way that we cannot identify the xj corresponding to the
statement xi = e in Algorithm 1. Hence, the KILL operation could not be
implemented in our LLVM pass; as noted in Chapter 4, omitting it does not
affect correctness. Sample intermediate code of a straight-line program in SSA
form and its output are given below.
Example
• EIN : ∅
%add = add nsw i32 2, 1
EOUT : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3 }
• EIN : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3 }
%mul = mul nsw i32 %add, %add
EOUT : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3, v3 ∗ v3 → v4, mul → v4 }
• EIN : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3, v3 ∗ v3 → v4, mul → v4 }
%add1 = add nsw i32 2, 1
EOUT : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3, v3 ∗ v3 → v4, mul → v4,
add1 → v3 }
• EIN : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3, v3 ∗ v3 → v4, mul → v4,
add1 → v3 }
%mul2 = mul nsw i32 %add1, %add1
EOUT : { 2 → v1, 1 → v2, add → v3, v1 + v2 → v3, v3 ∗ v3 → v4, mul → v4,
add1 → v3, mul2 → v4 }
¹ In the converted SSA code, we have observed that there are no load and store instructions.
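The example can be replayed with a Python sketch of the string-map transfer function. It is a simplified illustration, not the LLVM pass: it covers only the first iteration (the reuse of value numbers from a previous iteration is omitted), a dict stands in for the C++ map, and the `OPS` table and the tuple encoding of instructions are assumptions made for this example.

```python
import itertools

OPS = {"add": "+", "sub": "-", "mul": "*"}   # assumed opcode-to-symbol table

def transfer(ein, dest, opcode, x, y, fresh):
    """Return EOUT for `dest = opcode x, y`; pools map expression -> number."""
    eout = dict(ein)                          # initialize EOUT with EIN
    for operand in (x, y):
        if operand not in eout:               # operand seen for the first time
            eout[operand] = f"v{next(fresh)}"
    vexp = f"{eout[x]}{OPS[opcode]}{eout[y]}" # value expression as a string
    if vexp not in eout:
        eout[vexp] = f"v{next(fresh)}"
    eout[dest] = eout[vexp]                   # dest shares the value number
    return eout

fresh = itertools.count(1)
pool = {}
# (destination, opcode, operand, operand) for the four instructions above
program = [("add", "add", "2", "1"), ("mul", "mul", "add", "add"),
           ("add1", "add", "2", "1"), ("mul2", "mul", "add1", "add1")]
for dest, opcode, x, y in program:
    pool = transfer(pool, dest, opcode, x, y, fresh)
print(pool)
```

After the last instruction, add1 shares v3 with add and mul2 shares v4 with mul, which is how the two redundant instructions appear in the final EOUT.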
5.4 Confluence
A confluence point is a program point at which two or more control flow
paths join; i.e., if a basic block has more than one predecessor, then the
input point of the basic block is a confluence point. To compute EIN at a
confluence point, the Confluence() function is called with the output
expression pools of the predecessors as arguments. The expression pools are
passed as BB_ExPool data structures. The Confluence() function takes two
arguments. If there are more than two predecessors, the output of Confluence()
for the first two predecessors is passed to Confluence() again together with
the output expression pool of the third predecessor, and so on. That is, if a
basic block has n predecessors, the Confluence() function is called n − 1
times.
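The pairwise folding over predecessors can be sketched in Python with `functools.reduce`. The binary function used here, a plain set intersection that records its calls, is only a stand-in for Confluence() and does not implement Algorithm 2; the names `confluence_all` and `plain_intersection` are assumptions made for this example.

```python
from functools import reduce

def confluence_all(pred_pools, confluence):
    """Combine the output pools of n predecessors with n - 1 pairwise calls."""
    return reduce(confluence, pred_pools)

calls = []   # records every pairwise call, to count them

def plain_intersection(a, b):
    """Stand-in for Confluence(): plain set intersection."""
    calls.append((a, b))
    return a & b

preds = [{"x", "y", "z"}, {"x", "y"}, {"y", "z"}]
ein = confluence_all(preds, plain_intersection)
print(ein, len(calls))
```

For three predecessors the binary function is invoked twice. With a single predecessor `reduce` returns that pool without calling the function at all.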
We make use of the φ-functions at the confluence point to compute EIN of
a basic block with more than one predecessor. The information about the
φ-functions of a basic block is therefore stored in the data structure PHIMAP.
In the first iteration of the algorithm, for each basic block, the basic block
pointer and its corresponding PHIMAP are stored in a map. From the second
iteration onwards, the stored information is used for computing EIN. The
φ-functions of a basic block are present at the beginning of the basic block.
When a PHI instruction of the form xr = φ(x1, x2, ..., xn) is seen, we insert
x1 → xr, x2 → xr, ..., xn → xr and xr → xr into PHIMAP. A function PhiMap(xi)
is implemented, which searches for the key xi in PHIMAP and returns the value
corresponding to xi.
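The construction of PHIMAP and the lookup can be sketched in Python as follows. The (result, arguments) encoding of a PHI instruction and the names `build_phi_map` and `phi_lookup` are assumptions made for this example; `None` plays the role of NULL.

```python
def build_phi_map(phi_instructions):
    """Build PHIMAP from a list of (result, [incoming names]) pairs."""
    phi_map = {}
    for result, incoming in phi_instructions:
        for name in incoming:
            phi_map[name] = result     # each argument maps to the phi result
        phi_map[result] = result       # the result maps to itself
    return phi_map

def phi_lookup(phi_map, name):
    """PhiMap(name): the phi result for `name`, or None if there is none."""
    return phi_map.get(name)

phi_map = build_phi_map([("x3", ["x1", "x2"]), ("y3", ["y1", "y2"])])
print(phi_lookup(phi_map, "x1"), phi_lookup(phi_map, "q"))
```

One consequence of a string-to-string map, at least in this sketch, is that a name used as an argument of two φ-functions in the same block keeps only the entry inserted last.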
In the Confluence() function, a class-wise intersection of the two expression
pools passed as arguments is computed. The expression pools are stored in the
data structure BB_ExPool. For each pair of classes, one from each expression
pool, the Intersect() function (Algorithm 2) is invoked. The value numbers of
the two classes are given as input to the function, and it returns the value
number of the resultant class. If the resultant class is empty, it returns
NULL. Intersect() first checks whether it has already been invoked for the
same pair of value numbers. If so, the value number of the previously computed
resultant class is returned. Otherwise, it fetches from BB_ExPool the sets
corresponding to the two input value numbers, takes the intersection of these
two sets, and inserts the result into a new set, say S.
After that, each pair of distinct expressions, one from each set, is taken and
the PhiMap() function is called on them. Suppose one of the sets contains xp
and the other contains xq. When PhiMap(xp) and PhiMap(xq) are called, both
return xr if the pseudo-assignment xr = φ(xp, xq) is present in the basic
block. So, if the variables returned by PhiMap(xp) and PhiMap(xq) are the same,
that variable (here, xr) is added to the set S. Then, Intersect() checks for
value expressions of the form vi op vj in the sets. If both sets contain such
a value expression, it checks whether the operators of the two value
expressions are the same. If they are, Intersect() is called recursively.
Suppose v1 op v2 is present in one set and v3 op v4 in the other; then
Intersect(v1, v3) and Intersect(v2, v4) are invoked. If they return v5 and v6
respectively, then v5 op v6 is added to the set S. If either of them returns
NULL, the two value expressions have no common expression, so nothing is
added to S.
Finally, a value number is assigned to the set S. If the value numbers passed
as arguments to Intersect() are the same, that value number is assigned to the
newly created set S. If the value numbers are different and it is not the
first iteration, Intersect() checks whether any of the elements of S was
present in the input expression pool of the basic block in the previous
iteration. If so, the value number that element had is assigned to S.
Otherwise, a new value number is assigned. The set S, along with its value
number, is then inserted into the input expression pool of the basic block.
E1 (from the left branch):
  v1 → {x1, 5}
  v2 → {y1, b}
  v3 → {v1 + v2}
E2 (from the right branch):
  v4 → {x2, p}
  v5 → {y2, q}
  v6 → {v4 + v5}
At the confluence point:
  x3 = φ(x1, x2)
  y3 = φ(y1, y2)
E3:
  v7 → {x3}
  v8 → {y3}
  v9 → {v7 + v8}
Figure 5.1: Computing confluence : Implementation
This is an iterative algorithm. The iteration terminates when the output
expression pool of every instruction in an iteration is the same as the
corresponding output expression pool in the previous iteration, i.e., when the
EOUTs of two consecutive iterations are the same. So, the pass takes a minimum
of two iterations to terminate. The computation of confluence in Figure 4.1 is
implemented as in Figure 5.1, where each expression pool is represented by a
string-to-set map.
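The termination condition can be illustrated with a generic fixed-point loop in Python. The toy transfer and confluence functions below (set union with a per-block `gen` set, and set intersection) are stand-ins chosen to keep the example short and are not the GVN functions of this thesis. Skipping predecessors whose output is not yet computed is also an assumption of this sketch.

```python
def iterate_to_fixed_point(blocks, preds, transfer, confluence):
    """Repeat passes over `blocks` until no block's EOUT changes."""
    eout = {b: None for b in blocks}          # None: not yet computed
    iterations = 0
    while True:
        iterations += 1
        changed = False
        for b in blocks:
            ready = [eout[p] for p in preds[b] if eout[p] is not None]
            ein = confluence(ready) if ready else frozenset()
            new_out = transfer(b, ein)
            if new_out != eout[b]:
                eout[b] = new_out
                changed = True
        if not changed:                       # same EOUT in two passes
            return eout, iterations

# Toy diamond-shaped CFG: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
gen = {1: {"a"}, 2: {"b"}, 3: {"b", "c"}, 4: set()}
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}

def toy_transfer(block, ein):
    return frozenset(ein | gen[block])

def toy_confluence(pools):
    return frozenset.intersection(*pools)

eout, iterations = iterate_to_fixed_point(
    [1, 2, 3, 4], preds, toy_transfer, toy_confluence)
print(iterations, sorted(eout[4]))
```

On this loop-free graph the first pass computes every EOUT and the second pass only confirms that nothing changed, so the loop stops after two iterations, the minimum mentioned above.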

Chapter 6
Results
The extended GVN algorithm for SSA form is implemented in LLVM. The pass is
run on the SPEC CPU2006[1] benchmarks and the number of redundancies is
computed. Since the IR of the benchmarks is initially in non-SSA form, it is
converted into SSA form by running the mem2reg pass and then given as input to
our GVN pass. When the test cases are executed, it is observed that most of
the straight-line programs terminate after two iterations and those with loops
terminate after three iterations.
Since the GVN algorithm detects value-based redundancies, we compare it
with a lexical algorithm. A lexical algorithm detects redundancies only among
lexically identical expressions. We have written a pass that uses a lexical
algorithm to compute the number of redundancies in a program, and ran it on the
SSA form of the SPEC CPU2006 benchmarks. A comparison of the number of
redundancies detected by the SSA version of the GVN algorithm and the number
of lexical redundancies is shown in Table 6.1. The GVN algorithm computes
value-based equivalences in addition to lexical equivalences. So, the GVN
algorithm detects at least as many redundancies as the lexical algorithm.
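The difference between the two kinds of redundancy can be illustrated on a small straight-line program. The Python sketch below is not either of the passes used for the measurements: it handles only straight-line code, and the functions `count_lexical` and `count_value_based` and the tuple encoding of instructions are assumptions made for this example.

```python
import itertools

def count_lexical(program):
    """Count instructions whose opcode and operand names were seen before."""
    seen, count = set(), 0
    for _dest, opcode, x, y in program:
        key = (opcode, x, y)
        if key in seen:
            count += 1
        seen.add(key)
    return count

def count_value_based(program):
    """Count instructions whose value expression is already in the pool."""
    fresh = itertools.count(1)
    pool, count = {}, 0
    for dest, opcode, x, y in program:
        for operand in (x, y):
            if operand not in pool:
                pool[operand] = f"v{next(fresh)}"
        vexp = (opcode, pool[x], pool[y])     # operands as value numbers
        if vexp in pool:
            count += 1                        # same value computed earlier
        else:
            pool[vexp] = f"v{next(fresh)}"
        pool[dest] = pool[vexp]
    return count

# add = 2 + 1; mul = add * add; add1 = 2 + 1; mul2 = add1 * add1
program = [("add", "add", "2", "1"), ("mul", "mul", "add", "add"),
           ("add1", "add", "2", "1"), ("mul2", "mul", "add1", "add1")]
lexical = count_lexical(program)
value_based = count_value_based(program)
print(lexical, value_based)
```

Here the second `add` is redundant under both views, while `mul2` is redundant only in the value-based view, because its operand `add1` has a different name from `add` but the same value number.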
SPECint 2006    GVN    Lexical
astar            22       22
bzip2            44       38
gcc              69       64
gromacs         260      215
h264ref        1287      971
hmmer           360      348
lbm             794      340
mcf              35       29
povray          191      169
sjeng           139      139
soplex           13       10
sphinx           27       25
Table 6.1: Number of redundancies detected
The mem2reg pass performs optimizations such as constant propagation and
copy propagation while converting the IR into SSA form. So, many of the
expressions in the IR become lexically equivalent. That is why the two
algorithms detect the same number of redundancies for two benchmarks in Table
6.1 (astar and sjeng). For all other benchmarks in Table 6.1, the GVN algorithm
detects more redundancies than the lexical algorithm.
Chapter 7
Conclusions
The GVN algorithm is modified to work with SSA code. We make use of the
φ-functions at the join points to compute the confluence operation. The
modified algorithm is implemented in LLVM. The pass is run on the SPEC CPU2006
benchmark programs in SSA form and the number of redundancies is measured. The
result is then compared with the number of lexical redundancies for the same
input programs. It is observed that the SSA version of the GVN algorithm
detects more redundancies than the lexical algorithm for most of the
benchmark programs, and the same number for the rest.

Bibliography
[1] Standard Performance Evaluation Corporation.
https://www.spec.org/cpu2006/. Last accessed on 16-05-2016.
[2] The LLVM Compiler Infrastructure. http://llvm.org/. Last accessed on
16-05-2016.
[3] Bowen Alpern, Mark N. Wegman, and F. Kenneth Zadeck. Detecting equality
of variables in programs. In Proceedings of the 15th ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, pages 1–11. ACM, 1988.
[4] Andrew W. Appel and Jens Palsberg. Modern Compiler Implementation in Java.
2002.
[5] Monica Lam, Ravi Sethi, J. D. Ullman, and A. V. Aho. Compilers: Principles,
Techniques and Tools. 2006.
[6] Steven S. Muchnick. Advanced Compiler Design and Implementation. Morgan
Kaufmann, 1997.
[7] Nabizath Saleena and Vineeth Paleri. Global value numbering for redundancy
detection: a simple and efficient algorithm. In Proceedings of the 29th Annual
ACM Symposium on Applied Computing, pages 1609–1611. ACM, 2014.
