Thesis
by
Copromotor:
Dr. G. N. Wells
Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. ir. L. J. Sluys, Technische Universiteit Delft, promotor
Dr. G. N. Wells, University of Cambridge, copromotor
Dr. ir. M. B. van Gijzen, Technische Universiteit Delft
Prof. dr. P. H. J. Kelly, Imperial College London
Prof. dr. R. Larsson, Chalmers University of Technology
Prof. dr. L. R. Scott, University of Chicago
Prof. dr. ir. C. Vuik, Technische Universiteit Delft
Prof. dr. A. Scarpas, Technische Universiteit Delft, reserve member
This thesis represents the formal end of my long and interesting journey as a
PhD student. The sum of many experiences over the past years has increased my
knowledge and contributed to my personal development. All these experiences
originate from the interaction with many people to whom I would like to express
my gratitude.
I am most grateful to Garth Wells for giving me the opportunity to come to
Delft and to study under his competent supervision. His constructive criticism
and vision combined with our nice discussions greatly improved the quality of
my research. As the head of the computational mechanics group, Bert Sluys has
played a vital role by creating a very nice and supportive working environment
where people enjoy a lot of creative freedom. As creativity is key in this research I
consider myself lucky to have been part of Bert’s group.
Ronnie Pedersen did a very good job in persuading me to come to Delft for
a PhD, and I am happy that he managed to convince me. I am also grateful for
enjoying his friendship throughout the years, the good times on the football pitch,
and the even better times in ’t Proeflokaal watching football and discussing work
and life in general.
A friendly and inspiring working environment is important in order to produce
quality work. Therefore, I would like to thank past and present colleagues Rafid Al-
Khoury, Roberta Bellodi, Frank Custers, Frank Everdij, Huan He, Cecilia Iacono, Cor
Kasbergen, Oriol Lloberas-Valls, Prithvi Mandapalli, Frans van der Meer, Andrei
Metrikine, Peter Moonen, Dung Nguyen, Vinh Phu Nguyen, Mehdi Nikbakth,
Marjon van der Perk, Frank Radtke, Zahid Shabir, Xuming Shan, Angelo Simone,
Mojtaba Talebian, Andy Terrel, Ilse Vegt, Jaap Weerheijm, Sigurd Blöndal, Lars
Damkilde, Niels Dollerup, Jens Hagelskjær, Michael Jepsen, Sven Krabbenhøft
and Søren Lambertsen. In particular, I would like to thank Frans for the years
that we shared the same office and for translating the propositions into Dutch. A
special thanks goes to Mehdi, my ‘brother-in-arms’, the only person remaining
in the group who was also involved with the FEniCS Project after Garth left for
Cambridge and Xuming left for home.
The research presented in this thesis is centered around the FEniCS Project and,
therefore, I would also like to thank all the people in the FEniCS community, in
particular my close collaborators from Simula Anders Logg, Martin Alnæs, Marie
Rognes and Johan Hake for all the nice discussions, debugging assistance and
good ideas. During my PhD, I also had the pleasure of visiting the University of
Michigan and in this regard I want to thank Krishna Garikipati, Jake Ostien and
his wife Erin for their hospitality during my stay in Ann Arbor.
Outside the office, I enjoyed many hours in the good company of my friends
Linda Grimstrup and Lars Freising which definitely improved the quality of my
social life a lot. I also want to thank all my former team mates at Vitesse Delft for
the many memorable hours on the football pitch trying to learn the secrets behind
‘totaalvoetbal’. Although The Netherlands and Denmark are quite similar in terms
of weather, nature and culture it was always nice to receive visitors from home. For
this, I would like to thank my friends Kenneth Guldager, Henrik Hansen, Mads
Madsen, Christian Meyer, Nick Nørreby and Thomas Sørensen.
Last, but certainly not least, I want to thank my parents and my brother and
sisters for their encouragement, support, help and visits during my years in Delft. I
also wish to thank both of my sons for putting things in perspective which helped
me to focus during the last iterations towards finishing this thesis. Of all people, I
am most grateful to my wife. I know her patience has been tested to the limit, yet
she remained supportive, loving and caring during all the years. For this, and for
our sons, I am forever indebted.
The research presented in this thesis was carried out at the Faculty of Civil
Engineering and Geosciences at Delft University of Technology. The research
was supported by the Netherlands Technology Foundation STW, the Netherlands
Organisation for Scientific Research and the Ministry of Public Works and Water
Management.
1 Introduction
1.1 Research objectives and approach
1.2 Outline
1.3 The FEniCS Project
1.3.1 Simple model problem
1.3.2 Unified Form Language
1.3.3 FEniCS Form Compiler
1.3.4 Unified Form-assembly Code
1.3.5 DOLFIN
References
Summary
Samenvatting (summary in Dutch)
Propositions
Stellingen (propositions in Dutch)
Since the advent of the modern programmable computer in the 1940s, the cost
of computing power relative to manpower has decreased significantly. As a con-
sequence, high-level programming languages have emerged allowing the imple-
mentation of programs in source code using abstractions that are independent of
the specific computer architectures on which the program is intended to run. A
compiler is then invoked to translate the source code into machine code targeted for
the given computer’s central processing unit (CPU). This development has allowed,
among other things, researchers and scientists to write programs for investigating
and solving various classes of problems numerically.
In engineering, physical phenomena are often described mathematically by
partial differential equations (PDEs), and a commonly used method to solve these
equations is the finite element method (FEM). Standard finite element software
typically provides a problem-solving environment for a set of engineering problems
using a predefined selection of finite elements. As part of the application
programming interface (API), a user can often supply subroutines which implement special
methods, for instance, the constitutive model in the case of a solid mechanics problem.
This offers a degree of customisation and flexibility in terms of implementing
certain models, but the approach may fall short as the complexity of a model
increases.
Strain gradient plasticity is an example of a class of models which can be
difficult to implement in traditional finite element software and researchers often
resort to implementing their own unique solver targeting a specific model. An
implementation involves translating the abstract mathematical representation of
the model into source code which can be handled by a compiler, a process which
can be tedious, time consuming and error prone. However, by introducing a higher
level of abstraction, the burden of this process can be alleviated when it comes
to implementing mathematical representations of the FEM for solving PDEs. A
possible abstraction consists of a form language for expressing the mathematical
formulation of the given problem, and compilers which automatically generate
efficient source code from the given mathematical expressions. This thesis is
centered around this type of automated mathematical modelling.
2 Chapter 1. Introduction
As will be demonstrated in this work, addressing the above three issues has
had a significant impact on the range of problems which can be handled in the
FEniCS framework, thereby making life easier for researchers and application
developers.
A complex application from solid mechanics in the form of a strain gradient
plasticity model is considered, as an example, to demonstrate the extensions to the
FEniCS framework developed in this work. Strain gradient models are often used
to provide regularisation in softening problems and to account for observed size
effects at small length scales. An abundance of strain gradient models has been
proposed in the literature, including the models by Aifantis (1984), Gurtin (2004), Fleck
and Hutchinson (1997), Fleck and Hutchinson (2001) and Gao et al. (1999), to name
a few. The focus in this work is on the class of models involving gradients of fields
such as the equivalent plastic strain. An example of such a model is that proposed
by Aifantis (1984) which involves the addition of the Laplacian of the equivalent
plastic strain to the classical yield condition. A feature of this particular model
is that the classical consistency condition leads to a partial differential equation
rather than an algebraic equation, as is the case in classical flow theory of plasticity.
The partial differential equation is only active in the region undergoing plastic
deformations which introduces the difficulty of imposing non-standard boundary
conditions on the secondary field on the evolving boundary.
Motivated by the work of Wells et al. (2004) and Molari et al. (2006) who used
a discontinuous Galerkin formulation for a strain gradient-dependent damage
model, a discontinuous basis can be used to interpolate the secondary field. This
provides a natural framework for handling evolving elastic–plastic boundaries
and provides local (cell-wise) satisfaction of the yield condition. To satisfy the
regularity requirement of the secondary field, a discontinuous Galerkin formulation
is used to enforce weak continuity across cell facets. In order to allow the use
of a discontinuous constant basis for the secondary field, a so-called lifting-type
discontinuous Galerkin formulation, proposed by Bassi and Rebay (1997, 2002), is
adopted. A discontinuous constant basis is the natural choice for the secondary
field when a linear continuous basis is used for the displacement field. Considering
that the formulation involves an additional field variable it is also computationally
more efficient if discontinuous constant elements can be used for this particular
field.
1.2 Outline
The rest of this chapter contains an overview of the FEniCS Project including details
on the components pertinent to the present work. Chapter 2 continues with a
demonstration of how to use the FEniCS toolchain for solid mechanics applications.
The purpose of this demonstration is twofold. Firstly, it serves as an introduction to
the concepts of automated modelling from a solid mechanics point of view, which
will give an understanding of how the automated modelling approach can be
utilised to also tackle more complex problems. Secondly, the presented models and
applications will be used in subsequent chapters, either by extending the models or
by using them as a platform for discussing the development of FEniCS components
in connection to the work presented in this thesis.
Local finite element tensors can be evaluated using different representations of
the tensors. In Chapter 3 the two representations that FFC adopts, the quadrature
representation and the tensor contraction representation, are presented and comparisons
are made between the two. Furthermore, optimisation strategies
for the quadrature representation are discussed and their performance is
investigated.
Chapter 4 introduces the extensions implemented in the FEniCS framework
to allow a class of discontinuous Galerkin (DG) formulations to be handled in an
automated fashion. Building on these abstractions, a semi-automated approach
to implementing lifting-type DG formulations is presented in Chapter 5. This
chapter also contains a brief comparison, in terms of complexity regarding the
implementation and the numerical implications, between a lifting-type formulation
and an interior penalty (IP) DG formulation for the Poisson equation.
In Chapter 6 the extensions, developed in the previous chapters, to the FEniCS
1.3. The FEniCS Project 5
net/fenics-plasticity) which focussed solely on plasticity problems. However, to reflect that the
scope of the library has increased to also include more general solid mechanics problems the name was
changed during a recent migration from Launchpad to Bitbucket.
4 Version 1.0 of the FEniCS Solid Mechanics library can be downloaded from
https://bitbucket.org/fenics-apps/fenics-solid-mechanics.
Figure 1.1: FEniCS toolchain for solving a PDE using the FEM.
Facilities for each of these steps are implemented in separate software components
in FEniCS. The relationship between input and output of each component in the
FEniCS toolchain for the finite element procedure is shown in Figure 1.1. In short,
the variational form of the PDE is expressed in the Unified Form Language (UFL)
(Alnæs et al., 2013; Alnæs, 2012), which is given as input to the FEniCS Form
Compiler (FFC)5 (Kirby and Logg, 2006, 2007; Logg et al., 2012c; Ølgaard et al.,
2008a) that automatically generates efficient C++ code for evaluating the local
element tensors. The output from FFC is compliant with the interface defined
in Unified Form-assembly Code (UFC) (Alnæs et al., 2009, 2012) and is used by
DOLFIN (Logg and Wells, 2010; Logg et al., 2012d), which is the finite element
assembler and solver of FEniCS although, in principle, any assembly library which
supports UFC can be used.
The key advantage of this modular construction is that it becomes more trans-
parent where and how new features and functionality should be implemented.
Furthermore, developers and users can pick individual components to form their
own applications. In this work for instance, the UFL is augmented with discontinu-
ous Galerkin operators6 , compiler optimisations are implemented in FFC, while
more complex solvers for lifting-type formulations and solid mechanics problems
can be implemented on top of the FEniCS toolbox.
As a model boundary value problem for presenting the FEniCS framework consider
the Poisson equation, which for a body Ω ⊂ ℝ^d, where 1 ≤ d ≤ 3, with boundary
5 Any compiler that supports UFL as input, and outputs UFC code, can be used instead of FFC in the
described toolchain. The Symbolic Form Compiler (Alnæs and Mardal, 2010, 2012), which is also part of
FEniCS, is one such example.
6 Historically, the DG operators were implemented in the original form language of FFC which was
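\begin{align}
-\Delta u &= f \quad \text{in } \Omega, \notag \\
u &= g \quad \text{on } \Gamma_D, \tag{1.1} \\
\nabla u \cdot n &= h \quad \text{on } \Gamma_N. \notag
\end{align}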
where V is the trial space and V̂ is the test space, a (u, v) and L (v) denote the
bilinear and linear forms, respectively. A typical variational form7 of (1.1) defines
the bilinear and linear forms as:
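\begin{align}
a(u, v) &:= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x \tag{1.3} \\
L(v) &:= \int_{\Omega} f v \, \mathrm{d}x + \int_{\Gamma_N} h v \, \mathrm{d}s, \tag{1.4}
\end{align}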
Thus, after transforming the strong form of the problem into the variational coun-
terpart, the FEniCS toolchain, starting with UFL, can be invoked to compute a
solution.
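For readers who want the intermediate step, the forms in (1.3) and (1.4) follow from (1.1) by the standard argument sketched here. The sketch assumes that the test function v vanishes on Γ_D and that the boundary is Γ = Γ_D ∪ Γ_N; these are the usual assumptions and are not restated in the text above.

```latex
\begin{align*}
-\int_{\Omega} \Delta u \, v \, \mathrm{d}x
  &= \int_{\Omega} f v \, \mathrm{d}x
  && \text{(multiply by } v \text{ and integrate over } \Omega\text{)} \\
\int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x
  - \int_{\Gamma} (\nabla u \cdot n) \, v \, \mathrm{d}s
  &= \int_{\Omega} f v \, \mathrm{d}x
  && \text{(integrate by parts)} \\
\int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x
  &= \int_{\Omega} f v \, \mathrm{d}x + \int_{\Gamma_N} h v \, \mathrm{d}s
  && (v = 0 \text{ on } \Gamma_D,\ \nabla u \cdot n = h \text{ on } \Gamma_N)
\end{align*}
```

The left-hand side of the last line is a(u, v) and the right-hand side is L(v).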
UFL code
element = FiniteElement("Lagrange", triangle, 1)
u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element)
h = Coefficient(element)
a = inner(grad(u), grad(v))*dx
L = f*v*dx + h*v*ds
Figure 1.2: UFL code for the Poisson problem using continuous-piecewise linear
Lagrange polynomials on triangles.
In order to compute a solution to the variational problem using the FEM, it is neces-
sary to discretise the formulation. The Unified Form Language (UFL) (Alnæs et al.,
2013; Alnæs, 2012) enables a user to express the discretisation compactly using a
notation which closely resembles the mathematical notation. UFL is implemented
as a domain-specific embedded language (DSEL) in Python which, among other
things, allows users to define custom operators using all features of the Python
programming language when writing UFL code. This section presents the most
basic features used throughout this work, while some of the more advanced
functionality is presented in subsequent chapters as needed. For a detailed description
of the language, refer to Alnæs et al. (2013).
The Poisson problem in (1.7) can be expressed in UFL by the code shown in
Figure 1.2. The first line in the code defines the local finite element basis that spans
the discrete function space Vh on an element T ∈ Th where Th denotes the standard
triangulation of Ω. Generally, finite elements are defined in UFL by their family,
cell and degree:
UFL code
element = FiniteElement(family, cell, degree)
which in the given case, in Figure 1.2, means that the basis is a piecewise continuous
linear Lagrange triangle. UFL contains a set of predefined finite element family
names, for instance, "Lagrange" as already shown, "Discontinuous Lagrange"
(short name "DG") and "Brezzi-Douglas-Marini" (short name "BDM"). The cell
argument denotes the polygonal shape of the finite element while the degree
argument denotes the degree of the polynomial space. Although the valid cell shapes in
UFL are interval, triangle, tetrahedron, quadrilateral and hexahedron, FFC
only supports the first three cell shapes at present. Also note that the permitted
Table 1.1: (Left) Table of tensor algebraic operators. (Right) Table of differential
operators.
values of cell and degree depend on the choice of finite element family. It is
important to realise that UFL is only concerned with the abstract operations related
to the finite element function spaces; it is left to the form compiler to support the
element families, that is, to generate meaningful code for the representation of
elements and forms. For mixed finite element methods, product spaces like
V = [V_2 × V_2] × V_1 (1.8)
can be defined in UFL in two equivalent ways:
UFL code
V_2 = FiniteElement("Lagrange", triangle, 2)
V_1 = FiniteElement("Lagrange", triangle, 1)
V = (V_2*V_2)*V_1
W = MixedElement(MixedElement((V_2, V_2)), V_1)
meaning that V and W are identical. To create a mixed element in which all the
component spaces are identical, the VectorElement can be used:
UFL code
V = VectorElement(family, cell, degree, dim=None)
where dim defaults to the dimension of the given cell unless explicitly specified.
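To make the size of such product spaces concrete, the following plain-Python sketch counts the local basis functions of Lagrange elements on simplices and of mixed and vector elements built from them. It does not use UFL; the helper names `lagrange_dim`, `mixed_dim` and `vector_dim` are hypothetical and exist only for this illustration. It relies on the standard result that a degree-q Lagrange element on a d-dimensional simplex has C(q + d, d) basis functions.

```python
from math import comb

# Topological dimension of the simplicial cells that FFC supports.
CELL_DIMENSION = {"interval": 1, "triangle": 2, "tetrahedron": 3}


def lagrange_dim(cell, degree):
    """Number of basis functions of a degree-q Lagrange element on a simplex."""
    d = CELL_DIMENSION[cell]
    return comb(degree + d, d)


def mixed_dim(*sub_dims):
    """A mixed element concatenates the bases of its sub-elements."""
    return sum(sub_dims)


def vector_dim(cell, degree, dim=None):
    """Vector element: `dim` copies of the scalar element (default: cell dimension)."""
    if dim is None:
        dim = CELL_DIMENSION[cell]
    return dim * lagrange_dim(cell, degree)


V_2 = lagrange_dim("triangle", 2)        # 6 basis functions
V_1 = lagrange_dim("triangle", 1)        # 3 basis functions
V = mixed_dim(mixed_dim(V_2, V_2), V_1)  # [V_2 x V_2] x V_1
print(V)  # 15
```

Under these assumptions the space in (1.8) has 6 + 6 + 3 = 15 local basis functions on each triangle, regardless of how the nesting is written.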
After defining the local finite element basis, the trial function u ∈ V_h, the
test function v ∈ V_h and the coefficient functions f, h ∈ V_h can be defined in
a straightforward fashion as seen in the code in Figure 1.2. The bilinear and
linear forms from (1.3) and (1.4) can then be implemented simply by using the
tensor and differential operators defined in UFL, some of which can be seen in
Table 1.1. An important thing to note is that the definition of the gradient operator
grad(A) of, for instance, a vector-valued function u in UFL is {grad(u)}_ij = ∂u_i/∂x_j
and not {grad(u)}_ij = ∂u_j/∂x_i. The latter operator is, however, provided in
UFL by nabla_grad(A). A similar convention applies to the divergence operator
where nabla_div(A) is provided as an alternative to div(A). In this work, the
operators ∇u and ∇ · u follow the UFL definition for the gradient and divergence
operators, grad(u) and div(u), respectively and should not be confused with the
UFL operators nabla_grad(u) and nabla_div(u).
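The difference between the two gradient conventions can be checked numerically. The sketch below is plain Python and does not call UFL; the functions `grad` and `nabla_grad` here are stand-ins written for this illustration, and the vector field `u` is an arbitrary assumed example. It approximates the partial derivatives by central differences and shows that the two conventions are transposes of each other.

```python
def u(x, y):
    """Hypothetical vector field u = (x^2 y, 3x + y^2)."""
    return (x * x * y, 3.0 * x + y * y)


def partial(i, j, point, h=1e-5):
    """Central-difference approximation of du_i/dx_j at `point`."""
    plus, minus = list(point), list(point)
    plus[j] += h
    minus[j] -= h
    return (u(*plus)[i] - u(*minus)[i]) / (2.0 * h)


def grad(point):
    """Convention used in this work: grad(u)[i][j] = du_i/dx_j."""
    return [[partial(i, j, point) for j in range(2)] for i in range(2)]


def nabla_grad(point):
    """Alternative convention: nabla_grad(u)[i][j] = du_j/dx_i."""
    return [[partial(j, i, point) for j in range(2)] for i in range(2)]


p = (1.0, 2.0)
G = grad(p)        # approximately [[4, 1], [3, 4]]
N = nabla_grad(p)  # approximately [[4, 3], [1, 4]]
```

For scalar-valued functions the two conventions coincide; the distinction only matters for vector- and tensor-valued arguments.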
To complete the implementation of the variational forms, integration on the
relevant domains must be expressed. In UFL, the integral over the domain, ∫_{Ω_k} I dx,
is denoted by I*dx(k) while the integral over the exterior boundary of the domain,
∫_{∂Ω_k} I ds, is denoted by I*ds(k) where k is the subdomain number and I is a valid
UFL expression. Thus, having completed the implementation of (1.2) in the
near-mathematical notation of UFL, the form compiler can be invoked to generate code
from the abstract UFL representation.
The last two classes of expressions to be presented in this short introduction to
UFL are nonlinear scalar functions and geometric quantities. UFL provides a set
of nonlinear scalar functions, presented in Table 1.2, which can be applied to, for
instance, scalar-valued coefficient functions such as f and h in the Poisson example.
It is illegal to apply these functions to any test or trial function as this would render
the variational form nonlinear in those arguments. Geometric quantities are related
to the local finite element cell T. For instance, the coordinate of the integration
point currently being evaluated on T (including its boundary) can be accessed via
cell.x. Other geometric quantities which are particularly useful in relation to this
work are the outward normal to the facet8 currently being evaluated cell.n and the
circumradius, the radius of the circumscribed circle of T, cell.circumradius. Basic
usage of nonlinear functions and the integration point coordinate is demonstrated
later in Section 1.3.5, while the facet normal and circumradius are frequently used
8 A facet is a topological entity of a computational mesh of dimension D − 1 (codimension 1) where
D is the topological dimension of the cells of the computational mesh. Thus for a triangular mesh, the
facets are the edges and for a tetrahedral mesh, the facets are the faces.
The form file contains the UFL specification of elements and/or forms, as for
instance the code from Figure 1.2 which in this case is saved in the file Poisson.ufl.
The content of a form file is wrapped in a Python script and then executed for
further processing in FFC. There exist a number of optional command-line options
to control the code generation. Related to this work, the most important options
are:
-l language This parameter controls the output format for the generated code. The
default value is “ufc”, which indicates that the code is generated according
to the UFC specification. Alternatively, the value “dolfin” may be used to
generate code according to the UFC format with a small set of additional
DOLFIN-specific wrappers.
-r representation This parameter controls the representation used for the gener-
ated element tensor code. There are three possibilities: “auto” (the default),
“quadrature” and “tensor”. FFC implements two different approaches to
code generation. One is based on traditional quadrature and another on a
special tensor representation. This will be discussed in Section 3.2. In the
case “auto”, FFC will try to select the better of the two representations; that
is, the representation that is believed to yield the best run-time performance
for the problem at hand. This issue is addressed in detail in Section 3.6.
-O If this option is used, the code generated for the element tensor is optimised
for run-time performance. The optimisation strategy used depends on the
chosen representation. In general, this will increase the time required for
FFC to generate code, but should reduce the run-time for the generated code.
Note that for very complicated variational forms, hardware limitations can
make compilation with some optimisation options impossible. Optimisation
strategies are treated in Chapter 3.
As an illustration of the options presented above, the command:
Bash code
$ ffc -l dolfin -r quadrature -O Poisson.ufl
will cause FFC to generate code for the Poisson problem, including DOLFIN
wrappers, using the quadrature representation with the default optimisation. A
list of all available command-line parameters can be found in the FFC manual page by
typing ‘man ffc’ on the command line.
FFC follows the conventional design of a compiler in that it breaks compilation
into several sequential stages. The output generated at each stage serves as input for
the following stage, as illustrated in Figure 1.3. Introducing separate stages allows
development and improvement of each stage to be implemented without affecting
other stages of the compilation. Furthermore, adding new stages and dropping
existing stages becomes trivial. Each of the stages involved when compiling a form
is described in the following. Compilation of elements follows a similar (but simpler)
set of stages, and is not described here.
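The staged design described above, where the output of one stage is the input of the next, can be sketched in a few lines of Python. The three stage functions below are hypothetical toys invented for this illustration; they mimic only the chaining structure and do not reproduce what FFC computes at any stage.

```python
from functools import reduce


# Hypothetical toy stages: each takes the previous stage's output as input.
def parse(source):
    """Toy 'language' stage: wrap the cleaned source text."""
    return {"form": source.strip()}


def analyse(data):
    """Toy 'analysis' stage: attach simple metadata."""
    return {**data, "num_terms": data["form"].count("+") + 1}


def generate(data):
    """Toy 'code generation' stage: emit a comment line."""
    return f"// form: {data['form']} ({data['num_terms']} terms)"


def compile_form(source, stages):
    """Feed `source` through `stages` in order."""
    return reduce(lambda value, stage: stage(value), stages, source)


code = compile_form(" f*v*dx + h*v*ds ", [parse, analyse, generate])
print(code)
```

In such a design, adding or dropping a stage amounts to editing the list passed to `compile_form`, provided the neighbouring stages agree on the data they exchange.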
Compiler stage 0: Language (parsing). In this stage, the user-specified form is
interpreted and stored as a UFL abstract syntax tree (AST). The actual pars-
ing is handled by Python and the transformation to a UFL form object is
implemented by operator overloading in UFL.
Input: Python code or .ufl file
Output: UFL form
Compiler stage 1: Analysis. This stage preprocesses the UFL form and extracts
form metadata (FormData), such as which elements were used to define the
form, the number of coefficients and the cell type (interval, triangle or
tetrahedron). This stage also involves selecting a suitable quadrature scheme
and representation (as discussed earlier) for the form if these have not been
specified by the user.
Input: UFL form
Output: preprocessed UFL form and form metadata
[Figure 1.3: stages of FFC compilation. Only the end of the diagram is recoverable: OIR → Stage 4 (code generation) → C++ code → Stage 5 (code formatting) → Foo.h / Foo.cpp.]
.h/.cpp files conforming to the UFC specification. This is where the actual
writing of C++ code takes place and the stage relies on templates for UFC
code available as part of the UFC module ufc_utils.
Input: C++ code
Output: C++ code files
The interface to the code which is generated by FFC is discussed in the following
section.
C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const mesh& m, const cell& c)
const = 0;
where dofs is a pointer to an array for the tabulated values on T. UFC only provides
the interface of this function; it is not concerned with computing ι_T. The code
to compute ι_T must be generated by the form compiler. For example, FFC will
generate the following code for linear Lagrange elements on triangles.
C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = c.entity_indices[0][0];
  dofs[1] = c.entity_indices[0][1];
  dofs[2] = c.entity_indices[0][2];
}
Note that FFC associates each degree of freedom with the global vertex number
which can be extracted from the cell::entity_indices array. For discontinuous
linear Lagrange elements on triangles the generated code is
C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = 3*c.entity_indices[2][0];
  dofs[1] = 3*c.entity_indices[2][0] + 1;
  dofs[2] = 3*c.entity_indices[2][0] + 2;
}
because FFC considers all degrees of freedom local to the given element and
therefore computes degree of freedom numbers based on the global cell index.
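The two numbering rules in the generated C++ code above can be mirrored in plain Python to see their effect on a small mesh. The two-triangle mesh and the function names below are assumptions made for this sketch; only the numbering formulas are taken from the listings.

```python
# Hypothetical two-triangle mesh of a square with global vertices 0..3.
# Each cell is a tuple of global vertex indices in increasing order.
cells = [(0, 1, 3), (0, 2, 3)]


def tabulate_dofs_p1(cell_index):
    """Continuous P1: one dof per vertex, numbered by the global vertex index."""
    return list(cells[cell_index])


def tabulate_dofs_dg1(cell_index):
    """Discontinuous P1: three dofs owned by the cell, numbered by cell index."""
    return [3 * cell_index + i for i in range(3)]


for c in range(len(cells)):
    print(c, tabulate_dofs_p1(c), tabulate_dofs_dg1(c))
# 0 [0, 1, 3] [0, 1, 2]
# 1 [0, 2, 3] [3, 4, 5]
```

The continuous element shares dofs 0 and 3 between the two cells, which is what enforces continuity across the common edge, whereas the discontinuous element gives every cell its own three dofs.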
The local finite element tensor is computed inside the tabulate_tensor function
which is implemented by all three integral classes although the interface varies
slightly. For the cell_integral, the interface is
C++ code
/// Tabulate the tensor for the contribution from a local cell
virtual void tabulate_tensor(double* A, const double * const * w, const cell&
c) const = 0;
where A is a pointer to an array which will hold the values of the local element
tensor and w contains nodal values of any coefficient functions present
in the integral. The code which FFC generates for this function varies depending
on, for example, the choice of representation and optimisation, issues which
are discussed in Chapter 3. (Figures 3.2 and 3.3, on pages 63 and 65 respectively,
show examples of code generated by FFC for this function.) The interface
for exterior_facet_integral::tabulate_tensor is similar in nature to the
interface for interior_facet_integral::tabulate_tensor which is discussed in
Section 4.1.2 in connection to automation of discontinuous Galerkin methods.
Vertex  Coordinates
v0      x = (0, 0)
v1      x = (1, 0)
v2      x = (0, 1)
Figure 1.4: The UFC reference triangle and the coordinates of the vertices.
The UFC specification also defines a numbering scheme for mesh entities which
allows form compilers to access necessary data consistently when generating
code, for example, for computing the local tensors and local-to-global mapping as
discussed above. Important aspects of this numbering scheme are summarised in
the following for triangular cells. Further details on the UFC numbering convention
can be found in Alnæs et al. (2012).
The UFC reference triangle, including the coordinates of the three vertices, is
shown in Figure 1.4. Mesh entities are identified by the tuple (d, i ) where d is the
topological dimension of the mesh entity and i is a unique global index of the mesh
entity. For convenience, mesh entities of topological dimension 0 are referred to as
vertices, entities of dimension 1 are referred to as edges and entities of dimension 2
are referred to as faces. Mesh entities of topological dimension D − 1 (codimension 1),
with D denoting the topological dimension of the cells of the computational mesh,
are referred to as facets. Thus for a triangular mesh, the facets are the edges and for
a tetrahedral mesh, the facets are the faces. Following this convention, the vertices
of a triangle are identified as v0 = (0, 0), v1 = (0, 1) and v2 = (0, 2), the edges
(facets) are e0 = (1, 0), e1 = (1, 1) and e2 = (1, 2), and the cell itself is c0 = (2, 0).
The vertices of simplicial cells (intervals, triangles and tetrahedra) are numbered
locally based on the corresponding global vertex numbers such that a tuple of
increasing local vertex numbers corresponds to a tuple of increasing global vertex
numbers. This is illustrated for a simple mesh in Figure 1.5. The remaining mesh
entities are numbered within each topological dimension based on a lexicographical
ordering of ordered tuples of non-incident vertices. For example, the first edge, e0, of
a triangle is located opposite vertex v0 as shown in Figure 1.6a. The numbering of
mesh entities on triangular cells is shown in Table 1.4.
Figure 1.5: Local vertex numbering of simplicial mesh based on global vertex
numbers.
The relative ordering of mesh entities with respect to other incident mesh
entities follows by sorting the entities by their indices. Therefore, the pair of
vertices incident to edge e0 in Figure 1.6a is (v1 , v2 ), not (v2 , v1 ). Due to the vertex
numbering convention, this means that two incident simplicial cells will always
agree on the orientation of incident subsimplices (for instance facets). This is
demonstrated in Figure 1.6b, which shows two incident triangles which agree on
the orientation of the common edge. This feature is advantageous when generating
code for discontinuous Galerkin methods, as will be demonstrated in Chapter 4.
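The numbering rules above are easy to reproduce for triangles. The following plain-Python sketch assumes a hypothetical square with global vertices 0 to 3, split into two triangles along the diagonal 0–3; the function names are invented for this illustration. Local vertices are the sorted global vertex numbers, and edge e_i is the sorted pair of vertices not incident to v_i.

```python
def local_vertices(global_vertices):
    """Local vertices v0, v1, v2 are the global vertex numbers in increasing order."""
    return tuple(sorted(global_vertices))


def edges(global_vertices):
    """Edge e_i is the sorted pair of vertices opposite (non-incident to) v_i."""
    v = local_vertices(global_vertices)
    return [tuple(w for w in v if w != v[i]) for i in range(3)]


# Hypothetical cells, deliberately given in arbitrary vertex order.
cell_a = (3, 0, 1)
cell_b = (2, 3, 0)

print(local_vertices(cell_a), edges(cell_a))
# (0, 1, 3) [(1, 3), (0, 3), (0, 1)]
print(local_vertices(cell_b), edges(cell_b))
# (0, 2, 3) [(2, 3), (0, 3), (0, 2)]
```

Both cells describe the common edge as the ordered pair (0, 3), so they agree on its orientation without any extra bookkeeping.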
1.3.5 DOLFIN
Up until now, only the variational form and finite element discretisation have
been defined. To obtain a solution to the boundary value problem in (1.1), the
computational domain and boundary conditions must be specified which in the
(a) Edges are numbered based on the non-incident vertex. Therefore, e0 is located
opposite vertex v0.
(b) Orientation of facets (edges) is defined by the ordered tuple of incident vertices,
thus e0 = (v0, v2) and e2 = (v0, v1).
Figure 1.6: Edge numbering and orientation based on sorted tuples of incident and
non-incident vertices. As a consequence two incident triangles will always agree
on the orientation of the common facet for simplicial cells.
C++ code
#include <dolfin.h>
#include "Poisson.h"

using namespace dolfin;

// Source term
class Source : public Expression
{
  // Evaluate the source term f at the point x and store it in values[0]
  void eval(Array<double>& values, const Array<double>& x) const
  {
    values[0] = 8*pow(DOLFIN_PI, 2)*sin(2*DOLFIN_PI*x[0])*sin(2*DOLFIN_PI*x[1]);
  }
};
Figure 1.7: Implementation of source term and Dirichlet boundary for the C++
solver for the boundary value problem in (1.1). Program continues in Figure 1.8.
values array.
Next follows the definition of the class DirichletBoundary, a subclass of
SubDomain, for the part of the boundary where Dirichlet boundary conditions
are to be applied. The SubDomain class implements the function inside which eval-
uates to true or false depending on whether or not the point given by coordinates x
is part of the subdomain. In addition to the argument x, the inside function also
takes the argument on_boundary, a boolean value, supplied by DOLFIN, which
is true if the point x is located on ∂Ω. In the given case, the Dirichlet condition
is indeed applied on ∂Ω which means that the overloaded inside function can
simply be implemented by returning the on_boundary argument.
The remaining part of the C++ solver, the main function, is shown in Figure 1.8.
The first line defines the computational mesh, which consists of 2048 triangles as
the unit square is divided into 32 × 32 cells and each cell is divided into two
triangles. DOLFIN provides functionality for creating simple meshes through the
classes: UnitInterval, UnitSquare, UnitCube, UnitCircle, UnitSphere, Interval,
Rectangle and Box which are useful for testing. For ‘real’ applications, a user can
read a mesh from file in the following way:
C++ code
Mesh mesh("mesh.xml");
provided that the mesh is saved in the DOLFIN XML format. Meshes can be
generated by external libraries, such as Gmsh (http://geuz.org/gmsh/), stored in
the Gmsh data format and converted by the dolfin-convert script to the DOLFIN
XML data format.
Next, the FunctionSpace is defined for the finite element function space Vh
in (1.7). A function space is represented by a Mesh, a DofMap and a FiniteElement.
The DofMap and FiniteElement classes are generated by FFC based on the element
definition in Figure 1.2. However, by including the ‘-l dolfin’ option when
compiling the UFL input with FFC:
Bash code
ffc -l dolfin Poisson.ufl
C++ code
int main()
{
// Create mesh and function space
UnitSquare mesh(32, 32);
Poisson::FunctionSpace V(mesh);
// Compute solution
Function u(V);
Matrix A;
Vector b;
assemble(A, a);
assemble(b, L);
bc.apply(A, b);
solve(A, *u.vector(), b);
// Plot solution
plot(u);
return 0;
}
Figure 1.8: Continuation from Figure 1.7 of C++ code for the Poisson boundary
value problem.
and attached to the linear form. The coefficient f is defined by the class Source as
shown in Figure 1.7, while the coefficient h, the Neumann boundary condition, is
zero in the given case.
A Function u is then declared to hold the computed solution. The Function
class represents a finite element function in V and therefore takes a function space
as argument. The function u also holds a vector of values of the degrees of freedom
associated with the function. A function is evaluated based on linear combinations
of basis functions and the values of this vector. This is in contrast to the Expression
class which is evaluated by overloading the eval function as seen in Figure 1.7.
To compute a solution for u which satisfies the variational problem, defined by
the bilinear and linear forms a and L, the following three steps are applied. Firstly,
the bilinear and linear forms a and L are assembled into the Matrix A and the Vector
b by calling the free function assemble which implements an algorithm to assemble
finite element variational forms. The assembly algorithm will be presented later
in this section. Secondly, the Dirichlet boundary condition is applied to the linear
system of equations using the apply member function of the DirichletBC object
bc. Thirdly, after applying the boundary condition, the system of equations can be
solved by calling the free function solve which solves linear systems of the form
Ax = b using the assembled matrix A, the vector of degree of freedom values from
u and the assembled vector b as arguments.
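The three steps can be mimicked in plain Python for a one-dimensional analogue of the problem. This is a minimal sketch under stated assumptions and not DOLFIN code: the functions `assemble_system`, `apply_dirichlet` and `solve_dense` are hypothetical stand-ins, the problem is −u'' = f on (0, 1) with linear elements, the matrix is stored densely, and the boundary condition is imposed by replacing each constrained row with an identity row.

```python
def assemble_system(n, f):
    """Assemble the stiffness matrix A and load vector b for -u'' = f on (0, 1)
    using n linear elements of equal size (dense storage for brevity)."""
    h = 1.0 / n
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    for c in range(n):
        dofs = (c, c + 1)                 # local-to-global mapping of the cell
        xm = (c + 0.5) * h                # cell midpoint (one-point quadrature)
        A_T = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
        b_T = [0.5 * h * f(xm), 0.5 * h * f(xm)]
        for i in range(2):
            b[dofs[i]] += b_T[i]
            for j in range(2):
                A[dofs[i]][dofs[j]] += A_T[i][j]
    return A, b


def apply_dirichlet(A, b, dof, value):
    """Replace row `dof` by an identity row and set the right-hand side."""
    A[dof] = [0.0] * len(b)
    A[dof][dof] = 1.0
    b[dof] = value


def solve_dense(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


# Step 1: assemble, step 2: apply boundary conditions, step 3: solve
n = 8
A, b = assemble_system(n, lambda x: 2.0)   # exact solution is u = x(1 - x)
apply_dirichlet(A, b, 0, 0.0)
apply_dirichlet(A, b, n, 0.0)
u = solve_dense(A, b)
```

For this constant source term the nodal values of the computed solution coincide with x(1 − x), which makes the sketch easy to check by hand.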
As an alternative to the three steps outlined above, the solve function provides
functionality to solve variational problems in a straightforward fashion namely by:
C++ code
solve(a == L, u, bc);
which automatically assembles the system, applies the boundary conditions and
solves the linear system, storing the solution in the function u.
Finally, the solution is saved in ParaView Data (PVD) format (http://www.
paraview.org/) for external post processing and plotted by the built-in plot com-
mand of DOLFIN which enables a quick visual inspection of the computed solution.
The computed solution to the Poisson boundary value problem is shown in Fig-
ure 1.9.
Python interface
Figure 1.9: Computed solution to the Poisson boundary value problem. The warped
scalar field u in the figure on the right has been scaled by a factor of 0.5.
this reason, the Python interface to DOLFIN is preferred, whenever feasible, for the
examples presented in this thesis.
As an example, the complete solver for the Poisson boundary value problem
using the Python interface is shown in Figure 1.10. The code is very similar to the
C++ code in Figures 1.7 and 1.8 and the differences are mainly due to the difference
in Python and C++ syntax. The two main differences are the definition of the
FunctionSpace and the definition of the variational forms which are implemented
directly as part of the solver and not in a separate file. Also note that the UFL
coordinates have been used to implement the source term f directly as part of
the variational formulation. It could also be implemented by subclassing the
Expression class and overloading the eval function in a similar way to the approach
in the C++ example:
Python code
class Source(Expression):
    def eval(self, values, x):
        values[0] = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])

f = Source()
Python code
f = Expression("8*pow(pi,2)*sin(2*pi*x[0])*sin(2*pi*x[1])")
where the string argument to the Expression class is given in C++ syntax which is
automatically just-in-time compiled in order to evaluate the Expression. Compared
to the subclassing approach, this is more efficient as the callback to the eval function
Python code
from dolfin import *
# Compute solution
U = Function(V)
solve(a == L, U, bc)
# Plot solution
plot(U, interactive=True)
Figure 1.10: Complete Python solver for the boundary value problem in (1.1).
Assembly algorithm
To conclude this short introduction to the FEniCS Project, the assembly algorithm,
implemented in the DOLFIN assemble function, is presented. The presentation is
given for the assembly of the rank two tensor corresponding to the bilinear form
of the Poisson problem in (1.2). A generalisation of the algorithm for multilinear
forms is given in Alnæs et al. (2009) and Logg et al. (2012b).
Setting the function space V̂ = V, the tensor A which arises from assembling
the bilinear form a is defined by
$$A_I = a\left(\phi_{I_2}, \phi_{I_1}\right), \tag{1.9}$$
where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a basis for V. The tensor A is a
sparse rank two tensor, a matrix, of dimensions N × N. The matrix A is computed
by iterating over the cells of the mesh and adding the contribution from each local
cell to the global matrix A. In this case, from (1.3), the local cell tensor $A_T$ is defined
as:
$$A_{T,i} = a_T\left(\phi^T_{i_2}, \phi^T_{i_1}\right) = \int_T \nabla u \cdot \nabla v \, dx, \tag{1.10}$$
where $i = (i_1, i_2)$ is a multi-index, $A_{T,i}$ is the ith entry of the cell tensor $A_T$, $a_T$ is
the local contribution to the form from a cell $T \in \mathcal{T}_h$ and $\{\phi^T_k\}_{k=1}^{3}$ is the local finite
element basis for V on T, which is linear Lagrange elements on triangles in this
case.
To formulate the assembly algorithm, a local-to-global mapping of degrees of
freedom is needed. Let $\iota_T : I_T \to I$ denote the collective local-to-global mapping
for each $T \in \mathcal{T}_h$
$$\iota_T(i) = \left(\iota_T^1(i_1), \iota_T^2(i_2)\right) \quad \forall\, i \in I_T, \tag{1.11}$$
where $\iota_T^j : [1, 3] \to [1, N]$ denotes the local-to-global mapping for each discrete
function space $V_j$ and $I_T$ is the index set
$$I_T = \prod_{j=1}^{2} [1, 3] = \{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}. \tag{1.12}$$
That is, $\iota_T$ maps a tuple of local degrees of freedom to a tuple of global degrees
of freedom. DOLFIN calls the tabulate_tensor and tabulate_dofs functions
presented in Section 1.3.4, in order to compute the local contribution $a_T$ and the
local-to-global mapping $\iota_T^j$ for each discrete function space from which DOLFIN
constructs the collective local-to-global mapping $\iota_T$.
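The index set and the collective mapping can be written down directly in a few lines of Python. The names `index_set` and `collective_map` and the global degree of freedom numbers are hypothetical and serve only as an illustration of the definitions above; one-based local indices are used to match the notation of the text.

```python
from itertools import product


def index_set(n_local=3, rank=2):
    """Index set I_T: all tuples of local dof indices, numbered from one."""
    return list(product(range(1, n_local + 1), repeat=rank))


def collective_map(dofs_per_space):
    """Build iota_T from one local-to-global map per function space.

    Each map is a tuple whose entry k - 1 is the global dof of local dof k.
    """
    def iota_T(i):
        return tuple(dofs[i_j - 1] for dofs, i_j in zip(dofs_per_space, i))
    return iota_T


I_T = index_set()

# A cell whose three local dofs have the (made up) global numbers 5, 2 and 9,
# with the same map used for the test and trial spaces.
iota_T = collective_map([(5, 2, 9), (5, 2, 9)])
global_entries = [iota_T(i) for i in I_T]
```

Each tuple in `global_entries` identifies the entry of the global matrix that receives the corresponding entry of the cell tensor.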
The assembly of the matrix A can now be carried out efficiently by iterating
over all cells T ∈ Th . On each cell T, the cell tensor A T is computed and then added
to the global tensor A as outlined in Algorithm 1. The algorithm can be extended
to handle assembly over exterior and interior facets; the latter is demonstrated in
Section 4.1.4.
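The cell-wise assembly described above can be sketched in plain Python for linear Lagrange elements on triangles. This is an illustrative sketch and not the DOLFIN implementation: the functions `cell_tensor` and `assemble` are hypothetical, a dense list of lists replaces the sparse matrix, and the tuple of global vertex indices of each cell plays the role of the local-to-global mapping, which is valid only because the degrees of freedom of linear Lagrange elements coincide with the vertices.

```python
def cell_tensor(coords):
    """Cell tensor A_T for the Poisson bilinear form with linear Lagrange
    elements on a triangle with vertex coordinates `coords`."""
    (x0, y0), (x1, y1), (x2, y2) = coords
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    area = abs(det) / 2.0
    # Constant gradients of the three barycentric basis functions
    grads = [((y1 - y2) / det, (x2 - x1) / det),
             ((y2 - y0) / det, (x0 - x2) / det),
             ((y0 - y1) / det, (x1 - x0) / det)]
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]


def assemble(vertices, cells):
    """Iterate over the cells and add each cell tensor to the global matrix."""
    N = len(vertices)
    A = [[0.0] * N for _ in range(N)]
    for cell in cells:
        A_T = cell_tensor([vertices[v] for v in cell])
        for i1 in range(3):
            for i2 in range(3):
                # cell[i] is the global dof of local dof i
                A[cell[i1]][cell[i2]] += A_T[i1][i2]
    return A


# Unit square divided into two triangles
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cells = [(0, 1, 3), (0, 2, 3)]
A = assemble(vertices, cells)
```

The assembled matrix is symmetric and each of its rows sums to zero, as expected for the Laplace operator before boundary conditions are applied.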
2 FEniCS applications to solid mechanics
One of the goals of this work is to tackle complicated solid mechanics models
using automated modelling tools. In the previous chapter it was shown how
automated modelling could be employed to solve the finite element variational
formulation of a Poisson boundary value problem. The Poisson problem provides
a simple platform for introducing the concepts behind automated modelling as it is
implemented in the FEniCS framework. However, from the simple presentation
it may not be immediately clear how more complex problems, like plasticity,
can be solved. A natural step is, therefore, to apply the concept of automated
modelling to some standard solid mechanics problems. Solid mechanics problems
typically involve the standard momentum balance equation, posed in a Lagrangian
setting, with different models distinguished by the choice of nonlinear or linearised
kinematics, and the constitutive model for determining the stress. The traditional
development approach to solid mechanics problems, and traditional finite element
codes, places a strong emphasis on the implementation of constitutive models
at the quadrature point level. Automated methods, on the other hand, tend to
stress more heavily the governing balance equations. Widely used finite element
codes for solid mechanics applications provide application programming interfaces
(APIs) for users to implement their own constitutive models. The interface supplies
kinematic and history data, and the user code computes the stress tensor, and when
required also the linearisation of the stress. Users of such libraries will typically
not be exposed to code development other than via the constitutive model API.
In addition to demonstrating how solid mechanics problems can be solved
using automation tools, this chapter presents some of the models that will be
further investigated and extended in subsequent chapters. It is not intended as a
comprehensive treatment of solid mechanics problems, but should be viewed as a
stepping stone towards implementation of classes of plasticity models. The chapter
also focuses on some pertinent issues that arise due to the nature of the constitutive
models. These issues, and solid mechanics problems in general, have motivated a
number of developments in the FEniCS framework.
The common problems of linearised elasticity, plasticity, hyperelasticity and
elastic wave propagation are considered. Topics that are addressed in this chapter
2.1.1 Preliminaries
F (u; w) = 0 ∀ w ∈ V, (2.1)
which is identical to the form in (1.2). For nonlinear problems, a Newton method
is typically employed to solve (2.1). Linearising F about u = u0 leads to a bilinear
form,
$$a(du, w) := D_{u_0} F(u_0; w)[du] = \left.\frac{dF(u_0 + \epsilon\, du; w)}{d\epsilon}\right|_{\epsilon=0}, \tag{2.4}$$
and a residual given by:
$$L(w) := F(u_0; w). \tag{2.5}$$
Using the definitions of a and L in (2.4) and (2.5), respectively, a Newton step
involves solving a problem of the type in (2.3), followed by the correction u0 ←
u0 − du. The process is repeated until (2.1) is satisfied to within a specified tolerance.
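The structure of the Newton iteration is easiest to see for a scalar unknown. The following minimal sketch uses a hypothetical function `newton` and an arbitrary scalar test equation; it follows the sign convention used above, in which the linearised problem is solved for du and the correction is subtracted.

```python
def newton(F, J, u0, tol=1e-12, max_iter=25):
    """Solve F(u) = 0 for a scalar u.

    In each step the linearised problem J(u) du = F(u) is solved for du,
    followed by the correction u <- u - du.
    """
    u = u0
    for iteration in range(max_iter):
        residual = F(u)
        if abs(residual) < tol:
            return u, iteration
        du = residual / J(u)
        u -= du
    raise RuntimeError("Newton iteration did not converge")


# Test equation u^3 + u - 2 = 0 with the exact root u = 1
root, iterations = newton(lambda u: u**3 + u - 2.0,
                          lambda u: 3.0 * u**2 + 1.0,
                          u0=2.0)
```

In the finite element setting the division by J(u) is replaced by the solution of a linear system with the assembled Jacobian matrix.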
The standard balance of linear momentum problem for the body Ω reads:
ρü − ∇ · σ = b in Ω × I, (2.6)
u=g on Γ D × I, (2.7)
σn = h on Γ N × I, (2.8)
u ( x, 0) = u0 in Ω, (2.9)
u̇( x, 0) = v0 in Ω, (2.10)
Applying integration by parts, using the divergence theorem and inserting the
boundary condition (2.8), equation (2.11) can be expressed in the form of (2.2) as:
$$F := \int_{\Omega} \rho \ddot{u} \cdot w \, dx + \int_{\Omega} \sigma : \nabla w \, dx - \int_{\Gamma_N} h \cdot w \, ds - \int_{\Omega} b \cdot w \, dx = 0. \tag{2.12}$$
In this section, the momentum balance equation has been presented on the
current configuration Ω. It can also be posed on the fixed reference domain Ω0 via
a pull-back operation. However, for the particular presentation which is used in
this chapter for geometrically nonlinear models details of the pull-back will not be
needed.
where Ψ0 is the stored strain energy density on the reference domain, and an
external potential energy functional of the form
$$\Pi_{\mathrm{ext}} = -\int_{\Omega_0} b_0 \cdot v \, dx - \int_{\Gamma_{0,N}} h_0 \cdot v \, ds, \tag{2.15}$$
are considered. It is the form of the stored energy density function Ψ0 that defines
a particular constitutive model. For later convenience, the potential energy terms
have been presented on the reference domain Ω0 .
A minimiser u of (2.13) minimises the potential energy:
$$\min_{v \in V} \Pi, \tag{2.16}$$
$$F(u; w) := D_u \Pi(u)[w] = \left.\frac{d\Pi(u + \epsilon w)}{d\epsilon}\right|_{\epsilon=0}. \tag{2.17}$$
For linearised elasticity, the stress tensor as a function of the strain tensor for an
isotropic, homogeneous material is given by
$$\sigma = 2\mu\varepsilon + \lambda\,\mathrm{tr}(\varepsilon)\, I, \tag{2.18}$$
where $\varepsilon = \left(\nabla u + (\nabla u)^T\right)/2$ is the strain tensor, µ and λ are the Lamé parameters
and I is the second-order identity tensor. The relationship between the stress and
the strain can also be expressed as
$$\sigma = C : \varepsilon, \tag{2.19}$$
where
$$C_{ijkl} = \mu\left(\delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}\right) + \lambda\delta_{ij}\delta_{kl}, \tag{2.20}$$
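As a quick consistency check, the fourth-order tensor in (2.20) can be built explicitly and contracted with a symmetric strain tensor. The sketch below uses hypothetical helper names and plain nested lists; the material parameters E = 10 and ν = 0.3 are the values used in Figure 2.2, and the strain values are arbitrary. The result is compared with the isotropic relation written with the Lamé parameters, which is the form used for `sigma` in Figure 2.2.

```python
def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0


def elasticity_tensor(mu, lmbda):
    """C_ijkl = mu (d_ik d_jl + d_il d_jk) + lambda d_ij d_kl."""
    r = range(3)
    return [[[[mu * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))
               + lmbda * delta(i, j) * delta(k, l)
               for l in r] for k in r] for j in r] for i in r]


def double_contract(C, eps):
    """sigma_ij = C_ijkl eps_kl."""
    r = range(3)
    return [[sum(C[i][j][k][l] * eps[k][l] for k in r for l in r)
             for j in r] for i in r]


def stress_lame(eps, mu, lmbda):
    """sigma = 2 mu eps + lambda tr(eps) I."""
    trace = sum(eps[i][i] for i in range(3))
    return [[2.0 * mu * eps[i][j] + lmbda * trace * delta(i, j)
             for j in range(3)] for i in range(3)]


# Lame parameters from Young's modulus and Poisson's ratio
E, nu = 10.0, 0.3
mu, lmbda = E / (2.0 * (1.0 + nu)), E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))

# An arbitrary symmetric strain tensor
eps = [[0.010, 0.002, 0.000],
       [0.002, -0.003, 0.001],
       [0.000, 0.001, 0.005]]

sigma_C = double_contract(elasticity_tensor(mu, lmbda), eps)
sigma_direct = stress_lame(eps, mu, lmbda)
```

The two results agree to round-off for any symmetric strain tensor, because the symmetrising term in (2.20) contributes 2µε and the last term contributes λ tr(ε) I.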
$$\sigma = C : \varepsilon^e, \tag{2.21}$$
where $\varepsilon^e$ is the elastic part of the strain tensor. It is assumed that the strain tensor
can be decomposed additively into elastic and plastic parts:
$$\varepsilon = \varepsilon^e + \varepsilon^p. \tag{2.22}$$
where $\phi\left(\sigma, q_{\mathrm{kin}}(\varepsilon^p)\right)$ is a scalar effective stress measure, $q_{\mathrm{kin}}$ is a stress-like internal
variable used to model kinematic hardening, $q_{\mathrm{iso}}$ is a scalar stress-like term used to
model isotropic hardening, κ is a scalar internal variable and $\sigma_y$ is the initial scalar
yield stress. For the commonly adopted von Mises model (also known as $J_2$-flow)
with linear isotropic hardening, φ and $q_{\mathrm{iso}}$ read:
$$\phi(\sigma) = \sqrt{\frac{3}{2} s_{ij} s_{ij}}, \tag{2.24}$$
$$q_{\mathrm{iso}}(\kappa) = H\kappa, \tag{2.25}$$
where $s_{ij} = \sigma_{ij} - \sigma_{kk}\delta_{ij}/3$ is the deviatoric stress and the constant scalar H > 0 is a
hardening parameter.
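The effective stress measure in (2.24) is straightforward to evaluate for a given stress tensor. The sketch below is a plain Python illustration with hypothetical function names. The function `yield_function` assumes the common form f = φ(σ) − q_iso(κ) − σ_y for the yield function, which is an assumption made here for illustration; the yield stress of 200 is the value used in Figure 2.5.

```python
from math import sqrt


def deviator(sigma):
    """Deviatoric stress s_ij = sigma_ij - sigma_kk delta_ij / 3."""
    mean = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    return [[sigma[i][j] - (mean if i == j else 0.0) for j in range(3)]
            for i in range(3)]


def von_mises(sigma):
    """Effective stress phi(sigma) = sqrt(3/2 s_ij s_ij)."""
    s = deviator(sigma)
    return sqrt(1.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))


def yield_function(sigma, kappa, sigma_y, H):
    """Assumed form f = phi(sigma) - H*kappa - sigma_y (linear hardening)."""
    return von_mises(sigma) - H * kappa - sigma_y


uniaxial = [[200.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
hydrostatic = [[50.0, 0.0, 0.0], [0.0, 50.0, 0.0], [0.0, 0.0, 50.0]]
shear = [[0.0, 10.0, 0.0], [10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

phi_uniaxial = von_mises(uniaxial)
phi_hydrostatic = von_mises(hydrostatic)
phi_shear = von_mises(shear)
f_uniaxial = yield_function(uniaxial, 0.0, 200.0, 0.0)
```

For uniaxial tension the effective stress equals the axial stress, a purely hydrostatic state gives zero, and pure shear gives the shear stress multiplied by the square root of three.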
In the flow theory of plasticity, the plastic strain rate is given by:
$$\dot{\varepsilon}^p = \dot{\lambda}\frac{\partial g}{\partial \sigma}, \tag{2.26}$$
where λ̇ is the rate of the plastic multiplier and the scalar g is known as the plastic
potential. In the case of associative plastic flow, g = f . The term λ̇ determines
the magnitude of the plastic strain rate, and the direction is given by ∂g/∂σ. For
2.2.3 Hyperelasticity
$$F = I + \nabla u, \tag{2.29}$$
$$C = F^T F, \tag{2.30}$$
$$E = \frac{1}{2}\left(C - I\right), \tag{2.31}$$
where I is the second-order identity tensor. Using E in place of the infinitesimal
strain tensor ε in (2.28), the following expression for the strain energy density
function is obtained:
$$\Psi_0 = \frac{\lambda}{2}\left(\mathrm{tr}\, E\right)^2 + \mu E : E, \tag{2.32}$$
which is known as the St. Venant–Kirchhoff model. Unlike the linearised case, this
energy density function is not linear in u (or spatial derivatives of u), which means
that when minimising the total potential energy Π, the resulting equations are
nonlinear. Other examples of hyperelastic models are the Mooney–Rivlin model:
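$$\Psi_0 = c_1\left(I_C - 3\right) + c_2\left(II_C - 3\right), \tag{2.33}$$
where $I_C = \mathrm{tr}\, C$ and $II_C = \frac{1}{2}\left(I_C^2 - \mathrm{tr}(C^2)\right)$, and the compressible neo-Hookean
model:
$$\Psi_0 = \frac{\mu}{2}\left(I_C - 3\right) - \mu \ln J + \frac{\lambda}{2}\left(\ln J\right)^2, \tag{2.34}$$
where J = det F.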
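The kinematic quantities and the three stored energy densities can be evaluated pointwise for a given displacement gradient. The sketch below uses plain nested lists and hypothetical helper names; the material parameters and the displacement gradient are arbitrary illustrative values.

```python
from math import log

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def trace(A):
    return A[0][0] + A[1][1] + A[2][2]


def det(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def kinematics(grad_u):
    """Return F = I + grad(u), C = F^T F and E = (C - I)/2."""
    F = [[I3[i][j] + grad_u[i][j] for j in range(3)] for i in range(3)]
    C = matmul(transpose(F), F)
    E = [[0.5 * (C[i][j] - I3[i][j]) for j in range(3)] for i in range(3)]
    return F, C, E


def st_venant_kirchhoff(grad_u, mu, lmbda):
    _, _, E = kinematics(grad_u)
    E_E = sum(E[i][j] ** 2 for i in range(3) for j in range(3))
    return 0.5 * lmbda * trace(E) ** 2 + mu * E_E


def mooney_rivlin(grad_u, c1, c2):
    _, C, _ = kinematics(grad_u)
    Ic = trace(C)
    IIc = 0.5 * (Ic ** 2 - trace(matmul(C, C)))
    return c1 * (Ic - 3.0) + c2 * (IIc - 3.0)


def neo_hookean(grad_u, mu, lmbda):
    F, C, _ = kinematics(grad_u)
    J = det(F)
    return 0.5 * mu * (trace(C) - 3.0) - mu * log(J) + 0.5 * lmbda * log(J) ** 2


zero = [[0.0] * 3 for _ in range(3)]
stretch = [[0.01, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

psi_svk = st_venant_kirchhoff(stretch, mu=1.0, lmbda=2.0)
psi_nh = neo_hookean(stretch, mu=1.0, lmbda=2.0)
```

All three energies vanish in the undeformed state, and for a small stretch the St. Venant–Kirchhoff and neo-Hookean values are close, as both reduce to linearised elasticity in the small strain limit.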
In most presentations of hyperelastic models, one would proceed from the
definition of the stored energy function to the derivation of a stress tensor, and
then often to a linearisation of the stress for use in a Newton method. This process
can be lengthy and tedious. For a range of models, features of UFL will permit
problems to be posed as energy minimisation problems, and it will not be necessary
to compute an expression for a stress tensor, or its linearisation, explicitly. A
particular model can then be posed in terms of a particular expression for Ψ0 , as
will be demonstrated in the example in Section 2.4.3. It is also possible to follow
the momentum balance route, in which case UFL can be used to compute the stress
tensor and its linearisation automatically from an expression for Ψ0 .
where the scalar stress σ is a nonlinear function of the strain field u,x , and will be
computed via a separate algorithm. A continuous, piecewise quadratic displace-
ment field (and likewise for w) is considered. The strain field u,x is computed via
an L2 -projection onto the space of discontinuous, piecewise linear elements. For the
considered spaces, this is equivalent to a direct evaluation of the strain. Because the
stress is computed via a separate algorithm based on nodal values from the strain
field, it is chosen to also represent the stress using a discontinuous, piecewise linear
basis. Since the polynomial degree of the integrand is two, (2.35) can be integrated
exactly using two Gauss quadrature points on an element T ∈ Th :
$$f_{T,i_1} := \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, \sigma_\alpha\, \phi^T_{i_1,x}(x_q)\, W_q, \tag{2.36}$$
where q is the integration point index, α is the degree of freedom index for the
local basis of σ, $\psi^T$ and $\phi^T$ denote the linear and quadratic basis functions on the
element T, respectively, and $W_q$ is the quadrature weight at integration point $x_q$.
Note that $\sigma_\alpha$ is the computed value of the stress at the element node α.
To apply a Newton method, the Jacobian (linearisation) of (2.36) is required.
This will be denoted by $A^*_{T,i}$. To achieve quadratic convergence of a Newton
method, the linearisation must be exact. The Jacobian of (2.36) is:
$$A^*_{T,i} := \frac{d f_{T,i_1}}{d u_{i_2}}, \tag{2.37}$$
where $u_{i_2}$ are the displacement degrees of freedom. Because the stress is computed
from the strain field $u_{,x}$, only $\sigma_\alpha$ in (2.36) depends on $u_{i_2}$, and the linearisation of
this term reads:
$$\frac{d\sigma_\alpha}{du_{i_2}} = \frac{d\sigma_\alpha}{d\varepsilon_\alpha}\frac{d\varepsilon_\alpha}{du_{i_2}} = D_\alpha \frac{d\varepsilon_\alpha}{du_{i_2}}, \tag{2.38}$$
where $D_\alpha$ is the tangent value at node α. To compute the values of the strain at
nodes, $\varepsilon_\alpha$, from the displacement field, the derivative of the displacement field is
evaluated at $x_\alpha$:
$$\varepsilon_\alpha = \sum_{i_2=1}^{3} \phi^T_{i_2,x}(x_\alpha)\, u_{i_2}. \tag{2.39}$$
$$A^*_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_\alpha)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.40}$$
bilinear form:
$$a(u, w) := \int_{\Omega} D\, u_{,x}\, w_{,x} \, dx, \tag{2.41}$$
where D = dσ/dε is the tangent. As before, D is represented using a discontinuous,
piecewise linear basis where the nodal values of D are computed via a separate
algorithm. If two quadrature points are used to integrate the form (which is exact
for this form), the resulting element matrix is:
$$A_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_q)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.42}$$
Solving this problem via Newton’s method involves solving a series of linear
problems with
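$$L(w) := \int_{\Omega} \left(1 + u_n^2\right) u_{n,x}\, w_{,x} \, dx - \int_{\Omega} f w \, dx, \tag{2.45}$$
$$a(du_{n+1}, w) := \int_{\Omega} \left(1 + u_n^2\right) du_{n+1,x}\, w_{,x} \, dx + \int_{\Omega} 2 u_n u_{n,x}\, du_{n+1}\, w_{,x} \, dx, \tag{2.46}$$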
1 The concept was introduced in Ølgaard et al. (2008b) although the syntax for declaring a ‘quadrature
element’ and the underlying interpretation has changed slightly. Specifically, the argument k used to
refer to the number of integration points in each spatial direction of the quadrature scheme, which is
different from the current interpretation in which it refers to the polynomial degree that the underlying
quadrature rule will be able to integrate exactly.
where k is the polynomial degree that the underlying quadrature rule will be
able to integrate exactly. The declaration of a quadrature element is similar to the
declaration of any other element in UFL, as demonstrated in Section 1.3.2, and it can
be used as such, with some limitations. Note, however, the subtle difference that the
element order does not refer to the polynomial degree of the finite element shape
functions, but instead relates to the quadrature scheme. For ‘sufficient’ integration
of a second-order polynomial in three dimensions, FFC will use four quadrature
points per cell. FFC interprets the quadrature points of the quadrature element as
degrees of freedom where the value of a shape function for a degree of freedom is
equal to one at the quadrature point and zero otherwise. This has the implication
that a function that is defined on a quadrature element can only be evaluated at
quadrature points. Furthermore, it is not possible to take derivatives of functions
defined on a quadrature element.
The following examples illustrate simple usage of a quadrature element. Con-
sider the bilinear form for a mass matrix weighted by a coefficient f that is defined
on a quadrature element:
Z
a (u, w) := f uw dx. (2.49)
Ω
If the test and trial functions w and u come from a space of linear Lagrange
functions, the polynomial degree of their product is two. This means that the
coefficient f should be defined as:
UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 2)
f = Coefficient(ElementQ)
to ensure appropriate integration of the form in (2.49). The reason for this is that
the quadrature element in the form dictates the quadrature scheme that FFC will
use for the numerical integration since the quadrature element, as described above,
only has nonzero values at points that coincide with the underlying quadrature
scheme of the quadrature element. Thus, if the degree of ElementQ is set to one, the
form will be integrated using only one integration point, since one point is enough
to integrate a linear polynomial exactly, and as a result the form is under-integrated.
If quadratic Lagrange elements are used for w and u, the polynomial degree of the
integrand is four; therefore, the declaration for the coefficient f should be changed
to:
UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 4)
f = Coefficient(ElementQ)
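The effect of the quadrature degree can be reproduced with a one-dimensional analogue. The sketch below is only an illustration under simplifying assumptions: it uses Gauss–Legendre rules on the unit interval rather than the tetrahedral rules selected by FFC, and the function names are hypothetical. A one-point rule integrates linear polynomials exactly, whereas a two-point rule is exact up to degree three.

```python
from math import sqrt


def gauss_legendre(n_points):
    """Points and weights on [0, 1] for one or two Gauss-Legendre points."""
    if n_points == 1:
        return [0.5], [1.0]
    if n_points == 2:
        d = 0.5 / sqrt(3.0)
        return [0.5 - d, 0.5 + d], [0.5, 0.5]
    raise ValueError("only one- and two-point rules are provided")


def integrate(g, n_points):
    points, weights = gauss_legendre(n_points)
    return sum(w * g(x) for x, w in zip(points, weights))


def integrand(x):
    """Product of two linear functions, a polynomial of degree two."""
    return x * (1.0 - x)


exact = 1.0 / 6.0
one_point = integrate(integrand, 1)   # under-integrated
two_point = integrate(integrand, 2)   # exact up to round-off
```

The one-point rule returns 1/4 instead of 1/6, which is the kind of under-integration that results from declaring the quadrature element with too low a degree.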
Table 2.1: Computed relative residual norms after each iteration of the Newton
solver for the nonlinear model problem using different elements for V and W.
Quadratic convergence is observed when using quadrature elements, and when
using piecewise constant functions for W, which coincides with a one-point quadra-
ture element. The presented results are computed using the code in Figure 2.1
using the different combinations of function spaces.
Figure 2.1 for solving the nonlinear model problem in (2.44) with a source term f =
x² − 4, Dirichlet boundary conditions u = 1 at x = 0, continuous quadratic elements
for V, and quadrature elements of degree two for W. NonlinearModelProblem is
a subclass of the DOLFIN class NonlinearProblem, which implements the lin-
ear form F and the bilinear form J, the derivative or Jacobian of F, according
to (2.5) and (2.4) respectively. The DOLFIN class NewtonSolver solves prob-
lems expressed in the canonical form of (2.1) based on the information provided
by the NonlinearModelProblem object. Further details on the DOLFIN classes
NonlinearProblem and NewtonSolver can be found in Logg et al. (2012d).
The relative residual norm after each iteration of the Newton solver for four
different combinations of spaces V and W is shown in Table 2.1. Continuous, dis-
continuous and quadrature elements are denoted by CGk , DGk and Qk respectively
where k refers to the polynomial degree as discussed previously. It is clear from the
table that using quadratic elements for V requires the use of quadrature elements
for W in order to ensure quadratic convergence of the Newton solver.
Python code
from dolfin import *
Figure 2.1: DOLFIN implementation for the nonlinear model problem in (2.44) with
‘off-line’ computation of terms used in the variational forms.
In the code extracts, commentary is only provided for non-trivial aspects as the
more generic aspects, such as the creation of meshes, application of boundary
conditions and the solution of linear systems, have already been treated in the
introduction to the FEniCS Project in Section 1.3.
2.4.2 Plasticity
The computation of the stress tensor, and its linearisation, for the model outlined
in Section 2.2.2 in a displacement-driven finite element model is rather involved. A
method of computing point-wise a stress tensor that satisfies (2.23) from the strain,
strain increment and history variables is known as a ‘return mapping algorithm’.
Return mapping strategies are discussed in detail in Simo and Hughes (1998). A
widely used return mapping approach, the ‘closest-point projection’, is summarised
below for a plasticity model with linear isotropic hardening.
From (2.21) and (2.22) the stress at the end of a strain increment reads:
$$\sigma_{n+1} = C : \left(\varepsilon_{n+1} - \varepsilon^p_{n+1}\right). \tag{2.50}$$
Therefore, given $\varepsilon_{n+1}$, it is necessary to determine the plastic strain $\varepsilon^p_{n+1}$ in order to
compute the stress. In a closest-point projection method the increment in plastic
strain is computed from:
$$\varepsilon^p_{n+1} - \varepsilon^p_n = \Delta\lambda \frac{\partial g(\sigma_{n+1})}{\partial \sigma}, \tag{2.51}$$
where g is the plastic potential function and $\Delta\lambda = \lambda_{n+1} - \lambda_n$. Since $\partial_\sigma g$ is evaluated
at $\sigma_{n+1}$, (2.50) and (2.51) constitute a system of coupled equations with unknowns
Δλ and $\sigma_{n+1}$. In general, the system is nonlinear. To obtain a solution, Newton’s
Python code
from dolfin import *
# Create mesh
mesh = UnitCube(8, 8, 8)
# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))
# Stress
sigma = 2*mu*sym(grad(u)) + lmbda*tr(grad(u))*Identity(w.cell().d)
Figure 2.2: DOLFIN solver for a linearised elasticity problem on a unit cube.
method is employed as follows, with k denoting the iteration number. First, a ‘trial
stress’ is computed:
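$$\sigma_{\mathrm{trial}} = C : \left(\varepsilon_{n+1} - \varepsilon^p_n\right). \tag{2.52}$$
Subtracting (2.52) from (2.50) and inserting (2.51), the following equation is obtained:
$$R_{n+1} := \sigma_{n+1} - \sigma_{\mathrm{trial}} + \Delta\lambda\, C : \frac{\partial g(\sigma_{n+1})}{\partial \sigma} = 0, \tag{2.53}$$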
where Rn+1 is the ‘stress residual’. During the Newton iterations this residual
is driven towards zero. If the trial stress in (2.52) leads to satisfaction of the
yield criterion in (2.23), then σtrial is the new stress and the Newton procedure is
terminated. Otherwise, the Newton increment of ∆λ is computed from:
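$$d\lambda_k = \frac{f_k - R_k : Q_k : \partial_\sigma f_k}{\partial_\sigma f_k : \Xi_k : \partial_\sigma g_k + h}, \tag{2.54}$$
where $Q = \left[I + \Delta\lambda\, C : \partial^2_{\sigma\sigma} g\right]^{-1}$, $\Xi = Q : C$ and h is a hardening parameter, which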
for the von Mises model with linear hardening is equal to H (the constant hardening
parameter). The stress increment is computed from:
after which the increment of the plastic multiplier and the stresses for the next
iteration can be computed:
The yield criterion is then evaluated again using the updated values, and the proce-
dure continues until the yield criterion is satisfied to within a prescribed tolerance.
Note that to start the procedure ∆λ0 = 0 and σ0 = σtrial . After convergence is
achieved, the consistent tangent can be computed:
$$C_{\mathrm{tan}} = \Xi - \frac{\Xi : \partial_\sigma g \otimes \partial_\sigma f : \Xi}{\partial_\sigma f : \Xi : \partial_\sigma g + h}, \tag{2.58}$$
which is used when assembling the global Jacobian (stiffness matrix). The return
mapping algorithm is applied at all quadrature points.
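For the von Mises model with associative flow and linear isotropic hardening, the return mapping can be carried out in a single step, because the flow direction at the converged stress coincides with the direction of the deviatoric trial stress (the well-known radial return). The sketch below illustrates this special case at a single quadrature point and is not the general Newton scheme described above or the implementation of the plasticity library. It assumes a yield function of the form f = φ(σ) − Hκ − σ_y and the evolution κ ← κ + ∆λ for the hardening variable; the function names are hypothetical, and the material parameters are those used in Figure 2.5.

```python
from math import sqrt

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def deviator(a):
    mean = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    return [[a[i][j] - mean * I3[i][j] for j in range(3)] for i in range(3)]


def von_mises(sigma):
    s = deviator(sigma)
    return sqrt(1.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))


def elastic_stress(strain, mu, lmbda):
    """sigma = 2 mu eps + lambda tr(eps) I."""
    tr = strain[0][0] + strain[1][1] + strain[2][2]
    return [[2.0 * mu * strain[i][j] + lmbda * tr * I3[i][j]
             for j in range(3)] for i in range(3)]


def return_map(strain, plastic_strain, kappa, mu, lmbda, sigma_y, H):
    """Return the updated stress, plastic strain and hardening variable."""
    elastic_strain = [[strain[i][j] - plastic_strain[i][j] for j in range(3)]
                      for i in range(3)]
    sigma_trial = elastic_stress(elastic_strain, mu, lmbda)
    q_trial = von_mises(sigma_trial)
    f_trial = q_trial - H * kappa - sigma_y      # assumed yield function
    if f_trial <= 0.0:
        return sigma_trial, plastic_strain, kappa   # elastic step
    # Plastic step: the increment of the plastic multiplier in closed form
    dlam = f_trial / (3.0 * mu + H)
    s = deviator(sigma_trial)
    n = [[1.5 * s[i][j] / q_trial for j in range(3)] for i in range(3)]
    sigma = [[sigma_trial[i][j] - 2.0 * mu * dlam * n[i][j] for j in range(3)]
             for i in range(3)]
    plastic = [[plastic_strain[i][j] + dlam * n[i][j] for j in range(3)]
               for i in range(3)]
    return sigma, plastic, kappa + dlam


E, nu, sigma_y = 20000.0, 0.3, 200.0
E_t = 0.1 * E
H = E_t / (1.0 - E_t / E)
mu, lmbda = E / (2.0 * (1.0 + nu)), E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))

zero = [[0.0] * 3 for _ in range(3)]
# Strain producing a uniaxial trial stress of 400, twice the yield stress
strain = [[0.02, 0.0, 0.0], [0.0, -0.006, 0.0], [0.0, 0.0, -0.006]]
sigma, plastic_strain, kappa = return_map(strain, zero, 0.0,
                                          mu, lmbda, sigma_y, H)
```

After the return the stress lies on the updated yield surface, so the assumed yield function evaluates to zero up to round-off. For models with a nonlinear plastic potential the closed-form update no longer applies and the iterative procedure above is required.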
The closest-point return mapping algorithm described above (Simo and Hughes,
1998) is common to a range of plasticity models that are defined by the form of the
functions f and g. The process can be generalised for models with more complicated
hardening behaviour. To aid the implementation of different models, a return
mapping algorithm and support for quadrature point level history parameters
C++ code
class PlasticityModel
{
public:
/// Constructor
PlasticityModel(double E, double nu);
};
Figure 2.3: PlasticityModel public interface defined by the plasticity library. Users
are required to supply implementations for at least the pure virtual functions.
These functions describe the plasticity model.
UFL code
element = VectorElement("Lagrange", tetrahedron, 2)
elementT = VectorElement("Quadrature", tetrahedron, 2, 36)
elementS = VectorElement("Quadrature", tetrahedron, 2, 6)

u, w = TrialFunction(element), TestFunction(element)
b, h = Coefficient(element), Coefficient(element)
t, s = Coefficient(elementT), Coefficient(elementS)

def eps(u):
    return as_vector([u[i].dx(i) for i in range(3)] \
                     + [u[i].dx(j) + u[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])

def sigma(s):
    return as_matrix([[s[0], s[3], s[4]],
                      [s[3], s[1], s[5]],
                      [s[4], s[5], s[2]]])

def tangent(t):
    return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])

a = inner(dot(tangent(t), eps(u)), eps(w))*dx
L = inner(sigma(s), grad(w))*dx - dot(b, w)*dx - dot(h, w)*ds
Figure 2.4: Definition of the linear and bilinear variational forms for plasticity
expressed using UFL syntax.
cells, loops over cell quadrature points, and variable updates in addition to defining
the linear and bilinear forms of the plasticity problem. The PlasticityProblem is
solved by the NewtonSolver like any other NonlinearProblem object as described
earlier in this chapter, line 41. After each Newton solve, the history variables are
updated by calling the update_variables function before proceeding with the next
solution increment, line 44.
2.4.3 Hyperelasticity
The construction of a solver for a hyperelastic problem, phrased as a minimisation
problem, is now presented and follows the minimisation framework presented
in Section 2.1.3. The compressible neo-Hookean model in (2.34) is adopted. The
automatic functional differentiation features of UFL permit the solver code to
resemble closely the abstract mathematical presentation. Differentiation of forms
with respect to functions is handled by the UFL function derivative. For instance,
given the potential energy functional Π(u) as a function of the displacements u,
the derivative of Π with respect to u in the direction w is given by
$$D_u \Pi(u)[w] := \left.\frac{d\Pi(u + \epsilon w)}{d\epsilon}\right|_{\epsilon=0}, \tag{2.59}$$
UFL code
derivative(Pi, u, w)
If w is a test function, the result from applying the derivative is a linear form, which
can be differentiated again to yield a bilinear form as shown in (2.4). Noteworthy
in this approach is that it is not necessary to provide an explicit expression for the
stress tensor. Changing the model is therefore as simple as redefining the stored energy
density function Ψ0.
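The meaning of the directional derivative can be checked numerically on a small discrete example. UFL performs the differentiation symbolically; the sketch below instead approximates the derivative with a central difference and compares it with the analytical result. The quadratic energy, the matrix K, the vectors b, u and w, and the function names are all hypothetical illustrative choices.

```python
def potential(u, K, b):
    """Discrete quadratic energy Pi(u) = 1/2 u.K u - b.u."""
    n = len(u)
    internal = 0.5 * sum(u[i] * K[i][j] * u[j]
                         for i in range(n) for j in range(n))
    external = sum(b[i] * u[i] for i in range(n))
    return internal - external


def gateaux(Pi, u, w, eps=1e-6):
    """Central-difference approximation of d/d(eps) Pi(u + eps w) at eps = 0."""
    u_plus = [ui + eps * wi for ui, wi in zip(u, w)]
    u_minus = [ui - eps * wi for ui, wi in zip(u, w)]
    return (Pi(u_plus) - Pi(u_minus)) / (2.0 * eps)


K = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 0.0]
u = [0.3, -0.2]
w = [1.0, 0.5]

numeric = gateaux(lambda v: potential(v, K, b), u, w)

# For a symmetric K the analytical derivative is the residual (K u - b) . w
analytic = sum((sum(K[i][j] * u[j] for j in range(2)) - b[i]) * w[i]
               for i in range(2))
```

The derivative in the direction w is linear in w, which is why applying `derivative` with a test function as the direction yields a linear form.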
A complete hyperelastic solver is presented in Figure 2.6. It corresponds to a
problem posed on a unit cube, and loaded by a body force b0 = (0, −0.5, 0), and
restrained such that u = (0, 0, 0) where x = 0. Elsewhere on the boundary the
traction h0 = (0.1, 0, 0) is applied. Continuous, piecewise linear functions for the
displacement field are used. The code in Figure 2.6 adopts the same notation used
in Sections 2.1.3 and 2.2.3. The problem is posed on the reference domain, and for
convenience the subscripts ‘0’ have been dropped in the code.
The solver in Figure 2.6 solves the problem using one Newton step. For problems
with stronger nonlinearities, perhaps as a result of greater volumetric or surface
forcing terms, it may be necessary to apply a pseudo time-stepping approach and
solve the problem in a number of Newton increments, or it may be necessary to
apply a path following solution method.
C++ code
// Create mesh and define function spaces
UnitCube mesh(4, 4, 4);
Plasticity::FunctionSpace V(mesh);
Plasticity::BilinearForm::CoefficientSpace_t Vt(mesh);
Plasticity::LinearForm::CoefficientSpace_s Vs(mesh);

// Create functions, forms and attach functions
Function u(V); Function tangent(Vt); Function stress(Vs);
Plasticity::BilinearForm a(V, V);
Plasticity::LinearForm L(V);
a.t = tangent;
L.s = stress;

// Young's modulus and Poisson's ratio
double E = 20000.0; double nu = 0.3;

// Slope of hardening (linear) and hardening parameter
double E_t(0.1*E);
double hardening_parameter = E_t/(1.0 - E_t/E);

// Yield stress
double yield_stress = 200.0;

// Object of class von Mises
fenicssolid::VonMises J2(E, nu, yield_stress, hardening_parameter);

// Create PlasticityProblem
fenicssolid::PlasticityProblem nonlinear_problem(a, L, u, tangent, stress, bcs,
                                                 J2);

// Create nonlinear solver
NewtonSolver nonlinear_solver;

// Pseudo time stepping parameters
double t = 0.0; double dt = 0.005; double T = 0.02;

// Apply load in steps
while (t < T)
{
  // Increment time and solve nonlinear problem
  t += dt;
  nonlinear_solver.solve(nonlinear_problem, *u.vector());

  // Update variables for next load step
  nonlinear_problem.update_variables();
}
Figure 2.5: DOLFIN code extract for solving a plasticity problem using the FEniCS
Solid Mechanics library.
2.4. Implementations and examples 51
Python code
from dolfin import *
# Kinematics
I = Identity(V.cell().d) # Identity tensor
F = I + grad(u) # Deformation gradient
C = F.T*F # Right Cauchy-Green tensor
Ic, J = tr(C), det(F) # Invariants of deformation tensors
# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2*(1 + nu)), E*nu/((1 + nu)*(1 - 2*nu))
# Compute Jacobian of F
dF = derivative(F, u, du)
Figure 2.6: Complete DOLFIN solver for the compressible neo-Hookean model,
formulated as a minimisation problem.
2.4.4 Elastodynamics
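\[
  u_{n+1} = u_n + \Delta t\, \dot{u}_n + \frac{1}{2} \Delta t^2 \left( 2\beta\, \ddot{u}_{n+1} + (1 - 2\beta)\, \ddot{u}_n \right), \tag{2.60}
\]
\[
  \dot{u}_{n+1} = \dot{u}_n + \Delta t \left( \gamma\, \ddot{u}_{n+1} + (1 - \gamma)\, \ddot{u}_n \right), \tag{2.61}
\]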
where β and γ are scalar parameters. Various well-known schemes are recovered
for particular combinations of β and γ. Setting β = 1/4 and γ = 1/2 leads to the
trapezoidal scheme, and setting β = 0 and γ = 1/2 leads to a central difference
scheme. For β > 0, re-arranging (2.60) and using (2.61) leads to:
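\[
  \ddot{u}_{n+1} = \frac{1}{\beta \Delta t^2} \left( u_{n+1} - u_n - \Delta t\, \dot{u}_n \right) - \left( \frac{1}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.62}
\]
\[
  \dot{u}_{n+1} = \frac{\gamma}{\beta \Delta t} \left( u_{n+1} - u_n \right) - \left( \frac{\gamma}{\beta} - 1 \right) \dot{u}_n - \Delta t \left( \frac{\gamma}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.63}
\]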
in which un+1 is the only unknown term on the right-hand side. To solve a time
dependent problem, the governing equation can be posed at time tn+1 ,
\[
  F(u_{n+1}; w) = 0 \quad \forall\, w \in V, \tag{2.64}
\]
with the expressions in (2.62) and (2.63) used for the second and first time derivatives
of u at time tn+1 , respectively.
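The update formulas (2.62) and (2.63) can be exercised on a single degree of freedom. The sketch below is plain Python and assumes a hypothetical undamped oscillator m ü + k u = 0 as the governing equation; the function name `newmark_step` and the values of `m`, `k` and `dt` are illustrative choices. The equation is posed at time tn+1, the acceleration is eliminated using (2.62), and the resulting linear equation is solved for the new displacement.

```python
def newmark_step(u0, v0, a0, dt, beta=0.25, gamma=0.5, m=1.0, k=4.0):
    """Advance displacement, velocity and acceleration by one time step."""
    c = 1.0/(beta*dt*dt)
    d = 1.0/(2.0*beta) - 1.0
    # Governing equation m*a1 + k*u1 = 0 with a1 substituted from (2.62)
    u1 = m*(c*(u0 + dt*v0) + d*a0)/(m*c + k)
    # Acceleration from (2.62) and velocity from (2.63)
    a1 = c*(u1 - u0 - dt*v0) - d*a0
    v1 = (gamma/(beta*dt)*(u1 - u0) - (gamma/beta - 1.0)*v0
          - dt*(gamma/(2.0*beta) - 1.0)*a0)
    return u1, v1, a1

# Initial conditions; the initial acceleration follows from the equation itself
u, v = 1.0, 0.0
a = -4.0*u
dt = 0.05
for _ in range(200):
    u, v, a = newmark_step(u, v, a, dt)

# Total energy 0.5*m*v^2 + 0.5*k*u^2 with m = 1 and k = 4
energy = 0.5*v*v + 0.5*4.0*u*u
```

With β = 1/4 and γ = 1/2 (the trapezoidal scheme) the total energy of this linear undamped system is preserved by the time integration, so `energy` stays at its initial value of 2 up to round-off.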
The viscoelastic model under consideration is a minor extension of the elasticity
model in (2.18). For the viscoelastic model, the stress tensor is given by:
for the acceleration and velocity at time tn+1 (a1, v1) in terms of the displacement
at tn+1 (u1) and other fields at time tn . For simplicity, the source term b = (0, 0, 0).
The body is fixed such that u = (0, 0, 0) at x = 0 and the initial conditions are
u0 = v0 = (0, 0, 0). A traction h is applied at x = 1 and is increased linearly from
zero to one over the first five time steps. Therefore, no forces are acting on the
body at t = 0 and the initial acceleration is zero. Again, the UFL functions lhs
and rhs have been used to extract the bilinear and linear terms from the form.
This is particularly convenient for time-dependent problems since it allows the
code implementation to be posed in the same format as is usually adopted in the
mathematical presentation, with the equation of interest posed in terms of fields at
some point between times tn and tn+1 . The presented solver could be made more
efficient by exploiting linearity of the governing equation and thereby re-using the
factorisation of the system matrix.
Finally, to attract more users with a solid mechanics background another exten-
sion to consider is improving the interface of the FEniCS Solid Mechanics library
to make it more similar to conventional finite element software packages. This
involves supplying the users with information like strain, strain rates and possibly
gradients of strain at integration point level for the user to formulate the constitutive
relation without working with the weak form of the governing equations.
Python code
from dolfin import *
# External load
class Traction(Expression):
    def __init__(self, end):
        Expression.__init__(self)
        self.t = 0.0
        self.end = end
    def value_shape(self):
        return (2,)
# v = dt * ((1-gamma)*a0 + gamma*a) + v0
v_vec = dt*((1.0-gamma)*a0_vec + gamma*a_vec) + v0_vec
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))
Figure 2.7: DOLFIN code for solving for a dynamic problem using an implicit
Newmark scheme. Program continues in Figure 2.8.
2.5. Current and future developments 55
Python code
# Time stepping parameters
beta, gamma = 0.25, 0.5
dt = 0.1
t, T = 0.0, 20*dt
# Stress tensor
def sigma(u, v):
    return 2.0*mu*sym(grad(u)) + (lmbda*tr(grad(u)) +
           eta*tr(grad(v)))*Identity(u.cell().d)
# Governing equation
F = (rho*dot(a1, w) + inner(sigma(u1, v1), sym(grad(w))))*dx - dot(h, w)*ds
Figure 2.8: Continuation from Figure 2.7 of DOLFIN code extract for solving for a
dynamic problem.
3 Representations and optimisations of
finite element variational forms
The previous chapter demonstrated that solvers for various solid mechanics prob-
lems can be implemented with relatively little effort using an automated modelling
approach which relies on the abstractions offered by UFL and the ability of FFC to
generate C++ code from the UFL input. For the approach to be competitive with
hand written code, it is important that the run-time performance of the correspond-
ing low-level code generated from the UFL representation is comparable to that
of hand written code. To this end, FFC implements two different types of repre-
sentations of finite element tensors, the so-called tensor contraction representation
and the classical quadrature-loop representation, including optimisations of both
representations.
The development of different strategies for representing and optimising finite
element variational forms has been motivated by the desire to apply the
automated modelling approach to problems of increasing complexity. The first
representation available in FFC was the tensor contraction. However, this repre-
sentation is not effective for problems like plasticity in Section 2.4.2. This led to
the development of a representation based on quadrature which included the opti-
misations described in Sections 3.3.1 and 3.3.2. With the availability of automatic
differentiation in UFL, problems like hyperelasticity could easily be implemented
in the automated framework (Sections 2.2.3 and 2.4.3). For these types of problems,
further optimisations of the quadrature representation were necessary for
efficient computation. These optimisations are important as FFC will automatically
select the quadrature representation for moderately complex and highly complex
problems if the representation is not set by the user. The automatic selection of
representation is discussed in Section 3.6. Many FEniCS users will, therefore, be
using the quadrature representation and optimisations unknowingly, particularly if
they work through the Python interface of DOLFIN.
This chapter presents the developments in FFC in terms of representations and
optimisations for finite element variational forms and is primarily based on the
work in Ølgaard and Wells (2009, 2010, 2012b) with the main difference being that
code examples and results have been updated to be compliant with FEniCS version
1.0. The developments have been applied by researchers and application developers
to various problems such as multiphase flow through porous media (Wells et al.,
2008), free surface flows (Labeur and Wells, 2009), the Navier–Stokes equations
(Mortensen et al., 2011; Labeur and Wells, 2012; Jansson et al., 2011; Selim et al.,
2012), fluid structure interaction (Selim, 2012; Hoffman et al., 2013), shape memory
alloys (Grandi et al., 2012), electromagnetics (Marchand and Davidson, 2011; Lezar
and Davidson, 2012), magnetic fluid hyperthermia for cancer therapy (Miaskowski
et al., 2012), oscillatory hydraulic tomography (Saibaba et al., 2012), the Föppl–Von
Kármán shell model (Vidoli, 2013), nonlinear elliptic problems (Lakkis and Pryer,
2011), microstructural processes (Maraldi et al., 2011, 2012), mantle convection
simulations (Vynnytska et al., 2013, 2012), glacier ice motion (Riesen et al., 2010;
Riesen, 2011), PDE-constrained optimisation and optimal control (Brandenburg
et al., 2012; Funke and Farrell, 2013; Rosseel and Wells, 2012; Clason and Kunisch,
2012; Rognes and Logg, 2012), Nitsche’s method for overlapping meshes (Massing
et al., 2012b,a, 2013), automated modelling of evolving discontinuities (Nikbakht
and Wells, 2009; Nikbakht, 2012), liquid crystal elastomers (Luo and Calderer, 2012),
and crack propagation in elastomers (Horst et al., 2013).
The tensor contraction representation of element tensors (Kirby and Logg, 2006;
Ølgaard et al., 2008a) is based on the multiplicative decomposition of an element
tensor into two tensors; one of which depends only on the differential equation
and the chosen finite element bases and can be computed prior to run-time. It
has been shown for classes of problems that the tensor contraction representation
is more efficient than the traditional quadrature approach, and the speed-ups
can be dramatic (Kirby and Logg, 2006; Ølgaard and Wells, 2010). Furthermore,
strategies which analyse the structure of the tensor contraction representation can
yield improved performance (Kirby et al., 2005, 2006). However, in contrast to
the quadrature-loop approach, the tensor contraction representation is somewhat
specialised as it cannot be extended trivially to non-affine isoparametric mappings
while maintaining efficiency, and it is not effective for classes of nonlinear problems
which require the integration of functions that do not come from a finite element
space (Ølgaard et al., 2008b). The attractive feature of the approach is the run-time
performance for classes of problems.
A general experience is that the tensor contraction approach does not scale well
when forms become more complicated. This is manifest in three ways: the time
required to generate low-level code for a variational form becomes prohibitive or
3.1. Motivation and approach 59
may fail due to memory limitations or limitations of underlying libraries1 ; the size
of the generated code is such that the compilation of the generated low-level code is
prohibitively slow and file size limitations of compilers acting on the low-level code
may be exceeded; and the run-time performance deteriorates rapidly relative to a
quadrature approach. Complicated forms are by no means exotic. Many common
nonlinear equations, when linearised, result in forms which involve numerous
function products. Factors that determine the complexity of a form are the number
of coefficient functions, the number of derivatives and the polynomial orders of
the finite element basis functions. Approaches to reduce the time required for the
code generation phase when using the tensor contraction representation have been
developed and implemented in FFC (Kirby and Logg, 2007), although these cannot
mitigate the inherently expensive nature of the approach for complicated forms.
Using a quadrature representation for more complicated forms mitigates the
problems regarding the time required to generate the code and the file size of the
generated code. However, a naive implementation of the quadrature representation
can have a serious impact on the run-time performance of the generated code.
Fortunately, the automated generation of computer code provides scope for various
optimisations to be applied such that optimal or near-optimal run-time performance
is maintained also for complex forms. The optimisations that have been developed
in this work are discussed in Section 3.3, see also Ølgaard and Wells (2010, 2012b).
To demonstrate the issues pertinent to automated code generation for compli-
cated forms this chapter presents the tensor contraction representation and the
quadrature representations, and discusses four optimisation strategies for the latter
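for run-time performance of the generated code. Adopting the approach in Ølgaard
and Wells (2010), the two representations are then compared to each other by
considering the speed of the code generation phase, the size of the generated code,
and the run-time performance of the generated code.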
The relative importance of these points may well shift during a development
cycle. During initial development, it is likely that the speed of the code generation
phase and the size of the generated code are most important, whereas at the end
of the development cycle run-time performance is likely to be the most crucial
consideration. However, there is typically a correlation between the three points.
After comparing the two representations, the four optimisations for the quadrature
representation are compared to each other in terms of run-time performance.
1 For instance, the implementation of the tensor contraction representation in FFC relies on the
The bilinear form for the weighted Laplace operator −∇ · (w∇u), where u is
unknown and w is a prescribed coefficient, is chosen as a canonical example to
illustrate the two different representations and the optimisations implemented in
FFC. The bilinear form for this operator reads
\[
  a(u, v) := \int_{\Omega} w \nabla u \cdot \nabla v \, dx. \tag{3.1}
\]
The quadrature approach can deal with cases in which not all functions come
from a finite element space including nonlinear functions like ln, exp, sin etc.,
using ‘quadrature functions’ (see Section 2.3.2) that can be evaluated directly
at quadrature points. The tensor representation approach only supports cases
in which all functions come from a finite element space (using interpolation if
necessary). Therefore, to ensure a proper performance comparison between the
representations, it is assumed in this chapter that all functions in a form, including
coefficient functions, come from a finite element function space. In the case of (3.1),
all functions will come from
\[
  V_h := \left\{ v \in H^1(\Omega) : v|_T \in P_q(T) \;\; \forall\, T \in \mathcal{T}_h \right\}, \tag{3.2}
\]
where i = (i1 , i2 ) is a multi-index. The UFL input for (3.1) is shown in Figure 3.1
for continuous piecewise linear functions on triangles as a basis for all functions in
the form.
3.2. Representation of finite element tensors 61
UFL code
element = FiniteElement("Lagrange", triangle, 1)
u = TrialFunction(element)
v = TestFunction(element)
w = Coefficient(element)
a = w*inner(grad(u), grad(v))*dx
Figure 3.1: UFL input for the weighted Laplacian form on linear triangular elements.
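\[
  A_{T,i} = \sum_{q=1}^{N} \sum_{\alpha_3=1}^{n} \Phi_{\alpha_3}(X^q)\, w_{\alpha_3}
  \sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}
  \sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T'\, W^q, \tag{3.4}
\]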
where a change of variables from the reference coordinates X to the real coordinates
x = FT ( X ) has been used. In the above equation, N denotes the number of
integration points, d is the dimension of Ω, n is the number of degrees of freedom
for the local basis of w, Φi denotes basis functions on the reference element, det F′_T
is the determinant of the Jacobian, and W q is the quadrature weight at integration
point X q . By default, FFC applies a quadrature scheme that will integrate the
variational form exactly.
From the intermediate representation in (3.4), code for computing entries of
the local element tensor is generated. This code is shown in Figure 3.2. Code
generated for the quadrature representation is structured in the following way.
First, values of geometric quantities that depend on the current element T, like
the components of the inverse of the Jacobian matrix ∂Xα1 /∂x β and ∂Xα2 /∂x β ,
are computed and assigned to the variables like K_01 in the code (this code is
not shown as it is not important for understanding the nature of the quadrature
representation). Next, values of basis functions and their derivatives at integration
points on the reference element, like Φα3 ( X q ) and ∂Φi1 ( X q )/∂Xα1 are tabulated.
Finite element basis functions are computed by FIAT. Basis functions and their
derivatives on a reference element are independent of the current element T and
are, therefore, tabulated at compile-time and stored in the tables Psi_w, Psi_vu_D01
and Psi_vu_D10 in Figure 3.2. After the tabulation of basis function values, the
loop over integration points begins. In the example, linear elements are considered,
and only one integration point is necessary for exact integration. The loop over
integration points has therefore been omitted. The first task inside a loop over
integration points is to compute the values of coefficients at the current integration
point. For the considered problem, this involves computing the value of the
coefficient w. The code for evaluating F0 in Figure 3.2 is an exact translation of the
representation ∑nα3 =1 Φα3 ( X q )wα3 . The last part of the code in Figure 3.2 is the loop
over the basis function indices i1 and i2 , where the contribution to each entry in
the local element tensor, A T , from the current integration point is added. The code
presented in Figure 3.2 is the default output of the quadrature representation and
is not optimised for run-time performance. Optimisation strategies are discussed
in Section 3.3. To generate code using the quadrature representation the FFC
command-line option -r quadrature should be used.
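The structure of the quadrature representation in (3.4) can be mimicked in a few lines of plain Python. This is a hand-written sketch, not FFC output: the function name `element_tensor_quadrature` is hypothetical, the cell is assumed to be an affine triangle with linear Lagrange basis functions, and the absolute value of the Jacobian determinant is used so that the vertex ordering does not matter.

```python
def element_tensor_quadrature(vertices, w):
    """Element tensor of the weighted Laplacian on a linear triangle, cf. (3.4)."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Jacobian of the affine map x = F_T(X) and its determinant
    J = [[x1 - x0, x2 - x0], [y1 - y0, y2 - y0]]
    det = J[0][0]*J[1][1] - J[0][1]*J[1][0]
    # K[a][b] = dX_a/dx_b, the inverse of the Jacobian
    K = [[J[1][1]/det, -J[0][1]/det], [-J[1][0]/det, J[0][0]/det]]
    # One-point rule on the reference triangle (exact for this form)
    points, weights = [(1.0/3.0, 1.0/3.0)], [0.5]
    # Constant reference gradients of the basis functions (1 - X - Y, X, Y)
    dpsi = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
    A = [[0.0]*3 for _ in range(3)]
    for (X, Y), W in zip(points, weights):
        psi = [1.0 - X - Y, X, Y]
        # Value of the coefficient w at the integration point
        F0 = sum(psi[a]*w[a] for a in range(3))
        for j in range(3):
            for k in range(3):
                s = 0.0
                for b in range(2):
                    grad_j = sum(K[a][b]*dpsi[j][a] for a in range(2))
                    grad_k = sum(K[a][b]*dpsi[k][a] for a in range(2))
                    s += grad_j*grad_k
                A[j][k] += F0*s*W*abs(det)
    return A

A = element_tensor_quadrature(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)), (1.0, 2.0, 3.0))
```

On the reference triangle with nodal values (1, 2, 3) the coefficient evaluates to 2 at the integration point, so the result is twice the familiar Laplacian stiffness matrix of the linear triangle.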
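\[
  A_{T,i} = \sum_{\alpha_1=1}^{d} \sum_{\alpha_2=1}^{d} \sum_{\alpha_3=1}^{n} \det F_T'\, w_{\alpha_3}
  \sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta}
  \int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \, dX. \tag{3.5}
\]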
Noteworthy is that the integral appearing in equation (3.5) is independent of the cell
geometry and can, therefore, be evaluated prior to run-time. The remaining terms,
with the exception of wα3 , depend only on the geometry of the cell. Exploiting this
observation, the element tensor A T,i can then be expressed as a tensor contraction,
\[
  A_{T,i} = \sum_{\alpha} A^0_{i\alpha}\, G_T^{\alpha}, \tag{3.6}
\]
where the tensors A0iα (the reference tensor) and GTα (the geometry tensor) are defined
as
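\[
  A^0_{i\alpha} = \int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \, dX, \tag{3.7}
\]
\[
  G_T^{\alpha} = \det F_T'\, w_{\alpha_3} \sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta}. \tag{3.8}
\]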
C++ code
virtual void tabulate_tensor(double* A,
const double * const * w,
const ufc::cell& c) const
{
...
// Quadrature weight.
static const double W1 = 0.5;
Figure 3.2: Part of the generated code for quadrature representation of the bilinear
form associated with the weighted Laplacian using linear elements in two dimen-
sions. The variables like K_00 are components of the inverse of the Jacobian matrix
and det is the determinant of the Jacobian. The code to compute these variables
is not shown. A holds the values of the local element tensor and w contains nodal
values of the weighting function w.
During assembly, one may then iterate over all elements of the triangulation
and on each element T compute the geometry tensor GTα , compute the tensor
contraction (3.6) and then add the resulting element tensor A T,i to the global sparse
matrix A. A generalisation of the approach to general multilinear variational forms
is presented in Kirby and Logg (2007).
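The split into a reference tensor and a geometry tensor can be illustrated with a small plain Python sketch for the weighted Laplacian on linear triangles. This is not the code FFC generates: the names `DPSI`, `reference_tensor`, `geometry_tensor` and `element_tensor` are hypothetical, the index order `[i1][i2][a1][a2][a3]` is a choice made here, and the absolute value of the determinant is an assumption to make the sketch independent of vertex ordering.

```python
# Reference gradients of the linear basis functions (1 - X - Y, X, Y)
DPSI = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]

def reference_tensor():
    """A0[i1][i2][a1][a2][a3], cf. (3.7); independent of the cell geometry."""
    # Each linear basis function integrates to 1/6 over the reference triangle
    integral_phi = 1.0/6.0
    return [[[[[integral_phi*DPSI[i1][a1]*DPSI[i2][a2] for a3 in range(3)]
               for a2 in range(2)] for a1 in range(2)]
             for i2 in range(3)] for i1 in range(3)]

def geometry_tensor(vertices, w):
    """G[a1][a2][a3], cf. (3.8); recomputed on every cell."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    J = [[x1 - x0, x2 - x0], [y1 - y0, y2 - y0]]
    det = J[0][0]*J[1][1] - J[0][1]*J[1][0]
    K = [[J[1][1]/det, -J[0][1]/det], [-J[1][0]/det, J[0][0]/det]]
    return [[[abs(det)*w[a3]*sum(K[a1][b]*K[a2][b] for b in range(2))
              for a3 in range(3)] for a2 in range(2)] for a1 in range(2)]

A0 = reference_tensor()  # computed once, before any cell is visited

def element_tensor(vertices, w):
    """Contract the precomputed reference tensor with the geometry tensor."""
    G = geometry_tensor(vertices, w)
    return [[sum(A0[i1][i2][a1][a2][a3]*G[a1][a2][a3]
                 for a1 in range(2) for a2 in range(2) for a3 in range(3))
             for i2 in range(3)] for i1 in range(3)]

A = element_tensor(((0.0, 0.0), (2.0, 0.0), (0.0, 1.0)), (1.0, 1.0, 1.0))
```

The factor 1/6 in the sketch is the integral of a linear basis function over the reference triangle. Generated code would typically unroll the contraction and drop zero entries of the reference tensor, whereas the loops are kept here for readability.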
The code which FFC will generate from the representation in (3.6) is shown in
Figure 3.3. As was the case with the quadrature representation, values of geometric
quantities that depend on the current element T are computed first and assigned
to the variables like K_01 in the code (again, this code is not shown as it is not
important for understanding the nature of the tensor contraction representation).
Based on these values, the geometry tensor (3.8) is computed and the contraction
in (3.6) is performed using the reference tensor from (3.7) which is precomputed
during the code generation stage (the literal constants 0.166667). Notice that the
contraction to compute entries in A T,i is unrolled which allows any zero-valued
entry of the reference tensor to be detected during the code generation stage and the
corresponding code can, therefore, be omitted. For a certain class of simple forms
this can lead to a tremendous speed-up when evaluating the element matrices
relative to a quadrature approach (Kirby and Logg, 2006).
Inevitably, the tensor contraction approach, due to unrolling the contraction,
leads to code which is much less compact compared to the quadrature represen-
tation (see Figure 3.2). Furthermore, as the number of functions and derivatives
present in the variational form increases, the rank of both the reference tensor
and the geometry tensor increases, thereby increasing the complexity of the ten-
sor contraction. Thus, for complicated forms the size of the generated code may
cause problems for the compilers acting on the generated low-level code, and the
complexity of the tensor contraction may exceed that of the quadrature representa-
tion leading to poor run-time performance. This influence of the complexity on
the performance is investigated in Section 3.4. To generate code using the tensor
contraction representation the FFC command-line option -r tensor should be
used.
C++ code
virtual void tabulate_tensor(double* A,
const double * const * w,
const ufc::cell& c) const
{
...
// Compute geometry tensor
const double G0_0_0_0 = det*(w[0][0]*((K_00*K_00 + K_01*K_01)));
const double G0_0_0_1 = det*(w[0][0]*((K_00*K_10 + K_01*K_11)));
const double G0_0_1_0 = det*(w[0][0]*((K_10*K_00 + K_11*K_01)));
const double G0_0_1_1 = det*(w[0][0]*((K_10*K_10 + K_11*K_11)));
const double G0_1_0_0 = det*(w[0][1]*((K_00*K_00 + K_01*K_01)));
const double G0_1_0_1 = det*(w[0][1]*((K_00*K_10 + K_01*K_11)));
const double G0_1_1_0 = det*(w[0][1]*((K_10*K_00 + K_11*K_01)));
const double G0_1_1_1 = det*(w[0][1]*((K_10*K_10 + K_11*K_11)));
const double G0_2_0_0 = det*(w[0][2]*((K_00*K_00 + K_01*K_01)));
const double G0_2_0_1 = det*(w[0][2]*((K_00*K_10 + K_01*K_11)));
const double G0_2_1_0 = det*(w[0][2]*((K_10*K_00 + K_11*K_01)));
const double G0_2_1_1 = det*(w[0][2]*((K_10*K_10 + K_11*K_11)));
Figure 3.3: Part of the generated code for tensor contraction representation of the
bilinear form associated with the weighted Laplacian using linear elements in two
dimensions. The variables like K_00 are components of the inverse of the Jacobian
matrix and det is the determinant of the Jacobian. The code to compute these
variables is not shown. A holds the values of the local element tensor and w contains
nodal values of the weighting function w. Due to space considerations the number
of digits of the literal constant 0.166667 has been reduced from fifteen to six.
Loop invariant code motion This procedure seeks to identify terms that are inde-
pendent of one or more of the summation indices and to move them outside
the loop over those particular indices. For instance, in (3.4) the terms regarding
the coefficient w, the quadrature weight W q and the determinant det F′_T
are all independent of the basis function indices i1 and i2 and therefore only
need to be computed once for each integration point. A generic discussion of
this technique, which is also known as ‘loop hoisting’, can be found in Alfred
et al. (1986).
Reuse common terms Terms that appear multiple times in an expression can be
identified, computed once, stored as temporary values and then reused
in all occurrences in the expression. This can have a great impact on the
operation count since the expression to compute an entry in A T is located
inside loops over the basis function indices as shown in the code for the
standard quadrature representation in Figure 3.2.
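The effect of loop invariant code motion can be made concrete by counting multiplications. The plain Python sketch below is an illustration only: the integrand `F0*W*det*g[j]*g[k]` is a simplified stand-in for an entry of the element tensor, and the names `assemble_naive` and `assemble_hoisted` are hypothetical. The two functions compute the same values, but the second one moves every factor that does not depend on a loop index outside the corresponding loop.

```python
def assemble_naive(F0, W, det, g):
    """Evaluate every factor inside the innermost loop."""
    n = len(g)
    A = [[0.0]*n for _ in range(n)]
    mults = 0
    for j in range(n):
        for k in range(n):
            A[j][k] += F0*W*det*g[j]*g[k]
            mults += 4
    return A, mults

def assemble_hoisted(F0, W, det, g):
    """Hoist factors that are independent of j and/or k out of the loops."""
    n = len(g)
    A = [[0.0]*n for _ in range(n)]
    scale = F0*W*det          # independent of both j and k
    mults = 2
    for j in range(n):
        row = scale*g[j]      # independent of k
        mults += 1
        for k in range(n):
            A[j][k] += row*g[k]
            mults += 1
    return A, mults

g = [-1.0, 1.0, 0.5]
A_naive, n_naive = assemble_naive(2.0, 0.5, 1.5, g)
A_hoisted, n_hoisted = assemble_hoisted(2.0, 0.5, 1.5, g)
```

For three basis functions the multiplication count drops from 36 to 14 in this sketch, and the saving grows with the number of basis functions.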
The optimisations described in this section take place after the representation
stage of the code generation process (see Figure 1.3 on page 13) where any given
form is represented as simple loop and algebra instructions. Therefore, the opti-
misations are general and apply to all forms and elements that can be handled by
FFC. While the above optimisations are straightforward for simple forms and ele-
ments, their implementation using conventional programming approaches requires
manual inspection of the form and the basis. This is often done in specialised
codes, but the extension to non-trivial forms is difficult, time consuming and error
3.3. Quadrature optimisations 67
prone. Furthermore, the optimised code may bear little relation to the mathematical
problem at hand. This makes maintenance and re-use of the hand-generated code
problematic.
To switch on optimisation the command-line option -O should be used in
addition to any of the FFC optimisation options presented in the following sections.
C++ code
// Tabulated basis functions.
static const double Psi_vu[1][2] = {{-1.0, 1.0}};
Figure 3.4: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f eliminate_zeros. The
arrays nzc0 and nzc1 contain the nonzero column indices for the mapping of
values. Note how eliminating zeros makes it possible to replace the two tables with
derivatives of basis functions Psi_vu_D01 and Psi_vu_D10 from Figure 3.2 with one
table (Psi_vu).
The code expressions to evaluate an entry in the local element tensor can become
very complex. Since such expressions are typically located inside loops, a reduction
in complexity can reduce the total operation count significantly. The approach can
be illustrated by the expression x (y + z) + 2xy, which after expansion of the first
term and grouping common terms reduces to x (y + z) + 2xy → xy + xz + 2xy →
3xy + xz. As x appears in both products in the sum, a reduction of one operation can
be achieved by moving x outside the parentheses: 3xy + xz → x (3y + z). By applying
these simplifications, the number of operations has been reduced from five to three,
which may seem trivial although it is, in fact, a reduction of 40%. The algorithm
developed and implemented in FFC to perform simplifications as described above
bears resemblance to the algorithm presented by Hosangadi et al. (2006) and later
extended and applied to optimised code generation for finite element assembly
by Russell and Kelly (2013). An additional benefit of this strategy is that the
expansion of expressions, which takes place before the simplification, will typically
allow more terms to be precomputed and hoisted outside loops, as explained in
the beginning of this section.
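The operation counts quoted for x (y + z) + 2xy can be verified mechanically. The following plain Python sketch does not implement the simplification algorithm itself; it only represents the original and the simplified expression as nested tuples (a hypothetical encoding chosen here), counts the binary operations in each, and checks that both evaluate to the same number.

```python
def count_ops(expr):
    """Number of binary operations in an expression tree of nested tuples."""
    if not isinstance(expr, tuple):
        return 0
    _, lhs, rhs = expr
    return 1 + count_ops(lhs) + count_ops(rhs)

def evaluate(expr, values):
    """Evaluate an expression tree; leaves are variable names or numbers."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        a, b = evaluate(lhs, values), evaluate(rhs, values)
        return a + b if op == "+" else a*b
    if isinstance(expr, str):
        return values[expr]
    return expr

# x*(y + z) + 2*x*y
original = ("+", ("*", "x", ("+", "y", "z")), ("*", ("*", 2.0, "x"), "y"))
# x*(3*y + z)
simplified = ("*", "x", ("+", ("*", 3.0, "y"), "z"))

values = {"x": 1.5, "y": -2.0, "z": 4.0}
ops_before = count_ops(original)
ops_after = count_ops(simplified)
same_value = evaluate(original, values) == evaluate(simplified, values)
```

The counts come out as five and three, matching the 40% reduction stated in the text.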
The FFC command-line option -f simplify_expressions should be used to
generate code with this optimisation enabled. Code generated by this option for the
representation in (3.4) is presented in Figure 3.5, where again only code different
from that in Figure 3.2 has been included. The number of operations has decreased
compared to the code in Figure 3.2 for the standard quadrature representation. An
improvement in run-time performance can therefore be expected.
To understand how the optimisations lead to the code in Figure 3.5, consider
the terms
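\[
  \sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}
  \sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T'\, W^q, \tag{3.9}
\]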
in the representation (3.4) for the weighted Laplace equation. These terms are
transformed by FFC into an expression equivalent to the code
C++ code
((K_00*Psi_vu_D10[0][j] + K_10*Psi_vu_D01[0][j])*
(K_00*Psi_vu_D10[0][k] + K_10*Psi_vu_D01[0][k]) +
(K_01*Psi_vu_D10[0][j] + K_11*Psi_vu_D01[0][j])*
(K_01*Psi_vu_D10[0][k] + K_11*Psi_vu_D01[0][k])
)*W1*det;
which is, apart from a missing F0, identical to the standard quadrature code inside
the loops in Figure 3.2.
This expression is then expanded into a new expression, a sum of products,
equivalent to the code
C++ code
// Geometry constants.
double G[3];
G[0] = W1*det*(K_00*K_00 + K_01*K_01);
G[1] = W1*det*(K_00*K_10 + K_01*K_11);
G[2] = W1*det*(K_10*K_10 + K_11*K_11);
Figure 3.5: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f simplify_expressions.
C++ code
K_00*K_00*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_00*K_10*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_00*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_10*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k] +
K_01*K_01*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_01*K_11*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_01*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_11*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];
C++ code
(K_00*K_00*W1*det + K_01*K_01*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
(K_10*K_10*W1*det + K_11*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];
where the terms in parentheses only depend on geometry information. The terms
in parentheses can, therefore, be moved outside of the loops over the basis function
indices j and k and stored in the array G. During the process of generating values
for G, FFC will discover that two of the four parentheses are identical and thus only
three unique values in G are computed. The expressions to compute the values in
G have been simplified further by moving the variables det and W1, that appear
in both products, outside the parentheses as seen in Figure 3.5. The weighting
coefficient F0 (left out of the detailed explanation above) will generally depend on
the integration point. Therefore, each value in G is multiplied by F0 and the result
is stored in the array I, which contains values that are constant inside the loop over
integration points.
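The equivalence between the standard expression and the version based on the arrays G and I can be checked numerically. The sketch below is plain Python written for this purpose and is not the code generated by FFC: the function names are hypothetical, `D10` and `D01` hold the reference derivatives of the three linear basis functions, and `K[a][b]` plays the role of the variables `K_ab` in the figures.

```python
D10 = [-1.0, 1.0, 0.0]  # derivatives with respect to the first reference coordinate
D01 = [-1.0, 0.0, 1.0]  # derivatives with respect to the second reference coordinate

def tensor_standard(K, det, W1, F0):
    """Evaluate the full expression inside the loops over j and k."""
    A = [[0.0]*3 for _ in range(3)]
    for j in range(3):
        for k in range(3):
            A[j][k] += ((K[0][0]*D10[j] + K[1][0]*D01[j])*
                        (K[0][0]*D10[k] + K[1][0]*D01[k]) +
                        (K[0][1]*D10[j] + K[1][1]*D01[j])*
                        (K[0][1]*D10[k] + K[1][1]*D01[k]))*F0*W1*det
    return A

def tensor_simplified(K, det, W1, F0):
    """Precompute geometry constants G and integration point constants I."""
    G = [W1*det*(K[0][0]*K[0][0] + K[0][1]*K[0][1]),
         W1*det*(K[0][0]*K[1][0] + K[0][1]*K[1][1]),
         W1*det*(K[1][0]*K[1][0] + K[1][1]*K[1][1])]
    I = [F0*g for g in G]
    A = [[0.0]*3 for _ in range(3)]
    for j in range(3):
        for k in range(3):
            A[j][k] += (D10[j]*D10[k]*I[0] +
                        (D10[j]*D01[k] + D01[j]*D10[k])*I[1] +
                        D01[j]*D01[k]*I[2])
    return A

K = [[0.5, 0.1], [-0.2, 1.0]]
A_std = tensor_standard(K, det=2.0, W1=0.5, F0=2.0)
A_opt = tensor_simplified(K, det=2.0, W1=0.5, F0=2.0)
max_diff = max(abs(A_std[j][k] - A_opt[j][k]) for j in range(3) for k in range(3))
```

Only three geometry constants are needed because the two mixed terms share the same coefficient, which is the observation made in the text about identical parentheses.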
The optimisation described above is the most expensive of the quadrature
optimisations to perform in terms of FFC code generation time and memory
consumption as it involves creating new terms when expanding the expressions.
The procedure does not scale well for complex expressions, but it is in many
cases the most effective approach in terms of reducing the number of operations.
This particular optimisation strategy, in combination with the elimination of zeros
outlined in the previous section, was the first to be implemented in FFC. It has
been investigated and compared to the tensor representation in Ølgaard and Wells
(2010).
C++ code
// Geometry constants.
double G[1];
G[0] = W1*det;
Figure 3.6: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f precompute_ip_const.
tion points, values that depend on the basis indices are precomputed inside the
loops. This will result in a reduction in operations for cases in which some terms
appear frequently inside the loop such that a given value can be reused once
computed. To generate code with this optimisation, the FFC command-line option
-f precompute_basis_const should be used.
Code generated by this method for the representation in (3.4) can be seen in
Figure 3.7, where only code that differs from that in Figure 3.6 has been included.
Inside the loop, the value of each binary operation is stored in the array B such
that it can be reused in subsequent computations. The UFL representation of (3.4),
which is the input to FFC, can be viewed as a directed acyclic graph (DAG). When
FFC generates code from this input, it uses algorithms from UFL to traverse the
DAG such that code to evaluate subexpressions is generated before code to evaluate
any expression which depends on these subexpressions. This ensures that values in
B are computed in the correct order. In this particular case, no additional reduction
in operations has been achieved, if compared to the previous method, since no
terms can be reused inside the loop over the indices j and k. However, as the
complexity of forms increases so does the scope for reusing terms inside the loop,
leading to improved run-time performance.
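The ordering guarantee described above amounts to a post-order traversal of the expression graph. The following plain Python sketch illustrates the idea on nested tuples and does not use the UFL traversal algorithms: the function name `generate_code` and the tuple encoding are hypothetical. Operands are visited before the node that uses them, and a dictionary ensures that a shared subexpression is assigned a single slot in `B`.

```python
def generate_code(expr):
    """Return statements that evaluate expr, emitting each subexpression once."""
    slots = {}   # subexpression -> index into B
    lines = []

    def visit(node):
        if isinstance(node, str):
            return node                      # leaf: a named variable
        if node in slots:
            return "B[%d]" % slots[node]     # already computed, reuse it
        op, lhs, rhs = node
        left, right = visit(lhs), visit(rhs)  # operands first (post-order)
        slots[node] = len(lines)
        lines.append("B[%d] = %s %s %s;" % (slots[node], left, op, right))
        return "B[%d]" % slots[node]

    visit(expr)
    return lines

# (a*b + c) appears twice and is therefore computed only once
shared = ("+", ("*", "a", "b"), "c")
statements = generate_code(("*", shared, shared))
```

A tree with the shared term duplicated would need five operations, while the statements produced here contain three, which is the kind of reuse that becomes more frequent as forms grow in complexity.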
C++ code
for (unsigned int j = 0; j < 3; j++)
{
for (unsigned int k = 0; k < 3; k++)
{
double B[16];
B[0] = Psi_vu_D01[0][j]*K_10;
B[1] = Psi_vu_D10[0][j]*K_00;
B[2] = (B[0] + B[1]);
B[3] = Psi_vu_D01[0][k]*K_10;
B[4] = Psi_vu_D10[0][k]*K_00;
B[5] = (B[3] + B[4]);
B[6] = B[2]*B[5];
B[7] = Psi_vu_D01[0][j]*K_11;
B[8] = Psi_vu_D10[0][j]*K_01;
B[9] = (B[7] + B[8]);
B[10] = Psi_vu_D01[0][k]*K_11;
B[11] = Psi_vu_D10[0][k]*K_01;
B[12] = (B[10] + B[11]);
B[13] = B[12]*B[9];
B[14] = (B[13] + B[6]);
B[15] = B[14]*I[0];
A[j*3 + k] += B[15];
}
}
Figure 3.7: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f precompute_basis_const.
The array B contains precomputed values that depend on indices j and k.
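The ordering requirement described above (subexpressions are evaluated before the expressions that depend on them) can be illustrated with a post-order traversal of a small expression DAG. The following is a minimal sketch only and not the algorithm used by UFL or FFC; the names Node and emit are hypothetical, and only binary operations are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical expression node: a leaf symbol (op is None) or a binary operation."""
    name: str = ""
    op: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def emit(root):
    """Emit assignments to B in post-order, reusing repeated subexpressions."""
    slots = {}   # maps an operation node to its index in B
    lines = []   # generated statements, in evaluation order

    def visit(node):
        if node.op is None:
            return node.name                  # leaves are used directly
        if node in slots:
            return f"B[{slots[node]}]"        # already computed: reuse the value
        lhs = visit(node.left)                # operands first ...
        rhs = visit(node.right)
        index = len(slots)                    # ... then the operation itself
        slots[node] = index
        lines.append(f"B[{index}] = {lhs}{node.op}{rhs};")
        return f"B[{index}]"

    visit(root)
    return lines


# (a*b + c)*(a*b): the product a*b appears twice but is computed once.
ab = Node(op="*", left=Node("a"), right=Node("b"))
root = Node(op="*", left=Node(op="+", left=ab, right=Node("c")), right=ab)
lines = emit(root)
for line in lines:
    print(line)
```

Because frozen dataclasses compare by value, structurally identical subexpressions map to the same entry of B, which is the reuse that becomes more valuable as the complexity of a form increases.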
3.4. Performance comparisons of representations 75
UFL code
BDM = FiniteElement("Brezzi-Douglas-Marini", triangle, 5)
DG = FiniteElement("Discontinuous Lagrange", triangle, 5 - 1)
mixed_element = BDM*DG
(sigma, u) = TrialFunctions(mixed_element)
(tau, w) = TestFunctions(mixed_element)
Figure 3.8: UFL code for the stiffness matrix of the mixed Poisson problem in (3.10)
using BDM elements of order five.
where τ, σ ∈ V, w, u ∈ W and
The UFL code for this form with k = 5 is shown in Figure 3.8.
The generation of code for a discontinuous Galerkin formulation of the bihar-
monic equation with Lagrange basis functions which involves both cell and interior
facet integrals (Ølgaard et al., 2008a) is also considered. The bilinear form for this
problem reads
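\[
\begin{aligned}
a(u, v) := {} & \int_{\Omega} \nabla^2 u \, \nabla^2 v \, dx
 - \int_{\Gamma_0} \langle \nabla^2 u \rangle \cdot [\![ \nabla v ]\!] \, ds
 - \int_{\Gamma_0} [\![ \nabla u ]\!] \cdot \langle \nabla^2 v \rangle \, ds \\
 & + \int_{\Gamma_0} \frac{\alpha}{h} \, [\![ \nabla u ]\!] \cdot [\![ \nabla v ]\!] \, ds,
\end{aligned}
\tag{3.13}
\]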
and Γ0 denotes the set of interior facets, α > 0 is a penalty parameter and h is
a measure of the cell size. See Section 4.2.4 for more details. The UFL code for
this bilinear form for the case k = 3 is shown in Figure 3.9. The third example
is a complicated form which has arisen in modelling temperature-dependent
UFL code
element = FiniteElement("Lagrange", triangle, 3)
u = TrialFunction(element)
v = TestFunction(element)
n = VectorConstant(element.cell())
h = Constant(element.cell())
h_avg = 0.5*(h('+') + h('-'))
alpha = 10.0
a = inner(div(grad(u)), div(grad(v)))*dx \
- inner(avg(div(grad(u))), jump(grad(v), n))*dS \
- inner(jump(grad(u), n), avg(div(grad(v))))*dS \
+ alpha/h_avg*inner(jump(grad(u), n), jump(grad(v), n))*dS
Figure 3.9: UFL code for the stiffness matrix of a discontinuous Galerkin for-
mulation for the biharmonic equation using two-dimensional elements of order
three (3.13).
multiphase flow through porous media (Wells et al., 2008). It comes from the
approximate linearisation of a stabilised finite element formulation for a particular
problem and is characterised by standard Lagrange basis functions of low order but
the products of many functions from a number of different spaces. The physical
significance of the equation is unimportant in the context of this work, therefore it
is presented in an abstract form. The bilinear form reads:
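\[
\begin{aligned}
a(p, q) := \int_{\Omega} & f_0 g_2 g_3 g_4 \, p q
 - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla p \right) q \\
 & - g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla p \cdot \nabla q
 + f_0 g_2 g_3 g_4 \, p \, g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) \\
 & - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla p \right) g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) \\
 & - g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla^2 p \; g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) dx,
\end{aligned}
\tag{3.15}
\]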
UFL code
scalar_p = FiniteElement("Lagrange", triangle, 2)
scalar = FiniteElement("Lagrange", triangle, 1)
dscalar = FiniteElement("Discontinuous Lagrange", triangle, 0)
vector = VectorElement("Discontinuous Lagrange", triangle, 1)
p = TrialFunction(scalar_p)
q = TestFunction(scalar_p)
a_0 = p*g3*f0*g2*g4*q\
- (1 - g5)*inner(Sgu, grad(p))*q\
- S*inner(grad(p), grad(q))
a = (a_0 + a_1)*dx
Figure 3.10: UFL code for the ‘pressure equation’ (3.15) in two dimensions.
The coefficient functions are either prescribed or come from the solution of other
equations. The UFL input to the compiler for this form is shown in Figure 3.10. Due
to the origins of this form, it will informally be denoted as the ‘pressure equation’.
The three forms have been compiled with FFC using the tensor contraction
and quadrature representations. In Table 3.1, the time required to generate the
code, the size of the generated code and the time required to compile the C++
code are reported for each form. Results are presented for the tensor contraction
case, together with the ratio of the time/size for the quadrature representation case
divided by the time/size required for the tensor contraction representation case,
denoted by q/t. In measuring the C++ compile-time and the run-time performance,
Form                 generation [s]    q/t    size [kB]    q/t    C++ [s]    q/t
mixed Poisson                   6.3   0.79         4300   0.91       27.2   0.11
DG biharmonic                  23.4   0.04         4800   0.07       77.1   0.06
pressure equation               4.0   0.14         5300   0.05      356.0   0.01
Table 3.1: Timings and code size for the compilation phase for the various variational
forms. ‘generation’ is the time required by FFC to generate the tensor contraction
code; ‘size’ is the size of the generated tensor contraction code; and ‘C++’ is the
time required to compile the generated C++ code. The ratio q/t is the ratio between
quadrature and tensor contraction representations.
the generated code has been compiled against the library DOLFIN. Noteworthy
from the results in Table 3.1 is that the generation phase for the quadrature repre-
sentation is faster than the tensor contraction representation generation phase for
all forms. In all cases the size of the generated quadrature code is smaller than the
tensor contraction code, which is reflected in the C++ compile-time. The differences
in the C++ compile-time are substantial for all forms (more than a factor of one hundred
for the pressure equation), which is important during the code development phase
with frequent recompilations.²
Timings and operation counts for the three forms are presented in Table 3.2. The
number of floating point operations (flops) is defined as the sum of all ‘+’ and ‘∗’
operators in the code for computing the element matrix. Although multiplications
are generally more expensive than additions, this definition provides a good
measure for the performance of the generated code. The compound operator ‘+=’
is counted as one operation. For the run-time performance, the time required
to compute the local element tensors N times is recorded. The time needed to
insert the local tensor into the global sparse matrix is not included. For the mixed
Poisson problem N = 5 × 10^5 and for the discontinuous Galerkin biharmonic
2 It should be noted that the C++ compile-time reduces substantially for the tensor contraction
representation if no g++ optimisations are used (by approximately a factor of ten). The C++
compile-time for the quadrature representation is typically a couple of seconds irrespective of which
g++ optimisation option is used.
80 Chapter 3. Representations and optimisations of finite element variational forms
UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)
u = TrialFunction(element)
v = TestFunction(element)
a = u*v*dx
Figure 3.11: UFL code for the mass matrix in three dimensions with element order
q = 2.
problem and the pressure equation N = 1 × 10^6. Table 3.2 presents the timings and
operation counts for tensor contraction representation, together with the ratio of the
quadrature representation case and the tensor contraction representation case, q/t.
The run-time results illustrate an important aspect of the two representations: the
performance difference can be significant, depending on the nature of the differential
equation. For the mixed Poisson problem, the tensor contraction representation is
close to a factor of twenty faster than the quadrature representation, whereas for the
pressure equation the quadrature representation is close to a factor of seventy faster
than the tensor contraction case. Furthermore, the run-time performance ratio and
the flops ratio are of the same order of magnitude, suggesting a correlation between the
two. This observation of dramatic differences in run-time performance suggests the
possibility of devising a strategy for determining the best representation, without
generating the code for each case. Such concepts have been successfully developed
in digital signal processing (Püschel et al., 2005). For forms with a relatively simple
structure, devising such a scheme is straightforward. However, it turns out to be
non-trivial for arbitrary forms.
UFL code
element = VectorElement("Lagrange", tetrahedron, 3)
u = TrialFunction(element)
v = TestFunction(element)
def eps(v):
return grad(v) + grad(v).T
a = 0.25*inner(eps(u), eps(v))*dx
Figure 3.12: UFL code for the elasticity-like matrix in three dimensions with element
order q = 3.
The time required for insertion into a sparse matrix, which is independent of the
element matrix representation, is also reported. The total assembly time is the
‘run-time’ plus the ‘insertion’ time, which provides a picture of the overall assembly
performance. The ratio of the total assembly time for the quadrature representation
over the total assembly time for the tensor contraction representation, denoted
by a_q/a_t, is also presented. When taking this into account, for some forms the
difference in performance between different representations appears less drastic.
The various timings for the mass matrix problem are reported in Table 3.3. What
is clear from these results is that tremendous speed-ups for computing the element
matrices can be achieved using the tensor contraction representation, particularly
as the element order is increased. This is perhaps not surprising considering that
the geometry tensor for this case is simply a scalar, therefore the entire matrix is
essentially precomputed. Also note that the g++ compiler appears to be performing
particularly well for the tensor contraction representation in the two cases where
q = 2 and q = 3. For the case q = 3, the ratio of flops suggests that the run-time ratio
should be around one hundred while in fact it is close to 6500. However, as the number
of flops increases for the tensor contraction representation this effect disappears
and the two ratios become almost equal (compare 365 to 378 for the case q = 4).
The effect of the speed-up of computing the element matrix is reduced, however,
if the time required to insert terms into a sparse matrix is taken into account. For
the case of q = 4, the tensor contraction representation is a factor of 378 faster for
computing the element matrix, but when insertion is included an overall speed-up
factor of 9.72 is observed. Although this is a substantial speed-up, the efficiency of
matrix insertion must be addressed to reap the full benefits of the tensor contraction
approach for these types of problems. If in addition the time required to perform
the remaining parts of the finite element procedure such as mesh initialisation,
application of boundary conditions, and solving the resulting system of equations
is taken into account the q/t ratio will become even closer to unity.
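The damping effect of matrix insertion on the overall speed-up can be made explicit with a short calculation. The sketch below assumes that the insertion time is identical for both representations (as stated above) and that the quoted ratios 378 and 9.72 are exact; the function names are hypothetical. All times are expressed in units of the tensor contraction run-time.

```python
def total_assembly_ratio(run_ratio, insertion):
    """Return a_q/a_t for a run-time ratio q/t and an insertion time.

    Both arguments are measured in units of the tensor contraction run-time,
    so a_q = run_ratio + insertion and a_t = 1 + insertion.
    """
    return (run_ratio + insertion) / (1.0 + insertion)


def implied_insertion(run_ratio, total_ratio):
    """Invert the relation above to recover the insertion time."""
    return (run_ratio - total_ratio) / (total_ratio - 1.0)


# Values quoted in the text for the mass matrix with q = 4.
insertion = implied_insertion(378.0, 9.72)
print(f"insertion time is about {insertion:.1f} times the tensor run-time")
print(f"check: a_q/a_t = {total_assembly_ratio(378.0, insertion):.2f}")
```

Under these assumptions the insertion step takes roughly forty times longer than computing the element matrix with the tensor contraction representation, which is why the overall ratio falls so far below the run-time ratio.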
Table 3.3: Timings for the mass matrix in three dimensions for varying polynomial
order basis q.
Table 3.4: Timings for the elasticity-like matrix in three dimensions for varying
polynomial order basis q.
The various timings for the elasticity-like stiffness matrix are presented in
Table 3.4. Compared to the mass matrix, the differences in performance of the
tensor contraction representation relative to quadrature representation are less
dramatic, but nonetheless substantial, especially for higher-order functions.
that the flop count is a reasonably good indicator of performance, it is demonstrated in Section 3.5 that
this is not always the case.
UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)
element_f = FiniteElement("Lagrange", tetrahedron, 3)
u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element_f)
g = Coefficient(element_f)
a = f*g*u*v*dx
Figure 3.13: UFL code for the mass matrix in three dimensions with q = 2,
premultiplied by two coefficient functions (n_f = 2) of order p = 3.
multiplications in the forms and the polynomial order of these functions, before
introducing products of derivatives.
To generate forms of greater complexity than those in the previous section, the
mass matrix and elasticity-like variational forms with a Lagrange basis of order
q are premultiplied with n_f functions of order p. In case of the mass matrix, the
modified form reads:
\[
a(u, v) := \int_{\Omega} \left( \prod_{i=1}^{n_f} f_i \right) u v \, dx. \tag{3.20}
\]
An example of UFL code is shown in Figure 3.13 for the mass matrix pre-multiplied
by coefficient functions where q = 2, n_f = 2 and p = 3.
A comparison of the representations for the mass matrix with a different
number of premultiplying functions and a range of orders p and q is presented in
Table 3.5. In terms of flops, a ratio q/t > 1 indicates that the tensor representation
is more efficient while q/t < 1 indicates that the quadrature representation is more
efficient. What is clear from Table 3.5 is that with few premultiplying functions,
the tensor contraction approach is generally more efficient, even for relatively high
order premultiplying functions. The situation changes quite dramatically as the
                      n_f = 1          n_f = 2          n_f = 3          n_f = 4
                 flops    q/t     flops    q/t     flops    q/t     flops    q/t
p = 1, q = 1       156   1.86       580   1.61      2324   0.49      9492   0.21
p = 1, q = 2       648   7.18      3136   2.44     12512   1.68     52416   0.80
p = 1, q = 3      2700  28.68     12484  12.21     46628   3.29    205716   1.30
p = 1, q = 4      7994  57.62     38058  20.97    155850   5.13    622970   2.04
p = 2, q = 1       360   2.72      3472   0.63     36020   0.39    370020   0.08
p = 2, q = 2      1884   4.10     20236   2.12    203926   0.39   2044176   0.06
p = 2, q = 3      7656  19.95     79936   3.36    766628   0.57   8049636   0.08
p = 2, q = 4     23330  34.23    239550   5.32   2452810   0.78  24548810   0.11
p = 3, q = 1       700   1.93     14020   1.17    288020   0.13   5920020   0.02
p = 3, q = 2      3808   5.75     81136   1.02   1572608   0.09   FFC stopped
p = 3, q = 3     14740  10.53    315652   1.39   6380156   0.11         -      -
p = 3, q = 4     47850  16.78    980010   1.96  19602234   0.14         -      -
Table 3.5: The number of operations and the ratio between number of operations
for the two representations for the mass matrix in three dimensions as a function
of different polynomial orders and numbers of functions.
                      n_f = 1          n_f = 2          n_f = 3
                 flops    q/t     flops    q/t     flops    q/t
p = 1, q = 1      9928   0.13     42832   0.11    183088   0.03
p = 1, q = 2     80020   0.75    331228   0.51   1154620   0.16
p = 1, q = 3    405064   2.31   1466704   1.02   6806512   0.59
p = 1, q = 4   1426374   9.82   5920974   4.60  23425902   1.17
p = 2, q = 1     24940   0.19    268120   0.06   2758888   0.01
p = 2, q = 2    204760   0.82   2071972   0.14  21617452   0.07
p = 2, q = 3    902188   1.66  10789336   0.72   FFC stopped
p = 2, q = 4   3680298   7.43  37846422   1.25         -      -
p = 3, q = 1     19936   0.29    750880   0.04  21556504   0.01
p = 3, q = 2    367732   0.49   8611804   0.18   FFC stopped
p = 3, q = 3   2068552   1.93  43364368   0.31         -      -
p = 3, q = 4   7366950   3.71 152974350   0.50         -      -
Table 3.6: The number of operations and the ratio between number of operations
for the two representations for the elasticity-like tensor in three dimensions as a
function of different polynomial orders and numbers of functions.
could be attributed to the increased memory traffic noted by Kirby and Logg (2006).
Also, it may be that the compiler is unable to perform effective optimisations on
the unrolled code, or that the compiler is particularly effective at optimising the
loops in the generated quadrature code.
A similar comparison is made for elasticity-like forms and the results are
presented in Table 3.6. The trends in this table are similar to those observed for
the mass matrix. Again, FFC was stopped after one hour of generating code
for a number of the more complex forms when using the tensor contraction
representation. Code generation using the quadrature representation completes
in a few seconds for all cases. Compared to the mass matrix case, the number
of operations has increased significantly which has a big impact on both the FFC
generation time and the size of the generated code. As an example, FFC spent 63
minutes generating a file of 2.8 GB for the case where n_f = 2, p = 3 and q = 4 for
the tensor contraction representation. For the quadrature representation the code
was generated in 8.6 seconds and the resulting file size was 9.2 MB.
As seen in Table 3.6, increasing the number of coefficient functions n_f in the
form clearly works in favour of the quadrature representation. For n_f = 3 the quadrature
representation can be expected to perform best for all values of q and p even though
q/t = 1.17 for the case where p = 1 and q = 4. In this specific case the size of
the generated code for the tensor contraction representation is 442 MB which will
reduce the run-time performance as discussed previously, assuming that g++ is
able to compile the code at all. Increasing the polynomial order of the coefficients,
UFL code
element = VectorElement("Lagrange", triangle, 2)
element_f = VectorElement("Lagrange", triangle, 3)
u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element_f)
g = Coefficient(element_f)
a = div(f)*div(g)*inner(grad(u), grad(v))*dx
Figure 3.14: UFL code for the vector-valued Poisson problem in two dimensions
with q = 2, premultiplied by the divergence of two vector-valued functions
(n_f = 2) of order p = 3.
                      n_f = 1          n_f = 2
                 flops    q/t     flops    q/t
p = 1, q = 1       708   0.29      6148   0.07
p = 1, q = 2      2202   0.90     18394   0.13
p = 1, q = 3      8090   1.48     66394   0.19
p = 1, q = 4     22548   2.53    183892   0.32
p = 2, q = 1      1412   0.16     24580   0.04
p = 2, q = 2      7790   0.52    162766   0.03
p = 2, q = 3     24902   0.57    516606   0.05
p = 2, q = 4     60156   1.27   1246436   0.10
p = 3, q = 1      2116   0.30     96772   0.02
p = 3, q = 2     11862   0.36    545422   0.02
p = 3, q = 3     45086   0.54   1695358   0.03
p = 3, q = 4    110668   1.08   4093924   0.04
Table 3.7: The number of operations and the ratio between number of operations for
the two representations for the vector-valued Poisson problem in two dimensions
as a function of different polynomial orders and numbers of functions.
will be investigated using two forms, namely the bilinear form for the weighted
Laplace equation (3.1), see UFL input in Figure 3.1, and the bilinear form for the
Mooney–Rivlin hyperelasticity model from (2.33), page 36, in three dimensions.
The UFL input for the hyperelasticity model is seen in Figure 3.15. In both cases
quadratic Lagrange finite elements will be used.
All tests were performed using the same hardware and software setup as de-
scribed in the previous section with the small difference that the g++ compiler
options are varied. The two forms are compiled with the different FFC optimi-
sations, and the number of floating point operations (flops) to compute the local
element tensor is determined. The number of flops is defined as in the previous
section, that is, as the sum of all appearances of the operators ‘+’ and ‘*’ in the
code. The ratio between the number of flops of the current FFC optimisation
and the standard quadrature representation, o/q, is also computed. The generated
code is then compiled with g++ using four different optimisation options for
g++, and the time needed to compute the element tensor N times is measured.
In the following, -zeros will be used as shorthand for the -f eliminate_zeros
option, -simplify is shorthand for the -f simplify_expressions option, -ip is
shorthand for the -f precompute_ip_const option and -basis is shorthand for the
-f precompute_basis_const option.
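The flop measure defined above is simple enough to sketch directly. The following is a minimal illustration and not the counter implemented in FFC: it counts every '+' and '*' in a string of generated code, treats a compound operator such as '+=' as a single operation, and ignores the increment operator '++'. Whether FFC excludes loop increments or index arithmetic is not stated in the text, so both choices are assumptions; the names count_flops and flops_ratio are hypothetical.

```python
import re

# Longest alternatives first, so that '+=' and '++' are not split into two '+'.
_OPERATORS = re.compile(r"\+\+|\+=|\*=|\+|\*")


def count_flops(code):
    """Count '+' and '*' operators; compound assignments count once, '++' is skipped."""
    return sum(1 for token in _OPERATORS.findall(code) if token != "++")


def flops_ratio(flops_optimised, flops_standard):
    """Ratio o/q between an optimised variant and the standard quadrature code."""
    return flops_optimised / flops_standard


statement = "A[j*3 + k] += B[14]*I[0];"
print(count_flops(statement))                 # index arithmetic is counted too
print(round(flops_ratio(1920, 4176), 2))      # -simplify -zeros against None, Table 3.8
```

A purely textual count of this kind cannot distinguish a multiplication from a pointer dereference, so it is only meaningful for code in the restricted style produced by the form compiler.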
The operation counts for the weighted Laplace equation with different FFC
optimisations can be seen in Table 3.8, while Figure 3.16 shows the run-time
performance for different compiler options for N = 5 × 10^7. The FFC compiler
UFL code
element = VectorElement("Lagrange", tetrahedron, 2)
w = TestFunction(element)
du = TrialFunction(element)
u = Coefficient(element)
c1 = Constant(tetrahedron)
c2 = Constant(tetrahedron)
Figure 3.15: UFL input for the Mooney–Rivlin hyperelasticity model in three
dimensions using quadratic elements. It is the bilinear form, the Jacobian J, which
is of interest in the performance comparison.
3.5. Performance comparisons of quadrature optimisations 89
FFC optimisation     flops    o/q
None                  4176   1.00
-zeros                6672   1.60
-simplify             2712   0.65
-simplify -zeros      1920   0.46
-ip                   3756   0.90
-ip -zeros            4290   1.03
-basis                3756   0.90
-basis -zeros         3690   0.88
options can be seen on the x-axis in the figure and the four g++ compiler options
are shown with different colors.
The FFC and g++ compile-times were less than one second for all optimisation
options. It is clear from Figure 3.16 that run-time performance is greatly influenced
by the g++ optimisations. Compared to the case where no g++ optimisations are
used (the -O0 flag), the run-time for the standard quadrature code improves by a
factor of 4.70 when using the -O2 option, 6.86 when using the -O2 -funroll-loops
option and 10.65 when using the -O3 option. The -O3 option does not appear to
improve the run-time noticeably beyond the improvement observed for the -O2
-funroll-loops option when the FFC optimisation option -zeros is used. Using
the FFC optimisation option -zeros alone for this form does not improve run-
time performance. In fact, using this option in combination with any of the other
optimisation options increases the run-time, even when combining with the option
-simplify, which has a significantly lower operation count compared to the standard
quadrature representation. A curious point to note is that without g++ optimisation
there is a significant difference in run-time for the -ip and -basis options, even
though they involve the same number of flops. When g++ optimisations are
switched on, this difference is eliminated completely and the run-times for the two
FFC optimisations are identical. This suggests that it is not possible to predict run-
time performance from the operation count alone since the type of FFC optimisation
must be taken into account as well as the intended use of g++ compiler options.
The optimal combination of optimisations for this form is FFC option -ip or -basis
combined with g++ option -O2 -funroll-loops, in which case the run-time has
improved by a factor of 12.3 compared to standard quadrature code with no g++
optimisations.
The operation counts and FFC code generation time for the bilinear form for
hyperelasticity with different FFC optimisations are presented in Table 3.9, while
Figure 3.17 shows the run-time performance for different compiler options for
[Plot: time [s] on a logarithmic axis (10^1 to 10^3) against the FFC optimisation options, with one series for each g++ option: -O0, -O2, -O2 -funroll-loops and -O3.]
Figure 3.16: Run-time performance for the weighted Laplace equation for different
compiler options. The x-axis shows the FFC compiler options, and the colors denote
the g++ compiler options.
Table 3.9: FFC code generation times and operation counts for the hyperelasticity
example.
[Plot: time [s] on a logarithmic axis (10^0 to 10^4) against the FFC optimisation options, with one series for each g++ option: -O0, -O2, -O2 -funroll-loops and -O3.]
Figure 3.17: Run-time performance for the hyperelasticity example for different
compiler options. The x-axis shows the FFC compiler options, and the colors denote
the g++ compiler options.
well for this example compared to the weighted Laplace problem. The reason is
that the nature of the hyperelasticity form results in a relatively complex expression
to compute the entries in the local element tensor. However, this expression only
consists of a few different variables (components of the inverse of the Jacobian and
basis function values) which makes the -simplify option very efficient since many
terms are common and can be precomputed and hoisted. For the hyperelasticity
form, the optimal combination of optimisations is FFC option -simplify -zeros
and g++ option -O2 -funroll-loops. This combination improves the run-time
performance by approximately one order of magnitude compared to all other FFC
options when g++ optimisations are included. Compared to the case where no
optimisation is used by either FFC or g++, the run-time performance of the code is
improved by a factor of 744.
For the considered examples, it is clear that no single optimisation strategy is
the best for all cases. Furthermore, the generation phase optimisations that one
can best use depend on which optimisations are performed by the g++ compiler.
It is also very likely that different C++ compilers will give different results for
the test cases presented in this section. The general recommendation for selecting
the appropriate optimisation for production code will therefore be that the choice
should be based on a benchmark program for the specific problem.
In this chapter it has been illustrated how the run-time performance of the generated
code for variational forms can be improved by using various optimisation options
for the FFC and g++ compilers, and by changing the representation of the form.
Numerical experiments have shown that the relative run-time performance of
the two representations can differ substantially depending on the nature of the
considered variational form. In general, the tensor contraction approach deals
well with forms which involve high-order bases and few coefficient functions,
whereas the quadrature representation is more efficient as the number of coefficient
functions (other than constant coefficients) and derivatives in a form increases.
Hence, in general the quadrature representation is significantly faster for more
complicated forms.
In an automated modelling framework, like FEniCS, it seems natural to attempt
to select the most favourable representation automatically. When comparing the two
representations in Section 3.4 it was found that the operation count is a reasonably
good indicator for which form will exhibit the best run-time performance. FFC
presently computes the operation count for the code which is generated, on the
basis of which a choice could be made, but this involves generating computer code
for each case which can be time consuming. Ideally, the form compiler would
select the best representation based on an a priori inspection of the form. It turns
3.7. Future optimisations 93
out, however, that this is a non-trivial task if the goal is a general approach which
holds for any form which FFC can handle. Furthermore, as it has been shown
in the previous section, the code with the lowest number of flops, at least for the
quadrature representation, does not always perform best for a given form. Finally,
the run-time performance even depends on which g++ options are used. A strategy
for selecting between representations based only on an estimation of flops does,
therefore, not seem feasible.
Choosing the combination of form representation and optimisation options
that leads to optimal performance will inevitably require a benchmark study of
the specific problem. However, very often many variational forms of varying
complexity are needed to solve more complex problems. Setting up benchmarks
for all of them is cumbersome and time consuming. Additionally, during the model
development stage run-time performance is of minor importance compared to
rapid prototyping of variational forms as long as the generated code performs
reasonably well.
The default behavior of FFC is, therefore, to automatically determine which
form representation should be used based on a measure for the cost of using the
tensor representation. In short, the cost is simply computed as the maximum value
of the sum of the number of coefficients and derivatives present in the monomials
representing the form. If this cost is larger than a specified threshold, currently
set to three, the quadrature representation is selected. Recall from Table 3.6 that
when n_f = 3 the flop count for the quadrature representation was significantly lower for
virtually all the test cases. Although this approach may seem ad hoc, it will work
well for those situations where the difference in run-time performance is significant.
It is important to remember that the generated code is only concerned with the
evaluation of the local element tensor and that the time needed to insert the values
into a sparse matrix and to solve the system of equations will reduce any difference,
particularly for simple forms. Therefore, making a correct choice of representation
is less important for forms where the difference in run-time performance is small.
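The selection rule described above can be sketched in a few lines. This is an illustration of the stated rule only and not the FFC implementation: each monomial of the form is represented by a hypothetical dictionary holding hand-counted numbers of coefficients and derivatives, and the function names are assumptions. How FFC extracts these counts from a form is not covered here.

```python
def tensor_cost(monomials):
    """Maximum over all monomials of (number of coefficients + number of derivatives)."""
    return max(m["coefficients"] + m["derivatives"] for m in monomials)


def select_representation(monomials, threshold=3):
    """Select quadrature when the cost exceeds the threshold, otherwise tensor."""
    return "quadrature" if tensor_cost(monomials) > threshold else "tensor"


# Hand-counted examples (the counts are assumptions made for illustration):
mass = [{"coefficients": 0, "derivatives": 0}]            # u*v*dx
weighted_mass = [{"coefficients": 2, "derivatives": 0}]   # f*g*u*v*dx
premultiplied = [{"coefficients": 2, "derivatives": 4}]   # div(f)*div(g)*inner(grad(u), grad(v))*dx

for name, form in [("mass", mass), ("weighted mass", weighted_mass),
                   ("premultiplied Poisson", premultiplied)]:
    print(name, "->", select_representation(form))
```

The rule is deliberately cheap to evaluate, since it only inspects the structure of the form and never generates code for either representation.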
A future improvement could be to devise a strategy for also letting the system
select the optimisation strategy for the quadrature representation automatically.
Regardless of whether it is possible to define an optimal strategy for automatically
selecting the representation (and possibly the optimisation), the applicability of
automated modelling is definitely extended by having both tensor contraction and
quadrature representations, and their optimisations, as part of the computational
arsenal.
The optimisations proposed in Section 3.3.5 for the quadrature representation are
primarily concerned with the run-time performance of the generated code and the
strategies follow along similar lines as the ones already implemented and discussed
in Section 3.3. However, as the number of FEniCS users has increased, so has the
complexity of the problems that users are trying to solve. In Section 3.4 it was
demonstrated that, for some of the more complicated forms, the tensor contraction
representation can take hours to generate code for the given problem and that
the size of the generated code can become very large. For very complex forms,
typically nonlinear forms that are linearised automatically by UFL, similar trends
can be observed also for the quadrature representation. It is, therefore, necessary
to develop new strategies for the code generation process to reduce the generation
time and the size of the generated code.
Two possible approaches that could be investigated are outlined below. Currently,
the code to compute derivatives of, for instance, basis functions, like the term
$\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_{\beta}} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}$
in (3.4), is located inside the loop over
basis function indices j and k, see for instance Figure 3.2. From the UFL input
UFL code
element = FiniteElement("Lagrange", triangle, 1)
u = TrialFunction(element)
v = TestFunction(element)
a = inner(grad(u),grad(v))*dx
the generated code for the loop over basis function indices will be
C++ code
for (unsigned int j = 0; j < 3; j++)
{
for (unsigned int k = 0; k < 3; k++)
{
A[j*3 + k] += (((K_00*FE0_D10[0][j] + K_10*FE0_D01[0][j]))*
((K_00*FE0_D10[0][k] + K_10*FE0_D01[0][k])) +
((K_01*FE0_D10[0][j] + K_11*FE0_D01[0][j]))*
((K_01*FE0_D10[0][k] + K_11*FE0_D01[0][k])))*W1*det;
}
}
which is almost identical to that in Figure 3.2. However, the only difference between
the code to compute the derivative of u and v is the loop index because u and v are
defined using the same finite element. Thus precomputing the derivatives outside
the loop will lead to a reduction in the code size (and in the number of operations
needed). The improved code for the given case would then become:
C++ code
double FE0_d0[3];
double FE0_d1[3];
for (unsigned int r = 0; r < 3; r++)
{
FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}
The drawback of this approach is that the optimisations discussed in Section 3.3,
particularly the -f simplify_expressions optimisation, could be less effective as fewer common
expressions involving the geometry constants like K_00 will be present.
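The effect of moving the derivative computation out of the loop over j and k can be checked with a small numerical sketch. The two functions below mirror the structure of the generated code for a single quadrature point, written in Python for brevity; the function names, the layout of K as a nested list and the input values are assumptions made for illustration. The reference derivatives used are those of the linear Lagrange basis 1 - x - y, x, y on the reference triangle.

```python
def tensor_in_loop(K, D10, D01, scale):
    """Transformed derivatives are recomputed for every pair (j, k)."""
    n = len(D10)
    A = [0.0] * (n * n)
    for j in range(n):
        for k in range(n):
            A[j * n + k] += (
                (K[0][0] * D10[j] + K[1][0] * D01[j]) * (K[0][0] * D10[k] + K[1][0] * D01[k])
                + (K[0][1] * D10[j] + K[1][1] * D01[j]) * (K[0][1] * D10[k] + K[1][1] * D01[k])
            ) * scale
    return A


def tensor_precomputed(K, D10, D01, scale):
    """Transformed derivatives are computed once, outside the loop over (j, k)."""
    n = len(D10)
    d0 = [K[0][0] * D10[r] + K[1][0] * D01[r] for r in range(n)]
    d1 = [K[0][1] * D10[r] + K[1][1] * D01[r] for r in range(n)]
    A = [0.0] * (n * n)
    for j in range(n):
        for k in range(n):
            A[j * n + k] += (d0[j] * d0[k] + d1[j] * d1[k]) * scale
    return A


D10 = [-1.0, 1.0, 0.0]          # d/dX of the reference basis functions
D01 = [-1.0, 0.0, 1.0]          # d/dY of the reference basis functions
K = [[1.0, 0.0], [0.0, 1.0]]    # identity map (illustrative value)
scale = 0.5                     # plays the role of W1*det (illustrative value)

print(tensor_in_loop(K, D10, D01, scale))
print(tensor_precomputed(K, D10, D01, scale))
```

Both variants produce the same element tensor; the second evaluates each transformed derivative three times in total rather than once for every entry of A.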
To reduce the size of the code even further (and possibly also improve run-
time performance), a linear algebra library, for instance Armadillo (http://arma.
sourceforge.net/) could be employed to perform block operations using optimised
BLAS. The generated code will then become:
C++ code
arma::vec FE0_d0(3);
arma::vec FE0_d1(3);
for (unsigned int r = 0; r < 3; r++)
{
FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}
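// Local tensor as a sum of outer products (assumed: R is not defined in the listing)
arma::mat R = (FE0_d0*FE0_d0.t() + FE0_d1*FE0_d1.t())*W1*det;
// Copy values to A
double* p = R.memptr();
for (int r=0; r<9; r++)
A[r] = p[r];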
In the given case, the size of the code has not been reduced significantly. The
approach will be particularly effective in situations involving, for instance, the
inverse operator in UFL. The inverse operator in UFL (only defined for 1 × 1, 2 × 2
and 3 × 3 matrices) is hardcoded as a function of the matrix components. This
leads to a very complex expression inside the loop over basis functions when
following the conventional quadrature approach which can be substituted by a
simple function call to arma::inv.4 The strategy outlined above could have a
negative influence on the run-time performance due to overhead in the linear
algebra library or by making it more difficult for the g++ compiler to perform
optimisations.
4 This approach might not be feasible for linearisations of the inverse when using the automatic
of FFC which was later merged into UFL. The code examples from the paper have also been updated to
be compliant with FEniCS version 1.0.
98 Chapter 4. Automation of discontinuous Galerkin methods
[Figure 4.1: two cells T + and T − sharing a common facet S.]
et al. (2006); Wells and Dung (2007); Labeur and Wells (2007).
where $\Gamma_0$ denotes the set of all interior facets of the triangulation $\mathcal{T}_h$ and
$\llbracket v \rrbracket$ denotes the jump in the function value of v across the facet S:
$$\llbracket v \rrbracket = v^+ - v^-. \tag{4.2}$$
Here, $v^+$ and $v^-$ denote the values of v on the facet S as seen from the two cells
T + and T − incident with S, respectively (see Figure 4.1). Note that each interior
facet is incident to exactly two cells which may be labelled T + and T − . The union
of these two cells, T = T + ∪ T − , will be referred to as the macro cell.
In order to handle variational forms such as (4.1) in the FEniCS framework,
additional functionality is needed in a number of components. Obviously, UFL
must be extended to support the definition of integrals over interior facets. These
integrals may involve functions which can be evaluated on either of the two cells
incident to the interior facet. DOLFIN must be extended to support assembly of
multilinear forms containing interior facet integrals which in turn requires the
UFC interface to be extended with a new integral class. As UFC is only concerned
with the interface of this class, FFC must support code generation for interior
facet integrals defined using the UFL syntax. The following sections describe the
extensions that have been developed in each of these four components.
4.1. Extending the framework to discontinuous Galerkin methods 99
As DOLFIN relies on the UFC interface when evaluating local finite element tensors,
the UFC interface must define the tabulate_tensor function also for interior facet
integrals. This function is provided by the class ufc::interior_facet_integral
and the interface is
C++ code
/// Tabulate the tensor for the contribution from a local interior facet
virtual void tabulate_tensor(double* A, const double * const * w, const cell&
c0, const cell& c1, unsigned int facet0, unsigned int facet1) const = 0;
where A is a pointer to an array which will hold the values of the local element
tensor and w contains nodal values of any coefficient functions present in the
integral. The two cells c0 and c1 correspond to the cells T + and T − incident with
the given facet S while facet0 and facet1 are the local indices of the facet S relative
to the cells c0 and c1 respectively. This is illustrated in Figure 1.6b, page 19, where
the local facet (edge) index of the shared facet is e0 relative to one cell while it
is e2 relative to the other cell. The implication of this aspect is elaborated in the
following section.
FFC must also be extended in order to generate code for the new integral class in
UFC to evaluate the local facet tensor. In Section 3.2.2, it was shown how the cell
tensor (element tensor) can be computed from the tensor representation
Similarly, one may use the affine mappings (defined in Section 3.2.1) FT + and FT − to
obtain a tensor representation for the interior facet tensor AS . However, depending
on the topology of the macro cell T, one obtains different tensor representations.
For a triangular mesh, each cell has three facets (edges) and there are thus 3 × 3 =
9 different topologies to consider; there are nine different ways in which two
edges can meet. Similarly, for a tetrahedral mesh, there are 4 × 4 = 16 different
topologies to consider. Notice that this is only true because FFC assumes the UFC
numbering convention of mesh entities, outlined in Section 1.3.4 and illustrated in
Figure 1.6b, which guarantees that two incident simplicial cells always agree on
the orientation of an incident facet. If no particular ordering of the mesh entities is
assumed, one needs to consider 3 × 3 × 2 = 18 different topologies for triangles
and 4 × 4 × 6 = 96 topologies for tetrahedra. This is because there are two different
ways to superimpose two edges, and there are six different ways to superimpose
two faces. The tensor representation for the interior facet tensor can then be written
in the form
$$A_{S,i} = \sum_{\alpha} A^{0, f^+(S), f^-(S)}_{i\alpha} \, G_{T(S)}^{\alpha}, \tag{4.4}$$
where $f^+$ and $f^-$ denote the local numbers of the two facets that meet at S relative
to the two cells T + and T − respectively. Note that the geometry tensor $G_T^{\alpha}$ in (4.3)
involves the mapping from the reference cell and differs from the geometry tensor
$G_{T(S)}^{\alpha}$ in (4.4), which may involve the mapping from the reference cell and the
mapping from the reference facet. The reference tensor $A^{0, f^+, f^-}$ is precomputed
for each facet–facet combination $(f^+, f^-)$ and a run-time decision must be made
as to which reference tensor should be contracted with the geometry tensor.
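The run-time decision described above can be pictured as a table lookup keyed on the facet pair, followed by a contraction with the geometry tensor. The following plain-Python sketch is an illustration under stated assumptions, not the code FFC generates: the function names are hypothetical and the table is filled with placeholder numbers instead of real reference tensors.

```python
from itertools import product


def facet_pairs(facets_per_cell):
    """All (f_plus, f_minus) combinations of local facet numbers."""
    return list(product(range(facets_per_cell), repeat=2))


def make_reference_table(facets_per_cell, n_entries, n_geo):
    """Stand-in for the precomputed reference tensors (placeholder numbers)."""
    table = {}
    for f_plus, f_minus in facet_pairs(facets_per_cell):
        base = 1.0 + f_plus*facets_per_cell + f_minus
        table[(f_plus, f_minus)] = [
            [base + i + a for a in range(n_geo)] for i in range(n_entries)
        ]
    return table


def facet_tensor(table, f_plus, f_minus, G):
    """Select the reference tensor at run time and contract it with G."""
    A0 = table[(f_plus, f_minus)]
    return [sum(row[a]*G[a] for a in range(len(G))) for row in A0]
```

For triangles `facet_pairs(3)` has 9 entries and for tetrahedra `facet_pairs(4)` has 16, matching the topology counts given in the text when the UFC numbering convention is assumed.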
The FFC machinery which generates code for each facet–facet combination
based on UFL expressions is largely unaffected by the extensions to discontinu-
ous Galerkin methods. As a consequence, the quadrature representation can be
extended in a similar fashion taking into account the differences between the two
representations described in Section 3.2. Furthermore, the optimisations presented
in Section 3.3 also apply to variational forms containing interior facet integrals.
To assemble the global sparse tensor A for variational forms that contain integrals
over interior facets as in (4.1), one may extend the standard assembly algorithm
over the cells of the computational mesh (see Algorithm 1, page 27) by including
an iteration over the interior facets of the mesh. The approach is described for
the bilinear form in (4.1) where, for ease of notation, it is assumed that u, v ∈ V.
Adopting the notation from Section 1.3.5 the tensor A which arises from assembling
the bilinear form in (4.1) can be expressed as:
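$$A_I = a\left(\phi_{I_2}, \phi_{I_1}\right) = \sum_{S} a_S\left(\phi_{I_2}, \phi_{I_1}\right) = \sum_{S} \int_{S} \llbracket u \rrbracket \llbracket v \rrbracket \, ds, \tag{4.5}$$
where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a global (possibly discontinuous)
basis for V.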
To assemble the global sparse tensor A efficiently by iterating over the interior
facets of the mesh, a local-to-global mapping that maps the basis functions on
the local facet S to the set of global basis functions is needed. This mapping is
constructed by considering two cells T + and T − sharing a common facet S as
shown in Figure 4.1. Let $\{\phi_k^{T^+}\}_{k=1}^{n}$ and $\{\phi_k^{T^-}\}_{k=1}^{n}$ denote the local finite element
basis on T + and T − respectively. These local basis functions are now extended to
the macro cell T:
$$\bar{\phi}_k^{T}(x) = \begin{cases}
\phi_k^{T^+}(x), & k = 1, 2, \ldots, n, & x \in T^+, \\
0, & k = 1, 2, \ldots, n, & x \in T^-, \\
0, & k = n+1, n+2, \ldots, 2n, & x \in T^+, \\
\phi_{k-n}^{T^-}(x), & k = n+1, n+2, \ldots, 2n, & x \in T^-.
\end{cases} \tag{4.6}$$
The local basis functions on T + and T − are thus extended to T by zero to obtain a
local finite element space, $\{\bar{\phi}_k^{T}\}_{k=1}^{2n}$, on T of dimension 2n. Recall from Section 1.3.5
that, for each $T \in \mathcal{T}_h$, $\iota_T^j : [1, n] \to [1, N]$ denotes the local-to-global mapping for
each discrete function space $V_j$. The local-to-global mapping for T (or S), $\iota_T^j$, can
then be obtained by the construction (4.6) such that
$$\iota_T^j(1) = \iota_{T^+}^j(1), \ldots, \iota_T^j(n) = \iota_{T^+}^j(n), \quad \iota_T^j(n+1) = \iota_{T^-}^j(1), \ldots, \iota_T^j(2n) = \iota_{T^-}^j(n). \tag{4.7}$$
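The construction in (4.7) amounts to concatenating the degree-of-freedom lists of the two cells. A minimal plain-Python sketch follows; the function names are hypothetical and zero-based indices are used instead of the one-based indices of the text.

```python
def macro_dofmap(dofs_plus, dofs_minus):
    """Local-to-global map on the macro cell: dofs of T+ followed by dofs of T-."""
    return list(dofs_plus) + list(dofs_minus)


def is_injective(mapping):
    """True when no global index is hit by two macro-local indices."""
    return len(set(mapping)) == len(mapping)


# Discontinuous elements: the two cells own disjoint sets of global dofs.
dg_map = macro_dofmap([0, 1, 2], [3, 4, 5])
# Continuous elements: the dofs on the shared facet appear on both cells.
cg_map = macro_dofmap([0, 1, 2], [1, 2, 3])
```

With these example index lists `is_injective(dg_map)` holds while `is_injective(cg_map)` does not, which is the distinction drawn in the text between discontinuous and continuous elements.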
The local interior facet tensor $A_S$ can now be defined. Consider first the case
when $\iota_T^j$ is an injective mapping and note that $\iota_T^j$ is injective when the ranges of
$\iota_{T^+}^j$ and $\iota_{T^-}^j$ are disjoint (which is the case for discontinuous elements). Continuing
from (4.5), the tensor A can be computed from
$$A_I = \sum_{S} a_S\left(\phi_{I_2}, \phi_{I_1}\right) = \sum_{S} a_S\left(\bar{\phi}^{T}_{\iota_T^{-1}(I_2)}, \bar{\phi}^{T}_{\iota_T^{-1}(I_1)}\right) = \sum_{S} A_{S, \iota_T^{-1}(I_1), \iota_T^{-1}(I_2)}, \tag{4.8}$$
where $i = (i_1, i_2)$ is a multi-index. Note that the size of $A_S$, due to the construction
in (4.6), is 2n × 2n and not n × n as would be the case for a local cell tensor $A_T$.
Similar to (1.11) and (1.12), the collective local-to-global mapping for each $S \in \Gamma_0$ is
defined as
$$\iota_T(i) = \left(\iota_T^1(i_1), \iota_T^2(i_2)\right) \quad \forall\, i \in \mathcal{I}_T, \tag{4.10}$$
$$\mathcal{I}_T = \prod_{j=1}^{2} [1, 2n] = \left\{(1, 1), (1, 2), \ldots, (2n, 2n-1), (2n, 2n)\right\}. \tag{4.11}$$
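may still assemble the global tensor A by Algorithm 2 and compute the interior
facet tensor according to (4.9). To see this, assume that $\iota_T^1(i_1) = \iota_T^1(i_1') = I_1$ for
some $i_1 \neq i_1'$. It then follows that the entry $A_{I_1, \iota_T^2(i_2)}$ will be a sum of the two terms
$A_{S,i_1,i_2}$ and $A_{S,i_1',i_2}$ (and possibly other terms). Since $a_S$ is bilinear, we have
$$\begin{aligned}
A_{S,i_1,i_2} + A_{S,i_1',i_2} &= a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1}\right) + a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1'}\right) \\
&= a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1} + \bar{\phi}^{T}_{i_1'}\right) = a_S\left(\bar{\phi}^{T}_{i_2}, \phi_{I_1}\right),
\end{aligned} \tag{4.12}$$
where by the construction (4.6) $\phi_{I_1}$ is the global basis function that both $\bar{\phi}^{T}_{i_1}$ and $\bar{\phi}^{T}_{i_1'}$
are mapped to.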
DOLFIN implements Algorithm 2 in the assemble function. To compute the
local contribution aS , DOLFIN calls the tabulate_tensor function for interior facet
integrals using the interface described in Section 4.1.2. For each discrete function
space DOLFIN calls the tabulate_dofs function, see Section 1.3.4, on the cells T +
and T − to construct the local-to-global mapping $\iota_T^j$ from (4.7). These mappings
are then used by DOLFIN to construct the collective local-to-global mapping $\iota_T$
from (4.10).
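The assembly over interior facets can be illustrated on the simplest possible case. The sketch below is an assumption-laden toy, not DOLFIN's assembler: it uses a one-dimensional mesh of intervals with piecewise constant elements, so that each interior facet is a point, the macro tensor is 2 × 2, and the facet integral of the jump form in (4.5) reduces to a product of jumps. The function name and the zero-based numbering are hypothetical.

```python
def assemble_jump_form(n_cells):
    """Assemble the jump form of (4.5) for P0 elements on a 1D interval mesh."""
    A = [[0.0]*n_cells for _ in range(n_cells)]
    # Macro basis: function 0 lives on T+, function 1 on T-.
    # With the jump defined as v+ - v-, their jumps at the facet are +1 and -1.
    jumps = [1.0, -1.0]
    A_S = [[jumps[i1]*jumps[i2] for i2 in range(2)] for i1 in range(2)]
    for s in range(n_cells - 1):
        # Interior facet s separates cell s (taken as T+) and cell s+1 (T-).
        dofmap = [s, s + 1]  # macro map: dofs of T+ followed by dofs of T-
        for i1 in range(2):
            for i2 in range(2):
                A[dofmap[i1]][dofmap[i2]] += A_S[i1][i2]
    return A
```

For three cells this yields the matrix with rows (1, -1, 0), (-1, 2, -1) and (0, -1, 1): every interior facet couples exactly the two cells incident with it.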
4.2 Examples
The developments described in the previous section extend the applicability of the
FEniCS framework to a new range of problems. In this section, it is demonstrated
how the extensions make it possible to apply discontinuous Galerkin formulations
to a number of problems. The examples are presented in the usual form: find
$u \in V$ such that
$$a(u, v) = L(v) \quad \forall\, v \in \hat{V}, \tag{4.13}$$
where V is the trial space and V̂ is the test space, a (u, v) and L (v) denote the
bilinear form and linear form respectively. Some of the examples are presented as
complete DOLFIN solvers while others only present the UFL input for the bilinear
and linear forms of the corresponding problem. For all examples, the test and trial
functions are assumed to come from the same function space, that is, V̂ = V.
and
$$L(v) := \int_{\Omega} f v \, dx, \tag{4.16}$$
where α > 0 is a penalty parameter and h is a measure for the cell size defined as
$h = (h^+ + h^-)/2$ with $h^+$ and $h^-$ denoting the cell size for the two cells, T + and
T − respectively, incident with the given interior facet. Due to the term involving
the penalty parameter α this formulation is commonly referred to as an interior
penalty (IP) formulation. The size of a cell is defined here as twice the circumradius.
The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket v \rrbracket = v^+ n^+ + v^- n^-$ and
$\langle \nabla v \rangle = (\nabla v^+ + \nabla v^-)/2$ on the set of interior facets, $\Gamma_0$, and $\llbracket v \rrbracket = v n$ on ∂Ω.
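The cell size used in the penalty term can be computed directly from the vertex coordinates of a triangle. The sketch below uses the standard formula R = abc/(4·area) for the circumradius; the function names are hypothetical and the code is independent of DOLFIN's CellSize.

```python
import math


def cell_size(p0, p1, p2):
    """Twice the circumradius of the triangle with vertices p0, p1, p2."""
    a = math.hypot(p1[0] - p2[0], p1[1] - p2[1])
    b = math.hypot(p0[0] - p2[0], p0[1] - p2[1])
    c = math.hypot(p0[0] - p1[0], p0[1] - p1[1])
    area = abs((p1[0] - p0[0])*(p2[1] - p0[1])
               - (p2[0] - p0[0])*(p1[1] - p0[1])) / 2.0
    return 2.0 * a*b*c / (4.0*area)


def facet_cell_size(h_plus, h_minus):
    """Average of the two cell sizes incident with an interior facet."""
    return (h_plus + h_minus) / 2.0
```

For a right-angled triangle the circumcentre lies at the midpoint of the hypotenuse, so `cell_size` returns the length of the hypotenuse.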
A domain and source term identical to those used in Section 1.3.5 are considered,
that is, $\Omega = [0, 1] \times [0, 1]$ and $f = 8\pi^2 \sin(2\pi x) \sin(2\pi y)$. The corresponding
DOLFIN solver for this problem is shown in Figure 4.2 for linear polynomials on
triangular elements. Note in particular how the form and syntax of the definitions
of the bilinear and linear forms (a and L) resemble closely the mathematical notation
in (4.15) and (4.16). Also, note the close resemblance with the code in Figure 1.10,
page 25, for the continuous solution. This demonstrates the ease of switching
between formulations for a given problem by only changing the definitions of the
bilinear and linear forms in the computational setup. The functions FacetNormal
and CellSize are convenience functions implemented in DOLFIN according to the
definitions above. Because the solution is computed on discontinuous elements it is
common to project the solution onto a continuous basis for visualisation. This can
be accomplished easily by using the project function in DOLFIN. The computed
solution for the Poisson problem, projected onto a piecewise linear basis, is seen in
Figure 4.3, which is almost identical to the solution presented in Figure 1.9, page 24,
for the continuous case.
and
$$L(v) := \int_{\Gamma_D} \frac{\alpha}{h} g v \, ds - \int_{\Gamma_D} \llbracket g \rrbracket \cdot \nabla v \, ds - \int_{\Gamma_D} \nabla g \cdot \llbracket v \rrbracket \, ds, \tag{4.18}$$
where the vector b is a given velocity field, $u^\star$ is equal to u restricted to the upwind
side of a facet,
$$u^\star = \begin{cases} u^+ & b \cdot n^+ > 0, \\ u^- & b \cdot n^+ < 0, \end{cases} \tag{4.19}$$
κ is the diffusion coefficient, $\Gamma_D$ is the part of the boundary where the Dirichlet
condition u = g is applied. The definitions of the jump and average operators and
the parameters h and α are the same as for the Poisson equation.
Again, the unit square, $\Omega = [0, 1] \times [0, 1]$, is considered with $g = \sin(5\pi y)$
applied on the boundary at x = 1 and a constant velocity field of b = (−3, −2).
The diffusion coefficient κ is set to zero in which case the DOLFIN solver for this
problem can be implemented as shown in Figure 4.4 for linear triangular elements.
The implementation is again a reflection of the mathematical formulation with
a small exception regarding the upwind value $u^\star$. In the code, the variable bn
is computed as $bn = (b \cdot n + |b \cdot n|)/2$. Relative to the two elements T + and T −
Python code
from dolfin import *
# Define normal component, mesh size, penalty parameter and right-hand side
n = FacetNormal(mesh)
h = CellSize(mesh)
h_avg = (h('+') + h('-'))/2
alpha = 4.0
x = V.cell().x
f = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])
# Compute solution
u = Function(V)
solve(a == L, u)
# Plot solution
plot(u_proj, interactive=True)
Figure 4.2: Complete DOLFIN solver for the interior penalty method applied to the
Poisson equation on a unit square using k = 1.
Figure 4.3: Computed solution of the Poisson problem. The solution has been
projected onto a piecewise linear basis for visualisation. The warped scalar field u
has been scaled by a factor of 0.5.
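Recalling that jump(v) in UFL is equivalent to $v^+ - v^-$, the line in the code concerning
upwinding on interior facets, dot(jump(v), bn('+')*u('+') - bn('-')*u('-')),
is equivalent to:
$$\begin{aligned}
&\left(v^+ - v^-\right)\left(b \cdot n^+ u^+ - b \cdot n^- u^-\right) = \\
&\quad v^+ \, b \cdot n^+ u^+ - v^- \, b \cdot n^+ u^+ - v^+ \, b \cdot n^- u^- + v^- \, b \cdot n^- u^-
\end{aligned} \tag{4.21}$$
Here $b \cdot n^+$ and $b \cdot n^-$ stand for the values bn('+') and bn('-') computed in the code,
which vanish on the inflow side. Since either $b \cdot n^+$ or $b \cdot n^-$ is zero, (4.21) is identical
to the $b u^\star \cdot \llbracket v \rrbracket$ term in (4.17)
when the definition of $u^\star$ in (4.19) is used. The implementation of the upwind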
value is a good example of the flexibility offered by the operators in Table 4.1 for
implementing more complex expressions. Also note how the Dirichlet boundary
conditions are applied to only the Γ D part of the exterior boundary. This is achieved
in the code by first creating a class DirichletBoundary, see Section 1.3.5, which
overloads the inside function to return true when x = 1. Then, a FacetFunction
is created which holds an integer value, a marker, for all facets of the mesh and the
value for all facets is initially set to 0. The DirichletBoundary class is then used to
mark the facets which are located at x = 1 with the value 1. The variable boundary_facets now
contains the index of all facets and the associated value (0 or 1) which indicates if the
facet is part of the $\Gamma_D$ boundary or not. This variable is used to redefine the Measure
object ds to let *ds(1) and *ds(0) in the forms, a and L, indicate integration over
the $\Gamma_D$ and $\partial\Omega \setminus \Gamma_D$ parts of the boundary respectively, see Section 1.3.2. The
computed solution to this problem, projected onto a piecewise linear basis, is seen
in Figure 4.5.
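The equivalence between the coded upwind flux and the definition of the upwind value in (4.19) can be checked numerically at a single facet point. The sketch below is plain Python with hypothetical function names; it assumes scalar values of u and v on either side of the facet and uses $n^- = -n^+$.

```python
def outflow_part(b, n):
    """(b.n + |b.n|)/2: equals b.n on outflow facets and zero on inflow facets."""
    bn = sum(bi*ni for bi, ni in zip(b, n))
    return (bn + abs(bn)) / 2.0


def coded_flux(b, n_plus, u_plus, u_minus, v_plus, v_minus):
    """Pointwise value of jump(v)*(bn('+')*u('+') - bn('-')*u('-'))."""
    n_minus = [-x for x in n_plus]
    bn_plus = outflow_part(b, n_plus)
    bn_minus = outflow_part(b, n_minus)
    return (v_plus - v_minus) * (bn_plus*u_plus - bn_minus*u_minus)


def upwind_flux(b, n_plus, u_plus, u_minus, v_plus, v_minus):
    """Pointwise value of b u* . (v+ n+ + v- n-) with u* chosen as in (4.19)."""
    bn = sum(bi*ni for bi, ni in zip(b, n_plus))
    u_star = u_plus if bn > 0 else u_minus
    return u_star * bn * (v_plus - v_minus)
```

With b = (−3, −2) as in the example, both functions give the same value for either orientation of the facet normal, since exactly one of the two outflow parts is non-zero whenever b · n ≠ 0.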
Python code
from dolfin import *
# bn = bn if outflow_facet else 0
bn = (dot(b, n) + abs(dot(b, n)))/2.0
# Define forms
a = dot(-b*u, grad(v))*dx \
+ dot(bn('+')*u('+') - bn('-')*u('-'),jump(v))*dS + dot(bn*u,v)*ds(0)\
+ (alpha/h*u*v - dot(grad(u), v*n) - dot(u*n, grad(v)))*ds(1)
L = (alpha/h*g*v - dot(g*n, grad(v)) - dot(grad(g), v*n))*ds(1)
# Compute solution
u = Function(V)
solve(a == L, u)
Figure 4.4: Complete DOLFIN solver for the advection–diffusion equation with
diffusion coefficient κ = 0.
UFL code
# Create mixed function space
W = VectorElement("Discontinuous Lagrange", "triangle", 1)
Q = FiniteElement("Lagrange", "triangle", 1)
element = W * Q
# Define test and trial functions (assumed: the listing uses u, p, v, q without declaring them)
(v, q) = TestFunctions(element)
(u, p) = TrialFunctions(element)
# Define normal component, mesh size, penalty parameter and right-hand side
n = element.cell().n
h = 2.0*triangle.circumradius
h_avg = (h('+') + h('-'))/2
alpha = 4.0
f = Coefficient(W)
# Define forms
a = inner(grad(u), grad(v))*dx + inner(grad(p), v)*dx - inner(u, grad(q))*dx \
+ inner(jump(u, n), q('+'))*dS \
+ inner(u, n)*q*ds \
- inner(dot(avg(grad(u)), n('+')), jump(v))*dS \
- inner(jump(u), dot(avg(grad(v)), n('+')))*dS \
- inner(dot(grad(u), n), v)*ds \
- inner(u, dot(grad(v), n))*ds \
+ alpha/h_avg*inner(jump(u), jump(v))*dS \
+ alpha/h*inner(u, v)*ds
L = dot(f, v)*dx
Figure 4.6: UFL input for the Stokes equation using k = 1 and ν = 1.0.
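and
$$L(v, q) := \int_{\Omega} f \cdot v \, dx. \tag{4.25}$$
The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket v \rrbracket = v^+ \otimes n^+ + v^- \otimes n^-$,
$\llbracket v \cdot n \rrbracket = v^+ \cdot n^+ + v^- \cdot n^-$ and $\langle \nabla v \rangle = (\nabla v^+ + \nabla v^-)/2$ on $\Gamma_0$ and $\llbracket v \rrbracket = v \otimes n$ on
∂Ω. The UFL input in two dimensions for this problem with k = j = 1, as proposed
in Baker et al. (1990), and the kinematic viscosity ν = 1.0 is shown in Figure 4.6.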
Classically, Galerkin methods for the biharmonic equation seek approximate solutions
in a subspace of $H^2(\Omega)$. However, such functions are difficult to construct in
a finite element context. Based on discontinuous Galerkin principles, methods have
been developed which utilise functions from $H^1(\Omega)$ (Engel et al., 2002; Wells and
Dung, 2007). Rather than considering jumps in functions across element boundaries,
terms involving the jump in the normal derivative across element boundaries are
introduced. Unlike fully discontinuous approaches, this method does not involve
double-degrees of freedom on element edges and, therefore, does not lead to the
significant increase in the number of degrees of freedom relative to conventional
methods. Consider the continuous function space
$$V := \left\{ v \in H_0^1(\Omega) : v \in P_k(T) \ \forall\, T \in \mathcal{T}_h \right\}. \tag{4.26}$$
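The bilinear and linear forms for the biharmonic equation, with the boundary
conditions u = 0 on ∂Ω and $\nabla^2 u = 0$ on ∂Ω, read
$$\begin{aligned}
a(u, v) := {} & \int_{\Omega} \nabla^2 u \, \nabla^2 v \, dx - \int_{\Gamma_0} \llbracket \nabla u \rrbracket \cdot \langle \nabla^2 v \rangle \, ds - \int_{\Gamma_0} \langle \nabla^2 u \rangle \cdot \llbracket \nabla v \rrbracket \, ds \\
& + \int_{\Gamma_0} \frac{\alpha}{h} \llbracket \nabla u \rrbracket \cdot \llbracket \nabla v \rrbracket \, ds,
\end{aligned} \tag{4.27}$$
$$L(v) := \int_{\Omega} f v \, dx. \tag{4.28}$$
The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket \nabla v \rrbracket = \nabla v^+ \cdot n^+ + \nabla v^- \cdot n^-$
and $\langle \nabla^2 v \rangle = (\nabla^2 v^+ + \nabla^2 v^-)/2$ on $\Gamma_0$. The UFL input for this problem with k = 4
is shown in Figure 4.7.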
The error in the $L^2$ norm used for the convergence rates in Figure 4.8 is computed
via the code shown in Figure 4.9, where the finite element solution $u_h$ has been
computed using fourth order Lagrange basis functions. Given the exact solution
u and the finite element solution $u_h$, the error $e = u - u_h$ can be computed by
the functional M in the code. Note that the exact solution has been approximated
by interpolation using a continuous eighth order polynomial.
Extending the FEniCS framework for discontinuous Galerkin methods also permits
the computation of other norms like the mesh-dependent semi-norm of the error
$$|||e|||^2 = \int_{\Omega} \nabla e \cdot \nabla e \, dx + \int_{\Gamma_0} \llbracket e \rrbracket \cdot \llbracket e \rrbracket \, ds, \tag{4.29}$$
UFL code
# Define test and trial functions
element = FiniteElement("Lagrange", tetrahedron, 4)
u = TrialFunction(element)
v = TestFunction(element)
# Parameters
alpha = 16.0
# Bilinear form
a = inner(div(grad(u)), div(grad(v)))*dx \
- inner(avg(div(grad(u))), jump(grad(v), n))*dS \
- inner(jump(grad(u), n), avg(div(grad(v))))*dS \
+ alpha/h_avg*inner(jump(grad(u), n), jump(grad(v),n))*dS
# Linear form
L = f*v*dx
[Plot: error ‖u − u_h‖ against cell size h on logarithmic axes for k = 2, k = 3 and k = 4, with reference slopes 2, 4 and 5 indicated.]
Figure 4.8: Error in the L2 norm for the biharmonic equation with penalty parame-
ters α = 4, α = 20 and α = 20, for k = 2, k = 3 and k = 4 respectively.
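Convergence rates such as those indicated in Figure 4.8 are usually read off as the slope between consecutive points in the log-log plot. A minimal sketch of that calculation is given below; the function name is hypothetical, and the data in the usage example are synthetic numbers manufactured to follow a fifth-order law, not results from the thesis.

```python
import math


def observed_rates(h, err):
    """Slope of log(err) against log(h) between consecutive refinements."""
    return [
        math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
        for i in range(len(h) - 1)
    ]


# Synthetic example: errors manufactured to behave like C*h**5.
h_values = [0.4, 0.2, 0.1]
errors = [3.0 * h**5 for h in h_values]
rates = observed_rates(h_values, errors)
```

For the synthetic data every entry of `rates` equals 5 up to rounding, as expected from the way the errors were constructed.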
UFL code
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Lagrange", tetrahedron, 4)
u = Coefficient(element_u)
u_h = Coefficient(element_uh)
e = u - u_h
M = e*e*dx
UFL code
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Discontinuous Lagrange", tetrahedron, 4)
u = Coefficient(element_u)
u_h = Coefficient(element_uh)
e = u - u_h
M = inner(grad(e), grad(e))*dx + inner(jump(e), jump(e))*dS
chosen small enough such that it does not dominate the results when constant elements are used. This
will be investigated in Section 5.3.
116 Chapter 5. Automation of lifting-type discontinuous Galerkin methods
chapter first presents a lifting-type formulation for the Poisson equation. Then
follows the implementation of this formulation in the FEniCS framework, including
some developments for semi-automated support. The two formulations for the
Poisson equation are then compared to each other to illustrate the influence of
the penalty parameter and performance for constant elements. Finally, future
developments to enable fully-automated support for lifting-type formulations in
the FEniCS framework are discussed.
This section describes the basic concepts of a lifting-type formulation for the
Poisson equation. The notation from the previous chapter is adopted and some
definitions and concepts are reiterated in the following for convenience. Recall
from Section 4.2.1 the discontinuous scalar function space V:
$$V := \left\{ v \in L^2(\Omega) : v|_T \in P_k(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{5.1}$$
where (·)+ and (·)− denote the value of a quantity (·) on T + and T − respectively,
n is the outward unit normal and Γ0 is the set of interior facets in Ω. A function
space for the gradient of functions in V is also defined:
$$Q := \left[ q \in L^2(\Omega) : q|_T \in P_l(T) \ \forall\, T \in \mathcal{T}_h \right]^d, \tag{5.3}$$
where the last term is a stabilisation term with α being a stabilisation parameter.
An important property of the lifting-type discontinuous Galerkin formulation is
that the method is stable for any α > 0, which is in contrast to the IP formulation
in (4.15), see Arnold et al. (2002). In addition, no parameter for the mesh size is
needed (h in (4.15))2 . The linear form for the Poisson problem using a lifting-type
formulation remains identical to (4.16) when considering homogeneous Dirichlet
boundary conditions.
tion of the Poisson equation in Arnold et al. (2002) the parameter he is defined as the length of an edge,
while Djoko et al. (2007b) defines he as the distance between centroids of elements sharing a common
edge for a similar problem.
using the FEniCS framework, see Figure 4.2 on page 106. The implementation
of lifting-type discontinuous Galerkin forms like (5.7) is, however, more involved.
This is due to the nature of the lifting function R (v) defined in (5.6), through the
variational problem in (5.5), which adds complexity to the assembly procedure.
However, it is possible to use the tools provided by FEniCS as building blocks to
extend the framework to also handle lifting-type formulations in a semi-automated
fashion. This section describes a possible approach to achieve this.
Let $\{\phi_k^{T,v}\}_{k=1}^{n_v}$ and $\{\phi_k^{T,q}\}_{k=1}^{n_q}$ denote the local finite element basis for V and Q
on a cell T respectively. From (5.5) two tensors, $A_E$ and $A_S$, can be identified:
$$A_{E,i} = a_E\left(\bar{\phi}_{i_2}^{E,q}, \bar{\phi}_{i_1}^{E,q}\right) = \int_{E} r_S(v) \cdot q \, dx \tag{5.8}$$
$$A_{S,i} = a_S\left(\bar{\phi}_{i_2}^{E,v}, \bar{\phi}_{i_1}^{E,q}\right) = -\int_{S} \llbracket v \rrbracket \cdot \langle q \rangle \, ds \tag{5.9}$$
where $i = (i_1, i_2)$ is the usual multi-index and $\bar{\phi}^{E}$ is a (possibly macro) basis which
can be constructed from (4.6), page 102. The lifting operator $r_S(v)$ on the cell E can
be represented as:
$$r_S(v) = \sum_{k}^{N} r_k \bar{\phi}_k^{E,q} \tag{5.10}$$
assumed that S is an interior facet. The local cell tensor A T for this term is equal to:
$$A_{T,i} = a_T\left(\bar{\phi}_{i_2}^{E,q}, \phi_{i_1}^{T,v}\right) = \int_{T} r_S(u) \cdot \nabla v \, dx, \tag{5.12}$$
with rS (u) defined by (5.10). Due to the extensions presented in the previous
chapter, a bilinear form to compute AS from (5.9) in two dimensions can be
implemented directly in UFL as:
UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
q = TestFunction(Q)
u = TrialFunction(V)
n = triangle.n
a = - inner(jump(u, n), avg(q))*dS
5.2. Semi-automated implementation of lifting-type formulations 119
UFL code
Q = VectorElement("DG", triangle, 0)
# Test and trial functions are both taken from Q so that the form is a mass matrix on Q
v = TestFunction(Q)
u = TrialFunction(Q)
a = inner(u, v)*dx
on T + and T − and then inserting the resulting tensors into A E (which is essentially
a macro mass matrix) following the construction in (4.6). The bilinear form to
compute A T only involves the standard integral over a cell and can, like AS , be
implemented directly in UFL as:
UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
v = TestFunction(V)
R = TrialFunction(Q)
a = inner(R, grad(v))*dx
where the mappings $\iota_T^1(i_1)$ and $\iota_S^2(i_2)$ are computed according to (1.11), page 26,
and (4.7), page 102, respectively; and $\mathcal{I}_{T,S}$ is the index set:
$$\mathcal{I}_{T,S} = \left\{(1, 1), (1, 2), \ldots, (n_v, 2n_q - 1), (n_v, 2n_q)\right\}, \tag{5.14}$$
computing the inverse of A E . In order to keep the implementation simple, the FEniCS Solid Mechanics
library employs Armadillo (http://arma.sourceforge.net/) to perform these computations.
[Sparsity pattern: 15 × 15 matrix with rows and columns numbered 0 to 14; nonzero entries are marked ∗.]
Figure 5.1: Illustration of nonzero entries in the global tensor Aip arising from
assembling the IP formulation in (4.15) on the mesh shown in Figure 5.3a using
discontinuous constant elements.
[Sparsity pattern: 15 × 15 matrix with rows and columns numbered 0 to 14; nonzero entries are marked ∗.]
Figure 5.2: Illustration of nonzero entries in the global tensor Alift arising from
assembling the lifting-type formulation in (5.7) on the mesh shown in Figure 5.3a
using discontinuous constant elements.
[Three copies of the mesh, panels (a) to (c), with cells numbered 0 to 14.]
Figure 5.3: Simple mesh of the domain Ω = [0, 1] × [0, 1] including cell indices.
Because constant elements are used, the index numbering is equal to the degree of
freedom numbering. Figures (b) and (c) show the cells involved when computing
the entry for degree of freedom number 6 using the IP and lifting-type formulations
respectively.
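The difference between the two sparsity patterns can be reproduced from cell adjacency alone. The sketch below assumes that the IP formulation couples a cell with its facet neighbours, and that the lifting-type formulation additionally couples it with the neighbours of those neighbours. The adjacency used is a hypothetical strip of five cells, not the mesh of Figure 5.3, and the function names are likewise hypothetical.

```python
def ip_stencil(adjacency, cell):
    """Cells coupled to `cell` through its own facets."""
    return {cell} | set(adjacency[cell])


def lifting_stencil(adjacency, cell):
    """Cells coupled to `cell` when neighbours of neighbours also contribute."""
    stencil = ip_stencil(adjacency, cell)
    for neighbour in adjacency[cell]:
        stencil |= set(adjacency[neighbour])
    return stencil


# Hypothetical strip of five cells in which cell i touches cells i-1 and i+1.
strip = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

For the middle cell of the strip, `ip_stencil(strip, 2)` contains three cells and `lifting_stencil(strip, 2)` contains all five, which mirrors the denser rows of the lifting-type matrix.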
(a) Structured mesh, the ‘right’ mesh, with the direction of the diagonal pointing to the right.
(b) Structured mesh, the ‘left/right’ mesh, with alternating direction of the diagonals.
Figure 5.4: Two types of structured meshes for the domain Ω = [0, 1] × [0, 1].
are shown in Figure 5.4. The diagonals of the mesh in Figure 5.4a all point to the
right, and this type of structured mesh will therefore be referred to as the ‘right’
mesh. This particular mesh is created in DOLFIN by:
C++ code
UnitSquare mesh(4, 4);
and is the default mesh type in DOLFIN. The direction of the diagonals of the
mesh in Figure 5.4b alternates between right and left and will be referred to as the
‘left/right’ mesh which is created in DOLFIN by:
C++ code
UnitSquare mesh(4, 4, "left/right");
[Two log-log plots of ‖u − u_h‖ against h with reference slope 2. Left legend: α = 10^−3, α = 10^0, α = 10^3. Right legend: α = 2, α = 4, α = 1000.]
Figure 5.5: Error in the L2 norm as a function of the cell size for various values of
α using discontinuous linear elements (k = 1). The results were computed on the
‘right’ mesh (similar results can be obtained using the ‘left/right’ mesh).
[Four log-log plots of ‖u − u_h‖ against h with reference slope 1. Left legends: α = 10^0, 10^−1, 10^−2, 10^−3, 10^−4. Right legends: α = 1.00, 2.00, 2.30, 2.45, 4.00.]
Figure 5.6: Error in the L2 norm as a function of the cell size for various values of α
on different types of meshes using discontinuous constant elements (k = 0).
(a) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on the ‘right’ mesh.
(b) Computed solution for the IP formulation with α = 2.45 on the ‘right’ mesh.
(c) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on the ‘left/right’ mesh.
(d) Computed solution for the IP formulation with α = 2.45 on the ‘left/right’ mesh.
Figure 5.7: Computed solutions to the Poisson problem for the two formulations
on the two structured meshes using constant elements and an ‘optimal’ value of α.
5.4. Future developments 127
Figures 5.6d and 5.7 one could expect that the IP formulation with α = 2.45 would
perform reasonably well on an unstructured mesh. But from Figure 5.8, α = 2 seems
to produce the best result as higher values of α reduce the magnitude of the values
in the solution. The lifting-type formulation is not affected by the unstructured
mesh and the result is comparable to the ones obtained in Figures 5.7c and 5.7a
for the ‘left/right’ and ‘right’ meshes respectively. In conclusion, a lifting-type
formulation is needed to obtain reliable results when using discontinuous constant
elements for the Poisson problem.
UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
v = TestFunction(V)
u = TrialFunction(V)
R = LiftingFunction(Q)
a = inner(R(u), grad(v))*dE
The LiftingFunction represents the operations defined in (5.5) and (5.6) and FFC
must be extended to support code generation for these operations. A new type
of integral, dE, is introduced to denote integration over a patch of elements such
that evaluating R (u) on T will involve T and all of its neighbours. UFC will then
need to provide an interface for this new integral class and DOLFIN must be
updated with an algorithm to perform the assembly, including the construction of
the collective local-to-global mapping for the patch of elements under consideration.
The procedure outlined above is further complicated by the definition of the lifting
function in (5.6) where the loop over facets depends on the set Γ D where Dirichlet
boundary conditions are to be applied. In the event of a moving boundary inside
the domain Ω, the assembly algorithm must thus be provided with information
about which cells and facets to consider.
Another possible future direction, which is perhaps achieved more easily, is to
extend the assembly algorithm. In Algorithm 3 the loop over facets S to compute
rS (v) is nested inside a loop over the cells of the mesh. It should be possible to
compute the entire lifting function R (v) in a single loop over all facets of the mesh
to avoid redundant computations of rS (v). The collective local-to-global mapping
must, however, still be constructed by looping over the facets of the cell T during
assembly.
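The potential saving can be estimated with a small counting sketch. It assumes that the cell-nested loop evaluates rS once per (cell, facet) pair, whereas a facet loop evaluates it once per facet; the connectivity layout `cell_facets` is a hypothetical illustration.

```python
def count_lifting_evaluations(cell_facets):
    """Compare how often r_S would be evaluated by the two loop orders.

    cell_facets[c] lists the facet indices of cell c.
    Returns (evaluations in cell-nested loop, evaluations in facet loop).
    """
    # Nested order: for every cell, visit each of its facets.
    nested = sum(len(facets) for facets in cell_facets)
    # Facet order: visit every distinct facet exactly once.
    single = len({f for facets in cell_facets for f in facets})
    return nested, single


# Two triangles sharing facet 2: the shared facet is visited twice
# in the nested loop but only once in the facet loop.
cell_facets = [[0, 1, 2], [2, 3, 4]]
print(count_lifting_evaluations(cell_facets))
```

On large meshes most facets are interior, so under this assumption the nested loop performs close to twice as many evaluations.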
128 Chapter 5. Automation of lifting-type discontinuous Galerkin methods
(a) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on an unstructured mesh.
(b) Computed solution for the IP formulation with α = 2 on an unstructured mesh.
(c) Computed solution for the IP formulation with α = 2.45 on an unstructured mesh.
(d) Computed solution for the IP formulation with α = 4 on an unstructured mesh.
Figure 5.8: Computed solutions to the Poisson problem for the two formulations
on an unstructured mesh using constant elements. The solution computed using
the lifting-type formulation (a) is compared to the solutions obtained using the IP
formulation with different values of α.
6 Strain gradient plasticity
This chapter brings together the tools and extensions of the previous chapters in an
implementation of a strain gradient plasticity model proposed by Aifantis (1984) in
the FEniCS framework. Strain gradient plasticity models can be used to model size
effects which cannot be accounted for by classical plasticity theory. Size effects in
plasticity are manifest as an increase in the strength of a material as the size of a
specimen becomes smaller. This effect has been observed in many applications at
the micron scale, for instance micro-indentation (Poole et al., 1996; Nix and Gao,
1998; Begley and Hutchinson, 1998), micro-bending (Stölken and Evans, 1998) and
wire torsion (Fleck et al., 1994). For softening problems, classical plasticity
theory exhibits a pathological mesh dependence because it does not
provide a length scale for the shear band width. Strain gradient models that define
an internal length scale can, therefore, sometimes be used to provide regularisation
in softening problems under certain conditions.
The considered plasticity model involves the addition of the Laplacian of the
plastic multiplier to the classical yield condition. By considering a weak formulation
for the yield condition, $H^1$-regular functions can be used for representing the plastic
multiplier. By employing a discontinuous formulation, the yield condition can
be satisfied locally (cell-wise). Following this approach, the standard balance of
momentum equations for the displacements are defined in the entire domain.
The formulation for the yield condition, however, is only defined in the plastic
domain. Furthermore, it is necessary to impose boundary conditions for the plastic
multiplier on the, potentially moving, boundary of the plastic domain. This poses
a numerically challenging problem.
The chapter is organised as follows. First, the strain gradient plasticity model
is presented. The presentation builds on the notation and equations presented in
Chapter 2, in particular Sections 2.2.2 and 2.4.2 concerning plasticity, and where
convenient some of the definitions are reiterated. A lifting-type formulation for
the plastic multiplier based on the work in the previous chapter is then proposed.
This is followed by the linearisation of the governing equations after which the
implementation in the FEniCS framework is discussed. Finally, numerical examples
are presented followed by some computational observations.
6.1 A strain gradient plasticity model
This section introduces the strain gradient model which will be investigated in
the remainder of this chapter. In the particular model under consideration, the
yield criterion from (2.23), page 34, is augmented with the Laplacian of the internal
hardening variable κ as suggested by Aifantis (1984, 1987) and investigated by, for
instance, Mühlhaus and Aifantis (1991):
where $\phi(\sigma, q_{\mathrm{kin}}(\varepsilon^p))$ is a scalar effective stress measure, $q_{\mathrm{kin}}$ is a stress-like internal
variable used to model kinematic hardening, $q_{\mathrm{iso}}$ is a scalar stress-like term used
to model isotropic hardening, κ is a scalar internal variable, $\sigma_y$ is the initial scalar
yield stress and the constant scalar G > 0 is a hardening parameter. The hardening
parameter G determines the contribution of the gradient effect to hardening, and
in the case G = 0 the model reduces to the classical problem in (2.23). As in
Section 2.2.2, a von Mises model with linear isotropic hardening is adopted, see
(2.25). Classical associative plastic flow is assumed¹, see (2.26), and isotropic
strain-hardening according to (2.27) is adopted, from which it follows that
κ̇ = λ̇. (6.2)
\[ \dot{f}(\sigma, \kappa) = 0 \tag{6.4} \]
1 It has been argued that the plastic flow direction for strain gradient plasticity is governed by a
microstress and not the deviatoric Cauchy stress, see for instance Gudmundson (2004). However, to
ensure a proper comparison of the approach taken in this chapter to the approach of other researchers
investigating the model by Aifantis, this argument is not taken into account.
6.1. A strain gradient plasticity model 131
suitable. A different approach is, therefore, adopted in which the yield criterion
(6.1) is satisfied in a weak sense inside the plastic region at the end of a loading
step:
\[ \int_{\Omega^p} f(\sigma_{n+1}, \lambda_{n+1}) \, \eta \, dx = 0 \quad \forall\, \eta \in W, \tag{6.5} \]
for a suitable choice of W. Together with the standard balance of momentum
equations, this forms a coupled system of equations for computing the unknowns
u and λ. In the remainder of this section the subscript n + 1 is dropped for brevity.
Consider a weak formulation for the yield criterion (6.5):
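\[ a((u, \lambda); \eta) = 0 \quad \forall\, \eta \in W, \tag{6.6} \]
with
\[
\begin{aligned}
a((u, \lambda); \eta) := {} & \int_{\Omega^p} \left( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \right) \eta \, dx \\
& - G \int_{\Omega^p} \nabla\lambda \cdot \nabla\eta \, dx + G \int_{\partial\Omega^p} (\nabla\lambda \cdot n) \, \eta \, ds,
\end{aligned}
\tag{6.7}
\]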
where n is the outward unit normal to $\partial\Omega^p$ and the last two integrals arise from the
application of integration by parts. The variational form is nonlinear due to how
the stress is computed from u and λ; however, the linearisation of the equations is
postponed to Section 6.3. Due to the presence of integrals involving ∇λ and ∇η
in (6.7), the functions interpolating λ and η should be in the space $H^1(\Omega)$.
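The origin of the last two integrals in (6.7) can be made explicit. Assuming, consistent with (6.28), that the gradient term enters the yield function as $G \nabla^2 \lambda$, multiplying by the test function η and applying Green's first identity on $\Omega^p$ gives:

```latex
\int_{\Omega^p} G \, \nabla^2 \lambda \; \eta \, dx
  = - G \int_{\Omega^p} \nabla\lambda \cdot \nabla\eta \, dx
    + G \int_{\partial\Omega^p} (\nabla\lambda \cdot n) \, \eta \, ds .
```

The volume term requires only first derivatives of λ and η, and the boundary term is where conditions on λ at the elastic–plastic boundary enter.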
A further implication of (6.7) is the necessity of imposing boundary conditions
on λ on the elastic–plastic boundary ∂Ωp . Frequently, the homogeneous boundary
conditions:
\[ \lambda = 0 \quad \text{on } \Gamma_D^p, \tag{6.8} \]
\[ \nabla\lambda \cdot n = 0 \quad \text{on } \Gamma_N^p, \tag{6.9} \]
are adopted, see for instance Mühlhaus and Aifantis (1991); De Borst and Mühlhaus
(1992), where $\Gamma_D^p$ and $\Gamma_N^p$ denote the parts of the elastic–plastic boundary where
Dirichlet and Neumann conditions for λ are applied respectively. These boundary
conditions bear resemblance to the microhard and microfree boundary conditions
suggested by Gurtin (2004); Gurtin and Needleman (2005) and defined as:
microscopically hard boundary conditions meant to characterise, for example,
microscopic behaviour at the boundary of a ductile metal perfectly bonded to
a ceramic;
microscopically free boundary conditions meant to characterise microscopic be-
haviour at a boundary whose environment exerts no microscopic forces on
the body.
The first boundary condition thus prevents plastic flow out of a domain, while the
latter does not.
Considering a discontinuous Galerkin formulation for satisfying the yield crite-
rion in a weak sense carries certain advantages. For instance, by using discontinuous
elements, the yield condition can be satisfied in a local sense (cell-wise). Another
advantage of a discontinuous Galerkin formulation is that the boundary conditions
in (6.8) and (6.9) on the, possibly moving, internal elastic–plastic boundary can be
included naturally. Due to the discontinuous elements, jumps in the λ function
across element boundaries can be represented, which is necessary to accommodate
the ∇λ · n = 0 boundary condition on the elastic–plastic boundary. This issue can
be resolved by leaving the λ function undefined in the elastic region Ωe when
considering continuous function spaces. However, this approach is computation-
ally challenging if the elastic–plastic boundary is moving. Finally, discontinuous
constant elements can be used for λ which is computationally more efficient.
The choice of boundary condition for λ on the elastic–plastic boundary is crucial
for the behaviour of the model. For instance, setting ∇λ · n = 0 on the elastic–
plastic boundary does not guarantee regularisation for a softening problem. The
reason is that this boundary condition permits a constant field for λ inside the
plastic domain which in turn does not introduce a gradient effect. Therefore, the
model does not provide a mechanism by which the plastic domain can expand.
On the other hand, by enforcing λ = 0 on the elastic–plastic boundary a constant
nonzero λ field is no longer possible. This activates the gradient term inside the
plastic domain and thereby provides a mechanism which allows the plastic domain
to expand. Note, however, that for a softening problem the plastic domain will
only expand if the gradient parameter is large enough to overcome the softening
behaviour of the plastic domain and make adjacent elastic elements yield. The
model is, therefore, not suitable for softening problems, see also Engelen et al.
(2006) who investigated the model proposed in Fleck and Hutchinson (2001) as
a representative of a wider class of gradient plasticity models, including that of
Aifantis presented in this section. For hardening problems, on the other hand,
the plastic domain can expand regardless of the choice of boundary condition for
λ although an expanding plastic domain will lead to jumps in the λ field. The
effect of boundary conditions on the behaviour of the model is demonstrated via
numerical examples in Section 6.5.
For the considered plasticity model and boundary conditions, others (De Borst
and Mühlhaus, 1992; De Borst and Pamin, 1996; Djoko et al., 2007a) have shown,
via numerical examples, a regularising effect in the presence of strain softening.
However, the regularising effect is achieved by defining the plastic multiplier on
the entire domain (elastic and plastic) and by imposing excessive regularity of the
plastic multiplier across the elastic–plastic boundary (using a $C^1$-conforming basis
(De Borst and Mühlhaus, 1992) or introducing penalty terms (De Borst and Pamin,
1996)) or by allowing the plastic multiplier to spread into the elastic region (Djoko
et al., 2007a). This is in contrast to the formulation pursued in this chapter which,
as already mentioned, will use a discontinuous basis for the plastic multiplier and
only consider the gradient term active in the plastic regions.
6.2 A discontinuous Galerkin formulation for the plastic multiplier
with Lagrange polynomials of degree m. A weak formulation for the yield criterion
corresponding to (6.5) on a single cell T ∈ Ωp can then be formulated as: find
(u, λ) ∈ V × W such that for all w ∈ W
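\[
\begin{aligned}
& \int_T \left( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \right) w \, dx \\
& \quad - G \int_T \nabla\lambda \cdot \nabla w \, dx + G \int_{\partial T} \tau(\lambda, \nabla\lambda) \cdot n \, w \, ds = 0,
\end{aligned}
\tag{6.12}
\]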
that the numerical flux is single valued on cell facets. As this is the case for all the
variants of the numerical flux presented in Arnold et al. (2002) the yield criterion is
satisfied locally on each cell for both the IP and lifting-type formulations.
Taking guidance from Section 4.2.1, the variational form for (6.6) using an
interior penalty formulation can be defined as:
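\[
\begin{aligned}
a_f^{\mathrm{IP}}((u, \lambda); w) := {} & \int_{\Omega^p} \left( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \right) w \, dx - G \int_{\Omega^p} \nabla\lambda \cdot \nabla w \, dx \\
& + G \int_{\Gamma_0^p} \left( \langle \nabla\lambda \rangle \cdot [\![ w ]\!] + [\![ \lambda ]\!] \cdot \langle \nabla w \rangle \right) ds - G \int_{\Gamma_0^p} \frac{\alpha}{h_e} \, [\![ \lambda ]\!] \cdot [\![ w ]\!] \, ds,
\end{aligned}
\tag{6.14}
\]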
where the last term ensures stability of the formulation, $h_e$ denotes the
distance between the centroids of two neighbouring elements, α is the usual
stabilisation parameter and $\Gamma_0^p$ denotes the set of interior facets inside the plastic
region $\Omega^p$. Both λ and w are assumed to be functions in W while u ∈ V. This
particular type of formulation was used by Djoko et al. (2007a,b) for a gradient
plasticity model similar to the one described in the previous section.
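As a small illustration of the penalty weight in the last term of (6.14), the following sketch computes the centroid distance and the factor α/he for two neighbouring triangles. The function names and the vertex-list layout are hypothetical; only the definition of he as the distance between centroids is taken from the text.

```python
import math


def centroid(triangle):
    """Centroid of a triangle given as three (x, y) vertices."""
    xs = [p[0] for p in triangle]
    ys = [p[1] for p in triangle]
    return (sum(xs) / 3.0, sum(ys) / 3.0)


def penalty_weight(tri_plus, tri_minus, alpha):
    """Return alpha / h_e, with h_e the distance between the two centroids."""
    cp = centroid(tri_plus)
    cm = centroid(tri_minus)
    h_e = math.hypot(cp[0] - cm[0], cp[1] - cm[1])
    return alpha / h_e


# Two right triangles that together form the unit square.
t_plus = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
t_minus = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(penalty_weight(t_plus, t_minus, alpha=2.0))
```

Because the weight scales like 1/he, the same value of α penalises jumps more strongly on finer meshes.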
However, as demonstrated in the previous chapter, the IP formulation is not
suitable when discontinuous piecewise constant elements are used. A lifting-type
formulation is, therefore, developed which is similar in nature to the lifting-type
formulation for the Poisson equation, which was discussed in Section 5.1, although
some definitions are slightly different to take into account that (6.5) is only valid for
regions undergoing plastic deformations. The jump of a function w ∈ W is defined
as:
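\[
[\![ w ]\!] =
\begin{cases}
w^+ n^+ + w^- n^- & \text{on } \Gamma_0^p, \\
w n & \text{on } \partial\Omega \cup \partial\Omega^p,
\end{cases}
\tag{6.15}
\]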
which is comparable to (5.2), page 116. The function space for the gradient of
functions in W is again denoted by Q and is defined in (5.3). The definition of the
average of a function q ∈ Q is slightly different from that in (5.4) namely:
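\[
\langle q \rangle =
\begin{cases}
\frac{1}{2} \left( q^+ + q^- \right) & \text{on } \Gamma_0^p, \\
q & \text{on } \partial\Omega \cup \partial\Omega^p.
\end{cases}
\tag{6.16}
\]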
Also the definitions of the lifting operator and the lifting function (equations (5.5)
and (5.6) respectively) are slightly different. The operator $r_S : W \to Q$ is defined
as follows: for a given w ∈ W, find $r_S(w) \in Q$ such that:
\[ \int_E r_S(w) \cdot q \, dx = - \int_S [\![ w ]\!] \cdot \langle q \rangle \, ds \quad \forall\, q \in Q, \tag{6.17} \]
where $E = T^+ \cup T^-$, as seen in Figure 4.1, for $S \in \Gamma_0^p$; $E \in \mathcal{T}_h$ is the element
associated with the facet S for S ∈ ∂Ω; and for $S \in \Gamma_D^p$, E is the element inside $\Omega^p$
which is associated with the facet S. The lifting function is then defined as:
\[ R(w) = \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} r_S(w), \tag{6.18} \]
which is very similar to the lifting function in (5.6). Note that due to the definitions
of E and S in (6.17), the function is defined neither in the elastic region of Ω
nor on $\Gamma_N^p$ and it will, therefore, be defined to be zero in both of these cases. The
lifting-type formulation corresponding to the variational form in (6.14) for the yield
criterion then reads:
\[
\begin{aligned}
a_f((u, \lambda); w) := {} & \int_{\Omega^p} \left( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \right) w \, dx \\
& - G \int_{\Omega^p} \left( \nabla\lambda + R(\lambda) \right) \cdot \left( \nabla w + R(w) \right) dx - \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} \int_{\Omega^p} \alpha G \, r_S(\lambda) \cdot r_S(w) \, dx,
\end{aligned}
\tag{6.19}
\]
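The lifting operator (6.17) and the lifting function (6.18) can be illustrated in one dimension. The sketch below assumes a uniform mesh of cell size `h`, piecewise constant functions for both W and Q, and the microhard case in which every interior elastic–plastic facet belongs to the Dirichlet part of the boundary; exterior facets are treated as Neumann facets and do not contribute. Testing (6.17) with the indicator function of a cell then gives rS = −(wL − wR)/(2h) on both cells of an interior plastic facet and rS = −w n/h on the plastic cell of an elastic–plastic facet. The function name and data layout are hypothetical.

```python
def lifting_1d(w, h, plastic):
    """Return R(w) per cell for piecewise constant w on a uniform 1D mesh.

    w[i] is the value on cell i, plastic[i] flags plastic cells.
    Facet i (1 <= i <= n-1) separates cell i-1 (left) and cell i (right).
    """
    n_cells = len(w)
    R = [0.0] * n_cells
    for facet in range(1, n_cells):
        left, right = facet - 1, facet
        if plastic[left] and plastic[right]:
            # Interior plastic facet: jump = w_L n_L + w_R n_R, n_L = +1, n_R = -1.
            jump = w[left] - w[right]
            r = -0.5 * jump / h  # the average of a cell indicator is 1/2
            R[left] += r
            R[right] += r
        elif plastic[left] or plastic[right]:
            # Elastic-plastic facet: E is the plastic cell, jump = w n.
            cell, normal = (left, 1.0) if plastic[left] else (right, -1.0)
            R[cell] += -w[cell] * normal / h
    return R


# Fully plastic bar with a linear profile: interior cells recover the slope.
print(lifting_1d([0.0, 1.0, 2.0, 3.0], 1.0, [True, True, True, True]))
# Plastic zone of two cells embedded between elastic cells.
print(lifting_1d([0.0, 2.0, 4.0, 0.0], 1.0, [False, True, True, False]))
```

For a constant field on a fully plastic bar the function returns zeros, which matches the observation that a constant λ does not activate the gradient term unless a Dirichlet facet is present.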
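6.3 Linearisation of the governing equations
The steady state balance of momentum equation from (2.12), page 32, at the end of
a loading step reads:
\[ \int_{\Omega} \sigma(u_{n+1}, \lambda_{n+1}) : \nabla v \, dx - \int_{\Gamma_N} h_{n+1} \cdot v \, ds - \int_{\Omega} b_{n+1} \cdot v \, dx = 0, \tag{6.20} \]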
Together these equations form a coupled system of equations that are nonlinear in
general. Newton’s method is, therefore, employed to obtain a solution by linearising
about a state defined at Newton iteration k.
At the end of a loading step the stress tensor can be computed from (2.21)
and (2.22), page 34, by:
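\[ \sigma_{n+1} = C : \left( \varepsilon_{n+1} - \varepsilon^p_{n+1} \right). \tag{6.23} \]
\[ \varepsilon^p_{n+1} - \varepsilon^p_n = \Delta\lambda \, \frac{\partial f(\sigma_{n+1})}{\partial \sigma} = \Delta\lambda \, N(\sigma_{n+1}), \tag{6.24} \]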
where the increment of the plastic multiplier ∆λ = λn+1 − λn . The Newton
increment of the stress tensor is determined by inserting (6.24) into (6.23) and
linearising such that at iteration k:
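\[ d\sigma = C : \left( \nabla^s du - N_k \, d\lambda - \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \, d\sigma \right), \tag{6.25} \]
\[ d\sigma = C_{\mathrm{tan}} : \left( \nabla^s du - N_k \, d\lambda \right), \tag{6.26} \]
with
\[ C_{\mathrm{tan}} = \left( C^{-1} + \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \right)^{-1}. \tag{6.27} \]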
Here, ∆λk = λk − λn denotes the total increment in the plastic multiplier measured
from the previously converged state at load step n.
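The step from (6.25) to (6.26) can be spelled out as follows, assuming that C is invertible and writing the double contraction explicitly. Contracting (6.25) with $C^{-1}$ and collecting the terms in dσ on the left-hand side gives:

```latex
\left( C^{-1} + \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \right) : d\sigma
  = \nabla^s du - N_k \, d\lambda ,
```

so that inverting the bracketed fourth-order tensor yields (6.26) with the tangent defined in (6.27).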
In a similar fashion the increment of the yield function can be found by linearis-
ing (6.1) such that:
\[ df = N_k \, d\sigma - H \, d\lambda + G \nabla^2 d\lambda, \tag{6.28} \]
which after inserting (6.26) results in the following expression for the increment of
the yield function:
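Carrying out the substitution explicitly, as a sketch that writes the double contractions in the same way as the terms of (6.31), gives:

```latex
df = N_k : C_{\mathrm{tan}} : \left( \nabla^s du - N_k \, d\lambda \right)
     - H \, d\lambda + G \nabla^2 d\lambda .
```

Each term can be matched with (6.31): the two contractions with $C_{\mathrm{tan}}$ produce the coupling and consistency terms on $\Omega^p$, the H term gives the hardening contribution, and the Laplacian leads to the lifted gradient terms after integration by parts.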
Using these increments, the linearised coupled variational formulation for the
equations (6.20) and (6.22) then reads: find (du, dλ) ∈ V × W such that
where
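\[
\begin{aligned}
a((du, d\lambda); (v, w)) = {} & \int_{\Omega} C_{\mathrm{tan}} : \nabla^s du : \nabla v \, dx - \int_{\Omega} d\lambda \, H w \, dx \\
& - \int_{\Omega^p} d\lambda \, C_{\mathrm{tan}} : N_k : \nabla v \, dx + \int_{\Omega^p} N_k : C_{\mathrm{tan}} : \nabla^s du \, w \, dx \\
& - \int_{\Omega^p} d\lambda \, N_k : C_{\mathrm{tan}} : N_k \, w \, dx - G \int_{\Omega^p} \left( \nabla d\lambda + R(d\lambda) \right) \cdot \left( \nabla w + R(w) \right) dx \\
& - \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} \int_{\Omega^p} \alpha G \, r_S(d\lambda) \cdot r_S(w) \, dx
\end{aligned}
\tag{6.31}
\]
and
\[ L(v, w) = \int_{\Omega} \sigma_k : \nabla v \, dx - \int_{\Omega} b \cdot v \, dx - \int_{\Gamma_N} h \cdot v \, ds + \int_{\Omega^p} f_k \, w \, dx. \tag{6.32} \]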
In the linear form, the homogeneous Dirichlet and Neumann boundary conditions
for λ, see (6.8) and (6.9), have been adopted. An important thing to note is that
the term $\int_{\Omega} d\lambda \, H w \, dx$ in (6.31) is effective in the entire domain although, strictly
speaking, it should only be effective in regions undergoing plastic deformation.
This is necessary in order to avoid a singular global system when solving the
equations. However, it does not affect the solution because the term $\int_{\Omega^p} f_k \, w \, dx$
6.4 Implementation
Implementing a solver for the coupled nonlinear equations of the gradient plasticity
problem involves advancing the solution from the pseudo time tn to the time tn+1
where the state defined at tn is known. This is achieved by a series of iterations
using a predictor–corrector algorithm outlined in Algorithms 4 and 5 which is
implemented in the C++ class GradPlasProblem in the FEniCS Solid Mechanics
library. The algorithm is inspired by the work of Djoko et al. (2007b) although there
are a few notable differences. Firstly, the evolving plastic region is determined
based on the cell average value of the yield criterion instead of the value at
integration points. This means that an element is either elastic or plastic and that
the elastic–plastic boundary is located on element facets and not inside elements.
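The cell-wise decision can be sketched as follows. The sketch assumes that the yield function has been evaluated at the integration points of each cell and that a cell is treated as plastic when the weighted average exceeds a tolerance; the function names and the tolerance are hypothetical and not taken from GradPlasProblem.

```python
def cell_average(values, weights):
    """Weighted average of integration-point values on one cell."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mark_plastic_cells(f_at_points, weights, tol=0.0):
    """Return one flag per cell: True if the cell-average yield value exceeds tol."""
    return [cell_average(f, w) > tol for f, w in zip(f_at_points, weights)]


# Three cells with three integration points each (illustrative values).
f_at_points = [[-2.0, -1.0, -3.0], [4.0, -1.0, 0.0], [1.0, 2.0, 3.0]]
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

print(mark_plastic_cells(f_at_points, weights))
```

The second cell is flagged as plastic as a whole even though one of its integration points is below yield, which is what places the elastic–plastic boundary on element facets.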
Algorithm 4 shows the computations for the predictor step. The force b and
boundary condition h are updated to the state n + 1 and the global system in (6.30)
is assembled and solved to get the Newton increments du and dλ which are used
to update the values of u and λ at time n + 1 and iteration number k such that in
general $u_{n+1,k} \leftarrow u_{n+1,k-1} + du$ and $\lambda_{n+1,k} \leftarrow \lambda_{n+1,k-1} + d\lambda$. For the first iteration
$u_{n+1,k} \leftarrow u_n + du$ and $\lambda_{n+1,k} \leftarrow \lambda_n + d\lambda$. In the following, and in Algorithms 4
and 5, the subscripts n + 1 are omitted.
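The update pattern can be sketched on a zero-dimensional analogue of the coupled problem. The model below is an assumption made for illustration only: a single plastic point with stress σ = E(u − λ), an equilibrium residual σ − F, a yield residual σ − Hλ − σy and no gradient term (G = 0). All names and parameter values are hypothetical.

```python
def newton_coupled(F, E, H, sigma_y, tol=1e-10, max_iter=20):
    """Solve the two residual equations for (u, lam) by Newton's method."""
    u, lam = 0.0, 0.0  # converged state of the previous load step (assumed zero)
    for k in range(max_iter):
        sigma = E * (u - lam)
        r_u = sigma - F                  # balance of momentum residual
        r_l = sigma - H * lam - sigma_y  # yield residual
        if abs(r_u) + abs(r_l) < tol:
            return u, lam, k
        # Jacobian of (r_u, r_l) with respect to (u, lam).
        a11, a12 = E, -E
        a21, a22 = E, -E - H
        det = a11 * a22 - a12 * a21
        # Newton increments from J d = -r, solved with Cramer's rule.
        du = (-r_u * a22 + r_l * a12) / det
        dlam = (-a11 * r_l + a21 * r_u) / det
        u += du      # u_k <- u_{k-1} + du
        lam += dlam  # lam_k <- lam_{k-1} + dlam
    raise RuntimeError("Newton iteration did not converge")


u, lam, iterations = newton_coupled(F=120.0, E=10000.0, H=2000.0, sigma_y=100.0)
print(u, lam, iterations)
```

Since this analogue is linear, the iteration converges after a single update; in the full problem the tangent and the set of plastic cells change between iterations.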
The total increment in the plastic multiplier for the entire load step is computed
at every integration point, line 3. If the cell average of this increment is negative,
the cell is marked as elastic during the entire load step n → n + 1 to avoid the
unstable situation where elements are switching back and forth between the elastic
and plastic state, lines 5–8. Note that it is only the total increment of λ which
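\[ a(f_k, w) = L(w) \quad \forall\, w \in W, \tag{6.33} \]
where
\[ a(f_k, w) := \int_{\Omega} f_k \, w \, dx \tag{6.34} \]
and
\[
\begin{aligned}
L(w) := {} & \int_{\Omega} \left( \phi(\sigma_k) - H\lambda_k - \sigma_y \right) w \, dx \\
& - G \int_{\Omega^p} \left( \nabla\lambda_k + R(\lambda_k) \right) \cdot \left( \nabla w + R(w) \right) dx \\
& - \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} \int_{\Omega^p} \alpha G \, r_S(\lambda_k) \cdot r_S(w) \, dx,
\end{aligned}
\tag{6.35}
\]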
The yield criterion is evaluated (line 12), using the trial stress, by solving the
variational problem (6.33) under the assumption that the set of plastic elements
remained constant during the last iteration, that is, using $\Omega^p_{k-1}$ (or $\Omega^p_n$ in case
k = 0).
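For piecewise constant test functions the problem (6.33) decouples. Taking w as the indicator function of a cell T reduces (6.34) to the cell value of f multiplied by the cell volume, so the mass matrix is diagonal. The sketch below assumes that the entries of the assembled right-hand side (6.35) are already available per cell; the function name and the numbers are hypothetical.

```python
def project_dg0(load_vector, cell_volumes):
    """Solve M f = L for piecewise constants, where M = diag(|T|)."""
    return [L_T / volume for L_T, volume in zip(load_vector, cell_volumes)]


# Illustrative right-hand side entries L(w_T) and cell volumes |T|.
load_vector = [0.5, -0.25, 0.0]
cell_volumes = [0.5, 0.5, 0.25]

f_cells = project_dg0(load_vector, cell_volumes)
print(f_cells)
```

Each returned entry is the cell-wise value of the yield function, so no global linear solve is needed for the P0 case.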
UFL code
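# Function spaces: linear displacements, piecewise constant plastic multiplier.
V = VectorElement("Lagrange", tetrahedron, 1)
W = FiniteElement("DG", tetrahedron, 0)
# Quadrature elements storing point data in vector form:
# 6 components for symmetric tensors and 36 for the 6x6 tangent.
EPS = VectorElement("Quadrature", tetrahedron, 1, 6)
TAN = VectorElement("Quadrature", tetrahedron, 1, 36)
element = V * W
(v, w) = TestFunctions(element)
(du, dl) = TrialFunctions(element)
N0 = Coefficient(EPS)
sig0 = Coefficient(EPS)
f0 = Coefficient(W)
t = Coefficient(TAN)
H = 2000.0
G = 800.0
def tangent(t):
    # Unpack the 36 stored components into a 6x6 matrix.
    return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])
def epsilon(U):
    # Strain in vector form: three normal components followed by three shear terms.
    return as_vector([U[i].dx(i) for i in range(3)] \
                     + [U[i].dx(j) + U[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])
def sigma(s):
    # Rebuild the symmetric 3x3 stress tensor from its 6 stored components.
    return as_matrix([[s[0], s[3], s[4]],
                      [s[3], s[1], s[5]],
                      [s[4], s[5], s[2]]])
# Linear form, cf. (6.32): stress term on subdomains 0 and 1, yield term on subdomain 1.
L = inner(sigma(sig0), grad(v))*dx(0) \
    + inner(sigma(sig0), grad(v))*dx(1) + f0*w*dx(1)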
Figure 6.1: UFL input for the conventional parts of the variational problem in (6.30)
in three dimensions. In the specific case, continuous, piecewise linear elements are
used for the displacements while discontinuous piecewise constant elements are
used for the plastic multiplier.
6.5 Numerical examples
The lifting-type formulation of the model will be considered and the solver is
implemented in the FEniCS framework as outlined in the previous section. Two
combinations of finite element discretisations for the displacement and plastic
multiplier fields will be considered. The first case considers a continuous, piecewise
linear displacement field and a discontinuous, piecewise constant field for the
plastic multiplier λ and will be referred to as the P1 /P0 case. The second case
considers a continuous, piecewise quadratic displacement field and a discontinuous,
piecewise linear field for the plastic multiplier λ and will be referred to as the
P2 /P1 case. For the latter case, it is chosen to use discontinuous, piecewise linear
polynomials for the gradient space Q although constant elements can be used
according to the definition in (5.3). Using equal order elements for the plastic
multiplier and the gradient space improves convergence of the Newton solver
when large gradients are present. Similar observations have been reported by Bassi
and Rebay (1997). Based on the conclusions from the previous chapter regarding
lifting-type formulations for fields involving discontinuous constants, the value of
the stabilisation parameter will be set to $\alpha = 10^{-3}$ for all examples so that the
stabilisation term does not govern the solution.
Two types of boundary conditions for λ, see (6.8) and (6.9), are considered for
all examples. The first type will be referred to as the microhard boundary condition
where λ = 0 on $\Gamma_D^p = \partial\Omega^p \setminus \partial\Omega$, that is, the facets on the elastic–plastic boundary
which are not located on the exterior of the domain. For the microhard boundary
condition, ∇λ · n = 0 is imposed on the remainder of facets on the plastic boundary
such that $\Gamma_N^p = \partial\Omega^p \cap \partial\Omega$. The second type will be referred to as the microfree
boundary condition where ∇λ · n = 0 on $\partial\Omega^p$.
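The two boundary condition types amount to a classification of facets that depends on the current set of plastic cells. The following sketch assumes a facet-to-cell connectivity list in which exterior facets have one adjacent cell and interior facets have two; the function name and data layout are hypothetical.

```python
def classify_facets(facet_cells, plastic, microhard=True):
    """Split facets into interior plastic, Dirichlet and Neumann sets.

    facet_cells[f] holds the one (exterior) or two (interior) cells of facet f.
    plastic[c] flags plastic cells. Facets that touch no plastic cell are skipped.
    """
    interior, dirichlet, neumann = [], [], []
    for f, cells in enumerate(facet_cells):
        n_plastic = sum(1 for c in cells if plastic[c])
        if n_plastic == 0:
            continue                      # facet lies in the elastic region
        if len(cells) == 2 and n_plastic == 2:
            interior.append(f)            # inside the plastic region
        elif len(cells) == 2 and microhard:
            dirichlet.append(f)           # elastic-plastic boundary, lambda = 0
        else:
            neumann.append(f)             # grad(lambda) . n = 0
    return interior, dirichlet, neumann


# Strip of four cells; cell 0 is elastic, cells 1 to 3 are plastic.
facet_cells = [[0], [0, 1], [1, 2], [2, 3], [3]]
plastic = [False, True, True, True]

print(classify_facets(facet_cells, plastic, microhard=True))
print(classify_facets(facet_cells, plastic, microhard=False))
```

With the microfree option the Dirichlet set is empty and all facets on the boundary of the plastic region are treated as Neumann facets.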
[Plot: net force (N) against displacement (mm), 0 to 0.002 mm, with curves for G = 0, G = 1E2, G = 1E4 and G = 1E6.]
Figure 6.2: Load-displacement curve for different values of G using the microfree
boundary condition for λ on a unit square loaded in shear.
Figure 6.3: Localisation of λ in the middle column of elements for different values
of the gradient parameter G after the last load step using the microfree boundary
condition.
[Plot: net force (N) against displacement (mm), 0 to 0.002 mm, with curves for G = 0, G = 50, G = 250 and G = 1000.]
Figure 6.4: Load-displacement curve for different values of G using the microhard
boundary condition for λ on a unit square loaded in shear.
Figure 6.5: Distribution of λ for different values of the gradient parameter G after
the last load step using the microhard boundary condition.
still exhibits softening, but the softening is less pronounced than in the case where
G = 0 MPa. As the value of G is increased, the softening becomes less pronounced
and for G = 250 MPa the load-displacement curve is almost perfectly plastic. In
other words, the gradient effect counterbalances the influence of material softening
governed by the hardening parameter H. For G = 1000 MPa, the specimen exhibits
a hardening behaviour as the gradient effect becomes dominant compared to the
softening term in the yield function.
The distribution of the λ field after the last load step for the microhard boundary
condition can be seen in Figure 6.5. Note that for the two cases where G = 50 MPa
and G = 250 MPa, Figures 6.5a and 6.5b, the plastic zone is still localised in the
middle column of elements. However, the values of λ are different compared to the
classical plasticity case, Figure 6.3a, in that higher values of G correspond to lower
values of λ because the microhard boundary condition drives the values towards
zero. When the gradient parameter is large enough to make the specimen enter the
hardening regime, the plastic zone expands to the adjacent elements as shown in
Figure 6.5c. The expanding plastic zone also accounts for the jumps observed in
the load-displacement curve.
As demonstrated in this example, the model is incapable of providing regulari-
sation under the given conditions when using the microfree boundary condition.
Switching to the microhard boundary condition makes the softening less pro-
nounced for lower values of the gradient parameter although it does not lead to an
expansion of the plastic zone. To make the plastic zone expand, a high value of
the gradient parameter is needed which effectively changes the load-displacement
response from softening to hardening.
(a) Mesh 1 consisting of 690 triangles.
(b) Mesh 2 consisting of 1566 triangles.
(c) Mesh 3 consisting of 6370 triangles.
depend on the value of the gradient parameter also when using the microfree
boundary condition. As a consequence, regularisation of the softening problem
can be expected to some extent. Three different unstructured meshes, shown in
Figure 6.6, will be considered to demonstrate the influence of the mesh size in
this softening problem. The width of the plate is 10 mm while the height is 15 mm
and the imperfection in the lower left corner has an extension of 1 mm. The
left-hand side of the plate is fixed in the horizontal direction and the bottom is fixed
in the vertical direction. The test is performed under plane strain conditions using
material parameters identical to the ones shown in Table 6.1 with the exception
that H = −4000 MPa and the yield stress is uniform in the entire domain.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.7: Mesh dependent softening with G = 0 MPa (classical plasticity) for the
P1/P0 case.
decreasing width as shown in the figure. Apart from the convergence problems,
the overall behaviour is as anticipated. In general, the approach is very sensitive to
the choice of model parameters for softening problems. In particular, the step size,
the mesh size, and the values of H and G affect the stability of the problem.
more softening than the coarser mesh for a given value of the gradient parameter.
Figure 6.8: Localisation of λ with G = 0MPa for the three different mesh cases after
the last converged load step using P1 /P0 elements.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.9: Load-displacement curve with G = 200 MPa using P1/P0 elements and
the microfree boundary condition for λ.
Figure 6.10: Distribution of λ with G = 200MPa for the three different mesh cases
after the final load step using P1 /P0 elements and the microfree boundary condition
for λ.
is that the microfree boundary condition only has a regularising effect while the
plastic zone is developing. Once the plastic zone is fully developed, there is no
mechanism by which the plastic zone can expand as the ‘shape’ of the λ field does
not change. Therefore, no additional gradient effects are introduced and plastic flow
localises in the zone which is already plastic. The microfree boundary condition is,
therefore, not able to produce mesh independent results for this softening problem
even if some plastic strain gradients are present inside the plastic domain.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.11: Load-displacement curve with G = 200 MPa using P1/P0 elements and
the microhard boundary condition for λ.
plastic zone expands ‘too much’ during an iteration, $\Delta\lambda_k^{\mathrm{avg}}$ for a given cell T can
become negative due to the diffusive nature of the yield function. For stability
reasons, in line 5 of Algorithm 4, the cell is marked as elastic during the entire load
step should this event occur. However, this introduces an artificial elastic–plastic
boundary in the otherwise plastic domain, which again, due to the boundary
conditions for λ, introduces additional hardening. This is illustrated in Figure 6.13
which shows the distribution of λ on mesh 3 at load steps 6–11. In load steps 6–8,
the distribution of λ is developing as expected for the given problem. Then, in load
steps 9–11 it is seen how the artificial elastic–plastic boundaries develop as loading
progresses which causes ‘plastic islands’ to emerge in the computational domain.
This effect naturally has an impact on the load-displacement curve as already
shown in Figure 6.11. For fine meshes, it is difficult to avoid this situation when
using the microhard boundary condition for λ. However, while the plastic zone
is expanding smoothly, convergence is usually better compared to the microfree
case. Reducing the loading step size does not improve the stability of the algorithm
because even a small expansion of the plastic zone can result in a negative $\Delta\lambda_k^{\mathrm{avg}}$
for a cell well inside the plastic region.
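The rule that creates these artificial boundaries can be sketched as follows. The sketch assumes that λ is stored per integration point for every cell and that the set of frozen cells is kept for the whole load step; the function name and data layout are hypothetical and do not reproduce Algorithm 4.

```python
def freeze_unloading_cells(lam_k, lam_n, weights, frozen):
    """Add cells whose average total increment lam_k - lam_n is negative.

    lam_k and lam_n hold integration-point values per cell at the current
    iteration and at the previously converged load step. Cells in `frozen`
    are treated as elastic for the remainder of the load step.
    """
    for cell, (lk, ln, w) in enumerate(zip(lam_k, lam_n, weights)):
        increments = [a - b for a, b in zip(lk, ln)]
        average = sum(d * wi for d, wi in zip(increments, w)) / sum(w)
        if average < 0.0:
            frozen.add(cell)
    return frozen


# Two cells with two integration points each (illustrative values).
lam_n = [[0.0, 0.0], [0.1, 0.1]]
lam_k = [[0.02, 0.01], [0.05, 0.12]]
weights = [[1.0, 1.0], [1.0, 1.0]]

frozen = freeze_unloading_cells(lam_k, lam_n, weights, set())
print(frozen)
```

Because a frozen cell is never released within the load step, a single negative average well inside the plastic region is enough to create a new internal boundary.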
P2 /P1 elements
The influence of using higher order elements for the softening problem is now
investigated. Using higher order elements makes the algorithms even more sensitive
to the choice of model parameters. The hardening modulus is set to H = −2000MPa
[Figure panels: (a) Load step 8. (b) Load step 9. (c) Load step 10. (d) Load step 11. (e) Load step 12. (f) Load step 13.]
[Figure panels: (d) Load step 9. (e) Load step 10. (f) Load step 11.]
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.14: Mesh dependent softening with G = 0 MPa (classical plasticity) for the
P2/P1 case.
and the size of the plastic load steps is reduced. An elastic load step of ∆u =
0.005 mm is followed by sixty plastic load steps of ∆u = 0.0005 mm such that the
total downward displacement after the final load step is, still, u = 0.035 mm. Again,
the gradient parameter G is set to zero to verify that the classical theory is mesh
dependent in the current implementation.
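The load schedule can be checked with a few lines of arithmetic. The sketch below uses exact decimal arithmetic to avoid floating-point round-off in the running total; the function name is hypothetical.

```python
from decimal import Decimal


def load_schedule(elastic_step, plastic_step, n_plastic):
    """Cumulative prescribed displacement (in mm) after each load step."""
    total = Decimal(elastic_step)
    history = [total]
    for _ in range(n_plastic):
        total += Decimal(plastic_step)
        history.append(total)
    return history


# One elastic step of 0.005 mm followed by sixty plastic steps of 0.0005 mm.
steps = load_schedule("0.005", "0.0005", 60)
print(len(steps), steps[-1])
```

The schedule contains 61 load steps in total and ends at 0.005 + 60 × 0.0005 = 0.035 mm.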
The resulting load-displacement curve for the three meshes is shown in Fig-
ure 6.14. For this test set up, the Newton solver failed to converge for the last few
load steps in the case of mesh 2 and for mesh 3 convergence was only achieved for
a couple of load steps after the plastic zone was fully developed.
The distribution of the λ field after the last converged load step can be seen
in Figure 6.15. It is clear that the higher order elements allow the plastic zone to
localise in a zone which is only a couple of elements wide. (Compare to Figure 6.8
for the P1 /P0 case.)
The microfree boundary condition is now applied for the λ field with G =
200MPa and the resulting load-displacement curve can be seen in Figure 6.16.
Compared to the P1 /P0 case, the results are now almost mesh independent. How-
ever, the convergence rate of the Newton solver was poor and for mesh 3 it failed
to converge after a few plastic load steps. Figure 6.17 shows the distribution of
λ for the three different meshes after the last converged load step. The width of
the plastic zone is almost identical for the three meshes and much less dependent
on the cell size compared to the results in Figure 6.15. Note that the softening for
G = 200MPa in Figure 6.16 is much less pronounced compared to Figure 6.14 for
Figure 6.15: Localisation of λ with G = 0MPa for the three different mesh cases
after the last converged load step using P2 /P1 elements.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.16: Load-displacement curve with G = 200 MPa using P2/P1 elements and
the microfree boundary condition for λ.
Figure 6.17: Distribution of λ with G = 200 MPa for the three different mesh cases
after the final load step using P2/P1 elements and the microfree boundary condition for λ.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for Mesh 1, Mesh 2 and Mesh 3.]
Figure 6.18: Load-displacement curve with G = 200 MPa using P2/P1 elements and
the microhard boundary condition for λ.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for G = 0, G = 800, G = 1600 and G = 3200.]
Figure 6.19: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000 MPa using P1/P0 elements
and the microfree boundary condition for λ.
[Plot: net force (N) against displacement (mm), 0 to 0.04 mm, with curves for G = 0, G = 800, G = 1600 and G = 3200.]
Figure 6.20: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000 MPa using P1/P0 elements
and the microhard boundary condition for λ.
Figure 6.21: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P2/P1 elements
and the microfree boundary condition for λ. [Plot: net force [N] against
displacement [mm] for G = 0, 800, 1600 and 3200.]
Figure 6.22: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P2/P1 elements
and the microhard boundary condition for λ. [Plot: net force [N] against
displacement [mm] for G = 0, 800, 1600 and 3200.]
much better for a hardening problem and it is less sensitive to the choice of model
parameters.
6.5.4 Micro-indentation
As shown in the hardening problem in the previous section, the effect of the value
of the gradient parameter on the load-displacement curves was negligible as the
microfree boundary condition on the exterior boundary did not introduce addi-
tional hardening. Therefore, a micro-indentation problem is now investigated to
demonstrate that the model is capable of capturing size effects when the microhard
boundary condition for λ is used on an elastic–plastic boundary located inside the
computational domain.
A three-dimensional model problem is considered in which the specimen of
interest has a width of 10mm and a height of 5mm. The specimen is constrained
such that displacements in the normal direction on the four sides and at the bottom
are prevented. The indenter is located at the center of the top part of the domain.
It has a spherical tip with a radius of 1mm and is initially embedded within the
specimen, to which it is rigidly attached, at a depth equal to the radius. The domain
in this initial state is assumed stress free.
A sequence of downward displacements, measured from this initial state, is
prescribed on the indenter. An elastic load step of ∆u = 0.0008mm is followed
by seven plastic load steps of ∆u = 0.0004mm such that the total downward
displacement after the final load step is u = 0.0036mm. Rather than modelling
the indenter explicitly, the prescribed displacements are imposed on the degrees
of freedom located on the surface of the indenter. Due to the symmetry of the
problem, only one quarter of the domain is modelled.
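The load stepping described above can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `load_schedule` and its arguments are hypothetical and not part of the implementation used in this work; the step sizes are the ones quoted in the text.

```python
def load_schedule(elastic_step, plastic_step, n_plastic):
    """Return the cumulative prescribed displacement after each load step.

    Hypothetical helper: one elastic step followed by n_plastic equal
    plastic steps, all measured from the initial (stress free) state.
    """
    increments = [elastic_step] + [plastic_step] * n_plastic
    total = 0.0
    schedule = []
    for du in increments:
        total += du
        schedule.append(total)
    return schedule


# Step sizes in mm, as quoted in the text.
schedule = load_schedule(0.0008, 0.0004, 7)
for step, u in enumerate(schedule, start=1):
    print(f"load step {step}: u = {u:.4f} mm")
```

The last entry of the schedule reproduces the total downward displacement of 0.0036mm up to floating-point round-off.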
Front and top views of the computational mesh used for this problem are shown
in Figure 6.23. The mesh is refined in the region around the indenter tip. The
material parameters for this example are identical to the ones shown in Table 6.1
with the exception that H = 2000MPa.
The net force acting on the indenter tip as a function of the indentation depth
for the P1 /P0 case using microfree boundary conditions is shown in Figure 6.24.
Increasing the value of G only has a small effect on the load–displacement curve.
On the other hand, the load–displacement curve shown in Figure 6.25 for the
microhard boundary condition shows a much stronger dependence on the value of
G. After load step number 6 the load bearing capacity for the cases G = 1600MPa
and G = 3200MPa increases dramatically. Again, this can be attributed to the effect
of forcing cells to be elastic in Algorithm 4 if $\Delta\lambda_k^{\mathrm{avg}}$ is negative, as explained in the
previous section.
Figures 6.26 and 6.27 show the load–displacement curves for the P2 /P1 case
using microfree and microhard boundary conditions respectively. In case of the
microfree boundary condition, the effect of increasing G has completely vanished.
Figure 6.23: Finite element mesh for the micro-indentation example consisting of
10979 tetrahedra.
Figure 6.24: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P1/P0 elements and
the microfree boundary condition for λ. [Plot: net force [N] against indenter
displacement [mm] for G = 0, 800, 1600 and 3200.]
Figure 6.25: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P1/P0 elements and
the microhard boundary condition for λ. [Plot: net force [N] against indenter
displacement [mm] for G = 0, 800, 1600 and 3200.]
Figure 6.26: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P2/P1 elements and
the microfree boundary condition for λ. [Plot: net force [N] against indenter
displacement [mm] for G = 0, 800, 1600 and 3200.]
Figure 6.27: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P2/P1 elements and
the microhard boundary condition for λ. [Plot: net force [N] against indenter
displacement [mm] for G = 0, 800, 1600 and 3200.]
Figure 6.28: Close-ups of the region around the indenter tip which show the
distribution of λ at the final load step for different values of G using P2 /P1 elements
and the microfree boundary condition for λ.
Figure 6.29: Close-ups of the region around the indenter tip which show the
distribution of λ at the final load step for different values of G using P2 /P1 elements
and the microhard boundary condition for λ.
In this work, the automated modelling framework of FEniCS has been developed
in a number of directions with the aim of facilitating rapid implementation and
testing for a wider range of problems. The developed extensions are widely used
by researchers and application developers in a number of different fields, see the
introduction to Chapter 3 and Section 4.2.5 for examples. The main contributions
can be summarised as follows. Efficiency is an issue when large scale problems
are solved using the finite element method. The development of the quadrature
representation and its optimisations has, therefore, extended the applicability of the
automated modelling concepts to more complex problems. Discontinuous Galerkin
methods, and methods that use discontinuous Galerkin concepts, may be applied
to problems other than strain gradient plasticity as demonstrated in this work. The
extensions to FEniCS for discontinuous Galerkin methods developed in this work,
therefore, also apply to these problems. Finally, the quadrature element, developed
for correct linearisation of plasticity problems, can be used for other problems
where functions do not come from a finite element space.
Conclusions
The main conclusions of this work relate to the representations and optimisations
of finite element forms, the automation of discontinuous Galerkin methods and
strain gradient plasticity. Numerical experiments have shown that the relative
run-time performance of the quadrature representation and the tensor contraction
representation can differ substantially depending on the nature of the considered
variational form. In general, the tensor contraction approach deals well with
forms which involve high-order bases and few coefficient functions, whereas the
quadrature representation is more efficient as the number of coefficient functions
(other than constant coefficients) and derivatives in a form increases. Hence, in
general, the quadrature representation is significantly faster for more complicated
forms. Furthermore, it has been shown that quadrature optimisations can have a
significant impact on the run-time performance. It is, therefore, desirable to select
the most favourable representation and optimisation strategy based on an a priori
168 Chapter 7. Conclusions and future developments
inspection of the variational form. However, the code with the lowest number of
flops, at least for the quadrature representation, does not always perform best for a
given form. In addition, the run-time performance even depends on which C++
compiler options are used. A strategy for selecting between representations and
optimisations based only on an estimation of the number of flops does, therefore,
not seem feasible.
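Since flop counts alone are not a reliable predictor, one possible alternative is to time candidate kernels directly on the target machine. The sketch below is a hypothetical illustration in plain Python and is not code generated by the form compiler: the two routines `kernel_loops` and `kernel_hoisted`, and the helper `select_fastest`, are names assumed here. Both routines compute the P1 mass matrix on the reference triangle with a three-point quadrature rule; the second performs fewer multiplications, but which one runs faster is decided by measurement rather than by counting operations.

```python
import timeit


def kernel_loops(weights, basis):
    """Straightforward quadrature loop: points, then basis function pairs."""
    n = len(basis[0])
    A = [[0.0] * n for _ in range(n)]
    for w, phi in zip(weights, basis):
        for i in range(n):
            for j in range(n):
                A[i][j] += w * phi[i] * phi[j]
    return A


def kernel_hoisted(weights, basis):
    """Same tensor, with the product w * phi[i] hoisted out of the inner loop."""
    n = len(basis[0])
    A = [[0.0] * n for _ in range(n)]
    for w, phi in zip(weights, basis):
        for i in range(n):
            wphi = w * phi[i]
            row = A[i]
            for j in range(n):
                row[j] += wphi * phi[j]
    return A


def select_fastest(candidates, args, repeat=3, number=200):
    """Time every candidate on the same input and return the fastest name."""
    timings = {}
    for name, func in candidates.items():
        runs = timeit.repeat(lambda f=func: f(*args), repeat=repeat, number=number)
        timings[name] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings


# Three-point rule on the reference triangle (weights sum to the area 1/2);
# each row of BASIS holds the three P1 basis functions at one quadrature point.
WEIGHTS = [1.0 / 6.0] * 3
BASIS = [
    [2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0],
    [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0],
    [1.0 / 6.0, 1.0 / 6.0, 2.0 / 3.0],
]
CANDIDATES = {"loops": kernel_loops, "hoisted": kernel_hoisted}

best, timings = select_fastest(CANDIDATES, (WEIGHTS, BASIS))
print("fastest candidate on this machine:", best)
```

The selected candidate may change between machines, interpreters and runs, which mirrors the observation that the ranking of generated code depends on more than the operation count.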
By developing extensions for supporting discontinuous Galerkin methods, a
range of discontinuous variational formulations can be implemented in a relatively
straightforward fashion in the FEniCS framework. However, the new abstractions
also permit other formulations that build on concepts from discontinuous Galerkin
methods to be implemented by using the developed extensions as building blocks.
This has been demonstrated in a semi-automated implementation of a lifting-type
discontinuous Galerkin formulation for the Poisson equation. The lifting-type
formulation has two main advantages in relation to this work compared to the
interior penalty formulation. Firstly, it is stable for all positive values of the
stabilisation parameter. Secondly, as numerical experiments indicate, one can use
a constant basis for the Poisson equation, something which is not possible when
using the interior penalty method.
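For comparison, a commonly used symmetric interior penalty bilinear form for the Poisson equation is recalled here. The notation is an assumption made for this sketch and need not match the notation used elsewhere in this work: $\Gamma$ denotes the interior facets, $[\![ u ]\!] = u^{+} n^{+} + u^{-} n^{-}$ the jump, $\langle \cdot \rangle$ the average across a facet, $h$ a local cell size and $\alpha$ the penalty parameter. Boundary terms are omitted for brevity.

```latex
a_{\mathrm{IP}}(u, v)
  = \sum_{K} \int_{K} \nabla u \cdot \nabla v \, dx
  - \int_{\Gamma} [\![ u ]\!] \cdot \langle \nabla v \rangle \, ds
  - \int_{\Gamma} \langle \nabla u \rangle \cdot [\![ v ]\!] \, ds
  + \int_{\Gamma} \frac{\alpha}{h} \, [\![ u ]\!] \cdot [\![ v ]\!] \, ds
```

This form is stable only if $\alpha$ is chosen sufficiently large. With a piecewise constant basis, $\nabla u$ and $\nabla v$ vanish in every cell, so only the penalty term remains.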
The Aifantis strain gradient plasticity model was implemented in the FEniCS
framework using a continuous, piecewise linear displacement field and a discon-
tinuous, piecewise constant field for the plastic multiplier. The latter was possible
because a lifting-type discontinuous Galerkin formulation was used for the plas-
tic multiplier. The implementation was also tested successfully for a continuous,
piecewise quadratic displacement field and a discontinuous, piecewise linear field
for the plastic multiplier. It was demonstrated that the model is not suitable for
softening problems. Size effects, on the other hand, were observed for a hardening
problem in the micro-indentation example, provided that the microhard boundary
condition was employed for the plastic multiplier. Some numerical problems were,
however, observed during load steps in which the plastic region did not expand
smoothly. The observed problems originate from the algorithm which handles the
update of state variables as it will force an element to be elastic during a load step
if the average of the total increment of the plastic multiplier becomes negative in a
given iteration. This issue should be resolved in order to produce reliable results.
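The check that triggers this behaviour can be sketched as follows. The function `force_elastic` and the use of a plain arithmetic mean over the values stored in each cell are assumptions made for illustration; the averaging in the actual state-update algorithm may differ.

```python
def force_elastic(delta_lambda_by_cell):
    """Flag cells whose average plastic multiplier increment is negative.

    delta_lambda_by_cell: one list per cell holding the total increments
    of the plastic multiplier for the current load step (hypothetical layout).
    Returns one boolean per cell; True means the cell is treated as elastic.
    """
    flags = []
    for values in delta_lambda_by_cell:
        average = sum(values) / len(values)
        flags.append(average < 0.0)
    return flags


# Three cells: clearly plastic, negative on average, and exactly zero.
example = [[0.1, 0.2], [-0.3, 0.1], [0.0, 0.0]]
print(force_elastic(example))
```

A cell whose average is only slightly negative in one iteration is flagged in the same way as a cell that is unloading, which is one way to see why a plastic region that does not expand smoothly can cause difficulties.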
Future developments
Using this work as a basis, the following future developments of the FEniCS
framework could be of interest. As the user base of FEniCS grows, so does
the desire to solve problems of increasing complexity. Therefore, continued
investigation into further optimising the quadrature representation is desirable.
The optimisations should focus on both run-time performance of the generated
169
Alfred, V., Sethi, R., and Jeffrey, D. U. (1986). Compilers: Principles, Techniques and
Tools. Addison-Wesley, Reading, Massachusetts.
Allen, G., Benger, W., Goodale, T., Hege, H.-C., Lanfermann, G., Merzky, A., Radke,
T., Seidel, E., and Shalf, J. (2000). The Cactus code: a problem solving environment
for the grid. In High-Performance Distributed Computing, 2000. Proceedings. The
Ninth International Symposium on, pages 253–260.
Alnæs, M. S. (2012). UFL: a finite element form language. In Logg, A., Mardal,
K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by
the Finite Element Method, volume 84 of Lecture Notes in Computational Science and
Engineering, chapter 17. Springer.
Alnæs, M. S., Logg, A., and Mardal, K.-A. (2012). UFC: a finite element code
generation interface. In Logg, A., Mardal, K.-A., and Wells, G. N., editors,
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 16. Springer.
Alnæs, M. S., Logg, A., Mardal, K.-A., Skavhaug, O., and Langtangen, H. P.
(2009). Unified framework for finite element assembly. International Journal of
Computational Science and Engineering, 4(4):231–244.
Alnæs, M. S., Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2013).
Unified Form Language: A domain-specific language for weak formulations
of partial differential equations. ACM Transactions on Mathematical Software, To
appear. http://arxiv.org/abs/1211.4047.
172 References
Alnæs, M. S. and Mardal, K.-A. (2012). SyFi and SFC: Symbolic finite elements
and form compilation. In Logg, A., Mardal, K.-A., and Wells, G. N., editors,
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 15. Springer.
Arnold, D. N., Brezzi, F., Cockburn, B., and Marini, L. D. (2002). Unified analysis for
discontinuous Galerkin methods for elliptic problems. SIAM Journal on Numerical
Analysis, 39(5):1749–1779.
Balay, S., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes,
L. C., Smith, B. F., and Zhang, H. (2001). PETSc Web page. http://www.mcs.anl.
gov/petsc/.
Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Kornhuber, R., Ohlberger,
M., and Sander, O. (2008a). A Generic Grid Interface for Parallel and Adaptive
Scientific Computing. Part II: Implementation and Tests in DUNE. Computing,
82(2–3):121–138.
Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Ohlberger, M., and
Sander, O. (2008b). A Generic Grid Interface for Parallel and Adaptive Scientific
Computing. Part I: Abstract Framework. Computing, 82(2–3):103–119.
Bonet, J. and Wood, R. D. (1997). Nonlinear Continuum Mechanics for Finite Element
Analysis. Cambridge University Press.
Brandenburg, C., Lindemann, F., Ulbrich, M., and Ulbrich, S. (2012). Advanced
numerical methods for PDE constrained optimization with application to optimal
design in Navier Stokes flow. In Leugering, G., Engell, S., Griewank, A., Hinze,
M., Rannacher, R., Schulz, V., Ulbrich, M., and Ulbrich, S., editors, Constrained
Optimization and Optimal Control for Partial Differential Equations, volume 160 of
International Series of Numerical Mathematics, pages 257–275. Springer Basel.
Brezzi, F., Douglas, Jim, J., and Marini, L. (1985). Two families of mixed finite
elements for second order elliptic problems. Numerische Mathematik, 47:217–235.
Brezzi, F., Manzini, G., Marini, D., Pietra, P., and Russo, A. (2000). Discontinuous
Galerkin approximations for elliptic problems. Numerical Methods for Partial
Differential Equations, 16(4):365–378.
Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007a). A discontinuous
Galerkin formulation for classical and gradient plasticity – Part 1: Formulation
and analysis. Computer Methods in Applied Mechanics and Engineering, 196(37–
40):3881–3897.
Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007b). A discontinuous
Galerkin formulation for classical and gradient plasticity. Part 2: Algorithms
and numerical analysis. Computer Methods in Applied Mechanics and Engineering,
197(1–4):1–21.
Dular, P., Geuzaine, C., Henrotte, F., and Legros, W. (1998). A general environment
for the treatment of discrete problems and its application to the finite element
method. Magnetics, IEEE Transactions on, 34(5):3395–3398.
Engel, G., Garikipati, K., Hughes, T. J. R., Larson, M. G., and Taylor, R. L. (2002).
Continuous/discontinuous finite element approximations of fourth-order elliptic
problems in structural and continuum mechanics with applications to thin beams
and plates, and strain gradient elasticity. Computer Methods in Applied Mechanics
and Engineering, 191(34):3669–3750.
Fleck, N., Muller, G., Ashby, M., and Hutchinson, J. (1994). Strain gradient plasticity:
theory and experiment. Acta Metallurgica et Materialia, 42(2):475–487.
Gao, H., Huang, Y., Nix, W., and Hutchinson, J. (1999). Mechanism-based strain gra-
dient plasticity– I. theory. Journal of the Mechanics and Physics of Solids, 47(6):1239 –
1263.
Giesselmann, J., Makridakis, C., and Pryer, T. (2012). Energy consistent DG methods
for the Navier–Stokes–Korteweg system. arXiv preprint. http://arxiv.org/abs/
1207.4647.
Grandi, D., Maraldi, M., and Molari, L. (2012). A macroscale phase-field model for
shape memory alloys with non-isothermal effects: Influence of strain rate and
environmental conditions on the mechanical response. Acta Materialia, 60(1):179–
191.
Heumann, H. and Hiptmair, R. (2012). Stabilized Galerkin methods for magnetic ad-
vection. ETH Zürich. ftp://ftp.sam.math.ethz.ch/pub/sam-reports/reports/
reports2012/2012-26.pdf.
Hilber, H. M., Hughes, T. J., and Taylor, R. L. (1977). Improved numerical dissipation
for time integration algorithms in structural dynamics. Earthquake Engineering &
Structural Dynamics, 5(3):283–292.
Hoffman, J., Jansson, J., de Abreu, R. V., Degirmenci, N. C., Jansson, N., Müller,
K., Nazarov, M., and Spühler, J. H. (2013). Unicorn: Parallel adaptive finite ele-
ment simulation of turbulent flow and fluid–structure interaction for deforming
domains and complex geometry. Computers & Fluids, 80(0):310–319. Selected
contributions of the 23rd International Conference on Parallel Fluid Dynamics.
Horst, T., Heinrich, G., Schneider, M., Schulze, A., and Rennert, M. (2013). Linking
mesoscopic and macroscopic aspects of crack propagation in elastomers. In
Grellmann, W., Heinrich, G., Kaliske, M., Klüppel, M., Schneider, K., and Vilgis,
T., editors, Fracture Mechanics and Statistical Mechanics of Reinforced Elastomeric
Blends, volume 70 of Lecture Notes in Applied and Computational Mechanics, pages
129–165. Springer Berlin Heidelberg.
Hosangadi, A., Fallah, F., and Kastner, R. (2006). Optimizing polynomial expressions
by algebraic factorization and common subexpression elimination. Computer-
Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 25(10):2012–
2022.
Hughes, T. J. R., Scovazzi, G., Bochev, P. B., and Buffa, A. (2006). A multiscale
discontinuous Galerkin method with the computational structure of a continuous
Galerkin method. Computer Methods in Applied Mechanics and Engineering, 195(19–
22):2761–2787.
Jansson, N., Hoffman, J., and Nazarov, M. (2011). Adaptive simulation of turbulent
flow past a full car model. In High Performance Computing, Networking, Storage
and Analysis (SC), 2011 International Conference for, pages 1–8. IEEE.
Kirby, R. C. (2004). Algorithm 839: FIAT, A new paradigm for computing finite
element basis functions. ACM Transactions on Mathematical Software, 30:502–516.
Kirby, R. C., Knepley, M. G., Logg, A., and Scott, L. R. (2005). Optimizing the
evaluation of finite element matrices. SIAM Journal on Scientific Computing,
27(3):741–758.
Kirby, R. C. and Logg, A. (2006). A compiler for variational forms. ACM Transactions
on Mathematical Software, 32:417–444.
Kirby, R. C., Logg, A., Scott, L. R., and Terrel, A. R. (2006). Topological optimization
of the evaluation of finite element matrices. SIAM Journal on Scientific Computing,
28(1):224–240.
Labeur, R. and Wells, G. (2012). Energy stable and momentum conserving hybrid
finite element method for the incompressible Navier–Stokes equations. SIAM
Journal on Scientific Computing, 34(2):889–913.
Labeur, R. J. and Wells, G. N. (2009). Interface stabilised finite element method for
moving domains and free surface flows. Computer Methods in Applied Mechanics
and Engineering, 198(5–8):615 – 630.
Lakkis, O. and Pryer, T. (2011). A finite element method for fully nonlinear elliptic
problems. arXiv preprint. http://arxiv.org/abs/1103.2970.
Logg, A., Mardal, K.-A., and Wells, G. N., editors (2012a). Automated Solution of
Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in
Computational Science and Engineering. Springer.
Logg, A., Mardal, K.-A., and Wells, G. N. (2012b). Finite element assembly. In
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 6. Springer.
Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2012c). FFC: the fenics
form compiler. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated
Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture
Notes in Computational Science and Engineering, chapter 11. Springer.
Logg, A., Wells, G. N., and Hake, J. (2012d). DOLFIN: A C++/Python finite element
library. In Automated Solution of Differential Equations by the Finite Element Method,
volume 84 of Lecture Notes in Computational Science and Engineering, chapter 10.
Springer.
Long, K., Kirby, R., and Van Bloemen Waanders, B. (2010). Unified embedded
parallel finite element computations via software-based Fréchet differentiation.
SIAM Journal on Scientific Computing, 32:3323–3351.
Lopes, N., Pereira, P., and Trabucho, L. (2011). A numerical analysis of a class of
generalized Boussinesq-type equations using continuous/discontinuous FEM.
International Journal for Numerical Methods in Fluids, 69(7):1186–1218.
Maraldi, M., Molari, L., and Grandi, D. (2012). A unified thermodynamic framework
for the modelling of diffusive and displacive phase transitions. International
Journal of Engineering Science, 50(1):31 – 45.
Maraldi, M., Wells, G., and Molari, L. (2011). Phase field model for coupled
displacive and diffusive microstructural processes under thermal loading. Journal
of the Mechanics and Physics of Solids, 59(8):1596–1612.
178 References
Massing, A., Larson, M., and Logg, A. (2013). Efficient implementation of finite
element methods on nonmatching and overlapping meshes in three dimensions.
SIAM Journal on Scientific Computing, 35(1).
Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012a). A stabilized
Nitsche fictitious domain method for the Stokes problem. arXiv preprint. http:
//arxiv.org/abs/1206.1933.
Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012b). A stabilized
Nitsche overlapping mesh method for the Stokes problem. arXiv preprint. http:
//arxiv.org/abs/1205.6317.
Miaskowski, A., Sawicki, B., and Krawczyk, A. (2012). The use of magnetic nanopar-
ticles in low frequency inductive hyperthermia. COMPEL: The International Journal
for Computation and Mathematics in Electrical and Electronic Engineering, 31(4):1096–
1104.
Molari, L., Wells, G. N., Garikipati, K., and Ubertini, F. (2006). A discontinuous
Galerkin method for strain gradient-dependent damage: Study of interpolations
and convergence. Computer Methods in Applied Mechanics and Engineering, 195(13–
16):1480–1498.
Poole, W., Ashby, M., and Fleck, N. (1996). Micro-hardness of annealed and
work-hardened copper polycrystals. Scripta Materialia, 34(4):559–564.
179
Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J.,
Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo,
N. (2005). SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE,
93(2):232– 275.
Riesen, P., Hutter, K., and Funk, M. (2010). A viscoelastic Rivlin–Ericksen material
model applicable to glacier ice. Nonlinear Processes in Geophysics, 17:673–684.
Riesen, P. D. (2011). Variations of the surface ice motion of Gornergletscher during
drainages of the ice-dammed lake Gornersee. PhD thesis, ETH Zürich. http://dx.
doi.org/10.3929/ethz-a-006526655.
Rognes, M., Kirby, R., and Logg, A. (2010). Efficient assembly of h(div) and h(curl)
conforming finite elements. SIAM Journal on Scientific Computing, 31(6):4130–4151.
Rognes, M. E. and Logg, A. (2012). Automated goal-oriented error control I:
Stationary variational problems. arXiv preprint. http://arxiv.org/abs/1204.
6643.
Rosseel, E. and Wells, G. N. (2012). Optimal control with stochastic PDE constraints
and uncertain controls. Computer Methods in Applied Mechanics and Engineering,
213–216(0):152 – 167.
Russell, F. P. and Kelly, P. H. J. (2013). Optimized code generation for finite element
local assembly using symbolic manipulation. ACM Transactions on Mathematical
Software, 39(4).
Saibaba, A. K., Bakhos, T., and Kitanidis, P. K. (2012). A flexible Krylov solver
for shifted systems with application to oscillatory hydraulic tomography. arXiv
preprint. http://arxiv.org/abs/1212.3660.
Selim, K. (2012). An adaptive finite element solver for fluid–structure interaction
problems. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated
Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture
Notes in Computational Science and Engineering, chapter 29. Springer.
Selim, K., Logg, A., and Larson, M. (2012). An adaptive finite element splitting
method for the incompressible navier–tokes equations. Computer Methods in
Applied Mechanics and Engineering, 209–212(0):54–65.
180 References
Shewchuk, J. R. and Ghattas, O. (1993). A compiler for parallel finite element meth-
ods with domain-decomposed unstructured meshes. In Keyes, D. E. and Xu, J.,
editors, Proceedings of the Seventh International Conference on Domain Decomposition
Methods in Scientific and Engineering Computing (Pennsylvania State University),
Contemporary Mathematics, volume 180, pages 445–450. American Mathematical
Society.
Stölken, J. and Evans, A. (1998). A microbend test method for measuring the
plasticity length scale. Acta Materialia, 46(14):5109–5115.
Sukys, J., Hiptmair, R., and Heumann, H. (2010). Discontinuous Galerkin discretiza-
tion of magnetic convection. ETH Zürich. http://math1.unice.fr/~hheumann/
Files/Report_Sukys.pdf.
Ten Eyck, A., Celiker, F., and Lew, A. (2008). Adaptive stabilization of discontin-
uous Galerkin methods for nonlinear elasticity: Motivation, formulation, and
numerical examples. Computer Methods in Applied Mechanics and Engineering,
197(45–48):3605–3622.
Ten Eyck, A. and Lew, A. (2006). Discontinuous Galerkin methods for non-linear
elasticity. International Journal for Numerical Methods in Engineering, 67(9):1204–
1243.
Vynnytska, L., Rognes, M., and Clark, S. (2013). Benchmarking FEniCS for man-
tle convection simulations. Computers & Geosciences, 50(0):95–105. Benchmark
problems, datasets and methodologies for the computational geosciences.
Wells, G. N., Hooijkaas, T., and Shan, X. (2008). Modelling temperature effects on
multiphase flow through porous media. Philosophical Magazine, 88(28–29):3265–
3279.
Ølgaard, K. B., Logg, A., and Wells, G. N. (2008a). Automated code generation for
discontinuous Galerkin methods. SIAM Journal on Scientific Computing, 31(2):849–
864.
1. Automatic code generation can reduce the time needed to implement finite
element solvers, but only if one has faith in the generator that generates the
code. (This proposition was conceived after many hours of debugging the FEniCS
Form Compiler only to find out that there was a sign error in the input code.)
2. In the past, finite element solvers for partial differential equations (PDEs) were
written in languages (source code) that compilers translated into machine
code. In the present, a high-level language for expressing the mathematical
formulation of a given PDE makes it possible for compilers to automatically
generate source code. In the future, automated model generators may create
PDEs from experimental data.
4. In the big picture, humans, as a species, are already redundant, but this does
not imply that strong AI has been developed yet.
6. “A young man naturally conceives an aversion to labour, when for a long time
he receives no benefit from it.” – Adam Smith, reflections on apprenticeships
in An Inquiry into the Nature and Causes of the Wealth of Nations. Similarly, a
PhD student may experience a drop in motivation if project funding runs
out. The solution is to improve project planning rather than extending the
funding.
7. Dijkstra’s shortest path algorithm works well for planning tasks of short
duration, but it is not suitable for planning long-term research projects.
8. Time (or money) is the penalty parameter closing the gap between ambition
and actual work done.
188 Propositions
13. The recent debate on whether or not Sinterklaas (Saint Nicholas) can have Zwarte
Piet (Black Peter) as his helper (provided his employment is in accordance
with the collective agreement) misses the point. The real problem arises if Zwarte
Piet is dismissed because of his skin colour.
14. Although a PhD thesis is rarely read cover to cover, most people will read
the propositions and then go through the references to see how many papers
have been published based on the present work.
These propositions are regarded as opposable and defendable, and have been
approved as such by the supervisors Prof. dr. ir. L. J. Sluys and Dr. G. N. Wells.
Propositions (Stellingen)
4. In the big picture, humans, as a species, are already redundant, but this does
not imply that strong AI has been developed yet.
5. In terms of discovering bugs and steering the development of a software
project, a large user base is worth more than a large number of
developers.
6. “A young man naturally conceives an aversion to labour, when for a long time
he receives no benefit from it.” – Adam Smith, reflections on apprenticeships
in An Inquiry into the Nature and Causes of the Wealth of Nations. Similarly, a
PhD student may experience a drop in motivation if project funding runs
out. The solution is to improve project planning rather than extending the
funding.
8. Time (or money) is the penalty parameter closing the gap between ambition
and actual work done.
10. A child does not satisfy boundary conditions from birth. Boundary
conditions must therefore be enforced in a weak sense through punishment,
reward and setting a good example.
11. Gradients in the distribution of wealth are a prerequisite for a dynamic
society.
12. Cooperation makes it possible for a strong group of individuals to beat a group
of strong individuals. This concept works for science as well as for sport
and was seen, for example, in the European Championship football matches
between the Netherlands and Denmark in 1992 and 2012.
13. The recent debate on whether or not Sinterklaas may have Zwarte Piet as his helper
(provided his employment is in accordance with
the collective agreement) misses the point. The real
problem arises if Zwarte Piet is dismissed because of his skin colour.
14. Although a PhD thesis is rarely read cover to cover, most people will read
the propositions and then go through the references to
see how many papers have been published based on the present work.