
Automated computational modelling for

complicated partial differential equations



Dissertation

for the degree of doctor


at the Technische Universiteit Delft,
by authority of the Rector Magnificus prof. ir. K.C.A.M. Luyben,
chairman of the Board for Doctorates,
to be defended publicly on Tuesday 3 December 2013 at 12:30

by

Kristian Breum ØLGAARD


Master of Science in Civil Engineering, Aalborg Universitet Esbjerg
born in Ringkøbing, Denmark
This dissertation has been approved by the promotor:
Prof. dr. ir. L. J. Sluys

Copromotor:
Dr. G. N. Wells

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. ir. L. J. Sluys, Technische Universiteit Delft, promotor
Dr. G. N. Wells, University of Cambridge, copromotor
Dr. ir. M. B. van Gijzen, Technische Universiteit Delft
Prof. dr. P. H. J. Kelly, Imperial College London
Prof. dr. R. Larsson, Chalmers University of Technology
Prof. dr. L. R. Scott, University of Chicago
Prof. dr. ir. C. Vuik, Technische Universiteit Delft
Prof. dr. A. Scarpas, Technische Universiteit Delft, reserve member

Copyright © 2013 by K. B. Ølgaard


Printed by Ipskamp Drukkers B.V., Enschede, The Netherlands
ISBN 978-94-6191-990-8
Foreword

This thesis represents the formal end of my long and interesting journey as a
PhD student. The sum of many experiences over the past years has increased my
knowledge and contributed to my personal development. All these experiences
originate from the interaction with many people to whom I would like to express
my gratitude.
I am most grateful to Garth Wells for giving me the opportunity to come to
Delft and to study under his competent supervision. His constructive criticism
and vision combined with our nice discussions greatly improved the quality of
my research. As the head of the computational mechanics group, Bert Sluys has
played a vital role by creating a very nice and supportive working environment
where people enjoy a lot of creative freedom. As creativity is key in this research I
consider myself lucky to have been part of Bert’s group.
Ronnie Pedersen did a very good job in persuading me to come to Delft for
a PhD, and I am happy that he managed to convince me. I am also grateful for
enjoying his friendship throughout the years, the good times on the football pitch,
and the even better times in ’t Proeflokaal watching football and discussing work
and life in general.
A friendly and inspiring working environment is important in order to produce
quality work. Therefore, I would like to thank past and present colleagues Rafid Al-
Khoury, Roberta Bellodi, Frank Custers, Frank Everdij, Huan He, Cecilia Iacono, Cor
Kasbergen, Oriol Lloberas-Valls, Prithvi Mandapalli, Frans van der Meer, Andrei
Metrikine, Peter Moonen, Dung Nguyen, Vinh Phu Nguyen, Mehdi Nikbakth,
Marjon van der Perk, Frank Radtke, Zahid Shabir, Xuming Shan, Angelo Simone,
Mojtaba Talebian, Andy Terrel, Ilse Vegt, Jaap Weerheijm, Sigurd Blöndal, Lars
Damkilde, Niels Dollerup, Jens Hagelskjær, Michael Jepsen, Sven Krabbenhøft
and Søren Lambertsen. In particular, I would like to thank Frans for the years
that we shared the same office and for translating the propositions into Dutch. A
special thanks goes to Mehdi, my ‘brother-in-arms’, the only person remaining
in the group who was also involved with the FEniCS Project after Garth left for
Cambridge and Xuming left for home.
The research presented in this thesis is centered around the FEniCS Project and,
therefore, I would also like to thank all the people in the FEniCS community, in
particular my close collaborators from Simula, Anders Logg, Martin Alnæs, Marie
Rognes and Johan Hake, for all the nice discussions, debugging assistance and
good ideas. During my PhD, I also had the pleasure of visiting the University of
Michigan and in this regard I want to thank Krishna Garikipati, Jake Ostien and
his wife Erin for their hospitality during my stay in Ann Arbor.
Outside the office, I enjoyed many hours in the good company of my friends
Linda Grimstrup and Lars Freising, which definitely improved the quality of my
social life a lot. I also want to thank all my former team mates at Vitesse Delft for
the many memorable hours on the football pitch trying to learn the secrets behind
‘totaalvoetbal’. Although The Netherlands and Denmark are quite similar in terms
of weather, nature and culture, it was always nice to receive visitors from home. For
this, I would like to thank my friends Kenneth Guldager, Henrik Hansen, Mads
Madsen, Christian Meyer, Nick Nørreby and Thomas Sørensen.
Last, but certainly not least, I want to thank my parents and my brother and
sisters for their encouragement, support, help and visits during my years in Delft. I
also wish to thank both of my sons for putting things in perspective, which helped
me to focus during the last iterations towards finishing this thesis. Of all people, I
am most grateful to my wife. I know her patience has been tested to the limit, yet
she remained supportive, loving and caring during all the years. For this, and for
our sons, I am forever indebted.
The research presented in this thesis was carried out at the Faculty of Civil
Engineering and Geosciences at Delft University of Technology. The research
was supported by the Netherlands Technology Foundation STW, the Netherlands
Organisation for Scientific Research and the Ministry of Public Works and Water
Management.

Kristian Breum Ølgaard


Ølgod, Denmark, November 2013
Contents

1 Introduction
  1.1 Research objectives and approach
  1.2 Outline
  1.3 The FEniCS Project
    1.3.1 Simple model problem
    1.3.2 Unified Form Language
    1.3.3 FEniCS Form Compiler
    1.3.4 Unified Form-assembly Code
    1.3.5 DOLFIN

2 FEniCS applications to solid mechanics
  2.1 Governing equations
    2.1.1 Preliminaries
    2.1.2 Balance of momentum
    2.1.3 Potential energy minimisation
  2.2 Constitutive models
    2.2.1 Linearised elasticity
    2.2.2 Flow theory of plasticity
    2.2.3 Hyperelasticity
  2.3 Linearisation issues for complex constitutive models
    2.3.1 Consistency of linearisation
    2.3.2 Quadrature elements
  2.4 Implementations and examples
    2.4.1 Linearised elasticity
    2.4.2 Plasticity
    2.4.3 Hyperelasticity
    2.4.4 Elastodynamics
  2.5 Current and future developments

3 Representations and optimisations of finite element variational forms
  3.1 Motivation and approach
  3.2 Representation of finite element tensors
    3.2.1 Quadrature representation
    3.2.2 Tensor contraction representation
  3.3 Quadrature optimisations
    3.3.1 Eliminate operations on zeros
    3.3.2 Simplify expressions
    3.3.3 Precompute integration point constants
    3.3.4 Precompute basis constants
    3.3.5 Further optimisations
  3.4 Performance comparisons of representations
    3.4.1 Performance for a selection of forms
    3.4.2 Performance for common, simple forms
    3.4.3 Performance for forms of increasing complexity
  3.5 Performance comparisons of quadrature optimisations
  3.6 Automatic selection of representation
  3.7 Future optimisations

4 Automation of discontinuous Galerkin methods
  4.1 Extending the framework to discontinuous Galerkin methods
    4.1.1 Extending the Unified Form Language
    4.1.2 Extending the Unified Form-assembly Code
    4.1.3 Extending the FEniCS Form Compiler
    4.1.4 Extending DOLFIN
  4.2 Examples
    4.2.1 The Poisson equation
    4.2.2 Steady state advection–diffusion equation
    4.2.3 The Stokes equations
    4.2.4 Biharmonic equation
    4.2.5 Further applications

5 Automation of lifting-type discontinuous Galerkin methods
  5.1 Lifting-type formulation for the Poisson equation
  5.2 Semi-automated implementation of lifting-type formulations
  5.3 Comparison of IP and lifting-type formulations
  5.4 Future developments

6 Strain gradient plasticity
  6.1 A strain gradient plasticity model
  6.2 A discontinuous Galerkin formulation for the plastic multiplier
  6.3 Linearisation of the governing equations
  6.4 Implementation
    6.4.1 The predictor step
    6.4.2 The corrector step
    6.4.3 Implementing the variational forms
  6.5 Numerical examples
    6.5.1 Unit square loaded in shear with strain softening
    6.5.2 Plate under compressive loading with strain softening
    6.5.3 Plate under compressive loading with strain hardening
    6.5.4 Micro-indentation
    6.5.5 Computational notes

7 Conclusions and future developments

References

Summary

Samenvatting

Propositions

Stellingen

Curriculum vitae

1 Introduction

Since the advent of the modern programmable computer in the 1940s, the cost
of computing power relative to manpower has decreased significantly. As a
consequence, high-level programming languages have emerged allowing the
implementation of programs in source code using abstractions that are independent of
the specific computer architectures on which the program is intended to run. A
compiler is then invoked to translate the source code into machine code targeted for
the given computer’s central processing unit (CPU). This development has allowed,
among other things, researchers and scientists to write programs for investigating
and solving various classes of problems numerically.
In engineering, physical phenomena are often described mathematically by
partial differential equations (PDEs), and a commonly used method to solve these
equations is the finite element method (FEM). Standard finite element software
typically provides a problem solving environment for a set of engineering problems
using a predefined selection of finite elements. As part of the application
programming interface (API) a user can often supply subroutines which implement special
methods, for instance, the constitutive model in case of a solid mechanics problem.
This offers a degree of customisation and flexibility in terms of implementing
certain models, but the approach may fall short as the complexity of a model
increases.
Strain gradient plasticity is an example of a class of models which can be
difficult to implement in traditional finite element software and researchers often
resort to implementing their own unique solver targeting a specific model. An
implementation involves translating the abstract mathematical representation of
the model into source code which can be handled by a compiler, a process which
can be tedious, time consuming and error prone. However, by introducing a higher
level of abstraction, the burden of this process can be alleviated when it comes
to implementing mathematical representations of the FEM for solving PDEs. A
possible abstraction consists of a form language for expressing the mathematical
formulation of the given problem, and compilers which automatically generate
efficient source code from the given mathematical expressions. This thesis is
centered around this type of automated mathematical modelling.

1.1 Research objectives and approach


The research presented in this thesis aims at developing concepts, tools and
methods which allow researchers and application developers to create efficient solvers
for complicated partial differential equations with relatively little effort.
Several software projects aim at providing a flexible framework for solving partial
differential equations using the finite element method. These software projects
include, among others, traditional finite element libraries and toolboxes such as
deal.II (http://www.dealii.org/, Bangerth et al. (2007)), Diffpack
(http://www.diffpack.com/, Langtangen (1999)), DUNE (http://www.dune-project.org,
Bastian et al. (2008b,a)), GetFEM++ (http://home.gna.org/getfem/), OpenFOAM
(http://www.openfoam.com/) and Cactus (http://cactuscode.org/, Allen et al.
(2000)). However, some ‘hand coding’ is often needed in order to use the
above-mentioned software. For instance, a user must typically implement (parts of)
the assembly algorithm, which becomes cumbersome as the complexity of the problem
increases. A number of software projects have, therefore, emerged that try
to automate the finite element method. These projects include, among others,
FINGER (Wang, 1986), Archimedes (Shewchuk and Ghattas, 1993), Symbolic
Mechanics System (Korelc, 1997), GetDP (http://geuz.org/getdp/, Dular et al. (1998)),
FreeFEM++ (http://www.freefem.org/), Sundance
(http://www.math.ttu.edu/~kelong/Sundance/html/, Long et al. (2010)), Feel++
(http://www.feelpp.org/, Prud’homme (2006)) and the FEniCS Project
(http://fenicsproject.org, Logg et al. (2012a)). A common feature of these
approaches is that they provide a higher
level of abstraction for expressing variational forms and thereby lessen the burden
on application developers.
The developments presented in this thesis are implemented in various software
components of the FEniCS Project, which is chosen for a number of reasons. The
software is released under an open source license¹, which makes it possible to obtain
and modify the source code. This provides a high degree of freedom and flexibility
in terms of implementing advanced models and applications. Furthermore, if
the application source code is published, the implementation becomes completely
transparent and reproducible, both properties of importance in research. The
software contains a problem solving environment that handles the assembly, the
application of boundary conditions and the solution of sparse systems of equations.
What distinguishes the software from the more conventional finite element packages
is that it provides a high degree of mathematical abstraction by implementing
a form language for expressing variational forms and relies on form compilers to
automatically generate computer code for the local finite element tensor. This
approach offers several advantages of which two are of particular interest. Firstly,
the time needed to implement, test and debug the code for the local finite element
tensor can be reduced. Secondly, various optimisations can be employed by the
form compilers during the code generation stage to make the generated code
competitive with hand optimised code. The importance of these two advantages
grows with the complexity of the variational form. Finally, the software is
under active development by a growing community, which is helpful for receiving
feedback when implementing new features and during debugging sessions.

¹ All FEniCS core components are licensed under the GNU LGPL version 3 (or any later version) as
published by the Free Software Foundation (http://www.fsf.org).

The potential of the FEniCS framework is evident; however, at the time when
this work was commenced, the functionality in the FEniCS software was only
available for a limited class of problems. For instance, only integration over element
interiors was supported which precluded, among other things, discontinuous
Galerkin methods from being handled as these methods involve integration over
element boundaries. Furthermore, problems like conventional plasticity were not
possible to solve because the software could only handle functions coming from a
finite element space. Also, the generated code was only efficient for limited classes
of problems. The objectives of this work can thus be condensed into the following:
extend the automated mathematical modelling framework of FEniCS such that

• discontinuous Galerkin methods can be handled;

• rapid prototyping of advanced models and applications is possible; and

• efficiency is maintained also for complex problems in general.

As will be demonstrated in this work, addressing the above three issues has
had a significant impact on the range of problems which can be handled in the
FEniCS framework, thereby making life easier for researchers and application
developers.
A complex application from solid mechanics in the form of a strain gradient
plasticity model is considered, as an example, to demonstrate the extensions to the
FEniCS framework developed in this work. Strain gradient models are often used
to provide regularisation in softening problems and to account for observed size
effects at small length scales. An abundance of strain gradient models have been
proposed in the literature, including the models by Aifantis (1984), Gurtin (2004), Fleck
and Hutchinson (1997), Fleck and Hutchinson (2001) and Gao et al. (1999) to name
a few. The focus in this work is on the class of models involving gradients of fields
such as the equivalent plastic strain. An example of such a model is that proposed
by Aifantis (1984) which involves the addition of the Laplacian of the equivalent
plastic strain to the classical yield condition. A feature of this particular model
is that the classical consistency condition leads to a partial differential equation
rather than an algebraic equation, as is the case in classical flow theory of plasticity.
The partial differential equation is only active in the region undergoing plastic
deformations which introduces the difficulty of imposing non-standard boundary
conditions on the secondary field on the evolving boundary.

Motivated by the work of Wells et al. (2004) and Molari et al. (2006) who used
a discontinuous Galerkin formulation for a strain gradient-dependent damage
model, a discontinuous basis can be used to interpolate the secondary field. This
provides a natural framework for handling evolving elastic–plastic boundaries
and provides local (cell-wise) satisfaction of the yield condition. To satisfy the
regularity requirement of the secondary field, a discontinuous Galerkin formulation
is used to enforce weak continuity across cell facets. In order to allow the use
of a discontinuous constant basis for the secondary field, a so-called lifting-type
discontinuous Galerkin formulation, proposed by Bassi and Rebay (1997, 2002), is
adopted. A discontinuous constant basis is the natural choice for the secondary
field when a linear continuous basis is used for the displacement field. Considering
that the formulation involves an additional field variable it is also computationally
more efficient if discontinuous constant elements can be used for this particular
field.

1.2 Outline

The rest of this chapter contains an overview of the FEniCS Project including details
on the components pertinent to the present work. Chapter 2 continues with a
demonstration of how to use the FEniCS toolchain for solid mechanics applications.
The purpose of this demonstration is twofold. Firstly, it serves as an introduction to
the concepts of automated modelling from a solid mechanics point of view, which
will give an understanding of how the automated modelling approach can be
utilised to also tackle more complex problems. Secondly, the presented models and
applications will be used in subsequent chapters, either by extending the models or
by using them as a platform for discussing the development of FEniCS components
in connection to the work presented in this thesis.
Local finite element tensors can be evaluated using different representations of
the tensors. In Chapter 3 the two representations that FFC adopts, the quadrature
representation and the tensor contraction representation, are presented and comparisons
are made between the two representations. Furthermore, optimisation strategies
for the quadrature representation are discussed and their performance is
investigated.
Chapter 4 introduces the extensions implemented in the FEniCS framework
to allow a class of discontinuous Galerkin (DG) formulations to be handled in an
automated fashion. Building on these abstractions, a semi-automated approach
to implementing lifting-type DG formulations is presented in Chapter 5. This
chapter also contains a brief comparison, in terms of complexity regarding the
implementation and the numerical implications, between a lifting-type formulation
and an interior penalty (IP) DG formulation for the Poisson equation.
In Chapter 6 the extensions to the FEniCS framework developed in the previous
chapters are brought together in an implementation of a lifting-type discontinuous
formulation for a simple strain gradient plasticity model proposed by Aifantis
(1984). The purpose is to illustrate how researchers and application developers
may create solvers for more complex problems on top of the FEniCS
software. Finally, in Chapter 7, conclusions are drawn and recommendations for
future development related to this work are presented.

1.3 The FEniCS Project


The FEniCS Project is a suite of open source programs for automating the solution
of PDEs. This section presents the concepts and components which are most
important in relation to this work and which will be elaborated on in subsequent
chapters; thus, only a subset of the components in the FEniCS Project is presented. Further
details on the components presented here, and other components associated with
the FEniCS Project, can be found in the FEniCS book (Logg et al., 2012a) or
online at http://fenicsproject.org. All FEniCS software components, and the
software developed in this work, can be obtained freely at
https://bitbucket.org/fenics-project². The FEniCS Project is under continuous
development; however, this presentation and all example code and software developed
and described in this work are compliant with version 1.0 of the project and its
associated components unless stated otherwise.
The majority of developments in this work are implemented in the core
components of FEniCS. However, some of the developments are implemented in
the FEniCS Solid Mechanics library³ (Ølgaard and Wells, 2013). In this thesis,
several code snippets are presented along with many results from numerical
experiments. All example code, and the code which has been used to obtain
all the results, can be downloaded from
https://bitbucket.org/k.b.oelgaard/oelgaard-thesis-supporting-material.
Note that in order to run the code, working
installations of FEniCS version 1.0 and FEniCS Solid Mechanics version 1.0 are
required⁴.
² The FEniCS software components have recently moved from Launchpad
(https://launchpad.net/fenics) to Bitbucket. However, as the FEniCS Project is being
actively developed the location might change again in the future. The FEniCS website
(http://fenicsproject.org), which is less likely to move, might be a better starting
point for locating the software.
³ The FEniCS Solid Mechanics library was formerly known as FEniCS Plasticity
(https://launchpad.net/fenics-plasticity) which focussed solely on plasticity problems.
However, to reflect that the scope of the library has increased to also include more
general solid mechanics problems the name was changed during a recent migration from
Launchpad to Bitbucket.
⁴ Version 1.0 of the FEniCS Solid Mechanics library can be downloaded from
https://bitbucket.org/fenics-apps/fenics-solid-mechanics.

The procedure of solving PDEs using the FEM can be broken down into the
following four steps:

1. Formulate the variational problem of the PDE

2. Discretise the formulation

3. Finite element assembly

4. Solve the global system of equations

Figure 1.1: FEniCS toolchain for solving a PDE using the FEM (UFL → FFC → UFC → DOLFIN).
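
The four steps can be made concrete on a small one-dimensional analogue, written here in plain Python rather than with any FEniCS component. The sketch assumes the problem −u'' = f on (0, 1) with u(0) = g and u'(1) = h, a constant source f, and piecewise linear elements on a uniform mesh. All function names are illustrative only and do not correspond to the FEniCS interface.

```python
# Step 1 (variational problem): find u with u(0) = g such that
#   int_0^1 u' v' dx = int_0^1 f v dx + h v(1)   for all v with v(0) = 0.
# Step 2 (discretisation): piecewise linear elements on n_cells equal cells.


def assemble(n_cells, f, h):
    """Step 3: add the local element tensors into the global system."""
    n = n_cells + 1
    dx = 1.0 / n_cells
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Local tensors of a linear element of length dx (constant source f)
    A_loc = [[1.0 / dx, -1.0 / dx], [-1.0 / dx, 1.0 / dx]]
    b_loc = [0.5 * f * dx, 0.5 * f * dx]
    for cell in range(n_cells):
        dofs = (cell, cell + 1)  # local-to-global degree of freedom map
        for i in range(2):
            b[dofs[i]] += b_loc[i]
            for j in range(2):
                A[dofs[i]][dofs[j]] += A_loc[i][j]
    b[n - 1] += h  # Neumann term h*v(1) at the right end point
    return A, b


def apply_dirichlet(A, b, dof, value):
    """Replace one equation by u[dof] = value."""
    A[dof] = [0.0] * len(b)
    A[dof][dof] = 1.0
    b[dof] = value


def solve(A, b):
    """Step 4: dense Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def solve_poisson_1d(n_cells, f, g, h):
    A, b = assemble(n_cells, f, h)
    apply_dirichlet(A, b, 0, g)
    return solve(A, b)


if __name__ == "__main__":
    print(solve_poisson_1d(n_cells=8, f=2.0, g=1.0, h=0.5))
```

For the data used in the last line the exact solution is u(x) = 1 + 2.5x − x², and the computed nodal values should agree with it up to round-off. A dense solver is used only to keep the sketch short; a practical assembler would store the matrix in a sparse format.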

Facilities for each of these steps are implemented in separate software components
in FEniCS. The relationship between input and output of each component in the
FEniCS toolchain for the finite element procedure is shown in Figure 1.1. In short,
the variational form of the PDE is expressed in the Unified Form Language (UFL)
(Alnæs et al., 2013; Alnæs, 2012), which is given as input to the FEniCS Form
Compiler (FFC)⁵ (Kirby and Logg, 2006, 2007; Logg et al., 2012c; Ølgaard et al.,
2008a) that automatically generates efficient C++ code for evaluating the local
element tensors. The output from FFC is compliant with the interface defined
in Unified Form-assembly Code (UFC) (Alnæs et al., 2009, 2012) and is used by
DOLFIN (Logg and Wells, 2010; Logg et al., 2012d), which is the finite element
assembler and solver of FEniCS although, in principle, any assembly library which
supports UFC can be used.
The key advantage of this modular construction is that it becomes more
transparent where and how new features and functionality should be implemented.
Furthermore, developers and users can pick individual components to form their
own applications. In this work for instance, the UFL is augmented with
discontinuous Galerkin operators⁶, compiler optimisations are implemented in FFC, while
more complex solvers for lifting-type formulations and solid mechanics problems
can be implemented on top of the FEniCS toolbox.

⁵ Any compiler that supports UFL as input, and outputs UFC code, can be used instead of FFC in the
described toolchain. The Symbolic Form Compiler (Alnæs and Mardal, 2010, 2012), which is also part of
FEniCS, is one such example.
⁶ Historically, the DG operators were implemented in the original form language of FFC which was
later merged into the richer UFL.

1.3.1 Simple model problem

As a model boundary value problem for presenting the FEniCS framework consider
the Poisson equation, which for a body $\Omega \subset \mathbb{R}^d$, where $1 \le d \le 3$, with boundary
$\partial\Omega$ and outward unit normal vector $n : \partial\Omega \to \mathbb{R}^d$ reads:

\begin{equation}
\begin{aligned}
-\Delta u &= f \quad \text{in } \Omega, \\
u &= g \quad \text{on } \Gamma_D, \\
\nabla u \cdot n &= h \quad \text{on } \Gamma_N.
\end{aligned}
\tag{1.1}
\end{equation}

Here, $u$ is an unknown scalar field, $f$ is a source term, $g$ is a prescribed value
for $u$ on the Dirichlet boundary $\Gamma_D$, and $h$ is a prescribed value for the outward
normal derivative of $u$ on the Neumann boundary $\Gamma_N$. The boundaries $\Gamma_D$ and $\Gamma_N$
divide the boundary such that $\Gamma_D \subseteq \partial\Omega$ and $\Gamma_N = \partial\Omega \setminus \Gamma_D$. To apply the FEniCS
framework the problem must be posed as a variational formulation in the following
canonical form: find $u \in V$ such that

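\begin{equation}
a(u, v) = L(v) \quad \forall\, v \in \hat{V}, \tag{1.2}
\end{equation}

where $V$ is the trial space and $\hat{V}$ is the test space, and $a(u, v)$ and $L(v)$ denote the
bilinear and linear forms, respectively. A typical variational form⁷ of (1.1) defines
the bilinear and linear forms as:

\begin{align}
a(u, v) &:= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x, \tag{1.3} \\
L(v) &:= \int_{\Omega} f v \, \mathrm{d}x + \int_{\Gamma_N} h v \, \mathrm{d}s, \tag{1.4}
\end{align}

with the trial and test spaces defined as:

\begin{align}
V &:= \left\{ v \in H^1(\Omega) : v = g \text{ on } \Gamma_D \right\}, \tag{1.5} \\
\hat{V} &:= \left\{ v \in H^1(\Omega) : v = 0 \text{ on } \Gamma_D \right\}. \tag{1.6}
\end{align}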
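
For readers who want the intermediate step, the forms (1.3) and (1.4) can be recovered from (1.1) by the standard argument sketched below. It assumes that $u$ is smooth enough for integration by parts to hold. The first equation of (1.1) is multiplied by a test function $v \in \hat{V}$ and integrated over $\Omega$, after which the boundary term is split using $\partial\Omega = \Gamma_D \cup \Gamma_N$:

```latex
\begin{align*}
\int_{\Omega} f v \, \mathrm{d}x
  &= -\int_{\Omega} \Delta u \, v \, \mathrm{d}x \\
  &= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x
     - \int_{\partial\Omega} (\nabla u \cdot n) \, v \, \mathrm{d}s \\
  &= \int_{\Omega} \nabla u \cdot \nabla v \, \mathrm{d}x
     - \int_{\Gamma_N} h v \, \mathrm{d}s ,
\end{align*}
```

where the last line uses $v = 0$ on $\Gamma_D$ and $\nabla u \cdot n = h$ on $\Gamma_N$. Moving the Neumann term to the left-hand side gives $a(u, v) = L(v)$ with the forms in (1.3) and (1.4).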

The variational problem in (1.2) must be discretised to compute a finite element
solution to the Poisson problem. This is done by using a pair of discrete function
spaces for the test and trial functions: find $u_h \in V_h \subset V$ such that

\begin{equation}
a(u_h, v) = L(v) \quad \forall\, v \in \hat{V}_h \subset \hat{V}. \tag{1.7}
\end{equation}

Thus, after transforming the strong form of the problem into the variational
counterpart, the FEniCS toolchain, starting with UFL, can be invoked to compute a
solution.

⁷ Chapters 4 and 5 present discontinuous Galerkin formulations for (1.1).

UFL code
element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)
v = TestFunction(element)
f = Coefficient(element)
h = Coefficient(element)

a = inner(grad(u), grad(v))*dx
L = f*v*dx + h*v*ds

Figure 1.2: UFL code for the Poisson problem using continuous-piecewise linear
Lagrange polynomials on triangles.
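
To give an impression of what the bilinear form in Figure 1.2 amounts to on a single cell, the following plain-Python sketch evaluates the local element tensor of inner(grad(u), grad(v))*dx for linear Lagrange basis functions on one triangle. It is a hand-written illustration under the assumption of straight-sided triangles with constant basis function gradients. It is not the code that a form compiler generates, and the function name is hypothetical.

```python
def p1_laplace_element_tensor(vertices):
    """Return the 3x3 tensor A[i][j] = int_T grad(phi_j) . grad(phi_i) dx
    for the linear Lagrange basis functions phi_0, phi_1, phi_2 on the
    triangle T with the given vertex coordinates."""
    (x0, y0), (x1, y1), (x2, y2) = vertices
    # Determinant of the affine map from the reference triangle to T
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0.0:
        raise ValueError("degenerate triangle")
    area = 0.5 * abs(det)
    # Gradients of the basis functions are constant on T
    grads = [
        ((y1 - y2) / det, (x2 - x1) / det),
        ((y2 - y0) / det, (x0 - x2) / det),
        ((y0 - y1) / det, (x1 - x0) / det),
    ]
    # The integrand is constant, so the integral is area times the product
    return [
        [area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
        for gi in grads
    ]


if __name__ == "__main__":
    for row in p1_laplace_element_tensor([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]):
        print(row)
```

Each row of the tensor sums to zero because the basis functions sum to one, which is a convenient sanity check for hand-written element code of this kind.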

1.3.2 Unified Form Language

In order to compute a solution to the variational problem using the FEM, it is
necessary to discretise the formulation. The Unified Form Language (UFL) (Alnæs et al.,
2013; Alnæs, 2012) enables a user to express the discretisation compactly using a
notation which resembles the mathematical notation closely. UFL is implemented
as a domain-specific embedded language (DSEL) in Python which, among other
things, allows users to define custom operators using all features of the Python
programming language when writing UFL code. This section presents the most
basic features used throughout this work, while some of the more advanced
functionality is presented in subsequent chapters as needed. For a detailed description
of the language, refer to Alnæs et al. (2013).
The Poisson problem in (1.7) can be expressed in UFL by the code shown in
Figure 1.2. The first line in the code defines the local finite element basis that spans
the discrete function space $V_h$ on an element $T \in \mathcal{T}_h$, where $\mathcal{T}_h$ denotes the standard
triangulation of $\Omega$. Generally, finite elements are defined in UFL by their family,
cell and degree:

UFL code
element = FiniteElement(family, cell, degree)

which in the given case, in Figure 1.2, means that the basis is a piecewise continuous
linear Lagrange triangle. UFL contains a set of predefined finite element family
names, for instance, "Lagrange" as already shown, "Discontinuous Lagrange"
(short name "DG") and "Brezzi-Douglas-Marini" (short name "BDM"). The cell
argument denotes the polygonal shape of the finite element while the degree
argument denotes the degree of the polynomial space. Although the valid cell shapes in
UFL are interval, triangle, tetrahedron, quadrilateral and hexahedron, FFC
only supports the first three cell shapes at present. Also note that the permitted

Mathematical notation    UFL notation
A · B                    dot(A, B)
A : B                    inner(A, B)
AB ≡ A ⊗ B               outer(A, B)
A^T                      transpose(A), A.T
sym A                    sym(A)
tr A ≡ A_ii              tr(A)
det A                    det(A)

Mathematical notation    UFL notation
A_{,i}                   A.dx(i)
∂A/∂x_i                  Dx(A, i)
dA/dn                    Dn(A)
∇A                       grad(A)
∇ · A                    div(A)

Table 1.1: (Top) Table of tensor algebraic operators. (Bottom) Table of differential
operators.

values of cell and degree depend on the choice of finite element family. It is
important to realise that UFL is only concerned with the abstract operations related
to the finite element function spaces; it is left to the form compiler to support the
element families, that is, to generate meaningful code for the representation of
elements and forms. For mixed finite element methods, product spaces like:

V = [V2 × V2 ] × V1 . (1.8)

can easily be generated by either the MixedElement class or the * operator:

UFL code
V_2 = FiniteElement("Lagrange", triangle, 2)
V_1 = FiniteElement("Lagrange", triangle, 1)
V = (V_2*V_2)*V_1
W = MixedElement(MixedElement((V_2, V_2)), V_1)

meaning that V and W are identical. To create a mixed element in which all the
component spaces are identical, the VectorElement can be used:

UFL code
V = VectorElement(family, cell, degree, dim=None)

where dim defaults to the dimension of the given cell unless explicitly specified.
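As a rough illustration of what the product space in (1.8) amounts to on a single triangle, the following sketch counts local basis functions in plain Python. It does not use UFL; the helper names `lagrange_triangle_dim` and `mixed_dim` are hypothetical, and the only fact assumed is that a Lagrange triangle of degree q has (q + 1)(q + 2)/2 basis functions.

```python
def lagrange_triangle_dim(degree):
    """Number of local basis functions of a degree-q Lagrange triangle."""
    return (degree + 1) * (degree + 2) // 2


def mixed_dim(sub_dims):
    """Local dimension of a mixed element given nested tuples of sub-element dimensions."""
    total = 0
    for d in sub_dims:
        total += mixed_dim(d) if isinstance(d, tuple) else d
    return total


V_2 = lagrange_triangle_dim(2)   # quadratic Lagrange triangle
V_1 = lagrange_triangle_dim(1)   # linear Lagrange triangle
V = ((V_2, V_2), V_1)            # mirrors the nesting of (V_2*V_2)*V_1
local_dim = mixed_dim(V)
```

The nesting of the tuple only mirrors the structure [V2 × V2] × V1; the local dimension is simply the sum of the dimensions of the component spaces.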
After defining the local finite element basis, the trial function u ∈ Vh, the
test function v ∈ Vh and the coefficient functions f, h ∈ Vh can be defined in
a straightforward fashion as seen in the code in Figure 1.2. The bilinear and
linear forms from (1.3) and (1.4) can then be implemented simply by using the
tensor and differential operators defined in UFL, some of which can be seen in
Table 1.1. An important thing to note is that the definition of the gradient operator
grad(A) of, for instance, a vector valued function in UFL is {grad(u)}_ij = ∂u_i/∂x_j
and not {grad(u)}_ij = ∂u_j/∂x_i. The latter operator is, however, provided in
UFL by nabla_grad(A). A similar convention applies to the divergence operator
where nabla_div(A) is provided as an alternative to div(A). In this work, the
operators ∇u and ∇ · u follow the UFL definition for the gradient and divergence
operators, grad(u) and div(u), respectively, and should not be confused with the
UFL operators nabla_grad(u) and nabla_div(u).

Mathematical notation    UFL notation
a/b                      a / b
a^b                      a**b, pow(a, b)
√f                       sqrt(f)
exp f                    exp(f)
ln f                     ln(f)
|f|                      abs(f)
sign f                   sign(f)

Mathematical notation    UFL notation
cos f                    cos(f)
sin f                    sin(f)
tan f                    tan(f)
arccos f                 acos(f)
arcsin f                 asin(f)
arctan f                 atan(f)

Table 1.2: (Top) Table of elementary functions. (Bottom) Table of trigonometric
functions.

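The index convention that distinguishes grad from nabla_grad can be checked numerically. The sketch below is plain Python and does not call UFL; the vector field `u` and the helper `jacobian` are hypothetical choices made only for illustration. It approximates ∂u_i/∂x_j by central differences and shows that the nabla_grad convention is the transpose of the grad convention.

```python
def u(x):
    """Hypothetical vector field u(x, y) = (x*y, x + 2*y)."""
    return [x[0] * x[1], x[0] + 2.0 * x[1]]


def jacobian(func, x, eps=1e-6):
    """Central-difference approximation of J[i][j] = d func_i / d x_j."""
    n = len(x)
    m = len(func(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2.0 * eps)
    return J


def transpose(M):
    return [list(row) for row in zip(*M)]


point = [2.0, 3.0]
grad_u = jacobian(u, point)        # {grad(u)}_ij = du_i/dx_j
nabla_grad_u = transpose(grad_u)   # {nabla_grad(u)}_ij = du_j/dx_i
```

At the point (2, 3) the off-diagonal entries differ between the two conventions (∂u_0/∂x_1 = 2 while ∂u_1/∂x_0 = 1), which is why the distinction matters for vector valued functions.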
To complete the implementation of the variational forms, integration on the
relevant domains must be expressed. In UFL, the integral over the domain,
$\int_{\Omega_k} I \, dx$, is denoted by I*dx(k) while the integral over the exterior
boundary of the domain, $\int_{\partial\Omega_k} I \, ds$, is denoted by I*ds(k)
where k is the subdomain number and I is a valid
UFL expression. Thus, having completed the implementation of (1.2) in the
near-mathematical notation of UFL, the form compiler can be invoked to generate code
from the abstract UFL representation.
The last two classes of expressions to be presented in this short introduction to
UFL are nonlinear scalar functions and geometric quantities. UFL provides a set
of nonlinear scalar functions, presented in Table 1.2, which can be applied to, for
instance, scalar valued coefficient functions such as f and h in the Poisson example.
It is illegal to apply these functions to any test or trial function as this would render
the variational form nonlinear in those arguments. Geometric quantities are related
to the local finite element cell T. For instance, the coordinate of the integration
point currently being evaluated on T (including its boundary) can be accessed via
cell.x. Other geometric quantities which are particularly useful in relation to this
work are the outward normal to the facet⁸ currently being evaluated, cell.n, and the
circumradius, the radius of the circumscribed circle of T, cell.circumradius. Basic
usage of nonlinear functions and the integration point coordinate is demonstrated
later in Section 1.3.5, while the facet normal and circumradius are frequently used
when defining discontinuous Galerkin variational forms in Chapter 4.

⁸ A facet is a topological entity of a computational mesh of dimension D − 1 (codimension 1) where
D is the topological dimension of the cells of the computational mesh. Thus for a triangular mesh, the
facets are the edges and for a tetrahedral mesh, the facets are the faces.

1.3.3 FEniCS Form Compiler


As shown in Figure 1.1, the FEniCS Form Compiler (FFC) (Kirby and Logg, 2006,
2007; Logg et al., 2012c; Ølgaard et al., 2008a; Ølgaard and Wells, 2010) takes as
input a variational form specified in UFL and generates as output C++ code which
conforms to the UFC interface, to be described in Section 1.3.4. Central to the finite
element method is the assembly of sparse tensors, described in Section 1.3.5, which
relies on the computation of the local element tensor A_T as well as the
local-to-global mapping ι_T. Although it is possible to hand code A_T and ι_T, the process
is both tedious and error-prone, especially for complex problems. This issue is
eliminated by letting FFC generate the code automatically. Introducing a compiler
also provides the possibility of applying various optimisation strategies for efficient
computation of A_T, which would normally not be feasible when developing code
by hand. The automated code generation for the general and efficient solution of
finite element variational problems is one of the key features of FEniCS.
There are three different interfaces to FFC: a Python interface, a just-in-time (JIT)
compilation interface and a command-line interface. Only the latter is presented
here while details on the other two interfaces can be found in Logg et al. (2012c).
The command-line interface takes a UFL form file or a list of form files as input:
Bash code
$ ffc Poisson.ufl

The form file contains the UFL specification of elements and/or forms, as for
instance the code from Figure 1.2 which in this case is saved in the file Poisson.ufl.
The content of a form file is wrapped in a Python script and then executed for
further processing in FFC. There exist a number of optional command-line options
to control the code generation. Related to this work, the most important options
are:
-l language This parameter controls the output format for the generated code. The
default value is “ufc”, which indicates that the code is generated according
to the UFC specification. Alternatively, the value “dolfin” may be used to
generate code according to the UFC format with a small set of additional
DOLFIN-specific wrappers.
-r representation This parameter controls the representation used for the gener-
ated element tensor code. There are three possibilities: “auto” (the default),
“quadrature” and “tensor”. FFC implements two different approaches to
code generation. One is based on traditional quadrature and another on a
special tensor representation. This will be discussed in Section 3.2. In the
case “auto”, FFC will try to select the better of the two representations; that
is, the representation that is believed to yield the best run-time performance
for the problem at hand. This issue is addressed in detail in Section 3.6.
-O If this option is used, the code generated for the element tensor is optimised
for run-time performance. The optimisation strategy used depends on the
chosen representation. In general, this will increase the time required for
FFC to generate code, but should reduce the run-time for the generated code.
Note that for very complicated variational forms, hardware limitations can
make compilation with some optimisation options impossible. Optimisation
strategies are treated in Chapter 3.
As an illustration of the options presented above, the command:
Bash code
$ ffc -l dolfin -r quadrature -O Poisson.ufl

will cause FFC to generate code for the Poisson problem, including DOLFIN
wrappers, using the quadrature representation with the default optimisation. A
list of all available command-line parameters can be found in the FFC manual page by
typing ‘man ffc’ on the command-line.
FFC follows the conventional design of a compiler in that it breaks compilation
into several sequential stages. The output generated at each stage serves as input for
the following stage, as illustrated in Figure 1.3. Introducing separate stages allows
development and improvement of each stage to be implemented without affecting
other stages of the compilation. Furthermore, adding new stages and dropping
existing stages becomes trivial. Each of the stages involved when compiling a form
is described in the following. Compilation of elements follows a similar (but simpler)
set of stages, and is not described here.
Compiler stage 0: Language (parsing). In this stage, the user-specified form is
interpreted and stored as a UFL abstract syntax tree (AST). The actual pars-
ing is handled by Python and the transformation to a UFL form object is
implemented by operator overloading in UFL.
Input: Python code or .ufl file
Output: UFL form
Compiler stage 1: Analysis. This stage preprocesses the UFL form and extracts
form metadata (FormData), such as which elements were used to define the
form, the number of coefficients and the cell type (interval, triangle or
tetrahedron). This stage also involves selecting a suitable quadrature scheme
and representation (as discussed earlier) for the form if these have not been
specified by the user.
Input: UFL form
Output: preprocessed UFL form and form metadata
Foo.ufl
  → Stage 0: Language → UFL
  → Stage 1: Analysis → UFL + metadata
  → Stage 2: Representation → IR
  → Stage 3: Optimization → OIR
  → Stage 4: Code generation → C++ code
  → Stage 5: Code formatting → Foo.h / Foo.cpp

Figure 1.3: Compilation of finite element variational forms broken into six
sequential stages: Language, Analysis, Representation, Optimisation, Code generation
and Code formatting. Each stage generates output based on input from the previous
stage. The input/output data consist of a UFL form file, a UFL object, a UFL
object and metadata computed from the UFL object, an intermediate representation
(IR), an optimised intermediate representation (OIR), C++ code and, finally, C++
code files (from Logg et al. (2012c)).

Compiler stage 2: Code representation. Most of the complexity of compilation is


handled in this stage which examines the input and generates all data needed
for the code generation. This includes generation of finite element basis func-
tions, extraction of data for mapping of degrees of freedom, and generation of
the form representation, see Section 3.2, which may involve precomputation
of integrals. Both representations available in FFC use tabulated values of
finite element basis functions and their derivatives at a suitable set of inte-
gration points on the reference element. FFC itself does not generate these
values, but relies on the library FIAT (Kirby, 2004, 2012) for the computation
of basis functions and their derivatives.
The intermediate representation is stored as a Python dictionary, mapping
names of UFC functions to the data needed for generation of the correspond-
ing code. In simple cases, like ufc::form::rank, this data may be a simple
number like 2. In other cases, like ufc::cell_tensor::tabulate_tensor, the
data may be a complex data structure that depends on the choice of form
representation.
Input: preprocessed UFL form and form metadata
Output: intermediate representation (IR)
Compiler stage 3: Optimisation. This stage examines the intermediate representa-
tion and performs optimisations. The optimisation strategy depends on the
chosen form representation, see Section 3.3 for optimisations pertinent to the
quadrature representation. Data stored in the intermediate representation
dictionary is then replaced by new data that encode an optimised version of
the function in question.
Input: intermediate representation (IR)
Output: optimised intermediate representation (OIR)
Compiler stage 4: Code generation. This stage examines the optimised intermedi-
ate representation and generates the actual C++ code for the body of each UFC
function. The code is stored as a dictionary, mapping names of UFC functions
to strings containing the C++ code. As an example, the data generated for
ufc::form::rank may be the string “return 2;”.
This demonstrates the importance of separating stages 2, 3 and 4 as it allows
stages 2 and 3 to focus on algorithmic aspects related to finite elements and
variational forms, while stage 4 is concerned only with generating C++ code
from a set of instructions prepared in earlier compilation stages.
Input: optimised intermediate representation (OIR)
Output: C++ code
Compiler stage 5: Code formatting. This stage examines the generated C++ code
and formats it according to the UFC format, generating as output one or more
.h/.cpp files conforming to the UFC specification. This is where the actual
writing of C++ code takes place and the stage relies on templates for UFC
code available as part of the UFC module ufc_utils.
Input: C++ code
Output: C++ code files

ufc::mesh    ufc::function          ufc::cell_integral
ufc::cell    ufc::finite_element    ufc::exterior_facet_integral
ufc::form    ufc::dofmap            ufc::interior_facet_integral

Table 1.3: C++ classes defined in the UFC interface.

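The dictionary-based hand-over between stages 2 to 5 can be mimicked in a few lines. The following is a toy sketch only, not FFC code: the function names, the metadata dictionary and the C++ template are hypothetical, and only the idea of mapping names of UFC functions first to data and then to strings of C++ code (such as “return 2;” for the rank) is taken from the description above.

```python
def representation(form_metadata):
    """Stage 2 (toy): map UFC function names to the data needed for code generation."""
    return {"rank": form_metadata["rank"],
            "num_coefficients": form_metadata["num_coefficients"]}


def optimisation(ir):
    """Stage 3 (toy): plain integers offer nothing to optimise, so return a copy."""
    return dict(ir)


def code_generation(oir):
    """Stage 4 (toy): map UFC function names to strings containing C++ code."""
    return {name: "return %d;" % value for name, value in oir.items()}


def code_formatting(code):
    """Stage 5 (toy): paste the function bodies into a hypothetical template."""
    template = "unsigned int %s() const\n{\n  %s\n}\n"
    return "\n".join(template % (name, body) for name, body in sorted(code.items()))


# Stand-in for the output of stage 1 for a bilinear form without coefficients
metadata = {"rank": 2, "num_coefficients": 0}

ir = representation(metadata)
oir = optimisation(ir)
code = code_generation(oir)
output = code_formatting(code)
```

Because each stage only consumes the dictionary produced by the previous one, a stage can be replaced or skipped without touching the others, which is the design argument made for the staged compiler.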
The interface to the code which is generated by FFC is discussed in the following
section.

1.3.4 Unified Form-assembly Code


The purpose of Unified Form-assembly Code (UFC) (Alnæs et al., 2009, 2012) is to
provide an interface between the problem-specific code generated by form compilers
and general-purpose problem solving environments like DOLFIN (described in
Section 1.3.5) which implements, among other things, the finite element assembly
algorithm. In contrast to other FEniCS components, few changes are made to UFC
in order to maintain a stable interface between form compilers and DOLFIN. This
section gives a brief introduction to the interface, with emphasis on the functions
relevant for this work. Furthermore, the UFC numbering convention for mesh
entities is discussed.
The UFC interface provides a small set of abstract C++ classes, shown in Ta-
ble 1.3 which are commonly used for assembling finite element tensors. The
mesh and cell classes are simple data structures that provide information such
as the geometric dimension and the topological dimension. In addition, the cell
class provides an array of global indices for the mesh entities belonging to the
given cell (cell.entity_indices) and an array of coordinates of the vertices of
the cell (cell.coordinates). The classes function and finite_element define
interfaces for general tensor-valued functions and finite elements respectively.
The form class defines an interface for assembly of the global tensor correspond-
ing to the given form. This includes functions to create finite_element, dofmap
and integral objects (ufc::cell_integral, ufc::exterior_facet_integral and
ufc::interior_facet_integral) of the variational form.
Of particular interest in relation to this work are the dofmap and integral classes.
The local-to-global degree of freedom mapping on the finite element cell T, ι T , is
computed by the dofmap::tabulate_dofs function for which the UFC interface is
defined as:
C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const mesh& m,
                           const cell& c) const = 0;

where dofs is a pointer to an array for the tabulated values on T. UFC only provides
the interface of this function; it is not concerned with computing ι_T. The code
to compute ι_T must be generated by the form compiler. For example, FFC will
generate the following code for linear Lagrange elements on triangles:

C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = c.entity_indices[0][0];
  dofs[1] = c.entity_indices[0][1];
  dofs[2] = c.entity_indices[0][2];
}

Note that FFC associates each degree of freedom with the global vertex number
which can be extracted from the cell::entity_indices array. For discontinuous
linear Lagrange elements on triangles the generated code is

C++ code
/// Tabulate the local-to-global mapping of dofs on a cell
virtual void tabulate_dofs(unsigned int* dofs, const ufc::mesh& m,
                           const ufc::cell& c) const
{
  dofs[0] = 3*c.entity_indices[2][0];
  dofs[1] = 3*c.entity_indices[2][0] + 1;
  dofs[2] = 3*c.entity_indices[2][0] + 2;
}

because FFC considers all degrees of freedom local to the given element and
therefore computes degree of freedom numbers based on the global cell index.
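The effect of the two numbering strategies can be seen by transcribing the generated functions to Python and applying them to a small mesh. This is a sketch under stated assumptions: the two-triangle mesh and the dictionary standing in for cell::entity_indices are hypothetical, and only the index arithmetic is taken from the generated C++ code above.

```python
def tabulate_dofs_p1(entity_indices):
    """Continuous linear Lagrange: one dof per vertex, numbered by global vertex index."""
    return [entity_indices[0][0], entity_indices[0][1], entity_indices[0][2]]


def tabulate_dofs_dg1(entity_indices):
    """Discontinuous linear Lagrange: three dofs per cell, numbered by global cell index."""
    cell = entity_indices[2][0]
    return [3 * cell, 3 * cell + 1, 3 * cell + 2]


# Hypothetical two-triangle mesh; each cell is given by its global vertex indices
cells = [(0, 1, 3), (0, 2, 3)]

p1_dofs, dg1_dofs = [], []
for cell_index, vertices in enumerate(cells):
    # entity_indices[d] lists the global indices of entities of dimension d
    # (edges, d = 1, are not needed here and are omitted)
    entity_indices = {0: list(vertices), 2: [cell_index]}
    p1_dofs.append(tabulate_dofs_p1(entity_indices))
    dg1_dofs.append(tabulate_dofs_dg1(entity_indices))

num_p1 = len({d for dofs in p1_dofs for d in dofs})     # shared vertices count once
num_dg1 = len({d for dofs in dg1_dofs for d in dofs})   # nothing is shared
```

For the continuous element the two cells share the degrees of freedom at vertices 0 and 3, giving four global degrees of freedom, whereas the discontinuous element yields six since no degree of freedom is shared between cells.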
The local finite element tensor is computed inside the tabulate_tensor function
which is implemented by all three integral classes although the interface varies
slightly. For the cell_integral, the interface is

C++ code
/// Tabulate the tensor for the contribution from a local cell
virtual void tabulate_tensor(double* A, const double * const * w,
                             const cell& c) const = 0;

where A is a pointer to an array which will hold the values of the local element
tensor and w contains nodal values of any coefficient functions present
in the integral. The code which FFC generates for this function varies depending
on, for example, the choice of representation and optimisation, issues which
are discussed in Chapter 3. (Figures 3.2 and 3.3, on pages 63 and 65 respectively,
show examples of code generated by FFC for this function.) The interface
for exterior_facet_integral::tabulate_tensor is similar in nature to the
interface for interior_facet_integral::tabulate_tensor which is discussed in
Section 4.1.2 in connection with automation of discontinuous Galerkin methods.

Vertex    Coordinates
v0        x = (0, 0)
v1        x = (1, 0)
v2        x = (0, 1)

Figure 1.4: The UFC reference triangle and the coordinates of the vertices.

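To give an impression of what a tabulate_tensor function has to compute, the sketch below evaluates the local element tensor of the Poisson bilinear form for linear Lagrange elements on a single triangle. It is hand-written Python under stated assumptions, not code generated by FFC: the function name `p1_stiffness` is hypothetical, and the closed-form gradients of the barycentric basis functions replace the quadrature or tensor representations discussed in Chapter 3.

```python
def p1_stiffness(coords):
    """Local tensor A_T[i][j] = integral over T of grad(phi_i) . grad(phi_j) for P1."""
    (x0, y0), (x1, y1), (x2, y2) = coords
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)   # twice the signed area
    area = abs(det) / 2.0
    # Constant gradients of the three barycentric (linear Lagrange) basis functions
    grads = [((y1 - y2) / det, (x2 - x1) / det),
             ((y2 - y0) / det, (x0 - x2) / det),
             ((y0 - y1) / det, (x1 - x0) / det)]
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads] for gi in grads]


# Vertex coordinates of the UFC reference triangle
reference_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
A_T = p1_stiffness(reference_triangle)

# Row-major flattening, mimicking the flat array behind a double* argument
A_flat = [value for row in A_T for value in row]
```

Each row of the resulting 3 × 3 tensor sums to zero, as expected since the basis functions sum to one and constants lie in the kernel of the gradient.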
The UFC specification also defines a numbering scheme for mesh entities which
allows form compilers to access necessary data consistently when generating
code, for example, for computing the local tensors and local-to-global mapping as
discussed above. Important aspects of this numbering scheme are summarised in
the following for triangular cells. Further details on the UFC numbering convention
can be found in Alnæs et al. (2012).
The UFC reference triangle, including the coordinates of the three vertices, is
shown in Figure 1.4. Mesh entities are identified by the tuple (d, i ) where d is the
topological dimension of the mesh entity and i is a unique global index of the mesh
entity. For convenience, mesh entities of topological dimension 0 are referred to as
vertices, entities of dimension 1 are referred to as edges and entities of dimension 2
are referred to as faces. Mesh entities of topological dimension D − 1 (codimension 1),
with D denoting the topological dimension of the cells of the computational mesh,
are referred to as facets. Thus for a triangular mesh, the facets are the edges and for
a tetrahedral mesh, the facets are the faces. Following this convention, the vertices
of a triangle are identified as v0 = (0, 0), v1 = (0, 1) and v2 = (0, 2), the edges
(facets) are e0 = (1, 0), e1 = (1, 1) and e2 = (1, 2), and the cell itself is c0 = (2, 0).
The vertices of simplicial cells (intervals, triangles and tetrahedra) are numbered
locally based on the corresponding global vertex numbers such that a tuple of
increasing local vertex numbers corresponds to a tuple of increasing global vertex
numbers. This is illustrated for a simple mesh in Figure 1.5. The remaining mesh
entities are numbered within each topological dimension based on a lexicographical
ordering of ordered tuples of non-incident vertices. For example, the first edge, e0, of
a triangle is located opposite vertex v0 as shown in Figure 1.6a. The numbering of
mesh entities on triangular cells is shown in Table 1.4.

Figure 1.5: Local vertex numbering of simplicial mesh based on global vertex
numbers.

Entity         Incident vertices    Non-incident vertices
v0 = (0, 0)    (v0)                 (v1, v2)
v1 = (0, 1)    (v1)                 (v0, v2)
v2 = (0, 2)    (v2)                 (v0, v1)
e0 = (1, 0)    (v1, v2)             (v0)
e1 = (1, 1)    (v0, v2)             (v1)
e2 = (1, 2)    (v0, v1)             (v2)
c0 = (2, 0)    (v0, v1, v2)         ∅

Table 1.4: Local numbering of mesh entities on triangular cells.

The relative ordering of mesh entities with respect to other incident mesh
entities follows by sorting the entities by their indices. Therefore, the pair of
vertices incident to edge e0 in Figure 1.6a is (v1 , v2 ), not (v2 , v1 ). Due to the vertex
numbering convention, this means that two incident simplicial cells will always
agree on the orientation of incident subsimplices (for instance facets). This is
demonstrated in Figure 1.6b, which shows two incident triangles which agree on
the orientation of the common edge. This feature is advantageous when generating
code for discontinuous Galerkin methods, as will be demonstrated in Chapter 4.
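The numbering rules described above are simple enough to reproduce in a few lines of Python. The sketch below is an illustration only: the function names and the two-triangle mesh are hypothetical. It sorts the global vertex numbers to obtain the local vertex order, numbers the edges by the lexicographical order of their non-incident vertices, and checks that two incident triangles see their common edge with the same orientation.

```python
from itertools import combinations


def local_vertices(global_vertices):
    """Local vertex order: increasing local number follows increasing global number."""
    return sorted(global_vertices)


def local_edges(num_vertices=3):
    """Edges as sorted tuples of incident local vertices, numbered by the
    lexicographical order of the tuples of non-incident local vertices."""
    edges = []
    for non_incident in combinations(range(num_vertices), num_vertices - 2):
        incident = tuple(v for v in range(num_vertices) if v not in non_incident)
        edges.append(incident)
    return edges


def global_edges(cell):
    """Edges of a cell expressed as ordered tuples of global vertex numbers."""
    vertices = local_vertices(cell)
    return [tuple(vertices[v] for v in edge) for edge in local_edges()]


cell_a = (0, 1, 2)
cell_b = (3, 2, 1)   # deliberately listed in arbitrary order

edges_a = global_edges(cell_a)
edges_b = global_edges(cell_b)
shared = set(edges_a) & set(edges_b)
```

The common edge appears as the tuple (1, 2) in both cells, although it is edge e0 of the first cell and edge e2 of the second, so both cells agree on its orientation without any further bookkeeping.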

Figure 1.6: Edge numbering and orientation based on sorted tuples of incident and
non-incident vertices. As a consequence, two incident triangles will always agree
on the orientation of the common facet for simplicial cells.
(a) Edges are numbered based on the non-incident vertex. Therefore, e0 is located
opposite vertex v0.
(b) The orientation of facets (edges) is defined by the ordered tuple of incident
vertices, thus e0 = (v1, v2) and e2 = (v0, v1).

1.3.5 DOLFIN
Up until now, only the variational form and finite element discretisation have
been defined. To obtain a solution to the boundary value problem in (1.1), the
computational domain and boundary conditions must be specified, which in the
context of FEniCS is handled via a component called DOLFIN, a C++/Python
library, which also provides algorithms for finite element assembly and linear
algebra functionality to solve the arising system of equations. DOLFIN provides a
problem solving environment and is the main user interface to FEniCS. A detailed
presentation of DOLFIN is outside the scope of this work but can be found in Logg
and Wells (2010) and Logg et al. (2012d).
The necessary DOLFIN functionality to implement a complete solver for the
Poisson problem is presented. The intention is to give an impression of the
possibilities that are offered by DOLFIN and an understanding of the basic concepts
that are developed and used in subsequent chapters. For the model problem
under consideration, the domain Ω = [0, 1] × [0, 1], in which the source term f =
8π² sin(2πx) sin(2πy) is present, is subjected to homogeneous Dirichlet boundary
conditions, g = 0 on Γ_D = ∂Ω.
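A quick sanity check of this model problem is possible without any finite element code. The sketch below assumes that the strong form is −∆u = f, as suggested by the bilinear form, and that u = sin(2πx) sin(2πy) is the corresponding solution; this candidate solution is an assumption of the sketch and is not stated in the text. A central-difference approximation of the Laplacian is compared with the source term at one interior point.

```python
from math import pi, sin


def u_exact(x, y):
    """Candidate solution (an assumption of this sketch)."""
    return sin(2 * pi * x) * sin(2 * pi * y)


def source(x, y):
    """Source term f of the model problem."""
    return 8 * pi**2 * sin(2 * pi * x) * sin(2 * pi * y)


def minus_laplacian(func, x, y, h=1e-4):
    """Second-order central-difference approximation of -(u_xx + u_yy)."""
    return -(func(x + h, y) + func(x - h, y) + func(x, y + h) + func(x, y - h)
             - 4 * func(x, y)) / h**2


x, y = 0.3, 0.15
residual = abs(minus_laplacian(u_exact, x, y) - source(x, y))
boundary_value = u_exact(0.0, 0.7)   # a point on the boundary x = 0
```

The residual is small compared with the value of f at that point, and the candidate solution vanishes on the sampled boundary point, consistent with g = 0.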
A complete C++ solver for this problem is shown in Figures 1.7 and 1.8. The
first line in Figure 1.7 includes the DOLFIN library, while the second line includes
the UFC conforming code generated by FFC based on the UFL input for the Poisson
problem shown in Figure 1.2. Then follows the definition of the class Source which
is a subclass of the Expression class. An Expression represents a function that can
be evaluated on a finite element space and to suit this purpose it implements an
eval function. This function takes as arguments an array of values which holds
the return values and an array x which contains the coordinates of the point where
the Expression is currently being evaluated. The Source class overloads the eval
function, which in this case simply inserts the value 8π² sin(2πx) sin(2πy) into the
values array.

C++ code
#include <dolfin.h>
#include "Poisson.h"

using namespace dolfin;

// Source term
class Source : public Expression
{
  void eval(Array<double>& values, const Array<double>& x) const
  {
    values[0] = 8*pow(DOLFIN_PI, 2)*sin(2*DOLFIN_PI*x[0])*sin(2*DOLFIN_PI*x[1]);
  }
};

// Sub domain for Dirichlet boundary condition
class DirichletBoundary : public SubDomain
{
  bool inside(const Array<double>& x, bool on_boundary) const
  {
    return on_boundary;
  }
};

Figure 1.7: Implementation of source term and Dirichlet boundary for the C++
solver for the boundary value problem in (1.1). Program continues in Figure 1.8.

Next follows the definition of the class DirichletBoundary, a subclass of
SubDomain, for the part of the boundary where Dirichlet boundary conditions
are to be applied. The SubDomain class implements the function inside which eval-
uates to true or false depending on whether or not the point given by coordinates x
is part of the subdomain. In addition to the argument x, the inside function also
takes the argument on_boundary, a boolean value, supplied by DOLFIN, which
is true if the point x is located on ∂Ω. In the given case, the Dirichlet condition
is indeed applied on ∂Ω which means that the overloaded inside function can
simply be implemented by returning the on_boundary argument.
The remaining part of the C++ solver, the main function, is shown in Figure 1.8.
The first line defines the computational mesh, which consists of 2048 triangles as
the unit square is divided into 32 × 32 squares and each square is divided into two
triangles. DOLFIN provides functionality for creating simple meshes through the
classes: UnitInterval, UnitSquare, UnitCube, UnitCircle, UnitSphere, Interval,
Rectangle and Box which are useful for testing. For ‘real’ applications, a user can
read a mesh from file in the following way:

C++ code
Mesh mesh("mesh.xml");

provided that the mesh is saved in the DOLFIN XML format. Meshes can be
generated by external libraries, such as Gmsh (http://geuz.org/gmsh/), stored in
the Gmsh data format and converted by the dolfin-convert script to the DOLFIN
XML data format.
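The cell count quoted above for the 32 × 32 unit square mesh is easy to reproduce. The following sketch builds such a structured triangulation in plain Python; the function `unit_square_mesh` is hypothetical, and the choice of diagonal used to split each square is an assumption that need not coincide with the one made by DOLFIN.

```python
def unit_square_mesh(nx, ny):
    """Vertices and triangles of a structured triangulation of the unit square."""
    vertices = [(i / nx, j / ny) for j in range(ny + 1) for i in range(nx + 1)]
    cells = []
    for j in range(ny):
        for i in range(nx):
            v0 = j * (nx + 1) + i    # lower left vertex of the square
            v1 = v0 + 1              # lower right
            v2 = v0 + nx + 1         # upper left
            v3 = v2 + 1              # upper right
            # Split the square into two triangles along the diagonal v0-v3
            cells.append((v0, v1, v3))
            cells.append((v0, v2, v3))
    return vertices, cells


vertices, cells = unit_square_mesh(32, 32)
num_cells = len(cells)
num_vertices = len(vertices)
```

With 32 divisions in each direction this gives 2 · 32 · 32 = 2048 triangles on 33 · 33 = 1089 vertices; the vertex tuples are already sorted, in line with the UFC vertex numbering convention.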
Next, the FunctionSpace is defined for the finite element function space Vh
in (1.7). A function space is represented by a Mesh, a DofMap and a FiniteElement.
The DofMap and FiniteElement classes are generated by FFC based on the element
definition in Figure 1.2. However, by including the ‘-l dolfin’ option when
compiling the UFL input with FFC:

Bash code
$ ffc -l dolfin Poisson.ufl

the DOLFIN wrappers are generated, permitting a user to instantiate a function


space simply by providing the mesh as argument to the constructor.
The next three lines define an object for the Dirichlet boundary condition
u = g = 0 on the boundary Γ D defined by the DirichletBoundary class from
Figure 1.7. The value g = 0 is simply represented as a constant.
Then follows the creation of the bilinear and linear forms of the Poisson prob-
lem using the function space V as argument. The Poisson::BilinearForm and
Poisson::LinearForm classes are part of the code in Poisson.h generated by FFC
from the UFL input in Figure 1.2. Note how the coefficients f and h are defined
and attached to the linear form. The coefficient f is defined by the class Source as
shown in Figure 1.7, while the coefficient h, the Neumann boundary condition, is
zero in the given case.

C++ code
int main()
{
  // Create mesh and function space
  UnitSquare mesh(32, 32);
  Poisson::FunctionSpace V(mesh);

  // Define boundary condition
  Constant g(0.0);
  DirichletBoundary boundary;
  DirichletBC bc(V, g, boundary);

  // Define variational forms
  Poisson::BilinearForm a(V, V);
  Poisson::LinearForm L(V);
  Source f;
  L.f = f;
  Constant h(0.0);
  L.h = h;

  // Compute solution
  Function u(V);
  Matrix A;
  Vector b;
  assemble(A, a);
  assemble(b, L);
  bc.apply(A, b);
  solve(A, *u.vector(), b);

  // Save solution in PVD format
  File file("poisson.pvd");
  file << u;

  // Plot solution
  plot(u);

  return 0;
}

Figure 1.8: Continuation from Figure 1.7 of C++ code for the Poisson boundary
value problem.

A Function u is then declared to hold the computed solution. The Function
class represents a finite element function in V and therefore takes a function space
as argument. The function u also holds a vector of values of the degrees of freedom
associated with the function. A function is evaluated based on linear combinations
of basis functions and the values of this vector. This is in contrast to the Expression
class which is evaluated by overloading the eval function as seen in Figure 1.7.
To compute a solution for u which satisfies the variational problem, defined by
the bilinear and linear forms a and L, the following three steps are applied. Firstly,
the bilinear and linear forms a and L are assembled into the Matrix A and the Vector
b by calling the free function assemble which implements an algorithm to assemble
finite element variational forms. The assembly algorithm will be presented later
in this section. Secondly, the Dirichlet boundary condition is applied to the linear
system of equations using the apply member function of the DirichletBC object
bc. Thirdly, after applying the boundary condition, the system of equations can be
solved by calling the free function solve which solves linear systems of the form
Ax = b using the assembled matrix A, the vector of degree of freedom values from
u and the assembled vector b as arguments.
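The three steps above (assemble, apply boundary conditions, solve) can be sketched without DOLFIN for a one-dimensional analogue of the model problem. Everything in the sketch is an assumption made for illustration: the problem −u″ = π² sin(πx) on [0, 1] with u(0) = u(1) = 0, the dense matrix storage, the row-replacement treatment of the Dirichlet condition (one common choice, not necessarily what DirichletBC does) and the unpivoted Gaussian elimination, which is adequate here only because the system is diagonally dominant.

```python
from math import pi, sin


def assemble(n, f):
    """Assemble the P1 stiffness matrix and load vector on n cells of [0, 1]."""
    h = 1.0 / n
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    b = [0.0] * (n + 1)
    A_T = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]   # local cell tensor
    for cell in range(n):
        dofs = [cell, cell + 1]                         # local-to-global mapping
        for i in range(2):
            b[dofs[i]] += 0.5 * h * f(dofs[i] * h)      # vertex (trapezoidal) quadrature
            for j in range(2):
                A[dofs[i]][dofs[j]] += A_T[i][j]
    return A, b


def apply_dirichlet(A, b, dofs, value):
    """Replace each boundary row by the equation u_dof = value."""
    for d in dofs:
        A[d] = [0.0] * len(b)
        A[d][d] = 1.0
        b[d] = value


def solve(A, b):
    """Gaussian elimination without pivoting followed by back substitution."""
    n = len(b)
    A = [row[:] for row in A]
    x = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            if factor != 0.0:
                for j in range(k, n):
                    A[i][j] -= factor * A[k][j]
                x[i] -= factor * x[k]
    for k in reversed(range(n)):
        x[k] = (x[k] - sum(A[k][j] * x[j] for j in range(k + 1, n))) / A[k][k]
    return x


n = 16
A, b = assemble(n, lambda x: pi**2 * sin(pi * x))
apply_dirichlet(A, b, [0, n], 0.0)
u = solve(A, b)
error = max(abs(u[i] - sin(pi * i / n)) for i in range(n + 1))
```

The loop in `assemble` has the same structure as the general algorithm: a local tensor is computed per cell and added to the global tensor through the local-to-global mapping. The nodal error with respect to sin(πx) decreases as the mesh is refined.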
As an alternative to the three steps outlined above, the solve function provides
functionality to solve variational problems in a straightforward fashion namely by:

C++ code
solve(a == L, u, bc);

which automatically assembles the system, applies the boundary conditions and
solves the linear system, storing the solution in the function u.
Finally, the solution is saved in ParaView Data (PVD) format
(http://www.paraview.org/) for external post-processing and plotted by the built-in plot
command of DOLFIN which enables a quick visual inspection of the computed solution.
The computed solution to the Poisson boundary value problem is shown in Fig-
ure 1.9.

Python interface

As already mentioned, DOLFIN also provides a Python interface as an alternative


to the C++ interface. Most of the Python interface is generated automatically
from the C++ interface using SWIG (http://www.swig.org/). In addition, the
Python interface offers seamless integration with UFL and FFC through just-in-time
compilation of variational forms and elements which, in combination with the
expressiveness of Python, allows solvers to be implemented very compactly. For
this reason, the Python interface to DOLFIN is preferred, whenever feasible, for the
examples presented in this thesis.

Figure 1.9: Computed solution to the Poisson boundary value problem. The warped
scalar field u in the figure on the right has been scaled by a factor of 0.5.

As an example, the complete solver for the Poisson boundary value problem
using the Python interface is shown in Figure 1.10. The code is very similar to the
C++ code in Figures 1.7 and 1.8; most changes simply reflect the different syntax of
Python and C++. The two main differences are the definition of the
FunctionSpace and the definition of the variational forms which are implemented
directly as part of the solver and not in a separate file. Also note that the UFL
coordinates have been used to implement the source term f directly as part of
the variational formulation. It could also be implemented by subclassing the
Expression class and overloading the eval function in a similar way to the approach
in the C++ example:

Python code
class Source(Expression):
    def eval(self, values, x):
        values[0] = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])
f = Source()

As an alternative, it could be implemented by:

Python code
f = Expression("8*pow(pi,2)*sin(2*pi*x[0])*sin(2*pi*x[1])")

where the string argument to the Expression class is given in C++ syntax which is
automatically just-in-time compiled in order to evaluate the Expression. Compared
to the subclassing approach, this is more efficient as the callback to the eval function
will take place in C++ rather than Python.

Python code
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(32, 32)
V = FunctionSpace(mesh, "Lagrange", 1)

# Define Dirichlet boundary
def boundary(x, on_boundary):
    return on_boundary

# Define boundary condition
g = Constant(0.0)
bc = DirichletBC(V, g, boundary)

# Define variational problem
u = TrialFunction(V)
v = TestFunction(V)
x = V.cell().x
f = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])
a = inner(grad(u), grad(v))*dx
L = f*v*dx

# Compute solution
U = Function(V)
solve(a == L, U, bc)

# Save solution in PVD format
file = File("poisson.pvd")
file << U

# Plot solution
plot(U, interactive=True)

Figure 1.10: Complete Python solver for the boundary value problem in (1.1).

Assembly algorithm

To conclude this short introduction to the FEniCS Project, the assembly algorithm,
implemented in the DOLFIN assemble function, is presented. The presentation is
given for the assembly of the rank two tensor corresponding to the bilinear form
of the Poisson problem in (1.2). A generalisation of the algorithm for multilinear
forms is given in Alnæs et al. (2009) and Logg et al. (2012b).
Setting the function space V̂ = V, the tensor A which arises from assembling
the bilinear form a is defined by

\[ A_I = a(\phi_{I_2}, \phi_{I_1}), \tag{1.9} \]

where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a basis for V. The tensor A is a
sparse rank two tensor, a matrix, of dimensions N × N. The matrix A is computed
by iterating over the cells of the mesh and adding the contribution from each local
cell to the global matrix A. In this case, from (1.3), the local cell tensor $A_T$ is defined
as:

\[ A_{T,i} = a_T(\phi^T_{i_2}, \phi^T_{i_1}) = \int_T \nabla u \cdot \nabla v \, dx, \tag{1.10} \]

where $i = (i_1, i_2)$ is a multi-index, $A_{T,i}$ is the ith entry of the cell tensor $A_T$, $a_T$ is
the local contribution to the form from a cell $T \in \mathcal{T}_h$ (evaluated with $u = \phi^T_{i_2}$ and
$v = \phi^T_{i_1}$) and $\{\phi^T_k\}_{k=1}^{3}$ is the local finite element basis for V on T, which is linear
Lagrange elements on triangles in this case.
To formulate the assembly algorithm, a local-to-global mapping of degrees of
freedom is needed. Let $\iota_T : \mathcal{I}_T \to \mathcal{I}$ denote the collective local-to-global mapping
for each $T \in \mathcal{T}_h$:

\[ \iota_T(i) = \left( \iota_T^1(i_1), \iota_T^2(i_2) \right) \quad \forall\, i \in \mathcal{I}_T, \tag{1.11} \]

where $\iota_T^j : [1, 3] \to [1, N]$ denotes the local-to-global mapping for each discrete
function space $V_j$ and $\mathcal{I}_T$ is the index set

\[ \mathcal{I}_T = \prod_{j=1}^{2} [1, 3] = \{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)\}. \tag{1.12} \]

That is, $\iota_T$ maps a tuple of local degrees of freedom to a tuple of global degrees
of freedom. DOLFIN calls the tabulate_tensor and tabulate_dofs functions
presented in Section 1.3.4, in order to compute the local contribution $a_T$ and the
local-to-global mapping $\iota_T^j$ for each discrete function space from which DOLFIN
constructs the collective local-to-global mapping $\iota_T$.

The assembly of the matrix A can now be carried out efficiently by iterating
over all cells T ∈ Th . On each cell T, the cell tensor A T is computed and then added
to the global tensor A as outlined in Algorithm 1. The algorithm can be extended
to handle assembly over exterior and interior facets; the latter is demonstrated in
Section 4.1.4.

Algorithm 1 Assembly algorithm.

A = 0
for T ∈ T_h
    (1) Compute ι_T
    (2) Compute A_T
    (3) Add A_T to A according to ι_T:
    for i ∈ I_T
        A_{ι_T(i)} += A_{T,i}
    end for
end for

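
As an illustration of Algorithm 1, the following minimal Python sketch assembles the matrix for the Poisson bilinear form using linear Lagrange elements on a two-triangle mesh of the unit square. It is not the DOLFIN implementation: the function names cell_tensor and assemble, the dense list-of-lists storage, the zero-based indices and the use of the vertex numbers of a cell as its local-to-global mapping are assumptions made for this sketch only.

```python
def cell_tensor(coords):
    """Local stiffness matrix A_T for linear Lagrange elements on one triangle."""
    (x0, y0), (x1, y1), (x2, y2) = coords
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # twice the signed area
    area = abs(det) / 2.0
    # Constant gradients of the three local basis functions
    grads = [((y1 - y2) / det, (x2 - x1) / det),
             ((y2 - y0) / det, (x0 - x2) / det),
             ((y0 - y1) / det, (x1 - x0) / det)]
    return [[area * (gi[0] * gj[0] + gi[1] * gj[1]) for gj in grads]
            for gi in grads]


def assemble(vertices, cells):
    """Assemble the global matrix A cell by cell, following Algorithm 1."""
    n = len(vertices)
    A = [[0.0] * n for _ in range(n)]                    # A = 0
    for cell in cells:                                   # for T in T_h
        iota = cell                                      # (1) local-to-global map
        A_T = cell_tensor([vertices[v] for v in cell])   # (2) cell tensor
        for i1 in range(3):                              # (3) add A_T to A
            for i2 in range(3):
                A[iota[i1]][iota[i2]] += A_T[i1][i2]
    return A


# Hypothetical mesh: the unit square split into two triangles
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cells = [(0, 1, 2), (0, 2, 3)]
A = assemble(vertices, cells)
```

Since no boundary conditions are applied in the sketch, each row of A sums to zero, which reflects that constant functions lie in the kernel of the bilinear form.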
2 FEniCS applications to solid mechanics

One of the goals of this work is to tackle complicated solid mechanics models
using automated modelling tools. In the previous chapter it was shown how
automated modelling could be employed to solve the finite element variational
formulation of a Poisson boundary value problem. The Poisson problem provides
a simple platform for introducing the concepts behind automated modelling as it is
implemented in the FEniCS framework. However, from the simple presentation
it may not be immediately clear how more complex problems, like plasticity,
can be solved. A natural step is, therefore, to apply the concept of automated
modelling to some standard solid mechanics problems. Solid mechanics problems
typically involve the standard momentum balance equation, posed in a Lagrangian
setting, with different models distinguished by the choice of nonlinear or linearised
kinematics, and the constitutive model for determining the stress. The traditional
development approach to solid mechanics problems, and traditional finite element
codes, places a strong emphasis on the implementation of constitutive models
at the quadrature point level. Automated methods, on the other hand, tend to
stress more heavily the governing balance equations. Widely used finite element
codes for solid mechanics applications provide application programming interfaces
(APIs) for users to implement their own constitutive models. The interface supplies
kinematic and history data, and the user code computes the stress tensor, and when
required also the linearisation of the stress. Users of such libraries will typically
not be exposed to code development other than via the constitutive model API.
In addition to demonstrating how solid mechanics problems can be solved
using automation tools, this chapter presents some of the models that will be
further investigated and extended in subsequent chapters. It is not intended as a
comprehensive treatment of solid mechanics problems, but should be viewed as a
stepping stone towards implementation of classes of plasticity models. The chapter
also focuses on some pertinent issues that arise due to the nature of the constitutive
models. These issues, and solid mechanics problems in general, have motivated a
number of developments in the FEniCS framework.
The common problems of linearised elasticity, plasticity, hyperelasticity and
elastic wave propagation are considered. Topics that are addressed in this chapter
via these problems include ‘off-line’ computation of stress updates, linearisation of
problems with off-line stress updates, automatic differentiation and time stepping
for problems with second-order time derivatives. The presentation starts with
the relevant governing equations and the constitutive models under consideration.
The important issue of solving and linearising problems in which the governing
equation is expressed in terms of the stress tensor (rather than explicitly in terms
of the displacement field, or derivatives of the displacement field), and the stress
tensor is computed via a separate algorithm is then addressed. These topics are then
followed by a number of examples that demonstrate implementation approaches of
the described models.
To conclude the chapter, which is primarily based on the work in Ølgaard and
Wells (2012a) and Ølgaard et al. (2008b), extensions of the FEniCS framework that are
particularly interesting with respect to solid mechanics problems, and consequently
to this work, are summarised.

2.1 Governing equations

2.1.1 Preliminaries

The considered problems will be posed on a polygonal domain Ω ⊂ R^d , where
1 ≤ d ≤ 3. The boundary of Ω, denoted by ∂Ω, is decomposed into regions Γ D
and Γ N such that Γ D ∪ Γ N = ∂Ω and Γ D ∩ Γ N = ∅. The outward unit normal
vector on ∂Ω will be denoted by n. For time-dependent problems, a time interval
of interest I = (0, T ] will be considered, where superimposed dots denote time
derivatives. The current configuration of a solid body is denoted by Ω; that is, the
domain Ω depends on the displacement field. It is sometimes convenient to also
define a reference domain Ω0 ⊂ Rd that remains fixed. For convenience, cases in
which Ω and Ω0 coincide at time t = 0 are considered. To indicate boundaries,
outward unit normal vectors, and other quantities relative to Ω0 , the subscript ‘0’
will be used. When considering linearised kinematics, the domains Ω and Ω0 are
both fixed and coincide at all times t. A triangulation of the domain Ω will be
denoted by Th , and a triangulation of the domain Ω0 will be denoted by Th,0 .
The governing equations for the different models will be formulated in the
common framework of: find u ∈ V such that

F (u; w) = 0 ∀ w ∈ V, (2.1)

where F : V × V → R is linear in w and V is a suitable function space. If F is also
linear in u, then F can be expressed as

F (u; w) := a(u, w) − L(w), (2.2)

where a : V × V → R is linear in u and in w, and L : V → R is linear in w. For this
case, the problem can be cast in the canonical setting of: find u ∈ V such that

a(u, w) = L(w) ∀ w ∈ V, (2.3)

which is identical to the form in (1.2). For nonlinear problems, a Newton method
is typically employed to solve (2.1). Linearising F about u = u0 leads to a bilinear
form,

\[ a(du, w) := D_{u_0} F(u_0; w)[du] = \left. \frac{dF(u_0 + \epsilon\, du; w)}{d\epsilon} \right|_{\epsilon = 0}, \tag{2.4} \]

and a residual given by:

\[ L(w) := F(u_0; w). \tag{2.5} \]

Using the definitions of a and L in (2.4) and (2.5), respectively, a Newton step
involves solving a problem of the type in (2.3), followed by the correction u0 ←
u0 − du. The process is repeated until (2.1) is satisfied to within a specified tolerance.
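
The structure of the Newton procedure described above can be sketched for a scalar unknown, where the 'bilinear form' reduces to a number. The function name newton, the test equation u³ + u − 1 = 0, the initial guess and the tolerance are all assumptions chosen for illustration; the sketch only mirrors the sequence of steps (residual, linearisation, linear solve, correction) and is not how DOLFIN solves nonlinear problems.

```python
def newton(F, dF, u0, tol=1e-12, max_iter=25):
    """Scalar Newton iteration using the sign convention u0 <- u0 - du."""
    residuals = []
    for _ in range(max_iter):
        L = F(u0)                  # residual, cf. (2.5)
        residuals.append(abs(L))
        if abs(L) < tol:           # (2.1) satisfied to within the tolerance
            break
        a = dF(u0)                 # linearisation about u0, cf. (2.4)
        du = L / a                 # solve a*du = L, cf. (2.3)
        u0 = u0 - du               # correction
    return u0, residuals


# Hypothetical nonlinear equation: F(u) = u^3 + u - 1 = 0
u, residuals = newton(lambda u: u**3 + u - 1.0,
                      lambda u: 3.0 * u**2 + 1.0,
                      u0=1.0)
```

With an exact derivative, the recorded residuals decrease roughly quadratically once the iterate is close to the solution.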

2.1.2 Balance of momentum

The standard balance of linear momentum problem for the body Ω reads:

ρü − ∇ · σ = b in Ω × I, (2.6)
u=g on Γ D × I, (2.7)
σn = h on Γ N × I, (2.8)
u ( x, 0) = u0 in Ω, (2.9)
u̇( x, 0) = v0 in Ω, (2.10)

where ρ : Ω × I → R is the mass density, u : Ω × I → Rd is the displacement field,


σ : Ω × I → Rd × Rd is the symmetric Cauchy stress tensor, b : Ω × I → Rd is a
body force, g : Ω × I → Rd is a prescribed boundary displacement, h : Ω × I → Rd
is a prescribed boundary traction, u0 : Ω → Rd is the initial displacement and
v0 : Ω → Rd is the initial velocity. To complete the boundary value problem, a
constitutive model that relates σ to u is required.
To develop finite element models, it is necessary to cast the momentum balance
equation in a weak form by multiplying the balance equation (2.6) by a weight
function w and integrating. It is possible to formulate a space-time method by
considering a weight function that depends on space and time, and then integrating
over Ω × I. However, it is far more common in solid mechanics applications to
consider a weight function that depends on spatial position only and to apply finite
difference methods to deal with time derivatives. Following this approach, at a
time t ∈ I equation (2.6) is multiplied by a function w (w is assumed to satisfy
w = 0 on Γ D ) and integrated over Ω:

\[ \int_{\Omega} \rho \ddot{u} \cdot w \, dx - \int_{\Omega} (\nabla \cdot \sigma) \cdot w \, dx - \int_{\Omega} b \cdot w \, dx = 0. \tag{2.11} \]

Applying integration by parts, using the divergence theorem and inserting the
boundary condition (2.8), equation (2.11) can be expressed in the form of (2.2) as:

\[ F := \int_{\Omega} \rho \ddot{u} \cdot w \, dx + \int_{\Omega} \sigma : \nabla w \, dx - \int_{\Gamma_N} h \cdot w \, ds - \int_{\Omega} b \cdot w \, dx = 0. \tag{2.12} \]

In this section, the momentum balance equation has been presented on the
current configuration Ω. It can also be posed on the fixed reference domain Ω0 via
a pull-back operation. However, for the presentation of geometrically nonlinear
models used in this chapter, details of the pull-back will not be needed.

2.1.3 Potential energy minimisation

An alternative approach to solving static problems (problems without an inertia


term) is to consider the minimisation of potential energy. This approach leads
to the same governing equation when applied to a standard problem, but may
be a preferable framework for problems that are naturally posed in terms of
stored energy densities and for which external forcing terms are conservative
(see Holzapfel (2000, p. 159) for an explanation of conservative loading), and
for problems that involve coupled physical phenomena that are best described
energetically.
Consider a system for which the total potential energy Π associated with a body
can be expressed as

\[ \Pi = \Pi_{\mathrm{int}} + \Pi_{\mathrm{ext}}, \tag{2.13} \]

where $\Pi_{\mathrm{int}}$ is the internal potential energy stored in Ω and $\Pi_{\mathrm{ext}}$ is the energy
associated with external forces acting on the domain Ω. An internal potential
energy functional of the form

\[ \Pi_{\mathrm{int}} = \int_{\Omega_0} \Psi_0(v) \, dx, \tag{2.14} \]

where $\Psi_0$ is the stored strain energy density on the reference domain, and an
external potential energy functional of the form

\[ \Pi_{\mathrm{ext}} = -\int_{\Omega_0} b_0 \cdot v \, dx - \int_{\Gamma_{0,N}} h_0 \cdot v \, ds, \tag{2.15} \]

are considered. It is the form of the stored energy density function $\Psi_0$ that defines
a particular constitutive model. For later convenience, the potential energy terms
have been presented on the reference domain Ω0 .
A minimiser u of (2.13) solves the problem:

\[ \min_{v \in V} \Pi, \tag{2.16} \]

where V is a suitably defined function space. Minimisation of Π corresponds to
the directional derivative of Π being zero for all possible variations of u. Therefore,
minimisation of Π corresponds to solving (2.1) with

\[ F(u; w) := D_u \Pi(u)[w] = \left. \frac{d\Pi(u + \epsilon w)}{d\epsilon} \right|_{\epsilon = 0}. \tag{2.17} \]


For suitable definitions of the stress tensor, it is straightforward to show that
minimising Π is equivalent to solving the balance of momentum problem, for the
static case.
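
The relation between energy minimisation and the condition F(u; w) = 0 in (2.17) can be checked numerically on a small discrete system. The sketch below assumes a hypothetical two degree of freedom system with quadratic energy Π(u) = ½ uᵀKu − bᵀu; the matrix K, the vector b and the function names are illustrative only, and the directional derivative is approximated by a central difference rather than computed symbolically as UFL would.

```python
def potential(u, K, b):
    """Total potential energy: internal (quadratic) plus external (linear) part."""
    n = len(u)
    internal = 0.5 * sum(u[i] * K[i][j] * u[j] for i in range(n) for j in range(n))
    external = -sum(b[i] * u[i] for i in range(n))
    return internal + external


def directional_derivative(u, w, K, b, eps=1e-6):
    """Central-difference approximation of d/d(eps) Pi(u + eps*w) at eps = 0."""
    up = [ui + eps * wi for ui, wi in zip(u, w)]
    um = [ui - eps * wi for ui, wi in zip(u, w)]
    return (potential(up, K, b) - potential(um, K, b)) / (2.0 * eps)


# Hypothetical stiffness matrix and load vector
K = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 0.0]
u_min = [2.0 / 3.0, 1.0 / 3.0]   # solves K u = b, hence minimises the energy

# F(u; w) vanishes at the minimiser for every direction w that is tried
F_at_minimiser = [directional_derivative(u_min, w, K, b)
                  for w in ([1.0, 0.0], [0.0, 1.0], [1.0, -2.0])]
# Away from the minimiser the derivative equals w . (K u - b), here -1
F_at_zero = directional_derivative([0.0, 0.0], [1.0, 0.0], K, b)
```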

2.2 Constitutive models


A constitutive model describes the relationship between stress and deformation. The
stress can be defined explicitly in terms of primal functions, like the displacement
field for linearised elasticity, it can be implicitly defined via stored energy density
functions, or it can be defined as the solution to a secondary problem, for instance
the yield criterion in the case of plasticity. The constitutive model can be either
linear or nonlinear. In the following sections examples of these cases are presented
in the form of linearised elasticity, plasticity and hyperelasticity. The expressions
for the stress or stored energy density presented in this section can be inserted into
the balance equations or the minimisation framework in the preceding section to
yield a governing equation.

2.2.1 Linearised elasticity

For linearised elasticity, the stress tensor as a function of the strain tensor for an
isotropic, homogeneous material is given by

σ = 2µε + λtr(ε) I, (2.18)

where $\varepsilon = \left( \nabla u + (\nabla u)^T \right)/2$ is the strain tensor, µ and λ are the Lamé parameters
and I is the second-order identity tensor. The relationship between the stress and
the strain can also be expressed as

\[ \sigma = C : \varepsilon, \tag{2.19} \]

where

\[ C_{ijkl} = \mu \left( \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk} \right) + \lambda \delta_{ij}\delta_{kl}, \tag{2.20} \]

and $\delta_{ij}$ is the Kronecker delta.
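
The equivalence of (2.18) and (2.19)–(2.20) can be verified with a few lines of Python. In the sketch below the function names, the two-dimensional displacement gradient and the values of the Lamé parameters are hypothetical, and plain nested lists are used in place of a tensor library.

```python
def strain(grad_u):
    """Symmetric part of the displacement gradient."""
    d = len(grad_u)
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(d)]
            for i in range(d)]


def stress(eps, mu, lmbda):
    """Stress from (2.18): sigma = 2*mu*eps + lmbda*tr(eps)*I."""
    d = len(eps)
    trace = sum(eps[k][k] for k in range(d))
    return [[2.0 * mu * eps[i][j] + (lmbda * trace if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def elasticity_tensor(mu, lmbda, d):
    """Fourth-order tensor from (2.20)."""
    def delta(i, j):
        return 1.0 if i == j else 0.0
    return [[[[mu * (delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))
               + lmbda * delta(i, j) * delta(k, l)
               for l in range(d)] for k in range(d)]
             for j in range(d)] for i in range(d)]


def stress_from_tensor(C, eps):
    """Stress from (2.19): sigma_ij = C_ijkl * eps_kl."""
    d = len(eps)
    return [[sum(C[i][j][k][l] * eps[k][l] for k in range(d) for l in range(d))
             for j in range(d)] for i in range(d)]


# Hypothetical data
mu, lmbda = 1.0, 2.0
grad_u = [[0.01, 0.02], [0.0, -0.005]]
eps = strain(grad_u)
sigma_a = stress(eps, mu, lmbda)
sigma_b = stress_from_tensor(elasticity_tensor(mu, lmbda, 2), eps)
```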

2.2.2 Flow theory of plasticity


The standard flow theory model of plasticity is considered, and only the background
necessary to support the examples will be presented. In-depth coverage can be
found in many textbooks, such as Lubliner (2008) and Simo and Hughes (1998).
For a geometrically linear plasticity problem, the stress tensor is computed by

σ = C : εe , (2.21)

where εe is the elastic part of the strain tensor. It is assumed that the strain tensor
can be decomposed additively into elastic and plastic parts:

ε = εe + εp . (2.22)

If εe can be determined, then the stress can be computed.


The stress tensor in classical plasticity models must satisfy the yield criterion:

\[ f(\sigma, \varepsilon^p, \kappa) := \phi\left( \sigma, q_{\mathrm{kin}}(\varepsilon^p) \right) - q_{\mathrm{iso}}(\kappa) - \sigma_y \le 0, \tag{2.23} \]

where $\phi\left( \sigma, q_{\mathrm{kin}}(\varepsilon^p) \right)$ is a scalar effective stress measure, $q_{\mathrm{kin}}$ is a stress-like internal
variable used to model kinematic hardening, $q_{\mathrm{iso}}$ is a scalar stress-like term used to
model isotropic hardening, κ is a scalar internal variable and $\sigma_y$ is the initial scalar
yield stress. For the commonly adopted von Mises model (also known as $J_2$-flow)
with linear isotropic hardening, φ and $q_{\mathrm{iso}}$ read:

\[ \phi(\sigma) = \sqrt{\tfrac{3}{2} s_{ij} s_{ij}}, \tag{2.24} \]

\[ q_{\mathrm{iso}}(\kappa) = H\kappa, \tag{2.25} \]

where $s_{ij} = \sigma_{ij} - \sigma_{kk}\delta_{ij}/3$ is the deviatoric stress and the constant scalar H > 0 is a
hardening parameter.
In the flow theory of plasticity, the plastic strain rate is given by:

\[ \dot{\varepsilon}^p = \dot{\lambda} \frac{\partial g}{\partial \sigma}, \tag{2.26} \]

where $\dot{\lambda}$ is the rate of the plastic multiplier and the scalar g is known as the plastic
potential. In the case of associative plastic flow, g = f . The term $\dot{\lambda}$ determines
the magnitude of the plastic strain rate, and the direction is given by ∂g/∂σ. For
isotropic strain-hardening, it is usual to set

\[ \dot{\kappa} = \sqrt{\tfrac{2}{3} \dot{\varepsilon}^p_{ij} \dot{\varepsilon}^p_{ij}}, \tag{2.27} \]

which for associative von Mises plasticity implies that $\dot{\kappa} = \dot{\lambda}$.
A feature of the flow theory of plasticity is that the constitutive model is
postulated in a rate form. This requires the application of algorithms to compute the
stress from increments of the total strain. A discussion of the algorithmic aspects of
computing the stress tensor from the equations presented in this section
is postponed to Section 2.4.2.
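
To make the yield criterion concrete, the sketch below evaluates (2.23) for the von Mises model (2.24) with linear isotropic hardening (2.25) and without kinematic hardening. It only checks whether a given stress state is admissible; it is not a stress update algorithm, and the function names and the material values are hypothetical.

```python
import math


def deviator(sigma):
    """Deviatoric stress s_ij = sigma_ij - sigma_kk*delta_ij/3."""
    mean = sum(sigma[k][k] for k in range(3)) / 3.0
    return [[sigma[i][j] - (mean if i == j else 0.0) for j in range(3)]
            for i in range(3)]


def von_mises(sigma):
    """Effective stress from (2.24)."""
    s = deviator(sigma)
    return math.sqrt(1.5 * sum(s[i][j] ** 2 for i in range(3) for j in range(3)))


def yield_function(sigma, kappa, sigma_y, H):
    """Yield function (2.23) with q_iso = H*kappa and no kinematic hardening."""
    return von_mises(sigma) - H * kappa - sigma_y


# Hypothetical stress states and material data
uniaxial = [[250.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
hydrostatic = [[100.0, 0.0, 0.0], [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
sigma_y, H = 200.0, 100.0

f_virgin = yield_function(uniaxial, 0.0, sigma_y, H)     # positive: inadmissible
f_hardened = yield_function(uniaxial, 0.5, sigma_y, H)   # zero: on the yield surface
```

For a uniaxial stress state the effective stress equals the axial stress, while a purely hydrostatic state gives a zero effective stress and therefore never yields in this model.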

2.2.3 Hyperelasticity

Hyperelastic models are characterised by the existence of a stored strain energy
density function Ψ0 . The linearised model presented at the start of this section falls
within the class of hyperelastic models. Assuming linearised kinematics, the stored
energy function

\[ \Psi_0 = \frac{\lambda}{2} (\operatorname{tr} \varepsilon)^2 + \mu\, \varepsilon : \varepsilon \tag{2.28} \]

corresponds to the linearised model in (2.18). It is straightforward to show that
using this stored energy function in the potential energy minimisation approach
in (2.17) leads to the same equation as inserting the stress from (2.18) into the weak
momentum balance equation (2.12).
More generally, stored energy functions that correspond to nonlinear models
can be defined. A wide range of stored energy functions for hyperelastic models
have been presented and analysed in the literature (see, for example, Bonet and
Wood (1997) for a selection). In order to present concrete examples, it is necessary to
introduce some kinematics, and in particular strain measures. The Green–Lagrange
strain tensor E is defined in terms of the deformation gradient
$F : \Omega_0 \times I \to \mathbb{R}^d \times \mathbb{R}^d$ and the (symmetric) right Cauchy–Green tensor
$C : \Omega_0 \times I \to \mathbb{R}^d \times \mathbb{R}^d$:

\[ F = I + \nabla u, \tag{2.29} \]

\[ C = F^T F, \tag{2.30} \]

\[ E = \frac{1}{2} (C - I), \tag{2.31} \]

where I is the second-order identity tensor. Using E in place of the infinitesimal
strain tensor ε in (2.28), the following expression for the strain energy density
function is obtained:

\[ \Psi_0 = \frac{\lambda}{2} (\operatorname{tr} E)^2 + \mu\, E : E, \tag{2.32} \]

which is known as the St. Venant–Kirchhoff model. Unlike the linearised case, this
energy density function is not linear in u (or spatial derivatives of u), which means
that when minimising the total potential energy Π, the resulting equations are
nonlinear. Other examples of hyperelastic models are the Mooney–Rivlin model:

\[ \Psi_0 = c_1 (I_C - 3) + c_2 (II_C - 3), \tag{2.33} \]

where $I_C = \operatorname{tr} C$ and $II_C = \frac{1}{2} \left( I_C^2 - \operatorname{tr}(C^2) \right)$, and the compressible neo-Hookean
model:

\[ \Psi_0 = \frac{\mu}{2} (I_C - 3) - \mu \ln J + \frac{\lambda}{2} (\ln J)^2, \tag{2.34} \]

where J = det F.
In most presentations of hyperelastic models, one would proceed from the
definition of the stored energy function to the derivation of a stress tensor, and
then often to a linearisation of the stress for use in a Newton method. This process
can be lengthy and tedious. For a range of models, features of UFL will permit
problems to be posed as energy minimisation problems, and it will not be necessary
to compute an expression for a stress tensor, or its linearisation, explicitly. A
particular model can then be posed in terms of a particular expression for Ψ0 , as
will be demonstrated in the example in Section 2.4.3. It is also possible to follow
the momentum balance route, in which case UFL can be used to compute the stress
tensor and its linearisation automatically from an expression for Ψ0 .
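
The kinematic quantities in (2.29)–(2.31) and the St. Venant–Kirchhoff energy density (2.32) can be evaluated pointwise as in the following sketch. It works on a given displacement gradient in two dimensions using nested lists; the function names and numerical values are hypothetical, and in practice these expressions would be written in UFL rather than evaluated by hand.

```python
def green_lagrange(grad_u):
    """Green-Lagrange strain E = (F^T F - I)/2 with F = I + grad(u)."""
    d = len(grad_u)
    F = [[(1.0 if i == j else 0.0) + grad_u[i][j] for j in range(d)]
         for i in range(d)]
    # Right Cauchy-Green tensor C_ij = F_ki F_kj
    C = [[sum(F[k][i] * F[k][j] for k in range(d)) for j in range(d)]
         for i in range(d)]
    return [[0.5 * (C[i][j] - (1.0 if i == j else 0.0)) for j in range(d)]
            for i in range(d)]


def svk_energy(grad_u, mu, lmbda):
    """St. Venant-Kirchhoff stored energy density (2.32)."""
    E = green_lagrange(grad_u)
    d = len(E)
    trace = sum(E[k][k] for k in range(d))
    double_dot = sum(E[i][j] ** 2 for i in range(d) for j in range(d))
    return 0.5 * lmbda * trace ** 2 + mu * double_dot


# Hypothetical data
mu, lmbda = 1.0, 2.0
stretch = [[0.1, 0.0], [0.0, 0.0]]        # 10 % stretch in the x-direction
rotation = [[-1.0, -1.0], [1.0, -1.0]]    # grad(u) = R - I for a 90 degree rotation

psi_stretch = svk_energy(stretch, mu, lmbda)
psi_rotation = svk_energy(rotation, mu, lmbda)
```

The rigid rotation stores no energy because E vanishes for F = R, which is one reason for using E rather than the infinitesimal strain ε when displacements are large.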

2.3 Linearisation issues for complex constitutive models


Solving problems with nonlinear constitutive models, such as plasticity, using
Newton’s method requires linearisation of (2.12). There are two particular issues
that deserve attention. The first is that if the stress σ is computed via some
algorithm, then proper linearisation of F requires linearisation of the algorithm for
computing the stress, and not linearisation of the continuous problem. This point
is well known in computational plasticity, and has been extensively studied (Simo
and Taylor, 1985). The second issue is that the stress field, and its linearisation,
will not in general come from a finite element space. Hence, if all functions are
assumed to be in a finite element space, or are interpolated in a finite element space,
suboptimal convergence of a Newton method will be observed. This is illustrated
in the following sections.

2.3.1 Consistency of linearisation


Consider the following one-dimensional problem:

\[ F(u; w) := \int \sigma w_{,x} \, dx, \tag{2.35} \]

where the scalar stress σ is a nonlinear function of the strain field $u_{,x}$, and will be
computed via a separate algorithm. A continuous, piecewise quadratic displacement
field (and likewise for w) is considered. The strain field $u_{,x}$ is computed via
an $L^2$-projection onto the space of discontinuous, piecewise linear elements. For the
considered spaces, this is equivalent to a direct evaluation of the strain. Because the
stress is computed via a separate algorithm based on nodal values from the strain
field, it is chosen to also represent the stress using a discontinuous, piecewise linear
basis. Since the polynomial degree of the integrand is two, (2.35) can be integrated
exactly using two Gauss quadrature points on an element $T \in \mathcal{T}_h$:

\[ f_{T,i_1} := \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, \sigma_\alpha\, \phi^T_{i_1,x}(x_q)\, W_q, \tag{2.36} \]

where q is the integration point index, α is the degree of freedom index for the
local basis of σ, $\psi^T$ and $\phi^T$ denote the linear and quadratic basis functions on the
element T, respectively, and $W_q$ is the quadrature weight at integration point $x_q$.
Note that $\sigma_\alpha$ is the computed value of the stress at the element node α.
To apply a Newton method, the Jacobian (linearisation) of (2.36) is required.
This will be denoted by $A^\star_{T,i}$. To achieve quadratic convergence of a Newton
method, the linearisation must be exact. The Jacobian of (2.36) is:

\[ A^\star_{T,i} := \frac{d f_{T,i_1}}{d u_{i_2}}, \tag{2.37} \]

where $u_{i_2}$ are the displacement degrees of freedom. Because the stress is computed
from the strain field $u_{,x}$, only $\sigma_\alpha$ in (2.36) depends on $u_{i_2}$, and the linearisation of
this term reads:

\[ \frac{d\sigma_\alpha}{d u_{i_2}} = \frac{d\sigma_\alpha}{d\varepsilon_\alpha} \frac{d\varepsilon_\alpha}{d u_{i_2}} = D_\alpha \frac{d\varepsilon_\alpha}{d u_{i_2}}, \tag{2.38} \]

where $D_\alpha$ is the tangent value at node α. To compute the values of the strain at
nodes, $\varepsilon_\alpha$, from the displacement field, the derivative of the displacement field is
evaluated at $x_\alpha$:

\[ \varepsilon_\alpha = \sum_{i_2=1}^{3} \phi^T_{i_2,x}(x_\alpha)\, u_{i_2}. \tag{2.39} \]

Inserting (2.38) and (2.39) into (2.37) yields:

\[ A^\star_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_\alpha)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.40} \]

This is the exact linearisation of (2.36).
The linearisation of the weak form (2.35) is now considered, which leads to the
bilinear form:

\[ a(u, w) := \int D u_{,x} w_{,x} \, dx, \tag{2.41} \]

where D = dσ/dε is the tangent. As before, D is represented using a discontinuous,
piecewise linear basis where the nodal values of D are computed via a separate
algorithm. If two quadrature points are used to integrate the form (which is exact
for this form), the resulting element matrix is:

\[ A_{T,i} = \sum_{q=1}^{2} \sum_{\alpha=1}^{2} \psi^T_\alpha(x_q)\, D_\alpha\, \phi^T_{i_2,x}(x_q)\, \phi^T_{i_1,x}(x_q)\, W_q. \tag{2.42} \]

The representation of the element matrix in (2.42) is what would be produced by
FFC.
Equations (2.40) and (2.42) are not identical since $\phi^T_{i_2,x}$ is being evaluated in
different locations ($x_q \neq x_\alpha$ in general). As a consequence, the bilinear form in
(2.42) is not an exact linearisation of (2.35), and a Newton method will therefore
exhibit suboptimal convergence. For the special case where a continuous, piecewise
linear basis is used for u and w and a discontinuous, piecewise constant basis is
used for the strain, stress and tangent fields, only one integration point is needed
and thus $x_q = x_\alpha$ which makes the linearisation exact.
In general, the illustrated problem arises when some coefficients in a form are
computed by a nonlinear operation elsewhere, and then interpolated and evaluated
at points that differ from where the coefficient values were computed. This situation
is different from the use of nonlinear operators in UFL (see Table 1.2, page 10).
An example of such an operator is the ‘ln J’ term in the neo-Hookean model (2.34)
where ‘J’ will be computed at quadrature points during assembly after which the
operator ‘ln’ is applied to compute ‘ln J’.
The linearisation issue highlighted in this section is further illustrated in the
following section, as too is a solution in the context of automated modelling that
involves the definition of so-called ‘quadrature elements’.
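
The difference between the element matrices in (2.40) and (2.42) can be made visible with a small numerical experiment. The sketch below assumes the reference element [0, 1], quadratic Lagrange basis functions with nodes at 0, 1/2 and 1, a linear basis for the tangent with nodes at the element end points, and a two-point Gauss rule; the function names, the zero-based indices and the nodal tangent values are hypothetical.

```python
import math

# Derivatives of the quadratic Lagrange basis on [0, 1] (nodes 0, 1/2, 1)
dphi = [lambda x: 4.0 * x - 3.0,
        lambda x: 4.0 - 8.0 * x,
        lambda x: 4.0 * x - 1.0]
# Linear basis used for the tangent, with nodes at the element end points
psi = [lambda x: 1.0 - x, lambda x: x]
x_alpha = [0.0, 1.0]
# Two-point Gauss rule on [0, 1]
x_q = [0.5 - math.sqrt(3.0) / 6.0, 0.5 + math.sqrt(3.0) / 6.0]
W_q = [0.5, 0.5]


def element_matrix(D, consistent):
    """Element matrix (2.40) if consistent is True, otherwise (2.42)."""
    A = [[0.0] * 3 for _ in range(3)]
    for i1 in range(3):
        for i2 in range(3):
            for q in range(2):
                for alpha in range(2):
                    # The only difference: where the trial derivative is evaluated
                    x_trial = x_alpha[alpha] if consistent else x_q[q]
                    A[i1][i2] += (psi[alpha](x_q[q]) * D[alpha]
                                  * dphi[i2](x_trial) * dphi[i1](x_q[q]) * W_q[q])
    return A


def max_difference(D):
    """Largest entry-wise difference between (2.40) and (2.42)."""
    A_star = element_matrix(D, consistent=True)
    A = element_matrix(D, consistent=False)
    return max(abs(A_star[i][j] - A[i][j]) for i in range(3) for j in range(3))


diff_constant = max_difference([1.0, 1.0])   # constant tangent on the element
diff_varying = max_difference([1.0, 2.0])    # tangent varies over the element
```

For a tangent that is constant on the element the two matrices coincide, since the linear interpolation of the (linear) basis function derivative is exact. As soon as the nodal tangent values differ, the matrices differ, which is the inconsistency discussed above.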

2.3.2 Quadrature elements


Before introducing the concept of quadrature elements, a model problem that will
be used in numerical examples is presented. Given the finite element space

\[ V := \left\{ w \in H^1_0(\Omega),\ w \in P_k(T)\ \forall\, T \in \mathcal{T}_h \right\}, \tag{2.43} \]

where Ω ⊂ R and k ≥ 1, the model problem of interest involves: given f ∈ V, find
u ∈ V such that

\[ F := \int_{\Omega} \left( 1 + u^2 \right) u_{,x} w_{,x} \, dx - \int_{\Omega} f w \, dx = 0 \quad \forall\, w \in V. \tag{2.44} \]

Solving this problem via Newton’s method involves solving a series of linear
problems with

\[ L(w) := \int_{\Omega} \left( 1 + u_n^2 \right) u_{n,x} w_{,x} \, dx - \int_{\Omega} f w \, dx, \tag{2.45} \]

\[ a(du_{n+1}, w) := \int_{\Omega} \left( 1 + u_n^2 \right) du_{n+1,x} w_{,x} \, dx + \int_{\Omega} 2 u_n u_{n,x}\, du_{n+1} w_{,x} \, dx, \tag{2.46} \]

with the update $u_n \leftarrow u_n - du_{n+1}$. To draw an analogy with complex constitutive
models, the above is rephrased as:

\[ L(w) := \int_{\Omega} \sigma_n w_{,x} \, dx - \int_{\Omega} f w \, dx, \tag{2.47} \]

\[ a(du_{n+1}, w) := \int_{\Omega} C_n\, du_{n+1,x} w_{,x} \, dx + \int_{\Omega} 2 u_n u_{n,x}\, du_{n+1} w_{,x} \, dx, \tag{2.48} \]

where $\sigma_n = \left( 1 + u_n^2 \right) u_{n,x}$ and $C_n = \left( 1 + u_n^2 \right)$. Apart from the second term in the
bilinear form, the forms now resemble those for a plasticity problem where σ is the
‘stress’, C is the ‘tangent’ and $u_{,x}$ is the ‘strain’.
Similar to a plasticity problem, the idea is to compute nodal values of σ and C
‘off-line’, and to supply σ and C as functions in a space W to the forms used in the
Newton solution process. To access un,x for use off-line, an approach is to perform
an L2 -projection of the derivative of u onto a space W. For the example in question,
the term 1 + u^2 will also be projected onto W. A natural choice would be to make
W one polynomial order less than V and discontinuous across cell facets. However,
following this approach leads to a convergence rate for a Newton solver that is less
than the expected quadratic rate. The reason is that the linearisation that follows
from this process is not consistent with the problem being solved as explained in
the previous section.
To resolve this issue within the context of UFL and FFC, the concept of quadrature
elements has been developed1 . This special type of element is used to represent
‘functions’ that can only be evaluated at particular points (quadrature points), and
cannot be differentiated, but can be integrated (approximately). In the remainder
of this section key features of the quadrature element are presented together with a
demonstration of its use for the model problem considered above.
A quadrature element is declared in UFL by:
UFL code
element = FiniteElement("Quadrature", tetrahedron, k)

1 The concept was introduced in Ølgaard et al. (2008b) although the syntax for declaring a ‘quadrature
element’ and the underlying interpretation have changed slightly. Specifically, the argument k used to
refer to the number of integration points in each spatial direction of the quadrature scheme, which is
different from the current interpretation in which it refers to the polynomial degree that the underlying
quadrature rule will be able to integrate exactly.

where k is the polynomial degree that the underlying quadrature rule will be
able to integrate exactly. The declaration of a quadrature element is similar to the
declaration of any other element in UFL, as demonstrated in Section 1.3.2, and it can
be used as such, with some limitations. Note, however, the subtle difference that the
element order does not refer to the polynomial degree of the finite element shape
functions, but instead relates to the quadrature scheme. For ‘sufficient’ integration
of a second-order polynomial in three dimensions, FFC will use four quadrature
points per cell. FFC interprets the quadrature points of the quadrature element as
degrees of freedom where the value of a shape function for a degree of freedom is
equal to one at the quadrature point and zero otherwise. This has the implication
that a function that is defined on a quadrature element can only be evaluated at
quadrature points. Furthermore, it is not possible to take derivatives of functions
defined on a quadrature element.
The following examples illustrate simple usage of a quadrature element. Con-
sider the bilinear form for a mass matrix weighted by a coefficient f that is defined
on a quadrature element:
Z
a (u, w) := f uw dx. (2.49)

If the test and trial functions w and u come from a space of linear Lagrange
functions, the polynomial degree of their product is two. This means that the
coefficient f should be defined as:

UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 2)
f = Coefficient(ElementQ)

to ensure appropriate integration of the form in (2.49). The reason for this is that
the quadrature element in the form dictates the quadrature scheme that FFC will
use for the numerical integration since the quadrature element, as described above,
only has nonzero values at points that coincide with the underlying quadrature
scheme of the quadrature element. Thus, if the degree of ElementQ is set to one, the
form will be integrated using only one integration point, since one point is enough
to integrate a linear polynomial exactly, and as a result the form is under-integrated.
If quadratic Lagrange elements are used for w and u, the polynomial degree of the
integrand is four, therefore the declaration for the coefficient f should be changed
to:

UFL code
ElementQ = FiniteElement("Quadrature", tetrahedron, 4)
f = Coefficient(ElementQ)
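
The relation between the degree of a quadrature rule and under-integration can be illustrated in one dimension. The sketch below uses Gauss–Legendre rules on the interval [0, 1] as a stand-in for the rules that FFC selects on tetrahedra, so the points, weights and function names are assumptions made for illustration and do not correspond to any FFC data structure.

```python
import math

# Gauss-Legendre rules on [0, 1], indexed by the number of points
GAUSS_RULES = {
    1: ([0.5], [1.0]),
    2: ([0.5 - math.sqrt(3.0) / 6.0, 0.5 + math.sqrt(3.0) / 6.0], [0.5, 0.5]),
}


def integrate(f, n_points):
    """Approximate the integral of f over [0, 1] with an n-point Gauss rule."""
    points, weights = GAUSS_RULES[n_points]
    return sum(w * f(x) for x, w in zip(points, weights))


def linear(x):
    return 2.0 * x + 1.0      # exact integral over [0, 1] is 2


def quadratic(x):
    return 3.0 * x ** 2       # exact integral over [0, 1] is 1


exact_linear = integrate(linear, 1)        # one point is exact for degree one
under_integrated = integrate(quadratic, 1)  # one point is not enough for degree two
exact_quadratic = integrate(quadratic, 2)   # two points are exact up to degree three
```

In the same way, declaring the quadrature element with a degree lower than the polynomial degree of the integrand leads to a rule with too few points for exact integration.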

As a final demonstration of quadrature elements, consider the DOLFIN code in
Figure 2.1 for solving the nonlinear model problem in (2.44) with a source term
f = x^2 − 4, the Dirichlet boundary condition u = 1 at x = 0, continuous
quadratic elements for V, and quadrature elements of degree two for W.
NonlinearModelProblem is a subclass of the DOLFIN class NonlinearProblem; it
implements the linear form F and the bilinear form J (the derivative, or
Jacobian, of F) according to (2.5) and (2.4), respectively. The DOLFIN class
NewtonSolver solves problems expressed in the canonical form of (2.1) based on
the information provided by the NonlinearModelProblem object. Further details
on the DOLFIN classes NonlinearProblem and NewtonSolver can be found in Logg
et al. (2012d).
The relative residual norm after each iteration of the Newton solver for four
different combinations of spaces V and W is shown in Table 2.1. Continuous,
discontinuous and quadrature elements are denoted by CG_k, DG_k and Q_k
respectively, where k refers to the polynomial degree as discussed previously.
It is clear from the table that using quadratic elements for V requires the use
of quadrature elements for W in order to ensure quadratic convergence of the
Newton solver.

Iteration   CG_1/DG_0   CG_1/Q_1    CG_2/DG_1   CG_2/Q_2
1           1.114e+00   1.101e+00   1.398e+00   1.388e+00
2           2.161e-01   2.319e-01   2.979e-01   2.691e-01
3           3.206e-03   3.908e-03   2.300e-02   6.119e-03
4           7.918e-07   7.843e-07   1.187e-03   1.490e-06
5           9.696e-14   3.662e-14   2.656e-05   1.242e-13
6           -           -           5.888e-07   -
7           -           -           1.317e-08   -
8           -           -           2.963e-10   -

Table 2.1: Computed relative residual norms after each iteration of the Newton
solver for the nonlinear model problem using different elements for V and W.
Quadratic convergence is observed when using quadrature elements, and when
using piecewise constant functions for W, which coincides with a one-point
quadrature element. The presented results were computed with the code in
Figure 2.1 for the different combinations of function spaces.
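The convergence behaviour reported in Table 2.1 can be checked with a few lines of Python. The sketch below is not part of the original solver; it only post-processes the residuals listed in the table, and the helper name `observed_orders` is hypothetical. It estimates the order p from three consecutive residuals, assuming that the residual norm behaves like the error, so that p is roughly log(r[k+1]/r[k]) / log(r[k]/r[k-1]).

```python
import math

# Relative residual norms copied from Table 2.1.
RESIDUALS = {
    "CG2/DG1": [1.398e+00, 2.979e-01, 2.300e-02, 1.187e-03,
                2.656e-05, 5.888e-07, 1.317e-08, 2.963e-10],
    "CG2/Q2": [1.388e+00, 2.691e-01, 6.119e-03, 1.490e-06, 1.242e-13],
}


def observed_orders(residuals):
    """Estimate the convergence order from three consecutive residuals."""
    return [math.log(residuals[k + 1] / residuals[k])
            / math.log(residuals[k] / residuals[k - 1])
            for k in range(1, len(residuals) - 1)]


for name, values in RESIDUALS.items():
    print(name, ["%.2f" % p for p in observed_orders(values)])
```

For the CG2/Q2 column the estimates settle close to two, whereas for CG2/DG1 they drop towards one, which is consistent with the loss of quadratic convergence discussed in the text.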

2.4 Implementations and examples


This section presents implementation examples that correspond to the models
presented above. Where feasible, complete solvers are presented; otherwise,
relevant code extracts are given. Python examples are preferred due to the
compactness of the code extracts. In the case of plasticity, however,
efficiency demands a C++ implementation. It is possible that an efficient Python
interface for plasticity problems will be made available in the future via
just-in-time compilation.

Python code
from dolfin import *


# Sub domain for Dirichlet boundary condition
class DirichletBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return x[0] < DOLFIN_EPS and on_boundary


# Class for interfacing with the Newton solver
class NonlinearModelProblem(NonlinearProblem):
    def __init__(self, a, L, u, C, S, W, bc):
        NonlinearProblem.__init__(self)
        self.a, self.L = a, L
        self.u, self.C, self.S, self.W, self.bc = u, C, S, W, bc

    def F(self, b, x):
        # Assemble the residual vector and apply the boundary condition
        assemble(self.L, tensor=b)
        self.bc.apply(b, x)

    def J(self, A, x):
        # Assemble the Jacobian matrix and apply the boundary condition
        assemble(self.a, tensor=A)
        self.bc.apply(A)

    def form(self, A, b, x):
        # 'Off-line' update of the coefficients C and S from the current u
        C = project((1.0 + self.u**2), self.W)
        self.C.vector()[:] = C.vector()
        S = project(Dx(self.u, 0), self.W)
        self.S.vector()[:] = S.vector()
        self.S.vector()[:] = self.S.vector()*self.C.vector()


# Create mesh and define function spaces
mesh = UnitInterval(8)
V = FunctionSpace(mesh, "Lagrange", 2)
W = FunctionSpace(mesh, "Quadrature", 2)

# Define boundary condition
bc = DirichletBC(V, Constant(1.0), DirichletBoundary())

# Define source and functions
f = Expression("x[0]*x[0] - 4")
u, C, S = Function(V), Function(W), Function(W)

# Define variational problems
du, w = TrialFunction(V), TestFunction(V)
L = S*Dx(w, 0)*dx - f*w*dx
a = C*Dx(du, 0)*Dx(w, 0)*dx + 2*u*Dx(u, 0)*du*Dx(w, 0)*dx

# Create nonlinear problem, solver and solve
problem = NonlinearModelProblem(a, L, u, C, S, W, bc)
solver = NewtonSolver()
solver.solve(problem, u.vector())

Figure 2.1: DOLFIN implementation for the nonlinear model problem in (2.44) with
‘off-line’ computation of terms used in the variational forms.

In the code extracts, commentary is only provided for non-trivial aspects, as
the more generic aspects, such as the creation of meshes, application of
boundary conditions and the solution of linear systems, have already been
treated in the introduction to the FEniCS Project in Section 1.3.

2.4.1 Linearised elasticity


This example is particularly simple since the stress can be expressed as a straightfor-
ward function of the displacement field, and the expression for the stress in (2.18)
can be inserted directly into (2.12). For the steady case (inertia terms are ignored),
a complete solver for a linearised elasticity problem is presented in Figure 2.2. The
solver in Figure 2.2 is for a simulation on a unit cube with a source term b = (1, 0, 0)
and u = 0 on ∂Ω. A continuous, piecewise quadratic finite element space is used.
The expressiveness of the UFL input means that the expressions for sigma and F
in Figure 2.2 resemble closely the mathematical expressions used in the text for σ
and F. To unify the presentation of linear and nonlinear equations, the problem in
Figure 2.2 is presented in terms of F, where the UFL functions lhs (left-hand side)
and rhs (right-hand side) have been used to automatically extract the bilinear and
linear forms, respectively, from F (Alnæs et al., 2013).

2.4.2 Plasticity
The computation of the stress tensor, and its linearisation, for the model outlined
in Section 2.2.2 in a displacement-driven finite element model is rather involved. A
method of computing point-wise a stress tensor that satisfies (2.23) from the strain,
strain increment and history variables is known as a ‘return mapping algorithm’.
Return mapping strategies are discussed in detail in Simo and Hughes (1998). A
widely used return mapping approach, the ‘closest-point projection’, is summarised
below for a plasticity model with linear isotropic hardening.
From (2.21) and (2.22) the stress at the end of a strain increment reads:

$$ \sigma_{n+1} = C : (\varepsilon_{n+1} - \varepsilon^{p}_{n+1}). \tag{2.50} $$

Therefore, given $\varepsilon_{n+1}$, it is necessary to determine the plastic
strain $\varepsilon^{p}_{n+1}$ in order to compute the stress. In a
closest-point projection method the increment in plastic strain is computed
from:

$$ \varepsilon^{p}_{n+1} - \varepsilon^{p}_{n} = \Delta\lambda \frac{\partial g(\sigma_{n+1})}{\partial \sigma}, \tag{2.51} $$

where $g$ is the plastic potential function and
$\Delta\lambda = \lambda_{n+1} - \lambda_n$. Since $\partial_\sigma g$ is
evaluated at $\sigma_{n+1}$, (2.50) and (2.51) constitute a system of coupled
equations with unknowns $\Delta\lambda$ and $\sigma_{n+1}$. In general, the
system is nonlinear. To obtain a solution, Newton’s

Python code
from dolfin import *

# Create mesh
mesh = UnitCube(8, 8, 8)

# Create function space
V = VectorFunctionSpace(mesh, "Lagrange", 2)

# Create test and trial functions, and source term
u, w = TrialFunction(V), TestFunction(V)
b = Constant((1.0, 0.0, 0.0))

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))

# Stress
sigma = 2*mu*sym(grad(u)) + lmbda*tr(grad(u))*Identity(w.cell().d)

# Governing balance equation
F = inner(sigma, grad(w))*dx - dot(b, w)*dx

# Extract bilinear and linear forms from F
a, L = lhs(F), rhs(F)

# Dirichlet boundary condition on entire boundary
c = Constant((0.0, 0.0, 0.0))
bc = DirichletBC(V, c, DomainBoundary())

# Set up PDE and solve
u = Function(V)
problem = LinearVariationalProblem(a, L, u, bcs=bc)
solver = LinearVariationalSolver(problem)
solver.parameters["symmetric"] = True
solver.solve()

Figure 2.2: DOLFIN solver for a linearised elasticity problem on a unit cube.

method is employed as follows, with k denoting the iteration number. First, a ‘trial
stress’ is computed:
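$$ \sigma_{\mathrm{trial}} = C : (\varepsilon_{n+1} - \varepsilon^{p}_{n}). \tag{2.52} $$

Subtracting (2.52) from (2.50) and inserting (2.51), the following equation is
obtained:

$$ R_{n+1} := \sigma_{n+1} - \sigma_{\mathrm{trial}} + \Delta\lambda\, C : \frac{\partial g(\sigma_{n+1})}{\partial \sigma} = 0, \tag{2.53} $$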
where Rn+1 is the ‘stress residual’. During the Newton iterations this residual
is driven towards zero. If the trial stress in (2.52) leads to satisfaction of the
yield criterion in (2.23), then σtrial is the new stress and the Newton procedure is
terminated. Otherwise, the Newton increment of ∆λ is computed from:

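$$ d\lambda_k = \frac{f_k - R_k : Q_k : \partial_\sigma f_k}{\partial_\sigma f_k : \Xi_k : \partial_\sigma g_k + h}, \tag{2.54} $$

where $Q = \left[ I + \Delta\lambda\, C : \partial^2_{\sigma\sigma} g \right]^{-1}$,
$\Xi = Q : C$ and $h$ is a hardening parameter, which for the von Mises model
with linear hardening is equal to $H$ (the constant hardening parameter). The
stress increment is computed from:

$$ \Delta\sigma_k = \left( -d\lambda_k\, C : \partial_\sigma g_k - R_k \right) : Q_k, \tag{2.55} $$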

after which the increment of the plastic multiplier and the stresses for the next
iteration can be computed:

$$ \Delta\lambda_{k+1} = \Delta\lambda_k + d\lambda_k, \tag{2.56} $$

$$ \sigma_{k+1} = \sigma_k + \Delta\sigma_k. \tag{2.57} $$

The yield criterion is then evaluated again using the updated values, and the proce-
dure continues until the yield criterion is satisfied to within a prescribed tolerance.
Note that to start the procedure ∆λ0 = 0 and σ0 = σtrial . After convergence is
achieved, the consistent tangent can be computed:

$$ C_{\mathrm{tan}} = \Xi - \frac{\Xi : \partial_\sigma g \otimes \partial_\sigma f : \Xi}{\partial_\sigma f : \Xi : \partial_\sigma g + h}, \tag{2.58} $$

which is used when assembling the global Jacobian (stiffness matrix). The return
mapping algorithm is applied at all quadrature points.
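To make the steps (2.52) to (2.58) concrete, the following is a minimal sketch of the closest-point projection specialised to one dimension with linear isotropic hardening. It is not the FEniCS Solid Mechanics implementation. The function name `return_map_1d`, the yield function f = |σ| − (σ_y + Hκ) with associative flow (g = f), and the update κ = κ_n + ∆λ are assumptions made for illustration. In one dimension the second derivative of g vanishes, so Q = 1 and the Newton iteration converges after a single update. The material values are taken from Figure 2.5.

```python
import math


def return_map_1d(eps, eps_p_n, kappa_n, E, H, sigma_y,
                  tol=1.0e-10, max_iter=20):
    """Closest-point projection in 1D (illustrative sketch).

    Returns (stress, plastic strain, equivalent plastic strain, tangent).
    """
    def yield_function(sigma, kappa):
        # Assumed yield function with linear isotropic hardening
        return abs(sigma) - (sigma_y + H * kappa)

    # Trial stress, cf. (2.52)
    sigma_trial = E * (eps - eps_p_n)
    if yield_function(sigma_trial, kappa_n) <= 0.0:
        # Elastic step: the trial stress is the new stress
        return sigma_trial, eps_p_n, kappa_n, E

    sigma, d_lambda = sigma_trial, 0.0
    for _ in range(max_iter):
        n = math.copysign(1.0, sigma)  # df/dsigma = dg/dsigma
        f = yield_function(sigma, kappa_n + d_lambda)
        R = sigma - sigma_trial + d_lambda * E * n  # residual, cf. (2.53)
        if abs(f) <= tol * sigma_y and abs(R) <= tol * sigma_y:
            break
        Q = 1.0      # second derivative of g is zero in 1D
        Xi = Q * E
        dl = (f - R * Q * n) / (n * Xi * n + H)  # cf. (2.54)
        d_sigma = (-dl * E * n - R) * Q          # cf. (2.55)
        d_lambda += dl                           # cf. (2.56)
        sigma += d_sigma                         # cf. (2.57)

    # Consistent tangent, cf. (2.58), with Q = 1 and hence Xi = E
    n = math.copysign(1.0, sigma)
    Xi = E
    C_tan = Xi - (Xi * n) * (n * Xi) / (n * Xi * n + H)
    return sigma, eps_p_n + d_lambda * n, kappa_n + d_lambda, C_tan


# Material values as in Figure 2.5
E = 20000.0
E_t = 0.1 * E
H = E_t / (1.0 - E_t / E)
stress, eps_p, kappa, C_tan = return_map_1d(0.02, 0.0, 0.0, E, H, 200.0)
print(stress, eps_p, kappa, C_tan)
```

With these values the one-dimensional consistent tangent reduces to EH/(E + H), which equals the hardening slope E_t used to define the hardening parameter in Figure 2.5.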
The closest-point return mapping algorithm described above (Simo and Hughes,
1998) is common to a range of plasticity models that are defined by the form of the
functions f and g. The process can be generalised for models with more complicated
hardening behaviour. To aid the implementation of different models, a return
mapping algorithm and support for quadrature point level history parameters

is provided by the FEniCS Solid Mechanics library. The library is implemented
in C++ and adopts a polymorphic design, with the base class PlasticityModel
providing an interface for users to implement, and thereby supply functions for
$f$, $\partial_\sigma f$, $\partial_\sigma g$ and $\partial^2_{\sigma\sigma} g$.
Figure 2.3 shows the public interface of the PlasticityModel class. Supplied
with details of $f$ (and possibly $g$), the library can compute stress updates
and linearisations using the closest-point projection method.
Computational efficiency is important in the return mapping algorithm as the
stress and its linearisation are computed at all quadrature points at each global
Newton iteration. Therefore, FEniCS Solid Mechanics relies on the linear algebra
library Armadillo (http://arma.sourceforge.net/) to perform the block opera-
tions inside the return mapping algorithm to get the benefit of BLAS. Furthermore,
the algorithm is executed in C++ rather than in Python. For this reason, the FEniCS
Solid Mechanics library provides a C++ interface only at this stage. To reconcile
ease and efficiency, it would be possible to use just-in-time compilation for a Python
implementation of the PlasticityModel interface, just as DOLFIN presently does
for the Expression class (see Logg et al. (2012d)).
In the following, the outline of a solver based on the FEniCS Solid Mechanics
library is presented. The UFL input for a formulation in three dimensions using
a continuous, piecewise quadratic basis is shown in Figure 2.4. Note that the
stress and the linearised tangent, s and t, are defined using quadrature
elements and supplied as coefficients to the form (lines 2, 3 and 7), as they
are computed inside the plasticity library. Note also in Figure 2.4 that
symmetry has been exploited to flatten the stress and the tangent terms (lines
13 and 18). Recall from Section 2.3 that when constitutive updates are computed
outside of the form file, care must be taken to ensure quadratic convergence of
a Newton method. By using quadrature elements in Figure 2.4, it is possible to
achieve quadratic convergence during a Newton solve for plasticity problems.
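The flattening of the symmetric stress and tangent can be illustrated outside UFL with plain Python lists. The sketch below mirrors the index layout of the functions sigma, tangent and eps in Figure 2.4. The helper `inner` and the sample numbers are hypothetical and serve only to show that contracting the rebuilt stress matrix with grad(w) gives the same value as the dot product of the six stored stress components with the flattened strain, in which the shear entries are the engineering shear strains.

```python
def sigma(s):
    """Rebuild the symmetric 3x3 stress from its six stored components."""
    return [[s[0], s[3], s[4]],
            [s[3], s[1], s[5]],
            [s[4], s[5], s[2]]]


def tangent(t):
    """Rebuild the 6x6 tangent from its 36 stored components (row-major)."""
    return [[t[i*6 + j] for j in range(6)] for i in range(6)]


def eps(grad_w):
    """Flattened strain: three normal and three engineering shear terms."""
    normal = [grad_w[i][i] for i in range(3)]
    shear = [grad_w[i][j] + grad_w[j][i] for i, j in [(0, 1), (0, 2), (1, 2)]]
    return normal + shear


def inner(A, B):
    """Full contraction of two 3x3 matrices (hypothetical helper)."""
    return sum(A[i][j]*B[i][j] for i in range(3) for j in range(3))


# Sample data, chosen arbitrarily for illustration
s = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
grad_w = [[0.1, 0.2, 0.3],
          [0.4, 0.5, 0.6],
          [0.7, 0.8, 0.9]]

full = inner(sigma(s), grad_w)
flat = sum(si*ei for si, ei in zip(s, eps(grad_w)))
print(full, flat)
```

Both contractions agree, which is why only six stress components and a 6x6 tangent need to be stored at each quadrature point.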
The solver is implemented in C++, and Figure 2.5 shows an extract of the most
relevant parts of the solver in the context of plasticity. First, the necessary
function spaces are created (lines 3-5). V is used to define the bilinear and
linear forms and the displacement field u, while Vt and Vs are used for the two
coefficient spaces: the consistent tangent and the stress, which enter the
bilinear and linear forms of the plasticity problem. The forms defining the
plasticity problem are then created and the relevant functions are attached to
the forms (lines 8-12). Then the object defining the plasticity model is
created (line 25). The class VonMises is a subclass of the PlasticityModel
class shown in Figure 2.3 and it implements functions for $f$,
$\partial_\sigma f$ and $\partial^2_{\sigma\sigma} g$. It is constructed with
values for the Young’s modulus, Poisson’s ratio, yield stress and linear
hardening parameter. This object can then be passed to the constructor of the
PlasticityProblem class along with the forms, displacement field u, coefficient
functions and boundary conditions (line 28). The PlasticityProblem class, a
subclass of the DOLFIN class NonlinearProblem, handles the assembly over

C++ code
class PlasticityModel
{
public:

  /// Constructor
  PlasticityModel(double E, double nu);

  /// Return hardening parameter
  virtual double hardening_parameter(double eps_eq) const;

  /// Equivalent plastic strain
  virtual double kappa(double eps_eq, const arma::vec& stress,
                       double lambda_dot) const;

  /// Value of yield function f
  virtual double f(const arma::vec& stress,
                   double equivalent_plastic_strain) const = 0;

  /// First derivative of f with respect to sigma
  virtual void df(arma::vec& df_dsigma,
                  const arma::vec& stress) const = 0;

  /// First derivative of g with respect to sigma
  virtual void dg(arma::vec& dg_dsigma,
                  const arma::vec& stress) const;

  /// Second derivative of g with respect to sigma
  virtual void ddg(arma::mat& ddg_ddsigma,
                   const arma::vec& stress) const = 0;

};

Figure 2.3: PlasticityModel public interface defined by the plasticity library. Users
are required to supply implementations for at least the pure virtual functions.
These functions describe the plasticity model.

UFL code
 1  element = VectorElement("Lagrange", tetrahedron, 2)
 2  elementT = VectorElement("Quadrature", tetrahedron, 2, 36)
 3  elementS = VectorElement("Quadrature", tetrahedron, 2, 6)
 4
 5  u, w = TrialFunction(element), TestFunction(element)
 6  b, h = Coefficient(element), Coefficient(element)
 7  t, s = Coefficient(elementT), Coefficient(elementS)
 8
 9  def eps(u):
10      return as_vector([u[i].dx(i) for i in range(3)] \
11          + [u[i].dx(j) + u[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])
12
13  def sigma(s):
14      return as_matrix([[s[0], s[3], s[4]],
15                        [s[3], s[1], s[5]],
16                        [s[4], s[5], s[2]]])
17
18  def tangent(t):
19      return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])
20
21  a = inner(dot(tangent(t), eps(u)), eps(w))*dx
22  L = inner(sigma(s), grad(w))*dx - dot(b, w)*dx - dot(h, w)*ds

Figure 2.4: Definition of the linear and bilinear variational forms for plasticity
expressed using UFL syntax.

cells, loops over cell quadrature points, and variable updates in addition to defining
the linear and bilinear forms of the plasticity problem. The PlasticityProblem is
solved by the NewtonSolver like any other NonlinearProblem object as described
earlier in this chapter, line 41. After each Newton solve, the history variables are
updated by calling the update_variables function before proceeding with the next
solution increment, line 44.

2.4.3 Hyperelasticity
The construction of a solver for a hyperelastic problem, phrased as a minimisation
problem, is now presented and follows the minimisation framework presented
in Section 2.1.3. The compressible neo-Hookean model in (2.34) is adopted. The
automatic functional differentiation features of UFL permit the solver code to
resemble closely the abstract mathematical presentation. Differentiation of
forms with respect to functions is handled by the UFL function derivative. For
instance, given the potential energy functional $\Pi(u)$ as a function of the
displacements $u$, the derivative of $\Pi$ with respect to $u$ in the direction
$w$ is given by

$$ D_u \Pi(u)[w] := \left. \frac{d\Pi(u + \epsilon w)}{d\epsilon} \right|_{\epsilon = 0}, \tag{2.59} $$

which can be implemented in UFL by the expression:

UFL code
derivative(Pi, u, w)

If w is a test function, the result from applying the derivative is a linear
form, which can be differentiated again to yield a bilinear form as shown in
(2.4). Noteworthy in this approach is that it is not necessary to provide an
explicit expression for the stress tensor. Changing the model is therefore as
simple as redefining the stored energy density function Ψ0.
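The definition in (2.59) can be illustrated numerically without UFL. The sketch below uses a hypothetical discrete energy Π(u) = Σ (u_i^4/4 − b_i u_i), chosen only for illustration, and compares a central-difference approximation of the directional derivative with the hand-derived first variation Σ (u_i^3 − b_i) w_i. UFL's derivative performs this differentiation symbolically; the finite difference here is merely a check of the definition.

```python
def potential(u, b):
    """Hypothetical discrete energy: Pi(u) = sum(u_i^4/4 - b_i*u_i)."""
    return sum(0.25 * ui**4 - bi * ui for ui, bi in zip(u, b))


def directional_derivative(functional, u, w, eps=1.0e-6):
    """Central-difference estimate of d/d(eps) functional(u + eps*w) at 0."""
    plus = [ui + eps * wi for ui, wi in zip(u, w)]
    minus = [ui - eps * wi for ui, wi in zip(u, w)]
    return (functional(plus) - functional(minus)) / (2.0 * eps)


def exact_first_variation(u, b, w):
    """Hand-derived first variation of the energy above in direction w."""
    return sum((ui**3 - bi) * wi for ui, bi, wi in zip(u, b, w))


# Arbitrary sample state, load and direction
u = [0.5, -1.0, 2.0]
b = [1.0, 0.0, -0.5]
w = [1.0, 2.0, -1.0]

numerical = directional_derivative(lambda v: potential(v, b), u, w)
exact = exact_first_variation(u, b, w)
print(numerical, exact)
```

Replacing `potential` with another energy changes the model without touching the differentiation step, which is the point made in the text about redefining the stored energy density.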
A complete hyperelastic solver is presented in Figure 2.6. It corresponds to a
problem posed on a unit cube, and loaded by a body force b0 = (0, −0.5, 0), and
restrained such that u = (0, 0, 0) where x = 0. Elsewhere on the boundary the
traction h0 = (0.1, 0, 0) is applied. Continuous, piecewise linear functions for the
displacement field are used. The code in Figure 2.6 adopts the same notation used
in Sections 2.1.3 and 2.2.3. The problem is posed on the reference domain, and for
convenience the subscripts ‘0’ have been dropped in the code.
The solver in Figure 2.6 solves the problem using one Newton step. For problems
with stronger nonlinearities, perhaps as a result of greater volumetric or surface
forcing terms, it may be necessary to apply a pseudo time-stepping approach and
solve the problem in a number of Newton increments, or it may be necessary to
apply a path following solution method.

C++ code
 1  // Create mesh and define function spaces
 2  UnitCube mesh(4, 4, 4);
 3  Plasticity::FunctionSpace V(mesh);
 4  Plasticity::BilinearForm::CoefficientSpace_t Vt(mesh);
 5  Plasticity::LinearForm::CoefficientSpace_s Vs(mesh);
 6
 7  // Create functions, forms and attach functions
 8  Function u(V); Function tangent(Vt); Function stress(Vs);
 9  Plasticity::BilinearForm a(V, V);
10  Plasticity::LinearForm L(V);
11  a.t = tangent;
12  L.s = stress;
13
14  // Young's modulus and Poisson's ratio
15  double E = 20000.0; double nu = 0.3;
16
17  // Slope of hardening (linear) and hardening parameter
18  double E_t(0.1*E);
19  double hardening_parameter = E_t/(1.0 - E_t/E);
20
21  // Yield stress
22  double yield_stress = 200.0;
23
24  // Object of class von Mises
25  fenicssolid::VonMises J2(E, nu, yield_stress, hardening_parameter);
26
27  // Create PlasticityProblem
28  fenicssolid::PlasticityProblem nonlinear_problem(a, L, u, tangent, stress, bcs,
                                                     J2);
29
30  // Create nonlinear solver
31  NewtonSolver nonlinear_solver;
32
33  // Pseudo time stepping parameters
34  double t = 0.0; double dt = 0.005; double T = 0.02;
35
36  // Apply load in steps
37  while (t < T)
38  {
39    // Increment time and solve nonlinear problem
40    t += dt;
41    nonlinear_solver.solve(nonlinear_problem, *u.vector());
42
43    // Update variables for next load step
44    nonlinear_problem.update_variables();
45  }

Figure 2.5: DOLFIN code extract for solving a plasticity problem using the FEniCS
Solid Mechanics library.

Python code
from dolfin import *

# Optimization options for the form compiler
parameters["form_compiler"]["cpp_optimize"] = True

# Create mesh and define function space
mesh = UnitCube(16, 16, 16)
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Define Dirichlet boundary (x = 0)
def left(x):
    return x[0] < DOLFIN_EPS
bc = DirichletBC(V, Constant((0.0, 0.0, 0.0)), left)

# Define test and trial functions
du, w = TrialFunction(V), TestFunction(V)

# Displacement from previous iteration
u = Function(V)
b = Constant((0.0, -0.5, 0.0))  # Body force per unit mass
h = Constant((0.1, 0.0, 0.0))   # Traction force on the boundary

# Kinematics
I = Identity(V.cell().d)        # Identity tensor
F = I + grad(u)                 # Deformation gradient
C = F.T*F                       # Right Cauchy-Green tensor
Ic, J = tr(C), det(F)           # Invariants of deformation tensors

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2*(1 + nu)), E*nu/((1 + nu)*(1 - 2*nu))

# Stored strain energy density (compressible neo-Hookean model)
Psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2

# Total potential energy
Pi = Psi*dx - dot(b, u)*dx - dot(h, u)*ds

# Compute first variation of Pi (directional derivative about u in the
# direction of w)
F = derivative(Pi, u, w)

# Compute Jacobian of F
dF = derivative(F, u, du)

# Create nonlinear variational problem and solve
problem = NonlinearVariationalProblem(F, u, bcs=bc, J=dF)
solver = NonlinearVariationalSolver(problem)
solver.solve()

Figure 2.6: Complete DOLFIN solver for the compressible neo-Hookean model,
formulated as a minimisation problem.

2.4.4 Elastodynamics

As a final example, a linearised elastodynamics problem is considered to
illustrate the solution of time-dependent problems. The example is based on the
Newmark family of methods, which are widely used in structural dynamics. It is
a direct integration method, in which the equations are evaluated at discrete
points in time separated by a time increment ∆t. Thus, the time t_{n+1} is
equal to t_n + ∆t.
While this section addresses the Newmark scheme, it is straightforward to extend
the approach (and implementation) to generalised-α methods (Hilber et al., 1977).
The Newmark relations between displacements, velocities and accelerations at
tn and tn+1 read:

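$$ u_{n+1} = u_n + \Delta t\, \dot{u}_n + \frac{1}{2} \Delta t^2 \left( 2\beta \ddot{u}_{n+1} + (1 - 2\beta) \ddot{u}_n \right), \tag{2.60} $$

$$ \dot{u}_{n+1} = \dot{u}_n + \Delta t \left( \gamma \ddot{u}_{n+1} + (1 - \gamma) \ddot{u}_n \right), \tag{2.61} $$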


where β and γ are scalar parameters. Various well-known schemes are recovered
for particular combinations of β and γ. Setting β = 1/4 and γ = 1/2 leads to the
trapezoidal scheme, and setting β = 0 and γ = 1/2 leads to a central difference
scheme. For β > 0, re-arranging (2.60) and using (2.61) leads to:
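$$ \ddot{u}_{n+1} = \frac{1}{\beta \Delta t^2} \left( u_{n+1} - u_n - \Delta t\, \dot{u}_n \right) - \left( \frac{1}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.62} $$

$$ \dot{u}_{n+1} = \frac{\gamma}{\beta \Delta t} \left( u_{n+1} - u_n \right) - \left( \frac{\gamma}{\beta} - 1 \right) \dot{u}_n - \Delta t \left( \frac{\gamma}{2\beta} - 1 \right) \ddot{u}_n, \tag{2.63} $$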

in which un+1 is the only unknown term on the right-hand side. To solve a time
dependent problem, the governing equation can be posed at time tn+1 ,

$$ F(u_{n+1}; w) = 0 \quad \forall\, w \in V, \tag{2.64} $$

with the expressions in (2.62) and (2.63) used for first and second time derivatives
of u at time tn+1 .
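The use of (2.62) and (2.63) can be sketched on a single-degree-of-freedom oscillator, m ü + k u = 0, which is a hypothetical stand-in for the finite element system and not part of the solver presented in this section. Posing the equation at time t_{n+1} and inserting (2.62) leaves u_{n+1} as the only unknown. The function names and parameter values below are assumptions made for illustration. With β = 1/4 and γ = 1/2 the scheme should conserve the energy of this undamped linear system up to round-off.

```python
import math


def newmark_sdof(m, k, u0, v0, dt, n_steps, beta=0.25, gamma=0.5):
    """Integrate m*a + k*u = 0 with the Newmark scheme (requires beta > 0)."""
    u, v = u0, v0
    a = -k * u0 / m  # initial acceleration from the equation of motion
    history = [(u, v, a)]
    for _ in range(n_steps):
        # Insert (2.62) into m*a1 + k*u1 = 0 and solve for u1
        lhs = m / (beta * dt**2) + k
        rhs = m * ((u + dt * v) / (beta * dt**2)
                   + (1.0 / (2.0 * beta) - 1.0) * a)
        u1 = rhs / lhs
        # Acceleration and velocity at t_(n+1), cf. (2.62) and (2.63)
        a1 = (u1 - u - dt * v) / (beta * dt**2) \
            - (1.0 / (2.0 * beta) - 1.0) * a
        v1 = (gamma / (beta * dt)) * (u1 - u) - (gamma / beta - 1.0) * v \
            - dt * (gamma / (2.0 * beta) - 1.0) * a
        u, v, a = u1, v1, a1
        history.append((u, v, a))
    return history


def energy(m, k, u, v):
    """Total mechanical energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * u * u


history = newmark_sdof(m=1.0, k=1.0, u0=1.0, v0=0.0, dt=0.1, n_steps=100)
u_end, v_end, a_end = history[-1]
print(u_end, math.cos(10.0), energy(1.0, 1.0, u_end, v_end))
```

The computed displacement stays close to the exact solution cos(t), with a small phase error that decreases as ∆t is reduced.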
The viscoelastic model under consideration is a minor extension of the elasticity
model in (2.18). For the viscoelastic model, the stress tensor is given by:

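$$ \sigma = 2\mu\varepsilon + \left( \lambda\, \mathrm{tr}(\varepsilon) + \eta\, \mathrm{tr}(\dot{\varepsilon}) \right) I, \tag{2.65} $$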




where the constant scalar η ≥ 0 is a viscosity parameter.


A simple, but complete, elastodynamics solver is presented in Figures 2.7 and 2.8.
The solver mirrors the notation used in (2.62), (2.63) and (2.65), with expressions for
the acceleration, velocity and displacement at time tn (a0, v0, u0), and expressions

for the acceleration and velocity at time tn+1 (a1, v1) in terms of the displacement
at tn+1 (u1) and other fields at time tn . For simplicity, the source term b = (0, 0, 0).
The body is fixed such that u = (0, 0, 0) at x = 0 and the initial conditions are
u0 = v0 = (0, 0, 0). A traction h is applied at x = 1 and is increased linearly from
zero to one over the first five time steps. Therefore, no forces are acting on the
body at t = 0 and the initial acceleration is zero. Again, the UFL functions lhs
and rhs have been used to extract the bilinear and linear terms from the form.
This is particularly convenient for time-dependent problems since it allows the
code implementation to be posed in the same format as is usually adopted in the
mathematical presentation, with the equation of interest posed in terms of fields at
some point between times tn and tn+1 . The presented solver could be made more
efficient by exploiting linearity of the governing equation and thereby re-using the
factorisation of the system matrix.

2.5 Current and future developments


In this chapter a range of standard solid mechanics problems have been presented
in the context of automated modelling. The implementation of the models was
shown to be relatively straightforward due to the high level of abstraction provided
in the FEniCS framework. The presented cases cover a range of typical solid
mechanics problems that can be solved using FEniCS version 1.0. To broaden the
range of problems that can be handled in the FEniCS framework the following two
extensions are of particular interest from a solid mechanics viewpoint:

Assembly of forms on manifolds: In FEniCS version 1.0, it is assumed that
two-dimensional elements, like triangles, are embedded in R^2 and
three-dimensional elements, like tetrahedra, are embedded in R^3. At the time
of writing, support for two-dimensional elements embedded in R^3 and
one-dimensional elements embedded in R^2 or R^3 is being implemented in the
development version of FEniCS. Among other things, this facilitates the
development of support for shell and truss problems within the automated
framework.

Isoparametric elements: This issue relates to quadrilateral and hexahedral
elements, which are currently not supported, and to elements with higher order
mappings that allow curved mesh boundaries to be represented.

Finally, to attract more users with a solid mechanics background another exten-
sion to consider is improving the interface of the FEniCS Solid Mechanics library
to make it more similar to conventional finite element software packages. This
involves supplying the users with information like strain, strain rates and possibly
gradients of strain at integration point level for the user to formulate the constitutive
relation without working with the weak form of the governing equations.

Python code
from dolfin import *


# External load
class Traction(Expression):
    def __init__(self, end):
        Expression.__init__(self)
        self.t = 0.0
        self.end = end

    def eval(self, values, x):
        values[0] = 0.0
        values[1] = 0.0
        if x[0] > 1.0 - DOLFIN_EPS:
            values[0] = self.t/self.end if self.t < self.end else 1.0

    def value_shape(self):
        return (2,)


def update(u, u0, v0, a0, beta, gamma, dt):
    # Get vectors (references)
    u_vec, u0_vec = u.vector(), u0.vector()
    v0_vec, a0_vec = v0.vector(), a0.vector()

    # Update acceleration and velocity
    a_vec = (1.0/(2.0*beta))*( (u_vec - u0_vec - v0_vec*dt)/(0.5*dt*dt) -
                               (1.0-2.0*beta)*a0_vec )

    # v = dt * ((1-gamma)*a0 + gamma*a) + v0
    v_vec = dt*((1.0-gamma)*a0_vec + gamma*a_vec) + v0_vec

    # Update (t(n) <-- t(n+1))
    v0.vector()[:], a0.vector()[:] = v_vec, a_vec
    u0.vector()[:] = u.vector()


# Create mesh
mesh = UnitSquare(32, 32)

# Define function space
V = VectorFunctionSpace(mesh, "Lagrange", 1)

# Test and trial functions
u1, w = TrialFunction(V), TestFunction(V)

# Elasticity parameters
E, nu = 10.0, 0.3
mu, lmbda = E/(2.0*(1.0 + nu)), E*nu/((1.0 + nu)*(1.0 - 2.0*nu))

# Mass density and viscous damping coefficient
rho, eta = 1.0, 0.2

Figure 2.7: DOLFIN code for solving for a dynamic problem using an implicit
Newmark scheme. Program continues in Figure 2.8.

Python code
# Time stepping parameters
beta, gamma = 0.25, 0.5
dt = 0.1
t, T = 0.0, 20*dt

# Fields from previous time step (displacement, velocity, acceleration)
u0, v0, a0 = Function(V), Function(V), Function(V)
h = Traction(T/4.0)

# Velocity and acceleration at t_(n+1)
v1 = (gamma/(beta*dt))*(u1 - u0) - (gamma/beta - 1.0)*v0 \
    - dt*(gamma/(2.0*beta) - 1.0)*a0
a1 = (1.0/(beta*dt**2))*(u1 - u0 - dt*v0) - (1.0/(2.0*beta) - 1.0)*a0

# Stress tensor
def sigma(u, v):
    return 2.0*mu*sym(grad(u)) + (lmbda*tr(grad(u)) +
                                  eta*tr(grad(v)))*Identity(u.cell().d)

# Governing equation
F = (rho*dot(a1, w) + inner(sigma(u1, v1), sym(grad(w))))*dx - dot(h, w)*ds

# Extract bilinear and linear forms
a, L = lhs(F), rhs(F)

# Set up boundary condition at left end
zero = Constant((0.0, 0.0))
def left(x):
    return x[0] < DOLFIN_EPS
bc = DirichletBC(V, zero, left)

# Set up PDE, advance in time and solve
u = Function(V)
problem = LinearVariationalProblem(a, L, u, bcs=bc)
solver = LinearVariationalSolver(problem)

# Save solution in VTK format
file = File("displacement.pvd")
while t <= T:
    t += dt
    h.t = t
    solver.solve()
    update(u, u0, v0, a0, beta, gamma, dt)
    file << u

Figure 2.8: Continuation from Figure 2.7 of DOLFIN code extract for solving for a
dynamic problem.
3 Representations and optimisations of
finite element variational forms

The previous chapter demonstrated that solvers for various solid mechanics prob-
lems can be implemented with relatively little effort using an automated modelling
approach which relies on the abstractions offered by UFL and the ability of FFC to
generate C++ code from the UFL input. For the approach to be competitive with
hand written code, it is important that the run-time performance of the correspond-
ing low-level code generated from the UFL representation is comparable to that
of hand written code. To this end, FFC implements two different types of repre-
sentations of finite element tensors, the so-called tensor contraction representation
and the classical quadrature-loop representation, including optimisations of both
representations.
The development of different strategies for representing and optimising
finite element variational forms has been motivated by the desire to apply the
automated modelling approach to problems of increasing complexity. The first
representation available in FFC was the tensor contraction. However, this repre-
sentation is not effective for problems like plasticity in Section 2.4.2. This led to
the development of a representation based on quadrature which included the opti-
misations described in Sections 3.3.1 and 3.3.2. With the availability of automatic
differentiation in UFL, problems like hyperelasticity could easily be implemented
in the automated framework (Sections 2.2.3 and 2.4.3). For these types of
problems, further optimisations of the quadrature representation were necessary
for efficient computation. These optimisations are important as FFC will automatically
select the quadrature representation for moderately complex and highly complex
problems if the representation is not set by the user. The automatic selection of
representation is discussed in Section 3.6. Many FEniCS users will, therefore, be
using the quadrature representation and optimisations unknowingly, particularly if
they work through the Python interface of DOLFIN.
This chapter presents the developments in FFC in terms of representations and
optimisations for finite element variational forms and is primarily based on the
work in Ølgaard and Wells (2009, 2010, 2012b) with the main difference being that

code examples and results have been updated to be compliant with FEniCS version
1.0. The developments have been applied by researchers and application developers
to various problems such as multiphase flow through porous media (Wells et al.,
2008), free surface flows (Labeur and Wells, 2009), the Navier–Stokes equations
(Mortensen et al., 2011; Labeur and Wells, 2012; Jansson et al., 2011; Selim et al.,
2012), fluid structure interaction (Selim, 2012; Hoffman et al., 2013), shape memory
alloys (Grandi et al., 2012), electromagnetics (Marchand and Davidson, 2011; Lezar
and Davidson, 2012), magnetic fluid hyperthermia for cancer therapy (Miaskowski
et al., 2012), oscillatory hydraulic tomography (Saibaba et al., 2012), the Föppl–Von
Kármán shell model (Vidoli, 2013), nonlinear elliptic problems (Lakkis and Pryer,
2011), microstructural processes (Maraldi et al., 2011, 2012), mantle convection
simulations (Vynnytska et al., 2013, 2012), glacier ice motion (Riesen et al., 2010;
Riesen, 2011), PDE-constrained optimisation and optimal control (Brandenburg
et al., 2012; Funke and Farrell, 2013; Rosseel and Wells, 2012; Clason and Kunisch,
2012; Rognes and Logg, 2012), Nitsche’s method for overlapping meshes (Massing
et al., 2012b,a, 2013), automated modelling of evolving discontinuities (Nikbakht
and Wells, 2009; Nikbakht, 2012), liquid crystal elastomers (Luo and Calderer, 2012),
and crack propagation in elastomers (Horst et al., 2013).

3.1 Motivation and approach

The tensor contraction representation of element tensors (Kirby and Logg, 2006;
Ølgaard et al., 2008a) is based on the multiplicative decomposition of an element
tensor into two tensors; one of which depends only on the differential equation
and the chosen finite element bases and can be computed prior to run-time. It
has been shown for classes of problems that the tensor contraction representation
is more efficient than the traditional quadrature approach, and the speed-ups
can be dramatic (Kirby and Logg, 2006; Ølgaard and Wells, 2010). Furthermore,
strategies which analyse the structure of the tensor contraction representation can
yield improved performance (Kirby et al., 2005, 2006). However, in contrast to
the quadrature-loop approach, the tensor contraction representation is somewhat
specialised as it cannot be extended trivially to non-affine isoparametric mappings
while maintaining efficiency, and it is not effective for classes of nonlinear problems
which require the integration of functions that do not come from a finite element
space (Ølgaard et al., 2008b). The attractive feature of the approach is the run-time
performance for classes of problems.
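The multiplicative decomposition can be sketched for the simplest case, the Laplace operator with linear Lagrange elements on an affinely mapped triangle. The sketch below is an illustration in plain Python and not code generated by FFC; the function names are hypothetical. The reference tensor A0 depends only on the form and the reference basis and is computed once, while the geometry tensor G depends only on the affine map of each cell. The element tensor is then their contraction.

```python
# Gradients of the linear basis functions on the reference triangle
# with vertices (0, 0), (1, 0) and (0, 1), and the reference cell area.
REF_GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
REF_AREA = 0.5


def reference_tensor():
    """A0[i][j][a][b]: integral over the reference cell of
    dPhi_i/dX_a * dPhi_j/dX_b (the gradients are constant)."""
    return [[[[REF_AREA * gi[a] * gj[b] for b in range(2)]
              for a in range(2)]
             for gj in REF_GRADS]
            for gi in REF_GRADS]


def geometry_tensor(v0, v1, v2):
    """G[a][b] = |det J| * sum_c dX_a/dx_c * dX_b/dx_c for the affine map
    x = v0 + J X of the reference triangle onto the cell (v0, v1, v2)."""
    J = [[v1[0] - v0[0], v2[0] - v0[0]],
         [v1[1] - v0[1], v2[1] - v0[1]]]
    det = J[0][0]*J[1][1] - J[0][1]*J[1][0]
    Jinv = [[J[1][1]/det, -J[0][1]/det],
            [-J[1][0]/det, J[0][0]/det]]
    return [[abs(det) * sum(Jinv[a][c]*Jinv[b][c] for c in range(2))
             for b in range(2)]
            for a in range(2)]


def element_tensor(A0, G):
    """Contract the reference tensor with the geometry tensor."""
    return [[sum(A0[i][j][a][b]*G[a][b]
                 for a in range(2) for b in range(2))
             for j in range(3)]
            for i in range(3)]


A0 = reference_tensor()  # computed once, prior to the loop over cells
A = element_tensor(A0, geometry_tensor((0.0, 0.0), (2.0, 0.0), (0.0, 2.0)))
for row in A:
    print(row)
```

In this sketch the run-time work per cell is limited to forming a 2x2 geometry tensor and a short contraction, with no loop over quadrature points, which is the source of the speed-ups reported for simple forms.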
A general experience is that the tensor contraction approach does not scale well
when forms become more complicated. This is manifest in three ways: the time
required to generate low-level code for a variational form becomes prohibitive or
may fail due to memory limitations or limitations of underlying libraries¹; the size
of the generated code is such that the compilation of the generated low-level code is
prohibitively slow and file size limitations of compilers acting on the low-level code
may be exceeded; and the run-time performance deteriorates rapidly relative to a
quadrature approach. Complicated forms are by no means exotic. Many common
nonlinear equations, when linearised, result in forms which involve numerous
function products. Factors that determine the complexity of a form are the number
of coefficient functions, the number of derivatives and the polynomial orders of
the finite element basis functions. Approaches to reduce the time required for the
code generation phase when using the tensor contraction representation have been
developed and implemented in FFC (Kirby and Logg, 2007), although these cannot
mitigate the inherently expensive nature of the approach for complicated forms.
Using a quadrature representation for more complicated forms mitigates the
problems regarding the time required to generate the code and the file size of the
generated code. However, a naive implementation of the quadrature representation
can have a serious impact on the run-time performance of the generated code.
Fortunately, the automated generation of computer code provides scope for various
optimisations to be applied such that optimal or near-optimal run-time performance
is maintained also for complex forms. The optimisations that have been developed
in this work are discussed in Section 3.3, see also Ølgaard and Wells (2010, 2012b).
To demonstrate the issues pertinent to automated code generation for complicated
forms, this chapter presents the tensor contraction representation and the
quadrature representation, and discusses four optimisation strategies that improve
the run-time performance of the code generated for the latter. Adopting the approach
in Ølgaard and Wells (2010), the two representations are then compared to each
other by considering

1. The run-time performance of the generated code;

2. The size of the generated code; and

3. The speed of the code generation phase.

The relative importance of these points may well shift during a development
cycle. During initial development, it is likely that the speed of the code generation
phase and the size of the generated code are most important, whereas at the end
of the development cycle run-time performance is likely to be the most crucial
consideration. However, there is typically a correlation between the three points.
After comparing the two representations, the four optimisations for the quadrature
representation are compared to each other in terms of run-time performance.

¹ For instance, the implementation of the tensor contraction representation in FFC relies on the
Python module NumPy (http://www.numpy.org/) for computations involving n-dimensional arrays.
The maximum dimension which is allowed is version specific, but for NumPy version 1.6.2 $n_{\max} = 32$.
60 Chapter 3. Representations and optimisations of finite element variational forms

It should be noted that the presented representations and optimisation techniques
can be implemented with conventional ‘hand’ coding. Automation,
however, makes the approach generic and allows these strategies, which are simple
but tedious to implement by hand, to be applied to an unlimited range of problems.
Automated code generation is most appealing when considering complicated variational
forms for which a developer could not reasonably be expected to program the
strategies by hand.

3.2 Representation of finite element tensors

The bilinear form for the weighted Laplace operator −∇ · (w∇u), where u is
unknown and w is a prescribed coefficient is chosen as a canonical example to
illustrate the two different representations and the optimisations implemented in
FFC. The bilinear form for this operator reads
\[
a(u, v) := \int_{\Omega} w \nabla u \cdot \nabla v \, \mathrm{d}x. \tag{3.1}
\]

The quadrature approach can deal with cases in which not all functions come
from a finite element space including nonlinear functions like ln, exp, sin etc.,
using ‘quadrature functions’ (see Section 2.3.2) that can be evaluated directly
at quadrature points. The tensor representation approach only supports cases
in which all functions come from a finite element space (using interpolation if
necessary). Therefore, to ensure a proper performance comparison between the
representations, it is assumed in this chapter that all functions in a form, including
coefficient functions, come from a finite element function space. In the case of (3.1),
all functions will come from
\[
V_h := \left\{ v \in H^1(\Omega) : v|_T \in P_q(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{3.2}
\]

where $P_q(T)$ denotes the space of Lagrange polynomials of degree $q$ on the element
$T$ of the standard triangulation of $\Omega$, which is denoted by $\mathcal{T}_h$. Letting $\{\phi_i^T\}$ denote
the local finite element basis that spans the discrete function space $V_h$ on $T$, the
local element tensor for an element $T$ can be computed as

\[
A_{T,i} = \int_T w \nabla \phi_{i_1}^T \cdot \nabla \phi_{i_2}^T \, \mathrm{d}x, \tag{3.3}
\]

where $i = (i_1, i_2)$ is a multi-index. The UFL input for (3.1) is shown in Figure 3.1
for continuous piecewise linear functions on triangles as a basis for all functions in
the form.

UFL code
element = FiniteElement("Lagrange", triangle, 1)

u = TrialFunction(element)
v = TestFunction(element)
w = Coefficient(element)

a = w*inner(grad(u), grad(v))*dx

Figure 3.1: UFL input for the weighted Laplacian form on linear triangular elements.

3.2.1 Quadrature representation

FFC generates an intermediate representation of the UFL input in Figure 3.1 as
explained in Section 1.3.3. Assuming a standard affine mapping $F_T : T_0 \to T$ from
a reference element $T_0$ to a given element $T \in \mathcal{T}_h$, this intermediate representation
reads

\[
A_{T,i} = \sum_{q=1}^{N} \sum_{\alpha_3=1}^{n} \Phi_{\alpha_3}(X^q) w_{\alpha_3}
\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}
\sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T' \, W^q, \tag{3.4}
\]

where a change of variables from the reference coordinates $X$ to the real coordinates
$x = F_T(X)$ has been used. In the above equation, $N$ denotes the number of
integration points, $d$ is the dimension of $\Omega$, $n$ is the number of degrees of freedom
for the local basis of $w$, $\Phi_i$ denotes basis functions on the reference element, $\det F_T'$
is the determinant of the Jacobian, and $W^q$ is the quadrature weight at integration
point $X^q$. By default, FFC applies a quadrature scheme that will integrate the
variational form exactly.
From the intermediate representation in (3.4), code for computing entries of
the local element tensor is generated. This code is shown in Figure 3.2. Code
generated for the quadrature representation is structured in the following way.
First, values of geometric quantities that depend on the current element $T$, like
the components of the inverse of the Jacobian matrix, $\partial X_{\alpha_1}/\partial x_\beta$ and $\partial X_{\alpha_2}/\partial x_\beta$,
are computed and assigned to variables like K_01 in the code (this code is
not shown as it is not important for understanding the nature of the quadrature
representation). Next, values of basis functions and their derivatives at integration
points on the reference element, like $\Phi_{\alpha_3}(X^q)$ and $\partial \Phi_{i_1}(X^q)/\partial X_{\alpha_1}$, are tabulated.
Finite element basis functions are computed by FIAT. Basis functions and their
derivatives on a reference element are independent of the current element $T$ and
are, therefore, tabulated at compile-time and stored in the tables Psi_w, Psi_vu_D01
and Psi_vu_D10 in Figure 3.2. After the tabulation of basis function values, the
loop over integration points begins. In the example, linear elements are considered,
and only one integration point is necessary for exact integration. The loop over
integration points has therefore been omitted. The first task inside a loop over
integration points is to compute the values of coefficients at the current integration
point. For the considered problem, this involves computing the value of the
coefficient w. The code for evaluating F0 in Figure 3.2 is an exact translation of the
representation $\sum_{\alpha_3=1}^{n} \Phi_{\alpha_3}(X^q) w_{\alpha_3}$. The last part of the code in Figure 3.2 is the loop
over the basis function indices $i_1$ and $i_2$, where the contribution to each entry in
the local element tensor, $A_T$, from the current integration point is added. The code
presented in Figure 3.2 is the default output of the quadrature representation and
is not optimised for run-time performance. Optimisation strategies are discussed
in Section 3.3. To generate code using the quadrature representation the FFC
command-line option -r quadrature should be used.
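
The structure of the quadrature representation can also be sketched in a few lines of Python. The following is only an illustration of (3.4) for linear triangles, not FFC output; the function name `quadrature_element_tensor` and the argument layout (`K[a][b]` holding $\partial X_a/\partial x_b$) are assumptions made for this sketch.

```python
# Illustrative sketch of the quadrature representation (3.4) for the weighted
# Laplacian on a linear triangle. Names and data layout are hypothetical.

def quadrature_element_tensor(K, det, w):
    """Return the 3x3 element tensor.

    K   : 2x2 inverse Jacobian, K[a][b] = dX_a/dx_b
    det : determinant of the Jacobian
    w   : three nodal values of the coefficient w
    """
    weights = [0.5]                      # one-point rule on the reference triangle
    psi = [[1.0 / 3.0] * 3]              # basis function values at the point
    dpsi = [
        [[-1.0, 1.0, 0.0]],              # dPhi_i/dX_0 at each integration point
        [[-1.0, 0.0, 1.0]],              # dPhi_i/dX_1 at each integration point
    ]
    A = [[0.0] * 3 for _ in range(3)]
    for q, Wq in enumerate(weights):
        # Coefficient value at the integration point (F0 in Figure 3.2).
        F0 = sum(psi[q][r] * w[r] for r in range(3))
        for i in range(3):
            for j in range(3):
                s = 0.0
                for b in range(2):
                    # Physical derivatives via the chain rule.
                    gi = sum(K[a][b] * dpsi[a][q][i] for a in range(2))
                    gj = sum(K[a][b] * dpsi[a][q][j] for a in range(2))
                    s += gi * gj
                A[i][j] += s * F0 * Wq * det
    return A


# Reference triangle (identity mapping) with w = 1 everywhere.
A_ref = quadrature_element_tensor([[1.0, 0.0], [0.0, 1.0]], 1.0, [1.0, 1.0, 1.0])
```

For the reference triangle with unit coefficient, this reproduces the familiar stiffness matrix of the Laplacian on linear elements, whose rows sum to zero.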

3.2.2 Tensor contraction representation


An alternative to the run-time quadrature approach presented in the previous
section is the tensor contraction representation, which is reviewed here by fol-
lowing the work of Kirby and Logg (2006). Taking equation (3.4) as the point
of departure, the tensor contraction representation of the element matrix for the
weighted Laplacian is expressed as

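\[
A_{T,i} = \sum_{\alpha_1=1}^{d} \sum_{\alpha_2=1}^{d} \sum_{\alpha_3=1}^{n} \det F_T' \, w_{\alpha_3}
\sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta}
\int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \, \mathrm{d}X. \tag{3.5}
\]

Noteworthy is that the integral appearing in equation (3.5) is independent of the cell
geometry and can, therefore, be evaluated prior to run-time. The remaining terms,
with the exception of $w_{\alpha_3}$, depend only on the geometry of the cell. Exploiting this
observation, the element tensor $A_{T,i}$ can then be expressed as a tensor contraction,

\[
A_{T,i} = \sum_{\alpha} A^0_{i\alpha} G_T^{\alpha}, \tag{3.6}
\]

where the tensors $A^0_{i\alpha}$ (the reference tensor) and $G_T^{\alpha}$ (the geometry tensor) are defined
as

\[
A^0_{i\alpha} = \int_{T_0} \Phi_{\alpha_3} \frac{\partial \Phi_{i_1}}{\partial X_{\alpha_1}} \frac{\partial \Phi_{i_2}}{\partial X_{\alpha_2}} \, \mathrm{d}X, \tag{3.7}
\]

\[
G_T^{\alpha} = \det F_T' \, w_{\alpha_3} \sum_{\beta=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial X_{\alpha_2}}{\partial x_\beta}. \tag{3.8}
\]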

C++ code
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const ufc::cell& c) const
{
  ...
  // Quadrature weight.
  static const double W1 = 0.5;

  // Tabulated basis functions at quadrature points.
  static const double Psi_w[1][3] = \
    {{0.33333333333333, 0.33333333333333, 0.33333333333333}};
  static const double Psi_vu_D01[1][3] = \
    {{-1.0, 0.0, 1.0}};
  static const double Psi_vu_D10[1][3] = \
    {{-1.0, 1.0, 0.0}};

  // Compute coefficient value.
  double F0 = 0.0;
  for (unsigned int r = 0; r < 3; r++)
    F0 += Psi_w[0][r]*w[0][r];

  // Loop basis functions.
  for (unsigned int j = 0; j < 3; j++)
  {
    for (unsigned int k = 0; k < 3; k++)
    {
      A[j*3 + k] +=
        ((K_00*Psi_vu_D10[0][j] + K_10*Psi_vu_D01[0][j])*
         (K_00*Psi_vu_D10[0][k] + K_10*Psi_vu_D01[0][k]) +
         (K_01*Psi_vu_D10[0][j] + K_11*Psi_vu_D01[0][j])*
         (K_01*Psi_vu_D10[0][k] + K_11*Psi_vu_D01[0][k])
        )*F0*W1*det;
    }
  }
}

Figure 3.2: Part of the generated code for quadrature representation of the bilinear
form associated with the weighted Laplacian using linear elements in two dimen-
sions. The variables like K_00 are components of the inverse of the Jacobian matrix
and det is the determinant of the Jacobian. The code to compute these variables
is not shown. A holds the values of the local element tensor and w contains nodal
values of the weighting function w.

During assembly, one may then iterate over all elements of the triangulation
and on each element $T$ compute the geometry tensor $G_T^{\alpha}$, compute the tensor
contraction (3.6) and then add the resulting element tensor $A_{T,i}$ to the global sparse
matrix $A$. A generalisation of the approach to general multilinear variational forms
is presented in Kirby and Logg (2007).
The code which FFC will generate from the representation in (3.6) is shown in
Figure 3.3. As was the case with the quadrature representation, values of geometric
quantities that depend on the current element T are computed first and assigned
to the variables like K_01 in the code (again, this code is not shown as it is not
important for understanding the nature of the tensor contraction representation).
Based on these values, the geometry tensor (3.8) is computed and the contraction
in (3.6) is performed using the reference tensor from (3.7) which is precomputed
during the code generation stage (the literal constants 0.166667). Notice that the
contraction to compute entries in $A_{T,i}$ is unrolled which allows any zero-valued
entry of the reference tensor to be detected during the code generation stage and the
corresponding code can, therefore, be omitted. For a certain class of simple forms
this can lead to a tremendous speed-up when evaluating the element matrices
relative to a quadrature approach (Kirby and Logg, 2006).
Inevitably, the tensor contraction approach, due to unrolling the contraction,
leads to code which is much less compact compared to the quadrature represen-
tation (see Figure 3.2). Furthermore, as the number of functions and derivatives
present in the variational form increases, the rank of both the reference tensor
and the geometry tensor increases, thereby increasing the complexity of the ten-
sor contraction. Thus, for complicated forms the size of the generated code may
cause problems for the compilers acting on the generated low-level code, and the
complexity of the tensor contraction may exceed that of the quadrature representa-
tion leading to poor run-time performance. This influence of the complexity on
the performance is investigated in Section 3.4. To generate code using the tensor
contraction representation the FFC command-line option -r tensor should be
used.
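
The split into a precomputed reference tensor and a run-time geometry tensor can be sketched as follows. This is a hedged illustration of (3.6)-(3.8) for linear triangles only, not the code FFC generates; the names `A0`, `geometry_tensor` and `contract`, and the index ordering, are assumptions made for the sketch. For linear elements the derivatives are constant and each basis function integrates to 1/6 over the reference triangle, which is where the constant 0.166667 in Figure 3.3 comes from.

```python
# Illustrative sketch of the tensor contraction representation (3.6)-(3.8)
# for the weighted Laplacian on a linear triangle. Names are hypothetical.

# D[a][i] = dPhi_i/dX_a on the reference triangle (constant for linear elements).
D = [[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]

# Reference tensor (3.7), indexed A0[i1][i2][a1][a2][a3]. It does not depend on
# the cell and is therefore computed once, before any element is visited.
A0 = [[[[[D[a1][i1] * D[a2][i2] / 6.0 for a3 in range(3)]
         for a2 in range(2)]
        for a1 in range(2)]
       for i2 in range(3)]
      for i1 in range(3)]


def geometry_tensor(K, det, w):
    """Geometry tensor (3.8), indexed G[a1][a2][a3]; K[a][b] = dX_a/dx_b."""
    return [[[det * w[a3] * sum(K[a1][b] * K[a2][b] for b in range(2))
              for a3 in range(3)]
             for a2 in range(2)]
            for a1 in range(2)]


def contract(A0, G):
    """Tensor contraction (3.6) over the multi-index (a1, a2, a3)."""
    return [[sum(A0[i1][i2][a1][a2][a3] * G[a1][a2][a3]
                 for a1 in range(2) for a2 in range(2) for a3 in range(3))
             for i2 in range(3)]
            for i1 in range(3)]


# Triangle with vertices (0,0), (2,0), (0,1): Jacobian diag(2, 1), det = 2.
A_T = contract(A0, geometry_tensor([[0.5, 0.0], [0.0, 1.0]], 2.0, [1.0, 2.0, 3.0]))
```

Unlike the generated code in Figure 3.3, this sketch does not unroll the contraction, so it does not skip zero-valued entries of the reference tensor.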

3.3 Quadrature optimisations

The automated generation of code provides scope for employing optimisations
which may not be practically feasible in hand-generated code. An example of such
an approach which is pertinent to the tensor contraction representation involves the
analysis of the reference tensor, $A^0_{i\alpha}$, in order to find so-called complexity-reducing
relations between subtensors which will minimise the number of floating point
operations required to compute the element tensor (Kirby et al., 2005, 2006; Kirby
and Logg, 2008). For simple problems, this can lead to a significant reduction in the
number of operations required to compute the local element tensor, $A_{T,i}$. However,

C++ code
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const ufc::cell& c) const
{
  ...
  // Compute geometry tensor
  const double G0_0_0_0 = det*(w[0][0]*((K_00*K_00 + K_01*K_01)));
  const double G0_0_0_1 = det*(w[0][0]*((K_00*K_10 + K_01*K_11)));
  const double G0_0_1_0 = det*(w[0][0]*((K_10*K_00 + K_11*K_01)));
  const double G0_0_1_1 = det*(w[0][0]*((K_10*K_10 + K_11*K_11)));
  const double G0_1_0_0 = det*(w[0][1]*((K_00*K_00 + K_01*K_01)));
  const double G0_1_0_1 = det*(w[0][1]*((K_00*K_10 + K_01*K_11)));
  const double G0_1_1_0 = det*(w[0][1]*((K_10*K_00 + K_11*K_01)));
  const double G0_1_1_1 = det*(w[0][1]*((K_10*K_10 + K_11*K_11)));
  const double G0_2_0_0 = det*(w[0][2]*((K_00*K_00 + K_01*K_01)));
  const double G0_2_0_1 = det*(w[0][2]*((K_00*K_10 + K_01*K_11)));
  const double G0_2_1_0 = det*(w[0][2]*((K_10*K_00 + K_11*K_01)));
  const double G0_2_1_1 = det*(w[0][2]*((K_10*K_10 + K_11*K_11)));

  // Compute element tensor
  A[0] = 0.166667*G0_0_0_0 + 0.166667*G0_0_0_1 + 0.166667*G0_0_1_0 +
         0.166667*G0_0_1_1 + 0.166667*G0_1_0_0 + 0.166667*G0_1_0_1 +
         0.166667*G0_1_1_0 + 0.166667*G0_1_1_1 + 0.166667*G0_2_0_0 +
         0.166667*G0_2_0_1 + 0.166667*G0_2_1_0 + 0.166667*G0_2_1_1;
  A[1] = -0.166667*G0_0_0_0 - 0.166667*G0_0_1_0 - 0.166667*G0_1_0_0 -
         0.166667*G0_1_1_0 - 0.166667*G0_2_0_0 - 0.166667*G0_2_1_0;
  A[2] = -0.166667*G0_0_0_1 - 0.166667*G0_0_1_1 - 0.166667*G0_1_0_1 -
         0.166667*G0_1_1_1 - 0.166667*G0_2_0_1 - 0.166667*G0_2_1_1;
  A[3] = -0.166667*G0_0_0_0 - 0.166667*G0_0_0_1 - 0.166667*G0_1_0_0 -
         0.166667*G0_1_0_1 - 0.166667*G0_2_0_0 - 0.166667*G0_2_0_1;
  A[4] = 0.166667*G0_0_0_0 + 0.166667*G0_1_0_0 + 0.166667*G0_2_0_0;
  A[5] = 0.166667*G0_0_0_1 + 0.166667*G0_1_0_1 + 0.166667*G0_2_0_1;
  A[6] = -0.166667*G0_0_1_0 - 0.166667*G0_0_1_1 - 0.166667*G0_1_1_0 -
         0.166667*G0_1_1_1 - 0.166667*G0_2_1_0 - 0.166667*G0_2_1_1;
  A[7] = 0.166667*G0_0_1_0 + 0.166667*G0_1_1_0 + 0.166667*G0_2_1_0;
  A[8] = 0.166667*G0_0_1_1 + 0.166667*G0_1_1_1 + 0.166667*G0_2_1_1;
}

Figure 3.3: Part of the generated code for tensor contraction representation of the
bilinear form associated with the weighted Laplacian using linear elements in two
dimensions. The variables like K_00 are components of the inverse of the Jacobian
matrix and det is the determinant of the Jacobian. The code to compute these
variables is not shown. A holds the values of the local element tensor and w contains
nodal values of the weighting function w. Due to space considerations the number
of digits of the literal constant 0.166667 has been reduced from fifteen to six.

when dealing with complicated, or even moderately complicated, variational
formulations, it is a common experience that one is not generally well rewarded for
sophisticated optimisation strategies. Such strategies may not scale well in
terms of the computer time required to perform the optimisations for moderately
complex variational forms, and may prove prohibitive in terms of time and memory.
Experience indicates that simple optimisations, some of which are described in this
section, offer the greatest rewards, even to the extent that the cost of evaluating
element tensors becomes negligible relative to other aspects of a computation, such
as insertion of entries into a sparse matrix.
This section discusses four automated a priori optimisation strategies, eliminate
operations on zeros, simplify expressions, precompute integration point constants and
precompute basis constants, that have been developed for the quadrature representa-
tion from Section 3.2.1 for improved run-time performance of the generated code.
The underlying philosophy of the optimisation strategies, which are implemented
in FFC, is to manipulate the representation in such a way that the number of
operations to compute the local element tensor decreases. Each strategy described
in the following sections, with the exception of eliminate operations on zeros, shares
some features which can be categorised as:

Loop invariant code motion: This procedure seeks to identify terms that are
    independent of one or more of the summation indices and to move them outside
    the loop over those particular indices. For instance, in (3.4) the terms
    regarding the coefficient $w$, the quadrature weight $W^q$ and the determinant $\det F_T'$
    are all independent of the basis function indices $i_1$ and $i_2$ and therefore only
    need to be computed once for each integration point. A generic discussion of
    this technique, which is also known as ‘loop hoisting’, can be found in Alfred
    et al. (1986).

Reuse common terms: Terms that appear multiple times in an expression can be
    identified, computed once, stored as temporary values and then reused
    in all occurrences in the expression. This can have a great impact on the
    operation count since the expression to compute an entry in $A_T$ is located
    inside loops over the basis function indices as shown in the code for the
    standard quadrature representation in Figure 3.2.
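
As a minimal sketch of loop invariant code motion, consider a simplified double loop in which each entry is a product of two basis values and three factors that do not depend on the loop indices. The functions `naive` and `hoisted` and the explicit multiplication counters are hypothetical and serve only to make the saving visible; they are not taken from FFC.

```python
# Illustrative sketch of loop invariant code motion on a simplified element loop.

def naive(b, F0, W, det):
    """Every factor is multiplied inside the innermost loop."""
    n = len(b)
    A = [[0.0] * n for _ in range(n)]
    mults = 0
    for j in range(n):
        for k in range(n):
            A[j][k] += b[j] * b[k] * F0 * W * det
            mults += 4
    return A, mults


def hoisted(b, F0, W, det):
    """Terms are moved out of every loop whose index they do not depend on."""
    n = len(b)
    A = [[0.0] * n for _ in range(n)]
    c = F0 * W * det            # independent of j and k
    mults = 2
    for j in range(n):
        bj_c = b[j] * c         # independent of k
        mults += 1
        for k in range(n):
            A[j][k] += bj_c * b[k]
            mults += 1
    return A, mults


A_naive, mults_naive = naive([-1.0, 1.0, 0.0], 2.0, 0.5, 3.0)
A_hoisted, mults_hoisted = hoisted([-1.0, 1.0, 0.0], 2.0, 0.5, 3.0)
```

For three basis functions the multiplication count drops from 36 to 14 while the computed entries agree up to rounding.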

The optimisations described in this section take place after the representation
stage of the code generation process (see Figure 1.3 on page 13) where any given
form is represented as simple loop and algebra instructions. Therefore, the opti-
misations are general and apply to all forms and elements that can be handled by
FFC. While the above optimisations are straightforward for simple forms and ele-
ments, their implementation using conventional programming approaches requires
manual inspection of the form and the basis. This is often done in specialised
codes, but the extension to non-trivial forms is difficult, time consuming and error
prone. Furthermore, the optimised code may bear little relation to the mathematical
problem at hand. This makes maintenance and re-use of the hand-generated code
problematic.
To switch on optimisation the command-line option -O should be used in
addition to any of the FFC optimisation options presented in the following sections.

3.3.1 Eliminate operations on zeros


Some basis functions, in particular those concerning mixed elements, and deriva-
tives of basis functions may be zero-valued at all integration points for a particular
problem. Since these values are tabulated at compile-time, the columns containing
nonzero values can be identified. This enables a reduction in the loop dimension
for indices concerning these tables, a process which is comparable to dead-code
elimination in compiler jargon. However, a consequence of reducing the tables
is that a mapping of indices must be created in order to access values correctly.
The mapping results in memory not being accessed contiguously at run-time and
can lead to a decrease in run-time performance. In some cases the elimination
of operations on zero terms is similar to the strategy that the tensor contraction
representation applies when unrolling the code as shown in Figure 3.3. The major
difference is that the quadrature representation can only eliminate contributions
that are zero for all quadrature points, whereas the tensor contraction representation
can eliminate all zero-valued contributions. The unrolled tensor contraction
code is, however, longer, which introduces some drawbacks, such as increased C++
compile-time as discussed previously.
To generate code with this optimisation, the FFC command-line option -f
eliminate_zeros should be used. Code for the weighted Laplace equation gener-
ated with this option is shown in Figure 3.4. For brevity, only code different from
the standard quadrature code in Figure 3.2 has been included.
As seen in Figure 3.4, the loop dimension for the loops involving the indices
j and k has decreased from three to two due to the elimination of zeros when
compared to the standard quadrature code in Figure 3.2. However, the total
number of operations has increased. The reason is that the mapping causes four
entries to be computed at the same time inside the loop, and the code to compute
each entry has not been reduced significantly if compared to the code in Figure 3.2.
In fact, using this optimisation strategy by itself is usually not recommended, but
in combination with the strategies outlined in the following sections it can improve
run-time performance significantly. This effect is particularly pronounced when
forms contain mixed elements in which many of the values in the basis function
tables are zero. Another reason for being careful when applying this strategy is
that it might prevent FFC compilation due to hardware limitations because the
increase in the number of entries, which is computed inside the loop, will require
more memory during the compilation.
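
The identification of nonzero columns can be sketched as below. The helper `eliminate_zeros` and its tolerance argument are hypothetical and only mimic the idea described above; how FFC detects zeros internally is not shown in this chapter. The tables are those of Figure 3.2.

```python
# Illustrative sketch: find the columns of a tabulated basis table that are
# nonzero at some integration point, and build the reduced table plus the
# index map needed to address the element tensor.

def eliminate_zeros(table, tol=1e-14):
    """table[q][i] holds the value of basis function i at integration point q."""
    ncols = len(table[0])
    nonzero_columns = [c for c in range(ncols)
                       if any(abs(row[c]) > tol for row in table)]
    reduced = [[row[c] for c in nonzero_columns] for row in table]
    return reduced, nonzero_columns


Psi_vu_D01 = [[-1.0, 0.0, 1.0]]
Psi_vu_D10 = [[-1.0, 1.0, 0.0]]

reduced_D01, nzc0 = eliminate_zeros(Psi_vu_D01)
reduced_D10, nzc1 = eliminate_zeros(Psi_vu_D10)

# Both reduced tables hold the same values, so a single table can be shared.
tables_can_be_shared = reduced_D01 == reduced_D10
```

The resulting index arrays match `nzc0` and `nzc1` in Figure 3.4, and the two reduced tables coincide, which is why a single table `Psi_vu` suffices there.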

C++ code
// Tabulated basis functions.
static const double Psi_vu[1][2] = {{-1.0, 1.0}};

// Arrays of nonzero columns.
static const unsigned int nzc0[2] = {0, 2};
static const unsigned int nzc1[2] = {0, 1};

// Loop basis functions.
for (unsigned int j = 0; j < 2; j++)
{
  for (unsigned int k = 0; k < 2; k++)
  {
    A[nzc0[j]*3 + nzc0[k]] +=
      (K_10*Psi_vu[0][j]*K_10*Psi_vu[0][k] +
       K_11*Psi_vu[0][j]*K_11*Psi_vu[0][k])*F0*W1*det;
    A[nzc0[j]*3 + nzc1[k]] +=
      (K_11*Psi_vu[0][j]*K_01*Psi_vu[0][k] +
       K_10*Psi_vu[0][j]*K_00*Psi_vu[0][k])*F0*W1*det;
    A[nzc1[j]*3 + nzc0[k]] +=
      (K_00*Psi_vu[0][j]*K_10*Psi_vu[0][k] +
       K_01*Psi_vu[0][j]*K_11*Psi_vu[0][k])*F0*W1*det;
    A[nzc1[j]*3 + nzc1[k]] +=
      (K_01*Psi_vu[0][j]*K_01*Psi_vu[0][k] +
       K_00*Psi_vu[0][j]*K_00*Psi_vu[0][k])*F0*W1*det;
  }
}

Figure 3.4: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f eliminate_zeros. The
arrays nzc0 and nzc1 contain the nonzero column indices for the mapping of
values. Note how eliminating zeros makes it possible to replace the two tables with
derivatives of basis functions Psi_vu_D01 and Psi_vu_D10 from Figure 3.2 with one
table (Psi_vu).

3.3.2 Simplify expressions

The code expressions to evaluate an entry in the local element tensor can become
very complex. Since such expressions are typically located inside loops, a reduction
in complexity can reduce the total operation count significantly. The approach can
be illustrated by the expression x(y + z) + 2xy, which after expansion of the first
term and grouping of common terms reduces to x(y + z) + 2xy → xy + xz + 2xy →
3xy + xz. As x appears in both products in the sum, a reduction of one operation can
be achieved by moving x outside the parentheses, 3xy + xz → x(3y + z). By applying
these simplifications, the number of operations has been reduced from five to three,
which may seem trivial although it is, in fact, a reduction of 40%. The algorithm
developed and implemented in FFC to perform simplifications as described above
bears resemblance to the algorithm presented by Hosangadi et al. (2006) and later
extended and applied to optimised code generation for finite element assembly
by Russell and Kelly (2013). An additional benefit of this strategy is that the
expansion of expressions, which takes place before the simplification, will typically
allow more terms to be precomputed and hoisted outside loops, as explained in
the beginning of this section.
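
The operation counts quoted above can be checked with a small expression-tree sketch. The tuple encoding `(operator, left, right)` and the helpers `count_ops` and `evaluate` are assumptions made for this illustration; the sketch only verifies the two forms of the expression and does not implement the simplification algorithm itself.

```python
# Illustrative check of the operation counts for x(y + z) + 2xy and x(3y + z).
# Expressions are nested tuples (operator, left, right); leaves are variable
# names or numbers.

def count_ops(expr):
    """Number of binary operations in the expression tree."""
    if isinstance(expr, tuple):
        return 1 + count_ops(expr[1]) + count_ops(expr[2])
    return 0


def evaluate(expr, env):
    """Evaluate the tree with variable values taken from env."""
    if isinstance(expr, tuple):
        op, left, right = expr
        a, b = evaluate(left, env), evaluate(right, env)
        return a + b if op == "+" else a * b
    return env[expr] if isinstance(expr, str) else expr


original = ("+", ("*", "x", ("+", "y", "z")), ("*", ("*", 2, "x"), "y"))
simplified = ("*", "x", ("+", ("*", 3, "y"), "z"))

env = {"x": 2, "y": 3, "z": 5}
ops_original = count_ops(original)
ops_simplified = count_ops(simplified)
same_value = evaluate(original, env) == evaluate(simplified, env)
```

Both trees evaluate to the same value, with five operations before and three after simplification.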
The FFC command-line option -f simplify_expressions should be used to
generate code with this optimisation enabled. Code generated by this option for the
representation in (3.4) is presented in Figure 3.5, where again only code different
from that in Figure 3.2 has been included. The number of operations has decreased
compared to the code in Figure 3.2 for the standard quadrature representation. An
improvement in run-time performance can therefore be expected.
To understand how the optimisations lead to the code in Figure 3.5, consider
the terms
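\[
\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}
\sum_{\alpha_2=1}^{d} \frac{\partial X_{\alpha_2}}{\partial x_\beta} \frac{\partial \Phi_{i_2}(X^q)}{\partial X_{\alpha_2}} \det F_T' \, W^q, \tag{3.9}
\]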

in the representation (3.4) for the weighted Laplace equation. These terms are
transformed by FFC into an expression equivalent to the code

C++ code
((K_00*Psi_vu_D10[0][j] + K_10*Psi_vu_D01[0][j])*
(K_00*Psi_vu_D10[0][k] + K_10*Psi_vu_D01[0][k]) +
(K_01*Psi_vu_D10[0][j] + K_11*Psi_vu_D01[0][j])*
(K_01*Psi_vu_D10[0][k] + K_11*Psi_vu_D01[0][k])
)*W1*det;

which is, apart from a missing F0, identical to the standard quadrature code inside
the loops in Figure 3.2.
This expression is then expanded into a new expression, a sum of products,
equivalent to the code

C++ code
// Geometry constants.
double G[3];
G[0] = W1*det*(K_00*K_00 + K_01*K_01);
G[1] = W1*det*(K_00*K_10 + K_01*K_11);
G[2] = W1*det*(K_10*K_10 + K_11*K_11);

// Integration point constants.
double I[3];
I[0] = F0*G[0];
I[1] = F0*G[1];
I[2] = F0*G[2];

// Loop basis functions.
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] +=
      (Psi_vu_D10[0][j]*Psi_vu_D10[0][k]*I[0] +
       Psi_vu_D10[0][j]*Psi_vu_D01[0][k]*I[1] +
       Psi_vu_D01[0][j]*Psi_vu_D10[0][k]*I[1] +
       Psi_vu_D01[0][j]*Psi_vu_D01[0][k]*I[2]);
  }
}

Figure 3.5: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f simplify_expressions.

C++ code
K_00*K_00*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_00*K_10*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_00*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_10*K_10*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k] +
K_01*K_01*W1*det*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
K_01*K_11*W1*det*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
K_01*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
K_11*K_11*W1*det*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];

In the next step of the optimisation process, identical terms depending on
the loop indices j and k are identified and grouped such that the expression is
equivalent to

C++ code
(K_00*K_00*W1*det + K_01*K_01*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D10[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D10[0][j]*Psi_vu_D01[0][k] +
(K_00*K_10*W1*det + K_01*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D10[0][k] +
(K_10*K_10*W1*det + K_11*K_11*W1*det)*Psi_vu_D01[0][j]*Psi_vu_D01[0][k];

where the terms in parentheses only depend on geometry information. The terms
in parentheses can, therefore, be moved outside of the loops over the basis function
indices j and k and stored in the array G. During the process of generating values
for G, FFC will discover that two of the four parentheses are identical and thus only
three unique values in G are computed. The expressions to compute the values in
G have been simplified further by moving the variables det and W1, which appear
in both products, outside the parentheses as seen in Figure 3.5. The weighting
coefficient F0 (left out of the detailed explanation above) will generally depend on
the integration point. Therefore, each value in G is multiplied by F0 and the result
is stored in the array I, which contains values that are constant at a given
integration point, that is, independent of the basis function indices j and k.
The optimisation described above is the most expensive of the quadrature
optimisations to perform in terms of FFC code generation time and memory
consumption as it involves creating new terms when expanding the expressions.
The procedure does not scale well for complex expressions, but it is in many
cases the most effective approach in terms of reducing the number of operations.
This particular optimisation strategy, in combination with the elimination of zeros
outlined in the previous section, was the first to be implemented in FFC. It has
been investigated and compared to the tensor representation in Ølgaard and Wells
(2010).
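
That the expanded and regrouped form is equivalent to the standard quadrature code can be confirmed numerically. The sketch below transcribes the loop bodies of Figure 3.2 and Figure 3.5 into Python; the function names `standard` and `simplified` and the test geometry are arbitrary choices for this illustration.

```python
# Illustrative check that the simplified code of Figure 3.5 reproduces the
# standard quadrature code of Figure 3.2 for linear triangles.

Psi_vu_D10 = [-1.0, 1.0, 0.0]
Psi_vu_D01 = [-1.0, 0.0, 1.0]


def standard(K, det, F0, W1):
    (K_00, K_01), (K_10, K_11) = K
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += (
                (K_00 * Psi_vu_D10[j] + K_10 * Psi_vu_D01[j])
                * (K_00 * Psi_vu_D10[k] + K_10 * Psi_vu_D01[k])
                + (K_01 * Psi_vu_D10[j] + K_11 * Psi_vu_D01[j])
                * (K_01 * Psi_vu_D10[k] + K_11 * Psi_vu_D01[k])
            ) * F0 * W1 * det
    return A


def simplified(K, det, F0, W1):
    (K_00, K_01), (K_10, K_11) = K
    # Geometry constants, hoisted out of the basis function loops.
    G = [W1 * det * (K_00 * K_00 + K_01 * K_01),
         W1 * det * (K_00 * K_10 + K_01 * K_11),
         W1 * det * (K_10 * K_10 + K_11 * K_11)]
    # Integration point constants.
    I = [F0 * g for g in G]
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j * 3 + k] += (Psi_vu_D10[j] * Psi_vu_D10[k] * I[0]
                             + Psi_vu_D10[j] * Psi_vu_D01[k] * I[1]
                             + Psi_vu_D01[j] * Psi_vu_D10[k] * I[1]
                             + Psi_vu_D01[j] * Psi_vu_D01[k] * I[2])
    return A


K = [[0.5, 0.25], [-0.1, 1.0]]   # arbitrary inverse Jacobian entries
A_std = standard(K, 2.0, 1.5, 0.5)
A_simp = simplified(K, 2.0, 1.5, 0.5)
max_diff = max(abs(a - b) for a, b in zip(A_std, A_simp))
```

The two results differ only by rounding, since the regrouping changes the order of floating point operations but not the mathematical expression.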

3.3.3 Precompute integration point constants


The optimisations described in the previous section are performed at the expense
of increased code generation time. In order to reduce the generation time while

C++ code
// Geometry constants.
double G[1];
G[0] = W1*det;

// Integration point constants.
double I[1];
I[0] = F0*G[0];

// Loop basis functions.
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] +=
      ((Psi_vu_D01[0][j]*K_10 + Psi_vu_D10[0][j]*K_00)*
       (Psi_vu_D01[0][k]*K_10 + Psi_vu_D10[0][k]*K_00) +
       (Psi_vu_D01[0][j]*K_11 + Psi_vu_D10[0][j]*K_01)*
       (Psi_vu_D01[0][k]*K_11 + Psi_vu_D10[0][k]*K_01)
      )*I[0];
  }
}

Figure 3.6: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f precompute_ip_const.

achieving a reduction in the operation count, another approach can be taken
involving hoisting expressions that are constant with respect to integration points
without expanding the expression first. To generate code with this optimisation
the FFC command-line option -f precompute_ip_const should be used. Code
generated by this method for the representation in (3.4) can be seen in Figure 3.6
which includes only code different from that in Figure 3.2.
It is clear from the generated code that this strategy will not lead to a significant
reduction in the number of operations for this particular form. The only difference
between the code inside the loop in Figure 3.2 and Figure 3.6 is that F0*W1*det
has been reduced to I[0] which reduces the number of operations by sixteen
(two operations for each of the nine times the loop is executed minus the two
operations to compute the I[0] entry). However, for more complex forms, with
many coefficients, the number of terms that can be hoisted will increase significantly,
leading to improved run-time performance.
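
The saving of sixteen operations can be checked with a small sketch. The Python snippet below is not FFC output: the values of F0, W1 and det are made up, and the hypothetical function `basis_term` merely stands in for the bracketed basis function expression of Figure 3.6. Only the multiplications involving the integration point factor are counted.

```python
# Sketch: hoisting the integration point constant F0*W1*det out of the
# loop over basis functions. All numerical values are made up.
F0, W1, det = 1.5, 0.5, 2.0
n = 3  # basis functions per cell for linear triangles


def basis_term(j, k):
    """Hypothetical stand-in for the bracketed expression in Figure 3.6."""
    return (j + 1.0) * (k + 2.0)


def assemble_naive():
    """Multiply by F0, W1 and det inside the loop."""
    A, mults = [0.0] * (n * n), 0
    for j in range(n):
        for k in range(n):
            A[j * n + k] += basis_term(j, k) * F0 * W1 * det
            mults += 3  # *F0, *W1, *det
    return A, mults


def assemble_hoisted():
    """Precompute I0 = F0*(W1*det) once and reuse it in the loop."""
    G0 = W1 * det   # geometry constant
    I0 = F0 * G0    # integration point constant
    mults = 2       # the two multiplications above
    A = [0.0] * (n * n)
    for j in range(n):
        for k in range(n):
            A[j * n + k] += basis_term(j, k) * I0
            mults += 1  # *I0
    return A, mults


A_naive, m_naive = assemble_naive()
A_hoisted, m_hoisted = assemble_hoisted()
saved = m_naive - m_hoisted  # 2*9 - 2
```

Both functions produce the same element tensor; the difference in the counters reproduces the two operations saved in each of the nine loop iterations minus the two operations spent on the precomputed entry.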

3.3.4 Precompute basis constants

This optimisation strategy is an extension of the strategy described in the previous
section. In addition to hoisting terms related to the geometry and the integration
points, values that depend on the basis indices are precomputed inside the
loops. This will result in a reduction in operations for cases in which some terms
appear frequently inside the loop such that a given value can be reused once
computed. To generate code with this optimisation, the FFC command-line option
-f precompute_basis_const should be used.
Code generated by this method for the representation in (3.4) can be seen in
Figure 3.7, where only code that differs from that in Figure 3.6 has been included.
Inside the loop, the value of each binary operation is stored in the array B such
that it can be reused in subsequent computations. The UFL representation of (3.4),
which is the input to FFC, can be viewed as a directed acyclic graph (DAG). When
FFC generates code from this input, it uses algorithms from UFL to traverse the
DAG such that code to evaluate subexpressions is generated before code to evaluate
any expression which depends on these subexpressions. This ensures that values in
B are computed in the correct order. In this particular case, no additional reduction
in operations has been achieved, if compared to the previous method, since no
terms can be reused inside the loop over the indices j and k. However, as the
complexity of forms increases so does the scope for reusing terms inside the loop,
leading to improved run-time performance.
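
The ordering argument can be illustrated with a small sketch. The Python code below is not the UFL or FFC implementation; the `Node` class, the `emit` function and the leaf names are hypothetical and only mimic the idea of a post-order traversal of a DAG in which every distinct subexpression is assigned exactly one entry of B.

```python
class Node:
    """Hypothetical expression node: an operator with operands, or a leaf."""

    def __init__(self, op, *operands):
        self.op = op              # "+", "*", or a leaf name such as "K"
        self.operands = operands

    def key(self):
        # Structural key so that identical subexpressions are recognised.
        return (self.op,) + tuple(o.key() for o in self.operands)


def emit(root):
    """Return code lines in which operands are evaluated before their users."""
    lines, names = [], {}

    def visit(node):
        if not node.operands:       # leaf: refer to it by name
            return node.op
        k = node.key()
        if k in names:              # already computed: reuse the stored value
            return names[k]
        args = [visit(o) for o in node.operands]  # post-order traversal
        name = "B[%d]" % len(names)
        names[k] = name
        lines.append("%s = %s;" % (name, (" %s " % node.op).join(args)))
        return name

    visit(root)
    return lines


def make_sum():
    """Build a*K + b*K as a fresh subgraph."""
    return Node("+", Node("*", Node("a"), Node("K")),
                Node("*", Node("b"), Node("K")))


# (a*K + b*K) * (a*K + b*K): the sum appears twice but is computed once.
code = emit(Node("*", make_sum(), make_sum()))
```

Because a node is only written out after all of its operands have been visited, every entry of B is defined before it is used, and the repeated sum is evaluated a single time.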

3.3.5 Further optimisations


Preliminary investigations suggest that the performance of the quadrature
representation can be improved by applying two additional optimisations. Looking
at the code in Figure 3.7, it is seen that about half of the temporary values in
the array B only depend on the loop index j, and they can therefore be hoisted,
as has been done for other terms in previous sections. Another approach is to
unroll the loops with respect to j and k in the generated code. This will lead to a
dramatic increase in the number of values that can be reused, and the approach
can be readily combined with all of the other optimisation strategies. However, the
total number of temporary values will also increase. Therefore, this optimisation
strategy might not be feasible for all forms.
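
The first of these ideas, hoisting the temporaries that depend only on j out of the loop over k, can be sketched as follows. The Python code is illustrative only and is not generated by FFC; the tables `D01` and `D10` and the values of `K_00`, `K_01`, `K_10`, `K_11` and `I0` are made-up stand-ins for the quantities appearing in the generated code for the weighted Laplacian.

```python
# Made-up basis function derivative tables and geometry factors.
D01 = [-1.0, 0.0, 1.0]
D10 = [-1.0, 1.0, 0.0]
K_00, K_01, K_10, K_11 = 2.0, 0.5, -0.25, 1.5
I0 = 0.125
n = 3


def element_tensor_inner():
    """All temporaries are recomputed for every (j, k) pair."""
    A, mults = [0.0] * (n * n), 0
    for j in range(n):
        for k in range(n):
            gj0 = D01[j] * K_10 + D10[j] * K_00
            gj1 = D01[j] * K_11 + D10[j] * K_01
            gk0 = D01[k] * K_10 + D10[k] * K_00
            gk1 = D01[k] * K_11 + D10[k] * K_01
            A[j * n + k] += (gj0 * gk0 + gj1 * gk1) * I0
            mults += 11  # 8 for the temporaries, 2 products, 1 for *I0
    return A, mults


def element_tensor_hoisted():
    """Temporaries that depend only on j are computed once per j."""
    A, mults = [0.0] * (n * n), 0
    for j in range(n):
        gj0 = D01[j] * K_10 + D10[j] * K_00
        gj1 = D01[j] * K_11 + D10[j] * K_01
        mults += 4
        for k in range(n):
            gk0 = D01[k] * K_10 + D10[k] * K_00
            gk1 = D01[k] * K_11 + D10[k] * K_01
            A[j * n + k] += (gj0 * gk0 + gj1 * gk1) * I0
            mults += 7
    return A, mults


A_inner, m_inner = element_tensor_inner()
A_hoisted, m_hoisted = element_tensor_hoisted()
```

In this sketch the multiplication count drops from 99 to 75 while the element tensor is unchanged; the number of live temporaries per iteration is the same, which is not the case for the loop unrolling approach.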
C++ code
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    double B[16];
    B[0] = Psi_vu_D01[0][j]*K_10;
    B[1] = Psi_vu_D10[0][j]*K_00;
    B[2] = (B[0] + B[1]);
    B[3] = Psi_vu_D01[0][k]*K_10;
    B[4] = Psi_vu_D10[0][k]*K_00;
    B[5] = (B[3] + B[4]);
    B[6] = B[2]*B[5];
    B[7] = Psi_vu_D01[0][j]*K_11;
    B[8] = Psi_vu_D10[0][j]*K_01;
    B[9] = (B[7] + B[8]);
    B[10] = Psi_vu_D01[0][k]*K_11;
    B[11] = Psi_vu_D10[0][k]*K_01;
    B[12] = (B[10] + B[11]);
    B[13] = B[12]*B[9];
    B[14] = (B[13] + B[6]);
    B[15] = B[14]*I[0];
    A[j*3 + k] += B[15];
  }
}

Figure 3.7: Part of the generated code for the weighted Laplacian using linear
elements in two dimensions with optimisation option -f precompute_basis_const.
The array B contains precomputed values that depend on indices j and k.

FFC implements a few efficient quadrature schemes for integrating polynomials
of degree less than or equal to six on simplices. For polynomials of degree
higher than six, it calls FIAT to compute the quadrature scheme. FIAT supplies
schemes that are based on the Gauss–Legendre–Jacobi rule mapped onto simplices
(see Karniadakis and Sherwin (2005) for details of such schemes). This means that
for integrating a seventh-order polynomial, FFC will use four quadrature points
in each spatial direction, that is, 4³ = 64 points per cell in three dimensions. A
further optimisation of the quadrature representation can thus be achieved by
implementing more efficient quadrature schemes for higher order polynomials on
simplices since a reduction in the number of integration points will yield improved
run-time performance. FFC does, however, provide an option for a user to specify
the quadrature degree of a variational form, thereby permitting inexact quadrature.
For instance, to set the quadrature degree equal to two, the command-line option
-f quadrature_degree=2 should be used, in which case FFC will use a quadrature
rule which is able to integrate a quadratic polynomial exactly. For tetrahedra, this
will result in a four point quadrature scheme.
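
The count of 64 points follows from the standard property that an n-point Gauss rule integrates polynomials of degree up to 2n − 1 exactly in one dimension. The sketch below is plain Python rather than FFC or FIAT code, the function names are hypothetical, and it only covers the mapped rules used for degrees higher than six, not the specialised low-degree schemes.

```python
def points_per_direction(degree):
    """Smallest n such that 2n - 1 >= degree (n-point Gauss rule in 1D)."""
    return (degree + 2) // 2


def mapped_rule_points(degree, dim):
    """Number of points of an n^dim rule mapped onto a simplex."""
    return points_per_direction(degree) ** dim


n_1d = points_per_direction(7)      # seventh-order polynomial
n_tet = mapped_rule_points(7, 3)    # points per tetrahedron
```

A scheme tailored to the simplex can integrate the same polynomial degree with fewer points, which is the scope for improvement mentioned above.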

3.4 Performance comparisons of representations


Generated tensor contraction and quadrature-based code is now compared in terms
of the metrics outlined in Section 3.1, namely the run-time performance, the size of
generated code and the speed of the code generation phase. The aim is to elucidate
features of the two representations for various problems with the goal of finding
a guiding principle for selecting the most appropriate representation for a given
problem.
First some typical forms of differing complexity and nature are considered
to illustrate some trends and differences between the representations. This leads
to a systematic comparison using some very simple forms for which the tensor
contraction representation is expected to prove superior, before increasing the
complexity of the forms in order to investigate the cross-over point at which the
quadrature representation becomes the better representation in terms of run-time
performance. Exact quadrature is used for all examples.
All tests were performed on an Intel Core i7-2600 CPU at 3.40 GHz (8 cores,
although tests were run in serial) with 15.7 GB of RAM running Ubuntu 12.10 with
Linux kernel 3.5.0-23. Python version 2.7.3 and NumPy version 1.6.2 (both pertinent
to FFC) are used when generating code, while g++ version 4.7.2 with the
‘-O2 -funroll-loops’ optimisation flags is used to compile the generated C++ code,
which is compliant with UFC version 2.0.5. DOLFIN version 1.0.0 is used to assemble the
global sparse matrix for tests which involve compressed sparse matrices. DOLFIN
provides various linear algebra backends, and PETSc (Balay et al., 2001) is used as
the backend for the assembly tests. The nonzero structure of the compressed sparse
matrix is initialised and no special reordering of degrees of freedom has been used
in the assembly tests. The results presented in this section are obtained with FFC version
1.0.0 using the optimisation options -f eliminate_zeros and -f simplify for the
quadrature representation.

3.4.1 Performance for a selection of forms


The two representations are now compared for three different ‘real’ forms to
demonstrate the strengths and weaknesses. The first form considered is a mixed
Poisson formulation using fifth-order Brezzi–Douglas–Marini (BDM) elements
(Brezzi et al., 1985), automation aspects of which have been addressed by Rognes
et al. (2010). The bilinear form, which leads to the finite element stiffness matrix,
for the mixed Poisson problem reads

\[
  a(\sigma, u; \tau, w) := \int_{\Omega} \sigma \cdot \tau - u \, (\nabla \cdot \tau) + (\nabla \cdot \sigma) \, w \, \mathrm{d}x, \tag{3.10}
\]

where τ, σ ∈ V, w, u ∈ W and

\[
  V := \left\{ \tau \in H(\mathrm{div}, \Omega) : \tau|_T \in \mathrm{BDM}_k(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.11}
\]

\[
  W := \left\{ w \in L^2(\Omega) : w|_T \in P_{k-1}(T) \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.12}
\]

The UFL code for this form with k = 5 is shown in Figure 3.8.

UFL code
BDM = FiniteElement("Brezzi-Douglas-Marini", triangle, 5)
DG = FiniteElement("Discontinuous Lagrange", triangle, 5 - 1)

mixed_element = BDM*DG

(sigma, u) = TrialFunctions(mixed_element)
(tau, w) = TestFunctions(mixed_element)

a = (dot(sigma, tau) - u*div(tau) + div(sigma)*w)*dx

Figure 3.8: UFL code for the stiffness matrix of the mixed Poisson problem in (3.10)
using BDM elements of order five.

The generation of code for a discontinuous Galerkin formulation of the biharmonic
equation with Lagrange basis functions which involves both cell and interior
facet integrals (Ølgaard et al., 2008a) is also considered. The bilinear form for this
problem reads

\[
\begin{aligned}
  a(u, v) := {} & \int_{\Omega} \nabla^2 u \, \nabla^2 v \, \mathrm{d}x
    - \int_{\Gamma_0} \langle \nabla^2 u \rangle \cdot [\![ \nabla v ]\!] \, \mathrm{d}s
    - \int_{\Gamma_0} [\![ \nabla u ]\!] \cdot \langle \nabla^2 v \rangle \, \mathrm{d}s \\
  & + \int_{\Gamma_0} \frac{\alpha}{h} [\![ \nabla u ]\!] \cdot [\![ \nabla v ]\!] \, \mathrm{d}s,
\end{aligned} \tag{3.13}
\]

where the functions u, v ∈ V and

\[
  V := \left\{ v \in H^1_0(\Omega) : v|_T \in P_k(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.14}
\]

and Γ₀ denotes the set of interior facets, α > 0 is a penalty parameter and h is
a measure of the cell size. See Section 4.2.4 for more details. The UFL code for
this bilinear form for the case k = 3 is shown in Figure 3.9.

UFL code
element = FiniteElement("Lagrange", triangle, 3)
u = TrialFunction(element)
v = TestFunction(element)

n = VectorConstant(element.cell())
h = Constant(element.cell())
h_avg = 0.5*(h('+') + h('-'))

alpha = 10.0

a = inner(div(grad(u)), div(grad(v)))*dx \
  - inner(avg(div(grad(u))), jump(grad(v), n))*dS \
  - inner(jump(grad(u), n), avg(div(grad(v))))*dS \
  + alpha*h_avg*inner(jump(grad(u), n), jump(grad(v),n))*dS

Figure 3.9: UFL code for the stiffness matrix of a discontinuous Galerkin formulation
for the biharmonic equation using two-dimensional elements of order
three (3.13).

The third example is a complicated form which has arisen in modelling
temperature-dependent multiphase flow through porous media (Wells et al., 2008). It comes from the
approximate linearisation of a stabilised finite element formulation for a particular
problem and is characterised by standard Lagrange basis functions of low order but
the products of many functions from a number of different spaces. The physical
significance of the equation is unimportant in the context of this work, therefore it
is presented in an abstract form. The bilinear form reads:

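\[
\begin{aligned}
  a(p, q) := \int_{\Omega} & f_0 g_2 g_3 g_4 \, p q
    - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla p \right) q \\
  & - g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla p \cdot \nabla q
    + f_0 g_2 g_3 g_4 \, p \, g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) \\
  & - (1 - g_5) \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla p \right) g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) \\
  & + g_6 (1 - g_5) \left( \sum_{i=0}^{2} f_{2i+1} \right) \nabla^2 p \; g_7 \left( \sum_{i=0}^{2} g_i u_i \cdot \nabla q \right) \mathrm{d}x,
\end{aligned} \tag{3.15}
\]

where the test and trial functions q, p ∈ V with

\[
  V := \left\{ v \in H^1(\Omega) : v|_T \in P_2(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.16}
\]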

and the functions f_i ∈ V_f, g_i ∈ V_g and u_i ∈ V_u are coefficient functions. The
coefficient spaces are:

\[
  V_f := \left\{ f \in H^1(\Omega) : f|_T \in P_1(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.17}
\]

\[
  V_g := \left\{ g \in L^2(\Omega) : g|_T \in P_1(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.18}
\]

\[
  V_u := \left\{ u \in \left[ L^2(\Omega) \right]^2 : u|_T \in \left[ P_1(T) \right]^2 \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.19}
\]

The coefficient functions are either prescribed or come from the solution of other
equations. The UFL input to the compiler for this form is shown in Figure 3.10. Due
to the origins of this form, it will informally be denoted as the ‘pressure equation’.

UFL code
scalar_p = FiniteElement("Lagrange", triangle, 2)
scalar = FiniteElement("Lagrange", triangle, 1)
dscalar = FiniteElement("Discontinuous Lagrange", triangle, 0)
vector = VectorElement("Discontinuous Lagrange", triangle, 1)

p = TrialFunction(scalar_p)
q = TestFunction(scalar_p)

f0, f1, f2, f3, f4, f5, f6 = [Coefficient(scalar) for i in range(7)]
g0, g1, g2, g3, g4, g5, g6, g7 = [Coefficient(dscalar) for i in range(8)]
u0, u1, u2 = [Coefficient(vector) for i in range(3)]

Sgu = g0*u0 + g1*u1 + g2*u2
S = g6*(1 - g5)*(f1 + f3 + f5)

a_0 = p*g3*f0*g2*g4*q\
    - (1 - g5)*inner(Sgu, grad(p))*q\
    - S*inner(grad(p), grad(q))

a_1 = g3*f0*g2*g4*p*g7*inner(Sgu, grad(q))\
    - (1 - g5)*inner(Sgu, grad(p))*g7*inner(Sgu, grad(q))\
    + S*div(grad(p))*g7*inner(Sgu, grad(q))

a = (a_0 + a_1)*dx

Figure 3.10: UFL code for the ‘pressure equation’ (3.15) in two dimensions.

The three forms have been compiled with FFC using the tensor contraction
and quadrature representations. In Table 3.1, the time required to generate the
code, the size of the generated code and the time required to compile the C++
code are reported for each form. Results are presented for the tensor contraction
case, together with the ratio of the time/size for the quadrature representation case
divided by the time/size required for the tensor contraction representation case,
denoted by q/t. In measuring the C++ compile-time and the run-time performance,
the generated code has been compiled against the library DOLFIN. Noteworthy
from the results in Table 3.1 is that the generation phase for the quadrature
representation is faster than the tensor contraction representation generation phase for
all forms. In all cases the size of the generated quadrature code is smaller than the
tensor contraction code, which is reflected in the C++ compile-time. The differences
in the C++ compile-time are substantial for all forms (more than a factor of one hundred
for the pressure equation), which is important during the code development phase
with frequent recompilations.²

Form                generation [s]    q/t    size [kB]    q/t    C++ [s]    q/t
mixed Poisson                  6.3   0.79         4300   0.91       27.2   0.11
DG biharmonic                 23.4   0.04         4800   0.07       77.1   0.06
pressure equation              4.0   0.14         5300   0.05      356.0   0.01

Table 3.1: Timings and code size for the compilation phase for the various variational
forms. ‘generation’ is the time required by FFC to generate the tensor contraction
code; ‘size’ is the size of the generated tensor contraction code; and ‘C++’ is the
time required to compile the generated C++ code. The ratio q/t is the ratio between
quadrature and tensor contraction representations.

Form                  flops     q/t    run-time [s]      q/t
mixed Poisson         38138   34.26            11.7   17.600
DG biharmonic         37353    1.41            15.3    1.175
pressure equation    271356    0.04           158.8    0.014

Table 3.2: Run-time performance for the various variational forms.

Timings and operation counts for the three forms are presented in Table 3.2. The
number of floating point operations (flops) is defined as the sum of all ‘+’ and ‘*’
operators in the code for computing the element matrix. Although multiplications
are generally more expensive than additions, this definition provides a good
measure for the performance of the generated code. The compound operator ‘+=’
is counted as one operation. For the run-time performance, the time required
to compute the local element tensors N times is recorded. The time needed to
insert the local tensor into the global sparse matrix is not included. For the mixed
Poisson problem N = 5 × 10⁵ and for the discontinuous Galerkin biharmonic
problem and the pressure equation N = 1 × 10⁶. Table 3.2 presents the timings and
operation counts for the tensor contraction representation, together with the ratio of the
quadrature representation case and the tensor contraction representation case, q/t.
The run-time performance is indicative of an aspect of the two representations; there
can be significant performance differences depending on the nature of the differential
equation. For the mixed Poisson problem, the tensor contraction representation is
close to a factor of twenty faster than the quadrature representation, whereas for the
pressure equation the quadrature representation is close to a factor of seventy faster
than the tensor contraction case. Furthermore, the run-time performance ratio and
the flops ratio are of the same order of magnitude, suggesting a coupling between the
two. This observation of dramatic differences in run-time performance suggests the
possibility of devising a strategy for determining the best representation, without
generating the code for each case. Such concepts have been successfully developed
in digital signal processing (Püschel et al., 2005). For forms with a relatively simple
structure, devising such a scheme is straightforward. However, it turns out to be
non-trivial for arbitrary forms.

² It should be noted that the C++ compile-time reduces substantially for the tensor contraction
representation if no g++ optimisations are used (by approximately a factor of ten). The C++
compile-time for the quadrature representation is typically a couple of seconds irrespective of which
g++ optimisation option is used.

UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)

u = TrialFunction(element)
v = TestFunction(element)

a = u*v*dx

Figure 3.11: UFL code for the mass matrix in three dimensions with element order
q = 2.
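
The flop definition used for Table 3.2 can be mimicked by a simple textual count. The sketch below is an assumption about how such a count might be carried out and is not the tool used to produce the tables; the function name `count_flops` and the sample statement are hypothetical. Since ‘+=’ contains a single ‘+’, counting characters counts the compound operator once, while increments such as ‘j++’ are removed first so that loop counters do not contribute.

```python
def count_flops(code):
    """Count '+' and '*' operators in a C++ code fragment (textual sketch)."""
    stripped = code.replace("++", "")  # ignore increments such as j++
    return stripped.count("+") + stripped.count("*")


statement = "A[i] += (a*b + c*d)*w;"
flops = count_flops(statement)  # three '*', one '+' and one '+='
loop_header_flops = count_flops("for (unsigned int j = 0; j < 3; j++)")
```

Such a purely textual count does not distinguish arithmetic from other uses of the characters (for example pointer dereferences or index arithmetic), so it is only meaningful for the arithmetic statements of the element tensor computation.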

3.4.2 Performance for common, simple forms


The performance of the two representations is now investigated for two canonical
examples: the scalar ‘mass’ matrix and the ‘elasticity-like’ stiffness matrix. The
input for the mass matrix form is shown in Figure 3.11 and the input for the
elasticity-like stiffness matrix is shown in Figure 3.12. The performance of the
two representations is compared for three-dimensional cases on simplices and
for various polynomial orders. Code is generated using FFC, and the number
of floating point operations required to form the element matrix for all cases is
reported. In addition to reporting the number of floating point operations, the
time required to compute the element matrix N times is also presented, which is
expected in most cases to be strongly correlated to the floating point operations
count. As before, values are reported for the tensor contraction representation case
together with the ratio of the quadrature value over the tensor contraction value.

UFL code
element = VectorElement("Lagrange", tetrahedron, 3)

u = TrialFunction(element)
v = TestFunction(element)

def eps(v):
    return grad(v) + grad(v).T

a = 0.25*inner(eps(u), eps(v))*dx

Figure 3.12: UFL code for the elasticity-like matrix in three dimensions with element
order q = 3.

The time required for insertion into a sparse matrix, which is independent of the
element matrix representation, is also reported. The total assembly time is the
‘run-time’ plus the ‘insertion’ time, which provides a picture of the overall assembly
performance. The ratio of the total assembly time for the quadrature representation
over the total assembly time for the tensor contraction representation, denoted
by aq /at , is also presented. When taking this into account, for some forms the
difference in performance between different representations appears less drastic.
The various timings for the mass matrix problem are reported in Table 3.3. What
is clear from these results is that tremendous speed-ups for computing the element
matrices can be achieved using the tensor contraction representation, particularly
as the element order is increased. This is perhaps not surprising considering that
the geometry tensor for this case is simply a scalar; the entire matrix is therefore
essentially precomputed. Also note that the g++ compiler appears to be performing
particularly well for the tensor contraction representation in the two cases where
q = 2 and q = 3. For the case q = 3, the ratio of flops suggests that the run-time ratio
should be around one hundred while in fact it is close to 6500. However, as the number
of flops increases for the tensor contraction representation this effect disappears
and the two ratios become almost equal (compare 365 to 378 for the case q = 4).
The effect of the speed-up of computing the element matrix is reduced, however,
if the time required to insert terms into a sparse matrix is taken into account. For
the case of q = 4, the tensor contraction representation is a factor of 378 faster for
computing the element matrix, but when insertion is included an overall speed-up
factor of 9.72 is observed. Although this is a substantial speed-up, the efficiency of
matrix insertion must be addressed to reap the full benefits of the tensor contraction
approach for these types of problems. If, in addition, the time required to perform
the remaining parts of the finite element procedure, such as mesh initialisation,
application of boundary conditions, and solving the resulting system of equations,
is taken into account, the q/t ratio will become even closer to unity.
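
The overall speed-up factor follows directly from the tabulated values. As a worked check, the plain Python sketch below (the helper name `assembly_ratio` is hypothetical) recomputes the total assembly ratio for the mass matrix with q = 4 from the run-time of the tensor contraction representation, the run-time ratio q/t and the insertion time.

```python
def assembly_ratio(run_time_t, qt_run_time, insertion):
    """Total assembly time ratio of quadrature over tensor contraction.

    run_time_t  : run-time of the tensor contraction representation [s]
    qt_run_time : run-time ratio q/t (quadrature over tensor contraction)
    insertion   : sparse matrix insertion time, identical for both [s]
    """
    run_time_q = qt_run_time * run_time_t
    return (run_time_q + insertion) / (run_time_t + insertion)


# Mass matrix, q = 4: run-time 3.40 s, q/t = 377.9, insertion 143.5 s.
ratio = assembly_ratio(run_time_t=3.40, qt_run_time=377.9, insertion=143.5)
```

The insertion time appears in both the numerator and the denominator, which is why a run-time ratio of several hundred shrinks to a single-digit ratio for the complete assembly.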

                      flops     q/t    run-time [s]      q/t    insertion [s]    a_q/a_t
q = 1 (N = 1 × 10⁹)      52     3.8            1.05      4.1             21.4       1.03
q = 2 (N = 1 × 10⁸)     136    31.0            0.11    764.3             67.1       2.25
q = 3 (N = 1 × 10⁸)     316    91.2            0.12   6493.3            362.3       3.15
q = 4 (N = 1 × 10⁷)    1260   364.7            3.40    377.9            143.5       9.72

Table 3.3: Timings for the mass matrix in three dimensions for varying polynomial
order basis q.

                       flops     q/t    run-time [s]     q/t    insertion [s]    a_q/a_t
q = 1 (N = 1 × 10⁷)     2242     0.6            2.47     1.4            10.17       1.09
q = 2 (N = 1 × 10⁶)    18046     2.7            4.79     3.2             9.68       1.74
q = 3 (N = 1 × 10⁵)    91522     9.5            2.63    10.5             5.08       4.24
q = 4 (N = 1 × 10⁴)   321984    16.3            1.13    13.7             1.86       5.79

Table 3.4: Timings for the elasticity-like matrix in three dimensions for varying
polynomial order basis q.

The various timings for the elasticity-like stiffness matrix are presented in
Table 3.4. Compared to the mass matrix, the differences in performance of the
tensor contraction representation relative to quadrature representation are less
dramatic, but nonetheless substantial, especially for higher-order functions.

3.4.3 Performance for forms of increasing complexity


The complexity of the forms investigated in the previous section is now increased
systematically in order to examine under which circumstances the quadrature
representation will be more favourable in terms of run-time performance. The
comparison is based on the floating point operation count³ and the size of the
generated file for a large class of problems. The ‘complexity’ of a variational form is
considered to increase when the number of function products increases and when
the number of derivatives present increases. Increasing the number of derivatives
and/or the number of functions appearing in a form leads to higher rank tensors
for the tensor contraction representation. Also, increases in the polynomial order of
the basis of a coefficient function lead to an increase in complexity of the geometry
tensor $G_T^{\alpha}$, while increases in the polynomial order of the basis of test and trial
functions lead to an increase in complexity of the reference tensor $A^0_{i\alpha}$, see (3.8)
and (3.7). Initially, attention is restricted to manipulating the number of function
multiplications in the forms and the polynomial order of these functions, before
introducing products of derivatives.

³ While the tables concerning flops and run-time performance in the previous two sections suggest
that the flop count is a reasonably good indicator of performance, it is demonstrated in Section 3.5 that
this is not always the case.

UFL code
element = FiniteElement("Lagrange", tetrahedron, 2)
element_f = FiniteElement("Lagrange", tetrahedron, 3)

u = TrialFunction(element)
v = TestFunction(element)

f = Coefficient(element_f)
g = Coefficient(element_f)

a = f*g*u*v*dx

Figure 3.13: UFL code for the mass matrix in three dimensions with q = 2,
premultiplied by two coefficient functions (n_f = 2) of order p = 3.

To generate forms of greater complexity than those in the previous section, the
mass matrix and elasticity-like variational forms with a Lagrange basis of order
q are premultiplied with n_f functions of order p. In case of the mass matrix, the
modified form reads:

\[
  a(u, v) := \int_{\Omega} \left( \prod_{i=1}^{n_f} f_i \right) u v \, \mathrm{d}x, \tag{3.20}
\]

where the test and trial functions v, u ∈ V with

\[
  V := \left\{ v \in H^1(\Omega) : v|_T \in P_q(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{3.21}
\]

and f_i ∈ V_f are coefficient functions with

\[
  V_f := \left\{ v \in H^1(\Omega) : v|_T \in P_p(T) \ \forall \, T \in \mathcal{T}_h \right\}. \tag{3.22}
\]

An example of UFL code is shown in Figure 3.13 for the mass matrix premultiplied
by coefficient functions where q = 2, n_f = 2 and p = 3.
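
A rough feel for how quickly the tensor contraction representation grows for the form (3.20) can be obtained by counting the entries of the reference tensor. The sketch below rests on stated assumptions that are not taken from FFC: dense storage, one tensor index for each test, trial and coefficient function, and no elimination of zeros. The function names are hypothetical, and the result is an entry count, not a flop count.

```python
from math import factorial


def lagrange_dim(order, dim):
    """Number of Lagrange basis functions of a given order on a simplex."""
    # Binomial coefficient C(order + dim, dim).
    return factorial(order + dim) // (factorial(order) * factorial(dim))


def reference_tensor_entries(q, p, n_f, dim=3):
    """Dense entry count: two indices of order q and n_f indices of order p."""
    return lagrange_dim(q, dim) ** 2 * lagrange_dim(p, dim) ** n_f


small = reference_tensor_entries(q=1, p=1, n_f=1)   # 4*4*4 entries
large = reference_tensor_entries(q=2, p=3, n_f=4)   # 10*10*20**4 entries
```

Under these assumptions the entry count grows geometrically with n_f, with a base that itself grows with p, whereas the quadrature representation only needs a higher quadrature degree.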
A comparison of the representations for the mass matrix with a different
number of premultiplying functions and a range of orders p and q is presented in
Table 3.5. In terms of flops, a ratio q/t > 1 indicates that the tensor representation
is more efficient while q/t < 1 indicates that the quadrature representation is more
efficient. What is clear from Table 3.5 is that with few premultiplying functions,
the tensor contraction approach is generally more efficient, even for relatively high
order premultiplying functions. The situation changes quite dramatically as the
number of premultiplying functions increases, and as the polynomial order of
the premultiplying functions increases. The cases with numerous premultiplying
functions are typical of the Jacobian resulting from the linearisation of a nonlinear
differential equation in a practical simulation, and are therefore important. It is
also noted that the tensor contraction representation is more efficient for increases
in q; however, this effect is less pronounced for the cases where n_f > 1 and p > 1.
Obviously, the selection of the representation can have a tremendous performance
impact. For the most complicated cases where n_f = 4, p = 3 and q > 1, FFC was
stopped after more than one hour of generating code for the tensor contraction
representation. FFC generated the quadrature representation code for all cases in a
couple of seconds.

                   n_f = 1            n_f = 2             n_f = 3              n_f = 4
                 flops    q/t      flops    q/t        flops    q/t         flops    q/t
p = 1, q = 1       156   1.86        580   1.61         2324   0.49          9492   0.21
p = 1, q = 2       648   7.18       3136   2.44        12512   1.68         52416   0.80
p = 1, q = 3      2700  28.68      12484  12.21        46628   3.29        205716   1.30
p = 1, q = 4      7994  57.62      38058  20.97       155850   5.13        622970   2.04
p = 2, q = 1       360   2.72       3472   0.63        36020   0.39        370020   0.08
p = 2, q = 2      1884   4.10      20236   2.12       203926   0.39       2044176   0.06
p = 2, q = 3      7656  19.95      79936   3.36       766628   0.57       8049636   0.08
p = 2, q = 4     23330  34.23     239550   5.32      2452810   0.78      24548810   0.11
p = 3, q = 1       700   1.93      14020   1.17       288020   0.13       5920020   0.02
p = 3, q = 2      3808   5.75      81136   1.02      1572608   0.09      FFC stopped
p = 3, q = 3     14740  10.53     315652   1.39      6380156   0.11             -      -
p = 3, q = 4     47850  16.78     980010   1.96     19602234   0.14             -      -

Table 3.5: The number of operations and the ratio between number of operations
for the two representations for the mass matrix in three dimensions as a function
of different polynomial orders and numbers of functions.
Interestingly, for complicated forms the operation count is not always a good
indicator of performance. For the three-dimensional mass matrix case with p = 1,
q = 4 and n_f = 4, it would be expected from the operation count (q/t = 2.04) that
the tensor contraction representation would be faster. However, when computing
the element tensor 100000 times, a ratio of q/t = 0.81 is observed, meaning that
the quadrature representation is faster. Noteworthy for this case is that the size of
the generated code for tensor contraction representation is 13 MB, while the size
of the generated quadrature code is only 2.4 MB. This size difference leads not
only to a significant difference in the C++ compile-time (almost twenty minutes
for the tensor contraction code and only two seconds for the quadrature code), but
also appears to result in a drop in run-time performance. The performance drop
could be attributed to the increased memory traffic noted by Kirby and Logg (2006).
Also, it may be that the compiler is unable to perform effective optimisations on
the unrolled code, or that the compiler is particularly effective at optimising the
loops in the generated quadrature code.

                    n_f = 1              n_f = 2               n_f = 3
                  flops    q/t        flops    q/t         flops    q/t
p = 1, q = 1       9928   0.13        42832   0.11        183088   0.03
p = 1, q = 2      80020   0.75       331228   0.51       1154620   0.16
p = 1, q = 3     405064   2.31      1466704   1.02       6806512   0.59
p = 1, q = 4    1426374   9.82      5920974   4.60      23425902   1.17
p = 2, q = 1      24940   0.19       268120   0.06       2758888   0.01
p = 2, q = 2     204760   0.82      2071972   0.14      21617452   0.07
p = 2, q = 3     902188   1.66     10789336   0.72      FFC stopped
p = 2, q = 4    3680298   7.43     37846422   1.25             -      -
p = 3, q = 1      19936   0.29       750880   0.04      21556504   0.01
p = 3, q = 2     367732   0.49      8611804   0.18      FFC stopped
p = 3, q = 3    2068552   1.93     43364368   0.31             -      -
p = 3, q = 4    7366950   3.71    152974350   0.50             -      -

Table 3.6: The number of operations and the ratio between number of operations
for the two representations for the elasticity-like tensor in three dimensions as a
function of different polynomial orders and numbers of functions.

A similar comparison is made for elasticity-like forms and the results are
presented in Table 3.6. The trends in this table are similar to those observed for
the mass matrix. Again, FFC was stopped after one hour of generating code
for a number of the more complex forms when using the tensor contraction
representation. Code generation using the quadrature representation completes
in a few seconds for all cases. Compared to the mass matrix case, the number
of operations has increased significantly which has a big impact on both the FFC
generation time and the size of the generated code. As an example, FFC spent 63
minutes generating a file of 2.8 GB for the case where n f = 2, p = 3 and q = 4 for
the tensor contraction representation. For the quadrature representation the code
was generated in 8.6 seconds and the resulting file size was 9.2 MB.
As seen in Table 3.6, increasing the number of coefficient functions n_f in the
form clearly works in favour of the quadrature representation. For n_f = 3 the quadrature
representation can be expected to perform best for all values of q and p even though
q/t = 1.17 for the case where p = 1 and q = 4. In this specific case the size of
the generated code for the tensor contraction representation is 442 MB which will
reduce the run-time performance as discussed previously, assuming that g++ is
able to compile the code at all. Increasing the polynomial order of the coefficients,
p, also works in favour of the quadrature representation although the effect is less
pronounced compared to the effect of increasing the number of coefficients. The
tensor representation appears to perform better when the polynomial order of the
test and trial functions, q, is increased although the effect is most pronounced when
the number of coefficients is low. However, file size considerations will rule out the
tensor contraction representation for a number of forms where, based on the ratio, it
would be expected to outperform the quadrature representation. It is more difficult
in these cases to make broad generalisations as to the best representation. This again
suggests that a method for automatically determining the best representation based
on inspection of the form may be interesting. A discussion of such a strategy is,
however, postponed until Section 3.6.

UFL code
element = VectorElement("Lagrange", triangle, 2)
element_f = VectorElement("Lagrange", triangle, 3)

u = TrialFunction(element)
v = TestFunction(element)

f = Coefficient(element_f)
g = Coefficient(element_f)

a = div(f)*div(g)*inner(grad(u), grad(v))*dx

Figure 3.14: UFL code for the vector-valued Poisson problem in two dimensions
with q = 2, premultiplied by the divergence of two vector-valued functions
(n_f = 2) of order p = 3.

Finally, the influence of premultiplying a vector-valued Poisson variational form
by the divergence of vector-valued functions is investigated. The UFL code for the
case n f = 2, p = 3 and q = 2 is shown in Figure 3.14. A comparison of tensor
contraction and quadrature representations is performed, as in the previous cases,
and the results are shown in Table 3.7. Premultiplying forms with derivatives
of functions clearly increases the complexity to such a degree that the tensor
contraction representation involves fewer operations for only a very limited number
of the considered cases.

3.5 Performance comparisons of quadrature optimisations


In this section the impact of the optimisation strategies, outlined in Section 3.3, on
the run-time performance is investigated. The point is not to present a rigorous
analysis of the optimisations, but to provide indications as to when the different
strategies will be most effective. The performance of the quadrature optimisations

                   n_f = 1              n_f = 2
                 flops    q/t         flops    q/t
p = 1, q = 1       708   0.29          6148   0.07
p = 1, q = 2      2202   0.90         18394   0.13
p = 1, q = 3      8090   1.48         66394   0.19
p = 1, q = 4     22548   2.53        183892   0.32
p = 2, q = 1      1412   0.16         24580   0.04
p = 2, q = 2      7790   0.52        162766   0.03
p = 2, q = 3     24902   0.57        516606   0.05
p = 2, q = 4     60156   1.27       1246436   0.10
p = 3, q = 1      2116   0.30         96772   0.02
p = 3, q = 2     11862   0.36        545422   0.02
p = 3, q = 3     45086   0.54       1695358   0.03
p = 3, q = 4    110668   1.08       4093924   0.04

Table 3.7: The number of operations and the ratio between number of operations for
the two representations for the vector-valued Poisson problem in two dimensions
as a function of different polynomial orders and numbers of functions.

will be investigated using two forms, namely the bilinear form for the weighted
Laplace equation (3.1), see UFL input in Figure 3.1, and the bilinear form for the
Mooney–Rivlin hyperelasticity model from (2.33), page 36, in three dimensions.
The UFL input for the hyperelasticity model is seen in Figure 3.15. In both cases
quadratic Lagrange finite elements will be used.
All tests were performed using the same hardware and software setup as de-
scribed in the previous section with the small difference that the g++ compiler
options are varied. The two forms are compiled with the different FFC optimi-
sations, and the number of floating point operations (flops) to compute the local
element tensor is determined. The number of flops is defined as in the previous
section, that is, as the sum of all appearances of the operators ‘+’ and ‘*’ in the
code. The ratio between the number of flops of the current FFC optimisation
and the standard quadrature representation, o/q, is also computed. The generated
code is then compiled with g++ using four different sets of g++ optimisation
options, and the time needed to compute the element tensor N times is measured.
In the following, -zeros will be used as shorthand for the -f eliminate_zeros
option, -simplify is shorthand for the -f simplify_expressions option, -ip is
shorthand for the -f precompute_ip_const option and -basis is shorthand for the
-f precompute_basis_const option.
The operation counts for the weighted Laplace equation with different FFC
optimisations can be seen in Table 3.8, while Figure 3.16 shows the run-time
performance for different compiler options for N = 5 × 10^7. The FFC compiler

UFL code
element = VectorElement("Lagrange", tetrahedron, 2)

w = TestFunction(element)
du = TrialFunction(element)
u = Coefficient(element)
c1 = Constant(tetrahedron)
c2 = Constant(tetrahedron)

I = Identity(3) # Identity tensor

F = I + grad(u) # Deformation gradient
C = F.T*F # Right Cauchy--Green tensor

I_C = tr(C) # First invariant of C
II_C = (I_C**2 - tr(C*C))/2.0 # Second invariant of C

# Stored strain energy density (Mooney--Rivlin model)
Psi = c1*(I_C - 3.0) + c2*(II_C - 3.0)

Pi = Psi*dx # Potential energy

F = derivative(Pi, u, w) # First variation of Pi about u in direction w
J = derivative(F, u, du) # Jacobian

Figure 3.15: UFL input for the Mooney–Rivlin hyperelasticity model in three
dimensions using quadratic elements. It is the bilinear form, the Jacobian J, which
is of interest in the performance comparison.

FFC optimisation     flops    o/q
None                  4176   1.00
-zeros                6672   1.60
-simplify             2712   0.65
-simplify -zeros      1920   0.46
-ip                   3756   0.90
-ip -zeros            4290   1.03
-basis                3756   0.90
-basis -zeros         3690   0.88

Table 3.8: Operation counts for the weighted Laplace equation.
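
As a quick consistency check, the o/q column of Table 3.8 can be recomputed from the flop counts. The following Python sketch is not part of FFC; the dictionary `flops` and the helper `ratio_to_standard` are names introduced here purely for illustration.

```python
# Flop counts copied from Table 3.8 (weighted Laplace equation).
flops = {
    "None": 4176,
    "-zeros": 6672,
    "-simplify": 2712,
    "-simplify -zeros": 1920,
    "-ip": 3756,
    "-ip -zeros": 4290,
    "-basis": 3756,
    "-basis -zeros": 3690,
}


def ratio_to_standard(counts, baseline="None"):
    """Return o/q: each flop count divided by the standard quadrature count."""
    base = counts[baseline]
    return {name: round(n / base, 2) for name, n in counts.items()}


ratios = ratio_to_standard(flops)
for name, ratio in ratios.items():
    print(f"{name:18s} {flops[name]:6d} {ratio:5.2f}")
```

The same helper can be applied to the flop counts of any other optimisation table, provided the entry for the standard quadrature representation is used as the baseline.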

options can be seen on the x-axis in the figure and the four g++ compiler options
are shown with different colors.
The FFC and g++ compile-times were less than one second for all optimisation
options. It is clear from Figure 3.16 that run-time performance is greatly influenced
by the g++ optimisations. Compared to the case where no g++ optimisations are
used (the -O0 flag), the run-time for the standard quadrature code improves by a
factor of 4.70 when using the -O2 option, 6.86 when using the -O2 -funroll-loops
option and 10.65 when using the -O3 option. The -O3 option does not appear to
improve the run-time noticeably beyond the improvement observed for the -O2
-funroll-loops option when the FFC optimisation option -zeros is used. Using
the FFC optimisation option -zeros alone for this form does not improve run-
time performance. In fact, using this option in combination with any of the other
optimisation options increases the run-time, even when combining with the option
-simplify, which has a significantly lower operation count compared to the standard
quadrature representation. A curious point to note is that without g++ optimisation
there is a significant difference in run-time for the -ip and -basis options, even
though they involve the same number of flops. When g++ optimisations are
switched on, this difference is eliminated completely and the run-times for the two
FFC optimisations are identical. This suggests that it is not possible to predict run-
time performance from the operation count alone since the type of FFC optimisation
must be taken into account as well as the intended use of g++ compiler options.
The optimal combination of optimisations for this form is FFC option -ip or -basis
combined with g++ option -O2 -funroll-loops, in which case the run-time has
improved by a factor of 12.3 compared to standard quadrature code with no g++
optimisations.
The operation counts and FFC code generation time for the bilinear form for
hyperelasticity with different FFC optimisations are presented in Table 3.9, while
Figure 3.17 shows the run-time performance for different compiler options for

[Plot: run time in seconds (logarithmic axis, 10^1 to 10^3) for the FFC options none, -zeros, -simplify, -simplify -zeros, -ip, -ip -zeros, -basis and -basis -zeros; one series for each of the g++ options -O0, -O2, -O2 -funroll-loops and -O3.]

Figure 3.16: Run-time performance for the weighted Laplace equation for different
compiler options. The x-axis shows the FFC compiler options, and the colors denote
the g++ compiler options.

N = 5 × 10^4. Comparing the number of flops needed to compute the element
tensor to the weighted Laplace example, it is clear that this problem is considerably
more complex. The FFC code generation times in Table 3.9 show that the -simplify
optimisation, as anticipated, is the most expensive to perform. The g++ compile-
times for all test cases were less than three seconds for all optimisation options. A
point to note is that the scope for reducing the flop count is considerably greater
for this problem than for the weighted Laplace problem, with a difference in the
number of flops spanning several orders of magnitude between the different FFC
optimisations. This compares to a difference in flops of roughly a factor two
between the non-optimised and the most effective optimisation strategy for the
weighted Laplace problem. In the case where no g++ optimisation is used the
run-time performance for the hyperelastic problem can be directly related to the
number of floating point operations. When the g++ optimisation -O2 is switched
on, this effect becomes less pronounced. Another point to note, in connection
with the g++ optimisations, is that switching on additional optimisations beyond
-O2 does not seem to provide any further improvements in run-time. For the
hyperelasticity example, the option -zeros has a positive effect on the performance,
in particular when combined with the -basis and -simplify optimisations. This is
in contrast with the weighted Laplace equation. The reason is that the test and trial
functions are vector valued rather than scalar valued, which allows more zeros to
be eliminated. Finally, it is noted that the -simplify option performs particularly

FFC optimisation    FFC time [s]    o/q        flops     o/q
None                     1.8       1.00     56228760   1.000
-zeros                   1.8       1.00     38844456   0.691
-simplify                6.9       3.83      3086595   0.055
-simplify -zeros         5.8       3.22       185697   0.003
-ip                      2.0       1.11     44310392   0.788
-ip -zeros               2.9       1.61     12562106   0.223
-basis                   2.0       1.11      3664392   0.065
-basis -zeros            3.0       1.67      1609430   0.029

Table 3.9: FFC code generation times and operation counts for the hyperelasticity
example.

[Plot: run time in seconds (logarithmic axis, 10^0 to 10^4) for the FFC options none, -zeros, -simplify, -simplify -zeros, -ip, -ip -zeros, -basis and -basis -zeros; one series for each of the g++ options -O0, -O2, -O2 -funroll-loops and -O3.]

Figure 3.17: Run-time performance for the hyperelasticity example for different
compiler options. The x-axis shows the FFC compiler options, and the colors denote
the g++ compiler options.

well for this example compared to the weighted Laplace problem. The reason is
that the nature of the hyperelasticity form results in a relatively complex expression
to compute the entries in the local element tensor. However, this expression only
consists of a few different variables (components of the inverse of the Jacobian and
basis function values) which makes the -simplify option very efficient since many
terms are common and can be precomputed and hoisted. For the hyperelasticity
form, the optimal combination of optimisations is FFC option -simplify -zeros
and g++ option -O2 -funroll-loops. This combination improves the run-time
performance by approximately one order of magnitude compared to all other FFC
options when g++ optimisations are included. Compared to the case where no
optimisation is used by either FFC or g++, the run-time performance of the code is
improved by a factor of 744.
For the considered examples, it is clear that no single optimisation strategy is
the best for all cases. Furthermore, the generation phase optimisations that one
can best use depend on which optimisations are performed by the g++ compiler.
It is also very likely that different C++ compilers will give different results for
the test cases presented in this section. The general recommendation for selecting
the appropriate optimisation for production code will therefore be that the choice
should be based on a benchmark program for the specific problem.

3.6 Automatic selection of representation

In this chapter it has been illustrated how the run-time performance of the generated
code for variational forms can be improved by using various optimisation options
for the FFC and g++ compilers, and by changing the representation of the form.
Numerical experiments have shown that the relative run-time performance of
the two representations can differ substantially depending on the nature of the
considered variational form. In general, the tensor contraction approach deals
well with forms which involve high-order bases and few coefficient functions,
whereas the quadrature representation is more efficient as the number of coefficient
functions (other than constant coefficients) and derivatives in a form increases.
Hence, in general the quadrature representation is significantly faster for more
complicated forms.
In an automated modelling framework, like FEniCS, it seems natural to attempt
to select the most favourable representation automatically. When comparing the two
representations in Section 3.4 it was found that the operation count is a reasonably
good indicator for which form will exhibit the best run-time performance. FFC
presently computes the operation count for the code which is generated, on the
basis of which a choice could be made, but this involves generating computer code
for each case which can be time consuming. Ideally, the form compiler would
select the best representation based on an a priori inspection of the form. It turns

out, however, that this is a non-trivial task if the goal is a general approach which
holds for any form which FFC can handle. Furthermore, as it has been shown
in the previous section, the code with the lowest number of flops, at least for the
quadrature representation, does not always perform best for a given form. Finally,
the run-time performance even depends on which g++ options are used. A strategy
for selecting between representations based only on an estimation of flops does,
therefore, not seem feasible.
Choosing the combination of form representation and optimisation options
that leads to optimal performance will inevitably require a benchmark study of
the specific problem. However, very often many variational forms of varying
complexity are needed to solve more complex problems. Setting up benchmarks
for all of them is cumbersome and time consuming. Additionally, during the model
development stage run-time performance is of minor importance compared to
rapid prototyping of variational forms as long as the generated code performs
reasonably well.
The default behavior of FFC is, therefore, to automatically determine which
form representation should be used based on a measure for the cost of using the
tensor representation. In short, the cost is simply computed as the maximum value
of the sum of the number of coefficients and derivatives present in the monomials
representing the form. If this cost is larger than a specified threshold, currently
set to three, the quadrature representation is selected. Recall from Table 3.6 that
when n_f = 3 the flop count for the quadrature representation was significantly lower for
virtually all the test cases. Although this approach may seem ad hoc, it will work
well for those situations where the difference in run-time performance is significant.
It is important to remember that the generated code is only concerned with the
evaluation of the local element tensor and that the time needed to insert the values
into a sparse matrix and to solve the system of equations will reduce any difference,
particularly for simple forms. Therefore, making a correct choice of representation
is less important for forms where the difference in run-time performance is small.
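
The selection rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FFC's implementation: each monomial is summarised by a hypothetical pair (number of coefficients, number of derivatives), the example counts are assumed for illustration, and the names `tensor_cost`, `select_representation` and `THRESHOLD` are invented here.

```python
THRESHOLD = 3  # threshold value quoted in the text


def tensor_cost(monomials):
    """Maximum over all monomials of (coefficients + derivatives)."""
    return max(n_coeff + n_deriv for n_coeff, n_deriv in monomials)


def select_representation(monomials, threshold=THRESHOLD):
    """Pick quadrature when the cost exceeds the threshold, tensor otherwise."""
    if tensor_cost(monomials) > threshold:
        return "quadrature"
    return "tensor"


# Assumed summary of inner(grad(u), grad(v))*dx:
# no coefficients, one derivative on each of u and v.
poisson = [(0, 2)]

# Assumed summary of div(f)*div(g)*inner(grad(u), grad(v))*dx:
# two coefficients, each differentiated once, plus the two derivatives above.
premultiplied = [(2, 4)]

print(select_representation(poisson))        # cost 2
print(select_representation(premultiplied))  # cost 6
```

Note that a cost equal to the threshold still selects the tensor representation, since the text specifies "larger than" the threshold.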
A future improvement could be to devise a strategy for also letting the system
select the optimisation strategy for the quadrature representation automatically.
Regardless of whether it is possible to define an optimal strategy for automatically
selecting the representation (and possibly the optimisation), the applicability of
automated modelling is definitely extended by having both tensor contraction and
quadrature representations, and their optimisations, as part of the computational
arsenal.

3.7 Future optimisations

The optimisations proposed in Section 3.3.5 for the quadrature representation are
primarily concerned with the run-time performance of the generated code and the

strategies follow along similar lines as the ones already implemented and discussed
in Section 3.3. However, as the number of FEniCS users has increased, so has the
complexity of the problems that users are trying to solve. In Section 3.4 it was
demonstrated that, for some of the more complicated forms, the tensor contraction
representation can take hours to generate code for the given problem and that
the size of the generated code can become very large. For very complex forms,
typically nonlinear forms that are linearised automatically by UFL, similar trends
can be observed also for the quadrature representation. It is, therefore, necessary
to develop new strategies for the code generation process to reduce the generation
time and the size of the generated code.
Two possible approaches that could be investigated are outlined below. Currently,
the code to compute derivatives of, for instance, basis functions like the term
$\sum_{\beta=1}^{d} \sum_{\alpha_1=1}^{d} \frac{\partial X_{\alpha_1}}{\partial x_\beta} \frac{\partial \Phi_{i_1}(X^q)}{\partial X_{\alpha_1}}$ in (3.4) is located inside the loop over
basis function indices j and k, see for instance Figure 3.2. From the UFL input

UFL code
element = FiniteElement("Lagrange", triangle, 1)
u = TrialFunction(element)
v = TestFunction(element)
a = inner(grad(u),grad(v))*dx

the generated code for the loop over basis function indices will be

C++ code
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += (((K_00*FE0_D10[0][j] + K_10*FE0_D01[0][j]))*
                   ((K_00*FE0_D10[0][k] + K_10*FE0_D01[0][k])) +
                   ((K_01*FE0_D10[0][j] + K_11*FE0_D01[0][j]))*
                   ((K_01*FE0_D10[0][k] + K_11*FE0_D01[0][k])))*W1*det;
  }
}

which is almost identical to that in Figure 3.2. However, the only difference between
the code to compute the derivative of u and v is the loop index because u and v are
defined using the same finite element. Thus precomputing the derivatives outside
the loop will lead to a reduction in the code size (and in the number of operations
needed). The improved code for the given case would then become:

C++ code
double FE0_d0[3];
double FE0_d1[3];
for (unsigned int r = 0; r < 3; r++)
{
  FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
  FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}
for (unsigned int j = 0; j < 3; j++)
{
  for (unsigned int k = 0; k < 3; k++)
  {
    A[j*3 + k] += ((FE0_d0[j]*FE0_d0[k]) + (FE0_d1[j]*FE0_d1[k]))*W1*det;
  }
}

The drawback of this approach is that the optimisations discussed in Section 3.3,
particularly the -simplify optimisation, could be less effective as fewer common
expressions involving the geometry constants like K_00 will be present.
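
That hoisting the derivative computation out of the loop over j and k leaves the element tensor unchanged can be checked numerically. The Python sketch below ports the two C++ listings above; the function names are invented here, and the tabulated values `D10` and `D01` are assumed to be the reference derivatives of the linear Lagrange basis 1 - X - Y, X, Y on a triangle.

```python
def tensor_inline(K, D10, D01, W1, det):
    """Element tensor with derivative terms recomputed inside the j, k loop."""
    K_00, K_01, K_10, K_11 = K
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j*3 + k] += ((K_00*D10[j] + K_10*D01[j]) * (K_00*D10[k] + K_10*D01[k])
                           + (K_01*D10[j] + K_11*D01[j]) * (K_01*D10[k] + K_11*D01[k])) * W1 * det
    return A


def tensor_hoisted(K, D10, D01, W1, det):
    """Element tensor with derivative terms precomputed once per basis function."""
    K_00, K_01, K_10, K_11 = K
    d0 = [K_00*D10[r] + K_10*D01[r] for r in range(3)]
    d1 = [K_01*D10[r] + K_11*D01[r] for r in range(3)]
    A = [0.0] * 9
    for j in range(3):
        for k in range(3):
            A[j*3 + k] += (d0[j]*d0[k] + d1[j]*d1[k]) * W1 * det
    return A


# Assumed reference derivatives of the basis 1 - X - Y, X, Y.
D10 = [-1.0, 1.0, 0.0]
D01 = [-1.0, 0.0, 1.0]

# Reference triangle: identity inverse Jacobian, quadrature weight 1/2.
A_ref = tensor_hoisted((1.0, 0.0, 0.0, 1.0), D10, D01, 0.5, 1.0)

# An arbitrary (made-up) geometry to compare the two variants.
K = (2.0, 0.5, -1.0, 1.5)
A_a = tensor_inline(K, D10, D01, 0.5, 0.25)
A_b = tensor_hoisted(K, D10, D01, 0.5, 0.25)
max_diff = max(abs(a - b) for a, b in zip(A_a, A_b))
print(A_ref, max_diff)
```

For the reference triangle the result is the familiar stiffness matrix of the linear element, which gives an independent check on the hoisted variant.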
To reduce the size of the code even further (and possibly also improve run-time
performance), a linear algebra library, for instance Armadillo
(http://arma.sourceforge.net/), could be employed to perform block operations using
optimised BLAS. The generated code will then become:

C++ code
arma::vec FE0_d0(3);
arma::vec FE0_d1(3);
for (unsigned int r = 0; r < 3; r++)
{
  FE0_d0[r] = (K_00*FE0_D10[0][r] + K_10*FE0_D01[0][r]);
  FE0_d1[r] = (K_01*FE0_D10[0][r] + K_11*FE0_D01[0][r]);
}

arma::mat R = (FE0_d0*arma::trans(FE0_d0) + FE0_d1*arma::trans(FE0_d1))*W1*det;

// Copy values to A (R is symmetric, so its column-major storage
// coincides with the row-major layout of A)
double* p = R.memptr();
for (int r = 0; r < 9; r++)
  A[r] = p[r];

In the given case, the size of the code has not been reduced significantly. The
approach will, however, be particularly effective in situations involving, for instance, the
inverse operator in UFL. The inverse operator in UFL (only defined for 1 × 1, 2 × 2
and 3 × 3 matrices) is hardcoded as a function of the matrix components. This
leads to a very complex expression inside the loop over basis functions when
following the conventional quadrature approach, which can be substituted by a
simple function call to arma::inv.[4] The strategy outlined above could have a
negative influence on the run-time performance due to overhead in the linear
algebra library or by making it more difficult for the g++ compiler to perform
optimisations.

[4] This approach might not be feasible for linearisations of the inverse when using the automatic
differentiation functionality in UFL.



As demonstrated in this chapter, having multiple representations and optimisations
available when considering variational forms of different complexity is
an advantage in an automated framework as it is the combination of form com-
plexity, FFC optimisations and g++ compiler options that determines the run-time
performance of the generated code. Implementing the strategies outlined above
will, therefore, extend the applicability of FEniCS to a range of even more complex
problems than what can be handled at present.
4 Automation of discontinuous Galerkin
methods

Discontinuous Galerkin methods in space have emerged as a generalisation of finite
element methods for solving a range of partial differential equations. While historically
used for first-order hyperbolic equations, discontinuous Galerkin methods are
now applied to a range of hyperbolic, parabolic and elliptic problems. In addition
to the usual integration over cell volumes that characterises the conventional finite
element method, discontinuous Galerkin methods also involve the integration of
flux terms over interior facets. Discontinuous Galerkin methods exist in many vari-
ants, and are generally distinguished by the form of the flux on facets. A sample of
fluxes for elliptic problems can be found in Arnold et al. (2002).
Integration of functions on interior facets and evaluating flux terms, expressed
as jumps and averages of quantities of interest, adds complexity to the standard finite
element procedure. Therefore, it is obviously desirable to also handle these types
of formulations in an automated fashion as this permits the rapid prototyping
and testing of new methods. In this chapter the necessary extensions to the
FEniCS framework for implementing discontinuous Galerkin formulations are
presented. Specifically, new abstractions in UFL, FFC, UFC and DOLFIN are needed
in order to handle the automation of the characteristic features of discontinuous
Galerkin methods. The extended framework is then demonstrated through a
range of common problems, including the Poisson, advection–diffusion, Stokes and
biharmonic equations. The presentation of the extensions in this chapter is based
on the work in Ølgaard et al. (2008a).[1]
Although the functionality is implemented with discontinuous Galerkin meth-
ods in mind, it also allows a range of novel finite element methods that draw
upon discontinuous Galerkin methods to be handled automatically by the FEniCS
framework. These methods may not involve discontinuous function spaces but do
involve integration over interior facets. Such examples can be found in Hughes
et al. (2006); Wells and Dung (2007); Labeur and Wells (2007).

[1] In the original paper, the discontinuous Galerkin operators were implemented in the form language
of FFC which was later merged into UFL. The code examples from the paper have also been updated to
be compliant with FEniCS version 1.0.

[Diagram: two cells, $T^-$ and $T^+$, separated by the shared facet $S$.]

Figure 4.1: Two cells $T^+$ and $T^-$ sharing a common facet $S$.

4.1 Extending the framework to discontinuous Galerkin methods


Discontinuous Galerkin methods involve variational forms that include integrals
over the interior facets of a finite element mesh. Consider for example the following
bilinear form which may appear as a term in a discontinuous Galerkin formulation:
\[
a(u, v) := \sum_{S \in \Gamma_0} \int_S \llbracket u \rrbracket \llbracket v \rrbracket \, \mathrm{d}s, \tag{4.1}
\]

where $\Gamma_0$ denotes the set of all interior facets of the triangulation $\mathcal{T}_h$ and $\llbracket v \rrbracket$ denotes
the jump in the function value of $v$ across the facet $S$:

\[
\llbracket v \rrbracket = v^+ - v^-. \tag{4.2}
\]

Here, $v^+$ and $v^-$ denote the values of $v$ on the facet $S$ as seen from the two cells
$T^+$ and $T^-$ incident with $S$, respectively (see Figure 4.1). Note that each interior
facet is incident to exactly two cells which may be labelled $T^+$ and $T^-$. The union
of these two cells, $T = T^+ \cup T^-$, will be referred to as the macro cell.
In order to handle variational forms such as (4.1) in the FEniCS framework,
additional functionality is needed in a number of components. Obviously, UFL
must be extended to support the definition of integrals over interior facets. These
integrals may involve functions which can be evaluated on either of the two cells
incident to the interior facet. DOLFIN must be extended to support assembly of
multilinear forms containing interior facet integrals which in turn requires the
UFC interface to be extended with a new integral class. As UFC is only concerned
with the interface of this class, FFC must support code generation for interior
facet integrals defined using the UFL syntax. The following sections describe the
extensions that have been developed in each of these four components.

Mathematical notation        UFL notation
$f^+$, $f^-$                 f('+'), f('-')
$\langle f \rangle$          avg(f)
$\llbracket f \rrbracket$    jump(f), jump(f, n)

Table 4.1: Table of discontinuous Galerkin operators in UFL.

4.1.1 Extending the Unified Form Language


As illustrated in (4.1) and (4.2), a central concept of discontinuous Galerkin methods
is the possibility that an expression $f$ has two values, denoted $f^+$ and $f^-$, on an
interior facet $S$ when it is evaluated based on the two cells $T^+$ and $T^-$ which
are incident with $S$. The UFL notation f('+') and f('-') is used to restrict an
expression f to $T^+$ and $T^-$ respectively. It is possible to implement a number
of common operators for discontinuous Galerkin methods using these simple
definitions. For convenience, UFL provides the set of operators presented in
Table 4.1 to facilitate compact implementation of these methods. Two typical
operators are the average and jump operators, frequently denoted by $\langle f \rangle$ and $\llbracket f \rrbracket$,
respectively. The definition of the average operator is $\langle f \rangle = (f^+ + f^-)/2$ while the
definition of the jump operator is, in general, $\llbracket f \rrbracket = f^+ - f^-$ as shown in (4.2).
These two operators are available in UFL as avg(f) and jump(f). It is
common to use the outward unit normal, denoted by $n$, to the interior facet when
defining the jump operator such that for a scalar valued expression
$\llbracket f \rrbracket = f^+ n^+ + f^- n^-$, while for a vector or tensor valued expression
$\llbracket f \rrbracket = f^+ \cdot n^+ + f^- \cdot n^-$.
In both definitions, $n^+$ and $n^-$ denote the outward unit normal to the interior
facet, $S$, as seen from the two cells $T^+$ and $T^-$ respectively. These two definitions
are implemented in a single operator jump(f, n) by letting UFL automatically
determine the rank of the expression, f, and return the appropriate definition. It
should be pointed out that because UFL is an embedded language and because of
the restriction operators f('+') and f('-') a user can easily implement custom
operators for discontinuous Galerkin methods.
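
The operator definitions above can be illustrated on plain point values. The following Python sketch is not UFL code and does not act on UFL expressions; it only evaluates the formulas for given restricted values, with scalars as floats and vectors as lists. It assumes that the two outward normals on a shared facet satisfy n- = -n+, and the function names merely mirror the UFL operators.

```python
def avg(f_plus, f_minus):
    """Average operator: (f+ + f-)/2 for scalar point values."""
    return 0.5 * (f_plus + f_minus)


def jump(f_plus, f_minus, n_plus=None):
    """Jump operator evaluated on point values.

    Without a normal: f+ - f- (scalars only in this sketch).
    With the outward normal n+ of T+ (and n- = -n+):
      scalar f: the vector f+ n+ + f- n-,
      vector f: the scalar f+ . n+ + f- . n-.
    """
    if n_plus is None:
        return f_plus - f_minus
    n_minus = [-c for c in n_plus]
    if isinstance(f_plus, (int, float)):
        return [f_plus*a + f_minus*b for a, b in zip(n_plus, n_minus)]
    return (sum(a*b for a, b in zip(f_plus, n_plus))
            + sum(a*b for a, b in zip(f_minus, n_minus)))


n = [1.0, 0.0]                       # outward unit normal as seen from T+
print(avg(3.0, 1.0))                 # average of the two scalar traces
print(jump(3.0, 1.0))                # plain jump
print(jump(3.0, 1.0, n))             # scalar jump weighted by the normals
print(jump([1.0, 2.0], [0.5, 2.0], n))  # normal jump of a vector
```

In the vector case only the normal component contributes, so a field whose normal component is continuous across the facet has zero jump even if its tangential component differs.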
What remains, in order to express variational forms of the type shown in (4.1),
is to define a notation for the interior facet integral. Following the notation for the
domain and exterior boundary integrals introduced in Section 1.3.2, the integral over
interior facets $\int_{\Gamma_{0,k}} I \, \mathrm{d}S$ is simply written as I * dS(k) where k is the subdomain
number and I is a valid UFL expression. The extensions described above facilitate
compact implementation of a range of discontinuous Galerkin methods using a
syntax which is close to the mathematical notation. As a simple illustration, the
bilinear form in (4.1) is represented in UFL by:
UFL code
a = jump(u)*jump(v)*dS

4.1.2 Extending the Unified Form-assembly Code

As DOLFIN relies on the UFC interface when evaluating local finite element tensors,
the UFC interface must define the tabulate_tensor function also for interior facet
integrals. This function is provided by the class ufc::interior_facet_integral
and the interface is

C++ code
/// Tabulate the tensor for the contribution from a local interior facet
virtual void tabulate_tensor(double* A,
                             const double * const * w,
                             const cell& c0,
                             const cell& c1,
                             unsigned int facet0,
                             unsigned int facet1) const = 0;

where A is a pointer to an array which will hold the values of the local element
tensor and w contains nodal values of any coefficient functions present in the
integral. The two cells c0 and c1 correspond to the cells $T^+$ and $T^-$ incident with
the given facet S while facet0 and facet1 are the local indices of the facet S relative
to the cells c0 and c1 respectively. This is illustrated in Figure 1.6b, page 19, where
the local facet (edge) index of the shared facet is e0 relative to one cell while it
is e2 relative to the other cell. The implication of this aspect is elaborated in the
following section.

4.1.3 Extending the FEniCS Form Compiler

FFC must also be extended in order to generate code for the new integral class in
UFC to evaluate the local facet tensor. In Section 3.2.2, it was shown how the cell
tensor (element tensor) can be computed from the tensor representation

\[
A_{T,i} = \sum_{\alpha} A^{0}_{i\alpha} G_T^{\alpha}. \tag{4.3}
\]

Similarly, one may use the affine mappings (defined in Section 3.2.1) $F_{T^+}$ and $F_{T^-}$ to
obtain a tensor representation for the interior facet tensor $A_S$. However, depending
on the topology of the macro cell $T$, one obtains different tensor representations.
For a triangular mesh, each cell has three facets (edges) and there are thus 3 × 3 =
9 different topologies to consider; there are nine different ways in which two
edges can meet. Similarly, for a tetrahedral mesh, there are 4 × 4 = 16 different
topologies to consider. Notice that this is only true because FFC assumes the UFC
numbering convention of mesh entities, outlined in Section 1.3.4 and illustrated in
Figure 1.6b, which guarantees that two incident simplicial cells always agree on
the orientation of an incident facet. If no particular ordering of the mesh entities is
assumed, one needs to consider 3 × 3 × 2 = 18 different topologies for triangles
and 4 × 4 × 6 = 96 topologies for tetrahedra. This is because there are two different
ways to superimpose two edges, and there are six different ways to superimpose
two faces. The tensor representation for the interior facet tensor can then be written
in the form

\[
A_{S,i} = \sum_{\alpha} A^{0, f^+(S), f^-(S)}_{i\alpha} G_{T(S)}^{\alpha}, \tag{4.4}
\]

where $f^+$ and $f^-$ denote the local numbers of the two facets that meet at $S$ relative
to the two cells $T^+$ and $T^-$ respectively. Note that the geometry tensor $G_T^{\alpha}$ in (4.3)
involves the mapping from the reference cell and differs from the geometry tensor
$G_{T(S)}^{\alpha}$ in (4.4), which may involve the mapping from the reference cell and the
mapping from the reference facet. The reference tensor $A^{0, f^+, f^-}$ is precomputed
for each facet–facet combination $(f^+, f^-)$ and a run-time decision must be made
as to which reference tensor should be contracted with the geometry tensor.
The FFC machinery which generates code for each facet–facet combination
based on UFL expressions is largely unaffected by the extensions to discontinu-
ous Galerkin methods. As a consequence, the quadrature representation can be
extended in a similar fashion taking into account the differences between the two
representations described in Section 3.2. Furthermore, the optimisations presented
in Section 3.3 also apply to variational forms containing interior facet integrals.
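
The facet-pair bookkeeping described in this section can be sketched in Python. The code below only illustrates the counting argument and the run-time lookup; it is not generated FFC code. All names are invented here, and the reference tensor entries are placeholder numbers rather than real precomputed values.

```python
from itertools import product


def facet_pairs(facets_per_cell):
    """All combinations (f+, f-) of local facet numbers of the two cells."""
    return list(product(range(facets_per_cell), repeat=2))


def count_topologies(facets_per_cell, orientations=1):
    """Number of macro-cell topologies; orientations > 1 models the case
    where no particular ordering of the mesh entities is assumed."""
    return len(facet_pairs(facets_per_cell)) * orientations


def contract(A0, G):
    """Contract a reference tensor A0[i][alpha] with a geometry tensor G[alpha]."""
    return [sum(a*g for a, g in zip(row, G)) for row in A0]


def interior_facet_tensor(reference_tensors, facet0, facet1, G):
    """Run-time decision: pick the reference tensor for the pair (f+, f-)."""
    return contract(reference_tensors[(facet0, facet1)], G)


# Placeholder reference tensors: one (1 x 2) tensor per facet pair of a triangle.
reference_tensors = {
    (fp, fm): [[float(fp), float(fm)]] for fp, fm in facet_pairs(3)
}
G = [2.0, 3.0]  # placeholder geometry tensor

print(count_topologies(3), count_topologies(4))        # ordered mesh entities
print(count_topologies(3, 2), count_topologies(4, 6))  # no assumed ordering
print(interior_facet_tensor(reference_tensors, 1, 2, G))
```

The lookup keyed on (facet0, facet1) corresponds to the two local facet indices passed to tabulate_tensor in the UFC interface.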

4.1.4 Extending DOLFIN

To assemble the global sparse tensor A for variational forms that contain integrals
over interior facets as in (4.1), one may extend the standard assembly algorithm
over the cells of the computational mesh (see Algorithm 1, page 27) by including
an iteration over the interior facets of the mesh. The approach is described for
the bilinear form in (4.1) where, for ease of notation, it is assumed that u, v ∈ V.
Adopting the notation from Section 1.3.5 the tensor A which arises from assembling
the bilinear form in (4.1) can be expressed as:
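\[
A_I = a(\phi_{I_2}, \phi_{I_1}) = \sum_S a_S(\phi_{I_2}, \phi_{I_1}) = \sum_S \int_S \llbracket u \rrbracket \llbracket v \rrbracket \, \mathrm{d}s, \tag{4.5}
\]

where $I = (I_1, I_2)$ is a multi-index and $\{\phi_k\}_{k=1}^{N}$ is a global (possibly discontinuous)
basis for $V$.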
To assemble the global sparse tensor $A$ efficiently by iterating over the interior
facets of the mesh, a local-to-global mapping that maps the basis functions on
the local facet $S$ to the set of global basis functions is needed. This mapping is
constructed by considering two cells $T^+$ and $T^-$ sharing a common facet $S$ as
shown in Figure 4.1. Let $\{\phi_k^{T^+}\}_{k=1}^{n}$ and $\{\phi_k^{T^-}\}_{k=1}^{n}$ denote the local finite element
basis on $T^+$ and $T^-$ respectively. These local basis functions are now extended to
the macro cell $T$ by the following construction:

\[
\bar{\phi}_k^T(x) =
\begin{cases}
\phi_k^{T^+}(x), & k = 1, 2, \ldots, n, \quad x \in T^+, \\
0, & k = 1, 2, \ldots, n, \quad x \in T^-, \\
0, & k = n+1, n+2, \ldots, 2n, \quad x \in T^+, \\
\phi_{k-n}^{T^-}(x), & k = n+1, n+2, \ldots, 2n, \quad x \in T^-.
\end{cases} \tag{4.6}
\]

The local basis functions on $T^+$ and $T^-$ are thus extended to $T$ by zero to obtain a
local finite element space, $\{\bar{\phi}_k^T\}_{k=1}^{2n}$, on $T$ of dimension $2n$. Recall from Section 1.3.5
that, for each $T \in \mathcal{T}_h$, $\iota_T^j : [1, n] \to [1, N]$ denotes the local-to-global mapping for
each discrete function space $V_j$. The local-to-global mapping for $T$ (or $S$), $\iota_T^j$, can
then be obtained by the construction (4.6) such that

\[
\iota_T^j(1) = \iota_{T^+}^j(1), \; \ldots, \; \iota_T^j(n) = \iota_{T^+}^j(n), \;
\iota_T^j(n+1) = \iota_{T^-}^j(1), \; \ldots, \; \iota_T^j(2n) = \iota_{T^-}^j(n). \tag{4.7}
\]

The local interior facet tensor $A_S$ can now be defined. Consider first the case
when $\iota_T^j$ is an injective mapping and note that $\iota_T^j$ is injective when the ranges of
$\iota_{T^+}^j$ and $\iota_{T^-}^j$ are disjoint (which is the case for discontinuous elements). Continuing
from (4.5), the tensor $A$ can be computed from

\[
A_I = \sum_S a_S\left(\phi_{I_2}, \phi_{I_1}\right) = \sum_S a_S\left(\bar{\phi}^{T}_{\iota_T^{-1}(I_2)}, \bar{\phi}^{T}_{\iota_T^{-1}(I_1)}\right) = \sum_S A_{S,\,\iota_T^{-1}(I_1),\,\iota_T^{-1}(I_2)}, \tag{4.8}
\]

where the local interior facet tensor $A_S$ is thus defined by

\[
A_{S,i} = a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1}\right), \tag{4.9}
\]

where $i = (i_1, i_2)$ is a multi-index. Note that the size of $A_S$, due to the construction
in (4.6), is $2n \times 2n$ and not $n \times n$ as would be the case for a local cell tensor $A_T$.
Similar to (1.11) and (1.12), the collective local-to-global mapping for each $S \in \Gamma_0$ is
defined as

\[
\iota_T(i) = \left(\iota_T^1(i_1), \iota_T^2(i_2)\right) \quad \forall\, i \in \mathcal{I}_T, \tag{4.10}
\]

where $\mathcal{I}_T$ is the index set

\[
\mathcal{I}_T = \prod_{j=1}^{2} [1, 2n] = \left\{(1, 1), (1, 2), \ldots, (2n, 2n - 1), (2n, 2n)\right\}. \tag{4.11}
\]

The global tensor A can now be computed by Algorithm 2.


Algorithm 2 Assembly algorithm over interior facets.

A = 0
for $S \in \Gamma_0$
    (1) Compute $\iota_T$
    (2) Compute $A_S$
    (3) Add $A_S$ to $A$ according to $\iota_T$:
        for $i \in \mathcal{I}_T$
            $A_{\iota_T(i)}$ += $A_{S,i}$
        end for
end for

Now, if $\iota_T^j$ is not injective (two local basis functions are restrictions of the same
global basis function), which may happen if the basis functions are continuous, one
may still assemble the global tensor $A$ by Algorithm 2 and compute the interior
facet tensor according to (4.9). To see this, assume that $\iota_T^1(i_1) = \iota_T^1(i_1') = I_1$ for
some $i_1 \neq i_1'$. It then follows that the entry $A_{I_1, \iota_T^2(i_2)}$ will be a sum of the two terms
$A_{S,i_1,i_2}$ and $A_{S,i_1',i_2}$ (and possibly other terms). Since $a_S$ is bilinear, we have

\[
\begin{aligned}
A_{S,i_1,i_2} + A_{S,i_1',i_2} &= a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1}\right) + a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1'}\right) \\
&= a_S\left(\bar{\phi}^{T}_{i_2}, \bar{\phi}^{T}_{i_1} + \bar{\phi}^{T}_{i_1'}\right) = a_S\left(\bar{\phi}^{T}_{i_2}, \phi_{I_1}\right),
\end{aligned} \tag{4.12}
\]

where by the construction (4.6) $\phi_{I_1}$ is the global basis function that both $\bar{\phi}^{T}_{i_1}$ and $\bar{\phi}^{T}_{i_1'}$
are mapped to.
DOLFIN implements Algorithm 2 in the assemble function. To compute the
local contribution aS , DOLFIN calls the tabulate_tensor function for interior facet
integrals using the interface described in Section 4.1.2. For each discrete function
space DOLFIN calls the tabulate_dofs function, see Section 1.3.4, on the cells $T^+$
and $T^-$ to construct the local-to-global mapping $\iota_T^j$ from (4.7). These mappings
are then used by DOLFIN to construct the collective local-to-global mapping $\iota_T$
from (4.10).
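
The assembly loop over interior facets can be illustrated with a minimal Python sketch. It is not the DOLFIN implementation: the function names, the dictionary used as sparse storage and the toy local tensor (one constant basis function per cell, facets of unit measure, the form ⟦u⟧⟦v⟧) are assumptions made for illustration only.

```python
from collections import defaultdict


def macro_dof_map(dofs_plus, dofs_minus):
    """Local-to-global map of the macro cell, cf. (4.7): T+ dofs first, then T-."""
    return list(dofs_plus) + list(dofs_minus)


def assemble_interior_facets(facets, tabulate_facet_tensor):
    """Accumulate local facet tensors into a sparse global tensor (Algorithm 2).

    facets: iterable of (dofs_plus, dofs_minus) pairs, one per interior facet.
    tabulate_facet_tensor: hypothetical callback returning the 2n x 2n local
        tensor A_S as a list of lists.
    """
    A = defaultdict(float)  # sparse storage: (row, column) -> value
    for dofs_plus, dofs_minus in facets:
        iota = macro_dof_map(dofs_plus, dofs_minus)          # step (1)
        A_S = tabulate_facet_tensor(dofs_plus, dofs_minus)   # step (2)
        for i1, row in enumerate(A_S):                       # step (3)
            for i2, value in enumerate(row):
                A[(iota[i1], iota[i2])] += value
    return dict(A)


def jump_tensor(dofs_plus, dofs_minus):
    """Toy A_S for [[u]][[v]]: constant basis functions, facet of unit measure.

    The jump of a basis function is +1 when it lives on T+ and -1 on T-.
    """
    jumps = [1.0] * len(dofs_plus) + [-1.0] * len(dofs_minus)
    return [[ji * jj for jj in jumps] for ji in jumps]


# Three cells in a row (global dofs 0, 1, 2) joined by two interior facets.
facets = [([0], [1]), ([1], [2])]
A = assemble_interior_facets(facets, jump_tensor)
```

Because contributions are added with `+=`, the same loop also covers the non-injective case discussed around (4.12): entries mapped to the same global index are simply summed.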

4.2 Examples
The developments described in the previous section extend the applicability of the
FEniCS framework to a new range of problems. In this section, it is demonstrated
how the extensions make it possible to apply discontinuous Galerkin formulations
to a number of problems. The examples are presented in the usual form: find
u ∈ V such that
a (u, v) = L (v) ∀ v ∈ V̂, (4.13)
where V is the trial space and V̂ is the test space, a (u, v) and L (v) denote the
bilinear form and linear form respectively. Some of the examples are presented as
complete DOLFIN solvers while others only present the UFL input for the bilinear
and linear forms of the corresponding problem. For all examples, the test and trial
functions are assumed to come from the same function space, that is, V̂ = V.

4.2.1 The Poisson equation

Consider the function space V,


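\[
V := \left\{ v \in L^2(\Omega) : v|_T \in P_k(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{4.14}
\]

where $P_k(T)$ denotes the space of polynomials of degree $k$ on the element $T$. The
bilinear and linear forms for the Poisson equation with homogeneous Dirichlet
boundary conditions, enforced in a weak sense, read (Arnold et al., 2002)

\[
\begin{aligned}
a(u, v) := {} & \int_\Omega \nabla u \cdot \nabla v \, dx - \int_{\Gamma_0} \llbracket u \rrbracket \cdot \langle \nabla v \rangle \, ds - \int_{\Gamma_0} \langle \nabla u \rangle \cdot \llbracket v \rrbracket \, ds \\
& - \int_{\partial\Omega} \llbracket u \rrbracket \cdot \nabla v \, ds - \int_{\partial\Omega} \nabla u \cdot \llbracket v \rrbracket \, ds + \int_{\Gamma_0} \frac{\alpha}{h} \llbracket u \rrbracket \cdot \llbracket v \rrbracket \, ds + \int_{\partial\Omega} \frac{\alpha}{h} u v \, ds
\end{aligned} \tag{4.15}
\]

and

\[
L(v) := \int_\Omega f v \, dx, \tag{4.16}
\]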

where $\alpha > 0$ is a penalty parameter and $h$ is a measure for the cell size defined as
$h = (h^+ + h^-)/2$ with $h^+$ and $h^-$ denoting the cell size for the two cells, $T^+$ and
$T^-$ respectively, incident with the given interior facet. Due to the term involving
the penalty parameter $\alpha$ this formulation is commonly referred to as an interior
penalty (IP) formulation. The size of a cell is defined here as twice the circumradius.
The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket v \rrbracket = v^+ n^+ + v^- n^-$ and
$\langle \nabla v \rangle = (\nabla v^+ + \nabla v^-)/2$ on the set of interior facets, $\Gamma_0$, and $\llbracket v \rrbracket = v n$ on $\partial\Omega$.
A domain and source term identical to those used in Section 1.3.5 are considered,
that is, $\Omega = [0, 1] \times [0, 1]$ and $f = 8\pi^2 \sin(2\pi x) \sin(2\pi y)$. The corresponding
DOLFIN solver for this problem is shown in Figure 4.2 for linear polynomials on
triangular elements. Note in particular how the form and syntax of the definitions
of the bilinear and linear forms (a and L) resemble closely the mathematical notation
in (4.15) and (4.16). Also, note the close resemblance with the code in Figure 1.10,
page 25, for the continuous solution. This demonstrates the ease of switching
between formulations for a given problem by only changing the definitions of the
bilinear and linear forms in the computational setup. The functions FacetNormal
and CellSize are convenience functions implemented in DOLFIN according to the
definitions above. Because the solution is computed on discontinuous elements it is
common to project the solution onto a continuous basis for visualisation. This can
be accomplished easily by using the project function in DOLFIN. The computed
solution for the Poisson problem, projected onto a piecewise linear basis, is seen in
Figure 4.3, which is almost identical to the solution presented in Figure 1.9, page 24,
for the continuous case.
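
As a small sanity check on the source term, the following sketch verifies numerically that f = 8π² sin(2πx) sin(2πy) equals −∇²u for u = sin(2πx) sin(2πy), which vanishes on the boundary of the unit square. The exact solution is not stated in this section; it is inferred here from the source term, and the finite-difference step size and sample point are arbitrary choices.

```python
import math


def u_exact(x, y):
    """Candidate exact solution (an assumption inferred from the source term)."""
    return math.sin(2 * math.pi * x) * math.sin(2 * math.pi * y)


def f_source(x, y):
    """Source term used in the Poisson example."""
    return 8 * math.pi**2 * math.sin(2 * math.pi * x) * math.sin(2 * math.pi * y)


def minus_laplacian(u, x, y, eps=1e-4):
    """Second-order central finite-difference approximation of -Laplace(u)."""
    d2x = (u(x + eps, y) - 2 * u(x, y) + u(x - eps, y)) / eps**2
    d2y = (u(x, y + eps) - 2 * u(x, y) + u(x, y - eps)) / eps**2
    return -(d2x + d2y)


point = (0.3, 0.7)
residual = abs(minus_laplacian(u_exact, *point) - f_source(*point))
boundary_value = abs(u_exact(1.0, 0.4))
```

The residual is limited by the truncation and round-off error of the difference quotient rather than by any mismatch between u and f.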

4.2.2 Steady state advection–diffusion equation


Next, the advection–diffusion equation is considered with Dirichlet boundary
conditions on inflow boundaries and full upwinding of the advective flux at
element facets. Using the same definition of V as in (4.14), the bilinear and linear
forms read
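\[
\begin{aligned}
a(u, v) := {} & \int_\Omega (\kappa \nabla u - b u) \cdot \nabla v \, dx + \int_{\Gamma_0} b u^\star \cdot \llbracket v \rrbracket \, ds + \int_{\partial\Omega} b u^\star \cdot \llbracket v \rrbracket \, ds \\
& - \int_{\Gamma_0} \kappa \langle \nabla u \rangle \cdot \llbracket v \rrbracket \, ds - \int_{\Gamma_0} \kappa \llbracket u \rrbracket \cdot \langle \nabla v \rangle \, ds + \int_{\Gamma_0} \frac{\kappa\alpha}{h} \llbracket u \rrbracket \cdot \llbracket v \rrbracket \, ds \\
& + \int_{\Gamma_D} \frac{\alpha}{h} u v \, ds - \int_{\Gamma_D} \llbracket u \rrbracket \cdot \nabla v \, ds - \int_{\Gamma_D} \nabla u \cdot \llbracket v \rrbracket \, ds
\end{aligned} \tag{4.17}
\]

and

\[
L(v) := \int_{\Gamma_D} \frac{\alpha}{h} g v \, ds - \int_{\Gamma_D} \llbracket g \rrbracket \cdot \nabla v \, ds - \int_{\Gamma_D} \nabla g \cdot \llbracket v \rrbracket \, ds, \tag{4.18}
\]

where the vector $b$ is a given velocity field, $u^\star$ is equal to $u$ restricted to the upwind
side of a facet,

\[
u^\star =
\begin{cases}
u^+ & b \cdot n^+ > 0, \\
u^- & b \cdot n^+ < 0,
\end{cases} \tag{4.19}
\]

$\kappa$ is the diffusion coefficient, and $\Gamma_D$ is the part of the boundary where the Dirichlet
condition $u = g$ is applied. The definitions of the jump and average operators and
the parameters $h$ and $\alpha$ are the same as for the Poisson equation.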
Again, the unit square, $\Omega = [0, 1] \times [0, 1]$, is considered with $g = \sin(5\pi y)$
applied on the boundary at $x = 1$ and a constant velocity field of $b = (-3, -2)$.
The diffusion coefficient $\kappa$ is set to zero in which case the DOLFIN solver for this
problem can be implemented as shown in Figure 4.4 for linear triangular elements.
The implementation is again a reflection of the mathematical formulation with
a small exception regarding the upwind value $u^\star$. In the code, the variable bn
is computed as $bn = (b \cdot n + |b \cdot n|)/2$. Relative to the two elements $T^+$ and $T^-$
associated with a given facet it evaluates to:

\[
bn|_{T^+} =
\begin{cases}
b \cdot n^+ & b \cdot n^+ > 0, \\
0 & b \cdot n^+ < 0,
\end{cases}
\quad \text{and} \quad
bn|_{T^-} =
\begin{cases}
b \cdot n^- & b \cdot n^- > 0, \\
0 & b \cdot n^- < 0.
\end{cases} \tag{4.20}
\]
Python code
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(32, 32)
V = FunctionSpace(mesh, "DG", 1)

# Define test and trial functions
v = TestFunction(V)
u = TrialFunction(V)

# Define normal component, mesh size, penalty parameter and right-hand side
n = FacetNormal(mesh)
h = CellSize(mesh)
h_avg = (h('+') + h('-'))/2
alpha = 4.0
x = V.cell().x
f = 8*pi**2*sin(2*pi*x[0])*sin(2*pi*x[1])

# Define variational problem
a = dot(grad(u), grad(v))*dx \
  - dot(jump(u, n), avg(grad(v)))*dS \
  - dot(avg(grad(u)), jump(v, n))*dS \
  - dot(u*n, grad(v))*ds \
  - dot(grad(u), v*n)*ds \
  + alpha/h_avg*dot(jump(u, n), jump(v, n))*dS \
  + alpha/h*u*v*ds
L = f*v*dx

# Compute solution
u = Function(V)
solve(a == L, u)

# Project solution to piecewise linears
u_proj = project(u)

# Plot solution
plot(u_proj, interactive=True)

Figure 4.2: Complete DOLFIN solver for the interior penalty method applied to the
Poisson equation on a unit square using k = 1.
Figure 4.3: Computed solution of the Poisson problem. The solution has been
projected onto a piecewise linear basis for visualisation. The warped scalar field u
has been scaled by a factor of 0.5.

Recalling that jump(v) in UFL is equivalent to $v^+ - v^-$, the line in the code concerning
upwinding on interior facets, dot(jump(v), bn('+')*u('+') - bn('-')*u('-')),
is equivalent to:

\[
\left(v^+ - v^-\right)\left(b \cdot n^+ u^+ - b \cdot n^- u^-\right) =
v^+ b \cdot n^+ u^+ - v^- b \cdot n^+ u^+ - v^+ b \cdot n^- u^- + v^- b \cdot n^- u^-. \tag{4.21}
\]

Here $b \cdot n^+$ and $b \cdot n^-$ stand for the values of bn on $T^+$ and $T^-$ given in (4.20), so one
of the two is zero on any facet and (4.21) is identical to the $b u^\star \cdot \llbracket v \rrbracket$ term in (4.17)
when the definition of $u^\star$ in (4.19) is used. The implementation of the upwind
value is a good example of the flexibility offered by the operators in Table 4.1 for
implementing more complex expressions. Also note how the Dirichlet boundary
conditions are applied to only the $\Gamma_D$ part of the exterior boundary. This is achieved
in the code by first creating a class DirichletBoundary, see Section 1.3.5, which
overloads the inside function to return true when $x = 1$. Then, a FacetFunction
is created which holds an integer value, a marker, for all facets of the mesh and the
value for all facets is initially set to 0. The DirichletBoundary class is then used to
mark the facets which are located at $x = 1$ by 1. The variable boundary_facets now
contains the index of all facets and the associated value (0 or 1) which indicates if the
facet is part of the $\Gamma_D$ boundary or not. This variable is used to redefine the Measure
object ds to let *ds(1) and *ds(0) in the forms, a and L, indicate integration over
the $\Gamma_D$ and $\partial\Omega \setminus \Gamma_D$ parts of the boundary respectively, see Section 1.3.2. The
computed solution to this problem, projected onto a piecewise linear basis, is seen
in Figure 4.5.
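
The equivalence between the bn-based expression in the code and the upwind term in (4.17) can also be checked numerically. The sketch below is plain Python rather than UFL; the function names are hypothetical, scalar values stand in for the traces of u and v on a facet, and n⁻ = −n⁺ is assumed.

```python
import random


def bn(b, n):
    """(b.n + |b.n|)/2: equals b.n on outflow sides and 0 on inflow sides."""
    bdotn = b[0] * n[0] + b[1] * n[1]
    return (bdotn + abs(bdotn)) / 2.0


def upwind_flux_code(b, n_plus, u_plus, u_minus, v_plus, v_minus):
    """Facet integrand as written in the solver: jump(v)*(bn+ u+ - bn- u-)."""
    n_minus = (-n_plus[0], -n_plus[1])
    return (v_plus - v_minus) * (bn(b, n_plus) * u_plus - bn(b, n_minus) * u_minus)


def upwind_flux_math(b, n_plus, u_plus, u_minus, v_plus, v_minus):
    """Facet integrand b u* . [[v]] with u* taken from the upwind side, cf. (4.19)."""
    bdotn_plus = b[0] * n_plus[0] + b[1] * n_plus[1]
    u_star = u_plus if bdotn_plus > 0 else u_minus
    # [[v]] = v+ n+ + v- n- = (v+ - v-) n+ because n- = -n+
    return u_star * bdotn_plus * (v_plus - v_minus)


random.seed(0)
b = (-3.0, -2.0)  # velocity field used in the example
max_difference = 0.0
for n_plus in [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.6, -0.8)]:
    values = [random.uniform(-1.0, 1.0) for _ in range(4)]
    difference = abs(upwind_flux_code(b, n_plus, *values)
                     - upwind_flux_math(b, n_plus, *values))
    max_difference = max(max_difference, difference)
```

The two integrands agree up to round-off for every facet orientation tried, which is the content of (4.21).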
Python code
from dolfin import *

# Create mesh and define function space
mesh = UnitSquare(64, 64)
V = FunctionSpace(mesh, "DG", 1)

# Define test and trial functions
v = TestFunction(V)
u = TrialFunction(V)

# Define normal component, mesh size, penalty parameter and velocity
n = FacetNormal(mesh)
h = CellSize(mesh)
alpha = 4.0
b = Constant((-3.0, -2.0))

# Define Dirichlet subdomain and value.
class DirichletBoundary(SubDomain):
    def inside(self, x, on_boundary):
        return abs(x[0] - 1.0) < DOLFIN_EPS and on_boundary
boundary_facets = FacetFunction("uint", mesh, 0)
DirichletBoundary().mark(boundary_facets, 1)
ds = ds[boundary_facets]
g = sin(5.0*pi*V.cell().x[1])

# bn = bn if outflow_facet else 0
bn = (dot(b, n) + abs(dot(b, n)))/2.0

# Define forms
a = dot(-b*u, grad(v))*dx \
  + dot(bn('+')*u('+') - bn('-')*u('-'), jump(v))*dS + dot(bn*u, v)*ds(0) \
  + (alpha/h*u*v - dot(grad(u), v*n) - dot(u*n, grad(v)))*ds(1)
L = (alpha/h*g*v - dot(g*n, grad(v)) - dot(grad(g), v*n))*ds(1)

# Compute solution
u = Function(V)
solve(a == L, u)

# Project solution to piecewise linears and plot
u_proj = project(u)
plot(u_proj, interactive=True)

Figure 4.4: Complete DOLFIN solver for the advection–diffusion equation with
diffusion coefficient κ = 0.
Figure 4.5: Computed solution of the advection–diffusion problem. The solution
has been projected onto a piecewise linear basis for visualisation. The warped
scalar field u has been scaled by a factor of 0.5.

4.2.3 The Stokes equations

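The Stokes equations with a mixture of continuous and discontinuous functions, as
well as basis functions with possibly varying polynomial orders are now addressed.
Consider the function spaces $W$ and $Q$,

\[
W := \left\{ w \in \left[L^2(\Omega)\right]^d : w_i \in P_k(T) \ \forall\, T \in \mathcal{T}_h, \ 1 \leq i \leq d \right\}, \tag{4.22}
\]

\[
Q := \left\{ q \in H^1(\Omega) : q \in P_j(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{4.23}
\]

where $\Omega$ is a bounded domain in $\mathbb{R}^d$ with $d \geq 2$. Setting $V = W \times Q$ and $u = 0$ on
$\partial\Omega$, particular bilinear and linear forms for the Stokes equation read (Baker et al.,
1990)

\[
\begin{aligned}
a(u, p; v, q) := {} & \int_\Omega \nu \nabla u : \nabla v \, dx + \int_\Omega \nabla p \cdot v \, dx - \int_\Omega u \cdot \nabla q \, dx \\
& + \int_{\Gamma_0} \llbracket u \cdot n \rrbracket q \, ds + \int_{\partial\Omega} u \cdot n \, q \, ds \\
& - \int_{\Gamma_0} \nu \langle \nabla u \rangle : \llbracket v \rrbracket \, ds - \int_{\Gamma_0} \nu \llbracket u \rrbracket : \langle \nabla v \rangle \, ds - \int_{\partial\Omega} \nu \nabla u : \llbracket v \rrbracket \, ds \\
& - \int_{\partial\Omega} \nu \llbracket u \rrbracket : \nabla v \, ds + \int_{\Gamma_0} \frac{\nu\alpha}{h} \llbracket u \rrbracket : \llbracket v \rrbracket \, ds + \int_{\partial\Omega} \frac{\nu\alpha}{h} \llbracket u \rrbracket : \llbracket v \rrbracket \, ds,
\end{aligned} \tag{4.24}
\]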
UFL code
# Create mixed function space
W = VectorElement("Discontinuous Lagrange", "triangle", 1)
Q = FiniteElement("Lagrange", "triangle", 1)
element = W * Q

# Define test and trial functions
(v, q) = TestFunctions(element)
(u, p) = TrialFunctions(element)

# Define normal component, mesh size, penalty parameter and right-hand side
n = element.cell().n
h = 2.0*triangle.circumradius
h_avg = (h('+') + h('-'))/2
alpha = 4.0
f = Coefficient(W)

# Define forms
a = inner(grad(u), grad(v))*dx + inner(grad(p), v)*dx - inner(u, grad(q))*dx \
  + inner(jump(u, n), q('+'))*dS \
  + inner(u, n)*q*ds \
  - inner(dot(avg(grad(u)), n('+')), jump(v))*dS \
  - inner(jump(u), dot(avg(grad(v)), n('+')))*dS \
  - inner(dot(grad(u), n), v)*ds \
  - inner(u, dot(grad(v), n))*ds \
  + alpha/h_avg*inner(jump(u), jump(v))*dS \
  + alpha/h*inner(u, v)*ds

L = dot(f, v)*dx

Figure 4.6: UFL input for the Stokes equation using k = 1 and ν = 1.0.

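and

\[
L(v, q) := \int_\Omega f \cdot v \, dx. \tag{4.25}
\]

The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket v \rrbracket = v^+ \otimes n^+ + v^- \otimes n^-$,
$\llbracket v \cdot n \rrbracket = v^+ \cdot n^+ + v^- \cdot n^-$ and $\langle \nabla v \rangle = (\nabla v^+ + \nabla v^-)/2$ on $\Gamma_0$ and $\llbracket v \rrbracket = v \otimes n$ on
$\partial\Omega$. The UFL input in two dimensions for this problem with $k = j = 1$, as proposed
in Baker et al. (1990), and the kinematic viscosity $\nu = 1.0$ is shown in Figure 4.6.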
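
The two jump operators used for vector-valued functions are related: the trace of the tensor jump v⁺ ⊗ n⁺ + v⁻ ⊗ n⁻ equals the scalar jump v⁺ · n⁺ + v⁻ · n⁻. The sketch below illustrates this in plain Python with nested lists; the helper names and the sample values are hypothetical and n⁻ = −n⁺ is assumed.

```python
def outer(a, b):
    """Outer product a (x) b as a nested list."""
    return [[ai * bj for bj in b] for ai in a]


def inner(a, b):
    """Euclidean inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def tensor_jump(v_plus, v_minus, n_plus):
    """[[v]] = v+ (x) n+ + v- (x) n- with n- = -n+."""
    n_minus = [-c for c in n_plus]
    plus, minus = outer(v_plus, n_plus), outer(v_minus, n_minus)
    return [[p + m for p, m in zip(rp, rm)] for rp, rm in zip(plus, minus)]


def normal_jump(v_plus, v_minus, n_plus):
    """[[v . n]] = v+ . n+ + v- . n- with n- = -n+."""
    n_minus = [-c for c in n_plus]
    return inner(v_plus, n_plus) + inner(v_minus, n_minus)


n_plus = [0.6, 0.8]
v_plus, v_minus = [1.0, -2.0], [0.5, 3.0]
J = tensor_jump(v_plus, v_minus, n_plus)
trace_of_jump = J[0][0] + J[1][1]
scalar_jump = normal_jump(v_plus, v_minus, n_plus)

# A function that is continuous across the facet has a vanishing tensor jump.
continuous_jump = tensor_jump(v_plus, v_plus, n_plus)
```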

4.2.4 Biharmonic equation

Classically, Galerkin methods for the biharmonic equation seek approximate solutions
in a subspace of $H^2(\Omega)$. However, such functions are difficult to construct in
a finite element context. Based on discontinuous Galerkin principles, methods have
been developed which utilise functions from $H^1(\Omega)$ (Engel et al., 2002; Wells and
Dung, 2007). Rather than considering jumps in functions across element boundaries,
terms involving the jump in the normal derivative across element boundaries are
introduced. Unlike fully discontinuous approaches, this method does not involve
double-degrees of freedom on element edges and, therefore, does not lead to the
significant increase in the number of degrees of freedom relative to conventional
methods. Consider the continuous function space
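\[
V := \left\{ v \in H_0^1(\Omega) : v \in P_k(T) \ \forall\, T \in \mathcal{T}_h \right\}. \tag{4.26}
\]

The bilinear and linear forms for the biharmonic equation, with the boundary
conditions $u = 0$ on $\partial\Omega$ and $\nabla^2 u = 0$ on $\partial\Omega$, read

\[
\begin{aligned}
a(u, v) := {} & \int_\Omega \nabla^2 u \, \nabla^2 v \, dx - \int_{\Gamma_0} \llbracket \nabla u \rrbracket \cdot \langle \nabla^2 v \rangle \, ds - \int_{\Gamma_0} \langle \nabla^2 u \rangle \cdot \llbracket \nabla v \rrbracket \, ds \\
& + \int_{\Gamma_0} \frac{\alpha}{h} \llbracket \nabla u \rrbracket \cdot \llbracket \nabla v \rrbracket \, ds,
\end{aligned} \tag{4.27}
\]

\[
L(v) := \int_\Omega f v \, dx. \tag{4.28}
\]

The jump $\llbracket \cdot \rrbracket$ and average $\langle \cdot \rangle$ operators are defined as $\llbracket \nabla v \rrbracket = \nabla v^+ \cdot n^+ + \nabla v^- \cdot n^-$
and $\langle \nabla^2 v \rangle = (\nabla^2 v^+ + \nabla^2 v^-)/2$ on $\Gamma_0$. The UFL input for this problem with $k = 4$
is shown in Figure 4.7.

As an example for the biharmonic equation consider the domain $\Omega = [0, 1] \times
[0, 1] \times [0, 1]$ with $f = 9\pi^4 \sin(\pi x) \sin(\pi y) \sin(\pi z)$, in which case the exact solution
is $u = \sin(\pi x) \sin(\pi y) \sin(\pi z)$. The observed convergence behaviour for this problem
is illustrated in Figure 4.8 for various polynomial orders. As predicted by a priori
estimates, a convergence rate of $k + 1$ is observed for $k > 2$ (Engel et al., 2002), and
a rate of $k$ for polynomial order $k = 2$ (Wells and Dung, 2007).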
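
Observed convergence rates of this kind are commonly extracted from errors on successively refined meshes via the ratio log(e₁/e₂)/log(h₁/h₂). The sketch below shows the calculation; the error values are synthetic, generated from e = C h^(k+1) with k = 4 and an arbitrary constant, and are not the data behind Figure 4.8.

```python
import math


def observed_rates(mesh_sizes, errors):
    """Convergence rate between consecutive refinement levels."""
    rates = []
    for i in range(len(errors) - 1):
        rate = math.log(errors[i] / errors[i + 1]) \
            / math.log(mesh_sizes[i] / mesh_sizes[i + 1])
        rates.append(rate)
    return rates


# Synthetic (hypothetical) errors following e = C h^(k+1) with k = 4.
mesh_sizes = [0.5, 0.25, 0.125]
errors = [0.3 * h**5 for h in mesh_sizes]
rates = observed_rates(mesh_sizes, errors)
```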

The error in the $L^2$ norm for the convergence rates in Figure 4.8 is computed
via the code shown in Figure 4.9, where the finite element solution $u_h$ has been
computed using fourth order Lagrange basis functions. Given the exact solution
$u$ and the finite element solution $u_h$, the error $e = u - u_h$ can be computed by
the functional M in the code. Note that the exact solution has been approximated
by interpolation using a continuous eighth order polynomial.
Extending the FEniCS framework for discontinuous Galerkin methods also permits
the computation of other norms like the mesh-dependent semi-norm of the error

\[
|||e|||^2 = \int_\Omega \nabla e \cdot \nabla e \, dx + \int_{\Gamma_0} \llbracket e \rrbracket \cdot \llbracket e \rrbracket \, ds, \tag{4.29}
\]

in a straightforward fashion. The UFL input for this functional is shown in
Figure 4.10.
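
To make the structure of (4.29) concrete, the following sketch evaluates the semi-norm for a piecewise linear, possibly discontinuous function on a one-dimensional mesh, where interior facets reduce to interior vertices and the facet integral becomes a point evaluation. The one-dimensional setting, the function name and the data layout are assumptions chosen for illustration.

```python
def seminorm_squared(nodes, values):
    """|||e|||^2 for a piecewise linear, possibly discontinuous, 1D function.

    nodes: mesh vertices x_0 < x_1 < ... < x_m.
    values: one (left value, right value) pair of e per cell.
    """
    gradient_part = 0.0
    for i, (e_left, e_right) in enumerate(values):
        length = nodes[i + 1] - nodes[i]
        slope = (e_right - e_left) / length
        gradient_part += slope**2 * length  # integral of (e')^2 over the cell

    jump_part = 0.0
    for i in range(len(values) - 1):
        trace_from_left = values[i][1]       # e on the cell left of the vertex
        trace_from_right = values[i + 1][0]  # e on the cell right of the vertex
        jump_part += (trace_from_left - trace_from_right)**2

    return gradient_part + jump_part


nodes = [0.0, 0.5, 1.0]
values = [(0.0, 1.0), (0.5, 0.5)]  # jump of size 0.5 at x = 0.5
result = seminorm_squared(nodes, values)
```

For this data the gradient contribution is 2 (slope 2 on a cell of length 0.5) and the jump contribution is 0.25.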
UFL code
# Define test and trial functions
element = FiniteElement("Lagrange", tetrahedron, 4)
u = TrialFunction(element)
v = TestFunction(element)

# Normal component, mesh size and right-hand side
n = element.cell().n
h = 2.0*element.cell().circumradius
h_avg = (h('+') + h('-'))/2
f = Coefficient(element)

# Parameters
alpha = 16.0

# Bilinear form
a = inner(div(grad(u)), div(grad(v)))*dx \
  - inner(avg(div(grad(u))), jump(grad(v), n))*dS \
  - inner(jump(grad(u), n), avg(div(grad(v))))*dS \
  + alpha/h_avg*inner(jump(grad(u), n), jump(grad(v), n))*dS

# Linear form
L = f*v*dx

Figure 4.7: UFL input for the biharmonic equation using k = 4.

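[Plot: error ‖u − u_h‖ (vertical axis, logarithmic, from 10⁰ down to 10⁻⁶) against the mesh size h (horizontal axis, logarithmic, from 1 to 0.1) for k = 2, k = 3 and k = 4, with reference slopes 2, 4 and 5.]

Figure 4.8: Error in the L2 norm for the biharmonic equation with penalty parameters
α = 4, α = 20 and α = 20, for k = 2, k = 3 and k = 4 respectively.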
UFL code
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Lagrange", tetrahedron, 4)

u = Coefficient(element_u)
u_h = Coefficient(element_uh)

e = u - u_h
M = e*e*dx

Figure 4.9: Computation of the error in the L2 norm (squared).

UFL code
element_u = FiniteElement("Lagrange", tetrahedron, 8)
element_uh = FiniteElement("Discontinuous Lagrange", tetrahedron, 4)

u = Coefficient(element_u)
u_h = Coefficient(element_uh)

e = u - u_h
M = inner(grad(e), grad(e))*dx + inner(jump(e), jump(e))*dS

Figure 4.10: Computation of the error in a mesh-dependent semi-norm (squared).


4.2.5 Further applications


As demonstrated by the examples in this section, many problems involving concepts
from discontinuous Galerkin methods can now be implemented in the FEniCS
framework in a relatively straightforward fashion due to the extensions developed
in this chapter. In addition to the presented examples, the developments for
automation of discontinuous Galerkin methods in the FEniCS framework have
also been applied by researchers and application developers to other problems
such as free surface flows (Labeur and Wells, 2009), the Navier–Stokes equations
(Labeur and Wells, 2012; Selim et al., 2012; Giesselmann et al., 2012), microstructural
processes (Maraldi et al., 2011), the advection-diffusion-reaction equation (Wells,
2011), magnetic advection (Sukys et al., 2010; Heumann and Hiptmair, 2012), mantle
convection simulations (Vynnytska et al., 2013, 2012), wave surface elevation (Lopes
et al., 2011), Nitsche’s method for overlapping meshes (Massing et al., 2013), PDE-
constrained optimisation (Funke and Farrell, 2013) and the p–biharmonic equation
(Pryer, 2012).
Clearly, automation of discontinuous Galerkin methods is a valuable extension
to the FEniCS framework. However, in order to handle an even broader range
of problems, automation of a particular class of discontinuous Galerkin methods
remains to be addressed. This is the topic of the following chapter.
5 Automation of lifting-type
discontinuous Galerkin methods

This chapter addresses the automation of a so-called lifting-type discontinuous
Galerkin method proposed by Bassi and Rebay (1997) and Bassi and Rebay (2002)
for the compressible Navier–Stokes equations. The method was analysed in Brezzi
et al. (2000) and successfully used by, for example, Dung and Wells (2006); Wells
and Dung (2007); Dung and Wells (2008) for thin bending problems. This chapter
discusses a particular lifting-type formulation for the Poisson equation and provides
a basis for a similar formulation which is developed in the next chapter in the
context of gradient plasticity. The formulation has two major advantages compared
to the interior penalty (IP) formulation in Section 4.2.1. Firstly, no experiments are
needed to determine the value of the stabilisation parameter as the formulation is
stable for all positive values¹ (unlike the penalty parameter α in (4.15), page 104).
Secondly, numerical experiments, see Section 5.3, indicate that one can use a constant
basis for the Poisson equation, which is not possible when using the interior
penalty method. These properties make the formulation particularly interesting for
the gradient plasticity model which will be introduced in Chapter 6. In addition,
the lifting-type formulation is, unlike the IP formulation, also suitable for nonlinear
problems as the former preserves symmetry of the formulation (Ten Eyck and Lew,
2006; Ten Eyck et al., 2008; Wells and Dung, 2007; Dung and Wells, 2008). However,
the advantages come at a price. The three main drawbacks of the lifting-type
formulation are that the formulation is more complex, the local assembly is more
expensive to perform and the global tensor arising from assembling the variational
form becomes less sparse.

¹ Although the method is stable for all positive values of the stabilisation parameter, it should be
chosen small enough such that it does not dominate the results when constant elements are used. This
will be investigated in Section 5.3.

Due to the complexity of lifting-type formulations it is desirable to support
these in the FEniCS framework. Unfortunately, fully automated support is not yet
available, but it is possible to implement the methods in a semi-automated fashion
by taking advantage of the functionality developed in the previous chapter. This
chapter first presents a lifting-type formulation for the Poisson equation. Then
follows the implementation of this formulation in the FEniCS framework, including
some developments for semi-automated support. The two formulations for the
Poisson equation are then compared to each other to illustrate the influence of
the penalty parameter and performance for constant elements. Finally, future
developments to enable fully-automated support for lifting-type formulations in
the FEniCS framework are discussed.

5.1 Lifting-type formulation for the Poisson equation

This section describes the basic concepts of a lifting-type formulation for the
Poisson equation. The notation from the previous chapter is adopted and some
definitions and concepts are reiterated in the following for convenience. Recall
from Section 4.2.1 the discontinuous scalar function space V:
\[
V := \left\{ v \in L^2(\Omega) : v|_T \in P_k(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{5.1}
\]

where $P_k(T)$ denotes the space of Lagrange polynomials of degree $k$ on the element
$T$ of the standard triangulation of $\Omega$, which is denoted by $\mathcal{T}_h$. Again, let $T^+$ and
$T^-$ denote the two cells sharing a common facet $S$ as shown in Figure 4.1. Let the
jump of a function $v \in V$ be defined as:

\[
\llbracket v \rrbracket =
\begin{cases}
v^+ n^+ + v^- n^- & \text{on } \Gamma_0, \\
v n & \text{on } \partial\Omega,
\end{cases} \tag{5.2}
\]

where $(\cdot)^+$ and $(\cdot)^-$ denote the value of a quantity $(\cdot)$ on $T^+$ and $T^-$ respectively,
$n$ is the outward unit normal and $\Gamma_0$ is the set of interior facets in $\Omega$. A function
space for the gradient of functions in $V$ is also defined:

\[
Q := \left\{ q \in \left[L^2(\Omega)\right]^d : q|_T \in P_l(T) \ \forall\, T \in \mathcal{T}_h \right\}, \tag{5.3}
\]

where Lagrange polynomials of degree $l$ are used on the local element $T$. As
functions in $Q$ should contain the gradient of functions in $V$ it implies that $l \geq k - 1$,
with $k$ being the polynomial degree of functions in $V$. The average of a function
$q \in Q$ is defined as:

\[
\langle q \rangle =
\begin{cases}
\frac{1}{2}\left(q^+ + q^-\right) & \text{on } \Gamma_0, \\
q & \text{on } \partial\Omega.
\end{cases} \tag{5.4}
\]

An operator $r_S : V \to Q$ (Brezzi et al., 2000) is now defined: for a given $v \in V$,
find $r_S(v) \in Q$ such that:

\[
\int_E r_S(v) \cdot q \, dx = - \int_S \llbracket v \rrbracket \cdot \langle q \rangle \, ds \quad \forall\, q \in Q, \tag{5.5}
\]

where $E = T^+ \cup T^-$ (identical to the macro cell $T$ in Figure 4.1) for $S \in \Gamma_0$; and
$E \in \mathcal{T}_h$ is the element associated with the facet $S$ for $S \in \partial\Omega$. Based on this operator,
a function can now be defined:

\[
R(v) = \sum_{S \in \Gamma_0 \cup \Gamma_D} r_S(v), \tag{5.6}
\]

which can be interpreted as an approximation of the gradient of $v$ as the operation
defined in (5.5) transforms the jump in $v$ across element facets into a gradient-like
quantity on element interiors. The function $R(v)$ will be referred to as a lifting
function which is defined by the lifting operator $r_S$. Notice that because $S \in \Gamma_0 \cup \Gamma_D$
in (5.6), the function is not defined on $\Gamma_N$, and it will therefore be set to be zero in
this case.
The bilinear form for the Poisson equation, corresponding to (4.15) in Section 4.2.1,
can now be defined in terms of the lifting function and the lifting
operator:

\[
a(u, v) := \int_\Omega \left(\nabla u + R(u)\right) \cdot \left(\nabla v + R(v)\right) dx
+ \sum_{S \in \Gamma_0 \cup \Gamma_D} \int_\Omega \alpha \, r_S(u) \cdot r_S(v) \, dx, \tag{5.7}
\]

where the last term is a stabilisation term with $\alpha$ being a stabilisation parameter.
An important property of the lifting-type discontinuous Galerkin formulation is
that the method is stable for any $\alpha > 0$, which is in contrast to the IP formulation
in (4.15), see Arnold et al. (2002). In addition, no parameter for the mesh size is
needed ($h$ in (4.15))². The linear form for the Poisson problem using a lifting-type
formulation remains identical to (4.16) when considering homogeneous Dirichlet
boundary conditions.

² The mesh size parameter can be defined differently depending on the problem. For the IP formulation
of the Poisson equation in Arnold et al. (2002) the parameter $h_e$ is defined as the length of an edge,
while Djoko et al. (2007b) define $h_e$ as the distance between centroids of elements sharing a common
edge for a similar problem.

5.2 Semi-automated implementation of lifting-type formulations

In the previous chapter, it was shown how the IP formulation for the Poisson
equation in (4.15) and (4.16) can be implemented in a straightforward fashion
using the FEniCS framework, see Figure 4.2 on page 106. The implementation
of lifting-type discontinuous Galerkin forms like (5.7) is, however, more involved.
This is due to the nature of the lifting function $R(v)$ defined in (5.6), through the
variational problem in (5.5), which adds complexity to the assembly procedure.
However, it is possible to use the tools provided by FEniCS as building blocks to
extend the framework to also handle lifting-type formulations in a semi-automated
fashion. This section describes a possible approach to achieve this.
Let $\{\phi_k^{T,v}\}_{k=1}^{n_v}$ and $\{\phi_k^{T,q}\}_{k=1}^{n_q}$ denote the local finite element basis for $V$ and $Q$
on a cell $T$ respectively. From (5.5) two tensors, $A_E$ and $A_S$, can be identified:

\[
A_{E,i} = a_E\left(\bar{\phi}^{E,q}_{i_2}, \bar{\phi}^{E,q}_{i_1}\right) = \int_E r_S(v) \cdot q \, dx, \tag{5.8}
\]

\[
A_{S,i} = a_S\left(\bar{\phi}^{E,v}_{i_2}, \bar{\phi}^{E,q}_{i_1}\right) = - \int_S \llbracket v \rrbracket \cdot \langle q \rangle \, ds, \tag{5.9}
\]

where $i = (i_1, i_2)$ is the usual multi-index and $\bar{\phi}^{E}$ is a (possibly macro) basis which
can be constructed from (4.6), page 102. The lifting operator $r_S(v)$ on the cell $E$ can
be represented as:

\[
r_S(v) = \sum_{k}^{N} r_k \bar{\phi}^{E,q}_k, \tag{5.10}
\]

where $N = 2n_q$ if $S$ is an interior facet and $N = n_q$ otherwise; and $r_k \in \mathbb{R}^N$ is the
vector of degrees of freedom values for the function $r_S(v)$ and can be computed
from:

\[
r_k = (A_E)^{-1} A_S. \tag{5.11}
\]

The vector $r_k$ is then used to compute the local cell tensor corresponding to (5.7).
To keep things simple, only the $\int_\Omega R(u) \cdot \nabla v \, dx$ term is now considered and it is
assumed that $S$ is an interior facet. The local cell tensor $A_T$ for this term is equal to:

\[
A_{T,i} = a_T\left(\bar{\phi}^{E,q}_{i_2}, \phi^{T,v}_{i_1}\right) = \int_T r_S(u) \cdot \nabla v \, dx, \tag{5.12}
\]

with $r_S(u)$ defined by (5.10). Due to the extensions presented in the previous
chapter, a bilinear form to compute $A_S$ from (5.9) in two dimensions can be
implemented directly in UFL as:

UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
q = TestFunction(Q)
u = TrialFunction(V)
n = triangle.n
a = - inner(jump(u, n), avg(q))*dS

where discontinuous constant elements and discontinuous linear elements have
been used for $Q$ and $V$ respectively. The tensor $A_E$ in (5.8) is a macro tensor
computed over the domain $E = T^+ \cup T^-$. It is currently not possible to handle
integrals over macro cells in the FEniCS framework, but $A_E$ can be computed by
evaluating

UFL code
Q = VectorElement("DG", triangle, 0)
v = TestFunction(Q)
u = TrialFunction(Q)
a = inner(u, v)*dx

on T + and T − and then inserting the resulting tensors into A E (which is essentially
a macro mass matrix) following the construction in (4.6). The bilinear form to
compute A T only involves the standard integral over a cell and can, like AS , be
implemented directly in UFL as:

UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
v = TestFunction(V)
R = TrialFunction(Q)
a = inner(R, grad(v))*dx

Before formulating the assembly algorithm, a collective local-to-global mapping
$\iota_{T,S}$ is needed. Note that this mapping depends on both $T$ and $S$ due to (5.6). In
this particular example the collective local-to-global mapping for each $T \in \mathcal{T}_h$ and
$S \in \Gamma_0$ is defined as

\[
\iota_{T,S}(i) = \left(\iota_T^1(i_1), \iota_S^2(i_2)\right) \quad \forall\, i \in \mathcal{I}_{T,S}, \tag{5.13}
\]

where the mappings $\iota_T^1(i_1)$ and $\iota_S^2(i_2)$ are computed according to (1.11), page 26,
and (4.7), page 102, respectively; and $\mathcal{I}_{T,S}$ is the index set:

\[
\mathcal{I}_{T,S} = \left\{ (1, 1), (1, 2), \ldots, (n_v, 2n_q - 1), (n_v, 2n_q) \right\}, \tag{5.14}
\]

where it is assumed that $S$ is an interior facet (otherwise $2n_q$ is replaced by $n_q$).

An algorithm to compute the contribution to the global tensor $A$ from the local
cell tensor $A_T$ for the term $\int_\Omega R(u) \cdot \nabla v \, dx$ in (5.7) is outlined in Algorithm 3. An
extension of the assemble function in DOLFIN (Sections 1.3.5 and 4.1.4) based on
the approach outlined in Algorithm 3 for lifting-type formulations is implemented
in the C++ class LiftingAssembler in the FEniCS Solid Mechanics library. Provided
that the user supplies the necessary variational forms, the LiftingAssembler
computes $A_E$, $A_S$ and $A_T$ by simply calling the tabulate_tensor function ($A_E$ is
constructed from $A_{T^+}$ and $A_{T^-}$ as already mentioned). The tensors $A_E$ and $A_S$ are
then used to compute the degrees of freedom values $r_k$ by solving (5.11), which are
then passed as function values to the form (5.12) to compute $A_T$ in line 4.³ The
tabulate_dofs function for each discrete function space on the cells $T$, $T^+$ and
$T^-$ is called to construct the collective local-to-global mapping $\iota_{T,S}$ after which the
global tensor $A$ can be updated.

Algorithm 3 Assembly algorithm for $\int_\Omega R(u) \cdot \nabla v \, dx$.

 1: for $T \in \mathcal{T}_h$ do
 2:     for $S \in \partial T$ do
 3:         Compute $A_E$ and $A_S$ from (5.8) and (5.9) to obtain $r_S(v)$ via (5.11)
 4:         Compute $A_T$ from (5.12)
 5:         Compute local-to-global degree of freedom mapping $\iota_{T,S}$ from (5.13)
 6:         Add $A_T$ to $A$ according to $\iota_{T,S}$:
 7:         for $i \in \mathcal{I}_{T,S}$ do
 8:             $A_{\iota_{T,S}(i)}$ += $A_{T,i}$
 9:         end for
10:     end for
11: end for
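
The local solve in (5.11) can be illustrated for the simplest possible case. The sketch below is plain Python and is not taken from the LiftingAssembler: it assumes piecewise constant elements for both V and Q in two dimensions (the text uses linear elements for V), in which case the macro mass matrix A_E is diagonal and its inverse is trivial. All names and the geometric data are hypothetical.

```python
def lifting_matrix(area_plus, area_minus, facet_length, n_plus):
    """Return (A_E)^{-1} A_S for an interior facet with constant elements.

    Rows: macro basis of Q (e_x, e_y on T+, then e_x, e_y on T-).
    Columns: macro basis of V (constant on T+, constant on T-).
    """
    n_minus = (-n_plus[0], -n_plus[1])
    # A_E: macro mass matrix, diagonal because the basis is piecewise constant.
    mass_diagonal = [area_plus, area_plus, area_minus, area_minus]
    unit_vectors = [(1.0, 0.0), (0.0, 1.0)] * 2
    # Jump of each V basis function: v n evaluated on its own side of the facet.
    jumps = [n_plus, n_minus]
    # A_S entries: -|S| [[phi_v]] . <phi_q>, where <phi_q> = phi_q / 2 because
    # each Q basis function vanishes on the opposite cell.
    A_S = [[-0.5 * facet_length * (e[0] * j[0] + e[1] * j[1]) for j in jumps]
           for e in unit_vectors]
    return [[entry / m for entry in row] for row, m in zip(A_S, mass_diagonal)]


def apply(matrix, vector):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Two cells with areas 0.5 and 0.25 sharing a facet of length 1 with n+ = (1, 0).
lift = lifting_matrix(0.5, 0.25, 1.0, (1.0, 0.0))
r = apply(lift, [2.0, 0.5])  # v = 2.0 on T+ and 0.5 on T-
```

In this setting the result agrees with the closed form r_S(v) = −|S| ⟦v⟧ / (2|T±|) on T±, which follows from (5.5) by testing with a constant q supported on one cell.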
It is clear from Algorithm 3 that the local element assembly of lifting-type
formulations is more complex, and thus more expensive to perform, than the
assembly outlined in Algorithm 2, page 103, for IP formulations. The
increase in complexity also leads to a global tensor that is much less sparse. This
is illustrated in Figures 5.1 and 5.2, which indicate the location of nonzero entries
in the global tensor $A$ obtained by assembling the bilinear forms in (4.15)
and (5.7) respectively on the unit square $\Omega = [0, 1] \times [0, 1]$ using the mesh
shown in Figure 5.3a. Discontinuous constant elements have been used such that cell
indices correspond to degree of freedom numbers. The increase in off-diagonal entries
in Figure 5.2 is due to the presence of the $R(u) \cdot R(v)$ term in (5.7), see Brezzi
et al. (2000). The reason is that each of the lifting functions contains a loop over
facets of the local cell, see (5.6), which effectively couples degrees of freedom on
cells that ‘share a neighbouring cell’. This is different from the interior facet
integrals in the IP formulation, which only couple degrees of freedom on cells that
share a facet. Figures 5.3b and 5.3c indicate the cells involved when computing the
entry for degree of freedom number six when using the IP and lifting-type formulations
respectively. In general, when considering a cell sufficiently far from the boundary
of a mesh consisting of triangles, the IP formulation will involve four cells while
the lifting-type formulation will involve ten cells when evaluating an entry.
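The counts of four and ten cells can be checked with a small connectivity sketch. The code below is an illustration under stated assumptions: it builds its own structured triangle mesh (every square split by the diagonal from lower left to upper right, which is assumed here to correspond to a ‘right’ diagonal) rather than the mesh of Figure 5.3a, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def right_mesh(n):
    """Triangles (vertex triples) of an n x n mesh of the unit square."""
    def vid(i, j):
        return j * (n + 1) + i

    cells = []
    for j in range(n):
        for i in range(n):
            v00, v10 = vid(i, j), vid(i + 1, j)
            v01, v11 = vid(i, j + 1), vid(i + 1, j + 1)
            cells.append((v00, v10, v11))  # triangle below the diagonal
            cells.append((v00, v01, v11))  # triangle above the diagonal
    return cells


def facet_neighbours(cells):
    """Map each cell index to the set of cells sharing a facet with it."""
    edge_to_cells = defaultdict(list)
    for c, verts in enumerate(cells):
        for edge in combinations(sorted(verts), 2):
            edge_to_cells[edge].append(c)
    nbrs = defaultdict(set)
    for shared in edge_to_cells.values():
        if len(shared) == 2:
            a, b = shared
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs


def ip_stencil(c, nbrs):
    """Cells coupled to c by interior facet integrals."""
    return {c} | nbrs[c]


def lifting_stencil(c, nbrs):
    """Cells coupled to c when neighbours of neighbours are included."""
    stencil = ip_stencil(c, nbrs)
    for nb in nbrs[c]:
        stencil |= nbrs[nb]
    return stencil


nbrs = facet_neighbours(right_mesh(6))
interior_cell = 28  # lower triangle of the square with i = j = 2
n_ip = len(ip_stencil(interior_cell, nbrs))
n_lift = len(lifting_stencil(interior_cell, nbrs))
```

For a cell away from the boundary, `n_ip` counts the cell and its three facet neighbours, while `n_lift` additionally counts the two remaining neighbours of each of those three cells.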
³ These local computations involve many linear algebra operations on dense data structures, including
computing the inverse of $A_E$. In order to keep the implementation simple, the FEniCS Solid Mechanics
library employs Armadillo (http://arma.sourceforge.net/) to perform these computations.
[Figure 5.1: sparsity pattern of a 15 × 15 matrix with rows and columns numbered 0 to 14; nonzero entries are marked by ∗.]

Figure 5.1: Illustration of nonzero entries in the global tensor $A_{\mathrm{ip}}$ arising from
assembling the IP formulation in (4.15) on the mesh shown in Figure 5.3a using
discontinuous constant elements.

[Figure 5.2: sparsity pattern of a 15 × 15 matrix with rows and columns numbered 0 to 14; nonzero entries are marked by ∗.]

Figure 5.2: Illustration of nonzero entries in the global tensor $A_{\mathrm{lift}}$ arising from
assembling the lifting-type formulation in (5.7) on the mesh shown in Figure 5.3a
using discontinuous constant elements.
[Figure 5.3: three copies of a triangular mesh of 15 cells with cell indices 0 to 14.]
(a) Simple mesh including global cell indices. (b) IP formulation. (c) Lifting-type formulation.

Figure 5.3: Simple mesh of the domain $\Omega = [0, 1] \times [0, 1]$ including cell indices.
Because constant elements are used, the index numbering is equal to the degree of
freedom numbering. Figures (b) and (c) show the cells involved when computing
the entry for degree of freedom number 6 using the IP and lifting-type formulations
respectively.

5.3 Comparison of IP and lifting-type formulations

The disadvantages of the lifting-type formulation in terms of increased complexity
and a less sparse global tensor have been treated in the previous section. This section
concerns the advantages of lifting-type discontinuous Galerkin formulations (5.7)
over the IP formulation (4.15) in terms of the penalty parameter and applicability
to constant elements. For the IP formulation, it is not straightforward to determine
the value of the penalty parameter $\alpha$ a priori, and it is, therefore, usually
determined through numerical experiments. As already mentioned, this is not the case
for the lifting-type formulation, which is stable for any $\alpha > 0$. Another drawback
of the IP formulation is that, if discontinuous constant elements are used for $V$
($k = 0$ in (5.1)), then all terms in (4.15) vanish except the stabilisation term. As a
consequence, the value of the penalty parameter will govern the solution. This is in
contrast to the lifting-type formulation in (5.7), where the $R(u) \cdot R(v)$ term is
also nonzero in addition to the stabilisation term for constant elements.
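The statement that the penalty parameter governs the solution can be illustrated with a one-dimensional analogue. The sketch below is an assumption of this illustration and not a computation from this chapter: it solves $-u'' = f$ on $(0, 1)$ with piecewise constant functions on a uniform mesh, keeps only a penalty term of the form $(\alpha / h) [u][v]$ on every facet (with the cell size $h$ used for boundary facets as well), and uses the midpoint rule for the right-hand side.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] and upper[i] multiply x[i-1] and x[i+1]."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def p0_penalty_solution(n_cells, alpha):
    """Cell values of the penalty-only P0 discretisation of -u'' = f."""
    h = 1.0 / n_cells
    midpoints = [(i + 0.5) * h for i in range(n_cells)]
    f = [4.0 * math.pi ** 2 * math.sin(2.0 * math.pi * x) for x in midpoints]
    lower = [-alpha / h] * n_cells
    diag = [2.0 * alpha / h] * n_cells  # two facets per cell
    upper = [-alpha / h] * n_cells
    rhs = [h * fi for fi in f]
    return solve_tridiagonal(lower, diag, upper, rhs)


u_alpha_1 = p0_penalty_solution(16, 1.0)
u_alpha_2 = p0_penalty_solution(16, 2.0)
```

Since the whole system matrix is proportional to $\alpha$, doubling $\alpha$ halves every cell value of the discrete solution, independently of the mesh size.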
The model problem from Section 4.2.1 on the unit square $\Omega = [0, 1] \times [0, 1]$
with $f = 8\pi^2 \sin(2\pi x) \sin(2\pi y)$ is considered again. The exact solution for this
problem is $u = \sin(2\pi x) \sin(2\pi y)$. First, the convergence of the two formulations
using discontinuous linear elements and different values for $\alpha$ is investigated. Then,
the convergence of the two formulations using discontinuous constant elements on
two different types of structured meshes is investigated, and the results obtained
for the ‘optimal’ value of the penalty parameter are presented. Finally, the results
of the two formulations on an unstructured mesh of discontinuous constant elements
are compared. The two different types of structured meshes that will be used
are shown in Figure 5.4.

(a) Structured mesh, the ‘right’ mesh, with the direction of the diagonal pointing to the right.
(b) Structured mesh, the ‘left/right’ mesh, with alternating direction of the diagonals.

Figure 5.4: Two types of structured meshes for the domain $\Omega = [0, 1] \times [0, 1]$.

The diagonals of the mesh in Figure 5.4a all point to the
right, and this type of structured mesh will therefore be referred to as the ‘right’
mesh. This particular mesh is created in DOLFIN by:

C++ code
UnitSquare mesh(4, 4);

and is the default mesh type in DOLFIN. The direction of the diagonals of the
mesh in Figure 5.4b alternates between right and left; this mesh will be referred to
as the ‘left/right’ mesh and is created in DOLFIN by:

C++ code
UnitSquare mesh(4, 4, "left/right");

The convergence of the two formulations in the $L^2$ norm is first investigated
using discontinuous linear elements, $k = 1$ in (5.1), on the ‘right’ mesh. The results
can be seen in Figure 5.5 for various values of the penalty parameter $\alpha$. As expected,
a convergence rate of $k + 1$ is observed for the two formulations in general. In the
case where $\alpha = 2$, the IP formulation appears to be unstable, which indicates that
the value of $\alpha$ is too small, while for α > 4 the formulation is stable. The lifting-type
formulation is stable for all values of $\alpha > 0$ as predicted by Brezzi et al. (2000),
which builds confidence in the implementation outlined in the previous section.
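A convergence rate such as $k + 1$ is read off as the slope of the error curve in a log-log plot. A minimal sketch of that calculation is given below; the function name `observed_rate` is hypothetical and the error values are synthetic numbers constructed to behave like $C h^2$, not data taken from Figure 5.5.

```python
import math


def observed_rate(h_coarse, e_coarse, h_fine, e_fine):
    """Slope of the error curve between two mesh sizes in a log-log plot."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)


# Synthetic errors of the form e = C h^2 (illustrative values only).
errors = {0.1: 0.5 * 0.1 ** 2, 0.05: 0.5 * 0.05 ** 2}
rate = observed_rate(0.1, errors[0.1], 0.05, errors[0.05])
```

For errors of this form the computed slope is 2, which is the rate $k + 1$ expected for linear elements.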
[Figure 5.5: log-log plots of the error $\|u - u_h\|$ against the cell size $h$.
(a) Convergence for lifting-type formulation, $\alpha = 10^{-3}, 10^{0}, 10^{3}$.
(b) Convergence for IP formulation, $\alpha = 2, 4, 1000$.]

Figure 5.5: Error in the $L^2$ norm as a function of the cell size for various values of
$\alpha$ using discontinuous linear elements ($k = 1$). The results were computed on the
‘right’ mesh (similar results can be obtained using the ‘left/right’ mesh).

Convergence rates for the two formulations using discontinuous constant elements,
$k = 0$ in (5.1), on the two different meshes for various values of $\alpha$ are shown
in Figure 5.6. Convergence of the IP formulation on the ‘right’ mesh is not observed
for any value of $\alpha$. On the ‘left/right’ mesh, convergence of the IP formulation can
only be achieved for a very limited range of values of the penalty parameter when
using constant elements. For $\alpha = 2.0$ and $\alpha = 4.0$ no convergence is observed, and
only for $\alpha = 2.45$ is the convergence rate as expected, while for $\alpha = 2.3$ convergence
is suboptimal. On this particular mesh, convergence is thus very sensitive to the value
of the penalty parameter. This is contrasted by the lifting-type formulation, where
the expected convergence rate is observed for values in the range $0 < \alpha < 10^{-2}$ for
both types of meshes. The important thing to note here is that the stabilisation
parameter should be chosen small enough such that the solution is not dominated
by the stabilisation term.
Figure 5.7 shows computed results for the two formulations on both types
of meshes using ‘optimal’ values of $\alpha$, that is, $\alpha = 2.45$ for the IP formulation
and $\alpha = 10^{-3}$ for the lifting-type formulation, although $\alpha$ could take any value
in the range $0 < \alpha < 10^{-2}$. From the figure it is obvious that the IP result is
greatly influenced by the orientation of the mesh, which is the reason for the poor
convergence in the ‘right’ mesh case, see Figure 5.6b. The lifting-type formulation,
on the other hand, seems largely unaffected by the mesh orientation. The reason is
that the IP formulation only considers jumps in the value between adjacent cells on
the shared facet, while the $R(u) \cdot R(v)$ term of the lifting-type formulation couples
the jump across one facet with the jumps across all other facets associated with
a given cell (also compare Figures 5.3b and 5.3c). This procedure results in an
averaging of the gradient experienced by the cell and reduces the influence of mesh
orientation.
[Figure 5.6: log-log plots of the error $\|u - u_h\|$ against the cell size $h$.
(a) Lifting-type formulation on ‘right’ mesh. (b) IP formulation on ‘right’ mesh.
(c) Lifting-type formulation on ‘left/right’ mesh. (d) IP formulation on ‘left/right’ mesh.
Lifting-type formulation: $\alpha = 10^{0}, 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}$.
IP formulation: $\alpha = 1.00, 2.00, 2.30, 2.45, 4.00$.]

Figure 5.6: Error in the $L^2$ norm as a function of the cell size for various values of $\alpha$
on different types of meshes using discontinuous constant elements ($k = 0$).

[Figure 5.7: (a) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on the ‘right’ mesh.
(b) Computed solution for the IP formulation with $\alpha = 2.45$ on the ‘right’ mesh.
(c) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on the ‘left/right’ mesh.
(d) Computed solution for the IP formulation with $\alpha = 2.45$ on the ‘left/right’ mesh.]

Figure 5.7: Computed solutions to the Poisson problem for the two formulations
on the two structured meshes using constant elements and an ‘optimal’ value of $\alpha$.

Figure 5.8 shows the computed solutions to the Poisson problem for the two
formulations on an unstructured mesh using constant elements. The computed
solution from the lifting-type formulation with $\alpha = 10^{-3}$ is compared to the
solutions from the IP formulation with different values of $\alpha$. From the results in
Figures 5.6d and 5.7 one could expect that the IP formulation with $\alpha = 2.45$ would
perform reasonably well on an unstructured mesh. However, Figure 5.8 suggests that
$\alpha = 2$ produces the best result, as higher values of $\alpha$ reduce the magnitude of the
values in the solution. The lifting-type formulation is not affected by the unstructured
mesh and the result is comparable to those obtained in Figures 5.7a and 5.7c
for the ‘right’ and ‘left/right’ meshes respectively. In conclusion, a lifting-type
formulation is needed to obtain reliable results when using discontinuous constant
elements for the Poisson problem.

5.4 Future developments


Rather than the semi-automated approach outlined in Section 5.2, it is obviously
desirable to have lifting-type formulations fully supported in the FEniCS toolchain
starting from the UFL input. However, it is still not completely clear what
abstractions are needed to accomplish this, but general support for performing static
condensation in FEniCS would be a good start. A possible future syntax for
implementing the $\int_\Omega R(u) \cdot \nabla v \, dx$ term from (5.7) could be:

UFL code
Q = VectorElement("DG", triangle, 0)
V = FiniteElement("DG", triangle, 1)
v = TestFunction(V)
u = TrialFunction(V)
R = LiftingFunction(Q)
a = inner(R(u), grad(v))*dE

The LiftingFunction represents the operations defined in (5.5) and (5.6) and FFC
must be extended to support code generation for these operations. A new type
of integral, dE, is introduced to denote integration over a patch of elements such
that evaluating $R(u)$ on $T$ will involve $T$ and all of its neighbours. UFC will then
need to provide an interface for this new integral class and DOLFIN must be
updated with an algorithm to perform the assembly, including the construction of
the collective local-to-global mapping for the patch of elements under consideration.
The procedure outlined above is further complicated by the definition of the lifting
function in (5.6) where the loop over facets depends on the set $\Gamma_D$ where Dirichlet
boundary conditions are to be applied. In the event of a moving boundary inside
the domain $\Omega$, the assembly algorithm must thus be provided with information
about which cells and facets to consider.
Another possible future direction, which is perhaps achieved more easily, is to
extend the assembly algorithm. In Algorithm 3 the loop over facets $S$ to compute
$r_S(v)$ is nested inside a loop over the cells of the mesh. It should be possible to
compute the entire lifting function $R(v)$ in a single loop over all facets of the mesh
to avoid redundant computations of $r_S(v)$. The collective local-to-global mapping
must, however, still be constructed by looping over the facets of the cell $T$ during
assembly.
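The saving offered by a single facet loop can be estimated by counting facet visits. The sketch below is only a counting exercise under stated assumptions: it uses a hypothetical structured triangle mesh generated in the code (not a DOLFIN mesh), counts one visit per (cell, facet) pair for the nested loops of Algorithm 3 as written, and one visit per distinct facet for a facet loop.

```python
from itertools import combinations


def right_mesh(n):
    """Triangles (vertex triples) of an n x n mesh of the unit square."""
    def vid(i, j):
        return j * (n + 1) + i

    cells = []
    for j in range(n):
        for i in range(n):
            v00, v10 = vid(i, j), vid(i + 1, j)
            v01, v11 = vid(i, j + 1), vid(i + 1, j + 1)
            cells.append((v00, v10, v11))
            cells.append((v00, v01, v11))
    return cells


def count_facet_visits(cells):
    """Return (visits in a cell-facet nested loop, number of distinct facets)."""
    nested_visits = 0
    facets = set()
    for verts in cells:
        for edge in combinations(sorted(verts), 2):
            nested_visits += 1  # the nested loop reaches this facet from this cell
            facets.add(edge)    # a facet loop would reach it exactly once
    return nested_visits, len(facets)


nested, distinct = count_facet_visits(right_mesh(4))
```

Every interior facet is reached from both adjacent cells in the nested loop, so the ratio of the two counts approaches two as the mesh is refined.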
[Figure 5.8: (a) Computed solution for the lifting-type formulation with $\alpha = 10^{-3}$ on an unstructured mesh.
(b) Computed solution for the IP formulation with $\alpha = 2$ on an unstructured mesh.
(c) Computed solution for the IP formulation with $\alpha = 2.45$ on an unstructured mesh.
(d) Computed solution for the IP formulation with $\alpha = 4$ on an unstructured mesh.]

Figure 5.8: Computed solutions to the Poisson problem for the two formulations
on an unstructured mesh using constant elements. The solution computed using
the lifting-type formulation (a) is compared to the solutions obtained using the IP
formulation with different values of $\alpha$.
6 Strain gradient plasticity

This chapter brings together the tools and extensions of the previous chapters in an
implementation of a strain gradient plasticity model proposed by Aifantis (1984) in
the FEniCS framework. Strain gradient plasticity models can be used to model size
effects which cannot be accounted for by classical plasticity theory. Size effects in
plasticity are manifest as an increase in the strength of a material as the size of a
specimen becomes smaller. This effect has been observed in many applications at
the micron scale, for instance micro-indentation (Poole et al., 1996; Nix and Gao,
1998; Begley and Hutchinson, 1998), micro-bending (Stölken and Evans, 1998) and
wire torsion (Fleck et al., 1994). For softening problems, classical plasticity
theory exhibits a pathological mesh dependence because it does not
provide a length scale for the shear band width. Strain gradient models that define
an internal length scale can, therefore, sometimes be used to provide regularisation
in softening problems under certain conditions.
The considered plasticity model involves the addition of the Laplacian of the
plastic multiplier to the classical yield condition. By considering a weak formulation
for the yield condition, $H^1$-regular functions can be used for representing the plastic
multiplier. By employing a discontinuous formulation, the yield condition can
be satisfied locally (cell-wise). Following this approach, the standard balance of
momentum equations for the displacements are defined in the entire domain.
The formulation for the yield condition, however, is only defined in the plastic
domain. Furthermore, it is necessary to impose boundary conditions for the plastic
multiplier on the, potentially moving, boundary of the plastic domain. This poses
a numerically challenging problem.
The chapter is organised as follows. First, the strain gradient plasticity model
is presented. The presentation builds on the notation and equations presented in
Chapter 2, in particular Sections 2.2.2 and 2.4.2 concerning plasticity, and where
convenient some of the definitions are reiterated. A lifting-type formulation for
the plastic multiplier based on the work in the previous chapter is then proposed.
This is followed by the linearisation of the governing equations after which the
implementation in the FEniCS framework is discussed. Finally, numerical examples
are presented followed by some computational observations.

6.1 A strain gradient plasticity model

This section introduces the strain gradient model which will be investigated in
the remainder of this chapter. In the particular model under consideration, the
yield criterion from (2.23), page 34, is augmented with the Laplacian of the internal
hardening variable $\kappa$ as suggested by Aifantis (1984, 1987) and investigated by, for
instance, Mühlhaus and Aifantis (1991):

$$f(\sigma, \varepsilon^p, \kappa) := \phi\big(\sigma, q_{kin}(\varepsilon^p)\big) - q_{iso}(\kappa) - \sigma_y + G \nabla^2 \kappa \leq 0, \tag{6.1}$$

where $\phi\big(\sigma, q_{kin}(\varepsilon^p)\big)$ is a scalar effective stress measure, $q_{kin}$ is a stress-like
internal variable used to model kinematic hardening, $q_{iso}$ is a scalar stress-like term used
to model isotropic hardening, $\kappa$ is a scalar internal variable, $\sigma_y$ is the initial scalar
yield stress and the constant scalar $G > 0$ is a hardening parameter. The hardening
parameter $G$ determines the contribution of the gradient effect to hardening, and
in the case $G = 0$ the model reduces to the classical problem in (2.23). As in
Section 2.2.2, a von Mises model with linear isotropic hardening is adopted, see
(2.25). Classical associative plastic flow is assumed,¹ see (2.26), and isotropic
strain-hardening according to (2.27) is adopted, from which it follows that

$$\dot{\kappa} = \dot{\lambda}. \tag{6.2}$$

In addition to the yield criterion (6.1), the Kuhn–Tucker loading–unloading conditions:

$$f(\sigma, \kappa) \leq 0, \qquad \dot{\lambda} \geq 0, \qquad f(\sigma, \kappa) \, \dot{\lambda} = 0 \tag{6.3}$$

and Prager’s consistency condition:

$$\dot{f}(\sigma, \kappa) = 0 \tag{6.4}$$

must also be satisfied in regions undergoing plastic deformations. In the remainder
of this chapter, regions undergoing plastic deformations will be denoted by $\Omega^p \subset \Omega$.
Furthermore, the boundary of $\Omega^p$, denoted by $\partial\Omega^p$, is decomposed into regions $\Gamma_D^p$
and $\Gamma_N^p$ such that $\Gamma_D^p \cup \Gamma_N^p = \partial\Omega^p$ and $\Gamma_D^p \cap \Gamma_N^p = \emptyset$.
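The yield criterion (6.1) and the conditions (6.3) can be evaluated pointwise as in the sketch below. Several ingredients are assumptions of this illustration rather than definitions taken from the text: kinematic hardening is switched off, the isotropic term is taken as $q_{iso} = H \kappa$ (consistent with the $H \lambda$ term appearing in (6.7)), the von Mises effective stress is taken in its standard form $\sqrt{3/2 \, s : s}$ since (2.25) is not reproduced here, and the Laplacian of $\kappa$ is supplied as a plain number.

```python
import math


def von_mises(sigma):
    """Effective stress sqrt(3/2 s:s) of a symmetric 3x3 stress tensor."""
    p = (sigma[0][0] + sigma[1][1] + sigma[2][2]) / 3.0
    s = [[sigma[i][j] - (p if i == j else 0.0) for j in range(3)]
         for i in range(3)]
    ss = sum(s[i][j] ** 2 for i in range(3) for j in range(3))
    return math.sqrt(1.5 * ss)


def yield_function(sigma, kappa, lap_kappa, H, sigma_y, G):
    """f = phi(sigma) - H*kappa - sigma_y + G*lap(kappa), cf. (6.1)."""
    return von_mises(sigma) - H * kappa - sigma_y + G * lap_kappa


def kuhn_tucker_ok(f, lam_dot, tol=1e-9):
    """Check f <= 0, lam_dot >= 0 and f * lam_dot = 0 up to a tolerance."""
    return f <= tol and lam_dot >= -tol and abs(f * lam_dot) <= tol


# Hypothetical uniaxial stress state exactly at first yield.
uniaxial = [[250.0, 0.0, 0.0],
            [0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0]]
f_at_yield = yield_function(uniaxial, kappa=0.0, lap_kappa=0.0,
                            H=1000.0, sigma_y=250.0, G=1.0)
```

With a nonzero gradient parameter, a negative Laplacian of $\kappa$ lowers $f$ at that point, while a positive one raises it, which is how the gradient term enters the local yield check.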
As a consequence of the Laplacian term in (6.1) and the relationship between the
hardening variable and the plastic multiplier in (6.2), the consistency condition (6.4)
leads to a partial differential equation defined in the plastic domain $\Omega^p$ rather
than an algebraic equation for determining $\lambda$ at a point. Therefore, conventional
return mapping strategies, such as the one outlined in Section 2.4.2, are not
suitable. A different approach is, therefore, adopted in which the yield criterion
(6.1) is satisfied in a weak sense inside the plastic region at the end of a loading
step:

$$\int_{\Omega^p} f(\sigma_{n+1}, \lambda_{n+1}) \, \eta \, dx = 0 \quad \forall \, \eta \in W, \tag{6.5}$$

for a suitable choice of $W$. Together with the standard balance of momentum
equations, this forms a coupled system of equations for computing the unknowns
$u$ and $\lambda$. In the remainder of this section the subscript $n + 1$ is dropped for brevity.

¹ It has been argued that the plastic flow direction for strain gradient plasticity is governed by a
microstress and not the deviatoric Cauchy stress, see for instance Gudmundson (2004). However, to
ensure a proper comparison of the approach taken in this chapter to the approach of other researchers
investigating the model by Aifantis, this argument is not taken into account.
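Consider a weak formulation for the yield criterion (6.5):

$$a\big((u, \lambda); \eta\big) = 0 \quad \forall \, \eta \in W, \tag{6.6}$$

with

$$\begin{aligned}
a\big((u, \lambda); \eta\big) :={}& \int_{\Omega^p} \big( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \big) \, \eta \, dx \\
& - G \int_{\Omega^p} \nabla\lambda \cdot \nabla\eta \, dx + G \int_{\partial\Omega^p} (\nabla\lambda \cdot n) \, \eta \, ds,
\end{aligned} \tag{6.7}$$

where $n$ is the outward unit normal to $\partial\Omega^p$ and the last two integrals arise from the
application of integration by parts. The variational form is nonlinear due to how
the stress is computed from $u$ and $\lambda$; however, the linearisation of the equations is
postponed to Section 6.3. Due to the presence of integrals involving $\nabla\lambda$ and $\nabla\eta$
in (6.7), the functions interpolating $\lambda$ and $\eta$ should be in the space $H^1(\Omega)$.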
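The integration by parts step behind the last two integrals of (6.7) can be written out explicitly. The following is a sketch assuming that $\lambda$ is sufficiently smooth inside $\Omega^p$ and that $\kappa$ is identified with $\lambda$ as in (6.5):

```latex
\begin{align*}
\int_{\Omega^p} G \, \nabla^2 \lambda \, \eta \, dx
  &= G \int_{\Omega^p} \nabla \cdot (\eta \, \nabla \lambda) \, dx
   - G \int_{\Omega^p} \nabla \lambda \cdot \nabla \eta \, dx \\
  &= - G \int_{\Omega^p} \nabla \lambda \cdot \nabla \eta \, dx
   + G \int_{\partial \Omega^p} (\nabla \lambda \cdot n) \, \eta \, ds .
\end{align*}
```

The first line uses the product rule $\nabla \cdot (\eta \nabla \lambda) = \eta \nabla^2 \lambda + \nabla \eta \cdot \nabla \lambda$, and the second line applies the divergence theorem on $\Omega^p$.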
A further implication of (6.7) is the necessity of imposing boundary conditions
on $\lambda$ on the elastic–plastic boundary $\partial\Omega^p$. Frequently, the homogeneous boundary
conditions:

$$\lambda = 0 \quad \text{on } \Gamma_D^p, \tag{6.8}$$

$$\nabla\lambda \cdot n = 0 \quad \text{on } \Gamma_N^p, \tag{6.9}$$

are adopted, see for instance Mühlhaus and Aifantis (1991); De Borst and Mühlhaus
(1992), where $\Gamma_D^p$ and $\Gamma_N^p$ denote the parts of the elastic–plastic boundary where
Dirichlet and Neumann conditions for $\lambda$ are applied respectively. These boundary
conditions bear resemblance to the microhard and microfree boundary conditions
suggested by Gurtin (2004); Gurtin and Needleman (2005) and defined as:

microscopically hard boundary conditions meant to characterise, for example,
    microscopic behaviour at the boundary of a ductile metal perfectly bonded to
    a ceramic;

microscopically free boundary conditions meant to characterise microscopic
    behaviour at a boundary whose environment exerts no microscopic forces on
    the body.

The first boundary condition thus prevents plastic flow out of a domain, while the
latter does not.
Considering a discontinuous Galerkin formulation for satisfying the yield criterion
in a weak sense carries certain advantages. For instance, by using discontinuous
elements, the yield condition can be satisfied in a local sense (cell-wise). Another
advantage of a discontinuous Galerkin formulation is that the boundary conditions
in (6.8) and (6.9) on the, possibly moving, internal elastic–plastic boundary can be
included naturally. Due to the discontinuous elements, jumps in the λ function
across element boundaries can be represented, which is necessary to accommodate
the ∇λ · n = 0 boundary condition on the elastic–plastic boundary. With continuous
function spaces, this issue can be resolved by leaving the λ function undefined in
the elastic region $\Omega^e$. However, this approach is computationally
challenging if the elastic–plastic boundary is moving. Finally, discontinuous
constant elements can be used for λ, which is computationally more efficient.
The choice of boundary condition for λ on the elastic–plastic boundary is crucial
for the behaviour of the model. For instance, setting ∇λ · n = 0 on the elastic–
plastic boundary does not guarantee regularisation for a softening problem. The
reason is that this boundary condition permits a constant field for λ inside the
plastic domain which in turn does not introduce a gradient effect. Therefore, the
model does not provide a mechanism by which the plastic domain can expand.
On the other hand, by enforcing λ = 0 on the elastic–plastic boundary a constant
nonzero λ field is no longer possible. This activates the gradient term inside the
plastic domain and thereby provides a mechanism which allows the plastic domain
to expand. Note, however, that for a softening problem the plastic domain will
only expand if the gradient parameter is large enough to overcome the softening
behaviour of the plastic domain and make adjacent elastic elements yield. The
model is, therefore, not suitable for softening problems, see also Engelen et al.
(2006) who investigated the model proposed in Fleck and Hutchinson (2001) as
a representative of a wider class of gradient plasticity models, including that of
Aifantis presented in this section. For hardening problems, on the other hand,
the plastic domain can expand regardless of the choice of boundary condition for
λ although an expanding plastic domain will lead to jumps in the λ field. The
effect of boundary conditions on the behaviour of the model is demonstrated via
numerical examples in Section 6.5.
For the considered plasticity model and boundary conditions, others (De Borst
and Mühlhaus, 1992; De Borst and Pamin, 1996; Djoko et al., 2007a) have shown,
via numerical examples, a regularising effect in the presence of strain softening.
However, the regularising effect is achieved by defining the plastic multiplier on
the entire domain (elastic and plastic) and by imposing excessive regularity of the
plastic multiplier across the elastic–plastic boundary (using a $C^1$-conforming basis
(De Borst and Mühlhaus, 1992) or introducing penalty terms (De Borst and Pamin,
1996)) or by allowing the plastic multiplier to spread into the elastic region (Djoko
et al., 2007a). This is in contrast to the formulation pursued in this chapter which,
as already mentioned, will use a discontinuous basis for the plastic multiplier and
only consider the gradient term active in the plastic regions.

6.2 A discontinuous Galerkin formulation for the plastic multiplier

Instead of seeking $\lambda$ in the space $H^1(\Omega)$, as is implied by the variational formulation
in (6.7), it follows from the argumentation in the previous section that a more
appropriate function space for $\lambda$ is:

$$W := \left\{ w \in L^2(\Omega) : w|_T \in P_k(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{6.10}$$

where $P_k(T)$ denotes the space of Lagrange polynomials of degree $k$ on the element
$T$ of the standard triangulation of $\Omega$, which is denoted by $\mathcal{T}_h$. As a consequence
of the choice of function space $W$ and the imposed regularity requirement inside
the plastic region $\Omega^p$, following from the variational formulation in (6.7), a
discontinuous Galerkin formulation is needed.
First, consider a function space for the displacement field $u$:

$$V := \left\{ v \in \left[ H^1(\Omega) \right]^d : v|_T \in P_m(T) \ \forall \, T \in \mathcal{T}_h \right\}, \tag{6.11}$$

with Lagrange polynomials of degree $m$. A weak formulation for the yield criterion
corresponding to (6.5) on a single cell $T \in \Omega^p$ can then be formulated as: find
$(u, \lambda) \in V \times W$ such that for all $w \in W$

$$\begin{aligned}
& \int_T \big( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \big) \, w \, dx \\
& \quad - G \int_T \nabla\lambda \cdot \nabla w \, dx + G \int_{\partial T} \tau(\lambda, \nabla\lambda) \cdot n \, w \, ds = 0,
\end{aligned} \tag{6.12}$$

where the numerical flux $\tau(\lambda, \nabla\lambda)$ is an approximation to $\nabla\lambda$ on the boundary
of $T$. Various discontinuous Galerkin methods, including the IP and lifting-type
formulations presented in the preceding chapters, can be recovered by defining
the numerical flux appropriately and adding over all cells in $\Omega^p$, see Arnold et al.
(2002) for a detailed presentation. By setting $w = 1$ in (6.12) the equation reduces
to:

$$\int_T \big( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \big) \, dx + G \int_{\partial T} \tau(\lambda, \nabla\lambda) \cdot n \, ds = 0, \tag{6.13}$$

which demonstrates local conservation in terms of the numerical flux provided
that the numerical flux is single valued on cell facets. As this is the case for all the
variants of the numerical flux presented in Arnold et al. (2002), the yield criterion is
satisfied locally on each cell for both the IP and lifting-type formulations.
Taking guidance from Section 4.2.1, the variational form for (6.6) using an
interior penalty formulation can be defined as:

$$\begin{aligned}
a_f^{\mathrm{IP}}\big((u, \lambda); w\big) :={}& \int_{\Omega^p} \big( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \big) \, w \, dx - G \int_{\Omega^p} \nabla\lambda \cdot \nabla w \, dx \\
& + G \int_{\Gamma_0^p} \big( \langle \nabla\lambda \rangle \cdot [\![ w ]\!] + [\![ \lambda ]\!] \cdot \langle \nabla w \rangle \big) \, ds - G \int_{\Gamma_0^p} \frac{\alpha}{h_e} [\![ \lambda ]\!] \cdot [\![ w ]\!] \, ds,
\end{aligned} \tag{6.14}$$

where the last term ensures stability of the formulation, $h_e$ denotes the
distance between the centroids of two neighbouring elements, $\alpha$ is the usual
stabilisation parameter and $\Gamma_0^p$ denotes the set of interior facets inside the plastic
region $\Omega^p$. Both $\lambda$ and $w$ are assumed to be functions in $W$ while $u \in V$. This
particular type of formulation was used by Djoko et al. (2007a,b) for a gradient
plasticity model similar to the one described in the previous section.
However, as demonstrated in the previous chapter, the IP formulation is not
suitable when discontinuous piecewise constant elements are used. A lifting-type
formulation is, therefore, developed which is similar in nature to the lifting-type
formulation for the Poisson equation, which was discussed in Section 5.1, although
some definitions are slightly different to take into account that (6.5) is only valid for
regions undergoing plastic deformations. The jump of a function $w \in W$ is defined
as:

$$[\![ w ]\!] = \begin{cases} w^+ n^+ + w^- n^- & \text{on } \Gamma_0^p, \\ w n & \text{on } \partial\Omega \cup \partial\Omega^p, \end{cases} \tag{6.15}$$

which is comparable to (5.2), page 116. The function space for the gradient of
functions in $W$ is again denoted by $Q$ and is defined in (5.3). The definition of the
average of a function $q \in Q$ is slightly different from that in (5.4), namely:

$$\langle q \rangle = \begin{cases} \frac{1}{2} \left( q^+ + q^- \right) & \text{on } \Gamma_0^p, \\ q & \text{on } \partial\Omega \cup \partial\Omega^p. \end{cases} \tag{6.16}$$

The definitions of the lifting operator and the lifting function (equations (5.5)
and (5.6) respectively) are also slightly different. The operator $r_S : W \to Q$ is defined
as follows: for a given $w \in W$, find $r_S(w) \in Q$ such that:

$$\int_E r_S(w) \cdot q \, dx = - \int_S [\![ w ]\!] \cdot \langle q \rangle \, ds \quad \forall \, q \in Q, \tag{6.17}$$

where $E = T^+ \cup T^-$, as seen in Figure 4.1, for $S \in \Gamma_0^p$; $E \in \mathcal{T}_h$ is the element
associated with the facet $S$ for $S \in \partial\Omega$; and for $S \in \Gamma_D^p$, $E$ is the element inside $\Omega^p$
which is associated with the facet $S$. The lifting function is then defined as:

$$R(w) = \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} r_S(w), \tag{6.18}$$

which is very similar to the lifting function in (5.6). Note that due to the definitions
of $E$ and $S$ in (6.17), the function is defined neither in the elastic region of $\Omega$
nor on $\Gamma_N^p$ and it will, therefore, be defined to be zero in both of these cases. The
lifting-type formulation corresponding to the variational form in (6.14) for the yield
criterion then reads:

$$\begin{aligned}
a_f\big((u, \lambda); w\big) :={}& \int_{\Omega^p} \big( \phi(\sigma(u, \lambda)) - H\lambda - \sigma_y \big) \, w \, dx \\
& - G \int_{\Omega^p} \big( \nabla\lambda + R(\lambda) \big) \cdot \big( \nabla w + R(w) \big) \, dx - \sum_{S \in \Gamma_0^p \cup \Gamma_D^p} \int_{\Omega^p} \alpha G \, r_S(\lambda) \cdot r_S(w) \, dx,
\end{aligned} \tag{6.19}$$

where $\alpha$ is a stabilisation parameter. Note the close resemblance to the formulation
for the Poisson equation in (5.7).

6.3 Linearisation of the governing equations

The steady state balance of momentum equation from (2.12), page 32, at the end of
a loading step reads:

$$\int_\Omega \sigma(u_{n+1}, \lambda_{n+1}) : \nabla v \, dx - \int_{\Gamma_N} h_{n+1} \cdot v \, ds - \int_\Omega b_{n+1} \cdot v \, dx = 0, \tag{6.20}$$

where $v \in V$ is a weight function with

$$V := \left\{ v \in \left[ H^1(\Omega) \right]^d : v|_T \in P_m(T) \ \forall \, T \in \mathcal{T}_h, \ v = 0 \text{ on } \Gamma_D \right\}, \tag{6.21}$$

where Lagrange polynomials of degree $m$ are used and homogeneous Dirichlet
boundary conditions are assumed for the displacement field $u$. The yield criterion
must also be satisfied at the end of the loading step:

$$a_f\big((u_{n+1}, \lambda_{n+1}); w\big) = 0. \tag{6.22}$$




Together these equations form a coupled system of equations that are nonlinear in
general. Newton’s method is, therefore, employed to obtain a solution by linearising
about a state defined at Newton iteration k.

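At the end of a loading step the stress tensor can be computed from (2.21)
and (2.22), page 34, by:

$$\sigma_{n+1} = C : \left( \varepsilon_{n+1} - \varepsilon^p_{n+1} \right). \tag{6.23}$$

The increment in plastic strain is computed from (2.26):

$$\varepsilon^p_{n+1} - \varepsilon^p_n = \Delta\lambda \frac{\partial f(\sigma_{n+1})}{\partial \sigma} = \Delta\lambda \, N(\sigma_{n+1}), \tag{6.24}$$

where the increment of the plastic multiplier is $\Delta\lambda = \lambda_{n+1} - \lambda_n$. The Newton
increment of the stress tensor is determined by inserting (6.24) into (6.23) and
linearising such that at iteration $k$:

$$d\sigma = C : \left( \nabla^s du - N_k \, d\lambda - \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} : d\sigma \right), \tag{6.25}$$

which after rearranging terms yields:

$$d\sigma = C_{tan} : \left( \nabla^s du - N_k \, d\lambda \right), \tag{6.26}$$

with

$$C_{tan} = \left( C^{-1} + \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \right)^{-1}. \tag{6.27}$$

Here, $\Delta\lambda_k = \lambda_k - \lambda_n$ denotes the total increment in the plastic multiplier measured
from the previously converged state at load step $n$.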
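The rearrangement leading from (6.25) to (6.26) and (6.27) can be spelled out as follows. This is a sketch assuming that $C$ is invertible and that the fourth-order tensor $\partial N_k / \partial \sigma$ acts on $d\sigma$ through a double contraction:

```latex
\begin{align*}
C^{-1} : d\sigma
  &= \nabla^s du - N_k \, d\lambda
   - \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} : d\sigma , \\
\left( C^{-1} + \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \right) : d\sigma
  &= \nabla^s du - N_k \, d\lambda , \\
d\sigma
  &= \left( C^{-1} + \Delta\lambda_k \frac{\partial N_k}{\partial \sigma} \right)^{-1}
     : \left( \nabla^s du - N_k \, d\lambda \right) .
\end{align*}
```

The first line follows from contracting (6.25) with $C^{-1}$, the second collects the terms in $d\sigma$ on the left-hand side, and the third identifies the inverse of the bracketed tensor with $C_{tan}$. For $\Delta\lambda_k = 0$ the tangent reduces to $C$.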

In a similar fashion the increment of the yield function can be found by linearising
(6.1) such that:

$$df = N_k : d\sigma - H \, d\lambda + G \nabla^2 d\lambda, \tag{6.28}$$

which after inserting (6.26) results in the following expression for the increment of
the yield function:

$$df = N_k : C_{tan} : \left( \nabla^s du - N_k \, d\lambda \right) - H \, d\lambda + G \nabla^2 d\lambda. \tag{6.29}$$

Using these increments, the linearised coupled variational formulation for the
equations (6.20) and (6.22) then reads: find $(du, d\lambda) \in V \times W$ such that

$$a\big((du, d\lambda); (v, w)\big) = L(v, w) \quad \forall \, (v, w) \in V \times W, \tag{6.30}$$



6.4. Implementation 137

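where
\[
\begin{aligned}
  a\bigl( (du, d\lambda); (v, w) \bigr) ={}
    & \int_{\Omega} C_{\mathrm{tan}} : \nabla^s du : \nabla v \, dx
      - \int_{\Omega} d\lambda \, H w \, dx \\
    & - \int_{\Omega^p} d\lambda \, C_{\mathrm{tan}} : N_k : \nabla v \, dx
      + \int_{\Omega^p} N_k : C_{\mathrm{tan}} : \nabla^s du \, w \, dx \\
    & - \int_{\Omega^p} d\lambda \, N_k : C_{\mathrm{tan}} : N_k \, w \, dx
      - \int_{\Omega^p} G \bigl( \nabla d\lambda + R(d\lambda) \bigr) \cdot \bigl( \nabla w + R(w) \bigr) \, dx \\
    & - \sum_{S \in \Gamma^p_0 \cup \Gamma^p_D} \int_{\Omega^p} \alpha G \, r_S(d\lambda) \cdot r_S(w) \, dx
\end{aligned}
\tag{6.31}
\]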

and
\[
  L(v, w) = \int_{\Omega} \sigma_k : \nabla v \, dx - \int_{\Omega} b \cdot v \, dx - \int_{\Gamma_N} h \cdot v \, ds + \int_{\Omega^p} f_k w \, dx.
  \tag{6.32}
\]

In the linear form, the homogeneous Dirichlet and Neumann boundary conditions
for λ, see (6.8) and (6.9), have been adopted. An important thing to note is that
the term $\int_{\Omega} d\lambda \, H w \, dx$ in (6.31) is effective in the entire domain although, strictly
speaking, it should only be effective in regions undergoing plastic deformation.
This is necessary in order to avoid a singular global system when solving the
equations. However, it does not affect the solution because the term
$\int_{\Omega^p} f_k w \, dx$ in (6.32) is only nonzero in regions undergoing plastic deformation.


The variational problem in (6.30) is solved for each Newton iteration followed
by the corrections uk ← uk − du and λk ← λk − dλ after which σk and f k can be
updated before proceeding with the next iteration k ← k + 1. Note that although
Nk , σk and Ctan are computed at integration points they are assembled into a global
system of equations for computing dλ. The classical local return mapping scheme
is thus effectively substituted by a global Newton scheme. Implementation details
of a solver for these linearised equations are discussed in the following section.
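
The linearise, solve and correct cycle described above can be illustrated on a scalar stand-in problem. The sketch below is not the thesis solver: the function name `newton`, the tolerance and the test equation are assumptions chosen for illustration, and only the sign convention x ← x − dx mirrors the corrections uk ← uk − du and λk ← λk − dλ.

```python
def newton(residual, jacobian, x0, tol=1e-12, max_iter=50):
    """Scalar Newton iteration with the correction x <- x - dx,
    where dx solves jacobian(x) * dx = residual(x)."""
    x = x0
    for k in range(max_iter):
        r = residual(x)
        if abs(r) < tol:
            return x, k
        dx = r / jacobian(x)  # linearised problem at iteration k
        x = x - dx            # correction step
    raise RuntimeError("Newton iteration did not converge")


# Hypothetical scalar test problem: x**3 - 2 = 0.
root, iterations = newton(lambda x: x**3 - 2.0, lambda x: 3.0 * x**2, x0=1.0)
```

In the coupled problem the scalar division is replaced by the assembly and solution of the global system (6.30), and the residual by the linear form (6.32).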

6.4 Implementation
Implementing a solver for the coupled nonlinear equations of the gradient plasticity
problem involves advancing the solution from the pseudo time tn to the time tn+1
where the state defined at tn is known. This is achieved by a series of iterations
using a predictor–corrector algorithm outlined in Algorithms 4 and 5 which is
implemented in the C++ class GradPlasProblem in the FEniCS Solid Mechanics
library. The algorithm is inspired by the work of Djoko et al. (2007b) although there
are a few notable differences. Firstly, the evolving plastic region is determined
based on the cell average value of the yield criterion instead of the value at
integration points. This means that an element is either elastic or plastic and that
the elastic–plastic boundary is located on element facets and not inside elements.
This is a necessary requirement for the discontinuous formulation developed in
the previous section. Secondly, the value of G∇²λ is not computed at integration
points for evaluating the yield criterion. Instead the yield criterion is evaluated
by solving a variational formulation. Thirdly, and most importantly, the value
of the yield function in an elastic element is independent of λ values in adjacent
plastic elements as the gradient terms are only active inside the plastic region. This
property prevents the artificial spread of λ from the plastic region into the elastic
region and a resulting spurious regularisation. The solution procedure is outlined
below.

6.4.1 The predictor step

Algorithm 4 Predictor step of the predictor–corrector algorithm for the coupled
variational problem in (6.30) at iteration k and time step n + 1.
 1: Solve system (6.30) at configuration k − 1 to get $u_k$ and $\lambda_k$.
 2: for $T \in \mathcal{T}_h$ do
 3:     Compute $\Delta\lambda_k = \lambda_k - \lambda_n$.
 4:     Compute cell average $\Delta\lambda_k^{\mathrm{avg}}$.
 5:     if $\Delta\lambda_k^{\mathrm{avg}} < 0$ then
 6:         Force all integration points on T to be elastic during entire load step n → n + 1.
 7:         Use the elastic tangent, $C_{\mathrm{tan}} = C$ and set $\lambda_k = \lambda_n$.
 8:     end if
 9:     Compute trial stress $\sigma_{\mathrm{tr}} = C : (\varepsilon_k - \varepsilon^p_n)$.
10:     Update N to trial state $N_{\mathrm{tr}} = \partial f(\sigma_{\mathrm{tr}}) / \partial \sigma$.
11: end for
12: Solve problem (6.33) using $\Omega^p_{k-1}$ to get $f_{\mathrm{tr}} = f(\sigma_{\mathrm{tr}}, \lambda_k)$.

Algorithm 4 shows the computations for the predictor step. The force b and
boundary condition h are updated to the state n + 1 and the global system in (6.30)
is assembled and solved to get the Newton increments du and dλ which are used
to update the values of u and λ at time n + 1 and iteration number k such that in
general un+1,k ← un+1,k−1 + du and λn+1,k ← λn+1,k−1 + dλ. For the first iteration
un+1,k ← un + du and λn+1,k ← λn + dλ. In the following, and in Algorithms 4
and 5, the subscripts n + 1 are omitted.
The total increment in the plastic multiplier for the entire load step is computed
at every integration point, line 3. If the cell average of this increment is negative,
the cell is marked as elastic during the entire load step n → n + 1 to avoid the
unstable situation where elements are switching back and forth between the elastic
and plastic state, lines 5–8. Note that it is only the total increment of λ which
is not allowed to be negative, thus dλ for an element can be negative during
iterations. This situation will often occur as, due to the gradient effect, the λ field is
redistributed while the plastic domain is expanding. A trial stress is then computed
locally for all integration points based on uk and the plastic strain from the previous
converged load step n, line 9.
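
Lines 3–8 of Algorithm 4 can be sketched for a single cell as follows. This is a minimal illustration and not the GradPlasProblem code: the function names and the list-based storage of integration-point values and quadrature weights are assumptions made here.

```python
def cell_average(values, weights):
    """Quadrature-weighted average of integration-point values on one cell."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def predictor_elastic_check(lam_k, lam_n, weights):
    """Lines 3-8 of Algorithm 4 for a single cell.

    lam_k, lam_n: plastic multiplier at the integration points of the cell
    (current iterate and last converged load step).
    Returns (forced_elastic, lam_k), where lam_k is reset to lam_n if the
    cell average of the total increment is negative.
    """
    dlam = [a - b for a, b in zip(lam_k, lam_n)]  # line 3
    dlam_avg = cell_average(dlam, weights)        # line 4
    if dlam_avg < 0.0:                            # line 5
        return True, list(lam_n)                  # lines 6-7
    return False, list(lam_k)
```

Note that an individual integration point may carry a negative total increment while the cell remains plastic; only the sign of the cell average decides whether the cell is forced to be elastic.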
To determine which elements are yielding, the yield criterion (6.1) must be
evaluated. The value of the yield function also enters the linear form in (6.32) and
it must, therefore, be consistent with the linearisation in the previous section. This
means that the boundary conditions for λ should be identical to those enforced in
the bilinear form (6.31) and that the gradient terms are only active in the plastic
region. A variational formulation to compute the value of the yield function at
some known state k, can be defined on the form: find f k ∈ W such that

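\[
  a(f_k, w) = L(w) \quad \forall \, w \in W,
  \tag{6.33}
\]
where
\[
  a(f_k, w) := \int_{\Omega} f_k w \, dx
  \tag{6.34}
\]
and
\[
\begin{aligned}
  L(w) :={}
    & \int_{\Omega} \bigl( \phi(\sigma_k) - H \lambda_k - \sigma_y \bigr) w \, dx \\
    & - \int_{\Omega^p} G \bigl( \nabla \lambda_k + R(\lambda_k) \bigr) \cdot \bigl( \nabla w + R(w) \bigr) \, dx \\
    & - \sum_{S \in \Gamma^p_0 \cup \Gamma^p_D} \int_{\Omega^p} \alpha G \, r_S(\lambda_k) \cdot r_S(w) \, dx.
\end{aligned}
\tag{6.35}
\]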

The yield criterion is evaluated (line 12), using the trial stress, by solving the
variational problem (6.33) under the assumption that the set of plastic elements
remained constant during the last iteration, that is, using $\Omega^p_{k-1}$ (or $\Omega^p_n$ in case
k = 0).
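
When W consists of piecewise constants, as in the P1/P0 discretisation, problem (6.33) decouples cell by cell. The worked step below is a sketch under that assumption; the indicator function χ_T of a cell T, the cell measure |T| and the cell value f_T are notation introduced here for illustration.

```latex
\begin{align*}
a(f_k, \chi_T) &= \int_{\Omega} f_k \, \chi_T \, dx = f_T \, |T|, \\
f_T &= \frac{1}{|T|} \, L(\chi_T).
\end{align*}
```

Provided the gradient in (6.35) is understood cell-wise, ∇χ_T vanishes inside each cell, so the gradient contribution to L(χ_T) enters only through the lifting terms R(χ_T) and r_S(χ_T).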

6.4.2 The corrector step


The value of the yield function, based on the trial stresses and the old set of plastic
cells, was computed in the predictor step. In the corrector step, Algorithm 5, this
value is used to determine the new set of plastic elements together with a corrected
stress. In lines 2–5, the total increment of the plastic multiplier is again evaluated
to test if the element should be forced to be elastic. If this is the case, the corrected
stress is equal to the trial stress. Based on element averages of the yield function the
new set of plastic cells can be determined, lines 6–8. This means that integration
points inside an element can become part of the plastic domain although the value
of the yield function is negative for that particular integration point. As already

Algorithm 5 Corrector step of the predictor–corrector algorithm for the coupled
variational problem in (6.30) at iteration k and time step n + 1.
 1: for $T \in \mathcal{T}_h$ do
 2:     if $\Delta\lambda_k^{\mathrm{avg}} < 0$ for given cell then
 3:         Cell is already marked elastic, so use trial stress $\sigma_k = \sigma_{\mathrm{tr}}$.
 4:         continue
 5:     end if
 6:     Compute cell average of the yield function $f_{\mathrm{tr}}^{\mathrm{avg}}$.
 7:     if $f_{\mathrm{tr}}^{\mathrm{avg}} > 0$ then
 8:         Cell is marked as plastic, that is, $T \in \Omega^p_k$.
 9:         Correct trial stress $\sigma_k = C : (\varepsilon_k - \varepsilon^p_n - \Delta\lambda_k N_{\mathrm{tr}})$.
10:         Update N such that $N_k = \partial f(\sigma_k) / \partial \sigma$ and compute $C_{\mathrm{tan}}$ from (6.27).
11:     else
12:         Mark current cell as elastic, that is, $T \notin \Omega^p_k$.
13:         Use the elastic tangent, $C_{\mathrm{tan}} = C$ and set $\sigma_k = \sigma_{\mathrm{tr}}$ and $\lambda_k = \lambda_n$.
14:     end if
15: end for
16: Solve problem (6.33) using $\Omega^p_k$ to get $f_k = f(\sigma_k, \lambda_k)$ for the linear form in (6.32).
17: Assemble system (6.30) at state k.
18: if Global convergence then
19:     Advance state $(\cdot)_{n+1} \to (\cdot)_n$.
20: else
21:     Return to line 1 in the predictor step, Algorithm 4 and increment iteration number k ← k + 1.
22: end if
discussed, this is necessary because the discontinuous Galerkin terms in (6.35)
and (6.31) are only defined on element boundaries.
After updating the plastic domain and correcting the trial stress for the relevant
integration points (line 9), the yield criterion (6.33) is evaluated again (line 16) such
that corrected values enter the linear form of the coupled problem in (6.32) as
f k . Notice that values of f k might be negative to allow for negative dλ in the next
iteration thereby permitting a redistribution of the λ field within the load step.
Finally, the global system in (6.30) is assembled and checked for convergence, lines
17–18. If convergence is achieved, the system is advanced to the next load step (line
19), otherwise the algorithm returns to the predictor step and continues iterating (line 21).
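
The cell-wise decision and stress correction in lines 2–14 of Algorithm 5 can be sketched as follows for one integration point. This is an illustration under stated assumptions rather than the library implementation: stresses and strains are stored as plain lists in Voigt notation, the helper names are hypothetical, and the updates of Nk, Ctan and λk are left out.

```python
def matvec(A, x):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def corrector_point(f_tr_avg, forced_elastic, C, eps_k, eps_p_n, dlam_k, N_tr):
    """Lines 2-14 of Algorithm 5 at one integration point.

    f_tr_avg and forced_elastic are cell-level quantities; the remaining
    arguments are integration-point values in Voigt notation.
    Returns (is_plastic, sigma_k).
    """
    elastic_strain = [e - p for e, p in zip(eps_k, eps_p_n)]
    if forced_elastic or f_tr_avg <= 0.0:
        # Elastic cell: the corrected stress equals the trial stress.
        return False, matvec(C, elastic_strain)
    # Plastic cell: subtract the plastic strain increment dlam_k * N_tr.
    corrected = [e - dlam_k * n for e, n in zip(elastic_strain, N_tr)]
    return True, matvec(C, corrected)
```

Because the plastic or elastic state is decided from the cell average, every integration point of a plastic cell takes the corrected stress, even a point where the local value of the yield function is negative.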

6.4.3 Implementing the variational forms


The bilinear and linear forms of the two variational problems (6.30) and (6.33),
which are solved during the iterations in the algorithm above, can be implemented
in the FEniCS framework by utilising the functionality outlined in the preceding
chapters. To accomplish this, integrals of the variational forms which only contain
conventional terms are implemented in a standard UFL file while integrals involving
the lifting function R and the lifting operator r are handled separately. For the
coupled problem (6.30), the linear form (6.32) and the terms from the bilinear
form (6.31) which are independent of R and r can be implemented in UFL in a
straightforward fashion as shown in Figure 6.1. As was the case for conventional
plasticity in Section 2.4.2, the stress and the linearised tangent (and also the gradient
of the yield function N) are supplied as coefficients to the form using quadrature
elements. In the code, subdomain 0 refers to the elastic region while subdomain
1 refers to the plastic region. The remaining terms from (6.31) which involve R
and r, which in essence are identical to the bilinear form for the Poisson equation
in (5.7), can be implemented as outlined in Section 5.2 using the LiftingAssembler
class. The variational problem (6.33) is implemented using a similar approach.

6.5 Numerical examples


In this section, the gradient plasticity model is applied to different example prob-
lems to demonstrate the influence of boundary conditions on the abilities of the
model when considering softening and hardening problems. First, a simple soften-
ing problem is considered with negligible plastic strain gradients inside the plastic
domain to demonstrate that the model does not guarantee regularisation. Then,
another strain softening problem is considered, in which plastic strain gradients are
present inside the plastic domain, to demonstrate that some degree of regularisation
can be achieved for the microfree boundary condition. Finally, a hardening problem
is considered to demonstrate that the model is only capable of modelling size
effects when the microhard boundary condition is considered.

UFL code
# Displacement (P1), plastic multiplier (DG0) and quadrature elements that
# carry integration-point data in Voigt storage (6 components, 6 x 6 tangent).
V = VectorElement("Lagrange", tetrahedron, 1)
W = FiniteElement("DG", tetrahedron, 0)
EPS = VectorElement("Quadrature", tetrahedron, 1, 6)
TAN = VectorElement("Quadrature", tetrahedron, 1, 36)
element = V * W

(v, w) = TestFunctions(element)
(du, dl) = TrialFunctions(element)

# Coefficients supplied by the solver: N_k, sigma_k, f_k and C_tan.
N0 = Coefficient(EPS)
sig0 = Coefficient(EPS)
f0 = Coefficient(W)
t = Coefficient(TAN)
H = 2000.0
G = 800.0

# Reshape the flat 36-vector into the 6 x 6 tangent matrix.
def tangent(t):
    return as_matrix([[t[i*6 + j] for j in range(6)] for i in range(6)])

# Strain in Voigt notation with engineering shear strains.
def epsilon(U):
    return as_vector([U[i].dx(i) for i in range(3)] \
        + [U[i].dx(j) + U[j].dx(i) for i, j in [(0, 1), (0, 2), (1, 2)]])

# Stress vector mapped to a symmetric 3 x 3 matrix.
def sigma(s):
    return as_matrix([[s[0], s[3], s[4]],
                      [s[3], s[1], s[5]],
                      [s[4], s[5], s[2]]])

# Subdomain 0 is the elastic region, subdomain 1 the plastic region.
a = inner(dot(tangent(t), epsilon(du)), epsilon(v))*dx(0) - inner(dl*H, w)*dx(0) \
  + inner(dot(tangent(t), epsilon(du)), epsilon(v))*dx(1) - inner(dl*H, w)*dx(1) \
  - dl*inner(dot(tangent(t), N0), epsilon(v))*dx(1) \
  + inner(dot(N0, tangent(t)), epsilon(du))*w*dx(1) \
  - dl*w*inner(dot(N0, tangent(t)), N0)*dx(1) \
  - G*inner(grad(dl), grad(w))*dx(1)

L = inner(sigma(sig0), grad(v))*dx(0) \
  + inner(sigma(sig0), grad(v))*dx(1) + f0*w*dx(1)

Figure 6.1: UFL input for the conventional parts of the variational problem in (6.30)
in three dimensions. In the specific case, continuous, piecewise linear elements are
used for the displacements while discontinuous piecewise constant elements are
used for the plastic multiplier.
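
The storage conventions behind the helper functions tangent and sigma in Figure 6.1 can be checked with plain Python lists. The sketch below is independent of UFL; the function names are hypothetical and the component order (xx, yy, zz, xy, xz, yz) is inferred from the index pairs used in epsilon.

```python
def tangent_matrix(t):
    """Reshape a flat, row-major list of 36 values into a 6 x 6 nested list."""
    return [[t[i * 6 + j] for j in range(6)] for i in range(6)]


def stress_matrix(s):
    """Map a stress vector (xx, yy, zz, xy, xz, yz) to a symmetric 3 x 3 matrix."""
    return [[s[0], s[3], s[4]],
            [s[3], s[1], s[5]],
            [s[4], s[5], s[2]]]


example = stress_matrix([1, 2, 3, 4, 5, 6])
```

Storing the tangent as a vector of length 36 is what allows it to be passed to the form as a coefficient on a vector-valued quadrature element.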

The lifting-type formulation of the model will be considered and the solver is
implemented in the FEniCS framework as outlined in the previous section. Two
combinations of finite element discretisations for the displacement and plastic
multiplier fields will be considered. The first case considers a continuous, piecewise
linear displacement field and a discontinuous, piecewise constant field for the
plastic multiplier λ and will be referred to as the P1 /P0 case. The second case
considers a continuous, piecewise quadratic displacement field and a discontinuous,
piecewise linear field for the plastic multiplier λ and will be referred to as the
P2 /P1 case. For the latter case, it is chosen to use discontinuous, piecewise linear
polynomials for the gradient space Q although constant elements can be used
according to the definition in (5.3). Using equal order elements for the plastic
multiplier and the gradient space improves convergence of the Newton solver
when large gradients are present. Similar observations have been reported by Bassi
and Rebay (1997). Based on the conclusions from the previous chapter regarding
lifting-type formulations for fields involving discontinuous constants, the value of
the stabilisation parameter will be set to α = 10⁻³ for all examples so that the
stabilisation term does not govern the solution.
Two types of boundary conditions for λ, see (6.8) and (6.9), are considered for
all examples. The first type will be referred to as the microhard boundary condition
where λ = 0 on $\Gamma^p_D = \partial\Omega^p \setminus \partial\Omega$, that is, the facets on the elastic–plastic boundary
which are not located on the exterior of the domain. For the microhard boundary
condition, ∇λ · n = 0 is imposed on the remainder of facets on the plastic boundary
such that $\Gamma^p_N = \partial\Omega^p \cap \partial\Omega$. The second type will be referred to as the microfree
boundary condition where ∇λ · n = 0 on $\partial\Omega^p$.
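
The split of the plastic boundary into the two facet sets can be sketched with a simple facet-to-cell map. The data layout and the function name below are assumptions made for illustration; they do not reflect how the mesh connectivity is stored in the FEniCS libraries.

```python
def classify_plastic_boundary(facet_cells, plastic_cells):
    """Split the facets on the boundary of the plastic region.

    facet_cells: dict mapping a facet id to the tuple of adjacent cell ids
    (one cell for exterior facets, two cells for interior facets).
    plastic_cells: set of cell ids currently marked as plastic.
    Returns (gamma_D, gamma_N) for the microhard boundary condition.
    """
    gamma_D, gamma_N = set(), set()
    for facet, cells in facet_cells.items():
        n_plastic = sum(1 for c in cells if c in plastic_cells)
        if len(cells) == 2 and n_plastic == 1:
            gamma_D.add(facet)  # elastic-plastic interface: lambda = 0
        elif len(cells) == 1 and n_plastic == 1:
            gamma_N.add(facet)  # exterior facet of a plastic cell
    return gamma_D, gamma_N


# Hypothetical strip of three cells (0, 1, 2) with four facets.
facets = {0: (0,), 1: (0, 1), 2: (1, 2), 3: (2,)}
gamma_D, gamma_N = classify_plastic_boundary(facets, {1, 2})
```

For the microfree boundary condition the union of both sets carries ∇λ · n = 0, so no facet needs to be singled out for a Dirichlet condition.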

6.5.1 Unit square loaded in shear with strain softening


This example considers a unit square under shear loading with strain softening
and negligible plastic strain gradients in the plastic domain. The purpose is
to demonstrate that the model does not provide a mechanism that guarantees
regularisation in the softening regime. The domain, Ω = [0, 1] × [0, 1], is divided
into 5 × 5 cells and each cell is divided into two triangles, see Figure 6.3. The
left-hand and right-hand sides of the domain are fixed in the horizontal direction
and the bottom is fixed in the vertical direction for x ≤ 0.4. A sequence of downward
displacements is prescribed at the top of the domain for x > 0.6. An elastic load
step of ∆u = 0.0006 mm is followed by 14 plastic load steps of ∆u = 0.0001 mm such
that the total downward displacement after the final load step is u = 0.002 mm.
The test is performed under plane strain conditions using the material parameters
shown in Table 6.1. The material is weakened in the center of the domain such
that σy = 150 MPa in [0.2, 0.8] × [0.2, 0.8] and for 0.4 ≤ x ≤ 0.6. For this particular
example, only P1/P0 elements are considered.
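
The loading sequence of this example can be tabulated with a short helper. The function name is hypothetical and the sketch only reproduces the arithmetic of the load steps stated above.

```python
def load_schedule(elastic_step, plastic_step, n_plastic):
    """Cumulative prescribed displacement after each load step."""
    levels = [elastic_step]
    for _ in range(n_plastic):
        levels.append(levels[-1] + plastic_step)
    return levels


# One elastic step of 0.0006 mm followed by 14 plastic steps of 0.0001 mm.
levels = load_schedule(0.0006, 0.0001, 14)
```

The list holds 15 displacement levels and its last entry is, up to rounding, the total displacement of 0.002 mm quoted in the text.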
First, the microfree boundary condition, ∇λ · n = 0, is applied. The net force

Parameter               Value      Unit
Young’s modulus, E      200.0E3    MPa
Poisson’s ratio, ν      0.3        -
Yield strength, σy      200.0      MPa
Hardening modulus, H    -25.0E3    MPa

Table 6.1: Material parameters for the localisation example.

[Plot: net force [N] versus displacement [mm]; curves for G = 0, G = 1E2, G = 1E4 and G = 1E6.]

Figure 6.2: Load-displacement curve for different values of G using the microfree
boundary condition for λ on a unit square loaded in shear.

acting at the top of the domain as a function of the displacement is shown in
Figure 6.2 for different values of the gradient parameter G. As expected, the results
are independent of the value of G, even for very large values, as no significant
plastic strain gradients are present inside the plastic region. The distribution of the
λ field after the last load step can be seen in Figure 6.3. Clearly, the distribution of
the λ field is independent of the value of the gradient parameter and the plastic
zone is localised in the middle column of elements.
The microhard boundary condition, λ = 0, is now applied. The net force acting
at the top of the domain as a function of the displacement is shown in Figure 6.4
for different values of the gradient parameter G. As expected, the results for this
type of boundary condition are very sensitive to the value of G. The reason is
that the microhard boundary condition on the elastic–plastic boundary introduces
a gradient effect. Also note that the values for G are substantially lower than
those used for the microfree case. For the case where G = 50MPa the specimen

(a) G = 0MPa. (b) G = 100MPa. (c) G = 1E6MPa.

Figure 6.3: Localisation of λ in the middle column of elements for different values
of the gradient parameter G after the last load step using the microfree boundary
condition.

[Plot: net force [N] versus displacement [mm]; curves for G = 0, G = 50, G = 250 and G = 1000.]

Figure 6.4: Load-displacement curve for different values of G using the microhard
boundary condition for λ on a unit square loaded in shear.

(a) G = 50MPa. (b) G = 250MPa. (c) G = 1000MPa.

Figure 6.5: Distribution of λ for different values of the gradient parameter G after
the last load step using the microhard boundary condition.

still exhibits softening, but the softening is less pronounced than in the case where
G = 0MPa. As the value of G is increased, the softening becomes less pronounced
and for G = 250MPa the load-displacement curve is almost perfectly plastic. In
other words, the gradient effect counterbalances the influence of material softening
governed by the hardening parameter H. For G = 1000MPa, the specimen exhibits
a hardening behaviour as the gradient effect becomes dominant compared to the
softening term in the yield function.
The distribution of the λ field after the last load step for the microhard boundary
condition can be seen in Figure 6.5. Note that for the two cases where G = 50MPa
and G = 250MPa, Figures 6.5a and 6.5b, the plastic zone is still localised in the
middle column of elements. However, the values of λ are different compared to the
classical plasticity case, Figure 6.3a, in that higher values of G correspond to lower
values of λ because the microhard boundary condition drives the values towards
zero. When the gradient parameter is large enough to make the specimen enter the
hardening regime, the plastic zone expands to the adjacent elements as shown in
Figure 6.5c. The expanding plastic zone also accounts for the jumps observed in
the load-displacement curve.
As demonstrated in this example, the model is incapable of providing regulari-
sation under the given conditions when using the microfree boundary condition.
Switching to the microhard boundary condition makes the softening less pro-
nounced for lower values of the gradient parameter although it does not lead to an
expansion of the plastic zone. To make the plastic zone expand, a high value of
the gradient parameter is needed which effectively changes the load-displacement
response from softening to hardening.

6.5.2 Plate under compressive loading with strain softening


This example considers shear band formation in a plate subjected to compressive
loading which means that gradients of the plastic multiplier will be present inside
the plastic region. Therefore, in contrast to the previous example, the results will

(a) Mesh 1 consisting of 690 triangles. (b) Mesh 2 consisting of 1566 triangles. (c) Mesh 3 consisting of 6370 triangles.

Figure 6.6: Three unstructured meshes of the plate subjected to compressive
loading.

depend on the value of the gradient parameter also when using the microfree
boundary condition. As a consequence, regularisation of the softening problem
can be expected to some extent. Three different unstructured meshes, shown in
Figure 6.6, will be considered to demonstrate the influence of the mesh size in
this softening problem. The width of the plate is 10mm while the height is 15mm
and the imperfection in the lower left corner has an extension of 1mm. The left-
hand side of the plate is fixed in the horizontal direction and the bottom is fixed
in vertical direction. The test is performed under plane strain conditions using
material parameters identical to the ones shown in Table 6.1 with the exception
that H = −4000MPa and the yield stress is uniform in the entire domain.

P1 /P0 elements with G = 0


A sequence of downward displacements is prescribed at the top of the plate.
An elastic load step of ∆u = 0.005 mm is followed by twelve plastic load steps of
∆u = 0.0025 mm such that the total downward displacement after the final load
step is u = 0.035 mm.
First, the gradient parameter G is set to zero to verify that the classical theory
is mesh dependent in the current implementation. The net force acting at the
top of the plate as a function of the displacement is shown in Figure 6.7 for the
three different meshes. Clearly, the result is mesh dependent in that less energy is
dissipated as the mesh is refined. The Newton solver failed to converge for the last
load steps in the case of mesh 3.
The distribution of the λ field after the last converged load step can be seen
in Figure 6.8. As the mesh is refined the plastic zone localises in a shear band of

[Plot: net force [N] versus displacement [mm]; curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.7: Mesh dependent softening with G = 0MPa (classical plasticity) for the
P1/P0 case.

decreasing width as shown in the figure. Apart from the convergence problems,
the overall behaviour is as anticipated. In general, the approach is very sensitive to
the choice of model parameters for softening problems. In particular, the step size,
the mesh size, and the values of H and G affect the stability of the problem.

P1 /P0 elements and microfree boundary condition for λ


It is now demonstrated that the results for the microfree boundary condition are
influenced by a nonzero gradient parameter. For this particular example, the
gradient parameter is set to G = 200MPa; otherwise the test setup remains identical to the
previous example. The resulting load-displacement curve is shown in Figure 6.9.
As seen in the figure, the nonzero gradient parameter has an influence on the
results as plastic gradients are present inside the plastic domain. (Compare to
Figure 6.2 where the value of the gradient parameter does not have any influence
on the load-displacement curve.) Although the gradient parameter does have an
influence on the results these are clearly not mesh independent. 2
Figure 6.10 shows the distribution of λ for the three different meshes after the
final load step. The width of the plastic zone is still mesh dependent such that a
finer mesh results in a more narrow plastic zone compared to a coarser mesh for
identical values of the gradient parameter. However, the width of the plastic zone
is less dependent on the cell size compared to the results in Figure 6.8. The reason
2 For other values of the gradient parameter a similar story holds; the finer mesh always exhibits
more softening than the coarser mesh for a given value of the gradient parameter.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.8: Localisation of λ with G = 0MPa for the three different mesh cases after
the last converged load step using P1 /P0 elements.

[Plot: net force [N] versus displacement [mm]; curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.9: Load-displacement curve with G = 200MPa using P1/P0 elements and
the microfree boundary condition for λ.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.10: Distribution of λ with G = 200MPa for the three different mesh cases
after the final load step using P1 /P0 elements and the microfree boundary condition
for λ.

is that the microfree boundary condition only has a regularising effect while the
plastic zone is developing. Once the plastic zone is fully developed, there is no
mechanism by which the plastic zone can expand as the ‘shape’ of the λ field does
not change. Therefore, no additional gradient effects are introduced and plastic flow
localises in the zone which is already plastic. The microfree boundary condition is,
therefore, not able to produce mesh independent results for this softening problem
even if some plastic strain gradients are present inside the plastic domain.

P1 /P0 elements and microhard boundary condition for λ


The microhard boundary condition is now enforced on the elastic–plastic boundary.
The resulting load-displacement curve is shown in Figure 6.11. As was the case for
the simple shear example, see Figure 6.4, the results are influenced by the nonzero
gradient parameter. Note that the results in the first part of the load-displacement
curve, before the jumps, appear to be more mesh independent than the results for
the microfree boundary condition.
For mesh 2, the jump in load bearing capacity at load step 11 is due to the
expanding plastic zone as explained in Section 6.5.1, see also Figures 6.4 and 6.5.
This is illustrated in Figure 6.12 which shows the distribution of λ on mesh 2 at
load steps 8–13. In load step 8, the plastic zone is fully developed and in load steps
9–10 plastic flow increases inside this zone. However, as λ = 0 is enforced on the
boundary, the gradients also increase. This introduces a hardening effect and as
a result the plastic zone expands in load step 11. Then, in load steps 12–13, plastic
flow simply increases inside the, now larger, plastic zone.
For mesh 3, the solution becomes unstable. This is an effect of the spreading
of the plastic zone described above and the predictor–corrector algorithm. If the

[Plot: net force [N] versus displacement [mm]; curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.11: Load-displacement curve with G = 200MPa using P1/P0 elements and
the microhard boundary condition for λ.

plastic zone expands ‘too much’ during an iteration, $\Delta\lambda_k^{\mathrm{avg}}$ for a given cell T can
become negative due to the diffusive nature of the yield function. For stability
reasons, in line 5 of Algorithm 4, the cell is marked as elastic during the entire load
step should this event occur. However, this introduces an artificial elastic–plastic
boundary in the otherwise plastic domain, which again, due to the boundary
conditions for λ, introduces additional hardening. This is illustrated in Figure 6.13
which shows the distribution of λ on mesh 3 at load steps 6–11. In load steps 6–8,
the distribution of λ is developing as expected for the given problem. Then, in load
steps 9–11 it is seen how the artificial elastic–plastic boundaries develop as loading
progresses which causes ‘plastic islands’ to emerge in the computational domain.
This effect naturally has an impact on the load-displacement curve as already
shown in Figure 6.11. For fine meshes, it is difficult to avoid this situation when
using the microhard boundary condition for λ. However, while the plastic zone
is expanding smoothly, convergence is usually better compared to the microfree
case. Reducing the loading step size does not improve the stability of the algorithm
because even a small expansion of the plastic zone can result in a negative $\Delta\lambda_k^{\mathrm{avg}}$
for a cell well inside the plastic region.

P2 /P1 elements
The influence of using higher order elements for the softening problem is now
investigated. Using higher order elements makes the algorithms even more sensitive
to the choice of model parameters. The hardening modulus is set to H = −2000MPa

(a) Load step 8. (b) Load step 9. (c) Load step 10.

(d) Load step 11. (e) Load step 12. (f) Load step 13.

Figure 6.12: Development in the distribution of λ at different load steps for G =
200MPa on mesh 2 using P1/P0 elements and the microhard boundary condition
for λ.

(a) Load step 6. (b) Load step 7. (c) Load step 8.

(d) Load step 9. (e) Load step 10. (f) Load step 11.

Figure 6.13: Development in the distribution of λ at different load steps for G =
200MPa on mesh 3 using P1/P0 elements and the microhard boundary condition
for λ.

[Plot: net force [N] versus displacement [mm]; curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.14: Mesh dependent softening with G = 0MPa (classical plasticity) for the
P2/P1 case.

and the size of the plastic load steps is reduced. An elastic load step of ∆u =
0.005mm is followed by sixty plastic load steps of ∆u = 0.0005mm such that the
total downward displacement after the final load step is, still, u = 0.035mm. Again,
the gradient parameter G is set to zero to verify that the classical theory is mesh
dependent in the current implementation.
The resulting load-displacement curve for the three meshes is shown in
Figure 6.14. For this test setup, the Newton solver failed to converge for the last few
load steps in the case of mesh 2 and for mesh 3 convergence was only achieved for
a couple of load steps after the plastic zone was fully developed.
The distribution of the λ field after the last converged load step can be seen
in Figure 6.15. It is clear that the higher order elements allow the plastic zone to
localise in a zone which is only a couple of elements wide. (Compare to Figure 6.8
for the P1 /P0 case.)
The microfree boundary condition is now applied for the λ field with G =
200MPa and the resulting load-displacement curve can be seen in Figure 6.16.
Compared to the P1 /P0 case, the results are now almost mesh independent. How-
ever, the convergence rate of the Newton solver was poor and for mesh 3 it failed
to converge after a few plastic load steps. Figure 6.17 shows the distribution of
λ for the three different meshes after the last converged load step. The width of
the plastic zone is almost identical for the three meshes and much less dependent
on the cell size compared to the results in Figure 6.15. Note that the softening for
G = 200MPa in Figure 6.16 is much less pronounced compared to Figure 6.14 for

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.15: Localisation of λ with G = 0MPa for the three different mesh cases
after the last converged load step using P2 /P1 elements.

[Plot: net force [N] versus displacement [mm]; curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.16: Load-displacement curve with G = 200MPa using P2/P1 elements and
the microfree boundary condition for λ.

(a) Mesh 1. (b) Mesh 2. (c) Mesh 3.

Figure 6.17: Distribution of λ with G = 200MPa for the three different mesh cases
after the final load step using P2 /P1 elements and the microfree boundary for λ.

G = 0MPa. However, for smaller values of G the resulting load-displacement curve
becomes mesh dependent, similar to that shown in Figure 6.9 for the P1/P0 case,
and the Newton solver still fails to converge for mesh 3.
Finally, the influence of the microhard boundary condition for λ is demonstrated
for the P2 /P1 case. As seen from Figure 6.18, the results are highly unstable for the
same reasons as in the P1 /P0 case and no softening is observed.
As demonstrated in the previous examples, the model is not suitable for
providing regularisation of a softening problem, regardless of which type of boundary
condition is applied for λ. If the microfree boundary condition is used, the
results are independent of the gradient parameter if no gradients of the plastic
strain are present inside the plastic domain. Even if gradients are present inside
the plastic domain, the results are not necessarily mesh independent for a given
value of the gradient parameter. Although the microhard boundary condition
for λ does provide a mechanism by which the plastic zone can expand, it is also
unsuitable for providing regularisation of softening problems. The reason is that
the boundary condition results in a hardening effect for sufficiently large values
of the gradient parameter which is needed for the plastic zone to expand. The
expanding plastic zone, on the other hand, makes the solution algorithm highly
unstable, and for both types of boundary condition convergence problems often
become an issue as the approach is very sensitive to the choice of model parameters.

6.5.3 Plate under compressive loading with strain hardening


The softening problem is now changed to a hardening problem by setting the hard-
ening modulus H = 2000MPa in order to investigate if the numerical difficulties
from the softening problem disappear. Mesh 3 is investigated for different values
of the gradient parameter G while all other parameters remain the same as for the

[Plot: net force (N) versus displacement (mm); curves for Mesh 1, Mesh 2 and Mesh 3.]

Figure 6.18: Load-displacement curve with G = 200MPa using P2 /P1 elements and
the microhard boundary condition for λ.

P1 /P0 case of the previous softening problem.


The load-displacement curves for the P1 /P0 case using microfree and microhard
boundary conditions can be seen in Figures 6.19 and 6.20 respectively, where
indeed the instability has disappeared for all values of G. Note that increasing
the value of G seems to have a negligible effect on the results. This is because the
microhard boundary condition λ = 0 on the elastic–plastic boundary (inside the
computational domain) only results in a gradient effect while the plastic zone is
developing. As soon as the plastic zone is fully developed, the positive hardening
modulus results in the (almost) entire domain becoming plastic. This activates the
$\nabla\lambda \cdot n = 0$ boundary condition on $\Gamma_N^p = \partial\Omega^p \cap \partial\Omega$. During continued loading,
the value of λ will simply increase inside the plastic region without introducing
additional gradients and the results thus become independent of G.
Similar results can be seen in Figures 6.21 and 6.22 for the microfree and
microhard boundary conditions, respectively, for the P2 /P1 case. Again, the results
are almost independent of the value of G when microfree boundary conditions
are used. When using microhard boundary conditions, some artificial hardening
is introduced for G = 800MPa and G = 1600MPa. The reason is that there is a
small elastic region close to the imperfection where yielding initiates. During
continued loading the plastic zone will try to expand into this region, but due to
the high concentration of plastic flow at the imperfection $\Delta\lambda^{avg}$ becomes negative,
which introduces artificial elastic–plastic boundaries as already discussed. Despite
this numerical difficulty, the model and implementation appear to be working

[Plot: net force (N) versus displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.19: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P1 /P0 elements
and the microfree boundary condition for λ.

[Plot: net force (N) versus displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.20: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P1 /P0 elements
and the microhard boundary condition for λ.

[Plot: net force (N) versus displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.21: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P2 /P1 elements
and the microfree boundary condition for λ.

[Plot: net force (N) versus displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.22: Influence of different values of the gradient parameter G for mesh 3
and an isotropic linear hardening modulus H = 2000MPa using P2 /P1 elements
and the microhard boundary condition for λ.

much better for a hardening problem and it is less sensitive to the choice of model
parameters.

6.5.4 Micro-indentation
As shown in the hardening problem in the previous section, the effect of the value
of the gradient parameter on the load-displacement curves was negligible as the
microfree boundary condition on the exterior boundary did not introduce addi-
tional hardening. Therefore, a micro-indentation problem is now investigated to
demonstrate that the model is capable of capturing size effects when the microhard
boundary condition for λ is used on an elastic–plastic boundary located inside the
computational domain.
A three dimensional model problem is considered in which the specimen of
interest has a width of 10mm and a height of 5mm. The specimen is constrained
such that displacements in the normal direction on the four sides and at the bottom
are prevented. The indenter is located at the center of the top part of the domain.
It has a spherical tip with a radius of 1mm and is initially embedded within the
specimen, to which it is rigidly attached, at a depth equal to the radius. The domain
in this initial state is assumed stress free.
A sequence of downward displacements, measured from this initial state, is
prescribed on the indenter. An elastic load step of ∆u = 0.0008mm is followed
by seven plastic load steps of ∆u = 0.0004mm such that the total downward
displacement after the final load step is u = 0.0036mm. Rather than modelling
the indenter explicitly, the prescribed displacements are imposed on the degrees
of freedom located on the surface of the indenter. Due to the symmetry of the
problem, only one quarter of the domain is modelled.
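The load stepping described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `indenter_schedule` is hypothetical, and the only inputs are the step sizes quoted in the text (one elastic step of 0.0008 mm followed by seven plastic steps of 0.0004 mm). Decimal arithmetic is used so that the cumulative values are not blurred by floating-point rounding.

```python
from decimal import Decimal


def indenter_schedule(elastic_step, plastic_step, n_plastic):
    """Return the cumulative prescribed displacement after each load step."""
    increments = [elastic_step] + [plastic_step] * n_plastic
    totals = []
    total = Decimal("0")
    for du in increments:
        total += du
        totals.append(total)
    return totals


# Step sizes quoted in the text, in mm.
schedule = indenter_schedule(Decimal("0.0008"), Decimal("0.0004"), 7)
for step, u in enumerate(schedule, start=1):
    print(f"load step {step}: u = {u} mm")
```

The final entry of `schedule` equals the total downward displacement of 0.0036 mm stated in the text.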
Front and top views of the computational mesh used for this problem are shown
in Figure 6.23. The mesh is refined in the region around the indenter tip. The
material parameters for this example are identical to the ones shown in Table 6.1
with the exception that H = 2000MPa.
The net force acting on the indenter tip as a function of the indentation depth
for the P1 /P0 case using microfree boundary conditions is shown in Figure 6.24.
Increasing the value of G only has a small effect on the load–displacement curve.
On the other hand, the load–displacement curve shown in Figure 6.25 for the
microhard boundary condition shows a much bigger dependence on the value of
G. After load step number 6 the load bearing capacity for the cases G = 1600MPa
and G = 3200MPa increases dramatically. Again, this can be attributed to the effect
of forcing cells to be elastic in Algorithm 4 if $\Delta\lambda_k^{avg}$ is negative, as explained in the
previous section.
Figures 6.26 and 6.27 show the load–displacement curves for the P2 /P1 case
using microfree and microhard boundary conditions respectively. In case of the
microfree boundary condition, the effect of increasing G has completely vanished.

(a) Front view. (b) Top view.

Figure 6.23: Finite element mesh for the micro-indentation example consisting of
10979 tetrahedra.

[Plot: net force (N) versus indenter displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.24: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P1 /P0 elements and
the microfree boundary condition for λ.

[Plot: net force (N) versus indenter displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.25: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P1 /P0 elements and
the microhard boundary condition for λ.

Again, the microhard boundary condition results in a load–displacement curve
which is clearly influenced by the value of G. Also note that the sudden jumps
in load bearing capacity from Figure 6.25 have disappeared. As demonstrated in
these numerical experiments, one should use the microhard boundary condition to
model size effects with the current gradient plasticity model.
Figures 6.28 and 6.29 show the distribution of λ at the final load step for
different values of the gradient parameter for the P2 /P1 case using microfree and
microhard boundary conditions respectively. For both the microfree and microhard
cases the extent of the plastic region increases and the distribution of the λ field
becomes smoother as the value of the gradient parameter is increased. The extent
of the plastic region is comparable for the microfree and microhard cases, but
note that the values of λ near the boundary are much smaller, as they should be,
for the microhard case. The gradient of λ is thus larger which is reflected in the
load–displacement curve.

6.5.5 Computational notes

As demonstrated in the previous four sections, the gradient plasticity model is
not suitable for producing mesh independent results for softening problems but
it is able to capture size effects for the micro-indentation test when considering
the microhard boundary condition for λ. However, the numerical experiments
also revealed that the computed solutions are sensitive to the effect of the evolving

[Plot: net force (N) versus indenter displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.26: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P2 /P1 elements and
the microfree boundary condition for λ.

[Plot: net force (N) versus indenter displacement (mm); curves for G = 0, 800, 1600 and 3200.]

Figure 6.27: The resulting force on the indenter as a function of the indentation
depth for different values of the gradient parameter G using P2 /P1 elements and
the microhard boundary condition for λ.

(a) G = 0MPa. (b) G = 800MPa.

(c) G = 1600MPa. (d) G = 3200MPa.

Figure 6.28: Close-ups of the region around the indenter tip which show the
distribution of λ at the final load step for different values of G using P2 /P1 elements
and the microfree boundary condition for λ.

(a) G = 0MPa. (b) G = 800MPa.

(c) G = 1600MPa. (d) G = 3200MPa.

Figure 6.29: Close-ups of the region around the indenter tip which show the
distribution of λ at the final load step for different values of G using P2 /P1 elements
and the microhard boundary condition for λ.

elastic–plastic boundary, particularly when the microhard boundary condition for
λ is used.
The problem does not appear to originate from the gradient model, the for-
mulation and linearisation or the implementation of the variational forms. This
conclusion is based on the observation that the convergence of the Newton solver
during a load step is quadratic provided that the plastic region does not expand. If
indeed the plastic region expands during a load step, the Newton solver does not
begin to converge until the plastic region becomes stable after which convergence
is quadratic.
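The observation about quadratic convergence can be checked from the residual norms reported by a Newton solver. The sketch below is a generic diagnostic and not part of the implementation described in this work: the function name `observed_orders` is hypothetical, and the residual history is made-up illustrative data. It uses the standard estimate p ≈ log(r[k+1]/r[k]) / log(r[k]/r[k-1]), which approaches 2 for quadratic convergence.

```python
import math


def observed_orders(residuals):
    """Estimate the convergence order from consecutive residual norms.

    For r[k+1] ~ C * r[k]**p, the ratio of successive log-reductions
    log(r[k+1]/r[k]) / log(r[k]/r[k-1]) approximates p.
    """
    orders = []
    for r_prev, r_curr, r_next in zip(residuals, residuals[1:], residuals[2:]):
        orders.append(math.log(r_next / r_curr) / math.log(r_curr / r_prev))
    return orders


# Hypothetical residual history (not taken from the numerical examples).
residuals = [1e-1, 1e-2, 1e-4, 1e-8]
print(observed_orders(residuals))
```

Estimated orders well below 2 during the first iterations of a load step, followed by values close to 2, would be consistent with the behaviour described above for load steps in which the plastic region expands.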
The sensitivity of the solution with respect to the evolving elastic–plastic boundary
can, therefore, be attributed to the numerical effect caused by line 5 in Algorithm 4,
where a cell T is forced to be elastic during a load step should $\Delta\lambda_k^{avg}$ be
negative for the given cell at a given iteration. This line is, however, needed to
ensure convergence in situations where a cell switches back and forth between
being elastic and plastic during iterations.
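The effect of forcing cells to be elastic can be illustrated with a small sketch. This is not Algorithm 4 itself: the function name `update_cell_states`, the data layout, and the rule that a positive average increment marks a cell as plastic are all simplifying assumptions made for illustration. The sketch also assumes that a cell, once forced elastic, stays elastic for the remainder of the load step.

```python
def update_cell_states(delta_lambda_avg, forced_elastic):
    """Classify cells for one Newton iteration (illustrative only).

    delta_lambda_avg: dict mapping cell id to the cell average of the total
        plastic multiplier increment in the current load step.
    forced_elastic: set of cell ids already locked elastic in this load
        step; it is updated in place.
    """
    states = {}
    for cell, dl in delta_lambda_avg.items():
        if cell in forced_elastic:
            # Locked earlier in this load step: stays elastic.
            states[cell] = "elastic"
        elif dl < 0.0:
            # Negative average increment: lock the cell as elastic.
            forced_elastic.add(cell)
            states[cell] = "elastic"
        elif dl > 0.0:
            states[cell] = "plastic"  # simplification, see lead-in
        else:
            states[cell] = "elastic"
    return states


locked = set()  # reset at the start of every load step
print(update_cell_states({0: 0.10, 1: -0.05}, locked))
print(update_cell_states({0: 0.20, 1: 0.03}, locked))
```

In the second call, cell 1 has a positive increment but remains elastic because it was locked in the first call. This prevents cells from switching back and forth, but it can also create the artificial elastic–plastic boundaries discussed above.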
Essentially, problems arise as the nonlinear equations are solved using an
iterative procedure where in each iteration the computational domain might change.
Future work in this regard involves investigating different approaches in order
to alleviate this numerical problem. Firstly, a different way of stabilising the
algorithm in which cells are not forced to be elastic might be considered. Secondly,
a staggered approach to solving the coupled nonlinear equations and the evolving
elastic–plastic boundary can be pursued. Thirdly, an adaptive mesh refinement
scheme in front of the evolving elastic–plastic boundary can be implemented to
allow the plastic region to expand smoothly. The last approach should probably be
used in combination with adaptive mesh coarsening behind the evolving elastic–
plastic boundary in order to reduce computational cost.
7 Conclusions and future developments

In this work, the automated modelling framework of FEniCS has been developed
in a number of directions with the aim to facilitate rapid implementation and
testing for a wider range of problems. The developed extensions are widely used
by researchers and application developers in a number of different fields, see the
introduction to Chapter 3 and Section 4.2.5 for examples. The main contributions
can be summarised as follows. Efficiency is an issue when large scale problems
are solved using the finite element method. The development of the quadrature
representation and its optimisations has, therefore, extended the applicability of the
automated modelling concepts to more complex problems. Discontinuous Galerkin
methods, and methods that use discontinuous Galerkin concepts, may be applied
to problems other than strain gradient plasticity as demonstrated in this work. The
extensions to FEniCS for discontinuous Galerkin methods developed in this work,
therefore, also apply to these problems. Finally, the quadrature element, developed
for correct linearisation of plasticity problems, can be used for other problems
where functions do not come from a finite element space.

Conclusions
The main conclusions of this work relate to the representations and optimisations
of finite element forms, the automation of discontinuous Galerkin methods and
strain gradient plasticity. Numerical experiments have shown that the relative
run-time performance of the quadrature representation and the tensor contraction
representation can differ substantially depending on the nature of the considered
variational form. In general, the tensor contraction approach deals well with
forms which involve high-order bases and few coefficient functions, whereas the
quadrature representation is more efficient as the number of coefficient functions
(other than constant coefficients) and derivatives in a form increases. Hence, in
general, the quadrature representation is significantly faster for more complicated
forms. Furthermore, it has been shown that quadrature optimisations can have a
significant impact on the run-time performance. It is, therefore, desirable to select
the most favourable representation and optimisation strategy based on an a priori

inspection of the variational form. However, the code with the lowest number of
flops, at least for the quadrature representation, does not always perform best for a
given form. In addition, the run-time performance even depends on which C++
compiler options are used. A strategy for selecting between representations and
optimisations based only on an estimation of the number of flops does, therefore,
not seem feasible.
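Since flop counts alone do not predict run-time performance, one conceivable alternative is to time the candidate kernels directly. This is an assumption made here for illustration and not a strategy evaluated in this work. In the sketch below, `pick_fastest`, `kernel_a` and `kernel_b` are hypothetical names, and the two kernels are trivial stand-ins rather than generated finite element code.

```python
import timeit


def pick_fastest(candidates, repeat=3, number=200):
    """Time each zero-argument callable; return (fastest name, timings)."""
    timings = {
        name: min(timeit.repeat(kernel, repeat=repeat, number=number))
        for name, kernel in candidates.items()
    }
    return min(timings, key=timings.get), timings


# Hypothetical stand-ins for two generated element kernels.
def kernel_a():
    return sum(i * i for i in range(200))


def kernel_b():
    return sum(map(lambda i: i * i, range(200)))


best, timings = pick_fastest({"tensor": kernel_a, "quadrature": kernel_b})
print(best, timings)
```

Such measurements depend on the compiler options and hardware in use, so the outcome would only hold for the configuration on which the timing was performed.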
By developing extensions for supporting discontinuous Galerkin methods, a
range of discontinuous variational formulations can be implemented in a relatively
straightforward fashion in the FEniCS framework. Moreover, the new abstractions
also permit other formulations that build on concepts from discontinuous Galerkin
methods to be implemented by using the developed extensions as building blocks.
This has been demonstrated in a semi-automated implementation of a lifting-type
discontinuous Galerkin formulation for the Poisson equation. The lifting-type
formulation has two main advantages in relation to this work compared to the
interior penalty formulation. Firstly, it is stable for all positive values of the
stabilisation parameter. Secondly, as numerical experiments indicate, one can use
a constant basis for the Poisson equation, something which is not possible when
using the interior penalty method.
The Aifantis strain gradient plasticity model was implemented in the FEniCS
framework using a continuous, piecewise linear displacement field and a discon-
tinuous, piecewise constant field for the plastic multiplier. The latter was possible
because a lifting-type discontinuous Galerkin formulation was used for the plas-
tic multiplier. The implementation was also tested successfully for a continuous,
piecewise quadratic displacement field and a discontinuous, piecewise linear field
for the plastic multiplier. It was demonstrated that the model is not suitable for
softening problems. Size effects, on the other hand, were observed for a hardening
problem in the micro-indentation example, provided that the microhard boundary
condition was employed for the plastic multiplier. Some numerical problems were,
however, observed during load steps in which the plastic region did not expand
smoothly. The observed problems originate from the algorithm which handles the
update of state variables as it will force an element to be elastic during a load step
if the average of the total increment of the plastic multiplier becomes negative in a
given iteration. This issue should be resolved in order to produce reliable results.

Future developments

Using this work as a basis, the following future developments of the FEniCS
framework could be of interest. As the user base of FEniCS grows, so does
the desire to solve problems of increasing complexity. Therefore, continued
investigation into further optimising the quadrature representation is desirable.
The optimisations should focus on both run-time performance of the generated

code and compile-time performance. Two areas of particular importance in terms
of compile-time performance are the size of the generated code and the speed of
the code generation stage. Related to these developments is the automatic selection
of representation and/or optimisation strategy.
The advantages of the lifting-type formulation mentioned above come at a price.
The three main drawbacks of the lifting-type formulation are that the formulation
is more complex, the local assembly is more expensive to perform and the global
tensor arising from assembling the variational form becomes less sparse. The latter
drawback is difficult to remedy but the first drawback can be alleviated by adding
fully automated support for lifting-type discontinuous Galerkin formulations. As it
is not entirely clear how this should be implemented, a first step is to address the
second drawback by improving the algorithm that evaluates the lifting function
in the current semi-automated approach which will make the local assembly less
expensive.
In order to improve the current implementation of the Aifantis model the fol-
lowing approaches may be attempted as outlined in the previous chapter. Firstly,
a different way of stabilising the algorithm in which cells are not forced to be
elastic might be considered. Secondly, a staggered approach to solving the coupled
nonlinear equations and the evolving elastic–plastic boundary can be pursued.
Thirdly, an adaptive mesh refinement scheme in front of the evolving elastic–plastic
boundary can be implemented to allow the plastic region to expand smoothly. The
last approach should probably be used in combination with adaptive mesh coarsen-
ing behind the evolving elastic–plastic boundary in order to reduce computational
cost.
Finally, as the overall aim of this work was to promote rapid prototyping and
testing of complex problems, while maintaining high performance, it seems natural
to implement other gradient models using the FEniCS framework. To facilitate
this, continued development of the FEniCS Solid Mechanics library to improve
the interface is important. For solid mechanics problems in general, continued
development on support for isoparametric elements and shell problems within the
FEniCS framework is also important.
References

Aifantis, E. C. (1984). On the microstructural origin of certain inelastic models.


Journal of Engineering Materials and Technology, 106(4):326–330.

Aifantis, E. C. (1987). The physics of plastic deformation. International Journal of


Plasticity, 3:211–247.

Alfred, V., Sethi, R., and Jeffrey, D. U. (1986). Compilers: Principles, Techniques and
Tools. Addison-Wesley, Reading, Massachusetts.

Allen, G., Benger, W., Goodale, T., Hege, H.-C., Lanfermann, G., Merzky, A., Radke,
T., Seidel, E., and Shalf, J. (2000). The Cactus code: a problem solving environment
for the grid. In High-Performance Distributed Computing, 2000. Proceedings. The
Ninth International Symposium on, pages 253–260.

Alnæs, M. S. (2012). UFL: a finite element form language. In Logg, A., Mardal,
K.-A., and Wells, G. N., editors, Automated Solution of Differential Equations by
the Finite Element Method, volume 84 of Lecture Notes in Computational Science and
Engineering, chapter 17. Springer.

Alnæs, M. S., Logg, A., and Mardal, K.-A. (2012). UFC: a finite element code
generation interface. In Logg, A., Mardal, K.-A., and Wells, G. N., editors,
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 16. Springer.

Alnæs, M. S., Logg, A., Mardal, K.-A., Skavhaug, O., and Langtangen, H. P.
(2009). Unified framework for finite element assembly. International Journal of
Computational Science and Engineering, 4(4):231–244.

Alnæs, M. S., Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2013).
Unified Form Language: A domain-specific language for weak formulations
of partial differential equations. ACM Transactions on Mathematical Software, To
appear. http://arxiv.org/abs/1211.4047.
172 References

Alnæs, M. S. and Mardal, K.-A. (2010). On the efficiency of symbolic computations


combined with code generation for finite element methods. ACM Transactions on
Mathematical Software, 37(1).

Alnæs, M. S. and Mardal, K.-A. (2012). SyFi and SFC: Symbolic finite elements
and form compilation. In Logg, A., Mardal, K.-A., and Wells, G. N., editors,
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 15. Springer.

Arnold, D. N., Brezzi, F., Cockburn, B., and Marini, L. D. (2002). Unified analysis for
discontinuous Galerkin methods for elliptic problems. SIAM Journal on Numerical
Analysis, 39(5):1749–1779.

Baker, G. A., Jureidini, W. N., and Karakashian, O. A. (1990). Piecewise solenoidal


vector fields and the Stokes problem. SIAM Journal on Numerical Analysis,
27(6):1466–1485.

Balay, S., Buschelman, K., Gropp, W. D., Kaushik, D., Knepley, M. G., McInnes,
L. C., Smith, B. F., and Zhang, H. (2001). PETSc Web page. http://www.mcs.anl.
gov/petsc/.

Bangerth, W., Hartmann, R., and Kanschat, G. (2007). deal.II –a general-purpose


object-oriented finite element library. ACM Transactions on Mathematical Software,
33(4).

Bassi, F. and Rebay, S. (1997). A high-order accurate discontinuous finite element


method for the numerical solution of the compressible Navier–Stokes equations.
Journal of Computational Physics, 131(2):267–279.

Bassi, F. and Rebay, S. (2002). Numerical evaluation of two discontinuous Galerkin


methods for the compressible Navier–Stokes equations. International Journal for
Numerical Methods in Fluids, 40(1–2):197–207.

Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Kornhuber, R., Ohlberger,
M., and Sander, O. (2008a). A Generic Grid Interface for Parallel and Adaptive
Scientific Computing. Part II: Implementation and Tests in DUNE. Computing,
82(2–3):121–138.

Bastian, P., Blatt, M., Dedner, A., Engwer, C., Klöfkorn, R., Ohlberger, M., and
Sander, O. (2008b). A Generic Grid Interface for Parallel and Adaptive Scientific
Computing. Part I: Abstract Framework. Computing, 82(2–3):103–119.

Begley, M. R. and Hutchinson, J. W. (1998). The mechanics of size-dependent


indentation. Journal of the Mechanics and Physics of Solids, 46(10):2049–2068.
173

Bonet, J. and Wood, R. D. (1997). Nonlinear Continuum Mechanics for Finite Element
Analysis. Cambridge University Press.

Brandenburg, C., Lindemann, F., Ulbrich, M., and Ulbrich, S. (2012). Advanced
numerical methods for PDE constrained optimization with application to optimal
design in Navier Stokes flow. In Leugering, G., Engell, S., Griewank, A., Hinze,
M., Rannacher, R., Schulz, V., Ulbrich, M., and Ulbrich, S., editors, Constrained
Optimization and Optimal Control for Partial Differential Equations, volume 160 of
International Series of Numerical Mathematics, pages 257–275. Springer Basel.

Brezzi, F., Douglas, Jim, J., and Marini, L. (1985). Two families of mixed finite
elements for second order elliptic problems. Numerische Mathematik, 47:217–235.

Brezzi, F., Manzini, G., Marini, D., Pietra, P., and Russo, A. (2000). Discontinuous
Galerkin approximations for elliptic problems. Numerical Methods for Partial
Differential Equations, 16(4):365–378.

Clason, C. and Kunisch, K. (2012). A measure space approach to optimal source


placement. Computational Optimization and Applications, 53(1):155–171.

De Borst, R. and Mühlhaus, H.-B. (1992). Gradient-dependent plasticity: Formu-


lation and algorithmic aspects. International Journal for Numerical Methods in
Engineering, 35(3):521–539.

De Borst, R. and Pamin, J. (1996). Some novel developments in finite element


procedures for gradient-dependent plasticity. International Journal for Numerical
Methods in Engineering, 39(14):2477–2505.

Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007a). A discontinuous
Galerkin formulation for classical and gradient plasticity – Part 1: Formulation
and analysis. Computer Methods in Applied Mechanics and Engineering, 196(37–
40):3881–3897.

Djoko, J. K., Ebobisse, F., McBride, A. T., and Reddy, B. D. (2007b). A discontinuous
Galerkin formulation for classical and gradient plasticity. Part 2: Algorithms
and numerical analysis. Computer Methods in Applied Mechanics and Engineering,
197(1–4):1–21.

Dular, P., Geuzaine, C., Henrotte, F., and Legros, W. (1998). A general environment
for the treatment of discrete problems and its application to the finite element
method. Magnetics, IEEE Transactions on, 34(5):3395–3398.

Dung, N. T. and Wells, G. N. (2006). A study of discontinuous Galerkin methods


for thin bending problems. In III European Conference on Computational Mechanics:
Solids, Structures and Coupled Problems in Engineering, Lisbon, Portugal.
174 References

Dung, N. T. and Wells, G. N. (2008). Geometrically nonlinear formulation for


thin shells without rotation degrees of freedom. Computer Methods in Applied
Mechanics and Engineering, 197(33–40):2778 – 2788.

Engel, G., Garikipati, K., Hughes, T. J. R., Larson, M. G., and Taylor, R. L. (2002).
Continuous/discontinuous finite element approximations of fourth-order elliptic
problems in structural and continuum mechanics with applications to thin beams
and plates, and strain gradient elasticity. Computer Methods in Applied Mechanics
and Engineering, 191(34):3669–3750.

Engelen, R. A. B., Fleck, N. A., Peerlings, R. H. J., and Geers, M. G. D. (2006).


An evaluation of higher-order plasticity theories for predicting size effects and
localisation. International Journal of Solids and Structures, 43(7–8):1857–1877.

Fleck, N. and Hutchinson, J. (1997). Strain gradient plasticity. volume 33 of Advances


in Applied Mechanics, pages 295 – 361. Elsevier.

Fleck, N. and Hutchinson, J. (2001). A reformulation of strain gradient plasticity.


Journal of the Mechanics and Physics of Solids, 49(10):2245 – 2271.

Fleck, N., Muller, G., Ashby, M., and Hutchinson, J. (1994). Strain gradient plasticity:
theory and experiment. Acta Metallurgica et Materialia, 42(2):475–487.

Funke, S. W. and Farrell, P. E. (2013). A framework for automated PDE-constrained


optimisation. arXiv preprint. http://arxiv.org/abs/1302.3894.

Gao, H., Huang, Y., Nix, W., and Hutchinson, J. (1999). Mechanism-based strain gra-
dient plasticity– I. theory. Journal of the Mechanics and Physics of Solids, 47(6):1239 –
1263.

Giesselmann, J., Makridakis, C., and Pryer, T. (2012). Energy consistent DG methods
for the Navier–Stokes–Korteweg system. arXiv preprint. http://arxiv.org/abs/
1207.4647.

Grandi, D., Maraldi, M., and Molari, L. (2012). A macroscale phase-field model for
shape memory alloys with non-isothermal effects: Influence of strain rate and
environmental conditions on the mechanical response. Acta Materialia, 60(1):179–
191.

Gudmundson, P. (2004). A unified treatment of strain gradient plasticity. Journal of


the Mechanics and Physics of Solids, 52(6):1379–1406.

Gurtin, M. E. (2004). A gradient theory of small-deformation isotropic plasticity that


accounts for the Burgers vector and for dissipation due to plastic spin. Journal of
the Mechanics and Physics of Solids, 52(11):2545–2568.
175

Gurtin, M. E. and Needleman, A. (2005). Boundary conditions in small-deformation,


single-crystal plasticity that account for the Burgers vector. Journal of the Mechanics
and Physics of Solids, 53(1):1–31.

Heumann, H. and Hiptmair, R. (2012). Stabilized Galerkin methods for magnetic ad-
vection. ETH Zürich. ftp://ftp.sam.math.ethz.ch/pub/sam-reports/reports/
reports2012/2012-26.pdf.

Hilber, H. M., Hughes, T. J., and Taylor, R. L. (1977). Improved numerical dissipation
for time integration algorithms in structural dynamics. Earthquake Engineering &
Structural Dynamics, 5(3):283–292.

Hoffman, J., Jansson, J., de Abreu, R. V., Degirmenci, N. C., Jansson, N., Müller,
K., Nazarov, M., and Spühler, J. H. (2013). Unicorn: Parallel adaptive finite ele-
ment simulation of turbulent flow and fluid–structure interaction for deforming
domains and complex geometry. Computers & Fluids, 80(0):310–319. Selected
contributions of the 23rd International Conference on Parallel Fluid Dynamics.

Holzapfel, G. A. (2000). Nonlinear Solid Mechanics: A Continuum Approach for


Engineering. John Wiley & Sons.

Horst, T., Heinrich, G., Schneider, M., Schulze, A., and Rennert, M. (2013). Linking
mesoscopic and macroscopic aspects of crack propagation in elastomers. In
Grellmann, W., Heinrich, G., Kaliske, M., Klüppel, M., Schneider, K., and Vilgis,
T., editors, Fracture Mechanics and Statistical Mechanics of Reinforced Elastomeric
Blends, volume 70 of Lecture Notes in Applied and Computational Mechanics, pages
129–165. Springer Berlin Heidelberg.

Hosangadi, A., Fallah, F., and Kastner, R. (2006). Optimizing polynomial expressions
by algebraic factorization and common subexpression elimination. Computer-
Aided Design of Integrated Circuits and Systems, IEEE Transactions on, 25(10):2012–
2022.

Hughes, T. J. R., Scovazzi, G., Bochev, P. B., and Buffa, A. (2006). A multiscale
discontinuous Galerkin method with the computational structure of a continuous
Galerkin method. Computer Methods in Applied Mechanics and Engineering, 195(19–
22):2761–2787.

Jansson, N., Hoffman, J., and Nazarov, M. (2011). Adaptive simulation of turbulent
flow past a full car model. In High Performance Computing, Networking, Storage
and Analysis (SC), 2011 International Conference for, pages 1–8. IEEE.

Karniadakis, G. E. and Sherwin, S. J. (2005). Spectral/hp Element Methods for
Computational Fluid Dynamics. Numerical Mathematics and Scientific Computation.
Oxford University Press, Oxford, second edition.

Kirby, R. C. (2004). Algorithm 839: FIAT, A new paradigm for computing finite
element basis functions. ACM Transactions on Mathematical Software, 30:502–516.

Kirby, R. C. (2012). FIAT: Numerical construction of finite element basis functions.
In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of
Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in
Computational Science and Engineering, chapter 13. Springer.

Kirby, R. C., Knepley, M. G., Logg, A., and Scott, L. R. (2005). Optimizing the
evaluation of finite element matrices. SIAM Journal on Scientific Computing,
27(3):741–758.

Kirby, R. C. and Logg, A. (2006). A compiler for variational forms. ACM Transactions
on Mathematical Software, 32:417–444.

Kirby, R. C. and Logg, A. (2007). Efficient compilation of a class of variational
forms. ACM Transactions on Mathematical Software, 33(3).

Kirby, R. C. and Logg, A. (2008). Benchmarking domain-specific compiler
optimizations for variational forms. ACM Transactions on Mathematical Software,
35(2).

Kirby, R. C., Logg, A., Scott, L. R., and Terrel, A. R. (2006). Topological optimization
of the evaluation of finite element matrices. SIAM Journal on Scientific Computing,
28(1):224–240.

Korelc, J. (1997). Automatic generation of finite-element code by simultaneous
optimization of expressions. Theoretical Computer Science, 187(1–2):231–248.

Labeur, R. and Wells, G. (2012). Energy stable and momentum conserving hybrid
finite element method for the incompressible Navier–Stokes equations. SIAM
Journal on Scientific Computing, 34(2):889–913.

Labeur, R. J. and Wells, G. N. (2007). A Galerkin interface stabilisation method for
the advection-diffusion and incompressible Navier-Stokes equations. Computer
Methods in Applied Mechanics and Engineering, 196(49–52):4985–5000.

Labeur, R. J. and Wells, G. N. (2009). Interface stabilised finite element method for
moving domains and free surface flows. Computer Methods in Applied Mechanics
and Engineering, 198(5–8):615–630.

Lakkis, O. and Pryer, T. (2011). A finite element method for fully nonlinear elliptic
problems. arXiv preprint. http://arxiv.org/abs/1103.2970.

Langtangen, H. P. (1999). Computational partial differential equations: numerical methods
and Diffpack programming. Springer Verlag.

Lezar, E. and Davidson, D. (2012). Electromagnetic waveguide analysis. In Logg, A.,
Mardal, K.-A., and Wells, G., editors, Automated Solution of Differential Equations
by the Finite Element Method, volume 84 of Lecture Notes in Computational Science
and Engineering, chapter 34, pages 629–642. Springer Berlin Heidelberg.

Logg, A., Mardal, K.-A., and Wells, G. N., editors (2012a). Automated Solution of
Differential Equations by the Finite Element Method, volume 84 of Lecture Notes in
Computational Science and Engineering. Springer.

Logg, A., Mardal, K.-A., and Wells, G. N. (2012b). Finite element assembly. In
Automated Solution of Differential Equations by the Finite Element Method, volume 84
of Lecture Notes in Computational Science and Engineering, chapter 6. Springer.

Logg, A., Ølgaard, K. B., Rognes, M. E., and Wells, G. N. (2012c). FFC: the FEniCS
form compiler. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated
Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture
Notes in Computational Science and Engineering, chapter 11. Springer.

Logg, A. and Wells, G. N. (2010). DOLFIN: Automated finite element computing.
ACM Transactions on Mathematical Software, 37(2):20:1–20:28.

Logg, A., Wells, G. N., and Hake, J. (2012d). DOLFIN: A C++/Python finite element
library. In Automated Solution of Differential Equations by the Finite Element Method,
volume 84 of Lecture Notes in Computational Science and Engineering, chapter 10.
Springer.

Long, K., Kirby, R., and Van Bloemen Waanders, B. (2010). Unified embedded
parallel finite element computations via software-based Fréchet differentiation.
SIAM Journal on Scientific Computing, 32:3323–3351.

Lopes, N., Pereira, P., and Trabucho, L. (2011). A numerical analysis of a class of
generalized Boussinesq-type equations using continuous/discontinuous FEM.
International Journal for Numerical Methods in Fluids, 69(7):1186–1218.

Lubliner, J. (2008). Plasticity Theory. Dover Publications.

Luo, C. and Calderer, M. C. (2012). Numerical study of liquid crystal elastomers by a
mixed finite element method. European Journal of Applied Mathematics, 23:121–154.

Maraldi, M., Molari, L., and Grandi, D. (2012). A unified thermodynamic framework
for the modelling of diffusive and displacive phase transitions. International
Journal of Engineering Science, 50(1):31–45.

Maraldi, M., Wells, G., and Molari, L. (2011). Phase field model for coupled
displacive and diffusive microstructural processes under thermal loading. Journal
of the Mechanics and Physics of Solids, 59(8):1596–1612.

Marchand, R. and Davidson, D. (2011). The method of manufactured solutions for
the verification of computational electromagnetics. In Electromagnetics in Advanced
Applications (ICEAA), 2011 International Conference on, pages 487–490.

Massing, A., Larson, M., and Logg, A. (2013). Efficient implementation of finite
element methods on nonmatching and overlapping meshes in three dimensions.
SIAM Journal on Scientific Computing, 35(1).

Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012a). A stabilized
Nitsche fictitious domain method for the Stokes problem. arXiv preprint.
http://arxiv.org/abs/1206.1933.

Massing, A., Larson, M. G., Logg, A., and Rognes, M. E. (2012b). A stabilized
Nitsche overlapping mesh method for the Stokes problem. arXiv preprint.
http://arxiv.org/abs/1205.6317.

Miaskowski, A., Sawicki, B., and Krawczyk, A. (2012). The use of magnetic nanopar-
ticles in low frequency inductive hyperthermia. COMPEL: The International Journal
for Computation and Mathematics in Electrical and Electronic Engineering, 31(4):1096–
1104.

Molari, L., Wells, G. N., Garikipati, K., and Ubertini, F. (2006). A discontinuous
Galerkin method for strain gradient-dependent damage: Study of interpolations
and convergence. Computer Methods in Applied Mechanics and Engineering, 195(13–
16):1480–1498.

Mortensen, M., Langtangen, H., and Wells, G. (2011). A FEniCS-based programming
framework for modeling turbulent flow by the Reynolds-averaged Navier–Stokes
equations. Advances in Water Resources, 34(9):1082–1101.

Mühlhaus, H.-B. and Aifantis, E. C. (1991). A variational principle for gradient
plasticity. International Journal of Solids and Structures, 28:845–857.

Nikbakht, M. (2012). Automated Solution of Partial Differential Equations with
Discontinuities using the Partition of Unity Method. PhD thesis, Delft University of
Technology.

Nikbakht, M. and Wells, G. (2009). Automated modelling of evolving discontinuities.
Algorithms, 2(3):1008–1030.

Nix, W. D. and Gao, H. (1998). Indentation size effects in crystalline materials: a
law for strain gradient plasticity. Journal of the Mechanics and Physics of Solids,
46(3):411–425.

Poole, W., Ashby, M., and Fleck, N. (1996). Micro-hardness of annealed and
work-hardened copper polycrystals. Scripta Materialia, 34(4):559–564.

Prud’homme, C. (2006). A domain specific embedded language in C++ for
automatic differentiation, projection, integration and variational formulations.
Scientific Programming, 14:81–110.

Pryer, T. (2012). Discontinuous Galerkin methods for the p–biharmonic equation
from a discrete variational perspective. arXiv preprint.
http://arxiv.org/abs/1209.4002.

Püschel, M., Moura, J. M. F., Johnson, J., Padua, D., Veloso, M., Singer, B., Xiong, J.,
Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R. W., and Rizzolo,
N. (2005). SPIRAL: Code generation for DSP transforms. Proceedings of the IEEE,
93(2):232–275.

Riesen, P., Hutter, K., and Funk, M. (2010). A viscoelastic Rivlin–Ericksen material
model applicable to glacier ice. Nonlinear Processes in Geophysics, 17:673–684.

Riesen, P. D. (2011). Variations of the surface ice motion of Gornergletscher during
drainages of the ice-dammed lake Gornersee. PhD thesis, ETH Zürich.
http://dx.doi.org/10.3929/ethz-a-006526655.

Rognes, M., Kirby, R., and Logg, A. (2010). Efficient assembly of H(div) and H(curl)
conforming finite elements. SIAM Journal on Scientific Computing, 31(6):4130–4151.

Rognes, M. E. and Logg, A. (2012). Automated goal-oriented error control I:
Stationary variational problems. arXiv preprint.
http://arxiv.org/abs/1204.6643.

Rosseel, E. and Wells, G. N. (2012). Optimal control with stochastic PDE constraints
and uncertain controls. Computer Methods in Applied Mechanics and Engineering,
213–216(0):152–167.

Russell, F. P. and Kelly, P. H. J. (2013). Optimized code generation for finite element
local assembly using symbolic manipulation. ACM Transactions on Mathematical
Software, 39(4).

Saibaba, A. K., Bakhos, T., and Kitanidis, P. K. (2012). A flexible Krylov solver
for shifted systems with application to oscillatory hydraulic tomography. arXiv
preprint. http://arxiv.org/abs/1212.3660.

Selim, K. (2012). An adaptive finite element solver for fluid–structure interaction
problems. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated
Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture
Notes in Computational Science and Engineering, chapter 29. Springer.

Selim, K., Logg, A., and Larson, M. (2012). An adaptive finite element splitting
method for the incompressible Navier–Stokes equations. Computer Methods in
Applied Mechanics and Engineering, 209–212(0):54–65.

Shewchuk, J. R. and Ghattas, O. (1993). A compiler for parallel finite element meth-
ods with domain-decomposed unstructured meshes. In Keyes, D. E. and Xu, J.,
editors, Proceedings of the Seventh International Conference on Domain Decomposition
Methods in Scientific and Engineering Computing (Pennsylvania State University),
Contemporary Mathematics, volume 180, pages 445–450. American Mathematical
Society.

Simo, J. C. and Hughes, T. J. R. (1998). Computational Inelasticity. Springer Verlag.

Simo, J. C. and Taylor, R. L. (1985). Consistent tangent operators for
rate-independent elastoplasticity. Computer Methods in Applied Mechanics and
Engineering, 48(1):101–118.

Stölken, J. and Evans, A. (1998). A microbend test method for measuring the
plasticity length scale. Acta Materialia, 46(14):5109–5115.

Sukys, J., Hiptmair, R., and Heumann, H. (2010). Discontinuous Galerkin
discretization of magnetic convection. ETH Zürich.
http://math1.unice.fr/~hheumann/Files/Report_Sukys.pdf.

Ten Eyck, A., Celiker, F., and Lew, A. (2008). Adaptive stabilization of discontin-
uous Galerkin methods for nonlinear elasticity: Motivation, formulation, and
numerical examples. Computer Methods in Applied Mechanics and Engineering,
197(45–48):3605–3622.

Ten Eyck, A. and Lew, A. (2006). Discontinuous Galerkin methods for non-linear
elasticity. International Journal for Numerical Methods in Engineering, 67(9):1204–
1243.

Vidoli, S. (2013). Discrete approximations of the Föppl–Von Kármán shell model:
From coarse to more refined models. International Journal of Solids and Structures,
50(9):1241–1252.

Vynnytska, L., Clark, S. R., and Rognes, M. E. (2012). Dynamic simulations of
convection in the Earth’s mantle. In Logg, A., Mardal, K.-A., and Wells, G.,
editors, Automated Solution of Differential Equations by the Finite Element Method,
volume 84 of Lecture Notes in Computational Science and Engineering, chapter 31,
pages 585–600. Springer Berlin Heidelberg.

Vynnytska, L., Rognes, M., and Clark, S. (2013). Benchmarking FEniCS for man-
tle convection simulations. Computers & Geosciences, 50(0):95–105. Benchmark
problems, datasets and methodologies for the computational geosciences.

Wang, P. (1986). FINGER: A symbolic system for automatic generation of numerical
programs in finite element analysis. Journal of Symbolic Computation, 2(3):305–316.

Wells, G. (2011). Analysis of an interface stabilized finite element method: The
advection-diffusion-reaction equation. SIAM Journal on Numerical Analysis,
49(1):87–109.

Wells, G. N. and Dung, N. T. (2007). A C^0 discontinuous Galerkin formulation
for Kirchhoff plates. Computer Methods in Applied Mechanics and Engineering,
196(35–36):3370–3380.

Wells, G. N., Garikipati, K., and Molari, L. (2004). A discontinuous Galerkin
formulation for a strain gradient-dependent damage model. Computer Methods in
Applied Mechanics and Engineering, 193(33–35):3633–3645.

Wells, G. N., Hooijkaas, T., and Shan, X. (2008). Modelling temperature effects on
multiphase flow through porous media. Philosophical Magazine, 88(28–29):3265–
3279.

Ølgaard, K. B., Logg, A., and Wells, G. N. (2008a). Automated code generation for
discontinuous Galerkin methods. SIAM Journal on Scientific Computing, 31(2):849–
864.

Ølgaard, K. B. and Wells, G. N. (2009). Supporting material.
http://www.dspace.cam.ac.uk/handle/1810/218612.

Ølgaard, K. B. and Wells, G. N. (2010). Optimisations for quadrature representations
of finite element tensors through automated code generation. ACM Transactions
on Mathematical Software, 37(1):8:1–8:23.

Ølgaard, K. B. and Wells, G. N. (2012a). Applications in solid mechanics. In Logg,
A., Mardal, K.-A., and Wells, G. N., editors, Automated Solution of Differential
Equations by the Finite Element Method, volume 84 of Lecture Notes in Computational
Science and Engineering, chapter 26. Springer.

Ølgaard, K. B. and Wells, G. N. (2012b). Quadrature representation of finite element
variational forms. In Logg, A., Mardal, K.-A., and Wells, G. N., editors, Automated
Solution of Differential Equations by the Finite Element Method, volume 84 of Lecture
Notes in Computational Science and Engineering, chapter 7. Springer.

Ølgaard, K. B. and Wells, G. N. (2013). FEniCS Solid Mechanics.
https://bitbucket.org/fenics-apps/fenics-solid-mechanics.

Ølgaard, K. B., Wells, G. N., and Logg, A. (2008b). Automated computational
modelling for solid mechanics. In Reddy, B. D., editor, IUTAM Symposium on
Theoretical, Computational and Modelling Aspects of Inelastic Media, volume 11 of
IUTAM Bookseries, pages 192–204. Springer.
Summary

In engineering, physical phenomena are often described mathematically by
partial differential equations (PDEs), and a commonly used method to solve these
equations is the finite element method (FEM). Implementing a solver based on this
method for a given PDE as a computer program written in source code can be
tedious, time consuming and error prone. Recently, compilers that automatically
generate source code from the mathematical representation of a given PDE expressed in
a form language have been introduced. This approach to automated mathematical
modelling, which is key in the FEniCS Project (http://fenicsproject.org), has
reduced the burden on application developers working with the FEM when it comes
to implementing solvers for new models.

In this thesis, the automated modelling framework of the FEniCS Project is
extended such that discontinuous Galerkin methods can be handled; rapid
prototyping of advanced models and applications is possible; and efficiency is
maintained for complex problems in general.

The extensions are implemented in various components of the FEniCS framework.
For instance, the Unified Form Language (UFL) is extended by adding new
abstractions that allow operators pertinent to discontinuous Galerkin methods to
be represented in a straightforward fashion. The FEniCS Form Compiler (FFC)
is also extended such that code can be generated from expressions that contain
the discontinuous Galerkin operators introduced in UFL. In order to maintain
computational efficiency for complex problems, various optimisation strategies for
computing the local finite element tensor are implemented in FFC. The central
philosophy of the optimisation strategies is to manipulate the representation in
such a way that the number of operations needed to compute the local element
tensor decreases.
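
The idea of reducing the operation count for the local element tensor can be
illustrated with a minimal sketch. It is not code generated by FFC: the element
(a linear element on an interval of length h), the two-point Gauss rule, and the
function names mass_naive and mass_hoisted are assumptions chosen only for
illustration. The sketch computes the same local mass matrix twice, once with all
products evaluated in the innermost loop and once with loop-invariant factors
moved outwards, and counts the multiplications in each case.

```python
import math

# Two-point Gauss rule on the reference interval [0, 1] (exact for cubics).
POINTS = [0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0)]
WEIGHTS = [0.5, 0.5]


def basis(x):
    """Linear Lagrange basis functions on [0, 1], evaluated at x."""
    return [1.0 - x, x]


def mass_naive(h):
    """Local mass matrix; every product is recomputed in the innermost loop."""
    n, mults = 2, 0
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for x, w in zip(POINTS, WEIGHTS):
                phi = basis(x)
                A[i][j] += w * phi[i] * phi[j] * h  # three multiplications
                mults += 3
    return A, mults


def mass_hoisted(h):
    """Same matrix; factors that do not depend on i or j are computed once."""
    n, mults = 2, 0
    A = [[0.0] * n for _ in range(n)]
    for x, w in zip(POINTS, WEIGHTS):
        phi = basis(x)
        scale = w * h  # independent of i and j
        mults += 1
        for i in range(n):
            t = scale * phi[i]  # independent of j
            mults += 1
            for j in range(n):
                A[i][j] += t * phi[j]
                mults += 1
    return A, mults
```

For this small element the first variant performs 24 multiplications and the
second 14, while both return the matrix with entries h/3 on the diagonal and
h/6 off the diagonal. The saving grows with the number of basis functions and
quadrature points, which is why such rearrangements matter for complicated forms.
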

To demonstrate the extensions to the FEniCS framework developed in this work,
a strain gradient plasticity model that includes a lifting-type discontinuous
Galerkin formulation for the plastic multiplier is presented as an example. It
is demonstrated that the model is not suitable for softening problems. On the
other hand, the model is able to capture size effects for a hardening problem in a
micro-indentation simulation in three dimensions.
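
The operators pertinent to discontinuous Galerkin methods mentioned above are,
in their most common form, the jump and the average of a quantity across a facet
shared by two cells. The following sketch states the usual definitions in plain
Python for scalar values. It is an illustration only, not UFL code: the function
names and the representation of the normal as a list of components are
assumptions made here, and '+' and '-' denote the two sides of the facet.

```python
def avg(v_plus, v_minus):
    """Average of the two one-sided values on a facet."""
    return 0.5 * (v_plus + v_minus)


def jump(v_plus, v_minus):
    """Jump of a scalar across a facet: value on '+' minus value on '-'."""
    return v_plus - v_minus


def jump_normal(v_plus, v_minus, n_plus):
    """Normal jump of a scalar: v+ n+ + v- n-, with n- = -n+."""
    n_minus = [-c for c in n_plus]
    return [v_plus * a + v_minus * b for a, b in zip(n_plus, n_minus)]
```

With one-sided values 3.0 and 1.0 and the unit normal [1.0, 0.0] on the '+' side,
the average is 2.0, the jump is 2.0, and the normal jump is the vector with
components 2.0 and 0.0. The normal jump does not depend on which side is labelled
'+', which is what makes it convenient in interior facet integrals.
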
Samenvatting (Summary)

In engineering practice, physical phenomena are often described mathematically
by partial differential equations (PDEs), and a widely used method for solving
such equations is the finite element method (FEM). Implementing a solver based
on this method for a given PDE, in a computer program written in source code, is
laborious, time consuming and error prone. Recently, compilers have been
introduced that automatically generate source code from a mathematical
representation of a given PDE in a form language. This approach to automated
modelling, which is central to the FEniCS Project (http://fenicsproject.org), has
reduced the workload of developers working with the FEM when it comes to
implementing solvers for new models.

In this thesis, the automated modelling framework of the FEniCS Project is
extended such that discontinuous Galerkin methods can be used; rapid prototyping
of advanced models and applications is possible; and efficiency is generally
ensured for complex problems.

The extensions are implemented in various components of the FEniCS framework.
For instance, the Unified Form Language (UFL) is extended with new abstractions
with which operators belonging to discontinuous Galerkin methods can be
represented simply. The FEniCS Form Compiler (FFC) is also extended, so that code
can be generated that contains the discontinuous Galerkin operators introduced in
UFL. To ensure numerical efficiency for complex problems, various optimisation
strategies for computing the local finite element tensor are implemented in FFC.
The central philosophy of the optimisation strategies is to modify the
representation such that the number of operations for computing the local
element tensor decreases.

To demonstrate the extensions to the FEniCS framework developed in this work, a
strain gradient plasticity model with a lifting discontinuous Galerkin
formulation for the plastic multiplier is presented by way of example. It is
shown that the model is not suitable for problems with softening. On the other
hand, the model is able to reproduce size effects for a hardening problem of
micro-indentation in three dimensions.


Propositions

1. Automatic code generation can reduce the time needed to implement finite
element solvers, but only if one has faith in the generator that generates the
code. (This proposition was conceived after many hours of debugging the FEniCS
Form Compiler only to find out that there was a sign error in the input code.)

2. In the past, finite element solvers for partial differential equations (PDEs) were
written in languages (source code) that compilers translated into machine
code. In the present, a high-level language for expressing the mathematical
formulation of a given PDE makes it possible for compilers to automatically
generate source code. In the future, automated model generators may create
PDEs from experimental data.

3. Eventually, artificial general intelligence (strong AI) will allow autonomous
systems to select which part of reality to model and how, thus making humans
redundant.

4. In the big picture, humans, as a species, are already redundant, but this does
not imply that strong AI has been developed yet.

5. In terms of exposing bugs and directing the development of a software project,
a large user base is worth more than a large number of developers.

6. “A young man naturally conceives an aversion to labour, when for a long time
he receives no benefit from it.” – Adam Smith, reflections on apprenticeships
in An Inquiry into the Nature and Causes of the Wealth of Nations. Similarly, a
PhD student may experience a drop in motivation if project funding runs
out. The solution is to improve project planning rather than extending the
funding.

7. Dijkstra’s shortest path algorithm works well for planning tasks of short
duration, but it is not suitable for planning long-term research projects.

8. Time (or money) is the penalty parameter closing the gap between ambition
and actual work done.

9. Economic growth is driving social differences between countries to zero.

10. A child does not satisfy boundary conditions by construction. Boundary
conditions must, therefore, be enforced in a weak sense by penalties, rewards
and by setting a good example.

11. Gradients in the distribution of wealth are a prerequisite for a dynamic society.

12. Collaboration makes it possible for a strong group of individuals to
outperform a group of strong individuals. This concept applies to science as well as
sports and is illustrated, e.g., by the 1992 and 2012 European Championship
football matches between The Netherlands and Denmark.

13. The recent debate about whether or not Sinterklaas (Saint Nicholas) can have
Zwarte Piet (Black Peter) as his helper (provided his employment is in accordance
with the collective agreement) misses the point. The real problem arises if
Zwarte Piet is dismissed because of his skin colour.

14. Although a PhD thesis is rarely read cover to cover, most people will read
the propositions and then go through the references to see how many papers
have been published based on the present work.

These propositions are regarded as opposable and defendable, and have been
approved as such by the supervisors Prof. dr. ir. L. J. Sluys and Dr. G. N. Wells.
Stellingen (Propositions)

1. Automatic code generation can reduce the time needed to implement finite
element solvers, but only for those who trust the generator that generates the
code. (This proposition came about after hours of debugging the FEniCS Form
Compiler, only to discover that there was a sign error in the input code.)

2. In the past, finite element solvers for partial differential equations (PDEs)
were written in languages (source code) that compilers translated into machine
language. At present, a higher-level language for expressing the mathematical
formulation of a given PDE makes it possible for compilers to generate source
code automatically. In the future, automated model generators may create PDEs
from experimental data.

3. Eventually, artificial general intelligence (strong AI) will make it possible
for autonomous systems to choose which part of reality to model and how, thereby
making humans redundant.

4. In the bigger picture, humans, as a species, are already redundant, but this
does not imply that strong AI has already been developed.

5. In terms of discovering bugs and steering the development of a software
project, a large user group is worth more than a large number of developers.

6. “A young man naturally conceives an aversion to labour, when for a long time
he receives no benefit from it.” – Adam Smith, reflections on apprenticeships
in An Inquiry into the Nature and Causes of the Wealth of Nations. Likewise, a
PhD student may experience a decline in motivation when project funding ends.
The solution is to improve the project planning rather than to extend the
funding.

7. Dijkstra’s shortest path algorithm works well for planning short-running
tasks, but it is not suitable for planning long-running research projects.

8. Time (or money) is the penalty parameter that closes the gap between ambition
and work actually done.

9. Economic growth drives social differences between countries to zero.

10. A child does not satisfy boundary conditions from birth. Boundary conditions
must therefore be enforced in a weak sense with punishment, reward and setting a
good example.

11. Gradients in the distribution of wealth are a prerequisite for a dynamic
society.

12. Collaboration makes it possible for a strong group of individuals to beat a
group of strong individuals. This concept holds for science as well as for sport
and was on display, for example, in the European Championship football matches
between the Netherlands and Denmark in 1992 and 2012.

13. The recent discussion about whether or not Sinterklaas may have Zwarte Piet
as his helper (given that his employment is in accordance with the collective
labour agreement) misses the essential point. The real problem arises if Zwarte
Piet is dismissed because of his skin colour.

14. Although a thesis is seldom read from cover to cover, most people will read
the propositions and then go through the bibliography to see how many articles
have been published on the basis of the work in question.

These propositions are regarded as opposable and defendable, and have been
approved as such by the supervisors Prof. dr. ir. L. J. Sluys and Dr. G. N. Wells.
Curriculum vitae

2 August 1978          Born in Ringkøbing, Denmark.

Aug. 2000–Oct. 2005    Civil engineering studies, Aalborg University.

October 2005           Master of Science in Civil Engineering, Aalborg University.

Nov. 2005–Nov. 2009    Research assistant, Faculty of Civil Engineering and
                       Geosciences, Delft University of Technology.

Apr. 2010–Sep. 2010    Scientific programmer, Simula Research Laboratory.

Nov. 2011–present      Research assistant, Department of Civil Engineering,
                       Aalborg University.
