1 Introduction
Sensitivity analysis consists in computing derivatives of one or more quantities
(outputs) with respect to one or several independent variables (inputs). Al-
though there are various uses for sensitivity information, our main motivation
is the use of this information in gradient-based optimization. Since the calculation of gradients is often the most costly step in the optimization cycle, using efficient methods that accurately calculate sensitivities is extremely important.
There are several different methods for sensitivity analysis, but since none of them is the clear choice for all cases, it is important to understand their relative merits. When choosing a method for computing sensitivities, one is mainly concerned with its accuracy and computational expense. In certain cases it is also important that the method be easy to implement: a method which is efficient but difficult to implement may never be finalized, while an easier, though computationally more costly, method would actually give some results. Factors that affect the choice of method include the ratio of the number of outputs to the number of inputs, the importance of computational efficiency, and the degree of laziness of the programmer.
Consider a general constrained optimization problem of the form:

$$ \begin{aligned} \text{minimize} \quad & f(x_i) \\ \text{w.r.t.} \quad & x_i, \quad i = 1, 2, \ldots, n \\ \text{subject to} \quad & g_j(x_i) \geq 0, \quad j = 1, 2, \ldots, m, \end{aligned} $$

where f is a non-linear function of the n design variables $x_i$, and the $g_j$ are the m non-linear inequality constraints that we have to satisfy. In order to solve this problem, a gradient-based optimization algorithm usually requires:

- The sensitivities of the objective function, $\partial f/\partial x_i$ (an $n \times 1$ vector).
- The sensitivities of all the active constraints at the current design point, $\partial g_j/\partial x_i$ (an $m \times n$ matrix).
2 Finite-Differences
Finite-difference formulae are very commonly used to estimate sensitivities. Although these approximations are neither particularly accurate nor efficient, this method's biggest advantage resides in the fact that it is extremely easy to implement.
All the finite-differencing formulae can be derived by truncating a Taylor series expanded about a given point x. A common estimate for the first derivative is the forward difference, which can be derived from the expansion of f(x + h),

$$ f(x+h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{h^3}{3!} f'''(x) + \ldots \qquad (1) $$

Solving for $f'(x)$ we get the finite-difference formula,

$$ f'(x) = \frac{f(x+h) - f(x)}{h} + O(h), \qquad (2) $$
where h is called the finite-difference interval. The truncation error is O(h),
and hence this is a first-order approximation.
For a second-order estimate we can use the expansion of f(x − h),

$$ f(x-h) = f(x) - h f'(x) + \frac{h^2}{2!} f''(x) - \frac{h^3}{3!} f'''(x) + \ldots, \qquad (3) $$

and subtract it from the expansion given in Equation (1). The resulting equation can then be solved for the derivative of f to obtain the central-difference formula,

$$ f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O(h^2). \qquad (4) $$
When estimating sensitivities using finite-difference formulae we are faced with
the step-size dilemma, i.e. the desire to choose a small step size to minimize
truncation error while avoiding the use of a step so small that errors due to
subtractive cancellation become dominant.
The cost of calculating sensitivities with finite-differences is proportional to
the number of design variables since f must be calculated for each perturbation
of xi . This means that if we use forward differences, for example, the cost would
be n + 1 times the cost of calculating f .
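To make the cost and the step-size dilemma concrete, here is a small Python sketch (ours, not from the text) of both formulae applied to a function whose derivative is known:

    import numpy as np

    def forward_diff(f, x, h):
        # First-order forward difference, Equation (2)
        return (f(x + h) - f(x)) / h

    def central_diff(f, x, h):
        # Second-order central difference, Equation (4)
        return (f(x + h) - f(x - h)) / (2 * h)

    x, exact = 1.0, np.cos(1.0)  # test on f = sin, whose derivative is cos
    for h in [1e-2, 1e-4, 1e-8, 1e-12]:
        print(f"h = {h:.0e}  forward error = {abs(forward_diff(np.sin, x, h) - exact):.1e}  "
              f"central error = {abs(central_diff(np.sin, x, h) - exact):.1e}")
    # Truncation error dominates for large h; for very small h, subtractive
    # cancellation takes over and the estimates degrade again.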
3.2 Basic Theory
We will now see that a very simple formula for the first derivative of real func-
tions can be obtained using complex calculus. Consider a function, f = u+iv, of
the complex variable, $z = x + iy$. If f is analytic, the Cauchy-Riemann equations apply, i.e.,

$$ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad (5) $$

$$ \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \qquad (6) $$
These equations establish the exact relationship between the real and imaginary
parts of the function. We can use the definition of a derivative in the right-hand side of the first Cauchy-Riemann equation (5) to obtain,

$$ \frac{\partial u}{\partial x} = \lim_{h \to 0} \frac{v(x + i(y+h)) - v(x + iy)}{h}, \qquad (7) $$
where h is a small real number. Since the functions that we are interested in are
real functions of a real variable, we restrict ourselves to the real axis, in which
case y = 0, u(x) = f (x) and v(x) = 0. Equation (7) can then be re-written as,
$$ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{\mathrm{Im}\left[f(x + ih)\right]}{h}. \qquad (8) $$

For a small discrete h, this can be approximated by,

$$ \frac{\partial f}{\partial x} \approx \frac{\mathrm{Im}\left[f(x + ih)\right]}{h}. \qquad (9) $$
We will call this the complex-step derivative approximation. This estimate is not subject to subtractive cancellation error, since it does not involve a difference operation. This constitutes a tremendous advantage over the finite-difference approaches expressed in Equations (2) and (4).
In order to determine the error involved in this approximation, we will show an alternative derivation based on a Taylor series expansion. Rather than using a real step h, we now use a pure imaginary step, ih. If f is a real function of a real variable and is analytic, we can expand it in a Taylor series about a real point x as follows,
$$ f(x + ih) = f(x) + i h f'(x) - \frac{h^2}{2!} f''(x) - i\frac{h^3}{3!} f'''(x) + \ldots \qquad (10) $$
Taking the imaginary parts of both sides of Equation (10) and dividing the
equation by h yields
$$ f'(x) = \frac{\mathrm{Im}\left[f(x + ih)\right]}{h} + \frac{h^2}{3!} f'''(x) - \ldots \qquad (11) $$
Hence the approximation is an $O(h^2)$ estimate of the derivative of f.
3.3 A Simple Numerical Example
Because the complex-step approximation does not involve a difference operation, we can choose extremely small step sizes with no loss of accuracy due to subtractive cancellation.
To illustrate this, consider the following analytic function:
$$ f(x) = \frac{e^x}{\sin^3 x + \cos^3 x}. \qquad (12) $$
The exact derivative at x = 1.5 was computed analytically to 16 digits and
then compared to the results given by the complex-step (9) and the forward
and central finite-difference approximations.
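This comparison is easy to reproduce; the following Python sketch (ours, not from the text) evaluates the three estimates for a range of step sizes, relying on NumPy's support for complex arguments:

    import numpy as np

    def f(x):
        # Test function of Equation (12); works for real and complex x
        return np.exp(x) / (np.sin(x)**3 + np.cos(x)**3)

    def fprime(x):
        # Analytic derivative, used here as the reference value
        g = np.sin(x)**3 + np.cos(x)**3
        gp = 3 * np.sin(x)**2 * np.cos(x) - 3 * np.cos(x)**2 * np.sin(x)
        return np.exp(x) * (g - gp) / g**2

    x, exact = 1.5, fprime(1.5)
    for h in [1e-1, 1e-4, 1e-8, 1e-16, 1e-200]:
        cs = f(complex(x, h)).imag / h          # complex step, Equation (9)
        fd = (f(x + h) - f(x)) / h              # forward difference, Equation (2)
        cd = (f(x + h) - f(x - h)) / (2 * h)    # central difference, Equation (4)
        print(f"h={h:.0e}  cs err={abs(cs - exact):.1e}  "
              f"fd err={abs(fd - exact):.1e}  cd err={abs(cd - exact):.1e}")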
For decreasing step sizes, the error of the complex-step estimate decreases until it reaches the accuracy of the function evaluation. Comparing the best accuracy of each of these approaches, we can see that by using finite differences we only achieve a fraction of the accuracy that is obtained by using the complex-step approximation.
As we can see, the complex-step size can be made extremely small. However, there is a lower limit on the step size when using finite-precision arithmetic. The range of real numbers that can be handled in numerical computing depends on the particular compiler that is used. In this case, the smallest non-zero number that can be represented is $10^{-308}$. If a number falls below this value, underflow occurs and the number drops to zero. Note that the estimate is still accurate down to a step of the order of $10^{-307}$. Below this, underflow occurs and the estimate results in NaN. In general, the smallest possible h is the one below which underflow occurs somewhere in the algorithm.
When it comes to comparing the relative accuracy of complex and real com-
putations, there is an increased error in basic arithmetic operations when using
complex numbers, more specifically when dividing and multiplying.
Relational operators such as "greater than" and "less than" are usually used in conjunction with if statements to define functions piecewise, and they are not defined for complex arguments. They should be redefined to compare only the real parts of their arguments, so that the complexified algorithm follows the same execution branch as the original real-valued one. A function defined in this way may be discontinuous at the branching point, whether the discontinuity is in the first or higher derivatives. When using a finite-difference
method, the derivative estimate will be incorrect if the two function evaluations
are within h of the discontinuity location. However, if the complex-step is used,
the resulting derivative estimate will be correct right up to the discontinuity. At
the discontinuity, a derivative does not exist by definition, but if the function is defined at that point, the approximation will still return a value, which will depend on how the function is defined there.
Arithmetic functions and operators include addition, multiplication, and
trigonometric functions, to name only a few, and most of these have a standard
complex definition that is analytic almost everywhere. Many of these definitions
are implemented in Fortran. Whether they are or not depends on the compiler
and libraries that are used. The user should check the documentation of the
particular Fortran compiler being used in order to determine which intrinsic
functions need to be redefined.
Functions of the complex variable are merely extensions of their real coun-
terparts. By requiring that the extended function satisfy the Cauchy-Riemann
equations, i.e. analyticity, and that its properties be the same as those of the
real function, we can obtain a unique complex function definition. Since these
complex functions are analytic, the complex-step approximation is valid and
will yield the correct result.
Some of the functions, however, have singularities or branch cuts on which
they are not analytic. This does not pose a problem since, as previously ob-
served, the complex-step approximation will return a correct one-sided deriva-
tive. As for the case of a function that is not defined at a given point, the
algorithm will not return a function value, so a derivative cannot be obtained.
However, the derivative estimate will be correct in the neighborhood of the
discontinuity.
The only standard complex function definition that is non-analytic is the absolute value function, or modulus. When the argument of this function is a complex value, the function returns a positive real number, $|z| = \sqrt{x^2 + y^2}$. This function's definition was not derived by imposing analyticity and therefore it will not yield the correct derivative when using the complex-step estimate. In order to derive an analytic definition of abs we start by satisfying the Cauchy-Riemann equations. From Equation (5), since we know what the value of the derivative must be, we can write,
$$ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} = \begin{cases} -1, & x < 0 \\ +1, & x > 0. \end{cases} \qquad (13) $$
From Equation (6), since $\partial v/\partial x = 0$ on the real axis, we get that $\partial u/\partial y = 0$ on the axis, so the real part of the result must be independent of the imaginary part of the variable. Therefore, the sign of the imaginary part of the result depends only on the sign of the real part of the complex number, and an analytic absolute
value function can be defined as:
$$ \mathrm{abs}(x + iy) = \begin{cases} -x - iy, & x < 0 \\ +x + iy, & x > 0. \end{cases} \qquad (14) $$
Note that this definition is not analytic at x = 0, since a derivative does not exist there for the real absolute value. Once again, the complex-step approximation will give the correct value of the first derivative right up to the discontinuity. Later, the x > 0 condition will be substituted by x ≥ 0, so that we not only obtain a function value for x = 0, but are also able to calculate the correct right-hand-side derivative at that point.
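In Python, for example, an analytic absolute value along the lines of Equation (14) could be sketched as follows (cabs is our own, hypothetical name for it):

    def cabs(z):
        # Analytic extension of abs, Equation (14): the sign of the
        # imaginary part follows the sign of the real part. Comparing
        # only the real part mirrors the redefined relational operators,
        # and the "else" branch covers x >= 0, giving the correct
        # right-hand-side derivative at x = 0.
        if z.real < 0.0:
            return -z
        return z

    # Complex-step derivative of abs at x = -2: should be -1
    h = 1e-30
    print(cabs(complex(-2.0, h)).imag / h)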
1. Substitute all real type variable declarations with complex declarations.
2. Define all functions and operators that are not defined for complex arguments, and redefine abs.
C/C++: An include file can define a new variable type called cmplx as well as all the functions that are necessary for the complex-step method. The inclusion of this file and the replacement of double or float declarations with cmplx is nearly all that is required.
Matlab: As in the case of Fortran, one must redefine functions such as abs, max and min. All differentiable functions are defined for complex variables. Results for the simple example in the previous section were computed using Matlab. The standard transpose operation represented by an apostrophe (') poses a problem, as it takes the complex conjugate of the elements of the matrix, so one should use the non-conjugate transpose represented by dot-apostrophe (.') instead.
Java: Complex arithmetic is not standardized at the moment but there are
plans for its implementation. Although function overloading is possible,
operator overloading is currently not supported.
Python: When using the Numerical Python module (NumPy), we have access
to complex number arithmetic and implementation is as straightforward
as in Matlab.
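For instance, a generic complex-step routine (complex_step is our own, hypothetical name) needs only a couple of lines:

    import cmath

    def complex_step(f, x, h=1e-200):
        # Complex-step derivative approximation, Equation (9)
        return f(complex(x, h)).imag / h

    print(complex_step(cmath.sin, 1.5))  # agrees with cos(1.5) to machine precision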
4 Algorithmic Differentiation
Algorithmic differentiation, also known as computational differentiation or automatic differentiation, is a well-known method based on the systematic application of the chain rule of differentiation to computer programs. Although this approach is as accurate as an analytic method, it is potentially much easier to implement, since this can be done automatically.
The table below compares the forward mode of algorithmic differentiation with the complex-step method for the single operation $f = x_1 x_2$:

Algorithmic                                    Complex-Step
$\Delta x_1 = 1$                               $h_1 = 10^{-20}$
$\Delta x_2 = 0$                               $h_2 = 0$
$f = x_1 x_2$                                  $f = (x_1 + ih_1)(x_2 + ih_2)$
$\Delta f = \Delta x_1 x_2 + \Delta x_2 x_1$   $f = x_1 x_2 - h_1 h_2 + i(x_1 h_2 + x_2 h_1)$
$df/dx_1 = \Delta f$                           $df/dx_1 = \mathrm{Im}\, f / h_1$
The only difference between the two derivative calculations is the $h_1 h_2$ term in the real part of the complex-step result, which comes from the higher-order terms in the Taylor series expansion of Equation (10). For very small h, when using finite-precision arithmetic, these terms have no effect on the real part of the result.
Although this example involves only one operation, both methods work for
an algorithm involving an arbitrary sequence of operations by propagating the
variation of one input forward throughout the code. This means that in order
to calculate n derivatives, the differentiated code must be executed n times.
The other mode, the reverse mode, has no equivalent in the complex-step method. When using the reverse mode, the code is executed forwards and then backwards to calculate derivatives of one output with respect to n inputs. The total number of operations is independent of n, but the memory requirements may be prohibitive, especially in the case of large iterative algorithms.
There is nothing like an example, so we will now use both the forward and reverse modes to compute the derivatives of the function,

$$ f(x_1, x_2) = x_1 x_2 + \sin(x_1). \qquad (15) $$

The algorithm that calculates this function is shown below, together with the derivative calculation using the forward mode.
$t_1 = x_1$          $\Delta t_1 = 1$
$t_2 = x_2$          $\Delta t_2 = 0$
$t_3 = t_1 t_2$      $\Delta t_3 = \Delta t_1\, t_2 + t_1\, \Delta t_2$
$t_4 = \sin(t_1)$    $\Delta t_4 = \Delta t_1 \cos(t_1)$
$t_5 = t_3 + t_4$    $\Delta t_5 = \Delta t_3 + \Delta t_4$
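This straight-line derivative code translates directly into Python; the sketch below (our own) seeds $\Delta x_1 = 1$ to obtain $\partial f/\partial x_1$:

    import math

    def f_forward(x1, x2, dx1=1.0, dx2=0.0):
        # Forward-mode AD applied by hand to f = x1*x2 + sin(x1):
        # each ti carries its derivative dti with respect to the seed.
        t1, dt1 = x1, dx1
        t2, dt2 = x2, dx2
        t3, dt3 = t1 * t2, dt1 * t2 + t1 * dt2
        t4, dt4 = math.sin(t1), dt1 * math.cos(t1)
        t5, dt5 = t3 + t4, dt3 + dt4
        return t5, dt5

    val, dfdx1 = f_forward(1.0, 2.0)        # df/dx1 = x2 + cos(x1)
    print(val, dfdx1, 2.0 + math.cos(1.0))  # the two derivative values agree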
The reverse mode is also based on the chain rule. Let $t_j$ denote all the intermediate variables in an algorithm that calculates $f(x_i)$. We set $t_1, \ldots, t_n$ to $x_1, \ldots, x_n$ and the last intermediate variable, $t_m$, to $f$. Then the chain rule can be written as,

$$ \frac{\partial t_j}{\partial t_i} = \sum_{k \in K_j} \frac{\partial t_j}{\partial t_k} \frac{\partial t_k}{\partial t_i}, \qquad (16) $$
which is evaluated for $j = n+1, \ldots, m$ to obtain the gradients of the intermediate and output variables. $K_j$ denotes the set of indices $k < j$ such that the variable $t_j$ in the code depends explicitly on $t_k$. In order to know in advance what these indices are, we have to form the graph of the algorithm when it is first executed. This provides information on the interdependence of all the intermediate variables. A graph for our sample algorithm is shown in Figure 2.

Figure 2: Graph of the algorithm that calculates $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$.
The sequence of calculations shown below corresponds to the application of
the reverse mode to our simple function.
$$ \frac{\partial t_5}{\partial t_5} = 1 $$

$$ \frac{\partial t_5}{\partial t_4} = 1 $$

$$ \frac{\partial t_5}{\partial t_3} = \frac{\partial t_5}{\partial t_4}\frac{\partial t_4}{\partial t_3} + \frac{\partial t_5}{\partial t_3}\frac{\partial t_3}{\partial t_3} = 1 \cdot 0 + 1 \cdot 1 = 1 $$

$$ \frac{\partial t_5}{\partial t_2} = \frac{\partial t_5}{\partial t_3}\frac{\partial t_3}{\partial t_2} + \frac{\partial t_5}{\partial t_4}\frac{\partial t_4}{\partial t_2} = 1 \cdot t_1 + 1 \cdot 0 = t_1 $$

$$ \frac{\partial t_5}{\partial t_1} = \frac{\partial t_5}{\partial t_2}\frac{\partial t_2}{\partial t_1} + \frac{\partial t_5}{\partial t_3}\frac{\partial t_3}{\partial t_1} + \frac{\partial t_5}{\partial t_4}\frac{\partial t_4}{\partial t_1} = t_1 \cdot 0 + 1 \cdot t_2 + 1 \cdot \cos(t_1) = t_2 + \cos(t_1) $$
The following matrix helps to visualize the sensitivities of all the variables with respect to each other,

$$ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ \frac{\partial t_3}{\partial t_1} & \frac{\partial t_3}{\partial t_2} & 1 & 0 & 0 \\ \frac{\partial t_4}{\partial t_1} & \frac{\partial t_4}{\partial t_2} & \frac{\partial t_4}{\partial t_3} & 1 & 0 \\ \frac{\partial t_5}{\partial t_1} & \frac{\partial t_5}{\partial t_2} & \frac{\partial t_5}{\partial t_3} & \frac{\partial t_5}{\partial t_4} & 1 \end{pmatrix} \qquad (17) $$
In the case of the example we are considering we have:
$$ \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ t_2 & t_1 & 1 & 0 & 0 \\ \cos(t_1) & 0 & 0 & 1 & 0 \\ t_2 + \cos(t_1) & t_1 & 1 & 1 & 1 \end{pmatrix} \qquad (18) $$
With the reverse mode, the cost of calculating the derivatives of one output with respect to many inputs is proportional not to the number of inputs but to the number of outputs. However, since the reverse mode requires storing all the intermediate variables as well as the complete graph of the algorithm, the amount of memory that is necessary increases dramatically. In the case of a three-dimensional iterative solver, the cost of using this mode can be prohibitive.
To implement algorithmic differentiation by source transformation, the whole
source code must be processed with a parser and all the derivative calculations
are introduced as additional lines of code. The resulting source code is greatly
enlarged and it becomes practically unreadable. This fact constitutes an imple-
mentation disadvantage as it becomes impractical to debug this new extended
code. One has to work with the original source, and every time it is changed (or
if different derivatives are desired) one must rerun the parser before compiling
a new version.
In order to use derived types, we need languages that support this feature,
such as Fortran 90 or C++. To implement algorithmic differentiation using
this feature, a new type of structure is created that contains both the value
and its derivative. All the existing operators are then re-defined (overloaded)
for the new type. The new operator has exactly the same behavior as before
for the value part of the new type, but uses the definition of the derivative of
the operator to calculate the derivative portion. This results in a very elegant
implementation since very few changes are required in the original code.
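As an illustration of the derived-type approach, here is a minimal Python class (our own sketch; Python expresses operator overloading through dunder methods) that carries a value together with its derivative:

    import math

    class Dual:
        """A value together with its derivative (forward-mode AD)."""
        def __init__(self, val, der=0.0):
            self.val, self.der = val, der
        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.der + other.der)
        def __mul__(self, other):
            # Product rule for the derivative part
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.der * other.val + self.val * other.der)

    def sin(x):
        # Overloaded sin: same value as before, plus its derivative
        return Dual(math.sin(x.val), math.cos(x.val) * x.der)

    x1, x2 = Dual(1.0, 1.0), Dual(2.0, 0.0)   # seed d/dx1
    y = x1 * x2 + sin(x1)                     # original expression, unchanged
    print(y.val, y.der)                       # f and df/dx1 = x2 + cos(x1)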
Many tools for automatic algorithmic differentiation of programs in different
languages exist. They have been extensively developed and provide the user
with great functionality, including the calculation of higher-order derivatives
and reverse mode options.
Fortran: Tools that use the source transformation approach include ADIFOR [11], TAMC, DAFOR, GRESS, Odyssée and PADRE2. The necessary changes to the source code are made automatically. The derived datatype approach is used in the following tools: AD01, ADOL-F, IMAS and OPTIMA90. Although it is in theory possible to have a script make the necessary changes in the source code automatically, none of these tools have this facility and the changes must be done manually.
C/C++: Established tools for automatic algorithmic differentiation also exist for C/C++ [10]. These include ADIC, an implementation mirroring ADIFOR, and ADOL-C, a free package that uses operator overloading and can operate in the forward or reverse modes and compute higher-order derivatives.
References
[1] Lyness, J. N., and C. B. Moler, Numerical Differentiation of Analytic Functions, SIAM J. Numer. Anal., Vol. 4, 1967, pp. 202-210.
[2] Lyness, J. N., Numerical algorithms based on the theory of complex vari-
ables, Proc. ACM 22nd Nat. Conf., Thompson Book Co., Washington DC,
1967, pp. 124-134.
[3] Squire, W., and G. Trapp, Using Complex Variables to Estimate Derivatives of Real Functions, SIAM Review, Vol. 40, No. 1, March 1998, pp. 110-112.
[4] Martins, J. R. R. A., I. M. Kroo, and J. J. Alonso, An Automated Method for Sensitivity Analysis Using Complex Variables, Proceedings of the 38th Aerospace Sciences Meeting, Reno, NV, January 2000. AIAA Paper 2000-0689.
[5] Martins, J. R. R. A., and P. Sturdza, The Connection Between the Complex-Step Derivative Approximation and Algorithmic Differentiation, Proceedings of the 39th Aerospace Sciences Meeting, Reno, NV, January 2001. AIAA Paper 2001-0921.
5 Analytic Sensitivity Analysis
Analytic methods are the most accurate and efficient methods available for
sensitivity analysis. They are, however, more involved than the other methods
we have seen so far since they require the knowledge of the governing equations
and the algorithm that is used to solve those equations. In this section we will
learn how to compute analytic sensitivities with direct and adjoint methods.
We will start with single discipline systems and then generalize for the case of
multiple systems such as we would encounter in MDO.
Consider the governing equations of the system, written in residual form as,

$$ R_{k'}(x_j, y_k) = 0, \qquad (19) $$

where $x_j$ are the independent variables (the design variables) and $y_k$ are the state variables, which depend on the independent variables through the solution of the governing equations. Note that the number of equations must equal the number of unknowns (the state variables).
If the governing equations are to remain satisfied, any perturbation in the variables of this system of equations must result in no variation of the residuals. Therefore, we can write,

$$ \delta R_{k'} = \frac{\partial R_{k'}}{\partial x_j}\delta x_j + \frac{\partial R_{k'}}{\partial y_k}\delta y_k = 0, \qquad (20) $$

since there is a variation due to the change in the design variables as well as a variation due to the change in the state vector. This equation applies to all $k' = 1, \ldots, n_R$ and $j = 1, \ldots, n_x$. Dividing the equation by $\delta x_j$, we can get it in another form which involves the total derivative $dy_k/dx_j$,

$$ \frac{dR_{k'}}{dx_j} = \frac{\partial R_{k'}}{\partial x_j} + \frac{\partial R_{k'}}{\partial y_k}\frac{dy_k}{dx_j} = 0. \qquad (21) $$
The function of interest, $f_i$, also depends on both $x_j$ and $y_k$, and hence the total variation of $f_i$ is,
$$ \delta f_i = \frac{\partial f_i}{\partial x_j}\delta x_j + \frac{\partial f_i}{\partial y_k}\delta y_k. \qquad (22) $$
Note that $\delta y_k$ cannot be written explicitly in terms of $\delta x_j$, since $y_k$ varies implicitly with respect to $x_j$ through the solution of the governing equations. We can also divide this equation by $\delta x_j$ to get the alternate form,
$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \frac{\partial f_i}{\partial y_k}\frac{dy_k}{dx_j}, \qquad (23) $$
where $j = 1, \ldots, n_x$ and $k = 1, \ldots, n_R$. The first term on the right-hand side represents the explicit variation of the function of interest with respect to the design variables, due to the presence of these variables in the expression for $f_i$. The second term represents the variation of the function due to the change of the state variables when the governing equations are solved.
5.1.3 Direct Sensitivity Equations
The direct approach first calculates the total variation of the state variables by solving the differentiated governing equations (21) for $dy_k/dx_j$, the total derivative of the state variables with respect to a given design variable. This means solving the linear system of equations,

$$ \frac{\partial R_{k'}}{\partial y_k}\frac{dy_k}{dx_j} = -\frac{\partial R_{k'}}{\partial x_j}. \qquad (24) $$
The solution procedure usually involves factorizing the square matrix $\partial R_{k'}/\partial y_k$ and then back-solving to obtain the solution. Note that we have to choose one $x_j$ each time we back-solve, since the right-hand-side vector is different for each $j$. We can then use the result for $dy_k/dx_j$ and substitute it into equation (23) to get $df_i/dx_j$ for all $i = 1, \ldots, n_f$.
5.1.4 Adjoint Sensitivity Equations
Since the variation of the residuals (20) is zero for any admissible variation, we can add a multiple of it to the variation of the function of interest (22) without changing the result, i.e.,

$$ \delta f_i = \frac{\partial f_i}{\partial x_j}\delta x_j + \frac{\partial f_i}{\partial y_k}\delta y_k + \psi_{k'}^T\left(\frac{\partial R_{k'}}{\partial x_j}\delta x_j + \frac{\partial R_{k'}}{\partial y_k}\delta y_k\right), \qquad (25) $$

where $\psi_{k'}$ is the adjoint vector. The values of the components of this vector are arbitrary, because we only consider variations for which the governing equations are satisfied, i.e., $\delta R_{k'} = 0$. If we collect the terms multiplying each of the variations we obtain,

$$ \delta f_i = \left(\frac{\partial f_i}{\partial x_j} + \psi_{k'}^T\frac{\partial R_{k'}}{\partial x_j}\right)\delta x_j + \left(\frac{\partial f_i}{\partial y_k} + \psi_{k'}^T\frac{\partial R_{k'}}{\partial y_k}\right)\delta y_k. \qquad (26) $$
Since $\psi_{k'}$ is arbitrary, we can choose its values to be those for which the term multiplying $\delta y_k$ is zero, i.e., we solve,

$$ \psi_{k'}^T\frac{\partial R_{k'}}{\partial y_k} = -\frac{\partial f_i}{\partial y_k} \quad\Leftrightarrow\quad \left(\frac{\partial R_{k'}}{\partial y_k}\right)^T\psi_{k'} = -\left(\frac{\partial f_i}{\partial y_k}\right)^T \qquad (27) $$
for the adjoint vector $\psi_{k'}$. An adjoint vector is the same for any $x_j$, but it is different for each $f_i$.
The term in equation (26) that multiplies $\delta x_j$ corresponds to the total derivative of $f_i$ with respect to $x_j$, i.e.,

$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \psi_{k'}^T\frac{\partial R_{k'}}{\partial x_j}. \qquad (28) $$
5.1.5 Direct vs. Adjoint
In the previous two sections, the direct and adjoint sensitivity equations were derived independently from the same two equations, (20) and (22). We will now unify the derivation of these two methods by expressing them in a single equation. This will help us gain a better understanding of how the two approaches are related.
If we want to solve for the total sensitivity of the state variables with respect to the design variables, we have to solve equation (24). Assuming that the sensitivity matrix of the residuals with respect to the state variables, $\partial R_{k'}/\partial y_k$, is available and invertible, the solution is,

$$ \frac{dy_k}{dx_j} = -\left(\frac{\partial R_{k'}}{\partial y_k}\right)^{-1}\frac{\partial R_{k'}}{\partial x_j}. \qquad (29) $$

Note that the matrix of partial derivatives of the residuals with respect to the state variables, $\partial R_{k'}/\partial y_k$, is square, since the number of governing equations must equal the number of state variables.
Substituting equation (29) into the expression for the total derivative of the function of interest (23), we get,

$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \frac{\partial f_i}{\partial y_k}\underbrace{\left[-\left(\frac{\partial R_{k'}}{\partial y_k}\right)^{-1}\frac{\partial R_{k'}}{\partial x_j}\right]}_{dy_k/dx_j}. \qquad (30) $$
Both the direct and adjoint methods can be seen in this equation. Using the direct method, we would start by solving for the term shown under-braced in equation (30), i.e., computing the solution of,

$$ \frac{\partial R_{k'}}{\partial y_k}\frac{dy_k}{dx_j} = -\frac{\partial R_{k'}}{\partial x_j}, \qquad (31) $$

which is the total sensitivity of the state variables. Note that each set of these total sensitivities is valid for only one design variable, $x_j$. Once we have these sensitivities, we can use the result in equation (30), i.e.,

$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \frac{\partial f_i}{\partial y_k}\frac{dy_k}{dx_j}. \qquad (32) $$

Using the adjoint method, we would instead group the first two factors of the under-braced term in equation (30) and define the adjoint vector as,

$$ \psi_{k'}^T = -\frac{\partial f_i}{\partial y_k}\left(\frac{\partial R_{k'}}{\partial y_k}\right)^{-1}, \qquad (33) $$

which is found by solving the adjoint equations,

$$ \left(\frac{\partial R_{k'}}{\partial y_k}\right)^T\psi_{k'} = -\left(\frac{\partial f_i}{\partial y_k}\right)^T. \qquad (34) $$

The total sensitivity is then given by,

$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \psi_{k'}^T\frac{\partial R_{k'}}{\partial x_j}. \qquad (35) $$
Step              Direct        Adjoint
Factorization     same          same
Back-solve        $n_x$ times   $n_f$ times
Multiplication    same          same
Unlike the direct method, where each $dy_k/dx_j$ can be used for any function $f_i$, we must compute a different adjoint vector $\psi_{k'}$ for each function of interest. A comparison of the cost of computing sensitivities with the direct versus the adjoint method is shown in the table above. With either method, we must factorize the same matrix, $\partial R_{k'}/\partial y_k$. The difference in cost comes from the back-solve step used to solve equations (31) and (34), respectively. The direct method
requires that we perform this step for each design variable (i.e. for each j) while
the adjoint method requires this to be done for each function of interest (i.e. for
each i). The multiplication step is simply the calculation of the final sensitivity
expressed in equations (32) and (35) respectively. The cost involved in this
step when computing the same set of sensitivities is the same for both methods.
The final conclusion is the established rule: if the number of design variables (inputs) is greater than the number of functions of interest (outputs), the adjoint method is more efficient than the direct method, and vice-versa. If the number of outputs is similar to the number of inputs, either method will be costly.
In this discussion, we have assumed that the governing equations have been
discretized. The same kind of procedure can be applied to continuous governing
equations. The principle is the same, but the notation would have to be more
general. The equations, in the end, have to be discretized in order to be solved
numerically. Figure 4 shows the two ways of arriving at the discrete sensitiv-
ity equations. We can either differentiate the continuous governing equations
first and then discretize them, or discretize the governing equations and differ-
entiate them in the second step. The resulting sensitivity equations should be
equivalent, but are not necessarily the same. Differentiating the continuous gov-
erning equations first is usually more involved. In addition, applying boundary
conditions to the differentiated equations can be non-intuitive as some of these
boundary conditions are non-physical.
Figure 4: The two ways of arriving at the discrete sensitivity equations: the continuous governing equations can be differentiated and then discretized (discrete sensitivity equations 1), or discretized and then differentiated (discrete sensitivity equations 2).
As an example, consider the governing equations of a finite-element structural model,

$$ K_{k'k} u_k = f_{k'}, \qquad (36) $$

where $K_{k'k}$ is the stiffness matrix, $u_k$ is the vector of displacements (the state) and $f_{k'}$ is the vector of applied forces (not to be confused with the function of interest from the previous section!).
We are interested in finding the sensitivities of the stresses, which are related to the displacements by the equation,

$$ \sigma_i = S_{ik} u_k. \qquad (37) $$
We will consider the design variables to be the cross-sectional areas of the ele-
ments, Aj . We will now look at the terms that we need to use the generalized
total sensitivity equation (30).
For the matrix of sensitivities of the governing equations with respect to the state variables, we find that it is simply the stiffness matrix, i.e.,

$$ \frac{\partial R_{k'}}{\partial y_k} = \frac{\partial (K_{k'k} u_k - f_{k'})}{\partial u_k} = K_{k'k}. \qquad (38) $$
Let us now consider the sensitivity of the residuals with respect to the design variables (the cross-sectional areas in our case). Neither the displacements nor the applied forces vary explicitly with the element sizes. The only term that depends on $A_j$ directly is the stiffness matrix, so we get,

$$ \frac{\partial R_{k'}}{\partial A_j} = \frac{\partial K_{k'k}}{\partial A_j} u_k. \qquad (39) $$
The partial derivative of the stress with respect to the displacements is simply given by the matrix in equation (37), i.e.,

$$ \frac{\partial f_i}{\partial y_k} = \frac{\partial \sigma_i}{\partial u_k} = S_{ik}. \qquad (40) $$
Finally, the explicit variation of the stresses with respect to the cross-sectional areas is zero, since the stresses depend only on the displacement field,

$$ \frac{\partial f_i}{\partial x_j} = \frac{\partial \sigma_i}{\partial A_j} = 0. \qquad (41) $$
Substituting these terms into the generalized total sensitivity equation (30), we get,

$$ \frac{d\sigma_i}{dA_j} = -\frac{\partial \sigma_i}{\partial u_k} K_{k'k}^{-1} \frac{\partial K_{k'k}}{\partial A_j} u_k. \qquad (42) $$
Referring to the theory presented previously, if we were to use the direct method, we would solve,

$$ K_{k'k}\frac{du_k}{dA_j} = -\frac{\partial K_{k'k}}{\partial A_j} u_k, \qquad (43) $$
and then substitute the result into,

$$ \frac{d\sigma_i}{dA_j} = \frac{\partial \sigma_i}{\partial A_j} + \frac{\partial \sigma_i}{\partial u_k}\frac{du_k}{dA_j}, \qquad (44) $$
to calculate the desired sensitivities.
The adjoint method could also be used, in which case we would solve the structural version of equation (34),

$$ K_{k'k}^T \psi_{k'} = -\left(\frac{\partial \sigma_i}{\partial u_k}\right)^T. \qquad (45) $$
Then we would substitute the adjoint vector into the equation,

$$ \frac{d\sigma_i}{dA_j} = \frac{\partial \sigma_i}{\partial A_j} + \psi_{k'}^T\frac{\partial K_{k'k}}{\partial A_j} u_k, \qquad (46) $$
to calculate the desired sensitivities.
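The recipes (43)-(46) are easy to check numerically. The Python sketch below (our own toy problem, not from the text) assumes a stiffness matrix that is linear in the areas, $K(A) = \sum_j A_j K_j$, so that $\partial K/\partial A_j = K_j$:

    import numpy as np

    rng = np.random.default_rng(0)
    ndof, nA, nstress = 4, 3, 2
    # Unit stiffness contributions K_j (symmetric positive semi-definite)
    K_elem = [(lambda M: M @ M.T)(rng.standard_normal((ndof, ndof))) for _ in range(nA)]
    A = np.array([1.0, 2.0, 0.5])                  # cross-sectional areas
    K = sum(a * Kj for a, Kj in zip(A, K_elem))    # K(A), SPD for positive areas
    f = rng.standard_normal(ndof)                  # applied loads
    S = rng.standard_normal((nstress, ndof))       # stress-displacement matrix
    u = np.linalg.solve(K, f)                      # state: K u = f

    # Direct method: one back-solve per design variable, Equations (43)-(44)
    dsig_direct = np.empty((nstress, nA))
    for j, Kj in enumerate(K_elem):
        du_dAj = np.linalg.solve(K, -(Kj @ u))
        dsig_direct[:, j] = S @ du_dAj

    # Adjoint method: one back-solve per function of interest, Equations (45)-(46)
    dsig_adjoint = np.empty((nstress, nA))
    for i in range(nstress):
        psi = np.linalg.solve(K.T, -S[i])
        for j, Kj in enumerate(K_elem):
            dsig_adjoint[i, j] = psi @ (Kj @ u)

    print(np.allclose(dsig_direct, dsig_adjoint))  # True: both give d(sigma)/dA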
Figure 5: Schematic of the coupled system: the design variables x, the aerodynamic residuals $R_A = 0$ with state w, and the structural residuals $R_S = 0$ with state u.
For a coupled aero-structural system such as this one, the direct sensitivity equations can be written in block form as,

$$ \begin{bmatrix} \dfrac{\partial R_A}{\partial w} & \dfrac{\partial R_A}{\partial u} \\[4pt] \dfrac{\partial R_S}{\partial w} & \dfrac{\partial R_S}{\partial u} \end{bmatrix} \begin{bmatrix} \dfrac{dw}{dx_j} \\[4pt] \dfrac{du}{dx_j} \end{bmatrix} = -\begin{bmatrix} \dfrac{\partial R_A}{\partial x_j} \\[4pt] \dfrac{\partial R_S}{\partial x_j} \end{bmatrix}. \qquad (47) $$

In addition to the diagonal terms of the matrix, which would appear when solving the single systems, we now have cross terms expressing the sensitivity of one system to the other's state variables. These equations are sometimes called the Global Sensitivity Equations (GSE) [4].
Similarly, we can write a coupled adjoint based on equation (34),

$$ \begin{bmatrix} \dfrac{\partial R_A}{\partial w} & \dfrac{\partial R_A}{\partial u} \\[4pt] \dfrac{\partial R_S}{\partial w} & \dfrac{\partial R_S}{\partial u} \end{bmatrix}^T \begin{bmatrix} \psi_A \\[4pt] \psi_S \end{bmatrix} = -\begin{bmatrix} \left(\dfrac{\partial f_i}{\partial w}\right)^T \\[4pt] \left(\dfrac{\partial f_i}{\partial u}\right)^T \end{bmatrix}. \qquad (48) $$
In addition to the GSE expressed in equation (47), Sobieski [4] also introduced an alternative method, which he called GSE2, for calculating total sensitivities. Instead of looking at the variation of the residuals, we look at the variation of the state variables, $y_k(x_j, y_{k'}(x_j))$, where $k' \neq k$,

$$ \delta y_k = \frac{\partial y_k}{\partial x_j}\delta x_j + \frac{\partial y_k}{\partial y_{k'}}\frac{dy_{k'}}{dx_j}\delta x_j. \qquad (49) $$
Dividing this by $\delta x_j$, we get,

$$ \frac{dy_k}{dx_j} = \frac{\partial y_k}{\partial x_j} + \frac{\partial y_k}{\partial y_{k'}}\frac{dy_{k'}}{dx_j}. \qquad (50) $$
For all k, this can be written as,

$$ \left(\delta_{kk'} - \frac{\partial y_k}{\partial y_{k'}}\right)\frac{dy_{k'}}{dx_j} = \frac{\partial y_k}{\partial x_j}. \qquad (51) $$
Writing this in matrix form for the two-discipline example, we get,

$$ \begin{bmatrix} I & -\dfrac{\partial w}{\partial u} \\[4pt] -\dfrac{\partial u}{\partial w} & I \end{bmatrix} \begin{bmatrix} \dfrac{dw}{dx_j} \\[4pt] \dfrac{du}{dx_j} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial w}{\partial x_j} \\[4pt] \dfrac{\partial u}{\partial x_j} \end{bmatrix}. \qquad (52) $$
One advantage of this formulation is that the size of the matrix to be factorized might be reduced. This is due to the fact that the state variables of one system do not always depend on all the state variables of the other system. For example, in the case of the coupled aero-structural system, only the surface aerodynamic pressures affect the structural analysis, so we could substitute the full flow state, w, by a much smaller vector of surface pressures. Similarly, we could use only the surface structural displacements rather than all of the displacements, since only these influence the aerodynamics.
An adjoint version of this alternative can also be derived, and the system to be solved is in this case,

$$ \begin{bmatrix} I & -\dfrac{\partial w}{\partial u} \\[4pt] -\dfrac{\partial u}{\partial w} & I \end{bmatrix}^T \begin{bmatrix} \psi_A \\[4pt] \psi_S \end{bmatrix} = \begin{bmatrix} \left(\dfrac{\partial f_i}{\partial w}\right)^T \\[4pt] \left(\dfrac{\partial f_i}{\partial u}\right)^T \end{bmatrix}. \qquad (53) $$
Since factorizing the full residual sensitivity matrix is in many cases impractical, the method can be slightly modified as follows. Equation (48) can be re-written as,

$$ \begin{bmatrix} \left(\dfrac{\partial R_A}{\partial w}\right)^T & \left(\dfrac{\partial R_S}{\partial w}\right)^T \\[4pt] \left(\dfrac{\partial R_A}{\partial u}\right)^T & \left(\dfrac{\partial R_S}{\partial u}\right)^T \end{bmatrix} \begin{bmatrix} \psi_A \\[4pt] \psi_S \end{bmatrix} = -\begin{bmatrix} \left(\dfrac{\partial f_i}{\partial w}\right)^T \\[4pt] \left(\dfrac{\partial f_i}{\partial u}\right)^T \end{bmatrix}. \qquad (54) $$
Since the factorization of the matrix in equation (54) would be extremely costly, we decided to set up an iterative procedure, much like the one used for our aero-structural solution, where the adjoint vectors are lagged and two different sets of equations are solved separately. For the calculation of the adjoint vector of one discipline, we use the adjoint vector of the other discipline from the previous iteration, i.e., we solve,

$$ \left(\frac{\partial R_A}{\partial w}\right)^T \psi_A = -\left(\frac{\partial f_i}{\partial w}\right)^T - \left(\frac{\partial R_S}{\partial w}\right)^T \psi_S \qquad (55) $$

$$ \left(\frac{\partial R_S}{\partial u}\right)^T \psi_S = -\left(\frac{\partial f_i}{\partial u}\right)^T - \left(\frac{\partial R_A}{\partial u}\right)^T \psi_A, \qquad (56) $$
whose final result, after convergence, is the same as that of equation (48). We will call this the lagged coupled adjoint method for computing sensitivities of coupled systems. Note that these equations look like the single-discipline adjoint equations for aerodynamics and structures, except that a forcing term, containing the lagged adjoint vector of the other discipline, is subtracted from the right-hand side.
Once the solutions for both adjoint vectors have converged, we can compute the final sensitivities of a given cost function by using,

$$ \frac{df_i}{dx_j} = \frac{\partial f_i}{\partial x_j} + \psi_A^T\frac{\partial R_A}{\partial x_j} + \psi_S^T\frac{\partial R_S}{\partial x_j}. \qquad (57) $$
References
[1] Adelman, H. M., R. T. Haftka, Sensitivity Analysis of Discrete Structural
Systems, AIAA Journal, Vol. 24, No. 5, May 1986.
[2] Barthelemy, B., R. T. Haftka and G. A. Cohen, Physically Based Sensi-
tivity Derivatives for Structural Analysis Programs, Computational Me-
chanics, pp. 465-476, Springer-Verlag, 1989.
[3] Belegundu, A. D., and J. S. Arora, Sensitivity Interpretation of Adjoint Variables in Optimal Design, Computer Methods in Applied Mechanics and Engineering, Vol. 48, pp. 81-90, 1985.
[4] Sobieszczanski-Sobieski, J., Sensitivity of Complex, Internally Coupled Systems, AIAA Journal, Vol. 28, No. 1, January 1990, pp. 153-160.