Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake
© Russ Tedrake, 2014
How to cite these notes

Note: These are working notes that will be updated throughout the Fall 2014 semester.

Chapter 12
Trajectory Optimization
I've argued that optimal control is a powerful framework for specifying complex behaviors
with simple objective functions, letting the dynamics and constraints on the system shape the
resulting feedback controller (and vice versa!). But the computational tools that we've
provided so far have been limited in some important ways. The numerical approaches to
dynamic programming which involve putting a mesh over the state space do not scale well to
systems with state dimension more than four or five. Linearization around a nominal
operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g.
using LQR) for even very high-dimensional systems, but the effectiveness of the resulting
controllers is limited to the region of state space where the linearization is a good
approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis
from the last chapter can provide, among other things, an effective way to compute estimates
of those regions. But we have not yet provided any real computational tools for approximate
optimal control that work for high-dimensional systems beyond the linearization around a
goal. That is precisely the goal for this chapter.
The big change that is going to allow us to scale to high-dimensional systems is that we are
going to give up the goal of solving for the optimal feedback controller for the entire state
space, and instead attempt to find an optimal control solution that is valid from only a single
initial condition. Instead of representing this as a feedback control function, we can represent
this solution as a trajectory, $\mathbf{x}(t), \mathbf{u}(t)$, typically defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics of the system on a mesh spread across the state space. This does not scale to high-dimensional systems, and it was difficult to bound the errors due to the discretization. If we instead restrict ourselves to optimizing from only a single initial condition, then a different discretization scheme emerges: we can discretize the state and input trajectories over time.

12.1 Problem Formulation

Given an initial condition, $\mathbf{x}_0$, and an input trajectory $\mathbf{u}(t)$ defined over a finite interval, $t \in [t_0, t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective,

$$J_{\mathbf{u}(\cdot)}(\mathbf{x}_0) = \int_{t_0}^{t_f} g(\mathbf{x}(t), \mathbf{u}(t))\, dt.$$

We will write the trajectory optimization problem as

$$\min_{\mathbf{u}(\cdot)} \int_{t_0}^{t_f} g(\mathbf{x}(t), \mathbf{u}(t))\, dt, \quad \text{subject to} \quad \dot{\mathbf{x}}(t) = f(\mathbf{x}(t), \mathbf{u}(t)), \quad \mathbf{x}(t_0) = \mathbf{x}_0.$$

Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($\mathbf{x}(t)$ cannot cause the robot to be inside an obstacle) or input limits (e.g. $\mathbf{u}_{min} \le \mathbf{u} \le \mathbf{u}_{max}$), which can be defined for all time or some subset of the trajectory.
As written, the optimization above is an optimization over continuous trajectories. In order to
formulate this as a numerical optimization, we must parameterize it with a finite set of
numbers. Perhaps not surprisingly, there are many different ways to write down this
parameterization, with a variety of different properties in terms of speed, robustness, and
accuracy of the results. We will outline just a few of the most popular below. I would
recommend [78] for additional details.

12.2 Computational Tools for Nonlinear Optimization

Before we dive in, we need to take a moment to understand the optimization tools that we will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function. With LQR we were able to reduce the problem to a matrix Riccati equation, for which we have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to formulate a very specific kind of optimization problem--a semi-definite program (or SDP)--which is a subset of convex optimization, and relied on custom solvers like SeDuMi or Mosek to solve the problems for us. Convex optimization is a hugely important subset of nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky; the optimizations that we formulate may have local minima, and the solution techniques will at best only guarantee that they give a locally optimal solution.
The generic formulation of a nonlinear optimization problem is

$$\min_{\mathbf{z}} f(\mathbf{z}), \quad \text{subject to} \quad \phi(\mathbf{z}) \le 0,$$

where $\mathbf{z}$ is a vector of decision variables, $f$ is a scalar objective function, and $\phi$ is a vector of constraints. Note that, although we write $\phi(\mathbf{z}) \le 0$, this formulation captures positivity constraints on the decision variables (simply multiply the constraint by $-1$) and equality constraints (simply list both $\phi(\mathbf{z}) \le 0$ and $-\phi(\mathbf{z}) \le 0$) as well.
The picture that you should have in your head is a nonlinear, potentially non-convex objective function defined over (multi-dimensional) $\mathbf{z}$, with a subset of possible $\mathbf{z}$ values satisfying the constraints.

Figure 12.1 - One-dimensional cartoon of a nonlinear optimization problem. The red dots represent local minima. The blue dot represents the optimal solution.

Note that minima can be the result of the objective function having zero derivative, or of a sloped objective running up against a constraint.
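To make this concrete, here is a minimal sketch of a small nonlinear program solved with fmincon from MATLAB's Optimization Toolbox (one of the solvers discussed below). The particular objective and constraint are made-up stand-ins for $f(\mathbf{z})$ and $\phi(\mathbf{z})$:

% min f(z) subject to phi(z) <= 0, for z in R^2.
f = @(z) (z(1)^2 - 1)^2 + z(2)^2;       % non-convex: local minima near z(1) = +/-1
phi = @(z) deal(0.5 - z(2), []);        % inequality z(2) >= 0.5 (no equalities)
z0 = [0.1; 2];                          % initial guess
z = fmincon(f, z0, [], [], [], [], [], [], phi)
% Try z0 = [-0.1; 2]: the solver converges to the other local minimum instead.

The dependence of the answer on z0 is exactly the local-minima issue described above.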
Numerical methods for solving these optimization problems require an initial guess, $\mathbf{z}_0$, and proceed by trying to move down the objective function to a minimum. Common approaches include gradient descent, in which the gradient of the objective function is computed or estimated, and second-order methods such as sequential quadratic programming (SQP), which attempt to make a local quadratic approximation of the objective function and local linear approximations of the constraints, and solve a quadratic program on each iteration to jump directly to the minimum of the local approximation.
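As a cartoon of the simplest approach, a bare-bones fixed-step gradient descent looks something like this (a sketch only, with handles fobj and dfobj of my own naming for the objective and its column-vector gradient; real solvers add line searches, constraint handling, and convergence tests):

z = z0;                         % initial guess
alpha = 0.01;                   % fixed step size
for k = 1:1000
  z = z - alpha * dfobj(z);     % step in the direction of steepest descent
end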
While not strictly required, these algorithms can often benefit a great deal from having the
gradients of the objective and constraints computed explicitly; the alternative is to obtain
them from numerical differentiation. Beyond pure speed considerations, I strongly prefer to
compute the gradients explicitly because it can help avoid numerical accuracy issues that can
creep in with finite difference methods. The desire to calculate these gradients will be a
major theme in the discussion below, and we have gone to great lengths to provide explicit
gradients of our provided functions and automatic differentiation of user-provided functions
in Drake.
When I started out, I was of the opinion that there is nothing difficult about implementing
gradient descent or even a second-order method, and I wrote all of the solvers myself. I now
realize that I was wrong. The commercial solvers available for nonlinear programming are
substantially higher performance than anything I wrote myself, with a number of tricks,
subtleties, and parameter choices that can make a huge difference in practice. Some of these solvers can exploit sparsity in the problem (e.g., if the constraints depend in a sparse way on the decision variables). Nowadays, we make heaviest use of SNOPT [79], which now comes bundled with the precompiled distributions of Drake, but we also support fmincon from the Optimization Toolbox in MATLAB. Note that while I do advocate using these tools, you do
not need to use them as a black box. In many cases you can improve the optimization
performance by understanding and selecting non-default configuration parameters.

12.3 Trajectory Optimization as a Nonlinear Program

12.3.1 Direct Transcription


Let us start by representing the finite-time trajectories, $\mathbf{x}(t)$ and $\mathbf{u}(t)$, by their values at a series of break points, $t_0, t_1, \dots, t_N$, and denoting the values at those points (aka the knot points) $\mathbf{x}_0, \dots, \mathbf{x}_N$ and $\mathbf{u}_0, \dots, \mathbf{u}_N$, respectively.

Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $dt$, and use Euler integration to write

$$\min_{\mathbf{x}_1,\dots,\mathbf{x}_N,\,\mathbf{u}_0,\dots,\mathbf{u}_{N-1}} \sum_{n=0}^{N-1} g(\mathbf{x}_n, \mathbf{u}_n)\, dt, \quad \text{subject to} \quad \mathbf{x}_{n+1} = \mathbf{x}_n + f(\mathbf{x}_n, \mathbf{u}_n)\, dt, \quad \forall n \in [0, N-1].$$

Note that the decision variables here are $\mathbf{x}_1, \dots, \mathbf{x}_N$ and $\mathbf{u}_0, \dots, \mathbf{u}_{N-1}$, because $\mathbf{x}_0$ is given, and $\mathbf{u}_N$ does not appear in the cost nor any of the constraints. It is easy to generalize this approach to add additional costs or constraints on $\mathbf{x}$ and/or $\mathbf{u}$. (Note that this formulation does not actually benefit from the additive cost structure, so more general cost formulations are also possible.) Computing the gradients of the objective and constraints is essentially as simple as computing the gradients of $f(\mathbf{x}, \mathbf{u})$ and $g(\mathbf{x}, \mathbf{u})$.
Example 12.1 (Direct Transcription for the Double Integrator)
We have implemented an optimization class hierarchy in Drake which makes it easy to try out these algorithms. Watching the way that they perform on our simple problems is a very nice way to gain intuition. Here is some simple code to solve the (time-discretized) minimum-time problem for the double integrator.
% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDirtran;
% make sure you take a look at the code!
edit('DoubleIntegrator.runDirtran')

Nothing compares to running it yourself, and poking around in the code. But you can also click here to watch the result. I hope that you recognize the parabolic trajectory from the initial condition up to the switching surface, and then the second parabolic trajectory down to the origin. You should also notice that the transition between $u = 1$ and $u = -1$ is imperfect, due to the discretization error. As an exercise, try increasing the number of knot points (the variable N in the code) to see if you can get a sharper response, like this.
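To demystify what the toolbox is doing, here is a from-scratch sketch of direct transcription for the same plant using fmincon (all names here are mine, and I've swapped the minimum-time objective for a quadratic effort cost over a fixed horizon to keep the transcription short):

function z = dirtranSketch()
% Direct transcription for the double integrator, xdot = [x(2); u],
% steering from x0 to the origin with a quadratic effort cost.
N = 20; dt = 0.1; x0 = [-2; 0];
% Decision variables z = [x_1; ...; x_N; u_0; ...; u_{N-1}].
cost = @(z) dt * sum(z(2*N+1:end).^2);         % sum_n u_n^2 * dt
zguess = [repmat(x0, N, 1); zeros(N, 1)];      % crude initial guess: sit still
z = fmincon(cost, zguess, [], [], [], [], [], [], @defects);

  function [c, ceq] = defects(z)
    X = [x0, reshape(z(1:2*N), 2, N)];         % columns are x_0 ... x_N
    U = z(2*N+1:end)';                         % u_0 ... u_{N-1}
    F = [X(2, 1:N); U];                        % f(x_n, u_n) at each knot
    D = X(:, 2:N+1) - X(:, 1:N) - dt * F;      % Euler defects (should be zero)
    ceq = [D(:); X(:, N+1)];                   % dynamics plus "end at the origin"
    c = [];                                    % no inequality constraints
  end
end

Every dynamics defect becomes one equality constraint handed to the solver, which is the essence of the transcription.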

If you take a moment to think about what the direct transcription algorithm is doing, you will see that by satisfying the dynamic constraints, the optimization is effectively solving the (Euler approximation of the) differential equation. But instead of marching forward through time, it is minimizing the inconsistency at each of the knot points simultaneously. It is easy enough to generalize the constraints to use higher-order integration schemes, paying attention to the trade-off between the number of times the constraint must be evaluated vs. the density of the knot points in time. But if the goal is to obtain smooth trajectory solutions, then another approach quickly emerges: the approach taken by the so-called collocation methods.
12.3.2 Direct Collocation
In direct collocation (c.f., [80]), both the input trajectory and the state trajectory are represented explicitly as piecewise polynomial functions. In particular, the sweet spot for this algorithm is taking $\mathbf{u}(t)$ to be a first-order polynomial -- allowing it to be completely defined by the values at the knot points -- and $\mathbf{x}(t)$ to be a cubic (Hermite) polynomial -- completely defined by the values and derivatives at the knot points. The state derivatives at the knot points are given as a function of $\mathbf{x}$ and $\mathbf{u}$ by the plant dynamics, so the entire spline is completely defined by the values of $\mathbf{x}$ and $\mathbf{u}$ at the knot points. To add the additional constraint that $\dot{\mathbf{x}}(t)$ is dynamically consistent with $\mathbf{x}(t)$ and $\mathbf{u}(t)$, we add an additional set of constraints that the derivative at a set of collocation points also matches the plant dynamics. For the special case of cubic polynomial state trajectories and piecewise linear input trajectories, the derivative at the midpoint of each segment is particularly easy to compute, making it the natural choice.

Figure 12.2 - Cubic spline parameters used in the direct collocation method.
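Concretely, with break spacing $h = t_{n+1} - t_n$ and knot derivatives $\dot{\mathbf{x}}_n = f(\mathbf{x}_n, \mathbf{u}_n)$, the cubic segment determines the midpoint values (my transcription of the standard formulas; see [80] for the authoritative version):

$$\mathbf{x}_c = \frac{1}{2}(\mathbf{x}_n + \mathbf{x}_{n+1}) + \frac{h}{8}(\dot{\mathbf{x}}_n - \dot{\mathbf{x}}_{n+1}), \qquad \dot{\mathbf{x}}_c = -\frac{3}{2h}(\mathbf{x}_n - \mathbf{x}_{n+1}) - \frac{1}{4}(\dot{\mathbf{x}}_n + \dot{\mathbf{x}}_{n+1}),$$

and the collocation constraint imposed at the midpoint is simply

$$\dot{\mathbf{x}}_c = f\left(\mathbf{x}_c, \frac{\mathbf{u}_n + \mathbf{u}_{n+1}}{2}\right).$$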
Once again, direct collocation effectively integrates the equations of motion by satisfying the
constraints of the optimization -- this time producing a third-order approximation of the
dynamics with effectively two evaluations of the plant dynamics per segment. [81] claims,
without proof, that as the break points are brought closer together, the trajectory will
converge to a true solution of the differential equation. Once again it is very natural to add
additional terms to the cost function or additional input/state constraints, and very easy to
calculate the gradients of the objective and constraints. I personally find it very nice to
explicitly account for the parametric encoding of the trajectory in the solution technique.
Example 12.2 (Direct Collocation for the Double Integrator)
Direct collocation also easily solves the minimum-time problem for the double integrator.
The performance is similar to direct transcription, but the convergence is visibly different.
Try it for yourself:
% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDircol;
% make sure you take a look at the code!
edit('DoubleIntegrator.runDircol')

12.3.3 Shooting Methods


In the methods described above, by asking the optimization package to perform the numerical integration of the equations of motion, we are effectively over-parameterizing the problem. In fact, the optimization is perfectly well defined if we restrict the decision variables to $\mathbf{u}_0, \dots, \mathbf{u}_{N-1}$ only -- we can compute $\mathbf{x}_1, \dots, \mathbf{x}_N$ ourselves by knowing $\mathbf{x}_0$ and $\mathbf{u}_0, \dots, \mathbf{u}_{N-1}$. This is exactly the approach taken in the shooting methods.

Computing gradients

Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the $\mathbf{x}$ decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define $\mathbf{x}_{n+1} = f_d(\mathbf{x}_n, \mathbf{u}_n)$ as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(\mathbf{x}_n, \mathbf{u}_n) = \mathbf{x}_n + f(\mathbf{x}_n, \mathbf{u}_n)\, dt$. Then we have

$$\frac{\partial J}{\partial \mathbf{u}_k} = \sum_{n=0}^{N-1} \left[ \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n} \frac{\partial \mathbf{x}_n}{\partial \mathbf{u}_k} + \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_k} \right],$$

where the gradient of the state with respect to the inputs can be computed during the "forward simulation",

$$\frac{\partial \mathbf{x}_{n+1}}{\partial \mathbf{u}_k} = \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n} \frac{\partial \mathbf{x}_n}{\partial \mathbf{u}_k} + \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_k}.$$

These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance, with the Euler integration scheme above, $\frac{\partial \mathbf{x}_n}{\partial \mathbf{u}_k} = 0$ if $k \ge n$.
The special case of optimization without state constraints

By solving for $\mathbf{x}_1, \dots, \mathbf{x}_N$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:

1. Simulate forward: $\mathbf{x}_{n+1} = f_d(\mathbf{x}_n, \mathbf{u}_n)$, from $\mathbf{x}_0$.
2. Calculate backwards: $\lambda_{n-1} = \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n}^T + \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n}^T \lambda_n$, from $\lambda_{N-1} = 0$.
3. Extract the gradients: $\frac{\partial J}{\partial \mathbf{u}_n} = \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_n} + \lambda_n^T \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_n}$, with $\frac{\partial J}{\partial \mathbf{x}_0} = \frac{\partial g(\mathbf{x}_0, \mathbf{u}_0)}{\partial \mathbf{x}_0} + \lambda_0^T \frac{\partial f_d(\mathbf{x}_0, \mathbf{u}_0)}{\partial \mathbf{x}_0}$.

Here $\lambda_n$ is a vector the same size as $\mathbf{x}$ which has an interpretation as $\lambda_n = \frac{\partial J}{\partial \mathbf{x}_{n+1}}^T$.

The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic efficiency improvement over calculating the huge number of simulation gradients described above. In case you are interested, the adjoint equation is known as the backpropagation algorithm in the neural networks literature, and it is one of the primary reasons that training neural networks became so popular in the 1980s; super fast GPU implementations of this algorithm are also one of the reasons that deep learning is incredibly popular right now (the availability of massive training databases is perhaps the other main reason).
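Here is a minimal sketch of those three steps in code (the handles and conventions are my own: fd is the discrete dynamics, dfdx and dfdu its Jacobians, and dgdx and dgdu the cost gradients, returned as row vectors):

function dJdu = adjointGradients(fd, dfdx, dfdu, dgdx, dgdu, x0, U)
% Gradient of J = sum_n g(x_n, u_n) with respect to every u_n,
% computed with one forward pass and one backward (adjoint) pass.
N = size(U, 2);
X = x0;
for n = 1:N                                % 1. simulate forward
  X(:, n+1) = fd(X(:, n), U(:, n));
end
lambda = zeros(size(x0));                  % lambda_{N-1} = 0 (no terminal cost)
dJdu = zeros(size(U));
for n = N:-1:1                             % 2. calculate backwards ...
  dJdu(:, n) = dgdu(X(:, n), U(:, n))' + dfdu(X(:, n), U(:, n))' * lambda;
  lambda = dgdx(X(:, n), U(:, n))' + dfdx(X(:, n), U(:, n))' * lambda;
end                                        % 3. ... extracting the gradients as we go
end

The cost is one simulation plus one backward sweep, regardless of the number of inputs -- compare that to propagating the full matrix of simulation gradients above.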
To take advantage of this efficiency, advocates of the shooting methods often use penalty methods instead of enforcing hard state constraints; instead of telling the solver about the constraint explicitly, you simply add an additional term to the cost function which gives a large penalty commensurate with the amount by which the constraint is violated. These are not quite as accurate and can be harder to tune (you'd like the cost to be high compared to other costs, but making it too high can lead to numerical conditioning issues), but they can work.
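As a concrete (if simplistic) instance, a state constraint $\phi(\mathbf{x}) \le 0$ might be folded into the running cost like this (g and phi are assumed handles; the quadratic penalty and the weight are my arbitrary choices, not a recommendation):

mu = 1e3;                                              % penalty weight -- tune with care
g_penalized = @(x, u) g(x, u) + mu * sum(max(0, phi(x)).^2);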
12.3.4 Discussion
Although the decision about which algorithm is best may depend on the situation, we have come to favor the direct collocation method (and occasionally direct transcription) for most of our work. There are a number of arguments for and against each approach; I will try to discuss a few of them here.
Solver performance

Numerical conditioning. Tail wagging the dog.


Sparse constraints. Potential for parallel evaluation of the constraints (computing the dynamics and their derivatives is often the most expensive part).
Providing an initial guess

to avoid local minima. Direct transcription and collocation can take an initial guess in $\mathbf{x}$, too.
Implicit dynamics

Another potential advantage of the direct transcription and collocation methods is that the dynamics constraints can be written in implicit form, e.g. $0 = f_{impl}(\dot{\mathbf{x}}, \mathbf{x}, \mathbf{u})$ instead of $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$.
Variations in the problem formulation

There are a number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.:

$$J = h(\mathbf{x}_N) + \sum_{n=0}^{N-1} g(\mathbf{x}_n, \mathbf{u}_n).$$

These terms are easily added to the cost function in any of the methods, and the adjoint equations of the shooting method simply require a modified terminal condition, $\lambda_{N-1} = \frac{\partial h(\mathbf{x}_N)}{\partial \mathbf{x}_N}^T$, when computing the gradients.


Another common modification is including the spacing of the break points as additional decision variables. This is particularly easy in the direct transcription and collocation methods, and can also be worked into the shooting methods. Typically we add a lower bound on the time steps so that they don't all vanish to zero.
Accuracy of numerical integration

One potential complaint about the direct transcription and collocation algorithms is that we tend to use simplistic numerical integration methods and often fix the integration timestep (e.g. by choosing Euler integration and selecting a fixed $dt$); it is difficult to bound the resulting integration errors in the solution. One tantalizing possibility in the shooting methods is that the forward integration could be accomplished by more sophisticated methods, like variable-step integration. But I must say that I have not had much success with this approach, because while the numerical accuracy of any one function evaluation might be improved, these integration schemes do not necessarily give smooth outputs as you make incremental changes to the initial conditions and control (changing $\mathbf{u}_0$ by $\epsilon$ could result in taking a different number of steps in the integration scheme). This lack of smoothness can wreak havoc on the convergence of the optimization. If numerical accuracy is at a premium, then I think you will have more success by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation with 5th order error checking method) as additional constraints on the time-steps; shooting methods do not have any particular advantage here.

12.4 Pontryagin's Minimum Principle

The tools that we've been developing for numerical trajectory optimization are closely tied to
theorems from (analytical) optimal control. Let us take one section to appreciate those
connections.
What precisely does it mean for a trajectory, $\mathbf{x}(t), \mathbf{u}(t)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $\mathbf{u}_0$ by $\epsilon$), then I would either incur
higher cost in my objective function or violate a constraint. For an unconstrained
optimization, a necessary condition for local optimality is that the gradient of the objective at
the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle
points, but it certainly must vanish at local minima. We can generalize this argument to
constrained optimization using Lagrange multipliers.
12.4.1 Constrained optimization with Lagrange multipliers
Given the equality-constrained optimization problem

$$\min_{\mathbf{z}} f(\mathbf{z}), \quad \text{subject to} \quad \phi(\mathbf{z}) = 0,$$

where $\phi$ is a vector, define a vector $\lambda$ of Lagrange multipliers, the same size as $\phi$, and the scalar Lagrangian function,

$$L(\mathbf{z}, \lambda) = f(\mathbf{z}) + \lambda^T \phi(\mathbf{z}).$$

A necessary condition for $\mathbf{z}^*$ to be an optimal value of the constrained optimization is that the gradients of $L$ vanish with respect to both $\mathbf{z}$ and $\lambda$:

$$\frac{\partial L}{\partial \mathbf{z}} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0.$$

Note that $\frac{\partial L}{\partial \lambda} = \phi(\mathbf{z})^T$, so requiring this to be zero is equivalent to requiring the constraints to be satisfied.

Example 12.3 (Optimization on the unit circle)

Consider the following optimization:

$$\min_{x,y} \; x + y, \quad \text{subject to} \quad x^2 + y^2 = 1.$$

The level sets of $x + y$ are straight lines with slope $-1$, and the constraint requires that the solution lives on the unit circle.

Simply by inspection, we can determine that the optimal solution should be $x = y = -\frac{1}{\sqrt{2}}$. Let's make sure we can obtain the same result using Lagrange multipliers. Formulating

$$L = x + y + \lambda (x^2 + y^2 - 1),$$

we can take the derivatives and solve

$$\frac{\partial L}{\partial x} = 1 + 2\lambda x = 0 \quad \Rightarrow \quad x = -\frac{1}{2\lambda},$$
$$\frac{\partial L}{\partial y} = 1 + 2\lambda y = 0 \quad \Rightarrow \quad y = -\frac{1}{2\lambda} = x,$$
$$\frac{\partial L}{\partial \lambda} = x^2 + y^2 - 1 = 2x^2 - 1 = 0 \quad \Rightarrow \quad x = y = \pm\frac{1}{\sqrt{2}}.$$

Given the two solutions which satisfy the necessary conditions, the negative solution is clearly the minimizer of the objective.

12.4.2 Lagrange multiplier derivation of the adjoint equations


Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time,

$$\min_{\mathbf{x}_1,\dots,\mathbf{x}_N,\, \mathbf{u}_0,\dots,\mathbf{u}_{N-1}} \sum_{n=0}^{N-1} g(\mathbf{x}_n, \mathbf{u}_n), \quad \text{subject to} \quad \mathbf{x}_{n+1} = f_d(\mathbf{x}_n, \mathbf{u}_n).$$

Formulate the Lagrangian,

$$L(\mathbf{x}_\cdot, \mathbf{u}_\cdot, \lambda_\cdot) = \sum_{n=0}^{N-1} g(\mathbf{x}_n, \mathbf{u}_n) + \sum_{n=0}^{N-1} \lambda_n^T \left( f_d(\mathbf{x}_n, \mathbf{u}_n) - \mathbf{x}_{n+1} \right),$$

and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above:

$$\frac{\partial L}{\partial \lambda_n} = 0 \quad \Rightarrow \quad \mathbf{x}_{n+1} = f_d(\mathbf{x}_n, \mathbf{u}_n),$$
$$\frac{\partial L}{\partial \mathbf{x}_n} = 0 \quad \Rightarrow \quad \lambda_{n-1} = \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n}^T + \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{x}_n}^T \lambda_n, \qquad \frac{\partial L}{\partial \mathbf{x}_N} = 0 \quad \Rightarrow \quad \lambda_{N-1} = 0,$$
$$\frac{\partial L}{\partial \mathbf{u}_n} = \frac{\partial g(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_n} + \lambda_n^T \frac{\partial f_d(\mathbf{x}_n, \mathbf{u}_n)}{\partial \mathbf{u}_n}.$$

Therefore, if we are given an initial condition $\mathbf{x}_0$ and an input trajectory $\mathbf{u}_0, \dots, \mathbf{u}_{N-1}$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $\mathbf{x}_n$, solving the adjoint equation backwards in time to solve for $\lambda_n$, and verifying that $\frac{\partial L}{\partial \mathbf{u}_n} = 0$ for all $n$. The fact that $\frac{\partial J}{\partial \mathbf{u}_n} = \frac{\partial L}{\partial \mathbf{u}_n}$ when $\frac{\partial L}{\partial \mathbf{x}_n} = 0$ and $\frac{\partial L}{\partial \lambda_n} = 0$ follows from some basic results in the calculus of variations.
12.4.3 Necessary conditions for optimality in continuous time
You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $\mathbf{x}_0$, a continuous dynamics, $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$, and the instantaneous cost $g(\mathbf{x}, \mathbf{u})$, for a trajectory $\mathbf{x}^*(t), \mathbf{u}^*(t)$ defined over $t \in [t_0, t_f]$ to be optimal, it must satisfy the conditions that

$$-\dot{\lambda} = \frac{\partial g}{\partial \mathbf{x}}^T + \frac{\partial f}{\partial \mathbf{x}}^T \lambda, \qquad \lambda(t_f) = 0,$$
$$\frac{\partial g}{\partial \mathbf{u}} + \lambda^T \frac{\partial f}{\partial \mathbf{u}} = 0.$$

In fact, the statement can be generalized even beyond this to the case where $\mathbf{u}$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.
Theorem 12.1 - Pontryagin's Minimum Principle

Adapted from [82]. Given the initial conditions, $\mathbf{x}_0$, a continuous dynamics, $\dot{\mathbf{x}} = f(\mathbf{x}, \mathbf{u})$, and the instantaneous cost $g(\mathbf{x}, \mathbf{u})$, for a trajectory $\mathbf{x}^*(t), \mathbf{u}^*(t)$ defined over $t \in [t_0, t_f]$ to be optimal, it must satisfy the conditions that

$$\dot{\mathbf{x}}^* = f(\mathbf{x}^*, \mathbf{u}^*), \qquad \mathbf{x}^*(t_0) = \mathbf{x}_0,$$
$$-\dot{\lambda} = \frac{\partial g(\mathbf{x}^*, \mathbf{u}^*)}{\partial \mathbf{x}}^T + \frac{\partial f(\mathbf{x}^*, \mathbf{u}^*)}{\partial \mathbf{x}}^T \lambda, \qquad \lambda(t_f) = 0,$$
$$\mathbf{u}^*(t) = \arg\min_{\mathbf{u} \in \mathcal{U}} \left[ g(\mathbf{x}^*, \mathbf{u}) + \lambda^T f(\mathbf{x}^*, \mathbf{u}) \right].$$

Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem,

$$H(\mathbf{x}, \mathbf{u}, \lambda, t) = g(\mathbf{x}, \mathbf{u}) + \lambda^T f(\mathbf{x}, \mathbf{u}).$$

It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\frac{\partial J}{\partial \mathbf{x}}^T$, you should also recognize it from the HJB.
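Rewritten in terms of $H$ (just a restatement of the conditions in the theorem, not a new result), the necessary conditions take the familiar canonical form

$$\dot{\mathbf{x}}^* = \frac{\partial H}{\partial \lambda}^T = f(\mathbf{x}^*, \mathbf{u}^*), \qquad -\dot{\lambda} = \frac{\partial H}{\partial \mathbf{x}}^T, \qquad \mathbf{u}^* = \arg\min_{\mathbf{u} \in \mathcal{U}} H(\mathbf{x}^*, \mathbf{u}, \lambda, t).$$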

12.5 Trajectory Optimization as a Convex Optimization

12.5.1 Linear systems with convex linear constraints


An important special case, and a convex one: for linear systems, the dynamics constraints are linear, so linear/quadratic objectives result in an LP/QP optimization.

12.5.2 Differential Flatness


12.5.3 Mixed-integer convex optimization for non-convex constraints

12.6 Local Trajectory Feedback Design

Once we have obtained a locally optimal trajectory from trajectory optimization, there is still
work to do...
12.6.1 Model-predictive control
12.6.2 Time-varying LQR
Take $\bar{\mathbf{x}}(t) = \mathbf{x}(t) - \mathbf{x}_0(t)$, and $\bar{\mathbf{u}}(t) = \mathbf{u}(t) - \mathbf{u}_0(t)$, then apply finite-horizon LQR (see the LQR chapter).

12.7 Iterative LQR and Differential Dynamic Programming

12.8 Case Study: A glider that can land on a perch like a bird
