
Direct Adaptive Control, Direct Optimization

F Pait and Rodrigo Romano


December 2015

Steve Morse Workshop in Osaka

How do we design a direct adaptive controller? First we need to decide on the underlying control design methodology. The adjustable control parameters, the shape of the error equation, the ever-present adaptive observer, all follow from this initial choice. This contrasts with indirect adaptive control systems. Indirect controllers typically comprise a parameterized observer which generates an identification error; a certainty-equivalence feedback regulator; and a tuner or adaptive law. These components can be designed in a modular fashion, more or less independently, provided each possesses some properties which are indeed satisfied by typical control and estimation algorithms. Not so with direct adaptive control!

Reference models are just one possibility in indirect adaptive control, and they are used sparingly, if at all, outside adaptive control. In this paper we explore the feasibility of another design technique: linear-quadratic optimal control. Design using a quadratic objective is perhaps the most transparent and best understood paradigm that can be applied to detectable, stabilizable linear dynamical systems in general.
We wish to control a plant with measured output y ∈ ℝ^{n_y} and control input u ∈ ℝ^{n_u} using the n_O-dimensional direct adaptive controller

    ẋ = A_O x + B_O u + D_O y    (1)
    u = K(t) x,    (2)

in which A_O, B_O, and D_O are fixed, and K is a tunable feedback parameter. The Morse observer is constructed, as fully spelled out in¹, together with the state and output estimation equations

    x_D = E_O(σ) x    (3)
    ŷ = C_O(σ) x + G_O(σ) y.    (4)

The state vector x plays the role of the so-called regressor used in parameter estimation and system identification².

If the plant is SISO (n_y = n_u = 1) and has dimension n, one could pick n_O = 2n, choose a controllable pair (A ∈ ℝ^{n×n}, b ∈ ℝ^{n×1}) with A stable, and build the observer with

    A_O = [ A  0 ]      B_O = [ 0 ]      D_O = [ b ]
          [ 0  A ],           [ b ],           [ 0 ],

C_O an n_O-dimensional row vector of parameters, and G_O = 0.
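A minimal numerical sketch of this margin construction, with an illustrative stable, controllable pair (A, b) of my own choosing; the assignment of b to the input and output blocks follows the reconstructed layout above, and either ordering gives a structurally identical observer:

```python
import numpy as np

# Hypothetical 2nd-order SISO example: a stable, controllable pair (A, b).
n = 2
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
b = np.array([[0.0],
              [1.0]])

# Observer matrices per the margin note: n_O = 2n, A_O block-diagonal.
AO = np.block([[A, np.zeros((n, n))],
               [np.zeros((n, n)), A]])
BO = np.vstack([np.zeros((n, 1)), b])   # input channel drives the lower block
DO = np.vstack([b, np.zeros((n, 1))])   # output channel drives the upper block

# A_O inherits stability from A: all eigenvalues in the open left half-plane.
assert AO.shape == (2 * n, 2 * n)
assert np.all(np.linalg.eigvals(AO).real < 0)
```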
¹ A. S. Morse and F. M. Pait. MIMO design models and internal regulators for cyclicly switched parameter adaptive control systems. IEEE Transactions on Automatic Control, 39(9):1809–1818, 1994.

² R. A. Romano and F. Pait. Matchable-observable linear models and direct filter tuning: An approach to multivariable identification. Submitted to the IEEE Transactions on Automatic Control, 2015.

To construct Steve's observer (1) we pick a design model

    ẋ_D = (A_D + D(σ)(I − G(σ))⁻¹ C_D) x_D + B_D(σ) u_D    (5)
    y_D = (I − G(σ))⁻¹ C_D x_D,

where σ are model parameters, I is an identity matrix, and (C_D, A_D) is a stable, observable, parameter-independent pair. The parameter matrix G(σ) ∈ ℝ^{n_y×n_y} is strictly lower triangular, and the parameter matrices D(σ) and B_D(σ) take values in ℝ^{n_x×n_y} and ℝ^{n_x×n_u}.

The overwhelming majority of the direct adaptive control literature uses reference models as the design paradigm. This is because the control error between a plant's output and that of a suitably defined reference model can be expressed in a convenient form in which the control parameters appear linearly, provided, of course, that a number of restrictive hypotheses are satisfied. Another class of direct controllers are non-identifier-based universal controllers.


Matchability. Let C_P(ℓ_O) be the class of stabilizable, detectable linear systems with coefficient matrix triple (C_P, A_P, B_P) of dimensions n_y × n_P, n_P × n_P, and n_P × n_u respectively, and whose list of observability indices is ℓ_O = {n_1, n_2, …, n_{n_y}}. Then there exists a value of the parameter σ such that the transfer function

    H_σ(s) = (I − G(σ))⁻¹ C_D (sI − A_D − D(σ)(I − G(σ))⁻¹ C_D)⁻¹ B_D(σ)

of (5) matches the transfer function C_P(sI − A_P)⁻¹ B_P.

The design model (5) is linked to (1) by the equations:

    E_O(σ) A_O = A_D E_O(σ)
    E_O(σ) B_O = B_D(σ)
    E_O(σ) D_O = D_D(σ)
    C_O(σ) = C_D E_O(σ)
    G_O(σ) = G(σ).

Existence of E_O(σ) is fundamental for the argument ahead, but its construction will not be needed.

An indirect adaptive control scheme might be designed as follows:
- Use the expression ŷ_σ = C_O(σ)x + G_O(σ)y to define an identification error ŷ_σ − y for each value of σ;

The number n_1 + n_2 + ⋯ + n_{n_y} is the McMillan degree of the transfer functions of the systems in the class.
If the value of σ is such that (5) is stabilizable, there exists a state feedback matrix K_D(σ) that stabilizes the design model. It can be shown that the observer state feedback u = K_D(σ)x_D renders the signals y and u bounded, and stability of A_O guarantees that x is bounded as well. Therefore there exists a matrix K(σ) = K_D(σ)E_O(σ) such that A(σ) = A_O + B_O K(σ) is Hurwitz. Relabeling B_O = B, we now work with the system

    ẋ = A(σ)x + Bu.    (6)

In this construction B is fixed, but A depends on the parameters σ, which in adaptive control are unknown.

Theorem 1. The pair (A(σ), B) as constructed above is stabilizable.
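For any concrete pair, the property asserted by Theorem 1 can be checked numerically with the standard PBH rank test; the matrices below are stand-ins of my own choosing, not from the paper:

```python
import numpy as np

# PBH test: (A, B) is stabilizable iff [lam*I - A, B] has full row rank
# for every eigenvalue lam of A with nonnegative real part.
def is_stabilizable(A, B, tol=1e-9):
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if lam.real >= -tol:  # only modes not in the open left half-plane matter
            M = np.hstack([lam * np.eye(n) - A, B])
            if np.linalg.matrix_rank(M, tol) < n:
                return False
    return True

A = np.array([[1.0, 0.0], [0.0, -1.0]])   # one unstable mode
B = np.array([[1.0], [0.0]])              # actuates the unstable mode
assert is_stabilizable(A, B)

B_bad = np.array([[0.0], [1.0]])          # cannot reach the unstable mode
assert not is_stabilizable(A, B_bad)
```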
To obtain a quadratic error equation, choose constant, symmetric, positive-definite matrices Q and R so that one can compute the quantity

    z² = yᵀQy + uᵀRu    (7)

- Taking advantage of linearity of the error in σ, obtain the estimate σ̂ at each instant t using a recursive gradient-type or least-squares method;
- At each t plug the estimate σ̂ into the model, design a feedback control K_D(σ̂), and set the observer-based state feedback control signal u(t) = K_D(σ̂)E_O(σ̂)x according to the certainty-equivalence principle.
We are not at the moment interested in pursuing parameter estimation or indirect adaptive control, so we shall directly employ a controller given by (2), without the intermediate steps of obtaining parameter estimates and computing E_O(σ) or K_D(σ). The reason is that the parameter space may contain values of σ for which (5) loses stabilizability, and the certainty-equivalence feedback control laws exhibit singularities. Moreover, the topologies of the spaces where the parameters σ and the control parameters contained in K live are incompatible; one is neither stronger nor weaker than the other, and this leads to well-known complications which we wish to avoid via the use of direct adaptive control.

using only process input and output data. We write the quantity on the left-hand side as a magnitude squared to emphasize that it takes positive values only. The algebraic Riccati equation

    AᵀP + PA − PBR⁻¹BᵀP + C_Oᵀ(I − G)⁻ᵀ Q (I − G)⁻¹ C_O = 0,

which also appears³ when trying to avoid loss of stabilizability in indirect control, has a unique positive-definite solution P due to stabilizability of (A, B), so along the trajectories of (6)

³ D. Colon and F. M. Pait. Geometry of adaptive control: optimization and geodesics. International Journal of Adaptive Control and Signal Processing, 18(4):381–392, 2004.

    d/dt (xᵀPx) = xᵀAᵀPx + xᵀPAx + uᵀBᵀPx + xᵀPBu
        = −xᵀC_Oᵀ(I − G)⁻ᵀ Q (I − G)⁻¹ C_O x + xᵀPBR⁻¹BᵀPx + xᵀKᵀBᵀPx + xᵀPBKx
        = xᵀ(K + R⁻¹BᵀP)ᵀ R (K + R⁻¹BᵀP) x − yᵀQy − uᵀRu.
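As a numerical sketch of this step (with illustrative matrices, and SciPy assumed available; C below stands in for the fixed output map (I − G)⁻¹C_O at one parameter value), the Riccati equation can be solved and the stabilizing gain K* = −R⁻¹BᵀP extracted:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Toy stand-in for (6): a stabilizable pair (A, B) and an output map C.
# Concrete numbers are illustrative, not from the paper.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.eye(1)   # output weight
R = np.eye(1)   # control weight

# Solve A'P + PA - P B R^{-1} B' P + C'QC = 0 for the unique P > 0.
P = solve_continuous_are(A, B, C.T @ Q @ C, R)

# The (in the adaptive setting, unknown) optimal gain K* = -R^{-1} B' P.
Kstar = -np.linalg.solve(R, B.T @ P)

# A + B K* is Hurwitz, and P is symmetric positive definite.
assert np.all(np.linalg.eigvals(A + B @ Kstar).real < 0)
assert np.all(np.linalg.eigvals((P + P.T) / 2) > 0)
```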


Integrating on the interval [t_i − T, t_i] gives the identity

    ∫_{t_i−T}^{t_i} z² dt = ∫_{t_i−T}^{t_i} xᵀ(K − K*)ᵀR(K − K*)x dt + x(t_i−T)ᵀPx(t_i−T) − x(t_i)ᵀPx(t_i),    (8)

where the stabilizing feedback gain matrix K* = −R⁻¹BᵀP is unknown, as is the matrix P. We now assume that there exists a known matrix P̄ such that P̄ − P is positive definite, and write

    z_K(t_i) = x(t_i)ᵀP̄x(t_i) + ∫_{t_i−T}^{t_i} z² dt.

The term in K looks almost like the error in a traditional parameter estimation problem. It would be amenable to treatment by a least-squares algorithm, except that in the present case the error is known in magnitude only⁴. The need arises for tuners with capabilities comparable to those of traditional least-squares or gradient-type algorithms, which can function with information on the magnitude of the error only.
Direct tuning for direct control. We consider that the feedback control K is linearly parametrized by a vector of parameters θ. Our task then is to tune θ in a manner that keeps

    f(θ, t_i) = z_{K(θ)}(t_i) / (e + x(t_i−T)ᵀx(t_i−T))

small.

We are ignoring all the noise and initial condition terms in the formulas as written here.
1. Choose a sequence of instants t_i. We shall fix t_0 = 0 and use t_{i+1} = t_i + T; however, different durations of the intervals t_{i+1} − t_i might be algorithmically advantageous.
2. During each interval [t_i, t_{i+1}) apply a constant feedback K(θ) to obtain the control cost f(θ, t_i). The controlled process plays the role of a 0th-order oracle: when queried, it supplies the value of f, with no hint on how to use it.

⁴ F. M. Pait. On the design of direct adaptive controllers. In Proceedings of the 40th IEEE Conference on Decision and Control, pages 734–738, 2001.

When an explicit formula for a function F(x) is available, a garden-variety finite difference approximation for its derivative uses the 1st term of the Taylor series:

    ∂F/∂x ≈ (F(x+h) − F(x)) / h.

A better-behaved approximation which doesn't need the difference term uses a complex step:

    ∂F/∂x ≈ (Im[F(x+ih)] − Im[F(x)]) / Im[ih] = Im[F(x+ih)] / h.

We shall not have the opportunity to use this approximation here because there is no experimental manner to compute f(θ, t_i) for complex arguments, but I find the formula neat and wrote it down for inspiration.
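The two formulas in this margin note are easy to compare numerically; a minimal sketch, with F(x) = eˣ sin x as my own illustrative choice:

```python
import cmath
import math

# F extended to complex arguments so the complex-step formula applies.
def F(x):
    return cmath.exp(x) * cmath.sin(x)

x0 = 1.0
exact = math.exp(x0) * (math.sin(x0) + math.cos(x0))  # d/dx e^x sin x

# Garden-variety forward difference: h cannot be taken too small,
# because the subtraction cancels significant digits.
h_fd = 1e-8
fd = (F(x0 + h_fd).real - F(x0).real) / h_fd

# Complex step: no subtraction, so h can be made absurdly small.
h_cs = 1e-200
cs = F(complex(x0, h_cs)).imag / h_cs

assert abs(cs - exact) < 1e-12   # machine-precision accurate
assert abs(fd - exact) < 1e-5    # finite difference is noticeably worse
```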

3. At each instant t_i use the barycenter formula to compute θ̄(t_i).
4. Pick θ(t_i) as a random variable with mean θ̄(t_i) and, during the interval [t_i, t_{i+1}), use the feedback K(θ(t_i)).
The barycenter method can be used to tune the parameters θ:

    m_i = m_{i−1} + e^{−λ f(θ_i, t_i)}    (10)
    θ̄_i = (1/m_i) (m_{i−1} θ̄_{i−1} + e^{−λ f(θ_i, t_i)} θ_i)    (11)
    θ_i = θ̄_{i−1} + ζ_i    (12)

Here m_0 = 0, θ̄_0 = 0, θ_i is the sequence of test values of the controller parameters, and λ is a positive real constant. The rationale behind the method is that points where f is large receive low weight in comparison with those for which f is small.
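A minimal sketch of the loop (10)–(12) on a stand-in cost; the quadratic f, the constant λ = 1, and the Gaussian curiosity with σ = 0.5 are illustrative choices of mine, not values from the paper:

```python
import math
import random

random.seed(0)

# Stand-in cost whose minimum, at theta = 2, the tuner should approach.
def f(theta):
    return (theta - 2.0) ** 2

lambda_ = 1.0          # weight constant
sigma = 0.5            # curiosity spread
m, bary = 0.0, 0.0     # m_0 = 0, theta_bar_0 = 0

for i in range(1500):
    zeta = random.gauss(0.0, sigma)      # curiosity
    theta = bary + zeta                  # test point, (12)
    w = math.exp(-lambda_ * f(theta))    # weight of this test point
    # recursive barycenter update, (10)-(11)
    m, bary = m + w, (m * bary + w * theta) / (m + w)

# Points with small cost dominate the average, so bary drifts toward 2.
assert abs(bary - 2.0) < 0.5
```

Test points far from the minimum receive weight e^{−λf} close to zero, so they barely move the barycenter; this is the rationale stated above, made concrete.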

The update rules (10) for m_i and (11) for θ̄_i constitute a recursive version of the expression

    θ̄_i = ( Σ_{j=1}^{i} θ_j e^{−λ f(θ_j, t_j)} ) / ( Σ_{j=1}^{i} e^{−λ f(θ_j, t_j)} )    (9)

for the barycenter, or center of mass, of a distribution of weights e^{−λ f(θ_i, t_i)} placed at points θ_i, which is the point θ ∈ ℝⁿ that minimizes the sum of weighted distances

    Σ_{j=1}^{i} (θ − θ_j)² e^{−λ f(θ_j, t_j)}.
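Since the recursion and the batch expression should agree exactly, a quick check with made-up test points and costs of my own choosing:

```python
import math

lambda_ = 0.7
thetas = [0.3, 1.1, -0.4, 2.2, 0.9]      # arbitrary test points
costs  = [2.0, 0.5,  3.1, 0.1, 0.8]      # arbitrary values of f(theta_i, t_i)

# Recursive form, (10)-(11).
m, bary = 0.0, 0.0
for th, c in zip(thetas, costs):
    w = math.exp(-lambda_ * c)
    m, bary = m + w, (m * bary + w * th) / (m + w)

# Batch form, (9): weighted center of mass of all test points.
weights = [math.exp(-lambda_ * c) for c in costs]
batch = sum(w * th for w, th in zip(weights, thetas)) / sum(weights)

assert abs(bary - batch) < 1e-12
```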


In system identification, this method has been employed successfully for filter tuning⁵. It will prove useful to consider the sequence of test points θ_i as defined by the sum of the barycenter θ̄_{i−1} of the previous points and a curiosity ζ_i, as spelled out in (12). Then (11) reads

    θ̄_i − θ̄_{i−1} = ( e^{−λ f(θ_i, t_i)} / (m_{i−1} + e^{−λ f(θ_i, t_i)}) ) (θ_i − θ̄_{i−1}) = F_i(ζ_i) ζ_i.

⁵ R. A. Romano and F. Pait. Direct filter tuning and optimization in multivariable identification. In Proceedings of the 53rd IEEE Conference on Decision and Control, pages 1798–1803, Los Angeles, USA, Dec 2014.

Assume that f(θ) is continuously differentiable with respect to the argument θ, and write

    F̃(ζ) = m_{i−1} e^{λ f(θ̄_{i−1}+ζ, t_i)} / (m_{i−1} e^{λ f(θ̄_{i−1}+ζ, t_i)} + 1)²,

so that ∂F/∂ζ = −λ F̃ ∂f/∂ζ (here and in the computations that follow the subscript indicating dependence on the interval i is omitted if there is no ambiguity).

A randomized version of the barycenter algorithm has the required gradient-like properties in the case where ζ_i has a Gaussian (normal) distribution. A continuous-time version of the barycenter algorithm was analyzed in⁶.

Claim 2. The expected value of Δθ̄_i = θ̄_i − θ̄_{i−1} with respect to the random variable ζ is proportional to the average value of the gradient of f(θ̄_{i−1} + ζ_i, t_i) in the support of the distribution of ζ_i.

Consider the probability density function

    p(ζ) = (1/√((2π)ⁿ det Σ)) e^{−½ ζᵀΣ⁻¹ζ}.

Then ∂p/∂ζ^μ = −Σ⁻¹_{μν} ζ^ν p, so ζ^ν p = −Σ^{νμ} ∂p/∂ζ^μ. Einstein's convention of implicit summation over components with equal upper and lower indices is in force, with upper Greek indices for the components of ζ and of θ̄, and lower indices for the components of Σ's inverse. With X the n-dimensional set where the curiosity ζ takes its values, for each component ζ^ν of the vector ζ we have

    E[F(ζ)ζ^ν] = ∫_X F(ζ) ζ^ν p(ζ) dζ = −Σ^{νμ} ∫_X F(ζ) (∂p/∂ζ^μ) dζ.

Using the integration-by-parts formula,

    ∫_X F(ζ) (∂p/∂ζ^μ) dζ + ∫_X (∂F/∂ζ^μ) p(ζ) dζ = ∮_{∂X} F(ζ)p(ζ) dς,

where dς is an (n−1)-form which can be integrated at the boundary ∂X of the n-dimensional set X. The right-hand side is zero because F is bounded and p(ζ) vanishes at the border of X, which is at infinity, hence

    E[F(ζ)ζ^ν] = Σ^{νμ} ∫_X (∂F/∂ζ^μ) p(ζ) dζ = Σ^{νμ} E[∂F/∂ζ^μ],

and therefore

    E[Δθ̄_i] = Σ E[∇_ζ F_i(ζ)] = −λ Σ E[F̃_i(ζ) ∇f(θ̄_{i−1} + ζ, t_i)].    (13)

If we consider θ̄_i to be our best guess, on the basis of the information provided by the test point θ_i, of where the minimum of f(θ) might be found, then in the absence of any extra knowledge it makes sense to pick the curiosity ζ_i as a random variable with some judiciously chosen probability distribution.

The mean of the curiosity can be employed to incorporate extra knowledge in several manners. For example, if at each step we take it equal to the previous increment Δθ̄_{i−1}, then the gradient term is responsible for the rate of change, or acceleration, of the tuning process. Here the author cannot resist citing his seldom-read paper discussing tuners that set the 2nd derivative, or difference, of the adjusted parameters, rather than the 1st derivative, as is more often done in the literature, as well as an application to filtering theory: Felipe M. Pait. A tuner that accelerates parameters. Systems & Control Letters, 35(1):65–68, August 1998; and M. Gerken, F. M. Pait, and P. E. Jojoa. An adaptive filtering algorithm with parameter acceleration. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 17–20, 2000.

⁶ Felipe Pait. Reading Wiener in Rio. In IEEE Conference on Norbert Wiener in the 21st Century, pages 1–4. IEEE, 2014.

Formula (13) is the main result concerning the barycenter method. It shows that, roughly speaking, the search performed by the barycenter algorithm follows the direction of the negative average gradient of the function to be minimized, the weighted average being taken over the domain where the search is performed.
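The Gaussian identity E[F(ζ)ζ] = Σ E[∇F(ζ)] that drives the derivation of (13) can be sanity-checked numerically in one dimension (scalar Σ = σ²); the bounded F and the value of σ below are my own illustrative choices:

```python
import math

sigma = 0.7

def F(z):
    return math.tanh(z)          # smooth, bounded, as the argument requires

def dF(z):
    return 1.0 / math.cosh(z) ** 2

def gauss_expect(g, n=4001, span=10.0):
    # Trapezoid-rule expectation against the N(0, sigma^2) density;
    # the integrand decays fast, so truncating at +-span*sigma is harmless.
    h = 2 * span * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        z = -span * sigma + k * h
        p = math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        wt = 0.5 if k in (0, n - 1) else 1.0
        total += wt * g(z) * p * h
    return total

lhs = gauss_expect(lambda z: F(z) * z)       # E[F(zeta) zeta]
rhs = sigma ** 2 * gauss_expect(dF)          # sigma^2 E[dF/dzeta]

assert abs(lhs - rhs) < 1e-9
```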
