
Alexander Bogdanov ∗

Department of Computer Science & Electrical Engineering,

OGI School of Science & Engineering, OHSU

Technical Report CSE-04-006

December 2004

In this report a number of algorithms for optimal control of a double inverted pendulum on a cart (DIPC) are investigated and compared. Modeling is based on Euler-Lagrange equations derived by specifying a Lagrangian, the difference between the kinetic and potential energy of the DIPC system. This results in a system of three nonlinear second-order differential equations, which is then transformed into the usual form of six first-order ordinary differential equations (ODEs) for control design purposes. Control of a DIPC poses a certain challenge, since unlike a robot, the system is underactuated: one controlling force per three degrees of freedom (DOF). In this report, the problem of optimal control minimizing a quadratic cost functional is addressed. Several approaches are tested: linear quadratic regulator (LQR), state-dependent Riccati equation (SDRE), optimal neural network (NN) control, and combinations of the NN with the LQR and the SDRE. Simulations reveal superior performance of the SDRE over the LQR and improvements provided by the NN, which compensates for model inadequacies in the LQR. The limited capability of the NN to approximate functions over a wide range of arguments prevents it from significantly improving the SDRE performance, providing only marginal benefits at larger pendulum deflections.

∗ Senior Research Associate, alexb@cse.ogi.edu

1 Nomenclature

$m$  mass
$l_i$  distance from a pivot joint to the i-th pendulum link center of mass
$L_i$  length of the i-th pendulum link
$\theta_0$  wheeled cart position
$\theta_1, \theta_2$  pendulum angles
$I_i$  moment of inertia of the i-th pendulum link w.r.t. its center of mass
$g$  gravity constant
$u$  control force
$T$  kinetic energy
$P$  potential energy
$L$  Lagrangian
$w$  neural network (NN) weights
$N_h$  number of neurons in the hidden layer
$\Delta t$  digital control sampling time

Subscripts 0, 1, 2 used with the aforementioned parameters refer to the cart, the first (bottom) pendulum and the second (top) pendulum, respectively.

2 Introduction

As a nonlinear underactuated plant, the double inverted pendulum on a cart (DIPC) poses a challenging control problem. It has been one of the attractive tools for testing linear and nonlinear control laws [6, 17, 1]. Numerous papers use the DIPC as a testbed; the cited ones are merely examples. Nearly all works on pendulum control concentrate on two problems: pendulum swing-up control design and stabilization of the inverted pendulums. In this report, optimal nonlinear stabilization control design is addressed: stabilize the DIPC minimizing an accumulative cost functional quadratic in states and controls. For linear systems, this leads to linear feedback control, which is found by solving a Riccati equation [5], and is thus referred to as the linear quadratic regulator (LQR). The DIPC, however, is a highly nonlinear system, and its linearization is far from adequate for control design purposes. Nonetheless, the LQR will be considered as a baseline controller, and other control designs will be tested and compared against it. To solve the nonlinear optimal control problem, we will employ several different approaches: a nonlinear extension to the LQR called SDRE; direct nonlinear optimization using a neural network (NN); and combinations of the direct NN optimization with the LQR and the SDRE.

For a baseline nonlinear control, we will utilize a technique that has shown considerable promise and involves manipulating the system dynamic equations into a pseudo-linear state-dependent coefficient (SDC) form, in which system matrices are given explicitly as a function of the current state. Treating the system matrices as constant, the approximate solution of the nonlinear state-dependent Riccati equation is obtained for the reformulated pseudo-linear dynamical system in discrete time steps. The solution is then used to calculate a feedback control law that is optimized around the system state estimated at each time step. This technique, referred to as State-Dependent Riccati Equation (SDRE) control, is thus an extension to the LQR as it solves the LQR problem at each time step.

As a next step, a direct optimization approach will be investigated. For this purpose, a NN will be used in the feedback loop, and a standard calculus of variations approach will be employed to adjust the NN parameters (weights) to optimize the cost functional over a wide
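range of initial DIPC states. As a universal function approximator, the NN is thus trained to implement a nonlinear optimal controller.

As the last step, two combinations of feedback NN control with LQR and SDRE will be designed. Approximation capabilities of a NN are limited by its size and the fact that optimization in the space of its weights is non-convex. Thus, only local minima are usually found, and the solution is at most suboptimal. The problem is more severe with wider ranges of NN inputs and outputs. To address this, a combination of the NN with a conventional feedback suboptimal control is designed to simplify the NN input-output mapping and therefore reduce training complexity. If the conventional feedback control were optimal, the optimal NN output would trivially be zero. Simple logic says that a suboptimal conventional control will simplify the NN mapping to some extent: instead of generating optimal controls, the NN is trained to produce corrections to the controls generated by the conventional suboptimal controller. For example, the LQR provides near-optimal control in the vicinity of the equilibrium, since nonlinear and linearized DIPC dynamics are close in the equilibrium. Thus the NN will only have to correct the LQR output when the linearization accuracy diminishes. The next sections will discuss the above concepts in detail and illustrate them with simulations.

3 Modeling

The DIPC system is graphically depicted in Fig. 1. To derive its equations of motion, one of the possible ways is to use Lagrange equations:

$$
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta}\right) - \frac{\partial L}{\partial \theta} = Q \tag{1}
$$

where $L = T - P$ is a Lagrangian, $Q$ is a vector of generalized forces (or moments) acting in the direction of generalized coordinates $\theta$ and not accounted for in formulation of kinetic energy $T$ and potential energy $P$.

Figure 1: Double inverted pendulum on a cart

Kinetic and potential energies of the system are given by the sum of energies of its individual components (a wheeled cart and two pendulums):

$$
T = T_0 + T_1 + T_2, \qquad P = P_0 + P_1 + P_2
$$

where

$$
\begin{aligned}
T_0 &= \tfrac{1}{2} m_0 \dot\theta_0^2 \\
T_1 &= \tfrac{1}{2} m_1 \left[ \left(\dot\theta_0 + l_1 \dot\theta_1 \cos\theta_1\right)^2 + \left(l_1 \dot\theta_1 \sin\theta_1\right)^2 \right] + \tfrac{1}{2} I_1 \dot\theta_1^2 \\
    &= \tfrac{1}{2} m_1 \dot\theta_0^2 + \tfrac{1}{2}\left(m_1 l_1^2 + I_1\right)\dot\theta_1^2 + m_1 l_1 \dot\theta_0 \dot\theta_1 \cos\theta_1 \\
T_2 &= \tfrac{1}{2} m_2 \left[ \left(\dot\theta_0 + L_1 \dot\theta_1 \cos\theta_1 + l_2 \dot\theta_2 \cos\theta_2\right)^2 + \left(L_1 \dot\theta_1 \sin\theta_1 + l_2 \dot\theta_2 \sin\theta_2\right)^2 \right] + \tfrac{1}{2} I_2 \dot\theta_2^2 \\
    &= \tfrac{1}{2} m_2 \dot\theta_0^2 + \tfrac{1}{2} m_2 L_1^2 \dot\theta_1^2 + \tfrac{1}{2}\left(m_2 l_2^2 + I_2\right)\dot\theta_2^2 \\
    &\quad + m_2 L_1 \dot\theta_0 \dot\theta_1 \cos\theta_1 + m_2 l_2 \dot\theta_0 \dot\theta_2 \cos\theta_2 + m_2 L_1 l_2 \dot\theta_1 \dot\theta_2 \cos(\theta_1 - \theta_2) \\
P_0 &= 0 \\
P_1 &= m_1 g l_1 \cos\theta_1 \\
P_2 &= m_2 g \left(L_1 \cos\theta_1 + l_2 \cos\theta_2\right)
\end{aligned}
$$

Thus the Lagrangian of the system is given by

$$
\begin{aligned}
L &= \tfrac{1}{2}\left(m_0 + m_1 + m_2\right)\dot\theta_0^2 + \tfrac{1}{2}\left(m_1 l_1^2 + m_2 L_1^2 + I_1\right)\dot\theta_1^2 + \tfrac{1}{2}\left(m_2 l_2^2 + I_2\right)\dot\theta_2^2 \\
  &\quad + \left(m_1 l_1 + m_2 L_1\right)\cos(\theta_1)\,\dot\theta_0 \dot\theta_1 + m_2 l_2 \cos(\theta_2)\,\dot\theta_0 \dot\theta_2 + m_2 L_1 l_2 \cos(\theta_1 - \theta_2)\,\dot\theta_1 \dot\theta_2 \\
  &\quad - \left(m_1 l_1 + m_2 L_1\right) g \cos\theta_1 - m_2 l_2 g \cos\theta_2
\end{aligned}
$$

Differentiating the Lagrangian by $\dot\theta$ and $\theta$ yields Lagrange equation (1) as

$$
\begin{aligned}
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta_0}\right) - \frac{\partial L}{\partial \theta_0} &= u \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta_1}\right) - \frac{\partial L}{\partial \theta_1} &= 0 \\
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot\theta_2}\right) - \frac{\partial L}{\partial \theta_2} &= 0
\end{aligned}
$$

Or explicitly,

$$
\begin{aligned}
u &= \Big(\sum_i m_i\Big)\ddot\theta_0 + \left(m_1 l_1 + m_2 L_1\right)\cos(\theta_1)\,\ddot\theta_1 + m_2 l_2 \cos(\theta_2)\,\ddot\theta_2 \\
  &\quad - \left(m_1 l_1 + m_2 L_1\right)\sin(\theta_1)\,\dot\theta_1^2 - m_2 l_2 \sin(\theta_2)\,\dot\theta_2^2 \\
0 &= \left(m_1 l_1 + m_2 L_1\right)\cos(\theta_1)\,\ddot\theta_0 + \left(m_1 l_1^2 + m_2 L_1^2 + I_1\right)\ddot\theta_1 + m_2 L_1 l_2 \cos(\theta_1 - \theta_2)\,\ddot\theta_2 \\
  &\quad + m_2 L_1 l_2 \sin(\theta_1 - \theta_2)\,\dot\theta_2^2 - \left(m_1 l_1 + m_2 L_1\right) g \sin\theta_1 \\
0 &= m_2 l_2 \cos(\theta_2)\,\ddot\theta_0 + m_2 L_1 l_2 \cos(\theta_1 - \theta_2)\,\ddot\theta_1 + \left(m_2 l_2^2 + I_2\right)\ddot\theta_2 \\
  &\quad - m_2 L_1 l_2 \sin(\theta_1 - \theta_2)\,\dot\theta_1^2 - m_2 l_2 g \sin\theta_2
\end{aligned}
$$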
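As a rough numerical companion to the three explicit equations of motion, the following Python sketch solves them for the accelerations at a given state. The parameter values, the uniform-rod assumption ($l_i = L_i/2$, $I_i = m_i L_i^2/12$) and the names `solve3` and `accelerations` are assumptions made for illustration only; they are not the report's simulation settings.

```python
import math

# Illustrative parameters (assumed, not the report's simulation values).
M0, M1, M2 = 1.5, 0.5, 0.75      # cart and link masses, kg
LEN1, LEN2 = 0.5, 0.75           # link lengths L1, L2, m
GRAV = 9.81                      # gravity constant, m/s^2

# Assumed uniform rods: l_i = L_i / 2, I_i = m_i L_i^2 / 12.
HALF1, HALF2 = LEN1 / 2, LEN2 / 2
I1, I2 = M1 * LEN1**2 / 12, M2 * LEN2**2 / 12


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 3):
            f = m[r][c] / m[c][c]
            for k in range(c, 4):
                m[r][k] -= f * m[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def accelerations(theta, theta_dot, u):
    """Return [theta0'', theta1'', theta2''] from the three explicit equations."""
    _, t1, t2 = theta
    _, w1, w2 = theta_dot
    a = M1 * HALF1 + M2 * LEN1           # m1 l1 + m2 L1
    b = M2 * HALF2                       # m2 l2
    c = M2 * LEN1 * HALF2                # m2 L1 l2
    # Coefficients that multiply the accelerations in each equation.
    mass = [
        [M0 + M1 + M2, a * math.cos(t1), b * math.cos(t2)],
        [a * math.cos(t1), M1 * HALF1**2 + M2 * LEN1**2 + I1, c * math.cos(t1 - t2)],
        [b * math.cos(t2), c * math.cos(t1 - t2), M2 * HALF2**2 + I2],
    ]
    # All remaining terms moved to the right-hand side.
    rhs = [
        u + a * math.sin(t1) * w1**2 + b * math.sin(t2) * w2**2,
        -c * math.sin(t1 - t2) * w2**2 + a * GRAV * math.sin(t1),
        c * math.sin(t1 - t2) * w1**2 + b * GRAV * math.sin(t2),
    ]
    return solve3(mass, rhs)
```

With zero control, a small positive deflection of the first link, e.g. `accelerations((0, 0.05, 0), (0, 0, 0), 0.0)`, gives a positive $\ddot\theta_1$: the deflection grows, as expected for the unstable upright equilibrium.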


Lagrange equations for the DIPC system can be written in a more compact matrix form:

$$
D(\theta)\ddot\theta + C(\theta, \dot\theta)\dot\theta + G(\theta) = Hu \tag{2}
$$

where

$$
D(\theta) = \begin{pmatrix}
d_1 & d_2 \cos\theta_1 & d_3 \cos\theta_2 \\
d_2 \cos\theta_1 & d_4 & d_5 \cos(\theta_1 - \theta_2) \\
d_3 \cos\theta_2 & d_5 \cos(\theta_1 - \theta_2) & d_6
\end{pmatrix} \tag{3}
$$

$$
C(\theta, \dot\theta) = \begin{pmatrix}
0 & -d_2 \sin(\theta_1)\dot\theta_1 & -d_3 \sin(\theta_2)\dot\theta_2 \\
0 & 0 & d_5 \sin(\theta_1 - \theta_2)\dot\theta_2 \\
0 & -d_5 \sin(\theta_1 - \theta_2)\dot\theta_1 & 0
\end{pmatrix} \tag{4}
$$

$$
G(\theta) = \begin{pmatrix} 0 \\ -f_1 \sin\theta_1 \\ -f_2 \sin\theta_2 \end{pmatrix} \tag{5}
$$

$$
H = (1 \;\; 0 \;\; 0)^T
$$

Assuming that centers of mass of the pendulums are in the geometrical center of the links, which are solid rods, we have: $l_i = L_i/2$, $I_i = m_i L_i^2/12$. Then for the elements of matrices $D(\theta)$, $C(\theta, \dot\theta)$, and $G(\theta)$ we get:

$$
\begin{aligned}
d_1 &= m_0 + m_1 + m_2 \\
d_2 &= m_1 l_1 + m_2 L_1 = \left(\tfrac{1}{2} m_1 + m_2\right) L_1 \\
d_3 &= m_2 l_2 = \tfrac{1}{2} m_2 L_2 \\
d_4 &= m_1 l_1^2 + m_2 L_1^2 + I_1 = \left(\tfrac{1}{3} m_1 + m_2\right) L_1^2 \\
d_5 &= m_2 L_1 l_2 = \tfrac{1}{2} m_2 L_1 L_2 \\
d_6 &= m_2 l_2^2 + I_2 = \tfrac{1}{3} m_2 L_2^2 \\
f_1 &= (m_1 l_1 + m_2 L_1) g = \left(\tfrac{1}{2} m_1 + m_2\right) L_1 g \\
f_2 &= m_2 l_2 g = \tfrac{1}{2} m_2 L_2 g
\end{aligned}
$$

Note that matrix $D(\theta)$ is symmetric and nonsingular.

4 Control

To design a control law, Lagrange equations of motion (2) are reformulated into a 6-th order system of ordinary differential equations. To do this, a state vector $x \in \mathbb{R}^6$ is introduced:

$$
x = (\theta \;\; \dot\theta)^T
$$

Then dropping dependencies of the system matrices on the generalized coordinates and their derivatives, the system dynamic equations appear as:

$$
\dot x = \begin{pmatrix} 0 & I \\ 0 & -D^{-1}C \end{pmatrix} x + \begin{pmatrix} 0 \\ -D^{-1}G \end{pmatrix} + \begin{pmatrix} 0 \\ D^{-1}H \end{pmatrix} u \tag{6}
$$

In this report, optimal nonlinear stabilization control design is addressed: stabilize the DIPC minimizing an accumulative cost functional quadratic in states and controls. The general problem of designing an optimal control law involves minimizing a cost function

$$
J_t = \sum_{k=t}^{t_{final}} L_k(x_k, u_k), \tag{7}
$$

which represents an accumulated cost of the sequence of states $x_k$ and controls $u_k$ from the current discrete time $t$ to the final time $t_{final}$. For regulation problems $t_{final} = \infty$. Optimization is done with respect to the control sequence subject to constraints of the system dynamics (6). In our case,

$$
L_k(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k \tag{8}
$$

For linear systems, this leads to linear state-feedback control, LQR, designed in the next subsection. For nonlinear systems the optimal control problem generally requires a numerical solution, which can be computationally prohibitive. An analytical approximation to the nonlinear optimal control solution is utilized in the subsection on SDRE control, which represents a nonlinear extension to the LQR and yields superior results. Neural network (NN) capabilities for function approximation are employed to approximate the nonlinear control solution in the subsection on NN control. Combinations of the NN with LQR and SDRE are investigated in the subsection following the NN control.

4.1 Linear Quadratic Regulator

The linear quadratic regulator yields an optimal solution to the control problem (7)–(8) when system dynamics are linear. Since the DIPC is nonlinear, as described by (6), it can be linearized to derive an approximate linear solution to the optimal control problem. Linearization of (6) around $x = 0$ yields:

$$
\dot x = Ax + Bu \tag{9}
$$

where

$$
A = \begin{pmatrix} 0 & I \\ -D(0)^{-1}\dfrac{\partial G(0)}{\partial \theta} & 0 \end{pmatrix} \tag{10}
$$

$$
B = \begin{pmatrix} 0 \\ D(0)^{-1}H \end{pmatrix} \tag{11}
$$

and the continuous LQR control is then given by

$$
u = -R^{-1}B^T P_c x \equiv -K_c x \tag{12}
$$

where $P_c$ is a steady-state solution of the differential Riccati equation. To implement computerized digital control, dynamic equations (9) are approximately discretized as $\Phi \approx e^{A\Delta t}$, $\Gamma \approx B\Delta t$, and digital LQR control is then given by

$$
u_k = -R^{-1}\Gamma^T P x_k \equiv -K x_k \tag{13}
$$

where $P$ is the steady state solution of the difference Riccati equation, obtained by solving the discrete-time algebraic Riccati equation

$$
\Phi^T \left[ P - P\Gamma \left(R + \Gamma^T P \Gamma\right)^{-1} \Gamma^T P \right] \Phi - P + Q = 0 \tag{14}
$$

where $Q \in \mathbb{R}^{6\times 6}$ and $R \in \mathbb{R}$ are positive definite state and control cost matrices. Since linearization (9)–(11) accurately represents the DIPC system (6) in the equilibrium, the LQR control (12) or (13) will be a locally near-optimal stabilizing control.


4.2 State-Dependent Riccati Equation Control

An approximate nonlinear analytical solution of the optimal control problem (7)–(8) subject to (6) is given by a technique referred to as the state-dependent Riccati equation (SDRE) control. The SDRE approach [3] involves manipulating the dynamic equations

$$
\dot x = f(x, u)
$$

into a pseudo-linear state-dependent coefficient (SDC) form in which system matrices are explicit functions of the current state:

$$
\dot x = A(x)x + B(x)u \tag{15}
$$

A standard LQR problem (Riccati equation) can then be solved at each time step to design the state feedback control law on-line. For digital implementation, (15) is approximately discretized at each time step into

$$
x_{k+1} = \Phi(x_k)x_k + \Gamma(x_k)u_k \tag{16}
$$

The SDRE regulator is then specified similarly to the discrete LQR (compare with (13)) as

$$
u_k = -R^{-1}\Gamma^T(x_k)P(x_k)x_k \equiv -K(x_k)x_k \tag{17}
$$

where $P(x_k)$ is the steady state solution of the difference Riccati equation, obtained by solving the discrete-time algebraic Riccati equation (14) using state-dependent matrices $\Phi(x_k)$ and $\Gamma(x_k)$, which are treated as being constant at each time step. Thus the approach is sometimes considered as a nonlinear extension to the LQR.

Proposition 1. Dynamic equations of a double inverted pendulum on a cart are presentable in SDC form (15).

Proof. From the derived dynamic equations for the DIPC system (6) it is clear that the required SDC form (15) can be obtained if vector $G(\theta)$ is presentable in the SDC form: $G(\theta) = G_{sd}(\theta)\theta$. Let us construct $G_{sd}(\theta)$ as

$$
G_{sd}(\theta) = \begin{pmatrix}
0 & 0 & 0 \\
0 & -f_1 \dfrac{\sin\theta_1}{\theta_1} & 0 \\
0 & 0 & -f_2 \dfrac{\sin\theta_2}{\theta_2}
\end{pmatrix}
$$

Elements of constructed $G_{sd}(\theta)$ are bounded everywhere and $G(\theta) = G_{sd}(\theta)\theta$ as required. Thus the system dynamic equations can be presented in the SDC form as

$$
\dot x = \begin{pmatrix} 0 & I \\ -D^{-1}G_{sd} & -D^{-1}C \end{pmatrix} x + \begin{pmatrix} 0 \\ D^{-1}H \end{pmatrix} u \tag{18}
$$

Derived system equations (18) (compare with the linearized system (9)–(11)) are discretized at each time step into (16), and control is then computed as given by (17).

Remark. In the neighborhood of equilibrium $x = 0$, system equations in the SDC form (18) turn into linearized equations (9) used in the LQR design. This can be checked either by direct computation or by noting that on one hand, linearization yields

$$
\dot x = Ax + Bu + O(x)^2
$$

and on the other hand, the SDC form of the system dynamics is

$$
\dot x = A(x)x + B(x)u
$$

Since $O(x)^2 \to 0$ when $x \to 0$, then $A(x) \to A$ and $B(x) \to B$. Therefore, it is natural to expect that performance of the SDRE regulator will be very close to the LQR in the vicinity of the equilibrium, and the difference will show at larger pendulum deflections. This is exactly illustrated in the Simulation Results section.

4.3 Neural Network Learning Control

Neural network (NN) control is popular in control of nonlinear systems due to the universal function approximation capabilities of NNs. Neural networks with only one hidden layer and an arbitrarily large number of neurons represent nonlinear mappings, which can be used for approximation of any nonlinear function $f \in C(\mathbb{R}^n, \mathbb{R}^m)$ over a compact subset of $\mathbb{R}^n$ [7, 4, 12]. In this section, we utilize the function approximation properties of NNs to approximate a solution of the nonlinear optimal control problem (7)–(8) subject to system dynamics (6), and thus design an optimal NN regulator. The problem can be solved by directly implementing a feedback controller (see Figure 2) as:

$$
u_k = NN(x_k, w),
$$

Figure 2: Neural network control diagram

Next, an optimal set of weights $w$ is computed to solve the optimization problem. To do this, a standard calculus of variations approach is taken: minimization of a functional subject to equality constraints $x_{k+1} = f(x_k, u_k)$. Let $\lambda_k$ be a vector of Lagrange multipliers in the augmented cost function

$$
H = \sum_{k=t}^{t_{final}} \left\{ L(x_k, u_k) + \lambda_k^T \left(x_{k+1} - f(x_k, u_k)\right) \right\} \tag{19}
$$

We can now derive the recurrent Euler-Lagrange equations by solving $\frac{\partial H}{\partial x_k} = 0$ w.r.t. the Lagrange multipliers and then find the optimal set of NN weights $w^\star$ by solving the optimality condition $\frac{\partial H}{\partial w} = 0$ (numerically by means of gradient descent).

Dropping dependence of $L(x_k, u_k)$ and $f(x_k, u_k)$ on $x_k$ and $u_k$, and $NN(x_k, w)$ on $x_k$ and $w$ for brevity, the
Euler-Lagrange equations are derived as

$$
\lambda_k = \left(\frac{\partial f}{\partial x_k} + \frac{\partial f}{\partial u_k}\frac{\partial NN}{\partial x_k}\right)^T \lambda_{k+1} + \left(\frac{\partial L}{\partial x_k} + \frac{\partial L}{\partial u_k}\frac{\partial NN}{\partial x_k}\right)^T \tag{20}
$$

with $\lambda_{t_{final}}$ initialized as zero vector. For $L(x_k, u_k)$ given by (8), $\frac{\partial L}{\partial x_k} = 2x_k^T Q$, and $\frac{\partial L}{\partial u_k} = 2u_k^T R$. These equations correspond to an adjoint system shown graphically in Figure 3, with optimality condition

$$
\frac{\partial H}{\partial w} = \sum_{k=t}^{t_{final}} \left(\lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k}\right)\frac{\partial NN}{\partial w} = 0. \tag{21}
$$

Figure 3: Adjoint system diagram

The overall training procedure for the NN can now be summarized as follows:

1. Simulate the system forward in time for $t_{final}$ time steps (Figure 2). Although, as we mentioned, $t_{final} = \infty$ for regulation problems, in practice it is set to a sufficiently large number (in our case $t_{final} = 500$).

2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (20) (see Figure 3). System Jacobians are evaluated analytically or by perturbation.

3. Update the weights using gradient descent¹, $\Delta w = -\gamma \left(\frac{\partial H}{\partial w}\right)^T$.

4. Repeat until convergence or until an acceptable level of cost reduction is achieved.

¹ In practice we use an adaptive learning rate for each weight in the network using a procedure similar to delta-bar-delta [8].

Due to the nature of the training process (propagation of the Lagrange multipliers backwards through time), this training algorithm is referred to as Back Propagation Through Time (BPTT) [12, 16].

As seen from Euler-Lagrange equations (20), back propagation of the Lagrange multipliers includes computation of the system and NN Jacobians. The DIPC Jacobians can be computed numerically by perturbation or derived analytically from the dynamic equations (6). Let us show the analytically derived Jacobians $\frac{\partial f_c(x,u)}{\partial x}$ and $\frac{\partial f_c(x,u)}{\partial u}$, where $f_c(x, u)$ is a brief notation for the right-hand side of the continuous system (6), i.e. $\dot x = f_c(x, u)$.

$$
\frac{\partial f_c(x, u)}{\partial x} = \begin{pmatrix} 0 & I \\ -D^{-1}M & -2D^{-1}C \end{pmatrix} \tag{22}
$$

$$
\frac{\partial f_c(x, u)}{\partial u} = \begin{pmatrix} 0 \\ D^{-1}H \end{pmatrix} \tag{23}
$$

where matrix $M(\theta, \dot\theta) = (M_0 \;\; M_1 \;\; M_2)$, and each of its columns (note that $\frac{\partial D^{-1}}{\partial \theta_i} = -D^{-1}\frac{\partial D}{\partial \theta_i}D^{-1}$) is given by

$$
M_i = \frac{\partial C}{\partial \theta_i}\dot\theta + \frac{\partial G}{\partial \theta_i} - \frac{\partial D}{\partial \theta_i}D^{-1}\left(C\dot\theta + G - Hu\right) \tag{24}
$$

Remark 1. Clearly, Jacobians (22)–(24) transform into linear system matrices (10)–(11) in equilibrium $\theta = 0$, $\dot\theta = 0$.

Remark 2. From (3)–(5) it follows that $M_0 \equiv 0$.

Remark 3. Jacobians (22)–(24) are derived from the continuous system equations (6). BPTT requires computation of the discrete system Jacobians. Thus, to use the derived matrices in NN training, they should be discretized (e.g. as it was done for the LQR and SDRE).

Computation of the NN Jacobian is easy to perform given that the nonlinear functions of individual neural elements are $y = \tanh(z)$. In this case, a NN with $N_0$ inputs, a single output $u$ and a single hidden layer with $N_h$ elements is described by

$$
u = \sum_{i=1}^{N_h} w_{2_i} \tanh\left(\sum_{j=1}^{N_0} w_{1_{i,j}} x_j + w^b_{1_i}\right) + w_2^b
$$

Or in a more compact form,

$$
u = w_2 \tanh\left(W_1 x + w_1^b\right) + w_2^b
$$

where $\tanh(z) = (\tanh(z_1), \ldots, \tanh(z_{N_h}))^T$, $w_2 \in \mathbb{R}^{N_h}$ is a row-vector of weights in the NN output, $W_1 \in \mathbb{R}^{N_h \times N_0}$ is a matrix of weights of the hidden layer elements, $w_1^b \in \mathbb{R}^{N_h}$ is a vector of bias weights in the hidden layer and $w_2^b$ is a bias weight in the NN output. Graphically the NN is depicted in Figure 4.

Remark 4. In case of a NN with $N_{out}$ outputs (MIMO control problems), $w_2$ becomes a matrix $W_2 \in \mathbb{R}^{N_{out} \times N_h}$, and $w_2^b$ becomes a vector $w_2^b \in \mathbb{R}^{N_{out}}$.

Noting that $\frac{\partial \tanh(z)}{\partial z} = 1 - \tanh^2(z)$, the NN Jacobian w.r.t. the NN input (state vector $x$) is given by

$$
\frac{\partial NN}{\partial x} = \left(w_2^T \odot \mathrm{dtanh}\left(W_1 x + w_1^b\right)\right)^T W_1 \tag{25}
$$

where operator $\odot$ is an element-by-element multiplication of two vectors, i.e. $x \odot y = (x_1 y_1, \ldots, x_n y_n)^T$; and vector field $\mathrm{dtanh}(z) = \left(1 - \tanh^2(z_1), \ldots, 1 - \tanh^2(z_{N_h})\right)^T$. Again, in the MIMO case when the NN has $N_{out}$ outputs, individual rows of $W_2$ take the place of row-vector $w_2$ in (25).
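The single-output network and its input Jacobian (25) can be sketched in a few lines of Python. The function names and the list-of-lists weight layout are assumptions made for illustration; only the formulas come from the text.

```python
import math


def nn_forward(w1, w1b, w2, w2b, x):
    """Return u = w2 tanh(W1 x + w1b) + w2b and the hidden-layer outputs."""
    hidden = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + bi)
              for row, bi in zip(w1, w1b)]
    return sum(w * h for w, h in zip(w2, hidden)) + w2b, hidden


def nn_input_jacobian(w1, w1b, w2, x):
    """Row vector dNN/dx from (25): (w2^T .* dtanh(W1 x + w1b))^T W1."""
    _, hidden = nn_forward(w1, w1b, w2, 0.0, x)
    # Element-by-element product of w2 with dtanh = 1 - tanh^2.
    scaled = [w * (1.0 - h * h) for w, h in zip(w2, hidden)]
    return [sum(s * row[j] for s, row in zip(scaled, w1)) for j in range(len(x))]
```

Here `w1` holds the $N_h$ rows of $W_1$, so `nn_input_jacobian` returns one entry per network input. A quick way to gain confidence in such a routine is to compare it against central finite differences of `nn_forward`.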


Figure 4: Neural network structure

The last essential part of the NN training algorithm is the NN Jacobian w.r.t. the network weights:

$$
\begin{aligned}
\frac{\partial NN}{\partial W_1^i} &= \left(w_2^T \odot \mathrm{dtanh}\left(W_1 x + w_1^b\right)\right)^T x_i \\
\frac{\partial NN}{\partial w_2} &= \tanh\left(W_1 x + w_1^b\right) \\
\frac{\partial NN}{\partial w_1^b} &= \left(w_2^T \odot \mathrm{dtanh}\left(W_1 x + w_1^b\right)\right)^T \\
\frac{\partial NN}{\partial w_2^b} &= 1
\end{aligned} \tag{26}
$$

where $W_1^i$ is the i-th column of $W_1$.

Now we have all the quantities to compute weight updates and control. Explicitly, from (21) and (26) the weight update recurrent law is given by

$$
\begin{aligned}
W_{1_{n+1}} &= W_{1_n} - \gamma \sum_{k=t}^{t_{final}} \left\{ \left[ W_{2_n}^T \left(\lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k}\right)^T \right] \odot \mathrm{dtanh}\left(W_{1_n} x_k + w^b_{1_n}\right) \right\} x_k^T \\
W_{2_{n+1}} &= W_{2_n} - \gamma \sum_{k=t}^{t_{final}} \left\{ \left(\lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k}\right)^T \times \tanh^T\left(W_{1_n} x_k + w^b_{1_n}\right) \right\} \\
w^b_{1_{n+1}} &= w^b_{1_n} - \gamma \sum_{k=t}^{t_{final}} \left\{ \left[ W_{2_n}^T \left(\lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k}\right)^T \right] \odot \mathrm{dtanh}\left(W_{1_n} x_k + w^b_{1_n}\right) \right\} \\
w^b_{2_{n+1}} &= w^b_{2_n} - \gamma \sum_{k=t}^{t_{final}} \left(\lambda_{k+1}^T \frac{\partial f}{\partial u_k} + \frac{\partial L}{\partial u_k}\right)^T
\end{aligned} \tag{27}
$$

In conclusion to this section, it should be mentioned that the number of elements in the NN hidden layer affects optimality of the control design. On one hand, the more elements, the better the theoretically achievable approximation. On the other hand, too many elements make gradient descent in the space of NN weights more prone to getting stuck in local minima (recall, this is a non-convex problem), thus prohibiting achievement of good approximation levels. Therefore a reasonable compromise is usually necessary.

4.4 Neural Network Control + LQR/SDRE

In the previous section a NN was optimized directly to stabilize the DIPC at minimum cost (7)–(8). To achieve this, the NN was trained to generate optimal controls over the wide range of the pendulums' motion. As illustrated in the Simulation Results section, NN approximation capabilities, limited by the number of neurons and numerical challenges of non-convex minimization, provided achievement of stabilization only within the range comparable to the LQR. One might ask whether it is possible to make the NN training more efficient and faster. Simple logic says that if the DIPC were stable and closer to optimal, the NN would not have much to learn since its optimal output would be closer to zero. In the trivial case, if the DIPC were optimally stabilized by an internal controller, the optimal NN output would be zero. Now let us recall that LQR provides optimal control in the vicinity of the equilibrium. Thus if we include LQR into the control scheme as shown in Figure 5, it is reasonable to expect better overall performance: the NN will be trained to generate only "corrections" to the LQR controls to provide optimality in the wide range of the pendulums' motion.

Figure 5: Neural network + LQR control diagram

Although in Figure 5 an LQR is shown, the SDRE controller can be used as well in place of the LQR. As demonstrated in the Simulation Results section, the SDRE control provides superior performance in terms of minimized cost and range of stability over the LQR. Thus, it is natural to expect a better performance of the control design shown in Figure 5 when the SDRE is used in place of the LQR. Since both the LQR and the SDRE have a lot in common (in fact, the SDRE is often called a nonlinear extension to the LQR), both cases will be discussed in this section, and where the differences between the two call for clarification, it will be provided.

The problem solved in this section is nearly the same as in the previous section; therefore, almost all the formulas stay the same or have just a few changes. Since control of the cart is a sum of the NN output and the LQR (SDRE),

$$
u_k = NN(w, x_k) + Kx_k
$$
where in case of SDRE $K \equiv K(x_k)$, minimization of the augmented cost function (19) by taking derivatives of $H$ w.r.t. $x_k$ and $w$ yields slightly modified Euler-Lagrange equations:

$$
\lambda_k = \left(\frac{\partial f}{\partial x_k} + \frac{\partial f}{\partial u_k}\left(\frac{\partial NN}{\partial x_k} + K\right)\right)^T \lambda_{k+1} + \left(\frac{\partial L}{\partial x_k} + \frac{\partial L}{\partial u_k}\left(\frac{\partial NN}{\partial x_k} + K\right)\right)^T \tag{28}
$$

Note that when SDRE is used instead of the LQR, $\frac{\partial (K(x_k)x_k)}{\partial x_k}$ should be used in place of $K$ in the above equation. These equations correspond to an adjoint system shown graphically in Figure 6 (compare to Figure 3), with optimality condition (21).

Figure 6: Adjoint NN + LQR system diagram

The training procedure for the NN is the same as in the previous section, and all the system Jacobians and the NN Jacobian are computed in the same way. The only new item in this section is the SDRE Jacobian $\frac{\partial (K(x_k)x_k)}{\partial x_k}$ which appears in the Euler-Lagrange equations (28). This Jacobian is computed either numerically, which may be computationally expensive, or approximately as

$$
\frac{\partial \left(K(x_k)x_k\right)}{\partial x_k} \approx K(x_k)
$$

Experiments in the Simulation Results section illustrate the superior performance of such combination control over the pure NN or LQR control. For the SDRE, a noticeable cost reduction is achieved only near critical pendulum deflections (close to the boundaries of the SDRE recovery region).

Remark. The idea of using NNs to adjust outputs of a conventional controller to account for differences between the actual system and its model used in the conventional control design was also employed in a number of works [15, 14, 2, 11]. Calise, Rysdyk and Johnson [2, 11] used a NN controller to supplement an approximate feedback linearization control of a fixed wing aircraft and a helicopter. The NN provided additional controls to match the response of the vehicle to the reference model, compensating for the approximations in the autopilot design. Wan and Bogdanov [15, 14] designed a model predictive neural control (MPNC) for an autonomous helicopter, where a NN worked in pair with the SDRE autopilot and provided minimization of the quadratic cost function over a receding horizon. The SDRE control was designed for a simplified nonlinear model, and the NN made it possible to compensate for wind disturbances and higher-order terms unaccounted for in the SDRE design.

4.5 Receding Horizon Neural Network Control

Is it possible to further improve the control scheme designed in the previous section? Recall that the NN was trained numerous times over the wide range of the pendulum angles to minimize the cost functional (19). The complexity of the approach is a function of the final time $t_{final}$, which determines the length of a training epoch. The solution suggested in this section is to apply a receding horizon framework and train the NN in a limited range of the pendulum angles, along a trajectory starting in the current point and having relatively short duration (horizon) $N$. This is accomplished by rewriting the cost function (19) as

$$
H = \sum_{k=t}^{t+N-1} \left\{ L(x_k, u_k) + \lambda_k^T \left(x_{k+1} - f(x_k, u_k)\right) \right\} + V(x_{t+N})
$$

where the last term $V(x_{t+N})$ denotes the cost-to-go from time $t + N$ to time $t_{final}$. Since the range of the NN inputs in this case is limited and the horizon length $N$ is short, a shorter time is required to train the NN. However, only local minimization of function (19) is achieved: starting with a significantly different initial condition the NN will not provide cost minimization since it was not trained for it. This issue is addressed by periodically retraining the NN: after a period of time called an update interval, the NN is retrained taking the current state vector as initial. The update interval is usually significantly shorter than the horizon, and in classical model-predictive control (MPC) is often only one time step. This technique was applied in a helicopter control problem [15, 14] and is referred to as Model Predictive Neural Control (MPNC).

In practice, the true value of $V(x_{t+N})$ is unknown, and must be approximated. Most common is to simply set $V(x_{t+N}) = 0$; however, this may lead to reduced stability and poor performance for short horizon lengths [9]. Alternatively, we may include a control Lyapunov function (CLF), which guarantees stability if the CLF is an upper bound on the cost-to-go, and results in a region of attraction for the MPC of at least that of the CLF [9]. The cost-to-go $V(x_{t+N})$ can be approximated using the solution of the SDRE at time $t + N$,

$$
V(x_{t+N}) \approx x_{t+N}^T P(x_{t+N}) x_{t+N}
$$

This CLF provides the exact cost-to-go for regulation assuming a linear system at the horizon time. A similar formulation was used for nonlinear regulation by Cloutier et al. [13].

All the equations from the previous section apply here as well. The Euler-Lagrange equations (20) are initialized in this case as

$$
\lambda_{t+N} = \left(\frac{\partial V(x_{t+N})}{\partial x_{t+N}}\right)^T \approx P(x_{t+N}) x_{t+N}
$$

where dependence of the SDRE solution $P$ on state vector $x_{t+N}$ was neglected.
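The backward sweep (20) and the gradient sum (21), which all of the NN designs above reuse, can be illustrated on a deliberately small toy problem. The sketch below is an assumption-laden stand-in rather than the DIPC setup: the plant is a scalar linear system $x_{k+1} = a x_k + b u_k$, the "network" is a single weight with $u_k = w x_k$ (so $\partial NN/\partial x = w$ and $\partial NN/\partial w = x_k$), and the function names are hypothetical. The multiplier at the end of the horizon is passed in as `lam_terminal`; zero corresponds to the choice $V(x_{t+N}) = 0$.

```python
def rollout(w, x0, a, b, horizon):
    """Forward pass of x_{k+1} = a x_k + b u_k with the controller u_k = w x_k."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(a * xs[-1] + b * w * xs[-1])
    return xs


def cost(w, x0, a, b, q, r, horizon):
    """Accumulated quadratic cost (7)-(8) over the horizon, zero terminal cost."""
    xs = rollout(w, x0, a, b, horizon)
    return sum(q * x * x + r * (w * x) ** 2 for x in xs[:-1])


def bptt_gradient(w, x0, a, b, q, r, horizon, lam_terminal=0.0):
    """Gradient of the cost w.r.t. w via the backward recursion (20) and sum (21)."""
    xs = rollout(w, x0, a, b, horizon)
    lam = lam_terminal                  # multiplier at the end of the horizon
    grad = 0.0
    for k in reversed(range(horizon)):
        x, u = xs[k], w * xs[k]
        # Term of (21): (lambda_{k+1} df/du + dL/du) dNN/dw, with dNN/dw = x.
        grad += (lam * b + 2.0 * r * u) * x
        # Recursion (20) with df/dx = a, df/du = b, dNN/dx = w.
        lam = (a + b * w) * lam + 2.0 * q * x + 2.0 * r * u * w
    return grad
```

A gradient-descent step as in the training procedure would then read `w -= gamma * bptt_gradient(w, x0, a, b, q, r, horizon)`, with `gamma` a small learning rate chosen by the user.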


Stability of the MPNC is closely related to that of the uk

traditional MPC. Ideally, in the case of unconstrained Neural network xˆ k 1 -

+

optimization, stability is guaranteed provided V (xt+N ) Ruk2 emulator

+

xk

is a CLF and is an (incremental) upper bound on the

cost-to-go [9]. In this case, the minimum region of at- xk uk

Neural network x k 1

traction of the receding horizon optimal control is determined by the CLF used and the horizon length. The guaranteed region of operation contains that of the CLF controller and may be made as large as desired by increasing the optimization horizon (restricted to the infinite horizon domain) [10]. In our case, the minimum region of attraction of the receding horizon MPNC is determined by the SDRE solution used as the CLF to approximate the terminal cost. In addition, we restrict the controls to be of the form given by (4.4), and the optimization is performed with respect to the NN weights w. In theory, the universal mapping capability of NNs implies that the stability guarantees are equivalent to those of the traditional MPC framework. In practice, however, stability is affected by the chosen size of the NN (which affects the actual mapping capabilities), as well as by the horizon length N and the update interval length (how often the NN is re-optimized). When the horizon is short, performance is more affected by the chosen CLF. When the horizon is long, performance is limited by the NN properties. An additional factor affecting stability is the specific algorithm used for numeric optimization. Gradient descent, which we use to minimize the cost function (4.5), is guaranteed to converge only to a local minimum (the cost function is not guaranteed to be convex with respect to the NN weights), and the result thus depends on the initial conditions. In addition, convergence is assured only if the learning rate is kept sufficiently small. To summarize these points, stability of the MPNC is guaranteed only under certain restricted ideal conditions. In practice, the designer must select [...] to assure stability and performance.

[Figure 7: Neural network adaptive control diagram]

[Figure 8: Neural network adaptive control diagram]
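The two remarks on gradient descent in the MPNC stability discussion above (convergence only to a local minimum that depends on the initial conditions, and the need for a sufficiently small learning rate) can be illustrated with a minimal sketch. The scalar cost below is a hypothetical non-convex stand-in chosen for illustration only; it is not the cost function (4.5), and the single weight `w`, the step sizes, and the helper names are assumptions, not part of the report.

```python
# Minimal sketch: plain gradient descent on a hypothetical non-convex
# scalar cost (NOT the report's cost (4.5)); all names are illustrative.

def cost(w):
    # Non-convex in w: two minima separated by a local maximum.
    return w**4 - 3.0 * w**2 + w

def cost_grad(w):
    # Analytic derivative of cost(w).
    return 4.0 * w**3 - 6.0 * w + 1.0

def gradient_descent(grad, w0, lr, steps=2000, blowup=1e6):
    """Run fixed-step gradient descent.

    Returns (w, converged_flag); the flag is False if |w| exceeds
    `blowup`, which we treat as divergence.
    """
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
        if abs(w) > blowup:
            return w, False
    return w, True

# Same small learning rate, two different initial conditions.
w_a, ok_a = gradient_descent(cost_grad, w0=2.0, lr=0.01)
w_b, ok_b = gradient_descent(cost_grad, w0=-2.0, lr=0.01)

# Same initial condition as the first run, but a learning rate that is too large.
w_c, ok_c = gradient_descent(cost_grad, w0=2.0, lr=0.5)

print(f"start  2.0, lr 0.01: w = {w_a:.4f}, cost = {cost(w_a):.4f}, converged = {ok_a}")
print(f"start -2.0, lr 0.01: w = {w_b:.4f}, cost = {cost(w_b):.4f}, converged = {ok_b}")
print(f"start  2.0, lr 0.50: converged = {ok_c}")
```

In this toy case the two small-step runs settle in different minima with different costs, and the large-step run leaves the bounded region within a few iterations. It only mirrors the qualitative behavior described above and says nothing about the actual NN weight space.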

4.6 Neural Network Adaptive Control

What if the pendulum parameters are unknown? In this case the system Jacobians (22)–(23) cannot be computed directly. A solution is to use an additional NN, called an “emulator”, to emulate the pendulum dynamics (see Figures 7–8). This NN-emulator is trained to reproduce the pendulum outputs, given the control input and the current state vector. The regular error back-propagation (BP) algorithm is used if the NN-emulator is connected as in Figure 7, and BPTT is employed in the case depicted in Figure 8. The NN-regulator training procedure is the same as in the subsection on NN learning control, but instead of the system Jacobians (22)–(23), the NN-emulator Jacobians computed as in (25) are used (see Figure 9).

[Figure 9: Adjoint NN adaptive system diagram]

5 Simulation results

To evaluate the control performance, two sets of simulations were conducted. In the first set, the LQR, SDRE, NN and NN+LQR/SDRE control schemes were tested on stabilizing the DIPC when both pendulums were initially deflected from their vertical position. System parameters used in the simulations are presented in Table 1.

Table 1: Simulation parameters

Parameter   Value
m0          1.5 kg
m1          0.5 kg
m2          0.75 kg
L1          0.5 m
L2          0.75 m
∆t          0.02 s
Q           diag(5 50 50 20 700 700)
R           1
Nh          40
tfinal      500

Slight deflections (10 degrees in opposite directions or 20 degrees in the same direction) result in very similar performance of all control approaches, as predicted (see Fig. 10–11).

Larger deflections (15 degrees in opposite directions or 30 degrees in the same direction) reveal differences between the LQR and the SDRE: the latter provides lower cost (7)–(8) and keeps pendulum velocities lower (see Fig. 12–13 and Table 2). The NN-based controllers behave similarly to the SDRE.

Table 2: Control performance (cost) comparison

Control     Deflection of pendulums, deg.

            Opposite direction
            10       15       18       28
LQR         115.2    328.4    705.0    n/r
SDRE        112.9    277.3    437.5    3655.8
NN          114.1    323.4    n/r      n/r
NN+LQR      113.3    275.7    448.3    n/r
NN+SDRE     112.5    276.3    436.6    2753.7

            Same direction
            20       30       35       67
LQR         36.7     108.3    325.3    n/r
SDRE        36.0     84.0     118.0    5250.5
NN          36.9     85.7     144.8    n/r
NN+LQR      37.3     86.2     136.4    n/r
NN+SDRE     36.0     84.0     118.0    n/r

Note that the LQR could not recover the pendulums starting from 19 deg. initial deflection in opposite directions or 36 deg. deflection in the same direction. Critical cases for the LQR (pendulums starting at 18 deg. deflections in opposite directions from the upward position and 35 deg. in the same direction) are presented in Fig. 14–15. The NN control provides consistently better cost than the LQR at these larger deflections, but it cannot stabilize the system beyond 15 (38) deg. pendulum deflections in the opposite (same) directions due to the limited approximation capabilities of the NN.

In the second set of experiments, regions of pendulum recovery (i.e. the maximum initial pendulum deflections from which a control law can bring the DIPC back to the equilibrium) were evaluated. The results are shown in Table 3. Cases of critical initial pendulum positions for the SDRE (28 deg. deflections in opposite directions or 67 deg. in the same direction) are presented for reference in Fig. 16–17.

Table 3: Regions of pendulums recovery

Control     Recovery region, deg.
            Max. opposite dir.    Max. same dir.
LQR         18                    35
SDRE        28                    67
NN          15                    38
NN+LQR      21                    40
NN+SDRE     28                    62

Note that no limits were imposed on the control force magnitude in any of the simulations. The purpose was to compare LQR and SDRE without introducing model changes which are not directly accounted for in the control design. It should be mentioned, however, that limited control authority and ways of taking it into account in the SDRE design were investigated earlier [3] but are currently left beyond our scope.

6 Conclusions

This report demonstrated a potential advantage of the SDRE technique over the LQR design in nonlinear optimal control problems with an underactuated plant. The region of pendulum recovery for the SDRE appeared to be 55 to 91 percent larger than for the LQR control (28 vs. 18 deg. in opposite directions and 67 vs. 35 deg. in the same direction, see Table 3).

Direct optimization using neural networks yields results superior to the LQR, but the recovery region is about the same as in the LQR case. This is due to the limited approximation capabilities of the NN and the challenges of non-convex numerical optimization.

Combination of the NN control with the LQR (or with the SDRE) provides larger recovery regions and better overall performance. In this case the NN learns to generate corrections to the LQR (SDRE) control to compensate for the suboptimality of the LQR (SDRE).

To enhance this report, it would be valuable to investigate taking limited control authority into account in all control designs.

References

[1] R. W. Brockett and H. Li. A light weight rotary double pendulum: maximizing the domain of attraction. In Proceedings of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, December 2003.

[2] [...] control using neural networks. IEEE Control Systems Magazine, 18(6), December 1998.

[3] J. R. Cloutier, C. N. D’Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.

[4] [...] a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 1989.

[5] G. F. Franklin, J. D. Powell, and A. Emami-Naeini. Feedback control of dynamic systems. Addison-Wesley, 2nd edition, 1991.

[6] K. Furuta, T. Okutani, and H. Sone. Computer control of a double inverted pendulum. Computer and Electrical Engineering, 5:67–84, 1978.

[7] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward neural networks are universal approximators. Neural Networks, 2:359–366, 1989.

[8] R. A. Jacobs. Increasing rates of convergence through learning rate adaptation. Neural Networks, 1(4):295–307, 1988.

[9] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a control Lyapunov function approach. In Proceedings of American Control Conference, 1999.

[10] A. Jadbabaie, J. Yu, and J. Hauser. Unconstrained receding horizon control of nonlinear systems. In Proceedings of IEEE Conference on Decision and Control, 1999.

[11] E. Johnson, A. Calise, R. Rysdyk, and H. El-Shirbiny. Feedback linearization with neural network augmentation applied to X-33 attitude control. In Proceedings of the AIAA Guidance, Navigation, and Control Conference, August 2000.

[12] W. T. Miller, R. S. Sutton, and P. J. Werbos. Neural networks for control. MIT Press, Cambridge, MA, 1990.

[13] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems. Journal of Guidance, Control and Dynamics, 23(3):399–405, May-June 2000.

[14] E. Wan, A. Bogdanov, R. Kieburtz, A. Baptista, M. Carlsson, Y. Zhang, and M. Zulauf. Model predictive neural control for aggressive helicopter maneuvers. In T. Samad and G. Balas, editors, Software Enabled Control: Information Technologies for Dynamical Systems, chapter 10, pages 175–200. IEEE Press, John Wiley & Sons, 2003.

[15] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DOF helicopter model. In Proceedings of IEEE American Control Conference, Arlington, VA, June 2001.

[16] P. Werbos. Backpropagation through time: what it does and how to do it. Proceedings of IEEE, special issue on neural networks, 2:1550–1560, October 1990.

[17] W. Zhong and H. Rock. Energy and passivity based control of the double inverted pendulum on a cart. In Proceedings of the IEEE International Conference on Control Applications, Mexico City, Mexico, September 2001.

[Figure 10: 10 deg. opposite direction]

[Figure 11: 20 deg. same direction]

[Panels in Figures 10–11, plotted against time (s) for the SDRE, LQR, NN, NN+LQR and NN+SDRE controllers: bottom pendulum angle θ1 (deg), bottom pendulum velocity dθ1/dt (deg/s), top pendulum angle θ2 (deg), top pendulum velocity dθ2/dt (deg/s), cart position θ0 (m), cart velocity dθ0/dt (m/s), control force (N).]

[Figure 12: 15 deg. opposite direction]

[Figure 13: 30 deg. same direction]

[Panels in Figures 12–13, plotted against time (s) for the SDRE, LQR, NN, NN+LQR and NN+SDRE controllers: bottom pendulum angle θ1 (deg), bottom pendulum velocity dθ1/dt (deg/s), top pendulum angle θ2 (deg), top pendulum velocity dθ2/dt (deg/s), cart position θ0 (m), cart velocity dθ0/dt (m/s), control force (N).]

[Figure 14: 18 deg. opposite direction]

[Figure 15: 35 deg. same direction]

[Panels in Figures 14–15, plotted against time (s): bottom pendulum angle θ1 (deg), bottom pendulum velocity dθ1/dt (deg/s), top pendulum angle θ2 (deg), top pendulum velocity dθ2/dt (deg/s), cart position θ0 (m), cart velocity dθ0/dt (m/s), control force (N). Curves in Figure 14: SDRE, LQR, NN+LQR, NN+SDRE. Curves in Figure 15: SDRE, LQR, NN, NN+LQR, NN+SDRE.]

[Figure 16: [...] NN+SDRE. Panels, plotted against time (s) for the SDRE and NN+SDRE controllers: bottom pendulum angle θ1 (deg), bottom pendulum velocity dθ1/dt (deg/s), top pendulum angle θ2 (deg), top pendulum velocity dθ2/dt (deg/s), cart position θ0 (m), cart velocity dθ0/dt (m/s), control force (N).]

[Figure 17: 67 deg. same direction, SDRE. Panels, plotted against time (s): pendulum angles θ1 and θ2 (deg), pendulum velocities dθ1/dt and dθ2/dt (deg/s), cart position θ0 (m), cart velocity dθ0/dt (m/s), control force (N).]
