Pang Rui
School of Automation, Northwestern Polytechnical University, Xi'an, China
University of Sydney, Sydney, NSW, Australia
pangrui517@gmail.com
I. INTRODUCTION
Cooperative missions flown by UAV formations are a new frontier in the application of UAVs. They extend the specific missions conducted by single UAVs and will play an important role in future military and civilian applications. The UAVs in a formation can cooperate on tactical information, share the battlefield situation, cooperatively allocate firepower and decoys, and unify tactical decisions. The main tasks of UAV formation control include formation maneuvering, localization of each UAV, keeping relative positions, collision avoidance during multi-UAV maneuvers, and the splitting and re-forming of the formation. All of these tasks demand high precision and robustness from the controller [1], [2].
Recently, a great deal of theoretical and experimental research has been conducted in the field of multi-UAV formation control. Singh designed a nonlinear adaptive close-formation controller to realize leader-follower formation control of two UAVs and demonstrated the closed-loop stability of the control system [5]. Giulietti constructed a multi-UAV model with vortex disturbance and designed an LQ-servo controller to maintain the formation shape [3]. Proud showed that drag can be reduced by 10% to 25% when UAVs fly in close formation [11]. In [4], West Virginia University carried out an experiment in which three small YF-22-like UAVs flew in formation, realizing leader-follower tracking control with a linear feedback controller. Lingpei Zong at Beihang University designed a formation optimal controller based on a multi-agent system [1], and Zheng He designed a distributed
linear feedback controller to maintain the formation during simple maneuvers [2].
Most UAVs are small, so the effects of wind disturbance on them are much larger than on conventional aircraft. When UAVs fly in formation, each is also affected by the wake vortices of the surrounding UAVs. Under such circumstances an accurate dynamic model of the UAV is hard to obtain, and conventional model-based controller design methods cannot achieve good control performance. Fuzzy control does not rely on an accurate model of the controlled object, which lets it adapt well to parameter variations; however, the whole design procedure requires expert experience and manual tuning. Recently, machine learning has been used to help designers build fuzzy controllers. Supervised learning is fast, but it cannot meet the requirements when input-output training data are unavailable. Reinforcement learning is an unsupervised method that learns an optimal policy by trial and error, without outside expert knowledge.
Q-learning is an improved TD (temporal difference) reinforcement learning method proposed by Watkins. Also known as off-policy TD, it uses the state-action reward and the Q function as its estimate, carries out successive action selection and search, and ultimately forms the optimal policy [12]. Buijtenen used Q-learning to adjust the parameters of a fuzzy controller, obtaining an adaptive controller for satellite attitude control. Xiaohui Dai used Q-learning to control the direction and speed of intelligent vehicles [7]. However, applications of Q-learning to the parameter learning of fuzzy flight controllers are relatively rare.
In this paper we study the design and realization of a Q-learning fuzzy controller and apply it to multi-UAV formation control. We also combine the controller with a maneuver-command decision maker and conduct a maneuvering simulation of five UAVs.
II.
Diagram 1: UAV state variables: longitude $x_i$, latitude $y_i$, altitude $h_i$ (m), horizontal velocities $V_{xi}$ and $V_{yi}$ (m/s), and pitch, yaw, and roll angles (°).
The relative states between UAVs $i$ and $j$ are defined as

$$x_{rij} = x_i - x_j, \quad y_{rij} = y_i - y_j, \quad h_{rij} = h_i - h_j$$

$$dis_{ij} = \left\| Pos_i - Pos_j \right\|_2 = \sqrt{x_{rij}^2 + y_{rij}^2 + h_{rij}^2}$$

$$\psi_{rij} = \psi_i - \psi_j$$
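As a concrete illustration (ours, not the paper's), here is a minimal Python sketch that computes these relative states; the field names follow Diagram 1:

    import math
    from dataclasses import dataclass

    @dataclass
    class UAVState:
        x: float    # longitude position (m)
        y: float    # latitude position (m)
        h: float    # altitude (m)
        psi: float  # yaw angle (deg)

    def relative_state(i: UAVState, j: UAVState):
        """Relative position, distance, and heading difference of UAVs i and j."""
        x_rij = i.x - j.x
        y_rij = i.y - j.y
        h_rij = i.h - j.h
        dis_ij = math.sqrt(x_rij**2 + y_rij**2 + h_rij**2)  # ||Pos_i - Pos_j||_2
        psi_rij = i.psi - j.psi
        return x_rij, y_rij, h_rij, dis_ij, psi_rij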
III.
In Q-learning, the agent interacts with its environment through a set of states $S$ and a set of actions $A$. It seeks the policy $\pi(s_t)$ that maximizes the discounted cumulative reward for all $s \in S$, which can be defined as

$$V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$$

[6], [12]. The Q value of a state-action pair then satisfies

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma \max_a Q(s_{t+1}, a) \qquad (4)$$
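To make the discounted return concrete, here is a small sketch (ours); it truncates the infinite sum in the definition of $V^{\pi}$ to a finite reward sequence:

    def discounted_return(rewards, gamma=0.9):
        """V(s_t) = sum_i gamma^i * r_{t+i}, truncated to a finite horizon."""
        return sum(gamma**i * r for i, r in enumerate(rewards))

    # Three steps of reward 1.0 with gamma = 0.9 give 1.0 + 0.9 + 0.81 = 2.71.
    print(discounted_return([1.0, 1.0, 1.0]))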
Therefore, following the above rules, the agent repeatedly observes the current state $s_t$, chooses some action $a_t$, interacts with the environment, observes the reward $r(s_t, a_t)$ and the new state $s_{t+1}$, and updates the Q value $Q(s_t, a_t)$, ultimately reaching the maximum Q value and the corresponding optimal policy.
Considering the uncertainties of the aircraft model, the flight control system must be treated as an uncertain Markov decision process. In order to achieve convergence, we introduce the improved Q-learning iterative method [8]:

$$Q(s_t, a_t) \leftarrow (1 - \alpha_n)\, Q(s_t, a_t) + \alpha_n \left[ r + \gamma \max_a Q(s_{t+1}, a) \right] \qquad (5)$$

where $\alpha_n$ is the learning rate at iteration $n$.
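A minimal tabular sketch of update (5) is given below. The discount factor, the decay schedule for $\alpha_n$, and the toy state and action sets are our assumptions, not the paper's:

    import random
    from collections import defaultdict

    GAMMA = 0.9                  # discount factor (assumed)
    ACTIONS = [-0.1, 0.0, 0.1]   # toy action set (assumed)

    Q = defaultdict(float)       # Q[(state, action)], zero-initialized

    def q_update(s, a, r, s_next, n):
        """One application of the improved iteration (5)."""
        alpha = 1.0 / (1.0 + n)  # decaying learning rate alpha_n
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + GAMMA * best_next)

    def epsilon_greedy(s, eps=0.1):
        """Trial-and-error action choice used during learning."""
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(s, a)])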
The relative-position dynamics of UAVs $i$ and $j$ can be written in vector form as

$$\begin{bmatrix} \ddot{x}_{rij} \\ \ddot{y}_{rij} \\ \ddot{h}_{rij} \end{bmatrix} = M a_i - g$$

where $a_i$ is the acceleration of UAV $i$, $M$ is a transformation matrix, and $g$ is the gravity vector.
Therefore, the task of the UAV formation maneuvering controller is to control the relative positions of the UAVs so as to achieve the goal

$$\lim_{t \to \infty} \left( x_{rij} - x_{cij} \right) = 0, \qquad \lim_{t \to \infty} \left( y_{rij} - y_{cij} \right) = 0$$

where $x_{cij}$ and $y_{cij}$ are the commanded relative positions.
IV.
The states of the other UAVs in the formation are described by the vectors

$$X_k = (x_k, y_k, z_k, V_k, \theta_k, \psi_k, \phi_k)^T, \quad k = 1, 2, \ldots, m, \; k \neq i$$

This layer contains an autonomous decision loop that issues commands to the inner layer. The actual speed controller, altitude controller, and lateral-position controller accept the reference input signal $u_r = (V, h_{ij}, x_{ij}, y_{ij})$ and produce the corresponding control outputs.

The fuzzy inference uses max-min composition [7]:

$$B(u) = \left[ A_{1j_1}(x_1) \wedge A_{2j_1}(x_2) \wedge \cdots \wedge A_{nj_1}(x_n) \right] \vee \left[ A_{1j_2}(x_1) \wedge A_{2j_2}(x_2) \wedge \cdots \wedge A_{nj_2}(x_n) \right] \vee \cdots$$

The membership functions are triangular, with center $c_j$ and width $w_j$:

$$A_j(u) = \begin{cases} 0, & u < c_1 \ \text{or} \ u > c_m \\ 1 + \dfrac{2(u - c_j)}{w_j}, & c_j - \dfrac{w_j}{2} \le u \le c_j \\ 1 - \dfrac{2(u - c_j)}{w_j}, & c_j < u \le c_j + \dfrac{w_j}{2} \end{cases} \qquad (6)$$
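For illustration, a short sketch (ours) of (6) and the max-min composition, assuming standard triangular membership functions with center $c_j$ and width $w_j$:

    def tri_membership(u, c, w):
        """Triangular membership with center c and full width w, as in (6)."""
        if u < c - w / 2 or u > c + w / 2:
            return 0.0
        if u <= c:
            return 1 + 2 * (u - c) / w  # rising edge
        return 1 - 2 * (u - c) / w      # falling edge

    def rule_strength(antecedent_memberships):
        """AND (min) across one rule's antecedents."""
        return min(antecedent_memberships)

    def aggregate(rule_strengths):
        """OR (max) across the fired rules, per the max-min composition."""
        return max(rule_strengths)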
The parameters that can be tuned in the Mamdani-type fuzzy controller mainly include the conclusions of the rules, the number of linguistic values on the universes of discourse of the input and output variables, and the widths of the membership functions. The drawback of reinforcement learning is that as the number of parameters grows, the learning speed drops sharply. Q-learning also needs to discretize the states of each tuned parameter, which leads to the curse of dimensionality. Therefore, we first design rough fuzzy rules from expert experience; these rules supply only a basic level of performance. Q-learning is then applied to tune the parameters of the membership functions, as sketched below.
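One way to realize this tuning (a sketch under our own assumptions, not the paper's exact scheme) is to let each Q-learning action select a candidate (center shift, width) pair for the tuned membership function and score it by the resulting tracking error; simulate below is a hypothetical callback that flies one episode and returns that error:

    import itertools

    # Candidate parameter values; the grids mirror the ranges in Diagram 2.
    CENTER_SHIFTS = [-0.5, -0.4, -0.35, -0.3, -0.2, -0.15, -0.1]
    WIDTHS = [1.5, 1.7, 2.2, 2.5, 3.0]
    ACTION_PAIRS = list(itertools.product(CENTER_SHIFTS, WIDTHS))

    def reward_from_error(tracking_error):
        """Smaller tracking error -> larger reward (our choice of shaping)."""
        return -abs(tracking_error)

    def tune(simulate, episodes=105):
        """Sweep the action pairs and keep a running Q estimate for each."""
        q_table = {}
        for n in range(episodes):
            pair = ACTION_PAIRS[n % len(ACTION_PAIRS)]
            err = simulate(pair)              # hypothetical flight episode
            alpha = 1.0 / (1.0 + n)
            q_table[pair] = ((1 - alpha) * q_table.get(pair, 0.0)
                             + alpha * reward_from_error(err))
        return max(q_table, key=q_table.get)  # best (center shift, width)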
Denote $A_{ij}$ as the $j$th linguistic value of the $i$th input variable $x_i$, and $B_{ij}$ as the $j$th linguistic value of the $i$th output fuzzy set. Then

$$A_i = \{ A_{ij} : j = 1, 2, \ldots, N_i \}, \quad i = 1, 2, \ldots, n$$

where $N_i$ is the number of linguistic values of $A_i$, and

$$B_i = \{ B_{ij} : j = 1, 2, \ldots, M_i \}, \quad i = 1, 2, \ldots, m$$

The fuzzy rule surface is shown in Figure 2.
Diagram 2: Q-value table after learning

            1.5      1.7      2.2      2.5      3
  -0.5    -0.210   -0.217   -0.049   -0.169   -0.097
  -0.4    -0.154   -0.08    -0.199   -0.098   -0.151
  -0.35   -0.103   -0.104   -0.197   -0.152    0.011
  -0.3    -0.224   -0.102   -0.113   -0.067   -0.093
  -0.2    -0.06    -0.066   -0.12    -0.04    -0.097
  -0.15   -0.102   -0.009   -0.012   -0.023   -0.025
  -0.1     0.0119   0.113   -0.129    0.024    0.096
Obviously, the action pair (-0.1, 1.7) has the largest Q value and the best performance. As a result, we adopt this action pair; the resulting altitude control performance is shown in Figure 4.
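Reading the adopted action pair off the learned table is an argmax; a small check (ours) with the values from Diagram 2:

    import numpy as np

    rows = [-0.5, -0.4, -0.35, -0.3, -0.2, -0.15, -0.1]
    cols = [1.5, 1.7, 2.2, 2.5, 3.0]
    Q = np.array([
        [-0.210, -0.217, -0.049, -0.169, -0.097],
        [-0.154, -0.080, -0.199, -0.098, -0.151],
        [-0.103, -0.104, -0.197, -0.152,  0.011],
        [-0.224, -0.102, -0.113, -0.067, -0.093],
        [-0.060, -0.066, -0.120, -0.040, -0.097],
        [-0.102, -0.009, -0.012, -0.023, -0.025],
        [ 0.0119, 0.113, -0.129,  0.024,  0.096],
    ])
    r, c = np.unravel_index(np.argmax(Q), Q.shape)
    print(rows[r], cols[c])  # -0.1 1.7, the pair adopted above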
Similarly, we design a lateral controller to control the UAV's roll angle. Its tracking performance is illustrated in Figure 5.
V.
A.
$$\vec{P}_1 = (x_1, y_1, z_1, \psi_1) = (800, 1000, 1000, 235^\circ)$$
$$\vec{P}_2 = (x_2, y_2, z_2, \psi_2) = (800, 990, 1000, 235^\circ)$$
$$\vec{P}_3 = (x_3, y_3, z_3, \psi_3) = (800, 980, 1000, 235^\circ)$$
$$\vec{P}_4 = (x_4, y_4, z_4, \psi_4) = (810, 980, 1000, 235^\circ)$$
$$\vec{P}_5 = (x_5, y_5, z_5, \psi_5) = (820, 980, 1000, 235^\circ)$$

The inter-UAV distances as the UAVs pass the set waypoints are:

$$d_{12} = 12.4\ \mathrm{m}, \quad d_{23} = 11.4\ \mathrm{m}, \quad d_{34} = 11.7\ \mathrm{m}, \quad d_{45} = 10.2\ \mathrm{m}$$
B.
C.
$$\vec{P}_4 = (x_4, y_4, z_4, \psi_4) = (3590.4, 3102.2, 1000.7, 46.34^\circ)$$
$$\vec{P}_5 = (x_5, y_5, z_5, \psi_5) = (3733.3, 2823.7, 1000.1, 43.4^\circ)$$
D.
References:
[1] Zong Lingpei, Xie Fan, Qin Shiyin. UAV Intelligent Optimal Control of Formation Flight Based on MAS. Chinese Journal of Aeronautics (in Chinese), 29(5):1326-1333.