
Multi-UAV Formation Maneuvering Control Based on Q-Learning Fuzzy Controller

Pang Rui
School of Automation, Northwestern Polytechnical University, Xi'an, China
University of Sydney, Sydney, NSW, Australia
pangrui517@gmail.com

Abstract: On the basis of the relative motion relations of formation flight, longitudinal and lateral fuzzy controllers are designed for the UAVs to solve the multi-UAV formation control problem. The relative positions between adjacent UAVs are controlled to meet the desired commands and performance requirements. The Q-learning method, a kind of reinforcement learning, is used to tune the corresponding parameters of the output membership functions of the fuzzy controllers. This auto-tuning avoids the complexity of manual tuning based on expert experience and eliminates steady-state errors. In addition, different conditions, including coordinated turning, tight-to-loose formation changes, formation sequence changes and collision avoidance, are simulated with the formation control methodology, which combines centralized decision making with decentralized control. The results confirm the validity of the control method and the formation control strategy under different formation maneuvering demands.

Key Words: Reinforcement learning, Q-learning, fuzzy control, UAVs, formation flight, maneuvering

I. INTRODUCTION
UAV formation cooperative missions are a new frontier in the application of UAVs, extending beyond single missions carried out by a single UAV, and they will play an important role in future military and civilian applications. UAVs in a formation can exchange tactical information, share the battlefield situation, cooperatively allocate firepower and decoys, and unify tactical decisions. The main tasks of UAV formation control include formation maneuvering, localization of each UAV, keeping relative positions, collision avoidance during multi-UAV maneuvering, and splitting and re-forming of the formation. All of these tasks require high accuracy and robustness of the controller [1], [2].
Recently, extensive theoretical and experimental research has been conducted in the field of multi-UAV formation control. Singh designed a nonlinear adaptive close-formation controller to realize leader-follower formation control of two UAVs and demonstrated the closed-loop stability of the control system [5]. Giulietti constructed a multi-UAV model with vortex disturbance and designed an LQ-servo controller to maintain the formation shape [3]. Proud showed that drag can be reduced by 10% to 25% when UAVs fly in close formation [11]. In [4], West Virginia University carried out an experiment in which three small YF-22-like UAVs flew in formation, realizing leader-follower tracking control with a linear feedback controller. Lingpei Zong at Beihang University designed a formation optimal controller based on a multi-agent system [1], and Zheng He designed a distributed linear feedback controller to maintain the formation during simple maneuvers [2].
UAVs are mostly small, and the effects of wind disturbance on them are much larger than on conventional aircraft. When UAVs fly in formation, they are also affected by the wake vortices of the surrounding UAVs. Under such circumstances an accurate dynamic model of the UAV is hard to obtain, and conventional model-based controller design methods cannot achieve good control performance. Fuzzy control does not rely on an accurate model of the controlled object, which makes it well suited to parameter variations. However, the whole design procedure requires expert experience and manual adjustment. Recently, machine learning has been used to help designers build fuzzy controllers. Supervised learning is fast, but when the required input-output data do not exist it cannot meet the requirements. Reinforcement learning is an unsupervised method that uses trial and correction to learn an optimal strategy without external expert knowledge. Q-learning is an improved TD (temporal difference) reinforcement learning method proposed by Watkins. It is also known as off-policy TD; it adopts the state-action reward and the Q function as the estimate function, performs successive action selection and search, and ultimately forms the optimal policy [12]. Van Buijtenen used Q-learning to adjust the parameters of a fuzzy controller in order to design an adaptive controller for satellite attitude control [6]. Xiaohui Dai used Q-learning to control the direction and speed of intelligent vehicles [7]. However, applications of Q-learning to parameter learning in fuzzy flight control are relatively rare.
In this paper we study the design and realization of a Q-learning fuzzy controller and apply it to multi-UAV formation control. We also combine the controller with a decision maker that generates maneuver commands and conduct a maneuvering simulation of five UAVs.

II. THE RELATIVE MOTION RELATIONS OF UAV FORMATION FLIGHT

With the purpose of analyzing and controlling UAV formation maneuvering, we need to establish the relative motion relations among the UAVs. The UAVs in the formation can share flight information through the Joint Tactical Information Distribution System to avoid collisions and achieve precise control. We assume that there are m UAVs in the formation; the information shared by the ith UAV is as follows:


Diagram 1: The shared common information of the ith UAV

  longitude (deg)          latitude (deg)           altitude hi (m)
  velocity_x Vxi (m/s)     velocity_y Vyi (m/s)     velocity_z Vzi (m/s)
  pitch angle (deg)        yaw angle (deg)          roll angle (deg)
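The shared record of Diagram 1 maps naturally onto a simple data structure. The sketch below is purely illustrative; the class and field names are assumptions rather than the paper's notation.

```python
from dataclasses import dataclass

@dataclass
class UAVSharedState:
    """Illustrative container for the per-UAV information of Diagram 1 (names are assumed)."""
    longitude_deg: float   # longitude (deg)
    latitude_deg: float    # latitude (deg)
    altitude_m: float      # h_i (m)
    vx: float              # velocity_x V_xi (m/s)
    vy: float              # velocity_y V_yi (m/s)
    vz: float              # velocity_z V_zi (m/s)
    pitch_deg: float       # pitch angle (deg)
    yaw_deg: float         # yaw angle (deg)
    roll_deg: float        # roll angle (deg)
```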

In Diagram 1, the latitude, longitude and altitude h_i can be transformed into coordinates x_i, y_i and z_i in the ground coordinate system. In [11] the motion relations of two UAVs, a leader and a follower, are described in detail; here we extend them to a formation of m UAVs. With the ground reference Ox_g y_g z_g, the relative position between the ith and jth (i, j ≤ m, i ≠ j) UAVs is given as follows:

  x_rij = x_i - x_j
  y_rij = y_i - y_j
  h_rij = h_i - h_j
  dis_ij = ||Pos_i - Pos_j||_2 = sqrt(x_rij^2 + y_rij^2 + h_rij^2)
  ψ_rij = ψ_i - ψ_j                                                  (1)

When the UAVs fly in a leader-follower formation, we are more concerned with the relative distance along the motion direction. However, the onboard navigation system can only supply the relative positions x_rij and y_rij with respect to the ground reference, which are hard to use directly as control reference inputs. As a result, we adopt the wind reference coordinates of the ith aircraft to describe the relative position between the ith and jth UAVs [3][11].

Assume M_aig is the transformation matrix from the wind reference of the ith UAV to the ground reference, composed of the elementary rotations through the yaw, pitch and roll angles ψ_i, θ_i and φ_i:

  M_aig = [ cos ψ_i  -sin ψ_i  0 ]   [  cos θ_i  0  sin θ_i ]   [ 1     0         0      ]
          [ sin ψ_i   cos ψ_i  0 ] * [  0        1  0       ] * [ 0  cos φ_i  -sin φ_i   ]
          [ 0         0        1 ]   [ -sin θ_i  0  cos θ_i ]   [ 0  sin φ_i   cos φ_i   ]

Then the relative position expressed in the wind reference of the ith UAV (denoted with primes) is:

  [ x'_rij, y'_rij, h'_rij ]^T = M_aig^(-1) [ x_rij, y_rij, h_rij ]^T          (2)

Therefore the task of the UAV formation maneuvering controller is to control the relative positions of the UAVs so as to achieve the goal:

  lim_{t→∞} ( x'_rij - x_cij, y'_rij - y_cij, h'_rij - h_cij ) = (0, 0, 0)     (3)

where (x_cij, y_cij, h_cij) are the desired relative positions.
III. BRIEF INTRODUCTION TO THE Q-LEARNING METHOD

Q-learning is one of the reinforcement learning methods. Reinforcement learning is an unsupervised learning method, which differs from supervised learning in that it requires neither expert knowledge nor prior knowledge.

The reinforcement learning procedure is a Markov Decision Process (MDP). The agent has the sets S and A, where S represents the set of states and A represents the available actions. At a discrete time step t, the agent observes the current state s_t, chooses the current action a_t and executes it. The environment responds to this action and gives an immediate reward r_t = r(s_t, a_t). After the agent interacts with the environment, a subsequent state s_{t+1} = δ(s_t, a_t) occurs, the agent continues the above sequence, and the learning proceeds. In a Markov Decision Process, the functions δ(s_t, a_t) and r(s_t, a_t) depend only on the current state and action, not on previous states and actions [7].

The purpose of reinforcement learning is to learn a policy π, a mapping from the states S to the actions A, that maximizes the discounted return V^π(s_t) for all s ∈ S, where

  V^π(s_t) = r_t + γ r_{t+1} + γ² r_{t+2} + ... = Σ_{i=0}^{∞} γ^i r_{t+i}    [6],[12]

The key of Q-learning is the introduction of an evaluation function Q(s, a), defined as the maximum discounted reward obtainable from state s when action a is taken as the first action:

  Q(s_t, a_t) = r(s_t, a_t) + γ max_{a'} Q(δ(s_t, a_t), a')    (4) [12]

Following the above rules, the agent repeatedly observes the current state s_t, chooses some action a_t, interacts with the environment, observes the reward r(s_t, a_t) and the new state s_{t+1} = δ(s_t, a_t), updates the Q value Q(s_t, a_t), and ultimately achieves the maximum Q value and the corresponding optimal policy.

Considering the uncertainties of the aircraft model, the flight control system is an uncertain Markov Decision Process. In order to achieve convergence, we introduce the improved Q-learning iterative method [8]:

  Q(s_t, a_t) ← (1 - η) Q(s_t, a_t) + η [ r + γ max_{a'} Q(s_{t+1}, a') ]    (5)
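For concreteness, a tabular form of the update in equation (5) is sketched below. The discrete state and action encoding and the reward are placeholders; the paper's flight-control-specific definitions are not reproduced here.

```python
import numpy as np

def q_update(Q, s, a, reward, s_next, gamma=0.8, eta=0.95):
    """One iteration of the smoothed Q-learning update of Eq. (5).
    Q is a 2-D array indexed by a discrete state and a discrete action."""
    best_next = np.max(Q[s_next])                                   # max_a' Q(s_{t+1}, a')
    Q[s, a] = (1.0 - eta) * Q[s, a] + eta * (reward + gamma * best_next)
    return Q
```

With a greedy or ε-greedy action choice built on top of this update, the table converges to the optimal action values under the usual MDP assumptions.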

IV. SINGLE UAV CONTROLLER DESIGN BASED ON Q-LEARNING FUZZY CONTROL

A. Controller Design Frame


Denote one UAV in the formation as U_i. Different from a single UAV flying alone, U_i not only needs basic functions such as altitude keeping and velocity and heading control, but also needs to receive formation command information, maneuver within the formation and keep the formation shape. The controller system design, including the reception of outside information, is illustrated in Figure 1:

Figure 1 Single UAV control system design

Figure 1 illustrates the control strategy, which combines two layers. The outer layer receives the current flight states of its own UAV and the other necessary information, such as its own position x, y, z, speed V, angular velocities p_i, q_i, r_i, Euler angles θ_i, ψ_i, φ_i, and the states of the other UAVs X_k = (x_k, y_k, z_k, V_k, θ_k, ψ_k, φ_k)^T, k = 1, 2, ..., m, k ≠ i. This layer contains an autonomous decision loop that gives commands to the inner layer. The actual speed controller, altitude controller and lateral position controller accept the reference input signal u_r = (V, h_ij, x_ij, y_ij), output the actuator commands (δ_e, δ_a, δ_r, δ_p) and drive the actuators.
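To make the two-layer structure of Figure 1 concrete, the skeleton below separates an outer decision loop, which produces the reference u_r = (V, h_ij, x_ij, y_ij), from the inner tracking controllers that drive the actuators. The class and method names are assumptions for illustration only, not the paper's implementation.

```python
class OuterLayer:
    """Outer loop: turns formation commands and neighbour states into the reference u_r."""
    def decide(self, own_state, other_states, formation_command):
        # Placeholder decision logic: pass the commanded speed and relative offsets through.
        V_ref = formation_command["V"]
        h_ij = formation_command["h_ij"]
        x_ij = formation_command["x_ij"]
        y_ij = formation_command["y_ij"]
        return V_ref, h_ij, x_ij, y_ij          # u_r = (V, h_ij, x_ij, y_ij)

class InnerLayer:
    """Inner loop: speed, altitude and lateral position controllers tracking u_r."""
    def track(self, u_r, own_state):
        # In the paper each controller would be a (Q-learning-tuned) fuzzy controller;
        # here they are stubs returning actuator commands (delta_e, delta_a, delta_r, delta_p).
        return 0.0, 0.0, 0.0, 0.0
```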

B. Fuzzy Controller Design Based on Q-learning

A fuzzy controller is applied to the altitude and lateral control because it is easy to realize in applications and depends only weakly on a precise UAV model. A fuzzy controller can deal with nonlinear systems with uncertain parameters and can achieve good performance. Traditional fuzzy controller design usually tunes the parameters of the controller with human experience, but for MISO and MIMO systems manual tuning is tedious work. As a result, we study a control method that adopts Q-learning to automatically optimize the parameters of the fuzzy controller.

For the MIMO system, assume that the controller has n inputs x_1, x_2, ..., x_n and m outputs u_1, u_2, ..., u_m. Denote A_i^j as the jth linguistic variable of the ith input fuzzy set x_i, and B_i^j as the jth linguistic variable of the ith output fuzzy set:

  A_i = { A_i^j : j = 1, 2, ..., N_i },  i = 1, 2, ..., n
  B_i = { B_i^j : j = 1, 2, ..., M_i },  i = 1, 2, ..., m

where N_i is the number of linguistic variables of the input x_i and M_i is the number of linguistic variables of the ith output.

Every input and output linguistic variable has a membership function, denoted μ_A and μ_B. For a certain input X = (x_1, x_2, ..., x_n)^T, fuzzification gives the membership degrees μ_A1j1(x_1), μ_A1j2(x_1), ..., μ_Anj1(x_n), μ_Anj2(x_n), ... [7]

We adopt Mamdani-type IF-THEN rules, which can be represented as follows:

  R_l: IF (x_1 is A_1^j) AND (x_2 is A_2^k) AND ... THEN (u_1 is B_1^l) ...

In order to acquire the membership function of the conclusion part, we apply the max-min inference method. If only two rules are active at the same time, the membership function of the conclusion part is:

  μ_B(u) = max{ [μ_A1j1(x_1) ∧ μ_A2k1(x_2) ∧ ...], [μ_A1j2(x_1) ∧ μ_A2k2(x_2) ∧ ...] }

If the membership functions of the conclusion part are chosen as triangular membership functions with m linguistic variables, then:

  μ_Bj(u) = 0                          if u < c_1 or u > c_m
          = 1 + 2(u - c_j) / w_j       if c_j - w_j/2 ≤ u ≤ c_j
          = 1 - 2(u - c_j) / w_j       if c_j < u ≤ c_j + w_j/2              (6)

The parameters that can be tuned in the Mamdani-type fuzzy controller mainly include the conclusions of the rules, the number of linguistic variables on the universes of discourse of the input and output variables, and the centers and widths of the membership functions. The drawback of reinforcement learning is that when the number of parameters grows, the learning speed decreases greatly. Q-learning also needs to discretize the states of each tuned parameter, which leads to the curse of dimensionality. Therefore, we first design rough fuzzy rules with expert experience; these rules only supply a fundamental level of performance. Q-learning is then applied to tune the parameters of the output membership functions.

The fuzzy rule surface is shown in Figure 2.

Figure 2 Fuzzy rule surface

The input scaling factors are g_1 = g_2 = 1 and the output scaling factor is g_o = 5/57.3. The simulation initial states are V_0 = 34.5 m/s, two initial attitude angles of 5.844°, and h_0 = 1000 m. We adopt the following candidates for c_4 and w_4:

  c_4 = [-0.5, -0.4, -0.35, -0.3, -0.2, -0.15, -0.1]
  w_4 = [1.5, 1.7, 2.2, 2.5, 3]

Equation (5) is applied with γ = 0.8 and η = 0.95.
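A minimal, single-state sketch of how equation (5) can rank the 35 candidate (c_4, w_4) pairs is given below. The ε-greedy action choice and the reward returned by run_episode are assumptions (a real reward would come from the altitude-tracking simulation of the UAV model); only the candidate sets, γ and η are taken from the text.

```python
import numpy as np

def tri_membership(u, c, w):
    """Triangular membership of Eq. (6): center c, width w."""
    return max(0.0, 1.0 - 2.0 * abs(u - c) / w)

# Candidate actions: all (c4, w4) pairs from the candidate sets above.
c4_candidates = [-0.5, -0.4, -0.35, -0.3, -0.2, -0.15, -0.1]
w4_candidates = [1.5, 1.7, 2.2, 2.5, 3.0]
actions = [(c, w) for c in c4_candidates for w in w4_candidates]   # 7 * 5 = 35 pairs

Q = np.zeros(len(actions))          # single-state Q table over the 35 action pairs
gamma, eta = 0.8, 0.95              # discount and learning-rate values from the text

def run_episode(c4, w4):
    """Placeholder: run the altitude-control simulation with the tuned output membership
    function and return a reward, e.g. the negative accumulated tracking error.
    The expression below is a dummy stand-in so that the sketch runs."""
    return -(abs(c4 + 0.1) + abs(w4 - 1.7))

for _ in range(3000):
    a = np.random.randint(len(actions)) if np.random.rand() < 0.2 else int(np.argmax(Q))
    r = run_episode(*actions[a])
    # Single-state form of the Eq. (5) update: the "next state" is the same state.
    Q[a] = (1.0 - eta) * Q[a] + eta * (r + gamma * np.max(Q))

best_c4, best_w4 = actions[int(np.argmax(Q))]
```

The learned vector Q plays the role of Diagram 2, and the argmax at the end corresponds to selecting the best-performing action pair.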


There are 5 × 7 = 35 candidate action pairs in total. After a 3000 s learning period, the Q-values of 4 randomly selected action pairs are illustrated in Figure 3.

Figure 3 Q-value learning curve

Diagram 2 shows the Q-values after the learning period:
  (c4 \ w4)    1.5       1.7       2.2       2.5       3
  -0.5        -0.210    -0.217    -0.049    -0.169    -0.097
  -0.4        -0.154    -0.08     -0.199    -0.098    -0.151
  -0.35       -0.103    -0.104    -0.197    -0.152     0.011
  -0.3        -0.224    -0.102    -0.113    -0.067    -0.093
  -0.2        -0.06     -0.066    -0.12     -0.04     -0.097
  -0.15       -0.102    -0.009    -0.012    -0.023    -0.025
  -0.1         0.0119    0.113    -0.129     0.024     0.096

Diagram 2: Q-value table after learning
Obviously, the Q-value of the action pair (-0.1, 1.7) is the largest, giving the best performance. As a result, we adopt this action pair; the altitude control result is shown in Figure 4.

Figure 4 Altitude control performance (red: (-0.5, 1.7), blue: (-0.4, 1.5), green: (-0.1, 1.7), black: (-0.15, 2.2))

Similarly, we design a lateral controller to control the UAV roll angle. The tracking performance is illustrated in Figure 5:

Figure 5 Roll Angle Control

V. MULTI-UAV FORMATION MANEUVERING CONTROL AND SIMULATION
Multiple UAVs flying in formation need centralized decision making and decentralized control. Normally, a UAV receives commands from the ground station and arranges its mission with its onboard computing equipment. However, when the UAVs enter areas where communications are jammed, the formation needs to carry out coordinated control and decision making autonomously.
The simulation platform, based on Matlab and Simulink, is shown in Figure 6:

Figure 6 Simulation Platform

A. Coordinated Control and Simulation

The initial UAV flight conditions are:
  P1 = (x1, y1, z1, ψ1) = (800, 1000, 1000, 235°)
  P2 = (x2, y2, z2, ψ2) = (800, 990, 1000, 235°)
  P3 = (x3, y3, z3, ψ3) = (800, 980, 1000, 235°)
  P4 = (x4, y4, z4, ψ4) = (810, 980, 1000, 235°)
  P5 = (x5, y5, z5, ψ5) = (820, 980, 1000, 235°)

The distance between adjacent UAVs is 10 m at the initial point. After the leader receives the turn command, it issues the command and all UAVs in the formation initiate a turning maneuver. The controller keeps the distance between the UAVs stable. The states of the UAVs when passing the set waypoints are:

  d12 = 12.4 m, d23 = 11.4 m, d34 = 11.7 m, d45 = 10.2 m
  ψ1 = 268.9°, ψ2 = 270.9°, ψ3 = 270.2°, ψ4 = 270.5°, ψ5 = 270.5°

Figure 7 Coordinated maneuvering simulation
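As a quick check of the stated 10 m initial spacing, the distances dis_ij of equation (1) can be evaluated directly from the initial positions; the snippet below is only a verification aid.

```python
import numpy as np

# Initial positions (x, y, z) of the five UAVs from subsection A.
P = np.array([[800, 1000, 1000],
              [800,  990, 1000],
              [800,  980, 1000],
              [810,  980, 1000],
              [820,  980, 1000]], dtype=float)

# Adjacent distances dis_ij from Eq. (1); each evaluates to 10.0 m here.
adjacent = [np.linalg.norm(P[i] - P[i + 1]) for i in range(len(P) - 1)]
print(adjacent)   # [10.0, 10.0, 10.0, 10.0]
```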

B. Control Strategy of the Tight-to-Loose Formation Change

The spacing of the formation is determined by the task the UAVs are performing. Multiple UAVs need to keep a loose formation when passing through the battlefield or cruising at high speed, so adjacent UAVs keep large relative distances x_rij, y_rij, h_rij. Nevertheless, when the UAVs cover each other or carry out a mission to attack ground targets, smaller distances between adjacent UAVs are needed. After the leader issues the formation command (x_com_ij, y_com_ij, h_com_ij), the followers adjust their yaw angles to indirectly adjust the relative positions.

The initial simulation conditions are:
  P1 = (x1, y1, z1, ψ1) = (800, 1000, 1000, 45°)
  P2 = (x2, y2, z2, ψ2) = (900, 1000, 1000, 45°)
  P3 = (x3, y3, z3, ψ3) = (1000, 1000, 1000, 45°)
  P4 = (x4, y4, z4, ψ4) = (1000, 900, 1000, 45°)
  P5 = (x5, y5, z5, ψ5) = (1000, 800, 1000, 45°)

If we fix the relative coordinate system on the No. 3 leader, then with the transformation of equation (2) the purpose of the formation control is to adjust the lateral distance from the original 50√2 m to the loose distance of 100 m.

The positions after the formation adjustment are as follows:
  P1 = (2824.1, 3733.2, 1000.5, 46.5°)
  P2 = (3102.5, 3590.1, 1000.4, 43.6°)
  P3 = (3418.3, 3417.8, 1000.4, 45.2°)
  P4 = (3590.4, 3102.2, 1000.7, 46.34°)
  P5 = (3733.3, 2823.7, 1000.1, 43.4°)

After control, the relative lateral distances between the UAVs are:
  X_r13 = 197.3 m, X_r23 = 102.4 m, X_r34 = 101.1 m, X_r35 = 198.5 m

Figure 8 Spacing Control Simulation

C. The Adjustment of Formation Shape

Shape adjustment is vital in UAV formation flight. When the UAVs are on a reconnaissance mission, the UAVs carrying reconnaissance equipment need to be on the outer side of the formation, while when attack mode is initiated, the reconnaissance UAVs need to maneuver to the inner part of the formation in order to be protected. The attacking UAVs can move to the outer part to provide cover and attack.

Take reversing the sequence of the UAVs as an example. The procedure is as follows:
1) The leader issues the formation change command;
2) The UAVs solve for the relative positions x_rij, y_rij, h_rij and adjust their altitudes h_i to achieve gradient (staggered) altitudes, which aims to avoid collision (see the sketch below);
3) When the gradient altitudes are achieved, the UAVs adjust their yaw angles and change their flight paths;
4) The UAVs harmonize their altitudes, keep making minor adjustments of the flight path, and achieve coordinated flight after the formation change.

The UAVs initially form a delta shape. The leader issues the command to exchange positions with the symmetric side. The simulation results show that, to avoid the danger of collision, the No. 1, No. 2 and No. 3 UAVs automatically climb 30 m, 20 m and 10 m respectively. While the altitude changes, the lateral distances and yaw angles are adjusted as well.

Figure 9 Simulation of inner position adjustment
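A minimal sketch of the gradient-altitude assignment in step 2) is given below, assuming a fixed 10 m step per UAV; it merely reproduces the 30/20/10 m offsets reported for UAVs No. 1 to No. 3 and is not the paper's decision logic.

```python
def gradient_altitudes(base_altitude_m, n_climbing, step_m=10.0):
    """Assign staggered altitude commands for the sequence-change procedure.
    The fixed 10 m step and the 'lower index climbs more' ordering are assumptions."""
    # UAV k (k = 1 .. n_climbing) climbs (n_climbing - k + 1) * step_m above the base altitude.
    return {k: base_altitude_m + (n_climbing - k + 1) * step_m
            for k in range(1, n_climbing + 1)}

# Example: three climbing UAVs at a 1000 m base give {1: 1030.0, 2: 1020.0, 3: 1010.0}.
print(gradient_altitudes(1000.0, 3))
```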




D. Emergency Maneuvering Simulation

Five UAVs initially form a delta formation when suddenly the No. 2 UAV develops a controller fault and can no longer keep stable, straight flight. The distance between the No. 1 and No. 2 UAVs shrinks drastically. When this collision danger occurs, the No. 1 UAV rapidly and autonomously climbs to 1010 m to avoid the impending impact. Once the danger has passed, the No. 1 UAV descends to 1000 m and takes over the No. 2 UAV's place in the formation.

Figure 10 Emergency Maneuvering simulation

This simulation result shows the emergency maneuvering of a UAV formation when an unpredictable danger occurs.
VI. CONCLUSION
We analyzed the relative motion between UAVs in formation and designed altitude and roll controllers based on the Q-learning method. Q-learning is a useful way to adjust the parameters of the fuzzy controller and to reduce the steady-state error during control. We also used the designed controller to simulate a series of UAV formation flight conditions, which demonstrates the validity of the controller and of the decision and control strategy. Future work will focus on more detailed research into the different conditions of UAV formation flight.

References
[1] Zong Lingpei, Xie Fan, Qin Shiyin. UAV Intelligent Optimal Control of Formation Flight Based on MAS. Chinese Journal of Aeronautics (Chinese), 29(5): 1326-1333.
[2] He Zhen, Lu Yuping. Decentralized Design Method of UAV Formation Flight Shape Keeping Controller. Journal of Aeronautics (Chinese), 29 (Appendix Edition): S55-S60.
[3] Fabrizio Giulietti, Mario Innocenti, Marcello Napolitano, Lorenzo Pollini. Dynamic and control issues of formation flight. Aerospace Science and Technology, vol. 9, 2005: 65-71.
[4] Giampiero Campa, Yu Gu, Brad Seanor, Marcello R. Napolitano, Lorenzo Pollini, Mario L. Fravolini. Design and flight-testing of non-linear formation control laws. Control Engineering Practice, 15(2007): 1077-1092.
[5] Sahjendra Singh, Phil Chandler, Corey Schumacher, Siva Banda, Meir Pachter. Nonlinear Adaptive Close Formation Control of Unmanned Aerial Vehicles. Dynamics and Control, vol. 10, 2000: 179-194.
[6] Walter M. van Buijtenen, Gerard Schram, Robert Babuska, Henk B. Verbruggen. Adaptive Fuzzy Control of Satellite Attitude by Reinforcement Learning. IEEE Transactions on Fuzzy Systems, vol. 6(2), 1998: 185-194.
[7] Xiaohui Dai, Chi-Kwong Li, A. B. Rad. An Approach to Tune Fuzzy Controllers Based on Reinforcement Learning for Autonomous Vehicle Control. IEEE Transactions on Intelligent Transportation Systems, vol. 6(3), 2005: 285-293.
[8] N. Pappa, S. Rama Krishnan. Autotuning of Fuzzy Inference System with RL. Proceedings of the 2007 American Control Conference, New York City, USA, July 11-13, 2007.
[9] Meng Joo Er, Chang Deng. Online Tuning of Fuzzy Inference Systems Using Dynamic Fuzzy Q-Learning. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 34(3), 2004: 1478-1489.
[10] Sefer Kurnaz, Omer Cetin, Okyay Kaynak. Fuzzy Logic Based Approach to Design of Flight Control and Navigation Tasks for Autonomous Unmanned Aerial Vehicles. Journal of Intelligent and Robotic Systems.
[11] Andrew W. Proud, Meir Pachter, John J. D'Azzo. Close Formation Flight Control. Proceedings of the AIAA Guidance, Navigation, and Control Conference, AIAA-99-4207, Portland, OR, August 1999: 1231-1246.
[12] Gao Yang, Chen Shifu, Lu Xing. Summary of Research on Reinforcement Learning. ACTA AUTOMATICA SINICA, 30(1), 2004: 86-100.

