Vous êtes sur la page 1sur 6

Decision Making Framework for Autonomous Vehicle Navigation

Augie Widyotriatmo1 and Keum-Shik Hong2


School of Mechanical Engineering, Pusan National University, Busan, Korea (Tel : 1+82-51-510-1481, 2+82-51-510-2454; E-mails: {1augie, 2kshong}@pusan.ac.kr) Abstract: A key trait of an autonomous vehicle is the ability to handle multiple objectives, such as planning to guide the autonomous vehicle from an initial to a goal configuration, avoiding obstacles, and decision making to choose an optimal action policy. Moreover, the autonomous vehicle should act in ways that are robust to sorts of uncertainties, such as wheel slip, sensors affected by noise, obstacle move unpredictably, etc. In this paper we propose a decision making framework for autonomous vehicle conducting a nonholonomic motion planner, localization, obstacle avoidance, and also dealing with the uncertainties. The decision making framework manages the safety and task related assignment by adopting the partially observable Markov decision process model. We predict N-steps belief states from the action sequence candidate and use them as inputs of a proposed reward value function. The dimensionality problem in the searching of an action sequence from the action space (linear and rotational velocities) is simplified by applying a nonholonomic motion planner as a reference. The simulation results show the decision path resulting from decision making framework depends on the initial setting of belief state of its position and orientation, and also the determination of uncertainties of the sensors and the actuators of the vehicle. Keywords: autonomous vehicle, nonholonomic system, motion planning, obstacle avoidance, decision making
1, 2

1. INTRODUCTION
In order to be able to do a particular task, an autonomous vehicle (agent) should be able to make a planning, execute the plan, and also have to decide when it should reinitialize the plan. Most approaches in autonomous vehicle research are to produce a collision-free path, and then during execution other methodologies are employed to accommodate the unexpected changes of the environment. One problem that we should consider in the most autonomous vehicle is that it deals with nonholonomic constraint. The nonholonomic constraint is characterized by the system has less controls than configuration variables. Its configuration can be represented by position (x, y) and orientation while its controls are linear v and angular space . This characteristics make such a nonholonomic agent cannot move sidewise. The nonholonomic motion planning may be considered as the problem of planning open loop control [13, 19, 20] and the problem of feedback control [8, 10, 22]. The open loop strategies are generally able to guide the system from the initial to the final configuration, but the approach is not robust. While closed loop strategies are potentially more robust, but the dynamics of the closed loop system may show oscillatory response, which is not at all necessary to reach the desired final configuration. However, it is possible to use open loop as well as close loop strategies in an iterative design method. The robustness properties of this method have been shown by Lucibello and Oriolo in [18]. The obstacle avoidance problem adds one level difficulty of autonomous vehicle with nonholonomic constraint. In this paper, we assume that the map and the geometric addressing are already provided. The remaining problem is the unmodeled obstacles that have not included in the geometric map. Several techniques

for obstacle avoidance problem were proposed, such as potential field method [17] and vector field histogram (VFH) [12], but the approaches are not included the nonholonomic constraint. The methods for obstacle avoidance where the nonholonomic constraint is taken into account appeared in curvature velocity method (CVM) [21], lane curvature method [16], dynamic window approach (DWA) [6], VFH+ [23] etc. All the above mentioned obstacle avoidance methods stick on the initial planned path to the destination point obtained by the global path planning module and do not reinitialized plan no matter the changes that occur in the environment. Recently, some approaches that integrate the path planning and obstacle avoidance modules have been proposed such as in [5]. Another feature of intelligence of an agent is to deal robustly with uncertainties. Several approaches are presented to overcome the uncertainty, for example as described in [22] which is lead to Markov Decision Process (MDP) and the extensions of it: the Partially Observable Markov Decision Process (POMDP). The POMDP explicitly models navigation uncertainty including sensors and actuators uncertainties and approximate knowledge of the current configuration. Many works of this approach are to find the best actions sequence to minimize the uncertainty of localization [3, 15]. The most prominent approach to calculate the most optimal sequence actions (policy) is by using the value iteration method. This method induces a complex computation [1]. Another approach instead of POMDP is by using the model predictive control (MPC) as a planning framework [4]. The representation of uncertainty is estimated in the frame of extended Kalman filter (EKF). The one step action is chosen based on N-steps optimal control problem. In this paper, we adopt POMDP model as a decision making model for autonomous vehicle with predictive

framework. The belief states resulting from N-steps prediction are used to determine the best action policy. The sequential actions are searched among neighbors of nonholonomic system planner. The N-steps belief state prediction uses the extended Kalman filter (EKF) and it is implicitly embedded in the localization framework. Accordingly, the proposed method integrates the localization, planning, and obstacle avoidance considering the uncertainty of actuator and sensors; and the nonholonomic constraint of the vehicle.

execute for any possible state that it might occupy. The value V of executing a policy is equal to discounted sum of expected rewards, V (bk ) = E j R( sk , ak ) (4) where [0,1] is the discount factor that determines how important are the future rewards the agent will receive. In POMDP, the k step optimal value function is formulated as follows:
Vk * (bk ) = max R (bk , ak ) + T (bk , a k , bk +1 )Vk +1 (bk +1 ) . (5) a A bk +1

2. PARTIALLY OBSERVABLE MARKOV DECISION PROCESS


The decision making framework in this paper adopts the POMDP model. POMDP address the problem of the agent to act optimally in the partially observable environment. Formally the model is consisted of the finite sets of states S , actions A , observations O , the state transition function T (s k , ak , s k +1 ) which provides the probability of states s k +1 at time k + 1 when the agent starts from state s k and takes an action ak at time k , Reward function R which provides the expected reward gain by agent taking an action ak when it is in state sk. The belief state bk of an agent is a probability distribution over the set of environment states S . A POMDP model has two components which are state estimator component that performs the belief state update and the policy component that solves the POMDP. Under the Markov assumption, the belief state bk is defined as follows: bk (s k ) = p(s k | d 0k , m) (1) where m is the model of the world (e.g. map) and d 0k is the sequence data related to observations (e.g. camera images, laser range scanner) and action data (e.g. motor control or odometer readings) up to time k, d 0k = o0 , a0 , o1 , a1 ,, ak 1 , ok . (2) The posterior belief state is obtained by applying Bayes rule, the theorem of total probability, and exploiting the Markov assumption as follows
bk +1 = p( s k +1 | o0 ,, ak , ok +1 , m) = k p(ok +1 | o0 ,..., ak , s k , m) p( s k +1 | o0 ,..., ak , m) = k p(ok +1 | s k +1 , m) p ( s k +1 | o0 ,..., a k , m) = k (ok +1 | s k +1 , m) p ( s k +1 | o0 ,..., ak , s k , m) p ( s k | o0 ,..., a k 1 , ok , m) dsk = k (ok +1 | s k +1 , m) p ( s k +1 | ak , s k , m) p( s k | o0 ,..., a k 1 , ok , m)dsk = k (ok +1 | s k +1 , m) p ( s k +1 | ak , s k , m) p (bk ( s k )) dsk

The reward and transition function, R and T respectively, is defined as a function of belief state instead of a single state, since in POMDP, the true state is not completely known. Thus, they can be defined as R (bk , ak ) = bk ( sk )R (sk , ak ) and

sS

T (bk , ak , bk +1 ) = P (bk +1 | ak , bk ) , respectively. The iterative construction of the optimal value function over the set of all possible belief state is computationally expensive either in finite or infinite horizon. Therefore, many approaches proposed to simplify the computation by approximating the value function such as assuming a fully observable MDP [3], point-based value iteration [14] where the POMDP is solved for a sampled set of belief points, etc. In this paper, we use the extended Kalman filter (EKF) for belief prediction and update. The preferential of EKF is based on the efficiency of the computational effort since the EKF utilizes low dimensional matrix operations rather than iterations over large state spaces.

3. FORMULATION OF DECISION MAKING FOR NAVIGATION


3.1 Prediction and update of belief state The discrete time model of an autonomous vehicle with position in x-, y-coordinate and orientation as a state vector x is defined as follows: x(k + 1) = A(x(k ), u(k ), w u (k )) (6) where,
u(k ) = [v(k ) (k ) ] with v and are linear and rotational velocity, respectively, at time k; and
T

x(k ) = [x, y, ]T

at

time

k;

where k is a normalizing factor and is equal to the total probability of perceiving the observation o given in the previous belief state of the agent and the action it executed. The POMDP can be interpreted as a process to obtain optimal action policy * (bk ) over agents belief state for each time step. In other words, a policy is a mapping that specifies an action sequence that the agent should

cos sin 0 A ( x) = is the simplified 0 0 1 (3) kinematics model of autonomous vehicle. w u (k ) is the vector of the linear and rotational velocities error. The prediction step of EKF algorithm is as follows:
x (k + 1) = A(x(k ), u(k ),0) P (k + 1) = F (k )P(k )F(k )T + G (k )Q(k )G (k )T
^
^_ ^

(7)

where x(k ) and P(k ) are the state and error covariance matrix estimate respectively at time k (after

the update). x (k + 1) and P (k + 1) are the predicted state estimate and covariance matrix at time k+1 (before update). Q is the covariance matrices of the velocity error. F(k) and G(k) are Jacobians of A with respect to x(k) and wu(k) evaluated at k, respectively, which is obtained as follows: 1 0 v (k )T sin T cos 0 0 F (k ) = 0 1 v(k )T cos , G (k ) = T sin 0 0 1 0 T (8) where T is the range of time sample k. The observation z at time k+1 is,
z (k + 1) = H(x(k ), u(k ), v z (k )) (9) where H is the observation function and the vz is the noise measurement. In this paper we assume that the state of the system is fully observable using the absolute localization sensors (e.g. laser sensor for navigation). The update step of EKF is as follows:
x(k + 1) = x (k + 1) + K ( k + 1)(z (k + 1) H( k + 1) x ( k + 1)) P ( k + 1) = P ( k + 1) K ( k + 1)S( k + 1)K ( k + 1)T
^ ^_ ^_

^_

P ( k + 1) = F ( k ) P( k )F ( k )T + G (k )Q( k )G (k )T x ( k + 1) = A( x(k ), u( k ),0) S (k + 1) = J ( k + 1)P (k + 1) J (k + 1)T + R (k ) K (k + 1) = P ( k + 1)J ( k + 1)T S(k + 1) 1 P ( k + 1) = P (k + 1) K (k + 1)S (k + 1)K (k + 1)T x( k + 1) = x ( k + 1) Equation (12) shows that the belief state is not updated with a measurement. In this phase, we expand the next step belief states from the previous belief state. At each step prediction, we have a predicted area (i.e. 2 2 ellipse area representing the variance xx , yy of
^ ^_ ^_

. (12)

position in x-, y-coordinate, respectively) that represents the possible location of the autonomous vehicle. By using the predicted area, we formulate the decision rule with respect to the goal and obstacles position. In each step prediction we may predict the belief state for each sequence of policy (i.e. sequence of control inputs u(k) for k=0, , N1). In this paper we proposed the reward value R k as follows:
+1 = - 1 if 0 if
2

(10) where
K (k + 1) = P (k + 1)J (k + 1)T S(k + 1) 1 S(k + 1) = J (k + 1)P (k + 1)J (k + 1)T + R and R is the covariance matrices of state error measurement. J(k+1) is the Jacobian of H(k+1)

x j x goal
2

(11)

R k

^ ^ ^ ^ x j x obs y j y obs + 1 2 2

xx

yy

evaluated at x(k + 1) . Since we assume that the localization sensor provides all the states, thus in this paper we obtain the matrix J and H as an identity matrix. 3.2 Decision making framework based on N-steps prediction of belief state In the following, we present the strategy of decision making framework for autonomous vehicle. One objective of the decision making for autonomous vehicle navigation is to move effectively from the initial to the goal configuration while also to avoid unmodeled obstacles. To create the effective movement, it is necessary for the autonomous vehicle to predict several steps forward before it decides the path that will be followed. It is possible to use the belief state prediction equation (3) for the multiple steps prediction, under the assumption that the features of the observations are static in N-step prediction. The N-steps of belief states are predicted based the EKF prediction in (8). It can be written as follows:

others

(13) where is the small acceptable value for the state


x j at step j ( j = k + 1,.., N ) in the area of goal position
2 2 x goal . xx and yy are the variance of position in ^

x-y coordinate, respectively, which are the two first diagonal of the covariance P(k + 1) . x obs and y obs are the position of feature obstacles in the x-y coordinate. The reward value function (13) gives reward +1 for any part of action sequence that achieves the acceptable configuration of goal position. It gives reward -1 to any part of action sequence which the belief state of its action arrives in the area occupied by obstacles. If the prediction belief state arrives at neither obstacles nor goal point area then the reward will be given 0. 3.2 Searching for action policy The value iteration for the determination of the best action sequence may be very large. In this paper, we use the nonholonomic motion algorithm based on shortest path optimization [9] of nonholonomic planner as the reference and our reward value function (13) for the searching of action policy.
^ ^

17

16

15

searchDirection( (1)) reward=NstepPredictionBeliefStateEKF(v, ) end whichDirection(l1, l2) executeAction(v (1), (1)) z = getObservation() belief=updateBelief(z, v (1), (1)) end The prediction of belief states, the optimal path, and the decision path of the selected action are shown in Fig. 1. Fig. 2 shows the path from nonholonomic planner that is not taken into account the obstacles and the path that is obtained from the proposed decision making algorithm. In Fig. 3, we can see the time length modification of rotational velocity of the nonholonomic planner in the proposed decision making algorithm. At the first time sequence, the decision making algorithm modifies the time length longer at the = 1. It causes the orientation angle produced by decision making algorithm is larger than the angle produced by nonholonomic planner. The remaining time sequences show that the decision making algorithm steers the agent to avoid the obstacles and to achieve the goal.
41

14

y (m)

13

12

11

10

10

11

12

13

14

15

x (m)

Fig. 1 Nonholonomic planner (dashed), uncertainty prediction (dot), and the execution of the first sequence decision (solid) The nonholonomic planner assumes that the linear and rotational velocities are in the domain of v {-1, 1} and {-1, 0, 1}, respectively. The path produced from this algorithm is the set of arcs of circle and straight paths. The sequence of actions (linear and rotational velocity, v and respectively) and their time length is a bang-bang solution from the Pontryagins Maximum Principle. We sample this sequence over determined time steps for the purpose of N-steps prediction of belief states. For the determined prediction, we examine whether the nonholonomic planner is a collision-free path or not. From this point, we look for other control candidates by modifying of the first sequence action. We modify the time length of the first sequence steering input, which will change the angle direction, until we find the prediction belief states that are free from obstacles. If there are two action candidates that both are not occupied any obstacles, then the two action candidates are compared by choosing the most minimal norm between the state prediction of each action with the goal point. After we find the action sequence that is collision free, we execute the first part of the action sequence. The localization sensor is used to update the belief state after the first action sequence is executed. The overall algorithm is described in algorithm 1. Algorithm 1 N = number of N-step prediction initial belief state while not reach the goal point [v,]=NonholonomicSequence (initialpoint,goal point) reward =NstepPredictionBeliefStateEKF(v,) while reward < 0

40

39

38

y (m)

37

36

35

34

33

32 -22

-21

-20

-19

-18

-17

-16

-15

-14

-13

x (m)

Fig. 2 Path of planner and decision making


1.5

Linear velocity (m/s)a Rotational velocity (rad/s)

0.5

-0.5

-1

Time (sec)

Fig. 3 Linear velocity (solid), rotational velocity of nonholonomic planner (dot), and rotational velocity of decision making algorithm (dashed)

4. SIMULATION RESULTS
Simulation of the proposed method that conducts nonholonomic planner and N-steps prediction of belief states is developed using MATLAB 7.1 software. The code of nonholonomic planner developed in [9] is used. 4.1 Obstacle avoidance In the first simulation, the initial point is set at (7, 13, 0) and the goal point at (12, 15, /2) where some obstacle points are placed along the way. The step projection of belief states is set to be 3 steps (1 second each step) projection. At the first trial, we set the standard deviation of the position uncertainty of the agent (i.e. the belief state) at the initial point, also the sensors and actuators uncertainty are 0.1 (Fig. 4.a). The path that is produced from decision making algorithm by using this setting makes the agent can go through the narrow passage. At the second trial, we set the initial belief state, the sensors, and actuators uncertainty 0.2. At the first two steps, the agent follows the optimal nonholonomic path (Fig. 4.b). And then, the agent is changing its direction because the belief states projected are occupied by obstacles. The agent finds other belief state prediction that is free from obstacles (wider passage). The cusp path in this point is created from the nonholonomic planner with the current position as the initial point. a)
18

4.2 Task achievement In the working area, the autonomous vehicle usually has to achieve the place that is nearby the obstacles. The second scenario, we conduct the initial point (10, 10, /2) and goal point (12.5, 16.5, 0) where the goal point is nearby obstacles. By using (13), the reward +1 will be given to the agent if the agent gets into the goal area. Thus the reward 1 that is given by the N-step belief state prediction will be neutralized. In the second simulation, we show in Fig. 4 that the autonomous vehicle is able to achieve the goal point that is located nearby the obstacles.

5. CONCLUSION
In this work we propose a decision making framework for navigation of autonomous vehicle. The simulation results show that the decision making behavior of the agent depends on the setting of initial belief state of the position and orientation of the vehicle, and also the determination of sensors and actuators uncertainties. The proposed decision making framework performs obstacle avoidance and operational task achievement with respect to the nonholonomic constraint and considering the uncertainties of the autonomous vehicle. The method can be conducted further with the higher level architecture of the agent. For an example, if the search of action sequence does not give any solution then the higher planner should creates a new waypoint from the known map.

17

16

y (m)

18
15

14

17

13

16

12
18

10

b)
17

16

15

y (m)

14

13

12 7 8 9 10 11 12 13 14

y (m)

x (m)

11

12

13

14

15

14

13

12

11

10

10

11

x (m)

x (m)

12

13

14

Fig. 4 The path of decision making algorithm with the standard deviation of initial belief state and sensors and actuators uncertainties: a) 0.1. b) 0.2

Fig. 5 Task achievement of the autonomous vehicle where the goal point is nearby the obstacles

Future works involves the application of N-steps prediction belief states in real time, integrating with the map building, and the high level planner, also integrating with the prediction of moving obstacles to perform efficient and effective obstacle avoidance in dynamic environment. The research for the autonomous vehicle to determine its own belief state is also needed. Furthermore, the application of N-steps prediction belief state to multi agent and cooperation will be examined.

ACKNOWLEDGMENT
This work was supported by the Regional Research Universities Program (Research Center for Logistics Information Technology, LIT) granted by the Ministry of Education, Science and Technology, Korea.

REFERENCES
[1] A. Brooks, A. Makarenko, S. Williams, H. Duurant-Whyte, Parametric POMDPs for planning in continuous space, Robotics and Autonomous Systems, vol. 54, pp. 887-897, 2006. [2] A. Foka and P. Trahanias, Real time hierarchical POMDPs for autonomous robot navigation Robotics and Autonomous Systems, vol. 55, pp. 561-571, 2007. [3] A. R. Cassandra, L. P. Kaelbling, J. A. Kurien, Acting under uncertainty: Discrete Bayesian models for mobile-robot navigation, in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, Osaka, Japan, pp. 963-972, 1996. [4] C. Leung, S. Huang, N. Kwok, G. Dissanayake, "Planning under uncertainty using model predictive control for information gathering," Robotics and Autonomous Systems, vol. 54, pp. 898-910, 2006. [5] C. Stachniss and W. Burgard An integrated approach to goal-directed obstacle avoidance under dynamic constraints for dynamic environments. in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots & Systems (IROS), 2002. [6] D. Fox, W. Burgard, and S. Thrun. The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine, vol. 4, pp. 23-33, 1997. [7] D. A. Lizarraga, P. Morin, and C. Samson, Nonrobustness of continuous homogeneous stabilizers for affine control systems, in Proceedings of the 38th IEEE Conference on Decision and Control, vol. 1, pp. 855 860, 1999. [8] E. Valtolina, and A. Astolfi, Local robust regulation of chained systems, Systems and Control Letters, vol. 49, pp. 231-238, 2003. [9] H. Wang, Y. Chen, and P. Soures, An efficient geometric algorithm to compute time-optimal trajectories for car-like robot, in Proceedings of the 46th IEEE Conference on Decision and Control, pp. 5383-5388, 2007. [10] I. Kolmanovsky and N. H. McClamroch, Developments in nonholonomic control problems,

IEEE Control System Magazine, vol. 15, pp. 20-36, 1995. [11] IEEE RAS Technical Committee on Algorithms for Planning and Control of Robot Motion Web Site http://robotics.cs.umass.edu/tc-apc/Main/Time-optimal TrajectoriesForACar-likeRobot. [12] J. Borenstein and Y. Koren, The Vector Field Histogram - fast obstacle avoidance for mobile robots. IEEE Journal of Robotics and Automation, vol. 7, pp. 278-288, 1991. [13] J-P. Laumond Ed., Robot Motion Planning and Control, Springer-Verlag, Berlin, 1999. [14] M. T. J. Spaan and N. Vlassis, A point-based POMDP algorithm for robot planning, in the Proceedings of 2004 IEEE International Conference on Robotics and Automation (ICRA), vol. 3, pp. 2399-2404, 2004. [15] N. Roy, W. Burgard, D. Fox, and S. Thrun. Coastal navigation: Robot navigation under uncertainty in dynamic environments. in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), vol. 1, pp. 35-40, Detroit, MI, 1999. [16] N. Y. Ko, R. Simmons, The Lane-Curvature Method for Local Obstacle Avoidance, in Proceedings of the 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS98), vol. 3, pp. 1615-1621, Victoria, BC, 1998. [17] O. Khatib, "Real-time obstacle avoidance for manipulators and mobile robots," International Journal of Robotics Research, vol. 5, pp. 90-98, 1986. [18] P. Lucibello, and G. Oriolo, Robust stabilization via iterative state steering with an application to chained form systems, Automatica, vol. 37, pp. 71-79, 2001 [19] P. Soures and J-P. Laumond, Shortest paths synthesis for a car-like robot, IEEE Transactions on Automatic Control, vol. 41, no. 5, pp. 672-688, 1996. [20] R. M. Murray and S. S. Sastry, Nonholonomic motion planning: Steering using sinusoids, IEEE Transactions on Automatic Control, vol. 38, pp. 700-716, 1993. [21] R. Simmons, The Curvature Velocity Method for Local Obstacle Avoidance, in Proceedings of the IEEE International Conference on Robotics and Automation, vol. 4, pp. 3375-3382, Minneapolis, MN, 1996. [22] S. Thrun. "Probabilistic algorithms in robotics," AI Magazine, vol. 21, pp.93-109, 2000. [23] I. Ulrich and J. Borenstein, VFH+: Reliable obstacle avoidance for fast mobile robots, In Proceedings of the IEEE International Conference on Robotics & Automation (ICRA), pp. 1572-1577, 1998.