
WCDMA Downlink Load Sharing with Dynamic Control of Soft Handover Parameters

Ridha Nasri, Zwi Altman, Herve Dubreil, Zakaria Nouir


France Telecom Research and Development, RESA-NET, 38-40 rue du General Leclerc, 92794 Issy les Moulineaux Cedex 9, France
{ridha.nasri, zwi.altman, herve.dubreil, zakaria.nouir}@francetelecom.com
Abstract- In this article, we address the problem of auto-tuning of soft handover (SHO) parameters in WCDMA networks. The auto-tuning process uses a fuzzy Q-learning controller to adapt SHO parameters to varying network situations such as traffic fluctuation. The fuzzy Q-learning controller combines fuzzy logic theory with a reinforcement learning method. The cooperation of these two mechanisms simplifies the online optimization of the fuzzy logic rules and consequently leads to a better online SHO parameterization of each base station in the network. The proposed scheme improves the system capacity compared to a classical network with fixed parameters, balances the load between base stations and minimizes human intervention in network management and optimization tasks.

Index Terms-WCDMA networks, Soft handover, fuzzy Q-learning, auto-tuning.


I. INTRODUCTION

In WCDMA networks, soft handover (SHO) is an important mechanism that contributes to the quality of service (QoS), since it increases capacity and coverage and provides smooth user mobility. However, when the SHO overhead (the percentage of mobiles in SHO situation) exceeds a certain threshold (e.g. 45%), SHO becomes a handicap for the downlink capacity: each new link establishment increases interference and decreases the available base station (BS) power. Careful dynamic online setting of the SHO parameters, namely Hysteresis_event1A and Hysteresis_event1B (according to 3GPP specifications [1]), is then required and can be particularly beneficial in 3G CDMA networks. In general, traffic fluctuation and the unpredictability of the mobile environment render a fixed parameter setting inaccurate and insufficient. Online optimization of such parameters is therefore a promising way to adapt the network to traffic variations. By dynamically adjusting the SHO parameters, a better load balance between BSs can be achieved. As a consequence, the capacity and the coverage of the system may increase and more users can be accommodated with good quality. A fuzzy logic controller (FLC) combined with the Q-learning algorithm makes it possible to perform this auto-tuning quickly and efficiently.
* This work has been carried out in the framework of the European CELTIC initiative, and has been partially funded by the French Ministry of Industry.

FLCs are used in various domains of engineering and are particularly well suited to the control of radio resource management functions [3, 4], owing to their simplicity and their ability to model human reasoning: fuzzy logic control translates human linguistic rules into simple mathematical equations. The Q-learning algorithm is used to dynamically optimize the fuzzy rules and to adapt the controller to any network situation. It includes a learning phase in which the FLC learns what to do, namely how to map network states (cell load, blocking rate, etc.) into actions (changes of network parameters) so as to maximize a long-term expected cumulative reward. The reward function depends on quality indicators provided by the network, such as blocking and dropping rates. The paper is organized as follows: Section II briefly describes the system model, Section III highlights the general concept of the fuzzy Q-learning (FQL) controller, simulation results are presented in Section IV, and Section V concludes the paper.
II. SYSTEM MODEL

A WCDMA radio access network typically consists of a set of base stations (node Bs) governed by a radio network controller (RNC). Each BS covers a geographical area (called a cell or sector) and serves an interference-limited capacity defined by the number of mobiles in each service class with a target quality of service (e.g. blocking and dropping rates, Eb/No). Continuous coverage across two or more BS service areas is achieved by the SHO (macro-diversity) mechanism, which is a seamless transfer of a call from one BS to another. This inter-cellular call transfer is governed by a set of parameters, among which we cite Hysteresis_event1A, Hysteresis_event1B and the Active Set Size [1, 2]. Briefly, and according to the 3GPP specification, the SHO algorithm proceeds as follows: when the mobile-measured Received Signal Strength (RSS) from a BS is higher than the best-cell RSS minus Hysteresis_event1A, a new radio link is established between that BS and the mobile. Conversely, when the RSS from a serving BS becomes lower than the best-cell RSS minus Hysteresis_event1B, the corresponding radio link is released from the active set of the mobile in question. Other parameters involved in the SHO algorithm (Hysteresis_event1C for example) are out of the scope of this paper; for more details about the SHO algorithm, readers can refer to [1] or [2]. Each BS has its own parameters, and a uniform setting of all BSs certainly leads to a sub-optimal parameterization.


Each BS should be optimally parameterized with respect to the other BSs. The SHO hysteresis parameters of one BS strongly impact its own radio load and that of its neighbors; consequently, these parameters influence the downlink capacity and the quality of service of the network. Increasing the hysteresis values increases the downlink load and decreases the uplink load. However, a low value of these parameters may increase the risk of call dropping and may cause coverage holes. In the next section, we assume that the network can be modeled as a controllable non-linear stochastic system that receives SHO parameters as input and delivers network quality indicators as output, which in turn serve as input to the fuzzy Q-learning controller.
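To make the event 1A/1B logic above concrete, the following Python sketch updates a mobile's active set from measured pilot levels. It is a minimal illustration, not the paper's implementation: the function name, the data structures and the active-set size limit are assumptions, and the 4/6 dB defaults simply mirror the fixed setting used later for the classic network.

```python
def update_active_set(rss_dbm, active_set, hyst_1a_db=4.0, hyst_1b_db=6.0,
                      max_active_set_size=3):
    """Sketch of the 3GPP event 1A (add) / event 1B (drop) decisions.

    rss_dbm    : dict mapping cell id -> measured CPICH level in dBm
    active_set : set of cell ids currently serving the mobile
    """
    best_cell = max(rss_dbm, key=rss_dbm.get)
    best_rss = rss_dbm[best_cell]

    # Event 1B: release links falling below best-cell RSS minus Hysteresis_event1B.
    for cell in list(active_set):
        if rss_dbm.get(cell, float("-inf")) < best_rss - hyst_1b_db:
            active_set.discard(cell)

    # Event 1A: add links whose RSS is within Hysteresis_event1A of the best cell.
    for cell, rss in sorted(rss_dbm.items(), key=lambda kv: -kv[1]):
        if len(active_set) >= max_active_set_size:
            break
        if rss >= best_rss - hyst_1a_db:
            active_set.add(cell)

    return active_set


# Example: cell B is within 4 dB of the best cell A, cell C is not.
print(update_active_set({"A": -85.0, "B": -88.0, "C": -95.0}, {"A"}))
```

In the auto-tuned network described below, the controller of each BS would adjust its own hyst_1a_db and hyst_1b_db values online rather than keeping them fixed.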
III. GENERAL CONCEPT OF FUZZY Q-LEARNING CONTROLLER

Fig. 1. Fuzzy Q-learning controller for auto-tuning of SHO parameters.

The general concept of fuzzy Q-learning (FQL) based automatic control of soft handover is depicted in figure 1. At each time step, the controller observes the current network state x(t) and performs an action a(t). One step later, the controller receives from the network a reward signal r(t+T) and finds itself in a new state x(t+T), where T stands for the controlling period; the controller reactivity is the inverse of T. During its learning, the controller tends to find an optimal mapping, denoted π: x(t) → a(t) = π_t(x(t)), between system states and actions. The Q-learning algorithm continually estimates and updates the quality of the state-action pair, denoted Q_t(x(t), a(t)). The Q-function is mainly used to help the controller select the action that maximizes a utility function. The most commonly used utility function, and the one used throughout this paper, is the discounted cumulative future reward, expressed as:

V(x(t)) = \sum_{i=0}^{\infty} \gamma^{i}\, r(t+i+1)    (1)

where γ is the discount factor. It keeps the cumulative reward finite, and it also measures the relative importance of future rewards: setting γ close to 0 makes the controller optimize immediate rewards, whereas setting γ close to 1 means that the controller regards future rewards as almost as important as immediate ones. In our study, we set γ to 0.95.

In this paper, we use x(t) = (ρ̄_s(t), ρ̄_ns(t)) as the system state at time t, where ρ̄_s(t) and ρ̄_ns(t) stand respectively for the average local BS load and the average load of the neighboring cells. Quality indicators calculated during a period Tf (the filtering period) are filtered to avoid feeding strongly oscillating values to the controller. Mathematically, ρ̄_s(t) and ρ̄_ns(t) can be expressed as follows:

\bar{\rho}_s(t) = \frac{1}{T_f} \int_{0}^{T_f} \rho_s(t-\tau)\, d\tau    (2)

\bar{\rho}_{ns}(t) = \sum_{i \in NS(s)} \theta_{si}\, \bar{\rho}_i(t)    (3)

where ρ_s(t−τ) is the instantaneous load of BS s at time t−τ, NS(s) is the cell-neighboring set of BS s, and θ_si is the relation degree between BS s and BS i, defined as the flow of mobiles between BS s and BS i normalized by the total mobile flow involving BS s. The system quality that we want to improve here is the combination of the system blocking rate CBR(t) and dropping rate CDR(t); the reinforcement function should therefore contain these indicators or others related to them. The following reinforcement function satisfies the desired criteria:

r(t) = (CBR^{*} - CBR(t)) + \beta\, (CDR^{*} - CDR(t))    (4)

where CBR* and CDR* are respectively the operator target blocking rate and target dropping rate, and β is the mixing factor between the dropping and blocking rates, which depends on the operator preference. The dropping of a call is considered more penalizing than its blocking, so β should be larger than 1; in our study, β equals 4.

Initially, the quality function Q_0(x(0), a(0)) is set to zero for all state-action couples. During the learning process, it is updated periodically (eq. (5)) according to the reward r(t+T) received at time t+T and to the estimated value of the next state V(x(t+T)). The Q-function values are stored in a look-up table, which can be interpreted as the memory of the controller:

Q_{t+T}(x, a) = (1 - \kappa)\, Q_t(x, a) + \kappa\, \big(r(t+T) + \gamma\, V_t(x(t+T))\big)    (5)

where κ is the learning rate. Setting κ close to 1 means that the controller gives more weight to the new quality estimate than to the previous one, which can lead to a strongly oscillating system; on the other hand, setting κ close to 0 may delay the convergence of the learning process. Here we use κ equal to 0.1, as recommended by the Q-learning community [5].
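As a rough, self-contained sketch of how eqs. (4) and (5) fit together, the Python snippet below implements the reward and the tabular Q-update. The class name and the target values are illustrative assumptions, V(x(t+T)) is estimated here as the maximum stored quality over the candidate actions in the next state (the usual Q-learning choice, which the paper does not spell out), and the fuzzy interpolation of eqs. (8)-(9) is left to the next sketch.

```python
from collections import defaultdict


class SHOQLearner:
    """Tabular sketch of the reward (eq. 4) and the Q-update (eq. 5)."""

    def __init__(self, gamma=0.95, kappa=0.1, cbr_target=0.02,
                 cdr_target=0.01, beta=4.0):
        self.gamma = gamma            # discount factor (0.95 in the paper)
        self.kappa = kappa            # learning rate (0.1 in the paper)
        self.cbr_target = cbr_target  # CBR*: blocking-rate target (illustrative value)
        self.cdr_target = cdr_target  # CDR*: dropping-rate target (illustrative value)
        self.beta = beta              # dropping weighted beta times blocking (4 in the paper)
        self.q = defaultdict(float)   # look-up table: (state, action) -> quality

    def reward(self, cbr, cdr):
        # Eq. (4): r(t) = (CBR* - CBR(t)) + beta * (CDR* - CDR(t))
        return (self.cbr_target - cbr) + self.beta * (self.cdr_target - cdr)

    def update(self, state, action, r_next, next_state, candidate_actions):
        # V(x(t+T)) approximated by the best stored quality in the next state.
        v_next = max(self.q[(next_state, a)] for a in candidate_actions)
        old_q = self.q[(state, action)]
        # Eq. (5): Q <- (1 - kappa) * Q + kappa * (r + gamma * V)
        self.q[(state, action)] = (1 - self.kappa) * old_q + \
            self.kappa * (r_next + self.gamma * v_next)


# Example: one control period with 3% blocking and 0.5% dropping; the states
# and hysteresis actions (1A, 1B) below are purely illustrative.
learner = SHOQLearner()
r = learner.reward(cbr=0.03, cdr=0.005)
learner.update(state=("high", "low"), action=(4, 6), r_next=r,
               next_state=("medium", "low"),
               candidate_actions=[(4, 6), (2, 4), (6, 8)])
print(round(r, 3), round(learner.q[(("high", "low"), (4, 6))], 4))
```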


As WCDMA quality indicators are all continuous, fuzzy logic is used here to handle the continuous input. The crisp quality indicators are mapped onto discrete states called fuzzy sets. The mapping between crisp and fuzzy indicators is performed by a membership function α_j, which defines the membership degree of the crisp input in the fuzzy set L_j (j ∈ [1, m], m being the number of fuzzy sets, also referred to as rules). In this work, triangular membership functions are used; for more details about fuzzy logic, the reader is referred to [3, 5, 6]. Briefly, in the case of continuous indicators, the inferred action a for the state x(t) is given by the centre of gravity of the actions of the rules, weighted by their membership degrees α_j. If the membership functions are chosen to satisfy the normalization condition \sum_{j=1}^{m} \alpha_j(x) = 1, the output action a(t) is

a(t) = \pi_t(x(t)) = \sum_{j=1}^{m} \alpha_j(x) \times o_j    (8)

where o_j is the action selected in the fuzzy state (rule) L_j. The corresponding Q-function for the inferred action a is given by

Q(x, a) = \sum_{j=1}^{m} \alpha_j(x) \times q_j(L_j, o_j)    (9)

where q_j(L_j, o_j) is the q-value of the action o_j in rule L_j. During the learning process, the controller balances exploration and exploitation by choosing, in each state, a random action with probability ε and the best-quality action with probability 1−ε. Once the learning process has converged and the state-action qualities become constant, the learning is completely stopped and the action with the highest quality is always chosen.
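The sketch below illustrates the center-of-gravity inference of eqs. (8) and (9) together with the ε-greedy choice just described. The triangular fuzzy sets, the candidate actions per rule and the q-values are placeholder assumptions; exploration is applied per rule, one common fuzzy Q-learning variant, and ε = 0.2 corresponds to the 0.80 exploitation probability used in Section IV.

```python
import random


def triangle(x, left, center, right):
    """Triangular membership function, zero outside (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


# Illustrative fuzzy sets over the filtered cell load in [0, 1]; the overlap is
# chosen so that the membership degrees of the rules sum to 1.
RULES = {
    "low":    lambda load: triangle(load, -0.5, 0.0, 0.5),
    "medium": lambda load: triangle(load,  0.0, 0.5, 1.0),
    "high":   lambda load: triangle(load,  0.5, 1.0, 1.5),
}


def inferred_action(load, rule_actions, q_values, epsilon=0.2):
    """Blend the per-rule actions (eq. 8) and their q-values (eq. 9)."""
    action, quality = 0.0, 0.0
    for name, membership in RULES.items():
        alpha = membership(load)
        # Per-rule epsilon-greedy: explore with probability epsilon.
        if random.random() < epsilon:
            o_j = random.choice(rule_actions[name])
        else:
            o_j = max(rule_actions[name], key=lambda o: q_values[(name, o)])
        action += alpha * o_j                      # eq. (8)
        quality += alpha * q_values[(name, o_j)]   # eq. (9)
    return action, quality


# Example: candidate hysteresis offsets (dB) per rule with dummy q-values.
rule_actions = {"low": (-1.0, 0.0), "medium": (0.0,), "high": (0.0, 1.0)}
q_values = {("low", -1.0): 0.2, ("low", 0.0): 0.1, ("medium", 0.0): 0.0,
            ("high", 0.0): -0.1, ("high", 1.0): 0.3}
print(inferred_action(load=0.7, rule_actions=rule_actions, q_values=q_values))
```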
IV. SIMULATION RESULTS AND DISCUSSION

To evaluate the performance of the proposed auto-tuning method, a Semi-Dynamic Simulator (SDS) is used. The SDS performs correlated snapshots to account for the time evolution of the network: after each time step, which can typically vary between one and a few seconds, the new mobile positions (due to mobility at different speeds) and the powers transmitted in the network are computed, as in a static simulator. The fuzzy Q-learning controller has been applied to a WCDMA network with 32 sectors in a dense urban environment. The studied network is extracted from a real deployment in which the propagation is calibrated with a professional model that accounts for clutter effects. Each cell is surrounded by a set of neighboring cells constructed dynamically by the simulator. To model the traffic and mobility in the system, we use the following assumptions:
1. Call requests are generated according to a Poisson process with a rate λ that varies between 2 and 13 call requests per second in the present simulation. During a simulation run, the traffic is stationary and λ is therefore kept constant. To model the non-uniformity of traffic, the generated calls appear in the network area according to a pre-prepared traffic map. The communication time of each call is exponentially distributed with a mean equal to 100 s (see the sampling sketch before Fig. 2).
2. The user mobility is based on a two-dimensional semi-random walk model. At each time step, a mobile can randomly change its direction within a limited angular interval. At the border of the area, the user does not leave the network but is reflected back. 80% of the generated mobiles are in an indoor situation and their speed is set to zero; the speed of the remaining mobiles is set to 60 km/h.
3. The maximum transmit power of each base station is set to 20 W. 20% of the power is assigned to the common channels, including the common pilot channel (CPICH), and 65% is assigned to traffic channels; the admission control threshold is then set to 85% (20% plus 65%). When the cell load reaches 85%, requested calls are blocked. The considered soft handover criterion is the CPICH Received Signal Code Power (CPICH RSCP). For the classic network without any control, Hysteresis_event1A and Hysteresis_event1B are set to 4 and 6 dB respectively. In the present method, these parameters are controlled with a system reactivity equal to 1/50 s^-1 (i.e. the period of parameter regulation is set to 50 seconds) unless indicated otherwise.
4. The fuzzy Q-learning controller parameters are the discount factor γ, the learning rate κ and the exploitation probability (1 − ε). They are set to 0.95, 0.1 and 0.80 respectively [5, 6].
We first illustrate the convergence of the proposed scheme through the evolution of the maximum variation of the state-action qualities during the learning process. Fig. 2 shows the maximum variation of the state-action qualities as a function of time; as mobile networks are stochastic, the mathematical expectation of this variable is used. At the very beginning, the action-state quality varies randomly, which is due to the exploration feature of the fuzzy Q-learning controller.
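Returning to traffic assumption 1 above, the snippet below is a minimal sketch of the call-generation step under those assumptions (Poisson arrivals at rate λ and an exponentially distributed holding time with a 100 s mean); the function name is illustrative, and the traffic map and the mobility model are omitted.

```python
import math
import random


def generate_calls(rate_per_s, step_s, mean_hold_s=100.0, rng=random.Random(0)):
    """Draw the new calls of one snapshot period (traffic assumption 1)."""
    # Number of arrivals in this step: Poisson with mean rate * step
    # (Knuth's multiplication sampler, standard library only).
    n_calls, p, threshold = 0, 1.0, math.exp(-rate_per_s * step_s)
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        n_calls += 1
    # One exponentially distributed holding time (seconds) per new call.
    return [rng.expovariate(1.0 / mean_hold_s) for _ in range(n_calls)]


# Example: about 6.6 new calls are expected in a one-second snapshot.
print(generate_calls(rate_per_s=6.6, step_s=1.0))
```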

Fig. 2. Time evolution of the learning process, measured as the maximum variation of the action-state quality.


For the present study, the convergence of the algorithm is reached after approximately 80000 seconds of learning, equivalent to about one day of learning in a real network; this learning process takes around 20 minutes of computer simulation. Once the learning process has converged and the stability of the state-action qualities is reached, we stop the learning completely and exploit the obtained FQL controller for the auto-tuning of SHO parameters. Fig. 3 illustrates the percentage of failed communications versus the traffic intensity (measured as the number of call arrivals per second) for a WCDMA network optimized with the proposed scheme, compared to a classic network with a fixed configuration. This quality indicator measures the probability of dropped and blocked calls; it is the complement of the call success rate. It can be seen that the FQL controller always guarantees a low percentage of failed communications for all traffic load conditions. Although the learning is performed for a specific traffic level, the optimized controller continues to improve network performance in other traffic situations. This is an important result that illustrates the robustness of the auto-tuning process with respect to traffic variations. For a percentage of call failures equal to 2% (the conventional operator quality target), the optimized network can support a traffic level of 6.6 mobiles per second, whereas the classic network supports only 5.3 mobiles per second. Hence the capacity improvement at the 2% QoS target is about 24%. Fig. 4 shows the distribution of cell load for the optimized network compared to a network with a fixed configuration. As expected, controlling the SHO parameters dynamically and optimally leads to a better load distribution between the cells, since it allows a cell with high traffic to cede certain links to a neighboring cell with low traffic. As can be seen in Fig. 4, the number of cells with medium load is higher for the optimized network than for the classic one. The use of a very reactive dynamic controller has some drawbacks, since the FQL controller can increase the number of signaling messages. Table I shows the impact of the controller on the average Ping-Pong effect, measured as the average number of active set updates for each mobile during its sojourn time in the network. In order to highlight this effect, we simulate two network environments: a low-mobility environment (speed = 3 km/h for 20% of the users) and a high-mobility environment (speed = 60 km/h for 20% of the users). The average frequency of active set updates equals 0.028 (respectively 0.072) for the classic network with low mobility (respectively high mobility). A very reactive controller (reactivity = 0.05 Hz) increases the frequency of active set updates to 0.032 in the low-mobility environment and to 0.092 in the high-mobility environment, an increase of the signaling messages of about 20%. By decreasing the reactivity of the controller to one regulation per 100 s (reactivity = 0.01 Hz), the additional signaling introduced by the controller is reduced to 6% in the high-mobility environment. Hence the adaptation of the controller reactivity needs to be carefully considered in each network environment.
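For reference, the quoted capacity gain at the 2% failure target follows directly from the two supported traffic levels:

\[ \frac{6.6 - 5.3}{5.3} \approx 0.245 \approx 24\% \]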

Fig. 3. Percentage of failed communications versus traffic intensity (in mobiles/s) for the classic network (4/6 dB) and the optimized network.

Fig. 4. Distribution of cell load for the optimized network compared to a classic network with fixed configuration.
TABLE I
IMPACT OF THE CONTROLLER ON THE FREQUENCY OF ACTIVE SET UPDATES

                                              Average frequency of active set updates
                                              mobility = 3 km/h    mobility = 60 km/h
Classic network                               0.028                0.0721
Optimized network, FQL reactivity = 0.05 Hz   0.0328               0.0921
Optimized network, FQL reactivity = 0.01 Hz   0.0296               0.0773

V. CONCLUSION

In this paper we have presented a new scheme for dynamically controlling SHO parameters. The proposed mechanism uses a fuzzy Q-learning controller to adapt the SHO hysteresis parameters to traffic fluctuations. The controller receives as inputs the filtered downlink load of the BS as well as those of its neighboring BSs, and continually learns the best parameterization for each network situation. The learning process is governed by a utility function called the reinforcement. Simulation results show significant improvements in network performance: the proposed scheme improves the system capacity by up to 25% compared to a classical network, balances the load between base stations and minimizes the human interventions required for network management.


However, we have also shown that the dynamic controller increases the frequency of active set updates, which in turn increases the signaling messages on the radio interface as well as in the core network. This Ping-Pong phenomenon can be reduced by lowering the reactivity of the controller.
ACKNOWLEDGMENT


The authors would like to thank Mr Christophe Gay for his help in implementing the fuzzy Q-learning algorithm and for many helpful discussions.
REFERENCES
[1] 3GPP TR 25.922 (Release 6.1.0), "Radio Resource Management Strategies."
[2] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, Wiley & Sons, 2001.
[3] H. Dubreil, Z. Altman, V. Diascorn, J.M. Picard, and M. Clerc, "Particle swarm optimization of fuzzy logic controller for high quality RRM auto-tuning of UMTS networks," IEEE Vehicular Technology Conference (VTC 2005), Stockholm, Sweden, 29 May - 1 June 2005.
[4] R. Nasri, Z. Altman, and H. Dubreil, "Dynamic radio resource management in wireless networks: towards autonomic mobile networking," submitted to IEEE Communications Magazine.
[5] L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Transactions on Systems, Man, and Cybernetics, vol. 28, pp. 338-355, Aug. 1998.
[6] P.Y. Glorennec and L. Jouffe, "Fuzzy Q-learning," 6th IEEE International Conference on Fuzzy Systems, 1-5 July 1997.

