
A Call Admission Control for

Service Differentiation and


Fairness Management
in WDM Grooming Networks

Kayvan Mosharaf, Jerome Talim and Ioannis Lambadaris
BroadNet 2004 proceedings

Presented by Zhanxiang
February 7, 2005
Goal & Contribution
Goal:
Fairness control and service differentiation in a WDM
grooming network, while also maximizing the overall
utilization.

Contributions:
An optimal CAC policy providing fairness control by
using a Markov Decision Process approach;
A heuristic decomposition algorithm for multi-link and
multi-wavelength networks.
Quick Review of MDP
DTMC

DTMDP
We focus on the DTMDP because a CTMDP is
usually solved by discretization.
DTMC
A DTMC {X_n | n = 0, 1, 2, ...} is a discrete-time, discrete-valued random sequence such that,
given X_0, X_1, ..., X_n, the next random variable X_{n+1} depends only on X_n through the
transition probability:

P[X_{n+1} = j | X_n = i_n, X_{n-1} = i_{n-1}, ..., X_0 = i_0] = P[X_{n+1} = j | X_n = i_n]

pmf of X_n:  p_j(n) = P(X_n = j)
Transition probability:  p_jk(m, n) = P(X_n = k | X_m = j),  n >= m >= 0
Chapman-Kolmogorov equation:  p_ij(m + n) = Σ_k p_ik(m) p_kj(n),  m, n >= 0
Adapted from Professor Malathi Veeraraghavan's slides.


DTMC
Initial distribution:  p(0) = [p_0(0), p_1(0), ...]
Transition probability matrix P:

    P = | p_00  p_01  p_02  ... |
        | p_10  p_11  p_12  ... |
        | ...   ...   ...   ... |

n-step transition matrix:  P(n) = P^n
Probabilities that the system is in state j after n transitions:
p_j(n) = Σ_i p_i(0) [P^n]_ij,  i.e.  p(n) = p(0) P^n
Adapted from Professor Malathi Veeraraghavan's slides.
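To make these relations concrete, here is a minimal sketch in Python with NumPy (the two-state chain and all numbers are made up for illustration; the slides themselves contain no code):

```python
import numpy as np

# A small 2-state DTMC: transition probability matrix P (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Initial distribution p(0).
p0 = np.array([1.0, 0.0])

# n-step transition matrix P(n) = P^n and distribution p(n) = p(0) P^n.
n = 5
Pn = np.linalg.matrix_power(P, n)
pn = p0 @ Pn
print("P(5) =\n", Pn)
print("p(5) =", pn)

# Chapman-Kolmogorov check: P(m+n) = P(m) P(n).
m = 2
assert np.allclose(np.linalg.matrix_power(P, m + n),
                   np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n))
```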


DTMC
Two states i and j communicate if, for some n and n', p_ij(n) > 0 and p_ji(n') > 0.
An MC is irreducible if all of its states communicate.
A state of an MC is periodic if there exists some integer m > 0 such that p_ii(m) > 0,
and some integer d > 1 such that p_ii(n) > 0 only if d divides n.
Adapted from Professor Malathi Veeraraghavan's slides.
DTMC
The n-step transition probabilities p_ij(n) of finite, irreducible and aperiodic MCs become
independent of i and n as n -> ∞.  Let q_j = lim_{n->∞} p_ij(n).

v_j: the long-run proportion of time spent in state j (steady-state probability):
v_j = lim_{n->∞} p_j(n) = lim_{n->∞} [p(0) P^n]_j = q_j

lim_{n->∞} P^n = V  (the steady-state probability matrix)

v_j = Σ_i v_i p_ij,  j = 0, 1, ...,  and  Σ_j v_j = 1
Adapted from Professor Malathi Veeraraghavan's slides.
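A minimal sketch of computing the steady-state vector v by solving v = vP together with Σ_j v_j = 1 (the matrix is the same made-up two-state chain as above):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

n_states = P.shape[0]
# Linear system: (P^T - I) v = 0, with one extra equation sum(v) = 1.
A = np.vstack([P.T - np.eye(n_states), np.ones(n_states)])
b = np.zeros(n_states + 1)
b[-1] = 1.0
v, *_ = np.linalg.lstsq(A, b, rcond=None)
print("v =", v)                        # steady-state probabilities
print("check:", np.allclose(v @ P, v))  # v P = v
```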
Decision Theory
Probability Theory + Utility Theory = Decision Theory

Probability theory describes what an agent should believe, based on evidence.
Utility theory describes what an agent wants.
Decision theory describes what an agent should do.

Adapted from David W. Kirsch's slides.
Markov Decision Process
An MDP is defined by:

State space: S
Action space: A
Reward function: R : S -> {real numbers}
Transition function: T : S x A -> S (deterministic)
                     T : S x A -> Power(S) (stochastic)

The transition function describes the effect of an action taken in state s. In the stochastic
case, the transition function induces a probability distribution P(s' | s, a) over next states.
Adapted from David W. Kirsch's slides, modified by Zhanxiang.
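As a concrete illustration of these components, here is a minimal sketch of an MDP written as plain Python data structures (the two states, two actions and all numbers are made-up examples, not taken from the paper):

```python
# A toy MDP: states, actions, state reward R(s), and stochastic transitions
# T[s][a] = {s_next: P(s_next | s, a)}.  All values are illustrative only.
states = ["s0", "s1"]
actions = ["accept", "reject"]

R = {"s0": 0.0, "s1": 1.0}

T = {
    "s0": {"accept": {"s1": 0.8, "s0": 0.2}, "reject": {"s0": 1.0}},
    "s1": {"accept": {"s1": 1.0},            "reject": {"s0": 0.5, "s1": 0.5}},
}
```

The value-iteration sketch shown later operates directly on this representation.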
How an MDP differs from a DTMC
An MDP is like a DTMC, except that the transition matrix depends on the action taken by the
decision maker (a.k.a. agent) at each time step:

P_{s,a,s'} = P[S(t+1) = s' | S(t) = s, A(t) = a]

[Figure: a DTMC has a single transition matrix over (current state s, next state s');
an MDP has one such matrix for each action a.]
MDP Actions
Stochastic actions:
T : S x A -> PowerSet(S)
For each state and action we specify a probability distribution over next states, P(s' | s, a).

Deterministic actions:
T : S x A -> S
For each state and action we specify a single next state.
Hence the transition probabilities are 1 or 0.
Action Selection & Maximum Expected Utility
Assume we assign a utility U(s) to each state s.
The expected utility of an action a in state s is

EU(a | s) = Σ_{s'} P(s' | s, a) U(s')

MEU principle: an agent should choose an action that maximizes its expected utility.
Adapted from David W. Kirsch's slides, modified by Zhanxiang.
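A minimal sketch of the MEU rule in Python, reusing the same made-up toy utilities and transition probabilities as above (not values from the paper):

```python
# Pick the action with maximum expected utility for a fixed current state s.
U = {"s0": 0.0, "s1": 1.0}

# P(s' | s, a) for the current state s, one dict per action.
P_next = {
    "accept": {"s1": 0.8, "s0": 0.2},
    "reject": {"s0": 1.0},
}

def expected_utility(action):
    """EU(a | s) = sum over s' of P(s' | s, a) * U(s')."""
    return sum(p * U[s_next] for s_next, p in P_next[action].items())

best_action = max(P_next, key=expected_utility)
print(best_action, expected_utility(best_action))  # -> accept 0.8
```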
Policy & Following a Policy
A policy π is a mapping from S to A, π : S -> A.

Following a policy:

1. Determine the current state s
2. Execute action π(s)
3. Repeat 1-2
Adapted from David W. Kirsch's slides, modified by Zhanxiang.
Solution to an MDP
In a deterministic process, the solution is a plan.
In an observable stochastic process, the solution is a policy.
A policy's quality is measured by its EU.
Notation:
π : a policy
π(s) : the recommended action in state s
π* : the optimal policy (maximum expected utility)
Adapted from David W. Kirsch's slides, modified by Zhanxiang.
Should we let U(s)=R(s)?
In the definition of an MDP we introduced R(s), which obviously depends on specific
properties of a state.

Shall we let U(s) = R(s)?
It often works well for single-action decisions,
but it is not sufficient for choosing action sequences, which implies that R(s) alone is not
enough to solve an MDP.
Assigning Utility to Sequences
How do we add up rewards?
- simple sum
- mean reward rate
Problem: an infinite horizon gives an infinite reward
- discounted rewards:

R(s_0, s_1, s_2) = R(s_0) + c R(s_1) + c^2 R(s_2),  where 0 < c <= 1

Adapted from David W. Kirsch's slides, modified by Zhanxiang.
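A tiny worked example of the discounted sum (the rewards and the discount factor are made-up numbers):

```python
# Discounted reward of the sequence s0, s1, s2 with R(s_t) = 1 and c = 0.9.
c = 0.9
rewards = [1.0, 1.0, 1.0]
discounted = sum(c**t * r for t, r in enumerate(rewards))
print(discounted)  # 1 + 0.9 + 0.81 = 2.71
```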
How do we define U(s)?
Define U_π(s), specific to each policy π:

U_π(s) = E[ Σ_t R(s_t) | π, s_0 = s ]

Define U(s) = max_π U_π(s) = U_π*(s).
We can calculate U(s) from R(s):

U(s) = R(s) + max_a Σ_{s'} P(s' | s, a) U(s')        (Bellman equation)

If we solve the Bellman equation for each state, we have solved for the optimal policy π*
of the given MDP on the basis of U(s).
Adapted from David W. Kirsch's slides, modified by Zhanxiang.
Value Iteration Algorithm
We have to solve |S| simultaneous Bellman equations.
We can't solve them directly, so we use an iterative approach:
1. Begin with an arbitrary utility function U_0
2. For each s, calculate U(s) from R(s) and U_0
3. Use these new utility values to update U_0
4. Repeat steps 2-3 until U_0 converges

This equilibrium is a unique solution! (see R&N for the proof)
Adapted from David W. Kirsch's slides.
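A generic value-iteration sketch in Python, operating on the toy MDP representation shown earlier; this is the textbook algorithm with an assumed discount factor c, not the paper's exact CAC formulation:

```python
def value_iteration(states, actions, R, T, c=0.9, eps=1e-6):
    """R[s] is the state reward; T[s][a] is a dict {s_next: P(s_next | s, a)}."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {}
        for s in states:
            # Bellman update: U(s) = R(s) + c * max_a sum_{s'} P(s'|s,a) U(s')
            U_new[s] = R[s] + c * max(
                sum(p * U[s2] for s2, p in T[s][a].items()) for a in actions
            )
        if max(abs(U_new[s] - U[s]) for s in states) < eps:
            return U_new
        U = U_new

def greedy_policy(states, actions, U, T):
    """Extract the optimal policy from the converged utilities."""
    return {s: max(actions,
                   key=lambda a: sum(p * U[s2] for s2, p in T[s][a].items()))
            for s in states}
```

For example, value_iteration(states, actions, R, T) on the toy MDP above converges in a few iterations, and greedy_policy then reads off the MEU action in each state.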
State Space and Policy Definition in this paper
The authors' idea of using an MDP is great, but I am not comfortable with the state space
and policy definitions.

If I were the author, I would define the system state space and policy as follows:
S' = S x E
where S = {(n1, n2, ..., nK) | Σ_k t_k n_k <= T} and
E = {class c_k call arrivals} ∪ {class c_k call departures} ∪ {dummy events}
Policy π : S' -> A
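For concreteness, the underlying call-count state space S can be enumerated as below (a sketch; the time-slot requirements t and the capacity T in the example call are illustrative, not the paper's settings):

```python
from itertools import product

def enumerate_states(t, T):
    """All (n_1, ..., n_K) with sum_k t_k * n_k <= T.
    t[k]: time slots needed by a class-(k+1) call; T: time slots per wavelength."""
    ranges = [range(T // tk + 1) for tk in t]
    return [n for n in product(*ranges)
            if sum(tk * nk for tk, nk in zip(t, n)) <= T]

states = enumerate_states(t=[1, 3], T=6)   # e.g. two classes with t1=1, t2=3, T=6
print(len(states), states[:5])
```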
Network Model :: Definitions
OADM: Optical Add/Drop Multiplexer
WC: wavelength converter
TSI: time-slot interchanger
L: # of links a WDM grooming network contains
M: # of origin-destination pairs the network includes
W: # of wavelengths in a fiber in each link
T: # of time slots each wavelength includes
K: # of classes of traffic streams
ck: traffic stream classes, which differ in their bandwidth requirements
tk: # of time slots required to establish a class ck call
nk: # of class ck calls currently in the system
Network model :: assumptions
For each o-d pair, class ck arrivals are distributed according to a Poisson process with
rate λk.
The call holding time of class ck is exponentially distributed with mean 1/μk. Unless
otherwise stated, we assume 1/μk = 1.
An arriving call of any class is blocked when no wavelength has tk available time slots.
Blocked calls do not interfere with the system.
The switching nodes are non-blocking.
No preemption.
Fairness definition
There is no significant difference between
the blocking probabilities experienced by
different classes of users;

CS & CP
Complete Sharing (CS)
No resources are reserved for any class of calls;
Calls with lower bandwidth requirements and higher arrival rates may starve calls with
higher bandwidth requirements and lower arrival rates;
Not fair.
Complete Partitioning (CP)
A portion of the resources is dedicated to each class of calls;
May not maximize the overall utilization of the available resources.
Fair, but potentially inefficient.
Single-link single-wavelength(0)
System state space S:
S = {(n1, n2, ..., nK) | Σ_k t_k n_k <= T}

Operators:

A_k s = (n1, n2, ..., n_k + 1, ..., nK)      (class c_k call arrival)

D_k s = (n1, n2, ..., n_k - 1, ..., nK)      (class c_k call departure)

A_k P_a s = (n1, n2, ..., n_k + a, ..., nK)  (arrival accepted if a = 1, rejected if a = 0)
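A sketch of these operators in Python, treating a state as a tuple of call counts (the helper names and the 0-based class index are my own conventions, not the paper's):

```python
def A(s, k):
    """Arrival operator A_k: one more class-c_k call in state s."""
    return tuple(n + 1 if i == k else n for i, n in enumerate(s))

def D(s, k):
    """Departure operator D_k: one fewer class-c_k call in state s."""
    return tuple(n - 1 if i == k else n for i, n in enumerate(s))

def AP(s, k, a):
    """A_k P_a: accept (a = 1) or reject (a = 0) a class-c_k arrival."""
    return tuple(n + a if i == k else n for i, n in enumerate(s))

def feasible(s, t, T):
    """A state is feasible when all n_k >= 0 and sum_k t_k * n_k <= T."""
    return all(n >= 0 for n in s) and sum(tk * nk for tk, nk in zip(t, s)) <= T
```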

Single-link single-wavelength(1)
Sampling rate:

v = Σ_k ( ⌊T/t_k⌋ μ_k + λ_k )

Only a single transition can occur during each sampled time step.

A transition corresponds to one of the following events:

1) a class c_k call arrival

2) a class c_k call departure

3) a fictitious or dummy event
(caused by the high sampling rate)
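A small sketch of this sampling rate (the class parameters in the example call are made-up values):

```python
import math

def sampling_rate(T, t, lam, mu):
    """v = sum_k ( floor(T / t_k) * mu_k + lambda_k )."""
    return sum(math.floor(T / tk) * mk + lk for tk, lk, mk in zip(t, lam, mu))

v = sampling_rate(T=6, t=[1, 3], lam=[0.5, 0.3], mu=[1.0, 1.0])
print(v)  # 6*1.0 + 0.5 + 2*1.0 + 0.3 = 8.8
```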
Single-link single-wavelength(2)

Reward function R:

Value function
Single-link single-wavelength(3)

Optimal value function:

Optimal Policy:
Single-link single-wavelength(4)

Value iteration to compute Vn(s)
Single-link single-wavelength(5)

Action decision:
If Vn(AkP1s) >= Vn(AkP0s)
then a=1;
else a=0;
Based on the value-function comparison equation given in the paper.
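A sketch of this acceptance rule, assuming V is a dict of converged values over feasible states with states represented as tuples, as in the earlier operator sketch:

```python
def admit(V, s, k, t, T):
    """Return the action for a class-k arrival in state s: 1 = accept, 0 = reject."""
    s_accept = tuple(n + 1 if i == k else n for i, n in enumerate(s))  # A_k P_1 s
    if sum(tk * nk for tk, nk in zip(t, s_accept)) > T:
        return 0                                   # not enough free time slots: block
    # Accept iff V_n(A_k P_1 s) >= V_n(A_k P_0 s); note that A_k P_0 s = s.
    return 1 if V[s_accept] >= V[s] else 0
```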
My understanding
The authors' idea of using an MDP is great.

Example

Matlab toolbox calculation

Heuristic decomposition
algorithm
Step 1: For each hop i, partition the set of
available wavelengths into subsets, each
dedicated to one of the o-d pairs using hop i.

Step 2: Assume the traffic of an o-d pair is
uniformly distributed among its Wm wavelengths;
thus, the arrival rate of class ck on each of the
Wm wavelengths is λk/Wm.
Heuristic decomposition
algorithm (2)
Step 3: Compute the single-wavelength CAC policy
with respect to the per-wavelength arrival rates λk/Wm.

Step 4: Using the CAC policy computed in
Step 3, determine the optimal action
for each of the Wm wavelengths
individually (a code sketch of this decomposition follows).
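A high-level sketch of the decomposition, where the data layout, helper names and admission test are my own illustrative assumptions rather than the paper's implementation:

```python
def admit_on_pair(k, wl_states, lam, t, T, solve_policy):
    """Decide whether a class-k arrival for one o-d pair can be admitted.
    wl_states: per-wavelength states (n_1, ..., n_K) of the W_m wavelengths
    dedicated to this pair (Step 1).
    solve_policy(lam_per_wl, t, T): returns a dict mapping (state, class) -> 0/1."""
    W_m = len(wl_states)
    lam_per_wl = [l / W_m for l in lam]          # Step 2: per-wavelength arrival rates
    policy = solve_policy(lam_per_wl, t, T)      # Step 3: single-wavelength CAC policy
    for i, s in enumerate(wl_states):            # Step 4: apply it to each wavelength
        free_slots = T - sum(tk * nk for tk, nk in zip(t, s))
        if free_slots >= t[k] and policy.get((s, k), 0) == 1:
            return i                             # admit the call on wavelength i
    return None                                  # block the call
```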
Performance comparison
We define ρ = Σ_{k=1}^{K} λ_k / μ_k as the offered load per o-d pair;
BP_k as the blocking probability of class c_k calls.
Suppose that class c_i and class c_j calls experience the highest and lowest blocking
probabilities in the network; then we define the fairness ratio as

f_r := BP_{c_i} / BP_{c_j}
Performance comparison
Performance comparison
Performance comparison
Relation to our work
We can utilize MDP to model our
bandwidth allocation problem in call
admission control to achieve fairness;

But in heterogeneous network the
bandwidth granularity problem is still there;
Possible Constraints
Under some conditions the optimal policy
of an MDP exists.

Backup
Other MDP representations
Markov Assumption
Markov assumption (R&N): the next state's conditional probability depends only on a finite
history of previous states (a kth-order Markov process).

Markov assumption (J&B): the next state's conditional probability depends only on its
immediately previous state (a 1st-order Markov process).

Andrei Markov (1913)
The two definitions are equivalent!!!
Any algorithm that makes the 1st-order Markov assumption can be applied to any Markov process.
Adapted from David W. Kirsch's slides.
MDP
A Markov Decision Process (MDP) model
contains:
A set of possible world states S
A set of possible actions A
A real-valued reward function R(s,a)
A description T(s,a) of each action's effects in each state.
How an MDP differs from a DTMC
A Markov Decision Process (MDP) is just like a Markov
Chain, except the transition matrix depends on the action
taken by the decision maker (agent) at each time step.

P_{s,a,s'} = P[S(t+1) = s' | S(t) = s, A(t) = a]

The agent receives a reward R(s,a), which depends on
the action and the state.

The goal is to find a function, called a policy, which
specifies which action to take in each state, so as to
maximize some function of the sequence of rewards
(e.g., the mean or expected discounted sum).
MDP Actions
Stochastic actions:
T : S x A -> PowerSet(S)
For each state and action we specify a probability distribution over next states, P(s' | s, a).

Deterministic actions:
T : S x A -> S
For each state and action we specify a single next state.
Hence the transition probabilities are 1 or 0.
Transition Matrix
[Figure: a DTMC has a single transition matrix indexed by (current state s, next state s');
an MDP has one such matrix for each action a.]
MDP Policy
A policy π is a mapping from S to A:
π : S -> A

Assumes full observability: the new state resulting from executing an action will be known
to the system.
Evaluating a Policy
How good is a policy, in terms of the sequence of actions it produces?

For deterministic actions, just total the rewards obtained... but the result may be infinite.
For stochastic actions, use the expected total reward instead; again, this typically yields
an infinite value.

How do we compare policies of infinite value?
Discounting to prefer earlier rewards
A value function, V_π : S -> Real, represents the expected objective value obtained by
following policy π from each state in S.

Bellman equations relate the value function to itself via the problem dynamics.
Bellman Equations

V_π(s) = R(s, π(s)) + γ Σ_{s' ∈ S} T(s, π(s), s') V_π(s')

V*(s) = MAX_π [ R(s, π(s)) + γ Σ_{s' ∈ S} T(s, π(s), s') V*(s') ]

γ is the discount factor.
There is one equation for each state in S.
Thus we have to solve |S| simultaneous Bellman equations.
Value Iteration Algorithm
We can't solve the Bellman equations directly, so we use an iterative approach:

1. Begin with an arbitrary utility vector V;

2. For each s, calculate V*(s) from R(s, ·) and V;

3. Use these new utility values V*(s) to update V;

4. Repeat steps 2-3 until V converges;

This equilibrium is a unique solution!
MDP Solution
Solution to the |S| Bellman equations:

V*(s) = MAX_π [ R(s, π(s)) + γ Σ_{s' ∈ S} T(s, π(s), s') V*(s') ]

π* is the optimal policy:

π* = arg MAX_π [ R(s, π(s)) + γ Σ_{s' ∈ S} T(s, π(s), s') V*(s') ]

obtained when V converges.
