
Decentralized control for optimal energy-delay tradeoff in cooperative relay communications

Deepanshu Vasal and Achilleas Anastasopoulos


Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, Michigan 48105. Email: {dvasal, anastas}@umich.edu
Abstract: In a multi-hop cooperative communication scenario, where a transmitter can transmit a packet directly to the receiver or indirectly through a relay, there is an inherent trade-off between energy and delay. While it may consume more energy to transmit a packet directly to the receiver than to transmit it through a relay, the transmission through the relay incurs more delay. We consider the MAC layer of a transmission scheme involving a transmitter, a receiver and a relay, where the average delay is related to the queue sizes at the transmitter and the relay. We pose this problem as an infinite-horizon stochastic control problem in a decentralized setup where the queue size of a node is not available to the other node. We prove a structural result: the optimal policy is the solution of a dynamic programming equation in which the optimization is performed over a time-invariant state space.

I. INTRODUCTION

With an increasing number of wireless networking devices running real-time applications, delay is an important quality-of-service (QoS) parameter, while battery constraints make transmission energy costly. In a wireless network, the energy required to transmit a packet successfully to a receiver can be large due to a large distance between the two nodes or a bad channel gain, but the presence of other nodes in the network can provide an alternate route with possibly lower energy cost. Since this alternate route requires a successful transmission from the transmitter to the relay node and then from the relay node to the receiver node, the delay is clearly larger. Thus there is a tradeoff between the energy cost of successfully routing a packet and the delay cost.

The relay channel is the simplest model and building block for user cooperation in a network. It was first viewed through the lens of information theory by van der Meulen [1]. Though the capacity of the general relay channel is still unknown, the bounds discovered by van der Meulen were improved by Cover and El Gamal [2]. With recent advances in computation and digital communication technologies such as LDPC codes, spatial diversity, etc., there is renewed interest in cooperative communication [3]. Since information theory is an asymptotic theory, it does not address delay constraints in a direct fashion.

The relay channel can also be viewed from the MAC layer, either as a problem where the two users (the transmitter and the relay) are strategic, or as a problem where the two users

act as a team. In [4], the authors use the former approach and pose the problem as a static game, where cooperation is induced using a reward mechanism, and they analyze Nash equilibrium strategies. In our work, we use the latter approach (the users form a team) and consider a half-duplex relay channel with a transmitter, a relay and a receiver node, with incoming traffic at both the transmitter and the relay node. Since both the transmitter and the relay have incoming traffic, this model is more realistic than the model in [2], which assumes that the relay is neither a source of information nor a sink. Our model further assumes fixed energy costs for any successful transmission from transmitter to relay, from relay to receiver, or from transmitter to receiver; finally, there is a delay cost for each packet in the queue of either the transmitter or the relay. We study the decentralized case where the queue length of each node is its private knowledge. We assume that there is a designer that knows all energy and delay costs. The designer finds the optimal strategies for the users in an offline fashion, and these strategies are then implemented online by the users in a decentralized way.

We pose the designer's problem as a decentralized stochastic control problem of minimizing the expected sum of the users' costs over an infinite horizon, where the users cooperate to achieve this goal. Since there is no single controller, but rather both the transmitter and the relay are controllers with linked stochastic control problems, this setup does not fit into the standard framework of Markov decision process (MDP) theory [5], [6]. Similar decentralized control problems are studied in [7], [8]. In this paper, we prove a structural result which shows that there exists an optimal policy that is the solution of a dynamic programming equation where the optimization is performed over a fixed state space, as opposed to an ever-increasing state space in general. We achieve this through two simplifications, in Lemma 2 and Proposition 1. First, we find a controlled Markov process for each user for a fixed strategy of the other user. Second, we pose the problem as an instance of delayed sharing of information [?] and find the sufficient state that describes the optimal policy. We further simplify the state space to $\mathcal{P}(\mathbb{N}) \times \mathcal{P}(\mathbb{N})$ as opposed to $\mathcal{P}(\mathbb{N}^2)$, where $\mathcal{P}(\mathcal{X})$ is the space of all probability measures defined on a space $\mathcal{X}$, and $\mathbb{N}$ is the set of non-negative integers. In other words, we show that it is sufficient to restrict the state space to a pair of marginal probability distributions over the two queue lengths instead

of the joint probability distribution over the pair of queue lengths.

The remainder of this paper is structured as follows. In Section II, we present the model. In Section III, we formulate the problem and present our structural results. Section IV discusses the possibility of further simplifying the computation of optimal strategies, and their scalability. We conclude in Section V.

II. MODEL

Fig. 1. A simple relay channel.

Our model, as shown in Fig. 1, consists of a transmitter node (node 1), a relay node (node 2) and a receiver node (node 3). Time is discretized into slots, and we assume Bernoulli packet arrival processes $p_t$, $q_t$ at nodes 1 and 2, respectively, where the probability of arrival of a packet in any slot is $p$ for node 1 and $q$ for node 2, with $p, q \in [0,1]$. Both node 1 and node 2 have queues of infinite size. The transmitter has to send the arrived packets to the receiver, and it has the choice to either transmit directly to the receiver, transmit through the relay, or not transmit at all. We denote by $x^1_t$ and $x^2_t$ the number of packets at time $t$ in the queues of node 1 and node 2, respectively. Node 1 and node 2 take actions $u^1_t$, $u^2_t$, respectively, as a function of all the information gathered till time $t$, where

$$u^1_t \in \mathcal{U}^1 = \{0, E_{12}, E_{13}\} \quad (1a)$$
$$u^2_t \in \mathcal{U}^2 = \{0, E_{23}\} \quad (1b)$$

The possible actions for node 1 are wait ($0$), transmit to node 2 ($E_{12}$) and transmit to node 3 ($E_{13}$); the possible actions for node 2 are wait ($0$) and transmit to node 3 ($E_{23}$). The system evolution is given by

$$x^1_t = p_t + x^1_{t-1} - 1_{\{E_{12}, E_{13}\}}(u^1_{t-1})\, 1_{\{0\}}(u^2_{t-1}) \quad (2a)$$
$$x^2_t = q_t + x^2_{t-1} - 1_{\{E_{23}\}}(u^2_{t-1})\, 1_{\{0\}}(u^1_{t-1}) + 1_{\{E_{12}\}}(u^1_{t-1})\, 1_{\{0\}}(u^2_{t-1}), \quad (2b)$$

where $1_A(\cdot)$ is the indicator function of the set $A$. At the end of time slot $t$, node 1 and node 2 receive a noiseless feedback $w_t \in \{0, 1, 2, e\}$ from the receiver stating whether the slot had a successful transmission from node 1 ($1$) or node 2 ($2$), was idle ($0$), or had a collision ($e$). Thus each node at time $t$ can determine $(u^1_{t-1}, u^2_{t-1})$ from the feedback, so this is a delayed sharing of information with delay 1. Note, however, that the two users never get to know each other's queue lengths, and thus our model is different from the delayed-sharing information model studied in [?].

We define a generic, stationary instantaneous cost function $c(x^{1:2}_t, u^{1:2}_t)$ as a function of the queue lengths and actions of both nodes. To quantify the energy-delay tradeoff, we assume the following costs. The energy cost of a transmission from node 1 to node 3 is $E_{13}$, from node 1 to node 2 is $E_{12}$, and from node 2 to node 3 is $E_{23}$; simultaneous transmissions from node 1 and node 2 lead to unsuccessful reception (a collision), without any additional cost. To simplify notation, we use the same symbols for the actions as for the corresponding energy costs; the reference is clear from context. We also assume a delay cost equal to the total number of packets waiting in the queues of nodes 1 and 2, i.e., a cost of one unit per epoch for each packet in either queue. One instance of such a cost function that captures both energy and delay is $c(x^{1:2}_t, u^{1:2}_t) = x^1_t + x^2_t + u^1_t + u^2_t$. All costs are additive, and costs for future slots (or epochs) are discounted by a discount factor $\delta$, $0 < \delta < 1$. Due to the assumption of Bernoulli arrival processes, the basic random variables of the system are independent. We define $(E_{13}, E_{23}, E_{12}, \delta, p, q)$ as the basic parameters of the system.

III. DECENTRALIZED CONTROL

We consider the decentralized case where node 1 cannot observe the queue length of node 2 and vice versa. At time $t$, the information available to node $k$ is $(x^k_{1:t}, u^k_{1:t-1}, w_{1:t-1})$, which is equivalent to $(x^k_{1:t}, u^{1:2}_{1:t-1})$, and thus the control actions can be defined as follows:

$$u^1_t = g^1_t(x^1_{1:t}, u^1_{1:t-1}, w_{1:t-1}) = g^1_t(x^1_{1:t}, u^{1:2}_{1:t-1}) \quad (3)$$
$$u^2_t = g^2_t(x^2_{1:t}, u^2_{1:t-1}, w_{1:t-1}) = g^2_t(x^2_{1:t}, u^{1:2}_{1:t-1})$$

If $g^k = (g^k_1, g^k_2, \ldots)$ is any strategy of node $k$, $k \in \{1,2\}$, then $g = (g^1, g^2)$ is the combined strategy of both nodes, and the corresponding cost is given by

$$J^g = \mathbb{E}^g\Big\{\sum_{t=1}^{\infty} \delta^{t-1}\, c(X^1_t, X^2_t, U^1_t, U^2_t)\Big\} \quad (4)$$
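As a concrete illustration (not part of the formal development), the following Python sketch simulates the dynamics (2a)-(2b) and accumulates the discounted cost (4) under simple single-node policies. All numeric parameter values here are hypothetical, chosen only for the example.

```python
import random

# Hypothetical system parameters (illustrative, not from the paper).
p, q = 0.3, 0.2                  # Bernoulli arrival probabilities at nodes 1, 2
E12, E13, E23 = 1.0, 3.0, 1.5    # energy costs, doubling as action labels
WAIT = 0
delta = 0.95                     # discount factor

def step(x1, x2, u1, u2):
    """One slot of (2a)-(2b): a queue drains only on a collision-free
    transmission; a packet relayed via E12 joins queue 2. Policies are
    assumed never to transmit from an empty queue."""
    a1, a2 = int(random.random() < p), int(random.random() < q)  # arrivals
    ok1 = (u1 in (E12, E13)) and (u2 == WAIT)   # node 1 transmitted alone
    ok2 = (u2 == E23) and (u1 == WAIT)          # node 2 transmitted alone
    x1_next = x1 + a1 - int(ok1)
    x2_next = x2 + a2 - int(ok2) + int(u1 == E12 and u2 == WAIT)
    return x1_next, x2_next

def simulate(policy1, policy2, T=1000):
    """Roll out policies that see only their own queue, accumulating the
    discounted cost (4) with c = x1 + x2 + u1 + u2."""
    x1 = x2 = 0
    J = 0.0
    for t in range(T):
        u1, u2 = policy1(x1), policy2(x2)
        J += (delta ** t) * (x1 + x2 + u1 + u2)
        x1, x2 = step(x1, x2, u1, u2)
    return J

# Example: node 1 transmits directly whenever non-empty, node 2 waits.
J = simulate(lambda x1: E13 if x1 > 0 else WAIT, lambda x2: WAIT)
print(f"discounted cost estimate: {J:.2f}")
```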

Problem 1: Find the optimal decentralized policy $g$ that achieves the optimal cost, if it is achievable,

$$J^* := \inf_g J^g \quad (5)$$

where $J^g$ is as defined in (4) and the control actions $u^{1:2}_t$ are as in (3).

Here we prove a structural result for the optimal decentralized policy and show that it can be found as the solution of a dynamic programming equation. We first prove that there exist optimal control actions of a node that depend only on its current queue length and the entire control history of both nodes, i.e., $(x^k_t, u^{1:2}_{1:t-1})$. In a second simplification, we further show that there exists an optimal policy that depends on the current queue length $x^k_t$ and the posterior on $x^{1:2}_t$ conditioned on the control history $u^{1:2}_{1:t-1}$.

In this decentralized case, at time $t$, $x^k_{1:t}$ is the private information of node $k$ and $u^{1:2}_{1:t-1}$ is the common information available to both nodes. As shown in Fig. 2, the private information of the two nodes is separated by the common information, and thus it is intuitive to think that the private information of the two nodes is independent given the common information. This result is proved in the following lemma.
Fig. 2. System evolution: the private information of node 1 and of node 2 is separated by the common information.

Lemma 1: For any fixed strategy $g$, the random variables $X^1_{1:t}$ and $X^2_{1:t}$ are conditionally independent given the control history till time $t$, $U^{1:2}_{1:t-1}$, i.e.,

$$P^g(x^{1:2}_{1:t} \mid u^{1:2}_{1:t-1}) = P^g(x^1_{1:t} \mid u^{1:2}_{1:t-1})\, P^g(x^2_{1:t} \mid u^{1:2}_{1:t-1}) \quad (6)$$

Proof: The causal decomposition of $P^g(x^{1:2}_{1:t}, u^{1:2}_{1:t-1})$ gives

$$P^g(x^{1:2}_{1:t}, u^{1:2}_{1:t-1}) = P(x^1_1) \prod_{i=1}^{t-1} P(x^1_{i+1} \mid x^1_i, u^{1:2}_i)\, P^{g^1}(u^1_i \mid x^1_{1:i}, u^{1:2}_{1:i-1}) \cdot P(x^2_1) \prod_{j=1}^{t-1} P(x^2_{j+1} \mid x^2_j, u^{1:2}_j)\, P^{g^2}(u^2_j \mid x^2_{1:j}, u^{1:2}_{1:j-1}) \quad (7a)$$

Thus,

$$P^g(x^{1:2}_{1:t} \mid u^{1:2}_{1:t-1}) = \frac{P(x^1_1) \prod_{i=1}^{t-1} P(x^1_{i+1} \mid x^1_i, u^{1:2}_i)\, P^{g^1}(u^1_i \mid x^1_{1:i}, u^{1:2}_{1:i-1})}{\sum_{x^1_{1:t}} P(x^1_1) \prod_{i=1}^{t-1} P(x^1_{i+1} \mid x^1_i, u^{1:2}_i)\, P^{g^1}(u^1_i \mid x^1_{1:i}, u^{1:2}_{1:i-1})} \cdot \frac{P(x^2_1) \prod_{j=1}^{t-1} P(x^2_{j+1} \mid x^2_j, u^{1:2}_j)\, P^{g^2}(u^2_j \mid x^2_{1:j}, u^{1:2}_{1:j-1})}{\sum_{x^2_{1:t}} P(x^2_1) \prod_{j=1}^{t-1} P(x^2_{j+1} \mid x^2_j, u^{1:2}_j)\, P^{g^2}(u^2_j \mid x^2_{1:j}, u^{1:2}_{1:j-1})} \quad (7b)$$

$$= P^g(x^1_{1:t} \mid u^{1:2}_{1:t-1})\, P^g(x^2_{1:t} \mid u^{1:2}_{1:t-1}) \quad (7c)$$

Lemma 2: For any given fixed strategy $g^2$ of node 2, $\{(X^1_t, U^{1:2}_{1:t-1});\ t = 1, 2, \ldots\}$ is a controlled Markov process with state $(X^1_t, U^{1:2}_{1:t-1})$ and control input $U^1_t$, i.e.,

$$P^{g^2}(x^1_{t+1}, u^{1:2}_{1:t} \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) = P^{g^2}(x^1_{t+1}, u^{1:2}_{1:t} \mid x^1_t, u^{1:2}_{1:t-1}, u^1_t) \quad (8)$$

Furthermore, the expected instantaneous cost depends only on this state and control:

$$\mathbb{E}^g\{c(x^1_t, x^2_t, u^1_t, u^2_t) \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}\} = \mathbb{E}^{g^2}\{c(x^1_t, x^2_t, u^1_t, u^2_t) \mid x^1_t, u^{1:2}_{1:t-1}, u^1_t\} = \hat{c}(x^1_t, u^{1:2}_{1:t-1}, u^1_t) \quad (9)$$

Proof: See Appendix A.

As a consequence of the MDP structure of the problem, given a fixed strategy $g^2$ of node 2, the optimal control action of node 1 can be given as (for $k = 1$) [6]

$$u^k_t = g^k_t(x^k_t, u^{1:2}_{1:t-1}) \quad (10)$$

Since this is true for any fixed strategy of node 2, it is also true for the optimal strategy of node 2. A similar result holds for node 2, and thus the above equation is valid for $k \in \{1, 2\}$.

A. POMDP from the perspective of a coordinator

In the decentralized case, each node acts as a controller, and thus we have two linked stochastic control problems. The problem in (5) can equivalently be viewed from the perspective of a fictitious coordinator [?] who observes, at time $t$, the feedback $w_t$, or equivalently $u^{1:2}_{t-1}$ (the common information), but does not observe $x^k_t$, $k \in \{1,2\}$ (the private information). Thus at time $t$ it has access to the information $u^{1:2}_{1:t-1}$ (due to perfect recall), and based upon this information it generates partial functions $\gamma^{1:2}_t$ as its control output, where $\gamma^k_t : \mathbb{N} \to \mathcal{U}^k$, $k \in \{1,2\}$. Based upon these control outputs of the coordinator, node $k$, $k \in \{1,2\}$, computes its action by operating these partial functions on its private information $x^k_t$, as shown in Fig. 3. If the strategy of the coordinator is $\psi$, then

$$(\gamma^1_t, \gamma^2_t) = \psi_t(u^{1:2}_{1:t-1}) \quad (11a)$$
$$u^k_t = \gamma^k_t(x^k_t) \quad (11b)$$
$$= \psi_t(u^{1:2}_{1:t-1})(x^k_t) \quad (11c)$$
$$= g^k_t(u^{1:2}_{1:t-1}, x^k_t) \quad (11d)$$

Fig. 3. Control by the fictitious coordinator: the coordinator maps the common information $u^{1:2}_{1:t-1}$ to prescriptions $(\gamma^1_t, \gamma^2_t)$, and node $k$ obtains its action as $u^k_t = \gamma^k_t(x^k_t)$.
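To make the operation in (11) concrete, here is a minimal Python sketch of how a prescription, i.e., a map from a node's queue length to an action, is generated from common information and then evaluated locally. The specific rule inside `coordinator` is a placeholder invented for illustration; any measurable function of the common information is admissible.

```python
from typing import Callable

# Action labels; energy values double as action symbols, as in the paper.
WAIT, E12, E13, E23 = 0, 1.0, 3.0, 1.5

Prescription = Callable[[int], float]  # gamma^k_t : N -> U^k

def coordinator(common_info: list) -> tuple[Prescription, Prescription]:
    """psi_t: map the common information u^{1:2}_{1:t-1} to a pair of
    prescriptions (gamma^1_t, gamma^2_t). The rule below is hypothetical."""
    busy = sum(1 for (u1, u2) in common_info if u1 != WAIT or u2 != WAIT)
    gamma1 = lambda x1: E13 if x1 > max(1, busy % 3) else WAIT
    gamma2 = lambda x2: E23 if x2 > 0 else WAIT
    return gamma1, gamma2

# Online use, eq. (11): each node evaluates its prescription on its own
# private queue length; the coordinator never sees x^1_t or x^2_t.
common_info = [(WAIT, WAIT), (E12, WAIT)]      # u^{1:2}_{1:t-1}
gamma1, gamma2 = coordinator(common_info)
u1, u2 = gamma1(4), gamma2(0)                  # u^k_t = gamma^k_t(x^k_t)
print(u1, u2)                                  # -> 3.0 0
```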

Now we show that the belief on $x^{1:2}_t$, given the observation and control history till time $t$, i.e., $u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1}$, forms a sufficient state for the coordinator's problem. We define the random variable $\pi_t \in \mathcal{P}(\mathbb{N}^2)$ as the posterior pmf of $X^{1:2}_t$ conditioned on $U^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1}$, i.e.,

$$\pi_t(x^{1:2}_t) = P(X^{1:2}_t = x^{1:2}_t \mid U^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1}) \quad (12)$$

Lemma 3: There exists a deterministic update function $F$, independent of the policy $g$, that updates the state $\pi_t$ given the control $\gamma^{1:2}_t$ and the variable $u^{1:2}_t$:

$$\pi_{t+1} = F(\pi_t, \gamma^{1:2}_t, u^{1:2}_t) \quad (13)$$

Proof: See Appendix B.
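The update $F$ in (13) is made explicit in Appendix B: the joint belief is first Bayes-conditioned on the observed actions, which must be consistent with the prescriptions, and then propagated through the dynamics (2). A minimal Python sketch over a truncated support follows; the truncation bound and parameters are assumptions for illustration.

```python
import numpy as np

N = 20                       # truncate queue lengths to {0,...,N-1} (assumption)
p, q = 0.3, 0.2              # arrival probabilities (illustrative)
WAIT, E12, E13, E23 = 0, 1.0, 3.0, 1.5

def next_state(x1, x2, u1, u2, a1, a2):
    """Deterministic part of (2a)-(2b) given arrivals a1, a2."""
    ok1 = (u1 in (E12, E13)) and (u2 == WAIT)
    ok2 = (u2 == E23) and (u1 == WAIT)
    relay = (u1 == E12) and (u2 == WAIT)
    return x1 + a1 - int(ok1), x2 + a2 - int(ok2) + int(relay)

def F(pi, gamma1, gamma2, u1, u2):
    """One step of (13): condition the joint belief pi on the observed
    actions (u1, u2), then push it through the dynamics. (u1, u2) must be
    consistent with the prescriptions on the support of pi."""
    post = np.zeros_like(pi)
    for x1 in range(N):
        for x2 in range(N):
            if gamma1(x1) == u1 and gamma2(x2) == u2:  # indicator 1_u{gamma(x)}
                post[x1, x2] = pi[x1, x2]
    post /= post.sum()                                 # normalization as in (26g)
    new = np.zeros_like(pi)
    for x1 in range(N):
        for x2 in range(N):
            if post[x1, x2] == 0.0:
                continue
            for a1, pa1 in ((0, 1 - p), (1, p)):       # Bernoulli arrivals
                for a2, pa2 in ((0, 1 - q), (1, q)):
                    y1, y2 = next_state(x1, x2, u1, u2, a1, a2)
                    if 0 <= y1 < N and 0 <= y2 < N:
                        new[y1, y2] += post[x1, x2] * pa1 * pa2
    return new / new.sum()
```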

Proposition 1: The process $\{\pi_t;\ t = 1, 2, \ldots\}$ is a controlled Markov process with control $\gamma^{1:2}_t$, i.e.,

$$P(\pi_{t+1} \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) = P(\pi_{t+1} \mid \pi_t, \gamma^{1:2}_t) \quad (14)$$
$$\mathbb{E}\big(c(x^{1:2}_t, u^{1:2}_t) \mid \pi_{1:t}, \gamma^{1:2}_{1:t}\big) = \mathbb{E}\big(c(x^{1:2}_t, u^{1:2}_t) \mid \pi_t, \gamma^{1:2}_t\big) \quad (15)$$
$$=: \hat{c}(\pi_t, \gamma^{1:2}_t) \quad (16)$$

Proof: See Appendix C.

Since $\{\pi_t;\ t = 1, 2, \ldots\}$ is a controlled Markov process, the optimal output functions can be given by $(\gamma^1_t, \gamma^2_t) = \psi_t(\pi_t)$ [6], and thus the optimal action of node $k$ can be written as

$$u^k_t = g^k_t(x^k_t, \pi_t) \quad (17)$$
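For intuition, the expected instantaneous cost $\hat{c}(\pi_t, \gamma^{1:2}_t)$ in (16) can be evaluated directly from the belief and the prescriptions, as in this sketch (truncated support and parameter values are illustrative assumptions):

```python
import numpy as np

N = 20
WAIT, E12, E13, E23 = 0, 1.0, 3.0, 1.5

def c_hat(pi, gamma1, gamma2):
    """Expected one-step cost (16): sum of c(x, gamma(x)) weighted by the
    joint belief pi, with c = x1 + x2 + u1 + u2 as in Section II."""
    total = 0.0
    for x1 in range(N):
        for x2 in range(N):
            u1, u2 = gamma1(x1), gamma2(x2)
            total += pi[x1, x2] * (x1 + x2 + u1 + u2)
    return total

pi = np.full((N, N), 1.0 / N**2)            # uniform joint belief
g1 = lambda x: E13 if x > 2 else WAIT       # illustrative prescriptions
g2 = lambda x: E23 if x > 1 else WAIT
print(f"expected one-step cost: {c_hat(pi, g1, g2):.3f}")
```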

The dynamic program for the coordinator is

$$V(\pi) = \inf_{\gamma^{1:2}} \big[\hat{c}(\pi, \gamma^{1:2}) + \delta\, \mathbb{E}\{V(\pi') \mid \pi, \gamma^{1:2}\}\big] \quad (18)$$

where the expectation is with respect to the conditional probability induced by the update function $F$, with $u^{1:2}_t$ acting as a random variable (noise). This result is in accordance with [?].

Furthermore, due to the specific nature of our problem, we show that instead of the joint probability on the queue lengths of the two nodes, the individual marginals form a sufficient state. To that effect, we define the random variable $\pi^k_t \in \mathcal{P}(\mathbb{N})$ as the posterior pmf of $X^k_t$ conditioned on $U^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1}$, i.e., $\pi^k_t(x^k_t) = P(X^k_t = x^k_t \mid U^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1})$, and show that $(\pi^1_t, \pi^2_t)$ is a controlled Markov process. This gives a significant reduction in the size of the state over which optimal policies have to be searched, as $\pi$ is defined over the space $\mathcal{P}(\mathbb{N}^2)$ while $(\pi^1, \pi^2)$ is defined over $\mathcal{P}(\mathbb{N}) \times \mathcal{P}(\mathbb{N})$.

Lemma 4: There exist deterministic update functions $G^k$, $k \in \{1,2\}$, independent of the policy $g$, that update the state $\pi^k_t$ given the control $\gamma^k_t$ and the variable $u^{1:2}_t$:

$$\pi^k_{t+1} = G^k(\pi^k_t, \gamma^k_t, u^{1:2}_t), \qquad k \in \{1, 2\} \quad (19)$$

Proof: See Appendix D.

Proposition 2: The process $\{(\pi^1_t, \pi^2_t);\ t = 1, 2, \ldots\}$ is a controlled Markov process with controls $\gamma^{1:2}_t$, i.e.,

$$P^g(\pi^1_{t+1}, \pi^2_{t+1} \mid \pi^1_{1:t}, \pi^2_{1:t}, \gamma^{1:2}_{1:t}) = P(\pi^1_{t+1}, \pi^2_{t+1} \mid \pi^1_t, \pi^2_t, \gamma^{1:2}_t) \quad (20)$$
$$\mathbb{E}\big(c(x^{1:2}_t, u^{1:2}_t) \mid \pi^{1:2}_{1:t}, \gamma^{1:2}_{1:t}\big) = \hat{c}^1(\pi^1_t, \gamma^1_t) + \hat{c}^2(\pi^2_t, \gamma^2_t) \quad (21)$$

Proof: See Appendix E.

Since $\{(\pi^1_t, \pi^2_t);\ t = 1, 2, \ldots\}$ is a controlled Markov process, the optimal output functions can be given by $(\gamma^1_t, \gamma^2_t) = \psi_t(\pi^1_t, \pi^2_t)$. The dynamic programming equation can be given as

$$V(\pi^1, \pi^2) = \min_{\gamma^1, \gamma^2} \big[\hat{c}^1(\pi^1, \gamma^1) + \hat{c}^2(\pi^2, \gamma^2) + \delta\, \mathbb{E}\{V(\pi'^1, \pi'^2) \mid \pi^{1:2}, \gamma^{1:2}\}\big] \quad (22)$$

where the expectation is with respect to the conditional probability induced by the update functions $(G^1, G^2)$, with $u^{1:2}_t$ acting as a random variable (noise). Thus there exist optimal control actions of the form

$$u^k_t = g^k_t(x^k_t, \pi^1_t, \pi^2_t) \quad (23)$$

and $\psi$ can be found as the solution of (22). The action of node 1 is a function of its current queue length, its estimate of node 2's queue length ($\pi^2_t$) and also node 2's estimate of node 1's queue length ($\pi^1_t$). A similar result holds for node 2. In the online operation of the system, at time $t$, each node (transmitter and relay) first updates the quantities $\pi^{1:2}_{t-1}$ as dictated by the recursion (19); based upon $\pi^{1:2}_t$, they find the corresponding prescriptions $\gamma^{1:2}_t$ as the solution of (22); finally, they generate their actions $u^k_t$ by evaluating $\gamma^k_t$ on their private information $x^k_t$, i.e., $u^k_t = \gamma^k_t(x^k_t)$.
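This online loop can be sketched as follows. Belief supports are truncated to a finite range, the parameters are hypothetical, and `select_prescriptions` merely stands in for the true argmin of (22), which this sketch does not compute. Since both nodes run the same recursion on the common information, their beliefs stay synchronized.

```python
import random
import numpy as np

N = 20
p, q = 0.3, 0.2
WAIT, E12, E13, E23 = 0, 1.0, 3.0, 1.5

def G1(pi1, gamma1, u1, u2):
    """Marginal update of the form (19) for node 1: condition on node 1's
    observed action, then propagate through (2a) with Bernoulli(p) arrivals."""
    post = np.array([pi1[x] if gamma1(x) == u1 else 0.0 for x in range(N)])
    post /= post.sum()
    drain = int(u1 in (E12, E13) and u2 == WAIT)   # collision-free departure
    new = np.zeros(N)
    for x, w in enumerate(post):
        for a, pa in ((0, 1 - p), (1, p)):
            y = x + a - drain
            if 0 <= y < N:
                new[y] += w * pa
    return new / new.sum()

def G2(pi2, gamma2, u1, u2):
    """Marginal update for node 2: condition, then propagate through (2b)."""
    post = np.array([pi2[x] if gamma2(x) == u2 else 0.0 for x in range(N)])
    post /= post.sum()
    shift = int(u1 == E12 and u2 == WAIT) - int(u2 == E23 and u1 == WAIT)
    new = np.zeros(N)
    for x, w in enumerate(post):
        for a, pa in ((0, 1 - q), (1, q)):
            y = x + a + shift
            if 0 <= y < N:
                new[y] += w * pa
    return new / new.sum()

def select_prescriptions(pi1, pi2):
    """Placeholder for the argmin of (22): a heuristic threshold at the
    belief means, purely for illustration."""
    t1, t2 = pi1 @ np.arange(N), pi2 @ np.arange(N)
    return (lambda x: E13 if x > t1 else WAIT,
            lambda x: E23 if x > t2 else WAIT)

pi1 = pi2 = np.eye(N)[0]     # queues known to be empty at t = 1
x1 = x2 = 0                  # true (private) queue lengths
for t in range(5):
    g1, g2 = select_prescriptions(pi1, pi2)
    u1, u2 = g1(x1), g2(x2)                             # from private info
    pi1, pi2 = G1(pi1, g1, u1, u2), G2(pi2, g2, u1, u2) # recursion (19)
    a1, a2 = int(random.random() < p), int(random.random() < q)
    x1 = x1 + a1 - int(u1 != WAIT and u2 == WAIT)
    x2 = x2 + a2 - int(u2 == E23 and u1 == WAIT) + int(u1 == E12 and u2 == WAIT)
```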

IV. DISCUSSION

In our related research [9] we have also addressed the centralized version of this problem, where a centralized controller can observe the queue lengths of both nodes. The problem in that case is much simpler: it consists of a controlled Markov chain whose state is the pair of queue lengths. Thus the optimal action of the controller is a function of the current pair of queue lengths and can be found using dynamic programming. As expected, it avoids collisions, since they incur additional current cost without improving the state of the system. Also, the optimal policy is a threshold policy, so that the state space $\mathbb{N}^2$ is divided into contiguous regions, one for each possible action. We believe (although at this point we do not have a conclusive proof) that, due to the structure of the problem, in the decentralized case the optimal $\gamma^i_t$ functions are also threshold functions. For example, for node 2 there are two possible actions, namely wait ($0$) and transmit ($E_{23}$), and the optimal $\gamma^2_t$ could choose either action depending on whether the queue length is smaller or greater than some threshold value. This threshold, however, is not constant, but is a function of the state of the coordinator, i.e., $\pi^{1:2}_t$, as dictated by (22). This additional structural simplification could significantly reduce the computation of the optimal policy by transforming the functional optimization in (22) into a parameter optimization, as illustrated by the sketch below.
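As an illustration of this conjectured structure (not a result of the paper), a threshold prescription for node 2 can be parameterized by a single number computed from the coordinator state; the map `tau_from_beliefs` below is a hypothetical rule chosen only to show the reduced parameterization.

```python
import numpy as np

N = 20
WAIT, E23 = 0, 1.5

def threshold_prescription(tau):
    """A prescription gamma^2 described by one threshold tau:
    transmit iff the queue length exceeds tau."""
    return lambda x2: E23 if x2 > tau else WAIT

def tau_from_beliefs(pi1, pi2):
    """Hypothetical map from the coordinator state (pi^1, pi^2) to a
    threshold; under the threshold conjecture, (22) reduces to optimizing
    over tau rather than over all functions N -> U^2."""
    return 1.0 + (pi1 @ np.arange(N))   # illustrative rule only

pi1 = pi2 = np.full(N, 1.0 / N)
gamma2 = threshold_prescription(tau_from_beliefs(pi1, pi2))
print([gamma2(x) for x in range(5)])
```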

Lastly, we point out that this model cannot easily be extended to the case where there are more than two transmitter nodes (where the relay node in our model is also counted as a transmitter node). When there are only two transmitter nodes, in case of a collision each node can determine that the collision occurred due to a simultaneous transmission of the other node. But if there are three or more nodes, in case of a collision a node cannot determine which other node(s) transmitted simultaneously. More precisely, in the former case the feedback $w_t$ combined with $u^k_{t-1}$ gives the transmission profile of each user, i.e., $(u^1_{t-1}, u^2_{t-1})$, whereas in the latter case this is no longer true. Thus our model needs to be enriched so that each collision also conveys the information of which nodes transmitted. This could be achieved if each node transmits a signature waveform along with the data waveform, such that the signature waveforms of all users are mutually orthogonal and orthogonal to the data (e.g., in frequency).

V. CONCLUSION

We studied the energy-delay tradeoff in cooperative communication by posing it as a decentralized, infinite-horizon stochastic control problem. We proved a structural result stating that the optimal policy can be found by solving a dynamic programming equation and that the optimal control can be given as $u^k_t = g^k_t(x^k_t, \pi^1_t, \pi^2_t)$. The domain of optimization is the space of pairs of marginal probability mass functions on the integers, $\mathcal{P}(\mathbb{N}) \times \mathcal{P}(\mathbb{N})$, and algorithms similar to the ones used for POMDPs can be used to find the solution. Future research directions include the unveiling of additional structural properties of the optimal strategy (e.g., threshold strategies), as well as the design of optimal and efficient suboptimal strategies and the analysis of their performance.

APPENDIX A

Proof of Lemma 2:

$$P^g(x^1_{t+1}, u^{1:2}_{1:t} \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) = P^g(x^1_{t+1} \mid x^1_{1:t}, u^{1:2}_{1:t})\, P^g(u^{1:2}_{1:t} \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) \quad (24a)$$
$$= P(x^1_{t+1} \mid x^1_{1:t}, u^{1:2}_{1:t})\, P^{g^2}(u^2_t \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) \quad (24b)$$
$$= P(x^1_{t+1} \mid x^1_t, u^{1:2}_t)\, P^{g^2}(u^2_t \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) \quad (24c)$$
$$= P(x^1_{t+1} \mid x^1_t, u^{1:2}_t)\, P^{g^2}(u^2_t \mid u^{1:2}_{1:t-1}) \quad (24d)$$
$$= P^{g^2}(x^1_{t+1}, u^{1:2}_{1:t} \mid x^1_t, u^{1:2}_{1:t-1}, u^1_t) \quad (24e)$$

where (24c) follows since $x^1_{t+1} = f_t(x^1_t, p_{t+1}, u^{1:2}_t)$, with $f_t$ as defined in (2), and by the independence of the basic random variables; and (24d) follows since $U^2_t$ is a function of $X^2_{1:t}$, $U^1_t$ is a function of $X^1_{1:t}$, and $X^1_{1:t}, X^2_{1:t}$ are conditionally independent given $U^{1:2}_{1:t-1}$ (Lemma 1). For the second part,

$$\mathbb{E}^g\{c(x^1_t, x^2_t, u^1_t, u^2_t) \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}\} = \sum_{x^{1:2}_t, u^{1:2}_t} c(x^1_t, x^2_t, u^1_t, u^2_t)\, P^{g^2}(x^{1:2}_t, u^{1:2}_t \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) \quad (25a)$$
$$= \sum_{x^2_t, u^2_t} c(x^1_t, x^2_t, u^1_t, u^2_t)\, P^{g^2}(x^2_t, u^2_t \mid x^1_{1:t}, u^{1:2}_{1:t-1}, u^1_{1:t}) \quad (25b)$$
$$= \sum_{x^2_t, u^2_t} c(x^1_t, x^2_t, u^1_t, u^2_t)\, P^{g^2}(x^2_t, u^2_t \mid u^{1:2}_{1:t-1}) \quad (25c)$$
$$= \mathbb{E}^{g^2}\{c(x^1_t, x^2_t, u^1_t, u^2_t) \mid x^1_t, u^{1:2}_{1:t-1}, u^1_t\} \quad (25d)$$
$$= \hat{c}(x^1_t, u^{1:2}_{1:t-1}, u^1_t) \quad (25e)$$

where (25c) follows from Lemma 1.

APPENDIX B

Proof of Lemma 3: Fix $\gamma^{1:2}_{1:t}$.

$$\pi_{t+1}(x^{1:2}_{t+1}) = P(X^{1:2}_{t+1} = x^{1:2}_{t+1} \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) \quad (26a)$$
$$= \sum_{x^{1:2}_t} P(x^{1:2}_{t+1}, x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) \quad (26b)$$
$$= \sum_{x^{1:2}_t} P(x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t})\, P(x^{1:2}_{t+1} \mid x^{1:2}_t, u^{1:2}_t) \quad (26c)$$

Now,

$$P(x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) = \frac{P(x^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})}{\sum_{\tilde{x}^{1:2}_t} P(\tilde{x}^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})} \quad (26d)$$
$$= \frac{P(x^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})\, P(u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t}, x^{1:2}_t)}{\sum_{\tilde{x}^{1:2}_t} P(\tilde{x}^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})} \quad (26e)$$
$$= \frac{P(x^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}}{\sum_{\tilde{x}^{1:2}_t} P(\tilde{x}^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(\tilde{x}^{1:2}_t)\}} \quad (26f)$$

Since $\gamma^{1:2}_t = \psi_t(u^{1:2}_{1:t-1})$,

$$P(x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) = \frac{\pi_t(x^{1:2}_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}}{\sum_{\tilde{x}^{1:2}_t} \pi_t(\tilde{x}^{1:2}_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(\tilde{x}^{1:2}_t)\}} \quad (26g)$$

Thus,

$$\pi_{t+1} = F(\pi_t, \gamma^{1:2}_t, u^{1:2}_t) \quad (26h)$$

APPENDIX C

Proof of Proposition 1:

$$P(\pi_{t+1} \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) = \sum_{u^{1:2}_t} P(\pi_{t+1}, u^{1:2}_t \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) \quad (27a)$$
$$= \sum_{u^{1:2}_t} 1_{\pi_{t+1}}\{F(\pi_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, P(u^{1:2}_t \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) \quad (27b)$$
$$= \sum_{u^{1:2}_t, x^{1:2}_t} 1_{\pi_{t+1}}\{F(\pi_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}\, P(x^{1:2}_t \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) \quad (27c)$$
$$= \sum_{u^{1:2}_t, x^{1:2}_t} \pi_t(x^{1:2}_t)\, 1_{\pi_{t+1}}\{F(\pi_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (27d)$$
$$= P(\pi_{t+1} \mid \pi_t, \gamma^{1:2}_t) \quad (27e)$$

For the second part,

$$\mathbb{E}\big(c(x^{1:2}_t, u^{1:2}_t) \mid \pi_{1:t}, \gamma^{1:2}_{1:t}\big) = \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, P(x^{1:2}_t, u^{1:2}_t \mid \pi_{1:t}, \gamma^{1:2}_{1:t}) \quad (27f)$$
$$= \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, P(x^{1:2}_t \mid \pi_{1:t}, \gamma^{1:2}_{1:t})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (27g)$$
$$= \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, \pi_t(x^{1:2}_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (27h)$$
$$=: \hat{c}(\pi_t, \gamma^{1:2}_t) \quad (27i)$$

APPENDIX D

Proof of Lemma 4: For any fixed coordinator strategy $\psi$,

$$\pi^1_{t+1}(x^1_{t+1}) = P^\psi(x^1_{t+1} \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) \quad (28a)$$
$$= \sum_{x^{1:2}_t} P^\psi(x^1_{t+1}, x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) \quad (28b)$$
$$= \sum_{x^{1:2}_t} P^\psi(x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t})\, P(x^1_{t+1} \mid x^1_t, u^{1:2}_t) \quad (28c)$$

Now,

$$P^\psi(x^{1:2}_t \mid u^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) = \frac{P^\psi(x^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})}{\sum_{\tilde{x}^{1:2}_t} P^\psi(\tilde{x}^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})} \quad (28d)$$
$$= \frac{P^\psi(x^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})\, P^\psi(u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t}, x^{1:2}_t)}{\sum_{\tilde{x}^{1:2}_t} P^\psi(\tilde{x}^{1:2}_t, u^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t})} \quad (28e)$$
$$= \frac{P^\psi(x^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}}{\sum_{\tilde{x}^{1:2}_t} P^\psi(\tilde{x}^{1:2}_t \mid u^{1:2}_{1:t-1}, \gamma^{1:2}_{1:t-1})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(\tilde{x}^{1:2}_t)\}} \quad (28f)$$
$$= \frac{\pi^1_t(x^1_t)\, \pi^2_t(x^2_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}}{\sum_{\tilde{x}^{1:2}_t} \pi^1_t(\tilde{x}^1_t)\, \pi^2_t(\tilde{x}^2_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(\tilde{x}^{1:2}_t)\}} \quad (28g)$$

Thus,

$$\pi^1_{t+1}(x^1_{t+1}) = \sum_{x^{1:2}_t} P(x^1_{t+1} \mid x^1_t, u^{1:2}_t)\, \frac{\pi^1_t(x^1_t)\, \pi^2_t(x^2_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}}{\sum_{\tilde{x}^{1:2}_t} \pi^1_t(\tilde{x}^1_t)\, \pi^2_t(\tilde{x}^2_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(\tilde{x}^{1:2}_t)\}} \quad (28h)$$
$$= \frac{\Big[\sum_{x^1_t} P(x^1_{t+1} \mid x^1_t, u^{1:2}_t)\, \pi^1_t(x^1_t)\, 1_{u^1_t}\{\gamma^1_t(x^1_t)\}\Big] \Big[\sum_{x^2_t} 1_{u^2_t}\{\gamma^2_t(x^2_t)\}\, \pi^2_t(x^2_t)\Big]}{\Big[\sum_{x^1_t} \pi^1_t(x^1_t)\, 1_{u^1_t}\{\gamma^1_t(x^1_t)\}\Big] \Big[\sum_{x^2_t} 1_{u^2_t}\{\gamma^2_t(x^2_t)\}\, \pi^2_t(x^2_t)\Big]} \quad (28i)$$
$$= \frac{\sum_{x^1_t} P(x^1_{t+1} \mid x^1_t, u^{1:2}_t)\, \pi^1_t(x^1_t)\, 1_{u^1_t}\{\gamma^1_t(x^1_t)\}}{\sum_{x^1_t} \pi^1_t(x^1_t)\, 1_{u^1_t}\{\gamma^1_t(x^1_t)\}} \quad (28j)$$
$$= G^1(\pi^1_t, \gamma^1_t, u^{1:2}_t)(x^1_{t+1}) \quad (28k)$$

where (28g) is true since $X^1_t$ and $X^2_t$ are conditionally independent given $U^{1:2}_{1:t-1}$ (Lemma 1). Similarly, $\pi^2_{t+1} = G^2(\pi^2_t, \gamma^2_t, u^{1:2}_t)$, where $G^1$ and $G^2$ are deterministic functions.

APPENDIX E

Proof of Proposition 2: In the following we use the notation $G := (G^1, G^2)$.

$$P^g(\pi^1_{t+1}, \pi^2_{t+1} \mid \pi^1_{1:t}, \pi^2_{1:t}, \gamma^{1:2}_{1:t}) = \sum_{u^{1:2}_t} P^g(\pi^1_{t+1}, \pi^2_{t+1}, u^{1:2}_t \mid \pi^1_{1:t}, \pi^2_{1:t}, \gamma^{1:2}_{1:t}) \quad (29a)$$
$$= \sum_{u^{1:2}_t} 1_{\pi^1_{t+1}, \pi^2_{t+1}}\{G(\pi^1_t, \pi^2_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, P^g(u^{1:2}_t \mid \pi^1_{1:t}, \pi^2_{1:t}, \gamma^{1:2}_{1:t}) \quad (29b)$$
$$= \sum_{u^{1:2}_t, x^{1:2}_t} 1_{\pi^1_{t+1}, \pi^2_{t+1}}\{G(\pi^1_t, \pi^2_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\}\, P^g(x^{1:2}_t \mid \pi^1_{1:t}, \pi^2_{1:t}, \gamma^{1:2}_{1:t}) \quad (29c)$$
$$= \sum_{u^{1:2}_t, x^{1:2}_t} \pi^1_t(x^1_t)\, \pi^2_t(x^2_t)\, 1_{\pi^1_{t+1}, \pi^2_{t+1}}\{G(\pi^1_t, \pi^2_t, \gamma^{1:2}_t, u^{1:2}_t)\}\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (29d)$$
$$= \sum_{x^{1:2}_t} \pi^1_t(x^1_t)\, \pi^2_t(x^2_t)\, 1_{\pi^1_{t+1}, \pi^2_{t+1}}\{G(\pi^1_t, \pi^2_t, \gamma^{1:2}_t, \gamma^{1:2}_t(x^{1:2}_t))\} \quad (29e)$$
$$= P(\pi^1_{t+1}, \pi^2_{t+1} \mid \pi^1_t, \pi^2_t, \gamma^{1:2}_t) \quad (29f)$$

For the second part,

$$\mathbb{E}\big(c(x^{1:2}_t, u^{1:2}_t) \mid \pi^{1:2}_{1:t}, \gamma^{1:2}_{1:t}\big) = \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, P(x^{1:2}_t, u^{1:2}_t \mid \pi^{1:2}_{1:t}, \gamma^{1:2}_{1:t}) \quad (29g)$$
$$= \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, P(x^{1:2}_t \mid \pi^{1:2}_{1:t}, \gamma^{1:2}_{1:t})\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (29h)$$
$$= \sum_{x^{1:2}_t, u^{1:2}_t} c(x^{1:2}_t, u^{1:2}_t)\, \pi^1_t(x^1_t)\, \pi^2_t(x^2_t)\, 1_{u^{1:2}_t}\{\gamma^{1:2}_t(x^{1:2}_t)\} \quad (29i)$$
$$= \hat{c}(\pi^{1:2}_t, \gamma^{1:2}_t) \quad (29j)$$

For the additively separable cost $c = x^1_t + x^2_t + u^1_t + u^2_t$ considered here, the product form of the belief in (29i) splits this sum into $\hat{c}^1(\pi^1_t, \gamma^1_t) + \hat{c}^2(\pi^2_t, \gamma^2_t)$, as claimed in (21).

REFERENCES

[1] E. C. van der Meulen, "Three-terminal communication channels," Adv. Appl. Prob., pp. 120-154, 1971.
[2] T. M. Cover and A. A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Information Theory, pp. 572-584, 1979.
[3] A. Chakrabarti, A. Sabharwal, and B. Aazhang, Cooperative Wireless Communications: Fundamental Techniques and Enabling Technologies, 2007, ch. Cooperation in Wireless Networks: Principles and Applications.
[4] Y. E. Sagduyu and A. Ephremides, "A game-theoretic look at simple relay channel," ACM/Kluwer Journal of Wireless Networks, no. 5, pp. 545-560, Oct. 2006.
[5] D. Bertsekas, Dynamic Programming and Stochastic Control. Academic Press, 1976.
[6] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[7] A. Mahajan and D. Teneketzis, "On the design of globally optimal communication strategies for real-time communication systems with noisy feedback," IEEE J. Select. Areas Commun., no. 4, pp. 580-595, May 2008.
[8] A. Nayyar and D. Teneketzis, "On jointly optimal real-time encoding and decoding strategies in multi-terminal communication systems," in Proc. IEEE Conf. on Decision and Control, Cancun, Mexico, Dec. 2008, pp. 1620-1627.
[9] D. Vasal and A. Anastasopoulos, "Centralized and decentralized sequential transmission strategies using relays," University of Michigan, Ann Arbor, MI, Tech. Rep., Mar. 2011. Available: http://www.eecs.umich.edu/systems/TechReportList.html
