
Appendix A

Probability Theory.
A.1 Convergence Concepts in Probability: Random Variables
A.1.1 Convergence of Random Variables
We say that $X_n$ converges weakly to $X$ as $n \to \infty$ if
$$P(X_n \le x) \to P(X \le x) \qquad (A.1)$$
as $n \to \infty$, at every continuity point $x$ of the limiting distribution function $x \mapsto P(X \le x)$ (this notion of convergence is also known as convergence in distribution). This is, for example, the type of convergence underlying the central limit theorem. When (A.1) holds, this suggests that we may approximate the distribution of $X_n$ by that of $X$. We can denote this as
$$X_n \overset{D}{\approx} X,$$
where $\overset{D}{\approx}$ denotes "has approximately the same distribution as", and has no rigorous mathematical meaning (the rigorous statement here is (A.1)). In any case, we write
$$X_n \Rightarrow X \text{ as } n \to \infty$$
to denote weak convergence of $X_n$ to $X$.
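As an illustration (a simulation sketch, not part of the formal development; the choice of Uniform(0,1) summands, the grid of $x$-values, and the sample sizes are arbitrary assumptions of the example), the following Python snippet estimates $P(X_n \le x)$ for the standardized sum of $n$ Uniform(0,1) random variables and compares it with the standard normal distribution function appearing in the central limit theorem.

```python
import bisect
import math
import random

def clt_cdf_gap(n, trials=20000, seed=0):
    """Monte Carlo check of (A.1) for the CLT: X_n is the standardized sum of n
    Uniform(0,1) variables, X ~ N(0,1). Returns max |P(X_n <= x) - P(X <= x)|
    over a grid of x-values."""
    rng = random.Random(seed)
    mean, sd = 0.5, math.sqrt(1.0 / 12.0)
    samples = []
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        samples.append((s - n * mean) / (sd * math.sqrt(n)))
    samples.sort()
    norm_cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    gaps = []
    for x in [i / 10.0 for i in range(-30, 31)]:
        emp = bisect.bisect_right(samples, x) / trials  # empirical P(X_n <= x)
        gaps.append(abs(emp - norm_cdf(x)))
    return max(gaps)

if __name__ == "__main__":
    for n in (1, 5, 50):
        print(n, round(clt_cdf_gap(n), 4))  # gap shrinks as n grows
```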
Exercise A.1
Find an example of a sequence of random variables $(X_n : n \ge 1)$ such that there exists a random variable $X$ for which $X_n \Rightarrow X$, but for which $P(X_n \le x) \not\to P(X \le x)$ for at least one $x$-value.
Exercise A.2
Show that if $X_n \Rightarrow X$ as $n \to \infty$, where $X$ is a continuous random variable, then the convergence in (A.1) is uniform, i.e.
$$\sup_x \, |P(X_n \le x) - P(X \le x)| \to 0 \text{ as } n \to \infty.$$
A.1.2 Convergence in Probability of Random Variables.
We say that $X_n$ converges in probability to $X$ as $n \to \infty$ if for each $\epsilon > 0$,
$$P(|X_n - X| > \epsilon) \to 0 \qquad (A.2)$$
as $n \to \infty$. When $X_n$ converges in probability to $X$, we write
$$X_n \overset{P}{\to} X \text{ as } n \to \infty.$$
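As a quick numerical illustration (a sketch only; the particular sequence $X_n = X + Z_n/n$ is an assumption made for this example), the following snippet estimates the probability in (A.2) for a few values of $n$.

```python
import random

def prob_exceed(n, eps=0.1, trials=100000, seed=1):
    """Estimate P(|X_n - X| > eps) for the illustrative choice
    X ~ Uniform(0,1) and X_n = X + Z_n/n with Z_n ~ N(0,1),
    so that X_n -> X in probability."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        x = rng.random()
        x_n = x + rng.gauss(0.0, 1.0) / n
        if abs(x_n - x) > eps:
            count += 1
    return count / trials

if __name__ == "__main__":
    for n in (1, 5, 25, 125):
        print(n, prob_exceed(n))  # estimates of (A.2) shrink toward 0
```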
Exercise A.3
Prove that if $X_n \overset{P}{\to} X$ as $n \to \infty$, then $X_n \Rightarrow X$ as $n \to \infty$. Thus, convergence in probability implies weak convergence.
A.1.3 Convergence in pth mean of Random Variables.
For $1 \le p < \infty$, let $L_p$ be the vector space of all random variables $Y$ defined on a given probability space for which $E|Y|^p < \infty$ (to be precise, we should write $L_p = L_p(\Omega, \mathcal{F}, P)$). Define a norm $\|\cdot\|_p$ on $L_p$ as
$$\|Y\|_p = (E|Y|^p)^{1/p}.$$
We say that $X_n$ converges to $X$ in $p$th mean (or in $L_p$) if $X_n, X \in L_p$ and
$$\|X_n - X\|_p \to 0$$
as $n \to \infty$.
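For illustration (a sketch with an assumed sequence in which $X_n - X = Z/\sqrt{n}$ for a standard normal $Z$), the following snippet estimates $\|X_n - X\|_p$ by Monte Carlo and shows it shrinking with $n$.

```python
import random

def lp_distance(n, p=2, trials=200000, seed=2):
    """Monte Carlo estimate of ||X_n - X||_p for the illustrative sequence
    with X_n - X = Z / sqrt(n), Z ~ N(0,1); the exact value is n**(-1/2)
    when p = 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)
        total += abs(z / n**0.5) ** p
    return (total / trials) ** (1.0 / p)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, round(lp_distance(n), 4))  # decreases roughly like 1/sqrt(n)
```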
Exercise A.4
(a) Prove that for $\epsilon > 0$,
$$P(|X_n - X| > \epsilon) \le \frac{\|X_n - X\|_p^p}{\epsilon^p}.$$
(b) Prove that if $X_n$ converges to $X$ in $p$th mean, then $X_n \overset{P}{\to} X$ as $n \to \infty$.
(c) Find a sequence $(X_n : n \ge 1)$ where $X_n \overset{P}{\to} X$ as $n \to \infty$, but $X_n$ does not converge to $X$ in $p$th mean.
(d) Prove that if $X_n$ converges to $X$ in $p$th mean, then $X_n$ converges to $X$ in $q$th mean for $q \le p$.
A.1.4 Almost Sure Convergence of Random Variables.
We say that $X_n$ converges almost surely to $X$ (written as $X_n \to X$ a.s. as $n \to \infty$) if
$$P(\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty) = 1.$$
This type of convergence is also known as convergence with probability one. One important technical advantage of almost sure convergence is that it permits one to use elementary real-variable arguments to establish convergence.
Example A.1
Suppose $X_n \to X$ a.s. as $n \to \infty$ and $Y_n \to Y$ a.s. as $n \to \infty$. How might we prove that
$$X_n + Y_n \to X + Y \text{ a.s. as } n \to \infty?$$
Since $X_n \to X$ a.s. as $n \to \infty$ and $Y_n \to Y$ a.s. as $n \to \infty$, we know that $P(A) = P(B) = 1$, where
$$A = \{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\},$$
$$B = \{\omega : Y_n(\omega) \to Y(\omega) \text{ as } n \to \infty\}.$$
But for each fixed sample outcome $\omega$, $(X_n(\omega) : n \ge 1)$ and $(Y_n(\omega) : n \ge 1)$ are (deterministic) real-valued sequences. So if $\omega \in A \cap B$, then both $(X_n(\omega) : n \ge 1)$ and $(Y_n(\omega) : n \ge 1)$ are convergent sequences, so elementary real-variable arguments imply that
$$X_n(\omega) + Y_n(\omega) \to X(\omega) + Y(\omega)$$
as $n \to \infty$.
Set $C = \{\omega : X_n(\omega) + Y_n(\omega) \to X(\omega) + Y(\omega) \text{ as } n \to \infty\}$. We have just shown that $A \cap B \subseteq C$. Hence, $P(C) \ge P(A \cap B) \ge P(A) + P(B) - 1 = 1$. So, we conclude that $X_n + Y_n \to X + Y$ a.s. as $n \to \infty$. This type of argument, in which we argue sample path by sample path, can be quite powerful.
Exercise A.5
(a) Suppose that $X_n \to X$ a.s. as $n \to \infty$. Prove that $X_n \overset{P}{\to} X$ as $n \to \infty$.
(b) Find an example of random variables $(X_n : n \ge 1)$ such that $X_n \overset{P}{\to} X$ as $n \to \infty$, but for which $X_n \not\to X$ a.s. as $n \to \infty$.
(c) Find a sequence $(X_n : n \ge 1)$ such that $X_n$ converges to $X$ in $p$th mean as $n \to \infty$, but $X_n \not\to X$ a.s. as $n \to \infty$.

[Figure A.1: Different Modes of Convergence for Random Variables. Almost sure convergence and convergence in pth mean each imply convergence in probability, which in turn implies weak convergence (convergence in distribution).]
Exercise A.6
(a) Let $X$ and $(X_n : n \ge 1)$ be given, and set $I_n = I(|X_n - X| > \epsilon)$. Suppose that for each $\epsilon > 0$,
$$\sum_{n=1}^{\infty} I_n < \infty \text{ a.s.} \qquad (A.3)$$
Then, prove that $X_n \to X$ a.s. as $n \to \infty$.
(b) Prove that (A.3) holds if
$$\sum_{n=1}^{\infty} E(X_n - X)^2 < \infty.$$
A pictorial summary of this discussion on convergence is shown in Fig A.1.
A.2 Convergence Concepts in Probability: Random Elements.
In many applications, we want to approximate more complicated objects than random variables. For example, we might wish to show that the random walk can be approximated, as a stochastic process, by Brownian motion. To accomplish this, we wish to extend the concepts described in Appendix A.1 to more general random objects, namely random elements (where "random element" is just our terminology to highlight the fact that the object may no longer be real-valued, as in the case of random variables). The most important class of random elements we shall consider are random processes with paths either in $C[0, \infty)$ (the space of functions $x : [0, \infty) \to \mathbb{R}^d$ that are continuous) or $D[0, \infty)$ (the space of right-continuous functions $x : [0, \infty) \to \mathbb{R}^d$ with left limits). Both of these function spaces can be viewed as metric spaces.
A.2.1 Metric Spaces.
Let $S$ be a non-empty set. A metric on $S$ is a distance function $d$ which measures the distance between pairs of points. Specifically, $d : S \times S \to [0, \infty)$ must satisfy the following:
(i) $d(x, y) = 0$ if and only if $x = y$;
(ii) $d(x, y) = d(y, x)$;
(iii) $d(x, y) \le d(x, z) + d(z, y)$,
for $x, y, z \in S$.
Example A.2
Let $S = \mathbb{R}^d$ and $d(x, y) = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$; then $(S, d)$ is a metric space.
Example A.3
Let $S = C[0, 1]$, the space of functions $x : [0, 1] \to \mathbb{R}^d$ that are continuous. Put
$$d(x, y) = \sup_{0 \le t \le 1} \|x(t) - y(t)\|,$$
where $\|\cdot\|$ is the Euclidean norm on $\mathbb{R}^d$. This metric is known as the uniform metric (inducing the topology of uniform convergence) on $C[0, 1]$.
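As a computational aside (a sketch only: the supremum over $[0,1]$ is approximated by a maximum over a finite grid, and the two test functions are arbitrary choices), one can approximate the uniform metric numerically as follows.

```python
import math

def uniform_metric(x, y, grid_points=1001):
    """Approximate the uniform metric d(x, y) = sup_{0<=t<=1} |x(t) - y(t)|
    on C[0,1] (with d = 1 here) by taking the maximum over a finite grid.
    This is only an approximation: the true supremum could be attained
    between grid points."""
    grid = [i / (grid_points - 1) for i in range(grid_points)]
    return max(abs(x(t) - y(t)) for t in grid)

if __name__ == "__main__":
    x = lambda t: math.sin(2 * math.pi * t)
    y = lambda t: t
    print(round(uniform_metric(x, y), 4))
```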
The notion of completeness easily generalizes from $\mathbb{R}^d$ to the general metric space setting.
Definition A.1
We say that a sequence $(x_n : n \ge 1)$ is Cauchy in $(S, d)$ if for each $\epsilon > 0$, there exists $n = n(\epsilon)$ such that $d(x_k, x_l) < \epsilon$ for $k, l \ge n$.
Definition A.2
We say that $(S, d)$ is complete if every Cauchy sequence $(x_n : n \ge 1)$ converges to some point $x \in S$ (i.e., there exists $x \in S$ such that $d(x_n, x) \to 0$ as $n \to \infty$).
Not all metric spaces are complete; however, both $\mathbb{R}^d$ and $C[0, 1]$ are.
Another important metric space concept is that of separability.
Definition A.3
We say that $(S, d)$ is separable if there exists a countable subset $D \subseteq S$ such that for each $x \in S$ and $\epsilon > 0$, there exists $y \in D$ for which $d(x, y) < \epsilon$.
Again, not all metric spaces are separable, but both $\mathbb{R}^d$ and $C[0, 1]$ are.
A.2.2 Convergence in Probability of Random Elements.
Let $X_n$ and $X$ be random elements taking values in $S$. In other words, $X_n : \Omega \to S$ (and similarly for $X$). We say that $X_n$ converges in probability to $X$ as $n \to \infty$ if
$$d(X_n, X) \overset{P}{\to} 0$$
as $n \to \infty$ (note that $d(X_n, X)$ is a real-valued random variable, so the above statement just involves ordinary convergence in probability).
A.2.3 Almost Sure Convergence of Random Elements.
Again, let $(S, d)$ be a metric space, and suppose that $X_n$ and $X$ are $S$-valued random elements. We say that $X_n$ converges almost surely to $X$ as $n \to \infty$ if
$$d(X_n, X) \to 0 \text{ a.s. as } n \to \infty$$
(as in Section A.2.2 above, $d(X_n, X)$ is a real-valued random variable, so the above statement involves ordinary almost sure (a.s.) convergence).
A.2.4 Weak Convergence of Random Elements.
Weak convergence is the most challenging convergence concept to extend to the metric space setting. It is clear, given (A.1), that we need to reformulate the notion of weak convergence in order to extend it to the metric space setting.
So, we will now find an equivalent characterization of weak convergence for random variables. The key to our new characterization is the Skorohod representation theorem.
Theorem A.1
A sequence of random variables $(X_n : n \ge 1)$ satisfies the relationship $X_n \Rightarrow X$ as $n \to \infty$ if and only if there exists a probability space supporting random variables $(X_n' : n \ge 1)$ and $X'$ such that:
(i) $X_n \overset{D}{=} X_n'$, $n \ge 1$;
(ii) $X \overset{D}{=} X'$;
(iii) $X_n' \to X'$ a.s. as $n \to \infty$.
The above theorem asserts that, on a suitable probability space, weak convergence may be viewed as a.s. convergence. Of course, the "if" part of the above theorem is trivial. The more interesting and instructive part of the proof is the "only if" part, which we now outline.
Let $F_n(x) = P(X_n \le x)$ and $F(x) = P(X \le x)$. Set
$$F_n^{-1}(y) = \inf\{x : F_n(x) \ge y\},$$
$$F^{-1}(y) = \inf\{x : F(x) \ge y\}.$$
It is straightforward to prove that if $F_n$ converges to $F$ at all continuity points of $F$, then
$$F_n^{-1}(y) \to F^{-1}(y) \qquad (A.4)$$
as $n \to \infty$, except at (at most) countably many points $y$ (see, for example, p. 287 of (3)). Choose any probability space supporting a random variable $U$ that is uniform on $[0, 1]$, and set
$$X_n' = F_n^{-1}(U),$$
$$X' = F^{-1}(U).$$
It is easy to show that $X_n' \overset{D}{=} X_n$ and $X' \overset{D}{=} X$. It follows from (A.4) that
$$X_n' \to X' \text{ a.s. as } n \to \infty,$$
proving the "only if" part of the Skorohod representation theorem for random variables.
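A small numerical sketch of this construction may help; here we assume the illustrative case $X_n \sim$ Exponential$(1 + 1/n)$ and $X \sim$ Exponential$(1)$, for which the quantile functions $F_n^{-1}$ and $F^{-1}$ are explicit, and feed them a single common uniform variable $U$.

```python
import math
import random

def skorohod_coupling(trials=3, seed=3):
    """Inverse-transform construction from the proof sketch above, for the
    illustrative case X_n ~ Exponential(1 + 1/n) and X ~ Exponential(1),
    where F_n^{-1}(y) = -log(1 - y)/(1 + 1/n). Along each fixed U, the
    coupled variables X_n' converge to X'."""
    rng = random.Random(seed)
    for _ in range(trials):
        u = rng.random()                  # one common U ~ Uniform(0,1)
        x_star = -math.log(1.0 - u)       # X' = F^{-1}(U)
        for n in (1, 10, 100, 1000):
            x_n_star = -math.log(1.0 - u) / (1.0 + 1.0 / n)  # X_n' = F_n^{-1}(U)
            print(n, round(x_n_star, 4), "->", round(x_star, 4))
        print()

if __name__ == "__main__":
    skorohod_coupling()
```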
Given the Skorohod representation theorem for random variables, the following result is easily established.
Proposition A.1
Suppose that $(X_n : n \ge 1)$ and $X$ are random variables such that $X_n \Rightarrow X$ as $n \to \infty$. Then
$$Ef(X_n) \to Ef(X) \qquad (A.5)$$
as $n \to \infty$ for each bounded continuous function $f : \mathbb{R} \to \mathbb{R}$.
Proof. We use the Skorohod representation theorem to find an equivalent sequence such that
$$X_n' \to X' \text{ a.s. as } n \to \infty.$$
Clearly, if $f$ is continuous, then $f(X_n') \to f(X')$ a.s. as $n \to \infty$. The bounded convergence theorem then yields the result. $\square$
The converse to Proposition A.1 also holds, so that requiring (A.5) for every bounded continuous function is equivalent to weak convergence. This is a notion that generalizes nicely to random elements.
Definition A.4
Let $(X_n : n \ge 1)$ and $X$ be random elements on a metric space $(S, d)$. Then, $X_n \Rightarrow X$ as $n \to \infty$ if and only if
$$Ef(X_n) \to Ef(X) \qquad (A.6)$$
as $n \to \infty$ for every bounded continuous function $f : S \to \mathbb{R}$.
Actually, it turns out that we do not need to verify (A.6) for all bounded continuous functions f;
we may restrict the class somewhat.
Proposition A.2
Let $(X_n : n \ge 1)$ and $X$ be random elements on a metric space $(S, d)$. Then, $X_n \Rightarrow X$ as $n \to \infty$ if and only if
$$Ef(X_n) \to Ef(X) \qquad (A.7)$$
as $n \to \infty$ for each bounded uniformly continuous function $f : S \to \mathbb{R}$.
For a proof of this result, see p. 24 of (2).
With this definition of weak convergence, the Skorohod representation theorem also suitably generalizes.
Theorem A.2
Let $(S, d)$ be a separable metric space, and suppose that $(X_n : n \ge 1)$ and $X$ are random elements taking values in $(S, d)$. Then, $X_n \Rightarrow X$ as $n \to \infty$ if and only if there exists a probability space supporting random elements $(X_n' : n \ge 1)$ and $X'$ such that:
(i) $X_n \overset{D}{=} X_n'$, $n \ge 1$;
(ii) $X \overset{D}{=} X'$;
(iii) $d(X_n', X') \to 0$ a.s. as $n \to \infty$.
For a proof of this general result, see pp. 102-103 of (8).
One very important result in the theory of weak convergence is the continuous mapping principle.
Proposition A.3
Let $(S_1, d_1)$ and $(S_2, d_2)$ be two metric spaces. If $X_n \Rightarrow X$ as $n \to \infty$ in $(S_1, d_1)$ and the function $g : S_1 \to S_2$ is continuous, then $g(X_n) \Rightarrow g(X)$ as $n \to \infty$ in $(S_2, d_2)$.
Exercise A.7
Prove Proposition A.3.
Actually, $g$ need not be continuous everywhere. Let $D_g$ be the set of discontinuities of $g$. The following is a useful extension of the continuous mapping principle.
Proposition A.4
Let $(S_1, d_1)$ and $(S_2, d_2)$ be two metric spaces. If $X_n \Rightarrow X$ as $n \to \infty$ in $(S_1, d_1)$ and the function $g : S_1 \to S_2$ satisfies $P(X \in D_g) = 0$, then $g(X_n) \Rightarrow g(X)$ as $n \to \infty$ in $(S_2, d_2)$.
See p. 31 of (2) for a proof.
In any case, when $X_n \Rightarrow X$ as $n \to \infty$, we can approximate $X_n$ (for large $n$) as follows:
$$X_n \overset{D}{\approx} X.$$
A.2.5 Total Variation Convergence.
Weak convergence requires utilizing the concept of a distance (or topology, in general) on S. The
notion of total variation convergence avoids this.
Definition A.5
We say that $X_n$ converges in total variation to $X$ (written $X_n \overset{t.v.}{\to} X$ as $n \to \infty$) if
$$\sup_A \, |P(X_n \in A) - P(X \in A)| \to 0 \text{ as } n \to \infty. \qquad (A.8)$$
Clearly, total variation convergence implies weak convergence (since (A.8) implies that $Ef(X_n) \to Ef(X)$ for all bounded functions $f$). However, the converse is false.
Exercise A.8
Let $B(n, p)$ be binomial with parameters $n$ and $p$. Show that
$$\frac{B(n, p) - np}{\sqrt{np(1 - p)}}$$
does not converge in total variation to $N(0, 1)$ as $n \to \infty$.
Exercise A.9
Let $(S_n : n \ge 0)$ be a random walk in which the increments are i.i.d. with finite variance. Prove that there exists no random variable $Z$ such that $(S_n - n\mu)/\sqrt{n} \overset{t.v.}{\to} Z$ as $n \to \infty$, where $\mu = ES_1$. Reconcile this result with the Skorohod representation theorem.
A.3 The Metric Space $C[0, \infty)$.
Let $C[0, \infty)$ be the space of functions $x : [0, \infty) \to \mathbb{R}^d$ that are continuous. We equip $C[0, \infty)$ with the metric
$$d(x, y) = \int_0^\infty e^{-t} \, \frac{\sup_{0 \le s \le t} \|x(s) - y(s)\|}{1 + \sup_{0 \le s \le t} \|x(s) - y(s)\|} \, dt.$$
This induces the topology of uniform convergence on compact sets, namely $d(x_n, x) \to 0$ if and only if for each $u \ge 0$,
$$\sup_{0 \le s \le u} \|x_n(s) - x(s)\| \to 0$$
as $n \to \infty$.
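A numerical sketch of this metric (an approximation only: the integral is truncated at a finite horizon and each supremum is taken over a grid, both choices being assumptions of the example) is the following.

```python
import math

def c_infty_metric(x, y, horizon=30.0, dt=0.01):
    """Riemann-sum approximation of the metric on C[0, infinity) given above,
    truncating the integral at `horizon` (the e^{-t} weight makes the tail
    negligible) and tracking the running supremum over [0, t] on a grid."""
    total = 0.0
    running_sup = 0.0
    t = 0.0
    while t < horizon:
        running_sup = max(running_sup, abs(x(t) - y(t)))
        total += math.exp(-t) * running_sup / (1.0 + running_sup) * dt
        t += dt
    return total

if __name__ == "__main__":
    x = lambda t: math.sin(t)
    y = lambda t: math.sin(t) + 1.0 / (1.0 + t)
    print(round(c_infty_metric(x, y), 4))
```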
A.4 The Metric Space $D[0, \infty)$.
Let $D[0, \infty)$ be the space of functions $x : [0, \infty) \to \mathbb{R}^d$ that are right continuous with left limits. We could try using the same metric as on $C[0, \infty)$. However, that metric is unsuitable, because it would assert that $x_n \not\to x$ as $n \to \infty$ when
$$x_n(t) = \begin{cases} 0, & t < 1/n \\ 1, & t \ge 1/n \end{cases}$$
and $x(t) \equiv 1$.
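The following short check (a sketch; the horizon $u = 2$ and the grid are arbitrary choices) confirms that the uniform distance between $x_n$ and $x \equiv 1$ remains equal to 1 for every $n$, which is why the uniform metric declares that $x_n$ does not converge to $x$.

```python
def sup_distance_on_grid(n, u=2.0, steps=4000):
    """For the step function x_n above (0 before 1/n, 1 from 1/n on) and
    x(t) = 1, approximate sup_{0 <= t <= u} |x_n(t) - x(t)| on a grid.
    The value stays 1 for every n, even though the jump time 1/n -> 0."""
    sup = 0.0
    for i in range(steps + 1):
        t = u * i / steps
        x_n = 0.0 if t < 1.0 / n else 1.0
        sup = max(sup, abs(x_n - 1.0))
    return sup

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, sup_distance_on_grid(n))  # always 1.0
```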
In order to deal with such discontinuities, another topology is used, namely the Skorohod topology (see pp. 116-127 of (8)). However, it should be noted that uniform convergence on compact sets implies convergence in the Skorohod topology.
In any case, $D[0, \infty)$ is complete and separable under the Skorohod topology. Another important fact related to the Skorohod topology is that if $x_n \to x$ in this topology as $n \to \infty$, with $x$ continuous, then the convergence is automatically uniform on compact sets. So, consider $g : D[0, \infty) \to \mathbb{R}$ such that $g(x_n) \to g(x)$ as $n \to \infty$ whenever
$$\sup_{0 \le t \le u} \|x_n(t) - x(t)\| \to 0 \qquad (A.9)$$
for each $u \ge 0$. For such a $g$, $D_g$ (the set of discontinuities of $g$) is clearly contained in the set of discontinuous functions in $D[0, \infty)$. Thus, $P(B \in D_g) = 0$ whenever $B$ is a process with continuous paths (such as Brownian motion), provided $g$ satisfies the condition (A.9). This can be useful in applying Proposition A.4.
A.5 The Hilbert Space of Square-Integrable Random Variables.
For a given probability space $(\Omega, \mathcal{F}, P)$, let
$$L_2 = \{X : EX^2 < \infty\}$$
be the set of all square-integrable random variables. The set $L_2$ is a vector space under the usual operations of addition of random variables and scalar multiplication. Furthermore, let $\langle X, Y \rangle$ be the bilinear form defined by
$$\langle X, Y \rangle = E[XY]$$
for $X, Y \in L_2$. Then, $\langle \cdot, \cdot \rangle$ is an inner product on $L_2$, and $\|\cdot\|$, defined by
$$\|X\| = \sqrt{\langle X, X \rangle},$$
is a norm on $L_2$.
It turns out that $L_2$ is complete under the above norm, in the sense that if $(X_n : n \ge 0)$ is a Cauchy sequence in the norm $\|\cdot\|$, then there exists a random variable $X_\infty \in L_2$ such that
$$\|X_n - X_\infty\| \to 0$$
as $n \to \infty$; see p. 117 of (16) for a proof.
A complete inner product space is known as a Hilbert space. Hence, $L_2$ is a Hilbert space under the above inner product. One very important characteristic of a Hilbert space is that it induces a geometry on the space. For example, one can now precisely define the "angle" between two points in the space. Specifically, two random variables $X$ and $Y$ in $L_2$ are orthogonal to one another if
$$\langle X, Y \rangle = 0.$$
This notion of a geometry on the space of random variables can often be very useful, both theoretically and in terms of developing intuition.
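For instance (a Monte Carlo sketch; the particular pair $X = Z$ and $Y = Z^2 - 1$ built from one standard normal $Z$ is an assumption chosen because $E[XY] = E[Z^3] - E[Z] = 0$), one can estimate inner products in $L_2$ and check orthogonality numerically.

```python
import random

def inner_product(f, g, trials=200000, seed=4):
    """Monte Carlo estimate of the L_2 inner product <f(Z), g(Z)> = E[f(Z) g(Z)]
    for random variables built from a single standard normal Z (an
    illustrative setup, not the general case)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)
        total += f(z) * g(z)
    return total / trials

if __name__ == "__main__":
    X = lambda z: z               # X = Z
    Y = lambda z: z * z - 1.0     # Y = Z^2 - 1, so E[XY] = E[Z^3] - E[Z] = 0
    print(round(inner_product(X, Y), 3))   # near 0: X and Y are orthogonal
    print(round(inner_product(X, X), 3))   # near 1: ||X||^2 = E[Z^2] = 1
```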
A.6 Interchanging Limits and Expectations.
If $X_n \to X$ a.s. as $n \to \infty$, it does not necessarily follow that $EX_n \to EX$ as $n \to \infty$.
Exercise A.10
Provide an example of a sequence of integrable random variables $(X_n : n \ge 1)$ such that $X_n \to X$ a.s. as $n \to \infty$ with $E|X| < \infty$, but for which $EX_n \not\to EX$ as $n \to \infty$.
Measure-theoretic probability provides several useful sufficient conditions for justifying the interchange of limits and expectations. These are conditions under which
$$\lim_{n \to \infty} EX_n = E \lim_{n \to \infty} X_n = EX,$$
where $\lim_{n \to \infty} X_n = X$.
One important sufficient condition is the Bounded Convergence Theorem (BCT).
Theorem A.3 (Bounded Convergence Theorem)
Suppose that $X_n \to X$ a.s. as $n \to \infty$, where there exists $c < \infty$ such that
$$\sup_{n \ge 1, \, \omega \in \Omega} |X_n(\omega)| \le c.$$
Then, $EX_n \to EX$ as $n \to \infty$.
A useful generalization is the Dominated Convergence Theorem (DCT).
Theorem A.4 (Dominated Convergence Theorem)
Suppose that $X_n \to X$ a.s. as $n \to \infty$. If there exists a random variable $Y$ having finite expectation such that $\sup_{n \ge 1} |X_n(\omega)| \le |Y(\omega)|$ for $\omega \in \Omega$, then $EX_n \to EX$ as $n \to \infty$.
In both of the above theorems, it is necessarily the case that the random variable $X$ has finite mean, i.e. $E|X| < \infty$. This is not the case in our next such result, known as the Monotone Convergence Theorem (MCT).
Theorem A.5 (Monotone Convergence Theorem)
Suppose that $(X_n : n \ge 1)$ is a sequence of non-negative random variables that is non-decreasing, in the sense that for each $\omega \in \Omega$, $(X_n(\omega) : n \ge 1)$ is a non-decreasing real-valued sequence. (We write this as $X_n \uparrow X$, where $X(\omega) = \lim_{n \to \infty} X_n(\omega)$ is the limit. This limit must necessarily exist, but may be infinite.) Then $EX_n \to EX$ as $n \to \infty$.
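As a numerical illustration of the MCT (a sketch; the sequence $X_n = \min(Z^2, n)$ with $Z$ standard normal is an assumed example for which $X_n \uparrow Z^2$ and $EZ^2 = 1$), the following snippet estimates $EX_n$ for increasing $n$.

```python
import random

def truncated_mean(n, trials=200000, seed=5):
    """Monte Carlo estimate of E X_n for X_n = min(Z^2, n), Z ~ N(0,1).
    The sequence X_n is non-negative and non-decreasing in n, with
    X_n increasing to Z^2, so the MCT gives E X_n -> E Z^2 = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)
        total += min(z * z, float(n))
    return total / trials

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(truncated_mean(n), 4))  # increases toward 1
```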
In Theorem A.5, there is no guarantee that $EX < \infty$, even if the $X_n$'s are integrable. Another result that is sometimes useful is the following, known as Fatou's Lemma.
Proposition A.5 (Fatou's Lemma)
Let $(X_n : n \ge 1)$ be a sequence of non-negative integrable random variables. Then
$$E \liminf_{n \to \infty} X_n \le \liminf_{n \to \infty} EX_n.$$
In many settings, the sense in which $(X_n : n \ge 1)$ converges to $X$ is weaker than almost sure convergence. For example, it may be true only that $X_n \Rightarrow X$ as $n \to \infty$, and yet one wishes to conclude that $EX_n \to EX$ as $n \to \infty$. To cast the question of whether limits and expectations can be interchanged in terms of almost sure convergence, one can simply appeal to the Skorohod representation theorem (see Appendices A.1 and A.2). This reduces the problem to one in which the above results may be applied.
Exercise A.11
Suppose that $X_n \to X$ in $p$th mean with $p > 1$. In other words, $E|X_n|^p < \infty$ ($n \ge 1$), $E|X|^p < \infty$, and $E|X_n - X|^p \to 0$ as $n \to \infty$. Prove that $EX_n \to EX$ as $n \to \infty$.
Exercise A.12
Suppose that $(X_n : n \ge 1)$ is a sequence of identically distributed (but not necessarily independent) random variables. If $E|X_1| < \infty$, prove that $n^{-1} \max_{1 \le k \le n} |X_k| \to 0$ as $n \to \infty$.
Exercise A.13
Let $\Lambda = (\lambda_0 - \epsilon, \lambda_0 + \epsilon)$ for $\epsilon > 0$. Suppose that $X : \Lambda \times \Omega \to \mathbb{R}$ is such that $X(\cdot, \omega)$ is continuously differentiable on $\Lambda$ for each $\omega \in \Omega$. If $E \sup_{\lambda \in \Lambda} |X'(\lambda)| < \infty$, prove that
$$\left. \frac{d}{d\lambda} E[X(\lambda)] \right|_{\lambda = \lambda_0} = E\left[ \left. \frac{d}{d\lambda} X(\lambda) \right|_{\lambda = \lambda_0} \right].$$
A related set of results that is often used is the following. Suppose that we have a space $\Lambda$ on which is defined a probability distribution $\mu$. For $X : \Lambda \times \Omega \to \mathbb{R}$, when is it true that
$$\int_\Lambda \int_\Omega X(\lambda, \omega) \, P(d\omega) \, \mu(d\lambda) = \int_\Omega \int_\Lambda X(\lambda, \omega) \, \mu(d\lambda) \, P(d\omega)?$$
In other words, when is it correct that $E\left[\int_\Lambda X(\lambda) \, \mu(d\lambda)\right] = \int_\Lambda E X(\lambda) \, \mu(d\lambda)$?
Proposition A.6
If $X : \Lambda \times \Omega \to [0, \infty)$, then
$$E\left[\int_\Lambda X(\lambda) \, \mu(d\lambda)\right] = \int_\Lambda EX(\lambda) \, \mu(d\lambda).$$
Proposition A.6 is one version of Fubini's theorem. It asserts that interchanging the order of integration is always valid when $X$ is non-negative; however, it may be that both expressions are infinite. For $X$ of mixed sign, one uses the following variant of Fubini's theorem.
Theorem A.6 (Fubini's theorem)
If $X : \Lambda \times \Omega \to \mathbb{R}$ is such that
$$E\left[\int_\Lambda |X(\lambda)| \, \mu(d\lambda)\right] < \infty, \qquad (A.10)$$
then
$$E\left[\int_\Lambda X(\lambda) \, \mu(d\lambda)\right] = \int_\Lambda EX(\lambda) \, \mu(d\lambda).$$
In verifying (A.10), one often uses Proposition A.6 to evaluate the integral.
Exercise A.14
Use Theorem A.6 to prove that if $(a_{mn} : m, n \ge 1)$ is a real-valued (deterministic) doubly-indexed sequence, then
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn} = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn},$$
provided that
$$\sum_{m=1}^{\infty} \sum_{n=1}^{\infty} |a_{mn}| < \infty.$$
For further details on these interchange issues, see (3).
A.7 Definition of Conditional Expectation.
Let $X$ and $Y$ be two jointly distributed random variables having joint density $f_{X,Y}$. The standard formula for computing $E(X|Y)$ is to compute
$$g^*(y) = \int_{-\infty}^{\infty} x \, f_{X|Y}(x|y) \, dx, \qquad (A.11)$$
where
$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{\int_{-\infty}^{\infty} f_{X,Y}(z, y) \, dz}$$
is the conditional density of $X$ given that $Y = y$. Then, we conclude that $E(X|Y) = g^*(Y)$. This approach to computing the conditional expectation is completely satisfactory so long as one conditions on a finite number of random variables.
If we wish to condition on infinite collections of random variables, we need something more. One approach to developing a more general definition involves viewing $E(X|Y)$ as a predictor of $X$ given that one has observed $Y$. To set this idea in perspective, consider the unconditional mean $EX$. If one observes nothing, then the predictor $\hat{X}$ for $X$ must be independent of the sample outcome, i.e. it is deterministic. Minimizing the mean-square prediction error $E[(X - \hat{X})^2]$ over all predictors of the form $\hat{X} = a$ is a simple exercise in calculus; the minimizing value is $a^* = EX$. So, in the absence of observed information, the optimal predictor of $X$ is simply $X^* = EX$.
Now, what happens if we do observe information, in the form of observing some random element $Y$? The predictor is then permitted to depend on the observed sample outcome, but only through $Y$. Specifically, it must be true that the predictor $\hat{X} = g(Y)$ for some deterministic function $g$. Then, our goal is to find $X^* = g^*(Y)$ which minimizes $E[(X - \hat{X})^2]$ over all predictors of the form $\hat{X} = g(Y)$. Then, it seems natural to define $E(X|Y)$ to be the random variable $g^*(Y)$.
How do we know that such a minimizer $X^*$ exists? The key is to appeal to the Hilbert space $L_2$ described in Appendix A.5, assuming that $X$ has the (rather) reasonable property that $EX^2 < \infty$. In this case, $X \in L_2$. If $\hat{X} \notin L_2$, then the mean-square prediction error $E[(X - \hat{X})^2]$ is infinite, so it is natural to restrict our predictors to the set
$$G = \{W \in L_2 : W = g(Y) \text{ for some deterministic } g\}.$$
Obviously, $G$ is a linear subspace of $L_2$. It can be shown that $G$ is a closed subspace of $L_2$, in the sense that if $W_n \to W$ in $L_2$ with $W_n \in G$, then $W \in G$. This permits us to apply the following result, known as the Hilbert Space Projection Theorem.
Theorem A.7
Suppose that $X \in L_2$ and $G$ is a closed subspace of $L_2$. Then, there exists a unique $W^* \in G$ that minimizes $\|X - W\|$ over $W \in G$. Furthermore, $W^*$ is characterized as the unique element of $G$ such that
$$\langle X - W^*, W \rangle = 0 \text{ for all } W \in G. \qquad (A.12)$$
Theorem A.7 asserts that $W^*$ is that point in $G$ such that the prediction error $X - W^*$ is orthogonal to every $W \in G$. In other words, we can obtain $W^*$ by merely "dropping the perpendicular" down from $X$ to the subspace $G$.
Definition A.6
If $X \in L_2$, then we define $E(X|Y)$ to be the random variable $W^*$ that is guaranteed to exist by Theorem A.7.
Note that Theorem A.7 ensures that $E(X|Y)$ is the unique random variable in $G$ having the property that
$$E[XW] = E[E(X|Y)W] \qquad (A.13)$$
for all $W \in G$. Relation (A.13) often provides an easy means of computing the conditional expectation.
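To make the projection viewpoint concrete, here is a small Monte Carlo sketch (an illustration only; the model $Y \sim N(0,1)$, $X = 2Y + \epsilon$, the two-dimensional subspace spanned by $\{1, Y\}$, and all parameter values are assumptions chosen so that $E(X|Y) = 2Y$ actually lies in that subspace).

```python
import random

def projection_estimate(trials=200000, seed=6):
    """Sketch of E(X|Y) as an L_2 projection, for the illustrative model
    Y ~ N(0,1), X = 2Y + eps with eps ~ N(0,1) independent of Y, so that
    E(X|Y) = 2Y lies in the span of {1, Y}. We compute the least-squares
    projection W* = a + bY by Monte Carlo and check the orthogonality
    relations (A.12)/(A.13): E[(X - W*) W] ~ 0 for W = 1 and W = Y."""
    rng = random.Random(seed)
    data = []
    for _ in range(trials):
        y = rng.gauss(0.0, 1.0)
        x = 2.0 * y + rng.gauss(0.0, 1.0)
        data.append((x, y))
    n = float(trials)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data) / n
    var_y = sum((y - mean_y) ** 2 for _, y in data) / n
    b = cov_xy / var_y
    a = mean_x - b * mean_y
    resid_1 = sum((x - (a + b * y)) for x, y in data) / n       # E[(X - W*) * 1]
    resid_y = sum((x - (a + b * y)) * y for x, y in data) / n   # E[(X - W*) * Y]
    return a, b, resid_1, resid_y

if __name__ == "__main__":
    a, b, r1, ry = projection_estimate()
    print(round(a, 3), round(b, 3))    # close to 0 and 2, i.e. W* ~ 2Y
    print(round(r1, 4), round(ry, 4))  # both near 0: orthogonality holds
```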
Exercise A.15
Suppose that $X$ and $Y$ are two random variables with joint density $f_{X,Y}$. Also, assume that $EX^2$ is finite. Using Definition A.6, prove that the conditional expectation $E(X|Y) = g^*(Y)$, where $g^*$ is defined by (A.11).
Actually, $E(X|Y)$ may be defined whenever $E|X| < \infty$ or $X \ge 0$ (regardless of the structure of $Y$). Then, the idea is to define $E(X|Y)$ through (A.13).
Definition A.7
If $E|X| < \infty$ or $X \ge 0$, then we define $E(X|Y)$ to be the unique random variable $Z^*$ in the set $\{Z \text{ integrable} : Z = g(Y) \text{ for some deterministic function } g\}$ having the property that
$$E[XW] = E[Z^* W]$$
for all $W$ in the set $\{W \text{ bounded} : W = g(Y) \text{ for some deterministic function } g\}$.
Another way to construct $E(X|Y)$ in this general setting is to use martingale theory. The next exercise describes this approach in a special case.
Exercise A.16
Let $Y = (Y(t) : t \ge 0)$ be a right-continuous stochastic process. Suppose that $X$ has finite expectation. Set
$$M_n = E[X \mid Y(tj2^{-n}) : 0 \le j \le 2^n],$$
$$Z_n = (Y(tj2^{-n}) : 0 \le j \le 2^n).$$
(a) Prove that $(M_n : n \ge 0)$ is a martingale with respect to $(Z_n : n \ge 0)$.
(b) Use the martingale convergence theorem to prove that $M_n \to M_\infty$ a.s. as $n \to \infty$, where $M_\infty$ is finite-valued. (Then, the random variable $M_\infty$ is the desired conditional expectation $E(X \mid Y(u) : 0 \le u \le t)$.)
Bibliography
[1] Arnold, L. Stochastic Differential Equations. John Wiley, 1974.
[2] Billingsley, P. Convergence of Probability Measures. John Wiley, Chichester, England, 1968.
[3] Billingsley, P. Probability and Measure, first ed. John Wiley, Chichester, England, 1979.
[4] Bremaud, P. Point Processes and Queues: Martingale Dynamics. Springer-Verlag, 1981.
[5] Chung, K. A First Course in Probability. Academic Press, 1974.
[6] Cinlar, E. Introduction to Stochastic Processes. Prentice-Hall Inc., Englewood Cliffs, N.J., 1975.
[7] Doob, J. L. Stochastic Processes. John Wiley, 1953.
[8] Ethier, S. N., and Kurtz, T. G. Markov Processes: Characterization and Convergence. John Wiley, 1986.
[9] Hoel, P., Port, S., and Stone, C. Introduction to Stochastic Processes. Houghton Mifflin, Boston, 1972.
[10] Karatzas, I., and Shreve, S. Brownian Motion and Stochastic Calculus. Springer-Verlag, 1988.
[11] Karlin, S., and Taylor, H. M. A First Course in Stochastic Processes, second ed. Academic Press, New York-London, 1975.
[12] Karlin, S., and Taylor, H. M. A Second Course in Stochastic Processes, second ed. Academic Press, New York-London, 1981.
[13] Krylov, N. V. Controlled Diffusion Processes. Springer-Verlag, 1980.
[14] Oksendal, B. Stochastic Differential Equations. Springer-Verlag, 1998.
[15] Ross, S. M. Introduction to Stochastic Dynamic Programming. Academic Press, 1983.
[16] Royden, H. Real Analysis. Macmillan, 1968.
[17] Rudin, W. Principles of Mathematical Analysis, third ed. McGraw-Hill, 1976.
[18] Samorodnitsky, G., and Taqqu, M. Stable Non-Gaussian Random Processes. Chapman and Hall, 1994.
[19] Shiryayev, A. N. Optimal Stopping Rules. Springer-Verlag, 1978.
[20] Wiener, N. Differential space. J. Math. Phys. 2 (1923), 131-174.
[21] Willinger, W., and Taqqu, M. S. Pathwise stochastic integration and applications to the theory of continuous trading. Stochastic Processes and Their Applications 32 (1989), 253-280.