
A BAYESIAN ANALOGUE OF PAULSON'S LEMMA AND ITS USE IN TOLERANCE REGION CONSTRUCTION WHEN SAMPLING FROM THE MULTIVARIATE NORMAL

IRWIN GUTTMAN*

(Received Feb. 16, 1970)

1. Introduction

Suppose we are sampling on a k-dimensional random variable X, defined over R^k, Euclidean space of k dimensions, and let {A} = 𝒜 denote a σ-algebra of subsets A of R^k. We assume that 𝒜 ⊂ B, the Borel subsets of R^k. Suppose further that X has the absolutely continuous distribution

(1.1)
$$ F(x \mid \theta) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(y_1, \ldots, y_k \mid \theta)\, dy_1 \cdots dy_k , $$

and θ ∈ Ω, with Ω an indexing set which we will refer to as the parameter space. We denote a random sample of n independent observations on X by (X_1, ..., X_n) or {X_i}. We have the following definitions.

DEFINITION 1.1. A statistical tolerance region S(X_1, ..., X_n) is a statistic defined over R^{nk} = R^k × ... × R^k which takes values in the σ-algebra 𝒜.

This definition, then, implies that a statistical tolerance region is a statistic which is a set function, and maps "the point" (X_1, ..., X_n) ∈ R^{nk} into the region S(X_1, ..., X_n) ∈ 𝒜, that is, S({X_i}) ⊂ R^k. When constructing such statistical tolerance regions, various criteria may be used. One that is often borne in mind is contained in the following definition.

DEFINITION 1.2. S(X_1, ..., X_n) is a β-expectation statistical tolerance region if

(1.2)
$$ E_{\{X_i\}} \{ F[S(X_1, \ldots, X_n) \mid \theta] \} = \beta $$

for all θ ∈ Ω, where

(1.2a)
$$ F[S \mid \theta] = \int_S f(u \mid \theta)\, du . $$

* This research was supported in part by the Wisconsin Alumni Research Foundation. Present address: Centre de Recherches Mathématiques, Université de Montréal.


The quantity F[S | θ] = F[S(X_1, ..., X_n) | θ] is called the coverage of the region S, and will be denoted by C[S]. We note that it can be viewed as the probability of an observation, say Y, falling in S, where Y is independent of X_1, ..., X_n and, of course, has distribution F(y | θ). Now, because S(X_1, ..., X_n) is a random set function, F[S(X_1, ..., X_n) | θ] is a random variable and has a distribution of its own. Hence, constructing an S to satisfy (1.2) simply implies that we are imposing the condition that S be such that the distribution of its coverage F[S | θ] has expectation (mean value) β. Paulson [6] has given a very interesting connection between statistical tolerance regions and prediction regions.

PAULSON'S LEMMA. If, on the basis of a given sample on a k-dimensional random variable X, a k-dimensional "confidence" region S(X_1, ..., X_n) of level β is found for the statistics T_1, ..., T_k, where T_i = T_i(Y_1, ..., Y_q) and the (k × 1) vector observations Y_j, j = 1, ..., q, are independent observations on X, and independent of X_1, ..., X_n, and if C is defined to be such that

(1.3)
$$ C = \int_S dG(t) , $$

where G(t) is the distribution function of T (T' = (T_1, ..., T_k), and T_i = T_i(Y_1, ..., Y_q)), then

(1.4)
$$ E[C] = \beta . $$

Before we prove this lemma, we remark that our interest will be in the case q = 1 (that is, we will have one future observation Y_1 = Y) and T_i = T_i(Y) = Y_i, so that T = Y and G(t) = F(y | θ). Hence, C given by (1.3) is simply the coverage of S(X_1, ..., X_n). Note that C depends here on S and θ, and indeed we may write C = C_θ(S). In these circumstances, then, Paulson's Lemma gives us an operational method for constructing a statistical tolerance region of β-expectation, namely: find S, a prediction region of level β for a future observation Y. If this is done, then S is a tolerance region of β-expectation.

PROOF OF PAULSON'S LEMMA. The joint distribution function of X_1, ..., X_n is
$$ \prod_{i=1}^{n} F(x_i \mid \theta) . $$

Now the left-hand side of (1.4) may be written as

(1.5)
$$ E[C] = \int_{R^{nk}} \Big[ \int_S dG(t) \Big] \, d \prod_{i=1}^{n} F(x_i \mid \theta) . $$


But the right-hand side of (1.5) is the probability that T lies in S, and we are given that S = S(X_1, ..., X_n) is a β-level confidence region for T (or prediction region for T). Hence, the right-hand side of (1.5) has value β, and we have that E[C] = β.

As an illustration of the above, we take the case q = 1, and suppose that sampling is on the k-dimensional normal variable N(μ, Σ). It is well known that a 100β% prediction region for Y, where Y ~ N(μ, Σ), constructed on the basis of the random sample of n independent observations X_1, ..., X_n [where Y, X_1, ..., X_n are all independent], is

(1.6)
$$ S(\{X_i\}) = \{\, Y : (Y - \bar{X})' V^{-1} (Y - \bar{X}) \le C_\beta \,\} , $$

where the mean vector X̄ and the sample variance-covariance matrix V are defined by

(1.6a)
$$ \bar{X} = n^{-1} \sum_{i=1}^{n} X_i , \qquad V = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})' , $$

with C_β given by

(1.6b)
$$ C_\beta = [(n-1)k/(n-k)]\, [1 + n^{-1}]\, F_{k,\, n-k;\, 1-\beta} , $$

and F_{k, n-k; 1-β} is the point exceeded with probability 1 - β when using the Snedecor F distribution with (k, n-k) degrees of freedom. Hence, by Paulson's Lemma, S given by (1.6) is a β-expectation tolerance region. This region is known to have certain optimum properties; see, for example, Fraser and Guttman [1]. We now approach the problem of constructing β-expectation tolerance regions from the Bayesian point of view. We will see that there is a direct analogue of Paulson's Lemma, which arises in a natural and interesting way.
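As a numerical companion to (1.6)-(1.6b) (not part of the original paper), the following sketch computes the ellipsoid S({x_i}) from a sample; the function names, the simulated data, and the use of numpy/scipy are assumptions made purely for illustration.

    import numpy as np
    from scipy import stats

    def normal_tolerance_region(X, beta):
        """Return (xbar, V, C_beta) defining the ellipsoid (1.6):
        {y : (y - xbar)' V^{-1} (y - xbar) <= C_beta}."""
        n, k = X.shape
        xbar = X.mean(axis=0)                    # mean vector, (1.6a)
        V = np.cov(X, rowvar=False, ddof=1)      # sample variance-covariance matrix, (1.6a)
        # F_{k, n-k; 1-beta}: the point exceeded with probability 1 - beta,
        # i.e. the beta-quantile of the F distribution with (k, n-k) d.f.
        C_beta = ((n - 1) * k / (n - k)) * (1 + 1 / n) * stats.f.ppf(beta, k, n - k)   # (1.6b)
        return xbar, V, C_beta

    def contains(y, xbar, V, C_beta):
        """Is the future observation y inside the region S of (1.6)?"""
        d = y - xbar
        return d @ np.linalg.solve(V, d) <= C_beta

    # Hypothetical usage: n = 30 observations on a k = 3 dimensional normal
    rng = np.random.default_rng(0)
    X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=30)
    xbar, V, C_beta = normal_tolerance_region(X, beta=0.90)
    print(contains(rng.multivariate_normal(np.zeros(3), np.eye(3)), xbar, V, C_beta))

By Paulson's Lemma, the coverage of the region returned here has expectation β = 0.90 under repeated sampling, which can be checked by simulation.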

2. The Bayesian approach


In the Bayesian framework, inferences about the parameters θ in a statistical model are summarized by the posterior distribution of the parameters, which is obtained by the use of a theorem due to the Reverend Thomas Bayes. This theorem, a simple statement of conditional probability, states that the distribution of the parameters θ, given that X_i is observed to be x_i, i = 1, ..., n, is

(2.1)
$$ p[\theta \mid \{x_i\}] = c\, p(\theta)\, p[\{x_i\} \mid \theta] , $$

where the normalizing constant c is such that


(2.1a)
$$ c^{-1} = \int_\Omega p(\theta)\, p[\{x_i\} \mid \theta]\, d\theta , $$

p(θ) is the (marginal) distribution of the vector of parameters θ, and p[{x_i} | θ] is the distribution of the observations {X_i}, given θ. When we are indeed given that {X_i} = {x_i}, we often call p[{x_i} | θ] the likelihood function of θ and denote it by l[θ | {x_i}]. The ingredients of (2.1) may be interpreted as follows. The distribution p(θ) represents our knowledge about the parameters θ before the data are drawn, l[θ | {x_i}] represents the information about θ supplied by the data {x_i}, and p[θ | {x_i}] represents our knowledge of θ after we observe the data. For these reasons, p(θ) is commonly called the a-priori or prior distribution of θ, and p[θ | {x_i}] the a-posteriori or posterior distribution of θ. Using this interpretation, then, Bayes' theorem provides a formal mechanism by which our a-priori information is combined with the sample information to give us the posterior distribution of θ, which effectively summarizes all the information we have about θ.

Now the reader will recall that S is a tolerance region of β-expectation if its coverage C[S] has expectation β. From a Bayesian point of view, once having seen the data, that is, having observed {X_i} = {x_i}, the coverage C[S] = ∫_S f(y | θ) dy is a function only of the parameters θ, and the expectation referred to is the expectation with respect to the posterior distribution of the parameters θ. That is, on the basis of the given data {x_i}, we wish to construct S such that

(2.2)
$$ E\{ C[S] \mid \{x_i\} \} = \int_\Omega \int_S f(y \mid \theta)\, p[\theta \mid \{x_i\}]\, dy\, d\theta = \beta , $$

where Y has the same distribution as the X_i, namely f(· | θ). Now it is interesting to note that, assuming the conditions of Fubini's Theorem hold, so that we may invert the order of integration, we have that

(2.3)
$$ E\{ C[S] \mid \{x_i\} \} = \int_S \int_\Omega f(y \mid \theta)\, p[\theta \mid \{x_i\}]\, d\theta\, dy = \int_S h[y \mid \{x_i\}]\, dy . $$

Now the density h[y | {x_i}], where

(2.4)
$$ h[y \mid \{x_i\}] = \int_\Omega f(y \mid \theta)\, p[\theta \mid \{x_i\}]\, d\theta , $$

is (examining the right-hand side of (2.4)) simply the conditional distribution of Y, given the data {x_i}, where Y may be regarded as an additional observation from f(x | θ), additional to and independent of X_1, ..., X_n.


This density h[y | {x_i}] is the Bayesian estimate of the distribution of Y and has been called the predictive or future distribution of Y. (For further discussion see, for example, Guttman [4] and the references cited therein.) Hence, (2.2) and (2.3) have the very interesting implication that S is a β-expectation tolerance region if it is a (predictive) β-confidence region for Y, where Y has the predictive distribution h[y | {x_i}] defined by (2.4). This is the Bayesian analogue of Paulson's Lemma given in Section 1 with q = 1. To repeat in another way, for a particular f, we need only find h[y | {x_i}] and a region S such that

(2.5)
$$ \Pr(Y \in S) = \int_S h[y \mid \{x_i\}]\, dy = \beta . $$

We summarize the above results in the following lemma.

LEMMA 2.1. If, on the basis of observed data {x_i}, a predictive β-confidence region S = S({x_i}) is constructed such that

(2.6)
$$ \int_S h[y \mid \{x_i\}]\, dy = \beta , $$

where the predictive distribution h is given by (2.4), and if C[S] is the coverage of S, that is,

(2.7)
$$ C[S] = C[S(x_1, \ldots, x_n)] = \int_S f(y \mid \theta)\, dy , $$

where f is the common distribution of the independent random variables X_1, ..., X_n, Y, then the posterior expectation of C[S] is β, that is, S is of β-expectation.

(The proof is simple and utilizes relations (2.3) and (2.6).)
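Not part of the original paper: the sketch below checks Lemma 2.1 numerically in the simplest setting of a univariate normal with known variance and a conjugate normal prior on the mean, so that the predictive distribution (2.4) and a predictive β-confidence interval satisfying (2.6) are available in closed form; all names and numerical values are illustrative assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    beta = 0.90
    sigma = 1.0                      # known sampling standard deviation
    mu0, tau0 = 0.0, 2.0             # prior: mu ~ N(mu0, tau0^2)

    # observed data
    x = rng.normal(loc=0.5, scale=sigma, size=20)
    n, xbar = x.size, x.mean()

    # conjugate posterior of mu given the data: N(mu_n, tau_n^2)
    tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
    mu_n = tau_n2 * (mu0 / tau0**2 + n * xbar / sigma**2)

    # predictive distribution (2.4) of a future Y: N(mu_n, sigma^2 + tau_n^2)
    pred_sd = np.sqrt(sigma**2 + tau_n2)
    lo, hi = stats.norm.ppf([(1 - beta) / 2, (1 + beta) / 2], loc=mu_n, scale=pred_sd)
    # S = [lo, hi] is a predictive beta-confidence region, as in (2.5)/(2.6)

    # posterior expectation of the coverage (2.7), averaged over posterior draws of mu
    mu_draws = rng.normal(mu_n, np.sqrt(tau_n2), size=100_000)
    coverage = (stats.norm.cdf(hi, loc=mu_draws, scale=sigma)
                - stats.norm.cdf(lo, loc=mu_draws, scale=sigma))
    print(coverage.mean())           # close to beta = 0.90, as the lemma asserts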

3. Sampling from the k-variate normal

We suppose in this section that sampling is from the k-variate normal N(μ, Σ), whose distribution is given by

(3.1)
$$ f(x \mid \mu, \Sigma) = (2\pi)^{-k/2} |\Sigma^{-1}|^{1/2} \exp\{ -\tfrac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \} , $$

where μ is (k × 1) and Σ is a (k × k) symmetric positive definite matrix. It is convenient to work with the set of parameters (μ, Σ^{-1}) here, and accordingly, we suppose that the prior for this situation is the conjugate prior of Raiffa and Schlaifer [7], given by


(3.2)
$$ p(\mu, \Sigma^{-1})\, d\mu\, d\Sigma^{-1} \;\propto\; |\Sigma^{-1}|^{(n_0 - k - 1)/2} \exp\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} [ (n_0 - 1)V_0 + n_0 (\mu - \bar{x}_0)(\mu - \bar{x}_0)' ] \}\, d\mu\, d\Sigma^{-1} , $$

where x̄_0 is a (k × 1) vector of known constants and V_0 is a (k × k) symmetric positive definite matrix of known constants*. It is to be noted that if n_0 tends to 0 and (n_0 - 1)V_0 tends to the zero matrix, then (3.2) tends to the "in-ignorance" prior advocated by Geisser [2] and Geisser and Cornfield [3], viz.

(3.3)
$$ p(\mu, \Sigma^{-1})\, d\mu\, d\Sigma^{-1} \;\propto\; |\Sigma^{-1}|^{-(k+1)/2}\, d\mu\, d\Sigma^{-1} . $$
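The conjugate prior (3.2) is easy to simulate, which gives a quick check on what it says about μ and Σ^{-1}. The sketch below is not from the paper; it assumes the usual normal-Wishart reading of (3.2), namely that Σ^{-1} has a Wishart distribution with n_0 - 1 degrees of freedom and scale [(n_0 - 1)V_0]^{-1}, and that μ given Σ is N(x̄_0, Σ/n_0), with illustrative numbers. The sample moments can be compared with the interpretation given in the starred footnote below (E[Σ^{-1}] = V_0^{-1} and Var(μ) = (n_0 - 1)V_0/[n_0(n_0 - k - 2)]).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    k, n0 = 2, 10
    xbar0 = np.zeros(k)
    V0 = np.eye(k)

    # Assumed reading of (3.2): Sigma^{-1} ~ Wishart(n0 - 1, [(n0 - 1) V0]^{-1}),
    # and mu | Sigma ~ N(xbar0, Sigma / n0).
    prec = stats.wishart.rvs(df=n0 - 1, scale=np.linalg.inv((n0 - 1) * V0),
                             size=5000, random_state=rng)
    mu = np.array([rng.multivariate_normal(xbar0, np.linalg.inv(P) / n0) for P in prec])

    print(prec.mean(axis=0))          # approximately V0^{-1}
    print(np.cov(mu, rowvar=False))   # approximately (n0 - 1) V0 / [n0 (n0 - k - 2)]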

Now it is easy to see that if n independent observations X_i are taken from (3.1), and we observe {X_i} to be {x_i}, the likelihood function is given by

(3.4)
$$ l[\mu, \Sigma^{-1} \mid \{x_i\}] = (2\pi)^{-nk/2} |\Sigma^{-1}|^{n/2} \exp\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} [ (n-1)V + n(\bar{x} - \mu)(\bar{x} - \mu)' ] \} , $$

where

$$ \bar{x} = n^{-1} \sum_{i=1}^{n} x_i \qquad \text{and} \qquad (n-1)V = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})' $$

(the abbreviation "tr A" stands for the trace of the matrix A). Now combining (3.2) and (3.4) using Bayes' Theorem gives us that the posterior of (μ, Σ^{-1}) is such that
(3.5)
$$ p[\mu, \Sigma^{-1} \mid \{x_i\}] \;\propto\; |\Sigma^{-1}|^{(n + n_0 - k - 1)/2} \exp\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} [ (n_0 - 1)V_0 + (n - 1)V + n_0(\mu - \bar{x}_0)(\mu - \bar{x}_0)' + n(\mu - \bar{x})(\mu - \bar{x})' ] \} . $$

Now the exponent of (3.5) may be written in simplified form on "completing the square" in μ; that is, the term in square brackets in the exponent of (3.5) may be written, after some algebra, as

(3.6)
$$ (n + n_0)(\mu - \tilde{x})(\mu - \tilde{x})' + (n_0 - 1)V_0 + (n - 1)V + R , $$

* We are in effect saying that our prior information on μ and Σ^{-1} is such that we expect μ to be x̄_0, with dispersion, that is, variance-covariance matrix of μ, equal to (n_0 - 1)V_0 / [n_0(n_0 - k - 2)], and that we expect Σ^{-1} to be V_0^{-1}, etc.


where

(3.7)
$$ \tilde{x} = (n + n_0)^{-1} (n_0 \bar{x}_0 + n \bar{x}) $$

and

(3.7a)
$$ R = n \bar{x}\bar{x}' + n_0 \bar{x}_0 \bar{x}_0' - (n + n_0)\, \tilde{x}\tilde{x}' = \frac{n n_0}{n + n_0} (\bar{x} - \bar{x}_0)(\bar{x} - \bar{x}_0)' . $$
Hence, we may now write (3.5) as follows:

(3.8)
$$ p[\mu, \Sigma^{-1} \mid \{x_i\}] = c\, |\Sigma^{-1}|^{(n + n_0 - k - 1)/2} \exp\{ -\tfrac{1}{2} [ \operatorname{tr} \Sigma^{-1} Q + (n_0 + n)(\mu - \tilde{x})' \Sigma^{-1} (\mu - \tilde{x}) ] \} , $$

where

(3.8a)
$$ Q = (n_0 - 1)V_0 + (n - 1)V + \frac{n_0 n}{n_0 + n} (\bar{x} - \bar{x}_0)(\bar{x} - \bar{x}_0)' , $$

and c is the normalizing constant necessary to make (3.8) a density, that is, integrate to 1. Now to determine c, we first integrate with respect to μ and then Σ^{-1}, and in so doing, we make use of the identities derived from the k-variate normal and k-order Wishart distributions, viz.

(3.9a)
$$ \int_{R^k} \exp\{ -\tfrac{a}{2} (\mu - b)' \Sigma^{-1} (\mu - b) \}\, d\mu = (2\pi)^{k/2} a^{-k/2} |\Sigma^{-1}|^{-1/2} $$

and

(3.9b)
$$ \int |A|^{(m - k - 1)/2} \exp\{ -\tfrac{1}{2} \operatorname{tr} A M^{-1} \}\, dA = 2^{mk/2} \pi^{k(k-1)/4} |M|^{m/2} \prod_{i=1}^{k} \Gamma[(m + 1 - i)/2] , $$

where the last integral is over the set of (k × k) symmetric positive definite matrices A.

As is easily verified, performing the integration yields

(3.10)
$$ c = \frac{(n + n_0)^{k/2}\, |Q|^{(n + n_0 - 1)/2}}{2^{k(n + n_0)/2}\, \pi^{k(k+1)/4}\, \prod_{i=1}^{k} \Gamma[(n + n_0 - i)/2]} . $$

We are now in a position to find the predictive density of a future observation Y, conditional on the data {x_i}. The first factor of the integrand of (2.4) has functional form (3.1), and the second factor is given by (3.8). Hence, we find that the predictive distribution of Y is given by
(3.11)
$$ h[y \mid \{x_i\}] = \int \cdots \int c\, (2\pi)^{-k/2} |\Sigma^{-1}|^{(n + n_0 - k)/2} \exp\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} [Q + W] \}\, d\mu\, d\Sigma^{-1} , $$

where c is given by (3.10), Q by (3.8a), and where W is such that


(3.12)
$$ W = (n + n_0)(\mu - \tilde{x})(\mu - \tilde{x})' + (\mu - y)(\mu - y)' . $$

Again "completing the square" in μ in (3.12), we easily, but tediously, find that

(3.13)
$$ W = (n + n_0 + 1)(\mu - \tilde{\mu})(\mu - \tilde{\mu})' + \frac{n + n_0}{n + n_0 + 1} (y - \tilde{x})(y - \tilde{x})' , $$

with

$$ \tilde{\mu} = (n + n_0 + 1)^{-1} [ (n + n_0)\tilde{x} + y ] . $$
The integration with respect to μ in (3.11), using (3.9a), gives us

(3.14)
$$ h[y \mid \{x_i\}] = \int \cdots \int c\, (n + n_0 + 1)^{-k/2} |\Sigma^{-1}|^{(n + n_0 - k - 1)/2} \exp\Big\{ -\tfrac{1}{2} \operatorname{tr} \Sigma^{-1} \Big[ Q + \frac{n + n_0}{n + n_0 + 1} (y - \tilde{x})(y - \tilde{x})' \Big] \Big\}\, d\Sigma^{-1} . $$

Integrating (3.14) with the help of (3.9b), and substituting for the value of c given by (3.10), we find that

(3.15)
$$ h[y \mid \{x_i\}] = \Big( \frac{n + n_0}{n + n_0 + 1} \Big)^{k/2} \pi^{-k/2} \frac{\Gamma[(n + n_0)/2]}{\Gamma[(n + n_0 - k)/2]} |Q^{-1}|^{1/2} \Big| I_k + \frac{n + n_0}{n + n_0 + 1} Q^{-1} (y - \tilde{x})(y - \tilde{x})' \Big|^{-(n + n_0)/2} . $$

Now using the identity (proved in the Appendix)

(3.16)
$$ |I_{n_1} - AB| = |I_{n_2} - BA| , $$

where A is (n_1 × n_2) and B is (n_2 × n_1), we have the result that the predictive density of Y is given by

(3.17)
$$ h[y \mid \{x_i\}] = \Big( \frac{n + n_0}{n + n_0 + 1} \Big)^{k/2} \pi^{-k/2} \frac{\Gamma[(n + n_0)/2]}{\Gamma[(n + n_0 - k)/2]} |Q^{-1}|^{1/2} \Big[ 1 + \frac{n + n_0}{n + n_0 + 1} (y - \tilde{x})' Q^{-1} (y - \tilde{x}) \Big]^{-(n + n_0)/2} , $$

that is, we have the interesting result that the predictive distribution of Y, given {x_i}, is related to the k-variate t-distribution with (n + n_0 - k) degrees of freedom. As may be seen from properties of the multivariate t (see, for example, Tiao and Guttman [8]), we have that

(3.18)
$$ \frac{n_0 + n}{n + n_0 + 1} (Y - \tilde{x})' Q^{-1} (Y - \tilde{x}) = \frac{k}{n + n_0 - k} F_{k,\, n + n_0 - k} . $$
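Relation (3.18) makes the predictive ellipsoid computable through ordinary F tables. The following sketch (not from the paper; the helper names and the numpy/scipy usage are assumptions) evaluates the critical value of the quadratic form via (3.18) and, as a cross-check, the logarithm of the predictive density (3.17).

    import numpy as np
    from scipy import stats
    from scipy.special import gammaln

    def predictive_critical_value(n, n0, k, beta):
        """c such that Pr{ (n0+n)/(n+n0+1) (Y - xtilde)' Q^{-1} (Y - xtilde) <= c } = beta,
        from relation (3.18)."""
        nu = n + n0 - k
        return (k / nu) * stats.f.ppf(beta, k, nu)

    def predictive_logpdf(y, xtilde, Q, n, n0):
        """Logarithm of the predictive density (3.17), a k-variate t with n + n0 - k d.f."""
        k = len(y)
        a = n + n0
        d = y - xtilde
        quad = (a / (a + 1)) * d @ np.linalg.solve(Q, d)
        _, logdetQ = np.linalg.slogdet(Q)
        return (0.5 * k * np.log(a / (a + 1)) - 0.5 * k * np.log(np.pi)
                + gammaln(a / 2) - gammaln((a - k) / 2)
                - 0.5 * logdetQ - (a / 2) * np.log1p(quad))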

Suppose now that we are interested in the "central" 100β% of the normal distribution (3.1), that is, in the set


(3.19)
$$ A_\beta = \{\, y : (y - \mu)' \Sigma^{-1} (y - \mu) \le \chi^2_{k,\, 1-\beta} \,\} , $$

where χ²_{k, 1-β} is the point exceeded with probability (1 - β) when using the chi-square distribution with k degrees of freedom. Given μ and Σ, or (μ, Σ^{-1}), we have that

(3.20)
$$ P(Y \in A_\beta \mid \mu, \Sigma) = \beta , $$

that is, if we knew (μ, Σ), A_β would be a 100β% predictive region for Y. Now since we do not know (μ, Σ), we use as an estimator of the density (3.1) the density (3.17), after observing the data {x_i}. Hence, a sensible predictive region for Y is the central ellipsoidal region of (3.17), namely the region

(3.21)
$$ S(x_1, \ldots, x_n) = \Big\{\, y : \frac{n_0 + n}{n_0 + n + 1} (y - \tilde{x})' [Q/(n_0 + n - k)]^{-1} (y - \tilde{x}) \le k F_{k,\, n_0 + n - k;\, 1 - \beta} \,\Big\} , $$

and it is easy to see, from (3.18), that

(3.22)
$$ P(Y \in S \mid \{x_i\}) = \beta . $$

Thus, by Lemma 2.1, we have that S defined by (3.21) is a tolerance region of (posterior) β-expectation. We note that if n_0 = 0 and (n_0 - 1)V_0 is the zero matrix, that is, if the so-called "in-ignorance" prior given by (3.3) is the appropriate prior, then the above results imply that the β-expectation region is of the form

(3.23)
$$ S(\{x_i\}) = \Big\{\, y : \frac{n}{n + 1} (y - \bar{x})' [ (n - 1)V/(n - k) ]^{-1} (y - \bar{x}) \le k F_{k,\, n - k;\, 1 - \beta} \,\Big\} , $$

which is interesting, since this latter result is in agreement with the sampling-theory result (1.6), as may easily be verified. It is to be finally remarked that the lemma of Section 2 is quite general and may be used when sampling is from any population. In fact, the case of the single exponential is discussed in Guttman [5].
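To make the construction of (3.21) concrete, here is a sketch (not from the paper; the default prior constants, function names and numpy/scipy usage are illustrative assumptions) that assembles x̃ from (3.7) and Q from (3.8a) and tests membership in the region; with n_0 = 0 and (n_0 - 1)V_0 the zero matrix it reproduces the in-ignorance region (3.23), and hence agrees with the sampling-theory region (1.6).

    import numpy as np
    from scipy import stats

    def bayes_tolerance_region(X, beta, n0=0.0, xbar0=None, V0=None):
        """Return (xtilde, Q, bound) for the region (3.21):
        {y : [(n0+n)/(n0+n+1)] (y - xtilde)' [Q/(n0+n-k)]^{-1} (y - xtilde) <= k F_{k, n0+n-k; 1-beta}}."""
        n, k = X.shape
        xbar = X.mean(axis=0)
        V = np.cov(X, rowvar=False, ddof=1)
        xbar0 = np.zeros(k) if xbar0 is None else xbar0
        V0 = np.zeros((k, k)) if V0 is None else V0
        xtilde = (n0 * xbar0 + n * xbar) / (n + n0)                         # (3.7)
        Q = ((n0 - 1) * V0 + (n - 1) * V
             + (n0 * n / (n0 + n)) * np.outer(xbar - xbar0, xbar - xbar0))  # (3.8a)
        bound = k * stats.f.ppf(beta, k, n + n0 - k)                        # right side of (3.21)
        return xtilde, Q, bound

    def in_region(y, xtilde, Q, bound, n, n0, k):
        d = y - xtilde
        lhs = ((n0 + n) / (n0 + n + 1)) * d @ np.linalg.solve(Q / (n0 + n - k), d)
        return lhs <= bound

    # Hypothetical usage with the in-ignorance prior (n0 = 0), i.e. region (3.23):
    rng = np.random.default_rng(2)
    X = rng.multivariate_normal(np.zeros(2), np.eye(2), size=25)
    xtilde, Q, bound = bayes_tolerance_region(X, beta=0.95)
    print(in_region(rng.multivariate_normal(np.zeros(2), np.eye(2)), xtilde, Q, bound,
                    n=25, n0=0.0, k=2))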

Appendix
We give a proof, due to George Tiao, of (3.16). Consider the matrix equations (A is (n_1 × n_2) and B is (n_2 × n_1))

(A.1)
$$ \begin{bmatrix} I_{n_1} & A \\ B & I_{n_2} \end{bmatrix} \begin{bmatrix} I_{n_1} & 0 \\ -B & I_{n_2} \end{bmatrix} = \begin{bmatrix} I_{n_1} - AB & A \\ 0 & I_{n_2} \end{bmatrix} $$

and

(A.2)
$$ \begin{bmatrix} I_{n_1} & 0 \\ -B & I_{n_2} \end{bmatrix} \begin{bmatrix} I_{n_1} & A \\ B & I_{n_2} \end{bmatrix} = \begin{bmatrix} I_{n_1} & A \\ 0 & I_{n_2} - BA \end{bmatrix} . $$

Now taking determinants of both sides of (A.1) and (A.2) yields

(A.3)
$$ |M| = |I_{n_1} - AB| \qquad \text{and} \qquad |M| = |I_{n_2} - BA| , $$

where

$$ M = \begin{bmatrix} I_{n_1} & A \\ B & I_{n_2} \end{bmatrix} . $$

Using (A.3) we have the result

$$ |I_{n_1} - AB| = |I_{n_2} - BA| = |M| . $$
UNIVERSITY OF WISCONSIN AND UNIVERSITY OF MASSACHUSETTS

REFERENCES

[1] Fraser, D. A. S. and Guttman, Irwin (1956). Tolerance regions, Ann. Math. Statist., 27, 162-179.
[2] Geisser, S. (1965). Bayesian estimation in multivariate analysis, Ann. Math. Statist., 36, 150-159.
[3] Geisser, S. and Cornfield, J. (1963). Posterior distributions for multivariate normal parameters, Jour. Roy. Statist. Soc., Ser. B, 25, 368-376.
[4] Guttman, Irwin (1967). The use of the concept of a future observation in goodness-of-fit problems, Jour. Roy. Statist. Soc., Ser. B, 29, 83-100.
[5] Guttman, Irwin (1968). Tolerance regions: A survey of its literature. VI. The Bayesian approach, Tech. Rep. No. 126, Department of Statistics, University of Wisconsin.
[6] Paulson, E. (1943). A note on tolerance limits, Ann. Math. Statist., 14, 90-93.
[7] Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory, Harvard University Press.
[8] Tiao, G. C. and Guttman, Irwin (1965). The multivariate inverted beta distribution with applications, Jour. Amer. Statist. Assoc., 60, 793-805.
