A BAYESIAN ANALOGUE OF PAULSON'S LEMMA AND ITS USE IN TOLERANCE REGION CONSTRUCTION WHEN SAMPLING FROM THE MULTI-VARIATE NORMAL

IRWIN GUTTMAN*

(Received Feb. 16, 1970)

1. Introduction
Suppose we are sampling on a $k$-dimensional random variable $X$, defined over $R^k$, Euclidean space of $k$ dimensions, and let $\mathscr{A}$ denote a $\sigma$-algebra of subsets $A$ of $R^k$. We assume that $\mathscr{A}\subset\mathscr{B}$, the Borel subsets of $R^k$. Suppose further that $X$ has the absolutely continuous distribution

(1.1)  $F(x\mid\theta)$, with density $f(x\mid\theta)$,

and $\theta\in\Omega$, with $\Omega$ an indexing set which we will refer to as the parameter space. We denote a random sample of $n$ independent observations on $X$ by $(X_1,\dots,X_n)$ or $\{X_i\}$. We have the following definitions.

DEFINITION 1.1. A statistical tolerance region $S(X_1,\dots,X_n)$ is a statistic defined over $R^k\times\cdots\times R^k$ ($n$ factors) which takes values in the $\sigma$-algebra $\mathscr{A}$.

This definition, then, implies that a statistical tolerance region is a statistic which is a set function, and maps "the point" $(X_1,\dots,X_n)\in R^{nk}$ into the region $S(X_1,\dots,X_n)\in\mathscr{A}$, that is, $S(\{X_i\})\subset R^k$. When constructing such statistical tolerance regions, various criteria may be used. One that is often borne in mind is contained in the following definition.

DEFINITION 1.2. $S(X_1,\dots,X_n)$ is a $\beta$-expectation statistical tolerance region if
(1.2)  $E\{F[S(X_1,\dots,X_n)\mid\theta]\}=\beta$

for all $\theta\in\Omega$, where

(1.2a)  $F[S\mid\theta]=\int_S f(u\mid\theta)\,du$ .
* This research was supported in part by the Wisconsin Alumni Research Foundation. Present address: Centre de Recherches Mathématiques, Université de Montréal.
The quantity $F[S\mid\theta]=F[S(X_1,\dots,X_n)\mid\theta]$ is called the coverage of the region $S$, and will be denoted by $C[S]$. We note that it can be viewed as the probability of an observation, say $Y$, falling in $S$, where $Y$ is independent of $X_1,\dots,X_n$ and, of course, has distribution $F(y\mid\theta)$. Now because $S(X_1,\dots,X_n)$ is a random set function, $F[S(X_1,\dots,X_n)\mid\theta]$ is a random variable and has a distribution of its own. Hence, constructing an $S$ to satisfy (1.2) simply implies that we are imposing the condition that $S$ be such that the distribution of its coverage $F[S\mid\theta]$ has expectation (mean value) $\beta$. Paulson [6] has given a very interesting connection between statistical tolerance regions and prediction regions.

PAULSON'S LEMMA. If, on the basis of a given sample on a $k$-dimensional random variable $X$, a $k$-dimensional "confidence" region $S(X_1,\dots,X_n)$ of level $\beta$ is found for statistics $T_1,\dots,T_q$, where $T_j=T_j(Y_1,\dots,Y_q)$, with the $k$-vector observations $Y_j$, $j=1,\dots,q$, independent observations on $X$ and independent of $X_1,\dots,X_n$, and if $C$ is defined, with $G$ the distribution function of $T=(T_1,\dots,T_q)$, to be such that

(1.3)  $C=\int_S dG(t)$ ,

then

(1.4)  $E[C]=\beta$ .
Before we prove this lemma, we remark that our interest will be in the case $q=1$ (that is, we will have one future observation $Y_1=Y$), and $T_1=T_1(Y)=Y$, so that $T=Y$ and $G(t)=F(y\mid\theta)$. Hence, $C$ given by (1.3) is simply the coverage of $S(X_1,\dots,X_n)$. Note that $C$ depends here on $S$ and $\theta$, and indeed we may write $C=C_\theta(S)$. In these circumstances, then, Paulson's Lemma gives us an operational method for constructing a statistical tolerance region of $\beta$-expectation, namely: find $S$, a prediction region of level $\beta$ for a future observation $Y$. If this is done, then $S$ is a tolerance region of $\beta$-expectation.

PROOF OF PAULSON'S LEMMA. The joint distribution of $X_1,\dots,X_n$ has density $\prod_{i=1}^{n}f(x_i\mid\theta)$, so that

(1.5)  $E[C]=\int_{R^k}\cdots\int_{R^k}\Big[\int_{S(x_1,\dots,x_n)}dG(t)\Big]\prod_{i=1}^{n}f(x_i\mid\theta)\,dx_1\cdots dx_n$ .
But the right-hand side of (1.5) is the probability that $T$ lies in $S$, and we are given that $S=S(X_1,\dots,X_n)$ is a $\beta$-level confidence region for $T$ (or prediction region for $T$). Hence, the right-hand side of (1.5) has value $\beta$, and we have that $E[C]=\beta$.

As an illustration of the above, we take the case $q=1$, and suppose that sampling is on the $k$-dimensional normal variable $N(\mu,\Sigma)$. It is well known that a $100\beta\%$ prediction region for $Y$, where $Y\sim N(\mu,\Sigma)$, constructed on the basis of the random sample of $n$ independent observations $X_1,\dots,X_n$ [where $Y, X_1,\dots,X_n$ are all independent] is
(1.6)  $S=\{y\mid (y-\bar{x})'V^{-1}(y-\bar{x})\le C_\beta\}$

where the mean vector $\bar{x}$ and the sample variance-covariance matrix $V$ are defined by

(1.6a)  $\bar{x}=n^{-1}\sum_{i=1}^{n}x_i$ ,  $V=(n-1)^{-1}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})'$ ,

with $C_\beta$ given by

(1.6b)  $C_\beta=[(n-1)k/(n-k)]\,[1+n^{-1}]\,F_{k,n-k;1-\beta}$
and $F_{k,n-k;1-\beta}$ is the point exceeded with probability $1-\beta$ when using the Snedecor $F$ distribution with $(k, n-k)$ degrees of freedom. Hence, by Paulson's Lemma, $S$ given by (1.6) is a $\beta$-expectation tolerance region. This region is known to have certain optimum properties; see, for example, Fraser and Guttman [1]. We now approach the problem of constructing $\beta$-expectation tolerance regions from the Bayesian point of view. We will see that there is a direct analogue of Paulson's Lemma, which arises in a natural and interesting way.
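The expected-coverage claim for (1.6) can be illustrated numerically. The following Python sketch is not part of the original paper; the dimension, sample size, level, and parameter values are arbitrary choices made for illustration. It estimates the probability that an independent future observation falls in the region (1.6) by Monte Carlo, which by Paulson's Lemma should be close to $\beta$:

```python
# Monte Carlo check (illustrative sketch) that region (1.6) with the
# constant C_beta of (1.6b) has expected coverage beta; here k = 2, n = 15.
import numpy as np
from scipy.stats import f as fdist

rng = np.random.default_rng(1)
k, n, beta = 2, 15, 0.75
mu = np.array([1.0, -2.0])                    # arbitrary true mean
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])    # arbitrary SPD covariance
L = np.linalg.cholesky(Sigma)
Cb = ((n - 1) * k / (n - k)) * (1 + 1 / n) * fdist.ppf(beta, k, n - k)

trials, hits = 20000, 0
for _ in range(trials):
    X = mu + rng.standard_normal((n, k)) @ L.T   # sample X_1, ..., X_n
    xbar = X.mean(axis=0)
    V = np.cov(X, rowvar=False)                  # divisor n - 1, as in (1.6a)
    y = mu + L @ rng.standard_normal(k)          # independent future Y
    d = y - xbar
    hits += d @ np.linalg.solve(V, d) <= Cb      # is Y inside region (1.6)?
print(hits / trials)  # close to beta = 0.75
```

Since $E[C]=\beta$ holds exactly here, the simulated frequency differs from $\beta$ only by Monte Carlo error.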
2. The Bayesian approach

Given the observed data $\{X_i\}=\{x_i\}$, Bayes' theorem states that the posterior distribution of $\theta$ is

(2.1)  $p[\theta\mid\{x_i\}]=p(\theta)\,p[\{x_i\}\mid\theta]\big/p[\{x_i\}]$

where

(2.1a)  $p[\{x_i\}]=\int_\Omega p(\theta)\,p[\{x_i\}\mid\theta]\,d\theta$ ,
$p(\theta)$ is the (marginal) distribution of the vector of parameters $\theta$, and $p[\{x_i\}\mid\theta]$ is the distribution of the observations $\{X_i\}$, given $\theta$. When we are indeed given that $\{X_i\}=\{x_i\}$, then we often call $p[\{x_i\}\mid\theta]$ the likelihood function of $\theta$ and denote it by $l[\theta\mid\{x_i\}]$. The ingredients of (2.1) may be interpreted as follows. The distribution $p(\theta)$ represents our knowledge about the parameters $\theta$ before the data are drawn, while $l[\theta\mid\{x_i\}]$ represents information given to us about $\theta$ from the data $\{x_i\}$, and finally, $p[\theta\mid\{x_i\}]$ represents our knowledge of $\theta$ after we observe the data. For these reasons, $p(\theta)$ is commonly called the a priori or prior distribution of $\theta$, and $p[\theta\mid\{x_i\}]$ the a posteriori or posterior distribution of $\theta$. Using this interpretation, then, Bayes' theorem provides a formal mechanism by which our a priori information is combined with sample information to give us the posterior distribution of $\theta$, which effectively summarizes all the information we have about $\theta$. Now the reader will recall that $S$ is a tolerance region of $\beta$-expectation if its coverage $C[S]$ has expectation $\beta$. From a Bayesian point of view, once having seen the data, that is, having observed $\{X_i\}=\{x_i\}$,
$C[S]=\int_S f(y\mid\theta)\,dy$ is a function only of the parameters $\theta$, and the expectation referred to is the expectation with respect to the posterior distribution of the parameters $\theta$. That is, on the basis of the given data $\{x_i\}$, we wish to construct $S$ such that

(2.2)  $E\{C[S]\mid\{x_i\}\}=\int_\Omega\Big[\int_S f(y\mid\theta)\,dy\Big]p[\theta\mid\{x_i\}]\,d\theta=\beta$ ,
where $Y$ has the same distribution as the $X_i$, namely $f(\cdot\mid\theta)$. Now it is interesting to note that, assuming the conditions of Fubini's Theorem hold, so that we may invert the order of integration, we have that

(2.3)  $E\{C[S]\mid\{x_i\}\}=\int_S\Big[\int_\Omega f(y\mid\theta)\,p[\theta\mid\{x_i\}]\,d\theta\Big]dy=\int_S h[y\mid\{x_i\}]\,dy$ ,

where

(2.4)  $h[y\mid\{x_i\}]=\int_\Omega f(y\mid\theta)\,p[\theta\mid\{x_i\}]\,d\theta$

is (examining the right-hand side of (2.4)) simply the conditional distribution of $Y$, given the data $\{x_i\}$, where $Y$ may be regarded as an additional observation from $f(x\mid\theta)$, additional to and independent of $X_1,\dots,X_n$. This density $h[y\mid\{x_i\}]$ is the Bayesian estimate of the distribution of $Y$ and has been called the predictive or future distribution of $Y$. (For further discussion see, for example, Guttman [4] and the references cited therein.) Hence, (2.2) and (2.3) have the very interesting implication that $S$ is a $\beta$-expectation tolerance region if it is a (predictive) $\beta$-confidence region for $Y$, where $Y$ has the predictive distribution $h[y\mid\{x_i\}]$ defined by (2.4). This is the Bayesian analogue of Paulson's Lemma given in Section 1 with $q=1$. To repeat in another way, for a particular $f$, we need only find $h[y\mid\{x_i\}]$ and a region $S$ such that

(2.5)  $\Pr(Y\in S)=\int_S h[y\mid\{x_i\}]\,dy=\beta$ .

We summarize this as
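To make (2.4) and (2.5) concrete, here is a small one-dimensional numerical sketch. The setup is hypothetical and not from the paper: known unit sampling variance, an assumed $N(0,4)$ prior, and invented data. It builds $h[y\mid\{x_i\}]$ by integrating $f(y\mid\theta)\,p[\theta\mid\{x_i\}]$ over a grid of $\theta$ values, then reads off a central predictive interval as in (2.5):

```python
# Grid construction (illustrative sketch) of the predictive density (2.4).
import numpy as np
from scipy.stats import norm

theta = np.linspace(-4, 6, 1001)        # grid standing in for Omega
dtheta = theta[1] - theta[0]
prior = norm.pdf(theta, 0, 2)           # assumed prior p(theta) = N(0, 4)
x = np.array([1.1, 0.4, 1.8, 0.9])      # hypothetical observed data {x_i}
lik = np.prod([norm.pdf(xi, theta, 1) for xi in x], axis=0)
post = prior * lik
post /= post.sum() * dtheta             # posterior p[theta | {x_i}]

y = np.linspace(-6, 8, 1401)
dy = y[1] - y[0]
# (2.4): h[y|{x_i}] = integral of f(y|theta) p[theta|{x_i}] dtheta
h = norm.pdf(y[:, None], theta[None, :], 1) @ (post * dtheta)
print(round(float(h.sum() * dy), 3))    # near 1.0: h is a proper density

# a central interval carrying predictive mass 0.90 would serve as S in (2.5)
cdf = np.cumsum(h) * dy
i_lo, i_hi = np.searchsorted(cdf, [0.05, 0.95])
mass = cdf[i_hi] - cdf[i_lo]
print(round(float(mass), 2))            # near 0.90
```

The interval $[y_{i_{lo}}, y_{i_{hi}}]$ is then, up to grid error, a predictive $0.90$-confidence region for $Y$.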
LEMMA 2.1. If on the basis of observed data $\{x_i\}$, a predictive $\beta$-confidence region $S=S(\{x_i\})$ is constructed such that

(2.6)  $\int_S h[y\mid\{x_i\}]\,dy=\beta$ ,

where the predictive distribution is given by (2.4), and if $C[S]$ is the coverage of $S$, that is,

(2.7)  $C[S]=\int_S f(y\mid\theta)\,dy$ ,

where $f$ is the common distribution of the independent random variables $X_1,\dots,X_n, Y$, then the posterior expectation of $C[S]$ is $\beta$; that is, $S$ is of $\beta$-expectation.
(The proof is simple and utilizes relations (2.3) and (2.6).)
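As a minimal closed-form illustration of Lemma 2.1, consider a hypothetical univariate conjugate-normal setup with known $\sigma$ (an assumption for illustration; this case is not treated in the paper). The central $\beta$-interval of the predictive distribution then has posterior expected coverage $\beta$, which the sketch below checks by averaging the coverage over posterior draws of $\theta$:

```python
# Conjugate-normal sketch of Lemma 2.1 (illustrative, known sigma).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, m0, t0, beta = 1.0, 0.0, 2.0, 0.95     # prior: theta ~ N(m0, t0^2)
x = np.array([0.8, 1.4, 0.2, 1.0, 0.6])       # hypothetical data {x_i}
n = len(x)

# posterior theta|{x_i} ~ N(mn, tn2); predictive Y ~ N(mn, tn2 + sigma^2)
tn2 = 1 / (1 / t0**2 + n / sigma**2)
mn = tn2 * (m0 / t0**2 + x.sum() / sigma**2)
half = norm.ppf((1 + beta) / 2) * np.sqrt(tn2 + sigma**2)
lo, hi = mn - half, mn + half                 # predictive region S, as in (2.6)

# posterior expectation of the coverage C[S] = F(hi|theta) - F(lo|theta)
theta = rng.normal(mn, np.sqrt(tn2), size=100000)
post_exp = np.mean(norm.cdf(hi, theta, sigma) - norm.cdf(lo, theta, sigma))
print(round(float(post_exp), 3))  # close to beta = 0.95
```

Here the relation (2.3) holds exactly, so the averaged coverage deviates from $\beta$ only by Monte Carlo error.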
3. Sampling from the k-variate normal

We suppose in this section that sampling is from the $k$-variate normal $N(\mu,\Sigma)$, whose density is given by

(3.1)  $f(x\mid\mu,\Sigma)=(2\pi)^{-k/2}\,|\Sigma|^{-1/2}\exp\{-\tfrac12(x-\mu)'\Sigma^{-1}(x-\mu)\}$
where $\mu$ is $(k\times1)$ and $\Sigma$ is a $(k\times k)$ symmetric positive definite matrix. It is convenient to work with the set of parameters $(\mu,\Sigma^{-1})$ here, and accordingly, suppose that the prior for this situation is the conjugate prior of Raiffa and Schlaifer [7] given by
(3.2)  $p(\mu,\Sigma^{-1})\,d\mu\,d\Sigma^{-1}\propto|\Sigma^{-1}|^{(n_0-k-1)/2}\exp\{-\tfrac12\,\mathrm{tr}\,\Sigma^{-1}[(n_0-1)V_0+n_0(\mu-\bar{x}_0)(\mu-\bar{x}_0)']\}\,d\mu\,d\Sigma^{-1}$

where $\bar{x}_0$ is a $(k\times1)$ vector of known constants and $V_0$ is a $(k\times k)$ symmetric positive definite matrix of known constants*. It is to be noted that if $n_0$ tends to $0$ and $(n_0-1)V_0$ tends to the zero matrix, then (3.2) tends to the "in-ignorance" prior advocated by Geisser [2] and Geisser and Cornfield [3], viz

(3.3)  $p(\mu,\Sigma^{-1})\,d\mu\,d\Sigma^{-1}\propto|\Sigma^{-1}|^{-(k+1)/2}\,d\mu\,d\Sigma^{-1}$ .
Now it is easy to see that if $n$ independent observations $X_i$ are taken from (3.1), and we observe $\{X_i\}$ to be $\{x_i\}$, then the likelihood function is given by

(3.4)  $l[\mu,\Sigma^{-1}\mid\{x_i\}]=(2\pi)^{-nk/2}\,|\Sigma^{-1}|^{n/2}\exp\{-\tfrac12\,\mathrm{tr}\,\Sigma^{-1}[(n-1)V+n(\bar{x}-\mu)(\bar{x}-\mu)']\}$

where

$\bar{x}=n^{-1}\sum_{i=1}^{n}x_i$  and  $(n-1)V=\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x})'$ .
(The abbreviation "$\mathrm{tr}\,A$" stands for the trace of the matrix $A$.) Now combining (3.2) and (3.4) using Bayes' Theorem gives us that the posterior of $(\mu,\Sigma^{-1})$ is such that
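The trace form of the exponent in (3.4) is easy to verify numerically. The sketch below (dimension, data, and covariance are arbitrary values assumed for illustration) compares the trace form of the log-likelihood with a direct evaluation of the product of $k$-variate normal densities:

```python
# Check (illustrative) that the trace form of (3.4) matches a direct
# evaluation of the k-variate normal likelihood.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
k, n = 3, 8
mu = rng.normal(size=k)
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)          # arbitrary SPD covariance
X = rng.multivariate_normal(mu, Sigma, size=n)

xbar = X.mean(axis=0)
V = np.cov(X, rowvar=False)              # (n-1)V = sum (x_i-xbar)(x_i-xbar)'
Sinv = np.linalg.inv(Sigma)
loglik_trace = (-n * k / 2 * np.log(2 * np.pi)
                + n / 2 * np.log(np.linalg.det(Sinv))
                - 0.5 * (np.trace(Sinv @ ((n - 1) * V))
                         + n * (xbar - mu) @ Sinv @ (xbar - mu)))
loglik_direct = multivariate_normal.logpdf(X, mu, Sigma).sum()
print(bool(np.isclose(loglik_trace, loglik_direct)))  # True
```

The agreement rests on the identity $\sum_i(x_i-\mu)'\Sigma^{-1}(x_i-\mu)=\mathrm{tr}\,\Sigma^{-1}[(n-1)V+n(\bar{x}-\mu)(\bar{x}-\mu)']$.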
(3.5)  $p[\mu,\Sigma^{-1}\mid\{x_i\}]\propto|\Sigma^{-1}|^{(n+n_0-k-1)/2}\exp\{-\tfrac12\,\mathrm{tr}\,\Sigma^{-1}[(n_0-1)V_0+(n-1)V+n_0(\mu-\bar{x}_0)(\mu-\bar{x}_0)'+n(\mu-\bar{x})(\mu-\bar{x})']\}$ .

Now the exponent of (3.5) may be written in simplified form on "completing the square" in $\mu$; that is, the term in square brackets in the exponent of (3.5) may be written, after some algebra, as

(3.6)  $(n+n_0)(\mu-\tilde{x})(\mu-\tilde{x})'+(n_0-1)V_0+(n-1)V+R$
* We are in effect saying that our prior information on $\mu$ and $\Sigma^{-1}$ is such that we expect $\mu$ to be $\bar{x}_0$, with dispersion, that is, variance-covariance matrix of $\mu$, equal to $(n_0-1)V_0/[n_0(n_0-k-2)]$, and that we expect $\Sigma^{-1}$ to be $V_0^{-1}$, etc.
where

(3.7)  $\tilde{x}=(n+n_0)^{-1}(n_0\bar{x}_0+n\bar{x})$

and

(3.7a)  $R=\dfrac{n_0 n}{n_0+n}\,(\bar{x}-\bar{x}_0)(\bar{x}-\bar{x}_0)'$ .
Hence the posterior distribution (3.5) may be written as

(3.8)  $p[\mu,\Sigma^{-1}\mid\{x_i\}]=c\,|\Sigma^{-1}|^{(n+n_0-k-1)/2}\exp\{-\tfrac12[\mathrm{tr}\,\Sigma^{-1}Q+(n_0+n)(\mu-\tilde{x})'\Sigma^{-1}(\mu-\tilde{x})]\}$

where

(3.8a)  $Q=(n_0-1)V_0+(n-1)V+R$
and $c$ is the normalizing constant necessary to make (3.8) a density, that is, integrate to 1. Now to determine $c$, we first integrate with respect to $\mu$ and then $\Sigma^{-1}$, and in so doing, we make use of the identities derived from the $k$-variate normal and $k$-order Wishart distributions, viz

(3.9a)  $\int_{R^k}\exp\{-\tfrac{a}{2}(\mu-m)'\Sigma^{-1}(\mu-m)\}\,d\mu=(2\pi)^{k/2}a^{-k/2}|\Sigma^{-1}|^{-1/2}$ ,  $a>0$ ,

and

(3.10)  $\int_{\Sigma^{-1}>0}|\Sigma^{-1}|^{(m-k-1)/2}\exp\{-\tfrac12\,\mathrm{tr}\,\Sigma^{-1}Q\}\,d\Sigma^{-1}=2^{mk/2}\,\Gamma_k(m/2)\,|Q|^{-m/2}$ ,

where $\Gamma_k(\cdot)$ denotes the $k$-dimensional multivariate gamma function.
We are now in a position to find the predictive density of a future observation $Y$, conditional on $\{x_i\}$. The first factor of the integrand of (2.4) has functional form (3.1), and the second factor is given by (3.8). Hence, we find that the predictive distribution of $Y$ is given by

(3.11)  $h[y\mid\{x_i\}]=\int\!\!\int f(y\mid\mu,\Sigma)\,p[\mu,\Sigma^{-1}\mid\{x_i\}]\,d\mu\,d\Sigma^{-1}$ ,

with integrand proportional to

(3.12)  $|\Sigma^{-1}|^{(n+n_0-k)/2}\exp\{-\tfrac12[\mathrm{tr}\,\Sigma^{-1}Q+(n_0+n)(\mu-\tilde{x})'\Sigma^{-1}(\mu-\tilde{x})+(y-\mu)'\Sigma^{-1}(y-\mu)]\}$ .
Again "completing the square" in $\mu$ in (3.12), we easily, but tediously, find that

(3.13)  $\mathrm{tr}\,\Sigma^{-1}Q+(n_0+n)(\mu-\tilde{x})'\Sigma^{-1}(\mu-\tilde{x})+(y-\mu)'\Sigma^{-1}(y-\mu)=\mathrm{tr}\,\Sigma^{-1}[Q+W]$

with

$W=(n+n_0+1)(\mu-\tilde{\mu})(\mu-\tilde{\mu})'+\dfrac{n+n_0}{n+n_0+1}\,(y-\tilde{x})(y-\tilde{x})'$

and

$\tilde{\mu}=(n+n_0+1)^{-1}[(n+n_0)\tilde{x}+y]$ .
The integration with respect to $\mu$ in (3.11), using (3.9a), gives us

(3.15)  $h[y\mid\{x_i\}]\propto\int|\Sigma^{-1}|^{(n+n_0-k-1)/2}\exp\Big\{-\tfrac12\,\mathrm{tr}\,\Sigma^{-1}\Big[Q+\dfrac{n+n_0}{n+n_0+1}\,(y-\tilde{x})(y-\tilde{x})'\Big]\Big\}\,d\Sigma^{-1}$ .

Carrying out this integration using (3.10), and applying the determinant identity

(3.16)  $|A+BB'|=|A|\,|I_{n_2}+B'A^{-1}B|$ ,

where $A$ is $(n_1\times n_1)$ and $B$ is $(n_1\times n_2)$, we have the result that the predictive density of $Y$ is given by
(3.17)  $h[y\mid\{x_i\}]=c'\Big[1+\dfrac{n+n_0}{n+n_0+1}\,(y-\tilde{x})'Q^{-1}(y-\tilde{x})\Big]^{-(n+n_0)/2}$ ;

that is, we have the interesting result that the predictive distribution of $Y$, given $\{x_i\}$, is related to the $k$-variate $t$-distribution with $(n+n_0-k)$ degrees of freedom. As may be seen from properties of the multivariate $t$ (see, for example, Tiao and Guttman [8]), we have that

(3.18)  $\dfrac{n_0+n}{n+n_0+1}\,(Y-\tilde{x})'Q^{-1}(Y-\tilde{x})=\dfrac{k}{n+n_0-k}\,F_{k,\,n+n_0-k}$ .
Suppose now that we are interested in the "central" $100\beta\%$ region of the normal distribution (3.1), that is, in the set
(3.19)  $A_\beta=\{y\mid (y-\mu)'\Sigma^{-1}(y-\mu)\le\chi^2_{k,1-\beta}\}$

where $\chi^2_{k,1-\beta}$ is the point exceeded with probability $(1-\beta)$ when using the chi-square distribution with $k$ degrees of freedom. Given $\mu$ and $\Sigma$, or $(\mu,\Sigma^{-1})$, we have that

(3.20)  $P(Y\in A_\beta\mid\mu,\Sigma)=\beta$ ,
that is, if we knew $(\mu,\Sigma)$, $A_\beta$ would be a $100\beta\%$ predictive region for $Y$. Now since we do not know $(\mu,\Sigma)$, we use as an estimator of the density (3.1) the density (3.17), after observing the data $\{x_i\}$. Hence, a sensible predictive region for $Y$ is the central ellipsoidal region of (3.17), namely the region

(3.21)  $S=\Big\{y\;\Big|\;\dfrac{n_0+n}{n+n_0+1}\,(y-\tilde{x})'Q^{-1}(y-\tilde{x})\le\dfrac{k}{n+n_0-k}\,F_{k,\,n+n_0-k;\,1-\beta}\Big\}$ ,

which by (3.18) satisfies $\Pr(Y\in S)=\beta$ under the predictive distribution (3.17).
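The content of (3.18) can be checked by simulation. In the sketch below all parameter values are arbitrary stand-ins; $Y$ is drawn from the $k$-variate $t$ form of (3.17) via `scipy.stats.multivariate_t`, with the shape matrix scaled so that the quadratic form in (3.18) has the stated $F$ distribution, and the central predictive ellipsoid is seen to carry probability $\beta$:

```python
# Monte Carlo sketch of (3.18): the central predictive ellipsoid of the
# k-variate t distribution (3.17) has predictive probability beta.
import numpy as np
from scipy.stats import multivariate_t, f as fdist

rng = np.random.default_rng(3)
k, n, n0, beta = 2, 12, 5, 0.80
nu = n + n0 - k                          # degrees of freedom of (3.17)
c = (n0 + n) / (n + n0 + 1)
xt = np.zeros(k)                         # x-tilde (location), arbitrary here
Q = np.array([[3.0, 0.5], [0.5, 1.5]])   # an arbitrary SPD stand-in for Q

# shape chosen so that c * (Y-xt)' Q^{-1} (Y-xt) = (k/nu) F_{k,nu}
Y = multivariate_t.rvs(loc=xt, shape=Q / (c * nu), df=nu, size=20000,
                       random_state=rng)
d = Y - xt
qf = np.einsum('ij,ij->i', d @ np.linalg.inv(Q), d)
inside = np.mean(c * qf <= (k / nu) * fdist.ppf(beta, k, nu))
print(round(float(inside), 2))  # close to beta = 0.80
```

The fraction of draws inside the ellipsoid differs from $\beta$ only by Monte Carlo error, which is the predictive-probability statement underlying the region (3.21).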
Thus, by Lemma 2.1, we have that $S$ defined by (3.21) is a tolerance region of (posterior) $\beta$-expectation. We note that if $n_0=0$ and $(n_0-1)V_0$ is the zero matrix, that is, if the so-called "in-ignorance" prior given by (3.3) is the appropriate prior, then $\tilde{x}=\bar{x}$ and $Q=(n-1)V$, and the above results imply that the $\beta$-expectation region is of the form

(3.23)  $S=\{y\mid (y-\bar{x})'V^{-1}(y-\bar{x})\le[(n-1)k/(n-k)]\,[1+n^{-1}]\,F_{k,n-k;1-\beta}\}$ ,
which is interesting, since this latter result is in agreement with the sampling theory result (1.6), as may be easily verified. It is to be remarked, finally, that the lemma of Section 2 is quite general and may be used when sampling is from any population. In fact, the case of the single exponential is discussed in Guttman [5].
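The agreement with (1.6) is indeed easily verified; the following sketch (arbitrary choices of $k$, $n$, $\beta$) checks that with $n_0=0$ the constant appearing in (3.23) coincides with $C_\beta$ of (1.6b):

```python
# Algebraic check that the in-ignorance Bayesian region (3.23) reduces
# to the sampling-theory constant C_beta of (1.6b).
from scipy.stats import f as fdist

k, n, beta = 3, 20, 0.90
Fpt = fdist.ppf(beta, k, n - k)   # F_{k, n-k; 1-beta}

# (3.21) with n0 = 0: [n/(n+1)] d'[(n-1)V]^{-1} d <= [k/(n-k)] Fpt,
# i.e. d' V^{-1} d <= (n+1)(n-1)k / (n(n-k)) * Fpt
bayes_const = (n + 1) * (n - 1) * k / (n * (n - k)) * Fpt

# (1.6b): C_beta = [(n-1)k/(n-k)] [1 + 1/n] Fpt
C_beta = ((n - 1) * k / (n - k)) * (1 + 1 / n) * Fpt

print(abs(bayes_const - C_beta) < 1e-9)  # True: the constants coincide
```

Since $(n+1)(n-1)k/[n(n-k)]=[(n-1)k/(n-k)](1+n^{-1})$, the two regions are identical.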
Appendix
We give a proof, due to George Tiao, of (3.16). Consider the matrix equations ($A$ is $(n_1\times n_1)$ and $B$ is $(n_1\times n_2)$)

(A.1)  $\begin{pmatrix} I_{n_1} & B \\ 0 & I_{n_2}\end{pmatrix}\begin{pmatrix} A & -B \\ B' & I_{n_2}\end{pmatrix}=\begin{pmatrix} A+BB' & 0 \\ B' & I_{n_2}\end{pmatrix}$

and

(A.2)  $\begin{pmatrix} I_{n_1} & 0 \\ -B'A^{-1} & I_{n_2}\end{pmatrix}\begin{pmatrix} A & -B \\ B' & I_{n_2}\end{pmatrix}=\begin{pmatrix} A & -B \\ 0 & I_{n_2}+B'A^{-1}B\end{pmatrix}$ .

Taking determinants in (A.1) and (A.2), and noting that each pre-multiplying matrix has determinant 1, we obtain

$|A+BB'|=\begin{vmatrix} A & -B \\ B' & I_{n_2}\end{vmatrix}=|A|\,|I_{n_2}+B'A^{-1}B|$ ,

which is (3.16).
REFERENCES

[1] Fraser, D. A. S. and Guttman, Irwin (1956). Tolerance regions, Ann. Math. Statist., 27, 162-179.
[2] Geisser, S. (1965). Bayesian estimation in multivariate analysis, Ann. Math. Statist., 36, 150-159.
[3] Geisser, S. and Cornfield, J. (1963). Posterior distributions for multivariate normal parameters, Jour. Roy. Statist. Soc., Ser. B, 25, 368-376.
[4] Guttman, Irwin (1967). The use of the concept of a future observation in goodness-of-fit problems, Jour. Roy. Statist. Soc., Ser. B, 29, 83-100.
[5] Guttman, Irwin (1968). Tolerance regions: A survey of its literature. VI. The Bayesian approach, Tech. Rep. No. 126, Department of Statistics, University of Wisconsin.
[6] Paulson, E. (1943). A note on tolerance limits, Ann. Math. Statist., 14, 90-93.
[7] Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory, Harvard University Press.
[8] Tiao, G. C. and Guttman, Irwin (1965). The multivariate inverted beta distribution with applications, Jour. Amer. Statist. Assoc., 60, 793-805.