34 vues

Transféré par Benjamin Koh

questions for statistics

- D573-04 Standard Test Method for Rubber—Deterioration in an Air Oven.pdf
- Profile likelihood
- A Primer on Learning in Bayesian Networks for Computational Biology
- Algorithms
- Retention Model
- Information Theory and Statistics- A Tutorial
- self esteem, gender, academic achievement
- MIT18_443S15_LEC6
- PhD Thesis
- Shebalin Etal 2012
- Cas3 Notes
- Minka Aspect
- Chapter 4
- Website - Machine Learning
- thrun.3D-EM
- 93401-2013-2015-syllabus
- IASSC Black Belt Body of Knowledge
- On the Simlation and Estimation of the MR OU Process With MATLAB
- ARTM Equalize 2 LinearRzxcvgbhjklx
- Mba Assign Unit I

Vous êtes sur la page 1sur 35

Paper 1, Section I

7H Statistics

Suppose that X1 , . . . , Xn are independent normally distributed random variables,

each with mean and variance 1, and consider testing H0 : = 0 against H1 : = 1.

Explain what is meant by the critical region, the size and the power of a test.

For 0 < < 1, derive the test that is most powerful among all tests of size at

most . Obtain an expression for the power of your test in terms of the standard normal

distribution function ().

[Results from the course may be used without proof provided they are clearly stated.]

Paper 2, Section I

8H Statistics

Suppose that, given , the random variable X has P(X = k) = e k /k!,

k = 0, 1, 2, . . .. Suppose that the prior density of is () = e , > 0, for some

known (> 0). Derive the posterior density ( | x) of based on the observation X = x.

For a given loss function L(, a), a statistician wants to calculate the value of a that

minimises the expected posterior loss

Z

L(, a)( | x)d.

Suppose that x = 0. Find a in terms of in the following cases:

(a) L(, a) = ( a)2 ;

(b) L(, a) = | a|.

List of Questions

[TURN OVER

42

Paper 4, Section II

19H Statistics

Consider a linear model Y = X + where Y is an n 1 vector of observations, X

is a known n p matrix, is a p 1 (p < n) vector of unknown parameters and is an

n 1 vector of independent normally distributed random variables each with mean zero

and unknown variance 2 . Write down the log-likelihood and show that the maximum

and

likelihood estimators

2 of and 2 respectively satisfy

= X T Y,

XT X

1

T (Y X )

= n

(Y X )

4

and

(T denotes the transpose). Assuming that X T X is invertible, find the solutions

2

of these equations and write down their distributions.

and

Prove that

2 are independent.

Consider the model Yij = i + xij + ij , i = 1, 2, 3 and j = 1, 2, 3. Suppose that, for

all i, xi1 = 1, xi2 = 0 and xi3 = 1, and that ij , i, j = 1, 2, 3, are independent N (0, 2 )

random variables where 2 is unknown. Show how this model may be written as a linear

model and write down Y, X, and . Find the maximum likelihood estimators of i

(i = 1, 2, 3), and 2 in terms of the Yij . Derive a 100(1 )% confidence interval for 2

and for 2 1 .

cov(W1 , W2 ) = 0, then W1 and W2 are independent.]

List of Questions

43

Paper 1, Section II

19H Statistics

Suppose X1 , . . . , Xn are independent identically distributed random variables each

with probability mass function P(Xi = xi ) = p(xi ; ), where is an unknown parameter.

State what is meant by a sufficient statistic for . State the factorisation criterion for a

sufficient statistic. State and prove the RaoBlackwell theorem.

with

m

P(Xi = xi ) =

xi (1 )mxi , xi = 0, . . . , m,

xi

for .

P

Show that T = ni=1 Xi is sufficient for and use the RaoBlackwell theorem to

find another unbiased estimator for , giving details of your derivation. Calculate the

A statistician cannot remember the exact statement of the RaoBlackwell theorem

and calculates E(T | X1 ) in an attempt to find an estimator of . Comment on the

suitability or otherwise of this approach, giving your reasons.

=

[Hint: If a and b are positive integers then, for r = 0, 1, . . . , a + b, a+b

r

b

Pr

a

j=0 j rj .]

Paper 3, Section II

20H Statistics

(a) Suppose that X1 , . . . , Xn are independent identically distributed random variables, each with density f (x) = exp(x), x > 0 for some unknown > 0. Use the

generalised likelihood ratio to obtain a size test of H0 : = 1 against H1 : 6= 1.

(b) A die is loaded so that, if pi is the probability of face i, then p1 = p2 = 1 ,

p3 = p4 = 2 and p5 = p6 = 3 . The die is thrown n times and face i is observed xi times.

Write down the likelihood function for = (1 , 2 , 3 ) and find the maximum likelihood

estimate of .

Consider testing whether or not 1 = 2 = 3 for this die. Find the generalised

likelihood ratio statistic and show that

2 loge T,

where T =

3

X

(oi ei )2

i=1

ei

approximate size 0.05 test using the value of T . Explain what you would conclude (and

why) if T = 2.03.

List of Questions

[TURN OVER

42

Paper 1, Section I

7H Statistics

Consider an estimator of an unknown parameter , and assume that E 2 <

Show that the mean squared error of is the sum of its variance and the square of

its bias.

Suppose that X1 , . . . , Xn are independent identically distributed random variables

where

with mean

and variance 2 , and consider estimators of of the form k X

P

n

1

=

X

i=1 Xi .

n

(i) Find the value of k that gives an unbiased estimator, and show that the mean

squared error of this unbiased estimator is 2 /n.

is smaller

(ii) Find the range of values of k for which the mean squared error of kX

2

than /n.

Paper 2, Section I

8H Statistics

There are 100 patients taking part in a trial of a new surgical procedure for a

particular medical condition. Of these, 50 patients are randomly selected to receive the

new procedure and the remaining 50 receive the old procedure. Six months later, a doctor

assesses whether or not each patient has fully recovered. The results are shown below:

Old procedure

New procedure

Fully

recovered

25

31

Not fully

recovered

25

19

The doctor is interested in whether there is a difference in full recovery rates for patients

receiving the two procedures. Carry out an appropriate 5% significance level test, stating

your hypotheses carefully. [You do not need to derive the test.] What conclusion should

be reported to the doctor?

[Hint: Let 2k () denote the upper 100 percentage point of a 2k distribution. Then

21 (0.05) = 3.84, 22 (0.05) = 5.99, 23 (0.05) = 7.82, 24 (0.05) = 9.49.]

List of Questions

43

Paper 4, Section II

19H Statistics

Consider a linear model

Y = X + ,

()

is an n 1 vector of independent N (0, 2 ) random variables with 2 unknown. Assume

of and derive its distribution.

that X has full rank p. Find the least squares estimator

Define the residual sum of squares RSS and write down an unbiased estimator

2 of 2 .

Suppose that V

+ i and Zi = c + dwi + i , for i = 1, . . . , m, where ui and

i = a + buiP

P

m

wi are known with m

u

=

i=1 i

i=1 wi = 0, and 1 , . . . , m , 1 , . . . , m are independent

2

N (0, ) random variables. Assume that at least two of the ui are distinct and at least

two of the wi are distinct. Show that Y = (V1 , . . . , Vm , Z1 , . . . , Zm )T (where T denotes

in terms of the Vi , Zi ,

transpose) may be written as in () and identify X and . Find

ui and wi . Find the distribution of b d and derive a 95% confidence interval for b d.

has a 2np distribution, and that and the

[Hint: You may assume that RSS

2

residual sum of squares are independent. Properties of 2 distributions may be used without

proof.]

List of Questions

[TURN OVER

44

Paper 1, Section II

19H Statistics

Suppose that X1 , X2 , and X3 are independent identically distributed Poisson

random variables with expectation , so that

P(Xi = x) =

e x

x!

x = 0, 1, . . . ,

1 is a known value greater

than 1. Show that the test with critical region {(x1 , x2 , x3 ) : 3i=1 xi > 5} is a likelihood

ratio test of H0 against H1 . What is the size of this test? Write down an expression for

its power.

A scientist counts the number of bird territories in n randomly selected sections

of a large park. Let Yi be the number of bird territories in the ith section, and

suppose that Y1 , . . . , Yn are independent Poisson random variables with expectations

1 , . . . , n respectively. Let ai be the area of the ith section. Suppose that n = 2m,

a1 = = am = a(> 0) and am+1 = = a2m = 2a. Derive the generalised likelihood

ratio for testing

1 i = 1, . . . , m

H0 : i = ai against H1 : i =

2 i = m + 1, . . . , 2m.

What should the scientist conclude about the number of bird territories if 2 loge ()

is 15.67?

[Hint: Let F (x) be P(W 6 x) where W has a Poisson distribution with expectation .

Then

F1 (3) = 0.998, F3 (5) = 0.916, F3 (6) = 0.966, F5 (3) = 0.433 .]

List of Questions

45

Paper 3, Section II

20H Statistics

Suppose that X1 , . . . , Xn are independent identically distributed random variables

with

k x

P(Xi = x) =

(1 )kx, x = 0, . . . , k,

x

where k is known and (0 < < 1) is an unknown parameter. Find the maximum

likelihood estimator of .

Statistician 1 has prior density for given by 1 () = 1 , 0 < < 1, where

> 1. Find the posterior distribution for after observing data X1 = x1 , . . . , Xn = xn .

(B)

Write down the posterior mean 1 , and show that

(B)

1 = c + (1 c)1 ,

where 1 depends only on the prior distribution and c is a constant in (0, 1) that is to be

specified.

Statistician 2 has prior density for given by 2 () = (1)1 , 0 < < 1. Briefly

describe the prior beliefs that the two statisticians hold about . Find the posterior mean

(B)

(B)

(B)

2 and show that 2 < 1 .

Suppose that increases (but n, k and the xi remain unchanged). How do the prior

beliefs of the two statisticians change? How does c vary? Explain briefly what happens

(B)

(B)

to 1 and 2 .

[Hint: The Beta(, ) ( > 0, > 0) distribution has density

f (x) =

( + ) 1

x

(1 x)1 ,

()()

with expectation +

and variance

the Gamma function.]

.

(++1)(+)2

List of Questions

0 < x < 1,

Here, () =

R

0

x1 ex dx, > 0, is

[TURN OVER

41

Paper 1, Section I

7H Statistics

Let x1 , . . . , xn be independent and identically distributed observations from a

distribution with probability density function

(

e(x) , x > ,

f (x) =

0,

x < ,

where and are unknown positive parameters. Let = + 1/. Find the maximum

likelihood estimators ,

and .

Determine for each of ,

and whether or not it has a positive bias.

Paper 2, Section I

8H Statistics

State and prove the RaoBlackwell theorem.

Individuals in a population are independently of three types {0, 1, 2}, with unknown

probabilities p0 , p1 , p2 where p0 + p1 + p2 = 1. In a random sample of n people the ith

person is found to be of type xi {0, 1, 2}.

Show that an unbiased estimator of = p0 p1 p2 is

(

1, if (x1 , x2 , x3 ) = (0, 1, 2),

=

0, otherwise.

say

such that var( ) < (1 ).

List of Questions

[TURN OVER

42

Paper 4, Section II

19H Statistics

Explain the notion of a sufficient statistic.

Suppose X is a random variable with distribution F taking values in {1, . . . , 6},

with P (X = i) = pi . Let x1 , . . . , xn be a sample from F . Suppose ni is the number of

these xj that are equal to i. Use a factorization criterion to explain why (n1 , . . . , n6 ) is

sufficient for = (p1 , . . . , p6 ).

Let H0 be the hypothesis that pi = 1/6 for all i. Derive the statistic of the

generalized likelihood ratio test of H0 against the alternative that this is not a good

fit.

Assuming that ni n/6 when H0 is true and n is large, show that this test can be

approximated by a chi-squared test using a test statistic

6

T = n +

6X 2

ni .

n

i=1

Suppose n = 100 and T = 8.12. Would you reject H0 ? Explain your answer.

List of Questions

43

Paper 1, Section II

19H Statistics

Consider the general linear model Y = X + where X is a known n p matrix,

is an unknown p 1 vector of parameters, and is an n 1 vector of independent N (0, 2 )

random variables with unknown variance 2 . Assume the p p matrix X T X is invertible.

Let

= (X T X)1 X T Y

= Y X .

What are the distributions of and ? Show that and are uncorrelated.

Four apple trees stand in a 2 2 rectangular grid. The annual yield of the tree at

coordinate (i, j) conforms to the model

yij = i + xij + ij ,

i, j {1, 2},

where xij is the amount of fertilizer applied to tree (i, j), 1 , 2 may differ because of

varying soil across rows, and the ij are N (0, 2 ) random variables that are independent

of one another and from year to year. The following two possible experiments are to be

compared:

0 2

0 1

.

and II : xij =

I : xij =

3 1

2 3

Represent these as general linear models, with = (1 , 2 , ). Compare the variances of

estimates of under I and II.

With II the following yields are observed:

100 300

yij =

.

600 400

Forecast the total yield that will be obtained next year if no fertilizer is used. What is the

95% predictive interval for this yield?

List of Questions

[TURN OVER

44

Paper 3, Section II

20H Statistics

Suppose x1 is a single observation from a distribution with density f over [0, 1]. It

is desired to test H0 : f (x) = 1 against H1 : f (x) = 2x.

Let : [0, 1] {0, 1} define a test by (x1 ) = i accept Hi . Let

i () = P ((x1 ) = 1 i | Hi ). State the Neyman-Pearson lemma using this notation.

Let be the best test of size 0.10. Find and 1 ().

Consider now : [0, 1] {0, 1, } where (x1 ) = means declare the test to be

inconclusive. Let i () = P ((x) = | Hi ). Given prior probabilities 0 for H0 and

1 = 1 0 for H1 , and some w0 , w1 , let

cost() = 0 w0 0 () + 0 () + 1 w1 1 () + 1 () .

Prove that for each value of 0 (0, 1) there exist w0 , w1 (depending on 0 ) such that

cost( ) = min cost(). [Hint: w0 = 1 + 2(0.6)(1 /0 ).]

Hence prove that if is any test for which

i () 6 i ( ),

i = 0, 1

List of Questions

42

Paper 1, Section I

7H Statistics

Describe the generalised likelihood ratio test and the type of statistical question for

which it is useful.

Suppose that X1 , . . . , Xn are independent and identically distributed random variables with the Gamma(2, ) distribution, having density function 2 x exp(x), x > 0.

Similarly, Y1 , . . . , Yn are independent and identically distributed with the Gamma(2, )

distribution. It is desired to test the hypothesis H0 : = against

PH1 :

P6= . Derive

the generalised likelihood ratio test and express it in terms of R = i Xi / i Yi .

(1)

Let F1 ,2 denote the value that a random variable having the F1 ,2 distribution

exceeds with probability . Explain how to decide the outcome of a size 0.05 test when

(1)

n = 5 by knowing only the value of R and the value F1 ,2 , for some 1 , 2 and , which

you should specify.

[You may use the fact that the 2k distribution is equivalent to the Gamma(k/2, 1/2)

distribution.]

Paper 2, Section I

8H Statistics

Let the sample x = (x1 , . . . , xn ) have likelihood function f (x; ). What does it mean

to say T (x) is a sufficient statistic for ?

Show that if a certain factorization criterion is satisfied then T is sufficient for .

Suppose that T is sufficient for and there exist two samples, x and y, for which

T (x) 6= T (y) and f (x; )/f (y; ) does not depend on . Let

(

T (z) z 6= y

T1 (z) =

T (x) z = y.

Show that T1 is also sufficient for .

Explain why T is not minimally sufficient for .

List of Questions

43

Paper 4, Section II

19H Statistics

From each of 3 populations, n data points are sampled and these are believed to

obey

yij = i + i (xij x

i ) + ij , j {1, . . . , n}, i {1, 2, 3},

P

where x

i = (1/n) j xij , the ij are independent and identically distributed as N (0, 2 ),

P

and 2 is unknown. Let yi = (1/n) j yij .

(i) Find expressions for

i and i , the least squares estimates of i and i .

i and i ?

(iii) Show that the residual sum of squares, R1 , is given by

n

3

n

X

X

X

(yij yi )2 i2

(xij x

i )2 .

R1 =

i=1

j=1

j=1

Calculate R1 when n = 9, {

i }3i=1 = {1.6, 0.6, 0.8}, {i }3i=1 = {2, 1, 1},

3

X

(yij yi )2

j=1

i=1

X

(xij x

i )2

j=1

i=1

likelihood estimator of 1 under the assumption that H0 is true. Calculate its value for

the above data.

(v) Explain (stating without proof any relevant theory) the rationale for a statistic

which can be referred to an F distribution to test H0 against the alternative that it is not

true. What should be the degrees of freedom of this F distribution? What would be the

outcome of a size 0.05 test of H0 with the above data?

List of Questions

[TURN OVER

44

Paper 1, Section II

19H Statistics

State and prove the Neyman-Pearson lemma.

A sample of two independent observations, (x1 , x2 ), is taken from a distribution with

density f (x; ) = x1 , 0 6 x 6 1. It is desired to test H0 : = 1 against H1 : = 2.

Show that the best test of size can be expressed using the number c such that

1 c + c log c = .

Is this the uniformly most powerful test of size for testing H0 against H1 : > 1?

Suppose that the prior distribution of is P ( = 1) = 4/(1 + 4), P ( = 2) =

1/(1+4), where 1 > > 0. Find the test of H0 against H1 that minimizes the probability

of error.

Let w() denote the power function of this test at (> 1). Show that

w() = 1 + log .

List of Questions

45

Paper 3, Section II

20H Statistics

Suppose

that X is a single observation drawn from the uniform distribution on the

interval 10, + 10 , where is unknown and might be any real number. Given 0 6= 20

we wish to test H0 : = 0 against H1 : = 20. Let (0 ) be the test which accepts H0 if

and only if X A(0 ), where

A(0 ) =

(

0 8, ,

0 > 20

, 0 + 8 , 0 < 20 .

Now consider

C1 (X) = { : X A()},

C2 (X) = : X 9 6 6 X + 9 .

Prove that both C1 (X) and C2 (X) specify 90% confidence intervals for . Find the

confidence interval specified by C1 (X) when X = 0.

Let Li (X) be the length of the confidence interval specified by Ci (X). Let (0 ) be

the probability of the Type II error of (0 ). Show that

Z

Z

(0 ) d0 .

1{0 C1 (X)} d0 = 20 =

E[L1 (X) | = 20] = E

Here 1{B} is an indicator variable for event B. The expectation is over X. [Orders of

integration and expectation can be interchanged.]

Use what you know about constructing best tests to explain which of the two

confidence intervals has the smaller expected length when = 20.

List of Questions

[TURN OVER

38

Paper 1, Section I

7H Statistics

Consider the experiment of tossing a coin n times. Assume that the tosses are

independent and the coin is biased, with unknown probability p of heads and 1 p of tails.

A total of X heads is observed.

(i) What is the maximum likelihood estimator pb of p?

Now suppose that a Bayesian statistician has the Beta(M, N ) prior distribution

for p.

(iii) Assuming the loss function is L(p, a) = (p a)2 , show that the statisticians point

estimate for p is given by

M +X

.

M +N +n

[The Beta(M, N ) distribution has density

mean

M

.]

M +N

(M + N ) M 1

x

(1 x)N 1 for 0 < x < 1 and

(M )(N )

Paper 2, Section I

8H Statistics

Let X1 , . . . , Xn be random variables with joint density function f (x1 , . . . , xn ; ),

where is an unknown parameter. The null hypothesis H0 : = 0 is to be tested against

the alternative hypothesis H1 : = 1 .

(i) Define the following terms: critical region, Type I error, Type II error, size, power.

(ii) State and prove the NeymanPearson lemma.

List of Questions

39

Paper 1, Section II

19H Statistics

Let X1 , . . . , Xn be independent random variables with probability mass function

f (x; ), where is an unknown parameter.

(i) What does it mean to say that T is a sufficient statistic for ? State, but do not prove,

the factorisation criterion for sufficiency.

(ii) State and prove the RaoBlackwell theorem.

1

Now consider the case where f (x; ) = ( log )x for non-negative integer x and

x!

0 < < 1.

(iii) Find a one-dimensional sufficient statistic T for .

(iv) Show that e = 1{X1 =0} is an unbiased estimator of .

(v) Find another unbiased estimator b which is a function of the sufficient statistic T

e You may use the following fact without proof:

and that has smaller variance than .

X1 + + Xn has the Poisson distribution with parameter n log .

List of Questions

[TURN OVER

40

Paper 3, Section II

20H Statistics

Consider the general linear model

Y = X +

where X is a known n p matrix, is an unknown p 1 vector of parameters, and

is an n 1 vector of independent N (0, 2 ) random variables with unknown variance 2 .

Assume the p p matrix X T X is invertible.

(i) Derive the least squares estimator b of .

b Is b an unbiased estimator of ?

(ii) Derive the distribution of .

(iii) Show that 12 kY X k

is to be determined.

e b )T ] or otherwise, show that b and b e are

By considering the matrix E[(b )(

independent.

[You may use standard facts about the multivariate normal distribution as well as results

from linear algebra, including the fact that I X(X T X)1 X T is a projection matrix of

rank n p, as long as they are carefully stated.]

Paper 4, Section II

19H Statistics

2 ) distribution

Consider independent random variables X1 , . . . , Xn with the N (X , X

2

and Y1 , . . . , Yn with the N (Y , Y ) distribution, where the means X , Y and variances

2 , 2 are unknown. Derive the generalised likelihood ratio test of size of the null

X

Y

2 = 2 against the alternative H : 2 6= 2 . Express the critical

hypothesis H0 : X

1

Y

X

Y

SXX

and the quantiles of a beta distribution,

region in terms of the statistic T =

SXX + SY Y

where

!2

!2

n

n

n

n

X

X

1 X

1 X

2

2

Yi

Xi

Yi .

and SY Y =

Xi

SXX =

n

n

i=1

i=1

i=1

i=1

[You may use the following fact: if U (a, ) and V (b, ) are independent,

U

then

Beta(a, b).]

U +V

List of Questions

40

Paper 1, Section I

7E Statistics

Suppose X1 , . . . , Xn are independent N (0, 2 ) random variables, where 2 is an

unknown parameter. Explain carefully how to construct the uniformly most powerful test

of size for the hypothesis H0 : 2 = 1 versus the alternative H1 : 2 > 1 .

Paper 2, Section I

8E Statistics

A washing powder manufacturer wants to determine the effectiveness of a television

advertisement. Before the advertisement is shown, a pollster asks 100 randomly chosen

people which of the three most popular washing powders, labelled A, B and C, they prefer.

After the advertisement is shown, another 100 randomly chosen people (not the same as

before) are asked the same question. The results are summarized below.

A B C

before 36 47 17

after 44 33 23

Derive and carry out an appropriate test at the 5% significance level of the

hypothesis that the advertisement has had no effect on peoples preferences.

[You may find the following table helpful:

21

95 percentile

22

23

24

25

26

List of Questions

41

Paper 1, Section II

19E Statistics

Consider the the linear regression model

Y i = xi + i ,

where the numbers x1 , . . . , xn are known, the independent random variables 1 , . . . , n

have the N (0, 2 ) distribution, and the parameters and 2 are unknown. Find the

maximum likelihood estimator for .

State and prove the GaussMarkov theorem in the context of this model.

Write down the distribution of an arbitrary linear estimator for . Hence show that

there exists a linear, unbiased estimator b for such that

E, 2 [(b )4 ] 6 E, 2 [(e )4 ]

Paper 3, Section II

20E Statistics

Let X1 , . . . , Xn be independent Exp() random variables with unknown parameter

. Find the maximum likelihood estimator of , and state the distribution of n/ . Show

that / has the (n, n) distribution. Find the 100 (1 )% confidence interval for of

for a constant C > 0 depending on .

the form [0, C ]

Now, taking a Bayesian point of view, suppose your prior distribution for the

parameter is (k, ). Show that your Bayesian point estimator B of for the loss

function L(, a) = ( a)2 is given by

B =

n+k

P

.

+ i Xi

Find a constant CB > 0 depending on such that the posterior probability that

6 CB B is equal to 1 .

[The density of the (k, ) distribution is f (x; k, ) = k x k 1 e x /(k), for x > 0.]

List of Questions

[TURN OVER

42

Paper 4, Section II

19E Statistics

Consider a collection X1 , . . . , Xn of independent random variables with common

density function f (x; ) depending on a real parameter . What does it mean to say T

is a sufficient statistic for ? Prove that if the joint density of X1 , . . . , Xn satisfies the

factorisation criterion for a statistic T , then T is sufficient for .

Let each Xi be uniformly distributed on [ , ] . Find a two-dimensional

sufficient statistic T = (T1 , T2 ). Using the fact that = 3X12 is an unbiased estimator of

, or otherwise, find an unbiased estimator of which is a function of T and has smaller

variance than . Clearly state any results you use.

List of Questions

48

Paper 1, Section I

7H Statistics

What does it mean to say that an estimator of a parameter is unbiased?

An n-vector Y of observations is believed to be explained by the model

Y = X + ,

where X is a known n p matrix, is an unknown p-vector of parameters, p < n, and

is an n-vector of independent N (0, 2 ) random variables. Find the maximum-likelihood

of , and show that it is unbiased.

estimator

Paper 3, Section I

8H Statistics

In a demographic study, researchers gather data on the gender of children in families

with more than two children. For each of the four possible outcomes GG, GB, BG, BB

of the first two children in the family, they find 50 families which started with that pair,

and record the gender of the third child of the family. This produces the following table

of counts:

First two children Third child B Third child G

GG

16

34

GB

28

22

BG

25

25

BB

31

19

In view of this, is the hypothesis that the gender of the third child is independent of the

genders of the first two children rejected at the 5% level?

[Hint: the 95% point of a 23 distribution is 7.8147, and the 95% point of a 24 distribution

is 9.4877.]

List of Questions

49

Paper 1, Section II

18H Statistics

What is the critical region C of a test of the null hypothesis H0 : 0 against

the alternative H1 : 1 ? What is the size of a test with critical region C? What is

the power function of a test with critical region C?

State and prove the NeymanPearson Lemma.

If X1 , . . . , Xn are independent with common Exp() distribution, and 0 < 0 < 1 ,

find the form of the most powerful size- test of H0 : = 0 against H1 : = 1 . Find

the power function as explicitly as you can, and prove that it is increasing in . Deduce

that the test you have constructed is a size- test of H0 : 6 0 against H1 : = 1 .

Paper 2, Section II

19H Statistics

What does it mean to say that the random d-vector X has a multivariate normal

distribution with mean and covariance matrix ?

Suppose that X Nd (0, 2 Id ), and that for each j = 1, . . . , J, Aj is a dj d matrix.

Suppose further that

Aj ATi = 0

for j 6= i. Prove that the random vectors Yj Aj X are independent, and that

Y (Y1T , . . . , YJT )T has a multivariate normal distribution.

[ Hint: Random vectors are independent if their joint MGF is the product of their individual

MGFs.]

If Z1 , . . . , Zn is an independent sample from a univariate N (, 2 ) distribution,

P

2 and the sample mean Z

prove that the sample variance SZZ (n 1)1 ni=1 (Zi Z)

P

n

n1 i=1 Zi are independent.

Paper 4, Section II

19H Statistics

What is a sufficient statistic? State the factorization criterion for a statistic to be

sufficient.

Suppose that X1 , . . . , Xn are independent random variables uniformly distributed

over [a, b], where the parameters a < b are not known, and n > 2. Find a sufficient statistic

for the parameter (a, b) based on the sample X1 , . . . , Xn . Based on your sufficient

statistic, derive an unbiased estimator of .

List of Questions

[TURN OVER

35

1/I/7H

Statistics

N (, 1 ) distribution. He has a prior density for the unknown parameters , of the

form

0 (, ) 0 1 exp ( 12 K0 ( 0 )2 0 ) ,

where 0 , 0 , 0 and K0 are constants which he chooses. Show that after observing

X1 , . . . , Xn his posterior density n (, ) is again of the form

n (, ) n 1 exp ( 12 Kn ( n )2 n ) ,

where you should find explicitly the form of n , n , n and Kn .

1/II/18H

Statistics

and Y1 , . . . , Yn is an independent sample of size n from a N (Y , 1) distribution.

(i) Find (with careful justification) the form of the size- likelihoodratio test of the

null hypothesis H0 : Y = 0 against alternative H1 : (X , Y ) unrestricted.

(ii) Find the form of the size- likelihoodratio test of the hypothesis

H0 : X > A, Y = 0 ,

against H1 : (X , Y ) unrestricted, where A is a given constant.

Compare the critical regions you obtain in (i) and (ii) and comment briefly.

Part IB

2008

36

2/II/19H

Statistics

Z+ = {0, 1, 2, . . . } is given by the joint probability generating function

(s, t) E [sX tY ] =

1

,

1 s t

where the unknown parameters and are positive, and satisfy the inequality + < 1.

Find E(X). Prove that the probability mass function of (X, Y ) is

x+y x y

f (x, y | , ) = (1 )

x

(x, y Z+ ) ,

and prove that the maximum-likelihood estimators of and based on a sample of size

n drawn from the distribution are

X

,

1+X +Y

Y

,

1+X +Y

By considering

+ or otherwise, prove that the maximum-likelihood estimator

is biased. Stating clearly any results to which you appeal, prove that as n ,

,

making clear the sense in which this convergence happens.

3/I/8H

Statistics

confidence set for ?

In the case where the Xi are independent N (, 2 ) random variables with 2 known,

unknown, find (in terms of 2 ) how large the size n of the sample must be in order for

there to exist a 95% confidence interval for of length no more than some given > 0 .

[Hint: If Z N (0, 1) then P (Z > 1.960) = 0.025 .]

Part IB

2008

37

4/II/19H

Statistics

Yi = + xi + i ,

where observations Yi , i = 1, . . . , n, depend on known explanatory variables xi ,

i = 1, . . . , n, and independent N (0, 2 ) random variables i , i = 1, . . . , n .

Derive the maximum-likelihood estimators of , and 2 .

Stating clearly any results you require about the distribution of the maximum-likelihood

estimators of , and 2 , explain how to construct a test of the hypothesis that = 0

against an unrestricted alternative.

(ii) A simple ballistic theory predicts that the range of a gun fired at angle of

elevation should be given by the formula

Y =

V2

sin 2 ,

g

where V is the muzzle velocity, and g is the gravitational acceleration. Shells are fired at

9 different elevations, and the ranges observed are as follows:

(degrees)

sin 2

Y (m)

5

0.1736

4322

15

25

0.5

0.7660

11898 17485

35

45

0.9397

1

20664 21296

55

0.9397

19491

65

75

85

0.7660

0.5

0.1736

15572 10027 3458

The model

Yi = + sin 2i + i

()

is proposed. Using the theory of part (i) above, find expressions for the maximumlikelihood estimators of and .

The t-test of the null hypothesis that = 0 against an unrestricted alternative

does not reject the null hypothesis. Would you be willing to accept the model ()? Briefly

explain your answer.

[You may

of the data. If xi =P sin 2i , then

P need the following summary statistics

P

x

n1 xi = 0.63986, Y = 13802, Sxx (xi x

)2 = 0.81517, Sxy =

Yi (xi x

) =

17186. ]

Part IB

2008

28

1/I/7C

Statistics

Let

identically distributed random variables from the

X1 , . . . , Xn be independent,

2

2

N , distribution where and are unknown. Use the generalized likelihood-ratio

test to derive the form of a test of the hypothesis H0 : = 0 against H1 : 6= 0 .

Explain carefully how the test should be implemented.

1/II/18C

Statistics

P (Xi = 1) = = 1 P (Xi = 0) ,

where is an unknown parameter, 0 < < 1, and n > 2. It is desired to estimate the

quantity = (1 ) = nVar ((X1 + + Xn ) /n).

of .

(i) Find the maximum-likelihood estimate, ,

(ii) Show that 1 = X1 (1 X2 ) is an unbiased estimate of and hence, or otherwise,

obtain an unbiased estimate of which has smaller variance than 1 and which is

a function of .

(iii) Now suppose that a Bayesian approach is adopted and that the prior distribution

for , (), is taken to be the uniform distribution on (0, 1). Compute the Bayes

2

point estimate of when the loss function is L(, a) = ( a) .

[You may use that fact that when r, s are non-negative integers,

Z

2/II/19C

Statistics

Suppose that X is a random variable drawn from the probability density function

f (x | ) = 12 | x |1 e|x| /(),

where () =

< x < ,

power of the test as a function of .

Is your test uniformly most powerful for testing H0 : = 1 against H1 : > 1?

Explain your answer carefully.

Part IB

2007

29

3/I/8C

Statistics

Light bulbs are sold in packets of 3 but some of the bulbs are defective. A sample

of 256 packets yields the following figures for the number of defectives in a packet:

No. of defectives

No. of packets

116

94

40

Test the hypothesis that each bulb has a constant (but unknown) probability of

being defective independently of all other bulbs.

[ Hint: You may wish to use some of the following percentage points:

Distribution

90% percentile

95% percentile

4/II/19C

21

22

23

24

t1

t2

t3

t4

271

384

461

599

625

781

778

949

308

631

189

292

164

235

153

213

Statistics

Yi = + xi + i ,

1 6 i 6 n,

n

real numbers with i=1 xi = 0 and , and 2 are unknown.

(i) Find the least-squares estimates

b and b of and , respectively, and explain why

in this case they are the same as the maximum-likelihood estimates.

(ii) Determine the maximum-likelihood estimate

b2 of 2 and find a multiple of it which

2

is an unbiased estimate of .

(iii) Determine the joint distribution of

b, b and

b2 .

(iv) Explain carefully how you would test the hypothesis H0 : = 0 against the

alternative H1 : 6= 0 .

Part IB

2007

29

1/I/7C

Statistics

mean and variance 1. Find the maximum likelihood estimate M for based on

X1 , . . . , X n .

Suppose that we now take a Bayesian point of view and regard itself as a normal

random variable of known mean and variance 1 . Find the Bayes estimate B for

based on X1 , . . . , Xn , corresponding to the quadratic loss function ( a)2 .

1/II/18C

Statistics

. Explain what is meant by a sufficient statistic T (X) for .

In the case where X is discrete, with probability mass function f (x|), explain,

with justification, how a sufficient statistic may be found.

Assume now that X = (X1 , . . . , Xn ), where X1 , . . . , Xn are independent nonnegative random variables with common density function

f (x|) =

e(x)

0

if x > ,

otherwise.

Here 0 is unknown and is a known positive parameter. Find a sufficient statistic for

and hence obtain an unbiased estimator for of variance (n)2 .

[You may use without proof the following facts: for independent exponential random

variables X and Y , having parameters and respectively, X has mean 1 and variance

2 and min{X, Y } has exponential distribution of parameter + .]

2/II/19C

Statistics

mean and variance 1. It is desired to test the hypothesis H0 : 0 against the

alternative H1 : > 0. Show that there is a uniformly most powerful test of size = 1/20

and identify a critical region for such a test in the case n = 9. If you appeal to any

theoretical result from the course you should also prove it.

[The 95th percentile of the standard normal distribution is 1.65.]

Part IB

2006

30

3/I/8C

Statistics

One hundred children were asked whether they preferred crisps, fruit or chocolate.

Of the boys, 12 stated a preference for crisps, 11 for fruit, and 17 for chocolate. Of the

girls, 13 stated a preference for crisps, 14 for fruit, and 33 for chocolate. Answer each of

the following questions by carrying out an appropriate statistical test.

(a) Are the data consistent with the hypothesis that girls find all three types of

snack equally attractive?

(b) Are the data consistent with the hypothesis that boys and girls show the same

distribution of preferences?

4/II/19C

Statistics

X1 , . . . , Xm , the second resulting in observations Y1 , . . . , Yn . We assume that all observations are independent and normally distributed, with unknown means X in the first series

and Y in the second series. We assume further that the variances of the observations are

unknown but are all equal.

= m1 Pm Xi and sum of

Write down

the

distributions

of

the

sample

mean

X

i=1

Pm

2.

squares SXX = i=1 (Xi X)

Hence obtain a statistic T (X, Y ) to test the hypothesis H0 : X = Y against

H1 : X > Y and derive its distribution under H0 . Explain how you would carry out a

test of size = 1/100 .

Part IB

2006

32

1/I/7D

Statistics

The fast-food chain McGonagles have three sizes of their takeaway haggis, Large,

Jumbo and Soopersize. A survey of 300 randomly selected customers at one branch choose

92 Large, 89 Jumbo and 119 Soopersize haggises.

Is there sufficient evidence to reject the hypothesis that all three sizes are equally

popular? Explain your answer carefully.

Distribution

t1

t2

t3

21

22

23

F1,2 F2,3

95% percentile 631

1/II/18D

292

235

384

599

782

1851

955

Statistics

In the context of hypothesis testing define the following terms: (i) simple hypothesis; (ii) critical region; (iii) size; (iv) power; and (v) type II error probability.

State, without proof, the NeymanPearson lemma.

Let X be a single observation from a probability density function f . It is desired

to test the hypothesis

H0 : f = f0

against

H1 : f = f1 ,

with f0 (x) = 21 |x| ex /2 and f1 (x) = 0 (x), < x < , where (x) is the distribution

function of the standard normal, N (0, 1).

Determine the best test of size , where 0 < < 1, and express its power in terms

of and .

Find the size of the test that minimizes the sum of the error probabilities. Explain

your reasoning carefully.

2/II/19D

Statistics

where is an unknown real-valued parameter which is assumed to have a prior density

(). Determine the optimal Bayes point estimate a(X1 , . . . , Xn ) of , in terms of the

posterior distribution of given X1 , . . . , Xn , when the loss function is

L(, a) =

( a)

when > a,

(a )

when 6 a,

Calculate the estimate explicitly in the case when f (x | ) is the density of the

uniform distribution on (0, ) and () = e n /n!, > 0.

Part IB

2005

33

3/I/8D

Statistics

variance 2 , where and 2 are unknown. Derive the form of the size- generalized

likelihood-ratio test of the hypothesis H0 : = 0 against H1 : 6= 0 , and show that it

is equivalent to the standard t-test of size .

[You should state, but need not derive, the distribution of the test statistic.]

4/II/19D

Statistics

Yi = xi + i ,

1 6 i 6 n,

where 1 , . . . , n are independent random variables each with the N (0, 2 ) distribution.

Here x1 , . . . , xn are known but and 2 are unknown.

b

b2 ) of (, 2 ).

(i) Determine the maximum-likelihood estimates (,

b

(ii) Find the distribution of .

b i and b are independent, or otherwise, determine

(iii) By showing that Yi x

the joint distribution of b and

b2 .

(iv) Explain carefully how you would test the hypothesis H0 : = 0 against

H1 : 6= 0 .

Part IB

2005

23

1/I/10H

Statistics

Use the generalized likelihood-ratio test to derive Students t-test for the equality

of the means of two populations. You should explain carefully the assumptions underlying

the test.

1/II/21H

Statistics

Suppose that X1 , X2 , . . . , Xn are independent, identically-distributed random variables with distribution

P (X1 = r) = pr1 (1 p),

r = 1, 2, . . . ,

statistic, T , for p.

By first finding a simple unbiased estimate for p, or otherwise, determine an

unbiased estimate for p which is a function of T .

Part IB

2004

24

2/I/10H

Statistics

A study of 60 men and 90 women classified each individual according to eye colour

to produce the figures below.

Blue

Brown

Green

Men

20

20

20

Women

20

50

20

Explain how you would analyse these results. You should indicate carefully any underlying

assumptions that you are making.

A further study took 150 individuals and classified them both by eye colour and by

whether they were left or right handed to produce the following table.

Blue

Brown

Green

Left Handed

20

20

20

Right Handed

20

50

20

How would your analysis change? You should again set out your underlying assumptions

carefully.

[You may wish to note the following percentiles of the 2 distribution.

2/II/21H

21

95% percentile 3.84

22

5.99

23

7.81

24

9.49

25

26

11.07 12.59

Statistics

Defining carefully the terminology that you use, state and prove the Neyman

Pearson Lemma.

Let X be a single observation from the distribution with density function

f (x | ) = 12 e|x| ,

< x < ,

for an unknown real parameter . Find the best test of size , 0 < < 1, of the hypothesis

H0 : = 0 against H1 : = 1 , where 1 > 0 .

When = 0.05, for which values of 0 and 1 will the power of the best test be at

least 0.95?

Part IB

2004

25

4/I/9H

Statistics

normal distribution with mean xi and variance 2 ; here , 2 are unknown and x1 , . . . , xn

are known constants.

Derive the least-squares estimate of .

Explain carefully how to test the hypothesis H0 : = 0 against H1 : 6= 0.

4/II/19H

Statistics

random variable with probability density function f (x | ); the parameter has the prior

distribution with density () and the loss function is L(, a). Show that the optimal

Bayesian point estimate minimizes the posterior expected loss.

Suppose now that f (x | ) = ex , x > 0 and () = e , > 0, where > 0

is known. Determine the posterior distribution of given X.

Determine the optimal Bayesian point estimate of in the cases when

(i) L(, a) = ( a)2 , and

(ii) L(, a) = |( a) /|.

Part IB

2004

- D573-04 Standard Test Method for Rubber—Deterioration in an Air Oven.pdfTransféré parรอคนบนฟ้า ส่งใครมาให้ สักคน
- Profile likelihoodTransféré parAnonymous Y2ibaULes1
- A Primer on Learning in Bayesian Networks for Computational BiologyTransféré parashu_438
- AlgorithmsTransféré parJose Cisneros
- Retention ModelTransféré parRania Dirar
- Information Theory and Statistics- A TutorialTransféré parnmobin27
- self esteem, gender, academic achievementTransféré parsyosepha
- MIT18_443S15_LEC6Transféré parMithilesh Kumar
- PhD ThesisTransféré par1nf0rmat10n
- Shebalin Etal 2012Transféré parJavier Alvarez
- Cas3 NotesTransféré parCeline Yap
- Minka AspectTransféré parSuperman
- Chapter 4Transféré parVilya0829
- Website - Machine LearningTransféré parleninworld
- thrun.3D-EMTransféré parjimakosjp
- 93401-2013-2015-syllabusTransféré parJunweiLiew
- IASSC Black Belt Body of KnowledgeTransféré parAhmed Elswesy
- On the Simlation and Estimation of the MR OU Process With MATLABTransféré pardowatemi
- ARTM Equalize 2 LinearRzxcvgbhjklxTransféré parBach Trung
- Mba Assign Unit ITransféré parrexsiva_2k8019
- Inference for Two PopulationsTransféré parAsiiyah
- 21241Transféré parLion Khushi
- StatisticsTransféré parfarbods0
- Assert Hand02Transféré parMorxiah
- Ch7_NormalDistributionTransféré parmartynapet
- Normal dTransféré parFredy HC
- Bowerman3Ce SSM Chapter06Transféré paramul65
- Normal Distribution-week3 (2)(2)Transféré parSr.haree Kaarthik
- Comparative Examinations of Single Bored Piles Using International Codes.pdfTransféré partrannguyenviet
- IPPTCh005Transféré parJacquelynne Sjostrom

- Power of a TestTransféré parBenjamin Koh
- Lec 4 - Hypothesis TestingTransféré parBenjamin Koh
- Econometrics Assignment QnsTransféré parBenjamin Koh
- Micro Quiz 1Transféré parYuki Tan
- A Level Economics NotesTransféré partasleemfca
- Test for normality.pdfTransféré parShejj Peter
- Lec 2 - Descriptive StatisticsTransféré parBenjamin Koh
- shadow.pdfTransféré parBenjamin Koh
- Lec 7 - Mann-Whitney and Paired TestsTransféré parBenjamin Koh
- bubbleindex-2016.pdfTransféré parBenjamin Koh
- Lec 5 - Hypothesis TestTransféré parBenjamin Koh
- Burnside and DollarTransféré parBenjamin Koh
- Lecture 1 StatsTransféré parBenjamin Koh
- PROBLEM SET 6 SOLUTIONSTransféré parBenjamin Koh
- 1.1 Prime & Composite Numbers-ilovepdf-compressed.pdfTransféré parBenjamin Koh
- Econ7310 Assignment 3(1)Transféré parBenjamin Koh
- PcaTransféré parBenjamin Koh
- panel2up.pdfTransféré parBenjamin Koh
- The Art of Short Selling Kathryn f StaleyTransféré parmaserras
- 20180212 Logistic Assignment ExerciseTransféré parBenjamin Koh
- CS1010S Lecture 03 - Recursion, Iteration & Order of GrowthTransféré parBenjamin Koh
- Exploratory Factor Analysis (SPSS) - D. BoduszekTransféré parJangz Magister
- Singapore-Guide-Final.pdfTransféré parAdam Choy
- Global_Market_Forecast_2016-2035.pdfTransféré parBenjamin Koh
- 0471120626Transféré parMiguel Liszt Chopin
- Four Assumptions of Multiple Regression That Researchers Should Always TestTransféré parBenjamin Koh
- bubbleindex-2016.pdfTransféré parBenjamin Koh
- Alastair Day - Mastering Risk Modelling_ A Practical Guide to Modelling Uncertainty with Microsoft Excel (2nd Edition) (Financial Times Series) (2009, FT Press).pdfTransféré parchaturya makkena
- Expectations JAE2011Transféré parBenjamin Koh
- R Commands for Linear RegressionTransféré parBenjamin Koh

- Tutorial LAVAANTransféré parPetru Madalin Schönthaler
- chapter_111.docTransféré parJc Quismundo
- 00915656.pdfTransféré parAnonymous oUoJ4A8x
- Normal Distribution -- From Wolfram MathWorldTransféré parAdam Tomlinson
- Rumus FedererTransféré parReza Di Putra
- JR00Transféré parThien Trang Nguyen Cao
- Tese Final - The Impact of Ethical and Despotic Leadership on the Emotions and Team Work Engagement PerTransféré parZeeshan Ahmed
- Em ATransféré parSagar
- Review MidtermII Summer09Transféré parDeepak Rana
- Homework Chapter 7Transféré parToño González
- SPSSTransféré parMuwaga Musa Iganga Musa
- Analysis of the Data (1)Transféré parRamakanth Ravirala
- upfalTransféré parpreethi
- Bv Cvxbook Extra ExercisesTransféré parAnonymous WkbmWCa8M
- Discriminant Analysis and the Prediction of Corporate Bankruptcy in the Banking Sector of NigeriaTransféré parmukosino
- Statistical TheoryTransféré parilhanayd
- Lecture Overheads -- Overhead Variances, Chp. 8 Sv 505Transféré parAmby Roja
- Are Short Term Stock Asset Returns Predictable? An Extended Empirical AnalysisTransféré parmirando93
- B.TECH. AERONAUTICAL ENGINEERING SyllabusTransféré parJinu Madhavan
- Chapter 4: Estimating species richnessTransféré parGyoji Tormentum
- Std cstg.pptTransféré parsanam20191
- note-145_biostat_prof._abdullah_al-shiha.pdfTransféré parAdel Dib Al-jubeh
- Library ScienceTransféré parShaheryar Sadiq
- 5 MANOVA Presentation StatsTransféré parVaibhav Baluni
- Handout ThreeTransféré parCassie Desilva
- Elec PricingTransféré pardsowm
- IJCSIT 040615Transféré parAnonymous Gl4IRRjzN
- Open Field TestTransféré parmrkrlnd
- Ch9_06Transféré parAamir Mukhtar
- Css Course OutlineTransféré parUmar Israr