
CHAPTER 4: Bootstrap Methods


MAST90083 Computational Statistics & Data Mining
Guoqi Qian
School of Mathematics & Statistics
The University of Melbourne


Outline
4.1 The bootstrap principle
4.2 Basic methods
4.2.1 Nonparametric and parametric bootstrap
4.2.2 Bootstrapping samples in regression
4.2.3 Bootstrap estimation of bias($\hat\theta$) and se($\hat\theta$)


4.3 Bootstrap inference
4.3.1 Computing bootstrap confidence intervals
4.3.2 Percentile and pivoting methods
4.3.3 Bootstrap hypothesis testing
4.4 Reducing Monte Carlo error
4.4.1 Balanced bootstrap
4.4.2 Antithetic bootstrap

Distributions of correlation coefficient (1)


- As a motivating example, consider the correlation coefficient $\rho$ of a bivariate random vector $(X, Y)$, which is defined as
  $$\rho = \frac{E(XY) - E(X)E(Y)}{\sigma_X \sigma_Y}.$$
- The sample correlation coefficient is
  $$r = \frac{\sum_{i=1}^n X_i Y_i - n\bar X \bar Y}{\sqrt{\sum_{i=1}^n (X_i - \bar X)^2}\,\sqrt{\sum_{i=1}^n (Y_i - \bar Y)^2}},$$
  based on i.i.d. samples $(X_1, Y_1), \dots, (X_n, Y_n)$.
- One can use $r$ to estimate $\rho$. Evaluating the variability in this estimation would require the sampling distribution of $r$.
- When $(X, Y)$ is bivariate normal and $\rho = 0$, the distribution of $r$ is quite simple. It is extremely complicated otherwise.


Distributions of correlation coefficient (2)


Theorem (Distribution of r; Fisher, 1915)
Let $(X, Y)$ be bivariate normal with correlation coefficient $\rho$. Then the sample correlation coefficient $r$ has the pdf
$$f_R(r) = \frac{(n-2)\,\Gamma(n-1)\,(1-\rho^2)^{\frac{n-1}{2}}\,(1-r^2)^{\frac{n-4}{2}}\,(1-\rho r)^{\frac{3}{2}-n}}{\sqrt{2\pi}\,\Gamma(n-\tfrac{1}{2})}\; {}_2F_1\!\left(\tfrac{1}{2}, \tfrac{1}{2};\, n-\tfrac{1}{2};\, \tfrac{1+\rho r}{2}\right), \qquad (1)$$
where ${}_2F_1$ denotes the ordinary hypergeometric function.

- In particular, if $X$ and $Y$ are independent, then $\dfrac{\sqrt{n-2}\,r}{\sqrt{1-r^2}} \sim t_{n-2}$.
- Various methods, including asymptotic and Monte Carlo ones, have been developed to approximate the distribution of $r$ when it is intractable. Bootstrap is a Monte Carlo method using resampling.

Bootstrap principle (1)


- Let $\theta = T(F)$ be an unknown parameter of a distribution function $F$ with pdf $f(x) = F'(x)$. We want to estimate $\theta$ and investigate the sampling distribution of the estimator of $\theta$, from which we can make statistical inference about $\theta$.
- Note the parameter is expressed as a functional of $F$. E.g.
  - $T_1(F) = \int x\,dF(x) = E_F(X) = \mu$ is the mean of $X \sim F$.
  - $T_2(F) = \int x^2\,dF(x) - \left[\int x\,dF(x)\right]^2 = Var_F(X) = \sigma^2$ is the variance of $X \sim F$.
  - $T_3(F) = \iint x_1 x_2\,dF(x_1, x_2) - \iint x_1\,dF(x_1, x_2)\,\iint x_2\,dF(x_1, x_2) = Cov_F(X_1, X_2)$ is the covariance of $(X_1, X_2) \sim F$.
- Let $x_n = \{x_1, \dots, x_n\}$ be data observed as a realisation of the random variables $X_1, \dots, X_n \overset{iid}{\sim} F$. Let $\mathcal{X} = \{X_1, \dots, X_n\}$ denote the entire dataset.

Bootstrap principle (2)


- Let $\hat F$ be the empirical cdf of the data $x_1, \dots, x_n$, or of $\mathcal{X}$, i.e.
  $$\hat F(x) = \frac{\#\{x_i \le x\}}{n}, \quad \text{or} \quad \hat F(x) = \frac{\#\{X_i \le x\}}{n}.$$
- $\hat F(x)$ can be regarded as an estimator of $F$. Consequently, $\hat\theta = T(\hat F)$ can be regarded as an estimator of $\theta$, called the plug-in estimator of $\theta$.
- As an example, $\hat\mu = T_1(\hat F) = \int x\,d\hat F(x) = \frac{1}{n}\sum_{i=1}^n X_i$. So the sample mean is a plug-in estimator of the population mean $\mu$.
- As another example, $\hat\sigma^2 = T_2(\hat F) = \frac{\sum X_i^2}{n} - \left[\frac{\sum X_i}{n}\right]^2$ is a plug-in estimator of $\sigma^2$.
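For a concrete feel (an assumed mini-example, not from the slides), the two plug-in estimators can be computed directly in R:

x <- c(1, 2, 6)                       # observed data
mu.hat <- mean(x)                     # T1(F.hat): plug-in estimate of the mean
sigma2.hat <- mean(x^2) - mean(x)^2   # T2(F.hat): plug-in estimate of the variance
# note the divisor is n (plug-in), not n-1 (the unbiased sample variance)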

Bootstrap principle (3)


- The sampling distribution of the estimator $T(\hat F)$ is required for statistical inference on $\theta$.
- Sometimes, the distribution of a related random quantity $R(\mathcal{X}, F)$ is also required, which may provide better inference on $\theta$.
- For example, $R(\mathcal{X}, F) = \dfrac{T_1(\hat F) - T_1(F)}{\sqrt{T_2(\hat F)/(n-1)}}$ (i.e. $\sqrt{n}(\bar X - \mu)/S$) is a t-test statistic, and its distribution is required for the one-sample t-test.
- The distribution of $R(\mathcal{X}, F)$ often depends on the unknown $F$ and is mostly intractable.
- The motivation of bootstrap is to find an approximation to the distribution of $R(\mathcal{X}, F)$ or $T(\hat F)$ through a sophisticated use of the empirical cdf $\hat F$.


Bootstrap principle (4)


- A sample of size n randomly drawn from the empirical cdf $\hat F$ is called a bootstrap sample, denoted as $x_n^* = \{x_1^*, \dots, x_n^*\}$, or as $\mathcal{X}^* = \{X_1^*, \dots, X_n^*\}$ if regarded as yet to be drawn.
- By default, $x_1^*, \dots, x_n^*$ are n elements drawn with replacement from $x_1, \dots, x_n$; and $X_1^*, \dots, X_n^*$ are i.i.d. random variables with cdf $\hat F$.
- The bootstrap strategy is to examine the distribution of $R(\mathcal{X}^*, \hat F)$ (called the ideal bootstrap distribution).
- For example, if $R(\mathcal{X}, F) = \dfrac{T_1(\hat F) - T_1(F)}{\sqrt{T_2(\hat F)/(n-1)}}$, then
  $$R(\mathcal{X}^*, \hat F) = \frac{T_1(\hat F^*) - T_1(\hat F)}{\sqrt{T_2(\hat F^*)/(n-1)}},$$
  where $\hat F^*$ is the empirical cdf of the bootstrap sample $\mathcal{X}^*$.


Bootstrap principle (5)


- In some special cases it is possible to derive the ideal bootstrap distribution of $R(\mathcal{X}^*, \hat F)$ through analytical means. However, in most cases it can only be done by simulation.
- The bootstrap principle says $R(\mathcal{X}^*, \hat F) \approx R(\mathcal{X}, F)$ in distribution.
- This principle has been justified by many research results:
  - $P\left[\,\left|P_*[R(\mathcal{X}^*, \hat F) \le q] - P[R(\mathcal{X}, F) \le q]\right| > \epsilon\,\right] \to 0$ for any $\epsilon > 0$ and any $q$ as $n \to \infty$. Here $P_*(\cdot)$ is the probability measure determined by the empirical cdf $\hat F$.
  - If $R(\mathcal{X}, F)$ is asymptotically pivotal with standard normal distribution,
    $$P_*[R(\mathcal{X}^*, \hat F) \le q] - P[R(\mathcal{X}, F) \le q] = O_p(n^{-1}),$$
    better than the usual normal approximation rate $O_p(n^{-1/2})$. The rate can be improved to $O_p(n^{-2})$ when more advanced bootstrap is implemented.


Bootstrap principle (6)


- Theory underlying the above results is the Edgeworth expansion:
  $$P[R(\mathcal{X}, F) \le q] = \Phi(q) + n^{-\frac{1}{2}} p_1(q)\phi(q) + O(n^{-1})$$
  $$P_*[R(\mathcal{X}^*, \hat F) \le q] = P[R(\mathcal{X}^*, \hat F) \le q \mid \mathcal{X}] = \Phi(q) + n^{-\frac{1}{2}} \hat p_1(q)\phi(q) + O_p(n^{-1})$$
  where $p_1(q)$ is related to the Hermite polynomials and involves up to the 3rd moments of $F$; and $\hat p_1(q)$ is the plug-in estimator of $p_1(q)$ obtained by substituting the moments of $\hat F$. $\Phi(q)$ and $\phi(q)$ are the cdf and pdf of $N(0, 1)$.
- One can show that $\hat p_1(q) - p_1(q) = O_p(n^{-\frac{1}{2}})$. Thus
  $$P_*[R(\mathcal{X}^*, \hat F) \le q] - P[R(\mathcal{X}, F) \le q] = O_p(n^{-1}),$$
  in comparison with $\Phi(q) - P[R(\mathcal{X}, F) \le q] = O(n^{-\frac{1}{2}})$.



Bootstrap principle: Example 4.1 (1)


Example 4.1 Suppose the data $x_3 = \{x_1, x_2, x_3\} = \{1, 2, 6\}$ are $n = 3$ i.i.d. observations from a cdf $F$ that has mean $\theta$. Then the empirical cdf $\hat F$ is determined by the empirical pdf/pmf $\hat P(x) = \frac{1}{3}$ for $x = x_1, x_2, x_3$.
Suppose $\hat\theta = T(\hat F) = \frac{1}{3}(X_1 + X_2 + X_3)$ is used to estimate $\theta$. Our objective is to bootstrap the distribution of $R(\mathcal{X}, F) = \hat\theta - \theta$.
Note that $\mathcal{X} = \{X_1, X_2, X_3\}$; and $\mathcal{X}^* = \{X_1^*, X_2^*, X_3^*\}$ is a bootstrap sample consisting of elements drawn from $\hat F$. There are $n^n = 27$ possible outcomes for $\mathcal{X}^*$, but they comprise only $\binom{2n-1}{n} = 10$ distinct ones. Let $\hat F^*$ denote the empirical cdf of $\mathcal{X}^*$ and $P_*(\cdot)$ the corresponding empirical pdf/pmf. Then $\hat\theta^* = T(\hat F^*) = \frac{1}{3}(X_1^* + X_2^* + X_3^*)$ is a bootstrap replicate of $\hat\theta$ based on $\mathcal{X}^*$.

Bootstrap principle: Example 4.1 (2)


Possible outcomes of the bootstrap sample $\mathcal{X}^*$ from $\{1, 2, 6\}$ (ignoring order), the resultant values of $\hat\theta^* - \hat\theta$, the empirical pmf $P_*(\hat\theta^* - \hat\theta)$ and the observed relative frequency in 1000 bootstrap iterations:

  $\mathcal{X}^*$   $\hat\theta^*$   $\hat\theta^* - \hat\theta$   $P_*(\hat\theta^* - \hat\theta)$   obs. frequency
  1, 1, 1              3/3              3/3 - 3                      1/27 (0.037)                      38/1000
  1, 1, 2              4/3              4/3 - 3                      3/27 (0.111)                     100/1000
  1, 1, 6              8/3              8/3 - 3                      3/27 (0.111)                     116/1000
  1, 2, 2              5/3              5/3 - 3                      3/27 (0.111)                     112/1000
  1, 2, 6              9/3              9/3 - 3                      6/27 (0.222)                     245/1000
  1, 6, 6             13/3             13/3 - 3                      3/27 (0.111)                     105/1000
  2, 2, 2              6/3              6/3 - 3                      1/27 (0.037)                      38/1000
  2, 2, 6             10/3             10/3 - 3                      3/27 (0.111)                     104/1000
  2, 6, 6             14/3             14/3 - 3                      3/27 (0.111)                     108/1000
  6, 6, 6             18/3             18/3 - 3                      1/27 (0.037)                      34/1000

Bootstrap principle: Example 4.1 (3)


The R code for the above table:

x=c(1,2,6); X=matrix(0,3^3,3)
for(i in 1:3){for(j in 1:3){ #Find all possible bootstrap samples
for(k in 1:3)X[(i-1)*3^2+(j-1)*3+k,]=sort(x[c(i,j,k)])}}
X.unique=unique(X, MARGIN=1)  #Find all distinct bootstrap samples
#Find frequency of each distinct bootstrap sample
nd=nrow(X.unique)
freq=rep(0,nd); obs.freq=rep(0,nd)
for(j in 1:3^3){for(i in 1:nd)freq[i]=freq[i]+(sum((X[j,]-X.unique[i,])^2)==0)}
set.seed(123) #Find the observed frequency from 1000 bootstrap samples:
for(j in 1:1000){x.bs=sort(sample(x, size=3, rep=T))
for(i in 1:nd){obs.freq[i]=obs.freq[i]+(sum((x.bs-X.unique[i,])^2)==0)}}
cbind(X.unique,freq,obs.freq)
            freq obs.freq
 [1,] 1 1 1    1       38
 [2,] 1 1 2    3      100
 [3,] 1 1 6    3      116
 [4,] 1 2 2    3      112
 [5,] 1 2 6    6      245
 [6,] 1 6 6    3      105
 [7,] 2 2 2    1       38
 [8,] 2 2 6    3      104


Bootstrap principle: Example 4.1 (4)


- From the previous table we see the ideal bootstrap distribution of $R(\mathcal{X}^*, \hat F) = \hat\theta^* - \hat\theta$ is given by $P_*(\cdot)$, which provides an estimate for the distribution of $R(\mathcal{X}, F) = \hat\theta - \theta$. This estimate is further estimated by the bootstrap distribution of $R(\mathcal{X}^*, \hat F)$, which is given by the "obs. frequency" column.
- A 92.6% ideal bootstrap C.I. for $\theta$ can be found to be $(4/3, 14/3)$ using the quantiles of $\hat\theta^*$.
- The confidence level for this interval is 92.8% based on the "obs. frequency" column.
- By the bootstrap principle, this interval is an approximate 92.6% C.I. for $\theta$.
- The point estimate of $\theta$ is still calculated from the observed data, which is $\hat\theta = 9/3 = 3$.


4.2.1 Nonparametric and parametric bootstrap

Nonparametric bootstrap
- Finding the ideal bootstrap distribution of $R(\mathcal{X}^*, \hat F)$ requires complete enumeration of $\hat F^*$ or $P_*(\cdot)$, which is not practical even when the sample size n is moderate.
- Instead, B i.i.d. samples, each of size n, are drawn from $\hat F$, producing B nonparametric bootstrap samples. Denote them as $\mathcal{X}_i^* = \{X_{i1}^*, \dots, X_{in}^*\} \overset{iid}{\sim} \hat F$ for $i = 1, \dots, B$ (see the sketch below).
- The empirical cdf of $\{R(\mathcal{X}_i^*, \hat F),\ i = 1, \dots, B\}$ is used to approximate the ideal bootstrap cdf of $R(\mathcal{X}^*, \hat F)$, which further approximates the cdf of $R(\mathcal{X}, F)$, allowing inference.
- The simulation error in approximating the ideal bootstrap cdf of $R(\mathcal{X}^*, \hat F)$ can be made arbitrarily small by increasing B. C.f. the last 2 columns of the table in Example 4.1.
- A key requirement of bootstrapping is that the data to be resampled must be an i.i.d. sample.
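To make the procedure concrete, here is a minimal base-R sketch (an assumed illustration, not from the slides) of the nonparametric bootstrap for the sample correlation coefficient of the motivating example; the data xy are simulated purely for illustration:

set.seed(1)
n <- 30
xy <- cbind(rnorm(n), rnorm(n))           # assumed illustrative bivariate data
r.hat <- cor(xy[,1], xy[,2])              # sample correlation from the data
B <- 1999
r.star <- numeric(B)
for (b in 1:B) {
  idx <- sample(n, n, replace = TRUE)     # a bootstrap sample drawn from F.hat
  r.star[b] <- cor(xy[idx,1], xy[idx,2])  # bootstrap replicate of r
}
quantile(r.star, c(0.025, 0.975))         # the empirical cdf of the replicates
                                          # approximates the ideal bootstrap cdf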


Parametric bootstrap
- When a parametric model is assumed for the data, namely $X_1, \dots, X_n \overset{iid}{\sim} F(x|\theta)$, the cdf $F(x|\theta)$ can be parametrically estimated by $F(x|\hat\theta)$ instead of being estimated by the empirical cdf $\hat F$.
- To estimate the distribution of $R(\mathcal{X}, F(x|\theta))$, one can draw B i.i.d. samples, each of size n, from $F(x|\hat\theta)$, producing B parametric bootstrap samples. Denote them as $\mathcal{X}_i^* = \{X_{i1}^*, \dots, X_{in}^*\} \overset{iid}{\sim} F(x|\hat\theta)$ for $i = 1, \dots, B$ (see the sketch below).
- The empirical cdf of $\{R(\mathcal{X}_i^*, F(x|\hat\theta)),\ i = 1, \dots, B\}$ is then used to approximate the ideal bootstrap cdf of $R(\mathcal{X}^*, F(x|\hat\theta))$, and further the cdf of $R(\mathcal{X}, F(x|\theta))$.
- If the parametric model is not good, the parametric bootstrap can give misleading inference.
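A minimal sketch of the parametric analogue (an assumed example: a normal model, so that $N(\hat\mu, \hat\sigma^2)$ plays the role of $F(x|\hat\theta)$):

set.seed(1)
x <- rnorm(25, mean = 5, sd = 2)                 # assumed observed data
mu.hat <- mean(x); sigma.hat <- sd(x)            # fit the parametric model
B <- 1999
theta.star <- numeric(B)
for (b in 1:B) {
  x.star <- rnorm(length(x), mu.hat, sigma.hat)  # draw from F(x|theta.hat)
  theta.star[b] <- mean(x.star)                  # parametric bootstrap replicate
}
sd(theta.star)    # parametric bootstrap estimate of se(mu.hat)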


4.2.2 Bootstrapping samples in regression

Bootstrapping samples in regression (1)


- Consider a multiple regression model, $Y_i = x_i^T\beta + \epsilon_i$, for $i = 1, \dots, n$, where $\epsilon_1, \dots, \epsilon_n \overset{iid}{\sim} F$ with $E_F(\epsilon_i) = 0$ and $Var_F(\epsilon_i) = \sigma^2$.
- The observed data are $\{z_1 = (x_1, y_1), \dots, z_n = (x_n, y_n)\}$.
- It is wrong to generate bootstrap samples from $\{y_1, \dots, y_n\}$ and from $\{x_1, \dots, x_n\}$ independently, because $\{y_1, \dots, y_n\}$ are not i.i.d. samples.
- Two appropriate ways to construct bootstrap samples from the observed data are "bootstrap the residuals" and "bootstrap the cases".


Bootstrapping samples in regression (2)


Bootstrap the residuals
1. Fit the regression model to the observed data. Obtain the fitted responses $\hat y_i = x_i^T\hat\beta$ and residuals $\hat\epsilon_i = y_i - \hat y_i$.
2. Bootstrap residuals from $\{\hat\epsilon_1, \dots, \hat\epsilon_n\}$ to get $\{\hat\epsilon_1^*, \dots, \hat\epsilon_n^*\}$. Note $\{\hat\epsilon_1, \dots, \hat\epsilon_n\}$ are not i.i.d., but roughly so if the regression model is correct.
3. Create a bootstrap sample of responses: $Y_i^* = \hat y_i + \hat\epsilon_i^*$ for $i = 1, \dots, n$.
4. Fit the regression model to $\{(x_1, Y_1^*), \dots, (x_n, Y_n^*)\}$ to get a bootstrap estimate $(\hat\beta^*, \hat\sigma^*)$ of $(\beta, \sigma)$.
5. Repeat this process B times to obtain $\{(\hat\beta_1^*, \hat\sigma_1^*), \dots, (\hat\beta_B^*, \hat\sigma_B^*)\}$, from which an empirical cdf can be built for inference. (A base-R sketch of these steps follows.)
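A minimal base-R sketch of steps 1-5 (assumed illustrative data; Example 4.2 later carries this out via boot()):

set.seed(1)
n <- 20
x <- runif(n); y <- 1 + 2*x + rnorm(n, sd = 0.5)  # assumed data
fit <- lm(y ~ x)
y.hat <- fitted(fit); e.hat <- resid(fit)         # step 1
B <- 1999
beta.star <- matrix(NA, B, 2)
for (b in 1:B) {
  e.star <- sample(e.hat, n, replace = TRUE)      # step 2: resample residuals
  y.star <- y.hat + e.star                        # step 3: bootstrap responses
  beta.star[b, ] <- coef(lm(y.star ~ x))          # step 4: refit the model
}
apply(beta.star, 2, sd)                           # step 5: bootstrap se of beta.hat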

Bootstrapping samples in regression (3)


Bootstrap the cases (also called the paired bootstrap)
1. Treat the observed data $\{z_1 = (x_1, y_1), \dots, z_n = (x_n, y_n)\}$ as i.i.d. from a cdf $F(x, y)$.
2. Create a bootstrap sample $\{Z_1^*, \dots, Z_n^*\}$ by sampling with replacement from $\{z_1, \dots, z_n\}$.
3. Fit the regression model to $\{Z_1^*, \dots, Z_n^*\}$ to get a bootstrap estimate $(\hat\beta^*, \hat\sigma^*)$ of $(\beta, \sigma)$.
4. Repeat this process B times to obtain $\{(\hat\beta_1^*, \hat\sigma_1^*), \dots, (\hat\beta_B^*, \hat\sigma_B^*)\}$, from which an empirical cdf can be built for inference.
- Bootstrapping the cases is less sensitive to violations of the regression model assumptions (i.e. adequacy of the model and constancy of $\sigma^2$) than bootstrapping the residuals.

4.2.3 Bootstrap estimation of bias($\hat\theta$) and se($\hat\theta$)

Bootstrap estimation of bias($\hat\theta$) and se($\hat\theta$)


- bias($\hat\theta$) $= E_F(\hat\theta) - \theta$ and se($\hat\theta$) $= \sqrt{Var_F(\hat\theta)}$ are the two basic attributes of the estimator $\hat\theta$ that we can use bootstrap analysis to estimate.
- Suppose $\theta = T(F)$ and $\hat\theta = T(\hat F)$ or $\hat\theta = T(F(\cdot|\hat\theta))$ for some functional $T$.
- Let $R(\mathcal{X}, F) = T(\hat F) - T(F) = \hat\theta - \theta$, or $T(F(\cdot|\hat\theta)) - \theta$.
- Then bias($\hat\theta$) $= E_F[R(\mathcal{X}, F)]$ and Var($\hat\theta$) $= Var_F[R(\mathcal{X}, F)]$ are population moments of $R(\mathcal{X}, F)$, which can be estimated by the population moments of the ideal bootstrap distribution of $R(\mathcal{X}^*, \hat F)$ or $R(\mathcal{X}^*, F(\cdot|\hat\theta))$, per the bootstrap principle.
- They can be further estimated by the sample moments of $R(\mathcal{X}^*, \hat F)$ or $R(\mathcal{X}^*, F(\cdot|\hat\theta))$, calculated from the bootstrap samples.


Nonparametric bootstrap estimation of bias($\hat\theta$) and se($\hat\theta$)


Computing steps for obtaining nonparametric bootstrap estimates of bias($\hat\theta$) and se($\hat\theta$) are as follows:
1. Compute $\hat\theta$ from the observed sample $x_n = (x_1, \dots, x_n)$.
2. Generate B (typically B $\ge$ 999) nonparametric bootstrap samples of size n from the observed sample.
3. For each bootstrap sample, compute an estimate of $\theta$ in the same way as estimating $\theta$ by $\hat\theta$. The new estimates of $\theta$ are called the bootstrap replicates of $\hat\theta$ and are denoted as $\hat\theta_1^*, \dots, \hat\theta_B^*$.
4. Compute $\bar\theta^* = B^{-1}\sum_{r=1}^B \hat\theta_r^*$ and estimate bias($\hat\theta$) by $b_B(\hat\theta) = \bar\theta^* - \hat\theta$; compute $se_B(\hat\theta) = \sqrt{\frac{1}{B-1}\sum_{r=1}^B (\hat\theta_r^* - \bar\theta^*)^2}$ and estimate se($\hat\theta$) by $se_B(\hat\theta)$. (A minimal R sketch follows.)
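The four steps amount to only a few lines of R (a minimal sketch, assuming $\theta$ is the population mean and using the data of Example 4.1):

set.seed(1)
x <- c(1, 2, 6)                            # observed sample
theta.hat <- mean(x)                       # step 1
B <- 1999
theta.star <- replicate(B, mean(sample(x, replace = TRUE)))  # steps 2-3
b.B  <- mean(theta.star) - theta.hat       # step 4: bootstrap estimate of bias
se.B <- sd(theta.star)                     # step 4: bootstrap estimate of se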



Parametric bootstrap estimation of bias($\hat\theta$) and se($\hat\theta$)

- Parametric bootstrap estimation proceeds in the same way as nonparametric bootstrap estimation, except in step 2, where the bootstrap samples of size n are generated from $F(x|\hat\theta)$.

Remark:
- A bootstrap estimate of MSE($\hat\theta$) $= E_F[(\hat\theta - \theta)^2]$ may be obtained as MSE$_B(\hat\theta) = \frac{1}{B}\sum_{r=1}^B (\hat\theta_r^* - \hat\theta)^2$.


R package boot
The package boot in R contains many functions for implementing
bootstrap methods. The function boot() is the one for generating
bootstrap samples and various bootstrap estimates about .
library(boot)
boot(data, statistic, R, sim="ordinary", stype="i",
strata=rep(1,n), L=NULL, m=0, weights=NULL,
ran.gen=function(d, p) d, mle=NULL, ...)
The argument statistic is a function that computes $\hat\theta$; see the examples later for illustration.
The functions boot.array() and freq.array() are useful for finding
which original observations and how many times they are included
in each bootstrap sample.

Example 4.2 Bootstrapping on copper-nickel alloy data (1)


Example 4.2 The table below gives 13 measurements of corrosion loss ($y_i$) in copper-nickel alloys, each with a specific iron content ($x_i$) (Draper & Smith 1966).

  x_i:   0.01  0.48  0.71  0.95  1.19  0.01  0.48  1.44  0.71  1.96  0.01  1.44  1.96
  y_i:  127.6 124.0 110.8 103.9 101.5 130.1 122.0  92.3 113.1  83.7 128.0  91.4  86.2

Of interest is the change in corrosion in the alloys as the iron content increases, relative to the corrosion loss when there is no iron. Thus $\theta = \beta_1/\beta_0$ is the quantity we want to estimate in a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.

Example 4.2 Bootstrapping on copper-nickel alloy data (2)


The LS estimate or MLE of $\theta$ is $\hat\theta = \hat\beta_1/\hat\beta_0 = -0.185$.

> z=matrix(0,13,2)
> z[,1]=c(0.01,0.48,0.71,0.95,1.19,0.01,0.48,1.44,0.71,1.96,0.01,1.44,1.96)
> z[,2]=c(127.6,124.0,110.8,103.9,101.5,130.1,122.0,92.3,113.1,83.7,128.0,91.4,86.2)
> temp=lm(z[,2]~z[,1])
> temp$coef[2]/temp$coef[1]
-0.1850722

1. Use "bootstrap the cases" to estimate bias($\hat\theta$) and sd($\hat\theta$).
2. Use "bootstrap the residuals" to estimate bias($\hat\theta$) and sd($\hat\theta$).
3. Assuming the normal linear regression model, use a parametric bootstrap approach to estimate bias($\hat\theta$) and sd($\hat\theta$).
4. Let $\rho = corr(X, Y)$. Perform a nonparametric bootstrap analysis for bias($\hat\rho$) and sd($\hat\rho$).

Example 4.2 Bootstrapping on copper-nickel alloy data (3)


1. Use "bootstrap the cases" to estimate bias($\hat\theta$) and sd($\hat\theta$).
First run a pilot study involving only 5 bootstrap samples.

> library(boot)
# Step 1. Write a function to specify the statistic/estimator for which
# we want to find bootstrap replicates.
# lm1.bt() uses the "bootstrap the cases" approach.
lm1.bt=function(x,i){temp=lm(x[i,2]~x[i,1])$coef
ratio=temp[2]/temp[1]
return(ratio)}
# Step 2. Use boot() to perform the bootstrap. Need to specify the data, the
# statistic and the # of bootstrap samples (R) to be generated.
set.seed(1234); boot1=boot(data=z, statistic=lm1.bt, R=5)


Example 4.2 Bootstrapping on copper-nickel alloy data (4)


> boot1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = z, statistic = lm1.bt, R = 5)
Bootstrap Statistics :
      original         bias    std. error
t1* -0.1850722 -0.001099678  0.01011771
> attributes(boot1)
$names
 [1] "t0"        "t"         "R"         "data"      "seed"      "statistic"
 [7] "sim"       "call"      "stype"     "strata"    "weights"
> boot1$t0       #gives \hat{\theta} estimate
-0.1850722
> t(boot1$t)     #the 5 bootstrap replicates of \hat{\theta}
[1,] -0.1871027 -0.1963776 -0.1835215 -0.1704960 -0.1933616
> mean(boot1$t)-boot1$t0    #gives the bias estimate
[1] -0.001099678
> sd(boot1$t)               #gives the sd. estimate
[1] 0.01011771

Example 4.2 Bootstrapping on copper-nickel alloy data (5)


> boot.array(boot1, indices=T)   #indices of obs used in bootstrap samples.
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    2    9   10   11    5   11    6   10    8     7     1     7    12
[2,]    9    1    8    4    4    7    4    3    9     9     5     7     1
[3,]    8    4    4    4    3   12    4    4    5     7    10    10     5
[4,]    9    9   13    3    1   11    7   13    9     4     7     3     1
[5,]   12    7    4    4    3    1    3   11    5    10     2    12     4

For example, the 1st bootstrap sample is $\{(x_2, y_2), (x_9, y_9), \dots, (x_7, y_7), (x_{12}, y_{12})\}$.

> boot.array(boot1, indices=F)   #frequencies of obs used in boot. samples.
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]    1    1    0    0    1    1    2    1    1     2     2     1     0
[2,]    2    0    1    3    1    0    2    1    3     0     0     0     0
[3,]    0    0    1    5    2    0    1    1    0     2     0     1     0
[4,]    2    0    2    1    0    0    2    0    3     0     1     0     2
[5,]    1    1    2    3    1    0    1    0    0     1     1     2     0

E.g. in bootstrap sample 1, $(x_1, y_1)$ appeared once, $(x_7, y_7)$ appeared twice, etc.
> freq.array(boot.array(boot1,indices=T)) #same as boot.array(boot1, indices=F)

Example 4.2 Bootstrapping on copper-nickel alloy data (6)


B = R = 5 is too small. Now run with B = R = 1999. It follows that $b_B(\hat\theta) = -0.00148788$ and $sd_B(\hat\theta) = 0.008483298$.

> set.seed(1234); boot1=boot(data=z, statistic=lm1.bt, R=1999); boot1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call: boot(data = z, statistic = lm1.bt, R = 1999)
Bootstrap Statistics :
      original         bias    std. error
t1* -0.1850722 -0.001487880 0.008483298
> plot(density(boot1$t),lwd=2); hist(boot1$t, breaks=50, freq=F, add=T)

[Figure: density and histogram of the 1999 bootstrap replicates of $\hat\theta$; density.default(x = boot1$t), N = 1999, Bandwidth = 0.001472.]


Example 4.2 Bootstrapping on copper-nickel alloy data (7)


2. Use "bootstrap the residuals" to estimate bias($\hat\theta$) and sd($\hat\theta$).

> library(boot)
# Requires lm2.bt() for the "bootstrap the residuals" approach.
lm2.bt=function(x,i){temp=lm(x[,2]~x[,1])
y.star=temp$fitted + temp$residual[i]
temp1=lm(y.star~x[,1])$coef
ratio=temp1[2]/temp1[1]
return(ratio)}
> set.seed(1234); boot2=boot(data=z, statistic=lm2.bt, R=1999); boot2
ORDINARY NONPARAMETRIC BOOTSTRAP
Call: boot(data = z, statistic = lm2.bt, R = 1999)
Bootstrap Statistics :
      original         bias    std. error
t1* -0.1850722 0.0002736776 0.007615583

It follows that $b_B(\hat\theta) = 0.0002737$ and $sd_B(\hat\theta) = 0.007616$.

Example 4.2 Bootstrapping on copper-nickel alloy data (8)


The "bootstrap the residuals" method gives smaller estimates of bias($\hat\theta$) and sd($\hat\theta$) than the "bootstrap the cases" method. The bootstrap pdf of $\hat\theta^*$ is more symmetric.

> plot(density(boot2$t), ylim=c(0,60), lwd=2)
> hist(boot2$t, breaks=50, freq=F, add=T)
> par(mfrow=c(2,2)); plot(lm(z[,2]~z[,1])) #regression diagnosis plots.

[Figure: density and histogram of the boot2 replicates (density.default(x = boot2$t), N = 1999, Bandwidth = 0.001499), together with the four regression diagnostic plots: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours).]


Example 4.2 Bootstrapping on copper-nickel alloy data (9)


3. Assuming the normal linear regression model, use a parametric bootstrap approach to estimate bias($\hat\theta$) and sd($\hat\theta$).

> library(boot)
# Parametric bootstrap analysis.
# lm3.bt() uses parametric bootstrap samples generated from lm.gen().
lm3.bt=function(x){temp=lm(x[,2]~x[,1])$coef; ratio=temp[2]/temp[1]; ratio}
# lm.gen() generates a parametric bootstrap sample
lm.gen=function(x, mle){n=nrow(x); err=rnorm(n, mean=0, sd=mle$sigma)
y=mle$beta0+x[,1]*mle$beta1+err; return(cbind(x[,1],y))}
temp=lm(z[,2]~z[,1])
mle.list=list(beta0=temp$coef[1], beta1=temp$coef[2])
mle.list$sigma=sqrt(sum(temp$resid^2)/temp$df.resid)
> set.seed(1234)
> boot3=boot(data=z, stat=lm3.bt, R=1999, sim="parametric",
             ran.gen=lm.gen, mle=mle.list)

Example 4.2 Bootstrapping on copper-nickel alloy data (10)


> boot3
PARAMETRIC BOOTSTRAP
boot(data = z, statistic = lm3.bt, R = 1999, sim = "parametric",
     ran.gen = lm.gen, mle = mle.list)
Bootstrap Statistics :
      original         bias    std. error
t1* -0.1850722 -7.34372e-05 0.008344886
> plot(density(boot3$t), ylim=c(0,60), lwd=2)
> hist(boot3$t, breaks=50, freq=F, add=T)

It follows that $b_{par}(\hat\theta) = -7.34 \times 10^{-5}$ and $sd_{par}(\hat\theta) = 0.00834$.

[Figure: density and histogram of the parametric bootstrap replicates (density.default(x = boot3$t), N = 1999, Bandwidth = 0.001642), shown alongside the boot4 replicates of part 4 (density.default(x = boot4$t), N = 1999, Bandwidth = 0.001271).]


Example 4.2 Bootstrapping on copper-nickel alloy data (11)


4. Let $\rho = corr(X, Y)$. Perform a nonparametric bootstrap analysis for bias($\hat\rho$) and sd($\hat\rho$).
It follows that $\hat\rho = -0.9847435$, $b_B(\hat\rho) = -0.0001254529$ and $sd_B(\hat\rho) = 0.007203108$.

# A bootstrap correlation function:
cor.bt=function(x,i){cor(x[i,1],x[i,2])}
> set.seed(1234); boot4=boot(data=z, stat=cor.bt, R=1999); boot4
ORDINARY NONPARAMETRIC BOOTSTRAP
Call: boot(data = z, statistic = cor.bt, R = 1999)
Bootstrap Statistics :
      original          bias    std. error
t1* -0.9847435 -0.0001254529 0.007203108
> plot(density(boot4$t), ylim=c(0,65), lwd=2)
> hist(boot4$t, breaks=50, freq=F, add=T)

Bootstrap inference contents

This section will include:


1. how to use boot.ci() in package boot in R to compute
bootstrap confidence intervals;
2. percentile and pivoting methods for deriving various bootstrap
confidence intervals;
3. use of bootstrap and permutation in hypothesis testing.


4.3.1 Computing bootstrap confidence intervals

Computing bootstrap confidence intervals


Bootstrap replicates of $\hat\theta$, generated by the boot() function in the boot package, can be used to construct CIs for $\theta$.
The boot package computes 6 types of bootstrap CIs for $\theta$:
1. Percentile (or basic percentile)
2. Normal approximation
3. Basic (or residual)
4. Studentized
5. BCa (bias corrected and accelerated)
6. ABC (approximate bias corrected)
All of them except ABC are computed using the boot.ci() function. The ABC CIs are computed using the abc.ci() function. We will not discuss the ABC CIs here.

boot.ci() function for computing bootstrap CIs

boot.ci(boot.out, conf = 0.95, type = "all",


index = 1:min(2,length(boot.out$t0)), var.t0 = NULL,
var.t = NULL, t0 = NULL, t = NULL, L = NULL, h = function(t) t,
hdot = function(t) rep(1,length(t)), hinv = function(t) t, ...)

The argument boot.out is the result returned from executing


boot().
Type help(boot.ci) for details on other arguments.


Example 4.3 Bootstrap CIs on copper-nickel alloy data (1)


Example 4.3 (Example 4.2 continued)
1. Use the bootstrap replicates of $\hat\theta$ saved in object boot1 to find 95% bootstrap CIs for $\theta$.
2. Use the bootstrap replicates of $\hat\theta$ saved in object boot2 to find 95% bootstrap CIs for $\theta$.
3. Use the parametric bootstrap replicates of $\hat\theta$ saved in object boot3 to find 95% bootstrap CIs for $\theta$.
4. Use the bootstrap replicates of $\hat\rho$ saved in object boot4 to find 95% bootstrap CIs for $\rho$.


Example 4.3 Bootstrap CIs on copper-nickel alloy data (2)


1. Use the bootstrap replicates of $\hat\theta$ saved in object boot1 to find 95% bootstrap CIs for $\theta$.

> library(boot); boot.ci(boot1)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot1)
Intervals :
Level      Normal              Basic
95%   (-0.2002, -0.1670 )  (-0.1963, -0.1626 )
Level     Percentile            BCa
95%   (-0.2076, -0.1738 )  (-0.2047, -0.1731 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(boot1) : bootstrap variances needed for studentized intervals

Example 4.3 Bootstrap CIs on copper-nickel alloy data (3)


2. Use the bootstrap replicates of $\hat\theta$ saved in object boot2 to find 95% bootstrap CIs for $\theta$.

> boot.ci(boot2)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot2)
Intervals :
Level      Normal              Basic
95%   (-0.2003, -0.1704 )  (-0.2006, -0.1710 )
Level     Percentile            BCa
95%   (-0.1992, -0.1695 )  (-0.1978, -0.1677 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(boot2) : bootstrap variances needed for studentized intervals

Example 4.3 Bootstrap CIs on copper-nickel alloy data (4)


3. Use the parametric bootstrap replicates of $\hat\theta$ saved in object boot3 to find 95% bootstrap CIs for $\theta$.

> boot.ci(boot3)
Error in empinf(boot.out, index = index, t = t.o, ...) :
  influence values cannot be found from a parametric bootstrap
In addition: Warning message:
In boot.ci(boot3) : bootstrap variances needed for studentized intervals
> boot.ci(boot3, type=c("norm","basic", "perc"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot3, type = c("norm", "basic", "perc"))
Intervals :
Level      Normal              Basic               Percentile
95%   (-0.2014, -0.1686 )  (-0.2021, -0.1692 )  (-0.2010, -0.1681 )
Calculations and Intervals on Original Scale


Example 4.3 Bootstrap CIs on copper-nickel alloy data (5)


4. Use the bootstrap replicates of $\hat\rho$ saved in object boot4 to find 95% bootstrap CIs for $\rho$.

> boot.ci(boot4)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot4)
Intervals :
Level      Normal              Basic
95%   (-0.9987, -0.9705 )  (-1.0010, -0.9740 )
Level     Percentile            BCa
95%   (-0.9954, -0.9685 )  (-0.9935, -0.9539 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(boot4) : bootstrap variances needed for studentized intervals

Example 4.3 Bootstrap CIs on copper-nickel alloy data (6)


Remarks:
1. Across parts 1 to 4, studentized CIs cannot be computed because boot.ci() requires bootstrap replicates of the estimated variance of $\hat\theta^*$, which are not available. See Section 4.3.2 for solutions.
2. Part 3 reveals that empirical influence function (EIF) values are required for computing the BCa CIs, but parametric bootstrap does not provide the EIF values.
3. The EIF of an estimator $\hat\theta$ based on a sample $\{Z_1, \dots, Z_n\}$ is defined as a sequence $\{EIF_1, \dots, EIF_n\}$, where $EIF_i = (n-1)[\hat\theta - \hat\theta_{(i)}]$ for $i = 1, \dots, n$, with $\hat\theta_{(i)}$ being the estimate of $\theta$ from $\{Z_1, \dots, Z_{i-1}, Z_{i+1}, \dots, Z_n\}$. (A small computational sketch follows.)
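The jackknife-based definition in remark 3 is easy to compute directly; here is a small sketch (an assumed helper, not part of the boot package) for a generic estimator applied to the rows of a data matrix:

eif.jack <- function(data, theta.fn) {
  n <- nrow(data)
  theta.hat <- theta.fn(data)   # estimate from the full sample
  # (n-1)*(theta.hat - leave-one-out estimate), for each i
  sapply(1:n, function(i) (n - 1) * (theta.hat - theta.fn(data[-i, , drop = FALSE])))
}
# e.g. with the copper-nickel data z and the ratio estimator:
# eif.jack(z, function(d){b <- coef(lm(d[,2] ~ d[,1])); b[2]/b[1]})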

Example 4.3 Bootstrap CIs on copper-nickel alloy data (7)


Remarks (continued):
4. The empinf() function computes the EIF. The BCa CI for $\theta$ based on parametric bootstrap replicates can now be calculated:

> boot3$L=empinf(data=z, stat=lm1.bt, stype="i")
> boot.ci(boot3)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot3)
Intervals :
Level      Normal              Basic
95%   (-0.2014, -0.1686 )  (-0.2021, -0.1692 )
Level     Percentile            BCa
95%   (-0.2010, -0.1681 )  (-0.1987, -0.1653 )
Calculations and Intervals on Original Scale
Warning message:
In boot.ci(boot3) : bootstrap variances needed for studentized intervals

Normal approximation based bootstrap confidence intervals


- In many cases, $\frac{\hat\theta - \theta}{se(\hat\theta)} \xrightarrow{d} N(0, 1)$, e.g., when $\hat\theta$ is the MLE. Then an approximate $100(1-\alpha)\%$ CI for $\theta$ would be $\hat\theta \pm z_{1-\frac{\alpha}{2}}\,se(\hat\theta)$, where $z_{1-\frac{\alpha}{2}} = \Phi^{-1}(1-\frac{\alpha}{2})$.
- If bootstrap replicates are available, we use $se_B(\hat\theta)$ to estimate $se(\hat\theta)$ (if it is otherwise difficult to estimate); and estimate $\theta$ by $\hat\theta - b_B(\hat\theta) = 2\hat\theta - \bar\theta^*$ (note it is an unbiased estimator of $\theta$). This suggests the following $100(1-\alpha)\%$ normal approximation based bootstrap CI for $\theta$:
  $$\left[(2\hat\theta - \bar\theta^*) - z_{1-\frac{\alpha}{2}}\,se_B(\hat\theta),\;\; (2\hat\theta - \bar\theta^*) + z_{1-\frac{\alpha}{2}}\,se_B(\hat\theta)\right].$$
- This formula is used in boot.ci() to compute the Normal CI.
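As a quick hand check (continuing Example 4.3's boot1 object), the Normal CI can be reproduced from the bootstrap replicates; this should closely match the "Normal" interval reported by boot.ci(boot1):

theta.hat <- boot1$t0
bias.B <- mean(boot1$t) - theta.hat          # b_B(theta.hat)
se.B   <- sd(boot1$t)                        # se_B(theta.hat)
c((theta.hat - bias.B) - qnorm(0.975)*se.B,  # lower limit
  (theta.hat - bias.B) + qnorm(0.975)*se.B)  # upper limit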


4.3.2 Percentile and pivoting methods

Percentile bootstrap confidence intervals (1)


- The 2.5th and 97.5th percentiles, say, of the bootstrap replicates of $\hat\theta$ provide a 95% prediction interval for $\hat\theta^*$, and accordingly a 95% PI for $\hat\theta$ by the bootstrap principle.
- The above $100(1-\alpha)\%$ PI is used as the $100(1-\alpha)\%$ Percentile CI for $\theta$ in boot.ci(). Recall Example 4.3 (1):

> boot.ci(boot1, type="perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Intervals based on 1999 bootstrap replicates:
Level     Percentile
95%   (-0.2076, -0.1738 )
> quantile(boot1$t, prob=c(0.025,0.975), type=6)
      2.5%      97.5%
-0.2075644 -0.1738358

- The percentile method bootstrap CI is prone to bias and inaccurate coverage probabilities. It works better when $\hat\theta$ is essentially a location parameter.


Percentile bootstrap confidence intervals (2)


A justification of the percentile method bootstrap CI for $\theta$:
- Assume the existence of a continuous and strictly increasing transformation $\phi$, and a continuous cdf $H$ with symmetric pdf (implying $H(z) = 1 - H(-z)$) such that $\phi(\hat\theta) - \phi(\theta) \sim H$.
- This assumption is likely to be reasonable, although it may be difficult to find such $\phi$ and $H$. However, it turns out that we do not need an explicit specification of $\phi$ and $H$. On the other hand, when such $\phi$ and $H$ exist, we can even assume $H$ to be $N(0, 1)$ (why?).
- Now we know
  $$P\left[h_{\alpha/2} \le \phi(\hat\theta) - \phi(\theta) \le h_{1-\alpha/2}\right] = 1 - \alpha \qquad (2)$$
  where $h_\gamma$ is the $\gamma$ quantile of $H$.

Percentile bootstrap confidence intervals (3)


A justification of the percentile method CI for $\theta$ (continued):
- Applying the bootstrap principle to (2), we have
  $$1 - \alpha \approx P_*\left[h_{\alpha/2} \le \phi(\hat\theta^*) - \phi(\hat\theta) \le h_{1-\alpha/2}\right]$$
  $$= P_*\left[h_{\alpha/2} + \phi(\hat\theta) \le \phi(\hat\theta^*) \le h_{1-\alpha/2} + \phi(\hat\theta)\right]$$
  $$= P_*\left[\phi^{-1}\!\big(h_{\alpha/2} + \phi(\hat\theta)\big) \le \hat\theta^* \le \phi^{-1}\!\big(h_{1-\alpha/2} + \phi(\hat\theta)\big)\right]. \qquad (3)$$
- Hence $\phi^{-1}\!\big(h_{\alpha/2} + \phi(\hat\theta)\big) \approx \hat\theta^*_{\alpha/2}$ and $\phi^{-1}\!\big(h_{1-\alpha/2} + \phi(\hat\theta)\big) \approx \hat\theta^*_{1-\alpha/2}$, with $\hat\theta^*_\gamma$ being the $\gamma$ quantile of the ideal bootstrap distribution $P_*(\cdot)$ of $\hat\theta^*$, which can be estimated by $\hat\theta^*_{([B+1]\gamma)}$, the sample quantile (order statistic) from the B bootstrap replicates of $\hat\theta$.


Percentile bootstrap confidence intervals (4)


A justification of the percentile method CI for $\theta$ (continued):
- On the other hand, (2) can be rewritten as
  $$P\left[\phi^{-1}\!\big(h_{\alpha/2} + \phi(\hat\theta)\big) \le \theta \le \phi^{-1}\!\big(h_{1-\alpha/2} + \phi(\hat\theta)\big)\right] = 1 - \alpha \qquad (4)$$
  noting that $H$ has a symmetric pdf so that $h_{\alpha/2} = -h_{1-\alpha/2}$.
- Therefore, by comparing (3) and (4) we know
  $$\left[\hat\theta^*_{\alpha/2},\; \hat\theta^*_{1-\alpha/2}\right] \approx \left[\hat\theta^*_{([B+1]\alpha/2)},\; \hat\theta^*_{([B+1](1-\alpha/2))}\right]$$
  can serve as an approximate $100(1-\alpha)\%$ C.I. for $\theta$, which is called the (basic) percentile bootstrap CI.

Basic (or residual) bootstrap confidence intervals


- Taking $\phi$ to be the identity transformation, eq. (2) becomes
  $$P\left[h_{\alpha/2} \le \hat\theta - \theta \le h_{1-\alpha/2}\right] = 1 - \alpha. \qquad (5)$$
  We call $\hat\theta - \theta$ the residual of the estimator $\hat\theta$.
- By the bootstrap principle, $h_\gamma \approx (\hat\theta^* - \hat\theta)_\gamma = \hat\theta^*_\gamma - \hat\theta$, where $(\hat\theta^* - \hat\theta)_\gamma$ is the $\gamma$ sample quantile of $\hat\theta^* - \hat\theta$. Using this approximation, (5) becomes $P\left[\hat\theta^*_{\alpha/2} - \hat\theta \le \hat\theta - \theta \le \hat\theta^*_{1-\alpha/2} - \hat\theta\right] \approx 1 - \alpha$, which is
  $$P\left[2\hat\theta - \hat\theta^*_{1-\alpha/2} \le \theta \le 2\hat\theta - \hat\theta^*_{\alpha/2}\right] \approx 1 - \alpha. \qquad (6)$$
- This suggests the following approximate $100(1-\alpha)\%$ basic (or residual) bootstrap CI for $\theta$:
  $$\left[2\hat\theta - \hat\theta^*_{1-\alpha/2},\; 2\hat\theta - \hat\theta^*_{\alpha/2}\right] \approx \left[2\hat\theta - \hat\theta^*_{([B+1](1-\alpha/2))},\; 2\hat\theta - \hat\theta^*_{([B+1]\alpha/2)}\right].$$
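Again using Example 4.3's boot1 object, the basic interval can be reproduced by hand:

q <- quantile(boot1$t, prob = c(0.025, 0.975), type = 6)
c(2*boot1$t0 - q[2], 2*boot1$t0 - q[1])   # compare with boot.ci's Basic interval
                                          # (-0.1963, -0.1626)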


BCa bootstrap confidence intervals (1)


- The basic (residual) bootstrap CI tends to suffer from the same defects as the basic percentile bootstrap CI does. Namely, it is prone to bias and inaccurate coverage probabilities.
- For these two CIs to work well, the cdf $H$ there is required to be free of $\theta$. This implies that a stronger transformation is needed to get a pivotal quantity for $\hat\theta$, and to find the CI based on the pivot.
- The bias corrected and accelerated percentile method, or BCa, is motivated by this finding, and has been used to derive CIs for $\theta$ with substantial improvement over the previous two percentile methods.


BCa bootstrap confidence intervals (2)

- The idea behind the BCa method is to assume the existence of a transformation of $\hat\theta$ whose distribution is (asymptotically) normal and whose mean and standard deviation depend in a particular way on $\theta$, so that an (asymptotic) pivot can be easily constructed.
- A CI is made for the transformed parameter and then the interval is inverted to obtain an interval for $\theta$. By using the bootstrap method, the inversion can be done without knowledge of the explicit form of the transformation.


BCa bootstrap confidence intervals (3)


- Suppose there is a strictly increasing transformation $\phi$ such that $\phi(\hat\theta)$ has a normal distribution with
  $$E[\phi(\hat\theta)] = \phi(\theta) - c_0[1 + a\phi(\theta)] \quad\text{and}\quad Var[\phi(\hat\theta)] = [1 + a\phi(\theta)]^2.$$
  Namely,
  $$\frac{\phi(\hat\theta) - \phi(\theta)}{1 + a\phi(\theta)} + c_0 \;\sim\; N(0, 1). \qquad (7)$$
- If $z_p$ is the 100p-th percentile of $N(0, 1)$ with $p = 1 - \frac{\alpha}{2}$, then
  $$P\left(-z_p \le \frac{\phi(\hat\theta) - \phi(\theta)}{1 + a\phi(\theta)} + c_0 \le z_p\right) = p - (1 - p) = 1 - \alpha$$
  $$\Longleftrightarrow\quad P\left(\frac{\phi(\hat\theta) + c_0 - z_p}{1 - a(c_0 - z_p)} \le \phi(\theta) \le \frac{\phi(\hat\theta) + c_0 + z_p}{1 - a(c_0 + z_p)}\right) = 1 - \alpha,$$


BCa bootstrap confidence intervals (4)


which suggests a $100(1-\alpha)\%$ CI for $\phi(\theta)$ as
$$[L,\; U] = \left[\frac{\phi(\hat\theta) + c_0 - z_p}{1 - a(c_0 - z_p)},\;\; \frac{\phi(\hat\theta) + c_0 + z_p}{1 - a(c_0 + z_p)}\right],$$
not computable since $\phi$ is unknown. And a $100(1-\alpha)\%$ CI for $\theta$ would be $[\phi^{-1}(L), \phi^{-1}(U)]$.

- By the bootstrap principle and (7), $\dfrac{\phi(\hat\theta^*) - \phi(\hat\theta)}{1 + a\phi(\hat\theta)} + c_0$ is approximately $N(0, 1)$.
- Thus it can be verified that (note $p = 1 - \frac{\alpha}{2}$)
  $$P_*\left[\phi(\hat\theta^*) \le U\right] = P_*\left(\frac{\phi(\hat\theta^*) - \phi(\hat\theta)}{1 + a\phi(\hat\theta)} + c_0 \le \frac{c_0 + z_p}{1 - a(c_0 + z_p)} + c_0\right) \approx \Phi\left(\frac{c_0 + z_p}{1 - a(c_0 + z_p)} + c_0\right) \overset{\text{denoted}}{=} p_U.$$


BCa bootstrap confidence intervals (5)


- Hence U is approximately the $p_U$ quantile of the cdf of $\phi(\hat\theta^*)$.
- $\phi^{-1}(U)$ is the UCL of the $100(1-\alpha)\%$ CI for $\theta$ since $\phi$ is strictly increasing. It is also approximately the $p_U$ quantile of the cdf of $\phi^{-1}(\phi(\hat\theta^*)) = \hat\theta^*$, which can be estimated by $\hat\theta^*_{p_U}$, the $p_U$ sample quantile of the bootstrap replicates of $\hat\theta$.
- Similarly (note $p = 1 - \frac{\alpha}{2}$),
  $$P_*\left[\phi(\hat\theta^*) \le L\right] = P_*\left(\frac{\phi(\hat\theta^*) - \phi(\hat\theta)}{1 + a\phi(\hat\theta)} + c_0 \le \frac{c_0 - z_p}{1 - a(c_0 - z_p)} + c_0\right) \approx \Phi\left(\frac{c_0 - z_p}{1 - a(c_0 - z_p)} + c_0\right) \overset{\text{denoted}}{=} p_L.$$
- So the LCL of the $100(1-\alpha)\%$ CI for $\theta$ can be estimated by $\hat\theta^*_{p_L}$, the $p_L$ sample quantile of the bootstrap replicates of $\hat\theta$.


BCa bootstrap confidence intervals (6)


- Given values of $c_0$ and $a$, we can compute $p_U$ and $p_L$, and accordingly a $100(1-\alpha)\%$ BCa bootstrap CI of $\theta$ as
  $$\left[\hat\theta^*_{p_L},\; \hat\theta^*_{p_U}\right] \approx \left[\hat\theta^*_{([B+1]p_L)},\; \hat\theta^*_{([B+1]p_U)}\right].$$
- E.g., given $c_0 = 0.20$, $a = 0.01$ and $\alpha = 0.05$ (so $p = 0.975$), a 95% BCa CI of $\theta$ is
  $$\left[\hat\theta^*_{0.063},\; \hat\theta^*_{0.992}\right] = \left[\hat\theta^*_{(0.063[B+1])},\; \hat\theta^*_{(0.992[B+1])}\right],$$
  which can be read off from the B bootstrap replicates of $\hat\theta$.
- The value of $c_0$ is determined by the relative position of $\hat\theta$ among the bootstrap replicates of $\hat\theta$, while the value of $a$ is determined by the skewness of the bootstrap replicates of $\hat\theta$.


BCa bootstrap confidence intervals (7)



- Specifically, let $p_0 = \frac{\#\{\hat\theta^*_r < \hat\theta\}}{B} = \hat F^*(\hat\theta)$, the proportion of the replicates that are $< \hat\theta$; and let $\hat\theta_{(i)}$ be the estimate of $\theta$ based on the data $x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n$. We call the $\hat\theta_{(i)}$'s the jackknife replicates of $\hat\theta$. Then
  - $c_0 = \Phi^{-1}(p_0) = \Phi^{-1}(\hat F^*(\hat\theta))$, the $p_0$ quantile of $N(0, 1)$;
  - $a = \dfrac{\sum_{i=1}^n (\bar\theta_J - \hat\theta_{(i)})^3}{6\left[\sum_{i=1}^n (\bar\theta_J - \hat\theta_{(i)})^2\right]^{3/2}}$, where $\bar\theta_J = \frac{1}{n}\sum_{i=1}^n \hat\theta_{(i)}$.
- Note the $(n-1)(\bar\theta_J - \hat\theta_{(i)})$ are related to the EIF (empirical influence function). In boot.ci(), the $EIF_i$ are used in place of the $(\bar\theta_J - \hat\theta_{(i)})$.
- Sometimes $a = 0$ is set. Then the resultant interval is called the BC (bias corrected) CI of $\theta$.

Computing a $100(1-\alpha)\%$ BCa bootstrap CI of $\theta$


In summary, compute a BCa CI for $\theta$ as follows:
1. Generate B replicates $\hat\theta_1^*, \dots, \hat\theta_B^*$ by a bootstrap method.
2. Compute $c_0 = \Phi^{-1}(p_0)$, where $p_0 = \frac{1}{B}\sum_{r=1}^B I(\hat\theta_r^* < \hat\theta)$. Also compute $a = \dfrac{\sum_{i=1}^n (\bar\theta_J - \hat\theta_{(i)})^3}{6\left[\sum_{i=1}^n (\bar\theta_J - \hat\theta_{(i)})^2\right]^{3/2}}$, where $\bar\theta_J = \frac{1}{n}\sum_{i=1}^n \hat\theta_{(i)}$.
3. Compute $p_U = \Phi\left(\dfrac{c_0 + z_{1-\frac{\alpha}{2}}}{1 - a(c_0 + z_{1-\frac{\alpha}{2}})} + c_0\right)$ and $p_L = \Phi\left(\dfrac{c_0 - z_{1-\frac{\alpha}{2}}}{1 - a(c_0 - z_{1-\frac{\alpha}{2}})} + c_0\right)$.
4. Find the order statistics $\hat\theta^*_{([B+1]p_L)}$ and $\hat\theta^*_{([B+1]p_U)}$ from the B bootstrap replicates. Then a $100(1-\alpha)\%$ BCa bootstrap CI of $\theta$ is $\left[\hat\theta^*_{([B+1]p_L)},\; \hat\theta^*_{([B+1]p_U)}\right]$ or $\left[\hat\theta^*_{p_L},\; \hat\theta^*_{p_U}\right]$.

Example 4.4 BCa CI involving copper-nickel alloy data


Example 4.4 (Example 4.3 continued) In Example 4.3 part 1 we found a 95% BCa CI as $(-0.2047, -0.1731)$. This can be verified following the procedure on the previous page.

> boot.ci(boot1)$bca
     conf
[1,] 0.95 72.56 1968.28 -0.2047184 -0.1730628
> p0=sum(boot1$t < boot1$t0)/boot1$R; c0=qnorm(p0)
> eif=empinf(boot1); a=sum(eif^3)/(6*(sum(eif^2))^(3/2))
> pu=pnorm((c0+qnorm(0.975))/(1-a*(c0+qnorm(0.975)))+c0)
> pl=pnorm((c0-qnorm(0.975))/(1-a*(c0-qnorm(0.975)))+c0)
> quantile(boot1$t, prob=c(pl,pu), type=6)
 3.627772%  98.41409%
-0.2047184 -0.1730626
> p0; c0; a; pl; pu; pl*2000; pu*2000
[1] 0.5087544; [1] 0.02194573; [1] 0.03419632;
[1] 0.03627772; [1] 0.9841409; [1] 72.55545; [1] 1968.282

Studentized bootstrap confidence intervals (1)


- A more intuitive way to construct an appropriate pivot for bootstrap is the studentized bootstrap, or bootstrap t, method.
- Suppose $\theta = T(F)$ is to be estimated using $\hat\theta = T(\hat F)$, with $V(\hat F)$ estimating the variance of $\hat\theta$.
- Then it is reasonable to expect that $R(\mathcal{X}, F) = \dfrac{T(\hat F) - T(F)}{\sqrt{V(\hat F)}}$ will be roughly pivotal. Bootstrapping $R(\mathcal{X}, F)$ yields a collection of $R(\mathcal{X}^*, \hat F)$.
- Denote by $G$ and $G^*$ the distributions of $R(\mathcal{X}, F)$ and $R(\mathcal{X}^*, \hat F)$ respectively.


Studentized bootstrap confidence intervals (2)


- Theoretically a $100(1-\alpha)\%$ CI for $\theta$ can be obtained using
  $$P\left[\xi_{\alpha/2}(G) \le R(\mathcal{X}, F) \le \xi_{1-\alpha/2}(G)\right]
    = P\left[\hat\theta - \xi_{1-\alpha/2}(G)\sqrt{V(\hat F)} \le \theta \le \hat\theta - \xi_{\alpha/2}(G)\sqrt{V(\hat F)}\right] = 1 - \alpha,$$
  where $\xi_\gamma(G)$ denotes the $\gamma$ quantile of $G$. These quantiles are unknown but can be estimated under the bootstrap principle, so $\xi_\gamma(G) \approx \xi_\gamma(G^*)$.
- This gives the $100(1-\alpha)\%$ studentized bootstrap CI of $\theta$:
  $$\left[T(\hat F) - \xi_{1-\alpha/2}(G^*)\sqrt{V(\hat F)},\;\; T(\hat F) - \xi_{\alpha/2}(G^*)\sqrt{V(\hat F)}\right]
    = \left[\hat\theta - \xi_{1-\alpha/2}(G^*)\sqrt{\widehat{Var}(\hat\theta)},\;\; \hat\theta - \xi_{\alpha/2}(G^*)\sqrt{\widehat{Var}(\hat\theta)}\right],$$
  where $\xi_\gamma(G^*)$ is the $\gamma$ quantile of $G^*$.

Studentized bootstrap confidence intervals (3)


- To calculate the studentized bootstrap CI for $\theta$, we need the estimated variance $V(\hat F)$, which can be approximated by the bootstrap estimate $sd_B^2(\hat\theta)$ or by using a delta method.
- A more difficult problem in calculating the studentized bootstrap CI for $\theta$ is finding the $\xi_\gamma(G^*)$ values. Note $\xi_\gamma(G^*)$ is the $\gamma$ quantile of $R(\mathcal{X}^*, \hat F) = \dfrac{T(\hat F^*) - T(\hat F)}{\sqrt{V(\hat F^*)}}$ w.r.t. the cdf $G^*$.
- The bootstrap replicates of $R(\mathcal{X}^*, \hat F)$ for B given bootstrap samples are $\dfrac{\hat\theta_1^* - \hat\theta}{\sqrt{\widehat{Var}(\hat\theta_1^*)}}, \dots, \dfrac{\hat\theta_B^* - \hat\theta}{\sqrt{\widehat{Var}(\hat\theta_B^*)}}$. Using $sd_B^2(\hat\theta)$ to replace all the $\widehat{Var}(\hat\theta_j^*)$'s ignores their variation, which reduces to the basic (or residual) bootstrap CI method.
- An approximation method is to calculate $V(\hat F^*) = \widehat{Var}(\hat\theta^*)$ by a delta method for each bootstrap sample.
- One can use a double bootstrap, but it is computationally intensive.


Studentized bootstrap confidence intervals (4)

- The coverage probability of the studentized bootstrap CI closely approximates the nominal confidence level in general.
- The approximation is most reliable when $T(\hat F)$ is a location statistic, in the sense that a constant shift in all the data values will induce the same shift in $T(\hat F)$.
- It is also reliable for variance-stabilized estimators.
- It is however sensitive to the presence of outliers in the dataset, so use the studentized bootstrap CI with caution in such cases.
- Unlike the percentile-based methods, the studentized bootstrap CI is not transformation-respecting.


Example 4.5 Bootstrap t CI for copper-nickel alloy data (1)


Example 4.5 In Example 4.3 part 1 we could not get a 95% studentized bootstrap CI. We can get it now if we incorporate an estimate of Var($\hat\theta$) into the bootstrap procedure.
- Using the delta method, an estimated variance is
  $$\widehat{Var}(\hat\theta) = \left(\frac{\hat\beta_1}{\hat\beta_0}\right)^2 \left[\frac{\widehat{Var}(\hat\beta_1)}{\hat\beta_1^2} + \frac{\widehat{Var}(\hat\beta_0)}{\hat\beta_0^2} - \frac{2\widehat{Cov}(\hat\beta_0, \hat\beta_1)}{\hat\beta_0\hat\beta_1}\right].$$
- We need a statistic function for boot() to generate bootstrap replicates for both $\hat\theta$ and $\widehat{Var}(\hat\theta)$:

lm5.bt=function(x,i){tem=summary(lm(x[i,2]~x[i,1]))
beta=tem$coef[,1]
ratio=beta[2]/beta[1]
v.ratio=ratio^2*(tem$cov[1,1]/beta[1]^2+tem$cov[2,2]/beta[2]^2
                 -2*tem$cov[1,2]/(beta[1]*beta[2]))
return(c(ratio, v.ratio))}


Example 4.5 Bootstrap t CI for copper-nickel alloy data (2)


Using the lm5.bt() function, run boot(), boot.ci() and other functions:

library(boot); set.seed(1234)
boot5=boot(data=z, statistic=lm5.bt, R=1999); boot5
boot.ci(boot5); boot.ci(boot5)$stu
Q=(boot5$t[,1]-boot5$t0[1])/sqrt(boot5$t[,2])
sort(Q)[(1999+1)*0.975]; sort(Q)[(1999+1)*0.025]
quant=quantile(Q,prob=c(0.025,0.975),type=6)
c(boot5$t0[1]-sqrt(boot5$t0[2])*quant[2],boot5$t0[1]-sqrt(boot5$t0[2])*quant[1])
c(boot5$t0[1]-sqrt(boot5$t0[2])*1.96, boot5$t0[1]+sqrt(boot5$t0[2])*1.96)
par(mfrow=c(1,3))
plot(density(boot5$t[,1]), ylim=c(0,60),lwd=2)
hist(boot5$t[,1], breaks=50, freq=F, add=T)
plot(density(boot5$t[,2]^0.5), ylim=c(0,1000),lwd=2)
hist(boot5$t[,2]^0.5, breaks=50, freq=F, add=T)
plot(density(Q),ylim=c(0,0.18),lwd=2)
hist(Q, breaks=50, freq=F, add=T)

Example 4.5 Bootstrap t CI for copper-nickel alloy data (3)


From the following R outputs we see:
- $\hat\theta = -0.185$; and $V(\hat F) = \widehat{Var}(\hat\theta) = 7.4663 \times 10^{-6}$ by the delta method;
- the 2.5 and 97.5 percentiles of $G^*$ are $-6.129$ and $4.360$ respectively;
- so the 95% studentized bootstrap CI for $\theta$ is
  $$\left[-0.185 - 4.360\sqrt{7.4663 \times 10^{-6}},\;\; -0.185 - (-6.129)\sqrt{7.4663 \times 10^{-6}}\right] = [-0.1970,\; -0.1683];$$
- the empirical pdfs of the bootstrap replicates of $\hat\theta$, $\sqrt{V(\hat F^*)}$ and $R(\mathcal{X}^*, \hat F)$ are non-symmetric.


Example 4.5 Bootstrap t CI for copper-nickel alloy data (4)


> boot5
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = z, statistic = lm5.bt, R = 1999)
Bootstrap Statistics :
         original          bias    std. error
t1* -0.1850722153 -1.487880e-03 8.483298e-03
t2*  0.0000074663  1.494497e-06 3.879474e-06
> boot.ci(boot5)
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1999 bootstrap replicates
CALL : boot.ci(boot.out = boot5)
Intervals :
Level      Normal              Basic              Studentized
95%   (-0.2002, -0.1670 )  (-0.1963, -0.1626 )  (-0.1970, -0.1683 )
Level     Percentile            BCa
95%   (-0.2076, -0.1738 )  (-0.2047, -0.1731 )
Calculations and Intervals on Original Scale

Example 4.5 Bootstrap t CI for copper-nickel alloy data (5)


> boot.ci(boot5)$stu
     conf
[1,] 0.95 1950 50 -0.1969873 -0.1683238
> Q=(boot5$t[,1]-boot5$t0[1])/sqrt(boot5$t[,2])
> sort(Q)[(1999+1)*0.025]; sort(Q)[(1999+1)*0.975]
[1] -6.129442
[1] 4.360584
> quant=quantile(Q,prob=c(0.025,0.975),type=6); quant
     2.5%     97.5%
-6.129442  4.360584
> c(boot5$t0[1]-sqrt(boot5$t0[2])*quant[2],
    boot5$t0[1]-sqrt(boot5$t0[2])*quant[1])
-0.1969873 -0.1683238
#verifying the results given by boot.ci(boot5)$studen
> c(boot5$t0[1]-sqrt(boot5$t0[2])*1.96, boot5$t0[1]+sqrt(boot5$t0[2])*1.96)
-0.1904278 -0.1797166

Example 4.5 Bootstrap t CI for copper-nickel alloy data (6)


> par(mfrow=c(1,3))
> plot(density(boot5$t[,1]), ylim=c(0,60),lwd=2)
> hist(boot5$t[,1], breaks=50, freq=F, add=T)
> plot(density(boot5$t[,2]^0.5), ylim=c(0,1000),lwd=2)
> hist(boot5$t[,2]^0.5, breaks=50, freq=F, add=T)
> plot(density(Q),ylim=c(0,0.18),lwd=2)
> hist(Q, breaks=50, freq=F, add=T)

[Figure: three density/histogram panels of the 1999 bootstrap replicates: density.default(x = boot5$t[, 1]) with Bandwidth 0.001472; density.default(x = boot5$t[, 2]^0.5) with Bandwidth 8.967e-05; density.default(x = Q) with Bandwidth 0.5123.]


Empirical variance stabilization (1)


- A variance-stabilization transformation of an estimator is one for which the sampling variance of the transformed estimator does not depend on $\theta$. It is often the basis for a good pivot.
- The mostly unknown variance-stabilization transformation can be estimated using the (double) bootstrap.
- Let $Z$ be a r.v. with mean $\theta$ and standard deviation $s(\theta)$. By the delta method, $Var[g(Z)] \approx g'(\theta)^2 s^2(\theta)$.
- For $Var[g(Z)]$ to be constant, we require $g(z) = \int_a^z s^{-1}(u)\,du$, where $a$ is any value such that $s^{-1}(u)$ is continuous on $[a, z]$.
- Given a sequence of $(u, s(u))$ values, $g(z)$ or $g(\hat\theta)$ can be estimated using numerical integration.

Empirical variance stabilization (2)


- The sequence of $(u, s(u))$ values is generated using the bootstrap:
  1. Draw $B_1$ bootstrap samples $\mathcal{X}_j^*$ for $j = 1, \dots, B_1$ from the original data $\mathcal{X}$. Calculate the bootstrap replicates $\hat\theta_j^*$, $j = 1, \dots, B_1$.
  2. From each $\mathcal{X}_j^*$, draw $B_2$ bootstrap samples $\mathcal{X}_{j1}^{**}, \dots, \mathcal{X}_{jB_2}^{**}$, and calculate $\hat\theta_{j1}^{**}, \dots, \hat\theta_{jB_2}^{**}$.
  3. For each $j = 1, \dots, B_1$, calculate $s^{*2}(\hat\theta_j^*) = \frac{1}{B_2-1}\sum_{k=1}^{B_2} (\hat\theta_{jk}^{**} - \bar\theta_j^{**})^2$ with $\bar\theta_j^{**} = \frac{1}{B_2}\sum_{k=1}^{B_2} \hat\theta_{jk}^{**}$.
  4. Return the sequence $(\hat\theta_1^*, s^*(\hat\theta_1^*)), \dots, (\hat\theta_{B_1}^*, s^*(\hat\theta_{B_1}^*))$.
- Once the variance-stabilization transformation $g(z)$ is estimated (as $\tilde g(z)$), we can apply a further bootstrap procedure to find a (either percentile-based or studentized) CI for $\tilde g(\theta)$, and then invert the CI back to one for $\theta$.
- The procedure is computing-intensive. Details are skipped here, but a sketch of steps 1-4 follows.
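A minimal sketch of steps 1-4 (an assumed example with $\theta$ the population mean and simulated data; $B_1$, $B_2$ kept small for illustration):

set.seed(1)
x <- rnorm(30); n <- length(x)             # assumed data
B1 <- 100; B2 <- 50
theta.star <- s.star <- numeric(B1)
for (j in 1:B1) {
  xj <- sample(x, n, replace = TRUE)                        # step 1
  theta.star[j] <- mean(xj)
  tt <- replicate(B2, mean(sample(xj, n, replace = TRUE)))  # step 2
  s.star[j] <- sd(tt)                                       # step 3
}
# step 4: the (theta.star, s.star) pairs; one would then smooth s(u) and
# numerically integrate 1/s(u) to estimate the variance-stabilizing g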


Nested bootstrap and prepivoting (1)


- The strategy of drawing further bootstrap samples from each bootstrap sample of the original data can be used to provide another approach to pivoting in finding bootstrap CIs. The approach is called the nested bootstrap, iterated bootstrap or double bootstrap.
- Suppose a statistic $R_0(\mathcal{X}, F)$, involving the parameter $\theta = T(F)$, can be used to construct a CI for $\theta$ if the distribution of $R_0(\mathcal{X}, F)$ is known. Suppose the data $\{x_1, \dots, x_n\}$ are observed from the model $X_1, \dots, X_n \overset{iid}{\sim} F$.
- Let $F_0(q, F) = P[R_0(\mathcal{X}, F) \le q]$ be the cdf of $R_0(\mathcal{X}, F)$, where we make explicit its dependence on $F$. Now a CI for $\theta$ could be derived based on the statement
  $$P[F_0^{-1}(\alpha/2, F) \le R_0(\mathcal{X}, F) \le F_0^{-1}(1 - \alpha/2, F)] = 1 - \alpha.$$


Nested bootstrap and prepivoting (2)


- Of course $F_0(q, F)$ is unknown, so what we have been doing is to use the bootstrap to approximate $F_0(q, F)$ and its quantiles. As approximation is involved, the CI constructed will not have coverage probability exactly equal to $1 - \alpha$. The error in the approximation can be quite bad if $R_0(\mathcal{X}, F)$ is not a pivot.
- However, the random variable $R_1(\mathcal{X}, F) = F_0(R_0(\mathcal{X}, F), F) \sim U(0, 1)$ is a pivot. This means, for a bootstrap estimate $F_0(q, \hat F)$ of $F_0(q, F)$, the difference between $U(0, 1)$ and the distribution of $\hat R_1(\mathcal{X}, F) = F_0(R_0(\mathcal{X}, F), \hat F)$ should be smaller than that between $F_0(q, \hat F)$ and $F_0(q, F)$.
- This suggests we can use the bootstrap distribution of $\hat R_1(\mathcal{X}, F)$ to construct a CI for $\theta$, instead of using the bootstrap distribution of $R_0(\mathcal{X}, F)$.


Nested bootstrap and prepivoting (3)


I Let F1(q, F) = P[R̂1(X, F) ≤ q] be the cdf of R̂1(X, F). The
  100(1 - α)% CI for θ based on the bootstrap distribution of
  R̂1(X, F) is fashioned after the statement
  P[F1^{-1}(α/2, F) ≤ R̂1(X, F) ≤ F1^{-1}(1 - α/2, F)] = 1 - α.

I Note the randomness in R̂1(X, F) comes from two sources:
  1. the random observations {x1, ..., xn} from F, which determine F̂;
  2. R̂1(X, F) = F̂0(R0(X, F), F̂) is calculated from random sampling
     from F̂.

I These two sources of randomness are captured in the following
  nested/iterated/double bootstrap algorithm, which gives a
  double bootstrap CI for θ.


Nested bootstrap and prepivoting (4)


Nested/iterated/double bootstrap algorithm:
1 Generate B0 bootstrap samples X1*, ..., XB0* from {x1, ..., xn}.
2 Compute R0(Xj*, F̂) for j = 1, ..., B0.
3 For j = 1, ..., B0:
  (a) Let F̂j* be the empirical cdf of Xj*. Draw B1 bootstrap samples
      Xj1**, ..., XjB1** from F̂j*.
  (b) Compute R0(Xjk**, F̂j*) for k = 1, ..., B1.
  (c) Compute
      R̂1(Xj*, F̂) = F̂0(R0(Xj*, F̂), F̂)
                  = (1/B1) Σ_{k=1}^{B1} I[R0(Xjk**, F̂j*) ≤ R0(Xj*, F̂)].
4 Denote by F̂1 the empirical cdf of R̂1(X1*, F̂), ..., R̂1(XB0*, F̂).
5 Use R̂1({x1, ..., xn}, F̂) = F̂0(R0({x1, ..., xn}, F̂), F̂) and the
  quantiles of F̂1 to construct the CI for θ, following the statement
  P[F̂1^{-1}(α/2) ≤ R̂1(X, F) ≤ F̂1^{-1}(1 - α/2)] ≈ 1 - α.
1

Nested bootstrap and prepivoting (5)


Remarks:
1. Steps 1 and 2 of the algorithm aim to capture the first source
   of randomness, by applying the bootstrap principle to approximate
   R0(X, F) by R0(X*, F̂).
2. Step 3 aims to capture the second source of randomness,
   introduced in R̂1 when R0 is bootstrapped conditional on F̂.
3. The double bootstrap is much more computing-intensive than the
   usual bootstrap, because B0 × B1 bootstrap samples need to be
   generated.
4. As it needs to capture two sources of randomness, the double
   bootstrap may not be as good as the two pivoting methods, BCa and
   studentized t, when the assumptions involved in the latter are
   satisfied. But the former can be applied in situations where the
   assumptions for the latter are not satisfied.

Example 4.6 Double bootstrap CI for copper-nickel alloy data (1)


Example 4.6 Continuing the analysis of the copper-nickel alloy data,
we want to find a 95% CI for θ = β1/β0 by double bootstrap. We will
use the bootstrap cases approach to generate each bootstrap sample.
(The bootstrap residuals approach is left as an exercise.)

I First define R0({x1, ..., xn}, F) = θ̂ - θ = β̂1/β̂0 - β1/β0.
  R̂1(X, F̂) and F̂1 are determined accordingly.

I The boot package does not have a function implementing the double
  bootstrap, so we write our own lm6.dbt() in R.

I Then set B0 = B1 = 1000 and execute
  lm6.dbt(z, B0=1000, B1=1000, conf.lev=0.95).


Example 4.6 Double bootstrap CI for copper-nickel alloy data (2)


I The histogram of R̂1 shows that F̂1 differs noticeably from the
  uniform. The double bootstrap gives the 2.5 and 97.5 percentiles
  of R̂1 as 0.01925 and 0.997, respectively. The 0.01925 and 0.997
  quantiles (i.e. the 1.925 and 99.7 percentiles) of
  R0(X, F̂) = θ̂* - θ̂ are then found to be -0.02195345 and
  0.02156509, respectively. Hence a 95% double bootstrap CI for θ is
  [θ̂ - 0.02156509, θ̂ - (-0.02195345)] = [-0.2066373, -0.1631188],
  knowing θ̂ = -0.1850722.


Example 4.6 Double bootstrap CI for copper-nickel alloy data (3)


lm6.dbt = function(x, B0, B1, conf.lev=0.95){
  # Double bootstrap CI for theta = beta1/beta0 in simple linear
  # regression, using the bootstrap cases approach.
  n = nrow(x); R0.star = rep(0,B0); R0.2star = rep(0,B1); R1.hat = rep(0,B0)
  tem0 = lm(x[,2] ~ x[,1])
  ratio0 = tem0$coef[2]/tem0$coef[1]            # theta.hat from original data
  for(j in 1:B0){
    i1 = sample.int(n, size=n, replace=TRUE)    # outer bootstrap sample
    x1 = x[i1,]
    tem1 = lm(x1[,2] ~ x1[,1])
    ratio1 = tem1$coef[2]/tem1$coef[1]
    R0.star[j] = ratio1 - ratio0                # R0(Xj*, F.hat)
    for(k in 1:B1){
      i2 = sample.int(n, size=n, replace=TRUE)  # inner bootstrap sample
      tem2 = lm(x1[i2,2] ~ x1[i2,1])
      ratio2 = tem2$coef[2]/tem2$coef[1]
      R0.2star[k] = ratio2 - ratio1             # R0(Xjk**, F.hat_j*)
    } # end loop k
    R1.hat[j] = mean(R0.2star <= R0.star[j])    # R1.hat(Xj*, F.hat)
  } # end loop j
  qL = quantile(R1.hat, prob=(1-conf.lev)/2, type=6, na.rm=TRUE)
  qU = quantile(R1.hat, prob=1-(1-conf.lev)/2, type=6, na.rm=TRUE)
  # qL and qU are the alpha/2 and (1-alpha/2) quantiles of R1.hat.
  # The qL and qU quantiles of R0.star are used to find the CI of ratio.
  L = ratio0 - quantile(R0.star, prob=qU, type=6, na.rm=TRUE)
  U = ratio0 - quantile(R0.star, prob=qL, type=6, na.rm=TRUE)
  list(theta=ratio0, qL=qL, qU=qU, L=L, U=U, R1.hat=R1.hat, R0.star=R0.star)
}

Example 4.6 Double bootstrap CI for copper-nickel alloy data (4)


> ptm=proc.time(); set.seed(1234); res6=lm6.dbt(x=z,B0=1000,B1=1000,conf=0.95)
> proc.time()-ptm
   user  system elapsed
2264.53    0.12 2273.27
> res6
$theta
-0.1850722
$qL
   2.5%
0.01925
$qU
  97.5%
  0.997
$L
-0.2066373
$U
-0.1631188
$R1.hat
[1] 0.174 0.864 0.851 0.278 NA 0.509 0.358 0.836 ......
$R0.star
[1] -1.376886e-02 4.653109e-03 6.934917e-03 ......

Example 4.6 Double bootstrap CI for copper-nickel alloy data (5)


> quantile(res6$R0.star, prob=c(0.01925, 0.997), type=6)
     1.925%       99.7%
-0.02195345  0.02156509
> c(res6$theta-0.02156509, res6$theta-(-0.02195345))
-0.2066373 -0.1631188
> par(mfrow=c(1,2))
> hist(res6$R1.hat, breaks=30, freq=F)
> plot(density(res6$R0.star, na.rm=T),ylim=c(0,60), lwd=2)
> hist(res6$R0.star, breaks=30, freq=F, add=T)
[Figure: left panel, histogram of res6$R1.hat (clearly non-uniform);
right panel, density.default(x = res6$R0.star, na.rm = T) overlaid with
the histogram of res6$R0.star; N = 1000, bandwidth = 0.001716.]


4.3.3 Bootstrap hypothesis testing

Bootstrap hypothesis testing (1)

I Hypothesis testing (HT) can be performed using the bootstrap.

I For example, HT for H0: θ = θ0 vs. H1: θ ≠ θ0 can be done simply
  based on a (1 - α)100% bootstrap CI for θ: H0 is rejected at
  significance level α if the CI does not cover θ0.

I However, caution should be exercised when bootstrapping HT. In
  particular, be careful about the selection of the (approximate)
  pivot R(X, F), and of the bootstrap replicates being used to
  estimate its sampling distribution.


Bootstrap hypothesis testing (2)


I For example, let the test statistic be R(X, F) = θ̂ - θ0. The
  distribution of R(X, F) under H0: θ = θ0 is required for HT.

I There is a temptation to generate values of R(X*, F̂) = θ̂* - θ0,
  with the null value θ0 being used, via the bootstrap, to
  approximate the pdf of R(X, F) under H0. However, the bootstrap
  distribution of R(X*, F̂) actually approximates that of R(X, F)
  under the true value of θ, because the sample X is observed from
  F with the true value.

I Therefore, the p-value obtained by comparing R({x1, ..., xn}, F)
  with the bootstrap pdf of R(X*, F̂) is unlikely to be significant,
  whether or not θ0 is significantly different from the true value
  of θ. A small illustration follows.
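A small simulation illustrating this pitfall, under assumed data
x ~ N(1, 1) and H0: μ = 0 (the setup and names are ours):

# Naive (incorrect) bootstrap test: compare theta.hat - mu0 with the
# bootstrap distribution of theta.hat* - mu0. Even though H0 is false
# here, the p-value is typically near 0.5, i.e. never significant.
set.seed(1)
x = rnorm(30, mean = 1)            # true mean is 1, but H0: mu = 0
mu0 = 0
r.obs = mean(x) - mu0
r.star = replicate(2000, mean(sample(x, replace = TRUE)) - mu0)
mean(abs(r.star) >= abs(r.obs))    # about 0.5: fails to reject a false H0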


Bootstrap hypothesis testing (3)


I The fact is that it is not possible to bootstrap the distribution
  of R(X, F) under H0. So the distribution of θ̂ - θ under the true
  value of θ is actually used as the reference distribution for
  R(X, F) in bootstrap HT. The bootstrap distribution of
  R(X*, F̂) = θ̂* - θ̂ is used to approximate this reference
  distribution. It is easy to see that, if θ0 is significantly
  different from the true value, R(X, F) = θ̂ - θ0 will look very
  unusual compared with the bootstrap distribution of
  R(X*, F̂) = θ̂* - θ̂, a significant p-value will be returned, and
  hence H0 will be rejected.

I We have seen that the paradigm behind bootstrap HT can be quite
  different from that of traditional HT. Hall and Wilson (1991) have
  addressed these issues and provided advice to improve the power
  and accuracy of bootstrap HT.


Bootstrap hypothesis testing (4)


I Using an appropriate pivot is still important in bootstrap HT.

I It is often best to base HT on the bootstrap distribution of
  (θ̂* - θ̂)/σ̂*, where σ̂* is a good estimator of sd(θ̂*). This pivot
  usually gives better results than θ̂* - θ̂, θ̂* - θ0, or
  (θ̂* - θ0)/σ̂, where σ̂ estimates sd(θ̂) from the original dataset.
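A minimal R sketch of a studentized bootstrap test of H0: μ = μ0 for
a population mean, using this recommended pivot (the function name
boot.t.test and the choice of the sample mean as θ̂ are illustrative):

boot.t.test = function(x, mu0, B = 2000) {
  # Compare the observed (theta.hat - mu0)/se with the bootstrap
  # distribution of (theta.hat* - theta.hat)/se*.
  n = length(x)
  t.obs = (mean(x) - mu0) / (sd(x)/sqrt(n))
  t.star = replicate(B, {
    xs = sample(x, n, replace = TRUE)
    (mean(xs) - mean(x)) / (sd(xs)/sqrt(n))   # centred at mean(x), not mu0
  })
  mean(abs(t.star) >= abs(t.obs))             # two-sided bootstrap p-value
}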

I Finally, note that permutation testing (or randomization testing)
  is another important HT method using a resampling approach, like
  bootstrap HT. Permutation tests can provide exact p-values if all
  possible permutations are considered, which bootstrap HT cannot.
  Permutation tests are often more powerful than their bootstrap
  counterparts; however, bootstrap HT requires less stringent
  assumptions and provides greater flexibility. Permutation tests
  will not be detailed here.


4.4.1 Balanced bootstrap

Balanced bootstrap (1)


I Balanced bootstrap is an approach to reducing the Monte Carlo
  error induced by bootstrap sampling.

I Consider a bootstrap bias correction of the sample mean. We know
  the bias of the sample mean X̄ in estimating the population mean μ
  is 0.

I Let R(X, F) = X̄ - μ be the bias quantity, and R(X*, F̂) be its
  bootstrap replicate. Then E_F[R(X, F)] = 0 as X̄ is unbiased.
  However, the bootstrap estimate of the bias,
  b̂B(X̄) = (1/B) Σ_{j=1}^B R(Xj*, F̂) = (1/B) Σ_{j=1}^B [X̄j* - X̄],
  is unlikely to be 0 in the ordinary bootstrap. This is caused by
  the Monte Carlo variation in generating bootstrap samples.

I However, b̂B(X̄) = 0 exactly if each value occurs in the combined
  collection of bootstrap samples with the same relative frequency
  as it does in the observed sample.

Balanced bootstrap (2)


I Hence, by balancing the bootstrap samples in this manner, a
  source of potential Monte Carlo error is eliminated.

I This motivates the use of balanced bootstrap samples in
  bootstrap methods.

I The simplest way to get B balanced bootstrap samples is to
  concatenate B copies of the observed sample of size n, randomly
  permute this series, and then read off B blocks of size n
  sequentially; the jth block becomes the jth bootstrap sample Xj*.
  Because of the permutation involved, the balanced bootstrap is
  also called the permutation bootstrap (see the sketch below).

I More elaborate balancing algorithms are possible, but will not be
  discussed here.
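A minimal R sketch of the permutation (balanced) bootstrap, in which
every observation appears exactly B times across the B samples (the
name balanced.boot is ours):

balanced.boot = function(x, B) {
  n = length(x)
  s = sample(rep(x, B))                         # concatenate B copies, permute
  matrix(s, nrow = B, ncol = n, byrow = TRUE)   # row j = jth bootstrap sample
}

# Check: the bootstrap bias estimate of the sample mean is exactly 0.
x = rnorm(20); Xstar = balanced.boot(x, B = 100)
mean(rowMeans(Xstar)) - mean(x)                 # 0 up to floating-point error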


4.4.2 Antithetic bootstrap

Antithetic bootstrap (1)


I For a sample of univariate data x1, ..., xn, denote the ordered
  data by x(1), ..., x(n). Let π(i) = n - i + 1 be the permutation
  operator that reverses the order statistics.

I Then, for each bootstrap sample X* = {X1*, ..., Xn*}, let
  X** = {X1**, ..., Xn**} denote the sample obtained by substituting
  X(π(i)) for every instance of X(i) in X*. Thus, for example, if X*
  has an unrepresentative predominance of the larger observed data
  values, the smaller observed values will predominate in X**.

I Using this strategy, each bootstrap draw provides two estimators:
  R(X*, F̂) and R(X**, F̂). The two estimators are often negatively
  correlated.


Antithetic bootstrap (2)


I Let Ra(X, F) = (1/2)[R(X*, F̂) + R(X**, F̂)]. Then Ra has the
  following desirable property:
  Var[Ra(X, F)] = (1/4){Var[R(X*, F̂)] + Var[R(X**, F̂)]
                        + 2 Cov[R(X*, F̂), R(X**, F̂)]}
                ≤ Var[R(X*, F̂)]
  if the covariance is negative (noting Var[R(X**, F̂)] = Var[R(X*, F̂)]
  by symmetry of the construction).

I The above strategy for reducing Monte Carlo error in the bootstrap
  is referred to as the antithetic bootstrap. A sketch follows.

I It is also possible to establish an ordering of multivariate data
  to permit an antithetic bootstrap strategy.
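A minimal R sketch of the antithetic bootstrap for the bias quantity
R = mean(X*) - mean(x) (the name antithetic.boot is ours):

antithetic.boot = function(x, B = 1000) {
  n = length(x)
  xs = sort(x)
  replicate(B, {
    i = sample.int(n, n, replace = TRUE)   # ranks drawn for the bootstrap sample
    r1 = mean(xs[i]) - mean(x)             # R(X*, F.hat)
    r2 = mean(xs[n - i + 1]) - mean(x)     # R(X**, F.hat): reversed ranks
    (r1 + r2) / 2                          # the antithetic average Ra
  })
}

# var(antithetic.boot(x)) is typically much smaller than the variance of
# the ordinary bootstrap replicates mean(X*) - mean(x), since r1 and r2
# are negatively correlated.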


Questions?