CORRELATION COEFFICIENT
by
MARK M. FRIDLINE
Department of Statistics
CASE WESTERN RESERVE UNIVERSITY
January 2010
CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES
Mark M. Fridline
Committee: _____________________________________________________________
Dr. Wojbor Woyczynski
Professor
Department of Statistics
Committee: _____________________________________________________________
Dr. Patricia Williamson
Instructor
Department of Statistics
Committee: _____________________________________________________________
Dr. David Gurarie
Professor
Department of Mathematics
January 2010
*We also certify that written approval has been obtained for any proprietary material
contained therein.
Table of Contents
1 Introduction
5 Numerical Applications
   5.1 Introduction
   5.2 Bivariate Normal Random Deviate Generation
   5.3 Bivariate Exponential Deviate Generation
   5.4 ASCLT for the Correlation Coefficient Simulations
   5.5 ASCLT for the Correlation Coefficient Simulations – Variance Stabilizing Technique
   5.6 ASCLT-based Confidence Interval for Permuted Samples (Technique #1)
   5.7 ASCLT-based Confidence Interval for Permuted Samples (Technique #2)
   5.8 Bootstrap Confidence Interval for Population Coefficient
   5.9 Variance Stabilizing Transformation for the Confidence Interval for Population Coefficient (Classic Technique)
   5.10 Simulation Results for the Correlation Coefficient Confidence Interval
   5.11 Simulation Results based on Bivariate Normal Distributions
   5.12 Simulation Results based on Bivariate Exponential Distributions
   5.13 Simulation Results based on Bivariate Poisson Distributions
6 Conclusion
   6.1 Summary
   6.2 Future Research Ideas
Appendix
Bibliography
...
List of Tables
5.1 Simulation Results for the Bivariate Normal Distribution
5.2 Simulation Results for the Bivariate Exponential Distribution
5.3 Simulation Results for the Bivariate Poisson Distribution
List of Figures
5.4 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from dependent bivariate normal distributions when ρ = 0.3
5.5 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from dependent bivariate normal distributions when ρ = 0.5
5.6 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from dependent bivariate normal distributions when ρ = 0.7
5.7 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/6, λ2 = 1/4, λ12 = 1 (ρ = 0.7)
5.8 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/6, λ2 = 1/3, λ12 = 1/2 (ρ = 0.5)
5.9 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/2, λ2 = 1/2, λ12 = 1/3 (ρ = 0.25)
5.10 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from independent bivariate exponential distributions with parameters λi = 3, i = 1, 2
5.11 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 1, i = 1, 2
5.12 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 3, i = 1, 2
5.13 Estimated distribution function $\tilde{H}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 10, i = 1, 2
5.14 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from dependent bivariate normal distributions with ρ = 0.7
5.15 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from dependent bivariate normal distributions with ρ = 0.5
5.16 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from dependent bivariate normal distributions with ρ = 0.3
5.17 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from independent bivariate normal distributions
5.18 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/6, λ2 = 1/4, λ12 = 1 (ρ = 0.7)
5.19 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/6, λ2 = 1/3, λ12 = 1/2 (ρ = 0.5)
5.20 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate exponential distributions with parameters λ1 = 1/2, λ2 = 1/2, λ12 = 1/3 (ρ = 0.25)
5.21 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate exponential distributions with parameter λi = 3, i = 1, 2
5.22 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 1, i = 1, 2
5.23 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 3, i = 1, 2
5.24 Estimated distribution function $\tilde{J}_N(t)$ for simulated samples from bivariate Poisson distributions with parameters λi = 10, i = 1, 2
5.25 Confidence interval results for N=10,000, CL=90%, and ρ=0.7
5.26 Confidence interval results for N=1,000, CL=90%, and ρ=0.7
5.27 Confidence interval results for N=1,000, CL=95%, and ρ=0.5
5.28 Confidence interval results for N=100, CL=90%, and ρ=0.7
5.29 Confidence interval results for N=100, CL=95%, and ρ=0.3
5.30 Confidence interval results for N=100, CL=99%, and ρ=0.0
5.31 Confidence interval results for N=10,000, CL=95%, and ρ=0.7
5.32 Confidence interval results for N=1,000, CL=95%, and ρ=0.7
5.33 Confidence interval results for N=100, CL=95%, and ρ=0.7
5.34 Confidence interval results for N=100, CL=99%, and ρ=0.0
5.35 Confidence interval results for N=10,000, CL=95%, and ρ=0.0
5.36 Confidence interval results for N=1,000, CL=95%, and ρ=0.0
5.37 Confidence interval results for N=100, CL=95%, and ρ=0.0
ACKNOWLEDGMENTS
I would like to express my sincere appreciation to the following people who helped and
supported me through this entire dissertation and Ph.D. in Statistics process.
Dr. Patricia Williamson and Dr. David Gurarie, I am appreciative of all your help and
effort while serving on my dissertation committee. I am grateful for your
outstanding suggestions and advice while I completed my dissertation.
Dr. Wojbor Woyczynski, I am extremely appreciative of the encouragement and
advice you gave me during my initial meeting with you over five years ago. Thank you for
understanding me not only as a student, but also as a person. Your academic and research
excellence has been inspiring, and I am blessed to have had the opportunity to work with
you during my tenure at CWRU.
Dr. Manfred Denker, I would like to thank you for your incredible help and guidance as
my dissertation advisor. Your style and approach while working with me have made me
re-evaluate how I will work with my own students. I am appreciative of your participation in
my progress as a student and researcher. I am most grateful to be able to consider you my mentor.
To my wife Sara, who provided continued love, support, and prayers for me during my
pursuit of this degree: this accomplishment was a complete team effort, and I would not
have achieved it without you by my side. But mostly, I am a better person, husband,
and father because of you. To my children Mikaela, Andrew, and Jonathan, who gave me
the inspiration to show that any dream is possible. To my parents, Jacob and Janice, and
my mother-in-law and late father-in-law, Cherie and Rollin, for supporting and believing
in me.
Most importantly, I would like to thank my Lord and Savior Jesus Christ for answering
all my prayers while helping me obtain my dream. I can do everything through Him who
gives me strength (Philippians 4:13). I give Him all the glory for this accomplishment.
Almost Sure Confidence Intervals for the Correlation Coefficient
Abstract
by
Mark M. Fridline
This dissertation develops a new estimation technique for the correlation coefficient based on
sequential sampling and sampling without replacement. This paper will emphasize the
features, advantages, and applications of this new procedure. It will also explain the
theoretical background and present the theory necessary to apply this method
successfully.
Chapter 1
Introduction
The subject of probability theory is the foundation upon which all of statistics was built,
providing procedures for making decisions about populations when the probabilistic model is
unknown. Through these models, statisticians are given the tools to draw inferences
about populations while examining only a portion of the population. In this dissertation,
we will consider the probability space (Ω, β, P), where Ω is the sample space, β is the
sigma field, and P is a family of probabilities on β on which all inferential decisions will be
made. Our focus will be the estimation of quantiles using a theorem called the Almost
Sure Central Limit Theorem.
The Almost Sure Central Limit Theorem (ASCLT) was first developed independently by
the researchers Fisher (1987), Brosamler (1988), and Schatte (1988) under different
degrees of generality. In the past decade, several authors have investigated the ASCLT
and related logarithmic limit theorems for partial sums of independent random variables.
We refer to Atlagh (1993), Atlagh and Weber (1992), and Berkes and Dehling (1993) for
surveys of this field. The simplest form of the ASCLT
states that if X1, X2, X3, …, Xn are independent and identically distributed random
variables, then

$$\frac{1}{\log N}\sum_{n=1}^{N} d_n\,\mathbf{1}\left\{\frac{S_n}{\sqrt{n}}\le t\right\}\;\xrightarrow{\ a.s.\ }\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-x^2/2}\,dx \qquad (1.1)$$

or

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}\le t\right\}\;\xrightarrow{\ a.s.\ }\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-x^2/2}\,dx \qquad (1.2)$$

where $d_n=\frac{1}{n}$, $S_n=\sum_{k=1}^{n}X_k$ denotes the partial sum, and $\mathbf{1}\{A\}$ is the indicator function of the
set A. By the ASCLT, the averages on the left-hand side of (1.1) and (1.2) converge almost
surely to $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}e^{-x^2/2}\,dx$, the standard normal cumulative distribution function
N(0,1); see Hörmann (2007) for refinements. It should be noted that the result in (1.1) was
first presented by Lévy (1937, p. 270), but he did not specify the conditions and gave no proof.
It should be noted that any results utilizing the ASCLT are asymptotic in nature and are
derived from logarithmic averages. Therefore, the rates of convergence for (1.1) and
(1.2) will be very slow. Due to this issue, general data analysis applications that use the
ASCLT are nearly impossible when dealing with small sample sizes. Later in this
dissertation we will address this small-sample issue and propose an approximation with
asymptotic results.
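To make the slow convergence concrete, the following sketch (our own illustration, not code from the dissertation; the function names are ours) evaluates the logarithmic average on the left-hand side of (1.2) for simulated standard normal data and compares it with Φ(t):

```python
import numpy as np
from math import erf, log, sqrt

def asclt_average(x, t):
    """(1/log N) * sum_{n=1}^N (1/n) * 1{(X_1+...+X_n)/sqrt(n) <= t}."""
    n = np.arange(1, len(x) + 1)
    z = np.cumsum(x) / np.sqrt(n)              # S_n / sqrt(n)
    return float(np.sum((z <= t) / n) / log(len(x)))

def phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)               # i.i.d. N(0, 1)
for t in (-1.0, 0.0, 1.0):
    print(t, round(asclt_average(x, t), 3), round(phi(t), 3))
```

Even at N = 200,000 the agreement with Φ(t) is only rough, which is exactly the slow-convergence issue raised in the text.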
As mentioned earlier, a concern when working with the ASCLT is the rate of
convergence to a normal distribution. Even for very large sample sizes, the rate of
convergence is very slow. In chapter 2, this dissertation will address this ASCLT rate of
convergence by using a proposal from Thangavelu (2005): we replace the "Log N" in
(1.1) or (1.2) with the averaging term $\sum_{n=1}^{N}\frac{1}{n}$ to create the following,

$$\lim_{N\to\infty}\frac{1}{\sum_{n=1}^{N}\frac{1}{n}}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}\le t\right\} \qquad (1.3)$$

or

$$\lim_{N\to\infty}\frac{1}{\sum_{n=1}^{N}\frac{1}{n}}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}. \qquad (1.4)$$
It will be proven that (1.3) and (1.4) are cumulative distribution functions.
This dissertation will address this rate of convergence by applying Cramér’s Theorem
(see Ferguson, 2002). Cramér’s Theorem for smooth functions of sample moments is
one of the basic results in statistics used to diminish variation and allow the construction
of confidence intervals. We will use Cramér’s Theorem to extend the ASCLT and show that
the result converges almost surely towards a limiting distribution. Observe the following
multivariate version of the ASCLT:

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\sqrt{n}\,(\bar{X}_n-\mu)\le t\right\}\;\xrightarrow{\ a.s.\ }\;N(0,\Sigma) \qquad (1.5)$$

The following almost sure version of Cramér’s Theorem, which is a new result, will be presented:

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\sqrt{n}\,\big(g(\bar{X}_n)-g(\mu)\big)\le t\right\}\;\xrightarrow{\ a.s.\ }\;N\!\left(0,\ \dot{g}(\mu)\,\Sigma\,\dot{g}(\mu)^{T}\right) \qquad (1.6)$$
Later in this dissertation we will connect the above ideas with the correlation coefficient.
The correlation coefficient has a history that started over a century ago. Rogers and
Nicewander (1988) gave the following brief history of the development of the correlation
coefficient. Sir Francis Galton first discussed the term bivariate correlation in 1885. In
that year, he published an article that included the first bivariate scatterplot showing the
idea of correlation. However, it was not until a decade later, in 1895, that Karl Pearson
developed the index that we now use to measure the association between two random variables.
The basic idea of correlation was considered before 1885. In his 1920 article,
“Notes on the History of Correlation,” Pearson credited Carl Friedrich Gauss with
developing the normal surface of n correlated variates in 1823. Gauss was not
particularly interested in the correlation as a statistic in its own right, but rather as a
parameter of the normal surface. Pearson also credited Bravais, whom he had cited in his
previously published paper of 1895, with referring to one parameter of the bivariate
normal distribution as “une correlation.” But like Gauss, Bravais did not see the
importance of correlation as a stand-alone statistic. Charles Darwin, who was Galton’s
cousin, used the concept of correlation in 1868 when discussing how all the parts of an
organism are to some extent connected, or correlated, together.
Inferences based on the correlation coefficient of two random variables have been
discussed by many authors. Fisher (1915) and Hotelling (1953) derived various
forms of the distribution function for the sample correlation coefficient, and these results
form the foundation for inference on the correlation coefficient. Fisher derived the density
for the correlation statistic (r) and presented it in two forms: one expressed in terms of an
infinite sum, and the other in closed form. Rather than calculating confidence
intervals from each form, Fisher (1921) introduced the extremely useful Fisher’s z-
transformation (see section 5.9), which simplified the confidence interval calculations. This
is still the most common method for the calculation of confidence
intervals for the correlation coefficient. Hotelling, in contrast, recommended a form of the
density of r whose derived distribution function converges rapidly even for small samples.
Rubin (1966) also suggested a simpler method for approximating the confidence interval,
proposing a normalization for the correlation coefficient in samples that compared favorably
to the existing approaches.

Results can also be found in the literature that derive the distribution of the correlation
coefficient and, subsequently, the confidence interval. Some of these techniques
are very simple and require little effort in calculating the confidence intervals (see
Samiuddin, 1970), and others are more complex (see Mudholkar and Chaubey, 1978).
Notable contributions in this area include Fisher (1921), Samiuddin (1970), Mudholkar
and Chaubey (1978), and Rubin (1966). Later in this dissertation we will present two new
results that are ASCLT-based.
In chapter 3 we will introduce the population correlation coefficient (see Ferguson, 2002)
and prove that the asymptotic distribution of $\sqrt{n}\,(r_n-\rho)$ converges to a normal
distribution with mean 0, with variance expressed in terms of the moments

$$b_{33}=\mathrm{Var}\!\left((X_1-\mu_x)^2-\sigma_x^2\right)$$
$$b_{34}=\mathrm{Cov}\!\left((X_1-\mu_x)^2,\,(Y_1-\mu_y)^2\right)$$
$$b_{44}=\mathrm{Var}\!\left((Y_1-\mu_y)^2-\sigma_y^2\right)$$
$$b_{35}=\mathrm{Cov}\!\left((X_1-\mu_x)^2,\,(X_1-\mu_x)(Y_1-\mu_y)\right)$$

together with the analogous moments $b_{45}$ and $b_{55}$.
In this chapter we will also study the distribution function addressed in the ASCLT and
connect it with the correlation coefficient to develop a new distribution function. When
connecting the ASCLT and the correlation coefficient we get the following distribution
function,
$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\sqrt{n}\,(r_n-\rho)\le t\right\}\;\xrightarrow{\ a.s.\ }\;N(0,\gamma^2) \qquad (1.7)$$
Our proposal in this thesis is to apply the cumulative distribution function in (1.7) and
develop a confidence interval method for the population correlation coefficient. The
asymptotic behavior of this distribution function (1.7) will be studied empirically and a
new estimation method will be proposed and applied to the correlation coefficient.
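As a hedged sketch of how the empirical behavior of (1.7) could be examined (our own illustration; the function names and the $C_N$ normalization, following the Thangavelu replacement discussed above, are ours and not code from the dissertation), one can compute running sample correlations $r_n$ from a simulated bivariate normal sample and form the $1/n$-weighted empirical distribution of $\sqrt{n}(r_n-\rho)$:

```python
import numpy as np

def running_correlation(x, y):
    """Sample correlation r_n of the first n pairs, for each n (nan where undefined)."""
    n = np.arange(1, len(x) + 1)
    sx, sy = np.cumsum(x), np.cumsum(y)
    cov = np.cumsum(x * y) - sx * sy / n
    vx = np.cumsum(x * x) - sx**2 / n
    vy = np.cumsum(y * y) - sy**2 / n
    r = np.full(len(x), np.nan)
    ok = (vx > 0) & (vy > 0)
    r[ok] = cov[ok] / np.sqrt(vx[ok] * vy[ok])
    return r

def h_tilde(r, rho, t):
    """1/n-weighted empirical distribution of sqrt(n)*(r_n - rho) at t,
    normalized over the defined terms (a C_N-style normalization)."""
    n = np.arange(1, len(r) + 1)
    ok = ~np.isnan(r)
    w = 1.0 / n[ok]
    return np.sum(w * (np.sqrt(n[ok]) * (r[ok] - rho) <= t)) / np.sum(w)

rho = 0.5
rng = np.random.default_rng(7)
cov = [[1.0, rho], [rho, 1.0]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=50_000).T
r = running_correlation(x, y)
print(round(h_tilde(r, rho, 0.0), 3))   # typically near 1/2 for large N
```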
In fact, our main goal for this ASCLT-based theory of confidence intervals is the estimation of the quantiles of the distribution of the correlation coefficient statistic.
In chapter 3 we will use the results of Cramér’s Theorem to prove that the asymptotic
distribution of $\sqrt{n}\,(g(r_n)-g(\rho))$ is normal with mean 0
and a variance of $\tau^2$, where $g(\rho)=\frac{1}{2}\log\frac{1+\rho}{1-\rho}$ and

$$\tau^2=\frac{\rho^2}{4(1-\rho^2)^2}\left[\frac{b_{33}}{\sigma_x^4}+\frac{2b_{34}}{\sigma_x^2\sigma_y^2}+\frac{b_{44}}{\sigma_y^4}\right]-\frac{\rho}{(1-\rho^2)^2}\left[\frac{b_{35}}{\sigma_x^3\sigma_y}+\frac{b_{45}}{\sigma_x\sigma_y^3}\right]+\frac{b_{55}}{(1-\rho^2)^2\,\sigma_x^2\sigma_y^2}.$$
Chapter 3 will also study the distribution function addressed in the ASCLT and Cramér’s
Theorem, and connect it with the correlation coefficient to develop the following new
distribution function,

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\sqrt{n}\;\frac{1}{2}\left(\log\frac{1+r_n}{1-r_n}-\log\frac{1+\rho}{1-\rho}\right)\le t\right\}\;\xrightarrow{\ a.s.\ }\;N(0,\tau^2). \qquad (1.8)$$
Another proposal in this thesis is to apply the cumulative distribution function in (1.8)
and develop an additional confidence interval method for the population correlation
coefficient. The asymptotic behavior of this distribution function (1.8) will be studied
empirically, and a new estimation method will be proposed and applied to the correlation
coefficient.
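The stabilizing effect of the transformation in (1.8) is easy to observe numerically. The sketch below (our own illustration, not from the dissertation) uses the classical fact that for bivariate normal samples of size n, Fisher's z = arctanh(r) has variance close to 1/(n − 3) regardless of ρ:

```python
import numpy as np

# Variance stabilization: for bivariate normal samples of size n, Fisher's
# z = arctanh(r) has variance close to 1/(n - 3), whatever the value of rho.
rng = np.random.default_rng(2)
n, reps, rho = 200, 2_000, 0.7
cov = [[1.0, rho], [rho, 1.0]]
zs = []
for _ in range(reps):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r = np.corrcoef(x, y)[0, 1]
    zs.append(np.arctanh(r))
print(round(np.var(zs) * (n - 3), 2))   # close to 1 for any rho
```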
To develop confidence intervals for the population correlation coefficient, we will need
the law of the iterated logarithm (see Dudley, 1989) for the correlation statistic. In chapter 3
section 5, we will state and prove the following almost sure result,

$$\limsup_{n\to\infty}\frac{s_{xy}-n\sigma_{xy}}{\sqrt{2n\log\log n}}\ \le\ \sqrt{b_{55}}+2\mu_x\sigma_y+2\mu_y\sigma_x.$$
In chapter 4, we will connect the ASCLT result and the correlation coefficient discussed
in chapter 3, and develop a confidence interval method for the population correlation
coefficient. Before we can state our confidence interval, we first will need to define the
inverse cumulative distribution functions. Once the inverse function is known, we can
show results of the quantiles of these distributions. Recall that our main goal for the
ASCLT-based confidence interval is the estimation of the quantiles of the distribution for
the correlation coefficient statistic. The inverse function for the distribution function
$H_N(t)$ stated in (1.7) is defined as follows:

$$H_N^{-1}(\alpha)=\begin{cases}\sup\{\,t\mid H_N(t)=0\,\}&\text{for }\alpha=0\\ \sup\{\,t\mid H_N(t)<\alpha\,\}&\text{for }0<\alpha<1\\ \inf\{\,t\mid H_N(t)=1\,\}&\text{for }\alpha=1.\end{cases}$$

The inverse function for the distribution function $J_N(t)$ stated in (1.8) is defined analogously.
After defining the inverse functions for the cumulative distribution functions in (1.7)
and (1.8), the quantiles of these functions are given by $t_\alpha^{(N)}=H_N^{-1}(\alpha)$
or $t_\alpha^{(N)}=J_N^{-1}(\alpha)$.
In chapter 4 section 4, a new version of the confidence interval for the population
correlation coefficient will be presented. This method uses quantiles from the ASCLT-
based distribution function (1.7) to estimate confidence intervals. One key property of
this new method is that estimation or use of the variance of the observations is not
needed. Therefore this approach uses a variance-free method to estimate the limiting
distribution, producing the interval

$$I_\alpha^{(N)}=\left[\hat{\rho}+\frac{t_{1-\alpha}^{(N)}}{\sqrt{N}},\ \ \hat{\rho}+\frac{t_\alpha^{(N)}}{\sqrt{N}}\right] \qquad (1.9)$$

where $\hat{\rho}$ is the estimated correlation coefficient.
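A minimal sketch of how an interval of the form (1.9) could be computed, assuming the quantiles come from a weighted empirical distribution of simulated values of $\sqrt{n}(r_n-\rho)$; the helper names and the stand-in sample are our own, and the convention that α denotes the confidence level (so $t_{1-\alpha}$ is the lower quantile) follows the text:

```python
import numpy as np

def weighted_quantile(values, weights, alpha):
    """Generalized inverse of a weighted empirical CDF: smallest t with F(t) >= alpha."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / np.sum(w)
    return v[np.searchsorted(cdf, alpha)]

def asclt_interval(rho_hat, values, weights, N, conf=0.95):
    """Variance-free interval (1.9): [rho_hat + t_{1-conf}/sqrt(N), rho_hat + t_conf/sqrt(N)]."""
    lo = weighted_quantile(values, weights, 1.0 - conf)
    hi = weighted_quantile(values, weights, conf)
    return rho_hat + lo / np.sqrt(N), rho_hat + hi / np.sqrt(N)

# toy usage with a hypothetical stand-in for the ASCLT sample of sqrt(n)*(r_n - rho)
rng = np.random.default_rng(3)
vals = rng.normal(0.0, 0.75, size=5_000)
wts = 1.0 / np.arange(1, 5_001)            # the 1/n weights
lo, hi = asclt_interval(0.5, vals, wts, N=5_000, conf=0.95)
print(round(lo, 3), round(hi, 3))
```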
Fisher used the z-transformation to construct the confidence interval for the population
correlation coefficient. In chapter 4 section 5, another version of the confidence interval
for the population correlation coefficient will be presented. This method uses quantiles
from the ASCLT-based distribution function (1.8). The new result is another
ASCLT-derived confidence interval for ρ using the variance stabilizing technique:

$$I_\alpha^{(N)}=\left[\frac{\exp\left\{2\left(z_N+\frac{t_{1-\alpha}^{(N)}}{\sqrt{N}}\right)\right\}-1}{\exp\left\{2\left(z_N+\frac{t_{1-\alpha}^{(N)}}{\sqrt{N}}\right)\right\}+1},\ \ \frac{\exp\left\{2\left(z_N+\frac{t_{\alpha}^{(N)}}{\sqrt{N}}\right)\right\}-1}{\exp\left\{2\left(z_N+\frac{t_{\alpha}^{(N)}}{\sqrt{N}}\right)\right\}+1}\right] \qquad (1.10)$$

where $z_N=\frac{1}{2}\log\frac{1+\hat{\rho}}{1-\hat{\rho}}$.
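The back-transformation in (1.10) is simply the hyperbolic tangent applied to shifted Fisher-z values. A small sketch (our own, with hypothetical quantile inputs t = ±1.96 standing in for the ASCLT quantiles):

```python
import math

def stabilized_interval(rho_hat, t_lo, t_hi, N):
    """Interval (1.10): apply Fisher's z, shift by quantile/sqrt(N), map back."""
    z = 0.5 * math.log((1.0 + rho_hat) / (1.0 - rho_hat))   # z_N = arctanh(rho_hat)
    def back(u):
        # (exp(2u) - 1) / (exp(2u) + 1), i.e. tanh(u)
        return (math.exp(2.0 * u) - 1.0) / (math.exp(2.0 * u) + 1.0)
    return back(z + t_lo / math.sqrt(N)), back(z + t_hi / math.sqrt(N))

lo, hi = stabilized_interval(0.5, -1.96, 1.96, 400)
print(round(lo, 3), round(hi, 3))
```

Note that the resulting interval always stays inside (−1, 1), a practical advantage of the variance stabilizing technique over the direct interval (1.9).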
To the best of our knowledge, the cumulative distribution functions mentioned in (1.7)
and (1.8) have not been considered before. Chapter 5 will evaluate the performance of the
proposed methods, comparing them to the bootstrap approach (discussed in section 5.8)
and to the classic Fisher’s z-transformation parametric approach (discussed in section 5.9).
All numerical simulations for the empirical distribution functions and confidence interval
techniques will be completed for varying values of n and ρ. Also, each simulation will be
repeated for the bivariate normal, bivariate exponential, and bivariate Poisson distributions.
Before proceeding into this dissertation, we should concisely state our goals. By
using the ASCLT we will be estimating the quantiles of an unknown distribution. These
quantiles will then be used to draw inferences for the population correlation coefficient.
These new approaches will be developed using an almost sure version of Cramér’s Theorem,
and they will be supported by numerical simulations.
Chapter 2
From a statistical viewpoint, the almost sure central limit theorem is just a theorem for a
special (but important) class of statistics, the mean. In this chapter, we will discuss the
ASCLT and how it can be extended through Cramér’s Theorem. This chapter
will go into more detail on the ASCLT, introduce Cramér’s Theorem, and finally develop
an almost sure version of Cramér’s Theorem.
The almost sure central limit theorem (ASCLT) was developed independently by the
researchers Fisher (1987), Brosamler (1988), and Schatte (1988). The theorem presented
by these three authors has connected the theory of the central limit theorem to an almost
sure version named the almost sure central limit theorem. A version of the ASCLT is
presented below.
Theorem 2.1. (Almost Sure Central Limit Theorem) Let X1, X2, X3, …, Xn be i.i.d.
random variables with $S_n = X_1+X_2+\cdots+X_n$ being the partial sums. If $EX_1=0$, $EX_1^2=1$,
and $E|X_1|^{2+\delta}$ is finite for some δ > 0, then

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}\le t\right\}\;\xrightarrow{\ a.s.\ }\;\Phi(t)\quad\text{for any }t, \qquad (2.1)$$

where Φ(t) is the standard normal cumulative distribution function and 1{A} is the
indicator function of the set A.
Brosamler’s version of the ASCLT presented in Theorem 2.1 is the simplest version since
it assumes that µ = 0. However, observe the following version of the ASCLT for the
general case:

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}\;\xrightarrow{\ a.s.\ }\;\Phi_\sigma(t)\quad\text{for any }t, \qquad (2.2)$$

where now Φσ(t) is the normal cumulative distribution function with mean 0 and variance
σ². In typical circumstances, the entire problem revolves around hypothesis testing and
estimation involving the variance σ², since this parameter is typically not known in practice.
This issue will be addressed later in this thesis.
After the above theorem had been established, several authors over the past decade
investigated the ASCLT and related logarithmic limit theorems for
partial sums of independent random variables. In fact, Berkes and Csáki (2001) extended
this theory and showed that not only the central limit theorem, but every weak limit
theorem for independent random variables, under minor technical conditions, can have an
almost sure version. We will not go into any details surrounding the investigations of other
authors who have developed advances in the concepts of the ASCLT, as these theorems are
typically approached from a mathematical perspective. Our interest moving forward will be
in the statistical applications of these results.
Lifshits (2002) extended the ASCLT from random variables to random vectors. He
established a sufficient condition for the ASCLT for sums of independent random vectors.
It should be noted that this article was translated from an article in the journal Zapiski.

Theorem 2.2. Let $\{\xi_j\}$ be a sequence of independent random vectors taking values in
$\Re^d$, and let $\delta_x$ denote the probability measure which assigns its total mass to x. Assume
the condition of Berkes and Dehling (1993), which is sufficient for the ASCLT, and assume
also that $\{\xi_j\}\subset L^2$. Then the ASCLT holds for random vectors, i.e.

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\delta_{\left\{\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\left(\xi_j-E(\xi_j)\right)\right\}}\;\xrightarrow{\ a.s.\ }\;N(0,\Sigma)$$

where Σ denotes the corresponding limiting covariance matrix.
One issue of the ASCLT in the form (2.1) or (2.2) is the rate of convergence to a
normal distribution. Even for very large sample sizes, the rate of convergence is very
slow. Consider the following quotient:

$$\frac{\sum_{n=1}^{N}\frac{1}{n}}{\log N}.$$

This quotient will converge to 1 for sufficiently large values of N. However, what is a
large value of N? Even with large values of N, this fraction does not equal 1. For
example, for N = 10⁷, the above ratio is approximately equal to 1.03, and for N = 10¹⁰, it
is still approximately 1.025. Therefore,

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}\le t\right\}$$

or

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}$$

will not be a distribution function even for very large values of N. Moving forward, we
will use a proposal from Thangavelu (2005): we replace the “Log N” in the ASCLT
with the averaging term $\sum_{n=1}^{N}\frac{1}{n}$, giving

$$\lim_{N\to\infty}\frac{1}{\sum_{n=1}^{N}\frac{1}{n}}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n}{\sqrt{n}}\le t\right\}$$

or

$$\lim_{N\to\infty}\frac{1}{\sum_{n=1}^{N}\frac{1}{n}}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}.$$

For convenience, in subsequent sections we will denote $\sum_{n=1}^{N}\frac{1}{n}$ by $C_N$.
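The quotient discussed above can be checked directly; a brief sketch (our own illustration, not code from the dissertation):

```python
import numpy as np

def harmonic_over_log(N):
    """Ratio C_N / log N, where C_N = sum_{n=1}^N 1/n."""
    c_n = np.sum(1.0 / np.arange(1, N + 1, dtype=np.float64))
    return float(c_n / np.log(N))

for N in (10**3, 10**5, 10**7):
    print(N, round(harmonic_over_log(N), 4))
```

The ratio shrinks toward 1 very slowly (roughly like the Euler–Mascheroni constant divided by log N), which is why the C_N normalization matters at any practical sample size.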
Consider the following notation and assumptions. For n ≥ 1, where n ≤ N ∈ N, let the
statistic Tn be a sequence of real-valued statistics defined on the same measurable space
(Ω, β), and let P be a family of probabilities on β. Let us assume that the statistic Tn satisfies
the Central Limit Theorem for each P ∈ P, where the constants $b_n = n^{-1/2}$ and $a_n(P) =
n\mu(P)$ are unknown. Let us also assume that $G_P$ is the unknown continuous normal
distribution function (Normal N(µ, σ²), where µ and σ² are unknown) with

$$P\left(\{\omega\in\Omega : b_n\left(T_n(\omega)-a_n(P)\right)\le t\}\right)\ \longrightarrow\ G_P(t)\quad\text{for }t\in C_G,$$

where $C_G$ denotes the set of continuity points of $G_P$. For notational convenience, we
will denote $T_n(\omega)$ by $T_n$, $a_n(P)$ by $a_n$, and $\mu(P)$ by µ. These simplifications will hold true
for every P ∈ P.
Again, consider the following notation and assumptions. For n ≥ 1, where n ≤ N ∈ N, let
the statistic Tn be a sequence of real-valued statistics defined on the same measurable
space (Ω, β), and let P be a family of probabilities on β. Let us assume that the statistic Tn
satisfies the CLT and ASCLT for each P ∈ P, where the constants $b_n = n^{-1/2}$ and $a_n = n\mu$.
Then

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\{b_n(T_n-a_n)\le t\}=G_P(t),\quad\forall\,t\in\Re^d\ a.s., \qquad (2.3)$$
where an and bn are non-random sequences. It is known from Brosamler (1988), Schatte
(1988), Fisher (1987), and Lacey and Philipp (1990) that the almost sure limit theorem
holds for Tn being the mean of n i.i.d. random variables with finite second moment. The
case of generalized means, U-statistics, has been considered by Berkes (1993) and later
by Holzmann, Koch, and Min (2004). One particular case was established by
Thangavelu (2007) for rank statistics. Other sequences of statistics have not been
considered. For a fixed ω, define the empirical function

$$G_N(t,\omega)=G_N(t)=\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\{b_n(T_n-a_n)\le t\}$$

or, equivalently,

$$G_N(t,\omega)=G_N(t)=\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}_{(-\infty,t]}\left(b_n(T_n-a_n)\right).$$
We will be presenting results for a fixed ω ∈ Ω , though the results would be applicable
to each ω ∈ Ω .
Consider the following two functions, defined for each ω ∈ Ω and t ∈ ℜ by
replacing the log averaging term with $C_N$:

$$\tilde{G}_N(t)=\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\{b_n(T_n-a_n)\le t\}=\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}_{(-\infty,t]}\left(b_n(T_n-a_n)\right),$$

$$\hat{G}_N(t)=\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\{b_nT_n\le t\}=\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}_{(-\infty,t]}\left(b_nT_n\right).$$
Lemma 2.1. Let X1, X2, …, Xn be i.i.d. random variables on (Ω, β, P), where Ω is the
sample space, β is the sigma field, and P is the probability measure, with Xi ∈ (−∞, +∞),
μ ∈ (−∞, +∞), and σ² > 0 and finite. Then
$\tilde{G}_N$ and $\hat{G}_N$ are empirical distribution functions. Also, $\tilde{G}_N(t)$ converges to $G_P(t)$ almost
surely.

Observe the following proof (Thangavelu, 2005). Let us first consider $\tilde{G}_N(t)$,

$$\tilde{G}_N(t)=\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}.$$
This implies $\tilde{G}_N(t)\le\tilde{G}_N(s)$ for $t\le s$, with N ∈ N fixed. Therefore, $\tilde{G}_N(t)$ is monotonically
increasing in t ∈ ℜ. We also observe that

$$\lim_{t\to-\infty}\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}=0$$

and

$$\lim_{t\to+\infty}\frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}\left\{\frac{X_1+X_2+\cdots+X_n-n\mu}{\sqrt{n}}\le t\right\}=1,$$

which implies $\lim_{t\to+\infty}\tilde{G}_N(t)=1$. Further we note that the function $\tilde{G}_N(t)$ is a step function
in t with $0\le\tilde{G}_N(t)\le 1$, where $\tilde{G}_N(t)$ is constant between successive jump points $t_{i-1}$ and $t_i$,
for all i = 2, 3, …, s. Also, $t_1\le t_2\le\cdots\le t_N$, $\tilde{G}_N(t)\approx 0$ for all $t\le t_1$, and $\tilde{G}_N(t)\approx 1$ for all
$t\ge t_N$. Note that $\tilde{G}_N(t)$ is right continuous, i.e., continuous when a point is approached
from the right. Therefore $\tilde{G}_N(t)$ is an empirical distribution function.

Also, since $\hat{G}_N(t)$ is a special case of $\tilde{G}_N(t)$ where µ = 0, all the aforementioned steps
in the proof hold true for $\hat{G}_N(t)$; hence it is also an empirical distribution function.
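The distribution-function properties established in Lemma 2.1 are easy to check numerically. The sketch below (our own illustration; the choice of $T_n$ as the sample mean, so $b_n = n^{-1/2}$ and $a_n = n\mu$, and the exponential data are assumptions) builds $\tilde{G}_N$ on a grid and verifies monotonicity and the 0/1 limits:

```python
import numpy as np

def g_tilde(x, mu, t):
    """G~_N(t) = (1/C_N) sum_{n=1}^N (1/n) 1{(S_n - n*mu)/sqrt(n) <= t}."""
    n = np.arange(1, len(x) + 1)
    z = (np.cumsum(x) - n * mu) / np.sqrt(n)   # b_n (T_n - a_n)
    w = 1.0 / n
    return float(np.sum(w * (z <= t)) / np.sum(w))   # normalize by C_N

rng = np.random.default_rng(11)
x = rng.exponential(scale=2.0, size=100_000)   # i.i.d. with mean mu = 2
grid = np.linspace(-10.0, 10.0, 41)
vals = np.array([g_tilde(x, 2.0, t) for t in grid])
print(round(vals.min(), 3), round(vals.max(), 3))   # stays within [0, 1]
assert np.all(np.diff(vals) >= 0)                   # monotonically increasing in t
```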
Now that we have established that $\tilde{G}_N(t)$ has the properties of a distribution function, we
can show that $\tilde{G}_N(t)$ converges to $G_P(t)$ almost surely for all $t\in C_G$. This
is a special case of the Glivenko-Cantelli Theorem, which relates to the idea of consistency
of empirical distribution functions. We state the theorem (Thangavelu, 2005) below without
proof. It should be noted that this theorem establishes
the relationship between the empirical distribution $\tilde{G}_N(t)$ and the theoretical distribution
function $G_P(t)$.

Theorem 2.3. (Glivenko-Cantelli Theorem) $\tilde{G}_N(t)$ converges almost surely to $G_P(t)$
uniformly in t, that is,

$$\lim_{N\to\infty}\ \sup_{t\in\Re}\left|\tilde{G}_N(t)-G_P(t)\right|=0.$$
We will now state Cramér’s Theorem (see Ferguson, 2002; Lehmann, 1999) below
without proof.

Cramér’s Theorem. Let Xn, n ≥ 1, be a sequence of $\Re^d$-valued random vectors such that

$$\sqrt{n}\,(X_n-\mu)\ \xrightarrow{\ L\ }\ X,$$

where X is N(0, Σ), and let g be differentiable at µ with derivative matrix $\dot{g}(\mu)$. Then

$$\sqrt{n}\,\left(g(X_n)-g(\mu)\right)\ \xrightarrow{\ L\ }\ N\!\left(0,\ \dot{g}(\mu)\,\Sigma\,\dot{g}(\mu)^T\right).$$

For a proof of this theorem, please refer to Ferguson (2002).
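A quick Monte Carlo check of Cramér's Theorem in the scalar case (our own illustration; the choice g(x) = x² and the exponential sample are assumptions, not from the text):

```python
import numpy as np

# Monte Carlo check of the delta method: sqrt(n)(g(Xbar_n) - g(mu))
# should be approximately N(0, g'(mu)^2 * sigma^2).
rng = np.random.default_rng(5)
n, reps = 2_000, 4_000
mu, sigma2 = 1.0, 1.0                      # Exp(1): mean 1, variance 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
g = lambda x: x ** 2                       # smooth g with g'(mu) = 2*mu = 2
z = np.sqrt(n) * (g(xbar) - g(mu))
print(round(z.var(), 2))                   # theoretical limit: g'(mu)^2 * sigma2 = 4
```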
Cramér’s theorem for smooth functions of sample moments is one of the basic results in
statistics used to diminish variation and allow the construction of confidence
intervals. In this chapter we extend the result to almost sure convergence towards the
limiting distribution (in the sense of the above almost sure weak convergence results
(2.3)). We use the notation x ≤ t for $x_i\le t_i$ for all i = 1, ..., d. For a random variable X we
let $G_X$ denote its cumulative distribution function (c.d.f.), which is a right continuous
function.
We say that a sequence of random vectors Xn converges to X almost
surely in the weak topology (for short, a.s. weakly) if for any bounded continuous
function $f:\Re^d\to\Re$,

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,f(X_k)=Ef(X)$$

holds. It is necessary and sufficient for this convergence that the relation holds for any
function f of the form $\mathbf{1}\{x\in\Re^d : x_i\le t_i,\ i=1,\ldots,d\}$, where $t=(t_1,\ldots,t_d)$ is a vector in $\Re^d$ at
which the distribution function $G_X$ of X is continuous. This follows from the standard
fact that these indicator functions form a generating class for all bounded continuous
functions. We write

$$X_n\ \xrightarrow{\ a.s.\ }\ X$$

to denote a.s. weak convergence. The set of continuity points will be denoted
by $D(G_X)$.
We first note that Slutzky’s lemma (see Ferguson, 2002; van der Vaart, 1998) is a standard
tool that can be used in conjunction with the Central Limit Theorem. To simplify
notation, we use t ≤ s for vectors $s=(s_1,\ldots,s_d)$, $t=(t_1,\ldots,t_d)\in\Re^d$ to denote the ordering
$t_i\le s_i$ for i = 1, ..., d of all coordinates. Likewise t < s means that, in addition, at least one
coordinate is strictly smaller.
Lemma 2.2.
1. If $a_n \in \Re$ ($n \ge 1$) is a sequence converging to $a \in \Re$ and if $X_n \xrightarrow{a.s.} X$, then $a_n X_n \xrightarrow{a.s.} aX$.
2. If $X_n - Y_n \xrightarrow{a.s.} 0$ and if $X_n \xrightarrow{a.s.} X$, then $Y_n \xrightarrow{a.s.} X$.
Proof: (1) We show that $a_n X_n$ converges to $aX$ almost surely in the weak topology. We use the equivalent definition of this convergence via indicator functions. If $a = 0$, then $aX$ is zero a.s., so $G_{aX}(t)$ vanishes for $t = (t_1, \dots, t_d) \in \Re^d$ such that $t_i < 0$ for some $1 \le i \le d$, and equals 1 for $t \in \Re^d$ for which each coordinate is strictly greater than 0. Let $t = (t_1, \dots, t_d) \in \Re^d$ be a continuity point of $G_0$ such that $G_0(t) = 0$. Note that the set $I = \{i : 1 \le i \le d;\ t_i < 0\}$ is non-empty. Let $\eta > 0$. Choose continuity points $u, v \in (\Re \cup \{\infty\})^d$ with $v_i = \infty$ for $i \notin I$, such that $G_X(v) < \eta$ and $G_X(u) > 1 - \eta$, and choose $n_0$ so large that for $k \ge n_0$: if $a_k > 0$ then $t_i/a_k \le v_i$ for $i \in I$, and if $a_k < 0$ then $t_i/a_k \ge u_i$ for $i \in I$.
Then

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\le t\}$$
$$= \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{a_kX_k\le t\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k=0}^{n}\frac{1}{k}\,1\{a_kX_k\le t\}$$
$$\quad + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k>0}^{n}\frac{1}{k}\prod_{i=1}^{d}1\{X_k(i)\le t_i/a_k\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k<0}^{n}\frac{1}{k}\prod_{i=1}^{d}1\{X_k(i)\ge t_i/a_k\}$$
$$\le \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\prod_{i\in I}1\{X_k(i)\le v_i\}\prod_{i\notin I}1\{X_k(i)<\infty\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\prod_{i\in I}1\{X_k(i)\ge u_i\}\prod_{i\notin I}1\{X_k(i)\ge t_i/a_k\}$$
$$= \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le v\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i(-\infty,u_i)\}$$
$$\le 2\eta,$$

since

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i(-\infty,u_i)\} = 1 - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le u\}.$$
Let $t = (t_1, \dots, t_d) \in \Re^d$ be a continuity point of $G_0$ such that $G_0(t) = 1$. This means that every coordinate $t_i$ is strictly positive. Since $\lim_{n\to\infty} a_n = a = 0$, there is $n_0 \in N$ such that for all $n \ge n_0$ we have that $t/a_n \ge u$ in the case $a_n > 0$, and $t/a_n \le v$ in the case $a_n < 0$. Then

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\le t\} = 1 - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\notin\textstyle\prod_i(-\infty,t_i]\}$$
$$= 1 - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{a_kX_k\notin\textstyle\prod_i(-\infty,t_i]\} - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k=0}^{n}\frac{1}{k}\,1\{a_kX_k\notin\textstyle\prod_i(-\infty,t_i]\}$$
$$\quad - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k>0}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i(-\infty,t_i/a_k]\} - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1,\ a_k<0}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i[t_i/a_k,\infty)\}$$
$$\ge 1 - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i(-\infty,u_i)\} - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le v\}$$
$$\ge 1 - 2\eta,$$

since

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\notin\textstyle\prod_i(-\infty,u_i]\} = 1 - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\in\textstyle\prod_i(-\infty,u_i]\} = 1 - G_X(u) < \eta.$$
It is left to consider the case when $a \ne 0$. We first show the result for $a > 0$. We may assume that every $a_k > 0$ by a similar argument as just used. In this case, let $t \in \Re^d$ be a continuity point of $G_{aX}$. Let $\eta > 0$. Since $t$ is a continuity point and since $G_{aX}(t) = G_X(t/a)$, there exists $\delta > 0$ such that

$$G_X((t/a) + \delta e) \le G_X(t/a) + \eta \quad\text{and}\quad G_X((t/a) - \delta e) \ge G_X(t/a) - \eta,$$

where $e = (1, 1, \dots, 1) \in \Re^d$. Choose $n_0 \in N$ such that for $n \ge n_0$,

$$\left\|\frac{t}{a_n} - \frac{t}{a}\right\| < \delta.$$
Then

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\le t\} = \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le t/a_k\}$$
$$\le \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{X_k\le t/a_k\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1}^{n}\frac{1}{k}\,1\{X_k\le (t/a)+\delta e\}$$
$$\le \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le (t/a)+\delta e\} = G_X((t/a)+\delta e) \le G_X(t/a) + \eta.$$

Letting $\eta$ (and hence $\delta$) tend to zero gives

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\le t\} \le G_X(t/a).$$
Similarly,

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{a_kX_k\le t\} = \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le t/a_k\}$$
$$\ge \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{X_k\le t/a_k\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1}^{n}\frac{1}{k}\,1\{X_k\le (t/a)-\delta e\}$$
$$= \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le (t/a)-\delta e\} - \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{X_k\le (t/a)-\delta e\}$$
$$= G_X((t/a)-\delta e) \ge G_X(t/a) - \eta.$$

Letting $\eta$ tend to zero shows that the lower bound for the limit is as well $G_X(t/a)$, completing the proof in the case when $a > 0$. Finally, the case $a < 0$ is carried out in the same way.
(2) Let $X_n - Y_n$ converge to zero almost surely and $X_n$ converge to $X$ weakly almost surely. Let $t$ be a continuity point of the distribution function $G_X$ of $X$. Let $\eta > 0$. Choose $\delta > 0$ such that

$$G_X(t + \delta e) \le G_X(t) + \eta \quad\text{and}\quad G_X(t - \delta e) \ge G_X(t) - \eta,$$

where $e$ is as in (1). Let $\Omega_0$ be a set of probability one such that for $\omega \in \Omega_0$ we have

$$\lim_{n\to\infty}\left[X_n(\omega) - Y_n(\omega)\right] = 0$$

and

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k(\omega)\le t\} = G_X(t).$$

Fix $\omega \in \Omega_0$. Choose $n_0$ such that for $n \ge n_0$ we have that $\|X_n(\omega) - Y_n(\omega)\| < \delta$. Then
$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{Y_k\le t\}$$
$$= \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n_0}\frac{1}{k}\,1\{X_k\le t\} + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=n_0+1}^{n}\frac{1}{k}\,1\{X_k\le t - Y_k + X_k\}$$
$$\le \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{X_k\le t+\delta e\} = G_X(t+\delta e) \le G_X(t) + \eta.$$

Letting $\eta$ tend to zero yields

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{Y_k\le t\} \le G_X(t).$$

The reverse inequality is obtained in the same way, which completes the proof.
This a.s. weak convergence is preserved under linear maps. If $\Sigma$ is a matrix and

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{Z_n\le t\} = G_Z(t), \quad t\in D(G_Z)\ a.s.,$$

then

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\Sigma Z_n\le t\} = G_{\Sigma Z}(t), \quad t\in D(G_{\Sigma Z})\ a.s.$$

Indeed, writing $A = \{z : \Sigma z \le t\}$ and letting $G_Z$ also denote the induced measure,

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\Sigma Z_n\le t\} = \lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{Z_n\in A\} = G_Z(A) = G_{\Sigma Z}(t)\ a.s.$$
Theorem 2.5. Let the function $g$ be a mapping from $\Re^d$ into $\Re^k$, where $g$ is continuously differentiable in a neighborhood of $\mu\in\Re^d$, and let $X_n$ be a sequence of $\Re^d$-valued random vectors satisfying the almost sure weak convergence property:

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(X_n-\mu)\le t\} = G_X(t), \quad t\in D(G_X)\ a.s., \qquad (2.4)$$

where $G_X$ is the cumulative distribution function (c.d.f.) of some random variable $X$. Suppose further that there is a set of indices $N_0 = \{n_1 < n_2 < \cdots\} \subset N$ such that

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1;\ n\notin N_0}^{N}\frac{1}{n} = 0 \qquad (2.5)$$

and

$$\lim_{k\to\infty} X_{n_k} = \mu \quad a.s. \qquad (2.6)$$

Then

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n)-g(\mu))\le t\} = G_{\dot g(\mu)X}(t), \quad t\in D(G_{\dot g(\mu)X})\ a.s. \qquad (2.7)$$
Proof: First note that we may assume that $X_n \to \mu$ a.s., since (2.5) shows that

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1;\ n\notin N_0}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n)-g(\mu))\le t\} = 0.$$

We shall show the convergence in equation (2.7) on the set of points $\omega\in\Omega$ for which $X_n(\omega)\to\mu$ and the convergence (2.4) holds. By the mean value theorem in integral form,

$$\sqrt n\,(g(X_n) - g(\mu)) = \int_0^1 \dot g(\mu + v(X_n-\mu))\,dv\ \sqrt n\,(X_n - \mu),$$

and by continuity of $\dot g$,

$$\dot g(\mu + v(X_n-\mu)) \to \dot g(\mu)\ a.s.,$$

uniformly in $0 \le v \le 1$.
Let $\delta > 0$. Choose $t_\delta\in D(G_X)$ so large that

$$\lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(X_n(\omega)-\mu)\le t_\delta\} \ge 1-\delta,$$

and let $N_1$ be the set of indices $n$ for which $\sqrt n(X_n(\omega)-\mu)\le t_\delta$ fails, so that

$$\limsup_{N\to\infty}\frac{1}{\log N}\sum_{n=1;\ n\in N_1}^{N}\frac{1}{n} \le 1 - \lim_{N\to\infty}\frac{1}{\log N}\sum_{n=1;\ n\notin N_1}^{N}\frac{1}{n} \le \delta.$$

Given $\varepsilon > 0$, choose $n_0$ so large that

$$\left\|\dot g(\mu + v(X_n(\omega)-\mu)) - \dot g(\mu)\right\| \le \varepsilon \qquad \forall\ 0\le v\le 1,\ n\ge n_0,$$

and then $N$ so large that

$$\frac{\log n_0}{\log N} < \delta$$

and, by the remark on linear maps above applied with $\Sigma = \dot g(\mu)$,

$$\sup_{t\in D(G_{\dot g(\mu)X})}\left|\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\dot g(\mu)\sqrt n(X_n-\mu)\le t\} - G_{\dot g(\mu)X}(t)\right| < \delta.$$

Then

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n(\omega))-g(\mu))\le t\}$$
$$= \frac{1}{\log N}\sum_{n=1}^{n_0-1}\frac{1}{n}\,1\{\cdot\} + \frac{1}{\log N}\sum_{n=n_0;\ n\in N_1}^{N}\frac{1}{n}\,1\{\cdot\} + \frac{1}{\log N}\sum_{n=n_0;\ n\notin N_1}^{N}\frac{1}{n}\,1\{\cdot\}$$
$$\le \frac{\log n_0}{\log N} + \delta + \frac{1}{\log N}\sum_{n=n_0;\ n\notin N_1}^{N}\frac{1}{n}\,1\Big\{\int_0^1\dot g(\mu+v(X_n-\mu))\,dv\ \sqrt n(X_n(\omega)-\mu)\le t\Big\}$$
$$\le 2\delta + \frac{1}{\log N}\sum_{n=1;\ n\notin N_1}^{N}\frac{1}{n}\,1\Big\{\dot g(\mu)\sqrt n(X_n(\omega)-\mu)\le t + \int_0^1\left[\dot g(\mu)-\dot g(\mu+v(X_n-\mu))\right]dv\ \sqrt n(X_n-\mu)\Big\}$$
$$\le 2\delta + \frac{1}{\log N}\sum_{n=1;\ n\notin N_1}^{N}\frac{1}{n}\,1\{\dot g(\mu)\sqrt n(X_n(\omega)-\mu)\le t + \varepsilon t_\delta\}$$
$$\le G_{\dot g(\mu)X}(t+\varepsilon t_\delta) + 3\delta, \qquad t\in D(G_{\dot g(\mu)X}).$$

In order to continue the proof of the theorem, first let $\varepsilon\to0$, then $\delta\to0$, to obtain

$$\limsup_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n)-g(\mu))\le t\} \le G_{\dot g(\mu)X}(t), \qquad t\in\Re^k.$$

In the same way,

$$\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n(\omega))-g(\mu))\le t\} \ge \frac{1}{\log N}\sum_{n=n_0;\ n\notin N_1}^{N}\frac{1}{n}\,1\{\cdot\}$$
$$= \frac{1}{\log N}\sum_{n=n_0;\ n\notin N_1}^{N}\frac{1}{n}\,1\Big\{\dot g(\mu)\sqrt n(X_n(\omega)-\mu)\le t + \int_0^1\left[\dot g(\mu)-\dot g(\mu+v(X_n-\mu))\right]dv\ \sqrt n(X_n-\mu)\Big\}$$
$$\ge \frac{1}{\log N}\sum_{n=1;\ n\notin N_1}^{N}\frac{1}{n}\,1\{\dot g(\mu)\sqrt n(X_n(\omega)-\mu)\le t - \varepsilon t_\delta\} - \frac{\log n_0}{\log N} - \delta$$
$$\ge G_{\dot g(\mu)X}(t-\varepsilon t_\delta) - 3\delta, \qquad t\in D(G_{\dot g(\mu)X}).$$

Letting again $\varepsilon\to0$ and then $\delta\to0$,

$$\liminf_{N\to\infty}\frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n(g(X_n)-g(\mu))\le t\} \ge G_{\dot g(\mu)X}(t-), \qquad t\in\Re^k,$$

where $G_{\dot g(\mu)X}(t-)$ denotes the left-hand limit of the c.d.f. Since $t$ is a continuity point of $G_{\dot g(\mu)X}$, the upper and lower bounds coincide, which proves (2.7).
Chapter 3

The Correlation Coefficient

3.1 The Population Correlation Coefficient

The population correlation coefficient $\rho_{xy}$ measures the strength and direction of the linear association between two quantitative variables from a bivariate distribution. The range of values for the correlation coefficient is from -1 to +1, where the closer $\rho_{xy}$ is to $\pm1$, the stronger the linear relationship. In this chapter, we study the almost sure asymptotic behavior of its sample estimator.
We begin with some more notation. Let $(X_n, Y_n)$ ($n \ge 1$) be a sequence of independent, identically distributed bivariate random vectors having a joint probability distribution $f(x,y)$ and satisfying the assumption of Theorem 2.2. We denote by $\mu_x = E(X_1)$ and $\mu_y = E(Y_1)$ the expectations of the marginals and by $\sigma_x^2$ and $\sigma_y^2$ their variances. Let the covariance of $X_1$ and $Y_1$ be denoted by $\sigma_{xy}$, and define the sample quantities

$$s_x^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar x)^2; \qquad s_y^2 = \frac{1}{n}\sum_{i=1}^{n}(Y_i-\bar y)^2;$$

and

$$s_{xy} = \frac{1}{n}\sum_{i=1}^{n}(X_i-\bar x)(Y_i-\bar y).$$

The population correlation coefficient is defined as

$$\rho_{xy} = \frac{\mathrm{Cov}(X,Y)}{\sigma_x\sigma_y},$$

where

$$\mathrm{Cov}(X,Y) = \begin{cases}\displaystyle\sum_x\sum_y (x-\mu_x)(y-\mu_y)\,f(x,y), & (X,Y)\ \text{discrete}\\[2mm] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(x-\mu_x)(y-\mu_y)\,f(x,y)\,dx\,dy, & (X,Y)\ \text{continuous.}\end{cases}$$
The covariance measures the strength of the linear relationship between X and Y. If large values of X tend to be observed with large values of Y, and small values of X tend to be observed with small values of Y, Cov(X,Y) will be positive. This positive relationship can be seen as follows: when X > µx, then Y > µy is likely to be true and the product (X-µx)(Y-µy) will be positive; if X < µx, then Y < µy is also likely to be true and the product will again tend to be positive. Conversely, if large values of X tend to be observed with small values of Y, or if small values of X tend to be observed with large values of Y, Cov(X,Y) will be negative. The negative relationship can also be seen: when X > µx, then Y < µy (and vice versa) is likely to be true and the product (X-µx)(Y-µy) will tend to be negative. Using the fact that Cov(X,Y) = E(XY) - µxµy, the population correlation coefficient can be written as

$$\rho_{xy} = \frac{E(XY) - \mu_x\mu_y}{\sigma_x\sigma_y}.$$

The proof of this version of the correlation coefficient can be viewed in Casella and Berger (2002).
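As a toy numerical check (the discrete pmf below is made up for illustration, not taken from the thesis) of the identity Cov(X,Y) = E(XY) - µxµy:

```python
# Joint pmf of a small discrete (X, Y) pair; probabilities sum to 1.
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.6}

mu_x = sum(p * x for (x, y), p in pmf.items())
mu_y = sum(p * y for (x, y), p in pmf.items())
e_xy = sum(p * x * y for (x, y), p in pmf.items())

# Definition form and shortcut form of the covariance.
cov_def = sum(p * (x - mu_x) * (y - mu_y) for (x, y), p in pmf.items())
cov_short = e_xy - mu_x * mu_y
print(cov_def, cov_short)
```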
3.2 The Sample Correlation Coefficient

Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be a random sample of $n$ pairs from a bivariate distribution. The sample correlation coefficient is

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum_{i=1}^{n}(x_i-\bar x)^2}\ \sqrt{\sum_{i=1}^{n}(y_i-\bar y)^2}} = \frac{\sum_{i=1}^{n}x_iy_i - \dfrac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}}{\sqrt{\sum_{i=1}^{n}x_i^2 - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}\ \sqrt{\sum_{i=1}^{n}y_i^2 - \dfrac{\left(\sum_{i=1}^{n}y_i\right)^2}{n}}}.$$

Let the sample correlation coefficient $r_{xy}$ be a point estimate of the population correlation coefficient $\rho_{xy}$. We can define the above sample correlation coefficient using sample moments. If $x_1, x_2, \dots, x_n$ is a random sample, the sample moments are defined as $m_x = \frac{1}{n}\sum_{i=1}^n x_i$, $m_{xx} = \frac{1}{n}\sum_{i=1}^n x_i^2$, $m_{xy} = \frac{1}{n}\sum_{i=1}^n x_iy_i$, etc. The following steps rewrite $r_{xy}$ in terms of the sample moments:
$$r_{xy} = \frac{\sum_{i=1}^{n}x_iy_i - \dfrac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}}{\sqrt{\sum_{i=1}^{n}x_i^2 - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n}}\ \sqrt{\sum_{i=1}^{n}y_i^2 - \dfrac{\left(\sum_{i=1}^{n}y_i\right)^2}{n}}}
= \frac{\dfrac{1}{n}\left[\sum_{i=1}^{n}x_iy_i - \dfrac{\sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{n}\right]}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}x_i^2 - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n^2}}\ \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}y_i^2 - \dfrac{\left(\sum_{i=1}^{n}y_i\right)^2}{n^2}}}$$

$$= \frac{\dfrac{\sum_{i=1}^{n}x_iy_i}{n} - \dfrac{\sum_{i=1}^{n}x_i}{n}\cdot\dfrac{\sum_{i=1}^{n}y_i}{n}}{\sqrt{\dfrac{\sum_{i=1}^{n}x_i^2}{n} - \dfrac{\left(\sum_{i=1}^{n}x_i\right)^2}{n^2}}\ \sqrt{\dfrac{\sum_{i=1}^{n}y_i^2}{n} - \dfrac{\left(\sum_{i=1}^{n}y_i\right)^2}{n^2}}}
= \frac{m_{xy} - m_x\cdot m_y}{\sqrt{m_{xx} - (m_x)^2}\ \sqrt{m_{yy} - (m_y)^2}}.$$
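The moment form above can be checked numerically; the small data set below is made up for illustration (it is not from the thesis).

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(x)

# Moment form: r = (m_xy - m_x m_y) / sqrt((m_xx - m_x^2)(m_yy - m_y^2)).
m_x = sum(x) / n
m_y = sum(y) / n
m_xx = sum(v * v for v in x) / n
m_yy = sum(v * v for v in y) / n
m_xy = sum(a * b for a, b in zip(x, y)) / n
r_moment = (m_xy - m_x * m_y) / math.sqrt((m_xx - m_x ** 2) * (m_yy - m_y ** 2))

# Deviation form: sum of cross-products over the product of root sums of squares.
sxy = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))
sxx = sum((a - m_x) ** 2 for a in x)
syy = sum((b - m_y) ** 2 for b in y)
r_dev = sxy / math.sqrt(sxx * syy)
print(r_moment, r_dev)
```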
The definition of the sample correlation coefficient using sample moments will help in the developments below. When developing inferential statistical techniques (i.e., hypothesis testing and confidence intervals) for the population correlation coefficient, the asymptotic distribution of the sample correlation coefficient must be considered (see Ferguson, 2002). First, we let $A$ be the symmetric matrix with entries

$$a_{11} = \mathrm{Cov}\left((X_1-\mu_x)^2, (X_1-\mu_x)^2\right) = E\left((X_1-\mu_x)^4\right) - \sigma_x^4,$$
$$a_{22} = \mathrm{Cov}\left((X_1-\mu_x)(Y_1-\mu_y), (X_1-\mu_x)(Y_1-\mu_y)\right) = E\left((X_1-\mu_x)^2(Y_1-\mu_y)^2\right) - \sigma_{xy}^2,$$
$$a_{12} = a_{21} = \mathrm{Cov}\left((X_1-\mu_x)^2, (X_1-\mu_x)(Y_1-\mu_y)\right) = E\left((X_1-\mu_x)^3(Y_1-\mu_y)\right) - \sigma_x^2\sigma_{xy},$$
$$a_{13} = a_{31} = \mathrm{Cov}\left((X_1-\mu_x)^2, (Y_1-\mu_y)^2\right) = E\left((X_1-\mu_x)^2(Y_1-\mu_y)^2\right) - \sigma_x^2\sigma_y^2,$$
$$a_{32} = a_{23} = \mathrm{Cov}\left((Y_1-\mu_y)^2, (Y_1-\mu_y)(X_1-\mu_x)\right) = E\left((X_1-\mu_x)(Y_1-\mu_y)^3\right) - \sigma_{xy}\sigma_y^2.$$
In order to prepare for the following theorem we note the following lemma.

Lemma 3.1: Let $(X_i, Y_i)$ ($i \ge 1$) be as above and define $S_n = (S_{n1}, S_{n2}, S_{n3}, S_{n4}, S_{n5})$ by

$$S_{n1} = \frac{1}{\sqrt n}\sum_{k=1}^{n}(X_k-\mu_x); \qquad S_{n2} = \frac{1}{\sqrt n}\sum_{k=1}^{n}(Y_k-\mu_y);$$
$$S_{n3} = \frac{1}{\sqrt n}\sum_{k=1}^{n}(X_k-\mu_x)^2; \qquad S_{n4} = \frac{1}{\sqrt n}\sum_{k=1}^{n}(Y_k-\mu_y)^2;$$

and

$$S_{n5} = \frac{1}{\sqrt n}\sum_{k=1}^{n}(X_k-\mu_x)(Y_k-\mu_y).$$

Then we have the almost sure weak convergence $S_n \xrightarrow{a.s.} N(b, B)$, where

$$b_{11} = \sigma_x^2, \quad b_{22} = \sigma_y^2, \quad b_{33} = \mathrm{Var}\left((X_1-\mu_x)^2 - \sigma_x^2\right), \quad b_{44} = \mathrm{Var}\left((Y_1-\mu_y)^2 - \sigma_y^2\right),$$
$$b_{55} = \mathrm{Var}\left((X_1-\mu_x)(Y_1-\mu_y) - \sigma_{xy}\right), \quad b_{12} = b_{21} = \sigma_{xy}.$$

Proof: This follows directly from the almost sure central limit theorem for i.i.d. random vectors applied to

$$Z_n = \left(X_n,\ Y_n,\ (X_n-\mu_x)^2,\ (Y_n-\mu_y)^2,\ (X_n-\mu_x)(Y_n-\mu_y)\right).$$
Recall that the population correlation coefficient is $\rho = \rho_{xy} = \sigma_{xy}/(\sigma_x\sigma_y)$, and it is estimated by

$$r = r_{xy} = \frac{s_{xy}}{s_x s_y}.$$

The following theorem and outlined proof can be found in Ferguson (2002).

Theorem 3.1. Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be a random sample of $n$ pairs drawn from a bivariate population with finite fourth moments, $EX^4$ and $EY^4$. Then:

(1) The statistic

$$\sqrt n\left[\begin{pmatrix}s_x^2\\ s_{xy}\\ s_y^2\end{pmatrix} - \begin{pmatrix}\sigma_x^2\\ \sigma_{xy}\\ \sigma_y^2\end{pmatrix}\right]$$

converges a.s. weakly to a trivariate normal distribution with mean zero, whose covariance matrix is computed below;

(2) $\sqrt n\,(r_n - \rho) \xrightarrow{a.s.} N(0, \gamma^2)$, where $\gamma^2$ is given below.

Proof: Note that the assumption of finite fourth moments ensures the existence and finiteness of all (mixed) moments used below; this follows from Hölder's inequality. We may assume $\mu_x = \mu_y = 0$, since $r$ and $\rho$ do not depend on location. Now by the aforementioned lemma, $S_n = (S_{n1}, S_{n2}, S_{n3}, S_{n4}, S_{n5})$ is a.s. asymptotically normal. Let $g$ be a mapping $g : \Re^5 \to \Re^3$, and apply the a.s. version of Cramér's theorem with

$$g(x) = \begin{pmatrix}g_1(x_1,\dots,x_5)\\ g_2(x_1,\dots,x_5)\\ g_3(x_1,\dots,x_5)\end{pmatrix} = \begin{pmatrix}x_3 - x_1^2\\ x_5 - x_1x_2\\ x_4 - x_2^2\end{pmatrix}.$$
Note that the mean of $S_n^* = \frac{1}{\sqrt n}S_n$ is $\mu = (0, 0, \sigma_x^2, \sigma_y^2, \sigma_{xy})$ and

$$\dot g(x) = \begin{pmatrix}\frac{\partial}{\partial x_1}g_1(x) & \cdots & \frac{\partial}{\partial x_5}g_1(x)\\ \vdots & & \vdots\\ \frac{\partial}{\partial x_1}g_3(x) & \cdots & \frac{\partial}{\partial x_5}g_3(x)\end{pmatrix} = \begin{pmatrix}-2x_1 & 0 & 1 & 0 & 0\\ -x_2 & -x_1 & 0 & 0 & 1\\ 0 & -2x_2 & 0 & 1 & 0\end{pmatrix}.$$

It should be mentioned that we can prove this theorem by defining the following moment matrix:

$$x_n = \begin{pmatrix}x_1 = m_x\\ x_2 = m_y\\ x_3 = m_{xx}\\ x_4 = m_{xy}\\ x_5 = m_{yy}\end{pmatrix}.$$

Please refer to Ferguson (2002) to follow the proof steps that utilize this $x_n$ matrix. Evaluating at $\mu$,

$$\dot g(\mu) = \begin{pmatrix}0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 1 & 0\end{pmatrix},$$
and

$$\sqrt n\,(g(S_n^*) - g(\mu)) = \sqrt n\left[\begin{pmatrix}s_x^2\\ s_{xy}\\ s_y^2\end{pmatrix} - \begin{pmatrix}\sigma_x^2\\ \sigma_{xy}\\ \sigma_y^2\end{pmatrix}\right] \xrightarrow{a.s.} N\left(0,\ \dot g(\mu)\,B\,\dot g(\mu)^T\right),$$

where

$$\dot g(\mu)\,B\,\dot g(\mu)^T = \begin{pmatrix}0&0&1&0&0\\ 0&0&0&0&1\\ 0&0&0&1&0\end{pmatrix}\begin{pmatrix}b_{11}&b_{12}&b_{13}&b_{14}&b_{15}\\ b_{21}&b_{22}&b_{23}&b_{24}&b_{25}\\ b_{31}&b_{32}&b_{33}&b_{34}&b_{35}\\ b_{41}&b_{42}&b_{43}&b_{44}&b_{45}\\ b_{51}&b_{52}&b_{53}&b_{54}&b_{55}\end{pmatrix}\begin{pmatrix}0&0&0\\ 0&0&0\\ 1&0&0\\ 0&0&1\\ 0&1&0\end{pmatrix}$$

$$= \begin{pmatrix}b_{31}&b_{32}&b_{33}&b_{34}&b_{35}\\ b_{51}&b_{52}&b_{53}&b_{54}&b_{55}\\ b_{41}&b_{42}&b_{43}&b_{44}&b_{45}\end{pmatrix}\begin{pmatrix}0&0&0\\ 0&0&0\\ 1&0&0\\ 0&0&1\\ 0&1&0\end{pmatrix} = \begin{pmatrix}b_{33}&b_{35}&b_{34}\\ b_{53}&b_{55}&b_{54}\\ b_{43}&b_{45}&b_{44}\end{pmatrix}.$$

Therefore,

$$\sqrt n\,(g(S_n^*) - g(\mu)) = \sqrt n\left[\begin{pmatrix}s_x^2\\ s_{xy}\\ s_y^2\end{pmatrix} - \begin{pmatrix}\sigma_x^2\\ \sigma_{xy}\\ \sigma_y^2\end{pmatrix}\right] \xrightarrow{a.s.} N\left(0,\ \begin{pmatrix}b_{33}&b_{35}&b_{34}\\ b_{53}&b_{55}&b_{54}\\ b_{43}&b_{45}&b_{44}\end{pmatrix}\right).$$
(2) Let us again assume $\mu_x = \mu_y = 0$ since $r$ does not depend on location. In order to show the second part of the theorem, use part (1) together with Cramér's theorem applied to the function

$$h(y_1, y_2, y_3) = \frac{y_2}{\sqrt{y_1 y_3}},$$

for which

$$h(s_x^2, s_{xy}, s_y^2) = \frac{s_{xy}}{s_x s_y};$$
$$\dot h(s_x^2, s_{xy}, s_y^2) = \left(-\frac{1}{2}\frac{s_{xy}}{s_x^3 s_y},\ \frac{1}{s_x s_y},\ -\frac{1}{2}\frac{s_{xy}}{s_x s_y^3}\right);$$

and

$$\dot h(\sigma_x^2, \sigma_{xy}, \sigma_y^2) = \left(-\frac{1}{2}\frac{\sigma_{xy}}{\sigma_x^3\sigma_y},\ \frac{1}{\sigma_x\sigma_y},\ -\frac{1}{2}\frac{\sigma_{xy}}{\sigma_x\sigma_y^3}\right),$$

since

$$\dot h(y_1, y_2, y_3) = \left(-\frac{y_2}{2\sqrt{y_1^3 y_3}},\ \frac{1}{\sqrt{y_1 y_3}},\ -\frac{y_2}{2\sqrt{y_1 y_3^3}}\right).$$

We obtain $\sqrt n\left(h(s_x^2, s_{xy}, s_y^2) - h(\sigma_x^2, \sigma_{xy}, \sigma_y^2)\right) \xrightarrow{a.s.} N(0, \gamma^2)$, where, writing $C$ for the covariance matrix from part (1),

$$\gamma^2 = \dot h(\sigma_x^2, \sigma_{xy}, \sigma_y^2)\,C\,\dot h(\sigma_x^2, \sigma_{xy}, \sigma_y^2)^T = \left(-\frac{\sigma_{xy}}{2\sigma_x^3\sigma_y},\ \frac{1}{\sigma_x\sigma_y},\ -\frac{\sigma_{xy}}{2\sigma_x\sigma_y^3}\right)\begin{pmatrix}b_{33}&b_{35}&b_{34}\\ b_{53}&b_{55}&b_{54}\\ b_{43}&b_{45}&b_{44}\end{pmatrix}\left(-\frac{\sigma_{xy}}{2\sigma_x^3\sigma_y},\ \frac{1}{\sigma_x\sigma_y},\ -\frac{\sigma_{xy}}{2\sigma_x\sigma_y^3}\right)^T.$$

Expanding this quadratic form and using $\rho = \sigma_{xy}/(\sigma_x\sigma_y)$ together with the symmetry of the matrix,

$$\gamma^2 = \frac{1}{4}\rho^2\left[\frac{b_{33}}{\sigma_x^4} + \frac{2b_{34}}{\sigma_x^2\sigma_y^2} + \frac{b_{44}}{\sigma_y^4}\right] - \rho\left[\frac{b_{35}}{\sigma_x^3\sigma_y} + \frac{b_{45}}{\sigma_x\sigma_y^3}\right] + \frac{b_{55}}{\sigma_x^2\sigma_y^2}.$$
3.4 Connecting the ASCLT and the Correlation Coefficient

In the remaining parts of the chapter, we will discuss the ASCLT and how we can apply it to the estimation of the correlation coefficient. Our proposal in this thesis is to connect these two ideas and arrive at a confidence interval method for the population correlation coefficient. In fact, our main goal in this ASCLT-based theory of confidence interval estimation will be the estimation of the quantiles of the distribution of the correlation coefficient statistic. In this section, the results of a new version of the ASCLT for the correlation coefficient are presented.
Remark 3.1

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\{\sqrt k\,(r_k - \rho)\le t\} = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi\gamma^2}}\exp\left[-\frac{u^2}{2\gamma^2}\right]du.$$

1) This convergence is not changed when replacing the norming sequence $\frac{1}{\log n}$ by a sequence $a_n$ with $\lim_{n\to\infty}\frac{a_n}{\log n} = 1$. A particular sequence $a_n$ is of the form $a_n = \sum_{k=1}^{n}\frac{1}{k}$. This sequence turns the left hand side in the above equation into a distribution function.
2) Another modification of the above result arises when the weights $1/k$ are changed so that

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\left|w_k - \frac{1}{k}\right| = \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{m(n)}\left(w_k + \frac{1}{k}\right) + \lim_{n\to\infty}\frac{1}{\log n}\sum_{k=m(n)+1}^{n}\left|w_k - \frac{1}{k}\right| = 0,$$

by choosing $m(n)$ increasing so slowly that the first limit is zero. In particular, we can choose $w_k = 0$ for $k \le K$ and $w_k = 1/k$ otherwise. This also shows that one may take weights $w_{kn}$ depending on $n$:

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}w_{kn}\,1\{\sqrt k\,(r_k - \rho)\le t\} = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi\gamma^2}}\exp\left[-\frac{u^2}{2\gamma^2}\right]du.$$

If $K(n)$ is increasing sufficiently slowly, then we may put $w_{kn} = 0$ for $k \le K(n)$ and $w_{kn} = \frac{1}{k}$ otherwise, replacing as well $\log n$ by $\sum_{k=K(n)+1}^{n}\frac{1}{k}$.
3) Since every measurable set of full measure under the distribution of an i.i.d. sequence can be taken invariant under finite permutations of the coordinates, we obtain

$$\lim_{n\to\infty}\frac{1}{m(n)\log n}\sum_{\tau\in M(n)}\sum_{k=1}^{n}\frac{1}{k}\,1\{\sqrt k\,(r_k(\tau(\omega)) - \rho)\le t\} = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi\gamma^2}}\exp\left[-\frac{u^2}{2\gamma^2}\right]du,$$

where $M(n)$ is a family of maps on the probability space which permute the values of $X_1, \dots, X_n$ and which has cardinality $m(n)$. In practice one chooses $M(n)$ at random.
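Part 1) of Remark 3.1 can be sketched numerically: the harmonic numbers $a_n = \sum_{k\le n} 1/k$ satisfy $a_n/\log n \to 1$, and normalizing the weights $1/k$ by $a_n$ makes them sum exactly to one, which turns the log-average into a genuine distribution function. The sample size below is an arbitrary choice.

```python
import math

n = 1_000_000
a_n = sum(1.0 / k for k in range(1, n + 1))            # harmonic number H_n
total = sum(1.0 / (k * a_n) for k in range(1, n + 1))  # normalized weights sum to 1
print(a_n, math.log(n), a_n / math.log(n), total)
```

The gap $a_n - \log n$ approaches the Euler-Mascheroni constant ($\approx 0.5772$), which quantifies how far the raw $1/\log n$ norming is from a proper distribution function at finite $n$.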
Theorem 3.1 is the analogue of the classical asymptotic distribution result for the correlation coefficient. In the non-a.s. case it does not directly provide confidence intervals because $\gamma^2$ has to be estimated. In the almost sure case this is not necessary; however, as in the classical case we need to investigate the effect of a variance stabilizing transformation. The following corollary treats the bivariate normal case.
Corollary 3.1 If the distribution of $(X_1, Y_1)$ is bivariate normal, then $\gamma^2 = (1-\rho^2)^2$, so that

$$\sqrt n\,(r_n - \rho) \xrightarrow{a.s.} N\left(0, (1-\rho^2)^2\right).$$

Proof: Although the result is well known, we give a proof for completeness. For normal distributions we have

$$b_{11} = \sigma_x^2, \quad b_{22} = \sigma_y^2, \quad b_{12} = b_{21} = \sigma_{xy},$$
$$b_{33} = \mathrm{Var}\left((X_1-\mu_x)^2 - \sigma_x^2\right) = 3\sigma_x^4 - \sigma_x^4 = 2\sigma_x^4, \qquad b_{44} = \mathrm{Var}\left((Y_1-\mu_y)^2 - \sigma_y^2\right) = 2\sigma_y^4,$$
$$b_{34} = 2\sigma_{xy}^2, \quad b_{35} = 2\sigma_x^2\sigma_{xy}, \quad b_{45} = 2\sigma_{xy}\sigma_y^2, \quad b_{55} = \sigma_x^2\sigma_y^2 + \sigma_{xy}^2;$$

hence

$$\gamma^2 = \frac{1}{4}\rho^2\left[\frac{b_{33}}{\sigma_x^4} + \frac{2b_{34}}{\sigma_x^2\sigma_y^2} + \frac{b_{44}}{\sigma_y^4}\right] - \rho\left[\frac{b_{35}}{\sigma_x^3\sigma_y} + \frac{b_{45}}{\sigma_x\sigma_y^3}\right] + \frac{b_{55}}{\sigma_x^2\sigma_y^2}$$
$$= \frac{1}{4}\rho^2\left[\frac{2\sigma_x^4}{\sigma_x^4} + \frac{4\sigma_{xy}^2}{\sigma_x^2\sigma_y^2} + \frac{2\sigma_y^4}{\sigma_y^4}\right] - \rho\left[\frac{2\sigma_x^2\sigma_{xy}}{\sigma_x^3\sigma_y} + \frac{2\sigma_{xy}\sigma_y^2}{\sigma_x\sigma_y^3}\right] + \frac{\sigma_x^2\sigma_y^2 + \sigma_{xy}^2}{\sigma_x^2\sigma_y^2}$$
$$= \rho^2 + \rho^4 - 4\rho^2 + 1 + \rho^2 = (1-\rho^2)^2.$$
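The variance $(1-\rho^2)^2$ in Corollary 3.1 can be checked by Monte Carlo. The sketch below (sample size, replication count, and seed are arbitrary choices, not prescribed by the thesis) estimates the variance of $\sqrt n\,(r_n - \rho)$ for bivariate normal data.

```python
import math
import random

random.seed(1)
rho = 0.6
n = 400
reps = 1500
vals = []
for _ in range(reps):
    xs = []
    ys = []
    for _ in range(n):
        z1 = random.gauss(0.0, 1.0)
        z2 = random.gauss(0.0, 1.0)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)  # Corr(X, Y) = rho
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    vals.append(math.sqrt(n) * (sxy / math.sqrt(sxx * syy) - rho))

mean_v = sum(vals) / reps
var_v = sum((v - mean_v) ** 2 for v in vals) / reps
print(var_v, (1.0 - rho * rho) ** 2)  # empirical vs. theoretical (1 - rho^2)^2
```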
The following theorem can be found in Ferguson (2002) and van der Vaart (1998).

Theorem 3.2. Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be a random sample of $n$ pairs drawn from a bivariate population with finite fourth moments, $EX^4$ and $EY^4$, and with $-1 < \rho < 1$. Then

$$\frac{\sqrt n}{2}\left(\log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho}\right) \xrightarrow{a.s.} N(0, \tau^2),$$

where

$$\tau^2 = \frac{\rho^2}{4(1-\rho^2)^2}\left[\frac{b_{33}}{\sigma_x^4} + 2\frac{b_{34}}{\sigma_x^2\sigma_y^2} + \frac{b_{44}}{\sigma_y^4}\right] - \frac{\rho}{(1-\rho^2)^2}\left[\frac{b_{35}}{\sigma_x^3\sigma_y} + \frac{b_{45}}{\sigma_x\sigma_y^3}\right] + \frac{b_{55}}{(1-\rho^2)^2\sigma_x^2\sigma_y^2}.$$

Proof: Let $g(\rho) = \frac{1}{2}\log\frac{1+\rho}{1-\rho}$ for $-1 < \rho < 1$, so $g$ is a differentiable function $g : (-1, 1) \to \Re$. We have $\dot g(\rho) = \frac{1}{1-\rho^2}$, and hence by Cramér's theorem and Theorem 3.1, part (2), we obtain

$$\sqrt n\,(g(r_n) - g(\rho)) \xrightarrow{a.s.} N(0, \tau^2),$$

where

$$\tau^2 = \dot g(\rho)^2\gamma^2 = \frac{\rho^2}{4(1-\rho^2)^2}\left[\frac{b_{33}}{\sigma_x^4} + 2\frac{b_{34}}{\sigma_x^2\sigma_y^2} + \frac{b_{44}}{\sigma_y^4}\right] - \frac{\rho}{(1-\rho^2)^2}\left[\frac{b_{35}}{\sigma_x^3\sigma_y} + \frac{b_{45}}{\sigma_x\sigma_y^3}\right] + \frac{b_{55}}{(1-\rho^2)^2\sigma_x^2\sigma_y^2}.$$
Remark 3.2 The last theorem states that no matter what the asymptotic variance will be, we can estimate the quantiles of the asymptotic statistic from the data. With that in mind, we have

$$\lim_{n\to\infty}\frac{1}{\log n}\sum_{k=1}^{n}\frac{1}{k}\,1\left\{\frac{\sqrt k}{2}\left(\log\frac{1+r_k}{1-r_k} - \log\frac{1+\rho}{1-\rho}\right)\le t\right\} = \int_{-\infty}^{t}\frac{1}{\sqrt{2\pi\tau^2}}\exp\left[-\frac{u^2}{2\tau^2}\right]du.$$
The classical law of the iterated logarithm (see van der Vaart, 1998; Dudley, 1989) for independent, identically distributed random variables $Z_n$ with finite variance $\sigma^2$ states that almost surely

$$\limsup_{n\to\infty}\frac{Z_1 + \cdots + Z_n - nEZ_1}{\sqrt{2\sigma^2 n\log\log n}} = 1.$$

Likewise, the liminf is equal to $-1$. We shall use this result below. In the sequel we will need the law of iterated logarithm for the statistics used for a.s. confidence intervals: in a first step for

$$r = r_n = \frac{s_{xy}}{s_x s_y}$$

and then for the transformed statistic

$$\frac{1}{2}\log\frac{1+r_n}{1-r_n}.$$
Theorem 3.3. Let $(X_n, Y_n)$ be an i.i.d. sequence of bivariate random vectors with finite fourth moments, $\sigma_{xy} = \mathrm{Cov}(X_1, Y_1)$, and $b_{55}$ as defined in section 3.3. Then almost surely

$$\limsup_{n\to\infty}\frac{\left|s_{xy} - n\sigma_{xy}\right|}{\sqrt{2n\log\log n}} \le \sqrt{b_{55}} + 2|\mu_x|\sigma_y + 2|\mu_y|\sigma_x.$$

Proof: This follows from the classical law of iterated logarithm for independent identically distributed random variables with finite second moment, together with the definition of the constants $b_{ij}$, as follows. Let $m_x = \sum_{k=1}^{n}X_k$ and $m_y = \sum_{k=1}^{n}Y_k$ and recall that here

$$s_{xy} = \sum_{k=1}^{n}\left(X_k - \frac{1}{n}m_x\right)\left(Y_k - \frac{1}{n}m_y\right).$$

By the law of iterated logarithm for i.i.d. sequences, applied to $Z_n = (X_n - \mu_x)(Y_n - \mu_y)$, we obtain

$$\limsup_{n\to\infty}\frac{\left|\sum_{k=1}^{n}(X_k-\mu_x)(Y_k-\mu_y) - n\sigma_{xy}\right|}{\sqrt{2nb_{55}\log\log n}} = 1$$

and

$$\limsup_{n\to\infty}\frac{\left|\sum_{k=1}^{n}(X_k-\mu_x)\right|}{\sqrt{2n\sigma_x^2\log\log n}} = \limsup_{n\to\infty}\frac{\left|m_x - n\mu_x\right|}{\sqrt{2n\sigma_x^2\log\log n}} = 1.$$
And likewise for the other marginal process $Y_k$. Therefore,

$$\limsup_{n\to\infty}\frac{\left|s_{xy} - n\sigma_{xy}\right|}{\sqrt{2n\log\log n}} = \limsup_{n\to\infty}\frac{\left|\sum_{k=1}^{n}\left(X_k - \frac{1}{n}m_x\right)\left(Y_k - \frac{1}{n}m_y\right) - n\sigma_{xy}\right|}{\sqrt{2n\log\log n}}$$
$$\le \limsup_{n\to\infty}\frac{\left|\sum_{k=1}^{n}(X_k-\mu_x)(Y_k-\mu_y) - n\sigma_{xy}\right|}{\sqrt{2n\log\log n}} + \limsup_{n\to\infty}\frac{\left|\sum_{k=1}^{n}\left(X_k\left(\mu_y - \frac{1}{n}m_y\right) + Y_k\left(\mu_x - \frac{1}{n}m_x\right) + \frac{1}{n^2}m_xm_y - \mu_x\mu_y\right)\right|}{\sqrt{2n\log\log n}}$$
$$= \sqrt{b_{55}} + \limsup_{n\to\infty}\frac{\left|\mu_y(m_x - n\mu_x) + \mu_x(m_y - n\mu_y) - \frac{1}{n}m_x(m_y - n\mu_y) - \mu_y(m_x - n\mu_x)\right|}{\sqrt{2n\log\log n}}$$
$$\le \sqrt{b_{55}} + 2|\mu_x|\sigma_y + 2|\mu_y|\sigma_x,$$

where the four terms in the numerator were bounded separately using the law of iterated logarithm for $m_x - n\mu_x$ and $m_y - n\mu_y$, and where we also used the strong law of large numbers such that almost surely

$$\lim_{n\to\infty}\frac{1}{n}m_x = \mu_x \quad\text{and}\quad \lim_{n\to\infty}\frac{1}{n}m_y = \mu_y.$$
Corollary 3.2. Under the assumptions of Theorem 3.3, almost surely

$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log\log n}}\,\left|r_n - \rho\right| \le \Gamma,$$

where $\Gamma$ is the constant defined at the end of the proof.

Proof: Write

$$s_x^2 = \sum_{i=1}^{n}X_i^2 - n\left(\frac{1}{n}m_x\right)^2.$$
By the strong law of large numbers we obtain the following almost sure results,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}(X_i - EX_1)^2 = \sigma_x^2$$

and

$$\lim_{n\to\infty}\left[(EX_1)^2 - \left(\frac{1}{n}\sum_{k=1}^{n}X_k\right)^2\right] = 0.$$

Therefore $\lim_{n\to\infty}\frac{1}{n}s_x^2 = \sigma_x^2$ almost surely. Now by exchanging the roles of the X and Y variables we also have

$$\lim_{n\to\infty}\frac{1}{n}s_y^2 = \sigma_y^2;$$

hence,

$$\lim_{n\to\infty}\frac{1}{n}s_xs_y = \sigma_x\sigma_y.$$

By Theorem 3.3,

$$\limsup_{n\to\infty}\frac{1}{\sqrt{2n\log\log n}}\left|s_{xy} - n\sigma_{xy}\right| \le \sqrt{b_{55}} + 2|\mu_x|\sigma_y + 2|\mu_y|\sigma_x,$$

and therefore
$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log\log n}}\,\left|r_n - \rho\right| = \limsup_{n\to\infty}\frac{1}{\sqrt{2n\log\log n}}\left|\frac{s_{xy}}{\frac{1}{n}s_xs_y} - \frac{n\sigma_{xy}}{\sigma_x\sigma_y}\right|$$
$$\le \limsup_{n\to\infty}\frac{1}{\sqrt{2n\log\log n}}\,\frac{\left|s_{xy} - n\sigma_{xy}\right|}{\frac{1}{n}s_xs_y} + \limsup_{n\to\infty}\frac{\sigma_{xy}}{\frac{1}{n}s_xs_y\ \sigma_x\sigma_y}\cdot\frac{\left|s_xs_y - n\sigma_x\sigma_y\right|}{\sqrt{2n\log\log n}}.$$

Since (for centered variables) $s_x^2s_y^2$ is, up to lower order terms, the non-degenerate von Mises statistic $\sum_{k,l=1}^{n}X_k^2Y_l^2$ with $\lim_{n\to\infty}\frac{1}{n^2}s_x^2s_y^2 = \sigma_x^2\sigma_y^2$, it follows from the bounded law of iterated logarithm for von Mises statistics (see Dehling, Denker, Philipp (1984)) that there exists a constant $C$ such that

$$\limsup_{n\to\infty}\frac{1}{n^{3/2}\sqrt{\log\log n}}\left|s_x^2s_y^2 - n^2\sigma_x^2\sigma_y^2\right| \le C < \infty.$$

For every $\eta > 0$ and all sufficiently large $n$,

$$(2\sigma_x\sigma_y - \eta)\,\frac{\left|s_xs_y - n\sigma_x\sigma_y\right|}{\sqrt{n\log\log n}} \le \frac{\left|s_x^2s_y^2 - n^2\sigma_x^2\sigma_y^2\right|}{n^{3/2}\sqrt{\log\log n}} \le C + \eta.$$

Setting

$$\Gamma = \frac{\sqrt{b_{55}}}{\sigma_x\sigma_y} + \frac{2|\mu_x|}{\sigma_x} + \frac{2|\mu_y|}{\sigma_y} + \frac{\sigma_{xy}}{2\sigma_x^3\sigma_y^3}\,C,$$

the claim follows.
Corollary 3.3. Under the assumptions of Theorem 3.3, almost surely

$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log\log n}}\,\left|\log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho}\right| \le \Lambda < \infty.$$

Proof: First use the Taylor series expansion of $\log\frac{1+t}{1-t}$ around $t = \rho$ up to the first order term:

$$\log\frac{1+t}{1-t} = \log\frac{1+\rho}{1-\rho} + (t-\rho)\left.\frac{d}{dt}\log\frac{1+t}{1-t}\right|_{t=\xi} = \log\frac{1+\rho}{1-\rho} + (t-\rho)\,\frac{2}{1-\xi^2},$$

where $\xi = \xi(t)$ is a point in the interval from $t$ to $\rho$. Note that we may assume that $\rho \ne \pm1$, because otherwise the random variables $X_1$ and $Y_1$ are collinear. Taking $t = r_n$, it follows that

$$\limsup_{n\to\infty}\sqrt{\frac{n}{2\log\log n}}\,\left|\log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho}\right| = \limsup_{n\to\infty}\sqrt{\frac{n}{2\log\log n}}\,\left|r_n - \rho\right|\frac{2}{1-\xi(r_n)^2} \le \Lambda := \frac{2\Gamma}{1-\rho^2},$$

using Corollary 3.2 together with $r_n \to \rho$ a.s., so that $\xi(r_n) \to \rho$.
Chapter 4

In this chapter we will use the notation introduced in chapter 2. This notation will be applied to the development of the ASCLT-based confidence interval for the population correlation coefficient.

In this section, we will recall some of the previously mentioned distribution functions and develop additional ones. After this has been completed we can continue developing the confidence interval procedures. Recall the ASCLT empirical distribution function

$$G_N(t) = \frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{b_n(T_n - a_n)\le t\}.$$

Due to the slow nature of convergence of this distribution function, the following modification was introduced, with $C_N = \sum_{n=1}^{N}\frac{1}{n}$:

$$\tilde G_N(t) = \frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,1\{b_n(T_n - a_n)\le t\}.$$

We introduced the following function in Remark 3.1. We now define the function $H_N(t)$:
$$H_N(t) = \frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n\,(r_n - \rho)\le t\},$$

where this function has an a.s. weak convergence to the $N(0, \gamma^2)$ distribution. According to Corollary 3.1, if the distribution of $(X_1, Y_1)$ is bivariate normal, then this function has an a.s. weak convergence to the $N(0, (1-\rho^2)^2)$ distribution. Its modified version is

$$\tilde H_N(t) = \frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,1\{\sqrt n\,(r_n - \rho)\le t\}.$$

We introduced the following function in Remark 3.2. We now define the function $J_N(t)$:

$$J_N(t) = \frac{1}{\log N}\sum_{n=1}^{N}\frac{1}{n}\,1\left\{\frac{\sqrt n}{2}\left(\log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho}\right)\le t\right\},$$

where this function has an a.s. weak convergence to the $N(0, \tau^2)$ distribution. According to Theorem 3.2, if the distribution of $(X_1, Y_1)$ is bivariate normal, then this function has an a.s. weak convergence to the $N(0, 1)$ distribution. Its modified version is

$$\tilde J_N(t) = \frac{1}{C_N}\sum_{n=1}^{N}\frac{1}{n}\,1\left\{\frac{\sqrt n}{2}\left(\log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho}\right)\le t\right\}.$$
In the previous section of this chapter, we have defined empirical distribution functions $H_N(t)$, $\tilde H_N(t)$, $J_N(t)$, and $\tilde J_N(t)$. In this section, we will define their corresponding inverse functions, which are based on the definition from Thangavelu (2005). Once the inverse function is known, we can show results for the quantiles of these distributions. However, before we can define the quantiles, we must first define the inverse function of each distribution function.

Definition 4.1. Let $H_N(t)$, $\tilde H_N(t)$, $J_N(t)$, and $\tilde J_N(t)$ be empirical distribution functions that converge to the true distribution $G_P(t)$. For a fixed $N \in N$, let the inverses of our distribution functions be denoted by the functions $H_N^{-1}(\alpha)$, $\tilde H_N^{-1}(\alpha)$, $J_N^{-1}(\alpha)$, and $\tilde J_N^{-1}(\alpha)$, where $\alpha$ is between 0 and 1, inclusive. The inverse functions are defined as follows:

$$G_P^{-1}(\alpha) = \begin{cases}\sup\{t \mid G_P(t) = 0\} & \text{for } \alpha = 0\\ \sup\{t \mid G_P(t) < \alpha\} & \text{for } 0 < \alpha < 1\\ \inf\{t \mid G_P(t) = 1\} & \text{for } \alpha = 1;\end{cases}$$

$$H_N^{-1}(\alpha) = \begin{cases}\sup\{t \mid H_N(t) = 0\} & \text{for } \alpha = 0\\ \sup\{t \mid H_N(t) < \alpha\} & \text{for } 0 < \alpha < 1\\ \inf\{t \mid H_N(t) = 1\} & \text{for } \alpha = 1;\end{cases} \qquad
\tilde H_N^{-1}(\alpha) = \begin{cases}\sup\{t \mid \tilde H_N(t) = 0\} & \text{for } \alpha = 0\\ \sup\{t \mid \tilde H_N(t) < \alpha\} & \text{for } 0 < \alpha < 1\\ \inf\{t \mid \tilde H_N(t) = 1\} & \text{for } \alpha = 1;\end{cases}$$

$$J_N^{-1}(\alpha) = \begin{cases}\sup\{t \mid J_N(t) = 0\} & \text{for } \alpha = 0\\ \sup\{t \mid J_N(t) < \alpha\} & \text{for } 0 < \alpha < 1\\ \inf\{t \mid J_N(t) = 1\} & \text{for } \alpha = 1;\end{cases} \qquad
\tilde J_N^{-1}(\alpha) = \begin{cases}\sup\{t \mid \tilde J_N(t) = 0\} & \text{for } \alpha = 0\\ \sup\{t \mid \tilde J_N(t) < \alpha\} & \text{for } 0 < \alpha < 1\\ \inf\{t \mid \tilde J_N(t) = 1\} & \text{for } \alpha = 1.\end{cases}$$
In the previous sections, we have defined our empirical distribution functions and their inverse functions. Now that the inverse functions are known, we can show results for the empirical quantiles of these distributions. In this section, we base our definitions on Thangavelu (2005).

Definition 4.2. Let $H_N(t)$, $\tilde H_N(t)$, $J_N(t)$, and $\tilde J_N(t)$ be empirical distribution functions that converge to the true distribution $G_P(t)$. For a fixed $N \in N$, let the inverses of our distribution functions be denoted by $H_N^{-1}(\alpha)$, $\tilde H_N^{-1}(\alpha)$, $J_N^{-1}(\alpha)$, and $\tilde J_N^{-1}(\alpha)$. For $0 \le \alpha \le 1$, the empirical $\alpha$-quantiles for our statistics, where $n \le N \in N$, are defined by

$$t_\alpha^{(N)} = H_N^{-1}(\alpha) \qquad (4.1)$$
$$\tilde t_\alpha^{(N)} = \tilde H_N^{-1}(\alpha) \qquad (4.2)$$

or

$$t_\alpha^{(N)} = J_N^{-1}(\alpha) \qquad (4.3)$$
$$\tilde t_\alpha^{(N)} = \tilde J_N^{-1}(\alpha). \qquad (4.4)$$
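Definitions 4.1-4.2 can be sketched in code. The helper below is hypothetical (not code from the thesis): it builds the weighted empirical distribution with weights $1/n$, normalized by the harmonic sum $C_N$ as in the tilde versions, and evaluates the generalized inverse $\sup\{t \mid \tilde H_N(t) < \alpha\}$ over the sample points, which for $0 < \alpha < 1$ is the first sample point at which the normalized cumulative weight reaches $\alpha$.

```python
def asclt_quantile(values, alpha):
    """values[n-1] is the n-th statistic value; it carries weight 1/n,
    normalized by the total weight (the harmonic number C_N)."""
    pairs = sorted((v, 1.0 / (n + 1)) for n, v in enumerate(values))
    total = sum(w for _, w in pairs)   # C_N
    acc = 0.0
    for v, w in pairs:
        acc += w / total               # normalized cumulative weight at v
        if acc >= alpha:
            return v
    return pairs[-1][0]

vals = [0.3, -1.2, 0.8, 0.1, -0.5, 1.5, -0.9, 0.4]   # made-up statistic values
print(asclt_quantile(vals, 0.1), asclt_quantile(vals, 0.5), asclt_quantile(vals, 0.9))
```

Note that early observations carry much larger weight than later ones, which is why the thesis's quantile estimates stabilize only slowly in $N$.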
4.4 Confidence Intervals for ρ

Recall in the beginning of this thesis we discussed how the ASCLT could be used to develop an estimation technique for the correlation coefficient. However, our main goal was to work towards an ASCLT-based theory of confidence interval estimation for our population correlation coefficient. We have now defined the empirical distribution functions $H_N(t)$, $\tilde H_N(t)$, $J_N(t)$, and $\tilde J_N(t)$, the inverse functions $H_N^{-1}(\alpha)$, $\tilde H_N^{-1}(\alpha)$, $J_N^{-1}(\alpha)$, and $\tilde J_N^{-1}(\alpha)$, and the estimated $\alpha$-quantiles for the empirical distribution functions. In this section, a new version of the confidence interval for the population correlation coefficient is presented.

Definition 4.3. Let $(x_1, y_1), (x_2, y_2), \dots$ be a sample from a bivariate distribution with finite fourth moments, $EX^4$ and $EY^4$. For $n \ge 1$ let the statistic $r_n$ be a sequence of real valued correlation coefficient statistics defined on the same measurable space $(\Omega, \beta)$ and $P$ be a family of probabilities on $\beta$. Recall that the distribution function $\tilde H_N(t)$ converges almost surely for any $t$ to a $N(0, \gamma^2)$ distribution. The following is the ASCLT-based confidence interval for $\rho$:

$$I_\alpha^{(N)} = \left[\hat\rho + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N},\ \hat\rho + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right]. \qquad (4.5)$$

Here we will estimate the population correlation coefficient with $\hat\rho$. Notice that the above asymptotic confidence interval does not include the variance. The material presented in
this chapter shows procedures developed from the ASCLT to derive confidence intervals for ρ. As for the classical variance stabilizing technique, Fisher used it to construct the confidence interval for the population correlation coefficient starting from

$$\sqrt n\,(r - \rho) \xrightarrow{\ L\ } N\left(0, (1-\rho^2)^2\right).$$

The variance stabilizing technique seeks a transformation, $g(r)$ and $g(\rho)$, such that

$$\dot g(\rho)^2(1-\rho^2)^2 = 1; \qquad \dot g(\rho) = \frac{1}{1-\rho^2}.$$

Integrating by partial fractions,

$$g(\rho) = \int\frac{1}{1-\rho^2}\,d\rho = \int\left[\frac{1/2}{1-\rho} + \frac{1/2}{1+\rho}\right]d\rho = \frac{1}{2}\left[-\ln(1-\rho)\right] + \frac{1}{2}\left[\ln(1+\rho)\right] = \frac{1}{2}\ln\left[\frac{1+\rho}{1-\rho}\right]$$

and

$$g(r) = \frac{1}{2}\ln\left[\frac{1+r}{1-r}\right].$$
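The transformation just derived is the inverse hyperbolic tangent; a quick numerical sketch (the probe values below are arbitrary) confirms $g(r) = \operatorname{atanh}(r)$ and, by central differencing, $\dot g(\rho) = 1/(1-\rho^2)$.

```python
import math

# g(r) = (1/2) ln((1+r)/(1-r)) coincides with atanh on (-1, 1).
for r in (-0.9, -0.3, 0.0, 0.5, 0.95):
    g = 0.5 * math.log((1.0 + r) / (1.0 - r))
    print(r, g, math.atanh(r))

# Numerical derivative at rho = 0.5 versus 1/(1 - rho^2).
h = 1e-6
rho = 0.5
num_deriv = (math.atanh(rho + h) - math.atanh(rho - h)) / (2.0 * h)
print(num_deriv, 1.0 / (1.0 - rho * rho))
```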
Definition 4.4. Let $J_N(t)$ or $\tilde J_N(t)$ be empirical distribution functions that converge to the true distribution $G_P(t)$. For a fixed $N \in N$, let the inverses of our distribution functions be denoted by $J_N^{-1}$ or $\tilde J_N^{-1}$. For $0 \le \alpha \le 1$, the empirical $\alpha$-quantiles of the statistic $z_n = \frac{1}{2}\log\frac{1+r_n}{1-r_n}$, where $n \le N \in N$, were defined in (4.3) and (4.4). Also, define $z_N = \frac{1}{2}\log\frac{1+\hat\rho}{1-\hat\rho}$, where the estimate of the population correlation coefficient will be denoted by $\hat\rho$. Although the procedure to develop the confidence interval is well known using the variance stabilizing technique, for completeness we will develop the lower and upper bounds below.

Lower Bound CI for ρ:
$$z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N} \le \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$$
$$\Rightarrow \exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\} \le \frac{1+\rho}{1-\rho}$$
$$\Rightarrow \exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\} - \exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\}\rho \le 1 + \rho$$
$$\Rightarrow -\exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\}\rho - \rho \le 1 - \exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\}$$
$$\Rightarrow \rho\left[\exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\} + 1\right] \ge \exp\left\{2\left(z_N + \frac{\tilde t_{1-\alpha}^{(N)}}{\sqrt N}\right)\right\} - 1$$
$$\Rightarrow \rho \ge \frac{\exp\left\{2\left(z_N + \tilde t_{1-\alpha}^{(N)}/\sqrt N\right)\right\} - 1}{\exp\left\{2\left(z_N + \tilde t_{1-\alpha}^{(N)}/\sqrt N\right)\right\} + 1}.$$
Upper Bound CI for ρ:

$$z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N} \ge \frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right)$$
$$\Rightarrow \exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\} \ge \frac{1+\rho}{1-\rho}$$
$$\Rightarrow \exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\} - \exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\}\rho \ge 1 + \rho$$
$$\Rightarrow -\exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\}\rho - \rho \ge 1 - \exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\}$$
$$\Rightarrow \rho\left[\exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\} + 1\right] \le \exp\left\{2\left(z_N + \frac{\tilde t_\alpha^{(N)}}{\sqrt N}\right)\right\} - 1$$
$$\Rightarrow \rho \le \frac{\exp\left\{2\left(z_N + \tilde t_\alpha^{(N)}/\sqrt N\right)\right\} - 1}{\exp\left\{2\left(z_N + \tilde t_\alpha^{(N)}/\sqrt N\right)\right\} + 1}.$$
Therefore, the following is the ASCLT-derived confidence interval for ρ using the variance stabilizing technique:

\[ I_\alpha^{(N)} = \left[ \frac{\exp\{2(z_N + \tilde t_{1-\alpha}^{(N)}/\sqrt N)\} - 1}{\exp\{2(z_N + \tilde t_{1-\alpha}^{(N)}/\sqrt N)\} + 1},\; \frac{\exp\{2(z_N + \tilde t_{\alpha}^{(N)}/\sqrt N)\} - 1}{\exp\{2(z_N + \tilde t_{\alpha}^{(N)}/\sqrt N)\} + 1} \right]. \tag{4.6} \]
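Since (exp(2a) − 1)/(exp(2a) + 1) = tanh(a), each endpoint of (4.6) is the inverse Fisher transform of z_N shifted by a scaled quantile. A minimal sketch; the quantile values used below are placeholders, not derived ones:

```python
import math

def asclt_vs_interval(z_N, t_lo, t_hi, N):
    """Endpoints of (4.6): tanh(z_N + t/sqrt(N)) for each quantile, using
    the identity (exp(2a) - 1) / (exp(2a) + 1) = tanh(a)."""
    lower = math.tanh(z_N + t_lo / math.sqrt(N))
    upper = math.tanh(z_N + t_hi / math.sqrt(N))
    return lower, upper

# Identity check behind the back-transformation.
a = 0.7
assert abs(math.tanh(a) - (math.exp(2 * a) - 1) / (math.exp(2 * a) + 1)) < 1e-12

# Hypothetical inputs, purely for illustration.
lo, hi = asclt_vs_interval(math.atanh(0.6), t_lo=-1.5, t_hi=1.5, N=400)
assert -1.0 < lo < hi < 1.0
```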
Chapter 5
Numerical Applications
5.1 Introduction
The previous chapters were devoted to developing theoretical results. In this chapter, the main objective is to validate those results through numerical simulations. We are also interested in how the methods presented earlier perform for finite samples. As emphasized earlier, all the theoretical results that utilize the almost sure central limit theorem are asymptotic in nature, and these asymptotic results converge at a very slow rate. These issues will be addressed in this chapter. The following paragraphs outline what will be presented.
Consider the following ASCLT distribution function that was presented in Chapter 2:

\[ \frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sqrt n} \le t \right\} \xrightarrow{\ a.s.\ } \Phi_\sigma(t) \quad \text{for any } t, \tag{5.1} \]

where Φ_σ(t) is the normal distribution function with mean 0 and variance σ². This distribution has already been proven to converge, so we will not present any simulated pictures for it.
Consider the following distribution functions that were presented in Section 3.4, where

\[ \frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n}\, 1\{\sqrt n\,(r_n - \rho) \le t\} \xrightarrow{\ a.s.\ } \Phi_\gamma(t) \quad \text{for any } t \tag{5.2} \]

and

\[ \frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{\sqrt n}{2}\left( \log\frac{1+r_n}{1-r_n} - \log\frac{1+\rho}{1-\rho} \right) \le t \right\} \xrightarrow{\ a.s.\ } \Phi_\tau(t) \quad \text{for any } t. \tag{5.3} \]
For (5.2), Φ_γ(t) is the normal distribution function with mean 0 and variance γ², as defined in Theorem 3.1. For (5.3), Φ_τ(t) is the normal distribution function with mean 0 and variance

\[ \tau^2 = \dot g(\rho)^2 \gamma^2 = \frac{\rho^2}{4(1-\rho^2)^2}\left[ \frac{b_{33}}{\sigma_x^4} + \frac{2 b_{34}}{\sigma_x^2 \sigma_y^2} + \frac{b_{44}}{\sigma_y^4} \right] - \frac{\rho}{(1-\rho^2)^2}\left[ \frac{b_{35}}{\sigma_x^3 \sigma_y} + \frac{b_{45}}{\sigma_x \sigma_y^3} \right] + \frac{b_{55}}{(1-\rho^2)^2 \sigma_x^2 \sigma_y^2}, \]

where

\[ \dot g(\rho) = \frac{1}{1-\rho^2}. \]
To the best of our knowledge, these empirical distributions have not been considered before. Even though the corresponding asymptotic convergence results were derived in Chapter 3, it should be noted that the above empirical distribution functions include the unknown parameter ρ. As mentioned previously, it will be shown that the empirical distribution functions that combine the ASCLT and the correlation coefficient converge to a normal distribution. However, our true goal is to estimate the quantiles of these distribution functions. For interval estimation purposes, we are interested in the quantiles in the tails of these distributions, and the true tail quantiles, t_α and t_{1−α}, require large sample sizes to estimate accurately. In practical data analysis problems, it is not uncommon to deal with small to moderate sample sizes. In the following sections of this chapter, we will present techniques that adapt the asymptotic results for the proposed distribution functions, quantiles, and confidence interval estimation procedures mentioned earlier to small and moderate sample sizes. These procedures will be clearly shown and validated through simulation studies, in which we will estimate the quantiles for different sample sizes.
To overcome the problem of asymptotic results that converge at a very slow rate, we will:
• do random permutations of the full sample in the process of estimating the empirical distribution functions, and observe how the results of the permuted samples affect the rate of convergence;
• replace the log averaging term log N with the quantity C_N = Σ_{n=1}^{N} 1/n.
Before these simulations are observed, we should keep in mind that any proposed ASCLT-based confidence intervals that adjust for finite-sample cases should satisfy certain interval estimation properties. For example, any interval estimator should include the true value of the population parameter with probability (1 − 2α). Recall the two confidence interval techniques that were presented in Chapter 4. For the population correlation coefficient,

\[ P[\rho \in I_\alpha^{(N)}] = 1 - 2\alpha. \tag{5.4} \]

To validate that the statement in (5.4) is true, this chapter will rely on the long-run relative frequency interpretation of probability: since the true parameter should fall in the interval estimate with (1 − 2α)% confidence, numerical studies based on computer simulations will be used to check this property.
In this chapter, the following confidence intervals for the population correlation coefficient will be compared:
• Confidence intervals (4.5) and (4.6) that were introduced earlier. These interval estimates use the derived quantiles generated from the distribution functions H̃_N(t) and J̃_N(t), respectively.
• Confidence interval (5.9), derived from the classic bootstrap method, which will be introduced later in this chapter.
All the above confidence intervals will be compared assuming the bivariate normal distribution.
It should be mentioned that, being a new confidence interval approach for the correlation coefficient, the ASCLT-based intervals require extensive simulation-based testing and evaluation to determine their performance relative to existing methods. Many simulations with different sample sizes and numbers of permutations were performed, and the results presented in this chapter summarize the main findings.
5.2 Bivariate Normal Random Deviate Generation

Many of the simulation studies observed for this research will assume bivariate normal distributions. The preferred method of generating standard normal random deviates is the Box and Muller method (see Lange (1999)). This method generates two independent standard normal deviates X and Y by starting with independent uniform deviates U and V on [0,1]. The Box and Muller method transforms from the random Cartesian coordinates (X, Y) in the plane to random polar coordinates (Θ, R). If Θ is uniformly distributed on [0, 2π] and R² is exponentially distributed with mean 2, where Θ = 2πU and R² = −2 ln V, our independent standard normal random deviates are defined as X = R cos Θ and Y = R sin Θ. To obtain N(μ₁, σ₁²) and N(μ₂, σ₂²) random deviates, simply apply the transformations Z₁ = μ₁ + Xσ₁ and Z₂ = μ₂ + Yσ₂.
Now suppose we want to simulate dependent bivariate normal random deviates with correlation coefficient ρ_xy. The simplest way to generate these bivariate random deviates is to complete the following steps:

Step 1: Use the Box-Muller technique to generate the matrix Z (n × 2) that contains independent standard normal deviates.

Step 2: Transform Z by a square root (for example, the Cholesky factor) of the desired covariance matrix, which for standardized variables coincides with the correlation matrix with correlation coefficient ρ_xy:

\[ \Sigma = \begin{bmatrix} 1 & \sigma_{xy} \\ \sigma_{xy} & 1 \end{bmatrix} = \begin{bmatrix} 1 & \rho_{xy} \\ \rho_{xy} & 1 \end{bmatrix}. \]

After these steps have been completed, we have generated random deviates from a standard bivariate normal distribution with the desired correlation. If nonstandard marginals are required, we can again easily transform these random variables by multiplying by the standard deviations and adding the means.
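The steps above can be sketched in plain Python (a minimal illustration; the thesis simulations were not necessarily implemented this way):

```python
import math
import random

def box_muller(rng):
    """One pair of independent standard normal deviates from uniforms U, V."""
    u, v = rng.random(), rng.random()
    theta = 2.0 * math.pi * u                 # Theta ~ Uniform[0, 2*pi]
    r = math.sqrt(-2.0 * math.log(1.0 - v))   # R^2 ~ Exponential with mean 2
    return r * math.cos(theta), r * math.sin(theta)

def bivariate_normal(n, rho, seed=0):
    """n dependent standard bivariate normal pairs with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x, y = box_muller(rng)
        # Multiply by the Cholesky factor of [[1, rho], [rho, 1]].
        pairs.append((x, rho * x + math.sqrt(1.0 - rho * rho) * y))
    return pairs

def corr(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

sample = bivariate_normal(20000, rho=0.7, seed=42)
assert abs(corr(sample) - 0.7) < 0.03
```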
5.3 Bivariate Exponential Deviate Generation

Many of the simulation studies observed for this research will assume bivariate exponential distributions in which there is dependence between the variables X and Y. Copulas will be used as a starting point for understanding this dependence structure; for a more complete explanation of copulas, please refer to Nelson (1999). The study of copulas and their applications in statistics is a rather modern phenomenon. What are copulas? From one point of view, copulas are functions that join or couple multivariate distribution functions to their one-dimensional marginal distribution functions. The word copula is a Latin noun which means "a link, tie, bond," and is used in grammar and logic to describe "that part of a proposition which connects the subject and predicate." The word copula was first used in a statistical sense by Abe Sklar in 1959.

Sklar's Theorem. Let H be a joint distribution function with marginals F and G. Then there exists a copula C such that for all x, y in ℜ,

\[ H(x, y) = C(F(x), G(y)). \]

Conversely, if C is a copula and F and G are distribution functions, then the function H defined above is a joint distribution function with marginals F and G.

Recall that the exponential distribution models the distribution of time between events in a typical Poisson process. The following example, first described by Marshall and Olkin, describes the role of this distribution in a two-dimensional Poisson
process with bivariate exponential interarrival times. Consider a two component system –
such as a two engine aircraft, or a computer with dual CPU co-processors. The
components are subject to “shocks” which are fatal to one or both of the components.
For example, one of the two aircraft engines may fail, or a massive explosion could
destroy both engines simultaneously; or the CPU or a co-processor could fail, or a power
surge could eliminate both simultaneously. Let X and Y denote the lifetimes of
components 1 and 2, respectively. The “shocks” to the two components are assumed to
form three independent Poisson processes with positive parameters λ1, λ2, and λ12. These
parameters are dependent on whether the shock kills only component 1, component 2, or
both components. The times Z₁, Z₂, Z₁₂ of the occurrence of these three shocks are independent exponential random variables with parameters λ₁, λ₂, and λ₁₂, respectively, so that the component lifetimes are X = min(Z₁, Z₁₂) and Y = min(Z₂, Z₁₂).

Now suppose we want to simulate dependent bivariate exponential random deviates. The following algorithm, developed by Devroye (1986), generates random variates (X, Y) from the Marshall-Olkin bivariate exponential distribution with parameters λ₁, λ₂, and λ₁₂; the resulting correlation coefficient is

\[ \rho = \frac{\lambda_{12}}{\lambda_1 + \lambda_2 + \lambda_{12}}. \]
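This construction can be sketched via the shock-time representation (a hypothetical stand-in for the exact algorithm in Devroye (1986)):

```python
import math
import random

def marshall_olkin(n, lam1, lam2, lam12, seed=0):
    """n pairs (X, Y) with X = min(Z1, Z12), Y = min(Z2, Z12),
    where Z1, Z2, Z12 are independent exponential shock times."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.expovariate(lam1)
        z2 = rng.expovariate(lam2)
        z12 = rng.expovariate(lam12)
        pairs.append((min(z1, z12), min(z2, z12)))
    return pairs

# With lam1 = 1/6, lam2 = 1/4, lam12 = 1 the theoretical correlation is
# lam12 / (lam1 + lam2 + lam12) = 12/17, roughly 0.71.
pairs = marshall_olkin(40000, 1 / 6, 1 / 4, 1.0, seed=7)
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
sxy = sum((x - mx) * (y - my) for x, y in pairs)
sxx = sum((x - mx) ** 2 for x, _ in pairs)
syy = sum((y - my) ** 2 for _, y in pairs)
r = sxy / math.sqrt(sxx * syy)
assert abs(r - 12 / 17) < 0.05
```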
5.4 ASCLT for the Correlation Coefficient Simulations

In this section we will show how the ASCLT-based distribution function of method 1 in (5.2), which includes the correlation coefficient, converges to a normal distribution. For our estimated distribution function in (5.2), the simulations in this section will be completed for:
• Bivariate normal distributions for different values of ρ.
• Bivariate exponential distributions for different values of λ₁, λ₂, and λ₁₂.
• Independent bivariate Poisson distributions for different values of λ.
In Chapter 3 we developed the theoretical results; now we would like to validate these results with computer-based simulations. As mentioned in the introduction, due to the slow rate of convergence of the theoretical ASCLT-based results, adjustments will be proposed to make them more applicable to small to moderate sample size situations. This adjustment will be accomplished through random permutations of the original sample and by replacing log N with the quantity C_N.
Since our function includes the unknown parameter ρ, we propose replacing this unknown quantity with the suitable approximation ρ̂. With this in mind, by replacing the log averaging term and by Corollary 3.1, all the simulations in this section will be based on

\[ \tilde H_N(t) = \frac{1}{C_N} \sum_{n=1}^{N} \frac{1}{n}\, 1\{\sqrt n\,(r_n - \hat\rho) \le t\}, \tag{5.5} \]

where this function has a.s. weak convergence to the normal distribution with mean 0 and variance (1 − ρ²)². Also, ρ̂ will be the estimated population correlation coefficient for the full sample; to overcome the problem of small to moderately sized samples, we propose
using random permutations of the entire sample in the process of estimating the empirical distribution function. The following are the steps to complete this permutation process:

Step 1: Generate a bivariate normal sample (X_N, Y_N), where N is our desired sample size.

Step 2: Let the ith permuted sample vector be denoted by (X_N^{*i}, Y_N^{*i}), i = 1, ..., nper, a random reordering of the original sample pairs, and let r_n^{*i} be the correlation coefficient of its first n pairs.

Step 3: For each permuted sample, compute the estimated distribution function

\[ \tilde H_N^{*i}(t) = \frac{1}{C_N} \sum_{n=1}^{N} \frac{1}{n}\, 1\{\sqrt n\,(r_n^{*i} - \hat\rho) \le t\}, \]

the permuted-sample analogue of the function H̃_N(t).

Step 4: For each value of t, the estimated distribution function that will be observed is

\[ \tilde H_N(t) = \frac{\sum_{i=1}^{nper} \tilde H_N^{*i}(t)}{nper}. \]
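Steps 1-4 can be sketched as follows (a simplified illustration: the sum starts at n = 2 because r₁ is undefined, and it is normalized by the weights actually used rather than by the full C_N):

```python
import math
import random

def running_corr(pairs):
    """Correlation coefficients r_n of the first n pairs, for n = 2..N."""
    rs = []
    for n in range(2, len(pairs) + 1):
        xs = [x for x, _ in pairs[:n]]
        ys = [y for _, y in pairs[:n]]
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        rs.append(sxy / math.sqrt(sxx * syy) if sxx * syy > 0 else 0.0)
    return rs

def H_tilde(t, pairs, rho_hat, nper=10, seed=0):
    """Permutation-averaged estimate of the ASCLT distribution function (5.5)."""
    rng = random.Random(seed)
    N = len(pairs)
    norm = sum(1.0 / n for n in range(2, N + 1))  # weights actually used
    total = 0.0
    for _ in range(nper):
        perm = pairs[:]
        rng.shuffle(perm)                # Step 2: permute the sample pairs
        rs = running_corr(perm)          # r_n along the permuted order
        total += sum(                    # Step 3: one permuted-sample estimate
            (1.0 / n) * (math.sqrt(n) * (rs[n - 2] - rho_hat) <= t)
            for n in range(2, N + 1)
        ) / norm
    return total / nper                  # Step 4: average over permutations

rng = random.Random(1)
pairs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
rho_hat = running_corr(pairs)[-1]
assert H_tilde(-50.0, pairs, rho_hat) == 0.0
assert H_tilde(50.0, pairs, rho_hat) == 1.0
assert H_tilde(-1.0, pairs, rho_hat) <= H_tilde(1.0, pairs, rho_hat)
```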
The above steps complete the estimation of the ASCLT-based distribution function for method 1. The same simulation set-up was used for the bivariate normal and Poisson distributions; however, for the bivariate exponential distribution the set-up was changed, because the skewed nature of the parent populations substantially affects the rate of convergence.

For the bivariate normal distribution, the following will give the reader some insight into the progression of simulations toward our final estimated distribution function. Each plot superimposes the estimated distribution function on the true normal distribution. The idea here is to evaluate the convergence of the empirical distribution to the true distribution function.

The initial simulation results are presented in Figure 5.1. Observe how, for finite random samples, our estimated function is not a distribution function: it does not satisfy the condition lim_{t→∞} H_N(t) = 1.
Figure 5.1: Estimated distribution function H_N(t) for simulated samples from bivariate standard normal distributions.
In the next iteration, presented in Figure 5.2, we replace log N with the quantity C_N = Σ_{n=1}^{N} 1/n, where N is the sample size. As discussed in Chapter 2, this should help with the violated condition mentioned earlier and increase the rate of convergence.
Figure 5.2: Estimated distribution function H̃_N(t) with C_N for simulated samples from bivariate standard normal distributions.
Observe in Figure 5.2 how, for finite random samples, our estimated function now satisfies the condition lim_{t→∞} H̃_N(t) = 1. Replacing the log averaging term with C_N helped this condition; however, notice the jump when the t-value is approximately 0. The function is not right continuous, that is, it fails the condition that for every number t₀, lim_{t↓t₀} H̃_N(t) = H̃_N(t₀). To address this issue, we refer to Remark 3.1 and the Law of the Iterated Logarithm discussed in Section 3.5. When estimating the distribution using (5.5), notice that the weight of each term is determined by the quotient 1/n, so more weight is applied to the estimated function for smaller values of n. To address this, when performing the computer simulations we will begin the summation at a burn-in index m(n) rather than at n = 1, using

\[ \tilde H_N(t) = \frac{1}{C_N} \sum_{n = m(n)}^{\max(n)} \frac{1}{n}\, 1\{\sqrt n\,(r_n - \hat\rho) \le t\}. \]
Observe in Figure 5.3 that our empirical distribution function is now approaching the normal distribution with mean 0 and variance (1 − ρ²)².

Figure 5.3: Estimated distribution function H̃_N(t) with C_N, summing from n = m(n) to N = max(n).
It may be interesting to observe how the ASCLT-based distribution function performs when generating from dependent normal distributions for various strengths of the correlation parameter. When observing this distribution function, we may assume the means are 0 and the variances are 1, because the correlation coefficient is invariant under changes of location and scale in X or Y. The following charts in Figures 5.4-5.6 were created from bivariate normal distributions with ρ = 0.3, ρ = 0.5, and ρ = 0.7.
Figure 5.4: Estimated distribution function H̃_N(t) for simulated samples from dependent bivariate normal distributions when ρ = 0.3.
Figure 5.5: Estimated distribution function H̃_N(t) for simulated samples from dependent bivariate normal distributions when ρ = 0.5.
Figure 5.6: Estimated distribution function H̃_N(t) for simulated samples from dependent bivariate normal distributions when ρ = 0.7.
Observe in Figures 5.4-5.6 that our empirical distribution function is also approaching the normal distribution with mean 0 and variance (1 − ρ²)², even when there exists a correlation between the two normal random variables. However, as the strength of the relationship between X and Y increases, the rate of convergence decreases.

It may also be interesting to observe how the ASCLT-based distribution function performs when generating from non-normal distributions. Observe the following charts in Figures 5.7-5.10, created from bivariate exponential distributions for various strengths of the correlation parameter.
Figure 5.7: Estimated distribution function H̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/6, λ₂ = 1/4, λ₁₂ = 1 (ρ = 0.7).
Figure 5.8: Estimated distribution function H̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/6, λ₂ = 1/3, λ₁₂ = 1/2 (ρ = 0.5).
Figure 5.9: Estimated distribution function H̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/2, λ₂ = 1/2, λ₁₂ = 1/3 (ρ = 0.25).
Figure 5.10: Estimated distribution function H̃_N(t) for simulated samples from independent bivariate exponential distributions with parameters λᵢ = 3, i = 1, 2.
Observe the progression of the ASCLT-based distribution function for method 1 in the above figures as the correlation changes. The skewed nature of the parent populations and the dependence structure of the variables cause the rate of convergence to decrease substantially. In fact, as the strength of the correlation decreases in the exponential simulations, the rate of convergence increases. It should be noted that the simulations based on non-normal distributions converge to a normal distribution with a mean of 0 and a variance that was defined in Theorem 3.1.

It may also be interesting to observe how the ASCLT-based distribution function performs when generating from non-normal discrete distributions. Observe the following charts in Figures 5.11-5.13, created from independent bivariate Poisson distributions for different parameters.
Figure 5.11: Estimated distribution function H̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 1, i = 1, 2.
Figure 5.12: Estimated distribution function H̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 3, i = 1, 2.
Figure 5.13: Estimated distribution function H̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 10, i = 1, 2.
It does not appear that the discrete nature of the parent population affects the convergence. The above figures indicate that the empirical distribution functions for the independent Poisson distributions converge for all of the parameter values considered.
5.5 ASCLT for the Correlation Coefficient Simulations – Variance Stabilizing Technique
In this section we will show how the ASCLT-based distribution function of method 2 in (5.3), which combines the correlation coefficient, Cramér's Theorem, and Theorem 3.2, converges to a normal distribution. For our estimated distribution function in (5.3), the simulations in this section will be completed for the same scenarios as in the previous section:
• Bivariate normal distributions for different values of ρ.
• Bivariate exponential distributions for different values of λ₁, λ₂, and λ₁₂.
• Independent bivariate Poisson distributions for different values of λ.
We will again propose the same modifications to the original theoretical results in order to make them more applicable in small-sample situations. In this section, all the simulations will be based on
\[ \tilde J_N(t) = \frac{1}{C_N} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{\sqrt n}{2}\left( \log\frac{1+r_n}{1-r_n} - \log\frac{1+\hat\rho}{1-\hat\rho} \right) \le t \right\}. \tag{5.6} \]
As mentioned in Remark 3.2, this function has a.s. weak convergence to the normal distribution with mean 0 and variance τ². Also, ρ̂ will be the estimated population correlation coefficient computed from the full sample.

In order to overcome the problem of small to moderately sized samples, we will again use permutations of the entire sample in the process of estimating the empirical distribution function. We will use the same steps as in the previous section; however, in Step 3 the estimated distribution function for the ith permuted sample is

\[ \tilde J_N^{*i}(t) = \frac{1}{C_N} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{\sqrt n}{2}\left( \log\frac{1+r_n^{*i}}{1-r_n^{*i}} - \log\frac{1+\hat\rho}{1-\hat\rho} \right) \le t \right\}, \]

and for each value of t, the estimated distribution function that will be observed is

\[ \tilde J_N(t) = \frac{\sum_{i=1}^{nper} \tilde J_N^{*i}(t)}{nper}. \]
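The only change from method 1 is the statistic inside the indicator of (5.6); a minimal sketch of that statistic (names are illustrative):

```python
import math

def fisher_z(r):
    """(1/2) log[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def method2_statistic(n, r_n, rho_hat):
    """The quantity compared with t inside the indicator of (5.6):
    (sqrt(n)/2) * (log((1+r_n)/(1-r_n)) - log((1+rho_hat)/(1-rho_hat)))."""
    return math.sqrt(n) * (fisher_z(r_n) - fisher_z(rho_hat))

# The statistic vanishes when r_n equals rho_hat and is monotone in r_n.
assert method2_statistic(25, 0.4, 0.4) == 0.0
assert method2_statistic(25, 0.5, 0.4) > 0.0
assert method2_statistic(25, 0.3, 0.4) < 0.0
```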
The following will give the reader some insight into how the empirical distribution function (5.6) converges to the true distribution function. Each plot superimposes the estimated distribution function on the true normal distribution. The idea here is to evaluate the convergence of the empirical distribution to the true distribution function when generating from normal populations with different strengths of the correlation parameter.
Figure 5.14: Estimated distribution function J̃_N(t) for simulated samples from dependent bivariate normal distributions with ρ = 0.7.
Figure 5.15: Estimated distribution function J̃_N(t) for simulated samples from dependent bivariate normal distributions with ρ = 0.5.
Figure 5.16: Estimated distribution function J̃_N(t) for simulated samples from dependent bivariate normal distributions with ρ = 0.3.
Figure 5.17: Estimated distribution function J̃_N(t) for simulated samples from independent bivariate normal distributions.
The simulations presented in Figures 5.14-5.17 indicate that our empirical distribution function is approaching the normal distribution with mean 0 and standard deviation 1, which validates our theoretical assumptions, even when there exists a correlation between the two normal random variables, regardless of its strength.

It may also be interesting to observe how this distribution function performs when generating from non-normal distributions. The following graphs were created from bivariate exponential distributions for various strengths of the correlation parameter.
Figure 5.18: Estimated distribution function J̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/6, λ₂ = 1/4, λ₁₂ = 1 (ρ = 0.7).
Figure 5.19: Estimated distribution function J̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/6, λ₂ = 1/3, λ₁₂ = 1/2 (ρ = 0.5).
Figure 5.20: Estimated distribution function J̃_N(t) for simulated samples from bivariate exponential distributions with parameters λ₁ = 1/2, λ₂ = 1/2, λ₁₂ = 1/3 (ρ = 0.25).
Figure 5.21: Estimated distribution function J̃_N(t) for simulated samples from independent bivariate exponential distributions with parameters λᵢ = 3, i = 1, 2.
It appears in Figure 5.18 that the skewed nature of the parent populations and the strength of the dependence cause the rate of convergence to decrease. However, as the strength of the correlation decreases from Figure 5.19 to Figure 5.21, the rate of convergence increases. Again, our empirical distribution function J̃_N(t) is approaching the normal distribution with mean 0 and the variance that was defined in Theorem 3.1.

It may also be interesting to observe this distribution function for non-normal discrete distributions. The following charts were created from independent bivariate Poisson distributions for different parameters.
Figure 5.22: Estimated distribution function J̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 1, i = 1, 2.
Figure 5.23: Estimated distribution function J̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 3, i = 1, 2.
Figure 5.24: Estimated distribution function J̃_N(t) for simulated samples from bivariate Poisson distributions with parameters λᵢ = 10, i = 1, 2.
It does not appear that the discrete nature of the parent population affects the convergence for the distribution function (5.3). The above figures indicate that the empirical distribution functions for the independent Poisson distributions converge for all of the observed parameters.
5.6 ASCLT-based Confidence Interval for Permuted Samples (Technique #1)

In this section, for the distribution function (5.5), we define the quantile estimates for each permuted sample. For α ∈ (0, 1), the quantiles t̃*ⁱ,(N)_α and t̃*ⁱ,(N)_{1−α} are now

\[ \tilde t_{\alpha}^{*i,(N)} = \max\left\{ t \;:\; C_N^{-1} \sum_{n=1}^{N} \frac{1}{n}\, 1\{\sqrt n\,(r_n^{*i} - \hat\rho) \le t\} \le \alpha \right\} \]

and

\[ \tilde t_{1-\alpha}^{*i,(N)} = \max\left\{ t \;:\; C_N^{-1} \sum_{n=1}^{N} \frac{1}{n}\, 1\{\sqrt n\,(r_n^{*i} - \hat\rho) \le t\} \le 1 - \alpha \right\}, \]

where C_N = Σ_{n=1}^{N} 1/n.
The following will be the ASCLT-based confidence interval estimates for permuted samples:

\[ I_\alpha^{(N)} = \left[ \hat\rho + \frac{t_{1-\alpha}}{\sqrt N},\; \hat\rho + \frac{t_{\alpha}}{\sqrt N} \right], \tag{5.7} \]

where

\[ t_{1-\alpha} = \frac{\sum_{i=1}^{nper} \tilde t_{1-\alpha}^{*i,(N)}}{nper} \quad \text{and} \quad t_{\alpha} = \frac{\sum_{i=1}^{nper} \tilde t_{\alpha}^{*i,(N)}}{nper}. \]
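The quantile definition above is a weighted empirical quantile; a sketch under the assumption that the statistic values s_n = √n(r_n*ⁱ − ρ̂) have already been computed (falling back to the smallest observed value when the defining set is empty):

```python
import math

def weighted_quantile(stats, alpha):
    """max{t : C_N^{-1} * sum_n (1/n) 1{s_n <= t} <= alpha}, where
    stats[n-1] holds the statistic s_n for n = 1..N, and the maximum
    is taken over the observed statistic values."""
    N = len(stats)
    c_n = sum(1.0 / n for n in range(1, N + 1))
    order = sorted(range(N), key=lambda i: stats[i])
    cdf = 0.0
    best = stats[order[0]]  # fallback when no value satisfies the bound
    for i in order:
        cdf += (1.0 / (i + 1)) / c_n   # weight 1/n of the statistic's index
        if cdf <= alpha:
            best = stats[i]
    return best

def asclt_interval(rho_hat, t_1ma, t_a, N):
    """Interval (5.7): [rho_hat + t_{1-alpha}/sqrt(N), rho_hat + t_alpha/sqrt(N)]."""
    root = math.sqrt(N)
    return rho_hat + t_1ma / root, rho_hat + t_a / root

# Deterministic checks: with stats [1, 2, 3, 4] the normalized weights in
# sorted order are 12/25, 6/25, 4/25, 3/25.
assert weighted_quantile([1.0, 2.0, 3.0, 4.0], 0.5) == 1.0
assert weighted_quantile([1.0, 2.0, 3.0, 4.0], 0.9) == 3.0

lo, hi = asclt_interval(0.5, -0.2, 0.3, 100)  # placeholder quantile averages
assert abs(lo - 0.48) < 1e-9 and abs(hi - 0.53) < 1e-9
```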
5.7 ASCLT-based Confidence Interval for Permuted Samples (Technique #2)
In this section, for the distribution function (5.6), we define the quantile estimates for each permuted sample:

\[ \tilde t_{\alpha}^{*i,(N)} = \max\left\{ t \;:\; C_N^{-1} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{\sqrt n}{2}\left( \log\frac{1+r_n^{*i}}{1-r_n^{*i}} - \log\frac{1+\hat\rho}{1-\hat\rho} \right) \le t \right\} \le \alpha \right\} \]

and

\[ \tilde t_{1-\alpha}^{*i,(N)} = \max\left\{ t \;:\; C_N^{-1} \sum_{n=1}^{N} \frac{1}{n}\, 1\!\left\{ \frac{\sqrt n}{2}\left( \log\frac{1+r_n^{*i}}{1-r_n^{*i}} - \log\frac{1+\hat\rho}{1-\hat\rho} \right) \le t \right\} \le 1 - \alpha \right\}, \]

where C_N = Σ_{n=1}^{N} 1/n.
The following will be the ASCLT-based confidence interval estimates for permuted samples:

\[ I_\alpha^{(N)} = \left[ \frac{\exp\{2(z_N + t_{1-\alpha}^{(N)}/\sqrt N)\} - 1}{\exp\{2(z_N + t_{1-\alpha}^{(N)}/\sqrt N)\} + 1},\; \frac{\exp\{2(z_N + t_{\alpha}^{(N)}/\sqrt N)\} - 1}{\exp\{2(z_N + t_{\alpha}^{(N)}/\sqrt N)\} + 1} \right], \tag{5.8} \]

where

\[ t_{1-\alpha} = \frac{\sum_{i=1}^{nper} \tilde t_{1-\alpha}^{*i,(N)}}{nper} \quad \text{and} \quad t_{\alpha} = \frac{\sum_{i=1}^{nper} \tilde t_{\alpha}^{*i,(N)}}{nper}. \]
5.8 Bootstrap Confidence Interval for Population Coefficient

In this section, we will introduce a confidence interval technique for ρ that utilizes bootstrap resampling. Bootstrap methods are a relatively recently developed technique for making statistical inferences, and with the development of more powerful computers they have been explored extensively, because modern computing power is needed to approximate the sampling distribution of a statistic by sampling with replacement from the original sample. These resamples are then used in the computation of the relevant bootstrap estimates. The technique will be used here to develop estimates of the standard errors and quantiles needed for interval estimation. Bootstrapping has been used for parameter estimation for the mean, median, proportion, odds ratio, and regression coefficients, and it can also be used in the development of hypothesis tests.

For interval estimation, the following resampling scheme is proposed to compute the bootstrap confidence interval for ρ:

Step 1: Compute the correlation coefficient ρ̂ from the original sample.

Step 2: Generate B₁ independent bootstrap samples (X^{(*b₁)}, Y^{(*b₁)}), each drawn with replacement from the original sample, and compute ρ̂^{(*b₁)} from each.

Step 3: Estimate the standard error of ρ̂ from the bootstrap replications:

\[ se^{(*b_1)} = \sqrt{ \frac{1}{B_1 - 1} \sum_{b_1 = 1}^{B_1} \left( \hat\rho^{(*b_1)} - \overline{\hat\rho}^{(*b_1)} \right)^2 }, \quad \text{where } \overline{\hat\rho}^{(*b_1)} = \frac{1}{B_1} \sum_{b_1 = 1}^{B_1} \hat\rho^{(*b_1)}. \]

Steps 4-5: From each bootstrap sample, draw B₂ second-level bootstrap sub-samples and compute the analogous standard error from each set of bootstrap sub-samples:

\[ se^{(*b_2)} = \sqrt{ \frac{1}{B_2 - 1} \sum_{b_2 = 1}^{B_2} \left( \hat\rho^{(*b_2)} - \overline{\hat\rho}^{(*b_2)} \right)^2 }, \quad \text{where } \overline{\hat\rho}^{(*b_2)} = \frac{1}{B_2} \sum_{b_2 = 1}^{B_2} \hat\rho^{(*b_2)}. \]

Step 6: The αth quantile of t^{*b} is estimated by the value t̂(α) such that

\[ \alpha = \frac{\#\{ t^{*b} \le \hat t(\alpha) \}}{B_1}. \]
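The first-level resampling can be sketched as follows (illustrative function names; the nested studentizing step is omitted):

```python
import math
import random

def corr(xs, ys):
    """Sample correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_rhos(xs, ys, B1=500, seed=0):
    """B1 first-level bootstrap replications of the correlation coefficient."""
    rng = random.Random(seed)
    n = len(xs)
    reps = []
    for _ in range(B1):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        reps.append(corr([xs[i] for i in idx], [ys[i] for i in idx]))
    return reps

def bootstrap_se(reps):
    """Bootstrap standard error: sqrt of the sample variance of the replications."""
    B = len(reps)
    mean = sum(reps) / B
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (B - 1))

# Tiny illustration on a strongly correlated synthetic sample.
rng = random.Random(3)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [x + 0.5 * rng.gauss(0, 1) for x in xs]
reps = bootstrap_rhos(xs, ys, B1=200, seed=1)
assert all(-1.0 <= r <= 1.0 for r in reps)
assert bootstrap_se(reps) > 0.0
```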
5.9 Variance Stabilizing Transformation for the Confidence Interval for Population Coefficient (Classic Technique)

In this section, we will introduce the classical confidence interval technique for ρ that uses the variance stabilizing transformation, so that we may compare this classical approach to the ASCLT-based interval presented in (5.8) and the bootstrap interval in (5.9). For normal populations, Theorem 3.2 stated that

\[ \sqrt n\,(r - \rho) \xrightarrow{\ L\ } N\!\left(0, (1 - \rho^2)^2\right). \]

Applying the variance stabilizing transformation

\[ g(r) = \frac{1}{2} \log\frac{1+r}{1-r}, \]

the distribution of g(r) tends to become normal quickly as the sample size increases, with variance approximately 1/(n − 3). Furthermore, even if the distribution of g(r) is not strictly normal, it tends toward normality rapidly.

Remark 5.1. The confidence interval for the population correlation coefficient using the variance stabilizing transformation is

\[ I_\alpha^{(N)} = \left[ \frac{\exp(2\zeta_l) - 1}{\exp(2\zeta_l) + 1},\; \frac{\exp(2\zeta_u) - 1}{\exp(2\zeta_u) + 1} \right], \tag{5.10} \]

where

\[ \zeta_l = \frac{1}{2} \log\frac{1 + r_n}{1 - r_n} - z_\alpha \frac{1}{\sqrt{n - 3}} \quad \text{and} \quad \zeta_u = \frac{1}{2} \log\frac{1 + r_n}{1 - r_n} + z_\alpha \frac{1}{\sqrt{n - 3}}. \]

For a (1 − α)% confidence level, z_α is the upper percentage point of the standard normal distribution; see (2004).
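The classic interval in Remark 5.1 can be computed directly; a brief sketch in which z_alpha is supplied by the caller:

```python
import math

def fisher_ci(r, n, z_alpha):
    """Classic variance-stabilized confidence interval (5.10) for rho."""
    zeta = 0.5 * math.log((1.0 + r) / (1.0 - r))
    half_width = z_alpha / math.sqrt(n - 3)
    lo, hi = zeta - half_width, zeta + half_width
    # Back-transform: (exp(2z) - 1) / (exp(2z) + 1) = tanh(z).
    return math.tanh(lo), math.tanh(hi)

lo, hi = fisher_ci(r=0.5, n=103, z_alpha=1.96)
assert -1.0 < lo < 0.5 < hi < 1.0
```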
the ASCLT-based methods 1 and 2, bootstrap, and classic procedures. In this section, the
idea is to evaluate the confidence interval procedures when the samples come from
performance by changing the sample sizes, confidence levels, and correlation strengths. As
mentioned earlier, one of the major advantages of the ASCLT-based confidence intervals is
that the procedures do not involve the variance estimation. A focused concern when the
variance estimation is not involved is how well the derived confidence intervals perform
when estimating the correlation coefficient. The confidence interval methods detailed in
this chapter will be observed via monte-carlo simulations in order to compare the results of
95
these procedures. The confidence interval results will then be compared and evaluated by
the confidence interval, the more precise the estimation of the population
correlation coefficient.
• Confidence interval accuracy will be compared for each method. For each
As mentioned in the beginning of this chapter, only a few simulation results will be presented here since the general trend remains similar over several settings. Numerical results will be presented using tables and figures, and also summarized with discussion. The simulation settings are as follows.
• Random deviates will be generated for the normal, exponential, and Poisson distributions.
• For the normal populations, ρ will take on the values 0.7, 0.5, 0.3, and 0.0.
• For the exponential populations, ρ will take on the values 0.7, 0.5, 0.25, and 0.0.
• For the Poisson populations, we will assume independence between the variables, but the Poisson parameter will take on the values λ = 1, 3, and 10.
• The simulation sizes will be Nsim = 10,000 and nper = 100; Nsim = 1,000 and nper = 500; and Nsim = 100 and nper = 1,000.
• The confidence levels that will be observed are 90%, 95%, and 99% (i.e., 2α = 10%, 5%, and 1%).
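The settings above can be exercised with a small harness; this sketch (illustrative names, Python standard library only) generates correlated bivariate normal deviates, applies the classic interval of (5.10) as the method under test, and reports the chapter's two comparison criteria: long-run coverage (accuracy) and average width (precision).

```python
import math
import random
import statistics
from statistics import NormalDist

def pearson_r(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def classic_ci(xs, ys, level=0.90):
    """Fisher variance stabilizing interval, as in (5.10)."""
    n, r = len(xs), pearson_r(xs, ys)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    g, hw = math.atanh(r), z / math.sqrt(n - 3)
    return math.tanh(g - hw), math.tanh(g + hw)

def bivariate_normal(rng, rho):
    """One (X, Y) pair with correlation rho via the conditional form."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1 - rho * rho) * z2

def coverage_and_width(ci, rho, nper, nsim, seed=2):
    """Accuracy = fraction of intervals containing rho; precision = mean width."""
    rng = random.Random(seed)
    hits, total_width = 0, 0.0
    for _ in range(nsim):
        pairs = [bivariate_normal(rng, rho) for _ in range(nper)]
        xs, ys = zip(*pairs)
        lo, hi = ci(xs, ys)
        hits += lo <= rho <= hi
        total_width += hi - lo
    return hits / nsim, total_width / nsim

cov, width = coverage_and_width(classic_ci, rho=0.7, nper=100, nsim=200)
```

Swapping `classic_ci` for a bootstrap or ASCLT-based interval function reuses the same harness, which is how the tables in this section can be reproduced method by method.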
5.11 Simulation Results based on Bivariate Normal Distributions
The idea here is to evaluate the performance of the ASCLT-based confidence intervals
versus other conventional methods when samples come from bivariate normal
distributions. Within this framework, we consider the scenarios when the samples
originate from bivariate populations with varying dependencies. In the Appendix, the
outcomes of 144 simulations will be presented in Table 5.1. Also, results of selected simulations will be displayed and conclusions presented below.
Figure 5.25: Confidence interval results for N=10,000, CL=90%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.0167596    0.01552119   0.01683587   0.01678195
% Containing ρ  87%          85%          85%          91%
Simulations evaluating the property of precision and accuracy are presented in Figure 5.25 for N=10,000, CL=90%, and ρ=0.7. It is clear in this scenario that the ASCLT-based confidence interval for method 2 yielded the most precise interval. Even though its accuracy did not reach the desired confidence level, the difference is not drastic enough to be a concern.
Figure 5.26: Confidence interval results for N=1,000, CL=90%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.05309734   0.04930895   0.05357339   0.05316684
% Containing ρ  89%          93%          91%          93%
Figure 5.27: Confidence interval results for N=1,000, CL=95%, and ρ=0.5.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.09389322   0.09977876   0.09597071   0.09374027
% Containing ρ  93%          97%          96%          95%
Simulations evaluating the property of precision and accuracy are presented in Figure 5.26 for N=1,000, CL=90%, and ρ=0.7, and in Figure 5.27 for N=1,000, CL=95%, and ρ=0.5. It is clear in these scenarios how the confidence interval precision changes with the correlation strength. When the correlation was strong, the ASCLT-based confidence interval for method 2 yielded the most precise interval; as the correlation decreased to 0.5, the ASCLT-based confidence interval for method 1 yielded the most precise interval.
Figure 5.28: Confidence interval results for N=100, CL=90%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.1681341    0.1558588    0.179894     0.1721169
% Containing ρ  84%          87%          88%          93%
Simulations evaluating the property of precision and accuracy are presented in Figure 5.28 for N=100, CL=90%, and ρ=0.7. In this scenario, the ASCLT-based confidence interval for method 2 yielded the most precise interval; however, the accuracy level of 87% falls short of the desired 90% confidence level.
Figure 5.29: Confidence interval results for N=100, CL=95%, and ρ=0.3.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.3322145    0.380784     0.3751739    0.3606292
% Containing ρ  94%          94%          92%          96%
Figure 5.30: Confidence interval results for N=100, CL=99%, and ρ=0.0.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.4381811    0.5733602    0.5807997    0.51566
% Containing ρ  99%          100%         98%          99%
Simulations evaluating the property of precision and accuracy are presented in Figure
5.29 for N=100, CL=95%, and ρ=0.3, and in Figure 5.30 for N=100, CL=99%, and ρ=0.0.
It is clear in these weak correlation scenarios that the ASCLT-based confidence intervals
for method 1 yielded the most precise interval while upholding the accuracy criteria.
The main findings from the results presented in tables and figures can be summarized as follows.
• All the confidence intervals presented in this section were repeated, and a long-run relative frequency interpretation held: whenever a simulation was repeated 100 times, nearly 100(1 − 2α)% of these intervals contained the population correlation coefficient.
• For normal random variables with strong correlations, the ASCLT-based confidence intervals for method 2 consistently yielded the most precise interval.
• For normal random variables with moderate to weak correlations, the ASCLT-based confidence intervals for method 1 consistently yielded the most precise interval.
• The ASCLT-based procedures performed well for normal random variables with strong correlations compared to the classic bootstrap and Fisher's variance stabilizing techniques, even though the original intent was to develop confidence intervals for small to moderate sample sizes. Future research may show that this interval technique may be more precise as the sample sizes increase.
5.12 Simulation Results based on Bivariate Exponential Distributions
The idea here is to evaluate the performance of the ASCLT-based confidence intervals
versus other conventional methods when samples come from distributions that are non-normal. Therefore we want to see how the confidence interval results presented in this chapter change if the assumption of normality is violated. Within this framework, we consider the scenarios when the samples originate from bivariate exponential populations
with varying dependencies. We will use the Marshall-Olkin technique presented earlier
when simulating from dependent exponential distributions. In the Appendix, the outcomes
of 144 simulations will be presented in Table 5.2. Also, results of selected simulations will be displayed and conclusions presented below.
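The Marshall-Olkin construction referenced above can be sketched in a few lines; the helper names are illustrative, and the common-shock rate is chosen here to hit a target correlation under unit-rate marginals.

```python
import math
import random
import statistics

def marshall_olkin_pair(rng, lam1, lam2, lam12):
    """One (X, Y) from the Marshall-Olkin bivariate exponential: a common
    exponential shock Z12 is shared by both components, which yields
    Corr(X, Y) = lam12 / (lam1 + lam2 + lam12)."""
    z1, z2 = rng.expovariate(lam1), rng.expovariate(lam2)
    z12 = rng.expovariate(lam12)
    return min(z1, z12), min(z2, z12)

def sample_corr(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in pairs)
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# With lam1 = lam2 = 1, solve lam12/(2 + lam12) = rho for the shock rate.
rho = 0.5
lam12 = 2 * rho / (1 - rho)          # = 2.0 for rho = 0.5
rng = random.Random(3)
pairs = [marshall_olkin_pair(rng, 1.0, 1.0, lam12) for _ in range(20000)]
```

The shared shock is what makes the components dependent; setting lam12 → 0 recovers the independent case with ρ = 0.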
Figure 5.31: Confidence interval results for N=10,000, CL=95%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.03876671   0.03567012   0.04520584   0.01968474
% Containing ρ  89%          98%          93%          58%
Simulations evaluating the property of precision and accuracy are presented in Figure 5.31 for N=10,000, CL=95%, and ρ=0.7. It is clear in this scenario that the classic confidence interval yielded a precise interval but with poor accuracy, whereas the ASCLT-based confidence interval for method 2 yielded a precise interval with a high accuracy rate.
Figure 5.32: Confidence interval results for N=1,000, CL=95%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.1134059    0.1038371    0.1410777    0.06253979
% Containing ρ  92%          87%          92%          57%
Simulations evaluating the property of precision and accuracy are presented in Figure 5.32 for N=1,000, CL=95%, and ρ=0.7. Again, the classic technique produced a high-precision but low-accuracy interval. The ASCLT-based confidence interval for method 2 yielded a precise interval but with a low accuracy rate, while the ASCLT-based confidence interval for method 1 and the bootstrap method both reached 92% accuracy.
Figure 5.33: Confidence interval results for N=100, CL=95%, and ρ=0.7.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.288143     0.2253667    0.5558801    0.2048886
% Containing ρ  77%          73%          86%          50%
Figure 5.34: Confidence interval results for N=100, CL=99%, and ρ=0.0.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.4110187    0.4783997    0.6431355    0.5155147
% Containing ρ  94%          93%          99%          98%
Simulations evaluating the property of precision and accuracy are presented in Figure
5.33 for N=100, CL=95%, and ρ=0.7, and in Figure 5.34 for N=100, CL=99%, and ρ=0.0.
It is clear in these scenarios that for a small sample size and a strong correlation, there are
issues with both precision and accuracy for all confidence interval techniques. However,
as the strength of the correlation decreases the accuracy of each interval technique
increases. For the independent case, the ASCLT-based confidence intervals for method 1
yielded the most precise interval. The main findings from the results presented in tables and figures can be summarized as follows.
• The classic confidence interval technique had the poorest results when the random variables were correlated. Even though this method often had the most precise intervals, it had very inaccurate results. When the random variables had a strong correlation, the long-run relative frequency did not approach the stated confidence level.
• The bootstrap method generally had the best accuracy. However, this method also consistently had the least precise interval estimates.
• The ASCLT-based confidence interval for method 2 yielded precise intervals compared to method 1 and the bootstrap method; however, there were issues with its accuracy.
• When the correlations were at moderate to weak levels, the accuracy and precision of method 1 was consistently reasonable.
5.13 Simulation Results based on Bivariate Poisson Distributions
The idea here is to evaluate the performance of the ASCLT-based confidence intervals
versus other conventional methods when samples come from distributions that are discrete
and non-normal. Therefore we want to see how the confidence interval results presented in
the chapter change if the assumption of normality is violated. Within this framework, we
consider the scenarios when the samples originate from independent bivariate Poisson populations. In the Appendix, the outcomes of the simulations will be presented in Table 5.3. Also, results of selected simulations will be displayed and conclusions presented below.
Figure 5.35: Confidence interval results for N=10,000, CL=95%, and ρ=0.0.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.03792344   0.04155826   0.04008428   0.03920124
% Containing ρ  92%          91%          96%          95%
Figure 5.36: Confidence interval results for N=1,000, CL=95%, and ρ=0.0.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.1180717    0.1339668    0.1290338    0.1239971
% Containing ρ  95%          91%          96%          93%
Figure 5.37: Confidence interval results for N=100, CL=95%, and ρ=0.0.
[Bar chart: average CI width per method, with % containing ρ overlaid. Underlying values:]
                ASCLT #1     ASCLT #2     Bootstrap    Classic
Ave CI Width    0.3582246    0.4239575    0.4247295    0.3943375
% Containing ρ  95%          95%          93%          97%
It is clear in the three Poisson simulation examples presented above that the ASCLT-based confidence interval for method 1 yielded the most precise interval. Even though the accuracy did not reach the desired confidence level when N=10,000, this difference is not drastic enough to be a concern.
The main findings from the results presented in tables and figures can be summarized as follows.
• All the confidence intervals presented in this section were repeated, and a long-run relative frequency interpretation held: whenever a simulation was repeated 100 times, nearly 100(1 − 2α)% of these intervals contained the population correlation coefficient.
• The ASCLT-based confidence intervals for method 1 yielded the most precise interval, regardless of the sample size.
• The ASCLT-based confidence intervals for method 2 did not yield more precise interval estimates when compared to the bootstrap and classic methods.
Chapter 6
Conclusion
Throughout this thesis, we considered the balance between the asymptotic theory and
real-life, finite sample approximations. With this in mind, we tried to present all the ideas with a mixture of mathematical theory and supporting simulation-based evaluation.
This thesis presented some existing theoretical results on the Almost Sure Central Limit
Theorem. After we introduced the ASCLT, two distribution functions that included the
ASCLT and the correlation coefficient were presented. We immediately discussed ways
of modifying these distribution functions to address the rate of convergence issue. These
modifications were presented and evaluated with simulations. This was then followed with proposals for estimating the quantiles of the distribution of our correlation coefficient statistic.
Throughout this thesis, wherever appropriate, there have been conclusions stated. In this
final chapter, we will summarize our conclusions, and mention possible future research
ideas.
6.1 Summary
The following bullet points summarize the conclusions about the ASCLT-based distribution functions.
• For the bivariate normal distribution, both ASCLT-based distribution function methods converged, with the rate of convergence for method 1 improving as the correlation decreased. For method 2, even if there existed a correlation between the two normal random variables, regardless of the strength, the rate of convergence was not affected.
• For the bivariate exponential distribution, the rate of convergence slowed when a correlation between the two random variables existed. The skewed nature of the parent populations and the dependent structure of the variables both affected the convergence.
• For the bivariate Poisson distribution, neither the discrete nature of the parent population nor the Poisson parameter affected the convergence.
For the bivariate normal distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient.
• For strong, moderate, weak, and no variable correlations, simulations show that the accuracy for all methods was reasonable. However, the classical procedure was the only method to consistently meet the accuracy criteria set by each confidence level.
• For strong correlations, the ASCLT-based confidence interval method 2 had the most precise intervals. However, for weak and no correlations, the ASCLT-based confidence interval method 1 had the most precise intervals.
For the bivariate normal distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient.
• For strong, moderate, weak, and no variable correlations, simulations show that the accuracy for all methods was reasonable. However, the bootstrap procedure was the only method to consistently meet the accuracy criteria set by each confidence level.
• For strong correlations, the ASCLT-based confidence interval method 2 typically had the most precise intervals. However, for weaker correlations, the ASCLT-based confidence interval method 1 had the most precise intervals.
For the bivariate normal distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient.
• For moderate, weak, and no variable correlations, simulations show that the accuracy for all methods was reasonable. However, for strong correlations, simulations show that all methods had accuracy issues, except for the classical procedure.
• For strong correlations, the ASCLT-based confidence interval method 2 had the most precise intervals. However, for moderate, weak, and no variable correlations, the ASCLT-based confidence interval method 1 had the most precise intervals.
After observing the simulation results for the bivariate normal distribution, consider the following concluding statements. For strong variable correlations, after considering both accuracy and precision collectively, the ASCLT-based confidence interval for method 2 had the "best" intervals by comparison. However, as the correlation weakened, and after the same considerations, the ASCLT-based confidence interval for method 1 had the "best" intervals.
For the bivariate exponential distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient.
• The bootstrap method generally had the most accurate intervals. However, this method occasionally did not meet the accuracy requirements set by the confidence level. Though the bootstrap method did not have the most precise intervals, its accuracy was the most reasonable.
• For strong, moderate, and weak variable correlations, simulations show that the classic procedure had serious accuracy issues.
For the bivariate exponential distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient from small sample simulations.
• For strong variable correlations, simulations show that all confidence interval methods had issues with both precision and accuracy.
• For moderate, weak, and no correlations, simulations show that the ASCLT-based confidence interval for method 1 had the most reasonable intervals. However, all methods still had some accuracy issues.
After observing the simulation results for the bivariate exponential distribution, consider the following concluding statements. For strong, moderate, and weak variable correlations, all methods had issues with accuracy regardless of the sample size. The only scenario where accuracy was achieved was when the simulated variables were independent. Also, for small sample sizes, the estimated intervals for each method were both inaccurate and imprecise. For strong to moderate variable correlations, after considering both accuracy and precision collectively, either the ASCLT-based confidence interval for method 1 (more precise) or the bootstrap method (more accurate) had the "best" intervals by comparison.
For the bivariate Poisson distribution, the following bullet points summarize the conclusions about the confidence intervals constructed for the population correlation coefficient.
• For any confidence level and sample size, simulations show that the accuracy for all methods was reasonable. For the independent bivariate Poisson distribution, the ASCLT-based confidence interval for method 1 consistently had the most precise intervals.
After observing the accuracy and precision for the bivariate poisson distribution, the
ASCLT-based confidence interval for method 1 had the “best” intervals by comparison.
6.2 Future Research
Significant numerical and theoretical investigations have been completed; however, there are topics that were not considered in this dissertation. Possible examples to consider: additional ways to speed up the convergence of the ASCLT-based distribution function methods 1 and 2 when the parent population is severely skewed; additional ways to make the interval estimation for methods 1 and 2 more precise for small sample sizes; and methods to perform a test of hypothesis about the correlation coefficient (i.e., ρ ≠ 0) using the ASCLT-based distribution functions.
Appendix
Table 5.1: Simulation Results for the Bivariate Normal Distribution
Table 5.2: Simulation Results for the Bivariate Exponential Distribution
Table 5.3: Simulation Results for the Bivariate Poisson Distribution
Bibliography
Lehmann, E. L., and J. P. Romano, Testing Statistical Hypotheses, New York: Springer,
2005.
Fisher, A., Convex-invariant means and a pathwise central limit theorem, Advances in
Mathematics, 63, 213 – 246, 1987.
Schatte, P., On strong versions of the central limit theorem, Mathematische Nachrichten,
137, 249 – 256, 1988.
Atlagh, M., Almost sure central limit theorem and law of the iterated logarithm for sums
of independent random variables, C. R. Acad. Sci. Paris Sér. I., 316, 929 – 933, 1993.
Atlagh, M., and M. Weber, An almost sure central limit theorem relative to
subsequences, C. R. Acad. Sci. Paris Sér. I., 315, 203 – 206, 1992.
Berkes, I., and H. Dehling. Some limit theorems in log density, The Annals of
Probability, 21, 1640 – 1670, 1993.
Hörmann, S., Critical behavior in almost sure central limit theory, Journal of Theoretical
Probability, 20, 613 – 636, 2007.
Lévy, P., Théorie de l'addition des variables aléatoires, Gauthier-Villars, Paris, 1937.
Thangavelu, K., Quantile estimation based on the almost sure central limit theorem,
Ph.D. thesis, University of Göttingen, Department of Medical Statistics, 2005.
Ferguson, T.S., A Course in Large Sample Theory, Chapman & Hall, New York, 2002.
Rogers, J. L., and A. Nicewander, Thirteen ways to look at the correlation coefficient,
The American Statistician, 42, 59-66, 1988.
Pearson, K., Notes on the history of correlation, Biometrika, 13, 25-45, 1920.
Fisher, R.A., Frequency distribution of the values of the correlation coefficient in samples
of an indefinitely large population, Biometrika, 10, 507–521, 1915.
Hotelling, H., New light on the correlation coefficient and its transforms, Journal of the
Royal Statistical Society, B, 15, 193–225, 1953.
Fisher, R.A., On the 'probable error' of a coefficient of correlation deduced from a small
sample, Metron, 1, 3 – 32, 1921.
Ruben, H., Some new results on the distribution of the sample correlation coefficient,
Journal of the Royal Statistical Society, Series B (Methodological), 28, 513-525, 1966.
Sun, Y., and A. C. M. Wong, Interval estimation for the normal correlation coefficient,
Statistics and Probability Letters, 77, 1652 – 1661, 2007.
Dudley, R. M., Real Analysis and Probability, Wadsworth & Brooks/Cole, 1989.
Berkes, I., and E. Csáki, A universal result in almost sure central limit theory, Stochastic
Processes and their applications, 94, 105 – 134, 2001.
Lifshits, M. A., The almost sure limit theorem for sums of random vectors, Journal of
Mathematical Sciences, 109, 2166 – 2178, 2002.
Lifshits, M. A., The almost sure limit theorem for sums of random vectors, Translated
from Zapiski Nauchnykh Seminarov POMI, 260, 186 – 201, 1999.
Lacey, M. T., and W. Philipp, A note on the almost sure central limit theorem, Statistics
and Probability Letters, 9, 201 – 205, 1990.
Berkes, I., Results and problems related to the pointwise central limit theorem, in B.
Szyszkowicz (Ed.), Asymptotic Methods in Probability and Statistics – A Volume in Honour
of Miklós Csörgő, 59 – 96, Elsevier, Amsterdam, 1998.
Holzmann, H., S. Koch, and A. Min, Almost sure limit theorems for U-statistics,
Statistics and Probability Letters, 69, 261 – 269, 2004.
Van der Vaart, A. W., Asymptotic Statistics, Cambridge University Press, New York,
1998.
Manoukian, E., Mathematical Nonparametric Statistics, Gordon and Breach Science
Publishers S. A., 1986.
Dehling, H., M. Denker, and W. Philipp, Invariance principles for von Mises and U-
statistics, Z. Wahrscheinlichkeitstheor. Verw. Geb., 67, 139 – 167, 1984.
Lange, K., Numerical Analysis for Statisticians, Springer-Verlag New York Inc., New
York, NY 10010, U.S.A., 1999.
Gentle, J. E., Random Number Generation and Monte Carlo Methods, second ed.,
Springer-Verlag New York Inc., New York, NY 10010, U.S.A., 2003.
Nelsen, R. B., An Introduction to Copulas, Springer-Verlag New York Inc., New York, NY
10010, U.S.A., 1999.
Devroye, L., Non-Uniform Random Variate Generation, Springer-Verlag New York Inc.,
New York, NY 10010, U.S.A., 1986.
Efron, B., and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New
York, 1993.
Devore, J. L., Probability and Statistics for Engineering and the Sciences, sixth ed.,
Brooks/Cole – Thomson Learning, Belmont, Ca., 2004.