MATH 524 Nonparametric Statistics


Christian Genest, Ph.D., P.Stat., McGill University, Montréal (QC), Canada, Fall 2011
Nonparametric Statistics
Nonparametric statistics covers a wide range of topics, but the term generally refers to inference methods that are said to be distribution-free because they do not rely on assumptions such as: the data follow a Gaussian distribution; the data are drawn from a family $\{f_\theta\}$ of distributions, where $\theta$ is a parameter.
Chr. Genest
McGill University
2011
In this course, we will focus on: rank-based methods; hypothesis testing problems. Very little will be said about estimation in general, and nothing whatsoever about kernel-based estimation methods.
Notion of rank
Given a random sample $X_1, \dots, X_N$ from an arbitrary distribution $F$, let $R_i$ denote the rank of $X_i$, i.e., the integer between 1 and $N$ giving the position of $X_i$ among $X_1, \dots, X_N$ when the observations are sorted in increasing order.
Example
Consider the following data: $X_1 = 1.6$, $X_2 = 2.4$, $X_3 = 3.2$, $X_4 = 0.8$, $X_5 = -1$.
The associated ranks are: $R_1 = 3$, $R_2 = 4$, $R_3 = 5$, $R_4 = 2$, $R_5 = 1$.
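The ranks in this example can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `ranks` is hypothetical, it assumes there are no ties, and it takes $X_5 = -1$, which is the reading consistent with the stated ranks and with the sample mean 7/5 computed later in these notes.

```python
def ranks(values):
    """Return the rank (1 = smallest) of each entry, assuming no ties."""
    # Sort the indices by the value they point to, then number them 1..N.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


# Data from the example; X_5 = -1 is an assumption (see the note above).
x = [1.6, 2.4, 3.2, 0.8, -1.0]
r = ranks(x)
print(r)  # [3, 4, 5, 2, 1]
```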
Working hypothesis
It will generally be assumed that the data arise from a continuous distribution. This ensures that there are no ties, almost surely. As ties do occur in practice, the issue will be revisited later (one could, for instance, assign average ranks or add a small random noise to break ties).
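The two tie-handling options mentioned above can be sketched as follows. The function names, the sample `data`, and the noise scale and seed are all illustrative assumptions, not part of the course material; the course may define its tie conventions differently when the issue is revisited.

```python
import random


def average_ranks(values):
    """Midranks: tied observations share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the block [start, end) over all observations tied with order[start].
        end = start + 1
        while end < len(order) and values[order[end]] == values[order[start]]:
            end += 1
        # Ranks start+1, ..., end are shared; their average is (start + 1 + end) / 2.
        shared = (start + 1 + end) / 2
        for k in range(start, end):
            result[order[k]] = shared
        start = end
    return result


def jittered_ranks(values, scale=1e-9, seed=0):
    """Break ties by adding small uniform noise (scale and seed are arbitrary choices)."""
    rng = random.Random(seed)
    noisy = [v + rng.uniform(-scale, scale) for v in values]
    order = sorted(range(len(noisy)), key=lambda i: noisy[i])
    result = [0] * len(noisy)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


data = [2.0, 1.0, 2.0, 3.0]  # hypothetical sample containing one tie
print(average_ranks(data))   # [2.5, 1.0, 2.5, 4.0]
print(jittered_ranks(data))  # a permutation of 1..4; the tie is broken at random
```

Note that the noise scale must be small relative to the gaps between distinct observations, otherwise the jitter could reorder values that were not tied.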
Ranks: pro or con?

On the one hand, it looks like a waste: lots of information is being discarded!
On the other hand, there is a clear gain in robustness: the assumptions under which the procedures will be carried out are minimal.
Loss of information
In the previous example, one has
$$\bar{x} = \frac{1}{5}\,(1.6 + 2.4 + 3.2 + 0.8 - 1) = \frac{7}{5},$$
which is an estimate of $\mu$, the mean of $F$, while in contrast, the average of the ranks is always equal to 3, whatever $F$:
$$\frac{1}{5}\,(1 + 2 + 3 + 4 + 5) = 3.$$
Distribution of $(R_1, \dots, R_N)$
Denote by $R_i$ the rank of the $i$th observation, $i \in \{1, \dots, N\}$. If $F$ is continuous, the random vector $(R_1, \dots, R_N)$ is uniformly distributed on the set of permutations of $\{1, \dots, N\}$. This set has $N!$ points.
Numerical illustration
When $N = 4$, there are $4! = 24$ possibilities:
(1, 2, 3, 4)  (2, 1, 3, 4)  (3, 1, 2, 4)  (4, 1, 2, 3)
(1, 2, 4, 3)  (2, 1, 4, 3)  (3, 1, 4, 2)  (4, 1, 3, 2)
(1, 3, 2, 4)  (2, 3, 1, 4)  (3, 2, 1, 4)  (4, 2, 1, 3)
(1, 3, 4, 2)  (2, 3, 4, 1)  (3, 2, 4, 1)  (4, 2, 3, 1)
(1, 4, 2, 3)  (2, 4, 1, 3)  (3, 4, 1, 2)  (4, 3, 1, 2)
(1, 4, 3, 2)  (2, 4, 3, 1)  (3, 4, 2, 1)  (4, 3, 2, 1)
They are equally likely.
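The count of 24 and the claim of equal likelihood can be checked empirically. The sketch below enumerates the permutations and then simulates samples of size 4; the exponential distribution, the seed, and the number of draws are arbitrary choices made for illustration, and any other continuous $F$ should behave the same way.

```python
import itertools
import random
from collections import Counter


def ranks(values):
    """Return the rank (1 = smallest) of each entry, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


N = 4
perms = list(itertools.permutations(range(1, N + 1)))
print(len(perms))  # 24

rng = random.Random(524)  # arbitrary seed, for reproducibility only
draws = 24_000
counts = Counter(
    tuple(ranks([rng.expovariate(1.0) for _ in range(N)])) for _ in range(draws)
)

# Each of the 24 rank vectors should appear in roughly 1/24 (about 0.042) of the draws.
freqs = {p: counts[p] / draws for p in perms}
print(min(freqs.values()), max(freqs.values()))
```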
Invariance with respect to F

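If $F$ is continuous and strictly increasing, then
$$R_i \le R_j \iff X_i \le X_j \iff F(X_i) \le F(X_j).$$
Furthermore, $F(X_1), \dots, F(X_N) \overset{\text{iid}}{\sim} U(0, 1)$, because if $X \sim F$, then $F(X) \sim U(0, 1)$. Hence the ranks of the $X_i$ coincide with the ranks of a uniform sample, and their joint distribution does not depend on $F$.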
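As a small numerical illustration of this invariance, the sketch below draws a sample from a unit exponential distribution (an arbitrary choice of continuous, strictly increasing $F$ on its support) and checks that applying $F$ leaves the ranks unchanged. The helper names and the seed are hypothetical.

```python
import math
import random


def ranks(values):
    """Return the rank (1 = smallest) of each entry, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def exp_cdf(x, rate=1.0):
    """Distribution function of the exponential law with the given rate."""
    return 1.0 - math.exp(-rate * x)


rng = random.Random(2011)  # arbitrary seed
x = [rng.expovariate(1.0) for _ in range(10)]
u = [exp_cdf(v) for v in x]  # transformed sample F(X_1), ..., F(X_N)

# The transformation is increasing, so the two rank vectors should agree.
print(ranks(x) == ranks(u))
```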
Proof in the case $N = 2$

If $N = 2$, the pairs of ranks $(1, 2)$ and $(2, 1)$ are equiprobable. This is equivalent to saying that
$$\Pr(R_1 = 1) = \Pr(R_2 = 1) = \frac{1}{2}.$$
Is it really the case?
Let $f(x) = dF(x)/dx$. As $X_1$ and $X_2$ are independent, one has
$$\begin{aligned}
\Pr(R_1 = 1) &= \Pr(R_1 < R_2) = \Pr(X_1 < X_2) \\
&= \int \Pr(X_1 \le x \mid X_2 = x)\, f(x)\, dx \\
&= \int \Pr(X_1 \le x)\, dF(x) \\
&= \int F(x)\, dF(x).
\end{aligned}$$
Now if $u = F(x) \in [0, 1]$, then
$$\int F(x)\, dF(x) = \int_0^1 u\, du = \left[ \frac{u^2}{2} \right]_0^1 = \frac{1}{2},$$
and the proof is complete. A similar argument will be presented in class for the case $N = 3$.
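The value 1/2 can also be checked by simulation. The sketch below is not part of the proof; it estimates $\Pr(X_1 < X_2)$ for independent draws from a lognormal distribution, chosen arbitrarily as an example of a continuous, non-uniform $F$. The seed and the number of trials are likewise arbitrary.

```python
import random

rng = random.Random(13)  # arbitrary seed
trials = 100_000

hits = 0
for _ in range(trials):
    # Two independent draws from the same continuous distribution F.
    x1 = rng.lognormvariate(0.0, 1.0)
    x2 = rng.lognormvariate(0.0, 1.0)
    if x1 < x2:  # equivalent to the event R_1 = 1
        hits += 1

estimate = hits / trials
print(estimate)  # should be close to 0.5
```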
Consequence for the moments

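Simple calculations show that
$$\frac{1}{N} \sum_{i=1}^{N} R_i = \frac{1}{2}\,(N + 1) \equiv \bar{R},$$
$$\frac{1}{N} \sum_{i=1}^{N} R_i^2 = \frac{1}{6}\,(N + 1)(2N + 1),$$
$$\operatorname{var}(R) \equiv \frac{1}{N} \sum_{i=1}^{N} \left( R_i - \bar{R} \right)^2 = \frac{N^2 - 1}{12}.$$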
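These identities hold because, in the absence of ties, the ranks are always a rearrangement of $1, \dots, N$. A quick exact check with rational arithmetic is sketched below; the function name `rank_moments` and the tested values of $N$ are illustrative choices.

```python
from fractions import Fraction


def rank_moments(n):
    """Exact mean, mean of squares, and variance of the ranks 1, ..., n."""
    all_ranks = range(1, n + 1)
    mean = Fraction(sum(all_ranks), n)
    mean_sq = Fraction(sum(r * r for r in all_ranks), n)
    var = sum((r - mean) ** 2 for r in all_ranks) / n
    return mean, mean_sq, var


for n in (2, 5, 10, 100):
    mean, mean_sq, var = rank_moments(n)
    # Compare with the closed-form expressions stated above.
    assert mean == Fraction(n + 1, 2)
    assert mean_sq == Fraction((n + 1) * (2 * n + 1), 6)
    assert var == Fraction(n * n - 1, 12)

print(rank_moments(5))  # (Fraction(3, 1), Fraction(11, 1), Fraction(2, 1))
```

With $N = 5$, the mean of 3 agrees with the average rank found in the earlier example.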
Is this interesting at all?

No, if all we have is a univariate sample!
Yes, if we want to compare two samples!
Even more so if we have a bivariate sample!
And in many other situations that we will explore together...