Nonparametric Statistics
The field covers a wide range of topics but refers generally to inference methods that are said to be distribution free because they do not rely on assumptions such as the following: the data are drawn from a Gaussian distribution; the data are drawn from a parametric family $\{f_\theta\}$ of distributions, where $\theta$ is a parameter.
Chr. Genest, McGill University, 2011
In this course, we will focus on rank-based methods and on hypothesis testing problems. Very little will be said about estimation in general, and nothing whatsoever about kernel-based estimation methods.
Notion of rank
Given a random sample $X_1, \ldots, X_N$ from an arbitrary distribution $F$, let $R_i$ = rank of $X_i$ = the integer between 1 and $N$ giving the position of $X_i$ when $X_1, \ldots, X_N$ are sorted in increasing order.
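The definition can be sketched in Python, assuming there are no ties: in that case $R_i$ equals the number of observations $X_j$ with $X_j \le X_i$. The function name `ranks` is illustrative only, and the quadratic-time counting is chosen for transparency rather than speed (a sort-based version would run in $O(N \log N)$).

```python
def ranks(xs):
    """Ranks R_1, ..., R_N of the observations in xs (no ties assumed).

    R_i is the number of observations X_j such that X_j <= X_i, i.e. the
    position of X_i when the sample is sorted in increasing order.
    """
    return [sum(1 for xj in xs if xj <= xi) for xi in xs]


print(ranks([0.3, -1.2, 2.5, 0.9]))  # [2, 1, 4, 3]
```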
Example
Consider the following data:
$$X_1 = 1.6, \quad X_2 = 2.4, \quad X_3 = 3.2, \quad X_4 = 0.8, \quad X_5 = -1.$$
Their ranks are
$$R_1 = 3, \quad R_2 = 4, \quad R_3 = 5, \quad R_4 = 2, \quad R_5 = 1.$$
Working hypothesis
It will generally be assumed that the data arise from a continuous distribution. This ensures that, almost surely (a.s.), there are no ties. As ties do occur in practice, the issue will be revisited later (one could, e.g., assign average ranks to tied observations or add a small random noise to break ties).
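A minimal Python sketch of the two tie-handling devices mentioned above. The helper names `average_ranks` and `jittered_ranks` and the noise size `scale` are illustrative assumptions, not part of the notes. In particular, `scale` must be smaller than the smallest gap between distinct observations (otherwise the noise could reorder values that are not tied) yet large enough not to be lost in floating-point rounding.

```python
import random


def average_ranks(xs):
    """Ranks in which each group of tied values gets the average of its ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # indices by increasing value
    result = [0.0] * len(xs)
    start = 0
    while start < len(order):
        # Extend `end` over the block of values tied with xs[order[start]].
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) carry ranks start+1..end+1.
        avg = (start + end + 2) / 2
        for k in range(start, end + 1):
            result[order[k]] = avg
        start = end + 1
    return result


def jittered_ranks(xs, scale=1e-9, rng=None):
    """Break ties by adding small uniform noise (hypothetical `scale`), then rank."""
    if rng is None:
        rng = random.Random()
    noisy = [x + rng.uniform(-scale, scale) for x in xs]
    order = sorted(range(len(noisy)), key=lambda i: noisy[i])
    result = [0] * len(xs)
    for r, i in enumerate(order, start=1):
        result[i] = r
    return result


data = [2.0, 1.0, 2.0, 3.0]
print(average_ranks(data))  # [2.5, 1.0, 2.5, 4.0]
print(jittered_ranks(data, rng=random.Random(0)))  # tied 2.0s get ranks 2 and 3 in random order
```

Average ranks keep the sum of the ranks equal to $N(N+1)/2$, whereas jittering restores a genuine permutation of $\{1, \ldots, N\}$ at the cost of introducing extra randomness.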
Loss of information
In the previous example, one has
$$\bar{x} = \frac{1}{5}\,(1.6 + 2.4 + 3.2 + 0.8 - 1) = \frac{7}{5} = \hat{\mu},$$
an estimate of $\mu$, the mean of $F$, while in contrast the average of the ranks is always equal to 3, whatever $F$:
$$\frac{1}{5}\,(1 + 2 + 3 + 4 + 5) = 3.$$
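The contrast between $\bar{x}$ and the average rank can be checked numerically. This Python sketch draws samples of size 5 from three arbitrarily chosen continuous distributions (the distributions, the seed and the function name `mean_and_rank_mean` are illustrative assumptions) and prints the data mean next to the rank mean.

```python
import random


def ranks(xs):
    # R_i = number of observations X_j <= X_i (no ties assumed).
    return [sum(1 for xj in xs if xj <= xi) for xi in xs]


def mean_and_rank_mean(xs):
    """Return (sample mean of xs, mean of the ranks of xs)."""
    n = len(xs)
    return sum(xs) / n, sum(ranks(xs)) / n


rng = random.Random(2011)  # arbitrary seed
samplers = {
    "normal": lambda: rng.gauss(0.0, 1.0),
    "exponential": lambda: rng.expovariate(1.0),
    "uniform(-5, 5)": lambda: rng.uniform(-5.0, 5.0),
}
for name, draw in samplers.items():
    xbar, rbar = mean_and_rank_mean([draw() for _ in range(5)])
    print(f"{name}: mean of data = {xbar:.3f}, mean of ranks = {rbar}")
```

The data mean changes from one distribution to the next, while the mean of the ranks is printed as 3.0 every time, because the ranks are always a permutation of $1, \ldots, 5$.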
Distribution of $(R_1, \ldots, R_N)$
Denote by $R_i$ the rank of the $i$th observation, $i \in \{1, \ldots, N\}$. If $F$ is continuous, the random vector $(R_1, \ldots, R_N)$ is uniformly distributed on the set of permutations of $\{1, \ldots, N\}$. This set has $N!$ points, so each permutation has probability $1/N!$.
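A quick Monte Carlo check of this uniformity, sketched in Python under the assumption that exponential data are a representative continuous $F$: with $N = 3$, each of the $3! = 6$ rank vectors should appear with relative frequency close to $1/6 \approx 0.167$. The function names, number of replications and seed are illustrative.

```python
import random
from collections import Counter


def rank_vector(xs):
    # R_i = number of observations X_j <= X_i (no ties a.s. under continuity).
    return tuple(sum(1 for xj in xs if xj <= xi) for xi in xs)


def rank_vector_frequencies(n, reps, draw, seed=0):
    """Relative frequency of each rank vector over `reps` samples of size n."""
    rng = random.Random(seed)
    counts = Counter(
        rank_vector([draw(rng) for _ in range(n)]) for _ in range(reps)
    )
    return {perm: c / reps for perm, c in counts.items()}


# Exponential(1) observations stand in for an arbitrary continuous F.
freqs = rank_vector_frequencies(3, 60_000, lambda rng: rng.expovariate(1.0))
for perm in sorted(freqs):
    print(perm, round(freqs[perm], 3))  # each close to 1/6 = 0.167
```

Swapping the `draw` argument for another continuous sampler (e.g. `lambda rng: rng.gauss(0.0, 1.0)`) should leave the frequencies essentially unchanged.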
Numerical illustration
When $N = 4$, there are $4! = 24$ possibilities:

(1, 2, 3, 4)   (2, 1, 3, 4)   (3, 1, 2, 4)   (4, 1, 2, 3)
(1, 2, 4, 3)   (2, 1, 4, 3)   (3, 1, 4, 2)   (4, 1, 3, 2)
(1, 3, 2, 4)   (2, 3, 1, 4)   (3, 2, 1, 4)   (4, 2, 1, 3)
(1, 3, 4, 2)   (2, 3, 4, 1)   (3, 2, 4, 1)   (4, 2, 3, 1)
(1, 4, 2, 3)   (2, 4, 1, 3)   (3, 4, 1, 2)   (4, 3, 1, 2)
(1, 4, 3, 2)   (2, 4, 3, 1)   (3, 4, 2, 1)   (4, 3, 2, 1)

They are equally likely.
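The list of possibilities can be generated with Python's standard `itertools.permutations`. This sketch also groups the permutations by their first entry: each value of $R_1$ occurs in exactly $3! = 6$ of them, so that uniformity implies $\Pr(R_1 = k) = 6/24 = 1/4$ for $k = 1, \ldots, 4$.

```python
from collections import Counter
from itertools import permutations

N = 4
perms = list(permutations(range(1, N + 1)))  # all rank vectors for N = 4
print(len(perms))  # 24

# Under uniformity each rank vector has probability 1/24; grouping by R_1
# shows Pr(R_1 = k) = 6/24 = 1/4 for k = 1, ..., 4.
by_first = Counter(p[0] for p in perms)
print(dict(sorted(by_first.items())))  # {1: 6, 2: 6, 3: 6, 4: 6}
```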
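Furthermore, $F(X_1), \ldots, F(X_N)$ are independent $\mathcal{U}(0, 1)$ random variables. For if $X \sim F$ with $F$ continuous, then $F(X) \sim \mathcal{U}(0, 1)$. Since $F$ is nondecreasing, the ranks of $F(X_1), \ldots, F(X_N)$ coincide a.s. with those of $X_1, \ldots, X_N$, so the distribution of $(R_1, \ldots, R_N)$ is the same for every continuous $F$.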
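The probability integral transform can be illustrated by simulation. In this Python sketch, the choice of the unit exponential distribution (so that $F(x) = 1 - e^{-x}$ for $x \ge 0$), the sample size and the seed are assumptions made for illustration: the transformed values should have mean close to $1/2$ and variance close to $1/12$, and their ranks coincide with those of the original data.

```python
import math
import random


def exp_cdf(x):
    """CDF F(x) = 1 - exp(-x) of the unit exponential distribution, x >= 0."""
    return 1.0 - math.exp(-x)


def ranks(xs):
    # R_i = number of observations X_j <= X_i (no ties assumed).
    return [sum(1 for xj in xs if xj <= xi) for xi in xs]


rng = random.Random(1)  # arbitrary seed
xs = [rng.expovariate(1.0) for _ in range(20_000)]
us = [exp_cdf(x) for x in xs]  # U_i = F(X_i), which should be U(0, 1)

mean_u = sum(us) / len(us)
var_u = sum((u - mean_u) ** 2 for u in us) / len(us)
print(round(mean_u, 3), round(var_u, 4))  # close to 0.5 and 1/12 = 0.0833

# F is increasing, so the X_i and the F(X_i) have the same ranks
# (checked on the first 10 observations to keep the quadratic ranking cheap).
small = xs[:10]
print(ranks(small) == ranks([exp_cdf(x) for x in small]))  # True
```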
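Consider the case $N = 2$ and let $f(x) = dF(x)/dx$. As $X_1$ and $X_2$ are independent, one has
$$\Pr(R_1 = 1) = \Pr(R_1 < R_2) = \Pr(X_1 < X_2) = \iint_{x_1 < x_2} f(x_1)\, f(x_2)\, dx_1\, dx_2 = \int_{-\infty}^{\infty} F(x)\, dF(x).$$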
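With the change of variable $u = F(x)$,
$$\int_{-\infty}^{\infty} F(x)\, dF(x) = \int_0^1 u\, du = \left[\frac{u^2}{2}\right]_0^1 = \frac{1}{2}.$$
Hence $\Pr(R_1 = 1) = \Pr(R_1 = 2) = 1/2$, i.e., $(R_1, R_2)$ is uniformly distributed on $\{(1, 2), (2, 1)\}$, and the proof is complete. A similar argument will be presented in class for the case $N = 3$.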
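The value $\int F(x)\,dF(x) = 1/2$ does not depend on $F$, which can be checked numerically. This Python sketch approximates $\int F(x) f(x)\,dx$ by the midpoint rule for two arbitrarily chosen continuous distributions, the standard normal (via `math.erf`) and the unit exponential; the truncation intervals and the number of steps are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def integral_F_dF(cdf, pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of F(x) f(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += cdf(x) * pdf(x)
    return h * total


# Tails outside the truncation intervals are negligible for both laws.
print(round(integral_F_dF(normal_cdf, normal_pdf, -10.0, 10.0), 6))  # 0.5
print(round(integral_F_dF(lambda x: 1.0 - math.exp(-x),
                          lambda x: math.exp(-x), 0.0, 50.0), 6))  # 0.5
```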
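Since $(R_1, \ldots, R_N)$ is always a permutation of $\{1, \ldots, N\}$, whatever $F$,
$$\frac{1}{N} \sum_{i=1}^N R_i = \frac{N + 1}{2} \equiv \bar{R}, \qquad \frac{1}{N} \sum_{i=1}^N R_i^2 = \frac{(N + 1)(2N + 1)}{6},$$
and hence
$$\operatorname{var}(R) \equiv \frac{1}{N} \sum_{i=1}^N \left(R_i - \bar{R}\right)^2 = \frac{N^2 - 1}{12}.$$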
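These identities are purely combinatorial, so they can be verified exactly with rational arithmetic. A short Python check using the standard `fractions.Fraction` type for $N = 1, \ldots, 50$ (the range and the function name `rank_moments` are arbitrary choices):

```python
from fractions import Fraction


def rank_moments(n):
    """Mean, mean square and variance (divisor N) of the ranks 1, ..., N."""
    ranks = range(1, n + 1)
    mean = Fraction(sum(ranks), n)
    mean_sq = Fraction(sum(r * r for r in ranks), n)
    var = sum((r - mean) ** 2 for r in ranks) / n
    return mean, mean_sq, var


for n in range(1, 51):
    mean, mean_sq, var = rank_moments(n)
    assert mean == Fraction(n + 1, 2)
    assert mean_sq == Fraction((n + 1) * (2 * n + 1), 6)
    assert var == Fraction(n * n - 1, 12)

print(rank_moments(5))  # (Fraction(3, 1), Fraction(11, 1), Fraction(2, 1))
```

For $N = 5$ this recovers $\bar{R} = 3$, in agreement with the average rank computed in the example.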