Probability Distributions

❖ Discrete Distributions: each value $y_i \in \{y_1, y_2, \dots, y_n\}$ has probability $P(y = y_i) = p(y_i)$

❖ Continuous Distributions: a density $f(y)$ with
$$0 \le f(y), \qquad P(a \le y \le b) = \int_a^b f(y)\,dy, \qquad \int_{-\infty}^{\infty} f(y)\,dy = 1$$
so $P(a \le y \le b)$ is the area under $f(y)$ between $a$ and $b$.
Mean and Variance Operators

❖ Expected Value
$$\mu =: E(y) = \sum_i y_i\,p(y_i) \quad \text{(discrete)}, \qquad \mu = \int y\,f(y)\,dy \quad \text{(continuous)}$$

❖ Variance
$$\sigma^2 =: V(y) = \sum_i (y_i - \mu)^2\,p(y_i) \quad \text{(discrete)}, \qquad \sigma^2 = \int (y - \mu)^2 f(y)\,dy \quad \text{(continuous)}$$
$$\sigma^2 = E[(y - \mu)^2]$$
Fundamental Relationships

$$E(c) = c \qquad (1)$$
$$E(y) = \mu \qquad (2)$$
$$E(cy) = cE(y) = c\mu \qquad (3)$$
$$V(c) = 0 \qquad (4)$$
$$V(y) = \sigma^2 \qquad (5)$$
$$V(cy) = c^2 V(y) = c^2 \sigma^2 \qquad (6)$$
$$E(y_1 + y_2) = E(y_1) + E(y_2) = \mu_1 + \mu_2 \qquad (7)$$
$$\mathrm{Cov}(y_1, y_2) = E[(y_1 - \mu_1)(y_2 - \mu_2)] \qquad (8)$$
$$V(y_1 + y_2) = V(y_1) + V(y_2) + 2\,\mathrm{Cov}(y_1, y_2) \qquad (9)$$
$$V(y_1 - y_2) = V(y_1) + V(y_2) - 2\,\mathrm{Cov}(y_1, y_2) \qquad (10)$$
Population and Samples
Distributions and Discrete Variables
Binomial Distribution (d)

❖ trials $z_1, \dots, z_n$ are independent
❖ a trial outcome can be either success or failure
❖ the probability of success $p$ is constant

[Figure: pmf of Binom(20, 0.5)]
Binomial Distribution (d)

$$x = z_1 + z_2 + \cdots + z_n$$
$$x \sim \mathrm{Binom}(n, p) \text{ when:} \qquad f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x \in \{0, 1, \dots, n\}$$

$n$ = number of trials
$p$ = probability of success for each trial
$$\mu = np, \qquad \sigma^2 = np(1-p)$$

[Figure: pmf of Binom(20, 0.5)]
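The pmf and moments above can be verified numerically. A minimal Python sketch (not part of the slides; the parameters mirror the plotted Binom(20, 0.5)):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial pmf: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.5                                   # parameters of the plotted Binom(20, 0.5)
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
total = sum(pmf)                                 # a pmf sums to 1
mu = sum(x * f for x, f in enumerate(pmf))       # equals n*p = 10
var = sum((x - mu)**2 * f for x, f in enumerate(pmf))  # equals n*p*(1-p) = 5
```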
Poisson Distribution (d)

❖ In a binomial experiment, when $n$ becomes large (infinite) while the distribution mean remains constant:

$$x \sim \mathrm{Poisson}(\lambda) \text{ when:} \qquad f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

$\lambda$ = expected number of successes within an interval
$$E(x) = \mu = \lambda; \qquad V(x) = \sigma^2 = \lambda$$

[Figure: pmf of Poisson(5)]
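The identity $E(x) = V(x) = \lambda$ can be checked by truncated summation of the pmf. A small sketch (not from the slides; $\lambda = 5$ matches the plotted Poisson(5)):

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson pmf: lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)

lam = 5.0                                   # matches the plotted Poisson(5)
xs = range(150)                             # truncated support; remaining tail mass is negligible
mu = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mu)**2 * poisson_pmf(x, lam) for x in xs)
# both mu and var come out equal to lam
```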
Geometric Distribution (d)

$$x \sim \mathrm{Geom}(p) \text{ when:} \qquad f(x) = p(1-p)^{x-1}, \quad x \in \{1, 2, \dots\}$$

$p$ = probability of success for each trial

[Figure: pmf of Geom(0.1)]
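A quick numeric check of the geometric pmf, sketched in Python (not from the slides; $p = 0.1$ matches the plotted Geom(0.1), and the mean $1/p$ is a standard fact):

```python
def geom_pmf(x: int, p: float) -> float:
    """Geometric pmf: p * (1 - p)^(x - 1), for x = 1, 2, ..."""
    return p * (1 - p)**(x - 1)

p = 0.1                                    # matches the plotted Geom(0.1)
xs = range(1, 2000)                        # truncation of the infinite support
total = sum(geom_pmf(x, p) for x in xs)    # should be ~1
mu = sum(x * geom_pmf(x, p) for x in xs)   # mean of Geom(p) is 1/p = 10
```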
Uniform Distribution (d, c)

$$y \sim U(a, b) \text{ when:} \qquad f(y) = \begin{cases} 0 & y < a,\ y > b \\ 1/(b-a) & a \le y \le b \end{cases}$$

[Figure: density of U(0, 1)]
Normal Distribution (c)

$$y \sim N(\mu, \sigma^2) \text{ when:} \qquad f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left[\frac{y-\mu}{\sigma}\right]^2}$$

Note: $\dfrac{y - \mu}{\sigma} \sim N(0, 1)$

[Figure: density of N(0, 1)]
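The standardization in the note can be illustrated with the Python standard library: the cdf of $N(\mu, \sigma^2)$ at $y$ equals the standard-normal cdf at $z = (y - \mu)/\sigma$. The parameters below are hypothetical, purely for illustration:

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0          # hypothetical parameters
y = 13.0
z = (y - mu) / sigma           # standardized value, here 1.5

# P(Y <= y) under N(mu, sigma^2) equals P(Z <= z) under N(0, 1)
p_y = NormalDist(mu, sigma).cdf(y)
p_z = NormalDist(0.0, 1.0).cdf(z)
```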
Chi-square Distribution (c)

$$y \sim \chi^2_k \text{ when:} \qquad f(y) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, y^{k/2 - 1}\, e^{-y/2}$$
$$\mu = k, \qquad \sigma^2 = 2k$$

Note: $\dfrac{SS}{\sigma^2} = \dfrac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sigma^2} \sim \chi^2_{n-1}$

[Figure: density of $\chi^2_k$]
Student's Distribution (c)

$$y \sim t_k \text{ when:} \qquad y = \frac{z}{\sqrt{x/k}}; \quad z \sim N(0, 1),\ x \sim \chi^2_k$$
$$f(y) = \frac{\Gamma[(k+1)/2]}{\sqrt{k\pi}\,\Gamma(k/2)}\,\left[(y^2/k) + 1\right]^{-(k+1)/2}$$
$$\mu = 0, \qquad \sigma^2 = \frac{k}{k - 2}$$

Note: $t_k \to N(0, 1)$ as $k \to \infty$

[Figure: densities of $t_k$ for $k = 1, 5, 10^6$]
Snedecor's F Distribution (c)

$$y \sim F_{u,v} \text{ when:} \qquad y = \frac{x_u/u}{x_v/v}; \quad x_u \sim \chi^2_u,\ x_v \sim \chi^2_v$$
$$f(y) = \frac{\Gamma\!\left(\frac{u+v}{2}\right)\left(\frac{u}{v}\right)^{u/2} y^{(u/2)-1}}{\Gamma\!\left(\frac{u}{2}\right)\Gamma\!\left(\frac{v}{2}\right)\left[\frac{u}{v}\,y + 1\right]^{(u+v)/2}}$$

[Figure: density of $F_{10,10}$]
Distribution of Mean

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Moreover, if $\sigma^2$ is unknown it can be replaced with the sample variance $S^2$, and in this case it holds:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
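The first statement can be demonstrated by simulation: standardized means of uniform samples behave like $N(0, 1)$ draws. A seeded sketch (not from the slides; sample size and repetition count are arbitrary choices):

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(42)                    # reproducible sketch
n, reps = 30, 5000                 # hypothetical sample size and replications
mu, sigma = 0.5, 1 / sqrt(12)      # exact mean and sd of U(0, 1)

# standardized sample means: (x-bar - mu) / (sigma / sqrt(n))
zs = [(fmean(random.random() for _ in range(n)) - mu) / (sigma / sqrt(n))
      for _ in range(reps)]
# zs should have mean close to 0 and standard deviation close to 1
```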
From Sample to Population

Inferential Statistics: induction of population properties from sample observation.
Difference in Sample Statistics

[Figure: densities $N(\mu_1, \sigma_1^2)$ (Treatment 1) and $N(\mu_2, \sigma_2^2)$ (Treatment 2), centered at $\mu_1$ and $\mu_2$]

$$H_0: \mu_1 = \mu_2$$
$$H_1: \mu_1 \ne \mu_2$$
Error Probability

$$\alpha = P(\text{type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
$$\beta = P(\text{type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$
$$\text{Power} = 1 - \beta = P(\text{reject } H_0 \mid H_0 \text{ is false})$$

Note: $\alpha \ne 1 - \beta$
Quantile function: the value of $t_n$ below which the cumulative probability is $\alpha/2$.

[Figure: four panels — Distribution (`dnorm(x)`), Cumulative prob. (`pnorm(x)`), Quantile function, and "Upper Tail" (`pnorm(x, low = F)`)]
Student's (t)-Test

[Figure: t densities `dt(x, k)` illustrating the two outcomes: $p < \alpha$ (reject $H_0$, conclude $\mu_1 \ne \mu_2$) and $p > \alpha$ (fail to reject, $\mu_1 = \mu_2$)]
One Sample, Unknown Variance

Test statistic: $t_0 = \dfrac{\bar{y} - \mu_0}{S/\sqrt{n}}$

| Hypotheses | Reject $H_0$ when |
|---|---|
| $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $\lvert t_0 \rvert > t_{\alpha/2,\,n-1}$ |
| $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$ | $t_0 < -t_{\alpha,\,n-1}$ |
| $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$ | $t_0 > t_{\alpha,\,n-1}$ |
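The two-sided test can be sketched on a small hypothetical sample (not from the slides); the critical value $t_{0.025,7} \approx 2.365$ is the standard tabulated value for $n = 8$:

```python
from math import sqrt
from statistics import fmean, stdev

y = [4.7, 5.1, 4.8, 5.6, 4.9, 5.3, 4.6, 5.4]   # hypothetical sample, n = 8
mu0 = 5.0                                       # hypothesized mean
n = len(y)

t0 = (fmean(y) - mu0) / (stdev(y) / sqrt(n))    # test statistic

t_crit = 2.365                # tabulated t_{0.025, 7} for the two-sided test
reject = abs(t0) > t_crit     # here |t0| is small, so H0 is not rejected
```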
Two Samples, Unknown Variance ($\sigma_1^2 = \sigma_2^2$)

Test statistic: $t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{1/n_1 + 1/n_2}}$, with
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}, \qquad \nu = n_1 + n_2 - 2$$

| Hypotheses | Reject $H_0$ when |
|---|---|
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ | $\lvert t_0 \rvert > t_{\alpha/2,\,\nu}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$ | $t_0 < -t_{\alpha,\,\nu}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$ | $t_0 > t_{\alpha,\,\nu}$ |
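The pooled statistic can be computed directly. A sketch on hypothetical data, assuming equal population variances as the slide requires:

```python
from math import sqrt
from statistics import fmean, variance

# hypothetical samples (n1 = n2 = 10), assumed to share a common variance
y1 = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
y2 = [16.62, 16.75, 17.37, 17.12, 16.98, 16.87, 17.34, 17.02, 17.08, 17.27]
n1, n2 = len(y1), len(y2)

# pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
t0 = (fmean(y1) - fmean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
nu = n1 + n2 - 2              # degrees of freedom for the reference t distribution
```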
Two Samples, Unknown Variance ($\sigma_1^2 \ne \sigma_2^2$)

Test statistic: $t_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$, with
$$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$

| Hypotheses | Reject $H_0$ when |
|---|---|
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ | $\lvert t_0 \rvert > t_{\alpha/2,\,\nu}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$ | $t_0 < -t_{\alpha,\,\nu}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$ | $t_0 > t_{\alpha,\,\nu}$ |
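The approximate degrees of freedom $\nu$ above (Welch-Satterthwaite) are easy to compute once the per-sample terms $S_i^2/n_i$ are in hand. A sketch with hypothetical samples of visibly unequal spread:

```python
from math import sqrt
from statistics import fmean, variance

# hypothetical samples with clearly different spreads
y1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
y2 = [10.2, 13.1, 9.8, 12.7, 11.5, 13.4]
n1, n2 = len(y1), len(y2)
v1, v2 = variance(y1) / n1, variance(y2) / n2   # the S_i^2 / n_i terms

t0 = (fmean(y1) - fmean(y2)) / sqrt(v1 + v2)
nu = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch-Satterthwaite df
# nu comes out well below n1 + n2 - 2 = 10, reflecting the unequal variances
```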
Known Pop. Variance

One sample: $Z_0 = \dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$

| Hypotheses | Reject $H_0$ when |
|---|---|
| $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$ | $\lvert Z_0 \rvert > Z_{\alpha/2}$ |
| $H_0: \mu = \mu_0$, $H_1: \mu < \mu_0$ | $Z_0 < -Z_{\alpha}$ |
| $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$ | $Z_0 > Z_{\alpha}$ |

Two samples: $Z_0 = \dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

| Hypotheses | Reject $H_0$ when |
|---|---|
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ | $\lvert Z_0 \rvert > Z_{\alpha/2}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 < \mu_2$ | $Z_0 < -Z_{\alpha}$ |
| $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 > \mu_2$ | $Z_0 > Z_{\alpha}$ |
Paired t-Test

Statistical model:
$$y_{ij} = \mu_i + \beta_j + \epsilon_{ij}; \qquad i = 1, 2; \quad j = 1, 2, \dots, n$$
$$d_j = y_{1j} - y_{2j}; \qquad j = 1, 2, \dots, n$$
$$\mu_d = E(d_j) = E(y_{1j} - y_{2j}) = E(y_{1j}) - E(y_{2j}) = (\mu_1 + \beta_j) - (\mu_2 + \beta_j) = \mu_1 - \mu_2$$
Paired t-Test

Testing the hypothesis $H_0: \mu_1 = \mu_2$ means to test the couple of hypotheses:
$$H_0: \mu_d = 0$$
$$H_1: \mu_d \ne 0$$
and:
$$S_d = \sqrt{\frac{\sum_j (d_j - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum_j d_j^2 - \frac{1}{n}\left(\sum_j d_j\right)^2}{n - 1}}$$
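Both forms of $S_d$ give the same value, which a short sketch can confirm on hypothetical paired measurements (not from the slides):

```python
from math import sqrt
from statistics import fmean, stdev

# hypothetical paired measurements on the same n = 8 specimens
y1 = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 7.5]
y2 = [6.9, 6.7, 7.1, 7.0, 6.6, 7.2, 7.0, 7.1]
d = [a - b for a, b in zip(y1, y2)]    # within-pair differences d_j

n = len(d)
d_bar = fmean(d)
sd_def = stdev(d)                      # S_d from the definition
sd_short = sqrt((sum(x * x for x in d) - sum(d)**2 / n) / (n - 1))  # shortcut form
t0 = d_bar / (sd_def / sqrt(n))        # paired-t statistic
```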
Paired t-Test

❖ In the two-samples t-test:
  ❖ variances of populations are equal
  ❖ observations are independent random variables $NID(\mu, \sigma^2)$
  ❖ populations are normally distributed
Inference on Variance

| | One variance | Two variances |
|---|---|---|
| Test | $H_0: \sigma^2 = \sigma_0^2$ vs $H_1: \sigma^2 \ne \sigma_0^2$ | $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \ne \sigma_2^2$ |
| Statistic | $\chi_0^2 = \dfrac{SS}{\sigma_0^2} = \dfrac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$ | $F_0 = \dfrac{S_1^2}{S_2^2} \sim F_{n_1-1,\,n_2-1}$ |
[Figure: scatter plots of d2 and d3 against d1, annotated with the corresponding sample covariances]
Normality Check

[Figure: Normal Q-Q plots (sample quantiles vs theoretical quantiles) for d1, d3, and a U(1,5) sample]
How to Build Q-Q Plots

| # | y | f(x<y) | q(f) |
|---|---|---|---|
| 3 | 13.28 | 0.32 | -0.47 |
| 4 | 14.31 | 0.44 | -0.15 |
| 5 | 14.55 | 0.56 | 0.15 |
| 6 | 16.16 | 0.68 | 0.47 |
| 7 | 16.52 | 0.80 | 0.85 |

[Figure: Q-Q plot of Sample Quantiles (y) vs Theoretical Quantiles (q(f))]
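The construction can be sketched in Python on a hypothetical sample. The plotting position $f = (i - 0.5)/n$ used here is one common convention (an assumption; the slide's exact convention may differ slightly), and $q(f)$ is the standard-normal inverse cdf:

```python
from statistics import NormalDist

# hypothetical sample of n = 8 observations
y = [12.90, 13.10, 13.28, 14.31, 14.55, 16.16, 16.52, 16.90]
n = len(y)

ys = sorted(y)                                 # order statistics
f = [(i - 0.5) / n for i in range(1, n + 1)]   # plotting positions (assumed convention)
q = [NormalDist().inv_cdf(p) for p in f]       # theoretical normal quantiles q(f)
points = list(zip(q, ys))                      # (theoretical, sample) pairs to plot
```

Plotting `points` and checking for a straight line is the normality check shown in the previous slide.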
P(Type II error), or (1 − Power)

[Figure: operating characteristic curves, β versus d, for sample sizes n = 2, 3, 4, 5, 10, 20, 50, 100]

$$d = \frac{|\mu_1 - \mu_2|}{2\sigma}; \qquad d = \frac{|\mu - \mu_0|}{2\sigma}$$
Confidence Interval

$$\bar{X} = \frac{1}{n}\sum_{i}^{n} X_i, \qquad S^2 = \frac{\sum_{i}^{n} (X_i - \bar{X})^2}{n - 1}$$
then
$$t_0 = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
is distributed as a Student's t with $n - 1$ degrees of freedom. If $c$ is the 95-th percentile of that distribution, then:
$$P(-c < t_0 < c) = 0.90$$
and, consequently:
$$P\left(\bar{X} - cS/\sqrt{n} < \mu < \bar{X} + cS/\sqrt{n}\right) = 0.90$$
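The 90% interval can be computed directly. A sketch on a hypothetical sample (not from the slides); $c = t_{0.95,9} \approx 1.833$ is the standard tabulated percentile for $n = 10$:

```python
from math import sqrt
from statistics import fmean, stdev

# hypothetical sample, n = 10
y = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 9.9]
n = len(y)
c = 1.833                      # tabulated 95th percentile of t with 9 df

half = c * stdev(y) / sqrt(n)              # half-width c * S / sqrt(n)
lo, hi = fmean(y) - half, fmean(y) + half  # 90% confidence interval for mu
```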
Confidence Interval

When the difference between two variables is of interest, the confidence interval can be described by the statistic; in fact:
$$P\left(-t_{\alpha/2,\,n_1+n_2-2} \le \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \le t_{\alpha/2,\,n_1+n_2-2}\right) = 1 - \alpha$$
Confidence Interval

And thus, since:
$$P\left((\bar{y}_1 - \bar{y}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{y}_1 - \bar{y}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right) = 1 - \alpha$$
❖ Chauvenet criterion:
  ❖ sample ➜ $\bar{x}, \sigma^2$ ➜ tail probability $P$ of the largest standardized deviation ➜ reject if $nP < 0.5$

❖ Grubbs' test:
$$G_0 = \frac{\max_{i=1,\dots,n} |Y_i - \bar{Y}|}{s}, \qquad \text{reject } H_0 \text{ if } G_0 > \frac{n-1}{\sqrt{n}}\sqrt{\frac{t^2_{\alpha/(2n),\,n-2}}{n - 2 + t^2_{\alpha/(2n),\,n-2}}}$$
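The Grubbs statistic $G_0$ itself needs only the sample mean and standard deviation. A sketch on a hypothetical sample containing one suspect value (the critical value would then come from the $t$ quantile in the formula above):

```python
from statistics import fmean, stdev

# hypothetical sample with a suspect high value (13.5)
y = [10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 13.5]

# G0 = largest absolute deviation from the mean, scaled by the sample sd
g0 = max(abs(v - fmean(y)) for v in y) / stdev(y)
```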
Pearson's Chi-square test

|       | Freq | None | Some |
|-------|------|------|------|
| Heavy |    7 |    1 |    3 |
| Never |   87 |   18 |   84 |
| Occas |   12 |    3 |    4 |
| Regul |    9 |    1 |    7 |
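The Pearson statistic $X^2 = \sum (O - E)^2 / E$, with expected counts $E_{ij} = (\text{row}_i \cdot \text{col}_j)/N$, can be computed for this table in a few lines (a sketch; only the counts come from the slide):

```python
# observed counts from the contingency table above
table = [
    [7, 1, 3],      # Heavy
    [87, 18, 84],   # Never
    [12, 3, 4],     # Occas
    [9, 1, 7],      # Regul
]
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
grand = sum(row_tot)

# X^2 = sum of (observed - expected)^2 / expected over all cells
x2 = sum((obs - row_tot[i] * col_tot[j] / grand)**2 / (row_tot[i] * col_tot[j] / grand)
         for i, r in enumerate(table) for j, obs in enumerate(r))
df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (cols - 1) = 6
```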