
Mathematical and Statistical Techniques
Part I: Probability and statistics
hugo.ruiz@ub.edu

STATISTICS
Statistics
Two classes of problems:

A. Quantification of the agreement of facts (or data) with a given theory

B. Search for the theory that best explains the data
Hypotheses
Simple hypothesis: determines uniquely the pdf of the data, f(x)
Composite hypothesis: the prediction depends on certain parameters θ: f(x; θ)

Let us stick for now to the case of simple hypotheses.

Often a statement of the validity of a hypothesis H_0 involves a comparison with the alternatives H_1, H_2, etc.
Each hypothesis specifies a pdf: f(x|H_i)
Test statistic
Test statistic: a function of the measured sample of values of the random variables that will be used to estimate the agreement with the hypotheses.
If we perform n measurements, we denote T = T(x_1, ..., x_n).

Each hypothesis will then imply a given pdf for the statistic: g(T|H_i).
Desired property: the pdfs from different hypotheses overlap as little as possible.
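The overlap requirement can be sketched numerically. A common figure of merit is the integral of the pointwise minimum of the two pdfs: the smaller it is, the better the statistic separates the hypotheses. The two Gaussian shapes below are hypothetical stand-ins for g(T|H_0) and g(T|H_1), not real detector distributions.

```python
import math

# Two simple hypotheses, each implying a pdf g(T|H) for the statistic T.
# The Gaussian shapes, means and widths are illustrative assumptions.
def gauss(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

g_H0 = lambda t: gauss(t, 0.0, 1.0)   # g(T|H0)
g_H1 = lambda t: gauss(t, 4.0, 1.0)   # g(T|H1)

# Overlap = integral of min(g(T|H0), g(T|H1)) dT, approximated on a grid.
def overlap(g0, g1, lo=-10.0, hi=14.0, n=20000):
    dt = (hi - lo) / n
    return sum(min(g0(lo + i * dt), g1(lo + i * dt)) for i in range(n)) * dt

print(round(overlap(g_H0, g_H1), 4))  # small overlap -> good separation
```

For two unit Gaussians 4σ apart the overlap is about 0.046; pushing the means further apart (a "better" statistic) drives it toward zero.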

Example: LHCb

Cerenkov radiation
cos θ_c = 1/(nβ)
http://skullsinthestars.com/2009/11/20/reversing-optical-shockwaves-using-metamaterials/

LHCb RICH detector

Kaon-pion separation statistic

[Figure: distributions of the statistic T, g(T|K) for kaons and g(T|π) for pions]

Remember: the less the distributions overlap, the better the statistic T.
Taking decisions
We decide whether a particle is a kaon according to whether T falls in the critical region (here: T ≥ t_cut, where the kaon hypothesis H_0 is rejected).
Significance level: the probability of rejecting H_0 if it is true,
α = ∫_{t_cut}^{∞} g(T|K) dT
The probability of accepting H_0 while the alternative H_1 (pion) is true is
β = ∫_{−∞}^{t_cut} g(T|π) dT
Power of the test: 1 − β
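As a numerical sketch of these definitions (the Gaussian shapes, means and cut value below are illustrative assumptions, not LHCb distributions):

```python
import math

# Normal CDF via the error function.
def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical Gaussian pdfs for the statistic: g(T|K) ~ N(0,1) for the
# kaon hypothesis H0, g(T|pi) ~ N(4,1) for the pion hypothesis H1.
mu_K, mu_pi, sigma = 0.0, 4.0, 1.0
t_cut = 2.0                      # accept the kaon hypothesis if T < t_cut

alpha = 1.0 - phi((t_cut - mu_K) / sigma)   # P(reject K | K)  = ∫_{t_cut}^∞ g(T|K) dT
beta  = phi((t_cut - mu_pi) / sigma)        # P(accept K | pi) = ∫_{-∞}^{t_cut} g(T|pi) dT
power = 1.0 - beta

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Moving t_cut trades α against β: a looser cut rejects true kaons less often but accepts more pions.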

Efficiencies, errors
The efficiencies for selecting kaons and pions (with the cut T < t_cut) are
ε_K = P(T < t_cut | K) = ∫_{−∞}^{t_cut} g(T|K) dT = 1 − α
ε_π = P(T < t_cut | π) = ∫_{−∞}^{t_cut} g(T|π) dT = β

Error of the 1st kind: rejecting H_0 when it is true.
Error of the 2nd kind: accepting H_0 when it is false.
But…
Let us assume that, for a given particle, we obtain a certain measured value of T.

Can I compute the probability for it to be a kaon or a pion? How?

Back to Bayes!
The problem is the same as in the disease example.
Exercise: relate α and β with the rates of fake positives.

The probability of a given particle being a pion or a kaon depends on how abundant the species are at LHC!
If the fractions are f_K and f_π, with f_K + f_π = 1, then the probabilities for a measured value of T are given by:

P(K|T) = f_K g(T|K) / [f_K g(T|K) + f_π g(T|π)]

P(π|T) = f_π g(T|π) / [f_K g(T|K) + f_π g(T|π)]
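A minimal sketch of this formula, with hypothetical Gaussian pdfs and an assumed kaon fraction f_K (neither is a real LHC number):

```python
import math

def gauss(t, mu, sigma):
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical shapes: kaons peak at low T, pions at high T.
g_K  = lambda t: gauss(t, 0.0, 1.0)   # g(T|K)
g_pi = lambda t: gauss(t, 4.0, 1.0)   # g(T|pi)

f_K  = 0.1                 # assumed kaon fraction, f_K + f_pi = 1
f_pi = 1.0 - f_K

def p_kaon(t):
    """P(K|T): Bayes' theorem with the species fractions as priors."""
    num = f_K * g_K(t)
    return num / (num + f_pi * g_pi(t))

# The same measured T gives very different P(K|T) across the range:
for t in (0.0, 2.0, 4.0):
    print(f"T = {t}: P(K|T) = {p_kaon(t):.3f}")
```

Note that at the crossing point T = 2, where the two pdfs are equal, P(K|T) collapses to the prior fraction f_K: the measurement carries no discriminating information there.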
Purity
If we select kaons based on a cut T < t_cut, what is the fraction of true kaons we are getting?

P_K = (number of kaons with T < t_cut) / (number of all particles with T < t_cut)

If we assume that all particles are kaons or pions (not a bad approximation):

P_K = ∫_{−∞}^{t_cut} f_K g(T|K) dT / ∫_{−∞}^{t_cut} [f_K g(T|K) + (1 − f_K) g(T|π)] dT

This is very relevant information for deciding t_cut (and note that it depends on f_K).
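For Gaussian pdfs the two integrals are normal CDFs, so the purity can be scanned as a function of t_cut. The shapes and the fraction f_K below are the same illustrative assumptions as before:

```python
import math

def phi(x):  # normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical Gaussians: g(T|K) ~ N(0,1), g(T|pi) ~ N(4,1),
# and an assumed kaon fraction f_K.
mu_K, mu_pi, sigma, f_K = 0.0, 4.0, 1.0, 0.1

def purity(t_cut):
    # ∫_{-∞}^{t_cut} f_K g(T|K) dT /
    #   ∫_{-∞}^{t_cut} [f_K g(T|K) + (1 - f_K) g(T|pi)] dT
    n_K  = f_K * phi((t_cut - mu_K) / sigma)
    n_pi = (1.0 - f_K) * phi((t_cut - mu_pi) / sigma)
    return n_K / (n_K + n_pi)

# A tighter cut gives a purer (but less efficient) kaon sample:
for t_cut in (0.0, 1.0, 2.0, 3.0):
    print(f"t_cut = {t_cut}: purity = {purity(t_cut):.3f}")
```

This makes the trade-off explicit: tightening t_cut raises the purity while the efficiency ε_K drops, and the whole curve shifts with f_K.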

P-value
Let us forget about pions.
We measure T for one particle.
The P-value for the kaon hypothesis is defined as:

P = P( |T − E[T]| ≥ |T_measured − E[T]| | K )

The closer to 1, the better the agreement of the data with the hypothesis.
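For a Gaussian g(T|K) this two-sided P-value has a closed form; the N(0,1) shape below is an assumption for illustration:

```python
import math

def phi(x):  # normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assume g(T|K) ~ N(mu_K, sigma).  Then
# P(|T - E[T]| >= |T_meas - E[T]| | K) = 2 * (1 - Phi(|z|)).
mu_K, sigma = 0.0, 1.0

def p_value(t_meas):
    z = abs(t_meas - mu_K) / sigma
    return 2.0 * (1.0 - phi(z))

print(p_value(0.1))   # close to 1: good agreement with the kaon hypothesis
print(p_value(3.5))   # close to 0: data disfavour the kaon hypothesis
```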
PARAMETER ESTIMATION (POINT ESTIMATION THEORY)

Samples, statistic
Let us go back to composite hypotheses: the predicted pdf depends on parameters θ = (θ_1, ..., θ_m).
That is, we have f(x; θ).

Assume we perform n independent measurements of x (a sample of size n).
This is equivalent to measuring x = (x_1, ..., x_n) with the same pdf for all the variables:

f_sample(x_1, ..., x_n; θ) = f(x_1; θ) · f(x_2; θ) ··· f(x_n; θ)

Statistic: any function of x which does not contain unknown parameters.
Estimators
Estimator: a statistic used to estimate some property of a pdf (its mean, variance, others).
Notation: θ̂ is the estimator of the parameter θ (let us drop the vector arrow so as not to overcomplicate).
It is itself a random variable!

Consistent estimator: lim_{n→∞} θ̂ = θ
Large sample limit = asymptotic limit = high statistics limit.
Consistency is a fair requirement for an estimator!

Parameter fitting: the procedure for estimating a parameter's value given the data (point estimation theory).

Bias of an estimator
If we repeat our full experiment (with its n measured values) many times, we will obtain different values of the statistic θ̂.
These values will be distributed differently according to the true values of the parameters θ.
Sampling distribution: the pdf of θ̂, g(θ̂; θ).
Bias of an estimator: b ≡ E[θ̂] − θ, with

E[θ̂; θ] = ∫ θ̂ g(θ̂; θ) dθ̂ = ∫···∫ θ̂(x_1, ..., x_n) f(x_1; θ) ··· f(x_n; θ) dx_1 ··· dx_n
Comments on the bias
The bias depends on:
the sample size n
the functional form of the estimator
the true pdf of x, f(x), including the value of θ
An estimator is said to be unbiased if the bias is zero for any n.

A consistent estimator can be biased: lim_{n→∞} θ̂ = θ, but for finite n, b = E[θ̂] − θ ≠ 0.

A little bit of bias (small compared with the other uncertainties) can be accepted in many cases.

Estimator for the mean
How do we estimate the mean μ?
Try the estimator called the sample mean:

x̄ = (1/n) Σ_{i=1}^{n} x_i

Not to be confused with the population mean μ = E[x], for which x̄ is an estimator.
We have

E[x̄] = E[ (1/n) Σ_{i=1}^{n} x_i ] = (1/n) Σ_{i=1}^{n} E[x_i] = (1/n) Σ_{i=1}^{n} μ = μ

This is an unbiased and consistent estimator.
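Both properties can be checked by brute force: repeat a pseudo-experiment many times, and watch the estimates scatter around the true μ (unbiasedness) with a spread that shrinks as n grows (consistency). The true μ and σ below are arbitrary choices for the sketch:

```python
import random
import statistics

# Simulate many experiments of sample size n, compute the sample mean in
# each, and look at the distribution of the estimates.
random.seed(42)
mu, sigma = 5.0, 2.0          # assumed true population mean and std. dev.

def sample_mean(n):
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

for n in (10, 100, 1000):
    estimates = [sample_mean(n) for _ in range(2000)]
    print(f"n = {n:5d}: mean of estimates = {statistics.fmean(estimates):.3f}, "
          f"spread = {statistics.stdev(estimates):.3f}")
```

The mean of the estimates stays at μ for every n (no bias), while the spread falls like 1/√n (consistency).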
Estimators for variance and covariance
Sample variance:

s² = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)²

It is an unbiased estimator for the population variance σ².
The estimator S² = (1/n) Σ_{i=1}^{n} (x_i − μ)², which uses the true mean μ, is also unbiased.
For the covariance matrix and the correlation coefficients:

V̂_xy = (1/(n−1)) Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)

r̂_xy = V̂_xy / (s_x s_y) = Σ_i (x_i − x̄)(y_i − ȳ) / √[ Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² ]
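The n−1 denominator is not cosmetic: with an n denominator and the sample mean, the estimator is biased low by a factor (n−1)/n. Python's standard library exposes both conventions, which makes the bias easy to see on small samples (the true variance below is an arbitrary choice):

```python
import random
import statistics

# statistics.variance uses n-1 (unbiased s^2); statistics.pvariance uses n.
random.seed(1)
sigma2 = 4.0                       # assumed true population variance
n, n_exp = 5, 50000                # small samples make the bias visible

s2, v_n = 0.0, 0.0
for _ in range(n_exp):
    xs = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    s2  += statistics.variance(xs)    # divides by n-1
    v_n += statistics.pvariance(xs)   # divides by n
print(f"mean of s^2 (n-1 denominator): {s2 / n_exp:.3f}")   # ~= 4.0, unbiased
print(f"mean with n denominator:       {v_n / n_exp:.3f}")  # ~= (n-1)/n * 4 = 3.2
```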

Variance of an estimator

V[θ̂] = E[θ̂²] − (E[θ̂])²

Remember: σ_θ̂ is a measure of the expected dispersion of θ̂ about its mean in a large number of similar experiments, each with sample size n.
It is sometimes quoted as the statistical uncertainty of θ̂.
Note: what we are saying in fact is

V[θ̂; θ = θ̂_measured] = E[θ̂²; θ = θ̂_measured] − (E[θ̂; θ = θ̂_measured])²

(see the later example on the exponential distribution)
Ex.: variance of the sample mean
For example, for the case of the mean:

V[x̄] = E[x̄²] − (E[x̄])²
     = E[ (1/n) Σ_i x_i · (1/n) Σ_j x_j ] − μ²
     = (1/n²) Σ_{i,j} E[x_i x_j] − μ²
     = (1/n²) [ n(n−1) μ² + n(μ² + σ²) ] − μ²
     = σ²/n

IMPORTANT! No correlation between consecutive values of the random variable:
E[x_i x_j] = μ² for i ≠ j
E[x_i²] = μ² + σ²
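The σ²/n result can be verified by simulation, under the same assumption the derivation uses (independent, hence uncorrelated, draws). The values of μ, σ and n below are arbitrary:

```python
import random
import statistics

# Brute-force check of V[x-bar] = sigma^2 / n for independent draws.
random.seed(7)
mu, sigma, n = 0.0, 3.0, 25

means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]
empirical = statistics.pvariance(means)
print(f"empirical V[x-bar] = {empirical:.3f}, sigma^2/n = {sigma**2 / n:.3f}")
```

With correlated draws the cross terms E[x_i x_j] no longer reduce to μ² and the result would differ, which is exactly the point flagged as IMPORTANT above.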

Example: Poisson distribution
Assume the number of events n in a time interval distributes according to a Poisson:

f(n; ν) = (ν^n / n!) e^{−ν}

We measure the number of events in one time interval, and we get n = 0.
What can we say about ν?
For one measurement, clearly ν̂ = n:
E[ν̂] = E[n] = μ = ν (unbiased)
So our measurement is ν̂ = 0.
Example: Poisson distribution
What is V[ν̂] = E[ν̂²] − (E[ν̂])² in this case? For a Poisson variable, V[ν̂] = V[n] = ν, which we would estimate as ν̂ = 0: zero statistical uncertainty for a measurement of zero events!

This shows the limitation of using V[θ̂] as the statistical uncertainty.
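A small simulation illustrates both halves of the argument: E[n] = V[n] = ν holds over many repetitions, yet a single n = 0 observation would quote ν̂ = 0 with zero estimated spread. The true ν below is an arbitrary choice, and the sampler is a simple inversion of the Poisson pmf:

```python
import math
import random

# For nu-hat = n with n ~ Poisson(nu): E[nu-hat] = V[nu-hat] = nu, but a
# single observation n = 0 gives nu-hat = 0 with estimated variance 0.
nu_true = 2.0
random.seed(3)

def poisson_sample(nu):
    # Inversion: accumulate pmf terms until they exceed a uniform draw
    # (fine for small nu).
    u, k, p, cum = random.random(), 0, math.exp(-nu), 0.0
    while True:
        cum += p
        if u < cum:
            return k
        k += 1
        p *= nu / k

draws = [poisson_sample(nu_true) for _ in range(50000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"E[n] ~ {mean:.2f}, V[n] ~ {var:.2f} (both ~ nu = {nu_true})")
print("but a single observation n = 0 would give nu-hat = 0 +- 0 !")
```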

Confidence intervals
Consider a parameter θ, its estimator θ̂, and g(θ̂|θ), the pdf of θ̂.
Let us define θ̂_1(θ) and θ̂_2(θ) by:

∫_{−∞}^{θ̂_1(θ)} g(θ̂|θ) dθ̂ = ∫_{θ̂_2(θ)}^{∞} g(θ̂|θ) dθ̂ = β/2

so that the central region between θ̂_1(θ) and θ̂_2(θ) contains probability 1 − β.

[Figure: g(θ̂|θ) for a given θ, with tail areas β/2 below θ̂_1(θ) and above θ̂_2(θ)]


Confidence intervals
Now invert the construction: given the measured value θ̂_measured, define
θ_a = θ̂_2^{−1}(θ̂_measured) and θ_b = θ̂_1^{−1}(θ̂_measured).

[Figure: the curves θ̂_1(θ) and θ̂_2(θ) in the (θ, θ̂) plane; the horizontal line θ̂ = θ̂_measured crosses them at θ_b and θ_a]

For θ > θ_b or θ < θ_a, the probability of having a value as eccentric as θ̂_measured is < β/2.

We will write: θ ∈ [θ_a, θ_b] at 1 − β C.L., or θ = θ̂ ^{+c}_{−d} with c = θ_b − θ̂ and d = θ̂ − θ_a.

Sometimes the 0.683 C.L. interval is used to quote the statistical uncertainty, and we just write θ = θ̂ ± σ_θ̂.

One-sided confidence interval
Define θ̂_1(θ) by:

∫_{−∞}^{θ̂_1(θ)} g(θ̂|θ) dθ̂ = 1 − β

[Figure: g(θ̂|θ) for a given θ, with area β above θ̂_1(θ)]

For θ > θ_b, with θ_b = θ̂_1^{−1}(θ̂_measured), the probability of having a value as eccentric as θ̂_measured is < 1 − β.

We will write: θ ≤ θ_b at β C.L.
Back to the Poisson example
Remember: we got 0 events, so our measurement is ν̂ = 0.
Clearly we cannot set a symmetric interval here.
(Note: n is discrete, so the integrals become sums.)
We have f(0; ν) = (ν⁰/0!) e^{−ν} = e^{−ν}.
If we want values ν > ν_up to be excluded with a C.L. β:

e^{−ν_up} = 1 − β  ⟹  ν_up = −ln(1 − β)

For a 95% C.L.: ν_up = −ln 0.05 ≈ 3, i.e. ν ≤ 3 at 95% C.L.
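The inversion above is a one-liner, which makes it easy to see how the limit moves with the chosen confidence level:

```python
import math

# Upper limit from the slide: with n = 0 observed, exclude nu > nu_up at
# confidence level beta, where P(n = 0 | nu_up) = exp(-nu_up) = 1 - beta.
def poisson_upper_limit_for_zero(beta):
    return -math.log(1.0 - beta)

print(poisson_upper_limit_for_zero(0.95))   # ~2.996: "nu <= 3 at 95% C.L."
print(poisson_upper_limit_for_zero(0.90))   # ~2.303
```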

Interpretation
As we defined it, the confidence interval represents the values of the parameter for which the difference between the parameter and the observed estimate is not statistically significant at the 1 − β level.

This is not the same as: "were this procedure to be repeated on multiple samples, the calculated confidence interval (which would differ for each sample) would encompass the true population parameter 90% of the time."
As an illustration, the latter cannot be stated without an assumption about the true value of the parameter in the repetitions (or its a priori distribution)! We'll come back to this.
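The repeated-sampling picture can be simulated directly, and doing so makes the slide's caveat concrete: the simulation only runs once we fix a true value of the parameter for the repetitions. The Gaussian-mean setup and the numbers below are illustrative assumptions:

```python
import random
import statistics

# Fix a true mu (this is the extra assumption the slide warns about),
# build the 0.683 C.L. interval x-bar +- sigma/sqrt(n) in each
# pseudo-experiment, and count how often it covers mu.
random.seed(11)
mu_true, sigma, n, n_exp = 10.0, 2.0, 16, 20000
half_width = sigma / n ** 0.5          # 68.3% interval for a Gaussian mean

covered = 0
for _ in range(n_exp):
    xbar = statistics.fmean(random.gauss(mu_true, sigma) for _ in range(n))
    if abs(xbar - mu_true) <= half_width:
        covered += 1
print(f"coverage = {covered / n_exp:.3f} (expected ~0.683)")
```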
Back to Bayes again!!
Beware of the interpretation of confidence intervals.
Assigning a probability to a given value of θ would require more information than just g(θ̂|θ):

P(θ|θ̂) = g(θ̂|θ) π(θ) / g(θ̂)

π(θ): a priori probability of θ
g(θ̂) = ∫ g(θ̂|θ′) π(θ′) dθ′: a priori probability of θ̂ (it takes into account the distribution of probabilities of θ!)
All this is normally not available in Physics.
