Hypothesis Testing

To determine whether a dataset is consistent with a given hypothesis, in order to check the validity of that hypothesis.

Hypotheses
Null hypothesis $H_0$:
Considered true by default.
The boring stuff we want to reject (e.g., the Standard Model).
Alternative hypothesis $H_1$:
Complementary to the null hypothesis.
The special case we wish to see (e.g., some new physics model).

Examples of Hypotheses
Null / alternative hypothesis:
A reconstructed particle is a (muon / electron).
The mass distribution (is completely consistent with the Standard Model / is not predicted by the SM but by another theory).

Frequently Used Hypotheses in HEP
Null hypothesis
“Background-only hypothesis”: only Standard Model processes contribute to the observation.
Alternative hypothesis
“Signal-plus-background hypothesis”: additional new-physics processes contribute.

More about Hypotheses
Simple hypothesis:
The expected distribution of the data is completely determined (no free parameters).
Composite hypothesis:
An ensemble of simple hypotheses, which may be related by a continuous parameter.

Test Statistic
Test statistic $t$:
A variable for testing how well the hypothesis agrees with the observation.
It can be either a scalar function or a vector function.
For a hypothesis $H$, there is an expected PDF $g(t \mid H)$ of the test statistic under that hypothesis.

Example:
Let the test statistic be the number of observed events $n$.
The expected number of background events is $b = 1.3$, and the expected number of signal events is $s = 2.0$.
Then the PDFs of $n$ under the BG-only and signal-plus-BG hypotheses are Poisson distributions with parameters $b = 1.3$ and $s + b = 3.3$, respectively.
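The two Poisson distributions of this example can be tabulated with a few lines of Python. This is only a sketch: the names `poisson_pmf`, `B`, and `S` are illustrative, and the signal expectation of 2.0 is inferred as the difference between the two quoted Poisson parameters (3.3 and 1.3).

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of observing exactly n events when `mean` are expected."""
    return math.exp(-mean) * mean**n / math.factorial(n)


B = 1.3  # expected background events (Poisson parameter of the BG-only case)
S = 2.0  # assumed expected signal events, chosen so that S + B = 3.3

for n in range(8):
    p_bg = poisson_pmf(n, B)        # PDF under the background-only hypothesis
    p_sb = poisson_pmf(n, S + B)    # PDF under the signal-plus-background hypothesis
    print(f"n={n}  P(n | BG-only)={p_bg:.4f}  P(n | signal+BG)={p_sb:.4f}")
```

Large counts are far more probable under the signal-plus-background hypothesis, which is what makes the event count usable as a test statistic here.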

Critical Region
Critical region: one of the ways to decide whether to reject the null hypothesis.
One-tailed test (examined later):
Define the region beyond a critical value, e.g. $t > t_c$, as the rejection region, and the rest, $t \le t_c$, as the acceptance region.
If the observed $t$ lies in the rejection region, then the null hypothesis is rejected.
If the observed $t$ lies in the acceptance region, then we fail to reject the null hypothesis.

$\alpha$: the probability that we reject the null hypothesis $H_0$ while $H_0$ is true.
$\beta$: the probability that we reject the alternative hypothesis $H_1$ while $H_1$ is true (that is, we fail to reject $H_0$).
$\alpha$ is called the size of the test, or the significance level. (Note that it is different from the confidence level or the p-value.)
$1 - \beta$ is called the power of the test.
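In integral form these definitions can be written as follows. The symbols are assumptions made here, since the slide's own notation did not survive: $W$ denotes the rejection region, $\overline{W}$ the acceptance region, and $g(t \mid H)$ the PDF of the test statistic under hypothesis $H$.

```latex
\alpha = \int_{W} g(t \mid H_0)\, dt, \qquad
\beta = \int_{\overline{W}} g(t \mid H_1)\, dt, \qquad
1 - \beta = \int_{W} g(t \mid H_1)\, dt .
```

For a discrete test statistic, such as an event count, the integrals become sums over the values in the corresponding region.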

One- and Two-Tailed Tests
A two-tailed test: the rejection region is split between the two tails of the distribution of $t$.
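As a rough illustration of the difference, the sketch below computes the critical values of a one-tailed and a two-tailed test of the same size. It assumes a test statistic that follows a standard normal distribution under the null hypothesis and a size of 0.05. Both are choices made for this example, not values from the slides.

```python
from statistics import NormalDist

ALPHA = 0.05               # illustrative size of the test
std_normal = NormalDist()  # assumed distribution of t under the null hypothesis

# One-tailed: the whole size alpha sits in the upper tail.
t_one = std_normal.inv_cdf(1.0 - ALPHA)

# Two-tailed: alpha is split equally between the lower and the upper tail.
t_two = std_normal.inv_cdf(1.0 - ALPHA / 2.0)


def rejects_one_tailed(t: float) -> bool:
    """True if t lies in the upper-tail rejection region."""
    return t > t_one


def rejects_two_tailed(t: float) -> bool:
    """True if t lies in either tail of the rejection region."""
    return abs(t) > t_two


print(f"one-tailed: reject if t > {t_one:.3f}")
print(f"two-tailed: reject if |t| > {t_two:.3f}")
```

For the same size, the two-tailed test needs a larger deviation in any single direction, because only half of the allowed type I error probability is spent on each tail.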

Type I and Type II Errors
Type I error: the null hypothesis is rejected while it is actually true. The probability of this kind of error is $\alpha$.
Type II error: one fails to reject the null hypothesis while it is actually false. If the alternative hypothesis is complementary to the null one, the probability of this kind of error is $\beta$.
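For the counting example (Poisson means 1.3 and 3.3), both error probabilities can be computed directly once a critical value is fixed. The sketch below assumes a rejection region of the form n >= `N_C`. The value `N_C = 5` and all function names are hypothetical choices for illustration.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    """Probability of exactly n events for a Poisson distribution."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def prob_at_least(n_c: int, mean: float) -> float:
    """P(n >= n_c) for a Poisson-distributed count n."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(n_c))


B = 1.3   # expected background (null hypothesis)
S = 2.0   # assumed expected signal, so the alternative has mean 3.3
N_C = 5   # hypothetical critical value: reject H0 when n >= N_C

alpha = prob_at_least(N_C, B)           # type I: reject H0 although H0 is true
beta = 1.0 - prob_at_least(N_C, S + B)  # type II: keep H0 although H1 is true

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {1.0 - beta:.4f}")
```

Raising `N_C` makes alpha smaller but beta larger, so the two error probabilities cannot be reduced independently with a fixed test statistic.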

The Testing Process
1. Define the null hypothesis and the alternative hypothesis.
2. Select a test statistic $t$ based on the specifics of the analysis.
3. Determine the expected distribution of $t$ under the null hypothesis.
4. Define the size $\alpha$, considering both type I and type II errors.
5. Determine the observed value of $t$ from the measured data.
6. Check which region $t$ lies in and draw a conclusion.
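The six steps can be sketched end to end for a simple counting experiment. Everything specific here is an assumption made for illustration: the Poisson means reuse the earlier counting example, the size `ALPHA = 0.05` is arbitrary, and the observed count `N_OBS = 6` is invented.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    return math.exp(-mean) * mean**n / math.factorial(n)


def prob_at_least(n_c: int, mean: float) -> float:
    """P(n >= n_c) for a Poisson-distributed count n."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(n_c))


def critical_value(alpha: float, mean: float) -> int:
    """Smallest n_c such that P(n >= n_c | mean) <= alpha."""
    n_c = 0
    while prob_at_least(n_c, mean) > alpha:
        n_c += 1
    return n_c


# Step 1: H0 = background only (mean B); H1 = signal plus background (mean S + B).
B, S = 1.3, 2.0
# Step 2: the test statistic t is the observed number of events n.
# Step 3: under H0, n follows a Poisson distribution with mean B.
# Step 4: choose the size of the test (illustrative value).
ALPHA = 0.05
n_c = critical_value(ALPHA, B)
# Step 5: the observed value of the test statistic (hypothetical data).
N_OBS = 6
# Step 6: check which region the observation lies in.
reject_h0 = N_OBS >= n_c

print(f"critical value n_c = {n_c}, observed n = {N_OBS}, reject H0: {reject_h0}")
```

Because the count is discrete, the actual type I error probability at the chosen critical value is generally somewhat below the requested size.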

Choosing the Test Statistic
A good test statistic results in a clear separation of the distributions of $t$ under the different hypotheses.
A test statistic is called “sufficient” if no other test statistic provides additional relevant information on the hypothesis model.
An ideal test statistic satisfies both and provides the best power $1 - \beta$ for a given size $\alpha$.

Examples of test statistics: reconstructed mass, transverse momentum, $\chi^2$ (for goodness-of-fit tests), etc.

Neyman-Pearson lemma:
If all of the hypotheses are simple, that is, we can entirely determine the model, the likelihood ratio $\lambda(x) = L(x \mid H_1) / L(x \mid H_0)$ of the data $x$ is the best choice of test statistic, and the best critical region is where $\lambda(x) > c$. Here $c$ is a constant adjusted to reach the size $\alpha$.
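For the counting example the likelihood ratio can be written down explicitly. The sketch below takes the ratio as alternative over null, which is one common convention and an assumption here, and checks numerically that it grows with the event count. Function and variable names are illustrative.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    return math.exp(-mean) * mean**n / math.factorial(n)


B = 1.3  # expected background (null hypothesis)
S = 2.0  # assumed expected signal (alternative hypothesis has mean S + B)


def likelihood_ratio(n: int) -> float:
    """lambda(n) = P(n | signal+BG) / P(n | BG-only) for the counting example."""
    return poisson_pmf(n, S + B) / poisson_pmf(n, B)


ratios = [likelihood_ratio(n) for n in range(10)]

# Analytically, lambda(n) = exp(-S) * (1 + S/B)**n, which increases with n.
is_increasing = all(a < b for a, b in zip(ratios, ratios[1:]))

for n, ratio in enumerate(ratios):
    print(f"n={n}  lambda={ratio:.4f}")
print("monotonically increasing:", is_increasing)
```

Since the ratio increases monotonically with n, a cut on the likelihood ratio selects the same outcomes as a cut on the event count, so in this simple case the count itself is already an optimal test statistic in the Neyman-Pearson sense.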

For composite hypotheses, if the set of simple hypotheses can be written as a function of a parameter $\theta$, then the power of the test can be written as $1 - \beta(\theta)$. We then have to choose the test according to the expected $\theta$.
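A power function of this kind can be tabulated for the counting example by treating the expected signal yield as the continuous parameter. This is a sketch under assumptions: the parameter is called `theta` here, the critical value `N_C = 4` is a hypothetical choice, and the background mean of 1.3 is reused from the earlier example.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    return math.exp(-mean) * mean**n / math.factorial(n)


def prob_at_least(n_c: int, mean: float) -> float:
    """P(n >= n_c) for a Poisson-distributed count n."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(n_c))


B = 1.3   # expected background, from the counting example
N_C = 4   # hypothetical critical value: reject H0 when n >= N_C


def power(theta: float) -> float:
    """Probability to reject H0 when the expected signal yield is theta."""
    return prob_at_least(N_C, theta + B)


thetas = [0.0, 1.0, 2.0, 4.0, 8.0]
powers = [power(theta) for theta in thetas]

for theta, p in zip(thetas, powers):
    print(f"theta = {theta:4.1f}  power = {p:.4f}")
```

At `theta = 0` the power reduces to the size of the test, and it rises toward one as the assumed signal grows. Two competing tests could be compared by overlaying their power functions in the range of `theta` that is actually expected.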

For composite hypotheses, we could also plot $1 - \beta$ versus $\alpha$, and choose the test with the best power for a given size $\alpha$.
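One way to read this is as a curve of power against size, obtained by scanning the critical value. The sketch below produces the points of such a curve for the counting example with fixed means 1.3 and 3.3. Treating the plotted quantities as power and size is an interpretation of the slide, and the names are illustrative.

```python
import math


def poisson_pmf(n: int, mean: float) -> float:
    return math.exp(-mean) * mean**n / math.factorial(n)


def prob_at_least(n_c: int, mean: float) -> float:
    """P(n >= n_c) for a Poisson-distributed count n."""
    return 1.0 - sum(poisson_pmf(n, mean) for n in range(n_c))


B = 1.3  # expected background (null hypothesis)
S = 2.0  # assumed expected signal (alternative hypothesis has mean S + B)

# Each critical value n_c gives one (size, power) point of the curve.
points = [
    (prob_at_least(n_c, B), prob_at_least(n_c, S + B))
    for n_c in range(10)
]

for n_c, (size, power) in enumerate(points):
    print(f"n_c = {n_c}  size = {size:.5f}  power = {power:.5f}")
```

Plotting the points of several candidate tests on the same axes shows directly which one has the larger power at the size one is willing to accept.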

Choosing the Critical Region
A trade-off between type I and type II errors has to be made.
In high-energy physics, $\alpha$ is commonly taken to be very small for a discovery.
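To give a sense of scale, the sketch below converts a significance expressed in Gaussian standard deviations into the corresponding one-sided size. The five-standard-deviation threshold used here is the widely quoted discovery convention in particle physics rather than a number given in these slides, and `one_sided_alpha` is an illustrative name.

```python
import math


def one_sided_alpha(z: float) -> float:
    """Upper-tail probability of a standard normal beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


for z in (3.0, 5.0):
    print(f"{z:.0f} sigma  ->  alpha = {one_sided_alpha(z):.3e}")
```

Five standard deviations corresponds to a size of roughly 2.9e-7, which illustrates how strongly type I errors are suppressed at the cost of reduced power.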

Determining the Test Statistic Distributions
If there are many data samples, the distribution will be approximately Gaussian and is easy to determine.
If not, the distribution is non-trivial. Usually, numerical methods are used to determine it.
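A common numerical approach is to generate many pseudo-experiments ("toys") under the null hypothesis and use the resulting sample as an estimate of the distribution. The sketch below does this for the event-count statistic with background mean 1.3, where the exact answer is known and can serve as a cross-check. The sampler, the seed, and the number of toys are illustrative choices.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


B = 1.3                     # expected background under the null hypothesis
N_TOYS = 100_000            # number of pseudo-experiments
rng = random.Random(12345)  # fixed seed so the run is reproducible

# Each toy is one simulated value of the test statistic under H0.
toys = [sample_poisson(B, rng) for _ in range(N_TOYS)]


def empirical_tail(t_obs: int) -> float:
    """Fraction of toys with t >= t_obs, an estimate of P(t >= t_obs | H0)."""
    return sum(1 for t in toys if t >= t_obs) / N_TOYS


print(f"toy mean = {sum(toys) / N_TOYS:.3f}")
print(f"estimated P(n >= 4 | H0) = {empirical_tail(4):.4f}")
```

The same loop works for a test statistic without a closed-form distribution: only the function that produces one toy value has to change. The statistical precision of the tail estimate is limited by the number of toys, which matters when the size of the test is very small.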