
Hypothesis Testing

Hypothesis Testing

To determine whether a dataset is consistent with a
given hypothesis, and so to check the hypothesis's validity.

Hypotheses

Null hypothesis:

Considered true by default

The boring stuff we want to reject
( e.g., Standard Model )

Hypotheses

Alternative hypothesis:

Complementary to null hypothesis

The special case we wish to see
( e.g., some new physics models )

Examples of Hypotheses

Null / Alternative hypothesis:

A reconstructed particle is ( a muon / an electron ).

The mass distribution ( is completely consistent
with the Standard Model / is not predicted by the SM
but by another theory ).

Frequently Used Hypotheses in HEP


Null hypothesis
“Background-only hypothesis”: Only Standard Model
contributes to the observation.

Alternative hypothesis
“Signal-plus-background hypothesis”: Additional
new physics processes contribute.

More about Hypotheses


Simple hypothesis:
The expected distribution of data can be entirely
determined. (No free parameters)

Composite hypothesis:
Based on an ensemble of simple hypotheses,
which may be related by a continuous parameter.

Test Statistic

Test statistic $t$:

A variable used to test how well the
hypothesis agrees with the observation.

It can be either a scalar function or a
vector function of the data.

Test Statistic

For a hypothesis $H$, there is an expected
PDF of the test statistic $t$ under that
hypothesis.

Test Statistic
Example:
Let the test statistic be the number of events $n$.

The expected number of background events is $b = 1.3$, and the
expected number of signal events is $s = 2.0$.

Then the PDFs of the BG-only and signal-plus-BG hypotheses
are Poisson distributions with means $b = 1.3$ and $s + b = 3.3$.
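The two Poisson distributions in this example can be tabulated directly. The following is a minimal sketch using only the Python standard library; the function name `poisson_pmf` and the constant names are illustrative choices, not part of the slides.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing exactly n events for a given Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


B = 1.3         # Poisson mean for the BG-only hypothesis (from the example)
S_PLUS_B = 3.3  # Poisson mean for the signal-plus-BG hypothesis (from the example)

# Print both distributions side by side for small event counts.
for n in range(8):
    p_bkg = poisson_pmf(n, B)
    p_sb = poisson_pmf(n, S_PLUS_B)
    print(f"n={n}  P(n | BG-only)={p_bkg:.4f}  P(n | signal+BG)={p_sb:.4f}")
```

Low counts are more probable under the BG-only hypothesis and high counts under the signal-plus-BG hypothesis, which is what makes the event count usable as a test statistic here.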

Critical Region

Critical region: one of the ways to decide
whether to reject the null hypothesis.

Critical Region

One-tailed test
(examined later)

Define the part of the $t$ axis beyond a chosen cut value
as the rejection region, and the rest as the
acceptance region.

If the observed $t$ lies in the rejection
region, then the null
hypothesis is rejected.

If the observed $t$ lies in the acceptance
region, then we fail to reject the
null hypothesis.

Critical Region

$\alpha$: the probability that we
reject $H_0$ while $H_0$ is true.

$1 - \beta$: the probability that we
reject $H_0$ while $H_1$ is true.

$\alpha$ is called the size of the test, or the significance level.
(Note that it is different from the confidence level or the p-value.)

$1 - \beta$ is called the power of the test.
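As a rough illustration, the size and the power can be computed for the counting example with Poisson means 1.3 (BG-only) and 3.3 (signal-plus-BG). The rejection region "n >= 5" used below is a hypothetical choice made only for this sketch, and the helper names are illustrative.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing exactly n events for a given Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def tail_prob(n_cut, mean):
    """P(n >= n_cut) for a Poisson distribution with the given mean."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n_cut))


B = 1.3         # BG-only mean (from the counting example)
S_PLUS_B = 3.3  # signal-plus-BG mean (from the counting example)
N_CUT = 5       # hypothetical cut: reject the null hypothesis if n >= N_CUT

alpha = tail_prob(N_CUT, B)         # size: reject H0 although H0 is true
power = tail_prob(N_CUT, S_PLUS_B)  # power: reject H0 when H1 is true
beta = 1.0 - power                  # type II error probability

print(f"alpha = {alpha:.4f}, power = {power:.4f}, beta = {beta:.4f}")
```

Moving the cut to a larger count lowers the size but also lowers the power, so the two cannot be improved independently with this test statistic.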

One- and two-tailed tests

[Figure: a two-tailed test]

Type I and Type II Errors

Type I error: The null hypothesis is rejected while it is
actually true. The probability of this kind of error
is $\alpha$.

Type II error: One fails to reject the null hypothesis
while it is actually false. If the alternative hypothesis is
complementary to the null one, the probability of
this kind of error is $\beta$.

The Testing Process

1. Define the null hypothesis and the alternative hypothesis.
2. Select a test statistic t according to the specifics of the analysis.
3. Determine the expected distribution of t under the null
hypothesis.
4. Define the size $\alpha$, considering both type I and type II errors.
5. Determine the observed value of t from the measured data.
6. Check which region t lies in and draw a conclusion.
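The six steps can be sketched end to end for a simple counting experiment. This is only an illustration under stated assumptions: the null hypothesis is a Poisson mean of 1.3 (the BG-only value from the earlier example), while the maximum size of 0.05 and the observed count of 6 are hypothetical values chosen for the sketch. All function names are illustrative.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing exactly n events for a given Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def tail_prob(n_cut, mean):
    """P(n >= n_cut) for a Poisson distribution with the given mean."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n_cut))


def find_cut(mean_h0, alpha_max):
    """Steps 3 and 4: smallest n_cut with P(n >= n_cut | H0) <= alpha_max."""
    n_cut = 0
    while tail_prob(n_cut, mean_h0) > alpha_max:
        n_cut += 1
    return n_cut


def run_test(n_obs, mean_h0, alpha_max):
    """Steps 5 and 6: compare the observed count with the rejection region."""
    n_cut = find_cut(mean_h0, alpha_max)
    decision = "reject H0" if n_obs >= n_cut else "fail to reject H0"
    return n_cut, decision


# Steps 1 and 2: H0 is BG-only with mean 1.3; the test statistic is the count n.
# alpha_max=0.05 and n_obs=6 are hypothetical inputs.
n_cut, decision = run_test(n_obs=6, mean_h0=1.3, alpha_max=0.05)
print(f"rejection region: n >= {n_cut}; decision: {decision}")
```

Because the count is discrete, the actual size of the test is usually somewhat smaller than the requested maximum.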

Choosing the Test Statistic


A good test statistic results in a clear separation of
the distributions of t for the different hypotheses.

A test statistic is called "sufficient" if there exists no
other test statistic that provides additional relevant
information on the hypothesis model.

An ideal test statistic satisfies both and provides the
best power $1 - \beta$ for a given size $\alpha$.

Choosing the Test Statistic


Examples of test statistics: reconstructed mass,
transverse momentum, $\chi^2$ (for goodness-of-fit
tests), etc.

Choosing the Test Statistic


Neyman-Pearson lemma:
If all of the hypotheses are simple, that is, we can
entirely determine the model, the likelihood ratio

$\lambda(x) = \frac{L(x \mid H_1)}{L(x \mid H_0)}$

is the best choice of test statistic, and the best
critical region is where $\lambda(x) > c$. Here $c$ is a constant
adjusted to reach the size $\alpha$.
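For the counting example with Poisson means 1.3 and 3.3, the likelihood ratio can be written out explicitly. The sketch below assumes the ratio is taken as alternative over null, which is one common convention, and uses illustrative function names.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing exactly n events for a given Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def likelihood_ratio(n, mean_h0, mean_h1):
    """Ratio of the alternative likelihood to the null likelihood (assumed convention)."""
    return poisson_pmf(n, mean_h1) / poisson_pmf(n, mean_h0)


B = 1.3         # BG-only mean, null hypothesis
S_PLUS_B = 3.3  # signal-plus-BG mean, alternative hypothesis

ratios = [likelihood_ratio(n, B, S_PLUS_B) for n in range(10)]
for n, ratio in enumerate(ratios):
    print(f"n={n}  likelihood ratio={ratio:.4f}")

# The ratio grows monotonically with n, so a cut on the ratio is
# equivalent to a cut on the event count itself.
is_monotonic = all(a < b for a, b in zip(ratios, ratios[1:]))
print("monotonic in n:", is_monotonic)
```

In this simple case the ratio reduces to exp(-s) * (1 + s/b)^n, so the event count already carries all the information the likelihood ratio would use.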

Choosing the Test Statistic

For composite hypotheses, if the set of simple hypotheses
can be written as a function of a parameter $\theta$, then the
power of the test can be written as $1 - \beta(\theta)$. We then have
to choose the test according to the expected $\theta$.
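A power function of this kind can be sketched for a counting experiment in which the free parameter is the expected signal yield. The background mean of 1.3 comes from the earlier example; the cut "n >= 4" and the scanned signal values are hypothetical, and the function names are illustrative.

```python
import math


def poisson_pmf(n, mean):
    """Probability of observing exactly n events for a given Poisson mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)


def tail_prob(n_cut, mean):
    """P(n >= n_cut) for a Poisson distribution with the given mean."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n_cut))


def power_function(signal, background, n_cut):
    """Power of the test 'reject if n >= n_cut' as a function of the signal yield."""
    return tail_prob(n_cut, background + signal)


B = 1.3    # background mean (from the counting example)
N_CUT = 4  # hypothetical rejection region: n >= 4

signals = [0.0, 1.0, 2.0, 4.0, 8.0]  # hypothetical signal yields to scan
powers = [power_function(s, B, N_CUT) for s in signals]
for s, p in zip(signals, powers):
    print(f"signal={s:4.1f}  power={p:.4f}")
```

At zero signal the power function equals the size of the test, and it rises toward 1 as the assumed signal grows.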

Choosing the Test Statistic

For composite hypotheses, we can also plot $1 - \beta$ versus $\alpha$,
and choose the test with the best power for a given size $\alpha$.

Choosing the Critical Region


A trade-off between type I and type II errors has
to be made.

In high-energy physics, $\alpha$ is commonly taken to
be very small for a discovery.
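To give a sense of scale, a size is often quoted as an equivalent number of Gaussian standard deviations. The sketch below converts a significance Z into a one-sided tail probability; the 5 sigma figure is the conventional discovery threshold in particle physics and is not stated on the slide, and the function name is illustrative.

```python
import math


def one_sided_p_value(z):
    """Upper-tail probability of a standard Gaussian beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Assumed convention: 3 sigma is often called "evidence", 5 sigma "discovery".
for z in (1.0, 2.0, 3.0, 5.0):
    print(f"Z = {z:.0f} sigma  ->  one-sided tail probability = {one_sided_p_value(z):.3e}")
```

A 5 sigma requirement corresponds to a size of roughly 3 in 10 million, which shows how strongly type I errors are suppressed at the cost of power.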

Determining the Test Statistic Distributions


With large data samples, the distribution of t
is approximately Gaussian and is easy to determine.

Otherwise, the distribution is non-trivial.
Usually, numerical methods are used to
determine it.
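One common numerical approach is to generate pseudo-experiments ("toys") and histogram the test statistic. The sketch below assumes a counting test statistic with a Poisson mean of 1.3 so the toy estimate can be checked against the exact tail probability; the cut, the number of toys, the seed, and all function names are hypothetical choices for illustration.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def toy_tail_fraction(mean, n_cut, n_toys, seed):
    """Fraction of toy experiments whose event count is at least n_cut."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_toys) if sample_poisson(rng, mean) >= n_cut)
    return hits / n_toys


def exact_tail(mean, n_cut):
    """Exact P(n >= n_cut) for a Poisson distribution, used as a cross-check."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n_cut))
    return 1.0 - below


# Hypothetical settings: cut at n >= 4, 50,000 toys, fixed seed for reproducibility.
estimate = toy_tail_fraction(mean=1.3, n_cut=4, n_toys=50_000, seed=1)
print(f"toy estimate = {estimate:.4f}, exact = {exact_tail(1.3, 4):.4f}")
```

The statistical precision of the toy estimate improves only with the square root of the number of toys, which is why very small tail probabilities are expensive to estimate this way.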
