In statistical hypothesis testing, a type I error is the incorrect rejection of a true null hypothesis (a false positive), while a type II error is the failure to reject a false null hypothesis (a false negative). More simply stated, a type I error is detecting an effect that is not present, while a type II error is failing to detect an effect that is present. The terms type I error and type II error are often used interchangeably with the general notion of false positives and false negatives in binary classification, such as medical testing, but narrowly speaking refer specifically to statistical hypothesis testing in the Neyman–Pearson framework, as discussed in this article.
2 Definition
A null hypothesis generally corresponds to a default state of nature, for example "this person is healthy", "this accused is not guilty" or "this product is not broken". An alternative hypothesis is the negation of the null hypothesis, for example, "this person is not healthy", "this accused is guilty" or "this product is broken". The result of the test may be negative relative to the null hypothesis (not healthy, guilty, broken) or positive (healthy, not guilty, not broken). If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. Due to the statistical nature of a test, the result is never, except in very rare cases, free of error. Two types of error are distinguished: type I error and type II error.
2.1 Type I error
The type I error rate or significance level is the probability of rejecting the null hypothesis given that it is true.[4][5] It is denoted by the Greek letter α (alpha) and is also called the alpha level. By convention, the significance level is set to 0.05 (5%), implying that it is acceptable to have a 5% probability of incorrectly rejecting the null hypothesis.[4]
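As an illustration (not part of the original article), the meaning of the 5% significance level can be checked by simulation. This is a minimal sketch, assuming a two-sided z-test on a fair coin; since the null hypothesis "p = 0.5" is true by construction, roughly 5% of runs should reject it, each rejection being a type I error. All parameter names and values here are chosen for illustration.

```python
import random

# Monte Carlo sketch: estimate the type I error rate of a two-sided
# z-test for a fair coin.  H0 ("p = 0.5") is true by construction,
# so the fraction of rejections should approach alpha.
random.seed(42)
ALPHA = 0.05          # conventional 5% significance level
Z_CRIT = 1.96         # two-sided critical value for alpha = 0.05
N_FLIPS = 1000
N_TRIALS = 10_000

rejections = 0
for _ in range(N_TRIALS):
    heads = sum(random.random() < 0.5 for _ in range(N_FLIPS))
    # z statistic under H0: p = 0.5 (normal approximation)
    z = (heads - N_FLIPS * 0.5) / (N_FLIPS * 0.25) ** 0.5
    if abs(z) > Z_CRIT:
        rejections += 1          # a type I error: H0 is true but rejected

print(f"empirical type I error rate: {rejections / N_TRIALS:.3f}")
```

The printed rate hovers near 0.05, matching the chosen alpha level.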
2.2 Type II error

The type II error rate is the probability of failing to reject the null hypothesis given that the alternative hypothesis is true. It is denoted by the Greek letter β (beta) and is related to the power of a test, which equals 1 − β.

3 Examples
A positive correct outcome occurs when convicting a
guilty person. A negative correct outcome occurs when
letting an innocent person go free.
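As a companion sketch (hypothetical setup, not from the original article), the type II error rate of a simple test can also be estimated by simulation: here the null hypothesis "p = 0.5" is false, the coin actually lands heads with probability 0.55, so every failure to reject is a type II error. The distributions and sample sizes are assumptions chosen for illustration.

```python
import random

# Monte Carlo sketch: estimate the type II error rate (beta).
# H0 ("p = 0.5") is false -- the coin is biased with p = 0.55 --
# so each failure to reject H0 is a type II error.
random.seed(7)
Z_CRIT = 1.96
N_FLIPS = 1000
N_TRIALS = 5_000
TRUE_P = 0.55          # the actual heads probability (H0 is false)

misses = 0
for _ in range(N_TRIALS):
    heads = sum(random.random() < TRUE_P for _ in range(N_FLIPS))
    z = (heads - N_FLIPS * 0.5) / (N_FLIPS * 0.25) ** 0.5
    if abs(z) <= Z_CRIT:
        misses += 1            # type II error: false H0 not rejected

beta = misses / N_TRIALS
print(f"estimated beta: {beta:.3f}, power: {1 - beta:.3f}")
```

With this effect size and sample size the test misses the bias only occasionally; shrinking either the sample or the effect would raise beta and lower the power.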
4 Etymology
In 1928, Jerzy Neyman (1894–1981) and Egon Pearson (1895–1980), both eminent statisticians, discussed the problems associated with "deciding whether or not a particular sample may be judged as likely to have been randomly drawn from a certain population"[7] and, as Florence Nightingale David remarked, "it is necessary to remember the adjective 'random' [in the term 'random sample'] should apply to the method of drawing the sample and not to the sample itself".[8]

They identified "two sources of error", namely:

(a) the error of rejecting a hypothesis that should have been accepted, and

(b) the error of accepting a hypothesis that should have been rejected.[9]
5 Related terms
5.1 Null hypothesis
Main article: Null hypothesis
It is standard practice for statisticians to conduct tests
in order to determine whether or not a "speculative
hypothesis" concerning the observed phenomena of the
world (or its inhabitants) can be supported. The results of
such testing determine whether a particular set of results
agrees reasonably (or does not agree) with the speculated
hypothesis.
On the basis that it is always assumed, by statistical convention, that the speculated hypothesis is wrong, and the so-called "null hypothesis" that the observed phenomena simply occur by chance (and that, as a consequence, the speculated agent has no effect), the test will determine whether this hypothesis is right or wrong.

The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression H0 has led to circumstances where many understand the term "the null hypothesis" as meaning "the nil hypothesis", a statement that the results in question have arisen through chance. This is not necessarily the case; the key restriction, as per Fisher (1966), is that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution."[11] As a consequence of this, in experimental science the null hypothesis is generally a statement that a particular treatment has no effect; in observational science, it is that there is no difference between the value of a particular measured variable, and that of an experimental prediction.

5.2 Statistical significance

6 Application domains

A threshold value can be varied to make the test more restrictive or more sensitive, with the more restrictive tests increasing the risk of rejecting true positives, and the more sensitive tests increasing the risk of accepting false positives.

6.1 Inventory control

An automated inventory control system that rejects high-quality goods of a consignment commits a type I error, while a system that accepts low-quality goods commits a type II error.

6.2 Computers

The notions of false positives and false negatives have a wide currency in the realm of computers and computer applications, as follows.

6.2.1 Computer security

Main articles: computer security and computer insecurity

Security vulnerabilities are an important consideration in the task of keeping computer data safe, while maintaining access to that data for appropriate users. Moulton (1983) stresses the importance of:

avoiding the type I errors (or false positives) that classify authorized users as imposters, and

avoiding the type II errors (or false negatives) that classify imposters as authorized users.
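Moulton's two error types, and the threshold trade-off between them, can be sketched with synthetic data. This is a minimal illustration (not from the original article): the score distributions, threshold values, and variable names are all assumptions chosen for the sketch.

```python
import random

# Synthetic access-control match scores (assumed Gaussian distributions,
# illustration only): authorized users tend to score high, imposters low.
random.seed(0)
authorized = [random.gauss(0.7, 0.1) for _ in range(10_000)]
imposters  = [random.gauss(0.4, 0.1) for _ in range(10_000)]

for threshold in (0.45, 0.55, 0.65):
    # type I error: an authorized user scores below threshold (false reject)
    type1 = sum(s < threshold for s in authorized) / len(authorized)
    # type II error: an imposter scores at/above threshold (false accept)
    type2 = sum(s >= threshold for s in imposters) / len(imposters)
    print(f"threshold={threshold:.2f}  "
          f"false reject={type1:.3f}  false accept={type2:.3f}")
```

Raising the threshold (a more restrictive test) rejects more authorized users while admitting fewer imposters; lowering it does the opposite. No single threshold eliminates both error types.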
6.2.3 Malware
The term false positive is also used when antivirus software wrongly classifies an innocuous file as a virus. The incorrect detection may be due to heuristics or to an incorrect virus signature in a database. Similar problems can occur with antitrojan or antispyware software.
6.2.4 Optical character recognition

Detection algorithms of all kinds often create false positives. Optical character recognition (OCR) software may detect an "a" where there are only some dots that appear to be an "a" to the algorithm being used.

6.3 Security screening

6.4 Biometrics

Biometric matching, such as for fingerprint recognition, facial recognition or iris recognition, is susceptible to type I and type II errors. The null hypothesis is that the input does identify someone in the searched list of people, so:

the probability of type I errors is called the "false reject rate" (FRR) or false non-match rate (FNMR),

while the probability of type II errors is called the "false accept rate" (FAR) or false match rate (FMR).[12]

If the system is designed to rarely match suspects then the probability of type II errors can be called the "false alarm rate". On the other hand, if the system is used for validation (and acceptance is the norm) then the FAR is a measure of system security, while the FRR measures user inconvenience level.

6.5 Medical screening

In the practice of medicine, there is a significant difference between the applications of screening and testing.

Screening involves relatively cheap tests that are given to large populations, none of whom manifest any clinical indication of disease (e.g., Pap smears).

Perhaps the most widely discussed false positives in medical screening come from the breast cancer screening procedure mammography. The US rate of false positive mammograms is up to 15%, the highest in the world. One consequence of the high false positive rate in the US is that, in any 10-year period, half of the American women screened receive a false positive mammogram. False positive mammograms are costly, with over $100 million spent annually in the U.S. on follow-up testing and treatment. They also cause women unneeded anxiety. As a result of the high false positive rate in the US, as many as 90–95% of women who get a positive mammogram do not have the condition. The lowest rate in the world is in the Netherlands, 1%. The lowest rates are generally in Northern Europe where mammography films are read twice and a high threshold for additional testing is set (the high threshold decreases the power of the test).

The ideal population screening test would be cheap, easy to administer, and produce zero false negatives, if possible. Such tests usually produce more false positives, which can subsequently be sorted out by more sophisticated (and expensive) testing.
False positives can also produce serious and counterintuitive problems when the condition being searched for
is rare, as in screening. If a test has a false positive rate of
one in ten thousand, but only one in a million samples (or
people) is a true positive, most of the positives detected
by that test will be false. The probability that an observed positive result is a false positive may be calculated using Bayes' theorem.
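The Bayes' theorem calculation for the numbers quoted above can be written out directly. This is a minimal sketch using the article's figures (false positive rate of 1 in 10,000, prevalence of 1 in 1,000,000); the test's sensitivity is not given in the text and is assumed to be perfect (1.0) purely for illustration.

```python
# Bayes' theorem for the rare-condition screening case described above.
prevalence = 1e-6       # P(condition): one in a million
fpr = 1e-4              # P(positive | no condition): one in ten thousand
sensitivity = 1.0       # P(positive | condition), assumed for illustration

# Total probability of a positive result, then the posterior.
p_positive = sensitivity * prevalence + fpr * (1 - prevalence)
p_condition_given_positive = sensitivity * prevalence / p_positive

print(f"P(condition | positive) = {p_condition_given_positive:.4f}")
# about 0.0099: roughly 99% of the positives detected are false
```

Even with a seemingly tiny false positive rate, the posterior probability of actually having the condition given a positive result is under 1%, which is the counterintuitive effect the paragraph describes.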
6.7 Paranormal investigation

7 See also
Binary classification
Detection theory
Egon Pearson
False positive paradox
Family-wise error rate
Information retrieval performance measures
Neyman–Pearson lemma
Null hypothesis
Probability of a hypothesis for Bayesian inference
Precision and recall
Prosecutor's fallacy
Prozone phenomenon
Receiver operating characteristic
Sensitivity and specificity
8 Notes
[1] In relation to this newborn screening, recent studies have shown that there are more than 12 times more false positives than correct screens (Gambrill, 2006).

[2] Several sites provide examples of false positives, including The Atlantic Paranormal Society (TAPS) and Moorestown Ghost Research.
9 References
[1] Sheskin, David (2004). Handbook of Parametric and Nonparametric Statistical Procedures. CRC Press. p. 54. ISBN 1584884401.

[2] Peck, Roxy and Jay L. Devore (2011). Statistics: The Exploration and Analysis of Data. Cengage Learning. pp. 464–465. ISBN 0840058012.

[3] Cisco Secure IPS – Excluding False Positive Alarms. http://www.cisco.com/en/US/products/hw/vpndevc/ps4077/products_tech_note09186a008009404e.shtml

[4] Lindenmayer, David; Burgman, Mark A. (2005). "Monitoring, assessment and indicators". Practical Conservation Biology (PAP/CDR ed.). Collingwood, Victoria, Australia: CSIRO Publishing. pp. 401–424. ISBN 0-643-09089-4.

[5] Schlotzhauer, Sandra (2007). Elementary Statistics Using JMP (SAS Press) (1 ed.). Cary, NC: SAS Institute. pp. 166–423. ISBN 1-599-94375-1.

[6] Shermer, Michael (2002). The Skeptic Encyclopedia of Pseudoscience 2 volume set. ABC-CLIO. p. 455. ISBN 1-57607-653-9. Retrieved 10 January 2011.

[7] Neyman, J.; Pearson, E.S. (1967) [1928]. "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I". Joint Statistical Papers. Cambridge University Press. pp. 1–66.

[8] David, F.N. (1949). Probability Theory for Statistical Methods. Cambridge University Press. p. 28.

[9] Pearson, E.S.; Neyman, J. (1967) [1930]. "On the Problem of Two Samples". Joint Statistical Papers. Cambridge University Press. p. 100.

[10] Neyman, J.; Pearson, E.S. (1967) [1933]. "The testing of statistical hypotheses in relation to probabilities a priori". Joint Statistical Papers. Cambridge University Press. pp. 186–202.

[11] Fisher, R.A. (1966). The Design of Experiments (8th ed.). Edinburgh: Hafner.
10 External links
"Bias and Confounding" – presentation by Nigel Paneth, Graduate School of Public Health, University of Pittsburgh