
DRAFT version October 22 2009

NFI Appendix

The NFI series of verbal probability terms, and the Bayesian framework for
the interpretation of evidence.

Table of contents

1. What is an NFI appendix?
2. Introduction
3. How is the series used?
4. A medical example: the HIV test
5. Bayes' Rule
6. A numerical example: the DNA match
7. The Bayesian framework for the interpretation of forensic evidence
8. Hypotheses
9. If numbers are missing: the series of verbal probability terms
10. A verbal example: facial comparison
11. Errors in reasoning (fallacies)
12. Certainty/no opinion
13. Glossary
14. Literature

1. What is an NFI appendix?

The Netherlands Forensic Institute (NFI) performs a wide variety of
investigations. Normally each NFI report is provided with an appendix. This
appendix serves as an explanation of the research and has a purely informative
character. A glossary and an overview of source and literature references have
been included in the back of the appendix.

2. Introduction

In many cases a forensic scientist will not be able to provide a client with a clear
yes or no answer. There is, in such cases, a certain degree of uncertainty
concerning the conclusion. This uncertainty is preferably expressed as a number,
for example a numerical probability or an interval. In some cases, however, the
scientist will only be able to formulate his conclusion in verbal probabilistic
terms. In order to prevent confusion, NFI scientists use, where applicable, a
standard series of verbal terms to formulate their conclusions. As a result of the
insights gained from the so-called Bayesian framework for the interpretation of
evidence[1], the NFI wishes to implement a new standard series. In this appendix,
this framework is discussed on the basis of examples. In addition, two important
and frequently occurring errors in reasoning are addressed; these errors are
known as the prosecutor's fallacy and the defence fallacy. In the past, and in the
current transitional phase, NFI scientists have used a verbal probability series
that mainly related to the probability of one single hypothesis. For example:

The handwriting is:
- with a probability bordering on certainty
- highly probably
- probably
- probably not
- highly probably not
- with a probability bordering on certainty not
the handwriting of Mr. X.

The Bayesian framework shows that the scientist should rather limit himself to
an opinion on the probability of the findings in view of the hypotheses. The NFI
therefore introduces a new series of statements to be used:

The findings of the investigation are:
- approximately equally probable [as]
- slightly more probable
- more probable
- much more probable
- very much more probable
if hypothesis 1 is correct, than if hypothesis 2 is correct.

3. How is the series used?

An example may clarify how the new series of verbal probability statements is
used in practice. Assume, for example, that a scientist compares a shoe mark
with the shoe of a suspect. The scientist will, in such cases, consider at least two
hypotheses, for example:
1. the suspect's shoe made the mark;
2. another shoe with a similar sole pattern and size made the mark.
The similarities and differences identified between the shoe of the suspect and
the shoe mark found constitute the findings of the investigation. If the findings
are more in line with hypothesis 1 than with hypothesis 2, they constitute
evidence in support of hypothesis 1. A striking similarity, such as a long cut in
the shoe sole whose shape and position correspond with a line in the shoe
mark, will be more in line with hypothesis 1 than with hypothesis 2. The better
the findings fit with hypothesis 1 rather than with hypothesis 2, the stronger the
evidence will be. The Bayesian framework formalises this line of reasoning. The
term "probability" is used in this context as a measure of the extent to which the
findings are striking or fit the hypothesis. In the interpretation of his
findings, the scientist will therefore establish how probable it would be to
observe the findings when hypothesis 1 is correct, and when hypothesis 2 is
correct. In such cases, he could conclude, for example, that:

    The findings of the investigation are much more probable if the shoe of the
    suspect made the mark than if the mark was made by another shoe with a
    similar sole pattern and size.

It may also be the case that the findings are more probable in the event that the
second hypothesis is correct, for example if the shoe was secured immediately
after the crime was committed, but the long cut cannot be found in the shoe
mark. In such cases, the same series is used, while hypotheses 1 and 2 change
positions. For example:

    The findings of the investigation are much more probable if another shoe
    with a similar sole pattern and size made the mark than if the shoe of the
    suspect made the mark.

[1] See, for example, Robertson and Vignaux 1995; Broeders 2003; Sjerps and
Coster van Voorhout (eds.) 2005.

4. A medical example: the HIV test

An example from medical science may serve to clarify the Bayesian line of
reasoning. Let us assume that an HIV test will always give a positive result
for someone who has been infected with HIV. However, for a small
number of test subjects who have not been infected, the result will also be
positive (a false positive). Assume furthermore that John receives a positive
test result. What is the probability that he is actually infected with HIV? The
doctor's finding is the positive HIV test. This result is exactly what he can
expect if John is actually infected. If John is not infected, the result would
have to be a rare false positive. The finding is therefore considerably more
probable if John is infected than if he is not infected. The doctor's diagnosis,
however, does not depend only on the test result. John may happen to belong to
an HIV risk group, for example drug users who use contaminated needles. John
may also come from a country where many people are infected with HIV. In that
case, even before the test is carried out, the probability of infection will
already be higher than in the case of a person who does not belong to a risk
group. This prior probability is adjusted upward as a result of the finding that
the test is positive. Two factors therefore play a role when establishing the
probability of John being infected with HIV: the prior probability and the test
result. As the prior probability differs for each person, the same test result
may lead to different diagnoses for different persons. Adjusting the prior
probability by taking new findings into account can be described mathematically
by means of the so-called Bayes' Rule. The result is called the posterior
probability: the probability after the test.
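
The adjustment of the prior probability described above can be illustrated with
a small calculation. The Python sketch below is not part of the NFI text: the
function name `posterior_probability`, the false positive rate of 1% and the two
prior probabilities are hypothetical values chosen purely for illustration. It
assumes, as in the example, that an infected person always tests positive.

```python
def posterior_probability(prior, false_positive_rate, sensitivity=1.0):
    """Probability of infection after a positive test result.

    prior: probability of infection before the test (hypothetical value)
    false_positive_rate: probability of a positive result if not infected
    sensitivity: probability of a positive result if infected (1.0 here,
                 because the example assumes the test never misses an infection)
    """
    positive_and_infected = prior * sensitivity
    positive_and_not_infected = (1.0 - prior) * false_positive_rate
    return positive_and_infected / (positive_and_infected + positive_and_not_infected)


# Hypothetical figures, for illustration only.
false_positive_rate = 0.01
low_risk_prior = 0.001   # assumed prior for a person outside any risk group
high_risk_prior = 0.20   # assumed prior for a person belonging to a risk group

low = posterior_probability(low_risk_prior, false_positive_rate)
high = posterior_probability(high_risk_prior, false_positive_rate)
print(f"low-risk prior:  posterior probability {low:.1%}")
print(f"high-risk prior: posterior probability {high:.1%}")
```

Under these assumed figures the same positive result leads to a posterior
probability of roughly 9% for the low-risk person and roughly 96% for the
high-risk person, which is the point of the example: the test result alone does
not fix the diagnosis.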

5. Bayes' Rule

The Bayesian framework is named after the English Reverend Thomas Bayes,
who lived in the 18th century and drew up a simple mathematical formula on
probability. A reformulation of Bayes' Rule considers the ratio of the
probabilities of two hypotheses. If the ratio is ten to one, one hypothesis is
ten times as probable as the other. This ratio of probabilities can be considered
before certain findings are taken into account (a priori, or prior, odds) or after
they are taken into account (a posteriori, or posterior, odds). Bayes' Rule shows
how the ratio of the probabilities changes due to the findings:

prior odds x Likelihood Ratio = posterior odds

The Likelihood Ratio is often abbreviated to LR. It is the ratio between the
probabilities of observing the findings under the two hypotheses:

     the probability of observing the findings if hypothesis 1 is true
LR = -----------------------------------------------------------------
     the probability of observing the findings if hypothesis 2 is true

The new series of verbal probability statements that the NFI will implement is a
verbal equivalent of the LR. In the Bayesian framework for the interpretation of
evidence, the LR is considered the measure of the evidential strength of the
findings when considering hypothesis 1 vis-à-vis hypothesis 2. The task of the
scientist is, according to this framework, limited to reporting the LR. These
insights are illustrated in the following numerical example.
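
For readers who wish to see where the odds form of Bayes' Rule comes from, it
can be obtained by writing the usual form of the rule for each hypothesis and
dividing the two expressions. The symbols are shorthand introduced here and do
not appear in the NFI text: E stands for the findings, and H_1 and H_2 for the
two hypotheses.

```latex
\begin{align*}
P(H_1 \mid E) &= \frac{P(E \mid H_1)\,P(H_1)}{P(E)}, \qquad
P(H_2 \mid E) = \frac{P(E \mid H_2)\,P(H_2)}{P(E)} \\[4pt]
\underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds}}
  &= \underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}}
     \times
     \underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{LR}}
\end{align*}
```

The term P(E) cancels in the division, so the overall probability of the
findings never has to be estimated; only the two conditional probabilities that
make up the LR are needed.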

6. A numerical example: the DNA match

A trace containing DNA, which may belong to the offender, is found at the scene
of a crime. In order to facilitate calculation, this example will assume the
DNA material to be of very poor quality, such that only a small part of the DNA
profile can be visualised. This partial profile shows, among other things, that
it concerns a man. The suspect's DNA profile matches the partial profile. The
probability that a randomly selected man, who is not related to the suspect, will
have this DNA profile is one in a thousand. This probability is referred to as the
random match probability. How probable is it that the DNA at the crime scene
originates from the suspect? The scientist will translate this question into two
hypotheses. The hypotheses read as follows:
- hypothesis 1: the suspect is the donor of the DNA material
- hypothesis 2: the donor of the DNA material is an unknown man who is not
  related to the suspect.
The finding of the DNA analysis is the DNA match. The question now is: how
probable is this finding given hypothesis 1, and given hypothesis 2? Assuming
that no mistake has been made in the chain of custody, the scientist will reason
as follows:
- If the suspect is the donor of the DNA material, the profiles will match. The
  probability of the finding of the DNA analysis is therefore 100% if
  hypothesis 1 is correct.
- If an unknown man is the donor of the DNA material, it would be fairly
  coincidental that this material matches the suspect's profile. The probability
  of this occurring is one in a thousand. The probability of the finding of the
  DNA analysis is therefore 0.1% if hypothesis 2 is correct.

The ratio of the probabilities of the findings given the two hypotheses (LR) is
therefore 100% : 0.1% = 1000. This means that the findings are 1000 times more
probable if hypothesis 1 is correct than if hypothesis 2 is correct. In this simple
example, the LR is thus equal to 1/random match probability.

How likely is it that the suspect is the donor? The answer depends, as it did in
the HIV example, not just on the results of the DNA analysis, but also on the
prior odds of the hypotheses. Or, in terms of Bayes' Rule, the answer depends on
the LR and the prior odds. The LR is determined by the rarity of the DNA
profile, expressed in the random match probability. The prior odds are
determined on the basis of other information in the case.

Suppose, for example, that the other information in the case is the following:
the crime took place aboard a container ship at sea, and ten other men could be
the donor in addition to the suspect. Moreover, all these men qualify equally as
donors of the DNA material, but the only DNA profile available is that of the
suspect. In this case, the odds in favour of the suspect being the donor before
the DNA analysis is taken into consideration are one to ten. The prior odds in
favour of hypothesis 1 are therefore 0.1. Bayes' Rule (prior odds x LR =
posterior odds) shows that the posterior odds are equal to 0.1 x 1000 = 100.
This means that, after the DNA analysis is taken into consideration, the odds
are 100 to 1 in favour of the suspect being the donor of the trace material and
not one of the other seamen. The table below illustrates how the probability
that the DNA belongs to the suspect depends on the rarity of the DNA profile and
the number of men aboard the ship, or, in other words, on the LR and the prior
odds. Table 1a contains the numbers from this example. Comparing Tables 1a and
1b shows the following: the rarer the DNA profile, the stronger the DNA
evidence, and the larger the LR.

Table 1a Random match probability 1 in 1000

Number of        Prior odds         Posterior odds     Probability that the
other men                                              suspect is the donor
1                1:1                1000:1             99.9%
10               1:10               100:1              99%
1000             1:1000             1:1                50%
1,000,000,000    1:1,000,000,000    1:1,000,000        0.0001%

Table 1b Random match probability 1 in 1 billion

Number of        Prior odds         Posterior odds     Probability that the
other men                                              suspect is the donor
1                1:1                1,000,000,000:1    99.9999999%
10               1:10               100,000,000:1      99.999999%
1000             1:1000             1,000,000:1        99.9999%
1,000,000,000    1:1,000,000,000    1:1                50%

Table 1 considers two hypotheses. Hypothesis 1: the suspect is the donor of the
DNA material. Hypothesis 2: one of the other men aboard the ship is the donor of
the DNA material. The table illustrates the effect of the DNA match on the odds
in favour of hypothesis 1 according to Bayes' Rule. We consider four different
numbers of other men. The final column is calculated from the posterior odds.
The random match probability of the DNA profile of the trace is (a) 1 in 1000
(b) 1 in 1 billion. The LR is therefore (a) 1000 (b) 1
billion.
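
The rows of Table 1 can be recomputed from the prior odds and the LR. The
following Python sketch is an illustration added here, not an NFI tool; the
function names are hypothetical. It relies on the assumptions of the example:
every man aboard is an equally probable donor, the LR equals 1/random match
probability, and the two hypotheses together cover all possibilities, so that
odds can be converted into a probability.

```python
from fractions import Fraction


def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' Rule in odds form: prior odds x LR = posterior odds."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of hypothesis 1 into a probability.

    Only valid when the two hypotheses are the only possibilities, as in the
    ship example (the donor is the suspect or one of the other men).
    """
    return odds / (1 + odds)


def table_rows(random_match_probability, numbers_of_other_men):
    """Yield (number of other men, prior odds, posterior odds, probability)."""
    likelihood_ratio = 1 / random_match_probability  # LR = 1 / random match probability
    for n in numbers_of_other_men:
        prior = Fraction(1, n)  # suspect versus n equally probable other donors
        posterior = posterior_odds(prior, likelihood_ratio)
        yield n, prior, posterior, odds_to_probability(posterior)


# Odds are printed as fractions: 1/10 stands for odds of 1:10, 100 for 100:1.
for rmp in (Fraction(1, 1000), Fraction(1, 10**9)):
    print(f"Random match probability 1 in {rmp.denominator:,}")
    for n, prior, posterior, probability in table_rows(rmp, (1, 10, 1000, 10**9)):
        print(f"  other men: {n:>13,} | prior odds {prior} | "
              f"posterior odds {posterior} | probability {float(probability):.7%}")
```

Exact fractions are used instead of floating-point numbers so that odds such as
1:1,000,000,000 are not affected by rounding.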

In the HIV example, the doctor can make a statement on the posterior odds of
HIV infection, because he has specialist knowledge of both the prior odds and
the LR. The DNA scientist, however, only has specialist knowledge of the
frequency of the DNA profile, i.e. of the LR. The DNA scientist will usually not
have specialist knowledge or a full overview of the other information in the
case, which determines the prior odds. He will not wish to express an
opinion in this respect. This means, contrary to what is often thought, that a
DNA scientist does not calculate the probability that the trace originates from
the suspect (or from someone other than the suspect). This posterior probability,
as we have seen, depends on the prior odds. The DNA scientist limits himself to
an opinion on the LR. As the LR is in the present case equal to 1/random match
probability, he will, in practice, only report this probability.

7. The Bayesian framework for the interpretation of forensic evidence

The use of the Bayesian framework is not restricted to DNA analysis. It can be
used in many areas of forensic science in which the interpretation of the
results of the investigation involves uncertainty. The general application is
completely analogous to the application in the HIV and the DNA examples. The
forensic scientist conducts an investigation and obtains certain findings. To
interpret these findings, he will consider at least two mutually exclusive
hypotheses. These hypotheses already have a certain probability even before the
forensic findings are taken into consideration. The ratio of these prior
probabilities (the prior odds) is based in part on other evidence that was
collected in the case, and therefore falls outside the forensic scientist's area
of expertise. This factor is left to the jurist. On the basis of his expertise,
the forensic scientist can, however, estimate the probability of observing his
findings if a certain hypothesis is correct. These probabilities determine the LR.
It is therefore up to the scientist to determine this factor. The jurist can
adjust his estimate of the prior odds on the basis of the LR in accordance with
Bayes' Rule. The role of the scientist is limited to reporting the LR.

It follows from the Bayesian framework that, in addition to this principle
involving the division of duties, the LR can also be used as a measure of the
strength of the evidence. If the LR is larger than 1, the numerator of this ratio
will be larger than the denominator. In such a case, the findings will be more
probable when hypothesis 1 is true than when hypothesis 2 is true. The findings
therefore constitute evidence in support of hypothesis 1, which increases the
probability of hypothesis 1 relative to that of hypothesis 2. Bayes' Rule shows
that this occurs by means of multiplication by the LR. The increase becomes
stronger as the LR increases, so a larger LR corresponds to stronger evidence.
The LR measures the strength of the evidence for hypothesis 1 when compared with
hypothesis 2; the larger the LR, the stronger the evidence.

If the findings are equally probable under both hypotheses, the LR will be equal
to 1. The findings do not change the odds of the hypotheses: they constitute
neutral evidence that does not add to or detract from the existing body of
evidence. An LR of 1 constitutes the turning point as regards which hypothesis
is supported by the findings. Findings with an LR smaller than 1 constitute
evidence in support of hypothesis 2: the smaller the LR, the stronger the
evidence in support of hypothesis 2 when compared with hypothesis 1.
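
The reading of the LR described above (larger than 1, equal to 1, smaller than
1) can be summarised in a few lines of Python. This is a sketch added for
illustration; the function names are hypothetical, and it only covers a positive
LR. The extreme case in which the findings are considered impossible under one
hypothesis is outside its scope.

```python
def supported_hypothesis(likelihood_ratio):
    """Say which hypothesis the findings support, judged by the LR alone."""
    if likelihood_ratio <= 0:
        raise ValueError("this sketch only handles a positive likelihood ratio")
    if likelihood_ratio > 1:
        return "hypothesis 1"
    if likelihood_ratio < 1:
        return "hypothesis 2"
    return "neither (neutral evidence)"


def strength(likelihood_ratio):
    """Factor (at least 1) by which the findings favour the supported hypothesis."""
    if likelihood_ratio <= 0:
        raise ValueError("this sketch only handles a positive likelihood ratio")
    return likelihood_ratio if likelihood_ratio >= 1 else 1 / likelihood_ratio


for lr in (1000, 1, 0.001):
    print(f"LR = {lr:g}: supports {supported_hypothesis(lr)}, "
          f"by a factor of about {strength(lr):g}")
```

The last case shows the symmetry in the text: an LR of 0.001 supports hypothesis
2 as strongly as an LR of 1000 supports hypothesis 1.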

8. Hypotheses

An important step in the reporting procedure is the selection of the hypotheses.
The forensic scientist will, often in mutual consultations, only consider
hypotheses that he deems relevant in light of the investigative questions and the
available information. All sorts of scenarios are often conceivable which will not
be taken into consideration by the scientist. This does not mean that the scientist
deems such a scenario impossible or implausible.

A choice also has to be made with regard to the precise wording in which a
hypothesis is formulated. For example, "the glass fragment originates from
another window" and "the glass fragment originates from a different source of
glass" are not equivalent; after all, only the latter leaves open the possibility
that it concerns glass from a jam jar. The LR and the prior odds depend on the
precise formulation of the hypotheses and must therefore be assessed in the
context of these precisely worded hypotheses.

9. If numbers are missing: the series of verbal probability terms

In many forensic areas of expertise the scientist will not be able to make a
numerical statement on the LR. Relevant data from representative random
samples and experiments are often not available for this purpose. In such cases,
the scientist will base his opinion on a combination of knowledge, experience
and common sense. The scientist will not be able to express the probabilities in
the numerator and the denominator as percentages, but he will be able to
indicate whether the numerator is much larger or actually much smaller than the
denominator. In such cases, the LR will not be expressed as a number, but as
one of the qualifications from the standard series of verbal probability terms.
Jurists are often also unable to provide a well-founded numerical estimate of
the prior odds. The Bayesian framework can, nevertheless, be useful in these
situations: it can be used as a way of thinking rather than as a calculation
model. The following example shows how this framework can be applied if
numerical data are lacking.

10. A verbal example: facial comparison

Suppose that a debit card has been stolen. Ten minutes later, money is
withdrawn from a cash machine with this card. The perpetrator was recorded by
the cash machine's camera system, and the images are of good quality. The
police think that John J. looks somewhat like the person making the withdrawal.
They question John and take a passport photo of him that is a good likeness.
They subsequently request a facial comparison investigation, in which the
forensic scientist compares the passport photo with the images from the camera
system. The hypotheses to be considered in this context are:
- Hypothesis 1: John is the same person as the person in the CCTV images
- Hypothesis 2: John is not the same person as the person in the CCTV images
  (and is also not directly related to this person).
The scientist compares the passport photo with the images and observes a
striking similarity: both John and the person making the withdrawal have a scar
and a birthmark in the same places. No differences are observed. The
conclusion is that the findings of the investigation are much more probable if
John is the same person as the person in the CCTV images than if this is not the
case. Or, in other words: the LR is much larger than 1.

We now compare two situations.
- Situation A: John confesses during questioning and also demonstrates that he
  has information only known to the perpetrator
- Situation B: John has a strong alibi
There is no further information in either situation. In situation A the Court
could decide, on the basis of the confession and the information only known to
the perpetrator, that the probability that John is the wanted person is fairly
large. This probability becomes even larger as a result of the scientist's
conclusion. The Court concludes that John is probably the person in the CCTV
images. In Bayesian terms, this means large prior odds for hypothesis 1 versus
hypothesis 2, which, multiplied by a large LR, lead to large posterior odds.
In situation B the Court could, however, decide that, as a result of his strong
alibi, the probability that John is the person in the CCTV images is actually
quite small. These prior odds are significantly adjusted upward as a result of
the scientist's conclusion. The Court therefore has to weigh the alibi together
with the expert opinion. This assessment may turn out to the advantage or the
disadvantage of the suspect. In Bayesian terms, this means multiplication of
small prior odds (smaller than 1) by a large LR. The result, the posterior odds,
may vary from much smaller than 1 to much larger than 1.

In the end, it is considerably more probable that it is John who is seen in the
CCTV images in situation A than in situation B. The scientist's conclusion is,
however, the same in both situations.
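
Although the example is deliberately verbal, the reasoning can be mimicked with
numbers. In the Python sketch below every figure is hypothetical: an LR of 1000
stands in for "much more probable", and the prior odds are invented values for a
confession and for alibis of different strength. Neither the numbers nor the
function name come from the NFI text.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' Rule in odds form: prior odds x LR = posterior odds."""
    return prior_odds * likelihood_ratio


# All numbers are hypothetical; the example in the text gives no figures.
likelihood_ratio = 1000  # the same expert conclusion in every situation

prior_odds_by_situation = {
    "A: confession and perpetrator knowledge": 10,   # assumed large prior odds
    "B: alibi judged fairly strong": 1 / 100,        # assumed small prior odds
    "B: alibi judged very strong": 1 / 1_000_000,    # assumed very small prior odds
}

for situation, prior_odds in prior_odds_by_situation.items():
    result = posterior_odds(prior_odds, likelihood_ratio)
    print(f"{situation}: prior odds {prior_odds:g} -> posterior odds {result:g}")
```

With these assumed values the posterior odds are large in situation A, while in
situation B they end up either above or below 1 depending on how strong the
alibi is judged to be, even though the LR is identical throughout.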

It is also instructive to see what happens if the results of the forensic
investigation are strongly exculpatory for the suspect (the LR is much smaller
than 1). For example, John has an attached earlobe whereas the person making the
withdrawal fairly clearly has a freely hanging earlobe. The conclusion is that the
findings of the investigation are much more probable if John is not the same
person as the person in the CCTV images than if he is. Hypotheses
1 and 2 have changed places in the conclusion. The Court will be required to
adjust the prior odds downward. In situation A (large prior odds) the outcome
may again vary from small to large posterior odds. In situation B (small prior
odds), the Court will conclude that John is probably not the wanted person. We
see that, for exculpatory evidence too, strong evidence for a hypothesis does
not necessarily mean that this hypothesis is probably true: the strong
evidence must be weighed together with the other information.

11. Errors in reasoning (fallacies)

Literature and practice have shown that there are a number of errors in
reasoning, or fallacies, which regularly occur in the interpretation of probability
statements[2]. As these may have important consequences, we will discuss the two
best-known ones below.

The scientist bases his reasoning on the probability of observing the findings if a
certain hypothesis is true. People are inclined to reverse this formulation. In such
cases, it becomes: the probability that a certain hypothesis is true if the findings
are observed. This reversal is incorrect, and is based on an error in reasoning.
The same error is made when reversing the sentence "if an animal is a cow, the
probability that it has four legs is 100%" into "if an animal has four legs, the
probability that it is a cow is 100%". This error in reasoning is referred to as the
prosecutor's fallacy in case law.

A forensic example. A handwriting expert investigates a threatening letter drawn
up in the Dutch language, and considers the following hypotheses:
1. Pete wrote the threatening letter that is to be investigated
2. Someone else wrote the letter.
His finding, following the investigation, is that Pete's handwriting shows strong
similarities with the handwriting in the threatening letter. He reports:
(a) [scientist's conclusion]: such a high degree of similarity is much more
probable if Pete wrote the letter than if someone else wrote the letter.
The prosecutor's fallacy distorts this conclusion into:
(b) [distortion] in view of the high degree of similarity it is much more
probable that Pete wrote the letter than that someone else did.
In this example, (a) states that the finding is much more probable when
hypothesis 1 is true than when hypothesis 2 is true. This is a statement on the
LR. Statement (b), however, contains the distortion that hypothesis 1 is much
more probable than hypothesis 2 in view of the finding. This is a statement on
the posterior odds. As previously discussed, the prior odds constitute the
difference between these two terms. If it becomes improbable, on the basis of
the other information, that Pete wrote the letter, this will not change
conclusion (a), but it could change conclusion (b).

[2] See also Thompson and Schuman 1987; Evett 1995.

In order to prevent the prosecutor's fallacy from occurring, it is important to
understand that the forensic scientist makes a statement on the probability of his
findings, given certain hypotheses. The words "findings" and "hypotheses" in the
previous sentence should not switch places.
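
The difference between the two conditional probabilities can be made concrete
by counting. The Python sketch below uses an invented population of animals (the
species and numbers are hypothetical and serve only to illustrate the cow
example); the helper `conditional_probability` is likewise not part of the NFI
text.

```python
# Hypothetical population of animals, for illustration only.
animals = (
    [{"species": "cow", "legs": 4}] * 100
    + [{"species": "dog", "legs": 4}] * 900
    + [{"species": "chicken", "legs": 2}] * 1000
)


def conditional_probability(population, event, given):
    """P(event | given), computed by counting within the population."""
    satisfying_condition = [a for a in population if given(a)]
    return sum(1 for a in satisfying_condition if event(a)) / len(satisfying_condition)


def is_cow(animal):
    return animal["species"] == "cow"


def has_four_legs(animal):
    return animal["legs"] == 4


# Probability of the finding (four legs) given the hypothesis (cow).
p_four_legs_given_cow = conditional_probability(animals, has_four_legs, is_cow)
# Probability of the hypothesis (cow) given the finding (four legs).
p_cow_given_four_legs = conditional_probability(animals, is_cow, has_four_legs)

print(p_four_legs_given_cow)  # 1.0
print(p_cow_given_four_legs)  # 0.1
```

In this invented population every cow has four legs, yet only one in ten
four-legged animals is a cow. The second number depends on how many other
four-legged animals there are, which is exactly the information that the first
number does not contain.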

The lawyer in the same handwriting case can also make an error in reasoning:
the defence fallacy. The defence fallacy contains implicit assumptions. For
example:
(a) [scientist's reasoning] If someone else wrote the letter, it is most unlikely
that such a high degree of similarity would be observed.
(b) [implicit assumption] There are therefore only a few people in the
Netherlands who have a handwriting that is equally similar to the
handwriting in the threatening letter as Pete's handwriting. Assume, for
example, that, apart from Pete, there are two or three such persons, then
Pete will be one of only three or four possible persons who could have
written the letter.
(c) [implicit assumption 2] The probability that Pete wrote the letter is
therefore one in three or one in four.
(d) Somebody else probably wrote the letter.
The lawyer first assumes, in this context, that the perpetrator has to have
Dutch nationality, while in reality anyone who can write Dutch could be the
perpetrator. He subsequently assumes that each of these persons has an equal
probability of being the perpetrator. All persons are therefore included in an
equal manner, irrespective of, for example, their relationship to the addressee.
This reasoning may be correct, but only in cases in which it is justified to view
every Dutch citizen as a potential perpetrator in equal measure. This does not
apply in most cases.

In order to prevent the defence fallacy from occurring, it is important to be alert
when it comes to implicit assumptions concerning, for example, the population
of potential perpetrators.
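
The effect of the implicit assumption of equal prior probabilities can be shown
with a short calculation. In the Python sketch below the weights are
hypothetical: the first list reproduces the lawyer's assumption that Pete and
three other persons are equally probable writers, the second assumes that other
information in the case makes Pete fifty times more probable beforehand than
each of the others. The function name is also an assumption of this sketch.

```python
def probability_of_first(prior_weights):
    """Probability that the first candidate wrote the letter.

    prior_weights: relative prior probabilities of the candidates, the first
    entry being Pete. The handwriting findings are taken to be equally probable
    for every candidate listed, so only the prior weights distinguish them.
    """
    return prior_weights[0] / sum(prior_weights)


# Pete plus three other persons with equally similar handwriting (hypothetical).
equal_weights = [1, 1, 1, 1]      # the lawyer's implicit assumption
unequal_weights = [50, 1, 1, 1]   # assumed: other information points to Pete

print(probability_of_first(equal_weights))              # 0.25
print(round(probability_of_first(unequal_weights), 3))  # 0.943
```

The "one in four" of step (c) only follows when all weights are equal. As soon
as the candidates differ in prior probability, the same handwriting findings
lead to a very different outcome.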

12. Certainty/no opinion


Probability plays a less prominent role in some cases. The findings are, for
example, impossible or hardly conceivable given one of the hypotheses. In such
cases, the scientist will be completely convinced that this hypothesis is
impossible. A scientist may also, on the basis of the findings, be completely
convinced that only one hypothesis can be correct. He will then not apply any of
the qualifications from the verbal probability series. He will rather express his
subjective conviction in formulations such as:
It is my firm conviction that hypothesis 1 is correct.
Or:
It is my firm conviction that the findings exclude hypothesis 1.
If there is no uncertainty at all, the following phrase may be used:
The findings of the investigation exclude hypothesis 1.
It may also be the case that a scientist is unable to estimate the probability. In
such an event the conclusion may be as follows:
I am unable to render an opinion on the probability of the findings in light
of the hypotheses, because

13. Glossary

A priori
Prior to assessment of the findings of the investigation

A posteriori
After the assessment of the findings of the investigation

Hypothesis
Scenario, proposition

Odds
Strictly speaking, this word is only defined as the ratio between the probability
(p) that an event will occur and the probability (1-p) that this event will not
occur. In many texts, including the present one, the term is more broadly defined
as the ratio between the probability (p) that a certain event will occur and the
probability (q) that a different event will occur. The two events are in this case
mutually exclusive, but do not necessarily provide an exhaustive enumeration of
all possible events (in mathematical terms, they are not necessarily exhaustive).

Likelihood Ratio
     the probability of observing the findings if hypothesis 1 is true
LR = -----------------------------------------------------------------
     the probability of observing the findings if hypothesis 2 is true

Bayes' Rule
Prior odds x LR = posterior odds

Probability term
A qualification from the standard series of statements used to formulate the
conclusion concerning the probability of the findings in light of the hypotheses.

14. Literature

Broeders, A.P.A., Op zoek naar de bron: over de grondslagen van de
criminalistiek en de waardering van forensisch bewijs, Deventer (2003).

Evett, I.W., Avoiding the transposed conditional, Science and Justice 35 (2),
127-131, (1995).

Robertson, B. and Vignaux, G.A., Interpreting evidence: evaluating forensic
science in the courtroom, Chichester UK, (1995).

Sjerps, M.J. and Coster van Voorhout J.A. (eds.), Het onzekere Bewijs. Gebruik
van statistiek en kansrekening in het strafrecht, Deventer, (2005).

Thompson, W.C. and Schuman, E.L., Interpretation of statistical evidence in
criminal trials: the prosecutor's fallacy and the defense attorney's fallacy, Law
and Human Behavior 11 (3), 167-187, (1987).

Contact details

The Netherlands Forensic Institute (NFI)


www.forensischinstituut.nl

Visitors' address
Laan van Ypenburg 6
2497 GB THE HAGUE
The Netherlands

Postal address
PO Box 24044
2490 AA THE HAGUE
Telephone: +31 (0)70 888 66 66
Fax: +31 (0)70 888 65 55

For general questions, please contact the Front Desk at telephone number: +31
(0)70 888 68 88

For case-related questions, please contact the Digital Technology and Biometry
Department, Statistics Section, at telephone number: +31 (0)70 888 64 00

V.War 2.0
May 2008
