
BTEC (Extended) Diploma

Applied Science (Forensics) Level 3



Steve Bishop
November 2012
Unit 8 Steve Bishop

Contents
1 BE ABLE TO USE STATISTICAL TECHNIQUES TO INVESTIGATE SCIENTIFIC PROBLEMS
Statistical techniques
Measures of location
Measures of dispersion
Normal distribution 1
Confidence limits
Shapes of distributions
The normal distribution 2
Finding probabilities with negative values of z
Standardising a normal distribution
Probability introduction
Conditional probability
Statistics and probability questions
2 BE ABLE TO PERFORM STATISTICAL TESTS TO INVESTIGATE SCIENTIFIC PROBLEMS
Chi-squared ($\chi^2$) test
Practice questions
Type I and type II errors
The angel of death: guilty or not guilty?
Student's t-test
t-test for matched pairs
Independent samples
Independent t-test
STATISTICAL TABLES

1 BE ABLE TO USE STATISTICAL TECHNIQUES
TO INVESTIGATE SCIENTIFIC PROBLEMS

Probability: addition and multiplication rules; conditional probability, eg lottery,
Mendelian inheritance

Frequency distributions: discrete data; continuous data (grouped and ungrouped)

Shape of distributions: unimodal distributions (normal distributions and skewed
distributions); bimodal distributions (qualitative explanation)
Statistical data calculations: calculation of the mean, $\bar{x} = \frac{\sum x}{n}$; mode;
median; calculations of standard deviation, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$; using ICT
equipment to calculate the standard deviation; entering statistical data into ICT
equipment; retrieving statistical information from ICT equipment; standard error of
the mean; confidence limits

Normal distribution: mean; variance; use of tables of the cumulative distribution
function; application of the normal distribution in science

Sampling: random sampling (quadrat in field sampling); population and sample
(Gallup or Mori poll); standard error of the mean (the uncertainty in the average
value of a set of measurements, eg the calorific value of oil)


P1 carry out statistical calculations to investigate a scientific problem

M1 perform a calculation using probability to investigate a scientific problem

D1 interpret shapes of distributions in scientific data


Statistical techniques


Measures of location

There are three types of average: the mean ($\bar{x}$), the mode and the median. These are known as measures of
location. Each provides a single value that represents
the data.




Now try this
Find the mean, median and mode of the following data:

(a) 1, 1, 1, 3, 4, 5, 6



(b) 0, 1, 1, 1, 1, 1, 1, 9



Which of the three measures of location is most affected by an extreme value?





When might the mode be of more use than either the median or the mean?




What is the advantage of the mean?






Measures of location do not tell us how spread out our data are, that is, how dispersed
they are.






Be able to use statistical
techniques to investigate
scientific problems

Frequency distributions
Shape of distributions
Statistical data calculations: mean,
mode, median and standard
deviation
Samples and populations; standard
error of the mean
Using spreadsheets and calculators


Measures of dispersion

One measure of dispersion or spread is the range. Another is the standard
deviation, s or σ.

The formula for standard deviation is: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$


This is not quite so scary as it looks!

It involves a few simple steps:

1. Find the mean.
2. Subtract the mean from each value to find the deviation, then square each deviation.
3. Total up the squared deviations.
4. Divide the total from step 3 by the number of data points less one.
5. Take the square root of the answer from step 4. This is the sample standard deviation.

Done manually, it is best done in a table:

Example

These are the number of break-ins in a housing estate over a twelve-month period.
Find the mean and the standard deviation of the data:
1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1




1. Find the mean

$\bar{x} = \frac{1+3+3+4+2+0+0+3+4+3+0+1}{12} = \frac{24}{12} = 2$

2. Subtract the mean from each value and square the result: $(x - \bar{x})^2$













s is the standard deviation
x is the individual data points
$\bar{x}$ (x bar) is the mean
n is the number of data points


$x$      $x - \bar{x}$      $(x - \bar{x})^2$
1        1 - 2 = -1         1
3        3 - 2 = 1          1
3        3 - 2 = 1          1
4        4 - 2 = 2          4
2        2 - 2 = 0          0
0        0 - 2 = -2         4
0        0 - 2 = -2         4
3        3 - 2 = 1          1
4        4 - 2 = 2          4
3        3 - 2 = 1          1
0        0 - 2 = -2         4
1        1 - 2 = -1         1
Total                       26

3. Find the total of the squared deviations

$\sum (x - \bar{x})^2 = 26$

4. Divide the above by the number of data points (n) less 1 (n − 1)

12 − 1 = 11

$\frac{\sum (x - \bar{x})^2}{n - 1} = \frac{26}{11} = 2.3636363$ (Don't round yet!)

5. To find the standard deviation, take the square root of the answer above

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{2.363636...} = 1.5374$

= 1.54 (3 sig fig)
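
The five steps can also be checked with a short program. The following is only a sketch in Python (the workbook itself uses a spreadsheet or calculator); the variable names are illustrative and not part of the unit.

```python
import math
import statistics

# Monthly break-in counts from the worked example
break_ins = [1, 3, 3, 4, 2, 0, 0, 3, 4, 3, 0, 1]

n = len(break_ins)
mean = sum(break_ins) / n                             # step 1: the mean
squared_devs = [(x - mean) ** 2 for x in break_ins]   # step 2: squared deviations
total = sum(squared_devs)                             # step 3: total them
variance = total / (n - 1)                            # step 4: divide by n - 1
s = math.sqrt(variance)                               # step 5: square root

print(mean, total, round(s, 2))

# Cross-check against the standard library's sample standard deviation
print(round(statistics.stdev(break_ins), 2))
```

Both approaches should agree with the hand calculation, because `statistics.stdev` also divides by n − 1.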

















This can be done on a spreadsheet using
the insert function.
Enter the data in a column.
Ensure the correct data points are chosen.
Insert function: choose Statistical >
STDEV.



Alternatively, use the function statement
=STDEV(cell range)






















Normal distribution 1
The standard deviation can be used to find the confidence interval for a set of
measurements.
We expect 95% of measured values to lie within 2 standard deviations above and
below the mean.
The distribution of the height of 1000 people might look like this.


The shape is known as a bell shape.
The mean, median and mode all have the same value.
It is symmetrical around the mean value.

Many biological variables, such as weight, height, blood pressure and life span, have this
same distribution shape.
Given enough data points, the curve will be a smooth bell shape.



On a normal distribution:
68% of the data items will be within 1 standard deviation of the mean
95.5% of the data items will be within 2 standard deviations of the mean
99.7% of the data items will be within 3 standard deviations of the mean

However, the mean of a sample is only an estimate of the true value, because we usually have only a small
sample of values. There will be a sampling error, as
we cannot always sample the whole population.

To measure this uncertainty we calculate the standard error of the mean:
$\text{Standard error} = \frac{s}{\sqrt{n}}$
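
As an illustration, the standard error formula can be written as a small Python function. This is a sketch only: the function name `standard_error` is made up for this example, and the numbers reuse the break-ins data from the measures of dispersion section (sum of squared deviations 26, n = 12).

```python
import math


def standard_error(s, n):
    """Standard error of the mean: s divided by the square root of n."""
    return s / math.sqrt(n)


# Break-ins example: s = sqrt(26 / 11), with n = 12 data points
s = math.sqrt(26 / 11)
se = standard_error(s, 12)
print(round(se, 3))
```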







Any data point that is more than 3 standard deviations from the mean is considered to be an outlier.
If the whole population is sampled then this is known as a census.
Now try this

Complete the following table

Sample size    Mean (cm)    Standard deviation    Standard error of the mean
10             150          2
100            150          2
1000           150          2
10 000         150          2

What happens to the standard error of the mean as the sample size increases?




Confidence limits

To find how confident we can be in the data we can find the confidence limits.
These are related to the standard error.

For data that is normally distributed, approximately 95% lies
within 2 standard deviations of the mean (more precisely, within 1.96 standard deviations). The 95% confidence level is
adequate for most scientific investigations.

95% confidence limits = mean ± 1.96 × standard error of the mean
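
The calculation of the confidence limits can be sketched in Python as follows. The function name and the sample values (mean 50.0, s = 4.0, n = 16) are hypothetical, chosen only to keep the arithmetic simple; they do not come from the workbook.

```python
import math


def confidence_limits_95(mean, s, n):
    """Return the (lower, upper) 95% confidence limits for the mean."""
    se = s / math.sqrt(n)      # standard error of the mean
    margin = 1.96 * se         # 1.96 standard errors either side
    return mean - margin, mean + margin


# Hypothetical sample: mean 50.0, standard deviation 4.0, 16 measurements
lower, upper = confidence_limits_95(mean=50.0, s=4.0, n=16)
print(round(lower, 2), round(upper, 2))
```

Here the standard error is 4.0 ÷ 4 = 1.0, so the limits sit 1.96 either side of the mean.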



In forensic situations or in
clinical trials a 99.7%
confidence limit is often
required

Now try these
1. The diameter of a piece of wire is measured using a micrometer. The
following results in mm were obtained:
2.34, 2.34, 2.35, 2.37, 2.38

Calculate the mean and standard deviation.




2. The mean of five diameter values of a piece of wire is 2.36 mm and the
standard deviation is 0.018 mm.



What is the standard error of the mean?




A piece of wire is measured as 2.39 mm. Can you be 95% confident that this is a correct
measurement of the diameter?





3. The volumes of acid to determine the end point of a titration are the
following:

(a) Calculate the mean and standard deviation


(b) Find the standard error of the mean.


(c) What are the 95% confidence limits, assuming the data is normally
distributed?



Shapes of distributions

The normal distribution 2
The normal distribution is a very
important distribution.
It is described by:

$X \sim N(\bar{x}, s^2)$

where $\bar{x}$ is the mean and $s^2$ is the variance (written $\mu$ and $\sigma^2$ for a whole population).

It has the following features:
bell-shaped
symmetrical about $\mu$
it extends from $-\infty$ to $+\infty$
the maximum value of $f(x) = \frac{1}{\sigma\sqrt{2\pi}}$
the total area under the curve is 1

Approximately 95% of the
distribution lies within 2 SDs of
the mean (between $-2\sigma$ and $+2\sigma$)

Approximately 99.7% of the
distribution lies within 3 SDs of
the mean (between $-3\sigma$ and $+3\sigma$)

The probability that X lies between a and b is written
as: P(a < X < b).
To find the probability we need to find the area under
the normal curve between a and b.
We can integrate or use tables.
To simplify the tables for all possible values of $\mu$ and $\sigma^2$, the variable X is
standardised so that the mean is zero and the standard deviation is 1. The
standardised normal variable is Z, and $Z \sim N(0, 1)$.
Using standard normal tables
Below is a part of the normal distribution function tables from the data book:
(a) To find P(Z < 0.16), read off the value $\Phi(0.16)$: find row 0.1 and go across to
column 6. This gives the value
0.5636.
(b) To find P(Z < 0.429), read off
$\Phi(0.429)$: find row 0.4 and column 2,
then column 9 in the add section.
Add these values together: 6628 + 32 = 6660
Hence, P(Z < 0.429) = 0.6660.
(c) To find P(Z > 0.55) we need to find
$1 - \Phi(0.55)$
= 1 − 0.7088 = 0.2912.
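
The table look-ups can be checked numerically. The sketch below is an alternative to the printed tables, not the method the unit asks for: it computes $\Phi(z)$ from the error function in Python's `math` module, using the identity $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The function name `phi` is chosen for this example.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(phi(0.16), 4))       # compare with the table value in (a)
print(round(phi(0.429), 4))      # compare with the table value in (b)
print(round(1 - phi(0.55), 4))   # compare with the answer in (c)
```

Small differences in the fourth decimal place can occur because the printed tables are rounded.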
Remember that the total area under the standardised curve is 1.
Now try these
Draw sketches to illustrate your answers
If Z ~ N(0, 1), find
1. P(Z < 0.87)   2. P(Z > 0.87)   3. P(Z < 0.544)   4. P(Z > 0.544)
Finding probabilities with negative values of z
For negative values of Z we use $\Phi(-z) = 1 - \Phi(z)$, remembering that the
curve is symmetrical and that the total area under the curve is 1.
By symmetry, the area to the left of −a equals the area to the right of a, so P(Z < −a) = $\Phi(-a)$ = 1 − $\Phi(a)$.
In the same way, P(Z > −a) = $\Phi(a)$.
Example
Find (a) P(Z < 0.411) (b) P(Z > −0.411) (c) P(Z > 0.411) (d) P(Z < −0.411)
Solution
(a) P(Z < 0.411) = [from tables 6591 + 4] = 0.6595
(b) P(Z > −0.411) = P(Z < 0.411) = 0.6595 (from (a))
(c) P(Z > 0.411) = 1 − $\Phi(0.411)$ = 1 − 0.6595 = 0.3405
(d) P(Z < −0.411) = P(Z > 0.411) = 0.3405
In summary:
P(Z < −a) = P(Z > a)
P(Z > −a) = P(Z < a)
Now try these
1. P(Z > −0.314)   2. P(Z < −0.314)   3. P(Z > 0.111)   4. P(Z > −0.111)
P(a < Z < b) = $\Phi(b) - \Phi(a)$
Example:
Find P(0.345 < Z < 1.751)
= $\Phi(1.751) - \Phi(0.345)$ = 0.9600 − 0.6350
= 0.3250
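
The symmetry rule and the interval rule can both be checked numerically. This is a sketch only, computing $\Phi$ from the error function instead of reading it from the tables; `phi` is a name chosen for this example.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Negative values of z: Phi(-z) = 1 - Phi(z)
p_below_neg = phi(-0.411)         # P(Z < -0.411)
p_above_neg = 1.0 - phi(-0.411)   # P(Z > -0.411)

# Interval: P(a < Z < b) = Phi(b) - Phi(a)
p_between = phi(1.751) - phi(0.345)

print(round(p_below_neg, 4), round(p_above_neg, 4), round(p_between, 4))
```

The results should match the worked answers to within the rounding of the printed tables.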
Now try these
Find
(a) P(0.35 < Z < 1.50)
(b) P(0.45 < Z < 1.51)
(c) P(0.354 < Z < 1.541)
(d) P(0.349 < Z < 1.716)
Answers
Now try these (Z ~ N(0, 1))
1. 0.8078
2. 1 − 0.8078 = 0.1922
3. 0.7068
4. 1 − 0.7068 = 0.2932
Now try these (negative values of z)
1. $\Phi(0.314)$ = 0.6231, so by symmetry 0.6231
2. 1 − 0.6231 = 0.3769
3. 1 − 0.5442 = 0.4558
4. 0.5442
Now try these (P(a < Z < b))
(a) $\Phi(1.50) - \Phi(0.35)$ = 0.9332 − 0.6368 = 0.2964
(b) 0.9345 − 0.6736 = 0.2609
(c) 0.9383 − 0.6382 = 0.3001
(d) 0.9569 − 0.6363 = 0.3206
Standardising a normal distribution
To standardise X where $X \sim N(\mu, \sigma^2)$,
subtract the mean and then divide by the standard deviation:
$Z = \frac{X - \mu}{\sigma}$
where $Z \sim N(0, 1)$
Example
If X ~ N(100, 25), find P(X > 110)
Solution
First standardise the random variable. The variance is 25, so the standard deviation is 5:
$P(X > 110) = P\left(Z > \frac{110 - 100}{5}\right) = P(Z > 2)$
P(Z > 2) = 1 − P(Z ≤ 2)
= 1 − 0.9772
= 0.0228
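
As a cross-check on the standardising step, the same example can be sketched with `NormalDist` from Python's standard `statistics` module (available from Python 3.8). This is an illustration, not part of the unit's required method.

```python
from statistics import NormalDist

# X ~ N(100, 25): a variance of 25 means a standard deviation of 5
X = NormalDist(mu=100, sigma=5)

# Standardise: z = (x - mean) / standard deviation
z = (110 - X.mean) / X.stdev

p_direct = 1 - X.cdf(110)                  # P(X > 110) without standardising
p_standardised = 1 - NormalDist().cdf(z)   # P(Z > z) with Z ~ N(0, 1)

print(z, round(p_direct, 4), round(p_standardised, 4))
```

Both routes give the same probability, which is the point of standardising: one table for Z serves every normal distribution.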
Now try these
1. If X~ N(116, 64), find P (X < 100)
2. If X~ N(100, 16), find P (X > 90)
Example
Lengths of a murder victim's hair are normally distributed with a mean length
of 150 cm and a standard deviation of 10 cm.
Find the probability that the length of a randomly selected strand is (a) shorter than
165 cm (b) within 5 cm of the mean.
Solution
Here $X \sim N(150, 10^2)$
(a) This means we have to find P(X < 165).
To use the tables we have to standardise X:
$Z = \frac{X - \mu}{\sigma} = \frac{165 - 150}{10} = 1.5$
So P(X < 165) becomes P(Z < 1.5)
P(X < 165) = P(Z < 1.5) = $\Phi(1.5)$
= 0.9332 from the tables.
Hence the probability that the length is
shorter than 165 cm is 0.93.
(b) To find the probability that the length is within 5 cm of the mean, we need
to find
P(|X − 150| < 5)
Dividing by the standard deviation gives
$P\left(\frac{|X - 150|}{10} < \frac{5}{10}\right)$, i.e. P(|Z| < 0.5)
P(|Z| < 0.5) = P(−0.5 < Z < 0.5)
= $2\Phi(0.5) - 1$
= 2 × 0.6915 − 1
= 0.383
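
Both parts of the hair-length example can be checked with a few lines of Python. This is a sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

hair = NormalDist(mu=150, sigma=10)   # X ~ N(150, 10^2)

# (a) P(X < 165)
p_shorter = hair.cdf(165)

# (b) P(|X - 150| < 5) = P(145 < X < 155)
p_within_5 = hair.cdf(155) - hair.cdf(145)

# The same result from the standardised form 2 * Phi(0.5) - 1
p_within_5_z = 2 * NormalDist().cdf(0.5) - 1

print(round(p_shorter, 4), round(p_within_5, 3), round(p_within_5_z, 3))
```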
Now try these 2
1. The masses of packages from a particular machine are normally
distributed with a mean of 200 g and a standard deviation of 2 g. Find
the probability that a randomly selected package from the machine
weighs
(a) less than 197 g
(b) more than 200.5 g
(c) between 198.5 g and 199.5 g.
2. The heights of boys at a particular age follow a normal distribution with
mean 150.3 cm and variance 25 cm².
Find the probability that a boy picked at random from this group has
height
(a) less than 153 cm
(b) more than 158 cm
(c) between 150 cm and 158 cm
(d) more than 10 cm different from the mean height.
Answers
Now try these
1. 0.0228
2. 0.9938
Now try these 2
1. (a) 0.0668 (b) 0.4013 (c) 0.1747
2. (a) 0.7054 (b) 0.0618 (c) 0.4621 (d) 0.0456
Probability introduction
Random events happen by chance. Probability is a measure of how likely they are.
It is measured on a scale from 0 (impossible) to 1 (certain).
A random event has various outcomes.
In a trial (or experiment) the things that can happen are called outcomes.
Events are groups of one or more outcomes.
When all outcomes are equally likely, the probability of an event is determined by
counting the outcomes:
$P(\text{event}) = \frac{\text{number of outcomes where the event happens}}{\text{total number of possible outcomes}}$

Example
A bag contains 10 balls: 4 are red, 3 are blue, 2 are white and 1 is black.
What is the probability of picking a blue ball? A white ball? A green ball? A ball that
is not red?
A sample space is the set of all possible outcomes.
Example
Complete the following sample space for the score on rolling 2 dice:
Scores 1 2 3 4 5 6
1
2
3
4
5
6
Find the probability of scoring: a total of 12; a total of 7; a score of less than 4.
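
The counting approach can be sketched in Python by listing the whole sample space for two dice and counting the outcomes in each event. The helper name `probability` is made up for this example, and "a score of less than 4" is read here as a total of less than 4, which is an assumption.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(event) = outcomes where the event happens / total outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


p_total_12 = probability(lambda o: sum(o) == 12)
p_total_7 = probability(lambda o: sum(o) == 7)
p_less_than_4 = probability(lambda o: sum(o) < 4)

print(p_total_12, p_total_7, p_less_than_4)
```

Using `Fraction` keeps the answers as exact fractions, which is how they would be read off the completed table.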
Venn diagrams can be used to show which outcome corresponds to which event.
The area where the two circles overlap shows A and B: $P(A \cap B)$.
The area covered by either circle shows A or B: $P(A \cup B)$.
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Example
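If you roll a dice, event A is an even number, and event B is a number > 4, then the
Venn diagram would be:
[Venn diagram: circle A contains 2, 4 and 6; circle B contains 5 and 6; 6 lies in the overlap; 1 and 3 lie outside both circles]
Find: P(A); P(B); $P(A \cap B)$; P(A′)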
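
Python sets behave like the regions of a Venn diagram, so this example can be sketched directly. The helper `p` is a name chosen for this illustration; `&` gives the overlap (A and B) and `|` gives the union (A or B).

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {x for x in outcomes if x % 2 == 0}   # even number
B = {x for x in outcomes if x > 4}        # number greater than 4


def p(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_union = p(A | B)
p_not_A = p(outcomes - A)   # complement of A

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
print(p(A), p(B), p(A & B), p_union, p_not_A)
print(p_union == p(A) + p(B) - p(A & B))
```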
Conditional probability
In a small prison there are 100 prisoners. 50 are imprisoned for burglary, 29 for arson
and 34 for other crimes.
First draw a Venn diagram. Work out how many are in for both burglary and arson:
(50 + 29 + 34) − 100 = 13. These must have been counted twice, so they are the ones
in for both. So those in for burglary only must be 50 − 13 = 37, and for arson
only 29 − 13 = 16. Place these numbers on the Venn diagram.
[Venn diagram: Burglary (50) and Arson (29) circles; 37 in burglary only, 13 in the overlap, 16 in arson only; 34 outside both circles]
What is the probability of choosing someone from the prison who is not in for burglary
or arson?
1. What is the probability of choosing someone who is in for burglary and arson?
2. What is the probability of choosing someone who is in for arson?
3. What is the probability that this person is in for burglary as well?
This last question is known as conditional probability. It is often phrased as 'What
is the probability of choosing someone who is convicted of burglary given that they
are convicted of arson?'
This is written as: P(B|A) (the probability of B given A).
From the diagram, the answer is straightforward: 13/29.
The 13 represents those in for both burglary and arson, and the 29 represents those in for arson.
So this can be written as:
$P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$
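
The prison example can be worked through in a few lines of Python. This is a sketch under the same assumption as the text, namely that the only double counting is between burglary and arson; the variable names are illustrative.

```python
from fractions import Fraction

total = 100
burglary = 50
arson = 29
other = 34

# Prisoners counted twice are in for both burglary and arson
both = burglary + arson + other - total

p_arson = Fraction(arson, total)   # P(A)
p_both = Fraction(both, total)     # P(A and B)

# Conditional probability: P(B|A) = P(A and B) / P(A)
p_burglary_given_arson = p_both / p_arson

print(both, p_burglary_given_arson)
```

Dividing the two probabilities cancels the 100, which is why the answer can be read straight from the diagram as 13 out of 29.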

Now try these
1. Two dice are thrown. What is the probability that the total is: (i) 7; (ii) a prime
number; (iii) 7, given that it is a prime number?
2. A forensics company is worried about the high turnover of its employees and
decides to investigate whether they are more likely to stay if they are given
training. On 1st January one year the company employed 256 people (excluding
those about to retire). During the year a record was kept of who received training
as well as who left the company. The results are summarised below:
                      Still employed    Left company    Total
Given training        109               43              152
Not given training    60                44              104
Total                 169               87              256
Find the probability that a randomly selected employee:
(i) received training
(ii) did not leave the company
(iii) received training and did not leave the company
(iv) did not leave the company, given that the person had received training
(v) received training, given the person had not left the company.
3. 100 cars are entered for a road-worthiness test which is in two parts, mechanical
and electrical. A car passes only if it passes both parts. Half the cars fail the
electrical test and 62 pass the mechanical. 15 pass the electrical but fail the
mechanical test. Find the probability that a car chosen at random:
(i) passes overall (ii) fails one test only (iii) given that it has failed, failed the
mechanical test only.
Statistics and probability questions
For each task show all your workings. Give the final answer where
appropriate to 3 significant figures. Hand in your completed working and
solution.
Task 1
A. Use the data from your titration experiment.
(a) Find the mean and median of the volume of HCl used to determine the end
point.
(b) Determine the standard deviation using an appropriate method.
B. On a particular corpse some unidentified tissue has been found. A sample
of 11 cells has been taken and measured. The diameters (in µm) are as
follows:
123, 126, 129, 122, 125, 128, 125, 124, 125, 126, 122
(a) Find the mean, median and mode of the diameters
(b) Determine the standard deviation manually and by ICT (if you use a
spreadsheet include a screen shot).
(c) Calculate the standard error of the mean. What is the 95% confidence
limit?
(P1)
Task 2
You have been investigating the probability of certain ballistic trace evidence
being found at a crime scene. The probability of type A is 0.3 and the
probability of type B is 0.5. The conditional probability P(A|B) = 0.25.
(a) Find the probability of finding A and B at a crime scene
(b) Find the probability of finding A or B at the scene.
(P1 part; M1 part)
Task 3
A forensic anthropologist has asked your advice. She was investigating the
lifespan of insects on a human corpse. The mean lifespan for one insect is 144
days and the standard deviation is 16 days. Find the probability that one insect
will live less than 140 days and another more than 156 days.
(M1 part, D1 part)
Task 4
The following distributions have had their labels removed.
A B
C D
Identify:
(a) the bimodal distribution
(b) the positively skewed distribution
(c) the negatively skewed distribution and
(d) the normal distribution.
Which distribution matches the following:
(i) An easy science examination
(ii) The salary of workers in a large laboratory
(iii) The heights of males and females in the UK
(iv) The mass of males in a large science laboratory.
(D1 part)
2 BE ABLE TO PERFORM STATISTICAL TESTS
TO INVESTIGATE SCIENTIFIC PROBLEMS
Chi-squared test: ($\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E
is the expected frequency); degrees of freedom; contingency tables; science
related applications of the Chi-squared test, eg colour blindness, psychology,
genetics, drug tests, any other science related test
P2 perform a chi-squared test to support a scientific hypothesis
M2 interpret the results of the chi-squared test
D2 evaluate the validity of the interpretation of the results of the chi-squared test
The t-test: independent samples; related samples (matched pairs); applications,
eg equal number of seeds in two different composts, test whether a particular
fertilizer improves yield of tomatoes, any other science related test
P3 perform a t-test on data collected from a laboratory experiment
M3 interpret the results of the t-test
D3 evaluate the validity of the interpretation of the results of the t-test
Correlation testing: graphical test, eg line of best fit; linear regression, eg using a
calculator in linear regression mode; testing for power law, eg radioactivity
experiments, electrical experiments, any other science related example
P4 carry out an appropriate correlation method to investigate data collected from a
laboratory experiment.
M4 interpret the results of the correlation.
D4 evaluate the validity of the interpretation of the results of the correlation.
Chi-squared ($\chi^2$) test
$\chi^2$ is pronounced kai-squared, and sometimes written chi-squared. The $\chi^2$ test
helps discover if there is any connection between two variables that can be arranged
into categories (eg colours, countries, gender). (It cannot be used with continuous
data.)
Example 1
50 men and 50 women are interviewed.
43 men can name over 15 clubs in the premier league
27 women can name over 15 clubs in the premier league.
Is there a connection between gender and football interest (assuming being able to
name over 15 clubs means that the person has an interest in football)?
1. Define the null and alternative hypotheses
$H_0$: there is no difference between genders
$H_1$: there is a difference between genders.
2. Arrange the data into a contingency table
         Interested in football    Not interested in football    Total
Men      43                        7                             50
Women    27                        23                            50
Total    70                        30                            100
This is a 2 × 2 table: there are 2 categories for each variable.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed values and E is the expected values.
The contingency table above gives the observed values. We now have to find the
expected values.
3. Find the expected values
This is found by multiplying the column total by the row total and dividing by the
grand total:
expected value = (column total × row total) ÷ overall total
Hence for men:
interested in football we would expect: (70 × 50) ÷ 100 = 35
not interested in football: (30 × 50) ÷ 100 = 15
For women:
interested in football: (70 × 50) ÷ 100 = 35
not interested in football: (30 × 50) ÷ 100 = 15
The expected table would then read:
         Interested in football    Not interested in football    Total
Men      35                        15                            50
Women    35                        15                            50
Total    70                        30                            100
The totals will remain unchanged.
4. Calculate the residual table
The residual is the difference between the observed and the expected values.
Observed − Expected = Residual
43    7          35    15         +8    −8
27    23    −    35    15    =    −8    +8
In 2 × 2 tables the numbers will always be the same, with only the signs differing.
5. Calculate $\chi^2$
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed values and E is the expected values.
The residual table gives the values of (O − E).
So, $\chi^2 = \frac{(+8)^2}{35} + \frac{(-8)^2}{15} + \frac{(-8)^2}{35} + \frac{(+8)^2}{15} = 12.19$
We now have to decide if the $\chi^2$ value is high enough to conclude that it is unlikely
to get such a number by chance.
To do this we have to look at the concept of degrees of freedom.
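
Steps 3 to 5 can be sketched in plain Python as a check on the hand calculation. The function names `expected_table` and `chi_squared` are made up for this illustration and no statistics library is assumed.

```python
def expected_table(observed):
    """Expected value for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Example 1: rows are men and women, columns are interested / not interested
observed = [[43, 7], [27, 23]]
print(expected_table(observed))
print(round(chi_squared(observed), 2))
```

The same two functions work for larger tables, since nothing in them depends on the table being 2 × 2.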
Degrees of freedom

In a 2 × 2 contingency table, the value of one entry determines all the others:

                    Total
43      ?           50
?       ?           50
70      30          100

However, in a 3 × 3 table we need 4 values before we can know what all the other
values are:

                            Total
37      22      ?           70
8       10      ?           20
?       ?       ?           60
60      50      40

The number of values needed (one in the first example, four in the second) is called the degrees of
freedom.

The degrees of freedom can be calculated using:

degrees of freedom = (r − 1) × (c − 1)

where r = the number of rows and c = the number of columns.



Knowing the degrees of freedom, the $\chi^2$ value and a table of critical values, we can
find out if there is any relationship between gender and interest in football.


One-tail    5%       2.5%     1.25%    0.5%     0.25%    0.05%
Two-tail    10%      5%       2.5%     1%       0.5%     0.1%
            0.9      0.95     0.975    0.99     0.995    0.999
ν = 1       2.706    3.841    5.024    6.635    7.879    10.83
ν = 2       4.605    5.991    7.378    9.210    10.60    13.82
ν = 3       6.251    7.815    9.348    11.34    12.84    16.27
ν = 4       7.779    9.488    11.14    13.28    14.86    18.47



With one degree of freedom, a test at the 5% level gives us a critical value of 3.841.
This means that, if there were no relationship, we would expect a number greater than 3.841 only 5% of the time.

As the $\chi^2$ value is 12.19, which is greater than 3.841, we can say that at the 5% level we are confident that there is
a relationship between interest in football and gender.
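
The decision step can be sketched in Python as a look-up against the 5% column of the critical values table. The names below are illustrative only, and the dictionary holds just the four 5% critical values given for 1 to 4 degrees of freedom.

```python
# 5% critical values of chi-squared for 1 to 4 degrees of freedom
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(rows, cols):
    """Degrees of freedom for a contingency table: (r - 1) x (c - 1)."""
    return (rows - 1) * (cols - 1)


def significant_at_5_percent(chi2, rows, cols):
    """True if chi2 reaches the 5% critical value for this table size."""
    df = degrees_of_freedom(rows, cols)
    return chi2 >= CRITICAL_5_PERCENT[df]


print(degrees_of_freedom(2, 2), degrees_of_freedom(3, 3))
print(significant_at_5_percent(12.19, 2, 2))   # football example
```

A result of `True` means the null hypothesis is rejected at the 5% level.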




Example 2
A sociologist wants to know if middle-class men are more likely to change babies'
nappies than working-class men. The sociologist interviews 40 middle-class and 60
working-class men. 17 middle-class men change nappies and 13 working-class men
change nappies.

1. Define the null and alternative hypotheses
$H_0$: There is no connection between social class and nappy changing
$H_1$: The two variables are related.

2. Arrange the data into a contingency table
3. Find the expected values
4. Calculate the residual table

Observed − Expected = Residual
17    23         12    28         +5    −5
13    47    −    18    42    =    −5    +5

5. Calculate $\chi^2$

So, $\chi^2 = \frac{(+5)^2}{12} + \frac{(-5)^2}{28} + \frac{(-5)^2}{18} + \frac{(+5)^2}{42} = 4.96$

6. Find the degrees of freedom

Degrees of freedom = (2 − 1) × (2 − 1) = 1


7. Use the tables
If $H_0$ is true, the chance that $\chi^2$ will be 3.841 or more is 5%.
$\chi^2$ = 4.96, so this suggests that we reject $H_0$ and conclude that there is some
connection between social class and nappy changing.






Now try these
1. Find the expected values for the following tables.
(a) 18    32          (b) 25    16
    8     42              22    37
(c) 40    60    60
    60    50    50
    20    50    10
2. Find the residual tables for the tables in question 1.
3. Calculate the $\chi^2$ for the tables in question 1.
4. How many degrees of freedom will there be for each of the following contingency tables?
(a) 5 × 3 (b) 7 × 5 (c) 6 × 2 (d) 10 × 17
5. The table below shows the results of a drug test on an infection. Is there any evidence
that treatment is related to cure?
             Treated    Not treated
Cured        24         57
Not cured    53         257
6. Murder Inc., a forensic science firm, carried out a survey to find out the political affiliation
of its employees. Carry out a $\chi^2$ test on the table to determine whether there is any
association between political affiliation and type of work.
                Lab-based    Non lab-based    Total
Conservative    22           16               38
Labour          53           8                61
LibDem          20           11               31
Total           95           35               130
7. A researcher in genetics is investigating whether eye colour bears any relationship to
place of residence. From the table below, is there any evidence of such a relationship?
               Brown    Blue    Other
Leicester      72       80      28
Bournemouth    20       62      18
Aberdeen       67       120     44
Answers
1.
(a) 13    37          (b) 19.27    21.73
    13    37              27.73    31.27
(c) 48    64    48
    48    64    48
    24    32    24
2.
(a) +5    −5          (b) +5.73    −5.73
    −5    +5              −5.73    +5.73
(c) −8     −4     +12
    +12    −14    +2
    −4     +18    −14
3. (a) $\chi^2$ = 25/13 + 25/37 + 25/13 + 25/37 = 5.20 (b) 5.45 (c) 29.69
4. (a) 4 × 2 = 8 (b) 6 × 4 = 24 (c) 5 (d) 144
5. 6.38, significant at 2½%, so there is evidence of an association
6. 11.52, significant at ½%, so there is evidence of an association
7. 13.5, 4 degrees of freedom, significant at 1% so there is evidence of an association.
Practice questions
1. Is there a connection at the 5% level between burglary and house type?
            Burglary    No burglary    Total
House       3           2
Bungalow    4           1
Total
2. Is there a connection between the type of area and fatal traffic accidents (figures
in thousands) at the 5% level?
            Fatal    Non-fatal    Total
Motorway    5        15           20
Urban       4        24           28
Rural       3        12           15
Total       12       51           63
Solutions
1. Degrees of freedom: 1
Chi-square = 0.476
For significance at the 5% level, chi-square should be greater than or equal to 3.84.
The result is not significant.
2. Degrees of freedom: 2
Chi-square = 0.88
For significance at the 5% level, chi-square should be greater than or equal to 5.99.
The result is not significant.
Type I and type II errors
There are four possible conclusions when conducting a significance test:
True situation    Our conclusion    Outcome
$H_0$ is true     Accept $H_0$      Correct decision
$H_0$ is true     Reject $H_0$      Wrong decision: Type I error
$H_0$ is false    Accept $H_0$      Wrong decision: Type II error
$H_0$ is false    Reject $H_0$      Correct decision
A type I error is known as a false positive.
For example, a court finding a person guilty of a crime they did not commit.
The probability of a type I error is the same as the significance level.
A type II error is a false negative.
For example, a court finding a person not guilty of a crime they did commit.
A third type of error has also been proposed: type III,
rejecting the null hypothesis for the wrong reason!
Justice System - Trial
                                                                Defendant innocent    Defendant guilty
Reject presumption of innocence (guilty verdict)                Type I error          Correct
Fail to reject presumption of innocence (not guilty verdict)    Correct               Type II error
Statistics - Hypothesis Test
                                  Null hypothesis true    Null hypothesis false
Reject null hypothesis            Type I error            Correct
Fail to reject null hypothesis    Correct                 Type II error
The angel of death: guilty or not guilty?
Kristen Gilbert was a nurse on Ward C at the Veterans Affairs Medical
Center in Northampton, Massachusetts, USA. She earned the
nickname 'Angel of Death' as she was often the first to notice that a
patient was going into cardiac arrest. She was calm and competent
and would be able to administer the correct drug to save the patient.
However, there were growing suspicions about her behaviour. There
had been a high number of deaths on her particular ward, as well as
shortages of the amphetamine-type drug epinephrine, which can be
used to cause cardiac arrest.
A hospital investigation found nothing untoward. Some staff were still concerned, so a
second investigation took place, this time involving statistician Stephen Gehlbach. Gehlbach
plotted the annual number of deaths, broken down by shift and year (below). Gilbert started
to work on Ward C in March 1990 and stopped working at the hospital in February 1996.
Total deaths at the hospital, by shift and year [source: Devlin & Lorden (2007, p. 16)]
What pattern does the bar chart show?
Is there evidence to secure a conviction? Could it be a coincidence? To determine this we
can use a chi-squared test.
Here is the data the investigators had:
                    Death on shift
Gilbert present     Yes      No       Total
Yes                 40       217
No                  34       1350
Total
Perform a chi-squared test to support the following one-tail hypothesis at the 0.01 level (P2):
HA: Significantly more patients will be found to die on a shift where the subject is
working than on shifts when the subject is not working.
State clearly your conclusion.
What are the implications of the result? (M2)
How valid are your results? How valid is your interpretation? Is Kristen Gilbert really guilty or
not guilty? (D2)
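As a way of checking a hand calculation, the chi-squared statistic for the 2 x 2 table above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is made up for the example, no continuity (Yates) correction is applied, and 6.635 is the standard table value for 1 degree of freedom at the 0.01 level, which may not be the column your own table uses for a one-tail hypothesis.

```python
def chi_squared(observed):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count = row total x column total / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    return chi2


# Rows: Gilbert present (yes, no); columns: death on shift (yes, no)
observed = [[40, 217], [34, 1350]]
chi2 = chi_squared(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 6.635  # assumed: 1 degree of freedom, 0.01 level

print(f"chi-squared = {chi2:.2f}, df = {degrees_of_freedom}")
print("Reject H0" if chi2 > critical_value else "Accept H0")
```

The program only reports the statistic and the decision; the questions about implications and validity (M2, D2) still need a written argument.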
Bibliography
K. M. Pyrek (2009). Kristen Gilbert Case Explored in New Book. Forensic Nurse [online:
http://www.forensicnursemag.com/webx/391webx1.html, accessed 22 Jan 2010].
K. Devlin and G. Lorden (2007). The Numbers behind Numb3rs. Plume: New York.
Student's t-test
'Student' was W. S. Gosset. He published his test anonymously as
'Student' because he was working for the brewers Guinness as a
statistician and Guinness did not want the competition knowing that they
were using statistics to help improve the brewing process.
The test is used to compare samples from two different batches.
This may be beer brewed under different circumstances, soil from
different areas or evidence from two different crime scenes.
It is usually used with small (<30) samples that are normally distributed.
Question
In an investigation to determine the effectiveness of sequential fingerprint enhancement, 10 prints are
enhanced with DFO and then with ninhydrin. The points of detail visible at each stage are
recorded. Is there a difference at the 95% confidence level?
DFO             8  12  11   6   9  11   7   8  10   9
DFO+ninhydrin  10  15  12   6  13  14   9   9  15  12
t-test for matched pairs
This is used when there is some sort of link between the data sets. Here the link is
a before and after (before ninhydrin and after ninhydrin), so we use the t-test for
matched pairs.
1. Set up the null and alternative hypotheses:
H0: there is no difference in the number of minutiae when using ninhydrin.
HA: more minutiae are observed after enhancement with ninhydrin.
This is a one-tail test.
We are testing at the 95% confidence level, i.e. the 5% (0.05) significance level.
2. Calculate the difference between the pairs in the sample
DFO             8  12  11   6   9  11   7   8  10   9   Total
DFO+ninhydrin  10  15  12   6  13  14   9   9  15  12
Difference (D)  2   3   1   0   4   3   2   1   5   3   24
3. Calculate the mean of the differences
$\bar{D} = \frac{\sum D}{n} = \frac{24}{10} = 2.4$
4. Calculate the standard deviation of the differences
$s = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}} = \sqrt{\frac{20.4}{10 - 1}} = 1.51$
5. Calculate the standard error
$SE = \frac{s}{\sqrt{n}} = \frac{1.51}{\sqrt{10}} = 0.478$
6. Calculate the value of t
$t = \frac{\bar{D}}{SE} = \frac{2.4}{0.478} = 5.0$
7. Calculate the number of degrees of freedom
Number of pairs of data - 1 = n - 1
10 - 1 = 9
8. Find the critical value from the table with 9 degrees of freedom, 1-tail, at the 0.05 level:
t critical = 1.833
9. Determine if there is a difference or not
t > t critical (5.0 > 1.833)
So the null hypothesis is rejected and the alternative hypothesis is accepted.
The ninhydrin does make a positive difference.
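The nine steps above can also be checked with a short Python sketch. It is offered only as an illustration: the function and variable names are invented for the example, and the critical value 1.833 is simply copied from the worked solution rather than looked up by the program.

```python
from math import sqrt

dfo = [8, 12, 11, 6, 9, 11, 7, 8, 10, 9]
dfo_ninhydrin = [10, 15, 12, 6, 13, 14, 9, 9, 15, 12]


def matched_pairs_t(before, after):
    """Return mean difference, standard deviation, standard error, t and df."""
    diffs = [a - b for b, a in zip(before, after)]  # step 2
    n = len(diffs)
    mean_d = sum(diffs) / n  # step 3
    s = sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))  # step 4
    se = s / sqrt(n)  # step 5
    t = mean_d / se  # step 6
    return mean_d, s, se, t, n - 1  # step 7: df = n - 1


mean_d, s, se, t, df = matched_pairs_t(dfo, dfo_ninhydrin)
t_critical = 1.833  # from the worked solution: 9 df, 1-tail, 0.05 level

print(f"mean = {mean_d:.1f}, s = {s:.2f}, SE = {se:.3f}, t = {t:.2f}, df = {df}")
print("Reject H0" if t > t_critical else "Accept H0")
```

Because the program does not round the intermediate values, its standard error and t differ slightly in the last digit from the hand calculation, which uses s rounded to 1.51; the conclusion is the same.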
t-test for matched pairs
1. Set up the null and alternative hypotheses and determine if it is a one- or two-tail test
H0:
HA:
2. Calculate the differences between the pairs in the samples (D)
3. Calculate the mean of the differences: $\bar{D} = \frac{\sum D}{n}$
4. Calculate the standard deviation of the differences: $s = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}$
5. Calculate the standard error of the differences: $SE = \frac{s}{\sqrt{n}}$
6. Calculate the value of t: $t = \frac{\bar{D}}{SE}$
7. Calculate the number of degrees of freedom: number of pairs of data - 1 = n - 1
8. Find the critical value from the table
9. Determine if there is a difference or not
If t < critical value then there is no significant difference between the two sets of data
and the null hypothesis is accepted.
If t ≥ critical value then the null hypothesis is rejected: the two sets of data differ
significantly.
Independent samples
If there is no before and after relationship between the samples then the independent
samples test is used.
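$t = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
where $\bar{x}$, $s$ and $n$ are the mean, standard deviation and size of each sample.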
Example
Some brown dog hairs were found on the clothing of a victim at a crime scene involving a
dog.
The widths of five of the hairs were measured: 46, 57, 54, 51, 38 µm.
A suspect is the owner of a dog with similar brown hairs. A sample of the hairs has been
taken and their widths measured: 31, 35, 50, 35, 36 µm.
Is it possible that the hairs found on the victim were left by the suspect's dog? Test at the 5%
level.
[From D. Lucy, Introduction to Statistics for Forensic Scientists, Chichester: Wiley, 2005, p. 44.]
Solution
1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$
                     Dog A     Dog B
                     46        31
                     57        35
                     54        50
                     51        35
                     38        36
Total                246       187
Mean                 49.2      37.4
Standard deviation   7.463     7.301
2. Calculate the magnitude of the difference between the two means, $|\bar{x}_1 - \bar{x}_2|$
49.2 - 37.4 = 11.8
3. Calculate the standard error SE in the difference:
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{7.463^2}{5} + \frac{7.301^2}{5}} = \sqrt{11.14 + 10.66}$
= 4.669 ≈ 4.67 (3 sf)
4. Calculate the value of t:
t = difference between the means ÷ standard error in the difference
11.8 ÷ 4.669 = 2.527
≈ 2.53 (3 sig fig)
5. Calculate the degrees of freedom = $n_1 + n_2 - 2$
5 + 5 - 2 = 8
6. Find the critical value from the table for the significance level you are working to
At the 0.05 level (two-tail, 8 degrees of freedom) t crit = 2.306
If t < critical value then there is no significant difference between the two sets of data
If t > critical value then there is a significant difference between the two sets of data
Here 2.53 > 2.306, so at the 0.05 level there is a significant difference between the two data sets.
So it is unlikely that the hairs found on the victim came from the suspect's dog.
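The same calculation can be sketched in Python as a check on the arithmetic. This is an illustrative sketch only: the function name is invented for the example, `statistics.stdev` is used because it gives the sample standard deviation (dividing by n - 1, as in the table above), and the critical value 2.306 is copied from the worked solution rather than looked up by the program.

```python
from math import sqrt
from statistics import mean, stdev

dog_a = [46, 57, 54, 51, 38]  # hair widths from the victim's clothing
dog_b = [31, 35, 50, 35, 36]  # hair widths from the suspect's dog


def independent_t(x1, x2):
    """Return difference in means, standard error, t and degrees of freedom."""
    difference = abs(mean(x1) - mean(x2))  # step 2
    # Step 3: standard error in the difference between the means
    se = sqrt(stdev(x1) ** 2 / len(x1) + stdev(x2) ** 2 / len(x2))
    t = difference / se  # step 4
    df = len(x1) + len(x2) - 2  # step 5
    return difference, se, t, df


difference, se, t, df = independent_t(dog_a, dog_b)
t_critical = 2.306  # from the worked solution: 8 df, 0.05 level

print(f"difference = {difference:.1f}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
print("Significant difference" if t > t_critical else "No significant difference")
```

Changing the two lists is enough to reuse the sketch for other pairs of independent samples, provided the critical value is also changed to match the new degrees of freedom.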
Independent t-test
1. Calculate the mean and standard deviation for the data sets $x_1$ and $x_2$
2. Calculate the magnitude of the difference between the two means, $|\bar{x}_1 - \bar{x}_2|$
3. Calculate the standard error SE in the difference:
$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
4. Calculate the value of t:
t = difference between the means ÷ standard error in the difference [step 2 ÷ step 3]
5. Calculate the degrees of freedom = $n_1 + n_2 - 2$
6. Find the critical value for the particular significance you are working to.
7. If t < critical value then there is no significant difference between the two sets
of data and the null hypothesis is accepted.
If t ≥ critical value then the null hypothesis is rejected: the two sets of data differ
significantly.
STATISTICAL TABLES