DEALING WITH EXPERIMENTAL DATA


INTRODUCTION

Practical work and investigations in biology may be descriptive, for example, making
observations of the structure of an insect-pollinated flower. Many investigations involve the
collection of data, in other words, they are quantitative. The way in which an investigation is
organised follows a sequence, as shown in the flow chart below.

Make an observation, formulate a hypothesis and plan the investigation
↓
Carry out practical work and collect data
↓
Tabulate data and present in the form of a suitable graph
↓
Apply a statistical test to the data to test the validity of the experiment and to test your hypothesis

Note: statistical tests do not prove that the hypothesis is correct (or incorrect). The result of
the test might, however, suggest that the hypothesis is correct, or support the hypothesis.

Tabulating data
In order to help understand the data, it is essential that the results are tabulated and presented
correctly.

As an example, a simple experiment might be carried out to investigate the effect of
temperature on the activity of an enzyme. This could involve recording the time taken for the
enzyme and substrate mixture to reach an end-point, at a range of different temperatures. We
can calculate the relative rate of the reaction by taking the reciprocal of the time taken to reach
the end-point, that is, 1 / time taken to reach the end-point.
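
As a rough illustration of the reciprocal calculation, the short Python sketch below converts end-point times into relative rates. The times are invented for illustration and are not results from the experiment; `relative_rate` is a hypothetical helper name.

```python
# Hypothetical end-point times in seconds at each temperature in °C (not real data).
times_s = {20: 200.0, 25: 125.0, 30: 80.0, 35: 50.0, 40: 40.0, 45: 100.0}


def relative_rate(time_s):
    """Relative rate of reaction (s^-1) = 1 / time taken to reach the end-point (s)."""
    return 1.0 / time_s


# Derived data: one relative rate per temperature.
rates = {temp: relative_rate(t) for temp, t in times_s.items()}

for temp, rate in rates.items():
    print(f"{temp} °C: {rate:.4f} s^-1")
```

A shorter time to reach the end-point gives a larger relative rate, which is why the reciprocal is a convenient derived quantity.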

The results of this investigation would be presented as in the table below.


Temperature / °C | Time taken to reach the end-point / s | Relative rate of reaction / s⁻¹
20 | |
25 | |
30 | |
35 | |
40 | |
45 | |

Note how the table is presented. The first column is the independent variable, the second
column is the dependent variable and the last column contains derived data.

Each column has a suitable, meaningful heading, that is, what was recorded, followed by the
units. The units are not then repeated in each column of the table. It is a convention to
separate the units from what was measured by a solidus (/), rather than using brackets.
However, if we calculate a percentage change, the percentage sign is usually placed in
brackets (%), because percentage is not a unit.

The example below shows a more complicated table, used to record the results of an
experiment on osmosis in plant tissue.

Concentration of sucrose solution / mol dm⁻³ | Initial mass of potato tissue (I) / g | Final mass of potato tissue (F) / g | Percentage change in mass (%), [(I − F) / I] × 100
0.0 | | |
0.2 | | |
0.4 | | |
0.6 | | |

Plotting graphs

In biology, we are likely to use three different types of graphs:

Line graphs

Bar charts

Histograms

Line graphs are used to show the relationship between one variable and another. For
example, the relationship between enzyme activity and temperature, or the relationship
between the rate of transpiration and the time of day. In a line graph, the independent variable
is plotted on the x-axis (horizontal axis) and the dependent variable is plotted on the y-axis
(vertical axis).
Points must be plotted carefully, using either crosses (×) or encircled dots (⊙). You need to
think carefully about whether to join successive points with a ruled, straight line, or to draw a
line of best fit. If you are unsure, it is usually safer to join successive points with ruled, straight
lines. You should draw a smooth curve through the points only if you have a good reason to
do so, e.g. a graph showing the relationship between the rate of photosynthesis and light
intensity.

If there is no clear relationship, do not draw a line; leave the graph as a scattergram. This can be
useful to show whether or not there is a correlation between two variables.

Bar charts are used to display data where one of the variables is not numerical. For example,
you may wish to draw a graph to show the vitamin C content of a range of different fruits. In this
case, you would plot the concentration of vitamin C on the y-axis, and put the names of the
different types of fruit on the x-axis. By convention, the bars are drawn separately (that is, not
touching) and of equal width. The bars can be arranged in any order, but it is easier to make
comparisons if they are arranged in descending order of size (highest to lowest). You could
also use a bar chart to show the means, for example, of two sets of data. The mean numbers
of organisms found in two sites, A and B, could be shown in this way.

Histograms are used when the independent variable is numerical and the data are
continuous. The data are grouped into classes. The number of classes will depend on the
nature of the data obtained. The y-axis represents the frequency (or number) in each class. In
a histogram the bars should be drawn touching.
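
The grouping of continuous data into classes can be sketched in a few lines of Python. This is only an illustration: the leaf lengths are invented, and the class width of 10 mm and the helper `class_start` are assumptions, not part of the original text.

```python
from collections import Counter

# Hypothetical leaf lengths in mm (invented for illustration).
leaf_lengths_mm = [103, 112, 118, 121, 125, 127, 131, 134, 138, 139, 142, 147]

CLASS_WIDTH = 10  # assumed class width in mm


def class_start(value, width=CLASS_WIDTH):
    """Return the lower boundary of the class that contains value."""
    return (value // width) * width


# Frequency (number of leaves) in each class, keyed by the class lower boundary.
frequencies = Counter(class_start(x) for x in leaf_lengths_mm)

for start in sorted(frequencies):
    print(f"{start}-{start + CLASS_WIDTH - 1} mm: {frequencies[start]}")
```

The frequencies printed for each class are the bar heights that would be plotted on the y-axis of the histogram.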

Hints for drawing graphs

1. Decide which type of graph is appropriate for the data.

2. Use the x-axis for the independent variable (that is, the variable chosen by the
experimenter) and the y-axis for the dependent variable (the readings taken during
the experiment).

3. Label the axes fully, including the units.

4. Plot the points carefully.

5. If there is more than one set of data, label each curve to show what it represents.

6. Do not extrapolate!

7. Give your graph a suitable title. For example:

Effect of Lack of Nitrates on the Biomass of Bean Seedlings,

Influence of Temperature on the Rate of Dehydrogenase Activity,

Relationship Between Flow Rate of Water and the Number of Mayfly Nymphs Found

SELF ASSESSMENT QUESTIONS


Present each of the following tables of data in a suitable graphical form.


SAQ 1 Blood glucose concentration of a person after having a meal of carbohydrates.

Time after eating / min | Blood glucose concentration / mmol dm⁻³
0 | 5.5
15 | 5.9
30 | 6.2
45 | 6.8
60 | 6.0
75 | 5.5
90 | 5.2

SAQ 2 Relationship between the height of a plant and the number of flowers

Height of plant / cm Number of flowers


106 47
113 52
96 47
98 39
77 20
102 46
110 45
77 20
141 102

SAQ 3 Numbers of different species of insects collected from different species of trees
Species of tree Number of insect species collected
Oak 83
Birch 68
Hazel 53
Willow 34
Larch 9

SAQ 4 Frequency of leaf length

Leaf length / mm | Number of leaves
100-109 | 1
110-119 | 7
120-129 | 8
130-139 | 14
140-149 | 18
150-159 | 16
160-169 | 9
170-179 | 10
180-189 | 5
190-199 | 1

Mean, Median, and Mode


Mohamed asked Ali how many hours per week he spent watching TV. "It varies," he said, "but
on average, about 18 hours." What does this really mean?

For the next 11 weeks, he kept a diary to record the number of hours per week he spent
watching TV. Here is his record:

14, 12, 9, 18, 22, 27, 6, 22, 19, 24, 15

At first sight, it is difficult to draw any conclusions from these figures. In order to make the
picture clearer, we can put the figures into rank order, from the smallest to the largest:

6, 9, 12, 14, 15, 18, 19, 22, 22, 24, 27

We can now see that 18 hours is in the middle of the range of values, which is what we might
expect as a typical, representative figure.

The median is the middle number or value when all the values have been ranked. In the
example above, the median is 18 hours. When there is an even number of values, the median
is calculated as the mean of the two middle values.

The mean is the sum of the observations divided by the number of observations. In this
example, the mean is 188 ÷ 11 = 17.09 hours.

The mode is the most common number or value in a set of observations. In the example
above, 22 hours is the mode.
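
The three averages for Ali's TV diary can be checked with a minimal Python sketch using the standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# Hours of TV per week recorded over 11 weeks (data from the text above).
hours = [14, 12, 9, 18, 22, 27, 6, 22, 19, 24, 15]

ranked = sorted(hours)                    # rank order, smallest to largest
mean_hours = statistics.mean(hours)       # sum of observations / number of observations
median_hours = statistics.median(hours)   # middle value of the ranked data
mode_hours = statistics.mode(hours)       # most common value

print("Ranked:", ranked)
print(f"Mean = {mean_hours:.2f} h, median = {median_hours} h, mode = {mode_hours} h")
```

With an even number of values, `statistics.median` returns the mean of the two middle values, which matches the rule given above.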

SAQ 5 A student obtained a sample of 12 holly leaves and counted the
number of spines on each leaf. Here are the results of this investigation.

Leaf             | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12
Number of spines | 21 | 16 | 22 | 17 | 15 | 20 | 14 | 12 | 16 | 13 | 19 | 11

Arrange these results in rank order. Find the mean number of spines per leaf, the median
number, and the mode.

Mean ............ Median ............ Mode ............

SAQ 6 In an investigation, the numbers of aphids (greenflies) on 14 bean plants were
counted. The results were:

84 58 129 94 106 80 114

105 119 97 86 59 152 141

Calculate the mean number of aphids per plant.

SAQ 7 In a breeding experiment using maize plants, the numbers of purple grains contained
in 9 rows of grain on a single cob were found to be as follows:

14 14 14 15 19 17 12 15 13

Arrange these values in rank order and determine the median number of purple grains per row.

Statistical Tests

A student collected two samples of holly leaves from a tree and noticed that the leaves from
higher up on the tree seemed to have fewer spines than those collected from lower down. She
wondered whether this was just due to chance, or whether the difference was significant.

This requires the application of a statistical test.

In biology, you are likely to encounter three types of statistical test, as follows.

1. Comparison between two sets of data [t-test, or Mann-Whitney U test]

2. Correlation between two variables [Spearman rank correlation coefficient]

3. Comparison of differences between observed and expected results [Chi-squared test (χ²)]

Comparison between two sets of data

An investigation might involve making a comparison between two lots of data, for example,
comparing the numbers of bluebells found in two different areas of woodland, or comparing the
size of leaves found on a plant growing in shady conditions with the leaves on the same
species of plant growing in open areas.

In the holly leaf spine example, it would be appropriate to use a test to compare the means of
the two samples. A t-test could be used if the data are normally distributed. A minimum of 6
observations is needed for each sample.
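
A minimal Python sketch of how a t value might be computed is given below. It assumes the form t = |mean A − mean B| / √(sA²/nA + sB²/nB), which is one of the two formulas quoted later in SAQ 13, and the sample standard deviation with n − 1 in the denominator. The two samples are invented for illustration, and the function names are hypothetical.

```python
import math


def sample_variance(values):
    """s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def t_value(a, b):
    """t = |mean_a - mean_b| / sqrt(s_a^2 / n_a + s_b^2 / n_b) (assumed form)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    standard_error = math.sqrt(sample_variance(a) / len(a) + sample_variance(b) / len(b))
    return abs(mean_a - mean_b) / standard_error


# Hypothetical spine counts for two samples of six leaves each (not real data).
lower_leaves = [12, 15, 14, 16, 13, 14]
upper_leaves = [10, 11, 9, 12, 10, 8]

t = t_value(lower_leaves, upper_leaves)
degrees_of_freedom = len(lower_leaves) + len(upper_leaves) - 2

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
```

The calculated t would then be compared with the tabulated critical value for the same number of degrees of freedom.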

The Mann-Whitney U test compares two sets of data, which are not normally distributed. It is
designed to compare the medians of two unmatched samples. It can be used to decide
whether two sampled groups of organisms are from the same or different populations. The
samples to be compared may be unequal, but neither must consist of fewer than five
measurements, or of more than 25 individuals.


Correlation Between Two Variables

Suppose that the aim of your investigation was to determine whether or not there was a
relationship between soil moisture content and the yield of a crop plant. This would involve
taking a number of paired measurements, at least 12 to 15 pairs (soil moisture content and yield
per unit area), and investigating the relationship between these two variables. The correlation
coefficient indicates the strength of the relationship between the two variables.

The formula for calculating the Spearman rank correlation coefficient (rs) gives a value
between +1 (perfect positive correlation) and −1 (perfect negative correlation). See page 17.
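
The calculation can be sketched in Python as follows, assuming the formula rs = 1 − 6ΣD² / (n(n² − 1)) that appears later in SAQ 15. The six pairs of values are invented and far fewer than the 12 to 15 pairs recommended above; `average_ranks` and `spearman_rs` are hypothetical helper names.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # rank of the first occurrence of v
        count = ordered.count(v)       # number of tied values
        ranks.append(first + (count - 1) / 2)
    return ranks


def spearman_rs(x, y):
    """rs = 1 - 6 * sum(D^2) / (n * (n^2 - 1)), where D is the difference in ranks."""
    n = len(x)
    rank_x = average_ranks(x)
    rank_y = average_ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical paired measurements (not real data).
soil_moisture = [10, 20, 30, 40, 50, 60]
crop_yield = [2.1, 2.5, 2.4, 3.0, 3.6, 3.3]

rs = spearman_rs(soil_moisture, crop_yield)
print(f"rs = {rs:.3f}")
```

A value of rs close to +1, as here, suggests a strong positive correlation, but it still has to be compared with the critical value before it can be called significant.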

The Chi-squared Test (χ²)

This test (sometimes referred to as a test of goodness of fit) is used to compare observed (O)
results with expected (E) results, to determine whether the difference between the two is
significant. In general, the smaller the value of χ², the closer the observed results are to the
expected results and the more likely it is, therefore, that any differences are due to chance.

The Chi-squared test is particularly useful in genetics investigations, when the expected results
can be calculated using Mendelian ratios (e.g. 3:1 or 9:3:3:1). It has limited use in ecological
investigations and should not be used, for example, to test for differences between
measurements from two sites or populations.
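
A small worked example may help. The Python sketch below assumes the usual formula χ² = Σ (O − E)² / E and a 3:1 Mendelian ratio; the observed counts are invented for illustration, and the critical value of 3.84 is the standard tabulated value for 1 degree of freedom at p = 0.05.

```python
# Hypothetical counts from a monohybrid cross (invented for illustration).
observed = {"tall": 290, "short": 110}
expected_ratio = {"tall": 3, "short": 1}  # Mendelian 3:1 ratio

total = sum(observed.values())
ratio_total = sum(expected_ratio.values())

# Expected number in each class = total x (share of the ratio).
expected = {k: total * expected_ratio[k] / ratio_total for k in observed}

# Chi-squared = sum over classes of (O - E)^2 / E.
chi_squared = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
degrees_of_freedom = len(observed) - 1

CRITICAL_VALUE_5_PERCENT = 3.84  # tabulated value for 1 degree of freedom, p = 0.05

significant = chi_squared >= CRITICAL_VALUE_5_PERCENT
print(f"chi-squared = {chi_squared:.2f}, df = {degrees_of_freedom}, significant: {significant}")
```

Here the calculated value is below the critical value, so the difference between observed and expected numbers is not significant and the 3:1 ratio is not rejected.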

Note: In W2 you are not expected to remember the formulae for
statistical tests, but you need to understand how to interpret the
results.

You may also be expected to suggest an appropriate test to use in a
particular investigation.

The Null Hypothesis (H₀)

In an investigation, there will be a hypothesis (i.e. an intelligent guess or your prediction) and a
corresponding null hypothesis. In the holly leaf example (page 5, SAQ 5), the hypothesis
could be stated as:

There is a significant difference between the mean numbers of spines found on holly leaves
from different heights on a tree.

The null hypothesis (H₀) is:

The difference between the mean numbers of spines found on holly leaves from different
heights on a tree is not significant.

OR

There is no significant difference between the mean numbers of spines on holly
leaves from different heights on a tree.

SAQ 8 Suggest a null hypothesis for an investigation into the relationship between soil
moisture content and crop yield.

H₀: ............

Levels of Significance

In biology, the 5% level of significance (1 in 20, or p = 0.05) is most commonly used. This
means that a particular event is predicted to happen by chance less than once in twenty times.
By referring to statistical tables (page 20), we can compare our calculated (or computed) value
with the tabulated value at 5% (p = 0.05), known as the critical value or the table value.
For example, if we carried out a t-test and found that our calculated value of t is greater than
the tabulated value at 5%, we can say that the difference between the means is significant, and
reject the null hypothesis. Conversely, if the computed value is less than the critical
value, we accept the null hypothesis.

The Mann-Whitney U test gives two values, U1 and U2 (one for each set of data). We
compare the smaller of the two values with the critical value for the number of samples in the
set of data. If the smaller U value is less than or equal to the critical value, we reject the null
hypothesis.
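
The two U values can be worked out from rank sums, as in the Python sketch below. It follows the form of the formula given in SAQ 14 of this handout (U for one sample = nA × nB + nB(nB + 1)/2 − RB, where RB is the sum of ranks of the other sample). The data are invented, and the critical value of 2 is assumed to be the tabulated 5% value for two samples of five; in practice it should be read from a statistical table.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1); tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


# Hypothetical measurements for two unmatched samples (not real data).
sample_a = [12, 15, 17, 19, 21]
sample_b = [8, 9, 11, 13, 14]

# Rank all measurements together, then split the ranks back into the two samples.
ranks = average_ranks(sample_a + sample_b)
n_a, n_b = len(sample_a), len(sample_b)
r_a = sum(ranks[:n_a])   # sum of ranks for sample A
r_b = sum(ranks[n_a:])   # sum of ranks for sample B

u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a

smaller_u = min(u_a, u_b)
CRITICAL_VALUE = 2  # assumed tabulated value for n_a = n_b = 5 at p = 0.05

reject_null = smaller_u <= CRITICAL_VALUE
print(f"U_a = {u_a}, U_b = {u_b}, smaller U = {smaller_u}, reject H0: {reject_null}")
```

A useful check is that the two U values always add up to nA × nB.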

The Spearman rank correlation coefficient (rs) is similarly compared with the critical value
at 5% (p = 0.05). If the value of rs (ignoring the + or − sign) is equal to, or greater than, the
critical value, then we can say that there is a significant correlation between the two variables.

We also interpret the χ² value by reference to statistical tables, comparing the calculated value
with the tabulated critical value at 5% (p = 0.05). If the calculated value is smaller than the
critical value, we can say that the difference between the observed and expected values is not
significant; in other words, the difference could simply be due to chance, and we accept the null
hypothesis.

Degrees of freedom

Statistical tests take into account the number of readings (observations or measurements) that
are made. This affects the critical values, so we need to take the sample size into account
and calculate the number of degrees of freedom.

In the t-test, the number of degrees of freedom = na + nb − 2, where na and nb are the
numbers of measurements in each of the two samples, a and b. For example, if sample a
contained 20 leaves and sample b also contained 20 leaves, then the number of degrees
of freedom, df = 20 + 20 − 2 = 38.

In the Spearman rank correlation, the number of degrees of freedom is n − 1, where n is
the number of pairs of measurements.

In the Chi-squared test, the number of degrees of freedom is n − 1, where n is the number of
pairs of observed and expected values. [When the data are arranged in a table of rows and
columns, df = (r − 1)(c − 1), where r = number of rows and c = number of columns.]

SAQ 9 A survey was carried out to investigate the relationship between the bract length and
the number of stem ridges in the soft rush (Juncus effusus). The results are shown in the table
below.

Bract length / mm     | 155 | 248 | 240 | 200 | 155 | 105 | 135 | 145 | 175 | 90 | 110 | 120
Number of stem ridges | 58  | 53  | 49  | 47  | 46  | 43  | 41  | 41  | 43  | 35 | 37  | 35

(a) State a suitable null hypothesis for this investigation.

(b) State which statistical test would be appropriate to use.

(c) State the number of degrees of freedom.

SAQ 10 The table below shows the masses of carbon dioxide taken in during daylight, and
produced during the dark, by 13 plants kept at a temperature of 30 °C.

Carbon dioxide taken in during the day / mg g⁻¹ h⁻¹ | Carbon dioxide produced in the dark / mg g⁻¹ h⁻¹
1.2 | 2.1
1.5 | 2.0
1.6 | 2.2
1.7 | 2.7
1.7 | 2.4
1.9 | 1.8
2.0 | 2.4
2.0 | 2.0
2.1 | 2.3
2.3 | 2.6
2.5 | 2.7
2.6 | 2.5
2.9 | 3.0

The researchers wished to test the hypothesis that more carbon dioxide is produced in the dark
than is taken in during daylight.
(a) State a suitable null hypothesis for this investigation.

(b) State which statistical test would be appropriate to test the
hypothesis.

(c) State the number of degrees of freedom for this test.

SAQ 11 A new environmentally friendly slug pesticide [slug-gone] has been
developed. During testing, slug-gone was applied to 7 plots of a wheat field. A further 8 plots
were left as a control. The numbers of slug-damaged wheat plants per metre of row in each
plot were recorded. The results are shown in the table below.

Number of slug-damaged plants per metre of row

Without slug-gone | 6.2 | 8.5 | 12.3 | 13.2 | 14.1 | 15.8 | 19.4 | 21.0
With slug-gone    | 3.9 | 8.4 | 9.1  | 11.8 | 12.4 | 14.9 | 18.2 |

(a) Suggest why a Mann-Whitney U test is appropriate to determine
whether slug-gone reduces the number of slug-damaged plants.

(b) State a suitable null hypothesis for this investigation.

(c) The U values for the two sets of data were calculated
and found to be:

U1 = 37

U2 = 19

A statistical table showed that the critical value is 10. What can you conclude from this
investigation? Explain your answer.

SAQ 12 An experiment was carried out to investigate the inheritance of grain shape
and colour in maize. The numbers of each type of grain on a cob were counted (observed
number). The results are shown in the table below.

Phenotype of grain  | Observed number (O) | Expected number (E) | (O − E)² / E
Yellow and Smooth   | 53 | |
Yellow and Wrinkled | 20 | |
White and Smooth    | 17 | |
White and Wrinkled  | 10 | |

It was predicted that the phenotypes would be in the ratio 9 : 3 : 3 : 1 and a Chi-squared (χ²)
test was used to investigate whether there was a significant difference between the observed
results and the expected results.

(a) State the null hypothesis for this investigation.

(b) Complete the table, by calculating the expected number (E) of grains of each
phenotype.

(c) State the number of degrees of freedom.

(d) The value of χ² was calculated and found to be 2.68. The table on page 20 shows
critical values for χ² for the appropriate number of degrees of freedom.

What conclusion can you draw from this investigation? Explain your answer.


STATISTICAL TESTS: THE BASIS FOR PLANNING

Deciding which statistical test to use is probably the most difficult aspect of many student investigations. By considering the statistics
early in the planning stage, some of the problems could be avoided. The flow diagram below can be used to help plan investigations
which incorporate appropriate statistical tests. (Adapted from John Adds, Erica Larkcom, Ruth Miller and Robin Sutton (1999), Tools, Techniques and
Assessment in Biology: A Course Guide for Students and Teachers, p. 107, Nelson.)

SAQ 13 Biology and Human Biology, June 1997 (B6), Question 2 on the t-test (modified).

Leaves which are adapted to low light intensities are known as shade leaves, while those which function more
efficiently in high light intensities are known as sun leaves.

An investigation was carried out to determine whether there was a significant difference in the surface area of
shade and sun leaves of dog's mercury (Mercurialis perennis), a plant which grows in woodland and shady
places.
Seventeen leaves of dog's mercury were collected from plants growing in deep shade (Site A) and
seventeen leaves from plants growing in a clearing open to sunlight (Site B). The surface area of each leaf was
measured. The results are shown in the table below.

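Leaf, n | Deep shade (Site A): XA | XA² | Open clearing (Site B): XB | XB²
1  | 21 | | 15 |
2  | 14 | | 17 |
3  | 16 | | 18 |
4  | 18 | | 17 |
5  | 19 | | 17 |
6  | 21 | | 19 |
7  | 19 | | 13 |
8  | 22 | | 14 |
9  | 18 | | 21 |
10 | 16 | | 13 |
11 | 13 | | 16 |
12 | 22 | | 13 |
13 | 21 | | 16 |
14 | 23 | | 12 |
15 | 19 | | 14 |
16 | 18 | | 12 |
17 | 15 | | 20 |
X̄    | | | |
S.D. | | | |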

a. State a suitable null hypothesis for this investigation.

........

b. i. Calculate XA² and XB² (give all calculations to 2 decimal places).
ii. Calculate the summation of XA, XA², XB and XB².
iii. Calculate the mean of XA, XA², XB and XB².
iv. Calculate the standard deviation using the following formula.

\[ s = \sqrt{\frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}} \]

sA = ........ sB = ........


c. A t-test was carried out to determine whether the difference in the mean surface area was statistically
significant at the 5% level. The formula used for the t-test was:

\[ t = \frac{\left|\bar{X}_A - \bar{X}_B\right|\sqrt{n - 1}}{s} \qquad \text{or} \qquad t = \frac{\left|\bar{X}_A - \bar{X}_B\right|}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}} \]

where s is found by the formula \( s^2 = s_A^2 + s_B^2 \).

Notation.

o When comparing two sets of data, X̄A and X̄B are their two respective mean values.

o The vertical lines indicate that the positive difference between the two means should be taken,
irrespective of which is bigger.

o s is the symbol for standard deviation. It is a measure of dispersion around the mean value. The squares of
these values, sA² and sB², are the variances of the two sets of data and are measures of the spread of data
around each mean.

o n is the number of observations. There are nA (i.e. 17) entries in site A and nB (i.e. 17) entries in site B.

i. Using your values from question b, calculate the value of t, using either the first or the second of the
two formulas given above.

Answer (first formula): t = ........ Answer (second formula): t = ........
d. State the number of degrees of freedom.

e. Compare your calculated t value with the critical value given in the table on page 20. What does this indicate
about the difference in mean surface area between leaves from site A and leaves from site B?

f. Suggest one reason (i.e., the biological significance) for the difference in surface areas between leaves
from the two sites. Explain your answer.


SAQ 14.

The mean cell volume (MCV) of a single red blood cell is measured in femtolitres (10⁻¹⁵ litre). The packed cell
volume (PCV) of the blood is the volume of red cells in a known volume of blood after a standard centrifugation.
The mean cell volume, MCV, can be calculated using the equation below.

\[ \mathrm{MCV} = \frac{\mathrm{PCV}}{\text{RBC count}} \]
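
As an illustration of the unit conversion involved, the Python sketch below applies MCV = PCV / RBC count to invented values (a PCV of 0.40 litres per litre and a count of 4.0 × 10¹² cells per litre). The function name is hypothetical, and the figures are not those of the question that follows.

```python
FEMTOLITRES_PER_LITRE = 1e15  # 1 femtolitre = 10^-15 litre


def mean_cell_volume_fl(pcv_litres_per_litre, rbc_count_per_litre):
    """MCV = PCV / RBC count, converted from litres to femtolitres."""
    mcv_litres = pcv_litres_per_litre / rbc_count_per_litre
    return mcv_litres * FEMTOLITRES_PER_LITRE


# Hypothetical values, chosen only to illustrate the calculation.
mcv = mean_cell_volume_fl(0.40, 4.0e12)
print(f"MCV = {mcv:.1f} fL")
```

Dividing litres of cells per litre of blood by cells per litre of blood leaves litres per cell, which is then scaled to femtolitres.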

a. A person had a PCV of 0.45 litres per litre of blood and a red blood cell count of 5.0 × 10¹² cells per litre of
blood. Calculate the mean cell volume in femtolitres. Show your working.

Answer :

b. A group of students expressed the view that people suffering from iron deficiency anaemia may have red
blood cells of lower volume than normal healthy people. (Anaemia is a condition in which a reduced quantity of
haemoglobin reduces the ability of the blood to carry oxygen.)

The students selected two independent samples of people. Sample A contained nine people who were
apparently healthy and Sample B contained nine people who had just been diagnosed as suffering from iron
deficiency anaemia, but had not begun their treatment.

Suggest three factors that need to be controlled when selecting the volunteers in this investigation.

1.

2.

3.

b. Samples of blood were taken from each person and the mean cell volumes were calculated. The table
below shows the MCV of nine people in each of the two samples.

Sample             | Mean cell volume / femtolitres
Sample A (healthy) | 87 | 86 | 84 | 82 | 90 | 92 | 92 | 86 | 90
Sample B (anaemic) | 85 | 81 | 82 | 79 | 85 | 83 | 78 | 78 | 80

i. The data were analysed using a Mann-Whitney U test to test the hypothesis which the students formulated.
Suggest a suitable null hypothesis for this investigation.

ii. Explain why a Mann-Whitney U test was chosen to test the hypothesis.

c. The median value in the healthy sample (A) is 87 femtolitres. Find the median value for the anaemic
population and comment on the difference between the two median values.

d. Arrange the data from the table in order, in a form suitable for analysis using a Mann-Whitney U test.

Ranked data | Rank before adjustment | Rank after adjustment | Health condition (H = healthy, S = sufferer) | Rank for healthy | Rank for sufferer

---- ---- ----

e. Calculate the value of U using the following formulas.

\[ U_H = n_H n_S + \frac{n_S (n_S + 1)}{2} - R_S \qquad U_S = n_H n_S + \frac{n_H (n_H + 1)}{2} - R_H \]

UH = ........ US = ........

Where,
nH = size of sample H = ........, RH = sum of ranks for sample H = ........

nS = size of sample S = ........, RS = sum of ranks for sample S = ........

f. Which value of U would you take to determine the significance of these results? ........

g. Do the results enable you to accept or reject the null hypothesis? Explain your answer.

SAQ 15. Spearman Rank Correlation. (Adapted from Maths for Advanced Biology by Alan Cadogan and Robin Sutton, Chap. 5, p. 29.)

An investigation was carried out to study the effect of flow rate on the density of stream invertebrates. A group of
students expressed the view that an increase in flow rate increases the number of invertebrates in the stream.
Twelve sites were chosen randomly and the flow rate of the stream was determined at each. Using a 0.25 m² quadrat, the
density of the invertebrates at each site was determined. The results of the investigation and the graphical
representation of the relationship between the flow rate and the invertebrate density at each site are shown
below.

Compare the scattergram with the three graphs below. Graph a shows that for every increase in flow rate there
is an increase in the number of invertebrates, i.e., there is a perfect positive correlation. In graph c, for every
increase in flow rate, there is a decrease in the number of invertebrates, i.e., there is a perfect negative
correlation. In graph b, there is a random scattering of points, i.e., there is no correlation.


a. Rearrange the flow rate and the corresponding invertebrate density in increasing order and then rank the
data for each variable.

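Site | Flow rate / m s⁻¹ | Rank after adjustment / R1 | Animals per quadrat | Rank / R2 | D = (R1 − R2) | D²
1  | | | | | |
2  | | | | | |
3  | | | | | |
4  | | | | | |
5  | | | | | |
6  | | | | | |
7  | | | | | |
8  | | | | | |
9  | | | | | |
10 | | | | | |
11 | | | | | |
12 | | | | | |
---- ---- ---- ----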

b. Calculate the value of rs using the following formula, where n = number of pairs of observations.

\[ r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} \]

rs = ........

c. Compare the value of rs against the critical value for the appropriate number of pairs of measurements.
(Refer to page 20.)
What conclusion can you draw from the investigation? Explain your answer.

.......

.......


STATISTICAL TABLES
