
# UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS

## General Certificate of Education Ordinary Level


4040/12

STATISTICS
Paper 1

October/November 2013
2 hours 15 minutes

Candidates answer on the question paper.

Additional Materials: Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use a soft pencil for any diagrams or graphs.
Do not use staples, paper clips, highlighters, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 19 printed pages and 1 blank page.

DC (NH/CGW) 66916/2
UCLES 2013

Section A [36 marks]
Answer all of the questions 1 to 6.

1 Seven statistical measures are mean, median, mode, range, interquartile range, variance and standard deviation.
In each of the following situations, one of these measures is to be found by the person
described. State the appropriate measure in each case.

(i) A doctor finds the most common age of her patients.
................................................... [1]

(ii) An athlete who competes in the 100 metres sprint finds the difference between his
slowest and quickest practice times.
................................................... [1]

(iii) A graduate who seeks employment with a company finds a measure of central tendency
for the salaries of the company's employees. The company has twenty employees, of
whom three are managers earning salaries very much higher than the other employees.
................................................... [1]

(iv) A teacher finds a measure of dispersion for the scores of her pupils in a test, in which no
pupil scored an exceptionally high mark, and no pupil scored an exceptionally low mark.
................................................... [1]

(v) A biologist finds a measure of dispersion for the growth of twelve plants over a period of
three months. Two plants have been attacked by insects and have grown very much less
than the others.
................................................... [1]

(vi) A sociologist finds a measure of central tendency for the first names given to the male
babies born in a hospital over a period of six months.
................................................... [1]

UCLES 2013

4040/12/O/N/13

2 A large keep fit class for women is held at a sports club once every week. The manager of
the club asks the class instructor to select a sample of size 10 from the class.

(i) State the method of sampling used if the instructor decides to select
(a) the first 10 women to arrive at the class,
................................................... [1]
(b) women at regular intervals from the class register.
................................................... [1]

The sample is required to obtain responses to a proposal to change the time of the class from
Monday evening to Monday afternoon. For class members the only items of data presently
available to the instructor are name and age.
(ii) State, and justify, two other items of data relating to class members which the instructor
needs to know when selecting the sample in order to avoid bias in responses. You are
not required to describe how the sample is selected.
..........................................................................................................................................
..........................................................................................................................................
..........................................................................................................................................
..........................................................................................................................................
..........................................................................................................................................
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [4]

3 In a photographic equipment store a record was kept of the number of cameras sold each
day. The values, for eleven consecutive days, were as follows.
6

For these values find
(i) the mode,
................................................... [1]
(ii) the mean, correct to one decimal place,
................................................... [2]
(iii) the median.
................................................... [2]
The values recorded for the next three days were x, x + 1 and x + 2.
(iv) If the median for the entire fourteen-day period was the same as the median for the first
eleven days, find x.
x = .................................................. [1]

4 The diagram below shows the number of actors at a film festival who have worked in one or
more of the cities Mumbai, Los Angeles and Rome.

[Venn diagram with three sets labelled Mumbai, Los Angeles and Rome. Only the region values 5, 13, 4 and 3 survive in this copy; their positions are not recoverable.]

(i) Find the number of actors who have worked in Mumbai.
................................................... [1]
(ii) Interpret the value 6 in the diagram.
..........................................................................................................................................
...................................................................................................................................... [1]

A journalist selects one of these actors at random for interview.
Find the probability of selecting an actor who has worked in
(iii) Mumbai or Los Angeles or both,
................................................... [2]
(iv) Los Angeles and Rome,
................................................... [1]
(v) Rome, given that the actor has worked in Mumbai and Los Angeles.
................................................... [1]

5 In this question you are not required to draw any charts.

A charity, Camfam, classifies the income it receives under the headings Special Events,
Donations, Grants, and Other Sources. In Camfam's report for 2010, the following percentage
bar chart was given, which represents a total income of \$80 million.

[Percentage bar chart for 2010. Horizontal axis: Percentage, marked from 10 to 100 in steps of 10. Key: Special Events, Donations, Grants, Other Sources. The segment boundaries are not recoverable from this copy.]

(i) Find the income which Camfam received in 2010 from Grants.
\$ ................................................... [2]
(ii) If a pie chart were to be drawn to represent this information, find the angle which would
represent the sector for Special Events.
................................................... [2]
Camfam's total income in 2011 was \$60 million.
Two pie charts, one for 2010 and one for 2011, are to be presented together in a new report.
(iii) Find, in its simplest terms, the ratio of the area of the chart representing 2010 to the
area of the chart representing 2011.
................................................... [2]

6 The following table is to show the distance, in kilometres, between any two of the five towns
A, B, C, D and E.

[Partially completed triangular distance table for towns A to E. Entries surviving in this copy, in reading order: A; 42; B; C; 20, 39; 36, 18; 25.]

(i) Complete the table using the following information.
(a) The distance between B and C is 10 km more than the distance between D and E. [1]
(b) The distance between A and C is two thirds of the distance between A and E. [1]
(c) The distance between A and B is twice the distance between C and E. [1]
(d) C is 19 km further from D than B is from E. [1]

Dimitri lives in town A, but has one friend in each of the towns D and E. He makes a journey
in which he leaves his home, visits each of these friends once, and then returns home.
(ii) Find the distance which Dimitri travels to complete the journey.
............................................. km [2]

Section B [64 marks]

Answer not more than four of the questions 7 to 11.

Each question in this section carries 16 marks.

7 In this question the fertility rate of a population is defined as the number of births per 1000
females.
The table below gives information about the female population and age group fertility rates in
a particular city for the year 2012, together with the standard population of the area in which
the city is situated.

| Age group of females | Population of females in age group | Age group fertility rate | Standard population of females (%) | Births |
|---|---|---|---|---|
| Under 20 | 2900 | 50 | 18 | |
| 20 – 29 | 4500 | 184 | 22 | |
| 30 – 39 | 5250 | 136 | 25 | |
| Over 39 | 5800 | 15 | 35 | |

(i) Calculate, to 1 decimal place, the standardised fertility rate for the city.
................................................... [4]
(ii) Calculate the number of births for each age group and insert the values in the table
above.
[2]

(iii) Calculate, to 1 decimal place, the crude fertility rate for the city.
................................................... [4]
There are equal numbers of males and females in the city and in the standard population.
The standardised and crude death rates for the city in 2012 were 8.5 and 7.8 per thousand
of the population respectively.
(iv) Using one of these values, and any other appropriate values from parts (i), (ii) and (iii),
find the increase in the population of the city in 2012 due to births and deaths.
................................................... [5]
It is not possible to obtain an accurate measure of population increase or decrease in a city
from information on births and deaths alone.
(v) State what additional information is required.
..........................................................................................................................................
...................................................................................................................................... [1]

8 In a large residential building there are 120 apartments, of which 50 are private apartments
(owned by the residents) and 70 are company apartments (owned by the company which
constructed the building).
If two apartments are chosen at random, find the probability of choosing
(i) two private apartments,
................................................... [2]
(ii) at least one company apartment.
................................................... [2]
The weekly rents, in dollars, charged on the company apartments are represented in the
histogram below, from which one rectangle, representing the \$400 to under \$500 class, has
been omitted.
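
[Histogram. Vertical axis: Number of apartments per \$50, with surviving scale marks 10, 15, 20 and 25. Horizontal axis: Weekly rent (\$), marked from 200 to 500 in steps of 50. The heights of the rectangles are not recoverable from this copy.]
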
Use the histogram to find the number of company apartments for which the weekly rent was
(iii) from \$250 to under \$400,
................................................... [2]
(iv) from \$225 to under \$250.
................................................... [2]
There were 10 company apartments for which the weekly rent was from \$400 to under \$500.
(v) Complete the histogram by drawing on the grid the rectangle representing
the \$400 to under \$500 class.
[1]
(vi) Write down the term used to describe the \$300 to under \$350 class.
................................................... [1]

The private apartments are of three different sizes. There are 24 apartments with three
rooms, 14 with four rooms, and 12 with five rooms.
A safety expert, conducting a survey on the use of smoke detectors, chooses three private
apartments at random.
(vii) If the apartments chosen have 12 rooms in total, find the probability that the apartments
are all of the same size.
................................................... [6]
9 The mid-day temperature at a particular location in a city was measured every day throughout
the year 2010. The following table summarises the results obtained.

| Temperature (°C) | Number of days | Cumulative frequency |
|---|---|---|
| 0 – under 5 | [value missing in this copy] | |
| 5 – under 10 | 25 | |
| 10 – under 15 | 52 | |
| 15 – under 20 | 81 | |
| 20 – under 25 | 79 | |
| 25 – under 30 | 68 | |
| 30 – under 35 | 37 | |
| 35 – under 40 | 15 | |

(i) Complete the cumulative frequency column in the above table.
[2]

(ii) Plot the cumulative frequencies on the grid opposite, joining the points by a smooth
curve.
[3]

(iii) Use your graph to estimate
(a) the median of these temperatures,
.............................................. °C [1]
(b) the interquartile range of these temperatures.
.............................................. °C [4]


[Grid for the cumulative frequency graph. Vertical axis: Cumulative frequency (days), marked from 50 to 400 in steps of 50. Horizontal axis: Temperature (°C), marked from 10 to 40 in steps of 5.]

When the results were obtained, a scientist predicted that, because of climate change,
temperatures in the city would increase at the rate of 0.5 °C every ten years.
Assume that this prediction is accurate.
For this particular location,
(iv) use your answers to part (iii) to estimate, for the year 2050,
(a) the median of the mid-day temperatures,
........................................ °C [2]
(b) the interquartile range of the mid-day temperatures,
........................................ °C [1]
(v) use your graph to estimate, for the period 2010 to 2050, the increase in the number of
days with a mid-day temperature of more than 36 °C.
................................................... [3]

10 Emilie, a student teacher, conducted research on the number of pupils and the number
of teachers in the schools in the town of Astra, where she lives. The schools supplied the
following data.
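
[The school labels did not survive in this copy; the columns below are in the original order.]

| School | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Number of pupils, x | 760 | 1219 | 927 | 470 | 1361 | 628 | 381 | 1085 |
| Number of teachers, y | 29 | 44 | 33 | 34 | 52 | 24 | 16 | 40 |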

(i) Plot these data on the grid below.

[Grid. Vertical axis: y, Number of teachers, marked from 10 to 60 in steps of 10. Horizontal axis: x, Number of pupils, marked from 200 to 1400 in steps of 200.]
[2]

The data have an overall mean of (853.875, 34) and an upper semi-average of (1148, 42.25).
(ii)
[2]
(iii) Find the lower semi-average.
................................................... [2]
(iv) Without plotting the averages, and without drawing the line, find the equation of the
line of best fit in the form y = mx + c.
................................................... [3]
(v) Explain briefly why the value of c which you have found in part (iv) might give you
cause for concern.
..........................................................................................................................................
...................................................................................................................................... [1]

Emilie discovered later that the data supplied by one of the schools gave, incorrectly, the
total number of people employed by the school, and not the number of teachers.
(vi) Ignoring the point representing the school which supplied incorrect data, draw, by eye,
on the grid in part (i), a line of best fit through the remaining seven points.
[1]

(vii) Use the line you have drawn in part (vi) to find its equation in the form y = mx + c.
................................................... [3]

Emilie repeated the research for schools in the nearby town of Belport, for which she found
the equation of the line of best fit to be y = 0.0431x + 1.72.
(viii) Using this equation, and your answer to part (vii), state in which of the two towns a pupil
might choose to be educated, if free to choose. Explain your answer briefly.
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [2]

11 (i) Give one advantage and one disadvantage of forming a large set of data into a grouped
frequency distribution.
..........................................................................................................................................
...................................................................................................................................... [2]
The presenter of a radio programme, in which recordings of popular songs are played, plans
his programme. For each song chosen he writes down the song length, in terms of time, in
minutes, taken to play the song. The following table summarises the song lengths.

[Table with columns Song length (minutes) and Number of songs. Only the class 4.2 – under 4.6 survives in this copy.]

(ii) Estimate, in minutes, the mean and standard deviation of these song lengths. Give your

Mean = ......................................................
Standard deviation = .................................................. [8]
Information about five of the presenter's earlier programmes is shown below.

| Programme | Number of songs played | Mean of song lengths (minutes) | Standard deviation of song lengths (minutes) |
|---|---|---|---|
| P | 38 | 3.70 | 0.339 |
| Q | 39 | 3.52 | 0.328 |
| R | 42 | 3.69 | 0.294 |
| S | 37 | 3.83 | 0.305 |
| T | 38 | 3.74 | 0.291 |

(iii) State in which of the programmes P, Q, R, S or T, songs were generally
(a) shortest in length,
................................................... [1]
(b) most similar in length.
................................................... [1]

All the presenter's programmes are three hours in duration. Songs are not played continuously
throughout each programme; for some of the time the presenter talks about the songs and
the singers.
A listener switched on programme P at a random time during its transmission.
(iv) Find the probability that a song was not being played at that moment.
................................................... [4]

Permission to reproduce items where third-party owned material protected by copyright is included has been sought and cleared where possible. Every
reasonable effort has been made by the publisher (UCLES) to trace copyright holders, but if any items requiring clearance have unwittingly been included, the
publisher will be pleased to make amends at the earliest possible opportunity.
University of Cambridge International Examinations is part of the Cambridge Assessment Group. Cambridge Assessment is the brand name of University of
Cambridge Local Examinations Syndicate (UCLES), which is itself a department of the University of Cambridge.


# UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS

## General Certificate of Education Ordinary Level


4040/13

STATISTICS
Paper 1

October/November 2013
2 hours 15 minutes

Candidates answer on the question paper.

Additional Materials: Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use a soft pencil for any diagrams or graphs.
Do not use staples, paper clips, highlighters, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 19 printed pages and 1 blank page.

DC (CW/CGW) 66919/2
UCLES 2013

Section A [36 marks]
Answer all of the questions 1 to 6.

1 A survey was carried out to discover whether the quantity of traffic on a busy road was
sufficient to justify the installation of a pedestrian crossing. At intervals throughout one day
an investigator recorded the number of vehicles passing the proposed location in periods of
30 seconds' duration.
The numbers he recorded were:
12 51 64 55 51 61 31 22 20 15 34 14 69 35
When his record sheet was examined the number shown here as was illegible, but it was
certainly a single-digit number.
Although this number is unknown, name, but do not calculate,
(i) two measures of central tendency (average) which can still be found,
.......................................................
................................................... [2]

(ii) one measure of dispersion which can still be found,
................................................... [1]

(iii) one measure of central tendency (average) which cannot be found,
................................................... [1]

(iv) two measures of dispersion which cannot be found.
.......................................................
................................................... [2]

UCLES 2013

4040/13/O/N/13

2 The pie chart below illustrates the distribution by location of the total net profit of \$787 million
earned by an international company in the year 2011.

[Pie chart with four sectors labelled Asia, North America, Rest of the World and Europe. The sector sizes are not recoverable from this copy.]

(i) Measure, to the nearest degree, the sector angles of the pie chart, and insert them in
the appropriate places on the chart.
[2]

(ii) Calculate, to the nearest \$million, the net profit of the company in Asia.
\$ ...................................... million [1]

(iii) Measure and state the radius, in centimetres, of the above pie chart.
............................................. cm [1]
The total net profit of the same company in the year 2005 was \$523 million.

(iv) Calculate, correct to 2 significant figures, the radius, in centimetres, of the comparable
pie chart for 2005.
............................................ cm [2]

3 A factory employs both male and female staff in each of the three categories managerial,
inspection and production.
There are altogether 3500 employees, of whom 2150 are male. There are a total of 660
managerial staff, 540 male inspection staff and 785 female production staff.
(i) Insert these values in the appropriate places in the following table.

| | Managerial | Inspection | Production | TOTAL |
|---|---|---|---|---|
| Male | | | | |
| Female | | | | |
| TOTAL | | | | |

[1]
Two thirds of the managerial staff are female.
(ii) Use this further information to complete the table.
[5]

4 There are 50 girls in their final year at a school. The diagram below illustrates the number of
the girls who play each of the sports badminton, volleyball and handball.

[Venn diagram of the three sports. The labels Volleyball and Handball and the region values 6, 11, 9, 8 and 3 survive in this copy, but their positions and the region marked x do not.]

(i) Calculate the value of x, and state what it represents.
x = ......................................................
..........................................................................................................................................
...................................................................................................................................... [2]
(ii) Find
(a) how many more girls play volleyball than play handball,
................................................... [1]
(b) how many more girls play exactly two sports than play exactly one sport.
................................................... [1]
Half of the girls who play volleyball and two thirds of the girls who play only handball say they
intend to continue playing sport after they have left school.
(iii) Find the number of girls who intend to continue playing sport after they have left school.
................................................... [2]
5 In answering this question you are not required to draw a histogram.

The times taken, in minutes, by 174 people to complete an aptitude test are summarised in
the following table.

| Time (minutes) | Number of people | Height of rectangle (units) |
|---|---|---|
| 10 – under 30 | 28 | |
| 30 – under 40 | 36 | 18 |
| 40 – under 45 | 40 | |
| 45 – under 50 | 32 | |
| 50 – under 75 | 20 | |
| 75 – under 120 | 18 | |
| TOTAL | 174 | |

The times are to be illustrated by a histogram, in which the 30 – under 40 class is represented
by a rectangle of height 18 units.
(i) Calculate the height of the rectangle representing the 40 – under 45 class, and insert
the value in the table.
[1]
(ii) Calculate the heights of the rectangles representing the remaining four classes, and
insert the values in the table.
[3]
(iii) If the final two classes were combined into a single 50 – under 120 class, calculate, to
2 decimal places, the height of the rectangle which would represent the combined class.
................................................... [2]
6 (a) (i) Describe the situation which can lead to the method of systematic sampling
producing a biased sample.
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [1]
(ii) There are 380 students at a college. It is proposed to take a systematic sample of
20 of the students. Explain briefly how this could be achieved.
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [3]

(b) Briefly explain how a population could be stratified, prior to taking a stratified sample, in
order to ascertain the views of members of the public on
(i) a proposed increase in the tax on tobacco products,
..................................................................................................................................
.............................................................................................................................. [1]

(ii) aircraft noise.
..................................................................................................................................
.............................................................................................................................. [1]

Section B [64 marks]

Answer not more than four of the questions 7 to 11.

Each question in this section carries 16 marks.

7 (a) A test for a particular disease has a 95% chance of correctly giving a positive result for a
person who has the disease, but a 10% chance of incorrectly giving a positive result for
a person who does not have the disease.
(i) Find the chance that the test gives a negative result for a person who has the
disease, and insert it in the following table.

[Table with a column headed Person has the disease and a second column whose heading survives only as "have the disease". The row P(test result positive) holds 0.95 in the first column. The remaining rows and cells are not recoverable from this copy.]

[1]
(ii) Complete the table.
[1]

15% of the people who are tested are believed to have the disease.
A person is chosen at random and tested.
(iii) Calculate the probability that the test gives a correct result for this person.
................................................... [4]

(b) Give all probabilities in this part of the question as fractions.
The following diagram classifies the members of a tennis club as to whether they are
male or female, left-handed or right-handed, and whether or not they have represented
the club in matches.

[Diagram classifying members as Male or Female, Left-handed or Right-handed, and as having represented the club or not. Apart from a single value 1, the member counts are not recoverable from this copy.]

A member of the club is chosen at random.
(i) Calculate the probability that this member has represented the club in matches.
................................................... [1]
A female member is chosen at random.
(ii) Calculate the probability that she is right-handed.
................................................... [2]
A member who has represented the club in matches is chosen at random.
(iii) Calculate the probability that this member is left-handed.
................................................... [2]

(c) Laura walks to school. On her route she passes two shops, A and B. The probability that
she will go into shop A on any morning is 0.2, and into shop B is 0.7.
Her decision of whether to go into one of the shops is independent of whether she goes
into the other shop. If she goes into either or both shops the probability that she will be
late for school is 0.09.
(i) Calculate the probability that on any morning she will go into exactly one shop and
be late for school.
................................................... [3]
Laura has been told that she must aim to be late on no more than 5% of the schooldays
on which she goes into exactly one shop.
(ii) State, with a reason, whether she is likely to achieve this target.
..................................................................................................................................
.............................................................................................................................. [2]

8 (a) The table below summarises information about the number of GCE O Level subjects
passed by different numbers of pupils at a school in the year 2011.

[Table with rows Number of subjects (x), Number of pupils (frequency) and Cumulative frequency. Frequency entries surviving in this copy: 12, 15, 10. Cumulative frequency entries surviving: 10, 22, 37, 47, 54. The other entries are not recoverable.]

An appropriate cumulative frequency graph is to be drawn to represent these data.
(i) On the grid below, draw and label two axes, the horizontal axis representing the
number of subjects passed and the vertical axis representing cumulative frequency.
[2]

(ii) Draw an appropriate cumulative frequency graph to represent these data.
[4]

UCLES 2013

4040/13/O/N/13

[Turn over

12
(b) The cumulative frequency graph below illustrates the lengths of journey times, in
minutes, to their homes of a number of students at a college at the end of one particular
day.
[Cumulative frequency graph: cumulative frequency (vertical axis, marked 40 to 200) against journey time in minutes (horizontal axis, marked 10 to 60)]

Use the graph to estimate

(i) the median journey time,

.................................... minutes [1]

(ii) the lower quartile time,

..................................... minutes [1]

(iii)

..................................... minutes [1]

(iv) the number of students whose journey time was longer than 23 minutes,

................................................... [3]

(v) the percentile corresponding to a journey time of 17 minutes.

................................................... [2]

On the next day, due to bad weather, the journey time of all students was 5 minutes
longer than the original times illustrated in the graph.
Compared with the original times, state, without further calculation, the effect which the

(vi) the upper quartile journey time,

.............................................................................................................................. [1]

(vii) the interquartile range of journey times.

.............................................................................................................................. [1]

9 The following table gives information about the populations and deaths in two towns, A and
B, during the course of one year, together with the standard population of the area in which
both towns are situated.

| Age | Town A population | Town A deaths | Town A death rate (per thousand) | Town B population | Town B deaths | Standard population |
|---|---|---|---|---|---|---|
| 0 – under 15 | 5000 | 45 | p = | 6000 | 66 | 400 |
| 15 – under 45 | 3750 | 15 | 4 | 27000 | 54 | 300 |
| 45 – under 65 | 2500 | 25 | 10 | 15000 | 60 | 200 |
| 65 and over | 1250 | q = | 32 | 2000 | 30 | 100 |

(i) For town A, calculate the values of p and of q and insert them in the table.

[2]

(ii) Calculate the crude death rate of town B.

................................................... [4]

(iii) Calculate the standardised death rate of town B.

................................................... [4]

(iv) Use the population figures given in the table to state why the crude death rate and the
standardised death rate of town A are equal.
..........................................................................................................................................
...................................................................................................................................... [2]

The table shows that far more deaths occurred in town B than in town A during the year,
and yet the standardised death rate for town B is much lower than that for town A.
(v) Give two reasons why this situation has occurred.

..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [2]

It was subsequently discovered that a small number of inhabitants of town B, none of
whom had died during the year, had been misclassified by being included incorrectly in
the '45 – under 65' class, when in fact they were all '65 and over'.

(vi) State, with a reason, the effect, if any, which correcting this error would have on the
crude death rate of town B.
..................................................................................................................................
.............................................................................................................................. [2]

10 The time, in minutes, taken by each of 6 children to walk 1 kilometre, is given in the following
table.

Child

13

15

12

12

23

25

11

18

23

(i) Plot these data on the grid below.

[Grid: vertical axis y, Time (minutes), marked 10 to 30; horizontal axis x, Age (years), marked 10 to 16]
[2]
(ii) Calculate the overall mean and the two semi-averages of the data, and plot them on

[5]
(iii) Use your plotted averages to draw a line of best fit.

[1]

(iv) Using any valid method, obtain the equation of your line of best fit, and write it in the
form y = mx + c.

................................................... [3]
(v) Use your equation to estimate, to the nearest minute, the time taken to walk 1 kilometre
by a child aged 14 years.

................................................... [1]

(vi) (a) Comment on how well your line of best fit matches the data points.
..................................................................................................................................
.............................................................................................................................. [1]

(b) From the graph, identify the child for whom your line of best fit most overestimates
the time taken.
................................................... [1]

(vii) State, with a reason, whether it would be valid to use your line of best fit to estimate the
time taken to walk 1 kilometre by a person whose age is outside the range of values
given in the table.
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [2]

11 The following table summarises the increase, in dollars, of the annual income of a sample of
200 people between the years 2006 and 2011 (a negative value indicates a decrease).

| Increase in annual income (\$x) | Frequency (f) | m | $y = \frac{m - 750}{250}$ | fy | fy² |
|---|---|---|---|---|---|
| −2500 – under 0 | 14 | | | | |
| 0 – under 1500 | 99 | | | | |
| | 39 | | | | |
| | 25 | | | | |
| 5000 – under 10000 | 23 | | | | |
| TOTAL | 200 | | | | |

(i) Obtain the mid-point, m, for each of the five classes and insert the values in the table.
[1]

(ii) For each class, obtain the value of the scaled variable, y, where
$y = \frac{m - 750}{250}$,
and insert the values of y in the table.

[2]

(iii) Obtain the values of fy and fy² and use them to estimate the values of the mean of y
and the variance of y.

Mean = ......................................................
Variance = .................................................. [7]

(iv) Use your results from part (iii) to estimate

(a) the mean of x,

................................................... [2]

(b) the variance of x.

................................................... [3]

(v) State the units of the variance of x.

................................................... [1]
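
The step from the scaled variable back to x in part (iv) can be illustrated with a short sketch. It assumes the scaling in part (ii) reads y = (m − 750)/250, so that x = 250y + 750 at the class mid-points. The mid-points and frequencies below are hypothetical, not the exam data, and `grouped_mean_var` is an illustrative helper.

```python
# Hypothetical grouped data (NOT the table above): mid-points m and frequencies f.
mids = [250, 750, 1250, 1750]
freqs = [3, 5, 8, 4]

def grouped_mean_var(values, freqs):
    """Mean and variance of grouped data from sum(f*v) and sum(f*v^2)."""
    n = sum(freqs)
    s1 = sum(f * v for f, v in zip(freqs, values))
    s2 = sum(f * v * v for f, v in zip(freqs, values))
    mean = s1 / n
    return mean, s2 / n - mean ** 2

# Scaled variable, assuming y = (m - 750) / 250.
ys = [(m - 750) / 250 for m in mids]
mean_y, var_y = grouped_mean_var(ys, freqs)

# Undo the scaling: the mean is scaled and shifted,
# the variance is scaled by 250**2 and unaffected by the shift.
mean_x = 250 * mean_y + 750
var_x = 250 ** 2 * var_y

# Cross-check against a direct calculation on the unscaled mid-points.
direct_mean, direct_var = grouped_mean_var(mids, freqs)
print(mean_x, direct_mean)
print(var_x, direct_var)
```

The two routes agree, which is why working with the smaller y values loses nothing.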



## UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS

General Certificate of Education Ordinary Level

* 9 5 0 8 8 4 8 7 2 6 *

4040/22

STATISTICS
Paper 2

October/November 2013
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use a soft pencil for any diagrams or graphs.
Do not use staples, paper clips, highlighters, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.

Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 20 printed pages.

DC (RW/CGW) 66922/2
UCLES 2013

Section A [36 marks]

Answer all of the questions 1 to 6.

Events A, B, C and D are four of the possible outcomes of an experiment such that
P(A) = 0.15, P(B) = 0.2, P(C) = 0.4 and P(D) = 0.24.

(i) If events A and B are independent, find

(a) P(A B),

................................................... [2]
(b) P(A B).

................................................... [2]

(ii) If events C and D are mutually exclusive, find

(a) P(C D),

................................................... [1]
(b) P(C D).

................................................... [1]


4040/22/O/N/13

2 (i) The annual salaries of the employees at a company have a mean of \$m and a standard
deviation of \$s, where s > 0.
A new employee arrives at the company and is paid an annual salary of \$m.
The mean and standard deviation of the salaries of the employees are now recalculated
to include the salary of the new employee.

For each of the mean and the standard deviation, state whether it will increase,
decrease, or stay the same when this new employee's salary is included.
Mean .......................................................
Standard deviation ................................................... [2]
(ii) At another company, at the end of 2011, the employees' annual salaries had a mean of
\$12 000 and a standard deviation of \$1000.
During 2012, each of the employees' salaries increased by 5%. At the end of that year
they each also received an annual bonus of \$200.
Calculate the mean and standard deviation of the annual incomes (salaries plus
bonuses) of the employees at the end of 2012.

Mean \$ .......................................................
Standard deviation \$ ................................................... [4]
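
The behaviour of the mean and standard deviation under a percentage rise plus a flat bonus can be checked with a small sketch. The salary list is hypothetical and is not taken from the question. It shows only the general rule for a transformation of the form 1.05 × salary + 200.

```python
from statistics import mean, pstdev

# Hypothetical salaries in dollars (illustrative only, not from the question).
salaries = [10500, 11200, 12000, 12800, 13500]

# Apply a 5% rise followed by a flat 200-dollar bonus to every salary.
incomes = [1.05 * s + 200 for s in salaries]

# The mean is scaled and shifted; the standard deviation is only scaled.
print(mean(incomes), 1.05 * mean(salaries) + 200)
print(pstdev(incomes), 1.05 * pstdev(salaries))
```

Adding the same bonus to everyone moves every value by the same amount, so the spread is unchanged by that step.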

3 Ariana and Bella are playing a game.
They each have 4 cards, which are numbered 1, 2, 3 and 4.
Each shuffles her own cards and turns one over at random.

(i) If the cards show the same number, Ariana wins and Bella must pay Ariana \$3.
If the cards show different numbers, Bella wins and Ariana must pay Bella \$1.
By finding the probabilities of Ariana and Bella winning, show whether or not the game
is fair.

[3]

(ii) In a second game the numbers shown on the cards are added together.
If the total is 4 or less, Ariana wins and Bella must pay Ariana \$5.
If the total is 5 or more, Bella wins.
If the game is to be fair, how much should Ariana pay Bella if Bella wins?

\$ ................................................... [3]
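
The two games can be explored by listing the 16 equally likely pairs of cards. The sketch below assumes that "fair" means each player's expected gain per game is zero. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

cards = [1, 2, 3, 4]
outcomes = list(product(cards, cards))   # 16 equally likely (Ariana, Bella) pairs

# First game: Ariana wins when the two cards match.
p_match = Fraction(sum(a == b for a, b in outcomes), len(outcomes))
# Expected gain to Ariana per game: +3 on a match, -1 otherwise.
ariana_gain_game1 = 3 * p_match - 1 * (1 - p_match)

# Second game: Ariana wins when the total is 4 or less.
p_low = Fraction(sum(a + b <= 4 for a, b in outcomes), len(outcomes))
# Payment from Ariana to Bella that makes Ariana's expected gain zero.
fair_payment = 5 * p_low / (1 - p_low)

print(p_match, ariana_gain_game1, p_low, fair_payment)
```

Using `Fraction` keeps the probabilities exact, which avoids rounding when testing whether an expected gain is exactly zero.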

4 The pupils in a class should arrive for registration at 9.00 am. On one particular day, 25
pupils were early, with a mean arrival time of 8.51 am. On the same day, 9 pupils were late
with a mean arrival time of 9.21 am, and 2 pupils arrived at 9.00 am exactly.

If x represents the number of minutes a pupil was late (a pupil who was early would have a
negative value of x),

(i) find Σx, and hence find the mean arrival time for all 36 pupils.

Σx = .......................................................
Mean = ................................................... [3]

If Σx² = 5096 for the 36 pupils,

(ii) find the standard deviation of x, correct to one decimal place.

................................................... [3]
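
The calculation in this question can be sketched as follows, assuming the two quantities in the question are the sum of x and the sum of x². Early arrivals are coded as negative minutes, as the question describes. The variable names are illustrative.

```python
from math import sqrt

# Group sizes and mean lateness in minutes, from the question
# (8.51 am is 9 minutes early, 9.21 am is 21 minutes late).
groups = [(25, -9), (9, 21), (2, 0)]

n = sum(count for count, _ in groups)
sum_x = sum(count * avg for count, avg in groups)   # group total = size * group mean
mean_x = sum_x / n

sum_x2 = 5096   # assumed to be the sum of x squared given in the question
sd_x = sqrt(sum_x2 / n - mean_x ** 2)

print(n, sum_x, mean_x, round(sd_x, 1))
```

The mean of x is then added to 9.00 am to give the mean arrival time.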

5 The change in a country's annual production (in millions of tonnes) of 4 commodities between
2011 and 2012 is shown in the change chart below.

[Change chart: bars for Wheat, Rice, Cotton and Maize against change in annual production between 2011 and 2012 (in millions of tonnes)]

The quantity produced (in millions of tonnes) of the 4 commodities in 2011 in this country is
shown in the table below.

| Commodity | Quantity produced in 2011 (millions of tonnes) | Quantity produced in 2012 (millions of tonnes) |
|---|---|---|
| Wheat | 78.6 | |
| Rice | 99.2 | |
| Cotton | 22.6 | |
| Maize | 17.3 | |

(i) Use these data and the change chart to find the quantities of the commodities produced
in 2012 and complete the table.
[2]

(ii) On the grid below, draw a dual bar chart to show the quantities produced in 2011 and
2012 of each of the 4 commodities.

[3]

(iii) State one advantage of a dual bar chart over a change chart.

..........................................................................................................................................
...................................................................................................................................... [1]

6 (a) For each of the following state whether the variable is discrete or continuous and
whether it is qualitative or quantitative.

| | Discrete or Continuous | Qualitative or Quantitative | |
|---|---|---|---|
| (i) the heights of the players in a football competition | | | [1] |
| (ii) the towns of birth of the players in a football competition | | | [1] |

(b) A football team used the diagram below to illustrate the number of goals it had scored
per match in a season in both the league and cup competitions.
[Diagram: number of matches (vertical axis, 0 to 20) against number of goals (horizontal axis), with a key distinguishing matches played in the cup from matches played in the league]

(i) State the full name given to this type of diagram.

................................................... [1]

(ii) Explain why the above diagram is more appropriate than a histogram to illustrate
these data.
..................................................................................................................................
.............................................................................................................................. [1]

(iii) Find the proportion of matches played in the cup in which the team scored 2 or
more goals.

................................................... [2]

Section B [64 marks]

Answer not more than four of the questions 7 to 11.

Each question in this section carries 16 marks.

(a) The total number of visitors at a tourist attraction has been recorded for every quarter
over a three-year period.

(i) Explain why it might be appropriate to calculate moving average values when
establishing the trend in the number of visitors.
..................................................................................................................................
.............................................................................................................................. [1]

(ii) If an n-point moving average is to be calculated, state an appropriate value for n.

................................................... [1]

(iii) State, with a reason, whether centring would be necessary in this case.

..................................................................................................................................
.............................................................................................................................. [2]

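(b) A hospital records the number of patients admitted at two-monthly intervals over a
period of two years and the results are shown in the table below, together with the
6-point moving average values for these data.

| Year | Period | Number of patients | 6-point total | 6-point moving average value | Centred moving average value |
|---|---|---|---|---|---|
| 2010 | Jan–Feb | 241 | | | |
| | Mar–Apr | 208 | | | |
| | May–Jun | x = | | | |
| | | | 1272 | 212 | |
| | Jul–Aug | 185 | | | |
| | | | 1290 | 215 | |
| | Sep–Oct | 209 | | | |
| | | | 1290 | 215 | |
| | Nov–Dec | 261 | | | |
| | | | 1296 | 216 | |
| 2011 | Jan–Feb | 259 | | | |
| | | | y = | z = | |
| | Mar–Apr | 208 | | | |
| | | | 1323 | 220.5 | |
| | May–Jun | 174 | | | |
| | | | 1332 | 222 | |
| | Jul–Aug | 197 | | | |
| | Sep–Oct | 224 | | | |
| | Nov–Dec | 270 | | | |

(i) Calculate the values of x, y and z and insert them in the table.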

[3]

(ii) Calculate the centred moving average values and insert them in the appropriate
places in the table.

[3]

(iii) Plot the centred moving average values on the grid below and draw a trend line
through the points.

[Grid: vertical axis, number of patients, marked 210 to 235; horizontal axis, two-monthly periods from Jan–Feb 2010 to May–Jun 2012]
[3]

(iv) Explain what the trend line you have drawn tells you.
..................................................................................................................................
.............................................................................................................................. [1]

The seasonal component for Mar–Apr is −11.25.

(v) Estimate the number of patients admitted to the hospital during the period
Mar–Apr 2012.

................................................... [2]
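
The moving-average steps used in this question can be sketched as follows. The series is hypothetical and is not the hospital data. It has six two-month periods per year, so a 6-point average is used and then centred.

```python
# Hypothetical two-monthly counts (illustrative only, not the hospital data).
values = [20, 14, 11, 13, 16, 22, 23, 15, 12, 15, 18, 25]
n = 6  # points per cycle: six two-month periods in a year

# n-point moving totals and averages; with n even, each falls between two periods.
totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
averages = [t / n for t in totals]

# Centring: average each adjacent pair so the result lines up with a period.
centred = [(a + b) / 2 for a, b in zip(averages, averages[1:])]

print(totals)
print(centred)
```

Twelve observations give seven 6-point averages and six centred values, the first of which lines up with the fourth period.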

8 The students at a college take one of three programmes of study: Physics, Chemistry and
Mathematics (PCM) or Physics, Chemistry and Biology (PCB) or Economics, Geography
and Mathematics (EGM). The numbers of students who study each programme are shown in
the table below.

| | PCM | PCB | EGM | TOTAL |
|---|---|---|---|---|
| Male | 60 | 40 | 40 | 140 |
| Female | 40 | 90 | 30 | 160 |
| TOTAL | 100 | 130 | 70 | 300 |

(i) Find the probability that a student chosen at random

(a) is a male studying PCM,

................................................... [1]
(b) is female,

................................................... [1]
(c) is studying Physics as part of their programme,

................................................... [1]
(d) is studying PCB, given that they are male.

................................................... [1]
(ii) If two different students are chosen at random, find the probability that they are taking
the same programme of study.

................................................... [3]
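
Part (ii) is a selection without replacement, which can be sketched with exact fractions. The programme totals are those in the table. The dictionary and variable names are illustrative.

```python
from fractions import Fraction

counts = {"PCM": 100, "PCB": 130, "EGM": 70}   # programme totals from the table
total = sum(counts.values())

# Two different students, chosen without replacement, on the same programme:
# sum over programmes of (c / total) * ((c - 1) / (total - 1)).
p_same = sum(Fraction(c, total) * Fraction(c - 1, total - 1) for c in counts.values())

print(p_same, float(p_same))
```

The second factor uses c − 1 and total − 1 because the first student chosen cannot be chosen again.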

(iii) If three different students are chosen at random, find the probability that they are each
taking a different programme of study.
................................................... [3]
Students are required to buy textbooks for each subject that they study: one textbook for each
of Physics, Chemistry and Biology and two textbooks for each of Mathematics, Economics
and Geography.
(iv) Find how many textbooks a student taking each programme of study must buy, and
complete the table below.

| Course | PCM | PCB | EGM |
|---|---|---|---|
| Number of textbooks | | | |

[1]

(v) If one of the textbooks owned by a student at the college is lost at random, find the
probability that it

(a) belongs to a student on the PCM programme,

................................................... [3]
(b) is a Mathematics textbook.

................................................... [2]
9 (a) The values of a variable are formed into a grouped frequency distribution, with one of
the classes stated as 50 – 60. State the true class limits of this class if the variable is

Lower class limit

of flats,

[1]

(ii) the lengths of some rods, measured in mm, to the nearest mm,

[1]

(iii) the lengths of some rods, measured in mm, to the nearest 10 mm.

[1]

(b) A fisherman recorded, in grams (g), to the nearest 100 grams, the masses of 100 fish
he had caught in river A.

| Mass of fish (grams) | Number of fish | Cumulative frequency |
|---|---|---|
| 100 – 200 | 12 | |
| 300 – 400 | 31 | |
| 500 – 700 | 29 | |
| 800 – 1000 | 14 | |
| 1100 – 1400 | | |
| 1500 – 2000 | | |
| 2100 – 3000 | | |

(i) State, with a reason, which of the mean or the median would be the more
appropriate measure of central tendency to use in this case.
..................................................................................................................................
.............................................................................................................................. [2]

(ii)

[1]

(iii) Without drawing a graph, calculate an estimate of the interquartile range of the
masses of the fish.

................................................... [6]
(iv) The fisherman also recorded the masses of 100 fish caught in river B and found
the interquartile range of the masses of these fish to be 352 g. Explain what this
tells you about the masses of the fish caught in river B compared to those caught in
river A.

..................................................................................................................................
.............................................................................................................................. [1]

(v) Without drawing a graph, calculate an estimate of the percentage of fish in river A
with a mass of less than 650 g.

................................................... [3]

10 A hairdresser classifies the expenditure on her business into three categories: Rent,
Equipment and Wages.
The cost of Rent has increased from \$240 per month in 2010 to \$256 per month in 2012.
The price relative of Equipment in 2012 is 110, taking 2010 as base year.
The hourly rate of the Wages of her employees has decreased by 2% between 2010 and
2012.
(i) (a) Calculate the price relative, to the nearest whole number, of Rent for 2012, taking
2010 as base year.

................................................... [2]
(b) Explain what the price relative of 110 for Equipment indicates.
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [3]
(c) State the price relative of Wages for 2012, taking 2010 as base year.
................................................... [1]
(d) Present the price relatives for 2010 and 2012 for each of Rent, Equipment and
Wages in a suitable table.

[2]

The hairdresser wishes to calculate a weighted aggregate cost index, using weights
calculated in 2010, for the three categories.
(ii) (a) Briefly describe how these weights could be calculated.

..................................................................................................................................
.............................................................................................................................. [1]
The weights in 2010 for Rent, Equipment and Wages were calculated as 7, 2 and 5
respectively.
(b) Calculate, to the nearest integer, a weighted aggregate cost index for 2012, taking
2010 as base year.

................................................... [3]
(c) Her total expenditure on the hairdressing business in 2010 came to \$5760. Use
your answer to part (b) to estimate, to the nearest dollar, her total expenditure on

................................................... [2]
(d) Give two possible reasons why this estimate might be very inaccurate.
Reason 1 ...................................................................................................................
..................................................................................................................................
Reason 2 ...................................................................................................................
.............................................................................................................................. [2]

11 A small village has a population of 60 people aged 10 and over.
A group of researchers wish to find out what the people of the village think about proposed
changes to the timetable for the buses that pass through the village. Each researcher has a
list of the population and thinks of a different way to select a sample.
(i) The first researcher plans to stand at the village bus stop at 7 am on a Monday morning
and ask the first six people from the population who come to wait for a bus. Explain why
this might not produce a reliable sample.
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [2]

(ii) A second researcher decides to take a simple random sample of size six from the
population of 60 people.
(a) Explain what the researcher would need to do with the population list before being
able to select the sample from a random number table.
..................................................................................................................................
.............................................................................................................................. [2]
(b) Use the random number table below, starting at the beginning of the first row
and working along the row, to select a simple random sample of size six from the
population of 60 people, ensuring that no one is selected more than once.
RANDOM NUMBER TABLE
15 08 73 00 60 15 31 52 86 47 82 99 04 33
23 05 65 27 46 13 81 50 49 34 29 08 94 72

.............................................................................................................................. [2]
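
The selection rule in part (ii)(b) can be sketched as follows. The question does not fix how the population list is numbered, so both 01 to 60 and 00 to 59 are shown as assumptions. `select_sample` is a hypothetical helper written for this illustration.

```python
def select_sample(numbers, low, high, size):
    """Take values in [low, high] in order, skipping repeats, until size is reached."""
    chosen = []
    for value in numbers:
        if low <= value <= high and value not in chosen:
            chosen.append(value)
        if len(chosen) == size:
            break
    return chosen

# First row of the random number table above, read as two-digit numbers.
row = [15, 8, 73, 0, 60, 15, 31, 52, 86, 47, 82, 99, 4, 33]

sample_01_to_60 = select_sample(row, 1, 60, 6)   # people numbered 01 to 60
sample_00_to_59 = select_sample(row, 0, 59, 6)   # people numbered 00 to 59

print(sample_01_to_60)
print(sample_00_to_59)
```

The two numbering schemes differ only in whether 00 or 60 is accepted, which is why the numbering must be stated before the table is read.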

(iii) A third researcher decides to take a systematic sample of size six from the population.
(a) Explain clearly how they should use a random number table to select the first value
for such a sample.

..................................................................................................................................
.............................................................................................................................. [1]
(b) Use the random number table below, starting at the beginning of the first row and
working along the row, to select a systematic sample of size six.
RANDOM NUMBER TABLE
36 04 85 06 63 22 16 64 12 51 25 92 74 43
35 75 21 44 56 20 83 59 98 35 27 08 14 69

.............................................................................................................................. [3]

The table below shows the population, split into three different age groups.

| | 10 – 18 years | 19 – 65 years | 66 years and over | TOTAL |
|---|---|---|---|---|
| Number of people | 20 | 30 | 10 | 60 |

(iv) A fourth researcher decides to take a random sample of size six, stratified by age group.
(a) State how many people from each age group would be needed for such a sample.
10 – 18 years .......................................................
19 – 65 years .......................................................
66 years and over ................................................... [1]
(b) Explain clearly what the researcher would need to do before selecting the random
sample, stratified by age group, from a random number table.
..................................................................................................................................
.............................................................................................................................. [2]
(c) Use the random number table below, starting at the beginning of the first row and
working along the row, to select a random sample of size six, stratified by age
group, ensuring that no one is selected more than once. Use every number if the
age group to which it relates has not yet been fully sampled.
RANDOM NUMBER TABLE
17 55 82 25 07 16 35 42 89 37 91 98 24 38
77 29 38 02 47 19 80 53 16 40 28 07 94 73

.............................................................................................................................. [2]
(d) Explain why a random sample, stratified by age group, might be a good idea in this
situation.
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [1]


## UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS

General Certificate of Education Ordinary Level

* 3 3 7 3 5 2 4 8 2 4 *

4040/23

STATISTICS
Paper 2

October/November 2013
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use a soft pencil for any diagrams or graphs.
Do not use staples, paper clips, highlighters, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.

Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 20 printed pages.

DC (LEG/CGW) 66936/4
UCLES 2013

Section A [36 marks]
Answer all of the questions 1 to 6.

(i) the number of items of mail delivered each day to a particular address;

................................................... [1]

(ii) the distances run by a number of athletes during 1 hour.

................................................... [1]

The variables described above are each grouped into classes labelled 0 – 4, 5 – 9,
10 – 14 etc.
State the true lower and upper class limits for the 5 – 9 class for

(iii) the variable described in (i),

...................................................................................................................................... [2]

(iv) the variable described in (ii), after the distances have been rounded to the nearest
integer.
...................................................................................................................................... [2]

UCLES 2013

4040/23/O/N/13

For
Examiners
Use

2 Give a brief explanation of the meaning of each of the following terms when used in the
calculation of index numbers:

(i) base year;
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [2]

(ii) weight;
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [2]

(iii) price relative.
..........................................................................................................................................
..........................................................................................................................................
...................................................................................................................................... [2]

3 The body lengths (including the tail) of a sample of 45 white-footed Texas mice were
measured in millimetres. 25 of the mice were found to be male and 20 female. The following
table summarises the data obtained on mouse length.

| | Number of mice | Sum of lengths | Sum of squares of lengths |
|---|---|---|---|
| Male | 25 | 4325 | 748 369 |
| Female | 20 | 3060 | 468 252 |

(i) Explain why the mean length of the total sample of 45 mice is not just given by
(mean length of male mice + mean length of female mice) / 2.
..........................................................................................................................................
...................................................................................................................................... [1]

(ii) Calculate, to 1 decimal place, the mean and the standard deviation of the lengths of the
total sample of 45 mice.

Mean = ..................................................
Standard deviation = .................................................. [5]
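
The pooling of the two groups in Question 3 can be sketched in Python as below. This is only an illustration, not a mark scheme: it assumes the population form of the variance (dividing by n), and the variable names are chosen here for readability.

```python
import math

# Summary statistics copied from the table in Question 3.
n_male, sum_male, sumsq_male = 25, 4325, 748_369
n_female, sum_female, sumsq_female = 20, 3060, 468_252

# Pool the raw totals; the groups differ in size, so their means
# cannot simply be averaged.
n = n_male + n_female
total = sum_male + sum_female
total_sq = sumsq_male + sumsq_female

mean = total / n
# Assumption: population variance, i.e. (sum of squares / n) - mean^2.
variance = total_sq / n - mean ** 2
sd = math.sqrt(variance)

# The unweighted average of the two group means, for comparison.
naive_mean = (sum_male / n_male + sum_female / n_female) / 2

print(f"combined mean = {mean:.1f}")
print(f"combined standard deviation = {sd:.1f}")
print(f"unweighted average of the two means = {naive_mean:.1f}")
```

The unweighted average differs from the combined mean because it gives the 20 females the same weight as the 25 males.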

4 Values of experimental readings taken by different people are to be scaled for purposes of
comparison. The readings have a mean of 37 and a standard deviation of 5.
The scaled values are to have a mean of 100 and a standard deviation of 10.

Calculate

(i) the scaled value corresponding to a reading of 55,

................................................... [2]

(ii) the reading corresponding to a scaled value of 87.5,

................................................... [2]

(iii) the reading which is unaltered when scaled.

................................................... [2]
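
The scaling in Question 4 keeps each reading the same number of standard deviations from the mean. A minimal Python sketch of that idea follows; the function names are illustrative, not taken from the paper.

```python
# Values stated in Question 4.
RAW_MEAN, RAW_SD = 37, 5
SCALED_MEAN, SCALED_SD = 100, 10


def to_scaled(reading):
    """Scaled value with the same standardised score as the raw reading."""
    return SCALED_MEAN + SCALED_SD * (reading - RAW_MEAN) / RAW_SD


def to_reading(scaled):
    """Inverse of to_scaled: recover the raw reading from a scaled value."""
    return RAW_MEAN + RAW_SD * (scaled - SCALED_MEAN) / SCALED_SD


# Written as scaled = a + b * reading, the unaltered reading solves x = a + b * x.
b = SCALED_SD / RAW_SD
a = SCALED_MEAN - b * RAW_MEAN
unaltered = a / (1 - b)

print(to_scaled(55))     # part (i)
print(to_reading(87.5))  # part (ii)
print(unaltered)         # part (iii)
```

A quick check on part (iii) is that `to_scaled(unaltered)` returns the same number that was put in.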

5 [Bar chart with bars labelled School A and School B]

The bar chart above is intended to illustrate information about how many boys and girls
attend each of two schools, A and B.

(i) The bar chart is incomplete. List three items of detail which are missing.
...................................................
...................................................
................................................... [2]

(ii) State the name of this type of bar chart.

................................................... [1]

(iii) Explain how you know that the bar chart illustrates the actual number of boys and girls,
and not percentages.
..........................................................................................................................................
...................................................................................................................................... [1]

(iv) Another type of diagram which could be used to illustrate the data is a pictogram. State
a disadvantage of pictograms, compared with bar charts, when illustrating frequencies
such as the number of pupils at a school.
..........................................................................................................................................
...................................................................................................................................... [1]

(v) Give a reason why a change chart could not be used to illustrate these data.
..........................................................................................................................................
...................................................................................................................................... [1]

6 A farmer classifies the expenditure in running his farm under four headings: Animal Feed,
Labour, Fuel and Professional Services (e.g. veterinary services). The price relatives for each
of these headings for the year 2011, taking 2006 as base year, and the weight allocated by
the farmer to each heading are given in the following table.

| | Price relative | Weight |
|---|---|---|
| Animal Feed | 104 | 14 |
| Labour | 110 | |
| Fuel | 107 | |
| Professional Services | 102 | |

(i) Calculate, correct to 2 decimal places, the overall percentage increase in the farmer's
weighted cost index from 2006 to 2011.

................................................... [4]

(ii) In 2011 the farmer's income was 7% greater than it had been in 2006. State, with a
reason, whether or not the farm was more profitable than it had been five years earlier.
..........................................................................................................................................
...................................................................................................................................... [2]

Section B [64 marks]

Answer not more than four of the questions 7 to 11.

Each question in this section carries 16 marks.

7 This question must be answered by calculation. An answer using a graphical method
will not be awarded any marks.
The following table summarises the heights, in centimetres, of a sample of 8585 adult males
in the United Kingdom.

| Height (cm) | Frequency | Cumulative frequency |
|---|---|---|
| | 144 | |
| | 1232 | |
| | 2213 | |
| | 2559 | |
| | 1709 | |
| | 705 | |
| 190 – under 200 | 23 | |

(i)

[2]

(ii) (a) State the class in which the median height lies.

................................................... [1]
(b) Estimate, to 1 decimal place, the median height.

............................................. cm [3]

(iii) (a) State the class in which the lower quartile height lies.

................................................... [1]
(b) Estimate, to 1 decimal place, the lower quartile height.

............................................. cm [3]
The upper quartile height, correct to 1 decimal place, is 175.9 cm.
(iv) (a) Estimate the interquartile range of the heights.

............................................. cm [1]
(b) Compare the distances of the quartiles from the median, and comment on whether
this is what you would expect in a distribution of the heights of a large number of
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [3]

(v) If a cumulative frequency curve were drawn to illustrate this distribution, state, with a
reason, in which part of the graph the curve would be at its steepest.
..........................................................................................................................................
...................................................................................................................................... [2]

8 In this question give all answers as fractions in their lowest terms.

Two identical bags each contain a number of coloured balls.
Bag X contains 4 white and 7 blue balls. Bag Y contains 3 blue and 8 red balls.

(i) A bag is chosen at random and a ball selected at random from it. Find the probability
that the selected ball is blue.

................................................... [3]

(ii) Two balls are chosen at random from bag Y. Find the probability that they are of the
same colour.

................................................... [3]

(iii) One ball is chosen at random from each bag. Find the probability that the chosen balls
are of the same colour.

................................................... [4]

(iv) A bag is chosen at random and two balls are selected at random from it. Find the
probability that both selected balls are white.

................................................... [3]

(v) All the balls from both bags are emptied into a third bag, bag Z. Two balls are then
chosen at random from bag Z. Find the probability that both selected balls are white.

................................................... [2]

(vi) Explain briefly why the answer to part (iv) is greater than the answer to part (v).
..........................................................................................................................................
...................................................................................................................................... [1]
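
Because Question 8 asks for fractions in lowest terms, exact rational arithmetic is a natural way to check the working. The sketch below uses Python's `fractions.Fraction`; the helper names `p_one` and `p_two_same` are invented for this illustration, and selection without replacement is assumed wherever two balls come from the same bag.

```python
from collections import Counter
from fractions import Fraction as F

bag_x = Counter(white=4, blue=7)
bag_y = Counter(blue=3, red=8)
bag_z = bag_x + bag_y  # all 22 balls emptied into one bag
half = F(1, 2)         # either bag is equally likely to be chosen


def p_one(bag, colour):
    """Probability that a single ball drawn from the bag has this colour."""
    return F(bag[colour], sum(bag.values()))


def p_two_same(bag, colour):
    """Probability that two balls drawn without replacement both have this colour."""
    k, n = bag[colour], sum(bag.values())
    return F(k, n) * F(k - 1, n - 1)


part_i = half * p_one(bag_x, "blue") + half * p_one(bag_y, "blue")
part_ii = sum(p_two_same(bag_y, c) for c in bag_y)
part_iii = sum(p_one(bag_x, c) * p_one(bag_y, c) for c in bag_z)
part_iv = half * p_two_same(bag_x, "white") + half * p_two_same(bag_y, "white")
part_v = p_two_same(bag_z, "white")

for label, value in [("(i)", part_i), ("(ii)", part_ii), ("(iii)", part_iii),
                     ("(iv)", part_iv), ("(v)", part_v)]:
    print(label, value)
```

`Fraction` reduces every result automatically, so the printed values are already in lowest terms.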

9 Three unbiased six-sided dice, each with faces numbered 1, 2, 3, 4, 5 and 6, are rolled
simultaneously.
Find the probability that the numbers on the uppermost faces will be

(i) three 1s,

................................................... [1]

(ii) three of the same number except 1,

................................................... [1]

(iii) exactly two 1s and some other number.

................................................... [3]
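
The three probabilities asked for above can be checked by listing all 216 equally likely outcomes. This is a brute-force sketch rather than the counting argument a candidate would be expected to write; the `prob` helper is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of three fair six-sided dice (6 * 6 * 6 = 216).
rolls = list(product(range(1, 7), repeat=3))


def prob(event):
    """Fraction of the equally likely outcomes for which event(roll) is true."""
    favourable = sum(1 for roll in rolls if event(roll))
    return Fraction(favourable, len(rolls))


three_ones = prob(lambda r: r == (1, 1, 1))
three_same_not_one = prob(lambda r: len(set(r)) == 1 and r[0] != 1)
exactly_two_ones = prob(lambda r: r.count(1) == 2)

print(three_ones, three_same_not_one, exactly_two_ones)
```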
A game, in which three such dice are rolled simultaneously and for which the entry fee is \$1,
is organised. Prizes are paid for certain outcomes on the uppermost faces, as given in the
following table.

Outcome
Three 1s
Exactly two 1s and some other number

(iv) Calculate, to the nearest cent, the organiser's expected profit each time the game is
played.

................................................... [3]
In another game, a contestant chooses three cards at random from a set of ten. The numbers
on the cards are 1, 1, 1, 2, 2, 2, 3, 4, 5 and 6. Prizes are again paid as given in the previous
table.

(v) By first calculating the appropriate probabilities, calculate, to the nearest cent, the entry
fee which should be charged to make this a fair game.

................................................... [8]
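
The probabilities needed for the card game can be sketched by listing every possible hand of three cards. The prize amounts are not legible in this copy, so the sketch stops at the probabilities and does not attempt the entry fee. Treating the three outcomes below as the relevant ones is an assumption carried over from the dice parts of this question.

```python
from fractions import Fraction
from itertools import combinations

cards = [1, 1, 1, 2, 2, 2, 3, 4, 5, 6]

# Choose cards by position so that equal-numbered cards count as distinct:
# C(10, 3) = 120 equally likely hands, drawn without replacement.
hands = list(combinations(range(len(cards)), 3))


def prob(event):
    """Fraction of hands whose card values satisfy event(values)."""
    favourable = sum(1 for hand in hands if event([cards[i] for i in hand]))
    return Fraction(favourable, len(hands))


three_ones = prob(lambda v: v.count(1) == 3)
three_same_not_one = prob(lambda v: len(set(v)) == 1 and v[0] != 1)
exactly_two_ones = prob(lambda v: v.count(1) == 2)

print(three_ones, three_same_not_one, exactly_two_ones)
```

For a fair game the entry fee equals the expected prize, that is, the sum of each prize multiplied by its probability.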

10 (a) A large housing estate contains approximately equal numbers of three types of dwelling:
detached houses (D), semi-detached houses (S) and bungalows (B). A research
organisation wishes initially to get some idea of how many occupants there tend to be
in each type of dwelling. It has instructed an interviewer to call at four of each type of
dwelling to ask how many people live there but the choice of exactly which dwellings is
up to the interviewer.

(i) State the name of the method of sampling being used.

................................................... [1]

(ii) Give a reason why the research organisation could not just simply use a list of
registered voters for the estate.
..................................................................................................................................
.............................................................................................................................. [1]

The interviewer labelled his chosen dwellings 1 to 12, and the following is a copy of the
notes he made during a number of visits to the estate:

Dwelling: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
Type: B, S, S, D, B, D, D, B, S, S, D, B
5, 8, 10, 5, 8, 8
call again later
call again

(iii) For the twelve dwellings chosen, find the total number of

................................................... [1]
(b) bungalows with no children.
................................................... [1]

(iv) Draw up and complete a table showing the number of dwellings, classified by their
type and by the number of children who live in them.

[3]

(v) The research organisation is to carry out a survey on behalf of a manufacturer of
children's clothes. If it only has sufficient funding to investigate the expenditure on
such clothes by the inhabitants of one type of dwelling, state, with a reason, which
type it should choose.
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [2]

(b) A group of 60 people are each allocated a different two-digit random number in the
range 01 to 60. The 20 men are numbered 01 to 20 and the 40 women are numbered 21
to 60.
A sample of size six is to be selected by different sampling methods using the following
random number table, starting at the beginning of the row for each sample. No person
may be selected more than once in any one sample.

RANDOM NUMBER TABLE
21 32 07 42 98 81 21 57 81 59 31 17 36

Select

(i) a simple random sample,

.............................................................................................................................. [2]

(ii) a systematic sample,
.............................................................................................................................. [3]

(iii) a sample stratified by gender, using every number if the gender to which it relates
has not yet been fully sampled.

.............................................................................................................................. [2]
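
The three selection procedures in part (b) can be sketched in Python as follows. Two conventions are assumed here and are not stated in the paper: the systematic sample starts at the first table number that falls within the first interval (01 to 10), and the stratified sample is allocated in proportion to the group sizes (2 men and 4 women). The function names are illustrative.

```python
TABLE = [21, 32, 7, 42, 98, 81, 21, 57, 81, 59, 31, 17, 36]
POPULATION = 60   # people numbered 01 to 60
SAMPLE_SIZE = 6
MEN = 20          # men are 01 to 20, women are 21 to 60


def simple_random(table):
    """Take numbers in order, skipping out-of-range values and repeats."""
    chosen = []
    for number in table:
        if 1 <= number <= POPULATION and number not in chosen:
            chosen.append(number)
        if len(chosen) == SAMPLE_SIZE:
            break
    return chosen


def systematic(table):
    """Assumed convention: random start in the first interval, then every k-th person."""
    interval = POPULATION // SAMPLE_SIZE
    start = next(n for n in table if 1 <= n <= interval)
    return [start + i * interval for i in range(SAMPLE_SIZE)]


def stratified(table):
    """Assumed proportional allocation; use each number unless its group is full."""
    quotas = {"men": SAMPLE_SIZE * MEN // POPULATION,
              "women": SAMPLE_SIZE * (POPULATION - MEN) // POPULATION}
    chosen = []
    for number in table:
        if not 1 <= number <= POPULATION or number in chosen:
            continue
        group = "men" if number <= MEN else "women"
        if quotas[group] > 0:
            chosen.append(number)
            quotas[group] -= 1
    return chosen


print(simple_random(TABLE))
print(systematic(TABLE))
print(stratified(TABLE))
```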

11 (a) (i) A company's sales are recorded every month over a period of several years. Use
this example to explain briefly the meaning of the term

(a) trend,
...........................................................................................................................
...........................................................................................................................
....................................................................................................................... [1]
(b) seasonal variation,
...........................................................................................................................
...........................................................................................................................
....................................................................................................................... [1]
(c) cyclic variation.
...........................................................................................................................
...........................................................................................................................
....................................................................................................................... [1]

(ii) State which one of trend, seasonal variation and cyclic variation the method
of moving averages removes from a time series, and explain briefly how this is
achieved.
..................................................................................................................................
..................................................................................................................................
.............................................................................................................................. [2]

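(b) The following table gives the number of properties sold during each quarter of the years
2006 to 2009 by a small estate agent, together with values of relevant totals and moving
averages.

| Year | Quarter | Number of sales | 4-quarter total | 8-quarter total | 8-quarter moving average value |
|---|---|---|---|---|---|
| 2006 | I | 18 | | | |
| | II | 24 | | | |
| | | | 96 | | |
| | III | 28 | | 197 | 24.625 |
| | | | 101 | | |
| | IV | 26 | | 205 | 25.625 |
| | | | 104 | | |
| 2007 | I | 23 | | 209 | 26.125 |
| | | | x = | | |
| | II | 27 | | 209 | 26.125 |
| | | | 104 | | |
| | III | 29 | | y = | 25.625 |
| | | | 101 | | |
| | IV | 25 | | 197 | 24.625 |
| | | | 96 | | |
| 2008 | I | 20 | | 188 | 23.5 |
| | | | 92 | | |
| | II | 22 | | 180 | z = |
| | | | 88 | | |
| | III | 25 | | 167 | 20.875 |
| | | | 79 | | |
| | IV | 21 | | 151 | 18.875 |
| | | | 72 | | |
| 2009 | I | 11 | | 137 | 17.125 |
| | | | 65 | | |
| | II | 15 | | 121 | 15.125 |
| | | | 56 | | |
| | III | 18 | | | |
| | IV | 12 | | | |

(i)

[3]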
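
The totals and centred moving averages in this table follow mechanically from the quarterly sales. The Python sketch below shows the calculation; the list names are chosen for illustration, and the sales figures are the sixteen quarterly values for 2006 to 2009 read from the "Number of sales" column.

```python
# Quarterly sales, 2006 I to 2009 IV.
sales = [18, 24, 28, 26, 23, 27, 29, 25, 20, 22, 25, 21, 11, 15, 18, 12]

# Each 4-quarter total covers one full year of seasons and sits between two quarters.
four_q = [sum(sales[i:i + 4]) for i in range(len(sales) - 3)]

# Adding adjacent 4-quarter totals centres the result on an actual quarter.
eight_q = [first + second for first, second in zip(four_q, four_q[1:])]

# Dividing by 8 gives the centred moving average (the trend value).
centred = [total / 8 for total in eight_q]

quarters = [f"{year} {q}" for year in range(2006, 2010) for q in ("I", "II", "III", "IV")]
# The first centred value belongs to the third quarter in the series.
for quarter, total, average in zip(quarters[2:], eight_q, centred):
    print(quarter, total, average)
```

Each printed row can be compared with the corresponding row of the table, including the three entries left for the candidate to complete.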
(ii) On the grid below plot the 8-quarter moving average values.

[2]

[Grid: vertical axis "Number of sales", scaled from 10 to 28; horizontal axis showing quarters I to IV of 2006 to 2009 and quarter I of 2010]

(iii) Describe what your plotted points show about the sales of properties during this
time period.
..................................................................................................................................
.............................................................................................................................. [2]

(iv) State, with a reason, whether or not it would be meaningful to draw a single straight
trend line through the plotted points.
..................................................................................................................................
.............................................................................................................................. [1]

(v) Draw a straight trend line which would be useful for estimating the number of
properties sold in the first quarter of 2010.
[1]

The seasonal components for the number of sales are given in the following table.

| Quarter | I | II | III | IV |
|---|---|---|---|---|
| Seasonal component | −4.4 | −0.1 | 3.5 | 1.0 |

(vi) State, with a reason, whether the actual sales of properties in the first quarter of
2010 would be likely to be greater or smaller than the value indicated by your trend
line.
..................................................................................................................................
.............................................................................................................................. [2]


## Cambridge General Certificate of Education Ordinary Level

4040 Statistics November 2013
Principal Examiner Report for Teachers

STATISTICS
Paper 4040/12
Paper 12

Key Messages
If a question specifies a certain degree of accuracy for numerical answers, full marks will not be obtained if
the instruction is not followed.
Premature rounding or truncation of decimals in the middle of working should be avoided so that accuracy is
not lost.
Candidates should develop the skill of holding the intermediate values of a calculation in the calculator to
obtain maximum accuracy in the final answer.
Candidates should try to relate their knowledge to the specific requirements of a question rather than simply
repeat memorised knowledge.
After performing any calculation it is worth pausing to consider if the answer obtained is a reasonable one for
the practical situation of the question.
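
The effect of premature rounding can be illustrated with the mouse-length summary data from Question 3 of Paper 23 earlier in this document. The sketch below is only an example of the point made in the key messages, not part of the report. It assumes the population form of the standard deviation, and the helper name `sd_from` is invented here.

```python
import math

# Combined totals for the 45 mice (25 male and 20 female).
n = 45
total = 4325 + 3060
total_sq = 748_369 + 468_252


def sd_from(mean_value):
    """Standard deviation computed from the sum of squares and a given mean."""
    return math.sqrt(total_sq / n - mean_value ** 2)


exact_mean = total / n                 # held at full calculator accuracy
rounded_mean = round(exact_mean, 1)    # rounded too early, to 1 decimal place

sd_full = sd_from(exact_mean)
sd_premature = sd_from(rounded_mean)

print(f"mean kept in full:  SD = {sd_full:.1f}")
print(f"mean rounded early: SD = {sd_premature:.1f}")
```

Rounding the mean before squaring it changes the final answer in the first decimal place, which is exactly the accuracy the question asks for.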

The overall standard of work was comparable to that of last year. Some very good marks were obtained,
and there were few exceptionally low marks. As is noted regularly in these reports, there were again
instances of marks being needlessly lost due to final answers not being given to the accuracy specifically
stated in the question. In those parts of questions requiring comment related to results calculated there is
still a tendency for some answers given to be mathematical rather than contextual (see Question 10 below).
Any candidate of statistics ought to be able to observe whether or not the result of a calculation is
reasonable in a given practical situation. If it is clearly unreasonable, the work can be checked to find the
error. For example, if it is found that the mid-day temperature in a city is set to increase by 20°C by
mid-century (see Question 9 below) it should be obvious that a mistake has been made; this is far in excess of
even the direst predictions of climate change scientists.
It may seem superfluous to remark that a question should be read carefully before an answer is attempted.
Yet there was one question in particular on the paper (see Question 2 below) where this was apparently not
done.

Section A
Question 1
Parts (i) and (ii) were generally answered best. It was clear from answers to the other parts that many
candidates do not understand the terms central tendency and dispersion, for many gave a measure of
dispersion when a measure of central tendency was requested, and vice versa. Few answered all parts
correctly.
Answers: (i) mode (ii) range (iii) median (iv) variance or standard deviation (v) interquartile range
(vi) mode

Question 2
The best answers to part (ii) were those which demonstrated that the candidate had read the question
carefully, and in particular had understood that the key piece of information given was that there was a
proposal to change the time of the class. Thus, when taking her sample, it was important that the instructor
did not select, for example, all women who were in full-time employment, and who, presumably, would all
have been against the change. The answers given below are not exhaustive; but whatever was suggested,
to earn credit it had to be explained to be something that would affect the woman's ability, one way or the
other, to attend at the new time.
Weak answers did not address the situation described, but reproduced what was apparently memorised
material on avoiding bias in general. Thus in spite of the question stating clearly that this was a class for
women, and that the instructor already knew their ages, it was quite common to see gender and age
suggested for items of data needed.
Answers: (i)(a) quota (i)(b) systematic (ii) employment status, because working women may need to be
at work in the afternoon; maternal status, because a woman with children may prefer afternoon
attendance when her children are at school
Question 3
This was very well done, with only part (iv) causing problems. Success was most readily achieved by those
who tried inserting different sets of three consecutive integers into their ordered list in part (iii).
Answers: (i) 6 (ii) 3.9 (iii) 4 (iv) 3
Question 4
Whilst parts (i) and (ii) were almost always answered correctly, there were few fully correct answers to the
next three parts. As is observed regularly in these reports, many candidates do not understand clearly what
the regions of the different parts of a Venn diagram represent. In parts (iii) and (iv) common numerators
seen were 27 and 6 respectively, and in part (v) little appreciation was shown that a denominator of 9 had to
be used.
Answers: (i) 25 (ii) 6 actors have worked in Los Angeles and Rome but not Mumbai (iii) 40/48
(iv) 10/48 (v) 4/9
Question 5
Parts (i) and (ii) were almost universally well done. There were also many correct answers to part (iii), but
because past questions have usually asked about the radii of the charts, some candidates felt that squaring
or taking square roots had to be done somewhere.
Answers: (i) \$12 million (ii) 126 (iii) 4 : 3
Question 6
This was another question which was almost universally well done. Candidates understood very clearly this
particular form of tabulation for the representation of the distances between different towns. Errors occurred
occasionally in part (ii) when it was not realised that three distances only had to be added for the journey
described in the question.
Answers: (i)(a) 35 in cell BC (i)(b) 24 in cell AC (i)(c) 21 in cell CE (i)(d) 37 in cell CD (ii) 81 km
Section B
Question 7
As was the case in the examination last year, most candidates were able to apply their knowledge of crude
and standardised rates to fertility rates, and there were many good answers to parts (i), (ii) and (iii).
However, as mentioned in the general comments above, this was yet again one of the questions where
marks were sometimes lost through failure to follow the given accuracy instructions.

Good answers to part (iv) showed clear understanding that the task was to find the number of deaths in the
city, as the number of births was already known from part (iii). They further showed understanding that the
calculation had to be based on the total population of the city, and not just the females. It was quite common
in weaker answers to see this last point overlooked, with 18 450 being used in the working for deaths instead
of 36 900. The least creditworthy attempts simply subtracted one of the death rates from one of the fertility
rates and stopped at that point, again failing to appreciate that, whilst fertility rates applied only to the
females, death rates applied to the whole population.
Very good general understanding of what was required was shown in part (v).
Answers: (i) 88.7 (ii) 145, 828, 714, 87 (iii) 96.2 (iv) 1486 (v) migration of people into or out of the city
Question 8
There were many correct answers to part (i), though not all candidates appreciated that this was a
"without replacement" situation. Most did not see the simple link between this part and the next, and attempted part
(ii) as though it was completely unrelated to what had gone before. Unfortunately, in the analysis of the
different cases this involved, one of the three possibilities was frequently omitted.
The quality of answers to the histogram was mixed, with many fully correct answers, but also many where no
allowance was made for the different widths of the rectangles.
Whilst the number of fully correct answers to part (vii) was limited, a good number of candidates were able to
obtain some marks on the question. The best answers showed clear understanding of the conditional
element, ending with a division of probabilities, even though these might not be individually correct, it being
sometimes thought that there were just three 3, 4, 5 cases. More limited answers finished at the point where
the probability of the apartments having 12 rooms had been found, the conditional element not being
recognised. A significant number of answers was seen in which it was thought that the only requirement was
to find the probability of choosing three apartments each with 4 rooms. It should have been apparent that a
question worth 6 marks must have involved more than one line of working for its solution.
Answers: (i) 35/204 (ii) 169/204 (iii) 54 (iv) 6 (v) rectangle of height 5 (vi) modal class (vii) 13/157
Question 9
Some candidates produced graphs of very high quality, the majority plotting points correctly. But the error of
using mid-class values instead of upper class boundaries continues to be seen too often.
As has been pointed out before in these reports, good answers to this type of question give some indication
on the graph (for example with lines drawn and labelled) of how the required information is being found.
Credit can then be given for method, even if the answer is incorrect. Some progress appears to have been
made in this respect, with, on this occasion, fewer graphs devoid of annotations than has been the case in
the past.
Part (iii) was reasonably well done, although a significant number of answers was seen where the serious
error of using a total frequency of 400 was made. Common errors in part (iv) were to add 2.5°C or even
20°C to the median previously found, and also to add a temperature increase to the interquartile range
previously found. In the case where 20°C was being added, it should have been realised that this was a
highly unrealistic increase.
In part (v), thought processes were not always evident from answers presented. The best solutions were
those where vertical lines were drawn on the graph at temperatures of 36°C and 34°C, with horizontal lines
linking these to the respective cumulative frequencies.

Answers: (i) 8, 33, 85, 166, 245, 313, 350, 365 (ii) plot of cumulative frequencies at upper class
boundaries joined by a smooth curve (iii)(a) 20.7°C to 21.3°C (iii)(b) 11°C to 12°C, dependent
on correct method for, and accuracy of, quartiles (iv)(a) answer to part (iii)(a) + 2°C
(iv)(b) same answer as part (iii)(b) (v) 9, 10 or 11 days

Question 10
Following an observation made in this report last year on the clarity of plotted points, this year, almost
always, Examiners were able to see points very clearly.
Very good marks were generally earned on the first three parts, with good understanding shown of the need
to order data to find the semi-averages. By far the best way to proceed in part (iv) was to use the two given
averages to find the equation of the line. Candidates who used the average they had calculated in part (iii)
risked error by using values they could not be certain were correct, unlike the values for the other averages
given in the question. Unfortunately many did exactly this, and as a consequence of working with their own
(incorrect) average obtained an incorrect equation. Incorrect equations also resulted from working with a
gradient accurate to only one significant figure.
In part (v) quite a lot of answers were written in purely mathematical language, when what was required was
an appreciation of what was implied for the schools and teachers.
Reasonable skill was shown in part (vi) in drawing a line of best fit by eye, and in part (vii) in finding its
equation. For the latter it was essential that points from the line drawn were used. When values were
seen which were originally given in the table, Examiners only gave credit if the line drawn passed through the
plot of these particular points.
In part (viii), most candidates knew that this had something to do with educational provision as it related to
the number of teachers employed. But a good number focused on the intercepts of the two equations rather
than the gradients. Statements to the effect that Belport was better because it employed more teachers
could not be accepted, as actual numbers for Belport were unknown.
Answers: (ii) (927+1085+1219+1361)/4 (iii) (559.75, 25.75) (iv) m = 0.0280 or 0.028, c = 10.00 to 10.11
(v) it indicates there are 10 teachers when there are no pupils (vii) m = 0.033 to 0.039, c =
intercept of line drawn in part (vi) (viii) Belport, as gradient for Belport is higher, showing that the
number of teachers per pupil there is higher than at Astra
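The effect of a gradient held to only one significant figure can be checked numerically. The sketch below takes the overall average (559.75, 25.75) and the gradient 0.028 from the answers above, purely for illustration, and compares the resulting intercept with the one obtained when the gradient is rounded to 0.03. The function name is an illustrative choice.

```python
def intercept_through(point, gradient):
    """Intercept c of the line y = m*x + c that passes through `point`."""
    x, y = point
    return y - gradient * x


overall_average = (559.75, 25.75)  # answer to part (iii)

c_accurate = intercept_through(overall_average, 0.028)  # gradient to 2 s.f.
c_rounded = intercept_through(overall_average, 0.03)    # gradient to 1 s.f.

print(round(c_accurate, 2))  # lies inside the accepted range 10.00 to 10.11
print(round(c_rounded, 2))   # lies well below that range
```

A small loss of accuracy in the gradient is multiplied by an x value of several hundred, which is why the intercept moves by more than one whole unit.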
Question 11
The answers below for part (i) are not exhaustive, but to gain credit specific advantages and disadvantages
in the statistical analysis of data had to be provided. Thus references to a process being tedious or taking a
lot of time were not considered acceptable. Also, what appear to be common assumptions about it being
easier to analyse a frequency distribution rather than a large set of data must be questioned; if a large set of
data is held in a spreadsheet a wide range of statistical measures can be found almost instantaneously.
Part (ii) was generally well answered, although a mark was commonly lost on the standard deviation through
failure to maintain sufficient accuracy in decimals in the body of the working. For such a problem candidates
should have the ability to retain intermediate values of maximum accuracy within the calculator, by making
use of the memory. Too often premature rounding or truncation of decimals is seen. Most used the method
for standard deviation based on $\sum fx$ and $\sum fx^2$, which is far better for computational purposes than that which
uses $\sum f(x - \bar{x})^2$.
Part (iii) aimed to test if candidates were able to focus on the particular numbers relevant to a question,
when given a table containing a range of information. There were very mixed answers, with some giving
more than one programme for one or both answers.
Good understanding was shown in part (iv), and many clearly presented answers were seen.
Answers: (i) provides a concise summary of the data; original data are lost (ii) 3.66, 0.343 (iii)(a) Q
(iii)(b) T (iv) 197/900
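The two methods for the standard deviation, and the cost of premature rounding, can be compared in a short sketch. The frequency table below is hypothetical and is not the data of the question. The sums method works from the totals alone, whereas the deviations method depends on whichever value of the mean is supplied to it.

```python
from math import sqrt


def mean_sd_from_sums(values, freqs):
    """Mean and standard deviation using the totals of f, fx and fx squared."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    return mean, sqrt(sum_fx2 / n - mean ** 2)


def sd_from_deviations(values, freqs, mean):
    """Standard deviation using deviations from a supplied value of the mean."""
    n = sum(freqs)
    return sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)


values = [1, 2, 3, 4, 5]   # hypothetical values
freqs = [3, 7, 12, 6, 3]   # hypothetical frequencies

mean, sd = mean_sd_from_sums(values, freqs)
sd_full_accuracy = sd_from_deviations(values, freqs, mean)
sd_rounded_mean = sd_from_deviations(values, freqs, round(mean, 1))

print(sd, sd_full_accuracy, sd_rounded_mean)
```

With the mean kept at full accuracy the two methods agree. With the mean rounded to one decimal place the deviations method gives a slightly larger value, because the sum of squared deviations is smallest about the true mean.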

## Cambridge General Certificate of Education Ordinary Level

4040 Statistics November 2013
Principal Examiner Report for Teachers

STATISTICS
Paper 4040/13
Paper 13

Key Messages
A valuable skill in statistical work is to be able to recognise when the results of a calculation or analytical
process are reasonable.
If a question specifies a certain degree of accuracy for numerical answers, the instruction must be followed
for full marks to be credited.
If words in a question are emphasised they should be noted carefully by the candidate so that unnecessary
errors are avoided.

The overall standard of work was comparable to that of last year, with a wide range of marks being obtained.
As is noted regularly in these reports, there were again instances of marks being needlessly lost when
answers were not given to the required accuracy, where this was stated in the question (see Questions 2,
10 below).
A candidate of statistics ought to know whether or not the result of a calculation or analytical process is
reasonable in a given practical situation. If it is clearly unreasonable, the work can be checked to find the
error and the error corrected. If a plot of the values on a scatter diagram shows clearly that as x increases y
decreases, it ought to be obvious that, if found, a line of best fit with positive gradient must be wrong (see
Question 10 below).
In questions which require written answers, candidates should try to relate their knowledge to the specific
context of the question rather than simply repeat memorised knowledge of a general nature (see Question 6
below).

Section A
Question 1
Answers to this question were mixed. It is clear that some candidates do not understand the terms central
tendency and dispersion, for a measure of dispersion was sometimes given when a measure of central
tendency was requested, and vice versa.
Answers: (i) median, mode (ii) interquartile range (iii) mean (iv) two from range, standard deviation,
variance
Question 2
This was very well answered, with many candidates obtaining full marks. Good understanding was shown of
the use of the square of the radius in part (iv), though occasionally a mark was needlessly lost as a
consequence of the accuracy instruction being ignored.
Answers: (i) Europe 164, Asia 74, North America 90, Rest of the World 32 (ii) \$162 million
(iii) 4.9 cm to 5.1 cm (iv) 4.1 cm
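The use of the square of the radius in part (iv) follows from the areas of comparable pie charts being proportional to the totals they represent. A minimal sketch is given below. The totals of 200 and 128 are hypothetical and are not the figures of the question, and the function name is illustrative.

```python
from math import sqrt


def comparable_radius(radius_1, total_1, total_2):
    """Radius of a second pie chart whose area is in proportion to its total."""
    return radius_1 * sqrt(total_2 / total_1)


radius_2 = comparable_radius(5.0, 200, 128)  # hypothetical totals
wrong_radius = 5.0 * 128 / 200               # scaling the radius directly

print(round(radius_2, 1), round(wrong_radius, 1))
```

Scaling the radius directly in the ratio of the totals makes the second chart too small, since the area would then change in the ratio of the squares.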

Question 3
This was another very well done question, with many full mark answers being presented.
Answers: (i) and (ii) column totals: 220, 440, 660; 540, 125, 665; 1390, 785, 2175; 2150, 1350, 3500
Question 4
Where errors occurred they were mainly in part (iii), where the value for the total number of handball players
was occasionally used instead of the value for those who play only handball.
Answers: (i) 1; one girl did not play any of the three sports (ii)(a) 7 (ii)(b) 2 (iii) 19
Question 5
This question and the next were by far the least well answered in Section A. Whilst almost all recognised
the need for rectangle heights to correspond to frequency densities, many errors were made in using the one
given height to deduce correctly the standard class width.
Answers: (i) 40 (ii) 7, 32, 4, 2 (iii) 2.71
Question 6
It was clear that most candidates knew about systematic sampling, and there was scarcely any confusion
with other types of sampling. But in part (a)(i) there was a tendency to give examples of biased outcomes
rather than the causes of such outcomes. Answers to part (a)(ii) were reasonable, though rarely complete,
either the first or second steps (or even both) in the process being omitted. In part (b), stratification was
clearly understood, but only the strongest answers gave stratification directly relevant to the surveys being
carried out. Weaker answers offered criteria which might be employed in general, such as gender, age or
occupation.
Answers: (a)(i) occurs when there is a regular pattern in the population listing (a)(ii) three basic steps to
be given: listing the population; starting the selection at a random point; selecting every 19th
candidate from the list after the starting point (b)(i) into smokers and non-smokers (b)(ii) into
those who live near an airport and those who do not
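The three steps required in part (a)(ii) can be sketched as follows. The population of 380 labelled candidates is hypothetical, chosen only so that every 19th selection gives a sample of 20, and the function name is illustrative.

```python
import random


def systematic_sample(population, interval, rng=random):
    """Select every `interval`-th item after a randomly chosen starting point."""
    start = rng.randrange(interval)   # step 2: random start within the first interval
    return population[start::interval]  # step 3: every interval-th item thereafter


# Step 1: list the population (hypothetical labels).
population = [f"candidate_{i:03d}" for i in range(1, 381)]

sample = systematic_sample(population, 19, random.Random(1))
print(len(sample), sample[:3])
```

If the listing has a regular pattern whose period matches the sampling interval, every selected item falls at the same point of the pattern, which is the cause of bias referred to in part (a)(i).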
Section B
Question 7
In this question, part (b) was answered far better than the other two parts. The diagram was well understood
and there were many correct answers.
In part (a) not everyone appreciated that the case of the person not having the disease had to be considered
as well as the case of the person having the disease, and furthermore that the test result had to be negative
in the former case to give the correct result. Nevertheless some correct solutions were seen.
But there were very few correct solutions to part (c)(i). Almost all failed to consider in their working that if
Laura went into exactly one shop it meant that she did not go into the other. Consequently 0.8 and 0.3 were
usually absent from the working. In part (c)(ii) some candidates did not seem to recognise the numerical
comparison which had to be made in order to give a decision.
Answers: (a)(i) 0.05 (a)(ii) 0.1, 0.9 in second column (a)(iii) 0.9075 (b)(i) 13/33 (b)(ii) 4/5 (b)(iii) 4/13
(c)(i) 0.0558 (c)(ii) unlikely as 0.0558 > 0.05
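The two ideas most often missed, that a correct test result includes the negative result for a person without the disease, and that "exactly one" requires the complement of the other event, can be sketched as below. All probabilities used are illustrative assumptions rather than the data of the question, independence is assumed in the second function, and the function names are hypothetical.

```python
def prob_correct_result(p_disease, p_pos_given_disease, p_neg_given_no_disease):
    """P(test gives the correct result), summed over both possible cases."""
    has_disease_and_positive = p_disease * p_pos_given_disease
    no_disease_and_negative = (1 - p_disease) * p_neg_given_no_disease
    return has_disease_and_positive + no_disease_and_negative


def prob_exactly_one(p_a, p_b):
    """P(exactly one of two independent events occurs)."""
    return p_a * (1 - p_b) + (1 - p_a) * p_b


# Illustrative values only.
print(prob_correct_result(0.2, 0.9, 0.8))
print(prob_exactly_one(0.2, 0.7))  # uses the complements 0.8 and 0.3
```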
Question 8
There were very few completely correct answers to part (a) because of the graphs presented in part (a)(ii).
Candidates do not seem to have observed the emphasis given to the word "appropriate", because almost all
produced a totally inappropriate graph. As the variable is discrete, full credit could only be given where a
step polygon was drawn.

The first five parts of part (b) were generally well answered, though with occasional errors through the
misreading of scales. Good appreciation was shown in part (b)(vii) that there would be no change, but a
mark was frequently dropped in part (b)(vi) because the 5 minutes given in the question was absent from the answer.
Answers: (a)(ii) step polygon required (b)(i) 42 (b)(ii) 35 (b)(iii) 55 to 56 (b)(iv) 180 (b)(v) 6th or 7th
(b)(vi) increased by 5 minutes (b)(vii) unchanged
Question 9
The calculation of crude and standardised death rates is well known by most candidates, and there were
many good answers to the first three parts.
The explanatory parts were less well done. In part (iv), few focused on the population age structures, and in
part (v) it was usual to see only the first of the reasons given below, though credit was also given for the
observation that town B must have the healthier environment. In part (vi) there was widespread recognition
that the rate would not change, but incomplete explanation as to why this was so.
Answers: (i) p = 9, q = 40 (ii) 4.2 per thousand (iii) 7.3 per thousand (iv) the proportions of the
population of town A in the different age groups match exactly the proportions of the standard
population in the different age groups (v) town B has a larger population than town A; town B
has a much smaller group death rate amongst the elderly than town A (vi) value unchanged;
CDR is calculated using only total population and total deaths, and both would be unchanged
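The calculations, and the point made in the answer to part (iv), can be sketched with hypothetical figures. The age-group populations, deaths and standard populations below are assumptions for illustration and are not the data of the question.

```python
def crude_death_rate(deaths, populations):
    """Deaths per thousand, using only total deaths and total population."""
    return 1000 * sum(deaths) / sum(populations)


def standardised_death_rate(deaths, populations, standard_population):
    """Group death rates per thousand, weighted by the standard population."""
    group_rates = [1000 * d / p for d, p in zip(deaths, populations)]
    weighted = sum(r * s for r, s in zip(group_rates, standard_population))
    return weighted / sum(standard_population)


populations = [2000, 5000, 3000]  # hypothetical town, three age groups
deaths = [4, 15, 45]

cdr = crude_death_rate(deaths, populations)
sdr = standardised_death_rate(deaths, populations, [30, 50, 20])

# Standard population with the same age proportions as the town (20%, 50%, 30%).
sdr_matching = standardised_death_rate(deaths, populations, [20, 50, 30])

print(cdr, sdr, sdr_matching)
```

When the proportions of the standard population match those of the town exactly, the standardised rate equals the crude rate. Redistributing the same deaths among the age groups would alter the standardised rate but leave the crude rate unchanged.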
Question 10
A good proportion of candidates answered the first four parts well, with accurately plotted points and
accurately calculated averages, leading to a good line of best fit. But for others the fact that y decreased as
x increased resulted in a common error, it being assumed that the smallest values of x always had to be
paired with the smallest values of y, when calculating the semi-averages. This error meant that the location
of the plotted averages on the grid, and the line subsequently drawn through them, bore no relationship
whatsoever to the pattern of the plotted data. The line had a positive gradient when clearly the trend of the
data indicated the gradient should be negative. When this happened the candidate ought to have realised
something was wrong and paused for reflection, instead of continuing regardless.
In part (v) the accuracy instruction was sometimes ignored.
The best answers in part (vii) were those which illustrated the dangers of extrapolation with contextual
examples, commenting on the likely performance in this situation of very young children or elderly people.
Answers: (ii) overall (10.7, 18.7); lower (8, 23.7); upper (13.3, 13.7) (iv) gradient: value rounding to −1.9;
intercept: value rounding to 39 (v) 12 minutes (vi)(a) reasonably well (vi)(b) A (vii) would not
be valid for substantial extrapolation; for example, the line of best fit indicates an impossible time
of zero for someone who is about 20 years old
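The pairing error described above can be reproduced in a short sketch. The six data points are hypothetical. The semi-averages (8, 23.7) and (13.3, 13.7) used at the end are those quoted in the answers, and the function names are illustrative.

```python
def semi_averages(points):
    """Lower and upper semi-averages of (x, y) pairs, split by x.

    Whole pairs are sorted by x, so each y stays with its own x.
    Assumes an even number of points.
    """
    ordered = sorted(points)
    half = len(ordered) // 2

    def average(pairs):
        return (sum(x for x, _ in pairs) / len(pairs),
                sum(y for _, y in pairs) / len(pairs))

    return average(ordered[:half]), average(ordered[half:])


def line_through(p, q):
    """Gradient and intercept of the straight line through points p and q."""
    gradient = (q[1] - p[1]) / (q[0] - p[0])
    return gradient, p[1] - gradient * p[0]


# Hypothetical data in which y falls as x rises.
points = [(7, 25), (9, 22), (8, 24), (14, 12), (12, 16), (13, 14)]
lower, upper = semi_averages(points)
gradient, intercept = line_through(lower, upper)

# The error: x and y ordered separately, pairing small x with small y.
xs = sorted(x for x, _ in points)
ys = sorted(y for _, y in points)
wrong_lower, wrong_upper = semi_averages(list(zip(xs, ys)))
wrong_gradient, _ = line_through(wrong_lower, wrong_upper)

# Line through the semi-averages quoted in the answers.
reported_gradient, reported_intercept = line_through((8, 23.7), (13.3, 13.7))

print(gradient, wrong_gradient)
print(round(reported_gradient, 1), round(reported_intercept))
```

Keeping each pair intact gives a negative gradient in agreement with the data, whereas ordering x and y separately forces a positive one. The line through the quoted semi-averages reaches a time of zero at an x value a little above 20, which is the impossible result mentioned in the answer to part (vii).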
Question 11
The quality of answers to this question was variable. Even though basic computation of mean and standard
deviation was required, marks were routinely lost. Sometimes this was the result of calculation errors,
sometimes the result of using incorrect formulae.
In part (iv), as emphasised in the question, the results from part (iii) had to be used. Few candidates were
able to do this successfully. The few good answers seen used the 250 and 750 appropriately and obtained
the required values quickly and easily. Unsatisfactory answers went back to the original x values and started
again.
Answers: (i) 1250, 750, 2000, 3750, 7500 (ii) 8, 0, 5, 12, 27 (iii) 5.02, 85.9896 (iv)(a) 2005
(iv)(b) 5 374 350 (v) dollars squared
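The quick route in part (iv) can be sketched as below. It is assumed here that the values were coded by subtracting 750 and dividing by 250. The question itself is not reproduced in this report, so that reading of "the 250 and 750" is an inference, although it does reproduce the published answers.

```python
def decode_mean_and_variance(coded_mean, coded_variance, origin, scale):
    """Recover mean and variance of x from those of u = (x - origin) / scale."""
    mean_x = origin + scale * coded_mean       # mean is shifted and scaled
    variance_x = scale ** 2 * coded_variance   # variance is scaled only
    return mean_x, variance_x


# Results of part (iii), with the assumed origin 750 and scale 250.
mean_x, variance_x = decode_mean_and_variance(5.02, 85.9896, origin=750, scale=250)

print(round(mean_x, 2), round(variance_x, 2))
```

Because the variance is multiplied by the square of the scale factor and is unaffected by the origin, there is no need to return to the original x values.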

## Cambridge General Certificate of Education Ordinary Level

4040 Statistics November 2013
Principal Examiner Report for Teachers

STATISTICS
Paper 4040/22
Paper 22

Key Message
The most successful candidates in this examination were able both to calculate the required statistics and to
interpret their findings. In the numerical problems, candidates scoring the highest marks provided clear
evidence of the methods they had used in logical, clearly presented solutions. In questions requiring written
definitions, justification of given techniques and interpretation, the most successful candidates provided
detail in their explanations with clear thought given to the context of the problem, where appropriate.

In general, candidates did better on the questions requiring numerical calculations and graphical work than
on those requiring written explanations; in particular, candidates did well on the numerical and graphical
parts of Questions 1, 5, 7 and 10. Answers provided to questions requiring written explanations, such as
Questions 7(a)(i), 10(ii)(d) and 11(iv)(d), were sometimes too vague. Where candidates needed to provide
some interpretation of their calculated statistics, such as in comparing the interquartile ranges in Question
9(b)(iv), some otherwise strong candidates seemed to struggle.
Question 8, on probability, proved to be the least popular of the optional Section B questions, with each of
the remaining Section B questions proving equally popular.

Section A
Question 1
The majority of candidates were able to apply correctly the laws of probability relating to independent and
mutually exclusive events. The most common errors were for candidates simply to add the probabilities of A
and B in part (i)(b), without subtracting the intersection, and to multiply the probabilities of C and D in part
(ii)(a).
Answers: (i)(a) 0.03 (i)(b) 0.32 (ii)(a) 0 (ii)(b) 0.64
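The two laws tested here can be sketched as below. The individual probabilities are assumptions, chosen only so that the results agree with the published answers, and should not be read as the data of the question.

```python
def p_both_independent(p_a, p_b):
    """P(A and B) for independent events."""
    return p_a * p_b


def p_either(p_a, p_b, p_both):
    """P(A or B): the intersection is subtracted so it is not counted twice."""
    return p_a + p_b - p_both


# Assumed values for independent events A and B.
p_a, p_b = 0.15, 0.20
both_ab = p_both_independent(p_a, p_b)
either_ab = p_either(p_a, p_b, both_ab)

# Assumed values for mutually exclusive events C and D: they cannot both occur.
p_c, p_d = 0.40, 0.24
both_cd = 0.0
either_cd = p_either(p_c, p_d, both_cd)

print(round(both_ab, 2), round(either_ab, 2), both_cd, round(either_cd, 2))
```

Multiplying the probabilities of C and D would treat them as independent, which mutually exclusive events with non-zero probabilities cannot be.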
Question 2
In part (i) of this question, a new value was being added to a set of data and candidates were asked to
explain the effect on the mean and the standard deviation. Many candidates stated, incorrectly, that the
mean would increase and that the standard deviation would stay the same. Such candidates had confused
adding a single value to the set of data items with adding a constant to each data item. In
part (ii) the concept being tested was the effect on the mean and standard deviation of adding to each item a
constant and of multiplying each item by a constant. Some candidates, incorrectly, assumed that the
addition of the bonus would affect the standard deviation.
Answers: (i) Stay the same, decrease (ii) 12800, 1050
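The distinction between the two parts can be checked with a short sketch. The data, the 5% increase and the bonus of 200 are hypothetical. The first case assumes the new value equals the existing mean, which is one situation consistent with the published answer to part (i).

```python
from statistics import mean, pstdev

data = [4, 6, 8, 10, 12]   # hypothetical data with mean 8

# Part (i): ONE new value, equal to the mean, joins the data set.
extended = data + [8]
print(mean(data), mean(extended))      # mean is unchanged
print(pstdev(data), pstdev(extended))  # standard deviation decreases

# Part (ii): EVERY item is multiplied by a constant and has a constant added.
transformed = [1.05 * x + 200 for x in data]
print(mean(transformed))    # 1.05 * mean + 200
print(pstdev(transformed))  # 1.05 * standard deviation; the 200 has no effect
```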

Question 3
There were some good attempts at this question, with many candidates producing well organised solutions.
Some candidates got incorrect probabilities, but were nonetheless able to use expected values to decide
whether or not the game was fair. A few candidates, incorrectly, attempted to compare probabilities, rather
than expected values.
Answers: (i) , , fair game (ii) \$3
Question 4
Many candidates struggled to deal with the times in this question. It was necessary to find the mean number
of minutes early/late for the two groups of candidates before trying to combine them. In part (ii) many
candidates were able to quote the correct formula for standard deviation, but again they frequently used
times rather than the number of minutes late in this formula.
Answers: (i) 36, 8.59 (ii) 11.9
Question 5
Most candidates were able to use the change chart, together with the figures provided, to calculate the
quantities of the commodities produced in 2012. They then, usually successfully, displayed this information
in the form of a dual bar chart. A mark was lost by some candidates for insufficient labelling of the vertical
axis, where it was necessary to state that the units were millions of tonnes. In part (iii) some candidates did
not explain sufficiently clearly that the advantage of a dual bar chart over a change chart is that the original
data is not lost.
Answers: (i) 80.7, 96.8, 22.1, 17.7
Question 6
Most candidates correctly identified the heights of the players as continuous, quantitative data and the towns
of birth of the players as discrete, qualitative data. In part (b), the majority of candidates were able to identify
the chart correctly as a sectional, component or composite bar chart, but many did not recognise that this
chart was more appropriate than a histogram, as the data presented here is discrete. Many candidates
simply stated that the sectional bar chart was easier to understand than a histogram. In part (b)(iii), it was
common to see the answer given as simply the number of matches played in the cup in which the team
scored 2 or more goals, rather than this expressed as a fraction of the total number of matches played in the
cup. The denominator of 11 was frequently incorrect or missing entirely.
Section B
Question 7
In part (a)(i), it was necessary for candidates to consider the merits of obtaining moving average values in
this particular situation. Therefore they needed to consider whether the number of visitors at a tourist
attraction is likely to be subject to seasonal variation, and to conclude that this is likely. Many candidates
simply stated, in general terms, the purpose of calculating moving average values, without relating their
comments to the particular situation identified. Parts (a)(ii) and (iii) were completed correctly by many
candidates with a few, incorrectly, giving an answer of 3 for part (ii).
The calculations in parts (b)(i) and (ii) were completed correctly by most candidates and the graph plots in
part (iii) were mostly accurate, with a suitable trend line drawn. Most candidates correctly interpreted the
trend line in the context of the problem presented. In part (v), some candidates did not take the reading from
the trend line at the correct place and others did not subtract 11.25 from their reading. The most common
error, however, was not to give the final estimate of the number of patients admitted to the hospital as a
whole number.
Answers: (a)(ii) 4 (b)(i) 168, 1308, 218 (b)(ii) 213.5, 215, 215.5, 217, 219.25, 221.25
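The calculation of centred moving average values can be sketched as below, using hypothetical quarterly figures with a four-point cycle. These are not the data of the question, and the function names are illustrative.

```python
def moving_averages(values, n):
    """n-point moving averages of a series."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


def centred_moving_averages(values, n):
    """For an even n, average adjacent moving averages so that each
    result lines up with an actual time point."""
    averages = moving_averages(values, n)
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


quarterly = [20, 30, 50, 24, 22, 34, 54, 26]  # hypothetical, two years

print(moving_averages(quarterly, 4))
print(centred_moving_averages(quarterly, 4))
```

Centring is needed only when the number of points in the cycle is even, because the uncentred averages then fall midway between time points.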

Question 8
Most candidates were successful with part (i) of this question, although it was surprisingly common to see
incorrect responses of 60/100 and 40/130 for parts (a) and (d), respectively. There were many fully correct
solutions seen to part (ii), with some errors caused by some candidates unnecessarily trying to consider the
males and females separately and omitting some of the possible combinations. Part (iii) was more
challenging, but some good attempts were seen with the most common error being multiplication by 3
instead of 6. Most candidates found part (v) the most challenging, although some fully correct solutions were
seen. Some candidates were unable to attempt the final part of this question and a common incorrect
answer seen in part (v)(a) was 4/13.
Answers: (i)(a) 1/5 (i)(b) 8/15 (i)(c) 23/30 (i)(d) 2/7 (ii) 105/299 (iii) 700/3427 (iv) 4, 3, 6
(v)(a) 40/121 (v)(b) 34/121
Question 9
Candidates found part (a), and in particular part (a)(i), of this question difficult. A common error seen in part
(a)(i) was for the true class limits to be given as 50 and 60. Candidates who were successful with parts
(a)(ii) and (iii) usually went on to complete the numerical parts of (b) correctly.
In part (b)(i), many candidates correctly chose the median and explained that it is not affected by extreme
values. Almost all candidates found the correct cumulative frequencies in (b)(ii) and most then tried to find
the 25th and 75th values, as required for the interquartile range in (b)(iii). Most candidates then applied the
correct formula, but many used wrong values for class boundaries and class widths. In part (b)(iv), it was
necessary for candidates to interpret this value. Many thought that a smaller interquartile range indicated
smaller masses in general, rather than a smaller dispersion of the masses. Again in part (b)(v), incorrect
identification of class boundaries led to wrong answers for some who used a correct approach. Common
wrong methods involved trying to divide the whole population proportionately, rather than just the 450 to 750
group.
Answers: (a)(i) 50, 61 (a)(ii) 49.5, 60.5 (a)(iii) 45, 65 (b)(ii) 12, 43, 72, 86, 94, 98, 100 (b)(iii) 480
(b)(v) 62.3
Question 10
Most candidates demonstrated a good understanding of price relatives in their answers to this question. The
numerical parts of (i), namely parts (a) and (c), were usually correct. In part (i)(b), candidates needed to
explain that the price relative of 110 for equipment indicates that the price or cost has increased by 10%
between 2010 and 2012. A few stated incorrectly that it indicated that the expenditure had increased by
10%. In part (i)(d), most candidates correctly drew a two-way table, with values of 100 for each category in
2010 and the price relatives that they had calculated for 2012.
Again it was the numerical parts of (ii), (b) and (c), which candidates found the most straight-forward, and
many fully correct solutions were seen. In part (a), it was necessary to explain that the ratio of the
expenditure on the three different categories could be used to calculate the weights. In part (d), candidates
needed to consider the reliability of the result they had achieved and also what might contribute to an
unreliable result. In the particular context provided, these reasons might have been that the number of
employees, or number of hours worked, had changed or that the amount of equipment used had changed.
These features had not been considered within the calculation of the weighted aggregate cost index. Some
candidates gave incorrect answers, such as that there might have been inflation; this is a feature included
within the figures for the price relatives, and thus not a potential source of inaccuracy. Other answers which
did not gain credit were those which were too vague and did not relate specifically to the problem presented,
such as simply that the weights may be incorrect. It was necessary to provide a reason as to why this might
be the case.
Answers: (i)(a) 107 (i)(c) 98 (ii)(b) 104 (ii)(c) 5990
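The way in which weights enter the index can be sketched as below. The price relatives 107, 110 and 98 are taken from this report, but the weights are hypothetical stand-ins for the expenditure ratio, so the result is not expected to reproduce the published answer to part (ii)(b).

```python
def weighted_index(price_relatives, weights):
    """Weighted average of price relatives, the weights reflecting expenditure."""
    weighted_sum = sum(r * w for r, w in zip(price_relatives, weights))
    return weighted_sum / sum(weights)


price_relatives = [107, 110, 98]  # 2012 relative to 2010 = 100
weights = [5, 2, 3]               # hypothetical expenditure ratio

index = weighted_index(price_relatives, weights)
print(index)
```

The index reflects price changes only. If the quantities used change, for example the number of hours worked or the amount of equipment, the fixed weights no longer describe the expenditure, which is the kind of reason sought in part (ii)(d).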

Question 11
In part (i), candidates needed to consider the reliability of the sampling technique described. They needed to
think about the specific situation and consider who might be waiting for a bus at 7 am on a Monday morning.
In the best answers, the candidates described the method as not representative of the whole population,
because it was likely to contain a group of people such as workers or college candidates who are likely to
have similar requirements in terms of the buses they want to catch.
In part (ii), candidates needed to explain the need to number the population list from either 00 to 59 or 01 to
60 before selecting a simple random sample. Some candidates, who described writing names on pieces of
paper and drawing them from a hat, did not appear to have read the question carefully enough. There was a
high proportion of correct answers to (ii)(b), with the most common errors being the inclusion of both 00 and
60 or the inclusion of 15 twice.
Many candidates were able to find the correct systematic sample in part (iii) although, as with the previous
part, some candidates did not read the wording of part (a) sufficiently clearly and described the process for
selecting the whole sample rather than giving a sufficiently detailed description of the selection of the first
term. It was necessary to state that the first term came from a number between 00 and 09 (or between 01
and 10) randomly selected from the table.
In the final sampling method, stratified sampling, the numbering of the people within the age groups needed
to be considered. Some candidates appeared to be using the ages themselves for the numbering of the
groups, and thus it was common to see 07 excluded and 82 and 16 included in the stratified sample. The
correct numbering that should have been given to each of the age groups in the question was 00 to 19, 20 to
49 and 50 to 59 (or 01 to 20, 21 to 50 and 51 to 60) respectively. In part (iv)(d), it was necessary to explain
the benefit of a sample stratified by age group when considering the views of the population to the proposed
bus timetable change. Thus it was necessary to consider the relevance of age to this particular problem. In
the best answers, candidates explained that the different age groups are likely to want buses at different
times. Most candidates made a general comment about stratified samples not being biased, which did not
relate specifically to the problem that had been presented. Only the very strongest candidates scored this
final mark on the paper.
Answers: (ii)(b) 15, 08, 00, 31, 52, 47 or 15, 08, 60, 31, 52, 47 (iii)(b) 04, 14, 24, 34, 44, 54
(iv)(a) 2, 3, 1 (iv)(c) 17, 55, 25, 07, 35, 42
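The systematic sample and the stratified allocation in the answers can be reproduced in a short sketch. The stratum sizes of 20, 30 and 10 follow from the numbering 00 to 19, 20 to 49 and 50 to 59 described above. The starting number 04 is taken from the published answer, where in practice it would be read from the random number table, and the function names are illustrative.

```python
def systematic_numbers(start, interval, population_size):
    """Two-digit labels of the items selected, beginning at `start`."""
    return [f"{n:02d}" for n in range(start, population_size, interval)]


def proportional_allocation(stratum_sizes, sample_size):
    """Number to select from each stratum, in proportion to its size.

    Rounding is adequate only when the shares are whole numbers, as here.
    """
    total = sum(stratum_sizes)
    return [round(sample_size * size / total) for size in stratum_sizes]


# Population of 60, sample of 6, so the sampling interval is 10.
print(systematic_numbers(start=4, interval=10, population_size=60))

# Age groups numbered 00 to 19, 20 to 49 and 50 to 59.
print(proportional_allocation([20, 30, 10], sample_size=6))
```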

## Cambridge General Certificate of Education Ordinary Level

4040 Statistics November 2013
Principal Examiner Report for Teachers

STATISTICS
Paper 4040/23
Paper 23

Key Message
The most successful candidates in this examination were able both to calculate the required statistics and to
interpret their findings. In the numerical problems, candidates scoring the highest marks provided clear
evidence of the methods they had used in logical, clearly presented solutions. In questions requiring written
definitions, justification of given techniques and interpretation, the most successful candidates provided
detail in their explanations with clear thought given to the context of the problem, where appropriate.

In general, candidates did better on the questions requiring numerical calculations than on those requiring
written explanations; in particular, candidates did well on the numerical parts of Questions 4, 6, 7, 8 and, for
those who attempted it, Question 9. It was particularly pleasing to see, in Question 9(v), clearly laid out
logical solutions, as these were essential in this 8-mark question. Answers to questions requiring written
explanations, such as Questions 5(i) and 10(a)(ii), were sometimes too vague.
In Questions 6(ii), 7(iv)(b), 11(b)(iii) and 11(b)(iv) it was necessary for candidates to provide some
interpretation of their calculated statistics and graphs. In Questions 6(ii) and 11(b)(iv), some otherwise
strong candidates seemed to struggle to interpret and explain their results.
Question 9, on probability and expectation, proved to be the least popular of the optional Section B
questions, although those that attempted it generally scored high marks. Question 7 on linear interpolation
proved to be the most popular of the optional questions.

Section A
Question 1
The majority of candidates were able to correctly identify the number of items of mail as a discrete variable
and the distance run by a number of athletes during 1 hour as a continuous variable, although there were
slightly more errors in part (ii) than part (i). In parts (iii) and (iv), it was necessary to give true lower and
upper class limits for each of these variables, with candidates often being more successful in identifying the
lower than the upper class limit.
Answers: (i) Discrete (ii) Continuous (iii) 5 and 9 (iv) 4.5 and 9.5
Question 2
There were some good attempts by many candidates in part (i) to explain the term base year. Many scored
one of the two available marks, by describing the base year as a reference point or the point in time to which
other index numbers are referred. The second mark was for stating that it is when an index number takes
the value 100, and it was less common to see this part of the answer given.
In part (ii), many candidates found it difficult to express the meaning of the term weight in this context. A
few candidates were able to say that weights were used to calculate a weighted cost index, but not that it is a
means of taking into account the different levels of importance of items in an index number. For the second
mark, it was necessary to state that the weight is usually based on the expenditure on each item and this
was rarely seen.

As with part (i), it was common for candidates to score one mark in part (iii). It was necessary to describe
the term price relative as the ratio of two prices, or as showing the proportional or percentage change in the
price of an item, for the first mark and then, for the second mark, to state that this is relative to the base year.
Question 3
It was rare to see the correct answer to part (i), that the samples of male and female mice are of different
sizes. Some candidates, incorrectly, stated that the division should be by 45 rather than 2, because there
are 45 mice.
In part (ii), errors in the calculation of the mean included candidates multiplying the sums of lengths by the
respective frequencies and use of the incorrect formula from part (i). In the calculation of the standard
deviation, some candidates used their mean correct to only one decimal place, when a greater degree of
accuracy was required in order to find the standard deviation correct to one decimal place.
Question 4
The majority of candidates were successful with this question, particularly parts (i) and (ii). In part (iii), it was
quite common to see an incorrect initial equation, often with only one occurrence of the unknown, rather than
two.
Answers: (i) 136 (ii) 30.75 (iii) 26
Question 5
Correct items of detail missing from the bar chart for part (i) are the key/legend, the vertical scale, the label
on the vertical axis and the title. Some candidates gave answers which were too vague, such as labels or
axes. The vast majority of candidates scored at least one of the two marks available for this part of the
question.
In part (ii), candidates needed to describe the bar chart as either Sectional, Component or Composite. The
most common error was to see it described as a stacked bar chart.
Most candidates correctly noted, in part (iii), that had the bar chart illustrated percentages then the bars
would have been of equal heights.
In part (iv), many candidates correctly stated the disadvantage of using pictograms to illustrate frequencies
as the difficulty of determining the exact frequency when partial pictures are used. Some however incorrectly
referred to the difficulty of construction of a pictogram, rather than considering the impact of the diagram
once constructed.
In part (v), most candidates correctly identified that nothing had changed and therefore a change chart could
not be used.
Question 6
The most common error in part (i) was to see the weighted cost index given as the final answer, rather than
the percentage increase. There were however many fully correct solutions and the majority of candidates
scored at least three of the four available marks.
There were more difficulties, however, with part (ii), with surprisingly few candidates appreciating the need to
compare 5.56% with 7% in order to establish that the farm was more profitable than it had been, because the
income had increased by more than the costs.

Section B
Question 7
Parts (i), (ii), (iii) and (iv)(a) of this question were completed successfully by the majority of candidates.
Some candidates did not appear to understand what they were being asked to do in part (iv)(b). Many did
not find the distances of the quartiles from the median. Those that did were often able to state correctly that
these distances were approximately equal, and that this is what would be expected in a distribution of
In part (v), some candidates were able to identify correctly that the curve would be steepest in the middle of
the graph, where the frequency is greatest or where the greatest change in the cumulative frequency occurs.
The most common error was to identify the part of the graph where the greatest change in frequency occurs.
Answers: (i) 144, 1376, 3589, 6148, 7857, 8562, 8585 (ii)(a) 170 under 175 class (ii)(b) 171.4
(iii)(a) 165 under 170 class (iii)(b) 166.7 (iv)(a) 9.2
Question 8
Most candidates were successful with part (i) of this question, realising the need to include the factor of
(the probability of selecting each bag) within their calculations.
In part (ii), some candidates did not realise that the selection of two balls implies that they will not be
replaced and hence the most common error was to see denominators of 11 and 11 in the two two-factor
products, rather than 11 and 10.
Parts (iii), (iv) and (v) were usually correctly calculated by those who attempted them, although by part (v)
some candidates were leaving this question blank.
In order to answer part (vi) of this question, it was necessary to notice that the probability that the first ball
will be white is the same in each situation. Thus the comparison can focus on the fact that, with fewer balls
in X than Z, the probability of the second ball being white is greater in part (iv) than in part (v).
Answers: (i) 5/11 (ii) 31/55 (iii) 21/121 (iv) 3/55 (v) 2/77
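The point about denominators in part (ii) can be illustrated with a short sketch. The bag contents used here (5 white balls out of 11) are hypothetical and are not taken from the question paper; only the total of 11 comes from the comment above. The sketch contrasts selection without replacement (denominators 11 and 10) with the erroneous selection with replacement (denominators 11 and 11).

```python
from fractions import Fraction

def both_white(white, total, replace):
    """Probability that two balls drawn from one bag are both white."""
    first = Fraction(white, total)
    if replace:
        # Erroneous model: the first ball is put back, so nothing changes.
        second = Fraction(white, total)
    else:
        # Correct model for "two balls are selected": one white ball has gone.
        second = Fraction(white - 1, total - 1)
    return first * second

# Hypothetical bag: 5 white balls among 11 (illustration only).
without_replacement = both_white(5, 11, replace=False)
with_replacement = both_white(5, 11, replace=True)
print(without_replacement, with_replacement)
```

With these assumed numbers the two models give different results, which is why the choice of the second denominator matters.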
Question 9
Most candidates were able to correctly find the probability of getting three 1s in part (i).
In part (ii), the most common error was for candidates to calculate the probability of any three numbers
except 1, rather than three of the same number except 1.
In part (iii), some candidates did not consider the number of ways in which exactly two 1s and some other
number could be achieved; however, the majority correctly multiplied by a factor of three.
Most candidates correctly calculated the prize multiplied by the probability for each outcome and summed
their results in part (iv). Some candidates, however, left this as their final answer or subtracted the entry fee
of \$1 from this amount, rather than considering the profit of the organiser and performing the subtraction the
other way around.
Clearly set out working was essential in part (v). Most candidates achieved this, with the most common error
being that some did not appreciate that the essential difference between this game and the previous one
was that the selected cards are not replaced. Thus, for example, denominators of 10, 10 and 10 were
sometimes seen instead of 10, 9 and 8. Such candidates did, however, usually appreciate the fact that the
only possible ways of getting three cards the same are three 1s or three 2s. Another common error was for
the factor of 3 to be missing from the probability of exactly two 1s and some other number. The method for
the calculation of expectation was usually correct. Those candidates that attempted this part of the question
were usually able to achieve at least 3 of the available marks, with some scoring all 8 marks. Some
candidates, however, did not attempt this part of the question.
Answers: (i) 1/216 (ii) 5/216 (iii) 5/72 (iv) 67 cents (v) 61 cents
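The answers to parts (i) to (iii) can be checked by listing every outcome. The sketch below assumes three independent selections, each equally likely to give one of the numbers 1 to 6; this model is an assumption that agrees with the published answers, not a quotation from the question paper. The function name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All 6 x 6 x 6 = 216 equally likely outcomes (assumed model).
outcomes = list(product(range(1, 7), repeat=3))

def prob(event):
    """Exact probability of an event, given as a test on one outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

# (i) three 1s
p_three_ones = prob(lambda o: o == (1, 1, 1))
# (ii) three of the same number, that number not being 1
p_three_same_not_one = prob(lambda o: len(set(o)) == 1 and o[0] != 1)
# (iii) exactly two 1s and some other number, in any order
p_exactly_two_ones = prob(lambda o: o.count(1) == 2)

print(p_three_ones, p_three_same_not_one, p_exactly_two_ones)
```

Counting outcomes in part (iii) finds 15 favourable cases, which is the factor of three (positions for the other number) multiplied by the five possible other numbers.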

Question 10
Most candidates correctly named the sampling method as quota sampling in part (a)(i).
Answers to part (a)(ii) were often given in terms that were too general, such as to state simply that using a
list of registered voters would be biased, without providing a reason. Candidates needed to state that not all
inhabitants would appear in a list of voters.
Common incorrect answers of 17 and 1 were seen in parts (a)(iii)(a) and (a)(iii)(b) respectively, where
candidates had ignored the second and third visits to the properties.
In part (a)(iv), table headings were often incorrect with dwelling numbers 1 to 12, rather than number of
children 0 to 7, appearing as one of the row/column headings. Dwelling types D, S and B usually correctly
In part (a)(v), candidates needed to use the fact that the bigger the sample, the more accurate it is likely to
be. Some candidates incorrectly chose semi-detached houses over detached houses, because they said
that there were too many children in the detached houses.
In part (b), the simple random and stratified samples were usually correct. In part (b)(ii), some candidates
were picking values from the random number table at regular intervals, rather than selecting just the first
value for the systematic sample from the random number table and then selecting every tenth person.
Answers: (a)(iii)(a) 23 (a)(iii)(b) 3 (b)(i) 21, 32, 07, 42, 57, 59 (b)(ii) 07, 17, 27, 37, 47, 57
(b)(iii) 21, 32, 07, 42, 57, 17
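The systematic sample in part (b)(ii) can be sketched as follows: only the starting value comes from the random number table, and every tenth person is then taken. The function name is illustrative; the start of 07, the interval of 10 and the sample size of 6 are read from the answers above.

```python
def systematic_sample(start, step, size):
    """Return `size` labels beginning at `start` and rising by `step`."""
    return [start + step * k for k in range(size)]

# Start value 07 taken from the random number table, then every tenth person.
sample = systematic_sample(start=7, step=10, size=6)
labels = [f"{n:02d}" for n in sample]
print(labels)
```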
Question 11
Correct responses to part (a)(i)(a) described the trend as the long-term pattern after regular variations have
been removed. The key to scoring the mark was to describe the increase/decrease as "long-term", "general"
or "over time".
In part (a)(i)(b), some candidates described seasonal components rather than seasonal variation. The key
here was to include the idea of variation that repeats itself, such as describing a regular variation over a fixed
(relatively short) time period.
The majority of candidates were unable to explain the meaning of cyclic variation in part (a)(i)(c). The key
here was to describe long-term variation following a general pattern, but over variable lengths of time.
In part (a)(ii), a common error was to state that the trend, rather than the seasonal variation, is removed from
a time series when moving averages are calculated. Candidates were rarely able to explain that this is done
by smoothing out the variations over one time period.
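The smoothing effect described for part (a)(ii) can be seen in a small sketch. The quarterly figures below are hypothetical and are not the data from the question; a centred four-point moving average is assumed because the data are quarterly.

```python
def centred_moving_average(values, period=4):
    """Centred moving average for an even period such as 4 quarters."""
    # Simple moving averages over each run of `period` consecutive values.
    simple = [sum(values[i:i + period]) / period
              for i in range(len(values) - period + 1)]
    # Centring: average each adjacent pair of simple moving averages.
    return [(a + b) / 2 for a, b in zip(simple, simple[1:])]

# Hypothetical sales: a repeating quarterly pattern on a steadily rising level.
sales = [10, 20, 30, 40, 14, 24, 34, 44]
trend = centred_moving_average(sales)
print(trend)
```

Because each average spans exactly one full year, the repeating quarterly pattern cancels out and only the steady rise remains in the resulting values.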
The calculations in part (b)(i) were usually all correct, with accurate plots in part (b)(ii).
In part (b)(iii), most candidates correctly stated that the sales rose initially, but then declined thereafter.
Many candidates did not see that a single trend line was not appropriate in part (b)(iv). Those that did
recognise this were not always able to express the reason as being because the trend in the early quarters
was very different from that in the later part of the period.
Suitable trend lines, ignoring the plots before 2007, were drawn by many candidates for part (b)(v). Some
candidates incorrectly drew lines that included the early plots and others ignored the instruction to draw a
straight line and drew curves.
In part (b)(vi), most candidates correctly identified that the seasonal component for quarter one was negative
and therefore sales are likely to be smaller than indicated by the trend line.
Answers: (b)(i) x = 105, y = 205, z = 22.5


## Cambridge International Examinations

Cambridge Ordinary Level

* 5 2 6 0 3 0 4 6 5 5 *

4040/12

STATISTICS
Paper 1

October/November 2014
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use an HB pencil for any diagrams or graphs.
Do not use staples, paper clips, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 17 printed pages and 3 blank pages.

DC (LK/SLM) 102872/4 R
UCLES 2014

[Turn over

Section A [36 marks]
Answer all of the questions 1 to 6.

In an industrial process, readings, x, of a particular gauge are recorded regularly. For 6 such
recorded readings it is found that Σx = 279 and Σx² = 13 093.
(i)

## Find the mean and standard deviation of x.

Mean = .......................................................
Standard deviation = ...................................................[4]
It was discovered later that one of the readings had been incorrectly recorded as 43, when in fact
(ii)

State, for each of the mean and standard deviation, whether its correct value will be smaller
than, larger than, or the same as the value found in part (i).
Mean .......................................................
Standard deviation ...................................................[2]
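A minimal sketch of the calculation in part (i), reading the two totals given for the 6 readings as the sum of x (279) and the sum of x squared (13 093). It assumes the standard deviation is found by dividing by n, using variance = (sum of x squared)/n minus the square of the mean; the function name is illustrative.

```python
from math import sqrt

def mean_and_sd(n, sum_x, sum_x2):
    """Mean and standard deviation (dividing by n) from summary totals."""
    mean = sum_x / n
    variance = sum_x2 / n - mean ** 2
    return mean, sqrt(variance)

mean, sd = mean_and_sd(n=6, sum_x=279, sum_x2=13093)
print(round(mean, 2), round(sd, 2))
```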

UCLES 2014

4040/12/O/N/14

2

A student calculated, correctly, five statistical measures for a set of data. The five values he
obtained were, in ascending order, 6, 36, 43, 48 and 53.
(i)

## Insert these values in their correct positions in the table below.

Statistical measure

Value

Median
Lower quartile
Upper quartile
Standard deviation
Variance
[5]
(ii)

State the value of the 75th percentile for the student's original set of data.
....................................................[1]

3

A national government plans a survey to obtain the responses of its citizens to its proposal to build
wind farms as sources of renewable energy.
(i)

A  Questionnaires will be mailed to 1000 citizens.

B  Face to face interviews will be conducted with a total of 1000 citizens in shopping centres
in different parts of the country.

C  A questionnaire will be placed on the Internet inviting responses from anyone.

D  Telephone calls will be made to 1000 citizens chosen from the telephone directory.

(a) Explain why method D would produce a biased sample.

...........................................................................................................................................
.......................................................................................................................................[1]
(b) Give one advantage of method B over method A.
...........................................................................................................................................
.......................................................................................................................................[1]
...........................................................................................................................................
.......................................................................................................................................[2]
(ii)

## A closed question which will be asked in the survey is as follows.

Are you in favour of wind farms being built in your area?
Yes
No

...........................................................................................................................................
.......................................................................................................................................[1]
(b) Write down one open question which could be asked in the survey.
...........................................................................................................................................
...........................................................................................................................................
.......................................................................................................................................[1]
4

The following diagram is to show the number of patients at a medical centre who have received
vaccine against one or more of the diseases polio, cholera and typhoid.
[Venn diagram: three overlapping circles labelled Polio, Cholera and Typhoid, with the values 24, 30 and 17 entered]

(i)

## Complete the diagram using the following information.

(a) The number of patients who have received only cholera vaccine is 5 fewer than the
number of patients who have received only polio vaccine.
[1]
(b) The number of patients who have received only typhoid vaccine is two thirds of the
number of patients who have received polio and cholera vaccines but not typhoid vaccine.
[1]
(c) The number of patients who have received polio and typhoid vaccines but not cholera
vaccine is the same as the number of patients who have received all three vaccines. [1]
(d) Twice as many patients have received typhoid and cholera vaccines but not polio vaccine
as have received all three vaccines.
[1]

(ii)

Find the mode of the number of these vaccines received by these patients.

....................................................[2]

5

The table below gives information on the gender of, and number of books written by, 40 authors
attending a book fair.
Number of books written

1–5

6–10

11–15

More than 15

TOTAL

Male

15

Female

12

25

TOTAL

17

14

40

## One of these authors is chosen at random to speak at the opening ceremony.

Find the probability of choosing
(i)

a male,

....................................................[1]
(ii)

## a female who has written 11 or more books,

....................................................[1]
(iii)

an author who has written 6–10 books, given that the author is male.

....................................................[1]
Two authors are chosen at random to lead discussion groups.
(iv)

## Find the probability that both have written 5 or fewer books.

....................................................[3]

6

A police camera at the side of a road measures the speed, in km/h, of every vehicle travelling on
The following histogram represents the information it recorded over a certain period of time.
[Histogram: vertical axis "Number of vehicles per 10 km/h", scale up to 50; horizontal axis "Vehicle speed (km/h)", scale from 20 to 120]

Use the histogram to find, for this period of time, the number of vehicles whose speeds were
(i)

## from 50 km/h to under 80 km/h,

....................................................[2]

(ii)

## from 80 km/h to under 100 km/h,

....................................................[2]
(iii)

under 50 km/h.
....................................................[1]

The speed limit on this road is 100 km/h. Any driver of a vehicle travelling at a speed which is
5 km/h or more greater than the speed limit must pay a fine.
(iv)

Estimate the number of drivers represented by this information who had to pay a fine.

....................................................[1]

Section B [64 marks]
Answer not more than four of the questions 7 to 11.
Each question in this section carries 16 marks.

In this question calculate all accident rates per thousand. Where values do not work out
The table below gives information on the number of employees, and the number of accidents
they suffered, at a building construction company, Kwikbuild, in the year 2012. It also shows the
standard population for the building construction industry.

| Job group | Number of accidents | Number of employees | Job group accident rate | Standard population (%) |
|---|---|---|---|---|
| Management | | 25 | | |
| Office | | 167 | | 35 |
| Site Supervision | | 40 | | 12 |
| Site Labour | 37 | 228 | | 45 |

(i)

## Calculate the crude accident rate for Kwikbuild.

....................................................[4]
(ii)

Calculate the accident rate for each job group and insert the values in the table above.

[2]

(iii)

Use your results from part (ii) to calculate the standardised accident rate for Kwikbuild.

....................................................[4]
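The three rates asked for in parts (i) to (iii) can be sketched as follows. All figures in this sketch are hypothetical and are not the Kwikbuild data; the function names are illustrative. The crude rate uses the totals, each group rate uses that group's own figures, and the standardised rate weights the group rates by the standard population percentages.

```python
def crude_rate(accidents, employees):
    """Accidents per thousand employees over the whole company."""
    return 1000 * sum(accidents) / sum(employees)

def group_rates(accidents, employees):
    """Accidents per thousand employees within each job group."""
    return [1000 * a / e for a, e in zip(accidents, employees)]

def standardised_rate(rates, standard_pct):
    """Group rates weighted by the standard population percentages."""
    return sum(r * p for r, p in zip(rates, standard_pct)) / 100

# Hypothetical company with three job groups (illustration only).
accidents = [1, 9, 10]
employees = [50, 150, 200]
standard_pct = [20, 30, 50]  # must total 100

crude = crude_rate(accidents, employees)
rates = group_rates(accidents, employees)
standardised = standardised_rate(rates, standard_pct)
print(crude, rates, standardised)
```

In this made-up case the standardised rate is lower than the crude rate because the company has relatively more employees in the higher-rate groups than the standard population does.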
Fastbuild is another building construction company. In 2012 its crude and standardised accident
rates were 109.4 and 98.7 per thousand respectively.
(iv)

State, with a reason, which of the two companies most likely operates in the safer environment.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

For each company some of the accidents suffered by employees were classed as serious, and
they all occurred in the Site Labour job group.
The table below gives information on the serious accidents suffered at the two companies.

Job group

Company

Number of serious
accidents

Number of
employees

Kwikbuild

228

Fastbuild

154

Site Labour

(v)

Calculate, for each company, for the Site Labour job group only, the serious accident rate, and
hence state the company where an employee is less likely to suffer a serious accident.

....................................................[2]
(vi)

State, with a reason, whether the values you have calculated in part (v) are crude or
standardised rates.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

8

A running club holds a cross-country race. Competitors enter in either the junior or senior age
category. When they enter, they also choose to follow one of three routes: easy, moderate or
challenging. Information about the number of competitors and the routes chosen is shown below.
[Pictogram: "Number of competitors entering the race by age category"; key: one symbol = 10 junior competitors, one symbol = 10 senior competitors]

[Two pie charts: "Percentages of junior entrants by choice of route" and "Percentages of senior entrants by choice of route"; sectors labelled Easy, Moderate and Challenging; percentages shown: 10%, 25%, 55%, 35%, 45%, 30%]
(i)

## Find the total number of competitors who entered the race.

....................................................[1]

(ii)

Show that there were 42 junior competitors who chose the moderate route.

[1]
(iii)

Find the number of senior competitors who chose the easy route.

....................................................[2]

Not all the entrants completed the race. The times taken by those who did complete the race are
shown in the table below.
Number of competitors
Completion time
(minutes)

Junior

Senior

Easy

Moderate

Challenging

Easy

Moderate

Challenging

60 under 90

16

19

90 under 120

29

12

32

15

18

15

20

17

14

## 150 under 180

11

16

65

39

10

71

46

37

TOTAL

(iv)

Find the number of competitors who entered the race but did not complete it.

....................................................[3]
(v)

Estimate, to the nearest minute, the mean time taken by senior competitors who completed
the challenging route.

....................................................[3]
(vi)

Of the junior competitors who completed the race in 2 hours or more, find the percentage who

....................................................[3]
(vii)

Of all the senior competitors who had chosen the moderate route, find the percentage who
completed the race in under 2 hours.

....................................................[3]
9

One way to determine if an adult has a healthy weight, independent of age and gender, is to
measure their body mass index, BMI (a continuous variable).
The BMI values for the adult population of a particular country in the years 1980 and 2010 are
summarised in the cumulative frequency polygons below.
[Cumulative frequency polygons for 1980 and 2010: vertical axis "Cumulative frequency" (population), scale from 0 to 100; horizontal axis "BMI", scale from 15 to 40]
Use these graphs to answer the following questions on the adult population of this country.
(i)

Estimate
(a) the median BMI value in 1980,
....................................................[1]
(b) the median BMI value in 2010,
....................................................[1]
(c) the lower quartile BMI value in 1980,
....................................................[1]
(d) the upper quartile BMI value in 2010.
....................................................[1]

An adult's weight is classified as healthy if their BMI value is between 18.5 and 25.
(ii)

Estimate the percentage of the adult population whose weights were classified as healthy
(a) in 1980,

....................................................[2]
(b) in 2010.

....................................................[1]
An adult is classified as overweight if their BMI value is 25 or more.
(iii)

Estimate the median BMI value of the overweight adult population in 2010.

....................................................[3]
Adults with the highest BMI values are classified as obese.
In 1980, 7% of the adult population were obese.
(iv)

Estimate the percentage of the adult population in 2010 who were obese.

....................................................[4]
(v)

By referring to any of the values you have estimated in parts (i), (ii), and (iv), comment on
how the health of the adult population of this country, assessed in terms of its weight, changed
between 1980 and 2010.
...................................................................................................................................................
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

10 Baruti's teacher has suggested that pupils who enjoy studying a subject are likely to perform well
in tests in the subject.
To investigate this, Baruti asked his friends to rate their enjoyment of Statistics on a linear scale
from 0 (dislike very much) to 5 (like very much), then recorded their scores on the next class test.
His results are shown in the following table.
Friend

Enjoyment
rating, x

Test score
(%), y

57

47

78

59

26

86

53

34

(i)

## Plot these data on the grid below.

[Grid: vertical axis "Test score (%)", y, scale up to 100; horizontal axis "Enjoyment rating", x, starting at 0]
[2]
(ii)

Explain briefly why the points (5, 78) and (4, 59) are not used if the lower semi-average is
calculated.
...................................................................................................................................................
...............................................................................................................................................[1]

(iii)

Calculate the two semi-averages and the overall mean of the data, and plot them on your
graph.

[5]
(iv)

Use your plotted averages to draw a line of best fit, and find its equation in the form y = mx + c.

....................................................[4]
Another friend, who had rated his enjoyment of Statistics at 3, missed the test through illness.
(v) Use the line you have drawn in part (iv) to estimate the score this friend would have obtained
if he had taken the test.
....................................................[1]
Baruti repeated his investigations for English and Science.
The equations he found for the lines of best fit were
y = 1.24x + 53.8 for English
and
y = 13.8x + 15.1 for Science.
(vi)

State, with a reason, in which of the subjects Statistics, English and Science a pupil's test
score is most affected by their enjoyment rating.
...................................................................................................................................................
...............................................................................................................................................[2]

(vii)

Explain briefly why Baruti may have experienced difficulty in deciding which of his two
variables should be the independent and which the dependent.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[1]

11 At a restaurant it is known from experience that 10% of the customers order an omelette.
Assume that customers make choices independently of each other and that nobody orders more
than one omelette.
(i)

## At table A there are 2 customers.

Find the probability that at this table an omelette is ordered by
(a) no customers,

....................................................[2]
(b) at least one customer.

....................................................[2]
The restaurant serves small omelettes and large omelettes. It is known from experience that 60%
of those ordered are small and 40% are large.
(ii)

## At table B there are 4 customers.

Find the probability that at this table only one customer orders an omelette and it is a large
omelette.

....................................................[4]

Small omelettes are made with 2 eggs and large omelettes with 3 eggs. The chef has a special
store of high quality eggs which are used only for making omelettes.
(iii)

## At table C there are 3 customers.

Find the probability that, in preparing the food for this table, from his special store the chef
uses
(a) exactly 4 eggs,

....................................................[3]
(b) at most 4 eggs.

....................................................[5]
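The probabilities in this question can be checked by listing what each customer might order. The sketch below is one possible approach, not a model answer: it assumes each customer independently orders no omelette (probability 0.9), a small omelette using 2 eggs (0.1 × 0.6) or a large omelette using 3 eggs (0.1 × 0.4). The names `EGGS` and `egg_distribution` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Eggs used by one customer and the probability of each case.
EGGS = {
    0: Fraction(9, 10),    # no omelette
    2: Fraction(6, 100),   # small omelette: 0.1 x 0.6
    3: Fraction(4, 100),   # large omelette: 0.1 x 0.4
}

def egg_distribution(customers):
    """Exact distribution of total eggs used for a table of customers."""
    dist = {}
    for combo in product(EGGS, repeat=customers):
        p = Fraction(1)
        for eggs in combo:
            p *= EGGS[eggs]
        total = sum(combo)
        dist[total] = dist.get(total, Fraction(0)) + p
    return dist

# Table A (2 customers): no eggs used means no customer orders an omelette.
p_none_table_a = egg_distribution(2)[0]

# Table C (3 customers).
table_c = egg_distribution(3)
p_exactly_4 = table_c[4]
p_at_most_4 = sum(p for eggs, p in table_c.items() if eggs <= 4)
print(p_none_table_a, float(p_exactly_4), float(p_at_most_4))
```

Under these assumptions, exactly 4 eggs can only arise from two small omelettes and one customer ordering none, which is why that probability is small.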


## Cambridge International Examinations

Cambridge Ordinary Level

* 9 7 3 9 6 3 0 9 8 7 *

4040/13

STATISTICS
Paper 1

October/November 2014
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use an HB pencil for any diagrams or graphs.
Do not use staples, paper clips, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 17 printed pages and 3 blank pages.

DC (RW/SLM) 83695/3
UCLES 2014

[Turn over

Section A [36 marks]
Answer all of the questions 1 to 6.

## The number of passengers on each of 13 consecutive buses arriving at a terminus were

7 8 16 10 20 5 8 9 8 2 26 9 15 .

Three different measures of central tendency (average) of these numbers are 8, 9 and 11.
Complete the following table by giving, for each of these three measures, its name and a brief
explanation of how its value has been obtained.
Measure

Name

How obtained
..................................................................................................

............................

..................................................................................................
..................................................................................................

..................................................................................................
9

............................

..................................................................................................
..................................................................................................

..................................................................................................
11

............................

..................................................................................................
..................................................................................................
[6]
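The three values quoted in this question can be reproduced from the 13 passenger counts with Python's standard `statistics` module. This is a checking sketch only; it does not replace the explanation of how each value is obtained, which is what the question asks for.

```python
from statistics import mean, median, mode

passengers = [7, 8, 16, 10, 20, 5, 8, 9, 8, 2, 26, 9, 15]

mean_value = mean(passengers)      # total divided by the number of buses
median_value = median(passengers)  # middle value once the data are ordered
mode_value = mode(passengers)      # value that occurs most often

print(mean_value, median_value, mode_value)
```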

UCLES 2014

4040/13/O/N/14

2

[Graph: vertical axis "Cumulative frequency", scale from 0 to 50; horizontal axis "X", starting at 0]

(i)

## State, with a reason, whether the variable X is continuous or discrete.

...................................................................................................................................................
...............................................................................................................................................[2]

(ii)

State for which of the integer values shown in the graph the frequency of X is 0.
....................................................[2]

(iii)

## Complete the following table.

x

Frequency
[2]

3

(a) The population of a town is tabulated in different age groups. A research organisation wishes
to interview, from the population, a sample which represents it in terms of age. It proposes to
do this using either stratified random sampling or quota sampling.
State one way in which the use of these sampling methods would be similar, and one way in
which it would be different.
...................................................................................................................................................
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]
(b) It is wished to obtain an estimate of the mean number of words on each page of a book. For
each of the following methods state, with a reason, whether a sample obtained using it would
be likely to be biased or unbiased:
(i)

## counting the number of words on the last page of each chapter;

...........................................................................................................................................
...........................................................................................................................................
.......................................................................................................................................[2]

(ii)

## counting the number of words on a systematic sample of pages.

...........................................................................................................................................
...........................................................................................................................................
.......................................................................................................................................[2]

4

The table below summarises the lengths, in millimetres, of a random sample of 50 leaves taken
from a bush.
| Length (mm) | Frequency | Cumulative frequency |
|---|---|---|
| Under 30 | | |
| 30 – under 32 | | |
| 32 – under 34 | 10 | |
| 34 – under 36 | 17 | |
| 36 – under 38 | 11 | |
| 38 – under 40 | | |

(i) Calculate the cumulative frequencies and insert them in the table.

[1]

(ii)

Plot the cumulative frequencies on the grid below and draw a smooth curve through the
plotted points.
[2]
[Grid: cumulative frequency (0 to 50) against length (28 mm to 42 mm)]
(iii) Use your graph to estimate

(a) the lower quartile length,
.............................................mm [1]
(b) the percentage of leaves that have a length greater than 37.2 mm.
....................................................[2]

5

A company which produces different sizes of sawn wood wishes to display information about the
amount of sawn wood it produces in each of two consecutive years.
(i)

State one advantage and one disadvantage of using a dual bar chart, as opposed to a
percentage bar chart, to illustrate the amount produced in the two years.
...................................................................................................................................................
...............................................................................................................................................[2]

(ii)

Name a quantity which neither a dual bar chart nor a percentage bar chart would show.
...............................................................................................................................................[1]

(iii)

State the names of two types of diagram which will give a relative indication of both the
amount of different sizes of sawn wood and the total amount of sawn wood produced in each
year.
........................................................
....................................................[2]

(iv)

State the name of the type of diagram which will give a direct indication of the differences in
the total amount of sawn wood produced from one year to the next.
....................................................[1]

6

Three of the official languages of Switzerland are French, German and Romansh. The diagram
below illustrates which of these languages are spoken by a random sample of 70 Swiss citizens.
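[Venn diagram: three overlapping circles labelled French, German and Romansh, drawn inside a box; the values shown in the regions include 17, 23, 8, 0, 3 and 12]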
(i)

Find the value which should be written inside the box but outside the circles in order to
complete the diagram.

....................................................[2]
(ii) Interpret the value 0 in the diagram.

...................................................................................................................................................
...............................................................................................................................................[1]

(iii)

State, with a reason, in each of the following cases, whether the value 0 would be changed if
the person described learned to speak Romansh.
(a) One of the people denoted in the diagram by the value 17.
...........................................................................................................................................
.......................................................................................................................................[1]
(b) One of the people denoted in the diagram by the value 8.
...........................................................................................................................................
.......................................................................................................................................[1]
(c) One of the people denoted in the diagram by your answer to part (i).
...........................................................................................................................................
.......................................................................................................................................[1]

Section B [64 marks]
Answer not more than four of the questions 7 to 11.
Each question in this section carries 16 marks.

7 (a) Mr Hassan can travel to work by either car or train. The probability that on any day he travels
by train is 4/7. If he travels by car the probability that he will be late for work is 1/9, but by train it is 1/5.
Calculate the probability that on any randomly chosen day he is not late for work.

....................................................[4]
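One way to check a calculation like part (a) is the law of total probability with exact fractions. The sketch below assumes the probabilities in the question are 4/7 (train), 1/9 (late by car) and 1/5 (late by train); the variable names are illustrative only.

```python
from fractions import Fraction

p_train = Fraction(4, 7)        # assumed reading of the probability of travelling by train
p_car = 1 - p_train
p_late_car = Fraction(1, 9)     # assumed reading of the probability of being late by car
p_late_train = Fraction(1, 5)   # assumed reading of the probability of being late by train

# P(not late) = P(car) * P(not late | car) + P(train) * P(not late | train)
p_not_late = p_car * (1 - p_late_car) + p_train * (1 - p_late_train)
print(p_not_late)
```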
(b) Three children are to be chosen at random from a group of seven, consisting of four boys,
Ian, James, Michael and Nathan, and three girls, Karen, Lucy and Olive.
(i)

Calculate the probability that Ian, Lucy and Nathan are the three chosen.

....................................................[2]
Two of the seven are a brother and sister.
(ii)

Calculate the probability that the brother and sister will both be among the three chosen.

....................................................[3]
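Part (b) can be checked by listing every equally likely group of three children. This is a minimal sketch: the question does not say which boy and girl are the siblings, so the pair used below (Ian and Karen) is a hypothetical choice that does not affect the result.

```python
from fractions import Fraction
from itertools import combinations

children = ["Ian", "James", "Michael", "Nathan", "Karen", "Lucy", "Olive"]
groups = [set(g) for g in combinations(children, 3)]  # all 35 possible groups

# (i) exactly one group is {Ian, Lucy, Nathan}
p_named_trio = Fraction(sum(g == {"Ian", "Lucy", "Nathan"} for g in groups), len(groups))

# (ii) hypothetical sibling pair; any fixed pair gives the same count
siblings = {"Ian", "Karen"}
p_siblings = Fraction(sum(siblings <= g for g in groups), len(groups))

print(p_named_trio, p_siblings)
```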

(c) Sammy and Pekos each have a bag containing a number of blue balls and white balls. Each
selects one ball from his bag at random. If the selected balls are of the same colour, Sammy
puts them both in his bag; if they are of different colours, Pekos puts them both in his bag.
Originally, Sammy's bag contains 2 blue and 6 white balls, while Pekos' bag contains 3 blue
and 5 white balls.
(i)

Calculate the probability that both selected balls are of the same colour.

....................................................[3]
(ii)

If, on the first selection, the balls were of the same colour (so both were put in Sammy's
bag before a second selection), calculate the probability that on the second selection the
balls are of different colours.

....................................................[4]
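The two-stage reasoning in part (c) can be sketched with exact fractions. The sketch assumes that, after a matching first selection, Sammy's bag gains one ball of the matching colour and Pekos' bag loses one, and that part (ii) is conditional on the first pair matching. The function name is illustrative.

```python
from fractions import Fraction

def p_same_colour(sammy, pekos):
    """Probability that one ball drawn from each bag has the same colour."""
    s_total, p_total = sum(sammy.values()), sum(pekos.values())
    return sum(Fraction(sammy[c], s_total) * Fraction(pekos[c], p_total) for c in sammy)

sammy = {"blue": 2, "white": 6}
pekos = {"blue": 3, "white": 5}

# (i) probability that the first pair matches
p_same = p_same_colour(sammy, pekos)

# (ii) weight each matching colour by its conditional probability, then update the bags
p_diff_second = Fraction(0)
for colour in sammy:
    p_first = Fraction(sammy[colour], 8) * Fraction(pekos[colour], 8)
    sammy_after = {**sammy, colour: sammy[colour] + 1}
    pekos_after = {**pekos, colour: pekos[colour] - 1}
    p_diff_second += (p_first / p_same) * (1 - p_same_colour(sammy_after, pekos_after))

print(p_same, p_diff_second)
```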
8

The following table summarises the times, x minutes, which the visitors to an art gallery during
one day spent in the gallery. The first row of the table gives the column numbers.
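| (1) Time, x (minutes) | (2) Frequency, f | (3) | (4) | (5) fy | (6) fy² |
|---|---|---|---|---|---|
| 0 – under 30 | | | | | |
| 30 – under 35 | 11 | | | | |
| 35 – under 40 | | | | | |
| 40 – under 50 | 40 | | | | |
| 50 – under 60 | 26 | | | | |
| 60 – under 70 | 14 | | | | |
| 70 – under 100 | | | | | |
| TOTAL | 105 | | | | |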

(i)

Insert in column (3) of the table the mid-points, m, of each of the classes.

[1]
(ii) Values of a variable, y, are given by

$$y = \frac{m - 45}{2.5}.$$

Calculate the value of y for each class and insert the values in column (4) of the table.

[2]
(iii)

For each class, calculate the value of the product fy, and insert the values in column (5) of the
table.
[1]

(iv)

For each class, calculate the value of fy², and insert the values in column (6) of the table.

[1]
(v)

[1]

(vi) Estimate the mean of y.

....................................................[2]
(vii) Estimate the standard deviation of y.

....................................................[2]
(viii) Use your results to parts (vi) and (vii) to estimate

(a) the mean of x,

....................................................[2]
(b) the standard deviation of x.

....................................................[2]
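The coding used in this question can be undone with mean of x = 45 + 2.5 × (mean of y) and standard deviation of x = 2.5 × (standard deviation of y). The sketch below illustrates this. The frequencies 11, 40, 26 and 14 come from the table; the other three frequencies are hypothetical values chosen only so that the total is 105, and the helper name is illustrative.

```python
from math import sqrt

mid_points = [15, 32.5, 37.5, 45, 55, 65, 85]
freqs = [4, 11, 6, 40, 26, 14, 4]  # the values 4, 6 and 4 are hypothetical placeholders

def mean_sd(values, weights):
    """Mean and standard deviation of grouped data (divisor is the total frequency)."""
    n = sum(weights)
    mean = sum(f * v for v, f in zip(values, weights)) / n
    variance = sum(f * v * v for v, f in zip(values, weights)) / n - mean ** 2
    return mean, sqrt(variance)

# Code the mid-points, summarise y, then decode back to x
y_values = [(m - 45) / 2.5 for m in mid_points]
mean_y, sd_y = mean_sd(y_values, freqs)
mean_x = 45 + 2.5 * mean_y
sd_x = 2.5 * sd_y

print(round(mean_x, 2), round(sd_x, 2))
```

Working directly with the uncoded mid-points gives the same two results, which is a useful check on the arithmetic.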
(ix)

Comment on whether or not, for these data, the interquartile range would be a more
appropriate measure of dispersion than the standard deviation.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

9

A bakery kept a record of the diameters, d centimetres, of the cakes it produced during one week.
The results are summarised in the histogram below.
[Histogram: number of cakes per cm of diameter (0 to 50) against diameter (15 cm to 23 cm)]

(i) Use the histogram to complete the following grouped frequency table for d.

| Diameter, d (cm) | Frequency |
|---|---|
| 15 – under 17 | |
| 17 – under 18 | |
| 18 – under 19 | |
| 19 – under 19.5 | |
| 19.5 – under 20 | |
| 20 – under 20.5 | |
| 20.5 – under 21 | |
| 21 – under 23 | |
[4]

(ii)

Use the frequencies you have obtained to produce a simpler grouped frequency distribution,
having four classes of equal width between 15 cm and 23 cm, and present your distribution in
a table.

[3]
(iii)

On the grid below illustrate your simpler grouped frequency distribution by a histogram.

[3]
(iv)

Use the histogram you have drawn in part (iii) to estimate the modal diameter.
.............................................. cm [2]

(v)

Cakes with a diameter between 16.5 cm and 22 cm can be sold in the bakery's shop. Find the
percentage of this week's cakes which can be sold in the shop.

....................................................[4]

10 In this question calculate all death rates per thousand, and to 2 decimal places.
The first table below gives certain information about the population and deaths in a town, Eastbury,
for the year 2012, together with the standard population of the area in which Eastbury is situated.

| Age group | Deaths | Population in age group | Standard population (%) |
|---|---|---|---|
| 0 – 14 | 25 | 4500 | 20 |
| 15 – 34 | x | 7000 | 35 |
| 35 – 59 | 47 | 6000 | 25 |
| 60 and over | 83 | 7000 | 20 |

(i) The death rate for the 15 – 34 age group is 3.00 per thousand.
Show that x = 21.

[1]
(ii) Calculate the crude death rate for Eastbury.

....................................................[4]
(iii)

Calculate the death rates for the other three age groups.

0 – 14 age group ........................................................

35 – 59 age group ........................................................
60 and over age group ....................................................[2]

(iv)

Using the given rate for the 15 – 34 age group, and the rates you have calculated in part (iii),
calculate the standardised death rate for Eastbury.

....................................................[4]
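The crude and standardised death rates for Eastbury can be sketched as follows. The crude rate divides total deaths by total population, while the standardised rate weights each age-group rate by the standard population percentage. The sketch assumes 21 deaths in the 15 – 34 group, as part (i) asks to be shown; the dictionary keys are illustrative labels.

```python
deaths = {"0-14": 25, "15-34": 21, "35-59": 47, "60+": 83}
population = {"0-14": 4500, "15-34": 7000, "35-59": 6000, "60+": 7000}
standard_pct = {"0-14": 20, "15-34": 35, "35-59": 25, "60+": 20}

# Crude death rate per thousand
crude = round(1000 * sum(deaths.values()) / sum(population.values()), 2)

# Age-group death rates per thousand, to 2 decimal places as the question requires
rates = {g: round(1000 * deaths[g] / population[g], 2) for g in deaths}

# Standardised death rate: rates weighted by the standard population percentages
standardised = round(sum(rates[g] * standard_pct[g] for g in rates) / 100, 2)

print(crude, rates, standardised)
```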
The table below gives information about Westville, another town in the same area, for the year
2012.
The crude death rate for Westville in 2012 was 6.62 per thousand.

| Age group | Death rate per thousand | Population in age group (%) |
|---|---|---|
| 0 – 14 | | 35 |
| 15 – 34 | | 25 |
| 35 – 59 | | 27 |
| 60 and over | 24 | 13 |

(v) Calculate the standardised death rate for Westville, using the same standard population as
for Eastbury.

....................................................[2]
One of the two towns has a higher crude death rate, but the other has a higher standardised death
rate.
(vi) Give a brief explanation of why such a situation can occur.

...................................................................................................................................................
...............................................................................................................................................[1]

(vii)

State, with a reason, which of the two towns would appear to have the healthier environment.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

11 Three trainee technicians, A, B and C, carried out laboratory trials to examine the effect of
temperature, x, in °C, on the yield, y, in kg, of an industrial process. The following table shows the
results obtained by each technician.

| Technician | Temperature, x (°C) | Yield, y (kg) |
|---|---|---|
| | 10 | 80 |
| | 15 | 106 |
| | 20 | 75 |
| | 25 | 90 |
| | 30 | 117 |
| | 35 | 118 |
| | 40 | 97 |
| | 45 | 127 |
| | 50 | 80 |
| | 55 | 109 |
| | 60 | 140 |
| | 65 | 115 |

(i)

Plot the points representing these results on the grid below and label each point A, B or C
according to which technician carried out the trial.
[Grid: yield (60 kg to 140 kg) against temperature (0 °C to 70 °C)]
[3]

(ii) Calculate and plot the overall mean.

[3]
The two semi-averages are (22.5, 97.7) and (52.5, 111.3).
(iii)

Plot the semi-averages and use the three plotted averages to draw the line of best fit.

[3]
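The overall mean and the two semi-averages are the centroids of the whole data set and of its lower and upper halves when ordered by temperature. The sketch below assumes the temperatures and yields pair up in the order listed in the table; the function name is illustrative.

```python
temps = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
yields = [80, 106, 75, 90, 117, 118, 97, 127, 80, 109, 140, 115]

def centroid(xs, ys):
    """Return (mean of x, mean of y) for paired data."""
    return sum(xs) / len(xs), sum(ys) / len(ys)

overall = centroid(temps, yields)
lower = centroid(temps[:6], yields[:6])   # six lowest temperatures
upper = centroid(temps[6:], yields[6:])   # six highest temperatures

# Gradient of the line through the two semi-averages
gradient = (upper[1] - lower[1]) / (upper[0] - lower[0])

print(overall, lower, upper, round(gradient, 3))
```

The lower and upper centroids agree with the semi-averages (22.5, 97.7) and (52.5, 111.3) stated in the question, which supports the assumed pairing.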

It is known that over this range of temperatures the relationship between yield and temperature is
approximately linear.
(iv) Comment on the results obtained by the three trainees.

...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

An experienced and reliable technician carried out a trial at a temperature of 40 °C and obtained a
yield of 125 kg.
(v) Plot the experienced technician's result on the graph.

[1]

(vi)

What might this extra information tell you about the performance of the trainees?
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

(vii) Use the extra information to draw, by eye, a revised line of best fit.

[1]

(viii) Use this revised line of best fit to estimate the yield for a temperature of 52 °C.

............................................... kg [1]

## Cambridge International Examinations

Cambridge Ordinary Level

* 9 9 5 0 0 0 0 2 4 8 *

4040/22

STATISTICS
Paper 2

October/November 2014
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use an HB pencil for any diagrams or graphs.
Do not use staples, paper clips, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 19 printed pages and 1 blank page.

DC (NF/SLM) 83693/3
UCLES 2014

Section A [36 marks]
Answer all of the questions 1 to 6.

(i) Explain the meaning of the term discrete variable.

...................................................................................................................................................
.............................................................................................................................................. [1]

(ii) Give an example of a continuous variable.

.............................................................................................................................................. [1]

(iii) Explain the meaning of the term qualitative variable.

...................................................................................................................................................
.............................................................................................................................................. [1]

(iv) Give an example of a discrete quantitative variable.

.............................................................................................................................................. [1]

UCLES 2014

4040/22/O/N/14

2

Number of
people

8 or
more

Number of
homes

(i) State the modal number of people living in a home on this street.

................................................... [1]

(ii) Find the median number of people living in a home on this street.

................................................... [2]
It was later discovered that an error had been made, and that h homes with 8 or more people were
missing from the original data.
(iii)

Find the maximum possible value of h such that the median will be unchanged when the extra
data is included.

................................................... [2]

3

Maria and Nico each have a tin containing 5 white, 8 milk and 7 dark chocolates.
(i)

Maria selects two chocolates at random from her tin and eats them.
Find the probability that
(a) both are white chocolates,

................................................... [2]
(b) exactly one is a white chocolate.

................................................... [2]
(ii)

Nico selects chocolates at random from his tin until he finds a milk chocolate. He returns
unwanted chocolates to the tin after each selection.
Find the probability that it will take him fewer than 3 attempts to find a milk chocolate.

................................................... [2]
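The three probabilities in this question can be checked with exact fractions. Maria selects without replacement, whereas Nico returns each unwanted chocolate, so his attempts are independent. Variable names are illustrative.

```python
from fractions import Fraction

white, milk, dark = 5, 8, 7
total = white + milk + dark

# (i)(a) both white, without replacement
p_both_white = Fraction(white, total) * Fraction(white - 1, total - 1)

# (i)(b) exactly one white: white then non-white, or non-white then white
p_one_white = 2 * Fraction(white, total) * Fraction(total - white, total - 1)

# (ii) milk on the first attempt, or a non-milk followed by milk (with replacement)
p_milk = Fraction(milk, total)
p_fewer_than_3 = p_milk + (1 - p_milk) * p_milk

print(p_both_white, p_one_white, p_fewer_than_3)
```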

4

The masses, in grams, of a sample of potatoes from a crop are shown in the table below.

| Mass, m (g) | Number of potatoes |
|---|---|
| 30 ≤ m < 50 | 14 |
| 50 ≤ m < 100 | 63 |
| 100 ≤ m < 150 | 82 |
| 150 ≤ m < 250 | 47 |
| 250 ≤ m < 400 | 19 |
| 400 ≤ m < 600 | 12 |

(i) For these data, state the name of the most appropriate measure of central tendency and the
name of the most appropriate measure of dispersion. Give a reason for your answers.
Measure of central tendency .......................................................
Measure of dispersion .......................................................
Reason .....................................................................................................................................
.............................................................................................................................................. [3]

Potatoes over 300 g are classified as large.

(ii)

Without drawing a graph, calculate an estimate of the number of potatoes from this sample
which are classified as large.

................................................... [3]
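An estimate of this kind can be made by linear interpolation within the class that contains the threshold, assuming the potatoes are spread evenly across each class. The sketch below is one such calculation; the tuple layout is illustrative.

```python
# (lower bound, upper bound, frequency) for each class in the table
classes = [(30, 50, 14), (50, 100, 63), (100, 150, 82),
           (150, 250, 47), (250, 400, 19), (400, 600, 12)]
threshold = 300

estimate = 0.0
for lower, upper, freq in classes:
    if lower >= threshold:
        estimate += freq  # the whole class lies above the threshold
    elif upper > threshold:
        # only the fraction of the class above the threshold is counted
        estimate += freq * (upper - threshold) / (upper - lower)

print(round(estimate))
```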

5

At a school the 100 male and 120 female students choose to study one of the three options Music,
Drama or Art.
Their choices are illustrated in the chart below.
[Chart: percentage of students (0 to 100) for Male and Female, with key Art, Drama, Music]

(i) State the full name for this type of chart.

...............................................................................................................................................[1]

(ii)

Calculate the numbers of males and females taking each option and insert them into the
table below.

| | Music | Drama | Art |
|---|---|---|---|
| Male | | | |
| Female | | | |
[2]

(iii)

Display your data from part (ii) in a fully-labelled dual bar chart using the key provided.

Male
Female

[3]

(iv)

Give one advantage that the dual bar chart you have drawn has over the chart given at the
start of the question.
...................................................................................................................................................
.............................................................................................................................................. [1]

6

Two unbiased six-sided dice, one blue and one green, each with faces numbered 1, 2, 3, 4, 5 and
6 are thrown.
The following are some of the possible outcomes.

(i) From the list above, state all the pairs of

(a) independent events,
...................................................................................................................................... [2]
(b) mutually exclusive events.
...................................................................................................................................... [2]

(ii) Find P(A B).

................................................... [3]
Event D and a fifth event, E, are known to be mutually exclusive.
(iii) Find the smallest and largest possible values for P(E).

................................................... [1]

Section B [64 marks]
Answer not more than four of the questions 7 to 11.
Each question in this section carries 16 marks.

time
(i) State two purposes of finding moving average values.

1 .......................................................................................................................................
...........................................................................................................................................
2 .......................................................................................................................................
...................................................................................................................................... [2]

(ii)

If n-point moving average values were to be calculated for the variable V, state an
appropriate value for n.
................................................... [1]

(iii)

State whether or not it would be necessary to centre the moving average values in this
...........................................................................................................................................
...........................................................................................................................................
...................................................................................................................................... [3]

(b) The table below shows fertilizer sales (in thousands of tonnes) by a company each quarter for
a period of 3 years.

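| Year | Quarter | Sales ('000 tonnes) | 4-point moving average | Centred 4-point moving average |
|---|---|---|---|---|
| 2010 | I | 84 | | |
| | II | 65 | | |
| | | | a = .................... | |
| | III | 92 | | 74.5 |
| | | | 74 | |
| | IV | 59 | | 73.625 |
| | | | 73.25 | |
| 2011 | I | 80 | | 72.625 |
| | | | 72 | |
| | II | 62 | | 71.5 |
| | | | 71 | |
| | III | 87 | | b = .............................. |
| | | | 70.5 | |
| | IV | 55 | | 70 |
| | | | 69.5 | |
| 2012 | I | 78 | | 68.875 |
| | | | 68.25 | |
| | II | 58 | | 67.75 |
| | | | 67.25 | |
| | III | c = ....................... | | |
| | IV | 51 | | |

(i) Calculate the values of a, b and c and enter them in the table above.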

[3]

(ii)

Use the Sales and Centred 4-point moving average values for quarter II of 2011 and
2012 to find an estimate for the seasonal component of quarter II. Give your answer, in
thousands of tonnes, correct to one decimal place.

....................................................[3]
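The moving-average arithmetic in parts (i) and (ii) can be sketched as below. The sketch assumes the sales figures run from quarter I of 2010 to quarter IV of 2012 in order, and that the tabulated 4-point average 68.25 covers quarter IV of 2011 to quarter III of 2012. The seasonal component is taken as the mean of (sales − centred moving average) for the two quarter II values.

```python
sales = [84, 65, 92, 59, 80, 62, 87, 55, 78, 58, None, 51]

# c is recovered from the tabulated 4-point moving average of 68.25
sales[10] = 4 * 68.25 - (sales[7] + sales[8] + sales[9])

# 4-point moving averages, then centre each adjacent pair
moving = [sum(sales[i:i + 4]) / 4 for i in range(len(sales) - 3)]
centred = [(u + v) / 2 for u, v in zip(moving, moving[1:])]

a = moving[0]    # first 4-point moving average (2010, quarters I to IV)
b = centred[4]   # centred value aligned with quarter III of 2011
c = sales[10]    # sales in quarter III of 2012

# centred[k] aligns with sales[k + 2]; quarter II of 2011 and 2012 are indices 5 and 9
deviations = [sales[5] - centred[3], sales[9] - centred[7]]
seasonal_q2 = sum(deviations) / len(deviations)

print(a, b, c, seasonal_q2)
```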
(iii)

Plot the centred moving average values on the grid below and draw the trend line.
[Grid: fertilizer sales ('000 tonnes), 60 to 75, against quarter, from quarter I of 2010 to quarter II of 2013]
[2]

(iv) Use your trend line and answer to part (ii) to estimate the sales for quarter II of 2013.

....................................................[2]

8

In order to calculate a weighted aggregate cost index, a café owner divides his expenditure into
three categories: Ingredients, Electricity and Wages.
He collects the following data for the year 2011.
Ingredients cost a total of \$15 600.
Electricity cost \$0.09 per unit.
A total of 5000 units of electricity were used.
A total of 4000 staff hours were worked.
The average wage per hour for all the staff was \$6.50.
(i)

Show that the café owner should assign weights to the three categories Ingredients, Electricity
and Wages in the ratio 312 : 9 : 520.

[3]
(ii)

Using the following information, complete the table below, giving price relatives to the nearest
integer where appropriate.
2011 is the base year.
The cost of ingredients increased by 8% from 2011 to 2012.
The price of electricity rose to \$0.11 per unit in 2012.
The average wage per hour for all staff fell by 3% from 2011 to 2012.

Price relatives

| | 2011 | 2012 |
|---|---|---|
| Ingredients | | |
| Electricity | | |
| Wages | | |
[5]
(iii)

Calculate a weighted aggregate cost index for the year 2012, taking 2011 as base year, giving

................................................... [3]
(iv)

Use the index calculated in part (iii) and the costs for 2011 to estimate, to 3 significant figures,
the total cost of running the café in 2012.

................................................... [3]
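The chain of calculations in this question (weights from base-year expenditure, price relatives, weighted index, cost estimate) can be sketched with exact fractions. This is a sketch under the stated data, with price relatives rounded to the nearest integer as part (ii) asks; the variable names are illustrative.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# 2011 expenditure in dollars for each category
costs_2011 = {
    "Ingredients": Fraction(15600),
    "Electricity": Fraction(9, 100) * 5000,   # $0.09 per unit, 5000 units
    "Wages": Fraction(650, 100) * 4000,       # $6.50 per hour, 4000 hours
}

# Weights: expenditure reduced to its simplest whole-number ratio
common = reduce(gcd, (int(cost) for cost in costs_2011.values()))
weights = {k: int(cost) // common for k, cost in costs_2011.items()}

# Price relatives for 2012 with 2011 = 100, to the nearest integer
relatives = {
    "Ingredients": 108,                              # 8% increase
    "Electricity": round(100 * Fraction(11, 9)),     # $0.09 to $0.11 per unit
    "Wages": 97,                                     # 3% decrease
}

index = Fraction(sum(weights[k] * relatives[k] for k in weights), sum(weights.values()))
estimate_2012 = sum(costs_2011.values()) * index / 100

print(weights, float(index), float(estimate_2012))
```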
(v) Give two possible reasons why your estimate for 2012 may be very inaccurate.
Reason 1 ...................................................................................................................................
...................................................................................................................................................
Reason 2 ...................................................................................................................................
.............................................................................................................................................. [2]

9

In this question give all probabilities as exact fractions.

A turn at a game consists of throwing a pair of unbiased coins, each with a head on one side and
a tail on the other. A point is scored every time a turn produces a pair of heads. A game consists of
three turns.
(i) If one game is played,

(a) show that the probability of scoring three points is $\frac{1}{64}$,

[2]
(b) find the probability of scoring exactly two points.

................................................... [3]
(ii)

If X is the number of points scored in one game, find the probability of each of the remaining
possible values of X. Hence produce a table showing all the possible values of X together with
their probabilities.

[4]
Each game of three turns costs \$4 to play.
A player wins \$58 for scoring three points.
A player wins nothing for scoring fewer than two points.
Rashid decides to play the game, and scores exactly two points.
(iii) Find how much money he would win, assuming it is a fair game.

................................................... [3]
As an alternative to taking the money won in part (iii), a player who has scored exactly two points
is given the option of throwing another single coin once. This coin is weighted in favour of tails, and
is four times more likely to show a tail than a head.
If this option is taken the player will win \$50 for a head and \$12.50 for a tail.
(iv)

Determine, by calculation, whether or not Rashid should risk throwing the extra coin.

................................................... [4]
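The probabilities and expected values in this question can be sketched with exact fractions. The sketch assumes a fair game means that the expected winnings equal the \$4 cost of playing, and reads "four times more likely to show a tail than a head" as P(tail) = 4 × P(head). Variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p_point = Fraction(1, 4)  # a turn scores a point only when both coins show heads

# Distribution of X, the number of points in a game of three turns
dist = {k: comb(3, k) * p_point**k * (1 - p_point)**(3 - k) for k in range(4)}

# Fair game: cost = 58 * P(X = 3) + prize_two * P(X = 2)
cost, prize_three = 4, 58
prize_two = (cost - prize_three * dist[3]) / dist[2]

# Extra weighted coin: P(head) = 1/5, P(tail) = 4/5
p_head = Fraction(1, 5)
expected_extra = p_head * 50 + (1 - p_head) * Fraction(25, 2)

print(dist, prize_two, expected_extra)
```

Comparing `prize_two` with `expected_extra` is one way to approach the decision in part (iv).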

10 (a) The number of calls per day received at a fire station over a period of time is shown in the
table below.
Number of calls per day
Number of days

13

11

For example, 3 calls were received on 6 of the days.

(i)

Calculate how many calls were received, in total, over the period of time.

................................................... [2]
(ii)

Calculate the mean number of calls per day, correct to one decimal place.

................................................... [3]
(b) The mean and standard deviation of three numbers a, b and c are 11 and 3 respectively.
Complete the table below by finding the mean and standard deviation of each of the four sets
of numbers shown.
| | Mean | Standard deviation |
|---|---|---|
| The three numbers a − 1, b − 1, c − 1 | | |
| The three numbers a/2, b/2, c/2 | | |
| The three numbers 5a + 3, 5b + 3, 5c + 3 | | |
| The six numbers a, a, b, b, c, c | | |

[4]
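The effect of linear transformations on the mean and standard deviation can be explored numerically. The three numbers below are hypothetical values chosen only so that their mean is 11 and their standard deviation is 3. The first transformation assumes the garbled entry "a 1, b 1, c 1" reads a − 1, b − 1, c − 1.

```python
from math import sqrt
from statistics import fmean, pstdev

d = sqrt(13.5)
numbers = [11 - d, 11, 11 + d]  # hypothetical a, b, c with mean 11 and sd 3

def summary(values):
    """Return (mean, population standard deviation)."""
    return fmean(values), pstdev(values)

shifted = summary([v - 1 for v in numbers])      # assumed reading: a - 1, b - 1, c - 1
halved = summary([v / 2 for v in numbers])       # a/2, b/2, c/2
scaled = summary([5 * v + 3 for v in numbers])   # 5a + 3, 5b + 3, 5c + 3
doubled = summary(numbers + numbers)             # a, a, b, b, c, c

for name, result in [("shifted", shifted), ("halved", halved),
                     ("scaled", scaled), ("doubled", doubled)]:
    print(name, round(result[0], 2), round(result[1], 2))
```

Adding a constant changes only the mean, multiplying by a constant scales both measures, and repeating every value changes neither.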
(c) The students in a mathematics class were given a test in Algebra and a test in Geometry,
both with a maximum mark of 100. The following table summarises their results.

| | Class mean | Class standard deviation |
|---|---|---|
| Algebra | 55 | 10 |
| Geometry | 40 | 4.5 |

(i) Explain what these figures tell you about the differences between the marks in the
Algebra test and the marks in the Geometry test.
...........................................................................................................................................
...........................................................................................................................................
...........................................................................................................................................
...................................................................................................................................... [2]

(ii) Priyanka scored 65 in her Algebra test and 49 in her Geometry test.

Use the class means and standard deviations to state, with a reason, in which test
Priyanka scored better in relation to the rest of the class.

...........................................................................................................................................
...........................................................................................................................................
...................................................................................................................................... [2]
(iii)

The highest mark scored by any pupil in the Algebra test was 87. It is required to scale
the marks so that the scaled mean is 60 and the scaled highest mark is 100.
Calculate the scaled standard deviation which must be used to achieve this.

................................................... [3]
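Comparing marks across tests, and scaling them, both rest on the number of standard deviations a mark lies from its mean. The sketch below assumes that scaling preserves this standardised value; the function and variable names are illustrative.

```python
algebra = {"mean": 55, "sd": 10}
geometry = {"mean": 40, "sd": 4.5}

def z_score(mark, test):
    """Number of standard deviations by which a mark exceeds the class mean."""
    return (mark - test["mean"]) / test["sd"]

# (ii) Priyanka's marks relative to the class
z_algebra = z_score(65, algebra)
z_geometry = z_score(49, geometry)

# (iii) the top Algebra mark keeps its z-score: 100 = 60 + z_top * scaled_sd
z_top = z_score(87, algebra)
scaled_sd = (100 - 60) / z_top

print(z_algebra, z_geometry, round(scaled_sd, 2))
```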
11 At a jam-making factory, 90 jars are filled with jam in fifteen minutes. A sample of 9 of the jars
needs to be taken to check that the mass of jam in the jars is within acceptable limits.
The jars are numbered from 00 to 89.
Asad thinks that the best method for selecting the sample is to take a simple random sample.
RANDOM NUMBER TABLE
47 00 51 96 32 47 85 11 67 05 10 90 28 73
92 01 55 83 76 34 41 29 07 24 63 15 59 81
44 03 59 99 14 27 20 30 09 78 60 04 81 65
(i)

Starting at the beginning of the first row of the random number table, and working along the
row, find Asad's sample, ensuring that no jar is selected more than once.
.............................................................................................................................................. [2]

Omar thinks that the best method for selecting the sample is to take a systematic sample.
(ii)

By starting at the beginning of the second row of the random number table, and working
along the row, select the first jar in Omar's sample. State also the numbers of the remaining
jars in his sample.

.............................................................................................................................................. [3]
The jam-making factory has three machines A, B and C which put jam into jars, and two packers
X and Y who pack jars into boxes.
Each jar is filled by one of the machines and then packed into a box by one of the packers.
The two-way table shows how many jars filled by each machine were packed by each packer in
fifteen minutes.

| | Machine A | Machine B | Machine C |
|---|---|---|---|
| Packer X | 10 | 11 | 18 |
| Packer Y | 10 | 19 | 22 |

(iii)

If a sample of size 9 stratified by machine were to be taken, calculate how many jars from
each machine would be required.

Machine A .......................................................
Machine B .......................................................
Machine C .................................................. [2]
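Proportional allocation for a stratified sample can be sketched as follows. The helper name `proportional_allocation` is hypothetical, and rounding to the nearest whole jar is an assumption that only matters when a share is not a whole number, as happens when stratifying by packer.

```python
# Two-way table from the question: jars packed, by packer and machine.
table = {
    "Packer X": {"A": 10, "B": 11, "C": 18},
    "Packer Y": {"A": 10, "B": 19, "C": 22},
}


def proportional_allocation(stratum_totals, sample_size):
    """Share the sample between strata in proportion to stratum size."""
    population = sum(stratum_totals.values())
    return {
        name: round(sample_size * total / population)
        for name, total in stratum_totals.items()
    }


# Column totals give the strata by machine; row totals give the strata by packer.
machine_totals = {m: sum(row[m] for row in table.values()) for m in ("A", "B", "C")}
packer_totals = {p: sum(row.values()) for p, row in table.items()}

by_machine = proportional_allocation(machine_totals, sample_size=9)
by_packer = proportional_allocation(packer_totals, sample_size=9)
print(by_machine, by_packer)
```

The machine totals (20, 30 and 40) divide the sample exactly, whereas the packer totals (39 and 51) give shares of 3.9 and 5.1 that have to be rounded.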

The jars from machines A, B and C are those numbered 00–19, 20–49 and 50–89 respectively.
(iv)

Comment on how accurately the samples taken by Asad and Omar represent the jars filled by
each machine.
...................................................................................................................................................
...................................................................................................................................................
...................................................................................................................................................
.............................................................................................................................................. [2]

(v)

Starting at the beginning of the third row of the table, and moving along the row, select a
sample of size 9 stratified by machine. Use every number if the machine to which it relates
has not yet been fully sampled.
...................................................................................................................................................
.............................................................................................................................................. [3]
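The rule "use every number if the machine to which it relates has not yet been fully sampled" can be sketched like this. The function name `stratified_sample` is hypothetical, and the quotas of 2, 3 and 4 assume proportional allocation from machine totals of 20, 30 and 40 jars.

```python
def stratified_sample(numbers, strata, quotas):
    """Accept each number while its stratum still has places left."""
    remaining = dict(quotas)
    sample = []
    for n in numbers:
        for name, (low, high) in strata.items():
            if low <= n <= high and remaining[name] > 0 and n not in sample:
                sample.append(n)
                remaining[name] -= 1
        if not any(remaining.values()):
            break
    return sample


# Third row of the random number table, as printed in the question.
row3 = [44, 3, 59, 99, 14, 27, 20, 30, 9, 78, 60, 4, 81, 65]

# Jar number ranges for each machine, as stated in the question.
strata = {"A": (0, 19), "B": (20, 49), "C": (50, 89)}
quotas = {"A": 2, "B": 3, "C": 4}  # assumed proportional allocation

machine_sample = stratified_sample(row3, strata, quotas)
print(machine_sample)
```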

(vi)

If a sample of size 9 stratified by packer were to be taken, calculate how many jars from each
packer would be required.

Packer X ..................................................
Packer Y .................................................. [2]
(vii)

If a stratified sample were to be taken, state whether it would be more appropriate, in this
...................................................................................................................................................
...................................................................................................................................................
.............................................................................................................................................. [2]


Permission to reproduce items where third-party owned material protected by copyright is included has been sought and cleared where possible. Every
reasonable effort has been made by the publisher (UCLES) to trace copyright holders, but if any items requiring clearance have unwittingly been included, the
publisher will be pleased to make amends at the earliest possible opportunity.
Cambridge International Examinations is part of the Cambridge Assessment Group. Cambridge Assessment is the brand name of University of Cambridge Local
Examinations Syndicate (UCLES), which is itself a department of the University of Cambridge.


## Cambridge International Examinations

Cambridge Ordinary Level

* 9 0 9 9 9 9 9 8 1 4 *

4040/23

STATISTICS
Paper 2

October/November 2014
2 hours 15 minutes

## Candidates answer on the question paper.

Pair of compasses
Protractor

Write your Centre number, candidate number and name on all the work you hand in.
Write in dark blue or black pen.
You may use an HB pencil for any diagrams or graphs.
Do not use staples, paper clips, glue or correction fluid.
DO NOT WRITE IN ANY BARCODES.
Answer all questions in Section A and not more than four questions from Section B.
If working is needed for any question it must be shown below that question.
The use of an electronic calculator is expected in this paper.
At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

## This document consists of 18 printed pages and 2 blank pages.

DC (SJF/SLM) 83687/4
UCLES 2014

Section A [36 marks]
Answer all of the questions 1 to 6.

The number of DVDs bought in a year by each person in a sample of 46 people is given in the
following table.
Number of DVDs bought

12

13

14

15

16

17

18

Number of people

10

For example, 6 people each bought 15 DVDs.

(i)

State the modal number of DVDs bought in the year by these people.
....................................................[1]

(ii) Find the median number of DVDs bought in the year by these people.

....................................................[2]
(iii)

A number, k, of other people, all of whom bought 22 DVDs in the year, have been omitted
from the table.
If these people are included, find
(a) the greatest possible value of k if the median is unchanged,

....................................................[2]
(b) the greatest possible value of k if the median and the mode are both unchanged.

....................................................[1]

2

Letters posted in the UK may be sent by either 1st or 2nd class post. 40% are sent 1st class.
The following table shows the number of days after posting on which a letter is delivered.
Days after posting

1st class

80%

20%

2nd class

50%

30%

20%

Find the expected number of days for a randomly chosen letter to be delivered.

....................................................[6]

3

A teacher asked each of the 15 boys and 10 girls in her class to estimate the length, in cm, of a
piece of string she showed them.
The estimated lengths are summarised in the following table.

| | Number of pupils | Sum of lengths | Sum of squares of the lengths |
|---|---|---|---|
| Boys | 15 | 270 | 5372 |
| Girls | 10 | 160 | 2759 |

(i)

Calculate the total of the estimated lengths of all the pupils.

....................................................[1]
(ii)

Calculate the mean of the estimated lengths of all the pupils.

....................................................[1]
(iii)

Calculate the sum of the squares of the estimated lengths of all the pupils.

....................................................[1]
(iv)

Hence calculate the standard deviation of the estimated lengths of all the pupils.

....................................................[3]
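The steps in parts (i) to (iv) amount to pooling the two sets of summary totals. A minimal sketch follows; it assumes the standard deviation is the population form, sqrt(Σx²/n − mean²), and the variable names are illustrative only.

```python
from math import sqrt

# Summary figures from the table: count, sum and sum of squares for each group.
groups = {
    "boys": {"n": 15, "sum_x": 270, "sum_x2": 5372},
    "girls": {"n": 10, "sum_x": 160, "sum_x2": 2759},
}

# Pool the groups by adding the counts and the totals.
n = sum(g["n"] for g in groups.values())
sum_x = sum(g["sum_x"] for g in groups.values())
sum_x2 = sum(g["sum_x2"] for g in groups.values())

mean = sum_x / n
variance = sum_x2 / n - mean ** 2  # assumed population variance (divisor n)
sd = sqrt(variance)

print(sum_x, mean, sum_x2, round(sd, 2))
```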

4

A variable, X, has a mean of 27 and a standard deviation of 12.

(i)

For purposes of comparison with another variable, values of X are to be scaled so that they
have a mean of 30 and a standard deviation of 6.
Find the value of X which would be unchanged by this scaling.

....................................................[3]
(ii)

The highest value of X was 51. It is now wished to scale values of X to produce a new variable,
Y, such that the mean of Y is 50 and the highest value of Y is 100.
Find the standard deviation of Y necessary to achieve this.

....................................................[3]
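Both parts rest on the linear scaling y = new mean + new SD × (x − mean) / SD. The sketch below is one possible way to set out the working; the function name `scale` is hypothetical.

```python
def scale(x, mean_x, sd_x, new_mean, new_sd):
    """Scale x so that its standardised value is preserved."""
    return new_mean + new_sd * (x - mean_x) / sd_x


mean_x, sd_x = 27, 12

# Part (i): solve x = 30 + (6 / 12) * (x - 27) for the unchanged value.
ratio = 6 / sd_x
unchanged = (30 - ratio * mean_x) / (1 - ratio)

# Part (ii): the highest value, 51, must map to 100 when the new mean is 50.
z_highest = (51 - mean_x) / sd_x
sd_y = (100 - 50) / z_highest

print(unchanged, sd_y)
```

Substituting back is a useful check: `scale(unchanged, 27, 12, 30, 6)` should return the same value, and `scale(51, 27, 12, 50, sd_y)` should return 100.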

5

A man who goes on a camping holiday each year classifies his expenditure under three headings:
Food, Clothing and Equipment.
The amounts, in dollars, which he spent in each of the years 2010 and 2011 are given in the
following table.

| Year | Food | Clothing | Equipment |
|---|---|---|---|
| 2010 | 135 | 165 | 200 |
| 2011 | 124 | 132 | 144 |

(i)

On the grid below illustrate the data by a dual bar chart, using one double bar for each item of
expenditure.

[3]
(ii)

On the grid below illustrate the data by a sectional (component) percentage bar chart, using
one bar for each year.

[3]
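A percentage bar chart needs each item expressed as a percentage of that year's total. The following sketch shows the arithmetic only, not the drawing; the variable names are illustrative.

```python
# Expenditure in dollars, from the table in the question.
spending = {
    2010: {"Food": 135, "Clothing": 165, "Equipment": 200},
    2011: {"Food": 124, "Clothing": 132, "Equipment": 144},
}

# Convert each amount to a percentage of its own year's total.
percentages = {
    year: {item: 100 * amount / sum(items.values()) for item, amount in items.items()}
    for year, items in spending.items()
}

for year, shares in percentages.items():
    print(year, shares)
```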
6

(a) Explain briefly why it is not possible to illustrate a qualitative variable by a histogram.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]
(b) A quantitative variable can be either discrete or continuous. Briefly explain the difference
between these two types of quantitative variable.
...................................................................................................................................................
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]
(c) A number of athletes all run for one hour and then the numbers of kilometres they have run
are formed into a grouped frequency distribution, of which the classes are labelled 12–13,
14–15, 16–17 etc.
State the mid-point of the 14–15 class if
(i)

the distances have been rounded down to a whole number of kilometres,

....................................................[1]

(ii)

the distances have been rounded to the nearest whole number of kilometres.
....................................................[1]

Section B [64 marks]
Answer not more than four of the questions 7 to 11.
Each question in this section carries 16 marks.

The treasurer of a cricket club is carrying out an analysis of changes in club expenditure.
He has summarised expenditure for the year 2012 as follows:
| Item | Value |
|---|---|
| Total cost of maintenance to the grounds and buildings | \$10 000 |
| Average cost of one box of three cricket balls | \$50 |
| Number of balls purchased | 75 |
| Cost of services such as electricity and water supply | \$2500 |
| Wage rate paid per hour to the club groundsman | \$12.50 |
| Number of hours worked by the groundsman during the year | 600 |

(i)

Use these data to show that the treasurer should assign weights to the four categories
Maintenance, Balls, Services, Wages in the ratio 8 : 1 : 2 : 6.

[4]
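The weights are the total amounts spent on each category in 2012, reduced to their simplest ratio. A minimal sketch of that reduction, with illustrative variable names:

```python
from functools import reduce
from math import gcd

# Total 2012 expenditure on each category, in dollars.
maintenance = 10_000
boxes_of_balls = 75 // 3          # balls are sold in boxes of three
balls = boxes_of_balls * 50       # $50 per box
services = 2_500
wages = round(12.50 * 600)        # hourly rate times hours worked

costs = [maintenance, balls, services, wages]

# Divide through by the greatest common divisor to get the simplest ratio.
divisor = reduce(gcd, costs)
weights = [cost // divisor for cost in costs]
print(weights)
```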
In 2013, as compared with 2012, the following changes occurred.
Maintenance costs increased by 2%.
By changing the supplier the cost of balls decreased by 10%.
Cost of services increased by 5%.
The groundsman's hourly wage rate was increased by 3%.
(ii)

Write down price relatives for 2013, taking 2012 as base year, for each of the four categories.
Maintenance .......................................................
Balls .......................................................
Services .......................................................
Wages ...................................................[3]

(iii)

Calculate a weighted aggregate cost index for 2013, taking 2012 as base year.

....................................................[4]
(iv)

Use the index you have calculated in part (iii) and the costs for 2012 to estimate the total cost
of running the club in 2013.

................................................... [3]
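Parts (ii) to (iv) can be sketched as a weighted average of price relatives. The price relatives below are taken directly from the stated percentage changes, and the estimate assumes that quantities (balls bought, hours worked) are the same in 2013 as in 2012.

```python
weights = {"Maintenance": 8, "Balls": 1, "Services": 2, "Wages": 6}

# Price relatives for 2013 with 2012 = 100, from the stated percentage changes.
price_relatives = {
    "Maintenance": 100 + 2,
    "Balls": 100 - 10,
    "Services": 100 + 5,
    "Wages": 100 + 3,
}

# Weighted aggregate index: sum of (weight x relative) over sum of weights.
index = sum(weights[k] * price_relatives[k] for k in weights) / sum(weights.values())

# Total 2012 cost: maintenance + balls + services + wages.
total_2012 = 10_000 + 1_250 + 2_500 + 7_500
estimate_2013 = total_2012 * index / 100

print(index, estimate_2013)
```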
(v)

Give two reasons why your estimate for 2013 may be very different from the true cost in 2013.
Reason 1 ..................................................................................................................................
...................................................................................................................................................
Reason 2 ..................................................................................................................................
...............................................................................................................................................[2]

8

A batch of 500 plastic rods is used as a statistical teaching aid. The following table summarises
the lengths of the rods in centimetres.
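| Length (cm) | Frequency | Cumulative frequency |
|---|---|---|
| 1 to under 2 | 12 | |
| 2 to under 3 | 197 | |
| 3 to under 4 | 33 | |
| 4 to under 5 | 13 | |
| 5 to under 6 | 124 | |
| 6 to under 7 | 22 | |
| 7 to under 8 | 11 | |
| 8 to under 9 | 88 | |

(i)

State the modal class.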

....................................................[1]

(ii)

Find the maximum possible value of the range.

....................................................[1]

(iii)

[2]

(iv)

Calculate an estimate of the median length of the rods.

....................................................[3]
(v)

Calculate estimates of the two quartiles and hence obtain an estimate of the interquartile
range.

Lower quartile .......................................................

Upper quartile .......................................................
Interquartile range ...................................................[5]
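Estimates of the median and quartiles from a grouped table are usually found by linear interpolation within the class that contains the required position. The sketch below assumes positions n/4, n/2 and 3n/4 for n = 500; the function name `interpolate` is hypothetical.

```python
def interpolate(position, classes):
    """Linear interpolation within the class containing the given position."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")


# (lower bound, upper bound, frequency) for each class in the table.
classes = [
    (1, 2, 12), (2, 3, 197), (3, 4, 33), (4, 5, 13),
    (5, 6, 124), (6, 7, 22), (7, 8, 11), (8, 9, 88),
]
total = sum(freq for _, _, freq in classes)

lower_quartile = interpolate(total / 4, classes)
median = interpolate(total / 2, classes)
upper_quartile = interpolate(3 * total / 4, classes)
iqr = upper_quartile - lower_quartile

print(round(lower_quartile, 2), round(median, 2), round(upper_quartile, 2), round(iqr, 2))
```

Comparing `median - lower_quartile` with `upper_quartile - median` gives the two differences that part (vi) asks about.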
(vi)

(a) Calculate the difference between the median and each of the two quartiles.
........................................................
....................................................[1]
(b) Comment on what your answer to part (a) tells you about the shape of this distribution.
...........................................................................................................................................
.......................................................................................................................................[1]

(vii)

If a cumulative frequency graph of these data were drawn, state, with a reason, in which part
of the graph the gradient would be at its steepest.
...................................................................................................................................................
...................................................................................................................................................
...............................................................................................................................................[2]

9

(a) (i)

Explain what is meant if two events are described as mutually exclusive.

...........................................................................................................................................
.......................................................................................................................................[1]

(ii)

Give an example of two events which are mutually exclusive.

...........................................................................................................................................
.......................................................................................................................................[1]

(iii)

Two possible outcomes of an experiment are event A and event B.

P(A) = 0.5, P(B) = 0.6.

(a) Explain why, without any further information, it can be stated correctly that A and B
are not mutually exclusive.
....................................................................................................................................
................................................................................................................................[1]
(b) Find the value which P(A ∩ B) must have if A and B are independent.

....................................................[2]
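The two facts used in part (iii) can be checked numerically. This is a brief sketch assuming the standard definitions: mutually exclusive events satisfy P(A or B) = P(A) + P(B), which cannot exceed 1, and independent events satisfy P(A and B) = P(A) × P(B).

```python
p_a = 0.5
p_b = 0.6

# If A and B were mutually exclusive, P(A or B) would equal P(A) + P(B).
sum_of_probabilities = p_a + p_b
could_be_mutually_exclusive = sum_of_probabilities <= 1

# Under independence, the probability of both events is the product.
p_both_if_independent = p_a * p_b

print(could_be_mutually_exclusive, round(p_both_if_independent, 2))
```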

(b) A survey is being carried out at a college as to the driving status of its students.
The following table summarises the responses of a sample of 60 students.
Driving status

Males

Females

16

## Taken driving test but failed

Learning to drive
but not yet taken a driving test

10

## Not started to learn to drive and therefore

not yet taken a driving test

(i)

## a student chosen at random has passed a driving test,

....................................................[2]
(ii)

## a male student chosen at random has taken a driving test,

....................................................[2]
(iii)

a female student chosen at random has not yet taken a driving test,

....................................................[2]
(iv)

two students chosen at random have both not started to learn to drive,

....................................................[2]
(v)

of two students chosen at random, without replacement, the first is male and the second
has taken a driving test but failed.

....................................................[3]

10 A statistician wishes to invite 5 people to dinner from among her fifteen closest friends and
relatives, and decides to select the 5 by applying a sampling procedure to a list of the fifteen.
The following table lists the fifteen people, numbered 00 to 14, classified by age group and by
whether they are a friend or a relative.
Person

00

01

02

03

04

05

06

07

08

09

10

11

12

13

14

Age group

II

II

II

III

II

III

II

II

III

Friend/relative

Age group: I = 18–29 years, II = 30–49 years, III = 50–69 years.

F = friend, R = relative.
You are asked to help the statistician by applying four different sampling procedures to select a
sample of size 5 from this population, using the two-digit random number table below. No person
may be selected more than once in any one sample.
TWO-DIGIT RANDOM NUMBER TABLE
61 12 00 18 12 07 09 53 01 15 74 45 14
91 65 05 78 35 00 26 25 11 32 03 19 21
06 15 09 79 08 59 10 03 04 37 02 99 01
72 24 11 29 13 18 15 10 66 06 02 00 09
(i)

Starting at the beginning of the first row of the table, and moving along the row, select a
simple random sample of the required size.
....................................................[2]

(ii)

A systematic sample is to be selected by starting at the beginning of the second row of the
table, and moving along the row.
(a) Write down the smallest possible and the largest possible two-digit numbers of the first
person selected.
....................................................[1]
(b) Write down the number of the first person selected.
....................................................[1]
(c) Write down the numbers of the other four people selected for the systematic sample.
....................................................[1]

(iii)

A sample stratified by whether a person is a friend or a relative is to be selected.

(a) State how many friends and how many relatives would be selected for such a sample.
....................................... friends
.................................... relatives [1]
(b) Starting at the beginning of the third row of the table, and moving along the row, select
this sample. Use every number if the category to which it relates has not yet been fully
sampled.
...........................................................................................................................................
.......................................................................................................................................[3]

(iv)

A sample stratified by age group is to be selected.

(a) State how many people from each age group would be selected for such a sample.
....................... from age group I
...................... from age group II
..................... from age group III [1]
(b) Starting at the beginning of the fourth row of the table, and moving along the row, select
this sample. Use every number if the age group to which it relates has not yet been fully
sampled.
...........................................................................................................................................
.......................................................................................................................................[2]

(v)

State, for each of the two stratified samples, with a reason, whether or not it represents the
population exactly.

Stratified by friend/relative ........................................................................................................

...................................................................................................................................................
Stratified by age group ..............................................................................................................
...............................................................................................................................................[4]

11 The numbers of absences recorded per day in a school over a period of three weeks are shown in
the table below.

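| Week | Day | Number of absences | 5-point moving total | 5-point moving average |
|---|---|---|---|---|
| 1 | Monday | 35 | | |
| 1 | Tuesday | 20 | | |
| 1 | Wednesday | 18 | 140 | 28.0 |
| 1 | Thursday | 24 | 137 | 27.4 |
| 1 | Friday | 43 | 133 | 26.6 |
| 2 | Monday | 32 | x = ........ | 25.4 |
| 2 | Tuesday | 16 | 125 | 25.0 |
| 2 | Wednesday | 12 | 121 | 24.2 |
| 2 | Thursday | 22 | 122 | 24.4 |
| 2 | Friday | 39 | 124 | y = ........ |
| 3 | Monday | 33 | 128 | 25.6 |
| 3 | Tuesday | 18 | 127 | 25.4 |
| 3 | Wednesday | 16 | 125 | 25.0 |
| 3 | Thursday | 21 | | |
| 3 | Friday | 37 | | |

(i)

State why it is most appropriate to calculate values of a 5-point moving average in order to
analyse these data.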
...................................................................................................................................................
...............................................................................................................................................[1]

(ii)

Give a reason why the moving average values have not been centred.
...................................................................................................................................................
...............................................................................................................................................[1]

(iii)

Plot the numbers of absences on the grid below and describe what they show.

[Grid: vertical axis "Number of absences", scaled from 0 to 50; horizontal axis showing Monday to Friday for Weeks 1 to 3 and Monday to Wednesday for Week 4]

...................................................................................................................................................
...............................................................................................................................................[3]
(iv)

Calculate the values of x and y and insert them into the table.

[2]
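The moving totals and averages in the table can be reproduced with a sliding window of five consecutive days. A minimal sketch, with illustrative variable names; each total is aligned with the middle day of its window, so the first value belongs to Wednesday of the first week.

```python
# Daily absences, Monday to Friday, for the three weeks in the table.
absences = [35, 20, 18, 24, 43, 32, 16, 12, 22, 39, 33, 18, 16, 21, 37]
window = 5

# Each moving total covers five consecutive school days.
moving_totals = [
    sum(absences[i:i + window]) for i in range(len(absences) - window + 1)
]
moving_averages = [total / window for total in moving_totals]

# x is the moving total centred on the second Monday (fourth window);
# y is the moving average centred on the second Friday (eighth window).
x = moving_totals[3]
y = moving_averages[7]
print(x, y)
```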
(v)

Plot the values of the moving average on your graph.

[2]

(vi)

State the purpose of calculating moving averages, and indicate whether this appears to have
been achieved in this case.
...................................................................................................................................................
...............................................................................................................................................[2]

(vii)

[1]

(viii)

The seasonal components for the days of the week are given in the following table.
Day of week

Monday

Tuesday

Wednesday

Thursday

Friday

Seasonal component

11

15

Find the value of q.

....................................................[2]
(ix)

Use your trend line and the appropriate seasonal component to estimate the number of
absences on the Tuesday of Week 4.

....................................................[2]


## Cambridge General Certificate of Education Ordinary Level

4040 Statistics November 2014
Principal Examiner Report for Teachers

STATISTICS
Paper 4040/12
Paper 12

Key Messages
After performing any calculation it is worth pausing to consider if the answer obtained is a reasonable one for
the practical situation of the question.
It is very important to carefully read the words of a question to understand precisely what is required.
Candidates should always try to relate their knowledge to the specific requirements of a question, including
the specific context involved, rather than simply writing out memorised general theory.
If a question specifies a certain degree of accuracy for numerical answers, full marks will not be obtained if
the instruction is not followed.
The overall standard of work was higher this year. A substantial number of candidates obtained very good
marks, and there were few exceptionally low marks. It has been noted regularly in these reports that marks
are often lost due to final answers not being given to the accuracy specifically stated in the question. A
definite improvement in this respect was observed this year.
It has also been noted previously that a student of Statistics ought to be able to observe whether or not the
result of a calculation is reasonable in a given practical situation. If it is clearly unreasonable, the work can be
checked to find the error. But some candidates still seem to give no thought to the answer they obtain,
treating the exercise as one in Pure Mathematics, having no practical relevance. For example, in a cross
country running race (see Question 8 below), it should have been obvious that the mean time taken by the
senior competitors following the most difficult route could not have been less than ten minutes.
It will seem superfluous to remark that a question should be read carefully before an answer is attempted.
Yet there were several instances on the paper (see Questions 2, 4(ii), 8(vii) below) where this most basic
advice for answering examination questions was not followed, and where candidates seemed to assume
what they thought was to be done.
When questions are asked which require written answers, there is a tendency for some candidates to
respond in a very general way, repeating apparently memorised points, without relating their knowledge to
the particular context of the question (see Question 3 below). Also, for example, in a situation involving
accidents in the construction industry (see Question 7(iv) below) there should have been no explanations in
terms of death rates, when there was no mention whatsoever of deaths anywhere in the question.
Section A
Question 1
In part (i) the mean and standard deviation were usually evaluated correctly. But in part (ii) many answers
said that the standard deviation would be unchanged after the error made in the gauge readings had been
corrected. It appears as though these candidates were confusing this question with the theory concerning
the effect on the mean and standard deviation of a variable by adding (or subtracting) a constant to each
observation in a set of data.
Answers: (i) 46.5, 4.46; (ii) mean smaller, standard deviation larger

Question 2
This question was not to test how statistical measures are calculated, but to test whether or not a candidate
could deduce which measure was which for a set of measures already calculated, from their relative
numerical values. Many fully correct answers were seen. Many answers were also seen where the candidate
did not read the question properly, but assumed, completely erroneously, that this was a set of raw data from
which the stated measures were to be found.
Answers: (i) 48, 43, 53, 6, 36; (ii) 53
Question 3
Strong answers in part (i) showed a good appreciation of the advantages and disadvantages of the different
survey methods in this particular situation. Weaker answers tended to be vague, speculative, or of a very
general nature. It is not enough in this type of question to say, for example, only that something is easy:
one might validly ask in what way is it easy; what is it that makes it easy?
The difference between closed and open questions was generally well understood in part (ii). The main
weakness in answers tended to be seen in part (ii)(a) where there was sometimes too much focus on the
example given, rather than on closed questions in general.
The answers given below are far from exhaustive, but give examples of what would be considered good
Answers: (i)(a) citizens not in the telephone directory are excluded, (b) better response rate, (c) a very wide
range of people can be reached very quickly, people without internet access are excluded; (ii)(a)
only a limited number of answers is possible, (b) any relevant open question
Question 4
Many fully correct answers to part (i) were seen. In contrast, part (ii) was rarely answered well. This was
another instance of candidates not reading the question correctly. The variable is clearly stated to be the
"number of these vaccines received...", and this only takes the values 1, 2 or 3.
Answers: (i) the following numbers inserted into the correct spaces: (a) 19, (b) 20, (c) 17, (d) 34; (ii) 2
Question 5
Many candidates obtained good marks on this probability question. Errors tended to occur most frequently in
parts (ii) and (iii), with incorrect denominators being used. Some candidates made the solution more
complex than it needed to be in part (iv) by considering male and female authors separately, rather than all
of the authors together as a group.
Answers: (i) 3/5, (ii) 1/8, (iii) 2/8, (iv) 34/195
Question 6
Many candidates now recognise that, for a histogram, the frequency of a class is not always represented
simply by the height of the relevant column. In taking account of the column areas, however, occasional
errors were made which indicated that the labelling of the vertical axis had either not been read carefully, or
not properly understood. So for the under 50km/h class, for example, the height was multiplied by 50 rather
than 5. Part (iv) was least well done, with the total class frequency being offered, rather than the fraction of it
indicated in the question.
Answers: (i) 116, (ii) 62, (iii) 40, (iv) 6

Section B
Question 7
The question on crude and standardised rates continues to be answered exceptionally well, and there was
good application this year of basic knowledge to the problem of industrial accident rates. There were a few
cases in part (iv) of the reason referring to death rates, and as there was no mention of death rates in the
question, this could be given no credit. Only limited understanding was shown in part (vi) as to why the rates
found in the previous part were crude rates.
Answers: (i) 106.5; (ii) 40, 47.9, 75, 162.3; (iii) 102.0; (iv) Fastbuild, because its standardised accident rate
is lower; (v) Kwikbuild 30.7, Fastbuild 32.5, Kwikbuild; (vi) crude rates, a standardised rate is to
eliminate differences in population structures, so is meaningless for one category
Question 8
This question tested the abilities of candidates to interpret statistical information, presented in a mixture of
pictorial and tabular forms, relating to a particular situation. Responses were generally very good, with many
obtaining a high proportion of the marks available. Because of incorrect work a few candidates produced
answers to part (v) which, with pause for thought, should have been recognised as being utterly unrealistic.
When most of the competitors referred to took more than two hours to complete the route, it should have
been immediately apparent that the mean time taken could not have been, as was seen more than once,
less than ten minutes.
To answer part (vii) it was necessary to use all the sources of information: pictogram, pie chart and table.
Once more, this was an instance of a question not being read carefully, because many candidates did not
base their calculation on the senior competitors who had chosen the moderate route, as the question states,
but on the senior competitors who had completed the moderate route. Such candidates thereby used only
one of the sources of information, not all three.
Answers: (i) 280, (ii) (35/100) × 120, (iii) 72, (iv) 12, (v) 141 minutes, (vi) 17.3%, (vii) 37.5%
Question 9
Candidates responded to this question well, and a good number obtained correct answers to all the
numerical parts. One of the errors sometimes made in part (iii) was to divide the 60% by 2 for the overweight
people, but then to read BMI for a cumulative frequency of 30% rather than 70%. A general limitation in
answers to parts (iii) and (iv) was the absence of explanatory method. Whilst this did not matter when
answers were correct, marks for method could not be awarded when they were incorrect.
There is no single good answer to part (v). But to earn both marks it was necessary to say not only how the
health of the people of the country had changed, but to give this specific support by citing more than one of
the changes which had occurred, or citing actual statistics as calculated earlier in the question.
Answers: (i)(a) 23.5 to 23.8, (b) 26.2 to 26.5, (c) 21.2 to 21.5, (d) 29.5 to 29.8; (ii)(a) 57% to 59%, (b) 36%; (iii) 29;
(iv) 22%; (v) the population became more unhealthy, because the percentage healthy decreased
from 58% to 36%, and the percentage obese increased from 7% to 22%
Question 10
Almost all candidates had a clear idea of the required steps in the plotting of data, and the calculation of
averages, to find the equation of the line of best fit. However, a common error seen this year in the
calculation of the semi-averages derived from the ordering of x values and y values as though they were
unconnected with each other, rather than linked pairs of values. This serious error resulted in the loss of
marks for many candidates.
There were mixed answers to part (vi), and only a minority seemed to appreciate the point in part (vii).
Answers: (ii) their x coordinates are not in the set of the four lowest x coordinates; (iii) (2, 41), (4.5, 69),
(3.25, 55); (iv) m = 11.0 to 11.4, c = 18 to 19; (v) 52; (vi) Science, because the gradient of the line of
best fit is the greatest; (vii) it would have been very difficult to know if the pupils performed well in
tests because they liked the subject, or they liked the subject because they performed well in tests
in it.
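The semi-average method described above can be sketched as follows. The function names and the four-point data set are hypothetical; the final calculation uses only the averages quoted in the answers, and it gives a gradient and intercept inside the accepted ranges.

```python
def mean_point(points):
    """Average a list of (x, y) pairs coordinate by coordinate."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def semi_averages(points):
    """Sort the PAIRS by x, so each y stays with its own x, then average each half."""
    ordered = sorted(points)
    half = len(ordered) // 2
    return mean_point(ordered[:half]), mean_point(ordered[-half:])

def line_through(p, q):
    """Gradient m and intercept c of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1

# Hypothetical linked pairs, only to show that the pairs are never separated.
lower, upper = semi_averages([(3, 50), (1, 30), (4, 70), (2, 40)])

# Semi-averages quoted in the answers to part (iii).
m, c = line_through((2, 41), (4.5, 69))
check_overall = m * 3.25 + c  # the line should pass through the overall average (3.25, 55)
```

Sorting the x values and the y values as two separate lists, the error reported above, would break the pairing that `sorted(points)` preserves.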

Question 11
There were some excellent fully correct answers to this question, but also some very weak ones. Where
good solutions were seen, marks were most frequently dropped in part (iii)(b) where the case of 0 eggs was
omitted. In weaker answers, little solid progress was made beyond part (i).
Answers: (i)(a) 0.81, (b) 0.19; (ii) 0.117; (iii)(a) 0.00972, (b) 0.982


STATISTICS
Paper 4040/13
Paper 13

Key Messages
Candidates should always try to relate their knowledge to the specific requirements of a question, including
the specific context involved, rather than simply writing out memorised general theory.
It is sound examination practice to show method clearly, so that marks for method can be awarded even if the final answer is incorrect.
If a question specifies a certain degree of accuracy for numerical answers, the instruction must be followed
for full marks to be credited.
The overall standard of work was comparable to that of last year. A wide range of marks was seen, but there
were few very high marks. The best performances were on Questions 1, 4 and 6 in Section A, and on
Question 10 in Section B.
This year candidates paid much better attention than has often been the case in the past to following
accuracy instructions, where given, as in Question 10.
In questions which require written answers, candidates should try to relate their knowledge to the specific
context of the question rather than simply repeat memorised knowledge of a general nature. The latter
tended to happen especially in Question 5, resulting in little creditworthy work.
Section A
Question 1
Good knowledge of these measures was shown, and how they are found. There were many full mark
answers, but a mark was sometimes lost in the explanation for the median, the need to order the data initially
being omitted.
Answers: (i) 8 is the mode, and definition (ii) 9 is the median, and definition (iii) 11 is the mean, and
definition
Question 2
The variable was usually identified as discrete in part (i), but it was rarely explained what feature of the
variable made it so. Some used as an incorrect reason the fact that the cumulative frequency only has
integer values. A common error in part (iii) was to enter cumulative frequencies into the table.
Answers: (i) X is discrete, as it only takes integer values (ii) 0, 4 (iii) 0, 5, 15, 10, 0, 7, 6, 7

Question 3
In part (a), the way in which methods were different was identified more easily than the way in which they
were similar. Good answers referred to whether or not there was a need for a sampling frame, or for random
numbers, and whether or not the method was biased. Few expressed clearly the way in which they were
similar.
In part (b), whilst the correct choice between biased and unbiased was often made, the reasons offered were
usually not creditworthy. In particular, the fact that one of these is a form of random sampling, whilst the
other is not, was rarely recognised.
Answers: (a) similar in that both sample proportionately from the different age groups; different in that
stratified random sampling requires a sampling frame, whilst quota sampling does not
(b)(i) because there are likely to be fewer words on the last page of the chapter than on other
pages, the sample is likely to be biased (ii) because a systematic sample is a form of random
sampling, the sample is likely to be unbiased
Question 4
This question was a good source of marks for many. Any errors that were made were usually in plotting
cumulative frequencies at class mid points, and occasionally in part (iii)(b), finding the percentage smaller
than, instead of larger than, 37.2 mm.
Answers: (i) 0, 8, 18, 35, 46, 50 (iii)(a) correct reading from the graph presented (b) 14% to 16%
Question 5
chart, but often the answers seen made no reference to the practical situation of the question: that is, the
company producing wood. Answers such as "values can be compared" were not given credit. It also seems
not to have been appreciated that part (i) was not about the advantages and disadvantages of a dual bar
chart considered on its own, but the advantages and disadvantages of a dual bar chart as opposed to a
percentage bar chart. So any disadvantage offered which applies to both could not be credited.
Answers: (i) it shows actual amounts of wood; it only shows amounts for individual sizes (ii) total amount of
wood of all sizes produced (iii) pie chart, sectional bar chart (iv) change chart
Question 6
A substantial number of full-mark answers to this question was seen, with clear reasons well expressed in
part (iii).
Answers: (i) 5 (ii) none of these citizens speaks all three languages (iii)(a) no, because the person would
only speak two of the languages (b) yes, because the person would speak all three of the
languages (c) no, because the person would only speak one of these languages
Section B
Question 7
Correct answers were most frequently seen to part (a) and part (c)(i). It was puzzling in part (b) to see
products of two fractions frequently offered, when clearly three children were being chosen. In part (c)(ii)
only a few candidates recognised that two sums of two products would have to be worked out, corresponding
to the first choices being both blue, or both white.
Answers: (a) 88/105 (b)(i) 1/35 (ii) 1/7 (c) 86/189

Question 8
Good answers were those in which the measures for y were calculated correctly, then used with minimal
work to find the measures for x. Marks were frequently lost in all of parts (iv) to (viii).
For part (iv), the values in column 6 were often found by squaring those in column 5. For part (vi) and part
(vii) it was often assumed that there had been just 7 visitors to the gallery, not 105. And for part (viii) many
started again with the original data, instead of following the instructions of the question to use what had just
been calculated in the previous parts.
Most answers to part (ix) did not make a judgment by looking at the nature of this particular distribution, but
simply repeated apparently memorised theory about the use of the interquartile range.
Answers: (i) 15, 32.5, 37.5, 45, 55, 65, 85 (ii) -12, -5, -3, 0, 4, 8, 16 (iii) -72, -55, -12, 0, 104, 112, 64
(iv) 864, 275, 36, 0, 416, 896, 1024 (v) 141, 3511 (vi) 1.34 (vii) 5.62 (viii)(a) 48.4 (b) 14.1
(ix) because the distribution is reasonably symmetrical, standard deviation is preferable
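The link between the measures for y and those for x can be checked with a short sketch. It uses the totals quoted in the answers and the 105 visitors mentioned above. The coding x = 45 + 2.5y is not stated in this report; it is inferred from the mid-points in part (i) and the coded values in part (ii), and should be read as an assumption.

```python
from math import sqrt

n = 105            # number of visitors, as noted in the report
sum_fy = 141       # total of the f*y column, answer (v)
sum_fy2 = 3511     # total of the f*y^2 column, answer (v)

mean_y = sum_fy / n
sd_y = sqrt(sum_fy2 / n - mean_y ** 2)

# Assumed coding, inferred from the answers: x = origin + scale * y.
origin, scale = 45, 2.5
mean_x = origin + scale * mean_y   # the mean is shifted and scaled
sd_x = scale * sd_y                # the standard deviation is only scaled
```

Dividing by 7 (the number of classes) instead of 105 is the error described for parts (vi) and (vii); starting again from the original data in part (viii) is unnecessary once `mean_y` and `sd_y` are known.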
Question 9
Many candidates recognised in part (i) that they could not simply read off the heights of these columns to
produce the frequency table, but some did not. For part (ii) most formed the correct grouping, but a
good number made the mistake of grouping the classes in part (i) in pairs. In the latter case the classes did
not have equal width, so an appropriate histogram could not be produced in part (iii). A frequent problem
with the histogram was that the vertical axis was not properly labelled, so a possible follow through mark
could not be awarded on the heights drawn. Candidates should note that a vertical axis labelled "fd" or even
"frequency density" is not good enough; the labelling should be of the form given for the histogram at the
start of the question.
For part (v) marks were available for correct method following earlier errors, but the method had to be clearly
shown to earn these.
Answers: (i) 24, 36, 32, 21, 18, 22, 19, 28 (ii) frequencies 24, 68, 80, 28 (iv) 19.3 cm to 19.4 cm (v) 84%
Question 10
The question was very well done and a good source of marks for many candidates. The most common error
was in part (v), where the percentages in the second table were sometimes used instead of the percentages
for the standard population.
Candidates understand very well that it is the standardised rate that has to be used to make fair comparisons
in this type of situation; but they understand less well why it is that one town can have a higher crude rate,
but a lower standardised rate, than the other.
Answers: (ii) 7.18 (iii) 5.56, 7.83, 11.86 (iv) 6.49 (v) 7.90 (vi) populations of the towns are differently
structured in terms of age groups (vii) because its standardised death rate is lower, Eastbury
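A minimal sketch may help with why one town can have the higher crude rate but the lower standardised rate. All figures below are hypothetical and are not taken from the question paper; rates are per thousand, and the standard population is given as percentages.

```python
def crude_rate(deaths, population):
    """Deaths per thousand over the whole population."""
    return 1000 * sum(deaths) / sum(population)

def standardised_rate(deaths, population, standard_percent):
    """Weight each age-group rate by the standard population percentages."""
    group_rates = [1000 * d / p for d, p in zip(deaths, population)]
    return sum(r * s for r, s in zip(group_rates, standard_percent)) / 100

standard = [40, 40, 20]  # hypothetical standard population (%), youngest to oldest

# Hypothetical town A: mostly elderly residents, but low rates within each age group.
a_deaths, a_pop = [2, 5, 160], [1000, 1000, 8000]
# Hypothetical town B: mostly young residents, but higher rates within each age group.
b_deaths, b_pop = [15, 24, 25], [5000, 4000, 1000]

a_crude, b_crude = crude_rate(a_deaths, a_pop), crude_rate(b_deaths, b_pop)
a_std = standardised_rate(a_deaths, a_pop, standard)
b_std = standardised_rate(b_deaths, b_pop, standard)
```

Town A's crude rate is inflated by its large elderly group. Once both towns are weighted by the same standard population, its lower group rates give it the lower standardised rate.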
Question 11
It was a pity that, at the outset in part (i), some candidates did not label the plotted points as instructed, as
this labelling was needed later in the question to make judgments on the performance of the trainees. It was
in these judgment parts, (iv) and (vi), that most marks were lost.
Good answers were able to point out in part (iv) that results for A and B seemed to follow (different) straight
lines, whilst those for C were quite erratic. Once the experienced technician's result was plotted in part (v)
they then further added that B's results were clearly accurate. The best revised lines drawn in part (vii)
passed very closely through B's results and that of the experienced technician.
Answers: (ii) (37.5, 104.5) (iv) results of A and B fall approximately on straight lines, whilst results of C are
erratic (vi) experienced technician's result fits results of B very well, so it seems that results of B
are accurate (viii) correct reading from revised line drawn


STATISTICS
Paper 4040/22
Paper 22

Key Message
The most successful candidates in this examination were able both to calculate the required statistics and to
interpret their findings. In the numerical problems, candidates scoring the highest marks provided clear
evidence of the methods they had used in logical, clearly presented solutions. In questions requiring written
definitions, justification of given techniques and interpretation, the most successful candidates provided
detail in their explanations with clear thought given to the context of the problem, where appropriate.
In general, candidates did better on the questions requiring numerical calculations than on those requiring
written explanations; in particular, candidates did well on the numerical parts of Questions 8 and 10. It was
particularly pleasing to see, in Questions 8(iii) and 10(c)(iii), clearly laid out logical solutions. There were
however three numerical questions that caused difficulty this year, namely parts (ii) and (iii) of Question 2
and Question 4(ii). Answers to questions requiring written explanations, such as Questions 8(v), 10(c)(i)
and 10(c)(ii), were sometimes too vague or insufficiently detailed. However, in Question 1, for example,
there were some very good descriptions of different types of variable and in Question 7(a)(i) clear purposes
stated for finding moving average values. Graphs and charts were often accurately produced where
necessary, but a common error in Question 5 was for the vertical axis label to be missing.
Question 9, on probability and expectation, proved to be the least popular of the optional Section B
questions, with each of the remaining Section B questions proving equally popular.
Section A
Question 1
Most candidates found it easier to find examples for parts (ii) and (iv) than to produce the required definitions
for parts (i) and (iii). There were, however, some good responses seen in all parts and, in general,
candidates did better on this question than on similar questions in the past. In part (i) the most common
correct definitions seen for a discrete variable were "a variable whose outcomes can only take specific or
exact values" or "a variable which can be counted". The most common incorrect answer seen was where
candidates thought that discrete variables must take whole number values. Many correct examples were
seen in part (ii) including height, weight and length. In part (iii) many candidates correctly described a
qualitative variable as "one which does not involve numbers" or "one which can only be described in words".
The mark was not awarded to explanations which simply said that a qualitative variable is "one which has
quality", as further explanation was required. In part (iv) many correct examples of a discrete quantitative
variable were seen including shoe size, the number of people on a bus and the number of leaves on a tree.

Question 2
Most candidates correctly identified the mode in part (i). Many candidates struggled, however, with the
remainder of this question. In part (ii), for example, many candidates, rather than adding 1 to the total
frequency before dividing by 2 to find the correct position for the median, simply divided 29 by 2. In part (iii)
many candidates, correctly, made an attempt to work with a cumulative frequency of 18, but a common
incorrect answer seen was h = 7. This occurred as a result of using an incorrect method for finding the
position of the median for ungrouped data, as was also seen in part (ii).
Answers: (i) 6; (ii) 5; (iii) 6.
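The position rule referred to above can be sketched as follows. The frequency table is hypothetical, chosen only so that the total frequency is 29 as in part (ii); the function name is also an assumption.

```python
def median_from_frequencies(values, freqs):
    """Median of ungrouped data given as values (ascending) with frequencies."""
    n = sum(freqs)
    position = (n + 1) / 2                      # add 1 BEFORE dividing by 2
    ordered = [v for v, f in zip(values, freqs) for _ in range(f)]
    lower = ordered[int(position) - 1]          # value at the position, rounded down
    upper = ordered[int(position + 0.5) - 1]    # value at the position, rounded up
    return position, (lower + upper) / 2

# Hypothetical table with total frequency 29.
position, median = median_from_frequencies([0, 1, 2, 3, 4], [5, 8, 9, 4, 3])
```

With a total frequency of 29 the median is the 15th value, not the value at position 14.5. When the total is even, the two middle values are averaged.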
Question 3
The majority of candidates scored full marks in part (i)(a) of this question. In part (i)(b) there were many good
answers, with the most common errors being either not multiplying by 2 or not realising that if Maria eats the
chocolates then this implies that the situation is without replacement. Part (ii) proved to be more difficult for
many candidates. Some candidates incorrectly included 3 attempts in their working or found only the
probability of exactly 3 attempts. Some candidates produced tree diagrams with probabilities written on the
branches, but no indication of further working which might have gained them some method marks. Some
candidates missed the fact that in this part unwanted chocolates were returned to the tin, and thus incorrect
denominators were sometimes seen.
Answers: (i)(a) 1/19, (b) 15/38; (ii) 16/25
Question 4
Correct answers, together with a correct reason, were seen in the work of the more able candidates in
part (i). Many candidates did not realise that the fact that the data contains extreme values is both the reason
for the choice of the median as the measure of central tendency and the reason for the choice of the
interquartile range as the measure of dispersion. Some candidates did not notice the presence of extreme
values in the data (namely some large masses). Candidates tended to be more successful with part (ii) of
this question. The most common error was for candidates to find one third, rather than two thirds, of 19
before adding it on to 12.
Question 5
In part (i) most candidates were able to name the chart as a percentage sectional or a percentage
component bar chart. The interpretation of this chart required in part (ii) was very well done by the vast
majority of the candidates, with almost all candidates getting the correct numbers of males and most of those
also getting the correct numbers of females. Accurately drawn dual bar charts were seen in part (iii),
although some candidates omitted the label on the vertical axis. In part (iv) it was encouraging to see that
many candidates recognised that the dual bar chart provided them with actual numbers rather than simply
percentages. Some candidates, however, simply stated that the dual bar chart was easier to read, without
explaining that this was because it shows actual numbers.
Answers: (ii) 42, 36, 22; 36, 24, 60
Question 6
Many candidates found this to be the most difficult of the Section A questions. It was quite common in part
(i) for candidates to give just one pair, rather than stating all of the pairs of the independent and mutually
exclusive events. Candidates often gave A and D as their only answer to part (i)(a) and B and C as their only
answer to part (i)(b). In part (ii) many candidates successfully wrote down that the probability of A is 1/6 and
the probability of B is 1/2. These two probabilities were then often simply added together by candidates,
rather than, using the fact that these are independent events, finding the intersection by multiplying the two
probabilities together and then subtracting this from the total. Some good attempts were made in part (iii),
but it was very rare to see both the smallest and the largest possible values both correct.
Answers: (i)(a) A and B, A and C, A and D, (b) B and C, C and D; (ii) 7/12; (iii) 0 and 5/6
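The calculation expected in part (ii) can be sketched with exact fractions. It uses only the two probabilities quoted above and the stated independence of A and B.

```python
from fractions import Fraction

p_a = Fraction(1, 6)
p_b = Fraction(1, 2)

# For independent events the intersection is the product of the probabilities.
p_a_and_b = p_a * p_b

# Addition rule: subtract the intersection so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

# The common error: simply adding the two probabilities.
simple_sum = p_a + p_b
```

The simple sum is 2/3, which overstates the answer by exactly the intersection, 1/12.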

Section B
Question 7
Many candidates were able to give two correct purposes of finding moving average values in part (a)(i). Any
two from "to smooth out/eliminate the variation", "to look for the trend", "to find the seasonal components" or "to
make predictions" were required. In part (a)(ii) many candidates gave the correct answer of 3, although a
considerable minority gave the incorrect answer of 4. In part (a)(iii) answers were usually consistent with any
value suggested in part (a)(ii), though many stopped at "because n is odd/even" without the further
explanation as to whether or not the moving average values correspond to original data points, which was
required for full marks. In parts (b)(i) and (b)(iii) the values of a, b and c were usually correct, the centred
moving average values were usually correctly plotted and a reasonable trend line was obtained. Candidates
often had difficulty, however, in part (b)(ii), with finding the seasonal component and, in part (b)(iv), with
using that seasonal component to estimate sales. Those candidates who correctly found the seasonal
component in part (b)(ii), by finding the differences between sales and moving average values for quarter II
and then averaging these differences, often went on to correctly use the seasonal component in part (b)(iv).
A common error in part (b)(iv) was for a reading to be taken from the trend line, but for the seasonal
Answers: (a)(ii) 3; (b)(i) a = 75, b = 70.75, c = 82; (ii) 9.6; (iv) 54 400
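The method described for part (b)(ii) can be sketched as below: find the differences between sales and centred moving average values for one quarter, then average those differences. The sales figures and the trend reading are hypothetical, the series is assumed to start in quarter I, and an additive model is assumed, in line with the use of differences.

```python
def centred_moving_averages(values, period=4):
    """Return (index, centred moving average) pairs for an even period."""
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    centred = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    first = period // 2  # the first centred value lines up with this observation
    return [(first + i, c) for i, c in enumerate(centred)]

def seasonal_component(values, quarter, period=4):
    """Average of (actual - centred moving average) over one quarter (0 = quarter I)."""
    diffs = [values[i] - c
             for i, c in centred_moving_averages(values, period)
             if i % period == quarter]
    return sum(diffs) / len(diffs)

# Hypothetical quarterly sales over three years, starting in quarter I.
sales = [60, 50, 70, 80, 64, 54, 74, 84, 68, 58, 78, 88]
q2_component = seasonal_component(sales, quarter=1)

trend_reading = 75.0                      # hypothetical reading from the trend line
estimate = trend_reading + q2_component   # additive adjustment for quarter II
```

Centring is needed because a four-point moving average falls between two quarters, so it does not correspond to an original data point until adjacent pairs are averaged.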
Question 8
Most candidates obtained the correct ratio in part (i). In part (ii) most candidates produced correct or almost
correct price relatives with some candidates omitting the 100s in the first column. Some weaker candidates
appeared not to understand the term price relatives and entered what looked like total expenditure values in
the various categories. These candidates sometimes did the part (ii) calculation in part (iii) and recovered.
Others continued with their very wrong values. Many candidates, however, did this part perfectly, producing
well set out solutions and giving their answer to the required degree of accuracy. Part (iv) was also done
perfectly by many candidates, though some ignored the instruction to use the index from part (iii) and did the
calculation by finding the new cost in each category. Many candidates correctly gave reasons specific to the
context of the problem presented, for example, that the amount of electricity used may have changed, that
the number of hours worked may have changed or the amount of ingredients used may have changed for
their answer to part (v). A few candidates referred, incorrectly, to the inaccuracy introduced by rounding
errors and some referred, incorrectly, to changes in prices. Vague answers, such as "the weights or the
quantities may have changed", were sufficient for some of the marks, but in order to score full marks the
reasons provided needed to be in the context of the problem.
Answers: (ii) 100s in first column; 108, 122, 97 in second column; (iii) 101.3 or 101.4; (iv) \$42 600
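The steps from price relatives to a weighted index, and from the index to an estimated cost, can be sketched as follows. The price relatives are those quoted in the answers; the weights and the base-year cost are hypothetical, so the result is not expected to reproduce the reported index.

```python
def weighted_index(price_relatives, weights):
    """Weighted average of price relatives: sum(w * PR) / sum(w)."""
    return sum(w * pr for w, pr in zip(weights, price_relatives)) / sum(weights)

price_relatives = [108, 122, 97]   # second column, from the answers to part (ii)
weights = [5, 2, 3]                # hypothetical weights, not from the question

index = weighted_index(price_relatives, weights)

base_cost = 40000                  # hypothetical base-year total cost in dollars
estimated_cost = base_cost * index / 100
```

Using the index in this way assumes that the quantities behind the weights have not changed, which is the assumption candidates were asked to question in part (v).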
Question 9
This was the question omitted by most candidates or started and abandoned after part (i)(a). Those
candidates who continued with this question were usually able to score both marks in part (i)(a). Although
some perfect answers were produced, many of those who did attempt the whole question did not seem to
have understood the basic process and, for example, thought that exactly 2 points required 2 heads and 1
tail. In part (i)(b) some candidates gave partially correct responses by calculating , but many
missed the fact that 2 points could be achieved in 3 ways. In part (ii) many candidates correctly realised that
X could take the values 0, 1, 2 or 3, but the probabilities of each of these outcomes were usually incorrect
and very often these probabilities did not sum to 1. In part (iii) some correct attempts to use expectation
were seen, although sometimes the working was rather disorganised. In part (iv) many candidates realised
that the probability of a head is 1/5. Again some correct attempts to use expectation were seen, but many
candidates had abandoned this question by this stage.
Answers: (i)(b) 9/64; (ii) 27/64, 27/64, 9/64, 1/64; (iii) \$22; (iv) \$20 so should not risk throwing extra coin
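The probabilities quoted for part (ii) coincide with a binomial distribution with three trials and a success probability of 1/4. That reading is inferred from the answers alone; what counts as a "success" in the original question is not stated in this report. A short sketch confirms the figures and that they sum to 1.

```python
from fractions import Fraction
from math import comb

n = 3                 # assumed number of trials, inferred from the answers
p = Fraction(1, 4)    # assumed success probability, inferred from the answers

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
distribution = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

total = sum(distribution.values())
expected_value = sum(k * prob for k, prob in distribution.items())
```

The factor `comb(3, 2) = 3` is the "3 ways" that many candidates missed in part (i)(b).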
Question 10
Candidates did well on the numerical parts of this question. Correct answers with clear working were often
seen in parts (a)(i) and (a)(ii). In part (b) correct values for the mean and standard deviation were usually
seen in the first three rows of table. A common error was to see the values 22 and 6 for the mean and
standard deviation, respectively, in the final row of the table. In part (c)(i) many candidates achieved one of
the two available marks. They were often able to state that the marks for Algebra were better, but a correct
comparison using the class standard deviations was much less common. The more able candidates were
able to state that the marks in Algebra were generally more varied and there were more correct responses to
this question than on a similar question requiring the comparison of interquartile ranges last year. Part (c)(ii)
was a difficult question requiring candidates to compare Priyanka's mark with the class mean in terms of the
class standard deviation for each of Algebra and Geometry. It was necessary to show that she did better in
Geometry because her mark was two standard deviations above the mean in this subject, whereas in
Algebra her mark was only one standard deviation above the mean. Only the most able candidates were able to
score both of the marks in this part. Many candidates scored full marks in part (c)(iii) with clearly set out
solutions.
Answers: (a)(i) 64; (ii) 1.5; (b) 10, 3; 5.5, 1.5; 58, 15; 11, 3; (c)(iii) 12.5.
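The comparison required in part (c)(ii) amounts to expressing each mark as a number of standard deviations above the class mean. The marks, means and standard deviations below are hypothetical; they are chosen only to give the one and two standard deviations described above.

```python
def standardised_score(mark, mean, sd):
    """Number of standard deviations by which a mark lies above the mean."""
    return (mark - mean) / sd

# Hypothetical figures: the raw Algebra mark is higher, yet Geometry is the better result.
algebra = standardised_score(mark=70, mean=60, sd=10)
geometry = standardised_score(mark=60, mean=50, sd=5)

better_subject = "Geometry" if geometry > algebra else "Algebra"
```

A raw mark on its own does not settle the comparison; it is the position relative to the class mean, measured in class standard deviations, that does.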
Question 11
Responses to questions of this sort have improved over the years. In part (i) many candidates obtained the
correct simple random sample and it was pleasing to see that the number 47 was not repeated too often. In
part (ii) many candidates correctly found a systematic sample, although a few used an interval of 9 instead of
10, and of the three types of sample being tested in this question, the systematic sample was the least well
done. Most candidates were able to find the correct sample sizes in part (iii), even though they had not been
provided with the necessary totals. Many correct answers were seen in part (iv), but some candidates did not
give enough detail when explaining why Asad's sample was not accurate. It was necessary to state that
Asad's sample over-represents machine A (or under-represents machine B or C) and that Omar's sample
accurately represents jars filled by each machine. In part (v) many correct stratified samples were seen and
in part (vi) correct sample sizes were calculated, which included the need to round answers to the nearest
whole number. Part (vii), which required candidates to explain why it was more appropriate to stratify by
machine, was a difficult final part to this question. Only the most able candidates looked back to the start of
the question to find the purpose of sampling in this case. As the purpose was to check the mass of jam in
each jar, it was appropriate to stratify by machine, rather than packer, as it was the machine that was
responsible for the mass of jam. While only the most able candidates were successful in this part of the
question, there were more correct responses than in a similar part on last year's paper.
Answers: (i) 47, 00, 51, 32, 85, 11, 67, 05, 10; (ii) 01, 11, 21, 31, 41, 51, 61, 71, 81; (iii) 2, 3, 4; (v) 44, 03,
59, 14, 27, 20, 78, 60, 81; (vi) 4, 5
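The systematic sample in part (ii) and the proportional sample sizes in part (iii) can be sketched as below. The population of 90 items numbered 00 to 89 is an assumption inferred from a sample of 9 taken at an interval of 10, and the stratum sizes are hypothetical, chosen only to be consistent with the reported allocation.

```python
def systematic_sample(start, interval, population_size):
    """Every interval-th item, beginning at the given start number."""
    return list(range(start, population_size, interval))

def proportional_allocation(stratum_sizes, sample_size):
    """Share the sample among strata in proportion to size, to the nearest whole number."""
    total = sum(stratum_sizes)
    return [round(sample_size * s / total) for s in stratum_sizes]

# Assumed population of 90 items; interval = 90 / 9 = 10, starting at item 01.
labels = [f"{i:02d}" for i in systematic_sample(start=1, interval=10, population_size=90)]

# Hypothetical numbers of jars filled by machines A, B and C.
allocation = proportional_allocation([20, 30, 40], sample_size=9)
```

An interval of 9, the error noted above, would give ten items from this population rather than the nine required.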


STATISTICS
Paper 4040/23
Paper 23

Key Message
The most successful candidates in this examination were able both to calculate the required statistics and to
interpret their findings. In the numerical problems, candidates scoring the highest marks provided clear
evidence of the methods they had used in logical, clearly presented solutions. In questions requiring written
definitions, justification of given techniques and interpretation, the most successful candidates provided
detail in their explanations with clear thought given to the context of the problem, where appropriate.
In general, candidates did better on the questions requiring numerical calculations, than on those requiring
written explanations; in particular, candidates did well on Questions 3 and 4 and the numerical parts of
Questions 7 and 8. It was particularly pleasing to see in Questions 3(iv) and 8(v), on finding the standard
deviation and the interquartile range, respectively, clearly laid out logical solutions. There were however two
numerical questions that caused difficulty this year, namely parts (ii) to (iv) of Question 1 and Question 2.
Answers to questions requiring written explanations, such as Questions 6(a) and 8(vii), were sometimes too
vague. However, in Question 6(b), for example, there were some very good explanations provided for the
difference between discrete and continuous variables. Graphs and charts were often accurately produced
where necessary, but a common error in Question 5 was for labelling to be missing.
Question 9, on probability, proved to be the least popular of the optional Section B questions and it was
also the question that those who attempted it found the most difficult. Question 8 on linear interpolation
proved to be the most popular of the optional questions.
Section A
Question 1
Most candidates were successfully able to calculate the mode in part (i). Many candidates struggled,
however, with the remainder of this question. In part (ii), for example, many candidates, rather than adding 1
to the total frequency before dividing by 2, to find the correct position for the median, simply divided 46 by 2.
In part (iii) many candidates, correctly, made an attempt to work with a cumulative frequency of 29, but a
common incorrect answer seen was k = 12. This occurred as a result of using an incorrect method for finding
the position of the median for ungrouped data, as was also seen in part (ii). It was rare in part (iv) to see a
Answers: (i) 17 (ii) 1 (iii) 11 (iv) 9
Question 2
This proved to be a difficult question on expectation. Many candidates were unable to deal with the fact that
40% and 60% of letters are sent by 1st and 2nd class post, respectively, with many ignoring this information
altogether in their solutions. Attempts were seen to multiply incorrect probabilities by the number of days in
an attempt to find the expected number of days for a letter to be delivered. Those candidates who had
successfully found the probabilities were often able to go on and correctly find the expectation.

Question 3
Most candidates were able to find the required totals and the mean estimate. In part (iv) the correct answer
was seen in many cases, with well set out solutions, but some candidates appeared not to know the correct
formula for the variance or the standard deviation.
Answers: (i) 430 (ii) 17.2 (iii) 8131 (iv) 5.42
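The standard deviation in part (iv) follows from the totals in the earlier parts. The sketch below assumes that 430 and 8131 are the totals of fx and fx² respectively, and takes the total frequency as 25 because 430 / 17.2 = 25; neither is stated explicitly in this report.

```python
from math import sqrt

sum_fx = 430     # answer (i), assumed to be the total of f*x
sum_fx2 = 8131   # answer (iii), assumed to be the total of f*x^2
n = 25           # assumed total frequency, since 430 / 17.2 = 25

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2   # mean of the squares minus the square of the mean
standard_deviation = sqrt(variance)
```

Forgetting to square the mean, or to take the final square root, are typical ways in which the formula goes wrong.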
Question 4
As with question 3, many candidates produced fully correct solutions to this question. The most common
error seen in part (i) was to have two, rather than one, unknown in the equation. A few candidates had a
correct expression but were unable to rearrange correctly and then solve it. In part (ii) many candidates had
a correct standardised term, with the unknown standard deviation as the denominator, but this did not always
appear in a fully correct equation.
Question 5
Many accurate charts were seen in both parts of this question. Marks were, however, sometimes lost due to
a lack of labelling of the vertical axes. It was very important in part (i) to label the vertical axis as expenditure
(in dollars) and in part (ii) as percentages. Some weaker candidates did not start their scales on the vertical
axes at 0 and thus the heights of their bars were not proportional to the expenditure (in part (i)) or the
percentage of the expenditure (in part (ii)).
Answers: (ii) 27%, 33%, 40%; 31%, 33%, 36%
Question 6
There were many partially correct responses to part (a). Candidates were often able to explain what is meant
by a qualitative variable, namely one with non-numerical outcomes, but they were not always able to explain
why this means that it is not possible to illustrate such data in the form of a histogram. In addition candidates
needed to explain that in a histogram area is proportional to frequency and, with no class widths, calculation
of such an area is not possible. In part (b) there were some well explained comparisons made and this was
good to see in a question requiring written explanation. Examples of correct comparisons seen were that a
discrete variable can only take certain values within its range, whereas a continuous variable can take all
values within its range, and that a discrete variable is counted whereas a continuous variable is measured. A
commonly seen incorrect answer was that discrete variables can only take whole number values. Answers to
part (c)(i) were often incorrectly given as 14.5 or 14, whereas answers to part (c)(ii) were more often correct.
Section B
Question 7
Some candidates who embarked upon this question abandoned it after part (i). The most common error in
part (i) was caused by candidates not noticing that each box contained 3 cricket balls. Most candidates who
found the correct ratio in part (i) were able to go on and find correct price relatives in part (ii) and then use
these values to find a correct weighted aggregate cost index in part (iii). The majority of candidates were
also able to find a correct estimate for the total cost of running the club in part (iv). Fewer candidates,
however, were able to provide reasons why their estimate may be very different from the true cost. The most
commonly seen correct answers in part (v) were that the number of balls used may have changed or that the
number of hours worked by the groundsman may have changed. Some commonly seen incorrect answers
were those which had been accounted for within the information included in the calculation, for example the
wage rate may have changed or there may have been inflation. Some answers did not give sufficient
detail, for example, the groundsman may have become ill, without giving further details as to how this
would affect the calculation. In this case it would be necessary to add that he would then be able to work
fewer hours.
Answers: (ii) 90, 102, 105, 103 (iii) 102 (iv) \$21 675

Question 8
Most candidates found the correct modal class in part (i) and the correct cumulative frequencies in part (iii).
It was rare, however, to see a correct answer for the maximum possible value of the range in part (ii). Common
incorrect answers seen included 197, the maximum frequency, and 186, the difference between the
maximum and minimum frequency. In parts (iv) and (v) estimates of the median and the interquartile range
were usually correct with clearly set out working. The position of the median, for example, was usually
correctly identified for grouped data by taking the total frequency and dividing it by two. In part (vi) most
candidates were unable to comment that the distribution was not symmetrical. In part (vii) some candidates
correctly identified that the gradient would be steepest around the 2 under 3 class, but the reason was
sometimes incorrectly given as the difference in the frequency being greatest at this point, rather than that the class
frequency is greatest at this point or that the difference in the cumulative frequency is greatest at this point.
Answers: (i) 2 under 3 (ii) 8 cm (iii) 12, 209, 242, 255, 379, 401, 412, 500 (iv) 4.62 (v) 2.57, 5.97, 3.4
(vi)(a) 2.04 or 2.05 and 1.35
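
The figures quoted above can be cross-checked from the cumulative frequencies alone. The sketch below recovers the class frequencies by differencing, locates the median and quartile positions as fractions of the total frequency, and recomputes the interquartile range from the two quartile estimates. The class boundaries are not given in this report, so the median and quartile estimates themselves are not recomputed.

```python
# Cumulative frequencies from answer (iii).
cumulative = [12, 209, 242, 255, 379, 401, 412, 500]

# Each class frequency is the increase in cumulative frequency over that class.
frequencies = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

# Positions used for grouped data: n/2 for the median, n/4 and 3n/4 for the quartiles.
total = cumulative[-1]
median_position = total / 2
lower_quartile_position = total / 4
upper_quartile_position = 3 * total / 4

# Quartile estimates quoted in answer (v).
lower_quartile, upper_quartile = 2.57, 5.97
interquartile_range = round(upper_quartile - lower_quartile, 2)

print(frequencies)
print(median_position, lower_quartile_position, upper_quartile_position)
print(interquartile_range)
```

The largest class frequency recovered this way is 197 and the smallest is 11, which matches the two incorrect answers to part (ii) described above (197, and 186 = 197 − 11).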
Question 9
This was the least popular of the Section B questions. In part (a)(i) many candidates correctly stated that
mutually exclusive events were those that could not occur at the same time. In part (a)(ii) candidates needed
to give an example of a pair of mutually exclusive events. These were often incorrect as they were not
outcomes of the same experiment, for example, a candidate may give one event as getting a head on a coin
and the other event as getting a 6 on a die. A correct pair of mutually exclusive events would be, for
example, getting a 6 on a die and getting a 4 on the die. In part (a)(iii)(a) candidates often incorrectly stated
that A and B are not mutually exclusive because the sum of the probabilities is not 1, rather than stating that
the sum of the probabilities is greater than 1. The answer to part (a)(iii)(b) was often correct. Most
candidates were able to find the correct probability in part (b)(i). Parts (b)(ii) and (b)(iii) were also usually
correct with any errors occurring in the denominator of the fraction. Fewer candidates were successful with
part (b)(iv), with a common error being that the problem was considered to be with rather than without
replacement. Occasionally addition rather than multiplication of the fractions was seen. Most solutions to part
(b)(v) were incorrect; however, some candidates correctly had denominators of 60 and 59 in their
expressions. There are a number of ways to solve this problem, the most commonly seen correct method
being 7/60 × 12/59 + 28/60 × 13/59.
Answers: (a)(iii)(b) 0.3 (b)(i) 2/5 (ii) 23/35 (iii) 11/25 (iv) 1/177 (v) 112/885
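
The method quoted for part (b)(v) can be checked with exact fractions, reading each pair of adjacent fractions as a product (selection without replacement, so the second denominator drops from 60 to 59). The variable names are illustrative only; what each path represents depends on the question paper, which is not reproduced here.

```python
from fractions import Fraction

# Two mutually exclusive paths, each a product of two without-replacement draws.
first_path = Fraction(7, 60) * Fraction(12, 59)
second_path = Fraction(28, 60) * Fraction(13, 59)

# The paths cannot both occur, so their probabilities are added.
probability = first_path + second_path

print(probability)  # 112/885
```

The result agrees with the answer of 112/885 given for part (b)(v).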
Question 10
Most candidates produced a correct simple random sample in part (i), with the most common error being the
inclusion of the number 18. In part (ii)(a) the most common error was 03 being given for the largest possible
two-digit number for the first person selected. In part (ii)(b) the most common error was 05 for the first
person selected and this occurred sometimes in cases where the candidate had got the previous part
correct. In part (ii)(c) values outside the range were sometimes seen. The stratified samples in parts (iii) and
(iv) were usually correct. In part (v) reasons for each conclusion needed to be stated clearly. In the case of
the sample stratified by friend/relative it was necessary to state that this sample is also representative in
terms of age group. In the case of the sample stratified by age group it was necessary to state that this
sample over-represents friends or under-represents relatives.
Answers: (i) 12, 00, 07, 09, 01 (ii)(a) 00, 02 (ii)(b) 00 (ii)(c) 03, 06, 09, 12 (iii)(a) 3 friends, 2 relatives
(iii)(b) 06, 09, 08, 04, 02 (iv)(a) 2 from Group I, 2 from Group II, 1 from Group III
(iv)(b) 11, 13, 10, 02, 09

Question 11
Some candidates misunderstood the question in part (i) and gave general reasons for calculating moving
averages rather than reasons for calculating a 5-point moving average specifically. It was necessary to
explain that each cycle is of length 5 days. A general question regarding the purpose of calculating moving
averages appears in part (vi). In part (ii) most candidates gave as a correct answer that each cycle contains
an odd number of observations. Alternatively, the correct answer could be expressed as being because the
moving average values are at the same point in time as the original values. In part (iii) the plots were usually
correct. Most candidates spotted the clear cyclical pattern. An alternative answer would have been that there
is no clear upward or downward long-term trend. The calculations in part (iv) and the plots in part (v) were
usually correct. In part (vi) most candidates correctly stated the purpose of calculating moving averages,
namely to eliminate seasonal variation or to find the trend, but the subsidiary question regarding how well
this had been achieved in this case was less well answered. It was necessary for candidates to state that the
purpose had been achieved well in this case. In part (vii) most candidates were able to draw a suitable trend
line. However, some candidates followed the earlier moving average values too closely and ignored the later
ones. In part (viii) a common incorrect answer was q = 3. In part (ix) working was sometimes missing, which
might have been worth a mark had it been shown.
Answers: (iv) x = 127, y = 24.8 (viii) q = 3 (ix) 17
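
As a minimal sketch of why a 5-point moving average suits a 5-day cycle, the example below averages each run of five consecutive observations. The function name and the data are hypothetical and are not taken from the question paper.

```python
def moving_average(values, window=5):
    """Return the simple moving averages of `values` over `window` points."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical observations: two 5-day cycles with a slight upward drift.
observations = [10, 20, 30, 20, 10, 12, 22, 32, 22, 12]

averages = moving_average(observations)
print(averages)
```

Because every window spans exactly one full cycle, the within-cycle variation cancels out and only the underlying trend remains. With an odd window, each average also lines up with the middle observation of its window, so no further centring is needed.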
