
Digitized by the Internet Archive in 2008 with funding from Microsoft Corporation

http://www.archive.org/details/frequencycurvescOOelderich

FREQUENCY-CURVES AND CORRELATION.

BY W. PALIN ELDERTON.

Published for the Institute of Actuaries
BY CHARLES AND EDWIN LAYTON, FARRINGDON STREET, LONDON, E.C.

TABLE OF CONTENTS.

Preface by the President of the Institute of Actuaries
Introduction by the Author
Key to Actuarial Terms and Symbols used

PART I.

CHAPTER
I. Introductory
II. Frequency Distributions
III. Method of Moments
IV. Frequency-Curves
V. Calculation

PART II.

VI. Correlation
VII. The Correlation of Characters not quantitatively measurable
VIII. Probable Errors
IX. The Test of Goodness of Fit
X. The Theory of Contingency

APPENDICES.

I. Useful Constants
II. ... and Γ Functions
III. The Integration of some Expressions connected with the Normal Curve of Error
IV. Alternative Systems of Frequency-Curves
Books, References, &c.
V. Table of Log Γ(p)

INDEX

PREFACE.

The main object of the present Volume may be regarded as being to give a detailed description of the basis and practical application of those modern statistical methods that are associated with the name of Professor Karl Pearson.

The history of the work is briefly as follows: In January, 1903, Mr. W. Palin Elderton read before the Institute of Actuaries an interesting paper dealing with the application of the Pearsonian frequency-curves to the graduation of a mortality experience, and it was then felt that the discussion of that paper suffered considerably from the fact that Professor Pearson's methods, which had attracted so much attention in purely statistical circles, were comparatively unfamiliar to the actuarial profession.

It was therefore suggested by more than one member of the Council that it would be exceedingly useful to the profession if Mr. Elderton would contribute to the Journal of the Institute "an explanatory paper dealing with frequency-curves, and giving illustrations of their use" based upon actuarial data. To this invitation Mr. Elderton replied in the most public-spirited manner by preparing a lengthy paper which forms the nucleus of the present volume.

On consideration it was, however, felt that the work would be more generally useful if published separately instead of in the form of a paper in the Journal, and Mr. Elderton was good enough to undertake the necessary alterations and additions.

It is hoped that in its present form the volume may be of use not only to the actuarial profession, but also to other classes of statistical students who may be glad to have a connected account of Professor Pearson's methods.

Actuarial work, on its purely technical side, depends so largely upon the results of statistical enquiries that developments and improvements in statistical methods must always be of great interest to Actuaries, and the profession is much indebted to Mr. Elderton (who has had exceptional opportunities of becoming familiar with Professor Pearson's work) for the preparation of this volume, which must have involved a great expenditure of thought and labour.

Time alone can show whether, and to what extent, the methods which he has so ably expounded will prove to be of practical value in actuarial work, and it would as yet be premature to express any opinion on this point. It may, therefore, be well to state that the illustrations based on actuarial data must, for the present, be regarded rather as examples of method than as indications of any official view as to the applicability of the methods to actuarial problems. The fact that no such expression of opinion by the Institute is yet possible, however, does not, in the slightest degree, lessen the indebtedness of all actuarial ...

F. B. W.

INTRODUCTION BY THE AUTHOR.

By the preparation of the following pages an attempt is made to bring before Actuaries the more practical methods of modern statistical work. It is difficult to tell how far such methods may prove useful in direct application to actuarial problems, but even if they should happen to be only a slight assistance it seems advisable for Actuaries to have some knowledge of the contemporary study of a subject connected with their own work on the theoretical side.

It has been necessary to exclude some recent work, and in making a selection it has been decided that the subject should be dealt with more fully from its practical than its theoretical aspect, and that, for the present, at any rate, it is best to omit the more difficult proofs, namely, those of the probable errors of the coefficients of correlation and of the formula for testing goodness of fit, while the proof that the method of moments will probably give very satisfactory results has been omitted because it is not necessary to an appreciation of moments as a practical method (as is obvious when we remember that it was not published till the method had been in use for some years), and also because it seems to the present writer that until adjustments have been found possible for all cases its practical value is somewhat discounted. The mean square contingency is dealt with, but the mean contingency is neglected because the mathematics leading to it are more awkward, and, though the numerical work is less, the results are not so satisfactory.

Some readers may be surprised that the method of least squares finds no mention, but they should bear in mind that the range of its applicability is so limited that there is a growing tendency to put it aside in curve fitting, and it seems best to concentrate attention on those parts of the subject more likely to be of permanent value to those for whom this book is intended. The median is not considered because it has no important bearing on the matter dealt with; and if it should happen to be required the only thing wanted is its definition, that it is the position such that the number of cases before it is equal to the number after it.

The coefficient of variation has not been dealt with, but here again it is well for the reader to know its meaning. In comparing the way two things vary it is necessary to remember that the relative size influences not only the mean but the deviation from it, and in discussing variation this is taken into account by using one hundred times the ratio which the standard deviation bears to the mean $\left(\dfrac{100\,\sigma}{\text{mean}}\right)$.

Part II. will probably give Actuaries more difficulty than Part I. because it deals with a type of problem that has at present received little attention from actuarial students, and it is because the direct bearing of Part II. on actuarial work is somewhat uncertain that it is dealt with more in outline than Part I.

Here, as in all other statistical study, examples must be worked out if the methods and principles are to be mastered. The reader who goes through a book on a practical subject and does not work out examples is as certain to encounter imaginary and miss real difficulties, as he is to fail to obtain any satisfactory knowledge of the subject.

The work may appear to some to demand more mathematical knowledge than most Actuaries possess, and it may therefore be well to point out that a practical man can use frequency curves and correlation reasonably without such knowledge, for the fact that a curve he has found agrees with the statistics from which the moments were obtained is a proof that in the particular case he has obtained proper values for the constants even though he has not followed the mathematical reasoning leading to the equations. It must not of course be inferred that belief without proof is considered advisable, but that it is unwise for a practical man to put aside a practical subject which he can test practically, merely because he cannot follow some of the proofs.


Frequency-curves and correlation form a subject in which there is still much to be done in spite of the rapid progress that has been made recently. There are few subjects which offer a richer field for original work than statistical mathematics and its applications. In this field the reader will find that in recent years we are indebted to Professor Karl Pearson for the majority of the work that has proved a success in practice, and anyone writing on the subject for practical men is bound to follow in his footsteps. Those who become interested in the subject are strongly recommended to study Professor Pearson's papers; it is not until they have done so that they will fully appreciate the great extent of his contribution to statistical science. The present Author has merely tried to bring together some of Professor Pearson's results and give them to members of his profession with examples that tend to show that actuarial statistics can be examined by his methods in the same way as the statistics of biology and anthropology. May not the continuation of such work add some links to the chain of continuity and indicate a wider law than an actuary studying his own subject exclusively might be led to suspect?

As will be readily appreciated, the Author is chiefly indebted to Professor Pearson, but his indebtedness is of the kind for which it is impossible to offer formal thanks; such thanks would, at their best, fail to express the sense of gratitude which prompted them. He has also to acknowledge much kind help from Messrs. G. J. Lidstone and John Spencer, both of whom read the work in a somewhat different form in MS. and gave him many suggestions in connection with the arrangement of the matter, and the former has also helped him in many ways at the later stages, while Messrs. S. Adlard and E. L. Elderton have devoted a large amount of time to reading the proofs, and have suggested difficulties that would probably arise and ways of removing them. Miss Ethel M. Elderton has rendered assistance in some of the calculations and in other ways.

W. P. E.
London, August, 1906.

KEY TO THE ACTUARIAL TERMS AND SYMBOLS USED.

The following explanation of certain technical terms and symbols that are used in this book is given as an assistance to non-actuarial readers. For a fuller account of the functions and notation reference can be made to the Text-Book for Actuaries (Parts I. and II.), and to the "Account of the Principles and Methods adopted in constructing the British Offices' Life Tables." London: C. & E. Layton. 1903.

When an investigation is made into the mortality experienced among lives assured, the number of persons entering at each age, and the numbers passing out of observation at each age owing to (1) death, (2) withdrawal, by the policies lapsing, being surrendered, or terminating from some other cause, are recorded.

The exposed to risk of death at age 25 ($E_{25}$) means the number who had the chance of dying between ages 25 and 26, and were on the average at risk for the whole year.

The number who die between ages 25 and 26, divided by the exposed to risk at age 25, gives the probability of dying in a year at age 25 ($q_{25}$).

When an experience ends in any year, say 1900, there will be a large number of persons who have been at risk, but whose policies are still in force; these are called existing at the close of the observation.

When a graduation has been made the expected deaths are found by multiplying the exposed to risk by the graduated values of $q_x$. The result is then compared with the actual deaths.
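The two definitions above (the rate of mortality as deaths divided by exposed to risk, and expected deaths as exposed to risk multiplied by the graduated rate) can be sketched in a few lines. This is only an illustration: the function names and every figure below are hypothetical and are not taken from any experience quoted in this book.

```python
from fractions import Fraction


def mortality_rate(deaths, exposed):
    """q_x: deaths between ages x and x+1 divided by the exposed to risk at age x."""
    return Fraction(deaths, exposed)


def expected_deaths(exposed, graduated_q):
    """Expected deaths at each age: exposed to risk times the graduated q_x."""
    return {age: exposed[age] * graduated_q[age] for age in exposed}


# Hypothetical figures, for illustration only.
exposed = {25: 10000, 26: 9500}
actual_deaths = {25: 62, 26: 61}
graduated_q = {25: Fraction(6, 1000), 26: Fraction(65, 10000)}

crude_q25 = mortality_rate(actual_deaths[25], exposed[25])
expected = expected_deaths(exposed, graduated_q)

# Compare actual with expected deaths, age by age.
for age in sorted(exposed):
    print(age, actual_deaths[age], float(expected[age]))
```

Exact fractions are used here only so that the comparison is free of rounding; ordinary decimals would serve equally well in practice.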
$O^{NM(5)}$ is the name given to the table of mortality obtained from the male lives assured by ordinary whole-life without profit policies between 1863 and 1893, excluding the first five years of assurance. $O^{M}$ is a table constructed from the similar with-profit assurances for all durations, and $H^{M}$ (healthy males) is the name given to the older experience which ended in 1863.

$q_x$ is (see above) the probability of dying in a year.

$p_x$ is the probability of a person aged $x$ living one year. So if we imagine a stationary community, which a person can only leave by death, and consider $l_x$ to be the number living at exact age $x$, then $l_{x+1} = p_x \times l_x$ and $l_{x+2} = p_{x+1} \times l_{x+1}$, and so on.

The value at a rate of interest $i$ per unit of a sum of 1 payable if a person aged $x$ be alive at the end of $t$ years, is therefore

$$\frac{v^{t}\, l_{x+t}}{l_x},$$

where $v = (1+i)^{-1}$, and the value of an annuity of 1 would be

$$\frac{v\, l_{x+1} + v^{2}\, l_{x+2} + \dots}{l_x}.$$

Now for convenience in making tables it is well to multiply numerator and denominator of this expression by $v^{x}$, and we have as the value of the annuity

$$\frac{v^{x+1}\, l_{x+1} + v^{x+2}\, l_{x+2} + \dots}{v^{x}\, l_x} = \frac{N_x}{D_x},$$

where $D_x = v^{x}\, l_x$ and $N_x = D_{x+1} + D_{x+2} + \dots$ Similarly $S_x = N_x + N_{x+1} + \dots$ Tables of $D$, $N$, and $S$ are called commutation columns.

$a_{\overline{n}|}$ is written for the value of an annuity of 1 payable for $n$ years certain, independent of any life, so its value is $v + v^{2} + \dots + v^{n}$. $\bar{a}_{\overline{n}|}$ is the value of a similar annuity of 1 per annum payable $m$ times a year when $m$ takes the limit of $\infty$, so that its value is $\int_{0}^{n} v^{t}\, dt$.
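As a rough check on the commutation columns, the sketch below builds $l_x$ from one-year survival probabilities and values the annuity twice: once directly from $l_x$ and $v$, and once as $N_x / D_x$, taking $N_x = D_{x+1} + D_{x+2} + \dots$ The survival probabilities, the radix of 1000, the rate of interest and the function names are all hypothetical, and ages are counted from 0 purely to keep the example short.

```python
def life_table(radix, p):
    """Build l_x from a starting number and one-year survival probabilities p_x."""
    l = [radix]
    for px in p:
        l.append(l[-1] * px)  # l_{x+1} = p_x * l_x
    return l


def annuity_direct(l, x, i):
    """(v*l_{x+1} + v^2*l_{x+2} + ...) / l_x, summed to the end of the table."""
    v = 1 / (1 + i)
    return sum(v ** t * l[x + t] for t in range(1, len(l) - x)) / l[x]


def annuity_commutation(l, x, i):
    """N_x / D_x, with D_x = v^x * l_x and N_x = D_{x+1} + D_{x+2} + ..."""
    v = 1 / (1 + i)
    D = [v ** age * lx for age, lx in enumerate(l)]
    N_x = sum(D[x + 1:])
    return N_x / D[x]


# Hypothetical survival probabilities; the last one closes the table.
p = [0.9, 0.8, 0.6, 0.3, 0.0]
l = life_table(1000.0, p)

print(annuity_direct(l, 1, 0.03))
print(annuity_commutation(l, 1, 0.03))
```

The two results agree apart from rounding in the arithmetic, which is the whole point of multiplying numerator and denominator by $v^{x}$: the columns $D$ and $N$ need be computed only once for all ages.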

Colog $p_x$ is the logarithm of the reciprocal of $p_x$, and Makeham's hypothesis assumes that its value is $A + Bc^{x}$; but colog $p_x$ is $\log l_x - \log l_{x+1}$; therefore an alternative way of stating the hypothesis is $l_x = k\, s^{x}\, g^{c^{x}}$.
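The equivalence of the two statements of Makeham's hypothesis can be checked numerically. Taking logarithms of $l_x = k\,s^{x} g^{c^{x}}$ and subtracting gives $\log l_x - \log l_{x+1} = -\log s - (c-1)\log g \cdot c^{x}$, so that $A = -\log s$ and $B = -(c-1)\log g$. The sketch below assumes common logarithms, and the constants are hypothetical values chosen only to exercise the identity.

```python
import math


def l_makeham(x, k, s, g, c):
    """l_x = k * s**x * g**(c**x)."""
    return k * s ** x * g ** (c ** x)


def colog_p(x, k, s, g, c):
    """colog p_x = log l_x - log l_{x+1}, using common logarithms."""
    return math.log10(l_makeham(x, k, s, g, c)) - math.log10(l_makeham(x + 1, k, s, g, c))


# Hypothetical constants, not fitted to any table.
k, s, g, c = 100000.0, 0.995, 0.999, 1.09
A = -math.log10(s)
B = -(c - 1) * math.log10(g)

for x in (20, 40, 60):
    print(x, colog_p(x, k, s, g, c), A + B * c ** x)
```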
When valuing the policies in an assurance office the actuary groups cases together to save labour. When assurances are payable at death they are grouped according to the year of birth, but when they are payable at a certain maturity age or previous death (Endowment Assurances) they can be grouped either according to year of birth or according to the number of years to run (unexpired term). In the latter case they have to be valued by finding an average age at maturity; formerly this was done by taking the mean of the ages, but Mr. G. J. Lidstone has recently shown that a much more accurate result is reached by weighting the ages in Geometrical Progression. The constants used for this purpose are called Z.

A model office is an imaginary specimen office which is used for making approximate valuations.

FREQUENCY-CURVES AND CORRELATION.

PART I.

CHAPTER I.

Introductory.

1. The ordinary treatment of probability begins with the assumption that the chance that a certain event will occur is known, and proceeds to solve the problems that arise from the combination of events or the repetition of a particular experiment; it proves that a certain result is more likely to occur from experiment than any other, that a result based on a limited number of trials is unlikely to differ greatly from the expected result, and that the proportional deviation from the most probable result will generally decrease as the number of trials is increased.

Experiments can easily be made to show that the theoretical method leads to results which can be realised in practice when the probabilities can be estimated accurately beforehand; for example, various trials have been made with coin tossing, in which it has been found that if five coins are tossed together and the number of them coming down "heads" is recorded, then the distribution of the cases will agree with the binomial expansion $(\tfrac{1}{2} + \tfrac{1}{2})^{5}$ as the ordinary theory leads us to expect. Sequences of "heads" or "tails" form a series approximating to the Geometrical Progression with a common ratio of $\tfrac{1}{2}$, and the drawing of cards from a pack gives a result closely agreeing with the numbers that theoretical work suggests.
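The two theoretical series mentioned for coin tossing can be written out directly. The sketch below is illustrative only: the function names are invented for the purpose, and a fair coin is assumed. It gives the terms of $(\tfrac{1}{2} + \tfrac{1}{2})^{5}$ and the first few terms of the geometrical progression for the length of an unbroken sequence.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p=Fraction(1, 2)):
    """Terms of the expansion (q + p)**n: the chance of 0, 1, ..., n heads."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


def run_length_terms(count, ratio=Fraction(1, 2)):
    """Chance that an unbroken sequence has length 1, 2, ..., count."""
    return [(1 - ratio) * ratio ** (length - 1) for length in range(1, count + 1)]


five_coins = binomial_terms(5)
print([str(t) for t in five_coins])
print([str(t) for t in run_length_terms(4)])
```

Multiplying the first list by the number of trials gives the frequencies with which 0, 1, ..., 5 heads would be expected, which is the figure an experimenter would set beside his observed counts.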
2. It frequently happens, however, that the probabilities are not known, and it is impossible to tell whether we are dealing with an experiment like coin tossing or sequences or card-drawing; in fact, the only thing known is the distribution of the number of cases into certain groups, and in these circumstances the inverse problem of tracing the theoretical series to which the statistics approximate may become an important matter. The difficulty of the subject is increased because statistics do not give the theoretical distribution exactly, and it is impossible to tell where the differences between the actual and theoretical results lie. To make the position clearer it will be well to re-state the problem and ask whether it is possible to find the theoretical series to which a series resulting from a statistical experiment approximates. It may be difficult, perhaps impossible, to trace the simple probabilities corresponding to a given case, but yet practicable to form a reasonable opinion of the series of numbers that might be reached if the experiment could be repeated an infinite number of times. On turning to the reasons which make it advisable to find this ideal result to which statistics approach, it will be seen that the elementary probabilities are not so important as they seem to be, and a reasonable representation of the series is of far greater practical value.

We notice that one of the first objects of a statistician or an actuary dealing with statistical work is to express the observations in a simple form so that practical conclusions can be easily drawn from the figures that have been collected. If the available statistics fall naturally into fifty or sixty groups he has to decide how they can be arranged to bring out the important features of the problem on which he is working, and if he can find four or five numbers closely connected with the original series which can be used as an index to the whole, he can then give the result in a way that might assist comparison with similar statistics, and enable others who have to deal with the facts to appreciate the whole distribution more readily than they could do if it remained in its original form. The statistician has also to supply approximate values for intermediate terms when only a few can be obtained from his experience, or complete or continue a series when only a part of it is known. In many cases he has to keep the same terms as his original series, but remove the roughnesses of material due to limitations in the number of cases available for his investigation; that is, he has to graduate his data.

3. In reality these objects are much alike, for if the statistical tables can be represented by an algebraic or transcendental formula, we can replace the whole series of numbers by a few values (the constants in the formula) which, if we deal systematically with the distributions we meet, facilitate comparison or enable us to supply missing terms, while the roughness of the original material can be removed by making a suitable formula represent the original statistics as nearly as possible. If a formula is based on theoretical considerations, it should also give a solution of the problem in probabilities mentioned at the outset, and we see that both the practical and theoretical requirements can be dealt with at the same time, for the smooth series sought by the theoretical student is the same thing as the formula required for practical work.

4. The advantages of any system of curves depend on the simplicity of the formulae and the number of classes of observations that can be dealt with satisfactorily, for a complicated expression is very little improvement on the original groups of statistics, and a system which is not capable of general application leaves the statistician in difficulties whenever it breaks down. One other thing is necessary; if a formula is known to be a suitable one there must be some method of finding the arithmetical constants that will give a good agreement in the particular case. Such a method, if it is to be of practical use, must be simple, reliable and capable of general and systematic application.

A broad idea of the objects to be accomplished ought to be kept clearly before the mind; they are likely to be forgotten because of the large amount of detail necessarily connected with the subject. It is also important because the advantages of systematic treatment are often overlooked, and short cuts and rough and ready methods are adopted to the detriment of the work, and formulas having no scientific basis and having no connection with others suitable to similar cases are sometimes used in rather haphazard fashion by statisticians. The consequence is that generalisation is impossible, and where a law might be found one can see little but a great variety of attempts by energetic workers to reach their own conclusions regardless of the value of comparative statistics.

CHAPTER II.

Frequency Distribution.

1. If statistics are arranged so as to show the number of times, or frequency with which, an event happens in a particular way, then the arrangement is a frequency distribution. Although some of our results will be of wider applicability, we shall generally confine our attention to these distributions.

2. It is necessary to have a name for the formula used to describe such distributions, and the term frequency-curve has been adopted for the purpose. The geometrical progression which describes the number of sequences in any direct experiment, such as coin tossing or dice throwing, is a frequency-curve, the equation to which is $y = N a^{x}$.

3. Some distributions give the number of cases falling in a certain group of values of the independent variable; others (e.g., Example V. of Table I.) give the number for an exact value, and in the former case the exact values of the independent variable to which the groups correspond must be considered; for instance, "exposed to risk at age x" includes those from $x - \tfrac{1}{2}$ to $x + \tfrac{1}{2}$, but the number of deaths at duration $n$ those from $n$ to $n+1$. When statistics are represented graphically, effect should be given to these differences, and, to bring out the points a little more clearly, the diagrams on pp. 6 and 7 have been prepared. The drawings of distributions, such as those in the diagram, are called frequency polygons or histograms.

4. When statistics give the number of cases for an exact value of the independent variable, it is simple to plot them in a diagram by drawing ordinates and joining their tops, but in the case of groups of values there is a little complication, for we can either draw a rectangle standing on the entire base (Ex. II. of diagram) or put in ordinates at the middle points of the bases and then join their tops (Ex. III.). The former method seems to give a better idea of the amount of information conveyed by the statistics, but, for some purposes (e.g., for seeing the possible shape of the curve), the latter is more convenient.

[Diagrams, pp. 6 and 7: frequency polygons and histograms of the Examples.]
5. If the reader will now examine the examples in Table I., he will notice that the statistics tend towards a smooth series


Table I.

Example I.    Withdrawals with monthly incidence "0" in year of exit (p. 92, Principles & Methods), by curtate duration.
Example II.   Exposed to risk of Sickness (Watson, M.U. Tables, p. 19), by age.
Example III.  Existing at close of observations, "Old" Annuities (Females), by age.
Example IV.   Existing at close of observations, Without Profit "Old" Assurances, by age.
Example V.    Terms of the expansion of 1000(3/4 + 1/4)^12, by number of term.

Curtate duration
or No. of term      Example I.    Example V.
       1               308            32
       2               200           127
       3               118           232
       4                69           258
       5                59           194
       6                41           103
       7                29            40
       8                28            11
       9                26             2
      10                21             1
      11                18           ...
      12                18
      13                12
      14                11
      15                 5
      16                11
      17                 7
      18                 6
      19                 1
      20                 3
      21                 1
      22                 3
      23                 2

Ages        Example II.   Example III.   Example IV.
 -19            34            ...            ...
20-24          145            ...            ...
25-29          156            ...            ...
30-34          145            ...            ...
35-39          123            ...            ...
40-44          103            ...            ...
45-49           86            ...            ...
50-54           71             42            ...
55-59           55            111             29
60-64           37            176             23
65-69           21            200             81
70-74           13            193            151
75-79            7            160            192
80-84            3             73            239
85-89            1             26            157
90-94          ...              6             93
95-99          ...              1             29
100-           ...            ...              6

                      Example I.   Example II.   Example III.   Example IV.   Example V.
Total                   1,000        1,000          1,000          1,000        1,000
True Total              1,308      2,995,721        2,674            172          ...
Mean                    4.182       37.8750        68.485         79.400        3.998
Standard Deviation      4.1996      2.76810        1.771288       1.774894      1.46215

Type ... VII

as the total number of cases is increased, and from this it can be seen how naturally practical statistics lead to the conception of a frequency-curve to describe the smooth distribution that would be obtained if an infinite supply of homogeneous material were available for investigation. In other words, such curves would give an approximation to the "total population" of which the particular case investigated is a sample.

6. It may be noticed that a frequency-curve will give a frequency corresponding to every value of the independent variable along the whole range of the distribution, and will not restrict us to a few more or less arbitrary groups as is necessarily done by the actual statistics. The binomial series and geometrical progression do the same when we imagine we are dealing with something that can be divided into a very large number of groups. Thus, if we mix a large quantity of sand of two colours and take out a fixed quantity of the mixture and record the number of grains of sand of either colour in each drawing, we should obtain a continuous curve from a large number of trials.

7. We will now define some important functions. When a distribution is arranged according to the progressive values of a variable characteristic, e.g., duration, age, &c., the average value of that characteristic (not the average of the frequencies) is called the mean of the distribution, and is given by

$$\frac{f_a \times a + f_b \times b + f_c \times c + \dots + f_n \times n}{f_a + f_b + f_c + \dots + f_n},$$

where $f_r$ is the frequency corresponding to $r$; thus, in Example I., 200 is the frequency corresponding to 2. If we assume infinitesimal increments, the mean is given by

$$\frac{\int f_x\, x\, dx}{\int f_x\, dx},$$

where the limits of the integral will be such as to cover the whole distribution. The mean could also be described as the position of the ordinate through the centre of gravity of the distribution (centroid vertical), and this may be of help to some readers.
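The distinction drawn above between the mean of the distribution and the average of the frequencies is easily seen in a small calculation. The sketch below uses, as an illustrative distribution only, the number of heads in a toss of five coins with the binomial frequencies 1, 5, 10, 10, 5, 1; the function name is hypothetical.

```python
from fractions import Fraction


def mean(values, frequencies):
    """(f_a*a + f_b*b + ... + f_n*n) / (f_a + f_b + ... + f_n)."""
    total = sum(frequencies)
    return Fraction(sum(f * v for v, f in zip(values, frequencies)), total)


# Illustrative distribution: number of heads among five coins.
heads = [0, 1, 2, 3, 4, 5]
freq = [1, 5, 10, 10, 5, 1]

print(mean(heads, freq))                 # mean of the characteristic
print(Fraction(sum(freq), len(freq)))    # average of the frequencies, a different thing
```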

8. The mode is the characteristic that occurs most frequently, or, in other words, is the position of the maximum ordinate, and its calculation can therefore only be made approximately, unless we know the law connecting the various groups. We cannot find the mode exactly until we know the frequency-curve, because it is the position of an ordinate, and we cannot tell from the rough statistics which ordinate is greatest.

9. Now since one equation or curve might be used for several distributions, one given according to age, a second of a different subject according to duration, a third according to sums assured, and so on, we must have a standard of reference based on the distribution itself. For this purpose a function known as the standard deviation is used. It is given by

$$\sqrt{\frac{f_a\, a'^{2} + f_b\, b'^{2} + \dots + f_n\, n'^{2}}{f_a + f_b + \dots + f_n}},$$

where $a'$, $b'$, ... $n'$ are the distances from the mean. In the form of integrals the standard deviation is

$$\sqrt{\frac{\int f_x\, x^{2}\, dx}{\int f_x\, dx}},$$

where the distances $x$ are measured from the mean. The standard deviation measures the way the frequencies are distributed in terms of the unit of measurement. Since the frequencies furthest from the mean are multiplied by the largest values of $x^{2}$, a large standard deviation shows that the frequency distribution spreads out from the mean, while a small standard deviation shows that the frequency is closely concentrated about the mean. In considering the relative sizes of standard deviations, it is necessary to bear in mind the unit of measurement, because, if a given distribution is arranged in two series, first, according to years of age, and then in quinquennial age groups, the standard deviation will be five times as large in the latter case as it is in the former. This can be seen at once by comparing the two expressions

$$\sqrt{\frac{\Sigma f x^{2}}{\Sigma f}} \quad\text{and}\quad \sqrt{\frac{\Sigma f (5x)^{2}}{\Sigma f}}.$$

The latter is obviously five times the former. The values of the standard deviations are given in Table I. for each case.
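The effect of the unit of measurement on the standard deviation can be sketched as follows. The distribution is the illustrative five-coin series 1, 5, 10, 10, 5, 1, the function names are hypothetical, and the "stretched" series simply places the same frequencies five units apart instead of one.

```python
import math


def mean(values, frequencies):
    """Mean of the characteristic, weighted by the frequencies."""
    return sum(f * v for v, f in zip(values, frequencies)) / sum(frequencies)


def standard_deviation(values, frequencies):
    """Square root of: sum of f * (distance from mean)^2, divided by sum of f."""
    m = mean(values, frequencies)
    total = sum(frequencies)
    return math.sqrt(sum(f * (v - m) ** 2 for v, f in zip(values, frequencies)) / total)


values = [0, 1, 2, 3, 4, 5]
freq = [1, 5, 10, 10, 5, 1]

sd = standard_deviation(values, freq)
# Same frequencies, but the distances between groups are five times as great.
sd_stretched = standard_deviation([5 * v for v in values], freq)

print(sd, sd_stretched)
```

The second figure is five times the first, because every distance from the mean has been multiplied by five while the frequencies are unchanged.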


The diagram on p. 12 shows two curves having the same mean B and approximately the same area, but the dotted curve has the larger standard deviation because it spreads out more on each side of the mean. The reader will notice from the algebraic expressions given above that the standard deviation is not dependent on the number of cases (i.e., on the absolute size of the curve) but merely on the way they are distributed (i.e., on the proportionate numbers or the shape of the curve); it measures the "spread" or "scatter" of the statistics from the mean.

10. An examination of frequency distributions (see Table I. and pp. 6 and 7) shows that most of them start at zero, gradually rise to a maximum, and then fall sometimes at a very different rate. If the rise and fall are at the same rate, the distribution will be symmetrical about its mean, which will obviously coincide with the mode. The difference between the mean and mode is therefore a function of the skewness or deviation from symmetry. In order to get a satisfactory measure, the way the material is grouped must be taken into account, and this leads us to measure skewness by (distance between mean and mode) ÷ standard deviation. If the mean is on the left-hand side of the mode when the statistics are plotted out in diagram, this function will be negative, and to remember the sign it is convenient to write

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}.$$

The diagram will help to show the rationale of the measure for skewness. It gives two curves having the same mean B and the same mode A, but with different standard deviations, and it is clear that the dotted curve, with its larger standard deviation, is more nearly symmetrical than the other curve.

11. We may summarise these functions by saying that the mean and mode fix the position of the curve on the axis; the standard deviation shows how the material is distributed about the mean, and the skewness shows the amount of the deviation from symmetry exhibited by the material. These preliminary definitions will be sufficient for our present purpose, but the functions defined will be more easily understood when their actual connection with the practical work of curve fitting has been studied. A student working at the subject for the first time should plot out several distributions on cross-ruled paper, in order to familiarise himself with their nature and appearance. He should calculate and insert the means in the diagrams, but should not attempt to calculate standard deviations until he knows something of the method of moments.

[Diagram, p. 12: two frequency-curves with mode A and mean B.]
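The measure of skewness defined in this chapter can be sketched as a one-line function. The figures below are hypothetical and are chosen only to show the two points made in the text: that a larger standard deviation with the same mean and mode gives a smaller skewness, and that the sign is negative when the mean lies to the left of the mode.

```python
def skewness(mean, mode, standard_deviation):
    """(Mean - Mode) / Standard deviation."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean - mode) / standard_deviation


# Hypothetical values: same mean and mode, two different spreads.
print(skewness(40.0, 37.0, 6.0))    # the narrower curve
print(skewness(40.0, 37.0, 12.0))   # the wider curve, more nearly symmetrical
print(skewness(35.0, 37.0, 6.0))    # mean to the left of the mode
```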

CHAPTER III.

Method of Moments.

1. Before we proceed to deal with suitable forms for use as frequency-curves, it will be well to see if some method of applying them to statistical examples can be found, for it is clearly useless to suggest a curve and have no way of using it. We require, therefore, a general method by which a given formula can be fitted to a particular statistical experience, and may be applied to any expression (for instance, Makeham's formula for the force of mortality) on which we may have decided as the basis of graduation. The first point to be noticed in searching for such a method is that if there are $n$ constants in the formula, we must form $n$ equations between the formula and the statistics. Thus, if we have three terms, say, $y = 20$, 40 and 88, when $x = 1$, 2, and 3 respectively, and wish to use the curve $y = a + bx + cx^{2}$ to describe them, we can, of course, find values of $a$, $b$ and $c$ so that each item is exactly reproduced by equating as follows:

$$a + b + c = 20$$
$$a + 2b + 2^{2}c = 40$$
$$a + 3b + 3^{2}c = 88$$

But if we have a fourth term $y = 96$ when $x = 4$, and use the values of $a$, $b$ and $c$ found from the three equations just given, we should find that when $x = 4$, $y = 164$. This suggests that when there are more terms in the statistics than there are constants, the equations must be formed by using all the terms, not by selecting from them. The graduating curve will not necessarily reproduce exactly any of the observations but will run evenly through the roughnesses of the observed facts so as to represent their general trend.

14
2. Let $a_1, a_2, \ldots, a_n$ be $n$ terms to be graduated; then, if the series were perfectly smooth and followed a known law, each term could be reproduced exactly by, say, $b_1, b_2, b_3, \ldots, b_n$, where $a_1 = b_1$, $a_2 = b_2$, $a_3 = b_3$, and $a_n = b_n$. Now, if we consider the two series (the $a$'s and the $b$'s), we see that since each term is reproduced exactly

$$\sum_{r=1}^{n} a_r = \sum_{r=1}^{n} b_r \quad \text{and} \quad \sum_{r=1}^{n} c_r a_r = \sum_{r=1}^{n} c_r b_r$$

where $c_r$ is a numerical coefficient.

This suggests a possible method to apply when each term cannot be reproduced exactly. The total of the graduated figures must be made equal to the total of the ungraduated, and the further equations necessary for finding the unknown constants must be formed by multiplying the various terms by different factors and similarly equating the sums of the graduated and ungraduated products, i.e., $\Sigma c_r a_r = \Sigma c_r b_r$. It still remains to decide the best form to be given to $c_r$, and the mean being equal to

$$\frac{a_1 + 2a_2 + \ldots + n a_n}{a_1 + a_2 + \ldots + a_n}$$

suggests that $c_r = r$ should give one reasonable equation. Again, since we shall have to use some function of $r$ which, when applied to the graduation formula, will give an integrable form (otherwise, we cannot make an equation between $\Sigma c_r b_r$ and $\Sigma c_r a_r$), the powers of $r$ suggest themselves as convenient when integration by parts is attempted. If, therefore, we write $c_r = r^t$ and give $t$ successively the values 0, 1, 2, ..., we can obtain as many equations as we require, and the first two of them give the area and mean of the distribution, which will be the same in the graduated and ungraduated figures.

This method is known as the Method of Moments (cf. moments of inertia), and Professor Karl Pearson has recently shown (Biometrika, vol. i., pp. 267, &c.) that it can be expected to give very good results.

3. Applying the method above, we have to solve the three equations given

$$(a + b + c) + (a + 2b + 2^2 c) + (a + 3b + 3^2 c) = 20 + 40 + 88$$
$$(a + b + c) + 2(a + 2b + 2^2 c) + 3(a + 3b + 3^2 c) = 20 + 2 \times 40 + 3 \times 88$$
$$(a + b + c) + 2^2(a + 2b + 2^2 c) + 3^2(a + 3b + 3^2 c) = 20 + 2^2 \times 40 + 3^2 \times 88$$

or

$$3a + 6b + 14c = 148$$
$$6a + 14b + 36c = 364$$
$$14a + 36b + 98c = 972$$

These equations will give the same result as those from which they were formed, because each of the three terms can be graduated exactly; but if we introduce the fourth term, $x = 4$, $y = 96$, we can modify the moment method by adding a fourth term to each equation given above and obtain

$$4a + 10b + 30c = 244$$
$$10a + 30b + 100c = 748$$
$$30a + 100b + 354c = 2{,}508$$

The solution of these equations gives

$$a = -23, \quad b = 42.6, \quad c = -3$$

or

| $x$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| $y$ | 16.6 | 50.2 | 77.8 | 99.4 |

The graduated values total 244, as the first equation requires. This is a very simple example, but it will probably help to show the way results are reached, and will serve as a foundation for what follows.

4. The $n$th moment of a particular frequency is defined as the product of the frequency and the $n$th power of the distance of the frequency from the vertical about which moments are being taken; or the $n$th moment of any ordinate $y$ of a frequency-curve about the vertical through a point distance $x$ from it is $y x^n$, and the $n$th moment of the whole distribution treated as a series of ordinates is

$$y_1 x_1^n + y_2 x_2^n + \ldots$$

where $y_1 + y_2 + \ldots$ is the total frequency. Thus, in Example IV., the third moment of the frequency 81 about the vertical through age 77 is $81 \times (-2)^3$, where 5 years is the unit distance.
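The small example of Arts. 1 and 3 can be checked mechanically. The sketch below is not part of the original text: the helper names `moment_equations` and `solve` are hypothetical, and exact rational arithmetic is used so that no rounding enters. It forms the equations $\Sigma x^t y = \Sigma x^t (a + bx + cx^2)$ for $t = 0, 1, 2$ and solves them, first for the three terms and then for the four.

```python
from fractions import Fraction

def moment_equations(xs, ys, degree):
    """Equate sum(x^t * y) to sum(x^t * (a + b x + c x^2 + ...)) for t = 0..degree."""
    matrix = [[sum(Fraction(x) ** (t + j) for x in xs) for j in range(degree + 1)]
              for t in range(degree + 1)]
    rhs = [sum(Fraction(x) ** t * y for x, y in zip(xs, ys)) for t in range(degree + 1)]
    return matrix, rhs

def solve(matrix, rhs):
    """Exact Gauss-Jordan elimination for a small square system."""
    n = len(rhs)
    rows = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

# Three terms: the moment equations reproduce each term exactly.
three = solve(*moment_equations([1, 2, 3], [20, 40, 88], 2))
# Four terms: the curve can only follow the general trend.
four = solve(*moment_equations([1, 2, 3, 4], [20, 40, 88, 96], 2))

print([str(v) for v in three])
print([str(v) for v in four])
print([float(four[0] + four[1] * x + four[2] * x * x) for x in (1, 2, 3, 4)])
```

With three terms the constants are 28, -22 and 14, which give 164 at $x = 4$; with four terms the same routine solves the three four-term equations directly.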

5. If the ordinates are known, we can calculate the moment for them immediately by multiplying the frequencies by the powers of the distances between them and the vertical about which the moments are required and then add the results, care being taken to give the distances their proper signs. If areas are given, an approximation is made by assuming them to be concentrated about the ordinates at the middle points of the bases on which they stand; the moments thus obtained are sometimes said to be based on "loaded ordinates." The columns after the third in Table II. show the calculation of moments about the vertical through age 77 for the Example IV. of Table I., on the assumption that the frequencies are concentrated at the middle points of the bases.

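Table II.

| Central Age of Group $x$ (1) | Frequency $f$ (2) | $s = \frac{x - 77}{5}$ (3) | $f \times s$ (4) | $f \times s^2$ (5) | $f \times s^3$ (6) | $f \times s^4$ (7) |
|---|---|---|---|---|---|---|
| 57 | 29 | -4 | -116 | 464 | -1,856 | 7,424 |
| 62 | 23 | -3 | -69 | 207 | -621 | 1,863 |
| 67 | 81 | -2 | -162 | 324 | -648 | 1,296 |
| 72 | 151 | -1 | -151 | 151 | -151 | 151 |
| 77 | 192 | 0 | | | | |
| 82 | 239 | 1 | 239 | 239 | 239 | 239 |
| 87 | 157 | 2 | 314 | 628 | 1,256 | 2,512 |
| 92 | 93 | 3 | 279 | 837 | 2,511 | 7,533 |
| 97 | 29 | 4 | 116 | 464 | 1,856 | 7,424 |
| 102 | 6 | 5 | 30 | 150 | 750 | 3,750 |
| Totals | 1,000 | | -498 + 978 = +480 | 3,464 | -3,276 + 6,612 = +3,336 | 32,192 |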
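The arithmetic of Table II. can be reproduced in a few lines. This is only a sketch: the function name `raw_moments` is an assumption, and the frequencies are those of Example IV. as they appear in the table. It treats the frequencies as loaded ordinates at the central ages and returns the moments about age 77 for a total frequency of unity.

```python
ages = [57, 62, 67, 72, 77, 82, 87, 92, 97, 102]
freq = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]

def raw_moments(ages, freq, origin, unit, max_order=4):
    """Moments of orders 0..max_order about `origin`, distances measured in `unit`s."""
    total = sum(freq)
    dist = [(age - origin) / unit for age in ages]
    return [sum(f * s ** t for f, s in zip(freq, dist)) / total
            for t in range(max_order + 1)]

nu_prime = raw_moments(ages, freq, origin=77, unit=5)
mean_age = 77 + 5 * nu_prime[1]   # the first moment locates the mean
print(nu_prime, mean_age)
```

The entries of `nu_prime` are the column totals of Table II. divided by 1,000.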

Notation for Moments.

$N$ = total frequency.
$\nu_n$ = $n$th unadjusted statistical moment about mean.
$\nu'_n$ = $n$th unadjusted statistical moment about any other point.
$\mu_n$ = $n$th moment from curve about mean = $n$th adjusted statistical moment about mean.
$\mu'_n$ = $n$th moment from curve about other point = $n$th adjusted statistical moment about other point.

Note. $\nu$, $\nu'$, $\mu$, $\mu'$ always refer to a total frequency of unity.

The unit of grouping has been taken as 5 years, and if, as is often convenient, we assume the total frequency to be unity, the totals will have to be divided by 1,000. We should generally deal with the actual numbers that occur, but as they have been given in Table I. as the distribution of 1,000 cases, it will be better to use them in that way in the present case.

The numbers in column (3) show the distances from age 77 in terms of the unit of grouping. The centre of any other group would have done almost as well as 77; it is convenient to choose the arbitrary origin so that it is near the mean of the distribution. This makes the calculation of the moments about the mean (a result frequently required) easier, and enables the calculator to get a rough check on these moments by comparing them with those about the arbitrary origin. The columns (4) to (7) are sufficiently explained by their headings; they are formed successively and checked by multiplying $f$ by $s^4$, the values of $s^4$ being taken from a table of the powers of the natural numbers.

6. It has so far been assumed that moments can be calculated about any point, but it is frequently inconvenient to do so; for if we had required them about age 79.4, we should have had to multiply by the powers of $\frac{77 - 79.4}{5}$, $\frac{82 - 79.4}{5}$, and so on, and it is quite clear that the labour would have been very great. In such a case we can, however, take the moments about any other more convenient point, and then modify them in the following way:

Let the distance between A, about which the moments are known, and B, about which they are required, be $+d$; thus, if we want moments about 25.7 and have found them about 25, $d$ is .7; if we had found them about 26, $d$ would have been $-.3$. Then, if the distance of any ordinate $y_r$ from A is $X_r$ and from B is $x_r$, then

$$x_r = X_r - d \quad \text{and} \quad x_r^n = (X_r - d)^n$$

Now, the $n$th moment of the whole distribution treated as a series of ordinates is $\Sigma y_r X_r^n$ about A, and $\Sigma y_r x_r^n$ about B, so we have

$$\nu''_n = \Sigma\, y_r x_r^n = \Sigma\, y_r (X_r - d)^n = \Sigma\left[y_r\left(X_r^n - n d X_r^{n-1} + \frac{n(n-1)}{2!} d^2 X_r^{n-2} - \ldots\right)\right]$$
$$= \nu'_n - n d\,\nu'_{n-1} + \frac{n(n-1)}{2!} d^2 \nu'_{n-2} - \ldots \quad (1)$$

where $\nu''_n$ is written for the $n$th moment about B, and $\nu'_n$ the $n$th moment about A. Instead of (1) we may proceed as follows:

$$\nu'_n = \nu''_n + n d\,\nu''_{n-1} + \frac{n(n-1)}{2!} d^2 \nu''_{n-2} + \ldots$$
$$\therefore\ \nu''_n = \nu'_n - n d\,\nu''_{n-1} - \frac{n(n-1)}{2!} d^2 \nu''_{n-2} - \ldots \quad (2)$$

There is little to choose between these two formulae, and of course they give identical results.
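The two transfer formulae can be compared numerically. The sketch below is illustrative only: the function names are assumptions, and the moments about age 77 are the column totals of Table II. divided by 1,000. Formula (1) uses only the moments about A, while formula (2) builds each moment about B from the lower moments about B already found.

```python
from math import comb

def shift_formula_1(nu_a, d):
    """Formula (1): moments about B from moments about A, where B lies at +d from A."""
    return [sum(comb(n, k) * (-d) ** k * nu_a[n - k] for k in range(n + 1))
            for n in range(len(nu_a))]

def shift_formula_2(nu_a, d):
    """Formula (2): the same moments formed successively from those already found about B."""
    nu_b = []
    for n in range(len(nu_a)):
        correction = sum(comb(n, k) * d ** k * nu_b[n - k] for k in range(1, n + 1))
        nu_b.append(nu_a[n] - correction)
    return nu_b

# Moments of orders 0..4 about age 77 (unit of 5 years, total frequency unity).
nu_about_77 = [1.0, 0.480, 3.464, 3.336, 32.192]
d = nu_about_77[1]          # distance of the mean from age 77

by_1 = shift_formula_1(nu_about_77, d)
by_2 = shift_formula_2(nu_about_77, d)
print(by_1)
print(by_2)
```

Both routes give a zero first moment about the mean, with second and third moments of 3.2336 and -1.430976.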

7. We will now apply formula (2) to work out the moments for the example in Table II. about the centroid vertical (i.e., the vertical through the mean). The distance of the mean from any point is

$$\frac{\Sigma(X_r y_r)}{\Sigma y_r} = \frac{\Sigma(X_r y_r)}{N}$$

where $N$ is the total frequency; or we may say that the distance of the mean from any point is the first moment of the distribution about the vertical through that point. It follows that the first moment about the centroid vertical is zero, and this leads me to prefer formula (2) to formula (1) when moments are required about that vertical. When we come to deal with frequency-curves, we shall see that this is generally the case.
8. The arithmetical work is as follows: The totals in cols. (4) to (7) are divided by the number of observations (total of col. (2)), and the quotients are the moments ($\nu'$) about 77. The moments are dealt with as having reference to a case where unity is the total frequency, i.e., proportional, not actual, frequencies are dealt with.

$$\nu'_1 = .480 \qquad \nu'_2 = 3.464 \qquad \nu'_3 = 3.336 \qquad \nu'_4 = 32.192$$

The value of $\nu'_1$ gives the mean age $= 77 + 5 \times .480 = 79.4$.

In order to use formula (1) or (2), the value of $d$ is required, and when the calculation of moments has to be made about the centroid vertical its value is, as we have seen above, the same as $\nu'_1$; in the present case it is the first moment about the vertical through age 77. The powers of $d$ are next calculated by logarithms; as it happens $d$ is a comparatively simple number; if it had been .48327, say, the propriety of using logarithms would have been more obvious.

$$d^2 = .2304 \qquad d^3 = .110592 \qquad d^4 = .0530842$$

In modifying formula (2) for the particular case in which moments about the centroid vertical are required, it should be remembered that $\nu_1$ is zero and $\nu_0$ is unity, because it is merely the total frequency divided by the total frequency.

$$\nu_2 = \nu'_2 - d^2 = 3.2336$$
$$\nu_3 = \nu'_3 - 3d\nu_2 - d^3 = -1.430976$$
$$\nu_4 = \nu'_4 - 4d\nu_3 - 6d^2\nu_2 - d^4 = 30.416289$$

A seven-place logarithm table and antilogarithm table (such as Filipowski's), an Arithmometer or Brunsviga should be used. It will be noticed that $6d^2\nu_2$ can be formed very easily from $3d\nu_2$ when logarithms are used, as $\log d$ is known. It is useful to keep a note of this value ($\log d$) in a conspicuous place when the moments are being calculated.

9. Although the above is the most direct and obvious way by which moments can be calculated, another method was suggested by Mr. G. F. Hardy and used by him in his recent graduation of the British Offices Life Tables. He pointed out that by summing the statistical numbers and forming a new series in the same way as the $N_x$ column is formed from the $D_x$ column and then summing these results (cf. the $S_x$ column), and so on, equations can be formed. So far as I can trace, Mr. Hardy has not shown the connection between the summation method and the direct calculation of the moments, though he has pointed out that the same results can be obtained. The arrangement on p. 20 shows both the method of calculation and the form of the expression obtained by the process.


[Table III. (p. 20 of the original): scheme of the successive summations of $f(1), f(2), \ldots, f(n)$. The body of the table is illegible in this copy.]

Considering the line opposite the first term, we notice that the sum of the series is given, and that the second summation, which we will call $S_2$, when the total frequency is taken as unity gives the first moment of the whole distribution about a vertical situated at a unit distance before the point corresponding to $f(1)$. Still considering only the first line, we see that $S_3$ gives each function multiplied by coefficients of the form

$$\frac{n(n+1)}{2!} \quad \text{or} \quad \frac{n^2 + n}{2!},$$

i.e., it gives $\frac{\nu'_2 + \nu'_1}{2!}$, where $\nu'$ is written for the moment, because by definition the $t$th moment ($\nu'_t$) of the whole distribution is given by the sum of $n^t f(n)$ for all values of $n$. $S_4$ and $S_5$ give each function multiplied by

$$\frac{n^3 + 3n^2 + 2n}{3!} \quad \text{and} \quad \frac{n^4 + 6n^3 + 11n^2 + 6n}{4!}$$

respectively. The following equations result:

$$S_2 = \nu'_1$$
$$S_3 = \tfrac{1}{2}(\nu'_2 + \nu'_1)$$
$$S_4 = \tfrac{1}{6}(\nu'_3 + 3\nu'_2 + 2\nu'_1)$$
$$S_5 = \tfrac{1}{24}(\nu'_4 + 6\nu'_3 + 11\nu'_2 + 6\nu'_1)$$

These equations enable us to calculate the moments about the selected origin, but if it is necessary to find moments about the mean, the following relations are more convenient; they can be reached by substituting in the above the values in formula (2), and remembering that $S_2 = d$.

$$\nu_2 = 2S_3 - d(1 + d)$$
$$\nu_3 = 6S_4 - 3\nu_2(1 + d) - d(1 + d)(2 + d)$$
$$\nu_4 = 24S_5 - 2\nu_3\{2(1 + d) + 1\} - \nu_2\{6(1 + d)(2 + d) - 1\} - d(1 + d)(2 + d)(3 + d)$$
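The summation method lends itself to a short program. The following is a sketch under stated assumptions: the names `summation_totals` and `moments_about_mean` are hypothetical, each column is formed as a running sum begun from the last term, and the origin is taken one unit before the first term (age 52 for the frequencies of Example IV.).

```python
from itertools import accumulate

freq = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]

def summation_totals(freq, highest=5):
    """Return [S_1, ..., S_highest] for a total frequency of unity."""
    total = sum(freq)
    column = list(freq)
    totals = []
    for _ in range(highest):
        # running sum taken from the last term upwards
        column = list(accumulate(reversed(column)))[::-1]
        totals.append(column[0] / total)
    return totals

def moments_about_mean(s2, s3, s4, s5):
    """Relations between the sums S_2..S_5 and the moments about the mean."""
    d = s2
    nu2 = 2 * s3 - d * (1 + d)
    nu3 = 6 * s4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
    nu4 = (24 * s5 - 2 * nu3 * (2 * (1 + d) + 1)
           - nu2 * (6 * (1 + d) * (2 + d) - 1)
           - d * (1 + d) * (2 + d) * (3 + d))
    return nu2, nu3, nu4

S = summation_totals(freq)
nu2, nu3, nu4 = moments_about_mean(*S[1:])
print(S, nu2, nu3, nu4)
```

The sums come out as 5.48, 19.372, 54.508 and 132.503, and the moments about the mean agree with those found by the direct method.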

10. The following table shows the working in the numerical example already dealt with by the direct method. The fifth sum is unnecessary, as the total of the items in the fourth sum gives the only value required:

Table IV.

| Frequency | First Sum | Second Sum | Third Sum | Fourth Sum |
|---|---|---|---|---|
| 29 | 1,000 | 5,480 | 19,372 | 54,508 |
| 23 | 971 | 4,480 | 13,892 | 35,136 |
| 81 | 948 | 3,509 | 9,412 | 21,244 |
| 151 | 867 | 2,561 | 5,903 | 11,832 |
| 192 | 716 | 1,694 | 3,342 | 5,929 |
| 239 | 524 | 978 | 1,648 | 2,587 |
| 157 | 285 | 454 | 670 | 939 |
| 93 | 128 | 169 | 216 | 269 |
| 29 | 35 | 41 | 47 | 53 |
| 6 | 6 | 6 | 6 | 6 |
| Totals (for check): 1,000 | 5,480 | 19,372 | 54,508 | 132,503 |

From the totals of the columns we have $S_2 = d = 5.48$, $S_3 = 19.372$, $S_4 = 54.508$, and $S_5 = 132.503$.

The first value $S_2$ or $d$ shows that the mean is at age $52 + 5.48 \times 5 = 79.4$. The age 52 is used because it is the centre of the group before that in which numbers occur and, as has been already remarked, the summation method assumes the work to be done with reference to this position. The application of the formulae for $\nu_2$, $\nu_3$, and $\nu_4$ given above, enables us to find

$$\nu_2 = 3.2336 \qquad \nu_3 = -1.43099 \qquad \nu_4 = 30.4164$$

11. This is the most obvious way of using the summation method, but if the series contains a great number of terms, it is more convenient to use a central term instead of the first term as the starting point for the summation.* A slight adjustment is then needed because, though there is no difficulty about the calculation of the sum for the terms on the positive side of the selected point, the moments for the terms on the negative side are formed by multiplying by the powers of negative quantities. In order to use the formulae given above, we require $\Sigma\, n f(n)$; $\Sigma\, \frac{n(n+1)}{2!} f(n)$, and so on, or, when $n$ is negative,

$$\Sigma\, {-n} f(-n);$$
$$\Sigma\, \frac{-n(-n+1)}{2!} f(-n) \quad \text{or} \quad \Sigma\, \frac{n(n-1)}{2!} f(-n);$$
$$\Sigma\, \frac{-n(-n+1)(-n+2)}{3!} f(-n) \quad \text{or} \quad -\Sigma\, \frac{n(n-1)(n-2)}{3!} f(-n);$$
$$\text{and} \quad \Sigma\, \frac{-n(-n+1)(-n+2)(-n+3)}{4!} f(-n) \quad \text{or} \quad \Sigma\, \frac{n(n-1)(n-2)(n-3)}{4!} f(-n).$$

The first of these is given by the last term in the ordinary second summation taken negatively; the second is seen from Table III. to come from the term before the last in the ordinary third sum; the third is the second term before the last in the fourth sum taken negatively; and the fourth is the third term before the last in the fifth sum; the sums in each case being begun from the central term but in the reverse direction from the sums on the positive side. To make the method clearer the following table has been prepared, showing the calculation of the summation about the centre of the group of which the frequency is 192.

* I have to thank Mr. G. J. Lidstone for the suggestion of this improvement in the method.

Table IV. (A).

| Frequency | First Sum | Second Sum | Third Sum | Fourth Sum | Fifth Sum |
|---|---|---|---|---|---|
| 29 | 29 | 29 | 29 | 29 | 29 |
| 23 | 52 | 81 | 110 | 139 | |
| 81 | 133 | 214 | 324 | | |
| 151 | 284 | 498 | | | |
| 192 | | | | | |
| 239 | 524 | 978 | 1,648 | 2,587 | 3,854 |
| 157 | 285 | 454 | 670 | 939 | |
| 93 | 128 | 169 | 216 | 269 | |
| 29 | 35 | 41 | 47 | 53 | |
| 6 | 6 | 6 | 6 | 6 | |
| Total: 1,000 | | | | | |

$$S_2 = .978 - .498 = .48$$
$$S_3 = 1.648 + .324 = 1.972$$
$$S_4 = 2.587 - .139 = 2.448$$
$$\text{and} \quad S_5 = 3.854 + .029 = 3.883$$

Hence $\nu_2 = 2 \times 1.972 - .48 \times 1.48 = 3.2336$, and similarly $\nu_3 = -1.43097$ and $\nu_4 = 30.41621$, agreeing with the previous results.

12. A comparison of Table IV. (A) with Table IV. will show that a saving of numerical work is effected by using a central point as the starting point for the summation, for the sums are numerically smaller and the value of $S_2$ or $d$, which enters into the formulae on p. 21, is much smaller. It will be readily appreciated that whenever there is a large number of terms the summation method, and especially the form of it given in Table IV. (A), is a very great improvement on the product method of calculating moments. By means of an adding machine, such as Burroughs' adding machine, the summations can be obtained mechanically with little trouble, even for series containing as many as a hundred terms.
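The central form of the summation method can also be sketched in code. The names below are assumptions made for illustration. The sums on each side are begun from the far end and run towards the central term; on the negative side the entry picked from the $k$th sum is the one lying $k - 2$ places before the last, with alternating sign, as described in Art. 11.

```python
from itertools import accumulate

def sum_columns(terms, count):
    """Successive running sums, each begun from the far end of `terms`."""
    columns, column = [], list(terms)
    for _ in range(count):
        column = list(accumulate(reversed(column)))[::-1]
        columns.append(column)
    return columns

def central_summation(freq, centre, highest=5):
    """S_2..S_highest about the term at index `centre`, total frequency unity."""
    total = sum(freq)
    above = freq[centre + 1:]        # f(1), f(2), ...
    below = freq[:centre][::-1]      # f(-1), f(-2), ...
    up = sum_columns(above, highest)
    down = sum_columns(below, highest)
    result = {}
    for k in range(2, highest + 1):
        positive = up[k - 1][0] if above else 0
        negative = down[k - 1][k - 2] if len(below) > k - 2 else 0
        result[k] = (positive + (-1) ** (k - 1) * negative) / total
    return result

def moments_about_mean(s2, s3, s4, s5):
    d = s2
    nu2 = 2 * s3 - d * (1 + d)
    nu3 = 6 * s4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
    nu4 = (24 * s5 - 2 * nu3 * (2 * (1 + d) + 1)
           - nu2 * (6 * (1 + d) * (2 + d) - 1)
           - d * (1 + d) * (2 + d) * (3 + d))
    return nu2, nu3, nu4

freq = [29, 23, 81, 151, 192, 239, 157, 93, 29, 6]
S = central_summation(freq, centre=4)     # index 4 holds the frequency 192
print(S, moments_about_mean(S[2], S[3], S[4], S[5]))
```

The sums are much smaller than those taken from the first term, yet the moments about the mean are unchanged.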

13. It is now necessary to consider the calculation of moments from the curve, for until this has been done it is impossible to form equations for finding the constants. Let $y_x = f(x, a, b, c, \ldots)$ where $a, b, c, \ldots$ are constants to be determined. We have seen, on pp. 13 and 14, that one way of working would be to find

$$f(1, a, b, c, \ldots) \times 1^n + f(2, a, b, c, \ldots) \times 2^n + \ldots, \quad \text{say,} \quad \Sigma\, f(x, a, b, c, \ldots) \times x^n,$$

and this would give a result which might be used in forming equations if it were not for the fact that it is often impossible to find an algebraic expression for the sum of such series in terms of the constants. It is, however, generally possible to find such an expression for the integral, and as we have defined the $n$th moment of an ordinate $y_x$ as $y_x x^n$, the $n$th moment of the whole distribution from $x = h$ to $x = k$ is

$$\int_h^k y_x x^n\,dx \quad \text{or} \quad \int_h^k f(x, a, b, c, \ldots)\, x^n\,dx.$$

The total frequency (i.e., total number of cases investigated) is $\int_h^k y_x\,dx$, and the mean is $\int_h^k y_x x\,dx \div \int_h^k y_x\,dx$, as we have already noticed.
14. If the moments from the equation to the curve are calculated in this way and equated to the moments calculated from statistics by assuming that the latter consist of a series of ordinates, an inaccuracy is introduced. Let us consider the two cases:

(1) When the statistics are a system of isolated terms or ordinates* and we wish to pass a curve very closely through them.

(2) When they are a system of areas but the moments are calculated by assuming the areas to be concentrated at the middle points of the bases.

* Strictly speaking, not a frequency distribution but a series of values requiring graduation. The distributions referred to on p. 5 have to be dealt with as areas for frequency-curve work because they tell the way the whole number of cases is divided in groups, and the whole area between the curve and the axis of $x$ must therefore be used.
15. (1) In this case the terms $y_0, y_1, y_2, \ldots, y_{n-1}$ are given by the statistics, and since $\int_{-1/2}^{1/2} y_x\,dx$ is approximately equal to $y_0$ it is simplest* to assume that $\int_{-1/2}^{n-1/2} y_x\,dx$ is given by the equation to the curve, and we have to find adjustments to counteract the error caused by equating $\sum_{x=0}^{n-1} y_x$ to $\int_{-1/2}^{n-1/2} y_x\,dx$ (the error is analogous to that introduced by assuming $\tfrac{1}{2} + a_x = \bar{a}_x$).

* It is generally possible to use these limits in case (1), but if other limits have to be taken, such as 0 to $n$, different quadrature formulae must be used.

The most practical way of overcoming the difficulty is by calculating the true area corresponding to the ordinates $y_0, y_1, \ldots, y_{n-1}$ by means of a quadrature formula (formula of approximate summation). Many formulae are well known, and some have been given for use in rather special circumstances in Text-Book, Part II., pp. 480-491; but for the present purpose it is convenient to have expressions which give approximate values of an area in terms of ordinates lying both within and without the base on which the area to be valued stands. Symbolically, these formulae express $\int_{-1/2}^{1/2} y_x\,dx$ in terms of $y_0$, $y_1$, $y_{-1}$, $y_2$, $y_{-2}$, &c.

I. Let $y_x = a + bx + cx^2 + dx^3 + ex^4$, then

$$\int_{-1/2}^{1/2} y_x\,dx = a + \frac{c}{12} + \frac{e}{80}$$

and

$$y_0 = a$$
$$y_{-1} + y_1 = 2(a + c + e)$$
$$y_{-2} + y_2 = 2(a + 4c + 16e).$$

Now, assume the required integral can be equated to

$$h y_0 + k(y_{-1} + y_1) + l(y_{-2} + y_2),$$

substitute the values given just above and equate the coefficients of $a$, $c$ and $e$ respectively to 1, $\frac{1}{12}$ and $\frac{1}{80}$, and we have

$$h + 2k + 2l = 1$$
$$2k + 8l = \tfrac{1}{12}$$
$$2k + 32l = \tfrac{1}{80}$$

The solution of these equations gives

$$h = \frac{5178}{5760}, \quad k = \frac{308}{5760} \quad \text{and} \quad l = -\frac{17}{5760},$$

and we obtain

$$\int_{-1/2}^{1/2} y_x\,dx = \frac{1}{5760}\{5178\,y_0 + 308(y_{-1} + y_1) - 17(y_{-2} + y_2)\}.$$
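The coefficients of formula I. can be confirmed with exact fractions. This is a minimal sketch and the function names are assumptions. It solves the three conditions in the order $l$, $k$, $h$, then compares the formula with the exact integral of an arbitrary quartic over the unit base centred on $x = 0$.

```python
from fractions import Fraction

# Conditions: h + 2k + 2l = 1, 2k + 8l = 1/12, 2k + 32l = 1/80
l = (Fraction(1, 80) - Fraction(1, 12)) / 24   # subtract the second from the third
k = (Fraction(1, 12) - 8 * l) / 2
h = 1 - 2 * k - 2 * l

def poly(coeffs, x):
    """a + b x + c x^2 + ... evaluated at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def exact_integral(coeffs):
    """Integral of the polynomial from -1/2 to +1/2."""
    half = Fraction(1, 2)
    return sum(c * (half ** (i + 1) - (-half) ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

def formula_one(coeffs):
    """h*y0 + k*(y-1 + y1) + l*(y-2 + y2) for the same polynomial."""
    y = lambda x: poly(coeffs, Fraction(x))
    return h * y(0) + k * (y(-1) + y(1)) + l * (y(-2) + y(2))

quartic = [3, -2, 5, 7, -4]    # arbitrary a, b, c, d, e chosen for the check
print(h, k, l, exact_integral(quartic), formula_one(quartic))
```

Because the odd powers cancel on both sides, the formula is exact for any curve of the fourth degree.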

II. If ...

III. If $y_x = a + bx + cx^2 + dx^3 + ex^4$ ...

IV. If ...

16. We can now take the calculation of the moments, where $\int_{-1/2}^{n-1/2} y_x\,dx$ is required in terms of $y_0, y_1, \ldots, y_{n-1}$. Now,

$$\int_{-1/2}^{n-1/2} y_x\,dx = \int_{-1/2}^{1/2} y_x\,dx + \int_{1/2}^{3/2} y_x\,dx + \ldots + \int_{n-3/2}^{n-1/2} y_x\,dx.$$

If formula I. be applied it can be used for all the integrals on the right-hand side of this equation except the first two and the last two, and the values of these are given by IV. Summing the values obtained and writing IV. with the denominator 5760, we obtain

V. $$\int_{-1/2}^{n-1/2} y_x\,dx = \frac{1}{5760}\{6463\,y_0 + 4371\,y_1 + 6669\,y_2 + 5537\,y_3 + 5760(y_4 + y_5 + \ldots + y_{n-6} + y_{n-5}) + 5537\,y_{n-4} + 6669\,y_{n-3} + 4371\,y_{n-2} + 6463\,y_{n-1}\}$$

which means that we can multiply the first and last ordinates by $\frac{6463}{5760}$ (= 1.1220486), the second and last but one by $\frac{4371}{5760}$ (= .7588542), the third and last but two by $\frac{6669}{5760}$ (= 1.1578125), the fourth and last but three by $\frac{5537}{5760}$ (= .9612847), leave all other ordinates unaltered, and work out the moments in the usual way from this modified series of ordinates. Of course, if there are less than eight ordinates another formula must be evolved.
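The end-weights of formula V. are easily applied by machine. The sketch below is only illustrative and the function name `modify_ordinates` is an assumption. It weights the first four and last four ordinates and leaves the rest alone, and refuses a series of fewer than eight terms, for which the text says another formula is needed.

```python
END_WEIGHTS = [6463 / 5760, 4371 / 5760, 6669 / 5760, 5537 / 5760]

def modify_ordinates(ys):
    """Apply the weights of formula V. to the first and last four ordinates."""
    if len(ys) < 8:
        raise ValueError("formula V. needs at least eight ordinates")
    out = list(ys)
    for i, w in enumerate(END_WEIGHTS):
        out[i] = ys[i] * w            # i-th ordinate from the start
        out[-1 - i] = ys[-1 - i] * w  # i-th ordinate from the end
    return out

# A level series of ten unit ordinates covers an area of exactly ten units.
level = modify_ordinates([1.0] * 10)
print(sum(level))

# The first ordinate of Table V., 51.81, becomes 58.13 after modification.
print(round(51.81 * END_WEIGHTS[0], 2))
```

The four weights at each end add up to four, so the total weight of the series is preserved.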
17. In the following table the original series and the modified one are set out in the first two columns, and in the other columns the calculations of the first four moments about the middle of the range by the direct method are shown:

Table V.

| $y_x$ | Modified by Formula V. $y'_x$ | $x$ | $y'_x \times x$ | $y'_x \times x^2$ | $y'_x \times x^3$ | $y'_x \times x^4$ |
|---|---|---|---|---|---|---|
| 51.81 | 58.13 | -4 | -232.52 | 930.08 | -3,720.32 | 14,881.28 |
| 43.74 | 33.19 | -3 | -99.57 | 288.71 | -866.13 | 2,598.39 |
| 35.58 | 41.01 | -2 | -82.02 | 164.04 | -328.08 | 656.16 |
| 27.80 | 26.72 | -1 | -26.72 | 26.72 | -26.72 | 26.72 |
| 20.42 | 20.42 | 0 | | | | |
| 13.79 | 13.26 | 1 | 13.26 | 13.26 | 13.26 | 13.26 |
| 8.26 | 9.52 | 2 | 19.04 | 38.08 | 76.16 | 152.32 |
| 4.29 | 3.26 | 3 | 9.78 | 29.34 | 88.02 | 264.06 |
| 1.69 | 1.90 | 4 | 7.60 | 30.40 | 121.60 | 486.40 |
| 208.38 | 207.41 | | -440.83 + 49.68 = -391.15 | 1,520.63 | -4,941.25 + 299.04 = -4,642.21 | 19,078.59 |

The total 207.41 is then treated as the total frequency, and the moments for unit frequency ($\mu_n$) would be obtained by dividing -391.15, 1520.63, &c., by 207.41, and not by 208.38, which is not the "total frequency", but merely gives the uncorrected sum of certain equidistant values.

18. The work can sometimes be simplified considerably, for if the values at the ends of the experience are very small and have a tendency to keep close to the axis of $x$ before they finally vanish (i.e., if there is high contact; most actuarial functions $l_x$, $a_x$, $D_x$, &c., have high contact at the old age end of the table), then it is reasonable to suppose that ordinates before the first and after the last exist, but are insignificant in value. Thus the integral corresponding to the whole series of ordinates can be legitimately extended beyond the limits $-\tfrac{1}{2}$ and $n - \tfrac{1}{2}$ previously used, because the additional area thus introduced will be evanescent. Now if the area be so extended, the effect will be that in formula V. the significant ordinates from $y_0$ to $y_{n-1}$ will all have the coefficient unity, and the ordinates with weighted coefficients will all vanish. The practical result is, that if there is high contact at one end of the statistics the adjustment need only be made at the other end, while if there is high contact at both ends no adjustment is necessary. Mathematically, high contact means that the first few differential coefficients vanish at the point of contact. The diagrams on pp. 73 and 90 show high contact at both ends of the curves, and the diagram on p. 67 shows high contact at the longer durations.

19. (2) The second case, namely, that in which mid-ordinates are used instead of areas, may now be examined. By concentrating areas about the middle points of their bases, we assume that the distances by which the areas $\int_{-1/2}^{1/2} y_x\,dx$, $\int_{1/2}^{3/2} y_x\,dx$, &c., must be multiplied, are the same as the distances from $y_0$, $y_1$, &c.; that is, the $t$th moment from the statistics is

$$\int_{-1/2}^{1/2} y_x\,dx \times X^t + \int_{1/2}^{3/2} y_x\,dx\,(X + 1)^t + \ldots + \int_{n-3/2}^{n-1/2} y_x\,dx\,(X + n - 1)^t$$

and we require $\int_{-1/2}^{n-1/2} (X + x)^t\, y_x\,dx$, where $X$ is the distance of $y_0$ from the ordinate about which moments are calculated.

By formula I. the series of integrals can be written

$$\frac{1}{5760}\{\ldots + [5178\,h^t + 308\{(h-1)^t + (h+1)^t\} - 17\{(h-2)^t + (h+2)^t\}]\,y_x + \ldots\}$$

where $h$ is written for $X + x$ in order to simplify the expression, and working out this general coefficient we have

If $t = 1$ this becomes $h$
$t = 2$: $h^2 + \tfrac{1}{12}$
$t = 3$: $h^3 + \tfrac{1}{4}h$
$t = 4$: $h^4 + \tfrac{1}{2}h^2 + \tfrac{1}{80}$

It has already been noticed that if there is high contact, the value of $\int (X + x)^t\, y_x\,dx$ is found by using the unadjusted ordinates; that is, the second moment is given by a series, the general term of which is $h^2 y$; the third by a series, the general term of which is $h^3 y$; and so on; hence, if $\mu$ be written for the true adjusted moment about the mean and $\nu$ for the unadjusted moment, the relations between $\mu$ and $\nu$ are given by

$$\mu_2 + \tfrac{1}{12} = \nu_2 \quad \text{or} \quad \mu_2 = \nu_2 - \tfrac{1}{12}$$
$$\mu_4 + \tfrac{1}{2}\mu_2 + \tfrac{1}{80} = \nu_4 \quad \text{or} \quad \mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240}$$

The mean needs no adjustment, for if $t = 1$ the general term has the correct coefficient $h$, and the third moment has to be adjusted by $\tfrac{1}{4}$ of the first moment, which is zero where the moments are taken about the mean. These adjustments were first given by Mr. W. F. Sheppard in Proceedings of the London Mathematical Society, vol. xxix., pp. 353-380. In order to demonstrate the correction for the $n$th moment by the above method, a parabola of at least the $n$th order is necessary. If we apply these adjustments to the moments found on p. 19, for Example IV. of Table I., we have $\mu_2 = 3.1503$, $\mu_3 = -1.430976$, and $\mu_4 = 28.828322$. These adjustments are found to make a considerable difference in the constants obtained from the moments, especially when there is a small number of terms.
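The general coefficient of Art. 19 and the resulting adjustments can be checked as follows. This is a sketch, not the author's working. The function names are assumptions, and the fourth-moment adjustment is written in the form obtained by eliminating $\mu_2$ from the $t = 4$ coefficient.

```python
from fractions import Fraction

def loaded_coefficient(h, t):
    """Weight that formula I. attaches to an ordinate at distance h in the t-th moment."""
    h = Fraction(h)
    return (5178 * h ** t + 308 * ((h - 1) ** t + (h + 1) ** t)
            - 17 * ((h - 2) ** t + (h + 2) ** t)) / 5760

def sheppard(nu2, nu3, nu4):
    """Adjusted moments about the mean, valid when there is high contact at both ends."""
    mu2 = nu2 - 1 / 12
    mu3 = nu3                      # the correction is 1/4 of the first moment, which is zero
    mu4 = nu4 - nu2 / 2 + 7 / 240
    return mu2, mu3, mu4

for t in (1, 2, 3, 4):
    print(t, loaded_coefficient(3, t))   # coefficients for an ordinate at h = 3

print(sheppard(3.2336, -1.430976, 30.416289))
```

Only the adjusted second moment is compared with the printed figure in the check, since the last decimals of the fourth moment depend on the rounding of $\nu_4$.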

20. When there is not high contact at both ends of the curve, the adjustments become more difficult to value; suggestions have been made for finding the corrections, but they are not altogether satisfactory, and it is probably best in such cases to use the unadjusted figures. A few particular cases are, however, dealt with in Arts. ... of Chap. V.

A student should calculate the moments for one or two distributions, and make the necessary adjustments; he can also find the standard deviations of the distributions, for the S.D. $= \sqrt{\mu_2}$, where the $\mu_2$ has been adjusted in accordance with the above rules. In Examples III. and IV. there is clearly high contact, in II. and V. there is more doubt but the adjustment is advisable, while in I. the rough moment should be used.

21. Before proceeding to deal with fitting more complicated curves it is advisable to consider the application of the method
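of moments to a simple case, namely, when $y = a + bx + cx^2 +$ &c. Let the range be $2l$, and let the origin be at the middle point of the range, and let $m_0$ stand for the area and $m_n$ for the $n$th moment of the whole distribution about the middle of the range. Then

$$m_{2s} = \int_{-l}^{+l} (a + bx + cx^2 + \ldots)\,x^{2s}\,dx = 2\left(\frac{a\,l^{2s+1}}{2s+1} + \frac{c\,l^{2s+3}}{2s+3} + \frac{e\,l^{2s+5}}{2s+5} + \ldots\right)$$

and similarly

$$m_{2s+1} = 2\left(\frac{b\,l^{2s+3}}{2s+3} + \frac{d\,l^{2s+5}}{2s+5} + \ldots\right)$$

These equations show that the even moments give the constants $a$, $c$, $e$, &c., and the odd moments give the constants $b$, $d$, $f$, &c. This is, of course, the result of using moments about the middle of the range, and makes the solution of the equations less laborious than they would otherwise have been. The solution can also be simplified a little by writing

$$\frac{m_{2s}}{2\,l^{2s+1}} = \frac{a}{2s+1} + \frac{c\,l^2}{2s+3} + \frac{e\,l^4}{2s+5} + \ldots$$

so that

$$\frac{m_0}{2l} = a + \frac{c\,l^2}{3} + \frac{e\,l^4}{5} + \ldots$$
$$\frac{m_2}{2l^3} = \frac{a}{3} + \frac{c\,l^2}{5} + \frac{e\,l^4}{7} + \ldots$$
$$\frac{m_4}{2l^5} = \frac{a}{5} + \frac{c\,l^2}{7} + \frac{e\,l^4}{9} + \ldots$$

and similarly

$$\frac{m_1}{2l^2} = \frac{b\,l}{3} + \frac{d\,l^3}{5} + \frac{f\,l^5}{7} + \ldots$$
$$\frac{m_3}{2l^4} = \frac{b\,l}{5} + \frac{d\,l^3}{7} + \frac{f\,l^5}{9} + \ldots$$
$$\frac{m_5}{2l^6} = \frac{b\,l}{7} + \frac{d\,l^3}{9} + \frac{f\,l^5}{11} + \ldots$$

The solution of these equations gives the constants required, for example:

(i.) if $y = a + bx$, we have

$$a = \frac{m_0}{2l}, \qquad b = \frac{3}{l}\cdot\frac{m_1}{2l^2}$$

(ii.) if $y = a + bx + cx^2$

$$a = \frac{3}{4}\left(3\,\frac{m_0}{2l} - 5\,\frac{m_2}{2l^3}\right), \qquad b = \frac{3}{l}\cdot\frac{m_1}{2l^2}, \qquad c = \frac{15}{4l^2}\left(3\,\frac{m_2}{2l^3} - \frac{m_0}{2l}\right)$$

(iii.) if $y = a + bx + cx^2 + dx^3$

$$a = \frac{3}{4}\left(3\,\frac{m_0}{2l} - 5\,\frac{m_2}{2l^3}\right), \qquad b = \frac{15}{4l}\left(5\,\frac{m_1}{2l^2} - 7\,\frac{m_3}{2l^4}\right)$$
$$c = \frac{15}{4l^2}\left(3\,\frac{m_2}{2l^3} - \frac{m_0}{2l}\right), \qquad d = \frac{35}{4l^3}\left(5\,\frac{m_3}{2l^4} - 3\,\frac{m_1}{2l^2}\right)$$

The above results, which can easily be extended if it is wished, may now be applied to one or two numerical examples.

22. As a first example, we shall graduate the statistics in Table V., Art. 17, for which the moments about the middle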

of the range have been calculated. Taking the curve $y = a + bx + cx^2$, the following values from Table V. will be required:

$$2l = 9 \ \text{or} \ l = 4.5, \qquad m_0 = 207.41, \qquad m_1 = -391.15, \qquad m_2 = 1520.63$$

Hence

$$a = \frac{3}{4}\left\{\frac{622.23}{9} - \frac{5}{9} \times \frac{1520.63}{(4.5)^2}\right\} = 20.563$$
$$b = \frac{3}{4.5} \times \frac{-391.15}{9 \times 4.5} = -6.4387$$
$$c = \frac{15}{4(4.5)^2}\left\{\frac{3}{9} \times \frac{1520.63}{(4.5)^2} - \frac{207.41}{9}\right\} = .36815$$

23. The best way to obtain the ordinates corresponding to this graduation is by calculating $b + c$ the first difference, and $2c$ the second difference, from the middle term; their values are -6.0706 and .7363 respectively. Since second differences are constant, the work is done continuously, and is as follows:

| Graduated value | $\Delta$ | $\Delta^2$ |
|---|---|---|
| 52.208 | -9.016 | .736 |
| 43.192 | -8.279 | |
| 34.913 | -7.543 | |
| 27.370 | -6.807 | |
| 20.563 | -6.071 | |
| 14.492 | -5.335 | |
| 9.157 | -4.599 | |
| 4.558 | -3.862 | |
| .696 | | |

These graduated figures will be found to agree fairly well with those given in the first column of Table V.
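The continuous working by differences described in Art. 23 can be sketched as below. The function name is an assumption, and the constants are those printed for the graduation of Table V. Starting from the middle term $a$, the first difference $b + c$ is stepped by the constant second difference $2c$ in each direction.

```python
def ordinates_by_differences(a, b, c, half_width):
    """Values of a + b x + c x^2 for x = -half_width..half_width, built from the middle term."""
    values = {0: a}
    step_up = b + c      # y(1) - y(0)
    step_down = b - c    # y(0) - y(-1)
    for x in range(1, half_width + 1):
        values[x] = values[x - 1] + step_up
        values[-x] = values[-x + 1] - step_down
        step_up += 2 * c     # second differences are constant
        step_down -= 2 * c
    return [values[x] for x in range(-half_width, half_width + 1)]

a, b, c = 20.563, -6.4387, 0.36815
graduated = ordinates_by_differences(a, b, c, 4)
direct = [a + b * x + c * x * x for x in range(-4, 5)]
print([round(v, 3) for v in graduated])
```

The differenced values agree with direct evaluation of the curve, and with the printed column to within a unit or two in the third decimal place.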
24. As a further example the following
a paper by S. H. J.
statistics,

taken from

Allin (Journal of the Institute of Actuaries, xxxix., p. 350), and giving the values of annuities
to

W.

widows

in pension funds according to the age of the


:

member,

may be

considered

Modified

Value
Age.
of

by Formula
p.

V.

Distance from middle of range


multiplied

of x

a' x d-

' x

d3

Annuity.

2*7

a'

by

2.

27 32 37 42

21-20
19-91 19-34 18-58

23-79 15-11

-7

16653
75'55 67-20 17-86

5
-3 -1

2240
17-86

1165-71 377-75 201-60 17-86

8159-97 1888-75 604-80 17-86

-327-14
47 52 57 62
16-74 16-09 18-17 11-15 14-58

-10671-38

1569 1470 1299

+1 +3 +5 +7

1609
54-51 55-75 102-06

1609
163-53 278-75 714-42

16-09

490-59 1393-75 5000-94

139-15

+ 228-41

2935-71

+ 6901-37
-3770-01

98-73

In calculating the above moments it has been assumed that the figures to be graduated represent a system of ordinates
;

if

they had represented a system of areas the adjustment by formula V. would have been unsuitable.

When there is an even number of terms the difficulty of calculating the moments about the middle of the range is that the terms have to be multiplied by .5, 1.5, 2.5, &c., and if the series to be graduated contains only a few terms, it is best to deal with the distance $d$, in the way shown above, and then divide the totals by 2, 4 and 8, in order to obtain the first, second and third moments respectively. In this way, we have

$$m_0 = 139.15, \qquad m_1 = -49.36, \qquad m_2 = 733.93, \qquad m_3 = -471.25.$$
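The device of working with doubled distances and dividing the totals afterwards can be sketched as below. The list names and the helper function are hypothetical; the modified values and the distances $d$ are those of the table above.

```python
# Values modified by formula V. and doubled distances from the middle of the range.
modified_values = [23.79, 15.11, 22.40, 17.86, 16.09, 18.17, 11.15, 14.58]
doubled_distances = [-7, -5, -3, -1, 1, 3, 5, 7]


def moments_from_doubled_distances(values, distances, highest=3):
    """Sum value * d**k, then divide by 2**k because d is twice the true distance."""
    moments = []
    for k in range(highest + 1):
        total = sum(v * d ** k for v, d in zip(values, distances))
        moments.append(total / 2 ** k)
    return moments


m0, m1, m2, m3 = moments_from_doubled_distances(modified_values, doubled_distances)
print(round(m0, 2), round(m1, 2), round(m2, 2), round(m3, 2))
```

Working with whole numbers for $d$ keeps the multiplications simple; the divisions by 2, 4 and 8 are made once only, on the totals.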


We will now fit the statistics with each of the three curves, the formulae for which have been given, and compare the resulting graduations.


(i.)    $y = 17.394 - 1.157x$
(ii.)   $y = 17.633 - 1.157x - .0451x^2$
(iii.)  $y = 17.633 - 1.196x - .0451x^2 + .0035x^3$

The following table shows the graduations:

  Age.    Ungraduated.     (i.)      (ii.)     (iii.)

   27        21.20        21.44     21.13     21.13
   32        19.91        20.29     20.24     20.28
   37        19.34        19.13     19.27     19.31
   42        18.58        17.97     18.20     18.22
   47        16.74        16.82     17.04     17.02
   52        15.69        15.66     15.80     15.76
   57        14.70        14.50     14.46     14.43
   62        12.99        13.34     13.03     13.05
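The first two graduations can be reproduced by a few lines of code. The sketch assumes that $x$ is measured from the middle of the range (age 44.5) in units of five years, which is what the doubled distances $-7, -5, \ldots, +7$ of the earlier table imply; the function names are hypothetical.

```python
AGES = [27, 32, 37, 42, 47, 52, 57, 62]


def x_of(age, middle=44.5, unit=5.0):
    """Distance from the middle of the range, in units of five years (assumed)."""
    return (age - middle) / unit


def formula_i(x):
    return 17.394 - 1.157 * x


def formula_ii(x):
    return 17.633 - 1.157 * x - 0.0451 * x ** 2


for age in AGES:
    x = x_of(age)
    print(age, f"{formula_i(x):.2f}", f"{formula_ii(x):.2f}")
```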

Formulae (ii.) and (iii.) are practically identical, and both are considerably closer to the original figures than (i.).

25. The results obtained so far may be summarized as follows:

(1) The method of moments is a general method of finding the constants in a formula suitable to a particular statistical example, and it consists of equating the values of $\Sigma(x^n f)$ (which is called the $n$th moment, and is summed for all values of $x$ that occur) to similar expressions obtained from the graduation formula. These latter expressions will be algebraic, and simultaneous equations have to be solved in order to find the arithmetical constants.

(2) The moments from the statistics can be calculated by multiplying the frequencies by appropriate values of $x^n$, or by Mr. G. F. Hardy's summation method.

(3) If moments have been obtained about any one vertical, they can be transferred to any other by the formulae in Art. 6 of Chap. III.

(4) Since the moments from the graduation formula must generally be found by means of the integral calculus, while those from the statistics are found by summation, the latter have to be adjusted before the equations for obtaining the constants can be correctly formed. The adjustments depend on whether the statistics are a system of ordinates or a system of areas; in the former case adjustment is made by equation V., and in the latter by the formulae in Art. 19 (Sheppard's adjustments), if there is high contact at both ends of the curve.

CHAPTER IV.

Frequency-Curves.

1. When it becomes necessary in practical work to decide on a system of curves for describing frequency distributions, we have to bear in mind that

(1) Any expression used must be a graduation formula; it must remove the roughness of the material.

(2) There must not be so many constants in the formula that we require a great number of moments, for this means that the accuracy is reduced. The higher the moment the more liable it is to error when deduced from ungraduated observations; this is clear, when we remember that the ends of the experiences are multiplied by the highest numbers and their powers.

(3) There must be a systematic method of approaching frequency distributions.

2. Now, considering the more obvious characteristics of frequency distributions, we find they generally start at zero, rise to a maximum, and then fall sometimes at the same but often at a different rate. At the ends of the distribution there is often high contact. This means, mathematically, that a series of equations $y = f(x)$, $y = \phi(x)$, &c., must be chosen, so that in each equation of the series $\frac{dy}{dx} = 0$ in certain cases; at the maximum (for the test of a maximum is that the first differential coefficient is zero and the second negative) and when $y = 0$, for there is to be contact at one end, at least, of the range of the distribution, or, in other words, the angle formed by the tangent to the curve at this point must be zero, in order that the tangent of the angle (i.e., differential coefficient) may be zero. In non-geometrical language, the finite difference between two successive ordinates must be zero, or there will not be contact.

The above suggests that $\frac{dy}{dx}$ may be put equal to $\frac{y(x+a)}{F(x)}$; then, if $y = 0$, $\frac{dy}{dx} = 0$, and if $x = -a$, $\frac{dy}{dx} = 0$, and we have the maximum we require. So long as $F(x)$ is general the form assumed for $\frac{dy}{dx}$ is extremely general and includes cases when $\frac{dy}{dx}$ may not be zero when $y$ is zero. $F(x)$ is expanded by Maclaurin's theorem in ascending powers of $x$ and we have

$$\frac{dy}{dx} = \frac{y(x+a)}{b_0 + b_1x + b_2x^2 + \ldots} \qquad \text{I.}$$

We shall return to this equation and show how it can be put in the form $y = f(x)$, so as to express $y$ as a direct function of $x$; but as the matter has up to the present been approached from an experimental point of view, it will be interesting to see how equation I. can be obtained up to the $x^2$ term in the denominator from elementary propositions in the theory of probabilities.
3. If $p$ be the probability of an event happening and $q$ the probability of its failing, then the probabilities of its happening once, twice, and so on out of $n$ trials are given by the terms of the expansion $(p+q)^n$; or if we have $N$ cases, the terms of $N(p+q)^n$ give the frequency distribution of the $N$ cases into $n+1$ groups. The binomial series does not represent nearly all the probabilities that arise, and another series that occurs is the hypergeometrical. Thus the chances of getting $r$, $r-1$, $\ldots$ 0 black balls from a bag containing $pn$ black and $qn$ white balls when $r$ balls are drawn, are given by the successive terms of the series

$$\frac{pn(pn-1)\ldots(pn-r+1)}{n(n-1)\ldots(n-r+1)}\left\{1 + r\,\frac{qn}{pn-r+1} + \frac{r(r-1)}{2!}\cdot\frac{qn(qn-1)}{(pn-r+1)(pn-r+2)} + \ldots\right\}$$

A numerical example may help to make the way the series arises clear.

A bag contains seven balls, of which four are black and three white; then if three balls are drawn the probability that

    all will be black is    $\frac{4\cdot3\cdot2}{7\cdot6\cdot5}$

    two will be black is    $\frac{4\cdot3\times3}{7\cdot6\cdot5}\times{}_3C_1$

    one will be black is    $\frac{4\times3\cdot2}{7\cdot6\cdot5}\times{}_3C_2$

    none will be black is   $\frac{3\cdot2\cdot1}{7\cdot6\cdot5}$

The sum of these four expressions is unity. The terms can be seen to agree with the series by putting $n = 7$, $pn = 4$, $qn = 3$, and $r = 3$. Other series may arise, but those given will be sufficient for the present purpose, and we shall proceed to consider how they can be put in the form of equation I. The inconvenience of the expressions as they now stand becomes fairly obvious when an attempt is made to calculate numerical values for a large number of groups, and besides this, they are not continuous, while the statistics of practical work often are.
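The bag example can be checked with exact fractions. The sketch below uses the ordinary counting argument (ways of choosing the black balls, times ways of choosing the white, over all ways of drawing); the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def black_ball_chances(black, white, drawn):
    """Chances of drawing k black balls, for k = drawn, drawn - 1, ..., 0."""
    total = comb(black + white, drawn)
    return [
        Fraction(comb(black, k) * comb(white, drawn - k), total)
        for k in range(drawn, -1, -1)
    ]


# Four black and three white balls, three drawn.
chances = black_ball_chances(black=4, white=3, drawn=3)
for k, chance in zip(range(3, -1, -1), chances):
    print(k, "black:", chance)
print("sum:", sum(chances))
```

With the common denominator $7\cdot6\cdot5 = 210$ the four chances are 24, 108, 72 and 6 parts respectively, which is the form in which they appear above.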
Considering the hypergeometrical series, and remembering that the function required for equation I. is $\frac{1}{y}\frac{dy}{dx}$, and as the series is discontinuous, finite differences must be used, we have

$$y_x = \frac{pn(pn-1)\ldots(pn-r+1)}{n(n-1)\ldots(n-r+1)}\cdot\frac{r(r-1)\ldots(r-x+2)}{(x-1)!}\cdot\frac{qn(qn-1)\ldots(qn-x+2)}{(pn-r+1)(pn-r+2)\ldots(pn-r+x-1)}$$

$$\Delta y_x = y_{x+1} - y_x = y_x\left\{\frac{(r-x+1)(qn-x+1)}{x(pn-r+x)} - 1\right\} = y_x\,\frac{(r+1)(qn+1) - x(n+2)}{x(pn-r+x)}, \quad \text{for } p+q = 1,$$

and

$$\frac{\Delta y_x}{y_{x+\frac{1}{2}}} = \frac{2\{(r+1)(qn+1) - x(n+2)\}}{(r+1)(qn+1) - x\{2(r+1) + n(q-p)\} + 2x^2},$$

where $y_{x+\frac{1}{2}} = \frac{1}{2}(y_x + y_{x+1})$, which may be put in the form of equation I.,

$$\frac{1}{y}\frac{dy}{dx} = \frac{a+x}{b_0 + b_1x + b_2x^2}.$$

4. Returning to equation I., we see that it can be written in the form

$$(b_0 + b_1x + b_2x^2 + \ldots)\frac{dy}{dx} = y(x+a);$$

multiplying each side by $x^n$, and integrating with respect to $x$, we have

$$\int x^n(b_0 + b_1x + b_2x^2 + \ldots)\frac{dy}{dx}\,dx = \int y(x+a)x^n\,dx.$$

Integrate the left-hand side by parts treating $x^n(b_0 + b_1x + b_2x^2 + \ldots)$ as one part, and the right-hand side as the sum of two functions, and then

$$x^n(b_0 + b_1x + b_2x^2 + \ldots)y - \int\{nb_0x^{n-1} + (n+1)b_1x^n + (n+2)b_2x^{n+1} + \ldots\}y\,dx = \int yx^{n+1}dx + a\int yx^n dx,$$

or since at the ends of the range of the curve the expression $x^n(b_0 + b_1x + b_2x^2 + \ldots)y$ vanishes, and using the notation we have already adopted, namely, $\mu'_n = \int yx^n dx$, we have

$$-nb_0\mu'_{n-1} - (n+1)b_1\mu'_n - (n+2)b_2\mu'_{n+1} - \ldots = \mu'_{n+1} + a\mu'_n.$$

If we put $n = 0, 1, \ldots s$ respectively, we get $s+1$ equations to enable us to find $a$, $b_0$, $b_1$, &c., in terms of the moments ($\mu'$) as shown by the following equations, which have been obtained by writing the equation in the form

$$a\mu'_n + nb_0\mu'_{n-1} + (n+1)b_1\mu'_n + (n+2)b_2\mu'_{n+1} + \ldots = -\mu'_{n+1},$$

and then putting $n = 0, 1, 2,$ &c.

$$\left.\begin{aligned}
a\mu'_0 + b_1\mu'_0 + 2b_2\mu'_1 + \ldots &= -\mu'_1\\
a\mu'_1 + b_0\mu'_0 + 2b_1\mu'_1 + 3b_2\mu'_2 + \ldots &= -\mu'_2\\
a\mu'_2 + 2b_0\mu'_1 + 3b_1\mu'_2 + 4b_2\mu'_3 + \ldots &= -\mu'_3\\
a\mu'_3 + 3b_0\mu'_2 + 4b_1\mu'_3 + 5b_2\mu'_4 + \ldots &= -\mu'_4\\
\text{&c., &c.}
\end{aligned}\right\} \qquad \text{II.}$$


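Let us now make $\mu'_1 = 0$, and alter the other moments in the way indicated in Chap. II., for the result of making $\mu'_1 = 0$ is to change the origin of the system to the mean of the distribution. We can also treat $\mu'_0$ as 1, and these simplifications lead to the following results:

(1) Keeping $b_0$ only, we have

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x}{\mu_2}.$$

(2) Keeping $b_0$ and $b_1$, the first three equations in the system II. above give

$$a + b_1 = 0, \qquad b_0 = -\mu_2 \qquad \text{and} \qquad a\mu_2 + 3b_1\mu_2 = -\mu_3,$$

or

$$b_1 = -\frac{\mu_3}{2\mu_2} \qquad \text{and} \qquad a = \frac{\mu_3}{2\mu_2},$$

and the differential equation becomes

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x + \frac{\mu_3}{2\mu_2}}{\mu_2 + \frac{\mu_3}{2\mu_2}x}.$$

(3) Keeping $b_0$, $b_1$, $b_2$, the system gives

$$\begin{aligned}
a + b_1 &= 0\\
b_0 + 3b_2\mu_2 &= -\mu_2\\
a\mu_2 + 3b_1\mu_2 + 4b_2\mu_3 &= -\mu_3\\
a\mu_3 + 3b_0\mu_2 + 4b_1\mu_3 + 5b_2\mu_4 &= -\mu_4.
\end{aligned}$$

The solution of these simultaneous equations is perfectly straightforward, and leads to

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{D}}{\dfrac{\mu_2(4\mu_2\mu_4 - 3\mu_3^2)}{D} + \dfrac{\mu_3(\mu_4 + 3\mu_2^2)}{D}\,x + \dfrac{2\mu_2\mu_4 - 3\mu_3^2 - 6\mu_2^3}{D}\,x^2} \qquad \text{III.}$$

where $D$ is written for $10\mu_2\mu_4 - 18\mu_2^3 - 12\mu_3^2$. In this last form put $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2}$, and

$$\frac{1}{y}\frac{dy}{dx} = -\frac{x + \dfrac{\sqrt{\mu_2}\sqrt{\beta_1}(\beta_2+3)}{2(5\beta_2 - 6\beta_1 - 9)}}{\dfrac{\mu_2(4\beta_2 - 3\beta_1)}{2(5\beta_2 - 6\beta_1 - 9)} + \dfrac{\sqrt{\mu_2}\sqrt{\beta_1}(\beta_2+3)}{2(5\beta_2 - 6\beta_1 - 9)}\,x + \dfrac{2\beta_2 - 3\beta_1 - 6}{2(5\beta_2 - 6\beta_1 - 9)}\,x^2}.$$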

5. The reasoning by which equation I. was first obtained showed that $a$ is the distance between the origin and the mode, or as the origin has now been transferred to the mean by putting $\mu'_1 = 0$, $a$ is the distance between the mean and the mode. This distance in terms of the moments is, therefore,

$$\frac{\sigma\sqrt{\beta_1}(\beta_2+3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

where $\sigma$ is the standard deviation $\sqrt{\mu_2}$. Since the skewness is the distance between the mean and mode divided by the standard deviation, it is

$$\frac{\sqrt{\beta_1}(\beta_2+3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$
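As a check on the algebra, the constants of equation I. (taken as far as $b_2$, with the origin at the mean) can be computed from the moments and substituted back into the three simultaneous equations of Art. 4. The sketch below is an illustration only: the function names are hypothetical, and the moments used are those of the Type I. example worked in Chap. V.

```python
import math


def pearson_constants(mu2, mu3, mu4):
    """Constants a, b0, b1, b2 of equation I., origin at the mean."""
    d = 10 * mu2 * mu4 - 18 * mu2 ** 3 - 12 * mu3 ** 2
    a = mu3 * (mu4 + 3 * mu2 ** 2) / d
    b0 = -mu2 * (4 * mu2 * mu4 - 3 * mu3 ** 2) / d
    b1 = -a
    b2 = -(2 * mu2 * mu4 - 3 * mu3 ** 2 - 6 * mu2 ** 3) / d
    return a, b0, b1, b2


def skewness(mu2, mu3, mu4):
    """Distance from mean to mode divided by the standard deviation."""
    a, _, _, _ = pearson_constants(mu2, mu3, mu4)
    return a / math.sqrt(mu2)


mu2, mu3, mu4 = 7.66237, 15.1069, 172.326
a, b0, b1, b2 = pearson_constants(mu2, mu3, mu4)

# Residuals of the three equations; each should vanish apart from rounding.
residuals = [
    b0 + 3 * b2 * mu2 + mu2,
    a * mu2 + 3 * b1 * mu2 + 4 * b2 * mu3 + mu3,
    a * mu3 + 3 * b0 * mu2 + 4 * b1 * mu3 + 5 * b2 * mu4 + mu4,
]
print("mean - mode:", round(a, 4), " skewness:", round(skewness(mu2, mu3, mu4), 4))
```

The distance $a$ takes the sign of $\mu_3$, so that a positive third moment places the mode on the left-hand side of the mean.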

6. It would be possible to obtain constants in the differential equation I. by using a greater number of terms and retaining $b_3$, $b_4$, &c., but there are strong practical objections to such a course. Besides the large increase in arithmetical work, the gain in introducing additional constants is not great because the higher moments become untrustworthy, owing, as we have already noticed, to their probable errors being very large. Professor Pearson has shown* that "we might easily on a random sample reach a 7th or 8th moment having half or double the value it actually has in the general population. Constants based on these high moments will be practically idle. They may enable us to describe closely an individual random sample, but no safe argument can be drawn from this individual sample as to the general population at large, at any rate so far as the argument is based on the constants depending on these high moments." In some actuarial statistics where there are as many as 100,000 cases, it might be worth while to go as far as the next term of the series, but even here the value of the work is discounted because any other smaller body of statistics on the same subject could not be compared satisfactorily with the result. For practical purposes it is probable that the equation taken as far as $b_2$ will be sufficient, and we shall confine our attention to the forms thus obtained, merely remarking that in some extreme cases in graduation another term might be required.

* "Skew Correlation and non-linear Regression," Drapers' Company Research Memoir, 1905, p. J.
7. Turning to the particular form of equation I. given in equation III. it will be seen that it is possible to obtain a formula representing the statistics by inserting in that equation the values of the moments found from the statistics, but this would not give a graduation in the same form as that in which the original data appeared, for in the latter we have $y$, while the former gives $\frac{1}{y}\frac{dy}{dx}$ or $\frac{d\log y}{dx}$. It would, therefore, be necessary to integrate the expression we obtain in order to get terms comparable with the original data, and it is better in practical work to deal with the equations in the forms in which we require them for comparison, rather than by using the differential equations and then integrating the result. The latter method could only give proportional not actual frequencies.

8. The next step is, therefore, to replace the equation

$$\frac{d\log y}{dx} = \frac{x+a}{b_0 + b_1x + b_2x^2}$$

by one of the form $y = f(x)$, and to do this $\frac{x+a}{b_0 + b_1x + b_2x^2}$ must be integrated. Let us consider equation III. as a general expression for integration, then we notice that the form the integral takes depends on the particular values of the coefficients of $x$ in the denominator. The problem is, in fact, merely a consideration of the forms taken by the denominator for

$$b_0 + b_1x + b_2x^2 = b_2\left[x - \frac{-b_1 + \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right]\left[x - \frac{-b_1 - \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right]$$

and the criterion for fixing the form in a particular case is, obviously, the same as that for the nature of the roots of the equation $b_0 + b_1x + b_2x^2 = 0$, viz., $\frac{b_1^2}{4b_0b_2}$, which, by substituting in formula III., gives

$$\frac{\beta_1(\beta_2+3)^2}{4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1)}.$$

9. If this is negative the roots are real and of different sign (Type I.), if positive and less than unity they are complex (Type IV.) and if positive and greater than unity they are real and of the same sign (Type VI.). This really covers all the cases, but just at the point where one type changes into another we can use a slightly simpler transition curve. Thus when the criterion is $\infty$ one root is $\infty$ (Type III.), when it is unity the two roots are equal (Type V.), while when it is zero the roots are equal in magnitude but of opposite sign (Type II.). The only other transition curve arises when $b_1 = b_2 = 0$, and the criterion is again zero (Normal Curve of Error, Type VII.).

10. The actual integration can now be considered.

Type I. The factors in the denominator, when the roots are real and of different signs, take the form

$$b_0 + b_1x + b_2x^2 = b_2\left[x + \frac{b_1}{2b_2} + \sqrt{\text{a positive quantity}}\right]\left[x + \frac{b_1}{2b_2} - \sqrt{\text{a positive quantity}}\right]$$

and the expression to be integrated is therefore of the form

$$\frac{x+a}{b_2(x+A_1)(x-A_2)} = \frac{1}{b_2}\left\{\frac{A_1 - a}{A_1 + A_2}\cdot\frac{1}{x+A_1} + \frac{A_2 + a}{A_1 + A_2}\cdot\frac{1}{x-A_2}\right\}$$

by partial fractions. The integration is now simple, and gives

$$\log y = \frac{1}{b_2}\left\{\frac{A_1 - a}{A_1 + A_2}\log(x+A_1) + \frac{A_2 + a}{A_1 + A_2}\log(x-A_2)\right\} + \text{a constant},$$

$$y = y'(x+A_1)^{\frac{A_1 - a}{b_2(A_1+A_2)}}(x-A_2)^{\frac{A_2 + a}{b_2(A_1+A_2)}},$$

where $y'$ results from the constant introduced by integration. If the origin is now transferred to the mode (i.e., put $x$ for $x+a$), we have

$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2},$$

the form given in Table VI.

Type II. In this type $a_1 = a_2$ in Type I., and it is, therefore, unnecessary to give the working.

Type III. This type is reached when the criterion is $\infty$, which happens when $b_2 = 0$,

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{b_0 + b_1x} = \frac{1}{b_1} + \frac{a - \frac{b_0}{b_1}}{b_1x + b_0}$$

$$\log y = \frac{x}{b_1} + \frac{1}{b_1}\left(a - \frac{b_0}{b_1}\right)\log(b_1x + b_0) + C$$

and

$$y = y'e^{\frac{x}{b_1}}(b_1x + b_0)^{\frac{1}{b_1}\left(a - \frac{b_0}{b_1}\right)},$$

or, by changing the origin,

$$y = y_0\left(1 + \frac{x}{a}\right)^{\gamma a}e^{-\gamma x},$$

where $a$ has a meaning different from that implied in equation I. This type can be seen to be a particular case of Type I. when $a_2$ becomes infinite.

Type IV. If the roots of the equation $b_0 + b_1x + b_2x^2 = 0$ are complex, it is impossible to throw the denominator into real factors and when this occurs, we have to integrate by putting the expression on the right-hand side of the fundamental differential equation in the form

$$\frac{X + c}{b_2(X^2 + A^2)},$$

where $X = x + \frac{b_1}{2b_2}$, $c = a - \frac{b_1}{2b_2}$ and $A^2 = \frac{b_0}{b_2} - \frac{b_1^2}{4b_2^2}$. Then

$$\log y = \int\frac{X\,dX}{b_2(X^2 + A^2)} + \int\frac{c\,dX}{b_2(X^2 + A^2)} = \frac{1}{2b_2}\log(X^2 + A^2) + \frac{c}{b_2A}\tan^{-1}\frac{X}{A} + \text{constant},$$

$$y = y'(X^2 + A^2)^{\frac{1}{2b_2}}e^{\frac{c}{b_2A}\tan^{-1}\frac{X}{A}},$$

or

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}},$$

where $a$ has a meaning different from that implied in equation I. The relation between this type and Type I. can be seen by factorising the denominator of the right-hand side of the differential equation, $b_2(X - iA)(X + iA)$, and then obtaining an expression for $y$ having the same form as Type I., but containing complex expressions.
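Type V. In this case, when the roots are real and equal,

$$\frac{x+a}{b_2\left(x + \frac{b_1}{2b_2}\right)^2} = \frac{1}{b_2}\left\{\frac{1}{x + \frac{b_1}{2b_2}} + \frac{a - \frac{b_1}{2b_2}}{\left(x + \frac{b_1}{2b_2}\right)^2}\right\}$$

$$\log y = \frac{1}{b_2}\log\left(x + \frac{b_1}{2b_2}\right) - \frac{a - \frac{b_1}{2b_2}}{b_2\left(x + \frac{b_1}{2b_2}\right)} + \text{constant}$$

$$y = y'\left(x + \frac{b_1}{2b_2}\right)^{\frac{1}{b_2}}e^{-\frac{a - \frac{b_1}{2b_2}}{b_2\left(x + \frac{b_1}{2b_2}\right)}} = y_0x^{-p}e^{-\frac{\gamma}{x}}.$$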
Type VI. The factorising is the same as Type I., but the roots of the equation being of like sign, the factors of the denominator take the form $(x + A_1)(x + A_2)$. The work is then the same, but at the end the origin is put not at the mode but so that one of the expressions $x + A_1$ or $x + A_2$ can be written as $x$. The form is then

$$y = y_0(x - a)^{m_2}x^{-m_1}.$$

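Type VII. Putting $b_1 = b_2 = 0$

$$\frac{d\log y}{dx} = \frac{x+a}{b_0}$$

$$\log y = \frac{x^2}{2b_0} + \frac{ax}{b_0} + \text{constant} = \frac{(x+a)^2}{2b_0} + \text{constant}$$

$$y = y'e^{\frac{(x+a)^2}{2b_0}}$$

or, by changing the origin and remembering that the sign of the expression in equation III. is negative,

$$y = y_0e^{-\frac{x^2}{2\sigma^2}}.$$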

11. The table on p. 47 gives a list of the curves, a description of their appearance and range, the position of the mode and the criteria. The values of $\beta_1$ and $\beta_2$ in the cases of Type II. and Type VII. can be seen to be required by examining equation III. The third moment about the mean must be very small (theoretically, zero) if the curve is symmetrical, and therefore $\beta_1 = 0$, and it is only when $\beta_2 = 3$ and $\beta_1 = 0$ that both the coefficients of $x$ and $x^2$ in equation III. vanish, this being the condition for Type VII.

12. It is now necessary to recapitulate the method, and see the steps that have to be taken to fit a frequency-curve to statistics.

1. Arrange the statistics in sequence.
2. Calculate the moments about a convenient vertical.
3. Transfer the moments to the centroid vertical (vertical through the mean).
4. If there is high contact at both ends of the curve, apply Sheppard's adjustments to the moments (i.e., deduct $\frac{1}{12}$ and $\frac{1}{2}\nu_2 - \frac{7}{240}$ from the second and fourth moments respectively).
5. Calculate the criterion.
6. By means of Table VI. decide which curve should be used.

Table VI. gives a reference to the page on which the formulae for the constants of each curve in terms of the moments are to be found.

TABLE VI.

  Type.   Equation of Curve.                                                      Criterion.                            Description of Curve.

  I.      $y = y_0(1 + x/a_1)^{m_1}(1 - x/a_2)^{m_2}$                             κ negative                            Limited range (skew)
  II.     $y = y_0(1 - x^2/a^2)^m$                                                κ = 0, β₁ = 0, β₂ not = 3             Limited range (symmetrical)
  III.    $y = y_0(1 + x/a)^{\gamma a}e^{-\gamma x}$                              κ = ∞                                 Limited range in one direction (skew)
  IV.     $y = y_0(1 + x^2/a^2)^{-m}e^{-\nu\tan^{-1}(x/a)}$                       κ > 0 and < 1                         Unlimited range in both directions (skew)
  V.      $y = y_0x^{-p}e^{-\gamma/x}$                                            κ = 1                                 Limited range in one direction (skew)
  VI.     $y = y_0(x - a)^{m_2}x^{-m_1}$                                          κ > 1                                 Limited range in one direction (skew)
  VII.    $y = y_0e^{-x^2/2\sigma^2}$                                             κ = 0, β₁ = 0, β₂ = 3                 Unlimited range in both directions (symmetrical)

[The columns of the table giving the origin, the position of the mode and the page for the calculation of the constants are not legible in this copy.]

CHAPTER V.

Calculation.

1. The next point to be considered is the calculation of the constants for any particular distribution, when the moments have been calculated and the type to be used has been decided. The formulae required for the numerical work will be given for each type, a numerical example, including the calculation of the graduated figures, will follow, and finally the proofs of the formulae.

2. Some general points relating to the calculation of the curves when the constants have been found may be conveniently considered here. When the constants are known, we can calculate the ordinate for any value of $x$ by substituting that value in the expression for the frequency-curve; and if areas are required, some method of proceeding from ordinates to areas must be found. The most simple is probably to calculate mid-ordinates, and then by the quadrature formula I. or II. find the areas. It is occasionally more convenient to calculate the ordinates at the beginning of each group, and then formula III. should be used. These formulae can be best applied in the form of differences; thus, from II. we have
f*

J-i

from

I.

^=2/0- bt8o! a *-'- a*I + 5760 ;A ^" A ^


291
17
III.

from

49

Formula II. is generally sufficiently accurate, while the others will be found to give a result true to five figures in ordinary
exceptional cases will be referred to in the numerical examples that follow. 3. It is sometimes a help to see the graduation expressed graphically, and this has been done with some of the examples.
cases

The

best

method

is

to insert a vertical height

yQ

at the

mode

note the ends of the curve, and the heights of the ordinates
that have been calculated.
curve, which can be

These heights give points on the


fairly easily.

drawn through them

In

drawing the curve, as well as in calculating the constants, the sign of the skewness must be borne in mind, for it is possible to draw the curve with the skewness on the wrong side of the mode, and if the distribution is nearly sjmimetrical, it is not so easy to notice the mistake as it seems to be. The tangent
to the curve at the

mode

is

parallel to the axis of x.

4. It is best to draw on a rather large scale in order to gain distinctness, and the curves given here were drawn larger than their present size; the reduction being, of course, made in the process of reproduction. The base elements should also be fairly large in proportion to the height, so that the curve may not ascend too steeply; otherwise small horizontal differences between the graduated and ungraduated curves are apt to conceal large vertical differences when the curve is rising or falling rapidly, but it is the latter differences that are of importance. It is sometimes necessary to use more closely-ruled paper than that generally favoured by actuaries, and it can be procured in very convenient rulings from Messrs. W. G. Pye & Co., Granta Works, Cambridge.

5. The reader should notice that all the cases considered in the following pages assume complete distributions, and it is in general only possible to find the curve from part of a distribution by means of successive approximation which is extremely laborious. Another point, to which reference will again be made, is with regard to grouping statistics; it is sometimes impossible to obtain many groups, but for accuracy in finding moments the greater the number of groups the better, unless the total number of cases is small. A little discretion is needed in this respect, but in actuarial statistics which are sometimes based on as many as 200,000 cases, seventy or eighty groups would not be excessive. In our examples we have grouped merely to save work, space and printing, and the grouping does not alter the method.

6. Another matter with which it seems advisable to deal here is connected with the criterion, $\kappa$. This may have any value from $-\infty$ to $+\infty$, and from the following diagram it will be seen how the types cover all the possible values of the criterion and do not overlap.

  κ = -∞        κ negative     κ = 0                          κ > 0 and < 1     κ = 1       κ > 1        κ = ∞

  Type III.     Type I.        Type II. when β₂ not = 3       Type IV.          Type V.     Type VI.     Type III.
                               Type VII. when β₂ = 3

Just before $\kappa = 0$ Type I. becomes nearly symmetrical, and after that value is passed we have a skew curve of unlimited range, and so on. At each critical point there is a "transition" curve, as it is sometimes called; so Types II., III., V. and VII., are the transition types. If by a mistake a student should use the wrong type he will necessarily find his mistake by reaching an imaginary in one of the square roots which occur in the equations for the constants, but transition types can be used when the values of the criterion approximate to the theoretical values; they can, in fact, be viewed as approximations which give an accurate result in a limiting case. It is impossible to say within what limits one is theoretically justified in using a transition type; the justification depends on the size of the probable error of the function dealt with, but in practice one can be guided to a great extent by the size of the experience; if there are few cases a larger deviation in the criterion will arise than if there are many. It would probably be sufficiently accurate to use Type III., provided $\kappa$ was arithmetically greater than 4; individual cases must be considered on their merits, but if the student finds himself in doubt he should avoid using the transition type as he will then be on the safe side in the matter of accuracy.
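The way the criterion divides the types can be put into a short routine. This is a sketch only: the function names and the tolerance are hypothetical, and in practice the decision near a transition value rests on judgment, as explained above, rather than on an exact test.

```python
import math


def criterion(beta1, beta2):
    """Criterion kappa = beta1 (beta2 + 3)^2 / {4 (4 beta2 - 3 beta1)(2 beta2 - 3 beta1 - 6)}."""
    denominator = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    if denominator == 0:
        return math.inf
    return beta1 * (beta2 + 3) ** 2 / denominator


def curve_type(beta1, beta2, tol=1e-9):
    """Return the type indicated by the criterion (exact transition values only)."""
    if abs(beta1) <= tol:
        # Symmetrical curves: the criterion is zero.
        return "VII" if abs(beta2 - 3) <= tol else "II"
    kappa = criterion(beta1, beta2)
    if math.isinf(kappa):
        return "III"
    if abs(kappa - 1) <= tol:
        return "V"
    if kappa < 0:
        return "I"
    if kappa < 1:
        return "IV"
    return "VI"


# Values of beta1 and beta2 from the Type I. example of this chapter.
print(round(criterion(0.5072955, 2.935110), 4), curve_type(0.5072955, 2.935110))
```

A looser tolerance would allow a transition type to be chosen when the criterion merely approximates to its theoretical value.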

7. In the formulae that are given for the various types, the choice of sign for a square root depends on the sign of $\mu_3$. If the frequency is concentrated more closely before the mean than after it, the mode is on the left-hand side of the mean and $\mu_3$ is positive; the signs of certain constants in each type must therefore depend on the signs of $\mu_3$ in order that the mode and mean may lie in their correct relative positions. Where, however, no remark is made as to the sign of the expression in which a square root is given the positive root is implied, and the reader will find that these rules become easier to follow when he has worked out two examples, one giving a positive and the other a negative value for $\mu_3$. Thus, if we imagine the frequencies in the example for Type I. to be written in the opposite order 1, 3, 7, 13, &c., all the numerical work would be the same, but $m_1$ would be 2.776978, $m_2 = .409833$, $a_1 = 13.52728$, and $a_2 = 1.99638$, and the graduation would be the same, but the numbers in the columns of the table on p. 56 would run in the opposite order.

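FORMULAE FOR MOMENTS.

These Formulae apply to all the Types of Curves.

$$\begin{aligned}
\nu'_1 &= d\\
\nu_2 &= \nu'_2 - d^2\\
\nu_3 &= \nu'_3 - 3d\nu_2 - d^3\\
\nu_4 &= \nu'_4 - 4d\nu_3 - 6d^2\nu_2 - d^4
\end{aligned}$$

or

$$\begin{aligned}
S_2 &= d\\
\nu_2 &= 2S_3 - d(1+d)\\
\nu_3 &= 6S_4 - 3\nu_2(1+d) - d(1+d)(2+d)\\
\nu_4 &= 24S_5 - 2\nu_3\{2(1+d)+1\} - \nu_2\{6(1+d)(2+d) - 1\} - d(1+d)(2+d)(3+d)
\end{aligned}$$

Sheppard's adjustments when the curve has high contact:

$$\mu_2 = \nu_2 - \tfrac{1}{12}, \qquad \mu_3 = \nu_3, \qquad \mu_4 = \nu_4 - \tfrac{1}{2}\nu_2 + \tfrac{7}{240}.$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}, \qquad \sigma \text{ (standard deviation)} = \sqrt{\mu_2},$$

$$\kappa = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)}.$$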

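TYPE I.

$$y = y_0\left(1 + \frac{x}{a_1}\right)^{m_1}\left(1 - \frac{x}{a_2}\right)^{m_2}$$

FORMULAE.

The values to be calculated in order are

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{6 + 3\beta_1 - 2\beta_2}$$

$$b = \tfrac{1}{2}\sqrt{\mu_2}\,\{\beta_1(r+2)^2 + 16(r+1)\}^{\frac{1}{2}}$$

$m_1$ and $m_2$ are given by

$$\tfrac{1}{2}\left\{r - 2 \pm r(r+2)\sqrt{\frac{\beta_1}{\beta_1(r+2)^2 + 16(r+1)}}\right\}$$

$$a_1 + a_2 = b, \qquad \text{and} \qquad \frac{a_1}{m_1} = \frac{a_2}{m_2}$$

$$y_0 = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\Gamma(m_2+1)}$$

A table of $\Gamma$ functions is required (see Appendix II.).

$$\text{Skewness} = \tfrac{1}{2}\sqrt{\beta_1}\,\frac{r+2}{r-2}$$

$$\text{Mode} = \text{Mean} - \tfrac{1}{2}\frac{\mu_3}{\mu_2}\cdot\frac{r+2}{r-2}.$$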
NOTES.

$m_1$ is taken with the negative root when $\mu_3$ is positive, and as the positive root when $\mu_3$ is negative.

Sometimes $m$ is negative, which means that the curve has a similar shape to that given in the numerical example of Type III.; it starts at infinity, and falls rapidly so that though the ordinate is infinite, the area is finite. The difference is that in Type I. the curve ends at a fixed point, while in Type III. it continues indefinitely. In this case a little care is needed in taking out the $\Gamma$ function, for $\Gamma(t)$ is required where $t < 1$; the tables give $\log\Gamma(1+t)$, i.e., $\log t + \log\Gamma(t)$. If both $m_1$ and $m_2$ are negative, a U-shaped curve is obtained.

EXAMPLE.

As an example of this type the figures given in Table I. (Example II.) may be used. The moments were first found by Mr. Hardy's Summation Method (see Chap. III., Art. 9) in the following form:

  Central     Exposed to Risk
  Age of      (Example II. of      First       Second      Third       Fourth
  Group.      Table I.)            Sum.        Sum.        Sum.        Sum.

    17             34              1,000       5,175      19,809      64,389
    22            145                966       4,175      14,634      44,580
    27            156                821       3,209      10,459      29,946
    32            145                665       2,388       7,250      19,487
    37            123                520       1,723       4,862      12,237
    42            103                397       1,203       3,139       7,375
    47             86                294         806       1,936       4,236
    52             71                208         512       1,130       2,300
    57             55                137         304         618       1,170
    62             37                 82         167         314         552
    67             21                 45          85         147         238
    72             13                 24          40          62          91
    77              7                 11          16          22          29
    82              3                  4           5           6           7
    87              1                  1           1           1           1

  Totals        1,000              5,175      19,809      64,389     186,638

$S_2 = 5175 \div 1000 = 5.175$
$S_3 = 19809 \div 1000 = 19.809$
$S_4 = 64389 \div 1000 = 64.389$
$S_5 = 186638 \div 1000 = 186.638$

The next step is to find the moments about the centroid vertical by means of the formulae on p. 21, and, in this case, as no adjustments* are to be made in the moments because there is not high contact, the $\nu$'s and $\mu$'s are the same, we have

$\mu_2 = 7.66237$
$\mu_3 = 15.1069$
$\mu_4 = 172.326$
$\beta_1 = .5072955$
$\beta_2 = 2.935110$
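The summation method and the passage from the sums to the moments about the mean can be sketched as below. The function names are hypothetical; the frequencies are those of the example, and the moment formulae are the second set given under "Formulae for Moments."

```python
def summation_totals(frequencies, orders=4):
    """Totals of the first, second, ... sum columns, each column summed from the bottom."""
    column = list(frequencies)
    totals = []
    for _ in range(orders):
        running, summed = 0, []
        for value in reversed(column):
            running += value
            summed.append(running)
        column = summed[::-1]
        totals.append(sum(column))
    return totals


def moments_from_sums(n, totals):
    """Mean distance d and moments nu2, nu3, nu4 about the mean, from S2 ... S5."""
    s2, s3, s4, s5 = (t / n for t in totals)
    d = s2
    nu2 = 2 * s3 - d * (1 + d)
    nu3 = 6 * s4 - 3 * nu2 * (1 + d) - d * (1 + d) * (2 + d)
    nu4 = (24 * s5 - 2 * nu3 * (2 * (1 + d) + 1)
           - nu2 * (6 * (1 + d) * (2 + d) - 1)
           - d * (1 + d) * (2 + d) * (3 + d))
    return d, nu2, nu3, nu4


exposed = [34, 145, 156, 145, 123, 103, 86, 71, 55, 37, 21, 13, 7, 3, 1]
totals = summation_totals(exposed)
d, nu2, nu3, nu4 = moments_from_sums(sum(exposed), totals)
beta1 = nu3 ** 2 / nu2 ** 3
beta2 = nu4 / nu2 ** 2
print(totals, round(nu2, 5), round(nu3, 4), round(nu4, 3))
```

No multiplication of the frequencies by powers of the distance is needed; the whole of the work on the statistics is done by addition.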

From the values of $\beta_1$ and $\beta_2$ the criterion ($\kappa$) can be calculated, and its value being $-.2645$ shows that Type I. must be used (see Table VI.).

    r     = 5.186811        log r     = .7149004
    r + 1 = 6.186811        log (r+1) = .7914669
    r + 2 = 7.186811        log (r+2) = .8565363
    r - 2 = 3.186811        log (r-2) = .5033563

The values of $\log(r+1)$, &c., were checked by Gauss-logarithm table.

    b           = 15.52366
    m_1         =   .409833
    m_2         =  2.776978
    a_1         =  1.99638
    a_2         = 13.52728
    Mean - mode =  2.223116

It will be noted that the expression $\{\beta_1(r+2)^2 + 16(r+1)\}^{\frac{1}{2}}$ occurs in both the values of $b$ and $m$. The mean is at age $12 + 5.175 \times 5 = 37.8750$, and the mode at age $37.8750 - 2.223116 \times 5 = 26.75942$.

The skewness is .8032.

* In work which has a permanent object depending on a considerable degree of accuracy, grouping should be avoided. In the examples given it was simply done to save labour, and the original reasons for which the curves were calculated did not require extreme accuracy. If we had not grouped our statistics we should have reduced the error resulting from our not knowing the best adjustments to use in cases in which there is not high contact.
The calculation of $\log y_0$ is as follows:

    log N                 = 3.00000
    colog b               = 2̄.80901
    m_1 log m_1           = 1̄.84123
    m_2 log m_2           = 1.23179
    colog (r-2)^(r-2)     = 2̄.39590
    log Γ(r)              = 1.50406
    colog Γ(m_1 + 1)      =  .05219
    colog Γ(m_2 + 1)      = 1̄.34037
                            -------
    log y_0               = 2.17455

where, of course, $\log\Gamma(m_2+1) = \log\Gamma(3.776978) = \log 2.776978 + \log 1.776978 + \log\Gamma(1.776978)$, the last value being taken from the table at the end of the book. The work to this point gives as the curve for graduating the statistics

$$y = 149.47\left(1 + \frac{x}{1.99638}\right)^{.409833}\left(1 - \frac{x}{13.52728}\right)^{2.776978}$$

where the origin is at age 26.75942 and the unit is five years.
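With the constants found, an ordinate at any age follows by direct substitution. The sketch below uses the constants of the example; the names are hypothetical, and the ordinate is taken as zero outside the range of the curve.

```python
A1, A2 = 1.99638, 13.52728        # distances from the mode to the ends of the curve
M1, M2 = 0.409833, 2.776978       # exponents
Y0 = 149.47                       # ordinate at the mode
MODE_AGE, UNIT = 26.75942, 5.0    # origin and unit (five years)


def ordinate(age):
    """Ordinate of the Type I. curve at a given age."""
    x = (age - MODE_AGE) / UNIT
    left = 1 + x / A1
    right = 1 - x / A2
    if left <= 0 or right <= 0:
        return 0.0
    return Y0 * left ** M1 * right ** M2


for age in (17, 27, 42):
    print(age, round(ordinate(age), 1))
```

Direct calculation may differ slightly in the last figure from a table formed with five-figure logarithms.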

The following table shows the calculation of ordinates of the curve from the equation just given:

                                                                                       col (6) + col (7)
  Age.    1 + x/a_1   1 - x/a_2    log (2)     log (3)     m_1 × col (4)  m_2 × col (5)  + log y_0 = log y     y_x
  (1)       (2)         (3)          (4)         (5)           (6)            (7)              (8)             (9)

   17      .02228     1.14429     2̄.34792      .05854       1̄.3229          .1626          1.6601            45.7
   22      .52319     1.07037     1̄.71866      .02955       1̄.8847          .0821          2.1404           138.2
   27     1.02410      .99644      .01034     1̄.99845        .0042         1̄.9957          2.1745           149.5
   32     1.52501      .92252      .18327     1̄.96498        .0751         1̄.9027          2.1525           142.1
   37     2.02592      .84859      .30662     1̄.92870        .1257         1̄.8020          2.1023           126.6
   42     2.52683      .77466      .40257     1̄.88911        .1650         1̄.6921          2.0317           107.6
   47     3.02774      .70074      .48111     1̄.84556        .1972         1̄.5711          1.9429            87.7
   52     3.52865      .62681      .54760     1̄.79714        .2244         1̄.4367          1.8357            68.5
   57     4.02956      .55289      .60526     1̄.74264        .2481         1̄.2853          1.7080            51.0
   62     4.53047      .47896      .65615     1̄.68030        .2689         1̄.1122          1.5557            36.0
   67     5.03138      .40504      .70169     1̄.60750        .2876         2̄.9100          1.3722            23.6
   72     5.53229      .33111      .74291     1̄.51997        .3045         2̄.6670          1.1461            14.0
   77     6.03320      .25719      .78055     1̄.41025        .3199         2̄.3623           .8568             7.2
   82     6.53411      .18326      .81519     1̄.26307        .3341         3̄.9535           .4622             2.9
   87     7.03502      .10934      .84726     1̄.03878        .3472         3̄.3307          1̄.8525              .7
   92     7.53593      .03541      .87714     2̄.54913        .3595         5̄.9709          2̄.5050

Cols. (2) and (3) have a constant first difference, viz., $\frac{1}{a_1}$ or .500907, and $\frac{1}{a_2}$ or .073925. The value at any point having been calculated and checked, the other items are formed continuously. Cols. (4) to (9) explain themselves, but we may remark that it is generally advisable to use a larger number of figures than five in taking logarithms, especially if $m_1$ or $m_2$ is large. A little care is necessary in multiplying such numbers as $\bar{1}$.71866 by $m_1$ (.409833). If an arithmometer is used, $m_1$ is put on the plate, and is multiplied by .28134, and the result, −.1153, must be put in the form $\bar{1}$.8847, to enable us to add it to other logarithms. Col. (10) gives the area, and was formed by applying one of the formulae on p. 48. The area of the first group must be treated separately, as the curve starts at age 16.7775, and the base of the group is therefore 2.7225 in length, instead of 5 years as in the other cases.

A good way to find the area is to calculate the ordinates for the middle and ends of the base, and apply Simpson's rule, viz.:

$$\int y_x\,dx = \tfrac{1}{6}\{y_0 + 4y_{\frac12} + y_1\},$$

remembering to multiply the result by $\frac{2.7225}{5}$ to allow for the different length of the base.

The mid-ordinate is 92.1, the ordinate at the end of the base is 116.5, and the ordinate at the start is of course zero; the area is approximately

$$\frac{2.7225}{5}\times\tfrac{1}{6}\{0 + 4\times 92.1 + 116.5\} = 44.$$
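The column-by-column calculation and the Simpson's rule treatment of the short first group can be checked with a minimal sketch. It assumes the constants of the fitted curve as read from the text ($y_0$ = 149.471, $m_1$ = .409833, $m_2$ = 2.776978, $a_2$ = 13.52728, origin at age 26.75942, unit 5 years); the value used for $a_1$ (1.99638) is inferred from the stated first difference .500907 and is therefore approximate. The function names are illustrative only.

```python
import math

# Constants of the fitted Type I curve, read from the text.
# A1 is inferred from the stated first difference 1/a1 = .500907 (an assumption).
Y0, A1, A2 = 149.471, 1.99638, 13.52728
M1, M2 = 0.409833, 2.776978
ORIGIN_AGE, UNIT = 26.75942, 5.0


def type1_ordinate(age):
    """Ordinate y_x, built up from logarithms as in cols. (2) to (9)."""
    x = (age - ORIGIN_AGE) / UNIT
    left = max(1.0 + x / A1, 0.0)    # col. (2), clamped at the start of the curve
    right = max(1.0 - x / A2, 0.0)   # col. (3), clamped at the end of the curve
    if left == 0.0 or right == 0.0:
        return 0.0                   # outside (or at the ends of) the range
    log_y = math.log10(Y0) + M1 * math.log10(left) + M2 * math.log10(right)
    return 10.0 ** log_y


def simpson_group_area(start_age, end_age):
    """Area of one group by Simpson's rule, scaled for the length of its base."""
    mid_age = 0.5 * (start_age + end_age)
    base_in_units = (end_age - start_age) / UNIT
    total = (type1_ordinate(start_age)
             + 4.0 * type1_ordinate(mid_age)
             + type1_ordinate(end_age))
    return base_in_units * total / 6.0


# First group: the curve starts at age 16.7775 and the group ends at 19.5.
first_group_area = simpson_group_area(16.7775, 19.5)
```

Under these assumptions the ordinate at age 27 comes out close to the tabulated 149.5 and the first group area close to 44; small differences from the printed figures are to be expected from the rounding of the constants.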

PROOF OF FORMULAE.*

The equation to the curve is

$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2}, \quad\text{where}\quad \frac{m_1}{a_1}=\frac{m_2}{a_2}.$$

Let $a_1 + a_2 = b$ and $z = \dfrac{a_1 + x}{b}$.

* The reader who has little acquaintance with formulae of reduction and the Γ and B functions should consult Appendix II. before reading the proofs of the formulae for this and the other types.

The area from $x = -a_1$ to $x = +a_2$ is the total frequency $N$.

$$N = \frac{y_0}{a_1^{m_1}a_2^{m_2}}\int_{-a_1}^{a_2}(a_1+x)^{m_1}(a_2-x)^{m_2}\,dx = \frac{y_0\,b^{m_1+m_2+1}}{a_1^{m_1}a_2^{m_2}}\int_0^1 z^{m_1}(1-z)^{m_2}\,dz$$

$$\therefore\; y_0 = \frac{N}{b}\cdot\frac{m_1^{m_1}m_2^{m_2}}{(m_1+m_2)^{m_1+m_2}}\cdot\frac{\Gamma(m_1+m_2+2)}{\Gamma(m_1+1)\,\Gamma(m_2+1)}$$

Using the same method for the moments as that just given for the area, we see that the $n$th moment, about the line parallel to the axis of $y$ through $x = -a_1$, is

$$\mu'_n = \frac{y_0}{N}\cdot\frac{(m_1+m_2)^{m_1+m_2}}{m_1^{m_1}m_2^{m_2}}\cdot b^{n+1}\,\frac{\Gamma(m_1+n+1)\,\Gamma(m_2+1)}{\Gamma(m_1+m_2+n+2)}$$

Now, since $\Gamma(p) = (p-1)\Gamma(p-1)$, the moments about the line parallel to the axis of $y$ through $x = -a_1$ are as follows:

$$\mu'_1 = \frac{b(m_1+1)}{m_1+m_2+2}$$

$$\mu'_2 = \frac{b^2(m_1+1)(m_1+2)}{(m_1+m_2+2)(m_1+m_2+3)}$$

and so on.

Changing the origin in order to get moments about the mean and writing $m'_1 = m_1+1$, $m'_2 = m_2+1$, and $r = m'_1+m'_2$, we have

$$\mu_2 = \frac{b^2 m'_1 m'_2}{r^2(r+1)}$$

$$\mu_3 = \frac{2b^3 m'_1 m'_2 (m'_2 - m'_1)}{r^3(r+1)(r+2)}$$

$$\mu_4 = \frac{3b^4 m'_1 m'_2\{m'_1 m'_2(r-6) + 2r^2\}}{r^4(r+1)(r+2)(r+3)}$$

We can simplify these expressions to obtain the equations on p. 53 by writing $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$, $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$ and $\epsilon = m'_1 m'_2$; then

$$\beta_1 = \frac{4(r^2-4\epsilon)(r+1)}{\epsilon(r+2)^2} \quad\text{or}\quad \frac{\beta_1(r+2)^2}{4(r+1)} = \frac{r^2}{\epsilon} - 4$$

and

$$\beta_2 = \frac{3(r+1)\{2r^2+\epsilon(r-6)\}}{\epsilon(r+2)(r+3)} \quad\text{or}\quad \frac{\beta_2(r+2)(r+3)}{3(r+1)} = \frac{2r^2}{\epsilon} + r - 6$$

Eliminating $\epsilon$ we find

$$\frac{\beta_1(r+2)^2}{2(r+1)} - \frac{\beta_2(r+2)(r+3)}{3(r+1)} = -(r+2)$$

Dividing out by $r + 2$ we have

$$r = \frac{6(\beta_2-\beta_1-1)}{3\beta_1 - 2\beta_2 + 6}$$

Using this value in the equation

$$\frac{r^2}{\epsilon} = 4 + \frac{\beta_1(r+2)^2}{4(r+1)}$$

we obtain $\epsilon$, and the other equations follow at once from $\epsilon = m'_1 m'_2$ and $r = m'_1 + m'_2$.

The distance between the mean and mode is

$$\mu'_1 - a_1 = \frac{b\,m'_1}{m'_1+m'_2} - a_1,$$

which can be easily reduced to the form given. A general value (regardless of type) for the distance was given in Chap. IV. Art. 5.

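TYPE II.

$$y = y_0\left(1-\frac{x^2}{a^2}\right)^{m}$$

FORMULAE.

$$m = \frac{5\beta_2 - 9}{2(3-\beta_2)}$$

$$a^2 = \frac{2\mu_2\beta_2}{3-\beta_2}$$

$$y_0 = \frac{N \times \Gamma(2m+2)}{a\,2^{2m+1}\{\Gamma(m+1)\}^2}$$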

NOTES AND PROOF.

Put $\beta_1 = 0$ ($\mu_3$ is therefore $= 0$) in Type I., for the curve is symmetrical, and for the same reason it is clear that $m_1 = m_2$. $y_0$ may be approximated to ... if $m$ is large. If $m$ is positive, the curve starts at zero, rises to a maximum and falls again to zero; but if $m$ is negative, it starts at infinity, falls, and then rises to infinity again.

EXAMPLE.

In the discussion that followed the reading of Mr. Lidstone's paper on Endowment Assurances, Mr. G. F. Hardy said that "the errors in the successive groups formed a curve very similar to the normal curve of error" (Journal of the Institute of Actuaries, xxxiv., p. 87), and the series in question is a rather interesting example of a symmetrical distribution.

Unexpired Term in Years.    Error involved in using "Mean Age" Method.
 0- 4                          11
 5- 9                         116
10-14                         274
15-19                         451
20-24                         432
25-29                         267
30-34                         116
35, &c.                        16
                            -----
                            1,683

Moments were calculated about the centre of the 15-19 group, and .4985146, 2.161022, 3.104576, and 12.60666 were found for the first four moments; transferring to the mean (17.5 + 2.492573 = 19.992573), and using Sheppard's adjustments, the following values result:

$\mu_2$ = 1.829172
$\mu_3$ = .120452
$\mu_4$ = 8.52636
$\beta_1$ = .0023706
$\beta_2$ = 2.548313
$\kappa_2$ = −.007492,

which shows that Type II. can be used.

The equations for the type give

$m$ = 4.141766
$a$ = 4.543079
$y_0$ = 462.57
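The Type II. constants of this example can be recomputed from the moments as a check. The sketch below assumes the formulae $m = (5\beta_2-9)/\{2(3-\beta_2)\}$, $a^2 = 2\mu_2\beta_2/(3-\beta_2)$ and $y_0 = N\,\Gamma(2m+2)/[a\,2^{2m+1}\{\Gamma(m+1)\}^2]$, and works with logarithms of the Γ function to avoid overflow; the function name is illustrative only.

```python
import math


def type2_constants(n_total, mu2, beta2):
    """Return (m, a, y0) of the symmetrical Type II curve y = y0 (1 - x^2/a^2)^m."""
    m = (5.0 * beta2 - 9.0) / (2.0 * (3.0 - beta2))
    a = math.sqrt(2.0 * mu2 * beta2 / (3.0 - beta2))
    # log y0 = log N - log a + log Gamma(2m+2) - (2m+1) log 2 - 2 log Gamma(m+1)
    log_y0 = (math.log(n_total) - math.log(a)
              + math.lgamma(2.0 * m + 2.0)
              - (2.0 * m + 1.0) * math.log(2.0)
              - 2.0 * math.lgamma(m + 1.0))
    return m, a, math.exp(log_y0)


# Values of the example: N = 1,683, mu2 = 1.829172, beta2 = 2.548313.
m, a, y0 = type2_constants(1683, 1.829172, 2.548313)
```

With the moments quoted in the text this reproduces $m$, $a$ and $y_0$ to within the rounding of the printed figures.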

The mean and mode coincide, because the curve is symmetrical. For calculating a series of values, the following arrangement is convenient:

(1) $\frac{x}{a}$;  (2) $\log\left(1+\frac{x}{a}\right)$;  (3) $\log\left(1-\frac{x}{a}\right)$;  (4) (2) + (3);  (5) $m$ × (4);  (6) (5) + log $y_0$.

It is easier to work in this way than by calculating values of $1-\frac{x^2}{a^2}$.

In the particular example, ordinates were calculated at the beginning, middle, and end of each group, and Simpson's quadrature formula was used for finding the areas, viz.,

$$\int y\,dx = \tfrac{1}{6}\{y_0 + 4y_{\frac12} + y_1\}$$

Group.      Areas.    Mid-ordinates.
 0- 4          14          11
 5- 9         109         104
10-14         286         287
15-19         433         440
20-24         433         440
25-29         285         287
30-34         109         104
35, &c.        14          11
            -----
            1,683

A comparison of the mid-ordinates with the areas gives an idea of the error involved in using the former for the latter; the differences are largest at the "tails" and near the mode.

The curve starts at 19.992573 − 22.71540 = −2.72283, and ends at 42.70797.

It sometimes happens that $\beta_2 > 3$, and so $a^2$ and $m$ are negative; if $a'^2 = -a^2$ and $m' = -m$, in such a case the equation to the curve becomes $y = y_0\left(1+\dfrac{x^2}{a'^2}\right)^{-m'}$ and the value of $y_0$ can best be found by writing

$$N = 2y_0\int_0^\infty\left(1+\frac{x^2}{a'^2}\right)^{-m'}dx;$$

then putting $\left(1+\dfrac{x^2}{a'^2}\right)^{-1} = z$ the reader will have no difficulty in showing that

$$N = a' y_0\int_0^1 z^{m'-\frac32}(1-z)^{-\frac12}\,dz = a' y_0\,B(m'-\tfrac12,\,\tfrac12)$$

by Appendix II., or

$$y_0 = \frac{N}{a'}\cdot\frac{\Gamma(m')}{\sqrt{\pi}\,\Gamma(m'-\frac12)}, \quad\text{as}\quad \Gamma(\tfrac12) = \sqrt{\pi}.$$

In a similar manner we could show that an alternative value for $y_0$ to that given on the previous page is

$$y_0 = \frac{N}{a}\cdot\frac{\Gamma(m+\frac32)}{\sqrt{\pi}\,\Gamma(m+1)}$$


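TYPE III.

$$y = y_0\left(1+\frac{x}{a}\right)^{\gamma a}e^{-\gamma x}$$

FORMULAE.

$$\gamma = \frac{2\mu_2}{\mu_3}$$

$$a = \frac{2\mu_2^2}{\mu_3} - \frac{\mu_3}{2\mu_2}$$

$$y_0 = \frac{N}{a}\cdot\frac{p^{p+1}}{e^{p}\,\Gamma(p+1)}, \quad\text{where}\quad p = \gamma a$$

$$\text{Mode} = \text{Mean} - \frac{\mu_3}{2\mu_2}$$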

NOTES.

If $p$ is positive, the shape of the curve is like that shown in the example of Type I.; but instead of ending at a fixed point, it goes to infinity.

EXAMPLE.

The following statistics are taken from a paper in the Transactions of the Actuarial Society of Edinburgh, vol. iv., p. 44, and give the numbers of wives tabulated for the ages of mothers, and according to years since marriage. The mothers' ages for the particular series are 30 to 34.

Year after Marriage.    Number of Wives.    Graduated by Type III. Curve.
1                             44                   59
2                            135                  111
3                             45                   45
4                             12                   20
5                              8                    9
6                              3                    4
7                              1                    2
8                              3                    1
                             ---                  ---
Total                        251                  251

The mean is .3346612 after the middle of the second group, and the moments about the centroid vertical are 1.441787, 3.606622, and 18.93221; so that $\kappa$ = −8.44. As this value was large, Type III. was used, and

$\gamma$ = .7995221
$p$ = −.0783584
$a$ = −.098007
$y_0$ = 214.8

This example is given because it is one which shows a difficulty rather clearly. At first sight, a curve starting at zero, rising to a maximum, and then falling, might be expected. In reality, we find the curve starting at duration .68192. The mode in ordinary cases of Type III. is given by mean $-\dfrac{\mu_3}{2\mu_2}$. In this case $\dfrac{\mu_3}{2\mu_2}$ = 1.25075, so the mode would be at .58391, and the curve would start at {mode − $a$} = .58391 + .09801 = .68192, so that the first group is made up of a strip on a base .31808 in length, and has a smaller value than the next group, though, of course, any ordinate read off within the first group would be larger than any ordinate in the second group.

No adjustment was made to the rough moments.

[Figure: Type III. Horizontal axis: duration.]
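The Type III. constants and the awkward starting point of the curve in this example can be verified numerically. The sketch below assumes the relations $\gamma = 2\mu_2/\mu_3$, $p = 4\mu_2^3/\mu_3^2 - 1$, $a = p/\gamma$ and mode = mean $-\ \mu_3/(2\mu_2)$; it also assumes that the second group covers durations 1 to 2, so that its middle is at 1.5. The function name is illustrative only.

```python
def type3_constants(mu2, mu3):
    """Return (gamma, p, a, mode_offset) for a Type III curve from its moments."""
    gamma = 2.0 * mu2 / mu3
    p = 4.0 * mu2 ** 3 / mu3 ** 2 - 1.0
    a = p / gamma                     # equivalently 2*mu2**2/mu3 - mu3/(2*mu2)
    mode_offset = mu3 / (2.0 * mu2)   # distance of the mode below the mean
    return gamma, p, a, mode_offset


# Moments of the example about the centroid vertical.
gamma, p, a, mode_offset = type3_constants(1.441787, 3.606622)

# Assumption: the second group spans durations 1 to 2, so its middle is 1.5.
mean_duration = 1.5 + 0.3346612
mode_duration = mean_duration - mode_offset
start_duration = mode_duration - a    # a is negative here, so the start lies above the mode
```

Because $p$ and $a$ are negative in this example, the curve starts beyond the mode, which is the difficulty the text draws attention to.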

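PROOF.

In the equation for the type, viz., $y = y_0\left(1+\dfrac{x}{a}\right)^{\gamma a}e^{-\gamma x}$, put $\gamma a = p$, and substitute $z$ for $\gamma(a+x)$; then, if $N$ be the total frequency,

$$N = y_0\int_{-a}^{\infty}\left(1+\frac{x}{a}\right)^{p}e^{-\gamma x}\,dx = \frac{y_0\,e^{p}}{\gamma\,p^{p}}\int_0^\infty z^{p}e^{-z}\,dz = \frac{y_0\,e^{p}}{\gamma\,p^{p}}\,\Gamma(p+1)$$

This gives

$$y_0 = \frac{N\,\gamma\,p^{p}}{e^{p}\,\Gamma(p+1)} = \frac{N}{a}\cdot\frac{p^{p+1}}{e^{p}\,\Gamma(p+1)}$$

The $n$th moment about the start of the curve is

$$\mu'_n = \frac{y_0\,e^{p}}{N\,\gamma^{n+1}p^{p}}\int_0^\infty z^{p+n}e^{-z}\,dz = \frac{\Gamma(p+n+1)}{\gamma^{n}\,\Gamma(p+1)}$$

by using the value of $y_0$ found above.

Since $\Gamma(p) = (p-1)\Gamma(p-1)$, the first moment is $\dfrac{p+1}{\gamma}$, the second $\dfrac{(p+1)(p+2)}{\gamma^2}$, and the third $\dfrac{(p+1)(p+2)(p+3)}{\gamma^3}$.

In order to apply these formulae to statistical work, it is necessary to have moments about the centroid vertical, the position of which (the mean) can be found; and as, by definition, the first moment about it is zero, we get

$$\mu_2 = \frac{p+1}{\gamma^2}, \qquad \mu_3 = \frac{2(p+1)}{\gamma^3}$$

These results give $\gamma$ and $p$ as $\dfrac{2\mu_2}{\mu_3}$ and $\dfrac{4\mu_2^3}{\mu_3^2} - 1$ respectively.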

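TYPE IV.

$$y = y_0\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}$$

FORMULAE.

$$r = \frac{6(\beta_2-\beta_1-1)}{2\beta_2-3\beta_1-6}, \qquad m = \tfrac12(r+2)$$

$$\nu = \frac{r(r-2)\sqrt{\beta_1}}{\sqrt{16(r-1)-\beta_1(r-2)^2}}$$

$$a = \sqrt{\frac{\mu_2}{16}}\,\sqrt{16(r-1)-\beta_1(r-2)^2}$$

$$y_0 = \frac{N}{a\,e^{-\frac12\pi\nu}\,G(r,\nu)}$$

$$\text{Sk.} = \tfrac12\sqrt{\beta_1}\,\frac{r-2}{r+2}$$

The origin is not at the mode, but $\dfrac{\nu a}{r}$ from the mean, i.e.,

$$\text{origin} = \text{mean} + \frac{\nu a}{r}$$

$$\text{mode} = \text{mean} - \frac{\mu_3(r-2)}{2\mu_2(r+2)}$$

$$y_0 = \frac{N}{a}\sqrt{\frac{r}{2\pi}}\cdot\frac{e^{\frac{\cos^2\phi}{3r}-\frac{1}{12r}-\phi\nu}}{(\cos\phi)^{r+1}}$$

is a close approximation, where $\tan\phi = \dfrac{\nu}{r}$.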

NOTES.

$\mu_3$ and $\nu$ have opposite signs, i.e., when $\mu_3$ is positive $\nu$ is negative.

A simple way to calculate the curve is to put it in the form

$$x = a\tan\theta, \qquad y = y_0\cos^{r+2}\theta\;e^{-\nu\theta}$$

Then $\theta$ is taken as 1°, 2°, 3°, &c., and $x$ and $y$ found; this gives corresponding values of $x$ and $y$, but the values of $y$ will not be for equidistant values of $x$. In calculating $e^{-\nu\theta}$ the value of $\theta$ must be taken in circular measure.

If equidistant ordinates are required to be calculated accurately, little is gained by the double form, and if we had good tables of $\log(1+x^2)$ and $\tan^{-1}x$, the calculation of a particular ordinate would be a very simple matter. The calculation and meaning of $G(r,\nu)$ are dealt with in the proof.

EXAMPLES.

The numbers in the following nearly symmetrical distribution represent the exposed to risk of sickness by Sutton's Sickness Tables (males, all durations) when the number of weeks' sickness is represented by the normal curve of error (Type VII.).

Central Age.    No. Exposed.    Graduated by Type IV.
 5                   10                 6*
10                   13                16
15                   41                49
20                  115               135
25                  326               321
30                  675               653
35                1,113             1,108
40                1,528             1,535
45                1,692             1,712
50                1,530             1,522
55                1,122             1,074
60                  610               604
65                  255               274
70                   86               102
75                   26                32
80                    8                 8
85                    2                 2
90                    1                 1
95                    1               ...
                  -----             -----
                  9,154             9,154

* This group has been taken as the area of the rest of the curve.


The following values were obtained:

Mean = 44.5772339
$\mu_2$ = 4.527608
$\mu_3$ = −.705687
$\mu_4$ = 64.98048
$\beta_1$ = .0053656
$\beta_2$ = 3.169897
$\kappa$ = .0125

Type IV. was used because, as there is a large number of cases, the probable error of $\kappa$ will be small (see Chapter VIII.).

$r$ = 40.12143
$\nu$ = 4.450399 (positive because $\mu_3$ is negative)
$a$ = 13.39152
$m$ = 21.06072
Sk. = −.03313

When the 5-years unit with which we have been working is changed to one year, $a$ becomes 66.9576, and $a^2$ = 4483.325.

The origin = mean + $\dfrac{\nu a}{r}$ = 52.004394.

The mode, which is wanted if the curve is drawn, is at 44.92989.

As $r$ is large the approximate form for $y_0$ was used, $\tan\phi = \dfrac{4.450398}{40.12143}$; hence log cos $\phi$ = $\bar{1}$.9973446, and from this $y_0$ is found to be 273.3649.

The value was checked by Dr. Alice Lee's tables (see Appendix V). The calculation of ordinates by the double process is as follows:

$\theta$    $x$ in years of age.    $-4.450398\,\theta\log_{10}e$    $42.12143\,\log\cos\theta$    log $y$    $y$
0°          0                  ...                 ...                 2.43675    273.37
1°          1.1687             $\bar{1}$.96637     $\bar{1}$.99721     2.40033    251.38
2°          2.3382             $\bar{1}$.93253     $\bar{1}$.98885     2.35813    228.10
The second column is formed by multiplying $\tan\theta$ by $a$, and is directly from the tables; as $x$ is required in years, 13.39152 × 5 = 66.9576 should be used for $a$. The fourth column is formed by multiplying L cos $\theta$, and the third continuously by addition. When $\theta$ is negative, the fourth column has to be subtracted from the fifth, i.e., it ceases to be negative and becomes positive. In each case the sixth is formed from the fourth + the fifth + log $y_0$. If the calculation is made directly, the following columns would be required:

(1) $x$;  (2) $1+\dfrac{x^2}{a^2}$;  (3) log (2);  (4) $\tan^{-1}\dfrac{x}{a}$ in degrees, &c.;  (5) col. (4) in circular measure;  (6) $-m$ × col. (3);  (7) $-\nu\log_{10}e$ × col. (5);  (8) (6) + (7) + log $y_0$ = log $y$;  (9) $y$.

Col. (2) can be formed best by differences since $\Delta(1+X^2) = 2X+1$; $\tan^{-1}$ has to be found by using a table of the tangents of angles inversely. A table helpful for obtaining col. (5) from col. (4) will be found on pp. 251 to 262 of Chambers' Mathematical Tables (1897 edition). When drawing a curve of this type the position and height of the mode can be noted and then corresponding points inserted, e.g., $x$ = +1.1687 and $y$ = 251.38. Care must be taken to give the curve its maximum at the right point.

[Figure: Type IV. Horizontal axis: central age, 10 to 95, with the mean marked.]

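PROOF.

In $y = y_0\left(1+\dfrac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}$ put $\tan\theta = \dfrac{x}{a}$, so that $\theta = \tan^{-1}\dfrac{x}{a}$ and

$$\left(1+\frac{x^2}{a^2}\right)^{-m} = \{1+\tan^2\theta\}^{-m} = (\sec^2\theta)^{-m} = \cos^{2m}\theta,$$

$$\therefore\; y = y_0\cos^{2m}\theta\;e^{-\nu\theta}$$

Now

$$N = \int_{-\infty}^{\infty} y_0\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}\,dx = y_0\,a\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{2m}\theta\;e^{-\nu\theta}\sec^2\theta\,d\theta,$$

by substituting $\tan\theta = \dfrac{x}{a}$ so that $a\sec^2\theta\,d\theta = dx$,

$$= y_0\,a\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{r}\theta\;e^{-\nu\theta}\,d\theta, \quad\text{where } r = 2m-2,$$

$$= y_0\,a\,e^{-\frac12\pi\nu}\int_0^{\pi}\sin^{r}\phi\;e^{\nu\phi}\,d\phi,$$

substituting $\sin\phi$ for $\cos\theta$, so that $\phi + \theta = \tfrac12\pi$ and the limits are changed,

$$= y_0\,a\,e^{-\frac12\pi\nu}\,G(r,\nu), \text{ say.}$$

The $n$th moment about the origin is

$$\mu'_n = \frac{y_0}{N}\int_{-\infty}^{\infty}x^{n}\left(1+\frac{x^2}{a^2}\right)^{-m}e^{-\nu\tan^{-1}\frac{x}{a}}\,dx = \frac{y_0\,a^{n+1}}{N}\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{r-n}\theta\,\sin^{n}\theta\;e^{-\nu\theta}\,d\theta,$$

substituting as above. Integrating by parts, treating $\sin^{n-1}\theta\,e^{-\nu\theta}$ as one part and $\cos^{r-n}\theta\sin\theta$ as the other, and remembering that

$$\int\cos^{r-n}\theta\,\sin\theta\,d\theta = -\frac{\cos^{r-n+1}\theta}{r-n+1},$$

we obtain

$$\mu'_n = \frac{y_0\,a^{n+1}}{N(r-n+1)}\left\{\Big[-\cos^{r-n+1}\theta\,\sin^{n-1}\theta\;e^{-\nu\theta}\Big]_{-\frac{\pi}{2}}^{\frac{\pi}{2}} + \int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{r-n+1}\theta\,\{(n-1)\sin^{n-2}\theta\cos\theta - \nu\sin^{n-1}\theta\}\,e^{-\nu\theta}\,d\theta\right\}$$

Now, since $\cos^{r-n+1}\theta\,\sin^{n-1}\theta\;e^{-\nu\theta} = 0$ when $\theta$ becomes $\dfrac{\pi}{2}$ or $-\dfrac{\pi}{2}$, we have

$$\mu'_n = \frac{a}{r-n+1}\{(n-1)\,a\,\mu'_{n-2} - \nu\,\mu'_{n-1}\}$$

Further,

$$\mu'_1 = \frac{y_0\,a^2}{N}\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{r-1}\theta\,\sin\theta\;e^{-\nu\theta}\,d\theta = -\frac{y_0\,a^2}{N r}\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\nu\cos^{r}\theta\;e^{-\nu\theta}\,d\theta$$

(by putting $n = 1$ in the above equation for $\mu'_n$)

$$= -\frac{\nu a}{r}, \quad\text{because}\quad N = y_0\,a\int_{-\frac{\pi}{2}}^{\frac{\pi}{2}}\cos^{r}\theta\;e^{-\nu\theta}\,d\theta.$$

Using the last result with the formula for the $n$th moment in terms of the two previous moments, and remembering that $\mu'_0$ is unity,

$$\mu'_2 = \frac{a^2(r+\nu^2)}{r(r-1)}$$

$$\mu'_3 = -\frac{a^3\nu(3r-2+\nu^2)}{r(r-1)(r-2)}$$

$$\mu'_4 = \frac{a^4\{3r(r-2)+\nu^2(6r-8)+\nu^4\}}{r(r-1)(r-2)(r-3)}$$

Referring these moments to the centroid vertical, we have, by putting $d = \mu'_1$ in the formulae on p. 19,

$$\mu_2 = \frac{a^2(r^2+\nu^2)}{r^2(r-1)}$$

$$\mu_3 = -\frac{4a^3\nu(r^2+\nu^2)}{r^3(r-1)(r-2)}$$

$$\mu_4 = \frac{3a^4(r^2+\nu^2)\{(r+6)(r^2+\nu^2)-8r^2\}}{r^4(r-1)(r-2)(r-3)}$$

If, now, we put $z$ for $r^2+\nu^2$, and write as before $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2}$, we have

$$-\frac{\beta_1(r-2)^2}{2(r-1)} = \frac{8r^2}{z} - 8 \quad\text{and}\quad \frac{\beta_2(r-2)(r-3)}{3(r-1)} = r + 6 - \frac{8r^2}{z}$$

Adding and dividing out by $r-2$, we have

$$r = \frac{6(\beta_2-\beta_1-1)}{2\beta_2-3\beta_1-6} \quad\text{and}\quad 1-\frac{r^2}{z} = \frac{\beta_1(r-2)^2}{16(r-1)}$$

Finally, since $\nu^2 = z - r^2$, the other formulae on p. 69 follow at once.

Since the tangent at the top of the maximum ordinate is parallel to the axis of $x$, the position of the mode is such that $\dfrac{dy}{dx}$ is zero at that point, i.e.,

$$-\frac{y_0}{a^2}\left(1+\frac{x^2}{a^2}\right)^{-m-1}e^{-\nu\tan^{-1}\frac{x}{a}}\,(2mx+\nu a)$$

is zero. There are three cases, $x = \infty$, $x = -\infty$ and a value of $x$ such that $2mx + \nu a$ is zero, or $x = -\dfrac{\nu a}{2m}$. The distance of the mean from the origin is $\mu'_1$ or $-\dfrac{\nu a}{r}$, and, therefore, the distance between the mean and mode is $\dfrac{2\nu a}{r(r+2)}$, which reduces to the expression given on p. 69, when the values for $a$ and $\nu$, on the same page, are inserted.

It will be useful to give another example of the calculation of $y_0$ for curves of this type, and we may take a curve in which $r$ = 29.590, $\nu$ = 19.886, $a$ = 13.650, $N$ = 2162.

$\therefore$ $\tan\phi$ = .67205, $\cos\phi$ = .82998, log cos $\phi$ = $\bar{1}$.91907, and $\phi$ = 33° 54′, which in circular measure is .59172.

log $N$                                        = 3.33486
colog $a$                                      = $\bar{2}$.86486
½ log $r$                                      = .73557
log $\dfrac{1}{\sqrt{2\pi}}$                   = $\bar{1}$.60091
$\dfrac{\cos^2\phi}{3r}$ = .00776, $-\dfrac{1}{12r}$ = −.00282, $-\phi\nu$ = −11.76700; sum = −11.762; × $\log_{10}e$ = $\bar{6}$.89183
colog $(\cos\phi)^{r+1}$                       = 2.47564
                                               ---------
log $y_0$                                      = $\bar{1}$.90367
$y_0$                                          = .80107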
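The logarithmic working just given for the approximate value of $y_0$ can be reproduced directly. The sketch below assumes the approximation $y_0 = \frac{N}{a}\sqrt{\frac{r}{2\pi}}\;e^{\cos^2\phi/3r - 1/12r - \phi\nu}/(\cos\phi)^{r+1}$ with $\tan\phi = \nu/r$, and uses the constants of the worked example ($r$ = 29.590, $\nu$ = 19.886, $a$ = 13.650, $N$ = 2162); the function name is illustrative only.

```python
import math


def type4_y0_approx(n_total, a, r, nu):
    """Approximate y0 for a Type IV curve, valid when r is fairly large."""
    phi = math.atan(nu / r)                      # tan(phi) = nu / r
    exponent = (math.cos(phi) ** 2 / (3.0 * r)
                - 1.0 / (12.0 * r)
                - phi * nu)                      # phi in circular measure
    return ((n_total / a) * math.sqrt(r / (2.0 * math.pi))
            * math.exp(exponent) / math.cos(phi) ** (r + 1.0))


# Constants of the worked example.
y0 = type4_y0_approx(2162, 13.650, 29.590, 19.886)
log_y0 = math.log10(y0)
```

The result agrees with the printed $y_0$ = .80107 to about four figures; the small remaining difference comes from the five-figure logarithms used in the hand calculation.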
The form just considered is sufficiently accurate for all practical purposes provided $\nu$ is not very small. If, however, $\nu$ is less than 2, $G(r,\nu)$ should be calculated by

2i/7re-W>+l)
n= f Product (1

+
4

r 2 -\-v 2

< 1+ l)

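TYPE V.

$$y = y_0\,x^{-p}\,e^{-\gamma/x}$$

FORMULAE.

$$p = 4 + \frac{8+4\sqrt{4+\beta_1}}{\beta_1}$$

$$\gamma = (p-2)\sqrt{\mu_2(p-3)}$$

$$y_0 = \frac{N\,\gamma^{p-1}}{\Gamma(p-1)}$$

$$\text{Origin} = \text{Mean} - \frac{\gamma}{p-2}$$

$$\text{Mode} = \text{Mean} - \frac{2\gamma}{p(p-2)}$$

The sign of $\gamma$ is the same as that of $\mu_3$.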
EXAMPLE.

The following series of deaths is taken from Mr. King's paper "On the rate of Mortality amongst Female Nominees, &c." (Journal of the Institute of Actuaries, xxxiii., pp. 262-8):

Ages.       Deaths.    Graduated by Type V.
30-34           1              1
35-39           5              3
40-44           8              6
45-49          12             14
50-54          28             32
55-59          82             68
60-64         128            137
65-69         253            247
70-74         342            381
75-79         525            480
80-84         438            441
85-89         265            261
90-94          53             80
95-99          18             10
100, &c.        4            ...
            -----          -----
            2,162          2,162

The mean is at age 75.9782605, and the moments (adjusted), &c., are

$\mu_2$ = 3.573346
$\mu_3$ = −4.752613
$\mu_4$ = 51.02583
$\beta_1$ = .4950399
$\beta_2$ = 3.996134
$\kappa$ = .85

Strictly speaking, Type IV. should be used, but the value is not very far from unity, and the following Type V. constants were found:

$p$ = 37.29145
$\gamma$ = −390.6609 (negative, because $\mu_3$ is)
log $y_0$ = 56.930518

The approximation to the value of log Γ($p-1$) was used. The origin is at age 131.32606, and the mode at 78.9467.
The columns used for calculating the ordinates were:

(1) $x$;  (2) log $x$;  (3) $-p\log x$;  (4) $-\dfrac{\gamma\log_{10}e}{x}$;  (5) log $y$ = log $y_0$ + (3) + (4);  (6) $y$ = antilog (5).

Col. (4) is best formed by putting $\gamma\log_{10}e$ on the plate of the arithmometer, and multiplying by $\dfrac{1}{x}$; it is obtained, of course, from a table of reciprocals. The point to be borne in mind in drawing a curve of this type is that as the mode and origin are not at the same place, care must be taken to give the maximum ordinate its right position and magnitude (cf. Type IV.). The graduated figures agree fairly closely with the original statistics below the 90-94 group, but are unsuitable for that and the two later groups. The reason is that Type IV. should be used, and curves of Type V. have a range limited in one direction, while Type IV. curves have an unlimited range.

The particular case was chosen partly because an example in which $\mu_3$ is negative is rather more awkward than when $\mu_3$ is positive. In such cases it is a good check to imagine the statistics written in inverse order (in this case 4, 18, 53, &c.), and so avoid the negative signs.

[Figure: Type V. Horizontal axis: age, 30 to 105.]
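The Type V. constants of the example follow from $\beta_1$, $\mu_2$ and the sign of $\mu_3$, and can be checked with a short sketch. It assumes $p - 4$ is the positive root of $\beta_1(p-4)^2 - 16(p-4) - 16 = 0$, that $\gamma = (p-2)\sqrt{\mu_2(p-3)}$ takes the sign of $\mu_3$, and that the moments are in units of five years of age, so that distances are multiplied by 5 to give years. The function name is illustrative only.

```python
import math


def type5_constants(mu2, mu3, beta1):
    """Return (p, gamma) for a Type V curve y = y0 * x**(-p) * exp(-gamma/x)."""
    p = 4.0 + (8.0 + 4.0 * math.sqrt(4.0 + beta1)) / beta1
    gamma = math.copysign((p - 2.0) * math.sqrt(mu2 * (p - 3.0)), mu3)
    return p, gamma


MEAN_AGE = 75.9782605
UNIT = 5.0   # the moments are in 5-year units (assumption stated above)

p, gamma = type5_constants(3.573346, -4.752613, 0.4950399)
origin_age = MEAN_AGE - UNIT * gamma / (p - 2.0)
mode_age = MEAN_AGE - UNIT * 2.0 * gamma / (p * (p - 2.0))
```

Since $\gamma$ is negative here, both the origin and the mode lie above the mean, and the range of the curve is limited at the upper end.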

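PROOF.

Putting $\dfrac{\gamma}{x} = z$ in $y = y_0\,x^{-p}e^{-\gamma/x}$ and integrating from $x = 0$ to $x = \infty$, we have

$$N = \frac{y_0}{\gamma^{p-1}}\int_0^\infty z^{p-2}e^{-z}\,dz, \qquad\therefore\; y_0 = \frac{N\,\gamma^{p-1}}{\Gamma(p-1)}$$

Using the same substitution, the $n$th moment about the origin is

$$\mu'_n = \frac{y_0}{N}\int_0^\infty x^{n-p}e^{-\gamma/x}\,dx = \frac{\gamma^{n}\,\Gamma(p-n-1)}{\Gamma(p-1)}$$

This gives $\mu'_1 = \dfrac{\gamma}{p-2}$, which is the distance between the mean and origin,

$$\mu'_2 = \frac{\gamma^2}{(p-2)(p-3)}, \qquad \mu'_3 = \frac{\gamma^3}{(p-2)(p-3)(p-4)}.$$

Transferring the moments to the centroid vertical,

$$\mu_2 = \frac{\gamma^2}{(p-2)^2(p-3)} \quad\text{and}\quad \mu_3 = \frac{4\gamma^3}{(p-2)^3(p-3)(p-4)}$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(p-3)}{(p-4)^2}, \quad\text{or}\quad (p-4)^2 - \frac{16}{\beta_1}(p-4) - \frac{16}{\beta_1} = 0.$$

$p-4$ will have to be taken as the positive root of the equation, or $\gamma$, which from the above equations is given by $(p-2)\sqrt{\mu_2(p-3)}$, will be imaginary.

Since the tangent to the curve at the top of the maximum ordinate is parallel to the axis of $x$, the position of the mode is such that $\dfrac{dy}{dx}$ is zero there, i.e.,

$$y_0\,x^{-p}e^{-\gamma/x}\left\{-\frac{p}{x}+\frac{\gamma}{x^2}\right\}$$

is zero. $x = 0$ and $x = \infty$ give the cases in which the curve touches the axis of $x$, and the other case, the one required, is when $\dfrac{\gamma}{x} - p = 0$, or $x = \dfrac{\gamma}{p}$, i.e., the mode is $\dfrac{\gamma}{p}$ from the origin.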

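TYPE VI.

$$y = y_0\,(x-a)^{q_2}\,x^{-q_1}$$

FORMULAE.

$$r = \frac{6(\beta_2-\beta_1-1)}{6+3\beta_1-2\beta_2}$$

$1-q_1$ and $1+q_2$ are given by

$$\frac{r}{2} \pm \frac{r(r+2)}{2}\sqrt{\frac{\beta_1}{\beta_1(r+2)^2+16(r+1)}}$$

$$a = \tfrac12\sqrt{\mu_2}\,\sqrt{\beta_1(r+2)^2+16(r+1)}$$

$$y_0 = \frac{N\,a^{q_1-q_2-1}\,\Gamma(q_1)}{\Gamma(q_1-q_2-1)\,\Gamma(q_2+1)}$$

$$\text{Sk.} = \tfrac12\sqrt{\beta_1}\,\frac{r+2}{r-2}$$

$$\text{Origin} = \text{Mean} - \frac{a(q_1-1)}{q_1-q_2-2}$$

$$\text{Mode} = \text{Mean} - \tfrac12\,\frac{\mu_3}{\mu_2}\cdot\frac{r+2}{r-2}$$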
NOTES.

The range is from $a$ to $\infty$ and the method is like that of Type I. $r$ and $\epsilon$ are found exactly as in Type I., and $1-q_1$ and $1+q_2$ are the roots of $z^2 - rz + \epsilon = 0$, just as $1+m_1$ and $1+m_2$ were in Type I. The origin is before the beginning of the curve. $1-q_1$ is taken with the negative root and $1+q_2$ with the positive root when $\mu_3$ is negative, and vice versa.
EXAMPLE.

The numbers of entrants in the recent limited payment policies experience were summed in groups of ten years of age and divided by 100, and the following series was obtained:

No. of Entrants ÷ 100.    Graduated by Type VI. curve.
       1                         ...
      56                          50
     167                         168
      98                         100
      34                          36
       9                          10
       2                           2
       1                          .5
     ---                         ---
     368                         368

The moments, &c., were

mean at .402174 after the centre of 167 group
$\mu_2$ = .928835
$\mu_3$ = .893096
$\mu_4$ = 4.088800
$\beta_1$ = .9953605
$\beta_2$ = 4.739349
$\kappa_2$ = 1.895
$r$ = −33.42429
$1-q_1$ = −41.03080
$1+q_2$ = 7.60950
$q_1$ = 42.03080
$q_2$ = 6.60950
$a$ = 10.37947
log $y_0$ = 46.1821
The origin is 12.74270 before the mean or 12.34053 before the centre of the 167 group, and the curve starts at 12.34053 − 10.37947 = 1.96106 before the centre of the largest group. This makes the start of the curve at about age 10, which is reasonable.

The curve was calculated as follows:

(1) $x$;  (2) log $x$;  (3) log $(x-a)$;  (4) $-q_1\log x$;  (5) $q_2\log(x-a)$;  (6) log $y$;  (7) $y$.

There is no difficulty in writing down the values for columns (2) and (3) without using column (1), as only the whole numbers in $x$ and $x-a$ change, the decimal remaining constant so long as equidistant ordinates are required. Columns (4) and (5) are obtained directly, and column (6) by adding columns (4) and (5) to log $y_0$.

The mode, which is useful for drawing the curve, is .02429 before the centre of the largest group. The skewness is .443.

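PROOF.

$$N = y_0\int_a^\infty (x-a)^{q_2}x^{-q_1}\,dx = y_0\,a^{q_2-q_1+1}\int_0^1 z^{q_1-q_2-2}(1-z)^{q_2}\,dz,$$

by substituting $\dfrac{a}{z}$ for $x$,

$$\therefore\; y_0 = \frac{N\,a^{q_1-q_2-1}}{B(q_1-q_2-1,\;q_2+1)} = \frac{N\,a^{q_1-q_2-1}\,\Gamma(q_1)}{\Gamma(q_1-q_2-1)\,\Gamma(q_2+1)}$$

The $n$th moment about the origin is

$$\mu'_n = \frac{y_0}{N}\int_a^\infty x^{n}(x-a)^{q_2}x^{-q_1}\,dx = \frac{a^{n}\,\Gamma(q_1-q_2-n-1)\,\Gamma(q_1)}{\Gamma(q_1-n)\,\Gamma(q_1-q_2-1)}$$

by the same substitution as that used above.

From this last result we obtain, by inserting the value of $y_0$ and remembering the relationship between $\Gamma(q_1)$ and $\Gamma(q_1-1)$,

$$\mu'_1 = \frac{a(q_1-1)}{q_1-q_2-2}$$

$$\mu'_2 = \frac{a^2(q_1-1)(q_1-2)}{(q_1-q_2-2)(q_1-q_2-3)}$$

&c.

It will be noticed that these equations are the same as those already obtained for Type I. if $m_1 = -q_1$ and $m_2 = q_2$. Thus, we can use the whole of the Type I. solution, provided we bear in mind that the range is from $x = a$ to $x = \infty$.

[Figure: Type VI.]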

TYPE VII.

"NORMAL CURVE OF ERROR."

FORMULAE.

$$c = 2\mu_2$$

$$y_0 = \frac{N}{\sqrt{2\pi\mu_2}}$$

EXAMPLES.

The following table gives, in column (2), the sums assured and bonuses, and in column (4) the reserves resulting from grouping a number of Endowment Assurances according to their office years of birth:

Central Age for      Sum Assured and Bonuses ÷ 1,000.    Reserves ÷ 1,000.
groups of 5 years    Ungraduated.    Graduated.          Ungraduated.    Graduated.
of birth.
(1)                  (2)             (3)                 (4)             (5)
17                      11              13                  .6              .6
22                      48              40                 2.8             2.7
27                     124             104                11.5            10.9
32                     213             202                27.7            30.1
37                     281             282                59.1            58.4
42                     295             288                84.7            80.1
47                     185             214                74.1            76.9
52                     104             116                50.5            52.2
57                      40              44                23.2            25.0
62                      15              13                12.2             8.4
67                       3               3                 1.3             2.4
                     -----           -----               -----           -----
Total                1,319           1,319               347.7           347.7

The following table shows the moments and constants:

Constant.                   Sum Assured and Bonus.    Reserves.
Mean age                    39.202426                 43.967213
$\mu_2$                     3.066840                  2.769635
$\mu_3$                     .650127                   .029805
$\mu_4$                     27.02516                  22.40663
$\beta_1$                   .014653                   .0000418
$\beta_2$                   2.873346                  2.920997
$\kappa$                    −.005                     −.0002
$\sigma\ (=\sqrt{\mu_2})$   1.751237                  1.664222
$\sigma^{-1}$               .5710248                  .6008813
$y_0$                       300.4760                  83.34959

The criteria for the normal curve are $\kappa = 0$, $\beta_1 = 0$, and $\beta_2 = 3$. The values given above do not differ very greatly from these, but a comparison of the graduated and ungraduated figures shows that the reserve curve agrees better than the sum assured curve, partly because the value of $\beta_2$ is closer to 3, and $\beta_1$ has a larger value in the case of the sum assured.
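The values of $y_0$ in the table of constants follow at once from the total frequency and the second moment. A minimal sketch, assuming the Type VII. formula $y_0 = N/\sqrt{2\pi\mu_2}$ with $\mu_2$ in the 5-year unit of the calculation; the function name is illustrative only.

```python
import math


def normal_y0(n_total, mu2):
    """Central ordinate of the normal curve: y0 = N / sqrt(2 * pi * mu2)."""
    return n_total / math.sqrt(2.0 * math.pi * mu2)


y0_sum_assured = normal_y0(1319, 3.066840)   # sums assured and bonuses / 1,000
y0_reserves = normal_y0(347.7, 2.769635)     # reserves / 1,000
```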

For the calculation of $y_0$ the value of log $\dfrac{1}{\sqrt{2\pi}}$ = $\bar{1}$.6009100657 is required.

In finding the areas for the comparison between the graduated and ungraduated figures it is unnecessary to calculate the ordinates, as one of the calculated tables of the probability integral can be used.

The best table was recently given by Mr. W. F. Sheppard (Biometrika, vol. ii., pp. 174, &c.), and the columns in the following table show how it was used to calculate the areas in one of the cases (the reserves).

Mr. Sheppard's tables give the areas and ordinates of the normal curve in terms of the standard deviation; that is, he assumes the standard deviation to be unity, and his tables must be entered by using intervals of $\sigma^{-1}$.
Col. (1) Age; col. (2) distance from origin in calculation units, i.e., 5 years of age; col. (3) previous column × $\sigma^{-1}$; col. (4) values from Sheppard's Tables using differences (area from origin to $x$); col. (5) difference of previous column: area for age group $x$ to $x+5$; col. (6) area multiplied by 347.7 (total frequency).

(1)     (2)         (3)         (4)       (5)        (6)
14.5    5.893443    3.541258    ...       .00144*    ...
19.5    4.893443    2.940377    .99836    .00785      2.7
24.5    3.893443    2.339496    .99049    .03152     10.9
29.5    2.893443    1.738615    .95897    .08659     30.1
34.5    1.893443    1.137734    .87238    .16806     58.4
39.5     .893443     .536853    .70432    .22985†    80.1
44.5     .106557     .064028    .52553    .22141     76.9
49.5    1.106557     .664909    .74694    .15018     52.2
54.5    2.106557    1.265790    .89712    .07190     25.0
59.5    3.106557    1.866671    .96902    .02418      8.4
64.5    4.106557    2.467552    .99320    .00572      2.0
69.5    5.106557    3.068433    .99892    .00108*      .4

* Remainders of areas beyond 19.5 and 69.5.

† (.70432 − .50000) + (.52553 − .50000) because we pass across the origin, and a piece of the group is on each side of it.

The second column can be left out when the method has been grasped. The ages in the first column were taken consistently with the assumptions that 17, 22, etc., were the central ages of the groups.

If ordinates are required, the $z$ column in Mr. Sheppard's tables must be used. It was with its help that the curves in the figure were drawn. The statistics and curve for the reserves are shown by the dotted lines.
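The use of a table of the probability integral to obtain group areas can be imitated with the error function. The sketch below is not Sheppard's table but a substitute for it: it assumes the normal distribution function is computed as $\frac12\{1+\operatorname{erf}(z/\sqrt2)\}$, takes the mean age 43.967213 and $\sigma$ = 1.664222 five-year units for the reserves, and uses group boundaries at 14.5, 19.5, ..., 69.5. The function names are illustrative only.

```python
import math


def normal_cdf(z):
    """Area of the unit normal curve below z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def group_areas(mean_age, sigma_years, total, boundaries):
    """Graduated frequency falling between each pair of successive boundaries."""
    cdf = [normal_cdf((b - mean_age) / sigma_years) for b in boundaries]
    return [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


MEAN_AGE = 43.967213
SIGMA_YEARS = 5.0 * 1.664222            # sigma converted from 5-year units to years
boundaries = [14.5 + 5.0 * k for k in range(12)]   # 14.5, 19.5, ..., 69.5

areas = group_areas(MEAN_AGE, SIGMA_YEARS, 347.7, boundaries)
# areas[3] is the group 29.5 to 34.5, areas[4] the group 34.5 to 39.5.
```

The areas for the groups 29.5 to 34.5 and 34.5 to 39.5 agree with the printed 30.1 and 58.4; one or two other printed entries differ slightly in the first decimal, which appears to come from the interpolation in the hand calculation.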

[Figure: Type VII. Sums assured ÷ 1,000 plotted against age.]

An average reserve for any group can be obtained by means of the graduated figures, and it could be used to test the reserves obtained at any future valuation. This is by no means the only rough check that can be applied, but it is interesting because it shows a use to which frequency-curves might be put in practical office routine.

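PROOF.

To show that

$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2},$$

let

$$\int_0^\infty e^{-x^2}\,dx = K;$$

then, substituting $ax$ for $x$, we have

$$\int_0^\infty e^{-a^2x^2}\,a\,dx = K.$$

Hence,

$$\int_0^\infty\!\!\int_0^\infty e^{-a^2(1+x^2)}\,a\,da\,dx = K\int_0^\infty e^{-a^2}\,da = K^2.$$

But

$$\int_0^\infty\!\!\int_0^\infty e^{-a^2(1+x^2)}\,a\,da\,dx = \frac12\int_0^\infty\frac{dx}{1+x^2} = \frac{\pi}{4}.$$

Hence, $K = \dfrac{\sqrt{\pi}}{2}$ and

$$\int_{-\infty}^{\infty}e^{-x^2}\,dx = \sqrt{\pi}.$$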

The other constant


e~
c

is

obtained as follows

t/

cc

dx=yQ xe~
\

'-

+ e~ C
C&t'

<

xdx
J
_co

by parts

_2

1-f
AT

= 2/^a

92

ADDITIONAL EXAMPLES.

8. Up to the present we have merely considered examples with a view to illustrating the various types of frequency-curves, but it seems advisable to consider one or two practical examples which may help to show the range of applicability of the curves in actuarial work, and give an opportunity of noticing a few difficulties which may arise in applying them.

The function with which actuaries generally wish to deal in practical work is not an exposed to risk or series of deaths or withdrawals, but the ratio between the deaths and the exposed; that is, with the rates of mortality, sickness, marriage, and withdrawal. An actuary studying frequency-curves may therefore naturally ask whether any of these rates can be graduated by means of the curves we have examined, and, if they fail, must they be put aside for some other method?

Now the first point to be considered is whether these rates are frequency distributions; if they are not, the use of the frequency-curve is empirical. A rate of mortality gives the proportion of people at each age who die, and if we imagine 1,000 persons exposed to risk at each integral age, the number of deaths would be 1,000 times the rate of mortality, and this seems to show that it is possible to consider the rate of mortality as a distribution, though it is one that could hardly arise in actual experience. It is impossible to describe the rates of mortality or sickness by a single frequency-curve. On the other hand, the rates of marriage are certainly much like frequency-curves, and the rates of withdrawal, whether regarded according to age or duration, might take a form like our example in Type III.

There are, however, practical objections to the direct operation on rates, even apart from the very exaggerated idea of frequency distributions in which it is necessary to indulge. The numbers exposed to risk at the end of any table become small, and a single death or marriage there gives a very large rate, while at several ages near there may be a zero rate shown by the ungraduated data. This is extremely awkward, as it tends to make the ratios dealt with far rougher in application than the actual observations are in fact, and we are forced to group the material before using it, which introduces an arbitrary practice which it is well to avoid as far as possible. It must not, of course, be inferred that a small number of say fifty or one hundred deaths must necessarily be grouped according to each year of age, but that even if there are two or three thousand the roughnesses introduced by the use of rates influence the result considerably. The reason is that an equal weight is given to each rate of mortality, which is very far from the weight indicated by the exposed to risk.

9. It will be useful to consider a case bearing out these objections and then deal with a practical method of overcoming them. The statistics to be considered have been taken from a paper by Mr. M. Mackenzie Lees "On Rates of Mortality and Marriage among daughters of Peers and Heirs Apparent, &c." (Transactions of the Faculty of Actuaries, vol. i., p. 276), and may be summarized as on page 94.

The moments were calculated by Mr. G. F. Hardy's Summation Method, and were found, about the mean 28.77191, to be

$\mu_2$ = 63.2092
$\mu_3$ = 627.101
$\mu_4$ = 19,103.3
$\beta_1$ = 1.557153
$\beta_2$ = 4.781321

The criterion was $\kappa$ = −1.5, but as I had neglected the rate .00089 at 71 in calculating the moments, I used Type III. The inclusion of the rate at that age would have lengthened the curve and considerably increased the arithmetical value of the criterion.

The constants for Type III. were

$\gamma$ = .201592
$p$ = 1.56881
$a$ = 7.78189
Mode = 23.81128
$y_0$ = 890.05

The curve starts, therefore, at age 16.02939.

The rates resulting from this graduation are given in the table, and while they tend to show that the distribution of rates of marriage is closely allied to a frequency-curve, they do not give a satisfactory graduation, and the

Marriage Rates of Spinsters.


Age
Exposed
to Risk

Xo. of Marriages

Rate of Marriage

E
3,658 3,603 3,528-5 3,393-5 3,187 2,945 2,688-5 2,443 2,187 1,956 1,758 1,583-5 1 ,417 1,270-5 1,148-5 1,068

M.x

mx
0008 0022 0139

Rate of Marriage Hypothetica Graduated Exposed by E'x Frequency


1

Xo. of Marriages

M>
3 7

Rate of Marriage Graduated.


j

Curve.

15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49

...

8 49 114 176 219 192 211 212 194 146 137 121 105
75

0027 0132

0336
0552 0744 0714 0864 0969 0992 0831 0865 0854 0826 0653 0562 0650 0453 0354 0486 0266 0352 0268 0172 0229 0256
01

0332
0517 0667 0776 0846 0881 0S89 0875 0845 0803 0753 0698 0640 0583 0528 0475 0425 0378 0335 0295 0260 0228 0199 0173
0151

984

9045
848-5

802 752 711


672-5

60 64 41 30 39 20 25 18

3,695 3,433 3,187 2,957 2,742 2,541 2,354 2,179 2,016 1,861 1,723 1,591 1,469 1,355 1,249 1,151 1,061

44
99 151 189 168 188 196 185 143 138 126 112 82 65 69 43 32 40 20 25 17 10 12
12 7
5

0018 0157 0350

0541
0695 0809 0880 0917

0920
0901 0861 0812 0754 0693 0631 0569 0508 0452 0400 0352 0309 0270 0235 0205 0176 0151

638

6125
586-5 568-5 541-5

14 15
9 6 8

58

515
491-5

2
5 5

476 454
440-5

0111 0155 0041 0105 0110

2
5 2
'"

0015
0120 0051
.*::

416 395
378-5 363-5 348-5 335-5

50
51 52 53

0029 0089

54 55
56 57 58 59 60 61 62 63 64 65 66
67 68 69

3175
304 291
278-5
"3

0103 0036
.'.'.'

261
248-5 234-5
...-

2195
209-5 201-5 191

1
... ...

0046

177
165-5

:::
...

154
147-5 135-5 124-5 112-5 105-5

...

70 71
72

0131 0113 0097 0084 0072 0062 0054 0046 0039 0034 0029 0024 0020 0018 0015 0013 0011 0009 0007 0006 0005 0004 0004 0003 0002 0002 ouoi 0001
...
...

976 897 825 758 696 639 586 537 492 451 412 376 345 315 288 262 239 218 199 181 165 150 139 124

0130
0112 0096 0082 0070 0060 0051 0044 0037 0031 0026 0022 0019 0016

6
1 3 3
1

3
1

75*7

0007

45-7

0003

...

27-2

ocoi
...

1
...

0089

73 74 75

95
84-5

...

79

...


failure is due almost entirely to the objections referred to above. Of course, if we were examining the algebraic form taken by rates of marriage, we should begin by work on population data where the roughness of material is avoided by the large numbers of individuals dealt with; as, however, we are seeking for a graduation, we must see how these objections, which of course apply to some extent to any method of graduation, can be overcome. It has been remarked that the cause of the difficulty is that incorrect weights are given to the items used, and the most obvious suggestion is that the actual exposed and marriages should be graduated separately. This, however, entails a large amount of additional work, and a shorter method can be used which avoids the double graduation. This method consists of using a series allied to the exposed, and treating it as a hypothetical exposed to risk from which a new series of marriages can be calculated. The advantages are that we have only to make one graduation, and the weights of the various parts of the table are given approximately.

In a similar way $q_x$ can be graduated, and in this connection it may be remarked that as the exposed to risk is generally capable of being represented by a frequency-curve, it is natural to suggest that the hypothetical exposed might be taken as the simplest form assumed by such curves (viz., Type VII.); this is also convenient because the ordinates for such curves have been tabulated.

10. The hypothetical exposed can be fixed by trial or from the values of the exposed. The column $E'_x$ in the table given above is taken from Sheppard's Tables of the Probability Integral, $x$ being taken as 3.06, 3.084, 3.108, 3.132, &c., and the entries were multiplied by $10^6$. The series $M'_x = E'_x \times m_x$ was then formed and graduated. The following values were obtained for the $M'_x$ series:

    Mean      = 24.85779
    $\mu_2$   = 29.5006
    $\mu_3$   = 190.112
    $\mu_4$   = 4361.12
    $\beta_1$ = 1.40775
    $\beta_2$ = 5.01114
    $\kappa$  = -7.102

As this is large, Type III. was used, and

    $\gamma$ = .310350
    $p$      = 1.841405
    $a$      = 5.933325
    $y_0$    = 192.625
    Mode     = 21.63562

The curve was then worked out and the rates of marriage in the final column were obtained by dividing $M'_x$ by $E'_x$. They agree closely with the ungraduated figures.

11. A numerical example of the application of the method to the $O^{NM(5)}$ Table may now be given. The normal curve with $\sigma = 10$ and origin at age 52½ was used, and the values were multiplied by $q_x$ with the help of Crelle's tables. A part of the work was:

    q_x E_x x 10^5    Age    Ordinate from Sheppard's Tables (= E_x)    Age    q_x E_x x 10^5
         810           52                 .3984439                       53         801
         597           51                 .3944793                       54         850
         644           50                 .3866681                       55         875
                                            &c.

Summing these entries ($q_x E_x \times 10^5$) in fives, I formed the following:

    Age      q_x E_x x 10^5
     20            13
     25            70
     30           218
     35           594
     40         1,394
     45         2,460
     50         3,702
     55         4,519
     60         4,385
     65         3,602
     70         2,249
     75         1,197
     80           461
     85           133
     90            31
     95             5
    100             1
                ------
               25,034

The abbreviations (use of Crelle's tables and grouping) were adopted to save labour, and as the figures were required for an example they are sufficiently accurate.
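The first steps of the method just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`normal_ordinate`, `hypothetical_exposed`, `sum_in_fives`) are hypothetical, and the five products are the values read from the extract of the work shown above.

```python
import math

def normal_ordinate(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def hypothetical_exposed(ages, origin, sigma):
    """Normal-curve ordinates used in place of the actual exposed to risk."""
    return {age: normal_ordinate((age - origin) / sigma) for age in ages}

def sum_in_fives(series, first_age):
    """Total an age-indexed series in quinquennial groups starting at first_age."""
    groups = {}
    for age, value in series.items():
        start = first_age + 5 * ((age - first_age) // 5)
        groups[start] = groups.get(start, 0) + value
    return groups

# Hypothetical exposed for ages 50 to 55 (sigma = 10, origin at age 52.5).
exposed = hypothetical_exposed(range(50, 56), origin=52.5, sigma=10.0)

# Products q_x * E_x * 10^5 for ages 50 to 54, as read from the extract above.
products = {50: 644, 51: 597, 52: 810, 53: 801, 54: 850}
grouped = sum_in_fives(products, first_age=20)
```

The ordinate for age 52 should agree with the tabulated .3984439, and the five products should total the entry for the group beginning at age 50.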


The following values were then found:

    Mean age      = 59.439762
    $\mu_2$       = 4.584327
    $\mu_3$       = -.4999871
    $\mu_4$       = 61.17014
    Type of curve : No. I.
    $m_1$         = 32.81166
    $m_2$         = 26.57123
    $a_1$         = 18.78553
    $a_2$         = 15.21272
    $y_0$         = 4609.884
    Mode age      = 59.730789

(The unit is 5 years of age.)

The ordinates were then calculated for every fifth age, and finding that the curve is not very far removed from the normal curve of error, I interpolated in the second differences of the logarithms of the ordinates for those at the other ages.* A quadrature formula was used for finding areas, and $q_x$ was found by dividing by the hypothetical figure already used for the exposed.

The expected deaths were as follows:

    Age       Graduated q_x for                            Deviation.
    group.    central age of group.   Actual.   Expected.     +       -
    15-19          ...                  ...         1.5      1.5
    20-           .00643                  9         8.9               .1
    25-           .00731                 69        61.0              8.0
    30-           .00850                205       204.6               .4
    35-           .00991                369       380.7     11.7
    40-           .01179                588       575.6             12.4
    45-           .01452                801       811.4     10.4
    50-           .01866              1,064      1063.8               .2
    55-           .02505              1,399      1386.6             12.4
    60-           .03516              1,752      1773.2     21.2
    65-           .05118              2,164      2136.7             27.3
    70-           .07682              2,216      2261.2     45.2
    75-           .11648              1,965      1925.8             39.2
    80-           .17462              1,237      1241.9      4.9
    85-           .24870                494       514.4     20.4
    90-           .33286                129       126.0              3.0
    95-           .43289                 18        17.3               .7
    100-           ...                    1         1.5       .5
                                     ------     -------    -----    -----
                                     14,480     14492.1    115.8    103.7
                                                 Total deviation, 219.5

* As $y = y_0 e^{-x^2/2\sigma^2}$ is the equation to the normal curve, the logarithm is $Ax^2 + Bx + C$, say. The criterion of course shows if the curve is nearly normal.

12. It will be interesting to examine a particular case of the method just described, as it is often required by actuaries. Defining Makeham's hypothesis as $\operatorname{colog} p_x = A + Bc^x$, we take a normal curve $\left(y_0 e^{-(x-h)^2/2\sigma^2}\right)$ to represent the exposed and multiply by the values of $\operatorname{colog} p_x$. This means that we assume that the products can be represented by

$$y = (A + Bc^x)\,y_0 e^{-(x-h)^2/2\sigma^2} = A y_0 e^{-(x-h)^2/2\sigma^2} + HB y_0 e^{-(x-t)^2/2\sigma^2} \qquad \text{(I.)}$$

where $t = h + \sigma^2 \log_e c$ and

$$H = e^{\{(h + \sigma^2\log_e c)^2 - h^2\}/2\sigma^2} = e^{h\log_e c + \frac{1}{2}\sigma^2(\log_e c)^2},$$

i.e., the sum of two normal curves both having the same standard deviation as the exposed curve and one having the same origin. The difference between the two origins gives $\sigma^2 \log_e c$, so $\log_{10} c = \dfrac{t-h}{\sigma^2}\log_{10} e$.

The whole solution is made very simple by taking moments about $h$;* $\int_{-\infty}^{+\infty} xy\,dx$ and $\int_{-\infty}^{+\infty} x^2y\,dx$ (the first two moments) give $(t-h)N_2$ and $N_1\sigma^2 + N_2\left(\sigma^2 + (t-h)^2\right)$,† where $N_1 = A y_0 \sigma\sqrt{2\pi}$ and $N_2 = HB y_0 \sigma\sqrt{2\pi}$.

Dividing the values just given by $N_1 + N_2$ (the total frequency), we obtain, as the first moment about the known origin,

$$\mu'_1 = \frac{N_2(t-h)}{N_1+N_2},$$

and, as the second,

$$\mu'_2 = \sigma^2 + \frac{N_2(t-h)^2}{N_1+N_2},$$

or

$$t - h = \frac{\mu'_2 - \sigma^2}{\mu'_1} \quad\text{and}\quad N_2 = \frac{\mu'_1(N_1+N_2)}{t-h},$$

where $\mu'$ is written for moments about $h$. Then

$$\log_{10} c = \frac{t-h}{\sigma^2}\log_{10} e \qquad \text{(II.)}$$

as stated above, and if $y_0 = \dfrac{10^k}{\sigma\sqrt{2\pi}}$, as is, of course, generally convenient, then

$$A = N_1 \div 10^k$$

and

$$B = N_2 \div (10^k \times H) = \frac{N_2}{10^k\, e^{h\log_e c + \frac{1}{2}\sigma^2(\log_e c)^2}} = \text{(see equation II.)}\ \frac{N_2}{10^k\, c^{(t+h)/2}}.$$

* Remember that the normal curve is symmetrical, so that the odd moments about the mean of such a curve are zero.

† Can be seen at once as the sum of two integrals; $N_1\sigma^2$ gives the second moment of the first normal curve in I., and $N_2\left(\sigma^2 + (t-h)^2\right)$ gives the second moment of the second normal curve.

13. If we assume, as Mr. Hardy does in his recent graduations of the new experience, that $\log_{10} c$ is known, we only require to calculate one moment, which gives us $\dfrac{N_2(t-h)}{N_1+N_2}$, and this, with the help of equation II., enables us to complete the solution. If $c$ were obtained for the aggregate table we should use this result for the select tables.


14. A numerical example with the $O^{NM(5)}$ Table may be of interest. A normal curve with standard deviation 10 and origin 55½ was taken, and the terms multiplied by $\operatorname{colog} p_x$. These were then grouped in fives, and the first two moments calculated about age 55½. One little point should be borne in mind in connection with the grouping: though the centre of the base on which the product $q_x \times$ exposed stands is $x + \frac{1}{2}$, the result $\operatorname{colog} p_x \times$ exposed is an ordinate at $x$; the centre point of five ages 20 to 24 is 22½ when $q_x$ is used and 22 when $\operatorname{colog} p_x$ is used. The figures were:

    $N_1 + N_2$                             = 136387
    1st moment about 55½ in 5-years group   = 1.416184
    2nd     "       "        "              = 4.1929354

Deducting (W. F. Sheppard's adjustment) $\frac{1}{12}$ from the second moment and multiplying the first moment and the adjusted second moment by 5 and 25 respectively to make the unit one year instead of 5 years, we have

    $\mu'_1$ = 7.080920
    $\mu'_2$ = 164.384085

then

    $\log_{10}(t-h)$ = .9586889
    $t - h$          = 9.092617
    $\log_{10} c$    = .03948873
    $A$              = .00301749
    $\log_{10} B$    = $\bar{5}$.6550214
    $B$              = .00004518782

$q_x$ was then calculated from the graduated $\operatorname{colog} p_x$ obtained from the values of $A$, $B$ and $c$, and the following table of expected deaths was worked out. The values of $q_x$ are given in the table showing the frequency-curve graduation:

    Age         Graduated q_x for       Expected    Actual       Deviation.
    Group.      Central Age of Group.   Deaths.     Deaths.       +       -
    Under 25         ...                   13.0          9       4.0
    25-             .00812                 67.0         69               2.0
    30-             .00882                211.6        205       6.6
    35-             .00991                380.8        369      11.8
    40-             .01162                566.9        588              21.1
    45-             .01431                799.7        801               1.3
    50-             .01854               1057.5      1,064               6.5
    55-             .02517               1392.7      1,399               6.3
    60-             .03551               1790.2      1,752      38.2
    65-             .05160               2153.0      2,164              11.0
    70-             .07639               2249.3      2,216      33.3
    75-             .11415               1888.7      1,965              76.3
    80-             .17053               1213.6      1,237              23.4
    85-             .23352                519.1        494      25.1
    90-             .36484                136.6        129       7.6
    95-              ...                   20.6         19       1.6
                                        -------     ------     -----   -----
                                        14460.3     14,480     128.2   147.9
                                                     Total deviation, 276.1

This result, which would have been improved by using all the terms instead of grouping, is very like that given by Mr. G. F. Hardy, but avoids having to obtain $c$ by trial. Mr. Hardy's expected and actual deaths balance better than the above, but I do not think the rates have been understated systematically, as the 75-79 group accounts for the disagreement. The total deviation is less than Mr. Hardy's.
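The arithmetic of Arts. 12 to 14 can be retraced with a short sketch. It assumes the relations reconstructed above ($t - h = (\mu'_2 - \sigma^2)/\mu'_1$, equation II., and $H = c^{(t+h)/2}$), takes $k = 7$ as the power of ten (an assumption inferred from the size of $N_1 + N_2$), and uses a hypothetical function name `makeham_from_moments`.

```python
import math

def makeham_from_moments(mu1, mu2, sigma, total, h, k):
    """Solve for Makeham's constants from two moments about the origin h.

    mu1, mu2 : first and second moments (unit: one year) of the products
               (normal-curve exposed) x (colog p_x), taken about h
    sigma    : standard deviation of the normal curve used as exposed
    total    : N1 + N2, the total frequency of the products
    h        : origin of the normal curve (an age)
    k        : power of ten such that y0 = 10**k / (sigma * sqrt(2*pi))
    """
    t_minus_h = (mu2 - sigma ** 2) / mu1
    log10_c = t_minus_h / sigma ** 2 * math.log10(math.e)   # equation II.
    n2 = mu1 * total / t_minus_h
    n1 = total - n2
    big_h = 10.0 ** (log10_c * (h + 0.5 * t_minus_h))       # H = c^((t+h)/2)
    a = n1 / 10.0 ** k
    b = n2 / (10.0 ** k * big_h)
    return t_minus_h, log10_c, a, b

# Figures quoted in Art. 14; k = 7 is an assumption.
t_h, log10_c, A, B = makeham_from_moments(
    mu1=7.080920, mu2=164.384085, sigma=10.0, total=136387.0, h=55.5, k=7)
```

With these inputs the sketch should return values close to the $t - h$, $\log_{10} c$, $A$ and $B$ printed in Art. 14, small differences being due to rounding in the quoted moments.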

15. When dealing with the adjustments for moments it was remarked that it is best to use unadjusted moments except when there is high contact at each end of the curve. In some cases the unadjusted moments are rather far from the truth, and consequently the curve obtained is by no means the best that can be found. This most frequently happens when the curve rises very abruptly, as in our example for Type I., or when it takes the form of the example for Type III., the reason being that, in such cases, the assumption that an area is concentrated at the middle of a base of unit length involves a considerable error. In the Type I. example, for instance, the first group was assumed to be an ordinate at 17, whereas the curve starts at age 16.76, and the central point ought, therefore, to be later. The results can sometimes be improved considerably by basing a second graduation on the first and, in the particular case just referred to, since the first group is too large, we might assume that the curve should start at 17.5. It is unnecessary to find as many as four moments, for by assuming that the start of the curve and the range ($b$) are known, the equations on p. 59, giving the moments about the start of the curve, afford a very simple solution. We have

$$\mu'_1 = \frac{b(m_1+1)}{m_1+m_2+2} \quad\text{and}\quad \mu'_2 = \frac{b^2(m_1+1)(m_1+2)}{(m_1+m_2+2)(m_1+m_2+3)},$$

and writing

$$\gamma_1 = \frac{\mu'_1}{b} \quad\text{and}\quad \gamma_2 = \frac{\mu'_2}{\mu'_1 b},$$

we have

$$m_1 + 1 = \frac{\gamma_1(\gamma_2 - 1)}{\gamma_1 - \gamma_2} \quad\text{and}\quad m_2 + 1 = \frac{(\gamma_2 - 1)(1 - \gamma_1)}{\gamma_1 - \gamma_2},$$

where $\mu'$ is written for a moment about the start of the curve.

16. If the range of a curve can be fixed by general considerations, a good deal of labour can thus be saved, while, if the start of the curve is known, the following solution depending on three moments is of use. Writing

$$\lambda_2 = \frac{\mu'_2}{\mu'^2_1} \quad\text{and}\quad \lambda_3 = \frac{\mu'_3}{\mu'_1\mu'_2},$$

the values of the constants in the equation to the curve are given by

$$m_1 + 1 = \frac{2(\lambda_2 - \lambda_3)}{2\lambda_3 - \lambda_2 - \lambda_2\lambda_3},$$

$$m_2 + 1 = \frac{2(\lambda_2 - \lambda_3)(\lambda_3 - 1)(1 - \lambda_2)}{(2\lambda_3 - \lambda_2 - \lambda_2\lambda_3)(1 + \lambda_3 - 2\lambda_2)},$$

and

$$m_1 + m_2 + 2 = \frac{2(\lambda_2 - \lambda_3)}{1 + \lambda_3 - 2\lambda_2}.$$

17. Returning to the example of Type I., and considering the line for age 22 in the table on p. 54, we see that 4.175 and 14.634 give $S_2$ and $S_3$ excluding the first group, and the moments about age 17 are then found to be 4.175 and 29.268; transferring to 17.5, we have 4.075 and 24.268; adding the moments for the first group, $.034 \times \frac{1}{5}$ and $.034 \times \left(\frac{1}{5}\right)^2$ respectively, $\mu'_1 = 4.0818$ and $\mu'_2 = 24.26936$. Assuming a range of 15.5, and using the formulas given above,

    $m_1$ = .3498
    $m_2$ = 2.7758
    $a_1$ = 1.735
    $a_2$ = 13.765
    $y_0$ = 154.2

and the mode is $17.5 + 1.735 \times 5 = 26.175$.

From these values the graduated figures for the first four groups are 37, 140, 152, 143, which is an improvement on the figures obtained previously. This example is used for convenience, but with regard to adjustments in the particular case the remarks in the footnote on p. 55 should be borne in mind.
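The two-moment solution used in Art. 17 can be checked with the following sketch. It assumes the curve is written as $y = y_0 x^{m_1}(b-x)^{m_2}$ with the origin at the start, and that $\gamma_1 = \mu'_1/b$ and $\gamma_2 = \mu'_2/(\mu'_1 b)$; the function names are hypothetical.

```python
def type1_from_start(mu1, mu2, b):
    """Exponents m1, m2 of a Type I curve from two moments about its start.

    mu1, mu2 : first and second moments about the start of the curve
    b        : assumed range of the curve (same unit as the moments)
    """
    g1 = mu1 / b
    g2 = mu2 / (mu1 * b)
    m1 = g1 * (g2 - 1.0) / (g1 - g2) - 1.0
    m2 = (g2 - 1.0) * (1.0 - g1) / (g1 - g2) - 1.0
    return m1, m2

def mode_distances(m1, m2, b):
    """Distances a1, a2 of the mode from the two ends (a1/a2 = m1/m2, a1 + a2 = b)."""
    a1 = b * m1 / (m1 + m2)
    return a1, b - a1

# Moments and range quoted in Art. 17 (unit: 5 years of age).
m1, m2 = type1_from_start(4.0818, 24.26936, 15.5)
a1, a2 = mode_distances(m1, m2, 15.5)
mode_age = 17.5 + 5.0 * a1
```

The results should agree with the constants quoted in Art. 17 to about three decimal places; the last figure may differ slightly because the moments are rounded.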

18. As a second case, the example for Type III. may be considered. Assuming that the value of $p$ ($-.0783584$) is not to be altered, then the first moment about the start of the curve (see p. 68) $= \dfrac{p+1}{\gamma} = \dfrac{.9216416}{\gamma}$. Assuming the curve to start at .8,* the first moment about the point is calculated as follows:

     44 x  .1 =   4.4
    135 x  .7 =  94.5
     45 x 1.7 =  76.5
     12 x 2.7 =  32.4
      8 x 3.7 =  29.6
      3 x 4.7 =  14.1
      1 x 5.7 =   5.7
      3 x 6.7 =  20.1
    ---          -----
    251          277.3

The

first

moment
The value
19,
8,
-

277-3
is

therefore
is

..

2ol

1*0875.

and hence
is

ry -84752.
47,
i/

of y
2,
1.

123, 48,

3,
r

205-0, and the graduation The equation to the curve

is

= 205-0ctf~'

07186

e~' 84751

with the origin at the start of the curve, so that in this case the $y_0$ was calculated by the formula

$$y_0 = \frac{N\gamma^{p+1}}{\Gamma(p+1)},$$

which the reader will have no difficulty in reproducing for himself. When moments are calculated about the start of the curve and the form taken is that of the present curve, it is convenient to use the equation in this form rather than in that given on p. 65. It may be mentioned that as the value of $p$ is nearly zero in the particular example, a good result would be obtained by assuming that value, or, which is the same thing, by putting $y = y_0 e^{-\gamma x}$; $\gamma$ now becomes $\dfrac{1}{1.0875} = .91958$, and $y_0 = 251 \times .91958 = 230.8$, and the graduation is 43, 125, 50, 20, 8, 3, 1, 1.

19. It sometimes happens that the error involved in the calculations of the moment tends to balance that resulting from the curve not starting at the beginning of the unit base assumed for the first group. An instance of this is the first example of Table I., for which the mean is at duration 5.182, and the moments and constants are

    $\mu_2$ = 17.63688        $\beta_1$ = 3.34846
    $\mu_3$ = 135.5361        $\beta_2$ = 6.18392
    $\mu_4$ = 1923.565        $\kappa$  = -1.307

* Strictly should be rather earlier, because the ordinates are decreasing very rapidly.


so that the curve will
7

be of Type
(i29fi85
t

I.

and equation
'

to

it is

= -89O82 y-'

(25-49729-ci0

624275

where the origin is at 1*02897 where the curve starts. The graduation by this curve is shown in the following
table

Duration.

Withdrawals.

Graduated by

Type

I.

curve.

1 2

1
5

308 200 118 69


59

312 198 101 76

58
15

6 7 8 9 10
11 12 13

u
29 28 26
2L 18
18

37 30 25
21 18 15 13 11 9 7 6 5 1
3 2 2
1 1

12
11
5

14
15 16 17 IS 19

11
7

6
1

20 21 22 23 21

3
1

3 2

1,000

1,000

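20. The calculation of the graduated area of the first group may present a difficulty, as a quadrature formula cannot be applied, and the following method gives the best way of obtaining a correct value:

$$\int_0^x y_0 x^{m_1}(b-x)^{m_2}\,dx = \int_0^x y_0 b^{m_2} x^{m_1}\left\{1 - \frac{m_2}{b}x + \frac{m_2(m_2-1)}{2!\,b^2}x^2 - \dots\right\}dx$$

$$= y_0 b^{m_2} x^{m_1+1}\left\{\frac{1}{m_1+1} - \frac{m_2}{b}\cdot\frac{x}{m_1+2} + \frac{m_2(m_2-1)}{2!\,b^2}\cdot\frac{x^2}{m_1+3} - \dots\right\},$$

which is a rapidly convergent series when $x$ is small. In the last example, where $x$ is $1.5 - 1.02897 = .47103$, the second term barely affects the result. $y_0$ must, of course, be calculated by the formula

$$y_0 = \frac{N\,\Gamma(m_1+m_2+2)}{b^{m_1+m_2+1}\,\Gamma(m_1+1)\,\Gamma(m_2+1)},$$

which is an analogous form to that given for similar Type III. curves in Art. 18.

21. The expression for finding the area of the first group in Type III. curves is

$$\int_0^x y_0 x^p e^{-\gamma x}\,dx = y_0 x^{p+1}\left(\frac{1}{p+1} - \frac{\gamma x}{p+2} + \frac{\gamma^2 x^2}{2!\,(p+3)} - \dots\right)$$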
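The two series of Arts. 20 and 21 are easily summed by machine. The sketch below is illustrative only: the function names are hypothetical, the series are taken in the forms written above, and the check by Simpson's rule (after the substitution $u = t^{p+1}$, which removes the infinite ordinate at the start when $p$ is negative) is an added device, not part of the text.

```python
import math

def type3_first_area(y0, p, gamma, x, terms=20):
    """Area from 0 to x under y0 * t**p * exp(-gamma*t), by the series of Art. 21."""
    total, coeff = 0.0, 1.0                 # coeff = (-gamma*x)**k / k!
    for k in range(terms):
        total += coeff / (p + k + 1)
        coeff *= -gamma * x / (k + 1)
    return y0 * x ** (p + 1) * total

def type1_first_area(y0, m1, m2, b, x, terms=40):
    """Area from 0 to x under y0 * t**m1 * (b-t)**m2, by the series of Art. 20."""
    total, coeff = 0.0, 1.0                 # coeff = binom(m2, k) * (-x/b)**k
    for k in range(terms):
        total += coeff / (m1 + k + 1)
        coeff *= (m2 - k) / (k + 1) * (-x / b)
    return y0 * b ** m2 * x ** (m1 + 1) * total

def substituted_integral(f, p, x, n=2000):
    """Integral of t**p * f(t) from 0 to x by Simpson's rule with u = t**(p+1)."""
    upper = x ** (p + 1)
    step = upper / n
    s = 0.0
    for i in range(n + 1):
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        s += weight * f((i * step) ** (1.0 / (p + 1)))
    return s * step / 3.0 / (p + 1)

# Type III. example of Art. 18: first group runs from the start to .2 beyond it.
p, g, n_total = -0.0783584, 0.84752, 251.0
y0 = n_total * g ** (p + 1) / math.gamma(p + 1)
series_area = type3_first_area(y0, p, g, 0.2)
check_area = y0 * substituted_integral(lambda t: math.exp(-g * t), p, 0.2)
```

For the Type III. example the area of the first group comes out between 46 and 47, in keeping with the graduated figure quoted in Art. 18.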

PART II.

CHAPTER VI.

CORRELATION.

1. Two measurable characteristics, A and B, are said to be correlated, when, with different values, $x$, of A, we do not find the same value, $y$, of B, equally likely to be associated. In other words, certain values of B are relatively more likely to occur with the value $x$ than others.

2. In practice, as one characteristic increases steadily, the other generally either increases or decreases, and it is exceptional to find that while one increases steadily the other increases for a time and then decreases. Put in a rough-and-ready way, the definition can, in particular cases with which actuaries are familiar, be stated: "The mean ages at maturity in Endowment Assurances increase with the unexpired term when the policies are grouped according to the unexpired term"; or, "the older a bachelor, the less likely is he to marry and have children." There is correlation between ages at maturity and unexpired term, and between the age of a bachelor and the number of children, and it is required to find a method of measuring the amount of correlation statistically. The easiest way to appreciate the nature of the problem is with the help of a table of double entry, such as the following, which gives particulars of 2,870 endowment assurances grouped according to their unexpired term. A little examination of the table shows that there is a connection between the two functions, but does not give any measure of the correlation suitable for comparison with the experiences of other offices or with that of the same office at a later date:
Unexpired term of
:

Central Age at Maturity.


Total.

Mean
Maturity

Endowment
Assurances.

Age
for

30 35 40 45

50

55

60

65

70

75

the row.

0-4
5-9

2
24 1
IS

2
20
1
15

26
8

6
4

14

6
4

56
2
6

53-75

16

12

2
12

6
9

62
(5

36
3

40
127 237 271
231

22
3

172
1
6
...
::

55-03
55-85

10-14
15-19
3
6

2
10
...
:.

9
S

17

117
4

99
2

52
2

8
4
2

432 665

6
4
...

24 145 155
3
2
l

84 11
l

56-59 57-58
57-88

20-24 25-29

3 9
3

133

167

78 20
71
i

674
538
247
77

90 123
2
1

11
2

3
3

30-34
35-39

...

1
6

11
4

49
2

127

49
2

8
4

2
(5

59-94

6
3

49
2

22
3

61-04
62-50 65-00

40-44
45-49
...

2
4

3
4

...

8
1

12

1
Total
6

17

62

584 643 1,098 388 60


of small

8 2,870

Note. For explanation of small numbers, see Art. 16.

The above table is called a correlation or frequency table, and a column or row in it is called an array. The middle value of the variable with which the row is associated is called its type, so that the third column (i.e., that headed 40) would be called the $y$-array of type 40, and the fourth row would be called the $x$-array of type 17.5, because 17.5 is the middle of the 15-19 group.

3. Now, returning to our definition of correlation, and examining the last column of the table, which gives the mean age in each row, we see that the numbers in it tend to increase as we go down the column, and the age at maturity is therefore correlated with the unexpired term. The figure on p. 108 shows the series of mean values clearly.

4. A little consideration will lead to further information, for if we imagine a case in which there is no correlation, the series of the means of the rows will be independent of the other function; that is, they will run horizontally when plotted out as a diagram. Another point to be noted is that there are two kinds of correlation (positive and negative), for the functions may increase together, or one may increase and the other decrease; in the case of the endowment assurances the correlation is positive, because as the term increases the mean maturity age also increases.

[FIGURE: the series of mean ages at maturity for the rows of the correlation table, plotted against the unexpired term.]

5. These introductory remarks will give an indication of the nature of the problem to be solved, and may help to render the following proof easier to follow. It should be remembered that the proof deals with a function of $n$ variables.* The table given above has only two variables, but it is easy to see how more variables may be introduced in similar tables; for instance, endowment assurances by limited payments give three variables (term of premium, term of assurance and age of life), while an increase in the number of lives, say, joint-life endowment assurances by limited payments, gives four variables.

We may now consider what equation will represent the numbers in the body of the correlation table and how this equation can be utilized to express the relationship between the two functions with which the correlation table is concerned.

[6.] Let $\eta_1, \eta_2, \eta_3 \dots \eta_n$ be deviations from their respective means of a complex of measurable characteristics. The sizes of the functions measured, or organs, are determined by a large number of independent contributory causes. Let there be $m$ of these causes, and let their deviations from their means be $\epsilon_1, \epsilon_2, \epsilon_3 \dots \epsilon_m$; then $\eta_1, \eta_2, \eta_3 \dots \eta_n$ will be functions of $\epsilon_1, \epsilon_2, \epsilon_3 \dots \epsilon_m$. Further, if $m > n$, certain of the $\epsilon$'s will appear only in certain of the $\eta$'s, and the $\epsilon$'s will not be fully determined for a given $\eta$ complex. We also assume that the variations in intensity of the contributory causes are small as compared with their absolute intensity, and that these variations follow the normal law of distribution; that is, we assume that the deviations from the mean value can be graduated by the normal curve of error (Type VII.). The mean complex being reached with the mean intensities of contributory causes, we have, by the principle of the superposition of small quantities,

$$\begin{aligned}
\eta_1 &= a_{11}\epsilon_1 + a_{12}\epsilon_2 + a_{13}\epsilon_3 + \dots + a_{1m}\epsilon_m \\
\eta_2 &= a_{21}\epsilon_1 + a_{22}\epsilon_2 + a_{23}\epsilon_3 + \dots + a_{2m}\epsilon_m \\
&\;\;\vdots \\
\eta_n &= a_{n1}\epsilon_1 + a_{n2}\epsilon_2 + a_{n3}\epsilon_3 + \dots + a_{nm}\epsilon_m
\end{aligned} \qquad \text{(i.)}$$

* A student reading the subject for the first time would do well to omit the paragraphs indicated by brackets. After the statistical idea underlying correlation has been understood, it will be found easier to follow the theoretical work.


The $a$'s are coefficients whose values have to be determined, and any of the system of $a$'s may be zero, for a particular contributory cause may have no effect on a particular result. Further, the chance that we have a conjunction of contributory causes lying between $\epsilon_1$ and $\epsilon_1 + \delta\epsilon_1$, $\epsilon_2$ and $\epsilon_2 + \delta\epsilon_2$, $\dots$ and between $\epsilon_m$ and $\epsilon_m + \delta\epsilon_m$ will be given by

$$P = C e^{-\frac{1}{2}\left(\frac{\epsilon_1^2}{\kappa_1^2} + \frac{\epsilon_2^2}{\kappa_2^2} + \dots + \frac{\epsilon_m^2}{\kappa_m^2}\right)}\,\delta\epsilon_1\,\delta\epsilon_2 \dots \delta\epsilon_m \qquad \text{(ii.)}$$

where the standard deviations of the distributions are $\kappa_1, \kappa_2 \dots \kappa_m$ and $C$ is constant.* Now, by (i.) let $n$ of the variables $\epsilon$, say the first $n$, be replaced by the variables $\eta$, then the probability that we have a complex with organs lying between $\eta_1$ and $\eta_1 + \delta\eta_1$, $\eta_2$ and $\eta_2 + \delta\eta_2$, $\dots$ $\eta_n$ and $\eta_n + \delta\eta_n$, together with a series of contributory causes lying between $\epsilon_{n+1}$ and $\epsilon_{n+1} + \delta\epsilon_{n+1}$, $\epsilon_{n+2}$ and $\epsilon_{n+2} + \delta\epsilon_{n+2}$, $\dots$ $\epsilon_m$ and $\epsilon_m + \delta\epsilon_m$, will be

$$P' = C' e^{-\phi^2}\,\delta\eta_1\,\delta\eta_2 \dots \delta\eta_n\,\delta\epsilon_{n+1} \dots \delta\epsilon_m$$

where $C'$ is a constant, a function of $C$ and the $a$'s, and $\phi^2$ consists of the following parts:

(i.) A quadratic function of the $\eta$'s from $\eta_1$ to $\eta_n$.

(ii.) A quadratic function of the $\epsilon$'s from $\epsilon_{n+1}$ to $\epsilon_m$.

(iii.) A series of functions of the type

$$\begin{aligned}
&\epsilon_{n+1}(b_{1,n+1}\eta_1 + b_{2,n+1}\eta_2 + \dots + b_{n,n+1}\eta_n) \\
&\epsilon_{n+2}(b_{1,n+2}\eta_1 + b_{2,n+2}\eta_2 + \dots + b_{n,n+2}\eta_n) \\
&\;\;\vdots \\
&\epsilon_m(b_{1,m}\eta_1 + b_{2,m}\eta_2 + \dots + b_{n,m}\eta_n)
\end{aligned}$$

where some of the $b$'s may be zero. Now, if $P'$ be integrated for all values from $-\infty$ to $+\infty$ of all the contributory causes $\epsilon_{n+1}, \epsilon_{n+2} \dots \epsilon_m$, we shall have the whole chance of a complex with organs falling between $\eta_1$ and $\eta_1 + \delta\eta_1$, $\eta_2$ and $\eta_2 + \delta\eta_2$, $\dots$ and $\eta_n$ and $\eta_n + \delta\eta_n$. But every time we integrate with regard to an $\epsilon$, say $\epsilon_{n+1}$, we alter the constants of each contributory part of $\phi^2$, but do not alter the triple constitution of $\phi^2$ except to cause one $\epsilon$ to disappear from its (ii.) and (iii.) constituents. At the same time we alter $C'$ without introducing into it any terms in $\eta$. Thus, finally, after $m - n$ integrations, $\phi^2$ is reduced to its first constituent, or we conclude that the chance of a complex of organs between $\eta_1$ and $\eta_1 + \delta\eta_1$, $\eta_2$ and $\eta_2 + \delta\eta_2$, $\dots$ $\eta_n$ and $\eta_n + \delta\eta_n$ occurring is given by

$$P = C e^{-\chi^2}\,\delta\eta_1\,\delta\eta_2\,\delta\eta_3 \dots \delta\eta_n \qquad \text{(iii.)}$$

where $\chi^2$ is a quadratic function of the $\eta$'s. This is the law of frequency for the complex.

* Consider any particular case of the normal curves; the chance of getting a result between $\epsilon_1$ and $\epsilon_1 + \delta\epsilon_1$ when the distribution is Type VII. is $y_0 e^{-\epsilon_1^2/2\kappa_1^2}\,\delta\epsilon_1$, where $\kappa_1$ is the standard deviation; similarly with each of the other causes. As the causes are independent, the product of the various chances gives the required chance.


Consider the expression (iii.) but replace $\chi^2$ by a quadratic function, then

$$P = C e^{-\{S_1(c_{pp}\eta_p^2) + 2S_2(c_{pq}\eta_p\eta_q)\}}$$

Here $C$, $c_{pp}$, $c_{pq}$ are constants, and $S_1$ denotes a summation for every value of $p$, and $S_2$ for every pair of values of $p$ and $q$ in the series.

Taking the simplest case, when there are two variables, this becomes

$$P = C e^{-(c_{11}\eta_1^2 + c_{22}\eta_2^2 + 2c_{12}\eta_1\eta_2)}$$

Integrate for all values of $\eta_1$ from $-\infty$ to $+\infty$ and we must have the normal curve of $\eta_2$ variation.*

$$\therefore\ \frac{1}{2\sigma_2^2} = c_{22}\left(1 - \frac{c_{12}^2}{c_{11}c_{22}}\right)$$

Similarly integrating for all values of $\eta_2$,

$$\frac{1}{2\sigma_1^2} = c_{11}\left(1 - \frac{c_{12}^2}{c_{11}c_{22}}\right)$$

Integrating for all values of $\eta_1$ and $\eta_2$ to obtain the total frequency, we have

$$N = \frac{C\pi}{\sqrt{c_{11}c_{22} - c_{12}^2}}$$

Now put $r = -\dfrac{c_{12}}{\sqrt{c_{11}c_{22}}}$ and write $x$ and $y$ for $\eta_1$ and $\eta_2$, and we have

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\; e^{-\frac{1}{2}\left\{\frac{x^2}{\sigma_1^2(1-r^2)} - \frac{2rxy}{\sigma_1\sigma_2(1-r^2)} + \frac{y^2}{\sigma_2^2(1-r^2)}\right\}} \qquad \text{(iv.)}$$

The equation just given is a graduation formula capable of representing tables like that on p. 107. It has been obtained on certain assumptions which may not all be realised in practice, but it has a far larger scope in practical work than the analogous normal curve of error has in frequency-curve operations.

* In Appendix III., some integrals connected with the normal curve of error are dealt with. The result can be reached at once by rearranging the index in the expression for $P$ as a perfect square in $\eta_1$ together with the term $-c_{22}\left(1 - \dfrac{c_{12}^2}{c_{11}c_{22}}\right)\eta_2^2$, as is done in No. III. of Appendix III.
[7.] Since $\dfrac{x^2}{2\sigma_1^2}$ represents one series of variations and $\dfrac{y^2}{2\sigma_2^2}$ the other, it follows that if there were no correlation at all the frequency of any particular result, $x$, $y$, would be the product of the two chances, i.e., proportional to $e^{-\left(\frac{x^2}{2\sigma_1^2} + \frac{y^2}{2\sigma_2^2}\right)}$, which means that the size of the term $\dfrac{rxy}{\sigma_1\sigma_2(1-r^2)}$, or $r$, is a measure of the correlation, and $r$ is called the coefficient of correlation.

[8.] Perhaps the easiest way to see how its value can be obtained from an actual experience is by looking at the matter from the curve-fitting point of view, and dealing with the expression for $z$ by moments. It will be remembered that the moments were obtained by summing for all values of the frequencies multiplied by the powers of the independent variable, but as we now have two variables we can take $n$ powers of the one and $m$ of the other. Thus, if we take the second powers of the $x$ distances and the zero power of the $y$ distances (i.e., neglect $y$), we obtain the ordinary second moment of the frequencies reckoned only in the $x$ direction. Similarly, with the second powers of the $y$ distances and the zero power of the $x$ distances. These calculations give two of the unknown constants, for $\sigma = \sqrt{\mu_2}$ (see Type VII.). There is, however, another second-order term to be considered, namely, that obtained by taking the first powers of both the $x$ distances and the $y$ distances, i.e., multiplying the frequencies by $xy$. This may be written

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} z\,xy\,dx\,dy,$$

where $z$ has the value given above. This double integral reduces to $Nr\sigma_1\sigma_2$ (see Appendix III.), or the $(xy)$ moment of the total distribution $= Nr\sigma_1\sigma_2$; or

$$r = \frac{(xy)\ \text{moment}}{N\sigma_1\sigma_2}$$

To calculate the coefficient of correlation we have, therefore, to find the $x^2$ moment, the $y^2$ moment and the $xy$ moment about the centroid vertical.

[9.] Now we have seen that equation (iv.) gives an expression for describing a correlation table such as the table of endowment assurances on p. 107, and from that equation it is clear that the distribution of an array of any type $t$ is

$$z = z_0 e^{-(g_1x^2 - 2htx + g_2t^2)}$$

where $g_1$, $h$ and $g_2$ are written for the longer expressions in $\sigma_1$, $\sigma_2$ and $r$. If we make the index into a perfect square, we have

$$z = z'_0 e^{-g_1\left(x - \frac{ht}{g_1}\right)^2}$$

This last expression is a normal distribution whose standard deviation depends only on $g_1$, and is therefore the same whatever the type; but its mean differs from that of the whole surface by $\dfrac{ht}{g_1}$, and it follows that

(1) The deviation of the mean of the array is directly proportional to the type; or the means of the arrays increase or decrease in arithmetical progression or lie on a straight line (called the regression line).

(2) The standard deviations of all parallel arrays are equal and independent of their types.
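The statements of Arts. [8.] and [9.] can be verified numerically from equation (iv.). The sketch below is an illustration only: the constants $N = 1000$, $\sigma_1 = 2$, $\sigma_2 = 3$, $r = .6$ are invented for the purpose, the function names are hypothetical, and the integrals are approximated by a simple mid-point rule.

```python
import math

def surface(x, y, n, s1, s2, r):
    """Ordinate z of the normal correlation surface, equation (iv.)."""
    q = (x * x / (s1 * s1) - 2 * r * x * y / (s1 * s2) + y * y / (s2 * s2)) / (1 - r * r)
    return n / (2 * math.pi * s1 * s2 * math.sqrt(1 - r * r)) * math.exp(-0.5 * q)

def product_moment(n, s1, s2, r, steps=200, span=8.0):
    """Mid-point approximation to the double integral of z*x*y."""
    hx, hy = 2 * span * s1 / steps, 2 * span * s2 / steps
    total = 0.0
    for i in range(steps):
        x = -span * s1 + (i + 0.5) * hx
        for j in range(steps):
            y = -span * s2 + (j + 0.5) * hy
            total += x * y * surface(x, y, n, s1, s2, r)
    return total * hx * hy

def array_mean_and_sd(t, s1, s2, r, steps=2000, span=10.0):
    """Mean and standard deviation of the x-array of type y = t."""
    hx = 2 * span * s1 / steps
    w0 = w1 = w2 = 0.0
    for i in range(steps):
        x = -span * s1 + (i + 0.5) * hx
        z = surface(x, t, 1.0, s1, s2, r)
        w0, w1, w2 = w0 + z, w1 + z * x, w2 + z * x * x
    mean = w1 / w0
    return mean, math.sqrt(w2 / w0 - mean * mean)

# Invented constants, for illustration only.
N, S1, S2, R = 1000.0, 2.0, 3.0, 0.6
pm = product_moment(N, S1, S2, R)
mean_a, sd_a = array_mean_and_sd(1.5, S1, S2, R)
mean_b, sd_b = array_mean_and_sd(3.0, S1, S2, R)
```

Here `pm` should be close to $Nr\sigma_1\sigma_2 = 3600$, the means of the two arrays should be in the ratio of their types, and the two standard deviations should be equal (each $\sigma_1\sqrt{1-r^2}$ on the usual theory).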
10. Before returning to the statistical example it will be well to consider the following proof,* which proceeds on the principle that we require to fit a straight line ($y = a_2 + b_2x$) to the correlation table.

Let $x_1 y_1$, $x_2 y_2$, &c., be associated deviations, and let $y = a_2 + b_2x$ be the straight line used in the graduation; then the graduated figure corresponding to $x_1$ is $a_2 + b_2x_1$.

Now, if we proceed as we did in fitting frequency-curves by the method of moments, we make the graduated and ungraduated areas, means, &c., equal, or

$$(a_2 + b_2x_1) + (a_2 + b_2x_2) + \dots = y_1 + y_2 + \dots$$

or

$$Na_2 + b_2S'(x) = S'(y).$$

And

$$(a_2 + b_2x_1)x_1 + (a_2 + b_2x_2)x_2 + \dots = x_1y_1 + x_2y_2 + \dots$$

or

$$a_2S'(x) + b_2S'(x^2) = S'(xy),$$

where $S'(x)$ gives the first moment of the $x$'s, $S'(y)$ the first moment of the $y$'s, $S'(x^2)$ the second moment for the $x$'s, and $S'(xy)$ a moment in which any frequency is multiplied by the product of the distances in the $x$ and $y$ directions. If these moments are now transferred to the mean, as was done in fitting the frequency-curves, we have

$$Na_2 = 0, \quad\text{or}\quad a_2 = 0;$$

and

$$b_2S(x^2) = S(xy), \quad\text{or}\quad b_2 = \frac{S(xy)}{S(x^2)}.$$

But we have already seen that the second moment of the whole frequency ($N$) is $N\sigma_1^2$;

$$\therefore\ b_2 = \frac{S(xy)}{N\sigma_1^2},$$

and similarly, if the line $x = a_1 + b_1y$ is fitted, $a_1 = 0$ and $b_1 = \dfrac{S(xy)}{N\sigma_2^2}$.

If we now write $S(xy) = N\sigma_1\sigma_2 r$, we have

$$y = r\frac{\sigma_2}{\sigma_1}x \quad\text{and}\quad x = r\frac{\sigma_1}{\sigma_2}y,$$

where $r$ will represent the statistical measure of correlation (coefficient of correlation) between the $x$'s and $y$'s.

* This proof is a modification of one given by Mr. G. U. Yule, in the J.S.S., vol. lx., pp. 812, &c., and Proc. Roy. Soc., 1897, vol. lx., pp. 477, &c. It has been altered to avoid the introduction of the method of least squares.
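The calculation described in Art. 10 can be sketched for a table of double entry as follows. The small table used here is invented purely for illustration, and the function name `correlation_from_table` is hypothetical.

```python
import math

def correlation_from_table(x_types, y_types, freq):
    """Coefficient of correlation and the two regression slopes from a table.

    freq[i][j] is the frequency in the row of type y_types[i] and the
    column of type x_types[j]; moments are referred to the means.
    """
    n = sum(sum(row) for row in freq)
    mean_x = sum(f * x for row in freq for f, x in zip(row, x_types)) / n
    mean_y = sum(sum(row) * y for row, y in zip(freq, y_types)) / n
    sxx = syy = sxy = 0.0
    for row, y in zip(freq, y_types):
        for f, x in zip(row, x_types):
            dx, dy = x - mean_x, y - mean_y
            sxx += f * dx * dx          # S(x^2) = N * sigma_1^2
            syy += f * dy * dy          # S(y^2) = N * sigma_2^2
            sxy += f * dx * dy          # S(xy)  = N * r * sigma_1 * sigma_2
    r = sxy / math.sqrt(sxx * syy)
    b2 = sxy / sxx                      # slope of the line y = b2 * x
    b1 = sxy / syy                      # slope of the line x = b1 * y
    return r, b2, b1

# Invented table of 22 cases, types 1, 2, 3 in each direction.
table = [[4, 2, 0],
         [2, 6, 2],
         [0, 2, 4]]
r, b2, b1 = correlation_from_table([1, 2, 3], [1, 2, 3], table)
```

The product of the two slopes equals $r^2$, which is one way of seeing that the two regression lines coincide only when the correlation is perfect.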

11. At first sight it may appear that the two equations just given, showing the relationship between $x$ and $y$, are not consistent. It must, however, be remembered that the first, $y = r\dfrac{\sigma_2}{\sigma_1}x$, gives the mean values of $y$ corresponding to particular values of $x$, while the second gives the mean values of $x$ corresponding to particular values of $y$. To take a simple case as an example, assume that $\sigma_1 = \sigma_2 = 1$ and that $r = .1$, then if $x = 0$ the mean of the $y$'s corresponding to this value of $x$ is 0, and if $x = 20$ the mean of the $y$'s will be 2. When we turn the matter round, however, we cannot, of course, assert that the mean of the $x$'s corresponding to $y = 2$ is 20; it will be .2.

12. After this preliminary remark we may return to the two equations and consider how it is that $r$ is a measure of correlation and whether it can always be treated as a satisfactory measure. We can best see that $r$ is a measure of correlation by rewriting the equation $y = r\dfrac{\sigma_2}{\sigma_1}x$ in the form

$$\frac{y}{\sigma_2} = r\frac{x}{\sigma_1} \quad\text{or}\quad Y = Xr,$$

and we can then interpret it as giving one characteristic in terms of the other where the mean is the origin (this is due to referring moments to the mean in the proof) and the unit of measurement is the standard deviation in each case. In this form we see at once that as one characteristic ($X$) increases the mean ($Y$) of the corresponding series of the other characteristic increases to an extent which depends on the value of $r$, while if $r$ is negative $Y$ decreases. It is only if $r$ is unity that the increments of $X$ and $Y$ become equal and absolute correlation is reached. If $Y$ remains constant as the value of $X$ increases, the definition at the beginning of this chapter tells us that there is no correlation, and $r$ in this case is zero, as can easily be seen from the equation $Y = Xr$. The value of $r$ lies between $-1$ and $+1$ (see Chap. X., Art. 4), and its sign has no influence on its numerical value. In other words a large negative value does not mean that the two characteristics do not vary together but only that increases in the one correspond with decreases in the other; the numerical value of $r$ indicates the extent to which variations in the two characteristics correspond. This indication is satisfactory provided the means, when plotted in a diagram such as that on p. 108, fall approximately in a straight line (i.e., "regression"* is linear). Distinct deviations from linearity are not so common as might be supposed, but if they are very marked in any case, $r$ ceases to be an entirely satisfactory measure of the correlation.

13. We may take this opportunity of removing another difficulty that is sometimes met. Some students have a doubt which is best shown by the question, "How can there be perfect correlation when one thing is always smaller than another?" As an example we may take the correlation between the lengths of a man's right arm and his left arm; here the coefficient of correlation would be practically unity, and since each characteristic is measured from its own mean, and in terms of its own standard deviation, the coefficient would not be decreased if every left arm was a certain number of inches shorter than the right or if it bore a fixed relation in length, say $\frac{99}{100}$, to the right arm.

14. When dealing with moments on p. 17, we noticed that though we required to find them about the mean, it was best in practice to take them about some point fixed arbitrarily so as to avoid fractions and then adjust the results afterwards. The values of $\sigma_1$ and $\sigma_2$ can, of course, be found with the help of the formula on p. 19, viz., $\nu_2 = \nu'_2 - d^2$. The deduction of 1/12 from the second moment should be made for the same reason and in the same cases as in frequency-curve fitting.

With regard to the product moment we have

$$S(x'y') = S\{(x + d_1)(y + d_2)\} = S(xy) + d_1 S(y) + d_2 S(x) + N d_1 d_2,$$

or since $S(x) = S(y) = 0$,

$$S(xy) = S(x'y') - N d_1 d_2,$$

where $S(x'y')$ is calculated about a point distant $d_1$ from the mean of the x's and $d_2$ from the mean of the y's.

* The term "regression" was invented by Mr. Francis Galton in connection with the study of heredity; it indicates the way the children of particular parents tend to "step back" to the ordinary population mean.

15. The statistical example on p. 107 can now be worked through. It will be found to make the proofs and methods given above much easier to grasp.

The point about which moments are to be calculated is first fixed, say, the middle of the group corresponding to maturity age 60 and unexpired terms 20-24 years, and for the present the calculations are made about this point. The following table shows the calculation of the mean and second moment of the totals of the y-arrays, i.e., the totals at the bottom of the table, because columns are y-arrays and rows x-arrays:

Frequency.      x'     Frequency x x'     Frequency x (x')^2

       6        -6           -36                  216
       4        -5           -20                  100
      17        -4           -68                  272
      62        -3          -186                  558
     584        -2        -1,168                2,336
     643        -1          -643                  643
   1,098         0             0                    0
     388         1           388                  388
      60         2           120                  240
       8         3            24                   72

   2,870 = N           -2,121 + 532             4,825
                         = -1,589

$$d_1 = -\frac{1589}{2870} = -.553659$$


Hence, the mean age = 60 - 2.76830 = 57.23170, because the unit of grouping is 5 years.

$$\sigma_1^2 = \frac{4825}{2870} - d_1^2 - \frac{1}{12} \ \text{(Sheppard's adjustment)} = 1.37465 - .08333 = 1.29132$$

$$\sigma_1 = 1.13637$$
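The arithmetic of this table can be re-traced with a short sketch. The list `freq` holds the column totals as reconstructed from the table, and the variable names are illustrative only; they are not part of the original work.

```python
from math import sqrt

# Column totals of the correlation table for central ages 30, 35, ..., 75.
freq = [6, 4, 17, 62, 584, 643, 1098, 388, 60, 8]
# Deviations x' from the arbitrary origin (age 60), in units of 5 years.
x_dash = list(range(-6, 4))

n = sum(freq)                                              # N
first = sum(f * x for f, x in zip(freq, x_dash))           # S(f x')
second = sum(f * x * x for f, x in zip(freq, x_dash))      # S(f x'^2)

d1 = first / n                       # mean, measured from the arbitrary origin
var1 = second / n - d1 ** 2 - 1 / 12 # second moment about the mean, Sheppard-adjusted
mean_age = 60 + 5 * d1               # back to years
sigma1 = sqrt(var1)                  # in units of 5 years

print(n, first, second, round(mean_age, 4), round(sigma1, 5))
```

The same few lines, fed with the row totals and origin 22, give the mean unexpired term and the second standard deviation.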
Treating the rows in the same way, the following table was formed:

Frequency.      y'     Frequency x y'     Frequency x (y')^2

      56        -4          -224                  896
     172        -3          -516                1,548
     432        -2          -864                1,728
     665        -1          -665                  665
     674         0             0                    0
     538         1           538                  538
     247         2           494                  988
      77         3           231                  693
       8         4            32                  128
       1         5             5                   25

   2,870 = N          -2,269 + 1,300            7,209
                         = -969

$$d_2 = -\frac{969}{2870} = -.337631$$

Mean unexpired term = 22 - 1.68815 = 20.31185

$$\sigma_2^2 = \frac{7209}{2870} - d_2^2 - \frac{1}{12} = 2.31453$$

and

$$\sigma_2 = 1.52135$$

16. The value of $S(x'y')$ is formed with the help of the numbers appearing in very small type under the frequencies in the correlation table. The frequency 62 in the 50 column, for instance, is distanced three spaces upwards and two sideways from the arbitrary origin, so the value of x'y' by which it has to be multiplied is 3 x 2 = 6, as shown in the small type. The other figures are obtained in like manner, but the sign must be borne in mind. Any value from the left-hand upper division of the table, or in the lower right-hand division, will be positive, because the frequency will be multiplied by a product of an x and y having like signs, while any value from the other divisions will be negative, because the x and y by which the frequencies are multiplied are of opposite signs. The calculation of the product moment is as follows:
x'y'    Frequencies (f).                              Total of frequencies.    f x x'y'

  1     155 + 71 - 84 - 123                                   + 19               + 19
  2     145 + 99 + 11 + 49 - 11 - 52 - 49 - 90                 102                204
  3     24 + 36 + 3 + 22 - 22 - 6 - 9                           48                144
  4     6 + 6 + 8 + 3 - 6 - 8 - 11 - 2 + 117                   113                452
  5     1                                                        1                  5
  6     3 + 17 + 62 + 2 - 1 - 1 - 2                             80                480
  8     9 + 26                                                  35                280
  9     6                                                        6                 54
 10     2                                                        2                 20
 12     2 + 2 + 1                                                5                 60
 15     1                                                        1                 15
 18     1                                                        1                 18
 24     2                                                        2                 48

                                                                                1,799

$$S(xy) = S(x'y') - N d_1 d_2 = 1799 - N d_1 d_2 = 1262.51$$

$$r = \frac{S(xy)}{N \sigma_1 \sigma_2} = \frac{1262.51}{2870 \times 1.13637 \times 1.52135} = .25445.$$
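As a check on the figures just obtained, the following sketch recomputes r and the slope of the regression line from the five sums found above. The variable names are illustrative assumptions; only the sums themselves come from the tables.

```python
from math import sqrt

n = 2870
sum_x, sum_xx = -1589, 4825    # S(f x'), S(f x'^2) from the column totals
sum_y, sum_yy = -969, 7209     # S(f y'), S(f y'^2) from the row totals
sum_xy = 1799                  # S(x'y') about the arbitrary origin

d1, d2 = sum_x / n, sum_y / n
sigma1 = sqrt(sum_xx / n - d1 ** 2 - 1 / 12)   # Sheppard's adjustment applied
sigma2 = sqrt(sum_yy / n - d2 ** 2 - 1 / 12)

s_xy = sum_xy - n * d1 * d2          # product moment referred to the means
r = s_xy / (n * sigma1 * sigma2)     # coefficient of correlation
slope = r * sigma1 / sigma2          # x in terms of y, unit 5 years

print(round(s_xy, 2), round(r, 5), round(slope, 5))
```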
The coefficient of correlation between age at maturity and the unexpired term of endowment assurances is .25445.

The equation representing the one function in terms of the other is

$$x = .19007\,y$$

where all measurements are made from the mean and the unit is 5 years. The line drawn in the figure gives this result.

17. An alternative method similar to the summation method given in Art. 9, Chap. III. for moments can be conveniently used in connection with correlation tables. Taking the same example, we obtain from the given table another in the same form, giving the y sum of it by summing each column continuously, and then form a third table by summing the second table across continuously.

Table of the y-sum of Correlation Table.

Unexpired term                        Central Age at Maturity.
of Endowment
Assurances.      30     35     40     45      50      55      60      65     70     75    Totals.

 0-4              6      4     17     62     584     643   1,098     388     60      8     2,870
 5-9              4      4     17     60     558     637   1,084     382     60      8     2,814
10-14             3      3     15     54     496     601   1,044     360     58      8     2,642
15-19             3      1      6     37     379     502     917     308     50      7     2,210
20-24             0      1      0     13     234     347     680     224     39      7     1,545
25-29             0      0      0     10     101     180     409     146     19      6       871
30-34             0      0      0      1      11      57     178      75      8      3       333
35-39             0      0      0      0       0       8      51      26      0      1        86
40-44             0      0      0      0       0       2       2       4      0      1         9
45-49             0      0      0      0       0       0       0       1      0      0         1

Totals           16     13     55    237   2,363   2,977   5,463   1,914    294     49    13,381

The totals in the right-hand column of the second table give the first sum of the total in the right-hand column of the correlation table, and are the same as the column x = 30 in the third table. The total of the y sum, or of the first column of the xy table, gives the mean of the y's (13,381 ÷ 2,870), and similarly the sum of the first row gives the mean of the x's (18,501 ÷ 2,870).

Table of x-sum of above Table, i.e., Table giving all cases for xy group and over in Correlation Table.

Unexpired                             Central Age at Maturity.
term.
                 30      35      40      45      50      55      60      65     70     75    Totals.

 0-4          2,870   2,864   2,860   2,843   2,781   2,197   1,554     456     68      8    18,501
 5-9          2,814   2,810   2,806   2,789   2,729   2,171   1,534     450     68      8    18,179
10-14         2,642   2,639   2,636   2,621   2,567   2,071   1,470     426     66      8    17,146
15-19         2,210   2,207   2,206   2,200   2,163   1,784   1,282     365     57      7    14,481
20-24         1,545   1,545   1,544   1,544   1,531   1,297     950     270     46      7    10,279
25-29           871     871     871     871     861     760     580     171     25      6     5,887
30-34           333     333     333     333     332     321     264      86     11      3     2,349
35-39            86      86      86      86      86      86      78      27      1      1       623
40-44             9       9       9       9       9       9       7       5      1      1        68
45-49             1       1       1       1       1       1       1       1      0      0         8

Totals       13,381  13,365  13,352  13,297  13,060  10,697   7,720   2,257    343     49    87,521

The total of the last table gives the xy moment (87,521), and the x standard deviation is found by forming from the first row the series 18501, 15631, 12767, 9907, 7064, 4283, 2086, 532, 76, 8, and summing it, i.e., 70,855. The second moment about the mean can then be found, the numerical work being as follows:

$$x \text{ mean} = \frac{18501}{2870} = 6.4463 = d$$

$$\nu_2 = 2S - d(1 + d) = \frac{2 \times 70855}{2870} - 6.4463 \times 7.4463 = 1.3747$$

Similarly with the y moments:

$$y \text{ mean} = \frac{13381}{2870} = 4.6624$$

$$\nu_2 = \frac{2\,(13381 + 10511 + 7697 + 5055 + 2845 + 1300 + 429 + 96 + 10 + 1)}{2870} - 4.6624 \times 5.6624 = 2.3979$$

$$\text{The } xy \text{ moment} = \frac{87521}{2870} - 6.4463 \times 4.6624 = .4399$$

Remembering that $\nu_2 - \frac{1}{12} = \sigma^2$ (Sheppard's adjustment) and that the means are in the above work measured from the centre of the group just before the first (x = 25, y = -3 years), the values given will be found to agree with those previously obtained by the direct method. The xy moment (.4399) is the same as $\frac{1262.51}{2870}$, i.e., $S(xy) \div N$.
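The summation method lends itself to a few lines of code. The sketch below starts from the y-sum table (entries as reconstructed from the printed table, blanks taken as zero), sums it across continuously, and recovers the moments and r; the helper `sums_from_end` and all variable names are illustrative assumptions.

```python
from math import sqrt

# y-sum table: rows are unexpired terms 0-4 ... 45-49, columns are ages 30 ... 75.
Y_SUM = [
    [6, 4, 17, 62, 584, 643, 1098, 388, 60, 8],
    [4, 4, 17, 60, 558, 637, 1084, 382, 60, 8],
    [3, 3, 15, 54, 496, 601, 1044, 360, 58, 8],
    [3, 1, 6, 37, 379, 502, 917, 308, 50, 7],
    [0, 1, 0, 13, 234, 347, 680, 224, 39, 7],
    [0, 0, 0, 10, 101, 180, 409, 146, 19, 6],
    [0, 0, 0, 1, 11, 57, 178, 75, 8, 3],
    [0, 0, 0, 0, 0, 8, 51, 26, 0, 1],
    [0, 0, 0, 0, 0, 2, 2, 4, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
]
N = 2870

def sums_from_end(values):
    """Continuous sums taken from the far end, e.g. [a, b, c] -> [a+b+c, b+c, c]."""
    out, running = [], 0
    for v in reversed(values):
        running += v
        out.append(running)
    return out[::-1]

x_sum = [sums_from_end(row) for row in Y_SUM]     # the third table
xy_total = sum(sum(row) for row in x_sum)         # total of the third table

first_row = x_sum[0]
first_col = [row[0] for row in x_sum]
dx = sum(first_row) / N                           # x mean from the origin x = 25
dy = sum(first_col) / N                           # y mean from the origin y = -3
nu2_x = 2 * sum(sums_from_end(first_row)) / N - dx * (1 + dx)
nu2_y = 2 * sum(sums_from_end(first_col)) / N - dy * (1 + dy)
p_xy = xy_total / N - dx * dy                     # xy moment about the means

# Sheppard's adjustment (1/12) is applied to each second moment before forming r.
r = p_xy / sqrt((nu2_x - 1 / 12) * (nu2_y - 1 / 12))
print(xy_total, round(nu2_x, 4), round(nu2_y, 4), round(p_xy, 4), round(r, 5))
```

No multiplications by x' or y' are needed until the final step, which is the attraction of the method for hand work.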

18. Before dealing with other examples and methods, it may be well to point out a use to which the particular example might be put. The result in the equation form gives the average age corresponding to each unexpired term. Now, we might weight each entry with Mr. Lidstone's Z's,* or with the temporary annuities, then work out an equation in each case, and get new series of average ages. The results used in a valuation would give the relative accuracy of the three methods. I have worked out the formula with the Z weights (H^M Table), and found that

Age at maturity = 57.595 + .1200 x (unexpired term).

The results could also be used as a rough check on the average ages at valuations, and there certainly seems a possibility of doing something towards making a simple "model office" for endowment assurances with the help of the method we have been using.

* The method used by me was approximate and can probably be improved, the result is merely given as an indication of a possible line for research.

19. When constructing correlation tables a little care is necessary, because in certain arrangements of statistical material it is possible to obtain a significant value for a coefficient of correlation when in reality the two functions are absolutely uncorrelated. Such a result is called "spurious correlation," and the manner in which it arises is by the use of indices, as it is defined as the correlation which will be found between indices, when the absolute values of the functions dealt with have been selected purely at random.

As an example of the way spurious correlation might be introduced in actuarial statistics, we may refer to endowment assurances by limited payments on the books of a company doing a large quantity of such business, and consider the term of the original assurance ($t_1$), the number of premiums to be paid in future ($t_2$), and the number of years for which the policy has been in force ($t_3$). If we formed the ratios $t_2/t_1$ and $t_3/t_1$, and worked out the coefficients of correlation, we should not obtain a measure of the correlation between the number of premiums payable in future and the number of years in force, because the result of using fractions with the same denominator in each would be to exaggerate correlation; that is, to introduce spurious correlation.

The general propositions of spurious correlation, of which the result just mentioned is a particular case, are as follows:

I. To find the mean of an index in terms of the means, standard deviations and coefficients of correlation of the two absolute measurements.

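Let $x_1, x_2, x_3, x_4$ be the absolute sizes of any four correlated subjects; $m_1, m_2, m_3, m_4$ their mean values; $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ their standard deviations; $r_{12}, r_{13}, r_{14}, r_{23}, r_{24}, r_{34}$ the six coefficients of correlation; $\epsilon_1, \epsilon_2, \epsilon_3, \epsilon_4$ the deviations of the four subjects from their means, i.e., $x_1 = m_1 + \epsilon_1$, &c.; $i_{13}$ the mean value of the index $x_1/x_3$ and $i_{24}$ the mean value of $x_2/x_4$; $\Sigma_{13}$ and $\Sigma_{24}$ the standard deviations of the indices $x_1/x_3$ and $x_2/x_4$ respectively, and N the total number of groups. We shall suppose the ratios of the deviations to the mean values of the organs are so small that their cubes may be neglected. Then

$$i_{13} = \frac{1}{N} S\!\left(\frac{x_1}{x_3}\right) = \frac{m_1}{m_3}\cdot\frac{1}{N}\, S\!\left\{\left(1 + \frac{\epsilon_1}{m_1}\right)\left(1 - \frac{\epsilon_3}{m_3} + \frac{\epsilon_3^2}{m_3^2}\right)\right\}$$

But $S(\epsilon_1) = S(\epsilon_3) = 0$, $S(\epsilon_1\epsilon_3) = N\sigma_1\sigma_3 r_{13}$ and $S(\epsilon_3^2) = N\sigma_3^2$;

$$\therefore\ i_{13} = \frac{m_1}{m_3}\left(1 + \frac{\sigma_3^2}{m_3^2} - r_{13}\frac{\sigma_1\sigma_3}{m_1 m_3}\right) \quad\text{and}\quad i_{24} = \frac{m_2}{m_4}\left(1 + \frac{\sigma_4^2}{m_4^2} - r_{24}\frac{\sigma_2\sigma_4}{m_2 m_4}\right)$$

II. To find the standard deviation of an index in terms of the standard deviations and coefficient of correlation of the two absolute measurements.

$$\Sigma_{13}^2 = \frac{1}{N} S\!\left(\frac{x_1}{x_3} - i_{13}\right)^2 = \frac{m_1^2}{m_3^2}\cdot\frac{1}{N}\, S\!\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right)^2 \quad\text{(to square terms)}$$

$$= \frac{m_1^2}{m_3^2}\cdot\frac{1}{N}\left(N\frac{\sigma_1^2}{m_1^2} + N\frac{\sigma_3^2}{m_3^2} - 2N r_{13}\frac{\sigma_1\sigma_3}{m_1 m_3}\right)$$

or

$$\Sigma_{13} = \frac{m_1}{m_3}\sqrt{\frac{\sigma_1^2}{m_1^2} + \frac{\sigma_3^2}{m_3^2} - 2 r_{13}\frac{\sigma_1\sigma_3}{m_1 m_3}}$$

III. To find the coefficient of correlation of two indices in terms of the coefficients of correlation of four absolute measurements and their standard deviations.

Let $x_1/x_3$ and $x_2/x_4$ be the two indices. Then, if $\rho$ be the coefficient of correlation of the two indices,

$$\rho\,\Sigma_{13}\Sigma_{24} = \frac{1}{N} S\!\left\{\left(\frac{x_1}{x_3} - i_{13}\right)\left(\frac{x_2}{x_4} - i_{24}\right)\right\} = \frac{m_1 m_2}{m_3 m_4}\cdot\frac{1}{N}\, S\!\left\{\left(\frac{\epsilon_1}{m_1} - \frac{\epsilon_3}{m_3}\right)\left(\frac{\epsilon_2}{m_2} - \frac{\epsilon_4}{m_4}\right)\right\}$$

as we neglect the terms of cubic order, so that

$$\rho\,\Sigma_{13}\Sigma_{24} = \frac{m_1 m_2}{m_3 m_4}\left(r_{12}\frac{\sigma_1\sigma_2}{m_1 m_2} - r_{14}\frac{\sigma_1\sigma_4}{m_1 m_4} - r_{23}\frac{\sigma_2\sigma_3}{m_2 m_3} + r_{34}\frac{\sigma_3\sigma_4}{m_3 m_4}\right)$$

Hence,

$$\rho = \frac{r_{12}\dfrac{\sigma_1\sigma_2}{m_1 m_2} - r_{14}\dfrac{\sigma_1\sigma_4}{m_1 m_4} - r_{23}\dfrac{\sigma_2\sigma_3}{m_2 m_3} + r_{34}\dfrac{\sigma_3\sigma_4}{m_3 m_4}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2} - 2 r_{13}\dfrac{\sigma_1\sigma_3}{m_1 m_3}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_4^2}{m_4^2} - 2 r_{24}\dfrac{\sigma_2\sigma_4}{m_2 m_4}\right)}}$$

Proposition I. shows that the mean of an index is not the ratio of the means of the corresponding absolute measurements, and Proposition III. shows that $\rho$ will vanish when the four subjects forming the indices are quite uncorrelated, while, if two, say, the third and fourth, are identical, so that $r_{34} = 1$ and $\dfrac{\sigma_3}{m_3} = \dfrac{\sigma_4}{m_4}$, we have

$$\rho = \frac{r_{12}\dfrac{\sigma_1\sigma_2}{m_1 m_2} - r_{13}\dfrac{\sigma_1\sigma_3}{m_1 m_3} - r_{23}\dfrac{\sigma_2\sigma_3}{m_2 m_3} + \dfrac{\sigma_3^2}{m_3^2}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2} - 2 r_{13}\dfrac{\sigma_1\sigma_3}{m_1 m_3}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_3^2}{m_3^2} - 2 r_{23}\dfrac{\sigma_2\sigma_3}{m_2 m_3}\right)}}$$

This would become applicable in the endowment assurances by limited payments to which we referred.

An interesting special case arises when the subjects $x_1, x_2, x_3$ are not correlated and $\dfrac{x_1}{x_3}$ and $\dfrac{x_2}{x_3}$ are formed, then

$$\rho_0 = \frac{\dfrac{\sigma_3^2}{m_3^2}}{\sqrt{\left(\dfrac{\sigma_1^2}{m_1^2} + \dfrac{\sigma_3^2}{m_3^2}\right)\left(\dfrac{\sigma_2^2}{m_2^2} + \dfrac{\sigma_3^2}{m_3^2}\right)}}$$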
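To see how large the spurious element can be, Proposition III. may be put into a small function. This is only a sketch: the function name and argument order are assumptions made for illustration, and each `v` stands for the ratio of a standard deviation to its mean.

```python
from math import sqrt

def index_correlation(v1, v2, v3, v4, r12, r13, r14, r23, r24, r34):
    """Correlation of the indices x1/x3 and x2/x4 to the second order
    (Proposition III.), with v_i = sigma_i / m_i."""
    numerator = r12 * v1 * v2 - r14 * v1 * v4 - r23 * v2 * v3 + r34 * v3 * v4
    denominator = sqrt(
        (v1 ** 2 + v3 ** 2 - 2 * r13 * v1 * v3)
        * (v2 ** 2 + v4 ** 2 - 2 * r24 * v2 * v4)
    )
    return numerator / denominator

# Three uncorrelated subjects with equal variability, the third used as the
# common denominator (so the "fourth" subject is identical with the third).
rho0 = index_correlation(0.1, 0.1, 0.1, 0.1,
                         r12=0.0, r13=0.0, r14=0.0, r23=0.0, r24=0.0, r34=1.0)
print(rho0)
```

With equal variability in all three subjects the indices show a correlation of one-half although the absolute measurements are entirely uncorrelated.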

CHAPTER VII.

Correlation of Characters not Quantitatively Measurable.

1. Before the theory in this section is dealt with, we will give a table showing the class of problem with which it deals, drawn from vaccination statistics. This subject was brought before the Institute of Actuaries recently by the late A. F. Burridge, but his figures are not in a form convenient for the present purpose, and the table is taken from a paper on the subject by Dr. W. R. Macdonell* and relates to the Sheffield smallpox outbreak of 1887-1888:

                  Strength to resist Smallpox when incurred.

Cicatrix.         Recoveries.      Deaths.      Total.

Present              3,951           200         4,151
Absent                 278           274           552

Total                4,229           474         4,703

The functions between which we want to find the correlation are "Strength to resist smallpox when incurred" and "Degree of effective vaccination," and the statistics we have cannot be arranged in a more detailed manner than the above. The characters cannot be measured quantitatively; but as the absence of such measurement does not mean that there is no correlation, we must see how the coefficient can be obtained in such a case.

* Biometrika, vol. i., pp. 375, et seq. This paper and a supplementary one deal with the subject in a way that shows clearly the strength of the evidence on the side of vaccination. The question of class is investigated, a practical point frequently neglected.
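Table of Frequencies.

      a           b          a + b
      c           d          c + d

    a + c       b + d          N

2. Using the same notation as that of the previous chapter, imagine the frequency surface

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2rxy}{\sigma_1\sigma_2}\right)}$$

to be divided into four parts by two planes at right angles to the axes of x and y at distances h' and k' from the origin as suggested by the figures above. Then

$$d = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1 - r^2}} \int_{h'}^{\infty}\!\!\int_{k'}^{\infty} e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} - \frac{2rxy}{\sigma_1\sigma_2}\right)} dx\,dy = \frac{N}{2\pi\sqrt{1 - r^2}} \int_{h}^{\infty}\!\!\int_{k}^{\infty} e^{-\frac{x^2 + y^2 - 2rxy}{2(1 - r^2)}}\, dx\,dy$$

by substituting $x$ for $\dfrac{x}{\sigma_1}$ and $y$ for $\dfrac{y}{\sigma_2}$ and writing $h = \dfrac{h'}{\sigma_1}$ and $k = \dfrac{k'}{\sigma_2}$.

Further,

$$b + d = \frac{N}{\sqrt{2\pi}\,\sigma_1}\int_{h'}^{\infty} e^{-\frac{x^2}{2\sigma_1^2}}\, dx = \frac{N}{\sqrt{2\pi}}\int_{h}^{\infty} e^{-\frac{1}{2}x^2}\, dx \quad\text{and}\quad c + d = \frac{N}{\sqrt{2\pi}}\int_{k}^{\infty} e^{-\frac{1}{2}y^2}\, dy$$

and, remembering that the total frequency $= a + b + c + d = N$, we have

$$N - 2(b + d) = N\sqrt{\frac{2}{\pi}}\int_0^h e^{-\frac{1}{2}x^2}\, dx, \quad\text{i.e.,}\quad \frac{(a + c) - (b + d)}{N} = \sqrt{\frac{2}{\pi}}\int_0^h e^{-\frac{1}{2}x^2}\, dx$$

and, similarly,

$$\frac{(a + b) - (c + d)}{N} = \sqrt{\frac{2}{\pi}}\int_0^k e^{-\frac{1}{2}y^2}\, dy$$

Since a, b, c, and d are known, h and k can be found from Sheppard's Tables, and the problem becomes "To find a value for r from the equation

$$d = \frac{N}{2\pi\sqrt{1 - r^2}} \int_{h}^{\infty}\!\!\int_{k}^{\infty} e^{-\frac{x^2 + y^2 - 2rxy}{2(1 - r^2)}}\, dx\,dy$$

where d, N, h, and k are known."

The solution given by Professor Pearson (see Appendix III.) leads to the following equation.

$$\frac{ad - bc}{N^2 HK} = r + \frac{r^2}{2}hk + \frac{r^3}{6}(h^2 - 1)(k^2 - 1) + \frac{r^4}{24}h(h^2 - 3)\,k(k^2 - 3) + \frac{r^5}{120}(h^4 - 6h^2 + 3)(k^4 - 6k^2 + 3)$$
$$+ \frac{r^6}{720}h(h^4 - 10h^2 + 15)\,k(k^4 - 10k^2 + 15) + \frac{r^7}{5040}(h^6 - 15h^4 + 45h^2 - 15)(k^6 - 15k^4 + 45k^2 - 15) + \text{etc.}$$

where

$$H = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}h^2} \quad\text{and}\quad K = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}k^2}$$

The numerical solution has to be obtained by approximating to the roots, and Newton's method* is convenient for the purpose.

3. The numerical work of our example is as follows: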

$$\sqrt{\frac{2}{\pi}}\int_0^h e^{-\frac{1}{2}x^2}\, dx = \frac{(a + c) - (b + d)}{N} = \frac{3755}{4703} = .7984265$$

$$h = 1.27716$$

by interpolation in Sheppard's Tables. In using these tables for this purpose remember that the value .7984265 corresponds to $\alpha$ in his notation, so $\frac{1}{2}(1 + .7984265) = .8992132$ must be looked up inversely in his Table I. If his Table III. be used it must be entered with .7984265.

Similarly,

$$\sqrt{\frac{2}{\pi}}\int_0^k e^{-\frac{1}{2}y^2}\, dy = .7652561$$

$$k = 1.18833$$

We next require $\dfrac{ad - bc}{N^2 HK}$, and we first get from Sheppard's Tables

$$H = .1764870 \quad \therefore\ \log H = \bar{1}.2467127$$
$$K = .1969111 \quad \therefore\ \log K = \bar{1}.2942702$$

Hence

$$\log \frac{ad - bc}{N^2 HK} = .1258266 \quad\text{and}\quad \frac{ad - bc}{N^2 HK} = 1.336062$$

Dr. Macdonell gives 56 instead of 62 as the last two figures; the difference is probably due to interpolation.

Turning to the expression for r, we notice that hk is a product in the coefficients of $r^2$, $r^4$, $r^6$, &c., so it is well to work out its value and keep a note of it while the coefficients are being found. It is also advisable to begin the work by writing down the first six or seven powers of h and k.

Dr. Macdonell gives the following series:

$$.097083r^7 + .008170r^6 + .119614r^5 + .137450r^4 + .043352r^3 + .758844r^2 + r = 1.336056$$

* Newton's method of approximating to the root of an equation. Let $f(x) = 0$ be an equation from which the value of x is to be found and let b be a value near to x so that $x = b + h$ where h is small, then $f(x) = f(b + h) = f(b) + hf'(b) +$ terms involving higher powers of h by Taylor's Theorem, and since $f(x) = 0$, we have $h = -\dfrac{f(b)}{f'(b)}$ or $x = b - \dfrac{f(b)}{f'(b)}$. The chief objection to the method is that there may be more than one root near the value b, but this does not hold in the application to correlation. (Cf. Approximations to rate of interest from an annuity, Text-Book, Part I., p. 110, formula 8.)


In order to obtain r we must find a value near the true one as a first approximation. Taking

$$.758844r^2 + r - 1.336056 = 0$$

we have

$$r = \frac{-1 + \sqrt{1 + 4 \times 1.336 \times .7588}}{1.5177} = .82$$

Now, this value will be in excess of the truth, owing to our only using two terms of the series on the left-hand side of the equation for finding r, and we may take .77 as a trial rate. Applying Newton's Rule, we have

$$r = .77 - \frac{-1.336056 + (.77) + .7588(.77)^2 + .0434(.77)^3 + .1375(.77)^4 + .1196(.77)^5 + .0082(.77)^6 + .0971(.77)^7}{1 + 2(.77)(.7588) + 3(.77)^2(.0434) + 4(.77)^3(.1375) + 5(.77)^4(.1196) + 6(.77)^5(.0082) + 7(.77)^6(.0971)}$$

$$= .77 - .0008 = .7692$$
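The whole of this calculation, from the fourfold table to the value of r, can be sketched in code. The inverse normal function of the Python standard library stands in for Sheppard's Tables here, and the series is cut off at the seventh power as in the text; both choices, and the variable names, are assumptions of this sketch rather than the method of the original paper.

```python
from statistics import NormalDist

a, b, c, d = 3951, 200, 278, 274
n = a + b + c + d
nd = NormalDist()

# h and k from the marginal totals; (1 + alpha)/2 is looked up inversely.
h = nd.inv_cdf(0.5 * (1 + ((a + c) - (b + d)) / n))
k = nd.inv_cdf(0.5 * (1 + ((a + b) - (c + d)) / n))
H, K = nd.pdf(h), nd.pdf(k)
target = (a * d - b * c) / (n * n * H * K)

# Coefficients of r, r^2, ..., r^7 in the series for (ad - bc) / (N^2 H K).
coef = [
    1.0,
    h * k / 2,
    (h**2 - 1) * (k**2 - 1) / 6,
    h * (h**2 - 3) * k * (k**2 - 3) / 24,
    (h**4 - 6 * h**2 + 3) * (k**4 - 6 * k**2 + 3) / 120,
    h * (h**4 - 10 * h**2 + 15) * k * (k**4 - 10 * k**2 + 15) / 720,
    (h**6 - 15 * h**4 + 45 * h**2 - 15) * (k**6 - 15 * k**4 + 45 * k**2 - 15) / 5040,
]

def f(r):
    return sum(cf * r ** (i + 1) for i, cf in enumerate(coef)) - target

def f_dash(r):
    return sum((i + 1) * cf * r ** i for i, cf in enumerate(coef))

r = 0.77                      # trial value, as in the text
for _ in range(5):            # a few Newton steps are ample
    r -= f(r) / f_dash(r)

print(round(h, 5), round(k, 5), round(target, 5), round(r, 4))
```

The root of the truncated series comes out at about .769, which agrees with the value in the text to three places; the fourth place depends on the rounding of the coefficients and on the number of terms retained.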

In work such as this, a table giving the first seven powers of the natural numbers is a help. There is one in the first edition of Barlow's Tables, but this edition is now difficult to get, though a copy of it will be found in the Library of the Institute. A table of the same powers has been given in Biometrika, vol. ii., pp. 474, et seq.


4. The value we have found for r can be checked in the following way:

$$d = \frac{N}{2\pi\sqrt{1 - r^2}} \int_{h}^{\infty}\!\!\int_{k}^{\infty} e^{-\frac{x^2 + y^2 - 2rxy}{2(1 - r^2)}}\, dx\,dy = \frac{N}{2\pi\sqrt{1 - r^2}} \int_{k}^{\infty} e^{-\frac{1}{2}y^2} \int_{h}^{\infty} e^{-\frac{(x - ry)^2}{2(1 - r^2)}}\, dx\,dy$$

$$= \frac{N}{2\pi} \int_{k}^{\infty} e^{-\frac{1}{2}y^2} \int_{t}^{\infty} e^{-\frac{1}{2}X^2}\, dX\,dy \quad\text{where}\quad t = \frac{h - ry}{\sqrt{1 - r^2}}$$

To approximate to the double integral we can find $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_t^{\infty} e^{-\frac{1}{2}X^2}\, dX$ for a few equidistant values of y and apply a quadrature formula. The following table shows the work:

Values of y.       Values of               Tail area beyond t,       Ordinate            Product of two     Application of
                   t = (h - yr)/√(1-r²).   (1/√2π)∫ e^(-X²/2) dX.    (1/√2π) e^(-y²/2).  previous cols.     Simpson's Rule.

(k)   1.18833           .5682                    .2849                   .1969               .0561          x 1 = .0561
(+.5) 1.68833          -.0342                    .5133                   .0960               .0491          x 4 = .1964
      2.18833          -.6367                    .7370                   .0363               .0265          x 2 = .0530
      2.68833         -1.2392                    .8923                   .0107               .0096          x 4 = .0384
      3.18833         -1.8417                    .9672                   .0025               .0024          x 2 = .0048
      3.68833         -2.4441                    .9927                   .0004               .0004          x 4 = .0016

                                                                                                                 .3503
                                                                                                          ÷ 6 =  .0584

The final column shows the application of Simpson's first quadrature rule, viz.:

$$\int y\, dx = \frac{h}{3}\{y_0 + 4y_1 + 2y_2 + 4y_3 + \dots\}$$

where h is here the interval (.5) between successive values of y, and gives .0584 as the value of the double integral; multiplying this by N (4703), we have 274.6 as the value of the group called d, which agrees with the figure given in the correlation table.
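The check can be repeated mechanically. In the sketch below the complementary error function of the Python standard library takes the place of Sheppard's Tables, and the six ordinates are weighted 1, 4, 2, 4, 2, 4 exactly as in the table above (the remaining tail being treated as negligible); the function names are illustrative.

```python
from math import erfc, exp, pi, sqrt

h, k, r, n = 1.27716, 1.18833, 0.7692, 4703

def upper_tail(t):
    """Area of the normal curve (unit total area) lying beyond t."""
    return 0.5 * erfc(t / sqrt(2))

def ordinate(y):
    """Ordinate of the normal curve at y."""
    return exp(-y * y / 2) / sqrt(2 * pi)

step = 0.5
ys = [k + step * j for j in range(6)]
products = [upper_tail((h - r * y) / sqrt(1 - r * r)) * ordinate(y) for y in ys]

weights = [1, 4, 2, 4, 2, 4]          # weights as applied in the text's table
integral = step / 3 * sum(w * p for w, p in zip(weights, products))
d_estimate = n * integral

print(round(integral, 4), round(d_estimate, 1))
```

Carried to full precision the result is a little above the four-figure value of the table, but it remains within a unit or two of the observed 274, which is all that a six-ordinate quadrature can be expected to show.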

CHAPTER VIII.

Probable Errors.

1. In the previous chapters we have assumed that the means, standard deviations, moments, constants, and coefficients of correlation obtained from a body of statistics give an exact measure of the constants or of the correlation between two functions. This is not really the case. If it were possible to make an infinite number of trials bearing on a given subject, we could obtain constants or measure correlation accurately, but in practice it is only possible to take a sample from this total "population." The variation that results from using a random sample, instead of the whole "population," to find the value of any particular constant, could be reasonably measured by the standard deviation of that constant, for, as we have already remarked, the standard deviation measures the way statistics are collected round their mean or their "scatter" from it. Custom has, however, led to the use of another function known as the Probable Error, which is .67449 times the Standard Deviation. The connection between these two functions is due to the theory having been developed from the normal curve of error, and arises in the following way. The probable error gives that value of x (say p) which divides the part of the normal curve representing positive errors into two equal portions; it is therefore given by

$$\frac{1}{\sqrt{2\pi}}\int_0^p e^{-\frac{1}{2}x^2}\, dx = .25,$$

where the whole area of the curve (positive plus negative deviations) is unity. In order to find p in terms of the standard deviation, we have, therefore, to obtain the value of x corresponding to $\frac{1}{2}(1 + \alpha) = .75$ in Sheppard's Tables, where $\alpha$ is $\displaystyle\int_0^x \frac{2}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, dx$. This can be done by interpolating inversely by Lagrange's formula, and p is thus found to be .67449 approximately.

Probably the best way of viewing the use of the probable error is to regard it as a conventional reduction of the standard deviation.
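The constant .67449 is simply the deviation that cuts off three-quarters of the area of the normal curve, and this can be confirmed in a line or two. The standard-library inverse normal function is used here in place of inverse interpolation in Sheppard's Tables.

```python
from statistics import NormalDist

# Deviation (in units of the standard deviation) below which three-quarters
# of the normal curve lies, i.e. where (1 + alpha) / 2 = .75.
pe_factor = NormalDist().inv_cdf(0.75)
print(round(pe_factor, 5))
```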
2. The general rule followed by statisticians when considering probable errors is that unless a result exceeds the expected by two or three times the probable error, it is not safe to assume that the particular case differs from the expected result.
3. We will first consider the most simple case, and find the probable error of an event happening mp times when m trials are made and p is the probability of the event happening, and q of its failing. The whole series is given by

$$(p + q)^m = p^m + mp^{m-1}q + \frac{m(m - 1)}{2}p^{m-2}q^2 + \dots + q^m$$

Taking moments about the centre of the group represented by $p^m$ the first moment is

$$mp^{m-1}q + m(m - 1)p^{m-2}q^2 + \dots + mq^m = mq(p + q)^{m-1} = mq.$$

The second moment about the same point is

$$mp^{m-1}q + 2m(m - 1)p^{m-2}q^2 + \tfrac{3}{2}m(m - 1)(m - 2)p^{m-3}q^3 + \dots + m^2q^m$$

$$= \left\{mp^{m-1}q + m(m - 1)p^{m-2}q^2 + \frac{m(m - 1)(m - 2)}{2}p^{m-3}q^3 + \dots + mq^m\right\}$$
$$+ \left\{m(m - 1)p^{m-2}q^2 + m(m - 1)(m - 2)p^{m-3}q^3 + \dots + m(m - 1)q^m\right\}$$

$$= mq + m(m - 1)q^2$$

The second moment about the mean is, therefore,

$$mq + m(m - 1)q^2 - m^2q^2 = mpq$$

The standard deviation $= \sqrt{\mu_2} = \sqrt{mpq}$ and the probable error is $.67449\sqrt{mpq}$.
4. This value is of considerable use in statistical work, and it will, therefore, be advisable to see its application to a few examples.

It has been remarked that the number of male children born is to the number of female children born as 1,050 : 1,000; in other words, the probability of a child being male is $\frac{1050}{2050}$. If 51,350 out of 100,000 children proved to be males in a certain community, would it be safe to base any theory connected with the variation from the usual probability on the statistics?

The expected result is 51,220, and the probable error is

$$.67449\sqrt{100{,}000 \times \frac{1050}{2050} \times \frac{1000}{2050}} = 106.6.$$

The difference between the actual case and the expected result was 130, and as this is only about one and a quarter times the probable error, no definite conclusion can be based on the divergence from the result.

If the actual number of cases had been 10,000,000, and the number 5,135,000, then the probable error being 1,066, and the actual difference 13,000, it would have been sufficient evidence for the conclusion that the ratio 1,050 : 1,000 did not fit the particular case.
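The birth example reduces to a single formula, which may be sketched as follows. The function name and the constant `PE` are illustrative; the figures are those of the example.

```python
from math import sqrt

PE = 0.67449   # probable error as a multiple of the standard deviation

def probable_error(m, p):
    """Probable error of the number of successes in m trials with probability p."""
    return PE * sqrt(m * p * (1 - p))

p_male = 1050 / 2050
expected = 100_000 * p_male
pe = probable_error(100_000, p_male)
ratio = (51_350 - expected) / pe          # divergence in probable errors

pe_large = probable_error(10_000_000, p_male)
ratio_large = (5_135_000 - 10_000_000 * p_male) / pe_large

print(round(expected), round(pe, 1), round(ratio, 2), round(ratio_large, 1))
```

The probable error grows only as the square root of the number of cases, so the same proportionate excess that is inconclusive in 100,000 births becomes about twelve probable errors in 10,000,000.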
5. If the probability of death within a year is .007, the probable error in 200 cases is

$$.67449\sqrt{200 \times .007 \times .993} = .80,$$

and it would, therefore, be possible to approximate to a loading for emergencies if 2.2 was taken instead of 1.4 as the number of deaths expected in a year out of 200 cases on risk for a year. That is, it would not be unreasonable to treat .0110 as the rate of mortality instead of .007, in order to obtain some idea of an emergency loading for term assurances on the assumption that the number of cases is about 200 and the average age is such that .007 might be taken as the probability of death in a year. It has also been assumed that it is correct to treat each class as if it were subject to its own rate of mortality, and had to be treated independently of the rest of the business; this is, however, a debatable point.

6. It will be noticed that if m remains constant, then $\sqrt{mpq}$ has its largest numerical value when $p = q = \frac{1}{2}$, which shows that an office will generally find that if it has two classes of equal size, and one is subject to a higher rate of mortality than the other, the former will have the larger actual deviations from the expected number of claims, because the probability of dying in a year only reaches the value $\frac{1}{2}$ at the end of the mortality table.

7. If we now consider a frequency distribution instead of an individual experiment, it is clear that if $y_s$ is the theoretical frequency in the sth group that would occur in N cases, the probability of a particular one of the m cases falling in the sth group is $\dfrac{y_s}{N} = p$, and the probability of its falling elsewhere is $1 - \dfrac{y_s}{N} = q$. Then in m trials the distribution of frequency of this group will be given by $(p + q)^m$ and its standard deviation

$$\sigma_{y_s} = \sqrt{mpq} = \sqrt{m\,\frac{y_s}{N}\left(1 - \frac{y_s}{N}\right)} = \sqrt{y'_s\left(1 - \frac{y'_s}{m}\right)}$$

where $y'_s = \dfrac{m}{N}y_s$, and is accordingly the proportion of $y_s$ which we should expect in the typical group of m out of N individuals. In actual practice we have, however, only the sample, but since the sample is only likely to deviate from the theoretical value to a proportionally small extent, we can replace the theoretical value by the observed frequency, and write

$$\sigma_{y_s} = \sqrt{y_s\left(1 - \frac{y_s}{m}\right)} \qquad \text{(i.)}$$

where $y_s$ is now taken as the frequency of the sth group in the sample.

8. If one frequency in any distribution is too large, then the others must on the average be too small, and this shows that the errors between groups are correlated. The next point to be investigated is the amount of the correlation between deviation $y_s$ and $y_{s'}$, or between deviations in the frequencies of the sth and s'th groups.

Let $\delta y_s$ = deviation from $y_s$ the most probable value in the sth group, then, since

$$y_1 + y_2 + y_3 + \dots + y_s + \dots + y_{s'} + \dots + y_n = m,$$

the deviations of all the groups taken together must sum to zero. Now, if the sample has given too large a value in the sth group, it is proper to suppose that this error will be distributed among the other groups in the proportion of their relative frequencies. This assumes that deviations are only due to random sampling, not to defective measurement or classification; if the deviations were due to these latter causes, it might be reasonable to assume that the excess was drawn from adjacent groups, but it is necessary to confine our attention to errors due to random sampling, and we then have

$$\delta y_{s'} = -\delta y_s \times \frac{y_{s'}}{m - y_s}$$

and

$$\delta y_s\, \delta y_{s'} = -(\delta y_s)^2\, \frac{y_{s'}}{m\left(1 - \dfrac{y_s}{m}\right)}$$

This gives the effect of the error in the sth group on that in the s'th group on the assumption that the error in the sth group is the cause of all deviations; to give effect to the fact that all groups contribute we must sum the expression for all samplings, and obtain

$$\sigma_{y_s}\sigma_{y_{s'}}\, r_{y_s y_{s'}} = -\sigma_{y_s}^2\, \frac{y_{s'}}{m\left(1 - \dfrac{y_s}{m}\right)}$$

Hence, using (i.),

$$\sigma_{y_s}\sigma_{y_{s'}}\, r_{y_s y_{s'}} = -\frac{y_s\, y_{s'}}{m} \qquad \text{(ii.)}$$

The only step likely to cause difficulty in the above is in summing $\delta y_s\, \delta y_{s'}$, but if it is borne in mind that the (xy)-moment of a correlation table gives $r\sigma_1\sigma_2$, the difficulty should immediately disappear.

9. To find the standard deviation $\sigma_h$ of the mean h of a system of observations.

Measured from a fixed point we have

$$h = \frac{S(xy)}{S(y)} = \frac{S(xy)}{m}$$

where y is the frequency of size x.

$$\therefore\ m\,\delta h = S(x\,\delta y)$$

Squaring each side, we have

$$m^2(\delta h)^2 = S(x_s^2\, \delta y_s^2) + 2S'(x_s x_{s'}\, \delta y_s\, \delta y_{s'})$$

where S' is the sum for all values of s and s' for which s is not equal to s'. By dividing by the number of samplings after summing for all such samples, this gives

$$m^2\sigma_h^2 = S(x_s^2\, \sigma_{y_s}^2) + 2S'(x_s x_{s'}\, \sigma_{y_s}\sigma_{y_{s'}}\, r_{y_s y_{s'}})$$

or, using (i.) and (ii.),

$$m^2\sigma_h^2 = S\!\left\{x_s^2\, y_s\left(1 - \frac{y_s}{m}\right)\right\} - 2S'\!\left(\frac{x_s x_{s'}\, y_s\, y_{s'}}{m}\right) = S(x_s^2\, y_s) - \frac{1}{m}\{S(x_s y_s)\}^2 = m(\mu'_2 - h^2)$$

where $m\mu'_2$ is the second moment of the whole distribution about the fixed point. But $\mu'_2 - h^2 = \sigma^2$ = square of standard deviation of sample. Hence

$$\sigma_h = \frac{\sigma}{\sqrt{m}} \qquad \text{(iii.)}$$

10. This last result is of considerable use in statistical work. A large number of cases is recorded and the mean used to compare the particular experiment with another of a like kind. Is an actual difference between the means due to some cause other than random sampling? A practical application would be the comparison of the average profit from various classes of business for a number of years. The standard deviation of the profits in the various years would be obtained by taking the square root of the second moment about the mean and dividing it by the square root of the number of years; the quotient would give $\sigma_h$ of (iii.). It is only by using the standard deviations or probable errors deduced from them that it would be possible to say definitely whether a lower average profit in a certain part of the business was due to chance or to some cause requiring removal.

11. In a similar way it is possible to find the probable errors of the moments and constants, but this leads to the more theoretical parts of the subject, with which it is unadvisable to deal in a book of this character. It is, however, necessary to call attention to the probable error of the coefficient of correlation owing to the importance of that function in statistical work. Its value has been shown* to be

$$.67449\,\frac{1 - r^2}{\sqrt{n}}$$

We have already found in Chapter VI. that the coefficient of correlation between the age at maturity and the unexpired term of endowment assurances is .25445, but it is not right to assert definitely, on the strength of this information, that this coefficient represents any real relationship, until we have seen how large a deviation in the value of such a coefficient might arise purely from our having taken a random sample. The probable error of r is found by inserting 2870 for n in the formula given above, and we have ±.0118 as the probable error. It is customary to show this by writing r = .25445 ± .0118. In this case, therefore, the probable error is so small that the result is reliable, but it sometimes happens, especially when only a few cases have been considered, that the probable error is large enough to make it impossible to base any definite conclusion on the result produced. If, for instance, it had been found that r = .0827 ± .0621, it would have been impossible to say definitely that the correlation had not arisen merely from chance.

* The proof is given in Phil. Trans. A., vol. cxci, pp. 231-241. It is, however, of a complicated nature and unsuitable for insertion in the present work. This last remark also applies to the proof for the probable error when the fourfold table is used.
12. In using the fourfold table the probable errors are larger, as would be expected, because the grouping is rougher, and the formula by which they should strictly be calculated becomes complicated. The formula referred to gives as the probable error of r,

$$\frac{.67449}{\chi_0\sqrt{n}}\left\{\frac{(a+d)(c+b)}{4n^2} + \psi_2^2\,\frac{(a+c)(d+b)}{n^2} + \psi_1^2\,\frac{(a+b)(d+c)}{n^2} + 2\psi_1\psi_2\,\frac{ad-bc}{n^2} - \psi_2\,\frac{ab-cd}{n^2} - \psi_1\,\frac{ac-bd}{n^2}\right\}^{\frac12}$$

where

$$\chi_0 = \frac{1}{2\pi\sqrt{1-r^2}}\,e^{-(h^2+k^2-2rhk)/2(1-r^2)}$$

$$\psi_1 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{h-rk}{\sqrt{1-r^2}}} e^{-\frac12x^2}\,dx, \qquad \psi_2 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{k-rh}{\sqrt{1-r^2}}} e^{-\frac12x^2}\,dx$$

and it is assumed that the fourfold table is so arranged that a + b > c + d and a + c > b + d, where a, b, c, and d have the meanings indicated on p. 126. The numerical work for finding the probable error of r for the example in Chapter VII. is as follows:

$$\psi_1 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{h-rk}{\sqrt{1-r^2}}} e^{-\frac12x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{.56821} e^{-\frac12x^2}\,dx = .21505$$

by Sheppard's Tables.

$$\psi_2 = \frac{1}{\sqrt{2\pi}}\int_0^{\frac{k-rh}{\sqrt{1-r^2}}} e^{-\frac12x^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_0^{.32230} e^{-\frac12x^2}\,dx = .12639$$

$$\chi_0 = \frac{1}{2\pi\sqrt{1-r^2}}\,e^{-(h^2+k^2-2rhk)/2(1-r^2)} = \frac{e^{-.86744}}{2\pi\times.63900} = .10462$$

∴ log (1/χ₀) = .98039, log ψ₁ = 1̄.33254, and log ψ₂ = 1̄.10171

∴ the probable error of r

$$= \frac{.67449}{.10462\sqrt{4703}}\left\{.02283 + .00145 + .00479 + .00252 - .00408 - .01015\right\}^{\frac12} = .0124.$$

13. It is often sufficient to assume that the probable error by the above formula will be three times that by the formula $.67449\,(1-r^2)/\sqrt{n}$. The latter gives ±.0040 in the above case, which is a good approximation to ⅓ of .0124.
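The arithmetic of the worked example can be retraced with the standard library, using the error function in place of Sheppard's Tables. This is a sketch under the assumption that the figures have been read correctly from the example (upper limits .56821 and .32230, exponent .86744, √(1 − r²) = .63900, n = 4703, and the six terms of the bracket); all variable names are illustrative.

```python
import math


def normal_area_from_zero(t: float) -> float:
    """Area of the standard normal curve between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2.0))


# Figures as read from the worked example (assumed, see the note above).
beta_1, beta_2 = 0.56821, 0.32230   # upper limits of the two integrals
root_one_minus_r2 = 0.63900         # sqrt(1 - r^2)
exponent = 0.86744                  # (h^2 + k^2 - 2rhk) / 2(1 - r^2)
n = 4703                            # number of cases
bracket_terms = [0.02283, 0.00145, 0.00479, 0.00252, -0.00408, -0.01015]

psi_1 = normal_area_from_zero(beta_1)
psi_2 = normal_area_from_zero(beta_2)
chi_0 = math.exp(-exponent) / (2.0 * math.pi * root_one_minus_r2)
probable_error = 0.67449 / (chi_0 * math.sqrt(n)) * math.sqrt(sum(bracket_terms))

# The simpler product-moment formula, for comparison (section 13).
simple_rule = 0.67449 * root_one_minus_r2 ** 2 / math.sqrt(n)

print(f"psi_1 = {psi_1:.5f}, psi_2 = {psi_2:.5f}, chi_0 = {chi_0:.5f}")
print(f"probable error = {probable_error:.4f}, simple formula = {simple_rule:.4f}")
```

The ratio of the two results is close to three, which is the rule of thumb stated in section 13.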

CHAPTER IX.

The Test of Goodness of Fit.

1. When the values of ordinates and areas were calculated in the examples of the various types of frequency-curves, no systematic attempt was made to test the graduations, in order to ascertain whether the results obtained were reasonable. Actuaries have generally been in the habit of imposing on the graduated values of any table on which they may have been working rough checks which have amounted to a comparison of the totals in various groups and an inspection of the changes of sign in the differences between the graduated and ungraduated figures. The problem of the goodness of fit needs, however, more accurate treatment, for inspection, even when aided by the calculation of "mean error for each group,"* can only tell that certain differences are large; and if the mean error be exceeded in two or three cases, it is impossible to say whether the excesses are in any way balanced by equalities in the rest of the graduation. A test is required which will give some measure of the disagreement as judged by the whole graduation.

* Generally calculated as $\tfrac45\sqrt{npq}$, which gives approximately the average magnitude of the deviations irrespective of sign from the mean result. G. F. Hardy, Journal of the Institute of Actuaries, xxvii., pp. 214, et seq.

2. Now, if there be N observations distributed in n + 1 groups, the numbers in the groups being $m'_1, m'_2 \ldots m'_{n+1}$, we have to find a criterion to enable us to decide when the series $m_1, m_2 \ldots m_{n+1}$ will be a legitimate graduation. We may clearly take a legitimate graduation to be one in which the observed values (m') do not differ from the theoretical (m) by more than the deviations that would be expected in random sampling. What we require to know is not the probability that the particular series of m''s will occur if the m's represent the theory, but the probability that the m''s, or an equally likely or less likely series, will arise.

To appreciate the difficulties of the problem we may consider the simplest case, that of a coin-tossing experiment, and suppose that a coin has been tossed six times and come down 4 heads and 2 tails. The "graduation" we make is 3 heads and 3 tails, and to test it we require to find the probability of obtaining a result as unlikely, or more unlikely, than the observed one. This probability is the same as that of getting any one of the following results:

6 heads and 0 tails
5 heads and 1 tail
4 heads and 2 tails
2 heads and 4 tails
1 head and 5 tails
0 heads and 6 tails
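For so simple a case the probability can be found by direct enumeration. The sketch below assumes a fair coin and sums the chances of every result that is no more probable than the one observed; the function name `prob_as_unlikely` is illustrative only.

```python
from math import comb


def prob_as_unlikely(tosses: int, observed_heads: int) -> float:
    """Chance of a result no more probable than the observed one (fair coin)."""
    weights = [comb(tosses, heads) for heads in range(tosses + 1)]
    threshold = weights[observed_heads]
    return sum(w for w in weights if w <= threshold) / 2 ** tosses


# Six tosses, 4 heads and 2 tails observed: the six results listed above.
print(prob_as_unlikely(6, 4))
```

The six listed results account for 1 + 6 + 15 + 15 + 6 + 1 = 44 of the 64 equally likely sequences.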

It is obviously impossible to calculate such probabilities directly, even when the simple probabilities leading to the deviations are known, in any but the easiest cases; but when we do not know the simple probabilities, or the case is a complicated one, a further difficulty is introduced, owing to our inability to tell from a priori reasoning which of the possible cases are more or less likely than that which has actually arisen. It would, for instance, be impossible to say, without a large amount of arithmetical work, when 20 dice were being thrown, whether the probability of getting ten "sixes" or more was greater than that of getting two "sixes" or fewer; but this is an extremely simple case compared with the general proposition in which deviations over a series of numbers have to be considered.

3. If it is assumed in any measurement on one subject that the deviations from the mean take the form of the normal curve of error (Type VII.), and it is required to estimate the chance of obtaining deviations greater than a certain value (t, say), it will be necessary to sum all values of the normal curve beyond t on each side of the mean, i.e., take

$$\int_{-\infty}^{-t} e^{-x^2}\,dx + \int_{t}^{\infty} e^{-x^2}\,dx = 2\int_{t}^{\infty} e^{-x^2}\,dx$$

and divide the result by the area of the whole curve, i.e., by the total deviations. Assuming that there are two measurements instead of one (the exposed to risk, for instance, at two ages), the deviations are, as it were, in two directions instead of one; and it is necessary to take an expression with two variables instead of one. The expression analogous to the normal curve is the correlation surface

$$z = z_0\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$

with which we have already dealt. The integrations must be performed for both variables from t and t' onwards, and compared with the total. If there are n measurements, it becomes necessary to deal with a function of n variables, and this will give the reader a slight idea of the problem from the mathematical point of view, and suggest that he will expect the quotient of two n-fold integrals to give the probability. The next step is to reduce these n-fold integrals to the form of ordinary integrals, and it has been shown* that the result

$$P = \frac{\int_{\chi}^{\infty} e^{-\frac12\chi^2}\chi^{n-1}\,d\chi}{\int_{0}^{\infty} e^{-\frac12\chi^2}\chi^{n-1}\,d\chi}$$

is reached. In this expression χ stands for a complex function depending on the n variables from which the expression was evolved, and measures the position that is indicated by the probability of the particular distribution, the test for the graduation of which is required.

* Professor Karl Pearson "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from Random Sampling." Phil. Mag., July, 1900.

4. Before a measure of the probability P† can be obtained, a value for χ² must be found from the statistics of the particular graduation, and in the paper to which reference has already been made its value is shown to be such that

$$\chi^2 = \sum_{r=1}^{n+1}\frac{(m_r - m'_r)^2}{m_r}$$

It is almost obviously necessary to use the square of the difference in order that negative differences may, equally with positive differences, increase the improbability of the system, while a ratio is required to bring into account the size of the group; for an error of 15 in a group of 20 would be very large, but in a group of 1,000 would be negligible.

† A fairly extensive table of P will be found in Biometrika, vol. i., p. 155, &c. It gives values of P for all values of n + 1 from 3 to 30, corresponding to χ² from 1 to 30, with a few additional values and auxiliary tables for the calculation of further values.
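The measure described in section 4, the sum over all groups of the squared difference divided by the theoretical frequency, may be sketched as follows. The function name `chi_square` and the two single-group illustrations are assumptions made for the example; they are not figures from any graduation in the text.

```python
def chi_square(observed, theoretical):
    """Sum over the groups of (theoretical - observed)^2 / theoretical."""
    if len(observed) != len(theoretical):
        raise ValueError("the two series must have the same number of groups")
    return sum((m - m_obs) ** 2 / m for m_obs, m in zip(observed, theoretical))


# The same error of 15 in a group of 20 and in a group of 1,000.
small_group = chi_square([35], [20])
large_group = chi_square([1015], [1000])
print(small_group, large_group)
```

Dividing by the theoretical frequency is what makes the same absolute error count heavily in the small group and hardly at all in the large one.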

5. The practical aspects of the test of fit and its application may now be dealt with.

(1) If the facts representing the graduated and ungraduated figures are only available in groups, then the value of the probability by the test will, as a rule, be lower as the number of groups is increased. This practical point should be borne in mind, as it sometimes happens that graduations are tested in groups of, say, 5 years of age; but the graduated figures for individual ages are then used unreservedly, though, strictly speaking, they may be no better than interpolated values.

(2) The test assumes a distribution, and would not be applicable if the numbers were a series of ordinates, though the application of the test would probably give a fair idea of the goodness of fit if a large number of ordinates had been given in the series.

(3) The tails of the experience will be very small and never fit exactly. "We ought to take our final theoretical groups to cover as much of the tail area as amounts to at least a unit of frequency in such cases." (Phil. Mag., footnote, p. 164.)

(4) If the number of observations be multiplied by f, say, and the deviations are also multiplied by f, then the value of χ² will be multiplied by the same figure, and the test will show that the fit is worse. This may seem strange at first, but a little consideration will show that it is reasonable, since a large number of cases will give smoother series than a small number; if then, the results are proportionally the same in two examples having the same theoretical distribution but different total frequencies, it follows that the one with greater frequency is less probable than the one with less frequency. The probability of a result as bad, or worse, than three heads and one tail in coin tossing (two heads and two tails being the theoretical result) is .625; but the probability of a result as bad, or worse, than 3 × 2 = 6 heads and 1 × 2 = 2 tails is .289.

(5) I have found, in applying the test, that when the numbers dealt with are very large, the probability is often small, even though the curve appears to fit the statistics very closely. The explanation is that the statistics with which we deal in practice nearly always contain a certain amount of extraneous matter, and the heterogeneity is concealed in a small experience by the roughness of the data. The increase in the number of cases observed removes the roughness, but the heterogeneity remains. The meaning, from the curve fitting point of view, is that the experience is really made up of more than one frequency-curve; but a certain curve, approximating to the one calculated, predominates.

(6) It is sometimes thought that the introduction of additional constants must necessarily improve the fit of a curve. It may do so in some cases, but it is quite possible to take a curve with ten constants and find it gives a worse result than another having only three.

(7) It may sometimes be advisable to use a curve giving a slightly worse agreement than another for simplicity, or for reasons such as those which prompt actuaries to employ Makeham's hypothesis; but, as a rule, it is well to use the best-fitting curve in any case, that is, the curve giving the highest value of P.
6. In a recent paper "On the Comparative Reserves of Life Assurance Companies, &c." (Journal of the Institute of Actuaries, xxxvii., pp. 458-9), Mr. King remarked that it is permissible to use the $H^M$ Model Office for the $O^M$; and it will be interesting to apply the formulas given above to see what is the probability of the $O^M$ distribution if the $H^M$ be taken as the theoretical distribution:

Policies issued arranged in Age-Groups.

Central Age                          O^M − H^M        Square of (O^M − H^M)
of Group.      H^M       O^M        +        −              ÷ H^M
   20          6.97      7.30      .33                        .02
   25         17.75     20.45     2.70                        .41
   30         20.04     23.11     3.07                        .48
   35         18.41     18.40               .01               .00
   40         13.82     13.05               .77               .04
   45          9.45      8.44              1.01               .11
   50          6.23      5.07              1.16               .22
   55          4.51      2.58              1.93              1.60
   60          1.97      1.20               .77               .30
   65           .85       .40               .45               .21
             100.00    100.00     6.10     6.10         χ² = 3.39

There are ten groups, and χ² = 3.39, and the table in Biometrika gives P = .964295 and .911413 when χ² = 3 and 4 respectively. There is, however, a further point, for it has to be decided if it is sufficient to test for 100 new policies. 500 would reduce the probability to about .05, which means that in only one case out of twenty would a random sampling lead to a system of deviations from the $H^M$ as great as that shown by the $O^M$. This result will remind the student of the great danger of dealing with percentages without considering the actual number of cases investigated. Mr. King's other table, which is of greater importance in his work (policies according to attained age), shows a much closer agreement, as P is .831051 for 10,000 cases.
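Where the Biometrika table is not at hand, P can be computed directly, since the ratio of integrals defining it reduces to a finite series. The sketch below uses the standard closed forms for even and odd n (n being one less than the number of groups); the function name `pearson_p` is an illustrative choice, and the check values are the tabular figures quoted in the text.

```python
import math


def pearson_p(chi_sq: float, groups: int) -> float:
    """P for a given chi-square and number of groups (n + 1)."""
    n = groups - 1
    if n < 1:
        raise ValueError("at least two groups are required")
    if n % 2 == 0:
        # Even n: exp(-chi^2/2) * sum over j < n/2 of (chi^2/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, n // 2):
            term *= (chi_sq / 2.0) / j
            total += term
        return math.exp(-chi_sq / 2.0) * total
    # Odd n: two-sided normal tail plus a finite series in odd powers of chi.
    chi = math.sqrt(chi_sq)
    term, series = chi, 0.0          # term = chi^(2j-1) / (1 * 3 * ... * (2j-1))
    for j in range(1, (n - 1) // 2 + 1):
        if j > 1:
            term *= chi_sq / (2 * j - 1)
        series += term
    tail = math.erfc(chi / math.sqrt(2.0))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-chi_sq / 2.0) * series


# Ten groups: chi-square 3.39 for 100 policies, five times that for 500.
print(f"{pearson_p(3.39, 10):.4f}")
print(f"{pearson_p(5 * 3.39, 10):.4f}")
```

Multiplying the number of policies by five multiplies χ² by five, which is why the same percentage deviations that pass easily for 100 policies are barely admissible for 500.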

In a paper on Makeham's formula (Journal of the Institute of Actuaries, xxxv.) Mr. Calderon gave some graduations of the F mortality table, and on pp. 188 and 189 his results are summarized in a form which is convenient for applying the test. His methods A, B and C, give 35.11, 34.02 and 37.77 as the values of χ² for 20 groups. The probability if χ² = 30 is .051798, and if χ² = 40 is .003272. The odds against the best of the three graduations must be 30 to 1, which shows that Makeham's formula is unsuitable or the methods of application unsatisfactory.

In the numerical examples of Chapter V., the value of

Type I. is about *98. Type II. gives '7, Type IV. f, and the sums assured in Type VII. '2, and the reserves give
for

a probability greater than


7.

'9.

7. The only other point to which reference is necessary is the actual value of P at which a good fit ends and a bad one begins. It is impossible to fix such a value. We have merely a measure of probability for the whole table, and if the odds against the graduation are twenty or thirty to one the result is unsatisfactory; if they are ten to one the graduation is not unreasonable, but the exact value when a result must be discarded cannot be given. As, however, it is clearly impossible to imagine any test which can fix an absolutely definite standard, there is no reason for objecting to the particular method because it fails to do so.


CHAPTER X.

The Theory of Contingency.

1. A table showing no correlation can be formed from the totals of any ordinary correlation table, such as that on p. 107, by dividing the total of each column in the proportion of the totals of the rows. Thus, the first column would be

Unexpired Term.      Frequency with no correlation.
     0–4             56/2870 × (total of the first column)
     5–9             172/2870 × (total of the first column)
     ...             ...

and the remaining part of the table would be formed in a similar way. A moment's consideration of the definition given at the beginning of Chapter VI. will show that such a method of formation must necessarily give the required table, because, since each column is formed in proportion to the total, the means of the columns must all be the same as the mean of the total, which shows at once from the definition that no correlation can exist in such a table.

2. The following table shows the figures exhibiting no correlation in ordinary type, and those exhibiting correlation in small type. Now, if these two sets of figures coincide exactly in any particular case there is clearly no correlation in the table; if they differ slightly there is a slight amount, and if they differ greatly there is a considerable amount of correlation, and we come therefore to the conclusion that an alternative method of finding the correlation between two things is by measuring the difference between the figures in the actual correlation table and those that would have arisen if there had not been any correlation. In the last chapter we discussed a method of measuring the goodness of fit (or amount of agreement) between two sets of figures, and this suggests that we might calculate χ² by squaring the difference between each pair of figures in the table and dividing the result by the frequency when there is no correlation. The reason for choosing the figure from the table with no correlation as the divisor is that it always has a value, while the correlation table may give a frequency of zero, which renders it impossible to use the latter as a divisor.
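The two steps just described, forming the table with no correlation from the totals and then measuring the disagreement, can be sketched as below. The function names and the small 2 × 3 table are hypothetical and serve only to show the arithmetic; they are not taken from the endowment assurance data.

```python
def no_correlation_table(row_totals, column_totals):
    """Divide each column total in the proportion of the row totals."""
    grand_total = sum(row_totals)
    if grand_total != sum(column_totals):
        raise ValueError("row and column totals must agree")
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_square_of_table(actual, expected):
    """Sum over all cells of (actual - expected)^2 / expected."""
    return sum((a - e) ** 2 / e
               for a_row, e_row in zip(actual, expected)
               for a, e in zip(a_row, e_row))


# Hypothetical correlation table with two rows and three columns.
actual = [[20, 30, 10],
          [10, 20, 30]]
row_totals = [sum(row) for row in actual]
column_totals = [sum(column) for column in zip(*actual)]

expected = no_correlation_table(row_totals, column_totals)
print(expected)
print(chi_square_of_table(actual, expected))
```

The divisor is always the no-correlation figure, which cannot vanish so long as every row and column total is positive, whereas an actual cell may well be empty.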


[Table: unexpired term of endowment assurances (0–4 to 45–49, in five-year groups) against central age at maturity (30 to 75), giving for each cell the frequency with no correlation in ordinary type and the actual frequency in small type. Row totals: 56, 172, 432, 665, 674, 538, 247, 77, 8, 1. Column totals for central ages 40 to 70: 17, 62, 584, 643, 1,098, 388, 60. Grand total: 2,870.]

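3. As it is clear that φ² will give a measure of the correlation, it will be interesting to see the connection between it and the coefficient of correlation r; and the following proof shows that

$$r = \sqrt{\frac{\phi^2}{1+\phi^2}}$$

where φ² = χ²/N and the correlation table can be approximately represented by the normal correlation surface.

Using the same notation as that of Chapter VI., the frequency with no correlation is given by

$$z_0 = \frac{N}{2\pi\sigma_1\sigma_2}\,e^{-\frac12\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2}\right)}$$

while that with correlation is

$$z = \frac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$

Then

$$\phi^2 = \frac{1}{N}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\frac{(z-z_0)^2}{z_0}\,dx\,dy = \frac{1}{N}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\frac{z^2}{z_0}\,dx\,dy - 1$$

$$= \frac{1}{2\pi\sigma_1\sigma_2(1-r^2)}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-\frac12(ax^2 - 2bxy + cy^2)}\,dx\,dy - 1$$

where

$$a = \frac{1+r^2}{(1-r^2)\sigma_1^2}, \qquad b = \frac{2r}{(1-r^2)\sigma_1\sigma_2}, \qquad c = \frac{1+r^2}{(1-r^2)\sigma_2^2}$$

and

$$ac - b^2 = \frac{1}{\sigma_1^2\sigma_2^2}\left\{\frac{(1+r^2)^2}{(1-r^2)^2} - \frac{4r^2}{(1-r^2)^2}\right\} = \frac{1}{\sigma_1^2\sigma_2^2}$$

$$\therefore\ \phi^2 = \frac{1}{1-r^2} - 1 \quad \text{by No. (vi.) of Appendix III.}$$

$$= \frac{r^2}{1-r^2}, \qquad \text{or} \qquad r = \sqrt{\frac{\phi^2}{1+\phi^2}}$$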

4. The result just obtained may be considered a little more closely, as it leads to some valuable conclusions:

(1) It shows clearly that r must lie between −1 and +1.

(2) Since the value of φ² will not be affected by the order of the columns (or rows), it will be seen that it is permissible to interchange them, provided, of course, the whole column (or row) be moved at once.

(3) The proof shows that r will not necessarily be obtained exactly if a very small number of groups is used, because by using the integral calculus an infinite number of groups was assumed.

(4) We also assumed, however, that we were dealing with perfectly smooth series; but since φ² is a measure of the goodness of fit between the correlation and no-correlation figures, a very large number of groups gives undue prominence to the chance deviation, due to the use of a random sample, and the value of r found from that of φ² may differ considerably from the value reached by the product-moment. Too fine a grouping may give a less accurate result than a less fine one.

5. These conclusions are borne out by practical work, and any student who cares to go into the subject can find the value of r by the two methods from a large table, using various groupings, and he will see that the best agreements are obtained when the grouping is neither very fine nor very rough. It should, however, be borne in mind that unless the correlation table takes the form assumed in the proof, an exact agreement between the two methods cannot be expected; it is, for this reason, well to distinguish the value $\sqrt{\phi^2/(1+\phi^2)}$ from the value of r by calling the former the coefficient of contingency.

It seems to me that if the difficulty about grouping could be overcome, the coefficient of contingency would be more useful than the coefficient of correlation (r).


6. The following table shows the table given above grouped so as to enable us to obtain the coefficient of contingency more conveniently:

(Upper figure in each cell, frequency with no correlation; lower figure, actual frequency.)

Unexpired term                  Central Age at Maturity.
of Endowment       30, 35,
Assurances.        40 & 45     50       55       60       65     70 & 75    Total.

   0–9               7.0      46.4     51.1     87.2     30.8      5.5
                    14        88       42       54       28        2         228

  10–14             13.4      87.8     96.8    165.4     58.4     10.2
                    28       117       99      127       52        9         432

  15–19             20.6     135.3    149.0    254.4     89.9     15.8
                    33       145      155      237       84       11         665

  20–24             20.9     137.2    151.0    257.8     91.1     16.0
                     4       133      167      271       78       21         674

  25–29             16.7     109.5    120.6    205.9     72.7     12.6
                     9        90      123      231       71       14         538

  30–34              7.7      50.3     55.3     94.6     33.4      5.9
                     1        11       49      127       49       10         247

  35–49              2.7      17.5     19.2     32.9     11.7      2.0
                     0         0        8       51       26        1          86

  Total             89       584      643    1,098      388       68       2,870

Working out χ² from this table, the value is found to be 257.9,* which gives φ² = .0899, and the coefficient of contingency is .2872. This differs from the value of r found by the other method by about .03; but an inspection of the table on p. 107 leads to the conclusion that the totals cannot be considered to be curves of Type VII., and the condition under which $r = \sqrt{\phi^2/(1+\phi^2)}$ is not therefore satisfied.
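The passage from χ² to the coefficient of contingency takes two lines of arithmetic, sketched here with the figures quoted for the grouped table (χ² = 257.9, N = 2,870). The function names are illustrative assumptions.

```python
import math


def phi_squared(chi_sq: float, total: int) -> float:
    """phi^2 = chi^2 / N."""
    return chi_sq / total


def coefficient_of_contingency(chi_sq: float, total: int) -> float:
    """sqrt(phi^2 / (1 + phi^2))."""
    p2 = phi_squared(chi_sq, total)
    return math.sqrt(p2 / (1.0 + p2))


p2 = phi_squared(257.9, 2870)
contingency = coefficient_of_contingency(257.9, 2870)
print(f"phi^2 = {p2:.4f}, coefficient of contingency = {contingency:.4f}")
```

Because φ² cannot be negative, the coefficient so found always lies between 0 and 1, and it takes no account of the order of the rows or columns.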

7. The probable error of the coefficient of contingency may be taken as approximately one and a third times that of r.

8. Though we have dealt with the theory of contingency from the point of view of its particular application to ordinary correlation tables, the reader should bear in mind that in statistical practice its chief use is when characters not capable of quantitative measurement are being examined, such, for instance, as colours, shapes, diseases, &c. The method of application is, of course, exactly the same.

* In case any student may not follow the method easily, we may mention that the contributions to χ² from the first column are 7.0, 15.9, 7.5, 13.7, 3.6, 5.8, and 2.7.

9. We may now, in conclusion, refer briefly to some of the practical applications to which the theories of correlation and contingency can be put. Some actuarial applications have already been made, such as the investigation by Professor Pearson and Miss Beeton into the inheritance of duration of life (see Journal of the Institute of Actuaries, vol. xxxv., pp. 112, et seq., and 458, et seq., and Biometrika, vol. i., part I.), while the correlations between the ages of husband and wife and age at death of man and number of his children under age 21 are obvious applications. The last-mentioned case seems to suggest that it might be possible to apply the method in connection with the valuation of pension funds, while we have already noticed that it is possible that it might be of use for checking average ages, &c., in endowment assurance valuations.

APPENDIX I.

USEFUL CONSTANTS.

e = 2.71828 18285

e⁻¹ = .36787 94412

π = 3.14159 26536

log₁₀ e = .43429 44819

logₑ 10 = 2.30258 509

log₁₀ (log₁₀ e) = 1̄.63778 43114

log₁₀ π = .49714 98727

log₁₀ √π = .24857 49363

log₁₀ (1/√(2π)) = 1̄.60091 00658

log₁₀ e^(−1/12) = 1̄.96380 87932

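APPENDIX II.

B AND Γ FUNCTIONS.

$$B(m, n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx$$

$$\Gamma(p) = \int_0^\infty e^{-x}x^{p-1}\,dx$$

I.

$$\int x^{p-1}e^{-x}\,dx = -e^{-x}x^{p-1} + (p-1)\int x^{p-2}e^{-x}\,dx$$

by integration by parts. When p − 1 is positive, $e^{-x}x^{p-1}$ vanishes when x = 0, and when x = ∞ it can be written $x^{p-1}/e^{x}$ and the rule for the evaluation of undetermined forms (Edwards' Diff. Calc., ch. xiv.) can be applied.

$$\therefore\ \int_0^\infty x^{p-1}e^{-x}\,dx = (p-1)\int_0^\infty x^{p-2}e^{-x}\,dx$$

i.e., Γ(p) = (p − 1)Γ(p − 1).

If p be an integer, Γ(p) = (p − 1)!

II. To prove $B(m, n) = \dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)}$.

Putting zx for x in the equation for Γ(m), we have

$$\Gamma(m) = \int_0^\infty e^{-zx}z^{m}x^{m-1}\,dx$$

and

$$\Gamma(m)\,e^{-z}z^{n-1} = \int_0^\infty e^{-z(1+x)}z^{m+n-1}x^{m-1}\,dx$$

Integrating both sides with respect to z from 0 to ∞,

$$\Gamma(m)\Gamma(n) = \int_0^\infty\int_0^\infty e^{-z(1+x)}z^{m+n-1}x^{m-1}\,dz\,dx$$

But if z(1 + x) = y, we get

$$\int_0^\infty e^{-z(1+x)}z^{m+n-1}\,dz = \frac{1}{(1+x)^{m+n}}\int_0^\infty e^{-y}y^{m+n-1}\,dy = \frac{\Gamma(m+n)}{(1+x)^{m+n}}$$

$$\therefore\ \Gamma(m)\Gamma(n) = \Gamma(m+n)\int_0^\infty\frac{x^{m-1}}{(1+x)^{m+n}}\,dx$$

But putting $1 + x = \dfrac{1}{1-z}$ in this integral, we obtain

$$\int_0^1\left(\frac{z}{1-z}\right)^{m-1}(1-z)^{m+n}\,\frac{dz}{(1-z)^2}$$

which reduces at once to B(m, n).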
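The identity between the B and Γ functions can be checked numerically. In this sketch `math.gamma` supplies Γ, and the integral defining B is estimated by a simple midpoint rule; the function names and the number of steps are assumptions chosen for the illustration.

```python
import math


def beta_by_gamma(m: float, n: float) -> float:
    """B(m, n) from Gamma(m) * Gamma(n) / Gamma(m + n)."""
    return math.gamma(m) * math.gamma(n) / math.gamma(m + n)


def beta_by_integration(m: float, n: float, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of x^(m-1) (1-x)^(n-1) over (0, 1)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x ** (m - 1) * (1.0 - x) ** (n - 1)
    return total * width


print(beta_by_gamma(2.5, 3.5), beta_by_integration(2.5, 3.5))
```

For integer arguments the Γ form gives the familiar factorial expression, e.g. B(2, 3) = 1! 2! / 4! = 1/12.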

III. To prove Γ(½) = √π.

We have already shown in the proof for Type VII. that $2\int_0^\infty e^{-x^2}\,dx = \sqrt{\pi}$, and by putting x² = z, we have

$$2\int_0^\infty e^{-x^2}\,dx = \int_0^\infty e^{-z}z^{-\frac12}\,dz = \Gamma(\tfrac12) = \sqrt{\pi}$$

For statistical work a table of Γ(x) or log Γ(x) is required, and Legendre has given a table of the latter to twelve figures for values of x between 1 and 2, from which log Γ(x) can be found easily, provided x is small. When x is large, log Γ(x) can be approximated to. The best known approximation* is

$$\Gamma(x+1) = \sqrt{2\pi x}\;x^{x}e^{-x}e^{\frac{1}{12x}}$$

or

$$\log_{10}\Gamma(x+1) = \log_{10}\sqrt{2\pi} + (x+\tfrac12)\log_{10}x - \left(x - \frac{1}{12x}\right)\log_{10}e$$

and it can be used when x is not less than 8.

To show how the table of log Γ(x) is used, and also how the approximation approaches the true value, the following table has been prepared:

* A proof of this well-known approximation will be found in Chrystal's Algebra, vol. ii., pp. 308, &c., or in Boole's Finite Differences, chapter VI.

Table comparing True and Approximate Values of log Γ(x).

                      Log Γ(x)
    x           True.        Approximate.     Difference.
  1.372       1̄.948975
  2.372        .086329        .086743          .000414
  3.372        .461444        .461532          .000088
  4.372        .989332        .989369          .000037
  5.372       1.630012       1.630025          .000013
  6.372       2.360148       2.360156          .000008
  7.372       3.164424       3.164430          .000006
  8.372       4.032009       4.032010          .000001
  9.372       4.954838       4.954837         −.000001
 10.372       5.926670       5.926669         −.000001

A six-figure table of log Γ(p), obtained by abridging Legendre, is given on pp. 166 and 167.
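The comparison in the table can be repeated with `math.lgamma` standing in for Legendre's table. This is a sketch which assumes the approximation is the usual form Γ(x + 1) ≈ √(2πx) xˣ e⁻ˣ e^(1/12x); the function name is illustrative.

```python
import math


def log10_gamma_approx(x: float) -> float:
    """log10 Gamma(x + 1) by the assumed approximation."""
    return (0.5 * math.log10(2.0 * math.pi)
            + (x + 0.5) * math.log10(x)
            - (x - 1.0 / (12.0 * x)) * math.log10(math.e))


for p in (2.372, 5.372, 8.372, 10.372):
    true_value = math.lgamma(p) / math.log(10.0)   # log10 Gamma(p)
    approx = log10_gamma_approx(p - 1.0)           # formula gives Gamma(x + 1)
    print(f"{p:7.3f}  {true_value:.6f}  {approx:.6f}  {approx - true_value:+.6f}")
```

Note that the formula is written for Γ(x + 1), so the argument passed to it is one less than the value of x in the first column of the table.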

by

abridging

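APPENDIX III.

The Integration of some Expressions connected with the Normal Curve of Error.

1. On page 91 we showed that

$$\int_{-\infty}^{+\infty} e^{-x^2}\,dx = \sqrt{\pi} \qquad \text{(i.)}$$

and

$$\int_{-\infty}^{+\infty} e^{-x^2/2h^2}\,dx = h\sqrt{2\pi} \qquad \text{(ii.)}$$

Since the curve is symmetrical, we have

$$\int_{-\infty}^{+\infty} x^{2n+1}e^{-x^2}\,dx = \text{zero} \qquad \text{(iii.)}$$

If we integrate $\int x^{2n}e^{-x^2}\,dx$ by parts, we have

$$\int x^{2n}e^{-x^2}\,dx = \frac{x^{2n+1}e^{-x^2}}{2n+1} + \frac{2}{2n+1}\int x^{2n+2}e^{-x^2}\,dx$$

and inserting the limits −∞ and +∞ we have

$$\int_{-\infty}^{+\infty} x^{2n}e^{-x^2}\,dx = \frac{2}{2n+1}\int_{-\infty}^{+\infty} x^{2n+2}e^{-x^2}\,dx \qquad \text{(iv.)}$$

This last formula shows the connection between the successive even moments.

2. Referring to p. 112, let

$$z = z_0\,e^{-\frac{1}{2(1-r^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2rxy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)}$$

Then z can be put in the form

$$z_0\,e^{-\frac{1}{2(1-r^2)}\left\{\left(\frac{x}{\sigma_1} - \frac{ry}{\sigma_2}\right)^2 + \frac{y^2(1-r^2)}{\sigma_2^2}\right\}} = z_0\,e^{-\frac{y^2}{2\sigma_2^2}}\,e^{-\frac{1}{2(1-r^2)\sigma_1^2}\left(x - \frac{r\sigma_1}{\sigma_2}y\right)^2}$$

Then

$$\int_{-\infty}^{+\infty} z\,dx = z_0\sqrt{2\pi(1-r^2)}\;\sigma_1\,e^{-y^2/2\sigma_2^2} \quad \text{by (ii.)}$$

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} z\,dx\,dy = z_0\sqrt{2\pi(1-r^2)}\;\sigma_1\int_{-\infty}^{+\infty} e^{-y^2/2\sigma_2^2}\,dy = z_0\,2\pi\sigma_1\sigma_2\sqrt{1-r^2} \qquad \text{(v.)}$$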
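Formula (iv.) lets each even moment integral be built from the one before it, starting from (i.). The sketch below does this and compares the result with Γ(n + ½), which is the known value of the same integral; the function name is an illustrative choice.

```python
import math


def even_moment_integral(n: int) -> float:
    """Integral of x^(2n) e^(-x^2) over the whole line, from (i.) and (iv.)."""
    value = math.sqrt(math.pi)            # n = 0, formula (i.)
    for j in range(n):
        value *= (2 * j + 1) / 2.0        # (iv.): next integral = (2j + 1)/2 times the last
    return value


for n in range(4):
    print(n, even_moment_integral(n), math.gamma(n + 0.5))
```

Each step multiplies by (2j + 1)/2, so the integrals run √π, ½√π, ¾√π, and so on.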

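3. Using the same method as that just given, it can be shown that if ac > b²

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-\frac12(ax^2 - 2bxy + cy^2)}\,dx\,dy = \frac{2\pi}{\sqrt{ac-b^2}} \qquad \text{(vi.)}$$

for the index can be written

$$-\tfrac12\left\{a\left(x - \frac{b}{a}y\right)^2 + y^2\left(c - \frac{b^2}{a}\right)\right\}$$

and if we then integrate with respect to x, we have

$$\sqrt{\frac{2\pi}{a}}\;e^{-\frac12y^2\frac{ac-b^2}{a}}$$

and if ac − b² is positive, we can integrate this last expression with respect to y and have

$$\sqrt{\frac{2\pi}{a}}\sqrt{\frac{2\pi a}{ac-b^2}} = \frac{2\pi}{\sqrt{ac-b^2}}$$

4. We will now find $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} zxy\,dx\,dy$.

Proceeding as in (v.), we have

$$\int_{-\infty}^{+\infty} zxy\,dx = z_0\,y\,e^{-y^2/2\sigma_2^2}\int_{-\infty}^{+\infty}\left(X + \frac{r\sigma_1}{\sigma_2}y\right)e^{-\frac{X^2}{2(1-r^2)\sigma_1^2}}\,dX, \quad \text{where } X = x - \frac{r\sigma_1}{\sigma_2}y$$

$$= z_0\,\frac{r\sigma_1}{\sigma_2}\sqrt{2\pi(1-r^2)}\;\sigma_1\,y^2e^{-y^2/2\sigma_2^2}$$

because by (iii.) $\int_{-\infty}^{+\infty} X\,e^{-\frac{X^2}{2(1-r^2)\sigma_1^2}}\,dX$ is zero.

But, by putting n = 0 in (iv.) and using (ii.), we have

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} zxy\,dx\,dy = z_0\,2\pi\,\sigma_1^2\sigma_2^2\,r\sqrt{1-r^2} = N\sigma_1\sigma_2r \qquad \text{(vii.)}$$

because by (v.) $z_0 = \dfrac{N}{2\pi\sigma_1\sigma_2\sqrt{1-r^2}}$.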

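5. We may now deal with the problem referred to in Chap. VII.: "To find a value of r from the equation

$$\frac{d}{N} = \frac{1}{2\pi\sqrt{1-r^2}}\int_h^\infty\int_k^\infty e^{-(x^2+y^2-2rxy)/2(1-r^2)}\,dx\,dy$$

where d, N, h and k are known."

Consider the expression

$$\frac{1}{\sqrt{1-r^2}}\,e^{-(x^2+y^2-2rxy)/2(1-r^2)} = U, \text{ say} \qquad (\alpha)$$

and expand it in terms of r by Maclaurin's theorem, then

$$U = e^{-\frac12(x^2+y^2)}\left\{u_0 + \frac{r}{1!}u_1 + \frac{r^2}{2!}u_2 + \ldots + \frac{r^n}{n!}u_n + \ldots\right\} \qquad (\beta)$$

where

$$u_n = e^{\frac12(x^2+y^2)}\left(\frac{d^nU}{dr^n}\right)_{r=0} \qquad (\gamma)$$

Now take the logarithmic differential coefficient of U with respect to r, and we have

$$\frac{1}{U}\frac{dU}{dr} = -r(x^2+y^2-2rxy)(1-r^2)^{-2} + (1-r^2)^{-1}xy + r(1-r^2)^{-1}$$

$$\therefore\ (1-r^2)^2\frac{dU}{dr} = U\{xy + r(1-x^2-y^2) + r^2xy - r^3\}$$

Differentiate n times by Leibnitz's theorem and put r = 0, and we have

$$u_{n+1} = n(2n-1-x^2-y^2)u_{n-1} - n(n-1)(n-2)^2u_{n-3} + xy\{u_n + n(n-1)u_{n-2}\}$$

Hence

$$u_0 = 1, \quad u_1 = xy, \quad u_2 = (x^2-1)(y^2-1), \quad u_3 = x(x^2-3)\,y(y^2-3), \quad u_4 = (x^4-6x^2+3)(y^4-6y^2+3) \qquad (\delta)$$

The laws indicated by (δ) are

$$u_n = v_nw_n, \qquad v_n = xv_{n-1} - (n-1)v_{n-2}, \qquad w_n = yw_{n-1} - (n-1)w_{n-2} \qquad (\epsilon)$$

and we can, therefore, re-write (β) as

$$\frac{U}{2\pi} = \frac{1}{2\pi}\,e^{-\frac12x^2}e^{-\frac12y^2}\left(1 + \frac{r}{1!}v_1w_1 + \frac{r^2}{2!}v_2w_2 + \ldots\right) \qquad (\zeta)$$

Integrating this from h to ∞ with respect to x, and remembering that $w_n$ does not involve x, we have

$$\frac{1}{2\pi}\int_h^\infty U\,dx = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12y^2}\left\{V_0 + \frac{r}{1!}V_1w_1 + \ldots + \frac{r^n}{n!}V_nw_n + \ldots\right\}$$

where $V_n$ is written for $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_h^\infty v_ne^{-\frac12x^2}\,dx$.

Now integrate this with respect to y from k to ∞, remembering that $V_n$ does not involve y, and writing $W_n$ for $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_k^\infty e^{-\frac12y^2}w_n\,dy$, we see that

$$\frac{1}{2\pi}\int_h^\infty\int_k^\infty U\,dx\,dy$$

can be expressed as a series of which the general term is $\dfrac{r^n}{n!}V_nW_n$, and we must now evaluate $V_n$ and $W_n$.

From (ε) it can be shown by induction* that the general form of $v_n$ is

$$x^n - \frac{n(n-1)}{2}x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot4}x^{n-4} - \text{&c.} \qquad (\eta)$$

Now we notice that

$$\frac{dv_n}{dx} = nv_{n-1}$$

so that by (ε)

$$v_n = xv_{n-1} - \frac{dv_{n-1}}{dx}$$

Multiply by $e^{-\frac12x^2}$ and integrate:

$$\int e^{-\frac12x^2}v_n\,dx = \int xe^{-\frac12x^2}v_{n-1}\,dx - \int e^{-\frac12x^2}\frac{dv_{n-1}}{dx}\,dx$$

and integrating the latter integral by parts we have

$$\int e^{-\frac12x^2}v_n\,dx = -e^{-\frac12x^2}v_{n-1}$$

* The proof is as follows: if the form (η) holds for $v_{n-1}$ and $v_{n-2}$, then

$$v_n = xv_{n-1} - (n-1)v_{n-2} = x\left\{x^{n-1} - \frac{(n-1)(n-2)}{2}x^{n-3} + \frac{(n-1)(n-2)(n-3)(n-4)}{2\cdot4}x^{n-5} - \ldots\right\} - (n-1)\left\{x^{n-2} - \frac{(n-2)(n-3)}{2}x^{n-4} + \ldots\right\}$$

$$= x^n - \frac{n(n-1)}{2}x^{n-2} + \frac{n(n-1)(n-2)(n-3)}{2\cdot4}x^{n-4} - \ldots$$

Now, writing H for $\dfrac{1}{\sqrt{2\pi}}e^{-\frac12h^2}$, and K for $\dfrac{1}{\sqrt{2\pi}}e^{-\frac12k^2}$, we have from (α), the polynomials being evaluated at h and k,

$$\frac{d}{N} = \frac{1}{2\pi}\int_h^\infty\int_k^\infty U\,dx\,dy = \frac{(b+d)(c+d)}{N^2} + \sum_{n=1}^{\infty}\frac{r^n}{n!}\,HK\,v_{n-1}w_{n-1}$$

or remembering that N = a + b + c + d, we write

$$\frac{ad-bc}{N^2HK} = r + \frac{r^2}{2!}hk + \frac{r^3}{3!}(h^2-1)(k^2-1) + \frac{r^4}{4!}h(h^2-3)\,k(k^2-3) + \frac{r^5}{5!}(h^4-6h^2+3)(k^4-6k^2+3) + \frac{r^6}{6!}h(h^4-10h^2+15)\,k(k^4-10k^2+15) + \text{&c.} \qquad \text{(viii.)}$$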
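Equation (viii.) can be solved for r numerically. The following is a sketch under stated assumptions: the polynomials are generated by the recurrence (ε), h and k are found from the marginal proportions with the inverse normal function of the standard library (so that d is the cell lying beyond both h and k), and r is located by bisection. The function names, the number of terms and the symmetric test table are illustrative and are not taken from the text.

```python
from statistics import NormalDist


def v_polynomial(n: int, x: float) -> float:
    """v_n(x) from v_n = x v_(n-1) - (n - 1) v_(n-2), with v_0 = 1, v_1 = x."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for j in range(2, n + 1):
        previous, current = current, x * current - (j - 1) * previous
    return current


def series_viii(r: float, h: float, k: float, terms: int = 40) -> float:
    """Sum of r^n / n! * v_(n-1)(h) * v_(n-1)(k) for n = 1 .. terms."""
    total, power_over_factorial = 0.0, 1.0
    for n in range(1, terms + 1):
        power_over_factorial *= r / n
        total += power_over_factorial * v_polynomial(n - 1, h) * v_polynomial(n - 1, k)
    return total


def r_from_fourfold(a: float, b: float, c: float, d: float) -> float:
    """Solve (viii.) for r by bisection (assumed cell layout, see lead-in)."""
    n_total = a + b + c + d
    std = NormalDist()
    h = std.inv_cdf(1.0 - (b + d) / n_total)
    k = std.inv_cdf(1.0 - (c + d) / n_total)
    target = (a * d - b * c) / (n_total ** 2 * std.pdf(h) * std.pdf(k))
    low, high = -0.99, 0.99
    for _ in range(60):
        mid = 0.5 * (low + high)
        if series_viii(mid, h, k) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Symmetric illustration: both divisions fall at the means (h = k = 0).
print(r_from_fourfold(200, 100, 100, 200))
```

When h = k = 0 the series reduces to that of arcsin r, which gives a convenient check: the illustrative table above has (ad − bc)/(N²HK) = π/6, corresponding to r = ½. For divisions far from the means, or r near ±1, more terms may be needed than the forty assumed here.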


APPENDIX IV.

Alternative Systems of Frequency-Curves.

This Appendix deals with some of the systems of frequency-curves which have been suggested instead of those of Chapter IV. Objection to Pearson's system has not been generally directed against its practical sufficiency, but has rather been founded on the contention that the theoretical basis of the system is insufficient. It is interesting to note in this connection that in a paper read before the Statistical Society this year, Professor F. Y. Edgeworth, who has himself suggested other methods, points out that Pearson's Generalized Probability Curve appears more justifiable in the matter of a priori justification the longer its philosophic basis is subjected to criticism. With these prefatory remarks, we may turn to the suggested methods.

I. Method of Translation. As the normal curve has an approved theoretical basis, graduation might be effected by using $y = e^{-\{f(x)\}^2}$. This merely conceals the use of an absolutely general expression, and one still requires to know what forms are best for f(x). It is hard to see why the normal curve should be held to be anything more than a first approximation to a general result. For a fuller account of this method, the reader may refer to F. Y. Edgeworth, Journal of Statistical Society, vol. lxi., pp. 675-689, or J. C. Kapteyn, "Skew Frequency-Curves in Biology and Statistics," Groningen, 1903.

II. The use of half one normal curve for positive and half another normal curve for negative frequencies. Obviously, there is no theoretical basis, for each separate normal curve has its meaning based on certain assumptions, but the two parts become meaningless. The use of a part of a normal curve for a complete series is empirical; our own use of part of one on p. 95 is so too, but we did not adopt it for actual curve fitting, but as a hypothetical series of numbers. The method cannot give suitable curves for graduating the examples of Type II. or Type III., nor a curve rising abruptly from the axis of x.
III. The use of the series

$$y = A_0\,\phi(x) + A_3\,\phi'''(x) + A_4\,\phi^{iv}(x) + \dots$$

where

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-b)^2/2\sigma^2}.$$

The curve has been given by Edgeworth, Camb. Phil. Trans., vol. xx, pp. 36-65, 113-141, C. V. L. Charlier, "Researches into the Theory of Probability," Meddelanden från Lunds Astronomiska Observatorium, Lund, 1906, and T. N. Thiele, "Theory of Observations," London, 1903, either in this form, or as

$$y = \phi(x)\{1 + \beta_3[(x-b)^3 - 3c(x-b)] + \beta_4[(x-b)^4 - 6c(x-b)^2 + 3c^2] + \dots\}$$

which might be developed as

$$y = \phi(x)\{a_0 + a_1 x + a_2 x^2 + \dots\}.$$

Charlier uses the method of moments for fitting the curve and, using our notation, b is the mean, $\sigma^2 = \mu_2$, and $A_3$, $A_4$, &c., follow from the third, fourth, &c., moments. He gives tables of $\sigma\phi(x)$, $\sigma^4\phi'''(x)$ and $\sigma^5\phi^{iv}(x)$, and writes his formula in terms of these tabulated functions, each multiplied by its coefficient and the whole multiplied by $N/\sigma$.

An example will be of help. Taking our example of Type IV., we find the mean 44.57723, σ = 2.127818, N/σ = 4302.0, and the coefficients of the second and third terms .012208 and .007079, and using Charlier's tables, we obtain the following:

Central   (x - 44.57723)   First    Second    Third     Sum of three previous
  Age          / 5σ         Term     Term      Term     cols. multiplied by 4302.0

   5         -3.7200       .0004    +.0002    +.0003            4
  10         -3.2500       .0020    +.0006    +.0007           14
  15         -2.7800       .0084    +.0013    +.0010           46
  20         -2.3100       .0277    +.0018    -.0001          126
  25         -1.8401       .0734    +.0006    -.0030          306
  30         -1.3701       .1561    -.0030    -.0052          637
  35          -.9002       .2661    -.0064    -.0023        1,108
  40          -.4302       .3637    -.0054    +.0050        1,563
  45          +.0397       .3986    +.0006    +.0084        1,753
  50          +.5097       .3503    +.0060    +.0037        1,548
  55          +.9797       .2468    +.0060    -.0032        1,075
  60         +1.4496       .1394    +.0022    -.0047          589
  65         +1.9196       .0632    -.0010    -.0025          256
  70         +2.3895       .0229    -.0018    +.0002           92
  75         +2.8595       .0067    -.0012    +.0010           29
  80         +3.3295       .0015    -.0005    +.0007            7
  85         +3.7994       .0003    -.0001    +.0003            2
  90         +4.2693       .0000    -.0000    +.0001            1

                                                    Total    9,156
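The arithmetic of the table can be checked with a short calculation. The following is only a sketch, not Charlier's own procedure: it assumes that the tabulated functions are the ordinate of the standard normal curve and its third and fourth derivatives, and that the constants are the mean 44.57723, σ = 2.127818 (in units of 5 years), the multiplier 4302.0, and the coefficients .012208 and .007079 as read from the example. The names used (`three_terms`, `graduated`, and the constants) are illustrative only.

```python
import math

MEAN_AGE = 44.57723   # mean of the Type IV. example, in years (as read from the text)
SIGMA = 2.127818      # standard deviation, in units of 5 years (as read from the text)
SCALE = 4302.0        # multiplier applied to the sum of the three terms
COEF_3 = 0.012208     # coefficient of the third-derivative term (as read from the text)
COEF_4 = 0.007079     # coefficient of the fourth-derivative term (as read from the text)


def normal_ordinate(z):
    """Ordinate of the standard normal curve at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def three_terms(age):
    """Return the first, second and third terms for a central age."""
    z = (age - MEAN_AGE) / (5.0 * SIGMA)
    phi = normal_ordinate(z)
    first = phi
    # The third derivative of the normal ordinate is (3z - z^3) * phi.
    second = COEF_3 * (3.0 * z - z ** 3) * phi
    # The fourth derivative of the normal ordinate is (z^4 - 6z^2 + 3) * phi.
    third = COEF_4 * (z ** 4 - 6.0 * z * z + 3.0) * phi
    return first, second, third


def graduated(age):
    """Graduated frequency for a central age."""
    return SCALE * sum(three_terms(age))


table = {age: graduated(age) for age in range(5, 95, 5)}
total = sum(table.values())

for age, value in table.items():
    print(f"{age:3d} {value:10.1f}")
print(f"total {total:8.1f}")
```

Because the printed table was built from four-figure tabulated values, a direct calculation of this kind may differ from it by a unit or two in individual groups.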

Thiele obtains the equation by writing $e^{-(x-b)^2/2\sigma^2} = e^{-x^2/2\sigma^2}\,e^{-b^2/2\sigma^2}\,e^{bx/\sigma^2}$ and then expands the last term by Maclaurin's theorem. Charlier in "Ueber das Fehlergesetz," Arkiv för Matematik, vol. ii., Stockholm, 1905, adopts a method which follows that of Laplace. Edgeworth gives more than one method of reaching the formula.
This graduation, which shows the method in a suitable application, gives a result which is less probable than the Type IV. graduation as judged by the test in Chapter IX., though the difference is entirely due to the bad agreement in the age 5 group. With the Type II. example the equation is

$$y = \frac{N}{\sigma}\{\sigma\phi(x) - .0081\,\sigma^4\phi'''(x) - .01882\,\sigma^5\phi^{iv}(x)\}$$

which gives a fair result that is improved into an excellent one by omitting the middle term, but negative frequencies arise (see below).
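The appearance of negative frequencies can be seen without the data. The sketch below assumes the reading of the Type II. equation in which the last term carries the coefficient -.01882, and works in standardised units z = (x - b)/σ; with the middle term omitted, the curve is the normal ordinate multiplied by 1 - .01882(z^4 - 6z^2 + 3), so its sign is the sign of that factor. The function names are illustrative only.

```python
def bracket(z):
    """Factor multiplying the normal ordinate when the middle term is omitted."""
    hermite_4 = z ** 4 - 6.0 * z * z + 3.0
    return 1.0 - 0.01882 * hermite_4


def first_negative(step=0.01, limit=6.0):
    """Smallest positive z on the grid at which the factor is negative."""
    z = 0.0
    while z <= limit:
        if bracket(z) < 0.0:
            return z
        z += step
    return None


z_negative = first_negative()
print(z_negative)
```

On this reading the graduated frequency becomes negative a little beyond three times the standard deviation on either side of the mean, since the factor is an even function of z.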

The formula fails to graduate distributions such as our example for Type I. or Example I. of Table I., and for these cases Charlier suggests the use of the series

IV.

$$y = B_0\,\psi(x) + B_1\,\Delta\psi(x) + B_2\,\Delta^2\psi(x) + \dots$$

in which the argument is a linear function of x with scale ω and origin c, and where

$$\psi(x) = e^{-\lambda}\,\frac{\sin \pi x}{\pi}\left[\frac{1}{x} - \frac{\lambda}{1!\,(x-1)} + \frac{\lambda^2}{2!\,(x-2)} - \dots\right] = \frac{e^{-\lambda}\lambda^x}{x!}$$

when x is a positive integer (cf. Thiele, Theory of Observations, p. 21).

Charlier's fitting of this curve is arbitrary, as he gives four methods of solution according as we assume certain values for ω, c, &c. Thus one solution is based on ω = 1 and c = 0, while another, by assuming $B_1 = B_2 = B_3 = 0$, finds λ, ω, and c. He gives only two examples of IV. because two points in connection with its application have still to be cleared up; a third point to which he does not refer is that a statistical criterion depending on the moments is required to show which series is to be used and which solution is to be taken.
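The statement that the generating function of series IV. reduces to the Poisson term at positive integers can be checked numerically. This is a sketch under the assumption that the function is e^(-λ) (sin πx / π) [1/x - λ/(1!(x-1)) + λ²/(2!(x-2)) - ...]; the names `psi_series` and `poisson`, and the choice of 60 terms, are illustrative and not taken from Charlier or Thiele.

```python
import math


def psi_series(x, lam, terms=60):
    """Evaluate the series form at a non-integer x (x - k must not vanish)."""
    total = 0.0
    for k in range(terms):
        total += (-lam) ** k / (math.factorial(k) * (x - k))
    return math.exp(-lam) * math.sin(math.pi * x) / math.pi * total


def poisson(n, lam):
    """Poisson term exp(-lam) * lam**n / n! for a non-negative integer n."""
    return math.exp(-lam) * lam ** n / math.factorial(n)


# Approach the integer 3 from just above and compare with the Poisson term.
near_integer = psi_series(3.0 + 1e-6, 2.0)
exact = poisson(3, 2.0)
print(near_integer, exact)
```

The series form is needed only between the integers; at the integers themselves the single term with a vanishing denominator combines with the vanishing sine to give the Poisson value as a limit.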

Apart, however, from these points the use of a series seems open to many objections, for if one of the later coefficients should have a large value the neglect of later terms may involve considerable error, while the higher moments which are necessary to find these coefficients are untrustworthy owing to their large probable errors. The use of a limited number of terms of such series as those suggested may lead to negative frequency, which is objectionable from the practical point of view and hard to reconcile with sound theoretical treatment.

Edgeworth's well-known curve is merely the first two terms of the series III., and its inability to graduate distributions having considerable skewness is rather accentuated by Charlier's recent work in which he often finds the next term of the series significant.

APPENDIX V.

BOOKS, REFERENCES, &c.

The following list covers the principal papers to which reference has been made, though it is by no means a complete bibliography. It will, however, be found to cover those papers most likely to prove of value or interest to actuarial students.

THEORETICAL PAPERS, &c.

Biometrika (Editorials):
    "On the Probable Errors of Frequency Constants." Biom., vol. ii., pp. 273, et seq.
    "Elementary Proof of Sheppard's Formulae, &c." Biom., vol. iii., pp. 308, et seq.

Blakeman, J.:
    "On Tests for Linearity of Regression in Frequency Distributions." Biom., vol. iv., pp. 332, et seq.

Blakeman, J., and Pearson, K.:
    "On the Probable Error of Mean Square Contingency." Biom., vol. v., pp. 191, et seq.

Davenport, C. B.:
    "Statistical Methods." New York: John Wiley & Sons; London: Chapman & Hall, 1904.

Galton, F.:
    "Correlations and their Measurement." Proc. Roy. Soc., vol. xlv., pp. 136-145.

Pearson, Karl:
    "Skew Variation in Homogeneous Material." Phil. Trans. A., vol. clxxxvi., pp. 343, et seq., and a supplement in Phil. Trans. A., vol. cxcvii., pp. 443-459.
    "Regression, Heredity and Panmixia." Phil. Trans. A., vol. clxxxvii., pp. 253-318.
    "On a Form of Spurious Correlation which may arise when Indices are used, &c." Proc. Roy. Soc., vol. lx., pp. 489-498.
    "On the criterion that a given system of Deviations from the Probable in the case of a Correlated System of Variables is such that it can be reasonably supposed to have arisen from Random Sampling." Phil. Mag., July, 1900.
    "On the Correlation of Characters not quantitatively measurable." Phil. Trans. A., vol. cxcv., pp. 1-47.
    "On the Lines and Planes of Closest Fit to Systems of Points in Space." Phil. Mag., November, 1901.
    "On the Mathematical Theory of Errors of Judgment, with special reference to the Personal Equation." Phil. Trans. A., vol. cxcviii., pp. 235-299.
    "On the Influence of Natural Selection on the Variability and Correlation of Organs." Phil. Trans. A., vol. cc., pp. 1-66.
    "On a General Theory of the Method of False Position." Phil. Mag., June, 1903.
    "On the Theory of Contingency and its relation to Association and Normal Correlation." Drapers' Company Research Memoir: Dulau & Co., 1904.
    "On the General Theory of Skew Correlation and Non-linear Regression." Drapers' Company Research Memoir: Dulau & Co., 1905.
    "On the Systematic Fitting of Curves to Observations and Measurements." Biom., vol. i., pp. 265, et seq., and vol. ii., pp. 1, et seq.

Pearson, Karl, and Filon, L. N. G.:
    "On the Probable Errors of Frequency Constants and on the Influence of Random Selection on Variation and Correlation." Phil. Trans. A., vol. cxci., pp. 229-311.

Sheppard, W. F.:
    "On the Application of the Theory of Error to Cases of Normal Distribution and Normal Correlation." Phil. Trans. A., vol. cxcii., pp. 101-167.
    "On the Calculation of the Most Probable Values of the Frequency Constants for data arranged according to equidistant divisions of a scale." Proc. Lond. Math. Soc., vol. xxix., pp. 353-380.

Yule, G. U.:
    "On the Significance of Bravais' Formulae for Regression, &c., in the case of Skew Correlation." Proc. Roy. Soc., vol. lx., pp. 477, et seq.
    "On the Theory of Correlation." Journal of Statistical Society, vol. lx., pp. 812, et seq.

TABLES.

Professor Pearson informs us that he hopes to publish very shortly a volume of copyright tables for the use of statisticians, and it has therefore been decided not to include any tables in this volume except that of log Γ(p).

Barlow's Tables of Squares, Cubes, &c. E. & F. N. Spon.

Elderton, W. Palin:
    Tables for Testing the Goodness of Fit of Theory to Observation. Biom., vol. i., pp. 155, et seq.
    Tables of Powers of Natural Numbers and of the Sums of the Powers of the Natural Numbers, from 1 to 100. Biom., vol. ii., pp. 474, et seq.

Gibson, W.:
    Tables for Facilitating the Computation of Probable Errors. Biom., vol. iv., pp. 385, et seq.

Lee, A.:
    Tables of F(r, ν) and H(r, ν) Functions, r = 1 to r = 50, and φ = 0° to 45°. Brit. Assocn. for Advancement of Science, 1899.

Sheppard, W. F.:
    New Tables of the Probability Integral. Biom., vol. ii., pp. 174, et seq.

There are also tables of the following functions in Davenport's "Statistical Methods":
    Smaller tables than Sheppard's of the Normal Curve of Error.
    Table of log Γ(p).
    Table of first six powers of Natural Numbers from 1 to 30.
    Small table of Probable Errors of r.
    Squares, Cubes, Square Roots, Cube Roots and Reciprocals of numbers from 1 to 1,054.
    Table to six decimal places of L sin θ, L tan θ, L cos θ, L cot θ.
Table of log Γ(p).

p
1-00 1-01 1-02 1-03

2
9500 7043 4656 2338 0089 7907
5791 3741 1755 9833

3
9251 6801 4421 2110

4
90U3 6560 4187 1883 9647 747S 5376 3338 1365 9456 7610 5825 4101 2438

5
8755 6320 3953 1656
9427 7265 5169 3138 1172 9269

6
8509 6080 3721 1430

7
8263 5841 3489 1205
|

8
8017 5602 3257 0981
8772 6629 4553 2541 0594 8710 6883

9
777: J
i

1-99 1-99 1-99

1-04 1-05 1-06


1-07 1-08 1-09

1-99 1-98 1-98


1-98 1-98 1-98 1-97
1-97 1-97

1-10
1-11

7529 5128 2796 0533 8338 6209 4145 2147 0212 8341
6531 4783 3096 1469 9901 8390 6939 5544 4205 2922
1695 0521 9401 8335 7321 6359 5449 4589 3780 3020 2310 1648 1035 0470 9951 9480 9054 8676 8342 SU53 7808 760S 7451 7338 7268 7240 7254 7310 7407

112
1-13

1-97
1-97 1-96 1-96

1-14 1-15 1-16

9750 7285 4892 2567 0311 8122 6000 3943 1951 0022 8157 6354 4612 2931 1309 9747 8243
6797 5408 4075 2797 1575 0407 9292

9868 7692 5583 3539 1560 9644


779 L 6000 4271 2602

117 118
1-19
1-20

1-96 1-96 1-96


1-96
1-96 1-96 1-95 1-95 1-95 1-95
1-95 1-95 1-95

1-21 1-22 1-23

1-24 1-25
1-26

1-27 1-28 1-29

8231 7223 6267 5360 4506 3702


2.>4

1-30
1-31 1-32 1-33

1-95 1-95 1-95 1-95

7974 6177 4441 2766 1150 9594 8096 6655 5272 3944 2672 1456 0293 9184 8128 7125 6173 5273 4423 3624 2874
-174 1522 0918

22-12

1-34 1-35 1-36

1-95 1-94 1-94

137
1

1-38 1'39
1.40
1-41 1-42

1-94 1-94 1-94

1-94
1-94 1-94 1-94 1-94

143
1-44 1-45 1-46
1-47 1-48 1-49

T94
1-94
1-94 1-94 1-94

1585 0977 0416 9902 9435 9015 8640 8311 8026 7786 7590 7438 7329 7263 7239 7258 7317
7115) 1

0362 9853 9391 8975 8605 8280 8000


7765 7573

7425 7321 7259 7239 7262 7326 7431

0992 9442 7949 6514 5137 3815 2548 1337 0180 9076 80^5 7027 6081 5185 4341 3547 2802 2106 1459 0861 0309 9805 9348 8936 8571 8250 7975 7744 7556 7413 7312 7255 7240 7266 7334
711
1

0835 9290 7803 6374 5002 3686 2425 1219 0067 8968 7923 6930 5989 5099 4259 3470 2730 2040 1397 0803 0257 9757 9304 8898 8537
8221

7950 7723 7540 7401 7305 7251


7241 7271 7343 7157

7428 5650 3932 2275 0677 9139 7658 6234 4868 3557 2302 1101 9955 8861 7821 6834 5898 5013 4178 3394 2659 1973 1336 0747 0205 9710 9262 8859 8503 8192 7925 7703 7524 7389 7298
72 48

9208 7053 4963 2939 0978 9082 7248 5475 3764 2113 0521 8988 7513 6095 4734 3429 2179 0984 9843 8755 7720 6738 5807 4927 4097 3318 2588 1907
1275

8989 6841 4758


27-10

5365 3026 0757 8554 6419 4349

0690 0153 9663 9219 8822 8470 8163 7901 7683 7509 7378
7291 7246 7243

0786 8896 7068 5301 3596 1951 0365 8838 7369 5957 4601 3302 2057 0867 9732 8649 7620 6642 5716 4842 4017 3243 2518 1842 1214 0634 0102 9617 9178 8785 8437 8135 7877 7664 7494 7368 7284 7214
72 \b

5128 3429 1790 0210 8688 7225 5818 4169 3175 1936
0751 9621

2344 0403 525 6709 4955 3262 1629 0055 8539 7082
568 L 4337 3048 1815 0636 9511 S439 7420 6453 5537 4673 3858 3094 2379 1712 1094 0524 0001 9525 9095 8711 8373 8080 7831 7626 7165 7348 7273 7241 7251 7302 7395
-7529

'

7242
7277 7353
7 474

7282 7363 7485

7289 7373 7499

'

8544 7520 6547 5627 4757 3938 3168 2448 1777 1154 0579 0051 9571 9136 8748 8105 8107 7854 7645 7479 7358 7278 7242 7248 7295 7384 7514

Table of log Γ(p) (continued).
5
|

p
1-50
1-51 1-52 1-53

1
|

2
|

4
7612

6
7647 7850 8093 8376 8698 9059 9458 9896 0372 0886 1437 2025 2650 3312 1010 4743 5513

7
7666 7873 8120 8406 8732 9097 9500 9912 0422 0939 1494 2086 2715 3380 4081 4819 5592 6400 7213 8122 9034 9980 0961 1976 3024 4105 5220 6367 7547 8759 0003 1279 2586 3925 5295 6697 8128 9591 1084 2607 4159 5742 7354 8996 0666 2366 4094
5851 7637

8
7685

9
7704 7919 8174 8468 8802 9174 9586 0035 0522 1047 1610 2209 2845 3517 4226 4970 5740 6566 7416 8301 9220 0174 1162 2183 3238 4326 5447 6600 7787 9005 0255 1538 2852 4197 5573 6980 8419 9887 1386 2915 4474 6062 7680 9327 1004 2709 4143 6206 7997 9816
9

1-94
1-94 1-94 1-91 1-94 1-94 1-94
1-94 1-95 1-95 1-95 1-95 1-95 1-95
1-95 1-95 1-95
!

1-54 1'55 1-56


1'57 1-58 1-59

1-60
1-61 1-62 1-63

7545 7724 7913 8201 8500 8837 9211 96^9 0082 0573 1102 1668 2271
2911

7561 7744 7967 8229 8532 8873 9254

7577 7761 7991 8258 8561 8910

1-64 1-65 1-66


1-67 1-68 1-69

1-95
1-95 1-95

'

1-70
1-71 1-72 1-73

1-95
1-95

1-96 1-96

1-74 1-75 1-76


1-77 1-78
1
i

1-96 1-96 1-96 1-96 1-96 1-96


1-96
1-97 1-97

79

P80
1-81 1-82

183
1-84 1-85 1-86
1-87 1-88 1-89

1-97
1-97 1-97 1-97

1-97 1-98 1-98

1-90
1-91

1-98
1-98 1-98 1-98

1-92 1-93

194
1-95

196
1-97 1-98 1-99

1-98 1-99 1-99


1-99

199
1-99

3587 4299 5047 5830 6649 7503 8391 9311 0271 1262 2-87 3315 4436 5561 6718 7907 9129 0383 1668 2985 4333 5712 7123 8561 0036 1537 3069 4631 6223 7814 9191 1173 2S81 1618 6381 8178

9672 0130 062 4 1157 1727 2333 2977 3656 4372 5124 5911 6733 7590 8182 9409 0369 1363 2391 3453 4547 5675 6835 8028
9253 0509 1798 3118 4470 5852 7266 8710 0184 1689 3224 4789 6383 8007 9660 1343 3051 4794 6562 8359
1

9294 9716 0177 0676 1212 1786 2396


3013 3726 4416 5201 5991 6817 7678 8573 9502 0467 1464 2496 3561 1659 5789 6953 8149
11377

7594 7785 8016 8287 8597 8946 9334 9701 0225 0728 1268 1815 2159 3110 3797 4519 5278 6072 6901 7766 8661 9598 0565 1566 26U1 3669 4770 5901 7071 8270
9501

0637 1929 3232 4606 5992 7408 8856 0333


1841

0765 2060 3386

3379 4947 6541 8171

9827 1512 3227 4969 6740 8510

4741 6132 7552 9002 0483 1994 3535 5105 6706 8336 9995 1683 3399 5145 6919 8722
3

7806 8041 8316 8630 8983 9375 9806 0274 0780 1321 1905 2522 3177 3867 4594 5356 6151 6986 7854 8756 9693 0664 1668 2706 3778 4882 6019 7189 8392 9626 0893 2191 3520 4881 6273 7696 9119 0633 2117 3690 5264 6867 8500 0162 1853 3573 5321 7098 8903

7629 7828 8067 8316 8664 9021 9117 9851 0323 0833 1380 1965 2586 3244 3938 4668 5434 6235 7072 7943 8848 9788 0763 1770 2812 3887 1991 6135 7308 8514 9751 1021 2322 3655 5019 6111 7810 9296 0783 2299 3816 5423 7029 8665 0330 2024 3746 5498 7277 9085 5

6317 7157 8032 8911 9881 0862 1873 2918 3996 5107 6251 7127 8636 9877 1150 2454 3790 5157 6555 7984 9443 0933 2453 4003 5582 7192 8830 0498 2195 3920 5674 7457 9268

9450

7896 8146 8437 8767 9135 9543 9989 0472 0993 1552 2147 2780 3449 4151 4894 5671 6482 7322 8211 9127 0077 1061 2079 3131 4215 5333 6484 7666 8882 0129 1408 2719 4061 5434 6838 8273 9739 1234 2761 4316 5902 7517 9161 0835 2537 4269 6029 7817 9633

INDEX.

ACTUARIAL TERMS, xi-xiii.
ADJUSTMENT OF MOMENTS, 24-30, 102-104.
ALLIN, S. J. H. W., 33.
APPROXIMATION TO ROOT OF EQUATION, 128, 129.
AREAS
    for curves, 48, 49, 58, 63, 104, 105.
    moments of a system of, 28-30, 102-104.
ARRAY, 107.

B-FUNCTIONS, 59, 85, App. II.
BEETON, M., 150.

CALCULATION OF CURVES, ch. V., 161, 162.
CHARLIER, C. V. L., App. IV.
COEFFICIENTS OF CORRELATION, &c. (see CORRELATION, &c.).
CONSTANTS
    for curves, ch. I., ch. V.
    table of, App. I.
CONTACT, 28.
CONTINGENCY
    and correlation, 145-148.
    mean, vii, viii.
    mean square, vii, viii., ch. X.
    probable error of mean square, 149.
    theory of, ch. X.
CORRELATION, ch. VI., VII., X., App. III.
    and actuarial work, vi., 150.
    and contingency, 145-148.
    coefficient of, 112, 115, 116, 119, 128-130.
    probable error of coefficient of, 136-138.
    spurious, 122-124.
CRITERION
    for type of curve, 42-47, 50.
    for goodness of fit, ch. IX.

DEATHS, graduation of statistics, 79, 80.
DIAGRAMS
    construction of, 5-8, 49, 72, 80, 89.
    reproduction of, 6, 7, 11, 12, 57, 64, 67, 73, 81, 86, 90, 108, 114.

EDGEWORTH, F. Y., App. IV.
ENDOWMENT ASSURANCES
    check valuation of, 121.
    and correlation and contingency, 106-109, 117-122, 145-149.
ENTRANTS, statistics of, 84-86.
EXISTING, statistics of, 8.
EXPOSED TO RISK
    hypothetical, 94-100.
    statistics of, 8, 54-58.

FREQUENCY-CURVES
    desiderata, ch. I., 36.
    Pearson's system, ch. IV.
    other systems, App. IV.
    table of, 47.
FREQUENCY DISTRIBUTIONS, ch. II.

G-FUNCTIONS, 70, 71, 74-77.
GALTON, F., 116 (footnote).
GAMMA FUNCTIONS, 54, 56, App. II.
    Γ(p) when p < ..., 54.
    Γ(½), App. II.
    table of, 166, 167.
GEOMETRICAL PROGRESSION, 2, 5, 103.
GOODNESS OF FIT, ch. IX.
GRADUATION, 36, ch. V., App. IV.
    of rates, 92-100.
GRAPHICAL REPRESENTATION
    of curves, 72, 80, 89.
    of distributions, 5-8.

HARDY, G. F., 19, 54, 62, 93, 99, 100, 119-121, 139 (footnote).
HYPERGEOMETRICAL SERIES, 37, 38.

INDICES, dangers of using, 122-124.

KAPTEYN, J. C., App. IV.
KING, G., 79, 143, 144.

LEAST SQUARES, Method of, viii.
LEE, A., 71.
LEES, M. M., 93.
LIDSTONE, G. J., xiii., 22 (footnote), 62, 121.
LOADING FOR EMERGENCIES, 133.

MACDONELL, W. R., 125.
MAKEHAM'S HYPOTHESIS, 13, 98-100, 144.
MARRIAGE STATISTICS, 66, 67, 93-96.
MASCULINITY, 132, 133.
MEAN, 9
    distance between mean and mode, 41.
    probable error of, 135, 136.
MEDIAN, viii.
MODE, 9
    distance between mean and mode, 41.
    position of, 47.
    for endowment assurances, 122.
MODEL OFFICE, King's statistics, 143, 144.
MOMENTS
    adjustment of, 24-30, 102-104.
    formulas for change of origin, 17-19, 117.
    method of, ch. III., 112, 113, 115-121.
    notation for, 16.
    summation method, 19-23, 54, 102, 119-121.

NEWTON'S APPROXIMATION TO ROOT OF EQUATION, 128 (footnote), 129.
"NORMAL CURVE OF ERROR," 45, 46, 87-91, App. III., IV.

qnm(5) TABLE, graduation of, 96-100.
ORDINATES
    loaded, 16.
    mid-, 16, 48, 63.
    moments of system of, 15, 24-28.

PARABOLA, fitting of, 13-15, 30-34.
PEARSON, K., v., vi., ix., x., 14, 128, ch. IX.
    system of curves, ch. IV., App. IV.
PENSION FUND STATISTICS, 33, 34, 150.
PROBABILITY, connection with curve fitting, ch. I., 37-39.
    Integral (see also "Normal curve of error").
PROBABLE ERRORS, 41, ch. VIII., 149.

QUADRATURE FORMULAE, 25-27, 48, 58, 63, 130.

RATES, graduation of, 92-100.
REGRESSION, 113, 116.

SEX-RATIO, 132, 133.
SHEPPARD, W. F., 29, 89, 95.
SICKNESS TABLES, graduation, &c., 70-72.
SKEWNESS, 11, 41, 49.

SPURIOUS CORRELATION, 122-124.
STANDARD DEVIATION, 10-12, ch. VIII.
    of parallel arrays, 113.
    and probable errors, 131, 132.
SUMMATION METHOD OF FINDING MOMENTS, 19-23, 54, 102, 119-121.

TABLES, list of, 165.
THIELE, T. N., App. IV.
TYPES OF FREQUENCY-CURVES, ch. IV., V, App. IV.

VACCINATION STATISTICS, ch. VIII.
VARIATION, coefficient of, viii.

WITHDRAWALS, statistics, 8, 103, 104.
WYATT, F. B., preface by, v.

YULE, G. U., 113 (footnote).
