337-349
A GENERAL APPROACH TO CALCULATING ISOTOPIC DISTRIBUTIONS FOR MASS SPECTROMETRY
JAMES A. YERGEY
Middle Atlantic Mass Spectrometry, School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205
ABSTRACT
Fundamental principles for obtaining mass spectral isotopic distributions are applied to a
general computer program which can be used to calculate and present in tabular and graphic
form the isotopic contributions for any molecular formula. A unique feature is the retention
of the isotopic distribution, exact mass, and absolute abundance for all individual peaks at
each mass. Special considerations have been made for the large number of isotopic combinations which occur for many higher mass compounds. The computer program accepts the input
of a molecular formula followed by interactive input of a number of parameters which affect
the final presentation of the theoretical distribution profile.
INTRODUCTION
1983
$$(a_1 + a_2 + a_3 + \ldots)^{m}\,(b_1 + b_2 + b_3 + \ldots)^{n}\,(c_1 + c_2 + c_3 + \ldots)^{o} \ldots \qquad (1)$$
where $a_1$, $a_2$, $a_3$, etc., $b_1$, $b_2$, $b_3$, etc., and $c_1$, $c_2$, $c_3$, etc., represent the
individual isotopes of the elements in the molecule, and the exponents m, n,
o, etc., are the number of atoms of each element present in the molecule. The
terms which result from the expansion of each polynomial can be used to
describe the isotopic contributions, exact masses, and absolute abundances
for each element's contribution to the molecule.
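As an example, an expression describing all isotopic permutations of the
peptide molecule glucagon ($\mathrm{C_{153}H_{224}N_{42}O_{50}S}$, m.w. = 3482) is given by
$$({}^{12}\mathrm{C} + {}^{13}\mathrm{C})^{153}\,({}^{1}\mathrm{H} + {}^{2}\mathrm{H})^{224}\,({}^{14}\mathrm{N} + {}^{15}\mathrm{N})^{42}\,({}^{16}\mathrm{O} + {}^{17}\mathrm{O} + {}^{18}\mathrm{O})^{50}\,({}^{32}\mathrm{S} + {}^{33}\mathrm{S} + {}^{34}\mathrm{S} + {}^{36}\mathrm{S})^{1} \qquad (2)$$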
If the polynomial for oxygen is expanded and the like terms collected, one
resulting term would be $58\,800\ {}^{16}\mathrm{O}_{47}\,{}^{17}\mathrm{O}_{2}\,{}^{18}\mathrm{O}_{1}$, where the coefficient is equal
to the number of times the term appeared in the expansion. Like terms can
be collected, in this application, since only the number distribution of the
isotopes can be derived from a mass spectral peak, not the position of each
isotope in the molecule. The contribution of each isotope is described by the
subscripts of the expansion term. The exact mass (803.7585) and absolute
abundance ($1.518 \times 10^{-5}$) for this permutation can be calculated from
isotopic masses and relative abundances. It should be noted that preserving
the isotopic contributions in the calculation permits the demonstration that
another expansion term for the oxygen polynomial ($1225\ {}^{16}\mathrm{O}_{48}\,{}^{18}\mathrm{O}_{2}$) differs
in mass by only 5 p.p.m. (803.7541) from the above example. The example
also indicates that usually negligible isotopes such as ${}^{17}\mathrm{O}$ begin to have real
contributions in the middle molecular mass range. Finally, this example
illustrates the large number of permutations which occur when dealing with
high mass compounds. If the product of polynomials in eqn. (2) is
expanded with regard to the position of each atom there would be
$(2)^{153}(2)^{224}(2)^{42}(3)^{50}(4)^{1} = 3.9 \times 10^{150}$ individual terms generated, which
when collected to yield the like terms would still result in $7.9 \times 10^{9}$ unique
permutations.
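The counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch and not part of the original FORTRAN program; the function names and the `GLUCAGON` list are hypothetical and only restate the exponents and isotope counts of eqn. (2).

```python
from math import comb, factorial, prod

# (number of atoms, number of isotopes) for C, H, N, O and S, as in eqn. (2).
GLUCAGON = [(153, 2), (224, 2), (42, 2), (50, 3), (1, 4)]

def positional_terms(elements):
    """Terms produced when every atom position is expanded separately."""
    return prod(k ** n for n, k in elements)

def unique_permutations(elements):
    """Terms left after like terms are collected: C(n + k - 1, k - 1) per element."""
    return prod(comb(n + k - 1, k - 1) for n, k in elements)

def term_coefficient(counts):
    """Number of times one collected term appears: n! / (a! b! c! ...)."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient

print(f"{positional_terms(GLUCAGON):.1e}")     # 3.9e+150
print(f"{unique_permutations(GLUCAGON):.1e}")  # 7.9e+09
print(term_coefficient((47, 2, 1)))            # 58800
print(term_coefficient((48, 0, 2)))            # 1225
```

The gap between the first two numbers is the reason for generating only the unique permutations of each element directly.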
The large number of permutations generated for high mass compounds,
coupled with the desire to preserve isotopic information concerning each
permutation while including all isotopes in the calculation, necessitates that
the program directly calculate only the unique permutations for each element. This is in contrast to calculation methods that expand each polynomial, followed by collection of like terms [9-16], which requires an
excessive amount of computer time when applied to large molecules.
An additional means of reducing the number of permutations, and thereby
the calculation time, is to stop the calculation of permutations for each
element when all permutations having an absolute abundance greater than a
user-defined threshold have been determined. Different means of applying
the threshold can be understood by examining Table 1, which contains the
first ten unique permutations generated by the expansion of the carbon
polynomial $({}^{12}\mathrm{C} + {}^{13}\mathrm{C})^{153}$ from eqn. (2). A commonly employed method to
determine a threshold is to stop the calculation after a given number of
permutations have been calculated [14,15]. This method is satisfactory for
small molecules containing elements whose most abundant isotope is also
the lightest isotope, since the third or fourth peak in the distribution is
almost always < 1% of the first and most abundant peak. However, Table 1
illustrates that as the number of atoms of a given element becomes relatively
large, the distribution shifts in a way that makes this method of determining
a threshold invalid. A better method is to include only those permutations
whose abundance is greater than some absolute value. This method may also
become invalid for higher molecular weight compounds, since there are so
many permutations that the absolute abundance of even the most intense
permutation can become very small (< 0.01). Therefore, a threshold which
can be applied to any molecular formula must be based on a percentage of
the most abundant permutation's absolute abundance; this method is used
in the program described. The absolute value of the threshold will be based
on the current permutation with the greatest abundance, and since the most
abundant peak will usually change during the course of the calculation, the
absolute value of the threshold will also change.

TABLE 1
First ten unique permutations generated by the expansion of the carbon polynomial $({}^{12}\mathrm{C} + {}^{13}\mathrm{C})^{153}$

12C     13C     Absolute abundance     Exact mass
153     0       0.18410                1836.000
152     1       0.31327                1837.003
151     2       0.26481                1838.007
150     3       0.14825                1839.010
149     4       0.06183                1840.013
148     5       0.02049                1841.017
147     6       0.00562                1842.020
146     7       0.00131                1843.023
145     8       0.00027                1844.027
144     9       0.00005                1845.030
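The effect of a percentage threshold can be illustrated on the carbon permutations of Table 1. The sketch below is an illustration only, written in Python rather than the FORTRAN of the program; the function names are hypothetical, and the 13C abundance of 0.011 is the value used elsewhere in this paper.

```python
from math import comb

def binomial_abundances(n_atoms, r_heavy, count):
    """Abundances of the first `count` permutations of a two-isotope element."""
    r_light = 1.0 - r_heavy
    return [comb(n_atoms, k) * r_light ** (n_atoms - k) * r_heavy ** k
            for k in range(count)]

def above_relative_threshold(abundances, percent):
    """Indices of permutations at or above `percent` of the most abundant one."""
    cutoff = max(abundances) * percent / 100.0
    return [i for i, a in enumerate(abundances) if a >= cutoff]

carbon = binomial_abundances(153, 0.011, 10)
print(carbon.index(max(carbon)))              # 1
print(above_relative_threshold(carbon, 1.0))  # [0, 1, 2, 3, 4, 5, 6]
```

With 153 carbon atoms the most abundant permutation is the one containing a single 13C, so a rule that keeps only the first three or four permutations would discard peaks that a 1% relative threshold retains.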
The absolute abundances for each permutation can be described by the
following combinatorial equation
$$A = \frac{n!}{a!\,b!\,c!\ldots}\; r_1^{\,a}\, r_2^{\,b}\, r_3^{\,c} \ldots \qquad (3)$$
where n is the number of atoms of the element, $r_1$, $r_2$, $r_3$, etc., are the
abundances of each isotope and a, b, c, etc., are the number distribution of
the atoms in a given permutation [21]. Returning again to the example given
for eqn. (2) of the expansion of the oxygen polynomial $({}^{16}\mathrm{O} + {}^{17}\mathrm{O} + {}^{18}\mathrm{O})^{50}$
of glucagon, it can now be shown that the absolute abundance of the
${}^{16}\mathrm{O}_{47}\,{}^{17}\mathrm{O}_{2}\,{}^{18}\mathrm{O}_{1}$ term is derived from the following equation
$$A = \frac{(50)!}{(47)!\,(2)!\,(1)!}\,(r_{16})^{47}\,(r_{17})^{2}\,(r_{18})^{1} \qquad (4)$$
by substituting into eqn. (3) the number of atoms of oxygen (50) and the
number distribution (47, 2, 1) of these atoms in this particular permutation.
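As a numerical check of eqn. (4), the abundance can be evaluated directly. This is a minimal sketch with a hypothetical function name; the oxygen isotopic abundances are those listed for oxygen in Table 3.

```python
from math import factorial, prod

def permutation_abundance(counts, abundances):
    """Eqn. (3): n! / (a! b! c! ...) * r1^a * r2^b * r3^c ..."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(r ** c for r, c in zip(abundances, counts))

# Abundances of 16O, 17O and 18O as listed in Table 3.
OXYGEN = (0.99762, 0.00038, 0.00200)

print(f"{permutation_abundance((47, 2, 1), OXYGEN):.3e}")  # 1.518e-05
```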
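Dividing eqn. (3) for one permutation by eqn. (3) for another permutation of the same element gives the ratio of their abundances
$$\frac{A_2}{A_1} = \frac{(a_1)!\,(b_1)!\,(c_1)!\ldots}{(a_2)!\,(b_2)!\,(c_2)!\ldots}\; r_1^{(a_2-a_1)}\, r_2^{(b_2-b_1)}\, r_3^{(c_2-c_1)} \ldots \qquad (5)$$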
where subscripts denote the two different permutations. The program calculates the abundance of the first permutation using eqn. (3) and then proceeds
by basing each subsequent abundance on that of the previous permutation,
according to eqn. (5). The example in eqn. (4) would then be described by
$$\frac{A_2}{A_1} = \frac{(47)!\,(3)!\,(0)!}{(47)!\,(2)!\,(1)!}\,(r_{16})^{(47-47)}\,(r_{17})^{(2-3)}\,(r_{18})^{(1-0)} \qquad (6)$$
or
$$A_2 = 3A_1\,(r_{18}/r_{17}) \qquad (7)$$
which bases the abundance of the ${}^{16}\mathrm{O}_{47}\,{}^{17}\mathrm{O}_{2}\,{}^{18}\mathrm{O}_{1}$ permutation ($A_2$) on that of
the previous term ${}^{16}\mathrm{O}_{47}\,{}^{17}\mathrm{O}_{3}\,{}^{18}\mathrm{O}_{0}$ ($A_1$). The total number of multiplication
and division operations is greatly reduced using this formula, thereby
reducing computational errors and saving substantial calculation time.
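The step from one permutation to the next can be sketched as follows. This is an illustration of eqns. (5) to (7) under stated assumptions, not the program's own code: the function names are hypothetical, the oxygen abundances are those of Table 3, and the ratio is written out factor by factor for clarity rather than for speed.

```python
from math import factorial, prod

def direct_abundance(counts, abundances):
    """Eqn. (3), used here only to obtain A1 and to cross-check the result."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)
    return coefficient * prod(r ** c for r, c in zip(abundances, counts))

def next_abundance(prev_abundance, prev_counts, next_counts, abundances):
    """Eqn. (5): scale the previous abundance by the ratio A2 / A1."""
    ratio = 1.0
    for a1, a2, r in zip(prev_counts, next_counts, abundances):
        ratio *= factorial(a1) / factorial(a2) * r ** (a2 - a1)
    return prev_abundance * ratio

OXYGEN = (0.99762, 0.00038, 0.00200)
a1 = direct_abundance((47, 3, 0), OXYGEN)
a2 = next_abundance(a1, (47, 3, 0), (47, 2, 1), OXYGEN)
print(a2 / a1)  # equals 3 * r18 / r17, as in eqn. (7)
```

In a production version the factorial ratio would be reduced to the few integers that actually change between neighbouring permutations, which is where the saving in operations comes from.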
ALGORITHMS
The program which calculates and displays the molecular ion distributions
consists of a main module (EXMASS) and four subroutines, which are
outlined in the following paragraphs. PARAMETER statements at the
beginning of each section of the program allow the operator to modify the
common block array sizes easily, in order to accommodate vastly different
types of molecules, e.g., polystyrene, n = 1000 (C8004H8010), a large biomolecule such as insulin (C254H377N65O75S6),
or an organometallic (Sn,C,,H,,),
while still keeping the overall core requirement below 32K words.
A molecular formula is input within subroutine DATAIN
as elemental
symbols which follow periodic table abbreviations, accompanied by the
number of atoms of each element present in the molecule. The formula is
decoded, and the exact mass and relative abundances of each isotope for
elements in the formula are read from a disk file. The disk file presently
contains all naturally occurring isotopes of all stable elements, but can easily
be modified to include isotopically-enriched species. An auxiliary program
(UPDATE) is used to update information on the disk file which contains
isotopic masses and abundances for each element. This routine can be used
to input new elements, and to list, modify, or delete existing isotopic
information, but must be run independently of program EXMASS.
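The decoding step performed in DATAIN can be sketched roughly as below. This is a hedged illustration in Python, not the FORTRAN routine: the function name, the mixed-case symbol convention, and the small `KNOWN_ELEMENTS` set standing in for the disk file are all assumptions, and the rule that two consecutive spaces end the input is simplified to ignoring spaces.

```python
import re

# Hypothetical stand-in for the element list held in the disk file.
KNOWN_ELEMENTS = {"C", "H", "N", "O", "S"}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def parse_formula(text):
    """Decode a formula such as 'C254H377N65O75S6' into {symbol: atom count}."""
    formula = text.replace(" ", "")
    if not formula or not formula[0].isalpha():
        raise ValueError("must give element symbol first")
    counts = {}
    position = 0
    while position < len(formula):
        match = TOKEN.match(formula, position)
        if match is None:
            raise ValueError("unrecognised input at position %d" % position)
        symbol, digits = match.groups()
        if symbol not in KNOWN_ELEMENTS:
            raise ValueError("unknown element: " + symbol)
        atoms = int(digits) if digits else 1
        counts[symbol] = counts.get(symbol, 0) + atoms
        if counts[symbol] > 9999:
            raise ValueError("more than 9999 atoms of " + symbol)
        position = match.end()
    return counts

print(parse_formula("C254 H377 N65 O75 S6"))
```

An input that starts with a digit, such as `3CH2COOH`, is rejected before any element is decoded.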
All unique permutations of each element are generated within subroutine
PERMUT by a set of nine nested DO loops. The first permutation contains
all atoms in the first isotope. The number of atoms in the first isotope is
decremented and the remainder placed in the second isotope, forming a new
permutation. The number in the second isotope is then decremented, placing
the remainder in isotope three, etc. The loops are executed only to the level
corresponding to one less than the number of isotopes for the element,
therefore accommodating any element with ten or fewer isotopes. As each
permutation is generated its absolute abundance is calculated and compared
to the maximum abundance for that element. If the abundance is greater
than the selected threshold, the isotopic distribution, absolute abundance,
and exact mass of that permutation are saved. When appropriate, the
maximum abundance for the element is also updated. If too many permutations are generated for a particular element, as defined by PARAMETER
MXPERM,
subroutine RESET is used to reset the threshold, reduce the
number of permutations, and inform the operator of the change in threshold.
If no permutations are saved for a given number of atoms in the first
isotope, any further decrementing of this number can only lead to permutations that will not be saved, and the calculation is therefore terminated.
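The generation scheme described for PERMUT can be sketched as follows. This is a simplified illustration under several assumptions: a recursive generator replaces the nine nested DO loops, abundances are evaluated from eqn. (3) in logarithmic form rather than by the recurrence of eqn. (5), the MXPERM and RESET logic is omitted, and a final pass re-applies the threshold against the final maximum. All names are hypothetical.

```python
from math import exp, lgamma, log

def compositions(total, parts):
    """Yield every way to place `total` atoms into `parts` isotopes,
    starting with as many atoms as possible in the earliest isotope."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total, -1, -1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def abundance(counts, abundances):
    """Eqn. (3) in logarithms, so that large factorials are never formed."""
    log_a = lgamma(sum(counts) + 1)
    for c, r in zip(counts, abundances):
        log_a += c * log(r) - lgamma(c + 1)
    return exp(log_a)

def element_permutations(n_atoms, abundances, threshold_percent):
    """Permutations of one element at or above a percentage of the maximum."""
    saved, best = [], 0.0
    for first in range(n_atoms, -1, -1):      # decrement the first isotope
        saved_here = 0
        for rest in compositions(n_atoms - first, len(abundances) - 1):
            counts = (first,) + rest
            a = abundance(counts, abundances)
            best = max(best, a)               # running maximum for the element
            if a >= best * threshold_percent / 100.0:
                saved.append((counts, a))
                saved_here += 1
        if saved_here == 0:                   # nothing saved at this level: stop
            break
    cutoff = best * threshold_percent / 100.0
    return [(counts, a) for counts, a in saved if a >= cutoff]

for counts, a in element_permutations(50, (0.99762, 0.00038, 0.00200), 1.0):
    print(counts, f"{a:.5f}")
```

For the 50 oxygen atoms of glucagon and a 1% threshold, the loop stops after the level with 48 atoms of the first isotope, because none of the permutations at that level reaches the threshold.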
In subroutine FORMULA
the permutations for each element are combined with the permutations for all other elements, generating complete
molecular formulae. The permutations for the first element are saved as the
initial combinations, and as the permutations for each successive element are
completed they are combined with the existing combinations. This procedure
accomplishes the multiplication of each successive polynomial described in
eqn. (1) to complete the calculation of the isotopic distributions. Each
combination is stored as a pointer to the isotopic distribution, or permutation, for each element, along with the exact mass and absolute abundance of
the combination. The same threshold used in generating permutations is
again applied to the combinations, and is reset, if necessary, according to the
value of PARAMETER
MXPEAK. After all elements have been permutated
and combined into their final formulae, the combinations are ordered by
increasing mass.
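The combination step can be sketched in the same spirit. The code below is an assumption-laden illustration rather than subroutine FORMULA itself: each combination holds a tuple of indices ("pointers") into the per-element permutation lists together with its mass and abundance, the MXPEAK reset is omitted, and the names and the small water example are hypothetical. Isotopic masses and abundances are those of Table 3.

```python
def combine(combinations, element_perms, threshold_percent):
    """Multiply the existing combinations by one more element's permutations."""
    merged = [
        (pointers + (j,), mass + m, abund * a)
        for pointers, mass, abund in combinations
        for j, (m, a) in enumerate(element_perms)
    ]
    cutoff = max(a for _, _, a in merged) * threshold_percent / 100.0
    return [entry for entry in merged if entry[2] >= cutoff]

def molecular_combinations(per_element, threshold_percent):
    """Combine the elements one at a time, then order the result by mass."""
    combinations = [((j,), m, a) for j, (m, a) in enumerate(per_element[0])]
    for element_perms in per_element[1:]:
        combinations = combine(combinations, element_perms, threshold_percent)
    return sorted(combinations, key=lambda entry: entry[1])

# (mass, abundance) of each permutation for H2 and O1.
HYDROGEN_2 = [
    (2 * 1.0078, 0.99985 ** 2),
    (1.0078 + 2.0141, 2 * 0.99985 * 0.00015),
    (2 * 2.0141, 0.00015 ** 2),
]
OXYGEN_1 = [(15.9949, 0.99762), (16.9991, 0.00038), (17.9992, 0.00200)]

for pointers, mass, abund in molecular_combinations([HYDROGEN_2, OXYGEN_1], 0.01):
    print(pointers, f"{mass:.4f}", f"{abund:.3e}")
```

Storing pointers instead of full isotopic distributions keeps each combination small, which matters when the array sizes are bounded by PARAMETER statements.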
The output of the program consists of two tables and a plot, which are
generated within the main program module. The first table contains the
isotopic distributions of each element for all combinations above threshold,
along with the corresponding exact mass and absolute abundance. A second
table summarizes the input data and calculated data, including nominal,
monoisotopic, average, and most abundant masses. This table also includes a
list of exact masses, relative abundances, p.p.m. spread, and multiplicity for
integer mass groupings of the peaks in the first table. Finally, a plot is
generated from a Gaussian distribution of the integer mass groupings. The
resolution of the peaks in the plot is designated by the user, thereby allowing
the user to compare more readily the distribution with experimental data. A
bar plot is superimposed on the Gaussian distribution for clarity if the
user-defined resolution is less than half the mass of the molecule.
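The grouping and plotting steps can be sketched as below. Several details here are assumptions that the text does not specify: peaks are grouped by the integer part of their mass, the group mass is an abundance-weighted mean, and the Gaussian width is chosen so that two equal peaks separated by m/R overlap at 10% of their height (each peak contributes 5% at the midpoint). The six peaks are the first six entries of Table 2, with abundances as recomputed from the isotopic data of Table 3; all function names are hypothetical.

```python
from collections import defaultdict
from math import exp, log, sqrt

def group_by_integer_mass(peaks):
    """Return (mean mass, summed abundance, multiplicity) per integer mass."""
    groups = defaultdict(list)
    for mass, abundance in peaks:
        groups[int(mass)].append((mass, abundance))
    rows = []
    for key in sorted(groups):
        members = groups[key]
        total = sum(a for _, a in members)
        mean = sum(m * a for m, a in members) / total
        rows.append((mean, total, len(members)))
    return rows

def gaussian_sigma(mass, resolution):
    """Width giving a 10% valley between equal peaks separated by mass/resolution."""
    return (mass / resolution) / (2.0 * sqrt(2.0 * log(20.0)))

def gaussian_profile(rows, resolution, x):
    """Sum of the Gaussian contributions of all groups at mass x."""
    return sum(
        total * exp(-0.5 * ((x - mean) / gaussian_sigma(mean, resolution)) ** 2)
        for mean, total, _ in rows
    )

PEAKS = [
    (5729.598, 0.2753779e-1), (5730.598, 0.1304147e-2), (5730.598, 0.6647344e-2),
    (5730.602, 0.7866999e-3), (5730.602, 0.7779628e-1), (5730.605, 0.1557495e-2),
]
rows = group_by_integer_mass(PEAKS)
for mean, total, mult in rows:
    print(f"{mean:.3f}", f"{total:.6f}", mult)
print(gaussian_profile(rows, 2500, 5730.6))
```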
Program EXMASS and accompanying subroutines are written in FORTRAN
IV, and consist of 1547 lines of code, including 805 comment
statements. Most statements also contain an internal comment. Program
UPDATE is also written in FORTRAN IV, and consists of 318 lines of code,
of which 132 are comments. All disk I/O and dialog are accomplished by
Data General RDOS FORTRAN commands, which are readily converted to
other operating systems. Dialog is designed for a Tektronix Model 4010
CRT, and output can be sent to either the CRT or to a hard copy device
such as the Versatec Model 800A Printer/Plotter.
Kratos DS-55 plot software is used in the present configuration but it can be readily exchanged for
packages such as Tektronix Plot-10 software. The core size demanded varies
with PARAMETER settings, but the program can operate in less than 32K words of core
memory for most cases.
RESULTS
WELCOME TO PROGRAM EXMASS

THIS PROGRAM WAS DESIGNED TO ALLOW THE USER TO VISUALIZE THEORETICAL
DISTRIBUTION PROFILES FOR ANY GIVEN MOLECULAR FORMULA. SPECIAL
CONSIDERATIONS ARE MADE FOR THE LARGE NUMBER OF ISOTOPIC CONTRIBUTIONS
WHICH OCCUR FOR MANY COMPOUNDS AT HIGHER MASSES. THE PROGRAM ACCEPTS THE
INPUT OF A MOLECULAR FORMULA, FOLLOWED BY INTERACTIVE INPUT OF A NUMBER
OF PARAMETERS WHICH AFFECT THE FINAL PRESENTATION OF THE THEORETICAL
DISTRIBUTION PROFILE.

DO YOU DESIRE A MORE COMPLETE INTRODUCTION TO THE PROGRAM? YES

THE PROGRAM WILL FIRST ASK YOU TO INPUT A MOLECULAR FORMULA. THE FORMULA
MAY CONTAIN UP TO 9999 ATOMS OF ANY OF THE STABLE ELEMENTS OF THE PERIODIC
TABLE. IT WILL CHECK TO ENSURE THAT THE FOLLOWING RULES ARE MET:
1) FIRST CHARACTER MUST BE A LETTER (ELEMENT NAME),
2) ELEMENT NAMES MUST BE TWO CHARACTERS OR SHORTER, FOLLOWING PERIODIC
TABLE ABBREVIATIONS,
3) ANY NUMBER OF SINGLE SPACES MAY BE INCLUDED, BUT TWO IN ROW INDICATES
THE END OF THE INPUT.
TRY TO INPUT THE FORMULA FOR BOVINE INSULIN. TRY INCORRECTLY AT FIRST TO
SEE THE RESPONSE OF THE PROGRAM.

*INPUT COMPOUNDS MOLECULAR FORMULA: 3CH2COOH
*INPUT ERRORS,
*UNKNOWN ELEMENT.
*MUST GIVE ELEMENT SYMBOL FIRST.
*ELEMENT MUST HAVE LESS THAN 3 LETTERS.

TRY AGAIN, THE FORMULA IS C254 H377 N65 O75 S6.

*INPUT COMPOUNDS MOLECULAR FORMULA: C254H377N65O75S6

THE PROGRAM MUST LIMIT THE NUMBER OF POSSIBLE PERMUTATIONS FOR MANY
COMPOUNDS, AND THEREFORE REQUESTS A THRESHOLD (% OF THE BASE PEAK) TO BE
USED AS A CUTOFF. THRESHOLD MUST BE > 0 AND < 100 OR IT WILL DEFAULT TO
1 E-10. TRY A THRESHOLD OF 0.01 FOR THIS EXAMPLE.

*INPUT THRESHOLD AS % OF BASE PEAK: 0.01

THE PROGRAM NOW COMPLETES ITS CALCULATIONS OF THE THEORETICAL
DISTRIBUTIONS, WHICH MAY REQUIRE A MEASURABLE AMOUNT OF TIME, AND MAY
REQUIRE RESETTING THE THRESHOLD IF TOO MANY PERMUTATIONS ARE GENERATED,
AS IN THIS EXAMPLE. NOTE THAT THE PROGRAM WILL INFORM YOU IF THIS IS
NECESSARY;

*TOO MANY PEAKS, THRESHOLD

AFTER COMPLETING THE CALCULATIONS, THE PROGRAM WILL ASK QUESTIONS
CONCERNING THE DESIRED OUTPUT FORMAT. THE FIRST QUESTION IS WHETHER YOU
WANT TO SEE THE ISOTOPIC DISTRIBUTIONS FOR EACH ABOVE THRESHOLD
PERMUTATION, OR GO ON TO THE TABLE OF PEAKS AT EACH INTEGER MASS AND THE
PLOT. FOR THIS EXAMPLE REPLY YES OR SIMPLY Y.

*OUTPUT ISOTOPIC DISTRIBUTIONS FOR ALL PEAKS? YES

[...] DEVICE IS SELECTED [...] RESOLUTION (10% VALLEY) [...] VALUE IS [...]
A RESOLUTION OF 2500.

*PLOT RESOLUTION: 2500

LASTLY, THE PROGRAM [...] AND PLOT.

*TITLE: BOVINE INSULIN, [...]

THE PROGRAM NOW OUTPUTS A TABLE OF EACH ABOVE THRESHOLD PEAK. FOR EACH
PEAK, EVERY ELEMENT IS PRESENTED ALONGSIDE ITS ISOTOPIC DISTRIBUTION IN
THAT PEAK. THE MASS AND ABSOLUTE ABUNDANCE ARE ALSO GIVEN FOR EACH PEAK.
(See Table 2)

THE FOLLOWING TABLE CONTAINS BOTH THE INPUT DATA AND A SUMMARY OF THE
CALCULATED DATA, INCLUDING INTEGER MASS GROUPINGS OF THE PEAKS PRESENTED
IN THE PREVIOUS TABLE.
(See Table 3)

FINALLY, A PLOT WILL BE GENERATED USING A GAUSSIAN DISTRIBUTION OF THE
INTEGER MASS GROUPINGS OF THE PEAKS. NOTE THAT THE PROGRAM REQUESTS
PATIENCE WHILE CALCULATING THE GAUSSIAN DISTRIBUTION. NOTE ALSO THAT A BAR
PLOT IS GENERATED UNDER THE GAUSSIAN DISTRIBUTION. THE PROGRAM DOES THIS
IN ALL CASES WHERE THE PLOT RESOLUTION IS LOW WHEN COMPARED TO THE MASS,
MAKING IT DIFFICULT TO VISUALIZE THE INDIVIDUAL PEAKS.
(See Figure 2)
NOTE:
TABLE 2
Isotopic distributions, exact masses and absolute abundances for peaks in the molecular ion envelope of bovine insulin, C254 H377 N65 O75 S6

PEAK NO.  1: EXACT MASS = 5729.598   ABSOLUTE ABUNDANCE = 0.2753779E-1
             12C254 1H377 14N65 16O75 32S6
PEAK NO.  2: EXACT MASS = 5730.598   ABSOLUTE ABUNDANCE = 0.1304147E-2
             12C254 1H377 14N65 16O75 32S5 33S1
PEAK NO.  3: EXACT MASS = 5730.598   ABSOLUTE ABUNDANCE = 0.6647344E-2
             12C254 1H377 14N64 15N1 16O75 32S6
PEAK NO.  4: EXACT MASS = 5730.602   ABSOLUTE ABUNDANCE = 0.7866999E-3
             12C254 1H377 14N65 16O74 17O1 32S6
PEAK NO.  5: EXACT MASS = 5730.602   ABSOLUTE ABUNDANCE = 0.7779628E-1
             12C253 13C1 1H377 14N65 16O75 32S6
PEAK NO.  6: EXACT MASS = 5730.605   ABSOLUTE ABUNDANCE = 0.1557495E-2
             12C254 1H376 2H1 14N65 16O75 32S6
PEAK NO.  7: EXACT MASS = 5731.594   ABSOLUTE ABUNDANCE = 0.7320613E-2
             12C254 1H377 14N65 16O75 32S5 34S1
PEAK NO.  8: EXACT MASS = 5731.594   ABSOLUTE ABUNDANCE = 0.7899790E-3
             12C254 1H377 14N63 15N2 16O75 32S6
PEAK NO.  9: EXACT MASS = 5731.598   ABSOLUTE ABUNDANCE = 0.3148122E-3
             12C254 1H377 14N64 15N1 16O75 32S5 33S1
PEAK NO. 10: EXACT MASS = 5731.598   ABSOLUTE ABUNDANCE = 0.1877950E-1
             12C253 13C1 1H377 14N64 15N1 16O75 32S6
PEAK NO. 11: EXACT MASS = 5731.602   ABSOLUTE ABUNDANCE = 0.4140522E-2
             12C254 1H377 14N65 16O74 18O1 32S6
PEAK NO. 12: EXACT MASS = 5731.602   ABSOLUTE ABUNDANCE = 0.1899039E-3
             12C254 1H377 14N64 15N1 16O74 17O1 32S6
PEAK NO. 13: EXACT MASS = 5731.602   ABSOLUTE ABUNDANCE = 0.3759686E-3
             12C254 1H376 2H1 14N64 15N1 16O75 32S6
PEAK NO. 14: EXACT MASS = 5731.602   ABSOLUTE ABUNDANCE = 0.3684312E-2
             12C253 13C1 1H377 14N65 16O75 32S5 33S1
PEAK NO. 15: EXACT MASS = 5731.605   ABSOLUTE ABUNDANCE = 0.2222484E-2
             12C253 13C1 1H377 14N65 16O74 17O1 32S6
PEAK NO. 16: EXACT MASS = 5731.605   ABSOLUTE ABUNDANCE = 0.1094576
             12C252 13C2 1H377 14N65 16O75 32S6
TABLE 3
C254 H377 N65 O75 S6

ELEMENT   #ATOMS   #ISOTOPES   ISOTOPIC MASS   ISOTOPIC ABUNDANCE
C         254      2           12.0000         0.98900
                               13.0034         0.01100
H         377      2            1.0078         0.99985
                                2.0141         0.00015
N          65      2           14.0031         0.99630
                               15.0001         0.00370
O          75      3           15.9949         0.99762
                               16.9991         0.00038
                               17.9992         0.00200
S           6      4           31.9721         0.95020
                               32.9715         0.00750
                               33.9679         0.04210
                               35.9671         0.00020

CALCULATED DATA:
NOMINAL MASS = 5727
MONOISOTOPIC MASS = 5729.598
AVERAGE MASS = 5733.585
THRESHOLD = 0.1000000E-2
TOTAL ABUNDANCE = 0.9797510
MOST ABUNDANT PEAK = 5731.605

MASS (MEAN)   FRAC ABUN   PPM SPREAD   MULT
5729.598       14.56      0.0           1
5730.598       46.98      1.4           5
5731.602       81.47      2.7          11
5732.609      100.00      4.1          22
5733.609       96.48      4.8          30
5734.613       77.58      5.4          38
5735.609       53.59      6.1          43
5736.613       32.38      6.1          42
5737.617       17.37      6.1          40
5738.613        8.00      6.1          31
5739.621        3.04      5.4          18
5740.617        0.89      4.1           9
[Figure 2: theoretical distribution profile for bovine insulin. Plot resolution = 2500; the nominal mass and monoisotopic mass are marked; horizontal axis m/z, 5730 to 5740.]
ACKNOWLEDGMENT
This work was supported by grants from the National Science Foundation, CHE-7818396 and PCM-8209954.
REFERENCES
1 T. Matsuo, H. Matsuda and I. Katakuse, Anal. Chem., 51 (1979) 1331.
2 R.P. Lattimer, D.J. Harmon and G.E. Hansen, Anal. Chem., 52 (1980) 1808.
3 C. Fenselau, R. Cotter, G. Hansen, T. Chen and D. Heller, J. Chromatogr., 218 (1981) 21.
4 A. Dell and H. Morris, Biochem. Biophys. Res. Commun., 106 (1982) 1456.
5 M. Barber, R.S. Bordoli, G.J. Elliott, R.D. Sedgwick, A.N. Tyler and B.N. Green, J. Chem. Soc., Chem. Commun., (1982) 936.
6 A.M. Buko, L.R. Phillips and B.A. Fraser, Biomed. Mass Spectrom., in press.
7 R.D. MacFarlane, Acc. Chem. Res., (1982) 15.
8 J. Yergey, D. Heller, G. Hansen, R.J. Cotter and C. Fenselau, Anal. Chem., 55 (1983) 353.
9 J.L. Margrave and R.B. Polansky, J. Chem. Educ., 39 (1962) 335.
10 B. Boone, R.K. Mitchum and S.E. Scheppele, Int. J. Mass Spectrom. Ion Phys., 5 (1970) 21.
11 E. Hugentobler and J. Loliger, J. Chem. Educ., 49 (1972) 610.
12 B. Mattson and E. Carberry, J. Chem. Educ., 50 (1973) 511.
13 L.R. Crawford, Int. J. Mass Spectrom. Ion Phys., 10 (1972/3) 279.
14 Y.N. Sukharev, V.R. Sizoie and Y.S. Nekrasov, Org. Mass Spectrom., 16 (1981) 23.
15 M. Brownawell and J.S. Fillippo, Jr., J. Chem. Educ., 59 (1982) 663.
16 J.E. Campana, T.H. Risby and P.C. Jurs, Anal. Chim. Acta, 112 (1979) 321.
17 A. Carrick and F. Glocking, J. Chem. Soc. A, (1967) 40.
18 J.H. Beynon, Mass Spectrometry and Its Applications to Organic Chemistry, Elsevier, Amsterdam, 1960, p. 295.
19 K. Biemann, Mass Spectrometry: Organic Chemical Applications, McGraw-Hill, New York, 1962, p. 59.
20 F.W. McLafferty, Interpretation of Mass Spectra, 3rd edn., University Science, Mill Valley, California, 1980, p. 15.
21 R.E. Kirk, Introductory Statistics, Wadsworth, Belmont, California, 1978, p. 166.