Spectrum Evaluation
I. INTRODUCTION
This chapter deals with mathematical procedures to extract relevant information from
acquired x-ray spectra. Smoothing of the spectrum results in a graph that can be more
easily interpreted by the human observer. To determine which elements are present in a
specimen, peak search methods are used. To obtain the analytically important net peak
areas of the fluorescence lines, a variety of methods, ranging from simple summation to
sophisticated least-squares-fitting procedures, are at the disposal of the spectroscopist.
Spectrum evaluation is a crucial step in x-ray spectrometry, as important as sample
preparation and quantification. As with any analytical technique, the performance of x-ray
fluorescence analysis is determined by the weakest step in the process. Spectrum evaluation
in energy-dispersive x-ray fluorescence analysis (EDXRF) is certainly more critical than in
wavelength-dispersive spectrometry (WDXRF) because of the relatively low resolution of
the solid-state detectors employed. The often-quoted inferior accuracy of EDXRF can, to a
large part, be attributed to errors associated with the evaluation of these spectra. As a
consequence, most of the published work in this field deals with ED spectrometry.
Although rate meters and/or strip-chart recorders have been employed in WD
spectrometry for a long time, the processing of ED spectra by means of computers has
always been more evident because of their inherent digital nature. Some of the techniques
to be discussed have their roots in γ-ray spectrometry, developed mainly in the 1960s; for
others (notably the spectrum-fitting procedures), EDXRF has developed its own
specialized data-processing methodology. The availability of inexpensive and fast personal
computers, together with the implementation of mature spectrum evaluation packages on
these machines, has brought sophisticated spectrum evaluation within the reach of every
x-ray spectrometry laboratory.
In this chapter, various methods for spectrum evaluation are discussed, with emphasis
on energy-dispersive x-ray spectra. Most of the methods are relevant for x-ray fluorescence,
particle-induced x-ray emission (PIXE), and electron-beam x-ray analysis [electron probe
x-ray microanalysis (EPXMA), scanning electron microscopy energy-dispersive x-ray
analysis (SEM-EDX), and analytical electron microscopy (AEM)]. The principles of the
methods and their practical use are discussed. Least-squares fitting, which is of importance
not only for spectrum evaluation but also for quantification procedures, is discussed in detail
in Sec. IX. Section X presents computer implementations of the main algorithms.
The aim of spectrum evaluation is to extract analytically relevant information from
experimental spectra. Obtaining this information is not straightforward because the spectral
data are always corrupted with measurement noise.
1. Characteristic Lines
The characteristic radiation of a particular x-ray line has a Lorentz distribution. Peak
profiles observed with a semiconductor detector are the convolution of this Lorentz
distribution with the nearly Gaussian detector response function, giving rise to what is
known as a Voigt profile (Wilkinson, 1971). Because the Lorentz width is of the order of
only 10 eV for elements with atomic number below 50, whereas the width of the detector
response function is of the order of 160 eV, a Gauss function is an adequate approximation
of the line profile. Only for K lines of elements such as U and Th does the Lorentz
contribution become significant and need to be taken into account (Gunnink, 1977).
A closer inspection of the peak shape reveals some distinct tailing at the low-energy
side of the peak and a shelf extending to zero energy. This is mainly due to
incomplete charge collection caused by detector imperfections (dead layer and regions of
low electric field), as discussed in Chapter 3. The effect is most pronounced for low-energy
x-rays. For photons above 15 keV, Compton scatter in the detector also contributes to the
deviation from the Gaussian shape. The distortion caused by incomplete charge collection
has been described theoretically (Joy, 1985; Heckel and Scholz, 1987). Various functions
have been proposed to model the real peak shape more accurately (Campbell et al., 1987).
The observed emission spectrum of an element is the result of many transitions, as
explained in Chapter 1. The resulting x-ray lines, including possible satellite lines, need to
be considered in the evaluation of an x-ray spectrum. A more detailed discussion on the
representation of the K and L spectra and the peak shape is given in Sec. VII.
2. Continuum
The continuum observed in x-ray spectra results from a variety of processes. The
continuum in electron-induced x-ray spectra is almost completely due to the retardation of
the primary electrons (bremsstrahlung). The intensity distribution of the continuum
radiation emitted by the sample is, in first approximation, given by Kramers' formula
(Chapter 1). Absorption in the detector windows and in the sample causes this continuously
decreasing function to fall off at low energies, giving rise to the typical continuum shape
observed. The attenuation of this bremsstrahlung by major elements in the sample also
causes absorption edges to appear in these spectra. Continuum modeling for electron-induced
x-ray spectra has been studied in detail by a number of authors (Statham, 1976a, 1976b;
Smith et al., 1975; Sherry and Vander Sande, 1977; Bloomfield and Love, 1985).
For particle-induced x-ray emission, a similar continuum is observed, also mainly due
to secondary-electron bremsstrahlung. Other (nuclear) processes contribute here, making
it virtually impossible to derive a real physical model for the continuum. Special absorbers
placed between the sample and detector further alter the shape of the continuum.
Copyright 2002 Marcel Dekker, Inc.
In x-ray fluorescence, the main source of the continuum is the coherent and incoherent
scattering of the excitation radiation by the sample. The shape can therefore become very
complex and depends both on the initial shape of the excitation spectrum and on the sample
composition. When white radiation is used for the excitation, the continuum is mainly
radiative and absorption edges can also be observed. With quasimonoenergetic excitation
(secondary target, radioisotope), the incomplete charge collection of the intense coherently
and incoherently scattered peaks is responsible for most of the continuum (see Chapter 3).
Here also, realistic physical models for the description of the continuum are not used.
The incomplete charge collection of intense fluorescence lines in the spectrum further
complicates the continuum. The cumulative effect of the incomplete charge collection of all
lines causes the apparent continuum at lower energies to be significantly higher than
expected on the basis of the primary continuum-generating processes.
3. Escape Peaks
Escape peaks result from the escape of SiK or GeK photons from the detector after
photoelectric absorption of the impinging x-ray photon near the edge regions of the
detector. The energy deposited in the detector by the incoming x-ray is diminished by the
energy of the escaping SiK or GeK photon. Typical examples of the interference due to Si
escape peaks are the interference of TiKa (4.51 keV) by the FeKa escape at 4.65 keV and
the interference of FeKa by the CuKa escape.
For a Si(Li) detector, the escape peak is expected 1.742 keV (SiKa) below the parent
peak. Experimentally, it is observed that the energy difference is slightly but significantly
higher, 1.750 keV (Van Espen et al., 1980). Ge escape peaks are observed 9.876 keV (GeKa)
and 10.984 keV (GeKb) below the parent peak. The width of an escape peak is smaller
than the width of the parent peak and corresponds to the spectrometer resolution at the
energy of the escape peak.
The escape fraction f is defined as the number of counts in the escape peak Ne divided
by the total number of detected counts (escape + parent). Assuming normal incidence on
the detector and escape only through the front surface, the following formula can be derived
for the escape fraction (Reed and Ware, 1972):

   f = Ne / (Np + Ne) = (1/2) ωK (1 - 1/r) [1 - (μK/μI) ln(1 + μI/μK)]    (4)

where μI and μK are the mass-attenuation coefficients of silicon for the impinging and the
SiK radiation, respectively, ωK is the K fluorescence yield, and r is the K jump ratio of
silicon. Using 0.047 for the fluorescence yield and 10.8 for the jump ratio, the calculated
escape fraction is in very good agreement with the experimentally determined values for
impinging photons up to 15 keV (Van Espen et al., 1980). Equation (4) is also applicable
for estimating the escape fraction in Ge detectors, provided that the parameters related to
Si are substituted with those of Ge. Knowing the energy, width, and intensity of the escape
peak, corrections for its presence can be made in a straightforward manner.
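A minimal numerical sketch of the Reed and Ware expression, Eq. (4), may help; the function name is ours, and the mass-attenuation coefficients passed in are illustrative placeholders that should be replaced by tabulated values:

```python
import math

def si_escape_fraction(mu_incident, mu_si_k, omega_k=0.047, jump_ratio=10.8):
    """Si K escape fraction f = Ne/(Np + Ne) after Reed and Ware, Eq. (4).

    mu_incident, mu_si_k: mass-attenuation coefficients of Si for the
    impinging radiation and for SiK radiation (same units, e.g., cm2/g).
    """
    ratio = mu_si_k / mu_incident
    bracket = 1.0 - ratio * math.log(1.0 + 1.0 / ratio)
    return 0.5 * omega_k * (1.0 - 1.0 / jump_ratio) * bracket
```

Because μI decreases with increasing photon energy, the computed escape fraction falls off for harder incident radiation, in line with the experimental trend reported by Van Espen et al. (1980).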
   Ṅ12 = 2 τ Ṅ1 Ṅ2    (6)

with Ṅ11 the count rate (counts/s) in a sum peak due to the coincidence of two x-rays with
the same energy, Ṅ12 the count rate of a sum peak resulting from two x-rays with different
energy, and τ the pulse-pair resolution time in seconds. Sum peaks are often found when a
few large peaks at lower energy dominate the spectrum. Typical examples are PIXE
spectra of biological and geological samples. The high count rate of the KK and CaK lines
produces sum peaks that are easily observed in the high-energy region of the spectrum,
where the continuum is low. It is important to note that the intensity of sum peaks is
count-rate dependent; they can be reduced and virtually eliminated by performing the
measurement with a lower primary beam intensity. A method for correcting for the
contribution of sum peaks in least-squares fitting has been proposed by Johansson (1982)
and is discussed further in Sec. VII.
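The count-rate dependence can be illustrated numerically; the companion relation Ṅ11 = τ Ṅ1² used below is the standard pile-up form and is assumed here, as is the function name:

```python
def sum_peak_rates(n1, n2, tau):
    """Sum-peak count rates for two lines with count rates n1, n2 (counts/s)
    and pulse-pair resolution time tau (s).

    n11: coincidence of two x-rays of the same energy (standard pile-up
    relation, assumed); n12: two x-rays of different energy, Eq. (6).
    """
    n11 = tau * n1 * n1
    n12 = 2.0 * tau * n1 * n2
    return n11, n12
```

For example, with Ṅ1 = 5000 counts/s, Ṅ2 = 3000 counts/s and τ = 1 μs, halving the primary beam intensity reduces both sum-peak rates by a factor of 4, which is why lowering the beam intensity virtually eliminates them.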
6. Other Artifacts
A number of other features might appear in an x-ray spectrum and can cause problems
during the spectrum evaluation. In the K x-ray spectra of elements with atomic number
between 20 and 40, one can detect a peaklike structure with a rather poorly defined
maximum and a slowly declining tail (Van Espen et al., 1979a). This structure is due to the
KLL radiative Auger transition, which is an alternative decay mode of the K vacancy.
The maximum is observed at the energy of the KLL Auger electron transition.
The intensity of the radiative Auger structure varies from approximately 1% of the Ka line
for elements below Ca to 0.1% for elements above Zn. For chlorine and lower-atomic-number
elements, the radiative Auger band overlaps with the Ka peak. In most analytical
applications, this effect will not cause serious problems. The structure can be considered as
part of the non-Gaussian peak tail.
The scattering of the excitation radiation in x-ray fluorescence is responsible for most
of the continuum observed in the spectrum. When characteristic lines are present in the
excitation spectrum, two peaks might be observed. The Rayleigh (coherently) scattered
peak has a position and width as expected for a normal fluorescence line. The Compton
(incoherently) scattered peak is shifted to lower energies according to the well-known
Compton formula and is much broader than a normal characteristic line at that energy.
This broader structure, resulting from scattering over a range of angles and from the
Doppler effect, is difficult to model analytically (Van Dyck and Van Grieken, 1983). The
structure is further complicated by multiple-scattering phenomena in the sample
(Vincze et al., 1999).
Apart from these commonly encountered scattering processes, it is possible to detect x-ray
Raman scattering (Van Espen et al., 1979b). Again, a bandlike structure is observed, with
an energy maximum given by the incident photon energy minus the electron binding energy.
The Raman effect is most prominently present when exciting elements with atomic number
Z - 2 to Z - 7 with the K radiation of element Z. In this case, Raman scattering occurs on L
electrons. For x-ray excitation energies between 15 and 25 keV, the Raman scattering on the
K electrons of Al to Cl can also be observed. Because of its high-energy edge, the effect may
appear as a peak in the spectrum, with possible erroneous identification as a fluorescence
line. The intensity of the Raman band increases as the incident photon energy comes closer
to the binding energy of the electron. The observed intensity can amount to as much as 10%
of the L fluorescence intensity for the elements Rh to Cs when excitation with MoK x-rays
is used.
When the excitation source is highly collimated, diffraction peaks can be observed in
the x-ray fluorescence spectrum of crystalline materials. It is often difficult to deal with
these diffraction patterns. They can interfere with the fluorescence lines or even be
misinterpreted as fluorescence lines, giving rise to false identification of elements.
Spectrum processing refers to mathematical techniques that alter the appearance of the
spectral data. This is often done, using some digital filter, to reduce the noise, to locate
peaks, or to suppress the continuum. In this section, various methods of filtering are
discussed. Because of its relation to the frequency domain, the concept of Fourier
transformation is introduced first.
   |F(u)|² = R²(u) + I²(u)    (8)

and gives an idea about the dominant frequencies in the spectrum. Because there are n
different nonzero real and imaginary coefficients, no information is lost by the Fourier
transform and the inverse transformation is always possible:

   f(x) = Σ_{u=0}^{n-1} F(u) exp(j2πux/n)    (9)
Figure 3 shows the power spectrum of the pulse-height distribution in Figure 2 (a single
Gaussian on a constant continuum). The frequency (inverse channel number) is defined as
u/n, with n = 256 and u = 0, . . . , n/2. The amplitude of the zero frequency, |F(0)|², which is
equal to the average of the spectrum, is not shown. The dominating low frequencies
originate from the continuum and from the Gaussian peak, whereas the higher frequencies
are caused mainly by the counting statistics. It is clear that if we eliminate those high
frequencies, we reduce this noise. This can be done by multiplying the Fourier
transform by a suitable function:

   G(u) = F(u) H(u)    (10)

An example of such a function is a high-frequency cutoff filter:

   H(u) = 1,  u ≤ ucrit
   H(u) = 0,  u > ucrit    (11)

which sets the real and imaginary coefficients above a frequency ucrit to zero. If we apply
this filter to the Fourier transform of Figure 3 using ucrit = 0.05 and then apply the inverse
Fourier transformation [Eq. (9)], the result shown by the solid line in Figure 2 is
obtained. The peak shape is preserved, but most of the statistical fluctuations are eliminated.
Figure 3 Fourier power spectrum of the pulse-height distribution shown in Figure 2.
If we were to cut off at even lower frequencies, distortions at the top and at the base of
the peak would become more pronounced.
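The cutoff-filter procedure described above can be sketched with NumPy; the spectrum parameters below are illustrative, not those of the figures:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
x = np.arange(n)
truth = 50.0 + 400.0 * np.exp(-0.5 * ((x - 128.0) / 8.0) ** 2)  # continuum + peak
y = rng.poisson(truth).astype(float)                            # counting noise

F = np.fft.rfft(y)                    # forward transform
u = np.arange(F.size)
F[u / n > 0.05] = 0.0                 # high-frequency cutoff, Eqs. (10)-(11)
y_smooth = np.fft.irfft(F, n)         # inverse transform, Eq. (9)
```

With a cutoff of 0.05 and a peak much wider than one channel, the smoothed spectrum lies closer to the noise-free shape than the raw data, while the peak height is essentially preserved.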
This Fourier filtering can also be done directly in the original data space. Indeed, the
convolution theorem states that multiplication in Fourier space is equivalent to
convolution in the original space:

   G(u) = F(u) H(u)  ⇔  g(x) = f(x) * h(x)    (12)

The convolute at data point x is defined as the sum of the products of the original data and
the filter centered around point x:

   g(x) = f(x) * h(x) = Σ_{x′} f(x - x′) h(x′)    (13)

h(x) is called a digital filter and is the inverse Fourier transform of H(u). In
general, this convolution or filtering of a spectrum y_i with some weighting function is
expressed as

   y*_i = (1/N) Σ_{j=-m}^{+m} h_j y_{i+j}    (14)
where the h_j are the convolution integers and N is a suitable normalization factor. The filter
width is given by 2m + 1. Fourier filtering with the purpose of reducing or eliminating some
(high- or low-) frequency components in the spectrum can thus be implemented as a
convolution of the original data with a digital filter. This convolution also alters the variance
of the original data. Applying the concept of error propagation, one finds that the variance
of the convoluted data is given by
   s²_{y*_i} = (1/N²) Σ_{j=-m}^{+m} h_j² y_{i+j}    (15)
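Eqs. (14) and (15) translate into a few lines of code. A sketch follows; the function name is ours, and the edge handling (replicating the end channels) is an assumption, since the text does not specify one:

```python
import numpy as np

def convolve_with_variance(y, h, norm):
    """Digital filtering, Eq. (14), with error propagation, Eq. (15).

    The variance of a raw channel is taken equal to its content (Poisson
    counting statistics). h holds the convolution integers h(-m)..h(+m);
    norm is the normalization factor N.
    """
    y = np.asarray(y, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h) // 2
    ypad = np.pad(y, m, mode="edge")                          # assumed edges
    filt = np.convolve(ypad, h[::-1], "valid") / norm         # Eq. (14)
    var = np.convolve(ypad, (h ** 2)[::-1], "valid") / norm ** 2   # Eq. (15)
    return filt, var
```

With h = [1, 1, 1] and N = 3 this reduces to a 3-point moving average, whose variance at each interior channel is the local mean content divided by the filter width.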
The fact that the detector response function changes with energy (becomes broader) and,
more importantly, the presence of noise prohibit the straightforward application of this
Fourier deconvolution technique. Indeed, in the presence of noise, the measured signal
must be represented by

   y(x) = f(x) * h(x) + n(x)    (17)

and its Fourier transform by

   Y(u) = F(u) H(u) + N(u)    (18)

or

   F(u) = Y(u)/H(u) - N(u)/H(u)    (19)

At high frequencies, the response H(u) goes to zero while N(u) is still significant, so that
the noise is emphasized in the inverse transformation. This clearly shows that the noise
(counting statistics) is the ultimate limitation for any spectrum processing and analysis
method.
A clear introduction to Fourier transformations related to signal processing can be
found in the work of Massart et al. (1998). Algorithms for Fourier transformation and
related topics are given in the work of Press et al. (1998). Detailed discussions on Fourier
deconvolution can be found in many textbooks (Jansson, 1984; Brook and Wynne, 1988).
Fourier deconvolution in x-ray spectrometry based on maximum a posteriori or maximum
entropy principles is discussed by several authors (Schwalbe and Trussell, 1981; Nunez
et al., 1988; Gertner et al., 1989). Gertner implemented this method for the analysis of real
x-ray spectra and compared the results with those obtained by simple peak fitting. The
problem that the deconvolution algorithms are limited to systems exhibiting translational
invariance was overcome by a transformation of the spectrum, so that the resolution
becomes independent of the energy.
B. Smoothing
Because of the uncertainty √y_i on each channel content y_i, fictitious maxima can occur
both on the continuum and on the slopes of the characteristic peaks. Removal or
suppression of these fluctuations can be useful during the visual inspection of spectra (e.g., for
locating small peaks on a noisy continuum) and is used in most automatic peak search and
continuum estimation procedures. Although smoothing can be useful in qualitative
analysis, its use is not recommended as part of any quantitative spectrum evaluation.
Smoothing, although reducing the uncertainty in the data locally, redistributes the original
channel content over the neighboring channels, thus introducing distortion in the spectrum.
Accordingly, smoothing can provide a (small) improvement in the statistical precision
obtainable with simple peak integration but is of no advantage when used with least-
squares-fitting procedures in which assumptions about the peak shapes are made.
1. Moving Average
The most straightforward way of smoothing (any) fluctuating signal is to employ the
boxcar or moving-average technique. Starting from a measured spectrum y, a
smoothed spectrum y* is obtained by calculating the mean channel content around each
channel i:

   y*_i = [1/(2m + 1)] Σ_{j=-m}^{+m} y_{i+j}    (20)

This can be seen as a convolution [Eq. (14)] with all coefficients h_j = 1. The smoothing
effect obviously depends on the width of the filter, 2m + 1. The operation being a simple
averaging, the standard deviation of the smoothed data is reduced by a factor √(2m + 1) in
regions where y_i is nearly constant. On the other hand, such a filter introduces a
considerable amount of peak distortion. This distortion depends on the filter width-to-peak
width ratio. Figure 4 shows the peak distortion effects when a moving-average filter of
width 9, 17, and 25 is applied to a peak with full width at half-maximum (FWHM) equal
to nine channels. Being a unit-area filter (Σ h_j / N = 1 with N = 2m + 1), it does not affect
the total counts in the peak in an appreciable way other than by rounding errors.
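The √(2m + 1) noise reduction on a flat continuum can be checked directly; the count level and number of channels below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(400.0, size=20000).astype(float)   # flat Poisson continuum
m = 2                                              # filter width 2m + 1 = 5
ys = np.convolve(y, np.ones(2 * m + 1) / (2 * m + 1), mode="valid")

reduction = y.std() / ys.std()   # close to sqrt(5), about 2.24
```

The ratio is only exact on a constant signal; near peaks, the variance reduction is traded against the peak distortion shown in Figure 4.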
Figure 4 Distortion introduced by smoothing of a peak with a moving-average filter 9, 17, and 25
points wide. The FWHM of the original peak is nine channels.
   y_i = a_0 + a_1 (i - i_0) + a_2 (i - i_0)²    (21)

can be made. Once we have determined the coefficients a_j, the value of the polynomial at
the central channel i_0 can be used as the smoothed value:

   y*_{i_0} = y(i_0) = a_0    (22)

This concept is schematically illustrated in Figure 6. By moving the central channel to the
right (from i_0 to i_0 + 1), the next smoothed channel content can be calculated by repeating
the entire procedure.
Figure 5 Percent change in peak height and width introduced by filtering with a moving-average
and a Savitzky and Golay polynomial filter as a function of the filter width-to-peak FWHM ratio,
(2m + 1)/FWHM.
At first sight, this smoothing method would require a least-squares fit for each
channel in the spectrum. However, the fact that the x values are equidistant allows us to
formulate the problem in such a way that the polynomial coefficients can be expressed as
simple linear combinations involving only the y_i values:

   a_k = (1/N_k) Σ_{j=-m}^{+m} C_{k,j} y_{i+j}    (23)
Table 1 Savitzky and Golay Coefficients for Polynomial Smoothing and Differentiation
Filters; y*_i = (1/N_{r,m}) Σ_{j=-m}^{+m} C_{r,m;j} y_{i+j}, filter width 2m + 1

Smoothing, second/third degree (r = 2 and r = 3); C_{r,m;j} = C_{r,m;-j}:

                                           j
 m   N_{r,m}    0    1    2    3    4    5    6    7    8    9   10   11   12
 2        35   17   12   -3
 3        21    7    6    3   -2
 4       231   59   54   39   14  -21
 5       429   89   84   69   44    9  -36
 6       143   25   24   21   16    9    0  -11
 7      1105  167  162  147  122   87   42  -13  -78
 8       323   43   42   39   34   27   18    7   -6  -21
 9      2261  269  264  249  224  189  144   89   24  -51 -136
10      3059  329  324  309  284  249  204  149   84    9  -76 -171
11       805   79   78   75   70   63   54   43   30   15   -2  -21  -42
12      5175  467  462  447  422  387  342  287  222  147   62  -33 -138 -253

First derivative, first/second degree; C_{r,m;j} = -C_{r,m;-j}:

                                           j
 m   N_{r,m}    0    1    2    3    4    5    6    7    8    9   10   11   12
 2        10    0    1    2
 3        28    0    1    2    3
 4        60    0    1    2    3    4
 5       110    0    1    2    3    4    5
 6       182    0    1    2    3    4    5    6
 7       280    0    1    2    3    4    5    6    7
 8       408    0    1    2    3    4    5    6    7    8
 9       570    0    1    2    3    4    5    6    7    8    9
10       770    0    1    2    3    4    5    6    7    8    9   10
11      1012    0    1    2    3    4    5    6    7    8    9   10   11
12      1300    0    1    2    3    4    5    6    7    8    9   10   11   12

First derivative, third/fourth degree; C_{r,m;j} = -C_{r,m;-j}:

                                           j
 m   N_{r,m}      0      1      2      3      4       5      6      7      8      9     10     11     12
 2        12      0      8     -1
 3       252      0     58     67    -22
 4      1188      0    126    193    142    -86
 5      5148      0    296    503    532    294    -300
 6     24024      0    832   1489   1796   1578     660  -1133
 7    334152      0   7506  13843  17842  18334   14150   4121 -12922
 8     23256      0    358    673    902   1002     930    643     98   -748
 9    255816      0   2816   5363   7372   8574    8700   7481   4648    -68  -6936
10   3634092      0  29592  56881  79564  95338  101900  96947  78176  43284 -10032 -84075
11    197340      0   1222   2365   3350   4098    4530   4567   4130   3140   1518   -815  -3938
12   1776060      0   8558  16649  23806  29562   33450  35003  33754  29236  20982   8525  -8602 -30866

Second derivative, second/third degree; C_{r,m;j} = C_{r,m;-j}:

                                           j
 m   N_{r,m}     0     1    2    3    4    5    6    7    8    9   10   11   12
 2         7    -2    -1    2
 3        42    -4    -3    0    5
 4       462   -20   -17   -8    7   28
 5       429   -10    -9   -6   -1    6   15
 6      1001   -14   -13  -10   -5    2   11   22
 7      6188   -56   -53  -44  -29   -8   19   52   91
 8      3876   -24   -23  -20  -15   -8    1   12   25   40
 9      6783   -30   -29  -26  -21  -14   -5    6   19   34   51
10     33649  -110  -107  -98  -83  -62  -35   -2   37   82  133  190
11     17710   -44   -43  -40  -35  -28  -19   -8    5   20   37   56   77
12     26910   -52   -51  -48  -43  -36  -27  -16   -3   12   29   48   69   92
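With the m = 2 smoothing row (the standard Savitzky and Golay coefficients -3, 12, 17, 12, -3 with N = 35), one can verify the defining property of the filter, namely that a polynomial of degree up to 3 passes through unchanged:

```python
import numpy as np

c = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0   # m = 2 smoothing filter

y = np.array([(i - 5.0) ** 2 + 3.0 for i in range(11)])   # exact parabola
ys = np.convolve(y, c, mode="valid")                      # smoothed interior
# ys reproduces the parabola exactly at the interior channels 2..8
```

This is precisely why the polynomial filter distorts peaks far less than a moving average of the same width, as Figure 5 illustrates.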
This rather heuristic algorithm is discussed in Sec. X, together with its computer
implementation. The effect of the filter is illustrated in Figure 8, where it is compared with
the other smoothing methods discussed.
constant and positive coefficients and two side lobes with constant and negative coefficients.
Convoluting an x-ray spectrum with this kind of filter yields spectra in which the continuum
is removed and peaks are easily locatable. They are similar to inverted second-derivative
spectra. An important representative of this group of filters is the top-hat filter, which has
a central window with an odd number of channels w and two side windows, each n channels
wide. The values of the filter coefficients follow from the zero-area constraint:

   h_k = -1/(2n),  -(w/2 + n) ≤ k < -w/2
   h_k = 1/w,      -w/2 ≤ k ≤ +w/2
   h_k = -1/(2n),  +w/2 < k ≤ +w/2 + n    (30)

The filtered spectrum is obtained by the convolution of the original spectrum with this filter:

   y*_i = Σ_{k=-(n+w/2)}^{n+w/2} h_k y_{i+k}    (31)
The effect of this filter on a typical spectrum is shown in Figure 10. The variance of the
filtered spectrum is obtained by simple error propagation:

   s²_{y*_i} = Σ_{k=-(n+w/2)}^{n+w/2} h_k² y_{i+k}    (32)

If y*_i is significantly different from zero, a peak structure is found, and the top of the peak
can approximately be located by searching for the maximum. Thus, i corresponds to the
position of a peak maximum in the original spectrum if

   y*_i > r s_{y*_i}    (33)

and

   y*_{i-1} ≤ y*_i > y*_{i+1}    (34)

In Figure 11, the positive part of the filtered spectrum (w = 9 and n = 5) and the decision
line r s_{y*} for r = 1 and 4 are displayed.
If required, other peak features can be obtained from the filtered spectrum: The
distance between the two local minima is a measure of the width of the peak, and the height
at the maximum is related to the net peak area.
Because the widths and heights of the peaks in the filtered spectrum strongly depend
on the dimensions of the filter, it is important that its dimensions are matched to the width
of the peaks in the original spectrum. From considerations of peak detectability (signal-to-
noise ratio) and resolution (peak broadening), it follows that the optimum width of the
positive window w is equal to the FWHM of the peaks (Robertson et al., 1972). The width
of the negative side windows should be chosen as large as the curvature of the continuum
allows. A reasonable compromise between sensitivity to peak shapes and rejection of
continuum is reached when n equals FWHM/2 to FWHM/3. Typical values for the
sensitivity factor r are between 2 and 4. Higher values result in the loss of small peaks;
lower values will cause continuum fluctuations to be interpreted as peaks.
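The top-hat peak search of Eqs. (30)-(34) can be sketched compactly; the function names are ours, and edge channels where the filter does not fit completely are simply skipped:

```python
import numpy as np

def top_hat(w, n):
    """Zero-area top-hat filter, Eq. (30): w central channels at 1/w,
    two n-channel side lobes at -1/(2n)."""
    return np.concatenate([np.full(n, -0.5 / n),
                           np.full(w, 1.0 / w),
                           np.full(n, -0.5 / n)])

def search_peaks(y, w=9, n=5, r=3.0):
    """Locate peak maxima with the decision rules of Eqs. (33)-(34)."""
    y = np.asarray(y, dtype=float)
    h = top_hat(w, n)
    m = len(h) // 2
    f = np.convolve(y, h[::-1], "same")                 # Eq. (31)
    s = np.sqrt(np.convolve(y, (h ** 2)[::-1], "same"))  # Eq. (32)
    return [i for i in range(m, len(y) - m)
            if f[i] > r * s[i]                           # Eq. (33)
            and f[i - 1] <= f[i] > f[i + 1]]             # Eq. (34)
```

On a flat continuum the filtered spectrum is exactly zero, so only genuine peak structures survive the r·s decision level; the choices w = 9, n = 5 match the example of Figure 11.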
Other zero-area rectangular filters, variations on the top-hat filter, are also in use,
such as the square-wave filter with typical coefficient sequence -1, -1, 2, 2, -1, -1 (Phillips
and Marlow, 1976; McCullagh, 1982) and the symmetric square-wave filter with
coefficients -1, 1, 1, -1 (Op De Beeck and Hoste, 1975). A detailed account of the performance
of this filter is given by Op De Beeck and Hoste (1975). A method using a Gaussian
correlator function is discussed by Black (1969).
Figure 11 Peak search using the positive part of the top-hat filtered spectrum and the decision
level for one and four times the standard deviation. The original spectrum is shown at the bottom.
Once the peak top is approximately located in the filtered spectrum, a more precise
maximum can be found by fitting a parabola over a few channels around the peak. For a
well-defined peak on a low continuum (or after continuum subtraction), the channel
content near the top of the peak can be approximated by a Gaussian:

   y_i = h exp[-(x_i - m)² / (2s²)]    (35)
   m = -a_1 / (2a_2)    (37)

   m_i = i + (1/2) (y_{i-1} - y_{i+1}) / (y_{i-1} + y_{i+1} - 2y_i)    (40)

This method might be preferred for small peaks when the continuum cannot be
disregarded.
A FORTRAN implementation of a peak search algorithm is given in Section X.
Except for some special quantification procedures (e.g., the peak-to-background method
in electron microscopy), the relevant analytical information is found in the net peak areas,
and the continuum is considered a nuisance. There are, in principle, three ways to deal with
the continuum: (1) the continuum can be suppressed or eliminated by a suitable filter;
(2) the continuum can be estimated and subtracted from the spectrum prior to the
estimation of the net peak areas; and (3) the continuum can be estimated simultaneously with
the other features in the spectrum. The first approach is discussed in Section VI, where the
continuum is removed from spectra by applying a top-hat filter followed by a linear least-
squares fit of the spectrum with a number of (also filtered) reference spectra. A least-squares
fit (linear or nonlinear) with analytical functions (Sec. VII) allows the simultaneous
estimation of continuum and peaks, provided that a suitable mathematical function can be
found for the continuum. In this section, we discuss a number of procedures that aim to
estimate the continuum independently of the other features in the spectrum. Once estimated,
this continuum can be subtracted from the original spectrum, and all methods for further
processing, ranging from simple peak integration to least-squares fitting, can be applied.
Any continuum estimation procedure must fulfill two important requirements. First,
the method must be able to reliably estimate the continuum in all kinds of situations (e.g.,
small isolated peaks on a high continuum as well as peaks in the proximity of a matrix line).
Second, to permit the processing of a large number of spectra, the method needs to be nearly
free of user-adjustable parameters.
Although a number of useful continuum estimation procedures have been developed,
it must be realized that their accuracy in estimating the continuum is not optimal. In one
way or another, they rely on the difference in frequency response of the continuum
compared to other structures such as peaks, the former mainly consisting of low
frequencies (slowly varying). Because the peaks also exhibit low frequencies at the peak
boundaries, it is difficult to control the method in such a way that it correctly discriminates
between peaks and continuum. This results in either a small underestimation or
overestimation of the continuum, introducing potentially large relative errors for small peaks.
In this respect, the fitting of the continuum with analytical functions may provide better
results (Vekemans et al., 1994). A considerable advantage of the methods discussed
here is that they do not assume an explicit mathematical model of the continuum.
Constructing a detailed and accurate analytical model for the continuum based on physical
theory is nearly impossible except for some simple geometries and particular excitation
conditions. Most often, some polynomial type of function must be chosen when fitting a
portion of the spectrum with analytical functions.
A. Peak Stripping
These methods are based on the removal of rapidly varying structures in a spectrum by
comparing the channel content y_i with the channel content of its neighbors. Clayton et al.
(1987) proposed a method which compares the content of channel i with the mean of its
two direct neighbors:

   m_i = (y_{i-1} + y_{i+1}) / 2    (41)

If y_i is greater than m_i, the content of channel i is replaced by the mean m_i. If this
transformation is executed once for all channels, one can observe a slight reduction in the
height of the peaks, while the rest of the spectrum remains virtually unchanged. By repeating
this procedure, the peaks are gradually stripped from the spectrum. Because the method
tends to connect local minima, it is very sensitive to local fluctuations in the continuum due
to counting statistics. This makes smoothing of the spectrum, as discussed in the previous
section, prior to the stripping process mandatory. Depending on the width of the peaks,
after typically 1000 cycles, the stripping converges and a more or less smooth continuum
remains. To reduce the number of iterations, it might be advantageous to perform a log or
square-root transformation of the data prior to the stripping: y′_i = log(y_i + 1) or y′_i = √y_i.
After the stripping, the continuum shape is obtained by applying the inverse transformation.
A major disadvantage of this method is that, after a number of cycles, the bases of
partially overlapping peaks are transformed into broad humps, which take much longer
to remove than isolated peaks. The method was originally applied to PIXE spectra but
proves to be generally applicable to pulse-height spectra.
In Figure 12, this method is applied to estimate the continuum of an x-ray spectrum
in the region between 1.6 and 13.0 keV. The spectrum results from a 200-mg/cm² pellet of
NIST SRM Bovine Liver excited with the white spectrum of an Rh-anode x-ray
tube filtered through a thin Rh filter (Tracor Spectrace 5000). Because of the white tube
spectrum, a considerable continuum intensity was observed, increasing quite steeply in the
region above 10 keV. To calculate the continuum, the following algorithm was used:
(1) the square root of the original spectrum was taken; (2) these data were smoothed with a
10-point Savitzky and Golay filter; (3) a number of iterations were performed applying
Eq. (41) over the region of interest; and (4) the square of each data point was taken
(back-transformation) to obtain the final continuum shape. In Figure 12, the continuum after
10, 100, and, finally, 500 iterations is shown.
As a generalization of the above-discussed method, the average of two channels a
distance w away from i can be used:
Copyright 2002 Marcel Dekker, Inc.
Figure 12 Continuum estimate after 10, 100, and 500 iterations obtained with simple iterative
peak stripping.
m_i = (y_{i-w} + y_{i+w}) / 2    (42)
Ryan et al. (1988) proposed using twice the FWHM of the spectrometer at channel i as the
value for w. They reported that only 24 passes were required to produce acceptable
continuum shapes in PIXE spectra. During the last eight cycles, w is progressively reduced
by the factor √2 to obtain a smooth continuum. To compress the dynamic range of the
spectrum, a double-log transformation of the spectrum, log[log(y_i + 1) + 1], before the
actual stripping was proposed. In combination with the low statistics digital filter, this
procedure is called the SNIP algorithm (Statistical Nonlinear Iterative Peak clipping).
A variant of this procedure is implemented in the procedure SNIPBG given in Sec. X.
Instead of the double-logarithmic transformation, we employed a square-root transformation,
and a Savitzky-Golay smoothing is performed on the square-root data. The width w is kept
constant over the entire spectrum. The value of w is also used as the width of the smoothing
filter. Using this implementation, the continuum of the above-discussed spectrum is
calculated and represented in Figure 13. The width was set to 11 channels, approximately
corresponding to the FWHM of the peaks in the center of the spectrum, and 24 iterations
were done. Apart from delivering a smoother continuum with smaller humps, this
method executes much faster than the original method proposed by Clayton.
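The SNIPBG-style variant just described (square-root transform, light smoothing, clipping with a constant width w, back-transformation) might be sketched as follows; a plain moving average stands in here for the Savitzky-Golay filter, and the edge handling is an implementation choice:

```python
import numpy as np

def snip_continuum(y, w=11, iterations=24):
    """SNIP-style continuum estimate (sketch): square-root transform to
    compress the dynamic range, light smoothing, iterative clipping
    against the mean of the two channels a distance w away [Eq. (42)],
    then back-transformation by squaring."""
    z = np.sqrt(np.asarray(y, dtype=float))
    zp = np.pad(z, w, mode="edge")           # pad to avoid edge artefacts
    kernel = np.ones(w) / w                  # moving average as stand-in smoother
    z = np.convolve(zp, kernel, mode="same")[w:-w]
    for _ in range(iterations):
        m = 0.5 * (z[:-2 * w] + z[2 * w:])   # (z[i-w] + z[i+w]) / 2
        z[w:-w] = np.minimum(z[w:-w], m)
    return z ** 2                            # back-transform
```

With w of the order of the peak FWHM, a couple of dozen passes suffice, in line with the iteration counts quoted above.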
In both WDXRF and EDXRF, the concentration of the analyte is proportional to the
number of counts under the characteristic x-ray peak corrected for the continuum. At
constant resolution, this proportionality also exists for the net peak height. In EDXRF,
preference is given to the peak area. In WDXRF, the acquisition of the entire peak profile
is very time-consuming and the count rate is usually measured only at the peak maximum.
Table 5 Data on the Fit of the Continuum of the Spectrum Shown in Figure 14 Using Orthogonal
Polynomials

4   126 ± 7    (3.98±0.20)×10^6
5   199 ± 16   (1.88±0.09)×10^7
6   267 ± 6    (7.66±0.28)×10^10
7   266 ± 6    (3.66±2.03)×10^13
Y_BL and Y_BR are the values of the continuum at the channels i_BL and i_BR, left and right
of the peak, respectively. These values are best estimated by averaging over a number
of channels:

Y_BL = (1/n_BL) Σ_{i=i_BL1}^{i_BL2} y_i = N_BL / n_BL    (52)

Y_BR = (1/n_BR) Σ_{i=i_BR1}^{i_BR2} y_i = N_BR / n_BR    (53)

The numbers of channels in the continuum windows are n_BL = i_BL2 - i_BL1 + 1 and
n_BR = i_BR2 - i_BR1 + 1. The center positions of the continuum windows (not necessarily
integer numbers!) used in Eq. (51) are i_BL = (i_BL1 + i_BL2)/2 and i_BR = (i_BR1 + i_BR2)/2.
If both continuum windows have equal width, n_BL = n_BR = n_B/2, and are positioned
symmetrically with respect to the peak window (i_P - i_BL = i_BR - i_P), a much simpler
expression is obtained for the net peak area:

N_P = N_T - (n_P/n_B)(N_BL + N_BR)    (54)

where N_BL and N_BR are the total counts in the left and right continuum windows,
respectively, each n_B/2 channels wide, and n_P equals the number of channels in the peak
window. Applying the principle of error propagation, the uncertainty in the net peak area
is then given by

σ_NP = √[N_T + (n_P/n_B)²(N_BL + N_BR)]    (55)
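Equations (54) and (55) translate directly into code; treating each window as an inclusive (first, last) channel pair is an assumption of this sketch:

```python
import numpy as np

def net_peak_area(y, peak, left, right):
    """Net peak area and its uncertainty from window sums [Eqs. (54)-(55)].
    `peak`, `left`, `right` are (first, last) inclusive channel ranges; the
    two continuum windows together span n_B channels."""
    y = np.asarray(y, dtype=float)
    NT = y[peak[0]:peak[1] + 1].sum()                  # total counts in peak window
    NBL = y[left[0]:left[1] + 1].sum()                 # left continuum window
    NBR = y[right[0]:right[1] + 1].sum()               # right continuum window
    nP = peak[1] - peak[0] + 1
    nB = (left[1] - left[0] + 1) + (right[1] - right[0] + 1)
    NP = NT - nP / nB * (NBL + NBR)                    # Eq. (54)
    sNP = np.sqrt(NT + (nP / nB) ** 2 * (NBL + NBR))   # Eq. (55)
    return NP, sNP
```

For a flat continuum of 10 counts per channel with 220 extra peak counts in an 11-channel window, the function returns the peak area 220 exactly, together with its counting-statistics uncertainty.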
Various counting strategies can be considered and the effect on the precision can be estimated
using Eq. (57) (Bertin, 1970). In an optimum fixed-time strategy, the minimum
uncertainty is obtained when, for a total measurement time t = t_P + t_B, t_P and t_B are chosen
in such a way that their ratio is equal to the square root of the peak-to-continuum ratio:
t_P / t_B = √(I_P / I_B)    (58)

Under these conditions, the uncertainty in the net intensity is given by

σ_I = (√I_P + √I_B) / √(t_P + t_B)    (59)
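The optimum split of Eqs. (58) and (59) can be computed directly; assuming the count rates I_P and I_B are known in advance (e.g., from a quick preliminary measurement):

```python
import numpy as np

def optimal_split(IP, IB, t):
    """Optimum fixed-time strategy [Eqs. (58)-(59)]: divide the total time t
    between peak and continuum measurements so that tP/tB = sqrt(IP/IB),
    and return (tP, tB, sigma_I) for count rates IP (peak) and IB
    (continuum)."""
    r = np.sqrt(IP / IB)                              # tP / tB, Eq. (58)
    tB = t / (1.0 + r)
    tP = t - tB
    sigma = (np.sqrt(IP) + np.sqrt(IB)) / np.sqrt(t)  # Eq. (59)
    return tP, tB, sigma
```

One can verify that this split indeed reproduces Eq. (59): with the general error propagation σ_I² = I_P/t_P + I_B/t_B, substituting the optimal t_P and t_B gives exactly (√I_P + √I_B)²/t.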
In this section, two techniques based on linear least squares are discussed. The filter-fit
method makes use of library spectra, measured or calculated spectra of pure compounds,
that are used to describe the spectra of complex samples. The other method is based on
partial least-squares (PLS) regression, a multivariate calibration technique. In this case, no
spectrum evaluation in the strict sense is performed; rather, relations between the
concentrations of the compounds in the samples and the entire spectrum are established.
In this way, quantitative analysis is possible without obtaining net peak areas of the
characteristic lines.
A. Filter-Fit Method
1. Theory
If a measured spectrum of an unknown sample can be described as a linear combination of
spectra of pure elements constituting the sample, then the following mathematical model
can be written:
y_i^mod = Σ_{j=1}^{m} a_j x_ji    (60)

with y_i^mod the content of channel i in the model spectrum and x_ji the content of channel i in
the jth reference spectrum. The coefficients a_j are a measure of the contribution of the pure
reference spectra to the unknown spectrum and can be used for quantitative analysis. The
values of the coefficients a_j are obtained via multiple linear least-squares fitting, minimizing
the sum of the weighted squared differences between the measured spectrum and
the model:

χ² = Σ_{i=n1}^{n2} (1/σ_i²)(y_i - y_i^mod)² = Σ_{i=n1}^{n2} (1/σ_i²)(y_i - Σ_{j=1}^{m} a_j x_ji)²    (61)
where y_i and σ_i are the channel content and the uncertainty of the measured spectrum,
respectively, and n1 and n2 are the limits of the fitting region. A detailed discussion of the
least-squares-fitting method is given in Sec. IX.
The assumption of linear additivity [Eq. (60)] normally holds reasonably well for the
characteristic lines in the spectrum, but not for the continuum. To apply this technique,
the continuum can be removed from the unknown spectrum and from the reference
spectra, using one of the procedures described in Sec. IV, before the actual least-squares fit.
Another, frequently used approach is to apply a digital filter to both unknown and
reference spectra. This variant is known as the filter-fit method (Schamber, 1977; Statham,
1978; McCarthy and Schamber, 1981) and is discussed in some detail below.
By the discrete convolution of a spectrum with a top-hat filter [Eqs. (30) and (31)], the
low-frequency component (i.e., the slowly varying continuum) is effectively suppressed, as
discussed in Sec. III. Apart from removing the slowly varying continuum, a rather severe
distortion of the peaks is also introduced. If we apply this filter to both the unknown
spectrum and the reference spectra, the nonadditive continuum is removed and the same
type of peak distortion will be introduced in all spectra, allowing us to apply the method of
multiple linear least-squares fitting to the filtered spectra. Equation (61) then becomes

χ² = Σ_{i=n1}^{n2} (1/σ'_i²)(y'_i - Σ_{j=1}^{m} a_j x'_ji)²    (62)

where y'_i and x'_ji are the filtered unknown and reference spectra, respectively, and σ'_i² is the
variance of y'_i, given by

σ'_i² = Σ_k h_k² y_{i-k}    (63)
The least-squares estimates of the contributions of the reference spectra are then given by
(see Sec. IX)

a_j = Σ_{k=1}^{m} (α^-1)_jk b_k,   j = 1, ..., m    (64)

with

b_j = Σ_{i=n1}^{n2} (1/σ'_i²) y'_i x'_ji    (65)

α_jk = Σ_{i=n1}^{n2} (1/σ'_i²) x'_ki x'_ji    (66)
The uncertainty in each coefficient a_j is directly estimated from the error matrix:

σ²_aj = (α^-1)_jj    (67)

Schamber (1977) suggested the following equation for the uncertainties, taking into account
the effect of the filter:

σ²_aj = [(n + w + 1)/(n + w)] (α^-1)_jj    (68)

where w is the width of the central positive part of the filter and n is the width of the
negative wings. A measure of the goodness of fit is available through the reduced χ² value:

χ²_ν = χ² / (n2 - n1 + 1 - m)    (69)

which is the χ² value of Eq. (62) divided by the number of points in the fit minus the
number of reference spectra. A value close to 1 means a good fit, indicating that the
reference spectra are capable of adequately describing the unknown spectrum.
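The entire procedure of Eqs. (62)-(66) fits in a few lines of code. Since Eqs. (30) and (31) are not reproduced in this excerpt, the particular top-hat convention below (a positive central lobe of width w with weight 1 and negative wings of width v with weight -w/2v, so the coefficients sum to zero) is an assumption:

```python
import numpy as np

def tophat(w, v):
    """Zero-sum top-hat filter: positive centre of width w, negative wings
    of width v each (one common convention; an assumption here)."""
    return np.concatenate([np.full(v, -w / (2.0 * v)),
                           np.ones(w),
                           np.full(v, -w / (2.0 * v))])

def filter_fit(y, refs, w=5, v=2):
    """Filter-fit sketch: filter unknown and reference spectra with the
    top-hat, then solve the weighted normal equations of Eqs. (64)-(66)."""
    h = tophat(w, v)
    yf = np.convolve(y, h, mode="same")                  # filtered unknown
    Xf = np.array([np.convolve(r, h, mode="same") for r in refs])
    var = np.clip(np.convolve(y, h ** 2, mode="same"), 1.0, None)  # Eq. (63)
    alpha = (Xf / var) @ Xf.T                            # alpha_jk, Eq. (66)
    b = (Xf / var) @ yf                                  # b_j, Eq. (65)
    return np.linalg.solve(alpha, b)                     # a_j, Eq. (64)
```

Because the filter coefficients sum to zero and the filter is symmetric, a locally linear continuum in the unknown is removed before the fit, which is the point of the method: the recovered a_j depend only on the peak structure shared with the references.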
Most of the merits and disadvantages of the filter-fit method can be deduced
directly from the mathematical derivation given in the preceding paragraphs. The most
interesting aspect of the filter-fit method is that it does not require any mathematical
model for the continuum and that, at least in principle, the shapes of the peaks in the
unknown spectrum are exactly represented by the reference spectra. Reference spectra
should be acquired with good counting statistics, at least better than the unknown spectrum,
because the least-squares method assumes that there is no error in the independent
variables x'_ji. Reference spectra can be obtained from single-element standards. Only the
portion of the spectrum that contains peaks needs to be retained as reference in the fit.
Multielement standards can be used if the peaks of each element are well separated.
The reference spectra must provide an accurate model of the peak structure present
in the unknown spectrum. This requires that reference and unknown spectra be acquired
under strictly identical spectrometer conditions. Changes in resolution and, especially,
energy shifts can cause large systematic errors. The magnitude of this error depends on the
degree of peak overlap. Peak shifts of more than a few electron volts should be avoided,
which is readily possible with modern detector electronics. If shifts are observed over long
periods of operation of the spectrometer, careful recalibration of the spectrometer is
required or, better, the reference spectra should be acquired again. Also, peak shift and
peak broadening due to differences in count rate between standards and unknown must be
avoided. Differential absorption is another problem that might influence the accuracy of
the model. Because of the difference in x-ray attenuation in the reference and the unknown,
the Kβ to Kα ratios might differ in the two spectra. This becomes especially
problematic if the Kβ line is above and the Kα line below an absorption edge of a major
element of the unknown sample. The magnitude of the error depends on the peak overlap.
Careful selection of the samples used to produce the reference spectra is therefore required.
The procedure requires that a reference spectrum be included for each element
present in the unknown. The method provides no mechanism to deal with sum peaks. Apart
from removing the continuum, the filter also has some smoothing effect on the spectrum
and causes the peak structure to be spread out over more channels. This is equivalent to
fitting a spectrum with a somewhat lower resolution than originally acquired. Therefore,
the precision and detection limits attainable with the filter-fit method are slightly worse
than optimal. The width of the filter is important in this respect. Schamber (1977) suggests
taking the width of the top of the filter equal to the FWHM resolution of the spectrometer,
u = FWHM. The width of the wings can then be taken as v = u/2.
2. Application
The calculation procedure is quite simple and requires the following steps. The top-hat filter
is applied to the unknown spectrum and the m reference spectra [Eqs. (30) and (31)], and the
modified uncertainties are calculated using Eq. (63). Next, the vector b of length m and the
m × m square matrix α are formed using Eqs. (65) and (66), summing over the part of the
spectrum one wants to analyze (n1 to n2). Only the relevant part, such as the Kα or the Kα
plus the Kβ, of the reference spectra needs to be retained; the rest of the filtered spectrum
can be set to zero. After calculating the inverse matrix α^-1, the contribution of each
reference to the unknown and its uncertainty are calculated using Eqs. (64) and (67) or (68).
A computer implementation of the filter-fit method is given in Sec. X. The method
was used to analyze part of an x-ray spectrum of a polished NIST SRM 1103 brass sample
(Fig. 16A). The measurements were carried out using a Mo x-ray tube and a Zr secondary
target and filter assembly. Spectra of pure metals (Fe, Ni, Cu, Zn, and Pb) were acquired
Figure 16 (A) Part of the x-ray spectrum of NIST SRM 1103 brass sample. (B) Top-hat-filtered
spectrum and result of fit using reference spectra.
under identical experimental conditions. A top-hat filter of width w = 5 was used. Table 6
shows how the spectra were divided into regions of interest to produce the reference spectra.
Because considerable x-ray attenuation is present in brass, separate references were created
for the Kα and Kβ of Cu and Zn. This was not done for Fe, Ni, and Pb because these
elements are present only as minor constituents in the brass sample. Figure 16B shows the
filtered brass spectrum and the resulting fit using the seven (filtered) reference spectra. The
region below Cu Kα is expanded 100 times, and the region above Zn Kα is expanded 10
times. As can be seen, the agreement between the filtered brass spectrum and the fit is very
good. The reduced χ² value is 8.5. This high value is probably due to small peak shifts in
the reference spectra compared to the brass spectrum.
Table 7 compares the results of the filter fit with the results obtained by nonlinear
least-squares fitting using analytical functions (see Sec. VII). Although the χ² value of the
nonlinear fit is slightly better (2.7), one observes an excellent agreement between the two
methods for the analytically important data (i.e., intensity ratios). The uncertainties for
small peaks are slightly higher with the filter-fit method, as explained previously.
The filter-fit method is fast and relatively easy to implement. It can produce reliable
results when the spectrometer calibration can be kept constant within a few electron volts
and suitable standards for each element present in the sample are available. The method
Table 6 Data on the Reference Spectra and the Unknown Spectrum Used
in the Filter-Fit Procedure (Fig. 16)
Fe   0.0053±0.0002    0.0047±0.0001     11
Ni   0.0019±0.0002    0.00201±0.00008   6
Cu   0.546±0.001      0.551±0.001       0.9
Zn   0.390±0.001      0.3912±0.0008     0.31
Pb   0.0310±0.0006    0.0308±0.0003     0.65
performs well when one has to deal with a difficult-to-model continuum. If information on
trace elements and major elements is required (very large peaks next to very small ones),
the method might not be optimal. The filter-fit method is frequently used to process x-ray
spectra obtained with electron microscopes (SEM-EDX), often in combination with a
ZAF or Phi-Rho-Z correction procedure.
T(n×A) and U(n×A) are the score matrices, containing the values of the A (A ≤ p) latent
variables. P'(A×p) and Q'(A×m) are the loading matrices describing the relation between
the latent variables (T and U) and the original variables (X and Y). The number of latent
variables A in the model is of crucial importance and its optimum value must be found by
cross-validation. The matrices E and F contain the residuals, the part of the original
spectral data and the concentration data, respectively, not accounted for when using A
latent variables. The inner relation is written as

u_a = b_a t_a,   a = 1, ..., A    (74)

from which the regression coefficients B can be obtained. This operation can be seen as a
least-squares fit between the X-block and Y-block scores. The final PLS model can thus be
written as
Y = TBQ' + F    (75)
A graphical representation of the PLS model is given in Figure 17. In the normal PLS
algorithm, Y is a vector of concentrations for one element and a separate model is built
for each element. If all Y-variables are predicted simultaneously, as in the case of
Equation (70), the PLS2 algorithm is used. This method performs better when the
concentrations in the samples are highly correlated.
The quality of the calibration model can be judged by the root mean square error of
prediction (RMSEP):

RMSEP = [(1/n) Σ_{i=1}^{n} (ŷ_i - y_i)²]^{1/2}    (76)

where ŷ_i is the concentration in sample i predicted by the model and y_i is the true
concentration in the standard. To determine the optimum number of latent variables, the
RMSEP is calculated using PLS models with different numbers of latent variables A. The
RMSEP values are plotted against A and the value where a minimum or a plateau is
reached is taken. For small datasets, the calculation of the RMSEP is done using
cross-validation.
When a large number of standards is available, they are split into a training set
(approximately two-thirds of the samples) and a prediction set (approximately one-third of
the samples). The training set is used to build the PLS model and the RMSEP is calculated
based on the concentrations of the prediction set. Alternatively, for smaller calibration
sets, leave-one-out cross-validation can be used. Each sample is excluded once from the
dataset and predicted by the model built with the remaining samples. This is repeated until
all samples have been excluded once.
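The leave-one-out estimate of the RMSEP [Eq. (76)] is model-agnostic and can be sketched as below; ordinary least squares stands in for PLS here, since a faithful PLS implementation is beyond the scope of a short example:

```python
import numpy as np

def rmsep_loo(X, y, fit, predict):
    """Leave-one-out cross-validated RMSEP, Eq. (76): each sample is left
    out once, the model is rebuilt on the remaining samples, and the
    left-out sample is predicted."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    errs = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i        # boolean mask without sample i
        model = fit(X[keep], y[keep])
        errs[i] = predict(model, X[i]) - y[i]
    return np.sqrt(np.mean(errs ** 2))

# Ordinary least squares as a stand-in for the PLS calibration:
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_predict = lambda coef, x: x @ coef
```

To select the number of latent variables as described above, one would call `rmsep_loo` once per candidate A with the corresponding PLS fit routine and keep the value of A at which the curve reaches its minimum or plateau.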
Geladi and Kowalski (1986) published a tutorial on PLS and its relation to other
regression techniques. A standard work on the PLS method is the book by Martens and
Naes (1989).
2. Application
The PLS method is illustrated with the analysis of aqueous solutions containing Ni, Cu,
and As with concentrations in the ranges 18-50, 0.5-45, and 5-20 g/L, respectively,
whereas Zn, Fe, and Sb were present in more or less constant amounts. Spectra were
acquired for 1000 s from 5-mL solutions using an EDXRF spectrometer equipped with a
Rh x-ray tube operating at 30 kV and a Si(Li) detector having 160 eV resolution at Mn Kα.
From the 22 samples, 10 were used to build the PLS model and the remaining ones were
used as the prediction set. Table 8 gives the composition of the samples in the calibration
set. Figure 18 shows part of the spectrum between 5 and 14 keV of sample number 9. The
Cu Kα line is considerably interfered by the Ni Kβ line. Absorption effects can be expected
because the Ni K absorption edge (8.33 keV) is just above the Cu Kα line (8.04 keV), and the
As K lines can cause secondary fluorescence of Cu and Ni.
A PLS1 model is built for Cu. The X matrix consists of 451 x-ray intensities
(variables) between 5 and 14 keV (channels 250 to 700) of the 10 samples. The Y matrix
contains the Cu concentrations of those samples.
In Figure 19, the RMSEP based on leave-one-out cross-validation is plotted as a
function of the number of latent variables. A minimum error of 0.57 g/L is obtained for
three latent variables, so the PLS model with three latent variables is retained.
Figure 20 compares the true Cu concentration with the predicted concentration for the
calibration set. The PLS model predicts very accurately the Cu concentrations in the
range from 1 to 40 g/L. The so-called regression coefficients of the PLS model show
which variables (channels) are used in the model. They are plotted in Figure 21. As could
be expected, the Cu concentrations are predicted from the channel contents corresponding
to the Cu K lines. The influence of absorption and enhancement results in a small negative
contribution from the Ni peak and a small positive contribution from the As peaks,
respectively. The PLS model thus handles both the problem of spectral interference and
matrix effects. The Cu concentration predicted by the PLS model for the test set is given in
Table 9. Except for the two samples with the highest concentrations, the Cu concentration
Figure 19 RMSEP values versus the number of latent variables for the prediction of the Cu
concentration.
in the unknown samples is very well estimated. The prediction for the test set is generally
somewhat worse than that for the calibration set. To build an accurate model, a
large number of standards spanning the concentration range of interest for each element is
required. This is certainly the major drawback of PLS for its application in XRF,
Figure 20 Comparison of true and predicted Cu concentrations for the samples in the PLS
calibration set.
Figure 21 Regression coefficients of the PLS model for Cu, showing which variables (channels in
the x-ray spectrum) are used to predict Cu.
especially when solid samples are considered. This problem can to some extent be
overcome by building a calibration set via Monte Carlo simulation. Just as with the
filter-fit method, standards and unknowns need to be measured under strictly identical
spectrometer conditions. Changes in gain or resolution will cause systematic errors in the
calculated concentrations.
Swerts and Van Espen (1993) demonstrated the use of PLS for the determination of
S in graphite using an XRF spectrometer with Rh x-ray tube excitation and a Si(Li) detector.
Because of diffraction effects, least-squares fitting of the spectra was nearly impossible. Using
PLS, the sulfur content could be determined over a concentration range of 2-60%, with an
accuracy of better than 5% relative standard deviation. Urbanski and Kowalska applied
the PLS method to the determination of Sr and Fe in powdered rock samples and to the
determination of the S and ash content in coal using a radioisotope XRF system equipped
with a low-resolution gas proportional counter. They also demonstrated the usefulness of
this method for the determination of the thickness and composition of Sn-Pb and Ni-Fe
layered structures (Urbanski and Kowalska, 1995). Molt and Schramm (1987) compared
principal component regression (PCR) and PLS for the determination of S, exhibiting strong
interference from Mo, in aqueous and solid samples. The results were also compared with
quantitative analysis using the method developed by Lucas-Tooth and Price (1961).
Equally good results were obtained with all three methods. Similar results were obtained
by Lemberge and Van Espen for the determination of Ni, Cu, and As in liquid samples.
They demonstrated that taking the square root of the data improves the PLS model and
that the PLS method extracts information from the scattered excitation radiation to
describe the matrix effects (Lemberge and Van Espen, 1999).
A widely used and certainly the most flexible procedure for evaluating complex x-ray
spectra is based on least-squares fitting of the spectral data with an analytical function.
The method is conceptually simple, but not trivial to implement and use.
A. Concept
In this method, an algebraic function, including analytically important parameters such as
the net areas of the fluorescence lines, is used as a model for the measured spectrum. The
object function (χ²) is defined as the weighted sum of squares of the differences between
this model y(i) and the measured spectrum y_i over a region of the spectrum:

χ² = Σ_{i=n1}^{n2} (1/σ_i²) [y_i - y(i; a_1, ..., a_m)]²    (77)
where σ_i² is the variance of data point i, normally taken as σ_i² = y_i, and a_j are the
parameters of the model. The optimum values of the parameters are those for which χ² is
minimal. They can be found by setting the partial derivatives of χ² with respect to the
parameters to zero:

∂χ²/∂a_j = 0,   j = 1, ..., m    (78)
If the model is linear in all the parameters a_j, these equations result in a set of m linear
equations in the m unknowns a_j, which can be solved algebraically. This is known as linear
least-squares fitting. If the model is nonlinear in one or more of its parameters, a direct
solution is not possible and the optimum values of the parameters must be found iteratively.
An initial value is given to the parameters and they are varied in some way until a
minimum for χ² is obtained. The latter is equivalent to searching for a minimum in the
(m+1)-dimensional χ² hypersurface. This is known as nonlinear least-squares fitting. The
selection of a suitable minimization algorithm is very important because it determines to a
large extent the performance of the method. A detailed discussion of linear and nonlinear
least-squares fitting is given in Sec. IX.
The most difficult problem to solve when applying this least-squares procedure is the
construction of an analytical function that accurately describes the observed spectrum. The
model must be capable of describing accurately the spectral data in the fitting region. This
requires an appropriate model for the continuum, for the characteristic lines of the elements,
and for all other features present in the spectrum, such as absorption edges, escape peaks, and
sum peaks. Although the response function of the energy-dispersive detector is, to a very
good approximation, Gaussian, deviations from the Gaussian shape need to be taken into
account. Failure to construct an accurate model will result in systematic errors, which
under certain conditions may lead to gross positive or negative errors in the estimated
peak areas. On the other hand, the fitting function should remain simple, with as few
parameters as possible. Especially for nonlinear fitting, the location of the χ² minimum
becomes problematic when a large number of parameters is involved.
In general, the fitting model consists of two parts:

y(i) = y_B(i) + Σ_P y_P(i)    (79)

where y(i) is the calculated content of channel i; the first part describes the continuum and
the second part the contributions of all peaklike structures.
Because the fitting functions for both linear and nonlinear least-squares fitting have
many features in common, we treat the detailed description of the fitting function for the
most general case of nonlinear least squares. Moreover, if the least-squares fit is done
using the Marquardt algorithm, the linear least-squares fit is computationally a particular
case of the nonlinear least-squares fit. Programs based on this algorithm can perform
linear and nonlinear fitting using the same computer code. A large part of the discussion
given here is based on the computer code AXIL, developed by the author for spectrum
fitting of photon-, electron-, and particle-induced x-ray spectra (Van Espen et al., 1977a,
1977b, 1979b, 1986).
1. Linear Polynomial
A linear polynomial of the type

y_B(i) = a_0 + a_1(E_i - E_0) + a_2(E_i - E_0)² + ... + a_k(E_i - E_0)^k    (80)

is useful to describe the continuum over a region 2-3 keV wide. Wider regions often
exhibit too much curvature to be described by this type of polynomial. In Eq. (80), E_i is
the energy (in keV) of channel i [see Eq. (84)] and E_0 is a suitable reference energy, often
the middle of the fitting region. Expressing the polynomial as a function of (E_i - E_0) rather
than as a function of the channel number is done for computational reasons; (E_i - E_0)³ is,
at most, of the order of 10³, whereas i³ can be as high as 10⁹. Most computer programs
that implement a polynomial model for the continuum allow the user to specify the degree
of the polynomial; k = 0, 1, and 2 produce, respectively, a constant, a straight line, and a
parabolic continuum. Values of k larger than 4 are rarely useful because such high-degree
polynomials tend to show physically unrealistic oscillations. Equation (80) is linear in the
fitting parameters a_0, ..., a_k, so this function can be used in linear as well as in
nonlinear least-squares fitting.
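Since Eq. (80) is linear in a_0, ..., a_k, the continuum coefficients follow from a single weighted linear least-squares solve; the statistical weights 1/σ_i² = 1/y_i below follow the convention quoted for Eq. (77):

```python
import numpy as np

def fit_continuum_poly(E, y, k=2, E0=None):
    """Weighted linear least-squares fit of the polynomial continuum of
    Eq. (80), expressed in powers of (E_i - E0)."""
    E, y = np.asarray(E, float), np.asarray(y, float)
    if E0 is None:
        E0 = 0.5 * (E[0] + E[-1])                  # middle of the fitting region
    X = np.vander(E - E0, k + 1, increasing=True)  # columns (E-E0)^0 ... (E-E0)^k
    sw = 1.0 / np.sqrt(np.clip(y, 1.0, None))      # sqrt of the weights 1/y_i
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef, E0
```

Working in powers of (E - E0) keeps the design matrix well conditioned, which is precisely the computational argument made above for not using raw channel numbers.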
2. Exponential Polynomial
A linear polynomial cannot be used to fit the continuum over the entire spectrum or to fit
regions of high positive or negative curvature. Higher curvature can be modeled by
functions of the type

y_B(i) = a_0 exp[a_1(E_i - E_0) + a_2(E_i - E_0)² + ... + a_k(E_i - E_0)^k]    (81)

where k is the degree of the exponential polynomial. A value of k as high as 6 might be
required for an accurate description of a continuum from 2 to 16 keV. This function is
nonlinear in the fitting parameters a_1, ..., a_k and requires a nonlinear least-squares
procedure and some initial guess of these parameters. Initial values for these nonlinear
parameters can be determined by first estimating the shape of the continuum using one of
the procedures described in Sec. IV, followed by a linear fit of the logarithm of this
continuum. These initial guesses are then further refined in the nonlinear fitting procedure.
3. Bremsstrahlung Continuum
The exponential polynomial is not suitable for describing the shape of the continuum
observed in electron- and particle-induced x-ray spectra, mainly because of the high
curvature at the low-energy end of the spectrum. The continuum results from photons
emitted from the sample by the retardation of fast electrons. The shape of the emitted
continuum is essentially an exponentially decreasing function according to Kramers'
formula. At low energies, the emitted photons are strongly absorbed by the detector windows
and by the sample. A suitable function to describe such a radiative continuum is an
exponential polynomial multiplied by the absorption characteristics of the spectrometer:

y_B(i) = a_0 exp[a_1(E_i - E_0) + ... + a_k(E_i - E_0)^k] T_a(E_i)    (82)

A detailed discussion of the function T_a(E) is given on page 288. To be physically correct,
the absorption term must be convoluted with the detector response function, because
the sharp edges due to absorption by detector windows (Si or Au) or elements present in
the sample are smeared out by the finite resolution of the detector.
4. Continuum Removal
An alternative to an algebraic description of the continuum is to estimate the continuum
first, using one of the procedures outlined in Sec. IV, and to subtract this continuum from
the measured spectrum before the actual least-squares fitting. To correctly implement the
least-squares fit after subtraction of the continuum, the weights 1/σ_i² [Eq. (77)] must be
adjusted. If y'_i represents the spectral data after subtraction of the continuum,
y'_i = y_i - y_B(i), the variance of y'_i is given by σ'_i² = σ_i² + σ²_{y_B(i)}. A reasonable approximation
for σ²_{y_B(i)} is y_B(i) itself, so that the total variance becomes σ'_i² = y_i + y_B(i). If this adjustment
of the weights is not made, the uncertainties in the net peak areas are underestimated,
especially for small peaks on a high continuum.
It is rather difficult for an inexperienced user to select the appropriate continuum
model for a given spectrum. The following might serve as a general guideline. For fitting
regions 2-3 keV wide, a linear polynomial continuum is often adequate. To fit large regions
of XRF spectra, the exponential polynomial provides the most accurate results, with
k typically between 4 and 6. The same holds for the bremsstrahlung continuum of SEM-EDX
and PIXE spectra. The simplest method from the user's point of view is
continuum stripping, but this method does not provide optimum results. A slight
underestimation or overestimation might occur, resulting in large relative errors in the area
determination of small peaks (Vekemans et al., 1995).
1. Single Gaussian
A Gaussian peak is characterized by three parameters: the position, width, and height or
area. It is desirable to describe the peak in terms of its area rather than its height because
the area is directly related to the number of x-ray photons detected, whereas the height
depends on the spectrometer resolution. The first approximation to the profile of a single
peak is then given by

y_P(i) = [A / (σ√(2π))] exp[-(x_i - μ)² / (2σ²)]    (83)

where A is the peak area (counts), σ is the width of the Gaussian expressed in channels,
and μ is the location of the peak maximum. The often-used FWHM is related to σ by the
factor 2√(2 ln 2), or FWHM ≈ 2.35σ. In Eq. (83), the peak area is a linear parameter; the
width and position are nonlinear parameters. This implies that a nonlinear least-squares
procedure is required to find optimum values for the latter two parameters. Using a linear
least-squares method assumes that the position and width of the peak are known with high
accuracy from calibration.
To describe part of a measured spectrum, the fitting function must contain a number
of such functions, one for each peak. For 10 elements and 2 peaks (Kα and Kβ) per element,
we would need to optimize 60 parameters. It is highly unlikely that such a nonlinear least-squares
fit will terminate successfully at the global minimum. To overcome this problem,
the fitting function can be written in a different way, as shown in the next subsection.
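The area-parameterized Gaussian of Eq. (83) can be written directly in code; parameterizing by the area means that summing the profile over the channels approximates the number of detected photons:

```python
import numpy as np

def gauss_peak(x, area, mu, sigma):
    """Single Gaussian peak of Eq. (83), parameterized by area (counts),
    position mu, and width sigma (all in channel units)."""
    return (area / (sigma * np.sqrt(2.0 * np.pi))
            * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)))
```

When sigma spans a few channels, the channel sum recovers the area to high accuracy, and the profile falls to half its maximum at mu ± FWHM/2 with FWHM = 2√(2 ln 2) σ ≈ 2.35σ.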
with E_j the energy (in eV) of the x-ray line and σ the peak width given by

σ² = (NOISE / 2.3548)² + 3.85 FANO E_j   (86)

In this equation, NOISE is the electronic contribution to the peak width (typically 80–100 eV FWHM), with the factor 2.3548 to convert to σ units; FANO is the Fano factor (≈0.114); and 3.85 eV is the energy required to produce an electron–hole pair in silicon. The term GAIN/(σ√(2π)) in Eq. (85) is required to normalize the Gaussian so that the sum over all channels is unity.
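Equation (86) is straightforward to evaluate. The short Python sketch below uses the typical values quoted in the text (NOISE = 100 eV FWHM, FANO = 0.114) and the 3.85 eV pair-creation energy of silicon; the Mn Kα energy of 5895 eV is used as an example:

```python
import math

def peak_sigma_ev(energy_ev, noise_ev=100.0, fano=0.114):
    """Eq. (86): peak width sigma (eV) from electronic noise plus the
    Fano-limited statistics of charge production in silicon (3.85 eV/pair)."""
    return math.sqrt((noise_ev / 2.3548) ** 2 + 3.85 * fano * energy_ev)

def fwhm_ev(energy_ev, noise_ev=100.0, fano=0.114):
    # convert sigma back to FWHM units
    return 2.3548 * peak_sigma_ev(energy_ev, noise_ev, fano)

width_mn = fwhm_ev(5895.0)   # Mn Ka at 5.895 keV
```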
For linear least-squares fitting, ZERO, GAIN, NOISE, and FANO are physically meaningful constants. In the case of nonlinear least squares, they are parameters to be refined during the fitting. The advantage of optimizing the energy and resolution calibration parameters rather than the position and width of each peak is a vast reduction of the dimensionality of the problem. The nonlinear fit of 10 peaks would now involve 14 parameters compared to 30. Even more importantly, all information available in the spectrum is used to estimate ZERO, GAIN, NOISE, and FANO and thus the positions and the widths
of all peaks. Imagine a small, severely overlapping doublet with a well-defined peak on both sides of this doublet. These two peaks will contribute most to the determination of the four calibration parameters, virtually fixing the position and the width of the two peaks in the doublet. As a consequence, their areas can be determined much more accurately.
Referring to our discussion on information content in Sec. II, we did not obtain this extra performance for nothing. We have supplied extra information: the energy of the peaks and the two calibration relations [Eqs. (84) and (86)]. Fitting with Eq. (85) requires that the extra information we supply is indeed correct.
With modern electronics, the linearity of the energy calibration [Eq. (84)] holds very well in regions above 2 keV. Fitting the entire spectrum, including the low-energy region, might require more complex energy calibration functions. To fit PIXE spectra from 1 to 30 keV, Maenhaut and Vandenhaut (1986) suggested the following function: i = C₁ + C₂E + C₃ exp(−C₄E).
The relation between the square of the peak width and the energy [Eq. (86)] is based on theoretical considerations. The relation holds very well if the doublet splitting of the x-ray lines is taken into account. The Kα1–Kα2 separation increases from a negligible value for Ca (3.5 eV) to nearly 100 eV for Mo. The observed peak shape of a K line is actually an envelope of two peaks. This envelope can be represented rather well by a single Gaussian, but failing to take this doublet splitting into account (i.e., fitting with a single Gaussian where doublets are required) will result in peak widths that do not obey Eq. (86). To illustrate this, the observed width of a number of Kα lines as a function of the x-ray energy is presented in Figure 22. The dotted line represents the width of the Kα doublet fitted as one peak. The solid (straight) line shows the width of the individual lines in the doublet.
y_P(i) = A Σ_{j=1}^{N_P} R_j G(i, E_j)   (87)

where G are the Gaussians for the various lines with energy E_j and R_j the relative intensities of the lines. The summation runs over all lines in the group (N_P), with Σ_j R_j = 1.
The transition probabilities of all lines originating from a vacancy in the same (sub)shell (K, LI, LII, . . .) are constants, independent of the excitation. However, the relative intensities depend on the absorption in the sample and in the detector windows. To take this into account, the x-ray attenuation must be included in Eq. (87). The relative intensity ratios are obtained by multiplying the transition probabilities with an absorption correction term:

R′_j = R_j T_a(E_j) / Σ_j R_j T_a(E_j)   (88)

Contributions from various subgroups (i.e., between K and L, between LI and LII, etc.) depend on the type of excitation (photons, electrons, protons) and on the excitation energy. General values cannot be given and must be determined for the particular excitation
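The renormalization of Eq. (88) is a one-liner in practice. In the sketch below (Python; the transition probabilities and attenuation factors are invented illustrations, not tabulated values), each line is weighted by its attenuation term and the sum is rescaled to 1:

```python
def corrected_ratios(trans_prob, attenuation):
    """Eq. (88): multiply each transition probability R_j by its absorption
    correction T_a(E_j), then renormalize so the ratios again sum to 1."""
    weighted = [r * t for r, t in zip(trans_prob, attenuation)]
    total = sum(weighted)
    return [w / total for w in weighted]

# hypothetical Ka/Kb pair: the softer Ka line is attenuated more strongly
ratios = corrected_ratios([0.88, 0.12], [0.70, 0.85])
```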
Figure 22 FWHM of various Kα lines, fitted as a single peak and as a Kα1–Kα2 doublet.
4. Modified Gaussians
When fitting very large peaks with a Gaussian, the deviation from the pure Gaussian shape becomes significant. In Figure 24, a Mn K spectrum with 10⁷ counts in the Mn Kα peak is shown. One observes a tailing on the low-energy side of the peaks and a continuum that is
Figure 23 Fit of a complex L line spectrum of tungsten. In total, 24 transitions, divided over the 3 L subshells, are required for the description of the spectrum.
Figure 24 Mn K line spectrum with very good counting statistics. The difference from the Gaussian response is obtained by subtracting all Gaussian peaks. From this, the peak-shape correction is calculated.
Figure 25 Fit of the spectrum of a brass sample (NIST SRM 1106): (a) fitted with simple Gaussians and (b) fitted with Gaussians including tail and step functions.
where C_i is the numerical peak-shape correction at channel i. Values in the table are interpolated to account for the difference between the energy scale of the correction and the actual energy calibration of the spectrum. Similar to the parameters of the non-Gaussian analytical functions, the shape of the numeric correction seems to vary slowly from one element to another. This allows us to interpolate the peak-shape correction for all elements from a limited set of experimentally determined corrections.
A major disadvantage of this method is that it is quite difficult and laborious to obtain good experimental peak-shape corrections. Although they are, in principle, detector dependent, experience has proven that the same set of corrections can be used for different detectors with reasonable success, proving the fundamental nature of the observed non-Gaussian shape. Another disadvantage is that the peak-shape correction for the Kβ becomes underestimated if strong differential absorption takes place because the peak-shape correction is only related to the area of the Kα peak. Also, it is nearly impossible to apply this method to the description of L-line spectra. A major advantage, however, is the computational simplicity of the method and the fact that no extra parameters are required in the model.
5. Absorption Correction
The absorption correction term T_a(E), used in Eqs. (82) and (88), includes the x-ray attenuation in all layers and windows between the sample surface and the active area of the detector. For high-energy photons, the transparency of the detector crystal also needs to be taken into account. In x-ray fluorescence, the attenuation in the sample, causing additional changes in the relative intensities, can also be considered, provided the sample composition is known. The total correction term is thus composed of a number of contributions:

T_a(E) = T_Det T_Path T_Sample   (96)
The detector contribution for a Si(Li) detector is given by
T_Det(E) = exp(−μ_Be ρ_Be d_Be) exp(−μ_Au ρ_Au d_Au) exp(−μ_Si ρ_Si d_Si) [1 − exp(−μ_Si ρ_Si D_Si)]   (97)

where μ, ρ, and d are the mass-attenuation coefficient, the density, and the thickness of the Be window, the gold contact layer, and the silicon dead layer, respectively. In the last term, D is the thickness of the detector crystal.
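Equation (97) translates directly into code. In the sketch below (Python; mass-attenuation coefficients would in practice come from tabulations, and all thicknesses are illustrative), each layer contributes one exponential and the last factor is the fraction of photons actually stopped in the crystal:

```python
import math

def t_det(mu_be, d_be, mu_au, d_au, mu_si, d_si, crystal_d,
          rho_be=1.85, rho_au=19.3, rho_si=2.33):
    """Eq. (97) for a Si(Li) detector: mu_* are mass-attenuation coefficients
    (cm^2/g) at the energy of interest, d_* thicknesses (cm), densities in g/cm^3."""
    window = math.exp(-mu_be * rho_be * d_be)      # Be window
    contact = math.exp(-mu_au * rho_au * d_au)     # Au contact layer
    dead = math.exp(-mu_si * rho_si * d_si)        # Si dead layer
    stopped = 1.0 - math.exp(-mu_si * rho_si * crystal_d)  # absorbed in crystal
    return window * contact * dead * stopped
```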
Any absorption in the path between the sample and the detector can be modeled in a
similar way. For an air path, the absorption is given by
Also, various polynomial-type functions expressing the escape ratio as a function of the energy of the parent peak are in use. The coefficients of the function can be determined by least-squares fitting from experimental escape ratios. For spectra obtained with a Ge detector, one needs to account in a similar way for both the Ge Kα and Ge Kβ escape peaks for elements above arsenic.
The incorporation of the sum peaks in the fitting model is more complex. The method discussed below was first implemented by Johansson in the HEX program (Johansson, 1982). Initially, the spectrum is fitted without considering pileup peaks. The peaks are then sorted according to their height and the n largest peaks are retained. Peaks that differ by less than 50 eV are combined into one peak. Using Eqs. (5) and (6), the relative intensities of all possible n(n + 1)/2 pileup peaks and their energies are calculated and the m most intense are retained. Knowing the relative intensities and the energies of these m pileup peaks, they can be included in the fitting model as one pileup element. In the next iteration, the total peak area A of this pileup element is obtained. The construction of the pileup element can be repeated during the next iterations as more reliable peak areas become available. In Figure 26, part of a PIXE spectrum is shown fitted with and without sum peaks included.
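The construction of the pileup element can be sketched as follows (Python; as simplifications, the merging step is applied here to the calculated sum peaks, and the sum-peak intensity is taken proportional to the product of the parent areas, which is the proportionality behind Eqs. (5) and (6) up to a common pulse-pair resolution factor):

```python
def pileup_element(peaks, n=5, m=10, merge_ev=50.0):
    """Keep the n largest peaks, form all pairwise sum peaks (a peak may pile
    up with itself, giving n(n+1)/2 combinations), merge sums closer than
    merge_ev, and keep the m most intense. peaks = [(energy_eV, area), ...]."""
    biggest = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    sums = []
    for i in range(len(biggest)):
        for j in range(i, len(biggest)):       # j >= i: include self-pileup
            e = biggest[i][0] + biggest[j][0]
            inten = biggest[i][1] * biggest[j][1]
            sums.append((e, inten))
    sums.sort()
    merged = []
    for e, inten in sums:
        if merged and e - merged[-1][0] < merge_ev:
            merged[-1] = (merged[-1][0], merged[-1][1] + inten)
        else:
            merged.append((e, inten))
    merged.sort(key=lambda p: p[1], reverse=True)
    total = sum(i for _, i in merged[:m])
    return [(e, i / total) for e, i in merged[:m]]   # energies + relative intensities

element = pileup_element([(5900.0, 100.0), (6490.0, 10.0)], n=2)
```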
Figure 27 Effect of the use of a constraint on the shape of the χ² response surface. (Left: marginal effect for a large peak; right: important contribution for a small peak.)
E. Examples
To illustrate the working of the nonlinear least-squares-fitting method, an artificial spectrum with Cu K and Zn K lines is fitted with four Gaussian peaks on a constant continuum. Using the Marquardt algorithm, the position, width, and area of each peak are determined. The fitting function thus is

y_i = a₀ + Σ_{j=1}^{4} a_{3j−2} exp[−(i − a_{3j−1})² / (2 a_{3j}²)]   (109)
with i the channel number (independent variable) and a_j the parameters to be determined, 13 in total. In Table 10, the values of the parameters used to generate the spectrum (true values), the initial guesses for the nonlinear parameters, and the final fitted values are given. The initial guesses were deliberately taken rather far away from the true values. Figure 29 (top) shows the fitted spectrum after the first and second iterations. During the second iteration, the Marquardt algorithm evolved into a gradient search, drastically changing the position and width of the peaks. Even after five iterations, the calculated spectrum deviates considerably from the measured spectrum, as can be seen from Figure 29 (bottom).
Figure 28 The effect of weighting the least-squares fit is shown on part of a PIXE spectrum with a very small number of counts per channel.
                          True value   Initial guess   Fitted value
Area (counts)
  CuKα                     100,000          0          100,134 ± 321
  CuKβ                      13,400          0           13,163 ± 169
  ZnKα                      30,000          0           30,092 ± 213
  ZnKβ                       4,106          0            4,138 ± 83
Position (channel number)
  CuKα                      402.05        395           402.03 ± 0.01
  CuKβ                      445.25        450           445.35 ± 0.06
  ZnKα                      431.55        435           431.59 ± 0.04
  ZnKβ                      478.60        485           478.68 ± 0.09
Width (channels)
  CuKα                       3.913          3             3.91 ± 0.01
  CuKβ                       4.033          3             3.99 ± 0.05
  ZnKα                       3.995          3             4.02 ± 0.03
  ZnKβ                       4.123          3             4.06 ± 0.08
Finally, after 10 iterations, a perfect match between them is obtained, with a reduced χ² value of 0.96.
From Table 10, it is evident that the fit was quite successful, with all peak areas, positions, and widths estimated correctly within the calculated standard deviation. One observes that the uncertainties in the peak areas are approximately equal to the square root of the peak area and that the position and the width of the peaks are estimated very precisely (within 0.01 channel or 0.2 eV), especially for the larger peaks.
By observing how the iteration procedure changes the peak width and position parameters, one can imagine that something might go wrong. Especially if the spectrum is more complex, chances are high that the iteration stops in a false minimum or even drifts away completely. In both cases, physically incorrect parameter estimates will be obtained. In practice, it is of course possible to give much better initial estimates for the peak position and width parameters than used in this example.
In the next example, a complex spectrum (geological reference material JG1, excited with Mo K x-rays from a secondary target system) is evaluated using nonlinear least-squares fitting. In Figure 30, the spectrum and the fit are shown together with the residuals of the fit (see p. 298). Due to the large number of overlapping lines, the method used in the first example (fitting the position and width of each peak independently) is not applicable in this case. For the description of the spectrum from 1 to 18 keV, the fluorescence lines of 21 elements were used. Au, Hg, Pb, and Th were each treated as one L group (L1 + L2 + L3) and Al, Si, Ti, V, Cr, Mn, Ni, Cu, Zn, Ga, Br, Sr, Rb, Y as one K group (Kα + Kβ); K, Ca, and Fe were fitted with individual Kα and Kβ peaks. The coherently scattered Mo Kα radiation was fitted with a Gaussian and the incoherently scattered Mo Kα radiation was fitted with a Voigt profile. Including escape and sum peaks, this amounts to well over 100 peak profiles. Step and tail functions [Eq. (89)] were included for all peaks, using expressions to relate the step and tail heights and the tail width to the energy of the peak. The continuum is described by an exponential function [Eq. (81)] with six parameters. The least-squares fit thus performed required the refinement of 37 parameters
Figure 29 Artificial Cu K and Zn K line spectrum fitted with a nonlinear least-squares procedure to optimize peak area, position, and width. (Top: fitted spectrum after first and second iterations; bottom: after fifth and final iterations.)
(4 calibration parameters, 27 peak areas, 6 continuum parameters, 3 step and tail parameters, and 1 Voigt parameter). The minimum χ² value of 1.47 is obtained after 16 iterations. The residuals indicate an overall good fit, with most of the residuals in the −3 to 3 interval, without any systematic patterns. It is interesting to note that the fitted continuum is well below the base of the peaks, especially in the region from channel 200 to 600. The continuum describes correctly the small scattered bremsstrahlung contribution from the x-ray tube above 12 keV (channel 600) and the Compton scattering in the detector at the low-energy side. Most of the apparent continuum in this secondary-target EDXRF spectrum is due to incomplete charge collection and tailing phenomena of the scattered Mo excitation radiation and the fluorescence lines.
are caused by discrepancies between the fitting model and the observed data and cause inaccuracies in the net peak areas.
1. Error Estimate
Section IX explains how the least-squares-fitting method (linear as well as nonlinear) allows the estimation of the uncertainties in the fitted parameters. These uncertainties result from the propagation of the statistical fluctuations in the spectral data into the parameters. Intuitively, one could come to the conclusion that the standard deviation of the peak area should be close to the square root of the peak area. This is indeed the case for a large, interference-free peak on a low continuum, but if the peak is sitting on a high continuum and/or is seriously overlapped by another peak, the uncertainty in the estimated peak area will be much larger.
A properly implemented least-squares method not only correctly estimates the net peak areas but also their uncertainty, taking into account the continuum and the degree of peak overlap, provided, of course, that the fitting model is capable of describing the measured spectrum. The closer together the peaks are, the higher the uncertainties in the two peak areas will become. (Theoretically, in the limit of complete overlap, the uncertainty will be infinite and the areas of the two peaks will take completely erratic values, but their sum will still represent correctly the total net peak area of the two peaks; in practice, the curvature matrix α will be singular so that the matrix inversion fails.)
The uncertainties in the net peak areas can be used to decide if a peak is indeed present in the spectrum and to calculate the detection limit. A peak area is statistically not significantly different from zero if its value is in the range ±3σ. Any value above 3σ gives clear evidence that the peak is present; any value below −3σ would indicate that there is something wrong with the model because truly negative peak areas are physically meaningless. Because the uncertainty in the net peak area includes the influence of continuum and peak overlap, it can be used to calculate the a posteriori detection limits of the elements (peaks) present in the spectrum. Three situations can occur:
From the definition of the least-squares method, it follows that the χ² value [Eq. (75)] estimates how well the model describes the data. The reduced χ² value, obtained by dividing χ² by the number of degrees of freedom ν,

χ²_ν = (1/ν) χ² = [1/(n − m)] χ²   (110)

has an expected value of 1 for a perfect fit. The number of degrees of freedom equals the number of data points (n) minus the number of parameters (m) estimated during the fit. Because χ² is also a random variable, the observed value will often be slightly larger or smaller than 1. Actually, χ² follows (approximately) a chi-square distribution, and the 90% confidence interval is given by the tabulated critical values of this distribution [see Eq. (141)].
The χ² value can also be evaluated over a limited region of the spectrum, between channels n₁ and n₂:

χ²_P = [1/(n₂ − n₁ + 1)] Σ_{i=n₁}^{n₂} (1/σ_i²)(y_i − ŷ_i)²   (112)
r_i = (y_i − ŷ_i) / σ_i   (114)

It is the sum of the squares of these residuals that was minimized by the least-squares method. Values in excess of 3 or below −3, and especially the presence of a pattern in the residuals, indicate poorly fitted regions.
Monte Carlo techniques for the simulation of x-ray spectra are becoming more and more popular, particularly because of the fast computers available today. These simulated spectra are useful for studying the behavior and performance of various spectrum-processing methods. The Monte Carlo technique can also be used in quantitative analysis procedures, as will be discussed in Sec. VIII.B.
Figure 32 Simulation of a child spectrum from a parent. (Top: original spectrum; middle:
the cumulative distribution; bottom: generated child.)
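The scheme of Figure 32 amounts to inverse-transform sampling. A minimal Python sketch (the parent channel contents are invented for illustration):

```python
import random

def simulate_child(parent, n_counts, seed=1):
    """Build the cumulative distribution of the parent spectrum, then draw
    n_counts uniform random numbers and map each back to a channel, producing
    a statistically independent child spectrum (cf. Figure 32)."""
    total = sum(parent)
    cumulative = []
    running = 0.0
    for counts in parent:
        running += counts / total
        cumulative.append(running)
    rng = random.Random(seed)
    child = [0] * len(parent)
    for _ in range(n_counts):
        u = rng.random()
        # first channel whose cumulative value reaches u (a linear scan for
        # clarity; a bisection search would be used in practice)
        for ch, c in enumerate(cumulative):
            if u <= c:
                child[ch] += 1
                break
    return child

child = simulate_child([0, 10, 30, 60], 1000)
```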
In contrast to the normal library least-squares method, this method has the advantage that the library spectra are simulated for a composition close to the composition of the spectrum to be analyzed, rather than measured from standards. This eliminates the necessity of applying the top-hat filter and problems related to changes in Kβ/Kα ratios, and again the method combines spectrum evaluation with quantitative analysis.
Finally, the ability to simulate x-ray spectra that agree very well with real measured spectra opens the possibility to use them as standards for quantitative analysis based on partial least-squares regression (see Sec. VI.B). Indeed, as this method only functions correctly if the PLS model is built using a large number of standards covering the entire concentration domain, it seems advantageous to use simulated spectra for this. All the interelement interactions can be accounted for by the simulation and only a few real standards are required to scale the simulated spectra.
IX. THE LEAST-SQUARES-FITTING METHOD
The aim of the least-squares method is to obtain optimal values for the parameters of a function that models the dependence observed in experimental data. The method has its roots in statistics but is also considered part of numerical analysis. Least-squares parameter estimation, also known as curve fitting, plays an important role in experimental science. In x-ray fluorescence, it is used in many calibration procedures and it forms the basis of a series of spectrum analysis techniques. In this section, an overview of the least-squares method with emphasis on spectrum analysis is given.
Based on the type of fitting function, one makes a distinction between linear and nonlinear least-squares fitting because numerical techniques of different complexity are required to solve the problem in the two cases. The linear least-squares method deals with the fitting of functions that are linear in the parameters to be estimated. For this problem, a direct algebraic solution exists. If the fitting function is not linear in one or more of the parameters, one uses nonlinear least-squares fitting and the solution can only be found iteratively. A group of linear functions of general interest is the polynomials, the straight line being the simplest case. The special case of orthogonal polynomials will also be considered. If more than one independent variable x_1i, x_2i, …, x_mi is associated with each measurement of the dependent variable y_i, one speaks of multivariate regression. Spectrum analysis using library functions (e.g., the filter-fit method) belongs to this category. If analytical functions (e.g., Gaussians) are fitted to a spectrum, the method of linear or nonlinear least squares is used, depending on whether nonlinear parameters, such as the peak position and width, are determined.
A. Linear Least Squares
Consider the problem of fitting experimental data with the following linear function:

y = a₁X₁ + a₂X₂ + ⋯ + a_mX_m   (120)

This function covers all linear least-squares problems. If m = 2, X₁ = 1, and X₂ = x, the straight-line equation y = a₁ + a₂x is obtained. For m > 2 and X_k = x^(k−1), Eq. (120) is a polynomial y = a₁ + a₂x + a₃x² + ⋯ + a_mx^(m−1) to be fitted to the experimental data points {x_i, y_i, σ_i}, i = 1, …, n. If X₁, …, X_m represent different independent variables, the case of multiple linear regression is dealt with. Because of this generality, we will discuss the linear least-squares method based on Eq. (120) in detail.
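In matrix form, the derivation that follows [Eqs. (123)–(133)] condenses to a few lines of code. A sketch with NumPy, applied to the straight-line special case of Eq. (120):

```python
import numpy as np

def linear_lsq(X, y, sigma):
    """Weighted linear least squares for y = a1*X1 + ... + am*Xm [Eq. (120)].
    X is an (n, m) design matrix, sigma the per-point uncertainties. Solves
    the normal equations and returns the parameters plus their variances
    (diagonal of the inverse curvature matrix)."""
    w = 1.0 / sigma ** 2
    alpha = (X.T * w) @ X        # curvature matrix
    beta = (X.T * w) @ y         # right-hand-side vector b
    inv = np.linalg.inv(alpha)   # "error matrix"
    a = inv @ beta
    return a, np.diag(inv)

# straight line y = a1 + a2*x: X1 = 1, X2 = x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 + 0.5 * x
X = np.column_stack([np.ones_like(x), x])
a, var = linear_lsq(X, y, np.ones_like(x))
```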
Assume a set of n experimental data points:
The minimum is found by setting the partial derivatives of χ² with respect to the parameters to zero:
∂χ²/∂a_k = −2 Σ_{i=1}^{n} (1/σ_i²) [y_i − (a₁X₁ + a₂X₂ + ⋯ + a_mX_m)] X_k = 0,   k = 1, …, m   (123)
Dropping the weights 1/σ_i² temporarily for clarity, we obtain a set of m simultaneous equations in the m unknowns a_k:
Σ y_iX₁ = a₁ Σ X₁X₁ + a₂ Σ X₂X₁ + ⋯ + a_m Σ X_mX₁
Σ y_iX₂ = a₁ Σ X₁X₂ + a₂ Σ X₂X₂ + ⋯ + a_m Σ X_mX₂
⋮                                                       (124)
Σ y_iX_m = a₁ Σ X₁X_m + a₂ Σ X₂X_m + ⋯ + a_m Σ X_mX_m
where the summations run over all experimental data points i. These equations are known as the normal equations. The solution (the values of a_k) can easily be found using matrix algebra. Because two (column) matrices are equal if their corresponding elements are equal, the set of equations can be written in matrix form as
| Σ y_iX₁  |   | a₁ Σ X₁X₁ + a₂ Σ X₂X₁ + ⋯ + a_m Σ X_mX₁  |
| Σ y_iX₂  | = | a₁ Σ X₁X₂ + a₂ Σ X₂X₂ + ⋯ + a_m Σ X_mX₂  |   (125)
|    ⋮     |   |                    ⋮                      |
| Σ y_iX_m |   | a₁ Σ X₁X_m + a₂ Σ X₂X_m + ⋯ + a_m Σ X_mX_m |
The right-hand column matrix can be written as the product of a square matrix α and a column matrix a:

| Σ y_iX₁  |   | Σ X₁X₁   Σ X₂X₁   ⋯   Σ X_mX₁  | | a₁  |
| Σ y_iX₂  | = | Σ X₁X₂   Σ X₂X₂   ⋯   Σ X_mX₂  | | a₂  |   (126)
|    ⋮     |   |    ⋮        ⋮            ⋮      | |  ⋮  |
| Σ y_iX_m |   | Σ X₁X_m  Σ X₂X_m  ⋯   Σ X_mX_m | | a_m |
or

b = αa   (127)

This equation can be solved for a by premultiplying both sides of the equation with the inverse matrix α⁻¹:

α⁻¹b = α⁻¹αa = Ia   (128)

or, I being the identity matrix,

a = α⁻¹b   (129)
Introducing the weights again, the elements of the matrices are given by

b_j = Σ_{i=1}^{n} (1/σ_i²) y_iX_j,   j = 1, …, m   (130)

α_jk = Σ_{i=1}^{n} (1/σ_i²) X_kX_j,   j = 1, …, m;  k = 1, …, m   (131)
and
a_j = Σ_{k=1}^{m} α⁻¹_jk b_k,   j = 1, …, m   (132)

where α⁻¹_jk are the elements of the inverse matrix α⁻¹.
The uncertainty in the estimate of a_j is due to the uncertainty of each measurement multiplied by the effect that measurement has on a_j:

σ²_aj = Σ_{i=1}^{n} σ_i² (∂a_j/∂y_i)²   (133)

Because α⁻¹_jk is independent of y_i, the partial derivative is simply

∂a_j/∂y_i = (1/σ_i²) Σ_{k=1}^{m} α⁻¹_jk X_k(i)   (134)
This results in the simple statement that the variance (square of the uncertainty) of a fitted parameter a_j is given by the diagonal element j of the inverse matrix α⁻¹. The off-diagonal elements are the covariances. For this reason, α⁻¹ is often called the error matrix. Similarly, α is called the curvature matrix because its elements are a measure of the curvature of the χ² hypersurface in the m-dimensional parameter space. It can easily be shown that

(1/2) ∂²χ²/(∂a_j ∂a_k) = Σ_i (1/σ_i²) X_kX_j = α_jk   (137)
If the uncertainties in the data points σ_i are unknown and are the same for all data points (σ_i = σ), these equations can still be used by setting the weights w_i = 1/σ_i² to 1. Assuming the fitting model is correct, σ can be estimated from the data:

σ_i² = σ² ≈ s² = [1/(n − m)] Σ_i [y_i − (a₁X₁ + a₂X₂ + ⋯ + a_mX_m)]²   (138)
If the uncertainties in the data points are known, the reduced χ² value can be calculated as a measure of the goodness of fit:

χ²_ν = [1/(n − m)] Σ_i (1/σ_i²) [y_i − (a₁X₁ + a₂X₂ + ⋯ + a_mX_m)]² = χ²/(n − m)   (140)

The expected value of χ²_ν is 1.0, but due to the random nature of the experimental data, values slightly smaller or greater than 1 will be observed even for a perfect fit. χ²_ν follows a chi-square distribution with ν = n − m degrees of freedom, and a 90% confidence interval is defined by
χ²(ν; P = 0.95) ≤ χ²_ν ≤ χ²(ν; P = 0.05)   (141)

where χ²(ν; P) is the (tabulated) critical value of the χ² distribution for ν degrees of freedom at a confidence level P. Observed χ²_ν values outside this interval indicate a deviation between the fit and the data that cannot be attributed to random statistical fluctuations.
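The interval of Eq. (141) can be obtained from any statistics library rather than from tables; the sketch below assumes SciPy's chi-square distribution is available:

```python
from scipy.stats import chi2

def reduced_chi2_interval(nu, level=0.90):
    """Acceptance interval for the reduced chi-square, cf. Eq. (141):
    critical values of the chi-square distribution at the two tails,
    divided by the number of degrees of freedom nu."""
    tail = (1.0 - level) / 2.0
    lo = chi2.ppf(tail, nu) / nu         # corresponds to P = 0.95
    hi = chi2.ppf(1.0 - tail, nu) / nu   # corresponds to P = 0.05
    return lo, hi

lo, hi = reduced_chi2_interval(100)
# a fit with 100 degrees of freedom whose reduced chi-square lies outside
# (lo, hi) deviates by more than statistical fluctuations
```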
The least-squares estimates of the coefficients C_j are determined by minimizing the weighted sum of squares:

χ² = Σ_{i=1}^{n} w_i (y_i − ŷ_i)²   (143)
and

b₀ = 0 and P₀(x_i) = 1   (151)

Thus, an example of a first-order orthogonal polynomial is

C₀P₀ + C₁P₁ = C₀ + C₁(x_i − α₀)   (152)

with

α₀ = Σ_{i=1}^{n} w_ix_i / Σ_{i=1}^{n} w_i   (153)
In this part, we consider the fitting of a function that is nonlinear in one or more fitting parameters. Examples of such functions are a decay curve,

y(x) = a₁ exp(−a₂x)   (154)

or a Gaussian on a linear background,

y(x) = a₁ + a₂x + a₃ exp[−(x − a₄)² / (2a₅²)]   (155)
whose minimum will be reached when the partial derivatives with respect to the parameters are zero; however, this will result in a set of m equations for which no general solution exists. The other approach to the problem is then to consider χ² as a continuous function of the parameters a_j (i.e., χ² will take a certain value for each set of values of the parameters a_j for a given dataset {x_i, y_i, σ_i²}). χ² thus forms a hypersurface in the m-dimensional space formed by the fitting parameters a_j. This surface must be searched to locate the minimum of χ². Once found, the corresponding coordinate values of the axes are the optimum values of the fitting parameters.
The problem of nonlinear least-squares fitting is thus reduced to the problem of finding the minimum of a function in an m-dimensional space. Any algorithm that performs this task should operate according to the following scheme:
1. Given some initial set of values for the parameters a_ini, evaluate χ²: χ²_old = χ²(a_ini).
2. Find a new set of values a_new such that χ²_new < χ²_old.
3. Test the minimum of the χ² value: if χ²_new is the (true) minimum, accept a_new as the optimum values of the fit; else set χ²_old = χ²_new and repeat Step 2.
From the scheme, the iterative nature of the nonlinear least-squares-fitting methods becomes evident. Moreover, it shows some other important aspects of the method: initial values are required to start the search, we need a procedure to obtain a new set of parameters which preferably are such that χ² is decreasing, and we need to be sure that the true minimum, not some local minimum, is finally reached.
A variety of algorithms has been proposed, ranging from brute-force mapping procedures, dividing the m-dimensional parameter space in small cells and evaluating χ² in each point, to more subtle simplex search procedures (Fiori et al., 1981). The most
important group of algorithms is nevertheless based on the evaluation of the curvature matrix. The gradient method and the first-order expansion will be discussed briefly, as they form the basis of the most widely used Levenberg–Marquardt algorithm (Marquardt, 1963; Bevington and Robinson, 1992; Press et al., 1988).
a. The Gradient Method
Having a fitting function y = y(x; a) and χ² defined as a function of the m parameters a_j,

χ² = χ²(a) = Σ_{i=1}^{n} (1/σ_i²) [y_i − y(x_i; a)]²   (163)
where j is the unit vector along the axis j and the components of the gradient are given by

∂χ²/∂a_j = −2 Σ_i (1/σ_i²) [y_i − y(x_i; a)] ∂y/∂a_j   (165)

It is convenient to define

β_j = −(1/2) ∂χ²/∂a_j   (166)
The gradient gives the direction in which χ² increases most rapidly. A method to locate the minimum can thus be developed on this basis. Given the current set of parameters a_j, a new set of parameters a′_j is calculated (for all j simultaneously):

a′_j = a_j + Δa_j β_j   (167)

which follows the direction of steepest descent and guarantees a decrease of χ² (at least if appropriate step sizes Δa_j are taken).
The gradient method works quite well away from the minimum, but near the
minimum, the gradient becomes very small (at the minimum, even zero). Fortunately, the
method discussed next behaves in the opposite way.
b. First-Order Expansion
If we write the fitting function y(x_i; a) as a first-order Taylor expansion in the parameters a_j around y₀,

y(x; a) = y₀(x; a) + Σ_j [∂y₀(x; a)/∂a_j] δa_j   (168)

we obtain an approximation to the fitting function which is linear in the parameter increments δa_j. y₀(x; a) is the value of the fitting function for some initial set of parameters a.
Using this function, we can now express χ² as

χ² = Σ_i (1/σ_i²) { y_i − y₀(x_i; a) − Σ_j [∂y₀(x_i; a)/∂a_j] δa_j }²   (169)
and we can use the method of linear least squares to find the parameters δa_j so that χ² will be minimal. We are thus fitting the difference y′_i = y_i − y₀(x_i; a) with the derivatives as
variables and the increments δa_j as unknowns. With reference to the section on linear least-squares fitting [Eq. (122)],

X_j = ∂y₀(x_i)/∂a_j   (170)
and [Eqs. (130) and (131)]

β_j = Σ_{i=1}^{n} (1/σ_i²) [y_i − y₀(x_i)] ∂y₀(x_i)/∂a_j   (171)

α_jk = Σ_{i=1}^{n} (1/σ_i²) [∂y₀(x_i)/∂a_j] [∂y₀(x_i)/∂a_k]   (172)
At the minimum, the partial derivative of χ² with respect to the parameter a_k will be zero:

∂χ²/∂a_k = ∂χ²₀/∂a_k + Σ_j [∂²χ²₀/(∂a_j ∂a_k)] δa_j = 0   (179)
α′_jk = α_jk(1 + λ),   j = k
α′_jk = α_jk,   j ≠ k   (182)

where α_jk is given by Eq. (172), and the matrix equation to be solved for the increments δa_j is

β_j = Σ_k α′_jk δa_k   (183)
When λ is very large (λ ≫ 1), the diagonal elements of α dominate and Eq. (183) reduces to

β_j = α′_jj δa_j   (184)

or

δa_j = (1/α′_jj) β_j = −(1/α′_jj) (1/2) ∂χ²/∂a_j   (185)

which is the gradient, scaled by a factor α′_jj. On the other hand, for small values of λ (λ ≪ 1), the solution is very close to the first-order expansion.
The algorithm proceeds as follows:
1. Given some initial values of the parameters a_j, evaluate χ² = χ²(a) and initialize λ = 0.0001.
2. Compute the β and α matrices using Eqs. (171) and (172).
3. Modify the diagonal elements, α′_jj = α_jj(1 + λ), and compute δa.
4. If χ²(a + δa) ≥ χ²(a), increase λ by a factor of 10 and repeat Step 3; if χ²(a + δa) < χ²(a), decrease λ by a factor of 10, accept the new parameter estimates a = a + δa, and repeat Step 2.
The algorithm thus performs two loops: the inner loop incrementing λ until χ² starts to decrease and the outer loop calculating successively better approximations to the optimum values of the parameters. The outer loop can be stopped when χ² decreases by a negligible absolute or relative amount.
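The four steps above can be sketched directly in code. The following compact Python implementation (a sketch, not the chapter's FORTRAN; the decay curve of Eq. (154) serves as the test function) builds β and α from Eqs. (171) and (172) and adapts λ exactly as described:

```python
import numpy as np

def marquardt(f, jac, x, y, sigma, a, lam=1e-4, max_iter=50, tol=1e-8):
    """Marquardt loop: inner loop raises lambda by 10 until chi-square
    decreases; outer loop accepts the step and lowers lambda by 10."""
    def chi2(p):
        return np.sum(((y - f(x, p)) / sigma) ** 2)

    current = chi2(a)
    for _ in range(max_iter):
        J = jac(x, a)                       # (n, m) matrix of dy0/da_j
        r = (y - f(x, a)) / sigma ** 2
        beta = J.T @ r                      # Eq. (171)
        alpha = (J.T / sigma ** 2) @ J      # Eq. (172)
        while True:
            ap = alpha + lam * np.diag(np.diag(alpha))  # alpha'_jj = alpha_jj(1+lam)
            da = np.linalg.solve(ap, beta)              # Eq. (183)
            trial = chi2(a + da)
            if trial < current:
                a, lam = a + da, lam / 10.0             # accept, relax lambda
                break
            lam *= 10.0                                 # chi-square did not decrease
            if lam > 1e10:
                return a                                # no further improvement
        if current - trial < tol * max(current, 1.0):
            return a
        current = trial
    return a

# decay curve y = a1*exp(-a2*x), Eq. (154)
f = lambda x, p: p[0] * np.exp(-p[1] * x)
jac = lambda x, p: np.column_stack([np.exp(-p[1] * x),
                                    -p[0] * x * np.exp(-p[1] * x)])
x = np.linspace(0.0, 4.0, 40)
y = f(x, np.array([10.0, 0.7]))
a_fit = marquardt(f, jac, x, y, np.ones_like(x), np.array([5.0, 1.5]))
```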
Once the minimum is reached, the diagonal elements of the inverse matrix α⁻¹ are an estimate of the uncertainty in the fitting parameters, just as in the case of linear least squares:

σ²_aj = (α⁻¹)_jj   (186)

which is equal to (α′_jj)⁻¹, provided the scaling factor λ is much smaller than 1.
In Sec. X, a number of computer programs for linear and nonlinear least-squares fitting are given. Further information can be found in many textbooks (Press et al., 1988). The book by Bevington and Robinson (1992) contains a very clear and practical discussion of the least-squares method.
In this section, a number of computer routines related to spectrum evaluation are listed. The calculation routines are written in FORTRAN. Some example programs, calling these FORTRAN routines, are written in C. The programs were tested using Microsoft FORTRAN version 4.0A and C version 5.1. Most of the routines are written for clarity rather than optimized for speed or minimum space requirements.
A. Smoothing
1. Savitsky and Golay Polynomial Smoothing
The subroutine SGSMTH calculates a smoothed spectrum using a second-degree polynomial filter (see Sec. III.B.2).
For each channel i in the spectrum, two windows, one on each side of the channel of width
f6FWHM(Ei ) channels are considered. In both windows, the channel contents are
summed, yielding a left sum L and a right sum R. Both windows are subsequently reduced
in width until either the total sum S L yi R falls below some constant minimum M
or until two conditions are met:
p
1. S is less than a cuto value N A yi , with A a constant.
2. The slope R 1=L 1 lies between 1=r and r, with r a constant.
The minimum constant M sets the base degree of smoothing in a region of vanishing
counts. The first condition ensures that smoothing is confined to the low-statistics regions
of the spectrum; the second condition avoids the incorporation of the edges of the peaks in
the averaging.
When the above conditions are satisfied, the average S/(2f × FWHM + 1) is adopted
as the smoothed channel count. The following parameters were found to yield good results
when treating PIXE spectra: f = 1.5, A = 75, M = 10, and r = 1.3.
B. Peak Search
The subroutine LOCPEAKS locates peaks in a spectrum using the positive part of a top-hat
filter (see Sec. III.C).
Input: Y Spectrum
NCHAN Number of channels in the spectrum
R Peak search sensitivity factor, typically 2 to 4
IWID Width of the filter, approx. equal to the FWHM of the peaks
MAXP Maximum number of peaks to locate (size of array IPOS)
Output: NPEAK Number of peaks found
IPOS Peak positions (channel number)
C. Continuum Estimation
1. Peak Stripping
The subroutine SNIPBG, a variant of the SNIP algorithm, calculates the continuum via peak
stripping (see Sec. IV.A).
INPUT: Y Spectrum
NCHAN Number of channels in the spectrum
ICH1,ICH2 First and last channels of region to calculate the continuum
FWHM Width parameter for smoothing and stripping algorithm, set it
to average FWHM of peaks in the spectrum, typical value 8.0
NITER Number of iterations of SNIP algorithm, typically 24
Output: YBACK Calculated continuum in the region ICH1 to ICH2
Comment: Uses subroutine SGSMTH
The routine calls ADJWEIG to adjust the weights. Further, the subroutine ORTPOL is
used to fit the polynomial. The iteration (adjustment of the weights) stops when all coefficients
c_j change by less than one standard deviation or when the maximum number of iterations is
reached.
D. Filter-Fit Method
The C program FILFIT is a test implementation of the filter-fit method (see Sec. VI). This
program simply coordinates all input and output, allocates the required memory, and calls
two FORTRAN routines that do the actual work. The subroutine TOPHAT returns the
convolute of the spectrum with the top-hat filter or the weights (the inverse of the variance
of the filtered spectrum). The general-purpose subroutine LINREG is called to perform the
multiple linear least-squares fit. The output includes the reduced χ² value, the parameters
of the fit a_j (each an estimate of the ratio of the intensity in the analyzed spectrum to
the intensity in the standard for the considered x-ray lines), and their standard deviations.
The routine GETSPEC reads the spectral data and must be supplied by the user.
#include <stdio.h>
#include <malloc.h>
#include <float.h>
#include <math.h>
void fortran TOPHAT( );
void fortran LINREG( );
float spec[2048];
main()
{
    int nchan, first_ch_fit, last_ch_fit, width, ierr;
    int i, first_ch_ref, last_ch_ref, ref, num_ref, num_points;
    int filter_mode = 0, weight_mode = 1, ioff;
    float meas_time, ref_meas_time, *scale_fac;
    float *x, *xp, *y, *w, *yfit, *a, *sa, chi;
    double *beta, *alpha;
    char filename[64];
    // input width of top-hat filter
    scanf("%d", &width);
    // input spectrum to fit and fitting region
    scanf("%s", filename);
    scanf("%d %d", &first_ch_fit, &last_ch_fit);
    nchan = Getspec(spec, filename, &meas_time);
    num_points = last_ch_fit - first_ch_fit + 1;
    // filter spectrum and store in y[]
    y = (float *)calloc(num_points, sizeof(float));
    TOPHAT(spec, y, &nchan, &first_ch_fit, &last_ch_fit, &width, &filter_mode);
    // calculate weights of fit and save in w[]
    w = (float *)calloc(num_points, sizeof(float));
    TOPHAT(spec, w, &nchan, &first_ch_fit, &last_ch_fit, &width, &weight_mode);
    // read reference spectra, filter and store in x[]
    scanf("%d", &num_ref);
    scale_fac = (float *)calloc(num_ref, sizeof(float));
    x = (float *)calloc(num_points*num_ref, sizeof(float));
    for (ref = 0; ref < num_ref; ref++) {
        scanf("%s", filename);
        nchan = Getspec(spec, filename, &ref_meas_time);
        scale_fac[ref] = ref_meas_time/meas_time;
        scanf("%d %d", &first_ch_ref, &last_ch_ref);
        if (first_ch_ref < first_ch_fit) first_ch_ref = first_ch_fit;
        if (last_ch_ref > last_ch_fit) last_ch_ref = last_ch_fit;
        ioff = ref*num_points + first_ch_ref - first_ch_fit;
        xp = x + ioff;
        TOPHAT(spec, xp, &nchan, &first_ch_ref, &last_ch_ref, &width, &filter_mode);
    }
// Program NLRFIT
#include <stdio.h>
#include <malloc.h>
#include <float.h>
#include <math.h>
#define MAX_PEAKS 10
#define MAX_CHAN 1024
void fortran MARQFIT();
float spec [MAX_CHAN];
// Fortran common block structure COMMON/FITFUN/NB, NP
struct common_block {short NB, NP;};
extern struct common_block fortran FITFUN;
main( )
{
char specfile[64];
int nchan, first_ch_fit, last_ch_fit, nb, np;
int i, j, n, num_points, num_param, ierr, max_iter;
float ini_pos[MAX_PEAKS], ini_wid[MAX_PEAKS];
float *x, *xp, *y, *w, *yfit, *a, *sa, chi, lamda, crit_dif;
float *b, *beta, *deriv, *alpha;
double *work;
// Input of parameters and spectral data
    scanf("%s", specfile);
    scanf("%d %d %d %f", &first_ch_fit, &last_ch_fit, &max_iter, &crit_dif);
    scanf("%d %d", &np, &nb);
    for (i = 0; i < np; i++)
        scanf("%f %f", &ini_pos[i], &ini_wid[i]);
Input: ISEED Set to any negative number to initialize the random sequence
Output: NRAND Normally distributed random deviate with zero mean and unit
variance
G. Least-Squares Procedures
1. Linear Regression
Subroutine LINREG is a general-purpose (multiple) linear regression routine (see Sec. IX.A).
3. Nonlinear Regression
The subroutine MARQFIT performs nonlinear least-squares fitting according to the
Marquardt algorithm (see Sec. IX.C).
The routine requires two user-supplied subroutines: FITFUNC to evaluate the fitting
function y(i) with the current set of parameters a, and DERFUNC to calculate the
derivatives of the fitting function with respect to the parameters.
SUBROUTINE MARQFIT (IERR, CHISQR, FLAMDA, CRIDIF, MAXITER,
> X, Y, W, YFIT, NPTS, A, SA, NTERMS,
> B, BETA, DERIV, ALFA, ARR)
INTEGER*2 IERR, NPTS, NTERMS
REAL*4 CHISQR, FLAMDA, CRIDIF
REAL*4 X(NPTS), Y(NPTS), W(NPTS), YFIT(NPTS)
REAL*4 A(NTERMS), SA(NTERMS)
REAL*4 B(1), BETA(1), DERIV(1), ALFA(1)
REAL*8 ARR(1)
PARAMETER (FLAMMAX = 1E4, FLAMMIN = 1E-6)
4. Matrix Inversion
Subroutine LMINV is a general-purpose routine to invert a symmetric matrix.
Input: ARR Upper triangle and diagonal of a real symmetric matrix stored in a
linear array, size N(N+1)/2
N Order of the matrix (number of columns)
Output: IERR Error status: IERR = 0, inverse obtained; IERR = 1, singular
matrix
REFERENCES