
4

Spectrum Evaluation

Piet Van Espen


University of Antwerp, Antwerp, Belgium

I. INTRODUCTION

This chapter deals with (mathematical) procedures to extract relevant information from
acquired x-ray spectra. Smoothing of the spectrum results in a graph that can be more easily
interpreted by the human observer. To determine which elements are present in a specimen,
peak search methods are used. To obtain the analytically important net peak areas of
the fluorescence lines, a variety of methods, ranging from simple summation to sophisticated
least-squares-fitting procedures, are at the disposal of the spectroscopist.
Spectrum evaluation is a crucial step in x-ray spectrometry, as much as sample preparation
and quantification. As with any analytical technique, the performance of x-ray
fluorescence analysis is determined by the weakest step in the process. Spectrum evaluation
in energy-dispersive x-ray fluorescence analysis (EDXRF) is certainly more critical than in
wavelength-dispersive spectrometry (WDXRF) because of the relatively low resolution of
the solid-state detectors employed. The often-quoted inferior accuracy of EDXRF can, to a
large part, be attributed to errors associated with the evaluation of these spectra. As a
consequence, most of the published work in this field deals with ED spectrometry.
Although rate meters and/or strip-chart recorders have been employed in WD
spectrometry for a long time, the processing of ED spectra by means of computers has
always been more evident because of their inherent digital nature. Some of the techniques
to be discussed have their roots in γ-ray spectrometry, developed mainly in the sixties; for
others (notably the spectrum-fitting procedures), EDXRF has developed its own specialized
data-processing methodology. The availability of inexpensive and fast personal
computers, together with the implementation of mature spectrum evaluation packages on
these machines, has brought sophisticated spectrum evaluation within the reach of each
x-ray spectrometry laboratory.
In this chapter, various methods for spectrum evaluation are discussed, with emphasis
on energy-dispersive x-ray spectra. Most of the methods are relevant for x-ray fluorescence,
particle-induced x-ray emission (PIXE), and electron beam x-ray analysis [electron probe x-ray
microanalysis (EPXMA), scanning electron microscopy energy-dispersive x-ray
analysis (SEMEDX), and analytical electron microscopy (AEM)]. The principles of the
methods and their practical use are discussed. Least-squares fitting, which is of importance
not only for spectrum evaluation but also for quantification procedures, is discussed in detail
in Sec. IX. Section X presents computer implementations of the main algorithms.

Copyright 2002 Marcel Dekker, Inc.


II. FUNDAMENTAL ASPECTS

The aim of spectrum evaluation is to extract analytically relevant information from
experimental spectra. Obtaining this information is not straightforward because the spectral
data are always corrupted with measurement noise.

A. Amplitude and Energy Noise


In x-ray spectra, we can distinguish between amplitude and energy noise. Amplitude noise
is due to the statistical nature of the counting process, in which random events (the arrival
of x-ray photons at the detector) are observed during a finite time interval. For such a
process, the probability of observing N counts when the true number of counts is N_0 is
given by the Poisson distribution (Bevington and Robinson, 1992):

P(N; N_0) = (N_0^N / N!) e^(-N_0)    (1)
The number of counts in each channel of an x-ray spectrum, as well as the sum over a
number of channels, obeys this Poisson distribution. For a Poisson random variable, the
population standard deviation is equal to the square root of the true number of counts:

σ_(N_0) = √N_0    (2)

The sample standard deviation, which is an estimate of the true standard deviation,
therefore can be calculated as the square root of the observed number of counts:

s_N = √N ≈ σ_(N_0)    (3)
The statistical nature of the counting process (Poisson statistics or counting statistics)
causes the typical channel-to-channel fluctuations observed in x-ray spectra. That the
uncertainty of the data can be calculated from the data itself [Eq. (3)] is of great importance
for the spectrum evaluation methods.
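Equation (3) is what makes weighted fitting of spectra possible: the uncertainty of every channel follows from its own content. A minimal numpy sketch (the channel contents are illustrative):

```python
import numpy as np

# Hypothetical channel contents (counts) of a small spectrum region.
y = np.array([100.0, 110.0, 95.0, 400.0, 120.0])

# Eq. (3): the sample standard deviation of each channel is sqrt(y).
sigma = np.sqrt(y)

# The relative uncertainty shrinks as counts accumulate (1/sqrt(N)):
rel = sigma / y   # e.g., a 400-count channel is known to about 5%
```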
Energy noise, on the other hand, causes the characteristic x-ray lines of ED spectra
to appear much wider than their natural linewidth of about 5-10 eV. Part of this line
broadening is due to the nature of the photon-to-charge conversion process in the detector,
and part of it is associated with the electronic noise in the pulse amplification and processing
circuit, as discussed in Chapter 3. As a result, x-ray photons with energy E, which
on average correspond to a pulse height stored in channel i, from time to time give rise to
slightly higher or lower pulses, causing the x-ray events to be stored in channels above and
below i, respectively. Accordingly, characteristic x-ray lines appear as relatively broad
(140-250 eV), nearly Gaussian-shaped peaks in the spectrum. The peaks observed in
wavelength-dispersive spectra are also wider than the natural linewidth because of
imperfections in the diffraction crystal and the finite size of the collimators.

B. Information Content of a Spectrum


In the absence of these two noise contributions, spectrum evaluation would be trivial. A
spectrum would consist of a well-defined continuum on which sharp characteristic lines
were superimposed. The intensity of the continuum and the net x-ray line intensities could
be determined without error. Any remaining peak overlap (e.g., between As Kα and Pb Lα,
where the separation of 8 eV is less than the natural width of the As Kα line) could be dealt
with in an exact manner.
Unfortunately, we cannot eliminate the noise in the measurements completely. It is
possible, however, to reduce the noise in various ways. Amplitude noise (i.e., counting
statistics) can be reduced by acquiring the spectrum for a longer period of time or by using a
more intense primary beam. The effect of energy noise can be lowered by using a detector
and associated electronics of good quality and by shielding the system from external sources
of electronic noise. Although these suggestions may sound straightforward and not
appropriate in the context of spectrum evaluation, it is important to realize that once a
spectrum has been acquired, its information content remains constant. No spectrum
processing procedure, no matter how sophisticated, can produce more information than was
present originally. It is therefore much more efficient to employ optimal experimental conditions
when acquiring the data rather than to rely on mathematical techniques in an attempt to
obtain information which is not present in the first place (Statham and Nashashibi, 1988).
From this point of view, spectrum processing can be seen as any (mathematical)
procedure that transforms the information content of a measured spectrum into a form that
is more useful for our purposes (i.e., more accessible). As indicated in Figure 1, most of
the procedures that calculate this useful information require some form of additional
input. Sometimes, this extra information is intuitive and not clearly defined; in other cases,
additional information is used in the form of a mathematical model. In this respect, not the
complexity of the model but rather its ability to accurately describe the physical reality is of
relevance. When we use a procedure to estimate the net peak area of a characteristic line by
summing the appropriate channels and interpolating the continuum left and right of the
peak, some additional information is given explicitly to this spectrum processing
method in the form of peak and continuum boundaries. In addition (implicitly), a certain
mathematical model is assumed regarding the shape of the peak and the continuum.
Provided this model is correct, this relatively simple procedure returns an estimate of
the net peak area as good as one could obtain with a complicated fitting procedure.
The important distinction between simple and more sophisticated spectrum evaluation
procedures lies in their flexibility and range of applicability. A simple area estimation
based on the integration of peak and continuum regions is not generally
applicable in real-world situations in which peak overlap, curvature in the continuum, and
peak tailing occur. Procedures employing more complex models are adaptable to each
specific situation, yielding reliable peak estimates. Spectrum evaluation procedures should

Figure 1 Spectrum evaluation seen as an information process: spectrum processing requires
additional information to extract useful information from measured spectral data.



therefore be compared on the basis of the explicit and implicit assumptions that are made
in the model(s) they employ.

C. Components of an X-ray Spectrum


To evaluate an x-ray spectrum correctly, it is necessary to understand all the phenomena
that contribute to the final appearance of the spectrum. This includes the two main features,
characteristic lines and continuum, and also a number of spectral artifacts, which
become important especially in trace analysis work (Van Espen et al., 1980).

1. Characteristic Lines
The characteristic radiation of a particular x-ray line has a Lorentz distribution. Peak
proles observed with a semiconductor detector are the convolution of this Lorentz dis-
tribution with nearly Gaussian detector response function, giving rise to what is known as
Voigt prole (Wilkinson, 1971). Because the Lorentz width is of the order of only 10 eV
for elements with atomic number below 50, whereas the width of the detector response
function is of the order of 160 eV, a Gauss function is an adequate approximation of the
line prole. Only for K lines of elements such as U and Th does the Lorentz contribution
become signicant and need to be taken into account (Gunnink, 1977).
A closer inspection of the peak shape reveals some distinct tailing at the low-energy
side of the peak and a shelf extending to zero energy. This is mainly due to
incomplete charge collection caused by detector imperfections (dead layer and regions of
low electric field), as discussed in Chapter 3. The effect is most pronounced for low-energy
x-rays. For photons above 15 keV, Compton scatter in the detector also contributes
to the deviation from the Gaussian shape. The distortion caused by incomplete
charge collection has been described theoretically (Joy, 1985; Heckel and Scholz, 1987).
Various functions have been proposed to model the real peak shape more accurately
(Campbell et al., 1987).
The observed emission spectrum of an element is the result of many transitions, as
explained in Chapter 1. The resulting x-ray lines, including possible satellite lines, need to
be considered in the evaluation of an x-ray spectrum. A more detailed discussion on the
representation of the K and L spectra and the peak shape is given in Sec. VII.

2. Continuum
The continuum observed in x-ray spectra results from a variety of processes. The
continuum in electron-induced x-ray spectra is almost completely due to the retardation of the
primary electrons (bremsstrahlung). The intensity distribution of the continuum radiation
emitted by the sample is, in first approximation, given by Kramers' formula (Chapter 1).
Absorption in the detector windows and in the sample causes this continuously decreasing
function to fall off at low energies, giving rise to the typical continuum shape observed.
The attenuation of this bremsstrahlung by major elements in the sample also causes
absorption edges to appear in these spectra. Continuum modeling for electron-induced x-ray
spectra has been studied in detail by a number of authors (Statham, 1976a, 1976b;
Smith et al., 1975; Sherry and Vander Sande, 1977; Bloomfield and Love, 1985).
For particle-induced x-ray emission, a similar continuum is observed, also mainly due
to secondary-electron bremsstrahlung. Other (nuclear) processes contribute here, making
it virtually impossible to derive a real physical model for the continuum. Special absorbers
placed between the sample and detector further alter the shape of the continuum.
In x-ray fluorescence, the main source of the continuum is the coherent and incoherent
scattering of the excitation radiation by the sample. The shape can therefore become very
complex and depends both on the initial shape of the excitation spectrum and on the sample
composition. When white radiation is used for the excitation, the continuum is mainly
radiative and absorption edges can also be observed. With quasimonoenergetic excitation
(secondary target, radioisotope), the incomplete charge collection of the intense coherently
and incoherently scattered peaks is responsible for most of the continuum (see Chapter 3).
Here, too, realistic physical models for the description of the continuum are not used.
The incomplete charge collection of intense fluorescence lines in the spectrum further
complicates the continuum. The cumulative effect of the incomplete charge collection of all
lines causes the apparent continuum at lower energies to be significantly higher than
expected on the basis of the primary continuum-generating processes.

3. Escape Peaks
Escape peaks result from the escape of SiK or GeK photons from the detector after
photoelectric absorption of the impinging x-ray photon near the edge regions of the de-
tector. The energy deposited in the detector by the incoming x-ray is diminished with the
energy of the escaping SiK or GeK photon. Typical examples of the interference due to Si
escape peaks are the interference of TiKa (4.51 keV) by the FeKa escape at 4.65 keV and
the interference of FeKa by the CuKa escape.
For a Si(Li) detector, the escape peaks is expected 1.742 keV (SiKa) below the parent
peak. Experimentally, it is observed that the energy dierence is slightly but signicantly
higher, 1.750 keV (Van Espen et al., 1980). Ge escape peaks are observed 9.876 (GeKa)
and 10.984 keV (GeKb) below the parent peak. The width of the escape peaks is smaller
than the width of the parent peak and corresponds to the spectrometer resolution at the
energy of the escape peak.
The escape fraction f is defined as the number of counts in the escape peak N_e divided
by the total number of detected counts (escape + parent). Assuming normal incidence on
the detector and escape only through the front surface, the following formula can be derived
for the escape fraction (Reed and Ware, 1972):

f = N_e / (N_p + N_e) = (1/2) ω_K (1 - 1/r) [1 - (μ_K/μ_I) ln(1 + μ_I/μ_K)]    (4)

where μ_I and μ_K are the mass-attenuation coefficients of silicon for the impinging and the
Si K radiation, respectively, ω_K is the K fluorescence yield, and r is the K jump ratio of
silicon. Using 0.047 for the fluorescence yield and 10.8 for the jump ratio, the calculated
escape fraction is in very good agreement with the experimentally determined values for
impinging photons up to 15 keV (Van Espen et al., 1980). Equation (4) is also applicable
for estimating the escape fraction in Ge detectors, provided that the parameters related to
Si are substituted with those of Ge. Knowing the energy, width, and intensity of the escape
peak, corrections for its presence can be made in a straightforward manner.
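Equation (4) is straightforward to evaluate numerically. The sketch below hard-codes the silicon constants quoted above (ω_K = 0.047, r = 10.8); the mass-attenuation coefficients passed in are illustrative placeholders and would in practice be taken from attenuation tables:

```python
import math

def escape_fraction(mu_i, mu_k, omega_k=0.047, r=10.8):
    """Escape fraction after Eq. (4) (Reed and Ware, 1972).

    mu_i -- mass-attenuation coefficient of Si for the impinging radiation
    mu_k -- mass-attenuation coefficient of Si for the Si K radiation
    """
    return 0.5 * omega_k * (1.0 - 1.0 / r) * (
        1.0 - (mu_k / mu_i) * math.log(1.0 + mu_i / mu_k))

# Illustrative coefficients only (cm^2/g); real values come from tables.
f = escape_fraction(mu_i=100.0, mu_k=330.0)
```

Because μ_I grows as the impinging energy approaches the Si K edge from above, the computed fraction correctly increases toward lower photon energies.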

4. Pileup and Sum Peaks

With modern pulse-processing electronics, pileup effects are suppressed to a large
extent. With a pulse-pair resolution time of a few hundred nanoseconds or less, only true sum
peaks are observed. These sum peaks are, within a few electron volts, located at their expected
position, and they are slightly wider (about 5%) than normal peaks located at the same energy
in the spectrum (Van Espen et al., 1980). The count rate of a sum peak is given by
Ṅ_11 = τ Ṅ_1²    (5)

and

Ṅ_12 = 2τ Ṅ_1 Ṅ_2    (6)

with Ṅ_11 the count rate (counts/s) in a sum peak due to the coincidence of two x-rays with
the same energy, Ṅ_12 the count rate of a sum peak resulting from two x-rays with different
energy, and τ the pulse-pair resolution time in seconds. Sum peaks are often found when a
few large peaks at lower energy dominate the spectrum. Typical examples are PIXE
spectra of biological and geological samples. The high count rate of the K and Ca K lines
produces sum peaks that are easily observed in the high-energy region of the spectrum,
where the continuum is low. It is important to note that the intensity of sum peaks is
count-rate dependent; they can be reduced and virtually eliminated by performing the
measurement with a lower primary beam intensity. A method for correcting for the
contribution of sum peaks in least-squares fitting has been proposed by Johansson (1982)
and is discussed further in Sec. VII.
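Equations (5) and (6) can be sketched in a few lines; the count rates and pulse-pair resolution time below are illustrative:

```python
def sum_peak_rates(n1, n2, tau):
    """Expected sum-peak count rates, Eqs. (5) and (6).

    n1, n2 -- count rates (counts/s) of two fluorescence lines
    tau    -- pulse-pair resolution time (s)
    """
    n11 = tau * n1 * n1        # two photons of the same energy coincide
    n12 = 2.0 * tau * n1 * n2  # photons of two different energies coincide
    return n11, n12

# Two intense 10^4 counts/s lines and a 200 ns pulse-pair resolution time:
n11, n12 = sum_peak_rates(1e4, 1e4, 200e-9)   # 20 and 40 counts/s
```

The quadratic dependence on the input rates is why halving the primary beam intensity reduces sum peaks fourfold while true peaks drop only by half.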

5. Discrete Character of a Pulse-Height Spectrum

Another aspect of spectral data that should be mentioned is that a pulse-height spectrum
is a discrete histogram representing a continuous function. Digitization of this continuous
function, especially the Gaussian peaks, into too few channels causes considerable
systematic errors. If the peak spans less than 2.5 channels at the FWHM, the peak area
estimate, for example, obtained by summing the channel contents is largely overestimated.
This lower limit of 2.5 channels at FWHM corresponds to a spectrometer gain of
60 eV/channel for a peak width of 150 eV. In practice, 40 eV/channel or lower is
recommended; otherwise peak position and width determinations and the results of
spectrum fitting become unreliable.
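The 2.5-channel criterion amounts to a one-line check of the spectrometer gain; the numbers below simply restate the example in the text:

```python
def channels_per_fwhm(fwhm_ev, gain_ev_per_channel):
    """Number of channels spanned by a peak's full width at half-maximum."""
    return fwhm_ev / gain_ev_per_channel

# A 150 eV wide peak digitized at 60 eV/channel sits exactly at the
# 2.5-channel limit; the recommended 40 eV/channel gives a safer margin.
at_limit = channels_per_fwhm(150.0, 60.0)      # 2.5
recommended = channels_per_fwhm(150.0, 40.0)   # 3.75
```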

6. Other Artifacts
A number of other features might appear in an x-ray spectrum and can cause problems
during the spectrum evaluation. In the K x-ray spectra of elements with atomic number
between 20 and 40, one can detect a peaklike structure with a rather poorly defined
maximum and a slowly declining tail (Van Espen et al., 1979a). This structure is due to the
KLL radiative Auger transition, which is an alternative decay mode of the K vacancy.
The maximum is observed at the energy of the KLL Auger electron transition.
The intensity of the radiative Auger structure varies from approximately 1% of the Kα line
for elements below Ca to 0.1% for elements above Zn. For chlorine and lower atomic
number elements, the radiative Auger band overlaps with the Kα peak. In most analytical
applications, this effect will not cause serious problems. The structure can be considered as
part of the non-Gaussian peak tail.
The scattering of the excitation radiation in x-ray fluorescence is responsible for most
of the continuum observed in the spectrum. When characteristic lines are present in the
excitation spectrum, two peaks might be observed. The Rayleigh (coherently) scattered
peak has a position and width as expected for a normal fluorescence line. The Compton
(incoherently) scattered peak is shifted to lower energies according to the well-known
Compton formula and is much broader than a normal characteristic line at that energy.
This broader structure, resulting from scattering over a range of angles and from the Doppler effect,
is difficult to model analytically (Van Dyck and Van Grieken, 1983). The structure is
further complicated by multiple-scattering phenomena in the sample (Vincze et al., 1999).
Apart from these commonly encountered scattering processes, it is possible to detect x-ray
Raman scattering (Van Espen et al., 1979b). Again, a bandlike structure is observed, with
an energy maximum given by the incident photon energy minus the electron-binding energy.
The Raman effect is most prominently present when exciting elements with atomic number
Z − 2 to Z − 7 with the K radiation of element Z. In this case, Raman scattering occurs on L
electrons. For x-ray excitation energies between 15 and 25 keV, the Raman scattering on the
K electrons of Al to Cl can also be observed. Because of its high-energy edge, the effect may
appear as a peak in the spectrum, with possible erroneous identification as a fluorescence line.
The intensity of the Raman band increases as the incident photon energy comes closer to the
binding energy of the electron. The observed intensity can amount to as much as 10% of the
L fluorescence intensity for the elements Rh to Cs when excitation with Mo K x-rays is used.
When the excitation source is highly collimated, diffraction peaks can be observed in
the x-ray fluorescence spectrum of crystalline materials. It is often difficult to deal with
these diffraction patterns. They can interfere with the fluorescence lines or even be
misinterpreted as fluorescence lines, giving rise to false identification of elements.

III. SPECTRUM PROCESSING METHODS

Spectrum processing refers to mathematical techniques that alter the outlook of the
spectral data. This is often done, using some digital lter, to reduce the noise, to locate
peaks, or to suppress the continuum. In this section, various methods of ltering are
discussed. Because of its relation to the frequency domain, the concept of Fourier trans-
formation is introduced rst.

A. Fourier Transformation, Convolution, and Deconvolution


One can think of an x-ray spectrum as consisting of a number of components with dif-
ferent frequencies. In the spectrum shown in Figure 2, one recognizes a nearly constant
component (the continuum) as well as a component that uctuates from channel to
channel ( fast). The latter is obviously the noise due to counting statistics. Peaks, then,
must have frequency components intermediate between these two. The frequency char-
acteristics of a spectrum can be studied in the Fourier space.
For any discrete function f(x), x = 0, …, n − 1 (e.g., a pulse-height spectrum), the
discrete Fourier transform is defined as

F(u) = (1/n) Σ_(x=0)^(n-1) f(x) exp(-j2πux/n)
     = (1/n) Σ_(x=0)^(n-1) f(x) [cos(2πux/n) - j sin(2πux/n)]    (7)

with j = √(-1) and u = 0, …, n − 1. F(u) is a complex number. The real part, R(u), and the
imaginary part, I(u), represent, respectively, the amplitudes of the cosine and the sine
functions that are necessary to describe the original data. The squared modulus of F(u) is called the
power spectrum,

|F(u)|² = R²(u) + I²(u)    (8)



Figure 2 A 256-channel pulse-height spectrum (single Gaussian on a constant continuum) and
the Fourier-filtered spectrum.

and gives an idea about the dominant frequencies in the spectrum. Because there are n
different nonzero real and imaginary coefficients, no information is lost by the Fourier
transform and the inverse transformation is always possible:

f(x) = Σ_(u=0)^(n-1) F(u) exp(j2πux/n)    (9)
Figure 3 shows the power spectrum of the pulse-height distribution in Figure 2 (a single
Gaussian on a constant continuum). The frequency (inverse channel number) is defined as
u/n, with n = 256 and u = 0, …, n/2. The amplitude at zero frequency, |F(0)|², which is
equal to the average of the spectrum, is not shown. The dominating low frequencies
originate from the continuum and from the Gaussian peak, whereas the higher frequencies
are caused mainly by the counting statistics. It is clear that if we eliminate those high
frequencies, we are reducing this noise. This can be done by multiplying the Fourier
transform with a suitable function:

G(u) = F(u)H(u)    (10)

An example of such a function is a high-frequency cutoff filter:

H(u) = 1 for u ≤ u_crit
H(u) = 0 for u > u_crit    (11)

which sets the real and imaginary coefficients above a frequency u_crit to zero. If we apply
this filter to the Fourier transform of Figure 3 using u_crit = 0.05 and then apply the inverse
Fourier transformation [Eq. (9)], the result shown by the solid line in Figure 2 is obtained.
The peak shape is preserved, but most of the statistical fluctuations are eliminated.
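The cutoff filter of Eq. (11) takes only a few lines with numpy's FFT routines. The 256-channel spectrum below is synthetic (a Gaussian on a constant continuum with Poisson noise, loosely mimicking Figure 2; all peak parameters are illustrative):

```python
import numpy as np

n = 256
x = np.arange(n)
rng = np.random.default_rng(0)

# Synthetic spectrum: constant continuum plus a Gaussian peak, with
# Poisson counting noise added channel by channel.
truth = 50.0 + 400.0 * np.exp(-0.5 * ((x - 128) / 8.0) ** 2)
y = rng.poisson(truth).astype(float)

# Eq. (11): zero every Fourier coefficient above u_crit = 0.05.
F = np.fft.fft(y)
u = np.fft.fftfreq(n)              # frequencies in cycles per channel
F[np.abs(u) > 0.05] = 0.0
y_smooth = np.fft.ifft(F).real     # inverse transformation, Eq. (9)
# The channel-to-channel noise is largely gone; the total counts are
# untouched because the zero-frequency coefficient is preserved.
```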
Figure 3 Fourier power spectrum of the pulse-height distribution shown in Figure 2.

If we would cut off at even lower frequencies, peak distortions at the top and at the base of
the peak would become more pronounced.
This Fourier filtering can also be done directly in the original data space. Indeed, the
convolution theorem says that multiplication in the Fourier space is equivalent to convolution
in the original space:

G(u) = F(u)H(u) ⟺ f(x) ⊗ h(x) = g(x)    (12)

The convolute at data point x is defined as the sum of the products of the original data and
the filter centered around point x:

g(x) = f(x) ⊗ h(x) = Σ_(x′) f(x - x′) h(x′)    (13)

h(x) is called a digital filter and is the inverse Fourier transform of H(u). In
general, this convolution or filtering of a spectrum y_i with some weighting function is
expressed as

y*_i = (1/N) Σ_(j=-m)^(+m) h_j y_(i+j)    (14)

where h_j are the convolution integers and N is a suitable normalization factor. The filter
width is given by 2m + 1. Fourier filtering with the purpose of reducing or eliminating some
(high or low) frequency components in the spectrum can thus be implemented as a
convolution of the original data with a digital filter. This convolution also alters the variance
of the original data. Applying the concept of error propagation, one finds that the variance
of the convoluted data is given by
s²_(y*_i) = (1/N²) Σ_(j=-m)^(+m) h_j² y_(i+j)    (15)

when the original data follow a Poisson distribution (s²_(y_i) = y_i).
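Equation (15) is itself a convolution of the raw counts with the squared filter coefficients, so it costs no more than the smoothing pass. A sketch for symmetric filters (where the kernel reversal performed by np.convolve is harmless):

```python
import numpy as np

def filtered_variance(y, h, N):
    """Variance of convolved Poisson data, Eq. (15): (1/N^2) sum_j h_j^2 y_(i+j)."""
    h2 = np.asarray(h, dtype=float) ** 2
    return np.convolve(np.asarray(y, dtype=float), h2, mode="same") / N**2

# A 5-point moving average (all h_j = 1, N = 5) of a flat 100-count spectrum
# gives an interior variance of 5*100/25 = 20, i.e., 100/(2m+1) as expected.
var = filtered_variance(np.full(20, 100.0), [1, 1, 1, 1, 1], 5)
```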


Because the measured spectrum y(x) is itself a convolution of the original (x-ray
emission) signal f(x) with the instrument (or detector) response function h(x), it is, in
principle, possible to restore the original signal if this response function is known. This
can be accomplished by dividing the Fourier transform (FT) of the measured spectrum by the
Fourier transform of the (nearly Gaussian) response function, followed by the inverse
Fourier transform (IFT) of the resulting quotient:

y(x) --FT--> Y(u)
h(x) --FT--> H(u)    F(u) = Y(u)/H(u) --IFT--> f(x)    (16)

The fact that the detector response function changes with energy (becomes broader) and,
more importantly, the presence of noise prohibit the straightforward application of this
Fourier deconvolution technique. Indeed, in the presence of noise, the measured signal
must be represented by

y(x) = f(x) ⊗ h(x) + n(x)    (17)

and its Fourier transform by

Y(u) = F(u)H(u) + N(u)    (18)

or

F(u) = Y(u)/H(u) - N(u)/H(u)    (19)

At high frequencies, the response H(u) goes to zero while N(u) is still significant, so that
the noise is emphasized in the inverse transformation. This clearly shows that the noise
(counting statistics) is the ultimate limitation for any spectrum processing and analysis
method.
A clear introduction to Fourier transformations related to signal processing can be
found in the work of Massart et al. (1998). Algorithms for Fourier transformation and
related topics are given in the work of Press et al. (1998). Detailed discussions on Fourier
deconvolution can be found in many textbooks (Jansson, 1984; Brook and Wynne, 1988).
Fourier deconvolution in x-ray spectrometry based on maximum a posteriori or maximum
entropy principles is discussed by several authors (Schwalbe and Trussell, 1981; Nunez
et al., 1988; Gertner et al., 1989). Gertner implemented this method for the analysis of real
x-ray spectra and compared the results with those obtained by simple peak fitting. The
problem that the deconvolution algorithms are limited to systems exhibiting translational
invariance was overcome by a transformation of the spectrum, so that the resolution
becomes independent of the energy.

B. Smoothing
Because of the uncertainty √y_i on each channel content y_i, fictitious maxima can occur
both on the continuum and on the slopes of the characteristic peaks. Removal or suppression
of these fluctuations can be useful during the visual inspection of spectra (e.g., for
locating small peaks on a noisy continuum) and is used in most automatic peak search and
continuum estimation procedures. Although smoothing can be useful in qualitative analysis,
its use is not recommended as part of any quantitative spectrum evaluation.
Smoothing, although reducing the uncertainty in the data locally, redistributes the original
channel content over the neighboring channels, thus introducing distortion in the spectrum.
Accordingly, smoothing can provide a (small) improvement in the statistical precision
obtainable with simple peak integration but is of no advantage when used with least-squares-fitting
procedures in which assumptions about the peak shapes are made.

1. Moving Average
The most straightforward way of smoothing (any) fluctuating signal is to employ the
box-car or moving-average technique. Starting from a measured spectrum y, a
smoothed spectrum y* is obtained by calculating the mean channel content around each
channel i:

y*_i = [1/(2m + 1)] Σ_(j=-m)^(+m) y_(i+j)    (20)

This can be seen as a convolution [Eq. (14)] with all coefficients h_j = 1. The smoothing
effect obviously depends on the width of the filter, 2m + 1. The operation being a simple
averaging, the standard deviation of the smoothed data is reduced by a factor √(2m + 1) in
regions where y_i is nearly constant. On the other hand, such a filter introduces a considerable
amount of peak distortion. This distortion depends on the filter width-to-peak
width ratio. Figure 4 shows the peak distortion effects when a moving-average filter of
widths 9, 17, and 25 is applied to a peak with full width at half-maximum (FWHM) equal
to nine channels. Being a unit-area filter (Σ h_j / N = 1 with N = 2m + 1), it does not affect
the total counts in the peak in an appreciable way other than by rounding errors.
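Equation (20) is a convolution with a flat, unit-area kernel. A numpy sketch on a synthetic flat continuum (illustrative counts) shows the expected noise reduction:

```python
import numpy as np

def moving_average(y, m):
    """Box-car smoothing, Eq. (20): mean over a (2m + 1)-channel window."""
    w = 2 * m + 1
    return np.convolve(np.asarray(y, dtype=float), np.ones(w) / w, mode="same")

rng = np.random.default_rng(1)
y = rng.poisson(100.0, size=200).astype(float)   # flat, noisy continuum
y_smooth = moving_average(y, m=4)                # 9-point filter
# Away from the edges the scatter drops by roughly sqrt(2m + 1) = 3.
```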

Figure 4 Distortion introduced by smoothing of a peak with a moving-average filter 9, 17, and 25
points wide. The FWHM of the original peak is nine channels.



Figure 5 shows the effect on the peak height and width when applying this type of
filter with different sizes. The peak distortion is caused by the fact that in the calculation of
y*_i, the content of all neighboring channels is used with equal weight. Consequently, by
employing a nonuniform filter h, which places more weight on the central channels and
less on the channels near the edge of the filter, smoothing can be achieved with less
broadening effects.

2. Savitsky and Golay Polynomial Filters

Another way of dealing with statistical fluctuations in experimental data is by drawing a
best-fitting curve through the data points. This idea resulted in the development by
Savitsky and Golay (1964) of a general type of smoothing filter with very interesting
features. The method is based on the fact that nearly all experimental data can be modeled
by a polynomial of order r, a_0 + a_1 x + a_2 x² + ⋯ + a_r x^r, when the data are confined to a
sufficiently small interval. If we consider a number of data points around a central channel
i_0, such as y_(i_0-2), y_(i_0-1), y_(i_0), y_(i_0+1), y_(i_0+2), a least-squares fit with the function

y_i = a_0 + a_1 (i - i_0) + a_2 (i - i_0)²    (21)

can be made. Once we have determined the coefficients a_j, the value of the polynomial at
the central channel i_0 can be used as the smoothed value:

y*_(i_0) = a_0    (22)

This concept is schematically illustrated in Figure 6. By moving the central channel to the
right (from i_0 to i_0 + 1), the next smoothed channel content can be calculated by repeating
the entire procedure.

Figure 5 Percent change in peak height and width introduced by filtering with a moving-average
and a Savitsky and Golay polynomial filter as a function of the filter width-to-peak FWHM ratio,
(2m + 1)/FWHM.



Figure 6 Concept of polynomial smoothing. A parabola is fitted through the points i_0 - 3 to i_0 + 3.
The value of the parabola at i_0 is the smoothed value.

At first sight, this smoothing method would require a least-squares fit for each
channel in the spectrum. However, the fact that the x values are equidistant allows us to
formulate the problem in such a way that the polynomial coefficients can be expressed as
simple linear combinations involving only y_i values:

a_k = (1/N_k) Σ_(j=-m)^(+m) C_(k,j) y_(i+j)    (23)

This means that it is possible to implement the least-squares-tting procedure more e-


ciently as a convolution of the spectrum with a lter having appropriate weights. For this
second-order polynomial, the coecients are given by
C0; j 33m2 3m  1  5j2
24
N0 2m  12m 12m 3
Smoothing with a five-point second-degree polynomial $(2m + 1 = 5)$ thus becomes

$$y_i^* = a_0 = \frac{1}{35}\left( -3y_{i-2} + 12y_{i-1} + 17y_i + 12y_{i+1} - 3y_{i+2} \right) \qquad (25)$$
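In code, Eq. (25) is a five-term convolution. The sketch below (a simplified stand-in, not the FORTRAN routine of Sec. X; the function name and test data are invented) applies it to the interior channels of a spectrum:

```python
def savgol5(y):
    """Five-point second-degree Savitzky-Golay smooth, Eq. (25).

    The edge channels (first and last two) are returned unchanged,
    since the filter needs two neighbours on each side.
    """
    c = [-3, 12, 17, 12, -3]                 # convolution integers, N = 35
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(cj * y[i + j - 2] for j, cj in enumerate(c)) / 35.0
    return out

# A flat continuum of 100 counts is reproduced exactly:
print(savgol5([100.0] * 9)[4])   # -> 100.0
```

Because the fitted polynomial is of degree 2, any constant, linear, or quadratic stretch of data passes through the filter unchanged, which is exactly why it distorts peaks less than a moving average of the same width.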
In general, for a polynomial of degree r fitted to 2m + 1 points, this can be written as

$$y_i^* = \frac{1}{N_{rm}} \sum_{j=-m}^{m} C_{rm,j}\, y_{i+j} \qquad (26)$$

where the convolution integers $C_{rm,j}$ and the normalization factors $N_{rm}$ do not depend on
the data to be smoothed but are a function only of the polynomial degree r and of the filter
half-width m. Table 1 lists the coefficients of polynomial smoothing filters with widths
between 5 and 25 points. The coefficients for a second- and a third-degree polynomial are
identical. In comparison with the moving-average filter of the same width, polynomial
smoothing filters are less effective in removing noise but have the advantage of causing less
peak distortion. The distortion effect as a function of the filter width-to-peak width ratio is
given in Figure 5. When the filter becomes much wider than the peak, the smoothed
spectrum features oscillations near the peak boundaries, as illustrated in Figure 7.
An interesting feature of this type of filter is that it can produce not only a
smoothed spectrum but also a smoothed first and second derivative of the spectrum. If we
differentiate Eq. (21) and take the value at the center position,

$$y' = \left( \frac{dy(i)}{di} \right)_{i=i_0} = a_1 \qquad (27)$$

$$y'' = \left( \frac{d^2 y(i)}{di^2} \right)_{i=i_0} = 2a_2 \qquad (28)$$

it follows from Eq. (23) that the smoothed first and second derivative of the spectrum can
also be calculated by means of suitable convolution coefficients. For instance, for the first
derivative of a second-order polynomial, using five data points this becomes

$$y_i' = \frac{1}{10}\left( -2y_{i-2} - y_{i-1} + y_{i+1} + 2y_{i+2} \right) \qquad (29)$$

The corresponding convolution integers for the calculation of the smoothed first and
second derivative are listed in Tables 2–4. The use of the derivative spectra is illustrated in
the next section dealing with peak search methods. The FORTRAN implementation of the
Savitzky–Golay filters is given in Sec. X.
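The same convolution idea gives the smoothed first derivative of Eq. (29); a minimal sketch (again an invented stand-in, not the Sec. X FORTRAN code):

```python
def savgol5_deriv(y):
    """Five-point quadratic first-derivative filter, Eq. (29).

    The two edge channels on each side are set to zero, since the
    filter needs two neighbours on each side.
    """
    c = [-2, -1, 0, 1, 2]                    # antisymmetric integers, N = 10
    out = [0.0] * len(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(cj * y[i + j - 2] for j, cj in enumerate(c)) / 10.0
    return out

# On a linear ramp of slope 3 counts/channel the filter returns the slope:
print(savgol5_deriv([3.0 * i for i in range(9)])[4])   # -> 3.0
```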
Variations on these smoothing strategies, such as the use of variable-width filters, are
reviewed by Yule (1967). The effect of repeated smoothing on the accuracy and precision
of peak area determination is discussed by Nielson (1978). A more comprehensive treatment
of polynomial smoothing can be found in Enke and Nieman (1976) and its
references.

Table 1  Savitzky–Golay Coefficients for Second- and Third-Degree (r = 2 and r = 3)
Polynomial Smoothing Filter; $y_i^* = (1/N_{rm}) \sum_{j=-m}^{m} C_{rm,j}\, y_{i+j}$, Filter Width 2m + 1
(the coefficients are symmetric: $C_{rm,-j} = C_{rm,j}$)

m    N_rm    j=0   1    2    3    4    5    6    7    8    9   10   11   12
2      35     17   12   -3
3      21      7    6    3   -2
4     231     59   54   39   14  -21
5     429     89   84   69   44    9  -36
6     143     25   24   21   16    9    0  -11
7    1105    167  162  147  122   87   42  -13  -78
8     323     43   42   39   34   27   18    7   -6  -21
9    2261    269  264  249  224  189  144   89   24  -51 -136
10   3059    329  324  309  284  249  204  149   84    9  -76 -171
11    805     79   78   75   70   63   54   43   30   15   -2  -21  -42
12   5175    467  462  447  422  387  342  287  222  147   62  -33 -138 -253


Figure 7  Effect of the smoothing of a peak with a Savitzky–Golay polynomial filter. The
FWHM of the original peak is nine channels.

3. Low Statistics Digital Filter

Most smoothing techniques originate from signal processing and were initially introduced
in the field of γ-ray spectroscopy. Also in the PIXE community, considerable attention has
been devoted to a number of aspects of spectrum processing and evaluation. Within the
framework of continuum estimation (see later), a smoothing algorithm was developed that
removes noise from a spectrum on a selective basis (Ryan et al., 1988). The method
provides an n-point mean smoothing in regions of low statistics (few counts) while
avoiding spreading of the base of peaks and degradation of valleys between peaks.

Table 2  Savitzky–Golay Coefficients for Second-Degree (r = 2) Polynomial, First-Derivative
Filter; $y_i' = (1/N_{rm}) \sum_{j=-m}^{m} C_{rm,j}\, y_{i+j}$, Filter Width 2m + 1
(the coefficients are antisymmetric: $C_{rm,-j} = -C_{rm,j}$)

m    N_rm    j=0   1   2   3   4   5   6   7   8   9  10  11  12
2      10      0   1   2
3      28      0   1   2   3
4      60      0   1   2   3   4
5     110      0   1   2   3   4   5
6     182      0   1   2   3   4   5   6
7     280      0   1   2   3   4   5   6   7
8     408      0   1   2   3   4   5   6   7   8
9     570      0   1   2   3   4   5   6   7   8   9
10    770      0   1   2   3   4   5   6   7   8   9  10
11   1012      0   1   2   3   4   5   6   7   8   9  10  11
12   1300      0   1   2   3   4   5   6   7   8   9  10  11  12



Table 3  Savitzky–Golay Coefficients for Third-Degree (r = 3) Polynomial, First-Derivative
Filter; $y_i' = (1/N_{rm}) \sum_{j=-m}^{m} C_{rm,j}\, y_{i+j}$, Filter Width 2m + 1
(the coefficients are antisymmetric: $C_{rm,-j} = -C_{rm,j}$)

m    N_rm       j=0     1       2       3       4       5       6       7       8       9      10      11      12
2    12           0     8      -1
3    252          0    58      67     -22
4    1,188        0   126     193     142     -86
5    5,148        0   296     503     532     294    -300
6    24,024       0   832   1,489   1,796   1,578     660  -1,133
7    334,152      0 7,506  13,843  17,842  18,334  14,150   4,121 -12,922
8    23,256       0   358     673     902   1,002     930     643      98    -748
9    255,816      0 2,816   5,363   7,372   8,574   8,700   7,481   4,648     -68  -6,936
10   3,634,092    0 29,592 56,881  79,564  95,338 101,900  96,947  78,176  43,284 -10,032 -84,075
11   197,340      0 1,222   2,365   3,350   4,098   4,530   4,567   4,130   3,140   1,518    -815  -3,938
12   1,776,060    0 8,558  16,649  23,806  29,562  33,450  35,003  33,754  29,236  20,982   8,525  -8,602 -30,866



Table 4  Savitzky–Golay Coefficients for Second- and Third-Degree (r = 2 and r = 3)
Polynomial Second-Derivative Filter; $y_i'' = (1/N_{rm}) \sum_{j=-m}^{m} C_{rm,j}\, y_{i+j}$, Filter Width 2m + 1
(the coefficients are symmetric: $C_{rm,-j} = C_{rm,j}$)

m    N_rm     j=0    1    2    3    4    5    6    7    8    9   10   11   12
2        7     -2   -1    2
3       42     -4   -3    0    5
4      462    -20  -17   -8    7   28
5      429    -10   -9   -6   -1    6   15
6    1,001    -14  -13  -10   -5    2   11   22
7    6,188    -56  -53  -44  -29   -8   19   52   91
8    3,876    -24  -23  -20  -15   -8    1   12   25   40
9    6,783    -30  -29  -26  -21  -14   -5    6   19   34   51
10  33,649   -110 -107  -98  -83  -62  -35   -2   37   82  133  190
11  17,710    -44  -43  -40  -35  -28  -19   -8    5   20   37   56   77
12  26,910    -52  -51  -48  -43  -36  -27  -16   -3   12   29   48   69   92

The rather heuristic algorithm is discussed in Section X, together with the computer
implementation. The effect of the filter is illustrated in Figure 8, where it is compared with the
other smoothing methods discussed.

C. Peak Search Methods

Several methods have been developed for the automatic localization of peaks in a spectrum.
Nearly all methods follow a strategy where the original spectrum is transformed into
a form that emphasizes the peaklike structures and reduces the continuum, followed by a
decision whether these peaklike structures are statistically significant. The latter involves
some adjustable parameter(s) controlling the sensitivity of the peak search.
Although visual inspection of the spectrum still appears to be the best method, peak
search algorithms, which are heavily used in γ-ray spectrometry, may have some value in
x-ray spectrometry. Their use in energy-dispersive x-ray analysis as part of an automated
qualitative analysis procedure is hampered by the extreme peak overlap in these spectra.
More elaborate procedures involving artificial intelligence techniques have been
considered for this (Janssens and Van Espen, 1986; Janssens et al., 1988).
Peak search procedures usually involve three steps: (1) transformation of the original
spectrum so that the continuum contribution is eliminated, peaks are readily locatable, and
overlapping peaks are (partially) resolved; (2) significance test and approximate location of
the peak maximum; and (3) more accurate peak position estimation in the original spectrum.
The various peak search algorithms differ mainly in the choice of the transformation.
Some methods use the first and second smoothed derivative of the spectrum. This approach is
illustrated in Figure 9. The sign change (crossing of the x axis) of the first derivative and the
minimum of the second derivative are quite suitable to detect the peaks in the original spectrum.
Other methods employ some form of correlation technique, which is basically the
convolution of the original spectrum with a filter that approximates the shape of the peak
and, therefore, emphasizes the peak. If a zero-area correlator (filter) is used, the continuum
is at the same time effectively suppressed. The simplest and most effective correlators belong
to the group of zero-area rectangular filters. These filters have a central window with
Figure 8  Effect of various smoothing methods applied to part of an x-ray spectrum: (A) original
spectrum, (B) nine-point Savitzky–Golay filter, (C) nine-point moving average, (D) low statistics
digital filter.

constant and positive coefficients and two side lobes with constant and negative coefficients.
Convoluting an x-ray spectrum with this kind of filter yields spectra in which the continuum
is removed and peaks are easily locatable. They are similar to inverted second-derivative
spectra. An important representative of this group of filters is the top-hat filter, which has
a central window with an odd number of channels w and two side windows, each n channels
wide. The value of the filter coefficients follows from the zero-area constraint:

$$h_k = \begin{cases} -\dfrac{1}{2n}, & -n - \dfrac{w}{2} \le k < -\dfrac{w}{2} \\[4pt] \dfrac{1}{w}, & -\dfrac{w}{2} \le k \le \dfrac{w}{2} \\[4pt] -\dfrac{1}{2n}, & \dfrac{w}{2} < k \le \dfrac{w}{2} + n \end{cases} \qquad (30)$$

The filtered spectrum is obtained by the convolution of the original spectrum with this filter:

$$y_i^* = \sum_{k=-(n+w/2)}^{n+w/2} h_k\, y_{i+k} \qquad (31)$$

The effect of this filter on a typical spectrum is shown in Figure 10. The variance of the
filtered spectrum is obtained by simple error propagation:

$$\sigma_{y_i^*}^2 = \sum_{k=-(n+w/2)}^{n+w/2} h_k^2\, y_{i+k} \qquad (32)$$



Figure 9  Peak search using first and second derivatives: (A) doublet, peak width (σ) four channels,
separation eight channels, on a constant continuum. (B) First and (C) second smoothed derivative
using a five-point Savitzky–Golay filter.

If $y_i^*$ is significantly different from zero, a peak structure is found and the top of the peak
can approximately be located by searching for the maximum. Thus, i corresponds to the
position of a peak maximum in the original spectrum if

$$y_i^* > r\,\sigma_{y_i^*} \qquad (33)$$


Figure 10 Result of applying a top-hat filter. Dotted line: typical X-ray spectrum; solid line:
filtered spectrum.

and

$$y_{i-1}^* \le y_i^* > y_{i+1}^* \qquad (34)$$

In Figure 11, the positive part of the filtered spectrum (w = 9 and n = 5) and the decision
level $r\sigma_{y^*}$ for r = 1 and 4 are displayed.
If required, other peak features can be obtained from the filtered spectrum: the
distance between the two local minima is a measure of the width of the peak, and the height
at the maximum is related to the net peak area.
Because the width and height of the peaks in the filtered spectrum strongly depend
on the dimensions of the filter, it is important that its dimensions are matched to the width
of the peaks in the original spectrum. From considerations of peak detectability (signal-to-
noise ratio) and resolution (peak broadening), it follows that the optimum width of the
positive window w is equal to the FWHM of the peaks (Robertson et al., 1972). The width
of the negative side windows should be chosen as large as the curvature of the continuum
allows. A reasonable compromise between sensitivity to peak shapes and rejection of the
continuum is reached when n equals FWHM/2 to FWHM/3. Typical values for the
sensitivity factor r are between 2 and 4. Higher values result in the loss of small peaks;
lower values will cause continuum fluctuations to be interpreted as peaks.
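To make the procedure concrete, here is a small sketch of Eqs. (30)–(34): a top-hat filter with its variance, followed by the significance and local-maximum tests. The function names are invented, and the window sizes and test spectrum are illustrative choices, not values prescribed by the text (in practice w should match the peak FWHM, as discussed above):

```python
import math

def top_hat(y, w, n):
    """Zero-area top-hat filter and its variance, Eqs. (30)-(32).
    Channels too close to the spectrum edges are left at zero."""
    half, reach = w // 2, w // 2 + n
    filt, var = [0.0] * len(y), [0.0] * len(y)
    for i in range(reach, len(y) - reach):
        s = v = 0.0
        for k in range(-reach, reach + 1):
            h = 1.0 / w if abs(k) <= half else -1.0 / (2 * n)
            s += h * y[i + k]           # Eq. (31)
            v += h * h * y[i + k]       # Eq. (32)
        filt[i], var[i] = s, v
    return filt, var

def find_peaks(y, w=5, n=2, r=2.0):
    """Channels passing the significance test of Eq. (33) that are also
    local maxima of the filtered spectrum, Eq. (34)."""
    f, v = top_hat(y, w, n)
    return [i for i in range(1, len(y) - 1)
            if f[i] > r * math.sqrt(v[i]) and f[i - 1] <= f[i] > f[i + 1]]

# A Gaussian peak (sigma = 2) at channel 20 on a flat continuum of 50 counts:
spectrum = [50 + 200 * math.exp(-(i - 20) ** 2 / 8.0) for i in range(41)]
print(find_peaks(spectrum))   # -> [20]
```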
Other zero-area rectangular filters, variations of the top-hat filter, are also in use,
such as the square-wave filter with typical coefficient sequence -1, -1, 2, 2, -1, -1 (Phillips
and Marlow, 1976; McCullagh, 1982) and the symmetric square-wave filter with coefficients
-1, 1, 1, -1 (Op De Beeck and Hoste, 1975). A detailed account of the performance
of this filter is given by Op De Beeck and Hoste (1975). A method using a Gaussian
correlator function is discussed by Black (1969).
Figure 11 Peak search using the positive part of the top-hat filtered spectrum and the decision
level for one and four times the standard deviation. The original spectrum is shown at the bottom.

Once the peak top is approximately located in the filtered spectrum, a more precise
maximum can be found by fitting a parabola over a few channels around the peak. For a
well-defined peak on a low continuum (or after continuum subtraction), the channel
content near the top of the peak can be approximated by a Gaussian:

$$y_i \approx h \exp\left[ -\frac{(x_i - m)^2}{2s^2} \right] \qquad (35)$$

The logarithm of the data then is a simple polynomial:

$$\ln y_i = \left( \ln h - \frac{m^2}{2s^2} \right) + \frac{m}{s^2} x_i - \frac{1}{2s^2} x_i^2 \qquad (36)$$

If we fit $\ln y_i$ with a polynomial $a_0 + a_1 x + a_2 x^2$, where x represents the channel number,
the position of the peak m is obtained from

$$m = -\frac{a_1}{2a_2} \qquad (37)$$



with an accuracy of 0.1 channel or better if the continuum contribution is small and if the
peak is interference free [i.e., if Eq. (35) accurately describes the data]. An estimate of the
peak's width and height is obtained at the same time:

$$\mathrm{FWHM} = 2\sqrt{2 \ln 2}\, s = 2.3548 \sqrt{-\frac{1}{2a_2}} \qquad (38)$$

$$h = \exp\left( a_0 - \frac{a_1^2}{4a_2} \right) \qquad (39)$$

To obtain a reliable estimate of the parameters, only the channels within the FWHM or, at
most, within the FWTM region of the peak must be included in the fit.
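The recipe of Eqs. (36)–(39) can be sketched as follows. The function name and test values are invented; the fit is done in channel coordinates shifted to the window centre for numerical stability (the position is shifted back afterwards), and the 3×3 normal equations are solved with Cramer's rule:

```python
import math

def gaussian_top(y, i1, i2):
    """Fit ln(y) over channels i1..i2 with a parabola [Eq. (36)] and
    return the peak position, FWHM and height via Eqs. (37)-(39)."""
    xc = 0.5 * (i1 + i2)                       # centre of the fit window
    xs = [i - xc for i in range(i1, i2 + 1)]   # shifted channel numbers
    zs = [math.log(y[i]) for i in range(i1, i2 + 1)]
    # normal equations for a0 + a1 x + a2 x^2 (unweighted polynomial LSQ)
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(z * x ** k for x, z in zip(xs, zs)) for k in range(3)]
    A = [[S[0], S[1], S[2]], [S[1], S[2], S[3]], [S[2], S[3], S[4]]]

    def det3(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    D = det3(A)
    a = []
    for col in range(3):                       # Cramer's rule, column by column
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = T[r]
        a.append(det3(M) / D)
    a0, a1, a2 = a
    m = xc - a1 / (2 * a2)                     # Eq. (37), shifted back
    fwhm = 2.3548 * math.sqrt(-1.0 / (2 * a2))  # Eq. (38)
    h = math.exp(a0 - a1 ** 2 / (4 * a2))       # Eq. (39)
    return m, fwhm, h
```

For an exact Gaussian the three parameters are recovered essentially to machine precision; on real data the accuracy stated in the text applies, provided the fit window stays within the FWHM/FWTM region.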
As a somewhat simpler and faster alternative, one can find an estimate of the peak
maximum by fitting the parabola over the three top channels of the peak. If i is the peak
maximum found in the filtered spectrum, a better estimate of the maximum in the original
spectrum is found by

$$m = i + \frac{1}{2}\, \frac{y_{i-1} - y_{i+1}}{y_{i+1} + y_{i-1} - 2y_i} \qquad (40)$$

This method might be preferred for small peaks when the continuum cannot be
disregarded.
A FORTRAN implementation of a peak search algorithm is given in Section X.

IV. CONTINUUM ESTIMATION METHODS

Except for some special quantification procedures (e.g., the peak-to-background method
in electron microscopy), the relevant analytical information is found in the net peak areas
and the continuum is considered a nuisance. There are, in principle, three ways to deal with
the continuum: (1) the continuum can be suppressed or eliminated by a suitable filter;
(2) the continuum can be estimated and subtracted from the spectrum prior to the estimation
of the net peak areas; and (3) the continuum can be estimated simultaneously with
the other features in the spectrum. The first approach is discussed in Section VI, where the
continuum is removed from spectra by applying a top-hat filter followed by a linear least-
squares fit of the spectrum with a number of (also filtered) reference spectra. Least-squares
fitting (linear or nonlinear) with analytical functions (Sec. VII) allows the simultaneous
estimation of continuum and peaks, provided a suitable mathematical function can be found
for the continuum. In this section, we discuss a number of procedures that aim to estimate
the continuum independently of the other features in the spectrum. Once estimated, this
continuum can be subtracted from the original spectrum and all methods for further
processing, ranging from simple peak integration to least-squares fitting, can be applied.
Any continuum estimation procedure must fulfill two important requirements. First,
the method must be able to reliably estimate the continuum in all kinds of situations (e.g.,
small isolated peaks on a high continuum as well as in the proximity of a matrix line).
Second, to permit processing of a large number of spectra, the method needs to be nearly
free of user-adjustable parameters.
Although a number of useful continuum estimation procedures have been developed,
it must be realized that their accuracy in estimating the continuum is not optimal. In one
way or another, they rely on the difference in frequency response of the continuum
compared to other structures such as peaks, the former mainly consisting of low
frequencies (slowly varying). Because the peaks also exhibit low frequencies at the peak
boundaries, it is difficult to control the method in such a way that it correctly discriminates
between peaks and continuum. This results in either a small underestimation or overestimation
of the continuum, introducing potentially large relative errors for small peaks.
In this respect, the fitting of the continuum with analytical functions may provide more
optimal results (Vekemans et al., 1994). A considerable advantage of the methods discussed
here is that they do not assume an explicit mathematical model of the continuum.
Constructing a detailed and accurate analytical model for the continuum based on physical
theory is nearly impossible except for some simple geometries and particular excitation
conditions. Most often, some polynomial type of function must be chosen when fitting a
portion of the spectrum with analytical functions.

A. Peak Stripping

These methods are based on the removal of rapidly varying structures in a spectrum by
comparing the channel content $y_i$ with the channel content of its neighbors. Clayton et al.
(1987) proposed a method which compares the content of channel i with the mean of its
two direct neighbors:

$$m_i = \frac{y_{i-1} + y_{i+1}}{2} \qquad (41)$$

If $y_i$ is larger than $m_i$, the content of channel i is replaced by the mean $m_i$. If this transformation
is executed once for all channels, one can observe a slight reduction in the height
of the peaks while the rest of the spectrum remains virtually unchanged. By repeating this
procedure, the peaks are gradually "stripped" from the spectrum. Because the method
tends to connect local minima, it is very sensitive to local fluctuations in the continuum due
to counting statistics. This makes smoothing of the spectrum, as discussed in the previous
section, prior to the stripping process mandatory. Depending on the width of the peaks,
after typically 1000 cycles the stripping converges and a more or less smooth continuum
remains. To reduce the number of iterations, it might be advantageous to perform a log or
square-root transformation of the data prior to the stripping: $y_i' = \log(y_i + 1)$ or $y_i' = \sqrt{y_i}$.
After the stripping, the continuum shape is obtained by applying the inverse transformation.
A major disadvantage of this method is that after a number of cycles, the bases of
partially overlapping peaks are transformed into broad "humps," which take much longer
to remove than isolated peaks. The method was originally applied to PIXE spectra but
proves to be generally applicable to pulse-height spectra.
In Figure 12, this method is applied to estimate the continuum of an x-ray spectrum
in the region between 1.6 and 13.0 keV. The spectrum results from a 200-mg/cm² pellet of
a NIST SRM Bovine Liver sample excited with the white spectrum of an Rh-anode x-ray
tube filtered through a thin Rh filter (Tracor Spectrace 5000). Because of the white tube
spectrum, a considerable continuum intensity was observed, increasing quite steeply in the
region above 10 keV. To calculate the continuum, the following algorithm was used:
(1) the square root of the original spectrum was taken; (2) these data were smoothed with a
10-point Savitzky–Golay filter; (3) a number of iterations were performed applying
Eq. (41) over the region of interest; (4) the square of each data point was taken (back
transformation) to obtain the final continuum shape. In Figure 12, the continuum after
10, 100, and, finally, 500 iterations is shown.
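A bare-bones sketch of this recipe (square-root transform, repeated application of Eq. (41), back transformation; the Savitzky–Golay smoothing step and the test spectrum are respectively omitted and invented here for brevity):

```python
import math

def strip_continuum(y, iterations=500):
    """Iterative peak stripping, Eq. (41), on square-root transformed data."""
    z = [math.sqrt(v) for v in y]
    for _ in range(iterations):
        znew = z[:]
        for i in range(1, len(z) - 1):
            m = 0.5 * (z[i - 1] + z[i + 1])
            if z[i] > m:                 # channel sticks out: strip it
                znew[i] = m
        z = znew
    return [v * v for v in z]            # back transformation

# A narrow peak on a flat continuum of 100 counts is stripped away:
spectrum = [100.0] * 21
spectrum[10] = 1000.0
print(strip_continuum(spectrum, 50)[10])   # -> 100.0
```

Note that all channels are updated simultaneously from the previous pass (`znew`), so the result does not depend on the scan direction.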
As a generalization of the above-discussed method, the average of two channels a
distance w away from i can be used:
Figure 12 Continuum estimate after 10, 100, and 500 iterations obtained with simple iterative
peak stripping.

$$m_i = \frac{y_{i-w} + y_{i+w}}{2} \qquad (42)$$
Ryan et al. (1988) proposed using twice the FWHM of the spectrometer at channel i as the
value for w. They reported that only 24 passes were required to produce acceptable
continuum shapes in PIXE spectra. During the last eight cycles, w is progressively reduced
by a factor $\sqrt{2}$ to obtain a smooth continuum. To compress the dynamic range of the
spectrum, a double-log transformation of the spectrum, $\log[\log(y_i + 1) + 1]$, before the
actual stripping was proposed. In combination with the low statistics digital filter, this
procedure is called the SNIP algorithm (statistical nonlinear iterative peak clipping).
A variant of this procedure is implemented in the procedure SNIPBG given in Sec. X.
Instead of the double logarithm, we employed a square-root transformation, and a
Savitzky–Golay smoothing is performed on the square-root data. The width w is kept
constant over the entire spectrum. The value of w is also used as the width of the smoothing
filter. Using this implementation, the continuum of the above-discussed spectrum was
calculated and is represented in Figure 13. The width was set to 11 channels, approximately
corresponding to the FWHM of the peaks in the center of the spectrum, and 24 iterations
were done. Apart from delivering a smoother continuum with smaller "humps," this
method executes much faster than the original method proposed by Clayton.
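A sketch of this SNIP-like variant (clipping with a fixed width w, Eq. (42), on square-root transformed data; the Savitzky–Golay pre-smoothing of the implementation described above is again left out, and the function name and test spectrum are invented):

```python
import math

def snip_continuum(y, w=11, iterations=24):
    """SNIP-style continuum: Eq. (42) with fixed width w applied to
    square-root transformed data, then squared back."""
    z = [math.sqrt(v) for v in y]
    for _ in range(iterations):
        znew = z[:]
        for i in range(w, len(z) - w):
            m = 0.5 * (z[i - w] + z[i + w])
            if z[i] > m:
                znew[i] = m
        z = znew
    return [v * v for v in z]

# A peak of width sigma = 2 channels on a continuum of 100 counts is
# removed, because w = 11 spans well beyond the peak base:
spectrum = [100 + 500 * math.exp(-(i - 30) ** 2 / 8.0) for i in range(61)]
print(round(snip_continuum(spectrum)[30]))   # -> 100
```

Comparing channels a full width w apart (rather than direct neighbors) is what lets the method converge in tens instead of hundreds of passes.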

B. Continuum Estimation Using Orthogonal Polynomials

Another interesting continuum estimation procedure was introduced by Steenstrup (1981),
who applied the method to energy-dispersive x-ray diffraction spectra. The spectrum is
fitted using orthogonal polynomials, and the weights of the least-squares fit are iteratively
adjusted so that only channels belonging to the continuum are included in the fit.
Figure 13  Continuum estimate using statistical nonlinear iterative peak clipping (SNIP algorithm)
with 24 iterations.

The method is generally applicable to pulse-height spectra and can be implemented as an
algorithm that needs little or no control parameters.
The continuum is described by a set of polynomials up to degree m:

$$\tilde{y}_i = \sum_{j=0}^{m} c_j P_j(x_i) \qquad (43)$$

where $P_j(x_i)$ is an orthogonal polynomial of degree j. As an example, for m = 2 the function
becomes

$$\tilde{y}_i = c_0 + c_1 (x_i - a_0) + c_2 \left[ (x_i - a_1)(x_i - a_0) - b_1 \right] \qquad (44)$$
The least-squares estimates of the parameters $c_j$ are given by

$$c_j = \frac{1}{\gamma_j} \sum_{i=1}^{n} w_i\, y_i\, P_j(x_i) \qquad (45)$$

where $w_i$ are the weights of the fit and $\gamma_j = \sum_{i} w_i P_j(x_i)^2$. A detailed discussion of
orthogonal polynomial fitting
can be found in Sec. IX. Because the polynomial terms $P_j(x_i)$ are orthogonal, no matrix
inversion is required to obtain the results. Orthogonal polynomials of a much higher
degree can be fitted to the experimental data without running into problems with ill-
conditioned normal equations and oscillating terms.
The goal of this continuum estimation method is to fit the continuum with the
above-described orthogonal polynomial of degree m and to interpolate under the peaks.
This can be achieved by careful manual selection of only those data pairs $(x_i, y_i)$ that
belong to the continuum. The more elegant approach proposed by Steenstrup consists of
using all channels and automatically adjusting the weights in such a way that the continuum
contributions are emphasized. If $\tilde{y}_i$ is a polynomial approximation of the
continuum of degree m, then $y_i \approx \tilde{y}_i$ if i is a continuum channel; otherwise, $y_i > \tilde{y}_i$.
A better approximation to the continuum can then be found by choosing small weights for
the data points where $y_i > \tilde{y}_i$ and repeating the fit. The following weighting scheme is
proposed by Steenstrup:

$$w_i = \frac{1}{y_i} \quad \text{if } y_i \le \tilde{y}_i + r\sqrt{\tilde{y}_i} \qquad (46)$$

$$w_i = \frac{1}{(y_i - \tilde{y}_i)^2} \quad \text{if } y_i > \tilde{y}_i + r\sqrt{\tilde{y}_i} \qquad (47)$$

where r is an adjustable parameter. Typically, r = 2 is used. If $y_i$ is normally distributed
(which is approximately the case for $y_i > 30$ counts), $y_i \le \tilde{y}_i + 2\sqrt{\tilde{y}_i}$ holds for 97.7%
of the channels containing only continuum. A too high value of r will cause the inclusion
of the tails of the peaks in the fit. With the new weights, a new polynomial fit of degree m
is performed. The process is stopped when the new polynomial coefficients $c_j$ are within
one standard deviation of the previous ones. We have also obtained good results by
setting the weights effectively to zero when the channel content is statistically above the
fitted continuum [Eq. (47)].
The method can be made even more unsupervised by including a procedure to
automatically select the best degree of the polynomial. This can be done by fitting (as
described earlier) successive polynomials of higher degree and testing the significance of
each additional polynomial coefficient. If the coefficient $c_{m+1}$ is statistically not significantly
different from zero,

$$|c_{m+1}| < 2\sigma_{c_{m+1}} \qquad (48)$$

a polynomial of degree m is retained.
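The core of the procedure — a weighted fit with orthogonal polynomials [Eqs. (43)–(45)] and Steenstrup's weight adjustment [Eqs. (46)–(47)] — can be sketched as below. This is a simplified stand-in for the FORTRAN routine of Sec. X: names are invented, the coefficient-convergence stop and the automatic degree selection are replaced by a fixed number of passes, and the polynomials are built with the standard Forsythe three-term recurrence.

```python
import math

def ortho_fit(x, y, w, degree):
    """Weighted LSQ with orthogonal polynomials [Eqs. (43)-(45)];
    returns the fitted values at x. No matrix inversion is needed."""
    n = len(x)
    p_prev, p = [0.0] * n, [1.0] * n
    gamma = sum(wi * pi * pi for wi, pi in zip(w, p))
    beta = 0.0
    fit = [0.0] * n
    for _ in range(degree + 1):
        c = sum(wi * yi * pi for wi, yi, pi in zip(w, y, p)) / gamma  # Eq. (45)
        for i in range(n):
            fit[i] += c * p[i]
        # recurrence P_{j+1}(x) = (x - alpha_j) P_j(x) - beta_j P_{j-1}(x)
        alpha = sum(wi * xi * pi * pi for wi, xi, pi in zip(w, x, p)) / gamma
        p_next = [(x[i] - alpha) * p[i] - beta * p_prev[i] for i in range(n)]
        gamma_next = sum(wi * pi * pi for wi, pi in zip(w, p_next))
        beta = gamma_next / gamma
        p_prev, p, gamma = p, p_next, gamma_next
    return fit

def continuum(x, y, degree=2, r=2.0, passes=15):
    """Iterative weight adjustment of Eqs. (46)-(47)."""
    w = [1.0 / max(yi, 1.0) for yi in y]
    fit = ortho_fit(x, y, w, degree)
    for _ in range(passes):
        w = [1.0 / max(yi, 1.0)
             if yi <= fi + r * math.sqrt(max(fi, 1.0))       # Eq. (46)
             else 1.0 / (yi - fi) ** 2                       # Eq. (47)
             for yi, fi in zip(y, fit)]
        fit = ortho_fit(x, y, w, degree)
    return fit
```

On a linear continuum with one strong peak added, the peak channels are heavily down-weighted after a few passes and the fit settles on the underlying continuum.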
Figure 14 shows the spectrum of the bovine liver sample (same as Fig. 12) and the fitted
continuum using a fourth-, fifth-, and sixth-degree orthogonal polynomial. A value of 1.5
was chosen for r [Eqs. (46) and (47)] and the weights were set to zero for the channels not
belonging to the continuum; $w_i = 0$ in Eq. (47). Table 5 gives the data on the fits: the final
number of continuum channels retained, the number of weight adjustments performed,
and the value and standard deviation of the highest polynomial coefficient. The
total number of channels in the fitting region was 571. Because the value of the seventh-
degree polynomial coefficient is insignificant [Eq. (48)], it is concluded that a sixth-degree
orthogonal polynomial is needed to model the continuum.
Section X lists the FORTRAN code that implements the complete procedure to
estimate the continuum of a given spectrum by an mth-degree orthogonal polynomial.

V. SIMPLE NET PEAK AREA DETERMINATION

In both WDXRF and EDXRF, the concentration of the analyte is proportional to the
number of counts under the characteristic x-ray peak, corrected for the continuum. At
constant resolution, this proportionality also exists for the net peak height. In EDXRF,
preference is given to the peak area. In WDXRF, the acquisition of the entire peak profile
is very time-consuming and the count rate is usually measured only at the peak maximum.

A. Peak-Area Determination in EDXRF

The most straightforward method to obtain the net area of an isolated peak in a spectrum
consists of interpolating the continuum under the peak and summing the continuum-
corrected channel contents in a window over the peak, as shown in Figure 15.
Figure 14 Continuum estimate obtained by fitting a fourth-, fifth-, and sixth-degree orthogonal
polynomial.

The net peak area, $N_P$, of an isolated peak on a continuum is given by

$$N_P = \sum_{i=i_{P1}}^{i_{P2}} \left[ y_i - y_B(i) \right] = \sum_i y_i - \sum_i y_B(i) = N_T - N_B \qquad (49)$$

where $N_T$ and $N_B$ are the total number of counts of the spectrum and the continuum in the
integration window $i_{P1} \ldots i_{P2}$. The uncertainty of the estimated net peak area due to
counting statistics is

$$\sigma_{N_P} = \sqrt{N_T + N_B} \qquad (50)$$

The continuum $y_B(i)$ is calculated by interpolation, assuming a straight line (see Fig. 15):

$$y_B(i) = Y_{BL} + (Y_{BR} - Y_{BL}) \frac{i - i_{BL}}{i_{BR} - i_{BL}} \qquad (51)$$

Table 5  Data on the Fit of the Continuum of the Spectrum Shown in Figure 14 Using Orthogonal
Polynomials

Degree of     No. of channels    No. of weight    Highest-degree coefficient
polynomial    used               adjustments      and standard deviation
4             126                 7               (3.98 ± 0.20) × 10⁻⁶
5             199                16               (1.88 ± 0.09) × 10⁻⁷
6             267                 6               (7.66 ± 0.28) × 10⁻¹⁰
7             266                 6               (3.66 ± 2.03) × 10⁻¹³



Figure 15 Determination of the net peak area by interpolating the continuum.

$Y_{BL}$ and $Y_{BR}$ are the values of the continuum at the channels $i_{BL}$ and $i_{BR}$, left and right
of the peak, respectively. These values are best estimated by averaging over a number
of channels:

$$Y_{BL} = \frac{1}{n_{BL}} \sum_{i=i_{BL1}}^{i_{BL2}} y_i = \frac{N_{BL}}{n_{BL}} \qquad (52)$$

$$Y_{BR} = \frac{1}{n_{BR}} \sum_{i=i_{BR1}}^{i_{BR2}} y_i = \frac{N_{BR}}{n_{BR}} \qquad (53)$$

The numbers of channels in the continuum windows are $n_{BL} = i_{BL2} - i_{BL1} + 1$ and
$n_{BR} = i_{BR2} - i_{BR1} + 1$. The center positions of the continuum windows (not necessarily an
integer number!) used in Eq. (51) are $i_{BL} = (i_{BL1} + i_{BL2})/2$ and $i_{BR} = (i_{BR1} + i_{BR2})/2$.
If both continuum windows have equal width, $n_{BL} = n_{BR} = n_B/2$, and are positioned
symmetrically with respect to the peak window, $i_P - i_{BL} = i_{BR} - i_P$, a much simpler
expression is obtained for the net peak area:

$$N_P = N_T - \frac{n_P}{n_B} (N_{BL} + N_{BR}) \qquad (54)$$

where $N_{BL}$ and $N_{BR}$ are the total counts in the left and right continuum windows, respectively,
each $n_B/2$ channels wide, and $n_P$ equals the number of channels in the peak
window. Applying the principle of error propagation, the uncertainty in the net peak area
is then given by

$$\sigma_{N_P} = \sqrt{N_T + \left( \frac{n_P}{n_B} \right)^2 (N_{BL} + N_{BR})} \qquad (55)$$



From this equation, it can be seen that, in principle, the continuum should be estimated
using as many channels as possible ($n_B$) to minimize random errors due to counting
statistics. In practice, the width will be limited by curvature of the continuum and by the
presence of other peaks. Most often, the total width of the continuum windows, $n_B = n_{BL} + n_{BR}$, is
taken equal to or slightly larger than the width of the peak window.
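In code, Eqs. (54) and (55) amount to a few sums. This sketch (the function name, window tuples, and test spectrum are invented) assumes the two continuum windows are equally wide and symmetric about the peak, as the equations require:

```python
import math

def net_peak_area(y, peak, left, right):
    """Net peak area and its uncertainty, Eqs. (54)-(55).
    peak, left and right are inclusive (first, last) channel tuples."""
    NT = sum(y[peak[0]:peak[1] + 1])
    NBL = sum(y[left[0]:left[1] + 1])
    NBR = sum(y[right[0]:right[1] + 1])
    nP = peak[1] - peak[0] + 1
    nB = (left[1] - left[0] + 1) + (right[1] - right[0] + 1)
    NP = NT - nP / nB * (NBL + NBR)                      # Eq. (54)
    sNP = math.sqrt(NT + (nP / nB) ** 2 * (NBL + NBR))   # Eq. (55)
    return NP, sNP

# Flat continuum of 100 counts/channel with 200 extra counts/channel in a
# 5-channel peak window: the net area is 5 * 200 = 1000 counts.
y = [100] * 30
for i in range(12, 17):
    y[i] += 200
print(net_peak_area(y, (12, 16), (6, 10), (18, 22))[0])   # -> 1000.0
```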
The optimum width of the peak window to minimize counting statistics depends on
the peak-to-continuum ratio (Jenkins et al., 1981). For low peak-to-continuum ratios, the
peak height being only a fraction of the continuum height, the optimum width of the
integration window is 1.17 times the FWHM of the peak. If the ratio of the peak height to
the continuum height is larger than 1, a slightly wider window is optimal, although, in
practice, the improvement in precision is negligible.
The method does not deliver the total net peak area, only a fraction of it. Integrating
over 1.17 × FWHM (from $-1.378\sigma$ to $+1.378\sigma$) covers only 83% of the peak. To cover
99% of the peak, the window should be 2.19 × FWHM ($\pm 2.579\sigma$). If position or width
changes are observed from one spectrum to the other, care should be taken to define the
windows in such a way that they cover the same part of the spectrum. The wider the peak
window, the less sensitive the method is to peak shifts.
Although such a peak-area-determination method seems naively simple, if correctly
used it provides results that are as accurate and precise as the most sophisticated procedures.
The premises for its use are that the peak window should be known to be free from
interferences, that there should be no peaks in the continuum windows, and that the continuum
should be linear over the extent of the windows. A peak search procedure can, in
principle, be used to set up the windows automatically. However, its practical use is limited
by the complexity of EDXRF spectra. Moreover, the use of such automated procedures is
hazardous because no measure can be given for the presence or absence of systematic errors.
Because of these restrictions, a simple peak integration method cannot be used as a
general tool for spectrum processing. In a limited number of applications, good results can
be obtained. An evaluation of various peak integration methods is given by Hertogen et al.
(1973).

B. Net Count Rate Determination in WDXRF

In WDXRF, the count rate at the 2θ angle of the peak maximum, corrected for the
continuum, is used as the analytical signal. The continuum is estimated at a 2θ position on the
left or right side of the peak. Continuum interpolation as described in the previous section
is also possible.
If $N_P$ is the number of counts accumulated during a time interval $t_P$ at the top of the
peak and $N_B$ is the number of continuum counts accumulated during time $t_B$, then the
net count rate, I, is given by

$$I = I_P - I_B = \frac{N_P}{t_P} - \frac{N_B}{t_B} \qquad (56)$$

and the uncertainty in the net count rate due to counting statistics is given by

$$\sigma_I = \sqrt{\frac{N_P}{t_P^2} + \frac{N_B}{t_B^2}} \qquad (57)$$

Various counting strategies can be considered and the effect on the precision can be estimated
using Eq. (57) (Bertin, 1970). In an optimum fixed-time strategy, the minimum
uncertainty is obtained when, for a total measurement time $t = t_P + t_B$, $t_P$ and $t_B$ are chosen
in such a way that their ratio is equal to the square root of the peak-to-continuum ratio:

$$\frac{t_P}{t_B} = \sqrt{\frac{I_P}{I_B}} \qquad (58)$$

Under these conditions, the uncertainty in the net intensity is given by

$$\sigma_I = \frac{\sqrt{I_P} + \sqrt{I_B}}{\sqrt{t_P + t_B}} \qquad (59)$$
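As a numerical illustration of Eqs. (58) and (59) (the count rates and total time below are invented):

```python
import math

def optimum_split(IP, IB, t_total):
    """Optimum fixed-time partition, Eq. (58), and the resulting
    uncertainty in the net count rate, Eq. (59)."""
    ratio = math.sqrt(IP / IB)            # t_P / t_B
    tB = t_total / (1.0 + ratio)
    tP = t_total - tB
    sigma = (math.sqrt(IP) + math.sqrt(IB)) / math.sqrt(t_total)
    return tP, tB, sigma

# Peak rate 400 cps, continuum rate 100 cps, 300 s total: Eq. (58) says
# the peak should be measured twice as long as the continuum.
tP, tB, sigma = optimum_split(400.0, 100.0, 300.0)
print(tP, tB)   # -> 200.0 100.0
# Consistency check with Eq. (57): sigma equals sqrt(IP/tP + IB/tB).
print(abs(sigma - math.sqrt(400 / 200 + 100 / 100)) < 1e-12)   # -> True
```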

VI. LEAST-SQUARES FITTING USING REFERENCE SPECTRA

In this section, two techniques based on linear least squares are discussed. The filter-fit
method makes use of library spectra, measured or calculated spectra of pure compounds,
that are used to describe the spectra of complex samples. The other method is based on
partial least-squares (PLS) regression, a multivariate calibration technique. In this case, no
spectrum evaluation in the strict sense is performed; rather, relations between the
concentrations of the compounds in the samples and the entire spectrum are established.
In this way, quantitative analysis is possible without obtaining net peak areas of the
characteristic lines.

A. Filter-Fit Method

1. Theory

If a measured spectrum of an unknown sample can be described as a linear combination of
spectra of the pure elements constituting the sample, then the following mathematical model
can be written:

$$y_i^{\mathrm{mod}} = \sum_{j=1}^{m} a_j x_{ji} \qquad (60)$$

with $y_i^{\mathrm{mod}}$ the content of channel i in the model spectrum and $x_{ji}$ the content of channel i in
the jth reference spectrum. The coefficients $a_j$ are a measure of the contribution of the pure
reference spectra to the unknown spectrum and can be used for quantitative analysis. The
values of the coefficients $a_j$ are obtained via multiple linear least-squares fitting, minimizing
the sum of the weighted squared differences between the measured spectrum and
the model:

$$\chi^2 = \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i^2} \left[ y_i - \sum_{j=1}^{m} a_j x_{ji} \right]^2 \qquad (61)$$

where $y_i$ and $\sigma_i$ are the channel content and the uncertainty of the measured spectrum,
respectively, and $n_1$ and $n_2$ are the limits of the fitting region. A detailed discussion of the
least-squares-fitting method is given in Sec. IX.
The assumption of linear additivity [Eq. (60)] normally holds reasonably well for the
characteristic lines in the spectrum, but not for the continuum. To apply this technique,
the continuum can be removed from the unknown spectrum and from the reference
spectra, using one of the procedures described in Sec. IV, before the actual least-squares fit.
Another, frequently used, approach is to apply a digital filter to both the unknown and the
reference spectra. This variant is known as the filter-fit method (Schamber, 1977; Statham,
1978; McCarthy and Schamber, 1981) and is discussed in some detail below.
By the discrete convolution of a spectrum with a top-hat filter [Eqs. (30) and (31)], the
low-frequency component (i.e., the slowly varying continuum) is effectively suppressed, as
discussed in Sec. III. Apart from removing the slowly varying continuum, a rather severe
distortion of the peaks is also introduced. If we apply this filter to both the unknown
spectrum and the reference spectra, the nonadditive continuum is removed and the same
type of peak distortion is introduced in all spectra, allowing us to apply the method of
multiple linear least-squares fitting to the filtered spectra. Equation (61) then becomes

$$\chi^2 = \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i'^2}\left[y_i' - \sum_{j=1}^{m} a_j x_{ji}'\right]^2 \qquad (62)$$

where $y_i'$ and $x_{ji}'$ are the filtered unknown and reference spectra, respectively, and $\sigma_i'^2$ is the
variance of $y_i'$, given by

$$\sigma_i'^2 = \sum_{k} h_k^2\, y_{i+k} \qquad (63)$$
The least-squares estimate of the contribution of each reference spectrum is then given by
(see Sec. IX)

$$a_j = \sum_{k=1}^{m} \alpha_{jk}^{-1} b_k, \qquad j = 1, \ldots, m \qquad (64)$$
with

$$b_j = \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i'^2}\, y_i' x_{ji}' \qquad (65)$$

$$\alpha_{jk} = \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i'^2}\, x_{ki}' x_{ji}' \qquad (66)$$

The uncertainty in each coefficient $a_j$ is directly estimated from the error matrix:

$$\sigma_{a_j}^2 = \alpha_{jj}^{-1} \qquad (67)$$
Schamber (1977) suggested the following equation for the uncertainties, taking into ac-
count the effect of the filter:

$$\sigma_{a_j}^2 = \frac{n + w + 1}{n + w}\, \alpha_{jj}^{-1} \qquad (68)$$
where $w$ is the width of the central positive part of the filter and $n$ is the width of the
negative wings. A measure of the goodness of fit is available through the reduced $\chi^2$ value:

$$\chi_\nu^2 = \frac{1}{(n_2 - n_1 + 1) - m}\, \chi^2 \qquad (69)$$

which is the $\chi^2$ value of Eq. (62) divided by the number of points in the fit minus the
number of reference spectra. A value close to 1 indicates a good fit, meaning that the
reference spectra are capable of adequately describing the unknown spectrum.
Most of the merits and disadvantages of the filter-fit method can be deduced
directly from the mathematical derivation given in the preceding paragraphs. The most
interesting aspect of the filter-fit method is that it does not require any mathematical
model for the continuum and that, at least in principle, the shape of the peaks in the
unknown spectrum is exactly represented by the reference spectra. Reference spectra
should be acquired with good counting statistics, at least better than the unknown spec-
trum, because the least-squares method assumes that there is no error in the independent
variables $x_{ji}'$. Reference spectra can be obtained from single-element standards. Only the
portion of the spectrum that contains peaks needs to be retained as reference in the fit.
Multielement standards can be used if the peaks of each element are well separated.
The reference spectra must provide an accurate model of the peak structure present
in the unknown spectrum. This requires that reference and unknown spectra be acquired
under strictly identical spectrometer conditions. Changes in resolution and, especially,
energy shifts can cause large systematic errors. The magnitude of this error depends on the
degree of peak overlap. Peak shifts of more than a few electron volts should be avoided,
which is readily possible with modern detector electronics. If shifts are observed over long
periods of operation of the spectrometer, careful recalibration of the spectrometer is
required or, better, the reference spectra should be acquired again. Also, peak shift and
peak broadening due to differences in count rate between standards and unknown must be
avoided. Differential absorption is another problem that might influence the accuracy of
the model. Because of the difference in x-ray attenuation in the reference and the un-
known, the Kβ to Kα ratios might differ in the two spectra. This becomes especially
problematic if the Kβ line is above and the Kα line below an absorption edge of a major
element of the unknown sample. The magnitude of the error depends on the peak overlap.
Careful selection of the samples used to produce the reference spectra is therefore required.
The procedure requires that a reference spectrum be included for each element
present in the unknown. The method provides no mechanism to deal with sum peaks. Apart
from removing the continuum, the filter also has some smoothing effect on the spectrum
and causes the peak structure to be spread out over more channels. This is equivalent to
fitting a spectrum with a somewhat lower resolution than originally acquired. Therefore,
the precision and detection limits attainable with the filter-fit method are slightly worse
than optimal. The width of the filter is important in this respect. Schamber (1977) suggests
taking the width of the top of the filter equal to the FWHM resolution of the spectrometer,
$u = \mathrm{FWHM}$. The width of the wings can be taken as $v = u/2$.

2. Application
The calculation procedure is quite simple and requires the following steps. The top-hat filter
is applied to the unknown spectrum and the m reference spectra [Eqs. (30) and (31)], and the
modified uncertainties are calculated using Eq. (63). Next, the vector b of length m and the
m × m square matrix α are formed using Eqs. (65) and (66), summing over the part of the
spectrum one wants to analyze ($n_1$ to $n_2$). Only the relevant part, such as the Kα or the Kα
plus the Kβ, of the reference spectra needs to be retained; the rest of the filtered spectrum
can be set to zero. After calculating the inverse matrix $\alpha^{-1}$, the contribution of each re-
ference to the unknown and its uncertainty are calculated using Eqs. (64) and (67) or (68).
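The steps just listed can be sketched in a few lines of NumPy. This is a didactic illustration only, not the implementation of Sec. X; the filter widths, function names, and synthetic test spectra are all assumptions made for the example.

```python
import numpy as np

def top_hat(w, v):
    """Top-hat filter: a positive top of width w and two negative wings of
    width v each; the kernel sums to zero, so a locally linear continuum
    is suppressed by the convolution."""
    wing = -np.ones(v) / (2.0 * v)
    return np.concatenate([wing, np.ones(w) / w, wing])

def filter_fit(y, refs, w=5, v=2):
    """Filter-fit: convolve the unknown and the reference spectra with the
    top-hat filter, then solve the weighted normal equations [Eqs. (62)-(67)]."""
    h = top_hat(w, v)
    yf = np.convolve(y, h, mode="same")
    var = np.maximum(np.convolve(y, h**2, mode="same"), 1.0)   # Eq. (63)
    X = np.array([np.convolve(r, h, mode="same") for r in refs]).T
    alpha = X.T @ (X / var[:, None])                           # Eq. (66)
    beta = X.T @ (yf / var)                                    # Eq. (65)
    inv = np.linalg.inv(alpha)
    return inv @ beta, np.sqrt(np.diag(inv))                   # Eqs. (64), (67)

# Synthetic check: unknown = 2*ref1 + 3*ref2 + constant continuum.
chan = np.arange(200)
ref1 = 1000.0 * np.exp(-0.5 * ((chan - 60) / 4.0) ** 2)
ref2 = 1000.0 * np.exp(-0.5 * ((chan - 120) / 4.0) ** 2)
y = 2.0 * ref1 + 3.0 * ref2 + 50.0
a, sigma_a = filter_fit(y, [ref1, ref2])
```

Because the zero-sum filter removes the constant continuum, the fitted coefficients recover the contributions 2 and 3 of the two references essentially exactly in this noise-free example.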
A computer implementation of the filter-fit method is given in Sec. X. The method
was used to analyze part of an x-ray spectrum of a polished NIST SRM 1103 brass sample
(Fig. 16A). The measurements were carried out using a Mo x-ray tube and a Zr secondary
target and filter assembly. Spectra of pure metals (Fe, Ni, Cu, Zn, and Pb) were acquired
Figure 16 (A) Part of the x-ray spectrum of NIST SRM 1103 brass sample. (B) Top-hat-filtered
spectrum and result of fit using reference spectra.

under identical experimental conditions. A top-hat filter of width w = 5 was used. Table 6
shows how the spectra were divided into regions of interest to produce the reference spectra.
Because considerable x-ray attenuation is present in brass, separate references were created
for the Kα and Kβ of Cu and Zn. This was not done for Fe, Ni, and Pb because these
elements are present only as minor constituents in the brass sample. Figure 16B shows the
filtered brass spectrum and the resulting fit using the seven (filtered) reference spectra. The
region below Cu Kα is expanded 100 times, and the region above Zn Kα is expanded 10
times. As can be seen, the agreement between the filtered brass spectrum and the fit is very
good. The reduced χ² value is 8.5. This high value is probably due to small peak shifts in
the reference spectra compared to the brass spectrum.
Table 7 compares the results of the filter fit with the results obtained by nonlinear
least-squares fitting using analytical functions (see Sec. VII). Although the χ² value of the
nonlinear fit is somewhat better (2.7), one observes an excellent agreement between the two
methods for the analytically important data (i.e., intensity ratios). The uncertainties for
small peaks are slightly higher with the filter-fit method, as explained previously.
The filter-fit method is fast and relatively easy to implement. It can produce reliable
results when the spectrometer calibration can be kept constant within a few electron volts
and suitable standards for each element present in the sample are available. The method
Table 6 Data on the Reference Spectra and the Unknown Spectrum Used
in the Filter-Fit Procedure (Fig. 16)

Spectrum    Region of interest (keV)    Used as
Pure Fe     4.25-8.12                   Fe Kα + Kβ reference
Pure Ni     4.73-9.57                   Ni Kα + Kβ reference
Pure Cu     5.70-8.46                   Cu Kα reference
            8.46-9.81                   Cu Kβ reference
Pure Zn     6.59-9.09                   Zn Kα reference
            9.09-10.30                  Zn Kβ reference
Pure Pb     8.36-13.68                  Pb Lα reference
SRM 1103    5.22-11.26                  Unknown
Table 7 Comparison of Spectrum Evaluation Results Using the Filter-Fit Method and
Nonlinear Least-Squares Fitting; Ratios of the Intensity from the SRM 1103 Standard to
the Pure Element Are Given

Ratio of intensity in SRM 1103 to pure element

Element   Filter fit          Nonlinear fit         % Diff
Fe        0.0053 ± 0.0002     0.0047 ± 0.0001       11
Ni        0.0019 ± 0.0002     0.00201 ± 0.00008      6
Cu        0.546 ± 0.001       0.551 ± 0.001          0.9
Zn        0.390 ± 0.001       0.3912 ± 0.0008        0.31
Pb        0.0310 ± 0.0006     0.0308 ± 0.0003        0.65
performs well when one has to deal with a difficult-to-model continuum. If information on
trace elements and major elements is required (very large peaks next to very small ones),
the method might not be optimal. The filter-fit method is frequently used to process x-ray
spectra obtained with electron microscopes (SEM-EDX), often in combination with a
ZAF or φ(ρz) correction procedure.

B. Partial Least-Squares Regression


Spectrum evaluation as discussed in most of this chapter aims to obtain the net intensities
of the fluorescence lines. These net peak intensities are then used to determine analyte con-
centrations using one of the many empirical, semiempirical, or fundamental approaches
detailed in various other chapters. In this subsection, we discuss an approach that is
relatively new to the XRF community and is based on multivariate calibration. We will
concentrate our discussion on partial least squares (PLS), a chemometrical technique that
is extensively used in infrared and related spectrometric methods. The basic idea of the
method is to find a (multivariate linear) model that directly relates the spectral data to the
analyte concentrations, thus avoiding the explicit evaluation of the spectrum in terms of
net peak areas. The method involves a calibration step, where a large number of
standards are used to build and validate the model, and the actual analysis step, where the
model is applied to spectra of unknown samples.
1. Theory
Multivariate spectroscopic calibration attempts to predict properties of samples (con-
centrations of analytes) from measured spectral data via the following relation:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{F} \qquad (70)$$

The $\mathbf{Y}_{n \times m}$ matrix holds the concentrations of $m$ analytes in the $n$ samples. The $\mathbf{X}_{n \times p}$ matrix
represents the spectral data, with $n$ measured spectra each having $p$ x-ray intensities
(channels), and $\mathbf{F}_{n \times m}$ is the matrix of residuals, the part of the $\mathbf{Y}$ matrix not explained by the
model. The regression coefficients $\mathbf{B}$ can be calculated in several ways. The most
straightforward approach is the use of multiple linear regression (see Sec. IX). The least-
squares solution is then given by

$$\mathbf{B} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \qquad (71)$$
When the number of x variables exceeds the number of samples ($p > n$) and/or when there is
a high degree of correlation between the x variables, the least-squares solution becomes
unstable or cannot be obtained because the covariance matrix $\mathbf{X}'\mathbf{X}$ cannot be
inverted. This is exactly the case when we attempt to apply multivariate calibration in XRF.
The number of channels in the spectrum (1024 or more) largely exceeds the number of
samples, and the intensities of neighboring channels in each peak are very strongly correlated.
To overcome this problem, the method of singular value decomposition can be used. The
$\mathbf{X}$ matrix is decomposed into a number of linearly independent variables, known as principal
components. In principle, there are as many principal components as there are original vari-
ables (channels), but only the most significant [i.e., those having the highest variance (or eigenvalue)]
are retained. Using the matrix of this smaller number of principal-component scores, rather
than the entire $\mathbf{X}$ matrix in Eq. (70), is known as principal components regression (PCR). The
disadvantage of PCR is that the selection of the principal components is based on how much
variance of the $\mathbf{X}$ matrix they explain. The first few principal components may have little re-
lation with the concentrations that need to be predicted by the model.
Partial least-squares regression (PLSR) is a variant of PCR that largely overcomes
this problem. The method uses two outer relations and one inner relation. The outer
relations describe the decomposition of the $\mathbf{X}$ and $\mathbf{Y}$ matrices:

$$\mathbf{X} = \mathbf{T}\mathbf{P}' + \mathbf{E} = \sum_{a=1}^{A} \mathbf{t}_a \mathbf{p}_a' + \mathbf{E} \qquad (72)$$

$$\mathbf{Y} = \mathbf{U}\mathbf{Q}' + \mathbf{F} = \sum_{a=1}^{A} \mathbf{u}_a \mathbf{q}_a' + \mathbf{F} \qquad (73)$$

$\mathbf{T}_{n \times A}$ and $\mathbf{U}_{n \times A}$ are the score matrices, containing the values of the $A \ll p$ latent variables. $\mathbf{P}'_{A \times p}$
and $\mathbf{Q}'_{A \times m}$ are the loading matrices, describing the relation between the latent variables
($\mathbf{T}$ and $\mathbf{U}$) and the original variables ($\mathbf{X}$ and $\mathbf{Y}$). The number of latent variables $A$ in the
model is of crucial importance, and its optimum value must be found by cross-validation.
The matrices $\mathbf{E}$ and $\mathbf{F}$ contain the residuals, the parts of the original spectral data and the
concentration data, respectively, not accounted for when using $A$ latent variables. The
inner relation is written as

$$\mathbf{u}_a = b_a \mathbf{t}_a, \qquad a = 1, \ldots, A \qquad (74)$$

from which the regression coefficients $\mathbf{B}$ can be obtained. This operation can be seen as a least-
squares fit between the $\mathbf{X}$-block and $\mathbf{Y}$-block scores. The final PLS model can thus be written as
$$\mathbf{Y} = \mathbf{T}\mathbf{B}\mathbf{Q}' + \mathbf{F} \qquad (75)$$
A graphical representation of the PLS model is given in Figure 17. In the normal PLS1
algorithm, $\mathbf{Y}$ is a vector of concentrations for one element and a separate model is built
for each element. If all Y variables are predicted simultaneously, as in the case of
Eq. (70), the PLS2 algorithm is used. This method performs better when the con-
centrations in the samples are highly correlated.
The quality of the calibration model can be judged by the root mean square error of
prediction (RMSEP):

$$\mathrm{RMSEP} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2\right]^{1/2} \qquad (76)$$

where $\hat{y}_i$ is the concentration in sample $i$ predicted by the model and $y_i$ is the true concentration
in the standard. To determine the optimum number of latent variables, the RMSEP is
calculated using PLS models with different numbers of latent variables $A$. The RMSEP
values are plotted against $A$, and the value where a minimum or a plateau is reached is
taken. For small datasets, the calculation of the RMSEP is done using cross-validation.
When a large number of standards is available, they are split into a training set (ap-
proximately two-thirds of the samples) and a prediction set (approximately one-third of
the samples). The training set is used to build the PLS model, and the RMSEP is calculated
from the concentrations of the prediction set. Alternatively, for smaller calibration
sets, leave-one-out cross-validation can be used. Each sample is excluded once from the
dataset and predicted by the model built with the remaining samples. This is repeated until
all samples have been excluded once.
Geladi and Kowalski (1986) published a tutorial on PLS and its relation to other
regression techniques. A standard work on the PLS method is the book by Martens and

Figure 17 Graphical representation of the PLS algorithm.



Naes (Martens and Naes, 1989). Theoretical and statistical aspects of the method can be
found in the chemometrics literature (Manne, 1987; Lorber et al., 1987; Pratar and De Jong,
1997; Hoskuldsson, 1988).

2. Application
The PLS method is illustrated with the analysis of aqueous solutions containing Ni, Cu,
and As, with concentrations in the ranges 18-50, 0.5-45, and 5-20 g/L, respectively,
whereas Zn, Fe, and Sb were present in more or less constant amounts. Spectra were
acquired for 1000 s from 5-mL solutions using an EDXRF spectrometer equipped with a
Rh x-ray tube operating at 30 kV and a Si(Li) detector having 160 eV resolution at Mn Kα.
From the 22 samples, 10 were used to build the PLS model and the remaining ones were
used as the prediction set. Table 8 gives the composition of the samples in the calibration
set. Figure 18 shows the part of the spectrum between 5 and 14 keV of sample number 9. The
Cu Kα line is considerably interfered with by the Ni Kβ line. Absorption effects can be expected
because the Ni K absorption edge (8.33 keV) is just above the Cu Kα line (8.04 keV), and the
As K lines can cause secondary fluorescence of Cu and Ni.
A PLS1 model is built for Cu. The X matrix consists of the 451 x-ray intensities
(variables) between 5 and 14 keV (channels 250 to 700) of the 10 samples. The Y matrix
contains the Cu concentrations of those samples.
In Figure 19, the RMSEP based on leave-one-out cross-validation is plotted as a
function of the number of latent variables. A minimum error of 0.57 g/L is obtained for
three latent variables, so the PLS model with three latent variables is retained.
Figure 20 compares the true Cu concentrations with the predicted concentrations for the
calibration set. The PLS model predicts the Cu concentrations in the range from 1 to
40 g/L very accurately. The so-called regression coefficients of the PLS model show
which variables (channels) are used in the model. They are plotted in Figure 21. As could
be expected, the Cu concentrations are predicted from the channel contents corresponding
to the Cu K lines. The influence of absorption and enhancement results in a small negative
contribution from the Ni peaks and a small positive contribution from the As peaks, re-
spectively. The PLS model thus handles both the problem of spectral interference and
matrix effects. The Cu concentrations predicted by the PLS model for the test set are given in
Table 9. Except for the two samples with the highest concentrations, the Cu concentration

Table 8 Composition of Samples Used to Build the PLS Model

              Concentration (g/L)
Sample No.    Ni      Cu      Zn
1             18.0     6.7     6.1
2             18.1     5.0     6.0
3             18.5     3.5     6.1
4             18.1     8.4     5.7
5             53.6     1.8    20.1
6             53.8     0.8    19.8
7             57.3     1.1    19.3
8             56.9     1.5    20.0
9             51.4    17.7    20.9
10            17.9    41.5     5.3


Figure 18 X-ray spectrum of sample number 9 in the calibration set used to build the PLS model.

Figure 19 RMSEP values versus the number of latent variables for the prediction of the Cu
concentration.

in the unknown samples is estimated very well. The prediction of the test set is generally
somewhat worse than the prediction of the calibration set. To build an accurate model, a
large number of standards spanning the concentration range of interest for each element is
required. This is certainly the major drawback of PLS for its application in XRF,
Figure 20 Comparison of true and predicted Cu concentrations for the samples in the PLS
calibration set.

Figure 21 Regression coefficients of the PLS model for Cu, showing which variables (channels in
the x-ray spectrum) are used to predict Cu.



Table 9 Comparison of True and Predicted Cu Concentration Using the PLS Model

              Cu concentration (g/L)
Sample No.    True     Predicted    Difference
11             6.30     6.38         0.08
12             7.20     7.34         0.14
13             6.30     6.32         0.02
14             7.50     7.40        -0.10
15             6.30     6.09        -0.21
16             0.90     1.27         0.37
17             1.00     0.75        -0.25
18             1.20     1.55         0.35
19             1.80     2.11         0.31
20             2.50     2.86         0.36
21            17.60    16.51        -1.09
22            44.70    40.59        -4.11
especially when solid samples are considered. This problem can to some extent be
overcome by building a calibration set via Monte Carlo simulation. Just as with the filter-
fit method, standards and unknowns need to be measured under strictly identical spec-
trometer conditions. Changes in gain or resolution will cause systematic errors in the
calculated concentrations.
Swerts and Van Espen (1993) demonstrated the use of PLS for the determination of
S in graphite using XRF with Rh x-ray tube excitation and a Si(Li) detector. Be-
cause of diffraction effects, least-squares fitting of the spectra was nearly impossible. Using
PLS, the sulfur content could be determined in a concentration range of 2-60%, with an
accuracy of better than 5% relative standard deviation. Urbanski and Kowalska applied
the PLS method to the determination of Sr and Fe in powdered rock samples and to the
determination of the S and ash content in coal using a radioisotope XRF system equipped
with a low-resolution gas proportional counter. They also demonstrated the usefulness of
this method for the determination of the thickness and composition of Sn-Pb and Ni-Fe
layered structures (Urbanski and Kowalska, 1995). Molt and Schramm (1987) compared
principal components regression (PCR) and PLS for the determination of S, exhibiting strong
interference from Mo, in aqueous and solid samples. The results were also compared with
quantitative analysis using the method developed by Lucas-Tooth and Price (1961).
Equally good results were obtained with all three methods. Similar results were obtained
by Lemberge and Van Espen for the determination of Ni, Cu, and As in liquid samples.
They demonstrated that taking the square root of the data improves the PLS model and
that the PLS method extracts information from the scattered excitation radiation to de-
scribe the matrix effects (Lemberge and Van Espen, 1999).

VII. LEAST-SQUARES FITTING USING ANALYTICAL FUNCTIONS

A widely used and certainly the most flexible procedure for evaluating complex x-ray
spectra is based on least-squares fitting of the spectral data with an analytical function.
The method is conceptually simple, but not trivial to implement and use.
A. Concept
In this method, an algebraic function, including analytically important parameters such as
the net areas of the fluorescence lines, is used as a model for the measured spectrum. The
object function (χ²) is defined as the weighted sum of squares of the differences between
this model $y(i)$ and the measured spectrum $y_i$ over a region of the spectrum:

$$\chi^2 = \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i^2}\left[y_i - y(i; a_1, \ldots, a_m)\right]^2 \qquad (77)$$

where $\sigma_i^2$ is the variance of data point $i$, normally taken as $\sigma_i^2 = y_i$, and $a_j$ are the para-
meters of the model. The optimum values of the parameters are those for which χ² is
minimal. They can be found by setting the partial derivatives of χ² with respect to the
parameters to zero:

$$\frac{\partial \chi^2}{\partial a_j} = 0, \qquad j = 1, \ldots, m \qquad (78)$$
If the model is linear in all the parameters $a_j$, these equations result in a set of m linear
equations in the m unknowns $a_j$, which can be solved algebraically. This is known as linear
least-squares fitting. If the model is nonlinear in one or more of its parameters, a direct
solution is not possible and the optimum values of the parameters must be found itera-
tively. Initial values are given to the parameters, and they are varied in some way until a
minimum for χ² is obtained. The latter is equivalent to searching for a minimum in the
(m + 1)-dimensional χ² hypersurface. This is known as nonlinear least-squares fitting. The
selection of a suitable minimization algorithm is very important because it determines to a
large extent the performance of the method. A detailed discussion of linear and nonlinear
least-squares fitting is given in Sec. IX.
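As a small illustration of the nonlinear case, the χ² of Eq. (77) can be minimized with a Levenberg-Marquardt routine such as SciPy's `curve_fit`. The use of SciPy, the model (a single Gaussian on a linear continuum), and all starting values are assumptions made for this sketch; they do not reproduce any specific program discussed in the text.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(i, area, mu, sigma, b0, b1):
    """Single Gaussian peak (area parameterization) on a linear continuum."""
    g = area / (sigma * np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * ((i - mu) / sigma) ** 2)
    return g + b0 + b1 * i

# Simulated spectrum with Poisson counting noise.
i = np.arange(100, dtype=float)
rng = np.random.default_rng(1)
y = rng.poisson(model(i, 5000.0, 50.0, 4.0, 20.0, 0.1)).astype(float)

# Passing sigma = sqrt(y) gives each channel the weight 1/s_i^2 = 1/y_i of
# Eq. (77); for an unbounded problem, curve_fit uses Levenberg-Marquardt.
p0 = (4000.0, 48.0, 3.0, 15.0, 0.0)   # initial guesses for the iteration
popt, pcov = curve_fit(model, i, y, p0=p0,
                       sigma=np.sqrt(np.clip(y, 1.0, None)), absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))          # parameter uncertainties
```

The fitted peak area, position, and width come back close to the simulated truth, and the diagonal of the covariance matrix supplies the uncertainty estimates discussed in Sec. IX.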
The most difficult problem to solve when applying this least-squares procedure is the
construction of an analytical function that accurately describes the observed spectrum. The
model must be capable of describing accurately the spectral data in the fitting region. This
requires an appropriate model for the continuum, for the characteristic lines of the elements,
and for all other features present in the spectrum, such as absorption edges, escape peaks, and
sum peaks. Although the response function of the energy-dispersive detector is, to a very
good approximation, Gaussian, deviations from the Gaussian shape need to be taken into
account. Failure to construct an accurate model will result in systematic errors, which
under certain conditions may lead to gross positive or negative errors in the estimated
peak areas. On the other hand, the fitting function should remain simple, with as few
parameters as possible. Especially for nonlinear fitting, the location of the χ² minimum
becomes problematic when a large number of parameters is involved.
In general, the fitting model consists of two parts:

$$y(i) = y_B(i) + \sum_{P} y_P(i) \qquad (79)$$

where $y(i)$ is the calculated content of channel $i$; the first part describes the continuum and
the second part the contributions of all peaklike structures.
Because the fitting functions for both linear and nonlinear least-squares fitting have
many features in common, we treat the detailed description of the fitting function for the
most general case of nonlinear least squares. Moreover, if the least-squares fit is done
using the Marquardt algorithm, the linear least-squares fit is computationally a particular
case of the nonlinear least-squares fit. Programs based on this algorithm can perform
linear and nonlinear fitting using the same computer code. A large part of the discussion
given here is based on the computer code AXIL, developed by the author for spectrum
fitting of photon-, electron-, and particle-induced x-ray spectra (Van Espen et al., 1977a,
1977b, 1979b, 1986).

B. Description of the Continuum


To model the continuum, various analytical expressions are in use, depending on the
excitation conditions and on the width of the fitting region. Except for electron micro-
scopy, it is virtually impossible to construct an acceptable physical model that describes
the continuum, mainly because of the large number of processes that contribute to it. For
this reason, very often some type of polynomial expression is used.

1. Linear Polynomial
A linear polynomial of the type

$$y_B(i) = a_0 + a_1(E_i - E_0) + a_2(E_i - E_0)^2 + \cdots + a_k(E_i - E_0)^k \qquad (80)$$

is useful to describe the continuum over a region 2-3 keV wide. Wider regions often
exhibit too much curvature to be described by this type of polynomial. In Eq. (80), $E_i$ is
the energy (in keV) of channel $i$ [see Eq. (84)] and $E_0$ is a suitable reference energy, often
the middle of the fitting region. Expressing the polynomial as a function of $(E_i - E_0)$ rather
than as a function of the channel number is done for computational reasons; $(E_i - E_0)^3$ is,
at most, of the order of $10^3$, whereas $i^3$ can be as high as $10^9$. Most computer programs
that implement a polynomial model for the continuum allow the user to specify the degree
of the polynomial; k = 0, 1, and 2 produce, respectively, a constant, a straight line, and a
parabolic continuum. Values of k larger than 4 are rarely useful because such high-degree
polynomials tend to have physically nonrealistic oscillations. Equation (80) is linear in the
fitting parameters $a_0, \ldots, a_k$, so this function can be used in linear as well as in
nonlinear least-squares fitting.
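Because Eq. (80) is linear in its coefficients, such a continuum can be obtained with a plain linear least-squares routine. The energy axis and coefficient values below are arbitrary illustration values, not taken from any spectrum in this chapter.

```python
import numpy as np

# Hypothetical 3-keV-wide fitting region: energies in keV, E0 at its middle.
E = 5.0 + 0.01 * np.arange(300)
E0 = E.mean()

# A continuum that is exactly a second-degree polynomial in (E - E0).
y_B = 40.0 + 12.0 * (E - E0) - 3.0 * (E - E0) ** 2

# Fitting in (E - E0) keeps the powers of order 1, whereas powers of the
# channel number itself would reach ~1e9 [see the discussion of Eq. (80)].
coeffs = np.polyfit(E - E0, y_B, deg=2)   # returns [a2, a1, a0]
```

Since the data here are an exact polynomial, the fit returns the generating coefficients to machine precision; with real spectral data the same call yields the least-squares estimates of $a_0, \ldots, a_k$.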

2. Exponential Polynomial
A linear polynomial cannot be used to fit the continuum over the entire spectrum or to fit
regions of high positive or negative curvature. Higher curvature can be modeled by
functions of the type

$$y_B(i) = a_0 \exp\left[a_1(E_i - E_0) + a_2(E_i - E_0)^2 + \cdots + a_k(E_i - E_0)^k\right] \qquad (81)$$

where k is the degree of the exponential polynomial. A value of k as high as 6 might be
required for an accurate description of a continuum from 2 to 16 keV. This function is
nonlinear in the fitting parameters $a_1, \ldots, a_k$ and requires a nonlinear least-squares pro-
cedure and some initial guess of these parameters. Initial values for these nonlinear
parameters can be determined by first estimating the shape of the continuum using one of
the procedures described in Sec. IV, followed by a linear fit of the logarithm of this con-
tinuum. These initial guesses are then further refined in the nonlinear fitting procedure.

3. Bremsstrahlung Continuum
The exponential polynomial is not suitable for describing the shape of the continuum
observed in electron- and particle-induced x-ray spectra, mainly because of the high
curvature at the low-energy end of the spectrum. The continuum results from photons
emitted from the sample by the retardation of fast electrons. The shape of the emitted
continuum is essentially an exponentially decreasing function according to Kramers'
formula. At low energies, the emitted photons are strongly absorbed by the detector windows
and by the sample. A suitable function to describe such a radiative continuum is an
exponential polynomial multiplied by the absorption characteristics of the spectrometer:

$$y_B(i) = a_0 \exp\left[a_1(E_i - E_0) + \cdots + a_k(E_i - E_0)^k\right] T_a(E_i) \qquad (82)$$

A detailed discussion of the function $T_a(E)$ is given on page 288. To be physically correct,
the absorption term must be convoluted with the detector response function, because
the sharp edges due to absorption by detector windows (Si or Au) or elements present in
the sample are smeared out by the finite resolution of the detector.

4. Continuum Removal
An alternative to an algebraic description of the continuum is to estimate the continuum
first, using one of the procedures outlined in Sec. IV, and to subtract this continuum from
the measured spectrum before the actual least-squares fitting. To correctly implement the
least-squares fit after subtraction of the continuum, the weights $1/\sigma_i^2$ [Eq. (77)] must be
adjusted. If $y_i'$ represents the spectral data after subtraction of the continuum,
$y_i' = y_i - y_B(i)$, the variance of $y_i'$ is given by $\sigma_i'^2 = \sigma_i^2 + \sigma_{y_B(i)}^2$. A reasonable approximation
for $\sigma_{y_B(i)}^2$ is $y_B(i)$ itself, so that the total variance becomes $\sigma_i'^2 = y_i + y_B(i)$. If this adjust-
ment of the weights is not made, the uncertainties in the net peak areas are underestimated,
especially for small peaks on a high continuum.
It is rather difficult for an inexperienced user to select the appropriate continuum
model for a given spectrum. The following may serve as a general guideline. For fitting
regions 2-3 keV wide, a linear polynomial continuum is often adequate. To fit large re-
gions of XRF spectra, the exponential polynomial provides the most accurate results, with
k typically between 4 and 6. The same holds for the bremsstrahlung continuum of SEM-
EDX and PIXE spectra. The simplest method from the user's point of view is
continuum stripping, but this method does not provide optimum results. A slight under-
estimation or overestimation might occur, resulting in large relative errors in the area
determination of small peaks (Vekemans et al., 1995).

C. Description of Fluorescence Lines


Because the response function of most solid-state detectors is predominantly Gaussian, all
mathematical expressions used to describe the fluorescence lines involve this function.
When dealing with the K lines of high-atomic-number elements, such as Pb or U, the in-
fluence of the natural line shape becomes appreciable and the use of the more complicated
Voigt profile is required (Wilkinson, 1971; Gunnink, 1977; Pessara and Debertin, 1981).

1. Single Gaussian
A Gaussian peak is characterized by three parameters: the position, the width, and the height or
area. It is desirable to describe the peak in terms of its area rather than its height, because
the area is directly related to the number of x-ray photons detected, whereas the height
depends on the spectrometer resolution. The first approximation to the profile of a single
peak is then given by

$$\frac{A}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right] \qquad (83)$$

where A is the peak area (counts), s is the width of the Gaussian expressed in channels,
and m is the location of the peak maximum. The often-used FWHM is related to s by the
factor 2\sqrt{2 \ln 2}, or FWHM = 2.35s. In Eq. (83), the peak area is a linear parameter; the
width and position are nonlinear parameters. This implies that a nonlinear least-squares
procedure is required to find optimum values for the latter two parameters. Using a linear
least-squares method assumes that the position and width of the peak are known with high
accuracy from calibration.
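As a quick numerical check of Eq. (83), summing the area-normalized profile over the channels recovers the peak area A (a minimal Python sketch; all parameter values are arbitrary):

```python
import math

def gauss_peak(x, area, mu, s):
    """Eq. (83): Gaussian in area form; s is the width in channels."""
    return area / (s * math.sqrt(2 * math.pi)) * math.exp(-(x - mu) ** 2 / (2 * s ** 2))

# FWHM = 2*sqrt(2*ln 2)*s = 2.3548*s
fwhm = 2 * math.sqrt(2 * math.log(2)) * 4.0
total = sum(gauss_peak(i, area=10000.0, mu=100.0, s=4.0) for i in range(200))
print(round(fwhm, 4), round(total))   # 9.4193 10000
```

Because the profile is written in area form, the sum over all channels returns A directly, regardless of the spectrometer resolution s.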
To describe part of a measured spectrum, the fitting function must contain a number
of such functions, one for each peak. For 10 elements and 2 peaks (Ka and Kb) per element,
we would need to optimize 60 parameters. It is highly unlikely that such a nonlinear least-squares
fit will terminate successfully at the global minimum. To overcome this problem,
the fitting function can be written in a different way, as shown in the next subsection.

2. Energy and Resolution Calibration Function


The first obvious step is to drop the idea of optimizing the position and width of each peak
independently. In x-ray spectrometry, the energies of the fluorescence lines are known with
an accuracy of 1 eV or better. The pattern of peaks observed in the spectrum is directly
related to the elements present in the sample. Based on those elements, we can predict all x-ray
lines that constitute the spectrum and their energies. The peak function [Eq. (83)] is
therefore written in terms of energy rather than channel number. Defining ZERO as the
energy of channel 0 and expressing the spectrum GAIN in electronvolts/channel, the
energy of channel i is given by
E_i = \mathrm{ZERO} + \mathrm{GAIN} \cdot i    (84)
and the Gaussian peak can be written as
" #
GAIN Ej  Ei 2
Gi; Ej p exp  85
s 2p 2s2

with E_j the energy (in eV) of the x-ray line and s the peak width given by

s^2 = \left(\frac{\mathrm{NOISE}}{2.3548}\right)^2 + 3.85\,\mathrm{FANO}\,E_j    (86)
In this equation, NOISE is the electronic contribution to the peak width (typically
80-100 eV FWHM), with the factor 2.3548 to convert to s units, FANO is the Fano factor
(about 0.114), and 3.85 eV is the energy required to produce an electron-hole pair in silicon. The
term GAIN/(s\sqrt{2\pi}) in Eq. (85) is required to normalize the Gaussian so that the sum over all
channels is unity.
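Equations (84)-(86) can be sketched as follows (the ZERO, GAIN, and NOISE values are assumed, not taken from any particular spectrometer; 3.85 eV per electron-hole pair and FANO = 0.114 as in the text):

```python
import math

ZERO, GAIN = 0.0, 20.0        # eV of channel 0, eV per channel (assumed values)
NOISE, FANO = 90.0, 0.114     # electronic noise in eV FWHM; Fano factor

def energy(i):
    """Eq. (84): energy (eV) of channel i."""
    return ZERO + GAIN * i

def peak_sigma(E):
    """Eq. (86): peak width s (here in eV) at x-ray line energy E (eV)."""
    return math.sqrt((NOISE / 2.3548) ** 2 + 3.85 * FANO * E)

# Width of a MnKa peak (5895 eV), converted back to FWHM:
s = peak_sigma(5895.0)
print(round(2.3548 * s, 1))   # 149.8 (eV FWHM)
```

With these two relations, the position and width of every peak in the spectrum follow from just four calibration parameters.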
For linear least-squares fitting, ZERO, GAIN, NOISE, and FANO are physically
meaningful constants. In the case of nonlinear least squares, they are parameters to be
refined during the fitting. The advantage of optimizing the energy and resolution calibration
parameters rather than the position and width of each peak is a vast reduction of the
dimensionality of the problem. The nonlinear fit of 10 peaks would now involve 14 parameters
compared to 30. Even more importantly, all information available in the spectrum is
used to estimate ZERO, GAIN, NOISE, and FANO and thus the positions and the widths
of all peaks. Imagine a small, severely overlapping doublet with a well-defined peak on both
sides of this doublet. These two peaks will contribute most to the determination of the four
calibration parameters, virtually fixing the position and the width of the two peaks in the
doublet. As a consequence, their areas can be determined much more accurately.
Referring to our discussion on information content in Sec. II, we did not obtain this
extra performance for nothing. We have supplied extra information: the energies of the
peaks and the two calibration relations [Eqs. (84) and (86)]. Fitting with Eq. (85) requires
that the extra information we supply is indeed correct.
With modern electronics, the linearity of the energy calibration [Eq. (84)] holds very
well in regions above 2 keV. Fitting the entire spectrum, including the low-energy region,
might require a more complex energy calibration function. To fit PIXE spectra from 1 to
30 keV, Maenhaut and Vandenhaut (1986) suggested the following function:
i = C_1 + C_2 E + C_3 \exp(-C_4 E).
The relation between the square of the peak width and the energy [Eq. (86)] is based
on theoretical considerations. The relation holds very well if the doublet splitting of the
x-ray lines is taken into account. The Ka1-Ka2 separation increases from a negligible value
for Ca (3.5 eV) to nearly 100 eV for Mo. The observed peak shape of a K line is actually an
envelope of two peaks. This envelope can be represented rather well by a single Gaussian,
but failing to take this doublet splitting into account (i.e., fitting with a single Gaussian
where doublets are required) will result in peak widths that do not obey Eq. (86). To
illustrate this, the observed width of a number of Ka lines as a function of the x-ray energy is
presented in Figure 22. The dotted line represents the width of the Ka doublet fitted as one
peak. The solid (straight) line shows the width of the individual lines in the doublet.

3. Response Function for an Element


A second modification to the fitting function that will reduce the number of fitting
parameters is modeling an entire element, rather than single peaks. A number of lines can
be considered as logically belonging together, such as the Ka1 and Ka2 of the above-mentioned
doublets or all K lines of an element. This group can be fitted with one area
parameter A representing the total number of counts of all the lines in the group.
The spectrum of an element can then be represented by

y_P(i) = A \sum_{j=1}^{N_P} R_j\, G(i, E_j)    (87)

where G are the Gaussians for the various lines with energy E_j and R_j the relative
intensities of the lines. The summation runs over all lines in the group (N_P), with \sum_j R_j = 1.
The transition probabilities of all lines originating from a vacancy in the same (sub)
shell (K, LI, LII, . . .) are constants, independent of the excitation. However, the relative
intensities depend on the absorption in the sample and in the detector windows. To take this
into account, the x-ray attenuation must be included in Eq. (87). The relative intensity ratios
are obtained by multiplying the transition probabilities by an absorption correction term:
R'_j = \frac{R_j\, T_a(E_j)}{\sum_j R_j\, T_a(E_j)}    (88)
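Eqs. (87) and (88) together give a one-area-parameter profile per element; a minimal sketch (a fixed peak width is used for simplicity, ignoring Eq. (86); the line energies and transition probabilities below are FeK-like but purely illustrative, and any absorption term T_a can be passed in):

```python
import math

def gauss(Ei, Ej, s, gain):
    """Eq. (85), with all quantities in eV."""
    return gain / (s * math.sqrt(2 * math.pi)) * math.exp(-(Ej - Ei) ** 2 / (2 * s ** 2))

def element_profile(channels, area, lines, Ta, s=80.0, zero=0.0, gain=20.0):
    """Eqs. (87)-(88): one area parameter for a whole group of lines.

    lines: list of (energy_eV, transition_probability); Ta: absorption term T_a(E).
    """
    norm = sum(R * Ta(E) for E, R in lines)
    Rprime = [(E, R * Ta(E) / norm) for E, R in lines]        # Eq. (88)
    return [area * sum(Rp * gauss(zero + gain * i, E, s, gain)
                       for E, Rp in Rprime) for i in channels]

lines = [(6400.0, 0.882), (7058.0, 0.118)]                    # hypothetical Ka/Kb pair
y = element_profile(range(500), area=1e5, lines=lines, Ta=lambda E: 1.0)
print(round(sum(y)))   # ~100000: the R' ratios sum to 1, so counts are conserved
```

Because the R'_j are renormalized after the absorption correction, the total counts in the group always equal the single fitted area parameter A.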

Contributions from various subgroups (i.e., between K and L, between LI and LII, etc.)
depend on the type of excitation (photons, electrons, protons) and on the excitation
energy. General values cannot be given and must be determined for the particular excitation
if one wants to combine lines of different subgroups. Transition probabilities of lines in
various groups, determined experimentally and calculated from first principles, can be
found in the literature (McCrary et al., 1971; Salem et al., 1970; Salem and Wimmer, 1970;
Scofield, 1970, 1974a, 1974b).

Figure 22  FWHM of various Ka lines, fitted as a single peak and as a Ka1-Ka2 doublet.
In Figure 23, the fit of a tungsten L line spectrum using 24 transitions from the 3 L
subshells is shown. The relative intensities of the L lines within each subshell were taken
from the literature (Scofield, 1974b). The fit was thus done with one peak-area parameter
for each of the L1, L2, and L3 sublevels.
Fitting elements rather than individual peaks enhances the capability of the method
to resolve overlapping peaks. The area of the CrKa peak, interfered with by a VKb peak, can be
obtained with higher precision (lower standard deviation) because the area of the VKb
peak is related to the area of the (interference-free) VKa peak. Again, we have introduced
more a priori knowledge into our model. If this information (the relative intensity ratio) is
not correct, we will introduce systematic errors (e.g., the CrKa peak area, although having
a low standard deviation, will not be correct). In practice, there will be a trade-off between
the gain in accuracy and the gain in precision. Errors in the values of the transition
probabilities and in the value of the absorption correction term are sufficiently small that
small to moderately high peaks (up to 10^5 counts peak area) can be fitted as one group.

4. Modified Gaussians
When fitting very large peaks with a Gaussian, the deviation from the pure Gaussian shape
becomes significant. In Figure 24, a MnK spectrum with 10^7 counts in the MnKa peak is
shown. One observes a tailing on the low-energy side of the peaks and a continuum that is
Figure 23  Fit of a complex L line spectrum of tungsten. In total, 24 transitions, divided over the 3
L subshells, are required for the description of the spectrum.

Figure 24  MnK line spectrum with very good counting statistics. The difference from the
Gaussian response is obtained by subtracting all Gaussian peaks. From this, the peak-shape
correction is calculated.



higher at the low-energy side of the Ka peak than at the high-energy side of the Kb peak.
The observed peak shape is partially caused by nonideal behavior of the detector. For low-energy
lines ( < 10 keV), incomplete charge collection and other detector artifacts play an
important role. For higher-energy lines, Compton scattering in the detector also
contributes. Another part of the deviation from the Gaussian shape is attributed to
phenomena taking place in the sample itself. X-ray satellite transitions such as KLM radiative
Auger transitions on the low-energy side of the Ka peak and KMM transitions on the
low-energy side of the Kb contribute to the overall shape of the spectrum. In the literature,
considerable attention has been given to the explanation and to the accurate mathematical
description of the observed peak shape (McNelles and Campbell, 1975; Van Espen et al.,
1977b; Wielopolski and Gardner, 1979; Campbell et al., 1985, 1987, 1997, 1998; Gardner
et al., 1986; Yacout et al., 1986; Campbell, 1996).
Failure to account for the deviation from the Gaussian peak shape causes a number
of problems when fitting x-ray spectra. Small peaks sitting on the tail of large ones (e.g.,
NiKa in front of CuKa) cannot be fitted accurately, resulting in large systematic errors for
the small peak. Because the least-squares method seeks to minimize the difference between
the observed spectrum and the fitted function, the tail might become filled up with peaks of
elements that are not really present. Also, the continuum over the entire range of the
spectrum becomes difficult to describe. This is illustrated in Figure 25a, where a spectrum
of a brass sample (NIST SRM 1106) is fitted with simple Gaussians and a linear continuum.
An unrealistically high area for the NiKa peak is obtained in this way.

Figure 25  Fit of the spectrum of a brass sample (NIST SRM 1106): (a) fitted with simple
Gaussians and (b) fitted with Gaussians including tail and step functions.
A number of analytical functions have been proposed to account for the true line
shape. Nearly all of them include a flat shelf and an exponential tail, both convoluted with
the Gaussian response function. Similar functions are also used to describe the peak shape
of γ-ray spectra (Phillips and Marlow, 1976).
To account for the deviation from the Gaussian peak shape, the Gaussian function G(i,
E_j) in Eq. (85) can be replaced by

P(i, E_{jk}) = G(i, E_{jk}) + f_S\, S(i, E_{jk}) + f_T\, T(i, E_{jk})    (89)

where G(i, E_{jk}) is the Gaussian part given by Eq. (85) and S and T are the step and tail
functions, respectively,
 
S(i, E_{jk}) = \frac{\mathrm{GAIN}}{2E_{jk}}\, \mathrm{erfc}\left(\frac{E_i - E_{jk}}{s\sqrt{2}}\right)    (90)

T(i, E_{jk}) = \frac{\mathrm{GAIN}}{2\gamma s \exp(-1/(2\gamma^2))}\, \exp\left(\frac{E_i - E_{jk}}{\gamma s}\right) \mathrm{erfc}\left(\frac{E_i - E_{jk}}{s\sqrt{2}} + \frac{1}{\gamma\sqrt{2}}\right)    (91)
In these equations, s represents the spectrometer resolution [Eq. (86)] and γ is the broadening
of the exponential tail. The parameters f_S and f_T in Eq. (89) describe the fraction of the
photons that arrive in the step and the tail, respectively. The complement of the error
function used to convolute the step and the tail is defined as

\mathrm{erfc}(u) = 1 - \mathrm{erf}(u) = 1 - \frac{2}{\sqrt{\pi}} \int_0^u e^{-t^2}\, dt    (92)
and can be calculated via series expansion (Press et al., 1988). Figure 25b shows a fit of the
brass spectrum when these step and tail functions are included. In this case, a much more
realistic area for the NiKa is obtained and the continuum is described more correctly.
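The step and tail of Eqs. (90) and (91) can be sketched directly with the standard-library erfc (all parameter values below are illustrative):

```python
import math

def step(Ei, Ejk, s, gain):
    """Eq. (90): flat shelf convoluted with the Gaussian response (energies in eV)."""
    return gain / (2 * Ejk) * math.erfc((Ei - Ejk) / (s * math.sqrt(2)))

def tail(Ei, Ejk, s, gamma, gain):
    """Eq. (91): exponential tail of decay length gamma*s, convoluted with the
    Gaussian; the prefactor makes the tail sum to GAIN-normalized unit area."""
    return (gain / (2 * gamma * s * math.exp(-1 / (2 * gamma ** 2)))
            * math.exp((Ei - Ejk) / (gamma * s))
            * math.erfc((Ei - Ejk) / (s * math.sqrt(2)) + 1 / (gamma * math.sqrt(2))))

# The step rises on the low-energy side of the peak and vanishes above it:
print(round(step(5000.0, 5895.0, 65.0, 20.0), 6))  # 0.003393 (= GAIN / Ejk)
print(round(step(6800.0, 5895.0, 65.0, 20.0), 6))  # 0.0
```

Summing the tail over the channels of a spectrum recovers its fraction-normalized area, which is what makes f_S and f_T directly interpretable as photon fractions.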
Campbell et al. (1987) used a Gaussian, a short-tail, and a long-tail exponential to fit
the peaks in energy-dispersive spectra obtained from Ka1 and La1 lines selected by Bragg
reflection from a curved crystal. In this way, the influence of the doublet structure and the
satellite lines is eliminated. Excellent fits with reduced chi-square values between 1.02 and
1.16 were obtained for peaks having 10^6 counts.
Fitting real x-ray spectra with modified Gaussians as given by Eq. (89) dramatically
increases the number of parameters that need to be optimized (f_S, f_T, γ for each peak).
The phenomena contributing to the deviation from the Gaussian shape depend on the
energy of the detected x-ray line, so that these parameters can be expressed as smooth
functions of energy (Wielopolski and Gardner, 1979; Yacout et al., 1986). These relations
can be established using pure element spectra with very good counting statistics. An
additional problem is that the tailing of the Kb peaks tends to be larger due to the more
intense KMM radiative transition, thus requiring separate relations for Ka and Kb tails.
A further modification of the fitting function involves the use of a Voigt profile to
account for the natural line width of the x-ray lines. This becomes important when fitting K
lines of higher-Z elements (Ba and above), where the natural linewidth becomes considerable
( > 20 eV) compared to the detector resolution. The peak profile G(i, E_{jk}) in Eq. (89)
must then be replaced by a Voigt profile (the convolution of a Lorentzian and a Gaussian):

\frac{\mathrm{GAIN}}{s\sqrt{2\pi}}\, K\left(\frac{E_i - E_{jk}}{s\sqrt{2}}, \frac{a_L}{2s\sqrt{2}}\right)    (93)
with K the real part of the complex error function and a_L the natural width of the x-ray line:

K(x, y) = \mathrm{Re}\left[e^{-z^2}\, \mathrm{erfc}(-iz)\right], \qquad z = x + iy    (94)

Voigt profiles can be calculated using numerical approximations (Schreier, 1992). The
imaginary part of the complex error function can be used to calculate the derivatives
necessary in the least-squares-fitting algorithm.
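Because the Voigt profile is the convolution of a Lorentzian and a Gaussian, it can also be approximated by a direct numerical convolution on an energy grid; the sketch below is an illustrative check of that definition, not the series-expansion route of the references (grid spacing and span are arbitrary choices):

```python
import numpy as np

def voigt_profile(s, aL, dE=2.0, span=3000.0):
    """Unit-area Voigt on a grid: a Lorentzian of natural width aL (FWHM, eV)
    convolved with the Gaussian detector response of width s (eV)."""
    x = np.arange(-span, span + dE, dE)
    g = np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    lor = (aL / (2 * np.pi)) / (x**2 + (aL / 2) ** 2)
    return x, np.convolve(g, lor, mode="same") * dE

x, v = voigt_profile(s=180.0, aL=50.0)
area = v.sum() * 2.0      # grid spacing dE = 2 eV
# Area stays (almost) 1, but the peak is lower and broader than the pure Gaussian:
print(area > 0.98, v.max() < 1 / (180 * np.sqrt(2 * np.pi)))   # True True
```

The small area deficit comes from the long Lorentzian tails that fall outside the finite grid, which is why analytical approximations of K(x, y) are preferred in practice.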
An alternative procedure to describe the peak shape is used in the AXIL program
(Van Espen et al., 1977b). The deviation from the Gaussian shape is stored as a table of
numerical values, representing the difference between the observed shape and the pure Gaussian.
The table extends from zero energy up to the high-energy side of the Kb peak and is
normalized to the area of the Ka peak. The deviation is obtained from pure element
spectra having very good counting statistics (Ka area of 10^7 counts). Preferably, thin films are
used to keep the continuum as low as possible and to avoid absorption effects. The area,
position, and width of all peaks in the spectrum are determined by fitting Gaussians on a
constant continuum over the full width at tenth maximum (FWTM) of the peaks. The
Gaussian contributions are then stripped from the spectrum. The resulting non-Gaussian
part, as shown in Figure 24, is further smoothed to reduce the channel-to-channel fluctuations
and subsequently used as a numerical peak-shape correction. The fitting function
for an element is then given by
y_P(i) = A\left\{ R'_{Ka}\left[ G(i, E_{Ka}) + C(i) \right] + \sum_{j=2}^{N_P} R'_j\, G(i, E_j) \right\}    (95)

where C(i) is the numerical peak-shape correction at channel i. Values in the table are
interpolated to account for the difference between the energy scale of the correction and
the actual energy calibration of the spectrum. Similar to the parameters of the non-Gaussian
analytical functions, the shape of the numerical correction seems to vary slowly
from one element to another. This allows us to interpolate the peak-shape correction for
all elements from a limited set of experimentally determined corrections.
A major disadvantage of this method is that it is quite difficult and laborious to
obtain good experimental peak-shape corrections. Although they are, in principle, detector
dependent, experience has proven that the same set of corrections can be used for
different detectors with reasonable success, proving the fundamental nature of the observed
non-Gaussian shape. Another disadvantage is that the peak-shape correction for
the Kb becomes underestimated if strong differential absorption takes place, because the
peak-shape correction is only related to the area of the Ka peak. Also, it is nearly impossible
to apply this method to the description of L-line spectra. A major advantage,
however, is the computational simplicity of the method and the fact that no extra parameters
are required in the model.

5. Absorption Correction
The absorption correction term T_a(E), used in Eqs. (82) and (88), includes the x-ray
attenuation in all layers and windows between the sample surface and the active area of the
detector. For high-energy photons, the transparency of the detector crystal also needs to be
taken into account. In x-ray fluorescence, the attenuation in the sample, causing additional
changes in the relative intensities, can also be considered, provided the sample composition
is known. The total correction term is thus composed of a number of contributions:

T_a(E) = T_{Det}\, T_{Path}\, T_{Sample}    (96)
The detector contribution for a Si(Li) detector is given by

T_{Det}(E) = e^{-\mu_{Be}\rho d_{Be}}\, e^{-\mu_{Au}\rho d_{Au}}\, e^{-\mu_{Si}\rho d_{Si}} \left(1 - e^{-\mu_{Si}\rho D_{Si}}\right)    (97)

where μ, ρ, and d are the mass-attenuation coefficient, the density, and the thickness of the
Be window, the gold contact layer, and the silicon dead layer, respectively. In the last term, D is the
thickness of the detector crystal.
Any absorption in the path between the sample and the detector can be modeled in a
similar way. For an air path, the absorption is given by

T_{Path}(E) = e^{-\mu_{air}\rho d_{air}}    (98)


The mass-attenuation coefficient of air can be calculated assuming a composition of 79.8%
N2, 19.9% O2, and 0.3% Ar. The density of air at standard conditions is 1.26 × 10^-3 g/cm^3.
In PIXE setups, it is common practice to use additional absorbers between the sample and
the detector, with one absorber sometimes having a hole (funny filter). The absorption
behavior of such a structure can be modeled by
T_{Path}(E) = e^{-\mu_f \rho d_f} \left[ h + (1 - h)\, e^{-\mu_{ff}\rho d_{ff}} \right]    (99)

where the first factor accounts for the absorption in (all) solid filters and the second factor
accounts for the absorption of the filter having a hole. The fraction of the detector solid
angle subtended by the hole is denoted by h.
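The funny-filter factor of Eq. (99) is easily sketched (the attenuation value below is invented):

```python
import math

def funny_filter(mu_rho_d, hole_fraction):
    """Eq. (99), bracketed factor: transmission of a filter with a hole covering
    a fraction h of the detector solid angle (mu_rho_d = mu*rho*d of the filter)."""
    h = hole_fraction
    return h + (1 - h) * math.exp(-mu_rho_d)

# With no hole this is plain attenuation; with h = 1 the filter has no effect:
print(round(funny_filter(2.0, 0.0), 4), funny_filter(2.0, 1.0))   # 0.1353 1.0
```

The hole lets a fraction h of the radiation pass unattenuated, which is exactly why such filters are used: they suppress intense low-energy lines without completely losing them.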
The sample absorption correction in the case of x-ray fluorescence, assuming excitation
with an (equivalent) energy E_0, is given by

T_{sample}(E) = \frac{1 - e^{-\chi_s \rho d_s}}{\chi_s \rho d_s}    (100)

and the sample attenuation coefficient is given by

\chi_s = \frac{\mu_s(E)}{\sin \theta_1} + \frac{\mu_s(E_0)}{\sin \theta_2}    (101)
This sample attenuation coefficient can only be calculated if the weight fractions of the
constituent elements are known, which is often not the case, because the aim of the
spectrum evaluation is to obtain the net peak areas from which the concentrations are to
be calculated. Although x-ray attenuation in solid samples might become very large, it is
important to realize that not the absolute values of the absorption corrections but their
ratio is of importance in Eq. (88). This ratio changes less dramatically, especially because
the energy difference of the lines involved is small. For an infinitely thin Fe sample, the
Kb/Ka intensity ratio is 0.134. In an infinitely thick Fe matrix, the absorption correction
terms T_a(E_Kb) and T_a(E_Ka) are respectively 161 and 134, assuming MoKa excitation,
causing the Kb/Ka ratio to change to 0.161. Therefore, a rough estimate of the sample
composition is often sufficient. Van Dyck and Van Grieken (1983) demonstrated the
integration of spectrum analysis with quantitative analysis. The sample absorption term
is recalculated based on the estimated sample composition, and the spectrum fitting is
repeated.
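Eq. (100) and the Fe example above can be checked numerically (a sketch; only the 0.134, 161, and 134 figures come from the text):

```python
import math

def t_sample(chi_rho_d):
    """Eq. (100): sample absorption correction for chi_s*rho*d_s = chi_rho_d."""
    if chi_rho_d == 0.0:
        return 1.0
    return (1.0 - math.exp(-chi_rho_d)) / chi_rho_d

# A thin sample (chi*rho*d -> 0) needs no correction:
print(round(t_sample(1e-6), 4))          # 1.0
# Fe example from the text: the ratio Ta(EKb)/Ta(EKa) = 161/134 rescales the
# thin-sample Kb/Ka ratio of 0.134 to 0.161 in a thick Fe matrix:
print(round(0.134 * 161.0 / 134.0, 3))   # 0.161
```

Only the ratio of the two correction terms enters Eq. (88), which is why a rough composition estimate is usually good enough.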

6. Sum and Escape Peaks


As indicated in Sec. II, Si escape peaks can be modeled by a Gaussian with an energy
1.750 keV below the parent peak. The area relative to the area of the parent peak can be
calculated from the escape fraction [Eq. (4)]:

\eta = \frac{N_e}{N_p} = \frac{f}{1 - f}    (102)

Including the escape peaks, the description of the fluorescence of an element becomes

y_P(i) = A \sum_{j=1}^{N_P} R'_j \left[ G(i, E_j) + \eta\, G(i, E_j - 1.750) \right]    (103)

Also, various polynomial-type functions expressing the escape ratio as a function of the
energy of the parent peak are in use. The coefficients of the function can be determined by
least-squares fitting from experimental escape ratios. For spectra obtained with a Ge
detector, one needs to account in a similar way for both the GeKa and GeKb escape peaks
for elements above arsenic.
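Eq. (102) as a one-liner (the 1% escape fraction below is hypothetical):

```python
def escape_ratio(f):
    """Eq. (102): escape-to-parent peak area ratio eta from the escape fraction f."""
    return f / (1.0 - f)

# A hypothetical escape fraction of 1% gives eta of about 0.0101; the escape peak
# is then modeled as a Gaussian 1.750 keV below the parent line [Eq. (103)]:
print(round(escape_ratio(0.01), 4))   # 0.0101
```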
The incorporation of the sum peaks in the fitting model is more complex. The
method discussed below was first implemented by Johansson in the HEX program
(Johansson, 1982). Initially, the spectrum is fitted without considering pileup peaks. The
peaks are then sorted according to their height and the n largest peaks are retained. Peaks
that differ by less than 50 eV are combined into one peak. Using Eqs. (5) and (6), the relative
intensities of all possible n(n+1)/2 pileup peaks and their energies are calculated and the
m most intense are retained. Knowing the relative intensities and the energies of these m
pileup peaks, they can be included in the fitting model as one pileup element. In the next
iteration, the total peak area A of this pileup element is obtained. The construction of the
pileup element can be repeated during the next iterations as more reliable peak areas
become available. In Figure 26, part of a PIXE spectrum is shown fitted with and without
sum peaks included.
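The pileup bookkeeping described above can be sketched as follows; the intensity of a sum peak is here taken simply proportional to the product of the two parent heights, as a stand-in for Eqs. (5) and (6), and the peak list is invented:

```python
from itertools import combinations_with_replacement

def pileup_element(peaks, n=5, m=10, merge_ev=50.0):
    """Build a pileup 'element' from the n largest peaks: all n(n+1)/2 pairwise
    sum peaks are formed, of which the m most intense are retained.

    peaks: list of (energy_eV, height); returns list of (energy_eV, rel_intensity).
    """
    big = sorted(peaks, key=lambda p: -p[1])[:n]
    merged = []
    for E, h in sorted(big):                      # combine peaks closer than merge_ev
        if merged and E - merged[-1][0] < merge_ev:
            merged[-1] = (merged[-1][0], merged[-1][1] + h)
        else:
            merged.append((E, h))
    sums = [(E1 + E2, h1 * h2)                    # intensity ~ product of count rates
            for (E1, h1), (E2, h2) in combinations_with_replacement(merged, 2)]
    sums = sorted(sums, key=lambda p: -p[1])[:m]
    total = sum(h for _, h in sums)
    return [(E, h / total) for E, h in sums]

pu = pileup_element([(6400.0, 9e5), (7058.0, 1.2e5), (8630.0, 3e5), (9570.0, 4e4)])
print(pu[0][0])   # 12800.0: the Ka+Ka sum peak of the largest line dominates
```

Because the relative intensities within the pileup element are fixed, only one extra area parameter A enters the fit, exactly as for a fluorescence element.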

D. Special Aspects of Least-Squares Fitting


1. Constraints
When using nonlinear least-squares fitting, it can be advantageous to impose limits on the
fitting parameters to eliminate physically meaningless results. Some illustration of what
can happen if too much freedom is given to the fitting parameters is provided by Statham
(1978).
The incorporation of the energy and resolution calibration functions [Eqs. (84) and
(86)] in the fitting model already places a severe constraint on the fit, avoiding the situation
in which two peaks that are very close together swap their positions or become one
broader peak.
The fitting parameters can be effectively constrained by defining the real, physically
meaningful parameter P_j [e.g., the GAIN in Eq. (84)] as an arctangent function of the
fitting parameter a_j (McCarthy and Schamber, 1981; Nullens et al., 1979):

P_j = P_j^0 + \frac{2L_j}{\pi} \arctan(a_j)    (104)
where P_j^0 is the expected value (initial guess) of the parameter P_j, and L_j defines the range.
As a result of this transformation, the parameter P_j will always be in the range P_j^0 ± L_j.
Apart from significantly adding to the mathematical complexity, such a transformation has
the disadvantage that the χ² minimum, although lying within the selected interval, cannot
be reached if the path toward the minimum passes through a forbidden region. Also, this
constraint makes no distinction between the more probable values of the parameter lying
near the center of the interval and the unlikely values at the limits.
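Eq. (104) in code: whatever value the optimizer proposes for a_j, the physical parameter stays inside its window (the P0 and L values below are arbitrary):

```python
import math

def constrained(a, P0, L):
    """Eq. (104): map an unbounded fitting parameter a onto the interval P0 +/- L."""
    return P0 + (2.0 * L / math.pi) * math.atan(a)

# A GAIN of 20 eV/channel constrained to +/- 0.5 eV/channel:
for a in (-1e6, -1.0, 0.0, 1.0, 1e6):
    assert 19.5 < constrained(a, 20.0, 0.5) < 20.5
print(constrained(0.0, 20.0, 0.5))   # 20.0 at the initial guess
```

Note how a = 0 returns exactly the initial guess, so the fit starts at the center of the allowed window.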
Figure 26  Fit of part of a PIXE spectrum having very high count rates for Fe and Zn. (Top:
without considering sum peaks; bottom: sum peaks included in the fitting model.)

An alternative approach, proposed by Nullens et al. (1979), relies on the modification
of the curvature of the χ² surface. The fitting parameters, such as ZERO, GAIN,
NOISE, and FANO, are considered as random variables with an expected value equal to
the initial guess a_j^0 and having an uncertainty Δa_j. They can be included in the expression
of χ², just as the observed data points y_i ± σ_i:

\chi^2 = \sum_i \frac{1}{\sigma_i^2} \left[ y_i - y(i) \right]^2 + \sum_j \frac{1}{\Delta a_j^2} \left( a_j^0 - a_j \right)^2    (105)

Using this expression in the Marquardt nonlinear least-squares-fitting algorithm results in
modified equations for the diagonal terms of the α matrix and for the β vector (see Sec. IX):

\alpha_{jj} = \sum_i \frac{1}{\sigma_i^2} \left[ \frac{\partial y'(i)}{\partial a_j} \right]^2 + \frac{1}{\Delta a_j^2}    (106)

and

\beta_j = \sum_i \frac{1}{\sigma_i^2} \left[ y_i - y'(i) \right] \frac{\partial y'(i)}{\partial a_j} + \frac{a_j^0 - a_j}{\Delta a_j^2}    (107)
If no significant peaks are present in the fitting interval, the derivatives of the fitting
function with respect to the position and width of the peaks are zero. Therefore, the
second term in β_j will dominate, causing the parameter estimate in the next iteration to be
such that a_j tends toward a_j^0 (the initial value). If well-defined peaks are present, however,
the second term in β will be negligibly small compared to the first term and the method
acts as if no constraints were present.
Another way to look at this is by considering the curvature of the χ² surface near the
minimum. Figure 27 shows a cross section through the χ² surface along the peak position
parameter. The variation of the χ² values, with and without constraints, is shown for a
large (10,000 counts) and a small (100 counts) peak. The true peak position is at channel
100 and the FWHM of the peak is eight channels. The constraint Δa on the peak position
is one channel. From Figure 27, it is evident that the χ² minimum is much better defined
for a small peak when the constraint is applied, whereas this has no influence for a large
peak. This method has been implemented in the AXIL program to constrain the energy
and resolution calibration parameters.

2. Weighting the Fit


The weights w_i used in the least-squares method [Eq. (77)] are defined in terms of the
true (population) standard deviation of the data y_i, which can be obtained directly from
the data itself:

w_i = \frac{1}{\sigma_i^2} \approx \frac{1}{y_i}    (108)
This approximation is valid for moderate to good statistics. When regions of the spectrum
with very bad statistics (low channel contents) are fitted, estimating the weight from the
measured channel content causes a systematic bias leading to an underestimate of the peak
areas. To overcome this problem, Phillips suggested estimating the weights from the

Figure 27  Effect of the use of a constraint on the shape of the χ² response surface. (Left: marginal
effect for a large peak; right: important contribution for a small peak.)



average of three channels (Phillips, 1978). In the AXIL program, the weights are initially
based on the measured channel content, but when the overall χ² value of the fit falls
below 3, the weights are calculated from the fitted channel content. This is based on the
idea that when the calculated spectrum approaches the measured spectrum, the calculated
channel contents are a better estimate of the true channel contents than the measured
values. The effect of the weighting on the fit is considerable, as can be seen in Figure 28.
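The AXIL-style weight switching described above can be sketched as follows (the guard against zero channel contents and the exact threshold handling are added assumptions):

```python
def fit_weights(observed, fitted=None, chi2=None):
    """Weights per Eq. (108): 1/y_i from the measured data, switched to the
    fitted channel contents once the overall reduced chi-square of the fit
    drops below 3 (the AXIL-style heuristic described in the text)."""
    use_fit = fitted is not None and chi2 is not None and chi2 < 3
    source = fitted if use_fit else observed
    return [1.0 / max(y, 1.0) for y in source]   # guard against zero counts

obs = [0, 1, 2, 100]
print(fit_weights(obs))                                          # data-based
print(fit_weights(obs, fitted=[0.5, 1.2, 2.1, 99.0], chi2=1.2))  # model-based
```

Switching to model-based weights removes the downward bias that measured low-count channels would otherwise introduce.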

E. Examples
To illustrate the working of the nonlinear least-squares-fitting method, an artificial spectrum
with CuK and ZnK lines is fitted with four Gaussian peaks on a constant continuum.
Using the Marquardt algorithm, the position, width, and area of each peak are determined.
The fitting function thus is

y(i) = a_0 + \sum_{j=1}^{4} a_j \exp\left[ -\frac{(i - a_{j1})^2}{2a_{j2}^2} \right]    (109)

with i the channel number (independent variable) and a_j the parameters to be determined,
13 in total. In Table 10, the values of the parameters used to generate the spectrum (true
values), the initial guesses for the nonlinear parameters, and the final fitted values are given.
The initial guesses were deliberately taken rather far away from the true values. Figure 29
(top) shows the fitted spectrum after the first and second iterations. During the second
iteration, the Marquardt algorithm evolved into a gradient search, drastically changing the
position and width of the peaks. Even after five iterations, the calculated spectrum deviates
considerably from the measured spectrum, as can be seen from Figure 29 (bottom).

Figure 28  The effect of weighting the least-squares fit is shown on part of a PIXE spectrum with
a very small number of counts per channel.



Table 10  True Values of Spectrum Parameters Used to Generate the Artificial Spectrum in
Figure 29, Initial Guesses, and Fitted Values of the Peak Area, Position, and Width Obtained by
Nonlinear Least-Squares Fitting

Parameter       True value    Initial guess    Fitted value

Area (counts)
CuKa            100,000       0                100,134 ± 321
CuKb            13,400        0                13,163 ± 169
ZnKa            30,000        0                30,092 ± 213
ZnKb            4,106         0                4,138 ± 83
Position (channel number)
CuKa            402.05        395              402.03 ± 0.01
CuKb            445.25        450              445.35 ± 0.06
ZnKa            431.55        435              431.59 ± 0.04
ZnKb            478.60        485              478.68 ± 0.09
Width (channels)
CuKa            3.913         3                3.91 ± 0.01
CuKb            4.033         3                3.99 ± 0.05
ZnKa            3.995         3                4.02 ± 0.03
ZnKb            4.123         3                4.06 ± 0.08

Finally, after 10 iterations, a perfect match between them is obtained, with a reduced χ²
value of 0.96.
From Table 10, it is evident that the fit was quite successful, with all peak areas,
positions, and widths estimated correctly within the calculated standard deviation. One
observes that the uncertainties in the peak areas are approximately equal to the square
root of the peak area and that the position and the width of the peaks are estimated very
precisely (within 0.01 channel or 0.2 eV), especially for the larger peaks.
By observing how the iteration procedure changes the peak width and position
parameters, one can imagine that something might go wrong. Especially if the spectrum is
more complex, chances are high that the iteration stops in a false minimum or even drifts
away completely. In both cases, physically incorrect parameter estimates will be obtained.
In practice, it is of course possible to give much better initial estimates for the peak position
and width parameters than were used in this example.
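As noted earlier, when the positions and widths are fixed from calibration, the peak areas become linear parameters and can be solved directly, without iteration; a self-contained numpy sketch on a synthetic two-peak spectrum (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(1)
chan = np.arange(600)

def gauss(i, mu, s):
    """Unit-area Gaussian in channel space."""
    return np.exp(-(i - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

# Synthetic spectrum: constant continuum plus two peaks of known position/width
true_areas, mus, sig = (100000.0, 30000.0), (402.05, 431.55), 4.0
model = 50.0 + sum(A * gauss(chan, mu, sig) for A, mu in zip(true_areas, mus))
y = rng.poisson(model).astype(float)

# Weighted linear least squares for [continuum, area1, area2]
X = np.column_stack([np.ones_like(chan, dtype=float)] +
                    [gauss(chan, mu, sig) for mu in mus])
w = 1.0 / np.maximum(y, 1.0)                     # Eq. (108) weights
W = np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(X * W, y * np.sqrt(w), rcond=None)
print(np.round(coef))   # continuum ~50, areas close to 100000 and 30000
```

This one-shot solution is what makes linear fitting so much more robust than the 13-parameter nonlinear search of the example above; the price is trusting the calibration.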
In the next example, a complex spectrum (geological reference material JG1, excited
with MoK x-rays from a secondary-target system) is evaluated using nonlinear least-squares
fitting. In Figure 30, the spectrum and the fit are shown together with the residuals
of the fit (see p. 298). Due to the large number of overlapping lines, the method used in the
first example (fitting the position and width of each peak independently) is not applicable
in this case. For the description of the spectrum from 1 to 18 keV, the fluorescence lines of
21 elements were used. Au, Hg, Pb, and Th were each treated as one L group
(L1 + L2 + L3) and Al, Si, Ti, V, Cr, Mn, Ni, Cu, Zn, Ga, Br, Sr, Rb, Y as one K group
(Ka + Kb); K, Ca, and Fe were fitted with individual Ka and Kb peaks. The coherently
scattered MoKa radiation was fitted with a Gaussian and the incoherently scattered MoKa
radiation with a Voigt profile. Including escape and sum peaks, this amounts to
well over 100 peak profiles. Step and tail functions [Eq. (89)] were included for all peaks,
using expressions to relate the step and tail heights and the tail width to the energy of the
peak. The continuum is described by an exponential function [Eq. (81)] with six parameters.
The least-squares fit thus performed required the refinement of 37 parameters
Figure 29  Artificial CuK and ZnK line spectrum fitted with a nonlinear least-squares procedure
to optimize peak area, position, and width. (Top: fitted spectrum after first and second iterations;
bottom: after fifth and final iterations.)

(4 calibration parameters, 27 peak areas, 6 continuum parameters, 3 step and tail parameters,
and 1 Voigt parameter). The minimum χ² value of 1.47 is obtained after 16
iterations. The residuals indicate an overall good fit, with most of the residuals in the −3
to +3 interval, without any systematic patterns. It is interesting to note that the fitted
continuum is well below the base of the peaks, especially in the region from channel 200 to
600. The continuum describes correctly the small scattered bremsstrahlung contribution
from the x-ray tube above 12 keV (channel 600) and the Compton scattering in the detector
at the low-energy side. Most of the apparent continuum in this secondary-target
EDXRF spectrum is due to incomplete charge collection and tailing phenomena of the
scattered Mo excitation radiation and the fluorescence lines.

F. Evaluation of Fitting Results


In order to understand and appreciate the capabilities and limitations of least-squares
fitting, whether using library spectra or linear or nonlinear analytical functions, it is useful to
study the effect of random and systematic errors in some detail. Random errors are
associated with the uncertainty σ_i of the channel content y_i. As will be seen, these
uncertainties influence the precision of the net peak areas and determine the ultimate
resolving power of the least-squares method. Systematic errors, on the other hand,
are caused by discrepancies between the fitting model and the observed data and cause
inaccuracies in the net peak areas.

Figure 30  Example of the fit of a complex x-ray spectrum (thin-film deposit of geological
reference material JG1 excited with a Mo secondary-target system). The residuals are plotted as an
indicator of the quality of the fit.

1. Error Estimate
Section IX explains how the least-squares-fitting method (linear as well as nonlinear) allows the estimation of the uncertainties in the fitted parameters. These uncertainties result from the propagation of the statistical fluctuations in the spectral data into the parameters. Intuitively, one could come to the conclusion that the standard deviation of the peak area should be close to the square root of the peak area. This is indeed the case for a large, interference-free peak on a low continuum, but if the peak is sitting on a high continuum and/or is seriously overlapped by another peak, the uncertainty in the estimated peak area will be much larger.
A properly implemented least-squares method not only correctly estimates the net peak areas but also their uncertainties, taking into account the continuum and the degree of peak overlap, provided, of course, that the fitting model is capable of describing the measured spectrum. The closer together the peaks are, the higher the uncertainties in the two peak areas will become. (Theoretically, in the limit of complete overlap, the uncertainty will be infinite and the areas of the two peaks will take completely erratic values, but their sum will still correctly represent the total net peak area of the two peaks; in practice, the curvature matrix α will be singular, so that the matrix inversion fails.)
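The growth of the area uncertainty with peak overlap can be illustrated numerically. The sketch below is an illustrative construction, not code from this chapter: with unit weights and fixed peak positions and widths the problem is linear in the two amplitudes, so the 2×2 curvature matrix can be built directly and the area variance read from the diagonal of its inverse. As the separation shrinks, the determinant approaches zero and the variance blows up.

```python
import math

def area_variance(separation, sigma=1.0, n=201, span=10.0):
    """Variance of one fitted amplitude for two unit-area Gaussians a
    distance `separation` apart: build the 2x2 curvature matrix (unit
    weights, linear amplitudes) and return element (1,1) of its inverse."""
    xs = [-span + 2.0 * span * i / (n - 1) for i in range(n)]
    g = lambda x, mu: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    X1 = [g(x, -separation / 2.0) for x in xs]   # first peak shape
    X2 = [g(x, +separation / 2.0) for x in xs]   # second peak shape
    a11 = sum(v * v for v in X1)
    a22 = sum(v * v for v in X2)
    a12 = sum(u * v for u, v in zip(X1, X2))     # overlap term
    det = a11 * a22 - a12 * a12                  # -> 0 at complete overlap (singular matrix)
    return a22 / det                             # (alpha^-1)_11 = variance of amplitude 1

# uncertainty grows steeply as the two peaks merge
print(area_variance(3.0), area_variance(0.3))
```

With the peaks 3σ apart the variance is close to that of an isolated peak; at 0.3σ separation it is more than an order of magnitude larger, exactly the behavior described above.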
The uncertainties in the net peak areas can be used to decide whether a peak is indeed present in the spectrum and to calculate the detection limit. A peak area is not statistically significantly different from zero if its value lies in the range ±3σ. Any value above +3σ gives clear evidence that the peak is present; any value below −3σ would indicate that there is something wrong with the model, because truly negative peak areas are physically meaningless. Because the uncertainty in the net peak area includes the influence of continuum and peak overlap, it can be used to calculate the a posteriori detection limits of the elements (peaks) present in the spectrum. Three situations can occur:
1. Estimated peak area > 3× standard deviation
   ⇒ report: area ± standard deviation
2. −3× standard deviation ≤ estimated area ≤ 3× standard deviation
   ⇒ report: detection limit equal to 3× standard deviation
3. Area < −3× standard deviation
   ⇒ revise the fitting model
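The three reporting rules above condense into a few lines of code. The sketch below is a minimal illustration (the function name and message strings are our own, not from any particular package):

```python
def report(area, sd):
    """Reporting rule for a fitted net peak area with standard deviation sd:
    quote the area if significant, a detection limit if not, and flag a
    model problem for significantly negative areas."""
    if area > 3 * sd:
        return f"area = {area:.0f} +/- {sd:.0f}"
    if area >= -3 * sd:
        return f"not detected, detection limit = {3 * sd:.0f}"
    return "negative peak area: revise the fitting model"

print(report(5000, 100))   # clearly significant peak
print(report(150, 80))     # within +/- 3 sigma of zero
print(report(-400, 80))    # physically meaningless: model problem
```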

2. Criteria for Quality of Fit

From the definition of the least-squares method, it follows that the χ² value [Eq. (75)] estimates how well the model describes the data. The reduced χ² value, obtained by dividing χ² by the number of degrees of freedom,

\chi^2_\nu = \frac{1}{\nu}\chi^2 = \frac{1}{n - m}\chi^2    (110)

has an expected value of 1 for a perfect fit. The number of degrees of freedom equals the number of data points (n) minus the number of parameters (m) estimated during the fit. Because χ²_ν is also a random variable, the observed value will often be slightly larger or smaller than 1. Actually, χ²_ν follows (approximately) a chi-square distribution, and the 90% confidence interval is given by

\chi^2_{\nu,0.95} \le \chi^2_\nu \le \chi^2_{\nu,0.05}    (111)

where χ²_{ν,0.95} and χ²_{ν,0.05} are the tabulated χ² values at the 95% and 5% confidence levels, respectively. For ν = 20, the confidence interval is 0.543–1.571, and for ν = 200, it is 0.841–1.170. An observed value of χ²_ν in this interval indicates that the model describes the experimental data within the statistical uncertainty of the data. In other words, all remaining differences between the data and the fit can be attributed to the noise fluctuations in the channel contents and are statistically not significant. When fitting complex spectra with good statistics, much higher chi-square values will be obtained due to small imperfections in the continuum or peak model. This does not mean that the estimated peak areas are no longer useful, but high reduced χ² values (> 3) might indicate that the fitting model needs improvement.
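For routine checks, the tabulated limits of Eq. (111) can be reproduced without tables. The sketch below uses the Wilson–Hilferty cube approximation to the chi-square distribution (a standard statistical approximation, not part of this chapter) and recovers the 90% intervals quoted above for ν = 20 and ν = 200:

```python
import math

def reduced_chi2_interval(nu, z=1.645):
    """90% confidence interval for the reduced chi-square with nu degrees
    of freedom, via the Wilson-Hilferty approximation:
    chi2_nu ~ (1 - 2/(9 nu) + z sqrt(2/(9 nu)))^3 for normal deviate z."""
    s = math.sqrt(2.0 / (9.0 * nu))
    lo = (1.0 - 2.0 / (9.0 * nu) - z * s) ** 3   # 95% point (lower limit)
    hi = (1.0 - 2.0 / (9.0 * nu) + z * s) ** 3   # 5% point (upper limit)
    return lo, hi

print(reduced_chi2_interval(20))    # ~ (0.543, 1.571)
print(reduced_chi2_interval(200))   # ~ (0.841, 1.170)
```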
The reduced chi-square value as defined in Eq. (110) gives an estimate of the overall fit quality over the entire fitting region. Locally, in some part of the spectrum, the fit might actually be worse than indicated by this value. It is therefore useful to define the chi-square value for each peak separately:

\chi^2_P = \frac{1}{n_2 - n_1} \sum_{i=n_1}^{n_2} \frac{1}{\sigma_i^2} (y_i - \hat{y}_i)^2    (112)

where n_1 and n_2 are the boundaries of the peak at FWTM and n_2 − n_1 approximates the number of degrees of freedom. High values of χ²_P indicate that the peak is fitted poorly and the resulting peak area should be used with caution. In the case χ²_P > 1, it is advisable to give a conservative estimate of the uncertainty in the net peak area by multiplying the calculated uncertainty by the square root of the χ²_P value:

\sigma_A' = \sigma_A \sqrt{\chi^2_P}    (113)
Although the χ² value gives an indication of the goodness of fit, visual inspection of the fit is highly recommended. Because of the large dynamic range of the data, a plot of the spectrum and the fit on a linear scale nearly always gives the impression of a perfect fit. A plot of the logarithm or the square root of the data is more appropriate. The best method is to plot the residuals of the fit, as is done in Figure 30. The residuals are defined as

r_i = \frac{y_i - \hat{y}_i}{\sigma_i}    (114)

It is the sum of the squares of these residuals that is minimized by the least-squares method. Values in excess of +3 or −3, and especially the presence of a pattern in the residuals, indicate poorly fitted regions.
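Both diagnostics, the residuals of Eq. (114) and the local chi-square of Eq. (112), are easy to compute. The sketch below assumes counting statistics (σ_i = √y_i) and uses a short, invented seven-channel example for illustration:

```python
import math

def residuals(y, yfit):
    """Standardized residuals r_i = (y_i - yhat_i)/sigma_i [Eq. (114)],
    assuming sigma_i = sqrt(y_i); empty channels are guarded with sigma=1."""
    return [(yi - fi) / math.sqrt(max(yi, 1)) for yi, fi in zip(y, yfit)]

def chi2_region(y, yfit, n1, n2):
    """Local reduced chi-square over channels n1..n2 [Eq. (112)],
    with n2 - n1 approximating the number of degrees of freedom."""
    r = residuals(y[n1:n2 + 1], yfit[n1:n2 + 1])
    return sum(ri * ri for ri in r) / (n2 - n1)

y    = [100, 120, 400, 900, 410, 115, 95]   # invented channel contents
yfit = [105, 118, 390, 915, 405, 112, 98]   # invented fitted values
print(residuals(y, yfit))
print(chi2_region(y, yfit, 1, 5))
```

A well-fitted region yields residuals scattered in the ±3 band and a local chi-square near 1; values well above 1 flag the poorly fitted peaks discussed above.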

G. Available Computer Codes

In the literature, a number of computer programs for spectrum evaluation based on the least-squares method are reported. Without attempting to be complete, the main characteristics of a number of programs are summarized in the following paragraphs. Watjen made a compilation of the characteristics of eight computer packages for PIXE analysis (Watjen, 1987).
An intercomparison of five computer programs for the analysis of PIXE spectra revealed a very good internal consistency among the five programs (Campbell et al., 1987). PIXE spectra of biological, environmental, and geological samples were used, and their complexity put high demands on the spectrum analysis procedures. The following programs were tested: AXIL, University of Gent, Belgium; HEX, University of Lund, Sweden; SESAM-X (Marburg, FRG); the Guelph program; and PIXAN, Lucas Heights. It was concluded that the most serious disagreement occurred for small peaks on the low-energy tails of very large peaks, pointing to a need for a more accurate description of the tail functions. Also, very good agreement between the linear (SESAM-X) and the nonlinear least-squares approaches was observed.
The Los Alamos PIXE Data Reduction software (Duffy et al., 1987) contains three components. The K and L relative x-ray intensities of the elements making up the sample are computed, taking into account the detector and sample absorption. Using a Gaussian peak shape, the energy and resolution calibrations of the spectrum are calculated. With the relative peak areas and the calibration functions obtained in this way, the spectrum is fitted using a Gaussian peak shape and a polynomial continuum. Escape and pileup peaks can be included. A linear least-squares fit is done with the relative elemental concentrations and the polynomial continuum coefficients as unknowns. The continuum and the relative concentrations are constrained to non-negative values, and all elements having x-ray lines in the spectrum interval considered are included.
The PIXASE computer package (Zolnai and Szabo, 1988) performs spectrum analysis using nonlinear least-squares fitting. Elements are represented by groups of lines with fixed, absorption-corrected relative intensities, including escape peaks. Each peak is modeled by a Gaussian with the addition of an exponential tail and an error function as step. The square of the peak width is a first-order polynomial of the peak energy, and the position is a second-order polynomial of the peak energy. The continuum is described as the sum of an exponential polynomial and two simple exponentials. Pileup effects are treated as one pileup element. The nonlinear least-squares fitting is done using a simple grid-search technique. The search space of each parameter is limited by user-supplied minimum and maximum values. For fitting a large series of similar spectra, linear least-squares fitting using a library of calculated spectrum components can be done.
Bombelka et al. (1987) described a PIXE analysis program based on linear least squares. The peak shape includes a Gaussian low-energy tail function to account for the incomplete charge collection and the escape peak. The position and the square of the width of the peaks are given by first-order linear functions of the energy. The fluorescence lines of an element are modeled as a sum of those peak shapes, with relative intensity ratios corrected for absorption in the detector windows and absorbers. The continuum is composed of a fourth-order exponential polynomial multiplied by the x-ray attenuation term (bremsstrahlung continuum) and a second-order linear polynomial. Pileup is taken into account as a pileup element. The energy and resolution calibration parameters are obtained from selected peaks in the spectrum. The parameters of the exponential continuum are calculated from continuum spectra. The parameters obtained by the linear least-squares fit are the amplitude parameters of each element and of one pileup element, the amplitude parameter of the bremsstrahlung continuum, and the linear polynomial continuum parameters. The computer implementation for PIXE is called SESAM-X and is highly interactive, with graphical representation of the spectral data. Other computer programs for tube-excited (Breschinsky et al., 1979) and synchrotron radiation-excited (Petersen et al., 1986) XRF were developed based on this code.
SAMPO-X (Aarnio and Lauranto, 1989) is intended for the analysis of electron-induced x-ray spectra and is based on the well-known SAMPO code originally developed for γ-ray spectroscopy (Routti and Prussin, 1969). A Gaussian with two exponential tails, as in the original SAMPO program, is used to represent the peaks. The height and position parameters are obtained by the nonlinear least-squares fit. The peak width and the tail parameters are obtained from shape-calibration tables by interpolation. The continuum is modeled by the semi-empirical electron bremsstrahlung intensity function proposed by Pella et al. (1985). The thickness of the detector beryllium window and the atomic number of the sample, which occur in this formula, are adjusted by the least-squares fit. The program also includes element identification based on the energy and intensity of the fitted peaks and a standard ZAF matrix correction.
Jensen and Pind (1985) described a program for the analysis of energy-dispersive x-ray spectra. The program uses a sum of Gaussians, one for each fluorescence line. The continuum is subtracted first, using a linear, parabolic, or exponential function fitted to peak-free regions in the spectrum. The peak width is obtained from a calibration function which expresses the log of the peak width as a linear function of the peak position. The width calibration is done using nonoverlapping peaks in a calibration spectrum or in the spectrum to be analyzed. Peak positions are determined using a peak-search method or entered by the operator with the aid of a graphical display of the spectrum. The peak heights are then determined using linear least-squares fitting.
The computer code developed at the Technical University of Graz (Marageter et al., 1984a) is primarily intended for the evaluation of x-ray fluorescence spectra. A Gaussian response function with a low-energy tail is used to describe the peaks. The square of the peak width is a linear function of the peak energy. A straight-line equation relates the peak position to the energy of the peaks. A parabola is used to describe the continuum, whereas absorption edges are modeled by a complementary error function. The fitting parameters are the peak heights, the three continuum parameters, the height of the absorption edges, and the two energy calibration parameters. Nonlinear least-squares fitting is done with the Marquardt algorithm, using a tangent transformation to constrain the fitting parameters to physically meaningful values (Nullens et al., 1979). Provision for escape peaks and Auger peaks is also made (Marageter et al., 1984b).
The AXIL program (Van Espen et al., 1977a) was originally developed for the analysis of x-ray fluorescence spectra and later modified to allow the evaluation of electron- and particle-induced x-ray spectra (Van Espen et al., 1979b, 1986). It uses the Marquardt algorithm for nonlinear least-squares fitting with a modified (constrained) chi-square function. Linear, exponential, and bremsstrahlung polynomials can be used to model the continuum, as well as continuum stripping. X-ray lines are described by Gaussian functions with an optional numerical peak-shape correction. Escape and sum peaks can be included in the model. The peak position and the square of the peak width are related to the x-ray energy by linear functions. Provision is made to correct for the absorption in detector windows, filters, and the sample. In a later version, an orthogonal polynomial continuum model was added, as well as step and tail functions to describe the deviation from the Gaussian peak shape (Vekemans et al., 1994). The code was implemented as a Windows application. An example of the screen output of this program is given in Figure 31.
VIII. METHODS BASED ON THE MONTE CARLO TECHNIQUE

Monte Carlo techniques for the simulation of x-ray spectra are becoming more and more popular, particularly because of the fast computers available today. These simulated spectra are useful for studying the behavior and performance of various spectrum-processing methods. The Monte Carlo technique can also be used in quantitative analysis procedures, as will be discussed in Sec. VIII.B.
A. Simulation of X-ray Spectra

During the development and test phase of a spectrum-processing method, it is often advantageous to use computer-simulated spectra. For these spectra, such features as position, width, and area of peaks are exactly known in advance. They can be generated to any desired complexity. In order to make any real use of them, the simulated spectra must possess the same channel-to-channel variation according to a Poisson distribution as experimental spectra.
A simple and adequate procedure consists of first calculating, over the channel range of interest, the ideal spectrum y⁰ using, for example, a polynomial for the continuum and a series of Gaussians as given by Eq. (109). More complex functions, including a more physically realistic model for the continuum and tailed peaks, can be used if desired. The next step is to add or subtract some number of counts from each channel content so as to obtain data with the desired counting statistical noise. In other words, the true content y⁰_i needs to be converted into a random variable, y_i, so that it obeys a Poisson distribution [Eq. (1)] with N₀ = y⁰.
Poisson-distributed random variables can be generated by various computer algorithms. An example (Press, 1988) is given in Sec. X. For y⁰ > 30, the Poisson-distributed
Figure 31 Screen capture of the spectrum analysis program WinAxil, showing the fitted spectrum and the results obtained.
random variable can be approximated by the much easier to calculate normal-distributed random variable. The probability of observing y counts in a channel, assuming a normal distribution, is given by

P(y) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(y-\mu)^2/2\sigma^2}    (115)

with μ = σ² = y⁰. For large y⁰, the normal distribution is a very good approximation of the Poisson distribution. Even for small channel contents, this approximation is quite satisfactory. The probability of observing 6 counts, assuming the true value is 4, is 0.10 according to a Poisson distribution, whereas it is 0.12 according to a normal distribution.
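This comparison is easy to verify directly. The short check below evaluates both distributions for y = 6 counts at a true content of y⁰ = 4, using σ² = μ = y⁰ in the normal approximation of Eq. (115):

```python
import math

def poisson_p(y, mu):
    """Poisson probability of observing y counts for true mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def normal_p(y, mu):
    """Normal density at y with mean mu and variance mu [Eq. (115)]."""
    return math.exp(-(y - mu) ** 2 / (2.0 * mu)) / math.sqrt(2 * math.pi * mu)

# ~0.10 (Poisson) versus ~0.12 (normal approximation)
print(poisson_p(6, 4), normal_p(6, 4))
```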
The problem of adding counting statistics to the calculated spectrum is thus reduced to calculating a normal-distributed random variable y with mean value μ = y⁰ and variance σ² = y⁰. Starting from a uniformly distributed random variable U in the interval (0, 1), which can be generated by a pseudo-random number generator, normal-distributed random variables with zero mean and unit variance can be obtained with the Box–Muller method (Press, 1988):

n_1 = 2U_i - 1
n_2 = 2U_{i+1} - 1    (116)
r = n_1^2 + n_2^2

and, if r < 1,

z_1 = n_1 \sqrt{-2 \ln r / r}
z_2 = n_2 \sqrt{-2 \ln r / r}

n₁, n₂, and r are calculated from two uniformly distributed random numbers U_i and U_{i+1}. If r is less than 1, two normally distributed random numbers z₁ and z₂ can be calculated. If r ≥ 1, then n₁, n₂, and r are recalculated using a new set of uniform random numbers. The normally distributed random number y, with mean μ = y⁰ and variance σ² = y⁰, is then obtained by simple scaling:

y = \mu + z\sigma = y^0 + z\sqrt{y^0}    (117)

Applying this to all channels will produce the desired counting statistics. Since z is normally distributed with mean 0 and unit variance, z can be negative as well as positive. The count in each channel will thus be increased or decreased randomly, and proportionally to √y⁰. The final step in the computer simulation is to convert the real numbers that were used during the calculation to integers.
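The complete noise-adding procedure of Eqs. (116) and (117) can be sketched as follows; the flat 1000-count test spectrum at the end is only for illustration:

```python
import math
import random

def box_muller(rng=random.random):
    """Polar Box-Muller method [Eq. (116)]: two uniform deviates in (0,1)
    give two standard normal deviates; pairs outside the unit circle
    (r >= 1) are rejected and redrawn."""
    while True:
        n1, n2 = 2.0 * rng() - 1.0, 2.0 * rng() - 1.0
        r = n1 * n1 + n2 * n2
        if 0.0 < r < 1.0:
            f = math.sqrt(-2.0 * math.log(r) / r)
            return n1 * f, n2 * f

def add_noise(spectrum):
    """Add counting statistics to an ideal spectrum: y = y0 + z*sqrt(y0),
    rounded to integers [Eq. (117)]; negative results are clipped to 0."""
    noisy, z2 = [], None
    for y0 in spectrum:
        if z2 is None:
            z, z2 = box_muller()     # use both deviates from each pair
        else:
            z, z2 = z2, None
        noisy.append(max(0, int(round(y0 + z * math.sqrt(y0)))))
    return noisy

random.seed(1)
ideal = [1000.0] * 2000              # flat test spectrum, 1000 counts/channel
noisy = add_noise(ideal)
mean = sum(noisy) / len(noisy)
print(mean)                          # close to 1000; channel spread ~ sqrt(1000)
```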
Another interesting procedure is to generate artificial spectra from parent spectra. The parent spectrum is a spectrum acquired for a very long time, so that it exhibits extremely good counting statistics (high channel content). A large number of child spectra with lower and varying counting statistics can be generated by the procedure explained below. This method is useful to study the effect of counting statistics on spectrum-processing algorithms (Ryan, 1988).
From the parent spectrum y_i, which might first be smoothed to reduce the noise even further provided not too much distortion is introduced, the normalized cumulative distribution function Y_j is calculated:

Y_j = \frac{\sum_{i=0}^{j} y_i}{\sum_{i=0}^{n} y_i}    (118)

n being the number of channels in the spectrum, with Y_j in the interval (0, 1). To generate the child spectrum, we select N times a channel i according to the equation

i = Y^{-1}(U)    (119)

where Y⁻¹ is the inverse cumulative distribution and U is a uniformly distributed random number. Each time channel i is selected, one count is added to that channel. Since N is the total number of counts in the child spectrum, the counting statistics can be controlled by varying N. Y⁻¹ cannot be expressed as an analytical function; instead, for each random number U we select the channel i for which Y_{i−1} ≤ U < Y_i. Figure 32 shows an EDXRF spectrum from a 0.187 mg/cm² pellet of IAEA Animal Blood reference material acquired for 1000 s with a Tracor Spectrace 5000 instrument, which is used as a parent spectrum. The cumulative distribution function and a child spectrum simulated with N = 3×10⁴ are also shown. The total number of counts in the original spectrum is 3×10⁶. The child spectrum is equivalent to a spectrum that would have been acquired for 10 s.
Some computer routines useful for simulation experiments are given in Sec. X.
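The parent-to-child procedure of Eqs. (118) and (119) can be sketched as follows; the five-channel parent is an artificial example, and the inverse cumulative distribution is evaluated by a binary search over the cumulative table, as suggested in the text:

```python
import random
from bisect import bisect_right

def child_spectrum(parent, N, rng=random.random):
    """Draw a child spectrum with N total counts from a parent spectrum by
    inverting its normalized cumulative distribution [Eqs. (118)-(119)]:
    each uniform U selects the channel i with Y_{i-1} <= U < Y_i."""
    total = float(sum(parent))
    Y, running = [], 0.0
    for y in parent:
        running += y / total
        Y.append(running)                        # normalized cumulative distribution
    child = [0] * len(parent)
    for _ in range(N):
        # bisect_right returns the first i with Y[i] > U; guard against
        # floating-point rounding at the top of the table
        child[min(bisect_right(Y, rng()), len(Y) - 1)] += 1
    return child

random.seed(7)
parent = [10, 1000, 50000, 1000, 10]             # artificial parent: one dominant channel
child = child_spectrum(parent, 1000)
print(child, sum(child))                         # 1000 counts, mostly in channel 2
```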

Figure 32 Simulation of a "child" spectrum from a "parent." (Top: original spectrum; middle: the cumulative distribution; bottom: generated child.)
B. Spectrum Evaluation Using Monte Carlo Techniques
Another interesting application of the Monte Carlo technique is the simulation of the complete response of an x-ray fluorescence setup. Given an excitation spectrum and the excitation-detection geometry, the interactions of the primary photons with the sample are simulated and the events giving rise to detectable phenomena are registered. With such a Monte Carlo simulation, the intensity of the characteristic lines and the scattered excitation spectrum are estimated, taking primary and secondary effects (absorption, enhancement, and single and multiple scattering) into account. The obtained spectrum is one as seen by an ideal detector with infinite resolution. This spectrum can then be convoluted with the response function of a real detector to mimic a typical observed pulse-height spectrum. Apart from a detailed Monte Carlo simulation code, this technique requires detailed knowledge of the excitation spectrum and a very accurate detector response function.
Doster and Gardner were among the first to apply this technique to simulate the complete spectral response of an EDXRF system (Doster and Gardner, 1982a, 1982b; Yacout and Dunn, 1987). More recently, Janssens and coworkers developed highly efficient computer codes for the simulation of conventional and synchrotron energy-dispersive x-ray fluorescence systems, allowing such conditions as polarized radiation and heterogeneous samples to be taken into consideration (Janssens et al., 1993; Vincze et al., 1993, 1995a, 1995b, 1999). Figure 33 shows the results of the application of this Monte Carlo simulation for a NIST SRM 1155 steel sample excited with a filtered Rh-anode x-ray tube. At the top, the simulated spectrum as seen by a perfect detector is shown; at the bottom, the spectrum after convolution with a suitable detector response function is compared with a real measured spectrum of this sample. Except for some sum peaks that occur in the measured spectrum, an excellent agreement between the simulated and measured fluorescence lines, scattered peaks, and continuum is obtained.
These simulated x-ray spectra can be used in various ways in quantitative analysis. Using the complete spectral response of the spectrometer, one can try to find a sample composition that minimizes (in the least-squares sense) the difference between the simulated and the measured spectrum. The analysis involves the following steps: (1) simulation of the x-ray intensities over the expected composition range of the unknowns; (2) convolution with the detector response function to obtain spectra; (3) construction of a χ² map (weighted sum of squares of differences between simulated spectra and experimental spectrum) as a function of the sample composition; (4) interpolation of χ² for the composition corresponding to the minimum. An interesting aspect of this method is that all the information present in the spectrum (characteristic lines, scattered radiation, continuum) is considered. Doster and Gardner demonstrated an analytical accuracy of the order of 2% absolute for the analysis of Cr-Fe-Ni alloys with a 109Cd radioisotope system (Doster and Gardner, 1982a). Based on this work, Yacout and Dunn (1987) demonstrated the use of the inverse Monte Carlo method, which in principle requires only one simulation to analyze a set of similar samples.
Another method is called Monte Carlo-library least-squares analysis (Verghese et al., 1988). Starting from an initial guess of the composition of an unknown sample, a spectrum is simulated, taking into account all the interactions in the sample. During the simulation, one keeps track of the response of each element to construct library spectra. After the simulation, these library spectra are used to obtain the elemental concentrations by linear least-squares fitting (see Sec. VI). If the concentrations in the unknown sample differ too much from the initially assumed concentrations, the simulation is repeated.
Figure 33 Monte Carlo simulation of the spectral data from an NIST SRM 1155 sample excited with a Rh-anode x-ray tube. (Top: simulated spectrum as seen by an ideal detector; bottom: simulated spectrum after convolution with the detector response function and comparison with the measured spectrum.)
In contrast to the normal library least-squares method, this method has the advantage that the library spectra are simulated for a composition close to the composition of the spectrum to be analyzed, rather than measured from standards. This eliminates the necessity of applying the top-hat filter and problems related to changes in Kβ/Kα ratios, and again the method combines spectrum evaluation with quantitative analysis.
Finally, the ability to simulate x-ray spectra that agree very well with real measured spectra opens the possibility to use them as "standards" for quantitative analysis based on partial least-squares regression (see Sec. VI.B). Indeed, as this method only functions correctly if the PLS model is built using a large number of standards covering the entire concentration domain, it seems advantageous to use simulated spectra for this. All the interelement interactions can be accounted for by the simulation, and only a few real standards are required to scale the simulated spectra.
IX. THE LEAST-SQUARES-FITTING METHOD

The aim of the least-squares method is to obtain optimal values for the parameters of a function that models the dependence of experimental data. The method has its roots in statistics but is also considered part of numerical analysis. Least-squares parameter estimation, also known as curve fitting, plays an important role in experimental science. In x-ray fluorescence, it is used in many calibration procedures, and it forms the basis of a series of spectrum analysis techniques. In this section, an overview of the least-squares method with emphasis on spectrum analysis is given.
Based on the type of fitting function, one makes a distinction between linear and nonlinear least-squares fitting, because numerical techniques of different complexity are required to solve the problem in the two cases. The linear least-squares method deals with the fitting of functions that are linear in the parameters to be estimated. For this problem, a direct algebraic solution exists. If the fitting function is not linear in one or more of the parameters, one uses nonlinear least-squares fitting, and the solution can only be found iteratively. A group of linear functions of general interest are the polynomials, the straight line being the simplest case. The special case of orthogonal polynomials will also be considered. If more than one independent variable x_{1i}, x_{2i}, ..., x_{mi} is associated with each measurement of the dependent variable y_i, one speaks of multivariate regression. Spectrum analysis using library functions (e.g., the filter-fit method) belongs to this category. If analytical functions (e.g., Gaussians) are fitted to a spectrum, the method of linear or nonlinear least squares is used, depending on whether nonlinear parameters, such as the peak position and width, are determined.
A. Linear Least Squares
Consider the problem of fitting experimental data with the following linear function:

y = a_1 X_1 + a_2 X_2 + \cdots + a_m X_m    (120)

This function covers all linear least-squares problems. If m = 2, X₁ = 1, and X₂ = x, the straight-line equation y = a₁ + a₂x is obtained. For m > 2 and X_k = x^{k-1}, Eq. (120) is a polynomial y = a₁ + a₂x + a₃x² + ⋯ + a_m x^{m-1} to be fitted to the experimental data points {x_i, y_i, σ_i}, i = 1, ..., n. If X₁, ..., X_m represent different independent variables, the case of multiple linear regression is dealt with. Because of this generality, we will discuss the linear least-squares method based on Eq. (120) in detail.
Assume a set of n experimental data points:

\{x_{1i}, x_{2i}, \ldots, x_{mi}; y_i, \sigma_i\}, \quad i = 1, \ldots, n    (121)

with x_{ki} the value of the kth independent variable X_k in measurement i, assumed to be known without error, and y_i the value of the dependent variable measured with uncertainty σ_i. The optimum set of parameters a₁, ..., a_m that gives a least-squares fit of Eq. (120) to these experimental data are those values of a₁, ..., a_m that minimize the chi-square function:

\chi^2 = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \left( y_i - a_1 X_1 - a_2 X_2 - \cdots - a_m X_m \right)^2    (122)
The minimum is found by setting the partial derivatives of χ² with respect to the parameters to zero:

\frac{\partial \chi^2}{\partial a_k} = -2 \sum_{i=1}^{n} \frac{1}{\sigma_i^2} (y_i - a_1 X_1 - a_2 X_2 - \cdots - a_m X_m) X_k = 0, \quad k = 1, \ldots, m    (123)

Dropping the weights 1/σ_i² temporarily for clarity, we obtain a set of m simultaneous equations in the m unknowns a_k:

\sum y_i X_1 = a_1 \sum X_1 X_1 + a_2 \sum X_2 X_1 + \cdots + a_m \sum X_m X_1
\sum y_i X_2 = a_1 \sum X_1 X_2 + a_2 \sum X_2 X_2 + \cdots + a_m \sum X_m X_2
\vdots    (124)
\sum y_i X_m = a_1 \sum X_1 X_m + a_2 \sum X_2 X_m + \cdots + a_m \sum X_m X_m
where the summations run over all experimental data points i. These equations are known as the normal equations. The solution, that is, the values of a_k, can easily be found using matrix algebra. Because two (column) matrices are equal if their corresponding elements are equal, the set of equations can be written in matrix form as

\begin{bmatrix} \sum y_i X_1 \\ \sum y_i X_2 \\ \vdots \\ \sum y_i X_m \end{bmatrix}
=
\begin{bmatrix}
a_1 \sum X_1 X_1 + a_2 \sum X_2 X_1 + \cdots + a_m \sum X_m X_1 \\
a_1 \sum X_1 X_2 + a_2 \sum X_2 X_2 + \cdots + a_m \sum X_m X_2 \\
\vdots \\
a_1 \sum X_1 X_m + a_2 \sum X_2 X_m + \cdots + a_m \sum X_m X_m
\end{bmatrix}    (125)

The right-hand column matrix can be written as the product of a square matrix α and a column matrix a:

\begin{bmatrix} \sum y_i X_1 \\ \sum y_i X_2 \\ \vdots \\ \sum y_i X_m \end{bmatrix}
=
\begin{bmatrix}
\sum X_1 X_1 & \sum X_2 X_1 & \cdots & \sum X_m X_1 \\
\sum X_1 X_2 & \sum X_2 X_2 & \cdots & \sum X_m X_2 \\
\vdots & & & \vdots \\
\sum X_1 X_m & \sum X_2 X_m & \cdots & \sum X_m X_m
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{bmatrix}    (126)

or

b = \alpha a    (127)

This equation can be solved for a by premultiplying both sides of the equation with the inverse matrix α⁻¹,

\alpha^{-1} b = \alpha^{-1} \alpha a = I a    (128)

or, I being the identity matrix,

a = \alpha^{-1} b    (129)
Introducing the weights again, the elements of the matrices are given by

b_j = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} y_i X_j, \quad j = 1, \ldots, m    (130)

\alpha_{jk} = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} X_k X_j, \quad j = 1, \ldots, m; \quad k = 1, \ldots, m    (131)

and

a_j = \sum_{k=1}^{m} \alpha^{-1}_{jk} b_k, \quad j = 1, \ldots, m    (132)

where \alpha^{-1}_{jk} are the elements of the inverse matrix \alpha^{-1}.
The uncertainty in the estimate of a_j is due to the uncertainty of each measurement multiplied by the effect that measurement has on a_j:

s_{a_j}^2 = \sum_{i=1}^{n} \sigma_i^2 \left( \frac{\partial a_j}{\partial y_i} \right)^2    (133)

Because \alpha^{-1}_{jk} is independent of y_i, the partial derivative is simply

\frac{\partial a_j}{\partial y_i} = \frac{1}{\sigma_i^2} \sum_{k=1}^{m} \alpha^{-1}_{jk} X_k    (134)

After some rearrangement, it can be shown that

s_{a_j}^2 = \sum_{k=1}^{m} \sum_{l=1}^{m} \alpha^{-1}_{jk} \alpha^{-1}_{jl} \left[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2} X_k X_l \right]    (135)

the term in brackets being \alpha_{kl}, and

s_{a_j}^2 = \sum_{k} \sum_{l} \alpha^{-1}_{jk} \alpha^{-1}_{jl} \alpha_{kl} = \alpha^{-1}_{jj}    (136)

This results in the simple statement that the variance (square of the uncertainty) of a fitted parameter a_j is given by the diagonal element j of the inverse matrix \alpha^{-1}. The off-diagonal elements are the covariances. For this reason, \alpha^{-1} is often called the error matrix. Similarly, \alpha is called the curvature matrix, because its elements are a measure of the curvature of the \chi^2 hypersurface in the m-dimensional parameter space. It can easily be shown that

\frac{1}{2} \frac{\partial^2 \chi^2}{\partial a_j \partial a_k} = \sum_{i} \frac{1}{\sigma_i^2} X_k X_j = \alpha_{jk}    (137)
If the uncertainties in the data points σ_i are unknown and are the same for all data points (σ_i = σ), these equations can still be used by setting the weights w_i = 1/σ_i² to 1. Assuming the fitting model is correct, σ can be estimated from the data:

\sigma_i^2 = \sigma^2 \approx s^2 = \frac{1}{n - m} \sum_i (y_i - a_1 X_1 - a_2 X_2 - \cdots - a_m X_m)^2    (138)

The uncertainties in the parameters are then given by

s_{a_j}^2 = s^2 \alpha^{-1}_{jj}    (139)

If the uncertainties in the data points are known, the reduced χ² value can be calculated as a measure of the goodness of fit:

\chi^2_\nu = \frac{1}{n - m} \sum_i \frac{1}{\sigma_i^2} (y_i - a_1 X_1 - a_2 X_2 - \cdots - a_m X_m)^2 = \frac{1}{n - m} \chi^2    (140)

The expected value of χ²_ν is 1.0, but due to the random nature of the experimental data, values slightly smaller or greater than 1 will be observed even for a perfect fit. χ²_ν follows a chi-square distribution with ν = n − m degrees of freedom, and a 90% confidence interval is defined by

\chi^2(\nu, P = 0.95) \le \chi^2_\nu \le \chi^2(\nu, P = 0.05)    (141)

where χ²(ν, P) is the (tabulated) critical value of the χ² distribution for ν degrees of freedom at a confidence level P. Observed χ²_ν values outside this interval indicate a deviation between the fit and the data that cannot be attributed to random statistical fluctuations.
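As a worked illustration of Eqs. (120)-(139) for the simplest case, a weighted straight-line fit (m = 2, X₁ = 1, X₂ = x), the sketch below builds the normal equations, inverts the 2×2 curvature matrix analytically, and reads the parameter uncertainties from the diagonal of the inverse. The function name and test data are our own:

```python
import math

def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = a1 + a2*x: normal equations
    [Eqs. (130)-(131)], solution a = alpha^-1 b [Eq. (132)], and parameter
    variances from the diagonal of the inverse [Eq. (136)]."""
    w = [1.0 / s ** 2 for s in sigma]
    # curvature matrix alpha (X1 = 1, X2 = x) and vector b
    a11 = sum(w)
    a12 = sum(wi * xi for wi, xi in zip(w, x))
    a22 = sum(wi * xi * xi for wi, xi in zip(w, x))
    b1 = sum(wi * yi for wi, yi in zip(w, y))
    b2 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = a11 * a22 - a12 * a12          # analytic 2x2 inversion
    inv11, inv12, inv22 = a22 / det, -a12 / det, a11 / det
    a1 = inv11 * b1 + inv12 * b2
    a2 = inv12 * b1 + inv22 * b2
    return (a1, a2), (math.sqrt(inv11), math.sqrt(inv22))

# noise-free test data on the line y = 2 + 3x: the fit must recover it exactly
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi for xi in x]
(a1, a2), (s1, s2) = weighted_line_fit(x, y, [1.0] * 5)
print(a1, a2, s1, s2)
```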

B. Least-Squares Fitting Using Orthogonal Polynomials


A special group of linear functions are the orthogonal polynomials. Orthogonality means that the polynomial terms are uncorrelated, which has some distinct advantages. Let $P_j(x_i)$ be an orthogonal polynomial of degree $j$; a function can then be constructed as a sum of these orthogonal polynomials of successively higher degree:
$$y(x_i) = \sum_{j=0}^{m} C_j P_j(x_i) \qquad (142)$$

The least-squares estimates of the coefficients $C_j$ are determined by minimizing the weighted sum of squares:
$$\chi^2 = \sum_{i=1}^{n} w_i \left[y_i - y(x_i)\right]^2 \qquad (143)$$

which results in a set of $m+1$ normal equations in the $m+1$ unknowns. Because the $P_j(x_i)$ are a set of orthogonal polynomials, they have the property that
$$\sum_{i=1}^{n} w_i P_j(x_i) P_k(x_i) = \gamma_k\, \delta_{jk} \qquad (144)$$
with $\gamma_k$ a normalization constant and $\delta_{jk} = 0$ for $j \ne k$.


Because of this property, the matrix of the normal equations is diagonal and the polynomial coefficients are obtained directly from
$$C_j = \frac{1}{\gamma_j}\sum_{i=1}^{n} w_i\, y_i\, P_j(x_i) \qquad (145)$$

and the variance of the coefficients is given by
$$\sigma_{C_j}^2 = \frac{1}{\gamma_j} \qquad (146)$$
Another advantage of the use of orthogonal polynomials is that the addition of one extra term $C_{m+1}P_{m+1}$ does not change the values of the already determined coefficients $C_0, \ldots, C_m$. Further, if the $y_i$ are independent, then the $C_j$ are also independent; that is, the variance-covariance matrix is also diagonal. As a result, orthogonal polynomials of much higher degree can be fitted compared to ordinary polynomials, without running into problems with ill-conditioned normal equations and oscillating terms.
Orthogonal polynomials can be constructed by the following recurrence relation:
$$P_{j+1}(x_i) = (x_i - \alpha_j)\, P_j(x_i) - \beta_j\, P_{j-1}(x_i), \qquad j = 0, \ldots, m-1 \qquad (147)$$
$\alpha_j$ and $\beta_j$ are constants independent of $y_i$, given by
$$\alpha_j = \frac{\sum_{i=1}^{n} w_i\, x_i \left[P_j(x_i)\right]^2}{\gamma_j}, \qquad j = 0, \ldots, m \qquad (148)$$



$$\beta_j = \frac{\sum_{i=1}^{n} w_i\, x_i\, P_j(x_i)\, P_{j-1}(x_i)}{\gamma_{j-1}}, \qquad j = 0, \ldots, m \qquad (149)$$
Further, the normalization factor is given by
$$\gamma_j = \sum_{i=1}^{n} w_i \left[P_j(x_i)\right]^2 \qquad (150)$$
and
$$\beta_0 = 0 \qquad \text{and} \qquad P_0(x_i) = 1 \qquad (151)$$
Thus, an example of a first-order orthogonal polynomial is
$$C_0 P_0 + C_1 P_1 = C_0 + C_1 (x_i - \alpha_0) \qquad (152)$$
with
$$\alpha_0 = \sum_{i=1}^{n} w_i\, x_i \Big/ \sum_{i=1}^{n} w_i \qquad (153)$$

C. Nonlinear Least Squares

In this part, we consider the fitting of a function that is nonlinear in one or more fitting parameters. Examples of such functions are a decay curve,
$$y(x) = a_1 \exp(-a_2 x) \qquad (154)$$
or a Gaussian on a linear background,
$$y(x) = a_1 + a_2 x + a_3 \exp\left(-\frac{(x - a_4)^2}{2 a_5^2}\right) \qquad (155)$$

Equation (154) is nonlinear in $a_2$; Eq. (155) is nonlinear in the parameters $a_4$ and $a_5$. Equation (154) is representative of a group of functions for which linear least-squares fitting can be applied after a suitable transformation. Fitting with Eq. (155) requires truly nonlinear least-squares fitting with iterative optimization of the fitting parameters.

1. Transformation to Linear Functions


Taking the logarithm of Eq. (154),
$$\ln y = \ln a_1 - a_2 x \qquad (156)$$
and defining $y' = \ln y$ and $a'_1 = \ln a_1$, a linear (straight-line) fitting function is obtained,
$$y' = a'_1 - a_2 x \qquad (157)$$
and the method discussed earlier can be applied, but not without making the following important remark. We have transformed our original data $y_i$ to $y'_i = \ln y_i$. Consequently, the variance $\sigma'^2_i$ has also changed, according to the general error propagation formula:
$$\sigma'^2_i = \left(\frac{\partial y'}{\partial y}\right)^2 \sigma_i^2 \qquad (158)$$



or, in this particular case,
$$\sigma'^2_i = \frac{\sigma_i^2}{y_i^2} \qquad (159)$$
Thus, even if all original data points had the same uncertainty ($\sigma_i = \sigma$) and unweighted linear least-squares fitting could have been used, after the transformation a weighted linear least-squares fit is required. The results of the fit are the parameters and the associated uncertainties of the transformed equation, and to obtain the original parameters, we must perform a back transformation with the appropriate error propagation:
$$a_1 = e^{a'_1} \qquad (160)$$
$$\sigma^2_{a_1} = \left(\frac{\partial a_1}{\partial a'_1}\right)^2 \sigma^2_{a'_1} = a_1^2\, \sigma^2_{a'_1} \qquad (161)$$

2. General Nonlinear Least Squares


For the general case of least-squares fitting with a function that is nonlinear in one or more of its fitting parameters, no direct solution exists. Still, we can define the object function $\chi^2$:
$$\chi^2 = \sum_i \frac{1}{\sigma_i^2}\left[y_i - y(x_i; \mathbf{a})\right]^2 \qquad (162)$$

whose minimum is reached when the partial derivatives with respect to the parameters are zero; however, this results in a set of $m$ equations for which no general solution exists. The other approach to the problem is to consider $\chi^2$ as a continuous function of the parameters $a_j$ (i.e., $\chi^2$ takes a certain value for each set of values of the parameters $a_j$ for a given dataset $\{x_i, y_i, \sigma_i^2\}$). $\chi^2$ thus forms a hypersurface in the $m$-dimensional space formed by the fitting parameters $a_j$. This surface must be searched to locate the minimum of $\chi^2$. Once found, the corresponding coordinate values on the axes are the optimum values of the fitting parameters.
The problem of nonlinear least-squares tting is thus reduced to the problem of
nding the minimum of a function in an m-dimensional space. Any algorithm that per-
forms this task should operate according to the following:

1. Given some initial set of values for the parameters aini evaluate w2 :
w2old w2 aini
2. Find a new set of values anew such that w2new < w2old .
3. Test the minimum of w2 value:
if w2new is the (true) minimum accept anew as the optimum values of the t
else w2old w2new and repeat Step 2.

From this scheme, the iterative nature of the nonlinear least-squares fitting methods becomes evident. Moreover, it shows some other important aspects of the method: initial values are required to start the search, we need a procedure to obtain a new set of parameters which preferably is such that $\chi^2$ decreases, and we need to be sure that the true minimum, not some local minimum, is finally reached.
A variety of algorithms has been proposed, ranging from brute-force mapping procedures, dividing the $m$-dimensional parameter space into small cells and evaluating $\chi^2$ at each point, to more subtle simplex search procedures (Fiori et al., 1981). The most important group of algorithms is nevertheless based on the evaluation of the curvature matrix. The gradient method and the first-order expansion will be discussed briefly, as they form the basis of the most widely used Levenberg-Marquardt algorithm (Marquardt, 1963; Bevington and Robinson, 1992; Press et al., 1988).
a. The Gradient Method
Having a fitting function $y = y(x; \mathbf{a})$ and $\chi^2$ defined as a function of the $m$ parameters $a_j$,
$$\chi^2 = \chi^2(\mathbf{a}) = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\left[y_i - y(x_i; \mathbf{a})\right]^2 \qquad (163)$$

the gradient of $\chi^2$ in the $m$-dimensional parameter space is given by
$$\nabla\chi^2 = \sum_j \frac{\partial \chi^2}{\partial a_j}\,\hat{\jmath} \qquad (164)$$
where $\hat{\jmath}$ is the unit vector along the axis $j$ and the components of the gradient are given by
$$\frac{\partial \chi^2}{\partial a_j} = -2\sum_i \frac{1}{\sigma_i^2}\left[y_i - y(x_i; \mathbf{a})\right]\frac{\partial y}{\partial a_j} \qquad (165)$$

It is convenient to define
$$b_j = -\frac{1}{2}\,\frac{\partial \chi^2}{\partial a_j} \qquad (166)$$
The gradient gives the direction in which $\chi^2$ increases most rapidly; a method to locate the minimum can thus be developed on this basis. Given the current set of parameters $a_j$, a new set of parameters $a'_j$ is calculated (for all $j$ simultaneously):
$$a'_j = a_j + \Delta a_j\, b_j \qquad (167)$$
which follows the direction of steepest descent and guarantees a decrease of $\chi^2$ (at least if appropriate step sizes $\Delta a_j$ are taken).
The gradient method works quite well away from the minimum, but near the
minimum, the gradient becomes very small (at the minimum, even zero). Fortunately, the
method discussed next behaves in the opposite way.
b. First-Order Expansion
If we write the fitting function $y(x_i; \mathbf{a})$ as a first-order Taylor expansion in the parameters $a_j$ around $y_0$,
$$y(x; \mathbf{a}) = y_0(x; \mathbf{a}) + \sum_j \frac{\partial y_0(x; \mathbf{a})}{\partial a_j}\,\delta a_j \qquad (168)$$
we obtain an approximation to the fitting function which is linear in the parameter increments $\delta a_j$. $y_0(x; \mathbf{a})$ is the value of the fitting function for some initial set of parameters $\mathbf{a}$. Using this function, we can now express $\chi^2$ as
$$\chi^2 = \sum_i \frac{1}{\sigma_i^2}\left[y_i - y_0(x_i; \mathbf{a}) - \sum_j \frac{\partial y_0(x_i; \mathbf{a})}{\partial a_j}\,\delta a_j\right]^2 \qquad (169)$$

and we can use the method of linear least squares to find the increments $\delta a_j$ so that $\chi^2$ will be minimal. We are thus fitting the difference $\Delta y_i = y_i - y_0(x_i; \mathbf{a})$ with the derivatives as variables and the increments $\delta a_j$ as unknowns. With reference to the section on linear least-squares fitting [Eq. (122)],
$$X_j = \frac{\partial y_0(x_i)}{\partial a_j} \qquad (170)$$
and [Eqs. (130) and (131)]
$$b_j = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\left[y_i - y_0(x_i)\right]\frac{\partial y_0(x_i)}{\partial a_j} \qquad (171)$$
$$a_{jk} = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\,\frac{\partial y_0(x_i)}{\partial a_j}\,\frac{\partial y_0(x_i)}{\partial a_k} \qquad (172)$$

defining a set of $m$ normal equations in the unknowns $\delta a_j$:
$$\mathbf{b} = \mathbf{a}\,\delta\mathbf{a} \qquad (173)$$
with solution
$$\delta a_j = \sum_{k=1}^{m} a^{-1}_{jk}\, b_k \qquad (174)$$

It is not very difficult to prove that
$$b_j = -\frac{1}{2}\,\frac{\partial \chi_0^2}{\partial a_j} \qquad (175)$$
(i.e., the component of the gradient of $\chi^2$ at the point of expansion) and
$$a_{jk} \approx \frac{1}{2}\,\frac{\partial^2 \chi_0^2}{\partial a_j\,\partial a_k} \qquad (176)$$
Thus, $a_{jk}$ in Eq. (172) is the first-order approximation to the curvature matrix, whose inverse is the error matrix.
The first-order expansion of the fitting function is closely related to the Taylor expansion of the $\chi^2$ hypersurface itself (to second order):
$$\chi^2 = \chi_0^2 + \sum_j \frac{\partial \chi_0^2}{\partial a_j}\,\delta a_j + \frac{1}{2}\sum_j\sum_k \frac{\partial^2 \chi_0^2}{\partial a_j\,\partial a_k}\,\delta a_j\,\delta a_k \qquad (177)$$
where $\chi_0^2$ is the $\chi^2$ function at the point of expansion:
$$\chi_0^2 = \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\left[y_i - y_0(x_i; \mathbf{a})\right]^2 \qquad (178)$$

At the minimum, the partial derivative of $\chi^2$ with respect to the parameter $a_k$ will be zero:
$$\frac{\partial \chi^2}{\partial a_k} = \frac{\partial \chi_0^2}{\partial a_k} + \sum_j \frac{\partial^2 \chi_0^2}{\partial a_j\,\partial a_k}\,\delta a_j = 0 \qquad (179)$$
This results in a set of equations in the parameters $\delta a_j$:
$$-\frac{\partial \chi_0^2}{\partial a_k} = \sum_{j=1}^{m} \frac{\partial^2 \chi_0^2}{\partial a_j\,\partial a_k}\,\delta a_j \qquad (180)$$



$$b_k = \sum_j a_{jk}\,\delta a_j \qquad (181)$$
which is the same set of equations, except that in the expansion of the fitting function only a first-order approximation of the curvature matrix is used.
Because near the minimum the Taylor expansion of the $\chi^2$ surface is a good approximation, we can conclude that the first-order expansion of the fitting function (which is computationally more elegant because only derivatives of functions and not of $\chi^2$ are required) also yields parameter increments $\delta a_j$ which direct us toward the minimum. For each linear parameter in the fitting function, the first-order expansion of the function in this parameter is exact, and the calculated increment $\delta a_j$ will be such that the new value $a_j + \delta a_j$ is optimum (for the given set of nonlinear parameters, which might not be at their optimum values yet).
c. The Marquardt Algorithm
Based on the observation that away from the minimum the gradient method is effective and near the minimum the first-order expansion is useful, Marquardt developed an algorithm that combines both methods by using a scaling factor $\lambda$ that moves the algorithm either in the direction of the gradient search or in the direction of the first-order expansion (Marquardt, 1963).
The diagonal terms of the curvature matrix are modified as follows:
$$a'_{jk} = \begin{cases} a_{jk}(1 + \lambda), & j = k \\ a_{jk}, & j \ne k \end{cases} \qquad (182)$$
where $a_{jk}$ is given by Eq. (172), and the matrix equation to be solved for the increments $\delta a_j$ is
$$b_j = \sum_k a'_{jk}\,\delta a_k \qquad (183)$$
When $\lambda$ is very large ($\lambda \gg 1$), the diagonal elements of $\mathbf{a}'$ dominate and Eq. (183) reduces to
$$b_j \approx a'_{jj}\,\delta a_j \qquad (184)$$
or
$$\delta a_j = \frac{1}{a'_{jj}}\, b_j = -\frac{1}{2 a'_{jj}}\,\frac{\partial \chi^2}{\partial a_j} \qquad (185)$$
which is the gradient, scaled by a factor $a'_{jj}$. On the other hand, for small values of $\lambda$ ($\lambda \ll 1$), the solution is very close to the first-order expansion.
The algorithm proceeds as follows:
1. Given some initial values of the parameters $a_j$, evaluate $\chi^2 = \chi^2(\mathbf{a})$ and initialize $\lambda = 0.0001$.
2. Compute the $\mathbf{b}$ and $\mathbf{a}$ matrices using Eqs. (171) and (172).
3. Modify the diagonal elements, $a'_{jj} = a_{jj}(1 + \lambda)$, and compute $\delta\mathbf{a}$.
4. If $\chi^2(\mathbf{a} + \delta\mathbf{a}) \ge \chi^2(\mathbf{a})$, increase $\lambda$ by a factor of 10 and repeat Step 3. If $\chi^2(\mathbf{a} + \delta\mathbf{a}) < \chi^2(\mathbf{a})$, decrease $\lambda$ by a factor of 10, accept the new parameter estimates $\mathbf{a} = \mathbf{a} + \delta\mathbf{a}$, and repeat Step 2.

The algorithm thus performs two loops: the inner loop incrementing $\lambda$ until $\chi^2$ starts to decrease, and the outer loop calculating successively better approximations to the optimum values of the parameters. The outer loop can be stopped when $\chi^2$ decreases by a negligible absolute or relative amount.
Once the minimum is reached, the diagonal elements of the inverse matrix provide an estimate of the uncertainty in the fitting parameters, just as in the case of linear least squares:
$$\sigma^2_{a_j} = a^{-1}_{jj} \qquad (186)$$
which is equal to $(\mathbf{a}'^{-1})_{jj}$ provided the scaling factor $\lambda$ is much smaller than 1.
In Sec. X, a number of computer programs for linear and nonlinear least-squares fitting are given. Further information can be found in many textbooks (Press et al., 1988). The book by Bevington and Robinson (1992) contains a very clear and practical discussion of the least-squares method.

X. COMPUTER IMPLEMENTATION OF VARIOUS ALGORITHMS

In this section, a number of computer routines related to spectrum evaluation are listed. The calculation routines are written in FORTRAN. Some example programs, calling these FORTRAN routines, are written in C. The programs were tested using Microsoft FORTRAN version 4.0A and C version 5.1. Most of the routines are written for clarity rather than optimized for speed or minimum space requirements.

A. Smoothing
1. Savitsky and Golay Polynomial Smoothing
The subroutine SGSMTH calculates a smoothed spectrum using a second-degree poly-
nomial lter (see Sec. III.B.2).

Input: Y Original spectrum


NCHAN Number of channels in the spectrum
ICH1, ICH2 First and last channel number to be smoothed
IWID Width of the filter (2m + 1), IWID < 42
Output S Smoothed spectrum, only defined between ICH1 and ICH2

      SUBROUTINE SGSMTH (Y, S, NCHAN, ICH1, ICH2, IWID)
      INTEGER*2 NCHAN, ICH1, ICH2, IWID
      REAL*4 Y(0:NCHAN-1), S(0:NCHAN-1)
      REAL C(-20:20)
C - - Calculate filter coefficients
      IW = MIN(IWID, 41)
      M = IW/2
      SUM = FLOAT( (2*M - 1)*(2*M + 1)*(2*M + 3) )
      DO 10 J = -M, M
         C(J) = FLOAT( 3*(3*M*M + 3*M - 1 - 5*J*J) )
   10 CONTINUE
C - - Convolute spectrum with filter
      JCH1 = MAX( ICH1, M )
      JCH2 = MIN( ICH2, NCHAN - 1 - M )
      DO 30 I = JCH1, JCH2
         S(I) = 0.
         DO 20 J = -M, M
            S(I) = S(I) + C(J)*Y(I + J)
   20    CONTINUE
         S(I) = S(I)/SUM
   30 CONTINUE
      RETURN
      END

2. Low Statistics Digital Filter


The subroutine LOWSFIL smooths a spectrum using the low statistics digital filter algorithm (see Sec. III.B.3).

Input: Y Original spectrum


NCHAN Number of channels
ICH1, ICH2 First and last channel to be smoothed
IFWHM FWHM in channels of a peak in the middle of the smoothing region
Output: S Smoothed spectrum, only defined between ICH1 and ICH2

      SUBROUTINE LOWSFIL (Y, S, NCHAN, ICH1, ICH2, IFWHM)
      INTEGER*2 NCHAN, ICH1, ICH2, IFWHM
      REAL Y(0:NCHAN-1), S(0:NCHAN-1)
      LOGICAL NOKSLOPE, STOOHIGH
      REAL AFACT, FFACT, MFACT, RFACT
      PARAMETER (AFACT = 75., FFACT = 1.5, MFACT = 10., RFACT = 1.3)
C - - Adjust smoothing region
      IW = NINT( FFACT*IFWHM )
      JCH1 = MAX(ICH1 - IW, IW)
      JCH2 = MIN(ICH2 + IW, NCHAN - 1 - IW)
      DO 100 I = JCH1, JCH2
         IW = NINT( FFACT*IFWHM )
         SUML = 0.
         SUMR = 0.
         DO 20 J = I - IW, I - 1
            SUML = SUML + Y(J)
   20    CONTINUE
         DO 30 J = I + 1, I + IW
            SUMR = SUMR + Y(J)
   30    CONTINUE
C - - Adjust window
   50    CONTINUE
         SUMT = SUML + Y(I) + SUMR
         IF( SUMT .GT. MFACT ) THEN
            SLOPE = (SUMR + 1.)/(SUML + 1.)
            NOKSLOPE = SLOPE.GT.RFACT .OR. SLOPE.LT.1./RFACT
            STOOHIGH = SUMT .GT. AFACT*SQRT(Y(I))
            IF( (NOKSLOPE .OR. STOOHIGH) .AND. IW .GT. 1 ) THEN
               SUML = SUML - Y(I - IW)
               SUMR = SUMR - Y(I + IW)
               IW = IW - 1
               GOTO 50
            ENDIF
         ENDIF
C - - Smoothed value
         S(I) = SUMT/FLOAT(2*IW + 1)
  100 CONTINUE
C - - Copy data points that could not be smoothed
      DO 110 I = ICH1, JCH1 - 1
         S(I) = Y(I)
  110 CONTINUE
      DO 120 I = JCH2 + 1, ICH2
         S(I) = Y(I)
  120 CONTINUE
      RETURN
      END

For each channel i in the spectrum, two windows, one on each side of the channel, of width f × FWHM(E_i) channels are considered. In both windows, the channel contents are summed, yielding a left sum L and a right sum R. Both windows are subsequently reduced in width until either the total sum S = L + y_i + R falls below some constant minimum M or until two conditions are met:

1. S is less than a cutoff value N = A√y_i, with A a constant.
2. The slope (R + 1)/(L + 1) lies between 1/r and r, with r a constant.

The minimum constant M sets the base degree of smoothing in a region of vanishing counts. The first condition ensures that smoothing is confined to the low-statistics regions of the spectrum; the second condition avoids the incorporation of the edges of peaks in the averaging.
When the above conditions are satisfied, the average S/(2 f FWHM + 1) is adopted as the smoothed channel count. The following parameters were found to yield good results when treating PIXE spectra: f = 1.5, A = 75, M = 10, and r = 1.3.

B. Peak Search
The subroutine LOCPEAKS locates peaks in a spectrum using the positive part of a top-hat filter (see Sec. III.C).

Input: Y Spectrum
NCHAN Number of channels in the spectrum
R Peak search sensitivity factor, typical 2 to 4
IWID Width of the filter, approx. equal to the FWHM of the peaks
MAXP Maximum number of peaks to locate (size of array IPOS)
Output: NPEAK Number of peaks found
IPOS Peak positions (channel number)

      SUBROUTINE LOCPEAKS (Y, NCHAN, IWID, R, IPOS, NPEAKS, MAXP)
      INTEGER*2 NCHAN, IWID, NPEAKS, MAXP
      INTEGER*2 IPOS(MAXP)
      REAL Y(NCHAN), R
C - - Width of filter (number of channels in the top)
C     must be odd, and at least 3
      NP = MAX( (IWID/2)*2 + 1, 3 )
      NPEAKS = 0
C - - Calculate half width and start and stop channel
      N = NP/2
      I1 = NP
      I2 = NCHAN - NP
C - - Initialize running sums
      I = I1
      TOTAL = 0.
      TOP = 0.
      DO 20 K = -N*2, NP
         TOTAL = TOTAL + Y(I1 + K)
   20 CONTINUE
      DO 22 K = -N, N
         TOP = TOP + Y(I1 + K)
   22 CONTINUE
C - - Loop over all channels
      LASTPOS = 0
      SENS = R*R
      FI = 0.
      FNEXT = 0.
      SNEXT = 0.
      DO 100 I = I1 + 1, I2
         TOP = TOP - Y(I - N - 1) + Y(I + N)
         TOTAL = TOTAL - Y(I - NP) + Y(I + NP)
         FPREV = FI
         FI = FNEXT
         SI = SNEXT
         FNEXT = TOP + TOP - TOTAL
         SNEXT = TOTAL
C        Significant?
         IF( FI.GT.0. .AND. (FI*FI.GT.SENS*SI) ) THEN
C           Find maximum
            IF( FI .GT. FPREV .AND. FI .GT. FNEXT ) THEN
               IF( FPREV.GT.0. .AND. FNEXT.GT.0. ) THEN
C                 and store (ch# is array index - 1 and FI refers to I - 1)
                  NEWPOS = I - 2
                  IF( NEWPOS.GT.LASTPOS + 2 ) THEN
                     NPEAKS = NPEAKS + 1
                     IPOS(NPEAKS) = NEWPOS
                     LASTPOS = NEWPOS
                     IF( NPEAKS .EQ. MAXP ) RETURN
                  ENDIF
               ENDIF
            ENDIF
         ENDIF
  100 CONTINUE
      RETURN
      END
The routine is optimized for speed and requires no other arrays than the spectrum and a table to store the peak maxima found. This is achieved by using a variant of the top-hat filter; that is, for a filter width of 5, the coefficients are -1, -1, 1, 1, 1, 1, 1, -1, -1, -1. The next point in the filtered spectrum can be calculated from the current one by subtracting and adding channel contents, and only the values of the previous, the current, and the next points in the filtered spectrum are retained. This makes the routine quite cryptic, but it works very fast and is reliable.

C. Continuum Estimation
1. Peak Stripping
The subroutine SNIPBG, a variant of the SNIP algorithm, calculates the continuum via peak
stripping (see Sec. IV.A).

INPUT: Y Spectrum
NCHAN Number of channels in the spectrum
ICH1,ICH2 First and last channels of region to calculate the continuum
FWHM Width parameter for smoothing and stripping algorithm, set it
to average FWHM of peaks in the spectrum, typical value 8.0
NITER Number of iterations of SNIP algorithm, typical 24
Output: YBACK Calculated continuum in the region ICH1 to ICH2
Comment: Uses subroutine SGSMTH

      SUBROUTINE SNIPBG (Y, YBACK, NCHAN, ICH1, ICH2, FWHM, NITER)
      INTEGER*2 NCHAN, ICH1, ICH2, NITER
      REAL*4 Y(0:NCHAN-1), YBACK(0:NCHAN-1), FWHM
      PARAMETER (SQRT2 = 1.4142, NREDUC = 8)
C - - Smooth spectrum
      IW = NINT(FWHM)
      I1 = MAX(ICH1 - IW, 0)
      I2 = MIN(ICH2 + IW, NCHAN - 1)
      CALL SGSMTH (Y, YBACK, NCHAN, I1, I2, IW)
C - - Square root transformation over required spectrum region
      DO 10 I = I1, I2
         YBACK(I) = SQRT( MAX(YBACK(I), 0.) )
   10 CONTINUE
C - - Peak stripping
      REDFAC = 1.
      DO 30 N = 1, NITER
C ..     Set width, reduce width for last NREDUC iterations
         IF( N .GT. NITER - NREDUC ) REDFAC = REDFAC/SQRT2
         IW = NINT(REDFAC*FWHM)
         DO 20 I = ICH1, ICH2
            I1 = MAX(I - IW, 0)
            I2 = MIN(I + IW, NCHAN - 1)
            YBACK(I) = MIN( YBACK(I), 0.5*(YBACK(I1) + YBACK(I2)) )
   20    CONTINUE
   30 CONTINUE
C - - Back transformation
      DO 40 I = ICH1, ICH2
         YBACK(I) = YBACK(I)*YBACK(I)
   40 CONTINUE
      RETURN
      END

2. Orthogonal Polynomial Continuum


The subroutine OPOLBAC fits the continuum of a pulse-height spectrum using an orthogonal polynomial. Continuum channels are selected by adjusting the weights of the fit (see Sec. IV.C).

Input: NPTS Number of data points (channels)


X Array of channel numbers
Y Array of spectrum
R Adjustable parameter [Eqs. (51) and (52)], typical value 2
Output: YBACK Array of fitted continuum
W Array of weights
A,B Coefficients of the orthogonal polynomial
C Fitted parameters of the orthogonal polynomial
SC Uncertainties of C
FAILED Logical variable TRUE if no convergence after MAXADJ weight
adjustments
RCHISQ Reduced chi-square value of the fitted continuum
Workspace: WORK1, WORK2 of size NPTS

The routine calls ADJWEIG to adjust the weights. Further, the subroutine ORTPOL is used to fit the polynomial. The iteration (adjustment of weights) stops when all coefficients C_j change by less than one standard deviation or when the maximum number of iterations is reached.

      SUBROUTINE OPOLBAC (NPTS, X, Y, W, YBACK, WORK1, WORK2,
     >                    NDEGR, A, B, C, COLD, SC, FAILED, RCHISQ, R)
      INTEGER NPTS, NDEGR
      REAL*4 X(NPTS), Y(NPTS), W(NPTS), YBACK(NPTS)
      REAL*4 WORK1(NPTS), WORK2(NPTS)
      REAL*4 A(NDEGR), B(NDEGR), C(NDEGR), COLD(NDEGR), SC(NDEGR)
      REAL*4 RCHISQ, R
      LOGICAL*2 FAILED
      PARAMETER (MAXADJ = 20)
      LOGICAL*2 NEXT
C - - Initialize
      DO 10 J = 1, NDEGR
         COLD(J) = 0.
   10 CONTINUE
      DO 20 I = 1, NPTS
         YBACK(I) = 0.
   20 CONTINUE
C - - Main iteration loop
      DO 100 K = 1, MAXADJ
C ..     Calculate weights
         CALL ADJWEIG (NPTS, Y, W, YBACK, R, NBPNTS)
C ..     Fit orthogonal polynomial
         CALL ORTPOL (NPTS, X, Y, W, YBACK, WORK1, WORK2,
     >                NDEGR, A, B, C, SC, SUMSQ)
         RCHISQ = SUMSQ/FLOAT(NBPNTS - NDEGR)
         S = SQRT(RCHISQ)
C ..     Test if further adjustment of weights is required
         NEXT = .FALSE.
         DO 30 J = 1, NDEGR
            SC(J) = S * SC(J)
            IF( ABS(COLD(J) - C(J)) .GT. SC(J) ) NEXT = .TRUE.
            COLD(J) = C(J)
   30    CONTINUE
C ..     Convergence
         IF( .NOT.NEXT ) THEN
            FAILED = .FALSE.
            RETURN
         ENDIF
  100 CONTINUE
C - - No convergence after MAXADJ iterations
      FAILED = .TRUE.
      RETURN
      END

      SUBROUTINE ADJWEIG (NPTS, Y, W, YFIT, R, NBPNTS)
C * * Adjust weights to emphasize the continuum
      INTEGER NPTS, NBPNTS
      REAL*4 Y(NPTS), W(NPTS), YFIT(NPTS), R
      NBPNTS = 0
C - - Loop over all data points
      DO 10 I = 1, NPTS
         IF( YFIT(I) .GT. 0. ) THEN
            IF( Y(I) .LE. YFIT(I) + R*SQRT(YFIT(I)) ) THEN
C ..           Point is considered as continuum
               W(I) = 1./YFIT(I)
               NBPNTS = NBPNTS + 1
            ELSE
C ..           Point is NOT considered as continuum
               W(I) = 1./(Y(I) - YFIT(I))**2
            ENDIF
         ELSE
C ..        Continuum < 0, weight based on original data
C           (initial condition)
            W(I) = 1./MAX(Y(I), 1.)
            NBPNTS = NBPNTS + 1
         ENDIF
   10 CONTINUE
      RETURN
      END

D. Filter-Fit method

The C program FILFIT is a test implementation of the filter-fit method (see Sec. VI). This program simply coordinates all input and output, allocates the required memory, and calls two FORTRAN routines that do the actual work. The subroutine TOPHAT returns the convolute of the spectrum with the top-hat filter or the weights (the inverse of the variance of the filtered spectrum). The general-purpose subroutine LINREG is called to perform the multiple linear least-squares fit. The output includes the reduced χ² value, the parameters of the fit a_j (which are estimates of the ratio of the intensity in the analyzed spectrum to the intensity in the standard for the considered x-ray lines), and their standard deviations. The routine GETSPEC reads the spectral data and must be supplied by the user.

#include <stdio.h>
#include <malloc.h>
#include <float.h>
#include <math.h>
void fortran TOPHAT( );
void fortran LINREG( );
float spec[2048];
main( )
{
    int nchan, first_ch_fit, last_ch_fit, width, ierr;
    int i, first_ch_ref, last_ch_ref, ref, num_ref, num_points;
    int filter_mode = 0, weight_mode = 1, ioff;
    float meas_time, ref_meas_time, *scale_fac;
    float *x, *xp, *y, *w, *yfit, *a, *sa, chi;
    double *beta, *alpha;
    char filename[64];
    // input width of tophat filter
    scanf("%hd", &width);
    // input spectrum to fit and fitting region
    scanf("%s", filename);
    scanf("%hd %hd", &first_ch_fit, &last_ch_fit);
    nchan = GetSpec(spec, filename, &meas_time);
    num_points = last_ch_fit - first_ch_fit + 1;
    // filter spectrum and store in y[ ]
    y = (float *)calloc(num_points, sizeof(float));
    TOPHAT(spec, y, &nchan, &first_ch_fit, &last_ch_fit, &width, &filter_mode);
    // calculate weights of fit and save in w[ ]
    w = (float *)calloc(num_points, sizeof(float));
    TOPHAT(spec, w, &nchan, &first_ch_fit, &last_ch_fit, &width, &weight_mode);
    // read reference spectra, filter and store in x[ ]
    scanf("%hd", &num_ref);
    scale_fac = (float *)calloc(num_ref, sizeof(float));
    x = (float *)calloc(num_points*num_ref, sizeof(float));
    for (ref = 0; ref < num_ref; ref++) {
        scanf("%s", filename);
        nchan = GetSpec(spec, filename, &ref_meas_time);
        scale_fac[ref] = ref_meas_time/meas_time;
        scanf("%hd %hd", &first_ch_ref, &last_ch_ref);
        if (first_ch_ref < first_ch_fit) first_ch_ref = first_ch_fit;
        if (last_ch_ref > last_ch_fit) last_ch_ref = last_ch_fit;
        ioff = ref*num_points + first_ch_ref - first_ch_fit;
        xp = x + ioff;
        TOPHAT(spec, xp, &nchan, &first_ch_ref, &last_ch_ref, &width, &filter_mode);
    }
    // perform least squares fit
    yfit = (float *)calloc(num_points, sizeof(float));
    a = (float *)calloc(num_ref, sizeof(float));
    sa = (float *)calloc(num_ref, sizeof(float));
    beta = (double *)calloc(num_ref, sizeof(double));
    alpha = (double *)calloc(num_ref*(num_ref + 1)/2, sizeof(double));
    LINREG(y, w, x, &num_points, &num_ref, &num_points, &num_ref, yfit, a, sa, &chi,
           &ierr, beta, alpha);
    if (ierr == 0) {
        printf("Filter fit: Chi-square %f\n", chi);
        printf("Standard Int. in analyzed spectrum/Int. in standard\n");
        for (i = 0; i < num_ref; i++)
            printf(" %hd %f +- %f\n", i + 1, a[i]*scale_fac[i], sa[i]*scale_fac[i]);
        for (i = 0; i < num_points; i++) {
            printf("%4hd %7.0f %9.2f", first_ch_fit + i, y[i], yfit[i]);
            for (ref = 0; ref < num_ref; ref++)
                printf("%7.0f", x[ref*num_points + i]);
            printf("\n");
        }
    }
}

      SUBROUTINE TOPHAT (IN, OUT, NCHAN, IFRST, ILAST, IWIDTH, MODE)
      INTEGER*2 NCHAN, IFRST, ILAST, IWIDTH, MODE
      REAL*4 IN(NCHAN), OUT(1)
C * * Tophat filter of width IWIDTH, MODE = 0 calculate filtered spectrum,
C     MODE <> 0 calculate weights (1/variance of filtered spectrum)
C - - Calculate filter constants
      IW = IWIDTH
      IF( MOD(IW,2) .EQ. 0 ) IW = IW + 1
      FPOS = 1./FLOAT(IW)
      KPOS = IW/2
      IV = IW/2
      FNEG = -1./FLOAT(2*IV)
      KNEG1 = IW/2 + 1
      KNEG2 = IW/2 + IV
      N = 0
C - - Loop over all requested channels
      DO 30 I = IFRST + 1, ILAST + 1
C . .    Central positive part
         YPOS = 0.
         DO 10 K = -KPOS, KPOS
            IK = MIN( MAX(I + K, 1), NCHAN )
            YPOS = YPOS + IN(IK)
   10    CONTINUE
C . .    Left and right negative part
         YNEG = 0.
         DO 20 K = KNEG1, KNEG2
            IK = MIN( MAX(I - K, 1), NCHAN )
            YNEG = YNEG + IN(IK)
            IK = MIN( MAX(I + K, 1), NCHAN )
            YNEG = YNEG + IN(IK)
   20    CONTINUE
         N = N + 1
         IF( MODE .EQ. 0 ) THEN
            OUT(N) = FPOS*YPOS + FNEG*YNEG
         ELSE
            VAR = FPOS*FPOS*YPOS + FNEG*FNEG*YNEG
            OUT(N) = 1./MAX(VAR, 1.)
         ENDIF
   30 CONTINUE
      RETURN
      END

E. Fitting Using Analytical Functions

The C program NLRFIT is an example implementation of nonlinear spectrum fitting using an analytical function (see Sec. VII). The program only coordinates input and output. The actual fitting is done with the Marquardt algorithm in the FORTRAN subroutine MARQFIT. The fitting function consists of a polynomial continuum with NB terms and NP Gaussians. The continuum parameters and the area, position, and width of each Gaussian are optimized during the fit. The fitting function is calculated by the routine FITFUNC. The derivatives of the fitting function with respect to the parameters are calculated by the routine DERFUNC.

// Program NLRFIT
#include <stdio.h>
#include <malloc.h>
#include <float.h>
#include <math.h>
#define MAX_PEAKS 10
#define MAX_CHAN 1024
void fortran MARQFIT();
float spec[MAX_CHAN];
// Fortran common block structure COMMON/FITFUN/NB, NP
struct common_block {short NB, NP;};
extern struct common_block fortran FITFUN;
main( )
{
    char specfile[64];
    int nchan, first_ch_fit, last_ch_fit, nb, np;
    int i, j, n, num_points, num_param, ierr, max_iter;
    float ini_pos[MAX_PEAKS], ini_wid[MAX_PEAKS];
    float *x, *xp, *y, *w, *yfit, *a, *sa, chi, lamda, crit_dif;
    float *b, *beta, *deriv, *alpha;
    double *work;
    // Input of parameters and spectral data
    scanf("%s", specfile);
    scanf("%hd %hd %hd %f", &first_ch_fit, &last_ch_fit, &max_iter, &crit_dif);
    scanf("%hd %hd", &np, &nb);
    for (i = 0; i < np; i++)
        scanf("%f %f", &ini_pos[i], &ini_wid[i]);
    nchan = GetSpec(spec, specfile);
    num_points = last_ch_fit - first_ch_fit + 1;
    num_param = nb + 3*np;
    // Allocate memory for y[ ], w[ ], x[ ]
    y = (float *)calloc(num_points, sizeof(float));
    w = (float *)calloc(num_points, sizeof(float));
    x = (float *)calloc(num_points, sizeof(float));
    // Store dependent variable (spectrum), weights and indep. var. (channel #)
    for (i = first_ch_fit, n = 0; i <= last_ch_fit; i++, n++) {
        y[n] = spec[i];
        w[n] = (spec[i] > 0.) ? 1./spec[i] : 1.;
        x[n] = (float)i;
    }
    // allocate memory for other arrays required
    yfit = (float *)calloc(num_points, sizeof(float));
    a = (float *)calloc(num_param, sizeof(float));
    sa = (float *)calloc(num_param, sizeof(float));
    b = (float *)calloc(num_param, sizeof(float));
    beta = (float *)calloc(num_param, sizeof(float));
    deriv = (float *)calloc(num_param, sizeof(float));
    alpha = (float *)calloc(num_param*(num_param + 1)/2, sizeof(float));
    work = (double *)calloc(num_param*(num_param + 1)/2, sizeof(double));
    // initialize: all linear parameters to zero, peak position and width
    // to their initial guesses
    lamda = 0.001;
    for (i = 0; i < np; i++) {
        a[nb + np + i] = ini_pos[i];
        a[nb + 2*np + i] = ini_wid[i];
    }
    // perform least squares fit
    FITFUN.NP = np;
    FITFUN.NB = nb;
    MARQFIT(&ierr, &chi, &lamda, &crit_dif, &max_iter, x, y, w, yfit, &num_points, a,
            sa, &num_param, b, beta, deriv, alpha, work);
    if (ierr == 0) {
        printf("\nNon-linear fit: Chi-square %f\n", chi);
        printf("Polynomial continuum parameters\n");
        for (i = 0; i < nb; i++)
            printf("%hd %f +- %f\n", i + 1, a[i], sa[i]);
        printf("Peak parameters Area Position Width\n");
        for (i = 0; i < np; i++) {
            printf("%hd %10.0f +- %10.0f", i + 1, a[nb + i], sa[nb + i]);
            printf("%10.3f +- %10.3f", a[nb + np + i], sa[nb + np + i]);
            printf("%10.3f +- %10.3f\n", a[nb + 2*np + i], sa[nb + 2*np + i]);
        }
        for (i = 0; i < num_points; i++)
            printf("%4hd %7.0f %9.2f\n", first_ch_fit + i, y[i], yfit[i]);
    }
}



      SUBROUTINE FITFUNC (X, YFIT, NPTS, A, NTERMS)
      REAL*4 X(NPTS), YFIT(NPTS), A(NTERMS)
      COMMON /FITFUN/ NB, NP
C * * Fitting function: polynomial continuum and NP Gaussians
C     with position, width and area as parameters
      PARAMETER (SQR2PI = 2.50663)
C - - Loop over all channels
      DO 100 I = 1, NPTS
C . .    Continuum
         YFIT(I) = A(1)
         DO 20 J = 2, NB
            YFIT(I) = YFIT(I) + A(J) * X(I)**(J - 1)
   20    CONTINUE
C . .    Peaks
         DO 30 K = 1, NP
            AREA = A(NB + K)
            POS = A(NB + NP + K)
            SWID = A(NB + 2*NP + K)
            Z = ((POS - X(I))/SWID)**2
            IF( Z .LT. 50. ) THEN
               G = EXP(-Z/2.)/SWID/SQR2PI
               YFIT(I) = YFIT(I) + AREA*G
            ENDIF
   30    CONTINUE
  100 CONTINUE
      RETURN
      END

      SUBROUTINE DERFUNC (X, NPTS, A, NTERMS, DERIV, I)
      REAL*4 X(NPTS), A(NTERMS), DERIV(NTERMS)
      COMMON /FITFUN/ NB, NP
C * * Derivatives of fitting function: polynomial continuum and NP Gaussians
C     with position, width and area as parameters
      PARAMETER (SQR2PI = 2.50663)
C - - Derivatives of function with respect to the continuum parameters
      DERIV(1) = 1.
      DO 10 J = 2, NB
         DERIV(J) = X(I)**(J - 1)
   10 CONTINUE
C - - Derivatives of function with respect to the peak parameters
      DO 30 K = 1, NP
         AREA = A(NB + K)
         POS = A(NB + NP + K)
         SWID = A(NB + 2*NP + K)
         Z = ((POS - X(I))/SWID)**2
         IF( Z .LT. 50. ) THEN
            G = EXP(-Z/2.)/SWID/SQR2PI
C . .       Peak area
            DERIV(NB + K) = G
C . .       Peak position
            DERIV(NB + NP + K) = -AREA*G*(POS - X(I))/SWID/SWID
C . .       Peak width
            DERIV(NB + 2*NP + K) = AREA*G*(Z - 1.)/SWID
         ELSE
            DERIV(NB + K) = 0.
            DERIV(NB + NP + K) = 0.
            DERIV(NB + 2*NP + K) = 0.
         ENDIF
   30 CONTINUE
      RETURN
      END

F. Monte Carlo Methods


1. Uniform Random-Number Generator
The function URAND is a FORTRAN function returning uniformly distributed random numbers in the interval 0 ≤ U < 1. The random-number generator is based on Knuth's subtractive method (Press et al., 1988) (see Sec. VIII.A).
Input: ISEED Set to any negative number to initialize the random generator
Output: URAND Uniform random number in the interval 0  URAND < 1

      REAL*4 FUNCTION URAND (ISEED)
      INTEGER*2 ISEED
      REAL*4 UTABLE(56)
      REAL*4 UBIG, USEED
      PARAMETER (UBIG = 4000000., USEED = 1618033.)
      SAVE I1, I2, UTABLE, INIT
C -- Initialize table
      IF (ISEED .LT. 0 .OR. INIT .EQ. 0) THEN
        U = USEED + FLOAT(ISEED)
        U = MOD(U, UBIG)
        UTABLE(55) = U
        UTMP = 1.
        DO 10 I = 1, 54
          II = MOD(I*21, 55)
          UTABLE(II) = UTMP
          UTMP = U - UTMP
          IF (UTMP .LT. 0.) UTMP = UTMP + UBIG
          U = UTABLE(II)
10      CONTINUE
        DO 30 K = 1, 4
          DO 20 I = 1, 55
            UTABLE(I) = UTABLE(I) - UTABLE(1 + MOD(I+30, 55))
            IF (UTABLE(I) .LT. 0.) UTABLE(I) = UTABLE(I) + UBIG
20        CONTINUE
30      CONTINUE
        I1 = 0
        I2 = 31
        ISEED = 1
        INIT = 1
      ENDIF
C -- Get next random number
      I1 = I1 + 1
      IF (I1 .EQ. 56) I1 = 1
      I2 = I2 + 1
      IF (I2 .EQ. 56) I2 = 1
      U = UTABLE(I1) - UTABLE(I2)
      IF (U .LT. 0.) U = U + UBIG
      UTABLE(I1) = U
      URAND = U/UBIG
      RETURN
      END
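For readers who want to experiment outside FORTRAN, the subtractive generator transcribes compactly. The sketch below is an illustrative Python port of URAND's logic (same constants and lagged-Fibonacci update; a closure replaces the SAVE'd state), not part of the original library:

```python
MBIG = 4000000.0     # same table constants as URAND
MSEED = 1618033.0

def make_urand(seed):
    """Knuth-style subtractive (lagged-Fibonacci) generator, ported from URAND.
    seed should be a negative number, as in the FORTRAN convention."""
    table = [0.0] * 56               # 1-based indexing: table[1..55] is used
    u = (MSEED + seed) % MBIG
    table[55] = u
    utmp = 1.0
    for i in range(1, 55):           # scatter initial values over the table
        ii = (i * 21) % 55
        table[ii] = utmp
        utmp = u - utmp
        if utmp < 0.0:
            utmp += MBIG
        u = table[ii]
    for _ in range(4):               # four warm-up passes randomize the table
        for i in range(1, 56):
            table[i] -= table[1 + (i + 30) % 55]
            if table[i] < 0.0:
                table[i] += MBIG
    i1, i2 = 0, 31
    def urand():
        nonlocal i1, i2
        i1 = i1 % 55 + 1             # advance the two lagged indices cyclically
        i2 = i2 % 55 + 1
        u = table[i1] - table[i2]
        if u < 0.0:
            u += MBIG
        table[i1] = u
        return u / MBIG              # scale into [0, 1)
    return urand
```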

2. Normal Distributed Random Deviate


The function NRAND returns normally distributed random numbers with zero mean and unit
variance, using the Box-Muller method (see Sec. VIII.B).

Input: ISEED Set to any negative number to initialize the random sequence
Output: NRAND Normally distributed random deviate with zero mean and unit
        variance

      REAL*4 FUNCTION NRAND (ISEED)
      INTEGER*2 ISEED
      SAVE NEXT, FAC, V1, V2
      IF (NEXT .EQ. 0 .OR. ISEED .LT. 0) THEN
10      CONTINUE
        V1 = 2.*URAND(ISEED) - 1.
        V2 = 2.*URAND(ISEED) - 1.
        R = V1*V1 + V2*V2
        IF (R .GE. 1. .OR. R .EQ. 0.) GOTO 10
        FAC = SQRT(-2.*LOG(R)/R)
        NRAND = V1*FAC
        NEXT = 1
      ELSE
        NRAND = V2*FAC
        NEXT = 0
      ENDIF
      RETURN
      END
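The rejection step in NRAND is the polar (Marsaglia) variant of the Box-Muller transform: a point is drawn uniformly in the square [-1,1]×[-1,1], accepted only if it falls inside the unit circle, and then mapped to two independent normal deviates. An illustrative Python sketch of the same transform (NRAND returns the two deviates one at a time via saved state; here they are returned as a pair):

```python
import math
import random

def nrand_pair(rng=random.random):
    """Two independent N(0,1) deviates via the polar Box-Muller transform."""
    while True:
        v1 = 2.0 * rng() - 1.0
        v2 = 2.0 * rng() - 1.0
        r = v1 * v1 + v2 * v2
        if 0.0 < r < 1.0:            # accept only points inside the unit circle
            break
    fac = math.sqrt(-2.0 * math.log(r) / r)
    return v1 * fac, v2 * fac
```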

3. Poisson Distributed Random Deviate


The function PRAND can be used to produce approximately Poisson distributed random
deviates. For small mean values (< 20), the direct method is used; for larger values, the
Poisson distribution is approximated by the normal distribution.

Input: Y (Population) mean of the deviate
       ISEED Set to any negative number to initialize the random sequence
Output: PRAND Poisson distributed random deviate with mean Y

      REAL*4 FUNCTION PRAND(Y, ISEED)
      INTEGER*2 ISEED
      REAL*4 Y
      REAL*4 NRAND, URAND
      IF (Y .LT. 20.) THEN
C -- Use direct method
        G = EXP(-Y)
        PRAND = -1.
        T = 1.
10      CONTINUE
        PRAND = PRAND + 1.
        T = T*URAND(ISEED)
        IF (T .GT. G) GOTO 10
      ELSE
C -- Approximate by normal distribution
        PRAND = Y + SQRT(Y)*NRAND(ISEED)
      ENDIF
      RETURN
      END
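The direct method in PRAND multiplies uniform deviates until the product drops below exp(-Y); the number of factors used, minus one, is the Poisson deviate. An illustrative Python version of the same two-branch scheme (not part of the original library):

```python
import math
import random

def prand(y, rng=random.random):
    """Approximate Poisson deviate with mean y, mirroring PRAND:
    direct multiplication method for y < 20, normal approximation above."""
    if y < 20.0:
        g = math.exp(-y)
        k = -1
        t = 1.0
        while True:                  # multiply uniforms until product <= e^-y
            k += 1
            t *= rng()
            if t <= g:
                return float(k)
    # for large means, N(y, y) is an adequate approximation
    return y + math.sqrt(y) * random.gauss(0.0, 1.0)
```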

G. Least-Squares Procedures
1. Linear Regression
Subroutine LINREG is a general-purpose (multiple) linear regression routine (see Sec. IX.A).

Input: Y Array of dependent variable
       W Array of weights (wi = 1/σi²)
       X Matrix of independent variables
       N Number of data points
       M Number of independent variables (columns of X)
       NMAX, MMAX Dimensions of the X matrix
Output: YFIT Array of fitted Y values
        A Estimated least-squares parameters
        SA Standard deviation of A
        CHI Chi-square value
        IERR Error condition, -1 if fit failed (singular matrix)
Workspace: BETA of size M, ALPHA of size M(M+1)/2

      SUBROUTINE LINREG (Y, W, X, N, M, NMAX, MMAX,
     >                   YFIT, A, SA, CHI, IERR, BETA, ALPHA)
      INTEGER*2 N, M, NMAX, MMAX, IERR
      REAL*4 Y(N), W(N), YFIT(N), A(M), SA(M), CHI
      REAL*4 X(NMAX, MMAX)
      REAL*8 BETA(M), ALPHA(1)
C -- Accumulate BETA and ALPHA matrices
      JK = 0
      DO 10 J = 1, M
        BETA(J) = 0.0D0
        DO 2 I = 1, N
          BETA(J) = BETA(J) + W(I)*Y(I)*X(I,J)
2       CONTINUE
        DO 6 K = 1, J
          JK = JK + 1
          ALPHA(JK) = 0.0D0
          DO 4 I = 1, N
            ALPHA(JK) = ALPHA(JK) + W(I)*X(I,K)*X(I,J)
4         CONTINUE
6       CONTINUE
10    CONTINUE
C -- Invert ALPHA matrix
      CALL LMINV (ALPHA, M, IERR)
      IF (IERR .EQ. -1) THEN
        RETURN
      ENDIF
C -- Calculate fitting parameters A
      DO 20 J = 1, M
        A(J) = 0.
        JJ = J*(J - 1)/2
        DO 12 K = 1, J
          JK = K + JJ
          A(J) = A(J) + ALPHA(JK)*BETA(K)
12      CONTINUE
        DO 14 K = J+1, M
          JK = J + K*(K - 1)/2
          A(J) = A(J) + ALPHA(JK)*BETA(K)
14      CONTINUE
20    CONTINUE
C -- Calculate uncertainties in the parameters
      DO 30 J = 1, M
        JJ = J*(J + 1)/2
        SA(J) = DSQRT(ALPHA(JJ))
30    CONTINUE
C -- Calculate fitted values and chi-square
      CHI = 0.
      DO 40 I = 1, N
        YFIT(I) = 0.
        DO 32 J = 1, M
          YFIT(I) = YFIT(I) + A(J)*X(I,J)
32      CONTINUE
        CHI = CHI + W(I)*(YFIT(I) - Y(I))**2
40    CONTINUE
      CHI = CHI/FLOAT(N - M)
      RETURN
      END
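LINREG solves the weighted normal equations: it accumulates alpha = X'WX and beta = X'Wy and then inverts alpha. The Python sketch below is illustrative (it solves the system by Gauss-Jordan elimination rather than inverting alpha, and omits the uncertainty estimates), but implements the same estimator:

```python
def linreg(y, w, X):
    """Weighted multiple linear regression via the normal equations,
    the scheme implemented by LINREG (alpha = X'WX, beta = X'Wy)."""
    n, m = len(y), len(X[0])
    # accumulate the m x m normal matrix and right-hand side
    alpha = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n))
              for k in range(m)] for j in range(m)]
    beta = [sum(w[i] * y[i] * X[i][j] for i in range(n)) for j in range(m)]
    # solve alpha a = beta by Gauss-Jordan elimination with partial pivoting
    aug = [row[:] + [b] for row, b in zip(alpha, beta)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(m):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[j][m] / aug[j][j] for j in range(m)]
```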

2. Orthogonal Polynomial Regression


Subroutine ORTPOL fits an orthogonal polynomial to a set of data points [xi, yi, wi] (see
Sec. IX.B).

Input: NPTS Number of data points
       X Array of independent variable
       Y Array of dependent variable
       W Array of weights (wi = 1/σi²)
       NDEGR Degree of orthogonal polynomial to be fitted
Output: YFIT Array of fitted Y values
        A, B Parameters of the orthogonal polynomials
        C Fitted orthogonal polynomial coefficients
        SC Standard deviation of C
        SUMSQ Chi-square value
Workspace: PJ, PJMIN of size NPTS

      SUBROUTINE ORTPOL (NPTS, X, Y, W, YFIT, PJ, PJMIN, NDEGR, A, B, C,
     >                   SC, SUMSQ)
      INTEGER NPTS, NDEGR
      REAL*4 X(NPTS), Y(NPTS), W(NPTS), YFIT(NPTS)
      REAL*4 PJ(NPTS), PJMIN(NPTS)
      REAL*4 A(NDEGR), B(NDEGR), C(NDEGR), SC(NDEGR), SUMSQ
C -- Initialize
      DO 10 I = 1, NPTS
        PJ(I) = 1.
        PJMIN(I) = 0.
        YFIT(I) = 0.
10    CONTINUE
      GAMJMIN = 1.
C -- Loop over all polynomial terms
      DO 100 J = 1, NDEGR
C .. Accumulate normalization factor, A and B constants for term j
        GAMJ = 0.
        A(J) = 0.
        B(J) = 0.
        DO 20 I = 1, NPTS
          GAMJ = GAMJ + W(I)*PJ(I)*PJ(I)
          A(J) = A(J) + W(I)*X(I)*PJ(I)*PJ(I)
          B(J) = B(J) + W(I)*X(I)*PJ(I)*PJMIN(I)
20      CONTINUE
        A(J) = A(J)/GAMJ
        B(J) = B(J)/GAMJMIN
C .. Least squares estimate of coefficient C
        C(J) = 0.
        DO 30 I = 1, NPTS
          C(J) = C(J) + W(I)*Y(I)*PJ(I)
30      CONTINUE
        C(J) = C(J)/GAMJ
        SC(J) = SQRT(1./GAMJ)
C .. Contribution of this term to the fit
        DO 40 I = 1, NPTS
          YFIT(I) = YFIT(I) + C(J)*PJ(I)
40      CONTINUE
C .. Next polynomial term
        IF (J .LT. NDEGR) THEN
          DO 50 I = 1, NPTS
            PJPLUS = (X(I) - A(J))*PJ(I) - B(J)*PJMIN(I)
            PJMIN(I) = PJ(I)
            PJ(I) = PJPLUS
50        CONTINUE
          GAMJMIN = GAMJ
        ENDIF
100   CONTINUE
C -- Weighted sum of squares value
      SUMSQ = 0.
      DO 110 I = 1, NPTS
        SUMSQ = SUMSQ + W(I)*(Y(I) - YFIT(I))**2
110   CONTINUE
      RETURN
      END
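ORTPOL builds the polynomials with the Forsythe three-term recurrence p_{j+1}(x) = (x - a_j) p_j(x) - b_j p_{j-1}(x), so each coefficient c_j can be estimated independently of the others. An illustrative Python transcription of the same recurrence (not part of the original library):

```python
def ortpol(x, y, w, ndegr):
    """Forsythe-style orthogonal polynomial fit with ndegr terms
    (polynomial degree ndegr - 1), following the ORTPOL recurrence."""
    n = len(x)
    pj, pjmin = [1.0] * n, [0.0] * n
    yfit = [0.0] * n
    gamjmin, coeff = 1.0, []
    for j in range(ndegr):
        # normalization factor and recurrence constants for term j
        gamj = sum(w[i] * pj[i] ** 2 for i in range(n))
        aj = sum(w[i] * x[i] * pj[i] ** 2 for i in range(n)) / gamj
        bj = sum(w[i] * x[i] * pj[i] * pjmin[i] for i in range(n)) / gamjmin
        # independent least-squares estimate of coefficient c_j
        cj = sum(w[i] * y[i] * pj[i] for i in range(n)) / gamj
        coeff.append(cj)
        for i in range(n):
            yfit[i] += cj * pj[i]
        # three-term recurrence: p_{j+1} = (x - a_j) p_j - b_j p_{j-1}
        pj, pjmin = [(x[i] - aj) * pj[i] - bj * pjmin[i] for i in range(n)], pj
        gamjmin = gamj
    return yfit, coeff
```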

3. Nonlinear Regression
The subroutine MARQFIT performs nonlinear least-squares fitting according to the
Marquardt algorithm (see Sec. IX.C).

Input: MAXITER Maximum number of iterations
       X Array of independent variable
       Y Array of dependent variable
       W Array of weights (wi = 1/σi²)
       NPTS Number of data points
       NTERMS Number of parameters
       A Array of initial values of the parameters
       CRIDIF Minimum percent difference in two chi-square values
              to stop the iteration
Output: IERR Error status, -1 indicates failure of fit
        CHISQR Reduced chi-square value
        FLAMDA Marquardt control parameter
        YFIT Array of fitted data points
        A Least-squares estimate of the fitting parameters
        SA Standard deviation of A
Workspace: B, BETA, DERIV of size NTERMS
           ALFA, ARR of size NTERMS*(NTERMS+1)/2

The routine requires two user-supplied subroutines: FITFUNC to evaluate the fitting
function y(i) with the current set of parameters a, and DERFUNC to calculate the
derivatives of the fitting function with respect to the parameters.
      SUBROUTINE MARQFIT (IERR, CHISQR, FLAMDA, CRIDIF, MAXITER,
     >                    X, Y, W, YFIT, NPTS, A, SA, NTERMS,
     >                    B, BETA, DERIV, ALFA, ARR)
      INTEGER*2 IERR, NPTS, NTERMS
      REAL*4 CHISQR, FLAMDA, CRIDIF
      REAL*4 X(NPTS), Y(NPTS), W(NPTS), YFIT(NPTS)
      REAL*4 A(NTERMS), SA(NTERMS)
      REAL*4 B(1), BETA(1), DERIV(1), ALFA(1)
      REAL*8 ARR(1)
      PARAMETER (FLAMMAX = 1E4, FLAMMIN = 1E-6)
C -- Evaluate the fitting function YFIT for the current parameters
C    and save the chi-square value
      NITER = 0
      CALL FITFUNC(X, YFIT, NPTS, A, NTERMS)
      CHISQR = CHIFIT(Y, YFIT, W, NPTS, NTERMS)
      FLAMDA = 0.
C -- Set ALFA and BETA to zero, save the current value of the parameters A
100   CONTINUE
      NITER = NITER + 1
      CHISAV = CHISQR
      DO 110 J = 1, NTERMS
        B(J) = A(J)
        BETA(J) = 0.
110   CONTINUE
      DO 112 J = 1, NTERMS*(NTERMS + 1)/2
        ALFA(J) = 0.
112   CONTINUE
C -- Accumulate ALFA and BETA matrices
      DO 120 I = 1, NPTS
        D = Y(I) - YFIT(I)
C .. Calculate derivatives at point i
        CALL DERFUNC (X, NPTS, A, NTERMS, DERIV, I)
        DO 120 J = 1, NTERMS
          BETA(J) = BETA(J) + W(I)*D*DERIV(J)
          JJ = J*(J - 1)/2
          DO 120 K = 1, J
            JK = JJ + K
            ALFA(JK) = ALFA(JK) + W(I)*DERIV(J)*DERIV(K)
120   CONTINUE
C -- Test and scale ALFA matrix
      DO 140 J = 1, NTERMS
        JJ = J*(J - 1)/2
        JJJ = JJ + J
        IF (ALFA(JJJ) .LT. 1.E-20) THEN
          DO 130 K = 1, J
            JK = JJ + K
            ALFA(JK) = 0.
130       CONTINUE
          ALFA(JJJ) = 1.
          BETA(J) = 0.
        ENDIF
        SA(J) = SQRT(ALFA(JJJ))
140   CONTINUE
      DO 160 J = 1, NTERMS
        JJ = J*(J - 1)/2
        DO 150 K = 1, J
          JK = JJ + K
          ALFA(JK) = ALFA(JK)/SA(J)/SA(K)
150     CONTINUE
160   CONTINUE
C -- Store ALFA in ARR, modify the diagonal elements with FLAMDA
200   CONTINUE
      DO 210 J = 1, NTERMS
        JJ = J*(J - 1)/2
        DO 205 K = 1, J
          JK = JJ + K
          ARR(JK) = DBLE(ALFA(JK))
205     CONTINUE
        JJJ = JJ + J
        ARR(JJJ) = DBLE(1. + FLAMDA)
210   CONTINUE
C -- Invert matrix ARR
      CALL LMINV (ARR, NTERMS, IERR)
      IF (IERR .NE. 0) RETURN
C -- Calculate new values of parameters A
      DO 220 J = 1, NTERMS
        DO 220 K = 1, NTERMS
          IF (K .GT. J) THEN
            JK = J + K*(K - 1)/2
          ELSE
            JK = K + J*(J - 1)/2
          ENDIF
          A(J) = A(J) + ARR(JK)/SA(J)*BETA(K)/SA(K)
220   CONTINUE
C -- Evaluate the fitting function YFIT for the new parameters and chi-square
      CALL FITFUNC(X, YFIT, NPTS, A, NTERMS)
      CHISQR = CHIFIT(Y, YFIT, W, NPTS, NTERMS)
      IF (NITER .EQ. 1) FLAMDA = 0.001
C -- Test new parameter set
      IF (CHISQR .GT. CHISAV) THEN
C .. Iteration NOT successful, increase FLAMDA and try again
        FLAMDA = MIN(FLAMDA*10., FLAMMAX)
        DO 300 J = 1, NTERMS
          A(J) = B(J)
300     CONTINUE
        GOTO 200
      ENDIF
C .. Iteration successful, decrease FLAMDA
      FLAMDA = MAX(FLAMDA/10., FLAMMIN)
C .. Get next better estimate if required
      PERDIF = 100.*(CHISAV - CHISQR)/CHISQR
      IF (NITER .LT. MAXITER .AND. PERDIF .GT. CRIDIF) GOTO 100
C -- Calculate standard deviations and return
      DO 320 J = 1, NTERMS
        JJ = J*(J + 1)/2
        SDEV = DSQRT(ARR(JJ))/SA(J)
        SA(J) = SDEV
320   CONTINUE
      RETURN
      END

      FUNCTION CHIFIT(Y, YFIT, W, NPTS, NTERMS)
      REAL*4 Y(NPTS), YFIT(NPTS), W(NPTS)
C ** Evaluate chi-square
      CHI = 0.
      DO 300 I = 1, NPTS
        CHI = CHI + W(I)*(Y(I) - YFIT(I))**2
300   CONTINUE
      CHI = CHI/FLOAT(NPTS - NTERMS)
      CHIFIT = CHI
      RETURN
      END
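The essence of MARQFIT is Marquardt's interpolation between Gauss-Newton and steepest descent: the (scaled) curvature matrix has its diagonal augmented by a factor (1 + lambda), and lambda is increased when a trial step raises chi-square and decreased when it lowers it. A compact illustrative Python version follows; the linear solver and stopping logic are simplified relative to MARQFIT, and the function names are our own:

```python
import math

def marquardt(x, y, w, a, model, deriv, maxiter=50, cridif=0.01):
    """Levenberg-Marquardt loop in the spirit of MARQFIT: the diagonal of
    the curvature matrix alpha is multiplied by (1 + lam)."""
    lam, m = 0.001, len(a)
    chisq = lambda p: sum(wi * (yi - model(xi, p)) ** 2
                          for xi, yi, wi in zip(x, y, w))
    chi = chisq(a)
    for _ in range(maxiter):
        # accumulate alpha = J'WJ and beta = J'Wr at the current parameters
        alpha = [[0.0] * m for _ in range(m)]
        beta = [0.0] * m
        for xi, yi, wi in zip(x, y, w):
            d = deriv(xi, a)
            res = yi - model(xi, a)
            for j in range(m):
                beta[j] += wi * res * d[j]
                for k in range(m):
                    alpha[j][k] += wi * d[j] * d[k]
        while True:
            # solve (alpha with damped diagonal) * step = beta by Gauss-Jordan
            aug = [[alpha[j][k] * (1.0 + lam * (j == k)) for k in range(m)]
                   + [beta[j]] for j in range(m)]
            for c in range(m):
                piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
                aug[c], aug[piv] = aug[piv], aug[c]
                for r in range(m):
                    if r != c:
                        f = aug[r][c] / aug[c][c]
                        aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
            trial = [a[j] + aug[j][m] / aug[j][j] for j in range(m)]
            chinew = chisq(trial)
            if chinew <= chi:
                break                # step accepted
            lam *= 10.0              # step rejected: increase damping, retry
        relimp = 100.0 * (chi - chinew) / max(chinew, 1e-30)
        a, chi = trial, chinew
        lam = max(lam / 10.0, 1e-6)
        if relimp < cridif:          # chi-square no longer improves: stop
            break
    return a, chi
```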

4. Matrix Inversion
Subroutine LMINV is a general-purpose routine to invert a symmetric positive-definite matrix.

Input: ARR Upper triangle and diagonal of real symmetric matrix stored in a
           linear array, size N(N+1)/2
       N Order of matrix (number of columns)
Output: IERR Error status, IERR = 0 inverse obtained, IERR = -1 singular
        matrix

      SUBROUTINE LMINV ( ARR, N, IERR )
      INTEGER*2 N, IERR
      REAL*8 ARR(1)
      REAL*8 DIN, WORK, DSUM, DPIV
      INTEGER*2 I, IND, IPIV, J, K, KEND, KPIV, L, LANF,
     >          LEND, LHOR, LVER, MIN
C -- Cholesky factorization of the matrix
      KPIV = 0
      DO 10 K = 1, N
        KPIV = KPIV + K
        IND = KPIV
        LEND = K - 1
        DO 4 I = K, N
          DSUM = 0.D0
          IF (LEND .GT. 0) THEN
            DO 2 L = 1, LEND
              DSUM = DSUM + ARR(KPIV - L)*ARR(IND - L)
2           CONTINUE
          ENDIF
          DSUM = ARR(IND) - DSUM
          IF (I .EQ. K) THEN
            IF (DSUM .LE. 0.D0) THEN
              IERR = -1
              RETURN
            ENDIF
            DPIV = DSQRT(DSUM)
            ARR(KPIV) = DPIV
            DPIV = 1.D0/DPIV
          ELSE
            ARR(IND) = DSUM*DPIV
          ENDIF
          IND = IND + I
4       CONTINUE
10    CONTINUE
      IERR = 0
C -- Invert the triangular factor
      IPIV = N*(N + 1)/2
      IND = IPIV
      DO 20 I = 1, N
        DIN = 1.D0/ARR(IPIV)
        ARR(IPIV) = DIN
        MIN = N
        KEND = I - 1
        LANF = N - KEND
        IF (KEND .GT. 0) THEN
          J = IND
          DO 14 K = 1, KEND
            WORK = 0.D0
            MIN = MIN - 1
            LHOR = IPIV
            LVER = J
            DO 12 L = LANF, MIN
              LVER = LVER + 1
              LHOR = LHOR + L
              WORK = WORK + ARR(LVER)*ARR(LHOR)
12          CONTINUE
            ARR(J) = -WORK*DIN
            J = J - MIN
14        CONTINUE
        ENDIF
        IPIV = IPIV - MIN
        IND = IND - 1
20    CONTINUE
C -- Multiply the inverted factors to obtain the inverse
      DO 30 I = 1, N
        IPIV = IPIV + I
        J = IPIV
        DO 24 K = I, N
          WORK = 0.D0
          LHOR = J
          DO 22 L = K, N
            LVER = LHOR + K - I
            WORK = WORK + ARR(LVER)*ARR(LHOR)
            LHOR = LHOR + L
22        CONTINUE
          ARR(J) = WORK
          J = J + K
24      CONTINUE
30    CONTINUE
      IERR = 0
      RETURN
      END
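LMINV works in three stages on packed triangular storage: Cholesky factorization, inversion of the triangular factor, and multiplication of the inverted factors. The Python sketch below is illustrative only; it performs the same three stages but uses full two-dimensional storage for clarity:

```python
import math

def spd_inverse(a):
    """Invert a symmetric positive-definite matrix by Cholesky factorization,
    the approach taken by LMINV (which works on packed triangular storage)."""
    n = len(a)
    # 1) Cholesky factorization a = L L' (L lower triangular)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # 2) invert the triangular factor by forward substitution
    Linv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Linv[i][i] = 1.0 / L[i][i]
        for j in range(i):
            Linv[i][j] = -sum(L[i][k] * Linv[k][j]
                              for k in range(j, i)) / L[i][i]
    # 3) multiply the inverted factors: a^-1 = (L^-1)' L^-1
    return [[sum(Linv[k][i] * Linv[k][j] for k in range(max(i, j), n))
             for j in range(n)] for i in range(n)]
```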

REFERENCES

Aarnio PA, Lauranto H. Nucl Instrum Methods A276:608, 1989.


Bertin EP. Principles and Practice of X-ray Spectrometric Analysis. New York: Plenum Press, 1970.
Bevington PR, Robinson DK. Data Reduction and Error Analysis for the Physical Sciences.
New York: McGraw-Hill, 1992.
Black WW. Nucl Instrum Methods 71:317, 1969.

Bloomfield DJ, Love G. X-Ray Spectrom 14:8, 1985.
Bombelka E, Koenig W, Richter FW. Nucl Instrum Methods B22:21, 1987.
Breschinsky R, Krush E, Wehrse R. Diplomarbeit, Fachbereich Physik. Bremen: Universität
Bremen, 1979.
Brook D, Wynne RJ. Signal Processing Principles and Applications. London: Edward Arnold, 1988.
Campbell JL. X-Ray Spectrom 24:307, 1995.
Campbell JL. Nucl Instrum Methods B109/110:71, 1996.
Campbell JL, Perujo A, Millman BM. X-Ray Spectrom 16:195, 1987.
Campbell JL, Maxwell JA, Papp T, White G. X-Ray Spectrom 26:223, 1997.
Campbell JL, Millman BM, Maxwell JA, Perujo A, Teesdale WJ. Nucl Instrum Methods B9:71,
1985.
Campbell JL, Maenhaut W, Bombelka E, Clayton E, Malmqvist K, Maxwell JA, Pallon J,
Vandenhaute J. Nucl Instrum Methods B14:204, 1986.
Campbell JL, Cauchon G, Lepy M-C, McDonald L, Plagnard J, Stemmler P, Teesdale WJ, White G.
Nucl Instrum Methods A418:394, 1998.
Cirone R, Gigante GE, Gualtieri G. X-Ray Spectrom 13:110, 1984.
Clayton E, Duerden P, Cohen DD. Nucl Instrum Methods B22:64, 1987.
Doster JM, Gardner RP. X-Ray Spectrom 11:173, 1982a.
Doster JM, Gardner RP. X-Ray Spectrom 11:181, 1982b.
Duffy CJ, Rogers PSZ, Benjamin TM. Nucl Instrum Methods B22:91, 1987.
Enke CG, Nieman TA. Anal Chem 48:705A, 1976.
Fiori CE, Myklebust RL, Gorlen K. NBS Spec Publ 604:233, 1981.
Gardner RP, Yacout AM, Zhang J, Verghese K. Nucl Instrum Methods A242:299, 1986.
Geladi P, Kowalski BR. Anal Chim Acta 185:1, 1986.
Gertner I, Heber O, Zajfman J, Zajfman D, Rosner B. Nucl Instrum Methods B36:74, 1989.
Gunnink R. Nucl Instrum Methods 143:145, 1977.
Heckel J, Scholz W. X-Ray Spectrom 16:181, 1987.
Hertogen J, De Donder J, Gijbels R. Nucl Instrum Methods 115:197, 1974.
Hoskuldsson A. J Chemometr 2:211, 1988.
Jansson PA. Deconvolution with Applications in Spectroscopy. New York: Academic Press, 1984.
Janssens K, Van Espen P. Anal Chim Acta 184:117, 1986.
Janssens K, Dorrine W, Van Espen P. Chemometr Intell Lab Syst 4:147, 1988.
Janssens K, Vincze L, Van Espen P, Adams F. X-Ray Spectrom 22:234, 1993.
Jenkins R, Gould RW, Gedcke D. Quantitative X-Ray Spectrometry. New York: Marcel Dekker,
1981.
Jensen BB, Pind N. Anal Chim Acta 117:101, 1985.
Johansson GI. X-Ray Spectrom 11:194, 1982.
Joy DC. Rev Sci Instrum 56:1772, 1985.
Kajfosz J, Kwiatek WM. Nucl Instrum Methods B22:78, 1987.
Lemberge P, Van Espen PJ. X-Ray Spectrom 28:77, 1999.
Lorber A, Wangen LE, Kowalski BR. J Chemometr 1:19, 1987.
Lucas-Tooth HJ, Price BJ. Metallurgia 64:149, 1961.
Maenhaut W, Vandenhaute J. Bull Soc Chim Belg 95:407, 1986.
Manne R. Chemometr Intell Lab Syst 2:187, 1987.
Marageter E, Wegscheider W, Muller K. Nucl Instrum Methods B1:137, 1984a.
Marageter E, Wegscheider W, Muller K. X-Ray Spectrom 13:78, 1984b.
Marquardt DW. J Soc Ind Appl Math 11:431, 1963.
Martens H, Naes T. Multivariate Calibration. Chichester: John Wiley, 1989.
Massart DL, Vandeginste BGM, Deming SN, Michotte Y, Kaufman L. Chemometrics: A
Textbook. Amsterdam: Elsevier, 1988.
McCarthy JJ, Schamber FH. NBS Spec Publ 604:273, 1981.
McCrary JH, Singman LV, Ziegler LH, Looney LD, Edmonds CM, Harris CE. Phys Rev
A4:1745, 1971.


McCullagh H. Report EGG-PHYS-5890, Idaho National Engineering Laboratory, Idaho Falls, ID,
1982.
McNelles LA, Campbell JL. Nucl Instrum Methods 127:73, 1975.
Molt K, Schramm R. Fresenius J Anal Chem 359:61, 1997.
Nielson KK. X-Ray Spectrom 7:15, 1978.
Nullens H, Van Espen P, Adams F. X-Ray Spectrom 8:104, 1979.
Nunez J, Rebollo Neira LE, Plastino A, Bonetto RD, Guerin DMA, Alvarez AG. X-Ray Spectrom
17:47, 1988.
Op De Beeck JP, Hoste J. Atomic Energy Rev 13:151, 1975.
Pella PA, Feng L, Small JA. X-Ray Spectrom 14:125, 1985.
Pessara W, Debertin K. Nucl Instrum Methods 184:497, 1981.
Petersen W, Ketelsen P, Knochel A. Nucl Instrum Methods A245:535, 1986.
Phillips GW. Nucl Instrum Methods 153:449, 1978.
Phillips GW, Marlow KW. Nucl Instrum Methods 137:525, 1976.
Pratar A, De Jong S. J Chemometr 11:311, 1997.
Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C, The Art of
Scientific Computing. Cambridge: Cambridge University Press, 1988.
Reed SJB, Ware NG. J Phys E5:582, 1972.
Robertson A, Prestwich WV, Kennett TJ. Nucl Instrum Methods 100:317, 1972.
Routti JT, Prussin SG. Nucl Instrum Methods 72:125, 1969.
Ryan CG, Clayton E, Griffin WL, Sie SH, Cousens DR. Nucl Instrum Methods B34:396, 1988.
Salem SI, Wimmer RJ. Phys Rev A2:1121, 1970.
Salem SI, Saunders BG, Melson C. Phys Rev A1:1563, 1970.
Savitzky A, Golay MJE. Anal Chem 36:1627, 1964.
Schamber FH. In: Dzubay T, ed. X-Ray Fluorescence Analysis of Environmental Samples. Ann
Arbor, MI: Ann Arbor Science, 1977, p. 241.
Schreier F. J Quant Spectrosc Radiat Transfer 48:743, 1992.
Schwalbe LA, Trussell HJ. X-Ray Spectrom 10:187, 1981.
Scofield JH. Phys Rev 179:9, 1970.
Scofield JH. Phys Rev A9:1041, 1974a.
Scofield JH. Phys Rev A10:1507, 1974b.
Sherry WM, Vander Sande JB. X-Ray Spectrom 6:154, 1977.
Smith DGW, Gold CM, Tomlinson DA. X-Ray Spectrom 4:149, 1975.
Statham PJ. X-Ray Spectrom 5:16, 1976a.
Statham PJ. X-Ray Spectrom 5:154, 1976b.
Statham PJ. X-Ray Spectrom 7:132, 1978.
Statham PJ, Nashashibi T. In: Newbury DE, ed. Microbeam Analysis. San Francisco: San Francisco
Press, 1988, p. 50.
Steenstrup S. J Appl Crystallogr 14:226, 1981.
Swerts J, Van Espen P. Anal Chem 65:1181, 1993.
Urbanski P, Kowalska E. X-Ray Spectrom 24:70, 1995.
Van Dyck P, Van Grieken R. X-Ray Spectrom 12:111, 1983.
Van Espen P, Adams F. X-Ray Spectrom 5:123, 1976.
Van Espen P, Nullens H, Adams F. Anal Chem 51:1325, 1979a.
Van Espen P, Nullens H, Adams F. Nucl Instrum Methods 142:243, 1977a.
Van Espen P, Nullens H, Adams F. Nucl Instrum Methods 145:579, 1977b.
Van Espen P, Nullens H, Adams F. X-Ray Spectrom 9:126, 1980.
Van Espen P, Janssens K, Nobels J. Chemometr Intell Lab Syst 1:109, 1986.
Van Espen P, Nullens H, Maenhaut W. In: Newbury DE, ed. Microbeam Analysis. San Francisco,
San Francisco Press, 1979b, p. 265.
Van Espen P, Nullens H, Adams F. Anal Chem 51:1580, 1979c.
Vekemans B, Janssens K, Vincze L, Adams F, Van Espen P. X-Ray Spectrom 23:275, 1994.
Vekemans B, Janssens K, Vincze L, Adams F, Van Espen P. Spectrochim Acta 50B:149, 1995.


Verghese K, Mickael M, He T, Gardner RP. Adv X-Ray Anal 31:461, 1988.
Vincze L, Janssens K, Adams F. Spectrochim Acta 48B:553, 1993.
Vincze L, Janssens K, Adams F, Jones KW. Spectrochim Acta 50B:1481, 1995a.
Vincze L, Janssens K, Vekemans B, Adams F. Spectrochim Acta 54B:1711, 1999.
Vincze L, Janssens K, Adams F, Rivers ML, Jones KW. Spectrochim Acta 50B:127, 1995b.
Watjen U. Nucl Instrum Methods B22:29, 1987.
Wilkinson DH. Nucl Instrum Methods 95:259, 1971.
Wielopolski L, Gardner RP. Nucl Instrum Methods 165:297, 1979.
Yacout AM, Dunn WL. Adv X-Ray Anal 30:113, 1987.
Yacout AM, Gardner RP, Verghese K. Nucl Instrum Methods A243:121, 1986.
Yacout AM, Gardner RP, Verghese K. Adv X-Ray Anal 30:121, 1987.
Yule PH. Nucl Instrum Methods 54:61, 1967.
Zolnai L, Szabo Gy. Nucl Instrum Methods B34:118, 1988.

Copyright 2002 Marcel Dekker, Inc.
