Discussions, stats, and author profiles for this publication are available at: https://www.researchgate.net/publication/6402702
Authors include: Radu Oprean; Serge Rudaz (University of Geneva); Bruno Boulanger (University of Liège); Philippe Hubert (University of Liège).
Abstract
All analysts face the same situation: method validation is the process of proving that an analytical method is acceptable for its intended purpose. To resolve this problem, the analyst refers to regulatory or guidance documents, and the validity of analytical methods therefore depends on the guidance, terminology and methodology proposed in these documents. It is thus of prime importance to have clear definitions of the different validation criteria used to assess this validity. It is also necessary to have methodologies in accordance with these definitions and, consequently, to use statistical methods that are relevant to these definitions, to the objective of the validation and to the objective of the analytical method. The main purpose of this paper is to outline the inconsistencies between some definitions of the criteria and the experimental procedures proposed to evaluate those criteria in recent documents dedicated to the validation of analytical methods in the pharmaceutical field, together with the risks and problems encountered when trying to cope with contradictory, and sometimes scientifically irrelevant, requirements and definitions.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Validation; Guidelines; Terminology; Methodology; Accuracy profile
1. Introduction
The demonstration of the ability of an analytical method to quantify is of great importance to ensure the quality, safety and efficacy of pharmaceuticals. Consequently, before an analytical method can be implemented for routine use, it must first be validated to demonstrate that it is suitable for its intended purpose. While the need to validate methods is obvious, the procedures for performing a rigorous validation program are generally not defined. Even if regulatory documents allow the validation parameters that should be established to be selected, three main questions remain: (a) How should the regulatory definitions of the parameters be interpreted? (b) What specific procedure should be followed to evaluate a particular parameter?
Corresponding author. Tel.: +32 4 366 43 16; fax: +32 4 366 43 17.
E-mail address: Ph.hubert@ulg.ac.be (P. Hubert).
0021-9673/$ – see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.chroma.2007.03.111
Table 1
Definitions of selectivity and specificity in different international organizations

Organization | Definition | Reference
ICH | "Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be present. Typically these might include impurities, degradants, matrix, etc. Lack of specificity of an individual analytical procedure may be compensated for by other supporting analytical procedure(s)." | [4]
AOAC | "Test for interferences (specificity): (a) test the effect of impurities, ubiquitous contaminants, flavours, additives, and other components expected to be present and at unusual concentrations; (b) test nonspecific effects of matrices; (c) test the effects of transformation products, if the method is to indicate stability, and of metabolic products, if tissue residues are involved." | [8]

IUPAC: International Union of Pure and Applied Chemistry; WELAC: Western European Laboratory Accreditation Cooperation; ICH: International Conference on Harmonization; ISO: International Organization for Standardization; AOAC: Association of Official Analytical Chemists.
Fig. 1. Accuracy profiles of the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml) using (A) a linear regression model, (B) a weighted linear regression model with weight 1/X², (C) a linear regression model after logarithmic transformation, (D) quadratic regression. The dotted lines represent the acceptance limits (−15%, 15%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. The continuous line represents the estimated relative bias.
Fig. 2. Response functions for the LC–MS/MS assay for the determination of loperamide in plasma (concentration in pg/ml), for series 2 only, using (A) a linear regression model (R² = 0.9991), (B) a weighted linear regression model with weight 1/X² (R² = 0.9991), (C) a linear regression model after logarithmic transformation (R² = 0.9997), (D) quadratic regression (R² = 0.9991).
because it requires a lot of computing and is a post-data-acquisition scenario, e.g. the evaluation of all the putative calibration models before making a choice. Nowadays, computational power is no longer a limitation, and the selection of a model can be perfectly aligned with the objective of the method.
Having stressed the difference between response function and linearity, the concept of linearity can be applied not only to relative but also to absolute analytical methods, such as titration, for which the results are not obtained by back-calculation from a calibration curve. Attempts to provide a response function are then of no use and impracticable, as there is no signal or response, whereas the linearity of the results can still be assessed.
Statistical models for calibration curves can be either linear or non-linear in their parameters, as opposed to linear in shape. Indeed, a quadratic model Y = β0 + β1X + β2X² is linear in its parameters, because it is a linear combination of functions of X, even if its graphical representation looks curved on an X–Y plot. The choice between these two families of models depends on the type of method and/or the range of concentrations of interest. When a narrow range is considered, an unweighted linear model is usually adequate, while a larger range may require a more complex or weighted model.
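To illustrate that a quadratic calibration model is linear in its parameters, the following minimal sketch (with invented calibration data) fits Y = b0 + b1·X + b2·X² by ordinary least squares on the design matrix [1, X, X²]; no non-linear optimisation is needed.

```python
import numpy as np

# Hypothetical calibration data: concentration (x) vs. instrument response (y).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = np.array([2.1, 4.3, 8.2, 17.1, 36.0, 76.5])

# The quadratic model is linear in (b0, b1, b2): build the design matrix
# [1, X, X^2] and solve by ordinary least squares.
design = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ b   # fitted responses, curved on an X-Y plot
print(b)             # estimated b0, b1, b2
```

The same least-squares machinery applies to any model that is a linear combination of known functions of X, which is exactly what "linear in the parameters" means here.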
Fig. 3. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained for a high-performance thin-layer chromatography method using (left, a) a quadratic regression model and (right, b) a linear regression model. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−10%, 10%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
This being said, and because of the fitting techniques, the experimental design, i.e. the way the concentration values are spread over the range, may significantly impact the precision of the results, or inverse predictions, that the response function will provide. As shown by François et al. [31], depending on the model
Fig. 4. Standard curves (top, 1), linearity profiles (middle, 2) and accuracy profiles (bottom, 3) obtained for an immunoassay using (left, a) a weighted 4-parameter logistic model and (right, b) a linear regression on the most linear part of the response. For the linearity and accuracy profiles, the dotted lines represent the acceptance limits (−30%, 30%); the dashed lines the connected 95% tolerance intervals. When the tolerance intervals are included within the acceptance limits, the assay is able to quantify accurately; otherwise it is not. For the linearity profile, the continuous line represents the identity line (result = concentration), while for the accuracy profile the continuous line represents the estimated bias.
that will be used for the response function, some designs give more precise measurements than others. As a general rule of thumb for optimally choosing the concentration values, they show that for most models used in assays, from linear
relative bias (%) = 100 (x̄_i − μ_T)/μ_T
The ISO 5725 documents unambiguously state what trueness is and how to measure it. Applied to the validation experiments, this concept means that measuring several times independent validation standards (for instance i standards) for which the true value of the analyte concentration or amount (μ_T) is known allows their predicted concentrations or amounts, x_i, to be computed. It is then possible to compute the mean value of these predicted results (x̄_i) and consequently to estimate the bias, relative bias or recovery. These values are readily estimated, as they are routinely computed during the validation step of an analytical procedure. Trueness is related to the systematic error of the analytical procedure [2,5,6,34]. Trueness thus refers to a characteristic, or quality, of the analytical procedure and not to a result generated by that procedure. This nuance is fundamental, as we will see hereafter.
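The trueness estimates described here can be sketched in a few lines; the back-calculated results below are invented for illustration.

```python
import numpy as np

# Hypothetical back-calculated results for one validation standard whose
# true (spiked) concentration is mu_T = 100 ng/ml.
mu_T = 100.0
x = np.array([98.2, 101.5, 99.7, 102.3, 97.9, 100.6])

x_bar = x.mean()                        # mean of the predicted results
bias = x_bar - mu_T                     # absolute bias
relative_bias = 100.0 * bias / mu_T     # relative bias (%)
recovery = 100.0 * x_bar / mu_T         # recovery (%)

print(f"bias={bias:.3f}, relative bias={relative_bias:.3f}%, recovery={recovery:.2f}%")
```

Note that these statistics characterise the procedure's systematic error, not any single result.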
However, when looking for trueness in the regulatory documents for the validation of pharmaceutical analytical procedures, this concept is not defined per se. A conscientious reading of both the ICH Q2R1 [4] and the FDA Bioanalytical Method Validation [3] documents nonetheless shows that references to this concept are made. In ICH Q2R1 part 1, use of trueness is made: "The accuracy of an analytical procedure expresses the closeness of agreement between the value which is accepted either as a conventional true value or an accepted
bias or 100% recovery is included in the 1 − α confidence interval of the relative bias or recovery, respectively. If these values are outside their corresponding confidence interval, then the null hypothesis is rejected. However, when the null hypothesis is not rejected, the only conclusion that can be drawn is not that the bias, relative bias or recovery equals 0, 0% or 100%, but that the test could not demonstrate that the bias, relative bias or recovery differs from 0 or 100%. As clearly demonstrated in numerous publications [27,36–38], the β risk, i.e. the probability of wrongly accepting the null hypothesis, is not fixed by the user in this situation. Furthermore, this approach can conclude that the bias is significantly different from 0 even when it is analytically acceptable [27,36–38]. It will also tend to conclude that the bias is not different from 0 when the variability of the procedure is relatively high. In fact, the Student t-test used this way is a difference test, which answers the question: "Is the bias of my analytical procedure different from 0?" However, the question the analyst wishes to answer during the validation step of the analytical procedure is: "Is the bias of my analytical procedure acceptable?" The test that answers this last question is an equivalence or interval hypothesis test [27,36–38]. In this type of test, the analyst has to select acceptance limits for the bias, relative bias or recovery, i.e. limits such that, if the true bias, relative bias or recovery of the analytical procedure lies within them, the trueness of the procedure is acceptable. Several authors have recommended this type of test to assess the acceptability of a bias [27,38]. Indeed, a perfectly unbiased procedure is utopian. Furthermore, the bias obtained during the validation experiment is only an estimate of the true, unknown bias of the analytical procedure. Nevertheless, this interval hypothesis test, while statistically correct, does not answer the real analytical question: the very purpose of validation is to validate the results a method will produce, not the method itself. We will come back to this objective and explain in more detail, in Section 5, the connections between good results and good methods.
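One common way to run such an equivalence (interval hypothesis) test is the two one-sided tests logic: accept the bias if the 100(1 − 2α)% confidence interval of the relative bias lies entirely within the acceptance limits. The sketch below assumes SciPy is available for the Student t quantile; the data and ±15% limits are illustrative only.

```python
import numpy as np
from scipy import stats  # assumed available for the t-quantile

def bias_equivalence(results, mu_T, limit_pct=15.0, alpha=0.05):
    """Equivalence test on the relative bias: trueness is judged acceptable
    when the 100*(1 - 2*alpha)% confidence interval of the relative bias
    lies entirely inside [-limit_pct, +limit_pct]."""
    rel = 100.0 * (np.asarray(results, dtype=float) - mu_T) / mu_T
    n = rel.size
    mean = rel.mean()
    sem = rel.std(ddof=1) / np.sqrt(n)
    t = stats.t.ppf(1.0 - alpha, df=n - 1)
    lo, hi = mean - t * sem, mean + t * sem
    return (lo, hi), bool(-limit_pct < lo and hi < limit_pct)

# Hypothetical validation results for a standard with true value 100:
(ci_lo, ci_hi), acceptable = bias_equivalence(
    [98.0, 102.0, 99.0, 101.0, 100.0, 97.0], mu_T=100.0)
```

Unlike the difference test, the user here fixes the analytically meaningful acceptance limits in advance, so a bias that is statistically non-zero but practically negligible is still accepted.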
4.2. Precision
Contrary to trueness, homogeneous definitions of precision can be found in the regulatory documentation. For instance, the ICH Q2R1 part 1 definition of precision is: "The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions." This definition of precision is consistent with those found in the FDA Bioanalytical Method Validation, ISO, Eurachem, IUPAC, FAO and AMC documents. As stated in all of these documents, precision is expressed as a standard deviation (s), variance (s²), relative standard deviation (RSD) or coefficient of variation (CV). It measures the random error linked to the analytical procedure, i.e. the dispersion of the results around their average value. The estimate of precision is independent of the true or specified value and of the mean or trueness estimate. Each document refers to different precision levels. For the ICH Q2R1 and ISO documents, three levels can be assessed:
MSM_j = [1/(p − 1)] Σ_{i=1}^{p} n (x̄_ij,calc − x̄_j)²

x̄_j = [1/(pn)] Σ_{i=1}^{p} Σ_{k=1}^{n} x_ijk,calc

with x_ijk,calc being the concentration calculated from the selected response function (p series, n replicates per series).

MSE_j = [1/(pn − p)] Σ_{i=1}^{p} Σ_{k=1}^{n} (x_ijk,calc − x̄_ij,calc)²
Table 2
Experimental design of four runs taking into account days, operators and equipment as sources of variability

Run   | Day   | Operator   | Equipment
Run 1 | Day 1 | Operator 1 | Equipment 2
Run 2 | Day 1 | Operator 2 | Equipment 1
Run 3 | Day 2 | Operator 1 | Equipment 1
Run 4 | Day 2 | Operator 2 | Equipment 2
If MSM_j > MSE_j:

σ̂²_W,j = MSE_j
σ̂²_B,j = (MSM_j − MSE_j)/n

Else:

σ̂²_W,j = [1/(pn − 1)] Σ_{i=1}^{p} Σ_{k=1}^{n} (x_ijk,calc − x̄_j,calc)²
σ̂²_B,j = 0

The intermediate-precision variance and the between/within variance ratio are then

σ̂²_j = σ̂²_W,j + σ̂²_B,j
R_j = σ̂²_B,j / σ̂²_W,j

and the relative standard deviation is

RSD (%) = 100 σ̂ / x̄

where σ̂² is the estimated variance and x̄ is the estimated average value.
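The variance-component computation for one concentration level can be sketched with a one-way ANOVA on p series of n replicates each; the data below are invented, and MSM/MSE denote the between-series and within-series mean squares.

```python
import numpy as np

# Hypothetical back-calculated concentrations at one level:
# p = 3 series (e.g. days), n = 4 replicates per series.
x = np.array([[ 99.1, 100.4,  98.7, 100.9],
              [101.8, 102.5, 101.1, 102.0],
              [ 98.9,  99.6, 100.2,  99.0]])
p, n = x.shape

grand_mean = x.mean()
series_means = x.mean(axis=1)

# One-way ANOVA mean squares (between series / within series).
MSM = n * ((series_means - grand_mean) ** 2).sum() / (p - 1)
MSE = ((x - series_means[:, None]) ** 2).sum() / (p * (n - 1))

if MSM > MSE:
    var_within = MSE
    var_between = (MSM - MSE) / n
else:
    var_within = ((x - grand_mean) ** 2).sum() / (p * n - 1)
    var_between = 0.0

var_IP = var_within + var_between                 # intermediate-precision variance
RSD_IP = 100.0 * np.sqrt(var_IP) / grand_mean     # relative standard deviation (%)
```

Here the between-series component captures day-to-day (or run-to-run) variability on top of the repeatability, which is why the intermediate-precision variance is their sum.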
When precision is expressed as an RSD, the corresponding variance is used, e.g. the repeatability or intermediate-precision variance. The computed RSD is therefore the ratio of two random variables, giving a new parameter with high uncertainty. However, in the case of validation of an analytical procedure, because the true or reference value is known, the denominator should be replaced by the corresponding true value μ_T. The RSD computed this way depends only on the estimated precision (the estimated variances), regardless of the estimated trueness.
This being said, the use of a relative estimate is convenient for direct reading but nevertheless raises a series of questions: what matters most for the results, the (absolute) variance or the relative standard deviation? Imagine that a bioanalytical method is used to support a pharmacokinetic study. In that case, the results are used to fit the non-linear PK model, and what matters is only the variance of the results, or the variance of the logarithms of the results, not the RSD at all. Remember that a procedure is validated for its intended use. So what is the relevance of deciding on the acceptance of a method based on the RSD when only the variance of its results matters with regard to its intended use? This distinction becomes particularly important when
dealing with the LOQ. Indeed, since the RSD is the SD divided by the true concentration value, the RSD becomes large at the lower end of the range simply because the SD is divided by a small number, not because the method becomes less precise. A good example can be seen by comparing the same information in Fig. 3.a.2, in absolute scale, and Fig. 3.a.3, in relative scale. In Fig. 3.a.2 the distance between the two dashed lines represents a multiple of the intermediate precision in absolute value, while in Fig. 3.a.3 it is the same value expressed in relative terms. While it appears in the latter figure (a.3) that the relative intermediate precision (RSD) explodes at the smallest concentration, leading to the conclusion that the results are not precise enough at that level, it is also clear that, in this example, the absolute intermediate precision improves at the smallest concentration because the intermediate-precision SD is smaller. The contradiction comes from the fact that the SD has been divided by a small number, not from the measurements being less precise; quite the contrary. This raises questions about the meaning and the definition of the LOQ. Indeed, why ignore or discard results at those low levels when they are obtained with a variance much smaller than that of results at high concentrations? Once again, the answer lies in the intended use of the results: for supporting stability or pharmacokinetic studies, not only is it irrelevant to discard those very precise measurements at small concentrations, but they are also very useful, for example, in accurately estimating the half-life or the pharmacokinetics of metabolites. Only the variance or the SD matters, not the RSD. So, while common practice evaluates a method with respect to the relative expression of precision, scientists in the laboratories should carefully consider the absolute, fundamental variance before discarding data, and question whether doing so serves the objectives of the study.
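The SD-versus-RSD point can be made with three invented precision estimates: the lowest level has the smallest absolute SD yet the largest RSD, purely because of the division by a small true value.

```python
import numpy as np

# Hypothetical intermediate-precision SDs at three concentration levels (pg/ml).
levels = np.array([10.0, 100.0, 1000.0])
sd     = np.array([ 0.8,   3.0,   40.0])  # absolute SD is best at the low level

rsd = 100.0 * sd / levels  # relative SD (%): [8.0, 3.0, 4.0]
# The RSD peaks at the lowest level even though that level has the
# smallest absolute SD of the three.
```

Judging the low level by RSD alone would reject the most precise measurements in the set, which is exactly the contradiction discussed above.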
4.3. Accuracy
In ICH Q2R1 part 1 [4], accuracy is defined as "... the closeness of agreement between the value which is accepted either as a conventional true value or an accepted reference value and the value found". This definition corresponds to that of the ISO documents [5,6] and of the VIM [32], which states that accuracy is "the closeness of agreement between a test result and the accepted reference value". Furthermore, the ISO definition adds a note specifying that accuracy is the combination of random error and systematic error, or bias. From this, and as specified by the Analytical Methods Committee (AMC) [34], it is easily understood that accuracy rigorously applies to results and not to analytical methods, laboratories or operators. The AMC also points out that accuracy should be used this way in formal writing.
Therefore, accuracy denotes the absence of error of a result. Similar definitions of accuracy are found in the Eurachem document [33].
The total measurement error of the results obtained from an
analytical procedure is related to the closeness of agreement
between the value found, i.e. the result, and the value that is
accepted either as a conventional true value or an accepted reference value. The closeness of agreement observed is based on
the sum of the systematic and random errors, namely the total
error linked to the result. Consequently, the measurement error
is the expression of the sum of trueness (or bias) and precision
(or standard deviation), i.e. the total error. As shown below, each
measurement X has three components: the true sample value T ,
the bias of the method (estimated by the mean of several results)
and the precision (estimated by the standard deviation or, in most
cases, the intermediate precision). Equivalently, the difference
between an observation X and the true value is the sum of the
systematic and random errors, i.e. total error or measurement
error.
X = T + bias + precision
X T = bias + precision
X T = total error
X T = measurement error
X T = accuracy
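A minimal numerical sketch of this decomposition, with all values hypothetical, shows that the measurement error of a single result is the sum of its systematic and random parts:

```python
mu_T = 100.0        # true value of the sample
x = 104.2           # a single measured result

bias = 2.5                       # hypothetical systematic error of the method
random_error = x - mu_T - bias   # remaining random part of this measurement

total_error = x - mu_T           # measurement error (accuracy) of this result
# By construction, total_error equals bias + random_error: accuracy applies
# to the result and combines both error components.
```

This is why an accuracy assessment must account for precision as well as trueness: a result can be wrong through either component.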
However, when looking at the section on accuracy in part 2 of the ICH Q2R1 document, the recommended data for documenting accuracy are presented as: "accuracy should be reported as percent recovery by the assay of known added amount of analyte in the sample or as the difference between the mean and the accepted true value together with the confidence intervals". This no longer refers to accuracy but instead to the trueness definition of the ISO 5725 document, because it is the average value of several results, as opposed to a single result as for accuracy, that is compared to the true value, as already stated. This section consequently refers to systematic error, whereas accuracy as defined in ICH Q2R1 part 1 and ISO 5725 part 1 corresponds to the evaluation of the total measurement error. In the FDA Bioanalytical Method Validation document [3], accuracy is defined as "... the closeness of mean test results obtained by the method to the true value (concentration) of the analyte. (...) The mean value should be within 15% of the actual value except at LLOQ, where it should not deviate by more than 20%. The deviation of the mean from the true value serves as the measure of accuracy." As already mentioned in the previous sections, this definition corresponds to the trueness of the analytical method. For bioanalytical methods, earlier reviews have already stressed this difference between the definitions of accuracy and trueness [1,2,27,38].
For most uses it does not matter whether a deviation from
the true value is due to random error (lack of precision) or to
6. Dosing range

For any quantitative method, it is necessary to determine the range of analyte concentrations or property values over which the method may be applied. The ICH Q2R1 part 1 document defines the range of an analytical procedure as "the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy and linearity". The FDA Bioanalytical Method Validation definition of the quantification range is "the range of concentration, including ULOQ and LLOQ, that can be reliably and reproducibly quantified with accuracy and precision through the use of a concentration–response relationship", where LLOQ is the lower limit of quantitation and ULOQ is the upper limit of quantitation. Thus, the above-mentioned definitions are quite similar, because for both of them the range is correlated with the linearity and the accuracy (trueness + precision). Moreover, both documents specify that the range depends on the specific application of the procedure. ICH Q2R1 part 2 states that the specified range is "established by confirming that the analytical procedure provides an acceptable degree of linearity, accuracy and precision when applied to samples containing amounts of analyte within or at the extremes of the specified range of the analytical procedure". IUPAC defines the range as a set of values of the measurand for which the error of a measuring instrument is intended to lie within specified limits.
The range should be anticipated at an early stage of method development, and its selection is based on prior information about the sample in the particular study. The chosen range determines the number of standards used in constructing a calibration curve.
ICH Q2R1 part 2 recommends the following minimum specified ranges for different studies:
(i) for the assay of a drug substance or a finished (drug) product: normally from 80 to 120% of the test concentration;
(ii) for content uniformity: covering a minimum of 70–130% of the test concentration, unless a wider, more appropriate range, based on the nature of the dosage form (e.g. metered-dose inhalers), is justified;
(iii) for dissolution testing: ±20% over the specified range;
(iv) for the determination of an impurity: from the reporting level of the impurity to 120% of the specification.
Therefore, the dosing range is the concentration or amount interval over which the total error of measurement, or accuracy, is acceptable. It is essential to demonstrate the accuracy of the results over the entire range. Consequently, and in order to fulfil these definitions, the ICH proposal to perform six measurements at the 100% level of the test concentration only, in order to assess the precision of the analytical method, should be applied with caution so as to remain consistent with the definition of the range. Accuracy, and therefore trueness and precision, should be evaluated experimentally and shown to be acceptable over the whole range targeted for the application of the analytical procedure.

7. Limit of quantitation
DL = 3.3 σ/S

QL = 10 σ/S

where σ is the standard deviation of the response and S is the slope of the calibration curve.
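A common ICH Q2(R1) estimate of the quantitation limit is QL = 10σ/S, with the detection limit DL = 3.3σ/S, where σ is the standard deviation of the response (e.g. the residual SD of the calibration line) and S is the calibration slope. A tiny sketch with invented values:

```python
# Hypothetical calibration statistics.
sigma = 0.05   # residual SD of the response
S = 0.20       # slope of the calibration curve (response per ng/ml)

DL = 3.3 * sigma / S    # detection limit estimate
QL = 10.0 * sigma / S   # quantitation limit estimate
print(DL, QL)
```

As the text argues, such a formula-based LOQ should still be confronted with the intended use of the results, since very precise low-level measurements may be discarded for the wrong reason.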