Published on IVT Network (http://www.ivtnetwork.com)

Statistical Analysis in Analytical Method Validation

By Eugenie Webster (Khlebnikova) Dec 16, 2013 8:57 pm EST

Peer Reviewed: Method Validation


The views and opinions expressed in this article are those of the individual author and should not be attributed to any
company with which the author is now or has been employed or affiliated.

Abstract
This paper discusses an application of statistics in analytical method validation. The objective of this paper is to provide an
overview of regulatory expectations related to statistical analysis and the review of common statistical techniques used to
analyze analytical method validation data with specific examples. The examples provided cover the minimum expectations of
regulators.

Key Points
The following key points are presented:

Regulatory guidelines regarding statistical data analysis in analytical method validation.
Statistics used to analyze analytical method validation data, such as the mean, standard deviation, confidence intervals, and
linear regression.
Data analysis using statistical packages such as Minitab and Excel.

Introduction
Analytical method validation is an important aspect in the pharmaceutical industry and is required during drug development
and manufacturing. The objective of validation of an analytical method is to demonstrate that the method is suitable for the
intended use, such as evaluation of a known drug for potency, impurities, etc. The intent of method validation is to provide
scientific evidence that the analytical method is reliable and consistent before it can be used in routine analysis of drug
product. Analytical method validation is governed by the International Conference on Harmonisation (ICH) guideline Q2(R1),
Validation of Analytical Procedures: Text and Methodology (1). The ICH guideline on performing analytical method validation
provides requirements to demonstrate method specificity, accuracy, precision (repeatability, intermediate precision, and
reproducibility), detection limit, quantitation limit, linearity, range, and robustness. The ICH definitions for validation
characteristics are listed in Table I.

Table I: Validation Characteristics for Analytical Method Validation.

Validation Characteristic | ICH Definition
Specificity | Specificity is the ability to assess unequivocally the analyte in the presence of components which may be expected to be present. Typically these might include impurities, degradants, matrix, etc.
Accuracy | The accuracy of an analytical procedure expresses the closeness of agreement between the value that is accepted either as a conventional true value or an accepted reference value and the value found. This is sometimes termed trueness.
Precision | The precision of an analytical procedure expresses the closeness of agreement (degree of scatter) between a series of measurements obtained from multiple sampling of the same homogeneous sample under the prescribed conditions. Precision may be considered at three levels: repeatability, intermediate precision, and reproducibility.
Repeatability | Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability is also termed intra-assay precision.
Intermediate precision | Intermediate precision expresses within-laboratories variations: different days, different analysts, different equipment, etc.
Reproducibility | Reproducibility expresses the precision between laboratories (collaborative studies usually applied to standardization of methodology).
Detection Limit | The detection limit of an individual analytical procedure is the lowest amount of analyte in a sample that can be detected but not necessarily quantitated as an exact value.
Quantitation Limit | The quantitation limit of an individual analytical procedure is the lowest amount of analyte in a sample that can be quantitatively determined with suitable precision and accuracy. The quantitation limit is a parameter of quantitative assays for low levels of compounds in sample matrices and is used particularly for the determination of impurities and/or degradation products.
Linearity | The linearity of an analytical procedure is its ability (within a given range) to obtain test results that are directly proportional to the concentration (amount) of analyte in the sample.
Range | The range of an analytical procedure is the interval between the upper and lower concentration (amounts) of analyte in the sample (including these concentrations) for which it has been demonstrated that the analytical procedure has a suitable level of precision, accuracy, and linearity.
Robustness | The robustness of an analytical procedure is a measure of its capacity to remain unaffected by small, but deliberate, variations in method parameters and also provides an indication of its reliability during normal usage.

The validation characteristics should be investigated based on the nature of the analytical method. Results for each applicable
validation characteristic are compared against the selected acceptance criteria and are summarized in the analytical method
validation report. ICH also provides recommendations on statistical analysis required to demonstrate method suitability. These
recommendations are further discussed in the following sections.

In addition to ICH, the US Food and Drug Administration draft guidance, Guidance for Industry: Analytical Procedures and
Methods Validation (2), can be consulted for detailed information on the US requirements.

Statistics in Analytical Method Validation


Statistical analysis of data obtained during a method validation should be performed to demonstrate validity of the analytical
method. The statistics required for the interpretation of analytical method validation results are the calculation of the mean,
standard deviation, relative standard deviation, confidence intervals, and regression analysis. These calculations are typically
performed using statistical software packages such as Excel and Minitab. The purpose of statistical analysis is to summarize
a collection of data in a way that provides an understanding of the examined method characteristic. The acceptance criteria
for each validation characteristic are typically set on the individual values as well as on the mean and relative standard
deviation. The statistical analysis explained in this paper is based on the assumption of normally distributed data.
Non-normally distributed data will need to be transformed prior to performing any statistical analysis. The statistical tools,
with examples of the application of each tool, are described in the following sections.

Mean

The mean, or average, of a data set is the most basic and most commonly used statistic. The mean is calculated by adding all
data points and dividing the sum by the number of data points. It is typically denoted by X̄ (X bar) and is computed using the
following formula:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where Xi are individual values and n is the number of individual data points.

Standard Deviation

The standard deviation of a data set is the measure of the spread of the values in the sample set and is computed by
measuring the difference between the mean and the individual values in a set. It is computed using the following formula:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

where Xi is an individual value, X̄ is the sample mean, and n is the number of individual data points.

Relative Standard Deviation

The relative standard deviation is computed by multiplying the standard deviation of the sample set by 100% and dividing
it by the sample set average. The relative standard deviation is expressed as a percentage. Typically, the acceptance criteria
for accuracy, precision, and repeatability of data are expressed in % RSD:

$$\%\,RSD = \frac{s}{\bar{X}} \times 100\%$$
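
As a minimal sketch, the three statistics described so far can be computed with Python's standard library. The data are the six injection peak areas reported in Table II of this article; the variable names are illustrative only and are not part of the original paper.

```python
import statistics

# Peak areas for the six replicate injections reported in Table II.
areas = [451662, 450752, 447638, 452541, 449321, 448747]

mean = statistics.mean(areas)   # arithmetic mean: sum of the values divided by n
sd = statistics.stdev(areas)    # sample standard deviation (n - 1 in the denominator)
rsd = sd / mean * 100           # relative standard deviation, in percent

print(f"mean = {mean:.0f}, s = {sd:.0f}, %RSD = {rsd:.2f}")
# prints: mean = 450110, s = 1861, %RSD = 0.41
```

Note that `statistics.stdev` uses the n - 1 denominator, which is the form that reproduces the standard deviation of 1861 listed in Table II.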

Confidence Interval

Confidence intervals are used to indicate the reliability of an estimate. A confidence interval provides limits around the sample
mean within which the true population mean is expected to lie. The interval is usually based on a confidence level of 95%. The
confidence interval depends on the sample standard deviation, the sample mean, and the number of data points.

$$\text{Confidence interval for } \mu = \bar{X} \pm z \, \frac{s}{\sqrt{n}}$$

where s is the sample standard deviation, X̄ is the sample mean, n is the number of individual data points, and z is a constant
obtained from statistical tables for z.

The value of z depends on the confidence level. For 95%, z is 1.96 (3). For small samples, z can be replaced by the t-value
obtained from Student's t-distribution tables (4). The value of t corresponds to n - 1 degrees of freedom.

Table II provides an example of a typical data analysis summary for the evaluation of system precision for a high-performance
liquid chromatography (HPLC) analysis.

Table II: An Example of a System Precision Determination for an HPLC Analysis.

Injection Number | Area Response for Analyte Peak in Standard
1 | 451662
2 | 450752
3 | 447638
4 | 452541
5 | 449321
6 | 448747
Mean | 450110
Standard Deviation | 1861
% RSD | 0.41
95% Confidence Interval | 448621 to 451599

Confidence interval (CI) for µ = 450110 ± 1.96 × (1861/√6) = 450110 ± 1489 = 448621 to 451599.

The calculated confidence interval in Table II indicates that, at the 95% confidence level, the true population mean is expected
to lie between 448621 and 451599.
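
The interval in Table II can be checked with a short sketch such as the one below. The z value is obtained from the standard normal distribution. The t value for 5 degrees of freedom (2.571) is hard-coded from standard t tables, which is an assumption of this sketch rather than something computed by the article.

```python
import math
import statistics

# Peak areas for the six replicate injections reported in Table II.
areas = [451662, 450752, 447638, 452541, 449321, 448747]
n = len(areas)
mean = statistics.mean(areas)
sem = statistics.stdev(areas) / math.sqrt(n)   # standard error of the mean, s / sqrt(n)

# z for a two-sided 95% interval (about 1.96), from the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)
half_width_z = z * sem

# Two-sided 95% Student's t value for n - 1 = 5 degrees of freedom,
# hard-coded from standard t tables (an assumption of this sketch).
T_95_DF5 = 2.571
half_width_t = T_95_DF5 * sem

print(f"z-based: {mean:.0f} ± {half_width_z:.0f}")
print(f"t-based: {mean:.0f} ± {half_width_t:.0f}")
```

The z-based half-width reproduces the ±1489 shown above. The t-based interval is noticeably wider, which illustrates why the t-distribution is usually preferred for small samples such as n = 6.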

Bayesian statistics is an alternative approach to confidence intervals, which is well explained in the reference provided in the
references section (11).

Regression Analysis

Regression analysis is used to evaluate a linear relationship between test results. A linear relationship is usually evaluated
across the range of the analytical procedure. The data obtained from analysis of the solutions prepared at a range of different
concentration levels is usually investigated by plotting on a graph.

Linear regression evaluates the relationship between two variables by fitting a linear equation to observed data. A linear
regression line has an equation of the form Y = b0 + b1X, where X is the independent variable and Y is the dependent
variable. The slope of the line is b1, and b0 is the intercept (the value of Y when X = 0). The statistical procedure of finding the
“best-fitting” straight line is to obtain a line through the points to minimize the deviations of the points from the prospective line.
The best-fit criterion of goodness of the fit is known as the principle of least squares. In mathematical terms, the best fitting line
is the line that minimizes the sum of squares of the deviations of the observed values of Y from those predicted.
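
In symbols, the least-squares criterion can be sketched as follows. This is the standard textbook result written in the b0 and b1 notation used here; the symbol Q for the sum of squared deviations is introduced only for this illustration.

```latex
\begin{aligned}
Q(b_0, b_1) &= \sum_{i=1}^{n} \left( Y_i - b_0 - b_1 X_i \right)^2 \\[4pt]
\frac{\partial Q}{\partial b_0} = 0, \quad \frac{\partial Q}{\partial b_1} = 0
\quad &\Longrightarrow \quad
b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
\end{aligned}
```

Setting both partial derivatives to zero gives two linear equations in b0 and b1, whose solution is the slope and intercept shown on the right.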

Once a regression model has been fit to a group of data, examination of the residuals (the deviations from the fitted line to the
observed values) allows investigation of the validity of the assumption that a linear relationship exists. Plotting the residuals on
the y-axis against the independent variable on the x-axis reveals any possible non-linear relationship among the variables and
may highlight outliers that warrant investigation.

Table III provides an example of data that is evaluated for linearity.

Table III: Solution Concentration versus Measurements.

Concentration X | Response Y (Measurement)
10 | 213
20 | 378
30 | 629
40 | 848
50 | 994
60 | 1227

In this example, the measurement values (Response Y) are plotted against the corresponding concentrations (X); refer to Figure 1.

Figure 1: Fitted Line Plot.


In this example, the data clearly show a linear relationship. The fitted or estimated regression line equation is computed using
the following formula:

Y = b0 + b1X + ei

where b0 is the y-intercept, b1 is the slope of the line, and ei is the residual.

Table IV provides the calculations that are used to compute y-intercept and the line slope.

Table IV: Manual Calculations.

Xi | Yi | (Xi)² | XiYi | (Yi-Ȳ)² | (Xi-X̄)(Yi-Ȳ) | (Xi-X̄)²
10 | 213 | 100 | 2130 | 251837 | 12546 | 625
20 | 378 | 400 | 7560 | 113457 | 5053 | 225
30 | 629 | 900 | 18870 | 7367 | 429 | 25
40 | 848 | 1600 | 33920 | 17733 | 666 | 25
50 | 994 | 2500 | 49700 | 77934 | 4188 | 225
60 | 1227 | 3600 | 73620 | 262315 | 12804 | 625
A | B | C | D | E | F | G
Σ Xi | Σ Yi | Σ (Xi)² | Σ XiYi | Σ (Yi-Ȳ)² | Σ (Xi-X̄)(Yi-Ȳ) | Σ (Xi-X̄)²
210 | 4289 | 9100 | 185800 | 730643 | 35685 | 1750

Table V provides the mathematical formulas and calculations for data listed in Table IV.

Table V: Manual Calculations for Error.

Mathematical Expression | Calculation
b1 = Σ (Xi-X̄)(Yi-Ȳ) / Σ (Xi-X̄)² = F/G | 35685/1750 = 20.39
b0 = Ȳ - b1 × X̄ | 714.83 - 20.3914 × 35 = 1.13

Thus, the equation of the line is Y = 1.13 + 20.39 × X.
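
The manual calculation can be reproduced with a few lines of Python. This is a sketch under the assumption that the data are exactly the six pairs in Table III; the variable names (`s_xy`, `s_xx`, and so on) are illustrative.

```python
# Concentration (X) and response (Y) pairs from Table III.
x = [10, 20, 30, 40, 50, 60]
y = [213, 378, 629, 848, 994, 1227]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# These two sums correspond to columns F and G of Table IV.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx           # slope
b0 = y_bar - b1 * x_bar    # intercept

# Residuals: observed response minus the response predicted by the fitted line.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"Y = {b0:.2f} + {b1:.2f} X")   # prints: Y = 1.13 + 20.39 X
for xi, ei in zip(x, residuals):
    print(f"X = {xi}: residual = {ei:+.1f}")
```

Carrying the unrounded slope through to the intercept is what gives 1.13. Using the rounded values 715 and 20.39 would give a slightly different intercept. The residuals printed at the end are the values that would be plotted against X when checking the linearity assumption.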

The other important calculations that are typically reported are the coefficient of determination (R²) and the linear correlation
coefficient (r). The coefficient of determination (R²) measures the proportion of the variation in the response that is explained
by the model. Ideally, R² should be equal to one, which would indicate zero error. The correlation coefficient (r) is the
correlation between the predicted and observed values. This will have a value between 0 and 1; the closer the value is to 1, the
better the correlation. Any data that form a straight line will give a high correlation coefficient; therefore, extra caution should
be taken when interpreting the correlation coefficient. Additional statistical analysis is recommended to provide estimates of
systematic errors, not just the correlation of results. For instance, in method comparison studies, if one method gives
consistently higher results than the other method, the results would show linear correlation and have a high correlation
coefficient, despite a difference between the two methods.

Table VI provides equations that are used to determine the coefficient of determination (R²) and the correlation coefficient (r).

Table VI: Line Equation Formulas.

Mathematical Expression | Calculation
SS total = Σ (Yi-Ȳ)² (E) | 730643
SS regression = [Σ (Xi-X̄)(Yi-Ȳ)]² / Σ (Xi-X̄)² = F²/G | 35685²/1750 = 727668
SS error = SS total - SS regression | 730643 - 727668 = 2975
R² = SS regression / SS total | 727668/730643 = 0.996
r = √R² | √0.996 = 0.998
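
As a cross-check on the sums of squares, the sketch below computes each term directly from its definition (total variation, variation of the fitted values, and variation of the residuals) rather than from the shortcut formulas. It assumes the Table III data; the variable names are illustrative.

```python
import math

# Concentration (X) and response (Y) pairs from Table III.
x = [10, 20, 30, 40, 50, 60]
y = [213, 378, 629, 848, 994, 1227]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                 # total variability in Y
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)       # explained by the line
ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left in the residuals

r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)

print(f"SS total = {ss_total:.0f}, SS regression = {ss_regression:.0f}, SS error = {ss_error:.0f}")
print(f"R^2 = {r_squared:.3f}, r = {r:.3f}")
```

The three sums satisfy SS total = SS regression + SS error, which is the identity the manual table relies on.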

Figure 2 demonstrates the Excel output, and Figure 3 demonstrates the Minitab output.

Figure 2: Excel Output.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.998
R Square             0.996
Adjusted R Square    0.995
Standard Error       27.270
Observations         6

ANOVA
              df    SS          MS          F           Significance F
Regression    1     727668.1    727668.1    978.4744    0.000
Residual      4     2974.705    743.6762
Total         5     730642.8

                Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       1.13           25.39            0.04     0.97      -69.35      71.62       -69.35        71.62
X Variable 1    20.39          0.65             31.28    0.00      18.58       22.20       18.58         22.20

Figure 3 demonstrates Minitab output.

Figure 3: Minitab Output.

Regression Analysis: Y versus X

The regression equation is:


Y = 1.13 + 20.39 X

Predictor Coef SE Coef T P


Constant 1.13 25.39 0.04 0.967
X 20.3914 0.6519 31.28 0.000

S = 27.2704 R-Sq = 99.6% R-Sq(adj) = 99.5%

ANOVA

Source DF SS MS F P
Regression 1 727668 727668 978.47 0.000
Residual Error 4 2975 744
Total 5 730643

Table VII provides the summary of linear regression calculations.

Table VII: Regression Summary.

Data | Description | Value Calculated
Equation of the line | The relationship between the concentration, x, and the response, y: Y = b0 + b1 × X | Y = 1.13 + 20.39 X
Intercept (b0) | The value of y when x equals zero. | 1.13
Slope (b1) | The slope of the line relates the response to the concentration. | 20.39
SE (intercept) | The standard error of the intercept can be used to calculate the required confidence interval. | 25.39
95% confidence interval (intercept) | | -69.35 to 71.62
SE (slope) | The standard error of the slope can be used to calculate the required confidence interval. | 0.65
95% confidence interval (slope) | | 18.58 to 22.20
Coefficient of determination, r² | The square of the correlation coefficient. | 0.996
Correlation coefficient, r | The correlation between the predicted and observed values. This will have a value between 0 and 1; the closer the value is to 1, the better the correlation. | 0.998
Regression SS | The regression sum of squares is the variability in the response that is accounted for by the regression line. | 727668
Residual SS (the error sum of squares) | The residual sum of squares is the variability about the regression line (the amount of uncertainty that remains). | 2975
Total SS | The total sum of squares is the total amount of variability in the response. | 730643
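
The standard errors and 95% confidence intervals reported by Excel and Minitab can be reproduced with the usual simple-linear-regression formulas, as sketched below. The formulas for the standard errors are the standard textbook ones and are not spelled out in this article. The t value for 4 degrees of freedom (2.7764) is hard-coded from standard tables as an assumption of this sketch.

```python
import math

# Concentration (X) and response (Y) pairs from Table III.
x = [10, 20, 30, 40, 50, 60]
y = [213, 378, 629, 848, 994, 1227]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual standard deviation ("Standard Error" in Excel, "S" in Minitab), n - 2 df.
residual_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(residual_ss / (n - 2))

se_slope = s / math.sqrt(s_xx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# Two-sided 95% Student's t value for n - 2 = 4 degrees of freedom (from tables).
T_95_DF4 = 2.7764
slope_ci = (b1 - T_95_DF4 * se_slope, b1 + T_95_DF4 * se_slope)
intercept_ci = (b0 - T_95_DF4 * se_intercept, b0 + T_95_DF4 * se_intercept)

print(f"S = {s:.4f}")
print(f"slope:     {b1:.2f}, SE {se_slope:.4f}, 95% CI {slope_ci[0]:.2f} to {slope_ci[1]:.2f}")
print(f"intercept: {b0:.2f}, SE {se_intercept:.2f}, 95% CI {intercept_ci[0]:.2f} to {intercept_ci[1]:.2f}")
```

The intercept interval spans zero, which is consistent with the large P-value (0.97) reported for the intercept in the Excel and Minitab outputs.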

Other Statistical Tools

Other statistical tools used in method validation include comparative studies using Student's t-test, analysis of variance
(ANOVA), design of experiments, and assessment of outliers. Information on these statistical tools can be obtained
from the statistical books listed in the references section.

ICH Data Analysis Recommendations


The ICH guidelines provide suggestions regarding data reporting and analysis. Statistics recommended by ICH to evaluate
method suitability are listed below.

Specificity

The results from specificity studies are typically interpreted by a visual inspection. Quantitative interpretation may also be
performed using analytical software that is able to manipulate spectral information to analyze spectra.

Accuracy

ICH recommends accuracy assessment using a minimum of nine determinations at three concentration levels covering the
specified range. It should be reported as percent recovery by the assay of known amount of analyte in the sample or as the
difference between the mean and the accepted value together with the confidence intervals. Table VIII provides an example of
accuracy data assessment.

Table VIII: Accuracy Example.

Recovery (%) at each concentration level
Determination | 70% | 100% | 130%
1 | 100.1 | 99.2 | 99.8
2 | 99.5 | 99.7 | 99.5
3 | 99.3 | 99.6 | 99.4
Mean | 99.6 | 99.5 | 99.6
% RSD | 0.4 | 0.3 | 0.2
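
A minimal sketch of how the summary rows of Table VIII can be computed is shown below. The dictionary layout and names are illustrative choices, not part of the original article.

```python
import statistics

# Percent recovery for three determinations at each concentration level (Table VIII).
recovery = {
    "70%": [100.1, 99.5, 99.3],
    "100%": [99.2, 99.7, 99.6],
    "130%": [99.8, 99.5, 99.4],
}

summary = {}
for level, values in recovery.items():
    mean = statistics.mean(values)
    rsd = statistics.stdev(values) / mean * 100   # % RSD with the n - 1 standard deviation
    summary[level] = (round(mean, 1), round(rsd, 1))
    print(f"{level}: mean recovery = {mean:.1f}%, %RSD = {rsd:.1f}")
```

In practice, each mean and % RSD would then be compared against the acceptance criteria defined in the validation protocol.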
Precision

Precision is evaluated by comparing results obtained from samples prepared to test the following conditions:

Repeatability expresses the precision under the same operating conditions over a short interval of time. Repeatability
is also termed intra-assay precision.
Intermediate precision expresses within-laboratories variations: different days, different analysts, different equipment,
etc.
Reproducibility expresses the precision between laboratories (collaborative studies, usually applied to standardization
of methodology).

Table IX provides an example of a typical data analysis summary for the evaluation of a precision study for an analytical
method. In this example, the method was tested in two different laboratories, by two different analysts in each laboratory,
on two different instruments.

Table IX: Example of Results Obtained for a Precision Study.

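Laboratory | Laboratory 1 | Laboratory 1 | Laboratory 1 | Laboratory 1 | Laboratory 2 | Laboratory 2 | Laboratory 2 | Laboratory 2
Analyst | Analyst 1 | Analyst 1 | Analyst 2 | Analyst 2 | Analyst 3 | Analyst 3 | Analyst 4 | Analyst 4
Day | Day 1 | Day 2 | Day 1 | Day 2 | Day 3 | Day 4 | Day 3 | Day 4
Instrument | HPLC 1 | HPLC 1 | HPLC 1 | HPLC 1 | HPLC 1 | HPLC 1 | HPLC 1 | HPLC 1
Sample 1 | 99.93 | 99.78 | 100.23 | 100.97 | 99.56 | 100.17 | 100.21 | 99.23
Sample 2 | 99.08 | 99.63 | 100.35 | 100.24 | 99.03 | 100.33 | 101.13 | 99.68
Sample 3 | 99.46 | 100.12 | 100.2 | 100.36 | 100.38 | 99.87 | 99.15 | 99.65
Sample 4 | 100.36 | 100.58 | 99.91 | 100.02 | 100.21 | 99.65 | 99.35 | 99.29
Sample 5 | 100.24 | 100.45 | 99.59 | 100.05 | 100.39 | 99.7 | 100.77 | 100.13
Sample 6 | 99.68 | 99.02 | 100.87 | 99.97 | 99.47 | 98.76 | 99.38 | 99.98
Mean | 99.79 | 99.93 | 100.19 | 100.27 | 99.84 | 99.75 | 100.00 | 99.66
% RSD | 0.49 | 0.58 | 0.43 | 0.37 | 0.57 | 0.55 | 0.83 | 0.36
Intermediate precision (n=24) | Laboratory 1: mean 100.05, % RSD 0.48 | Laboratory 2: mean 99.81, % RSD 0.58
Reproducibility (n=48) | Mean 99.93, % RSD 0.54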

In the example provided in Table IX, the precision of the analytical procedure is evaluated by statistical analysis of the data.
Precision is determined at a number of different levels during validation, which include system precision, repeatability,
intermediate precision, and reproducibility. System precision is evaluated by comparing the means and relative standard
deviations. Reproducibility is assessed by means of an inter-laboratory trial. Intermediate precision is established by
comparing analytical results obtained when using different analysts and instruments and performing the analysis on different
days. Repeatability is assessed by measuring the variability in the results obtained within a single series of determinations
performed under the same operating conditions. In each case, the mean and % RSD are calculated and compared to the
established acceptance criteria.
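
The summary rows of Table IX can be reproduced by pooling the individual results at each level, as in the sketch below. Pooling all values and taking a single % RSD is an assumption inferred from the n = 24 and n = 48 labels in the table. A variance-components analysis (ANOVA) would be an alternative that separates the within-run and between-run contributions. The dictionary keys and helper names are illustrative.

```python
import statistics

# Results from Table IX, keyed by (laboratory, analyst, day); six samples per run.
runs = {
    ("Lab 1", "Analyst 1", "Day 1"): [99.93, 99.08, 99.46, 100.36, 100.24, 99.68],
    ("Lab 1", "Analyst 1", "Day 2"): [99.78, 99.63, 100.12, 100.58, 100.45, 99.02],
    ("Lab 1", "Analyst 2", "Day 1"): [100.23, 100.35, 100.20, 99.91, 99.59, 100.87],
    ("Lab 1", "Analyst 2", "Day 2"): [100.97, 100.24, 100.36, 100.02, 100.05, 99.97],
    ("Lab 2", "Analyst 3", "Day 3"): [99.56, 99.03, 100.38, 100.21, 100.39, 99.47],
    ("Lab 2", "Analyst 3", "Day 4"): [100.17, 100.33, 99.87, 99.65, 99.70, 98.76],
    ("Lab 2", "Analyst 4", "Day 3"): [100.21, 101.13, 99.15, 99.35, 100.77, 99.38],
    ("Lab 2", "Analyst 4", "Day 4"): [99.23, 99.68, 99.65, 99.29, 100.13, 99.98],
}


def mean_and_rsd(values):
    """Return the mean and the relative standard deviation in percent."""
    mean = statistics.mean(values)
    return mean, statistics.stdev(values) / mean * 100


def pooled(lab=None):
    """Collect all individual results, optionally restricted to one laboratory."""
    return [
        value
        for (run_lab, _analyst, _day), values in runs.items()
        if lab is None or run_lab == lab
        for value in values
    ]


# Repeatability: each run of six samples on its own.
for key, values in runs.items():
    mean, rsd = mean_and_rsd(values)
    print(f"{', '.join(key)}: mean = {mean:.2f}, %RSD = {rsd:.2f}")

# Intermediate precision: all results within one laboratory (n = 24).
lab1_mean, lab1_rsd = mean_and_rsd(pooled("Lab 1"))
lab2_mean, lab2_rsd = mean_and_rsd(pooled("Lab 2"))
print(f"Lab 1: mean = {lab1_mean:.2f}, %RSD = {lab1_rsd:.2f}")
print(f"Lab 2: mean = {lab2_mean:.2f}, %RSD = {lab2_rsd:.2f}")

# Reproducibility: all results from both laboratories (n = 48).
all_mean, all_rsd = mean_and_rsd(pooled())
print(f"Overall: mean = {all_mean:.2f}, %RSD = {all_rsd:.2f}")
```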

Detection Limit

The ICH guideline mentions several approaches for determining the detection limit: visual inspection, signal-to-noise, and
using the standard deviation of the response and the slope. The detection limit and the method used for determining the
detection limit should be presented. If visual evaluation is used, the detection limit is determined by the analysis of samples
with known concentrations of analyte and by establishing the minimum level at which the analyte can be reliably detected. The
signal-to-noise ratio is determined by comparing measured signals from samples with known low concentrations of analyte with
those of blank samples. When the detection limit is based on the standard deviation of the response and the slope, it is
calculated using the following equation:

$$DL = \frac{3.3\,\sigma}{S}$$

where σ is the standard deviation of the response and S is the slope of the calibration curve.

Quantitation Limit

The ICH guideline states several approaches for determining the quantitation limit: an approach based on visual evaluation,
an approach based on signal-to-noise, and an approach based on the standard deviation of the response and the slope. The
quantitation limit and the method used for determining the quantitation limit should be presented. When the quantitation limit
is based on the standard deviation of the response and the slope, it is calculated using the equation below:

$$QL = \frac{10\,\sigma}{S}$$

where σ is the standard deviation of the response and S is the slope of the calibration curve.
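
Both limits follow the ICH Q2(R1) expressions DL = 3.3σ/S and QL = 10σ/S. The sketch below applies them using the residual standard deviation (S = 27.2704) and the slope (20.3914) from the Minitab output of the linearity example as stand-ins for σ and S. This reuse is an assumption made purely for illustration: a real determination would use a calibration curve prepared in the range of the detection or quantitation limit.

```python
# Illustrative inputs taken from the Minitab output of the linearity example.
# Reusing them here is an assumption of this sketch, not part of the article.
sigma = 27.2704   # residual standard deviation of the regression line
slope = 20.3914   # slope of the calibration curve

detection_limit = 3.3 * sigma / slope
quantitation_limit = 10 * sigma / slope

print(f"DL = {detection_limit:.2f}, QL = {quantitation_limit:.2f} (concentration units)")
# prints: DL = 4.41, QL = 13.37 (concentration units)
```

Because both limits use the same σ and S, the quantitation limit is always about three times the detection limit under this approach.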

Linearity

The ICH guideline states that a linear relationship should be evaluated across the range of the analytical procedure. If there is
a linear relationship, test results should be evaluated by linear regression analysis. The correlation coefficient, y-intercept,
slope of the regression line, and residual sum of squares should be submitted with a plot of the data.

Range

Range is obtained from results from linearity, accuracy, and precision. These results should be linear, accurate, and precise to
validate a specific range for the method.

Robustness

Robustness is evaluated by performing a comparison of results obtained by deliberately manipulating method parameters
(temperature, different columns, etc.). Mean and % RSDs are compared against the acceptance criteria to evaluate impact of
changing experimental parameters.

Conclusion
The statistical methods used during analytical method validation require a basic knowledge of statistics. Even though
there are statistical packages available to perform statistical calculations, it is important to understand the mathematical basis
behind these calculations. It is essential for the analysts to be familiar with the basic statistical elements. Statistics used for
validation data interpretations should be incorporated into the company’s standard procedure and specified in the validation
protocol and report.

References
1. ICH, Technical Requirements for Registration of Pharmaceuticals for Human Use, Topic Q2(R1), Validation of
Analytical Procedures: Text and Methodology.
2. FDA, Analytical Procedures and Methods Validation (Rockville, MD, 2000).
3. S. Bolton, Pharmaceutical Statistics: Practical and Clinical Applications, 5th ed., New York, NY, Marcel Dekker, Inc.,
2004, p. 558, Table IV.2.
4. S. Bolton, Pharmaceutical Statistics: Practical and Clinical Applications, 5th ed., New York, NY, Marcel Dekker, Inc.,
2004, p. 561, Table IV.4.
5. Minitab 16 Statistical Software (2010). [Computer software], State College, PA, Minitab, Inc.
6. W.J. Dixon and F.J. Massey, Introduction to Statistical Analysis, New York, NY, McGraw-Hill, 1969.
7. NIST/SEMATECH, e-Handbook of Statistical Methods, available at: http://www.itl.nist.gov/div898/handbook
8. P.C. Meier and R.E. Zünd, Statistical Methods in Analytical Chemistry, 2nd ed., John Wiley & Sons, New York, 2000.
9. J.N. Miller and J.C. Miller, Statistics and Chemometrics for Analytical Chemistry, 6th ed., Pearson/Prentice Hall, Harlow,
UK, 2010.
10. AMC Technical Brief, No. 14, The Royal Society of Chemistry, 2003.

Source URL: http://www.ivtnetwork.com/article/statistical-analysis-analytical-method-validation