
Basic Method Validation

2nd Edition
James O. Westgard, PhD

with contributions from Elsa F. Quam, BS, MT(ASCP); Patricia L. Barry, BS, MT(ASCP); and Sharon S. Ehrmeyer, PhD

Copyright 2003
Westgard QC, Inc.
7614 Gray Fox Trail
Madison, WI 53717
Phone 608-833-4718, Fax 608-833-0640
http://www.westgard.com

Library of Congress Control Number: 2003106221
ISBN 1-886958-19-X
Published by Westgard QC
7614 Gray Fox Trail
Madison, WI 53717
Phone 608-833-4718
Copyright 1999, 2003 by Westgard QC (WQC). All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of Westgard QC.

Foreword to the 2nd edition


Basic Method Validation is part of a trilogy of back-to-basics books that deal with analytical quality management. The other two books are Basic QC Practices and Basic Planning for Quality. When I teach these materials to students, I start with method validation because it introduces the basic concepts of analytical performance and the experimental and statistical techniques needed to describe performance in quantitative terms. Those concepts carry through into the practice of QC, including the selection of optimal QC designs by careful planning for quality.

Method validation has actually become more important with publication of the Final CLIA Rule, which adds a requirement that laboratories must validate all non-waived methods introduced after April 24, 2003. Previously, moderately complex methods did not need to be validated. Now moderately and highly complex methods are combined into a category of non-waived methods that must be validated by the laboratory. The changes in CLIA made it imperative to update this book.

The 2nd edition includes the following new, revised, and reorganized materials:
- A revised chapter on regulations that reflects the Final CLIA Rule published in January 2003.
- Revised material on reportable range that includes CAP's new concepts of analytical measurement range (AMR) and clinical reportable range (CRR).
- A new chapter on statistical sense, sensitivity, and significance, to provide a better understanding of the behavior of different statistics in response to different types of analytical errors.
- A revised Method Decision Chart that includes the Six Sigma goal for world class quality.
- Additional discussion of Deming and Passing-Bablok regression.
- New Internet calculators that perform Deming and Passing-Bablok regression.
- Reorganization of some of the statistical materials to provide successive "doses" at the right times.
- A new chapter that discusses the new ISO approach of characterizing analytical performance as uncertainty, rather than the traditional characterization in terms of analytical errors.

We hope that readers of the 2nd edition will find Basic Method Validation even more useful and valuable than the 1st edition.

Foreword to the First Edition


Over twenty-five years ago, we began work on method validation. Working then with Diane J. de Vos, Marian R. Hunt, Elsa F. Quam, R. Neill Carey, and Carl C. Garber, we published a series of five papers in the American Journal of Medical Technology. Those papers were later published together as a monograph titled Method Evaluation. The Editor-in-Chief of the journal, Ina Lea Roe, wrote in the preface of that earlier monograph:

"With the August, 1978 issue, the American Journal of Medical Technology concluded what we believe to be a landmark publication: Westgard et al.'s 'Concepts and Practices in the Evaluation of Clinical Chemistry Methods.' As clinical laboratory professionals are increasingly inundated with claims and counterclaims from competing manufacturers, the job of sorting out these claims and making correct decisions on instrumentation and methods becomes more complex. Bad decisions are costly, not only in monetary terms, but in lost time, impaired laboratory efficiency, and deterioration of physician confidence in laboratory service. The best methods mean more reliable laboratory results and, in return, better patient care. To the experienced technologist who remembers struggling to evaluate two or three simple methods, the Westgard system is indeed a gift, an innovation that makes manageable what used to be so complex."

Basic Method Validation offers a new educational program that is in tune with modern times and technology. In addition to the traditional text, this book is integrated with the Internet: links at the end of each chapter provide access to Internet calculators and graphic data plotters, so readers can immediately perform the data calculations described in the book, using sample (or their own) data. As the earlier generation of medical technologists passes their responsibilities on to the next generation of clinical laboratory scientists, the problems faced in evaluating new analytical methods are still much the same. We hope that with this book, however, we have provided new tools and faster techniques to resolve these problems.

Acknowledgments
First and foremost, Sten Westgard deserves the credit for the formula used in developing these materials. As webmaster of Westgard Web, he facilitates the publication of materials and also provides a driving force for their completion via his schedule for web updates. At my age, I need deadlines to get things finished, and I get them from Sten.

It's always fun to work with Elsa Quam and Trish Barry on a project like this. Elsa was involved in the first version of these materials, understands their purpose and origin, and also appreciates the need for changes and evolution. Trish brings her systems perspective to bear and assures that any changes are really improvements and that they fit with our overall philosophy and approach to analytical quality management.

Sharon Ehrmeyer again has been willing to help clarify the wonderful world of CLIA rules and regulations. With the publication of the Final CLIA Rule, method validation studies are required for all "non-waived" methods that were previously exempted, pending the implementation of a QC clearance process that never occurred. Sharon's good humor keeps me from saying more.

Thanks also to Karen Mugan and Neill Carey for their help in identifying some of the confusing issues that surround detection limits. I now clearly understand that detection limits are confusing, even though I don't know what to do about it.

The antique maps that appear in this book are part of a small personal collection. I hope you find them helpful for illustrating key ideas in the book, as well as interesting and beautiful historical documents.

James O. Westgard
Madison, Wisconsin

About the authors and contributors


James O. Westgard, PhD, is a Professor in the Department of Pathology and Laboratory Medicine at the University of Wisconsin Medical School, where he teaches in the Clinical Laboratory Science program. He is also Director of Quality Management Services at the Clinical Laboratories, University of Wisconsin Hospital & Clinics, and President of Westgard QC, Inc.

Elsa F. Quam, BS, MT(ASCP), is a Quality Specialist in the Clinical Laboratories at the University of Wisconsin Hospital and Clinics.

Patricia L. Barry, BS, MT(ASCP), is a Quality Specialist in the Clinical Laboratories at the University of Wisconsin Hospital and Clinics.

Sharon S. Ehrmeyer, PhD, MT(ASCP), is a Professor in the Department of Pathology and Laboratory Medicine and Director of the Clinical Laboratory Science Program at the University of Wisconsin Medical School.


Table of Contents
1. Is quality still an issue for laboratory tests? ... 1
   Myths of quality ... 3
2. How do you manage quality? ... 9
   The need for standard processes and standards of quality ... 11
3. What is the purpose of a method validation study? ... 19
   MV: The inner, hidden, deeper, secret meaning ... 20
4. What are the regulatory requirements for basic method validation? ... 27
   MV: The regulations, by Sharon S. Ehrmeyer, PhD ... 28
5. How is a method selected? ... 37
   MV: Selecting a method to validate ... 38
6. What experiments are necessary to validate method performance? ... 47
   MV: The experimental plan ... 48
7. How are the experimental data analyzed? ... 57
   MV: The data analysis tool kit ... 58
8. How are the statistics calculated? ... 69
   MV: The statistical calculations ... 70
9. How is the reportable range of a method determined? ... 87
   MV: The linearity or reportable range experiment ... 88
   Worksheet for validation of reportable range ... 93
   Worksheet for quantifying errors ... 97
   Problem set: Cholesterol method validation data ... 100
10. How is the imprecision of a method determined? ... 101
   MV: The replication experiment ... 102
   Problem set: Cholesterol method validation data ... 110
11. How is the inaccuracy (bias) of a method determined? ... 111
   MV: The comparison of methods experiment ... 112
   Problem set: Cholesterol method validation data ... 122
12. How do you use statistics to estimate analytical errors? ... 123
   MV: Statistical sense, sensitivity, and significance ... 124
13. How do you test for specific sources of inaccuracy? ... 139
   MV: The interference and recovery experiments ... 140
   Problem set: Cholesterol method validation data ... 151
14. What is the lowest test value that is reliable? ... 153
   MV: The detection limit experiment ... 154
   Problem set: Cholesterol method validation data ... 161
15. How is a reference interval verified? ... 163
   MV: Reference interval transference ... 164
16. How do you judge the performance of a method? ... 173
   MV: The decision on method performance ... 174
17. What's a practical procedure for validating a method? ... 183
   MV: The real world applications ... 184
   MV: The worksheets ... 193
18. What questions do you have about method validation? ... 197
   MV: The Frequently-Asked-Questions ... 198
19. What impact will ISO have on method validation? ... 211
   To be uncertain or in error ... 212
20. How do you use statistics in the Real World? ... 219
   Points of care in using statistics for method comparison ... 220
21. Glossary of Terms ... 229
22. Reference List ... 249
   Online reference list ... 254
23. Self-Assessment Answers ... 257
   Cholesterol Problem Set answers ... 282
Appendix 1: CLIA-88 Analytical Quality Requirements ... 289


12: How do you use statistics to estimate analytical errors?


This chapter provides more details about the analysis and interpretation of data from a comparison of methods experiment. It makes use of simulated data to demonstrate the behavior of statistical parameters in response to the different types of errors that may be observed in the data.

Objectives:
- Relate the statistics used to analyze the data from a comparison of methods experiment to the types of analytical errors occurring between the methods.
- Identify the limitations of t-test and regression statistics.
- Formulate a strategy to perform a proper analysis of data from a comparison of methods experiment.

Lesson materials:
- MV: Statistical Sense, Sensitivity, and Significance, by James O. Westgard, PhD
- Excel file with simulated glucose data

Things to do:
- Study the materials.
- Set up (or download) an electronic spreadsheet to make calculations and prepare plots of comparison results.
- Prepare a data set to demonstrate the effect of proportional error on t-test statistics and the difference plot.
- Prepare a data set to show the effect of a narrow range of data on regression statistics and the comparison plot.
- Examine a validation study published in the scientific literature and critique its use and application of statistics.


Method Validation:
Statistical Sense, Sensitivity, and Significance
James O. Westgard, PhD

Remember the 1st secret of method validation: it's all about error assessment. The 2nd secret is that statistics are just tools to estimate the size of those errors. The data analysis toolkit makes it easy for you to calculate the appropriate statistics and prepare corresponding graphics using Internet calculators and plotters. Even so, you still need to be careful in interpreting the statistics, particularly for data from the comparison of methods experiment.

Many years ago, we studied the use and interpretation of statistics for method-comparison data. We employed a data simulation approach to create data sets that had different types and magnitudes of analytical errors, then calculated regression statistics, t-test statistics, and the correlation coefficient for each of those data sets. We looked to see which statistics changed as the type and magnitude of analytical errors changed in the data sets. Those results were published in the journal Clinical Chemistry [1], and the original paper is available on Westgard Web at http://www.westgard.com/method1.htm.

While that knowledge has been around for 30 years, it isn't being passed along through current education and training programs. Analysts still have great difficulty making sense of the statistics used in method comparison studies. The key to making statistics useful lies in understanding their sensitivity to different types of errors in the data. Statistical sense relates to the sensitivity of statistics to errors.

Simulation of errors in test results


Earlier glucose methods did not have the specificity of today's enzymatic methods. Because comparison studies between specific and non-specific glucose methods were appearing in the literature at that time, glucose was a good test for demonstrating how different types of analytical errors show up in the statistical results of method comparison studies.


We begin by constructing a data set of 41 specimens that would be typical for a hospital population, as shown in Table 1 by the reference method results in the 1st column. In a perfect world, the test method would give exactly the same results, as shown in the 2nd column. To demonstrate the effects of different types of errors, additional sets of comparison results can be created by manipulating the reference data set in specific ways.

Random error can be simulated by alternately adding or subtracting 5 mg/dL to every data point in the reference set, as shown by the 3rd column of results. This is not truly random, at least not the normal or Gaussian random error expected for an analytical method, but it will suffice to demonstrate the effect. Additional data sets can be constructed for 2 mg/dL and 10 mg/dL to demonstrate the effect of changes in the size of the random error.

Constant systematic error can be simulated by adding 10 mg/dL to every data point in the reference set, as shown by the 4th column. Additional data sets can be constructed by adding 2 mg/dL or 5 mg/dL to demonstrate the effect of changes in the size of the constant error.

Proportional systematic error can be simulated by multiplying each result in the reference data set by 1.05. Additional data sets can be constructed using factors of 1.02 and 1.10 to demonstrate the effect of changes in the size of the proportional error.

Combinations of errors can be simulated by applying two or more of the above operations, as shown by the data in columns 6 through 8.
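The manipulations just described are simple enough to reproduce in a spreadsheet or a few lines of code. The sketch below (Python; the function names are ours for illustration, not from the book or its calculators) applies the same three operations to the start of a reference data set:

```python
# Sketch of the error simulations described above (values in mg/dL).
# Function names are illustrative, not from the book.

def add_random(ref, amount):
    """Alternately subtract and add `amount` to mimic random error."""
    return [x - amount if i % 2 == 0 else x + amount
            for i, x in enumerate(ref)]

def add_constant(ref, amount):
    """Add `amount` to every result to mimic constant systematic error."""
    return [x + amount for x in ref]

def apply_proportional(ref, factor):
    """Multiply every result by `factor` to mimic proportional error."""
    return [x * factor for x in ref]

reference = [40, 60, 80, 90, 100]          # first few reference values
re5 = add_random(reference, 5)             # [35, 65, 75, 95, 95]
ce10 = add_constant(reference, 10)         # [50, 70, 90, 100, 110]
pe5 = apply_proportional(reference, 1.05)  # 5% proportional error
combined = apply_proportional(add_random(reference, 5), 1.05)  # RE plus PE
```

Combinations follow by chaining the operations, as in the last line.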


Table 1. Example simulated glucose data sets

[The table lists reference glucose values for 41 specimens spanning 40 to 380 mg/dL, together with the companion columns described above: Perfect agreement; Random error (RE5); Constant error (CE5); Proportional error (PE5); and the combinations RE5+CE5, RE5+PE5, and RE5+CE5+PE5.]


Statistical analysis of the simulated data


These data sets are then subjected to the statistical calculations for regression (the slope, b; the y-intercept, a; and the standard deviation of the points about the regression line, sy/x), paired t-test analysis (the average difference between methods, bias; the standard deviation of the differences, SDdiff; and the calculated t-value that is used to determine whether the bias is statistically significant, or "real"), and finally the correlation coefficient, r.
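As a sketch of those calculations in code, using the ordinary least-squares and paired t-test formulas found in statistics texts (this is our illustration, not the book's own toolkit):

```python
import math

def comparison_stats(x, y):
    """Regression (slope b, y-intercept a, sy/x), paired t-test statistics
    (bias, SDdiff, t), and correlation coefficient r, with the comparative
    method as x and the test method as y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                          # slope
    a = my - b * mx                        # y-intercept
    syx = math.sqrt(sum((yi - (a + b * xi)) ** 2
                        for xi, yi in zip(x, y)) / (n - 2))
    d = [yi - xi for xi, yi in zip(x, y)]  # paired differences
    bias = sum(d) / n                      # average difference
    sd_diff = math.sqrt(sum((di - bias) ** 2 for di in d) / (n - 1))
    t = bias * math.sqrt(n) / sd_diff if sd_diff > 0 else float("inf")
    r = sxy / math.sqrt(sxx * syy)
    return {"slope": b, "intercept": a, "syx": syx,
            "bias": bias, "SDdiff": sd_diff, "t": t, "r": r}
```

Feeding it a data set with a pure 5% proportional error, for example, returns a slope of 1.05, an intercept and sy/x of essentially zero, an r of 1.00, and a bias that depends on the mean of the data.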

Table 2. Effect of analytical errors on calculated results

[For each of 28 simulated data sets, the table identifies the type and magnitude of the errors introduced (RE, random error; CE, constant error; PE, proportional error) and lists the resulting statistics: the regression slope, y-intercept, and sy/x; the t-test bias, SDdiff, and t-value; and the correlation coefficient r.]


The calculated statistics are shown in the 2nd table, where the leftmost column identifies the type and magnitude of the analytical errors in the data set (RE, random error; CE, constant error; PE, proportional error) and the next three columns show the magnitude of the errors. Each row provides the statistical results for the specific error condition or set of error conditions. Comparison of the error conditions and the statistical values allows us to match the type and magnitude of the errors to the estimates from the statistics.

No errors. Perfect comparison data would have exactly the same values for the test and reference methods. In a comparison plot, all the points would fall exactly on a line making a 45 degree angle and intersecting the axes at the origin. As shown in the table, the statistical results for this ideal situation show a value of 1.00 for the slope and correlation coefficient and values of 0.00 for all other statistics, except the t-value, which is undefined (because it is a ratio of two terms that are both 0.00). In a perfect world, the statistics have ideal values of 1.00 (slope, correlation coefficient) or 0.00 (y-intercept, sy/x, bias, SDdiff). Deviations from those ideal values are indicators of errors.

Random error. The effect of random error is shown in this figure for the data set where 5 mg/dL has been alternately added and subtracted from the reference set of values. Random error shows up in the plot as scatter in the points about the regression line. The statistical calculations in Table 2 show minimal changes in the slope, intercept, or bias, but the sy/x and SDdiff terms directly reflect the size of the random error. The correlation coefficient decreases as random error increases, but the changes in r are small and do not provide a direct estimate of random error in concentration units.

Constant error. The effect of constant error is shown in this
next figure, where the regression line no longer goes through the origin. In this case, 10 mg/dL has been added to create a constant error, and the magnitude of that error is correctly estimated by both the y-intercept from regression and the bias term from t-test analysis. Note that there is no change to the correlation coefficient.

Proportional error. The effect of proportional error is shown for three situations where test results are lower than reference results by 2%, 5%, and 10%. Proportional error changes the steepness of the regression line, and the exact magnitude of the error is estimated by the slope from regression analysis. Note that proportional error does not affect the y-intercept, sy/x, or r, but does cause changes in both the bias and SDdiff terms in t-test analysis.

Sensitivity of statistics to types of errors


The response of these different statistics to the different types of errors is summarized in the table shown here. Random error is
reflected by changes in sy/x, SDdiff, and r. Constant error shows up in the y-intercept and the bias. Proportional error can be best estimated by the slope's deviation from its ideal value, but it also causes changes in the bias and SDdiff from t-test analysis. That's a problem and the reason for the question marks in the table below. Proportional error confounds the interpretation of t-test statistics! There's also a problem with the correlation coefficient because it responds only to random error, not systematic errors, which are the errors of interest in the method comparison experiment. You can have an ideal correlation coefficient even if a method is inaccurate!

Sensitivity of Statistics to Types of Errors

                                 RE     CE     PE
  Regression
    Slope, b                     No     No     Yes
    Y-intercept, a               No     Yes    No
    SD about line, sy/x          Yes    No     No
  t-test
    Bias                         No     Yes    Yes?
    SDdiff                       Yes    No     Yes?
  Correlation coefficient, r     Yes?   No?    No?

Estimation of random error. From the sensitivity table, it is apparent that there are three statistics that respond to random error. While this error was introduced into the test method by the data simulation, the statistics usually reflect the random error from both methods and sometimes even include additional scatter caused by differences in specificity between the two methods being compared. Therefore, the estimate of precision from the replication experiment is still important for characterizing the performance of an individual method. Note that SDdiff is also influenced by proportional error, which means this statistic will not provide a specific estimate of random
error. The other two statistics, sy/x and r, are sensitive only to random error, but they differ in their units and numerical values. It is most useful to estimate random error as a standard deviation in concentration units, as provided by sy/x, rather than as the unitless numbers provided by the correlation coefficient. A value of 5 mg/dL for sy/x can be readily interpreted in terms of the differences expected; for example, 95% of the differences will be within 2 times the value of sy/x. A value of 0.996 for r does not provide an estimate of the size of random errors in a meaningful manner like a standard deviation term does.

A further limitation of r is that it depends on the analytical range covered by the data. For example, in the two plots shown here, the random error is the same, 10 mg/dL, yet the values for r are very different, 0.986 vs 0.764. The plot on the left shows the wide range of data that would be expected from a hospital population, whereas the plot on the right shows the narrow range expected from a healthy population. The correlation coefficient is actually sensitive to the random error between methods (the scatter in the y direction) relative to the
range of analytical data in the x-direction. A high correlation coefficient means that the y-scatter is small compared to the x-distribution. While this behavior makes the correlation coefficient useless for estimating analytical errors, it does provide a measure of reliability for the regression slope and intercept; i.e., a high r value means that the data cover a wide range relative to the scatter between methods, and therefore the line through that data is well defined.

Estimation of constant error. The y-intercept from regression and the bias from t-test analysis are the best statistics for estimating constant error. Both give the estimates in units of concentration and provide similar values when proportional error is absent. However, proportional error does affect the bias term from t-test analysis, so that term does not provide a specific estimate of constant error. Instead, it provides an overall estimate of systematic error that is reliable only at the mean of the data. Remember that bias is calculated as the mean of the test method results minus the mean of the comparative method results, which is the same as the average of the differences for all the individual specimens.

Estimation of proportional error. The slope from regression, as well as both the bias and SDdiff from the t-test, are all sensitive to proportional error. The fact that both the bias and SDdiff terms respond demonstrates that these statistics cannot provide a specific estimate of proportional error. Both can be misleading because they also respond to other types of errors; i.e., the bias term responds to constant error and the SDdiff responds to random error. It would be best to avoid the use of t-test statistics when proportional error is present. Regression provides the best estimate of proportional error. The difference between the slope and its ideal value of 1.00, expressed as a percentage, describes proportional error in the most useful way.
For example, an observed slope of 0.95 indicates a proportional error of 5.0%.
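In code, converting the slope into a proportional error estimate, and combining it with the y-intercept to get the systematic error at a decision level via the regression line Yc = a + bXc, looks like this (the slope, intercept, and decision level are illustrative numbers, not results from the chapter's data):

```python
# Illustrative regression results (not from the chapter's data sets).
slope, intercept = 0.95, 2.0   # b and a, with a in mg/dL

# Proportional error, as the slope's percentage deviation from 1.00:
prop_error_pct = abs(1.0 - slope) * 100   # 5.0 (%)

# Systematic error at a medical decision level Xc, via Yc = a + b*Xc:
xc = 126.0                     # decision level, mg/dL (illustrative)
yc = intercept + slope * xc    # value the test method would report
systematic_error = yc - xc     # -4.3 mg/dL at this Xc
```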

Making Sense of Statistics


Correlation coefficient. The correlation coefficient provides information only about random error, even though the objective in a method comparison study is to estimate systematic error. Therefore, the correlation coefficient is of little value for estimating analytical errors in a method-comparison experiment. However, because r is sensitive to the range of data collected, r is useful as a measure of the reliability of the regression statistics. [Isn't it wonderful? A limitation can be turned into a useful feature once the behavior is properly understood.]

t-test statistics. The estimates of errors may be confounded by the presence of proportional error. There are two cases where the estimates of systematic error will be reliable: (1) if proportional error is absent, then the systematic error is constant throughout the concentration range; (2) if the mean of the patient results is close to the medical decision level of interest, then the overall estimate of constant and proportional error is reliable at the mean of the data, but that estimate of systematic error should not be extrapolated to other decision level concentrations. Inspection of a plot of test method values on the y-axis versus comparison method values on the x-axis will usually reveal the presence of proportional error. If proportional error is NOT present, then the estimate of constant error should apply throughout the range of the data studied, but it would still be best to restrict the interpretation to decision levels near the mean of the data. If proportional error is present, it would be best to use regression statistics to estimate the systematic error at the decision levels of interest.

Regression statistics. It is ideal to have three statistical parameters that can each estimate a different type of error. Proportional error can be estimated from the slope, constant error from the y-intercept, and random error (between methods) from the standard deviation of the points about the regression line.
Systematic error can be estimated at any concentration using the regression equation, i.e., Yc = a + bXc, where Xc is the critical medical decision
concentration and Yc is the best estimate of that concentration by the test method. The difference between Yc and Xc is the systematic error at that critical concentration, i.e., Yc-Xc = SE. The estimates of errors from regression statistics will not be reliable unless the data satisfies certain conditions and assumptions. Linearity is assumed, therefore you must inspect a plot of the comparison results to assure there is a linear relationship between the two methods. For example, the effect of nonlinearity on the regression line is shown in the figure at right. Though it is obvious that there is some non-linearity at the high end, the calculations for linear regression will determine the best straight line through all the data. In this case, the points at the upper end will draw the line down, making the slope low and kicking up the yintercept. The estimates of proportional error and constant error will both be corrupted by any non-linearity in the data. Outliers can cause a similar problem! One or two points at the end of the line can exert undue influence by pulling the line towards those points and affecting both the slope and the yintercept. The remedy again is to inspect a plot of the data to be sure there are no outliers. A narrow range of data is also a problem because a line cannot be well-defined by a cloud of data points. While this situation can sometimes be recognized from a plot of the data, the best alert is provided by the correlation coefficent. A low value for r, 0.99 or less in some references and 0.975 in others [2], indicates that the estimates of the slope and intercept may be
affected by the scatter in the data. One remedy is to utilize more sophisticated regression techniques, such as Deming regression [3] or Passing-Bablock regression [4]. A simpler remedy is to utilize the bias estimate from t-test statistics and interpret the data at the mean of the patient results (assuming that the mean is close to the medical decision level of interest).

Statistical vs clinical significance. We have not included the t-value in the discussion so far because it does not provide an estimate of errors! This statistic is a test of significance that is mainly useful for deciding whether sufficient data have been collected to demonstrate that a difference exists. If the calculated t-value is greater than the critical t-value (which is 2.02 for the example data sets having 41 points), the observed bias is said to be statistically significant, which in practical terms means real. If the calculated t-value is less than the critical t-value, then the data are not sufficient to demonstrate that a statistically significant bias exists between the test and reference sets of values.

From my perspective, this information on statistical significance is secondary in importance. The judgment on method acceptability must be made on clinical significance, not statistical significance. An error can be statistically significant, i.e., real, yet so small that it isn't clinically important. On the other hand, an error can be large and clinically important, yet the data may not be sufficient to demonstrate that it is statistically significant.

Remember that the t-value is calculated as t = (bias/SDdiff) x sqrt(N), which shows that it is the ratio of systematic error (bias) to random error (SDdiff), multiplied by the square root of the number of paired samples. This is analogous to the equation for blood pH, where pH is a function of the ratio of bicarbonate to PCO2 times a dissociation constant.
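The t-value relationship can be made concrete with a short sketch. The bias, SDdiff, and N values below are invented purely to show how N, rather than the clinical size of the error, can drive significance.

```python
import math

# Sketch of the t-value relationship t = (bias / SDdiff) * sqrt(N).
# All numbers below are invented for illustration only.
def paired_t(bias, sd_diff, n):
    return (bias / sd_diff) * math.sqrt(n)

# A clinically large bias with modest N...
t_large_bias = paired_t(bias=5.0, sd_diff=7.0, n=41)     # about 4.6
# ...and a clinically trivial bias that is still "significant" at big N:
t_small_bias = paired_t(bias=0.5, sd_diff=7.0, n=2000)   # about 3.2
```

Both t-values exceed a critical t of roughly 2, so both biases are statistically "real"; yet only the first is likely to matter clinically, which is exactly the point made above.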
A pH value by itself is difficult to interpret without having information about the bicarbonate and PCO2 terms. Likewise, a t-value is difficult to make sense of unless you have information about the systematic and random error terms. Unfortunately, you will often find the t-value reported without any information being given about the bias or SDdiff terms.
High t-values may result from a large bias, a small SDdiff, or a large N. A high t-value indicates that a real bias exists; however, that bias can be small and still be statistically significant if SDdiff is small and/or N is large. Low t-values may result from a small bias, a large SDdiff, or a small N. A low t-value indicates that the data are not sufficient to demonstrate a real difference, therefore the conclusion is that no difference has been demonstrated. However, a large bias may not be statistically significant if SDdiff is large and/or N is small.

The acceptability of method performance depends on whether or not the errors will affect the clinical usefulness of the test results. Clinical significance depends on defining allowable limits of error, then comparing the observed errors to those limits. If the observed errors are smaller than the allowable errors, method performance is acceptable; if they are larger, it is not. Statistical tests can provide estimates of errors upon which judgments can be made, but they are not a substitute for the judgments that need to be made. Clinical significance is determined by comparing the statistical estimates of errors to the defined allowable error. A tool for doing this is the Method Decision Chart [5] described in chapter 16.
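The acceptability judgment above can be sketched as a simple comparison of observed error against a defined allowable total error (TEa). The 2 x SD multiplier is one common convention for combining bias and imprecision, and the glucose figures are illustrative only; the Method Decision Chart [5] provides a more complete graphical treatment.

```python
# Sketch of a clinical-significance check: is the observed error
# smaller than the allowable error? Combining errors as |bias| + 2*SD
# is one common convention, not the only possible criterion.
def method_acceptable(bias, sd, tea):
    observed_total_error = abs(bias) + 2 * sd
    return observed_total_error <= tea

# Illustrative glucose figures with a TEa of 20 mg/dL (10% at 200 mg/dL):
ok = method_acceptable(bias=5.13, sd=7.23, tea=20.0)   # 19.59 <= 20
```

Note that the same bias and SD could be judged unacceptable against a tighter TEa; the statistics do not change, only the quality requirement does.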

References
1. Westgard JO, Hunt MR. Use and interpretation of statistical tests in method-comparison studies. Clin Chem 1973;19:49-57.
2. Stockl D, Dewitte K, Thienpont M. Validity of linear regression in method comparison studies: Is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340-6.
3. Cornbleet PJ, Gochman N. Incorrect least-squares regression coefficients in method-comparison analysis. Clin Chem 1979;25:432-8.
4. Passing H, Bablock W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. J Clin Chem Clin Biochem 1983;21:709-720.
5. Westgard JO. A method evaluation decision chart (MEDx Chart) for judging method performance. Clin Lab Science 1995;8:277-83.

Online References:
Method validation data analysis tool kit
http://www.westgard.com/mvtools.html
CLIA requirements for analytical quality
http://www.westgard.com/clia.htm
Use and interpretation of common statistical tests in method-comparison studies. Original paper in PDF format.
http://www.westgard.com/method1.htm
Method Decision Charts. Original paper in PDF format.
http://www.westgard.com/medx.htm
Method Decision Charts - Excel worksheet
http://www.westgard.com/medxcel.htm
Simulated Glucose Data spreadsheet
http://www.westgard.com/downloads/bmv2edsimdata.exe (This is a self-expanding zip file)

Self-Assessment Questions:
The following statistical summary was obtained for a glucose comparison of methods experiment:
a = 5.23 mg/dL, b = 0.999, sy/x = 7.23 mg/dL, bias = 5.13 mg/dL, SDdiff = 7.23 mg/dL, t = 8.03, r = 0.996, N = 128.

1. What is the proportional systematic error between methods?
2. What is the constant systematic error between methods?
3. What is the random error between methods?
4. Why is there such good agreement between the estimates of error by regression and t-test statistics?
5. Is the systematic error between methods statistically significant or real?
6. What does the correlation coefficient tell you?

The following statistical summary was obtained for a urea nitrogen comparison of methods experiment:

N = 316, a = 0.31 mg/dL, sa = 0.23 mg/dL, b = 1.032, sb = 0.009, sy/x = 0.97 mg/dL, sx = 13.2 mg/dL, r = 0.997, bias = 0.40 mg/dL, SDdiff = 1.08 mg/dL, t = 6.58.

1. What is the proportional systematic error between methods?
2. What is the constant systematic error between methods?
3. Why is it better to use the regression statistics to estimate errors rather than using t-test statistics?
4. What is the 95% confidence interval for the y-intercept? Does the y-intercept differ significantly from the ideal value of 0.0?
5. What is the 95% confidence interval for the slope? Does the slope differ significantly from the ideal value of 1.00?
6. What does the correlation coefficient tell you?