
A History of the OOS Problem
Nov 1, 2007 | By Steven S. Kuwahara
BioPharm International, Volume 20, Issue 11

ABSTRACT

Production lots and tests with a specification range of two standard deviations will produce random rejections about five percent of the time as a result of extreme statistical variation. Techniques based on sound statistical reasoning were developed to deal with out-of-specification (OOS) test results. The temptation to bend the rules and lower the reject rate led to abuses, however. The most common of these was to test a sample repeatedly until a passing result was produced. In 1993, Barr Laboratories lost a lawsuit on this and related points, and the judge's decision led to new interpretations of FDA rules, including the requirement that an investigation be initiated before a replicate sample can be tested. These rules and others incorporated into FDA guidance documents reflect a misunderstanding of important statistical principles.

Dealing with out-of-specification (OOS) test results has been a general manufacturing concern for more than 80 years.1 The problem arises because, with a specification range of two standard deviations, roughly five percent of lots and tests can be expected to fall outside accepted limits even when the product actually meets specifications. The bigger problem is that many manufacturers have incorrectly applied retesting procedures and averages. Such erroneous application of statistical methods is probably due in some cases to poor training in mathematics and in others to unethical efforts to avoid discarding lots. The most significant abuse of statistical methods has been to test lots repeatedly until a sample falls within the specification range, and then to accept the lot on the basis of that one passing result. This method is known as "testing into compliance." This approach to OOS results became a major problem following the 1993 lawsuit between the US government and Barr Laboratories.2 Peculiar judicial conclusions and subsequent US Food and Drug Administration (FDA) actions turned a minor quality control (QC) problem into a major one. In this article, we trace this history, with an emphasis on the 15 years since the Barr Decision. A key part of the story is that poor training in mathematics and a lack of statistical thinking combine to confuse workers.

BACKGROUND

Before discussing the history of the out-of-specification (OOS) problem, it is useful to examine some basic tenets underlying lot release testing and the use of statistics.

All Measurements Are Approximate

Scientists realize that all measurements are uncertain at some level and are taught that the standard deviation is the parameter that estimates the degree of this uncertainty. For the pharmaceutical analyst, this idea is very important when making quality control (QC) measurements, because the analyst must balance the cost of making measurements against the needed level of certainty. Unlike their counterparts in academia, industrial QC analysts are not expected to produce test results that are accurate and precise to the maximum number of significant figures possible. In most cases, the analyst's supervisors will not provide the equipment or the time to make measurements of that type, but only what is necessary to determine whether a product lot meets specifications.

Of course, the occurrence of OOS results also raises the question of whether the specifications themselves have been properly set. If the specifications are set improperly, we will consistently see OOS results, because the manufacturing process itself cannot meet the specifications that were set for it. This article does not deal with such circumstances, however; the OOS problem addressed here applies to stable and controlled processes with realistic requirements, in which an OOS result is a rare event.

Variability Can Be Measured

The experienced QC scientist knows, when setting specifications, that individual units of a product will vary because of process variations that affect both samples and whole lots. In addition, variation in the test method itself is layered on top of process variations. Therefore, the result of a single test is affected by multiple sources of variation and may be misleading unless the degree of variation arising from the different sources is understood. That is why a specification has a range. The statistically trained analyst tries to understand these variations by testing several replicates to obtain an average (mean, x̄) and a standard deviation (s). The standard deviation is a measure of the variation of the test results and may be expressed as a percentage of the mean (100 × s/x̄); this is called the coefficient of variation (CV). A small CV is taken to indicate a test that is more precise than a test with a large CV.

The usual procedure is to create specifications that allow test results to vary within a range of two standard deviations about the mean. Statistically, this creates a range that captures roughly 95% of expected test results. The problem lies with the remaining 5%. Approximately 5% of the time, a confluence of random events can produce a test result that falls outside the 95% range of the specification. This can happen as a result of what is known as extreme statistical variation, even if nothing is wrong with the product lot or process. In rare instances, extreme statistical variation could produce test results indicating that an entire product lot is OOS; more commonly, however, the OOS result is associated with a single test sample.
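As a concrete illustration of these quantities, the short Python sketch below computes the mean, standard deviation, CV, and a two-standard-deviation specification range for a set of replicate results, and then simulates how often an acceptable lot would still produce an OOS result purely by chance. The replicate values and units are invented for the example; they are not drawn from the article.

```python
# Illustrative sketch only: the replicate values below are hypothetical.
import random
import statistics

# Replicate assay results (e.g., percent of label claim) from an acceptable lot
results = [99.2, 100.4, 98.7, 101.1, 100.0, 99.5]

mean = statistics.mean(results)
s = statistics.stdev(results)      # sample standard deviation
cv = 100 * s / mean                # coefficient of variation, in percent

# Specification set at two standard deviations about the mean
lower, upper = mean - 2 * s, mean + 2 * s
print(f"mean={mean:.2f}  s={s:.2f}  CV={cv:.1f}%  spec=({lower:.2f}, {upper:.2f})")

# Simulate single tests on a lot whose true mean and variability match the above:
# a +/- 2 s range excludes about 4.6% of results (commonly rounded to 5%)
# purely by chance, even though nothing is wrong with the lot.
trials = 100_000
oos = sum(1 for _ in range(trials) if not (lower <= random.gauss(mean, s) <= upper))
print(f"false OOS rate ~ {100 * oos / trials:.1f}%")
```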

To lessen the effect of extreme statistical variation, most QC analysts make their measurements using replicates. The averaging of replicates is a method for controlling the effects of variability.1 The procedures for calculating the number of replicates required for a given level of risk and test method variation have been known for a long time.2 It is also well known that, when a 95% interval is used for a specification range, the 5% possibility of obtaining an OOS result (even on an acceptable product, because of extreme statistical variation) can be dealt with by repeating the test on the same sample or set of samples. The idea behind this approach is that if an OOS test result is caused by an extreme statistical variation that occurs 5% of the time, then there is a 95% probability that the retest will not show the effects of this extreme variation. To a certain extent, this procedure led to what FDA has called "reflexive retesting." Used properly, however, it is a valid approach; only when abused does it truly become "reflexive retesting."

THE FALLACY OF TESTING INTO COMPLIANCE AND THE OOS PROBLEM

The OOS problem did not arise from reflexive retesting, however, but rather from an incorrect extension of the procedure that led to "testing into compliance," essentially a consequence of the fact that management hates to reject a batch. "Testing into compliance" resulted from a reversal of the thinking that originally led to reflexive retesting. In "testing into compliance," an unethical manufacturer hopes that even a bad lot will produce a passing test result as a result of extreme statistical variation. Consequently, failing test results are ignored and retests are ordered until extreme variation produces a passing result; that result is accepted and the lot is released on that basis. Instances are known in which seven or eight retests were ordered in an attempt to obtain a passing result. If a company actually believes that such a passing result shows that the product is of good quality, it is engaging in fallacious reasoning. There is no reason to believe that the results obtained from the retests are really different from the original test result. In fact, a result from a retest is nothing more than another member of the population of test results generated by random variation. The fact that a passing test result is pleasing to management does not make it more valid than a result that indicates a failure to meet a specification. From a statistical point of view, as long as these results arise from properly performed tests on the same sample, they are all members of a population of test results, each of which represents a legitimate estimate of the property specified.
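The fallacy is easy to quantify. The Python sketch below is a hedged illustration with invented numbers, not anything taken from the article or the Barr case: it assumes a hypothetical per-test probability p that a truly unacceptable lot passes by chance, and shows that the probability of obtaining at least one passing result within n retests, 1 - (1 - p)^n, climbs quickly toward certainty as retests accumulate.

```python
# Illustrative sketch only: the single-test pass probability is hypothetical.
# If one test of a truly unacceptable lot passes by chance with probability p,
# the chance that "testing into compliance" succeeds within n retests is
#   P(at least one pass) = 1 - (1 - p) ** n

def prob_testing_into_compliance(p_single_pass: float, n_retests: int) -> float:
    """Probability that at least one of n independent retests passes by chance."""
    return 1 - (1 - p_single_pass) ** n_retests

for n in (1, 2, 4, 8):
    print(n, round(prob_testing_into_compliance(0.30, n), 3))

# Even with only a 30% chance of a chance pass per test, eight retests give a
# ~94% chance of releasing the bad lot on the strength of one "passing" result.
```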

The Proper Use of Retesting

There is a legitimate and proper way to use retesting, however. If a test shows that a batch is OOS, the trained QC analyst considers three possibilities:

1. Has process variation created a whole lot that is OOS?
2. Has process variation created a single sample that is OOS?
3. Did the OOS test result occur because the test was performed incorrectly?

Therefore, when confronted with an OOS test result, whether it is a single number or an average, the logical next step is to perform a retest. In some situations, sample limitations or questions about the validity of the original sample make it necessary to perform the retest on a new sample from the lot. In such cases, the analyst should conduct the retest under conditions in which it can be considered a legitimate test of the lot, so that there is no reason to believe the retest is any less valid than the original test. If the result of the retest confirms the initial OOS conclusion, the analyst should accept the failure and reject the lot. If the retest shows a passing result, however, the analyst is in a quandary, because both results are equally valid. This situation, in which the analyst is faced with opposite conclusions drawn from equally valid results, is common when the initial OOS result is caused by extreme statistical variation. The experienced analyst knows that additional retests are necessary to reach a high level of confidence that the lot is really acceptable. The OOS problem arose because of companies whose managers would immediately accept a passing result and discard a previous failing result when there was no scientifically defensible reason for doing so. This ended up in court, in the case of United States v. Barr Laboratories.

THE BARR DECISION

In 1993, Barr Laboratories was sued by the US government (i.e., the US Food and Drug Administration) over a whole set of issues, including the way the company dealt with OOS results.3–5 Barr lost, and the judge who heard the case, Judge Wolin, issued a ruling commonly referred to as the Barr Decision. The Barr Decision turned the OOS problem into a major problem for the QC laboratory by creating a regulatory requirement that, following an OOS result, an investigation must be initiated before any retesting can be done. Consequently, OOS results arising from random variation must be investigated before any action (i.e., retesting) can be taken to decide whether or not the result is in fact a random event. This creates additional work for QC laboratories and an intense desire to simplify investigations by blaming OOS results on laboratory error. It can also create situations in which retesting leads to testing a product into compliance under repeated claims of laboratory error; an outside observer might conclude that the laboratory is not competent to perform the test.

Most QC supervisors who have received basic statistical training know that statistical formulas can be used to calculate the proper number of replicates needed to overcome a single failing result. The number of replicates is based on previous data concerning the variability of the product and test method. In the Barr Decision, however, the judge offered the opinion that seven passing results are needed to overcome one OOS result. This caused a number of companies to adopt a "seven replicate rule" when confronted with an OOS test result. That procedure, and the testimony that originally led to the judge's conclusion, were completely without scientific foundation.
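By way of contrast, one standard way such a replicate-number calculation is framed is the classical sample-size estimate for a mean, n ≥ (z·s/E)², where s is the method's standard deviation and E is the largest acceptable error in the reported average. The Python sketch below only illustrates that textbook formula with invented numbers; it is an assumption-laden example, not the procedure used by any party in the Barr case.

```python
# Illustrative sketch only: the standard deviation and allowable error are hypothetical.
# Classical sample-size estimate for a mean: n >= (z * s / E)^2, where
#   z = 1.96 for ~95% confidence, s = method standard deviation,
#   E = largest acceptable error in the reported mean.
import math

def replicates_needed(s: float, allowable_error: float, z: float = 1.96) -> int:
    """Smallest n for which the mean of n replicates has a margin of error <= allowable_error."""
    return math.ceil((z * s / allowable_error) ** 2)

# Example: a method with s = 1.5 (percent of label claim) and an allowable error
# of 1.0 needs about 9 replicates. The "seven passing results" figure in the
# Barr Decision has no comparable derivation behind it.
print(replicates_needed(s=1.5, allowable_error=1.0))
```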

THE 1998 DRAFT GUIDANCE THAT FOLLOWED THE BARR DECISION

Following the 1993 Barr Decision, the OOS problem was formalized by FDA in a draft guidance document, Out of Specification (OOS) Test Results for Pharmaceutical Production, issued in September 1998.6 Although it was a draft document, for many years it was the only guidance available on this subject. Like the Barr Decision, the FDA's draft OOS guidance required that any single OOS result be investigated. The guidance also introduced procedures for investigating OOS test results. It made clear recommendations for the actions to be taken during initial laboratory investigations and during formal investigations (with reporting requirements) for situations in which the OOS result cannot be attributed to laboratory error. The recommendations detailed the elements required for the investigations and for the reports that would be generated, and described the responsibilities of the analyst who obtains an OOS test result and of that analyst's supervisor.

Misunderstanding Averaging

In addition to the investigation requirement, the guidance document incorporated many other elements of the Barr Decision. Thankfully, it did not perpetuate the erroneous idea of seven passing test results overcoming one failing result. Unfortunately, other odd ideas, particularly those related to averaging, were maintained.
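As one hedged illustration of why averaging is a recurring concern in this area (the excerpt ends here and does not spell out the specific practices at issue), the sketch below uses invented replicate values and specification limits to show how an average of replicates can sit comfortably inside a specification even though an individual result is OOS, which is exactly the kind of information an average can hide.

```python
# Illustrative sketch only: values and specification limits are hypothetical.
import statistics

spec_low, spec_high = 95.0, 105.0        # hypothetical specification (percent of label claim)
replicates = [96.0, 97.5, 104.0, 93.5]   # one individual result (93.5) is OOS

mean = statistics.mean(replicates)
verdict = "within" if spec_low <= mean <= spec_high else "outside"
print(f"mean = {mean:.2f} -> {verdict} specification")

for r in replicates:
    status = "ok" if spec_low <= r <= spec_high else "OOS"
    print(f"replicate {r:.1f}: {status}")

# The reported average (97.75) passes even though one replicate does not,
# showing how averaging alone can mask an OOS individual result.
```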
