
Agricultural & Applied Economics Association

Significance Levels-0.05, 0.01, or? Author: Lester V. Manderscheid. Source: Journal of Farm Economics, Vol. 47, No. 5, Proceedings Number (Dec., 1965), pp. 1381-1385. Published by: Oxford University Press on behalf of the Agricultural & Applied Economics Association. Stable URL: http://www.jstor.org/stable/1236396

Agricultural & Applied Economics Association and Oxford University Press are collaborating with JSTOR to digitize, preserve and extend access to Journal of Farm Economics.


CONTRIBUTED PAPERS: MARKETING, PRICES, AND CONSUMPTION


CHAIRMAN: MARGUERITE BURK, UNIVERSITY OF MINNESOTA

Significance Levels-0.05, 0.01, or ?

LESTER V. MANDERSCHEID

Most statistically oriented research published in the JOURNAL OF FARM ECONOMICS includes tests of statistical hypotheses. In most cases a significance level of either 5 or 1 percent is cited. But a few use 10 or even 20 percent. Why the difference? Is a 1-percent level "better" than a 5-percent level? I will argue that choice of statistical-significance levels is not arbitrary but rather is, or at least should be, a deliberate choice. The basic purpose of this paper is to integrate decision theory (management, if you prefer) with statistical hypothesis testing. The discussion will be restricted to relatively simple cases so as to minimize mathematical confusion. Once the concepts are clarified, the mathematically sophisticated reader may pursue the more realistic cases.

Let me begin with a review of some elementary ideas to insure that we are all thinking in the same terms. The basic problem in hypothesis testing involves choosing which of two hypotheses to use as a basis for action. Hypotheses may be simple or very complex. In a simple case, we might hypothesize that two populations have equal means (H) or alternatively that the mean of one population is two units larger than the mean of the second (HA). More formally:

H: μ1 = μ2
HA: μ1 - μ2 = 2.

A t test can be applied to a set of data to determine whether we accept H, or reject H and accept HA. What significance level should we use? Is a 1-percent significance level better than a 5-percent significance level? Why would we consider the 1-percent significance level better? Because the probability of rejecting H when it is true is reduced to 1 percent. And this is obviously better than using a test which permits a 5-percent probability of rejecting H when it is true. Or is it? The probability of accepting H when it is false must also be considered. Thus, we have two types of errors:

Type I: Rejecting H when it is true.
Type II: Accepting H when it is false.


LESTER V. MANDERSCHEID is associate professor of agricultural economics, Michigan State University.


In a statistician's Utopia one can simultaneously minimize the probability of Type I error (α) and the probability of Type II error (β). A statistician in Utopia would obviously set α = β = 0. But as α is decreased (the significance level moved from 5 percent to 1 percent), β is increased. The exact relationship between α and β depends on the underlying probability distributions for the test statistic and on the hypothesis and alternative hypothesis. This relationship is illustrated in most introductory statistics textbooks. For any particular test, we know the test statistic and the hypotheses. From this information we can calculate the β associated with any particular α.1 The statistical tests recommended in standard textbooks or reference books are suggested because β is minimized for given α by these tests. For example, the t test is recommended for testing H: μ1 = μ2 against HA: μ1 - μ2 = 2 under rather general conditions because for any value of α (any significance level) the probability of a Type II error, β, is as small as possible. In some cases power functions or operating-characteristic curves are exhibited to illustrate this fact.

Unfortunately, very few standard statistics books go much further in helping us select a significance level. Two quotations from the more helpful books will suffice to make the point:

The choice of a level of significance α will usually be somewhat arbitrary since in most situations there is no precise limit to the probability of an error of the first kind that can be tolerated. It has become customary to choose for α one of a number of standard values such as .005, .01, or .05. There is some convenience in such standardization since it permits a reduction in certain tables needed for carrying out various tests. Otherwise there appears to be no particular reason for selecting these values. In fact, when choosing a level of significance one should also consider the power that the test will achieve against various alternatives.2

In practice, the final choice of the value for the critical probability represents some compromise between these two risks. It must be arrived at by balancing the consequences of a Type I error against the possible consequences of a Type II error.3

Both quotations emphasize balancing the two types of error. Can we formalize this balancing by use of economic and/or decision-theory criteria? One might consider minimizing a weighted average of the

1 This statement is literally true for simple hypotheses. For complex hypotheses, we can calculate the β for given α for various alternative values of the parameters.
2 E. L. Lehmann, Testing Statistical Hypotheses, New York, John Wiley & Sons, Inc., 1959, p. 61.
3 W. A. Spurr, L. S. Kellogg, and J. H. Smith, Business and Economic Statistics, Homewood, Ill., Richard D. Irwin, 1961, p. 253.


costs, using α and β as weights. Defining the cost of a Type I error as C_I and the cost of a Type II error as C_II, this might be stated as:

Minimize L' = αC_I + βC_II.

We thus consider L' as a "loss function" and minimize it by choosing appropriate α and β. More properly L' should be labeled an expected-loss function, but simplicity suggests the term "loss function." Unfortunately, the loss function involves mathematical difficulties: α is calculated on the basis that H is true while β is calculated on the basis that HA is true. We are thus adding together "unlike" items. But there is also another consideration. Hodges and Lehmann phrase it thus:

. . . the reasonable compromise in choosing the critical value will depend on the consequences of the two errors. However, it also depends on the circumstances of the problem in another way. If the null hypothesis is very firmly believed, on the basis of much past experience or of a well-verified theory, one would not lightly reject it and hence would tend to use a very small α. On the other hand, a larger α would be appropriate for testing a null hypothesis about which one is highly doubtful prior to the experiment.4

Fortunately this suggestion provides a solution to some of the mathematical difficulties, at least to the person willing to accept some of the "Bayesian" approach to statistics. Define as follows:

P_I: Prior probability that H is true,
P_II: Prior probability that HA is true.

These prior probabilities reflect the investigator's beliefs prior to looking at the data. If we accept the idea that prior probabilities exist, they provide a link for putting α and β probabilities on a common basis. The resulting loss function is as follows:

L = P_I αC_I + P_II βC_II.

Choosing α and β so as to minimize L, given the values of P_I, P_II, C_I, and C_II, leads to an "optimum" or "best" significance level for the person whose decision rule is to minimize expected loss.

Note that this discussion assumes a fixed sample size. Permitting sample size to vary allows calculation of the sample size needed to achieve given levels of the loss function rather than minimizing it for given sample size. A similar analysis can be pursued by the person who prefers a mini-max or some other decision rule. The significance level will depend on the decision rule but the conceptual arguments are the same.

4 J. L. Hodges, Jr., and E. L. Lehmann, Basic Concepts of Probability and Statistics, San Francisco, Holden-Day, Inc., 1964, p. 326.
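The minimization of L can be sketched numerically for the simple pair of hypotheses used earlier: a grid search over α, with β computed for a one-sided z test of H: μ1 = μ2 against HA: μ1 - μ2 = 2 under normal populations with known σ. The function names and parameter values (σ = 4, n = 25) are hypothetical illustrations, not part of the paper.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_phi(p):
    """Inverse normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def beta_for(alpha, delta=2.0, sigma=4.0, n=25):
    """Type II error probability for the one-sided z test.

    Under HA the statistic is centred at delta/se, so
    beta = Phi(z_crit - delta/se); note that beta rises as alpha falls.
    """
    se = sigma * math.sqrt(2.0 / n)
    return phi(inv_phi(1.0 - alpha) - delta / se)

def optimal_alpha(p1, p2, c1, c2):
    """Grid search for the alpha minimizing L = p1*alpha*c1 + p2*beta*c2."""
    best = min(
        (p1 * (a / 1000.0) * c1 + p2 * beta_for(a / 1000.0) * c2, a / 1000.0)
        for a in range(1, 1000)
    )
    return best[1]  # the loss-minimizing significance level
```

With equal priors and equal error costs (p1 = p2 = 1/2, c1 = c2 = 1), the search settles near α ≈ 0.19 rather than at either conventional level, and β(0.01) > β(0.05) exhibits the tradeoff between the two error probabilities discussed above.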


What are the implications of the loss-function approach for the simple case suggested above, where

H: μ1 = μ2 and HA: μ1 - μ2 = 2?

Suppose that these refer to yields for two varieties of wheat. Suppose all data other than average yield indicate no difference in the two varieties. Then a farmer might well say P_I = P_II = 1/2 and also say that C_I = 0, since if the yields are equal he loses no profit by choosing either variety. However, C_II is 2 bushels per acre, and this can be translated into a dollar amount by using price and acreage. Obviously, the farmer wants to minimize β and doesn't care about α; a significance level of 5 percent is obviously wrong! In fact, he should always choose HA and plant variety 1.

Suppose that the decision maker is the head of a seed company and that development costs for variety 1, a new variety, would be high. The manager then needs some idea of how much he will lose if he develops variety 1 and it is no better (C_I) compared to the loss if he fails to develop it and variety 1 is better (C_II). Further, he will want to consult geneticists and other agronomists to evaluate the prior probabilities rather than assuming P_I = P_II = 1/2. In spite of the extra complications, the manager may still find the loss function a useful device for selecting an appropriate significance level. Others involved (the plant breeder, a rival seed company, etc.) might arrive at still different α levels either because they begin with different prior probabilities or because their estimated costs are different. But this should not worry us. Don't we argue that decision makers need to evaluate their environment, talents, etc. to arrive at a "best" decision?

Some will argue that this approach is interesting in theory but impossible in practice because we cannot estimate C_I and C_II. But if we cannot estimate the costs of an error, should we be testing? A basic purpose of testing is to choose between two acts. If we are choosing acts, we should be able to specify the costs, which can be measured in either monetary or nonmonetary units.
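The farmer's case can be made concrete with a small sketch. The priors and costs follow the text (P_I = P_II = 1/2, C_I = 0, C_II = 2 bushels per acre times price times acreage), while the price, acreage, σ, and n figures are hypothetical.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inv_phi(p):
    """Inverse normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def farmer_loss(alpha, price=2.0, acres=100.0, sigma=4.0, n=25):
    """Expected loss L = P_I*alpha*C_I + P_II*beta*C_II for the wheat farmer.

    C_I = 0 (no cost if the varieties really are equal), and
    C_II = 2 bu/acre * price * acres. beta is for the one-sided z test
    of H: mu1 = mu2 against HA: mu1 - mu2 = 2.
    """
    se = sigma * math.sqrt(2.0 / n)
    beta = phi(inv_phi(1.0 - alpha) - 2.0 / se)
    c_two = 2.0 * price * acres
    return 0.5 * alpha * 0.0 + 0.5 * beta * c_two
```

Because C_I = 0, the expected loss only falls as α rises, so the loss-minimizing rule is to act on HA regardless of what a conventional 5-percent test says, just as the text concludes.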
There remains, however, real difficulty in actually calculating the minimum for the loss function in most cases of practical importance. For example, if we test

H: μ1 = μ2 against HA: μ1 ≠ μ2,


then the value of β depends on the difference between μ1 and μ2; C_II also depends on this difference.5 Thus, L is a complicated mathematical function. Approximate results can be obtained by using "representative values" for μ1 - μ2. The basic conceptual framework is still a valid reasoning device whether one actually carries out the minimization or only approximates it.

Birnbaum6 has argued that researchers test too many hypotheses and fail to specify the likelihood of various parameter values often enough. Short of publishing the likelihood function, one could publish the maximum value of α that would permit rejection of a relevant hypothesis (or maximum α for several hypotheses). One could go further and publish the β associated with several possible values of α. This would permit the decision maker to test, or approximate a test, using his optimum values for α and β.

Summary

Choosing a significance level is not an arbitrary choice between a 5-percent and a 1-percent level. Rather, a conscious choice can be made, a choice grounded in the principles of management and statistical theory. One must consider (1) the costs associated with each type of error, (2) the prior probabilities of the hypothesis and the alternative, and (3) the size of the Type II error associated with each significance level. Incorporating these facts into a decision model yields a "best" significance level. This approach clarifies the relation between testing hypotheses and following actions and helps explain why several decision makers faced with exactly the same observations may reach different decisions.
5 P_I must be interpreted carefully, since the probability of exact equality is undoubtedly near 0. We usually have in mind equality up to some small difference.
6 A. Birnbaum, "On the Foundations of Statistical Inference," J. Am. Stat. Assn. 57:269ff, June 1962.
