By Stephen Lacy and Daniel Riffe
This study views intercoder reliability as a sampling problem. It develops a formula for generating sample sizes needed to have valid reliability estimates. It also suggests steps for reporting reliability. The resulting sample sizes will permit a known degree of confidence that the agreement in a sample of items is representative of the pattern that would occur if all content items were coded by all coders.

Every researcher who conducts a content analysis faces the same question: How large a sample of content units should be used to assess the level of reliability? To an extent, sample size depends on the number of content units in the population and the homogeneity of the population with respect to variable coding complexity. Content can be categorized easily for some variables, but not for other variables. How does a researcher ensure that variations in degree of difficulty are included in the reliability assessment? As in most applications involving representativeness, the answer is probability sampling, assuring that each unit in the reliability check is selected randomly.[1] Calculating sampling error for reliability tests is possible with probability sampling, but few content analyses address this point.

This study views intercoder reliability as a sampling problem, requiring clarification of the term "population." Content analysis typically refers to a study's "population" as all potentially codable content from which a sample is drawn and analyzed. However, this sample itself becomes a "population" of content units from which a sample of test units is randomly drawn to check reliability. This article suggests content samples need to have reliability estimates representing the population. The resulting sample sizes will permit a known degree of confidence that the agreement in a sample of test units is representative of the pattern that would occur if all study units were coded by all coders.
Stephen Lacy is a professor in the Michigan State University School of Journalism, and Daniel Riffe is a professor in the E.W. Scripps School of Journalism at Ohio University. The authors thank Fred Fico for his comments and suggestions. J&MC Quarterly, 963-973.

Background

Reproducibility reliability is the extent to which coding decisions can be replicated by different researchers.[2] In principle, the use of multiple independent coders applying the same rules in the same way assures that categorized content does not represent the bias of one coder. Research methods texts discuss reliability in terms of measurement error resulting from problems in coding instructions, failure of coders to achieve a common frame of reference, and coder mistakes.[3] Few texts or studies address whether the content units tested represent the population of items studied. Often, reliability samples have been selected haphazardly or based on convenience (e.g., the first 50 items to be coded might be used).[4]

Research texts vary in their approach to sampling for reliability tests. Most texts do not discuss reliability in the context of probability sampling and the resulting sampling error. Singletary has noted that reliability checks introduce sampling error when probability samples are used.[5] Kaid and Wadsworth[6] suggest that "levels of reliability should be assessed initially on a subsample of the total sample to be analyzed before proceeding with the actual coding." How large a subsample? "When a very large sample is involved, a subsample of 5-7 percent of the total is probably sufficient for assessing reliability." Wimmer and Dominick[7] urge analysts to conduct a pilot study on a sample of the "content universe" and, assuming satisfactory results, then to code the main body of data. Then a subsample, "probably between 10% and 25%," should be reanalyzed by independent coders to calculate overall intercoder reliability. Weber's[8] only pertinent recommendation is that "The best test of the clarity of category definitions is to code a small sample of the text." Stempel concludes that reliability estimates "should be based on several samples of content from the material in the study"[9] and that a "minimum standard would be the selection of three passages to be coded by all coders."[10] Krippendorf argues that probability sampling to get a representative sample is not necessary.[11]

Yet early inquiries into reliability testing did address probability sampling. Scott's[12] article introducing his pi included an equation accounting for sampling error, though that component was dropped from subsequent references to pi in statistics and content analysis texts. Cohen[13] discussed sampling error while introducing kappa. An early article by Janis, Fadner, and Janowitz[14] comparing reliability of different coding schemes provided reliability coefficients with confidence intervals.

Schutz[15] dealt with measurement error and sample size. He explored the impact of "chance agreement" on reliability measures: i.e., some coder agreements could occur by chance, though the existence of coding criteria reduces the influence chance could have.[16] Schutz offered a formula that enabled a researcher to set a minimal acceptable level of reliability and then compute the level that must be achieved in a reliability test to account for chance agreements. For example, if the minimal acceptable level of agreement is 80%, the researcher might need to achieve a level as high as 83% in the reliability test in order to control for chance agreement. The formula allows the researcher to be certain that the observed sample reliability level is high enough that, even if chance agreement could be eliminated,[17] the "remainder" level of agreement would exceed the acceptable level. Schutz incorporated sampling error into his formula, but the article concentrated on measurement error due to chance agreement.

Sampling Error and Estimating Sample Size: A Formula

The goal of the following analysis is to generate a formula for estimating simple random sample sizes for reliability tests. The formula can be used to generate samples with confidence intervals that tell researchers if the minimal acceptable reliability figure has been achieved. For example, if the reliability coefficient must equal or exceed .80 to be acceptable, the sample used must have a confidence interval that does not dip below .80. If, in a given test, the confidence interval does dip below .80, the researcher cannot conclude that the "true" reliability of the population equals or exceeds the minimal acceptable level.
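The acceptance rule described in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `meets_minimum` and its parameters are hypothetical, and the margin is taken as a given value rather than derived from a sample.

```python
def meets_minimum(sample_agreement: float, margin: float, minimum: float) -> bool:
    """Return True when the negative side of the confidence interval
    (sample agreement minus the margin) does not dip below the minimum.

    All three arguments are proportions, e.g. 0.90 for 90% agreement.
    The names are illustrative and do not come from the article.
    """
    lower_bound = sample_agreement - margin
    return lower_bound >= minimum


# 90% sample agreement with a 5-point margin against a .80 minimum.
print(meets_minimum(0.90, 0.05, 0.80))

# 82% sample agreement with the same margin dips below .80.
print(meets_minimum(0.82, 0.05, 0.80))
```

Only the lower bound is examined, which is why the analysis that follows relies on a one-tailed interval.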

The reason for a one-tailed confidence interval is illustrated in Figure 1. The minimal acceptable agreement level is 80%, and the sample level of agreement is 90%. The resulting area of concern is the gray area between 90% and 80%, which involves the negative side of the confidence interval. A researcher's conclusion of acceptable reliability is not affected by whether the population agreement exceeds 5% on the positive side because acceptance is based on a minimal standard, which would fall on the negative side of the interval.

FIGURE 1
Why Reliability Confidence Interval Uses a One-Tailed Test
[Figure: confidence interval continuum for level of agreement in coding decisions, running from 0% to 100%, marking the minimal acceptable level (80%), the sample level of agreement (90%), and the -5% and +5% bounds. The relevant area for determining acceptability of the reliability test lies on the negative side.]

Survey researchers use the formula for standard error of proportion to estimate a minimal sample size necessary to infer to the population at a given level of confidence. A similar procedure is used here. We start with the equation for the standard error of proportion and add the finite population correction (FPC). The FPC is used when the sample makes up 10% or more of the population. It reduces the standard error but is often ignored because it has little impact when a sample has a small proportion of the population. For simplicity, this analysis uses "simple agreement" (total agreements divided by total decisions) with a dichotomous decision (the coders either agree or disagree). The resulting formula is:

$$SE = \sqrt{\frac{PQ}{n-1}}\,\sqrt{\frac{N-n}{N-1}}$$   (Equation 1)

But with the radical removed and the distributive property applied, the formula becomes:

$$n = \frac{(N-1)(SE)^2 + PQN}{(N-1)(SE)^2 + PQ}$$   (Equation 2)

where N = the population size (number of content units in the study),[18] P = the population level of agreement, Q = (1 - P), and n = the sample size for the reliability check.
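As a rough check that Equation 2 is Equation 1 solved for n, the two can be written as a pair of Python functions and run back to back. The function names are hypothetical and the inputs (P = .90, n = 92, N = 1,000) are chosen only for illustration.

```python
import math


def standard_error(p: float, n: float, N: float) -> float:
    """Equation 1: standard error of a proportion with the
    finite population correction (FPC)."""
    q = 1.0 - p
    return math.sqrt(p * q / (n - 1)) * math.sqrt((N - n) / (N - 1))


def sample_size(p: float, se: float, N: float) -> float:
    """Equation 2: Equation 1 rearranged to give n."""
    q = 1.0 - p
    return ((N - 1) * se ** 2 + p * q * N) / ((N - 1) * se ** 2 + p * q)


# Start from a sample of 92 units out of 1,000 with P = .90,
# compute its standard error, then recover n from that standard error.
se = standard_error(0.90, 92, 1000)
n_back = sample_size(0.90, se, 1000)
print(round(se, 4), round(n_back, 6))
```

Feeding the standard error from Equation 1 into Equation 2 returns the original sample size, which is the sense in which the second equation "solves for n."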

Equation 2 allows the researcher to solve for n, which represents the number of test units. In order to solve for n, the researcher must follow five steps:

Step 1. The first step is to determine N (the number of content units being studied). It usually has been determined before reaching the point of checking for the reliability of the instrument.

Step 2. The researcher must set a minimal level of intercoder reliability for the test units. Content analysis texts warn that an acceptable level of intercoder reliability should reflect the nature and difficulty of categories and content.[19] For example, a minimum level of 80% simple agreement is often used with new coding procedures, a level consistent with minimal requirement recommendations by Krippendorf and the analysis of Schutz.[20] But this level is lower than recommended by others.[21]

Step 3. The level of agreement in coding all study units (P) must be estimated. This is the level of agreement among all coders if they coded every content unit in the study. This step is the most difficult step because it involves estimating the unknown population reliability figure. Two approaches are possible. The first is to estimate P based on a pretest of the coding instrument and on previous research. The second is to assume a P that exceeds the minimal acceptable reliability figure by a certain level. The second approach creates the question: How many percentage points above the minimal reliability level should P be? For this analysis, it will be assumed that the population level should be set at 5 percentage points above the minimal acceptable level of agreement. For example, if the minimal acceptable reliability figure is .80, then the assumed P would be .85. Five percentage points is useful because it is consistent with a confidence interval of 5%. If the reliability figure equals or exceeds .85, chances are 95 out of 100 that the population (content units in the study) figure equals or exceeds .80.

Step 4. The researcher must determine the acceptable level of probability for estimating the confidence interval. We assume most content analysts will use the same levels of probability for the sampling error in intercoder reliability checks as are used with most sampling error estimates, i.e., 95% (p=.05) and 99% (p=.01) levels of probability.

Step 5. Once the acceptable probability level is determined, the formula for confidence intervals is used to calculate the standard error (SE). The formula is:

$$\text{Confidence interval} = Z\,(SE)$$   (Equation 3)

Z is the standardized point on the normal curve that corresponds with the acceptable level of probability.

Once the five steps have been taken, the resulting figures are plugged into Equation 2 and the number of units needed for the reliability test is determined.

A Simulation

Assume an acceptable minimal level of agreement of 85% and P of 90% in a study using 1,000 content units (e.g., newspaper stories). The desired level of certainty is the traditional .05 level. Using the normal curve, we find that the one-tailed Z-score[22] associated with .05 is 1.64. Then we solve for standard error (SE), using the formula:

$$\text{Confidence interval} = Z\,(SE)$$   (Equation 3)
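Equation 3 can be rearranged to SE = confidence interval / Z. The sketch below, which uses Python's standard-library `statistics.NormalDist`, shows where a one-tailed Z-score comes from and how the standard error follows from it. The function names are hypothetical. The unrounded one-tailed Z for 95% is slightly above the 1.64 used in the text, so the text's rounded value is passed in explicitly.

```python
from statistics import NormalDist


def one_tailed_z(confidence: float) -> float:
    """Z-score that leaves (1 - confidence) in a single tail
    of the standard normal curve."""
    return NormalDist().inv_cdf(confidence)


def standard_error_from_interval(interval: float, z: float) -> float:
    """Equation 3 rearranged: SE = confidence interval / Z."""
    return interval / z


z_exact = one_tailed_z(0.95)  # unrounded value; the text rounds it to 1.64
se = standard_error_from_interval(0.05, 1.64)

print(round(z_exact, 3))
print(round(se, 2))
```

Rounding the standard error to two decimals gives the .03 that the simulation carries forward.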

Our example confidence interval is 5% and our desired level of probability is 95%. So,

$$.05 = 1.64\,(SE) \quad \text{or} \quad SE = .05/1.64 = .03$$

Recall that our formula for sample size begins with SE,

$$SE = \sqrt{\frac{PQ}{n-1}}\,\sqrt{\frac{N-n}{N-1}}$$   (Equation 1)

and becomes

$$n = \frac{(N-1)(SE)^2 + PQN}{(N-1)(SE)^2 + PQ}$$   (Equation 2)

Now we can plug in our numbers and determine how large a random sample we will need to achieve at minimum the standard 85% reliability agreement. Our confidence interval is .05, and the resulting SE at the p = .05 confidence level was .03, squared to .0009. Thus, with 1,000 study units and an assumed true agreement level of 90%, PQ = .90 (.10) or .09. So Equation 2 looks like

$$n = \frac{(999)(.0009) + .09(1000)}{(999)(.0009) + .09} = \frac{90.899}{.989} = 91.9$$

In other words, if we achieve at least 90% agreement in a simple random sample of 92 test units (rounded from 91.9) taken from 1,000 study units, chances are 95 out of 100 that 85% or better agreement would exist if all study units were coded by all coders and reliability measured.

Table 1 solves Equation 2 for n with three hypothetical levels of P (85%, 90%, and 95%) and with numbers of study units equal to 100, 250, 500, 1,000, 5,000, and 10,000. The sample sizes are based on confidence interval with 95% probability. Table 2 presents numbers of test units for 99% level of probability. Table 2 assumes the same agreement levels as Table 1. The figures for a given number of study units and agreement level are higher in Table 2 because they represent the increased number of test units needed to reach the higher level of probability. The table demonstrates how higher P levels and smaller numbers of study units affect the test units needed. However, the number of test units needed decreases much faster with higher levels of P than with the decline in the number of study units.[23]

The main problem in determining an appropriate sample of test units is estimating the level of P. The higher the assumed percentage, the smaller will be the sample. This might produce an incentive to overestimate this level because it would reduce the amount of work in the reliability test. Assuming a study unit level of 5 percentage points above the minimal level will control for this incentive because the higher the assumed level, the higher will be the minimal acceptable level of reliability.
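The worked example can be reproduced with a short Python sketch. The function name `sample_size` is hypothetical. The sketch follows the text in using the rounded standard error of .03, so an unrounded standard error would give somewhat different sample sizes. Only the 90% column of Table 1 is regenerated here.

```python
def sample_size(p: float, se: float, N: int) -> float:
    """Equation 2: number of test units for assumed agreement p,
    standard error se, and N study units."""
    q = 1.0 - p
    return ((N - 1) * se ** 2 + p * q * N) / ((N - 1) * se ** 2 + p * q)


# Worked example: P = .90, SE = .03 (rounded, as in the text), N = 1,000.
n = sample_size(0.90, 0.03, 1000)
print(round(n, 1))

# The 90% agreement column for the six population sizes used in the tables.
POPULATIONS = [10_000, 5_000, 1_000, 500, 250, 100]
row_90 = [round(sample_size(0.90, 0.03, N)) for N in POPULATIONS]
print(row_90)
```

Because the standard error enters the formula squared, small differences in how it is rounded shift the result by a unit or two, which is worth keeping in mind when comparing computed values against published tables.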

A problem can occur if the level of agreement in the test units generates a confidence interval that does dip below the minimal acceptable level of reliability. For example, if the test units' reliability level equals, say, .86, the confidence interval (.86 minus .05) dips below the minimal acceptable level of .85. This indicates that the reliability figure for the population of study units might not exceed the acceptable level of .85. Under this condition, the researcher could randomly select more content units for the reliability check or accept a lower minimal level of agreement. If the first approach is used, the larger sample size can be determined by plugging the test units' reliability level (.86) into Equation 2 as P. Additional units could be randomly selected and added to the original test units to calculate a new reliability figure and confidence interval based on a larger sample.

TABLE 1
Number of Content Units Needed for Reliability Test, Based on Various Population Sizes, Three Assumed Levels of Population Intercoder Agreement, and a 95% Level of Probability

                     Assumed Level of Agreement in Population (Study Units)
Population Size
(Study Units)            85%      90%      95%
10,000                   141      100       54
5,000                    139       99       54
1,000                    125       92       52
500                      111       84       49
250                       91       72       45
100                       59       51       36

Note: The numbers are taken from the equation for standard error of proportions and are adjusted with the finite population adjustment. The equation is

$$S.E. = \sqrt{\frac{P \times Q}{n-1}} \times \sqrt{\frac{N-n}{N-1}}$$

where P = percentage of agreement in population, Q = (1-P), N = the population size, and n = the sample size. The standard error was used to find a sample size that would have sampling error equal to or less than 5% for the assumed population level of agreement.

Limitations of the Analysis

This analysis may seem limited because it is: (a) based on a dichotomous decision, (b) with two coders, and (c) it uses a simple agreement measure of reliability. However, the first two are not limitations. Sampling error is not affected by the number of coders, who introduce measurement error after the reliability sample is selected. Neither is using a dichotomous decision a problem. Equation 2 would easily fit nominal content with more than two categories.[24] However, the impact of more complex coding schemes might affect the representativeness of a reliability sample if some of the categories occur infrequently. These infrequent categories have less likelihood of being in the sample, which means the full range of categories has not been tested.

If this is the case, the researcher should randomly stratify the test units, select a larger number of test units, or both, as discussed in note 11.

The use of simple agreement in reliability test is not a problem either. The representativeness of a sample of test units is not dependent on the test applied. At least three other measures of reliability, besides agreement among coding pairs, are available for nominal level data. These are Scott's pi,[25] Krippendorf's alpha,[26] and Cohen's kappa.[27] These three measures were developed to deal with measurement error due to chance and not with error introduced through sampling. Several discussions of the relative advantages and disadvantages of these measures are available.[28]

Equation 2 is limited, however, to nominal data because it is based on the standard error of proportions. A parallel analysis to this one for interval and ratio level categories could be developed using the standard error of means.

TABLE 2
Number of Content Units Needed for Reliability Test, Based on Various Population Sizes, Three Assumed Levels of Population Intercoder Agreement, and a 99% Level of Probability

                     Assumed Level of Agreement in Population (Study Units)
Population Size
(Study Units)            85%      90%      95%
10,000                   271      193      104
5,000                    263      190      103
1,000                    218      165       95
500                      179      142       87
250                      132      111       75
100                       74       67       52

Note: The numbers are taken from the equation for standard error of proportions and are adjusted with the finite population adjustment. The equation is

$$S.E. = \sqrt{\frac{P \times Q}{n-1}} \times \sqrt{\frac{N-n}{N-1}}$$

where P = percentage of agreement in population, Q = (1-P), N = the population size, and n = the sample size. The standard error was used to find a sample size that would have sampling error equal to or less than 5% for the assumed population level of agreement.

Using the Tables

Some beginning researchers might struggle with the task of making assumptions and solving the equations. However, the two tables can be useful for selecting a sample of test units to establish equivalence reliability. First, the researcher should start by selecting the level of probability appropriate for the study. If 95%, use Table 1; if 99%, use Table 2. Second, if the variables are straightforward counting measures, such as source of newspaper stories, take the assumed agreement level among study units to be 90%. If the variables involve coding meanings of content, such as political
leaning of news stories,[29] take the assumed agreement level of 85% among study units. Third, find the population size in the tables that is closest but greater than the size of the study units being analyzed. Take the number of test units from the table.

For example, a researcher studying coverage of economic news in network newscasts has 425 stories from 40 newscasts selected from the previous year. Variables involve numbers of stories devoted to various types of economic news. Accepting a confidence level of 95%, the researcher would look down the 90% level of agreement column in Table 1 until she or he came to a population size of 500 (the closest sample size that is greater than 425). The number of units needed for the reliability check equals 84.

An inevitable question from graduate students conducting their first content analysis is how many items to use in the intercoder reliability test. This article has attempted to answer this question and to suggest a procedure for estimating sampling error in reliability samples. Of course, for sampling error to have meaning, the sample must be a probability sample.

The role of selection bias in determining reliability coefficients seems to have gotten lost since earlier explorations of reliability. The study of content needs a more rigorous way of dealing with potential selection bias. This bias can only be estimated through probability sampling. Using probability samples and confidence intervals for reliability figures would help add rigor. When reporting reliability level, confidence intervals should be reported with both measures of reliability. Simple agreement confidence intervals can be calculated using the standard error of proportions. The confidence intervals for Scott's pi and Cohen's kappa can be calculated by referring to the formulas presented in the original articles for these coefficients.

The analysis in this article is based on simple random sampling for reliability tests. However, under some circumstances, other forms of probability sampling, such as stratified random sampling, might be preferable for selecting reliability test samples. For example, if certain categories of a variable may make up a small proportion of the content units being studied, the researcher might oversample these categories. The formula used here is the unbiased estimator for simple random samples; samples based on proportion or stratification will require adjustments available in many statistics books.[30]

NOTES

1. Guido H. Stempel III, "Content Analysis," in Research Methods in Mass Communication, ed. Guido H. Stempel III and Bruce H. Westley (Englewood Cliffs, NJ: Prentice-Hall, Inc., 1981), 127.
2. See Klaus Krippendorf, Content Analysis: An Introduction to Its Methodology (Beverly Hills, CA: Sage, 1980), 130-32. Reproducibility reliability, also called equivalence reliability, differs from stability and accuracy reliability. Stability concerns the same coder testing reliability of the same content at two points in time. Accuracy reliability involves comparing coding results with some known standard. The term reliability is used here to refer to reproducibility.
3. Stempel, "Content Analysis."
4. Stephen Lacy and Daniel Riffe, "Sins of Omission and Commission in
Mass Communication Quantitative Research," Journalism Quarterly 70 (spring 1993): 126-32.
5. Michael Singletary, Mass Communication Research (NY: Longman, 1994), 297.
6. Lynda Lee Kaid and Anne Johnston Wadsworth, "Content Analysis," in Measurement of Communication Behavior, ed. Philip Emmert and Larry L. Barker (NY: Longman, 1989), 208.
7. Roger D. Wimmer and Joseph R. Dominick, Mass Media Research: An Introduction, 3d ed. (Belmont, CA: Wadsworth, 1991), 173.
8. Robert Philip Weber, Basic Content Analysis, 2d ed. (Newbury Park, CA: Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-075), 23.
9. Stempel, "Content Analysis," 128.
10. Guido H. Stempel III, "Statistical Designs for Content Analysis," in Research Methods in Mass Communication, ed. Stempel and Westley, 143.
11. See Krippendorf, Content Analysis, 146. Krippendorf argues that reliability samples "need not be representative of the population characteristics" but "must be representative of all distinctions made within the sample of data at hand" (emphasis in original). He suggests purposive or stratified sampling to ensure that "all categories of analysis, all decisions specified by various forms of instructions, are indeed represented in the reliability data regardless of how frequently they may occur in the actual data" (emphasis added). This procedure might create problems when content has infrequent categories that are difficult to identify. It could require quota sampling, or selecting and checking content units for these infrequent categories until a proportion of the test units equals the estimated proportion of the infrequent categories. No one would argue that all variables need to be tested in a reliability check, but a large number of categories within a variable (e.g., a twenty-six category scheme for coding the variable "news topic") could create logistical problems. Just generating a stratified reliability sample that would include sufficient numbers of units for each of these categories would be time consuming and difficult. Some would question whether the logistical problems outweigh the potential impact of such a "micro" measure of reliability on the overall validity of the data. If a researcher suspects that some variable categories will occur infrequently in a simple random sample for a reliability check, disproportionate sampling of the less frequent categories would be useful. Frequency of categories could be estimated by a pretest and different sampling rates could be used for categories that appear less frequently. When figuring overall agreement for reliability, the results for particular categories would have to be weighted to reflect the proportions in the study units. Another way of handling infrequent categories would be to increase the reliability test sample size above the minimum recommended here. Larger samples will increase the probability of including infrequent categories among the test units. This will, of course, lead to coding of additional units from categories that appear frequently, but the resulting reliability figure will be more representative of content units being studied. If the larger sample does not include sufficient numbers of the infrequent categories, additional units can be selected.
12. William A. Scott, "Reliability of Content Analysis: The Case of Nominal Scale Coding," Public Opinion Quarterly 19 (fall 1955): 321-25.
13. J. Cohen, "Coefficient of Agreement for Nominal Scales," Educational and Psychological Measurement 20 (1960): 37-46.
14. Irving L. Janis, Raymond H. Fadner, and Morris Janowitz, "The Reliability of a Content Analysis Technique," Public Opinion Quarterly 7 (summer 1943): 293-96.
15. William C. Schutz, "Reliability, Ambiguity and Content Analysis," Psychological Review 59 (1952): 119-29.
16. In effect, these chance agreements could lead content analysts to overestimate the extent of coder agreement due to the precision of the coding instrument. Schutz sought a way to control for the effect of those chance agreements. But just because chance could affect reliability does not mean it does, and the "odds" of agreement through randomness change once a coding criterion is introduced and used.
17. But of course it can't. Its effect can only be acknowledged and compensated for.
18. Strictly interpreted, N equals the number of coding decisions that will be made by each coder. If the reliability is checked for total decisions made in using the coding procedure, N equals the number of units analyzed multiplied by the number of categories being used (presumably, but not always). If reliability is checked separately for each coding category in the content analysis, then N equals the total number of units selected for the content analysis. This analysis assumes that each variable is checked and reported separately, which means N equals number of content units in the population.
19. This advice, while sound, adds a bothersome vagueness to content analysis. This is a bit like a professor's response that the length of an essay should be "as long as it takes." How long is a piece of string?
20. Krippendorf, Content Analysis, recommends generally using the .8 level for intercoder reliability, although he says some data with reliability figures as low as .67 could be reported for highly speculative conclusions. It is not clear whether Krippendorf's agreement level figures are for simple agreement among coders or for some other reliability measure. Schutz's ("Reliability, Ambiguity and Content Analysis") analysis starts with the .8 level of simple agreement. This analysis will use .80 to remain consistent with Schutz.
21. Wimmer and Dominick (Mass Media Research, 181) report a rule of thumb of at least .9 for simple agreement and a Scott's pi or Krippendorf's alpha of .75 for intercoder reliability. See Singletary (Mass Communication Research, 296) who states that a Scott's pi of .7 is the consensus value for the statistic. Under some conditions this would be consistent with a simple agreement of .8.
22. Note that this is a one-tailed test. Content analysis researchers are concerned that the reliability figure exceeds a minimal level, which would be on the negative side of a confidence interval. The acceptance of a coding instrument as reliable is not affected by whether the population reliability figure exceeds the reliability test figure on the positive side of the confidence interval.
23. Three factors affect sampling error: the size of the sample, the homogeneity of the population, and the proportion of the population in the sample. The last factor has little impact unless the proportion is large. As Table 1 shows, population size reduces the number of test units noticeably when the number of study units falls under 1,000.
24. Multiple-category variables differ from dichotomous variables because multiple categories are not independent of each other. However, this lack of independence is a bias in coding and not in the selection of units for a reliability test.
25. Scott, "Reliability of Content Analysis."
26. Krippendorf, Content Analysis.
27. Cohen, "Coefficient of Agreement for Nominal Scales."
28. For examples, see Maria Adele Hughes and Dennis E. Garrett, "Intercoder Reliability Estimation Approaches in Marketing: A Generalization Theory Framework for Quantitative Data," Journal of Marketing Research 27 (May 1990): 185-195; and Richard H. Kolbe and Melissa S. Burnett, "Content-Analysis Research: An Examination of Applications with Directives for Improving Research Reliability and Objectivity," Journal of Consumer Research 18 (September 1991): 243-250.
29. Coding simple content, such as numbers of stories, typically yields higher levels of reliability because cues for coding are more explicit. The population agreement will be higher than coding schemes that deal with word meanings. A lower reliability figure is an acceptable trade off for studying categories that concern meaning.
30. For example, see C. A. Moser and G. Kalton, Survey Methods in Social Investigations, 2d ed. (NY: Basic Books, 1972).

users may print. However. download. or email articles for individual use. .Copyright of Journalism & Mass Communication Quarterly is the property of Association for Education in Journalism & Mass Communication and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission.