
Dent Mater 14:1-5, January, 1998

Observer variation in the assessment of resin composite


Hilde Tobi, Henk J. Groen, Cees M. Kreulen, W. Evert van Amerongen

Academic Centre for Dentistry Amsterdam (ACTA), Dept. of Cariology Endodontology and Pedodontics, Amsterdam, THE NETHERLANDS

ABSTRACT
Objectives. The aim of this study was to compare the type of information obtained from log-linear modelling vs Cohen's kappa statistics on observer variation in the assessment of marginal adaptation in composite inlays and amalgam restorations. Methods. Marginal adaptation of Class II resin composite inlays and amalgam restorations was clinically assessed by two observers, four years after placement. Each of 52 patients received four different restorations: three composite inlays (Herculite XR, Clearfil CR Inlay and Visiomolar) and one Tytin restoration. The results were evaluated by Cohen's kappa statistics and log-linear modelling. Results. The overall Cohen's kappa was 0.45, ranging from poor to good for the four materials. Log-linear modelling confirmed that the observers agreed beyond chance, but this agreement depended on the performance of the material. Marginal adaptation of Visiomolar (ESPE) inlays was somewhat inferior compared to the other materials. The assessment of Clearfil CR (Kuraray) inlays was difficult using this clinical evaluation procedure. Significance. Using log-linear modelling, it is possible to look at observer agreement and material performance at the same time. This combined approach is important because agreement may depend on material performance. © 1998 Academy of Dental Materials. Published by Elsevier Science Ltd

PII: S0109-5641(98)00002-5

INTRODUCTION
The importance of marginal integrity in the longevity of dental restorations has been widely acknowledged (Ryge, 1980; Mjör, 1987; Roulet, 1994). Hence, marginal adaptation is a variable of interest for clinical comparison of materials in restorative dentistry. Marginal adaptation is usually scored by two or more observers on a categorical scale. Despite calibration, measurements of the same object by different observers result in some variation of assessments, which hampers the reliability of the measurement.

Observer variation in clinical trials is recognized as a serious problem, and in many articles, Cohen's kappa (Cohen, 1960) is reported. Cohen's kappa is a widely used and debated measure of agreement (Bland and Altman, 1986; Thompson and Walter, 1988a; Kraemer and Bloch, 1988; Thompson and Walter, 1988b; Guggenmoos-Holzmann, 1993). To address some of the drawbacks of Cohen's kappa, some statisticians suggest the use of log-linear models for investigating observer variation (Agresti, 1992). The aim of this study was to determine what information log-linear modelling can yield, compared with kappa statistics, in the analysis of marginal adaptation of inlays made of different resin composites and amalgam restorations.

MATERIALS AND METHODS
As part of a controlled clinical study on the performance of Class II resin composite inlays in comparison with Class II amalgam restorations, marginal adaptation was clinically assessed on a four-point scale (Table 1) by two observers four years after placement (Kreulen et al., 1991). The margins of the restorations were visually divided into sections (Fig. 1), and each section was assessed separately. The rationale of this procedure was twofold: to ensure the dentists' attention to all segments of the complete margin, and to break the rating down into parts, which may result in more agreement between raters (Jako and Murphy, 1990). Marginal adaptation was summarized by the section with the highest score because in clinical practice, the worst part of a restoration determines the treatment decision. Assessment by the two observers was performed independently. In a special session at the start of the study, the observers were calibrated by discussing the criteria based on 10 photographs of

TABLE 1. CRITERIA FOR THE EVALUATION OF MARGINAL ADAPTATION

Rating  Description
1       No evidence of a crevice along the section of the margin, no catch of an explorer
2       Visible evidence of a crevice along the section of the margin, an explorer catches slightly
3       An explorer penetrates into a crevice but no dentin or base is exposed
4       An explorer penetrates into a crevice, dentin or base is exposed

Fig. 1. Division in sections of the occlusal and proximal cavosurface margin of a restoration, for both molar and premolar.

restorations. When clarity and full agreement on the criteria were reached, one calibration session was arranged with three patients not involved in this study. Later in the study (about twice a year), the observers scored the same patients and discussed possible discrepancies until full agreement was reached.

Each of the 52 patients who were involved in the present study received three composite inlays of different materials and one amalgam restoration (208 restorations in total). The resin composites used were Herculite XR (Kerr, Romulus, MI, USA), Clearfil CR Inlay (Cavex Holland/Kuraray, Haarlem, The Netherlands) and Visiomolar (ESPE, Seefeld/Oberbay, Germany). Herculite XR (Batch no. 81271) is an ultrafine hybrid posterior resin composite with an organic matrix containing a mixture of dimethacrylates and trimethacrylates. Clearfil CR Inlay (Batch no. 3004B) is based on a similar organic matrix with the addition of tetrafunctional methacrylate monomers. The filler particles are larger than those of Herculite (average 4 µm vs 0.6 µm). Visiomolar (Batch no. R193) has a tricyclodecandimethacrylate base and, in contrast to the other two resin composites, it shows almost no water absorption. The average filler particle size (6-7 µm) is comparable to that of Clearfil. Tytin (Kerr, Romulus, MI, USA, Batch no. 102088) is a ternary spherical high-copper

amalgam. Restorations of all four brands were made from one batch to limit variation.

Statistical background. Cohen's kappa is commonly used as a measure of agreement between two observers, because it reflects agreement beyond chance (Cohen, 1960; Armitage and Berry, 1994). For the interpretation of certain values for kappa, the standard by Altman (1991) is used. Altman distinguishes five categories: poor (less than 0.20), fair (0.21-0.40), moderate (0.41-0.60), good (0.61-0.80) and very good (0.81-1.00). Although in many studies the report on reliability is confined to a point estimate of kappa, in this study 95% confidence intervals are reported because confidence intervals give more information (Gardner and Altman, 1989).

Log-linear models describe associations between two or more categorical variables, whereas Cohen's kappa describes (a specific type of) association between only two variables. The procedure is called log-linear modelling because the logarithm of the cell count in the contingency table is modelled by a linear expression (Fienberg, 1980). For explanatory purposes, the present case is used with the three variables: observer A, observer B and material. The simplest model assumes independence among all three variables. In short form, this model is written as [A,B,M]. It says that observers A and B do not agree beyond chance, and that all materials perform equivalently. Agreement beyond chance between observers A and B is represented by the interaction A*B. A more extended model also allows for differential performance of material as represented by an interaction between scores and material, yielding the model [A*M, B*M, A*B]. The saturated model fits the data perfectly. It is the most complex model including the highest order interaction possible, in this case, the interaction A*B*M. A*B*M allows agreement among observers to differ between materials.
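Written out for the expected cell counts, the three models named above take the following form. This is the usual textbook notation for hierarchical log-linear models (see e.g. Fienberg, 1980) rather than a formula printed in this article; the symbol m_{ijk} (expected count for score i by observer A, score j by observer B, and material k) and the lambda parameter names are assumptions introduced here for illustration.

```latex
\begin{align*}
[A,B,M]:\quad
  \log m_{ijk} &= \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{M}_{k} \\
[A{*}M,\,B{*}M,\,A{*}B]:\quad
  \log m_{ijk} &= \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{M}_{k}
                 + \lambda^{AB}_{ij} + \lambda^{AM}_{ik} + \lambda^{BM}_{jk} \\
[A{*}B{*}M]:\quad
  \log m_{ijk} &= \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{M}_{k}
                 + \lambda^{AB}_{ij} + \lambda^{AM}_{ik} + \lambda^{BM}_{jk}
                 + \lambda^{ABM}_{ijk}
\end{align*}
```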
Note that all models are assumed hierarchical, so this model includes the main effects of A, B and M as well as A*B, A*M and B*M. The aim of log-linear modelling is to find the model with the smallest number of components that estimates the actual cell counts closely enough. During the modelling

TABLE 2. FREQUENCIES OF SCORES FOR THE THREE INLAY MATERIALS AND TYTIN

                              Observer A
Material       Observer B   Score 2   Score 3   Total
Herculite XR   Score 2         33         6       39
               Score 3          6         7       13
               Total           39        13       52
Clearfil CR    Score 2         26        12       38
               Score 3          7         7       14
               Total           33        19       52
Visiomolar     Score 2         19         5       24
               Score 3          2        24       26
               Total           21        29       50
Tytin          Score 2         28         9       37
               Score 3          5        10       15
               Total           33        19       52

TABLE 4. LOG-LINEAR MODELLING OF OBSERVER AGREEMENT (A*B) FOR EACH MATERIAL SEPARATELY

Material       Likelihood ratio chi2 statistic,   P-value improvement, model A*B
               model A,B                          compared with model A,B
Herculite XR        7.05                          <0.01
Clearfil CR         1.47                           0.23
Visiomolar         29.36                          <0.01
Tytin               8.12                          <0.01

TABLE 5. LOG-LINEAR MODELLING WITH OBSERVER (A,B) AND MATERIAL (M)

Model                 df   Likelihood ratio chi2   P-value model fit   P-value improvement (compared with model)
1. A, B, M            10          68.70                 0.000          -
2. A*B, M              9          25.33                 0.003          0.000 (1)
3. A*M, B*M            4          46.00                 0.000          0.001 (1)
4. A*B, A*M, B*M       3           9.05                 0.029          0.012 (2); 0.000 (3)
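The likelihood ratio chi-square statistics of Tables 4 and 5 can be recomputed from the frequencies in Table 2. The sketch below is a minimal illustration, not the authors' procedure: the article does not say which software was used, so iterative proportional fitting is assumed here as the fitting method, and all names (`TABLE2`, `fit`, `g2`, `g2_single`, `MODELS`) are hypothetical.

```python
import math
from itertools import product

# Frequencies from Table 2 as [observer B][observer A];
# index 0 is a score of 2, index 1 a score of 3.
TABLE2 = {
    "Herculite XR": [[33, 6], [6, 7]],
    "Clearfil CR": [[26, 12], [7, 7]],
    "Visiomolar": [[19, 5], [2, 24]],
    "Tytin": [[28, 9], [5, 10]],
}
A, B, M = 0, 1, 2  # positions of the variables in a cell key (a, b, m)

observed = {}
for m, name in enumerate(TABLE2):
    for b, a in product(range(2), range(2)):
        observed[(a, b, m)] = TABLE2[name][b][a]


def margin(table, axes):
    """Collapse a {(a, b, m): value} table onto the given variables."""
    totals = {}
    for cell, value in table.items():
        key = tuple(cell[i] for i in axes)
        totals[key] = totals.get(key, 0.0) + value
    return totals


def fit(terms, cycles=100):
    """Expected counts by iterative proportional fitting (assumed method)."""
    fitted = {cell: 1.0 for cell in observed}
    for _ in range(cycles):
        for axes in terms:
            target = margin(observed, axes)
            current = margin(fitted, axes)
            for cell in fitted:
                key = tuple(cell[i] for i in axes)
                fitted[cell] *= target[key] / current[key]
    return fitted


def g2(terms):
    """Likelihood ratio chi-square, 2 * sum(O * ln(O / E)), for one model."""
    fitted = fit(terms)
    return 2.0 * sum(n * math.log(n / fitted[c]) for c, n in observed.items())


def g2_single(name):
    """Likelihood ratio chi-square of model A,B within one material (Table 4)."""
    t = TABLE2[name]
    n = sum(map(sum, t))
    rows = [sum(r) for r in t]
    cols = [t[0][j] + t[1][j] for j in range(2)]
    return 2.0 * sum(
        t[i][j] * math.log(t[i][j] * n / (rows[i] * cols[j]))
        for i in range(2) for j in range(2)
    )


MODELS = {
    "1. A, B, M": [(A,), (B,), (M,)],
    "2. A*B, M": [(A, B), (M,)],
    "3. A*M, B*M": [(A, M), (B, M)],
    "4. A*B, A*M, B*M": [(A, B), (A, M), (B, M)],
}

if __name__ == "__main__":
    for name in TABLE2:
        print(f"{name:18s} G2 = {g2_single(name):6.2f}")
    for label, terms in MODELS.items():
        print(f"{label:18s} G2 = {g2(terms):6.2f}")
```

Models 1 to 3 have closed-form expected counts, so the fitting loop reaches them in a single cycle; only model 4 needs the iteration.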

TABLE 3. KAPPA ESTIMATES FOR EACH OF THE FOUR MATERIALS

                         95% Confidence Interval
Material       Kappa   Lower bound   Upper bound
Herculite XR   0.39       0.10          0.67
Clearfil CR    0.17      -0.11          0.44
Visiomolar     0.72       0.53          0.91
Tytin          0.39       0.13          0.65
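As an illustration of how the kappa point estimates follow from the frequencies in Table 2, the sketch below computes Cohen's kappa as (observed agreement - chance agreement) / (1 - chance agreement). It uses the standard definition from Cohen (1960); the function and variable names are hypothetical. The confidence intervals are not reproduced, because the article does not state which standard error formula was used.

```python
# Frequencies from Table 2 as [observer B][observer A];
# index 0 is a score of 2, index 1 a score of 3.
TABLE2 = {
    "Herculite XR": [[33, 6], [6, 7]],
    "Clearfil CR": [[26, 12], [7, 7]],
    "Visiomolar": [[19, 5], [2, 24]],
    "Tytin": [[28, 9], [5, 10]],
}


def cohen_kappa(table):
    """Cohen's kappa for a square table of counts from two observers."""
    size = len(table)
    n = sum(sum(row) for row in table)
    p_observed = sum(table[i][i] for i in range(size)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the two marginal distributions.
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


def pooled(tables):
    """Add the per-material tables cell by cell to obtain the overall table."""
    size = len(next(iter(tables.values())))
    return [
        [sum(t[i][j] for t in tables.values()) for j in range(size)]
        for i in range(size)
    ]


if __name__ == "__main__":
    for name, table in TABLE2.items():
        print(f"{name:13s} kappa = {cohen_kappa(table):.3f}")
    print(f"{'All materials':13s} kappa = {cohen_kappa(pooled(TABLE2)):.3f}")
```

The values obtained this way agree with Table 3 and with the overall kappa of 0.45 to within rounding.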

procedure, different models are mutually compared, hierarchically, using likelihood ratio chi-square tests. The better the fit of the log-linear model, the smaller the likelihood ratio chi-square and the higher the p-value of the model fit. The final model cannot be statistically significantly improved by adding a component, and the p-value of the model fit is sufficiently high. This implies that when the model [A*M, B*M] is not statistically significantly improved by adding the interaction A*B, the model [A*M, B*M] is preferred. The model itself is usually of primary interest and not the parameter estimates. The estimates have analysis of variance-like constraints (Agresti, 1990). For example, the sum of all the main effects of the four materials equals zero. Analogous to the parameter estimates in linear regression, the log-linear model parameter estimates can be divided by their standard error, yielding Z-values. A Z-value larger than +2.00 or smaller than -2.00 implies that the parameter estimate differs statistically significantly from zero. This means that the direction (positive or negative) and size of the Z-value provide more information than the actual parameter estimates.

RESULTS
Within four years after placement, two Visiomolar inlays were replaced, one for secondary caries, the other for primary caries at a surface previously not involved in any restoration. All remaining 154 inlays and 52 amalgam restorations showed some visible evidence of a crevice. Frequencies of materials and scores given by the two observers can be found in Table 2. Note that no score of 4 was given. The overall Cohen's kappa was 0.45, and the corresponding 95% confidence interval 0.33 to 0.58. The material-specific kappas ranged from poor to good for the four materials (Table 3). On the assessments of

TABLE 6. THE Z-VALUES FOR THE PARAMETER ESTIMATES IN THE FULL MODEL (A*B*M)

Main effects                      Z-value
Observer A     Score 2             1.101
               Score 3            -1.101
Observer B     Score 2             3.884
               Score 3            -3.884
Material       Herculite XR       -0.170
               Clearfil CR         0.865
               Visiomolar         -0.969
               Tytin               0.517

First order interaction effects
                                  Observer A
                                  Score 2   Score 3
Observer B     Score 2             5.604    -5.604
               Score 3            -5.604     5.604

                                  Observer A
                                  Score 2   Score 3
Material       Herculite XR        1.841    -1.841
               Clearfil CR         0.635    -0.635
               Visiomolar         -2.131     2.131
               Tytin               0.063    -0.063

                                  Observer B
                                  Score 2   Score 3
Material       Herculite XR        0.201    -0.201
               Clearfil CR         0.732    -0.732
               Visiomolar         -1.019     1.019
               Tytin               0.321    -0.321

Visiomolar, the observers agreed significantly better than on the other materials; the 95% confidence interval for kappa did not include any of the other three point estimates. Log-linear modelling for each material separately showed that the independence model for the ratings of marginal adaptation in Clearfil CR Inlay fit sufficiently (Table 4). For the other materials, the observers did agree beyond chance (inclusion of observer-interaction term, p < 0.01).

The modelling process for all data at once is given in Table 5. In this study, the independence model does not fit (Model 1). Although allowing for agreement beyond chance (Model 2) improves the fit significantly, it is not yet sufficient. Model 3 describes the situation in which materials may perform differently. This model is also improved significantly by adding the interaction A*B (Model 4). As Model 4 also does not fit sufficiently, the saturated model is needed. The conclusion is that the observers do agree beyond chance, but this agreement depends on material.

From the Z-values of the parameter estimates in the final model (Table 6), it can be concluded that observer B gives a score of 2 more often than a score of 3 (Z-value 3.884). The observers do agree beyond chance (Z = 5.604). Observer A scores the marginal adaptation of Herculite XR as better than the other materials (Z = 1.841). Both observers think that Visiomolar performs worse than the other materials (-2.131 and -1.019). The Z-values of A*B*M show that agreement was statistically significantly worse for Clearfil CR (-2.229) and better for Visiomolar (+2.399) compared with agreement on Herculite and Tytin.
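The p-values in Table 5 can be checked from the reported likelihood ratio chi-square statistics and degrees of freedom: the fit of a model is tested against the chi-square distribution with the model's df, and the improvement of an extended model over a simpler one is tested on the difference in chi-square with the difference in df. The sketch below is a minimal illustration using only the Python standard library; `chi2_sf`, `FITS`, `fit_p` and `improvement_p` are hypothetical names, and the finite-series formulas for integer df are the standard closed forms, not something taken from the article.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * density * total


# Model number: (likelihood ratio chi-square, df), as reported in Table 5.
FITS = {1: (68.70, 10), 2: (25.33, 9), 3: (46.00, 4), 4: (9.05, 3)}


def fit_p(model):
    """P-value of the model fit."""
    g2, df = FITS[model]
    return chi2_sf(g2, df)


def improvement_p(simple, extended):
    """P-value for the improvement of the extended over the simpler model."""
    g2_simple, df_simple = FITS[simple]
    g2_extended, df_extended = FITS[extended]
    return chi2_sf(g2_simple - g2_extended, df_simple - df_extended)


if __name__ == "__main__":
    for model in FITS:
        print(f"Model {model}: fit p = {fit_p(model):.3f}")
    for simple, extended in [(1, 2), (1, 3), (2, 4), (3, 4)]:
        p = improvement_p(simple, extended)
        print(f"Model {extended} vs model {simple}: p = {p:.3f}")
```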

DISCUSSION
Of the two inlays which failed within four years, one was a true failure. One failure out of 156 inlays is in accordance with the low failure rates reported in the literature (Wendt and Leinfelder, 1992; van Dijken, 1994). Both the series of kappa estimates and the log-linear modelling lead to the conclusion that observer agreement can differ between materials. Thus, the report of one single overall index of observer variation would be a distortion of the actual situation where some materials are harder to assess than others. Where a series of kappa estimates summarizes the observer agreement, log-linear models allow examination of observer agreement and performance of the materials at the same time. Although not the focus of this study, it would be interesting to know why materials are more or less difficult to assess. For example, had observer agreement been stronger for composite inlays than for amalgam restorations, the use of luting cement would have given a possible explanation. The wear of luting cement might facilitate the detection of a crevice and hence improve agreement. If rougher materials had shown less agreement than smoother ones, one could have argued that it is easier to discriminate the margin when the restoration is smooth (filler particles small) than when the restoration is rough.

Second order interaction effects
                              Observer A
Material       Observer B   Score 2   Score 3
Herculite XR   Score 2       -0.339     0.339
               Score 3        0.339    -0.339
Clearfil CR    Score 2       -2.229     2.229
               Score 3        2.229    -2.229
Visiomolar     Score 2        2.399    -2.399
               Score 3       -2.399     2.399
Tytin          Score 2       -0.404     0.404
               Score 3        0.404    -0.404
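The second order interaction Z-values can be recomputed from the frequencies in Table 2. The sketch below is a minimal illustration resting on two assumptions that the article does not spell out: sum-to-zero (effect) coding of the parameters, and large-sample standard errors in which each log cell count has variance 1/count. Under those assumptions, the A*B*M parameter of a material is a quarter of the difference between that material's log odds ratio and the mean log odds ratio over the four materials. All names are hypothetical.

```python
import math

# Frequencies from Table 2 as [observer B][observer A];
# index 0 is a score of 2, index 1 a score of 3.
TABLE2 = {
    "Herculite XR": [[33, 6], [6, 7]],
    "Clearfil CR": [[26, 12], [7, 7]],
    "Visiomolar": [[19, 5], [2, 24]],
    "Tytin": [[28, 9], [5, 10]],
}


def log_odds_ratio(table):
    """Agreement cells minus disagreement cells on the log scale."""
    return (math.log(table[0][0]) - math.log(table[0][1])
            - math.log(table[1][0]) + math.log(table[1][1]))


def inverse_sum(table):
    """Sum of 1/count over the four cells (variance of the log odds ratio)."""
    return sum(1.0 / n for row in table for n in row)


def abm_z_values(tables):
    """Z-value of the A*B*M term for (A score 2, B score 2), per material."""
    k = len(tables)
    mean_lor = sum(log_odds_ratio(t) for t in tables.values()) / k
    total_inverse = sum(inverse_sum(t) for t in tables.values())
    z_values = {}
    for name, table in tables.items():
        estimate = (log_odds_ratio(table) - mean_lor) / 4.0
        # The estimate weights this material's log counts by (k - 1) / (4k)
        # and the log counts of every other material by 1 / (4k).
        own = ((k - 1) / (4.0 * k)) ** 2 * inverse_sum(table)
        others = (1.0 / (4.0 * k)) ** 2 * (total_inverse - inverse_sum(table))
        z_values[name] = estimate / math.sqrt(own + others)
    return z_values


if __name__ == "__main__":
    for name, z in abm_z_values(TABLE2).items():
        print(f"{name:13s} Z = {z:7.3f}")
```

The sign pattern in the rest of each 2 x 2 block follows from the coding: the two agreement cells share one sign and the two disagreement cells the opposite sign.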


In this study, observer variation may have been influenced by material performance and the rating procedure. With a moderately performing material, the chance of receiving a score of 3 for at least one section from both observers was high. The high agreement on Visiomolar, together with its moderate performance regarding marginal adaptation, seems to support this hypothesis. Further research is needed to investigate whether observer agreement is influenced by material performance.

It should be noted that since log-linear modelling can handle more than two variables, it is especially suited for situations in which more than two observers assess the same restorations. For example, in a situation with four observers (A, B, C and D) that is best described by the model [A*B*C, D], it can be concluded that disagreement is due to one particular observer (D). The major advantage of this modelling approach to observer variation in clinical material comparison research, be it with two observers or more, is that the performance of the materials can be incorporated in the model. Thus, in contrast to kappa coefficients, the detection of agreement beyond chance is not dependent on the marginal distribution. The major disadvantages are that the model does not yield an index for observer agreement with a known and understood distribution, and that the model parameter estimates are often difficult to interpret (especially when some of the variables under study distinguish more than two categories).

In summary, it has been shown that while the overall kappa may be acceptable, the material-specific kappas are not necessarily so. Observer agreement can vary with material performance. Log-linear modelling allows researchers to look at observer agreement and material performance at the same time. Therefore, log-linear modelling is a valuable tool to further investigate observer variation when different materials are involved.

Received December 3, 1996 / Accepted December 19, 1997
Address for correspondence and reprint requests to:
Hilde Tobi
Academic Centre for Dentistry Amsterdam (ACTA)
Department of Cariology Endodontology and Pedodontology
Louwesweg 1
1066 EA Amsterdam, The Netherlands
Tel.: +31-20 5188 413; Fax: +31-20 5188 544; E-mail: H.Tobi@acta.nl

REFERENCES
Agresti A (1990). Categorical Data Analysis. New York: John Wiley and Sons.
Agresti A (1992). Modelling patterns of agreement and disagreement. Stat Meth Med Res 1:201-218.
Altman DG (1991). Practical Statistics for Medical Research. London: Chapman and Hall, 404.
Armitage P, Berry G (1994). Statistical Methods in Medical Research. 3rd ed. Oxford: Blackwell Science Ltd.
Bland JM, Altman DG (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet i:307-310.
Cohen J (1960). A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37-46.
Fienberg SE (1980). The Analysis of Cross-classified Categorical Data. 2nd ed. Cambridge, Mass.: MIT Press.
Gardner MJ, Altman DG, eds. (1989). Statistics with confidence: confidence intervals and statistical guidelines. London: British Medical Journal.
Guggenmoos-Holzmann I (1993). How reliable are chance-corrected measures of agreement? Stat Med 12:2191-2205.
Jako RA, Murphy KR (1990). Distributional ratings, judgment decomposition, and their impact on interrater agreement and rating accuracy. J Appl Psychol 75:500-505.
Kraemer HC, Bloch DA (1988). Kappa coefficients in epidemiology: an appraisal of a reappraisal. J Clin Epidemiol 41:959-968.
Kreulen CM, Van Amerongen WE, Akerboom HBM, Borgmeijer PJ, Kemp-Scholte ChM (1991). A clinical study on direct and indirect Class II posterior resin restorations: Design of the investigation. J Dent Child 58:281-288.
Mjör IA (1987). A regulatory approach to the formulation of assessment criteria for posterior composite resins. Quintessence International 18:537-539 (Special Reprint).
Roulet JF (1994). Marginal integrity: clinical significance. J Dent 22:S9-12 (Suppl 1).
Ryge G (1980). Clinical criteria. Int Dent J 30:347-358.
Thompson WD, Walter SD (1988a). Kappa and the concepts of independent errors. J Clin Epidemiol 41:969-970.
Thompson WD, Walter SD (1988b). A reappraisal of the kappa coefficient. J Clin Epidemiol 41:949-958.
van Dijken JWV (1994). A 6-year evaluation of a direct composite resin inlay/onlay system and glass-ionomer cement-composite resin sandwich restorations. Acta Odontol Scand 52:368-376.
Wendt SL, Leinfelder KF (1992). Clinical evaluation of a heat-treated resin composite inlay: 3-year results. Am J Dent 5:258-262.

