η² = .15, and JOL type had a significant effect, F(4, 148) = 3.49, MSE = .18, η² = .09. The interaction between test type and JOL type, however, was not significant, F(4, 148) = 1.09, MSE = .18, p > .05. This is important because TAM predicts that monitoring will be best when stimuli (for the context-oriented view) or the processing used (for the processing-oriented view) at JOL match those at test. For recall, this would be JOL Condition 1: The JOL prompt is the cue alone, exactly like the final test. Although predictive accuracy was high in Condition 1 for recall, adding irrelevant alternatives to the context (Conditions 3 and 4) did not decrease G so long as the correct answer was not evident. For recognition, best performance would be expected in JOL Condition 4, where the unmarked cue–target pair is presented with the same six distractor pairs that would be present at test; G was intermediate in that condition.
Because the interaction was not significant (and because the pattern was similar across the two conditions), we combined Gs for recall and recognition tests and obtained the following mean Gs for Conditions 1–5, respectively: 0.69, 0.44, 0.72, 0.67, and 0.50. Mean Gs tended to be higher when the answer was not evident. Comparing all pairwise combinations of the five JOL conditions required 10 separate tests, so we used the Bonferroni correction procedure and adopted a more stringent alpha level (.05/10 = .005) for each t test. No significant differences in mean Gs emerged at the modified alpha level (observed p values for tests comparing Conditions 2 and 5 vs. Conditions 1, 3, and 4 ranged from .007 to .056). However, when recall and recognition tests were considered separately, the direction of the effect (higher G when the answer was not evident) was consistent in 12 of 12 cases; this is significant by a sign test (p = .0002). Overall, Gs did not vary as predicted by TAM but rather varied according to whether the correct answer was evident during JOLs.²
² Using 137 participants, we replicated this finding in a second experiment. The procedures were the same except that type of JOL varied between subjects and all participants received a recall test for 30 items and a recognition test for the remaining 30 items. This modification produced fewer indeterminate Gs and reduced the variability in mean scores but again completely failed to support the TAM hypothesis.
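The Bonferroni adjustment and the sign test reported above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a one-tailed sign test with a null probability of .5, which the text does not state explicitly.

```python
from math import comb


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after a Bonferroni correction."""
    return alpha / n_tests


def sign_test_p(successes, n):
    """One-tailed sign test (assumed): P(X >= successes) when each case
    is equally likely to go in either direction."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# Five JOL conditions yield C(5, 2) pairwise comparisons.
n_pairwise = comb(5, 2)
alpha_per_test = bonferroni_alpha(0.05, n_pairwise)

# Direction of the effect was consistent in 12 of 12 cases.
p_sign = sign_test_p(12, 12)

print(n_pairwise, alpha_per_test, round(p_sign, 4))
```

The same helper can be applied to any other count of consistent cases out of n comparisons.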
1060
WEAVER AND KELEMEN
JOL Magnitude
Mean JOL magnitude across conditions appears in Table 2. A 5 × 2 mixed ANOVA showed a significant main effect of JOL type, F(4, 264) = 51.74, MSE = .02, η² = .44, and a significant interaction between JOL type and test, F(4, 264) = 3.56, MSE = .02, η² = .05. To follow up the interaction, we conducted separate one-way ANOVAs for the recall and recognition tests, and the effect of JOL type remained significant in both cases (Fs > 15). Next, we conducted post hoc paired t tests at alpha = .005 to examine differences across conditions. JOL magnitude in Conditions 2, 4, and 5 was significantly higher compared with Conditions 1 and 3 for both types of tests in all but one case (11 of 12 comparisons, p = .003 using a sign test). Overall, participants were more confident about their future memory performances when they saw the answer at time of JOL.
Test Performance
Performance was better on recognition tests than on recall tests (see Table 2), F(1, 66) = 48.73, MSE = .23, η² = .43. Type of JOL also had a strong influence on test performance, F(4, 264) = 46.58, MSE = .02, η² = .41. In addition, the interaction between test type and JOL type was significant, F(4, 264) = 4.92, MSE = .02, η² = .07. We conducted separate repeated measures ANOVAs for each type of test, and the influence of JOL type remained significant (Fs > 18) with large effect sizes (η² > .35) for both types of tests. The influence of JOL cues on test performance is clear: Participants' memory was best when the correct answer was evident (Conditions 2 and 5), moderate when the answer was included but not distinguished from incorrect alternatives (Condition 4), and lowest when the answer was not shown (Conditions 1 and 3). Post hoc t tests were consistent with this interpretation (see subscripts in Table 2).
Discussion
The purpose of this study was to evaluate the effects of matching the processing elicited during delayed JOLs and the processing required at subsequent tests. The context-oriented version of the TAM hypothesis proposes that metamemory accuracy will increase as judgments and tests become more similar. One major prediction was the emergence of a reliable interaction between type of JOL and test: For recall tests, Condition 1 should have produced the highest Gs; for recognition tests, Condition 4 should have been the best. This interaction did not occur. At the same time, we failed to obtain evidence to support a processing-oriented version of TAM. Although metamemory accuracy was high in Condition 1 for recall, adding irrelevant alternatives during JOLs, thereby degrading the processing match, did not decrease metamemory accuracy in Conditions 3 and 4. For associative recognition, Condition 4 provided an exact match of context and processing at JOL and test, but metamemory accuracy did not improve as predicted. In fact, mean Gs were slightly higher in Conditions 1 and 3, which provided imperfect matches.
We found large increases in JOL magnitude and test performance when the correct answer was evident during delayed JOLs (Conditions 2 and 5) compared with when the answer was absent (Conditions 1 and 3). Seeing the correct cue–target item at time of JOL improved subsequent memory, and participants adjusted their JOLs accordingly. The opposite pattern of results was obtained for relative metamemory accuracy: Mean Gs were higher when the correct answer was absent at time of JOL and lower when the correct answer was evident. These results may have emerged because participants attempted to retrieve the answer during JOLs in Conditions 1 and 3, which provided highly diagnostic information regarding future memory performance. Conditions 2 and 5, on the other hand, provided an additional opportunity to learn the item but offered fewer diagnostic cues for JOLs.

Table 2
Mean JOL Magnitude and Test Performance by Type of JOL and Type of Test

                                        JOL condition
Test type and          1             2             3             4             5
performance         M      SE     M      SE     M      SE     M      SE     M      SE
Recall
  JOL magnitude    .22a    .04   .51b    .05   .24a    .03   .44b    .04   .51b    .04
  Performance      .17a    .03   .43c    .04   .14a    .03   .31b    .04   .41c    .04
Recognition
  JOL magnitude    .33a    .03   .52b    .04   .37a,c  .04   .44b,c  .04   .52b    .04
  Performance      .61a    .04   .79b    .04   .57a    .04   .58a    .05   .73b    .05

Note. Means in the same row with different subscripts were significantly different at p < .01 using post hoc paired t tests. JOL = judgment of learning.

Figure 1. Mean gammas as a function of judgment of learning (JOL) condition and type of test. Vertical bars represent standard errors of the mean.

1061
TRANSFER-APPROPRIATE METAMEMORY
Isolating Causes of Metamemory Accuracy
The main differences in metamemory accuracy can be summed up as follows: Metamemory accuracy in JOL Conditions 2 and 5 is lower than it is in the other three conditions. Why is this the case? First of all, Conditions 2 and 5 present the (clearly marked) correct answer during the JOL, removing the need for any kind of covert retrieval attempt, if in fact this is what participants are doing at JOL. Specifically, these conditions preclude the case of a failed retrieval attempt, which is particularly diagnostic (see Nelson, Narens, & Dunlosky, in press). If so, then anything that reduces the number of low JOLs (predictions of unsuccessful future recall) should reduce JOL accuracy. Second, those JOL conditions may induce an illusion of knowing (Glenberg, Wilkinson, & Epstein, 1982; Hart et al., 1992; Koriat, 1998), in which individuals develop a sense of overconfidence, believing that they know more than they do. Finally, presenting the targets at time of JOL may impair JOL accuracy simply by distorting the distribution of JOLs. That is, decreasing the frequency of low JOLs may reduce gammas for reasons having to do with measurement factors, not metacognitive factors. By restricting the range of JOLs, observed levels of G may be reduced. Compared with immediate JOLs, delayed JOLs induce many more JOLs at the extremes of the JOL continuum (Dunlosky & Nelson, 1994; Schwartz, 1994). In previous work (Weaver & Kelemen, 1997), however, we determined that this distribution shift was not the primary cause of the delayed-JOL effect.
In the present study, the different JOL conditions did induce major differences in the distribution of JOLs. Table 3 displays the frequency with which different JOLs were selected as a function of JOL condition. JOL Conditions 2 and 5 elicit far fewer judgments of 0 than any other condition: less than 10% of the time for both recall and recognition tests. The other JOL conditions elicited JOLs of 0 between two and six times more frequently. At the same time, these conditions produce different levels of correct performance when conditionalized upon JOL (also shown in Table 3); we refer to the patterns of conditional proportions correct as calibration curves, following common practice in this field and in judgments and decision making (Hart et al., 1992; Nelson, 1996; Stankov, 1998; Wallsten, 1996; Weaver, 1990). Theoretically, perfect metacognitive accuracy would be indicated by proportions correct that are identical to the JOL level (that is, items with JOLs of 80% would be answered correctly 80% of the time) and that are independent of JOL frequency. When G is less than perfect, though, the distribution of JOLs has a significant effect. G involves a weighted averaging of items. An inaccurate prediction that occurs frequently will significantly lower G. The same inaccurate observation, occurring infrequently, has much less of an effect. Estimates of G, then, are influenced not only by the function relating JOL and performance (the calibration curves) but also by the relative frequency with which each JOL category is used.

Table 3
Frequency of JOL Usage and Conditional Proportion Correct by Type of JOL and Type of Test

JOL                                          JOL
condition  Measure               0     20     40     60     80    100
Recall
1          Frequency            .55    .19    .05    .07    .05    .09
           Proportion correct   .04    .09    .30    .34    .45    .83
2          Frequency            .10    .25    .20    .11    .13    .21
           Proportion correct   .20    .28    .55    .57    .47    .50
3          Frequency            .48    .25    .08    .07    .06    .07
           Proportion correct   .03    .07    .16    .29    .54    .69
4          Frequency            .22    .25    .13    .11    .13    .17
           Proportion correct   .03    .13    .37    .33    .58    .68
5          Frequency            .08    .26    .20    .14    .13    .19
           Proportion correct   .06    .31    .30    .54    .65    .53
Recognition
1          Frequency            .27    .36    .14    .04    .06    .13
           Proportion correct   .39    .56    .70   1.00    .88    .89
2          Frequency            .05    .24    .23    .18    .13    .17
           Proportion correct   .60    .60    .81    .82    .85    .99
3          Frequency            .20    .32    .20    .09    .07    .12
           Proportion correct   .36    .44    .63    .66    .79    .98
4          Frequency            .23    .27    .12    .08    .08    .23
           Proportion correct   .30    .31    .62    .85    .90    .94
5          Frequency            .08    .24    .19    .17    .13    .19
           Proportion correct   .45    .58    .69    .83    .83    .94

Note. JOL = judgment of learning.
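To make the dependence on JOL frequency concrete, G can be written out in terms of item pairs. This assumes that G is the Goodman–Kruskal gamma discussed by Nelson (1996). The symbols C, D, n_j^+, and n_j^- are notation introduced here for illustration only and do not appear in the original analyses.

```latex
G = \frac{C - D}{C + D}, \qquad
C = \sum_{j < k} n_j^{-}\, n_k^{+}, \qquad
D = \sum_{j < k} n_j^{+}\, n_k^{-}
```

Here n_j^+ and n_j^- denote the numbers of correctly and incorrectly answered items that received JOL level j, with levels ordered from low to high. C counts concordant pairs (the item with the higher JOL is the one answered correctly), D counts discordant pairs, and tied pairs are ignored. Because every term is a product of counts, a heavily used JOL category enters many more pairs than a rarely used one.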
To separate these two influences, we conducted a series of Monte Carlo simulations, similar to those performed by Weaver (1990) and Weaver and Kelemen (1997). For each observation, we first determined the JOL for that item using the frequency distributions displayed in Table 3. For example, in JOL Condition 1 for the recall data, participants selected the 0% JOL rating 55% of the time, the 20% rating 19% of the time, and so on. We generated a random number between 0 and 1 and used this to determine the item's JOL: If the random number was less than .55, the item was assigned a JOL of 0. If the number was between .55 and .74 (.55 + .19), it was given a JOL of 20, and so on. Once an item received a JOL (assume for illustrative purposes that the item was assigned a JOL of 0), the proportion correct data from Table 3 were used to determine whether this item was successfully recalled. Another random number was generated; if the number was less than .04 (the conditional proportion correct for JOL Condition 1, JOL = 0), the item was presumed to have been correctly recalled. This was repeated for each of 60 items, for 50 participants per simulated experiment. Each experiment was replicated 50 times.
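The procedure just described can be sketched in Python as follows. This is a minimal illustration, not the original simulation code. The function names and the random seed are hypothetical, G is assumed to be the Goodman–Kruskal gamma computed per simulated participant, and the handling of indeterminate gammas (skipping them) is an assumption.

```python
import random
from statistics import mean

JOL_LEVELS = [0, 20, 40, 60, 80, 100]
# Table 3, recall test, JOL Condition 1.
FREQ = [.55, .19, .05, .07, .05, .09]
P_CORRECT = [.04, .09, .30, .34, .45, .83]


def gamma(jols, correct):
    """Goodman-Kruskal gamma over all item pairs; None if indeterminate."""
    concordant = discordant = 0
    for i in range(len(jols)):
        for j in range(i + 1, len(jols)):
            product = (jols[i] - jols[j]) * (correct[i] - correct[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    total = concordant + discordant
    return None if total == 0 else (concordant - discordant) / total


def simulate_participant(rng, freq, p_correct, n_items=60):
    """Draw a JOL per item from freq, then an outcome from p_correct."""
    idx = rng.choices(range(len(JOL_LEVELS)), weights=freq, k=n_items)
    jols = [JOL_LEVELS[i] for i in idx]
    correct = [1 if rng.random() < p_correct[i] else 0 for i in idx]
    return jols, correct


def simulate_experiment(rng, freq, p_correct, n_participants=50):
    """Mean gamma across simulated participants (indeterminate ones skipped)."""
    gammas = []
    for _ in range(n_participants):
        g = gamma(*simulate_participant(rng, freq, p_correct))
        if g is not None:
            gammas.append(g)
    return mean(gammas)


rng = random.Random(2003)  # arbitrary seed, for reproducibility only
replications = [simulate_experiment(rng, FREQ, P_CORRECT) for _ in range(50)]
print(round(mean(replications), 2))
```

Passing the frequency list from one condition together with the proportion-correct list from another condition gives the crossed cells of the design.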
These procedures allowed us to separate the effects of JOL distribution shifts from those due to differences in calibration curves. For example, are the lower observed Gs in Conditions 2 and 5 an artifact of the relative infrequency of using JOLs of 0? If so, then assigning JOLs based on data where JOLs of 0 are more frequent (such as recall, Condition 1) but using the same calibration curve should produce higher Gs. If the lower Gs reflect true metacognitive impairments, then varying the calibration curves while holding constant the JOL distributions should produce larger effects. In all, 25 combinations of 5 JOL distributions and 5 conditional proportions correct were possible for both the recall and the recognition data.
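The logic of crossing JOL distributions with calibration curves can also be illustrated without random sampling. The sketch below computes a pair-level expected gamma directly from a frequency distribution and a calibration curve. This is a swapped-in analytic shortcut rather than the Monte Carlo procedure itself, so its values need not equal the mean of per-participant gammas. The function name is hypothetical, and only recall Conditions 1 and 2 from Table 3 are included to keep the example short.

```python
from itertools import combinations

# Table 3, recall test; each list follows JOL levels 0, 20, 40, 60, 80, 100.
FREQ = {
    1: [.55, .19, .05, .07, .05, .09],
    2: [.10, .25, .20, .11, .13, .21],
}
P_CORRECT = {
    1: [.04, .09, .30, .34, .45, .83],
    2: [.20, .28, .55, .57, .47, .50],
}


def expected_gamma(freq, p_correct):
    """Pair-level gamma implied by a JOL distribution and a calibration curve."""
    concordant = discordant = 0.0
    for low, high in combinations(range(len(freq)), 2):
        weight = freq[low] * freq[high]  # how often this pair of levels occurs
        # Concordant: lower-JOL item wrong, higher-JOL item right.
        concordant += weight * (1 - p_correct[low]) * p_correct[high]
        # Discordant: lower-JOL item right, higher-JOL item wrong.
        discordant += weight * p_correct[low] * (1 - p_correct[high])
    return (concordant - discordant) / (concordant + discordant)


# Cross every distribution with every calibration curve.
for dist, freq in FREQ.items():
    for curve, p_correct in P_CORRECT.items():
        print(dist, curve, round(expected_gamma(freq, p_correct), 2))
```

Holding the row (distribution) fixed and changing the curve, or the reverse, shows which of the two factors moves the statistic more.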
The results of the simulations are shown in Table 4. The main diagonal indicates places where the accuracy of the simulations can be checked with participants' actual data. In 8 of 10 cases, the simulated Gs were nearly perfect (within the 95% confidence interval for the mean of participants' data). In the others, the pattern observed still mirrors the data actually obtained. Overall, we are satisfied that our simulations allow us to answer the questions of interest.³
The results of the simulations are clear and striking. Although changing the JOL distribution alters the Gs somewhat, varying the calibration curves alters them substantially. Regardless of the underlying JOL distributions, the calibration curves from Conditions 2 and 5 (those at which the correct answer is displayed and identified at JOL) produce substantially lower Gs. The effects are particularly powerful with recall data. This is noteworthy because the vast majority of JOL research uses cued recall as the dependent variable. We conclude from these data that the poor metacognitive performance seen in Conditions 2 and 5 is a true deficit, not an artifact of the shift in JOL distributions.

³ We can speculate as to why our results in some simulated conditions differed more than others. First of all, our simulations assume that JOLs are distributed randomly across each participant and among all participants. The condition in which our error was greatest, recognition JOL Condition 4, illustrates one consequence of this assumption. If we assume that all participants use all categories equally, our simulated numbers are more believable. However, if participants tended to use either the higher categories or the lower categories more frequently, observed gammas would be lower than simulated gammas. This is true because conditional probabilities for Categories 0 and 20 are almost identical, as are those for Categories 60, 80, and 100. Those using JOLs at only the higher range, for example, will have many cases in which the item with the higher JOL is not more likely to be recalled, lowering the gammas.

Table 4
Results (Mean Gamma) of Monte Carlo Simulations Varying JOL Frequency and Conditional Proportion Correct

JOL                                    Condition
distribution      1           2           3           4           5          M
Recall
1              .81, .84      .43         .81         .80         .70        .71
2                .74       .28, .44      .74         .63         .42        .59
3                .77         .43       .82, .83      .77         .66        .69
4                .76         .35         .77       .70, .84      .53        .62
5                .71         .27         .73         .61       .40, .57     .54
M                .76         .35         .71         .70         .54
Recognition
1              .56, .53      .45         .55         .63         .50        .54
2                .58       .51, .45      .55         .72         .50        .57
3                .58         .47       .54, .59      .66         .49        .55
4                .62         .57         .64       .72, .49      .58        .62
5                .59         .53         .59         .73       .42, .37     .59
M                .59         .50         .57         .70         .50

Note. Simulated data are based on observed values from each experimental condition. Actual results are shown in bold.
Our data, unfortunately, do not let us distinguish between the two most compelling explanations of JOLs, the monitoring-dual memories (MDM) hypothesis of Nelson and Dunlosky (Dunlosky & Nelson, 1992, 1994, 1997; Nelson & Dunlosky, 1991, 1992) and the self-fulfilling hypothesis of Spellman and Bjork (1992) and its more recent variant, the memory hypothesis of Kimball and Metcalfe (in press). Gs tended to be high when the correct answers were not evident during JOLs; this is largely consistent with an MDM account of JOLs. However, Gs were relatively high for recall tests in Condition 4, even though the correct cue–target pair was presented. At the same time, presenting target answers along with cues at time of JOL also produced an increase in memory accuracy (at the expense of metamemory accuracy); this is consistent with the self-fulfilling and memory hypotheses. Nelson et al.'s newly developed prejudgment recall and monitoring (PRAM) procedure (unpublished manuscript), in which recall attempts are made prior to JOLs, may allow this question to be addressed more directly in future research.
Most important, our data strongly argue against a processing view of TAM for paired associates. It is possible that support for TAM may yet emerge using more complex stimulus materials such as passages of text. Text materials permit a wider range of encoding strategies and processing during judgment and test; this might increase the importance of matching processing at these times (though see Rawson, Dunlosky, & McDonald, 2002, for a discussion that contradicts this view). At present, however, we see little evidence to support TAM as a viable account of metamemory accuracy.
References
Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28, 610–632.
Blaxton, T. A. (1986). Investigating dissociations among memory measures: Support for a transfer appropriate processing framework (Doctoral dissertation, Purdue University, 1985). Dissertation Abstracts International, 47, 408.
Clark-Carter, D. (1997). Doing quantitative psychological research: From design to report. East Sussex, England: Psychology Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Drum, P. A., Calfee, R. C., & Cook, L. K. (1981). The effects of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly, 16, 486–514.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374–380.
Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of learning (JOLs) to the effects of various study activities depend on when the JOLs occur? Journal of Memory and Language, 33, 545–565.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for judgments of learning (JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language, 36, 34–49.
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure in the self-assessment of comprehension. Memory & Cognition, 10, 597–602.
Graf, P., & Ryan, L. (1990). Transfer-appropriate processing for implicit and explicit memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 978–992.
Hart, J. T., Nelson, T. O., Gerler, D., Narens, L., Arbuckle, T. Y., Cuddy, L. A., et al. (1992). Metacognitive monitoring. In T. O. Nelson (Ed.), Metacognition: Core readings (pp. 131–231). Needham Heights, MA: Allyn & Bacon.
Kelemen, W. L. (2000). Metamemory cues and monitoring accuracy: Judging what you know and what you will know. Journal of Educational Psychology, 92, 800–810.
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced memory at delays: Why do judgments of learning improve over time? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1394–1409.
Kimball, D. R., & Metcalfe, J. (2002, November). Explaining the delayed-JOL effect: Evidence of a Heisenberg effect. Paper presented at the 43rd Annual Meeting of the Psychonomic Society, Kansas City, MO.
Kimball, D. R., & Metcalfe, J. (in press). Delaying judgments of learning affects memory, not metamemory. Memory & Cognition.
Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370.
Koriat, A. (1998). Illusions of knowing: The link between knowledge and metaknowledge. In V. Y. Yzerbyt (Ed.), Metacognition: Cognitive and social dimensions (pp. 16–34). Thousand Oaks, CA: Sage.
Lockhart, R. S. (2002). Levels of processing, transfer-appropriate processing, and the concept of robust encoding. Memory, 10, 397–403.
Morris, C. D. (1978). Transfer appropriate processing between different encoding dimensions (Doctoral dissertation, Vanderbilt University, 1977). Dissertation Abstracts International, 39, 1017.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16, 519–533.
Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting performance on one item relative to another item, not of the absolute performance on an individual item. Applied Cognitive Psychology, 10, 257–260.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science, 2, 267–270.
Nelson, T. O., & Dunlosky, J. (1992). How shall we explain the delayed-judgment-of-learning effect? Psychological Science, 3, 317–318.
Nelson, T. O., & Dunlosky, J. (1996, November). Toward the theoretical mechanisms underlying immediate versus delayed judgments of learning. Paper presented at the 37th Annual Meeting of the Psychonomic Society, Chicago.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework and new findings. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 26, pp. 125–173). San Diego, CA: Academic Press.
Nelson, T. O., Narens, L., & Dunlosky, J. (in press). A revised methodology for research on metamemory: Pre-judgment recall and monitoring (PRAM). Psychological Methods.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monographs, 76(1, Pt. 2).
Rajaram, S., Srinivas, K., & Roediger, H. L. (1998). A transfer-appropriate processing account of context effects in word-fragment completion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 993–1004.
Rawson, K. A., Dunlosky, J., & McDonald, S. L. (2002). Influences of metamemory on performance predictions for text. Quarterly Journal of Experimental Psychology: Human Experimental Psychology, 55A, 505–524.
Roediger, H. L. (1990). Implicit memory: Retention without remembering. American Psychologist, 45, 1043–1056.
Roediger, H. L., Gallo, D. A., & Geraci, L. (2002). Processing approaches to cognition: The impetus from the levels-of-processing framework. Memory, 10, 319–332.
Schwartz, B. L. (1994). Sources of information in metamemory: Judgments of learning and feelings of knowing. Psychonomic Bulletin & Review, 1, 357–375.
Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science, 3, 315–316.
Spellman, B. A., & Bjork, R. A. (1997, November). When prophecy succeeds (too well): Inaccurate judgments of learning can produce better-than-perfect predictions. Paper presented at the 38th Annual Meeting of the Psychonomic Society, Philadelphia.
Stankov, L. (1998). Calibration curves, scatterplots and the distinction between general knowledge and perceptual tasks. Learning and Individual Differences, 10, 29–50.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive monitoring improves their accuracy in predicting their recognition performance. Journal of Educational Psychology, 86, 290–302.
Wallsten, T. S. (1996). An analysis of judgment research analyses. Organizational Behavior and Human Decision Processes, 65, 220–226.
Weaver, C. A., III. (1990). Constraining factors in calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 214–222.
Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at delays: Shifts in response patterns or increased metamemory accuracy? Psychological Science, 8, 318–321.
Received February 20, 2002
Revision received March 20, 2003
Accepted May 8, 2003