Processing Similarity Does Not Improve Metamemory:
Evidence Against Transfer-Appropriate Monitoring

Charles A. Weaver III
Baylor University
William L. Kelemen
California State University, Long Beach
The transfer-appropriate monitoring (TAM) hypothesis of metamemory predicts that judgment of
learning (JOL) accuracy should improve when conditions during JOLs closely match conditions of the
memory test. The authors devised 5 types of delayed JOLs for paired associates and varied them along
with the type of memory test (cued recall or recognition). If the TAM hypothesis is correct, JOL and test
type should interact to influence metamemory. Contrary to TAM, metamemory accuracy did not improve
when JOL and test conditions matched but instead tended to vary according to whether the answer was
apparent at time of JOL. Memory test scores and JOL magnitude were both greater when the correct
target was evident during JOLs. Overall, the results are largely consistent with a monitoring retrieval
view of delayed JOLs and do not support TAM as a viable account of JOL accuracy.
Many studies of metamemory have focused on judgments of
learning (JOLs), which are predictions about future memory per-
formance that occur during, or soon after, study (Nelson & Narens,
1990). When JOLs occur for paired associates (e.g., elephant-sunburn)
and the JOL cue provides only the cue term (elephant)
and asks about future recall of the absent target term (sunburn),
metamemory accuracy increases dramatically if a delay of several
minutes occurs between study and JOL. This finding is known as
the delayed-JOL effect (Nelson & Dunlosky, 1991), and it pro-
duces the largest change in JOL accuracy yet reported. Subsequent
research has shown that the delayed-JOL effect is quite robust, and
its theoretical explanation has been discussed widely (Dunlosky &
Nelson, 1992, 1994, 1997; Kelemen, 2000; Kelemen & Weaver,
1997; Kimball & Metcalfe, 2002, in press; Koriat, 1997; Nelson &
Dunlosky, 1991, 1992, 1996; Schwartz, 1994; Spellman & Bjork,
1992, 1997; Weaver & Kelemen, 1997). In the present study, we
focused on one possible explanation of delayed-JOL accuracy
known as transfer-appropriate monitoring (TAM).
Conceptually, TAM can be seen as an extension of the well-
known transfer-appropriate processing memory hypothesis. Ac-
cording to this view, memory is best when the processes used
during encoding are recapitulated during retrieval (Blaxton, 1986;
Graf & Ryan, 1990; Lockhart, 2002; Morris, 1978; Morris, Brans-
ford, & Franks, 1977; Rajaram, Srinivas, & Roediger, 1998; Roe-
diger, 1990; Roediger, Gallo, & Geraci, 2002). Similarly, accord-
ing to the TAM hypothesis, metamemory accuracy should vary as
a function of the match between conditions during JOLs and
conditions during subsequent memory tests; the closer the match
between JOL and retrieval conditions, the more accurate the
monitoring.
To test the TAM hypothesis, Dunlosky and Nelson (1997)
examined memory for paired associates on an associative recog-
nition test, eliciting delayed JOLs with either the cue alone or the
entire cue-target pair. Because the recognition test involved both
the cue and the target, TAM predicts that metamemory accuracy
should have been higher for the cue-target JOL cues. In fact, the
opposite pattern emerged: Cue-alone JOL accuracy was better than
cue-target accuracy.
The above-mentioned view of TAM could be considered a
context-oriented version of TAM; when attempting to match judg-
ment and retrieval conditions, one looks for a match between the
stimuli. If the stimuli present at judgment match those at retrieval,
a context-oriented view of TAM would allow one to consider this
a match. The results of Dunlosky and Nelson (1997) can be used
to argue against the context-oriented version of TAM.
Alternatively, one might focus not on the specific stimuli that
are present but on the kind of processing the stimuli evoke.
Specifically, metamemory accuracy might increase if the cognitive
processing required during JOLs matches that required for suc-
cessful test performance, even if the exact context differs. We call
this the process-oriented version of TAM. This distinction, though
slight, may prove important.¹ For example, Begg, Duft, Lalonde,
Melnick, and Sanvito (1989) compared recall and recognition of
paired associates (items A and B) using different kinds of conditions
at time of judgment. All participants were tested on recognition
of A and on cued (A → B) recall. Begg et al. found that a
precise match between prediction and test context was not required
for accurate predictions. Instead, more accurate predictions
were obtained when the processing requirements were similar.
For example, consider the case in which the memory test requires
recall of B given A as a cue. Judgments were made in several
different ways, but in this example two are especially critical:
predicting recall of B given AB as the cue, and predicting recognition
of B given A as a cue. The second condition produced more
accurate predictions, even though the predictions involved recognition,
whereas the test was cued recall. Begg et al. (1989) concluded
that "predictive accuracy depends on whether the predictive
task requires the same processes as the test, not on the nominal
question . . ." (p. 630). This could be interpreted as supporting a
process-oriented view of TAM.

¹ This distinction is similar to the one between encoding specificity and
transfer-appropriate processing. Encoding specificity, for all its utility, is
descriptive in nature. Encoding specificity does not explain why memory
improves when encoding conditions match retrieval conditions; it simply
states that it does. Transfer-appropriate processing is an attempt to provide
an explanation in terms of the processing similarity.

Charles A. Weaver III, Department of Psychology and Neuroscience,
Baylor University; William L. Kelemen, Department of Psychology,
California State University, Long Beach.
We thank John Dunlosky and Janet Metcalfe for helpful comments on
a draft of this article. We also thank Sheila Barnes and Candice Ferguson
for assistance with data collection.
Correspondence concerning this article should be addressed to Charles
A. Weaver III, Department of Psychology and Neuroscience, Baylor
University, Box 97334, Waco, Texas 76798. E-mail: charles_weaver@baylor.edu
Journal of Experimental Psychology: Learning, Memory, and Cognition,
2003, Vol. 29, No. 6, 1058-1065.
Copyright 2003 by the American Psychological Association, Inc.
0278-7393/03/$12.00 DOI: 10.1037/0278-7393.29.6.1058

One complicating factor in comparing the context-oriented and
process-oriented versions of TAM is the covariability of the two.
In general, altering the stimuli present at study, prediction, and test
induces processing differences. One useful way of isolating the
two is comparing recall and recognition. Predictions of recall
should be best in a cue-only JOL condition. One's ability to predict
future performance on a recognition test, however, requires knowl-
edge not only of the correct answer but also of the alternatives. In
a situation in which one is uncertain of an answer, it is much easier
to recognize the correct response if the alternatives are not plau-
sible. Recognition of caballo as the Spanish word for horse is
much easier given agua, verde, and sol as distractors than if given
pato, vaquero, and oveja. Indeed, the difficulty of alternatives is a
major determinant of recognition performance (e.g., Drum, Calfee,
& Cook, 1981).
To examine a process-oriented version of TAM, we compared
performance on five different delayed-JOL conditions in a paired-
associate task, summarized in Table 1. Participants studied a series
of cue-target pairs like ELEPHANT-sunburn and made JOLs
several minutes after studying. Participants in Condition 1 were
presented with only the cue word at time of JOL (ELEPHANT) and
were asked to predict the likelihood of future recall (or recogni-
tion) of the target (sunburn) given the cue. Participants in Condi-
tion 2 also predicted future performance but were presented both
the cue and the target at JOL (ELEPHANT-sunburn). The superiority
of the cue-alone condition is well established but consistent
with both context-oriented and process-oriented versions of TAM.
To distinguish between the two, we varied prediction conditions on
a recognition test (Conditions 3-5).
Participants in Condition 3 studied the same word pairs, and at
time of JOL, they were shown the cue alone (ELEPHANT-?) along
with six incorrect alternative pairs and were asked to predict later
recognition performance. These incorrect alternatives combined a
correct cue (ELEPHANT) paired with incorrect responses (such as
elbow). Condition 4 was identical to Condition 3, except that the
cue-alone alternative was replaced with the correct cue-target pair.
Finally, in Condition 5, not only was the correct cue-target pair
presented at JOL (as in Condition 4) but it was marked as correct
with asterisks.
According to a process version of TAM, the JOL conditions that
produce the most accurate metamemory should vary according to
the type of test. For recognition, JOL accuracy should be highest
in Condition 4 because the processing elicited during JOLs most
closely matches the test. In contrast, Condition 1 should produce
the best performance for cued-recall tests; including incorrect
alternatives at time of JOL might even hinder metamemory by
providing irrelevant information. Thus, a critical prediction of the
TAM hypothesis is that a significant interaction between type of
JOL and type of test should emerge.

Table 1
Examples of the Five Delayed-JOL Conditions for a Hypothetical Item, ELEPHANT-sunburn

Condition  Description                                   Example of JOL prompt
1          Cue alone                                     ELEPHANT-?
2          Cue-target                                    ELEPHANT-sunburn
3          Cue alone + incorrect alternatives            ELEPHANT-diamond
                                                         ELEPHANT-hillside
                                                         ELEPHANT-macaroni
                                                         ELEPHANT-bar
                                                         ELEPHANT-?
                                                         ELEPHANT-elbow
                                                         ELEPHANT-sugar
4          Cue-target + incorrect alternatives           ELEPHANT-diamond
                                                         ELEPHANT-hillside
                                                         ELEPHANT-macaroni
                                                         ELEPHANT-bar
                                                         ELEPHANT-sunburn
                                                         ELEPHANT-elbow
                                                         ELEPHANT-sugar
5          Cue-target (marked) + incorrect alternatives  ELEPHANT-diamond
                                                         ELEPHANT-hillside
                                                         ELEPHANT-macaroni
                                                         ELEPHANT-bar
                                                         ELEPHANT-sunburn***
                                                         ELEPHANT-elbow
                                                         ELEPHANT-sugar
Note. JOL = judgment of learning.

Method
Participants and Materials
A total of 68 college undergraduates participated for course credit. The
stimuli were 60 unrelated pairs of nouns from Paivio, Yuille, and Madigan's
(1968) norms. Up to 4 people participated simultaneously during
experimental sessions, and all testing was conducted in individual cubicles
using IBM-compatible PCs.
Design and Procedures
We used a 2 × 5 (type of memory test, varied between subjects; by type
of JOL, within subject) mixed factorial design. All JOLs were delayed by
at least 2.5 min. On arrival, individuals were designated to receive either a
cued-recall test or a seven-alternative forced-choice recognition test over
all 60 items. Participants were informed which type of test would be
administered. Incorrect alternatives for the recognition test were con-
structed from correct answers for other stimuli.
Participants studied the items at a rate of 4 s/pair. An additional 5 pairs
of items (one in each JOL condition) were included at the beginning of the
study phase as a primacy buffer. The 60 critical stimuli were divided into
two blocks of 30 during the study phase (although these blocks were
transparent to the participants). Following study, the items were presented
for JOLs, again preceded by the 5 buffer items. The first block of 30 paired
associates was presented in random order for JOLs, followed by the second
block. Thus, at least 35 items (30 studied items plus the JOLs on the 5
buffer items) intervened between study and JOL. After providing all 60
JOLs, participants completed an unrelated filler activity for 10 min, fol-
lowed by an untimed memory test (either cued recall or recognition).
For each person, 12 items were randomly assigned to each of the five
JOL conditions. As described above, these JOL conditions were as follows:
1. Cue alone: The cue was shown at the top of the screen, followed
by the phrase "The first word appears alone above."
2. Cue + target: The previously studied pair was shown at the top
of the screen, followed by the phrase "The correct pair appears
above."
3. Cue alone with incorrect alternative pairs: The cue word (alone)
was shown at the top of the screen, followed by six incorrect
cue-target pairs and then the statement "The first word appears
alone above, mixed with six incorrect pairs."
4. Cue + target with six incorrect alternative word pairs, followed
by the statement "The correct pair appears above, mixed with six
incorrect pairs."
5. Cue + target with six incorrect alternative word pairs, in which
case the correct pair was noted by flanking asterisks followed by
the statement "The correct pair is marked with *** above. Six
incorrect pairs also are listed."
Participants in all conditions were asked to rate how likely they were to
recall or recognize the correct answer by selecting ratings of 0% (labeled
"definitely will not remember"), 20%, 40%, 60%, 80%, or 100% (labeled
"definitely will remember") confident.
Results
An alpha level of .05 was used for all statistical tests except
where noted. We computed $\eta^2$ as a measure of effect size for all
statistically significant analyses of variance (ANOVAs), and we
used guidelines based on Cohen (1988) to interpret $\eta^2$: 0.01 =
small effect size, 0.06 = medium effect size, and 0.14 = large
effect size (see Clark-Carter, 1997, for details).
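
As a rough check on the effect sizes reported in this section, the sketch below recovers an effect size from each F ratio and its degrees of freedom and labels it with the Cohen (1988) cutoffs given above. It is an illustration only. The article does not say which formula was used, so the partial eta-squared identity, the function names, and the "negligible" label are assumptions.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio (assumed formula)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohen_label(eta_sq):
    """Label an effect size with the cutoffs quoted in the text."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # hypothetical label for values below .01


# (F, df_effect, df_error, reported eta squared) taken from the Results text
reported_effects = [
    (6.41, 1, 37, 0.15),    # test type, metamemory accuracy
    (3.49, 4, 148, 0.09),   # JOL type, metamemory accuracy
    (51.74, 4, 264, 0.44),  # JOL type, JOL magnitude
    (3.56, 4, 264, 0.05),   # JOL type x test, JOL magnitude
    (48.73, 1, 66, 0.43),   # test type, test performance
    (46.58, 4, 264, 0.41),  # JOL type, test performance
    (4.92, 4, 264, 0.07),   # JOL type x test, test performance
]

for f_value, df1, df2, reported in reported_effects:
    estimate = partial_eta_squared(f_value, df1, df2)
    print(f"F({df1}, {df2}) = {f_value}: estimate {estimate:.3f}, "
          f"reported {reported:.2f}, {cohen_label(estimate)}")
```

Under this assumption the recovered values agree with the reported ones to within about .01, which suggests (but does not establish) that a partial eta-squared measure was reported.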
Metamemory Accuracy
We computed Goodman-Kruskal gamma correlations (G) between
JOL magnitude and memory test performance for each
participant. G was undefined in some conditions in which there
was a lack of variability in JOLs (i.e., using the same JOL rating
for all 12 items) or test performance (i.e., scoring 0/12 or 12/12 on
the memory test). Fourteen participants who received the recall test
and 15 who received the recognition test had undefined Gs in one
or more JOL conditions. Data from these participants were excluded
from analyses; mean Gs from the remaining participants
(n = 39) appear in Figure 1.
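
The following minimal sketch shows how a Goodman-Kruskal gamma of this kind can be computed for one participant in one JOL condition, including the undefined cases described above. The function name and the example data are hypothetical, and this is not the authors' analysis code.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, outcomes):
    """Gamma between JOL ratings and test outcomes (1 = correct, 0 = incorrect).

    Returns None when gamma is undefined, that is, when no pair of items
    differs on both the JOL and the outcome (for example, the same JOL for
    every item, or a score of 0/12 or 12/12).
    """
    concordant = 0
    discordant = 0
    for (jol_a, out_a), (jol_b, out_b) in combinations(zip(jols, outcomes), 2):
        product = (jol_a - jol_b) * (out_a - out_b)
        if product > 0:
            concordant += 1   # higher JOL goes with the better outcome
        elif product < 0:
            discordant += 1   # higher JOL goes with the worse outcome
        # pairs tied on either variable are ignored
    total = concordant + discordant
    if total == 0:
        return None
    return (concordant - discordant) / total


# Hypothetical ratings for four items
print(goodman_kruskal_gamma([0, 20, 40, 100], [0, 0, 1, 1]))   # perfect ordering
print(goodman_kruskal_gamma([40, 40, 40], [1, 0, 1]))          # undefined: no JOL variability
```

Because tied pairs are dropped, a participant who uses only one JOL rating, or who gets every item right or wrong, contributes no usable pairs, which is why such participants had to be excluded.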
Gs were significantly lower for recognition tests than for recall
tests (cf. Thiede & Dunlosky, 1994), F(1, 37) = 6.41, MSE = .36,
$\eta^2$ = .15, and JOL type had a significant effect, F(4, 148) = 3.49,
MSE = .18, $\eta^2$ = .09. The interaction between test type and JOL
type, however, was not significant, F(4, 148) = 1.09, MSE = .18,
p > .05. This is important because TAM predicts that monitoring
will be best when stimuli (for the context-oriented view) or the
processing used (for the processing-oriented view) at JOL match
those at test. For recall, this would be JOL Condition 1: The JOL
prompt is the cue alone, exactly like the final test. Although
predictive accuracy was high in Condition 1 for recall, adding
irrelevant alternatives to the context (Conditions 3 and 4) did not
decrease G so long as the correct answer was not evident. For
recognition, best performance would be expected in JOL Condition 4,
where the unmarked cue-target pair is presented with the
same six distractor pairs that would be present at test; G was
intermediate in that condition.
Because the interaction was not significant (and because the
pattern was similar across the two conditions), we combined Gs for
recall and recognition tests and obtained the following mean Gs for
Conditions 1-5, respectively: 0.69, 0.44, 0.72, 0.67, and 0.50.
Mean Gs tended to be higher when the answer was not evident.
Comparing all pairwise combinations of the five JOL conditions
required 10 separate tests, so we used the Bonferroni correction
procedure and adopted a more stringent alpha level (.05/10) =
.005 for each t test. No significant differences in mean Gs emerged
at the modified alpha level (observed p values for tests comparing
Conditions 2 and 5 vs. Conditions 1, 3, and 4 ranged from .007 to
.056). However, when recall and recognition tests were considered
separately, the direction of the effect (higher G when the answer
was not evident) was consistent in 12 of 12 cases; this is significant
by a sign test (p = .0002). Overall, Gs did not vary as predicted
by TAM but rather varied according to whether the correct answer
was evident during JOLs.²

² Using 137 participants, we replicated this finding in a second experiment.
The procedures were the same except that type of JOL varied
between subjects and all participants received a recall test for 30 items and
a recognition test for the remaining 30 items. This modification produced
fewer indeterminate Gs and reduced the variability in mean scores but
again completely failed to support the TAM hypothesis.
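
The Bonferroni-corrected alpha and the sign-test probability for 12 of 12 cases can be checked with a few lines of arithmetic. The sketch below assumes a one-tailed exact binomial sign test with a chance probability of .5 per case. The article does not state the number of tails, and the function names are hypothetical.

```python
from math import comb


def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests


def sign_test_p(successes, n):
    """One-tailed exact sign test: P(at least `successes` of n) under p = .5."""
    favorable = sum(comb(n, k) for k in range(successes, n + 1))
    return favorable / 2 ** n


n_pairwise = comb(5, 2)  # all pairwise comparisons among five JOL conditions
print(n_pairwise, bonferroni_alpha(0.05, n_pairwise))
print(sign_test_p(12, 12))  # 12 of 12 cases in the predicted direction
```

Under the one-tailed assumption, 12 of 12 consistent cases has probability 1/4096, about .0002, which agrees with the value reported for the sign test.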
JOL Magnitude
Mean JOL magnitude across conditions appears in Table 2. A
5 × 2 mixed ANOVA showed a significant main effect of JOL
type, F(4, 264) = 51.74, MSE = .02, $\eta^2$ = .44, and a significant
interaction between JOL type and test, F(4, 264) = 3.56, MSE = .02,
$\eta^2$ = .05. To follow up the interaction, we conducted separate
one-way ANOVAs for the recall and recognition tests, and the
effect of JOL type remained significant in both cases (Fs > 15).
Next, we conducted post hoc paired t tests at alpha = .005 to
examine differences across conditions. JOL magnitude in Condi-
tions 2, 4, and 5 was significantly higher compared with Condi-
tions 1 and 3 for both types of tests in all but one case (11 of 12
comparisons, p = .003 using a sign test). Overall, participants
were more confident about their future memory performances
when they saw the answer at time of JOL.
Test Performance
Performance was better on recognition tests than on recall tests
(see Table 2), F(1, 66) = 48.73, MSE = .23, $\eta^2$ = .43. Type of
JOL also had a strong influence on test performance, F(4, 264) =
46.58, MSE = .02, $\eta^2$ = .41. In addition, the interaction between
test type and JOL type was significant, F(4, 264) = 4.92, MSE = .02,
$\eta^2$ = .07. We conducted separate repeated measures
ANOVAs for each type of test, and the influence of JOL type
remained significant (Fs > 18) with large effect sizes ($\eta^2$ > .35)
for both types of tests. The influence of JOL cues on test performance
is clear: Participants' memory was best when the correct
answer was evident (Conditions 2 and 5), moderate when the
answer was included but not distinguished from incorrect alterna-
tives (Condition 4), and lowest when the answer was not shown
(Conditions 1 and 3). Post hoc t tests were consistent with this
interpretation (see subscripts in Table 2).
Discussion
The purpose of this study was to evaluate the effects of match-
ing the processing elicited during delayed JOLs and the processing
required at subsequent tests. The context-oriented version of the
TAM hypothesis proposes that metamemory accuracy will in-
crease as judgments and tests become more similar. One major
prediction was the emergence of a reliable interaction between
type of JOL and test: For recall tests, Condition 1 should have
produced the highest Gs; for recognition tests, Condition 4 should
have been the best. This interaction did not occur. At the same
time, we failed to obtain evidence to support a processing-oriented
version of TAM. Although metamemory accuracy was high in
Condition 1 for recall, adding irrelevant alternatives during
JOLs (thereby degrading the processing match) did not decrease
metamemory in Conditions 3 and 4. For associative recognition,
Condition 4 provided an exact match of context and processing at
JOL and test, but metamemory accuracy did not improve as
predicted. In fact, mean Gs were slightly higher in Conditions 1
and 3, which provided imperfect matches.
We found large increases in JOL magnitude and test perfor-
mance when the correct answer was evident during delayed JOLs
(Conditions 2 and 5) compared with when the answer was absent
(Conditions 1 and 3). Seeing the correct cue-target item at time of
JOL improved subsequent memory, and participants adjusted their
JOLs accordingly. The opposite pattern of results was obtained for
relative metamemory accuracy: Mean Gs were higher when the
correct answer was absent at time of JOL and lower when the
correct answer was evident. These results may have emerged
because participants attempted to retrieve the answer during JOLs
in Conditions 1 and 3, which provided highly diagnostic information
regarding future memory performance. Conditions 2 and 5, on
the other hand, provided an additional opportunity to learn the item
but offered fewer diagnostic cues for JOLs.

Table 2
Mean JOL Magnitude and Test Performance by Type of JOL and Type of Test

                                        JOL condition
Test type and        1             2             3             4             5
performance        M      SE     M      SE     M      SE     M      SE     M      SE
Recall
  JOL magnitude   .22a   .04    .51b   .05    .24a   .03    .44b   .04    .51b   .04
  Performance     .17a   .03    .43c   .04    .14a   .03    .31b   .04    .41c   .04
Recognition
  JOL magnitude   .33a   .03    .52b   .04    .37a,c .04    .44b,c .04    .52b   .04
  Performance     .61a   .04    .79b   .04    .57a   .04    .58a   .05    .73b   .05
Note. Means in the same row with different subscripts (letters) were significantly
different at p < .01 using post hoc paired t tests. JOL = judgment of learning.

Figure 1. Mean gammas as a function of judgment of learning (JOL)
condition and type of test. Vertical bars represent standard errors of the
mean.

Isolating Causes of Metamemory Accuracy
The main differences in metamemory accuracy can be summed
up as follows: Metamemory accuracy in JOL Conditions 2 and 5
is lower than it is in the other three conditions. Why is this the
case? First of all, Conditions 2 and 5 present the (clearly marked)
correct answer during the JOL, removing the need for any kind of
covert retrieval attempt, if in fact this is what participants are doing
at JOL. Specifically, these conditions preclude the case of a failed
retrieval attempt, which is particularly diagnostic (see Nelson,
Narens, & Dunlosky, in press). If so, then anything that reduces the
number of low JOLs (predictions of unsuccessful future recall)
should reduce JOL accuracy. Second, those JOL conditions may
induce an illusion of knowing (Glenberg, Wilkinson, & Epstein,
1982; Hart et al., 1992; Koriat, 1998), in which individuals de-
velop a sense of overconfidence, believing that they know more
than they do. Finally, presenting the targets at time of JOL may
impair JOL accuracy simply by distorting the distribution of JOLs.
That is, decreasing the frequency of low JOLs may reduce gammas
for reasons having to do with measurement factors, not metacog-
nitive factors. By restricting the range of JOLs, observed levels of
G may be reduced. Compared with immediate JOLs, delayed JOLs
induce many more JOLs at the extremes of the JOL continuum
(Dunlosky & Nelson, 1994; Schwartz, 1994). In previous work
(Weaver & Kelemen, 1997), however, we determined that this
distribution shift was not the primary cause of the delayed-JOL
effect.
In the present study, the different JOL conditions did induce
major differences in the distribution of JOLs. Table 3 displays the
frequency with which different JOLs were selected as a function of
JOL condition. JOL Conditions 2 and 5 elicit far fewer judgments
of 0 than any other condition: less than 10% of the time for both
recall and recognition tests. The other JOL conditions elicited
JOLs of 0 between two and six times more frequently. At the same
time, these conditions produce different levels of correct perfor-
mance when conditionalized upon JOL (also shown in Table 3);
we refer to the patterns of conditional proportions correct as
calibration curves, following common practice in this field and in
judgments and decision making (Hart et al., 1992; Nelson, 1996;
Stankov, 1998; Wallsten, 1996; Weaver, 1990). Theoretically,
perfect metacognitive accuracy would be indicated by proportions
correct that are identical to the JOL level (that is, items with JOLs
of 80% would be answered correctly 80% of the time) and would be
independent of JOL frequency. When G is less than perfect,
though, the distribution of JOLs has a significant effect. G involves
a weighted averaging of items. An inaccurate prediction that
occurs frequently will significantly lower G. The same inaccurate
observation, occurring infrequently, has much less of an effect.
Estimates of G, then, are influenced not only by the function
relating JOL and performance (the calibration curves) but also by
the relative frequency with which each JOL category is used.

Table 3
Frequency of JOL Usage and Conditional Proportion Correct by Type of JOL and Type of Test

JOL                                           JOL
condition  Measure               0     20    40    60    80    100
Recall
1          Frequency            .55   .19   .05   .07   .05   .09
           Proportion correct   .04   .09   .30   .34   .45   .83
2          Frequency            .10   .25   .20   .11   .13   .21
3          Frequency            .48   .25   .08   .07   .06   .07
4          Frequency            .22   .25   .13   .11   .13   .17
5          Frequency            .08   .26   .20   .14   .13   .19
Recognition
1          Frequency            .27   .36   .14   .04   .06   .13
           Proportion correct   .39   .56   .70  1.00   .88   .89
2          Frequency            .05   .24   .23   .18   .13   .17
3          Frequency            .20   .32   .20   .09   .07   .12
4          Frequency            .23   .27   .12   .08   .08   .23
5          Frequency            .08   .24   .19   .17   .13   .19
Note. JOL = judgment of learning.

To separate these two influences, we conducted a series of
Monte Carlo simulations, similar to those performed by Weaver
(1990) and Weaver and Kelemen (1997). For each observation, we
first determined the JOL for that item using the frequency distri-
butions displayed in Table 3. For example, in JOL Condition 1 for
the recall data, participants selected the 0% JOL rating 55% of the
time, the 20% rating 19% of the time, and so on. We generated a
random number between 0 and 1 and used this to determine the
item's JOL: If the random number was less than .55, the item was
assigned a JOL of 0. If the number was between .55 and .74 (.55 +
.19), it was given a JOL of 20, and so on. Once an item received a
JOL (assume for illustrative purposes that the item was assigned
a JOL of 0), the proportion correct data from Table 3 were used to
determine whether this item was successfully recalled. Another
random number was generated; if the number was less than .04
(the conditional proportion correct for JOL Condition 1 at JOL =
0), the item was presumed to have been correctly recalled. This
was repeated for each of 60 items, for 50 participants per simulated
experiment. Each experiment was replicated 50 times.
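
The procedure just described can be sketched as follows, using the Table 3 values for recall, JOL Condition 1. This is an illustrative reimplementation under stated assumptions, not the authors' program. The seed, the function names, and the decision to skip any simulated participant whose gamma is undefined are all assumptions.

```python
import random
from itertools import combinations

JOL_LEVELS = [0, 20, 40, 60, 80, 100]
# Table 3, recall test, JOL Condition 1
JOL_FREQUENCY = [0.55, 0.19, 0.05, 0.07, 0.05, 0.09]
PROPORTION_CORRECT = [0.04, 0.09, 0.30, 0.34, 0.45, 0.83]


def gamma(jols, outcomes):
    """Goodman-Kruskal gamma; None if no untied pairs exist."""
    concordant = discordant = 0
    for (jol_a, out_a), (jol_b, out_b) in combinations(zip(jols, outcomes), 2):
        product = (jol_a - jol_b) * (out_a - out_b)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return None if total == 0 else (concordant - discordant) / total


def simulate_item(rng, frequencies, p_correct):
    """Draw a JOL from the frequency distribution, then a recall outcome."""
    draw = rng.random()
    category = len(frequencies) - 1  # fallback guards against rounding in the sum
    cumulative = 0.0
    for index, frequency in enumerate(frequencies):
        cumulative += frequency
        if draw < cumulative:
            category = index
            break
    recalled = 1 if rng.random() < p_correct[category] else 0
    return JOL_LEVELS[category], recalled


def simulate_experiment(rng, frequencies, p_correct, n_participants=50, n_items=60):
    """Mean gamma across simulated participants in one simulated experiment."""
    gammas = []
    for _ in range(n_participants):
        items = [simulate_item(rng, frequencies, p_correct) for _ in range(n_items)]
        g = gamma([jol for jol, _ in items], [hit for _, hit in items])
        if g is not None:  # assumption: undefined gammas are skipped
            gammas.append(g)
    return sum(gammas) / len(gammas)


rng = random.Random(2003)  # arbitrary seed, for reproducibility only
experiment_means = [
    simulate_experiment(rng, JOL_FREQUENCY, PROPORTION_CORRECT) for _ in range(50)
]
mean_gamma = sum(experiment_means) / len(experiment_means)
print(round(mean_gamma, 2))
```

Passing a different condition's frequency list or calibration list to `simulate_experiment` gives the crossed cells of the design, which is how the two influences on G can be varied independently.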
These procedures allowed us to separate the effects of JOL
distribution shifts from those due to differences in calibration
curves. For example, are the lower observed Gs in Conditions 2
and 5 an artifact of the relative infrequency of using JOLs of 0? If
so, then assigning JOLs based on data where JOLs of 0 are more
frequent (such as recall, Condition 1) but using the same calibra-
tion curve should produce higher Gs. If the lower Gs reflect true
metacognitive impairments, then varying the calibration curves
while holding constant the JOL distributions should produce larger
effects. In all, 25 combinations of 5 JOL distributions and 5
conditional proportions correct were possible for both the recall
and the recognition data.
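
The logic of crossing JOL distributions with calibration curves can also be illustrated without random sampling. The sketch below computes a large-sample analogue of gamma directly from a frequency distribution and a calibration curve by weighting every pair of JOL categories by how often it would occur. It is offered as an illustration of the reasoning, not as the authors' method, and the function name is hypothetical. The inputs are the recall rows of Table 3.

```python
def expected_gamma(frequencies, p_correct):
    """Large-sample gamma implied by a JOL distribution and a calibration curve."""
    concordant = 0.0
    discordant = 0.0
    n_categories = len(frequencies)
    for low in range(n_categories):
        for high in range(low + 1, n_categories):
            pair_weight = frequencies[low] * frequencies[high]
            # concordant: lower-JOL item missed, higher-JOL item recalled
            concordant += pair_weight * (1 - p_correct[low]) * p_correct[high]
            # discordant: lower-JOL item recalled, higher-JOL item missed
            discordant += pair_weight * p_correct[low] * (1 - p_correct[high])
    return (concordant - discordant) / (concordant + discordant)


# Table 3, recall test
frequency_condition_1 = [0.55, 0.19, 0.05, 0.07, 0.05, 0.09]
frequency_condition_2 = [0.10, 0.25, 0.20, 0.11, 0.13, 0.21]
calibration_condition_1 = [0.04, 0.09, 0.30, 0.34, 0.45, 0.83]

same_source = expected_gamma(frequency_condition_1, calibration_condition_1)
crossed = expected_gamma(frequency_condition_2, calibration_condition_1)
print(round(same_source, 2), round(crossed, 2))
```

Holding the Condition 1 calibration curve fixed and swapping in the Condition 2 distribution, which has far fewer JOLs of 0, lowers the implied gamma from roughly .82 to roughly .74. That shift is modest, which is the kind of comparison the simulations were designed to make.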
The results of the simulations are shown in Table 4. The main
diagonal indicates places where the accuracy of the simulations
can be checked with participants' actual data. In 8 of 10 cases, the
simulated Gs were nearly perfect (within the 95% confidence
interval for the mean of participants' data). In the others, the
pattern observed still mirrors the data actually obtained. Overall,
we are satisfied that our simulations allow us to answer the
questions of interest.³
The results of the simulations are clear and striking. Although
changing the JOL distribution alters the Gs somewhat, varying the
calibration curves alters them substantially. Regardless of the
underlying JOL distributions, the calibration curves from Conditions
2 and 5 (those at which the correct answer is displayed and
identified at JOL) produce substantially lower Gs. The effects are
particularly powerful with recall data. This is noteworthy because
the vast majority of JOL research uses cued recall as the dependent
variable. We conclude from these data that the poor metacognitive
performance seen in Conditions 2 and 5 is a true deficit, not an
artifact of the shift in JOL distributions.

³ We can speculate as to why our results in some simulated conditions
differed more than others. First of all, our simulations assume that JOLs are
distributed randomly across each participant and among all participants.
The condition in which our error was greatest, recognition JOL Condition
4, illustrates one consequence of this assumption. If we assume that all
participants use all categories equally, our simulated numbers are more
believable. However, if participants tended to use either the higher categories
or the lower categories more frequently, observed gammas would be
lower than simulated gammas. This is true because conditional probabilities
for Categories 0 and 20 are almost identical, as are those for Categories
60, 80, and 100. Those using JOLs at only the higher range, for example,
will have many cases in which the item with the higher JOL is not more
likely to be recalled, lowering the gammas.

Table 4
Results (Mean Gamma) of Monte Carlo Simulations Varying JOL Frequency and Conditional
Proportion Correct

JOL                                      Condition
distribution   1             2             3             4             5             M
Recall
1              .81, [.84]    .43           .81           .80           .70           .71
2              .74           .28, [.44]    .74           .63           .42           .59
3              .77           .43           .82, [.83]    .77           .66           .69
4              .76           .35           .77           .70, [.84]    .53           .62
5              .71           .27           .73           .61           .40, [.57]    .54
M              .76           .35           .71           .70           .54
Recognition
1              .56, [.53]    .45           .55           .63           .50           .54
2              .58           .51, [.45]    .55           .72           .50           .57
3              .58           .47           .54, [.59]    .66           .49           .55
4              .62           .57           .64           .72, [.49]    .58           .62
5              .59           .53           .59           .73           .42, [.37]    .59
M              .59           .50           .57           .70           .50
Note. Simulated data are based on observed values from each experimental condition. Actual results are shown
in brackets.

Our data, unfortunately, do not let us distinguish between the
two most compelling explanations of JOLs, the monitoring-dual
memories (MDM) hypothesis of Nelson and Dunlosky (Dunlosky
& Nelson, 1992, 1994, 1997; Nelson & Dunlosky, 1991, 1992) and
the self-fulfilling hypothesis of Spellman and Bjork (1992) and its
more recent variant, the memory hypothesis of Kimball and Met-
calfe (in press). Gs tended to be high when the correct answers
were not evident during JOLs; this is largely consistent with an
MDM account of JOLs. However, Gs were relatively high for
recall tests in Condition 4, even though the correct cue-target pair
was presented. At the same time, presenting target answers along
with cues at time of JOL also produced an increase in memory
accuracy (at the expense of metamemory accuracy); this is consistent
with the self-fulfilling and memory hypotheses. Nelson et
al.'s newly developed prejudgment recall and monitoring (PRAM)
procedure (unpublished manuscript), in which recall attempts are
made prior to JOLs, may allow this question to be addressed more
directly in future research.
Most important, our data strongly argue against a processing
view of TAM for paired associates. It is possible that support for
TAM may yet emerge using more complex stimulus materials such
as passages of text. Text materials permit a wider range of encod-
ing strategies and processing during judgment and test; this might
increase the importance of matching processing at these times
(though see Rawson, Dunlosky, & McDonald, 2002, for a discus-
sion that contradicts this view). At present, however, we see little
evidence to support TAM as a viable account of metamemory
accuracy.
References
Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory
predictions are based on ease of processing. Journal of Memory and
Language, 28, 610-632.
Blaxton, T. A. (1986). Investigating dissociations among memory mea-
sures: Support for a transfer appropriate processing framework (Doctoral
dissertation, Purdue University, 1985). Dissertation Abstracts Interna-
tional, 47, 408.
Clark-Carter, D. (1997). Doing quantitative psychological research: From
design to report. East Sussex, England: Psychology Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Erlbaum.
Drum, P. A., Calfee, R. C., & Cook, L. K. (1981). The effects of surface
structure variables on performance in reading comprehension tests.
Reading Research Quarterly, 16, 486–514.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for
judgments of learning (JOL) and the delayed-JOL effect. Memory and
Cognition, 20, 374–380.
Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of
learning (JOLs) to the effects of various study activities depend on when
the JOLs occur? Journal of Memory and Language, 33, 545–565.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for
judgments of learning (JOL) and the cue for test is not the primary
determinant of JOL accuracy. Journal of Memory and Language, 36,
34–49.
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of
knowing: Failure in the self-assessment of comprehension. Memory and
Graf, P., & Ryan, L. (1990). Transfer-appropriate processing for implicit
and explicit memory. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 16, 978–992.
Hart, J. T., Nelson, T. O., Gerler, D., Narens, L., Arbuckle, T. Y., Cuddy,
L. A., et al. (1992). Metacognitive monitoring. In T. O. Nelson (Ed.),
Metacognition: Core readings (pp. 131–231). Needham Heights, MA:
Allyn & Bacon.
Kelemen, W. L. (2000). Metamemory cues and monitoring accuracy:
Judging what you know and what you will know. Journal of Educational
Psychology, 92, 800–810.
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced memory at
delays: Why do judgments of learning improve over time? Journal of
Experimental Psychology: Learning, Memory, and Cognition, 23, 1394–1409.
Kimball, D. R., & Metcalfe, J. (2002, November). Explaining the delayed-
JOL effect: Evidence of a Heisenberg effect. Paper presented at the 43rd
Annual Meeting of the Psychonomic Society, Kansas City, MO.
Kimball, D. R., & Metcalfe, J. (in press). Delaying judgments of learning
affects memory, not metamemory. Memory & Cognition.
Koriat, A. (1997). Monitoring one's own knowledge during study: A
cue-utilization approach to judgments of learning. Journal of
Experimental Psychology: General, 126, 349–370.
Koriat, A. (1998). Illusions of knowing: The link between knowledge and
metaknowledge. In V. Y. Yzerbyt (Ed.), Metacognition: Cognitive and
social dimensions (pp. 16–34). Thousand Oaks, CA: Sage.
Lockhart, R. S. (2002). Levels of processing, transfer-appropriate
processing, and the concept of robust encoding. Memory, 10, 397–403.
Morris, C. D. (1978). Transfer appropriate processing between different
encoding dimensions (Doctoral dissertation, Vanderbilt University,
1977). Dissertation Abstracts International, 39, 1017.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing
versus transfer appropriate processing. Journal of Verbal Learning and
Verbal Behavior, 16, 519–533.
Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting
performance on one item relative to another item, not of the absolute
performance on an individual item. Applied Cognitive Psychology, 10,
257–260.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of
learning (JOLs) are extremely accurate at predicting subsequent recall:
The delayed-JOL effect. Psychological Science, 2, 267–270.
Nelson, T. O., & Dunlosky, J. (1992). How shall we explain the
delayed-judgment-of-learning effect? Psychological Science, 3, 317–318.
Nelson, T. O., & Dunlosky, J. (1996, November). Toward the theoretical
mechanisms underlying immediate versus delayed judgments of learn-
ing. Paper presented at the 37th Annual Meeting of the Psychonomic
Society, Chicago.
Nelson, T. O., & Narens, L. (1990). Metamemory: A theoretical framework
and new findings. In G. Bower (Ed.), The psychology of learning and
motivation (Vol. 26, pp. 125–173). San Diego, CA: Academic Press.
Nelson, T. O., Narens, L., & Dunlosky, J. (in press). A revised methodology
for research on metamemory: Pre-judgment recall and monitoring
(PRAM). Psychological Methods.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery,
and meaningfulness values for 925 nouns. Journal of Experimental
Psychology Monographs, 76(1, Pt. 2).
Rajaram, S., Srinivas, K., & Roediger, H. L. (1998). A transfer-appropriate
processing account of context effects in word-fragment completion.
Journal of Experimental Psychology: Learning, Memory, and Cognition,
24, 993–1004.
Rawson, K. A., Dunlosky, J., & McDonald, S. L. (2002). Influences of
metamemory on performance predictions for text. Quarterly Journal of
Experimental Psychology: Human Experimental Psychology, 55A, 505–524.
Roediger, H. L. (1990). Implicit memory: Retention without remembering.
American Psychologist, 45, 1043–1056.
Roediger, H. L., Gallo, D. A., & Geraci, L. (2002). Processing approaches
to cognition: The impetus from the levels-of-processing framework.
Memory, 10, 319–332.
Schwartz, B. L. (1994). Sources of information in metamemory: Judgments of
learning and feelings of knowing. Psychonomic Bulletin & Review, 1, 357–375.
Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality:
Judgments of learning may alter what they are intended to assess.
Psychological Science, 3, 315–316.
Spellman, B. A., & Bjork, R. A. (1997, November). When prophecy
succeeds (too well): Inaccurate judgments of learning can produce
better-than-perfect predictions. Paper presented at the 38th Annual
Meeting of the Psychonomic Society, Philadelphia.
Stankov, L. (1998). Calibration curves, scatterplots and the distinction
between general knowledge and perceptual tasks. Learning and
Individual Differences, 10, 29–50.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive
monitoring improves their accuracy in predicting their recognition
performance. Journal of Educational Psychology, 86, 290–302.
Wallsten, T. S. (1996). An analysis of judgment research analyses.
Organizational Behavior and Human Decision Processes, 65, 220–226.
Weaver, C. A., III. (1990). Constraining factors in calibration of compre-
hension. Journal of Experimental Psychology: Learning, Memory, and
Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at
delays: Shifts in response patterns or increased metamemory accuracy?
Psychological Science, 8, 318–321.
Received February 20, 2002
Revision received March 20, 2003
Accepted May 8, 2003