
Reading Psychology, 35:644–665, 2014

Copyright © Taylor & Francis Group, LLC

ISSN: 0270-2711 print / 1521-0685 online


DOI: 10.1080/02702711.2013.790328

CONCURRENT AND PREDICTIVE VALIDITY OF READING RETELL AS A BRIEF MEASURE OF READING COMPREHENSION FOR NARRATIVE TEXT

EDWARD S. SHAPIRO
Center for Promoting Research to Practice, Lehigh University, Bethlehem,
Pennsylvania
NANETTE S. FRITSCHMANN
Educational Support Services, Loyola Marymount University, Los Angeles,
California
LISA B. THOMAS
Devereux Center for Effective Schools, King of Prussia, Pennsylvania
CHEYENNE L. HUGHES
Nemours/Alfred I. duPont Children's Hospital, Wilmington, Delaware
JAMES McDOUGAL
Counseling and Psychological Services Department, State University of New
York–Oswego, Oswego, New York

Concurrent and predictive validity between the Reading Retell Rubric (RRR),
Oral Reading Fluency (ORF), an adaptation of the DIBELS Retell Fluency
(RTF-A), and a state assessment emphasizing reading comprehension were
examined across students in grades 3 and 5. Results showed the RRR to have
moderate and statistically significant relationships to the ORF, RTF-A, and the
state assessment for grade 3, but weaker relationships for grade 5. For grade 3,
the RRR accounted for a small but significant proportion of variance beyond ORF
in predicting outcomes on the state assessment; for grade 5, it made no
statistically significant contribution.

An estimated 10% of children ages 7–11 years nationwide
have specific reading comprehension deficits unrelated to deficits
in word-decoding skills (Cain & Oakhill, 2006; Nation & Snowling,
1997). The assessment of reading comprehension is thus a critical
component of effective evaluation of reading outcomes.

Address correspondence to Edward S. Shapiro, PhD, Director, Center for Promoting Research to Practice, Lehigh University, L-111 Iacocca Hall, 111 Research Drive, Bethlehem, PA 18015. E-mail: ed.shapiro@lehigh.edu


A particular problem in designing assessment methods is the
complexity of defining the process of reading comprehension
itself. Reading comprehension is a complex
cognitive process involving multiple skills to connect meaning to
written text (National Reading Panel, 2000). Encompassed within
reading comprehension lie lower-level lexical skills, including vo-
cabulary knowledge and knowledge of grammatical structure as
well as higher-level text processing skills, such as inference gener-
ation, comprehension monitoring, and working memory capac-
ity (Cain & Oakhill, 2006; Kintsch & Kintsch, 2005). Both sets of
skills are necessary for the development of reading comprehen-
sion, as competency in lower-level lexical skills provides the foun-
dation for higher-level processing skills (Cain & Oakhill, 2006).
Moreover, the coordination and integration of bottom-up (e.g.,
word recognition) and top-down (e.g., connecting text to meaning
and context) processing are critical for reading comprehension
development (Cutting & Scarborough, 2006).
Given the complexity of defining reading comprehension,
the assessment of reading comprehension is equally challeng-
ing (Fletcher, 2006). Currently, there are multiple assessment
approaches for reading comprehension, including cloze (maze)
procedures, true/false sentences, sentence verification, multiple
choice, and open-ended questions (Cain & Oakhill, 2006; Pearson
& Hamm, 2005). One method that has been considered to have
potential for assessing the reading comprehension process
is retelling (Keehn, 2003; Klingner, Vaughn, & Boardman, 2007;
Morrow, 1985a, 1985b; Reed & Vaughn, 2012). The retelling pro-
cess has been used to assess understanding of material through
written as well as oral retell. In this process, students are asked to
orally retell (or write) in their own words the key points of material
they have just read.
Multiple methods of scoring retell have been used, includ-
ing using a text analysis system to divide the passages into idea
units and assign idea units to a particular level of importance
(Maria, 1990), counting the total number of words included in
a participant's retell (Fuchs, Fuchs, & Maxwell, 1988), and examining
story structure elements included in a participant's retell
(Gambrell, Koskinen, & Kapinus, 1991). Each of these meth-
ods yields somewhat different information about students' under-
standing of what they have read.
Perhaps one of the more common ways that retell has
been scored is to examine a participant's responses against a
predetermined rubric for the presence or absence of various
characteristics. The process of retelling may provide informa-
tion about what the student comprehends from the text without
prompting or cueing (Blachowicz & Ogle, 2001). Additionally, the
method provides information on a student's ability to sequence
and prioritize information and to draw on past experience, knowledge
of the subject, and familiarity with the structure of the text (Bla-
chowicz & Ogle, 2001; Copmann & Griffith, 1994; Klingner et al.,
2007).
Oral retells have been mostly used as an informal or clini-
cal assessment (Blachowicz & Ogle, 2001; Fuchs & Fuchs, 1992).
Typically, students are asked to read a passage out loud and then
retell what they have read in their own words without prompting
(Shapiro, 2004). Scoring of the student's retell includes examination
of the retell's content based on specific criteria
such as setting, characters, main idea, and so on (Keehn, 2003).
Other methods of conducting an oral retell, such as in an infor-
mal reading inventory, involve prompting by the examiner follow-
ing the retell with components of the response being scored on
certain criteria such as beginning, middle, end, characters, and so
on (Paris & Carpenter, 2003). Regardless of the method, each type
of retell is hypothesized to offer information about the student's
ability to comprehend text.
In perhaps the most comprehensive examination of the use
of retell in assessing reading comprehension, Reed and Vaughn
(2012) summarized outcomes of 54 studies that used a retell
method as an assessment. Across studies, retell was found to
be moderately correlated with standardized measures of reading
comprehension and, for older students, to show somewhat lower
correlations with measures of decoding and fluency. Additionally,
Reed (2011), examining the psychometric properties across 11
instruments, found a lack of critical data to substantiate reliability
or validity of the measures.
Although retell measures typically use rubrics to score stu-
dent responses, Good and Kaminski (2002), as part of the
Dynamic Indicators of Basic Early Literacy Skills (DIBELS, 6th edition)
measures, and Roberts, Good, and Corcoran (2005), as part of the
Vital Indicators of Progress (VIP) used within the Voyager Univer-
sal Literacy System, have used a different form of oral retell as
an assessment of comprehension. Their measure, entitled retell

fluency (RTF), involves counting the total number of words that


students say in one minute when asked to retell passages they have
just read. According to Good and Kaminski (2002), the measure
is designed to serve as a check on a student's comprehension and
to discriminate those students whose comprehension is not con-
sistent with their fluency. The measure is designed to be brief,
reliable, sensitive to change over time, and reflective of change in
performance among students.
The depth of literature examining the predictive and con-
current validity of RTF is limited. Marcotte and Hintze (2009)
examined the relationship among several measures including
RTF in predicting reading comprehension among a group
of fourth-grade students. Their study revealed that RTF con-
sistently showed the lowest correlation and a non-significant
contribution to the variance in predicting reading performance
in comparison to other measures. In addition, Marcotte and
Hintze (2009) found that RTF had relatively low levels of
interscorer agreement, suggesting the difficulties inherent in
scoring this measure. Riedel (2007) examined the relationship
between multiple DIBELS (6th edition) measures, including RTF,
and reading comprehension among first and second graders.
He found that the correlations between RTF
and standardized measures of reading such as the GRADE
(Williams, 2001) and the TerraNova (CTB/McGraw-Hill, 2003)
were not as strong as those for other DIBELS measures, especially Oral
Reading Fluency (ORF). McKenna and Good (2003) and Roberts
et al. (2005) reported statistically significant correlations be-
tween norm-referenced, standardized measures of reading com-
prehension and an RTF measure. Conversely, Pressley, Hilden, and
Shankland (2005) reported no statistically significant relationship
between RTF and comprehension and suggested that additional
exploration of combining the RTF with ORF is warranted.
The most recent version of DIBELS, DIBELS Next (Good
& Kaminski, 2011), altered the scoring of the RTF measure, limit-
ing scoring to the number of words related to the passage during
retell. Currently, no published studies have appeared examining
this later version of the RTF. Correlations reported by the test's
authors within their technical manual indicate that the correlation
between DIBELS Next RTF and ORF is in the moderate to strong
range for grades 1 through 6 (0.44 to 0.76), as is its correlation with
the GRADE total test score (range 0.40 to 0.65) across grades 1
through 6 (Good et al., 2011).
Clearly, the assessment of reading comprehension remains
a critically important aspect of the evaluation of reading perfor-
mance. Methods that can offer brief but reliable estimates of reading
comprehension are certainly needed. Such methods are espe-
cially important in models of service delivery such as response
to intervention (RTI), where universal screening is used to iden-
tify students potentially at risk for reading difficulties (e.g., Jenk-
ins, Hudson, & Johnson, 2007). Currently, there has been lim-
ited research examining retell measures in general, and especially
the RTF or Reading Retell Rubric (RRR) scoring approaches.
The purpose of this study was to examine the potential of
using an easily scorable oral retell rubric as a method for measur-
ing reading comprehension of narrative reading passages. Specif-
ically, the study examined the concurrent validity of the RRR
metric with ORF and a modified version of RTF based on the
DIBELS 6th edition method of scoring. Predictive validity of ORF,
RTF, and RRR to the annual state assessment of reading compre-
hension among third and fifth graders was also examined. In ad-
dition, the study examined the potential of the RRR metric as a
benchmarking metric to reflect change in student performance
over time.

Method

Participants and Settings

Data were collected from a total of 271 elementary school chil-


dren enrolled in grades 3 (n=158) and 5 (n=113). The students
were recruited from two suburban public elementary schools and
one suburban middle school in northeastern Pennsylvania. All
students in grades 3 and 5 in those schools were invited to par-
ticipate in the current study; however, because the primary interest
was in normative outcomes across non-identified students, data
from students with an identified disability that would impact their
reading ability (e.g., specific learning disability in reading or
speech/language impairment) were excluded from the analyses.
A total of 16 students from the schools were eliminated from the

analysis. A letter from the principal indicating his approval of the


study and a consent form were sent home to each student's parent
or guardian. Two rounds of consent forms were sent home,
which yielded an overall 79% response rate for third grade and a
71% response rate for fifth grade. An average of 31.1% (range =
11.5–41.7%) of the students enrolled at each school were eligible
for free or reduced-price lunch, with an average of 79.2% of students
enrolled at each school from Caucasian backgrounds (range =
72.0–93.3%).

Measures

ORAL READING FLUENCY (ORF)


Narrative reading passages for third and fifth grades were
selected from both Dynamic Indicators of Basic Early Literacy
Skills, 6th edition (DIBELS) (Good & Kaminski, 2002) and AIM-
Sweb (Pearson Education Inc., 2008) where key narrative story el-
ements could be identified (e.g., characters, setting, climax, etc.).
A total of six passages at each grade level were used. Each passage
consisted of approximately 350–400 words for each grade
level. The readability of the selected passages was determined
by using the Spache Readability Formula (Micro Power & Light
Co., 2008) and the Lexile Analyzer (MetaMetrics Inc., 2008). For
the six third-grade passages, the average calculated Spache readability
was at the 3.6 grade level (range = 3.3–4.0) and the average
Lexile score was 690 (range = 560–820). The fifth-grade
passages yielded an average Spache readability at a 4.5 grade level
(range = 4.2–4.8) and an average Lexile score of 983 (range =
950–1,050).
ORF was calculated by having students read each passage
aloud. The total number of words read correctly in the first
minute of reading was computed by subtracting any mispronun-
ciations, word substitutions, or omissions from the total number
of words read. The technical adequacy of ORF is well established
(Christ & Silberglitt, 2007; Marston, 1989). Test-retest reliability
has been found to range from 0.92 to 0.97, and further information is
available in the DIBELS 6th edition technical manual (Good &
Kaminski, 2002) or AIMSweb technical manual (Howe & Shinn,
2002).
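To make the scoring rule concrete, the following sketch (illustrative only; scoring in the study was done by trained examiners, and the type and function names here are hypothetical) computes the ORF score as described above:

    from dataclasses import dataclass

    @dataclass
    class OralReadingRecord:
        words_attempted: int       # words reached in the first minute of reading
        mispronunciations: int
        substitutions: int
        omissions: int

    def orf_score(rec: OralReadingRecord) -> int:
        # Words read correctly in the first minute: words attempted minus
        # mispronunciations, substitutions, and omissions.
        return rec.words_attempted - (
            rec.mispronunciations + rec.substitutions + rec.omissions)

    # Example: 120 words attempted with 3 mispronunciations and 1 omission -> 116
    print(orf_score(OralReadingRecord(120, 3, 0, 1)))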

ADAPTED DIBELS (6TH EDITION) RETELL FLUENCY (RTF-A)


DIBELS (6th edition) Retell Fluency (RTF) was developed to
complement ORF by providing an indicator of reading compre-
hension (Good & Kaminski, 2002). The purpose of RTF is to pre-
vent speed reading and ensure that students can not only read
fluently but also comprehend what they have read. In the original RTF
measure, students read a passage for one minute and retold in
their own words what the passage was about for one minute. For
this study, students were permitted to read the entire passage and
then completed the retell without regard to time. However, only
the total number of words retold in the first minute of the retell
was used to represent a student's RTF-A score. There is limited in-
formation available regarding the psychometric properties of the
original RTF. However, RTF has been found to moderately corre-
late with ORF (r = 0.59; Good & Kaminski, 2002) and adequate
interrater reliability (0.96 and 0.98) has been reported (Marcotte
& Hintze, 2009).
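The RTF-A rule described above (the retell runs to completion, but only the first minute is scored) can be sketched as follows; the code is hypothetical, assuming a transcript in which each retold word carries an onset time:

    def rtf_a_score(timed_words):
        # timed_words: (word, onset_seconds) pairs from the recorded retell.
        # Only words begun within the first 60 seconds count toward RTF-A,
        # even though the student was allowed to keep retelling.
        return sum(1 for _word, onset in timed_words if onset < 60.0)

    # Example with a tiny transcript fragment
    print(rtf_a_score([("the", 0.4), ("dog", 0.9), ("ran", 61.2)]))  # -> 2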

READING RETELL RUBRIC (RRR)


The RRR was developed for the purpose of the present study
as a brief measure of reading comprehension. The RRR was an
easily administered metric of reading comprehension that fo-
cused on the content of narrative story retells. Selection of spe-
cific story elements included in the scoring measure was derived
from research that has identified the elements of narra-
tive story grammar (Caldwell & Leslie, 2005; Medina & Piloni-
eta, 2006). In particular, students received one point for correctly
providing each of the following story structure elements in their
retell: (a) theme: the main idea of the passage or overarching
theme; (b) problem: the main problem of the story; (c) goal: how
the main character wants the problem to be solved and what the
main character is attempting to achieve; (d) setting: where and
when the story takes place; (e) characters: any characters in the
passage; (f) initiating event: the event that led to the climax and
story resolution; (g) climax: the major event of the passage; (h)
sequence: retells the story in a structural order; (i) problem solution:
how the problem was resolved; and (j) end of story: how the passage
ended. The total number of elements included in the total retell
was used to represent a student's RRR score, which ranged from a
possible 0 to 10 points.
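Because each element is scored as simply present or absent, the RRR total reduces to a checklist count. A minimal sketch (hypothetical names; the study's actual scoring relied on per-passage templates, described next):

    # The ten RRR story-structure elements listed above; one point each.
    RRR_ELEMENTS = [
        "theme", "problem", "goal", "setting", "characters",
        "initiating_event", "climax", "sequence", "problem_solution",
        "end_of_story",
    ]

    def rrr_score(elements_present):
        # Total score is the count of rubric elements included in the retell (0-10).
        return sum(1 for e in RRR_ELEMENTS if e in elements_present)

    print(rrr_score({"characters", "setting", "problem", "climax"}))  # -> 4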

The development of the RRR scoring template for each pas-


sage included multiple steps. First, narrative-based reading probes
from both DIBELS (6th edition) and AIMSweb were extensively
reviewed in order to identify passages in which the above story
elements could easily be discerned. Then, four doctoral-level
graduate students independently read each passage and iden-
tified content from the passage to match each story structure
element on the RRR. The graduate student reviews were then
compared to the RRR scoring templates created by the re-
searchers in order to establish agreement. Passages were elimi-
nated if story structure elements were missing or consensus across
the graduate student reviewers and researchers could not be
reached.
Once passages were selected and RRR scoring templates were
developed, simulated retells were developed and four graduate
students scored the retells using the RRR scoring templates. The
reviewers were instructed to not only score the RRR but also to
write additional notes regarding areas of scoring difficulty or con-
cerns. This information was then used to further refine the RRR
and scoring templates. After consensus was reached on the RRR
scoring templates, the measure was piloted in two third-grade
classrooms and two fifth-grade classrooms. Results of the pilot
study were then used to further refine the RRR and scoring tem-
plates for clarity.

PENNSYLVANIA SYSTEM OF SCHOOL ASSESSMENT READING 2008, GRADES 3 & 5 (PSSA)
The PSSA is the measure designed for educational account-
ability purposes in Pennsylvania (Data Recognition Corporation
[DRC], 2009). The reading portion of the PSSA covered two
broad skill areas that were based on the Pennsylvania Assessment
standards: (a) Comprehension and Reading Skills and (b) Anal-
ysis of Fiction and Nonfiction text (DRC, 2009). The PSSA gen-
erated a raw score converted to a standard score that classified
student performance into four levels: advanced, proficient, basic,
and below basic. The student's attained raw score was used in this
study as a measure of reading achievement. The technical infor-
mation available about the PSSA reading indicated that all of the
reading forms of the PSSA had high reliability coefficients (Cronbach's
alpha) of 0.90 for both third and fifth grades. A modified

bookmark procedure was used to establish the performance cut


points for PSSA (DRC, 2009). The proficient performance level
set for the third-grade reading PSSA test was a raw score of 25
(standard score of 1235 or above) and for fifth grade a raw score of
34 (standard score of 1275 or above) (DRC, 2009).
The reading PSSA included both multiple-choice and performance
tasks as well as open-ended tasks. Extensive evaluation
of content validity, construct validity, item fit, and calibration
were described in the technical analysis manuals and showed the
PSSA to have strong psychometric characteristics consistent with
statewide assessments (DRC, 2009).

Procedures

DATA COLLECTOR TRAINING


Data collectors included a group of 11 doctoral-level gradu-
ate students in school psychology and special education. All data
collectors were trained in the assessment and scoring of the ORF, RTF-
A, and RRR. Two 90-minute group training sessions were held to
review the purpose of the study and extensively review each of the
measures. Administration and scoring of each measure were modeled
and recorded, and several opportunities were provided to practice
administration and scoring with the recordings. Corrective
feedback was provided to the data collectors throughout the train-
ing process. Additionally, pilot data using the developed rubrics
for scoring were collected in two classrooms, and those data were
used to train the data collectors. Each data collector was pro-
vided a training manual for reference. Individual training sessions
involved mock administrations of the measures conducted with
each data collector, with additional corrective feedback given as
needed. Data collectors needed to meet a minimum of 80% agree-
ment with the scoring criterion to be eligible for data collection
as part of this study.
Inter-rater agreement was assessed for 33% of the individual
assessments, selected at random across all assessments collected
in the study. Word-by-word agreement was used to determine the
agreement for ORF and item-by-item agreement was calculated
for RRR. Agreement on RTF-A was calculated by dividing the
scores of two raters (lower by higher) and multiplying by 100.

On average, agreement for ORF was 99% (range = 96–100), 86%
(range = 50–100) for RRR, and 89% (range = 65–100) for RTF-A.
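The two agreement indices described above reduce to short formulas. A sketch (hypothetical helper names, not the study's software):

    def percent_agreement(rater1, rater2):
        # Word-by-word (ORF) or item-by-item (RRR) agreement: the share of
        # paired judgments on which the two raters matched, times 100.
        matches = sum(a == b for a, b in zip(rater1, rater2))
        return 100.0 * matches / len(rater1)

    def ratio_agreement(score1, score2):
        # RTF-A agreement: lower score divided by higher score, times 100.
        low, high = sorted([score1, score2])
        return 100.0 * low / high

    print(round(ratio_agreement(52, 60), 1))  # -> 86.7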

ADMINISTRATION OF PASSAGES
Data were collected during January and May. At each data col-
lection period, each student read three passages in their entirety.
The order in which the passages were presented was counterbal-
anced across students to control for any potential order effects.
The first minute of the oral reading was used as a measure of ORF.
If the student stopped reading after one minute, the student was
instructed to continue reading aloud. Next, the student was asked
to retell the passage in the student's own words. A digital recorder
or audiotape recorder was used to obtain the student's retell.
The first minute of the retell was used to measure the RTF-A,
while the RRR was scored at a subsequent time using the entire
retell. The median score of the ORF, RTF-A, and RRR across the
three passages was used as the dependent measure. Median scores
(rather than the mean) were used to control for any potential vari-
ability in scores due to passage difficulty. The procedures for ad-
ministering ORF, RTF-A, and RRR were repeated for each of the
three passages with different (but equivalent) passages adminis-
tered in winter and spring. The PSSA was administered by schools
according to standardized instructions provided by the state in the
beginning of April, and scores were obtained from school person-
nel for the purpose of analysis.
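The median-selection step described above amounts to the following (hypothetical scores for one student):

    from statistics import median

    # Winter ORF scores for one student across the three passages.
    winter_orf_passages = [108, 121, 115]
    print(median(winter_orf_passages))  # -> 115, carried forward as the winter ORF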

Results

Concurrent Validity

Prior to conducting the analysis, all measures were examined for


meeting distributional assumptions and were found to represent
normal distributions, with skewness and kurtosis well within the
recommended range of ±2 standard errors (Brown, 2009).
Descriptive statistics and correlations for students' performance
on the ORF, RTF-A, RRR, and PSSA measures are reported in
Table 1 for the third-grade sample and Table 2 for the fifth-grade
sample. Missing data were determined to be missing at random,
with listwise deletion used for all analyses.
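As a sketch of this distributional screen (an assumed implementation; the paper does not name its software, and the standard-error formulas below are the conventional large-sample approximations):

    import numpy as np
    from scipy import stats

    def within_two_se(x):
        # Compare sample skewness and excess kurtosis against +/- 2 standard
        # errors, using SE_skew ~ sqrt(6/n) and SE_kurt ~ sqrt(24/n).
        n = len(x)
        skew_ok = abs(stats.skew(x)) < 2 * np.sqrt(6.0 / n)
        kurt_ok = abs(stats.kurtosis(x)) < 2 * np.sqrt(24.0 / n)
        return skew_ok, kurt_ok

    rng = np.random.default_rng(0)
    print(within_two_se(rng.normal(size=158)))  # typically (True, True) for normal data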

TABLE 1 Descriptive Statistics and Correlations for the Third-Grade Sample (n = 158)

Variable              1        2        3        4        5        6        7
1. Winter ORF        1.00    0.915    0.321    0.284    0.140    0.173    0.516
2. Spring ORF                1.00     0.288    0.290    0.112    0.124    0.572
3. Winter RTF-A                       1.00     0.465    0.556    0.288    0.257
4. Spring RTF-A                                1.00     0.351    0.570    0.239
5. Winter RRR                                           1.00     0.314    0.247
6. Spring RRR                                                    1.00     0.268
7. PSSA                                                                   1.00
Mean               115.73   118.49    84.02    96.52     6.93     7.50    32.94
Standard deviation  32.23    32.48    26.24    24.88     1.87     1.60     6.76

Note. ORF = Oral Reading Fluency, RTF = Reading Retell Fluency, RRR = Reading
Retell Rubric, PSSA = Pennsylvania System of School Assessment.
*p < 0.05. **p < 0.001.

All correlations between measures for third grade were low to


moderate in strength and statistically significant (p < 0.01), with
the exception of the correlations between ORF and RRR, which
were either non-significant or barely significant (p < 0.05). The
strongest relationship was observed between ORF and PSSA (win-
ter r = 0.52; spring r = 0.57). Correlations between the PSSA
with RTF-A and RRR were low and of a similar magnitude (range
r = 0.24–0.27). Stronger correlations were observed between RTF-A
and RRR (range between 0.29 and 0.57) as compared to ORF
with RTF-A (range between 0.29 and 0.32) and ORF with RRR
(range between 0.11 [n.s.] and 0.17).
Not all correlations between measures for fifth grade were
significant. Similar to third grade, the strongest relationship was
observed between ORF and PSSA (winter r = 0.38, p < 0.01; spring
r = 0.46, p < 0.01). There was a low but significant correlation be-
tween the winter administration of the RTF-A and PSSA (r = 0.37,
p < 0.01); however, the spring administration was not significant
(r = 0.16, n.s.). There were low but significant correlations be-
tween both the winter and spring administration of the RRR and
PSSA (winter r = 0.21, p < 0.05; spring r = 0.21, p < 0.05). Sim-
ilar to the pattern at third grade, stronger correlations emerged

TABLE 2 Descriptive Statistics and Correlations for the Fifth-Grade Sample (n = 113)

Variable              1        2        3        4        5        6        7
1. Winter ORF        1.00    0.923    0.136    0.013    0.019    0.025    0.380
2. Spring ORF                1.00     0.174    0.022    0.003    0.032    0.462
3. Winter RTF-A                       1.00     0.414    0.572    0.241    0.368
4. Spring RTF-A                                1.00     0.274    0.504    0.161
5. Winter RRR                                           1.00     0.158    0.205
6. Spring RRR                                                    1.00     0.213
7. PSSA                                                                   1.00
Mean               139.44   136.25    99.96    92.20     7.89     7.32    37.19
Standard deviation  27.02    28.20    23.95    26.92     1.43     1.40     6.86

Note. ORF = Oral Reading Fluency, RTF = Reading Retell Fluency, RRR = Reading
Retell Rubric, PSSA = Pennsylvania System of School Assessment.
*p < 0.05. **p < 0.01.

between RTF-A and RRR (range 0.24–0.57, p < 0.01) as compared
to ORF with RTF-A (range between −0.02 and 0.02, n.s.) and ORF
with RRR (range between 0.03 [n.s.] and 0.46, p < 0.01).

Predictive Validity

A second question related to the degree to which the ORF, RTF-A,


and RRR measures predicted performance on the PSSA at third
and fifth grades. To address our research question, hierarchical
linear regression procedures were conducted for each grade from
the two schools, using the winter median scores on ORF, RTF-A,
and RRR as the independent variables and PSSA reading
scores as the outcome variable. The winter scores on the measures
were used, as these were the closest in time to the PSSA, which was
administered in late March and early April. The ORF scores were
entered into the model as the first block because the
literature indicated that the ORF is very likely to be a signif-
icant predictor of the outcome variable. The RTF-A and RRR
median scores were entered as the second block because there
were no conclusive findings about the priority of the two variables
in predicting the outcome variable. Key assumptions underlying

TABLE 3 Summary of the Hierarchical Linear Regression Analysis for
Variables Predicting Third-Grade PSSA Scores (n = 160)

Variable                              B      SE B      β

Block 1
  Oral Reading Fluency              0.108   0.014    0.514
Block 2
  Oral Reading Fluency              0.104   0.015    0.493
  Reading Retell Fluency-Adapted    0.003   0.022    0.012
  Reading Retell Rubric             0.684   0.293    0.183

Note. R2 = 0.295 (p < 0.001) for Step 2; ΔR2 = 0.031 (p < 0.05) for Step 2.
*p < 0.05. **p < 0.001.

linear regression were checked with respect to (a) linearity be-


tween the outcome and the predictors using all partial regression
plots; (b) constant variance and normality of residuals (errors) us-
ing graphic methods (a histogram, a normal probability plot, and
a scatterplot); and (c) multicollinearity (i.e., no high correlation
between the independent variables) using the variance inflation
factor (VIF) and tolerance (reciprocal of VIF; 1/VIF). All of these
assumptions were met for the data. Screening for outliers was also
conducted and two cases were removed from grade 3 and one
case from the grade 5 data. Although all the cases in the data had
Cook's D smaller than 1, all three cases had standardized and
studentized residuals larger than ±3.0.
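As a sketch of the two-block analysis just described (an assumed implementation using statsmodels; the data-frame column names are hypothetical):

    import pandas as pd
    import statsmodels.api as sm

    def hierarchical_blocks(df: pd.DataFrame):
        # Block 1: winter median ORF alone predicting PSSA reading scores.
        y = df["pssa"]
        block1 = sm.OLS(y, sm.add_constant(df[["orf"]])).fit()
        # Block 2: ORF plus the RTF-A and RRR winter medians.
        block2 = sm.OLS(y, sm.add_constant(df[["orf", "rtf_a", "rrr"]])).fit()
        delta_r2 = block2.rsquared - block1.rsquared
        # F test of the Block 2 improvement over Block 1 (the Delta R^2 test).
        f_stat, p_value, _df_diff = block2.compare_f_test(block1)
        return block1.rsquared, delta_r2, f_stat, p_value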
As shown in Table 3, for the third-grade sample, ORF, entered
in Block 1, explained 26% of the variance in PSSA performance,
F(1,159) = 56.82, p < 0.001. Including RTF-A and RRR as Block 2
predictors in the model explained 3% additional variance in PSSA
reading performance, F(3,159) = 21.77, p < 0.001, ΔR2 = 0.03,
p < 0.05. ORF and RRR winter median scores were identified as
significant predictors (ORF, p < 0.001; RRR, p < 0.05) of PSSA
performance, whereas RTF-A was observed to be a non-significant
predictor (p = 0.883). Beta weights show that ORF (β = 0.493,
p < 0.001) contributed the largest proportion of explained variance
in PSSA outcomes, followed by RRR (β = 0.183, p < 0.01).

TABLE 4 Summary of the Backward Elimination Regression Analysis for
Variables Predicting Fifth-Grade PSSA Scores (n = 118)

Variable                              B      SE B      β

Block 1
  Oral Reading Fluency              0.107   0.021    0.385
Block 2
  Oral Reading Fluency              0.097   0.020    0.385
  Reading Retell Fluency-Adapted    0.082   0.029    0.281
  Reading Retell Rubric             0.235   0.481    0.048

Note. R2 = 0.275 (p < 0.001) for Step 2; ΔR2 = 0.095 (p < 0.001).
*p < 0.05. **p < 0.001.

As shown in Table 4, for the fifth-grade sample, ORF entered
in Block 1 explained 17% of the variance in PSSA performance,
F(1,117) = 25.46, p < 0.001. Adding RTF-A and RRR as Block 2
predictors into the model explained 9.5% additional variance in
PSSA reading performance, F(3,117) = 14.42, p < 0.001,
ΔR2 = 0.095, p < 0.01. ORF and RTF-A winter median scores
were identified as significant predictors (ORF, p < 0.001;
RTF-A, p < 0.001) of PSSA performance, whereas RRR was
observed to be a non-significant predictor (p = 0.627). Beta weights
show that ORF (β = 0.39, p < 0.001) contributed the largest
proportion of explained variance in PSSA outcomes, followed by
RTF-A (β = 0.28, p < 0.01).

Change in Measures Over Time

Change in the ORF, RTF-A, and RRR measures from winter to


spring reflected the degree to which these measures were sensitive
to general growth across time, an essential characteristic of a mea-
sure to be used for purposes of benchmarking. As seen in Tables 1
and 2, mean performance from winter to spring across measures
was inconsistent across third and fifth grades. At third grade, all
measures showed expected significant increases in performance
from winter to spring (ORF, t = 3.17, p < 0.01; RTF-A, t = 8.67,
p < 0.001; RRR, t = 3.94, p < 0.001). Conversely, at the

fifth-grade level from winter to spring, all measures had unex-


pected, significant decreases (ORF, t = 2.37, p < 0.05; RTF-A,
t = 3.83, p < 0.001; RRR, t = 2.304, p < 0.05).

Discussion

The purpose of this study was to examine a brief, easily adminis-


tered and scored metric of reading comprehension for narrative
reading material. Specifically, the study examined the concurrent
and predictive validity of the RRR metric with ORF, RTF-A, and a
standardized state assessment of reading comprehension among
third and fifth graders. In addition, the study examined the poten-
tial of the RRR metric as a benchmarking metric to reflect change
in student performance over time.
Overall, results revealed different patterns in third and fifth
grade. In third grade, low to moderate and statistically signifi-
cant correlations were found between ORF and RTF-A measures
with the strongest, although moderate, relationship between RTF-
A and RRR. By comparison, correlations between ORF and RRR
were mostly non-significant. For fifth-grade students, the strongest
relationship was between RTF-A and RRR, which was significant,
moderate, and much higher than correlations between ORF and
RTF-A (mostly non-significant) and RRR (non-significant).
Given that RTF-A was designed to be an estimate of reading
comprehension, the significant correlations between RTF-A and
RRR at both grades suggest that the measures appear to be mea-
suring similar constructs. Important to this study was the finding
that the two measures designed to assess reading comprehension
correlated much more strongly with each other at both third and fifth
grades than with a measure of ORF. These findings suggest the
importance of assessing comprehension and fluency independently
and that the relationships between fluency and compre-
hension may not be as strong as indicated in many other studies.
With regard to predictive validity, all three measures were
used to predict a state assessment of reading (PSSA) for the
participants in Pennsylvania. For the third-grade students, ORF
and RRR accounted for the most variance in PSSA scores,
while RTF-A dropped out of the model. In contrast, ORF
and RTF-A predicted PSSA scores in fifth grade, while RRR
dropped out of the model. Taken together, these results may be

reflective of differences in the cognitive processes underlying


reading comprehension of narrative reading material for younger
versus older elementary-age students. Examination of the change
in the performance of measures over time showed that in third
grade, significant increases in all measures were evident between
the winter and spring benchmark assessment periods. In con-
trast, significant decreases were evident across measures for fifth
grade.
Why does the RRR measure appear to work differently for as-
sessing reading comprehension of narrative material at grade 3
and grade 5? The complex nature of the development of read-
ing comprehension itself as well as the developmental nature of
the instructional process across grades 3 to 5 may in part an-
swer the question. Reading comprehension is not a unitary con-
struct. Kintsch and Kintsch (2005) noted that three cognitive pro-
cesses interact in the development of comprehension: decoding,
background knowledge, and strategies. How each of these pro-
cesses contributes to overall comprehension is based on the skills
a reader brings to the task. A skilled decoder is able to sound
out words. Additionally, a reader with adequate reading compre-
hension skills has the background knowledge or information on a
topic that is essential to understanding the subject or material that
is being read. Furthermore, the use of strategies, or how skilled a
reader is in linking meaning of text with prior knowledge and
prior experiences (e.g., rereading, summarizing, clarifying), is
critical (Duke & Martin, 2008). In addition, learner and text fac-
tors interact with these cognitive processes. On the learner side,
adequate decoding, knowledge and motivation, and strategy use
all impact the capacity to fully understand what is read. Likewise,
the coherence and complexity of text play a large part in whether
a reader understands what they read (Kintsch & Kintsch, 2005;
Kucan & Beck, 1996).
Examining the typical instructional focus in the teaching of
reading also may play a part in explaining the results of this
study. Effective teaching of reading to younger children (early ele-
mentary grades) tends to emphasize lower-level lexical skills such
as word reading efficiency, vocabulary knowledge, and knowl-
edge of grammatical structure (Cain & Oakhill, 2006). In con-
trast, as students acquire these skills, instruction shifts toward
higher-level text processing skills such as inference generation,

comprehension monitoring, and working memory capacity in the


older elementary grades (Cain & Oakhill, 2006; Muter, Hulme,
Snowling, & Stevenson, 2004). A consistent criticism of the assess-
ment of reading comprehension has been the failure to recog-
nize the complexity of comprehension and that many measures
assess only one or a few dimensions of the comprehension process
(Sweet, 2005). Given the difference between third- and fifth-grade
outcomes on all of the measures, it is possible that these measures,
and the RRR measure in particular, are assessing different dimen-
sions of comprehension for third versus fifth graders. This may
especially be the case given the developmental shift from third
to fifth grade in the expected change in instruction from "learning
to read" to "reading to learn" (Stevens, Slavin, & Farnish,
1991).
A second possibility to explain the differences between out-
comes for third to fifth grade may be related to the nature of
the passages used in the RRR measure. Specifically, the measure
used in this study assessed reading retell from narrative passages.
Students knowledge of genre or type of text develops over time
(Chapman, 1994; Donovan, 2001; Newkirk, 1987). Although read-
ing comprehension is part of effective reading instruction for stu-
dents at all grades and the same strategies of predicting, monitor-
ing, questioning, imaging, and look-backs are used across grades,
substantial differences in text types between lower and upper
elementary grades result in complexities in the teaching process
which may be reflected in specific skills emphasized across grades
(Block & Duffy, 2008). Instruction in early elementary grades of-
ten revolves more around narrative-based text than other genres,
and the cognitive processes tapped by the assessment at this level
are more likely to be sensitive to student understanding of the ma-
terial. In contrast, by fifth grade, students are exposed to broader
ranges of text types (e.g., informational, procedural, persuasive)
(Best, Floyd, & McNamara, 2008), and it is possible that the RRR
measure is no longer an effective tool for assessing comprehen-
sion of narrative text at this level. This possibility clearly needs
to be explored empirically by examining how the RRR measure
responds when using non-narrative text types.
Although the addition of the RRR explained a statistically significant
proportion of variance in outcomes on the state assessment,
the amount of variance (3%) was quite small relative
to the ORF measure and was evident only for third-grade students.


For fifth grade, the addition of the RRR to the prediction was
non-significant. Additionally, while there were differences in the
strengths of correlations between RRR and RTF-A at third versus
fifth grade, the differences between correlations were small
and perhaps within the margin of error. These data suggest that
the RRR may not be adding substantial practical significance to
explaining outcomes on the state assessment, perhaps for either
grade.
Certain limitations must be considered in the results of this
study. First, the sample included a limited number of students
from culturally and/or linguistically diverse backgrounds.
Future studies should examine
whether the oral retell measure would function differently
for students whose first language is not English. Another limita-
tion of this study was the variability of passage difficulty. Although
passages were selected from material commonly used for
universal screening of reading (e.g., DIBELS, 6th edition and
AIMSweb third- and fifth-grade progress monitoring passages)
and the passages were carefully gauged to meet grade level read-
ability requirements, there was more variability in the readability
of passages within grades than expected. Although efforts to con-
trol the variability by using median scores across passages for anal-
ysis were employed, future studies should take additional steps to
carefully calibrate passage difficulty. Another limitation was hav-
ing the passage in view during retell. The passage may have served
as a prompt or cue for the retell, inflating scores on RTF-A and
RRR. This was also a deviation from DIBELS standard administra-
tion of RTF.
Finally, it is important to note that the newest version of the
DIBELS (DIBELS Next) has altered the scoring process for retell
fluency. As such, future research is needed to determine if the
findings of this study, which used a modified version of the original
DIBELS Retell Fluency measure, would equally apply to the revised
version of the measure.
Despite these limitations, the RRR measure appears to be a
brief, easily scored measure that shows small but statistically
significant concurrent validity with RTF-A, as well as predictive
validity for measures of reading comprehension, especially for
grade 3. Because the measure was found to be reflective of reading

comprehension among third- but not fifth-grade students, it


would be interesting to determine if the RRR measure has applica-
bility to children younger than grade 3. Similarly, given the chang-
ing nature of the demands of reading comprehension among
older elementary students, the applicability of the RRR measure
with non-narrative text type needs to be examined. Continued ef-
forts to find brief, easily scored measures of reading comprehen-
sion are still needed. The RRR measure used in this study shows
some promise for use among younger elementary-age students.
Research into the full applicability and limitations of this measure
is clearly needed.

References

Best, R. M., Floyd, R. G., & McNamara, D. S. (2008). Differential competencies


contributing to children's comprehension of narrative and expository texts.
Reading Psychology, 29, 137–164.
Blachowicz, C., & Ogle, D. (2001). Reading comprehension: Strategies for independent
learners. New York, NY: Guilford.
Block, C. C., & Duffy, G. G. (2008). Research on teaching comprehension:
Where we've been and where we are going. In C. C. Block & S. R. Parris (Eds.),
Comprehension instruction: Research-based best practices (2nd ed., pp. 19–37). New
York, NY: Guilford.
Brown, S. (2009). Measures of shape: Skewness and kurtosis. Retrieved from
http://www.tc3.edu/instruct/sbrown/stat/shape.htm.
Cain, K., & Oakhill, J. (2006). Assessment matters: Issues in the measurement
of reading comprehension. British Journal of Educational Psychology, 76, 697–708.
Caldwell, J. S., & Leslie, L. (2005). Intervention strategies to follow informal reading
inventory assessment: So what do I do now? Boston, MA: Pearson.
Chapman, M. L. (1994). The emergence of genres: Some findings from an ex-
amination of first-grade writing. Written Communication, 11, 348–380.
Christ, T. J., & Silberglitt, B. (2007). Estimates of the standard error of measure-
ment for curriculum-based measures of oral reading fluency. School Psychology
Review, 36, 130–146.
Copmann, K. S. P., & Griffith, P. L. (1994). Event and story structure recall by
children with specific learning disabilities, language impairments, and nor-
mally achieving children. Journal of Psycholinguistic Research, 23, 231–248.
CTB/McGraw-Hill. (2003). TerraNova. Monterey, CA: CTB/McGraw-Hill.
Cutting, L. E., & Scarborough, H. S. (2006). Prediction of reading comprehen-
sion: Relative contributions of word recognition, language proficiency, and
other cognitive skills can depend on how comprehension is measured. Scien-
tific Studies of Reading, 10, 277–299.

Data Recognition Corporation (DRC). (2009, February). Technical Report for the
PSSA 2008 Reading and Mathematics: Grades 3, 4, 5, 6, 7, 8, and 11. Maple Grove,
MN: Author.
Donovan, C. A. (2001). Children's development and control of written story and
informational genres: Insights from one elementary school. Research in the
Teaching of English, 35, 452–497.
Duke, N. K., & Martin, N. M. (2008). Comprehension instruction in action: The
elementary classroom. In C. C. Block & S. R. Parris (Eds.), Comprehension in-
struction: Research-based best practices (2nd ed., pp. 241–257). New York, NY:
Guilford.
Fletcher, J. M. (2006). Measuring reading comprehension. Scientific Studies of
Reading, 10, 323–330.
Fuchs, L. S., & Fuchs, D. (1992). Identifying a measure for monitoring student
reading progress. School Psychology Review, 21, 45–58.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal
reading comprehension measures. Remedial and Special Education, 9(2),
20–28.
Gambrell, L. B., Koskinen, P. S., & Kapinus, B. A. (1991). Retelling and the read-
ing comprehension of proficient and less-proficient readers. Journal of Educa-
tional Research, 84, 356–362.
Good, R. H., & Kaminski, R. A. (Eds.). (2002). Dynamic indicators of basic early liter-
acy skills (6th ed.). Eugene, OR: Institute for the Development of Educational
Achievement.
Good, R. H., III, & Kaminski, R. A. (2011). DIBELS Next Assessment Manual.
Eugene, OR: Dynamic Measurement Group. Retrieved from http://www.
dibels.org/.
Good, R. H., III, Kaminski, R. A., Dewey, E. N., Wallin, J., Powell-Smith, K. A.,
& Latimer, R. J. (2011). DIBELS Next Technical Manual. Eugene, OR: Dynamic
Measurement Group. Retrieved from http://dibels.org/.
Howe, K. B., & Shinn, M. M. (2002). Standard reading assessment passages (RAPs) for
use in general outcome measurement: A manual describing developmental and technical
features. Eden Prairie, MN: edformation.
Jenkins, J. R., Hudson, R. F., & Johnson, E. S. (2007). Screening for at-risk readers
in a response to intervention framework. School Psychology Review, 36, 582–600.
Keehn, S. (2003). The effect of instruction and practice through readers' theatre
on young readers' oral reading fluency. Reading Research and Instruction, 42(4),
40–61.
Kintsch, W., & Kintsch, E. (2005). Comprehension. In S. G. Paris & S. A. Stahl
(Eds.), Children's reading comprehension and assessment (pp. 71–92). Mahwah,
NJ: Lawrence Erlbaum Associates.
Klingner, J. K., Vaughn, S., & Boardman, A. (2007). Teaching reading comprehension
to students with learning difficulties. What works for special needs learners. New York,
NY: Guilford.
Kucan, L., & Beck, I. L. (1996). Four fourth graders thinking aloud: An investigation
of genre effects. Journal of Literacy Research, 28, 259–287.

Marcotte, A. M., & Hintze, J. M. (2009). Incremental and predictive utility of


formative assessment methods of reading comprehension. Journal of School
Psychology, 47, 315–335.
Maria, K. (1990). Reading comprehension instruction: Issues and strategies. Parkton,
MD: York Press.
Marston, D. B. (1989). A curriculum-based measurement approach to assess-
ing academic performance: What it is and why do it. In M. R. Shinn (Ed.),
Curriculum-based measurement: Assessing special children (pp. 18–78). New York,
NY: Guilford Press.
McKenna, M. K., & Good, R., III. (2003). Assessing reading comprehension: The rela-
tion between DIBELS Oral Reading Fluency, DIBELS Retell Fluency, and Oregon State
Assessment scores. Eugene, OR: University of Oregon.
Medina, A. L., & Pilonieta, P. (2006). Once upon a time: Comprehending narra-
tive text. In J. S. Schumm (Ed.), Reading assessment and instruction for all learners
(pp. 222–261). New York, NY: Guilford.
MetaMetrics Inc. (2008). The Lexile Analyzer. Retrieved from http://www.lexile.
com/analyzer.
Micro Power & Light Co. (2008). The Spache formula version 1.3. Dallas, TX:
Author.
Morrow, L. M. (1985a). Reading and retelling stories: Strategies for emergent
readers. Reading Teacher, 38, 870–875.
Morrow, L. M. (1985b). Retelling stories: A strategy for improving young children's
comprehension, concept of story structure, and oral language complexity.
The Elementary School Journal, 85, 647–661.
Muter, V., Hulme, C., Snowling, M. J., & Stevenson, J. (2004). Phonemes, rimes
and grammatical skills as foundations of early reading development: Evidence
from a longitudinal study. Developmental Psychology, 40, 665–681.
Nation, K., & Snowling, M. (1997). Assessing reading difficulties: The validity
and utility of current measures of reading skill. British Journal of Educational
Psychology, 67, 359–370.
National Reading Panel (2000). Teaching children to read: An evidence-based as-
sessment of the scientific research literature on reading and its implications for read-
ing instruction. Washington, DC: National Institute of Child Health and Hu-
man Development. Retrieved from http://www.nationalreadingpanel.org/
Publications/summary.htm.
Newkirk, T. (1987). The non-narrative writing of young children. Research in the
Teaching of English, 21, 121–144.
Paris, S. G., & Carpenter, R. D. (2003). FAQs about informal reading inventories.
The Reading Teacher, 56, 578–581.
Pearson Education Inc. (2008). AIMSweb. Reading-CBM. Retrieved from
https://aimsweb.pearson.com/downloads/AIMSweb TM.pdf
Pearson, P. D., & Hamm, D. N. (2005). The assessment of reading comprehension:
A review of practices–past, present, and future. In S. G. Paris & S. Stahl (Eds.),
Children's reading comprehension and assessment (pp. 13–60). Mahwah, NJ: Center
for the Improvement of Early Reading Achievement (CIERA), Lawrence Erlbaum
Associates.

Pressley, M., Hilden, K. R., & Shankland, R. (2005). An evaluation of end-of-grade


3 dynamic indicators of basic early literacy skills (DIBELS): Speed reading without
comprehension predicting little. East Lansing, MI: Literacy Achievement Research
Center, Michigan State University.
Reed, D. K. (2011). A review of the psychometric properties of retell instruments.
Educational Assessment, 16, 123–144.
Reed, D. K., & Vaughn, S. (2012). Retell as an indicator of reading comprehen-
sion. Scientific Studies of Reading, 16, 187–217.
Riedel, B. W. (2007). The relation between DIBELS, reading comprehension,
and vocabulary in urban first-grade students. Reading Research Quarterly, 42,
546–567.
Roberts, G., Good, R., & Corcoran, S. (2005). Story retell: A fluency-based indi-
cator of reading comprehension. School Psychology Quarterly, 20, 304–317.
Shapiro, E. S. (2004). Academic skills problems: Direct assessment and intervention
(3rd ed.). New York, NY: Guilford.
Stevens, R. J., Slavin, R. E., & Farnish, A. M. (1991). The effects of cooperative
learning and direct instruction in reading comprehension strategies on main
idea identification. Journal of Educational Psychology, 83, 8–16.
Sweet, A. P. (2005). Assessment of reading comprehension: The RAND reading
study group vision. In S. G. Paris & S. A. Stahl (Eds.), Children's reading comprehension
and assessment (pp. 3–12). Mahwah, NJ: Lawrence Erlbaum Associates.
Williams, K. T. (2001). Group Reading Assessment and Diagnostic Evaluation
(GRADE). Circle Pines, MN: American Guidance Services.
