
THE PREDICTIVE UTILITY OF DIBELS READING ASSESSMENT FOR READING COMPREHENSION AMONG THIRD GRADE ENGLISH LANGUAGE LEARNERS AND ENGLISH SPEAKING CHILDREN

Debora Scheffel
University of the Rockies
Colorado Springs, CO

Dianne Lefly
Colorado Department of Education Assessment/Standards
Denver, CO

Janet Houser
Regis University
Rueckert-Hartman College for Health Professions
Denver, CO

Abstract
The study addresses the extent to which subtests of the Dynamic Indicators of Basic Early Literacy Skills Reading Assessment (DIBELS; Good & Kaminski, 2002) predict student success on a measure of reading comprehension, and whether prediction is consistent for native English-speaking children and English Language Learners. A total of 2,649 elementary students were assessed on a reading comprehension measure, of whom 29.7% were English Language Learners. Descriptive and analytic statistics were generated, including bivariate correlation analysis split by language proficiency. Critical measures and suggested cutoff values (Good, Simmons, et al., 2002) were evaluated for predictive utility by visualization of Receiver Operating Characteristic (ROC) curves (Swets, Dawes, & Monahan, 2000) and comparison of area-under-the-curve (AUC) values. DIBELS better predicts children who are at low risk than those at risk; however, DIBELS correctly classifies children at risk better for ELL than non-ELL students in third grade.

Key words: English Language Learners (ELL), DIBELS, Reading, Sensitivity, Specificity


Introduction
The Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002) is a measure designed to assess three of the five big ideas of early literacy espoused in the National Reading Panel report (National Institute of Child Health and Human Development, 2000): Phonological Awareness, the Alphabetic Principle, and Fluency with Connected Text. Measures of Phonological Awareness include Initial Sound Fluency (ISF), which assesses a child's skill in identifying and producing the initial sound of a given word, and Phonemic Segmentation Fluency (PSF), which assesses a child's skill in producing the individual sounds within a given word. A measure of the Alphabetic Principle is Nonsense Word Fluency (NWF), which assesses a child's knowledge of letter-sound correspondences as well as the ability to blend letters together to form unfamiliar nonsense words (e.g., fik, lig). A measure of Fluency with Connected Text is Oral Reading Fluency (ORF), which assesses the number of correct words a child can read per minute in grade-level connected text.
The authors of the DIBELS claim that its subtests are reliable predictors of reading underachievement and thus may be used to identify students in need of intervention and to reliably determine student progress (Good, Simmons, & Kameenui, 2001). DIBELS is being used in thousands of schools across the nation, often to provide formative data to schools accountable for increasing student achievement on end-of-grade-level state reading achievement tests. In spite of its widespread use, some question its utility in assessing reading comprehension, the undisputed goal of reading (Good et al., 2001). Samuels (2006) has been particularly critical of fluency measures on the DIBELS, asserting that fluency involves decoding and comprehending texts simultaneously, whereas the DIBELS Oral Reading Fluency measure focuses on decoding speed and does not assess comprehension. This may have particularly negative implications for the utility of DIBELS with English Language Learners, as they may be able to decode text rapidly without comprehending the passage due to impediments in vocabulary and syntax. Justifying the use of the DIBELS with children learning English as their second language, Kaminski et al. (2006) reported that for English Language Learners who are learning to read in English, DIBELS are appropriate for assessing and monitoring progress in acquisition of early reading skills. Haager and Windmueller (2001) also assert that DIBELS have been used successfully with English Language Learners to predict reading underachievement, identify students in need of intervention, and determine student progress. In addition, Riedel (2007) found that DIBELS Oral Reading Fluency and comprehension were more strongly correlated in ELL students than in non-ELL students, but because of the small size of the ELL sample in his study, further investigation was needed. Schwarzer and Ferguson (2007) critically reviewed these limited research claims and concluded that they are broadly unfounded due to a lack of rigor in the research to support them.
No studies could be identified that sufficiently tested the reliability and validity of the DIBELS suite of assessments specifically with English Language Learners. While the challenges of achieving literacy in this population are well known, there have been few rigorous studies demonstrating the usefulness of these tests for accurately predicting which of these children will require more intensive intervention to achieve success.

The overall purpose of the present study is to determine the extent to which subtests of the DIBELS are effective in predicting student success on a summative, state criterion-referenced measure that exclusively measures reading comprehension in third grade, the CSAP (Colorado Student Assessment Program), and whether or not its effectiveness is consistent for native English speakers and English Language Learners.


Specifically, the purposes of this study were to:

- examine the predictive utility of the DIBELS tests in correctly identifying students in need of intensive intervention;
- identify the optimal cutoff score to correctly classify students who are in need of reading intervention; and
- compare the usefulness of the measurement system for students who are English Language Learners.

Method

Participants
Participants in the study were a cohort of 2,649 elementary school students in one of the western states in the United States. These students attended schools which received grant monies as part of a national literacy reform initiative targeting schools with the most challenged students in terms of low achievement and socioeconomic status. These students were assessed at the conclusion of the 2006 school year for reading achievement on the state assessment, and had received reading interventions for two or more years previous. Of the sample, 33.1% were Caucasian, 61.3% were identified as Hispanic, 2% were of other ethnicity, and 4.2% were of unknown ethnicity. Table 3 reveals that 66.1% were classified as non-English Language Learners, 29.7% as English Language Learners, and 4.2% had unknown English Language Learner status. Among Hispanic students, 6.9% were identified as having No English Proficiency, 50.7% were categorized as having Limited English Proficiency, and 42.7% evidenced Full English Proficiency. Seventy-five percent of sample students were eligible for free or reduced-price lunch, which is an indicator of economic disadvantage.
Table 1 reports the statewide 2006 CSAP results for context; of all students tested statewide, 49.31% were female and 51.07% male. Table 2 indicates that 75% of the study sample was eligible for free and reduced lunch, a proxy for socioeconomic status. The distribution of CSAP scores and classification of proficiency appear in Figure 1.
Measures
Criterion-related and predictive validity were tested for this report, using the state Student Assessment Program (CSAP) third-grade reading assessment as the criterion reference. Predictive utility was appraised and compared for English Language Learners and non-English Language Learners.

Table 1  2006 CSAP Grade 3 Reading Means, Standard Deviations and Percent of Total by Subgroup

Subgroup                   Total Tested   Mean SS   SS STD   % of Total Tested
Native American                     627    450.44    48.26         1.16%
Asian/Pacific Islanders            1981    477.63    54.81         3.65%
African American                   3206    443.91    54.05         5.91%
Hispanic                          14689    442.34    52.14        27.08%
White                             33942    480.83    49.48        62.58%
Female                            26747    475.53    53.78        49.31%
Male                              27698    460.35    52.66        51.07%
Not ELL                           46933    562.59    72.15        86.53%
ELL                                7304    499.44    83.18        13.47%
Total Tested                      54239    554.09    76.80


Table 2  Distribution of Demographics by Study Sample and Sub Sample

Demographics                      Total Sample   Percent of      Percent ELL   Percent Non-ELL
                                                 Total Sample    Students      Students
White                                      840          33.1%           1.4%             98.6%
Hispanic                                  1557          61.3%          48.7%             51.3%
African American                            92           3.6%           4.3%             95.7%
Asian/Pacific Islander                      24           0.9%          41.7%             58.3%
Native American                             26           1.0%          11.5%             88.5%
No Ethnicity Label                         110           4.2%          33.6%             66.4%
Male                                      1337          50.5%          30.9%             69.1%
Female                                    1312          49.5%          30.9%             68.7%
Free/Reduced Lunch Eligible               1903            75%          38.7%             61.3%
Not Free/Reduced Lunch Eligible            636            25%           8.0%             92.0%
Total Students                            2649                         29.7%             66.1%

Table 3  Distribution of Language Proficiency for Total Sample and for ELL Students

Language Proficiency          Total Sample   Percent of All Students   Percent of ELL Students
Not ELL                               1752                     66.1%
No English Proficiency                  54                      2.0%                      6.9%
Limited English Proficiency            399                     15.1%                     50.7%
Full English Proficiency               334                     12.6%                     42.7%
No ELL Label                           110                      4.2%
Total                                 2649

Figure 1  Distribution of CSAP scores and proficiency classifications (x-axis: CSAP scale score)


Cutoff scores were evaluated and identified to determine what score(s) maximize correct classification of students who need reading intervention and those who do not. Predictive validity refers to the extent to which a scale predicts scores on some criterion measure. The importance of predictive validity and utility is in generating a number that reflects the likelihood that students are accurately classified according to their risk of reading underachievement on a specific criterion (in this case, the CSAP). Specifically, the numerical coefficient reflects the likelihood that a child who scores within an at-risk range is actually at risk as measured on the criterion measure (i.e., positive predictive value), or the probability that a child who scores outside the at-risk range is actually not at risk as measured on the criterion measure (i.e., negative predictive value). Teachers must have confidence that a progress monitoring measure like the DIBELS is predictive of a student's performance in reading on an outcome measure. By establishing the extent to which this is characteristic of DIBELS, teachers can have the confidence to use DIBELS data to make instructional decisions that will positively impact student achievement.
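To make these classification quantities concrete, the following sketch computes sensitivity, specificity, and the positive and negative predictive values from a generic 2x2 screening table; the counts and the function name are illustrative placeholders, not study data.

```python
# Minimal sketch of the classification quantities defined above.
# The counts below are hypothetical placeholders, not study data.

def screening_stats(tp, fp, fn, tn):
    """tp: flagged at risk and truly below criterion; fp: flagged at
    risk but proficient; fn: flagged low risk but below criterion;
    tn: flagged low risk and proficient."""
    return {
        "sensitivity": tp / (tp + fn),  # truly at-risk children the screen catches
        "specificity": tn / (tn + fp),  # low-risk children correctly cleared
        "ppv": tp / (tp + fp),          # P(truly at risk | flagged at risk)
        "npv": tn / (tn + fn),          # P(truly not at risk | flagged low risk)
    }

print(screening_stats(tp=80, fp=20, fn=15, tn=185))
```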
The State Student Assessment Program
The CSAP is a criterion-referenced, standards-based assessment designed to provide stakeholders (i.e., students, educational personnel, parents, government officials, etc.) with a picture of student performance on the state Model Content Standards (MCS). The primary purpose of the assessment program is to determine the level at which Colorado students meet the state standards in the content areas that are assessed. Results are intended for use in improving curricula and instruction as well as increasing student learning.

The MCS were developed by educators and community members over a two-year period. The resulting standards represent a consensus of parents, educators, administrators, business people, and interested communities. The standards serve as guidelines that describe what students should know and be able to do at specific grade levels as measured by the MCS. As such, they measure attainment of the goal of proficiency in reading, writing, math, and science. The first series of CSAP assessments was administered from 1997 to 2001 in selected grades. The CSAP assessments in their current form have been administered in grades 3-10 since 2002.
The CSAP assessments were federally peer-approved in 2006 for use as the state's NCLB assessment system. The CSAP assessments result in scale scores that are based on students' item response patterns (Item Response Theory). Since 2002, the CSAP has been vertically scaled so student progress can be tracked over time. The CSAP has established internal reliability for the total test and for each content standard at each grade.

On the 2006 CSAP, total score reliability coefficients were all .86 or greater, measured as Cronbach's alpha. These reliability coefficients indicate that, overall, the state's 2006 assessment had strong internal consistency and that the tests produced relatively stable scores. The CSAP has been tested for content, construct, discriminant, and predictive validity; all are within acceptable levels. Detailed information about the results of reliability and validity testing is available in the CSAP technical reports at the Colorado Department of Education web site at http://www.cde.state.co.us/cdeassess/archives.html.
The CSAP third-grade reading assessment is the criterion reference for this study. The third-grade reading assessment measures only reading comprehension, one of the state's six reading and writing standards.

The total score reliability coefficient for the third-grade test was .89 as measured by Cronbach's alpha. This reliability coefficient indicates that the state's third grade reading assessment has strong internal consistency and that the test produces relatively stable scores. For 2006, the third grade assessment consisted of 32 multiple choice items and 8 constructed response items. The mean p-value for the multiple choice items was .70 and the mean p-value for the constructed response items was .51.
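For reference, the sketch below shows one standard computation of Cronbach's alpha from an examinee-by-item score matrix; the toy matrix is hypothetical and is not CSAP item data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an examinees x items score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Toy data: 5 examinees x 4 dichotomously scored items (hypothetical).
scores = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 1, 1],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1]], dtype=float)
print(round(cronbach_alpha(scores), 2))  # 0.7 for this toy matrix
```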
Descriptive statistics for the third grade reading assessment are in Table 1. Table 2 contains descriptive statistics for the study sample, and Table 3 contains language proficiency information about the study sample.
The DIBELS Monitoring Assessments
The DIBELS assessments are asserted to be reliable predictors of reading underachievement that may be used to identify students in need of intervention and to reliably determine student progress, providing school-based data to inform instruction and to review school-level outcomes.

Table 4 lists the six tests used for benchmarks and/or progress monitoring. Good, Simmons et al. (2002) used evaluation of the predictive capacity of each test in the suite to recommend a process for monitoring the development of literacy skills.
Table 5 summarizes the reported reliability statistics for the DIBELS suite. The reliability and validity of the Oral Reading Fluency (ORF) tests have been the focus of most evaluations reported in the literature; Table 6 summarizes the reliability and validity tests that have been reported for the ORF tests of the DIBELS suite of assessments. A review of the DIBELS published in the Mental Measurements Yearbook (Shanahan, 2005) concludes that the measures evidence adequate or better psychometric properties, but lack sound documentation of predictive validity. While the DIBELS appears to be a valid indicator of reading ability, the capacity of the tests to correctly identify the children who need additional help is supported by limited research.

For evaluating student proficiency, the third grade CSAP scale scores are categorized as:

Proficiency Level        Scale Score Range
Advanced Proficiency     656 and above
Proficient               526 to 655
Partially Proficient     466 to 525
Unsatisfactory           465 and below
Data Collection and Analysis Procedures
DIBELS data were downloaded directly from the University of Oregon data repository using standard data analytic procedures. Data were retrieved for 2,649 students for the school years ending 2004, 2005, and 2006, for all three measurement periods (Fall, Winter, and Spring) and all DIBELS assessments. University of Oregon data have an identifier that is unique to the DIBELS measurement system, so these data were matched to CSAP data using birth date, school, last name, first name, and gender. It was assumed that matching on these demographic variables would minimize potential mismatching of the two data sets (i.e., DIBELS and CSAP). The files were merged and unique identifiers were deleted so that the final file was de-identified. Data were matched such that a student's first, second, and third grade scores could be matched to the third grade CSAP score, resulting in longitudinal study data. A total of 2,492 students had valid CSAP reading scale scores that were collected from the state Department of Education (DE) database; these were matched to 515 DIBELS scores from the students' first grade performance, 1,378 scores from second grade performance, and 2,134 scores from third grade performance on the tests.
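A minimal sketch of this matching step, assuming pandas and hypothetical file and column names (the repository's actual export schema is not described here), might look like the following.

```python
import pandas as pd

# Hypothetical exports; the real repository schema is not specified.
dibels = pd.read_csv("dibels_scores.csv")
csap = pd.read_csv("csap_scores.csv")

# Match records on the demographic fields named above.
keys = ["birth_date", "school", "last_name", "first_name", "gender"]
merged = dibels.merge(csap, on=keys, how="inner")

# Delete direct identifiers so the final analysis file is de-identified.
merged = merged.drop(columns=["birth_date", "last_name", "first_name"])
merged.to_csv("matched_deidentified.csv", index=False)
```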
Table 4  Assessments in the DIBELS Suite of Reading Tests

Test                                 Grades
Letter Naming Fluency (LNF)          Kindergarten through 1st
Initial Sound Fluency (ISF)          Kindergarten
Phoneme Segmentation Fluency (PSF)   Kindergarten through 1st
Nonsense Word Fluency (NWF)          Kindergarten through 2nd
Oral Reading Fluency (ORF)           1st through 3rd
Word Use Fluency (WUF)               Kindergarten through 3rd

Table 5  Documented Reliability Statistics for the DIBELS Tests

Study                  Reliability/Validity Measure                 Coefficient
Shanahan, T. (2005)    Cronbach's alpha                             .92
                       Test-retest                                  .92 to .97
                       Concurrent validity with
                       Woodcock-Johnson Reading                     .80
                       Predictive validity                          .66
McKnight (2001)        Inter-rater reliability                      high .80s
Elliott (2001)         Inter-rater reliability                      .82 to .94
                       Test-retest                                  .74 to .93
Hintze (2003)          Equivalent forms                             .64 to .91
                       Cronbach's alpha                             .49 to .69

Table 6  Results of Comparable Studies Relating DIBELS ORF Scores and State Assessments of Reading

Study                         Correlation of Spring 3rd Grade ORF with State Assessment
Stage & Jacobsen (2001)       .43 to .44
Buck & Torgesen (2003)        .62 to .78
Wilson (2005)                 .74
VanDerMeer (2005)             .61 to .65
Shaw & Shaw (2002)            .73 to .80
Barger (2003)                 .73
McGlinchey & Hixon (2004)     .49 to .81
Current Study                 .604 to .628

Descriptive and analytic statistics for all variables were generated for the total respondent group and for the language proficiency subgroups (i.e., English Language Learners and non-English Language Learners). Bivariate correlation analysis was run on the DIBELS/CSAP variables for the total data set and split by language proficiency. Cross-tabulations were developed to test the association between DIBELS categorizations (at risk, some risk, and low risk) and CSAP classification as adequate or inadequate proficiency. Chi-square statistics were generated to test for statistical significance of any associations. Cross-tabulated tables were used to evaluate the sensitivity, specificity, positive and negative predictive value, and overall accuracy of the DIBELS assessments. Critical measures and suggested cutoff values as identified by Good, Simmons et al. (2002) were evaluated for predictive utility by visualization of ROC (Receiver Operating Characteristic) curves (Swets, Dawes, & Monahan, 2000) and comparison of the area-under-the-curve (AUC) values.
Results
Students in the sample for the current study were by definition at risk, by virtue of attending schools that qualify for specific grant monies. As a result, the distribution of students in the sample is not typical of the Colorado population at large, as can be seen by comparing Tables 1 and 2. Project schools have a higher proportion of minority students, students who qualify for free and reduced lunches, and students who are English Language Learners (ELL).

The Pearson correlation coefficients for the relationship between DIBELS Oral Reading Fluency (ORF) scores for Fall, Winter, and Spring of the third grade and the CSAP reading scale score appear in Table 7. The correlations are reported for all students and by language proficiency classification. The relationships between the DIBELS tests and the CSAP reading scores are moderately strong to strong.

The tests are as highly correlated for students who are classified as ELL as for those who are not. The tests are highly correlated among the three measurement periods, and so it could be assumed that each possesses roughly the same amount of predictive utility.
Table 6 provides comparative data from the literature, which show the variability in correlation coefficients across typical studies. While part of the variability is accounted for by differences in state assessments, the magnitude of the relationship between DIBELS ORF scores and state assessments remains quite variable. For the current study, the correlation is moderately strong to strong, and indicates a linear relationship between scores on the DIBELS ORF in all three third grade time periods and the CSAP reading score.
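A sketch of this split correlation analysis is shown below, assuming a pandas data frame with hypothetical column names for the three ORF scores, the CSAP scale score, and ELL status; these are not the authors' actual variable names.

```python
import pandas as pd

# Hypothetical analysis file and column names (see the matching sketch
# in the Method section above).
df = pd.read_csv("matched_deidentified.csv")

cols = ["orf_fall_3rd", "orf_winter_3rd", "orf_spring_3rd", "csap_scale"]
print(df[cols].corr(method="pearson"))        # all students
print(df.groupby("ell_status")[cols].corr())  # split by language status
```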

Table 7  Correlation Coefficients for DIBELS Third Grade Assessments

                       DIBELS Fall              DIBELS Winter            DIBELS Spring
                   ALL   Non-ELL   ELL      ALL   Non-ELL   ELL      ALL   Non-ELL   ELL
DIBELS Winter    0.912     0.906  0.909
DIBELS Spring    0.877     0.873  0.879   0.912     0.908  0.912
CSAP Spring      0.582     0.558  0.587   0.607     0.588  0.602   0.623     0.603  0.628

Table 8  Measures of Correct Classification of State Assessment Reading Proficiency by DIBELS Spring 3rd Grade Oral Reading Fluency (ORF)

Study                        % At Risk Correctly Identified   % Low Risk Correctly Identified
Hintze et al. (2003)                         .58 to .86                       .52 to .88
Stage & Jacobsen (2001)                             .41                              .90
Buck & Torgesen (2003)                              .81                              .91
Wilson (2005)                                       .93                              .82
VanDerMeer (2005)                                   .96                              .72
McGlinchey & Hixon (2004)                           .77                              .72
This Study                                   .51 to .64                       .92 to .93



The predictive utility of the DIBELS ORF
scores is depicted in Figure 2 as a scattergram
of the relationship between DIBELS classifi
cations of risk and CS AP classifications of pro
ficiency. Students in the upper right and lower
left comers are accurately classified as to their
risk of not achieving proficiency on the CSAP
reading test. These data are consistent with pre
vious studies that demonstrate predictive utili
ty for the at risk and low risk categories,
but the some risk category is equally likely
to predict proficiency or a lack of proficiency.
The lower cutoff of the ORF score would have
to be substantially lowered to achieve signifi
cant changes in predictive utility.
Predictive utility has been measured in previous studies by calculating the proportion of low risk and at risk students who are correctly classified as proficient or lacking proficiency. Table 8 reports these figures for comparable studies of the spring third grade ORF and state assessments of reading, as well as the results of this study. In the current study's sample, between 51% and 64% of those at risk and 92% to 93% of low risk students were correctly classified by third grade DIBELS tests. Table 9 demonstrates the classification accuracy of all three (i.e., fall, winter, spring) third grade ORF DIBELS tests for the total student group. Sensitivity ranged from .80 to .87, with strongest sensitivity in the fall and winter. Specificity, on the other hand, was strongest in the spring measurement period, and ranged from a low of .64 to a high of .83. Sensitivity was calculated as recommended by Good, Gruba, and Kaminski (2001) by eliminating from analysis students who fell in the midrange (i.e., students classified at some risk).
Calculations of sensitivity, specificity, percent of correct classification, and overall accuracy appear for all tests in Table 10, and provide the most detailed information about the usefulness of individual tests in specific populations. Sensitivity ranged from a low of .53 to a high of .87 for the total group on the DIBELS tests most highly related to the outcome measure (i.e., the CSAP). Specificity ranged from a low of .64 to a high of .92. The best balance of sensitivity and specificity, indicating a low rate of both false positives and false negatives, is achieved with the Winter and Spring second and third grade ORF tests. Interestingly, the Phoneme Segmentation Fluency score at the beginning of first grade also had a good balance of these two diagnostic characteristics, particularly considering the length of time that passed between the first grade DIBELS testing period and the third grade CSAP.
Figure 2  Scattergram of the relationship between DIBELS classifications of risk and CSAP classifications of proficiency (x-axis: ORF 3rd End score; y-axis: CSAP scale score)

Table 10 also depicts the percent of at risk and low risk students correctly classified, and the overall accuracy of the tests. The best balance of correct classification of both at risk and low risk is achieved with the Winter Nonsense Word Fluency (NWF) and Spring Oral Reading Fluency (ORF) in the first grade. The Spring first grade ORF also provided the second best overall accuracy, lagging behind the Spring third grade ORF by only .02. This is an important finding, given that identification of risk in the first grade provides much more time for intensive intervention than does identification in the third grade.
Table 9  Cross-tabulations of CSAP and DIBELS Categories by Third Grade Measurement Period

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Spring
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 298                  171                73   542 (26%)
Proficient and Above              168                  554               836  1558 (74%)
Total                       466 (22%)            725 (35%)         909 (43%)        2100
Sensitivity: .80   Specificity: .83

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Winter
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 369                  125                57   551 (26%)
Proficient and Above              316                  521               736  1572 (74%)
Total                       685 (32%)            646 (30%)         793 (37%)        2124
Sensitivity: .87   Specificity: .70

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Fall
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 385                  118                57   560 (26%)
Proficient and Above              366                  549               659  1574 (74%)
Total                       751 (35%)            667 (31%)         716 (34%)        2134
Sensitivity: .87   Specificity: .64

Note: Sensitivity and specificity calculations did not include students whose ORF scores were in the midrange (i.e., students at some risk).
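As a check, the reported Spring figures can be recomputed directly from the cross-tabulation above, dropping the midrange column as the table note specifies; a short sketch follows.

```python
# Spring 3rd grade counts from Table 9, excluding the "some risk" column.
tp, fn = 298, 73    # below proficiency: at risk, low risk
fp, tn = 168, 836   # proficient and above: at risk, low risk

sensitivity = tp / (tp + fn)  # 298/371, about .80 as reported
specificity = tn / (tn + fp)  # 836/1004, about .83 as reported
print(round(sensitivity, 2), round(specificity, 2))
```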



Table 10  Predictive Utility Statistics for All Grades, and Comparing ELL to Non-ELL

Test                     Sensitivity   Specificity   % At Risk    % Low Risk   Overall
                                                     correctly    correctly    Accuracy
                                                     classified   classified

ALL STUDENTS
ORF 3rd End                     0.80          0.83         0.64         0.92       0.82
ORF 3rd Middle                  0.87          0.70         0.54         0.93       0.75
ORF 3rd Beg                     0.87          0.64         0.51         0.92       0.71
ORF 2nd End                     0.77          0.77         0.58         0.89       0.77
ORF 2nd Middle                  0.76          0.78         0.58         0.89       0.77
ORF 2nd Beg                     0.82          0.69         0.54         0.90       0.73
ORF 1st End                     0.54          0.92         0.77         0.81       0.80
NWF 1st Mid                     0.53          0.90         0.71         0.81       0.79
PSF 1st Beginning               0.71          0.64         0.54         0.79       0.67

Not ELL / ELL Comparison
ORF 3rd End: Not ELL            0.76          0.86         0.59         0.93       0.84
ORF 3rd End: ELL                0.85          0.77         0.70         0.89       0.80
ORF 3rd Mid: Not ELL            0.83          0.74         0.49         0.93       0.76
ORF 3rd Mid: ELL                0.91          0.58         0.60         0.91       0.71
ORF 3rd Beg: Not ELL            0.83          0.70         0.47         0.93       0.73
ORF 3rd Beg: ELL                0.92          0.50         0.57         0.89       0.67
ORF 2nd End: Not ELL            0.75          0.81         0.57         0.91       0.79
ORF 2nd End: ELL                0.81          0.64         0.61         0.83       0.71
ORF 2nd Mid: Not ELL            0.72          0.83         0.59         0.90       0.80
ORF 2nd Mid: ELL                0.81          0.64         0.58         0.84       0.70
ORF 2nd Beg: Not ELL            0.80          0.75         0.53         0.92       0.76
ORF 2nd Beg: ELL                0.85          0.51         0.57         0.82       0.66
ORF 1st End: Not ELL            0.47          0.94         0.77         0.82       0.81
ORF 1st End: ELL                0.64          0.86         0.76         0.77       0.77
NWF 1st Mid: Not ELL            0.43          0.93         0.70         0.81       0.79
NWF 1st Mid: ELL                0.74          0.82         0.71         0.83       0.79
PSF 1st Beg: Not ELL            0.63          0.72         0.55         0.78       0.69
PSF 1st Beg: ELL                0.86          0.44         0.53         0.81       0.61


Predictive utility statistics in Table 10 further compare ELL students with non-ELL students. The sensitivity of the tests for ELL students ranges from .64 to .92, although most exceeded 80% sensitivity. For all tests, DIBELS demonstrated better sensitivity for ELL students than for non-ELL students. The reverse was true for specificity; the DIBELS demonstrated better specificity across the board for non-ELL students (ranging from .70 to .94) than for ELL students (.44 to .86). The percent of at risk students correctly classified was also generally greater for ELL students (.57 to .76) than non-ELL students (.47 to .77). The percent of low risk students correctly classified was greater for non-ELL students (.78 to .93) than for ELL students (.77 to .91). Overall accuracy of the tests was roughly equal between ELL and non-ELL students, although slightly higher for non-ELL. Sensitivity and specificity are most balanced in the third grade, less so in the second grade, and least in the first. On the other hand, while third grade scores have the strongest linear relationship with the CSAP score, first grade scores have a better balance of predicting both proficiency and lack of proficiency accurately.
ROC curves were constructed for each of the recommended sequences of tests, and the area-under-the-curve (AUC) was calculated. A higher area under the curve represents a balance between sensitivity and specificity that maximizes overall accuracy of a predictive test. The AUC values ranged from .804 to .816 for the third grade tests of oral fluency, .774 to .804 for second grade tests, and .731 to .794 for first grade tests. While predictive utility declines as tests are more remote in time from the outcome, all are within an acceptable range for predictive utility.
There were no identifiable breaks or elbows in the ROC curves, which indicates that a revised cutoff score would not achieve dramatically better predictive utility. Ideally, tests will identify students at risk earlier than the third grade. The correct classification of low risk and at risk students by the recommended test sequence is depicted in Figure 3. Across all tests, the tests are better at predicting success on the CSAP reading test than failure. Two of the most accurate predictors are first grade tests: the winter Nonsense Word Fluency (NWF) and the spring ORF. These two tests have a balanced capacity to predict success and to predict risk of failure. The winter and spring second grade ORF tests are better at identifying risk of failure than the first two third grade tests; otherwise, second and third grade tests are roughly equal in predicting proficiency. A similar pattern was demonstrated for ELL and non-ELL students; in general, DIBELS is better at predicting children who are at low risk than those at risk across the board. However, DIBELS correctly classifies children at risk better for ELL than non-ELL students in the third grade, better in two of the three time periods in second grade, and roughly equally in the first grade.

Discussion and Conclusions


Correlations among the third grade ORF tests (i.e., fall, winter, spring) are strong. Correlations between the ORF tests and the CSAP measure are consistent with comparable studies and are moderately strong to strong. There remains a good deal of variability in the correlation between DIBELS and CSAP across studies. The cutoff score of 90 for the DIBELS ORF third grade spring would have to be lowered substantially to improve the predictive utility of the test. Continuing with the cutoff scores recommended by Good and Kaminski (2002) is a reasonable approach to identifying at-risk students.
The Predictive Utility of Dibels Reading Assessment / 99

Figure 3  Percent at risk and low risk correctly classified by recommended test sequence

Measures of oral reading fluency are as effective for students who are classified as ELL as for children who are not. Sensitivity is higher across the board for ELL students, while specificity is lower. Conversely, the tests are better at predicting at risk students when they are ELL and better at predicting low risk students when they are not. For both ELL and non-ELL students, DIBELS is better at predicting success than failure. This does allow for the consideration of early interventions to improve literacy achievement. Two of the tests with the strongest predictive capacity are administered in the first grade, which allows for early intervention.

Across all tests, DIBELS subtests are better at predicting success on the CSAP reading test than failure. In the first grade, the winter NWF and spring ORF have a balanced capacity to predict success and to predict risk of failure. A similar pattern was demonstrated for ELL and non-ELL students; that is, in general, DIBELS is better at predicting children who are at low risk than those at risk. However, DIBELS correctly classifies children at risk better for ELL than non-ELL students in the third grade, better in two of the three time periods in second grade, and approximately equally in first grade. This information could assist educators in justifying implementation of early intervention for at risk students, including ELL students.

The current study adds to the limited body of research strengthening the conclusion that DIBELS is effective in identifying English Language Learners who are at risk for underachieving in reading.



The findings of this study suggest that the DIBELS can be used to classify English Language Learners who are at risk for reading failure. Given the equivocal results of previous research on this question, this is a significant finding in light of the substantial sample size and the high proportion of English Language Learners in this study. As Riedel (2007) concluded in his research, we cannot draw conclusions about the nature of interventions needed to address the reading underachievement of English Language Learners, but these data provide evidence that the DIBELS is useful in identifying ELL students at risk for underachievement in reading comprehension.

References
Barger, J. (2003). Comparing the DIBELS oral reading fluency indicator and the North Carolina end-of-grade reading assessment (Technical Report). Asheville, NC: Carolina Teaching Academy.
Buck, J., & Torgesen, J. (2003). The relationship between performance on a measure of oral reading fluency and performance on the Florida Comprehensive Assessment Test (FCRR Technical Report #1). Florida Center for Reading Research.
Elliott, J., Lee, S., & Tollefson, N. (2001). A reliability and validity study of the Dynamic Indicators of Basic Early Literacy Skills - Modified. School Psychology Review, 30(1), 33-49.
Good, R. H., Gruba, J., & Kaminski, R. (2001). Best practices in using Dynamic Indicators of Basic Early Literacy Skills (DIBELS) in an outcomes-driven model. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (pp. 679-700). Washington, DC: National Association of School Psychologists.
Good, R. H., Simmons, D. C., & Kameenui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high stakes outcomes. Scientific Studies of Reading, 5, 257-288.
Good, R. H., Kaminski, R. A., Smith, S., Simmons, D., Kameenui, E., & Wallin, J. (in press). Reviewing outcomes: Using DIBELS to evaluate a school's core curriculum and system of additional intervention in kindergarten. In S. R. Vaughn & K. L. Briggs (Eds.), Reading in the classroom: Systems for observing teaching and learning. Baltimore: Paul H. Brookes.
Good, R. H., Simmons, D., Kameenui, E., Kaminski, R. A., & Wallin, J. (2002). Summary of decision rules for intensive, strategic, and benchmark instructional recommendations in kindergarten through third grade (Technical Report No. 11). Eugene, OR: University of Oregon.
Haager, D., & Windmueller, M. P. (2001). Early reading intervention for English Language Learners at-risk for learning disabilities: Student and teacher outcomes in an urban school. Learning Disability Quarterly, 24(Fall), 235-250.
Hintze, J., Ryan, A., & Stoner, G. (2003). Concurrent validity and diagnostic accuracy of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and the Comprehensive Test of Phonological Processing. School Psychology Review, 32(4), 541-556.
Kaminski, R., Good, R., Baker, D., Cummings, K., Dufour-Martel, C., Fleming, K., et al. (2006). Position paper on use of DIBELS for diverse learners. Dynamic Measurement Group. Retrieved December 3, 2008, from http://www.dibels.org/publications.html
McKnight, C., Lee, S., & Schowengerdt (2001). Effects of specific strategy training on phonemic awareness and reading aloud with preschoolers: A comparison study. Retrieved from ERIC, April 2001, pp. 1-55.
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel (NIH Publication No. 00-4769). Washington, DC: U.S. Government Printing Office.
Riedel, B. (2007). The relation between DIBELS, reading comprehension, and vocabulary in urban first-grade students. Reading Research Quarterly, 42(4), 546-567.
Samuels, S. J. (2006, May). Introduction to reading fluency. Paper presented at the annual meeting of the International Reading Association, Chicago.
Schwarzer, D., & Ferguson, D. (2007). DIBELS and English Language Learners in the United States: An analysis of the scientifically based research behind the test. TESOL Quarterly Newsletter, Bilingual Education Interest Section (in press).
Shanahan, T. (2005). Review of DIBELS: Dynamic Indicators of Basic Early Literacy Skills. In The Mental Measurements Yearbook (16th ed., pp. 310-312).
Shaw, R., & Shaw, D. (2002). DIBELS oral reading fluency-based indicators of third grade reading skills for Colorado State Assessment Program (CSAP) (Technical Report). Eugene, OR: University of Oregon.
Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1-26.
Vander Meer, C. D., Lentz, F. E., & Stollar, S. (2005). The relationship between oral reading fluency and Ohio proficiency testing in reading (Technical Report). Eugene, OR: University of Oregon.

