
THE PREDICTIVE UTILITY OF DIBELS READING ASSESSMENT FOR READING COMPREHENSION AMONG THIRD GRADE ENGLISH LANGUAGE LEARNERS AND ENGLISH SPEAKING CHILDREN

Debora Scheffel
University of the Rockies
Colorado Springs, CO

Dianne Lefly
Colorado Department of Education Assessment/Standards
Denver, CO

Janet Houser
Regis University
Rueckert-Hartman College for Health Professions
Denver, CO

Abstract
The study addresses the extent to which subtests of the Dynamic Indicators of Basic Early Literacy Skills Reading Assessment (DIBELS; Good & Kaminski, 2002) predict student success on a measure of reading comprehension, and whether prediction is consistent for native English-speaking children and English Language Learners. A total of 2,649 elementary students were assessed on a reading comprehension measure, of whom 29.7% were English Language Learners. Descriptive and analytic statistics were generated, including bivariate correlation analysis split by language proficiency. Critical measures and suggested cutoff values (Good, Simmons, et al., 2002) were evaluated for predictive utility by visualization of Receiver Operating Characteristic (ROC) curves (Swets, Dawes, & Monahan, 2000) and comparison of area-under-the-curve (AUC) values. DIBELS better predicts children who are at low risk than those at risk; however, DIBELS correctly classifies children at risk better for ELL than non-ELL students in third grade.

Key words: English Language Learners (ELL), DIBELS, Reading, Sensitivity, Specificity


Introduction
The Dynamic Indicators of Basic Early Literacy Skills (DIBELS; Good & Kaminski, 2002) is a measure designed to assess three of the five big ideas of early literacy espoused in the National Reading Panel report (National Institute of Child Health and Human Development, 2000): Phonological Awareness, the Alphabetic Principle, and Fluency with Connected Text. Measures of Phonological Awareness include Initial Sound Fluency (ISF), which assesses a child's skill in identifying and producing the initial sound of a given word, and Phonemic Segmentation Fluency (PSF), which assesses a child's skill in producing the individual sounds within a given word. A measure of the Alphabetic Principle is Nonsense Word Fluency (NWF), which assesses a child's knowledge of letter-sound correspondences as well as the ability to blend letters together to form unfamiliar nonsense words (e.g., fik, lig). A measure of Fluency with Connected Text is Oral Reading Fluency (ORF), which assesses the number of correct words a child can read per minute in grade-level connected text.
The authors of the DIBELS claim that its subtests are reliable predictors of reading underachievement and thus may be used to identify students in need of intervention and to reliably determine student progress (Good, Simmons, & Kameenui, 2001). DIBELS is being used in thousands of schools across the nation, often to provide formative data to schools accountable for increasing student achievement on end-of-grade-level state reading achievement tests. In spite of its widespread use, some question its utility in assessing reading comprehension, the undisputed goal of reading (Good et al., 2001). Samuels (2006) has been particularly critical of fluency measures on the DIBELS, asserting that fluency involves decoding and comprehending texts simultaneously, whereas the DIBELS Oral Reading Fluency measure focuses on decoding speed and does not assess comprehension. This may have particularly negative implications for the utility of DIBELS with English Language Learners, as they may be able to decode text rapidly without comprehending the passage due to impediments in vocabulary and syntax. Justifying the use of the DIBELS with children learning English as their second language, Kaminski et al. (2006) reported that for English Language Learners who are learning to read in English, DIBELS are appropriate for assessing and monitoring progress in acquisition of early reading skills. Haager and Windmueller (2001) also assert that DIBELS have been used successfully with English Language Learners to predict reading underachievement, identify students in need of intervention, and determine student progress. In addition, Riedel (2007) found that DIBELS Oral Reading Fluency and comprehension were more strongly correlated in ELL students than in non-ELL students, but because of the small size of the ELL sample in his study, further investigation was needed. Schwarzer and Ferguson (2007) critically reviewed these limited research claims and concluded that they are broadly unfounded due to a lack of rigor in the research to support them.
No studies could be identified that sufficiently tested the reliability and validity of the DIBELS suite of assessments specifically with English Language Learners. While the challenges of achieving literacy in this population are well known, there have been few rigorous studies demonstrating the usefulness of these tests for accurately predicting which of these children will require more intensive intervention to achieve success.

The overall purpose of the present study is to determine the extent to which subtests of the DIBELS are effective in predicting student success on a summative, state criterion-referenced measure that exclusively measures reading comprehension in third grade, the CSAP (Colorado Student Assessment Program), and whether or not its effectiveness is consistent for native English speakers and English Language Learners.


Specifically, the purposes of this study were to:

- examine the predictive utility of the DIBELS tests in correctly identifying students in need of intensive intervention;
- identify the optimal cutoff score to correctly classify students who are in need of reading intervention; and
- compare the usefulness of the measurement system for students who are English Language Learners.

Method

Participants
Participants in the study were a cohort of 2,649 elementary school students in one of the western states in the United States. These students attended schools which received grant monies as part of a national literacy reform initiative targeting schools with the most challenged students in terms of low achievement and socioeconomic status. These students were assessed at the conclusion of the 2006 school year for reading achievement on the state assessment, and had received reading interventions for two or more years previous. Of the sample, 33.1% were Caucasian, 61.3% were identified as Hispanic, 2% were of other ethnicity, and 4.2% were of unknown ethnicity. Table 3 reveals that 66.1% were classified as non-English Language Learners, 29.7% as English Language Learners, and 4.2% had unknown English Language Learner status. Among Hispanic students, 6.9% were identified as having No English Proficiency, 50.7% were categorized as having Limited English Proficiency, and 42.7% evidenced Full English Proficiency. Seventy-five percent of sample students were eligible for free or reduced-price lunch, which is an indicator of economic disadvantage.
Table 1 reports the statewide 2006 CSAP results for context; of all students tested statewide, 49.31% were female and 51.07% male. Table 2 indicates that 75% of the study sample was eligible for free and reduced lunch, a proxy for socioeconomic status. The distribution of CSAP scores and classification of proficiency appear in Figure 1.
Measures
Criterion-related and predictive validity were tested for this report, using the state Student Assessment Program (CSAP) third-grade reading assessment as the criterion reference. Predictive utility was appraised and compared for English Language Learners and non-English Language Learners.

Table 1  2006 CSAP Grade 3 Reading Means, Standard Deviations and Percent of Total by Subgroup

Subgroup                   Total Tested   Mean SS   SS STD   % of Total Tested
Native American                     627    450.44    48.26         1.16%
Asian/Pacific Islanders            1981    477.63    54.81         3.65%
African American                   3206    443.91    54.05         5.91%
Hispanic                          14689    442.34    52.14        27.08%
White                             33942    480.83    49.48        62.58%
Female                            26747    475.53    53.78        49.31%
Male                              27698    460.35    52.66        51.07%
Not ELL                           46933    562.59    72.15        86.53%
ELL                                7304    499.44    83.18        13.47%
Total Tested                      54239    554.09    76.80


Table 2  Distribution of Demographics by Study Sample and Sub Sample

Demographics                      Total Sample   Percent of      Percent ELL   Percent Non-ELL
                                                 Total Sample    Students      Students
White                                      840          33.1%           1.4%             98.6%
Hispanic                                  1557          61.3%          48.7%             51.3%
African American                            92           3.6%           4.3%             95.7%
Asian/Pacific Islander                      24           0.9%          41.7%             58.3%
Native American                             26           1.0%          11.5%             88.5%
No Ethnicity Label                         110           4.2%          33.6%             66.4%
Male                                      1337          50.5%          30.9%             69.1%
Female                                    1312          49.5%          30.9%             68.7%
Free/Reduced Lunch Eligible               1903            75%          38.7%             61.3%
Not Free/Reduced Lunch Eligible            636            25%           8.0%             92.0%
Total Students                            2649                         29.7%             66.1%

Table 3  Distribution of Language Proficiency for Total Sample and for ELL Students

Language Proficiency          Total Sample   Percent of All Students   Percent of ELL Students
Not ELL                               1752                     66.1%
No English Proficiency                  54                      2.0%                      6.9%
Limited English Proficiency            399                     15.1%                     50.7%
Full English Proficiency               334                     12.6%                     42.7%
No ELL Label                           110                      4.2%
Total                                 2649

Figure 1  Distribution of CSAP scores and proficiency classifications (x-axis: CSAP scale score)


Cutoff scores were evaluated and identified to determine what score(s) maximize correct classification of students who need reading intervention and those who do not. Predictive validity refers to the extent to which a scale predicts scores on some criterion measure. The importance of predictive validity and utility is in generating a number that reflects the likelihood that students are accurately classified according to their risk of reading underachievement on a specific criterion (in this case, the CSAP). Specifically, the numerical coefficient reflects the likelihood that a child who scores within an at-risk range is actually at risk as measured on the criterion measure (i.e., positive predictive value), or the probability that a child who scores outside the at-risk range is actually not at risk as measured on the criterion measure (i.e., negative predictive value). Teachers must have confidence that a progress monitoring measure like the DIBELS is predictive of a student's performance in reading on an outcome measure. By establishing the extent to which this is characteristic of DIBELS, teachers can have the confidence to use DIBELS data to make instructional decisions that will positively impact student achievement.
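To make these classification quantities concrete, the following sketch computes sensitivity, specificity, and the positive and negative predictive values from a generic 2x2 screening table; the counts and the function name are illustrative placeholders, not study data.

```python
# Minimal sketch of the classification quantities defined above.
# The counts below are hypothetical placeholders, not study data.

def screening_stats(tp, fp, fn, tn):
    """tp: flagged at risk and truly below criterion; fp: flagged at
    risk but proficient; fn: flagged low risk but below criterion;
    tn: flagged low risk and proficient."""
    return {
        "sensitivity": tp / (tp + fn),  # truly at-risk children the screen catches
        "specificity": tn / (tn + fp),  # low-risk children correctly cleared
        "ppv": tp / (tp + fp),          # P(truly at risk | flagged at risk)
        "npv": tn / (tn + fn),          # P(truly not at risk | flagged low risk)
    }

print(screening_stats(tp=80, fp=20, fn=15, tn=185))
```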
The State Student Assessment Program
The CSAP is a criterion-referenced, standards-based assessment designed to provide stakeholders (i.e., students, educational personnel, parents, government officials, etc.) with a picture of student performance on the state Model Content Standards (MCS). The primary purpose of the assessment program is to determine the level at which Colorado students meet the state standards in the content areas that are assessed. Results are intended for use in improving curricula and instruction as well as increasing student learning.

The MCS were developed by educators and community members over a two-year period. The resulting standards represent a consensus of parents, educators, administrators, business people, and interested communities. The standards serve as guidelines that describe what students should know and be able to do at specific grade levels as measured by the MCS. As such, they measure attainment of the goal of proficiency in reading, writing, math, and science. The first series of CSAP assessments was administered from 1997 to 2001 in selected grades. The CSAP assessments in their current form have been administered in grades 3-10 since 2002.
The CSAP assessments were federally peer-approved in 2006 for use as the state's NCLB assessment system. The CSAP assessments result in scale scores that are based on students' item response patterns (Item Response Theory). Since 2002, the CSAP has been vertically scaled so student progress can be tracked over time. The CSAP has established internal reliability for the total test and for each content standard at each grade.

On the 2006 CSAP, total score reliability coefficients were all .86 or greater, measured as Cronbach's alpha. These reliability coefficients indicate that, overall, the state's 2006 assessment had strong internal consistency and that the tests produced relatively stable scores. The CSAP has been tested for content, construct, discriminant, and predictive validity; all are within acceptable levels. Detailed information about the results of reliability and validity testing is available in the CSAP technical reports at the Colorado Department of Education web site at http://www.cde.state.co.us/cdeassess/archives.html.
The CSAP third-grade reading assessment is the criterion reference for this study. The third-grade reading assessment measures only reading comprehension, one of the state's six reading and writing standards.

The total score reliability coefficient for the third-grade test was .89 as measured by Cronbach's alpha. This reliability coefficient indicates that the state's third grade reading assessment has strong internal consistency and that the test produces relatively stable scores. For 2006, the third grade assessment consisted of 32 multiple choice items and 8 constructed response items. The mean p-value for the multiple choice items was .70 and the mean p-value for the constructed response items was .51.
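For reference, the sketch below shows one standard computation of Cronbach's alpha from an examinee-by-item score matrix; the toy matrix is hypothetical and is not CSAP item data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an examinees x items score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Toy data: 5 examinees x 4 dichotomously scored items (hypothetical).
scores = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0],
                   [1, 0, 1, 1],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1]], dtype=float)
print(round(cronbach_alpha(scores), 2))  # 0.7 for this toy matrix
```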
Descriptive statistics for the third grade reading assessment are in Table 1. Table 2 contains descriptive statistics for the study sample, and Table 3 contains language proficiency information about the study sample.
The DIBELS Monitoring Assessments
The DIBELS assessments are asserted to be reliable predictors of reading underachievement that may be used to identify students in need of intervention and to reliably determine student progress, providing school-based data to inform instruction and to review school-level outcomes.

Table 4 lists the six tests used for benchmarks and/or progress monitoring. Good, Simmons et al. (2002) used evaluation of the predictive capacity of each test in the suite to recommend a process for monitoring the development of literacy skills.
Table 5 summarizes the reported reliability statistics for the DIBELS suite. The reliability and validity of the Oral Reading Fluency (ORF) tests have been the focus of most evaluations reported in the literature; Table 6 summarizes the reliability and validity tests that have been reported for the ORF tests of the DIBELS suite of assessments. A review of the DIBELS published in the Mental Measurements Yearbook (Shanahan, 2005) concludes that the measures evidence adequate or better psychometric properties, but lack sound documentation of predictive validity. While the DIBELS appears to be a valid indicator of reading ability, the capacity of the tests to correctly identify the children who need additional help is supported by limited research.

For evaluating student proficiency, the third grade CSAP scale scores are categorized as:

Proficiency Level        Scale Score Range
Advanced Proficiency     656 and above
Proficient               526 to 655
Partially Proficient     466 to 525
Unsatisfactory           465 and below
Data Collection and Analysis Procedures
DIBELS data were downloaded directly from the University of Oregon data repository using standard data analytic procedures. Data were retrieved for 2,649 students for the school years ending 2004, 2005, and 2006, for all three measurement periods (Fall, Winter, and Spring) and all DIBELS assessments. University of Oregon data have an identifier that is unique to the DIBELS measurement system, so these data were matched to CSAP data using birth date, school, last name, first name, and gender. It was assumed that matching on these demographic variables would minimize potential mismatching of the two data sets (i.e., DIBELS and CSAP). The files were merged and unique identifiers were deleted so that the final file was de-identified. Data were matched such that a student's first, second, and third grade scores could be matched to the third grade CSAP score, resulting in longitudinal study data. A total of 2,492 students had valid CSAP reading scale scores that were collected from the state Department of Education (DE) database; these were matched to 515 DIBELS scores from the students' first grade performance, 1,378 scores from second grade performance, and 2,134 scores from third grade performance on the tests.
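A minimal sketch of this matching step, assuming pandas and hypothetical file and column names (the repository's actual export schema is not described here), might look like the following.

```python
import pandas as pd

# Hypothetical exports; the real repository schema is not specified.
dibels = pd.read_csv("dibels_scores.csv")
csap = pd.read_csv("csap_scores.csv")

# Match records on the demographic fields named above.
keys = ["birth_date", "school", "last_name", "first_name", "gender"]
merged = dibels.merge(csap, on=keys, how="inner")

# Delete direct identifiers so the final analysis file is de-identified.
merged = merged.drop(columns=["birth_date", "last_name", "first_name"])
merged.to_csv("matched_deidentified.csv", index=False)
```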
Table 4  Assessments in the DIBELS Suite of Reading Tests

Test                                 Grades
Letter Naming Fluency (LNF)          Kindergarten through 1st
Initial Sound Fluency (ISF)          Kindergarten
Phoneme Segmentation Fluency (PSF)   Kindergarten through 1st
Nonsense Word Fluency (NWF)          Kindergarten through 2nd
Oral Reading Fluency (ORF)           1st through 3rd
Word Use Fluency (WUF)               Kindergarten through 3rd

Table 5  Documented Reliability Statistics for the DIBELS Tests

Study                  Reliability/Validity Measure                 Coefficient
Shanahan, T. (2005)    Cronbach's alpha                             .92
                       Test-retest                                  .92 to .97
                       Concurrent validity with
                       Woodcock-Johnson Reading                     .80
                       Predictive validity                          .66
McKnight (2001)        Inter-rater reliability                      high .80s
Elliott (2001)         Inter-rater reliability                      .82 to .94
                       Test-retest                                  .74 to .93
Hintze (2003)          Equivalent forms                             .64 to .91
                       Cronbach's alpha                             .49 to .69

Table 6  Results of Comparable Studies Relating DIBELS ORF Scores and State Assessments of Reading

Study                         Correlation of Spring 3rd Grade ORF with State Assessment
Stage & Jacobsen (2001)       .43 to .44
Buck & Torgesen (2003)        .62 to .78
Wilson (2005)                 .74
VanDerMeer (2005)             .61 to .65
Shaw & Shaw (2002)            .73 to .80
Barger (2003)                 .73
McGlinchey & Hixon (2004)     .49 to .81
Current Study                 .604 to .628

Descriptive and analytic statistics for all variables were generated for the total respondent group and for the language proficiency subgroups (i.e., English Language Learners and non-English Language Learners). Bivariate correlation analysis was run on the DIBELS/CSAP variables for the total data set and split by language proficiency. Cross-tabulations were developed to test the association between DIBELS categorizations (at risk, some risk, and low risk) and CSAP classification as adequate or inadequate proficiency. Chi-square statistics were generated to test for statistical significance of any associations. Cross-tabulated tables were used to evaluate the sensitivity, specificity, positive and negative predictive value, and overall accuracy of the DIBELS assessments. Critical measures and suggested cutoff values as identified by Good, Simmons et al. (2002) were evaluated for predictive utility by visualization of ROC (Receiver Operating Characteristic) curves (Swets, Dawes, & Monahan, 2000) and comparison of the area-under-the-curve (AUC) values.
Results
Students in the sample for the current study were by definition at risk, by virtue of attending schools that qualify for specific grant monies. As a result, the distribution of students in the sample is not typical of the Colorado population at large, as can be seen by comparing Tables 1 and 2. Project schools have a higher proportion of minority students, students who qualify for free and reduced lunches, and students who are English Language Learners (ELL).

The Pearson correlation coefficients for the relationship between DIBELS Oral Reading Fluency (ORF) scores for Fall, Winter, and Spring of the third grade and the CSAP reading scale score appear in Table 7. The correlations are reported for all students and by language proficiency classification. The relationships between the DIBELS tests and the CSAP reading scores are moderately strong to strong.

The tests are as highly correlated for students who are classified as ELL as for those who are not. The tests are highly correlated among the three measurement periods, and so it could be assumed that each possesses roughly the same amount of predictive utility.
Table 6 provides comparative data from the literature, which show the variability in correlation coefficients across typical studies. While part of the variability is accounted for by differences in state assessments, the magnitude of the relationship between DIBELS ORF scores and state assessments remains quite variable. For the current study, the correlation is moderately strong to strong, and indicates a linear relationship between scores on the DIBELS ORF in all three third grade time periods and the CSAP reading score.
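A sketch of this split correlation analysis is shown below, assuming a pandas data frame with hypothetical column names for the three ORF scores, the CSAP scale score, and ELL status; these are not the authors' actual variable names.

```python
import pandas as pd

# Hypothetical analysis file and column names (see the matching sketch
# in the Method section above).
df = pd.read_csv("matched_deidentified.csv")

cols = ["orf_fall_3rd", "orf_winter_3rd", "orf_spring_3rd", "csap_scale"]
print(df[cols].corr(method="pearson"))        # all students
print(df.groupby("ell_status")[cols].corr())  # split by language status
```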

Table 7  Correlation Coefficients for DIBELS Third Grade Assessments

                       DIBELS Fall              DIBELS Winter            DIBELS Spring
                   ALL   Non-ELL   ELL      ALL   Non-ELL   ELL      ALL   Non-ELL   ELL
DIBELS Winter    0.912     0.906  0.909
DIBELS Spring    0.877     0.873  0.879   0.912     0.908  0.912
CSAP Spring      0.582     0.558  0.587   0.607     0.588  0.602   0.623     0.603  0.628

Table 8  Measures of Correct Classification of State Assessment Reading Proficiency by DIBELS Spring 3rd Grade Oral Reading Fluency (ORF)

Study                        % At Risk Correctly Identified   % Low Risk Correctly Identified
Hintze et al. (2003)                         .58 to .86                       .52 to .88
Stage & Jacobsen (2001)                             .41                              .90
Buck & Torgesen (2003)                              .81                              .91
Wilson (2005)                                       .93                              .82
VanDerMeer (2005)                                   .96                              .72
McGlinchey & Hixon (2004)                           .77                              .72
This Study                                   .51 to .64                       .92 to .93



The predictive utility of the DIBELS ORF
scores is depicted in Figure 2 as a scattergram
of the relationship between DIBELS classifi
cations of risk and CS AP classifications of pro
ficiency. Students in the upper right and lower
left comers are accurately classified as to their
risk of not achieving proficiency on the CSAP
reading test. These data are consistent with pre
vious studies that demonstrate predictive utili
ty for the at risk and low risk categories,
but the some risk category is equally likely
to predict proficiency or a lack of proficiency.
The lower cutoff of the ORF score would have
to be substantially lowered to achieve signifi
cant changes in predictive utility.
Predictive utility has been measured in previous studies by calculating the proportion of low risk and at risk students who are correctly classified as proficient or lacking proficiency. Table 8 reports these figures for comparable studies of the spring third grade ORF and state assessments of reading, as well as the results of this study. In the current study's sample, between 51% and 64% of those at risk and 92% to 93% of low risk students were correctly classified by third grade DIBELS tests. Table 9 demonstrates the classification accuracy of all three (i.e., fall, winter, spring) third grade ORF DIBELS tests for the total student group. Sensitivity ranged from .80 to .87, with strongest sensitivity in the fall and winter. Specificity, on the other hand, was strongest in the spring measurement period, and ranged from a low of .64 to a high of .83. Sensitivity was calculated as recommended by Good, Gruba, and Kaminski (2001) by eliminating from analysis students who fell in the midrange (i.e., students classified at some risk).
Calculations of sensitivity, specificity, percent of correct classification, and overall accuracy appear for all tests in Table 10, and provide the most detailed information about the usefulness of individual tests in specific populations. Sensitivity ranged from a low of .53 to a high of .87 for the total group on the DIBELS tests most highly related to the outcome measure (i.e., the CSAP). Specificity ranged from a low of .64 to a high of .92. The best balance of sensitivity and specificity, indicating a low rate of both false positives and false negatives, is achieved with the Winter and Spring second and third grade ORF tests. Interestingly, the Phoneme Segmentation Fluency score at the beginning of first grade also had a good balance of these two diagnostic characteristics, particularly considering the length of time that passed between the first grade DIBELS testing period and the third grade CSAP.
Figure 2  Scattergram of the relationship between DIBELS classifications of risk and CSAP classifications of proficiency (x-axis: ORF 3rd End score; y-axis: CSAP scale score)

Table 10 also depicts the percent of at risk and low risk students correctly classified, and the overall accuracy of the tests. The best balance of correct classification of both at risk and low risk is achieved with the Winter Nonsense Word Fluency (NWF) and Spring Oral Reading Fluency (ORF) in the first grade. The Spring first grade ORF also provided the second best overall accuracy, lagging behind the Spring third grade ORF by only .02. This is an important finding, given that identification of risk in the first grade provides much more time for intensive intervention than does identification in the third grade.
Table 9  Cross-tabulations of CSAP and DIBELS Categories by Third Grade Measurement Period

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Spring
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 298                  171                73   542 (26%)
Proficient and Above              168                  554               836  1558 (74%)
Total                       466 (22%)            725 (35%)         909 (43%)        2100
Sensitivity: .80   Specificity: .83

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Winter
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 369                  125                57   551 (26%)
Proficient and Above              316                  521               736  1572 (74%)
Total                       685 (32%)            646 (30%)         793 (37%)        2124
Sensitivity: .87   Specificity: .70

ALL STUDENTS: Oral Reading Fluency Classification, 3rd Grade Fall
CSAP Performance        At Risk (<80)   Some Risk (80-109)   Low Risk (>110)      Total
Below Proficiency                 385                  118                57   560 (26%)
Proficient and Above              366                  549               659  1574 (74%)
Total                       751 (35%)            667 (31%)         716 (34%)        2134
Sensitivity: .87   Specificity: .64

Note: Sensitivity and specificity calculations did not include students whose ORF scores were in the midrange (i.e., students at some risk).
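As a check, the reported Spring figures can be recomputed directly from the cross-tabulation above, dropping the midrange column as the table note specifies; a short sketch follows.

```python
# Spring 3rd grade counts from Table 9, excluding the "some risk" column.
tp, fn = 298, 73    # below proficiency: at risk, low risk
fp, tn = 168, 836   # proficient and above: at risk, low risk

sensitivity = tp / (tp + fn)  # 298/371, about .80 as reported
specificity = tn / (tn + fp)  # 836/1004, about .83 as reported
print(round(sensitivity, 2), round(specificity, 2))
```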



Table 10  Predictive Utility Statistics for All Grades, and Comparing ELL to Non-ELL

Test                     Sensitivity   Specificity   % At Risk    % Low Risk   Overall
                                                     correctly    correctly    Accuracy
                                                     classified   classified

ALL STUDENTS
ORF 3rd End                     0.80          0.83         0.64         0.92       0.82
ORF 3rd Middle                  0.87          0.70         0.54         0.93       0.75
ORF 3rd Beg                     0.87          0.64         0.51         0.92       0.71
ORF 2nd End                     0.77          0.77         0.58         0.89       0.77
ORF 2nd Middle                  0.76          0.78         0.58         0.89       0.77
ORF 2nd Beg                     0.82          0.69         0.54         0.90       0.73
ORF 1st End                     0.54          0.92         0.77         0.81       0.80
NWF 1st Mid                     0.53          0.90         0.71         0.81       0.79
PSF 1st Beginning               0.71          0.64         0.54         0.79       0.67

Not ELL / ELL Comparison
ORF 3rd End: Not ELL            0.76          0.86         0.59         0.93       0.84
ORF 3rd End: ELL                0.85          0.77         0.70         0.89       0.80
ORF 3rd Mid: Not ELL            0.83          0.74         0.49         0.93       0.76
ORF 3rd Mid: ELL                0.91          0.58         0.60         0.91       0.71
ORF 3rd Beg: Not ELL            0.83          0.70         0.47         0.93       0.73
ORF 3rd Beg: ELL                0.92          0.50         0.57         0.89       0.67
ORF 2nd End: Not ELL            0.75          0.81         0.57         0.91       0.79
ORF 2nd End: ELL                0.81          0.64         0.61         0.83       0.71
ORF 2nd Mid: Not ELL            0.72          0.83         0.59         0.90       0.80
ORF 2nd Mid: ELL                0.81          0.64         0.58         0.84       0.70
ORF 2nd Beg: Not ELL            0.80          0.75         0.53         0.92       0.76
ORF 2nd Beg: ELL                0.85          0.51         0.57         0.82       0.66
ORF 1st End: Not ELL            0.47          0.94         0.77         0.82       0.81
ORF 1st End: ELL                0.64          0.86         0.76         0.77       0.77
NWF 1st Mid: Not ELL            0.43          0.93         0.70         0.81       0.79
NWF 1st Mid: ELL                0.74          0.82         0.71         0.83       0.79
PSF 1st Beg: Not ELL            0.63          0.72         0.55         0.78       0.69
PSF 1st Beg: ELL                0.86          0.44         0.53         0.81       0.61


Predictive utility statistics in Table 10 further compare ELL students with non-ELL students. The sensitivity of the tests for ELL students ranges from .64 to .92, although most exceeded 80% sensitivity. For all tests, DIBELS demonstrated better sensitivity for ELL students than for non-ELL students. The reverse was true for specificity; the DIBELS demonstrated better specificity across the board for non-ELL students (ranging from .70 to .94) than for ELL students (.44 to .86). The percent of at risk students correctly classified was also generally greater for ELL students (.57 to .76) than non-ELL students (.47 to .77). The percent of low risk students correctly classified was greater for non-ELL students (.78 to .93) than for ELL students (.77 to .91). Overall accuracy of the tests was roughly equal between ELL and non-ELL students, although slightly higher for non-ELL. Sensitivity and specificity are most balanced in the third grade, less so in the second grade, and least in the first. On the other hand, while third grade scores have the strongest linear relationship with the CSAP score, first grade scores have a better balance of predicting both proficiency and lack of proficiency accurately.
ROC curves were constructed for each of the recommended sequences of tests, and the area-under-the-curve (AUC) was calculated. A higher area under the curve represents a balance between sensitivity and specificity that maximizes overall accuracy of a predictive test. The AUC values ranged from .804 to .816 for the third grade tests of oral fluency, .774 to .804 for second grade tests, and .731 to .794 for first grade tests. While predictive utility declines as tests are more remote in time from the outcome, all are within an acceptable range for predictive utility.
There were no identifiable breaks or elbows in the ROC curves, which indicates that a revised cutoff score would not achieve dramatically better predictive utility. Ideally, tests will identify students at risk earlier than the third grade. The correct classification of low risk and at risk students by the recommended test sequence is depicted in Figure 3. Across all tests, the tests are better at predicting success on the CSAP reading test than failure. Two of the most accurate predictors are first grade tests: the winter Nonsense Word Fluency (NWF) and the spring ORF. These two tests have a balanced capacity to predict success and to predict risk of failure. The winter and spring second grade ORF tests are better at identifying risk of failure than the first two third grade tests; otherwise, second and third grade tests are roughly equal in predicting proficiency. A similar pattern was demonstrated for ELL and non-ELL students; in general, DIBELS is better at predicting children who are at low risk than those at risk across the board. However, DIBELS correctly classifies children at risk better for ELL than non-ELL students in the third grade, better in two of the three time periods in second grade, and roughly equally in the first grade.

Discussion and Conclusions


Correlations among the third grade ORF tests (i.e., fall, winter, spring) are strong. Correlations between the ORF tests and the CSAP measure are consistent with comparable studies and are moderately strong to strong. There remains a good deal of variability in the correlation between DIBELS and CSAP across studies. The cutoff score of 90 for the DIBELS ORF third grade spring would have to be lowered substantially to improve the predictive utility of the test. Continuing with the cutoff scores recommended by Good and Kaminski (2002) is a reasonable approach to identifying at-risk students.
The Predictive Utility of Dibels Reading Assessment / 99

Figure 3  Percent at risk and low risk correctly classified by recommended test sequence

Measures of oral reading fluency are as effective for students who are classified as ELL as for children who are not. Sensitivity is higher across the board for ELL students, while specificity is lower. Conversely, the tests are better at predicting at risk students when they are ELL and better at predicting low risk students when they are not. For both ELL and non-ELL students, DIBELS is better at predicting success than failure. This does allow for the consideration of early interventions to improve literacy achievement. Two of the tests with the strongest predictive capacity are administered in the first grade, which allows for early intervention.

Across all tests, DIBELS subtests are better at predicting success on the CSAP reading test than failure. In the first grade, the winter NWF and spring ORF have a balanced capacity to predict success and to predict risk of failure. A similar pattern was demonstrated for ELL and non-ELL students; that is, in general, DIBELS is better at predicting children who are at low risk than those at risk. However, DIBELS correctly classifies children at risk better for ELL than non-ELL students in the third grade, better in two of the three time periods in second grade, and approximately equally in first grade. This information could assist educators in justifying implementation of early intervention for at risk students, including ELL students.

The current study adds to the limited body of research strengthening the conclusion that DIBELS is effective in identifying English Language Learners who are at risk for underachieving in reading.



The findings of this study suggest that the DIBELS can be used to classify English Language Learners who are at risk for reading failure. Given the equivocal results of previous research on this question, this is a significant finding in light of the substantial sample size and the high proportion of English Language Learners in this study. As Riedel (2007) concluded in his research, we cannot draw conclusions about the nature of interventions needed to address the reading underachievement of English Language Learners, but these data provide evidence that the DIBELS is useful in identifying ELL students at risk for underachievement in reading comprehension.

References
Barger, J. (2003). Comparing the DIBELS oral reading fluency indicator and the North Carolina end-of-grade reading assessment (Technical Report). Asheville, NC: Carolina Teaching Academy.
Buck, J., & Torgesen, J. (2003). The relationship between performance on a measure of oral reading fluency and performance on the Florida Comprehensive Assessment Test (FCRR Technical Report #1). Florida Center for Reading Research.
Elliott, J., Lee, S., & Tollefson, N. (2001). A reliability and validity study of the Dynamic Indicators of Basic Early Literacy Skills - Modified. School Psychology Review, 30(1), 33-49.
Good, R. H., Gruba, J., & Kaminski, R. (2001). Best practices in using Dynamic Indicators of Basic Early Literacy Skills (DIBELS) in an outcomes-driven model. In A. Thomas & J. Grimes (Eds.), Best practices in school psychology IV (pp. 679-700). Washington, DC: National Association of School Psychologists.
Good, R. H., Simmons, D. C., & Kameenui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high stakes outcomes. Scientific Studies of Reading, 5, 257-288.
Good, R. H., Kaminski, R. A., Smith, S., Simmons, D., Kameenui, E., & Wallin, J. (in press). Reviewing outcomes: Using DIBELS to evaluate a school's core curriculum and system of additional intervention in kindergarten. In S. R. Vaughn & K. L. Briggs (Eds.), Reading in the classroom: Systems for observing teaching and learning. Baltimore: Paul H. Brookes.
Good, R. H., Simmons, D., Kameenui, E., Kaminski, R. A., & Wallin, J. (2002). Summary of decision rules for intensive, strategic, and benchmark instructional recommendations in kindergarten through third grade (Technical Report No. 11). Eugene, OR: University of Oregon.
Haager, D., & Windmueller, M. P. (2001). Early reading intervention for English Language Learners at-risk for learning disabilities: Student and teacher outcomes in an urban school. Learning Disability Quarterly, 24(Fall), 235-250.
Hintze, J., Ryan, A., & Stoner, G. (2003). Concurrent validity and diagnostic accuracy of the Dynamic Indicators of Basic Early Literacy Skills (DIBELS) and the Comprehensive Test of Phonological Processing. School Psychology Review, 32(4), 541-556.
Kaminski, R., Good, R., Baker, D., Cummings, K., Dufour-Martel, C., Fleming, K., et al. (2006). Position paper on use of DIBELS for diverse learners. Dynamic Measurement Group. Retrieved December 3, 2008, from http://www.dibels.org/publications.html
McKnight, C., Lee, S., & Schowengerdt (2001). Effects of specific strategy training on phonemic awareness and reading aloud with preschoolers: A comparison study. Retrieved from ERIC, April 2001, pp. 1-55.
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel (NIH Publication No. 00-4769). Washington, DC: U.S. Government Printing Office.
Riedel, B. (2007). The relation between DIBELS, reading comprehension, and vocabulary in urban first-grade students. Reading Research Quarterly, 42(4), 546-567.
Samuels, S. J. (2006, May). Introduction to reading fluency. Paper presented at the annual meeting of the International Reading Association, Chicago.
Schwarzer, D., & Ferguson, D. (2007). DIBELS and English Language Learners in the United States: An analysis of the scientifically based research behind the test. TESOL Quarterly Newsletter, Bilingual Education Interest Section (in press).
Shanahan, T. (2005). Review of DIBELS: Dynamic Indicators of Basic Early Literacy Skills. In The Mental Measurements Yearbook (16th ed., pp. 310-312).
Shaw, R., & Shaw, D. (2002). DIBELS oral reading fluency-based indicators of third grade reading skills for Colorado State Assessment Program (CSAP) (Technical Report). Eugene, OR: University of Oregon.
Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Science in the Public Interest, 1, 1-26.
Vander Meer, C. D., Lentz, F. E., & Stollar, S. (2005). The relationship between oral reading fluency and Ohio proficiency testing in reading (Technical Report). Eugene, OR: University of Oregon.

