
Criteria for the Evaluation of Reading Assessment Tools
Written by: Alain Desrochers, Ph.D., Faculty of Social Sciences, University of Ottawa
and Victor Glickman, Ph.D., Faculty of Education, University of British Columbia

Introduction

The primary purpose of educational programs is to produce changes in children’s level of knowledge and skills. One of the most important skills to be acquired and perfected in school is reading, as other learning gains depend heavily on it. The outcome of reading instruction is traceable and measurable. The purpose of the present summary is to help educators understand the variety of outcomes that can be measured and the different criteria that can be used to evaluate reading assessment tools.

Key Research Questions

1. Why are reading assessment tools used?
2. Who is assessed with reading assessment tools?
3. What needs to be considered before administering a reading assessment test?
4. What do the reading scores mean?
5. How will an educator know if the scores from a reading test are accurate?
6. How will an educator know if a reading test is useful?

Recent Research Results

Reading assessment is typically carried out to guide changes (Gaudreau, 2001), whether in individual interventions, instructional programs, or curricula. The result of an
individualized reading assessment may indicate that some children are ‘at risk’ of
developing reading skills that are significantly below the level of their peers, and that
some form of intervention is warranted: remediation, individualized instruction, or
placement into a special program. When the target of change is the instructional
program or curriculum, the assessment compares children as a group to the reading
goals that were initially set out in the curriculum; reading assessment results may then
indicate that the reading program needs to be upgraded.

1. Why are reading assessment tools used?


Reading assessment is typically carried out to attain one of four distinct goals: a)
screening, b) progress monitoring, c) diagnostic assessment, or d) program evaluation.
The characteristics of the tools needed to achieve these goals may vary.

Screening. An important function of a reading assessment is to identify children who are ‘at risk’ for reading failure, and to provide teachers with information on children’s degree of preparation for grade-level reading instruction and their need for extra instruction. Future reading performance can be predicted by assessing early oral language skills
(e.g., phonological awareness, morphological awareness, vocabulary size, or object
naming speed) or basic written language skills (e.g., knowledge of letter names and
sounds, orthographic processing; for reviews, see Blachman, 2000; Desrochers,
Cormier, & Thompson, 2005; Kirby, Desrochers, Roth, & Lai, 2008; Schatschneider,
Fletcher, Francis, Carlson, & Foorman, 2004). Early identification provides a basis for implementing preventive intervention programs and for dealing with reading difficulties before they lead to failure (Vaughn, Wanzek, Woodruff, & Linan-Thompson, 2007).
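
To make the screening logic concrete, here is a minimal Python sketch (not drawn from any published tool): it converts a few early-skill scores to z-scores against hypothetical norms, averages them into a composite, and flags children who fall below an assumed cutoff. The measure names, norm values, and cutoff are illustrative assumptions; operational screeners publish their own validated norms and decision rules.

```python
from statistics import NormalDist

# Hypothetical norms (mean, SD) for three early predictors; all values are illustrative assumptions.
NORMS = {
    "phonological_awareness": (20.0, 5.0),
    "letter_sound_knowledge": (15.0, 4.0),
    "vocabulary": (35.0, 7.0),
}
AT_RISK_CUTOFF_Z = -1.0  # assumed cutoff: one standard deviation below the norm mean

def composite_z(raw_scores):
    """Average the z-scores of the early predictors into a single screening composite."""
    zs = [(raw_scores[m] - mean) / sd for m, (mean, sd) in NORMS.items()]
    return sum(zs) / len(zs)

def flag_at_risk(raw_scores):
    """Flag a child for closer monitoring or extra instruction if the composite is below the cutoff."""
    return composite_z(raw_scores) < AT_RISK_CUTOFF_Z

if __name__ == "__main__":
    child = {"phonological_awareness": 13, "letter_sound_knowledge": 10, "vocabulary": 27}
    z = composite_z(child)
    print(f"composite z = {z:.2f}, percentile = {NormalDist().cdf(z) * 100:.1f}")
    print("flag for follow-up" if flag_at_risk(child) else "on track")
```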

Progress monitoring. Several models of reading intervention (e.g., the Three-Tier Model; Fuchs & Fuchs, 2007) require that children’s gains in reading ability be monitored to ensure that continuous progress is made throughout the school year. The intent of this type of assessment is to identify the children who are not benefiting as expected from regular reading instruction or who are in need of remedial instruction.
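
One way to operationalize ‘continuous progress’ is to fit a trend line to scores collected repeatedly during the year and compare its slope to an expected rate of gain. The Python sketch below does this with ordinary least squares; the weekly scores and the expected-gain benchmark are invented for illustration, and real monitoring systems supply their own benchmarks.

```python
def least_squares_slope(weeks, scores):
    """Estimate the average gain per week from repeated measurements (ordinary least squares)."""
    n = len(weeks)
    mean_x = sum(weeks) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, scores))
    den = sum((x - mean_x) ** 2 for x in weeks)
    return num / den

# Hypothetical weekly words-correct-per-minute scores for one child (assumed data).
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [22, 24, 23, 26, 27, 27, 29, 30]

EXPECTED_GAIN_PER_WEEK = 1.0  # assumed benchmark; real benchmarks come from the monitoring tool

slope = least_squares_slope(weeks, scores)
print(f"observed gain: {slope:.2f} words/min per week")
if slope < EXPECTED_GAIN_PER_WEEK:
    print("Progress below expectation -> consider adjusting instruction.")
else:
    print("Progress on track.")
```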

Diagnostic assessment. Children may face difficulty in learning to read for a variety of
reasons. This diversity is due to the complexity of reading acquisition, which is based on
a large set of elementary skills (for detailed analyses, see Coltheart, 2005; Seymour,
1986; Sprenger-Charolles, Colé, & Serniclaes, 2006). For instance, efficient reading
requires a normal ability to discriminate and recognize visual patterns, process speech
sounds, convert graphemes into speech sounds, recognize and read out whole words,
and access meanings from printed words. Diagnostic test batteries are designed to
assess the strengths and weaknesses of readers on these elementary skills and identify
the components of reading that an intervention should target (for examples, see Reid,
Hresko, & Hammill, 2001; Wagner, Torgesen, & Rashotte, 1999).

Program evaluation. The focus of reading outcome assessment need not be on the
child; it can be on the reading instruction program itself. The assessment is then
intended to evaluate the merits and weaknesses of a curriculum, an instructional
program, the consequences of an educational reform, or the success of a program
implementation. Outcome results can serve to modify the orientation or improve specific
components of the program or its implementation. This type of assessment is typically
based on the reading curriculum that pertains to a particular population of school-age
students.

2. Who is assessed with reading assessment tools?


All reading assessment tools are designed for a particular age group or population.
When an assessment tool is being developed, it is tested on a sample of respondents;
this provides a set of benchmarks, which define the typical reading development for that
specific population. The test users’ guide typically provides the demographic
characteristics of the population the assessment tool was designed for: age group,
gender, socioeconomic level, parents’ education, ethnicity/race, or mother tongue. This
information is critical as it allows examiners to determine the degree of similarity
between the characteristics of a learner and those of the reference group on which the
test norms are based. These norms provide a frame of reference for determining how
different a child may be from children whose reading skills are developing normally for
their age or grade level. Most standardized tests provide procedures for assessing the
severity of a reading disability. Canadian norms, however, are often lacking in currently
available reading assessment tools.

3. What needs to be considered before administering a reading assessment test?


Reading assessment tests differ on many levels, over and above the population of
readers for which they were designed. For instance, the duration of a complete reading
assessment may vary and the assessment may or may not be divided into multiple
testing sessions. An assessment tool may have been designed for individual or group
testing. If it is completely standardized, examiners can expect to be provided with
explicit instructions for themselves and for the children (e.g., what to say, how to
present the test items, how to record and score children’s responses). The
administration of all reading tests requires some level of training. The amount of training needed may be extensive, and this requirement should be taken into account in budgeting. Practical considerations, such as the time it takes to administer the test and the characteristics of the physical environment in which it is administered, can also affect the accuracy and reliability of the assessment.

4. What do the reading scores mean?


The scoring of children’s responses may be simple and straightforward (e.g., counting
the number of correct responses) or complicated (e.g., making correct responses
conditional upon speed of responding). In all cases, the users’ guide is expected to
provide a clear description of the scoring procedure so as to ensure that all examiners
score responses in the exact same way.

An assessment tool may be designed to provide quantitative information (e.g., how many words a child read orally without mispronunciations) or qualitative information
(e.g., what types of errors a child makes) on reading performance. What is required
from children may also vary considerably. For example, they may be asked to read
words or sentences aloud or silently, process them for meaning or for making a
judgment (e.g., on their spelling or grammaticality), or select correct responses from
multiple choices. Test requirements are generally determined by what they are intended
to measure.

In order to interpret test scores, three elements are required: a) a normative framework
to determine if individual scores are at, above or below what is expected from an
average child, b) a framework that can help link specific performance scores with
specific cognitive skills (e.g., phonological decoding), and c) a link to provincial
curriculum and learning outcomes.

A normative frame of reference helps us understand what a typical level of performance on a test is expected to be. It is established from performance data collected from a sample of children who share particular characteristics. A good
assessment tool will provide ‘decision rules’ in order to determine if a child’s score is
significantly below average. For example, the examiner’s manual for the
Comprehensive Test of Phonological Processing (CTOPP; Wagner et al., 1999)
includes tables of ‘difference scores’ for determining small and large discrepancies in a
child’s score, compared to an average level of performance. These tables also make it
possible to establish the ‘reading age’ or the ‘reading grade’ of a child; these allow the
examiner to identify weaknesses that may be resolved with a moderate intervention
program or that can only be addressed with an intensive remediation program. This is
useful because it helps educators decide on the resources they need to allocate to
corrective intervention.
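
As a concrete illustration of such decision rules, the Python sketch below converts a raw subtest score into a standard score and percentile against hypothetical norms and applies assumed cut-off bands. The norm values and bands are illustrative assumptions, not those of the CTOPP or any other published test.

```python
from statistics import NormalDist

# Hypothetical norms for one subtest within a given age band (assumed values).
NORM_MEAN, NORM_SD = 50.0, 10.0

def standard_score(raw):
    """Express the raw score on a standard scale (mean 100, SD 15), a common convention."""
    z = (raw - NORM_MEAN) / NORM_SD
    return 100 + 15 * z

def interpret(raw):
    """Apply assumed decision bands to the standard score."""
    ss = standard_score(raw)
    pct = NormalDist(100, 15).cdf(ss) * 100
    if ss < 70:
        band = "large discrepancy -> consider intensive remediation"
    elif ss < 85:
        band = "moderate discrepancy -> consider targeted intervention"
    else:
        band = "within the average range or above"
    return f"standard score {ss:.0f} (percentile {pct:.1f}): {band}"

print(interpret(34))  # a raw score well below the hypothetical norm mean
print(interpret(52))  # a raw score close to the hypothetical norm mean
```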

Another consideration is how reading assessments map onto provincial language curriculum and achievement charts. Each jurisdiction in Canada sets out principles
underlying its reading and language curriculum. In many cases the expected level of
reading ability is explicitly specified for each grade level.

5. How will an educator know if the scores from a reading test are accurate?

No measure of human ability is absolutely perfect; all measures entail a margin of error.
To maximize accuracy most measures of reading ability involve multiple items (e.g.,
words, sentences, passages of text). Several indices may be computed and examined
to assess the level of accuracy of a reading test: internal consistency, temporal stability,
and measurement error (for a detailed discussion, see Bertrand & Blais, 2004; Kline,
2005; Laveault & Grégoire, 2002; Sax, 1997).

Internal consistency refers to the inter-relationships among the items that comprise the
test. Responses to these items are expected to be determined by a common set of
abilities and the extent to which they are influenced by the same factors can be
measured. The most common index of internal consistency is the Cronbach Alpha
coefficient. If a particular reading skill (e.g., oral reading of words) is measured over two
consecutive days on the same children, we would expect these two measures to be
identical if no learning has taken place and if there is no measurement error. An
estimate of temporal stability (also called test-retest reliability) can be obtained by
calculating the correlation coefficient between the scores observed over two occasions
separated in time, on the same test, and from the same individuals. Because internal
consistency and temporal stability are never perfect, all test scores are expected to
‘wobble’ around their true value. The estimate of this ‘wobble’ is called the standard
error of measurement, and it can be used to define a confidence interval around a reading score. This interval corresponds to the range within which the true score has a 95% chance of falling.
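
For readers who want to see these indices in concrete form, the Python sketch below computes Cronbach’s alpha from item-level scores, a test-retest correlation, the standard error of measurement, and a 95% confidence interval around an observed score. All of the data are invented for illustration; operational reliability figures should come from a test’s technical manual.

```python
from math import sqrt
from statistics import variance, stdev

def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-item score lists (one list per item, same children in each)."""
    k = len(item_scores)
    totals = [sum(child) for child in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

def pearson(x, y):
    """Pearson correlation, used here as a test-retest (temporal stability) estimate."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Invented data: 4 items scored 0-2 for 6 children, plus total scores on two testing occasions.
items = [
    [2, 1, 2, 0, 1, 2],
    [1, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 1, 2],
    [1, 1, 2, 0, 0, 2],
]
time1 = [45, 52, 38, 60, 41, 55]
time2 = [47, 50, 40, 58, 44, 53]

alpha = cronbach_alpha(items)
retest = pearson(time1, time2)
sem = stdev(time1) * sqrt(1 - retest)  # standard error of measurement
score = 45
print(f"alpha = {alpha:.2f}, test-retest r = {retest:.2f}, SEM = {sem:.1f}")
print(f"95% confidence interval for a score of {score}: "
      f"[{score - 1.96 * sem:.1f}, {score + 1.96 * sem:.1f}]")
```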

6. How will an educator know if a reading test is useful?


The usefulness of a reading test is closely related to its validity. In assessing the validity
of a reading test we are addressing the extent to which differences in reading scores
are actually due to differences in reading ability and the extent to which they can serve
as a basis for making sound decisions (e.g., recommending a remedial intervention).
Several criteria can be considered for assessing the usefulness of a reading test:
content, associations among reading-related skills, consequences on decision making,
sensitivity to individual differences, and cost-effectiveness (for a detailed discussion,
see American Educational Research Association, 1999).

Content. In many cases, content-related validity can be easily established. If the test is
intended to measure children’s ability to convert letters into speech sounds (e.g., as in
an oral reading test), a simple analysis will confirm whether the items actually are letters or words and whether the instructions require the children to sound them out and nothing more. Other domains of reading ability may be more difficult to judge
from content analysis. For instance, if the test is intended to measure reading
comprehension, it may be informative to know the extent to which it is also measuring
vocabulary or deductive reasoning. Content analysis, even by experts, is sometimes
insufficient to assess content-related validity.

Association among reading-related skills. Many skills that are relevant to reading are
strongly correlated with one another. For instance, phonological awareness is strongly
associated with reading ability (for a review, see Kirby, Desrochers, Roth, & Lai, 2008).
We would then expect a good measure of phonological awareness to be significantly
correlated with specific aspects of reading such as oral decoding. This relationship is
observed in the correlations among concurrent measures. It is also observed when
measures of phonological awareness are used to predict the level of reading
performance achieved several weeks or months later. This form of criterion-related
evidence is particularly useful in the development of tests for screening children at risk
for reading failure.

Basis for decision making. Reading assessment is typically intended to guide one’s
decisions or actions. A common decision consists of determining if a child needs
remedial intervention and, if so, which components of reading should be targeted for
remedial instruction. These decisions depend, in part, on the test’s capacity to gauge
the severity of the child’s reading difficulties and to guide the ensuing intervention.
Further evidence of validity can be gained by assessing the goodness of the match
between the child’s reading profile, as revealed by the assessment tool, and the
recommended reading intervention based on the assessment.

Sensitivity to individual differences. All measures of reading ability are expected to be sensitive to individual differences among children. However, in practice, a hard item is
not always better than an easier item at differentiating good readers from poor readers.
Various indicators can be computed for estimating the sensitivity of test items to
individual differences (for an overview, see Kline, 2005, chapter 6). A demonstration of
sensitivity to individual differences is typically provided in the test users’ guide or
technical manual.
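
One widely used indicator of an item’s sensitivity to individual differences is its discrimination index, i.e., the correlation between performance on the item and performance on the rest of the test. The Python sketch below computes item difficulty (proportion correct) and a corrected item-total correlation from invented right/wrong data; it illustrates only one of the indicators discussed in Kline (2005).

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Invented right/wrong (1/0) responses: rows = children, columns = items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

n_items = len(responses[0])
totals = [sum(child) for child in responses]

for j in range(n_items):
    item = [child[j] for child in responses]
    difficulty = sum(item) / len(item)            # proportion of correct answers
    rest = [t - i for t, i in zip(totals, item)]  # total score excluding this item
    discrimination = pearson(item, rest)          # corrected item-total correlation
    print(f"item {j + 1}: difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")
```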

Cost-effectiveness. A well-designed tool comes in a solid briefcase or box; its content is printed on high-quality paper with a durable binding, and it includes a user’s guide and an easy-to-use test booklet. The response sheets should make recording, scoring, and
interpreting examinees’ responses efficient and accurate, and all documents should be
legible with clear and interpretable graphics. The assessment tool may also be usable in special circumstances (e.g., testing a child with limited mobility or a child who is hospitalized and bedridden). The assessment tool that is selected should
match the assessment needs. Also, the purchase cost of reading assessment tools is
typically high. This is due, in part, to the cost of background research, material design,
packaging and marketing. Examiners’ training and the use of the reading assessment
tool may entail additional costs (e.g., the purchase of response sheets, computer-
assisted scoring). Purchasing a reading assessment tool thus involves a judgment on
value given one’s purpose.

Future Directions

We now discuss three research directions that may be considered in the future: a)
providing educators with a complete toolbox for the assessment of reading skills, b)
developing norms that are relevant to all Canadian children, and c) linking reading
assessment to reading intervention.

A complete toolbox for reading assessment. Presently, no single assessment tool can
measure all reading-related skills that are required to chart a profile of children’s
strengths and weaknesses. This goal can only be achieved by gathering information
from different assessment tools. Educators could benefit from some guidance on how to
select a complete set of assessment tools for their purposes: screening children at risk
of reading failure, reading progress monitoring, diagnostic assessment, or reading
program evaluation. A rigorous matching procedure between current needs and current
resources would permit us to determine what is presently lacking in Canada to build a
complete toolbox and how our test-development efforts should be invested.

Reading performance norms that are relevant to Canadian children. Most reading
assessment tools currently in use in Canada were developed in the United States or
United Kingdom (for English) or Europe (for French). This means that the performance
norms that are currently available for these tests were developed with populations of
children in countries other than Canada. Since the level of performance on reading tests
depends largely on reading curricula and programs, which are decided by provincial
ministries of education in Canada, some discrepancy may be present between the
average Canadian reader and the average reader represented in the norms established
in other countries. There may also be differences among average readers in different
Canadian provinces or regions. Should a common normative frame of reference be
developed for the whole of Canada or for each Canadian province or region? How
should Canadian linguistic diversity be addressed in developing these norms? At the
present time, examiners consider themselves fortunate to have any norms available to
base their decisions on. Further research on the development of reading performance
norms that are relevant to Canadian children would provide a more reliable basis for
interpreting test scores and guiding educational decisions and actions.

Linking reading assessment to reading instruction or remediation. As noted, reading assessment should aim at guiding decisions and actions. This implies that we should
know what information is needed to make sound decisions or consider appropriate
courses of action (e.g., selecting and implementing the right intervention program) and
that this information is actually provided, at least in part, by the assessment results.
How to link reading assessment to reading instruction or remediation is a complex
issue. It depends largely on our current state of knowledge on what needs to be
assessed and what needs to be done to help readers improve their level of ability.
Many evidence-based recommendations have been made in recent years (e.g.,
National Reading Panel, 2000) and successfully implemented through particular
approaches to reading instruction (e.g., the Three-Tier Model of Reading Intervention;
see Haager, Klingner, & Vaughn, 2007). Further research on the link between reading
assessment and reading intervention would benefit reading program developers,
teachers and learners.

Conclusions

Educators need a broad range of information in order to allocate their resources effectively. The allocation process in reading assessment is partly determined by the
goals they are pursuing: screening children at risk of reading failure, progress
monitoring, diagnostic assessment, or reading program evaluation. These goals
determine the selection of reading assessment tools. These tools may be designed for
individual or group testing, provide quantitative or qualitative information, include a
normative frame of reference for interpreting reading scores, and require a more or less
extensive amount of training in their use. Many criteria can be considered to estimate
the precision, the validity or the practical usefulness of reading scores. These criteria
are intended to help educators choose the assessment tool that best serves their
purposes.

Date Posted Online: 2009-09-01 11:33:27



References

American Educational Research Association (1999). Standards for educational and psychological testing. Washington, DC: Author. [Translated into French by G. Sarrazin (2003). Normes de pratique du testing en psychologie et en éducation. Montréal: Institut de recherches psychologiques.]
Blachman, B. A. (2000). Phonological awareness. In M. L. Kamil, P. B. Mosenthal, P. D.
Pearson, & R. Barr (Eds.), Handbook of reading research (Vol. 3, pp. 483-502).
Mahwah, NJ: Erlbaum.
Bertrand, R., & Blais, J.-G. (2004). Modèles de mesure : L’apport de la théorie des
réponses aux items. Sainte-Foy, Québec: Presses de l’Université du Québec.
Coltheart, M. (2005). Analysing developmental disorders of reading. Advances in
Speech-Language Pathology, 7, 49-57.
Desrochers, A., Cormier, P., & Thompson, G. (2005). Sensibilité phonologique et
apprentissage de la lecture. Parole, 34-35-36, 113-138.
Fuchs, L. S., & Fuchs, D. (2007). The role of assessment in the three-tier approach to
reading instruction. In D. Haager, J. Klingner, & S. Vaughn (Eds.), Evidence-based
reading practices for response to intervention (pp. 29-42). Baltimore, MD: Paul H.
Brookes Publishing Co.
Gaudreau, L. (2001). Évaluer pour évoluer : Les indicateurs et les critères. Montréal:
Éditions logiques.
Haager, D., Klingner, J., & Vaughn, S. (Eds.). (2007). Evidence-based reading practices
for response to intervention. Baltimore, MD: Paul H. Brookes Publishing Co.
Kirby, J. R., Desrochers, A., Roth, L., & Lai, S. S. V. (2008). Longitudinal predictors of
word reading development. Canadian Psychology, 49, 103-110.
Kline, T. J. B. (2005). Psychological testing: A practical approach to design and
evaluation. Thousand Oaks, CA: Sage Publications.
Laveault, D., & Grégoire, J. (2002). Introduction aux théories des tests en psychologie et
en sciences de l’éducation. Bruxelles: Éditions De Boeck Université.
National Reading Panel. (2000). Teaching children to read: An evidence-based
assessment of the scientific research literature on reading and its implications for
reading instruction. Washington, DC: National Institute of Child Health and Human
Development.
Reid, D. K., Hresko, W. P., & Hammill, D. D. (2001). Test of early reading ability (3rd ed.; TERA-3). Austin, TX: Pro-Ed.
Sax, G. (1997). Principles of educational and psychological measurement and
evaluation. Belmont, CA: Wadsworth Publishing Company.
Schatschneider, C., Fletcher, J. M., Francis, D. J., Carlson, C. D., & Foorman, B. R.
(2004). Kindergarten prediction of reading skills: A longitudinal comparative
analysis. Journal of Educational Psychology, 96, 265-282.
Seymour, P. H. K. (1986). Cognitive analysis of dyslexia. London: Routledge & Kegan
Paul.
Sprenger-Charolles, L., Colé, P., & Serniclaes, W. (2006). Reading acquisition and
developmental dyslexia. New York: Psychology Press.
Vaughn, S., Wanzek, J., Woodruff, A. L., & Linan-Thompson, S. (2007). Prevention and
early identification of students with reading disabilities. In D. Haager, J. Klingner, & S. Vaughn (Eds.), Evidence-based reading practices for response to intervention
(pp. 11-27). Baltimore, MD: Paul H. Brookes Publishing Co.
Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive test of phonological processing (CTOPP). Austin, TX: Pro-Ed.

To cite this document:

Desrochers, A., & Glickman, V. (2009). Criteria for the evaluation of reading
assessment tools. Encyclopedia of Language and Literacy Development (pp. 1-
9). London, ON: Canadian Language and Literacy Research Network.
Retrieved from http://literacyencyclopedia.ca/pdfs/topic.php?topId=280
