11

Assessing Reading
William Grabe

Northern Arizona University, USA

Xiangying Jiang
West Virginia University, USA

Introduction
In this chapter, we discuss the construct of reading comprehension abilities in
relation to reading assessment, examine prior and current conceptualizations of
reading abilities in assessment contexts, and describe why and how reading abilities are assessed. From a historical perspective, the construct of reading is a
concept that has followed far behind the formal assessment of reading abilities
(leaving aside for the moment the issue of classroom assessment of reading abilities). In fact, the construct of reading comprehension abilities, as well as all the
relevant component subskills, knowledge bases, and cognitive processes (hereafter component skills), had not been well thought out and convincingly described
in assessment contexts until the 1990s. In light of this point, it is interesting to note
Clapham's (1996) comment on efforts to develop the IELTS reading modules:
We had asked applied linguists for advice on current theories of language proficiency
on which we might base the IELTS test battery. However, the applied linguists'
responses were varied, contradictory and inconclusive, and provided little evidence
for a construct for EAP tests on which we could base the test. (p. 76)

Similar limitations can be noted for the TOEFL of the 1980s (Taylor & Angelis,
2008) and the earlier versions of the Cambridge ESOL suite of tests (see Weir &
Milanovic, 2003; Hawkey, 2009; Khalifa & Weir, 2009). Parallel limitations with
classroom-based assessments in second language contexts were evident until
fairly recently in the relatively narrow range of reading assessment options
typically used (often limited to multiple choice items, true/false items, matching
items, and brief open-ended response items). Fortunately, this situation has
changed remarkably in the past 15 years, and very useful construct research (and
construct statements for assessment purposes) is now available to help conceptualize reading assessment.

The Companion to Language Assessment, First Edition. Edited by Antony John Kunnan.
© 2014 John Wiley & Sons, Inc. Published 2014 by John Wiley & Sons, Inc.
DOI: 10.1002/9781118411360.wbcla060

The transition from reliability to validity as the driving force behind standardized reading assessment development in the past 20 years has focused attention on efforts
to reconceptualize reading assessment practices. Most importantly, this reconceptualization reflects a more empirically supported reading construct, one that has
also led to a wider interpretation of reading purposes generally (Grabe, 2009) and
in reading assessment contexts more specifically, for instance reading to learn
and expeditious reading (Enright et al., 2000; Khalifa & Weir, 2009).
Reading assessment itself involves a range of purposes that reflect multiple
assessment contexts: standardized proficiency assessment, classroom-based formative and achievement testing, placement and diagnostic testing, assessment for
reading research purposes (Grabe, 2009), and assessment-for-learning purposes
(Black & Wiliam, 2006). The first two of these contexts take up the larger part of
this chapter (see Grabe, 2009, for discussion of all five purposes for reading
assessment).
In the process of discussing these purposes for reading assessment, questions
related to how reading assessments should be carried out are also addressed. The
changing discussions of the reading construct, the redesign of standardized assessments for second language learners, and the need to assess aspects of the reading
construct that were previously ignored have led to a wide range of assessment
task types, some of which had not been given serious consideration until the late
1990s.

Previous Conceptualizations
Reading comprehension ability has a more intriguing history than is commonly
recognized, and it is a history that has profoundly affected how reading comprehension is assessed. Before the 20th century, most people did not read large
amounts of material silently for comprehension. For the much smaller percentage
of test takers in academic settings, assessment emphases were placed on literature,
culture, and interpretation involving more subjectively measured items. The 20th
century, in its turn, combined a growing need for many more people capable of
reading large amounts of text information for comprehension with many more
uses of this information in academic and work contexts. In the USA, for example,
while functional literacy was estimated at 90% at the turn of the 20th century, this
may have been defined simply as completing one or two years of schooling. In
the 1930s, functional literacy in the USA was placed at 88%, defined as the rate of
third grade completion (Stedman & Kaestle, 1991). The pressure to educate
a much larger percentage of the population in informational literacy skills, and
silent reading comprehension skills in particular, was driven, in part, by the need
for more literate soldiers in World Wars I and II, more literate industrial workers,
and increasingly higher demands placed on student performance in educational
settings (Pearson & Goodin, 2010).
Within academic settings, the rise of objective testing practices from a rapidly
developing field of educational psychology and psychological measurement
spurred on large-scale comprehension assessment. However, for the US context,
it was only in 1970 that comprehension assessments provided a reliable national
picture of English first language (L1) reading abilities, and their patterns of variation, through the NAEP (National Assessment of Educational Progress) testing
program and public reports. If broad-based assessment of reading comprehension skills has been a relatively recent development, so too has the development
of reading assessment measures that reflect an empirically derived construct of
reading abilities.
During the period from the 1920s to the 1960s, objective assessment practices
built on psychometric principles were powerful shaping forces for reading assessment in US contexts. In line with these pressures for more objective measurement,
L2 contexts were not completely ignored. The first objectively measured foreign
language reading test was developed in 1919 (Spolsky, 1995). In the UK, in contrast, there was a strong counterbalancing emphasis on expert validity. In the first
half of the 20th century, this traditional validity emphasis sometimes led to more
interesting reading assessment tasks (e.g., summarizing, paraphrasing, text interpretation), but also sometimes led to relatively weak assessment reliability (Weir
& Milanovic, 2003).
By the 1960s and 1970s, the pressure to deliver objective test items led to the
development of the TOEFL as a multiple choice test and led to changes in assessment practices with the Cambridge ESOL suite as well as the precursor of the
IELTS (i.e., ELTS and the earlier EPTB, the English Proficiency Test Battery)
(Clapham, 1996; Weir & Milanovic, 2003). At the same time, the constraints of
using multiple choice and matching items also limited which aspects of reading
abilities could be reliably measured. Starting in the 1970s, the pressures of communicative competence and communicative language teaching led to strong
claims for the appropriateness of integrative reading assessments (primarily cloze
testing). However, from 1980 onwards, the overwhelming output of cognitive
research on reading abilities led to a much broader interpretation of reading abilities, one that was built from several component subskills and knowledge bases.
From 1990 onward, research on reading comprehension has been characterized
by attention to the roles of various component subskills in reading performance, and to
reading for different purposes (reading to learn, reading for general comprehension, expeditious reading, etc.). This expansion of reading research has also led to
more recent conceptualizations of the reading construct as the driving force behind
current standardized reading assessment practices.

Current Conceptualizations
In considering current views on reading assessment, we focus primarily on standardized assessment and classroom-based assessment practices. These are the two
most widespread uses of reading assessment, and the two purposes that have the
greatest impact on test takers. In both cases, the construct of reading abilities is a
central issue. The construct of reading has been described recently in a number of
ways, mostly with considerable overlap (see Alderson, 2000; Grabe, 2009; Khalifa
& Weir, 2009; Adlof, Perfetti, & Catts, 2011). Based on what now amounts to
thousands of empirical research studies on reading comprehension abilities, the
consensus that has emerged is that reading comprehension comprises several
component language skills, knowledge resources, and general cognitive abilities.
The use of these component abilities in combination varies by proficiency, overall
reading purpose, and specific task.
Research in both L1 and L2 contexts has highlighted those factors that strongly
impact reading abilities and account for individual differences in reading comprehension performance:
1. efficient word recognition processes (phonological, orthographic, morphological, and semantic processing);
2. a large recognition vocabulary (vocabulary knowledge);
3. efficient grammatical parsing skills (grammar knowledge under time
constraints);
4. the ability to formulate the main ideas of a text (formulate and combine
appropriate semantic propositions);
5. the ability to engage in a range of strategic processes while reading more challenging texts (including goal setting, academic inferencing, monitoring);
6. the ability to recognize discourse structuring and genre patterns, and use this
knowledge to support comprehension;
7. the ability to use background knowledge appropriately;
8. the ability to interpret text meaning critically in line with reading
purposes;
9. the efficient use of working memory abilities;
10. the efficient use of reading fluency skills;
11. extensive amounts of exposure to L2 print (massive experience with L2
reading);
12. the ability to engage in reading, to expend effort, to persist in reading without
distraction, and achieve some level of success with reading (reading
motivation).
These factors, in various combinations, explain reading abilities for groups of
readers reading for different purposes and at different reading proficiency levels.
Given this array of possible factors influencing (and explaining) reading comprehension abilities, the major problems facing current L2 assessment development
are (a) how to explain these abilities to wider audiences, (b) how best to measure
these component skills within constrained assessment contexts, and (c) how to
develop assessment tasks that reflect these component skills and reading comprehension abilities more generally.

Standardized Reading Assessment


Major standardized reading assessment programs consider the construct of
reading in multiple ways. It is possible to describe the reading construct in terms
of purposes for reading, representative reading tasks, or cognitive processes that
support comprehension. To elaborate, a number of purposes for engaging in
reading can be identified, a number of representative reading tasks can be
identified, and a set of cognitive processes and knowledge bases can be considered
as constitutive of reading comprehension abilities. Of the three alternative descriptive possibilities, reading purpose provides the most transparent explanation to a
more general public as well as to test takers, test users, and other stakeholders.
Most people can grasp intuitively the idea of reading to learn, reading for general
comprehension, reading to evaluate, expeditious reading, and so on. Moreover,
these purposes incorporate several key reading tasks and major component skills
(many of which vary in importance depending on the specific purpose), thus
providing a useful overarching framework for the construct of reading (see
Clapham, 1996; Enright et al., 2000; Grabe, 2009; Khalifa & Weir, 2009). This depiction of reading abilities, developed in the past two decades, has also led to a
reconsideration of how to assess reading abilities within well recognized assessment constraints. It has also led to several innovations in test tasks in standardized
assessments. This trend is exemplified by new revisions to the Cambridge ESOL
suite of exams, the IELTS, and the iBT TOEFL.
The Cambridge ESOL suite of exams (KET, PET, FCE, CAE, CPE) has undergone
important changes in its conceptualization of reading assessment (see Weir &
Milanovic, 2003; Hawkey, 2009; Khalifa & Weir, 2009). As part of the process, the
FCE, CAE, and CPE have introduced reading assessment tests and tasks that
require greater recognition of the discourse structure of texts, recognition of main
ideas, careful reading abilities, facility in reading multiple text genres, and a larger
amount of reading itself. Reading assessment tasks now include complex matching tasks of various types, multiple choice items, short response items, and
summary writing (once again).
IELTS (the International English Language Testing System) similarly expanded
its coverage of the purposes for reading to include reading for specific information, reading for main ideas, reading to evaluate, and reading to identify a topic
or theme. Recent versions of the IELTS include an academic version and a general
training version. The IELTS academic version increased the amount of reading
required, and it includes short response items of multiple types, matching of
various types, several complex readings with diagrams and figures, and innovative fill-in summary tasks.
The iBT TOEFL has similarly revised its reading section based on the framework
of reader purpose. Four reading purposes were initially considered in the design
of iBT TOEFL reading assessment: reading to find information, reading for basic
comprehension, reading to learn, and reading to integrate (Chapelle, Enright, &
Jamieson, 2008), although reading to integrate was not pursued after the pilot
study. iBT TOEFL uses three general item types to evaluate readers' academic
reading proficiency: basic comprehension items, inferencing items, and reading-to-learn items. Reading to learn has been defined as "developing an organized understanding of how the main ideas, supporting information, and factual details of the
text form a coherent whole" (Chapelle et al., 2008, p. 111), for which two new tasks,
prose summary and schematic table, were included. In addition, the iBT TOEFL
uses longer, more complex texts than the ones used in the traditional TOEFL.
In all three of these standardized test systems, revisions drew upon well-articulated and empirically supported constructs of reading abilities as they apply to
academic contexts. In all three cases, greater attention has been given to longer
reading passages, to discourse organization, and to an expanded concept of reading to learn or reading to evaluate. At the same time, a number of component
reading abilities are obviously absent, reflecting the limitations of international
standardized reading assessment imposed by cost, time, reliability demands, and
fairness across many country settings. (Standardized English L1 reading assessment practices are far more complex.) These limited operationalizations of L2
reading abilities are noted by Alderson (2000), Weir and Milanovic (2003), Grabe
(2009), and Khalifa and Weir (2009).
Among the abilities that the new iBT TOEFL did not pursue are word recognition efficiency, reading to scan for information, summarizing, and reading to
integrate information from multiple texts. Khalifa and Weir (2009) note that the
Cambridge suite did not pursue reading to scan, reading to skim, or reading rate
(fluency). All three come under the umbrella term expeditious reading and, in
their analysis, this gap represents a limitation in the way the reading construct
has been operationalized in the Cambridge suite (and in IELTS). IELTS revisions
had considered including short response items and summary writing. In recent
versions, it has settled for a more limited but still innovative cloze summary task.
Returning to the list of component skills noted earlier, current standardized
reading assessment has yet to measure a full range of component abilities of reading
comprehension (and may not be able to do so in the near future). Nonetheless, an
assessment of reading abilities should reflect, as far as possible, the abilities a skilled
reader engages in when reading for academic purposes (leaving aside adult basic
literacy assessments and early child reading assessments). The following is a list of
the component abilities of reading comprehension that are not yet well incorporated into L2 standardized reading assessment (from Grabe, 2009, p. 357):
1. passage reading fluency and reading rate,
2. automaticity and rapid word recognition,
3. search processes,
4. morphological knowledge,
5. text structure awareness and discourse organization,
6. strategic processing abilities,
7. summarization abilities (and paraphrasing),
8. synthesis skills,
9. complex evaluation and critical reading.

How selected aspects of these abilities find their way into standardized L2 reading
assessment practices is an important challenge for the future.
Although researchers working with standardized reading tests have made a
serious effort to capture crucial aspects of the component abilities of reading
comprehension (e.g., Khalifa & Weir, 2009; Chapelle et al., 2008; Hawkey, 2009),
construct validity still represents a major challenge for L2 reading assessment
because the number and the types of assessment tasks are strictly constrained in
the context of standardized testing. If the construct is underrepresented by the
test, it is difficult to claim that reading comprehension abilities are being fully
measured. This difficulty also suggests that efforts to develop an explanation
of the reading construct from L2 reading tests face the challenge of construct
underrepresentation in the very tests being used to develop the construct (a fairly
common problem until recently). Perhaps with greater uses of computer technology in testing, the control over time for individual items or sections can be better
managed, and innovative item types can be incorporated without disrupting
assessment procedures. In addition, as suggested by Shiotsu (2010), test taker
performance information recorded by computers may not only assist decision
making but might also be used for diagnostic purposes. One of the most obvious
potential applications of the computer is to more easily incorporate skimming,
reading-to-search, reading fluency, and reading rate measures. Such an extension
in the future would be welcome.

Classroom-Based Reading Assessment


Moving on from standardized assessments, the second major use of L2 reading
assessments takes place in classroom contexts. In certain respects, classroom-based assessment provides a complement to standardized assessment in that
aspects of the reading construct not accounted for by the latter can easily be
included in the former. In many classroom-based assessment contexts, teachers
observe, note, and chart students' reading rates, reading fluency, summarizing
skills, use of reading information in multistep tasks, critical evaluation skills, and
motivation and persistence to read.
Reading assessment in these contexts is primarily used to measure student
learning (and presumably to improve student learning). This type of assessment
usually involves the measurement of skills and knowledge gained over a period
of time based on course content and specific skills practiced. Typically, classroom
teachers or teacher groups are responsible for developing the tests and deciding
how the scores should be interpreted and what steps to take as a result of the
assessment outcomes (Jamieson, 2011). Classroom learning can be assessed at
multiple points in any semester and some commonly used classroom assessments
include unit achievement tests, quizzes of various types, and midterm and final
exams. In addition to the use of tests, informal and alternative assessment options
are also useful for the effective assessment of student learning, using, for example,
student observations, self-reporting measures, and portfolios. A key issue for
informal reading assessment is the need for multiple assessment formats (and
multiple assessment points) to evaluate a wide range of student performances for
any decisions about student abilities or student progress. The many small assessments across many tasks help overcome the subjectivity of informal assessment
and strengthen the effectiveness and fairness of informal assessments.
Classroom-based assessment makes use of the array of test task types found in
standardized assessments (e.g., cloze, gap-filling formats [rational cloze formats],
text segment ordering, text gaps, multiple choice questions, short answer responses,
summary writing, matching items, true/false/not stated questions, editing, information transfer, skimming, scanning). Much more important for the validity of
classroom assessment, though less commonly recognized, are the day-to-day informal assessments and feedback that teachers regularly provide to students. Grabe
(2009) identifies six categories of classroom-based assessment practices and notes
25 specific informal assessment activities that can be, and often are, carried out by
teachers. These informal activities include (a) having students read aloud in class
and evaluating their reading, (b) keeping a record of student responses to questions in class after a reading, (c) observing how much time students spend on task
during free reading or sustained silent reading (SSR), (d) observing students
reading with an audiotape or listening to an audiotaped reading, (e) having students list words they want to know after reading and why, (f) having students
write simple book reports and recommend books to others, (g) keeping charts of
student reading rate growth, (h) having a student read aloud for the teacher/tester
and making notes, or using a checklist, or noting miscues on the text, (i) noting
students' uses of texts in a multistep project and discussing these uses, and (j)
creating student portfolios of reading activities or progress indicators.
Among these informal assessment activities, it is worth pointing out that oral
reading fluency (reading aloud) assessment has attracted much research interest
in L1 contexts. Oral reading fluency has been found to serve as a strong predictor
of general comprehension (Shinn, Knutson, Good, Tilly, & Collins, 1992; Fuchs,
Fuchs, Hosp, & Jenkins, 2001; Valencia et al., 2010). Even with a one-minute oral
reading measure, teachers can look into multiple indicators of oral reading fluency
(e.g., rate, accuracy, prosody, and comprehension) and obtain a fine-grained
understanding of students' reading ability, particularly if multiple aspects of student
reading performances are assessed (Kuhn, Schwanenflugel, & Meisinger, 2010;
Valencia et al., 2010). However, research on fluency assessment has not been
carried out in L2 reading contexts. The practice of reading aloud as an L2 reading
assessment tool would benefit from research on the validity of oral reading fluency
assessment in the L2 context.
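As a concrete illustration of the rate and accuracy indicators mentioned for a one-minute oral reading measure, the sketch below scores a single timed reading. It assumes the widely used convention of words correct per minute (WCPM), computed as words attempted minus errors, divided by reading time in minutes; that convention, the record format, and the function name `score_oral_reading` are illustrative assumptions rather than procedures taken from the studies cited. Prosody and comprehension, the other two indicators, need separate rubrics and are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class OralReadingSample:
    """One timed oral reading as a teacher might record it (hypothetical format)."""
    words_attempted: int  # words the student reached in the passage
    errors: int           # miscues the teacher scored as errors
    seconds: float        # reading time; 60.0 for a one-minute probe


def score_oral_reading(sample: OralReadingSample) -> dict:
    """Return rate and accuracy indicators for one oral reading sample.

    Assumes the common convention WCPM = (words attempted - errors) / minutes.
    """
    if sample.seconds <= 0:
        raise ValueError("reading time must be positive")
    if not 0 <= sample.errors <= sample.words_attempted:
        raise ValueError("errors must be between 0 and words attempted")
    minutes = sample.seconds / 60.0
    words_correct = sample.words_attempted - sample.errors
    # Accuracy is the share of attempted words read correctly (0.0 if nothing attempted).
    accuracy = words_correct / sample.words_attempted if sample.words_attempted else 0.0
    return {
        "wpm": sample.words_attempted / minutes,  # raw reading rate
        "wcpm": words_correct / minutes,          # words correct per minute
        "accuracy": accuracy,
    }


# Example: a one-minute probe with 112 words attempted and 7 errors.
result = score_oral_reading(OralReadingSample(words_attempted=112, errors=7, seconds=60.0))
print(f"WCPM = {result['wcpm']:.0f}, accuracy = {result['accuracy']:.1%}")
```

Recording each probe this way would also let a teacher chart rate growth across a semester, one of the informal activities listed above; the numbers only become meaningful, however, when passages of comparable difficulty are used from probe to probe.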
Another aspect of classroom-based assessment that is gaining in recognition is
the concept of assessment for learning (Black & Wiliam, 2006; Wiliam, 2010). This
approach draws on explicit classroom tests, informal assessment practices, and
opportunities for feedback from students to teachers that indicate a need for assistance or support. The critical goal of this assessment approach is to provide immediate feedback on tasks and to teach students to engage in more effective learning
rather than to evaluate their performance. An important element of assessment for
learning is the follow-up feedback and interaction between the teacher and the
students. Through this feedback, teachers respond with ongoing remediation and
fine-tuning of instruction when they observe non-understanding or weak student
performances. The key is not to provide answers, but to enhance learning, work
through misunderstandings that are apparent from student performance, develop
effective learning strategies, and encourage student self-awareness and motivation
to improve. Grabe (2009) notes 15 ideas and techniques for assessment for learning.
Although these ideas and techniques apply to any learning and assessment context,
they are ideally suited to reading tasks and reading comprehension development.

Current L2 Reading Assessment Research


In addition to the volume-length publications on assessment development and
validation with three large-scale standardized L2 tests (e.g., Clapham, 1996; Weir
& Milanovic, 2003; Chapelle et al., 2008; Hawkey, 2009; Khalifa & Weir, 2009)
reviewed above, this section will focus on recent journal publications related to
reading assessment. We searched through two of the most important assessment
journals, Language Testing and Language Assessment Quarterly, for their publications
in the past 10 years and found that the recent research on reading assessment
focused mainly on the topics of test tasks, reading texts, and reading strategies.
We note here seven studies relevant to conceptualizations of the L2 reading
construct and ways to assess the reading construct. The first four studies focus on
aspects of discourse structure awareness, complex text analysis tasks, and the role
of the texts themselves. Two subsequent studies focus on the role of reading strategies and reading processes in testing contexts. At issue is whether or not multiple
choice questions bias text reading in unintended ways. The final study examines
the role of memory in reading assessment as a further possible source of bias.
Overall, it is important to note that research articles on L2 reading assessment are
relatively uncommon in comparison with research on speaking and writing
assessment (and performance scoring issues).
Kobayashi (2002) examined the impact of discourse organization awareness on
reading performance. Specifically, she investigated whether text organization
(association, description, causation, and problem-solution) and response format
(cloze, open-ended questions, and summary writing) have a systematic influence
on test results of learners at different proficiency levels (high, middle, and low).
She found that text organization did not lead to strong performance differences
for test formats that measured less integrative comprehension such as cloze tests
or for learners of limited L2 proficiency. On the contrary, stronger performance
differences due to organizational differences in texts were observed for testing
formats that measure more integrative forms of comprehension tasks (open-ended
questions and summary writing), especially for learners with higher levels of L2
proficiency. The more proficient students benefited from texts with a clear structure for summary writing and open-ended questions. She suggested that "it is
essential to know in advance what type of text organization is involved in passages used for reading comprehension tests, especially in summary writing with
learners of higher language proficiency" (p. 210). The study confirms previous
findings that different test formats seem to measure different aspects of reading
comprehension and that text organization can influence reading comprehension
based on more complex reading tasks.
Yu (2008) also addressed issues in discourse processing by exploring the
use of summaries for reading assessment with 157 Chinese university students in
an undergraduate EFL program. The study looked at the relationships between
summarizing an L2 text in the L2 versus in the L1, as well as relationships among
both summaries (L1 and L2) and an L2 reading measure, an L2 writing measure,
and a translation measure. Findings showed that test takers wrote longer summaries in the L1 (Chinese) but were judged to have written better summaries
in their L2 (English). Perhaps more importantly, summary writing in Chinese
and English only correlated with L2 reading measures at .30 and .26 (r² of .09 and
.07 respectively, for only the stronger of two summary quality measures). These
weak correlations suggest that summary writing measures something quite different from the TOEFL reading and writing measures used. Yu found no relationships between summary-writing quality and the TOEFL writing or translation
measures. In a questionnaire and follow-up interviews, test takers also felt that
summary writing was a better indicator of their comprehension abilities than of
their writing abilities. While this is only one study in one context, it raises interesting questions about the role of summarizing in reading assessment that need
to be examined further.
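The squared values reported for Yu (2008) follow from squaring each correlation coefficient, which gives the proportion of variance that summary quality and the L2 reading measure share. The arithmetic below only reworks the reported figures and adds no new data.

```latex
\begin{aligned}
r = .30 &\;\Rightarrow\; r^2 = (.30)^2 = .09 \\
r = .26 &\;\Rightarrow\; r^2 = (.26)^2 = .0676 \approx .07 \\
1 - r^2 &= 1 - .09 = .91
\end{aligned}
```

Even for the stronger correlation, roughly 91 percent of the variance in the L2 reading scores is not shared with summary quality, which is why these correlations are read as weak.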
Trites and McGroarty (2005) addressed the potential impact of more complex
reading tasks that go beyond only measures of basic comprehension. The authors
reported the design and use of new measures to assess the more complex reading
purposes of reading to learn and reading to integrate (see Enright et al., 2000).
Based on the analyses of data from both native and non-native speakers, the
authors found that new tasks requiring information synthesis assessed something
different from basic comprehension, after a lower level of basic academic English
proficiency had been achieved. The authors speculated that the new measures
"tap additional skills such as sophisticated discourse processes and critical thinking
skills in addition to language proficiency" (p. 199).
Green, Unaldi, and Weir (2010) focused on the role of texts, and especially disciplinary text types, for testing purposes. They examined the authenticity of
reading texts used in IELTS by comparing IELTS Academic Reading texts with the
texts that first year undergraduates most needed to read and understand once
enrolled at their universities. The textual features examined in the study included
vocabulary and grammar, cohesion and rhetorical organization, genre and rhetorical task, subject and cultural knowledge, and text abstractness. The authors found
that the IELTS texts have many of the features of the kinds of text encountered by
first year undergraduates and there are few fundamental differences between
them. The findings support arguments made by Clapham (1996) that nonspecialist
texts of the kind employed in IELTS can serve as a reasonable substitute for testing
purposes.
Rupp, Ferne, and Choi (2006) explored whether or not test takers read in similar
ways when reading texts in a multiple choice testing context and when reading
texts in non-testing contexts. Using qualitative analyses of data from introspective
interviews, Rupp et al. (2006) found that asking test takers to respond to text passages with multiple choice questions induced response processes that are strikingly different from those that respondents would draw on when reading in
non-testing contexts. The test takers in their study were found to often segment
a text into chunks that were aligned with individual questions and focused predominantly on the microstructure representation of a text base rather than the
macrostructure of a situation model (p. 469). The authors speculated that "higher-order inferences that may lead to an integrated macrostructure situation model in
a non-testing context are often suppressed or are limited to grasping the main idea
of a text" (p. 469). The construct of reading comprehension that is assessed and
the processes that learners engage in seem to have changed as a result of the
testing format and text types used. The authors assert that the construct of reading
comprehension turns out to be assessment specific and is fundamentally determined through item design and text selection. (This issue of test variability
in reading assessments has also been the focus of L1 reading research, with considerable variability revealed across a number of standardized tests; see Keenan,
Betjemann & Olson, 2008.)
Cohen and Upton (2007) described reading and test-taking strategies that test
takers use to complete reading tasks in the reading sections of the LanguEdge
Courseware (2002) materials developed to introduce the design of the new TOEFL
(iBT TOEFL). The study sought to determine if there is variation in the types of
strategies used when answering three broad categories of question types: basic
comprehension item types, inferencing item types, and reading-to-learn item
types. Think-aloud protocols were collected as the participants worked through
these various item types. The authors reported two main findings: (a) participants
approached the reading section of the test as a test-taking task with a primary
goal of getting the answers right, and (b) the strategies deployed were generally
consistent with TOEFL's claims that the successful completion of this test section
requires academic reading-like abilities (p. 237). Unlike those in Rupp et al.
(2006), the participants in this study were found to draw on their understanding
and interpretation of the passage to answer the questions, except when responding to certain item formats such as basic comprehension vocabulary items. However, the participants
regularly used 17 of the 28 test-taking strategies but only 3 of the 28
reading strategies. So, while they may be reading for understanding
in academic ways, they are probably not reading academic texts in ways in which
they would read these texts in non-testing contexts. In this way, at least, the results
of Cohen and Upton (2007) converge with the findings of Rupp et al. (2006).
Finally, Chang (2006) examined whether and how the requirement of memory
biases our understanding of readers' comprehension. The study compared L2
readers' performance on an immediate recall protocol (a task requiring memory)
and on a translation task (a task without the requirement of memory). The study
revealed that the translation task yielded significantly more evidence of comprehension than did the immediate recall task, which indicates that the requirement
of memory in the recall task may hinder test takers' ability to demonstrate fully
their comprehension of the reading passage. The results also showed that the
significant difference in learners' performance between the immediate
recall and the translation tasks held across topics and proficiency levels.
This study provides evidence that immediate free recall tasks may have limited
validity as comprehension measures because of the memory demands they impose.
Certainly, more research is needed on the role and relevance of memory processes
as part of reading comprehension abilities.

Challenges
A number of important challenges face reading assessment practices. One of the
most important challenges for reading assessment stems from the complexity of
the construct of reading ability itself. Reading comprehension is a multicomponent construct which involves many skills and subskills (at least the 12 listed
above). The question remains how such an array of component abilities can best
be captured within the operational constraints of standardized testing, what new
assessment tasks might be developed, and what component abilities might best
be assessed indirectly (Grabe, 2009). In standardized assessment contexts, practices that might expand the reading assessment construct are constrained by
concerns of validity, reliability, time, cost, usability, and consequence, which limit
the types of reading assessment tasks that can be used. In classroom-based contexts, effective reading assessments are often constrained by relatively minimal
awareness among teachers that a range of reading abilities, reflecting the reading
construct, need to be assessed.
A second challenge is the need to reconcile reading in a testing context with
reading in non-testing contexts. Whether a text or task used in a testing context
shares the linguistic and textual features of texts read outside of testing
(that is, how authentic the text is) does not address what test takers actually
do when encountering these texts in a high-stakes testing situation. When students
read a text as part of standardized assessment, they know that they are reading
for an assessment purpose. So, for example, although the characteristics of the
academic reading texts used in IELTS were said to share most of the textual characteristics of first year undergraduate textbook materials (Green et al., 2010), the
context for standardized assessment may preclude any strong assumption of a
match to authentic reading in the real world (see, e.g., Rupp et al., 2006; Cohen
& Upton, 2007). One outcome is that it is probably not reasonable to demand that
the reading done in reading assessments exactly replicate real world reading
experiences. However, the use of realistic texts, tasks, and contexts should be
expected because it supports positive washback for reading instruction; that is to
say, texts being used in testing and language instruction are realistic approximations of what test takers will need to read in subsequent academic settings.
A third challenge is how to assess reading strategies, or the strategic reader.
Rupp et al. (2006) found that the strategies readers use in assessment contexts
were different from the ones they use in real reading contexts, and that even the construct of reading comprehension is assessment-specific, determined by test
design and text format. On the other hand, Cohen and Upton (2007) found that,
although the participants approached the reading test as a test-taking task, the
successful completion of the test requires both local and general understanding
of the texts, which reflects academic-like reading abilities. This debate leaves open
a key question: If readers use strategies differently in non-testing contexts and in
testing contexts, how should we view the validity of reading assessments (assuming strategy use is a part of the reading construct)? Clearly, more research is
needed on the use of, and assessment of, reading strategies in testing contexts.
A fourth challenge is the possible need to develop a notion of the reading construct that varies with growing proficiency in reading. In many L2 reading assessment situations, this issue is minimized (except for the Cambridge ESOL suite of
language assessments). Because English L2 assessment contexts are so often
focused on EAP contexts, there is relatively little discussion of how reading assessments should reflect a low-proficiency interpretation of the L2 reading construct
(whether for children, or beginning L2 learners, or for basic adult literacy populations). It is clear, especially in light of research in L1 reading contexts (Paris,
2005; Adlof et al., 2011), that different proficiency levels require distinct types of reading
assessments. In L2 contexts, Kobayashi (2002) found that text organization and response format have an impact on the performance of readers at different proficiency levels. The implication of this finding is that different texts, tasks,
and task types are appropriate at different proficiency levels. In light of this
finding, how should reading assessment tasks and task types change with growing
L2 proficiency? Can systematic statements be made in this regard? Should proficiency variability be reflected at the level of the L2 reading construct and, if so,
how?

Future Directions
In some respects, the challenges to L2 reading assessment and future directions
for reading assessment are two sides of the same coin. In closing this chapter, we
suggest five future directions as a set of issues that L2 reading assessment research
and practice should give more attention to. These directions do not necessarily
reflect current conflicts in research findings or immediate challenges to the validity
of reading assessment, but they do need to be considered carefully and acted upon
in the future.
First, different L2 reading tests likely measure students differently. This is not
news to reading assessment researchers, but this needs to be explored more explicitly and systematically in L2 reading contexts. Standardized assessment programs
may not want to know how their reading tests compare with other reading tests,
so this is work that might not be carried out by testing corporations. At the same
time, such work can be expensive and quite demanding on test takers. Nonetheless, with applied linguists regularly using one or another standardized test for
research purposes, it is important to know how reading measures vary. One
study in L1 contexts (Keenan et al., 2008) has demonstrated that widely
used L1 reading measures give different sets of results for the same group of test
takers. Work of this type would be very useful for researchers studying many
aspects of language learning.
Second, the reading construct is most likely underrepresented by all well-known
standardized reading assessment systems. A longer-term goal of reading assessment research should be to try to expand reading measures to more accurately
reflect the L2 reading construct. Perhaps this work can be most usefully carried out
as part of recent efforts to develop diagnostic assessment measures for L2 reading
because much more detailed information could be collected in this way. Such work
would, in turn, improve research on the L2 reading construct itself. At issue is the
extent to which we can (and should) measure reading passage fluency, main idea
summarizing skills, information synthesis from multiple text sources, strategic
reading abilities, morphological knowledge, and possibly other abilities.
Third, L2 readers are not a homogeneous group, and they bring different background knowledge to the L2 texts they read. They vary in many ways in areas such
as cultural experiences, topic interest, print environment, knowledge of genre and
text structures, and disciplinary knowledge. In order to control for unnecessary
confounding factors related to these differences in prior knowledge, more attention should be paid to issues of individual variation, especially in classroom-based
assessments, so no test takers are advantaged or disadvantaged due to these
differences.
Fourth, computers and new media are likely to alter how reading tests and
reading tasks evolve. Although we believe that students in reading for academic
purposes contexts are not going to magically bypass the need to read print materials and books for at least the near future, we need to recognize that the ability to
read online texts is becoming an important part of the general construct of reading
ability. As a result, more attention needs to be paid to issues of reading assessment
tied to reading of online texts, especially since research has indicated a low correlation between effectiveness in reading print texts and effectiveness in reading
online texts (Coiro & Dobler, 2007). At the same time, reading assessment research will need to examine the uses of computer-based assessments and
assessments involving new media. A major issue is how to carry out research that
is fair, rigorous, and relatively free of enthusiastic endorsements or the selling of
the new simply because it is novel.
Finally, teachers need to be trained more effectively to understand appropriate
assessment practices. A large number of teachers still have negative attitudes to
the value of assessment measures for student evaluation, student placement, and
student learning. In many cases, L2 training programs do not require an assessment course, or the course is taught in a way that seems to turn off future teachers.
As a consequence, teachers allow themselves to be powerless to influence assessment practices and outcomes. In such settings, teachers, in effect, cheat themselves
by being excluded from the assessment process, and they are not good advocates
for their students. Perhaps most importantly, teachers lose a powerful tool to
support student learning and to motivate students more effectively. The problem
of teachers being poorly trained in assessment practices is a growing area of attention in L1 contexts; it should also be a more urgent topic of discussion in L2
teacher-training contexts.

SEE ALSO: Chapter 4, Assessing Literacy; Chapter 13, Assessing Integrated Skills;
Chapter 32, Large-Scale Assessment; Chapter 46, Defining Constructs and Assessment Design; Chapter 50, Adapting or Developing Source Material for Listening
and Reading Tests; Chapter 66, Fairness and Justice in Language Assessment;
Chapter 89, Classroom-Based Assessment Issues for Language Teacher Education;
Chapter 94, Ongoing Challenges in Language Assessment

References
Adlof, S., Perfetti, C., & Catts, H. (2011). Developmental changes in reading comprehension:
Implications for assessment and instruction. In S. Samuels & A. Farstrup (Eds.), What
research has to say about reading instruction (4th ed., pp. 186–214). Newark, DE: International Reading Association.
Alderson, J. (2000). Assessing reading. New York, NY: Cambridge University Press.
Black, P., & Wiliam, D. (2006). Assessment for learning in the classroom. In J. Gardner (Ed.),
Assessment and learning (pp. 9–25). London, England: Sage.
Chang, Y.-F. (2006). On the use of the immediate recall task as a measure of second language
reading comprehension. Language Testing, 23(4), 520–543.
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2008). Building a validity argument
for the Test of English as a Foreign Language. New York, NY: Routledge.
Clapham, C. (1996). The development of IELTS: A study in the effect of background knowledge on
reading comprehension. Studies in Language Testing, 6. New York, NY: Cambridge University Press.
Cohen, A. D., & Upton, T. A. (2007). "I want to go back to the text": Response strategies on
the reading subtest of the new TOEFL. Language Testing, 24(2), 209–250.
Coiro, J., & Dobler, E. (2007). Exploring the online reading comprehension strategies used
by sixth-grade skilled readers to search for and locate information on the Internet.
Reading Research Quarterly, 42, 214–257.
Enright, M., Grabe, W., Koda, K., Mosenthal, P., Mulcahy-Ernt, P., & Schedl, M. (2000).
TOEFL 2000 reading framework: A working paper. TOEFL monograph, 17. Princeton, NJ:
Educational Testing Service.
Fuchs, L., Fuchs, D., Hosp, M., & Jenkins, J. (2001). Oral reading fluency as an indicator of
reading competence: A theoretical, empirical, and historical analysis. Scientific Studies
of Reading, 5, 239–256.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York, NY:
Cambridge University Press.
Green, A., Unaldi, A., & Weir, C. (2010). Empiricism versus connoisseurship: Establishing
the appropriacy of texts in tests of academic reading. Language Testing, 27(2),
191–211.
Hawkey, R. (2009). Examining FCE and CAE: Key issues and recurring themes in developing the
First Certificate in English and Certificate in Advanced English exams. Studies in Language
Testing, 28. New York, NY: Cambridge University Press.
Jamieson, J. (2011). Assessment of classroom language learning. In E. Hinkel (Ed.), Handbook
of research in second language teaching and learning (Vol. 2, pp. 768–785). New York, NY:
Routledge.
Keenan, J., Betjemann, R., & Olson, R. (2008). Reading comprehension tests vary in the skills
they assess: Differential dependence on decoding and oral comprehension. Scientific
Studies of Reading, 12(3), 281–300.
Khalifa, H., & Weir, C. J. (2009). Examining reading. Cambridge, England: Cambridge University Press.
Kobayashi, M. (2002). Method effects on reading comprehension test performance: Text
organization and response format. Language Testing, 19(2), 193–220.
Kuhn, M. R., Schwanenflugel, P. J., & Meisinger, E. B. (2010). Aligning theory and assessment of reading fluency: Automaticity, prosody, and definitions of fluency. Reading
Research Quarterly, 45(2), 230–251.
Paris, S. G. (2005). Reinterpreting the development of reading skills. Reading Research Quarterly, 40(2), 184–202.
Pearson, P. D., & Goodin, S. (2010). Silent reading pedagogy: A historical perspective. In
E. Hiebert & D. R. Reutzel (Eds.), Revisiting silent reading (pp. 3–23). Newark, DE:
International Reading Association.
Rupp, A., Ferne, T., & Choi, H. (2006). How assessing reading comprehension with multiple-choice questions shapes the construct: A cognitive processing perspective. Language Testing, 23(4), 441–474.
Shinn, M. R., Knutson, N., Good, R. H., Tilly, W. D., & Collins, V. L. (1992). Curriculum-based measurement of oral reading fluency: A confirmatory analysis of its relation to
reading. School Psychology Review, 21, 459–479.
Shiotsu, T. (2010). Components of L2 reading: Linguistic and processing factors in the reading test
performances of Japanese EFL learners. Studies in Language Testing, 32. New York, NY:
Cambridge University Press.
Spolsky, B. (1995). Measured words. New York, NY: Oxford University Press.
Stedman, L., & Kaestle, C. (1991). Literacy and reading performance in the United States
from 1880 to the present. In C. Kaestle, H. Damon-Moore, L. C. Stedman, K. Tinsley,
& W. V. Trollinger, Jr. (Eds.), Literacy in the United States (pp. 75–128). New Haven, CT:
Yale University Press.
Taylor, C., & Angelis, P. (2008). The evolution of the TOEFL. In C. Chapelle, M. Enright, &
J. Jamieson (Eds.), Building a validity argument for the Test of English as a Foreign Language
(pp. 27–54). New York, NY: Routledge.
Trites, L., & McGroarty, M. (2005). Reading to learn and reading to integrate: New tasks for
reading comprehension tests? Language Testing, 22(2), 174–210.
Valencia, S. W., Smith, A. T., Reece, A. M., Li, M., Wixson, K. K., & Newman, H. (2010). Oral
reading fluency assessment: Issues of construct, criterion, and consequential validity.
Reading Research Quarterly, 45(3), 270–291.
Weir, C., & Milanovic, M. (Eds.). (2003). Continuity and innovation: Revising the Cambridge
Proficiency in English examination 1913–2002. Cambridge, England: Cambridge University Press.
Wiliam, D. (2010). An integrative summary of the research literature and implications for
a new theory of formative assessment. In H. Andrade & G. Cizek (Eds.), Handbook of
formative assessment (pp. 18–40). New York, NY: Routledge.
Yu, G. (2008). Reading to summarize in English and Chinese: A tale of two languages.
Language Testing, 25(4), 521–551.

Suggested Readings
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education:
Principles, Policy and Practice, 5(1), 7–74.
Chapelle, C. (2011). Validation in language assessment. In E. Hinkel (Ed.), Handbook of
research in second language teaching and learning (Vol. 2, pp. 717–730). New York, NY:
Routledge.
Jenkins, J., Fuchs, L., van den Broek, P., Espin, C., & Deno, S. (2003). Sources of individual
differences in reading comprehension and reading fluency. Journal of Educational Psychology, 95, 719–729.
Kamil, M., Pearson, P. D., Moje, E., & Afflerbach, P. (Eds.). (2010). Handbook of reading
research. Vol. 4. New York, NY: Routledge.
Kintsch, W. (1998). Comprehension: A framework for cognition. New York, NY: Cambridge
University Press.
Koda, K. (2005). Insights into second language reading: A cross-linguistic approach. New York,
NY: Cambridge University Press.
Perfetti, C., Landi, N., & Oakhill, J. (2005). The acquisition of reading comprehension skill.
In M. Snowling & C. Hulme (Eds.), The science of reading (pp. 227–247). Malden, MA:
Blackwell.
Sadoski, M., & Paivio, A. (2007). Toward a unified theory of reading. Scientific Studies of
Reading, 11, 337–356.
Weir, C. J. (1997). The testing of reading in a second language. In C. Clapham & D. Corson
(Eds.), Encyclopedia of language and education. Vol. 7: Language testing and assessment
(pp. 39–49). Norwell, MA: Kluwer.
Wiliam, D. (2007–8). Changing classroom practice. Educational Leadership, 65(4), 36–42.