
Language Teaching

http://journals.cambridge.org/LTA

Review of doctoral research in language assessment in Canada (2006–2011)


Liying Cheng and Janna Fox
Language Teaching / Volume 46 / Issue 04 / October 2013, pp 518 - 544
DOI: 10.1017/S0261444813000244, Published online: 24 September 2013

Link to this article: http://journals.cambridge.org/abstract_S0261444813000244


How to cite this article:
Liying Cheng and Janna Fox (2013). Review of doctoral research in language assessment in
Canada (2006–2011). Language Teaching, 46, pp 518-544 doi:10.1017/S0261444813000244

Downloaded from http://journals.cambridge.org/LTA, IP address: 134.117.10.200 on 02 Aug 2016

© Cambridge University Press 2013


Lang. Teach. (2013), 46.4, 518–544
doi:10.1017/S0261444813000244

Surveys of Ph.D. Theses


Review of doctoral research in language assessment in Canada
(2006–2011)
Liying Cheng Queen's University, Canada
liying.cheng@queensu.ca
Janna Fox Carleton University, Canada
janna_fox@carleton.ca
This paper reviews a selected sample of 24 doctoral dissertations in language assessment
(broadly defined), completed between 2006 and 2011 in Canadian universities. These
dissertations fall into five thematic categories: 1) reliability, validity and factors affecting test
performance; 2) washback (impact) and ethics; 3) raters, rating and rating scales; 4)
classroom-based research: teaching, learning and assessment; and 5) vocabulary learning,
lexical proficiency and lexical richness. The themes were categorized according to the
International Language Testing Association (ILTA) bibliographical categorization index. We
identify trends such as the methodological strength of complex mixed methods research
design, which enhances the validity of the research findings: 16 of the 24 dissertations (67%) took a pragmatic
(rather than paradigmatic) approach in their use of mixed methods, while four (17%) opted
for multi-method quantitative approaches and four (17%) for qualitative approaches. We also discuss the
depth and breadth of these dissertations and situate their scholarly contributions within
Canadian and international research on language assessment.

1. Introduction and selection protocol


In terms of area, Canada is the second-largest country in the world, with a relatively small
population of approximately 35 million people. According to the Association of Universities
and Colleges of Canada (AUCC), there are 83 universities in Canada that are independent
post-secondary education institutions with degree-granting authority. At the time of this
survey, 19 of these offered doctoral degree programs in the broader applied linguistics area
within a range of faculties, schools and departments. Canada has two official languages,
French and English (although Inuktitut is an official language in Nunavut, a federal territory
of Canada). Doctoral research is undertaken in both languages, but given the scope of the
present survey, we have focused exclusively on English-medium universities. Further, the
number of doctoral dissertations in applied linguistics¹ (broadly defined) completed at Canadian
English-medium universities is impressively large, with three dominant areas of research: 1)
second language acquisition, 2) bilingual and immersion education, and 3) language teaching
and learning.
A search of these three areas using the Theses Canada Portal
(www.collectionscanada.gc.ca/thesescanada/index-e.html) identified more than 250 dissertations²
completed over the last five years (2006–2011). We therefore had to narrow our
review of Canadian dissertation research and chose to focus on research related
to language assessment³. Our justifications for this are, first, that the three dominant
areas listed above are not mutually exclusive. Research related to language assessment
and testing is not only embedded within each of these areas, but is also becoming an
increasingly important link between them. In fact, many of the questions addressed in
doctoral research are directed at the central role of assessment and testing in teaching and
learning, within a range of curricular contexts and at different instructional levels and stages
of language development. Second, language assessment has gained in both prominence and
importance within Canadian applied linguistics research, as evidenced by the increasing
numbers of dissertations on the subject, as well as papers presented by Canadian graduate
students at the annual Language Testing Research Colloquium (LTRC) of the International
Language Testing Association (ILTA). We have also seen a dramatic increase in job openings
for language assessment specialists at the faculty level and in testing agencies within the
broader Canadian and international contexts. Increased interest in issues related to language
assessment in Canada has also resulted in the formation of a national association, the
Canadian Association of Language Assessment/L'association canadienne pour l'évaluation
des langues (CALA/ACEL), which was officially established in 2009. Third, both authors are
language assessment specialists with the expertise necessary to judge the merits of dissertations
in this area. There is no doubt that language assessment will continue to be one of the key
focuses of future research in language teaching, as broadly defined by this journal, so we
believe a review of Canadian doctoral research on language assessment is both useful and
timely.

1 Applied linguistics (AL) is a notoriously contested term, with a long history in the literature. There are narrow
interpretations, such as 'linguistics applied' (cf. Brumfit 1997; Widdowson 1998), and broad ones, such as those of Rampton
(1997) and Samson (2012), who argued that AL draws on theoretical foundations much larger than just linguistics,
including psychology, education, sociology and anthropology. The latter is in keeping with the definition given by the
American Association for Applied Linguistics (AAAL): 'a wide range of theoretical, methodological and educational
approaches from various disciplines from the humanities to the social, cognitive, medical, and natural sciences'
(n. d.). This broader definition is characterized by interdisciplinarity (according to Samson 2012) and 'the theoretical
and empirical investigation of real-world problems in which language is a central issue' (Brumfit 1997: 93). Here we have
adopted this broad definition of the term.
2 We have adopted the term 'doctoral dissertation' as it is the term used in Canadian universities.
3 We have used the term 'language assessment' to refer to broad issues in language assessment and testing. We regard
language assessment as an umbrella term, which includes language testing, but, consistent with research terminology
in the field (cf. journals such as Language Testing and Language Assessment Quarterly), we use 'language assessment', 'language
testing' and 'language assessment and testing' interchangeably.

A number of different search methods were used to identify dissertations in language
assessment and testing. First, data were collected from the Library and Archives Canada Theses
Canada Portal. Under the advanced search option, keywords such as 'language test', 'language
testing', 'language test performance' and 'classroom assessment' were used, specifically for
dissertations at the Ph.D. level. This search identified six dissertations written within the
search time frame. When the search keywords were broadened to 'ESL language assessment', 'ESL
and assessment', 'ESL and assessing', 'language and scoring' and 'language and feedback',
ten more dissertations were identified. One of these was an Ed.D. thesis (Wakamoto 2007),
but because only a few universities in Canada offer such professionally-oriented degrees,
we decided to restrict our review of doctoral research to Ph.D. dissertations. After careful
reading, three more dissertations, those by Ishii (2009), Suzuki (2009) and Yang (2008), were
excluded from the review due to the narrow linguistic focus of their studies, that is, they did
not show the relationship between teaching, learning and assessment, or have implications
for language teaching as defined by the journal (see also footnote 1, in which we clarify our
definition of applied linguistics). Thus, as a result of our initial search, 12 of the 16
dissertations that we had identified were included at this stage.
We then contacted 15 professors who worked in the broader area of applied linguistics
within 11 universities offering doctoral programs that included a focus on language assessment
and testing, namely, the University of British Columbia, University of Alberta, University of
Calgary, University of Toronto, Brock University, University of Ottawa, McGill University,
University of Montreal, Concordia University, University of Sherbrooke and Laval University.
We located a further 12 dissertations as a result. This paper thus reviews a total of 24 Ph.D.
dissertations in language assessment and testing.
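The selection counts above can be checked with a short tally. The following Python sketch uses only figures stated in this paper; the variable names are ours and purely illustrative. It also expresses the research-design counts given in the abstract as shares of the 24 dissertations, which shows that the reported percentages are rounded (16/24 is about 66.7% and 4/24 about 16.7%) and therefore sum to 101% rather than 100%.

```python
# Figures from the selection protocol and the abstract; variable names are illustrative.
first_search = 6        # core keywords, Theses Canada Portal
broadened_search = 10   # additional hits after broadening the keywords
edd_thesis = 1          # Wakamoto (2007), excluded as an Ed.D. thesis
narrow_focus = 3        # Ishii (2009), Suzuki (2009), Yang (2008)
via_professors = 12     # located by contacting 15 professors

identified = first_search + broadened_search        # dissertations found online
retained = identified - edd_thesis - narrow_focus   # kept after the initial screening
total = retained + via_professors                   # dissertations reviewed
print(f"identified={identified}, retained={retained}, total={total}")

# Research designs reported in the abstract, as shares of all reviewed dissertations.
designs = {"mixed methods": 16, "multi-method quantitative": 4, "qualitative": 4}
assert sum(designs.values()) == total  # the three design groups cover every dissertation

rounded = {}
for label, count in designs.items():
    share = 100 * count / total
    rounded[label] = round(share)  # nearest whole percent, as reported in the abstract
    print(f"{label}: {count}/{total} = {share:.1f}% -> {rounded[label]}%")

print(f"sum of rounded shares: {sum(rounded.values())}%")
```

The rounding discrepancy is a presentational artifact only; the underlying counts (16 + 4 + 4) account for all 24 dissertations.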
Each of us then independently examined the abstracts of the 24 dissertations in order to
identify recurring topics. We then met and compared our analysis, reducing the number of
topics and refining our definitions. At this point, we consulted the International Language
Testing Association (ILTA) bibliographical categorization index (www.iltaonline.com) to
further refine and validate the topics. Using the ILTA classification index, we reduced
the number of topics to five: 1) reliability, validity and factors affecting test performance;
2) washback (impact) and ethics; 3) raters, rating and rating scales; 4) classroom-based
research: teaching, learning and assessment; and 5) vocabulary learning, lexical proficiency
and lexical richness. We have situated our review within the field of language assessment by
discussing the impact and/or contributions of these dissertations. Due to the considerable
variety of dissertation topics and the varied methodology/methods employed in these 24
doctoral research studies, our review is at times uneven. Dissertations which demonstrated
methodological strength are given greater attention because of the greater validity of their
findings and their original contribution to scholarship within the field of language assessment.
We also made an effort to point out some of the unique features of certain dissertations,
even if they are of single-method design. Later in our review we have highlighted the
interdisciplinary nature of applied linguistics. The language assessment research considered
here, for example, was influenced by educational measurement, educational psychology,
linguistics, sociology and philosophy, along with other humanities and social science
disciplines. Our review ends with some concluding remarks and a consideration of future
directions.

2. Reliability, validity and factors affecting test performance


At the core of any testing inquiry is the question of validity. Test developers and test users
want to be confident in the inferences drawn from testing results, and the outcomes that
are a consequence of those results. Validity, including reliability, is a way to view and
interpret test information (Moss, Girard & Haniford 2006). More specifically, establishing
the validity of a test involves determining the appropriateness and accuracy associated with
assessing student ability (Messick 1989). Test validity should be approached longitudinally
from multiple dimensions, because interpretations of validity concepts such as accuracy and
appropriateness are influenced by the assumptions made about student learning, the purpose
of the test, and the importance that is assigned to the testing context. The more traditional
types of language testing research typically involve issues of both reliability and validity,
which are, therefore, at the heart of language testing research (Bachman 2000). Research
into factors affecting test performance also falls into the broad category of test reliability
and validity research, because it adds to our understanding of the kinds of variables (beyond
language ability itself) that contribute to students' test scores, thus helping to measure the
accuracy and appropriateness of our tests.
Five doctoral dissertations considered in this review fall into this category: Abbott (2005),
Farnia (2006), Gao (2007), Limbos (2005) and Zheng (2010). The first four were concerned
with test reliability and validity, and all focus on reading. The final study, Zheng (2010),
investigated factors influencing test performance of language achievement and proficiency.
Abbott (2005) and Gao (2007) dealt with adults reading in large-scale testing, while Farnia
(2006) and Limbos (2005) investigated reading achievement and reading difficulties in
children. Abbott (2005) and Gao (2007) were guided by the educational measurement and
cognition literature, while Farnia (2006) and Limbos (2005) were informed by literatures drawn
from educational and clinical psychology. Zheng's (2010) study explored factors contributing
to the test performance (in English) of Chinese learners at university level. Three of these
studies, Abbott (2005), Gao (2007) and Zheng (2010), were aligned with the core research
tradition in language assessment, while those of Farnia (2006) and Limbos (2005) belonged
more to the reading psychology tradition, which indicates the theoretical and interdisciplinary
diversity of these studies.
Abbott (2005) tested the hypothesis that some of the test items included in the Canadian
Language Benchmarks Assessment (CLBA) reading subtest favor certain groups whose first
language orthographies differ markedly from that of English. The study concerned Mandarin
speakers, who have a tendency to use bottom-up local reading strategies, and Arabic speakers,
who tend to have top-down global reading strategies. The study was in two parts and used
a mixed method design (Creswell & Plano-Clark 2007; Teddlie & Tashakkori 2009). Verbal
report data were first collected from Mandarin and Arabic learners to identify, clarify and
elaborate on the reading strategies engaged by the CLBA reading comprehension tasks. In
part two of the study, Abbott collected data from 250 Arabic and 250 Mandarin examinees.
Three reading experts assigned each of the 32 CLBA reading items to one of the seven bottom-up or top-down reading strategy categories identified in part one of the study. SIBTEST
analysis revealed systematic group differences in both bottom-up and top-down reading
strategy categories. Items involving breaking words into smaller parts, scanning for details,
identifying synonyms or paraphrases and matching key vocabulary in the text were found to
favor Mandarin speakers, while items involving skimming for the gist, connecting or relating
information presented in different parts of the tests and drawing an inference favored Arabic
speakers. These results provide evidence for the validity of the bottom-up and top-down
reading strategy framework (Grabe 2009; Grabe & Stoller 2011) and a substantive method
for interpreting group differences on the CLBA reading assessment. This study employed
verbal protocol data and sophisticated statistical approaches such as differential item functioning
(DIF) analysis, both of which are common methodological and analytical approaches in language
testing research (Bachman 2000). This mixed method study was elegantly designed and executed
and has led to a number of publications (e.g. Abbott 2007). The same characteristics and
contribution are also seen in the study by Gao (2007) discussed below, again in the area of
reading assessment.
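To illustrate the matching logic behind DIF analyses such as the one Abbott reports, the sketch below computes the Mantel-Haenszel common odds ratio, a simpler and widely used DIF statistic. It is not SIBTEST, which matches examinees on a regression-corrected subtest score; the sketch omits that correction. The counts are invented for illustration, and treating Mandarin speakers as the reference group and Arabic speakers as the focal group is an assumption of the example, not a detail of Abbott's design.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Responses to one item from examinees matched on the same total-score level."""
    ref_correct: int
    ref_incorrect: int
    focal_correct: int
    focal_incorrect: int

    @property
    def n(self) -> int:
        return (self.ref_correct + self.ref_incorrect
                + self.focal_correct + self.focal_incorrect)


def mantel_haenszel_alpha(strata: list[Stratum]) -> float:
    """Common odds ratio across score strata; values above 1 favor the reference group."""
    num = sum(s.ref_correct * s.focal_incorrect / s.n for s in strata if s.n > 0)
    den = sum(s.ref_incorrect * s.focal_correct / s.n for s in strata if s.n > 0)
    return num / den


def mh_d_dif(alpha: float) -> float:
    """ETS delta-scale version; negative values mean the item disadvantages the focal group."""
    return -2.35 * math.log(alpha)


# Hypothetical counts for one item at three total-score levels (low, middle, high).
# Reference group = Mandarin speakers, focal group = Arabic speakers (assumed labels).
strata = [
    Stratum(ref_correct=20, ref_incorrect=30, focal_correct=12, focal_incorrect=38),
    Stratum(ref_correct=35, ref_incorrect=15, focal_correct=25, focal_incorrect=25),
    Stratum(ref_correct=45, ref_incorrect=5, focal_correct=40, focal_incorrect=10),
]

alpha = mantel_haenszel_alpha(strata)
print(f"alpha_MH = {alpha:.3f}, MH D-DIF = {mh_d_dif(alpha):.2f}")
```

An odds ratio above 1 (a negative delta value) indicates that, among examinees with the same total score, the reference group is more likely to answer the item correctly. In an actual analysis one would also test the statistic for significance before flagging an item.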
Gao's (2007) study addressed the call to integrate cognitive psychology with assessment
to inform test design and validation and to provide detailed diagnostic feedback. One
approach to this is to model item statistics, in particular item difficulty. This study used
a cognitive-psychometric approach to model the reading items included in the Michigan
English Language Assessment Battery (MELAB) and tested the model in four stages, the last
three of which served to validate it:
a. An initial cognitive model underlying MELAB reading performance was hypothesized, based
on a review of the processes associated with reading and reading test taking by ESL learners.
b. Raters analyzed the cognitive demands made by the test items.
c. Students' verbal reports of the processes they used to arrive at the correct responses
were collected.
d. The relationship between the proposed cognitive processes and item difficulty estimates
was examined using tree-based regression.
The study demonstrated the value of using multiple sources of evidence, using a mixed
method design and employing different participants (e.g. raters and test-takers) to evaluate a
cognitive model and the relationship between cognitive psychology and language assessment.
Considering the interdisciplinary nature of language testing (Bachman 2000), studies of this
nature have particular strength and make an original contribution to our understanding of
test validation.
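Gao's final stage used tree-based regression to relate hypothesized cognitive processes to item difficulty. The sketch below shows only the core step of such a method, a single split of a regression tree: each item is coded 0/1 for whether it is assumed to require a given process, and the split that most reduces the sum of squared deviations in difficulty is chosen. The item difficulties and attribute codings are hypothetical, and a full tree would recurse on each branch and be pruned; this is not a reconstruction of Gao's model.

```python
from typing import Optional


def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(difficulty: list[float],
               attributes: dict[str, list[int]]) -> Optional[tuple[str, float]]:
    """Pick the binary attribute whose 0/1 split of items leaves the smallest total SSE."""
    best: Optional[tuple[str, float]] = None
    for name, codes in attributes.items():
        without = [d for d, c in zip(difficulty, codes) if c == 0]
        with_attr = [d for d, c in zip(difficulty, codes) if c == 1]
        if not without or not with_attr:
            continue  # a split must put at least one item on each side
        within = sse(without) + sse(with_attr)
        if best is None or within < best[1]:
            best = (name, within)
    return best


# Hypothetical difficulty estimates for eight items (higher = harder).
difficulty = [-1.2, -0.8, -0.5, 0.1, 0.4, 0.9, 1.1, 1.5]
# Hypothetical item-by-process codings (1 = item assumed to require the process).
attributes = {
    "drawing_inference": [0, 0, 0, 0, 1, 1, 1, 1],
    "matching_vocabulary": [1, 0, 1, 0, 1, 0, 1, 0],
}

name, within = best_split(difficulty, attributes)
print(f"best split: {name}; SSE {sse(difficulty):.3f} -> {within:.3f}")
```

The attribute chosen first is the one that accounts for the most variation in difficulty, which is how a tree-based model can suggest which hypothesized processes make items harder.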
Zheng (2010) examined motivation, anxiety, global awareness and linguistic confidence
and their relationship to language test performance within the context of Chinese university
students taking the College English Test Band 4 (CET-4) in China. Using a mixed method
approach, through concurrent survey and interview inquiries, this study explored whether
and how the selected psychological factors contributed to students' CET performance.
Results from exploratory factor analysis revealed that Chinese university students displayed
three types of instrumental motivation (mark orientation, further-education orientation
and job orientation), two types of anxiety (language anxiety and test anxiety) and two
types of confidence (linguistic confidence and test confidence). The results of confirmatory
factor analysis led to a modified socio-educational model of motivation with some context-specific concepts (new instrumental orientations, global awareness and linguistic confidence)
that more accurately represented the characteristics of the Chinese university students.
The results of structural equation modeling confirmed that attitude toward the learning
situation and integrative orientation were two strong indicators of motivation, which in
turn influenced language achievement and confidence. The negative impact of anxiety on
language achievement was confirmed. Certain group differences were found between male
and female students, high and low achievers, students from the arts programs and those from
the science programs, and students who started to learn English before Grade 7 and those
who did so after Grade 7. The interview findings further confirmed stronger instrumental
than integrative orientations within the study context. This study tested well-developed
motivation and anxiety models in the Chinese context and expanded our knowledge of
theory development in English language education. The implications of this study point
to the importance of understanding language test-takers' characteristics in their macro- and
micro-learning contexts, thus providing evidence for the meaning of test scores and informing
test score use.
What impressed us is that all three studies above 1) used multiple sources of evidence
to support their findings; 2) collected data concurrently or sequentially at different stages;
and 3) conducted data analysis using advanced statistical procedures alongside sophisticated
interpretative or qualitative analysis. All three are of high quality and have been published
(cf. Abbott 2007; Gao & Rogers 2011).
Farnia (2006) and Limbos (2005) took a different approach, collecting data through
individual testing of reading with younger students. Both studies therefore addressed the
nature of reading development at the school level and involved students learning English as
a first and a second language. Farnia (2006) was concerned with uncovering the patterns
of growth in text-reading fluency and reading comprehension in two groups of students:
children with English as a second language (ESL) and those with English as a first language
(EL1). The study explored the role of linguistic and cognitive predictors in understanding this
growth. The sample consisted of 50 EL1 and 107 ESL children whose performance on the
potential predictors of memory, phonological processing, spelling, word level reading skills
and oral language proficiency (vocabulary and grammar) was assessed yearly from Grade 1
to Grade 6. Text-reading fluency and reading comprehension were tracked from Grade 2 to
Grade 6.
Three interrelated studies were conducted. Study 1 focused on growth trajectories in
lower-level (word-level) and higher-level (comprehension and fluency) reading skills in Grades 1–3
and Grades 4–6. Hierarchical Linear Modeling indicated that ESL children developed faster
than EL1 children in word-level reading skills, aspects of phonological processing, memory
and oral language. Spelling and language proficiency skills were significant concurrent and
longitudinal predictors of reading comprehension in Grades 4–6. Study 2 revealed similar
patterns of text-reading fluency trajectories for the EL1 and ESL children. Regardless of
initial text-reading fluency level and language group, students who were able to read words
fluently at the outset demonstrated a faster growth rate from Grades 2 to 6. Study 3 further
explored the facilitative relationship between text-reading fluency and reading comprehension in Grades 4–6,
and the results showed that such growth among the EL1 and ESL students contributed
to their later performance in reading comprehension. This study considered implications
for assessment, the identification of developmental risk factors, and practical means to improve
text-reading fluency and reading comprehension for ESL children in comparison to EL1
students. Studies of this nature are extremely important for countries like Canada where
schooling involves both groups of students with reading development at various levels.
Limbos (2005), in a similar study, explored the common misconception that assessment
of reading disabilities in ESL students must be delayed for two or three years until oral
language skills have developed. Using a battery of measures assessing various domains of
reading, memory and cognitive and oral language, the study collected data from 339 Grade
1 students (107 EL1 students and 232 ESL) and 253 Grade 3 students (80 EL1 and 173 ESL).
Confirmatory factor analysis corroborated the theoretical constructs of Phonology, Oral
Language, Verbal Memory and Reading at Grades 1 and 3. Structural equation modeling
was used to examine the relationship of these constructs to Grades 1 and 3 reading criteria
and supported the phonological-core deficit model of reading. Logistic regressions showed
that at both Grades 1 and 3, a composite Phonology score consistently made a significant,
unique contribution to the prediction of at-risk EL1 and ESL students. The data suggested
that it is possible to identify EL1 and ESL students who are highly or moderately at risk at
Grade 1. As mentioned above, the Limbos (2005) and Farnia (2006) studies are of critical
importance to Canada because both EL1 and ESL students were studied and compared
within the same classroom context. Such studies have important implications for teaching
and assessment in multilingual and multicultural classrooms and important impacts on policy
within the Canadian education context.

3. Washback (impact) and ethics


Research into washback and the impact and consequences of testing has flourished for the past
25 years and has become a prominent area of research in language testing. While traditional
washback research focused on the teaching and learning context, impact studies have
expanded it to a broader range of test stakeholders in ever-larger contexts. The issue of ethics
in language testing is also relevant here, since, whatever the scope of the study, it is at the core of
language testing research relating to test validation and the concept of consequential validity
(Messick 1989). Messick and others have argued that test consequences lead to inferences
about test performance and thus influence test validation (Messick 1996; Kane 2002; Mislevy,
Steinberg & Almond 2003; Moss et al. 2006). McNamara & Roever (2006) argue that
one of the greatest challenges facing these researchers is the clear separation between the
psychometric, positivist (or post-positivist) orientations to interpretation about test-taker
performance and the messy, socially constructed nature of consequences. Addressing this
relationship tends to improve the quality of the research, a fact that is demonstrated in the
six studies reviewed below.
The six studies that expand these discussions on washback and impact in language testing
are those by Fleming (2007), Sterzuk (2007), Mullen (2009), Baker (2010), Shih (2006), Tan
(2009) and Wang (2010). Of these, four might be considered traditional washback research,
collecting data from teachers and students, while two (Fleming 2007; Sterzuk 2007) employed
a more philosophical framework to investigate power and identity issues that arise from testing.

Fleming (2007) drew on the voices of Punjabi-speaking immigrants enrolled in a
government-sponsored ESL program and explored how these adult newcomers to Canada
constructed new national identities. The study, using survey methods of questionnaires and
interviews, traced how the common threads in their conceptions of citizenship compared to
those embedded within the national ESL assessment, the Canadian Language Benchmarks
Assessment (CLBA), and its associated curriculum documents. It further illustrated how these
documents construct and position idealized conceptions of second language (L2) learners.
Assessment and curriculum have the power to construct identity, especially in immigrant
countries like Canada, and also shape teaching and learning within their context (Shohamy
& McNamara 2009). The study showed that there were significant gaps between the national
assessment of the CLBA (and the associated curriculum documents used in this context)
and the range of views expressed by the learners in this study. Based on this research,
Fleming (2007) outlined some implications associated with L2 citizenship education in terms
of research priorities, national curriculum development and pedagogical treatment options.
In addition, three specific recommendations were made with regard to curriculum content:
that citizenship content be made more explicit within the Canadian national assessment
program and curriculum documents; that this content emphasize positive representations of
learners in documents as active and socially integrated; and that this content be centered on
the legalistic aspects of citizenship and avoid the use of singular normative cultural standards.
Sterzuk (2007), like Fleming (2007), provides an alternative way of conducting washback- and impact-related studies, employing a philosophical rather than a validity framework
(see the discussion in Cheng 2008). This study, using a postmodernist and critical inquiry
approach, focused on the negotiation of power in schools and the social and academic
experiences of First Nations⁴ and Métis children in Saskatchewan, who speak a non-standard
variety of English called Indigenous English. The study explored these learners' experiences
in a class of 25 and also interviewed 11 educators to explore their perceptions of the learners'
literacy and language ability. The results of the study indicate that the First Nations children
speak a dialect of English that differs phonologically, morphologically, syntactically and
lexically from Standard English. They also differ in discourse behaviors from their White
settler classmates. The study examined the children's speech and classroom behaviors in
order to identify important characteristics of their discourse. The findings derived from
the educators' perceptions have implications for teaching, literacy development, classroom
management, evaluation and, in particular, for the referral of these students for speech and
language assessment. The context of the study is extremely important in that it was the only
one over this five-year period that researched Aboriginal students in Canada; the rest of the
doctoral research reviewed here involved ESL and international students (test-takers).

4 The Canadian constitution recognizes three groups of Aboriginal people (the original peoples of North America and
their descendants): Indians (commonly referred to as First Nations), Métis and Inuit. These are distinct peoples with unique
histories, languages, cultural practices and spiritual beliefs. More than one million people in Canada identify themselves
as Aboriginal, according to the 2006 census. See www.aadnc-aandc.gc.ca/eng/1100100013785/1304467449155

Baker (2010) explored one case of high-stakes language assessment: English proficiency
assessment for teacher certification in Quebec, Canada. She conducted three interrelated
studies as a manuscript-based thesis (thesis by publication). The first examined the final
administration of a writing test, the second, the pilot administration of a new replacement
test, and the third, the first official administration of the new test. This sequential and mixed
method design allowed the researcher to examine different aspects of the test's impact on
stakeholders and raters. This program of research is critical in that 1) it integrated social
and political values into the test validation process, as suggested by Cheng (2008) and
Moss et al. (2006); 2) test stakeholders, including the test-takers (teachers applying for
certification) themselves, were not only consulted but also determined to a great extent the
direction of the research program (test-takers are increasingly being studied, as they bear the
highest stakes of all; Fox & Cheng 2007; Cheng & Deluca 2011); and 3) the conflicting views and
competing interests of the stakeholders were embraced. Similar results were found by previous
studies (e.g. Fox 2003; Qi 2007). Baker's three interrelated studies (2010) demonstrate a responsible
and progressive approach to researching the assessment of language proficiency for professional
certification, an area of language assessment that has been relatively less researched.
Shih (2006), Tan (2009) and Wang (2010) examined, primarily using survey methods
(questionnaires and interviews), the impact and washback of testing programs elsewhere in
the world. Shih (2006) used a case study approach and investigated a range of stakeholders'
perceptions of the Taiwanese General English Proficiency Test (GEPT) and its washback
on schools' policies and on the teaching and learning of English. The GEPT is a test of English
language proficiency for Chinese learners of English at all levels and has gained societal
acceptance for educational and employment purposes in Taiwan. The study was conducted
at applied foreign language departments in two technological (vocational) institutes. The
principal research methods used were interviews with the departmental chair, teachers,
students and their immediate family members, and observations of English language courses.
Such research methods are often used in traditional washback studies. However, Shih's
inclusion of family members as stakeholders is unusual in studies of this nature. His findings
revealed several features of the GEPT's washback on teaching and learning within the
research context. The dissertation concluded by proposing a washback model of students'
learning; this will be very useful for future research into washback, as this area of empirical
research is relatively new (about 20 years old) and is in need of both theoretical and methodological
guidance.
Wang (2010) explored the washback effects of the CET (College English Test) on teacher
beliefs, interpretations and practices, in particular how the teacher factor is manifested in
the washback phenomenon as proposed by Watanabe (2004). Wang (2010) also investigated
the pedagogical as well as the social and personal influences of the test on teachers' beliefs,
interpretations and practices within the Chinese tertiary context. Participants were 195
tertiary-level EFL teachers in non-English programs. The study asked whether tests are a
major constraint on College English instructional innovation in China and, if so, which aspects
of the teacher factor (e.g. teacher beliefs, knowledge or experiences) are relevant and present
a barrier to the implementation of instructional change. A mixed methods approach combining
a teacher survey and in-depth case studies was used to collect data. Qualitative analysis
involved the use of the constant comparative method, while quantitative analysis involved descriptive
and inferential statistics (e.g. exploratory factor analysis, confirmatory factor analysis and
structural equation modeling). The findings from this study suggest that the CET, coupled
with various interrelated components of the teacher factor, does indeed create a washback
effect. Given the complexities underlying the washback phenomenon, educational change
in curriculum and assessment is not sufficient on its own to bring about change in teachers'
pedagogical strategies, as previously suggested by Cheng (2005) and Wall (2005). Wang
(2010) emphasized that for fundamental changes in teacher practice to occur, they must be
accompanied by changes in teachers' knowledge, beliefs, attitudes and thinking that
inform such practice.
Tan's (2009) study, in a secondary setting in Malaysia, examined a change in the language
of instruction for mathematics and science subjects from Bahasa Malaysia to English. This
policy has two objectives: to promote student learning of mathematics and science, and to
increase student proficiency in English. The Education Ministry also chose to create a
washback effect by introducing a bilingual high-stakes secondary exit examination. As is
common in applied linguistics, a new test (a revised testing system) was introduced to bring
about change: what is assessed becomes what is taught. This is an area that has been studied
by many researchers (e.g. Cheng 2005; Wall 2005). To examine the policy, the study drew on
the insights and perspectives offered by the literature on educational change, content-based
instruction and washback in language testing. This was a mixed methods study involving
both longitudinal and cross-sectional research designs. The results point to the complexity of
educational change processes and indicate that the classroom implementation of this policy
is affected by multiple factors.
Mullen (2009) studied the impact of using a proficiency test as a placement tool. This
is not an impact and washback study in the traditional sense, in that the study focused on
the impact (area, extent and direction) of using a test for a purpose for which it was not
originally designed. In particular, the research questions asked whether a standardized
proficiency test, the Test of English for International Communication (TOEIC), had an impact
when used for placement purposes, with impact defined as one of the six qualities in the model
of Test Usefulness (Bachman & Palmer 1996). Increasing numbers of such studies are being
carried out, as tests designed for one purpose are retrofitted for another (e.g. Fox 2009;
Doe 2011; Fox & Hartwick 2011). In Phase I of this sequential mixed methods evaluative
study, 15 teachers and 677 North American university students who were studying English
as an L2 answered questionnaires about their experiences of taking the TOEIC. First, the
results showed that the group of 126 misplaced student participants differed significantly
from the 551 correctly placed students on the dimensions of college English experience,
age, and first and second languages. The first of these was correlated most strongly with
misplacement. These differences were found to relate to the TOEIC and its practice of
predicting productive skill ability from formally measured receptive skill ability. Second,
teachers who held Master's degrees had significantly fewer student failures in their classrooms
than teachers with Bachelor's degrees. In response to the study's finding that impact exists
in the form of student misplacement, Phase II used teacher and student interviews focusing on
their experiences related to misplacement. Interviews with five teachers, who held a Bachelor's
or Master's degree, and 13 students, eight misplaced and five correctly placed, supported the
findings of Phase I, showing that the practice of predicting productive ability from receptive ability
resulted in misplacement for certain groups of students. In turn, this misplacement had
three consequences for stakeholders, involving the apparent validity of the TOEIC for
making course-level changes, the misplaced students' willingness to learn, and the teachers'
decision to award pass grades to the misplaced students. There is evidence in this study that
when a proficiency test is used for placement purposes, misplacement occurs with identifiable
groups of students and affects all stakeholders. Studies of this nature are extremely important
not only for Canada but also for English-medium universities around the world, where a
fast-increasing number of international students choose to pursue their degrees in English.
One of these students' first steps is to take an English entrance test and be placed into
English programs on the basis of their scores.

4. Raters, rating and scales


The doctoral dissertations discussed in this section, by Barkaoui (2008), Isaacs (2010) and
Kim (2010), are representative of a dominant trend in language testing and assessment
research, namely, the investigation of the complex interactions between raters, ratings and
scales in the evaluation of L2 spoken and written performances. All three researchers use
mixed method approaches (Creswell & Plano-Clark 2007; Teddlie & Tashakkori 2009),
explicitly acknowledging their perspective as PRAGMATIC (Greene, Caracelli & Graham 1989),
and arguing that mixed method research inevitably triangulates findings by drawing from
different sources of evidence about the same phenomena, thus increasing the potential
validity of the findings. Taken together, these three dissertations extend an important line of
validity inquiry, which has risen in prominence over the past decade due to the landmark
work of such researchers as Fulcher (1996) and Turner & Upshur (2002), who argued for
empirically-derived scales in the measurement of written and spoken performance.
Barkaoui (2008) explored the effects of scoring method (holistic or multiple-trait) and
rater experience on the outcomes of ESL essay rating. Holistic scales direct raters' attention
to a general, overall impression of the quality of a written text, whereas multiple-trait scales
direct their attention to specific features of writing (e.g. vocabulary, verb tense agreement
and use of transitions). Interestingly, he found that both methods measured the same or
similar constructs in his study; however, holistic methods appeared to support greater
inter-rater reliability, whereas multiple-trait methods allowed for finer-grained evaluations
of writing and supported higher rater self-consistency, especially for novice raters. His
findings suggest that focusing on the criteria in a multiple-trait scale encouraged more
self-monitoring strategies and required raters to attend to a wider range of features in the
writing and to use all of the rating criteria in the scale.
Barkaoui reports that novice raters differed from experienced raters in his study in that
they varied more in their judgments, focused more intently on specific features of the scale,
and took more time. None of these findings is particularly surprising, as they are consistent
with those reported in the research literature on rater-scale interactions and the role of
experience (and training) in rater response (see, for example, Cumming 1990; Hamp-Lyons
1990, 1995; Weigle 1994; Barkaoui 2007). Barkaoui's dissertation is remarkable, however, in
its meticulous analysis of both qualitative data, elicited through interviews and think-aloud
protocols, and quantitative data, utilizing a multi-faceted Rasch approach (FACETS) to
"examine the effects of scoring method on (a) estimates of examinee ability, (b) rater severity,
(c) rater self-consistency, and (d) biased interactions between facets in the rating context"
(Barkaoui 2008: 100). He not only combines qualitative and quantitative approaches in
examining rater-scale interactions, but also demonstrates an understanding of the implications
of mixed method research that other researchers might choose to ignore or underplay.
For example, in drawing on think-aloud protocols and interviews, he builds into his study
procedures to assess the effects of thinking aloud, in order to examine the impact of these
research techniques on his findings, inferences and conclusions. In so doing, he points out that
"there are real advantages for combining methods in test validation research provided that
one does not adopt a naive view that such approaches will be unproblematic" (Barkaoui
2008: 226).
Kim (2010) also engaged in a complex, two-phase, mixed method validation study of a
criterion-referenced scale or checklist. She investigated scale development and validation
within the under-researched (Fox 2009; Fox & Hartwick 2011) and still under-theorized
(Alderson 2007) area of diagnostic assessment of L2 academic writing development in the
context of classroom-based assessment. In phase one, the scale criteria for the Empirically-derived
Descriptor-based Diagnostic (EDD) Assessment, the focus of her research, were
elicited through think-aloud protocols provided by nine ESL teachers. The teachers rated
ten L2 essays, written in response to prompts drawn from the Test of English as a Foreign
Language (TOEFL), and suggested the diagnostic feedback they would provide to students
based on their evaluations. As a result, criterion-referenced descriptors of academic writing
were identified as the basis of the EDD checklist (39 descriptors were initially identified;
further analysis reduced the number to 35).
In phase two, the EDD checklist was piloted, modeled and evaluated. In order to determine
its generalizability across raters and essay prompts, seven ESL teachers used the EDD
checklist to assess 80 TOEFL essays. They also completed a questionnaire in which they
accounted for their scores and scoring process, and were subsequently interviewed about
their use of the checklist. Scores were analyzed for their generalizability using FACETS.
Kim also investigated the relationship between the EDD checklist and performance on other
measures of ESL academic writing, comparing her EDD results with those of professional
Educational Testing Service (ETS) raters, who are trained and calibrated to mark TOEFL
essays consistently in order to provide reliable (and defensible) scores on this high-stakes
proficiency test. In her main study, ten ESL teachers assessed 480 TOEFL essays using
the EDD checklist, and the scores were further analyzed to examine the dimensionality
of the construct of L2 (or ESL) academic writing. Finally, the classroom teachers' questionnaire
and interview responses were analyzed to evaluate positive and negative washback (Cheng,
Watanabe & Curtis 2004) at the classroom level resulting from the use of the EDD checklist.
Like Barkaoui (2007), Kim applied FACETS to examine marking consistency and found
evidence that supported Barkaoui's (2008) findings, namely that "it is hard to achieve a high
degree of agreement between raters who are not professional assessment raters" (Kim 2010:
137). She notes that the patterns in her own study are very similar to those reported by
Barkaoui (2008), who found that teachers reached a level of agreement of 22.4% when
using a nine-point holistic scale, which increased to 23.1% with a nine-point multi-trait or
analytic scale, and that experienced raters exhibited a higher degree of agreement (26.1%)
than novice raters (20%).
Kim's (2010) study is extremely complex methodologically, weaving qualitative and
quantitative approaches together across two phases with multiple embedded studies and
five complex research questions. Like Barkaoui, she defines her approach as "grounded on
pragmatism" (p. 57) and explicitly acknowledges that "mixed methods research can reduce
biases and limitations inherent in a single method, while strengthening the validity of inquiry"
(ibid.). Of course, one of the dangers of such complex studies is that coherence is more
difficult to maintain. In this regard, Barkaoui's study is more effective. Isaacs (2010, see
below), who also uses a mixed method approach in her consideration of pronunciation in
speaking tests, adopts the increasingly popular manuscript-based thesis option (Baker 2010),
which is particularly suited to complex, multi-step, mixed methods research but remains
quite rare in Canadian doctoral research.
At the beginning of her dissertation, Isaacs explains that she has chosen to prepare a
manuscript-based thesis, comprising "a collection of three studies that are part of the same
overall program of research, in lieu of a traditional thesis" (Isaacs 2010: iv). All the studies
were co-authored, with Isaacs as first author in each case, and one of the studies (Isaacs
& Trofimovich 2011) had been published by the time she completed her dissertation. The
advantages of such a manuscript approach in doctoral dissertation research are clear, not
only in terms of the inherent coherence of the dissertation itself but also with regard to
the next steps for doctoral graduates, who are able to apply for positions following degree
completion with a clear program of research and a list of publications in peer-reviewed
journals.
Isaacs's (2011) program of research examined variation in raters' judgments of L2
pronunciation, an under-researched construct that continues to be "the EFL/ESL orphan"
(Gilbert, cited in Isaacs 2010: 184). Specifically, she states that her purpose was to "better
understand major constructs in L2 pronunciation research, improve current measurement
practice, and, ultimately, reinvigorate the conversation on L2 pronunciation assessment,
which has been virtually absent from the research agenda since the time of Lado (1961)"
(Isaacs 2010: 24).
In study one, with her co-researcher (Isaacs & Trofimovich 2011), Isaacs takes a particularly
novel approach, recruiting 30 music majors and 30 non-music majors to rate 40 L2 speech
samples for three key features of pronunciation: comprehensibility, accentedness and fluency.
Given their musical experience, music majors are more attuned to SOUND qualities and have
a heightened sense of the musicality, rhythm, flow, cadence and clarity of speech. Isaacs reports
that music majors assigned significantly lower ratings than non-music majors for accentedness,
particularly for low-ability learners, but phonological memory and attention control did not
influence their ratings. In study two, Isaacs and her co-researcher investigate the effects of
scale length and rater experience on raters' judgments of the same three features. The results
showed that experienced and novice raters achieved a high degree of consensus about the
highest- and lowest-scoring L2 speakers, but had difficulty differentiating between scale levels
in the absence of guidance from the rating instrument. Study three is the direct outcome of
studies one and two, as Isaacs confronts the key issue, "what is the construct?" (Bachman 2007:
41), recognizing that the numerical scales used in studies one and two did not sufficiently
articulate what separated one scale point from another in terms of the specific performance
features that raters should attend to. Like Barkaoui (2008) and Kim (2010), Isaacs attempts
to construct an empirically-driven L2 scale that "builds into the scoring rubrics the qualities
of the L2 performance that appear to be most salient to raters" (Isaacs 2010: 127).

5. Classroom-based research: Teaching, learning and assessment


Six of the dissertations considered in this review focused on teaching, learning and assessment
in language classrooms across a range of program types, language learners and purposes
for learning. We divide our discussion of these dissertations into two sub-sections: those
investigating adult language learners and those investigating young language learners (YLLs).
This approach is consistent with the research literature, which typically recognizes that the
learning issues and language needs of YLLs differ considerably from those of older or adult
language learners (McKay 2006).

5.1 Older or adult L2 learners


Four of the dissertations, by Colby (2010), Neumann (2010), Seror (2008) and Song (2007),
investigated the learning, teaching and assessment of older or adult L2 learners, focusing
on issues arising from the demands for academic literacy in pre-university or university-level
courses. Both Colby (2010) and Neumann (2010) applied mixed method triangulation
designs (Creswell & Plano-Clark 2007) in their investigations of L2 classroom contexts, while
Seror (2008) conducted an eight-month longitudinal ethnographic case study focusing on the
writing development of five Japanese exchange students (and four of their instructors) at a
large Canadian university. His qualitative study examined the role of feedback in these L2
students' writing development, contrasting "ideal and actual representations of what feedback
practices meant" (p. 40) to the participants.
While Colby (2010) and Neumann (2010) focus on classroom-level practices, Seror (2008)
conceptualizes a model of feedback as "a literacy practice that simultaneously works to
accomplish various pedagogic, institutional, economic, and socialization functions" (p. 155).
This FEEDBACK model is informed by "larger social theories of academic literacy and
knowledge construction" (p. 156) and encompasses contextual factors such as the larger
institutional discourses (both public and private) "which surround and affect the way that a
specific literacy practice is conceptualized, justified, valued, and ultimately executed" (ibid.).
Using an elegant and engaging narrative style, Seror (2008) brings his participants to life
as he explores the impact of feedback practices on their writing, learning and identities,
focusing on the impact of "critical incidents . . . where students reacted and/or interacted
with comments in the texts they had written" (p. 40). He demonstrates the benefits of
longitudinal ethnographic case study research, as he skillfully traces the impact of discursive
micro-level reactions and interactions on the participants over time (e.g. examining students'
writing and their perceptions of instructor feedback in the margins of an essay; observing
students' interaction, or lack of it, with their peers or their instructors, in and out of
class). Through methodical attention to detail he documents feedback events (within and
outside the classroom) that affect the socialization of these L2 learners. Ultimately, this
allows him to suggest how these processes "unfold in both predictable and unpredictable
fashions" (p. 156) within particular contexts. This leads him to propose a model of
feedback across the dimensions of text, classroom and institution, which serves "pedagogic,
socialization, institutional and economic functions" (p. 147). Seror's study thus takes him
from a consideration of micro-level texts and individuals to discussions of macro-level social
and institutional contexts. In contrast, Neumann (2010) focused her research on what two
teachers of writing attended to when assessing the writing of 33 L2 students at an
English-medium university in Quebec, and on how these teachers' feedback affected their
students' learning.
The overall purpose of Neumann's complex, mixed method, multi-phase study was to
investigate how teachers of writing assess the grammatical ability of their university-level
L2 student writers. In phase one, she conducted a statistical analysis of 33 students' essay
exams, hand-tagging, coding and tallying features of their writing that have been linked in
the research literature to grammatical ability (e.g. clauses, sentences and t-units, and
morphological and syntactic errors). She then used multiple regression analysis to determine
which type of feature, ACCURACY MEASURES (morphological errors/t-units) or
COMPLEXITY MEASURES (syntactic errors/t-units), best predicted the grammar grades assigned
by the teachers. Simultaneously, she applied systemic functional linguistic analysis (e.g. Eggins
2004) to assess the students' ability to manage information in their texts.
In phase two, she administered a questionnaire survey and conducted interviews with the
students to evaluate their knowledge and understanding of their teachers' assessment criteria
for grammar. Finally, in phase three, she interviewed the two teachers about the criteria
they applied in assessing the grammatical ability of their students, and elicited their views
regarding priorities.
Neumann reports that grammatical accuracy was "the single most important assessment
criterion for [these] L2 writing teachers" (2010: 79). By including students in her study,
she is able to identify unintended consequences of the teachers' focus on accuracy in
writing, which, as she points out, inadvertently communicated a highly reductive construct
of grammatical ability in academic writing to their students and ultimately had a negative
impact on the learning potential of these students. Like Seror (2008), Neumann finds a
gap between the teachers' intentions and the students' interpretation and use of feedback,
and concludes that "[t]he teachers in this study certainly took the responsibility of writing
assessment seriously" (Neumann 2010: 180), but were largely unaware of "the tensions and
contradictions that resulted from the interplay . . . between the teachers' pedagogical and
assessment goals (to improve students' linguistic accuracy in writing), and the resulting, albeit
unintended [negative] washback effect (students' error avoidance strategies)" (ibid.).
She notes that the teachers aimed to reduce error through their feedback by drawing
their students' attention to what was incorrect in their writing. The students did become
aware, but instead of learning more about writing by taking risks and trying out new ways
of expressing their ideas, they avoided potentially negative comments on grammatical errors
by relying on structures that were already familiar to them.
Song (2007) also explores learners' perceptions, but from a phenomenological perspective
(Creswell 1998; Creswell & Plano-Clark 2007), although she does not identify her study as
phenomenological and prefers to use the overarching term NARRATIVE INQUIRY (Clandinin
& Connelly 2000) to describe her approach. She briefly discusses "interpretive biography" (Song
2007: 51) as another possible perspective on her method. However, from the beginning of
her study, her approach is consistent with phenomenological research. As Volkmann (cited in
Leedy 1997) points out, "Attention to experience and intention to describe experience are the
central qualities of phenomenological research" (Leedy 1997: 88). Or, as Leedy observes in
defining phenomenological research designs, "the researcher often has personal experience
with the phenomenon and aims to heighten his or her own awareness of the experience while
simultaneously examining the experience through the eyes of other participants" (1997: 161).
Song, having been a Chinese ESL learner herself, frames her stories of six Chinese ESL
learners' perceptions of classroom assessment within her own "story, lived and told" (Song
2007: 1). She weaves theories of classroom assessment into representation and interpretation,
seeking to arrive at themes or patterns in her own experience and those of her participants that
are "invariable across all manifestations of the phenomenon" (Tesch 1994: 197). Her open-ended
approach focuses on relationships as being "at the heart of thinking narratively . . . [and]
key to what it is that narrative inquirers do" (Song 2007: 176).
Song's "research journey" (p. 181) leads her to conclude that perceptions of classroom
assessment are embedded in curricular contexts and intimately related to second language
acquisition. She seems to argue for what is self-evident: that "a learner's perception of classroom
assessment varies from person to person, and alters with the changing landscape of the
classroom" (Song 2007: i). She begins and ends with recognitions that:
1. Assessment is a way of knowing, for both teachers and learners;
2. Assessment is power. Teachers have the power to impose assessment for a range of
pedagogical purposes, but students can also exercise power by assessing critically,
creatively, and reflectively (p. 105); and
3. Classroom assessment is a site of in-between (between process and product; learner and
teacher; learner and other learners; researcher and researched; theory and life, or at least
stories of life contained in the inquiry) (Clandinin & Connelly 2000, cited in Song 2007:
177).

Rather than exploring perceptions of assessment, Colby (2010) examines the role assessment
can play in supporting L2 acquisition. Specifically, she developed and implemented pedagogic
tasks in order to investigate their impact on the learning of L2 students in two intact,
pre-university English for academic purposes (EAP) classes. She subsequently collected evidence
of the students' use of a grammatical feature (the use of WOULD and WILL in contingent
use contexts) as a result of their exposure to the pedagogic tasks. The tasks were designed
according to Assessment for Learning (AFL) principles (see Colby-Kelly & Turner 2007
for details), so that Colby could "apply these AFL procedures in an L2 classroom setting,
and . . . investigate their effect on learning" (Colby 2010: ii). The investigation of AFL-inspired
tasks and procedures and their impact on learning led in her dissertation to evidence of

Downloaded: 02 Aug 2016

IP address: 134.117.10.200

534 SURVEYS OF PH.D. THESES

the ASSESSMENT BRIDGE (AB), which she defines as "the area of classroom practice linking
assessment, teaching, and learning" (ibid.).
Citing Creswell & Plano-Clark (2007), Colby describes her research design as "sequential,
exploratory mixed methods" (2010: 59). Her research is quasi-experimental because she is
working with two intact classes. She identifies a control and a treatment group and, through
the clever use of a split-plot design, manages to collect impressive evidence of the impact of
the AFL-inspired tasks on student learning over time.
Typically, a split-plot project takes place over two time periods, between which the treatment
and control groups and their teachers reverse roles (Hatch & Lazaraton 1991). However,
Colby (2010) had to alter the design in order to take advantage of participant availability
and address constraints imposed by the operational requirements of the school where the
research took place. In the end, therefore, three teachers and four different classes participated
in the study, in two time periods, over a total of six weeks. Colby's (2010) research provides a
window on the challenges and benefits of longitudinal work in classroom-based research. Her
innovative approaches to meeting operational constraints with alternative research strategies
provide a model for field researchers undertaking longitudinal studies in situ. The pedagogic
tasks she develops will be of interest to other EAP teachers who are implementing AFL
principles in their teaching, as will the suggested teacher training strategies she applies.

5.2 Young L2 language learners


The other two dissertations, by Kwan (2006) and Gunning (2011), investigated the teaching,
learning and assessment of young language learners (McKay 2006). Kwan examined
the impact of systematic phonics instruction on L2 learners as young as four (in junior
kindergarten, JK) or five (in senior kindergarten, SK), while Gunning explored ESL
students' strategy use and strategy instruction in Grade 6 classrooms.
Kwan (2006) makes the point that first-language (L1) literacy studies have demonstrated
that instruction in phonemic awareness and phonics for children at risk has improved their
literacy potential and reduced their risk of failure. He examines the impact of whole-class
instruction using the Jolly Phonics program on 240 SK children, aged five and six, with
regard to their oral language proficiency, phonological awareness and early literacy skill
development. Using a 3 × 2 between-groups factorial design, Kwan (2006) investigated two
independent variables: systematic phonics instruction (three levels) and language background
(L1 or L2). To examine the impact of phonics instruction, he investigated three groups of
children: group 1 had systematic phonics instruction over a two-year period, in both JK and
SK; group 2 had systematic phonics instruction during SK, but not in JK; and group 3 had no
systematic phonics instruction in either year of kindergarten. Kwan's results suggest there are
benefits in early, systematic phonics instruction for young ESL children. The children in his
study developed increased phonological awareness and demonstrated improved literacy skills.
They did not, however, improve their oral language proficiency as a result of phonics instruction.
Kwan finds evidence that the intensive phonics instruction improved phonemic awareness
and early literacy skills for the ESL learners in the study, who outperformed both their L1
and L2 counterparts who had not received systematic phonics instruction.
Kwan rightly points out that there is little research on YLLs' early literacy development,
and little is known about the specific impact of phonics instruction in supporting it, or about
how much phonics instruction is ideal. There has been concern that phonics instruction might
detract from an L2 child's acquisition of oral language proficiency. Since oral proficiency is
essential to a whole-language approach to reading (focusing on top-down meaning rather
than bottom-up sound-letter correspondence in learning to read), Kwan also examines the
impact of systematic phonics instruction on oral language proficiency. As mentioned above,
he does not find that phonics instruction impedes (or enhances) oral proficiency. The
benefits of systematic phonics instruction in Kwan's study, for both L1 and L2 readers, clearly
outweigh any concerns about the instructional time it takes up.
Kwan acknowledges a problem in his research design, which confounds the timing of
instruction with the amount of instruction. He notes that systematic pre-testing of his
participants in the year prior to the study would have provided him with baseline data
enabling him to compare children who completed one year of systematic phonics instruction
in JK with children who completed one year in SK. Nor was Kwan able to consider the role
of the language backgrounds of the L2 learners, who spoke many different first languages. The
way the phonics program was implemented by each of the participating teachers and schools
(every day or every other day) was also an acknowledged limitation of the study (Kwan 2006:
147). However, quasi-experimental research designs at the level of the classroom always face
such challenges (as evidenced in Colby's (2010) research). The importance of Kwan's research
is that it throws much-needed light on an under-researched area and provides valuable insights
into the early literacy requirements of ESL (and L1) readers.
Gunning (2011) combines a general questionnaire survey study with a case study in her
mixed methods investigation of 138 sixth-grade students studying ESL in the French-speaking
province of Quebec. She notes the critical need for a study of strategy use by elementary
school children because, in ESL, 'the curriculum requires teachers to integrate strategy training
and assessment into their teaching' (2011: 65). She investigates the relationship between
strategy instruction (SI) by ESL teachers and strategy use by learners, and how both affect
learning. She finds that the children reported using mainly affective and compensatory
categories of strategies, such as asking for help and risk-taking (2011: i), although this varied
with their level of proficiency in English. Children with high levels of proficiency in
English reported using more strategies (affective and cognitive). She also reports that whether
or not the children liked studying English had a significant influence on their strategy use.
The case study participants were drawn from the survey study participants and were divided
into two sub-groups in order to investigate the effects of SI on the students' in-class strategy
use and to assess the impact of their strategy use on success in ESL oral interaction tasks.
Gunning describes how 'two intact, similar groups of participants from two different schools
served as a treatment group (n = 27) and a control group (n = 26) in the quasi-experimental
part of the research' (ibid.). Like Colby (2010), Gunning devised pedagogic tasks, guided by
the principles of AFL, which were implemented at the classroom level.
Gunning's mixed methods study begins in phase 1 with the quantitative survey, followed
in phase 2 by a qualitative and quantitative investigation of strategy use at the classroom
level. Gunning identifies seven sources of evidence over the two phases of the study. The study's
complexity is evident in her 59-page methodology section, which includes a description of
the development and validation of the instruments used in the study, including the strategy
questionnaire, task-based questionnaire, oral interaction measures, strategy log, field notes,
video recordings and interview protocols. The richness and thickness of her descriptive
detail are such that only an experienced researcher could have managed the quantity of data involved.
She notes that 'the strategy assessment techniques and instruments, and the combination of
a case study with a quasi-experimental component, contributed to a mixed methods analysis
that allowed me to infer from the integration of the qualitative and quantitative data that the
SI helped raise the children's consciousness about strategies, and that the children's strategy
use facilitated their English oral interaction' (Gunning 2011: 240).
Although research with adult language learners still seems to dominate doctoral research
on teaching, learning and assessment, it is reassuring to see work like that of Kwan (2006),
Gunning (2011), Farnia (2006) and Limbos (2005) that systematically investigates the impact
of specific pedagogic approaches on the learning of YLLs. Gunning observes that classroom-based
assessment's (CBA's) time as a paradigm has arrived (2011: 238), and that research
is needed to investigate the specific characteristics of pedagogic tasks and procedures that
appear to support learning, and to collect evidence that such tasks and procedures directly
impact learning. Kwan (2006), Gunning (2011) and Colby (2010) have contributed to this
long-overdue research agenda, while Seror (2008) and Neumann (2010) provide thoughtful
consideration of the role of feedback in L2 learning.

6. Vocabulary learning, lexical proficiency and lexical richness


Baba (2007), Cervatiuc (2007) and Douglas (2010) investigated features of vocabulary
acquisition and development, lexical proficiency and/or lexical richness and their impact
on academic outcomes in adult, non-native English speakers (NNES). While Douglas (2010)
and Baba (2007) researched novice undergraduate writers engaging with the demands of
university study in a second language, Cervatiuc (2007) explored the factors that 20 highly
proficient NNES attributed to their success in vocabulary acquisition.
All of these researchers investigated vocabulary size and profiled the English lexical
proficiency of their participants in order to address their research questions. Baba (2007)
used a mixed methods approach to assess the English summary writing of 68 Japanese
undergraduate EFL students in Japan. After each student had written summaries on two
topics, she interviewed them using a stimulated-recall approach to elicit the types of
strategies the students reported using. She analyzed four features of lexical proficiency in
their writing: vocabulary size, depth of vocabulary knowledge, word definition ability and
lexical diversity. She also scored the summaries for their overall quality, ideational quality,
use of academic sources and extent of copying from source texts. Using the scores assigned to
the summaries as a guide, she then selected ten texts (five that were more effective, five less)
for additional textual analysis, paying particular attention to issues of lexical repetition and
topical structures. Discriminant analysis suggests that key differences between more or less
successful writers (n = 33/group) were associated with productive English abilities (scores
on summary writing, copying and word-definition ability), and both word definition and
summary writing performance were strong predictors of higher or lower levels of proficiency.
Baba (2007) argues that much more work is needed to fully develop the construct of
vocabulary knowledge as it is operationalized in tests, and to better understand the relationship
between vocabulary knowledge and writing performance. She found evidence that lexical
proficiency was 'crucial' (p. 157) for the students in her study, although not every
dimension of lexical proficiency was equally related to writing performance. She argues for
continued research on the role of tasks in relation to contexts in eliciting writing, more probing
interviews with writers to look in greater depth at their accounts of how and why they use
the words they do, and longitudinal research to trace the development of their vocabulary
knowledge over time.
Her calls for more longitudinal research into lexical richness in writing are answered in part
by the work of Douglas (2010), who investigated vocabulary knowledge as it was evidenced
in the lexical richness of novice native speaker (NS) and non-native English speaker (NNES)
undergraduate writers on a test of writing, and related initial measures of lexical richness to
academic outcomes. He found that 'vocabulary knowledge appears as an underlying variable
related to students' academic outcomes at university' (p. 206).
Douglas's quantitative study drew on a sample of 745 students (561 NS students and
184 NNES students) who were required to take a writing test upon entry to undergraduate
programs at the university where the study took place. These students allowed their tests to be
used anonymously for research purposes and filled in an information form, which provided the
researcher with demographic information and a means of linking their initial test performance
to university course outcomes. He used a corpus linguistic approach and lexical analysis tools
available online to investigate the degree of lexical richness in the students' written tests. A
number of tools were used to analyze the test material for indices of lexical richness, based on
a corpus the researcher developed from responses to the written entry test and to a provincial
test written at the end of Grade 12 as a high school matriculation requirement. Academic
outcome measures were derived through detailed analysis of university transcripts.
Like Baba (2007), Douglas found evidence of the importance of vocabulary knowledge,
lexical depth and breadth in eventual academic success at university. He found that NNESs
who do not have the depth or breadth of vocabulary knowledge of their NS peers often persist
(Fox 2005) in their academic study in spite of many failures and setbacks. As Douglas points
out, however, 'while [these] NNES students eventually graduated from university . . . they
were faced with diminished academic outcomes in terms of Grade Point Averages, Length
of Program, Courses Attempted and Not Earned, and Academic Standing' (Douglas 2010:
ii). Hierarchical regression analysis showed that lexical richness upon entry to university was
a significant predictor of academic outcomes. In this regard, Cervatiuc's (2007) research is
also particularly important. Her research aimed to gain insight into the factors
that account for high levels of lexical proficiency among highly successful academic or
professional adult NNES.
The 20 participants in Cervatiuc's study had arrived in Canada after the age of 18, scored
at native-like levels on tests of vocabulary size (i.e. from 13,500 to 20,000 base words) and
had vocabulary profiles consistent with native-like writing standards (Morris & Cobb 2004).
She found that both situational and linguistic factors had an impact on their acquisition
of vocabulary, but that 'the two underlying forces that drive their success and activate a unique
combination of situational factors, L2 input, individual differences, and learning strategies
are AWARENESS of inner and outer resources and WILLPOWER to consistently employ these
resources in order to make language and vocabulary gains' (Cervatiuc 2007: iv).
Baba's (2007) focus on metacognitive strategy use among her Japanese EFL writers and
Douglas's (2010) findings regarding PERSISTENCE support Cervatiuc's (2007) conclusions
regarding her 20 highly proficient NNESs. Drawing on a vocabulary size test, language
proficiency assessments, interviews, questionnaires and samples of her participants' writing,
Cervatiuc cross-analyzes or triangulates her findings, guided by principles of grounded theory
(Glaser & Strauss 1967; Hutchinson 1997).
Although she does not describe it as such, Cervatiuc's study employs a mixed methods
approach, with both quantitative and qualitative phases in her research design. In her
own view, her study is mainly interpretative (qualitative dominant) because meaning
is quintessential in interpretative research (Bogdan & Bicklen 1998), which focuses on
participants' perspectives or perceptions, values and attitudes (Cervatiuc 2007: 73). Taking
into account her participants' perceptions of how they acquired, used and developed their
lexical proficiency allows her to develop rich profiles of the 20 exceptionally successful L2
learners in her study and provides the initial groundwork for a theory of successful L2
vocabulary acquisition.

7. Conclusions and future directions


Examining such a large number of doctoral dissertations produced during the five-year
period 2006–2011 in Canada was a daunting task. Guided by the ILTA citation framework,
we have shown how these research studies are situated within the field of language assessment
(and more broadly within applied linguistics) and outlined their scholarly contribution. To
conclude, we would like to make two further points based on this review, and consider
future directions for doctoral research in language assessment, both within Canada and
internationally.
First, we can see the increasing use of mixed methods in comparison with single-method
studies. Indeed, 16 of the 24 dissertations considered here used mixed methods in their
research, taking a pragmatic view in combining qualitative and quantitative research methods
(see Table 1), although the strength of the mixed methods designs varies across these dissertations:
some designs are driven by research questions and some by the researchers' choice of methods
(Teddlie & Tashakkori 2009). When research methods are driven by research questions, the
link between questions and methods is stronger, and so are the results. When, on the other
hand, a design is METHOD-DRIVEN (in that the researcher clearly chose a certain
method because of a research tradition or personal preference), the overall impact of the research
may be merely technical, and the results weaker. All of these studies employed multiple study
phases, sometimes conducted concurrently but mainly consecutively. A majority of the
researchers adopted a program of research approach, reflecting the research direction of
the Social Sciences and Humanities Research Council (SSHRC)5 of Canada, the federal
funding agency. These 16 dissertations demonstrate superior methodological strength and
have made original contributions to the field of language assessment. Only four dissertations
were purely qualitative, and four purely quantitative. Some studies (e.g. Fleming 2007) used
more than one type of method but drew them from a single research tradition (in Fleming's
case, survey methods). We recognize the distinct strength of the increased use of mixed
methods and multiple-stage studies, as they naturally lead to a program of research rather
than to single studies. Although only two of the dissertations considered here (Baker 2010
and Isaacs 2010) were prepared as manuscript Ph.D.s, the advantages of this type of
dissertation were evident. We predict an increasing number of both mixed methods and
manuscript dissertations in Canada in the future.

5 SSHRC enables the highest levels of research excellence in Canada, and facilitates knowledge-sharing and collaboration
across research disciplines, universities and all sectors of society. www.sshrc-crsh.gc.ca/about-au_sujet/index-eng.aspx

Table 1 Canadian doctoral dissertations (2005–2011) in language assessment and testing, by methodological approach
Qualitative (4): Song (2007); Fleming (2007); Sterzuk (2007); Seror (2008)
Quantitative (4): Limbos (2005); Farnia (2006); Kwan (2006); Douglas (2010)
Mixed methods (16): Abbott (2005); Shih (2006); Gao (2007); Baba (2007); Cervatiuc (2007); Barkaoui (2008); Mullen (2009); Tan (2009); Baker (2010); Colby (2010); Isaacs (2010); Kim (2010); Neumann (2010); Wang (2010); Zheng (2010); Gunning (2011)
Total: 24

Second, we observed some trends in study participants. Participants in the doctoral
dissertations we reviewed tended to be L2 students (both immigrants and international
students) who were in school (K-12) and enrolled in ESL programs in Canada. The majority
were adult learners; only four studies dealt with young learners. Only one of the studies
considered Aboriginal language learners (Sterzuk 2007), an indication that this is an
under-researched area in the literature. These gaps may be due to the nature of doctoral programs
in Canada: most of the dissertations reviewed were produced within applied linguistics
departments rather than faculties of education. Some studies were devoted to teacher
participants alone (e.g. Baker 2010; Wang 2010) and some to student participants alone (e.g.
Limbos 2005; Farnia 2006), but most dealt with both teachers and students (e.g. Shih 2006;
Sterzuk 2007; Tan 2009). There was also a balance between studies conducted in Canada
(e.g. Abbott 2007 and Mullen 2009) and elsewhere in the world (e.g. China and Malaysia).
Indeed, these studies reflect the social composition of Canadian society and two trends: the
increasing number of international students on university campuses and of English language
learners in schools.
The breadth of the literature that these researchers drew on to inform their work was
particularly impressive, crossing disciplinary, epistemological and methodological boundaries.
The depth of their inquiry was evident in their management of complex research designs
and analyses. As a collection, breadth lent vigor and strength to the doctoral research
considered here, while depth ensured its quality. Taken together, these researchers have made
an important contribution to scholarship in language testing and assessment in Canada and
internationally.
What are the directions of future doctoral research on language assessment in Canada and
internationally? First of all, we expect more research to be conducted at the intersection of
assessment, teaching and learning, where testing is not viewed or researched in isolation, but
rather embedded within the context where it occurs. We also predict that an increasing
number of studies will explore the central role that assessment plays in teaching and
learning, as in classroom-based assessment, for example. This type of research will be more
interdisciplinary and contextually based, and will involve multiple stakeholders. Methodologically,
we will continue to see more studies using mixed methods, multiple methods and multiple phases,
employing, in a sense, a program of research approach. We will see more use
of a (multiple) manuscript format, which enables doctoral researchers to firmly establish a
program of research, become actively involved with the dissemination of research findings,
and publish during the course of their doctoral study. This format allows for a more
mentored, collaborative, and team-based approach to doctoral study and research, and
a more comprehensive preparation of doctoral researchers for a wide range of academic
and research careers. It is in these doctoral research studies that we see the future of our
field.

References
Abbott, M. L. (2005). English reading strategies differences in Arabian and Mandarin speaker performance on the
CLBA reading assessment (doctoral dissertation). Retrieved from Theses Canada (32659077).
Abbott, M. L. (2007). A confirmatory approach to differential item functioning on an ESL reading
assessment. Language Testing 24.1, 1–30.
Alderson, C. (2007). The challenge of (diagnostic) testing: Do we know what we are measuring? In J.
Fox, M. Wesche, D. Bayliss, L. Cheng, C. Turner & C. Doe (eds.), Language testing reconsidered. Ottawa,
ON: University of Ottawa Press, 21–39.
Baba, K. (2007). Dimensions of lexical proficiency in writing summaries for an English as a foreign language test
(doctoral dissertation). Retrieved from Theses Canada (33748472).
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we
count counts. Language Testing 17.1, 1–42.
Bachman, L. F. (2007). What is the construct? The dialectic of abilities and contexts in defining
constructs in language assessment. In J. Fox, M. Wesche, D. Bayliss, L. Cheng, C. Turner & C. Doe
(eds.), Language testing reconsidered. Ottawa, ON: University of Ottawa Press, 41–71.
Bachman, L. F. & A. S. Palmer (1996). Language testing in practice. Oxford: Oxford University Press.
Baker, B. A. (2010). In the service of the stakeholder: A critical, mixed methods program of research in high-stakes
language assessment (doctoral dissertation). Retrieved from ProQuest (NR74366).
Barkaoui, K. (2007). Participants, texts, and processes in second language writing assessment: A
narrative review of the literature. The Canadian Modern Language Review 64, 97–132.
Barkaoui, K. (2008). Effects of scoring method and rater experience on ESL essay rating processes and outcomes
(doctoral dissertation). Retrieved from Theses Canada (35096546).
Bogdan, R. C. & S. K. Bicklen (1998). Qualitative research in education. Boston, MA: Allyn and Bacon.
Brumfit, C. (1997). How applied linguistics is the same as any other science. International Journal of
Applied Linguistics 7.1, 86–94.
Cervatiuc, A. (2007). Highly proficient adult non-native English speakers' perceptions of their second language
vocabulary learning process (doctoral dissertation). Retrieved from ProQuest (NR33791).
Cheng, L. (2005). Changing language teaching through language testing: A washback study. Cambridge: Cambridge
University Press.
Cheng, L. (2008). Washback, impact and consequences. In E. Shohamy & N. H. Hornberger (eds.),
Encyclopedia of language and education, Vol. 7: Language testing and assessment. New York: Springer, 349–364.
Cheng, L. & C. DeLuca (2011). Voices from test-takers: Further evidence for test validation and test
use. Educational Assessment 16.2, 104–122.
Cheng, L., Y. Watanabe & A. Curtis (eds.) (2004). Washback in language testing: Research contexts and methods.
Mahwah, NJ: Lawrence Erlbaum.
Clandinin, J. & M. Connelly (2000). Narrative inquiry: Experience and story in qualitative research. San Francisco,
CA: Jossey-Bass.
Colby-Kelly, C. & C. Turner (2007). AFL research in the L2 classroom and evidence of usefulness:
Taking formative assessment to the next level. Canadian Modern Language Review 64.1, 9–37.
Colby, D. C. (2010). Using assessment for learning practices with pre-university level ESL students: A mixed
methods study of teacher and student performance and beliefs (doctoral dissertation). Retrieved from ProQuest
(NR61979).
Creswell, J. W. (1998). Qualitative inquiry and research design: Choosing among five traditions. Thousand Oaks,
CA: Sage.
Creswell, J. W. & V. L. Plano-Clark (2007). Designing and conducting mixed methods research. Thousand Oaks,
CA: Sage.
Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing 7, 31–51.
Doe, C. (2011). The integration of diagnostic assessment into classroom instruction. In D. Tsagari & I.
Csepes (eds.), Classroom-based language assessment: Language testing and evaluation. Frankfurt: Peter Lang,
63–76.
Douglas, S. R. (2010). Non-native English speaking students at university: Lexical richness and academic success
(doctoral dissertation). Retrieved from ProQuest (NR69496).
Eggins, S. (2004). An introduction to systemic functional linguistics (2nd edn). New York: Continuum.
Farnia, F. (2006). Modeling growth in reading fluency and reading comprehension in EL1 and ESL children: A
longitudinal individual growth curve analysis from first to sixth grade (doctoral dissertation). Retrieved from
Theses Canada (33265070).
Fleming, D. J. (2007). Becoming Canadian: Punjabi ESL learners, national language policy and the Canadian
language benchmarks (doctoral dissertation). Retrieved from Theses Canada (33664937).
Fox, J. (2003). From products to process: An ecological approach to bias detection. International Journal
of Testing 3.1, 21–48.
Fox, J. (2005). Rethinking second language acquisition requirements: Problems with language-residency
criteria and the need for language assessment and support. Language Assessment Quarterly 2.2, 85–115.
Fox, J. (2009). Moderating top-down policy impact and supporting EAP curricular renewal: Exploring
the potential of diagnostic assessment. Journal of English for Academic Purposes 8, 26–42.
Fox, J. & L. Cheng (2007). Did we take the same test? Differing accounts of the Ontario Secondary
School Literacy Test by first and second language test takers. Assessment in Education: Principles, Policy
& Practice 14.1, 9–26.
Fox, J. & P. Hartwick (2011). Taking a diagnostic turn: Reinventing the portfolio in EAP classrooms.
In D. Tsagari & I. Csepes (eds.), Classroom-based language assessment. Frankfurt: Peter Lang, 47–61.
Fulcher, G. (1996). Does thick description lead to smart tests? A data-based approach to rating scale
construction. Language Testing 13.2, 208–238.
Gao, L. (2007). Cognitive-psychometric modeling of the MELAB reading items (doctoral dissertation). Retrieved
from Theses Canada (33905480).
Gao, L. & W. T. Rogers (2011). Use of tree-based regression in the analyses of L2 reading test items.
Language Testing 28, 77–104.
Glaser, B. G. & A. L. Strauss (1967). The discovery of grounded theory. Chicago, IL: Aldine.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge
University Press.
Grabe, W. & F. L. Stoller (2011). Teaching and researching reading. Harlow, UK: Pearson Education Limited.
Greene, J. C., V. J. Caracelli & W. F. Graham (1989). Toward a conceptual framework for mixed-method evaluation designs. Educational Evaluation and Policy Analysis 11, 255–274.
Gunning, P. (2011). ESL strategy use and instruction at the elementary school level: A mixed methods investigation
(doctoral dissertation). Retrieved from ProQuest (NR77521).
Hamp-Lyons, L. (1990). Second language writing: Assessment issues. In B. Kroll (ed.), Second language
writing: Research insights for the classroom. Cambridge: Cambridge University Press, 69–87.
Hamp-Lyons, L. (1995). Rating non-native writing: The trouble with holistic scoring. TESOL Quarterly
29, 759–762.
Hatch, E. & A. Lazaraton (1991). The research manual: Design and statistics for applied linguistics. Rowley,
MA: Newbury House.
Hutchinson, S. A. (1997). Education and grounded theory. In R. Sherman & R. Webb (eds.), Qualitative
research in education: Focus and methods. Philadelphia, PA: Falmer Press, 123–140.
Isaacs, T. (2010). Towards defining a valid assessment criterion of pronunciation proficiency in non-native English-speaking
graduate students (doctoral dissertation). Retrieved from ProQuest (MR24877).
Isaacs, T. & P. Trofimovich (2011). Phonological memory, attention control, and musical ability:
Effects of individual differences on rater judgments of L2 speech. Applied Psycholinguistics 32, 113–140.
Ishii, D. N. (2009). Language dia-logs: A collaborative approach for providing effective feedback on ESL learners' verb
errors in writing (doctoral dissertation). Retrieved from Theses Canada (37943444).
Kane, M. T. (2002). Validating high-stakes testing programs. Educational Measurement: Issues and Practice
21.1, 31–41.
Kim, Y. (2010). An argument-based validity inquiry into the empirically-derived descriptor-based diagnostic assessment
in ESL academic writing. Unpublished doctoral dissertation. University of Toronto.
Kwan, A. B. (2005). Impact of systemic phonics instruction on young children learning English as a second language
(doctoral dissertation). Retrieved from Theses Canada (32659359).
Lado, R. (1961). Language testing: The construction and use of foreign language tests. London: Longman.
Leedy, P. (1997). Practical research: Planning and design (6th edn). Upper Saddle River, NJ: Prentice Hall.
Limbos, M. (2005). Early identification of second-language students at risk for reading disability (doctoral
dissertation). Retrieved from Theses Canada (32659383).
McKay, P. (2006). Assessing young language learners. Cambridge: Cambridge University Press.
McNamara, T. & C. Roever (2006). Language testing: The social dimension. Malden, MA: Blackwell
Publishing.
Messick, S. (1989). Validity. In R. L. Linn (ed.), Educational measurement (3rd edn). New York: Macmillan,
13–103.
Messick, S. (1996). Validity and washback in language testing. Language Testing 13, 243–256.
Mislevy, R. J., L. S. Steinberg & R. C. Almond (2003). On the structure of assessment arguments.
Measurement: Interdisciplinary Research and Perspectives 1.1, 3–62.
Morris, L. & T. Cobb (2004). Vocabulary profiles as predictors of TESL student performance. System
32.1, 75–87.
Moss, P. A., B. J. Girard & L. C. Haniford (2006). Validity in educational assessment. Review of Research
in Education 30, 109–162.
Mullen, A. (2009). The impact of using a proficiency test as a placement tool: The case of Test of English for
International Communication (TOEIC) (doctoral dissertation). University of Laval, QC: Canada.
Neumann, H. (2010). What's in a grade? A mixed methods investigation of teacher assessment of grammatical ability
in L2 academic writing (doctoral dissertation). Retrieved from ProQuest (NR77532).
Qi, L. (2007). Is testing an efficient agent for pedagogical change? Examining the intended washback
of the writing task in a high-stakes English test in China. Assessment in Education 14.1, 51–74.
Rampton, B. (1997). Retuning in applied linguistics. International Journal of Applied Linguistics 7.1,
3–25.
Samson, M. (2012). What applied linguists do: An investigation of research practices in the field (Master's research
essay). Carleton University, Ottawa, Canada.
Seror, J. (2008). Socialization in the margins: Second language writers and feedback practices in university content
courses (doctoral dissertation). University of British Columbia, Canada.
Shih, C. M. (2006). Perceptions of the General English Proficiency Test and its washback: A case study of two Taiwan
technological institutes (doctoral dissertation). Retrieved from ProQuest (NR16000).
Shohamy, E. & T. McNamara (2009). Language tests for citizenship, immigration, and asylum. Language
Assessment Quarterly: An International Journal 6, 1–5.
Song, Y. H. (2007). A narrative inquiry into classroom assessment: Stories of six Chinese adult learners of English as
a second language (doctoral dissertation). Retrieved from Theses Canada (33969044).
Sterzuk, A. (2007). Dialect speakers, academic achievement, and power: First nations and Metis children in Standard
English classrooms (doctoral dissertation). Retrieved from Theses Canada (34491819).
Suzuki, W. (2009). Languaging, direct correction, and second language writing: Japanese university students of English
(doctoral dissertation). Retrieved from Theses Canada (37943556).
Tan, H. M. (2009). Changing the language of instruction of mathematics and science in Malaysia: The PPSMI policy
and washback effect of bilingual high-stakes secondary school exit exams (doctoral dissertation). Retrieved from
Theses Canada (39290869).
Teddlie, C. & A. Tashakkori (2009). Foundations of mixed methods research: Integrating quantitative and qualitative
approaches in the social and behavioral sciences. Thousand Oaks, CA: Sage.
Tesch, R. (1994). The contribution of a qualitative method: Phenomenological research. In M.
Langenbach, C. Vaughan & L. Aagaard (eds.), An introduction to educational research. Needham Heights,
MA: Allyn & Bacon, 143–157.
Turner, C. & J. Upshur (2002). Rating scales derived from students' samples: Effects of the scale maker
and student sample on scale content and student scores. TESOL Quarterly 36.1, 49–70.
Wakamoto, N. (2007). The impact of extroversion/introversion and associated learner strategies on English
language comprehension in a Japanese EFL setting (doctoral dissertation). Retrieved from Theses Canada
(33748547).
Wall, D. (2005). The impact of high-stakes examinations on classroom teaching: A case study using insights from testing
and innovation theory. Cambridge, UK: Cambridge University Press.
Wang, J. (2010). A study of the role of the teacher factor in washback (doctoral dissertation). Retrieved from
ProQuest (NR74872).
Watanabe, Y. (2004). Teacher factors mediating washback. In L. Cheng, Y. Watanabe & A. Curtis (eds.),
Washback in language testing: Research contexts and methods. Mahwah, NJ: Lawrence Erlbaum, 129–146.
Weigle, S. C. (1994). Effects of training on raters of ESL compositions. Language Testing 11, 197–223.
Widdowson, H. G. (1998). Retuning, calling the tune, and paying the piper: A reaction to Rampton.
International Journal of Applied Linguistics 8.1, 147–151.
Yang, Y. (2008). Corrective feedback and Chinese learners' acquisition of English past tense (doctoral dissertation).
Retrieved from Theses Canada (38060335).
Zheng, Y. (2010). Chinese university students' motivation, anxiety, global awareness, linguistic confidence, and English
test performance: A causal and correlational investigation (doctoral dissertation). Retrieved from Theses
Canada (39291111).
LIYING CHENG Ph.D. is Professor at the Faculty of Education, Queen's University, Kingston, Ontario,
Canada. Her research interests are the impact of large-scale testing on instruction, the relationships
between assessment and instruction, and the academic and professional acculturation of international
and immigrant students, workers and professionals to Canada. Her recent books are English language
assessment and the Chinese learner (co-edited with A. Curtis, Taylor & Francis, 2010); Language testing
reconsidered (co-edited with J. Fox et al., University of Ottawa Press, 2007); Changing language teaching
through language testing (single-authored, Cambridge University Press, 2005); and Washback in language
testing: Research contexts and methods (co-edited with Y. Watanabe and A. Curtis, Lawrence Erlbaum,
2004).
JANNA FOX Ph.D. is Associate Professor in the School of Linguistics and Language Studies, Carleton
University, Ottawa, Ontario, Canada. Her research interests include language testing and assessment,
the development of academic literacy within and across university disciplines, the scholarship of
teaching in linguistically and culturally diverse contexts, and the interplay between language policy,
curricula, assessment, and stakeholder impact. Her recent work in language testing and assessment is
published in Language Testing, Language Assessment Quarterly and Educational Measurement: Issues and Practice.
Her work on academic literacies is published in Written Communication, Multimodal Communication and
the Journal of English for Academic Purposes. In 2012, she received (with Natasha Artemeva) the College
Composition and Communication Award for Best Article on Pedagogy or Curriculum in Technical
or Scientific Communication (for "Awareness versus production: Probing students' antecedent genre
knowledge", in the October 2010 issue of the Journal of Business and Technical Communication).