
Language Teaching Research

http://ltr.sagepub.com Reconciling accountability and development needs in heritage language education: A communication challenge for the evaluation consultant
Catherine Elder Language Teaching Research 2009; 13; 15 DOI: 10.1177/1362168808095521 The online version of this article can be found at: http://ltr.sagepub.com/cgi/content/abstract/13/1/15


Downloaded from http://ltr.sagepub.com by Mohsen Hajian Nejad on October 10, 2009

Language Teaching Research 13,1 (2009); pp. 15–33

Reconciling accountability and development needs in heritage language education: A communication challenge for the evaluation consultant
Catherine Elder The University of Melbourne, Australia

The paper offers a retrospective evaluation of recent evaluative studies of bilingual programs in the Australian state of Victoria, in an attempt to determine how successfully the evaluation process met the dual criteria of external accountability and development. The programs in question were located in primary or secondary government schools and involved partial immersion in a heritage language. Data for the paper are drawn from (a) the consultant's recollections of the evaluation context and process, and (b) the evaluative reports relating to three different programs (Vietnamese–English, Chinese–English and Arabic–English respectively). In hindsight, it appears that the effectiveness of each evaluation may have depended in part on two things. The first is the degree of fit between the school's and the consultant's views about the function of the evaluation initiative. The second is her ability to communicate findings in terms that were both academically defensible and meaningful for teachers and program administrators. Bridging the gap between the accountability and ameliorative functions of each evaluation was challenging for all parties, and the task was possibly exacerbated by linguistic and cultural divides. Nevertheless, it is argued that the former is not necessarily achieved at the expense of the latter. The requirement that outcomes be reported objectively to an external stakeholder can, if appropriately handled, generate insights among program participants which can be harnessed for program improvement. The paper concludes with an account of the lessons learned from the evaluations, in the hope that these will help evaluation consultants to forge more productive relationships and better communication with program participants.

Address for correspondence: Catherine Elder, School of Languages and Linguistics, Arts Centre Building, Level 5, The University of Melbourne, Victoria 3010, Australia; email: caelder@unimelb.edu.au
© 2009 SAGE Publications (Los Angeles, London, New Delhi and Singapore) 10.1177/1362168808095521

Over the past two decades the field of program evaluation in general has shifted away from investigations in an exclusively positivist paradigm towards the inclusion of naturalistic approaches (Lynch, 1996). The scope of evaluations has also broadened, from a more or less exclusive focus on program outcomes to routine investigation of program processes (Chelimsky & Shadish, 1997). In addition, the democratic and ameliorative aspects of program evaluation are now seen as critical (Nevo, 1992; Patton, 1996; Stake, 2004). Utility (e.g., for program managers and teachers) is the key standard in judging the quality of evaluative projects (Joint Committee on Standards for Educational Evaluation, 1994). Sadly, documentation of this type of participatory, developmental approach in the applied linguistics literature remains fairly scant (but see Alderson & Scott, 1992; Kiely & Rea-Dickins, 2005; Lynch, 1992; Mackay et al., 1995; Norris, 2004; Rea-Dickins & Germaine, 1998). One reason for the dearth of published literature on the developmental aspects of language program evaluation may be that program insiders are the intended audience for such evaluations. Findings may therefore be perceived as having only local significance. For this reason they may be documented in a form that is less meaningful to outsiders than evidence gathered for accountability purposes. Another explanation may be that demonstrating the utility of an evaluation from a developmental or awareness-raising perspective is harder than providing accountability evidence. This is because so many contingent factors can influence how receiving institutions respond to evaluation findings and recommendations, for example the relationships between outsider and insider evaluators and the responsiveness of program managers and other stakeholders. Stakeholder resistance does not, however, exonerate evaluators from pushing for change (Weir & Roberts, 1994). Documentation of developmental efforts, whether or not these are successful, may assist other evaluators in forging relationships with program participants that are conducive to program enhancement. With the aim of providing further insights into evaluation processes, and into relationships with program participants in particular, this paper describes three evaluative studies of bilingual programs in the Australian state of Victoria conducted in the late 1990s.
Drawing on both the evaluator's recollections and the unpublished reports on each program, it considers how successfully each evaluation reconciled the tensions between two demands. The first is 1) the externally imposed requirement of accountability, i.e., documenting the extent to which the programs achieved expected outcomes and complied with the terms of their funding. The second is 2) internal needs for program development, i.e., enhancing the quality of program delivery and student learning. The paper focuses particularly on the difficulties the external evaluators (this author, her colleagues and students) faced in managing these different aspects of each evaluation. It also comments on the lessons learned from each evaluation experience.

I Background to the evaluations

A funding stream for bilingual education programs was set up in 1996 by the Victorian Department of Education as a means of enhancing academic language learning opportunities for Victorian students in the state sector. This funding followed some documented successes for bilingual education at a small number of schools in both the private and public sectors (e.g., Clyne, 1986; Fernandez, 1996; Berthold, 1995; de Courcy, 1999, 2002; Dellora, 1989; Devlin, 1997; Elder, 1989; Lorch et al., 1992; Reid, 1996). These successes in turn contrasted with a rather unremarkable record of achievement for programs following the more traditional limited-exposure language instruction model, which is the norm in Australian schools (Clyne, 1995). Schools were invited to tender for additional funds either to mount a new program or to build on a program already in existence. A total of 12 primary and three secondary state schools were awarded funding under this program. The conditions of funding were outlined in a Memorandum of Understanding between the Department and the school. All programs were to follow a content-based partial immersion approach, in which the foreign or heritage language was to be used for teaching at least 40% of the school curriculum (amounting to 10 hours per week). One of the funding conditions for each program was that it should be evaluated over a three-year period by an outside consultant. The consultant was to work collaboratively with teachers and program administrators, using a combination of data collection methodologies including the following: (a) tests (of language skills and subject knowledge), (b) parent and student questionnaires, (c) teacher interviews, (d) classroom observations, and (e) longitudinal learner case studies. The external evaluation was to focus on language gains, learning in the relevant disciplinary areas, student/staff/parent attitudes to the bilingual program, and issues of program implementation. However, the precise purpose of gathering this information was not explicitly stated. Findings of each evaluation were to be reported not only to the school but also to the state educational authority responsible for the school's funding. An important consideration in conducting these evaluations was the amount of funding made available to the consultants, which was limited to Aus$6,000 per annum in most cases.
This certainly constrained the consultants' capacity for sustained engagement with the school. It also limited their ability to establish the relationships with insiders needed to foster the development side of the evaluation in particular. Also worth noting, because of its potential impact on relationships between evaluators and evaluands, is the ambiguous role of the evaluators and the way they were selected. The school, rather than the central educational authority, was responsible for hiring evaluation personnel. These evaluators thus served as external evaluators in terms of school improvement. When it came to the evaluation's accountability function, however, they could in some sense be regarded as insiders, because of their obligations to their primary employer. This ambiguous role was further complicated by the fact that some schools advertised for an evaluation consultant through an open tender process, while others approached people either known or referred to them. Schools also varied in what they regarded as an appropriate type of expertise. Some favoured consultants with knowledge of the heritage language and/or expertise in bilingual education over those with language program evaluation or testing know-how. Thus the evaluation playing field was by no means even in terms of consultants' expertise, role, and status, or their familiarity with the school context, personnel, or the relevant heritage language and culture. The selected evaluator sometimes enlisted postgraduate students to assist with the task. This added a further layer to the already complex relationship between the evaluator and the school, which played out very differently in each project. The author of this paper, who was hired as a consultant for a number of the above evaluations, is a language tester. She has a history of involvement in bilingual language education as a language teacher, teacher educator, consultant and researcher. Three of the programs evaluated, which are described in more detail below, involved heritage languages with which the evaluator was unfamiliar. Pseudonyms are used hereafter for the schools in question.

II Three evaluation cases

In the following account, each of three heritage language program evaluations is reviewed briefly in terms of how evaluations were initiated, what happened, and what did not. Key concerns with context, communication, and the utility of evaluation processes are also highlighted.

1 Rosemount
Rosemount is a large secondary school in an outer suburban zone of Melbourne, in what has long been a settlement area for immigrants from non-English-speaking backgrounds. At the time when the evaluation was conducted, many of these were relatively recent arrivals from South-East Asian countries, with immigrants from Vietnam constituting the largest population in the area. A limited-exposure Vietnamese language instruction program was already offered within the school. State government funds were designated to establish a more intensive late-entry bilingual program (from Years 7 to 9) for around 40 recently arrived students who were more proficient in Vietnamese (their mother tongue) than in English. The two Vietnamese teachers charged with this program offered bilingual (Vietnamese–English) input in Science and History (and later Accounting), plus an optional unit in Computer Science. A parallel program was also offered in Chinese, but it will not be discussed here because this evaluator was not involved. After receiving funding, the school principal approached a freelance educational consultant (known to her) to take charge of the design and administration of the mandatory external evaluation. This consultant asked the author of this paper and one of her colleagues to implement a single component of the evaluation, namely the pre- and post-testing of language proficiency (in English and Vietnamese) within the school. After two site visits to determine the purpose and scope of the initiative, the evaluators negotiated with the school over which tests/assessment procedures would be used, and how/when test data would be gathered. The task of visiting the school and administering the tests was later entrusted to a postgraduate student, who was given permission to use the data as the basis for her Master's thesis. The testing component of the evaluation involved administering a battery of language tests comparing student proficiency in Vietnamese at two stages of the program: in Semester 2, 1997, and after two years in the program, in Semester 2, 1999. The aim here was to show the extent to which students had improved their Vietnamese proficiency. Testing in relation to English started in 1998. Here the performance of those enrolled in the bilingual program (the experimental group) was also compared with that of a control group of learners. The control group was similar in age, language background, and length of residence, but had not experienced content-based instruction in Vietnamese. In the bilingual program, Vietnamese input was used as a bridge for learning English, along with the academic subject matter of the mainstream curriculum. It was therefore believed that greater English language gains among the experimental group would constitute evidence of the value of the bilingual program. The results emerging from the testing regime, as stated in the first annual evaluation report (Zbar et al., 1998), were somewhat equivocal. They showed no evidence of either positive or negative effects from the bilingual program (see Elder, 2005 for a more detailed discussion of the test data and their limitations). However, these inconclusive findings were subsumed within an overwhelmingly positive report on the program's overall achievements, prepared by the chief consultant. The report concluded that, for the sake of stability, only minimal changes should be made to the program. It further concluded that the Department of Education should continue to provide the same (indexed) level of funding for the bilingual program (Zbar et al., 1998, p. 36).
It can be inferred from this conclusion that the chief evaluator construed his role as advocating (House & Howe, 1998) on behalf of the school for continuation of the program, although the accountability evidence supporting this position was limited in scope. The evaluation also fell short of our second criterion for effectiveness, program development. The evaluation report contained no evidence from classroom observations and no discussion of instructional processes or strategies. Furthermore, communication between ourselves and the consultant who had subcontracted us for the pre- and post-test component of the evaluation was very limited, as were our encounters with school staff. In addition, the postgraduate research assistant had visited the school on a number of occasions. She nonetheless felt constrained in offering feedback on what she had observed of the program's workings. This was owing to her marginal status as a student and, not least, to the fact that she was not herself a speaker of Vietnamese. Interestingly, this author has since become aware that another project was operating in tandem within the school. It involved a team of sociolinguists/bilingual educators from a nearby university. This project, reported in an academic journal by Lotherington (2001), took the form of classroom observations and regular focus-group meetings with the bilingual teachers. It aimed to document an instance of late-entry content-based education, to profile biliteracy development in a secondary education context and to develop teachers' problem-solving skills through action research (Lotherington, 2001, p. 97). This initiative was not part of the official evaluation and derived its funding from other sources. It nevertheless filled many of the gaps referred to above. It documented teacher beliefs about the bilingual program, as well as the ways teachers addressed the challenge of balancing students' language and literacy needs in two languages. In hindsight, it seems likely that the presence of this second team of researchers influenced the school's perfunctory attitude towards, and commitment to, the official, centrally mandated evaluation. However, our distance from the school's internal processes means that we can only speculate on this matter. Sadly, the lack of any dialogue between the two teams of researchers almost certainly limited the efficacy of the evaluation effort as a means of offering accurate and meaningful feedback for either accountability or development purposes.

2 Luxton
Luxton is located in the heart of inner metropolitan Melbourne near a high-rise housing estate which accommodates many recent immigrants and low-income families. The program under investigation involved Mandarin–English bilingual education for primary school-age heritage language learners. These learners were first- or second-generation speakers of a range of Chinese dialects, most notably Hakka (spoken by a large group of recently arrived East Timorese refugees), as well as Cantonese and/or the target variety, Mandarin. The bilingual program was well established within the school. It had been offered to all Chinese-background students in the preparatory and first grades for over 10 years, along with a similar program in Vietnamese. Mandarin was not the mother tongue of many of the children in the program. Nevertheless, this target variety has both symbolic and practical value within the Australian community owing to its status as a literate lingua franca for all ethnic Chinese (Lotherington, 2001). A number of Chinese immigrant families from outside the school catchment area had moved to the area so that their children could enjoy the linguistic, cultural, and academic benefits of the bilingual program. The model used for delivering bilingual instruction was to teach entirely in English for the first half of the week and then to switch to the heritage language for the remainder. English-medium instruction took place in multi-age, multicultural classes, composed of children from not only Chinese but also Vietnamese, Turkish, and other language backgrounds. During the second half of the week, Mandarin-medium instruction (and instruction through other heritage languages) was offered to the children in their respective ethnic groups. Regular planning sessions between English and heritage language teachers were built into the timetable to ensure a logical curricular sequence from the beginning to the end of the week.


Government funding in this instance was being used not to introduce a new program but rather to extend the existing Chinese program for a further year. The transition to English-only instruction plus a 4-hour-per-week language maintenance program would thus commence in Grade 3 rather than in Grade 2. The school principal approached the author (known to him through a prior research consultancy) with a direct request for assistance. This author assumed responsibility for all the required external components of the evaluation: gathering attitudinal data from parents, observing language classes, monitoring student achievement via pre- and post-tests (in English and Mandarin), and conducting longitudinal case studies. Once the goals and methods for the evaluation had been determined, a postgraduate student was hired. Her first language was Cantonese, but she was also an accomplished speaker of Mandarin. Her tasks were to observe classes, track individual children, and work with the heritage language teachers on developing suitable tests of language achievement. This student researcher had far more sustained engagement with the program and the teachers than the student researcher at Rosemount had had, and she was able to develop a good working relationship with the bilingual teachers. The evaluation focused on three questions: whether there was support for the program within the school and local community, and whether the children in the program were progressing in both languages. Most importantly, it asked whether there were observable and measurable academic and linguistic benefits from an additional year of heritage language instruction. The school was also interested in how their students fared compared with those at a neighbouring school. That school offered a two-way bilingual program in Chinese and English for children from both Chinese- and English-speaking backgrounds.
While space does not permit a detailed discussion of the evaluation design, the outcomes were impressive, and each successive report on the bilingual program was laudatory, as illustrated in the following extract from one of the annual evaluation reports.
Our general conclusion, based on surveys of parent and teacher attitudes, on classroom observations and on testing of students' proficiency in Chinese, is that the bilingual program at this school is highly successful and that the staff concerned are extremely skilful in doing the planning needed to make lessons (whether in Chinese or English) both challenging and productive. Teaching is very effective and students of all levels of proficiency appear to be engaged by their teachers in the kind of pushed output which Swain (1985) and others have demonstrated to be conducive to language acquisition. Our test and observational data reveal that conversational skills achieved by students at the end of three years of bilingual input are sufficient to enable them to communicate comfortably with one another and their teacher in both Chinese and English and to carry out cognitively demanding activities in a range of curriculum areas. In addition, the Chinese writing skills of the Grade 2 and 3 children are on a par with those of older children of similar language backgrounds in the neighbouring school's bilingual program. (Elder & Liem, 1999, p. 31)

However, our report expressed concerns about the dramatic fall-off in Mandarin proficiency standards once children exited the intensive immersion program at the end of Grade 1. At that point they moved into the 4-hour-per-week language maintenance program. This program was intended to maintain and develop their skills in the heritage language for the remaining four years of the children's primary schooling. The nature of our concerns is encapsulated in the research assistant's annotations of the lessons observed:
I am really surprised to observe those students who have had three years in the bilingual program learning colours again this year. Last year the girls were writing stories and now they spend the whole lesson writing a simple sentence. What is happening? Is there any communication between the previous bilingual teacher and the new language maintenance teacher about this class of students? (Elder & Liem, 1999, p. 24)

In the next phase of the evaluation we adjusted our data collection plan so that this transition problem could be documented in more detail. The findings indicated that the problem had been addressed. A new Mandarin teacher had been appointed to teach the language maintenance program. This teacher was formerly associated with the school's bilingual program and had a good understanding of what could be expected of students exiting it. The teacher was ensuring that the maintenance program's content was more challenging and built appropriately on prior learning. Our general impression was that the school was proud of its bilingual program and that staff were comfortable with outside scrutiny. They had been the object of researcher attention for a number of years, and such interest is continuing (see, e.g., Molyneux, 2004 on students' attitudes to bilingualism and biculturalism). Staff readily took up the changes of focus we recommended as new issues emerged during the evaluation. They were also confident about having the program's outcomes compared with those achieved by learners in the neighbouring school, having conducted a similar benchmarking exercise in the past (Elder, 1998). Although the school seemed satisfied with our successive evaluation reports, the principal did not involve us in communicating findings directly to the teaching staff. It is therefore not clear whether teachers were privy either to the praise we offered for their efforts or to the concerns that we raised. These concerns were eventually addressed, but it is not clear how influential our evaluation was in the school's policy decisions. Thus, while the evaluation was thorough and its accountability function was fulfilled, its developmental value was uncertain.

3 Seaview
Seaview is a large secondary school spread across two campuses in the western suburbs of Melbourne. It has a large population of Lebanese students, most of whose parents had immigrated en masse from a small mountain village in Lebanon. Parents spoke a dialect of Lebanese Arabic in the home, and many but not all of their children had studied Arabic for up to 3 hours per week in primary school. State funding was used to establish a new late-entry bilingual Arabic–English program in the lower secondary grades. Its aim was to foster more positive attitudes to schooling and higher academic achievement amongst a group that tended to perform poorly and was seen as academically at risk. Staff believed that Arabic-medium instruction would help make the content of the school curriculum more accessible to these children and strengthen their sense of pride in their cultural and linguistic identity. It was also hoped that the bilingual program would bring additional spin-offs, including improved English literacy, as bilingual research in other contexts had demonstrated (e.g., Cummins, 2000; Oller & Eilers, 2000). The funds did not, however, permit the provision of bilingual instruction for all students from Arabic-speaking backgrounds. The program was therefore offered to only one class of volunteering students in each of Years 7, 8, and 9. Science, mathematics, and social studies were initially taught through the medium of Arabic. The curriculum content was the same as that offered to other (non-bilingual program) students through the medium of English. All English handouts and course materials were prepared in advance and translated into Arabic. Bilingual teachers and teacher aides were all Arabic speakers and came from a range of source countries, including Lebanon, Egypt, and Syria. While all were qualified secondary teachers, not all had specialist training in language teaching. The evaluation contract for this program was won via competitive tender by a team of consultants, including the author and two Arabic-speaking academics from the same university. Four postgraduate students enrolled in a Language Program Evaluation course were also involved in the project, and two of these were highly proficient in Arabic.
Our level of engagement in this school's evaluation was far greater than at either Luxton or Rosemount, partly because of the school's expressed need for guidance with program implementation. We met with school staff at the planning stage of the project and discussed which subjects were best delivered through the medium of Arabic. We were also able to collect baseline data on Arabic and English proficiency before the program was implemented. We attended parent information nights and from time to time contributed information about the program, communicated to parents via an interpreter. Regular meetings with staff were scheduled throughout the project to report our findings and consider strategies for program enhancement. At these meetings we had lively discussions on a range of issues, including which variety of Arabic should be the medium and target of instruction. We argued that this should be Modern Standard Arabic (MSA), or a modified version of it. MSA is the language of academic instruction throughout the Arab world, and speakers of different Arabic dialects can easily communicate in it. Teachers, however, persisted in using non-standard dialects as the medium of teaching because they believed that MSA was inaccessible to the students. One of the four teachers spoke Lebanese dialect. This was well understood by most students, who either used or heard it at home, and its use could perhaps be justified as a stepping stone to MSA. However, the other three teachers predominantly used Egyptian dialect. According to feedback we gathered from the children, this variety caused problems of comprehension. Children and their parents also complained about the handouts produced by the Syrian teacher aide. These were highly complex, unvocalized renditions of the English material that other (non-bilingual) students were receiving. Instead of providing a bridge to understanding, these materials were actually adding to the students' learning burden. Teachers and evaluators also held different beliefs about appropriate modes of delivering content-based instruction in Arabic. The evaluators advocated a learner-centred constructivist pedagogy, whereas most of the Arabic-speaking teachers favoured a teacher-centred approach. This was particularly true in the Arabic Language Arts session, which was run along very traditional lines. The teacher read Arabic texts aloud, and the students then copied these texts into their exercise books. These practices were perhaps consonant with pedagogic traditions in the Arab world. We considered, however, that they were conducive neither to language acquisition nor to the development of learner autonomy. The program emphasized reading and writing and lacked a concerted policy on the medium for teacher input. These factors may explain why students' performance on speaking tests did not show any marked improvement during the course of the program; we devised and administered these tests in collaboration with the teachers. The fact that a number of the author's postgraduate students were involved in the project added a new dimension to the evaluation task.
The design and conduct of the evaluation were discussed in class. From our perspective, the mode of program delivery was unorthodox, in that it did not adhere closely to the principles of immersion education, and somewhat problematic. We therefore rehearsed together how to share our sometimes critical feedback with the school. Our aim was to invite reflection and adaptive action (as advocated by Patton, 1996, and others) while at the same time meeting external accountability demands. We tried out one reporting format in an interim report. It set out recommendations after a detailed exposition of our findings, proposed particular strategies for implementing them, and then invited feedback on them from the school. Some examples from the interim report are given below. These give a flavour of the issues that arose during the program's implementation and also show the school's (not always accepting) reaction to our advice.

Recommendation 1: Review selection policies for Year 2000 and beyond to ensure that all students placed in the bilingual program have literacy skills in Arabic.
- Set minimum proficiency criteria for entry to the bilingual program.
- Administer an Arabic language test to all Arabic-background children as a basis for selection. (The test developed by the Language Testing Research Centre as part of the evaluation process could be used routinely for this purpose.)

We understand that the school does not want the program to be elitist in the sense of targeting only the more academically able students, but teaching complex subject matter in Arabic to students who are not literate in the language is likely to frustrate them and ultimately disadvantage rather than help them academically. Late (secondary) immersion programs have traditionally been offered only to students with a good grounding in the language of instruction. In the absence of such grounding, an intensive language instruction program (which may itself be partly content-based) may be more effective than teaching mainstream subjects through Arabic.

School's response: Selection based on ability is unacceptable to the school and runs counter to the school's commitment to equality of opportunity for all students. An attempt will, however, be made to ensure that children with specific language or learning difficulties are not placed at risk by being required to participate in the bilingual program, which may place too great a demand on them.

Recommendation 2: Use a simplified form of Modern Standard Arabic as the spoken and written medium of instruction in bilingual classes.
- Lebanese dialect could be used on occasion as a bridge to understanding MSA by those teachers who are able to speak it, but Egyptian dialect should NOT be used, since it appears to confuse the students rather than facilitate their understanding.
- English translation of Arabic utterances should be avoided. Time currently spent on translating into English is better spent on repeating and, where necessary, simplifying or paraphrasing Arabic input to render it comprehensible.
- Produce simplified handouts that are targeted at students' current Arabic literacy level, and are therefore comprehensible, rather than geared to native-speaker readers.

School's response: The bilingual staff feel that the evaluators' concerns about the use of dialect in class are exaggerated, and that neither the children nor their parents have great difficulty in understanding the variety of language used both in and outside the class. Attempts are being made to avoid the practice of translating into English to facilitate understanding. This should be easier now that Science, which requires mastery of specialist terminology, has been dropped from the bilingual program. Likewise, the problem of handouts produced in complex Arabic has been resolved now that Science is no longer taught through the medium of Arabic.

Recommendation 3: Develop strategies to help surmount student difficulties in understanding, for example:

- Use LOTE (Arabic) classes as a means of preparing students for the language forms they are likely to encounter in their subject classes. This will require considerable advance planning and cooperation between teachers, and a revamping of the methods and materials used in LOTE classes.
- Teach a more limited range of subjects through Arabic, but spend more time planning ways of presenting input in a form that is comprehensible to the students.
- Develop a consistent policy for bilingual aides regarding their use of the target language. The support of bilingual aides can be enlisted to explain Arabic language material on a one-to-one basis.

School's response: At group planning sessions for the bilingual program, an attempt will be made to identify the language demands of the various subject areas so that the LOTE teacher can prepare students for these demands before new topics are introduced. New materials for the LOTE class are currently being developed, and these are being reviewed by a teaching expert.

Although our views and recommendations were not always favorably received, the fact that our comments were backed up by the quantitative evidence of proficiency and classroom interaction required for accountability purposes gave them considerable weight. The report sent to the Department of Education thus provided not only a clear picture of classroom operations and learning outcomes, but also a sense of the dialogue that was occurring between evaluators and the school. Documenting this dialogue did not preclude the school from taking its own decisions and actions, but it was intended to demonstrate to education department officials that the evaluation process was being taken seriously and that the program was evolving and refining its goals and directions in light of feedback and of experience gained through its implementation (one of the fundamental principles underlying useful outcomes-oriented evaluation; see Norris, 2006). In subsequent phases of the evaluation we were able to follow up on the issues raised in our previous reports and to document improvements in certain areas. This dialogic approach also highlighted some instances where the evaluator and the school were at cross purposes, giving us a sense of how to refine our communication strategies with program participants. For example, our recommendation (see Recommendation 1 above) that a certain degree of proficiency in Arabic be a precondition for enrolling in the bilingual program had clearly been misinterpreted by the school as a proposal to reinstate the practice of streaming students according to academic aptitude, which is out of kilter with government educational policy.

III Discussion: Evaluating the evaluations

The paper has provided a brief sketch of three different program evaluation projects, all involving heritage languages. Although each project was conducted according to the same set of (loose) evaluation guidelines, and the same consultant was involved in each of the three contexts, the relationship between evaluator and school was constructed and enacted very differently in each case. At Rosemount the consultant, her colleague, and a research assistant had limited access to the school and were involved in only a limited component of the evaluation, uninformed by any direct involvement in the program's operations. They also had minimal control over the reporting of evaluation outcomes, which were, it has been suggested, of little utility either as evidence of compliance with the terms of funding or as feedback for program enhancement purposes. At Luxton the scope of the evaluation was much broader, and the bilingual research assistant had easy access to classrooms and built a good working relationship with key staff. However, while the school accommodated our suggestions to shift the foci of the evaluation as issues arose, and the information gathered was accurate, relevant, and broad-ranging enough to serve external accountability purposes, the written annual reports on the program were the only avenue for communicating findings and had limited circulation within the school. This was perhaps less serious than it might have been, because the school felt (as did we) that its house was already in good order, but the potential for feedback and development was not fully exploited. At Seaview the dynamics were very different: the evaluators were invited into the school at the program's inception and were involved in discussion with staff on an ongoing basis. Although our investigations brought some uncomfortable issues to light, the open debates that ensued, and the progressive documentation of these debates and subsequent actions, gave a rich picture not only of program outcomes but also of the implementation issues arising as the program proceeded and the way these were addressed by the school.
The accountability and developmental functions of the evaluation were intertwined, and the evaluation reports on this program were more complete, more candid, more audience-sensitive, and arguably more useful to both external and internal audiences than those produced for the other two programs. One factor that could be seen as bearing on the quality of our relationship with each school, and hence the quality of our evaluations, was our distance (actual or perceived) from the cultural values associated with the heritage language. The fact that this author was not a speaker of any of the relevant heritage languages may have created a perception among the bilingual teachers and other staff that we did not or could not understand the issues they faced. Our lack of Vietnamese was certainly a constraint when it came to working with Rosemount teachers on a proficiency test of Vietnamese. Whether this was the reason why the principal at Rosemount kept us in the dark about the separate team of researchers working with the bilingual teachers in her school is uncertain. It may have been her attitude to our status as language testers with a relatively limited role in the evaluation project, rather than her view of our cultural or linguistic expertise (or lack thereof), that was decisive here. At Luxton, after all, employing a Chinese-speaking assistant did not appear to have a direct impact on the school's view of the evaluation as an accountability exercise rather than as a development exercise. While knowing Mandarin gave the student-researcher better insights into classroom operations than were possible at Rosemount, it was the evaluators' status as academic researchers that the school valued most in promoting its achievements to the Department of Education and to the wider community. Furthermore, although language-specific consultants were involved in collecting evidence for the Seaview evaluation, they were not, as it happened, of the same language background as the bilingual teachers. All were accomplished Arabic speakers who had undergone years of schooling in the language and had worked and/or travelled extensively in different parts of the Arab world, but they were native speakers of English, Russian, Turkish, and Bahasa Indonesia respectively, and as such were seen as outsiders by the school. As indicated in the school's response to our recommendations about the medium of instruction, the bilingual teachers believed that these language-specific consultants had unduly purist attitudes to the question of what language variety should be taught (and how). It can be argued that culture clashes due to language difference are simply one of the more salient components of the tension between insider and outsider perspectives that is part and parcel of any external evaluation (Alderson & Scott, 1992). Such clashes may of course limit the validity of the information gathered, as well as the acceptability of evaluative judgements offered by outsiders, resulting in some cases in what Holliday (1992) terms 'tissue rejection'.
On the other hand, these tensions, if openly aired and sensitively handled, can create opportunities for clarification and fruitful discussion of teaching goals, approaches, and values that might not otherwise occur. The resulting understandings can inform evaluation findings and assist both in communicating information for accountability purposes and in collective decision-making about future directions for the program. The above account of these three evaluation projects comes from a single evaluator whose perceptions are inevitably partial. A meta-evaluation should ideally draw on other insights, including, in this case, those of the schools that commissioned the evaluations. However, even if we had canvassed each school's views of our evaluative efforts, the integrity of this feedback might have been compromised by the evaluator's verdict on each program. It seems likely, for example, that feedback from Seaview would be less favorable than that from Luxton, simply because the reports on the former program were more critical. In fact, at one point in the Seaview evaluation, one of the senior teachers chided the evaluator for not producing the kind of report that could be used to promote the innovative aspects of his program within the wider Arabic-speaking community. Such political pressures on an external evaluator are not uncommon, especially when the findings challenge the status quo (Weir & Roberts, 1994, p. 210). Dissatisfaction with evaluation findings cannot, however, be interpreted as an indictment of the evaluation itself. If evaluations are to meet professional standards (Joint Committee on Standards for Educational Evaluation, 1994), criteria for their conduct need to be made explicit, and feedback on their utility, feasibility, propriety, and accuracy must be routinely gathered, not only from program participants but also from central funding bodies, peer evaluators, and other readers of evaluation reports. Such feedback was not forthcoming in the case of these projects, which makes their effectiveness, whether from an accountability or a developmental perspective, difficult to gauge (Stufflebeam, 2000).

IV Lessons learned from the evaluations

The experience of conducting these evaluations has allowed us to reflect not only on the limitations of the evaluation brief (already alluded to) but also on our own role as evaluators and how it might have been exercised more professionally and responsibly. The lessons learned from the process are framed below as statements of advice, both for ourselves and for future evaluation consultants, in the hope that these will help them build productive relationships with their clients.

1 Make sure that there are sufficient resources or funds to carry out the evaluations
While the amount of money provided for the projects was in some sense beyond our control, collective pressure on the government from all evaluation consultants for additional funds might have resulted in some increase in funding over the term of the evaluation project. Failing that, a more limited scope or focus for each project to allow it to fit within the budget would have been advisable.

2 Clearly establish with the client the purpose, scope, and audiences for the evaluation before embarking on the project
This strategy is strongly emphasized by Norris (2006) in a position paper outlining strategies that college foreign language programs might adopt in response to government demands for increased accountability. In our case, the terms of reference provided to the schools by the state department of education were not sufficiently explicit about either the accountability or the development role of the evaluation. This may explain why each school took a different stance on this issue and why we were sometimes uncertain about what was expected of us.

3 Identify areas of expertise among both evaluators and evaluands and what their respective roles will be in conducting the evaluation
This practice, also advocated by Norris (2004), might have resulted in a more participatory approach and better uptake of evaluative advice had it been given more attention at the outset of each evaluation. At Rosemount the limitations of our involvement meant that: (a) we were unaware of the existence of another project operating in tandem within the school; (b) the language test data were difficult to gather and interpret; and (c) we had minimal input into the evaluation report, and into the recommendations in particular. In some circumstances, where the evaluator feels that her ability to carry out the project is jeopardized by the role she has been assigned, it may be advisable to decline the commission.

4 Discuss with stakeholders what will constitute evidence for accountability and development (assuming that both purposes are relevant to the evaluation)
In all three evaluations a focus on outcomes was assumed, but the mechanisms for giving feedback and developing strategies for improvement were not clearly identified by the evaluator. While at Seaview these mechanisms were outlined in the project tender and evolved as the program proceeded, at the other schools they were never put in place.¹ This failure constrained stakeholders' capacity to apprehend and respond to the evaluation findings.

5 Be prepared to change the evaluation plan as the program proceeds

This kind of responsiveness to conditions on the ground has long been advocated by evaluation theorists (Stake, 1975, 2003). At Luxton, our decision to shift the focus of the evaluation to the issue of transition between the bilingual and language maintenance programs was agreed upon by all parties once this emerged as a problem area.

6 Use synergistic modes of reporting

The importance of sensitivity to audience in reporting findings has been emphasized by Passow (1990), Lynch (1996), and many others. Owen (1996, p. 26) advocates that all parties collaborate in the production of evaluation reports. Attention to this issue was found to be particularly helpful in the case of Seaview, where the teachers had the chance to respond to our recommendations, and the school's perspective was well represented in the annual reports to the Department of Education. Nevertheless, more effort could have been made in all three projects to ensure that the more technical aspects of the report (relating to test properties, gain scores, and patterns of classroom interaction) were understood by the different stakeholders. In addition, more guidance could have been offered to program teachers and administrators in documenting and reporting internal evidence of program achievements in a manner that would better satisfy external accountability requirements.

7 Build in mechanisms for evaluating the evaluation

The absence of any accountability and feedback mechanism for evaluators was a weakness of the heritage language program initiative described in this paper. A truly developmental evaluation, as well as offering strategies for improvement, will also be responsive to suggestions from the school and from other recipients of evaluation findings. At the outset of any evaluation project, it is important that evaluators discuss with all stakeholders how the value of their contribution to the project will be assessed. Ideally, this kind of meta-evaluation should occur at regular intervals throughout a project and should be built into the evaluation design. Closer attention to the lessons learned by this evaluator would undoubtedly improve the utility of many evaluative projects and offer a means of breaking through the impasse articulated by Cronbach and his colleagues nearly three decades ago: 'Whereas the persons who commission evaluations complain that the messages from the evaluations are not useful, evaluators complain that the messages are not used' (Cronbach et al., 1980, p. 47).


¹ To be fair, it should be pointed out that the funds available for the evaluations were more restricted at Rosemount and Luxton than at Seaview, which had been singled out by the Department of Education as an implementation case study requiring further documentation.

V References
Alderson, C., & Scott, M. (1992). Insiders, outsiders and participatory evaluation. In C. Alderson & A. Beretta (Eds.), Evaluating second language education (pp. 25–58). Cambridge: Cambridge University Press.
Berthold, M. (Ed.). (1995). Pioneer and lighthouse: The Benowa experience. Canberra: National Languages and Literacy Institute of Australia.
Chelimsky, E., & Shadish, W. (1997). Evaluation for the 21st century. Newbury Park, CA: SAGE Publications.
Clyne, M. (Ed.). (1986). An early start: Second language at primary school. Melbourne: River Seine Publications.
Clyne, M., Jenkins, C., Chen, I. Y., Tsokalidou, R., & Wallner, T. (1995). Developing second language from primary school: Models and outcomes. Deakin, ACT: National Languages and Literacy Institute of Australia.
Cronbach, L. J., Ambron, S. R., Dornbusch, S. M., Hess, R. D., Hornik, R. C., Phillips, D. C., Walker, D. F., & Weiner, S. S. (1980). Toward reform of program evaluation. San Francisco, CA: Jossey-Bass.
Cummins, J. (2000). Language, power and pedagogy: Bilingual children in the crossfire. Buffalo, NY: Multilingual Matters.
de Courcy, M. C. (1997). Benowa High: A decade of French immersion in Australia. In R. K. Johnson & M. Swain (Eds.), Immersion education: International perspectives (pp. 44–62). Cambridge: Cambridge University Press.
de Courcy, M. C. (2002). Learners' experiences of immersion education: Case studies of French and Chinese. Clevedon, UK: Multilingual Matters.
de Courcy, M. C., Burston, M., & Warren, J. (1999). Language development in an Australian French early partial immersion program. Babel: Journal of the Australian Federation of Modern Language Teachers' Associations, 34(2), 14–20, 38.
Dellora, M. (1989). One model of bilingual education in Australia at Richmond West Primary School. Melbourne Papers in Applied Linguistics, 1(1), 1–7.
Devlin, B. (1997). Links between first and second language instruction in Northern Territory bilingual programs: Evolving policies, theories and practice. In P. McKay et al. (Eds.), The Bilingual Interface Project report (pp. 75–90). Canberra: DEETYA.
Elder, C. (1989). Drowning or waving? An evaluation of an Italian partial immersion program at a Victorian primary school. Melbourne Papers in Applied Linguistics, 1(2), 9–17.
Elder, C. (1998). Luxton Primary School comparative literacy study. Unpublished report. Melbourne: Language Testing Research Centre, University of Melbourne.
Elder, C. (2005). Evaluating the effectiveness of heritage language education: What role for testing? International Journal of Bilingual Education and Bilingualism, 8, 196–212.
Elder, C., & Liem, I. (1999). The bilingual Chinese/English program at Luxton Primary School, July 1998–June 1999 (External evaluators' report). Melbourne: Language Testing Research Centre, University of Melbourne.
Elder, C., & Mayer Attenbourgh, C. (2000). Seaview Secondary College Arabic–English Bilingual Program (External evaluators' report). Melbourne: Language Testing Research Centre, University of Melbourne.
Fernandez, S. (1996). Room for two: A study of bilingual education at Bayswater South Primary School (2nd ed.). Belconnen, ACT: National Languages and Literacy Institute of Australia.
House, E., & Howe, R. (1998). The issue of advocacy in education. American Journal of Evaluation in Education, 19, 233–236.
Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards (2nd ed.). Newbury Park, CA: SAGE Publications.
Kiely, R., & Rea-Dickins, P. (2005). Program evaluation in language education. New York: Palgrave Macmillan.
Lorch, S., McNamara, T., & Eisikovits, E. (1992). Late Hebrew immersion at Mount Scopus College, Melbourne: Towards complete Hebrew fluency for Jewish day school students. Language and Language Education, 2(1), 1–29.
Lotherington, H. (2001). A tale of four teachers: A study of an Australian late-entry content-based program in two Asian languages. International Journal of Bilingual Education and Bilingualism, 4(2), 97–106.
Lynch, B. (1996). Language program evaluation: Theory and practice. Cambridge: Cambridge University Press.
Mackay, R., Wellesley, S., & Bazergan, E. (1995). Participatory evaluation. ELT Journal, 49(4), 308–317.
Molyneux, P. (2004). Bilingually educated children reflect on their learning. Australian Language & Literacy Matters, 1(2), 4–10.
Norris, J. (2004). Validity evaluation in foreign language assessment. Unpublished doctoral dissertation. Honolulu: University of Hawaii.
Norris, J. (2006). The why (and how) of assessing student learning outcomes in college foreign language programs. The Modern Language Journal, 90, 577–583.
Oller, K., & Eilers, R. (Eds.). (2002). Language and literacy in bilingual children. Clevedon, UK: Multilingual Matters.
Owen, J. (1996). Program evaluation: Forms and approaches. New South Wales: Allen and Unwin.
Passow, A. H. (1990). Reporting the results of evaluation studies. In H. J. Walberg & G. D. Haertel (Eds.), The international encyclopedia of educational evaluation (pp. 745–750). Oxford: Pergamon Press.
Patton, M. Q. (1996). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: SAGE Publications.
Rea-Dickins, P., & Germaine, K. (Eds.). (1998). Managing evaluation and innovation in language teaching: Building bridges. Harlow, Essex: Longman.
Reid, J. (1996). Recent developments in Australian late immersion language education. Journal of Multilingual and Multicultural Development, 17, 469–485.
Stake, R. E. (Ed.). (1975). Evaluating the arts in education: A responsive approach. Columbus, OH: Merrill.
Stake, R. E. (2001). A problematic heading. American Journal of Evaluation, 22, 349–354.
Stake, R. E. (2004). Standards-based & responsive evaluation. Thousand Oaks, CA: SAGE Publications.
Stufflebeam, D. L. (1981). Meta-evaluation: Concepts, standards, and uses. In R. A. Berk (Ed.), Educational evaluation methodology: The state of the art (pp. 146–163). Baltimore, MD: The Johns Hopkins University Press.
Weir, C., & Roberts, J. (1994). Evaluation in ELT. Oxford: Blackwell.
Zbar, V., Elder, C., McNamara, T., & Rossi, C. (1998). Seaview Secondary College Bilingual Program evaluation. Melbourne, Victoria: Zbar & Schapper Consulting.