
Using Corpora for Autonomous Correction and Improvement of Academic Writing

Ramesh Krishnamurthy
Aston University
February 8th 2011

[REPORT ON WORK IN PROGRESS]

Abstract
1. All of our students need to improve their academic writing skills. This is true for Home students as well as for the increasing numbers of EU and International students.
2. This talk looks at the possibilities of using corpora in this process, and specifically reports on a case study involving a Chinese-speaking student using the ACORN (Aston Corpus Network) corpora.
3. The method requires less teacher time, offers more scope for autonomous student learning, and leads to a greater awareness of academic writing as a cyclic editorial process rather than merely as a product for assessment.

UG1 students need to improve their academic writing skills - 1


Examples from UG1
The same article can be reported differently, depending on the type of newspaper it has been obtained from.
As the case when first reported concerned the death of a young baby due to neglect and abuse, which legally the public were not allowed to be made aware of the full name of the child.
As expected from a headline the text still reads as a statement as opposed to a structured sentence in order to grab the audiences attention.

UG1 students need to improve their academic writing skills


Extracts from Feedback to UG1
You make some simplistic and often illogical statements
Many vague statements
Some poor wordings make your points unclear
You use inaccurate or imprecise terminology, e.g. transcript for phonetic transcription, and sayings and word-combinations to describe pounded cake, pound note
intuition is not a theory!
You have inaccurately typed URLs
You used a very informal writing style
Your quotes sometimes indicate the opposite of the point you are making
Very repetitive style: one can state, one can state
the dictionary entry is for scrape, but the item in the text is scrapped

UG2 students need to improve their academic writing skills


Extracts from Feedback to UG2
Your written language needs some more work, as your errors sometimes impede communication
some mistakes affect the clarity of the argument
mistakes in spelling and grammar
more noticeable are mistakes in the use of terminology
poor grammar makes the analysis difficult to understand
grammatical errors, and poor choice of words (especially terminology)
spelling mistakes
use of complex sentences affect the clarity of argument
some rather informal comments

UG3 students need to improve their academic writing skills


Examples from UG3
From my initial reading on this matter, I have read within Richard Dawkins (1976) book The Selfish Gene and this gave me a valuable insight into
Oxford Dictionaryies.com
it is easily transferrable from any subject that is of slight annoyance, to an accident
these memes are an interesting cause for study, particularly, as they are most widely recognised in younger Internet communities
The area in which I propose to study is in politics and corpus

UG3 students need to improve their academic writing skills


Extracts from Feedback to UG3
Some errors and weaknesses in expression (comprised from)
weak wordings cause loss of coherence
Weak academic style; poor proofreading; sometimes repetitive/tautologous wordings
Weak expression often obscures meaning
Grammar not clear; many seeming errors
Major weaknesses in expression and style, obscures meaning at times
Some weaknesses in use of terms
's for plurals
non-grammatical sentences
some poor wordings... errors and typos
poor wordings, including informal, non-academic phrases
weak grammar obscures meaning

Masters students need to improve their academic writing skills


Extracts from Feedback to MA/MSc
Unfortunately, the presentation suffered considerably from poor wordings, weak academic style, and many typos and errors
quite a lot of minor slips already noticeable in the first page, often to do with word choice
The consistently poor quality of English throughout makes it very difficult to assess
frequent lack of linguistic clarity and cohesion
the content is largely obscured by the weaknesses in form
at times repetitive, or overladen with connectors
The main weakness is in English expression, which sometimes obscures the intended meaning
English style is often poor (the learnt from coursebooks language the employment of the founding exerting the whole dialogues)
problems in English expression sometimes cause difficulty for the reader
inconsistent and inaccurate use of terminology
very weak English academic writing style and expression throughout, often leading to considerable difficulty in comprehension

and it's not just me!

http://nexus.aber.ac.uk/xwiki/bin/view/Main/HEA+Annual+Conference+2009

Higher Education Academy Annual Conference 2009 The Wiki Way to Develop Academic Writing Competence Dr Rob Spence (Edge Hill University)

This paper presented an account of an ongoing investigation into the use of wikis to develop students' academic writing skills through collaborative work. Undergraduate students of English were invited to collaborate on writing tasks with the specific aim of developing their competence through peer review and appraisal. The motivation for the wiki project arose from the widely-commented (if only anecdotal) decline in student writing skills/literacy in HE. In particular, the wiki project sought to address three widely-perceived problems: students' lack of confidence, students' inability to deal with complex issues, students' substandard written work and the tendency to Wikipedia cut-and-paste.

http://www.humboldt.edu/english/GWPEGeneralInformation.htm

Humboldt State University, department of ENGLISH History of and Rationale behind the Graduation Writing Proficiency Examination Requirement Because of a noticeable decline in student writing skills, the CSU Chancellor appointed a Task Force on Student Writing Skills in 1975 to investigate the problem and recommend appropriate solutions. The major portion of the Task Force's recommendations, reviewed by the Educational Policies Committee and supported by the CSU Academic Senate, was accepted by the Board of Trustees in 1976. One of the central aspects of this policy required the demonstration of writing proficiency at the upper-division level as a requirement for graduation from every campus within the CSU system.

Learner autonomy
autonomy: 1620s, from Gk. autonomia "independence, living by one's own laws", from auto- "self" + nomos "custom, law" [http://www.etymonline.com/]
moral and political philosophy > sociology > education
Holec (1979) Autonomy and Foreign Language Learning
Boud (ed) (1981) Developing Student Autonomy in Learning
Grenfell and James (2004) Change in the field - changing the field: Bourdieu and the methodological practice of educational research. British Journal of Sociology of Education, 25/4, 507-523

Learner autonomy: Holec (1979)


The autonomous language learner takes responsibility for the totality of his learning situation. He does this by determining his own objectives, defining the contents to be learned and the progression of the course, selecting methods and techniques to be used, monitoring this procedure, and evaluating what he has acquired. Objectives are specific to the learner, and the learner's communicative needs determine the verbal elements chosen. Learning thus proceeds from ideas to correct grammatical, lexical, and phonological form. The self-directed learner chooses the methods of instruction through trial-and-error. His selection is based on the objectives set and its applicability to internal and external constraints. The student evaluates his attainment through his objectives, and this evaluation helps him to plan subsequent learning. The concept of autonomous learning requires a redefinition of knowledge from an objective universal to a subjective individual knowledge determined by the learner. For teachers, it means new objectives which help the learner define his personal objectives and help him acquire autonomy. Several experiments in autonomous learning are described.

Learner autonomy: Boud (ed) (1981)


A collection of essays examines ways in which teachers in higher education can enable students to become more autonomous in their learning: that is, how students can learn without the constant presence or intervention of a teacher. The introduction by David Boud discusses the trend in education towards a more autonomous learner, and provides an overview of the book's structure. Part I provides a general orientation toward the issues discussed in detail in later chapters. Chapters in Part I include: "Toward Student Responsibility for Learning" (David Boud); "Changing Basic Assumptions about Teaching and Learning" (M. L. J. Abercrombie); and "Assessment Revisited" (John Heron). Part II (Case Studies) includes: "Reducing Teacher Control" (J. P. Powell); "Independent Study: A Matter of Confidence" (Harry Stanton); "One-To-One Learning" (David Potts); "Parrainage: Students Helping Each Other" (Marcel Goldschmid); "Student Autonomy in Learning Medicine: Some Participants' Experiences" (Barbara Ferrier, Michael Marrin, and Jeffrey Seidman); "Preparing for Contract Learning" (Mary Buzzell and Olga Roman); "Student Planned Learning" (John Stephenson); and "A Decade of Student Autonomy in a Design School" (Barrie Shelton); Part III (Reflections) offers: "Putting into Practice: Promoting Independent Learning in a Traditional Institution" (Malcolm Cornwall) and "Moving Towards Independent Learning" (J. P. Powell). References and an index are provided.

Learner autonomy: Grenfell and James (2004)


methodological practice in educational research from the perspective of Bourdieu's field theory (p507)
taking educational research itself to be a 'field' (p508)
the briefest account of methodological developments in the twentieth century would describe a move away from a positivist towards a more qualitative, naturalistic paradigm. Up until the 1960s, what educational research that did take place was mostly small, part-time and based on psychometric tests of pupils' intelligence and learning. The alternative to this approach stemmed from a philosophical critique of its founding assumptions to mimic the physical sciences and stressed instead the social and contextual aspects of education (see Hirst, 1966, 1974). What emerged was a definition of educational theory in terms of the so-called 'foundational disciplines': sociology, philosophy, history, psychology.

Learner autonomy: Grenfell and James (2004)


The qualitative paradigm developed throughout the 1970s, 1980s and 1990s, giving rise to a range of ethnographic and naturalist methodologies, including the postmodernist. However, a sustained attack (see Hillage et al., 1998; Tooley & Darby, 1998) against this research was mounted during this last decade of the century; claiming to find its methods insufficiently rigorous, its data collection small scale and its outcomes biased. Moreover, it was argued that such research had little impact on institutional practice; while what was needed was research of the nature that answered questions such as how to improve pupil achievement. Researchers were urged to return to quantitative methods, with experiments and randomized controlled trials seen as capable of producing sufficiently 'hard' evidence (see Fitz-Gibbon & Morris, 1987; Boruch, 1997; Fitz-Gibbon, 2001). (p509)
avant-garde rear-garde process of time (p510)
There are other features that follow from the character of fields and the avant-garde. First is the question of autonomy (p510)
[NB NO mention anywhere in the article of learner autonomy!]
academic products structure practice (p510)

Focus on Product = Neglect of Process?


League tables
A level results
Marking systems (class distribution)
Equality (irrespective of motivation/performance)
Increasing instrumentality in attitudes to education
Grenfell and James (2004): academic products structure practice (p510)

Potential role of corpus in pedagogy


Syllabus
Input Materials: target expert materials (traditional) and student data (forthcoming)
Drafting academic writing
Feedback
Editing and revision of academic writing
Assessment
Syllabus

Initial Research: ACORN Case Studies 2006-7


This research was first reported in the ACORN Case Studies (2008), as ACORN Case Study 2: Self-Correction of Academic Writing.
Case Study 4: Spanish Grammar Clinics has since been developed and published: Yepes, G.R. & Krishnamurthy, R. (2010). Corpus Linguistics and Second Language Acquisition - the use of ACORN in the teaching of Spanish Grammar, Lebende Sprachen 55/1: 108-122

ACORN Case Study 2


Context: I worked closely with Steven, a Computer Science Placement Student on ACORN and a Chinese native speaker, who came to the UK in 2002, did 9 months of English and then 2 years of A-levels (Maths, Chinese, Physics) at an FE college, and started at Aston in 2005. He submitted a weekly 1-page report to me on his ACORN work.
Aims: To help Steven to improve his English and produce better reports; to trial the ACORN system with a view to software enhancements; to understand some of the pedagogic implications of the methodology.
Procedure: This started very informally, but seemed to work extremely well, so we started to preserve the data. Very rough estimates are: I spent 2-3 minutes highlighting in green any marked usages; Steven spent 5-10 minutes correcting 10-15% silly mistakes, and 30 minutes checking the ACORN corpus and correcting 60-70% of other items; we spent 15 minutes going through the 15-20% remaining complex items, and 15 minutes discussing Chinese/English, corpus software design, and search procedures.
Examples of items I highlighted in Steven's draft reports:
I will take a deep look into it next week
I replied him
He was not an expert with MySQL
The testing that I am doing does not affect any of the current functions on ACORN except adding new records to the ACORN log
The PHP engine on the server might out put an error message.
ACORN screenshots were provided for these items, showing how he found the correct wording to use.
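The kind of concordance check described in the procedure can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is not ACORN's implementation: the function name `kwic`, the context width, and the three sample sentences are all invented for illustration. It shows how a learner who wrote "I replied him" might see from the surrounding words that "replied" is normally followed by "to" or "that".

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keyword forms a central column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Tiny stand-in corpus: invented sentences, NOT taken from the ACORN corpora.
SAMPLE = (
    "She replied to him the next day. "
    "The editor replied to the author by email. "
    "He replied that the test had failed."
)

for line in kwic(SAMPLE, "replied"):
    print(line)
```

Reading down the column to the right of the keyword is what makes the pattern visible; a real corpus would simply supply many more lines.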

ACORN Case Study 2


Initial Evaluation: Steven enjoys this method: he finds it empowering, and incidentally learns other lexis and grammar; he perceives for himself the value of functions missing in the ACORN software (e.g. phrase search), and this motivates him to develop them. It saves me time, and turns a more mundane task into a stimulating experimental procedure.
Afterthoughts: We have records of the marked pages and the corrected pages. We need to accurately record when Steven uses ACORN, which searches are quicker, which are impactful on his learning, which steps through the data require external prompting, etc. An updated report from Steven suggests that, partly because of the restricted and repetitive nature of his reports, and partly due to his past experience, the proportions of corrections are changing. The range/variety of errors has been reduced. He now estimates that he is able to self-correct 30% of errors (e.g. omission of 'the'; mismatch of tense sequences), only about 20% involve ACORN searches, and perhaps up to 50% require discussion.
I think this methodology could be used by many language teachers. It is quick for the teacher, and results in a high proportion of self-correction by the student, as well as some incidental learning. The procedure can of course also be used by students while drafting, rather than after correction, and for academic writing in French, German and Spanish.

DATASET 1: Steven's work: 2 corpora


a) original drafts
b) revised/corrected/edited versions
Corpus creation: original files were in MS Word (.doc) format; converted to TXT (MS-DOS); re-converted DOC files to TXT using Windows encoding, because MS-DOS rendered the apostrophe as a question mark.
Weeks 4-5-6 were covered in 1 report; Weeks 22-23 were covered in 1 report; 3 files for weeks 43-45 were OMITTED from drafts because they were NOT corrected.
Corpora:
a) Drafts Corpus contains 39 files (weeks 1-42); 15694 tokens = c. 402 per file; 1821 types
b) Corrected Corpus contains 39 files (weeks 1-42); 15643 tokens = c. 401 per file; 1806 types
i.e. suggests very little change in length (word count) or lexical variety
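The apostrophe problem mentioned under corpus creation can be reproduced in a few lines. This is a sketch under assumptions: the slides do not say which MS-DOS code page was involved, so `cp437` is a guess, and the sample string is invented. Word typically stores a curly apostrophe (U+2019), which the Windows code page `cp1252` can represent but `cp437` cannot.

```python
# Invented sample; U+2019 is the curly apostrophe that Word usually inserts.
text = "Steven\u2019s report"

# Assumption: the MS-DOS export used code page 437. It has no slot for
# U+2019, so a lossy conversion substitutes "?".
dos_bytes = text.encode("cp437", errors="replace")
dos_roundtrip = dos_bytes.decode("cp437")

# The Windows code page 1252 maps U+2019 to a single byte and round-trips it.
windows_bytes = text.encode("cp1252")
windows_roundtrip = windows_bytes.decode("cp1252")

print(dos_roundtrip)
print(windows_roundtrip)
```

Checking a handful of files this way before building a corpus is cheaper than discovering damaged tokens in the word list afterwards.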

AntConc software for initial corpus analysis


Demo

AntConc: Word List: Drafts corpus = 15694 tokens [avge length=402], 1821 types

AntConc: Word List: Corrected corpus = 15643 tokens [avge length=401], 1806 types
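The figures in these word-list captions can be cross-checked with some simple arithmetic. The sketch below is illustrative only: `count_tokens_and_types` uses a letters-only, lowercased tokenizer that is an assumption and may not match the AntConc settings used for the slides, and the helper names are invented. The corpus sizes are the ones reported above.

```python
import re

def count_tokens_and_types(texts):
    """Count running words (tokens) and distinct words (types).

    Assumption: a token is a run of ASCII letters, compared in lower case.
    """
    tokens = []
    for text in texts:
        tokens.extend(re.findall(r"[a-z]+", text.lower()))
    return len(tokens), len(set(tokens))

def mean_tokens_per_file(tokens, files):
    """Average report length, rounded to the nearest whole token."""
    return round(tokens / files)

# Figures reported on the slides.
DRAFTS = {"files": 39, "tokens": 15694, "types": 1821}
CORRECTED = {"files": 39, "tokens": 15643, "types": 1806}

drafts_mean = mean_tokens_per_file(DRAFTS["tokens"], DRAFTS["files"])
corrected_mean = mean_tokens_per_file(CORRECTED["tokens"], CORRECTED["files"])
token_change = DRAFTS["tokens"] - CORRECTED["tokens"]
type_change = DRAFTS["types"] - CORRECTED["types"]

print(drafts_mean, corrected_mean, token_change, type_change)
```

The differences between the two corpora are small relative to their size, which is consistent with the observation that correction changed wording rather than length or lexical variety.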

DATASET 2: ACORN usage monitoring programs


1. Createlog: the original monitor program
started 05/06/07, when ACORN was first released to staff/students, but only recorded concordance searches
Designed to allow download as an Excel file, but the dataset has now outgrown the Excel maximum record limit (c. 62k lines?)

DATASET 2: ACORN usage monitoring programs


2. Monitor Log: records ALL queries within ACORN (i.e. frequencies, etc. as well as concordance), but only started on 13/03/08
written by Steven!
Was also supposed to allow saving as an Excel file
But the Excel download does not work: it creates a file, but with only one line of data, always the same one!

Extracting Steven's searches from the ACORN usage monitor logs


This was fairly straightforward: I sorted the log files on the username column.
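The same extraction can be sketched programmatically. Everything about the file layout here is an assumption: the column names in `LOG_COLUMNS` are hypothetical labels, the comma-separated format is a guess, and the `another_user` row is invented. Only the two `chenz` rows echo values that appear in the log extract shown in these slides.

```python
import csv
import io

# Hypothetical column labels; the real log's header names are not known.
LOG_COLUMNS = ["id", "username", "language", "database", "query", "date"]

# Stand-in log. The "another_user" row is invented for illustration.
SAMPLE_LOG = """\
1346,chenz,English,eng_general_db,research,18/10/2007
1347,another_user,English,eng_general_db,corpus,18/10/2007
1359,chenz,English,eng_general_db,negative,21/10/2007
"""

def searches_by_user(log_file, username):
    """Return the log rows (as dicts) recorded for one username."""
    reader = csv.DictReader(log_file, fieldnames=LOG_COLUMNS)
    return [row for row in reader if row["username"] == username]

rows = searches_by_user(io.StringIO(SAMPLE_LOG), "chenz")
for row in rows:
    print(row["id"], row["query"], row["date"])
```

Filtering rather than sorting also sidesteps the spreadsheet row limit, since the file is read one line at a time.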

Aligning Steven's work and ACORN usage for detailed analyses


This was slightly trickier!
START/END DATES of Steven's work: 06/08/07 - 12/06/08
Week 1 draft report = 06/08/07
Week 42 draft report = 29/05/08
[Week 45 draft report = 13/06/08]
Week 1 corrected report = 07/08/07
Week 42 corrected report = 12/06/08

BUT
(1) Corrected versions were often submitted in batches, whenever Steven found the time in between his ACORN programming tasks; hence the detailed analyses are also initially conducted in batches.
(2) Change in ACORN usage monitoring program: as Steven only launched the Monitor Log program on 13/03/08, I can only check Steven's use of concordances (and no other features) before that date.
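One way to line up reports and searches is to select the log entries that fall between a draft's date and its corrected version's date, and to note which monitoring program was running at the time. The sketch below assumes that window is a sensible proxy for "searches made while correcting", which the slides do not claim; the function names are invented and the four sample entries are a small subset of logged queries and dates.

```python
from datetime import date, datetime

# The Monitor Log was launched on 13/03/08; before that only Createlog ran.
MONITOR_LOG_START = date(2008, 3, 13)

def parse_uk_date(text):
    """Parse a dd/mm/yyyy string into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()

def searches_in_window(entries, start, end):
    """Return the queries logged from `start` to `end` inclusive."""
    return [query for query, day in entries if start <= parse_uk_date(day) <= end]

def logged_features(day):
    """Say what the usage logs can show for a given day."""
    if day >= MONITOR_LOG_START:
        return "all queries"
    return "concordance searches only"

# Illustrative subset of (query, date) pairs.
ENTRIES = [
    ("research", "18/10/2007"),
    ("negative", "21/10/2007"),
    ("compiled", "25/10/2007"),
    ("text", "26/10/2007"),
]

week12_draft = date(2007, 10, 25)
week12_corrected = date(2007, 10, 26)
week12_queries = searches_in_window(ENTRIES, week12_draft, week12_corrected)

print(week12_queries, logged_features(week12_draft))
```

Because corrected versions were often submitted in batches, a wider window per batch may be more realistic than a per-week one.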

Steven's draft
Week twelve
I updated the contents of the tutorial and case studies files by following Rameshs corrections and then recreated them in new designed layout. And finally, I uploaded them to the server in order to allow Ramesh to them to show Professor Alison Halstead.

For the existing parallel text files on the server, there are <paragraphID#> marks between paragraphs, where # indicates the paragraph number, so that the parallel indexing program knows that where a new paragraph starts and what the paragraph number is. However, for the new parallel indexing program which compiled last week, it recognizes a new paragraph by an empty line of String and then increments the paragraph number by 1. The reason for why I did it this way was because if it gave me the correct paragraph number, then I would not have to run the paraAlign.java program to produce the <paragraphID#> marks before running the parallel indexing program, this would shorten the time required for the whole indexing processes.

The contents of the new created databas were changed slightly after using the new compiled program. The sequence of the values under the field ID in table tokens used to be in numerical order, from 1 to the number of total tokens in the file. But after the new compiled program was used, the sequence was not in numerical order. The reason for that was because the tables contents were ordered by the frequency of tokens, which means the most frequent word appeared on the top of the table rather than the first token in the file.

To test whether the new database could work properly with the parallelResult.php file, I had to upload the database and the parallel text files from localhost to the live server and then move the existing parallel text files on the server to a different directory so that only the new uploaded files were read, and finally test them by using the parallel function on the website. Unfortunately the test result suggested that there were some problems because no text was shown on the parallelResult web page. While I was thinking what the problems may be, I emailed Husman to explain what I have done and what the result was, to see if he knew what had gone wrong. The reasons that I could think of were either there was something else that I had not yet done or the values under the field ID had to be in numerical order. But I did not think the possibilities were high for both of the reasons.

I had a look at the parallelResult.php file and tried to find out what commands were used to retrieve the data from the database. But I have not resolved anything yet.

Steven's draft with Ramesh's green highlights

Createlog ONLY: 05/06/07 - 12/03/08
Createlog + Monitor Log: 13/03/08 - 12/06/08
Week 12 draft: 25/10/07
Week 12 corrected: 26/10/07

Items highlighted by Ramesh in draft                                   Items searched in ACORN
following Rameshs corrections                                          corrections
I did not think the possibilities were high for both of the reasons    reason
the new compiled program                                               compiled
the new uploaded files                                                 [corrected by analogy?]
The reason for that was because                                        reason

1346  chenz  English  eng_general_db  research     18/10/2007
1359  chenz  English  eng_general_db  negative     21/10/2007
1366  chenz  English  eng_general_db  the          23/10/2007
1376  chenz  English  eng_general_db  reason       25/10/2007
1377  chenz  English  eng_general_db  compiled     25/10/2007
1378  chenz  English  eng_general_db  numerical    25/10/2007
1379  chenz  English  eng_general_db  may          25/10/2007
1380  chenz  English  eng_general_db  might        25/10/2007
1381  chenz  English  eng_general_db  top          25/10/2007
1382  chenz  English  eng_general_db  reason       25/10/2007
1383  chenz  English  eng_general_db  the          25/10/2007
1394  chenz  English  eng_general_db  corrections  26/10/2007
1395  chenz  English  eng_general_db  webpage      26/10/2007
1396  chenz  English  eng_general_db  text         26/10/2007

NOT IN WEEK 12 DRAFT (label attached to three of the searches above)

Steven's corrected version


Week twelve
I updated the contents of the tutorial and case studies files by implementing the corrections that Ramesh suggested, and then recreated them in a newly designed layout. And finally, I uploaded them to the server in order to allow Ramesh to show them to Professor Alison Halstead.

For the existing parallel text files on the server, there are <paragraphID#> marks between paragraphs, where # indicates the paragraph number, so that the parallel indexing program knows that where a new paragraph starts and what the paragraph number is. However, the new parallel indexing program (compiled last week) recognizes a new paragraph by empty lines, and then increments the paragraph number by 1. The reason for doing it this way was that if it gave me the correct paragraph number, then I would not have to run the paraAlign.java program to produce the <paragraphID#> marks before running the parallel indexing program. This would shorten the time required for the whole indexing process.

The contents of the newly created database were changed slightly after using the newly compiled program. The values under the field ID in table tokens used to be in numerical order, from 1 to the number of total tokens in the file. But after the newly compiled program was used, the sequence was not in numerical order. That was because the tables contents were ordered by the frequency of tokens, which means the most frequent word appeared at the top of the table rather than the first token in the file.

To test whether the new database could work properly with the parallelResult.php file, I had to upload the database and the parallel text files from localhost to the live server and then move the existing parallel text files on the server to a different directory so that only the newly uploaded files were read, and finally test them by using the parallel function on the website. Unfortunately the test result suggested that there were some problems because no text was displayed on the parallelResult webpage. While I was thinking what the problems might be, I emailed Husman to explain what I have done and what the result was, to see if he knew what had gone wrong. The reasons that I could think of were either there was something else that I had not yet done or the values under the field ID had to be in numerical order. But I did not think the possibilities were high for either of the reasons.

I had a look at the parallelResult.php file and tried to find out what commands were used to retrieve the data from the database. But I have not resolved anything yet.

NEXT STEPS: I need to search for the same items that Steven searched for, and try to work out, by following his search path, which screen displays could have led him to make successful corrections. This will help me to evaluate the query strategy he used, and to think of quicker/better strategies, ways to improve the user interface, and help files to train users in successful search strategies.
