Paper ID: P 033
Paper Title: A Corpus-Based Analysis of Business Computer Terms

The review process for ASIALEX 2009, The 6th International Conference on Dictionaries in Education, is now completed. Based on the recommendation of the reviewers, I am pleased to inform you that your paper has been accepted for presentation in an oral session. You are cordially invited to present the paper on 20th-22nd August 2009 at The Imperial Queens Park Hotel in Bangkok, Thailand. This notification e-mail serves as our formal acceptance of your paper as well as an invitation to present your work at ASIALEX 2009. The reviewers' comments (if any) are included at the end of this notification. Please address these comments while preparing your final manuscript. Please note that only papers that are both registered and presented at the conference will be included in the proceedings. For the most updated information on the conference, please check the conference website at http://www.kmitl.ac.th/asialex/. I would like to take this opportunity to thank you for choosing ASIALEX 2009 to present your research results and I look forward to seeing you in August 2009.
Yours sincerely,
Comments from Reviewers: This abstract shows a corpus-based analysis of business computer terms collected from textbooks, which will finally lead to a compilation of a specialized dictionary.
1. Introduction
Nowadays, English is of vital importance in many countries and can be regarded as a global language. As stated by Crystal (1997), approximately 375 million people speak English as their first language and 235 million speak it as their second language. In addition, many countries teach English as a foreign language. In Thailand, English is one of the most essential foreign languages, and its role is as important as it is in many other developing countries. New technology and the adoption of the internet have resulted in a major transition in business, education, science, and technological progress, all of which demand proficiency in English (Wiriyachitra, A., 2009: Online). With the growth of both international business and technology in Thailand, it is necessary to keep up with the changing world of business and technology. According to Veerachaisantikul (2007:6), English is regarded as the international language of technology and commerce, which makes it suitable for learners who want to learn English for particular purposes. English for Specific Purposes is an approach to second language teaching and learning that relates to the multi-disciplinary activities of particular learners. Specific-purpose learners are thus expected to be highly competent both in English and in their specific field, particularly technology fields such as computer science and business computer. They need to know the vocabulary of their specific field to help them read and write texts. The acquisition of vocabulary has long been considered a crucial component of learning a language because the breadth and depth of a student's vocabulary have a direct influence upon the descriptiveness, accuracy and quality of his or her writing (Coady et al. 1993, Nation 2001, and Read 1998).
This research studied business computer terms (lexical words and function words, collocations, academic vocabulary, and technical vocabulary) using a computer tool called a concordancing program.
a wordlist of the most frequently used medical academic words in medical RAs, was compiled from a corpus containing 1,093,011 running words of medical RAs collected online. The established MAWL contains 623 word families, which account for 12.24 % of the tokens in the medical RAs under study. The high word frequency and the wide text coverage of medical academic vocabulary throughout medical RAs confirm that medical academic vocabulary plays an important role in medical RAs.

Verlinde and Selva (2009) investigated the selection of words to be included in a learner's dictionary of French. The study compared two lexicographic approaches: corpus-based and intuition-based. The corpus was taken from issues of two newspapers, Le Monde and Le Soir, published in 1998, and had a total size of 54,260,926 words. The results indicated that a corpus-based approach provided a practical method to support the lexicographer's personal intuition.

In Thailand, Kaewphanngam, C. (2002) analysed a corpus of psychology texts in order to develop teaching materials in English for Academic Purposes. The psychology texts were selected according to comprehensive criteria, and the corpus totalled 236,086 running words. The psychology students' comprehension of technical and sub-technical vocabulary was investigated by a vocabulary test based on the corpus analysis. The subjects were first year and third year psychology students attending the Faculty of Education, Silpakorn University, in the 2002 academic year. Results revealed that most of the frequent words were function words and content words that were highly related to the subject of psychology. The parts of speech, uses and meanings of the search words could be investigated by applying concordance analysis. The vocabulary test showed that psychology students performed better in sub-technical vocabulary than in technical vocabulary. First year and third year students obtained significantly different scores on the whole test. There was a high correlation between scores on technical and sub-technical vocabulary.

Veerachaisantikul, A. (2007) studied multi-word expressions in a business corpus drawn from a newspaper. The data were business news articles from the Bangkok Post's business section, collected over 3 months from 1st August to 31st October 2006. The corpus comprised 167 news article files with a total of 172,349 tokens, or running words. The results showed that the most frequently occurring words were function words and business key words, which were related to the subject of business. A concordance program proved a convenient and useful tool for investigating the parts of speech, uses, and meanings of the selected words. The parts of speech that most frequently collocated with the selected business key words were article, adjective, and noun. The most frequently used pattern of multi-word expressions in the business corpus was adjective + noun.
2. Methodology
2.1 Data Collection
The sample was drawn from the English textbooks and course materials of the 2008 academic year of the Business Computer curriculum, covering only subjects from the Faculty of Management Science, together with interviews with 5 lecturers from the Business Computer program at Suphanburi Campus. There were 17 English textbooks and course materials, as follows:
1. 101 Internet Businesses You Can Start from Home--Susan Sweeney, 2007
2. The Linux Programmer's Toolbox--John Fusco, 2007
3. A Programmer's Introduction to PHP 4.0--W.J. Gilmore, 2001
4. Practical PHP and MySQL--Jono Bacon, 2007
5. ASP.NET 3.5 for Dummies--Ken Cox, 2008
6. Analysis and Design of Information Systems--M. Langer, 2008
7. SQL: The Complete Reference--James R. Groff and Paul N. Weinberg, 1999
8. Oracle/SQL Tutorial--Michael Gertz, 2000
9. Computer Networks: A Systems Approach--Larry L. Peterson and Bruce S. Davie, 2003
10. ASP Configuration--Gary Palmtier, 2001
11. Database Management Systems--Raghu Ramakrishnan and Johannes Gehrke, 1999
12. Database System Concept--Foxit, 2004
13. Computer Programming Concepts and Visual Basic--David I. Schneider, 1999
14. Database System--Paul Beynon-Davies, 2004
15. Router Security Configuration Guide--Vanessa Antoine et al., 2001
16. Visual Basic 2005 Database Programming--Roger Jennings, 2006
17. A Beginnings Guide PHP--Cal Evans, 2009
2.3 Data Analysis
There were four steps in this study. Firstly, the lexical words and function words were analysed from wordlists and concordance lines to identify their parts of speech, based on Biber et al. (1999). Secondly, collocations were analysed in terms of clusters. Thirdly, academic vocabulary was identified by comparison with the Academic Word List compiled by Coxhead (2000). Lastly, technical vocabulary was analysed by checking against the Oxford Advanced Learner's Dictionary (2005), Oxford Dictionary of Computing for Learners of English (1996), The Oxford Interactive Dictionary of Business and Computing for Learners of English (1998), Oxford Dictionary of Business English for Learners of English (2000) and IBM Dictionary of Computing Online (2009).
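The first and third steps can be illustrated with a minimal Python sketch. It is not the concordancing program used in the study: the tokenizer, the sample sentence, and the three-word `academic_words` set are hypothetical stand-ins for the corpus and for the Academic Word List.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def ranked_wordlist(tokens):
    """Return (rank, word type, frequency) tuples, most frequent first."""
    counts = Counter(tokens)
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]

# Hypothetical sample text and a tiny stand-in for an academic word list.
sample = "The data in the table is data about the network. Create the table."
academic_words = {"data", "network", "create"}

wordlist = ranked_wordlist(tokenize(sample))

# Keep only the word types that also appear in the reference list.
academic_hits = [(rank, word, freq)
                 for rank, word, freq in wordlist
                 if word in academic_words]

for rank, word, freq in academic_hits:
    print(rank, word, freq)
```

The same lookup pattern would apply to the technical vocabulary step, with the headwords of a specialised dictionary in place of `academic_words`.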
Figure 1: Statistical Details of the Business Computer Corpus
From the figure above, there were 2,857,687 tokens, or running words, and 40,116 word types, or different words, in the corpus, since a recurrent word was counted only once. With a type/token ratio of 1.40 % or 1:71.24, each word in the corpus was repeated about 71 times on average throughout the corpus. Within the Business Computer Corpus, the ten most frequently occurring tokens account for 20.9 % of the total list. These tokens were the, to, of, a, and, in, is, that, for, and this (see Figure 2 below).
As can be seen in Figure 2, the top fourteen tokens were function words, or closed-class words. The word the appeared most frequently, with 169,451 occurrences or 5.93 %. The first lexical word, or open-class word, was data, which appeared at the 15th rank.
From Table 1, it is interesting that the word business ranks 30th among the top nouns, or 97th in the whole corpus, with 3,310 occurrences. This suggests that the computer is an important tool for business.
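The type/token figures can be re-derived from the two counts reported for the corpus. The short Python sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Counts reported for the Business Computer Corpus.
tokens = 2_857_687   # running words
types = 40_116       # different words

# Type/token ratio expressed as a percentage.
ttr_percent = types / tokens * 100

# Average number of occurrences per word type (the "1:N" form of the ratio).
tokens_per_type = tokens / types

print(f"TTR = {ttr_percent:.2f} %")
print(f"1:{tokens_per_type:.2f}")
```

A ratio this low is typical of a large corpus, because new word types become rarer as more text is added.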
3.3 Collocation Selected from Business Computer Corpus
In this section, two words from the top 40 nouns (data and business) were selected to find their grammatical and lexical collocations, in order to show the importance of the computer in business.
Table 2: The Most Frequent Lexical and Grammatical Collocation of Data and Business in Business Computer Corpus
Type                             Data                          Business
Lexical Collocation (Freq.)      relational data model (193)   e-business model (226)
Grammatical Collocation (Freq.)  the data in (135)             type of business (230)
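Cluster frequencies of the kind shown in Table 2 can be obtained by counting every n-word sequence that contains the node word. The Python sketch below is one possible way to do this, not the procedure of the concordancing program itself; the function name `clusters` and the sample sentence are hypothetical.

```python
from collections import Counter

def clusters(tokens, n, node):
    """Count the n-word clusters that contain the node word."""
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts

# Hypothetical sample; a real run would use the tokenized corpus.
sample_tokens = ("the data in the file and the data in the table "
                 "use a relational data model").split()

three_word = clusters(sample_tokens, 3, "data")

# List the clusters from most to least frequent.
for cluster, freq in three_word.most_common():
    print(freq, cluster)
```

Separating lexical from grammatical collocations would still require a manual or part-of-speech-based check of each cluster.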
Concordance data from the Business Computer Corpus displayed distinct relationships between words or phrases that were frequently used with another word or phrase, e.g. relational data model (Freq. 193) and the data in (Freq. 135). Consider the example concordances below.
Figure 3: The Concordance Data for the Word Data
The concordance illustrated how nouns such as data connected to other words in the Business Computer Corpus to create noun phrases such as the data in. The phrase the data in was often followed by a noun or a group of words such as a regular heap. Meanwhile, Figure 4 showed the contexts of the word business in e-business model and type of business, which were preceded by the verb store and the determiner this.
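A concordance of this kind lists each occurrence of a node word together with its left and right context (a key-word-in-context, or KWIC, display). The following Python sketch shows the idea under simple assumptions; the function `kwic`, the context width, and the sample sentence are hypothetical and are not taken from the corpus.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for each hit of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample sentence built around the phrase "the data in".
sample_tokens = ("we store the data in a regular heap "
                 "and sort the data in memory").split()

lines = kwic(sample_tokens, "data")

# Right-align the left context so that the node word lines up in one column.
for left, node, right in lines:
    print(f"{left:>20}  {node}  {right}")
```

Aligning the node word in a single column is what makes recurring patterns to its left and right easy to spot.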
Table 3: Top Ten Academic Vocabulary in Business Computer Corpus
Type       Frequency   Rank   Sublist
Data       16,767      15     1
Chapter    5,213       51     2
Network    4,597       65     5
Access     3,701       85     4
Function   3,555       88     1
Process    3,217       100    1
Create     2,906       117    1
Select     2,854       118    2
Section    2,707       125    1
Design     2,470       134    2
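The frequency carried by each sublist in Table 3 can be totalled with a few lines of Python. The sketch below simply re-enters the table rows; the variable names are illustrative.

```python
from collections import defaultdict

# Rows of Table 3: (word type, frequency, sublist).
table3 = [
    ("data", 16_767, 1), ("chapter", 5_213, 2), ("network", 4_597, 5),
    ("access", 3_701, 4), ("function", 3_555, 1), ("process", 3_217, 1),
    ("create", 2_906, 1), ("select", 2_854, 2), ("section", 2_707, 1),
    ("design", 2_470, 2),
]

totals = defaultdict(int)    # sublist -> summed frequency
members = defaultdict(list)  # sublist -> word types in that sublist

for word, freq, sublist in table3:
    totals[sublist] += freq
    members[sublist].append(word)

for sublist in sorted(totals):
    print(sublist, totals[sublist], members[sublist])
```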
As can be seen from Table 3, sublist 1 appeared the most (five word types with 29,152 occurrences): data, function, process, create, and section. For the technical vocabulary, SQL, an initialism of Structured Query Language, appeared first with 7,544 occurrences, at the 30th rank.
Table 4: Top Ten Technical Vocabulary
Type             Frequency
SQL              7,544
PHP              4,582
C (C language)   2,939
XML              2,773
DBMS             2,368
Schema           1,812
IP               1,758
Router           1,740
ASP              1,604
HTTP             1,207
From the table above, it can be seen that most of the top ten technical words are abbreviations. Abbreviation is a process of reduction that creates numerous types of shortened forms in spoken or written English. Many mechanisms of abbreviation in English are used to create a wide variety of forms (Barnhart, 1995: xiii). Abbreviations are classified as acronyms, initialisms, clippings, aphereses, contractions, and symbols. Eight of the top ten technical words were classified as initialisms.
4. Conclusion
The Business Computer Terms, based on the Business Computer Corpus of 2,857,687 tokens or running words, have been compiled to support better learning and application by Business Computer students. Although this study is not yet complete and is still ongoing, the researcher hopes that the corpus will be developed into a Business Computer lexicography and into available textbooks for Thai students who study in this field.
References:
(1) Barnhart, Robert K. (ed.). 1995. Barnhart Abbreviation Dictionary. USA: John Wiley and Sons. In Pattaradej, R. 2005. Analysis of English Vocabulary and Sentence Patterns in Job Advertisements: A Case Study of the Bangkok Post. M.A. Bangkok. KMITL.
(2) Benson, M., Benson, E., and Ilson, R. 1986. The BBI Combinatory Dictionary of English: A Guide to Word Combination. Amsterdam and Philadelphia: John Benjamins Publishing Company.
(3) Biber, D., Johansson, S., Leech, G., Conrad, S., and Finegan, E. 1999. Longman Grammar of Spoken and Written English. Harlow: Pearson Education Limited.
(4) Coady, J., Magoto, J., Hubbard, P., Graney, J., and Mokhtari, K. 1993. High Frequency Vocabulary and Reading Proficiency in ESL Readers. In T. Huckin, M. Haynes and J. Coady (Eds.), Second Language Reading and Vocabulary Acquisition. 217-228. Norwood, NJ: Ablex.
(5) Coxhead, A. 2000. A New Academic Word List. TESOL Quarterly. 34(2). 213-238.
(6) Crystal, D. 1997. English as a Global Language. Cambridge. Cambridge University Press.
(7) IBM. 2009. IBM Dictionary of Computing. [Online]. Available: www.networking.ibm.com/nsg/nsgmain.htm
(8) Kaewphanngam, C. 2002. A Corpus Analysis of Psychology Texts as a Basis for the Development of Teaching Materials in English for Academic Purposes. M.A. Mahidol. Mahidol University.
(9) Nation, P. 2001. Learning Vocabulary in Another Language. Cambridge. Cambridge University Press.
(10) Nesselhauf, N. 2005. Collocation in a Learner Corpus. Philadelphia. John Benjamins Publishing.
(11) Read, J. 1998. Validating a Test to Measure Depth of Vocabulary Knowledge. In A.J. Kunnan (Ed.), Validation in Language Assessment: Selected Papers from the 17th Language Testing Research Colloquium. 41-60. Mahwah, NJ: Lawrence Erlbaum Associates.
(12) Veerachaisantikul, A. 2007. A Corpus Analysis of Multi-Word Expressions in Newspaper: A Case Study of the Bangkok Post. M.A. Khon Kaen. Khon Kaen University.
(13) Verlinde, S., and Selva, T. 2009. Corpus-based versus Intuition-based Lexicography: Defining a Word List for French Learners Dictionary. [Online]. Available: www.kuleuven.ac.be/ilt/grelep/publicat/verlinde.
(14) Wang, J., Liang, S., and Ge, G. 2008. Establishment of a Medical Academic Word List. English for Specific Purposes Journal. 27(4). 442-458.
(15) Wiriyachitra, A. 2009. English Language Learning and Teaching in Thailand in This Decade. [Online]. Available: www.apecneted.org/resources/downloads/English%20Language%20Teaching%20and%