Working With Spanish Corpora

ISBN 978-0-8264-9483-2
Corpus linguistics provides the methodology to extract meaning from texts.
Taking as its starting point the fact that language is not a mirror of reality but lets us share
what we know, believe and think about reality, it focuses on language as a social phenomenon,
and makes visible the attitudes and beliefs expressed by the members of a discourse community.
Consisting of both spoken and written language, discourse always has historical, social,
functional and regional dimensions. Discourse can be monolingual or multilingual,
interconnected by translations. Discourse is where language and social studies meet.
The Corpus and Discourse series consists of two strands. The first, Research in Corpus and
Discourse, features innovative contributions to various aspects of corpus linguistics and a wide
range of applications, from language technology via the teaching of a second language to a
history of mentalities. The second strand, Studies in Corpus and Discourse, is comprised of key
texts bridging the gap between social studies and linguistics. Although equally academically
rigorous, this strand will be aimed at a wide audience of academics and postgraduate students
working in both disciplines.
Research in Corpus and Discourse

Meaningful Texts
The Extraction of Semantic Information from Monolingual and Multilingual Corpora
Edited by Geoff Barnbrook, Pernilla Danielsson and Michaela Mahlberg
Corpus Linguistics and World Englishes
An Analysis of Xhosa English
Vivian de Klerk
Evaluation in Media Discourse
Analysis of a Newspaper Corpus
Monika Bednarek
Idioms and Collocations
Corpus-based Linguistic and Lexicographic Studies
Edited by Christiane Fellbaum
Working with Spanish Corpora
Edited by Giovanni Parodi
Historical Stylistics
Media, Technology and Change
Patrick Studer
Conversation in Context
A Corpus-based Analysis
Christoph Rühlemann
Studies in Corpus and Discourse
English Collocation Studies
The OSTI Report
John Sinclair, Susan Jones and Robert Daley
Edited by Ramesh Krishnamurthy
With an introduction by Wolfgang Teubert
Text, Discourse and Corpora
Michael Hoey, Michaela Mahlberg, Michael Stubbs and Wolfgang Teubert
With a foreword by John Sinclair
Corpus Semantics
An Introduction
Anna Čermáková and Wolfgang Teubert
Continuum
The Tower Building, 11 York Road, London SE1 7NX
80 Maiden Lane, Suite 704, New York, NY 10038
© Giovanni Parodi and contributors 2007
All rights reserved. No part of this publication may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying, recording, or any information
storage or retrieval system, without prior permission in writing from the publishers.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 0-8264-9483-8
978-0-8264-9483-2

Library of Congress Cataloging-in-Publication Data
A catalog record of this book is available from the Library of Congress.

Typeset by Servis Filmsetting Ltd, Manchester
Printed and bound in Great Britain by The Cromwell Press, Trowbridge, Wiltshire

Contents

List of tables, figures and graphs
List of contributors
Editor's Preface
List of abbreviations and acronyms

Chapter I: INTRODUCTION
Catching up with corpus linguistics: Register-diversified studies from different corpora in different Spanish speaking countries
Giovanni Parodi
Pontificia Universidad Católica de Valparaíso, Chile

Chapter II: Variation across registers in Spanish: Exploring the El Grial PUCV Corpus
Giovanni Parodi
Chile

Chapter III: Dimensions of register variation in Spanish
Douglas Biber and Nicole Tracy-Ventura
Northern Arizona University, USA

Chapter IV: Epistemic modality and academic [...]: Pilot [...] for COTECA (Corpus Textual del Español Científico de la Argentina)
Guiomar E. Ciapuscio
Universidad de Buenos Aires

Chapter V: [...]

Chapter VI: Future tense expressions in several corpora
Mercedes Sedano
Universidad Central de Venezuela, Venezuela

Chapter VII: Technical-professional discourses: Specialized and dissemination text types
Giovanni Parodi and Aída Gramajo
Pontificia Universidad Católica de Valparaíso, Chile

Chapter VIII: Academic writing: Exploring Corpus 92
Carmen López-Ferrero
Universitat Pompeu Fabra, Spain

Chapter IX: Using Latent Semantic Analysis in a Spanish research article corpus
René Venegas
Pontificia Universidad Católica de Valparaíso, Chile

Chapter X: [...]
Nicole Tracy-Ventura, Viviana Cortes and Douglas Biber
Northern Arizona University and Iowa State University, USA

References
Index

Tables

2.1 Overall composition of the El Grial PUCV-2003 Corpus
2.2 General distribution of texts and words in the TSC
2.3 Composition of the LL Corpus
2.4 OIC Composition
2.5 Sixty-five linguistic features
3.1 Composition of the corpus used for the MD analysis
3.2 Summary of the important linguistic features defining each dimension
4.1 Systematization of lexico-grammatical epistemic markers
5.1 Correspondence between prepositional schemes and PC
5.2 Types of communication verbs, according to Bosani (2000)
5.3 The model of analysis
5.4 General analysis matrix
5.5 Verbs under examination
5.6 Conformation of the corpus
5.7 Ten most frequent communication verbs in the corpus
5.8 Dependence tests (Cramer's V test) between prepositions and cognitive categories
5.9 The most frequent schemes in both types of communication verbs
6.1 Morphological and periphrastic future occurrences in spoken Spanish
6.2 Morphological and periphrastic future occurrences in written Spanish
6.3 Distribution of morphological and periphrastic future occurrences in the two corpora
6.4 Temporal distance in the spoken Spanish corpus (Sedano 1994)
6.5 Temporal distance in the written Spanish corpus (Sedano in press)
6.6 Verb person in spoken Spanish (Sedano 1994)
6.7 Verb person in written Spanish (Sedano in press)
6.8 Epistemic modality markers in spoken Spanish (Sedano 1994)
6.9 Epistemic modality markers in the written corpus (Sedano in press)
7.1 Composition of the TSC
7.2 Text types and characterizing features
8.1 Corpus 92 characteristics
8.2 Content and length of the study corpora
8.3 Common patterns of use in the samples from the IULA-CT corpus
9.1 Research corpus
9.2 Segmentation values into quartiles for assessment of the lexical-semantic similarity indexes
Figures

3.1 Comparison of registers along Dimension 1: oral versus literate discourse
3.2 Comparison of registers along Dimension 2: irrealis discourse
3.3 Comparison of registers along Dimension 3: narrative discourse
3.4 Comparison of registers along Dimension 4: addressee-focused interaction
3.5 Comparison of registers along Dimension 5: informational reports of past events
3.6 Comparison of registers along Dimension 6: 'formal' written style
7.1 Diversified possibilities of specialized texts in a continuum
7.2 Criteria for and features of textual classification
7.3 Comparison of the 12 features in two textual types
7.4 Technical-professional school discourse community and text types
7.5 Dialectical relationships
9.1 First step of LSA method
9.2 First step of LSA method: application of SVD
10.1 Number of different lexical bundles by register
10.2 Overall frequency of lexical bundles in the two registers
10.3 Distribution of lexical bundles across structural types
10.4 Distribution of lexical bundles across functional categories
10.5 Interaction of structural and functional categories for both registers

Graphs

2.1 [...] of oral and written registers
2.2 [...] of specialized and non-specialized registers
2.3 Dimensions and three registers
2.4 Dimensions and domains of specialization
5.1 [...] of communication verbs
5.2 Schemes with the preposition a
5.3 Schemes with the preposition en
5.4 Schemes with the preposition de
5.5 Schemes with the [...] a
5.6 Schemes with the preposition con
5.7 Schemes with the preposition de
7.1 Occurrence of 74 text types
9.1 [...] of average indexes between texts variables in both areas
9.2 [...] of the lexical-semantic average in both areas [...] to ES-ARTICO

List of contributors

DOUGLAS BIBER is Regents' Professor of English (Applied Linguistics) at Northern Arizona University. His research efforts have focused on corpus linguistics, English grammar and register variation (in English and cross-linguistic; synchronic and diachronic). His publications include books published by Cambridge University Press (1988, 1995, 1998), Oxford University Press (1994), John Benjamins (2002, 2006) and the co-authored Longman Grammar of Spoken and Written English (1999) and Longman Student Grammar of Spoken and Written English (2002).

GUIOMAR E. CIAPUSCIO is professor at the University of Buenos Aires, Argentina, and researcher at the CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas). She received her Ph.D. in Linguistics at Bielefeld University (Germany) in 1992. Currently she coordinates Termtex, a research group at the Universidad de Buenos Aires and the Universidad General Sarmiento, and is head of several projects related to specialized discourse and its specificity in textual, grammatical and lexical aspects. She has published several books and research articles on several topics in Spanish text linguistics, specialized discourse and terminology.

VIVIANA CORTES is assistant professor at Iowa State University in the United States, where she teaches courses in English Grammar, Corpus-based Discourse Analysis, English for Specific Purposes, and Academic Writing for International Graduate Students. Dr Cortes worked as an EFL teacher in Buenos Aires, Argentina, before moving to the United States to pursue her doctoral studies. She graduated from Northern Arizona University, USA, in 2002 with a Ph.D. in Applied Linguistics. Her research interests cover the use of corpus-based methodologies to investigate recurrent word combinations (i.e. lexical bundles) in different registers of English and Spanish. In addition, she is also interested in ways of using language corpora in the classroom, particularly in courses of English for Academic Purposes. She has published articles in several journals, including Applied Linguistics and English for Specific Purposes, as well as in several edited books.
[...] several research [...] programs [...] published several articles [...]

CARMEN LÓPEZ-FERRERO [...] at Universitat Pompeu Fabra since 1993. She graduated in [...] (Language) and she received her doctorate in Philosophy and Sciences of Education from Barcelona University. She has carried out research on discourse analysis, text linguistics, written communication and Spanish language learning as L1 and L2. Currently she is taking part in scientific research projects funded by governmental and competitive foundations. She is a member of the Discourse Studies Network (XED) and of the Languages Acquisition, Learning and Teaching Research Group (GR@AL) at the UPF. She has published several books and articles on discourse analysis, text linguistics and language learning in Spanish.

GIOVANNI PARODI is presently head of the Postgraduate School of Linguistics at Pontificia Universidad Católica de Valparaíso, Chile, and editor of Revista Signos, Estudios de Lingüística. He obtained an M.A. in Applied Linguistics and later received his Ph.D. in Linguistics. His major fields of interest are text linguistics, discourse psycholinguistics (reading comprehension and written production processes) and corpus linguistics. Currently he is conducting research in specialized academic/professional written discourse, press media discourse analysis and computational resources through three grants funded by major Chilean research foundations and international programmes, such as ECOS and UNESCO UNITWIN Chairs. His publications include articles in Spanish and English journals and several books published by EUDEBA (2005, 2007) and EUVSA (1999, 2002, 2003, [...]). He has also edited four other interdisciplinary books.

OMAR SABAJ received his Ph.D. in Linguistics in 2004 from Pontificia Universidad Católica de Valparaíso (PUCV), Chile. He is an associate professor in the undergraduate and postgraduate programmes at the Institute of Literature and Language Sciences at PUCV. He is currently carrying out research on several projects as principal researcher and as collaborator. His main areas of specialization are verbal studies from a syntactic-lexical perspective, corpus linguistics and generative grammar.

MERCEDES SEDANO has a Bachelor's degree in Letters and a Master's degree in Linguistics from Universidad Central de Venezuela (UCV). She also holds the degree of Doctor in Philology from Universidad Nacional de Educación a Distancia of Spain. She directed the Instituto de [...] and abroad. She has been awarded national and international [...] distinctions. She is the author of one book, Hendidas y otras construcciones con ser en el habla de Caracas, and co-author of two others. She has published over 60 articles and specialized books.

NICOLE TRACY-VENTURA is a Ph.D. candidate in [...] at Northern Arizona University, USA. Her research interests include [...] applications of corpus linguistics to language teaching and instructed second language acquisition. Currently she is working on completing her dissertation, which focuses on the acquisition of the preterite and imperfect by instructed L2 learners of Spanish. She has been co-author of a meta-analysis of the effects of task-based interaction on second language acquisition, which appeared in Synthesizing Research on Language Learning and Teaching (John Benjamins, 2006), and another article with Doug Biber, to appear in the new journal Corpora. She has considerable teaching experience, having taught EFL in Argentina and Peru, Spanish as a foreign language in the USA, ESL in the USA, and teacher-training at Northern Arizona University.

RENÉ VENEGAS is professor at the Pontificia Universidad Católica de Valparaíso, Chile. He has a Ph.D. in Linguistics. He teaches linguistics and semantics to undergraduate and postgraduate students. His research interests are academic discourse, the study of meaning with computer tools, the development of computer tools for text analysis and oral argumentation. Dr Venegas is a member of the Chilean Linguistic Society (SOCHIL) and the Latin American Discourse Analysis Association (ALED).
Editor's Preface

As a Chilean researcher living and working in a country far from the leading publishing companies, I was lucky enough to get enthusiastic support and international academic collaboration to accomplish a challenging idea: to compile a book on Spanish language research from the perspective of corpus linguistics.

During the preparation of the book, I had the opportunity to spend my sabbatical leave in Birmingham, England, where I thoroughly enjoyed the discussions I had with Wolfgang. He was the ideal sounding board on which to test my points of view on corpus linguistics and discourse semantics.

For the last six years I have been doing research on the structure of Spanish, and for the last four years I have been developing this subject using corpus linguistics. In addition, I have followed the multi-register and multi-dimensional analysis in the description of Spanish corpora, particularly written and spoken varieties. My research has also included discourse psycholinguistics. Fortunately, I have found a way to connect both areas with profound and promising links. I take a broader view of the study of discourse psycholinguistics and of corpus linguistics than some of my colleagues, which has helped to widen my sphere of connections and has been conducive to interdisciplinary research.

As I have conducted my research, I have been enthralled by corpus analysis with computational tools and by collecting electronic corpora. At the same time, I have discovered that research describing Spanish has been far removed from mainstream corpus linguistics, with a few exceptions in Spain and Latin America. Also, it has been surprising to discover how few publications (if any) have devoted space in English to disseminating recent research conducted on Spanish following up-to-date methodologies.

Therefore, one of the purposes of this book is to fill that gap and gather together a number of papers that do deal with the description of Spanish from the perspective of discourse analysis and corpus linguistics. With this in mind, I undertook to motivate some friends and colleagues to write a series of connected papers that could be compiled as a coherent book in order to disseminate scientific knowledge for the advancement of linguistics and for the [...] of the Spanish language all over the world.

[...] and to help out in whatever way [...]ments with the publishers and with the authors.

I hope you enjoy this book as much as I enjoyed nurturing it as it gradually took shape and came to life.

Giovanni Parodi
List of abbreviations and acronyms

ACV  attitudinal communication verbs
ALED  Latin American Discourse Analysis Association
ARTHUS  Spanish Text Archive, Universidad de Santiago de Compostela
ARTICO  Artículos de Investigación Científica Originales
BDS  Base de Datos Sintácticos del Español Actual
CC  circumstantial complement
CEO  Oral Interviews Corpus
CLC  Computer Learner Corpus
CLL  Latin American Literature Corpus
CONICET  Consejo Nacional de Investigaciones Científicas y Técnicas
CORDE  Corpus Diacrónico del Español
COTECA  Corpus Textual del Español Científico de la Argentina
CPP  Public Policies Corpus
CQP  Corpus Query Processor
CREA  Corpus de Referencia del Español Actual
CTC  Technical-Scientific Corpus
DEEB  Primary School Discourse
DETP  Technical-Professional School Discourse
DG  didactic guideline
DICIPE  Scientific Dissemination in Written Press
DO  direct object
EFL  English as a foreign language
EliES  Estudios de Lingüística del Español
ESL  English as a second language
EUDEBA  Editorial Universitaria de Buenos Aires
EUVSA  Ediciones Universitarias de Valparaíso
GIL  Grupo de Ingeniería Lingüística
ICLE  International Corpus of Learner English
IMRD  Introduction, Method, Results, Discussion
IULA  Instituto Universitario de Lingüística Aplicada, UPF, Barcelona
LLC  Latin American Literature Corpus, also CLL
LSA  latent semantic analysis
LSI  latent semantic indexation
[...]  [...] indexes
MD  multi-dimensional
MF  morphological future
[...]  neutral communication verb
NEH  National Endowment for the Humanities
[...]  Central TV News Programmes, Chile
OIC  Oral Interviews Corpus, also CEO
PC  prepositional complement
PF  periphrastic future
PrADo  Preparación Automatizada de Documentos
PUCV  Pontificia Universidad Católica de Valparaíso, Chile
RAE  Real Academia Española de la Lengua
SciELO  Scientific Electronic Library Online
SGML  Standard Generalized Markup Language
SOCHIL  Chilean Linguistic Society
SRA  scientific research article
SVD  singular value decomposition
TA  technical article
TSC  Technical-scientific Corpus of Spanish
TSLC  TELEC Secondary Learner Corpus
UAM  Universidad Autónoma de México
UCV  Universidad Central de Venezuela
UPF  Universitat Pompeu Fabra, Barcelona
XED  Discourse Studies Network
Catching up with corpus linguistics: Register-diversified studies
from different corpora in different Spanish-speaking countries
Giovanni Parodi
Chile
1. Introducing Working With Spanish Corpora

Despite the growing interest in the field, studies on Spanish corpus linguis-
tics are scarce, not only in English-speaking countries but also in Spanish-
speaking communities. Spanish is rapidly growing as an international
language, making the need for empirical studies of its use more urgent
than ever. Over the last decade, corpora have been compiled and software
and computational programs have been developed to cover the needs of
researchers. However, to date there have been surprisingly few corpus-based
studies conducted on Spanish reflecting recent lines of thinking. Most
studies of language variation and use of Spanish tend to focus on examples
taken from a few original texts or are based on small corpora. In the present
book we intend to partially fill this gap by organizing a collection of
investigations following the principles of corpus linguistics. The chapters that
comprise this book represent research on contemporary Spanish con-
ducted in five countries: Argentina, Chile, Spain, the USA and Venezuela.
As a whole they constitute a unique collection of academic writing. To date
this is, to the best of my knowledge, the first book published in English
which has focused on Spanish variation among diversified corpora and
registers.
In spite of the fact that, strictly speaking, corpora constitute an intrinsic
part of linguistic research in Spanish, corpus linguistics has not been used as
a widespread methodology in the study of Spanish or in the research pub-
lished in journals and books. Even though the number of Spanish speakers
and readers is growing exponentially and Spanish could even be said to be
the second language used as a lingua franca in the world, I know of no other
compilations of research in English comparable to this one. Although the
chapters differ considerably in their specific methodologies, the focus of
[...] as [...] focus of the book is to [...] across different [...] countries. At the same [...] and written [...] as well as [...] are considered. Various structures are focused on and based on corpora collected from different Spanish-speaking countries or communities. It is quite [...] to compare the results of research conducted on Spanish with [...] and other languages. As stated above, the popu[...] is [...] daily and Spanish is becoming more and more relevant on the world stage, hence one chapter comparing research on Spanish with research on three languages other than Spanish is also included.

The nine chapters that follow this one represent a wide range of research on Spanish, not only because of the various countries, institutions and diverse backgrounds of their authors, but also because of the specific topics they focus on. However, even though they are rich and varied in their approaches, they all contribute to the key aim of the book: the focus on linguistic variation across registers in the Spanish language.

By and large, journals and books on language corpora today are dominated by research conducted on English. The collection of works presented here brings together an updated review of the rich and vast research currently being conducted on Spanish using corpus linguistic methodology. This is the first time such research has been gathered together in one volume. The common ground for this compilation of essays lies in their use of spoken and written corpora. In addition, all the essays in this book are concerned with empirical text analysis through the examination of authentic and diversified corpora.

Some of the chapters are specifically concerned with language teaching/learning processes in schools and universities, whilst others are more interested in the phenomena of genre description and discourse variation across different registers or in cross-linguistic research. Together they illustrate the broad range of corpus-based studies now being carried out in many Spanish-speaking countries, and even in countries where Spanish is not the official language, as is the case with the USA.

This book is aimed primarily at English-speaking linguists, specifically those interested in Spanish and in contrastive linguistics and contrastive rhetoric; at undergraduate and postgraduate students who study the Spanish language and whose programmes focus on contrastive linguistics and/or on contrastive rhetoric; and at English-speaking teachers of Spanish, grammarians and discourse researchers. The secondary audiences of the book are linguists of all languages, language students, researchers and teachers of Spanish, as well as language teachers in general.

This introductory chapter [...] an overview of the book, describing its purpose and [...] on all of the chapters in order to explain how [...] a set that reveal how research, conducted [...] Latin America, is filling the gap in corpus-based studies.

At the end of this introductory chapter, we provide a collection of references to websites and computational tools available on the internet that deal with the Spanish language and Spanish corpora.

In Chapter II, Giovanni Parodi, from Pontificia Universidad Católica de Valparaíso, Chile, uses a multi-dimensional approach, based on multivariate statistical analysis, to study variation across written and spoken, and specialized and non-specialized registers of Spanish texts (PUCV-2003 Corpus, almost 2.5 million words). Sixty-five salient linguistic features with functional and communicative implications are determined to be relevant in Spanish. Variation in frequencies across the texts and the features provided evidence for five relevant dimensions. The multi-feature and multi-dimensional analysis shows that the emerging dimensions tend to identify variation between written and spoken registers and technical and non-technical texts, with Informational Focus being the most relevant dimension with regard to accounting for the written technical-scientific corpus of Spanish (TSC).

Chapter III gives an account of research conducted by Douglas Biber and Nicole Tracy-Ventura at Northern Arizona University, USA. Following the well-known multi-dimensional (MD) methodological approach, initially designed by Douglas Biber himself, that applies multivariate statistical techniques (especially factor analysis) to the investigation of register variation in a language, Biber and Tracy-Ventura focus on the analysis of Spanish based on a corpus that comes from the twentieth-century component of the NEH-funded Corpus del Español (18.2 million words; 4049 texts; 19 registers). The Spanish MD analysis offers further evidence for both of the following two major patterns: the existence of cross-linguistic universals, together with distinctive dimensions associated with each language/culture. An emerging comparison is made between the Flagstaff study and the Valparaíso study (Chapters III and II, respectively). Interestingly, these studies and others focusing on languages other than Spanish, such as Korean and Somali, have found some striking similarities in the underlying 'dimensions' that distinguish between registers in these diverse languages, raising the possibility of universal patterns of register variation.

As part of Chapter IV, Guiomar Ciapuscio, from Universidad de Buenos Aires, Argentina, looks at the notion of spoken academic discourse genres based on a new corpus in construction - the COTECA (Argentina's Scientific Spanish Text Corpus: Genre, Lexico/Grammatical and Terminological Research) - which seeks to contribute to the knowledge of spoken and written Spanish used by Argentinian academic researchers and specialists. In this context, Guiomar Ciapuscio explains that the first stage of the COTECA
[...] express science talks, and, as such, it contributes to the further [...] of COTECA's next [...] of research.

Omar Sabaj, of Pontificia Universidad Católica de Valparaíso, offers a [...] on schemes in communication verbs based on a comparison of nine corpora in Chapter V. He describes the occurrence of prepositional schemes (in two types of communication verbs) in a group of diversified registers of contemporary Spanish. The corpus of analysis is comprised of 4,874,275 words (the El Grial PUCV-2003 Corpus, available at www.elgrial.cl), from more than 1500 documents. The verbs under study are subclassified according to a functional taxonomy in which two types are distinguished: 1) neutral communication verbs; and 2) attitudinal communication verbs. The findings allow for the identification of the most frequent types of verbs of communication in each register, and also lead to an analysis of these schemes in terms of the productivity of each preposition.

Chapter VI investigates the alternation of two future expressions used in various corpora of written and spoken Spanish. Mercedes Sedano (Universidad Central de Venezuela) addresses the study of morphological future (MF) and periphrastic future (PF). Two groups of quantitative studies are compared: 1) those carried out by different researchers on spoken and written Spanish corpora; and 2) those conducted by Sedano (1994 and in press). The variables considered in these studies and used for comparison in the present research are: (i) temporal distance, (ii) grammatical person in future tense, and (iii) epistemic modality markers. Her findings, especially from the written press and spoken interactive conversations, show significant differences and variation in the use of these two different expressions of futurity, depending on the context of use and the purpose of the writer/speaker.

In Chapter VII, Giovanni Parodi and Aída Gramajo illustrate how a corpus-based research study to characterize a particular genre of academic-professional discourse can be used not merely to describe a collection of specialized texts from different angles, but also as the basis for refining the categories of a text typology with didactic purposes. The specific objective of this seventh chapter is to determine text types that characterize three specialized corpora of written texts, collected from three areas of secondary technical schools (Maritime, Metal Mechanics and Commerce), from functional-communicative and textual perspectives. The corpora employed in this study are part of a larger corpus, the El Grial PUCV-2003 Corpus, that was used for the multi-dimensional research described by Giovanni Parodi in Chapter II. The results of this [...] show that a multi-level, complex

[...] functions of a group of conjunctions as used by students versus more expert writers. [...] studies about the domain of Spanish paratactic [...] students' [...] have shown that the use of their [...] has not been mastered successfully during their secondary [...]. Students are not able to satisfactorily manage the syntactic, semantic, pragmatic and discourse features of this kind of conjunction. In this chapter, Carmen López-Ferrero contrasts the results of a previous study about the use of paratactic conjunctions in Corpus 92 (a Spanish language learners' written corpus) with the use of these connectors in specialized expert written texts taken from the website of the Real Academia Española de la Lengua (RAE). She made use of a computational tool called Bwananet, available at the web page of Institut Universitari de Lingüística Aplicada from UPF. The specific purpose of the study is to describe the syntactic, semantic, pragmatic and discourse properties conditioned by discourse registers of the following Spanish paratactic conjunctions: y/e; ni; o/u; bien ... bien; pero; mas; sino; aunque.

Chapter IX, written by René Venegas, Pontificia Universidad Católica de Valparaíso, Chile, focuses on lexical-semantic similarities based on three text variables: keywords, abstracts and the contents. To determine the lexical-semantic similarities among the variables, a computer-statistical method called Latent Semantic Analysis (LSA) is employed. LSA is a computational tool which works with untagged texts. The findings of this study allow Venegas to assert that there are no meaningful statistical differences between the scientific areas under study according to the lexical-semantic relationships of the text variables involved.

Closing the book, the final chapter is contributed by Nicole Tracy-Ventura, Viviana Cortes and Douglas Biber from two universities in the USA (Northern Arizona University and Iowa State University). The study focuses on lexical bundles in speech and writing. Lexical bundles are said to be the most common recurrent sequences of words in a language and they serve fundamentally important discourse functions in English registers. Previous research has shown that lexical bundles are very common in spoken registers, especially for stance functions and discourse organization. Lexical bundles are less common in academic writing, where they tend to be used for referential functions rather than stance functions. Chapter X builds on earlier studies to investigate whether lexical bundles are equally important in the construction of spoken and written discourse in Spanish, examining the functions and distribution of lexical bundles in a large corpus of Spanish texts from formal conversation and academic prose. With this objective in
Websites and computational resources

In this last section, a list of websites and computational resources to research Spanish as well as some other languages has been included. An attempt has been made to ensure that the information compiled and contained in this section is accurate, though it is extremely difficult to keep up with the fast-changing internet. There is no particular guiding principle to the order in which these resources are presented here. A brief description and/or comment is provided for most of them.

2.1 Name of web page: El Grial
Webpage: www.elgrial.cl
Description (or comment): This website has been developed by a group of researchers on corpus linguistics belonging to the Escuela Lingüística de Valparaíso, from the Pontificia Universidad Católica de Valparaíso, Chile. Here you will find a growing corpus of diverse, contemporary registers. There is also a useful computational tool to tag and parse texts on a temporary basis and an interface to research the corpora available online and those loaded temporarily.

2.2 Name of web page: RAE: CREA and CORDE
Webpage: www.rae.es
Description (or comment): The website of the Real Academia Española de la Lengua (RAE) includes, among other things, an interface that allows searching for concordances of the two corpora available online. These are the Corpus de Referencia del Español Actual (CREA), with almost 140 million forms, and the Corpus Diacrónico del Español (CORDE), which contains more than 180 million forms.

[...] Fabra, Barcelona, Spain. It offers information about research projects and groups of researchers. In addition, it shows part of the computational tool Bwananet. At the same time, there is access to several digital corpora mainly collected by members of the IULA.

2.4 Name of web page: Spanish Text Archive from Universidad de Santiago de Compostela (ARTHUS)
Webpage: http://gramatica.usc.es/EspWelcome.html
Description (or comment): This site has been developed by the Grupo de Investigación en Gramática del Español de la Facultad de Filología de la Universidad de Santiago de Compostela, Spain. Information about research projects, faculty members and online publications can be found here. It offers a computational machine to investigate verbs and the Base de Datos Sintácticos del Español Actual (BDS).

2.5 Name of web page: Base de Datos Sintácticos del Español Actual (BDS)
Webpage: www.bds.usc.es
Description (or comment): The Base de Datos Sintácticos del Español Actual (BDS) contains the result of the manual analyses of the approximately 160,000 clauses which form the contemporary part of the Archivo de Textos Hispánicos de la Universidad de Santiago (ARTHUS). Each file comprises 63 fields organized in four large blocks.
2.7 Name of web page: [...]
Webpage: www.corpusdelespanol.org
Description (or comment): This is a 100-million word corpus, funded by NEH during the years 2001 and 2002. Mark Davies, Brigham Young University, USA, has created this corpus. It has a very powerful research computational engine, so large that several corpora can be researched at the same time.

2.8 Name of web page: Estudios de Lingüística Española (ELiEs)
Webpage: http://elies.rediris.es/elies18/index.html
Description (or comment): This is the location of the article by Chantal Pérez Hernández, from the Universidad de Málaga. The article is Explotación de los córpora textuales informatizados para la creación de bases de datos terminológicas basadas en el conocimiento.

2.9 Name of web page: Laboratorio de Lingüística Informática
Webpage: www.lllf.uam.es/corpus/corpus oral.html
Description (or comment): Here we find the Corpus de Referencia de la Lengua Española Contemporánea, collected by members of the Laboratorio de Lingüística Informática from the Universidad Autónoma de Madrid, Spain. One of its achievements is providing a 1-million word corpus based on spoken texts and implemented with sound recordings.

[...] on analysis of information structure to text retrieval since 1986. From 1990 on the group widened its areas of research to include natural language processing and computational linguistics.

2.11 Name of web page: Grupo de Ingeniería Lingüística
Webpage: http://iling.torreingenieria.unam.mx/
Description (or comment): This website belongs to the Grupo de Ingeniería Lingüística (GIL) of the Universidad Autónoma de México (UAM). There is basic information about the principles for constructing and analysing a corpus. In addition, there is a complete account of the research activities of GIL members and a wide repertoire of related links.

2.12 Name of web page: Fundación Biblioteca Virtual Miguel de Cervantes
Webpage: www.cervantesvirtual.com/herramientas
Description (or comment): In the section Linguistic Tools of the website belonging to the Fundación Biblioteca Virtual Miguel de Cervantes, this is an Advanced Text Browser offering Manila literary texts. There are several options for the analysis and [...] of the digital corpus.

2.13 Name of web page: Lingüística de Corpus
Webpage: http://liceu.uab.es/~joaquim/language_resources/lang_res/biblio_corpus.html
Description (or comment): Here we find a wide range of bibliographic resources related to corpus linguistics and written discourse.
2.14 Name of web page: PrADo
Description (or comment): [...]atizada de Documentos is [...] group of researchers from the Universitat Autónoma de Barcelona and the Universitat Pompeu Fabra, Barcelona, Spain. It is funded by the Ministerio de Ciencia y [...] de España. The website offers online programs and a growing corpus of texts.

2.15 Name of web page: Español Interactivo de la Escuela para Estudiantes Extranjeros
Webpage: www.uv.mx/eee/sp_interactivo/index.htm
Description (or comment): In this website, we find interactive exercises for learning Spanish as a foreign language. The principles guiding the exercises are based on corpus linguistics. It belongs to the Universidad Veracruzana de México.

Pontificia Universidad Católica de Valparaíso
Chile

Introduction

The importance of register variation across disciplines as an explanatory factor for diverse knowledge construction within discourse communities has been increasingly recognized over the past decade. The perception that there is no core disciplinary discourse per se and that it is better
to talk about disciplinary discourses in the plural (Hyland 2000) is becoming
more accepted among researchers (Bhatia 2004). Empirical findings based
on various approaches have documented the importance of corpus-based
analysis as a way to advance and describe in detail the variation across disci-
plines and across text types (Biber 1988, 2003, 2005; Parodi 2004, 2005a;
Bhatia 1993, 2004; Flowerdew 2002; Martin and Veel 1998; Wingell 1998;
Williams 1998).
Corpora of natural, annotated texts have had a significant impact on linguistic analyses over the previous two or three decades. In particular, research into the English language, as well as certain European and Asian languages, has revealed that linguistic studies based on large corpora of digital texts do not always corroborate the researchers' initial intuitions. The use of computer-supported corpora as well as the availability of computer programs that help in dealing with them has boosted linguistic investigation in a way that was previously unthinkable.

Unfortunately, research describing linguistic features of specialized academic discourse based on Spanish data and used in technical-professional school settings is relatively scant or non-existent. Most of the research produced in the Spanish language on specialized discourse focuses on the so-called specialized-disseminating discourse (Cademártori 2003; Calsamiglia
2000; Cassany et al. 2000; Ciapuscio 2003b; Ciapuscio and Kuguel 2002; López
2002), or it addresses discourse markers in a variety of texts (Martín
Zorraquino and Portolés 1999; Montolío 2001; Portolés 1998). Other studies
focus on a few linguistic features in small and exemplary texts (Ciapuscio
[...] salient features across three registers. [...] variability in the El Grial PUCV-2003 tagged corpus (90 texts with a total of 1,466,744 words) is explored using a multi-feature and multi-dimensional analysis with the aid of multivariate statistical techniques - namely, principal component analysis. In sum, the aims of this research are the following: (a) to identify the relevant linguistic patterns which co-occur in the El Grial PUCV-2003 corpus from a quantitative empirical perspective; (b) to compare three corpora along functional dimensions determined in terms of the co-occurrence of linguistic patterns: a written, technical-scientific, specialized corpus; a written literary corpus; and an oral interview-type corpus; and (c) to identify similarities and differences, in dimensional terms, across the three disciplinary domains of specialization covered within the Technical-Scientific Corpus (TSC): maritime (Port Operations), metal mechanics (Industrial Mechanics) and commerce (Accounting).

This chapter begins by focusing on academic discourse and the multi-feature and multi-dimensional analysis, before going on to describe the corpora used and the linguistic features explored, as well as the procedures for automatic text annotation and the statistical techniques. The results are then presented: namely, the five factors of feature clustering, interpreted along five relevant dimensions, and the comparison between specialized/non-specialized and written/spoken registers. Closing the chapter, I discuss relevant issues and draw conclusions.

1. Theoretical framework

1.1 Academic discourse

As mentioned above, research into written academic discourse in the Spanish language, as used in institutionalized technical-professional school [...], has received little attention within the academic, scientific [...] in [...] and in Latin America. This has not been the case in English, as one can easily access an extensive bibliography on the matter (Halliday and Martin 1993; Unworth 2000; Gunnarsson [...]; Flowerdew 2002; Christie and Martin [...]; Christie 2005; Love 2002; Goldman and Bisanz 2002; Bhatia 1993, [...]; Swales 1990, 2004; [...] Rose [...]

[...] in text types or the [...] discourses. Likewise, there is [...] describing these texts or a psycholinguistic hierarchy [...] there is [...] of the way students are given or have to tackle the reading [...] that are supposed to help them become members of the academic [...] are beginning to become a part of. This gap in information or research, to put it more accurately, is also having a dramatic effect on the way academic processes are being organized through language in school settings and university environments.

Thus, Hyland (2000: 147) clearly sums up some of the most problematic issues about written discourses in educational and pedagogical contexts with regard to teaching/learning processes:

The fact of multiple literacies within the academy is a further burden to students, particularly if they lack vocabulary and analytical skills to distinguish the heterogeneity of the discourses and practices typical of the different disciplinary cultures they encounter. Presenting academic skills as universal and transferable does a serious disservice to learners as it disguises variability and misrepresents academic writing as naturalised, self-evident and non-contestable ways of participating in academic communities.

Our concept of disseminating academic discourse will, therefore, be that it comprises a diversified group of texts that are used in technical-professional school settings as part of their curriculum in specialized disci[...] academic domain activities. Whether they belong, strictly speaking, to the professional context or are produced to accomplish a pedagogic strategy of the teacher will not form part of the initial discriminating criteria.

1.2 [...]

On the other hand, there is now [...] consensus within the linguistics [...] of researchers, and [...] within corpus linguistics, about [...] such as [...] languages. A couple of basic tenets are assumed:

1) any corpus should be [...]
2) any corpus should contain registers or [...] text [...]
[...] and Biber et al. [...] have identified interesting [...] and lexical variations across various registers of the oral and written English language. Two findings, among the many that have been reported, are that individual linguistic features present a distinct occurrence in different registers, that is, the same or similar linguistic features can have different functions in different registers. One of the strengths of this methodological approach is its basis in a communicative linguistic principle that appears logical: variation across registers is not accounted for exclusively by a single parameter or dimension, which amounts to holding that there are multiple situational distinctions between registers. In other words, it is not possible for one linguistic feature, or even a few of them, to account exclusively for a determined variation among registers (e.g., oral/written, formal/informal). Research that has employed multivariate analysis has revealed that different dimensions are constructed from different sets of concurrent linguistic features, thereby mirroring various underlying functional interpretations (e.g., objectivity, information abstraction and modalization).

Likewise, the traditional, more dichotomous distinctions (interactive/non-interactive) have been questioned by multi-dimensional analytical studies, which show that there is a continuum of linguistic variation throughout the registers (Parodi 2005a). This is consistent with the first investigations advocating the idea of fuzzy categories or linguistic gradience (Lakoff 1972; Aarts 2004; Aarts et al. 2004).

Recently, Biber and his team have begun studying Spanish and focusing on a comparison of several languages through multi-dimensional analysis (see Chapters III and X, this volume). A crucial theoretical assumption of the multivariate approach is the principle that the co-occurrence of linguistic features (determined through statistical procedures) reflects shared communicative functions; that is, these patterns of features' co-occurrence are interpreted in terms of common situational, social and cognitive functions. In other words, the linguistic features co-occur in determined texts because they show shared specific functions. For example, passive sentences, nominalizations and non-finite verb forms are all related to both informativity and objectivity, whereas first and second singular person pronouns, the imperative mood and adverbs of place are related to interactivity. It is assumed that a group of features frequently co-occurs in certain texts because they are used to express a set of communicative functions. These functions cannot be determined [...] it is crucial to [...]

[...] in co-occurrence in [...] written, specialized and non-specialized registers in a linguistic space [...] those co-occurrence patterns. At the same time, we attempted to [...] our study moving into more sentence-level linguistic categories [...] not trying to concentrate exclusively on features operating at word [...]. To complement this, evidence provided by Louwerse et al. (2004) has been taken into account, so that a text-oriented perspective has also been emphasized.

2. Method

2.1 Research objectives and the corpora

This research attempts to determine statistically - by means of factor analysis - the salient linguistic and co-occurring patterns in the El Grial PUCV-2003 corpus and to carry out a comparative analysis of the three different groups of texts collected. Finally, through the multi-dimensional approach, a comparison of the three scientific-technical areas of the technical-scientific corpus (TSC) is also made.

The El Grial PUCV-2003 Corpus consists of 90 texts, equivalent to a total of 1,466,744 words. This general corpus is divided into three registers or subcorpora (Technical-Scientific Corpus [TSC], Latin American Literature Corpus [LLC] and Oral Interviews Corpus [OIC]). The TSC was collected first and the other two corpora, the LLC and OIC, were collected later, with the purpose of carrying out comparative procedures between different registers that provide a deep and accurate description of the TSC and comply with the rigorous procedures of corpus linguistics. Table 2.1 shows the distribution of the number of texts and words.

The unequal number of texts and words per corpus does not pose a problem for comparison purposes since the figures are normalized to 1000 words. In addition, despite being corpora of a rather small size, they are significant considering the standards currently employed in corpus linguistics [...]

Table 2.1 Overall composition of the El Grial PUCV-2003 Corpus

Type of corpus       Number of texts    Total number of words
PUCV-TSC Corpus      74 (82%)           626,790 (42%)
PUCV-LLC Corpus      12 (13%)           459,860 (32%)
PUCV-OIC Corpus      4 (5%)             380,094 (26%)
Total                90 (100%)          1,466,744 (100%)
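The normalization to 1000 words mentioned in this section can be illustrated with a minimal Python sketch. It assumes a plain rate of occurrences per 1,000 running words; the function name `per_thousand` and the raw feature counts are hypothetical, and only the word totals are taken from Table 2.1.

```python
def per_thousand(count: int, total_words: int) -> float:
    """Return the number of occurrences per 1,000 words of running text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count * 1000 / total_words


# Word totals as reported in Table 2.1.
corpus_words = {"TSC": 626_790, "LLC": 459_860, "OIC": 380_094}

# Hypothetical raw counts of a single feature; invented for illustration only.
hypothetical_counts = {"TSC": 6_268, "LLC": 4_599, "OIC": 7_602}

rates = {
    name: per_thousand(hypothetical_counts[name], corpus_words[name])
    for name in corpus_words
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} occurrences per 1,000 words")
```

Once expressed as rates, the three subcorpora can be compared directly even though their sizes differ.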
[...] in [...] and are [...] for the researcher to [...] statistical computer programs, as for factor analysis.

Below is a detailed [...] of each of these corpora.

2.1.1 Corpus of Technical-Scientific texts (TSC)

The Technical-Scientific Corpus (as shown in Table 2.2) consists of 74 texts with a total number of words of 626,790, collected in technical-professional secondary schools in the city of Valparaíso, Chile, in three different domain fields of specialization. These three areas of specialized technical knowledge are related to three professional curricula: the maritime sector (Port Operation), the metal mechanics sector (Industrial Mechanics) and the commerce sector (Accounting). The texts collected are those given as compulsory reading in each of the three domain fields under study; this reading material thus plays an important role in accessing the specialized knowledge of the community the students are trying to become part of.

Detailed information about the TS corpus is outlined in Table 2.2. As can be observed, there is an interesting unequal distribution of the number of texts and the number of words in each subcorpus. It is clear that each domain field has certain inherent characteristics that are detected in the corpus composition. We will come back to this point when analysing the results.

Table 2.2 [...]

[...]      Number of words
[...]      155,160 [...]
[...]      246,374 (39%)
[...]      225,256 [...]
Total      626,790

2.1.2 Corpus of Latin American textbooks (LLC)

The selection of the twelve texts composing the Latin American Written Literature Corpus followed indications from the teachers of all the schools participating in the research. The criteria were simple. We collected the books teachers asked their students to read in the schools that were part of the research. After [...] the lists we [...] identified twelve texts that were [...] as part of the curriculum in all of the secondary institutions. The purpose of this [...] was to base the analysis on the same books all students participating in the [...] were reading, as a way of [...] a certain degree of [...] within the collection of the TS [...] Statistical data on these twelve texts are [...] in Table 2.3.

Table 2.3 [...]

[...] (6.5%) [...] 30,797 [...]

As can be seen, the number of words of the LLC is not as [...] as [...], with a difference of [...] words. As has been mentioned in this chapter, this is not problematic, due to the statistical procedures used to normalize the figures when comparing them or entering them into factor analysis.

2.1.3 Corpus of Oral interviews (OIC)

The third and last corpus was collected from two interviews with a total of 75 students attending their last year of education at technical-professional and non-technical secondary schools. The first interview consisted of in-depth semi-directed conversation. The interviewer directed the conversation towards study techniques, and reading and comprehension [...] The second interview, [...] to that in the first conversation, was more open and less structured than the first. Of course, the kind of spoken interactions collected in such interviews represent one type of many varieties of spoken [...] limited in this case by the situational context.

[...] accordance with corpus organization procedures, the 150 transcribed [...] interviews were stored in four files. This is [...] distribution and [...] of this corpus may be seen to differ [...]

Table 2.4 OIC Composition

Oral Interviews Corpus (OIC)    Number of words
OIC 1 (PUCV 87)                 86,616 (22%)
OIC 2 (PUCV 88)                 89,199 (24%)
OIC 3 (PUCV 89)                 102,092 (27%)
OIC 4 (PUCV 90)                 102,187 (27%)
Total                           380,094 (100%)
1) [...]
2) [...]
3) functional characterization [...] features selected
4) availability of computer programs [...] can automatically tag the texts in flat format (ASCII or txt)
5) automatic tagging and parsing of the texts in the corpus
6) manual or (semi)automatic database queries to each text to determine the occurrence of the features under study
7) elaboration of normalized data tables, given the different number of words between texts
8) application, with the aid of computer programs, of factor analysis to the frequency of feature occurrences. The reason for this is the need for a reduction of the variables involved and for the determination of co-occurrence patterns in the linguistic features
9) establishment of a set of factors (each factor is made up of a set of linguistic features) through factor analysis with some kind of rotation (Varimax, Quartimax, Oblimin, etc.)
10) functional interpretation of the factors, resulting from the factor analysis, from the co-occurrence of features, thus constituting an underlying dimension of variation
11) confirmation or refutation of the interpretation of the factors through the estimation of the factor loadings
12) estimation of the dimension scores. In this step, the scores for each register in each dimension are compared and the linguistic and functional similarities and/or differences are studied.

2.3 [...]

In order to select features for the analysis, a literature search was initially carried out with the purpose of identifying representative categories that showed functional and communicative relevance in Spanish. We searched Spanish grammar books, specialized research articles on the topic, textbooks and dictionaries on linguistics and grammar.

Based on all the literature searched, we were able to elaborate a matrix with a total of 65 linguistic features that could be identified with specific communicative and functional characterizations. Table 2.5 provides the [...] grouped into sixteen more general [...] (identified by means of [...]

Table 2.5 [...]

Features

A. [...]
  1.-3. [...]
  4. Present (indicative and [...]
  5. Future and [...]
  6. [...] future
B. Verb mood markers
  7. Indicative/imperative
  8. Subjunctive/imperative
  9. Indicative mood
  10. Subjunctive mood
  11. Imperative mood
C. Verbal inflections
  12. First singular
  13. Second singular
  14. Third singular
  15. First plural
  16. Second plural
  17. Third plural
D. Personal pronouns
  18. First person singular
  19. First person plural
  20. Second person singular
  21. Second person plural
  22. Third person singular
  23. Third person plural
  24. Demonstrative pronoun
E. Nominal forms
  25. Nominalizations
  26. Nouns (common and proper)
F. Passive forms
  27. Passives with 'se'
  28. Passive with 'ser' without agent
  29. Passives with 'ser' with agent
  30. Passives with 'estar'
G. Lexical specificity
  31. Type/token per form relationship
  32. Type/token per lemma relationship
H. Stative active forms
  33. Durative 'ser'
  34. Non-durative 'estar'
[...]
  35. [...]
  36. Private [...]
  37. Persuasive [...]
  38. [...]
J. Modal verbs
  39. Possibility
  40. [...]
  41. Obligation
  42. Volition
K. Modality markers
  43. Hedges
  44. Boosters
L. Adverbs
  45. Place
  46. Time
  47. Manner
  48. Quantity
M. Subordination markers
  49. Noun clauses with 'que'
  50. Relative adjective clauses
  51. Adverbial clauses of reason or cause/effect
  52. Adverbial clauses of concession
  53. Adverbial clauses of condition
  54. Adverbial clauses of time
  55. Infinitive phrases with noun function
N. Prepositional phrases and adjectives
  56. Prepositional phrases (noun complement)
  57. Attributive adjectives (descriptive)
  58. Predicative adjectives
  59. Demonstrative adjectives
  60. Participles with adjectival function
O. Coordination markers
  61. Adversative, additive and disjunctive conjunctions
P. Negation markers
  62. [...] adverbs
  63. [...] of temporal negation
  64. Negation conjunction
  65. Negation pronouns
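Steps 7 and 8 of the procedure listed earlier feed normalized frequencies into a factor analysis, which operates on the correlations among features across texts. The following Python sketch illustrates only that preliminary correlation step, under stated assumptions: the feature names and the per-1,000-word rates are hypothetical, the helper functions `pearson` and `correlation_matrix` are illustrative, and factor extraction and rotation, which the authors ran with statistical software, are not reproduced here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of rates."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant feature has no defined correlation")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(rates):
    """Map each pair of feature names to their correlation across texts.

    `rates` maps a feature name to a list of per-1,000-word rates,
    one value per text, with texts in the same order for every feature.
    """
    names = list(rates)
    return {a: {b: pearson(rates[a], rates[b]) for b in names} for a in names}


# Hypothetical rates for four texts; not taken from the corpus.
rates = {
    "time_adverbs": [12.0, 10.5, 3.1, 2.4],
    "second_person_pronouns": [9.8, 8.9, 1.2, 0.7],
    "nominalizations": [4.0, 5.2, 18.3, 20.1],
}

matrix = correlation_matrix(rates)
for a in rates:
    print(a, [round(matrix[a][b], 2) for b in rates])
```

In this invented data, the two interactive features rise and fall together while nominalizations move in the opposite direction, which is the kind of co-occurrence pattern a factor analysis would group into a single dimension with positive and negative loadings.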
[...] In an [...] advances, and [...]ficient to handle and carry out [...] corpus of this dimension, we decided to [...] and have [...] corpus annotated [...] program is called Connexor and it runs [...] Linux. [...] that some researchers and students are sceptical of approaching corpus linguistics because of their lack of computational abilities, a parallel objective was to produce a computational tool that could be easily handled and operated. We therefore had to develop a website with a windows interface. El Grial is now available and acts not only as an interface with which to interrogate the annotated corpora, but also as a database that contains all corpora collected by my research team (Parodi 2006). Several steps had to be overcome in order to achieve a fully operating system. Extensive details of the still ongoing process, the versatility of the interface, and the available tools being developed can be found at www.elgrial.cl and in Parodi (2007).

Thus, we proceeded to tag our corpus, revised it manually, so as to attain a high level of reliability, and recorded the frequency of the occurrence of all 65 linguistic features. In a more detailed description, the procedure applied to the El Grial PUCV-2003 Corpus by the Chilean team consisted of:

1) SGML (Standard Generalized Markup Language) codification
2) sentence splitter
3) morphological and syntactic annotation (El [...]
4) linguistic and stochastic disambiguator.

2.5 Factor [...]

As is well known, factor analysis [...] groupings of [...] co-occur in [...] texts. This analysis identifies correlations between a large number of variables ([...] this case, 65 linguistic [...]) and [...] those that have a similar distribution. The factor structure in which the variables that tend to co-occur [...] is the result of a correlational matrix of all the variables involved. Each group of variables is described as a factor, which is further [...] in terms of functional categories as [...] variation dimension. This reduces the variables involved [...] virtue of their co-occurrence ([...] et al. 1999; Oakes [...]). Once the factor [...] was carried out [...] factors were determined [...] Oakes [...]. These factors were confirmed [...]

[...] this section, we [...] two types of results: (1) the factors are [...] and the communicative functions shared [...] the sets of features are determined [...]; and (2) the dis[...] of these groups of features [...] the three [...] LLC [...] and the three TSC technical-scientific areas (maritime, industrial and commerce) are analysed. In other words, we proceeded to construct a functional interpretation of the statistical parameters that were found and to [...] their incidence on each of the registers and areas of professional specialization.

3.1 Factors and dimensions

As mentioned above, five optimal factors emerged from our factor analysis. Loadings with an absolute value of less than .40 were excluded from the [...] because, in this kind of study, these features are generally deemed to have no relative significance for the interpretation, even though they may be statistically significant. Other studies use loadings with a value of .35 (Biber 1988); nevertheless, we decided that a higher figure should be used. Due to Oblimin rotation, and in order to ensure a more realistic study in which linguistic features may be present as part of different interactions in texts, we preferred not to remove a feature from subsequent factors once this was included in a previous factor. It is clear the interpretation may therefore be more complex but nevertheless more accurate.

The results, presented in the following tables, show the features and their corresponding positive and negative factor scores in each of the five dimensions.

FACTOR 1
Dimension 1: Contextual and Interactive Focus

Cause/effect adverbial clauses                 .945
Time adverbs                                   .934
Negation adverbs                               .928
Second person singular pronouns                .911
First person singular pronouns                 .823
Second person singular inflections             .813
Negation pronouns                              .731
Place adverbs                                  .723
Indicative mood                                .693
First person singular inflections              .668
[...]                                          .662
[...]                                          .652
Active forms                                   [...]
Modal verbs                                    .630
Demonstrative pronouns                         .592
Second person plural pronouns                  .531
Adverbial clauses of condition                 .523
Adverbs of [...]                               .503
Noun clauses                                   .497
Adverbial clauses of time                      .487
Private verbs                                  .474
Infinitive phrases with noun function          .466
Present tense                                  .424

Prepositional phrases as noun complement      -.545
Nominalizations                               -.479
Nouns                                         -.443
Participles with adjectival function          -.437

To interpret Factor 1 and be able to determine its underlying dimension, the functions shared by most of the co-occurring features should be evaluated. Out of the 23 initial features with statistical scores of over .40, 15 features with weights over .60 stand out. It should be noted that the positive features co-occurring in this factor present the largest positive weights when compared to all the features characterizing the other four factors. On the other hand, features with negative values in this factor are relatively fewer in number and only four of them present weights over .40.

In decreasing order (from highest to lowest values), the most representative features are the following: adverbial subordinates of cause/effect (.945), time adverbs (.934), negation adverbs (.928), second person pronouns (.911) and first singular pronouns (.823), verb inflections of second person singular (.813), negation pronouns (.731), place adverbs (.723), indicative mood (.693), verb inflections of first person singular (.668), demonstrative pronouns (.592), second person plural pronouns (.531), adverbial clauses of time and present tense (.424). Moreover, this is one of the factors that present the greatest number of negative [...]; among these are prepositional [...] (-.545), nominalizations (-.479), nouns (-.443) and par[...] function (-.437). Most of the features in this factor [...] on [...] close functional relationship. Their interpretation is straightforward. [...] the structure of Dimension 1 does not constitute a unique [...] of the factor extraction but is based on underlying lin[...] and communicative issues. This emerges as a powerful dimension [...] a substantial variation pattern between the oral and the [...] and texts in [...]

[...] be held [...] pronouns reveal a reference [...] context, determine order frames for the [...] of a link with the action and express motives [...] the sequence [...] also the author's commitment [...] contain the mark of what has been situated (Kovacci [...]; Pérez-Rioja 1971; RAE [...]). Along with these situated and contextual [...] the indicative mood and the present tense indicate an experiential declarative mode ([...] 2000; Bassols and Torrent 1997; Criado de Val 1962; Gómez [...]; Peronard [...]; Hernández [...]) [...] of oral interviews [...] 2002). First and second person singular pronouns refer directly to participants. They are classically considered as markers of the subject's presence in the text and signal an interpersonal focus plus a more committed style. The latter features normally involve a specific addressee and signal a high degree of action, since they refer to direct communication with a direct style (De Kock and Gómez 2002). Hence, they represent dialogical interaction.

Features with negative scores, such as nominalizations and nouns, are classically considered carriers of the text's referential load, but here their low frequency signals a low informational density (Biber and Finegan 1986; Biber 1988; Chafe 1982, 1985; Chafe and Danielewicz 1987; Ciapuscio 1992; Di Tullio 1997; Halliday 1993; Picallo 1999; Parodi and Venegas 2004; Cademártori et al. 2006). Likewise, adjectival structures (participial phrases with adjectival function and prepositional phrases) act as expansions in the textualization process; that is, they contribute to the integration and precision of a great deal of information in a text (Ciapuscio 1992; Harvey 2002). As can be observed, the texts characterized by these types of features are often highly abstract, condense a substantial amount of information and express complex meanings, thereby constituting a resource of typical technical and scientific language (Biber 1988; Burdach 2000; Hernanz 1999; Moyano 2000; Zarzalejos 2001).

In sum, the dimension that can be inferred from this factor is reified in action, in sequences of events and in interpersonal, dialogical relationships, which is also noticeable through temporal, spatial and demonstrative deixis. All the linguistic features involved point, as a whole, to the supposition that the texts characterized by this dimension do not contain highly abstract information, nor do they evidence a concise or precise integration of the information. On the contrary, the high frequency of occurrence of these typifying, statistically positive features concurs with a focus on explicitness, on dependence on context and on the interlocutors' active participation, which are classical features of oral and dialogical discourse. This emerging dimension characterizes discourses where spatial-temporal marks are explicit and where there is mutual collaboration among the participants. This makes these discourses more authentic and less planned, where spon[...] and interactivity are [...] and within which multiple discourse organizations can co-exist, as narration, exposition, argumentation [...]
[...] oral and [...]

Given the interest in showing the relationships between the registers (technical-scientific, TSC; literature, LLC; and semi-structured oral interviews, OIC) and - more specifically - in isolating identifying variables in technical-scientific discourse in teaching contexts, these three registers are compared through each of the dimensions interpreted. To achieve this, a statistical study is carried out from the factor scores that sum up the frequency of each feature in each factor for each text. The factor scores for each text are averaged with all the texts in a specific corpus (TSC, LLC and OIC) and, in this way, a mean of the factor score is obtained for each dimension. These mean scores in each corpus or register are compared in order to determine the types of existing relationships (similarity or difference) (Hair et al. 1999).

Factor 2 is presented below, and then the corresponding analysis is carried out.

FACTOR 2

Dimension 2: Narrative Focus

Third person singular pronouns                         .842
First person plural pronouns                           .828
Periphrastic future                                    .823
Imperfect past                                         .820
Third person plural pronouns                           .708
Indicative mood                                        .686
First person plural inflections                        .667
Modal verbs of volition                                .651
Indefinite past                                        .614
Negation pronouns                                      .590
Private verbs                                          .577
Place adverbs                                          .533
Second person singular pronouns                        .529
Perceptive verbs                                       .496
Negation adverbs                                       .493
Time adverbs                                           .482
Active forms of non-durative 'estar'                   .460
Public verbs                                           .445
First person singular pronouns                         .431
Third person singular inflections                      .423
Adversative, additive and disjunctive conjunctions     .411
Infinitive phrases with noun function                  .405
Negation conjunction 'ni'                              .402
Nominalizations                                       -.581
Prepositional phrases as noun complement              -.562
Attributive adjectives                                -.442

Factor 2 also presents 23 co-occurring features distributed between scores of .84 and .40. In decreasing order of weight, the features are third person singular pronouns (.842), first person plural pronouns (.828), periphrastic future (.823), imperfect past (.820), third person plural pronouns (.708), indicative mood (.686), first person plural verb inflections (.667), modal verbs of volition (.651), indefinite past (.614), negation pronouns (.590), private verbs (.577), place adverbs (.533), second person singular pronouns (.529), perceptive verbs (.496), negation adverbs (.493), time adverbs (.482), active non-durative verb 'estar' (.460), public verbs (.445), first person singular pronouns (.431), third person singular verb inflections (.423), adversative, additive and disjunctive conjunctions (.411), infinitive phrases with noun function (.405), and the negation conjunction 'ni' (.402). Meanwhile, among those with negative values are the features related to nominalizations (-.581), prepositional phrases with a noun complement function (-.562) and attributive adjectives (-.442).

The meaningful presence of personal pronouns, especially those of the third person singular and first person plural (that is, human subjects and story protagonists) (Longacre 1983), just as their respective verb inflections (the most common verb resource in Spanish), shows a marked stress on the identification of the persons in the discourse, those present at the moment of stating something and those absent in relation to those present (Calsamiglia and Tusón 1999). In keeping with the above, those features associated with past tenses - the imperfect past, which describes situations and circumstances, and its counterpart, the indefinite past, which signals events and therefore the dynamism of actions (Kovacei 1993) - co-occur, thus denoting a direct reference to the narrated world (Arroyo 2000; De Kock and Gómez 2002; Weinrich 1974). No less important is the co-appearance of the periphrastic future, showing that, although narrated events are generally situated in the past, others take place in the characters' present as well (Contreras 2000). These verb tenses are associated with the use of the indicative mood, which helps the protagonists present states and actions as real (Gómez and Peronard 1988). All of this is complemented by the use of modal verbs of volition, private verbs, perceptive verbs and public verbs that aim at accounting for the speaker's subjective attitudes (Arianz~n 2001). In fact, what is remarkable in mental-activity verbs - used above all in
[...] of the verb [...]. Likewise, the [...] 'ni' and additive and disjunctive conjunctions support the sequence of events in the [...] (Biber 1988; De Kock and Gómez [...] Pérez-Rioja [...] RAE [...] the stative active verb 'estar' stands out because of its frequency in descriptive sequences (Bassols and Torrent 1997; Lorente 2002), and the infinitive phrases with a noun function are accounted for by their incidence in constructions where modal verbs of volition and private verbs appear, which refer to the participants in a real or fictional communicative event (Hernanz 1999).

Features with negative value, such as nominalizations, prepositional phrases and attributive adjectives are complemented in the integration and density of information (Burdach 2000; Chafe 1982, 1985, 1994; Ciapuscio 1992; Halliday and Martin 1993; Hernanz 1999; Janda 1985; Moyano 2000; Zarzalejos 2001). The dimension, which emerges from Factor 2, identifies itself with a chronological succession of events set mainly in the past and with a description of all which surrounds those events. The above is complemented with time and place markers. Additionally, the strong incidence of the personal deixis helps express the presence of the protagonists by means of either the presence of internal points of view influenced by the speaker's consciousness or external points of view situated outside that consciousness. Internal, perceptive and private states correspond to the first point of view; public states correspond to the second. In sum, this factor is associated with a sequence of events, which implies circumstances of time and place as well as the participation of people in the discourse. Therefore, Factor 2 helps create a functional dimension called narrative focus.

In conclusion, this dimension is associated with a sequence of events indicating precision of temporal and spatial circumstances, as well as participation of the first and second person in the discourse. Unlike highly [...] texts, the above helps identify literary texts, oral or written.

FACTOR 3

Dimension 3: Commitment Focus

Private verbs                                          .824
First person singular pronouns                         .789
Indefinite past                                        .705
Modal verbs of volition                                .655
First person singular inflections                      .640
Indicative mood                                        .630
Imperfect past                                         .604
Negation pronouns                                      .569
Second person singular pronouns                        .563
Second person plural inflections                       .562
Infinitive phrases                                     .518
Noun subordinates                                      .467
Adverbial subordinates of concession                   .452
Active forms of non-durative 'estar'                   .435
Second person plural pronouns                          .427
Time adverbs                                           .411
First person plural pronouns                           .402
Prepositional phrases as noun complement              -.457

To interpret Factor 3, 17 features have been considered. These features co-occur with values over .40. From the largest to the lowest scores, the features considered are the following: private verbs (.824), first person singular pronouns (.789), indefinite past (.705), verbs of volition (.655), verb inflections of first person singular (.640), indicative mood (.630), imperfect past (.604), negation pronouns (.569), second person singular pronouns (.563), verb inflections of second person plural (.562), infinitive phrases (.518), noun subordinates (.467), adverbial subordinates of concession (.452), active non-durative verb 'estar' (.435), second person plural pronouns (.427), time adverbs (.411) and first person plural pronouns (.402).

Private verbs (Biber 1988; Weber and Bentivoglio 1991), verbs of volition (Gómez 1999) and the first and second person singular pronouns refer to the participants in a communicative act (Fernández 1999), specifically to persons who express their intentions and attitudes. Likewise, first person pronouns, first person singular inflections and second person plural inflections confirm, as an essential characteristic of this factor, the discourse writer/speaker's explicit involvement (Calsamiglia and Tusón 1999; Crismore 1989). The presence of indefinite past markers suggests that the verbs previously described refer to past actions with a determined temporal end; that is, to constructions that mark the result of the action that the verb expresses. The indicative mood, on the other hand, makes reference to real facts localized in a real time (Contreras 1984; Criado de Val 1962) and expresses the experiential declarative mode typical of oral exchange discourse (Cepeda 2002). Similarly, it is suggested that the indicative mood is a feature through which states or actions are expressed as real (Gómez and Peronard 1988); that is, this feature characterizes linguistic exchanges whose referents are concrete facts in a given here-and-now. In subordinate noun clauses, the use of the subjunct 'que' is semantically conditioned: it [...] designate events or processes that are not observed in their execution but in their result - that is, as already established facts conceived as something previous to the statement (Delbecque and Lamiroy 1999). The presence of this feature suggests that the interlocutors in the discourse
[...] co-occurrence of the above features becomes relevant when [...] 2001), the [...] are associated with a [...] informative load because constructions with a greater abstraction.

The interpretation of strongly co-occurring features in this factor, such as private verbs (decide, guess, feel, determine, demonstrate, appreciate, recognize), verbs of volition, pronouns and first person verb inflections constitute Dimension 3 Commitment Focus. This dimension is associated with texts in which the writer/speaker's intention and attitude are more relevant than the message itself; in other words, this dimension characterizes texts where there are real participants that express propositional intentions and attitudes towards what has been said. The clear identity of the writer or of the speaker is explicit in the text, and the participant takes responsibility for and gets involved in what s/he says and does. This commitment to discourse and its content reveals the writer/speaker's affections and purposes. The clear presence of yo through pronouns and first person verb inflections is strong evidence of becoming involved in the discourse in an explicit mode and playing a more controlling role.

FACTOR 4

Dimension 4: Modalizing Focus

Active forms of durative 'ser'                         .671
Hedges                                                 .656
Modal verbs of possibility                             .641
Adverbs of manner                                      .606
Predicative adjectives                                 .565
Third person plural inflections                        .549
Adjective clauses                                      .514
Third person singular inflections                      .405
Nouns                                                 -.494

For Factor 4, eight co-occurring positive features and one with negative value have been considered. As in the previous cases, features with a score below .40 are ruled out. The features are active forms with 'ser' (.671), attenuators (hedges) (.656), modal verbs of possibility (.641), modal adverbs (.606), verb inflections [...]

[...] with subordinates that attribute very [...] - for which the [...] has neither [...] - to both the noun [...] the events are [...] or actions are carried out (Bosque 1990). It can be said that both features (adjectival clauses and adverbs of manner) refer to processes of incidence of one linguistic segment on another one to form a higher unit (Hernández 2000b). Likewise, their presence helps articulate a more expanded syntax with a lower degree of precision. On the other hand, attenuators - informal and less specific markers of probability and uncertainty - present the content of a proposition as uncertain (Markkanen and Schroder 2000). Their co-occurrence together with adverbs of manner and modal verbs of possibility characterizes more modalized texts, in which the expression of the speaker's or writer's attitude toward the content declared is marked; in other words, the speaker's particular view of the content of the statements s/he says stands out (Calsamiglia and Tusón 1999; Hyland 1998). In sum, the systematic conjunction of these features is associated more strongly with discourses which emphasize how things are said in relation to the content (modus), rather than on what has actually been said (dictum).

The noun has classically been associated with the referential function. The negative value of this feature (-.494) and positive scores of the features mentioned above strongly suggest a relationship between Factor 4 and a function more descriptive than informative, more expressive than referential. Therefore, as a way of reflecting the main co-occurring functions, this dimension has been labelled as Modalizing Focus, and is mainly associated with discourses containing explicit attitudinal markers.

FACTOR 5

Dimension 5: Informational Focus

Modal verbs of obligation                              .496
Subjunctive mood                                       .494
Nominalizations                                        .456
Participles with adjectival function                   .413
Prepositional phrases as noun complement               .413
Third person singular inflections                     -.632
Indefinite past                                       -.630
Active forms of non-durative 'estar'                  -.595
Private verbs                                         -.575
Negation pronouns                                     -.572
Modal verbs of volition                               -.503

The most [...] correlation in Factor 5 is that between the [...] verbs of [...] mood [...], [...] with [...] function [...], prepo[...], third person singular verb inflections (-.632), indefinite past (-.630), active non-durative verb 'estar' (-.595), private verbs (-.575), negation pronouns (-.572) and modal verbs of volition (-.503).

The presence of co-occurring positive features, such as modal verbs of obligation and the subjunctive mood, accounts for the necessity and certainty of the judgements expressed, essentially corresponding to a deontic mode (Hyland 1998; Osorno 2000). Moreover, in the case of the subjunctive, it refers to syntactically more complex organizations (Gili Gaya 1980) and to subordination, the function of which is to frame the information of the discourse (Delbecque and Lamiroy 1999; Galán 1999). However, this same feature can be used to express subjective speculation and command (Criado de Val 1962; Gómez and Peronard 1988). The co-occurrences of nominalizations, participles with an adjectival function and prepositional phrases are presented as signals of integration and compactness of highly abstract information, and are typical of nominal academic discourse (Biber 1986, 1988; Burdach 2000; Chafe 1982, 1985; Janda 1985; Picallo 1999).

Negative features that present a larger weight in this factor are: the third person singular verb inflections, which are used when there is risk of misconstruing the reference because the information contributed by the context fails (Castellano 2000); the indefinite past, which sets the action of an event in a finished temporal space, where the action is one which is repeated (Contreras 1984); the stative active verb 'estar', which forms part of the so-called units of knowledge, although it has no specialized value (Lorente 2002); private verbs, noticeable for expressing intellectual states or non-observable intellectual acts (Biber 1988; Weber and Bentivoglio 1991); negation pronouns, the use of which is mainly colloquial (Sánchez 1999; Tottie 1983); finally, modal verbs of volition, which account for either a down-to-earth mode (Langacker 1990) or a participant-oriented mode, that is, a mode which outlines the subject's status (Olbertz 1998). It should be noted that all the features in this factor are present in prototypical written discourses and, in particular, in scientific research articles (Burdach 2000; Cornillie 2003; Criado de Val 1962; Harvey 2002; Hyland 1998).

To sum up, the [...] in this factor are essentially oriented toward [...] the concentration of information [...] as [...]. It can thus be described as a set of [...] informative load and is associated with a [...] of information, [...] referential. As can be observed, the discourses characterized by these types of features are often [...] condense relevant amounts of data and express meaning. On the other hand, the negative features in this factor are geared toward a contextualization of a subject's intellectual state or non-observable acts at a given moment (Ciapuscio 1992). The presence of the above-mentioned features helps to distinguish between discourses with a great deal of information, and thus with a greater degree of abstraction, and those discourses that contain a lesser amount of information; therefore, we decided to name Factor 5 Informative Focus.

Considering all the factors involved, each of these five dimensions is the result of a distinct set of co-occurring linguistic features and each potentially defines a divergent group of similarities and differences between registers and areas of specialization. It is worth noting that, because of the type of rotation selected (Oblimin), which is more adequate for managing linguistic data and is more accurate when facts are analysed, the makeup of one factor may present features repeated in a previous factor, since these are not removed from the analysis once they are included in a factor. The above involves potential complexities when interpreting the functions. The makeup, nevertheless, turns out to be more realistic as far as human language and its association with a given dimension is concerned. As has been shown, a group of characterizing features is readily detectable in language as a constitutive part of communicative function; it is the group of features in systematic co-occurrence which reveals a singular variation pattern that offers possibilities for communicative interpretation.

Dimensions 1, 2 and 5 (Contextual and Interactive Focus, Narrative Focus and Informational Focus) show themselves to be quite distinct because most of their positive features are different. Thus, it is possible to establish a clear separation between the functions they represent and the types of texts they differentiate. This acute and delicate distinction is based on the fact that many of the negative features also present heterogeneity and, in so doing, draw attention toward stereotyped registers or types of texts with very little crisscrossing. Nevertheless, Dimensions 3 and 4 (Commitment Focus and Modalizing Focus) do not seem to aim at categories so finely distinguishable. In fact, they tend to be similar in some aspects. This fact is accounted for by, on the one hand, their sharing a number of linguistic features whose underlying functions are very similar and, on the other hand, by the fact that some of their features, though not identical, tend to render a similar functional interpretation. The above is not surprising because, as was anticipated, the Oblimin rotation of the data entails this sort of result - results that account for natural languages.
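As a rough illustration of the point about Oblimin rotation, the following minimal Python sketch applies the .40 cutoff to a handful of loadings quoted in the factor tables of this chapter and lists the features that remain salient on more than one factor. The function names (`salient_factors`, `shared_features`) and the data layout are assumptions made for this example, not part of the original analysis, and the weak Factor 1 loading given for hedges is hypothetical, included only to show the cutoff at work.

```python
# Subset of loadings quoted in the factor tables of this chapter.
LOADINGS = {
    "private verbs": {"Factor 2": 0.577, "Factor 3": 0.824, "Factor 5": -0.575},
    "indicative mood": {"Factor 1": 0.693, "Factor 2": 0.686, "Factor 3": 0.630},
    "nominalizations": {"Factor 1": -0.479, "Factor 2": -0.581, "Factor 5": 0.456},
    "hedges": {
        "Factor 4": 0.656,
        "Factor 1": 0.12,  # hypothetical weak loading, not from the chapter
    },
}

CUTOFF = 0.40  # loadings below .40 in absolute value are ruled out


def salient_factors(loadings, cutoff=CUTOFF):
    """Map each feature to the factors where |loading| >= cutoff."""
    return {
        feature: sorted(
            factor for factor, weight in by_factor.items() if abs(weight) >= cutoff
        )
        for feature, by_factor in loadings.items()
    }


def shared_features(loadings, cutoff=CUTOFF):
    """Return the features that are salient on more than one factor."""
    return sorted(
        feature
        for feature, factors in salient_factors(loadings, cutoff).items()
        if len(factors) > 1
    )
```

With these values, indicative mood, nominalizations and private verbs are each salient on three factors, which is the kind of overlap the text attributes to the oblique rotation, while hedges remain tied to Factor 4 only.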
[...] is well known, there is no definite consensus on the basic characteristics of spoken and written discourse ([...] 1987; Chafe and Danielewics 1987; Biber 1988). In Graph 1, we observe empirical data that show these two registers do not share the same dimensional prototypical characterization. It is clear that the five dimensions built upon the 65 features do reveal a differentiating factor between oral and [...]

Graph 2.1 [figure: written register and oral register plotted across Dimensions 1 to 5; caption text not recoverable]

[...] way, the first four dimensions are [...] of some interactive oral [...] nevertheless, it is also obvious that some of the features involved may appear in written texts. This means that no absolute differences between speech and writing can be definitively stated. At the same time, the data support the observation that interactivity and contextualization seem to be the more representative functional features of the oral corpus described in this research.

In order to fully understand the nature and function of the analytical comparison of two registers in a dimensional space, it is important to understand that the nuclear argument here is that some strong patterns of use in one register (e.g., oral) often represent only weak patterns in other registers (e.g., written). Thus, following this empirical analysis, we have relevant information of both the extent and the ways in which any two registers are different.

The next graph, Graph 2, gives information on the five dimensional scores in two groupings of texts (specialized/non-specialized). Therefore, we compare the specialized register (written texts from the TSC) and that which we call a non-specialized or general register (the oral corpus, OIC, and the written narrative corpus, LLC).

Graph 2 reveals a similar situation to that of Graph 1, where oral and written registers were compared. In this case, however, the specialized register receives negative values in the first four dimensions and only points positive along Dimension 5. Interestingly, a completely contrasting distribution of figures occurs with the non-specialized registers (oral conversations and written literary texts). These texts receive negative values only along Dimension 5, which means these linguistic configurations do not tend to represent a more general, non-specialized register.

One major difference between the groups of registers being compared can be established along Dimension 5; that is to say, there is empirical evidence in favour of the existence of a so-called specialized discourse. Furthermore, one can see an interesting linguistic pattern of configuration between oral/non-specialized and written/specialized registers. Of particular interest for the objectives of this research are the surprising findings that a certain written mode of the language seems to be closely linked to the specialization of communication. So, it could be said that the co-occurring features of Dimension 5 tend to identify the prototypical version of
written/specialized language, while the absence of such systematic organization is closer to oral conversation and interactive discourse.

Graph 2.2 Comparison of specialized and non-specialized registers [figure: scores plotted across Dimensions 1 to 5]

In the preceding section, we noted that the informational function of language is represented in Dimension 5. It is this group of systematic co-occurring linguistic features which provides the distinctive characteristic of linguistic variation between these registers. As we have observed, this enables empirical comparative analysis not only of the way in which, but also of the extent to which, two registers may differ.

Graph 3 shows the mean scores per dimension (calculated according to factor score) for each one of the three registers in the El Grial PUCV-2003 corpus (technical-scientific, literary and oral) with respect to the five dimensions in question. Let us recall that the aim is - through the dimensions already determined - to compare the registers and to determine specific relations in the Spanish written language, specialized and non-specialized, as well as the oral non-technical-scientific texts; to be more precise, the aim is to identify potential linguistic and functional differences or similarities in the texts under study.

Graph 2.3 Dimensions and three registers [figure: scores plotted across Dimensions 1 to 5]

According to this data, the TSC corpus obtains relatively similar negative factor scores throughout the first four dimensions. This homogeneity reveals that, on the one hand, the features that identify these four dimensions must present an occurrence similar to the texts in the corpus; interactivity, contextualization, modalizing and commitment would be neither typical nor [...] occurring dimensions in the technical and scientific texts, compared to the other two registers. On the other hand, it is clear that Dimension 5 Informational Focus obtains a mean positive score ( .1), which helps identify and determine a statistically significant distinction between the TSC and the LLC and the OIC. Nominalizations, prepositional phrases as noun complements and adjectival passive participles, among others, are features that show high relevance in these written texts. These features contribute to conforming to the informational mechanism, which clearly moves away from interpersonal and affective contents. At the same time, as can be observed, the written texts
[...] pronouns, time, [...] and demonstrative pronouns and [...]. This reveals [...] intersection areas between the [...] of the interviews and the [...] of Latin American literature. It also [...] to distinctively separate highly specialized texts, where grammatical constructions of more complexity and greater packaging and reduction of information are detected than those which involve the participants and their interpersonal relations in a detailed way. In the latter texts (oral and written), the author's involvement is detected and expressed in a more explicit way through specific linguistic markers (certain types of verbs, adverbs, pronouns, etc.). This fact can also characterize written specialized discourse, but most of the time it is kept implicit by using other resources.

As can be observed from the data, variation between registers faces a continuum, identified in this case through the dimensions and the linguistic features captured by the dimension; it also helps identify and prototypically characterize them. The implications derived from the above are multiple, particularly concerning specialized discourse for didactical dissemination - the central focus of the present research. Developers of didactic materials addressing specialized discourse in written Spanish and whoever specializes in their teaching should take advantage of these descriptions.

Graph 4 shows the factor scores per dimension for each of the three areas of specialization in the El Grial PUCV-2003 corpus (commerce, industrial and maritime).

Graph 2.4 Dimensions and domains of specialization [figure: scores plotted across Dimensions 1 to 5]

It is worth noting that Graph 3 shows a clear differentiation in the TSC between Dimension 5 Informational Focus and the other two registers (LLC and OIC); in this more detailed analysis, it is precisely Dimension 5 which seems to show the greatest distinction between the technical-scientific domains. As can be seen, the commerce area presents the largest mean positive value on this dimension (3.8). This fact reflects its heaviest informational load through high lexical density and syntactic complexity.

Parodi (2004) has already detected, by means of a simple descriptive study, some internal variability in the behaviour of the 65 linguistic features between the technical areas. This preliminary investigation showed a distinctive pattern between the maritime area and the other two. As can be seen, the maritime area obtains the largest negative score regarding informational load (- [...]), precisely the dimension which marked a difference with the other two areas in the study alluded to above. Considering this result, Dimension 5 appears to distinguish between texts in one area and the other. Of course, it will be necessary to carry out an in-depth qualitative analysis to fully account for which types of commerce texts gather around this dimensional pattern.

Next in the hierarchy, Dimension 4 also contributes to the distinction between specialties. It is now the industrial area which reveals the highest positive mean score on the dimension Modalizing Focus (2.5), whereas the maritime and commercial areas obtain very similar negative scores, but with no statistical significance between them. From these data it can be inferred that the texts from the industrial area contain a greater regularity in the systematic patterns of occurrence around the distinctive features of attenuation and uncertainty. In the remaining dimensions (1, 2 and 3), the three TSC areas present negative and relatively similar figures. These facts indicate that these dimensions would not contribute to a differentiating explanation for the description of technical-scientific discourse for teaching purposes. It can also be suggested that the texts belonging to these specialized areas are not remarkable because of the occurrence in them of those features that denote interaction, interpersonal relations and involvement of the participants in the discourse.
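The mean scores per dimension compared across registers and areas of specialization are described earlier in the chapter as per-text factor scores averaged over all the texts in a corpus. A minimal Python sketch of that averaging step follows; the per-text scores, the dimension keys (`D1`, `D5`) and the function name `mean_scores_by_register` are hypothetical and chosen only for illustration, so the numbers do not reproduce the values plotted in the graphs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-text factor scores: (register, {dimension: score}).
TEXT_SCORES = [
    ("TSC", {"D1": -4.0, "D5": 2.0}),
    ("TSC", {"D1": -6.0, "D5": 4.0}),
    ("LLC", {"D1": 1.0, "D5": -1.0}),
    ("OIC", {"D1": 12.0, "D5": -3.0}),
    ("OIC", {"D1": 8.0, "D5": -1.0}),
]


def mean_scores_by_register(text_scores):
    """Average the per-text scores of each dimension within each register."""
    grouped = defaultdict(lambda: defaultdict(list))
    for register, scores in text_scores:
        for dimension, value in scores.items():
            grouped[register][dimension].append(value)
    return {
        register: {dim: mean(values) for dim, values in dims.items()}
        for register, dims in grouped.items()
    }
```

Calling `mean_scores_by_register(TEXT_SCORES)` yields one mean per dimension for each register; comparing those means across registers is the step that the graphs summarize. Any test of statistical significance between the means would be a separate step not shown here.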
Discussion and conclusions

[...] of the TSC was developed in the present [...] aim was to carry out an initial [...] from a multi-dimensional perspective. Although this might be considered an exploratory study based on a relatively small number of words, the results prove their power, along with the fact that the data obtained is clearly interpretable both statistically and functionally.

The challenge to move on toward a new research field for the Chilean team has been gradually but successfully tackled. Thus, to advance in this multidisciplinary field at our university, we are implementing technological tools which help gain a greater independence in the annotation of texts, and developing other means of friendly interrogation (www.elgrial.cl) for those who are not interested in learning computational languages, such as the commands in CQP. To this end a computer laboratory equipped with Spanish-language computer programs and specialized human support has been implemented, thanks to institutional and governmental funds.

As for the determination and identification of relevant features and text dimensions from a general perspective, multi-dimensional analysis proved to be a powerful method for computational corpus linguistics. The strengths of the MD analysis involve more than the mere description of the surface features of the texts: the determination of systematic regularities reveal communicative functions and approaches to an in-depth description of the function that a certain text contributes in a given context of language use. The strength lies in the multi-feature and multi-dimensional analysis. Moreover, the relatively large [...] of texts helped generate robust conclusions with larger implications. These aspects lead linguistics not only to consider an alternative research method, but also to visualize a paradigm that gives a new impulse to forthcoming research in Spanish.

The findings of the five dimensions, identified from a [...] analysis of the distribution of the 65 linguistic features across 90 texts, represent a resource of extraordinary descriptive potential. The analysis provides essential distinctions between the analysed texts in their spoken and written forms: the first appears to be anchored in interactivity, contextualization

[...] in the way of [...] the MDA has also [...] more [...] from others that are [...]

It will be necessary to carry out a more detailed analysis of the dis[...] between the three areas of specialization in the TSC (commerce, maritime and industrial), while estimations of factor scoring among the various text types that make up this corpus should continue to be made in order to inquire into the contribution of the MDA to the differentiation and description of those text types. The empirical background so far accounts for an interesting homogeneity along Dimensions 1, 2 and 3 in the three technical-scientific areas: these texts appear to have neither a strong mark of narrativity, nor of involvement of participants, nor of interactivity. On the contrary, the texts of the industrial and maritime areas present significant differences along Dimensions Modalizing Focus and Informative Focus. The industrial domain shows a tendency towards modalizing and the maritime domain can be distinguished from the texts of the other two areas through a greater degree of positive informative density.

Another possibly relevant finding that emerges from the results of this study is the similarity detected in several languages in terms of the determination of linguistic dimensions that contribute to describing and differentiating between the registers (Aijmer 2002; Biber 1994; Kittredge 1982; Reppen et al. 2002). The Spanish language, in general, as well as English, Somali and Korean, presents multiple dimensions that reflect at least interactivity, informational focus and commitment. In all these languages - thanks to the MDA - distinctions between speech, writing and several specific registers in each mode of language have been established. The variability registered in linguistic usage in the particular languages studied, though existing in this pattern sense, seems to suggest a possible degree of universality (see Chapter III, this volume), something which goes far beyond the aims of the present study.

Finally, this study has powerful pedagogical implications concerning the elaboration of language tests of a different nature, such as the assessment of technical content and of discourse comprehension. It is also important for the preparation of teaching materials. This is because the registers describe the characteristics of the type of language use employed in the material to
and interpersonal relations; that is, they are strongly described by Dimension in the areas mentioned, the technical-professional students are
1, Contextual and Interactive Focus; furthermore, the spoken register herein exposed. According to the results, these students need to develop their
described (interviews) has no direct relation to an informationally dense skills at mastering a very particular, specialized variety of 1-vTitten Spanish -
prose, from both lexical and syntactical perspectives. Although it is true that dense prose in lexical, morphological and syntactical terms,
the written coincides with the register on the but also texts v.rith modalizing marks.
[...] that cannot be [...] since in their professional [...] technical [...] also be suggested that it is not advisable [...] school students to [...] face texts marked by such a complex informational prose. Materials should be presented in a gradual progression, starting with those that are the most disseminated and moving on later to those texts that are more typical of the professional field in which they are going to be working.

PUCV-2003 [...] LINGUISTIC [...]

[...] past [...] has been defined as [...] tense [...] the tense in which the [...]

2. Imperfect past (indicative) [PRET. IMP]
This expresses durative or, rather, reiterated and habitual, wholly or partially simultaneous action with another past, durative or instantaneous action. That is why it is said to be a relative tense (Alvar 2000; Moliner 1986). In particular, it signals simultaneity with respect to a moment previous to the central focus (Alvar 2000). It is typically found in written language, specifically in narrative prose (De Kock and Gómez 2002). Bassols and Torrent (1997) point out that it is a tense with which the initial states of a narrative sequence are described, as well as the descriptions inserted in the account. They also identify its use in argumentative constructions and, perhaps, as Moliner (1986) states, the imperfect past in modal verbs (must, can, have to) expresses an opinion about the convenience or origin of things.
3. Perfect past (indicative and subjunctive) [PRET.PER.]
This tense expresses a past action that influences the moment of enunci-
ation and which lasts until the very moment in which something is said
(Moliner 1986). Gili Gaya (1980) argues that, although modern Spanish
establishes a difference between the indefinite past and the perfect past, vast
areas of Spain and Latin America have preferred one form to the other due
to the prevalence of the perfective aspect in both.
4. Present (indicative and subjunctive) [PRES]
This expresses the actions co-existing with the word's act (Gili Gaya 1980).
It is used for what is universal, and is habitual in maxims and sentences.
When the action refers to the moment of speaking, it is the current present.
But timeless truths and habitual actions are also stated with the present
tense. It can express past actions, when the historical present is used
(Moliner 1986). Likewise, when the action to be taken is certain, it can have
a future value (Moliner 1986).
5. Future (indicative and subjunctive)
The future tense expresses a forthcoming action, independent of other
actions. According to Gili Gaya [...], the use of the future assumes a [...] part; due to [...] it appears [...]
[...] should be noted [...]

This tense [...] to the world commented on, in contrast with the [...]. It is found in col[...]

B. Verb mood markers

7. Indicative/imperative [INDIC.IMP]
The forms of the second person singular of the imperative and the third person of the indicative coincide.

8. Subjunctive/imperative [SUBJ.IMP]
There is syncretism between the forms of the subjunctive and those of the imperative because the only specific forms of the imperative are the second persons of the singular (tú) and the plural (vosotros). The other persons are taken from the present subjunctive. In negative phrases, second persons are substituted for those of the subjunctive.

9. Indicative mood [MOD. IND]
This expresses a declarative experiential mode (Cepeda 2002), as well as states or actions considered to be real (Gómez and Peronard 1988). The indicative mood is typical of discursive oral exchange (Cepeda 2002), and makes reference to real facts situated at a true time (Criado de Val 1962; Gómez and Peronard 1988). [...] it refers to an event located in the past. According to Alcoba (1999), the indicative mood is used to make an assertion. In this mood, the [...] function prevails, and its distinct form is the logical or declarative. It is defined by the objective relationship between [...] and message and the declarative form and, since the declarative form is the unmarked form, the indicative mood is the most extensive (Hernández 1996).

10. [...] mood [MOD. ...]
[...] speculate about uncertain facts, to [...] a subjective appraisal (Criado de Val 1962). In this mood, the subjectivity or the [...] of the communication facing the statement is expressed; it is the mood [...] and [...] In all subjunctive [...] the presence of the speaker and that of the statement is perceived. The [...] can appear, moreover, as agent of the [...] or as subject of the statement, which is something that cannot be found in the [...] mood ([...] 1996).

11. [...]
[...] function of this mood is that of direct command. It almost exclusively occurs in [...] ([...] de Val [...]). These verb forms [...] counter the other [...] are in the expressions of the deontic form of command. The imperative forms do not convey [...] except for the meaning of [...] although it is [...] express commands through other linguistic forms of the future and of the present indicative, subjunctive present, etc. (Gili Gaya 1980).

C. Verbal inflections

12. First singular [DES.1S]
This inflection reflects a text's egocentric nature; it implies a need for direct communication. It is typical of direct style, of written language and of narrative prose (De Kock and Gómez 2002). In scientific language and that of the dissemination of science, there is a common tendency to avoid references to the first person and to employ other procedures for the presentation of the author (Ciapuscio 1992).

13. Second singular [DES. 2S]
The second person is used with the purpose of causing a given effect: to generalize the stated experience and to include the interlocutor in a personal and emotional way (Calsamiglia and Tusón 1999). It is associated with colloquial language.

14. Third singular [DES.3S]
The third person singular inflection contributes notions of time, mood, person and number to the statement in which the verb form appears (Alcoba 1999). This inflection is typical of written language, particularly the scientific article (Kaiser 2002).

15. First plural [DES. 1P]
Identification of the speaker with the first person [...] incorporates the speaker into a group (Calsamiglia and Tusón 1999). The so-called we of modesty replaces yo because, on occasions, its use is considered inappropriate in public/formal or academic texts. Another function of the first person plural is including the interlocutor, getting him involved and, in this way, smoothing commands and requests. Ciapuscio (1992) has pointed out that this use is frequent in dissemination discourse addressed to a general audience. Likewise, in these text types as well as in those addressed to a more restricted audience for their dissemination, the author stresses the use of 'we'
to [...] for [...]

17. Third [...]
This inflection is [...] due [...] sentences, in which the discourse producer [...] shows him/herself disinterested, or [...] (Gili Gaya 1980).

D. Personal pronouns

18. First person singular [PRON. 1S]
Because conjugated verbs in Spanish are marked with the person's inflection, the subject pronoun is almost always unnecessary. In first and second persons, its appearance is emphatic and conveys particular insistence on making the subject stand out (Gili Gaya 1980). It directly refers to the participants, markers of the presence of the yo (Biber 1988). In narration, it tells apart the witness-participant or the protagonist (Bassols and Torrent 1997). It is always present when the context does not clarify the verb person sufficiently. It is typical of oral interviews (Castellano 2000). In general, personal pronouns of first and second person - called deictic - refer to the participants in the communicative act, a function that is typical of them. The essential semantic characteristic of personal pronouns is that they do not help assign truth-values to statements independent of context (Fernández 1999). Another important aspect is that first and second person pronouns are reversible in the sense that the yo being spoken cannot but yield right to a tú if the yo wants to have a valid interlocutor, although they are not necessary to express the concept of grammatical person (Fernández [...]

19. First person [...] [PRON. 1P]
With the form nosotros, the use of the pronoun yo is dodged, but this does not [...] in principle, important semantic differences. This use is considered to be more polite and, for this reason, its use is extensive in the academic genre (Lledó 1995). This pronoun can have several [...] of referents, in what is called 'fictitious plurals' (Alcina and Blecua [...]

20. Second person [...] [PRON. ...]
The forms [...] of written language in Spanish, in particular that of narration and poetry (De Kock and Gómez [...]). They require [...] addressee and indicate a high degree of interaction and [...] action ([...] Kock and Gómez [...]). In narrative texts, narration in the second person appears, in the case of the characters [...]

The [...] appear when there is risk of [...]ence because the information contributed [...] the context fails. It is com[...] [...]er with oral interviews [...] possess [...] use makes them similar to demonstratives, but also have [...] referential use; that is, they can retrieve the features of a person present in the context (Fontanella 1999). In the case of narrative texts, the [...] person matches the omniscient narrator (Bassols and Torrent 1997). [...] the other personal pronouns, the third person singular is the non-[...] that is, it is ruled out of the communicative act and refers not to the [...] but to an 'objective' situation. In this sense, it is the unmarked term; in fact, it does not exist in all languages (Fernández 1999).

23. Third person plural [PRON. 3P]
Just like third person singular pronouns, the subject pronoun appears when there may be ambiguity, since possible third persons may be many.

24. Demonstrative pronoun [PRON. DEM]
[...] like other linguistic elements grouped around the denomination of [...] the demonstratives signal, by selecting them, some elements of the context (Calsamiglia and Tusón 1999) and acquire a full sense in the context in which they are stated. They are common in oral speech.

E. Nominal forms

25. Nominalizations [NOMINAL]
This term designates names derived from verbs and adjectival bases, as well as the process of their formation. The names derived can have an event, a process, a state, a quality or a product as referent, as a result of an event or process, with the typical resource found in technical language (academic) to express complex and abstract [...] (Picallo [...] integrate information in a few words and function as conveyors of highly abstract information (Biber 1988). Ciapuscio [...] includes nominalization in the [...] of omitting the agent, [...]

26. Nouns and [...]
[...], nouns are the main conveyors of the text's referential meaning. Occurrence of some [...] of nouns [...] longest ones and those [...] is associated with [...] whose focus is infor[...] to a careful [...] of the information. The [...]
F. Passive forms

27. Passives with 'se' [...]
[...] actions, with internal [...] and implicit, [...] may appear in all types of contexts. [...] are used when the [...] of the [...] is the name of an object. In general, they are more frequent than the passive with 'ser' ([...] 1986). They appear both in oral speech and in written discourse. An increase in the use of this construction has been noticed in informative disseminating language (Mendikoetxea 1999a and b). Ciapuscio (1992) gives evidence of its frequency in scientific texts, with which the text's impersonal nature is stressed. According to this author, the tendency to omit the agent is typical of science language, which is maintained in its dissemination.

28. Passives with 'ser' without an agent [PAS.SER-a]
They specialize in focalized actions, with external objects and a marked intentional nature that denotes the existence of a delimited implicit subject. They are more frequent in written language. The absence of the agent has been attributed to the intention of keeping silence or of concealing the notional subject (Mendikoetxea 1999a and b).

29. Passives with 'ser' with an agent [PAS. SER+a]
These passive constructions normally express a notional subject, grammatically correct. The explicit inclusion of the agent appears most often in newspaper written discourse (Hernández 2000a).

30. Passives with 'estar' [PAS.ESTAR]
Grammarians have not agreed on the passive nature of these constructions. Those who do not accept it argue that passives with ser express an action that falls on the patient subject; instead, constructions with estar express the result of that action. According to Mendikoetxea (1999b), both constructions have a passive meaning.

G. Lexical specificity

31. Type/token relationship [TYP.TOK.form]
This is a relation in percentage terms that expresses the spectrum of variability of units in a universe. In this case, it refers to the lexical specificity of the forms in a text. A high rate is related to written language due to the [...] nature of its production. Conversely, a low rate is associated [...] because of the [...] of orality (Horowitz and [...]

[...] lemma [...] expresses [...] the lexical roots of the forms [...]

[...] active forms

[...] 'ser' [ACT.SER]
[...] verbs basically express equivalence, equality, or [...] relationships or attribute qualities or values, such as to be, to appear, to be equivalent. Ser is considered to attribute permanent qualities. Verbs of this type, primarily copulative or pseudo-copulative, although not identified as units conveying specialized knowledge, are part of the expression of that knowledge; that is, they do not have a specialized value but are part of specialized knowledge (Lorente 2002). Their main characteristic is their frequency in description (Bassols and Torrent 1997). With ser, judgements independent of immediate experience are made (Gili Gaya 1980). Their use in copulative sentences amounts to functioning as a link between the subject and the predicate; they are also used for expressing temporality (Gili Gaya 1980).

34. Non-durative 'estar' [ACT.ESTAR]
Because it is a connecting verb, it performs the same functions as those of the verb ser mentioned above but, unlike the latter, qualities are considered to be transient or accidental; besides, qualities are sensed as the result of a change or transformation (Gili Gaya 1980). Because it expresses states, it is lexically incapable of expressing a change or progress during the time lapse in which it takes place (De Miguel 1999).

I. Verb types

35. Public [V.PUBLIC] (say, explain, admit, agree, declare, [...], remark, suggest)
These verbs report actions typical of scientific [...] are verbs denoting actions that can be observed [...] they are primary [...] acts, such as 'say' and 'explain' and are used to introduce indirect assertions (Biber 1988).

36. Private [V.PRIVAD] (discover, believe, guess, find, feel, determine, show, deem, acknowledge)
These refer to activities perceived only by the speaker and are also used as mitigating verbs (Palmer 1974; Weber and Bentivoglio 1991). A typical characteristic of these verbs is to express intellectual states or non-observable intellectual acts (Biber 1988). They correspond to Halliday's (1994) mental processes and to Hyland's (1998) epistemic lexical verbs.
[...]ing.
These express intellectual states when the [...] on which the action falls is an abstract noun. They have a more concrete meaning when the objects on which the action falls are concrete nouns. They correspond to a subtype of [...] They reveal an internal focalization in [...]

J. Modal verbs

39. Possibility [V.MOD.POS]
These verbs express the speaker's/writer's opinion on the content expressed, generally mitigating the force of the statements said. They are [...] in scientific articles (Hyland 1998) as a way of anticipating potential objections and help the author appear less restrictive.

40. [...]
These express the writer's compromise with what is said in the scientific article (Hyland 1998), increasing the force of what has been asserted.

[...]
These express the point of view of the writer, who judges the truth of what has been said in terms of certainty [...]. Besides, the deontic mode is concerned with the [...] or the possibility of the actions being carried [...]

42. Volition [V.MOD.VOL]
These express [...] or [...] to a [...] mode.

[...] markers

43. [...]
These are informal and less [...] mark the content [...]

[...] are resources [...] the service [...] used to achieve communicative effects that go [...] information

44. Boosters
These accentuate the value of the verbs. They are used to positively signal the reliability of propositions. They can be used in non-propositional functions to signal solidarity with the interlocutor. Typical of oral interviews (Cepeda 2002).

L. Adverbs

45. Place [ADV.LUG]
These place the meaning of the verb in spatial coordinates and add information that completes the argumentative structure of the predicate (Bosque 1990).

46. Time [ADV.TIEMP]
Due to their deictic function, they set order frames in the sequence of events or contextual clues for the interpretation of what has been said (Kovacci [...]). They are circumstantial in post-verbal position. [...] act as circumstantial if they interrogate or negate.

47. Manner [ADV.MOD]
According to Bassols and Torrent (1997), they are [...] of description. In principle, they denote the manner in which events are [...] or in which actions are carried out.

To [...] ([...] 1971). As they help express [...] to giving descriptions greater [...]

M. Subordination markers

49. Noun clauses with [...]
Subordinate sentences introduced [...] que fulfil functions of nouns, [...] co-occur with [...] do not [...] to noun clauses. As far as the determiner
[...] events [...] results [...]

50. [...]
The use [...] sentences with relative pronoun [...]

51. Adverbial subordinates of reason or cause/effect
These signal the reason for the fact mentioned in the main sentence. They help express the cause or consequences (Gili Gaya 1980). The cause or consequence is usually introduced by a causal connector, which indicates that the following statements are the effects of antecedent reasoning (Calsamiglia and Tusón 1999).

52. Adverbial subordinates of concession [SUB.ADV.conc]
These express some reserve that lacks efficacy so that the opposite can take place in the main sentence, whose execution is carried out despite the obstacle (Bassols and Torrent 1997). They help build argumentative discourse.

53. Adverbial subordinates of condition [SUB.ADV.cond]
These indicate the condition that must be met so that what is expressed in the main sentence can be executed (Gili Gaya 1980). They contribute to making reasoning explicit; therefore they are typical of both expository and argumentative discourses.

54. Adverbial subordinates of time [SUB.ADV.tiem]
These express the time of the action in the main sentence, delimiting a previous, or a simultaneous, or a subsequent relationship (Pérez-Rioja 1971). Their function is to temporally locate the main action in relation to the subordinate. With the aid of these [...] it is possible to express the [...] nuances for which the sole presence of the verbs is not enough (Gili Gaya 1980).

55. Infinitive phrases with noun function [FRA.INF.nom]
These indicate an attempt to condense [...] omitting the participants (Biber 1988; Chafe 1982, 1985). They have an event-like nature relating to the description of processes or of alethic activities (Demonte and Varela [...]). From a logical point [...] they are open sentences, since the subject of the infinitive is a variable that is taken from the noun phrase appearing in the same linguistic context (Gómez 1999). According to Biber [...], these structures neither [...] nor [...] fixed functions.

[...] noun with very [...] characteristics for [...] does not possess [...] or lexical [...] as [...] or as [...] either [...] or optionally chosen by the verb. The modifying or attributive construction has a close relationship with the construction of the noun predicate, since almost all adjectives that function as predicates in copulative characterizing sentences can also be modifying. Adjectives are words that are applied to other words naming physical or mental objects; by means of adjectives, a property or a set of properties is ascribed to those objects. More specifically, a modifying adjective ascribes properties whose specification is used to define or delineate the mentioned entity with greater accuracy to characterize it or to identify it among other similar entities, to classify it or to establish cultural and scientific taxonomies or to indicate genetic or metonymical relationships (part-whole relationships). The most salient feature of adjectives (attributive and predicative), what differentiates them from nouns, is that they are general terms; thereby they can be applied to various objects and have a gradual nature (Picallo 1999).

58. Predicative adjectives [ADJ.PRED]
These are adjectives that express properties of the subject by means of attributive sentences (Gili Gaya 1980); to put it another way, their dependence on the noun is made indirectly through a verbal indicator (Marcos 1975). There are adjectives that combine with the verb ser: adjectives of relationship and of origin belong to this group. Other adjectives are used only with the verb estar. In general, these are those adjectives which indicate a result. For this reason, participles - except in passive constructions - always combine with the verb estar. It should be noted, however, that there are times when participles combine with the verb ser, because from this combination a different meaning emerges. Other adjectives that can be combined with the verb estar are contento (glad) and [...] (satisfied). The same occurs with [...] adjectives, which like participles indicate the result of something. Other adjectives combine with the verbs ser and estar, but the meaning varies in each case. With the verb ser, the adjective designates: a) a property inherent in what the subject designates; [...] one of the subject's relatively permanent characteristics which belongs to its description or which classifies it ([...] is, it introduces the subject to a class of entities). With the verb estar, [...] designate an acquired [...] a product of an actual or im[...] change.
These have a more [...] use [...] existential [...] function as nouns, the presence of an [...]duce a name in the discourse or [...] the signifier of the masculine and [...] express unity or [...] of existence (Picallo 1999).

60. [...] function of participle [PARTICI.adj]
This is principally found in written rather than oral discourse, and the usual [...] is that it is used for integration and structural elaborations (Biber 1988). Janda (1985) asserts that it is used in note-taking, because it is more compact and thereby good for the production of highly informational discourse when time is limited. Ciapuscio (1992) stresses the use of verbals, among them the participle, as a disagentivization resource, which is a mechanism to conceal the agent that helps condense the information. This author has verified its frequency in texts of science dissemination and in scientific texts.

O. Coordination markers

61. Adversative, additive and disjunctive conjunctions [CONJ.dis.adv.ad]
The conjunctions are typical of written language and of narrative prose (De Kock and Gómez 2002). Coordination is the grammatical procedure used to associate syntactical constituents without establishing a grammatical hierarchy among them (Camacho 1999). The use of these conjunctions is frequent as an indicator of simplicity in spoken language. According to Ávila (2000), [...] employs coordination more than subordination, [...] in face-to-face and distanced conversations.

62. [...]
These convey more [...] and fragmented information (Biber 1988). [...] are indicators of the negative form in the sense that [...] make reference to the sender's attitude toward the receiver and the message itself.
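
As a small illustration of the type/token relationship listed above under G. Lexical specificity (feature 31), the following Python sketch computes the percentage of distinct word forms among the running words of a text. The tokenization rule (lowercased runs of letters) and the function name `type_token_ratio` are assumptions made for this example; the chapter does not specify how the PUCV-2003 texts were tokenized, and this is not presented as the authors' procedure.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return distinct word forms (types) as a percentage of running words (tokens)."""
    # Assumed tokenization: lowercase the text and keep runs of letters,
    # including accented Spanish letters, as tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


# Hypothetical sample sentence: 14 tokens, 9 distinct forms.
sample = "el texto describe el uso de la lengua y el registro de la lengua"
print(round(type_token_ratio(sample), 1))  # 64.3
```

Because the raw ratio tends to fall as a text gets longer, a comparison across texts would usually be made over samples of equal length; that choice is left open here.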
DIMENSIONS OF REGISTER VARIATION IN SPANISH
[...]
Northern Arizona [...]
USA

1. Registers and register variation

For many years, researchers have studied the language used in different situations: the description of registers. Register is used here as a cover term for any language variety defined by its situational characteristics, including the speaker's purpose, the relationship between speaker and hearer, and the production circumstances.

Although registers are defined in situational terms, they can also be compared with respect to their linguistic characteristics, what is known as the study of register variation. Register variation is inherent in human language: a single speaker will make systematic choices in pronunciation, morphology, word choice and grammar reflecting a range of situational factors. The ubiquitous nature of register variation has been noted by a number of scholars, for example:

... each language community has its own system of registers ... corresponding to the range of activities in which its members normally engage (Ure 1982: 5).

... register variation, in which language structure varies in accordance with the occasions of use, is all-pervasive in human language (Ferguson 1983: 154).

... no human being talks the same way all the time .... At the very least, a variety of registers and [...] is used and encountered (Hymes 1984: 44).

The [...] of register as an explanatory factor for linguistic variation has been increasingly recognized over the past two decades. Numerous studies in functional linguistics, which focus on the interaction of discourse and grammar, have documented how spoken and written [...] differ [...] the patterns of variation for a [...] linguistic feature ([...] Tottie 1991; Collins 1991; Sigley 1997). Most of these studies [...] to show how characteristics of the textual context [...] so that [...] of use in one [...] how [...] is an [...] of linguistic variation. In contrast, there have been fewer studies that focus on a single [...] comprehensive linguistic description of the register. And even fewer [...] have compared the range of spoken and written registers in a language. This gap is due mostly to methodological difficulties: until recently, it has not been feasible to analyse the full range of texts, registers and linguistic characteristics required for comprehensive analyses of register variation. With the availability of large online text corpora and computational analytical tools, such analyses have become possible. Multi-dimensional (MD) analysis - used for the present study - is a corpus-based research approach developed for the comprehensive analysis of register variation.

More specifically, MD analyses describe the basic patterns of linguistic variation between spoken and written registers, and the ways (and extent) in which any two registers are similar or different linguistically (see, e.g., Biber 1988, 1995). This research approach is based on computational analysis of a large text corpus to identify the most important patterns of linguistic co-occurrence, known as the 'dimensions'. Each dimension comprises a distinct set of co-occurring linguistic features, and each has distinct functional underpinnings. Registers can be compared in this multi-dimensional space, enabling empirical analysis of both the extent and the ways in which any two registers are different.

There have been numerous MD analyses of English, from both synchronic and diachronic perspectives, considering a wide range of general as well as more specialized registers. There have also been major analyses of Korean and Somali, and more restricted analyses of other languages. These data-driven analyses have resulted in many unanticipated findings, including 1) major differences between 'oral' and 'literate' registers but no absolute differences between speech and writing (Biber [...]); 2) a fundamental distinction between the linguistic complexity profiles of spoken registers (which differ in extent but not kind) and written registers (which exploit the full spectrum of linguistic variation; Biber [...]); 3) surprising similarities in the underlying multi-dimensional structure of English, Korean and Somali, complemented by specific differences reflecting the communicative priorities of each culture (Biber 1995); and 4) dramatic historical shifts in the patterns of register variation in English and Somali (Biber 1995).

Surprisingly, there have been fewer MD analyses of register variation in other European languages. In the present study, we help to fill this gap by undertaking a MD analysis of register variation in Spanish. Spanish is an ideal complement to previous analyses. From a cross-linguistic and cross-cultural perspective, Spanish has many interesting points of [...] and
2. Theoretical background

In a few cases, registers can be distinguished by the presence of distinctive register markers: linguistic features restricted to a single register. For example, Ferguson (1983) describes how the grammatical routine known as 'the count', as in the count is two and one, is a distinctive register marker of baseball game broadcasts. In most cases, though, register differences are realized through the relative presence or absence of register features - core lexical and grammatical features - rather than by the presence of a few distinctive register markers. Register features are found to some extent in almost all texts and registers, but there are often large differences in their relative distributions across registers. In fact, many registers are distinguished only by a particularly frequent or infrequent occurrence of a set of register features.

Register analyses of these core linguistic features are necessarily quantitative, to determine the relative distribution of linguistic features. Further, such analyses require a comparative approach. That is, it is only by quantitative comparison to a range of other registers that we are able to determine whether a given frequency of occurrence is notably common or rare. A comparative approach allows us to treat register as a continuous construct: texts are situated within a continuous space of linguistic variation [...] analysis of the ways in which registers are more or less different with respect to the full range of core linguistic features.

[...] research was [...] as a methodological approach to (1) identify the salient linguistic co-occurrence patterns in a language, in empirical/quantitative terms, and (2) compare registers in the linguistic space defined by those co-occurrence patterns. The approach was first used in Biber (1985, 1986) and then developed more fully in Biber (1988).

The notion of linguistic co-occurrence has been given formal status in the MD approach, in that different co-occurrence patterns are analysed as underlying dimensions of variation. The co-occurrence patterns comprising each dimension are identified quantitatively. That is, based on the actual distributions of linguistic features in a large corpus of texts, statistical techniques (specifically factor analysis) are used to identify the sets of linguistic features that frequently co-occur in texts. The methods used to identify these co-occurrence patterns are described in Section 4.

Additionally, qualitative analysis is required to interpret the functions associated with each set of co-occurring linguistic features. The dimensions of variation have both linguistic and functional content. The linguistic content of a dimension comprises a group of linguistic features (e.g., nominalizations, prepositional phrases, attributive adjectives) that co-occur with a high frequency in texts. Based on the assumption that co-occurrence reflects shared function, these co-occurrence patterns are interpreted in terms of the situational, social and cognitive functions most widely shared by the linguistic features. That is, linguistic features co-occur in texts because they reflect shared functions.
It turns out, though, that the relative distribution of common linguistic fea- A simple example is the way in which first and second person pronouns,
tures, considered individually, cannot reliably distinguish between registers. direct questions and imperatives are all related to interactiveness.
There are too many different linguistic characteristics to consider, and Contractions, false starts and generalized content words ( e.g., thing) are all
individual features often have distributions. However, when related to the constraints imposed by real-time production. The functional
are based on the co-occurrence and alternation patterns for groups of bases of other co-occurrence patterns are less transparent, so that careful
differences across registers are revealed. qualitative analyses of particular texts are required to help interpret the
has been emphasized underlying functions.
Sorne observe that In sum, the salient characteristics of the MD approach are:
isolated
.. The research goal of the approach is to describe the general patterns of
variation between registers, considering a comprehensive set of linguistic
features and the range of registers in the target domain of use.
text, rather than sections document these for

of the statistical patterns as
"
ofview, it is not 4.MD

representa discourse domain. From a
to determine which distributions are the
4.1. The
without
• The approach is multi-dimensional. That is, it is assumed that multiple parameters of variation will operate in any discourse domain.
• The approach is empirical and quantitative. Analyses are based on normed frequency counts of linguistic features, describing the relative distributions of features across the texts in a corpus. The linguistic co-occurrence patterns that define each dimension are identified empirically using multivariate statistical techniques.
• The approach synthesizes quantitative and qualitative/functional methodological techniques. That is, the statistical analyses are interpreted in functional terms, to determine the underlying communicative functions associated with each distributional pattern. The approach is based on the assumption that statistical co-occurrence patterns reflect underlying shared communicative functions.

To achieve these theoretical goals, a multi-dimensional analysis follows eight methodological steps:

1) An appropriate corpus is designed based on previous research and analysis. Texts are collected, transcribed (in the case of spoken texts) and input into the computer. (In many cases, pre-existing corpora can be used.)
2) Research is conducted to identify the linguistic features to be included in the analysis, together with functional associations of the linguistic features.
3) Computer programs are developed for automated grammatical analysis, to identify - or 'tag' - all relevant linguistic features in texts.
4) The entire corpus of texts is tagged automatically by computer, and all texts are edited interactively to ensure that the linguistic features are accurately identified.
5) Additional computer programs are developed and run to compute normed frequency counts of each linguistic feature in each text of the corpus.
6) The co-occurrence patterns among linguistic features are identified through a factor analysis of the frequency counts.
7) The 'factors' from the factor analysis are interpreted functionally as underlying dimensions of variation.
8) Dimension scores for each text are computed; the mean dimension scores for each register are then compared to analyse the salient linguistic similarities and differences between registers.

The corpus used for the [...] component of the NEH-funded [...] see Davies [...]. The Corpus del Español incorporates texts from many other existing Spanish corpora, including the Habla Culta (Lope Blanch 1977, 1991), the Corpus oral de referencia de la lengua española contemporánea, the Corpus lingüístico de referencia de la lengua española en Argentina, the Corpus lingüístico de referencia de la lengua española en Chile (Ballester and Santamaría 1993; Marcos-Marín 1975) and the Biblioteca Virtual (http://www.cervantesvirtual.com). In addition, we added a sample of 40 academic research articles in science and the humanities, downloaded from online sources.

We categorized all texts in the corpus for the present study into registers, based on their situational characteristics; in a number of cases, this required actually reading through the texts to determine their primary communicative purposes. As shown in Table 3.1, the resulting corpus is both large (c. 20 million words) and represents a wide range of spoken and written registers.

4.2. Development of the grammatical tagger and identification of linguistic features

Before beginning work on grammatical analysis software, it was necessary to first identify the set of potentially relevant linguistic features to be used in the multi-dimensional analysis. For this purpose, we attempted to itemize the linguistic characteristics of Spanish that potentially served communicative functions in discourse. We began by surveying major Spanish reference grammars, including the multi-volume Gramática Descriptiva de la Lengua Española (Bosque and Demonte 1999), and various reference grammars written in English (especially including Butt and Benjamin 2000). In addition, we consulted with Spanish grammarians and other native speakers. In a final step, we considered the sets of linguistic features included in the MD analyses of other languages (especially English, Somali and Korean) to check whether any of these had counterparts in Spanish.

The grammatical tagger for our project, developed by Jones at Northern Arizona University, has several different components, including:

1) a probabilistic/rule-based component to identify the major word classes (nouns, verbs, adjectives, adverbs) together with basic morphological features (e.g., number, gender, tense)
2) a rule-based component and morphological analyser to identify function word classes (e.g., prepositions, articles) and words belonging to special word classes (e.g., diminutives, nominalizations)
3) rule-based components to identify additional semantic and syntactic features (e.g., semantic classes of verbs, complementation patterns, pro-drop).

The tagger was tested and revised extensively, checking the full set of features in texts from various registers, and then focusing on the especially problematic features (e.g., noun/verb/adjective ambiguities, and distinguishing between the functions of words like que). The overall accuracy of the final version of the tagger was estimated at 98 per cent.

Table 3.1

Spoken texts:
Register                       Number of texts   Word count    Average words/text
Face-to-face conversations     [...]             [...]         [...]
Business [...]                 [...]             [...]         [...]
[...] conversations            [...]             [...]         [...]
[...] interviews               419               2,293,918     5474
Political interviews           753               1,181,198     1569
Radio/TV contests              23                53,813        2340
Political debates              39                86,277        2212
Drama                          54                389,177       7207
Institutional meetings         52                455,517       8760
Political speeches             42                311,060       7406
News broadcasts                31                68,309        2204
Sports broadcasts              20                50,406        2520
Spoken subtotal                1560              5,171,659     3315

Written texts:
Register                       Number of texts   Word count    Average words/text
Business letters               313               56,075        179
Fiction                        187               7,205,389     38,531
Newspaper reportage            791               1,515,911     1916
Editorials                     49                95,603        1951
Essays/Newspaper columns       378               1,977,167     5231
General prose and textbooks    26                1,814,801     69,800
Encyclopedias                  708               2,304,457     3255
Academic articles              40                160,785       4020
Written subtotal               2492              15,130,188    6079

The tagger identifies c. 140 different linguistic features. However, these were reduced to 85 features included in the final factor analysis (see Section 4.3 below):

[...] of the [...] classes: 10. 1st person pronouns, [...] 2nd person usted pronouns, 13. 1st person [...] 15. all 3rd person pronouns [...] 18. other se [...] 20. all clitics, 21. demonstrative pronouns [...] ése).
[...] classes: 22. [...] attributive [...] 23. [...] attributive adjectives, 24. predicative adjectives, 25. evaluative adjectives, 26. other semantic classes of adjective (colour, size/quantity/extent, time, classificational, topical), 27. quantifiers (e.g., muchos, varias, cada).
Other noun phrase elements: 28. definite articles, 29. premodifying demonstratives (e.g., ese), 30. possessives (including premodifying determiners; pronouns, e.g., la mía; and emphatic pronouns, e.g., hija mía).
Adverb classes: 31. adverbs-place, 32. adverbs-time, 33. adverbs-manner, 34. other -mente adverbs.

Verbs:
Tense and mood: 35. indicative, 36. subjunctive, 37. conditional, 38. present, 39. imperfect, 40. preterite, 41. progressive, 42. perfect, 43. future, 44. future time with ir a.
Semantic/lexical classes: 45. obligation verbs (e.g., deber, tener que, haber + que/de), 46. all main verb ser, 47. all main verb estar, 48. aspectual verbs, 49. mental and perceptual verbs, 50. verbs of desire, 51. communication verbs, 52. verbs of facilitation/causation, 53. verbs of simple occurrence, 54. verbs of existence/relationship.
Other features of the verb phrase: 55. ser passive with por, 56. agentless ser passive, 57. se passive (with por and agentless), 58. verb + infinitive, 59. infinitives without preceding verb or article, 60. existential haber.
Questions: 61. yes/no questions, 62. CU questions, 63. tag questions.
Function word classes: 64. prepositions (single-word and multi-word), 65. general single-word conjunctions (pero, y, e, o, u), 66. other single-word conjunctions, 67. multi-word conjunctions, 68. exclamations (upside down exclamation mark).

Dependent clauses:
Adverbial clauses: 69. causal subordinate clause (e.g., porque, puesto que, ya que), 70. concessive subordinate clause (e.g., aunque, a pesar de que), 71. conditional clauses (e.g., si, con tal que).
Complement clauses: 72. que verb complement clause - indicative, 73. que verb complement clause - subjunctive, 74. que noun complement clause, 75. que adjective complement clause, 76. CU verb complement clause.
[...] and revision of the [...] the [...]

Pero           ^con+coor+++++_gensingcon_+
nada           ^r+++++!!+_rbother_+nada+
de             ^en++++++_lwrdprep_+de+
eso            ^p3cs+dem+++++_prodem_+eso+
sucedió        ^vm+is+3s++++_indicat_preter_voccur_+suceder+
,              ^punc++,++++++
y              ^con+coor+++++_gensingcon_+
el             ^lms+def+++++_defart_+el+
embajador      ^nms+com+++++_singn_derivn_+embajador+
concedió       ^vm+is+3s++++_indicat_preter_+conceder+
su             ^d3cs+pos+++++_prepos_+su+
mano           ^nfs+com+++++_singn_+mano+
y              ^con+coor+++++_gensingcon_+
la             ^lfs+def+++++_defart_+la+
sonrisa        ^nfs+com+++++_singn_+sonrisa+
imperturbable  ^jcs++++++_postadj_+imperturbable+
a              ^en++++++_lwrdprep_+a+
cada           ^dOcs+ind+++++_quant_+cada+
uno            ^pOms+ind++++++uno+
de             ^en++++++_lwrdprep_+de+
los            ^lmp+def+++++_defart_+el+
invitados      ^nmp+com++++!!+_plurn_+invitado+

Each line begins with the word followed by the start of the tag, indicated by ^. The primary tag is in field 1 (e.g., noun, verb, etc.), with various secondary tags in fields 2-5 (e.g., the mood, tense, person, number and voice of a verb), an ambiguous tag in field 6 (!!), a linguistic feature tag in field 7 (e.g., 'ynquest' for [...] questions; 'subjvcompque' for que verb complement clause with subjunctive mood), and the lemma in the final field.

Once the texts in the corpus were tagged, it was a simple matter to [...] the frequency of each linguistic feature in each text. These frequencies were 'normalized' to a rate of occurrence per 1000 words of text (see Biber et al. 1998). Thus, at this stage, we had normed frequencies of 85 linguistic features for each text, making it possible to compute descriptive statistics for the different registers. The entire tagged corpus is available for research on the [...] at http://www.corpusdelespanol.org/registers. This site also [...] searches on [...]

4.3. Factor analysis

As [...] above, Multi-Dimensional [...] the [...] that uses a statistical 'factor analysis' to [...] patterns of [...] co-occurrence. This procedure reduces a large number of [...] to a small set of underlying variables, called 'factors'. Each factor represents a group of variables that are correlated with one another [...] their statistical tendency to co-occur in texts; these factors can subsequently be interpreted as underlying 'dimensions' of register variation.

The Appendix displays the full factorial structure for the analysis of Spanish linguistic features. Only 85 of the original 140+ linguistic features were retained in the final factor analysis. Some features were dropped because they were redundant or overlapped to a large extent with other features. In other cases, features were dropped because they were generally rare in our corpus. Several of these features were combined into more general features. For example, possessive determiners and possessive pronouns were combined into a more general feature; que clefts include both indicative and subjunctive clauses; similarly cual relative clauses comprise a range of structural variants, including indicative and subjunctive clauses, with and without a preceding preposition. In addition, some features were dropped either because they did not vary across Spanish texts, or because they shared little variance with the overall factorial structure of this analysis (as shown by the communality estimates).

The solution for six factors was selected as optimal. These six factors account for 45 per cent of the shared variance. A Promax rotation was used, which allows for some correlations between the factors. (The Appendix also lists the eigenvalues for the first 6 factors as well as the inter-factor correlations.)

Table 3.2 summarizes the important linguistic features defining each dimension (i.e. features with factor loadings over + or - .3). Each factor comprises a set of linguistic features that tend to co-occur in the texts from the Spanish corpus. Factors are interpreted as underlying 'dimensions' of variation based on the assumption that linguistic co-occurrence patterns reflect underlying communicative functions. That is, particular sets of linguistic features co-occur frequently in texts because they serve related communicative functions. Features with positive and negative loadings represent two distinct co-occurrence sets. These define a single factor because the two sets tend to occur in complementary distribution: when a text has a high frequency of the positive set of features, that same text will tend to have low frequencies of the negative set of features, and vice versa. In the interpretation of a factor, it is important to consider the following: 1) the communicative functions that are shared by the linguistic features grouped
on a dimension; 2) the patterns of variation with [...] features; and 3) the functions of target [...] in texts. In the following section, we present the interpretation of each factor as a dimension of variation.

5. Interpretation of the Spanish dimensions of variation

5.1. Interpretation of Dimension 1: 'oral' versus 'literate' discourse

The first step in the interpretation of a dimension is to describe the communicative functions shared by the co-occurring linguistic features. In the case of Dimension 1, there is an extremely large number of linguistic features with large positive weights, and these features can be described as serving several specific functions. However, those functions are related in that they are all characteristic of spoken language rather than written language.

Many of these features are verb classes or characteristics of the verb phrase, such as indicative mood, present tense, future ir a, perfect aspect and progressive aspect. (In contrast, there are almost no nominal features included in the positive grouping on Dimension 1.) Several of these verbal features are used for simple descriptions, including copula ser, copula estar, existential haber, and simple occurrence verbs (e.g., pasar, ocurrir). Some of these verb classes - especially mental verbs, desire verbs, and the copula estar - frequently occur with 1st person pronouns (and 1st person pro-drop) to express the speaker's own personal feelings and attitudes. There are also several 'addressee-oriented' features included on Dimension 1, such as the pronouns tú and usted, tag questions and yes-no questions. And at the same time, there are several 'other-oriented' features grouped on to this dimension, reflecting the description of other people in particular places and times (e.g., features like 3rd person pronouns, time adverbs, place adverbs, demonstrative pronouns, communication verbs and manner adverbs).

These positive features can all be associated with stereotypical 'oral' discourse, and this interpretation is supported by the patterns of register variation along Dimension 1 (described below). Given that interpretation, it might surprise many readers that there are also several dependent clause types grouped among the positive Dimension 1 features: causal subordinate clauses, conditional subordinate clauses, que verb complement clauses (indicative), CU verb complement clauses, el que clauses and que relative clauses (indicative).

Table 3.2

Dimension 1:
Positive features:
[...] adverbs, existential haber, [...] complement clauses (indicative), tag [...] present tense, future ir a, [...] aspect, communication verbs, [...] person pronouns, [...] aspect, el que clauses, yes-no questions, que relative clauses (indic.), manner adverbs, augmentatives, CU verb [...] clauses, conditional subordinate clauses, tú, usted, desire verbs, verbs of facilitation, simple occurrence verbs
Negative features:
singular nouns, postmodifying attributive adjectives, definite articles, prepositions, plural nouns, simple NPs (without determiners, etc.), derived nouns, type token ratio, postnominal past participles, premodifying attributive adjectives, long words, other adjectives, se passives

Dimension 2:
Positive features:
subjunctive verbs, que relative clauses (subjunctive), que verb complement clauses (subjunctive), verb + infinitive, conditional verbs, obligation verbs, future tense, infinitives without preceding verb or article, que verb complement clauses (indicative), verbs of facilitation, progressive aspect, conditionals in dependent clauses, que noun complement clauses

Dimension 3:
Positive features:
clitics, imperfect tense, possessives, 3rd person pronouns, se (not passive or reflexive), preterite tense, aspectual verbs, se (reflexive), se (emoción), infinitives without preceding verb or article, verb + infinitive
Negative features:
derived nouns, postmodifying attributive adjectives

Dimension 4:
Positive features:
2nd person pro-drop, tú, exclamatives, CU questions, simple NPs (without determiners, etc.), yes-no questions, diminutives, conmigo/contigo/consigo
Negative features:
que relative clauses (indicative), other -mente adverbs

Dimension 5:
Positive features:
proper nouns, preterite tense, long words, prepositions, premodifying attributive adjectives
Negative features:
present tense, [...]

Table 3.2 (continued)

Dimension 6:
[...]
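The grouping used in Table 3.2, where each dimension lists the features whose loadings exceed + or - .3, split by sign, can be sketched in a few lines of Python. This is only an illustration: the function name and the loading values below are hypothetical (the actual loadings are given in the Appendix and are not reproduced here), and treating a loading of exactly .3 as salient is an assumption.

```python
def salient_features(loadings, cutoff=0.3):
    """Split one factor's features into positive and negative salient sets.

    loadings: dict mapping feature name -> factor loading on this factor.
    Features whose absolute loading is below the cutoff are ignored.
    """
    positive = [f for f, w in loadings.items() if w >= cutoff]
    negative = [f for f, w in loadings.items() if w <= -cutoff]
    # Strongest loadings first within each set.
    positive.sort(key=lambda f: loadings[f], reverse=True)
    negative.sort(key=lambda f: loadings[f])
    return positive, negative


# Hypothetical loadings, for illustration only (not the values from the Appendix).
example_loadings = {
    "present tense": 0.62,
    "tú": 0.41,
    "future tense": 0.12,   # below the cutoff, so not salient on this factor
    "long words": -0.35,
    "singular nouns": -0.80,
}

pos, neg = salient_features(example_loadings)
print("Positive features:", ", ".join(pos))
print("Negative features:", ", ".join(neg))
```

The two returned lists correspond to the two co-occurrence sets of a factor, which are expected to occur in complementary distribution across texts.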
Similar patterns have been found in MD [...] where adverbial clauses are used to express [...] stance with features like [...]

[...] of nouns or characteristics of noun [...] attributive [...] nouns, [...] NPs [...] determiners, [...] derived nouns, postnominal past participles and premodifying attributive adjectives (before [...]). Prepositional phrases contain a noun phrase, and they often function to modify some head noun. Long words and a high type-token ratio are also included among these negative features, reflecting the use of a diversified vocabulary and specialized words. The reliance on nouns and complex noun phrases results in a style of text with dense informational content packed into relatively few words. Writers, who have extensive opportunity to craft and revise their texts, are able to achieve this linguistic style of expression, but it is rare to find spoken texts of this type. Thus, Dimension 1 can be interpreted as reflecting the characteristics of stereotypical 'oral' discourse (the positive features) as opposed to 'literate' discourse (the negative features).

This interpretation is strongly supported by the patterns of register variation found with respect to Dimension 1 (see Figure 3.1). That is, the second major step in interpreting a dimension is to consider the similarities and differences between registers with respect to the set of co-occurring linguistic features. For that analysis, dimension scores are computed for each text, and then texts and registers are compared with respect to those scores.

Dimension scores (or factor scores) are computed by summing the individual scores of the features with salient loadings on a dimension (i.e. features with loadings greater than |.30| on a factor).

In the present case, the Dimension 1 score for each text is computed by adding together the frequencies of indicative mood verbs, verbs of existence, causal subordinate clauses, time adverbs, 1st person pronouns, copula ser, etc. - the features with positive loadings on Factor 1 (from Table 3.2) - and then subtracting the frequencies of singular nouns, postmodifying attributive adjectives, definite articles, prepositions, plural nouns, etc. - the features with negative loadings.

All individual linguistic variables are standardized to a mean of 0.0 and a standard deviation of 1.0 before the dimension scores are computed. This process converts feature scores to scales representing standard deviation units, so that all features on a factor have equivalent weights in the computation of dimension scores (see Biber 1988: 93-7).

Once a dimension score is computed for each text, the mean dimension score for each register can be computed. Plots of these mean dimension scores allow linguistic characterization of any given register, comparison of the relations between any two registers, and a fuller functional interpretation of the underlying dimension.

Figure 3.1 plots the mean dimension scores of registers along Dimension 1. The registers with large positive values (such as telephone and casual face-to-face conversations) have high frequencies of indicative mood verbs, verbs of existence, 1st person pronouns, etc. - the features with salient positive weights on Dimension 1. At the same time, these registers with large positive values have markedly low frequencies of singular nouns, postmodifying attributive adjectives, prepositions, etc. - the features with salient negative weights on Dimension 1. Registers with large negative values (such as academic prose and encyclopedias) have the opposite linguistic characteristics: very high frequencies of nouns, postmodifying attributive adjectives, prepositions, etc., plus low frequencies of verbs, pronouns, etc.

The register distribution shown in Figure 3.1 confirms the interpretation of Dimension 1 as a continuum of 'oral' rather than 'literate' discourse. In fact, there is almost an absolute distinction between spoken and written registers along Dimension 1: all spoken registers have positive scores on Dimension 1, while all written registers - with the exception of fiction - have negative scores on Dimension 1. Within speech, the conversational registers [...]

Figure 3.1 Comparison of registers along Dimension 1: oral versus literate discourse (vertical axis: Dimension 1 score, from -30.00 to 70.00)
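The dimension-score procedure described in this section (norm the counts per 1000 words, standardize each feature, add the positive features, subtract the negative ones, then average by register) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names, the two-feature dimension and the toy counts are all hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def normed_rate(count, n_words, per=1000):
    """Rate of occurrence of a feature per `per` words of text."""
    return count * per / n_words


def standardize(values):
    """Convert values to z-scores (mean 0.0, standard deviation 1.0).

    Assumes the values are not all identical (non-zero standard deviation).
    """
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


def dimension_scores(rates, positive, negative):
    """One score per text: sum of standardized positive features
    minus sum of standardized negative features."""
    z = {f: standardize([r[f] for r in rates]) for f in positive + negative}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(rates))
    ]


def register_means(scores, registers):
    """Mean dimension score for each register."""
    grouped = defaultdict(list)
    for score, register in zip(scores, registers):
        grouped[register].append(score)
    return {register: mean(vals) for register, vals in grouped.items()}


# Hypothetical toy corpus: raw feature counts and text lengths are invented.
texts = [
    {"register": "conversation", "words": 2000,
     "counts": {"1st person pronouns": 90, "singular nouns": 200}},
    {"register": "conversation", "words": 1500,
     "counts": {"1st person pronouns": 60, "singular nouns": 165}},
    {"register": "academic prose", "words": 4000,
     "counts": {"1st person pronouns": 8, "singular nouns": 920}},
    {"register": "academic prose", "words": 5000,
     "counts": {"1st person pronouns": 5, "singular nouns": 1200}},
]
positive = ["1st person pronouns"]   # stand-in for the positive set
negative = ["singular nouns"]        # stand-in for the negative set

rates = [
    {f: normed_rate(t["counts"][f], t["words"]) for f in positive + negative}
    for t in texts
]
scores = dimension_scores(rates, positive, negative)
means = register_means(scores, [t["register"] for t in texts])
for register, value in means.items():
    print(f"{register}: {value:+.2f}")
```

Because every feature is standardized before summing, a frequent feature such as singular nouns does not outweigh a rarer one such as 1st person pronouns; each contributes in standard deviation units, which is the point made above with reference to Biber (1988).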
[...] also illustrates the dense use [...] Dimension [...] pronoun is [...]

Speaker 1: ¿ese Alberto es - es un alumno tuyo - extranjero?
Speaker 2: Sí.
Speaker 1: ¿Y de dónde es, llamándose Alberto?
Speaker 2: Italiano.
Speaker 1: Ah, italiano. Claro, claro.
Speaker 2: Di/ - tiene, fíjate, se llama Alberto y tiene un apellido catalán.
Speaker 1: ¿Y eso?
Speaker 2: ¿Eh?
Speaker 1: ¿Y eso?
Speaker 2: Pues no sé. Digo: '¡Pero bueno!' Dice: 'Sí, sí', dice: 'Si - muchas veces me he hecho pasar por español'. 'Con este apellido y este nombre-' Y nada. Y - y - y - me - le digo a su amigo, digo: '¿Qué tal juega Alberto al tenis?' y me dice: 'Genial'.
Speaker 1: ¡Jo!
Speaker 2: ¿Ves? Dice: '¡Idiota!', y le dice Alberto: 'Idiota, ¿por qué se lo dices?', dice: 'Ya no quiere jugar conmigo' Digo: 'No, ya no quiero jugar contigo, porque juegas muy bien'.

Translation:
Speaker 1: But, this Alberto is - is a student of yours - a foreigner?
Speaker 2: Yes.
Speaker 1: And where is he from, with a name like Alberto?
Speaker 2: Italian.
Speaker 1: Ah, Italian. Sure, sure.
Speaker 2: He has, get this, his name is Alberto and he has a Catalan last name.
Speaker 1: And so?
Speaker 2: Huh?
Speaker 1: And so?
Speaker 2: So, I don't know. I say: 'Well, good!' He says: 'yes, yes', he says: '[...] - a lot of times I've gotten others to think I'm Spanish'. 'With this last name and this name -' And nothing. And - And - And - I say to his friend, I say: 'How well does Alberto [...] tennis?' and he tells me: 'Great'.
Speaker 1: Wow!

[...] from a television [...] This interaction [...] the same features [...] 1st and 2nd person pronouns, [...] with an even greater concentration of direct interaction and involvement:

Text Sample 2: TV contest - interaction
(Present tense verbs are shown in bold underlined. Questions begin and end with ¿ ?)

Speaker 1: Bueno, pues ahora nos vamos a conocer a nuestro - consumidor, a saber - si está en casa y a darle el premio. Y nos vamos esta vez a Madrid, y es una señora. Como siempre, vamos a esperar otros cinco tonos, y esperando - también que esté en casa. Ya saben, hasta diez millones de pesetas. Ahí está el primer tono, nuestro segundo tono, y -
Speaker 2: ¿Dígame?
Speaker 1: ¡Hola, buenas noches!
Speaker 2: Buenas noches.
Speaker 1: ¡Por favor, ¿hablo con doña Rosario Ramos?
Speaker 2: Sí.
Speaker 1: ¿Es usted?
Speaker 2: Sí.
Speaker 1: ¿Está usted viendo la televisión?
Speaker 2: Pues sí.
Speaker 1: Y estará viendo Antena 3, el Gordo -
Speaker 2: El Gordo, claro -
Speaker 1: Pues entonces - enhorabuena porque es nuestra ganadora Doña Rosario, venga un gritito de esos de - alegría - que la veo muy seria.

Translation:
Speaker 1: Well, I guess now we're going to meet our - client, and find out - if she's at home and give her a prize. And we are going this time to Madrid, and it's a woman. As always, we're going to wait another five beeps and see if she's at home. You all know, it's up to 10 million pesetas. And there is the first beep, our second beep, and -
Speaker 2: Hello?
Speaker 1: Hello, good evening!
Speaker 2: Good evening.
Speaker 1: Can you tell me if I'm speaking with Mrs Rosario Ramos?
Speaker 2: Yes.
Speaker 1: Is that [...]
Speaker 2: Yes.
[...] 2: [...] are our [...] - because you seem very serious.

[...] like institutional meetings, [...] speeches and news broadcasts are [...] and much less directly interactive than conversation. At the same time, these registers are much more focused on conveying information than conversation. As a result, spoken registers like political speeches and news broadcasts have small positive scores, reflecting a more balanced use of positive and negative features on Dimension 1.

As noted above, the written registers (except fiction) all have negative scores on Dimension 1, with academic prose and encyclopedias having the largest negative scores. These scores reflect the dense use of nominal features (e.g., nouns, postmodifying attributive adjectives, prepositions, etc.) together with the infrequent use of positive Dimension 1 features (verbs, pronouns, etc.). Text Sample 3 illustrates the use of these features in an academic prose text:

Text Sample 3: Academic prose
(Nouns are shown in bold underlined.)

También es común en estas zonas la desaparición de corrientes de [...] incluso ríos enteros pueden desaparecer en sumideros u ojos que pueden conducir a cavernas subterráneas o a acuíferos. Los sumideros indican la presencia de cuevas bajo ellos. Debido a la captura de las aguas superficiales por el sistema subterráneo de drenaje, algunas regiones con cuevas son bastante secas y polvorientas y tienen escasa vegetación.

Translation:
Underground streams are also common, and even entire rivers can sometimes disappear in sinkholes that lead to underground caves or aquifers. The sinkholes indicate the presence of caves underneath. Due to the flow of surface-level water into the underground system of drainage, some areas with many caves are rather [...] and have little vegetation.

In sum, Dimension 1 makes a fundamental distinction between speech and [...] at the two poles, this dimension actually distinguishes between stereotypical speaking (conversation) and stereotypical writing (expository prose). Linguistically, these opposing styles are represented by verbal/clausal features serving involved and interactive functions, as opposed to a dense nominal [...] that [...] careful [...] and revision of [...]

[...] these constructions describe [...] but do not describe an actual event or state. Several [...] features on this dimension include a [...] is used for the expression of [...] fear, hope, etc. (see Butt and Benjamin 2000: 238 [...]). [...] verbs and future tense are both used to describe events or states that could occur (but have not actually occurred). Similarly, obligation verbs (e.g., tengo [...]) describe events that should occur. Finally, que verb complement clauses and que noun complement clauses are used to express a stance in the controlling verb or noun (e.g., sabe que, la idea de que).

The register differences defined by Dimension 2 are in many ways similar to those found for Dimension 1. Figure 3.2 shows that Dimension 2 defines a near absolute distinction between spoken and written registers: only spoken registers have large positive scores on Dimension 2 (reflecting the dense use of these 'irrealis' features). In contrast, all written registers have negative scores or scores near 0.0; and the two formal expository [...]

Figure 3.2 Comparison of registers along Dimension 2: spoken irrealis discourse (vertical axis: Dimension 2 score, from -8.00 to 8.00)
[...] so the conversational [...] are especially [...] scores. In contrast, on Dimension 2 casual conversation has a score near 0.0. Instead, we see the [...] with [...] scores on Dimension 2: [...] interviews and debates. For example:

Text Sample 4: Political interview
(Subjunctive, future and conditional verbs are in bold underlined.)

Speaker 1: ... Hay que ser fuerte pero a la vez generoso, dispuestos a tratar al otro con respeto y compasión.
Speaker 2: Una actitud moral - ¿Y servirá de algo la música? -
Speaker 1: Servirá solamente si la gente llega a saber como amarla, como cantar e improvisar. Escuchar música en un sillón está muy bien. Claro está que me gusta un público que sepa escuchar. Pero en realidad me gustaría que la música y el canto fueran siempre previos a los debates políticos. Enseñar música en las escuelas con la actitud moral adecuada podría contribuir al entendimiento en el mundo.

Translation:
Speaker 1: ... You have to be strong but at the same time generous, prepared to treat the other with respect and compassion.
Speaker 2: A moral attitude - And would music help? -
Speaker 1: It would help only if people really learn to enjoy it, how to sing (along). Listening to music in a big, soft chair is a good idea. You know, I really like people who know how to listen. But actually, I'd really like for music and singing to be part of the public dialogue. Teaching music in the schools with the right kind of attitude would really contribute to a [...] understanding in the world.

Text Sample 5: Political debate
(Subjunctive and future tense verbs are in bold underlined.)

Speaker 1: ... Si el voto de cabreo, como dice el señor Mendoza, hubiera sido el único motivo para votar a otro candidato, o para avalar a otro candidato, pues lógicamente, nos hubiéramos dividido esos votos entre o el señor Nogal, o el señor Campos Gil, o el - señor Diéguez, o el - señor Navarro. ¿Por qué todos los votos vienen a mí? ¿Por qué todos los avales vienen a mí? Será por algo más, eh - Ramón, ¿no?. Pero que tengas la humildad -
Speaker 2: Cuando termines, cuando termines -
Speaker 1: La humildad de -
Speaker 2: - tu turno; yo no te - Te contesto cuando termines.

Translation:
Speaker 1: ... [...] vote, as Mr Mendoza says, had been the [...] for the other or for supporting the other candidate, then we would have divided those votes between either Mr Nogal or Mr Campos Gil or Mr Diéguez or Mr Navarro. Why do I get all of these votes? Why is all of the support [...] my [...]? It would be due to something more, right, Ramón? I hope you have the humility -
Speaker 2: When you finish, when you finish -
Speaker 1: The humility to -
Speaker 2: - your turn; I won't interrupt you - I'll answer you when you finish.
Speaker 1: The humility to - to recognize it. What's clear is that - when there are something like 9000 supporters, (actually, more like 7000), but certainly not Ramón's 11,000 (which were again only about 10,000).

Interestingly, drama and formal telephone conversations also show a dense use of these features, while casual face-to-face conversation does not. In the case of drama, the dialogue carries the narrative story line while showing us the characters' inner thoughts and feelings, resulting in a dense use of these irrealis features. For example:

Text Sample 6: Drama
(Subjunctive and verb + infinitive are shown in bold underlined.)

Speaker 1: Ud. es una pesadilla. ¿Que quiere que le cuente? Porque voy a tener que contarle algo para que se quede tranquila. Soy ... soy un prófugo.
Speaker 2: ¿De la justicia?
Speaker 1: Sí, si a Ud. le gusta sí. Me persigue la policía y por eso me acerqué Ud., para disimular. ¿Ve esta caja? En ella llevo el botín. Sí. Lo que robé. No se preocupe por mí.
Speaker 2: No tanto como hubiera debido.
Speaker 1: ¿Tiene miedo? No se preocupe, yo no le robo a cualquiera. No. Le puedo asegurar que no, de otras cosas pero de esto no.
Speaker 2: Mire, no me interesa. ¡Ud. me engañó ... y se acabó!
Speaker 1: ¿Gracias por divertirse conmigo?
Speaker 2: Me pondría a cantar a los gritos si no fuera que no quiero arruinarme la garganta. ¿Y de quién es prófugo entonces?

Translation:
Speaker 1: You are a nightmare. What do you want me to tell [...] Because I am going to have to tell you something so that you [...] calm. I'm ... I'm
[...] rob anyone. [...] but not that.
Speaker 2: I don't care. You've cheated me, and now it's over!
Speaker 1: And so that's it?
Speaker 2: I'd scream, if it wasn't that I'd ruin my throat. So who's the runaway now?

Surprisingly, no written register is marked for the dense use of these features. That is, even registers like editorials and essays are characterized by the relative absence of irrealis features, despite their communicative goals of arguing for a particular point of view in opposition to other possible perspectives. In part, this is due to the fact that editorials and essays (at least those found in our corpus) often have a past-tense 'narrative' orientation. This allows the columnist to relate a series of events that deal with the overall argument that he or she is making in the column. These narrative passages have a lower degree of 'irrealis' than present- and future-oriented debates and drama. Consider the following passage from a representative newspaper story:

Text Sample 7: Newspaper reportage
(Past tense verbs are shown in bold underlined.)

A los que estábamos allí, que no éramos más de tres, no nos sorprendió demasiado que evitara a [...] precio dar explicaciones, porque este hombre, y esto hay que aclararlo, era uno de los personajes más conocidos en los círculos de la ciencia anglosajona por su rechazo implacable a todo aquello que no se pudiera pesar o medir. La intuición era un lenguaje [...] para él y el trabajo de toda su vida había estado dedicado a demostrar la estructura matemática por la que, sin lugar a dudas, se rige el universo.

Translation:
Those of us that were there were not too surprised that he was avoiding at all costs any [...] This man - and this is an important point - was one of the most well-known figures in British scientific circles, in large [...] his ... rejection of anything that could not be weighed or measured. For [...] intuition was completely unacceptable, and his life had been dedicated to proving the mathematical structure that undoubtedly governs the universe.

The features on Dimension 3 are [...] used to construct stereotypical narrative discourse. [...] and [...] tense verbs form the [...]

[...] among the uses of the [...] lavarse 'to wash [...]', [...] 'to be [...]', the passive se, and other uses [...] comerse 'to eat [...]'. However, the factor analysis grouped all these uses of se - except for the passive se - on Dimension 3, [...] they co-occur in narrative texts [...]tion as a focusing element [...] foregrounding).

As Figure 3.3 shows, fiction has by far the largest positive score on Dimension 3. The following text sample illustrates the dense use of Dimension 3 features in a fiction text, including both imperfect and preterite verbs (tenía, vieron), clitics (hablarles), possessives (su) and verbs with se.

Figure 3.3 Comparison of registers along Dimension 3: narrative discourse

Text Sample 8: Fiction
(Third person pronouns, possessives and clitics are shown in bold underlined.)
[...] serenas en este momento [...]
-¿[...] ya Demetrio? Por lo que observé durante la [...] todo malentendido [...] Bien. Ahora; lean esta carta. [...] silencio.

Translation:
That night, around bedtime, Don Jose Pedro called together his two daughters. 'Your mother wants to talk to you. She's waiting for you in the bedroom,' he said. And so, smoking, he left for the park. The daughters saw him disappear into the thick darkness. It was so dark outside that everything was the colour of dark pine trees ...
The two girls went straight into their mother's room.
'Sit down. I want you to settle down now,' the woman began. 'Has Demetrio settled down now as well? From what I saw at dinner, no one has understood anything. OK. Now I want you to read this letter - both together and in silence.'

Interestingly, drama also has a large positive score on Dimension 3. In this case, the text is entirely dialogic, but the characters are narrating past events and descriptions to carry the story line of the play. For example:

Text Sample 9: Drama
(Past tense verbs are shown in bold underlined.)

Speaker 1: No sé si llamarlo novio ... Sí, en realidad, lo fue. Eramos tan jóvenes. No sé por qué le [...] un día que no viniera más. Él me quería y creo que yo también. Armando pretendía tantas cosas de mí. Esperaba que yo cambiase tanto ... Me decía que lo mío estaba bien para principio de siglo ... No se que pasó. Me fatigaba ... Me exigía ... Y yo estaba tan cómoda, tan tranquila.
Speaker 2: Los afectos intensos siempre me han fatigado.

Translation:
Speaker 1: I don't know if I should call him my boyfriend. Actually, he was once. We were so young. I don't know [...] one [...] I told him to never come back. He loved me and I think I loved him too. Armando expected so much from me. He wanted so much for me to change ... He told me that I was acting so old-fashioned ... I don't know what [...] He wore me out ... He demanded so much ... And before [...] so [...]
Speaker 2: [...] these intense emotions have worn me out [...]

5.4. Interpretation [...]

Dimension 4 is composed of overtly interactive and highly involved features, including CU questions, yes-no questions, exclamatives and diminutives. However, in contrast to Dimension 1, the style of discourse represented here seems to be focused to a large extent on the addressee, resulting in the dense use of 2nd person pro-drop, and the pronoun tú, but not 1st or 3rd person pronouns.

This somewhat specialized grouping of linguistic features is especially common in business telephone conversations (see Figure 3.4). In this register, telephone operators are interacting with customers, obtaining information and attempting to help with customer problems. In our corpus, these are conventionalized interactions that focus on the addressee, with little expression of the feelings and attitudes of the [...]; for example:

Text Sample 10: Business telephone conversation

Speaker 1: Perdóname un segundito. 'Cilag', ¿dígame?
Speaker 2: ¿Teresa?
Speaker 1: Sí.
Speaker 2: Hola, soy Miguel.
Speaker 1: Hola, Miguel. ¿Qué te cuentas?
Speaker 2: ¿Qué tal, cómo estás?
Speaker 1: Dime.
Speaker 2: Sí, ¿me pasas con Rocío?
Speaker 1: Te pongo con ella.
Speaker 2: Vale.
Speaker 1: [...]
Speaker 3: Miguel Llavori.
Speaker 1: Vale.
Speaker 3: [...]
Speaker 1: Sí.
Speaker 3: [...] eh - por lo de Tenerife?
Speaker 1: [...] te cuento lo de Tenerife.
Figure 3.4 Comparison of registers along Dimension 4: addressee-focused interaction

Translation:
Speaker 1: OK. Excuse me. 'Cilag'. Hello?
Speaker 2: Teresa?
Speaker 1: Yes.
Speaker 2: Hi, it's Miguel.
Speaker 1: Hi Miguel. What's happening?
Speaker 2: So, how's everything going? How are you?
Speaker 1: So ...?
Speaker 2: Look, can I talk to Rocío?
Speaker 1: Sure, I'll connect you.
Speaker 2: Thanks.
Speaker 1: Hello?
Speaker 3: (Hi,) Miguel Llavori.
Speaker 1: Yeah?
Speaker 3: Marica?
Speaker 1: Yeah?
Speaker 3: Well, um, can you tell me about the Tenerife deal?
Speaker 1: Yeah, let me tell you.

[...] events [...] of past [...]. However, [...] the two dimensions. There is a [...] features grouped on Dimension 3, including [...] se, various clitics and 3rd person pronouns; as we saw above, [...] features are especially common in fictional narrative and drama. In contrast, Dimension 5 is much more specialized. It is defined by a smaller set of features: proper nouns, preterite tense, long words, prepositions and premodifying attributive adjectives. (The major negative features are present tense, predicative adjectives and verb + infinitive.)

Although the positive features grouped on Dimension 5 are related to past time discourse, they are quite different from the Dimension 3 features. First of all, Dimension 5 includes only preterite tense (but not imperfect tense verbs), reflecting a focus on past time events with relatively little background description. In addition, we find proper nouns with a large positive loading on Dimension 5, rather than 3rd person pronouns. This suggests a style of discourse that discusses the past actions of many different people, referred to by name. In contrast, Dimension 3 features characterize more detailed fictional narratives that involve a few characters, which are easily referred to with 3rd person pronouns. In addition, Dimension 5 includes features of highly informational prose - long words, prepositions and premodifying attributive adjectives - suggesting that this style of discourse has an informational rather than popular communicative purpose.

As Figure 3.5 shows, these features are common only in written informational registers: encyclopedias, business letters, newspaper reportage and, to a lesser extent, academic prose. Encyclopedias and newspaper reportage are similar in that they are informational registers written for a mass audience, informing readers about past events that involve many different people. Text Sample 11, from an encyclopedia article, illustrates these features:

Text Sample 11: Encyclopedia article
(Preterite verbs and proper nouns are shown in bold underlined.)

Tras abandonar los terrenos de juego, Suárez inició una nueva carrera como técnico. En esta faceta profesional, permaneció casi siempre ligado a la secretaría técnica del Inter de Milán, de cuyo primer equipo llegó a ser entrenador. También [...] el banquillo de varios clubes españoles. Además, estuvo al frente de la selección nacional española de fútbol, a la cual dirigió en la fase final de la Copa del Mundo disputada en 1990 en Italia. En 1992 regresó al Inter, primeramente como entrenador y, más [...] como integrante de su equipo [...]
Figure 3.5 Comparison of registers along Dimension 5: informational reports of past events

Translation:
After retiring, Suárez began a new career as a technical advisor. In this professional capacity, he was associated with the technical staff of Inter Milan, and he became trainer for their first team. He was also an advisor for several Spanish teams. In addition, he was in charge of the selection of the national team for Spain, which he led to the final round of the World Cup in 1990 in Italy. In 1992 he returned to Inter, first as a trainer and then as one of the members of their technical staff.

Business letters represent a somewhat different [...] of prose but with similar Dimension 5 features:

Text Sample 12: Business letter
([...] verbs and proper nouns are shown in bold underlined.)

Estimado Sr. Obrach:
De regreso de mis vacaciones encontré su nota. Lamento no haber [...] hablar con Ud. pero la decisión de la Comisión de [...] se produjo en los últimos días de diciembre y [...] enero, cuando yo estaba en San Pablo en una reunión de ferias. Tanto es así que aún no se [...] realizado [...] entre los [...] y otra [...] estamos en una [...] bastante avanzada.

Translation:
[...] San [...] at a fair. As a result, we still haven't carried out much [...] even [...] it is late [...] the process. [...] in terms of the [...] The Printing Committee awarded the contract to the company named (Book) Fairs and Congresses (directed by Juan Carlos Grassi), after several detailed and involved discussions.

The distinction between Dimensions 3 and 5 reflects the differing functions of the imperfect and the preterite, two forms of the past tense in Spanish that differ in aspect (a distinction not similarly found in English). The preterite refers strictly to events that are viewed as a single whole, and preterite verbs would therefore be common in encyclopedias and news reports (with the highest positive scores on Dimension 5). The imperfect, on the other hand, describes an event that was not yet complete, and thus it is used for background descriptions of events that were in progress or states that existed when another event occurred. These discourse functions are important for the description and narration typical in drama and fiction, the registers with the largest positive Dimension 3 scores. It is interesting that the multi-dimensional structure reflects this grammatical distinction found in Spanish (but not English); we return to this point in the conclusion below.

5.6. Interpretation of Dimension 6: 'formal' written style

Finally, Dimension 6 is an extremely specialized parameter defined by only two co-occurring linguistic features: cual relative clauses and other cual clauses. As Figure 3.6 shows, these features are common only in formal written prose - especially academic prose. We interpret this dimension as reflecting a formal 'high' academic style of discourse, illustrated by the following sentences:

Text Sample 13: Academic text
(Cual relative clauses are shown in bold underlined.)

Dada la posible modificación que estas variantes pueden provocar en la composición del café, fundamentalmente en cuanto a sustancias lipídicas y [...] durante el proceso de torrefacción al cual se someten, se pueden [...] los niveles de compuestos, tales como hidrocarburos policíclicos aromáticos y aminas heterocíclicas, los cuales al estar en mayor concentración [...] provocar un mayor efecto mutagénico.
Figure 3.6 Comparison of registers along Dimension 6: 'formal' written style

Translation:
Given the possible changes that these substances can undergo as they pass through the roasting stage in the production of coffee (primarily in terms of changes in lipid and protein compounds), there can be an increase in the level of compounds such as polycyclic hydrocarbons and heterocyclic amino acids, which in cases of their heaviest concentration can cause an elevated risk of mutation.

6. Discussion and conclusion

There have been two previous Multi-Dimensional studies of register variation [...] In the [...] Sáiz (1999) built parallel corpora of English and Spanish texts ([...] the Xerox ScanWorX User's Guide, translated into both languages), and then undertook independent MD analyses of both subcorpora. The study focused primarily on part-of-speech and simple grammatical distinctions (e.g., plural nouns, present tense [...]), resulting in five dimensions identified for both languages. These dimensions were for the most part similar in their underlying functions across the two languages, and the parallel registers were also similar in many respects. Parodi (see [...] II, this volume) presents a more developed MD [...] based on the distribution of 65 features in a 1.5-million-word [...] interactive focus', 'narrative focus', [...] focus' and 'informational focus'.

[...] to compare the methods and results [...] in this chapter) to the Valparaíso [...]. The Valparaíso study was conducted to [...] in a [...] domain of use: professional [...] in Valparaíso, Chile. For this reason, a [...] to represent the major spoken and written [...] in that domain: technical/science textbooks, literary fiction and oral interviews. In contrast, the Flagstaff study was carried out to investigate the patterns of variation between general spoken and written registers in Spanish. As a result, the Flagstaff study was based on a pre-existing corpus: the Corpus del Español (see Davies 2002), representing 19 spoken and written registers.

Despite these differences in research focus and corpora, there are strong similarities between the Valparaíso and Flagstaff studies. The most obvious is Dimension 1 in both studies, which is a basic oral/literate dimension composed of features that reflect personal involvement and interactivity as opposed to dense informational prose. A second similarity is that both analyses uncovered a narrative dimension, composed of both past tense features (imperfect and preterite/indefinite) together with 3rd person pronouns and communication verbs. Fiction is especially marked for the use of these features in both analyses.

A third similarity between the two analyses is more surprising: the existence of an informational narrative dimension that is distinct from the fictional narrative dimension. This is Dimension 5 in both analyses, consisting of the preterite/indefinite past tense together with nominal features, adjectival features and prepositional phrases as noun modifiers. Encyclopedias in the Flagstaff study were especially marked for the use of these features, while textbooks in the Valparaíso study had the largest positive score on this dimension. Both of those registers have a primary informational focus that includes reporting past events, resulting in these similar dimensions in the two analyses.

Dimension 4 in the Valparaíso study is a general stance dimension, interpreted as 'modalizing focus', which includes hedging expressions (e.g., [...] que, creer, tal vez, a lo mejor), possibility modals (poder) and possibility adverbs (e.g., probablemente, posiblemente). The oral interviews are especially marked for the use of these features. The closest counterpart in the Flagstaff [...] is Dimension 2, comprising [...] verb features, conditional [...] obligation verbs, future tense, que verb complement clauses, verbs of facilitation, que noun complement clauses, etc. Two registers make an especially dense use of these features: political interviews and political debates. These are more [...] registers than the personal oral interviews included in the [...] are actually more similar to the [...]
[...] variation that emerge from these two [...] respects to the patterns uncovered [...] MD studies [...] 1986, [...], Korean [...] and Somali (Biber and Hared 1992a, 1992b, [...] these earlier studies, identifying three [...] similarities:

• the co-occurring linguistic features that define the dimensions of variation in each language;
• the functional considerations represented by those dimensions; and
• the linguistic/functional relations between analogous registers.

The most striking similarity across languages is that the first dimension in each case defines a basic opposition between oral and literate registers. These dimensions are similar in their linguistic composition and in the register differences that they define. The positive linguistic features on these dimensions include interactive [...] reduced structure features and stance features. In contrast, the negative linguistic features grouped on the first dimension in each of these languages include noun features, adjectival features and noun modifiers. For all languages, conversational registers are at the extreme positive [...] of the first [...] while written expository [...] are at the negative pole. Functionally, these dimensions are inter[...] as reflecting direct [...] real-time [...] circumstances [...] stance and [...] rather than an informational focus [...] that permit [...] and revised [...] The fact that this dimension emerges as the first factor in the [...] languages suggests that it [...] variation across [...]

A second [...] across all MD [...] is the existence of a narrative dimension. In all cases, this dimension consists of features associated with narrative [...] such as past tense [...] 3rd person pronouns and temporal adverbials. Written fiction is the most marked [...] on this [...] but folk-tales also have [...] scores.

At the same [...] each of these MD analyses has identified dimensions that are [...]orities of that [...] identified a dimension [...] directive interaction' [...]

[...] basic narrative dimension.

At the same time, there are [...] differences between the MD analyses of Spanish and the analyses of other languages. For example, the existence of two distinct 'past time' dimensions in Spanish has not been replicated in any other language studied to date. Spanish Dimension 3 is very similar to the narrative dimension identified in previous MD [...], consisting of past tense verbs (both preterite and imperfect) and 3rd person pronouns; written fiction is the most marked register along this dimension. In contrast, Spanish Dimension 5 is distinctive, unlike any dimension identified in previous MD analyses. Dimension 5 consists of only one of the two tenses that express past time in Spanish - the preterite - co-occurring with nominal features associated with an informational focus (proper nouns, long words, prepositions, attributive adjectives). This dimension has a more specialized function, distinguishing between expository registers that have a [...] informational focus on reporting past events (such as encyclopedias and newspaper reportage) and all other spoken and written registers. Interestingly, fiction has a slightly negative score on this dimension.

The existence of [...] structural distinctions does not entail the existence of systematic register differences, but languages/cultures have often evolved to take advantage of these linguistic resources. Previous MD [...] identify several cases where specialized structural distinctions are exploited to make specialized register distinctions. The existence of specialized dimensions relating to irrealis discourse and informational reports of past events in Spanish reflects this [...] previous MD analyses show that the ways in which a language/culture exploits such structural resources are not [...] what we would have anticipated. For [...] it is not surprising that [...] cultures would evolve to [...] mood verbs for irrealis purposes. However, it is more [...] that these features tend to co-occur with a range of other stance [...] and that [...] opinionated [...]

[...] come to [...] in [...] cultures. Some linguistic features are distributed [...] languages, and are [...] in very similar ways to [...] across cultures. For [...] features like 1st and [...] person [...]
[...] resources are more [...], and these resources have come to be [...] more distinctive dimensions of variation.

The MD [...] of Spanish has illustrated the importance of both kinds of register patterns. More detailed analysis of these linguistic features across a range of spoken and written registers should help to enrich the functional interpretations of these dimensions.

Note

1 As an anonymous reviewer pointed out, the 'business telephone conversation' register has a higher score than 'casual conversation' on Dimension 1; however, the difference between the two registers on this dimension is very small and not statistically significant.

Work on this project was supported by National Science Foundation Research Grant #BCS-0214438. We would like to thank Mark Davies and James K. Jones for their contributions to the research project.

FINAL FACTOR STRUCTURE PROMAX ROTATION

0.3341
3.3209655 0.7451271 0.3727
2.5758384 0.2986935 0.4027
0.3051511 0.0265 0.4292
0.1077713 0.0229 0.4521

Inter-Factor Correlations

Fact 1 Fact 2 Fact 3 Fact 4 Fact 5 Fact 6
Factor 1 1.00 0.26 0.27 0.44 -0.36 -0.14
Factor 2 0.26 1.00 -0.03 -0.02 -0.15 -0.06
Factor 3 0.27 -0.03 1.00 0.19 -0.05 -0.08
Factor 4 0.44 -0.02 0.19 1.00 -0.24 -0.10
Factor 5 -0.36 -0.15 -0.05 -0.24 1.00 0.02
Factor 6 -0.14 -0.06 -0.08 -0.10 0.02 1.00

Linguistic variable Factor 1 Factor 2 Factor 3 Factor 4 Factor 5 Factor 6
indicat 0.86 -0.04 0.13 0.13 0.08 0.00
existv 0.84 0.10 -0.14 -0.09 -0.14 0.02
causalsubord 0.83 -0.22 -0.07 -0.03 -0.07 0.03
rbtime 0.76 -0.12 0.10 0.05 0.11 -0.02
firstpro 0.76 -0.04 0.11 0.18 0.04 0.01
mvser 0.74 0.02 -0.09 -0.10 -0.25 0.01
demonpro 0.71 -0.03 -0.12 0.11 -0.09 0.03
othersingconj 0.70 -0.18 0.14 0.00 -0.15 0.02
prodropl 0.69 0.24 0.04 0.14 0.12 0.00
mvestar 0.67 0.10 -0.05 0.12 0.08 -0.02
mentalv 0.65 0.12 0.10 0.20 0.01 -0.01
[...] 0.63 -0.16 0.05 0.12 0.04 -0.02
exhaber 0.63 0.20 -0.21 -0.14 -0.04 -0.05
[...] 0.62 0.33 -0.12 -0.06 0.13 0.01
tagquest 0.61 -0.27 -0.18 0.20 -0.01 0.01
present 0.58 0.21 -0.27 0.20 -0.35 0.00
ira 0.54 0.20 -0.11 0.21 0.08 0.00
[...] 0.52 0.17 -0.07 -0.10 0.13 -0.07
communv 0.51 0.17 0.05 0.02 0.14 0.09
[...] 0.51 0.09 0.45 0.15 0.01 -0.01
[...] 0.48 0.31 -0.16 -0.11 0.03 -0.01
[...] 0.47 0.21 -0.02 -0.14 -0.07 0.00
yesnoquest 0.47 0.22 -0.12 0.40 0.06 -0.07
querelindic 0.46 0.03 -0.01 -0.31 -0.05 0.03
0.02 -0.0l 0.03

0.05 -0.20 -0.02 -0.39 9
-0.05 0.03 -0.45 -0.04 0.08
0.40 -0.04
-0.12 -0.23 -0.03 0.07 -0.49 -0.23 -0.03 0.47 -0.0l
0.38 0.24
0.05 0.20 -0.19 0.02 -0.51 -0.01 -0.02 -0.04 0.29 -0.02
0.38 0.10
0.60 0.01 0.03 -0.54 -0.17 -0.11 0.09 -0.08 0.01
0.37 -0.15
-0.03 0.00 0.05 -0.06 -0.57 0.13 0.27 -0.13 0.28 -0.03
0.36 0.25
desirev 0.29 0.1 0.28 0.00 -0.01 -0.58 0.24 -0.32 -0.12 0.01 0.02
0.36
-0.45 -0.04 0.10 -0.01 -0.05 -0.61 -0.22 -0.15 0.47 0.04 0.03
0.35 nodetnp
0.33 0.14 0.04 -0.03 -0.03 -0.63 -0.16 -0.24 -0.03 -0.18 -0.06
facilv 0.34 pluraln
occurv 0.30 0.01 0.10 0.04 0.05 0.01 -0.64 0.01 -0.11 -0.02 0.42 0.02
preps -0.04
imperfct 0.28 -0.27 0.59 -0.12 0.15 -0.04 -0.67 -0.04 -0.16 -0.03 0.20
defart
0.28 -0.12 0.02 0.65 0.04 0.02 -0.69 -0.06 -0.29 -0.01 0.07 -0.03
prodroptu postmodadj
que_cleft 0.28 0.09 -0.05 -0.04 -0.06 -0.04 -0.76 -0.05 0.07 0.05 -0.06 0.00
singnoun
evaladj 0.25 0.07 -0.02 -0.06 -0.07 -0.01
predadj 0.24 0.10 0.04 -0.11 -0.34 -0.02
multiconj 0.22 0.19 0.03 -0.09 -0.20 -0.01
cuquest 0.22 -0.03 -0.01 0.48 0.01 -0.01
obligation_v 0.18 0.41 0.02 0.01 -0.20 0.01
othermente 0.18 0.17 -0.08 -0.35 -0.17 0.08
subjunct 0.16 0.66 0.03 0.20 -0.12 -0.02
diminut 0.14 -0.14 0.22 0.40 0.00 -0.03
sereflex 0.14 -0.02 0.36 0.08 -0.01 -0.05
quevcompsub 0.14 0.49 0.04 0.05 -0.03 -0.01
aspectv 0.10 -0.05 0.38 -0.02 0.05 0.01
querelsubjunc 0.10 0.56 -0.10 0.04 -0.08 0.00
prep_pro 0.06 0.02 0.20 0.30 0.05 0.06
ncompque 0.06 0.29 -0.01 -0.12 -0.04 0.02
preterit 0.05 -0.17 0.40 -0.04 0.64 0.01
conditnl 0.04 0.42 0.09 -0.04 -0.03 -0.01
exclamat 0.03 0.03 0.06 0.59 0.05 -0.01
seemocion 0.03 0.03 0.33 0.00 -0.01 -0.03
jcompque 0.02 0.24 0.00 -0.11 -0.14 -0.04
cualrel -0.01 0.02 0.00 -0.02 0.02 0.95
propem -0.02 0.01 -0.12 0.09 0.80 0.01
vplusinf -0.03 0.46 0.30 -0.07 -0.30 0.03
othelcual -0.03 0.00 -0.01 0.00 0.04 0.94
conditionals -0.04 0.30 0.06 -0.08 0.06 0.00
future -0.04 0.39 -0.11 0.11 0.13 0.04
-0.08 0.24 0.03 -0.13 -0.23 0.00
-0.09 0.08 0.40 0.07 -0.13 0.05
-0.13 0.07 0.49 0.02 0.27 0.00
concesssubord -0.15 -0.14 0.15 -0.11 -0.05 -0.01
novorainf -0.18 0.38 0.33 0.01 -0.20 -0.01
serpsvpor -0.21 -0.07 -0.07 0.01 0.24 0.06
clitic -0.24 0.21 0.70 0.18 -0.25 0.02
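
The inter-factor correlations reported in this appendix can be checked for internal consistency with a few lines of code. The following is a minimal Python sketch, not part of the original study: the matrix values are copied from the table, while the helper names `is_symmetric` and `strongest_pair` are hypothetical and introduced only for illustration.

```python
# Inter-factor correlations (Factors 1-6), copied from the appendix table.
CORR = [
    [1.00, 0.26, 0.27, 0.44, -0.36, -0.14],
    [0.26, 1.00, -0.03, -0.02, -0.15, -0.06],
    [0.27, -0.03, 1.00, 0.19, -0.05, -0.08],
    [0.44, -0.02, 0.19, 1.00, -0.24, -0.10],
    [-0.36, -0.15, -0.05, -0.24, 1.00, 0.02],
    [-0.14, -0.06, -0.08, -0.10, 0.02, 1.00],
]


def is_symmetric(matrix, tol=1e-9):
    """Return True if matrix[i][j] equals matrix[j][i] for every pair (i, j)."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def strongest_pair(matrix):
    """Return (factor_i, factor_j, r) for the largest absolute off-diagonal r.

    Factor numbers are 1-based, matching the labels in the table.
    """
    n = len(matrix)
    pairs = [
        (i + 1, j + 1, matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return max(pairs, key=lambda pair: abs(pair[2]))


if __name__ == "__main__":
    print(is_symmetric(CORR))    # True
    print(strongest_pair(CORR))  # (1, 4, 0.44)
```

On the values as printed, the matrix is symmetric and the largest correlation is the one between Factor 1 and Factor 4 (0.44). The remaining correlations are all at or below 0.36 in absolute value.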
EPISTEMIC MODALITY AND ACADEMIC ORALITY
Guiomar E. Ciapuscio
Universidad de Buenos Aires
Argentina

Introduction

The field of scientific and academic communication has become today's key research area in linguistic and textual studies. Several factors, which may be 'external' and 'internal' to the field, contribute to this development. Among the former, the most relevant are undoubtedly the deep and significant consequences of globalization in communication exchange between distant cultural and linguistic communities, the possibilities that modern communication technologies offer, networking and joint work among different groups, and an increase in the mobility and migration of students and scientists. This has resulted in a more thorough reflection on the characteristics, problems and challenges posed by communication in specialized circles than in the past. Linguistics has been able to successfully meet this sociopolitical demand, due to its own development associated with the broadening of its object of study. The elements of linguistic use, neglected for so long, became the object of analysis in different schools of thought. In the last 15 years, researchers in the field of Textlinguistics, Systemic Functional Linguistics, Discourse Analysis and Applied Linguistics have carried out countless studies on scientific and academic discourse in different languages. This has led to significant progress in the knowledge and identification of the lexico-grammatical, textual and discursive features of different 'genres'.1

However, and for the most part, research on specialized discourse in different languages has focused on written 'genres'. Only recently has there been more sustained interest in the linguistic and grammatical peculiarities of academic oral 'genres' (for English see Recski 2005; Ventola et al. 2002; for Spanish, Castella Lidon 2004; Ciapuscio 2003a, 2004; Ciapuscio and Kesselheim [...]; Fortanet [...]; Müller 2005, among [...]). This article seeks to broaden and [...] the knowledge of modality and its [...]

[...] still in a preliminary stage, the overall goal of the COTECA [...] to create a text corpus of scientific [...] used in Argentina, both in oral and written varieties. Since the early 1990s in Argentina, there have been systematic studies on different aspects of scientific-academic communication in Spanish. Pedagogic 'genres' have been particularly targeted, either specific aspects of abstracts or distinct sections of scientific papers: particular enunciative phenomena and linguistic/grammatical features of scientific papers; reviews, and the terminology of different disciplinary areas; and the conceptual and formal variation of terminology in different 'genres'. In every case, these studies have been based on occurrence samples or small sets of texts that are not representative and thus do not allow for generalizations. Likewise, since they are individual works on different topics and problems, efforts are scattered throughout the disciplines under study. In sum, there are interesting qualitative results about the generic, enunciative, textual, grammatical and lexical characteristics of specialized texts in Spanish, although their representativeness and scope are quite limited.

COTECA's design and organization is based on the theoretical and methodological instrumentarium of Textlinguistics, and on the work by the project's research team along this line (Ciapuscio and Kuguel 2002; Gallardo 2005), which produce reliable categorizations of texts and their ordering in the corpus according to precise typological parameters. The research group has also made use of other available studies in corpus linguistics which provide reliable methodological criteria, which, in turn, will allow us to carry out quality quantitative and qualitative studies in the next stage. COTECA's design paid careful attention to source selection criteria, consulting with experts on the project's topic areas to ensure: 1) disciplinary diversity (different disciplines will be represented in the corpus: natural sciences, social sciences and the humanities - although we have focused on broadly defined biology and economics in the first stage). The choice of disciplines also reflects their importance in the scientific community and their social impact; 2) diversity of 'genres': for this early stage we have chosen central 'genres' of both oral and written scientific-academic activity. The spectrum of 'specialized texts' includes new and original knowledge (research articles, dissertations, conference presentations, papers), and different forms of science communication (university lectures, articles in science magazines, popular science talks).
[...] science talks, and, as [...] COTECA's research work.

2. Theoretical framework: texts, 'genres' and specialized orality

The theoretical framework is Textlinguistics, with a cognitive-communicative approach (Heinemann and Viehweger 1991; Heinemann and Heinemann 2002). The latter conceives texts as primarily psychical entities - given that, as any human activity, they are based on psychical processes - that should be considered the result of mental processes. Texts - natural and complex research objects - are characterized by the quality of textuality, which has a prototypical nature and consists of a set of features or attributes related to the different levels or constitutive dimensions of texts - functional, situational, semantic and grammatical (Heinemann 2000). The different levels that allow us to describe and systematize this complex object are not unrelated; rather, there is a close and reciprocal conditioning between them: the text's functional, situational and semantic levels determine the microstructural aspects (information distribution, syntactico-semantic connections between sentences, syntax and lexicon). On the other hand, the microstructural features are inescapable components when describing and explaining the text object in its more comprehensive levels (Ciapuscio 2003b: 22).

Texts are always representatives ('tokens') of a category or 'genre' of texts ('types'). According to Heinemann (2000) and Heinemann and Heinemann (2002), 'genres' could be described and explained in terms of text groupings based on multi-dimensional features, i.e. features associated with their different constitutive dimensions.

The knowledge of 'genres' is acquired in communicative experiences and plays a central role in the acts of producing and understanding texts: they are part of our communicative budget (Bergmann and Luckmann 1995). 'Genre' knowledge, generally called 'global text patterns' (Heinemann and Heinemann 2002), may be understood as a set of general orientations towards the texts' attributes in their different dimensions, which vary according to an individual's communicative experience, guiding and helping people in the acts of producing and understanding. These general orientations, related to the different dimensions of texts, are also reflected in the recurrent features that characterize different 'genres'.

[...] unitary and [...] vs written prose - academic, narrative, expositive, [...]. It is in recent decades that critical voices have emerged, positing that dichotomy in terms of a continuum (Koch and Oesterreicher 1990) and arguing that orality is not homogeneous, but rather presents a diversity of more or less elaborate forms (Blanche-Benveniste 1998). Orality clearly includes an array of different genres. As Payrató (1998: 29) has stated:

Just as it is nowadays easy to prove that, within the field of language sciences, variation is a polyvalent concept, which requires a complex definition, clearly what is known as oral language is not really a (unitary) language variety, but a (very diverse) set of discourse forms. This consideration is paired with a second one: the oral/written dyad should be explained as an ambivalence and not as a dichotomy, as was suggested by traditional orality prototypes or archetypes (spontaneous conversation) and writing (a text that is prepared and perfectly cohesive and consistent).

On the other hand, although orality expertise is not homogeneous, it is clear that there is an unequal frequency distribution between spontaneous orality and elaborate or formal orality (Teberosky 1998). Although we have many chances to listen to and produce spontaneous oral forms, there are comparatively fewer opportunities to produce or listen to elaborate oral productions (official discourses, ceremonies or rituals, scientific talks, etc.). Thus, there is still a tendency in linguistics to reduce the field of orality to spontaneous orality, or to assign it greater relevance. The large number of studies on specialized discourse in written 'genres' is also compounded by the traditional attitude in linguistics as well as in other social sciences that considers the written and monological variety to be the exemplary model of scientific discourse - research articles, monographic papers, textbooks, etc.

This is the reason why COTECA has decided to devote their efforts to the [...] of oral academic 'genres', specifically lectures, academic and popular science talks, papers and private interviews with specialists. Linguistic research on specialized orality will allow us to substantively broaden our theoretical-descriptive knowledge of lexical, grammatical and textual [...] that until now have been analysed mostly in written texts. Similarly, research on the textual types of specialized orality will provide descriptive knowledge that will contribute to the education and training
[...] what he/she says. As [...] we can express our degree of [...] or the [...] of [...] or [...] We may also [...] our feelings and affection [...] anger, and we may formulate our discourse or text as statements, wishes, orders, etc. I thus start from the classic distinction between intellectual (epistemic, i.e., declarative and hypothetical), interrogative, volitional and affective modalities proposed by Charles Bally (1944).3 These were later reformulated by a large number of studies as regards labels, distinct qualities and location within the framework of different linguistic theories and approaches, for example studies on modality and modulation in a Systemic-Functional Grammar Approach (Halliday 1994), recent developments along that line such as the Appraisal Theory (Martin 2001; Martin and Rose 2003), work on evidentiality (Chafe 1986), on corpus linguistics' evidentiality and affect (Biber and Finnegan 1989; Conrad and Biber 2001), on modality and grammaticalization (Palmer 2001), and finally, on modality and emotion (Sandhofer-Sixel 1990; Danes 1987).

Since Chafe's pioneer work on evidentiality and his research on its manifestation in reduced corpora of colloquial orality and academic writing (1982, 1986), there have been many studies of academic and scientific texts that analyse how attitudes toward knowledge are expressed. Available studies in English and German focus on standardized written 'genres', such as research articles, textbooks, peer reviews, etc. (among many others, Ventola 1997; Hyland 2000). In Spanish, works on this topic are still very rare, although they also focus on written 'genres' (Ferrari and Gallardo 1999; Gallardo 1999; Ferrari 2004).

Several studies by Biber, both individual and co-authored (among them, Biber and Finnegan 1989, 1994; Biber et al. 1998; Conrad and Biber 2001; Biber 2005), have been very enlightening on the topic of evidentiality and affect in 'genres'. Using the notion of 'stance', they include the lexical and grammatical codification of evidentiality in English. In their research, based on multifactorial analysis, they study the grammatical categories that realize evidentiality and affect stances in different 'genres' (both oral and written). The examination of co-occurrence of certain linguistic features allows them to propose different 'attitudinal styles'. In Biber and Finnegan's seminal work (Biber and Finnegan 1989), the authors present a study on stance styles in English: with the aid of a computer program, they analysed 24 written and oral 'genres' classified on the basis of groupings of stance features, and proposed six basic attitudinal styles in English. The selection of oral texts includes the following: face-to-face conversations, telephone conversations, public conversations, debates and interviews, radio programmes,

[...] who claim that studies [...] stance not [...] but should rather refer to the concept of [...].4 Each [...] own use, i.e. it favours a different set of stance [...] different types of grammatical realizations. The basis of this work is the Longman Spoken and Written English Corpus, from which [...] 100,000 words ranging across three registers: conversation, academic [...] and news. The notion of stance is reformulated and broadened [...] previous studies: stance thus designates the [...] of feelings and value judgements on three major levels:

• the epistemic stance, which comments on the certainty (or doubt), reliability or limitations of a proposition, including comments on the source of information;
• the attitudinal stance, which expresses the speaker's attitudes, feelings or value judgements;
• the style stance, which describes the way in which the information is presented.

Within the epistemic stance, it is possible to distinguish different subclasses:

a) indication of degree of certainty or doubt regarding the proposition (realized by perhaps, probably, etc.);
b) comment on the actuality of the proposition (actually, really, in fact);
c) indication that the proposition is in some way vague (sort of, if you call it that way);
d) identification of the source of information or the specificity (according to) or, by implication, with words such as apparently and evidently;
e) limitation of the information or identification of the perspective from which the proposition is true (in most cases, from our perspective).

Keeping in mind that these are texts produced in everyday communication, an interesting result of this study is that conversation contains more than twice as many adverbial stance markers as written texts. Very common stance adverbials in conversation are actually, really (polysemic) and probably. This frequent use is consistent with several background characteristics of conversations: focus on interpersonal relationships, expression of value judgements and personal attitudes, and lack of time for planning. This result is, in turn, consistent with the expectation that participants in a conversation are personally involved with their messages and thus frame their statements with their personal attitudes and opinions.

In academic prose, the writers' concern is evident in the pains they take to express certainty, actuality and vagueness. However, and contrary to conversation, there is a relatively wide range of epistemic stance markers
[...] a very [...] stance. On the [...] centred [...] assume distinctive characteristics vis-à-vis other [...] to various fields and [...], and on several [...] conversation [...] the dominant one. The study [...] science talks I analyse in this article intends to contribute to the knowledge of epistemic modality in academic orality, and to provide a theoretical-analytical basis for further research. My analysis focuses on what Biber and Conrad call the epistemic stance and, within it, on the ways in which the speaker indicates the degree of certainty or doubt regarding the proposition uttered.

My work is based on a series of theses directly related to the multi-dimensional notion of 'genre' that intend to advance the knowledge of modalities in Spanish and its academic 'genres', since - at least to my knowledge - the interrelationships between them have not yet been systematically studied:

• the types of modalities and markers present in academic 'genres' are related to their functional dimension, that is, with the actional and social goals that 'genres' fulfil in interaction;
• the types of modalities and their markers are conditioned by the 'genre's' situational dimension, specifically the orality/writing variables, the limitations imposed by the kind of institutional framework, and the types of interlocutors participating in the interaction;
• variation in modalities is an outstanding linguistic feature or attribute that characterizes different 'genres'.

4. Popular science talks: the centrality of epistemic modalities

The 'popular science talk' 'genre' corresponds to what may be included in planned orality: specialists in different topics present the results or aspects of their research to the general public in a formal environment. These talks are predominantly monological texts prepared for oral presentation, in all cases with some kind of visual support (slides, projections, etc.), at the end of which there may be questions from the audience. The corpora are two lectures delivered by Argentine scientists as part of a series of popular science talks organized by the University of Buenos Aires, 'Buenos Aires piensa 2004' and 'Las ciencias adelantan que es una barbaridad'.5 The corpus comprises 25,476 words, in a four-hour recording. I have selected 42 fragments of variable length, with a display of

[...] in this context the [...] and markers to express the modalities.

Example 1 presents an illustrative case for our study. It comes from a talk on the effects of addictions, such as tobacco, alcohol and other drugs, on the brain.

(1)
Una de las cosas que no nombré es que esta vía, que veíamos que se afecta en todos los adictos, eh , , , muchos se preguntaron por qué existe esa vía relacionada a la adicción, entonces trataron de buscar qué cosas , , , o sea, que tendría que haber un sistema que funciona naturalmente, algo natural, ¿sí? en todos los animales, porque existe en todos. Lo que encontraron, por ejemplo, en animales y en humanos, con eso de las tomografías y la resonancia magnética, es que es una vía que está motivada por algo placentero. Naturalmente los dos eventos que encontraron placenteros son el sexo, o sea el sexo lo que hace es mantener una especie, o sea los animales lo que hacen es tratar de procrearse para mantener la especie y el número de especie. Y la segunda es la comida. ¿Por qué la comida es necesaria para el mantenimiento del individuo? Si no comemos nos morimos, los dos eventos activan esa zona. Lo que se postula es que la droga lo que hace es desviarla como si la exacerbara, ¿sí? se evade esto y está asociado con ciertas drogas. Está visto por ejemplo que las personas que dejan de fumar empiezan a comer más. O los fumadores comen menos. Empiezan a fumar y comen menos. Y otro que hay ciertas drogas que producen un displacer hacia el sexo, o sea que tienen problemas sexuales. Entonces esto digamos que se está estudiando. No se sabe todavía pero es una hipótesis que puede ser, que puede ser válida [A]

This fragment shows a sequential display of statements in which the speaker expresses different degrees of certainty, through different lexico-grammatical expressions, to present a strong claim arising from the research on the effects of addictions on the brain. The steps in the development of the arguments are anchored on epistemic modals, both hypothetical and declarative:

Lo que encontraron, ( ... ) es que es una vía que está motivada por algo placentero.
Naturalmente los dos eventos que encontraron placenteros son el sexo ( ... ) Y la segunda es la comida.
Está visto por ejemplo que las personas que dejan de fumar empiezan a comer más. O los fumadores comen menos. ( ... ) Y otro que hay ciertas drogas que producen un displacer hacia el sexo ... Entonces esto digamos que se está estudiando. No se sabe todavía pero es una hipótesis que puede ser, que puede ser válida.

The large amount of data [...] the [...] of the corpus confirms that when specialists present their research to the general public, they make explicitly clear the certain or hypothetical nature of their statements, which is translated by an extensive repertoire of markers. Table 4.1 shows a classification of the epistemic markers analysed.

Table 4.1

[Verbs]
  Declarative: [...] (present tense, imperfect [...] form); ver (see) (1st p. plural, past tenses; periphrasis estar ('to be') + gerund); confirmar (confirm) (1st p. plural, simple past); sostener (hold/maintain) (3rd p. plural); comprobar (prove/show) (1st p. plural, simple past); ser seguro que (be sure of) (3rd p. singular); encontrar (find) (reflective passive 3rd p. singular); (no) haber duda (there is (no) doubt) (3rd p. singular)
  Hypothetical: [...] (3rd p. [...]); creer (believe) (1st [...] 3rd p. singular); parecer (seem) (3rd [...]); [...] (can) + [...] (indicative/conditional; passive verb phrase in 3rd p.; uno ('one') + 3rd p. sing.); deber (must) + infinitive (V in indicative/conditional; 3rd p. sing./plural; 1st p. plural); tener que (have to) + infinitive (meaning 'must'); postular (advance/propose) (3rd p. sing., passive with particle se (reflective)); especular (speculate) (3rd p. sing., passive with particle se (reflective)); estar discutido (be discussed) (3rd p. sing., state passive); (no) tomar al pie de la letra (not to take literally) (3rd p. plural; subjunctive); tomar con pinzas (take sth. with a pinch of salt)

Epistemic adverbs
  Declarative: evidentemente (evidently); realmente (really); efectivamente (indeed)
  Hypothetical: aparentemente (apparently); probablemente (probably); posiblemente (possibly); quizás (maybe); tal vez (perhaps) + indicative/subjunctive

Nominal syntagms
  Declarative: idea (idea)
  Hypothetical: hipótesis (hypothesis); pregunta (question); teoría (theory)

[...] participle syntagms
  Declarative: evidente (evident); comprobado (proven) (in state passives); confirmado (confirmed) (idem); visto (seen) (idem)
  Hypothetical: probable (probable); postulado (postulated); debatido (debated)

Prepositional syntagms
  Declarative: en realidad (in fact)
  Hypothetical: en teoría (in theory); por lo que pude ver (from what I could see) (+ verb in the indicative); por ahí (could be) (+ verb in the conditional)

The table shows that specialized orality includes the majority of common epistemic markers analysed in the research on academic written 'genres' (Ferrari and Gallardo 1999; Ferrari 2004). As regards the declarative modality, we find verbs such as know, confirm, maintain, see, prove and the verb to be + sure, without doubt, etc.; epistemic adverbs such as evidently, really, in fact, lexical modality in adjectives (evident), participles (proven, confirmed); in prepositional syntagms (in theory), etc. Likewise, there is a dominance of the indicative mood, and of the past and present tense. For the hypothetical modality, speakers use epistemic verbs such as think, believe, seem, propose, speculate, but also colloquial forms such as take with a pinch of salt, (not) to take literally; semi-lexicalized expressions, for example from what I could see, attitudinal markers such as maybe + conditional; epistemic adverbs, for example apparently, probably, possibly, maybe, nouns (hypothesis, theory, question); and adjectives (probable, proposed). Statements are expressed in the conditional and subjunctive moods, as well as in the future indicative. Thus, we may argue that in popular science talks we find features that are typical of the careful presentation of both established and new knowledge, plus a variety of lexico-grammatical markers of assertion and epistemic doubt which corpus studies assign mostly to written academic prose. On the other hand, this being an oral interaction, the 'genre' also reflects the linguistic consequences of its register: there is highly recurrent use of the first person singular and plural within epistemic constructions containing verbs that describe the speaker's mental activity (I know, I believe) and that of the research group (we confirm, we saw, we think, etc.). That is, there is an important presence of the explicit subjective modality (Halliday 1994: 335-7) consistent with oral practice and direct contact with the interlocutor that clearly marks the contrast with written 'genres'.

6. Modality's qualifying schemata

The qualitative data show that the specialist in popular science talks clearly and [...] indicates the degree of factuality of his/her statement, showing the interlocutor how he/she must interpret the statements uttered. Apart from the variety of markers of epistemic modality, the analysis of the corpus allowed me to [...] a [...] followed by specialists - the use of what I
Verb tenses [...] Declarative assertion [...] Conditional [...]

[...] schemata in [...] in which the [...] out - [...] different grammatical procedures - a fragment [...] or following text, and 'assigns value' to it from the perspective of its factuality. The following are some examples of qualifying schemata:

(2)
[Los receptores son los que van a hacer que el individuo reincida a la nicotina. Cuando uno fuma de vuelta, yo dejé de fumar, pruebo un cigarrillo, lo que me desencadena eso, es la nicotina, o sea que son los receptores a la nicotina]X ← [de eso no hay duda]Y ¿sí? [15, A]

At the end of this fragment I have marked the modality's qualifying operator (Y) in italics, where the neutral pronoun of the prepositional syntagm - eso - picks up the fragment of the previous text. The assertion no hay duda reinforces the declarative value expressed in it. The modal operator is attached in parataxis to the segment that has been assigned value and, due to the coincidence with the modal value of the segment that it modifies (an assertion in itself), the operator may be considered a modal reinforcer.

In example 3 we find an operator that has a different value from its modified segment. Since its effect is to restrict or relativize the previous assertion, I call it a 'modal qualifier'. The operator instructs the interlocutor as to the reservations he/she should have while interpreting X; it follows the modifier and, again, a deictic element - the neutral demonstrative - realizes the explicit reference to the modalized fragment:

(3)
[Fumar causa un tercio de la muerte de los hombres entre treinta y cinco y sesenta y nueve años y es una de las causas de mortalidad que continúa creciendo, y de aquellos que fumaron durante la adolescencia y durante la vida, la mitad muere por problemas relacionados al hecho de haber fumado]X ¿Sí? ← [esto también. cuando vean estadísticas, en los diarios o en lo que sea, les sugiero que lo tomen con [...] también, porque hay factores, que uno no sabe]Y [14, A]

In example 4, the structural and semantic complexities are even greater:

(4)
[Es que los detectores a la nicotina están en el centro, en los núcleos del cerebro, los cuales/ en los cuales estas células se mueren, en el Alzheimer o en el Parkinson. Entonces cuando cuando estas/ estas células del cerebro tienen este receptor, cuando unen la nicotina, se mantienen activas entonces no logran

[...] different schema whose [...] is [...] elements. We [...] ésta es la teoría ahora, which both [...] and temporally restricts the previous assertion (it indicates that this is a provisional [...] this assertion, we find a number of reformulating statements that inform the interlocutor about the limited range within which he/she should interpret X: o sea no lo tomen como/ como al pie de la letra, porque todavía no se sabe, esto es lo postulado hasta ahora, lo que se vio en cuanto se trabaja en/en in vitro, cuando se trabaja en temas de animales. The sequence ends with an assertion that sums up and clarifies the restricted nature of X: En humanos todavía no sabemos nada.

From a structural point of view, the modal operator may precede the modified segment, as shown in example 5, where the neutral demonstrative operates cataphorically:

(5)
Aumenta el deseo de búsqueda de placer, [esto es lo que decía hoy. que está discutido, hay algunos que sostienen, otros que no]Y, [que muchas drogas ya no provocan realmente un placer]X entonces ya tiene que ver con otro/ con/ con la búsqueda, el deseo de búsqueda del displacer, de no tener /de no sentirse mal. ¿sí? [8, A]

Within the modal operator, the qualifying segment - que está discutido, hay algunos que sostienen, otros que no - explicitly states the relative value with which the modified segment should be interpreted. In this case, the modal operator and the modified segment are placed in a hypotactical structural relationship.

The modal operator may be embedded in the component it modifies: in example 6, the reinforcer interrupts the lineal syntax and separates the nucleus from the prepositional complement, establishing a parenthetical connection with its modified segment:

(6)
estas estructuras son capaces de. producir comportamientos rígidos pero. [son capaces]X ← [lo hemos visto, lo han visto ustedes]Y → [de mucho más que eso, son capaces de aprendizajes simples, pero también de aprendizajes complejos, aprendizajes que hasta ahora no creían que pudieran existir en un sistema como éste]X [36, B]

In this case, the operator consists of a constituent that includes a structural parallelism: [...] constructions including the epistemic sense verb [...] together with an identical direct complement [...] as the object pronoun), achieve an emphatic modal value that reinforces the categorical assertion of the modified segment.
(7)
y entonces [la hipótesis es, bueno, eso quiere decir que las (abejas, ge) que la siguen huelen el olor que trae y reciben azúcar, entonces tal vez están aprendiendo, a su vez, las características de la flor, en todo caso del olor de la flor a la que hay que ir, no sólo está sabiendo adónde hay que ir, sino cómo huele la flor a la que hay que ir]X ~[esto un trabajo que se está haciendo en este momento, así que preferiría tener las conclusiones, antes de decir es así o no es así]Y [42, B]

The temporo-aspectual restrictions of factuality are manifest in the regular occurrence, within qualifying schemata, of time adverbs and complements, durational and progressive verb periphrases, state passives, etc., that accompany verbs or other epistemic markers. The following are some of the examples from the corpus:

Constructions with knowledge verbs + todavía ('still'):

- todavía no se sabían las causas ni nada
- ayudaría, digamos todavía no se sabe bien
- o sea no lo tomen como/ como al pie de la letra porque todavía no se sabe
- en humanos todavía no sabemos nada
- no se sabe todavía por qué producen diferentes efectos
- todavía no se sabe bien dónde está
- no se sabe todavía pero es una hipótesis que/que puede ser/ que puede ser válida
- en ciertos tipos de memoria todavía no se sabe cuáles
- y este modelo todavía está muy lejos como la mayor parte de los modelos que existen, de develar todos sus misterios

Constructions with por/hasta + ahora ('until/up to + now'):

- Sustancias no perjudiciales, no perjudiciales digamos por ahora
- esta es la teoría por ahora
- esto es lo postulado hasta ahora
- aprendizajes que hasta ahora no creían que pudieran existir en un sistema como éste

Durational/progressive periphrasis:

- lo que se está viendo acá es que quizá no es tan el número
- esto un trabajo que se está haciendo
este cerebro
nos está haciendo falta es en realidad generar nuevos modelos
- es lo que nos está haciendo falta son este de aproximaciones
- qué es lo que se está midiendo
- no se nos ocurrió es que tal vez lo que estábamos haciendo necesitaba un contexto diferente

These modifiers can be considered as co-markers of factuality.

We are now able to systematize the main structural and functional features of the qualifying operators for this modality. From a structural point of view, the operators and the modified text segment form a binary structure:

Modified Segment (proposition/s) (X) + Qualifying Operator (Y)

The formal length of X and Y varies greatly in these texts, and may include one or several predicative structures. The syntactic relationship between X and Y may be paratactical, hypotactical or parenthetical (paratactical or hypotactical interpolated clauses). The order between the modified segment and the qualifying modal operator is also variable: Y may precede, follow or even be embedded in X.

X+Y (Parataxis or hypotaxis)
Y+X (Parataxis or hypotaxis)
X-(Y)-X (Parenthetical)

According to the examples analysed, the modal operator (Y) is a binary structure made up of the following constituents: a first component, A, represented by elements that pick up, point out or refer to X, and a second component, B, specifically the qualifying component, which may include different modal markers and structures.

Qualifying Operator (Y): [{A} + {B}]
A: {[anaphoric/cataphoric element] v [metapropositional elements] v [ellipsis] v [hedges]}
B: {[modal marker(s)] v [temporo-aspectual modifiers] v [modal propositions]}
v = inclusive disjunction

The chart shows that component A is represented largely by a pronominal element (anaphoric or cataphoric, examples 2, 3, 5, 6). The modified segment may also be indicated by metapropositional elements, such as idea, teoría (idea, theory) (example 4); by ellipsis through co-referential arguments; or even by hedges, such as digamos (let's say). In the case of the specifically qualifying component, it contains several modal markers, such as epistemic verbs, nouns, adjectives, participles, prepositional syntagms and phrases (see Table 4.1), to which temporo-aspectual modifiers are regularly attached, thus restricting factuality. Naturally, indicators operate
Conclusions
The [...] science talks has shown [...] with regard to the manifestation of epistemic modality. On the one hand, there is linguistic evidence of a significant presence of the specialist's explicit subjective expression, assuming the responsibility for modal assessment. On the other, the research has verified a clear tendency to orient lay interlocutors as to how they should interpret the factuality of the specialist's statements. In this sense, I have identified and described a regular procedure to accomplish this - the modality's qualifying schemata. These have a stable structural composition and functional nature, providing further grounds for the belief - stated by different analysts - that there is, in fact, order and regularity in orality (Blanche-Benveniste 1998, among others). The data gathered has further shown that in the popular science talk genre in Spanish both personal epistemic value judgements (e.g., yo no creo en una droga milagrosa, lo que yo sé es que ...) and non-personal epistemic value judgements (no se sabe, se está estudiando, lo que se vio cuando se trabaja in vitro) that seem to represent the voice of the speaker's community of peers are significantly present. We should point out that the non-personal epistemic value judgement has been described so far as a feature of written academic prose.
The descriptive data gathered at the grammatical form level may be understood in relation to the functional and situational dimensions of the genre. The purpose of adequately and fully informing a general audience about their personal research in a public institutional environment and a relatively formal context underlies the specialists' concern for a careful presentation of the degree of factuality of their statements in texts that have been prepared beforehand. This explains the rich and varied repertoire of markers to express doubt and certainty, and also the fact that the 'genre' is full of assertive and hypothetical modal qualifying schemata. On the other hand, from a situational level, the features that characterize popular science talks, such as direct contact with interlocutors and oral expression, also determine how specialists typically manifest commitment (Chafe 1982) in spoken discourse: references to the first person, statement of speakers' mental processes, hedges and colloquial locutions.
The descriptive results of the different lexico-grammatical markers of epistemic modality may be used to guide further research of more extensive corpora of academic orality and the expression of this modality in Spanish. This article has provided evidence as to how the features of functional and

Notes
[...] we should also mention the research [...] recent years has [...] knowledge regarding the [...] and semantic specificities of lexical units and their combination and formal behaviour in Spanish (Adelstein 2004; Adelstein and Cabré 2002; Kornfeld and Resnik 2002; Kuguel 2006).
2 This project has received funding from the Consejo Nacional de Investigaciones Científicas y Técnicas de Argentina (PIP-CONICET 6165).
3 Zavadil (1968) presents a study of the different modalities in Spanish and their markers. See also Kovacci (1990).
4 The analysis of several works by these authors shows, however, that they seem to make no conceptual distinctions between the terms 'genre' and 'register'. In Biber and Finegan (1994: 4), for example, they define register as a 'linguistic variety considered in relation to its use context'; further on, they argue that 'aside from the term register, we have used the terms "genre", text type, and style to refer to linguistic varieties associated with situational uses'.
5 Alicia Avellana transcribed the text which I later revised. I have normalized the transcription, erasing paralinguistic information irrelevant to the analysis of the topic discussed in this article, such as lengthening or emphasis. The punctuation attempts to represent intonation in this type of spoken text.
REGISTER ANALYSIS
Omar
Pontificia Universidad Católica de Valparaíso
Chile

Introduction
Prepositional complements (PC) constitute a kind of grammatical structure related to some verbs. Like many other grammatical structures, it can be seen from different perspectives, depending on the author's point of view. Furthermore, no previous studies have been conducted in which the PCs analysed in this research were related to a lexical verb category, such as movement verbs, cognition verbs, communication verbs, etc. This chapter proposes a more diverse view of PCs, as what has been labelled a 'prepositional scheme'. Moreover, we link these schemes to a specific verb type, namely to what we call communication verbs (Bosani 2000; Sabaj 2004a).
These verbs, as can be seen from the different approaches that will be discussed, play an essential role in various dimensions of discourse, among them the presentation of the discourse's voice and the author's stance with respect to the text topic. Unlike previous studies, this research is based on a corpus linguistics methodological perspective, which implies that descriptions are based on actual texts taken from different language uses. Such an approach has rarely been used in grammatical studies, since they are traditionally associated with the [...] of units without a discourse context, i.e. grammarians have been traditionally more concerned with isolated sentences. In this [...] research is based on diversified corpora containing nine registers of varieties of Spanish, oral and written, scientific and educational, professional and literary, among others.
This [...] has been organized as follows: first, we present the theoretical [...] on the basis of a revision of different grammatical approaches, proposing the notion of a prepositional scheme and its relation with PCs. Next, we review the characteristics of verb studies associated with corpus and define communication [...] which

1.1 Prepositional schemes
At this stage, we propose the notion of 'prepositional scheme' as a category of collocation, [...] (at least in part) what different grammars (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986) have variously called prepositional complement or prepositional object complement.
The first question that arises is that related to the reasons for proposing this category, and the need to set it apart from other equivalent grammatical categories. The answer is provided by the difference in collocation, based on the systematic co-occurrence of two or more elements that do not always have a direct grammatical relationship. In this sense, the notion of prepositional scheme can be understood, in principle, as a bigram (Jurafsky and Martin 2000) or a recurrent sequence of two elements, where the first element is a common verb, and the second a common preposition. From a different point of view (Biber 2005), a scheme can be seen as a sequence or a frequently occurring lexical bundle. The schemes we present here also have a third element, so, strictly speaking, we are dealing with a trigram. The third element belongs to a general cognitive category that is equivalent to the notion of thematic role (Dowty 1991) or semantic participant (Sabaj 2006). These semantic categories, however, are cognitive and, as such, do not necessarily maintain a dependent syntactic relation with the previous elements. They are cognitive elements because they can be represented in an ontology or semantic network where the nodes correspond to events, bodies, objects and a more or less defined number of circumstantial elements (event, space, place, mode and instrument). These cognitive categories complete the meaning of the verb and can relate (as shall be seen later on) with syntactic structures such as prepositional object complements.
For Spanish, there are several approaches dealing with PCs (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986) and consequently a very wide range of terms is employed. In this paper, we will summarize the most relevant classifications proposed in Spanish and then we will relate the description of those structures to the categories defined for the prepositional schemes we present.
From a general point of view, prepositional complements are understood as structures introduced by a preposition, forming one single unit with the [...] followed [...] an argument. Even though this general definition may seem acceptable, to a greater or lesser [...] it is [...] to propose at
1) [...]
2) [...]
3) alternation of a transitive version and an intransitive version; and
4) the structure's status as [...] or argument.

Even [...] these criteria [...] structures that function in a similar way and, [...] prototypes.
According to the first criterion, some verbs do not present an alternative construction without a preposition, i.e. they only appear with a preposition. Some examples of those verbs are radicar en, consistir en, carecer de, influir [...] de. The preposition, in these cases, is generally placed immediately after the verb and a complement is mandatory to make the construction complete:

(1) *El gobierno influyó en.

According to criterion 2) it is possible to find constructions that may or may not be followed by a preposition. Some examples of this type of verb are hablar/hablar de, preguntar/preguntar por, aprender/aprender a, luchar/luchar por. The alternation does not imply a change in meaning, but includes another participant:

(2) La mujer preguntó por ti/La mujer preguntó todo lo que quería.
(3) Aprendí a relajarme antes de la exposición/ Aprendí mucho durante mi estadía.
(4) Luchó por la libertad de [...] Luchó toda su vida.

Criterion 3) establishes a distinction between verbs with a transitive and intransitive version. The transitive version requires no preposition, unlike the intransitive version with a [...] form (Di Tullio 1997). Even though these verbs can be understood as a subset of verbs generated from criterion 2), the distinction between transitive and intransitive does not [...] to all of them, but to those defined by this third criterion. Some of these verbs are the following:

(5) lamentó el [...] se lamentó de los resultados obtenidos.
(6) olvidó el [...] se olvidó de su compromiso.

[...] to criterion 4) verbs that take prepositional [...] are [...] as such when they contribute [...] value to [...] of the verb. Such [...] are arguments and not [...] answer 'What?', not 'How?', as can be seen in [...]

(7) Mónica habla de corrido.
(8) Phoebe habla de música.

(9) Chandler puso los [...]
Ross se bañó en la [...]

As we have seen, there are different [...] determine whether a [...] can be [...] as a PC or not, and those criteria tend to [...] in a manner which, more than helping to define constructions, allows the analyst to establish relatively prototypical behaviour in open categories, as Cano has rightly proposed (1999: 1813):

The individuality of prepositional object complements in relation to the elements which govern them is expressed not only by the fact that prepositional dependence is not a well-defined category of complements in form and meaning, but also by the fact that, as with objects, they are not determinations that appear and are established as a part of any verbal process, but are specific to certain verbs.

Various authors (Di Tullio 1997; Torrego 1999; Gómez 2002; Fuentes 1985) point out that PCs can be misinterpreted as circumstantial complements (CC) and direct objects (DO). On the one hand, the way to distinguish PCs from CCs is based on criterion 4) defined above. In this sense, CCs can be easily omitted, whereas PCs cannot be deleted (Cano 1999). On the other hand, PCs have a strong conceptual link with DOs, because they both complete the meaning of the verb and can be recognized from the question 'What?'. Due to this close relationship between PC and DO, some authors have called these structures 'supplements' or 'prepositional object complements' (Torrego 1999; Hernández 1986; Cano 1999). Nevertheless, there are formal differences between both complements (Gómez 2002). PCs are introduced by a preposition and cannot be replaced by unstressed personal pronouns. Some authors (Di Tullio 1997; Cano 1999; Torrego 1999) have also remarked that in many cases the presence of a prepositional complement next to a verb that does not require this structure implies a [...] in meaning, e.g., [...] on. It is [...] to establish a direct relation between PCs and [...] schemes. All the discussed structures include a verb, a preposition and a category. Some of these schemes correspond to a [...] depending on the type [...] category they present. In Table 5.1 we illustrate the categories to be used, and we will see which of them could correspond to a PC.
From the correspondence shown in Table [...] it can be stated as a pre[...] that when categories in a scheme are an [...] or object, a person [...] event, such a scheme [...] to a PC [...] the category presents [...] and argument [...] structures.
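
The verb + preposition + category trigram described above can be sketched as a small data structure. This is only an illustration, not the authors' implementation: the class name `PrepositionalScheme`, the method `may_correspond_to_pc` and the label "Object" are hypothetical, and the rule that object, person and event categories are the ones compatible with a PC is an assumption drawn from the 'What?' versus 'How?' contrast in examples (7) and (8).

```python
from dataclasses import dataclass

# Cognitive category labels; "Object" is an assumed label for the
# partly illegible "... or object" category in the chapter.
CATEGORIES = ("Location", "Mode", "Instrument", "Object", "Person", "Event", "Time")

# Assumption: categories that answer 'What?' rather than 'How?'
# are the ones that let a scheme correspond to a PC.
ARGUMENT_LIKE = frozenset({"Object", "Person", "Event"})


@dataclass(frozen=True)
class PrepositionalScheme:
    """A verb + preposition + cognitive category trigram."""
    verb: str
    preposition: str
    category: str

    def __post_init__(self) -> None:
        # Reject labels outside the assumed category inventory.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown cognitive category: {self.category!r}")

    def may_correspond_to_pc(self) -> bool:
        """True when the category is argument-like under the assumption above."""
        return self.category in ARGUMENT_LIKE


examples = [
    PrepositionalScheme("hablar", "de", "Mode"),    # (7) Mónica habla de corrido.
    PrepositionalScheme("hablar", "de", "Object"),  # (8) Phoebe habla de música.
]

for scheme in examples:
    print(scheme.verb, scheme.preposition, scheme.category, scheme.may_correspond_to_pc())
```

The same verb and preposition yield different results here because the decision rests on the third element alone, which is the point of treating the scheme as a trigram rather than a bigram.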
Table 5.1 [...]

Category          Example                         PC
Location          anunciar en [...]
Mode              hablar de corrido
Instrument        hablar [...]
[...] or [...]    hablar [...]                    +
Person            hablar de Francisca             +
Event             hablar de que es entretenido    +
Time              hablar por una hora

2. Verbs and corpora
Verbs undoubtedly play a central role in natural languages (Wiemer-Hastings et al. 1998), a fact that makes them of focal interest in any linguistic study. Most current studies on verbs (Fernández et al. 1999; Aguirre 2000; Vázquez et al. 2000; Morante et al. 2000; Vázquez et al. 2002; Levin 1993; Ferrer 2004; Subirats 2004; Castellón et al. 2005) are formalistic and focus on the so-called lexical-syntactic interface. The purpose of these studies is to model the way in which some lexical patterns correlate with syntactic structures in order to implement those models into computerized verbal lexicons. This line of research has attained a high degree of development, but its contribution can rarely be used to describe real texts, which leads us to claim that there is little relationship between studies on verbs and their application in the description of authentic texts.
Conversely, the few studies that analyse verbs based on corpora consider these units to be a specific feature within a set of features in what has been called multiple feature analysis (Biber 1988; Parodi 2005c). Nevertheless, some research using corpora has specifically focused on verbs (Sabaj 2004a and 2004b), stressing their importance for register determination or automatic analysis of documents (Klavans and Kan 1998). The contribution of these studies exclusively focused on verbs is the fact that more accurate information can be obtained about the behaviour of this category, information which is otherwise lost when the verb is just another feature within a matrix.

2.1 Communication verbs
Communication verbs basically correspond to what is called verba dicendi in Latin grammar; that is, those lexical pieces expressing verbal human activity [...]. In this sense, and from a general point of view, communication verbs can be identified with acts of speech.
For the purpose of this [...] we will start from Bosani's scheme to determine the [...] of verbs to be considered in our research. Later, we will complete this scheme with other authors' taxonomies (Hyland 1998; López 2001; Massi 2005), determining the functions recognized by the communication verbs. The objective of the above will be to propose the typological criteria to be used in this work.
Bosani (2000: 253) states that the 'Spanish lexicon contains a large number of predicates referring to an act of enunciation'. She adds that the structure of communication verbs is determined by the relation of these verbs to the verbal archetype DECIR ('to say'), which structures the type of verb in question. The different subtypes of communication verbs can be defined according to the direct or indirect relation of a particular verb to the verb DECIR. Based on this principle, Bosani (2000) proposes four types of communication verbs that are schematically presented in Table 5.2.

Table 5.2 [...]

A   [...] be done with words
    preguntar, responder, [...], informar, avisar, ordenar, anunciar, declarar, relatar, narrar, comentar
B   Those referring to actions that can be done with words and affect the listener
    piropear, saludar, insultar, calumniar, injuriar, amenazar
C   Those referring to attitudinal actions that affect the propositional content of the statement
    enfatizar, asegurar, afirmar
D   Those referring to physical action that can be done with words
    gritar, susurrar, balbucear, murmurar

Type A verbs correspond to the performative use of language. Type B verbs can be defined according to two criteria: 1) whether the verbal act takes place through the use of a special type of locution (greeting, compliment, slander, etc.), and 2) whether it presupposes some degree of effect on the listener/patient of the verbal act. Type C verbs, apart from the action of saying something, express a certain attitude of the agent towards the propositional content introduced by the verb. As stated by the author (Bosani 2000), these verbs have a more complex lexical-syntactic representation, since beside the locutive action feature, they include the + MANNER feature in the representation, thus becoming equivalent to Urban and Ruppenhofer's proposal (2001). Finally, Type D verbs point to a locutive activity, marking the specific physical manner in which the agent does the 'saying'.
This work will not address verbs of type B or D, but will rather concentrate on A and C types. The first group is close to the protoverb DECIR which, according to the author, is a basic semantic predicate underlying all communication verbs. On the other hand, as was mentioned before, type C verbs include the expression of the attitude of the speaker towards the communicated propositional contents.
it is [...] discourse mode. When we [...] will also present some [...] that can be established between these [...] the communication verbs described [...] Bosani [...]
[...] to [...] lexical verbs, such as [...] indicar or [...] are the most used resources to express mitigation in scientific research articles. According to his definition:

Epistemic verbs represent the most transparent means of coding the subjectivity of the epistemic source and are generally used to hedge either commitment or assertiveness [...] epistemic verbs therefore mark both the mode of knowing and its source. (Hyland 1998: 119-20)

In general terms, it could be stated that epistemic lexical verbs correspond to what some grammarians (Halliday 1994) have called verbs of acknowledgement or mental processes, that is, verbs whose general meaning refers to the various cognitive processes of the human mind.
A special type of epistemic lexical verb is the judgement group of the kind sugerir ('to suggest') or estimar ('to estimate'). According to Hyland (1998: 120):

These verbs reflect appraisals by the speaker of the factive status of events and include speculation and deduction. They are distinguished by the fact that the [...] of commitment to the truth of a proposition is predicated on a reference to the uncertainty of human evaluation, thereby differing from other verbs of knowing and saying by their epistemic orientation to the proposition.

Another subtype of epistemic lexical verbs is that referring to the justification of [...] either based on reports of other speakers or the evidence of the author himself. It is important to note that, as pointed out by the author (Hyland 1998), evidence in scientific discourse often takes on an informal character, i.e. it is based on belief or knowledge taken for granted in specialized literature. Evidential verbs may be of two [...] such [...] i.e. reporting or [...] verbs. [...] evidential verbs, such as mostrar, predecir or sostener, express previous discoveries with respect to a theory and indicate the author's degree of commitment to such discoveries. In this sense, it can be asserted that the commitment expressed in a communication verb is a [...] a continuum.
There [...] an almost direct [...] between this epistemic classification of verbs and the classification of communication verbs [...] Bosani [...] verbs to [...] of communication verbs which [...] to Bosani's [...]. The first group will be referred to as Neutral [...] or, in other words, verbs [...] to action that can be carried out with words. The second group will be called attitudinal communication [...] as we mentioned besides communication, convey the speaker's attitude in relation to an utterance or somehow determine the general action of speaking. This attitude is to some extent related to commitment and can also be represented as a continuum. The decision to focus on these two types of communication verbs rather than others is based on our belief that the distinction between neutral and attitudinal verbs is relevant from the point of view of the register.
The analysis of prepositional schemes of communication verbs in actual texts has special importance when we consider the discursive roles of these verbs. As stated by Massi (2005) and López (2001), they are essential in the presentation of information in academic discourse, both direct and indirect. The authors state that these verbs are good markers for tracking the textual topic, i.e. the global theme of the text. The study of communication verbs on the basis of prepositional schemes helps to clarify which aspects (neutral or attitudinal) are predominant in communication and, even more, which specific schemes appear more frequently (i.e. which specific preposition interacts with each cognitive category). It is also possible to track all these aspects in a nine-register corpus, in order to determine the relative importance of these structures in each register, associating that relative importance to the macro-contextual characteristics of those registers.

3. The study
The methodological framework of this research is what Sampieri et al. [...] have called a 'descriptive study'. Following their definition, this type of research tries to 'measure or collect information separately or jointly about the concepts or variables they refer to' (Sampieri et al. 2003: 119). The authors add that this type of work offers the [...] of making predictions or establishing relations between variables. Even though these authors distinguish between this kind of descriptive research and other explanatory [...] we agree with Titscher et al. (2000: 7) in that:

[...] may have additional hidden aims and seeks to describe the behaviour of actors in particular social fields. Such investigations do not aim to account for quantifiable distributions but rather to document [...] of existence, and sometimes also to go [...] and the rules which determine them.
[...] it is not easy to [...] a [...] or [...] aspects stem from the type of category used and [...] aspects from [...] (using neither correlational nor parametrical but basic statistics) of the occurrence of those categories in each subcorpus.
Considering the above-mentioned arguments, the general objective of the study is to make a comparative description of the performance of prepositional schemes of two types of communication verbs in a multiple-register corpus and relate the performance of each type of verb with the macro-contextual characteristics of the registers under study.

3.1 Method
Since our analysis unit is a sequence of three elements, some decisions must be made. Both the sequence and each separate element can be studied as a unit. For better understanding of the manner in which we will analyse the schemes, we present Table 5.3.

Table 5.3 [...] (surviving category labels: Mode, Instrument, [...] or [...], Person, Event, Time)

3.1.1 [...] a descriptive matrix [...] analysis
In Table 5.4 we present six criteria for the analysis.

Table 5.4 General analysis matrix

Criterion    Name
N            Number of words
V            Number of verbs
NCV          Number of neutral communication verbs
ACV          Number of attitudinal communication verbs
P            Number of prepositions
C            Number of cognitive categories

3.1.2 Verb sample and selection criteria
A sample of 34 communication verbs was selected, dividing them into two groups as described above: neutral communication verbs and attitudinal communication verbs. The selection was done in three stages. First, a survey of various grammars and specialized articles in search of the verbs most frequently included in these categories was conducted (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986). Then, in order to establish a distinction between neutral and attitudinal verbs, they were classified according to Bosani (2000); additionally, we analysed the co-text of each occurrence to ensure that every verb corresponds to the category to which it was [...]

The third stage involved:

1) a series of queries to determine average frequency of occurrence of a verb. An internet search engine was used and the results were written [...] in order to eliminate verbs that were seldom used.
2) other queries, using the same instrument as above, to rule out the verbs unable to combine with a preposition, i.e. all selected verbs have the potential to combine with a preposition.

Finally, the research was restricted to 17 verbs appearing at the top of the list of frequency of occurrence in the two groups mentioned above. They are shown in Table 5.5.

3.1.3 The corpus
Table 5.6 presents the corpora used in the study.
The corpus analysed consists of nine diversified registers of Spanish, which include written registers (ARTICOS, DETP, DEEB, DICIPE, CPP and CLL) and oral registers (CEO and NOTICENTV). Some of these are school-level registers ([...] DEEB, CEO, and CTC), while others are associated with communication and the dissemination of scientific knowledge (ARTICOS and DICIPE). Within the corpus, there are some registers related to specific disciplines such as literature and public policies (CLL and CPP). In short, we have tried to focus our research on registers that can be grouped together or set apart according to different criteria, so as to cover a wide range of registers. For a detailed description of the general characteristics and multiple corpus collection procedures, see the
Table 5.5 [...] communication [...]

      Neutral        Attitudinal
1     [...]          [...]
2     [...]          [...]
3     [...]          [...]
4     Contar         Precisar
5     Narrar         Afirmar
6     Discursear     Aseverar
7     Enunciar       [...]
8     [...]          [...]
9     Comentar       [...]
10    Informar       Criticar
11    Manifestar     Declarar
12    Mencionar      Definir
13    Nombrar        Especificar
14    Señalar        Explicar
15    Transmitir     Insistir
16    Denominar      Revelar
17    Presentar      Sostener

Table 5.6 Conformation of the corpus

Name                                         Acronym      NW           ND
Articles of scientific research              ARTICOS      2,471,389    642
Technical-scientific corpus                  CTC          774,622      74
Technical-professional school discourse      DETP         40,449       27
Primary school discourse                     DEEB         139,250      49
Scientific dissemination in written press    DICIPE       204,598      412
Central TV news programmes                   NOTICENTV    84,809       270
[...] of public policies                     CPP          234,818      20
[...] of Latin American literature           CLL          513,359      12
[...] of oral interviews                     CEO          410,981      4
TOTAL                                        9            4,874,275    1,510

NW = Number of words
ND = Number of documents

3.2 Data
The first step in data [...] consisted of a series of [...] using the interface 'El Grial', which was developed [...] at Pontificia Universidad [...]

1) [...]
2) [...]

The [...] done [...] context of 5 elements [...]
The results were then transferred to MS Excel. An [...]

Left                                   [...]         [...]
por eso se usaban la [...]             [...]         Mauricio Purto, andinista y médico.
superior a la de los esquimales, se    explicaría    con su prolongada permanencia en estas regiones.

After completing the queries for all registers, categories were classified in the last column of the grid above. Problem cases were submitted to expert peers for classification. Finally, the variables were isolated for the two patterns to be researched:

a) NCV + n PREP + n CC
b) ACV + n PREP + n CC

where n is a preposition or a specific category. This independent classification for scheme analysis allows specific statistical testing, which will be explained with the results.

4. Results
First, the occurrence of both verb groups (NCV and ACV) will be shown in each register, and then the most frequent verbs in the corpus will be identified.
Second, we will show the nominal results in the use of a statistical test (Cramér's V test), to find out whether there is a general association between a preposition and a cognitive category. [...] the comparative behaviour of prepositional schemes of the most [...] communication verbs in each register will be presented.

4.1 Communication verbs
Graph 1 shows the frequency of occurrence of communication verbs in the different registers of the corpus. Calculations (normalized frequencies per N number of words) allow register comparison.
The data show the following [...] all [...] present greater occurrence of NCV than ACV but the difference is practically nonexistent
in ARTICOS and DICIPE, two registers associated with science and technology. These results imply that in science communication and dissemination, communication verbs fulfil both roles, i.e. present information and express attitudes towards propositional contents. In addition, both registers present few communication verbs, a fact that could have two explanations: first, in these registers, other types of verbs are predominant (e.g., copulative) or other grammatical categories are prevalent (nouns, adjectives); or second, DETP and CEO, two registers associated with school activities, present the greatest difference in occurrence between one group of communication verbs and the other.

Graph 5.1 Frequency of communication verbs* by register
*Frequency X 100

The data show that these registers favour content presentation and reference to the activity of communication before attitudinal expression towards propositional contents. It is worth mentioning that both are production registers, i.e. texts written by students in school contexts. Thus, we think that their educational level can account for a greater difference in the use of one verb rather than another, since the expression of attitudes, that is, taking a stance toward the contents of what is being communicated, is part of a more advanced educational level, where the subject has a better command of vocabulary. The subjects who produced these registers are at the [...] called by Bereiter and Scardamalia (1987) 'the expression of [...]

[...] DEEB and NOTICEN TV) presenting a [...] with minimum difference between one type of verb and the other. This implies that these verbs, as textual features, do not [...] to establish a variation between the registers in this group.
[...] the register of Latin American literature texts, can be seen [...] between those with the greatest difference [...] and those where the difference is not [...]. The presence of communication verbs in this register would be associated with the occurrence of expressions referring to interaction among characters.
The ten most frequent communication verbs in the corpus are presented in Table 5.7.

Table 5.7 The ten most frequent communication verbs

      NCV          N     %      ACV          N     %
 1    Hablar       582   32%    Comentar     81    32%
 2    Decir        407   22%    [...]        46    18%
 3    Presentar    209   11%    Definir      33    13%
 4    Contar       103   6%     Insistir     31    12%
 5    Señalar      93    5%     Declarar     18    7%
 6    Expresar     92    5%     Revelar      9     4%
 7    Informar     75    4%     Asegurar     8     3%
 8    Manifestar   74    4%     Argumentar   7     3%
 9    Mencionar    50    3%     Enfatizar    [...] 3%
10    Comunicar    41    2%     Sostener     6     2%

As in Graph 1, Table 5.7 illustrates the predominance of NCV over ACV. It is interesting to note that the most frequent NCV are close to what Bosani (2000) has named Protoverbs. Even though the resulting percentages of both verb types seem similar in this table, they are not comparable, because they have been calculated on the total of each subtype, and comparison can only be done vertically along the table.

4.2 Dependence between prepositions and cognitive categories

To find out whether there is a general association among the examined variables, Cramer's V Test, which is a symmetrical coefficient, was used. This test does not distinguish between independent (cause) and dependent (effect) variables, and it can only reflect the strength and direction of the relation between two variables. This coefficient, just as other similar coefficients, allows us to compare the values obtained in each register, and it usually ranges from 0 to 1 (some range from -1 to +1), 0 being statistical independence and 1 perfect association. For the purposes of this study, the expected value was 0.6. In other words, two variables become associated when one occurs six times with the other out of ten occurrences. It should be pointed out that only the nominal result, not the number, of each test will be shown, since p and α values vary every time the test is applied to each register. To minimize the error introduced by applying a test repeatedly, the expected value α = 0.5 was divided by the number of times (9) the test was applied, and this yields α = 0.055. This value was used in each of the nine tests carried out on each type of verb.
The results shown in Table 5.8 reveal that, in the case of the NCV, there is [...] between the [...] type and the cognitive [...]; this preposition occurs in all the registers studied. The occurrence of
one preposition and one category in these verbs matches with frequent collocations in each one of these registers. These results show that there is a relation between both categories and that they jointly function as a sequence typical of the registers where they occur.

Table 5.8 Cramer's V test results (preposition and cognitive category) for NCV and ACV in each register [table body not recoverable]
* = Dependent
O = Independent

In the ACV, in contrast, there is only one association between the preposition and the cognitive category in two of the registers studied, ARTICOS and CEO. According to the above, science and orality present sequences of typical prepositional schemes, which will be discussed in the following section.

4.3 Prepositional schemes of communication verbs in Spanish

The test shown above does not help determine which prepositions relate to which specific category. For both verb types, the three most frequent prepositional schemes in each register will be shown.

4.3.1 Prepositional schemes in the NCV

Graph 2 shows the NCV schemes, the preposition a being the most frequent in these types of verbs.

Graph 5.2 Schemes with the preposition a

The graph shows that the preposition a preferably combines with three cognitive categories, namely, mode, entity and person. The category 'entity' is recurrent in those registers [...] with [...] communication and the dissemination of science (ARTICOS, CTC and DICIPE). This category, which [...] to abstract nouns, is functional in these registers because [...] for concept transmission. It is worth noting that with this [...] there is [...] behaviour of the category [...] which alternates with [...] of those general registers that oppose the [...] DEEB and NOTICEN TV). The communication [...] NCV tend, in these [...] to whom this [...]
As to the 'mode' category, this presents a heterogeneous behaviour in the different registers, but rare occurrences or their absolute absence is noticeable in the registers CTC, DETP and DICIPE. These registers group together in terms of theme and function, namely, the technical-scientific discourse both in school settings and press dissemination. The data suggest that, combined with the preposition a, these registers rarely express the mode of communication.
Graph 3 shows the schemes with the second most frequent preposition in this type of verb, that is, the preposition en.

Graph 5.3 Schemes with the preposition en (categories plotted: Place, Mode, Entity)

When there is a NCV with the preposition en, schemes with categories of place, mode and entity are mostly formed. The category 'entity' in this case
prevails in the registers CEO and CPP. Though very different, the theme of both registers can explain the prevalence of such a scheme. In the first case (CEO), the theme is the text comprehension process and, in the second case (CPP), poverty eradication state policies are dealt with. This scheme (NCV + en + entity) is, then, functional in those registers whose themes are abstract. The fact that the categories 'place' and 'mode' are typical of CLL is also decisive in the sense that the function of these categories is to indicate the immediate context (deictic markers), a prevailing function in this register. This tendency may be due to the mention of specific parts of devices associated with technique in the case of CTC and to the reference to geographic places in the case of both DETP and DICIPE. As to the expression of the 'mode' in the NCV, it is typical of the register NOTICEN TV, most probably associated with the way people in the news express themselves.
Graph 4 shows the schemes resulting from the combination of a NCV with the third most frequent preposition, namely, the preposition de.

Graph 5.4 Schemes with the preposition de

The data in Graph 4 show that the preposition de preferably combines with the categories 'entity', 'person' and 'event'. First, the homogeneous behaviour of this preposition concerning the category 'entity' should be emphasized. The scheme NCV + de + entity is cross-sectional and is not affected by the differences that derive from the macrocontextual characteristics of the different registers. In this sense, this scheme represents a tendency of the language in its entirety; that is, when a NCV occurs in combination with the preposition de, the tendency is for it to be followed by an entity, independent of the register which is being studied. It is important to emphasize that the occurrence in these schemes of the category 'event' does not characterize any other sequence in the NCV.
The occurrence of this category in combination with the preposition de can be understood as either activation or a syntactic restriction favoured by such a preposition, that is not allowed by other prepositions. The category 'event' presents a relatively homogeneous behaviour in registers of different characteristics (CEO, CPP, CTC, DEEB, DETP, DICIPE, NOTICENTV), and so the explanation of the syntactic restriction is the most pertinent one. The preposition de creates a scheme with the category 'person', mostly in the CLL and in the ARTICOS. In the first case, it refers to the presence of characters, whereas in the second to the presence of authors.

4.3.2 Prepositional schemes in the ACV

Graph 5 shows the ACV schemes with the preposition a, the most frequent one in this type of verb.

Graph 5.5 Schemes with the preposition a

With the ACV, the preposition a largely produces schemes with the categories 'mode', 'entity' and 'person'. The category 'mode' occurs only in three of the nine registers studied and it prevails in the registers CLL and ARTICOS more than in any other register. The most relevant category that combines with this preposition is 'person', with a high occurrence in the CLL, CTC and DICIPE registers. The presence of this category in these registers must be associated with the reference to characters in the case of literature, whereas in the other registers, to the presence of different voices or reference to different authors' views of the topic of the texts. There is a combination of these verbs with the category 'entity' which prevails less than in the other categories, but it is markedly present in the ARTICOS, DETP and DICIPE registers, and has a definitive absence in other registers (CLL, CPP, CTC and NOTICEN TV). The occurrence of this category is associated with the presentation of abstract concepts that occur in the registers mentioned.
In Graph 6, the prepositional schemes with the second most frequently occurring preposition are presented.
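The per-register breakdowns discussed for each preposition can be pictured as a simple tabulation. The following is only a minimal sketch, assuming each corpus hit has already been annotated as a (register, verb type, preposition, cognitive category) tuple; the tuples in HITS and the function name scheme_percentages are hypothetical, and the chapter does not state that its graphs were computed exactly this way.

```python
from collections import Counter

# Hypothetical annotated hits: (register, verb_type, preposition, category).
# These tuples are invented for illustration; they are not corpus data.
HITS = [
    ("CEO", "NCV", "en", "entity"),
    ("CEO", "NCV", "en", "entity"),
    ("CEO", "NCV", "en", "place"),
    ("CLL", "NCV", "en", "place"),
    ("CLL", "NCV", "en", "mode"),
    ("CLL", "ACV", "a", "person"),
]


def scheme_percentages(hits, verb_type, preposition):
    """Return {register: {category: percent}} for one verb type and preposition."""
    counts = {}
    for register, vtype, prep, category in hits:
        # Keep only the scheme family of interest, e.g. NCV + en + category.
        if vtype == verb_type and prep == preposition:
            counts.setdefault(register, Counter())[category] += 1
    return {
        register: {cat: 100 * n / sum(cats.values()) for cat, n in cats.items()}
        for register, cats in counts.items()
    }


if __name__ == "__main__":
    for register, shares in scheme_percentages(HITS, "NCV", "en").items():
        print(register, shares)
```

Percentages are taken within each register, so registers of very different sizes can be compared on the same scale.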
Graph 5.6 Schemes with the preposition con (categories plotted: Mode, Instrument, Person)

Just as Graph 6 shows, the preposition con preferably combines with the categories 'mode', 'instrument' and 'person'. First, the presence of the category 'instrument' with a high occurrence in the register CEO should be noted. Moreover, this category has the same behaviour as that of the categories 'person' and 'mode' in the CPP, CTC, DEEB, DETP, DICIPE and NOTICEN TV registers. In these registers, the categories present a total absence and are perceptible only in the DICIPE register. The category 'instrument' in this register, associated with the dissemination of science in the written press, can be related to the presentation of different authors' views or to the explicitness of the means on which communication is based. Likewise, the category 'person' occurs with a different behaviour in the ARTICOS, CEO and CLL registers due to the occurrence of authors in the case of ARTICOS and of characters in the case of CEO and of CLL, respectively. It is relevant that, for the category 'instrument', a collocational argument can be used in the sense that this preposition conveys in itself (along with the preposition por) an instrumental meaning.
In Graph 7 the prepositional schemes with the preposition de, the third most frequent in the ACV, are shown.

Graph 5.7 Schemes with the preposition de

The preposition de is productive with the categories 'mode', 'entity' and 'event'. The category 'entity' shows a high presence in two registers (DETP and ARTICOS) and zero presence in the remaining registers. As was already mentioned, the presence of the category 'entity' relates to the communication of abstract concepts. The category 'event', on the other hand, shows syntactic compatibility with the preposition studied and mostly prevails in the register CEO, i.e. oral interviews about text comprehension processes. In this register, the presence of events relates to the narration of acts concerning the text task.

Table 5.9 The most frequent schemes in both types of communication verbs

NCV                              ACV
a        en       de             a        con          de
Mode     Place    Entity         Mode     Mode         Mode
Entity   Mode     Person         Entity   Instrument   Entity
Person   Entity   Event          Person   Person       Event

4.4 General comments on prepositional schemes in the communication verbs

The occurrence of schemes independent of the registers in both types of verbs is commented on, with the purpose of analysing which are the most frequent prepositions and the categories that preferably combine with these prepositions.
As Table 5.9 shows, communication verbs share two of the three most frequent prepositions, in particular the prepositions a and de. In these cases, they further share the same categories. That is to say, independent of the type of verb, there is dependence between a preposition and a cognitive category: if the preposition a occurs, the categories that will be activated are 'mode', 'entity' and 'person'. The above occurs regardless of the specific function performed by both subtypes of communication verbs. It should be pointed out that, independently of the preposition and the type of verb, the most productive categories in these schemes are 'mode', 'entity', 'person', 'place', 'instrument' and 'event'.
Both Table 5.9 and the graphs previously presented are based on the highest frequencies, not on the total occurrences. Considering this, in the NCV the preposition en is more relevant to this type of verb. The same occurs with the preposition con, which is unique to the [...] high occurrences in
between types of verbs and the

be determined with the cat-
tion of
the roanner
at the same level or with sorne features.
Conclusions

In this [...] schemes of communication verbs in [...] the notion of the prepositional scheme in relation to PCs. Then we reviewed various classifications for those verbs in order to specify the two subtypes on which our research was focused: neutral communication verbs and attitudinal communication verbs.
Among the conclusions to this work are the following: first, in all registers examined the occurrence of neutral communication verbs was higher than that of attitudinal communication verbs. This implies that, in language, the presentation of information is more important than attitudes with relation to its contents, especially in registers associated with school activities. Likewise, it can be said that in scientific discourse there is a relatively equal occurrence of the two functions expressed by these verbs.
Second, it is important to stress that, even though from a grammatical point of view a preposition has the same possibility of co-occurrence with a given verb and cognitive category, in usage there is a systematic dependence between communication verbs, the preposition and the cognitive category that make up the scheme. Such is the case for both types of verbs under examination. This occurs throughout all registers and with all verbs, since all registers systematically present the same prepositions combined with the same categories.
Third, a syntactic dependence between a preposition and a cognitive category, favouring the formation of certain schemes, can be established. Such is the case of the relation existing between the preposition de and the category 'event'. In this scheme, the preposition de is combined, from a grammatical point of view, with a nominal clause which in cognitive terms corresponds to an event. It can be concluded, therefore, that the recurrence of some schemes is associated with a syntactic relation. The same could be said, from a semantic perspective, about the occurrence of the preposition con and the cognitive category 'instrument', because they are semantically close, i.e. the meaning of the preposition includes a sort of 'instrument' feature.
Fourth, while some prepositional schemes behave similarly in different registers, others mark a clear variation, thus becoming clear sequences for register distinction. An example of this is the case of the NCV + de + Entity scheme, which behaves in exactly the same manner in all registers and can be considered therefore as a constant in the language and a poor indicator of register variation.
[...] this research can be made more [...] the inclusion [...] not presented in the schemes, for example, prepositional schemes with cognition, movement or perception verbs, with the aim of determining whether the same schemes are present with verbs of a different type or, in other words, if these schemes are common to verbs of a different nature.
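The dependence between prepositions and cognitive categories summarized here rests on the association measure described in Section 4.2. The following is a minimal sketch of Cramer's V computed from a contingency table, assuming the usual chi-square-based definition; the function name cramers_v and the counts in TABLE are hypothetical, and nothing here is taken from the study's own data or software.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts.

    Assumes every row and column has at least one observation.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows are prepositions (a, en, de),
# columns are cognitive categories (mode, entity, person).
TABLE = [
    [30, 5, 5],
    [5, 30, 5],
    [5, 5, 30],
]

THRESHOLD = 0.6             # association threshold quoted in Section 4.2
ADJUSTED_ALPHA = 0.5 / 9    # figures as quoted in Section 4.2 (reported there as 0.055)

if __name__ == "__main__":
    v = cramers_v(TABLE)
    print("V =", round(v, 3), "associated" if v >= THRESHOLD else "independent")
```

Because the coefficient is symmetrical, swapping rows and columns leaves the value unchanged, which matches the chapter's remark that the test does not distinguish independent from dependent variables.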
APPENDIX: CORPUS

ID                    ARTICOS
Name                  Scielo Scientific Research Articles
Mode                  Written
Register              Scientific
Brief Description     This corpus is composed of scientific research articles collected in the digital index Scielo (Scientific Electronic Library Online). The articles come from three different science domains: biology, exact sciences and social sciences. Each article includes a title, an abstract, keywords and the body of the text. Acknowledgements and references have been omitted.
Collection Year       2005
Number of Documents   642
Number of Words       2,471,389

ID                    CTC
Name                  Technical-scientific Corpus
Mode                  Written
Register              Technical-scientific
Brief Description     This corpus is composed of specialized texts of obligatory and complementary reference in three technical-professional areas (maritime, commercial and industrial) used in the last year of the differentiated school system in Chile. The corpus was collected in schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Number of Documents   74

ID                    DETP
Name                  Technical-professional School Discourse
Brief Description     Summaries of students' responses in [...] test. The students were in their last year of [...] school. The corpus was collected in schools of the V [...] in the frame of the [...]
Collection Year       [...]
Number of Documents   27
Number of Words       40,449

ID                    DEEB
Name                  Primary School Discourse
Mode                  Written
Register              Educational discourse
Brief Description     Primary school students' discourse elicited as a part of a writing task. The students belonged to different subsidized and municipalized schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Collection Year       2004
Number of Documents   49
Number of Words       139,250

ID                    DICIPE
Name                  Scientific Dissemination in Written Press
Mode                  Written
Register              Journalistic
Brief Description     Texts disseminating science and technology in five Chilean newspapers
Collection Year       2004
Number of Documents   412
Number of Words       204,598
ID                    NOTICENTV
Name                  Central TV News Programmes
Mode                  Oral
Brief Description     One month's [...] of four open national TV [...] programmes
Collection Year       2000
Number of Documents   270
Number of Words       84,809

ID                    CPP
Name                  Public Policies Corpus
Mode                  Written
Register              Socio-political
Brief Description     Texts of public policies to overcome poverty, collected from different institutes, institutions and think-tanks from a spectrum of political tendencies - the left, the centre and the right wing
Number of Documents   20

ID                    CLL
Name                  Latin American Literature Corpus
Mode                  Written
Register              Literary-fictional
Brief Description     This corpus is comprised of texts of Latin American literature, which are obligatory reference in three technical-professional areas (maritime, commercial and industrial) during the last year of the differentiated school system in Chile. The corpus was collected in schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Number of Documents   12

ID                    CEO
Name                  Oral Interviews
Mode                  Oral
Register              Oral interviews
Brief Description     This corpus is [...] of oral semi-directed interview transcriptions with final year students of technical-professional and human-scientific school systems. The interview topic was the use of comprehension strategies in students. The corpus was collected in schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Number of Documents   4
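Since the registers described in this appendix differ greatly in size, raw counts are only comparable once they are normalized, as Section 4.1 notes. The following is a minimal sketch of that normalization, assuming a per-100-words scale; the word totals are those of Table 5.6, while the function name normalized_frequency and the raw counts in the usage lines are hypothetical.

```python
# Register sizes in number of words (NW), as listed in Table 5.6.
REGISTER_WORDS = {
    "ARTICOS": 2_471_389,
    "CTC": 774_622,
    "DETP": 40_449,
    "DEEB": 139_250,
    "DICIPE": 204_598,
    "NOTICENTV": 84_809,
    "CPP": 234_818,
    "CLL": 513_359,
    "CEO": 410_981,
}


def normalized_frequency(raw_count, register, per=100):
    """Occurrences per `per` words of the given register."""
    return raw_count / REGISTER_WORDS[register] * per


if __name__ == "__main__":
    # The sum of the register sizes reproduces the TOTAL row of Table 5.6.
    print(sum(REGISTER_WORDS.values()))
    # Hypothetical raw counts: the same count weighs far more in a small register.
    for register in ("ARTICOS", "DETP"):
        print(register, normalized_frequency(500, register))
```

With the same hypothetical count of 500, the small DETP register yields a much higher normalized value than ARTICOS, which is why raw counts alone would mislead a register comparison.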
FUTURE TENSE EXPRESSIONS IN SPANISH CORPORA

Mercedes Sedano
Universidad Central de Venezuela
Venezuela

Introduction 1

Two basic verbal expressions are used in present-day Spanish to refer to future events: the morphological future (MF: cantaré 'I will sing') and the periphrastic future (PF: voy a cantar 'I am going to sing'). 2 Since both forms alternate in spoken and written Spanish, it seems necessary to find and explain the reasons for using one or the other future form.
After an introduction on the origin and evolution of future tense forms and a review of the existing studies on the topic, in spoken and written Spanish of different Spanish-speaking areas, I will present a study comparing the results of two separate analyses: one on a Caracas spoken Spanish corpus (Sedano 1994) and the other on a Venezuelan Spanish written corpus (Sedano in press).
In the present study I focus on the alternation between MF and PF, when these forms refer to future events. Expressions such as Ahora serán las 11 'Now it must be 11' were excluded, as they indicate conjecture or doubt about the present situation, as well as Qué voy a saber yo 'How should I know', which refers to the present ('I do not know') and not to the future, even though the latter expression is constructed with ir a + infinitive 'going to + infinitive'.

1. Origin and evolution of future tense forms

The morphological future in -ré has its origin in Latin, specifically in periphrases of the infinitive verbal form with the auxiliary habere [...] habeo, 'he de [...]', which denoted obligation ('I must sing'). In the second century AD the use of this periphrasis began to refer to future events, thus starting to compete with the simple form (cantabo) used in classic Latin to denote [...] 3 In medieval Spanish the [...] hauer [...] fused with the main verb, [...] form [...] voy a cantar 'I am going to sing'.
[...] from which periphrastic forms expressing [...] are [...] few and consistent cross-linguistically. The most frequent sources are movement verb constructions, with ten futures [...] sources in constructions with 'come' and similar verbs and ten in constructions with 'go'. Hopper and Traugott (1993: 79) note that 'temporal terms can be derived metaphorically from the spatial term (via SPACE > TIME)'. The metaphorical use of ir a (or its equivalent) justifies the periphrases formed with this auxiliary verb in several Romance languages, as well as in many others, among them English (he is going to leave early).
In her detailed study on the future tense forms in Romance languages, Fleischman (1982) considers that the verb ir 'to go' - or its equivalent in the languages derived from Latin - originally denoting movement towards a place, changed into a periphrastic auxiliary (ir a + infinitive) to express movement towards an objective. The author adds that the periphrasis ir a + infinitive may have been used in the colloquial language of the fifteenth century to refer to future events, but did not appear in written texts until the sixteenth. Since then, the periphrastic form voy a cantar 'I am going to sing' has alternated with the morphological future cantaré 'I will/shall sing'. The prevalence of one of these two forms depends, to a considerable extent, on the period of history, style and discourse mode (oral or written).
How could the lack of stability of future forms in all languages be explained? In precise terms, Fleischman (1982: 23) argues that:

    it has often been observed that a quasi-universal characteristic of future forms is their propensity towards semantic change. They evolve typically out of modals or aspectuals which at some point take a temporal value. At a second stage, once a form has come to function as a future, it will then acquire modal colorations, which, if sufficiently pronounced, may eventually supersede the temporal value of the form ...

On the evolution of the synthetic future in -ré and the periphrasis ir a + infinitive, the analysis conducted by Sáez (1968) on plays written in Spanish from the sixteenth through to the twentieth century by Cervantes, Lope de Vega, Bretón de los Herreros, García Lorca and three other contemporary writers is of particular relevance. 6 The results of the study show that MF prevails over PF in all cases, even though the use of MF decreases over time. In spite of the fact that Sáez does not find any case of PF in Cervantes' works, he identifies a gradual increase in its use in later works. There is then an inverted frequency of use of the two future forms, that is, an increase of PF and a decrease of MF throughout the last four centuries.
[...] there are [...] Ferrer [...] Sánchez [...] Almeida and Díaz [...] as well as various [...] 1968; Grimes 1968; Hunnius 1968; Soll 1968; Bauhr 1989; Blas [...] collected [...] the twentieth century. The [...] of the corpora from the [...], methodological [...] in number of data and dialectal (different varieties) points of view might be considered a shortcoming of this study. However, by the same token, this heterogeneity makes it possible to present a general descriptive overview, which, in turn, is the basis for the explanations set forth in Section 4 of this chapter.
The results obtained in the studies on spoken Spanish corpora are presented in Table 6.1, by countries or cities of origin of the corpora, 7 followed [...]

Table 6.1 Morphological and periphrastic future occurrences in spoken Spanish

                                                        Morphological future   Periphrastic future   Total
                                                        Tokens   %             Tokens   %
Dominican Republic (Silva-Corvalán and Terrell 1989)8   0        0             16       100          16
Chile (Silva-Corvalán and Terrell 1989)                 1        2             64       98           65
Puerto Rico (Silva-Corvalán and Terrell 1989)           10       11            79       89           89
Caracas and Maracaibo (Sedano 1994)                     101      12            710      88           811
Venezuela (Silva-Corvalán and Terrell 1989)             2        12.5          14       87.5         16
Rosario, Argentina (Ferrer and Sánchez 1991)            34       20            137      80           171
Caracas (Iuliano 1976)                                  146      23            481      77           627
Ciudad de México (Moreno de Alba 1970)                  374      31            824      69           1198
Las Palmas de Gran Canaria (Troya 1998)                 164      38            266      62           430
Madrid (Gómez 1988)                                     422      43            561      57           983
Madrid (Cartagena 1995-96)9                             60       47            69       53           129
Las Palmas de Gran Canaria (Almeida and Díaz 1998)      656      71            262      29           918
Las Palmas de Gran Canaria (Díaz 1997)                  660      72            261      28           921
Total                                                   2630     41            3744     59           6374

[...] variation. In [...] PF is more frequent (ranging from 100 per cent in Santo Domingo to 69 per cent in Mexico City) than in Madrid (53-57 per cent) and Las Palmas de Gran Canaria (28-62 per cent). 10
The data corresponding to written Spanish, also ordered in percentages, are presented in Table 6.2, where the first and second columns include the researchers' names and the authors and titles of the analysed works, respectively. The written materials used for these analyses are works of different genres: literary works, plays and folk-tales.
The results show, overall, that MF is more frequent (72 per cent) than PF (28 per cent), with a considerable variation ranging from 86 per cent of MF in Rulfo's novel to 42 per cent in Espinosa's tales. The frequencies found seem to depend on the works and characters. Indeed, more educated characters generally use MF in formal situations, whereas less educated people, especially in informal situations, tend to prefer PF.

Table 6.2 Morphological and periphrastic future occurrences in written Spanish

                                                             Morphological future   Periphrastic future   Total
                                                             Tokens   %             Tokens   %
Grimes (1968)    J. Rulfo (Pedro Páramo)                     155      86            26       14           181
Ávila (1968)11   R. Usigli (El gesticulador)                 81       84            15       16           96
Blas (2000)      Buero Vallejo (Three plays)                 351      78            99       22           450
Bauhr (1989)     Fifty plays (1959-1973)                     2472     75            812      25           3284
Blas (2000)      Alonso de Santos (Four plays)               485      63            188      37           673
Soll (1968)      A. Espinosa Junior (Cuentos populares ...)  268      61            170      39           438
Ávila (1968)     L. G. Basurto (Cada quien su vida)          31       48            34       52           65
Hunnius (1968)   A. Espinosa Junior (Cuentos populares ...)  39       42            53       58           92
Total                                                        3882     74            1397     26           5279

Another determining factor is the author of the work; for example, a high frequency of MF is found in Rulfo (see Grimes 1968), in spite of the fact that Pedro Páramo is about folk characters. The significance of the author in determining the use of one future tense expression over the other has been pointed out by several authors, such as Cartagena (1981 and 1995-96) and Blas (2000). Despite the above-mentioned variety in terms of the type of material analysed, the overall results on written Spanish show that the use of MF is much more frequent than that of PF: 74 per cent vs 26 per cent, respectively. Based on the data presented in Tables 6.1 and 6.2, it can be
WORKING SPANISH FUTURE TENSE EXPRESSIONS IN SPANISH ORPORA
concluded [...] and [...]

[...] I took into account the [...] of 120 speakers, 80 from Caracas and 40 from Maracaibo; the interviews of half an hour [...], all in semiformal [...], were recorded from 1986 to [...]; the same criteria about speakers' social characteristics (age, socioeconomic status and sex) were used.13 The future forms were analysed only from a linguistic point of view, as a previous study (Sedano 1994) had shown that, in Venezuela, the city of origin and other extralinguistic factors had no influence on the use of future forms.
Sedano (in press) analyses a selection of articles published in January 2003 in El Universal, a nationwide Venezuelan newspaper based in Caracas. The texts were taken from four different sections, namely (i) Sports, (ii) Economy, (iii) Domestic news and politics and (iv) Opinion. The results showing the comparison of the two corpora are presented in Table 6.3.
The percentages in Table 6.3 show two clearly opposing tendencies: MF being used scarcely in oral Spanish (12 per cent) and being practically the rule in written Spanish (93 per cent). How could such striking quantitative differences be explained? One first obvious explanation would be that MF started to be used long before PF, so it is not surprising that MF has continued to be more frequent in written language, which has been commonly associated with traditional forms and is the only one attested through the history of the Spanish language. However, it would be interesting to find out whether or not spoken and written discourse modes are determined by the same linguistic factors. In order to show to what extent quantitative data reveal the meanings associated with the alternation of the two future tense expressions in current Venezuelan Spanish, I will analyse the data according to three linguistic factors: temporal distance, person of the verb in future tense and two markers of epistemic modality.

Table 6.3 Distribution of morphological and periphrastic future in the two corpora
                                    Morphological future   Periphrastic future   Total
                                    Tokens    %            Tokens    %
Spoken corpus (Sedano 1994)         101       12           710       88          811
Written corpus (Sedano in press)    598       93           44        7           642
Total                               699       48           754       52          1453

[...] distance [...] tance,14 agreed on the tem[...] distance between the future action and the moment of enunciation [...] occur more frequently with MF, whereas time adverbs [...] immediate [...] are more used with PF.
In the analyses (Sedano 1994 and in press) of temporal distance I have taken into account only clauses where MF or PF are accompanied by time adverbs or adverbials. Depending on the adverbial expression, there are different types of temporal distance: (i) immediate futurity: ahoritica 'right now', en seguida 'immediately', en un momento 'in a moment'; (ii) relatively near futurity: esta tarde 'this afternoon', mañana 'tomorrow', el viernes próximo 'next Friday', el mes que viene 'next month'; and (iii) vague, distant or very remote futurity: tarde o temprano 'sooner or later', cuando eso ocurra 'when that happens', algún día 'some day', among others. These three possibilities are illustrated below in examples (1-3), where part a shows the use of MF, and part b the use of PF:

(1) /immediate futurity/
a. There are no tokens of MF in either corpus.
b. Voy a presentar a continuación unas pocas ideas básicas. (Sedano in press)
'I am going to introduce in what follows a few basic ideas.'
(2) /relatively near futurity/
a. de manera que las actividades comenzarán mañana (Sedano in press)
'so that activities will begin tomorrow'
b. la semana que viene se van a empezar a dar algunos anuncios (Sedano in press)
'next week some announcements are going to be made'
(3) /vague, distant or very remote futurity/
a. Antes o después se extinguirá envuelto en el desprecio. (Sedano in press)
'Sooner or later he will die surrounded by contempt.'
b. Yo sé que algún día me voy a casar; voy a tener hijos. (Sedano 1994)
'I know that some day I am going to get married, I am going to have children.'

Tables 6.4 and 6.5 show the results about the temporal distance between the future event that is being referred to and the moment of enunciation. The results presented in Table 6.4 correspond to a corpus of spoken language (Sedano 1994), whereas those of Table 6.5 are based on the study of a corpus of written language (Sedano in press).
Tables 6.4 and 6.5 indicate (i) in the case of immediate futurity, unequivocal preference (100 per cent) is given to PF in both corpora; (ii) as for relatively near futurity, PF is more frequent in oral language (86 per cent)
and MF in written language (95 per cent); (iii) in the case of vague, distant or remote futurity, there is a categorical use of MF (100 per cent) in the written corpus, whereas PF is more frequent (64 per cent) than MF (36 per cent) in the spoken corpus. The results show that, despite the general preference for MF in written Spanish and for PF in spoken Spanish, in both cases there is a strong correlation between temporal distance and the selected future form.

Table 6.4 Temporal distance in the spoken Spanish corpus (Sedano 1994)
                                          Morphological future   Periphrastic future   Total
                                          Tokens    %            Tokens    %
Immediate futurity                        0         -            19        100         19
Relatively near futurity                  6         14           38        86          44
Vague, distant or very remote futurity    20        36           35        64          55
Total                                     26        22           92        78          118

Table 6.5 Temporal distance in the written Spanish corpus (Sedano in press)
                                          Morphological future   Periphrastic future   Total
                                          Tokens    %            Tokens    %
Immediate futurity                        0         -            2         100         2
Relatively near futurity                  82        95           4         5           86
Vague, distant or very remote futurity    23        100          0         -           23
Total                                     105       95           6         5           111

3.2 Person of the verb in future tense
The results of the comparison related to the person (first, second or third person singular or plural) of the future tense verb in both corpora are shown in Table 6.6 and Table 6.7.15
Table 6.7 shows that no 2nd person verbs occur in the written corpus. If we compare the results of Tables 6.6 and 6.7, the most interesting finding relates to the use of 1st person verbs: the frequency of PF is high in spoken (91 per cent) as well as in written Spanish (75 per cent). In my opinion, the correlation between 1st person verbs and PF, in both corpora, is due to the fact that [...] have more certainty about their own intentions than about someone else's. In other words, the 1st person in expressions of future seems to be associated with intention, a modality that reflects linguistically the intention of carrying out the action. As Bauhr says, '[...] occurs with 1st person verbs'. The correlation between the use of 1st person verbs and PF is ratified by the results obtained by Bauhr (1989: 96) and Troya (1998: 82-136).

Table 6.6 Verb person (Sedano 1994)
              Morphological future   Periphrastic future   Total
              Tokens    %            Tokens    %
1st person    22        9            224       91          246
2nd person    16        13           106       87          122
3rd person    63        14           380       86          443
Total         101       12           710       88          811

Table 6.7 Verb person in written Spanish (Sedano in press)
              Morphological future   Periphrastic future   Total
              Tokens    %            Tokens    %
1st person    3         25           9         75          12
2nd person    0         -            0         -           -
3rd person    595       94           35        6           630
Total         598       93           44        7           642

As Tables 6.6 and 6.7 show, in the 2nd and 3rd person the tendencies are marked by discourse mode: more frequent use of MF in written and more PF in spoken language. This can be explained by the fact that 2nd and 3rd person verbs report someone else's intention, not the speaker's.
Examples (4) and (5) illustrate the difference between showing an intention (which may be linked to the use of the 1st person) and not showing it (which may be linked to the use of other grammatical persons):

(4) /Use of 1st person verbs/
a. Y veo a la ... la vaca por allá bien lejos, y yo: '[...] aprovechar para agarrar al becerrito'. (Sedano 1994)
'And I see the ... the cow over there, quite far away, and I say: "I'm going to take the opportunity to grab the little calf".'
b. '[...] quédate [...].' 'No ... no, yo no me voy a [...] aquí.' (Sedano 1994)
'"Hugo, stay here". "No ... no, I'm not going to stay here".'
c. 'El barco no está en condiciones de cargar y no voy a correr ese riesgo.' (Sedano in press)
'"The ship is not ready to be loaded and I am not going to take that risk".'
(5) /Use of 3rd person verbs/
a. Los supermercados afiliados a ANSA mantendrán cerradas sus puertas durante el día de [...] sólo aquellos establecimientos ubicados en las zonas [...] brindarán atención [...] (Sedano in
'[...] affiliated to ANSA [...] while [...] establishments [...] the [...]'
b. [...] salida violenta [...] es más [...] y con [...] violenta todo ese grupo [...] escenario (Sedano in press)
'if Chávez refuses to [...] a violent [...] is more [...] and with a violent [...] that whole group will be thrown out [...] arena'.
c. [...] de resolver la crisis [...] nos conduce a un escenario [...] que nos llevará por la senda de la [...] (Sedano in press)
'the inability to solve the political crisis leads us to a terminal scenario [...] that will lead us to hyperinflation'

All examples in (4) and (5) refer to future events: those with 1st person verbs (4a-c) show the speaker/writer's intention to do or not do something, whereas in those with 3rd person verbs (5a-c) the speaker/writer is either just reporting what others said (5a), or expressing his/her opinion about an event that does not depend on his/her intentions; in fact, in (5b) the event depends on a condition (if Chávez ...) and in (5c) it is based on a prediction that, in the end, can be refuted by future events.
Although no absolute generalizations can be made about the fact that all future expressions in the 1st person show an intention and that the speaker/writer's intention is linked to his/her confidence in the occurrence of the future event, to a certain extent, the correlation undoubtedly exists, as suggested by the percentages in Tables 6.6 and 6.7. What else could explain the preference for PF when the verb is in the 1st person?
Confidence in the realization of a future event is definitely not the same as absolute certainty, as these are two different modalities. However, confidence and absolute certainty do have something in common, namely, that in both modalities the speaker/writer shows an assertive attitude far from doubt or uncertainty.

3.3 Epistemic modality markers
Epistemic modality has to do with the speaker/writer's attitude towards the truth value of a certain proposition. Even though there are various ways of expressing this type of modality, I will focus on two markers, considered in Sedano (1994) and (in press): (i) subordination of the future expression to verbs denoting certainty in affirmative form (saber 'to know', estar seguro 'to be certain'); (ii) use of the future expression in interrogative sentences denoting uncertainty, i.e. in direct or indirect questions that 'have not been formulated in order to get an answer from the addressee, but to express the [...]' ([...] 1994: [...]). These epistemic modal[...] the sense that one of them denotes [...] (6 and 7):

(6) [...]
a. Sin embargo acá en Venezuela ya se sabe que esos [...]cos del mandatario serán los [...] al final, terminarán conspirando contra la buena voluntad de [...] el [...] (Sedano in press)
'However, here in Venezuela it is known that the head of state's [...] love for the [...] of [...] will be what, in the end, will end up [...] the [...] of the nations that make up the [...]'
b. yo sé que no va a estar aquí a las cinco. (Sedano 1994)
'I know he is not going to be here at five.'
c. 'estamos seguros de que no va a suceder nada porque ya se han tomado todas las medidas preventivas.' (Sedano in press)
'"we are certain that nothing is going to happen, because all preventive measures have already been taken".'
(7) /future expressions in interrogative sentences denoting uncertainty/
a. Inmediatamente le asalta de nuevo la duda. ¿Qué pasará cuando pasen 10 años, y tú tengas 80 [años] y ella 30? (Sedano in press)
'He was immediately seized by a doubt. What will happen in ten years, when you are 80 [years old] and she is 30?'
b. Yo me pongo a pensar en las elecciones que vienen ahorita ¿Por quién voy a votar? (Sedano 1994)
'I start thinking about the forthcoming elections. Who am I going to vote for?'

Tables 6.8 and 6.9 present the results of the epistemic modality markers.
Tables 6.8 and 6.9 show that the quantity of epistemic modality tokens is small. However, we can observe that in both tables the tendency is the same: the frequency of MF increases when the future expression depends on interrogative sentences denoting uncertainty, while the opposite occurs with PF. These data may show a correlation between the speaker/writer's certainty about the future event taking place associated with PF and uncertainty associated with MF.

Table 6.8 Epistemic modality markers in spoken Spanish (Sedano 1994)
Table [...] about [...] future [...]
                                           Tokens    %      Tokens    %      Total
With verbs [...] saber; estar seguro       1         50     1         50     2
In [...]                                   7         100    0         -      7
Total                                      8         89     1         11     9

4. Discussion and conclusions
The comparison of the results, provided by all studies reported here on the use of the two future forms, indicates that PF is the most frequent in spoken Spanish, whereas MF is preferred in written Spanish. The preference for one or the other future form varies not only according to the language mode (spoken or written), but also to dialectal factors. In Latin American spoken Spanish, the frequency of PF is higher in some countries (e.g. the Dominican Republic) than in others (e.g. Mexico). Overall, the frequency of PF is also greater in Latin America than in Madrid or in the Canary Islands. In the case of written Spanish, within the context of the literary corpus analysed, it has been reported that the writer's personal style and the sociocultural level of the characters have some influence on MF or PF selection.
The two analyses carried out by Sedano (1994 and in press) widely confirm the tendency to use MF in written and PF in spoken Spanish, which allows me to affirm that discursive modality (spoken or written) is the most important factor for the preference of one or the other future form. In addition, the results of the analysis according to the three linguistic factors - temporal distance, verb person and epistemic modality markers - lead to the conclusion that the speaker/writer's confidence or lack of confidence in the realization of the future event being enunciated may be of a psycholinguistic nature. This can be explained as follows:

Temporal distance. In the comparison carried out in the present study, two tendencies can be observed: the frequency of use of MF increases as the [...] distance goes from immediate to remote future. Inversely, the [...] of use of PF increases as the temporal distance changes from remote to immediate future. Since it is easy to suppose that a future action is [...] as more feasible when it is closer to the moment of enunciation, the relationship between temporal distance, speaker's attitude and future form selection can be represented as in (8):
(8) [...] that the future event will [...]

[...] nation for these [...] intention to carry out a future action, is confident that [...] do it. The association between 1st person verb, intention [...] speaker's confidence that the event will take [...] could be [...] as in (9):
(9) 1st person > intention > confidence that the future event will take place

Epistemic modality marker. The markers analysed in both corpora are (i) the subordination of the future clause to a certainty verb in affirmative form; and (ii) the use of future expressions in uncertainty interrogatives. The results indicate that while the frequency of PF is higher when the future form is subordinated to a certainty verb, the opposite occurs with MF. The results of the analysis reinforce once more the relationship between certainty markers and the use of PF, and vice-versa, uncertainty markers and the use of MF.

The results of the comparative analysis of the three analysed factors point, then, to the relative association, on the one hand, between PF and the speaker/writer's confidence in the future event's occurrence, and, on the other hand, between MF and lack of confidence. These conclusions confirm and reaffirm those by other authors such as Fleischman (1982); Bauhr (1989); Silva-Corvalán and Terrell (1989); Troya (1998); and Blas (2000), among others.
The fact that some contexts favour one or the other future form does not guarantee that all speakers/writers will use one of the two forms according to the tendencies found in the analyses. What can be foreseen, however, is that, if spoken and written samples are collected from a certain number of speakers/writers, the frequency in percentage terms of the use of either MF or PF will most probably be particularly high in the contexts which we have signalled as favouring that particular future tense form.

Notes
1 This chapter is a modified version of another study presented by Sedano (2005). I would like to express my gratitude to Paola Bentivoglio and Rebecca Beke for their careful and valuable suggestions and comments.
2 A third way to refer to future actions is the present tense (Nos vamos mañana a Madrid 'We go to Madrid tomorrow', with the meaning of 'Tomorrow we will go to Madrid'). This form will not be considered in the present study.
[...] the verb tornarse 'become'.
[...] for Fleischman [...] (1991); and Company and Medina (1999).
6 The last one of the three plays was published in 1963.
7 In Table 6.1 there are references to some countries cited by Silva-Corvalán and Terrell 1989. Based on personal communication with Silva-Corvalán, however, I can affirm that the majority of speakers recorded for the corpus are from the capital cities: Santo Domingo in the Dominican Republic, Santiago in Chile and San Juan in Puerto Rico.
8 The absence of MF tokens denoting futurity in the Dominican Republic recorded in this study is probably due to the small size of the corpus, which does not mean that this verbal form is not being used in Dominican Spanish.
9 I am referring to Cartagena's study on 'Madrid's educated oral speech' (1995-96: 88-9).
10 The use of the future forms in this city should be analysed further, as the high occurrence of MF recorded by Díaz (1997) and Almeida and Díaz (1998) - 72 per cent and 71 per cent, respectively - contrasts with the relatively low occurrence of this future form found by Troya (1998), namely 38 per cent.
11 Ávila (1968) analyses two contemporary Mexican plays. In El gesticulador, the main characters are educated, whereas in Cada quien su vida most characters are uneducated. This probably explains the difference between the percentages of the morphological future in the two plays - 84 per cent in El gesticulador vs 48 per cent in Cada quien su vida.
12 For further information on the Caracas and Maracaibo corpora see [...] and Sedano and [...] 40), respectively.
13 Regarding age, the [...] groups (GG) of the [...] the oldest were considered out of four groups ([...] 14-30 years and GG4 [...] 60 years of age). As for the socioeconomic status, five levels [...] middle-high, middle, middle-low, [...] were considered.
14 The classifications used [...] these authors are different.
15 [...] corpus [...] to this same factor for the present [...]

Giovanni Parodi
Aída Gramajo
Pontificia Universidad Católica de Valparaíso, Chile

Introduction
Genre theory in the past few years has contributed immensely to our understanding of the way discourse is constructed and used in academic and professional settings. Language use has come to be recognized as a crucial and [...] aspect of interdisciplinary research, with status equal to the study of other disciplines. However, advances and developments at the level of empirical analyses describing the (co-)occurrences, functions and distributions of linguistic features in natural discourse within institutionalized educational and professional contexts have only just begun to be made. In particular, many questions regarding patterns of language use that are typical of technical-professional registers remain unanswered in the field of discourse analysis, especially those regarding specialized written registers in the Spanish language.
There is a growing consensus among specialists developing linguistic competence tests that the language use of the discourse communities being targeted must be taken into consideration. This means that, if the linguistic performance of technical-professional high school students is to be assessed, text types used by the educational system, as well as the linguistic features typical of the registers found in that system, should first be researched and described systematically (SIMCE 2005a, 2005b).
One way to approach specialized written registers is to start from the assumption that technical-scientific texts used by students in their daily school reading practice reflect both the text types and the features typical of varieties of specialized language. In addition, a corpus-based approach is [...] to investigate the variation of text types and the characteristics of [...] types. One of the many [...] of corpus-based analysis is that it is based on an adequate representation of [...] occurring discourse, including analysis of complete texts and of multiple texts from any given [...] 1999, Conrad
[...] from ESP (English for Specific [...] Mercer [...], as well as the [...] between the so-called armchair linguistics and corpus linguistics (Fillmore 1992; Chafe 1992; Stubbs 1996, 2006; Tognini-Bonelli 2001; Lazaraton 2002; Swales 2002; Hunston and Thompson 2006; Parodi 2005a, 2007).
In this chapter, we are interested in classifying and describing the text types of three corpora of specialized written technical-professional discourse based on functional, communicative and textual criteria. The corpora have been collected from three subject domains during the last year of study of secondary technical-professional education in the city of Valparaíso, Chile: the maritime sector (port operations), the metal mechanics sector (industrial mechanics) and the commerce sector (accounting) (Parodi 2004; Parodi and Venegas 2004; Parodi 2005a). In the first section of this chapter, we will focus on a brief review of some concepts and criteria which will serve as a framework for the study; in the second, some claims and methodological steps are given; and in the third, we outline the qualitative and quantitative results yielded by the technical-professional text classification. Concluding remarks provide some comments and implications.

1. Theoretical framework

1.1 Specialized discourse
Throughout the chapter, we will use the term 'specialized discourse'. This term illustrates the type of discourse on which the study is focused: texts on science and technology in their didactic dissemination function at secondary technical-professional institutions (Martín et al. 1987; Gläser 1993; [...] and Martin 1993; Rose [...] Christie 1998; Unsworth 2000; Goldman and Bisanz 2002). In [...] the term 'specialized' itself reveals a gradient or continuum, an essential axis with which to approach texts of various degrees of specialization, many of which possibly circulate on the border of the transition towards dissemination or popularization (Parodi 2005a).
Many authors have noted that [...] discourse is shaped [...] a group of texts centred on prototypical topics within a specific area of knowledge, such as science and technology. These texts show a series of distinctive features that reveal [...] to the rhetorical and

In this context, Schröder [...] proposes a definition of [...]. Schröder [...] varieties of languages:

[...] are not defined as the opposite of common [...] for special purposes are sublanguages belonging to a certain field of subject-oriented communications; they use the linguistic and other communicative means of a certain language and culture system in a specific way and with a specific frequency of occurrence depending on the content, the purpose and the whole communication situation of a text or discourse.

This fragment emphasizes the role of the occurrence of features of a varied nature in defining this type of discourse: the linguistic, pragmatic and extratextual aspects deserve serious consideration. Schröder's preference for the term special or specialized leads him to suggest that the term languages for specific purposes should not be used, but specialized communication, because it involves the aspects mentioned above and is therefore a more encompassing concept. We agree and would further argue that specialized discourses possess grammatical and textual features that, along with other non-linguistic factors, constitute a texture of useful criteria to describe text types. Taken individually or in small numbers, these features, whether linguistic or extralinguistic, fail to fully account for the text object.
With the aim of defining even further the concept of specialized discourse from a strictly linguistic perspective, it is useful to introduce the notion of syndrome, developed by Halliday. According to Halliday (1993), a certain register can be identified through the co-occurrence of a set of linguistic features. Syndromes are patterns of co-occurrence from features in one of various linguistic levels (of expression or of content) (Halliday 2006). These syndromes characterize a variety of language and help us recognize a given register as such (for example, a dialectal variety or a technical-scientific variety). The notion of dimension, proposed by Biber (1988, 1994, 1996, 2003, 2005) as part of his multi-featured studies, is also a very productive text descriptor, as it confirms a group of co-occurring linguistic characteristics which work together to describe a pattern of text variation.
Turning now to the topic of text classifications, Ciapuscio (1994, 2003b) [...] a comprehensive review of the [...] of text types. It is interesting to note that there are more exhaustive approaches that incorporate multiple analytical levels (Brinker 1988; Bassols and Torrent 1997). We are certain that these complex approaches to text types or classes in which [...] and communicative aspects coexist can [...] and of the phenomena under
[...] group of features identified [...] or lexical.
As shown in [...] of texts to the specialized area and its boundaries are not easy to determine, becom[...] what is a [...] matter. The [...] include the [...] implied audiences, the context and the text configurations employed to convey the intended meanings, among other variables, all of which strongly interact when constituting a specialized discourse. This is [...] any strongly dichotomous classification which considers a text as [...] or not is both a theoretical and methodological problem.
Returning to Figure [...] we tried to represent there the idea of a continuum of texts which may be distributed progressively from highly specialized texts at one extreme of the line to more disseminating and general texts at the other extreme (Schröder 1991; Gläser 1993; Halliday 1993; Jeanneret 1994; Peronard 1998; Ciapuscio 1994, 2000, 2003b; Cabré 2002; Goldman and Bisanz 2002; Gotti 2003). The concept of fuzzy categories, originally proposed by Lakoff from cognitive linguistics (1972), and in more recent linguistic theory considered as gradience (Aarts 2004; Aarts, et al. 2004), fits perfectly well into our understanding that, in real life, one discourse shares and moves along a diversified set of possibilities, making it hard to classify at times. The more prototypical extremes are [...] easier, but there will be hybrid or fluid models within the [...] boundaries.
One feature alone or a small group of features cannot account for a specialized discourse, nor [...] it [...] of features co-occurring

[Figure: continuum of texts; recoverable labels: Continuum, Texts, Dissemination, Low]

[...] or technical area [...] see Parodi [...]
Second, these are texts that reveal a [...] referential communicative function and they circulate in particular situational contexts. All of this means that the multiple linguistic and contextual feature organizations are articulated in singular complex semiotic systems. Of course, other equally important features are the highly specialized vocabulary ([...] 1999, 2000, 2002; Ciapuscio 2003a; Cademártori et al. 2006) and the prototypical written format (Gotti 2003). Nevertheless, there is increasing interest in the specialized spoken forms of interactions in academic settings (with a growing tradition in English including, for example, Swales and his MICASE Project). In Spanish this has not been the case, although we have newly developing research areas, such as that being developed by Ciapuscio (see Chapter IV, this volume).

2. Corpus and Method

2.1 The Technical-Scientific Corpus El Grial PUCV-2003
The Technical-Scientific Corpus (TSC), as part of a larger corpus called the El Grial PUCV-2003 Corpus, is made up of 74 texts totalling 626,790 words, collected from secondary technical-professional schools in the city of Valparaíso, Chile. Details of the corpus are presented in Table 7.1.
As we can see, there is no direct relationship between the number of texts per specialty and the number of words. Thus, most texts are on the maritime area of port operations, but they show the smallest number of words. This implies a large group of texts that students must read, but which must be comparatively short. In contrast, in the technical area of industrial metal mechanics, the smallest number of texts was collected but they make up the largest [...] in terms of the number of

Table 7.1 Composition of the TSC
Disciplinary Areas            Number of Texts   Total Number of Words
Maritime Port Operations      36                155,160
Industrial Metal Mechanics    18                246,374
Commerce                      20                225,256
Total                         74                626,790
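The lack of a direct relationship between the number of texts and the number of words can be made concrete by dividing one by the other for each area. The short Python sketch below is an illustration only and not part of the method described in this chapter: the names TSC and mean_words_per_text are assumptions introduced here, and the counts are those of Table 7.1.

```python
# Illustrative sketch: mean number of words per text in each area of the TSC.
# TSC and mean_words_per_text are hypothetical names; the counts are from Table 7.1.

TSC = {
    "Maritime Port Operations": (36, 155_160),
    "Industrial Metal Mechanics": (18, 246_374),
    "Commerce": (20, 225_256),
}

def mean_words_per_text(texts, words):
    """Average text length in words, rounded to the nearest whole word."""
    return round(words / texts)

means = {area: mean_words_per_text(t, w) for area, (t, w) in TSC.items()}

# List the areas from the shortest to the longest average text.
for area, mean in sorted(means.items(), key=lambda item: item[1]):
    print(f"{area}: about {mean} words per text")

# Sanity check against the Total row of Table 7.1.
assert sum(t for t, _ in TSC.values()) == 74
assert sum(w for _, w in TSC.values()) == 626_790
```

On these figures the maritime texts average about 4,310 words each, against roughly 13,687 for industrial metal mechanics and 11,263 for commerce, which is the contrast the paragraph above describes.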
[...]erable number of words.
These texts [...] the [...] were made to ensure the collection of the reading [...] - teachers, students and librarians - and asking [...] texts the students had to [...] and read. [...] all the texts were digital[...] and [...] onto a website belonging to the Escuela Lingüística de Valparaíso (www.linguistica.cl). They can therefore be searched and analysed online at www.elgrial.cl. They can also now be retrieved using the text classification emerging from this research.

2.2 In search of a technical-professional text typology: criteria and method
According to Biber (1996), association pattern techniques can be used to investigate two major kinds of research question: i) the variability of a linguistic feature, and ii) the variability among texts. In this study we are concerned with the latter. When the purpose of research is to describe a group of texts of a specialized written register, rather than individual linguistic features, textual co-occurrence patterns must be determined in order to identify the salient characteristics of the texts under study. Thus, although the individual criteria that could be selected to describe the texts are important, the systematic co-occurrence of these features will be one of the most relevant contributions to this classification. The configuration of the patterns is not accidental. They represent choices and options: they mean something. It is true that all features interact in a text (linguistic and non-linguistic ones), but the way these interactions occur varies. What may be a strong pattern of association in one kind of text often represents a weak organization pattern in another text type.
Hence, to develop a text typology that accounts for the essential characteristics of the Technical-Scientific Corpus (TSC), as mentioned previously, we decided to follow a multilevel approach (Bassols and Torrent 1997). That is, both the internal characteristics of the texts and the extralinguistic context in which they are produced and circulated were considered. Such an approach finds its underpinnings in the eminently dialogical nature of language and attempts to account for the linguistic-social relationship between the participants of a given discourse community. Multilevel approaches are presently considered most appropriate for the elaboration of valid [...] from a theoretical perspective ([...] 2003b; Bassols and Torrent [...] Biber 1996; Bhatia 2004; Bazerman 1994; Swales [...]). At the core is the selection of [...] parameters, on the [...] will [...] differences among the texts and text [...]

We [...] concerned [...]istics of the [...] written corpus. In [...] that could be [...] to any text. If this were [...] would have had to be much wider and [...], oral or [...] as is the case in many [...]
The construction of the typological proposal started out from types [...] and institutionalized traditionally as observable [...] 1993, 2004; Hyland 2002, 2005; Swales 2004; Martin [...] Beaugrande and Dressler [...]. Subsequently, with the aim of [...] both common and diverging patterns that help distinguish the technical texts initially classified more accurately, a matrix with more specific features was constructed, following guidelines found in specialized literature. Text types cut across disciplines, and therefore some of the texts in part of the corpus share features across disciplinary boundaries. However, we expect to find more prototypical texts of some disciplinary domains and also to identify subtle variations in terms of specific characteristics, some of which may turn out to be mutually exclusive.
This proposed typology has been organized around three general analytical criteria: situational, functional and textual. These criteria have been established to address aspects concerning the participants' interaction, the communicative functions, the contexts in which the texts circulate and the textual structures that characterize the texts in the collected sample.
Today there is a growing, virtually infinite, number of possibilities in terms of specific features to characterize texts. For example, a focus on patterns of organization has shed light on larger text regularities in various text types. Attempts to identify these patterns have followed different approaches, such as in Hoey's (1983) problem-solving structures, Widdowson's (1978) rhetorical structures and van Dijk's (1980, 1983) schematic structures and then superstructures, along with the work done by, among others, Coulthard (1977), Horowitz and Samuels (1987) and Reid (1987). These regularities of organization in discourse were also seen in terms of 'moves', as proposed by Swales (1981, 1990, 2004) and also by Bhatia (1993, 1995, 2004).
As can be understood, there has been a continual quest for more detailed and grounded description and identification of recurring patterns in discourse. The emerging picture thus looks much more complex and dynamic than the one we had in mind for the starting text typology for our technical-professional corpus. Nevertheless, the selected criteria are grounded in an eight-component matrix with sub-specifying features. Figure 7.2 presents the general organization of the features, before each is described in detail.

Situational criteria (components of the communicative situation)
Beginning our analysis from a situational context, it is necessary to identify the features that best describe such a context (van Dijk 2001, 2002, 2006a [...] and Martin 1993, 1998; Eggins and Martin 2003; Hood and [...] Bazerman [...] Bhatia [...]). In the
[...] The context in [...] in or for the [...]

A.2) The original audience: refers to the relationship between writer and reader as far as their expected knowledge of the topic is concerned. This can be from expert to lay person, from expert to semi-lay and/or from expert to expert.

• Lay person: a person who is not familiar with the topic.
• Semi-lay: a person who has only basic knowledge of the topic, but which is sufficient to enable further learning.
• Expert: a person who is knowledgeable about the topic.

As for this ongoing research, we consider the student reader to be 'lay' or 'semi-lay', since there are learning stages at which the student is not knowledgeable about the subject matter in the texts, and others at which the reader has already acquired deeper knowledge of the specialized subject matter.

A.3) The explicit author: refers to whether or not the author is clearly identified. This is important for the reader in order to grasp a sense of group community. Ideally, new members would, through explicit citation, know the authors and the authorities within a discipline.

Functional criteria (communicative functions)
These features also help classify text types through certain resources. This is what Biber et al. (1998) call the purpose of the communicative event. Such features match the communicative functions proposed by Jakobson (1961).

B.1) Referential function: refers to facts, things or ideas. Its purpose is informative. It focuses on the context, that is, on the topic to which it refers, and it manifests itself in the third person singular and in a large number of nouns, among other features.

B.2) Expressive function: focuses on the writer/speaker and implies the expression of feelings and emotions. It manifests itself in the first person, in interjections and in a large number of adjectives.

B.3) Appellative function: focuses on the reader/listener and implies persuasion and exhortation with the purpose of eliciting a response
from [...]. It manifests itself in the second person and a [...] of verbs.

B.4) Phatic function: focuses on the channel of communication and its purpose is [...] whether or not the channel functions [...]. It manifests itself in clichés and formulaic expressions.

B.5) Poetic function: focuses on the message itself and manifests itself in the [...] and in [...] of [...].

B.6) Metalinguistic function: is oriented towards the text itself, the code. Its main function is to [...] clarifications and [...], and is usually shown through exemplifications and definitions.

It is important to emphasize that none of these functions exists in a pure state; that is, they intertwine in the discourse, although there is frequently some predominance of one over the others.

C) Textual criteria
These features characterize the text types according to the organizational pattern of the text, which is influenced by the subject matter, the graphic elements and the structural characteristics.

C.1) Textual structure: refers to the pattern of organization of the information which prevails in the text. Nowadays, we know there are multiple approaches and possibilities for classifying a text on this principle (Sinclair et al. 1993; Ghadessy 1993; Hoey 1983, 2001; van Dijk 1980, 1983; Bhatia 2004; Swales 1990, 2004). Given the overall objective and due to the technical-professional characteristics of the corpus under analysis, we will follow a more classical perspective, in which five main organizational structures can be distinguished: argumentative, descriptive, expository, narrative and normative.

• Argumentative: the aim of which is to influence a given audience. It assumes an utterer who intends to make an audience (readership) accept a conclusion, offering a reason to accept that conclusion (Plantin 1998).
• Descriptive: its function is to characterize objects, people, situations or processes through language, explaining their parts, qualities or circumstances (Bassols and Torrent 1997). It is conditioned by the communicative context and the purpose to be achieved (Calsamiglia and Tusón 1999).
• Expository: its aim is to 'objectively' inform or expose subject matter to facilitate comprehension (Bassols and Torrent 1997). This implies a need to be reliable, neutral and objective when information is given (Calsamiglia and Tusón 1999).
• Narrative: the narrative function assumes the wish to provide real facts or those which are potentially real in a discourse universe. Its function is to discursively organize actions and events in an integrating sequential order that shows the unit of action and its [...] that if one of its action parts is [...] whole [...]ified. Its characteristic is the presence of [...] the transformation of [...].
• Normative: its aim is to provide accurate directives or instructions about the way certain procedures that tend to govern behaviour and/or [...] ideal state of things and processes should be carried out.

C.2) Subject matter: refers to whether or not the text addresses one or several topics, which can account for textual complexity and extension. A multithematic text can be more complex because of the number of topics dealt with and because of its length. A monothematic text would have a simpler structure, and could therefore be comprehended more easily.

• Monothematic: focuses on one topic.
• Multithematic: focuses on several topics.

C.3) Multimodality: refers to the presence or absence of elements of different modalities (linguistic, graphic). Having multimodal elements, a text can facilitate comprehension, because one single concept is presented in different ways (Kress and van Leeuwen 1996; Camero 2001; Baldry and Thibault 2006).

C.4) Writing required: refers to whether or not the text needs closing (writing).

3. Results
In this section, we give four kinds of empirical results: 1) the analysis of the texts themselves, the first step taken to study the TSC corpus, which was done with the purpose of attempting an initial classification (based on socially instituted notions, but also on definitions based on the criteria selected); 2) a comparative characterization of the text types, based on all of the features included in the taxonomy, and, to illustrate some text types, a few examples (although due to the extension of the corpus and the majority of the texts, just representative fragments are selected); 3) a comparison between two emerging opposite text types and the organization of the features in terms of more general criteria; and, finally, 4) results of quantitative analysis of the occurrence of the text types by disciplinary domain.

3.1 Classifying and defining text types
Having assembled the set of features to be identified in each text, the next task was to read all the texts, analyse them and group them. An initial
classification of 12 types of [...] each, listed [...]

Didactic Guideline: Brief text of a didactic nature, [...] a set [...] about a [...] with exercises and suggested [...]

Directive: Document, frequently institutional, which provides useful information about how an activity should be carried out. Its main function is appellative. Its dominant text structure is normative.

Form: Institutional document with blank spaces to be filled out with data. Its chief function is appellative. Its prevailing text structure is normative.

Glossary: Catalogue or vocabulary where technical words are described. It is addressed to a non-specialized group. Its chief function is referential-metalinguistic. Its prevailing text structure is expository.

Law: Set of precepts mandated by the highest authority for all governed to abide by. Set of mandatory regulations for the citizens of a country. Its primary function is appellative. Its prevailing text structure is normative.

Legal Gloss: Clarification of or commentary on a law. Its primary function is referential. Its dominant text structure is expository.

Manual: Treatise of a didactic disseminating-knowledge nature, mostly used by technical professionals and students approaching a discipline. It is rich in examples, tables and multimodal resources, which facilitate comprehension. Its chief function is referential, but it has a secondary appellative function. Its prevailing text structures are expository-normative.

Regulation: Institutional document containing a set of rules, precepts and instructions for carrying out an activity. Its primary function is appellative. Its prevailing text structure is normative.

Table: Series of numerical values of any kind, of words or signs, organized in parallel columns. Its primary function is referential. Its prevailing text structure is descriptive.

Technical Article: Text of a didactic (disseminating) nature whose aim is to explain various aspects of an object, process or activity in a professional area. It has no fixed structure, although illustrations, pictures and tables frequently accompany it. Its primary function is referential. Its dominant text structure is expository.

Technical Description: Explanatory document containing physical and functional [...] spheres of application and other characteristics of [...]

The second step consisted of [...] the matrix to [...] the 12 text types; the [...] was to describe them [...] more [...] definitions and to establish more [...]. Table 7.2 shows the 12 text types and the results of the [...] based on the three criteria described previously.

This more detailed characterization, shown in Table 7.2, helps to distinguish more differentiating features of the text types. Thus, from the perspective of the communicative functions, the prevalence of the referential and appellative functions is detected in most of the texts in the corpus. This implies the prevalence of the referential focus, in which reference to facts, things and/or ideas is presented with a special emphasis on the topic. Communicatively speaking, the core purpose of these texts is to inform, although there is also some focus on the readership, with the aim of eliciting a response from the readers. However, it is evident that, according to these features, the El Grial 2003-PUCV technical-scientific corpus moves away from any objective that involves the expression of feelings and emotions (expressive function), as well as from stylistic concerns related to the message (poetic function).

Table 7.2 also shows that text types in the TSC are primarily characterized by an essentially expository and normative organizational structure. This means that their aim is to inform or explore a topic to facilitate comprehension, which entails the need for neutral and reliable information, as well as to provide accurate directives or instructions on how certain procedures should be carried out.

Multimodality shows itself to be a relevant feature among the technical-professional texts. The largest number of texts is ranked as multimodal, that is, where non-verbal resources (figures, drawings, tables, outlines, graphs, etc.) play an important role in the communication of information. It is obvious that any attempt to characterize these texts cannot set aside this essential feature of the specialized discourse.

Finally, it is important to mention the monothematic nature of most text types in the corpus. This feature, coupled with those previously described (referential and expository focus), provides an overview of the relevant characteristics of the texts under study. Of the 12 text types in analysis, only the Manual shows itself as a kind of plurithematic text, traditionally centred on a varied range of knowledge.

From the characterizing feature matrix, it is possible to establish a variety of comparisons, by either paying attention to certain specifying features across the texts in the corpus or comparing two or three types of texts more comprehensively. As an example, Figure 7.3 shows the differences and
Table 7.2 Text types and characterizing features
Situational features: area of original production, original audience, explicit author. Functional feature: communicative function. Textual features: prevailing textual structure, subject, multimodality, writing required.

| Text type | Area of original production | Original audience | Explicit author | Communicative function | Prevailing textual structure | Subject | Multimodality | Writing required |
|---|---|---|---|---|---|---|---|---|
| Diagram | School | Semi-lay | Yes/no | Referential | Descriptive | Monothematic | Yes | No |
| Didactic Guideline | School | Semi-lay | Yes/no | Appellative-referential | Expository | Monothematic | Yes | [...] |
| Directive | Work | Expert | Yes/no | Appellative | Normative | Monothematic | Yes | [...] |
| Form | Work | Semi-lay | No | Appellative | Normative | Monothematic | Yes | [...] |
| Glossary | School | [...] | Yes/no | Referential-metalinguistic | Expository | Monothematic | No | No |
| Law | Work | Expert | No | Appellative | Normative | Monothematic | No | No |
| Legal Gloss | School | Semi-lay | Yes/no | Referential | Expository | Monothematic | No | No |
| Manual | School | Semi-lay | Yes | Referential-appellative | Expository-Normative | Plurithematic | Yes | No |
| Regulation | Work | Expert | No | Appellative | Normative | Monothematic | No | No |
| Table | School | Semi-lay | Yes/no | Referential | Descriptive | Monothematic | Yes | [...] |
| Technical Article | Work | Expert | Yes | Referential | Expository-Argumentative | Monothematic | Yes | [...] |
| Technical Description | School | Semi-lay | Yes | Referential | Expository-Descriptive | Monothematic | Yes | [...] |
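The comparison of two text types against the feature matrix can be checked mechanically. The following is a minimal sketch in Python and is not part of the original study: it transcribes the Table 7.2 values for the Didactic Guideline (DG) and the Technical Article (TA) and lists the features on which the two agree. The dictionary keys and the function name `shared_features` are illustrative assumptions, and the 'writing required' values are taken from the running text (the DG needs to be completed, the TA does not require writing) rather than from the table.

```python
# Feature values transcribed from Table 7.2 for two text types.
# Assumption: the "writing_required" entries come from the running text
# (the DG must be completed by the reader; the TA requires no writing).
FEATURES = [
    "area_of_production", "original_audience", "explicit_author",
    "communicative_function", "textual_structure", "subject",
    "multimodality", "writing_required",
]

DIDACTIC_GUIDELINE = {
    "area_of_production": "School",
    "original_audience": "Semi-lay",
    "explicit_author": "Yes/no",
    "communicative_function": "Appellative-referential",
    "textual_structure": "Expository",
    "subject": "Monothematic",
    "multimodality": "Yes",
    "writing_required": "Yes",
}

TECHNICAL_ARTICLE = {
    "area_of_production": "Work",
    "original_audience": "Expert",
    "explicit_author": "Yes",
    "communicative_function": "Referential",
    "textual_structure": "Expository-Argumentative",
    "subject": "Monothematic",
    "multimodality": "Yes",
    "writing_required": "No",
}


def shared_features(a, b):
    """Return the features on which two text types have identical values."""
    return [name for name in FEATURES if a[name] == b[name]]


shared = shared_features(DIDACTIC_GUIDELINE, TECHNICAL_ARTICLE)
different = [name for name in FEATURES if name not in shared]

print("shared:", shared)
print("different:", len(different), "of", len(FEATURES))
```

Under exact matching of values, the two types agree only on subject and multimodality and differ on the remaining six features. A partial overlap, such as 'Expository' against 'Expository-Argumentative', counts as a difference in this sketch; a finer-grained comparison would need to split compound values.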
Figure 7.3 Comparison of the 12 features in two textual types
[...] them [...] the way all [...] and the Technical Article [...]tures these interactive features.

As shown in [...] this kind of [...] make distinctions between the Didactic Guideline (DG) and the Technical Article (TA) by means of situational, functional and textual features. In terms of the situational criterion, the DG differs from the TA as far as the sphere of original production is concerned: the DG emerges from the educational context and is created by and for such a context. The TA's sphere of original production is, in contrast, the field of work. With respect to the original audiences to whom these textual types are addressed, they differ in that the DG appeals to semi-lay readers and the TA to an expert audience. As to the authorship, the DG may or may not bear the author's name, showing some formality as compared to the TA, in which the author's name is always explicit.

The TA and the DG put different emphases on their communicative purposes: in the TA, the referential function prevails, which makes its nature more 'objective', with an informational focus. In contrast, the DG stresses the appellative function over the referential; that is, there exists a greater appeal to the readership, an attempt to elicit a response from the readers by inviting them to participate actively through writing. The above relates to the text features, specifically to text completion, since the DG needs to be completed, while the TA does not require writing.

Another difference can be observed in the prevailing textual structure: the DG has an expository structure, which means that its major aim is to present topics to be learned and to facilitate comprehension. In the TA, where the structure is expository-argumentative, a persuasive focus prevails and, therefore, it provides information that supports the line of exposition-argumentation. An analysis of the diagram and the distribution of the features makes it abundantly clear that, based on the eight distinctive characteristics, the two text types are diametrically opposed, to the extent that they do not share six of the eight features. Strictly speaking, the only features they have in common are the Multimodality and the Monothematic Subject. The TA also shares part of its textual structure with the DG (the TA being expository and argumentative), though this is not rare, for, as is well known, pure text organizations are difficult to find.

[...] of such features and, [...] discourse in greater [...]

3.3 A [...] audience
Figure 7.4 shows the text types that circulate in the technical-professional educational community. They have been subdivided into two groups: text types whose sphere of production is the school community itself and text types produced in other communities (more professional than academic). In turn, the audiences for whom they were created (lay or semi-lay or expert) are specified. Figure 7.4 clearly illustrates the profound effect of comparing these two features in all twelve text types. The findings show how these two features could help identify texts whose sphere of original production is not the academic community and whose invoked audiences, to which they are originally addressed, are not the students of the secondary technical-professional schools, but others more related to professional environments.

It is interesting to verify not only that, in the technical-professional discourse community under study, text types produced by and for that community circulate, but also that there are text types not generated in that community that are equally read by its members. These text types are originally conceived for and addressed to other specific audiences. Therefore, their linguistic, textual and graphic resources are part of the meanings conveyed originally to those groups. The aim of all this is to determine whether they coincide with the lay or semi-lay audience to whom the texts used in the technical-professional school should be addressed.

Finally, from the data presented, it is clear that most text types (seven out of 12) circulate in the school community of origin: Didactic Guideline, Legal Gloss, Technical Description, Table, Diagram, Glossary and Manual. However, a significant number of text types that students read in their technical-professional education are neither produced by that school community nor originally addressed to it. This is the case of the Form, the Directive, the Regulation, the Law and the Technical Article. Of these text types, only one, the Form, coincides with the school audience, that is, it is addressed from experts to semi-lay people. The four remaining text types are addressed to expert audiences, which results in a considerable academic burden to be processed by secondary students. It is true these texts must open the way to the specialized knowledge that is typical of the professional and technical community these students are going to be part of, but it is also
[...] One of the most relevant [...] between the text [...]ing shows the way in which future professionals [...] part of their discourse communities. The co-existence of specialized texts, along with those for dissemination and didactics, is signif[...] revealing as to the way in which the school could organize the [...] of specialization of knowledge and access to it.
This distribution of text must not be understood as a disadvantage
for the students. On the contrary, the use of texts characterized by high spe-
cialization is just what helps them progressively integrate and become
members of the more expert discourse community. These are the texts that
today's students will encounter in their professional settings, while the other
texts (those for dissemination in the school) help these future profession-
als develop gradual disciplinary knowledge. In fact, this complementary
aspect of the text types is just what characterizes the professional, special-
ized, educational sphere.
Those who have shaped these communities - maybe intuitively - have
reached an appropriate balance between didactic-disseminating text types
and eminently and highly specialized text types in each of the three techni-
cal areas under study.
What remains unknown is the didactic hierarchy or pedagogical
sequence in which these text types are organized in order to be read by the
students under technical-professional instruction. It would be expected and
assumed that such a hierarchy, if it exists, would be chronologically organ-
ized from the more disseminating to the more specific and professional
texts, thus respecting the linguistic and psychological burden they entail.
No information in this respect is yet available.
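
The grouping by sphere of original production and original audience discussed in this section can be reproduced from the two situational columns of Table 7.2. The sketch below is illustrative only and is not part of the original study; the variable names are assumptions, and the Glossary's audience is set to `None` because that cell is not legible in the source table.

```python
from collections import defaultdict

# (area of original production, original audience) per text type, from Table 7.2.
# Assumption: None marks a cell that is not legible in the source table.
SITUATION = {
    "Diagram": ("School", "Semi-lay"),
    "Didactic Guideline": ("School", "Semi-lay"),
    "Directive": ("Work", "Expert"),
    "Form": ("Work", "Semi-lay"),
    "Glossary": ("School", None),
    "Law": ("Work", "Expert"),
    "Legal Gloss": ("School", "Semi-lay"),
    "Manual": ("School", "Semi-lay"),
    "Regulation": ("Work", "Expert"),
    "Table": ("School", "Semi-lay"),
    "Technical Article": ("Work", "Expert"),
    "Technical Description": ("School", "Semi-lay"),
}

# Group the text types by their sphere of original production.
by_area = defaultdict(list)
for text_type, (area, _audience) in SITUATION.items():
    by_area[area].append(text_type)

# Within the work sphere, separate types by the audience they address.
work_for_semi_lay = [t for t in by_area["Work"] if SITUATION[t][1] == "Semi-lay"]
work_for_experts = [t for t in by_area["Work"] if SITUATION[t][1] == "Expert"]

print(len(by_area["School"]), "school types:", sorted(by_area["School"]))
print(len(by_area["Work"]), "work types:", sorted(by_area["Work"]))
print("work types addressed to semi-lay readers:", work_for_semi_lay)
print("work types addressed to experts:", work_for_experts)
```

With these values, seven of the 12 text types originate in the school sphere and five in the work sphere; of the latter, only the Form is addressed to semi-lay readers, which agrees with the counts reported above.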
3.4 Text examples of the TS Corpus PUCV-2003

Now we will provide a few examples from some of the text types identified
and described, accompanied by their corresponding translation into
English. We have extracted passages from five different text types: Manual,
Directive, Glossary, Regulation and Technical Description. The examples
belong to two disciplinary domains: commerce and maritime. We will also
give the codification number in each case, so it is possible to check the whole
text online on our website: www.elgrial.cl.
One of the most striking aspects of the possibility of observing and com-
paring these examples is that the distinctive features emerge clearly and, even
though certain characteristics may overlap, subtle differences prevail. The
Manual and the Directive show their didactic rhetorical tools (in that a large
number of pedagogical clues can be detected). They thus act as disseminating
discipline knowledge material, facilitating the learners' experience and
[...] extreme, [...] Technical [...] very concrete [...] a strong reference focus and little or no attention to [...] the reader grasp the contents. The [...] definitions but no rhetorical didactic clues at all. [...], as a text type, is identified by its normative stance and reveals itself as somewhat distant and different from the other four [...]. As shown in the example, the structure [...] characteristic of this text type.

In what follows the five example text types are presented and the translation into English has been included in square brackets.

3.4.1 Manual → CTC-COM-mal

Poner Atención:

I. LA CUENTA
'Es una agrupación sistemática de los cargos y abonos relacionados a una persona o situación de la misma naturaleza, que se registran bajo un encabezamiento o título que los identifica.'

II. CONCEPTOS
a) Las anotaciones registradas al Debe de la cuenta se llaman: Cargo.
b) Las anotaciones registradas al Haber de la cuenta se llaman: Abono.
c) La suma de los Cargos se llama: Débito.
d) La suma de los Abonos se llama: Crédito.
e) La diferencia entre Débitos y Créditos se llama: Saldo.

RECORDAR:
GANANCIAS > PÉRDIDAS = UTILIDAD DEL EJERCICIO
GANANCIAS < PÉRDIDAS = PÉRDIDAS DEL EJERCICIO

[Important.

I. THE ACCOUNT
'It is systematic grouping of the charges and [...] related to a person or situation of the same nature, registered under a heading or title that identifies them.'

II. CONCEPTS
a) Entries registered as credit in the account are called: Payments.
b) Entries registered as income in the account are called: Credit.
c) The total sum of all the Charges is called: Debit.
d) The total sum [...] is called: Credit.
e) The difference between Debits and Credits is called: Balance.

[...] THE [...]
[...] < LOSSES = LOSSES [...]]

3.4.2 Directive → [...]

INSTRUCCIONES DE LLENADO DE CONTINUACIÓN DEL INFORME DE IMPORTACIÓN
[...] proceda el uso de este [...] deberá ser emitido [...]mente con el informe de [...] y este último no tendrá validez si a él no se [...] 'Continuación del Informe de Importación' que corresponda(n).

1. Presentación (Número y fecha).
Deberá indicarse el mismo número y fecha de presentación del Informe de Importación.
2. Entidad que presenta el Informe de Importación y Código.
Deberá indicarse las mismas del Informe de Importación.
3. Número, fecha y firma autorizada entidad emisora.
Número, con su correspondiente fecha y firma autorizada con el cual el Servicio Nacional de Aduana cursa el Informe de Importación.

[INSTRUCTIONS FOR FILLING IN THE CONTINUATION OF THE IMPORT REPORT
Whenever the use of this form is required, it should be issued with the report from Import. The latter will have no validity unless attached to the corresponding 'Continuation of Import Report'.

1. Presentation (Number and Date)
The same number and date of presentation of the Import Report should be stated.
2. Company presenting the Import Report and the Code Import.
The same should be stated as for the Import Report.
3. Number, date, and authorized signature of issuing entity.
Number, corresponding date and authorized signature with which the National Customs Service issues the Import Report.]

3.4.3 Glossary → CTC-MAR-gs41

• Mercancía
Es todo bien corporal mueble sin distinción alguna.
Todo producto, manufactura, y otros bienes corporales muebles, sin excepción alguna.
• Mercancía extranjera
Mercancía proveniente del exterior y cuya importación no se ha consumado legalmente aunque sea de producción o manufacturación nacional; o que habiéndose importado bajo condición ésta deje de cumplirse.
[...] on a condition, such a condition is no [...]
• National merchandise
Merchandise produced or manufactured in the country, with national or nationalized raw material]

3.4.4 Regulation → CTC-MAR-rg45

III TRASLADO DE MERCANCÍAS DESDE EL PUERTO O AEROPUERTO AL RECINTO DE DEPÓSITO
Las normas contenidas en esta apartado solamente se aplicarán cuando el recinto de depósitos esté ubicado fuera de la zona primaria del puerto o aeropuerto de arribo de la nave.
1. La compañía de transporte debe entregar las mercancías al almacenista dentro de las 2 horas siguiente a la hora de salida de la Zona Primaria del puerto o aeropuerto de arribo de vehículo
2. La responsabilidad ante el Servicio de Aduanas de la entrega de las mercancías al almacenista es de la compañía de transportes que realizó el flete internacional, independiente de la empresa que efectuó el traslado desde la zona primaria al almacén

[III. TRANSFER OF MERCHANDISE FROM THE PORT OR AIRPORT TO THE PLACE OF DEPOSIT
The norms contained in the section will only apply when storage is located outside the primary zone of the port or airport of arrival of the vessel.
1. The shipping company must deliver the merchandise to the head of the warehouse within 2 hours following the departure time from the Primary Zone of the port or airport of the vehicle's arrival.
2. Responsibility before Customs Services for delivery of the merchandise to the head of the warehouse lies with the shipping company that transferred the goods from the primary zone to the warehouse.]

3.4.5 Technical Description → CTC-COM-dt 16

I. EVOLUCIÓN HISTÓRICA DEL CONTENEDOR
El Contenedor como nuevo elemento de transporte ha revolucionado el transporte marítimo, las naves comenzaron a adecuarse para una operación expedita hasta llegar a las naves especialmente construidas para contenedores [...] se han [...] a [...] elementos que [...]

II. DEFINICIONES DE CONTENEDORES
1. Contenedores para [...] son [...] totalmente cerrados, teniendo todas su [...] como así también el [...] el [...] y además una de sus [...] extremas está provista de puerta.
2. Contenedores para uso especifico son aquellos destinados al [...] de mercancías generales construidos con características especiales, de tal forma de facilitar el embarque o descarga ya sea por la puerta extrema o teniendo funciones específicas, tales como la ventilación de la carga.

[I. HISTORICAL EVOLUTION OF CONTAINERS
The Container, as a new transportation device, has revolutionized maritime transportation; vessels began to be adapted for more efficient operation, to the extent that some are built especially for container handling and ports have been forced to purchase tools that help the handling process.

II. DEFINITIONS OF CONTAINERS
1. Containers for general use are completely enclosed units with rigid walls, floors and ceilings, and with a door provided in one of the end walls.
2. Containers for specific use are those units destined for the transportation of general merchandise, and which have been built with special characteristics to facilitate loading or unloading, whether through the end door or having specific functions, such as ventilation of the load.]

3.5 Occurrence of the text type according to disciplinary domain
Graph 7.1 shows the occurrence of each of the 12 text types in a comparison between the three disciplinary areas.

The fact that the commerce sector has eight text types makes it the most heterogeneous discipline in terms of knowledge communication: Manual, Didactic Guideline, Directive, Technical Description, Diagram, Glossary, Form and Law. Of these types, the Form and the Law are exclusive to the area; that is, they are only found in this disciplinary area. The text types that do not appear in this sector are Regulation, Legal Gloss, Technical Article and the Table. The maritime sector, in comparison, presents a greater variety of text types (ten in total): Manual, Didactic Guideline, Legal Gloss, Directive, Technical Description, Diagram, Glossary, Regulation, Technical Article and Table. Exclusive to this sector are Regulation, Legal Gloss, Technical Article, and Table. In addition, this disciplinary domain is characterized by having the greatest number of Didactic Guidelines collected (12 in total). The text types which do not appear in this sector are Form and Law. The industrial sector, in turn, has
only three types of text, namely, Manual, Directive and Technical Description. It does not have an exclusive text; however, it has the greatest number of Manuals (total of 15).

Graph 7.1 Occurrence of 74 text types by discipline

The uneven distribution of figures in Graph 7.1 is evidence that disciplinary domains and text types are related and that the language which the members of those communities use to conceptualize their disciplinary-specific knowledge and to convey their specialized meanings may be different. Martin (1993) offers an interesting discussion, based on secondary-level textbooks, in which he suggests there is a basic difference in the creation of technical disciplinary language between the sciences and the social sciences. Martin's contrast between technicality in scientific knowledge-making and the abstraction of interpretation in the humanities is discussed by Wingell (1998), Williams (1998) and Love (2002). Hyland (2000) has

Figure 7.5 Dialectical relationships (labels: Specialized Discourse; Disseminating/Didactic Discourse)

[...] which nonetheless are, at the same time, intertextually linked. Certain groups within specialized genres, for example, display possibly related characteristics. Bazerman's (1994) notion of a system of genres may be useful here, as may the notion introduced by Bhatia (2004) of a higher genre level of interrelations termed genre sets. Of course, further analyses and qualitative techniques will have to be applied to discover patterns of this nature.

A relationship can be observed between the disciplinary domain or specialty area and the text types that circulate, from those specialized and exclusive to the professional sphere to those clearly disseminating/didactic. According to the figures, two areas reveal the greatest heterogeneity in the text types and greatest diversity as far as the relationship between specialized and disseminating is concerned, namely maritime and commerce. In these, a major concern about the selection of texts can be detected with respect to the gradation of their technical-scientific content. This would ultimately mean a better access to professional knowledge, typical of the sphere of work (Flowerdew 2002; Lassen 2003).

The plan in Figure 7.5 attempts to capture the dialectical relationships between these three axes, and their divergent directions.

The tension between the specialization sphere and the continuum of discourses (from those highly technical-scientific to those requiring a greater processing intended to lessen the information load, and therefore, to back up the specialized knowledge intended for communication) shows itself as a bidirectional relationship of the utmost importance in designing disseminating and didactic materials. Additionally, the unidirectional projection, from more restricted and exclusive audiences and expert contexts toward those of a larger and more diverse audience, enables gradual integration
also contributed to these kinds of analyses of professional discourses in pro- into the target discourse communities.
fessional settings. He wonders why engineers 'report' while philosophers In this context, the situation in the industrial sector appears to be most
'argue' and biologists 'describe'. There seems to be no definite consensus worrying, due to its limited typological diversity and its focus on more dis-
at the moment. Unfortunately, our leve! of analysis of the data, strictly seminating text types, with little occurrence of technical texts typical of the
limited to the focus of this study, <loes not shed much light upon the issue. professional context within which the students must operate. At any rate,
Interestingly enough, this rather limited set of text types resulting from these data should be handled cautiously, since they were collected from
the application of the taxonomy influenced disciplinary domains repre- three institutions and, constitute a
sents a typical set of products, a set of individually distinct texts from a large range of schools.
[...] most used text [...] context. These kinds of texts disseminate [...]
are seen, in [...] words, as [...], which through some rhetorical structures
may grant access to the most specialized professional communication
(Gunnarsson 2000; [...] and Keller-Cohen 2000). At the same time, these two
text types interact with the audience in a writer-reader relationship that is
appropriate to the educational and disseminating context; that is, the writer
acts as the specialist and the reader as the non-initiated student approaching
new knowledge and trying to be part of the discourse community. As Hyland
(2000: 104) points out:

Textbooks are indispensable to academic life, facilitating the professional's
role as a teacher and constituting one of the primary means by which the
concepts and analytical methods of a discipline are acquired. They play a
major role in the learners' experience and understanding of a subject by
providing a coherently ordered epistemological map of the disciplinary
landscape [ ... ]

Hyland's ideas are correct. The only necessary and fundamental point is to
ensure that textbooks will always be well designed, appropriately balanced
in the gradual incorporation of disciplinary knowledge, and written and
sequentially organized with adequate methodological and psycholinguistic
strategies (Parkinson and Adendorff 2005). Unfortunately, this is not always
the case and this is in part why we must continue conducting research in
educational settings and evaluating the available pedagogical materials with
discourse analysis tools and psycholinguistic criteria with a special focus on
disciplinary expertise.

Concluding remarks and implications

The issues considered in this chapter are not intended to undermine or
sidestep the complexities of proposing a technical-professional text
taxonomy. It is more than clear that the ever-growing breed of theoretical
reflections and data available further complicate the challenge. In fact,
anyone who has analysed a number of texts in order to apply a
set of principles throughout the whole corpus knows quite well the drawbacks
and many emerging variables with which one has to cope. However,
one simple and naive remark could be to insist on the idea that our purpose
was not to search for a text taxonomy that could be applied to any group
of texts, but to search for principles that help classify and identify
the written text types circulating in a particular technical-professional
educational environment.

This [...] centred on a multilevel [...] has contributed to:

[...] that is, [...] texts to [...] audiences. This is evident in that the
[...] area provides the largest number of Didactic [Guidelines], Technical
Descriptions and Glossaries, all of which are identified by their didactic
nature and typical of the educational community. These are supplemented by the
circulation of varied, specialized types of texts, which supports the
supposition that this specialized sector can gradually train its students in
the progressive handling of language appropriate for the professional
community.

The commerce sector, although it shares similar text heterogeneity with
the maritime sector, shows fewer didactic text types, which results in
increased importance for these didactic text types. Nevertheless, a balance
between the didactic and the more specialized texts is maintained. The
industrial sector, in contrast, is characterized by scarce text heterogeneity
and comparatively fewer collected documents.

According to the data, the area of specialization has proved to have an
influence both on the variability of text types and the number of texts that
circulate in one sector or another. Thus, the maritime and commercial areas
show a somewhat similar distribution in these spheres, unlike the industrial
sector. Some text types can vary greatly across disciplines and within any
one discipline. Other text types, on the other hand, stay relatively
homogeneous across different scientific disciplines, as is suggested in the
report by Venegas (2006), who analysed a large number of research articles
from indexed Latin American journals belonging to a variety of scientific
disciplines.

One of the most valuable contributions derived from this study is the
typological heterogeneity that has been detected, both in each specialized
area and in the overview that the application and analysis of certain specific
features provide. The above is especially noticeable in the identification of
original text types in the professional [...] and other [...] which are more
typical of school didactics. In numerical terms, this occurrence approaches
a balance of both types of text according to these features, and this is highly
satisfactory. With this, the gradual incorporation of the students into their
respective area of professional specialization will be taken care of.

It is possible to theorize that the teachers teaching these specialized areas
have made, albeit intuitively, the right selection of reading material to
[...] to their students. [...] this balance would form a continuum among
types of text from those addressed to a semi-lay audience to one with [...]
in these texts, in order to include a number of written materials
suitable for students in [...]. This balance of reading materials would
facilitate students' access to conceptualization of technical-scientific
contents to which professionals in the technical area are exposed. A
substantial advance in learning quality would be reinforced by using these
texts. Finally, as we already know, the school teaching-learning process can
often become more accessible to readers if content and text types are
classified and properly organized.

Universitat [...]
Spain

1. Description of Corpus 92: academic [...] by university applicants
The corpus-based study of discourse created by learners forms a highly
productive research line for L2 teaching and learning: for example,
research in Europe on the International Corpus of Learner English (ICLE)
(Granger et al. 2002; Granger 2004), in the United States on the Michigan
Corpus of Academic Spoken English (MICASE) (Simpson et al. 2002), and in
China on the TELEC Secondary Learner Corpus (TSLC) (Allan 2002). In the
Spanish language, however, there is still a shortage of studies on learner
corpora which would make it possible both to explain the level of com-
municative competence - both spoken and written - possessed by students
at each stage of their educational development and, on the basis of this
analysis, suggest the appropriate teaching method.
Along these lines, Corpus 92¹ was compiled so that reliable information
would be available on the degree of written competence possessed by
Spanish students upon entering university. This learner corpus has been
analysed in order to characterize student usage with a view to designing
rational, efficient teaching materials that address their difficulties. The
corpus comprises 750 copies of entrance exams for Spanish universities
from June 1992. It includes academic texts from the Spanish secondary
teaching syllabus of both the sciences and the humanities (according to
the original names used in Spain to refer to the various itineraries in
secondary school) and examinations of the modules common to any
syllabus itinerary.² The same number of written texts (125 exam copies) has
been collected from each of the six Spanish universities that took part in
the project: two from northern Spain, two from central and western Spain
and two from southern Spain. Consequently, this corpus is homogeneous
in nature as it is based on the same exam tests from the same sitting at
six somewhat different Spanish universities. Moreover, the exams were
all taken on the same date. Table 8.1 shows the criteria used when building
[...], as well as the abbreviations used to identify the texts, the
number of words contained in each section, and the total number of words
in the corpus as a whole.

Table 8.1 Corpus 92 characteristics

Texts
  Humanities                Common to all itineraries   Sciences
  Modern [...] (HC)         Text Commentary (CO)        [...] (GE)
  History of Art (HA)       Philosophy (FI)             Biology
  Spanish Literature (LI)                               Physics (FI)
                                                        Chemistry (QU)
                                                        Mathematics (MA)
  142,466 words             120,153 words               94,242 words

Universities
  The North                 Central and western Spain   The South
  Barcelona (BA)            Madrid ([...])              Murcia (MU)
  Oviedo (OV)               Salamanca (SA)              Seville (SE)

TOTAL NUMBER OF WORDS: 356,861

These criteria make it possible to consider Corpus 92 to be wholly
representative of the degree of competence shown in writing the academic texts,
which is the aim of the project: to identify the level achieved upon
completing secondary education, which forms the basis for university studies.
The exam answers that comprise this corpus are complete texts with an
average length of 500 words.³ The corpus itself is small (totalling 356,861
words), specific (it is composed of texts for academic purposes), and
specific to its time⁴ (indeed, its communicative value as a means of validation
for admission to university is limited to the specific time and place at which
the exams were taken). The results gleaned from it lay the empirical
foundations for analysing academic discourse in Spanish and for teaching it as
well, since it reveals the learning needs (Murison-Bowie 1996) of a specific
group of writers who are still being educated.

Corpus 92 is lemmatized and morphologically labelled, meaning that
computer searches can be performed [...] to grammatical categories and
[...] entries. Below we shall briefly mention the type of research that has
been carried out on this corpus (Section 2) in order to emphasize the
pedagogical [...] (Section 3). This will also enable us to disclose details of
[...] aimed at comparing the grammatical and discourse aspects of this
learner corpus with those of a specialized corpus which is also electronic
[...] Sections 4 and 5 which constitute the central part of this chapter). The
conclusions [...] 6) will show the pedagogical significance of performing
this type of [...] between a corpus that is [...] of a [...]
kind of expert academic prose and a learner corpus.

2. Corpus 92: data collection and methodology

A corpus such as [...] to [...] research on the [...] of written competence
[...]. A recent book [...] Torner and Battaner [...] the articles [...] the
Corpus 92 research team. [...] however, include all possibilities for study,
as this article proposes to demonstrate. The type of text chosen in Corpus 92
- written academic discourse - makes it possible to address discourse aspects
[...] progression, conjunction, academic texts organizational schemes, as
well as [...] and grammatical issues that are typical features at this level
of written competence. The correlation that can be established between the
various levels of analysis provides a useful description for explaining the
consequences in the final product that is our object of study (examinations in
our case), which result from the context and the process followed when
composing written texts (Ciapuscio 2003b).

Accordingly, various types of research have been carried out:

a) Qualitative analyses: in order to categorize problems in linguistic use
   (Dagneaux et al. 1998) detailed analyses of the texts that comprise the
   various sections of the corpus (humanities, common texts, sciences)
   were carried out. As mentioned above, the issues classified relate to
   various areas of linguistic analysis, ranging from spelling to overall
   discourse structure: identifying spelling and punctuation problems,
   identifying forms used in expressing certain discourse functions, such as
   raising hypotheses or quoting, analysing thematic progression, and
   identifying the rhetorical devices used in order to assess the knowledge
   shown, among others (Torner and Battaner 2005).
b) Quantitative analyses: these, for example, were conducted by using
   electronically produced lists of the frequency of use (of both vocabulary
   and grammatical categories), examining recurrent concordances to
   characterize the use of certain linguistic units and performing electronic
   counts of the number of words per sentence in order to study the
   degree of syntactic complexity found in the analysed texts. The purpose
   of these studies is to offer precise descriptions (supported by the
   electronic word counts) of the lexical and syntactic knowledge possessed by
   learners and of the (lexico-)grammatical and discourse features that
   help to create a satisfactory academic text. This has made it possible to
   answer questions such as: what types of words are used most at this stage
   of learning?; how often are certain units employed?; what types of
   nouns, verbs and adjectives are used most frequently?; what types of
   connectives are used most in reasoning?; which relative pronouns are
   preferred?; what verb tenses are used most frequently?; which
   grammatical person is most used?; and so on.
c) Comparative analyses: these were conducted across disciplines on the
   one hand (a [...] of discourse from the sciences section, contrasted with
   [...]
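The quantitative analyses listed under (b), frequency lists and counts of words per sentence, can be illustrated with a minimal Python sketch. It is not the tooling used by the Corpus 92 team: the function names, the naive regular-expression tokenizer and sentence splitter, and the sample sentence are all assumptions made for illustration, and the sketch works on raw text rather than on the lemmatized, morphologically labelled corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; the pattern keeps accented Spanish letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def frequency_list(text, top=5):
    """Most frequent word forms with their counts."""
    return Counter(tokenize(text)).most_common(top)


def mean_words_per_sentence(text):
    """Average sentence length in words, a rough proxy for syntactic complexity."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


# Invented sample, not taken from Corpus 92.
sample = "El autor describe la obra. Y la obra refleja la sociedad de su tiempo."
print(frequency_list(sample, top=2))    # [('la', 3), ('obra', 2)]
print(mean_words_per_sentence(sample))  # 7.0
```

Counting word forms rather than lemmas is a simplification; with a lemmatized corpus the same counts would normally be taken over lemmas or grammatical categories.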
[...] 92

The chief [...] in [...] and [...] a language is to [...] the communicative
competence possessed [...] those learning it and to promote
reflection on that competence. In order to reach this goal, teachers need to
possess considerable knowledge of the difficulties their students encounter
during each stage of learning. Indeed, as Granger et al. (2002: 41) state,
'Analysing authentic learner errors in L2 (and also in L1) corpora is an
extremely efficient - though all too frequently and in our view unjustifiably
disparaged - method to acquire that knowledge'. In this respect, linguistic
research on Corpus 92 provides two answers regarding the degree of
competence attained by Spanish students upon admission to university:

1) What they do and do not know how to do when they write: which
   punctuation marks they have greater command of, and which punctuation
   marks they have difficulties with; what vocabulary they are able to use;
   which aspects of parataxis they master fairly efficiently and which they
   do not; which devices they use to handle the unfolding of the
   information they present; what expressions they employ in order to raise
   hypotheses, to quote others, to clarify and exemplify concepts, to
   modulate their discourse; and so on.
2) What they should know how to do: what command of punctuation
   should be required; which should be the most suitable vocabulary to
   answer the questions raised in an examination; to what issues of
   parataxis and hypotaxis should one lend particular attention in order
   to allow students to better the grammar of their discourse;
   what would be the most effective method of presenting and progressing
   in the unfolding of an academic subject; what types of resources
   would contribute to suitably raising hypotheses, quoting a text men[...],
   offering clarifications and [...] of the concepts explained,
   giving their own standpoint with respect to the knowledge [...];
   and so forth.

These are issues that contribute to an entire teaching and learning
programme of the mother tongue for academic purposes.
[...] studies, such as the one described in Corpus 92, highlight
the value of learner corpus data in improving language teaching
pedagogical materials [...] grammars, classroom [...]

[...] corpora, units [...]

This paper presents [...] between non-expert texts, as
represented [...] the exam answers of Corpus 92, and expert texts [...]
in a technical-scientific corpus. Carrying out this comparison entails
choosing a specialized corpus that may be used as a 'standard' with which
the learner corpus can be compared. The corpus taken into consideration
when making the comparison is the Corpus textual especializado multilingüe
(multilingual specialized text [...]) of the Institut Universitari de
Lingüística Aplicada (IULA) at Pompeu Fabra University.⁶ This technical
corpus (hereinafter, IULA-CT) covers five different specialized domains
(economics, law, environmental studies, medicine and IT). About one
million words have been compiled for each [...], stemming from
samples taken from varying documents that represent discourse activity in
each field.⁷ Unlike Corpus 92, the samples compiled are not full texts.

For our specific purposes, in this study we have taken into consideration
only two disciplines of the social sciences (economics and environmental
studies) for the samples collected in Spanish from the IULA-CT corpus. This
methodological decision was based on similarity in the type of discourse. As
pointed out by Battaner et al. (2001), the humanities discourse from Corpus 92
is characterized as being explanatory text in which description and reasoning
prevail. Descriptive and reasoning sequences are [...] in economics and
environmental studies; [...] our decision to choose these two domains.
Accordingly, the information from the corpora [...] in this [...]
is [...] in Table 8.2.

[...] specific objective is to compare the use made of [...] conjunctions
in the two corpora. These [...] have different discourse [...]
status according to the context in which [...] of [...] the very same
lin[...] to whether the text in question is of a written [...]
or a conversational register (i.e. [...] language [...], the use of which is
determined by the written or [...] Schleppegrell 2001; Clachar [...]
an [...] reflection for those who do not possess
effective command of the mechanisms that are characteristic of academic
texts entails observing how the paratactic conjunctions are employed in this
type of discourse genre and distinguishing these uses from those made by
these students in a discourse that bears many features of orality (Tusón
1991; Salazar 1999). It is therefore interesting to account for the function
that characterizes these units in academic discourse insofar as the
following aspects are concerned: the levels of liaison that are involved, how
paratactic conjunctions are combined in expert discourse, and what pragmatic
values characterize their formal academic usage. Thus, the aim is to show
students the features that these units possess in 'exemplary' texts; in other
words, texts that are examples of how expert writers make use of language
in their respective academic fields. This use includes a command of
paratactic conjunctions as textual connectives.

Table 8.2 Content of [...] (98 documents)
- economics: 1,091,314 (48 documents)
- environmental studies: 1,062,113 (50 documents)

In a recent study on the use of paratactic conjunctions in Corpus 92
(López and Atienza 2006), we observed that students made very frequent
use of paratactic conjunctions. However, [...] did not use them as
connectives, which would be the most suitable use in written academic texts,
but rather as pragmatic markers (Schiffrin 1987) with various functions: to
change the subject, to introduce [...] or to resume the discourse
topic, for example - uses that are more characteristic of their function as
markers in a conversational text. These uses show that students at [...]
level are still unaware of variation in the use [...]

[...] y/e, ni; [...] mas; sino; aunque.

The consultation tool [...] to [...] out the contexts in which these
units are used is the Bwananet computer tool of [...] available at the
website of the Institut Universitari de [...] of Pompeu Fabra [...].⁸
The [...] here is a qualitative [...] based on the
standard concordance put forward [...] Bwananet. Indeed, [...] data⁹ is
[...] to [...] differences in use [...] are characteristic of expert
discourse [...] of their use as a way to structure information in the overall
text. [...] the patterns of use in which [...] tend to
appear modulate the discourse in a specific manner which needs to be
described if students are to learn how to use them effectively in their texts.

[...] the interest that our [...] is [...]
Rather than contrasting the types of units used in each of the corpora, we
aim to look at the use made of each of the units in each context. In other
words, we will look at how paratactic conjunctions are used in each case,
which specific uses of these [...] students are unacquainted with in
academic discourse, and which student uses we consider to be unsuitable.

5. The use of [...] discourse

The enormous wealth of information [...] corpora [...]
on account of their [...] recurrence in expert discourse, are useful for
teaching and learning in specific communication contexts. Moreover, a learner
corpus such as Corpus 92 also shows common errors made [...] the students
when managing the various discourse and grammar units. In this study
we are looking at paratactic [...] 1999; Flamenco García 1999).

As we have pointed out, in an earlier [...] and Atienza 2006) we
highlighted the syntactic, semantic and pragmatic [...] that handling
[...] conjunctions poses for students [...] 1999).
[...] (1) below illustrates the [...] sense of [...]
y in order to continue to add [...]
fails to consider the semantic and [...] new [...]
to the [...] element of information has with the [...]

(1)
Los puntos de roce se hallaban en el Estado, Marx consideraba que había que
[...] para realizar una revolución mientras Bakunin defendía una
postura que negaba la existencia de Estado para llegar a la [...]
de realizar la revolución, creía estar [...]
la acción revolucionaria directa; y por último la [...]
se [...]

In non-expert texts, such as those in Corpus 92, there are many
mistakes [...] of information. This is shown [...] the [...] that,
[...] data units that do [...] to the same level are [...]
Also [...] in Corpus 92 is the use of paratactic conjunctions as discourse
markers, that is, as oral units which mark the thematic progression
and its changes:

(3)
Obras que se pueden clasificar en esta [...] las Comedias bárbaras la 'Farsa de
la enamorada del [...] y la más destacada [...] de bohemia' en esta obra tratará
de hablar de la crueldad humana, además el protagonista Max Estrella que
no tiene suerte en nada y que [...] pero eso le permite ser más
vidente que [...] de los demás.

In the eyes of the student, a sentence is the upper limit for expressing
information, one in which the link between one clause and the next is, in
these cases, the [...] as a marker of addition. [...] the
text becomes a one-dimensional chain of [...] that are not [...]
into the textual [...] and bears witness to the students' lack of command
of the necessary variations in level that [...] structured text
should possess [...] 2006: [...]

These three [...] we have chosen are not isolated cases, as various
studies about the command of [...] in pre-[...]
students' [...]. This [...]
fact that [...]
matics [...] author's intentions [...]
other words, [...] order [...] master [...]
take [...] consideration [...] wide range of aspects: [...]
that are coordinated [...] their [...]
are [...]

Syntactic restrictions with respect to the number of units that [...] may
relate to - [...] paratactic [...] can affect two units (adversatives)
and others can affect more than two (copulatives, [...] - and the
various syntactic levels at which they can intervene mean that [...] are as
difficult for non-expert writers to manage as [...] and subordination
(Di Tullio 2005; Schleppegrell 2001; Clachar 2003). Examples (4)
and (5)¹² illustrate the fact that certain coordinating [...] such as
the locution tanto ... como and the [...] ni, enable [...] coordination
(Camacho 1999: 26-38):

(4)
Tanto los consumidores, como las [...] como las mercancías, se consideran
caracterizados, además, por su localización geográfica. (e00060)

(5)
La primera referencia que se nos ocurre para demostrar la creciente conciencia
ambiental de los ciudadanos no es el número de contenedores en las calles para
recoger [...] o vidrio, ni la cantidad de [...] instaladas, ni las hectáreas
de nuevos [...] ni los kilowatios de electricidad [...] con el
viento y el sol, por citar los aspectos más [...] del ecologismo [...]
cantidad de noticias, [...]
cos, la radio [...] la televisión. (a00022)

[...] and [...] below demonstrate the [...] hierar[...]
when [...] are established between [...] types of components:

(6)
Por tanto, y puesto que efectivamente las [...] del FMI y el Banco Mundial
generan consecuencias reales sobre las personas, debemos concluir [...]
FMI [...] Banco Mundial no [...] para el desarrollo, [...] bien, [...]
[...] texts we see combinations of [...]
as alternatives to those [...] are [...]
grammars:

(8)
Tanto si [...] el vino, las [...] o la leña de Torredembarra o de Tortosa,
en Cataluña, o bien [...] Vinaroz y Burriana, en el 'reino' de Valencia, se
trata evidentemente del mismo movimiento de atracción, determinado por las
necesidades de la [...] barcelonesa en los mercados del litoral, y en especial
[...] de sus productos agrícolas. (e00085)

The [...] to the command of more complex discourse lies in the variation
and the interplay of combinations between varying conjunctions and
between syntactic [...] as seen in the grammar of specialized texts.

As mentioned earlier, the use of paratactic conjunctions poses another
challenge for students in that [...] are also used as textual organizers. The
lack of planning in the student texts [...] 92 can be illustrated by the
fact that their starting [...] for [...] connections in discourse is
formed [...] the sentence unit rather than the text unit. [...] do [...]
approach the text as a whole in order to establish relationships [...]
and hypotaxis, as is shown [...] the following [...] from [...]

(9)
Estas características también son [...] a la mezquita de Córdoba. Esta fue
construida en varias [...] o 'fases', es decir, fue Abderramán I, quien en el
[...] VIII hace la [...] la cual consta de un [...] donde se encuentra las
fuentes de abluciones que servían para lavarse los [...]
encontrar el Alminar [...] la [...]
se encuentra [...] que es orientada hacia la meca, y por
último el Mihrab al cual no tenían acceso los fieles. Y el resto de la mezquita
estaba formado por columnas, llamado Haram. Fue en el S. IX cuando Abderramán
II [...] la [...] derrumbando la [...] También por el S. X, intervino
[...] II que la volvió a [...] pero la última intervención fue de
Almanzor, que la [...] lateralmente. Todo este [...] de salas forma la
[...] de Córdoba, uno de los [...]
que destacar [...]
Medina Azahara, S. X, en donde se encuentran los [...] y arcos [...]
herradura [...] todo este arte. (MU/HA/03)

(10)
[...]

(11)
Pero hay mucho más: los residuos urbanos e industriales, el [...] la escasez de
agua, la contaminación de los ríos, el transporte, el tráfico, o la
contaminación acústica, son otros de los muchos [...] abordados por el
periodismo ambiental propiamente dicho que acapara en estos momentos los
principales [...] y titulares.

Since [...] conjunctions carry out the role of discourse organizers in a
written text, [...] have a [...] range that is more restricted than that
found in less formal communication contexts. These uses also reflect the
first kind of difficulty we pointed out in a previous section (5.1), the
syntactic difficulty involved in handling various levels of coordination in
which paratactic conjunctions intervene. This can be illustrated [...] the
following example, where the connector y is highlighted in lower case when it
is used as a sentence conjunction, and in upper case when it is employed as a
textual [...]

(12)
Si se mantienen las relaciones comerciales y se impide que los déficit fiscales
de los [...] ricos desalienten las inversiones en otras partes y si los [...]
de ingreso alto mantienen una tasa de crecimiento elevada y estable, la [...]
mundial no disminuirá y se evitará que los [...] ricos cedan ante las [...]
a causa [...] la [...] de tasas elevadas de [...]

[...] as far as discourse [...] is concerned, the text as a whole
allows us to consider both coherence in the information [...] in the
text and the linguistic [...] chosen. Examples [...], (11) and [...] also
show certain schemes that characterize the grammar of paratactic [...]
features that can be [...]
identified when [...] with electronic corpora. The concordances
offered [...] the computing tool used enable us to determine the
combinations in which paratactic [...] are used most [...] in other
[...] we can [...] not [...] the levels at which [...] are used, but also
the types of units that appear before and after them and their [...]
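The kind of concordance reading described here, looking at what stands before and after each occurrence of a paratactic conjunction, can be sketched in a few lines of Python. This is only an illustrative keyword-in-context routine under stated assumptions, not the Bwananet tool or its interface: the function names, the tokenizer and the window size are hypothetical, and the sample string simply joins two clauses quoted elsewhere in this chapter (from examples (21) and (9)) so that both a sentence-internal y and a sentence-initial Y occur.

```python
import re
from collections import Counter


def kwic(text, keyword, window=4):
    """Keyword-in-context rows: (left context, hit, right context)."""
    # Words and punctuation marks are kept as separate tokens.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


def is_sentence_initial(left_context):
    """True when the hit opens the text or follows sentence-final punctuation."""
    return left_context == "" or left_context.split()[-1] in {".", "!", "?"}


def right_neighbours(rows):
    """Count the unit that immediately follows each hit (e.g. 'y también')."""
    return Counter(right.split()[0].lower() for _, _, right in rows if right)


# Sample assembled for illustration from two clauses quoted in the chapter.
sample = ("Pedro el Grande convirtió a Rusia en una gran potencia occidental "
          "y también extendió la influencia rusa en Oriente. "
          "Y el resto de la mezquita estaba formado por columnas.")

rows = kwic(sample, "y", window=3)
for left, hit, right in rows:
    print(f"{left:>30} | {hit} | {right}")
print([is_sentence_initial(left) for left, _, _ in rows])  # [False, True]
print(right_neighbours(rows))
```

Tallying the right-hand neighbours over a whole corpus would give counts of combinations such as y también or y además; whether a given hit works as a sentence conjunction or as a textual organizer still has to be judged by reading the concordance lines.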
5.3 Common schemes [...]

[...] taken [...]
conjunctions appear in discourse and [...]
characterize the [...] they play. It is highly common for paratactic
connectives to appear at the beginning of a sentence in these texts or, to draw
attention to the thematic content that follows, at the beginning of a
fragment of text:

(13)
Y aun cuando nuestro objetivo no se centre en el análisis de la OMC, cabe
destacar el hecho de que integra bajo una misma estructura tres grandes
Acuerdos (el GATT, el GATS y el TRIPS) y dos mecanismos internacionales (un
entendimiento relativo a las normas y procedimientos por los que se rige la
solución de diferencias y un mecanismo de examen de las políticas comerciales)
y además prevé su cooperación con otras organizaciones internacionales
especializadas, entre las que destacan el Fondo Monetario Internacional y el
Banco Mundial. (e00029)

or to stress [...] organization of the information provided:

(14)
En Italia está [...] de un sistema de medidas y subvenciones
para instalaciones de agua [...] que cubrirá el 30% del
valor de la instalación. Y, como último [...] en [...] existe una gama
amplísima de [...] fundamentalmente destinadas a fomentar las [...] solares en
las nuevas construcciones [...] del 80% del valor de la instalación al 11%)
o a mejorar los sistemas energéticos de las [...] con energía solar (subvención
del 30% del coste de la obra), [...]

or to [...] the [...]

(15)
Este proceso, y sólo este, mantiene toda la vida en la Tierra a través de las
cadenas alimentarias (nada [...] que un mecanismo de transformación de [...]

(16)
[...] los medios de comunicación no han entendido en absoluto esta [...]
o, [...] dicho, se [...] a entenderla. (a00022)

[...] to assess [...] content [...]

Table 8.3 Common [...] of use in the [...]

Patterns of use: units with which [...] are combined; number of occurrences
in IULA-CT and in Corpus 92

y
  Reinforcement of addition:
    asimismo
    además 19
    también 32
    más aún 1
    lo mismo ocurre
    [...] 12
    así sucesivamente
    de [...]
  [...] of information:
    más concretamente 2
    de manera [...]
    solo este
  Others:
    como consecuencia de ello
    por tanto 6
    aun cuando
[...]
  Reinforcement of addition:
    incluso
    aun
    [...] 4
    [...] 2
are combincd
que por el conirario

que además 2
también 13
Others: que también ;3
un rnínimo de más bien
tan solo incluso
por consiguiente
Emphasizing of information:
o/ o bien Adversative reinforcement:
y esto es lo que nos
l
interesa
aún l
mejor dicho 3
simplemente
por el contrario
únicamente
Addition: que solamente
incluso 3
incluso más aunque Contrastive reinforcement:
a lo sumo de todos modos
de hecho
pero Contrastive reinforcement: también 4
además 6
más Subjective assessment:
hay mucho más obvio
también 13 obviamente
incluso por razones obvias
sobre todo 5 razonablemente
aun así la verdad es que
es obvio que
Subjective assessment: es evidente que
de hecho sí es cierto que
la verdad es que ciertamente
lo cierto es que sin duda
también es cierto que desde luego
es cierto también que seguramente
es evidente que naturalmente
lo que está claro es que en realidad
añadir
en realidad 2
1
Textual
Markers of time:
por otro lado
(casi) nunca
todavía
sino Contrastive reinforcement: a veces 2
muy al contrario a menudo
habitualmente
normalmente
actualmente
possess the same semantic
the called
siempre
Insofar as of the information are concerned, the
Textual
inicialmente follovving examples show combinations in this respect:
últimamente
(21)
Pedro el Grande convirtió a Rusia en una gran potencia occidental y también
extendió la influencia rusa en Oriente, intentando establecer relaciones comer-
Others: ciales directas con la India, hacer la paz con Persia y asegurar una salida al Mar
solo sea Caspio, aunque esto último se perdió después de la muerte de Pedro el Grande.
(e00118)
We can see that in expert discourse the following patterns of use prevail:
(22)
De esto se deduce que la adopción de políticas macroeconómicas acertadas no
a) patterns showing the strengthening or reinforcement of the semantic sólo es ventajosa para el crecimiento y la inflación, sino que además facilita la ori-
content (addition, contrast, alternatives) ofthe paratactic conjunctions; entación hacia el exterior. (eOOOOl)
patterns of organization of the information (textual structuring pat-
tems); and (23)
Se pretendía que la asunción de obligaciones dependiera del nivel de desarrollo
e) subjective assessment
de los Estados en relación a los distintos sectores, pero resultaba muy difícil
plasmar y llegar a la concreción de tales obligaciones por la falta de estadísticas
Let us take a closer look at each of these grammatical schemes. In written
fiables, o incluso en algunos casos por su inexistencia. ( e00029)
academic texts the uses made of conjunctions lend greater
importance to semantics and, as a result, they determine the semantic rela- another kind of syntactic scheme is what has been named subjective
tionships between the clauses that are linked. On the other hand, in spoken assessmen t expressions ( Ciapuscio 2003b). In these pattems, the conjunc-
paratactic take on more varied pragmatic values tions personalize and assess the information when combined vvith markers
2001; Clachar , sorne of which mav be unsuitable in of subjectivity or modality:
written register. The greater lent to the sema{1tic aspect
tactic in formal written discourse explains the (24)
Las noticias ambientales, y el resto de la actualidad, básicamente de
or reinforcement of each individual
personas, empresas, instituciones y múltiples
as can be seen in these a estos temas, pero no exclusivamente, porque la interdisciplinariedad
(18) intrínseca a lo ecológico o ambiental afecta abrumadoramente al ejercicio peri-
de todos modos, es interesante tener en cuenta que la velocidad del viento odístico, aunque es obvio que las diferentes áreas informativas tienen no pocos ter-
aumenta la altura y que veces es suficiente elevar 10 metros la de ritorios comunes: el las decisiones del los tribunales de
molino para que la que nos sea doble. los etc (a00022)
(19) (25)
Nuestro razonamiento no afecta mundo está cambiando: el se agota irremisiblemente
ificar considerablemente que asumamos lo lo mismo sucede con el carbón o el gas
hemos visto mientras se mantenga la
(26)
y
como un
Hmnanos
y, además,
assessment of the information

of
in the written
n2":drnJ11' can teachers in
texts of non-experts.
As we can see in Table 8.3, the non-expert discourse of students features semantic reinforcement of addition and contrast and bears a prevalent use of the adverb también combined with the copulative conjunction y, and with the adversative conjunctions pero, sino and aunque. In addition, the use of combinations that highlight organization of the information in the text is more limited in the written texts of learners than it is in experts' written texts. The discourse organizer por último is commonly employed in combination with copulative conjunctions. In the case of other types of conjunctions, their use as a means of structuring the information is rather unclear. Lastly, the least frequent patterns in students' written texts are subjective assessment expressions, due to the fact that for these types of patterns it is necessary to personalize the discourse. In other words, it becomes necessary to assess the information provided based on an acquaintance with the degree of certainty of that information, something that students have yet to acquire. This situation explains the scant use of subjectivity or modality expressions in which paratactic conjunctions intervene, as opposed to their common and varied recurrence in expert academic discourse (for [...] see the patterns in which the conjunction pero is employed in the [...]

In this last section, which presents an [...] of the use of paratactic conjunctions in academic discourse, we have illustrated their combinations in expert discourse as a way [...] the needs of the learners of this type of text. It concerns the most [...] grammar of conjunctions in academic discourse, a grammar of reference for anyone who wishes to have an effective command of this [...] of communication. Along these lines, the aim is to become acquainted with and to [...] command of their patterns of use, rather than [...] concerned [...] the [...] of use of the grammatical units in certain communication contexts, precisely because they define the discourse function and [...] of the linguistic elements we are [...]

Conclusions

In this article we have offered a [...] proves useful for [...] use of [...]

[...] should be devoted to a given lexical item or gram[...]

Thus, this information - taken from a 'non-exemplary' sample of academic discourse - can help in planning the content of a pedagogical grammar. Likewise, if these grammatical issues are placed in the context of the communicative aims and rhetorical strategies of the [...] they take on a vital role in a more overall analysis of the conventions of discourse genre.

The needs of students become more obvious when their production is compared with that of experts. Taking into consideration samples of texts taken from manuals, informative articles, encyclopedias, etc. can provide guidance as to what course needs to be followed in order to achieve the communicative competence that one needs in an academic context. The information taken from the IULA-CT corpus that we used to make the contrastive analysis in these pages is of great use in this respect. Nevertheless, it should be taken into account that the learner's Corpus 92 constitutes another specialized domain and, therefore, the need for communicative competence (or the uses of the conjunctions in this highly specific domain) may not be exactly the same.

With this pedagogical aim in mind, we looked at the use of paratactic conjunctions in a subsection of Corpus 92, humanities, on the one hand, and in a sample of Spanish from economics and environment texts from the IULA-CT specialized corpus, on the other. The study shows the inappropriate use made by pre-university students of these coordinating conjunctions with regard to certain aspects: when it comes to handling syntax, that is, possessing a command of the paratactic relationship in microstructure (sentence) units and macrostructure (text) units; when it comes to using them as elements that organize the discourse in formal register that is characteristic of academic discourse; when it comes to common patterns of use in which they are combined with other textual units that characterize this type of discourse. The data offered by computerized corpora illustrates the types of relationship that are established [...] paratactic conjunctions, relationships that involve a syntactic difficulty that poses students a degree of difficulty similar to that encountered when [...] use hypotactic conjunctions. [...] which the semantic value these units assume in [...] forward in their texts.
academic
the lists has
the 'concordances' 1991) found in each
conjunction in their real contexts ofuse and by distinguishing these occur-
rences in various contexts: at the start of a paragraph or a sentence, for
An analysis of this type enables patterns, 'models of use' and regular expressions possessing a particular function in the discourse to be identified. Describing common syntactic schemes, otherwise known as prefabricated patterns (Granger 1999) or chunks (Bybee 2002), proves to be beneficial for language teaching and learning in differing communication contexts.

Granger (2004) has underlined how valuable the line of research known as [...] (Computer Learner Corpus) has been for progress in studies on the acquisition of second and foreign languages. In this article we advocate expanding the sphere of action for this type of research to encompass the development of communicative competence, not only in L2 or foreign languages, but also in L1 or mother tongues.

Paratactic                   Learner [...] 92:    [...] DISCOURSE [...]
conjunctions                 humanities           the environment (a), economics (e)

[...]                        4121 (2.892%)        a: 33,880 (3.189%)
                                                  e: 27,249 (2.496%)
ni                           91 (0.063%)          a: 337 (0.031%)
                                                  e: 43 (0.003%)
tanto ... como               62 (0.043%)          a: 540 (0.050%)
                                                  e: 396 (0.036%)
o / o bien / u               430 (0.104%)         a: 7829 (0.737%)
                                                  e: 5830 (0.534%)
(o) bien ... (o) bien /      7 (0.004%)           a: 85 (0.008%)
bien ... o                                        e: 87 (0.007%)
pero                         535 (0.375%)         a: 1287 (0.121%)
                                                  e: 1497 (0.137%)
sino                         59 (0.041%)          a: 527 (0.049%)
                                                  e: 959 (0.087%)
aunque                       138 (0.096%)         a: 582 (0.054%)
                                                  e: 832 (0.076%)

Notes

1 [...] y estudio del Corpus 92 [...] escrita por aspi[...], directed [...] Dr María Paz Battaner, was ini[...] Pompeu Fabra [...] Barcelona (1993-1994) and [...] (1994-1997) [...] the Dirección General de Investigación Científica y Técnica (DGICYT PB93-0392) of Spain.
2 92 comprises written tests in academic ui:,uµ1Jtut" The frequency lists for each corpus are provided the section Unidades
entrance exams. de contexto of the tool used. This table shows the paratactic con-
3 The shortest exam answers contain around 300 words and the whose no for the consultation tool
ones can be in excess of900. the units that pose of identi-
4 92 has been included in the de KP:tPrr,nr:;1.n fication in the word count the conjunction rnas, which in
ofthe the consultation made is mixed with the adverb . It should be
out that Bwananetmakes no distinction between the conjunction
5 and the adverb ni. In any event, adverbial uses are the least
Encarna Atienza and we consider for statistical the of error
Torner Castells. generate is minimal. We have
esi1ec;1at:izaao contains written texts in five
m·1au1rYJ'""'P in the of the forms tanto and
domains. The directed
10
12
nomics.
13 In Table 8.3 we have included only those paratactic conjunctions that
lend themselves to combination with other grammatical units, i.e. the
connectives that coordinate two or more members paratactically in con-
junction with other connectives or discourse markers. Consequently, the
distributive conjunctions tanto ... como, bien ... bien have not been
included in the table, as they do not allow these combinations.

René
Pontificia Universidad Católica de Chile

Introduction
This research is part of the studies conducted on specialized discourse;
more specifically it is related to the written production of scientific articles.
The term 'specialized discourse' is nowadays widely accepted by language
scientists. Generally, the concept of specialized discourse is conceived in a
global and comprehensive way (Parodi 2005a), acknowledging a continuum
within the concept that includes texts that range from high to low special-
ization and belong to a variety of text types.
The text type traditionally studied in scientific specialized discourse has
been the scientific research article, which is considered as a prototype in this
kind of discourse practice (Sager et al. 1980; Bazerman 1988; Swales 1990,
2004; Salager-Meyer 1991, 1992; and Martin 1993; Hyland 1998,
1999, 2000; Martin and Rose 2003). This text type has recently received
greater attention in and has been studied from various perspectives
(Calsamiglia 1998; Bolívar 2000; Ciapuscio 2000, 2003b; Cassany et al. 2000;
Moyano 2000; Ciapuscio and Otañi 2002; López 2002; Mogollón 2003;
Martín 2003; Gotti 2003).
These researchers generally conceive of a scientific research article as the
written text published in a specialized journal, whose aim is to inform the
discourse community on the results of scientific research performed by
applying the scientific method, and which requires a clear rhetorical struc-
ture commonly following the IMRD model (Introduction, Method, Results,
Discussion), proposed by Swales (1990). However, as stated by
Swales (2004), the structure may vary according to the characteristics of
each scientific discipline.
Research in this field has been performed from linguistic-
textual, rhetorical and socio-cognitive perspectives, using model text
and criteria. The more
for special and didactic
LATENT
[...] sciences and social sciences
[...] first part of this [...] be [...] the research, involving [...] course of science, collocation semantics and latent semantic analysis. The methodological framework is presented in the second part. Finally, after the [...] of the results, the chapter ends with some conclusions.

Theoretical background

1.1 The research article: a [...]

We are aware that in science there is not always consensus about the denomination of the objects of study, mainly due to the focus and delimitation of the authors in their effort to conceptualize them. Normally there are multiple approaches because of diverging theoretical assumptions. The concept 'specialized discourse' is no exception. It has received many names: academic discourse, special discourse, professional discourse, technical discourse, institutional discourse, and so on. Some of them actually do [...] while some others do not.

Likewise, it is not easy to reach a certain terminological order and attain a more or less homogeneous vision (Ciapuscio 2000; López 2002) as the task of determining whether a text can be classified as a specialized text or a general text becomes a theoretical and descriptive problem (Schroder [...] Parodi 2004). [...] the view is in favour of a text continuum (Lakoff [...] distributed from a highly specialized domain toward another more [...] and information-oriented extreme [...] 1982; Schroder 1991; [...] and Martín 1994; Peronard 1994, [...] Cabré 2002; Parodi [...]

On the other hand, Gotti [...], also when [...] the multi-dimensional nature of the specialized discourse, states that there is no [...] among different specialized [...] He argues that variations [...] not [...] lexical connotations, but often influence other [...] textual and [...] semantic and [...] of various types of specialized discourse. Therefore, the differences between discourses allow us to level [...] differences [...] discourse since, for [...] the mere presence [...] Cabré 2002; Ciapuscio 2003b; Gotti [...]

As said before, the scientific research article (SRA) is a written text, with a rather rigid structure (at least, when produced in some empirical disciplines): each of the traditional sections (e.g. IMRD) is preceded by a title, the names of the authors and the institutions where they work as researchers, and an abstract, whose aim is to briefly inform the reader on the content of the total SRA, to help them decide if they consider it useful to read (Moyano 2000).

Swales (1990) argues that some characteristics of scientific articles may be displayed through a wide range of disciplines. It is said they are repeated to warrant the existence of a macro-genre. Nevertheless, we nowadays know that articles may vary in the degree of standardization and style from one discipline to another: sciences known as 'hard', 'exact' or 'physical' follow the rigid pattern discussed above, while in social sciences there are some journals that have adopted the common pattern with different degrees of success. There are still other journals that resist adopting a fixed organization (Moyano 2000; Mogollón 2003; Swales 2004). On the other hand, due to the international process of indexation and accreditation systems, scientific journals have increasingly paid more attention to the standardized production of articles, following general and more common norms - at least in format.

Various previous investigations have focused on different parts of scientific research articles. The introduction, for example, has been analysed in depth by Swales (1990), introductions and conclusions by Gnutzmann and Oldenburg (1991), conclusions by Ciapuscio and Otañi (2002), abstracts by Salager-Meyer (1991) and Bolívar (2000), abstracts and introductions by Martín (2003), and the introduction and discussion sections by Dudley-Evans [...]. All such research has been conducted from linguistic-textual, rhetorical and socio-cognitive perspectives and most of them from a comparative inter-language [...] in model of texts.

We will now refer in more detail to two aspects of the scientific research article: the abstract and the [...]

1.1.1 The abstract in research articles

The abstract is a brief text used to allow the reader to [...] and [...] the article's main content. This text is located between the title
and lacks discussion, references, [...] in no more than 300 words, the scientific value of the research work should be evidenced. For this reason, the [...] essential information on materials and research methods used, and the most relevant conclusions of the work are stated [...] 1985; Swales [...] López 2000).

From the semantic point of view, abstracts correspond to a globalization (condensation of information in smaller units) and conceptualization of the network of contents of the text. In this sense, we propose to name this globalization process macrosemantization, as the abstract becomes the textualization that abstractly represents the total meaning of the article's content.

According to van Dijk and Kintsch (1983), the fact that abstracts are placed at the beginning of the reading in all texts helps the reader to build a hypothesis on the topic, in such a way that the following sentences may be processed top-down. Thus, the abstract as a product is nothing but the result of an abstraction process where the information submitted by the document of origin is synthesized, maintaining its essential parts and contents (López 1997).

A more updated form of study, though it may be little known in Spanish, is one that focuses on abstract research from a computer-oriented perspective. The works performed by researchers of the Cognitive Sciences Institute, University of Colorado, Boulder, illustrate this kind of investigation and their results are interesting because they offer new approaches. The team working at this centre has [...] theoretical and empirical studies aimed at proving that vectorial [...] LSA, represent an inductive [...] of the construction process performed by human beings (http://lsa.colorado.edu/) [...] et al. 2000). Another group of researchers developing this line of studies is the team from the Institute of [...] of Memphis, USA. [...] have also [...] LSA tools as part of a virtual tutor called AutoTutor [...] et al. 2001).

[...] the abstract must not only fulfil the function of providing [...] elements that stimulate the reader to consult the original documents, but should also facilitate a first level of comprehension of the [...] addressed.

[...] after the [...] a list of [...] words for each article. These words are semantic clues that must capture the [...]vant [...] determine [...] is to index articles in relation to the [...] matter. [...] assist the fast web search of relevant articles for the reader, when there [...] a specific need [...]

For Hartley and Kostoff (2003), [...] point to the main concepts and delimit the research field of interest. [...] keywords enable the reader to decide when the article contains relevant material of interest, provide readers with a cluster of convenient terms to be used when searching the internet to locate other topic-related material, help editors/indexers to group related material, enable researchers to exchange documents on discipline subjects and connect specific topics of concern to topics in higher metalevels, especially with reference to lists of keywords that may identify a particular research area. For example, if a list of keywords from scientific research articles on a specific discipline is made, the higher frequencies would point to the topics more commonly dealt with in this discipline, which in turn would indicate higher levels of abstraction and would account for the area's metatopics (Hartley and Kostoff 2003).

Generally, the study conducted on keywords is related to studies whose aim is to extract information from texts through computer tools that mainly function with mathematical-statistical methods and algorithmic models. The purpose of such studies is to obtain relevant documents considering the user's need for certain information. For example, in the case of scientific articles that must be indexed and lack keywords or abstracts, the various methods or models will allow, through different techniques, the provision of either lists of the most relevant words in the texts, or otherwise a text that will summarize the information of the documents of origin (Turney 1997, 1999).

1.2 The study [...]

The analysis of lexical-semantic relations in this research is based on the 'semantic similarity' notion that goes back to the contribution by Russell (1937) with his [...] of classes and [...]. Nowadays, this notion is implemented as a probabilistic measure or exchange degree of a word [...] another in a particular context, given that semantic similarity is conceived under the assumption that words [...] similar will behave in similar ways (Manning and Schütze 2003; Matsumoto 2003).

While some authors understand semantic [...] as the extension of synonymy, others consider it in the sense of two words [...] the same
[...] contain. [...] in turn con[...] their [...] information is not known, such a type of approach is [...] of words' method (Jurafsky and Martin 2000).

From a strictly linguistic perspective, this notion is compatible with that of collocational meaning developed by the English functionalist school (Palmer 1980). As we know, such a notion is based on the contextual theory of meaning, where a word acquires its meaning through the words that accompany it (Palmer 1980; Stubbs 1996, 2001). This concept of meaning has given rise to computer corpus semantics, which by means of computer tools enables the performance of empirical studies of meanings using huge textual corpora (Halliday 1991, 1992; Sinclair 1991; Stubbs 1996, 2001).

Stubbs (2001) states that it is possible to study lexical-semantic relations starting from the study of word collocation and frequency. Based on concrete examples, he promotes the observational methods of corpus semantics, arguing that data obtained from the corpus provide evidence with respect to denotative and connotative meaning. Nevertheless, the fact that most frequency studies in corpus linguistics are limited to the isolated count of the more frequent units, hiding various interesting aspects related to null, minimal or intermediate frequency units, has been criticized (Rojo 2002).

We believe that to quantitatively study lexical-semantic relations in a corpus, we cannot focus only on the highest frequencies of occurrences. It is highly relevant to pay attention to the complete range of occurrences, and even more to the co-occurrences of some features.

Bearing this in mind, we decided to use a vectorial method based on latent semantic analysis that recognizes, [...] dimension reduction, the semantic similarities existing among the linguistic or text units, among [...] and also between words and documents. The most relevant idea as to semantic [...] is that results may be explained through the degree of contextual exchange, or the degree by which a word may be substituted [...] another within a context. From an [...] perspective, measur[...] is conceptualized [...] models of vectorial type, as a [...] of vectors, in order to determine the [...] of two [...] words [...] as vectors in a multi-dimensional or multi-vectorial space. A matrix is built to do this, representing numerically the co-occurrence of words [...] a unit called 'document' [...] sentences or [...]

LSA does not [...] information or [...] that is, it is [...] method of mathematical [...] that attains inductive effects, [...] an adequate number of dimensions to represent objects and contexts (Landauer et al. 1998). The LSA method extracts its meaning representations of words and paragraphs exclusively from text [...] mathematical-statistical analysis. Its knowledge lacks anything that might come from perceptual information on the physical world, instinct or experience generated by corporal functions, feelings and/or intentions. Therefore, its representation of meaning is partial and limited, as it does not use either syntactic, logical or morphological relations. In spite of the latter, Landauer (2002) explains that, at least for the English language, 80 per cent of the potential information in language lies in word selection, without any consideration of the order in which they are placed.

In addition to this representation without syntax, there is an idea which states that in the use of language, represented by great quantities of corpora, there are weak semantic interrelations between words empowered by the so-called method of dimension reduction Singular Value Decomposition (SVD). In this sense, the metaphor underlying the term 'latent' means that, through dimension reduction obtained using SVD, the adequate representation of existing relations between words in the textual corpus is obtained, and such relations are very weak due to the great number of words (Landauer and Dumais 1996, 1997; Landauer et al. 1998).

The procedure performed when using LSA to represent texts in multi-dimensional semantic spaces is the one proposed for the Latent Semantic Indexation (LSI) method by Deerwester et al. (1990). This method, originally applied in the information recovery area, has been used with theoretical and methodological [...] in psycholinguistics during the past years (Deerwester et al. 1990; Landauer and Dumais 1996, [...] Landauer et al. 1998; Kintsch 1998, 2000, [...] Landauer 2002; Quesada et al. 2002; Quesada 2003). Additional information on the theoretical and methodological discussion of this method may be found in Spanish in Venegas (2003, 2005, 2006) and Gutiérrez [...]

[...] speaking, this method works as follows. The first step is to build a matrix of co-occurrences (X) from texts, where each column represents a document or co-text (d), and each row [...] represents [...] word from the text [...]. Each cell contains the [...] in [...] the row word appears at the text page shown [...]. Cell access
in a great number of documents and [...] associated with a type of document [...]

The second step is to apply the SVD method to the resulting matrix. It decomposes the rectangular matrix (the one that considers different entities in rows and columns, for example, terms per document) into products of three other matrices (words for words (P), documents for documents (D), and one of singular values (S)), that is, a matrix of lesser dimension (X'), that represents the original matrix (see Figure 9.2).

To facilitate understanding it is useful to interpret SVD in geometrical terms. This means that the reduced matrix row and column values are taken [...] between vectors is calculated [...] cosine measures, the values of which range from 1, for vectors with the same direction (that is, what is being measured is equal), to 0 for [...] vectors (that is, they are perpendicular in the multi-vectorial space, which means that what is measured is completely different). Values must be normalized to make comparison between them more effective. If this is not done, longer vectors might have an unfair advantage over shorter vectors. In addition, normalizing cosine values allows them to be calculated as simple [...] (multiplication of vectors) (Deerwester et al. 1990; Landauer et al. 1998; Manning and Schütze 2003).
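The steps just described (a word-by-document co-occurrence matrix, its reduction through SVD, and cosine comparison of the resulting vectors) can be illustrated with a minimal sketch. It is not the tool used in this study: the four toy 'documents', the choice of k = 2 retained dimensions, the use of raw counts without any weighting, and the use of the NumPy library are all assumptions made purely for illustration.

```python
import numpy as np

# Toy corpus (hypothetical): each string plays the role of one 'document'.
docs = [
    "semantic analysis of scientific articles",
    "latent semantic analysis of texts",
    "keywords and abstracts of scientific articles",
    "cell biology of marine species",
]

# Step 1: co-occurrence matrix X, one row per word and one column per document.
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[row[w], j] += 1  # raw frequency of the word in the document

# Step 2: SVD, keeping only the k largest singular values (dimension reduction).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # number of retained dimensions; arbitrary for this toy example
X_reduced = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of X

# Document vectors in the reduced space: one row per document.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(a, b):
    """Cosine of the angle between two vectors (independent of their length)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Normalizing each vector to unit length turns the cosine into a plain
# dot product, so all pairwise similarities come from one matrix product.
unit = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
sims = unit @ unit.T

print(np.round(sims, 3))
```

In the printed matrix, a value close to 1 corresponds to vectors pointing in nearly the same direction and a value close to 0 to nearly perpendicular vectors, in line with the interpretation given above; the normalization step is what prevents longer documents from gaining an advantage over shorter ones.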
X = ([...] matrix)
      d1    d2    d3    d4
p1    1     3     2     4
p2    2     4     3     [...]
p3    3     4     2     1

Application of [...]

X = (transformed matrix)
      d1    d2    d3    d4
p1    0.1   0.2   2     0.6
p2    0.2   0.3   0.01  0.3
p3    0.2   0.1   0.02  0.2

Figure 9.1. First step of LSA method

X
      d1    d2    d3    d4
p1    0.1   0.2   2     0.6
p2    0.2   0.3   0.01  0.3
p3    0.2   0.1   0.02  0.2

P
      p1    p2    p3
p1    1     0.2   2
p2    0.2   1     0.01
p3    0.2   0.1   [...]

D
      d1    d2    d3    d4
d1    1     0.2   2     0.6
d2    0.2   1     0.01  0.3
d3    0.2   0.1   1     0.2
d4    0.1   0.3   0.4   1

X'
      d1    d2
p1    0.1   0.2
p2    0.2   0.3

Figure 9.2. [...] of SVD

2. Methodological framework

2.1 Type of study and variables

As we stated in the introduction to this [...] research [...] are: a) to compare, using a vectorial analysis computer tool based on a corpus called Latent Semantic Analysis, the lexical-semantic relationship of three text variables present in scientific research articles that are: keywords, abstracts and the contents, and b) to compare, starting from the lexical-semantic similarity values of the text variables, a sample of scientific research [...] from two science areas (biological sciences and social sciences).

In order to [...] with such objectives, we performed [...] exploratory-[...] non-experimental [...] a [...] methodology. It is [...] because there are no studies concerning scientific writing conducted in Spanish using [...] tools to calculate lexical-semantic [...] with LSA. It is [...] because it [...] about [...] incidence and values that show the relations between the text variables to be [...] two areas with [...] to their lexical content.

Variables considered in this work are of two types: text and discipline. Text variables are: a) [...] a group of words or nominal phrases, that in [...] the main [...] of the scientific research article [...] in a lexical-semantic way the [...] scientific research [...] c) the contents: texts made up of [...] rhetorical-structural [...] where the [...] followed one or
4908
8395
4982
sciences. Those two areas been chosen because of their greater pres- 6156
RCHA5 2001 2735 13,419
ence in indexers used for corpus collection. Exact or pure
6 RCIM.5 2002 20(2) 1780 8876
sciences are another area of science: RClL'\2 2002 20 (3) INE2 2002 26(2) 8034
4251
however have not been taken into account in the '·'-'"HJ<H 8 RCHN6 2000 NS3
anides these areas are m Lúic:iw'" 9 RCHN7 2000 NS7 2003 186 7150
10 RCHNl 2002 5522 NSlO 2003 188 5436
11 At'VIV2 2001 33 (1) 2609
2.2 Hypothesis 12 At\1Vl5 2002 34(2) 3519
The research hu~r.;hp~'"º are the following: Total 45,959 73,838
Average 3829.9 7383.8
Hl: When comparing lexical-semantic similarity indexes between text vari- Std.Dev. 1177.8 258.4
ables (keywords-abstracts, keywords-contents and abstracts-contents) frorn
scientific research articles of two areas of knowledge, the abstracts-contents Latindex electronic indexers that cornply with intemational indexation crite-
relation will show higher similarity values than the keywords-contents and ria or . In addition, are texts
keywords-abstracts relations. that show the text variables for research. Annex 1 the
ARTICO corpus, considering the reviews, the number
H2: There will be significant differences between the two areas ofknowledge word total, in accordance with the scientific area of
researched when comparing lexical-semantic similarity indexes between text Table 9.1 presents the research corpus used to quantify semantic similar-
variables of the scientific research article in a specialized semantic space ities. For more details, see Annex 2 and bibliographic refer-
(keywords-abstracts, and abstracts-con ten ts). ences of the research sample).
2.3 The 2.4 semantic

To quantify lexical-semantic similarities between the text variables key- To be able to among the text variables under study,
words-abstracts, and abstracts-contents, a research LSA is used as has already been described. A semantic space of 294 dimen-
corpus of22 scientific research articles (1 was deter- sions called ES-ARTICO was built from the ARTICO corpus.
mined: 12 from biological sciences and 10 from social sci- Then we calculated for each article of the research corpus, the infor-
ences (73,838 mation ofES-ARTICO, the LSS of each '"rith respect to the au,,cu<cL.
This research corpus rnrr•'ff>rHH1 and the LSS of each with respect to each of the artide.
97 per cent confidence, The LSS ofthe abstractwith respect to each content was also thus
calcula te d. statistical tests
eles benveen were there were
nificant differences benveen the variables and benveen science areas.
Sciences,
Results
and social sciences
Information Sciences, Social and Cultural
3.1 variab/,es
Lc.vuvuuL~. Soc1oloov Cultural and Social m
2000 to 2003 in mainstream reviews in each science area, that is, reviews avail- The
able to researchers in ScIELO Electronic LSSs h"''""'""n
Table 9.2 [...]

Percentage range                                0-25%   25-50%       50-75%        75% +
Lexical-semantic similarity degree (Dº_LSS)     Low     Medium-low   Medium-high   High
Lexical-semantic similarity threshold value     .0134   .1781        .2915         .4976

[...] value for each [...] was [...] using those averages, in order to establish a parameter that would allow assessment per area and compare lexical-semantic similarity indexes (LSS_I) between areas. Table 9.2 shows the relationship between each percentage range of the quartile and the degree of lexical-semantic similarity (Dº_LSS) and the threshold value obtained for all the LSS_I between text variables, calculated for all the articles of the research corpus. Those values within the range of .0134 and .1780 (four decimal positions are considered to present results) correspond to a low Dº_LSS. Those located from .1781 to .2914 shall have a medium-low Dº_LSS. LSS_I located from .2915 to .4975 shall have a medium-high Dº_LSS. Finally, those LSS_I at or above .4976 shall have a high Dº_LSS. The second statistical procedure was to apply the Kruskal-Wallis test to compare the lexical-semantic relationships between the text variables. Lastly, a Mann-Whitney test detected similarities and/or differences between the scientific domains under examination.

3.2 [...]

Table 9.3 summarizes the results obtained from the comparison of LSS between various texts in biological sciences.

Table 9.3 [...] among variables [...] sciences articles

Article             K+A     A+C     K+C
AMV3 2002 34(1)     .2259   .6196   [...]
GC3 2002 66(2)      .2928   .6218   .2054
GC3 2003 67(1)      .3569   .3472   .1412
RCHA4 2001 19(3)    .2856   .5635   .2003
RCHA5 2001 19(3)    .1468   .4022   .0661
RCHA5 2002 20(2)    .2058   .4324   .1025
RCHA2 2002 20(3)    .2698   .5155   .1845
RCHN6 2000 73(4)    .2180   .4207   .0834
RCHN7 2000 73(4)    .2670   .6366   .2059
RCHN1 2002 75(2)    .2915   .5062   .1602
AMV2 2001 33(1)     .2546   .4601   .1546
AMV15 2002 34(2)    .2988   .4350   .1418
Average             .2595   .4967   .1492

(K = Keywords, A = Abstracts, C = Content)

In relation to keyword and abstract variables in this area, it should be noted that RCHA5 2001 19(3) is the article with the lowest LSS_I (.1468), and this index corresponds to a low Dº_LSS. Conversely, the article with the highest LSS_I in these variables is GC3 2003 67(1) (.3569), a value that according to our segmentation into [...] corresponds to a medium-high Dº_LSS. As to the average index among variables, its value ranks at .2595, i.e. a medium-low Dº_LSS.

[...] the [...] between the abstract and content variables in [...] to establish that article GC3 2003 67(1) presents the lowest [...] (.3472 in Table 9.3), corresponding to a medium-high Dº_LSS. [...] the research article presenting the [...] LSS_I among these variables is RCHN7 2000 73(4), with a value of .6366 that corresponds to a high Dº_LSS. As to the [...] LSS_I value of these variables in [...], it [...] medium-high Dº_LSS.

[...] and content in this area, RCHA5 2001 19(3) [...] the lowest LSS_I between variables, with a value of .0661, which corresponds to a low Dº_LSS. RCHN7 2000 73(4) presents the highest index in this comparison, .2059, which is a medium-low Dº_LSS. The articles in this area present an average between keywords and contents of .1492, thus corresponding to a low Dº_LSS. This average value between the variables places biological sciences articles as those with lower LSS_I between the articles investigated for both science areas. As will be shown later, both abstracts and keywords are less related in articles in this area than in those about social sciences. This might be due to less strictness when writing abstracts, in not taking into consideration some relevant meanings of the content, and to the fact that keyword selection would be less used to macrosemantize the global semantic content of the article than to show more context or metatopical aspects of the research (for example: reference to the research subject matter territory, geographic place description, [...]).

In a general sense, the low LSS_I of the articles of the biological sciences sample may be due to the [...] of a quantification method of similarities to grasp the macrosemantization strategies that go [...] the text content. Therefore, it is [...] to [...] that [...] do macrosemantize the global text meaning, not in terms of intra-text lexical relations but rather in the function of still more abstract macrosemantization relations, exogenous to the text and of eminently [...] and even inter-discourse, character.

Another possible argument that might explain this low LSS, particularly with reference to abstracts, is that the articles, chosen at random, show a greater tendency toward a more descriptive than informative abstract construction. Thus, the fundamental postulates of the original work would be [...] in the abstracts, but not as concrete results of the reflections or [...] stated in the article.
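The quartile segmentation described for Table 9.2 can be sketched as a small lookup. This is only an illustration: the function name and the decision to treat every value below .1781 as "low" (including anything under the observed minimum of .0134) are assumptions, not part of the study.

```python
# Lower bounds of the three upper ranges, taken from Table 9.2.
# Anything below the first matching bound falls through to "low"
# (an assumption for values under the observed minimum of .0134).
LOWER_BOUNDS = [
    (0.4976, "high"),
    (0.2915, "medium-high"),
    (0.1781, "medium-low"),
]


def degree_of_similarity(lss_i: float) -> str:
    """Map an LSS_I value to its degree of lexical-semantic similarity."""
    for bound, label in LOWER_BOUNDS:
        if lss_i >= bound:
            return label
    return "low"


if __name__ == "__main__":
    # Values quoted in the discussion of Table 9.3.
    for value in (0.1468, 0.2595, 0.3569, 0.6366):
        print(f"{value:.4f} -> {degree_of_similarity(value)}")
```

Applied to the averages of Table 9.3, the sketch gives the same labels as the text: medium-low for .2595, medium-high for .4967 and low for .1492.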
3.3 [...]

Table 9.4 presents the results obtained for comparison of LSS between the text variables to be studied in social sciences.

Table 9.4 [...] between [...] text variables

Article             K+A     A+C     K+C
AMB1 2001 6         .2479   .1693   .0507
AMB11 2001 6        [...]   .6572   [...]
AMB14 2001 6        .1020   .3830   .1266
AD14 2002 5         .0891   .5106   .0707
CHU2 2002 34(1)     .3580   .4831   .2003
INE4 2001 25(2)     .3224   .5900   .2608
INE2 2002 26(2)     .4762   .6307   .3762
NS3 2003 184        .2183   .5019   .1470
NS7 2003 186        .2405   .5965   .2128
NS10 2003 188       .2517   .7628   .2730
Average             .2532   .5249   .1977

Results of the comparison between keywords and abstract variables show that article AD14 2002 5 has the lowest LSS_I (.0891), corresponding to a low Dº_LSS. Conversely, research article INE2 2002 26(2) has the highest LSS_I (.4762), corresponding to a medium-high Dº_LSS. It is worth noting that this article has the highest LSS_I between keyword and abstract variables among all articles investigated in both science areas. The average result of the LSS_I between these variables reaches a value of .2532, corresponding to a medium-low Dº_LSS.

As to the comparison between abstract and content in social sciences, the one with the lowest LSS_I is AMB1 2001 6 with .1693, corresponding to a low Dº_LSS. [...] article NS10 2003 188 has the highest LSS_I between abstract and content variables in the area, with a value of .7628. The average LSS_I between variables is [...] to an average Dº_LSS.

In the [...] between [...] and content variables, article AMB1 2001 6 has the lowest LSS_I between variables (.0507), corresponding to a [...] article INE2 2002 26(2) has the highest [...] and content variables in social sciences. As [...] between these [...] it is [...].

[...] between LSS results of text variables in the two scientific areas studied

As shown in Graph 9.1, LSS averages between text variables make up [...] pattern of semantic [...] to both scientific areas.

Graph 9.1 Comparison of average indexes of lexical-semantic similarity between text variables in both areas

                       K+A      A+C      K+C
biological sciences    0,2595   0,4967   0,1492
social sciences        0,2532   0,5249   0,1977

Thus, the keyword-abstract and keyword-content variables have a lower average value than the semantic similarity relations between abstract-content in all articles investigated, independent of the science to which they belong. More specifically, there is a slightly higher average index in the keyword-abstract relationship than in the keyword-content relation, which enables us to suppose a tendency toward relating keywords with abstracts rather than with text contents.

Statistical contrast of such relations allows a confirmation of this idea. Thus, when comparing LSS between text variables (keywords-abstracts, keywords-contents, and abstracts-contents) using the Kruskal-Wallis test, and considering a 5 per cent error, we can establish that in biological sciences there is a statistical difference between the three relations, that is, the LSS between keywords and contents is lower than the LSS relation between keywords and abstracts. In turn, both relations are lower than the LSS between abstracts and contents. In social sciences, statistical tests also showed differences between the three variable relations. Here too, the LSS between keywords and contents is lower than the LSS between keywords and abstracts, and both are lower than the LSS between abstracts and contents of the articles under study.

These results confirm our first research hypothesis, considering that abstracts rather than keywords macrosemantize better in both areas the global semantic content of the SRA. An LSS pattern is also identifiable in both areas where there is a macrosemantic hierarchy. Thus, greater macrosemantization is given in the abstract-content relations, followed by keyword-abstract, and lesser macrosemantization is [...] between keywords and contents.
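The Kruskal-Wallis comparison of the three relations can be illustrated with a short, dependency-free sketch. Several things here are assumptions rather than the authors' procedure: the function name, the use of the per-article biological sciences values as they appear in Table 9.3 (one keyword-content value is not legible there, so that sample has 11 items), and the p-value shortcut, which relies on the chi-square approximation with two degrees of freedom and therefore only applies to exactly three groups with at least two distinct values overall.

```python
import math


def kruskal_wallis_3(a, b, c):
    """Kruskal-Wallis H test for three samples; returns (H, p).

    Uses average ranks for ties and the usual tie correction. The p-value
    uses the chi-square approximation with 2 degrees of freedom, whose
    survival function is exp(-H / 2).
    """
    groups = [list(a), list(b), list(c)]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Positions i+1 .. j (1-based) share the same value: average rank.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        t = j - i
        tie_term += t ** 3 - t
        i = j

    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)


# LSS_I values for biological sciences as listed in Table 9.3.
K_A = [.2259, .2928, .3569, .2856, .1468, .2058, .2698, .2180, .2670, .2915, .2546, .2988]
A_C = [.6196, .6218, .3472, .5635, .4022, .4324, .5155, .4207, .6366, .5062, .4601, .4350]
K_C = [.2054, .1412, .2003, .0661, .1025, .1845, .0834, .2059, .1602, .1546, .1418]

if __name__ == "__main__":
    h, p = kruskal_wallis_3(K_A, A_C, K_C)
    print(f"H = {h:.3f}, p = {p:.6f}, significant at 5%: {p < 0.05}")
```

Because the three relations are measured on the same articles, a related-samples test could also be argued for; the sketch simply follows the test named in the text.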
[...] in accordance with international standards of scientific [...] thus appear that research articles show [...] rhetorical-structural [...] similar LSS between the areas. We believe that what has been [...] as to form and content, [...] to lexical-semantic relations between the variables studied. This standardization takes [...] thanks to editorial [...] whose aim is coherent and informative articles.

3.5 [...]

The above results allow us to establish a similar behavioural pattern for both sciences, with the LSS of text variables as a starting point. In order to verify whether this pattern marks a difference between study areas, there is a statistical comparison of the average values of all the LSS relations (see Graph 9.2).

A Mann-Whitney non-parametric test for two independent samples was applied, obtaining an alpha value of .451. According to this value, our second research hypothesis is disproved. This means that when analysing the LSS_I between text variables (keywords-abstracts, keywords-contents and abstracts-contents) of the scientific research articles within a specialized semantic space, no significant differences were evidenced (with a 5 per cent error) between the two areas investigated.

One possible interpretation of this result is that the scientific research articles used in the study sample are the material product of a complex process of scientific production, where multiple agents unite their discourse and disciplinary competence in order to attain a scientific research [...] an [...] of communication [...] areas of science.

Graph 9.2 Comparison of the lexical-semantic similarity average in both areas according to ES-ARTICO

Conclusions

First, it is possible to assert that the abstracts-contents relation in all SRA of the corpus is stronger than keywords-abstracts and keywords-contents relations. The results confirm the macrosemantization function of abstracts, and that they help the reader to build a hypothesis on the central topics of the text. In this particular case, we have verified that the degree of the LSS of abstracts is statistically high with reference to contents; therefore, in this type of text, and in the investigated areas, readers' hypotheses would probably be fulfilled to a high degree.

Another conclusion is that keywords, at least in the SRA studied, lack a clear macrosemantization function of the global meaning of contents. Thus, it is possible to argue that most probably keywords succeed in placing the research article in a subject-matter or procedural disciplinary field, with little reference to this within the article. It is worth noting that the method of quantifying similarities is based on co-text collocation of words within the text; therefore it is highly probable that this keyword function is not detected by LSS values delivered by LSA. It could happen that keywords do establish the macrosemantized global meaning of the text, not in terms of intra-textual lexical relations, but rather in more abstract macrosemantic relations of metatopical and eminently inter-textual character. In addition, one could argue that keywords would fulfil another specific function. Their nature would be more persuasive than informative, that is, writers or even editorial committees would use or suggest keywords or phrases that, though not strongly related to texts, arouse the interest of a possible reader, thus covering the first step of approach to text reading for scientific communities of specific interest.

We can also state that there are no significant differences between the scientific domains under study, in accordance with the lexical-semantic relationships analysed between the text variables in the articles investigated. Evidently, this result does not respond to what had been expected, considering that notionally it was assumed that biological scientists would show greater precision and standardization when writing a scientific research article, fundamentally between the scientific article sections and the terminological use of the lexicon, in contrast with social sciences texts where less rhetorical-structural standardization, lexicon variability and conceptual [...] were expected. With reference to semantic
[...] indexed scientific reviews in [...] standards of international [...]. In this sense, the article produced [...] an [...] or authors, is submitted to a complex editorial process. Through this process many scientific producers and those who understand the discipline join their knowledge and text-discourse competence to co-build a scientific research article not only possessing content quality but organizational quality in accordance with the rhetorical structure demanded by the journal. [...] it is possible nowadays that in most disciplines research articles tend to greater rhetorical-structural homogeneity, and therefore similar LSS. In this way, the trend is toward a progressing similarity where individual discipline differences are lost, at least regarding differences in macrosemantization processes between the variables studied in biological and social sciences text samples.

It is possible to infer that the researcher intending to join a scientific discourse community must learn, among other things, to communicate his or her research by following the semantic-textual norms associated with this type of text and the proper disciplinary norms of specialized reviews, assisted in this learning process by editorial instances of the journal selected for publication. This process includes multiple assessments and suggestions both from scientific peers and editorial committees and/or journal editors. [...] the effort of getting an article published in a scientific journal becomes a semantic textual co-construction process oriented toward a discourse production system or knowledge co-writing. It is very [...] thus, to discover that a finally published scientific research article may, in some cases, turn into a product of great interaction and meaning exchange. So, the writer must pay attention to many voices that might make him or her change not only format and content, but [...] communication purpose.

[...] we must emphasize the fact that results obtained through LSA help to confirm the function of abstracts in SRA and determine that text variables macrosemantize contents in SRA, independent of the discipline area. In other words, LSA using mathematical-statistical data of the selected texts enables us to establish with fair precision and economy, in [...] terms, the forces of [...] between lexical components, [...] them semantic values and the [...] of lexical-semantic components.

Note

The numbers included in [...] 9.1 and 9.2 are used for representational purposes and [...] to the results of the statistical [...].
ANNEX 1 CORPUS ARTICO: NUMBER OF ARTICLES AND WORDS

CODE    CORPUS ARTICO                                          ARTICLES   WORDS

EXACT SCIENCES
[...]   [...] DE LA SOCIEDAD CHILENA [...]                     78         [...]
[...]   [...]                                                  29         91,025
RPQ     [...]                                                  6          20,841
        [...]publicaciones/ing_quimica/Reglamento.htm
AAL     ACTAS DE LA ACADEMIA LUVENTICUS                        6          21,615
        http://www.luventicus.org/Actas/
ITERC   INTERCIENCIA                                           6          36,615
        http://www.interciencia.org/
ACV     ACTA CIENTIFICA VENEZOLANA                             7          32,831
        http://acta.ivic.ve/
                                                               132        478,908

BIOLOGICAL SCIENCES
GC      GAYANA CONCEPCIÓN                                      40         159,070
        http://www.scielo.cl/scielo.php?pid=0717-6538andscript=sci_serial
RCHA    REVISTA CHILENA DE ANATOMÍA                            66         219,685
        http://www.scielo.cl/scielo.php?pid=0716-9868andscript=sci_serial
RCI     REVISTA CHILENA DE INFECTOLOGÍA                        34         136,299
        http://www.scielo.cl/scielo.php?pid=0716-1018andscript=sci_serial
RCHN    REVISTA DE HISTORIA NATURAL                            101        657,749
        http://www.scielo.cl/scielo.php?pid=0716-078Xandscript=sci_serial
AMV     ARCHIVOS DE MEDICINA VETERINARIA                       57         263,781
        http://www.scielo.cl/scielo.php?pid=0301-732Xandscript=sci_serial
                                                               298        1,436,584

SOCIAL SCIENCES
AMB     ÁMBITOS. REVISTA INTERNACIONAL DE COMUNICACIÓN         74         497,473
        http://www.ull.es/publicaciones/latina/ambitos/ambitos.htm
CHU     CHUNGARA                                               57         351,956
        http://www.scielo.cl/scielo.php?pid=0717-7356andscript=sci_serial
AD      ANALES DE DOCUMENTACIÓN                                54         379,925
        http://www.um.es/fccd/anales/
NS      NUEVA SOCIEDAD                                         26         152,501
        http://www.nuevasoc.org.ve/home/
INE     INVESTIGACIONES ECONÓMICAS                             34         333,777
        http://www.funep.es/invecon/sp/sie.asp
                                                               245        1,715,632

TOTAL                                                          675        3,631,124

ANNEX 2 RESEARCH CORPUS REFERENCES

BIOLOGICAL SCIENCE

ID   CODE                 REFERENCE

1    AMV3-2002 34(1)      Sievers, M., Cárdenas, C. [...] 'Estudio anual de la eliminación [...] huevos y larvas de [...] en ovinos de una estancia en [...] Chile'. Archivos de Medicina Veterinaria 34, (1), 37-47.
2    GC3-2002 66(2)       Lara, G., Parada, E. and Peredo, S. (2002) 'Alimentación y conducta alimentaria de la almeja de agua dulce Diplodon chilensis (bivalvia: hyriidae)'. Gayana (Concepc.) 66, (2), 107-12.
3    GC3-2003 67(1)       Alarcón, M. (2003) 'Sifonapterofauna de tres especies de roedores de Concepción, VI Región, Chile'. Gayana (Concepc.) 67, (1), 16-24.
4    RCHA4-2001 19(3)     Vásquez, B. (2001) 'Presencia de CBG en el estroma ovárico de mamíferos'. Revista Chilena de Anatomía 19, (3), 279-84.
5    RCHA5-2001 19(3)     Castro, A., Ghezzi, M., Alzota, R., Lupidio, M. and Rodríguez, J. (2001) 'Morfología del hígado de llama (Lama glama)'. Revista Chilena de Anatomía 19, (3), 291-96.
6    RCHA5-2002 20(2)     Briones, F., Calderón, M., Muñoz, J., Venegas, F. and Araya, N. (2002) 'El anticuerpo monoclonal Ki-67 como elemento de valor diagnóstico y pronóstico en neoplasias mamarias caninas'. Revista Chilena de Anatomía 20, (2), 165-8.
7    RCHA2-2002 20(3)     Babinski, M., Chagas, M., Costa, W. and Pereira, M. (2002) 'Morfología y fracción del área del lumen glandular de la zona de transición en la próstata humana'. Revista Chilena de Anatomía 20, (3), 255-62.
8    RCHN6-2000 73(4)     Véliz, D. and Vásquez, J. (2000) 'La Familia Trochidae (Mollusca: Gastropoda) en el norte de Chile: consideraciones ecológicas taxonómicas'. Revista Chilena de Historia Natural 73, (4), 757-69.
9    RCHN7-2000 73(4)     Martínez, G. and Montecino, V. (2000) 'Competencia en Cladocera: implicancias de la sobreposición en el uso de los recursos tróficos'. Revista Chilena de Historia Natural 73, (4), 787-95.
10   RCHN1-2002 75(2)     Canals, M., Atala, C., Olivares, R., Novoa, F. and Rosenmann, M. (2002) 'La asimetría y el grado de optimización del árbol bronquial en Rattus norvegicus y Oryctolagus cuniculus'. Revista Chilena de Historia Natural 75, (2), 271-82.
ANNEX 2 (continued)

BIOLOGICAL SCIENCE

ID   CODE                 REFERENCE

11   AMV2-2001 33(1)      Díaz, D., Picco, Encinas, Rubio and Litterio, N. (2001) 'Residuos tisulares de nicotinato de norfloxacina administrado vía oral en cerdos'. Archivos de Medicina Veterinaria 33, 37-42.
12   AMV15-2002 34(2)     Perfumo, C., Sanguinetti, H., [...] N., Armocida, A., Machuca, M., Massone, A., Risso, M. and Idiart, J. (2002) 'Constrictura rectal en cerdos necropsiados en una granja de ciclo completo en confinamiento. Consideraciones sobre su prevalencia, hallazgos anatomopatológicos y etiopatogenia'. Archivos de Medicina Veterinaria 34, (2), 245-52.

SOCIAL SCIENCE

ID   CODE                 REFERENCE

1    AMB1-2001 6          Emanuelli, P. (2001) 'Dominante cultural y productos televisivos: Géneros que homogenizan preferencias'. Ámbitos 6, 7-20.
2    AMB11-2001 6         Barrero, A. (2001) 'Juicios paralelos y Constitución: Su relación con el Periodismo'. Ámbitos 6, 171-89.
3    AMB14-2001 6         Egea, C. (2001) 'La carrera por la comunicación local (1998-2000) "Los grandes" se atreven con "lo pequeño"'. Ámbitos 6, 237-60.
4    AD14-2002 5          Moreira, J. (2002) 'Aplicaciones al análisis automático del contenido provenientes de la teoría matemática de la información'. Anales de Documentación 5, 273-86.
5    CHU2-2002 34(1)      Schiappacasse, V. and Niemeyer, H. (2002) 'Ceremonial Inca provincial: El asentamiento de Sagura (cuenca de Camarones)'. Chungará. Revista de Antropología Chilena 34, (1), 53-84.
6    INE4-2001 25(2)      Goicolea, A., Lisandro, O. and Maroto, R. (2001) 'Picos de inversión y productividad del trabajo en los establecimientos industriales madrileños'. Investigaciones Económicas 25, (2), 255-88.
7    INE2-2002 26(2)      Del Río, C. (2002) 'Desigualdad intermedia paretiana'. Investigaciones Económicas 26, (2), 299-321.
8    NS3-2003 184         Hualde, A. (2003) '¿Existe un modelo maquilador? Reflexiones sobre la experiencia mexicana y centroamericana'. Nueva Sociedad 184, 86-101.
9    NS7-2003 186         Giacalone, R. (2003) 'Integración Norte/Sur y tratamiento especial y diferenciado en el contexto regional'. Nueva Sociedad 186, 69-85.
10   NS10-2003 188        Costa, S. (2003) 'Derechos humanos en el mundo posnacional'. Nueva Sociedad 188, 52-65.

Nicole [...]
Viviana Cortes
Douglas Biber

Northern Arizona University
Iowa State University
United States of America

1. Introduction

Over the past two decades, researchers have become increasingly interested in the study of multi-word prefabricated expressions (see the reviews in Weinert 1995; Ellis 1996; Howarth 1996, 1998; Wray and Perkins 2000 and Wray 2002). Multi-word sequences have been studied under many rubrics, including 'lexical phrases', 'formulas', 'routines', 'fixed expressions' and 'prefabricated patterns' (or 'prefabs'). These approaches all define the object of study in somewhat different terms, and so they provide different perspectives on the use of multi-word sequences. For example, some studies describe multi-word sequences that are idiomatic (e.g., idioms like in a nutshell), while other studies focus on sequences that are non-idiomatic but perceptually salient (e.g., if you know what I mean).

A complementary approach is to describe the multi-word sequences that actually occur most commonly in a given register: extended collocations referred to as 'lexical bundles' (e.g., in the case of). Recurrent word sequences have been investigated in several earlier studies, including Salem in French (1987), Altenberg in English (1993, 1998), Butler in Spanish (1998) and de Cock in English (1998). The term 'lexical bundle' was first used in the Longman Grammar of Spoken and Written English (Biber et al. 1999), which described these recurrent sequences of words in conversation and academic prose. This framework has been applied in several subsequent studies, including Cortes (2002, 2004), Partington and Morley (2002), Biber and Conrad (1999) and Biber et al. (2003, 2004).

Overall, research on lexical bundles has shown that these recurrent word sequences are usually not idiomatic in meaning, and they are usually not complete structural units. However, lexical bundles are important textual building blocks used in spoken and written discourse. That is, these multi-word
LEXICAL BUNDLES IN SPEECH AND WRITING

[...] texts, where [...] studies [...]. One of [...]. His [...] sequences of [...] corpora of [...] were a combination of [...] the results reflected that some [...] on collocates are [...]. For [...] the frequent collocates in the Habla [...] corpus (a corpus [...] Madrid which consists of interviews with an [...]) include concrete nouns related to [...] and processes, which are the [...] type of nouns and adjectives used in the País corpus, which is made up of excerpts from the daily national newspaper El País, largely reflect the financial content of that corpus. Butler conducted a functional analysis of the recurrent word combinations identified in these corpora and reported that the majority of the repeated word sequences held two main properties: structurally, they began with conjunctions, articles, pronouns, prepositions or discourse markers, and, functionally, these expressions were used for interpersonal or textual functions rather than ideational functions.

The current research complements these previous studies by examining the use of lexical bundles in Spanish, focusing on two distinct registers: sociolinguistic interviews and academic writing. Because the research on lexical bundles in Spanish is rather scarce, we chose to begin the investigation at a more global level, comparing two very different registers. Specifically, the study focused on the following research questions:

1) What are the most frequent bundles in Spanish conversation and academic prose?
2) What are the functions of these bundles?
3) What similarities and differences are there between registers?

In Section 2, we describe the methodology used to identify lexical bundles, and continue in Section 3 by summarizing the major results of the identification and classification of lexical bundles in each register. Finally, we conclude with a discussion of the findings and a brief comparison to previous studies of lexical bundles.

2. Methodology

2.1 Corpus used in the current study

The current study is based on the analysis of texts from sociolinguistic interviews taken from the Habla Culta (Lope Blanch 1977, 1991), in addition to academic articles that were downloaded from online sources and scanned from print sources. The Habla Culta includes sociolinguistic interviews from 12 different Spanish-speaking cities: 1) Bogotá, Colombia, 2) Buenos Aires, Argentina, 3) Caracas, Venezuela, 4) Havana, Cuba, 5) La Paz, Bolivia, 6) Lima, Peru, 7) Madrid, Spain, 8) Mexico City, Mexico, 9) San Jose, Costa Rica, 10) San Juan, Puerto Rico, 11) [...] Chile and 12) Seville, Spain. This register was chosen because it was a large sample of spoken language that was available to the researchers. Most of the sociolinguistic interviews follow the typical interview format of a question and then a long answer but still are relatively open-ended. A smaller number are more informal and mimic casual conversation with shorter turns by both interlocutors.

The academic register was designed for this project to reflect the same countries that are represented in the Habla Culta. The texts were collected from online and print sources. Approximately one-third of the articles are history articles from Argentina, another third are humanities articles from various countries such as Bolivia, Chile, Colombia, Costa Rica, Cuba, Mexico, Peru, Puerto Rico, Spain and Venezuela, and the final third are science articles taken from journals in all of the above-mentioned countries. Table 10.1 demonstrates the composition of the two registers.

Table 10.1 [...]

Register                      Texts    Words
Sociolinguistic interviews    397      [...]
Academic texts                [...]    1,002,550
Total                         560      3,224,575

2.2 Identification of lexical bundles

Lexical bundles are defined as the most frequent recurring lexical sequences in a register (Biber et al. 1999). It is important to stress the fact that lexical bundles are identified empirically rather than intuitively and that these word combinations are defined by frequency. By definition, lexical bundles 'are the sequences of words that most commonly co-occur in a register' (Biber et al. 1999: 989). In the current study, we continue this frequency approach tradition to identify lexical bundles in the two registers selected. Our approach is rather conservative in that in order to be considered a lexical bundle, an expression must occur at least 30 times in a million words and appear in at least 20 different texts. To further limit the investigation, only four-word bundles were analysed. A computer program was written by the third author that identifies each bundle and maintains a count of the number of times it occurs in each register and the number of different texts that it occurs in.

2.3 Coding for structural types and functions

Two of the researchers independently coded each bundle based on structural and functional characteristics and all conflicts were resolved
through discussion. Inter-rater reliability was high, with 89 per cent and 88 per cent exact agreement respectively.

3. Results

Research question 1 concerned identifying the most frequent bundles in Spanish conversation and academic prose. As shown in Figure 10.1, there are 50 lexical bundle types in the sociolinguistic interviews and 65 in the academic texts. This pattern shows that there are fewer types of lexical bundles in Spanish sociolinguistic interviews than in academic prose.

Figure 10.1 Number of different lexical bundles by register

Figure 10.2 demonstrates the number of lexical bundle tokens by register. As depicted in the figure, academic prose has more lexical bundle tokens (3078) than sociolinguistic interviews (2632).

Figure 10.2 Overall frequency of lexical bundles in the two registers

3.1 Structural classification

To begin explaining the patterns of occurrence of the lexical bundles identified in these corpora, we first analyse the structural characteristics of these expressions. Table 10.2 displays the two structural types that we identified in the analysis. Type 1 bundles incorporate verb phrase fragments. For example, [...] begins with a first person pronoun and a verb phrase fragment (e.g., yo creo que es, yo no creo que). The second type on this list, 1b, starts with a pronoun (demonstrative or 3rd person pronouns) followed by a verb phrase fragment (eso es lo que). Types 1c and 1d begin with a verb but while in 1c the verb is followed by a noun [...] (e.g., es uno de [...] es una cosa [...]), in 1d the verb is followed [...] a verb [...] clause [...]

Table 10.2 Structural types of lexical bundles

1. Lexical bundles that incorporate verb phrase fragments
   1a. 1st person pronoun + VP fragment
       Example bundles: yo creo que es, yo no creo que, yo creo que no, yo creo que se
   1b. Demonstrative pronoun + VP fragment
       Example bundle: eso es lo que
   1c. VP with NP fragment
       Example bundles: es uno de los, es una cosa que, es una cosa muy
   1d. VP with verb complement clause fragment
       Example bundles: me parece que es, creo que es una, no sé por qué
   1e. Coordinator with NP + VP fragment
       Example bundles: pero yo creo que, y yo creo que
   1f. NP/PP + VP fragment
       Example bundles: la verdad es que, por eso es que
   1g. que-clause fragment
       Example bundles: que a mí me, que se va a, que va a ser, que todo el mundo

2. Lexical bundles that incorporate noun and prepositional phrase fragments
   2a. Noun phrase with de-phrase fragment
       Example bundles: cada uno de los, la [...] de los, el problema de la
   2b. Noun phrase with complement fragment
       Example bundles: el hecho de que, la idea de que, la posibilidad de que
   2c. Complex noun phrase
       Example bundles: la [...] la, latina y el caribe
   2d. Complex prepositional phrase
       Example bundles: a lo largo de, a través de la, con respecto a la, en cuanto a la
   2e. Prepositional phrase with que relative
       Example bundles: en la que se, de lo que se, de las cosas que, de todo lo que
SPANISH ORPORA LEXICAL BUNDLES IN AND WRITING
Table 10.3 across the two
Bundle Academic
STAi'JCE EXPRESSIONS
A. stanee
se va a, que a mí
Al) Personal
Type 2 bundles are both noun and pero yo creo que **
prepositional phrase 2a-2c noun phrase frag- que yo creo que **
ments (cada uno de los, el hecho de que, la ciencia y la), whereas types 2d and 2e yyo creo que **
incorporate prepositional fragments lo largo a través de en la yo creo que el *
que se). Figure 10.3 demonstrates the distributíon oflexical bundles across yo creo que en **
structural types. yo creo que es **
yo creo que la **
yo creo que no **
3.2 Functional taxonomy yo creo que sí **
In this section, we describe the functional taxonomy that emerged from yo no sé si **
A2) Impersonal
qualitatíve analysis of the bundles in context. For this analysis, we began
examining concordance lines to analyse the functions of all the bundles in
el hecho de que ** **
la verdad es que **
their discourse contexts. B. Attitudinal Stance
From the data, three primary functions were identified and those a mí me gusta **
functions match the functions that were described in Biber et al. (2004): a mí me parece **
1) stance expressions, 2) discourse organizers and 3) referential me parece que es *
expressions. Each of these three categories also includes subcategories of
U. DISCOURSE ORGANIZERS
functions. Table 10.3 lists the bundles in each of their functional A. Topic introduction/ focus
categories. The following sections will describe each of the bundles in en cuanto a la * **
more detail. se trata de un **
se trata de una **
te voy a decir *
80 B. Topic elaboration/ darification
o NP/PP-based bundles lo que pasa es que **
70 por eso es que *
11 VP-based bundles
111 60
que es lo que **
w
:¡:;
qué es lo que **
e: 50 III. REFERENTIAL EXPRESSIONS
:::1
.e A. Identification/ focus
o... 40 cada una de las *
w
.e 30
cada uno de los **
E
:::¡
de las cosas que **
z 20 de lo que es **
es lo que yo *
10 es una cosa que **
es una de las ** *
o es uno de los ** *
Sociolinguistic Academic Prose eso es lo que ***
lnterviews lo que más me *
lo que se llama *
Figure 10.3 Distribution oflexical bundles across structural types
WITH
BUNDLES IN SPEECH AND WRITING
Table 10.3
Bundle
de las cosas
B. Sp•ecifica1tion
**
América Latina y el
Bl) **
de la facultad de **
en la
* dela de
*
la mayor parte de
la mayoría de las
* ** de la universidad de * **
* * de los Estados Unidos * **
la mayoría de los * ** en la universidad de *
B2) attributes en los Estados Unidos
en el grupo de
*
* la facultad de ciencias *
B3) Intan¡jble framing attributes la universidad de Buenos
a pesar de que
*
** * Latina y el caribe *
con el fin de
** universidad de Buenos Aires **
con respecto a la
* C2) Time reference
desde el punto de *** ** a la vez que **
de la sociedad civil
** de la década de **
de la teoría de
de las ciencias sociales
* de la década del *
desde el punto de
** el momento en que * *
*** ** C3) Multijunctional reference
el caso de la
el caso de las
** a lo largo de **
el caso de los
* a lo largo del *
el derecho a la
* a partir de la ***
el desarrollo de la
* a partir de los *
** a través de la * ***
el problema de la
** * a través de las **
el punto de vista
*** ** a través de los * *
el sentido de que
en el area de
*
N. STRUCTURAL ONLY
en el campo de
**
* ** en el que se **
en el caso de en la que se *
** ***
en el caso del a mí no me **
en el marco de
* que a mí me
** **
en el proceso de
en el sentido de
* Key to syrnbols:
** = 10-19 per million words
en la medida en
en relación con el
** * = 20-39 per million words
* ** = 40-99 per million words
en relación con la *** = over 100 per million words
en torno a la
*
la idea de que
**
la medida en
**
la práctica de **
la teoría de la
**
la teoría de los
**
por parte de los
**
punto de vista de
**
sobre la base de
* *
**

3.2.1 Stance bundles
The stance bundles that were identified in this study were mainly examples of personal epistemic stance and expressed certainty. Additionally, they all included the verb creer with a complement clause. For example:

Y nosotros entendemos que subdesarrollado es tener poco. Pero yo creo que es demasiado limitativa, ¿verdad?, la idea de que los que tengan mucho son desarrollados y los que tengan poco son su ... son subdesarrollados. Yo creo que en este momento lo que interesa es, realmente, las cosas de fondo que hacen las diferencias,
[...] from academic prose:

Entre las limitaciones del estudio nos [...] señalar dos: el reducido número [...] de encontrar resultados estadísticamente significativos, y el hecho de que [...] estudio no tenga dos brazos aleatorios con otro grupo de no tratados con bomba de [...] comparar directamente ambos grupos. (academic prose)

A small number of bundles that represent attitudinal stance were also identified. For example:

... a mí me gusta más una ciudad chica. O sea que La Paz está demasiado grande ahora ¿no?, no es que está demasiado grande, pero, por ejemplo, esto de los edificios a mí no me convence mucho. Prefiero una ciudad como Cochabamba. (sociolinguistic interview)

3.2.2 Discourse-organizing bundles
Of the discourse-organizing bundles that were identified, there were two main types: topic introduction/focus and topic elaboration/clarification. The introduction/focus bundles are used either to signal that a new topic is being introduced or to direct the listeners' or readers' attention to a specific part of the discourse. For example, in the following excerpt from a sociolinguistic interview, the speaker is using the bundle to begin his/her answer to the interviewer's question:

Pues, ¿qué te voy a decir de la prensa española, que tú no sepas? En mi casa se compra el 'ABC' no porque seamos monárquicos ni nada, no, simplemente porque es el periódico más cómodo de manejo, no es más que por eso.

The topic elaboration/clarification bundles are used to add information about the speaker's message, as in the following example:

Me dedico al arte en la casa en los ratos en que tengo libre, hago mosaico, que es lo que más me gusta, y a veces hago también una ... una técnica de estampado de tela, pero no en el sentido comercial, sino en el sentido netamente artístico. (sociolinguistic interview)

The next example includes a bundle that was originally identified as two separate four-word bundles (lo que pasa es and que pasa es que) but through later analysis was identified as a single five-word bundle lo que pasa es que. All examples of this bundle were found in the sociolinguistic register:

... ésa es la impresión que yo tengo. Lo que pasa es que siempre tenemos el gran vicio, nosotros, de ver una islita realmente, ¿no? Nosotros vivimos en Buenos Aires además vivimos en una islita en Buenos Aires. (sociolinguistic interview)

3.2.3 Referential bundles
The identification/focus bundles were [...] sociolinguistic interviews and were used to [...] other person said and elaborate on it. Note the [...]

Pues eso es lo que iba a decir. En Madrid tienes tu medio de vida por todos los lados. (sociolinguistic interview)

Another category of the referential bundles is used to specify attributes. This category overwhelmingly includes bundles that were found in academic prose and are used to specify attributes of the head noun. Some of these bundles specify quantities, as in the following example:

La escuela pública jugó un papel fundamental en la transmisión a la sociedad de las formas de legar la cultura de generación a generación y de las tecnologías del enseñar y el aprender. La mayoría de las naciones logró articular las escuelas, colegios y universidades, valorándolos como espacios privilegiados para la enseñanza, el aprendizaje y la producción de los saberes públicos. (academic prose)

Other types of bundles specify tangible and intangible framing attributes. The intangible attributes were more frequent than the tangible framing attributes, where only one example was found:

Respecto a los efectos de las altas temperaturas, una investigación centrada en el grupo de personas ancianas encuentra un mayor impacto del calor entre las mujeres. (academic prose)

In contrast, numerous examples of intangible framing attributes can be found in the academic register. These examples tend to identify abstract characteristics:

Con respecto a la fecha de diagnóstico, en el caso de los pacientes con EC, 69% de ellos fue diagnosticado después del año 1995, es decir en la segunda mitad del período estudiado. (academic prose)

Additionally, referential bundles that specify places, institutions or times were found:

Utilizamos seis conejos (Oryctolacus cuniculus), machos, sexualmente maduros, híbridos, clínicamente sanos y alimentados con pellet y zanahorias ad libitum, obtenidos del Bioterio de la Facultad de Medicina de la Universidad de la Frontera, Temuco, Chile. (academic prose)

El fútbol, como juego reglamentado, nació en Inglaterra hacia mediados de la década de 1860. (academic prose)

Several lexical bundles identified in the referential category were used to perform different referential functions in different contexts. In the case of
[...] or [...] a lo [...] 1999 hasta [...] inicios de la [...]

A lo [...] distintas sociedades ocuparon los territorios que formarían la [...] en el extremo noroeste de [...] (academic text)

Figure 10.4 Distribution of lexical bundles across functional categories
[Bar chart by register (Sociolinguistic Interviews, Academic Prose)]

Another example of these multi-functional referential bundles is a través de la/los/las. As shown in the following examples, these bundles were used to indicate time, place and text reference:

La justificación de la existencia se presenta como superación de ésta en la obra, por medio de la consolidación de la duración a través de la narración. (academic text)

Controlando este efecto, es posible distinguir aquellas variables que son efectivamente significativas a través de los años. (academic text)

Han abandonado el medio arbóreo, han abandonado la vida, propiamente, en las galerías - se les llama así - a través de los ríos, en donde la naturaleza los ... les prestaba, propiamente, protección. (sociolinguistic interview)
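The slash notation a través de la/los/las treats bundles that differ only in the gender or number of the final article as one family. A minimal sketch of that grouping follows; the helper name `collapse_article_variants` and the hand-written article set are assumptions made for illustration, not a procedure described in the study, and contracted forms such as del are deliberately left untouched.

```python
# Assumption: a small, hand-picked set of Spanish definite articles.
ARTICLES = {"el", "la", "los", "las"}


def collapse_article_variants(bundles):
    """Group bundles that share a stem and differ only in a final article.

    Returns a dict mapping each stem to a display label such as
    'a través de la/las/los'. Bundles that do not end in an article
    are kept unchanged under their own text.
    """
    families = {}
    for bundle in bundles:
        words = bundle.split()
        if words and words[-1] in ARTICLES:
            stem = " ".join(words[:-1])
            families.setdefault(stem, []).append(words[-1])
        else:
            families.setdefault(bundle, [])
    return {
        stem: f"{stem} {'/'.join(sorted(articles))}" if articles else stem
        for stem, articles in families.items()
    }


# Bundles taken from the examples discussed in this section.
example_bundles = [
    "a través de la",
    "a través de las",
    "a través de los",
    "a lo largo de",
]
example_families = collapse_article_variants(example_bundles)
```

Grouping by stem keeps the individual four-word bundles countable while making it easy to see when several entries in a bundle list are variants of one pattern.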
3.2.4 Structural only
One final group emerged from the data which appeared not to have a specific function, and thus might be considered as structural artefacts instead of genuine lexical bundles. Examples of these sequences were ones in which all the words were function words. Butler (1997) mentions this category as well:

Desde la hoja de papel Margie canjea identidades con la propia novelista, que regala momentáneamente a su personaje la facultad de narrar el propio texto en el que se encuentra inmersa. (academic prose)
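The criterion that every word in the sequence is a function word can be checked mechanically. The sketch below is only an illustration: the word list `FUNCTION_WORDS` is a short, incomplete set assumed for this example, and the function name is hypothetical; the chapter does not state which list, if any, was used.

```python
# Assumption: an illustrative, deliberately incomplete list of Spanish
# function words (articles, prepositions, clitics, conjunctions).
FUNCTION_WORDS = {
    "a", "de", "del", "el", "en", "la", "las", "lo", "los",
    "me", "mí", "no", "que", "se", "un", "una", "y",
}


def is_function_word_only(bundle):
    """Return True when every word of a non-empty bundle is a function word."""
    words = bundle.lower().split()
    return bool(words) and all(word in FUNCTION_WORDS for word in words)


# Candidate bundles taken from Table 10.3.
candidates = ["en el que se", "en la que se", "a mí no me", "la verdad es que"]
structural_only = [b for b in candidates if is_function_word_only(b)]
```

With this word list, the three sequences listed under the structural-only heading pass the check, while la verdad es que does not, because it contains the verb form es.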
3.3 The interaction between structural and functional categories
As shown in Figure 10.4, referential bundles are predominant in both spoken and written Spanish registers. In comparison, stance bundles are common in sociolinguistic interviews but not in academic prose. There are few discourse-organizing bundles in both registers, although more are found in sociolinguistic interviews.
Figure 10.5 shows the interaction between the structural characteristics and functional categories of lexical bundles in both registers. Referential bundles are usually expressed by means of noun phrases or prepositional phrases, whereas stance and discourse-organizing functions are usually realized as VP-based bundles. It is interesting to note that lexical bundles in English show these same associations of VP-based bundles being used to express stance and discourse-organizing functions, while NP-based bundles are typically used to express referential functions (see Biber et al. 2004).

Figure 10.5 Interaction of structural and functional categories for both registers
[Bar chart: NP/PP-based and VP-based bundles by functional category (Stance, Discourse, Referential)]

4. Discussion and conclusion
The study identified the lexical bundles used in a corpus of academic prose and sociolinguistic interviews in Spanish, classifying bundles both structurally and functionally. In some respects the findings are very similar to those found in studies of lexical bundles in Spanish and English (Biber et al. 2004). The most obvious similarity is the strong association between the structural and functional
characteristics of bundles [...]. In addition, the two languages [...] in their communicative uses of bundles: academic prose in both Spanish and English uses lexical bundles for referential functions, while [...] use bundles for all three functions (stance, discourse-organizing, referential).
Beyond those points of similarity, however, there are some striking differences in the types and distributions of lexical bundles in the two languages. First, the study here has shown that there is a much larger set of lexical bundles used in Spanish academic prose than in spoken sociolinguistic interviews. This difference is in marked contrast to the register distribution of bundles in English: they are much more common in spoken registers than in informational written registers.² Butler (1997) also found a higher number of four-word repeated sequences in his spoken registers of Spanish than in the one written register of newspaper reportage that he included in his analysis.
Second, NP/PP-based bundles are more common than VP-based bundles in our study of Spanish, while VP-based bundles are generally more common than NP-based bundles in English. And finally, referential bundles in Spanish are the most common functional type, in both sociolinguistic interviews and academic prose (although this function is especially dominant in academic prose). Referential bundles are generally much less common in English: spoken English registers tend to use many more stance and discourse-organizing bundles than referential bundles, and although most bundles in written English registers are discourse organizers, they are not frequent in absolute terms. In part, structural differences between the two languages contribute to all three of the above distributional differences.
Two structural characteristics of Spanish turn out to be especially important. The first is the consistent marking of gender and number in Spanish but not in English. Gender and number are marked primarily in the noun phrase (e.g. on determiners and pronouns). As a result, we sometimes find two NP-based bundles in Spanish where only one would be found in English (e.g., yo creo que el, yo creo que la and la mayoría de los and la mayoría de las). The second factor is even more important: the almost exclusive reliance on de-phrases for phrasal modification in Spanish. In contrast, English commonly uses of-phrases for similar functions, but these same meanings are also commonly expressed with nouns as pre-modifiers. A common two- or three-word sequence in English often corresponds to a four- or five-word sequence in Spanish, resulting in the dense use of four-word NP-based bundles. For example, compare these examples:

the learning process      el proceso de enseñanza
research processes        los procesos de investigación
research field            área de investigación
the History Department    Departamento de Historia

[...] found [...] number of referential bundles [...] follow from the strong interaction between structure and function found for both Spanish and English (Section [...]): VP-based bundles used for stance and discourse-organizing functions and NP-based bundles used for referential functions. That is, NP-based bundles are especially common in Spanish due to the way in which noun phrases are structured. As a result, referential bundles are especially common in Spanish, because NP-based bundles are usually used for referential functions. And for the same reason, bundles are overall very common in academic prose: academic prose relies almost exclusively on referential functions (rather than stance or discourse-organizing bundles), and since NP-based (referential) bundles are structurally favoured in Spanish, we find an especially frequent use of bundles overall in academic writing.
Thus, the present study suggests that the occurrence of lexical bundles in a language is influenced by both communicative factors and the structural resources available in the language. Future research is required to better address the methodological problems of comparing the sets of lexical bundles across languages. In addition, future research will provide a much fuller description of the use of lexical bundles across the full range of spoken and written registers in Spanish. The present study has taken a first step towards this goal, documenting the central importance of these extended collocational sequences in both speech and writing.

Notes
1 Butler (1998) also investigated collocational frameworks in Spanish.
2 However, Biber (2006) documents the dense use of lexical bundles in written institutional registers.
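The length contrast between English pre-modified nouns and Spanish de-phrases in the example pairs from the discussion can be made concrete by counting words. This is only a rough sketch: splitting on whitespace is an assumed, simplistic tokenization, the names `PAIRS` and `word_counts` are introduced here for illustration, and only the three pairs that survive intact in the text are used.

```python
# English/Spanish example pairs quoted in the discussion.
PAIRS = [
    ("the learning process", "el proceso de enseñanza"),
    ("research processes", "los procesos de investigación"),
    ("research field", "área de investigación"),
]


def word_counts(pairs):
    """Return (English word count, Spanish word count) for each pair.

    Assumption: words are separated by whitespace, which is adequate
    for these short phrases but not a general tokenizer.
    """
    return [(len(english.split()), len(spanish.split()))
            for english, spanish in pairs]


counts = word_counts(PAIRS)
# Each Spanish phrase is at least one word longer than its English counterpart.
longer_in_spanish = all(es > en for en, es in counts)
```

Because bundles in this study are defined as four-word sequences, the extra word contributed by de is enough to bring many ordinary Spanish noun phrases into the bundle inventory.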
REFERENCES

Aarts, B. (2004) 'Modelling linguistic gradience'. Studies in Language 18(1), 1-49.
Aarts, B., Denison, D., Keizer, E. and Popova, G. (2004) Fuzzy Grammar: A Reader. London: Oxford University Press.
Adam, J. (1992) Les Textes: Types et Prototypes. Paris: Nathan.
Adelstein, A. (2004) Unidad Léxica y Valor Especializado. Estado de la Cuestión y Observaciones sobre su Representación. Barcelona: Instituto Universitario de Lingüística Aplicada. Colección: Serie Tesis 5.
Adelstein, A. and Cabré, M. T. (2002) 'The specificity of units with specialized meaning: polysemy as explanatory factor'. Delta 18, 1-25.
Aguirre, J. (2000) 'Análisis y procesamiento de las diátesis de los verbos de cambio en gallego'. Online. Retrieved from: http://webs.uvigo.es/sli/arquivos/sepln00.doc.
Aijmer, K. (2002) 'Modal adverbs of certainty and uncertainty in an English-Swedish perspective', in H. Hasselgard, S. Johansson, B. Behrens and C. Fabricius-Hansen (eds), Information Structure in a Cross-linguistic Perspective. Amsterdam, Netherlands: Rodopi, pp. 97-113.
Alarcos Llorach, E. (1999) Gramática de la Lengua Española. Madrid: Espasa Calpe.
Alcina, J. and Blecua, J. (1975) Gramática Española. Barcelona: Ariel.
Alcoba, S. (1999) 'La flexión verbal', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 4915-92.
[...] (2002) 'The TELEC secondary learner corpus. A resource for teacher development', in S. Granger, J. Hung and S. Petch-Tyson (eds), [...] Corpora, Second Language Acquisition [...] Language [...]. Amsterdam: Benjamins, pp. 195-211.
Almeida, M. and [...], M. (1998) 'Aspectos sociolingüísticos de un cambio [...] de futuro'. Estudios [...] 7-21.
[...] 'Recurrent word combinations in spoken English', in [...] the Nordic [...] Studies [...], pp. 17-27.
[...] Gredos.
Aristóteles [...] Buenos Aires: Editorial Sudamericana-Planeta.
[...] (2000) 'La presencia de lo oral en la literatura: Sobre la variable futuro verbal en una muestra del teatro español contemporáneo', in M. Muñoz, G. Fernández, A. Rodríguez and V. Benítez (eds), IV Congreso de Lingüística General. Cádiz: Universidad de Cádiz, pp. 267-82.
Ávila, A. (2000) 'Hacia una caracterización gramatical del corpus de lengua hablada', in M. Muñoz, G. Fernández, A. Rodríguez and V. Benítez (eds), IV Congreso de Lingüística General. Cádiz: Universidad de Cádiz, pp. 151-8.
Ávila, R. (1968) 'Expresiones verbales de lo futuro y la caracterización social en dos obras del teatro mexicano contemporáneo', in H. Meier, L. Sáez, K. Hunnius, R. Ávila and L. Grimes (eds), Futur und Zukunft im Spanischen. Archiv für das Studium der Neueren Sprachen und Literaturen. Berlin: Erich Schmidt Verlag, pp. 346-9.
Baldry, A. and Thibault, P. (2006) 'Multimodal corpus linguistics', in S. Hunston and G. Thompson (eds), System and Corpus: Exploring Connections. London: Equinox, pp. 164-83.
Ballester, A. and Santamaría, C. (1993) 'Transcription conventions used for the corpus of spoken contemporary Spanish'. Literary and Linguistic Computing 8, 283-92.
Bally, C. (1944) Linguistique Générale et Linguistique Française. Berne: Éditions Francke.
Bassols, M. and Torrent, A. (1997) Modelos Textuales: Teoría y Práctica. Barcelona: Octaedro.
Battaner, P. (2000) 'Un corpus para la enseñanza: Corpus PAAU, junio, 1992', in P. Battaner and C. López (eds), VI Jornada de Corpus Lingüístics. [...] Barcelona: Institut Universitari [...]
[...], E., [...], C. and Pujol, M. [...] Enseñar: La Redacción de Exámenes. Madrid: Antonio Machado Libros.
[...], G. (1989) El Futuro en -ré e ir a + [...] Moderno. Gothenburg: Acta Universitatis [...]
Bazerman, C. (1988) Shaping Written [...] in Science. Madison: The University of Wisconsin Press.
Bazerman, C. (1994) '[...] of genres and the enhancement of social intentions', in A. Freedman and P. [...] (eds), Genre and New Rhetoric. London: [...] and [...], pp. 79-101.
[...] 29-42.
[...] communication', in U. [...] Berlin: de [...], pp. 289-304.
[...] and Huckin, T. [...] Genre [...] Communication: Cognition, Culture, Power. Hillsdale, NJ: Erlbaum.
Bhatia, V. (1993) Analysing Genre: Language Use in Professional Settings. London: Longman.
Bhatia, V. (1995) 'Genre-mixing in the professional communication: The case of "private intentions" v. "socially recognised purposes"', in P. Bruthiaux, T. Boswood and B. Bertha (eds), Explorations in English for Professional Communication. Hong Kong: City University of Hong Kong, pp. 1-19.
Bhatia, V. (2004) Worlds of Written Discourse: A Genre-Based View. London: Continuum.
Biber, D. (1985) 'Investigating macroscopic textual variation through multi-feature/multi-dimensional analyses'. Linguistics 23, 337-60.
Biber, D. (1986) 'Spoken and written textual dimensions in English: Resolving the contradictory findings'. Language 62, 384-414.
Biber, D. (1988) Variation Across Speech and Writing. Cambridge: Cambridge University Press.
Biber, D. (1992) 'On the complexity of discourse complexity: A multidimensional analysis'. Discourse Processes 15, 133-63.
Biber, D. (1994) 'Using register-diversified corpora for general language studies', in S. Armstrong (ed.), Using Large Corpora. Cambridge: The MIT Press, pp. 180-201.
Biber, D. (1995) Dimensions of Register Variation: A Cross-Linguistic Comparison. Cambridge: Cambridge University Press.
Biber, D. (1996) 'Investigating language use through corpus-based analyses of association patterns'. International Journal of Corpus Linguistics 1, (2), 171-97.
Biber, D. (2003) 'Variation among university spoken and written registers: A new multi-dimensional analysis', in P. [...] and C. [...] (eds), [...] Language Structure [...] Use. Amsterdam: Rodopi, pp. 47-70.
Biber, D. (2005) 'Paquetes léxicos en textos de estudio universitario: Variación entre disciplinas académicas'. Revista Signos [...]
Biber, D. (2006) University Language: A Corpus-based Study of Spoken and Written Registers. Amsterdam: Benjamins.
Biber, D. and Conrad, S. (1999) 'Lexical bundles in conversation and academic prose', in H. Hasselgard and S. Oksefjell (eds), Out of Corpora: Studies in Honour of Stig Johansson. Amsterdam: Rodopi, pp. 181-9.
[...] Somali'. [...] Variation and [...]
Biber, D. and Hared, M. [...] in Somali: [...] consequences'. Annual Review of Applied Linguistics 12, 260-82.
Biber, D. and Hared, M. (1994) 'Linguistic correlates of the transition to literacy in Somali: Language adaptation in six press registers', in D. Biber and E. Finegan (eds), Sociolinguistic Perspectives on Register. Oxford: Oxford University Press, pp. 182-216.
Biber, D., Conrad, S. and Cortes, V. (2003) 'Lexical bundles in speech and writing: An initial taxonomy', in A. Wilson, P. Rayson and T. McEnery (eds), Corpus Linguistics by the Lune. Frankfurt: Lang, pp. 71-93.
Biber, D., Conrad, S. and Cortes, V. (2004) 'If you look at ... lexical bundles in academic lectures and textbooks'. Applied Linguistics 25, 371-405.
Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) The Longman Grammar of Spoken and Written English. London: Longman.
Blanche-Benveniste, C. (1998) Estudios Lingüísticos sobre la Relación entre Oralidad y Escritura. Barcelona: Gedisa.
Blas, J. L. (2000) 'Aspectos sobre la variación lingüística en la lengua escrita: la expresión de futuridad en el español literario'. Lingüística Española Actual 22, 181-200.
Bolívar, A. (2000) 'Homogeneidad versus variedad en la estructura de los resúmenes de investigación para congresos'. Akademos 2, 121-38.
Bosani, A. (2000) 'Verbos de comunicación y discurso', in J. de Bustos, P. Charaudeau, J. Girón, S. Iglesias and C. López (eds), Lengua, Discurso, Texto: I Simposio Internacional de Análisis del Discurso. Madrid: Visor, pp. 253-62.
Bosque, I. (1990) Las Categorías Gramaticales. Madrid: Síntesis.
Bosque, I. (1999) 'El nombre común', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 3-76.
Bosque, I. and Demonte, V. (eds) Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe.
Brinker, K. (1988) Linguistische Textanalyse [...] Berlin: E. Schmidt.
Brucart, J. (2000) 'L'analisi sintactica i la seva terminologia en l'ensenyament secundari', in J. Macia Guila and J. Sola (eds), La [...] en [...] Secundari. Barcelona: [...]
redes
y Sociedad
pp. 60-77. Castella Lidon, J. Oralitat
frameworks m 11m"~"·nrrrP Barcelona: Publicacions de de Montserrat.
Castellano, A. variación del
, in M. Mml.oz, G. Femández, V. Benítez , IV
de General. Cádiz: Universidad de Cádiz, pp. 521-31.
J. 'Sequentiality as the basis of constituent structure', in Castellón, I., Fernández, A., Martí, A., Morante, R. and Vázquez, G.
T. Givón and B. Malle (eds), The Evolution ofLanguage out oJPre-language. 'An interlingua representation based on the lexico-semantic informa-
Amsterdam: Benjamins, pp. 109-34. tion'. Online. Retrieved from: http://crl.nmsu.edu/Events/FWOI/
Cabré, M. (1999) 'El discurs especialitzat o la variació funcional determi- SecondWorkshop/paper/ castellon.html.
nada per la tema ti ca: Noves perspectives ', in M. Cabré ( ed.), La Cepeda, G. (2002) 'Entonación, actitud y modalidad'. fütudios Filológfros
Terminología: Representación y Comunicación. Una Teoría de Base Comunicativa 37, 7-28.
y Otros Artículos. Barcelona: IULA, pp. 151-73. Chafe, W. ( 1982) 'Integration and involvement in speaking, writing and oral
M. (2000) La Terminología: Representación y Comunicación. Elementos literature ', in D. Tannen ( ed.), Spoken and Written Language: Exploring
para una Teoría de Base Comunicativa. Barcelona: Instituto Universitario de Orality and Literacy. Norwood, NJ: Ablex, pp. 35-53.
Lingüística Aplicada, Universitat Pompeu Fabra. Chafe, W. ( 1985) 'Linguistic differences produced by differences between
Cabré, M. (2002) 'Textos especializados y unidades de conocimiento: speaking and writing', in D. Olson, N. Torrence and A. Hidyard
Metodología y tipologización', in.J. García and M. Fuentes (eds), Texto, ( eds), Literature, Language and Learning: The Nature and Consequences of
Terminología y Traducción. Barcelona: Almar, pp. 122-87. Reading and Writing. Cambridge: Cambridge University Press,
Cademártori, Y (2003) 'La inscripción de las personas en textos de divul- pp. 105-23.
gación científica'. Revista Latinoamericana de Estudios del Discurso 3, ( 1), 9-28. Chafe, W. (1986) 'Evidentiality in English conversation and academic
Cademártori, Y, Parodi, G. and Venegas, R. (2006) 'El discurso escrito y writing', in W. Chafe and J. Nicho Is (eds), Evidentiality: The Linguistic
especializado: caracterización y funciones de las nominalizaciones en los Codingof Epistemology. Norwood, NJ: Ablex, pp. 261-73.
manuales técnicos'. Literatura y Lingüística 17, 243-65. Chafe, W. (1992) 'The importance of corpus linguistics to understand the
E. (2000) 'Decir la ciencia: Las prácticas divulgativas nature of language', in J. Svartvik ( ed.), Directions in Corpus Linguistics.
de mira'. Revista Iberoamericana del Discurso y Sociedad 2, (2), Berlin: de Gruyter, pp. 79-97.
Chafe, W. (1994) Discourse, Consciousness and Time. Chicago: The University
H. 'Análisis discursivo de la divulgación of Chicago Press.
Online. Retrieved from: Chafe, W. and Danielewics,J. (1987) 'Properties of spoken and written lan-
danielcass/ anali.htm. guage', in R. Horowitz and J. Samuels (eds), Comprehending Oral and
Calsamiglia, H. and Tusón, A. (] Las Cosas de Decir: lvianual de Análisis Written Language. New York: Academic Press, 83-115.
de Discurso. Barcelona: Ariel. Christie, F. (1998) 'Science and apprenticeship: pedagogic discourse',
(1 in J. Martín and R. Veel Science: Critical and Functional
Gramática Perspectives on Discourse London: Routledge, pp. 152-80.
Christie, F. (2005) Classroom Discourse A Functional P~·cJ>,,rh"'"
London: Continuum.
Christie, F. and Martín, J. Genre and lnstitution: Social Processes
the and School. London: Continuum.
norma y habla del futuro de Church, K. 'Introduction to the
Rohrer Semantikos. Studia
Honorem Coseriu 1921-1981. Madrid: 383-94.
verbal interaction between experts

Studies 5, , 207-33.
G. y
Crísmore, A. with Readers:
en el discurso científico oral: aportes NewYork: Lang.
teórico-descriptivos para el estudio gramatical'. Revista de la Sociedad Dagneaux, E., Denness, S. and Granger, S. (1998) 'Computer-aided error
Argentina de Lingüística RASAL l, 81-100. analysis'. System 26, (2), 163-74.
Ciapuscio, G. (in press) 'Esquemas calificadores modales y recursos léxico- Dand, F. (1987) 'Cognition and Emotion in Discourse Interaction:
gramaticales en la conferencia de divulgación científica'. Boletín de la A Preliminary Survey of the Field'. Proceedings of the IVth International
Academia Argentina de Letras. Buenos Aires. Congress of Linguists, Berlin, pp. 272-91.
Ciapuscio, G. and Kesselheim, W. (2005) 'Identitatskonstitution in Experten- Davies, M. (2002) 'Un corpus anotado de 100.000.000 palabras del español
Laien-Kommunikation: Die Rolle der Textherstellungsverfahren'. Neue histórico y moderno'. Sociedad Española para el Procesamiento del Lenguaje
Romania. Linguistik am Text. Beiträge aus Argentinien und Deutschland/Lingüística en el texto. Contribuciones de Argentina y Alemania. Berlin, pp. 125-52.
Ciapuscio, G. and Kuguel, I. (2002) 'Hacia una tipología del discurso especializado: aspectos teóricos y aplicados', in M. T. Fuentes and J. García (eds), Terminología, el Texto y la Traducción. Salamanca: Almar, pp. 37-73.
Ciapuscio, G. and Otañi, L. (2002) 'Las conclusiones de los artículos de investigación desde una perspectiva contrastiva'. RILL 15, 117-33.
Clachar, A. (2003) 'Paratactic conjunctions in creole speakers' and ESL learners' academic writing'. World Englishes 22, (3), 271-89.
Collins, P. (1991) Cleft and Pseudo-Cleft Constructions in English. London: Routledge.
Company, C. (1985-86) 'Los futuros en el español medieval. Sus orígenes y su evolución'. Nueva Revista de Filología Hispánica 34, 48-107.
Company, C. and Medina, A. (1999) 'Sintaxis motivada pragmáticamente. Futuros analíticos futuros sintéticos en el español medieval'. Revista de Filología Española (1-2), 65-100.
Conrad, S. (2002) 'Corpus approaches for discourse analysis'. Annual Review of Applied Linguistics 22, 75-95.
S. and D. (2001) 'Adverbial of stance in speech and , in S. Hunston and G. Thompson (eds), Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press, pp. 122-43.
Contreras, C. 'Unidad temática y variedad textual: Un tópico social en tres relatos orales' Estudios 23-39.
L. Nuevo Texto
a las Recientes Normas Actualmente en
Instituto Militar.
Natural (SEPLN) 21-7.
De Beaugrande, R. (1999) 'Reconnecting real language with real text: Text linguistics and corpus linguistics'. International Journal of Corpus Linguistics 4, (2), 243-59.
De Beaugrande, R. (2000) 'Text linguistics at the millennium: Corpus data and missing links'. Text 20, (2), 153-95.
De Beaugrande, R. and Dressler, W. (1981) Introducción a la Lingüística del Texto. Barcelona: Ariel.
de Cock, S. (1998) 'A recurrent word combination approach to the study of formulae in the speech of native and non-native speakers of English'. International Journal of Corpus Linguistics 3, 59-80.
Deerwester, S., Dumais, S., Furnas, G., Landauer, T. and Harshman, R. (1990) 'Indexing by latent semantic analysis'. Journal of the American Society for Information Science 41, (6), 391-407.
De Jonge, B. (1991) 'La interpretación de datos lingüísticos en el análisis lingüístico: numerus omen est'. Lingüística 3, 15-35.
De Kock, J. and Gómez, C. (2002) Gramática Española: Enseñanza e Investigación. Apuntes Salamanca: Ediciones Universidad Salamanca.
Delbecque, N. and B. 'La subordinación sustantiva: Las subordinadas enunciativas en los complementos verbales', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 1965-2082.
De Miguel, E. (1999) 'El aspecto léxico', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 2977-3060.
Demonte, V. (1997) La Subordinación Sustantiva. Madrid: Cátedra.
Demonte, V. and Varela, S. (1997) 'Los infinitivos nominales eventivos del
Revista y Seña 7, 123-56.
REFERENCES
, in J.
pp. 35-60.
and argument selection'.
( 'Genre of the tion and discussion sections of MSc. , in M. Coulthard (ed.), Talking about Text. Birmingham: English Language Research, Birmingham University, pp. 128-45.
Dyer, J. and Keller-Cohen, D. (2000) 'The discursive construction of professional self through narratives of personal experience'. Discourse Studies 2, (3), 283-304.
R. (1991) 'Futuro analítico y futuro sintético en tres obras con rasgos coloquiales: El Corbacho, La Celestina y La lozana andaluza', in K. H. Korner (ed.), Homenaje a Hans Flasche. Stuttgart: Steiner, pp. 499-508.
Eggins, S. and Martin, J. R. (2003) 'Context as genre: a functional linguistic perspective'. Revista Signos 36, (54), 185-205.
Ellis, N. (1996) 'Sequencing in SLA: Phonological memory, chunking, and points of order'. Studies in Second Language Acquisition 19, 91-126.
Escandell, V. (1993) 'Conectivas: el caso de la conjunción y', in V. Escandell, Introducción a la Pragmática. Barcelona: Anthropos, pp. 185-97.
Ferguson, C. A. (1983) 'Sports announcer talk: Syntactic aspects of register variation', Language in Society 12, 153-72.
Fernández, A., Vázquez, G., Martí, A. and Castellón, I. (1999) 'Los predicados de cambio y su representación en una BCL'. Online. Retrieved from: http://www.sepln.org/revistaSEPLN/revista/24/24-
Fernández, (1999) 'El pronombre personal. Formas y distribuciones. Pronombres átonos y tónicos', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 1209-74.
Ferrari, L. 'Modalidad epistémica y grados de certeza en los artículos de . Ponencia Presentada en el III de del Mercosur: De la Teoría a la Praxis de las Lenguas, Universidad Nacional del Argentina.
S. (1999) 'Los marcadores de evidencialidad en una controversia ambiental'. Discurso y
from:
Flamenco García, L. 'La coordinación adversativa', in I. V. Demonte , Gramática de la 3855-78.
Fleischman, S. The Future in and Cambridge University Press.
Flowerdew, J. (ed.) (2002) Academic Discourse. London: Longman.
Fontanella, M. (1999) 'Sistemas pronominales de tratamiento usados en el mundo hispánico', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 1399-1426.
Fortanet, I. (2005) 'Honoris Causa speeches: an approach to structure'. Discourse Studies 7, (1), 31-51.
Francis, N. (1979) 'A tagged corpus: Problems and prospects', in S. Greenbaum, G. Leech and J. Svartvik (eds), Studies in English Linguistics for Randolph Quirk. London: Longman, pp. 192-209.
Fuentes, J. (1985) Gramática Moderna de la Lengua Madrid: Editorial Bibliográfica Chilena.
Galán, C. (1999) 'La subordinación causal y final', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Española. Madrid: Espasa Calpe, pp. 3597-642.
Gallardo, S. (1999) 'Evidencialidad: la certeza y la duda en los textos periodísticos sobre ciencia'. Revista de Lingüística Teórica y 37, 53-66.
Gallardo, S. (2005) Los Médicos Recomiendan. Un Estudio de las Notas Periodísticas sobre Salud. Buenos Aires: Eudeba.
Gamero, S. (2001) La Traducción de Textos Técnicos. Barcelona: Ariel Lenguas Modernas.
Genette, G. Paris: Le Seuil (Points).
Ghadessy, M. (1993) Register Theory and Practice. London: Pinter.
Gili
de Sintaxis Española. Barcelona: Vox.
of classification in LSP
Symposium on LSP, Copenhagen, Denmark.
genres.
Glaser, R.
Fachsprache.'
Gnutzmann, C. and
verbs based LSP-research: Theoretical considerations
in H. Schroder
and Text
Léxico Fundamental
Ediciones Universitarias de
and Martin, J. Science: Discursive Power. Pittsburgh: University of Pittsburgh Press.
Hartley, J. and Kostoff, R. (2003) 'How useful are "key words" in scientific journals?'. Journal of Information Science 29, (5), 433-8.
Harvey, A. (2002) 'Representación e imagen del quehacer científico en los medios de comunicación', in G. Parodi (ed.), Lingüística e Interdisciplinariedad. Desafíos del Nuevo Milenio. en Honor a Marianne Peronard. Valparaíso: Ediciones Universitarias de Valparaíso, pp. 335-53.
Heinemann, W. (2000) 'Textsorten. Zur Diskussion um Basisklassen des Kommunizierens. Rückschau und Ausblick', in K. Adamzik (ed.), Textsorten. Reflexionen und Analysen. Tübingen: Stauffenburg, pp. 9-29.
Heinemann, W. and Heinemann, M. (2002) Grundlagen der Textlinguistik. Tübingen: Max Niemeyer Verlag.
Heinemann, W. and Viehweger, D. (1991) Textlinguistik: eine Einführung. Tübingen: Narr.
Hernández, C. (1986) Gramática Funcional del Español. Madrid: Gredos.
Hernández, C. (1996) Gramática Funcional del Español. Madrid: Gredos.
Hernández, C. (2000a) 'Morfología del verbo. La auxiliaridad', in M. Alvar (ed.), Introducción a la Lingüística Española. Barcelona: Ariel, pp. 195-213.
Hernández, C. (2000b) 'Sintaxis: La subordinación', in M. Alvar (ed.), Introducción a la Lingüística Española. Barcelona: Ariel, pp. 391-407.
Hernanz, M. (1999) 'El infinitivo', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 2197-356.
Hoey, M. Textual Interactions: An Introduction to Written Discourse London: Routledge.
London: Allen and Unwin.
On the
Hood, S. and Martin, J. 'Invocación de actitudes: El juego de la gradación de la valoración en el discurso'. Revista 38 195-220.
In Hopper, P. and
and Written Cambridge
Horowitz, R. and 'Comprehending oral and written language: Critical contrasts for and schooling', in R. Horowitz and Samuels Oral and Written San Press, pp.
Variation. Amsterdam:
Press.
S. and G. (2001) Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press.
Hunston, S. and Thompson, G. (eds) (2006) System and Corpus: Exploring Connections. London: Equinox.
K. (1998) Hedging in Research Articles. Amsterdam:
K. (1999) 'Disciplinary discourses: Writer stance in research articles', in C. Candlin and K. Hyland (eds), Writing Texts, Processes and Practice. London: Longman, pp. 99-121.
K. (2000) Disciplinary Discourses: Social Interactions in Academic Harlow: Pearson.
Hyland, K. (2002) 'Genre: Language, context, and literacy'. Annual Review of Applied Linguistics 22, 113-35.
Hyland, K. (2005) 'Stance and engagement: A model of interaction in academic discourse'. Discourse Studies 7, 173-92.
Hymes, D. (1984) 'Sociolinguistics: Stability and consolidation'. International Journal of the Sociology of Language 45, 39-45.
Iuliano, R. (1976) 'La perífrasis ir + a + infinitivo en el habla culta de Caracas', in F. Aid, M. Resnick and B. Saciuk (eds), 1975 Colloquium on Washington: University Press, pp. 59-86.
Jakobson, R. (1961) and Poetics. New York: Wiley.
R. (1985) 'Note-taking as register'. Discourse Processes 84, 437-54.
y (1994) la Science. Formes Paris: PUF.
S.
in meaning criteria and the logic of
8, 183-288.
Kaiser, D. del autor en los textos académicos: Un estudio contrastivo de de estudiantes de Venezuela y Alemania'. Boletín de 53-68.
Kintsch, W. 'Metaphor comprehension: A Psychonomic Bulletin and Review 7, , 257-66.
Kintsch, W. (2001) 'Predication'. Cognitive Science 25, (2), 173-202.
Kittredge, R. (1982) 'Variation and homogeneity of sublanguages', in R. Kittredge and J. Lehrberger (eds), Sublanguages: Studies of Language in Restricted Semantic Domains. Berlin: de Gruyter, 145-89.
Klavans, J. and Kan, M. (1998) 'Role in verbs document Proceedings of the Coling-Acl, 680-86.
Koch, P. and Oesterreicher, W. (1990) Sprache in der Romania: Französisch, Italienisch, Spanisch. Tübingen: Niemeyer.
Kornfeld, L. and Resnik, G. (2002) 'Sintagmas terminológicos con adjetivos pasivos'. Actas VII Simposio Iberoamericano de Terminología. Cartagena: Red Iberoamericana de Terminología.
Kovacci, O. (1990) El Comentario Gramatical. Madrid: Arco Libros.
Kovacci, O. (1992) 'El ordenamiento del texto (I), La coordinación', in O. Kovacci (ed.), El Comentario Gramatical. Práctica. Madrid: Arco/Libros, pp. 226-40.
Kovacci, O. (1993) 'La didáctica de la lengua materna. Experiencias en la Argentina'. Proceedings of the I Congreso Internacional sobre la Enseñanza del Español. Madrid: CEMIP.
Kovacci, O. (1999) 'El adverbio', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 705-86.
Kress, G. and van Leeuwen, T. (1996) Reading Images: The London: Routledge.
Kuguel, I. (2006) 'La Semántica del Léxico especializado: Los términos en los Textos de Unpublished doctoral thesis. Universidad de
An Things: What Reveal
and Press.
the computational basis of learning and
from LSA'. and Motivation
y medio de
26th Annual
pp. 234-54.
edge" theories: Can the genre of the textbook accommodate both?', in J. Flowerdew Academic Discourse. London:
Lazaraton, A. (2002) 'Quantitative and qualitative approaches to discourse . Annual Review of Applied Linguistics 22, 32-51.
Leech, G. (1991) 'The state of the art in corpus linguistics', in K. Aijmer and B. Altenberg (eds), English Corpus Linguistics. Studies in Honour of Jan Svartvik. London: Longman, pp. 8-29.
Leech, G. (1992) 'Corpora and theories of linguistic performance', in J. Svartvik (ed.), Directions in Corpus Linguistics: Proceedings of Nobel Symposium. Berlin: de Gruyter, pp. 105-22.
B. (1993) English Verb Classes and Alternations. A Preliminary Investigation. Chicago: The University of Chicago Press.
Lledó, E. (1995) 'Usos lingüísticos y género'. Textos de Didáctica de la Lengua y la Literatura 6, 29-34.
Longacre, R. (1983) The Grammar of Discourse. New York: Plenum.
Blanch, J. M. (ed.) (1977) Estudios Sobre el Español Hablado en las Ciudades de América. Mexico Universidad Nacional Autónoma de México.
J. (1991) Estudios Sobre el de México. Mexico Universidad Nacional Autónoma de México.
(1999) 'Relaciones , in I. Bosque and V. Demonte (eds), Gramática Española. Madrid: Calpe, pp. 3507-47.
C. (2001) 'La comunicación del saber en los géneros académicos: de modalidad de evidencialidad'.
'Las el Bernal and J. DeCesars Paz Battaner: Barcelona: Institut Universitari , pp. 147-59.
Manning, C. and Schütze, H. (2003) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Marcos, F. (1975) Aproximación a la Gramática Española. Madrid: Editorial Cincel.
Markkanen, R. and Schroder, H. (2000) 'Hedging: A challenge for pragmatics and discourse analysis'. Online. Retrieved from: http://sw2.euv-frankfurt-o.de/Publikationen/Hedging/markkane.html.
Martín, G. (1986) Curso de Redacción. Madrid: Paraninfo.
Martin, J. (1992) English Text. Amsterdam: Benjamins.
Martin, J. (1993) 'Technicality and abstraction', in M. Halliday and J. Martin, Writing Science: Literacy and Discursive Power. Pittsburgh: University of Pittsburgh Press, pp. 23-46.
Martin, J. (1997) 'Register and genre: Modeling social context in functional linguistics - narrative genres', in E. Pedro (ed.), Proceedings of the First Lisbon International Meeting on Discourse Analysis. Lisbon: Colibri/APL, pp. 212-56.
Martin, J. (1998) 'A modeling context: The crooked path of progress in contextual linguistics (Sydney SFL)', in M. Ghadessy (ed.), Text and Context in Functional Linguistics. Amsterdam: Benjamins, pp. 134-87.
Martin, J. (2001) 'Beyond exchange: APPRAISAL systems in English', in S. Hunston and G. Thompson (eds), Evaluation in Text: Authorial Stance and the Construction of Discourse. Oxford: Oxford University Press, pp. 34-55.
Martin, J. and Rose, D. (2003) Working with Discourse: Meaning Beyond the Clause. London: Continuum.
Martin, J. and Veel, R. (eds) (1998) Reading Science: Critical and Functional Perspectives on Discourse London: Routledge.
Martin, J., Christie, F. and Rothery, J. (1987) 'Social processes in education. A reply to and Watson others)', in I. Reid (ed.), The Place of Genre in Learning: Current Debates. Geelong, Australia: Deakin University Press, pp. 46-57.
P. 'A genre of and Spanish research paper social sciences'. Specific
S.
and Portolés, J. 'Los marcadores del dis-
and V. Demonte Gramática de la
Madrid: pp. 4051-213.
Press.
Universitarias de
Demonte
Madrid: Espasa pp. 1575-630.
Mendikoetxea, A. (1999b) 'Construcciones con se: Medias, pasivas e impersonales', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 1631-722.
S. (1999) 'El discurso del libro de texto: Una propuesta estratégico-pragmática'. Revista Iberoamericana de Sociedad 1, (2), 85-104.
G. (2003) 'Paradigma científico y lenguaje especializado'. Revista de la Facultad de de la Universidad Central de Venezuela 18, (3), 5-14.
Moliner, M. (1986) Diccionario de Uso del Español. Madrid: Gredos.
Montolío, E. (2001) Conectores de la Escrita. Barcelona: Ariel Practicum.
R., Castellón, J. and G. 'Los verbos de ria'. Online. Retrieved from: http://grial.uab.es/archivos/2000-13.pdf.
Moreno de Alba, J. G. (1970) 'Vitalidad del futuro de indicativo en la norma culta del español hablado en México'. Anuario de Letras 8, 81-102.
E. (2000) Comunicar Ciencia. El Artículo y las a Buenos Aires: Universidad Nacional de Lomas
Parodi, G. 'Textos de y comunidades discursivas técnico-profesionales: Una aproximación basada en corpus computarizado'. Estudios Filológicos 39, 7-36.
Parodi, G. (ed.) (2005a) Discurso Especializado e Instituciones Formadoras. Valparaíso: Ediciones Universitarias de Valparaíso.
Parodi, G. (2005b) 'La comprensión del discurso escrito en ámbitos técnico-profesionales: ¿Aprendiendo a texto?'. Revista Signos 38, (58), 221-67.
Parodi, G. (2005c) 'Lingüística de corpus y análisis multidimensional: Exploración de la variación en el corpus PUCV-2003: Una aproximación multiniveles', in G. Parodi (ed.), Discurso Especializado e Instituciones Formadoras. Valparaíso: Ediciones Universitarias de Valparaíso, pp. 83-125.
Parodi, G. (2006) 'Reading-writing connections: Discourse-oriented research'. Reading and Writing Interdisciplinary Journal July, 1-26.
Parodi, G. (2007) Lingüística de Corpus. Buenos Aires: Eudeba.
Parodi, G. and Gramajo, A. (2003) 'Los tipos textuales del corpus PUCV-2003: una aproximación multiniveles', Revista 36, 207-23.
Parodi, G. and Venegas, R. (2004) 'BUCOLICO: Aplicación computacional para el análisis de textos (hacia un análisis de rasgos de la informatividad)'. y Literatura 15, 223-51.
A. and J. 'From to
'La conferencia académica', in L. Cubo de Severino , Los Textos de la Ciencia. Córdoba: pp. 189-217.
M. (l un texto escrito?', in M. L. Gómez, G. Parodi and P. Núñez , de Textos Escritos: De la Teoría a Sala de Clases. de Chile: Andrés Bello, pp. 55-78.
M., Gómez, L., G. and Textos Escritos: De la Teoría a la Sala de Clases. Andrés Bello.
tasks'. Universidad de Granada, Spain. Online. Retrieved from: http://www.andrew.cmu.edu/user/jquesada/ / dissertation/.
Quesada, J., Kintsch, W. and Gómez, E. (2002) 'A theory of complex problem solving using latent semantic analysis', in W. Gray and C. Schunn (eds), Acts of the 24th Annual Conference of the Cognitive Science Society. Mahwah, NJ: Erlbaum, pp. 750-5.
RAE (1973) Esbozo de una Nueva Gramática de la Lengua Española. Madrid: Espasa Calpe.
Ratteray, O. (1985) 'Expanding roles for summarized information'. Written Communication 2, (4), 457-72.
Recski, L. (2005) 'Interpersonal engagement in academic spoken discourse: a functional account of dissertation defenses'. English for Specific Purposes 24, 5-23.
Reid, I. (ed.) (1987) The Place of Genre in Learning: Current Debates. Geelong, Australia: Deakin University Press.
Reppen, R., Fitzmaurice, S. and Biber, D. (2002) Using Corpora to Explore Linguistic Variation. Amsterdam: Benjamins.
Rojo, G. (2002) 'Sobre la lingüística basada en análisis de corpus'. Online. Retrieved from: http://www.uzei.com/corpusajardunaldia/Ol_grojo.pdf.
Rose, D. (2005) 'Science, technology and technical literacies', in F. Christie and J. Martin (eds), Genre and Institutions: Social Processes in the Workplace and School. London: Continuum, pp. 40-72.
Russell, B. (1937) Principles of Mathematics. London: George Allen and Unwin.
Sabaj, O. (2004a) 'El Comportamiento de los Verbos Abstractos en el Corpus PUCV-2003'. Unpublished doctoral thesis. Valparaíso: Pontificia Universidad Católica de Valparaíso.
Sabaj, O. (2004b) 'Especificidad, especialización y variabilidad verbal: Una en estadística léxica'. Revista Signos 37, (56), 75-89.
Sabaj, O. (2006) 'El uso de los participantes semánticos en los predicados de cambio de estado del español: una basada en . Revista y Literatura 17, 267-302.
Sáez, L. del futuro en
Salazar, O. de oralidad en la escritura: análisis de por escolares', in M. and J. in , Actas del XI Internacional de la Asociación de Filología de la América Latina, Las Palmas de Gran Canaria, 1695-1704.
Salem, A. (1987) Pratique des Répétés. Paris: Institut National de la Langue Française.
Sampieri, R., Fernández, C. and Baptista, P. (2003) Metodología de la Investigación. Mexico City: McGraw-Hill.
Sampson, G. (2005) 'Quantifying the shift towards empirical methods'. International Journal of Corpus Linguistics 10, (1), 15-36.
Sánchez, C. (1999) 'La negación', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 2561-634.
Sandhofer-Sixel, J. (1990) 'Emotionale Bewertung als Modale Kategorie'. Grazer Linguistische Studien 33/34, 267-78.
Schiffrin, D. (1987) Discourse Markers. Cambridge: CUP.
Schleppegrell, M. J. (2001) 'Linguistic features of the language of schooling'. Linguistics and Education 12, (4), 431-59.
Schroder, H. (1991) 'Linguistic and text-theoretical research on languages for special purposes. A thematic and bibliographical guide', in H. Schroder (ed.), Subject-oriented Texts: Languages for Special Purposes and Text Theory. Berlin: de Gruyter, pp. 1-48.
Sedano, M. (1994) 'El futuro morfológico y la expresión ir a + infinitivo en el español hablado de Venezuela'. Verba 21, 225-40.
Sedano, M. (2005) 'Futuro morfológico y futuro perifrástico en el español hablado y escrito', in Análisis de Estructuras Lingüísticas. Paper presented at the XIV International Congress of ALFAL, Monterrey, Mexico.
Sedano, M. (in press) 'Futuro simple y futuro perifrástico en la prensa escrita', in C. (ed.), El En América. Diacronía, Diatopía e Historiografía. G. Moreno de Alba en su 65 Aniversario. Mexico City: Universidad Autónoma de México.
Sigley, R. 'The influence of and channel on relative pronoun choice in New Zealand 1, 207-32.
Silva-Corvalán, C. and Terrell, T. 'Notas sobre la expresión de futuridad en el del Caribe'. 2, 190-208.
Silvestri, A. Discurso Instruccional. Buenos Aires: Eudeba.
P.A.AU 1992: Estudios
Barcelona: Institut Universitari de
Sinclair, J. (1989) 'Reflections on computer corpora in English language research', in S. Johansson (ed.), Computer Corpora in English Language Research. Bergen: Norwegian Computing Centre for the Humanities, pp. 1-6.
Sinclair, J. (1991) Collocation. Oxford: Oxford Press.
Sinclair, J. (1996) 'The empty lexicon'. International Journal of Corpus Linguistics 1, (1), 99-119.
Sinclair, J., Hoey, M. and Fox, G. (eds) (1993) Techniques of Descriptions: Spoken and Written Discourse. London: Routledge.
Soll, L. (1968) 'Synthetisches und analytisches Futur im modernen Spanischen'. Romanische Forschungen 80, 239-48.
Stubbs, M. (1996) Text and Corpus Analysis. Computer-assisted Studies of Language and Culture. MA: Blackwell.
Stubbs, M. (2001) Words and Phrases: Corpus Studies of Lexical Semantics. Oxford: Blackwell.
Stubbs, M. (2006) 'Corpus analysis: the state of the art and three types of unanswered questions', in S. Hunston and G. Thompson (eds), System and Corpus: Exploring Connections. London: Equinox, pp. 15-36.
Subirats, C. (2004) 'FrameNet Español. Una red semántica de marcos conceptuales'. Online. Retrieved from: Leipzig_Paper.pdf.
J. (1992) (ed.) Directions in Berlin: Mouton de
(1981) 'Definitions in science and law: A case for ESP matters'. 106-12.
J. (1990) English in Academic and Research Press.
and fragmented worlds: EAP materials and corpus , in J. Flowerdew Academic Discourse. London: Longman, pp. 150-64.
Swales, J. Research Genres: Press.
Torrego, E. (1999) 'El complemento directo preposicional', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 1779-806.
Tottie, G. (1983) Much about 'Not' and . A of the Variation between Analytic and Synthetic Negation in Contemporary American English. Lund: CWK Gleerup.
Tottie, G. (1991) in English and A Study in Variation. San Diego: Academic Press.
Troya, M. (1998) Perífrasis Verbales de Infinitivo en la Norma Lingüística Culta de Las Palmas de Gran Canaria. Madrid/Las Palmas: Real Academia Española/Universidad de Las Palmas de Gran Canaria.
Turney, P. (1997) Extraction of Keyphrases from Text: Evaluation of Four Algorithms. Ottawa: National Research Council Canada, Technical Report, ERB-1051.
Turney, P. (1999) Learning to Extract Keyphrases from Text. Ottawa: National Research Council, Institute for Information Technology, Technical Report, ERB-1057.
Tusón, A. (1991) 'Las marcas de la oralidad en la escritura'. Signos. Teoría y Práctica de la Educación 3, 14-19.
Unworth, L. (ed.) (2000) Researching Language in School Communities: Functional Linguistic Perspectives. London: Continuum.
M. and Ruppenhofer, J. (2001) 'Shouting and screaming: manner and noise verbs in communication'. and Linguistic Computing 16, (1), 77-97.
Ure, J. (1982) 'Introduction: Approaches to the of register range'. of the of Language 5-23.
van Dijk, T. Estructuras y Funciones. Mexico Siglo XXI.
van Dijk, T. (1983) La Ciencia del Texto: Un Interdisciplinario. Barcelona: Paidós.
van Dijk, T. (2001) de la teoría del contexto'. Revista Latinoamericana de Estudios del Discurso 1, (1), 69-82.
van T. (2002) de conocimiento en el procesamiento del discurso', in G. Parodi , e Desafíos del Nuevo Milenio. en Honor a Marianne Peronard. Ediciones pp. 43-66.
¿Cómo se escribe
'Clasificación verbal: 3. Lleida: Edicions de la
A. and M. A. (2002) 'Léxicos verbales computacionales', in M. Martí and J. Llisterri (eds), Tratamiento del Natural. Barcelona: Universitat de Barcelona, pp. 29-60.
Venegas, R. (2003) 'Análisis semántico latente: Una panorámica de su desarrollo'. Revista (53), 121-38.
Venegas, R. (2005) 'Las Relaciones Léxico-semánticas en Artículos de Investigación Científica: Una Aproximación desde el Análisis Semántico Latente'. Unpublished doctoral thesis, Pontificia Universidad Católica de Valparaíso, Chile.
Venegas, R. (2006) 'La similitud léxico-semántica en artículos de investigación científica en español: Una aproximación desde el Análisis Semántico Latente'. Revista Signos 39, (60), 75-106.
E. (1997) 'Modalization: Probability - an exploration into its role in academic writing', in A. Duszak (ed.), Culture and Styles of Academic Discourse. Berlin: de Gruyter, 157-81.
Shalom, C. S. The of Berlin: Lang.
E. and Bentivoglio, P. ) 'Verbs of cognition in spoken Spanish: A discourse profile', in S. Fleischman and L. Waugh , Discourse and the Verb: The Evidence Romance. London: Routledge, 114-98.
R. 'The role of formulaic language in second language A review'. 180-205.
en
Gredos.
as Communication. London:
K (1998)
Gernsbacher and S.
of Science.
patterns of lexis
International
'98,
201-3
<tr·citt,mP'and markers of Latin American Literature

136, 140-2, 143 15, 16, 17
cµ1~uc111n. stancc 95, 96, 97 learner corpora, studies on
J.l:S-ARTICO 205 lexical bundles 217-18
94 association between structural and
and affect in genres 94 functional characteristics 229-30
discourse-organizing 226, 228-9,
abstracts, in scientific articles 197-8, de Referencia de la Lengua factor analysis 20-1, 63-5 230
211 Español 8 emerged factors and dimensions distributions of 230
academic discourse 12-13 de Referencia del Español 21-31 functional classification of 223-5
ARTHUS7 Actual (CREA) 6 functions of 221
factor seores and relationships
ARTICO 204, 205, 214 Corpus del Español 8, 59 discourse organizers 221, 223,
among corpora 24
AKfICOS 128 Corpus Diacrónico del Español 230
factors 21-31
attitudinal communication verbs 113, (CORDE) 6 seores 24 referential functions 221, 223,
116, 120, 126, corpus-based analysis 11 231
functional criterion 153-4
attitudinal stance 95 COTECA project 91-2, 93 stance expressions 221, 223
future tense forms, origin and
AutoTutor 198 CPP 130 identification of 219
evolution 132-3
Cramer's V test 117, 119-20 significance of authors regarding NP /PP-based bundles 229, 230,
Base de Datos Sintácticos 7 CTC 128 use 135 231
Biber, study of linguistíc variation oc curren ce of 231
categories 14, 148
14 data 116-17 referential 227-8, 228-9, 230, 231
Bosani's scheme 110, 111 DEEB 129 stance bundles 225-6, 228-9, 230
genre knowledge 92
Bwananet 5, 7, 178 DETP 129 structural only 225, 228, 229
genre sets 169
DICIPE 129 genre theory 145 structural types of 222
CEO 131 texts, corpora of 11 genres 92, 96 VP-based bundles 228, 229, 230,
109 dimensions 21-3l, 55, 147 academic 93 231
192 Commitment Focus 26-8 Bazerman's system of genres lexical-semantic relations 200, 212
CLL 130 Contextual and Interactive Focus 169 lexical-syntactic interface llO
collocational associative ""-~'"""'' 21-4,31 MD notion of 96 linguistic co-occurrence 56
Informational Focus 3l oral academic 93 linguistic features, computing of
199-200 of 65-82 global text patterns 92 62-3
communicatíon verbs 106, 110-11, awcw~c 14
gradience 148
113, 1 120-6 narrativc dimension 84 linguistic occurrence 55, 57
grammatical tagger 59-60, 62, 75
computer corpus semantics 200 Narrative Focus 24-6, 31 linguistic variation, study of 14
corpora 11 seores 66 from multi-dimensional
Habla Culta 218-19
direct 109 14
6,
181, 183,
features 19, 41-53 185-8
INDEX
notion of94
12
142-3 stance 94, 95, 96
use in different areas 134-42 126 157-61
multi-dimensional 14, 55, Protoverbs 119 Didactic Guidelines 159,
57-9 invoked audiencc 161-3
differences between and 56 of
other languages 85 register differences 56 Technical Article 159, 160-1
methodological steps 58-9 register, factor for linguistic variation style stance 95 textbooks 170
of register variation in Spanish 54-5 subjective assessment expressions textual criteria 154-5
59-65,82 register features 56 189, 190 textual organizer 182-3, 184
salient characteristics 57-8 register, Halliday's definition of
multi-functional analysis, performing 57 tag fields 62 variability in PUCV-2003 corpus 32-8
a 18 register markers 56 tagger, grammatical 59-60, 62, 75 verba dicenci 11 O
multimodality, and texts 157 register, the concept of 95 Technical-Scientific (TSC) verbs, studies on 11 O
multiple feature analysis 110 register variation 11-12, 54-6, 57, 15, 16, 150 verbs and corpora 110-13
multi-word prefabricated expressions 59-65,84, 126 temporal distance 136, 137-8,
217 general patterns of 84 142-3 written competence, of students 173,
variation across registers 14 tests oflinguistic competence 145 175
neutral communication verbs 113, registers 54-6
116, 119-20, 126 of Spanish 115
NOTICEN TV 130 relationships between 24
researching Spanish
focalization 14 computational resources 6-1 O
Oral Interviews Corpus ( OIC) 15, 17 databases 6-10
orality 93, 96, 104 references 6-1 O
orality/writing dyad 93 websites 6-10
paratactic conjunctions 177-90, 191-2 scientific and academic

as textual organizers 182-4 communication research
occurrences l 93n. 9 90
pattems of use 184-90 scientific research articles 195-7,
in use of 212
181-2 abstract research, from computer-
periphrastic future (PF) 132, 142-3 oriented perspective 198
use in different areas 134-42 abstracts 197-8, 211
person ofthe verb 136, 138-40, 143 198-9, 211
popular science talk genre 96-7, 98, 199-200
104 201,
prepositional complements 106,
107-10 situational criteria 151-3
definition l 07 Spanish, features of 55-6

