Working with Spanish Corpora
Lee ch
Feng
All rights reserved. No part of this publication may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying, recording, or any information
storage or retrieval system, without prior permission in writing from the publishers.
Chapter II:
Variation across registers in Spanish: Exploring the
El Grial PUCV Corpus
Giovanni Parodi
Pontificia Universidad Católica de Valparaíso
Chile
… courses … several research … and abroad. She has been awarded national and international programs and distinctions. She is the author of one book, Hendidas y otras construcciones con ser en el habla de Caracas, and co-author of two others. She has published over 60 articles and specialized books.
… at Universitat Pompeu Fabra since 1993. She graduated in (Language) and she received her doctorate in Philosophy and Sciences of Education from Barcelona University. She has carried out research on discourse analysis, text linguistics, written communication and Spanish language learning as L1 and L2. Currently she is taking part in scientific research projects funded by governmental and competitive foundations. She is a member of the Discourse Studies Network (XED) and of the Languages Acquisition, Learning and Teaching Research Group (GR@AL) at the UPF. She has published several books and articles on discourse analysis, text linguistics and language learning in Spanish.

GIOVANNI PARODI is presently head of the Postgraduate School of Linguistics at Pontificia Universidad Católica de Valparaíso, Chile, and editor of Revista Signos, Estudios de Lingüística. He obtained an M.A. in Applied Linguistics and later received his Ph.D. in Linguistics. His major fields of interest are text linguistics, discourse psycholinguistics (reading comprehension and written production processes) and corpus linguistics. Currently he is conducting research in specialized academic/professional written discourse, press media discourse analysis and computational resources through three grants funded by major Chilean research foundations and international programmes, such as ECOS and UNESCO UNITWIN Chairs. His publications include articles in Spanish and English journals and several books published by EUDEBA (2005, 2007) and EUVSA (1999, 2002, 2003, …). He has also edited four other interdisciplinary books.

… is a Ph.D. candidate in … Arizona, USA. Her research interests include applications of corpus linguistics to language teaching and instructed second language acquisition. Currently she is working on completing her dissertation, which focuses on the acquisition of the preterite and imperfect by instructed L2 learners of Spanish. She has been co-author of a meta-analysis of the effects of task-based interaction on second language acquisition, which appeared in Synthesizing Research on Language Learning and Teaching (John Benjamins, 2006), and another article with Doug Biber, to appear in the new journal Corpora. She has considerable teaching experience, having taught EFL in Argentina and Peru, Spanish as a foreign language in the USA, ESL in the USA, and teacher training at Northern Arizona University.

RENÉ VENEGAS is professor at the Pontificia Universidad Católica de Valparaíso, Chile. He has a Ph.D. in Linguistics. He teaches linguistics and semantics to undergraduate and postgraduate students. His research interests are academic discourse, the study of meaning with computer tools, the development of computer tools for text analysis and oral argumentation. Dr Venegas is a member of the Chilean Linguistic Society (SOCHIL) and the Latin American Discourse Analysis Association (ALED).
indexes
multi-dimensional
MD
MF
neutral communication verb
National Endowment for the Humanities
Giovanni Parodi
Pontificia Universidad Católica de Valparaíso
Chile
… focus of the book is to … across different … countries. At the same time, … and written … as well as … are considered. Various structures are focused on and based on corpora collected from different Spanish-speaking countries or communities. It is quite … to compare the results of research conducted on Spanish with … and other languages. As stated above, the population … is … daily and Spanish is becoming more and more relevant on the world stage, hence one chapter comparing research on Spanish with research on three languages other than Spanish is also included.

The nine chapters that follow this one represent a wide range of research on Spanish, not only because of the various countries, institutions and diverse backgrounds of their authors, but also because of the specific topics they focus on. However, even though they are rich and varied in their approaches, they all contribute to the key aim of the book: the focus on linguistic variation across registers in the Spanish language.

By and large, journals and books on language corpora today are dominated by research conducted on English. The collection of works presented here brings together an updated review of the rich and vast research currently being conducted on Spanish using corpus linguistic methodology. This is the first time such research has been gathered together in one volume. The common ground for this compilation of essays lies in their use of spoken and written corpora. In addition, all the essays in this book are concerned with empirical text analysis through the examination of authentic and diversified corpora.

Some of the chapters are specifically concerned with language teaching/learning processes in schools and universities, whilst others are more interested in the phenomena of genre description and discourse variation across different registers or in cross-linguistic research. Together they illustrate the broad range of corpus-based studies now being carried out in many Spanish-speaking countries, and even in countries where Spanish is not the official language, as is the case with the USA.

This book is aimed primarily at English-speaking linguists, specifically those interested in Spanish and in contrastive linguistics and contrastive rhetoric; at undergraduate and postgraduate students who study the Spanish language and whose programmes focus on contrastive linguistics and/or on contrastive rhetoric; and at English-speaking teachers of Spanish, grammarians and discourse researchers. The secondary audiences of the book are linguists of all languages, language students, researchers and teachers of Spanish, as well as language teachers in general.

This introductory chapter provides an overview of the book, describing its purpose and … on all of the chapters in order to explain how … a set that reveal how research, conducted … in Latin America, is filling the gap in corpus-based studies.

At the end of this introductory chapter, we provide a collection of references to websites and computational tools available on the internet that deal with the Spanish language and Spanish corpora.

In Chapter II, Giovanni Parodi, from Pontificia Universidad Católica de Valparaíso, Chile, uses a multi-dimensional approach, based on multivariate statistical analysis, to study variation across written and spoken, and specialized and non-specialized registers of Spanish texts (PUCV-2003 Corpus, almost 2.5 million words). Sixty-five salient linguistic features with functional and communicative implications are determined to be relevant in Spanish. Variation in frequencies across the texts and the features provided evidence for five relevant dimensions. The multi-feature and multi-dimensional analysis shows that the emerging dimensions tend to identify variation between written and spoken registers and technical and non-technical texts, with Informational Focus being the most relevant dimension with regard to accounting for the written technical-scientific corpus of Spanish (TSC).

Chapter III gives an account of research conducted by Douglas Biber and Nicole Tracy-Ventura at Northern Arizona University, USA. Following the well-known multi-dimensional (MD) methodological approach, initially designed by Douglas Biber himself, that applies multivariate statistical techniques (especially factor analysis) to the investigation of register variation in a language, Biber and Tracy-Ventura focus on the analysis of Spanish based on a corpus that comes from the twentieth-century component of the NEH-funded Corpus del Español (18.2 million words; 4049 texts; 19 registers). The Spanish MD analysis offers further evidence for both of the following two major patterns: the existence of cross-linguistic universals, together with distinctive dimensions associated with each language/culture. An emerging comparison is made between the Flagstaff study and the Valparaíso study (Chapters III and II, respectively). Interestingly, these studies and others focusing on languages other than Spanish, such as Korean and Somali, have found some striking similarities in the underlying 'dimensions' that distinguish between registers in these diverse languages, raising the possibility of universal patterns of register variation.

As part of Chapter IV, Guiomar Ciapuscio, from Universidad de Buenos Aires, Argentina, looks at the notion of spoken academic discourse genres based on a new corpus in construction - the COTECA (Argentina's Scientific Spanish Text Corpus: Genre, Lexico/Grammatical and Terminological Research) - which seeks to contribute to the knowledge of spoken and written Spanish used by Argentinian academic researchers and specialists. In this context, Guiomar Ciapuscio explains that the first stage of the COTECA
2.7
Name of web page: …
Webpage: www.corpusdelespanol.org
Description (or comment): This is a 100-million word corpus, funded by NEH during the years 2001 and 2002. Mark Davies, Brigham Young University, USA, has created this corpus. It has a very powerful research computational engine, so large that several corpora can be researched at the same time.

2.8
Name of web page: Estudios de Lingüística Española (ELiEs)
Webpage: http://elies.rediris.es/elies18/index.html
Description (or comment): This is the location of the article by Chantal Pérez Hernández, from the Universidad de Málaga. The article is Explotación de los córpora textuales informatizados para la creación de bases de datos terminológicas basadas en el conocimiento.

2.9
Name of web page: Laboratorio de Lingüística Informática
Webpage: www.lllf.uam.es/corpus/corpus_oral.html
Description (or comment): Here we find the Corpus de Referencia de la Lengua Española Contemporánea, collected by members of the Laboratorio de Lingüística Informática from the Universidad Autónoma de Madrid, Spain. One of its achievements is providing a 1-million word corpus based on spoken texts and implemented with sound recordings.

… on analysis of information structure to text retrieval since 1986. From 1990 on the group widened its areas of research to include natural language processing and computational linguistics.

2.11
Name of web page: Grupo de Ingeniería Lingüística
Webpage: http://iling.torreingenieria.unam.mx/
Description (or comment): This website belongs to the Grupo de Ingeniería Lingüística (GIL) of the Universidad Nacional Autónoma de México (UNAM). There is basic information about the principles for constructing and analysing a corpus. In addition, there is a complete account of the research activities of GIL members and a wide repertoire of related links.

2.12
Name of web page: Fundación Biblioteca Virtual Miguel de Cervantes
Webpage: www.cervantesvirtual.com/herramientas
Description (or comment): In the section Linguistic Tools of the website belonging to the Fundación Biblioteca Virtual Miguel de Cervantes, this is an Advanced Text Browser offering Manila literary texts. There are several options for the analysis and … of the digital corpus.

2.13
Name of web page: Lingüística de Corpus
Webpage: http://liceu.uab.es/~joaquim/language_resources/lang_res/biblio_corpus.html
Description (or comment): Here we find a wide … of bibliographic resources related to corpus linguistics and written discourse.
…atizada de Documentos is a group of researchers from the Universitat Autónoma de Barcelona and the Universitat Pompeu Fabra, Barcelona, Spain. It is funded by the Ministerio de Ciencia y … de España. The website offers online programs and a growing corpus of texts.

2.15
Name of web page: Español Interactivo de la Escuela para Estudiantes Extranjeros
Webpage: www.uv.mx/eee/sp_interactivo/index.htm
Description (or comment): In this website, we find interactive exercises for learning Spanish as a foreign language. The principles guiding the exercises are based on corpus linguistics. It belongs to the Universidad Veracruzana de México.

Pontificia Universidad Católica de Valparaíso
Chile

Introduction

The importance of register variation across disciplines as an explanatory factor for diverse knowledge construction within discourse communities has been increasingly recognized over the past decade. The perception that there is no core disciplinary discourse per se and that it is better to talk about disciplinary discourses in the plural (Hyland 2000) is becoming more accepted among researchers (Bhatia 2004). Empirical findings based on various approaches have documented the importance of corpus-based analysis as a way to advance and describe in detail the variation across disciplines and across text types (Biber 1988, 2003, 2005; Parodi 2004, 2005a; Bhatia 1993, 2004; Flowerdew 2002; Martin and Veel 1998; Wingell 1998; Williams 1998).
Corpora of natural, annotated texts have had a significant impact on linguistic analyses over the previous two or three decades. In particular, research into the English language, as well as certain European and Asian languages, has revealed that linguistic studies based on large corpora of digital texts do not always corroborate the researchers' initial intuitions. The use of computer-supported corpora as well as the availability of computer programs that help in dealing with them has boosted linguistic investigation in a way that was previously unthinkable.

Unfortunately, research describing linguistic features of specialized academic discourse based on Spanish data and used in technical-professional school settings is relatively scant or non-existent. Most of the research produced in the Spanish language on specialized discourse focuses on the so-called specialized-disseminating discourse (Cademártori 2003; Calsamiglia 2000; Cassany et al. 2000; Ciapuscio 2003b; Ciapuscio and Kuguel 2002; López 2002), or it addresses discourse markers in a variety of texts (Martín Zorraquino and Portolés 1999; Montolío 2001; Portolés 1998). Other studies focus on a few linguistic features in small and exemplary texts (Ciapuscio …
words
155,160 27,ss~s (6.5%)
246,374 (39%) 1,41
225,256 30,797
Total 626,790
1) …
2) …
3) functional characterization of the features selected
4) availability of computer programs that can automatically tag the texts in flat format (ASCII or txt)
5) automatic tagging and parsing of the texts in the corpus
6) manual or (semi)automatic database queries to each text to determine the occurrence of the features under study
7) elaboration of normalized data tables, given the different number of words between texts
8) application, with the aid of computer programs, of factor analysis to the frequency of feature occurrences. The reason for this is the need for a reduction of the variables involved and for the determination of co-occurrence patterns in the linguistic features
9) establishment of a set of factors (each factor is made up of a set of linguistic features) through factor analysis with some kind of rotation (Varimax, Quartimax, Oblimin, etc.)
10) functional interpretation of the factors, resulting from the factor analysis, from the co-occurrence of features, thus constituting an underlying dimension of variation
11) confirmation or refutation of the interpretation of the factors through the estimation of the factor loadings
12) estimation of the dimension scores. In this step, the scores for each register in each dimension are compared and the linguistic and functional similarities and/or differences are studied.

2.3

In order to select features for the analysis, a literature search was initially carried out with the purpose of identifying representative categories that showed functional and communicative relevance in Spanish. We searched Spanish grammar books, specialized research articles on the topic, textbooks and dictionaries on linguistics and grammar.

Based on all the literature searched, we were able to elaborate a matrix with a total of 65 linguistic features that could be identified with specific communicative and functional characterizations. Table 2.5 provides the … grouped into sixteen more general … (identified by means of …

Features

A. …
  1.-3. …
  4. Present (indicative and …
  5. Future and subjunctive
  6. … future
B. Verb mood markers
  7. Indicative/imperative
  8. Subjunctive/imperative
  9. Indicative mood
  10. Subjunctive mood
  11. Imperative mood
C. Verbal inflections
  12. First singular
  13. Second singular
  14. Third singular
  15. First plural
  16. Second plural
  17. Third plural
D. Personal pronouns
  18. First person singular
  19. First person plural
  20. Second person singular
  21. Second person plural
  22. Third person singular
  23. Third person plural
  24. Demonstrative pronoun
E. Nominal forms
  25. Nominalizations
  26. Nouns (common and proper)
F. Passive forms
  27. Passives with 'se'
  28. Passive with 'ser' without agent
  29. Passives with 'ser' with agent
  30. Passives with 'estar'
G. Lexical specificity
  31. Type/token per form relationship
  32. Type/token per lemma relationship
H. Stative active forms
  33. Durative 'ser'
  34. Non-durative 'estar'
I. …
  35. …
  36. Private
  37. Persuasive
  38. Perceptive
J. Modal verbs
  39. Possibility
  40. …
  41. Obligation
  42. Volition
K. Modality markers
  43. Hedges
  44. Boosters
L. Adverbs
  45. Place
  46. Time
  47. Manner
  48. Quantity
M. Subordination markers
  49. Noun clauses with 'que'
  50. Relative adjective clauses
  51. Adverbial clauses of reason or cause/effect
  52. Adverbial clauses of concession
  53. Adverbial clauses of condition
  54. Adverbial clauses of time
  55. Infinitive phrases with noun function
N. Prepositional phrases and adjectives
  56. Prepositional phrases (noun complement)
  57. Attributive adjectives (descriptive)
  58. Predicative adjectives
  59. Demonstrative adjectives
  60. Participles with adjectival function
O. Coordination markers
  61. Adversative, additive and disjunctive conjunctions
P. Negation markers
  62. Negation adverbs
  63. … of temporal negation
  64. Negation conjunction
  65. Negation pronouns
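Step 7 of the procedure (normalized data tables) can be illustrated with a minimal sketch. The source does not state the normalization basis, so the rate per 1,000 words used here is an assumption, as are the function name `normalize_counts` and the sample counts, which are invented for illustration only.

```python
def normalize_counts(raw_counts, text_length, basis=1000):
    """Convert raw feature counts into rates per `basis` words.

    `basis=1000` is an assumed convention; the chapter only says that
    tables are normalized because texts differ in number of words.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return {feature: count * basis / text_length
            for feature, count in raw_counts.items()}


# Hypothetical counts for two texts of different length.
short_text = normalize_counts({"nominalizations": 12, "time adverbs": 3},
                              text_length=600)
long_text = normalize_counts({"nominalizations": 40, "time adverbs": 10},
                             text_length=2000)
```

Although the raw counts differ, both hypothetical texts end up with the same normalized rates, which is the point of normalizing before any factor analysis is run.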
… advances, and …ficient to handle and carry out … corpus of this dimension, we decided to … and have … corpus annotated …. The program is called Connexor and it runs on Linux. … that some researchers and students are sceptical of approaching corpus linguistics because of their lack of computational abilities, a parallel objective was to produce a computational tool that could be easily handled and operated. We therefore had to develop a website with a windows interface. El Grial is now available and acts not only as an interface with which to interrogate the annotated corpora, but also as a database that contains all corpora collected by my research team (Parodi 2006). Several steps had to be overcome in order to achieve a fully operating system. Extensive details of the still ongoing process, the versatility of the interface, and the available tools being developed can be found at www.elgrial.cl and in Parodi (2007).

Thus, we proceeded to tag our corpus, revised it manually, so as to attain a high level of reliability, and recorded the frequency of the occurrence of all 65 linguistic features. In a more detailed description, the procedure applied to the El Grial PUCV-2003 Corpus by the Chilean team consisted of:

1) SGML (Standard Generalized Markup Language) codification
2) sentence splitter
3) morphological and syntactic annotation (El …)
4) linguistic and stochastic disambiguator.

2.5 Factor analysis

As is well known, factor analysis … groupings of … co-occur in texts. This analysis identifies correlations between a large number of variables (in this case, 65 linguistic …) and … those that have a similar distribution. The factor structure in which the variables that tend to co-occur … is the result of a correlational matrix of all the variables involved. Each group of variables is described as a factor, which is further … in terms of functional categories as … variation dimension. This reduces the variables involved by virtue of their co-occurrence (… et al. 1999; Oakes …). Once the factor … was carried out … factors were determined … Oakes …. These factors were confirmed …

In this section, we … two types of results: (1) the factors are … and the communicative functions shared by the sets of features are determined; and (2) the dis… of these groups of features … the three … LLC … and the three TSC technical-scientific areas (maritime, industrial and commerce) are analysed. In other words, we proceeded to construct a functional interpretation of the statistical parameters that were found and to … their incidence on each of the registers and areas of professional specialization.

3.1 Factors and dimensions

As mentioned above, five optimal factors emerged from our factor analysis. Loadings with an absolute value of less than .40 were excluded from the … because, in this kind of study, these features are generally deemed to have no relative significance for the interpretation, even though they may be statistically significant. Other studies use loadings with a value of .35 (Biber 1988); nevertheless, we decided that a higher figure should be used. Due to Oblimin rotation, and in order to ensure a more realistic study in which linguistic features may be present as part of different interactions in texts, we preferred not to remove a feature from subsequent factors once this was included in a previous factor. It is clear the interpretation may therefore be more complex but nevertheless more accurate.

The results, presented in the following tables, show the features and their corresponding positive and negative factor scores in each of the five dimensions.

FACTOR 1

Dimension 1: Contextual and Interactive Focus

… adverbial clauses                     .945
Time adverbs                            .934
… adverbs                               .928
Second person singular pronouns         .911
First person singular pronouns          .823
Second person singular inflections      .813
… pronouns                              .731
Place adverbs                           .723
…
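The selection rule described in Section 3.1 (keep loadings whose absolute value is at least .40, and do not remove a feature from later factors once it has appeared in an earlier one) can be sketched as follows. This is only an illustration under stated assumptions: the function name `salient_features` and the data layout are hypothetical, the "hedges" loading of .31 is invented to show an excluded value, and the remaining loadings are a small subset of those reported for Factors 1 and 2 in this chapter.

```python
CUTOFF = 0.40  # threshold reported in the chapter; Biber (1988) is cited as using .35


def salient_features(loadings, cutoff=CUTOFF):
    """Keep, for every factor, the features with |loading| >= cutoff.

    `loadings` maps factor name -> {feature: loading}. A feature may be
    kept in several factors, mirroring the decision not to drop it from
    subsequent factors. Results are sorted from highest to lowest loading.
    """
    selected = {}
    for factor, by_feature in loadings.items():
        kept = [(feature, value) for feature, value in by_feature.items()
                if abs(value) >= cutoff]
        kept.sort(key=lambda pair: pair[1], reverse=True)
        selected[factor] = kept
    return selected


# Subset of reported loadings plus one invented sub-threshold value.
example = {
    "Factor 1": {"time adverbs": 0.934, "place adverbs": 0.723},
    "Factor 2": {"place adverbs": 0.533, "time adverbs": 0.482,
                 "nominalizations": -0.581, "hedges": 0.31},  # .31 is hypothetical
}
selected = salient_features(example)
```

In this sketch "time adverbs" and "place adverbs" remain in both factors, while the invented sub-threshold loading is discarded and the negative loading is retained because its absolute value exceeds the cutoff.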
… oral and …

Given the interest in showing the relationships between the registers (technical-scientific, TSC; literature, LLC; and semi-structured oral interviews, OIC) and - more specifically - in isolating identifying variables in technical-scientific discourse in teaching contexts, these three registers are compared through each of the dimensions interpreted. To achieve this, a statistical study is carried out from the factor scores that sum up the frequency of each feature in each factor for each text. The factor scores for each text are averaged with all the texts in a specific corpus (TSC, LLC and OIC) and, in this way, a mean of the factor score is obtained for each dimension. These mean scores in each corpus or register are compared in order to determine the types of existing relationships (similarity or difference) (Hair et al. 1999).

Factor 2 is presented below, and then the corresponding analysis is carried out.

FACTOR 2

Dimension 2: Narrative Focus

Third person singular pronouns                          .842
First person plural pronouns                            .828
Periphrastic future                                     .823
Imperfect past                                          .820
Third person plural pronouns                            .708
Indicative mood                                         .686
First person plural verb inflections                    .667
Modal verbs of volition                                 .651
Indefinite past                                         .614
Negation pronouns                                       .590
Private verbs                                           .577
Place adverbs                                           .533
Second person singular pronouns                         .529
Perceptive verbs                                        .496
Negation adverbs                                        .493
Time adverbs                                            .482
Active forms of non-durative 'estar'                    .460
Public verbs                                            .445
First person singular pronouns                          .431
Third person singular verb inflections                  .423
Adversative, additive and disjunctive conjunctions      .411
Infinitive phrases with noun function                   .405
Negation conjunction 'ni'                               .402

Nominalizations                                        -.581
Prepositional phrases (noun complement)                -.562
Attributive adjectives                                 -.442

Factor 2 also presents 23 co-occurring features distributed between scores of .84 and .40. In decreasing order of weight, the features are third person singular pronouns (.842), first person plural pronouns (.828), periphrastic future (.823), imperfect past (.820), third person plural pronouns (.708), indicative mood (.686), first person plural verb inflections (.667), modal verbs of volition (.651), indefinite past (.614), negation pronouns (.590), private verbs (.577), place adverbs (.533), second person singular pronouns (.529), perceptive verbs (.496), negation adverbs (.493), time adverbs (.482), active non-durative verb 'estar' (.460), public verbs (.445), first person singular pronouns (.431), third person singular verb inflections (.423), adversative, additive and disjunctive conjunctions (.411), infinitive phrases with noun function (.405), and the negation conjunction 'ni' (.402). Meanwhile, among those with negative values are the features related to nominalizations (-.581), prepositional phrases with a noun complement function (-.562) and attributive adjectives (-.442).

The meaningful presence of personal pronouns, especially those of the third person singular and first person plural (that is, human subjects and story protagonists) (Longacre 1983), just as their respective verb inflections (the most common verb resource in Spanish), shows a marked stress on the identification of the persons in the discourse, those present at the moment of stating something and those absent in relation to those present (Calsamiglia and Tusón 1999). In keeping with the above, those features associated with past tenses - the imperfect past, which describes situations and circumstances, and its counterpart, the indefinite past, which signals events and therefore the dynamism of actions (Kovacci 1993) - co-occur, thus denoting a direct reference to the narrated world (Arroyo 2000; De Kock and Gómez 2002; Weinrich 1974). No less important is the co-appearance of the periphrastic future, showing that, although narrated events are generally situated in the past, others take place in the characters' present as well (Contreras 2000). These verb tenses are associated with the use of the indicative mood, which helps the protagonists present states and actions as real (Gómez and Peronard 1988). All of this is complemented by the use of modal verbs of volition, private verbs, perceptive verbs and public verbs that aim at accounting for the speaker's subjective attitudes (Arianz~n 2001). In fact, what is remarkable in mental-activity verbs - used above all in
… of the verb …. Likewise, the … 'ni' and … additive and disjunctive conjunctions support the sequence of events in the … (Biber 1988; De Kock and Gómez …; Pérez-Rioja …; RAE …). … the stative active verb 'estar' stands out because of its frequency in descriptive sequences (Bassols and Torrent 1997; Lorente 2002), and the infinitive phrases with a noun function are accounted for by their incidence in constructions where modal verbs of volition and private verbs appear, which refer to the participants in a real or fictional communicative event (Hernanz 1999).

Features with negative value, such as nominalizations, prepositional phrases and attributive adjectives are complemented in the integration and density of information (Burdach 2000; Chafe 1982, 1985, 1994; Ciapuscio 1992; Halliday and Martin 1993; Hernanz 1999; Janda 1985; Moyana 2000; Zarzalejos 2001). The dimension, which emerges from Factor 2, identifies itself with a chronological succession of events set mainly in the past and with a description of all which surrounds those events. The above is complemented with time and place markers. Additionally, the strong incidence of the personal deixis helps express the presence of the protagonists by means of either the presence of internal points of view influenced by the speaker's consciousness or external points of view situated outside that consciousness. Internal, perceptive and private states correspond to the first point of view; public states correspond to the second. In sum, this factor is associated with a sequence of events, which implies circumstances of time and place as well as the participation of people in the discourse. Therefore, Factor 2 helps create a functional dimension called narrative focus.

In conclusion, this dimension is associated with a sequence of events indicating precision of temporal and spatial circumstances, as well as participation of the first and second person in the discourse. Unlike highly … texts, the above helps identify literary texts, oral or written.

FACTOR 3

Dimension 3: Commitment Focus

Private verbs                                   .824
First person singular pronouns                  .789
Indefinite past                                 .705
Modal verbs of volition                         .655
First person singular inflections               .640
Indicative mood                                 .630
Imperfect past                                  .604
Negation pronouns                               .569
Second person singular pronouns                 .563
Second person plural inflections                .562
Infinitive phrases                              .518
Noun subordinates                               .467
Adverbial subordinates of concession            .452
Active non-durative verb 'estar'                .435
Second person plural pronouns                   .427
Time adverbs                                    .411
First person plural pronouns                    .402

Prepositional phrases as noun complement       -.457

To interpret Factor 3, 17 features have been considered. These features co-occur with values over .40. From the largest to the lowest scores, the features considered are the following: private verbs (.824), first person singular pronouns (.789), indefinite past (.705), verbs of volition (.655), verb inflections of first person singular (.640), indicative mood (.630), imperfect past (.604), negation pronouns (.569), second person singular pronouns (.563), verb inflections of second person plural (.562), infinitive phrases (.518), noun subordinates (.467), adverbial subordinates of concession (.452), active non-durative verb 'estar' (.435), second person plural pronouns (.427), time adverbs (.411) and first person plural pronouns (.402).

Private verbs (Biber 1988; Weber and Bentivoglio 1991), verbs of volition (Gómez 1999) and the first and second person singular pronouns refer to the participants in a communicative act (Fernández 1999), specifically to persons who express their intentions and attitudes. Likewise, first person pronouns, first person singular inflections and second person plural inflections confirm, as an essential characteristic of this factor, the discourse writer/speaker's explicit involvement (Calsamiglia and Tusón 1999; Crismore 1989). The presence of indefinite past markers suggests that the verbs previously described refer to past actions with a determined temporal end; that is, to constructions that mark the result of the action that the verb expresses. The indicative mood, on the other hand, makes reference to real facts localized in a real time (Contreras 1984; Criado de Val 1962) and expresses the experiential declarative mode typical of oral exchange discourse (Cepeda 2002). Similarly, it is suggested that the indicative mood is a feature through which states or actions are expressed as real (Gómez and Peronard 1988); that is, this feature characterizes linguistic exchanges whose referents are concrete facts in a given here-and-now. In subordinate noun clauses, the use of the subjunct 'que' is semantically conditioned: it designates events or processes that are not observed in their execution but in their result - that is, as already established facts conceived as something previous to the statement (Delbecque and Lamiroy 1999). The presence of this feature suggests that the interlocutors in the discourse …
WORKING SPANISH VARIATION ACROSS IN SPANISH
Indefinite past -.630
Active non-durative 'estar' -.595
Private verbs -.575
Negation pronouns -.572
Modal verbs of volition -.503
The most [...] correlation in Factor 5 is that between the [...] verbs of [...] mood [...], [...] with [...] function [...], prepositional [...]; third person singular verb inflections (-.632), indefinite past (-.630), active non-durative verb 'estar' (-.595), private verbs (-.575), negation pronouns (-.572) and modal verbs of volition (-.503).
The presence of co-occurring positive features, such as modal verbs of obligation and the subjunctive mood, accounts for the necessity and certainty of the judgements expressed, essentially corresponding to a deontic mode (Hyland 1998; Osorno 2000). Moreover, in the case of the subjunctive, it refers to syntactically more complex organizations (Gili Gaya 1980) and to subordination, the function of which is to frame the information of the discourse (Delbecque and Lamiroy 1999; Galán 1999). However, this same feature can be used to express subjective speculation and command (Criado de Val 1962; Gómez and Peronard 1988). The co-occurrences of nominalizations, participles with an adjectival function and prepositional phrases are presented as signals of integration and compactness of highly abstract information, and are typical of nominal academic discourse (Biber 1986, 1988; Burdach 2000; Chafe 1982, 1985; Janda 1985; Picallo 1999).
Negative features that present a larger weight in this factor are: the third person singular verb inflections, which are used when there is risk of misconstruing the reference because the information contributed by the context fails (Castellano 2000); the indefinite past, which sets the action of an event in a finished temporal space, where the action is one which is repeated (Contreras 1984); the stative active verb 'estar', which forms part of the so-called units of knowledge, although it has no specialized value (Lorente 2002); private verbs, noticeable for expressing intellectual states or non-observable intellectual acts (Biber 1988; Weber and Bentivoglio 1991); negation pronouns, the use of which is mainly colloquial (Sánchez 1999; Tottie 1983); finally, modal verbs of volition, which account for either a down-to-earth mode (Langacker 1990) or a participant-oriented mode, that is, a mode which outlines the subject's status (Olbertz 1998). It should be noted that all the features in this factor are present in prototypical written discourses and, in particular, in scientific research articles (Burdach 2000; Cornillie 2003; Criado de Val 1962; Harvey 2002; Hyland 1998).
To sum up, the [...] in this factor are essentially oriented toward [...] the concentration of information [...] as [...]. It can thus be described as a set of [...] informative load and is associated with a [...] of information, referential. As can be [...], the discourses characterized by these types of features are often [...] condense relevant amounts of data and express meaning [...]. On the other hand, the negative features in this factor are geared toward a contextualization of a subject's intellectual state or non-observable acts at a given moment (Ciapuscio 1992). The presence of the above-mentioned features helps to distinguish between discourses with a great deal of information, and thus with a greater degree of abstraction, and those discourses that contain a lesser amount of information; therefore, we decided to name Factor 5 Informative Focus.
Considering all the factors involved, each of these five dimensions is the result of a distinct set of co-occurring linguistic features and each potentially defines a divergent group of similarities and differences between registers and areas of specialization. It is worth noting that, because of the type of rotation selected (Oblimin), which is more adequate for managing linguistic data and is more accurate when facts are analysed, the makeup of one factor may present features repeated in a previous factor, since these are not removed from the analysis once they are included in a factor. The above involves potential complexities when interpreting the functions. The makeup, nevertheless, turns out to be more realistic as far as human language and its association with a given dimension are concerned. As has been shown, a group of characterizing features is readily detectable in language as a constitutive part of communicative function; it is the group of features in systematic co-occurrence which reveals a singular variation pattern that offers possibilities for communicative interpretation.
Dimensions 1, 2 and 5 (Contextual and Interactive Focus, Narrative Focus and Informational Focus) show themselves to be quite distinct because most of their positive features are different. Thus, it is possible to establish a clear separation between the functions they represent and the types of texts they differentiate. This acute and delicate distinction is based on the fact that many of the negative features also present heterogeneity and, in so doing, draw attention toward stereotyped registers or types of texts with very little crisscrossing. Nevertheless, Dimensions 3 and 4 (Commitment Focus and Modalizing Focus) do not seem to aim at categories so finely distinguishable. In fact, they tend to be similar in some aspects. This fact is accounted for by, on the one hand, their sharing a number of linguistic features whose underlying functions are very similar and, on the other hand, by the fact that some of their features, though not identical, tend to render a similar functional interpretation. The above is not surprising because, as was anticipated, the Oblimin rotation of the data entails this sort of result - results that account for natural languages.
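The repetition of features across factors that an oblique rotation allows can be checked mechanically. What follows is a minimal Python sketch under stated assumptions, not the procedure used in the study: it takes only the loadings that are legible in this chapter for Dimension 3 and for the negative pole of Factor 5, and lists the features present in both. The dictionary names and the helper `shared_features` are illustrative. The indefinite past is also named among the negative features of Factor 5, but its loading is not fully legible here, so it is left out.

```python
# Loadings quoted in the chapter for Dimension 3 (Commitment Focus).
dimension_3 = {
    "private verbs": 0.824,
    "first person singular pronouns": 0.789,
    "indefinite past": 0.705,
    "modal verbs of volition": 0.655,
    "first person inflections": 0.640,
    "indicative": 0.630,
}

# Negative loadings quoted for Factor 5 (legible values only).
factor_5_negative = {
    "third person singular verb inflections": -0.632,
    "active non-durative 'estar'": -0.595,
    "private verbs": -0.575,
    "negation pronouns": -0.572,
    "modal verbs of volition": -0.503,
}


def shared_features(first, second):
    """Return the features present in both factors with their two loadings."""
    return {name: (first[name], second[name]) for name in first if name in second}


shared = shared_features(dimension_3, factor_5_negative)
for name, (w3, w5) in sorted(shared.items()):
    print(f"{name}: Dimension 3 {w3:+.3f}, Factor 5 {w5:+.3f}")
```

On these values, private verbs and modal verbs of volition load positively on Dimension 3 and negatively on Factor 5, which is the kind of repetition the text attributes to the Oblimin rotation.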
[...] pronouns, time, [...] and demonstrative pronouns and [...]. This reveals intersection areas between the [...] of the interviews and the [...] of Latin American literature. It also helps to distinctively separate highly specialized texts, where grammatical constructions of more complexity and greater packaging and reduction of information are detected than those which involve the participants and their interpersonal relations in a detailed way. In the latter texts (oral and written), the author's involvement is detected and expressed in a more explicit way through specific linguistic markers (certain types of verbs, adverbs, pronouns, etc.). This fact can also characterize written specialized discourse, but most of the time it is kept implicit by using other resources.
As can be observed from the data, variation between registers faces a continuum, identified in this case through the dimensions and the linguistic features captured by the dimension; it also helps identify and prototypically characterize them. The implications derived from the above are multiple, particularly concerning specialized discourse for didactical dissemination - the central focus of the present research. Developers of didactic materials addressing specialized discourse in written Spanish and whoever specializes in their teaching should take advantage of these descriptions.
Graph 4 shows the factor scores per dimension for each of the three areas of specialization in the El Grial PUCV-2003 corpus (commerce, industrial and maritime).
[Graph 2.4 Dimensions and domains of specialization; horizontal axis: Dimensions 1 to 5]
It is worth noting that Graph 3 shows a clear differentiation in the TSC between Dimension 5 Informational Focus and the other two registers (LLC and OIC); in this more detailed analysis, it is precisely Dimension 5 which seems to show the greatest distinction between the technical-scientific domains. As can be seen, the commerce area presents the largest mean positive value on this dimension (3.8). This fact reflects its heaviest informational load through high lexical density and syntactic complexity.
Parodi (2004) has already detected, by means of a simple descriptive study, some internal variability in the behaviour of the 65 linguistic features between the technical areas. This preliminary investigation showed a distinctive pattern between the maritime area and the other two. As can be seen, the maritime area obtains the largest negative score regarding informational load (- [...]), precisely the dimension which marked a difference with the other two areas in the study alluded to above. Considering this result, Dimension 5 appears to distinguish between texts in one area and the other. Of course, it will be necessary to carry out an in-depth qualitative analysis to fully account for which types of commerce texts gather around this dimensional pattern.
Next in the hierarchy, Dimension 4 also contributes to the distinction between specialties. It is now the industrial area which reveals the highest positive mean score on the dimension Modalizing Focus (2.5), whereas the maritime and commercial areas obtain very similar negative scores, but with no statistical significance between them. From these data it can be inferred that the texts from the industrial area contain a greater regularity in the systematic patterns of occurrence around the distinctive features of attenuation and uncertainty. In the remaining dimensions (1, 2 and 3), the three TSC areas present negative and relatively similar figures. These facts indicate that these dimensions would not contribute to a differentiating explanation for the description of technical-scientific discourse for teaching purposes. It can also be suggested that the texts belonging to these specialized areas are not remarkable because of the occurrence in them of those features that denote interaction, interpersonal relations and involvement of the participants in the discourse.
[...] in the way of [...] the MDA has also [...] more [...] from others that are [...] that cannot be [...] since in their [...] technical [...]. It can also be suggested that it is not advisable for [...] school students to [...] face texts marked by such a complex informational prose. Materials should be presented in a gradual progression, starting with those that are the most disseminated and moving on later to those texts that are more typical of the professional field in which they are going to be working.
[...] PUCV-2003 LINGUISTIC [...]
1. [...] past [...]
[...] the [...] has been defined as [...] tense [...] the tense in which the [...]
2. Imperfect past (indicative) [PRET. IMP]
This expresses durative or, rather, reiterated and habitual, wholly or partially simultaneous action with another past, durative or instantaneous action. That is why it is said to be a relative tense (Alvar 2000; Moliner 1986). In particular, it signals simultaneity with respect to a moment previous to the central focus (Alvar 2000). It is typically found in written language, specifically in narrative prose (De Kock and Gómez 2002). Bassols and Torrent (1997) point out that it is a tense with which the initial states of a narrative sequence are described, as well as the descriptions inserted in the account. They also identify its use in argumentative constructions and, perhaps, as Moliner (1986) states, the imperfect past in modal verbs (must, can, have to) expresses an opinion about the convenience or origin of things.
3. Perfect past (indicative and subjunctive) [PRET.PER.]
This tense expresses a past action that influences the moment of enunciation and which lasts until the very moment in which something is said (Moliner 1986). Gili Gaya (1980) argues that, although modern Spanish establishes a difference between the indefinite past and the perfect past, vast areas of Spain and Latin America have preferred one form to the other due to the prevalence of the perfective aspect in both.
4. Present (indicative and subjunctive) [PRES]
This expresses the actions co-existing with the word's act (Gili Gaya 1980). It is used for what is universal, and is habitual in maxims and sentences. When the action refers to the moment of speaking, it is the current present. But timeless truths and habitual actions are also stated with the present tense. It can express past actions, when the historical present is used (Moliner 1986). Likewise, when the action to be taken is certain, it can have a future value (Moliner 1986).
5. Future (indicative and subjunctive)
The future tense expresses a forthcoming action, independent of other actions. According to Gili Gaya [...], the use of the future assumes a [...] part; due to [...] it appears
[...] should be noted [...]
This tense [...] to the world commented on, in contrast with the [...]. It is found in col[...]
B. Verb mood markers
7. Indicative/imperative [INDIC.IMP]
The forms of the second person singular of the imperative and the third person of the indicative coincide.
8. Subjunctive/imperative [SUBJ.IMP]
There is syncretism between the forms of the subjunctive and those of the imperative because the only specific forms of the imperative are the second persons of the singular (tú) and the plural (vosotros). The other persons are taken from the present subjunctive. In negative phrases, second persons are replaced by those of the subjunctive.
9. Indicative mood [MOD. IND]
This expresses a declarative experiential mode (Cepeda 2002), as well as states or actions considered to be real (Gómez and Peronard 1988). The indicative mood is typical of discursive oral exchange (Cepeda 2002), and makes reference to real facts situated at a true time (Criado de Val 1962; Gómez and Peronard 1988). [...] it refers to an event located in the past. According to Alcoba (1999), the indicative mood is used to make an assertion. In this mood, the [...] function prevails, and its distinct form is the logical or declarative. It is defined by the objective relationship between [...] and message and the declarative form and, since the declarative form is the unmarked form, the indicative mood is the most extensive (Hernández 1996).
10. Subjunctive mood [MOD. ...]
[...] speculate about uncertain facts, to a subjective appraisal ([...] de Val 1962). In this mood, the subjectivity or the [...] of the communication facing the statement is expressed; it is the mood [...] and [...] In all subjunctive [...] the presence of the speaker and that of the statement is perceived. The [...] can appear, moreover, as agent of the [...] or as subject of the statement, which is something that cannot be found in the [...] mood ([...] 1996).
11. Imperative mood [...]
[...] function of this mood is that of direct command. It almost exclusively occurs in [...] ([...] de Val [...]). These verb forms [...] the other [...] are [...] in the expressions of the deontic form of command. The imperative forms do not convey [...] except for the meaning of [...], although it is possible to express commands through other linguistic forms of the future and of the present indicative, subjunctive present, etc. (Gili Gaya 1980).
C. Verbal inflections
12. First singular [DES.1S]
This inflection reflects a text's egocentric nature; it implies a need for direct communication. It is typical of direct style, of written language and of narrative prose (De Kock and Gómez 2002). In scientific language and that of the dissemination of science, there is a common tendency to avoid references to the first person and to employ other procedures for the presentation of the author (Ciapuscio 1992).
13. Second singular [DES.2S]
The second person is used with the purpose of causing a given effect: to generalize the stated experience and to include the interlocutor in a personal and emotional way (Calsamiglia and Tusón 1999). It is associated with colloquial language.
14. Third singular [DES.3S]
The third person singular inflection contributes notions of time, mood, person and number to the statement in which the verb form appears (Alcoba 1999). This inflection is typical of written language, particularly the scientific article (Kaiser 2002).
15. First plural [DES.1P]
Identification of the speaker with the first person [...] incorporates the speaker into a group (Calsamiglia and Tusón 1999). The so-called we of modesty replaces yo because, on occasions, its use is considered inappropriate in public/formal or academic texts. Another function of the first person plural is including the interlocutor, getting him involved and, in this way, smoothing commands and requests. Ciapuscio (1992) has pointed out that this use is frequent in dissemination discourse addressed to a general audience. Likewise, in these text types as well as in those addressed to a more restricted audience for their dissemination, the author stresses the use of 'we'
[...] to [...] for [...]
17. Third [...]
This inflection is [...] due [...] sentences, in which the discourse producer [...] shows him/herself disinterested, or [...] (Gili Gaya 1980).
D. Personal pronouns
18. First person singular [PRON.1S]
Because conjugated verbs in Spanish are marked with the person's inflection, the subject pronoun is almost always unnecessary. In first and second persons, its appearance is emphatic and conveys particular insistence on making the subject stand out (Gili Gaya 1980). It directly refers to the participants, markers of the presence of the yo (Biber 1988). In narration, it tells apart the witness-participant or the protagonist (Bassols and Torrent 1997). It is always present when the context does not clarify the verb person sufficiently. It is typical of oral interviews (Castellano 2000). In general, personal pronouns of first and second person - called deictic - refer to the participants in the communicative act, a function that is typical of them. The essential semantic characteristic of personal pronouns is that they do not help assign truth-values to statements independent of context (Fernández 1999). Another important aspect is that first and second person pronouns are reversible in the sense that the yo being spoken cannot but yield right to a tú if the yo wants to have a valid interlocutor, although they are not necessary to express the concept of grammatical person (Fernández [...]).
19. First person plural [PRON.1P]
With the form nosotros, the use of the pronoun yo is dodged, but this does not [...] in principle, important semantic differences. This use is considered to be more polite and, for this reason, its use is extensive in the academic genre (Lledó 1995). This pronoun can have several [...] of referents, in what is called 'fictitious plurals' (Alcina and Blecua [...]).
20. Second person [...] [PRON. ...]
The forms [...] of written language in Spanish, in particular that of narration and poetry (De Kock and Gómez [...]). They require [...] addressee and indicate a high degree of interaction and action ([...] Kock and Gómez [...]). In narrative texts, narration in the second person appears, in the case of the characters [...]
[...] The [...] appear when there is risk of misconstruing the reference because the information contributed by the context fails. It is common [...] with oral interviews [...] possess [...] use makes them similar to demonstratives, but also have referential use; that is, they can retrieve the features of a person present in the context (Fontanella 1999). In the case of narrative texts, the third person matches the omniscient narrator (Bassols and Torrent 1997). [...] the other personal pronouns, the third person singular is the non-person; that is, it is ruled out of the communicative act and refers not to the [...] but to an 'objective' situation. In this sense, it is the unmarked term; in fact, it does not exist in all languages (Fernández 1999).
23. Third person plural [PRON.3P]
Just like third person singular pronouns, the subject pronoun appears when there may be ambiguity, since possible third persons may be many.
24. Demonstrative pronoun [PRON.DEM]
[...] like other linguistic elements grouped around the denomination of [...], the demonstratives signal, by selecting them, some elements of the context (Calsamiglia and Tusón 1999) and acquire a full sense in the context in which they are stated. They are common in oral speech.
E. Nominal forms
25. Nominalizations [NOMINAL]
This term designates names derived from verbs and adjectival bases, as well as the process of their formation. The names derived can have an event, a process, a state, a quality or a product as referent, as a result of an event or process, with the typical resource found in technical language (academic) to express complex and abstract [...] (Picallo [...]). [...] integrate information in a few words and function as conveyors of highly abstract information (Biber 1988). Ciapuscio [...] includes nominalization in the [...] of omitting the agent, [...]
26. Nouns and [...]
[...], nouns are the main conveyors of the text's referential meaning. Occurrence of some [...] of nouns [...] longest ones and those [...] is associated with [...] whose focus is infor[...] to a careful [...] of the information. The [...]
[...] lemma [...] expresses [...] the lexical roots of the forms [...]
F. Passive forms
27. Passives with 'se' [...]
[...] actions, with internal and implicit [...], may appear in all types of contexts. [...] are used when the [...] of the [...] is the name of an object. In general, they are more frequent than the passive with 'ser' ([...] 1986). They appear both in oral speech and in written discourse. An increase in the use of this construction has been noticed in informative disseminating language (Mendikoetxea 1999a and b). Ciapuscio (1992) gives evidence of its frequency in scientific texts, with which the text's impersonal nature is stressed. According to this author, the tendency to omit the agent is typical of science language, which is maintained in its dissemination.
28. Passives with 'ser' without an agent [PAS.SER-a]
They specialize in focalized actions, with external objects and a marked intentional nature that denotes the existence of a delimited implicit subject. They are more frequent in written language. The absence of the agent has been attributed to the intention of keeping silence or of concealing the notional subject (Mendikoetxea 1999a and b).
29. Passives with 'ser' with an agent [PAS.SER+a]
These passive constructions normally express a notional subject, grammatically correct. The explicit inclusion of the agent appears most often in newspaper written discourse (Hernández 2000a).
[...] active forms
[...] 'ser' [ACT.SER]
[...] verbs basically express equivalence, equality, or [...] relationships or attribute qualities or values, such as to be, to appear, to be equivalent. Ser is considered to attribute permanent qualities. Verbs of this type, primarily copulative or pseudo-copulative, although not identified as units conveying specialized knowledge, are part of the expression of that knowledge; that is, they do not have a specialized value but are part of specialized knowledge (Lorente 2002). Their main characteristic is their frequency in description (Bassols and Torrent 1997). With ser, judgements independent of immediate experience are made (Gili Gaya 1980). Their use in copulative sentences amounts to functioning as a link between the subject and the predicate; they are also used for expressing temporality (Gili Gaya 1980).
34. Non-durative 'estar' [ACT.ESTAR]
Because it is a connecting verb, it performs the same functions as those of the verb ser mentioned above but, unlike the latter, qualities are considered to be transient or accidental; besides, qualities are sensed as the result of a change or transformation (Gili Gaya 1980). Because it expresses states, it is lexically incapable of expressing a change or progress during the time lapse in which it takes place (De Miguel 1999).
These express intellectual states when the [...] on which the action falls is an abstract noun. They have a more concrete meaning when the objects on which the action falls are concrete nouns. They correspond to a subtype of [...]. They reveal an internal focalization in [...]
J. Modal verbs
39. Possibility [V.MOD.POS]
These verbs express the speaker's/writer's opinion on the content expressed, generally mitigating the force of the statements said. They are [...] in scientific articles (Hyland 1998) as a way of anticipating potential objections and help the author appear less restrictive.
40. [...]
These express the writer's commitment to what is said in the scientific article (Hyland 1998), increasing the force of what has been asserted.
[...]
These express the point of view of the writer, who judges the truth of what has been said in terms of certainty [...]. Besides, the deontic mode is concerned with the [...] or the possibility of the actions being carried [...]
42. Volition [V.MOD.VOL]
These express [...] or [...] mode.
[...]ing. [...] are resources [...] the service [...] used to achieve communicative effects that go [...] information [...]
44. Boosters
These accentuate the value of the verbs. They are used to positively signal the reliability of propositions. They can be used in non-propositional functions to signal solidarity with the interlocutor. Typical of oral interviews (Cepeda 2002).
L. Adverbs
45. Place [ADV.LUG]
These place the meaning of the verb in spatial coordinates and add information that completes the argument structure of the predicate (Bosque 1990).
46. Time [ADV.TIEMP]
Due to their deictic function, they set order frames in the sequence of events or contextual clues for the interpretation of what has been said (Kovacci [...]). They are circumstantial in post-verbal position. [...] act as circumstantial if they interrogate or negate.
47. Manner [ADV.MOD]
According to Bassols and Torrent (1997), they are [...] of description. In principle, they denote the manner in which events are [...] or in which actions are carried out.
[...] to a [...] 1971). As they help express [...] to giving descriptions greater [...] events [...] results [...]
50. [...]
The use [...] sentences with relative pronoun [...] noun with very [...] characteristics for [...] does not possess [...] or lexical [...]
O. Coordination markers
61. Adversative, additive and disjunctive conjunctions [CONJ.dis.adv.ad]
The conjunctions are typical of written language and of narrative prose (De Kock and Gómez 2002). Coordination is the grammatical procedure used to associate syntactical constituents without establishing a grammatical hierarchy among them (Camacho 1999). The use of these conjunctions is frequent as an indicator of simplicity in spoken language. According to Ávila (2000), [...] employs coordination more than subordination, in face-to-face and distanced conversations.
62. [...]
These convey more [...] and fragmented information (Biber 1988). [...] are indicators of the negative form in the sense that [...] make reference to the sender's attitude toward the receiver and the message itself.
can
DIMENSIONS OF REGISTER VARIATION IN SPANISH
Pero Acon+coor+++_gensingcon_+
nada Ar+++++!!+_rbother_+nada+
de Aen++++++_lwrdprep_+de+
eso Ap3cs+dem+++++_prodem_+eso+
sucedió Avm+is+3s++++_indicat_preter_voccur_+suceder+
, Apunc++,++++++
y Acon+coor+++++_gensingcon_+
el Alms+def+++++_defart_+el+
embajador Anms+com+++++_singn_derivn_+embajador+
concedió Avm+is+3s++++_indicat_preter_+conceder+
su Ad3cs+pos+++++_prepos_+su+
mano Anfs+com+++++_singn_+mano+
y Acon+coor+++++_gensingcon_+
la Alfs+def+++++_defart_+la+
sonrisa Anfs+com+++++_singn_+sonrisa+
imperturbable Ajes++++++_postadj_+imperturbable+
a Aen++++++_lwrdprep_+a+
cada Ad0cs+ind+++++_quant_+cada+
uno Ap0ms+ind++++++uno+
de Aen++++++_lwrdprep_+de+
los Almp+def+++++_defart_+el+
invitados Anmp+com++++!!+_plurn_+invitado+
Each line begins with the word followed by the start of the tag, indicated by A. The primary tag is in field 1 (e.g., noun, verb, etc.), with various secondary tags in fields 2-5 (e.g., the mood, tense, person, number and voice of a verb), an ambiguous tag in field 6 (!!), a linguistic feature tag in field 7 (e.g., 'ynquest' for questions; 'subjvcompque' for que verb complement clause with subjunctive mood), and the lemma in the final field.
Once the texts in the corpus were tagged, it was a simple matter to compute the frequency of each linguistic feature in each text. These frequencies were 'normalized' to a rate of occurrence per 1000 words of text (see Biber et al. 1998). Thus, at this stage, we had normed frequencies of 85 linguistic features for each text, making it possible to compute descriptive statistics for the different [...] The entire tagged corpus is available for research at http://www.corpusdelespanol.org/registers. This [...]
4.3. Factor analysis
As described above, Multi-Dimensional analysis is an approach that uses a statistical 'factor analysis' to identify patterns of linguistic co-occurrence. This procedure reduces a large number of variables to a small set of underlying variables, called 'factors'. Each factor represents a group of variables that are correlated with one another, reflecting their statistical tendency to co-occur in texts; these factors can subsequently be interpreted as underlying 'dimensions' of register variation.
The Appendix displays the full factorial structure for the analysis of Spanish linguistic features. Only 85 of the original 140+ linguistic features were retained in the final factor analysis. Some features were dropped because they were redundant or overlapped to a large extent with other features. In other cases, features were dropped because they were generally rare in our corpus. Several of these features were combined into more general features. For example, possessive determiners and possessive pronouns were combined into a more general feature; que clefts include both indicative and subjunctive clauses; similarly cual relative clauses comprise a range of structural variants, including indicative and subjunctive clauses, with and without a preceding preposition. In addition, some features were dropped either because they did not vary across Spanish texts, or because they shared little variance with the overall factorial structure of this analysis (as shown by the communality estimates).
The solution for six factors was selected as optimal. These six factors account for 45 per cent of the shared variance. A Promax rotation was used, which allows for some correlations between the factors. (The Appendix also lists the eigenvalues for the first 6 factors as well as the inter-factor correlations.)
Table 3.2 summarizes the important linguistic features defining each dimension (i.e. features with factor loadings over + or - .3). Each factor comprises a set of linguistic features that tend to co-occur in the texts from the Spanish corpus. Factors are interpreted as underlying 'dimensions' of variation based on the assumption that linguistic co-occurrence patterns reflect underlying communicative functions. That is, particular sets of linguistic features co-occur frequently in texts because they serve related communicative functions. Features with positive and negative loadings represent two distinct co-occurrence sets. These define a single factor because the two sets tend to occur in complementary distribution: when a text has a high frequency of the positive set of features, that same text will tend to have low frequencies of the negative set of features, and vice versa. In the interpretation of a factor, it is important to consider the following: 1) the communicative functions that are shared by the linguistic features grouped
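Two of the steps described in this section lend themselves to a short illustration: norming raw feature counts to a rate per 1000 words, and splitting one factor's loadings into its positive and negative sets using the + or - .3 cutoff. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the raw counts, the text length and the loadings are all hypothetical and serve only to show the arithmetic.

```python
def normalized_rates(feature_counts, n_words, per=1000):
    """Convert raw counts for one text into rates per `per` words of text."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return {name: count * per / n_words for name, count in feature_counts.items()}


def salient_features(loadings, cutoff=0.3):
    """Split one factor's loadings into the positive and negative salient sets."""
    positive = {name: w for name, w in loadings.items() if w > cutoff}
    negative = {name: w for name, w in loadings.items() if w < -cutoff}
    return positive, negative


# Hypothetical text of 2,500 words with illustrative raw counts.
counts = {"singn": 400, "defart": 250, "prodem": 15}
rates = normalized_rates(counts, n_words=2500)

# Hypothetical loadings for a single factor.
loadings = {"feature_a": 0.62, "feature_b": 0.12, "feature_c": -0.45}
positive, negative = salient_features(loadings)

print(rates)
print(sorted(positive), sorted(negative))
```

Because the rates are expressed per 1000 words, texts of different lengths become comparable. The two returned sets correspond to the complementary co-occurrence sets discussed above, and features whose loadings fall inside the cutoff are simply not used in interpreting that factor.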
[Table (continued)]
[...] of surface-level water into the underground system of drainage, some areas with many caves are rather [...] and have little [...]
In sum, Dimension 1 makes a fundamental distinction between speech and [...] at the two poles, this dimension actually distinguishes between stereotypical speaking (conversation) and stereotypical writing (expository prose). Linguistically, these opposing styles are represented by verbal/clausal features serving involved and interactive functions, as opposed to a dense [...]
Translation:
Those of us that were there were not too surprised that he was avoiding at all costs any [...] This man - and this is an important point - was one of the most well-known figures in British scientific circles, in large [...] his ... rejection of anything that could not be weighed or measured. For [...]
Translation:
That night, around bedtime, Don Jose Pedro called together his two daughters.
'Your mother wants to talk to you. She's waiting for you in the bedroom,'
he said. And so, smoking, he left for the park. The daughters saw him
disappear into the thick darkness. It was so dark outside that everything was
the colour of dark pine trees ...
The two girls went straight into their mother's room.
'Sit down. I want you to settle down now,' the woman began. 'Has
Demetrio settled down now as well? From what I saw at dinner, no one has
understood anything. OK. Now I want you to read this letter - both together
and in silence.'

Interestingly, drama also has a large positive score on Dimension 3. In this
case, the text is entirely dialogic, but the characters are narrating past events
and descriptions to carry the story line of the play. For example:

Text Sample 9: Drama
(Past tense verbs are shown in bold underlined.)
Speaker 1: No sé si llamarlo novio ... Sí, en realidad, lo fue. Eramos tan
jóvenes. No sé porque le [...] un día que no viniera más. Él me quería y
creo que yo también. Armando pretendía tantas cosas de mí. Esperaba
que yo cambiase tanto ... Me decía que lo mío estaba bien para principio
de siglo ... No sé qué pasó. Me fatigaba ... Me exigía ... Y yo estaba
tan cómoda, tan tranquila.
Speaker 2: Los afectos intensos siempre me han fatigado.

Translation:
Speaker 1: I don't know if I should call him my boyfriend. Actually, he was
once. We were so young. I don't know why one day I told him to never
come back. He loved me and I think I loved him too. Armando expected
so much from me. He wanted so much for me to change ... He told me
that I was acting so old-fashioned ... I don't know what happened. He

5.4. Interpretation
Dimension 4 is composed of overtly interactive and highly involved features,
including CU questions, yes-no questions, exclamatives and diminutives.
However, in contrast to Dimension 1, the style of discourse represented here
seems to be focused to a large extent on the addressee, resulting in the
dense use of 2nd person pro-drop, and the pronoun tú, but not 1st or 3rd
person pronouns.
This somewhat specialized grouping of linguistic features is especially
common in business telephone conversations (see Figure ). In this register,
telephone operators are interacting with customers, obtaining information
and attempting to help with customer problems. In our corpus, these
are conventionalized interactions that focus on the addressee, with little
expression of the feelings and attitudes of the [...] for example:

Text Sample 10: Business telephone conversation
Speaker 1: Perdóname un segundito. 'Cilag', ¿dígame?
Speaker 2: ¿Teresa?
Speaker 1: Sí.
Speaker 2: Hola, soy Miguel.
Speaker 1: Hola, Miguel. ¿Qué te cuentas?
Speaker 2: ¿Qué tal, cómo estás?
Speaker 1: Dime.
Speaker 2: Sí, ¿me pasas con Rocío?
Speaker 1: Te pongo con ella.
Speaker 2: Vale.
Speaker 1: [...]
Speaker 3: Miguel Llavori.
Speaker 1: Vale.
Speaker 3: [...]
Figure 3.4 Comparison of registers along Dimension 4: addressee-focused
interaction

Translation:
Speaker 1: OK. Excuse me. 'Cilag'. Hello?
Speaker 2: Teresa?
Speaker 1: Yes.
Speaker 2: Hi, it's Miguel.
Speaker 1: Hi Miguel. What's happening?
Speaker 2: So, how's everything going? How are you?
Speaker 1: So ...?
Speaker 2: Look, can I talk to Rocío?
Speaker 1: Sure, I'll connect you.
Speaker 2: Thanks.
Speaker 1: Hello?
Speaker 3: Hi, Miguel Llavori.
Speaker 1: Yeah?
Speaker 3: Marica?
Speaker 1: Yeah?
Speaker 3: Well, um, can you tell me about the Tenerife deal?
Speaker 1: Yeah, let me tell you.

[...] events [...] of past [...] However, [...] the two dimensions. There is a
[...] features grouped on Dimension 3, including [...] various clitics
and 3rd person pronouns; as we saw above, these features are especially
common in fictional narrative and drama. In contrast,
Dimension 5 is much more specialized. It is defined by a smaller set of
features: nouns, preterite tense, long words, prepositions and
premodifying adjectives. (The major negative features are present tense,
predicative adjectives and verb + infinitive.)
Although the positive features grouped on Dimension 5 are related to
past time discourse, they are quite different from the Dimension 3
features. First of all, Dimension 5 includes only preterite tense (but not
imperfect tense verbs), reflecting a focus on past time events with relatively little
background description. In addition, we find proper nouns with a large
positive loading on Dimension 5, rather than 3rd person pronouns. This
suggests a style of discourse that discusses the past actions of many
different people, referred to by name. In contrast, Dimension 3 features
characterize more detailed fictional narratives that involve a few characters,
which are easily referred to with 3rd person pronouns. In addition,
Dimension 5 includes features of highly informational prose - long words,
prepositions and premodifying attributive adjectives - suggesting that this
style of discourse has an informational rather than popular communicative
purpose.
As Figure 3.5 shows, these features are common only in written
informational registers: encyclopedias, business letters, newspaper reportage
and, to a lesser extent, academic prose. Encyclopedias and newspaper
reportage are similar in that they are informational registers written for a
mass audience, informing readers about past events that involve many
different people. Text sample 11, from an encyclopedia article, illustrates
these features:

Text Sample 11: Encyclopedia article
(Preterite verbs and proper nouns are shown in bold underlined.)

Tras abandonar los terrenos de juego, Suárez inició una nueva
carrera como técnico. En esta faceta profesional, permaneció casi siempre
ligado a la secretaría técnica del Inter de Milán, de cuyo primer equipo
llegó a ser entrenador. También [...] el banquillo de varios
clubes españoles. Además, estuvo al frente de la selección nacional
española de fútbol, a la cual dirigió en la fase final de la Copa del
Mundo disputada en 1990 en Italia. En 1992 regresó al Inter,
primeramente como entrenador y, más [...] como integrante de su equipo
Figure 3.5 Comparison of registers along Dimension 5: informational reports of
past events

Translation:
After retiring, Suárez began a new career as a technical advisor. In this
professional capacity, he was associated with the technical staff of Inter Milan,
and he became trainer for their first team. He was also an advisor for several
Spanish teams. In addition, he was in charge of the selection of the national
team for Spain, which he led to the final round of the World Cup in 1990 in
Italy. In 1992 he returned to Inter, first as a trainer and then as one of the
members of their technical staff.

Business letters represent a somewhat different [...] of prose but with
similar Dimension 5 features:

Text Sample 12: Business letter
(Preterite verbs and proper nouns are shown in bold underlined.)

Estimado Sr. Obrach:
De regreso de mis vacaciones encontré su nota. Lamento no
haber [...] hablar con Ud. pero la decisión de la Comisión de
[...] se produjo en los últimos días de diciembre y durante
enero, cuando yo estaba en San Pablo en una reunión de ferias.
Tanto es así que aún no se [...] realizado [...] entre los
[...] y otra [...] estamos en una [...] bastante avanzada.

[...] San [...] at a fair.
As a result, we still haven't carried out much [...]
even [...] it is late [...] the process.
[...] in terms of the [...]
The Printing Committee awarded the contract to the company named
(Book) Fairs and Congresses (directed by Juan Carlos Grassi), after several
detailed and involved discussions.

The distinction between Dimensions 3 and 5 reflects the differing functions
of the imperfect and the preterite, two forms of the past tense in Spanish that
differ in aspect (a distinction not similarly found in English). The preterite
refers strictly to events that are viewed as a single whole, and preterite verbs
would therefore be common in encyclopedias and news reports (with the
highest positive scores on Dimension 5). The imperfect, on the other hand,
describes an event that was not yet complete, and thus it is used for
background descriptions of events that were in progress or states that existed
when another event occurred. These discourse functions are important for
the description and narration typical in drama and fiction, the registers with
the largest positive Dimension 3 scores. It is interesting that the
multi-dimensional structure reflects this grammatical distinction found in Spanish (but
not English); we return to this point in the conclusion below.

5.6. Interpretation of Dimension 6: 'formal' written style
Finally, Dimension 6 is an extremely specialized parameter defined by only
two co-occurring linguistic features: cual relative clauses and other cual
clauses. As Figure 3.6 shows, these features are common only in formal
written prose - especially academic prose. We interpret this
dimension as reflecting a formal 'high' academic [...] of discourse,
illustrated by the following sentences:

Text Sample 13: Academic text
(Cual relative clauses are shown in bold underlined.)

Dada la posible modificación que estas variantes pueden provocar en la
composición del café, fundamentalmente en cuanto a sustancias lipídicas y
[...] durante el proceso de torrefacción al cual se someten, se pueden
[...] los niveles de compuestos, tales como hidrocarburos policíclicos
aromáticos y aminas heterocíclicas, los cuales al estar en mayor
concentración [...] provocar un mayor efecto mutagénico.
Figure 3.6 Comparison of registers along Dimension 6: 'formal' written style

Translation:
Given the possible changes that these substances can undergo as they pass
through the roasting stage in the production of coffee (primarily in terms
of changes in lipid and protein compounds), there can be an increase in the
level of compounds such as polycyclic hydrocarbons and heterocyclic amino
acids, which in cases of their heaviest concentration can cause an elevated
risk of mutation.

6. Discussion and conclusion

There have been two previous Multi-Dimensional studies of register
variation [...] In the [...] Sáiz (1999) built parallel corpora of English and
Spanish texts [...] the Xerox ScanWorX User's Guide, translated into both
languages), and then undertook independent MD analyses of both
subcorpora. The study focused primarily on part-of-speech and simple
grammatical distinctions (e.g., plural nouns, present tense [...]), resulting in five
dimensions identified for both languages. These dimensions were for
the most part similar in their underlying functions across the two languages,
and the parallel registers were also similar in many respects. Parodi (see
[...] II, this volume) presents a more developed MD analysis
based on the distribution of 65 features in a 1.5-million-word

[...] interactive focus', 'narrative
[...] focus' and 'informational focus'.
[...] to compare the methods and results
[...] in this chapter) to the Valparaíso [...]
[...]. The Valparaíso study was conducted to [...]
that domain: technical/science textbooks, literary fiction and oral
interviews. In contrast, the Flagstaff study was carried out to investigate the
patterns of variation between general spoken and written registers in Spanish.
Despite these differences in research focus and corpora, there are strong
similarities between the Valparaíso and Flagstaff studies. The most obvious
is Dimension 1 in both studies, which is a basic oral/literate dimension
composed of features that reflect personal involvement and interactivity as
opposed to dense informational prose. A second similarity is that both
analyses uncovered a narrative dimension, composed of both past tense features
(imperfect and preterite/indefinite) together with 3rd person pronouns
and communication verbs. Fiction is especially marked for the use of these
features in both analyses.
A third similarity between the two analyses is more surprising: the
existence of an informational narrative dimension that is distinct from the
fictional narrative dimension. This is Dimension 5 in both analyses, consisting
of the preterite/indefinite past tense together with nominal features,
adjectival features and prepositional phrases as noun modifiers. Encyclopedias in
the Flagstaff study were especially marked for the use of these features, while
textbooks in the Valparaíso study had the largest positive score on this
dimension. Both of those registers have a primary informational focus that
includes reporting past events, resulting in these similar dimensions in the
two analyses.
Dimension 4 in the Valparaíso study is a general stance dimension,
interpreted as 'modalizing focus', which includes hedging expressions (e.g.,
[...] que, creer, tal vez, a lo mejor), possibility modals (poder) and possibility
adverbs (e.g., probablemente, posiblemente). The oral interviews are especially
marked for the use of these features. The closest counterpart in the Flagstaff
study is Dimension 2, comprising [...] verb features, conditional [...]
obligation verbs, future tense, que verb complement clauses, verbs of
facilitation, que noun complement clauses, etc. Two registers make an
especially dense use of these features: political interviews and political debates.
These are more [...] registers than the personal oral interviews
included in the [...] are actually more similar to the
[...] resources are more [...],
and these resources have come to be [...]
more distinctive dimensions of variation.
The MD analysis of Spanish has illustrated the importance of both kinds
of register patterns. More detailed analysis of these linguistic features across
a range of spoken and written registers should help to enrich the functional
interpretations of these dimensions.

Note

1 As an anonymous reviewer pointed out, the 'business telephone
conversation' register has a higher score than 'casual conversation' on
Dimension 1; however, the difference between the two registers on this
dimension is very small and not statistically significant.

[...] 0.3341
3.3209655 0.7451271 [...] 0.3727
2.5758384 0.2986935 [...] 0.4027
[...] 0.3051511 0.0265 0.4292
[...] 0.1077713 0.0229 0.4521

Inter-Factor Correlations

          Fact 1  Fact 2  Fact 3  Fact 4  Fact 5  Fact 6
Factor 1    1.00    0.26    0.27    0.44   -0.36   -0.14
Factor 2    0.26    1.00   -0.03   -0.02   -0.15   -0.06
Factor 3    0.27   -0.03    1.00    0.19   -0.05   -0.08
Factor 4    0.44   -0.02    0.19    1.00   -0.24   -0.10
Factor 5   -0.36   -0.15   -0.05   -0.24    1.00    0.02
Factor 6   -0.14   -0.06   -0.08   -0.10    0.02    1.00
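
As an illustrative aside (not part of the original analysis), the inter-factor correlations tabulated above can be sanity-checked and ranked with a few lines of Python. This is only a minimal sketch: the values are transcribed from the printed table, and the helper names `is_symmetric` and `ranked_pairs` are hypothetical, introduced here purely for illustration.

```python
# Inter-factor correlations transcribed from the table (Factor 1 .. Factor 6).
labels = [f"Factor {i}" for i in range(1, 7)]
corr = [
    [1.00, 0.26, 0.27, 0.44, -0.36, -0.14],
    [0.26, 1.00, -0.03, -0.02, -0.15, -0.06],
    [0.27, -0.03, 1.00, 0.19, -0.05, -0.08],
    [0.44, -0.02, 0.19, 1.00, -0.24, -0.10],
    [-0.36, -0.15, -0.05, -0.24, 1.00, 0.02],
    [-0.14, -0.06, -0.08, -0.10, 0.02, 1.00],
]


def is_symmetric(matrix):
    """Hypothetical helper: a correlation matrix must equal its transpose."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def ranked_pairs(matrix, names):
    """Hypothetical helper: list each factor pair once, strongest correlation first."""
    n = len(matrix)
    pairs = [
        (names[i], names[j], matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    # Rank by absolute value so strong negative correlations are not overlooked.
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


if __name__ == "__main__":
    print("symmetric:", is_symmetric(corr))
    for first, second, r in ranked_pairs(corr, labels)[:3]:
        print(f"{first} ~ {second}: {r:+.2f}")
```

Under this transcription, the three strongest correlations all involve Factor 1 (with Factors 4, 5 and 3), while every other pair stays at or below 0.26 in absolute value.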
[...] what he/she says. As [...] we can express
our degree of [...] or the [...] of [...] or [...]
We may also [...] our feelings and affection [...]
anger, and we may formulate our discourse or
text as statements, wishes, orders, etc. I thus start from the classic
distinction between intellectual (epistemic, i.e., declarative and
hypothetical), interrogative, volitional and affective modalities proposed by Charles
Bally (1944).3 These were later reformulated by a large number of studies as
regards labels, distinct qualities and location within the framework of
different linguistic theories and approaches, for example studies on modality and
modulation in a Systemic-Functional Grammar Approach (Halliday 1994),
recent developments along that line such as the Appraisal Theory (Martin
2001; Martin and Rose 2003), work on evidentiality (Chafe 1986), on corpus
linguistics' evidentiality and affect (Biber and Finnegan 1989; Conrad and
Biber 2001), on modality and grammaticalization (Palmer 2001), and finally,
on modality and emotion (Sandhofer-Sixel 1990; Danes 1987).
Since Chafe's pioneer work on evidentiality and his research on its
manifestation in reduced corpora of colloquial orality and academic writing
(1982, 1986), there have been many studies of academic and scientific texts
that analyse how attitudes toward knowledge are expressed. Available
studies in English and German focus on standardized written 'genres', such
as research articles, textbooks, peer reviews, etc. (among many others,
Ventola 1997; Hyland 2000). In Spanish, works on this topic are still very
rare, although they also focus on written 'genres' (Ferrari and Gallardo
1999; Gallardo 1999; Ferrari 2004).
Several studies by Biber, both individual and co-authored (among them,
Biber and Finnegan 1989, 1994; Biber et al. 1998; Conrad and Biber 2001;
Biber 2005), have been very enlightening on the topic of evidentiality and
affect in 'genres'. Using the notion of 'stance', they include the lexical and
grammatical codification of evidentiality in English. In their research, based
on multifactorial analysis, they study the grammatical categories that realize
evidentiality and affect stances in different 'genres' (both oral and written).
The examination of co-occurrence of certain linguistic features allows them
to propose different 'attitudinal styles'. In Biber and Finnegan's seminal
work (Biber and Finnegan 1989), the authors present a study on stance
styles in English: with the aid of a computer program, they analysed 24
written and oral 'genres' classified on the basis of groupings of stance
features, and proposed six basic attitudinal styles in English. The selection of
oral texts includes the following: face-to-face conversations, telephone
conversations, public conversations, debates and interviews, radio programmes,

[...] that [...] to [...] markers [...]
who claim that studies [...] stance [...] not
[...] but should rather refer to the concept of [...].4 Each
[...] own use, i.e. it favours a different set of stance markers
[...] different types of grammatical realizations. The basis of this work
is the Longman Spoken and Written English Corpus, from which
[...] 100,000 words ranging across three registers: conversation,
academic and news. The notion of stance is reformulated and broadened
[...] previous studies: stance thus designates the [...] of
feelings and value judgements on three major levels:

- the epistemic stance, which comments on the certainty (or doubt),
  reliability or limitations of a proposition, including comments on the source
  of information;
- the attitudinal stance, which expresses the speaker's attitudes, feelings
  or value judgements;
- the style stance, which describes the way in which the information is
  presented.

Within the epistemic stance, it is possible to distinguish different subclasses:

a) indication of degree of certainty or doubt regarding the proposition
   (realized by perhaps, probably, etc.);
b) comment on the actuality of the proposition (actually, really, in fact);
c) indication that the proposition is in some way vague (sort of, if you call it
   that way);
d) identification of the source of information or the specificity (according
   to) or, by implication, with words such as apparently and evidently;
e) limitation of the information or identification of the perspective from
   which the proposition is true (in most cases, from our perspective).

Keeping in mind that these are texts produced in everyday communication,
an interesting result of this study is that conversation contains more than
twice as many adverbial stance markers as written
texts. Very common stance adverbials in conversation are actually, really
(polysemic) and probably. This frequent use is consistent with several
background characteristics of conversations: focus on interpersonal
relationships, expression of value judgements and personal attitudes, and lack of
time for planning. This result is, in turn, consistent with the expectation that
participants in a conversation are personally involved with their messages
and thus frame their statements with their personal attitudes and opinions.
In academic prose, the writers' concern is evident in the pains they take
to express certainty, actuality and vagueness. However, and contrary to
conversation, there is a relatively wide range of epistemic stance markers
[...] a very [...]
[...] stance. On the [...]
Declarative
Conditional
assertion
Verb tenses
[...] schemata in [...]
in which the [...]
out - different grammatical procedures - a fragment
[...] or following text, and 'assigns value' to it from the
perspective of its factuality. The following are some examples of qualifying schemata:

(2)
[Los receptores son los que van a hacer que el individuo reincida a la nicotina.
Cuando uno fuma de vuelta, yo dejé de fumar, pruebo un cigarrillo, lo que me
desencadena eso, es la nicotina, o sea que son los receptores a la nicotina]X ←
[de eso no hay duda]Y ¿sí? [15, A]

At the end of this fragment I have marked the modality's qualifying
operator (Y) in italics, where the neutral pronoun of the prepositional syntagm -
eso - picks up the fragment of the previous text. The assertion no hay duda
reinforces the declarative value expressed in it. The modal operator is
attached in parataxis to the segment that has been assigned value and, due
to the coincidence with the modal value of the segment that it modifies (an
assertion in itself), the operator may be considered a modal reinforcer.
In example 3 we find an operator that has a different value from its
modified segment. Since its effect is to restrict or relativize the previous assertion,
I call it a 'modal qualifier'. The operator instructs the interlocutor as to the
reservations he/she should have while interpreting X; it follows the
modifier and, again, a deictic element - the neutral demonstrative - realizes the
explicit reference to the modalized fragment:

(3)
[Fumar causa un tercio de la muerte de los hombres entre treinta y cinco y sesenta
y nueve años y es una de las causas de mortalidad que continúa creciendo, y de
aquellos que fumaron durante la adolescencia y durante la vida, la mitad muere
por problemas relacionados al hecho de haber fumado]X ¿Sí? ← [esto también.
cuando vean estadísticas, en los diarios o en lo que sea, les sugiero que lo tomen
con también, porque hay factores, que uno no sabe]Y [14, A]

In example 4, the structural and semantic complexities are even greater:

(4)
[Es que los detectores a la nicotina están en el centro, en los núcleos del cerebro,
los cuales/ en los cuales estas células se mueren, en el Alzheimer o en el
Parkinson. Entonces cuando cuando estas/ estas células del cerebro tienen
este receptor, cuando unen la nicotina, se mantienen activas entonces no logran

[...] different schema whose [...] is
[...] propositional elements. We [...]
ésta es la teoría ahora, which both [...]
and temporally restricts the previous assertion (it indicates that this is
a provisional [...] this assertion, we find a number of
reformulating statements that inform the interlocutor about the limited
range within which he/she should interpret X: o sea no lo tomen como/ como
al pie de la letra, porque todavía no se sabe, esto es lo postulado hasta ahora, lo que
se vio en cuanto se trabaja en/en in vitro, cuando se trabaja en temas de animales.
The sequence ends with an assertion that sums up and clarifies the
restricted nature of X: En humanos todavía no sabemos nada.
From a structural point of view, the modal operator may precede the
modified segment, as shown in example 5, where the neutral demonstrative
operates cataphorically:

(5)
Aumenta el deseo de búsqueda de placer, [esto es lo que decía hoy. que está
discutido, hay algunos que sostienen, otros que no]Y, [que muchas drogas ya no
provocan realmente un placer]X entonces ya tiene que ver con otro/ con/ con la
búsqueda, el deseo de búsqueda del displacer, de no tener /de no sentirse mal.
¿sí? [8, A]

Within the modal operator, the qualifying segment - que está discutido, hay
algunos que sostienen, otros que no - explicitly states the relative value with which
the modified segment should be interpreted. In this case, the modal operator
and the modified segment are placed in a hypotactical structural relationship.
The modal operator may be embedded in the component it modifies: in
example 6, the reinforcer interrupts the linear syntax and separates the
nucleus from the prepositional complement, establishing a parenthetical
connection with its modified segment:

(6)
estas estructuras son capaces de. producir comportamientos rígidos pero. [son
capaces X ← [lo hemos visto, lo han visto ustedes]Y → [de mucho más que eso,
son capaces de aprendizajes simples, pero también de aprendizajes complejos,
aprendizajes que hasta ahora no creían que pudieran existir en un sistema como
éste]X [36, B]

In this case, the operator consists of a constituent that includes a structural
parallelism: [...] constructions including the epistemic sense verb [...]
together with an identical direct complement [...] as the object pronoun),
achieve an emphatic modal value that reinforces the categorical assertion
of the modified segment.
- Sustancias no perjudiciales, no perjudiciales digamos por ahora
- esta es la teoría por ahora
- esto es lo postulado hasta ahora
- aprendizajes que hasta ahora no creían que pudieran existir en un sistema
  como éste

Durational/progressive periphrasis:

- lo que se está viendo acá es que quizá no es tan [...] el número
- esto [...] un trabajo que se está haciendo
- [...] este cerebro
- [...] nos está haciendo falta es en realidad generar nuevos modelos
- es lo que nos está haciendo falta son este [...] de aproximaciones
- qué es lo que se está midiendo
- no se nos ocurrió es que tal vez lo que estábamos haciendo necesitaba un
  contexto diferente

The chart shows that component A is represented largely by a pronominal
element (anaphoric or cataphoric, examples 2, 3, 5, 6). The modified
segment may also be indicated by metapropositional elements, such as idea,
teoría (idea, theory) (example 4); by ellipsis through co-referential
arguments; or even by hedges, such as digamos (let's say). In the case of the
specifically qualifying component, it contains several modal markers, such
as epistemic verbs, names, adjectives, participles, prepositional syntagms
and phrases (see Table 4.1), to which temporo-aspectual modifiers are
regularly attached, thus restricting factuality. Naturally, indicators operate
Conclusions

The [...] science talks has shown [...] with regard to the
manifestation of epistemic modality. On the one hand, there is linguistic
evidence of a significant presence of the specialist's explicit subjective
expression, assuming the responsibility for modal assessment. On the other, the
research has verified a clear tendency to orient lay interlocutors as to how
they should interpret the factuality of the specialist's statements. In this
sense, I have identified and described a regular procedure to accomplish
this - the modality's qualifying schemata. These have a stable structural
composition and functional nature, providing further grounds for the
belief - stated by different analysts - that there is, in fact, order and
regularity in orality (Blanche-Benveniste 1998, among others). The data
gathered has further shown that in the popular science talk genre in Spanish
both personal epistemic value judgements (e.g., yo no creo en una droga
milagrosa, lo que yo sé es que ...) and non-personal epistemic value judgements
(no se sabe, se está estudiando, lo que se vio cuando se trabaja in vitro) that seem
to represent the voice of the speaker's community of peers are significantly
present. We should point out that the non-personal epistemic value
judgement has been described so far as a feature of written academic prose.
The descriptive data gathered at the grammatical form level may be
understood in relation to the functional and situational dimensions of the
genre. The purpose of adequately and fully informing a general audience
about their personal research in a public institutional environment and a
relatively formal context underlies the specialists' concern for a careful
presentation of the degree of factuality of their statements in texts that have
been prepared beforehand. This explains the rich and varied repertoire of
markers to express doubt and certainty, and also the fact that the 'genre' is
full of assertive and hypothetical modal qualifying schemata. On the other
hand, from a situational level, the features that characterize popular science
talks, such as direct contact with interlocutors and oral expression, also
determine how specialists typically manifest commitment (Chafe 1982) in
spoken discourse: references to the first person, statement of speakers'
mental processes, hedges and colloquial locutions.
The descriptive results of the different lexico-grammatical markers of
epistemic modality may be used to guide further research of more extensive
corpora of academic orality and the expression of this modality in Spanish.
This article has provided evidence as to how the features of functional and

Notes

[...] we should also mention the research [...]
recent years has [...] knowledge regarding the [...]
and semantic specificities of lexical units and their
combination and formal behaviour in Spanish [...] Adelstein
2004; Adelstein and Cabré 2002; Kornfeld and Resnik 2002; Kuguel 2006).
2 This project has received funding from the Consejo Nacional de
Investigaciones Científicas y Técnicas de Argentina (PIP-CONICET 6165).
3 Zavadil (1968) presents a study of the different modalities in Spanish and
their markers. See also Kovacci (1990).
4 The analysis of several works by these authors shows, however, that they
seem to make no conceptual distinctions between the terms 'genre' and
'register'. In Biber and Finnegan (1994: 4), for example, they define
register as a 'linguistic variety considered in relation to its use context';
further on, they argue that 'aside from the term register, we have used the
terms "genre", text type, and style to refer to linguistic varieties associated
with situational uses'.
5 Alicia Avellana transcribed the text which I later revised. I have
normalized the transcription, erasing paralinguistic information irrelevant to the
analysis of the topic discussed in this article, such as lengthening or
emphasis. The punctuation attempts to represent intonation in this type
of spoken text.
REGISTER ANALYSIS
Omar
Pontificia Universidad Católica de Valparaíso
Chile

Introduction

Prepositional complements (PC) constitute a kind of grammatical structure related to some verbs. Like many other grammatical structures, it can be seen from different perspectives, depending on the author's point of view. Furthermore, no previous studies have been conducted in which the PCs analysed in this research were related to a lexical verb category, such as movement verbs, cognition verbs, communication verbs, etc. This chapter proposes a more diverse view of PCs, as what has been labelled a 'prepositional scheme'. Moreover, we link these schemes to a specific verb type, namely to what we call communication verbs (Bosani 2000; Sabaj 2004a).

These verbs, as can be seen from the different approaches that will be discussed, play an essential role in various dimensions of discourse, among them the presentation of the discourse's voice and the author's stance with respect to the text topic. Unlike previous studies, this research is based on a corpus linguistics methodological perspective, which implies that descriptions are based on actual texts taken from different language uses. Such an approach has rarely been used in grammatical studies, since they are traditionally associated with the [...] of units without a discourse context, i.e. grammarians have been traditionally more concerned with isolated sentences. In this [...] research is based on diversified corpora containing nine registers of varieties of Spanish, oral and written, scientific and educational, professional and literary, among others.

This [...] has been organized as follows: first, we present the theoretical [...] on the basis of a review of different grammatical approaches, proposing the notion of a prepositional scheme and its relation with PCs. Next, we review the characteristics of verb studies associated with corpus and define communication [...] which

1.1 Prepositional [...]

At this stage, we propose the notion of 'prepositional scheme' as a category of collocation, [...] (at least in part) what different grammars (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986) have variously called prepositional complement or prepositional object complement.

The first question that arises is that related to the reasons for proposing this category, and the need to set it apart from other equivalent grammatical categories. The answer is provided by the difference in collocation, based on the systematic co-occurrence of two or more elements that do not always have a direct grammatical relationship. In this sense, the notion of prepositional scheme can be understood, in principle, as a bigram (Jurafsky and Martin 2000) or a recurrent sequence of two elements, where the first element is a common verb, and the second a common preposition. From a different point of view (Biber 2005), a scheme can be seen as a sequence or a frequently occurring lexical bundle. The schemes we present here also have a third element, so, strictly speaking, we are dealing with a trigram. The third element belongs to a general cognitive category that is equivalent to the notion of thematic role (Dowty 1991) or semantic participant (Sabaj 2006). These semantic categories, however, are cognitive and, as such, do not necessarily maintain a dependent syntactic relation with the previous elements. They are cognitive elements because they can be represented in an ontology or semantic network where the nodes correspond to events, bodies, objects and a more or less defined number of circumstantial elements (event, space, place, mode and instrument). These cognitive categories complete the meaning of the verb and can relate (as shall be seen later on) with syntactic structures such as prepositional object complements.

For Spanish, there are several approaches dealing with PCs (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986) and consequently a very wide range of terms is employed. In this paper, we will summarize the most relevant classifications proposed in Spanish and then we will relate the description of those structures to the categories defined for the prepositional schemes we present.

From a general point of view, prepositional complements are understood as structures introduced by a preposition, forming one single unit with the [...] followed [...] an argument. Even though this general definition may seem acceptable, to a greater or lesser [...] it is [...] to propose at
1) [...]
2) [...]

Even [...] these criteria [...] structures that function in a similar way and, [...] prototypes.

According to the first criterion, some verbs do not present an alternative construction without a preposition, i.e. they only appear with a preposition. Some examples of those verbs are radicar en, consistir en, carecer de, influir [...] de. The preposition, in these cases, is generally placed immediately after the verb and a complement is mandatory to make the construction complete:

(1) *El gobierno influyó en.

According to criterion 2) it is possible to find constructions that may or may not be followed by a preposition. Some examples of this type of verb are hablar/hablar de, preguntar/preguntar por, aprender/aprender a, luchar/luchar por. The alternation does not imply a change in meaning, but includes another participant:

(2) La mujer preguntó por ti/La mujer preguntó todo lo que quería.
(3) Aprendí a relajarme antes de la exposición/Aprendí mucho durante mi estadía.
(4) Luchó por la libertad de [...]/Luchó toda su vida.

Criterion 3) establishes a distinction between verbs with a transitive and intransitive version. The transitive version requires no preposition, unlike the intransitive version with a [...] form (Di Tullio 1997). Even though these verbs can be understood as a subset of verbs generated from criterion 2), the distinction between transitive and intransitive does not [...] to all of them, but to those defined by this third criterion. Some [...] of these verbs are the following:

(5) [...] lamentó el [...]/[...] se lamentó de los resultados obtenidos.
(6) [...] olvidó el [...]/[...] se olvidó de su compromiso.

[...] to criterion 4) verbs that take prepositional [...] are [...] as such when they contribute [...] value to [...] of the verb. Such [...] are arguments and not [...] answer 'What?', not 'How?', as can be seen in [...]

(7) Mónica habla de corrido.
(8) Phoebe habla de música.

As we have seen, there are different [...] determine whether a [...] can be [...] as a PC or not, and those criteria tend to [...] in a manner which, more than helping to define constructions [...], allows the analyst to establish relatively prototypical behaviour in open categories, as Cano has rightly proposed (1999: 1813):

The individuality of prepositional object complements in relation to the elements which govern them is expressed not only by the fact that prepositional dependence is not a well-defined category of complements in form and meaning, but also by the fact that, as with objects, they are not determinations that appear and are established as a part of any verbal process, but are specific to certain verbs.

Various authors (Di Tullio 1997; Torrego 1999; Gómez 2002; Fuentes 1985) point out that PCs can be misinterpreted as circumstantial complements (CC) and direct objects (DO). On the one hand, the way to distinguish PCs from CCs is based on criterion 4) defined above. In this sense, CCs can be easily omitted, whereas PCs cannot be deleted (Cano 1999). On the other hand, PCs have a strong conceptual link with DOs, because they both complete the meaning of the verb and can be recognized from the question 'What?'. Due to this close relationship between PC and DO, some authors have called these structures 'supplements' or 'prepositional object complements' (Torrego 1999; Hernández 1986; Cano 1999). Nevertheless, there are formal differences between both complements (Gómez 2002). PCs are introduced by a preposition and cannot be replaced by atonal personal pronouns. Some authors (Di Tullio 1997; Cano 1999; Torrego 1999) have also remarked that in many cases the presence of a prepositional complement next to a verb that does not require this structure implies a [...] in meaning, e.g., [...]. It is [...] to establish a direct relation between PCs and [...] schemes. All the discussed structures are [...] include a verb, a preposition and a category. Some of these schemes correspond to a [...] depending on the type [...] category they present. In Table 5.1 we illustrate the categories to be used, and we will see which of them could correspond to a PC.

From the correspondence shown in Table [...] it can be stated as a [...] that when categories in a scheme are an [...] or object, a person [...] event, such a scheme [...] to a PC [...] the category presents [...] and argument [...] structures.
Table 5.1 [...]

Category        Example                        PC
Location        anunciar en [...]
Mode            hablar de corrido
Instrument      hablar [...]
[...] or [...]  hablar [...]                   +
Person          hablar de Francisca            +
Event           hablar de que es entretenido   +
Time            hablar por una hora

Table 5.2 [...]

Type  Description                                                                                     Examples
A     [...] be done with words                                                                        preguntar, responder, [...] informar, avisar, ordenar, anunciar, declarar, relatar, narrar, comentar
B     Those referring to actions that can be done with words and affect the listener                 piropear, saludar, insultar, calumniar, injuriar, amenazar
C     Those referring to attitudinal actions that affect the propositional content of the statement  enfatizar, asegurar, afirmar
D     Those referring to physical action that can be done with words                                 gritar, susurrar, balbucear, murmurar
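
The prepositional scheme described in the text (a verb, a preposition and a cognitive category) and its preliminary correspondence with PCs can be sketched as a small data structure. This is only an illustration under stated assumptions, not the procedure followed in the chapter: the class name, the function name and the lower-case category labels are hypothetical, and 'entity' stands in for the partly illegible first category label.

```python
from dataclasses import dataclass

# Categories that, according to the text, make a scheme correspond to a
# prepositional complement (PC). The label spellings are assumptions.
PC_CATEGORIES = {"entity", "person", "event"}


@dataclass(frozen=True)
class PrepositionalScheme:
    """Hypothetical trigram: verb + preposition + cognitive category."""
    verb: str
    preposition: str
    category: str


def corresponds_to_pc(scheme: PrepositionalScheme) -> bool:
    """Return True when the scheme's category is one linked to a PC."""
    return scheme.category in PC_CATEGORIES


# Examples taken from Table 5.1 of the chapter.
examples = [
    PrepositionalScheme("hablar", "de", "person"),  # hablar de Francisca
    PrepositionalScheme("hablar", "de", "mode"),    # hablar de corrido
    PrepositionalScheme("hablar", "por", "time"),   # hablar por una hora
]
pc_flags = [corresponds_to_pc(s) for s in examples]
```

Keeping the category as a separate field mirrors the point made in the text that the third element is a cognitive category and need not stand in a dependent syntactic relation with the verb and the preposition.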
2. Verbs and corpora

Verbs undoubtedly play a central role in natural languages (Wiemer-Hastings et al. 1998), a fact that makes them of focal interest in any linguistic study. Most current studies on verbs (Fernández et al. 1999; Aguirre 2000; Vázquez et al. 2000; Morante et al. 2000; Vázquez et al. 2002; Levin 1993; Ferrer 2004; Subirats 2004; Castellón et al. 2005) are formalistic and focus on the so-called lexical-syntactic interface. The purpose of these studies is to model the way in which some lexical patterns correlate with syntactic structures in order to implement those models into computerized verbal lexicons. This line of research has attained a high degree of development, but its contribution can rarely be used to describe real texts, which leads us to claim that there is little relationship between studies on verbs and their application in the description of authentic texts.

Conversely, the few studies that analyse verbs based on corpora consider these units to be a specific feature within a set of features in what has been called multiple feature analysis (Biber 1988; Parodi 2005c). Nevertheless, some research using corpora has specifically focused on verbs (Sabaj 2004a and 2004b), stressing their importance for register determination or automatic analysis of documents (Klavans and Kan 1998). The contribution of these studies exclusively focused on verbs is the fact that more accurate information can be obtained about the behaviour of this category, information which is otherwise lost when the verb is just another feature within a matrix.

2.1 Communication verbs

Communication verbs basically correspond to what is called verba dicendi in Latin grammar; that is, those lexical pieces expressing verbal human activ[...]. In this sense, and from a general point of view, communication verbs can be identified with acts of speech.

For the purpose of this [...] we will start from Bosani's [...] scheme to determine the [...] of verbs to be considered in our research. Later, we will complete this scheme with other authors' taxonomies (Hyland 1998; López 2001; Massi 2005), determining the functions recognized by the communication verbs. The objective of the above will be to propose the typological criteria to be used in this work.

Bosani (2000: 253) states that the 'Spanish lexicon contains a large number of predicates referring to an act of enunciation'. She adds that the structure of communication verbs is determined by the relation of these verbs to the verbal archetype DECIR ('to say'), which structures the type of verb in question. The different subtypes of communication verbs can be defined according to the direct or indirect relation of a particular verb to the verb DECIR. Based on this principle, Bosani (2000) proposes four types of communication verbs that are schematically presented in Table 5.2.

Type A verbs correspond to the performative use of language. Type B verbs can be defined according to two criteria: 1) whether the verbal act takes place through the use of a special type of locution (greeting, compliment, slander, etc.), and 2) whether it presupposes some degree of effect on the listener/patient of the verbal act. Type C verbs, apart from the action of saying something, express a certain attitude of the agent towards the propositional content introduced by the verb. As stated by the author (Bosani 2000), these verbs have a more complex lexical-syntactic representation, since beside the locutive action feature, they include the +MANNER feature in the representation, thus becoming equivalent to Urban and Ruppenhofer's proposal (2001). Finally, Type D verbs point to a locutive activity, marking the specific physical manner in which the agent does the 'saying'.

This work will not address verbs of type B or D, but will rather concentrate on A and C types. The first group is close to the protoverb DECIR which, according to the author, is a basic semantic predicate underlying all communication verbs. On the other hand, as was mentioned before, type C verbs include the expression of the attitude of the speaker towards the communicated propositional contents.
it is not easy to [...] a [...] or [...] aspects stem from the type of category used and [...] aspects from [...] (using neither co-relational nor parametrical but basic statistics) of the occurrence of those categories in each subcorpus.

Considering the above-mentioned arguments, the general objective of the study is to make a comparative description of the performance of prepositional schemes of two types of communication verbs in a multiple-register corpus and relate the performance of each type of verb with the macrocontextual characteristics of the registers under study.

3.1 Method

Since our analysis unit is a sequence of three elements, some decisions must be made. Both the sequence and each separate element can be studied as a unit. For better understanding of the manner in which we will analyse the schemes, we present Table 5.3.

Table 5.3 [...]

Mode
Instrument
[...] or [...]
Person
Event
Time

3.1.1 [...] a descriptive matrix analysis

In Table 5.4 we present six criteria for the analysis.

Table 5.4 General analysis matrix

Criterion  Name
N          Number of words
V          Number of verbs
NCV        Number of neutral communication verbs
ACV        Number of attitudinal communication verbs
P          Number of prepositions
C          Number of cognitive categories

3.1.2 Verb sample and selection criteria

A sample of 34 communication verbs was selected, dividing them into two groups as described above: neutral communication verbs and attitudinal communication verbs. The selection was done in three stages. [...] a survey of various grammars and specialized articles in search of the verbs most frequently included in these categories was conducted (Alarcos Llorach 1999; Cano 1999; Di Tullio 1997; Fuentes 1985; Gómez 2002; Hernández 1986). Then, in order to establish a distinction between neutral and attitudinal [...] were classified [...] to Bosani (2000); additionally, we analysed the co-text of each occurrence to ensure that every verb corresponds to the category to which it was [...]

The third stage involved:

1) a series of queries to determine average frequency of occurrence of a verb. An internet search engine was used and the results were written [...] in order to eliminate verbs that were seldom used.
2) other queries, using the same instrument as above, to rule out the verbs unable to combine with a preposition, i.e. all selected verbs have the potential to combine with a preposition.

Finally, the research was restricted to 17 verbs appearing at the top of the list of frequency of occurrence in the two groups mentioned above. They are shown in Table 5.5.

3.1.3 The corpus

Table 5.6 presents the corpora used in the study.

The corpus analysed consists of nine diversified registers of Spanish, which include written registers (ARTICOS, DETP, DEEB, DICIPE, CPP and CLL) and oral registers (CEO and NOTICEN TV). Some of these are school-level registers ([...] DEEB, CEO, and CTC), while others are associated with communication and the dissemination of scientific knowledge (ARTICOS and DICIPE). Within the corpus, there are some registers related to specific disciplines such as literature and public policies (CLL and CPP). In short, we have tried to focus our research on registers that can be grouped together or set apart according to different criteria, so as to cover a wide range of registers. For a detailed description of the general characteristics and multiple corpus collection procedures, see the
Table 5.5 [...]

      NCV           ACV
[...]
4     Contar        Precisar
      Narrar        Afirmar
      Discursear    Aseverar
      Enunciar      [...]
8     [...]         [...]
9     [...]         Comentar
10    Informar      Criticar
11    Manifestar    Declarar
12    Mencionar     Definir
13    Nombrar       Especificar
14    Señalar       Explicar
15    Transmitir    Insistir
16    Denominar     Revelar
17    Presentar     Sostener

The [...] done [...] context of 5 elements [...]. The results were then transferred to MS Excel. An [...]

Left [...]
[...] por eso se usaban la [...] Mauricio Purto, andinista y médico.
[...] superior a la de los esquimales, se explicaría con su prolongada permanencia en estas regiones.

After completing the queries for all registers, categories were classified in the last column of the grid above. Problem cases were submitted to expert peers for classification. Finally, the variables were isolated for the two patterns to be researched:

1) [...]
2) [...]
Table 5.7 [...] communication verbs

      NCV          Freq.   %      ACV          Freq.   %
1     Hablar       582     32%    [...]        81      32%
2     Decir        407     22%    Comentar     46      18%
3     Presentar    209     11%    Definir      33      13%
4     Contar       103     6%     Insistir     31      12%
5     Señalar      93      5%     Declarar     18      7%
6     Expresar     92      5%     Revelar      9       4%
7     Informar     75      4%     Asegurar     8       3%
8     Manifestar   74      4%     Argumentar   7       3%
9     Mencionar    50      3%     Enfatizar    [...]   3%
10    Comunicar    41      2%     Sostener     6       2%
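
The percentages in Table 5.7 are computed on the total of each verb subtype, which is why equal percentages in the two halves of the table stand for very different counts. As a rough illustration, the sketch below estimates the range of subtype totals that is compatible with the printed figures. It assumes that each percentage was rounded to the nearest integer, which the source does not state, and the function name is hypothetical; the row with an illegible count is left out.

```python
# (count, rounded percentage) pairs as printed in Table 5.7; the row with
# an illegible count is omitted.
NCV_ROWS = [(582, 32), (407, 22), (209, 11), (103, 6), (93, 5),
            (92, 5), (75, 4), (74, 4), (50, 3), (41, 2)]
ACV_ROWS = [(81, 32), (46, 18), (33, 13), (31, 12), (18, 7),
            (9, 4), (8, 3), (7, 3), (6, 2)]


def implied_total_range(rows):
    """Intersect the totals compatible with each rounded percentage.

    Assumes percentages were rounded to the nearest integer, so a printed
    value p stands for anything between p - 0.5 and p + 0.5.
    """
    low = max(count / ((pct + 0.5) / 100) for count, pct in rows)
    high = min(count / ((pct - 0.5) / 100) for count, pct in rows)
    return low, high


ncv_low, ncv_high = implied_total_range(NCV_ROWS)
acv_low, acv_high = implied_total_range(ACV_ROWS)
print(f"NCV total between {ncv_low:.0f} and {ncv_high:.0f}")
print(f"ACV total between {acv_low:.0f} and {acv_high:.0f}")
```

In the table itself, the same 32% corresponds to 582 occurrences in the NCV half and to 81 in the ACV half, so the two percentage columns should be read separately.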
Graph 5.1 Frequency of communication verbs* (x-axis: Registers)
*Frequency X 100

in ARTICOS and DICIPE, two registers associated with science and technology. These results imply that in science communication and dissemination, communication verbs fulfil both roles, i.e. present information and express attitudes towards propositional contents. In addition, both registers present few communication verbs, a fact that could have two explanations: first, in these registers, other types of verbs are predominant (e.g., copulative) or other grammatical categories are prevalent (nouns, adjectives); or second, DETP and CEO, two registers associated with school activities, present the greatest difference in occurrence between one group of communication verbs and the other.

The data show that these registers favour content presentation and reference to the activity of communication before attitudinal expression towards propositional contents. It is worth mentioning that both are production registers, i.e. texts written by students in school contexts. Thus, we think that their educational level can account for a greater difference in the use of one verb rather than another, since the expression of attitudes, that is, taking a stance toward the contents of what is being communicated, is part of a more advanced educational level, where the subject has a better command of vocabulary. The subjects who produced these registers are at the [...] called [...] Bereiter and Scardamalia (1987) 'the expression of [...]

[...] DEEB and NOTICEN TV) presenting a [...] with minimum difference between one type of verb and the other. This implies that these verbs, as textual features, do not [...] to establish a variation between the registers in this group. [...] the register of Latin American literature texts, can be seen [...] between those with the greatest difference [...] and those where the difference is not [...]. The presence of communication verbs in this register would be associated with the occurrence of expressions referring to interaction among characters.

The ten most frequent communication verbs in the corpus are presented in Table 5.7.

As shown in Graph 1, Table 5.7 illustrates the predominance of NCV over ACV. It is interesting to note that the most frequent NCV are close to what Bosani (2000) has named Protoverbs. Even though the resulting percentages of both verb types seem similar in this table, they are not comparable, because they have been calculated on the total of each subtype and comparison can only be done vertically along the table.

4.2 Dependence between prepositions and cognitive categories

To find out whether there is a general association among the examined vari[...] Cramer's V Test, which is a symmetrical coefficient, was used. This test does not distinguish between independent (cause) and dependent (effect) variables, and it can only reflect the force and direction of the relation between two variables. This coefficient, just as other similar coefficients, [...] to compare the values obtained [...] each register and it usually ranges from 0 to 1 (some range from -1 to +1), 0 being the statistical independence and 1, the perfect association. For the purposes of this study, the expected value was 0.6. In other words, two variables become associated when one occurs six times with the other out of ten occurrences. It should be pointed out that only the nominal result, not the number, of each test will be shown, since p and α values vary every time the test is applied to each register. To minimize the error made by the application of a [...] test, the expected value α = 0.5 was divided by the number of times (9) the test was applied, and this yields a 0.055. This value was used in each of the nine tests carried out on each type of verb.

The results shown in Table 5.8 reveal [...] in the case of the NCV, there is [...] between the [...] type and the cognitive [...] The [...] this preposition occurs in all the [...] studied. The occurrence of
one preposition and one category in these verbs matches with frequent collocations in each one of these registers. These results show that there is a relation between both categories and they jointly function as a sequence typical of the registers where they jointly occur.

In the ACV, in contrast, there is an association between the preposition and the cognitive category in only two of the registers studied, ARTICOS and CEO. According to the above, science and orality present sequences of typical prepositional schemes, which will be discussed in the following section.

Table 5.8 [...]

Register      NCV   ACV
ARTICOS       *     *
CTC           *     o
DETP          *     o
DEEB          *     o
DICIPE        *     o
NOTICEN TV    *     o
CPP           *     o
CLL           *     o
CEO           *     *

* = Dependent
o = Independent

Graph 5.2 Schemes with the preposition a (y-axis: 0% to 70%)
[Graph 5.3 legend: Place, Mode, Entity; y-axis: 0% to 100%]
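
The dependence check described in section 4.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it computes Cramér's V from a contingency table in plain Python, compares it with the 0.6 value mentioned in the text, and divides the significance level by the number of tests. The function names and the sample counts are hypothetical, and the figures 0.5 and 9 are taken as printed in the text.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        raise ValueError("need at least two rows and two columns")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def corrected_alpha(alpha, n_tests):
    """Divide the significance level by the number of tests applied."""
    return alpha / n_tests


# Hypothetical counts: rows are prepositions (a, en, de), columns are
# cognitive categories (mode, entity, person). Not data from the study.
counts = [
    [30, 5, 2],
    [4, 25, 3],
    [1, 6, 28],
]
v = cramers_v(counts)
is_associated = v >= 0.6                  # value stated in the text
alpha_per_test = corrected_alpha(0.5, 9)  # figures as printed in the text
```

Because the coefficient is symmetrical, swapping rows and columns of the table leaves the result unchanged, which matches the remark in the text that the test does not distinguish between independent and dependent variables.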
4.3 Prepositional schemes of communication verbs in Spanish

The test shown above does not help determine which prepositions relate to which specific category. For both verb types, the three most frequent prepositional schemes in each register will be shown.

4.3.1 Prepositional schemes in the NCV

Graph 2 shows the NCV schemes, preposition 'a' being the most frequent in these types of verbs.

The graph shows that the preposition a preferably combines with three cognitive categories, namely, mode, [...] and [...] the category [...] is recurrent in those registers [...] with communication and the dissemination of science (ARTICOS, CTC and DICIPE). This category, which [...] to abstract nouns, is functional in these registers because [...] for concept transmission. It is worth noting that with this [...] there is [...] behaviour of the category [...] which alternates with [...] of those general registers that oppose the [...] DEEB and NOTICEN TV). The communication [...] NCV tend, in these [...] to whom this

Graph 5.3 Schemes with the preposition en

As to the 'mode' category, this presents a heterogeneous behaviour in the different registers, but rare occurrences or their absolute absence is noticeable in the registers CTC, DETP and DICIPE. These registers group together in terms of theme and function, namely, the technical-scientific discourse both in school settings and press dissemination. The data suggest that, combined with the preposition a, these registers rarely express the mode of communication.

Graph 3 shows the schemes with the most frequent preposition in this type of verb, that is, the preposition en.

When there is a NCV with the preposition en, schemes with categories of place, mode and entity are mostly formed. The category 'entity' in this case
[Graph: series include Person and Event; y-axis: 0% to 100%]
[Graph: series include Person; y-axis: 0% to 70%]

prevails in the registers CEO and CPP. Though very different, the theme of both registers can explain the prevalence of such a scheme. In the first case (CEO), the theme is the text comprehension process and, in the second case (CPP), poverty eradication state policies are dealt with. This scheme (NCV + en + entity) is, then, functional in those registers whose themes are abstract. The fact that the categories 'place' and 'mode' are typical of CLL is also decisive in the sense that the function of these categories is to indicate the immediate context (deictic markers), a prevailing function in this register. This tendency may be due to the mention of specific parts of devices associated with technique in the case of CTC and to the reference to geographic places in the case of both DETP and DICIPE. As to the expression of the 'mode' in the NCV, it is typical of the register NOTICEN TV, most probably associated with the way people in the news express themselves.

Graph 4 shows the schemes resulting from the combination of a NCV with the third most frequent preposition, namely, the preposition de.

The data in Graph 4 show that the preposition de preferably combines with the categories 'entity', 'person' and 'event'. First, the homogeneous behaviour of this preposition concerning the category 'entity' should be emphasized. The scheme NCV + de + entity is cross-sectional and is not affected by the differences that derive from the macrocontextual characteristics of the different registers. In this sense, this scheme represents a tendency of the language in its entirety; that is, when a NCV occurs in combination with the preposition de, the tendency is for it to be followed by an entity, independent of the register which is being studied. It is important to emphasize that the occurrence in these schemes of the category 'event' does not characterize any other sequence in the NCV.

The occurrence of this category in combination with the preposition de can be understood as either activation or a syntactic restriction favoured by such a preposition, that is not allowed by other prepositions. The category 'event' presents a relatively homogeneous behaviour in registers of different characteristics (CEO, CPP, CTC, DEEB, DETP, DICIPE, NOTICEN TV), and so the explanation of the syntactic restriction is the most pertinent one.

The preposition de creates a scheme with the category 'person', mostly in the CLL and in the ARTICOS. In the first case, it refers to the presence of characters, whereas in the second to the presence of authors.

4.3.2 Prepositional schemes in the ACV

Graph 5 shows the ACV schemes with the preposition a, the most frequent one in this type of verb.

With the ACV, the preposition a largely produces schemes with the categories 'mode', 'entity' and 'person'. The category 'mode' occurs only in three of the nine registers studied and it prevails in the registers CLL and ARTICOS more than in any other register. The most relevant category that combines with this preposition is 'person', with a high occurrence in the CLL, CTC and DICIPE registers. The presence of this category in these registers must be associated with the reference to characters in the case of literature, whereas in the other registers, to the presence of different voices or reference to different authors' views of the topic of the texts. There is a combination of these verbs with the category 'entity' which is less prevalent than the other categories, but it is markedly present in the ARTICOS, DETP and DICIPE registers, and has a definitive absence in other registers (CLL, CPP, CTC and NOTICEN TV). The occurrence of this category is associated with the presentation of abstract concepts that occur in the registers mentioned.

In Graph 6, the prepositional schemes with the second most frequently occurring preposition are presented.
WORKING SPANISH CORPORA MULTI- ANALYSIS
100% 00%
90% Mode 90%
80% 80%
lnstrument
70%
60% Person
60%
50% 50%
40%
40%
30%
30%
20%
20%
i0%
i0%
0%
0%
00 <yo
C.i
d:a <yo
C.i v'>' ~(;
(¡
:<.{J :<-'fJ
~ ~
Graph 5.6 Schemes with the preposition con Graph 5.7 Schemes with the preposition de
Just as Graph 6 shows, the preposition con preferably combines with the categories 'mode', 'instrument' and 'person'. First, the presence of the category 'instrument' with a high occurrence in the register CEO should be noted. Moreover, this category has the same behaviour as that of the categories 'person' and 'mode' in the CPP, CTC, DEEB, DETP, DICIPE and NOTICENTV registers. In these registers, the categories present a total absence and are perceptible only in the DICIPE register. The category 'instrument' in this register, associated with the dissemination of science in the written press, can be related to the presentation of different authors' views or to the explicitness of the means on which communication is based. Likewise, the category 'person' occurs with a different behaviour in the ARTICOS, CEO and CLL registers due to the occurrence of authors in the case of ARTICOS and of characters in the case of CEO and of CLL, respectively. It is relevant that, for the category 'instrument', a collocational argument can be used in the sense that this preposition conveys in itself (along with the preposition por) an instrumental meaning.
In Graph 7 the prepositional schemes with the preposition de, the third most frequent in the ACV, are shown.
The preposition de is productive with the categories 'mode', 'entity' and 'event'. The category 'entity' shows a high presence in two registers (DETP and ARTICOS) and zero presence in the remaining registers. As was already mentioned, the presence of the category 'entity' relates to the communication of abstract concepts. The category 'event', on the other hand, shows syntactic compatibility with the preposition studied and mostly prevails in the register CEO, i.e. oral interviews about text comprehension processes. In this register, the presence of events relates to the narration of acts concerning the text task.

Table 5.9 The most frequent schemes in both types of communication verbs

          NCV                            ACV
a         en        de         a         con          de
Mode      Place     Entity     Mode      Mode         Mode
Entity    Mode      Person     Entity    Instrument   Entity
Person    Entity    Event      Person    Person       Event

4.4 General comments on prepositional schemes in the communication verbs
The occurrence of schemes independent of the registers in both types of verbs are commented on, with the purpose of analysing which are the most frequent prepositions and the categories that preferably combine with these prepositions.
As Table 5.9 shows, communication verbs share two of the three most frequent prepositions, in particular the prepositions a and de. In these cases, they further share the same categories. That is to say, independent of the type of verb, there is dependence between a preposition and a cognitive category: if the preposition a occurs, the categories that will be activated are 'mode', 'entity' and 'event'. The above occurs regardless of the specific function performed by both subtypes of communication verbs. It should be pointed out that, independently of the preposition and the type of verb, the most productive categories in these schemes are 'mode', 'entity', 'person', 'place', 'instrument' and 'event'.
Both Table 5.9 and the graphs previously presented are based on the highest frequencies, not on the total occurrences. Considering this, in the NCV the preposition en is more relevant to this type of verb. The same occurs with the con, which is unique to the high occurrences in
APPENDIX: CORPUS
Discourse
ID: CTC
Name: Technical-scientific Corpus
Mode: Written
Register: Technical-scientific
Brief Description: This corpus is composed of specialized texts of obligatory and complementary reference in three technical-professional areas (maritime, commercial and industrial) used in the last year of the differentiated school system in Chile. The corpus was collected in schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Collection Year: 2003
Number of Documents and Number of Words: Documents: 74; Words: 774,622

ID: DICIPE
Name: Scientific Dissemination in Written Press
Mode: Written
Register: Journalistic
Brief Description: Texts disseminating science and technology in five Chilean newspapers
Collection Year: 2004
Number of Documents and Number of Words: Documents: 412; Words: 204,598
ID: NOTICENTV
Name: Central
Mode: Oral
Brief Description: One month's [...] of four open national TV programmes
Collection Year: 2000
Number of Documents and Number of Words: Documents: 270; Words: 84,809

ID: CEO
Name: Oral Interviews
Mode: Oral
Register: Oral interviews
Brief Description: This corpus is composed of oral semi-directed interview transcriptions with final year students of technical-professional and human-scientific school systems. The interview topic was the use of comprehension strategies in students. The corpus was collected in schools of the V Region, Valparaíso, in the frame of the FONDECYT 1020786 project.
Collection Year: 2003
Number of Documents and Number of Words: Documents: 4; Words: 410,981

ID: CPP
Name: Public Policies Corpus
Mode: Written
Register: Socio-political
Brief Description: Texts of public policies to overcome poverty, collected from different institutes, institutions and think-tanks from a spectrum of political tendencies - the left, the centre and the right wing
Collection Year: 1999
Number of Documents and Number of Words: Documents: 20; Words: 234,818
ID CLL
conduded distance
and
and MF in written language (95 per cent); (iii) in the case of vague, distant or remote futurity, there is a categorical use of MF (100 per cent) in the written corpus, whereas PF is more frequent (64 per cent) than MF (36 per cent) in the spoken corpus. The results show that, despite the general preference for MF in written Spanish and for PF in spoken Spanish, in both cases there is a strong correlation between temporal distance and the selected future form.

3.2 Person of the verb tense
The results of the comparison related to the person (first, second or third person or plural) of the future tense verb in both corpora are shown in Table 6.6 and Table 6.7. 15
Table 6.7 shows that 2nd person verbs occur in the corpus. If we compare the results of Tables 6.6 and 6.7, the most interesting finding relates to the use of 1st person verbs: the frequency is high in spoken (91 per cent) as well as in written Spanish (75 per cent). In my opinion, the correlation between 1st person verbs and PF, in both corpora, is due to the fact that [...] have more certainty about their own intentions than someone else's. In other words, the 1st person in expressions of future seems to be associated with intention, a modality that reflects linguistically the intention of carrying out the action. As Bauhr says, '[...] occurs with 1st person verbs'. The correlation between the use of 1st person verbs and PF is ratified by the results obtained by Bauhr (1989: 96) and Troya (1998: 82-136).
As Tables 6.6 and 6.7 show, in the 2nd and 3rd person the tendencies are marked by discourse mode: more frequent use of MF in written and more PF in spoken language. This can be explained by the fact that 2nd and 3rd person verbs report someone else's intention, not the speaker's.
Examples (4) and (5) illustrate the difference between showing an intention (which may be linked to the use of the 1st person) and not showing it (which may be linked to the use of other grammatical persons):

(4) /Use of 1st person verbs/
a. Y veo a la ... la vaca por allá bien lejos, y yo: ' aprovechar para agarrar al becerrito'. (Sedano 1994)
'And I see the ... the cow over there, quite far away, and I say: "I'm going to take the opportunity to grab the little calf".'
b. quédate . 'No ... no, yo no me voy a aquí.' (Sedano 1994)
'"Hugo, stay here". "No ... no, I'm not going to stay here".'
c. 'El barco no está en condiciones de cargar y no voy a correr ese riesgo.' (Sedano in press)
'"The ship is not ready to be loaded and I am not going to take that risk".'

(5) /Use of 3rd person verbs/
a. Los supermercados afiliados a ANSA mantendrán cerradas sus puertas durante el día de sólo aquellos establecimientos ubicados en las zonas brindarán atención (Sedano in
salida violenta
es más y con violenta todo ese grupo
escenario (Sedano in
'if Chávez refuses to a violent is more and with a
violent that whole group will be thrown out arena'.
c. La incapacidad de resolver la crisis nos conduce a un escenario [...] que nos llevará por la senda de la (Sedano in
'the inability to solve the political crisis leads us to a terminal scenario [...] that will lead us to hyperinflation'

All examples in (4) and (5) refer to future events: those with 1st person verbs (4a-c) show the speaker/writer's intention to do or not do something, whereas in those with 3rd person verbs (5a-c) the speaker/writer is either just reporting what others said (5a), or expressing his/her opinion about an event that does not depend on his/her intentions; in fact, in (5b) the event depends on a condition (if Chávez ...) and in (5c) it is based on a prediction that, in the end, can be refuted by future events.
Although no absolute generalizations can be made about the fact that all future expressions in the 1st person show an intention and that the speaker/writer's intention is linked to his/her confidence in the occurrence of the future event, to a certain extent, the correlation undoubtedly exists, as suggested by the percentages in Tables 6.6 and 6.7. What else could explain the preference for PF when the verb is in the 1st person?
Confidence in the realization of a future event is definitely not the same as absolute certainty, as these are two different modalities. However, confidence and absolute certainty do have something in common, namely, that in both modalities the speaker/writer shows an assertive attitude far from doubt or uncertainty.

a. Sin acá en Venezuela ya se sabe que esos
cos del mandatario serán los al final, terminarán conspirando
contra la buena voluntad de el
(Sedano in
'However, here in Venezuela it is known that the head of state's
love for the of will be what, in the end, will end up
the of the nations that make up the
b. yo sé que no va a estar aquí a las cinco. (Sedano 1994)
'I know he is not going to be here at five.'
c. 'estamos seguros de que no va a suceder nada porque ya se han tomado todas las medidas preventivas.' (Sedano in press)
'"we are certain that nothing is going to happen, because all preventive measures have already been taken".'

/future expressions in interrogative sentences denoting uncertainty/
a. Inmediatamente le asalta de nuevo la duda. ¿Qué pasará cuando pasen 10 años, y tú tengas 80 [años] y ella 30? (Sedano in press)
'He was immediately seized by a doubt. What will happen in ten years, when you are 80 [years old] and she is 30?'
b. Yo me pongo a pensar en las elecciones que vienen ahorita ¿Por quién voy a votar? (Sedano 1994)
'I start thinking about the forthcoming elections. Who am I going to vote for?'

Tables 6.8 and 6.9 present the results of the epistemic modality markers.
Tables 6.8 and 6.9 show that the quantity of epistemic modality tokens is small. However, we can observe that in both tables the tendency is the same: the frequency of MF increases when the future expression depends on interrogative sentences denoting uncertainty, while the opposite occurs with PF. These data may show a correlation between the speaker/writer's certainty about the future event taking place associated with PF and uncertainty associated with MF.
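The percentages discussed above are relative frequencies: the tokens of each future form in a given context, divided by all tokens found in that context. As a minimal sketch of that arithmetic, assuming purely hypothetical token counts (the counts, the context labels and the function name `form_shares` are illustrative and are not taken from Tables 6.8 and 6.9), the share of each form per context could be computed as follows:

```python
from typing import Dict


def form_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each form's share of the tokens in one context, as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # No tokens in this context: report 0.0 for every form
        return {form: 0.0 for form in counts}
    return {form: round(100 * n / total, 1) for form, n in counts.items()}


# Hypothetical counts, for illustration only (not the chapter's data)
contexts = {
    "certainty verb": {"MF": 3, "PF": 9},
    "uncertainty interrogative": {"MF": 7, "PF": 1},
}

for name, counts in contexts.items():
    print(name, form_shares(counts))
```

With very small token totals, as in the tables discussed here, a difference of one token shifts the percentages considerably, which is one reason to read such figures as tendencies.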
Table about
future
Tokens % Tokens %
With verbs saber; estar seguro 1 50 50 2
In 7 100 0 7
8 89 11 9

4. Discussion and conclusions
The comparison of the results, provided by all studies reported here on the use of the two future forms, indicates that PF is the most frequent in spoken Spanish, whereas MF is preferred in written Spanish. The preference for one or the other future form varies not only according to the language mode (spoken or written), but also to dialectal factors. In Latin American spoken Spanish, the frequency of PF is higher in some countries (e.g. the Dominican Republic) than in others (e.g. Mexico). Overall, the frequency of PF is also greater in Latin America than in Madrid or in the Canary Islands. In the case of written Spanish, within the context of the literary corpus analysed, it has been reported that the writer's personal style and the sociocultural level of the characters have some influence on MF or PF selection.
The two analyses carried out by Sedano (1994 and in press) widely confirm the tendency to use MF in written and PF in spoken Spanish, which allows me to affirm that discursive modality (spoken or written) is the most important factor for the preference of one or the other future form. In addition, the results of the analysis according to the three linguistic factors - temporal distance, verb person and epistemic modality markers - lead to the conclusion that the speaker/writer's confidence or lack of confidence in the realization of the future event being enunciated may be of a psycholinguistic nature. This can be explained as follows:

Temporal distance. In the comparison carried out in the present study, two tendencies can be observed: the frequency of use of MF increases as the distance goes from immediate to remote future. Inversely, the frequency of use of PF increases as the temporal distance changes from remote to immediate future. Since it is easy to suppose that a future action is as more feasible when it is closer to the moment of enunciation, the relationship between temporal distance, speaker's attitude and future form selection can be represented as in (8):
that the future event will

nation for these
intention to carry out a future action, is confident that
do it. The association between 1st person verb, intention and speaker's confidence that the event will take place could be represented as in (9):
(9) 1st person > intention > confidence that the future event will take place

Epistemic modality marker. The markers analysed in both corpora are (i) the subordination of the future clause to a certainty verb in affirmative form; and (ii) the use of future expressions in uncertainty interrogatives. The results indicate that while the frequency of PF is higher when the future form is subordinated to a certainty verb, the opposite occurs with MF. The results of the analysis reinforce once more the relationship between certainty markers and the use of PF, and vice-versa, uncertainty markers and the use of MF.

The results of the comparative analysis of the three analysed factors point, then, to the relative association, on the one hand, between PF and the speaker/writer's confidence in the future event's occurrence, and, on the other hand, between MF and lack of confidence. These conclusions confirm and reaffirm those by other authors such as Fleischman (1982); Bauhr (1989); Silva-Corvalán and Terrell (1989); Troya (1998); and Blas (2000), among others.
The fact that some contexts favour one or the other future form does not guarantee that all speakers/writers will use one of the two forms according to the tendencies found in the analyses. What can be foreseen, however, is that, if spoken and written samples are collected from a certain number of speakers/writers, the frequency in percentage terms of the use of either MF or PF will most probably be particularly high in the contexts which we have signalled as favouring that particular future tense form.

Notes
1 This chapter is a modified version of another study presented by Sedano (2005). I would like to express my gratitude to Paola Bentivoglio and Rebecca Beke for their careful and valuable suggestions and comments.
2 A third way to refer to future actions is the present tense (Nos vamos mañana a Madrid 'We go to Madrid tomorrow', with the meaning of 'Tomorrow we will go to Madrid'). This form will not be considered in the present study.
Mercer [...], as well as the [...] between the so-called armchair linguistics and corpus linguistics (Fillmore 1992; Chafe 1992; Stubbs 1996, 2006; Tognini-Bonelli 2001; Lazaraton 2002; Swales 2002; Hunston and Thompson 2006; Parodi 2005a, 2007).
In this chapter, we are interested in classifying and describing the text types of three corpora of specialized written technical-professional discourse based on functional, communicative and textual criteria. The corpora have been collected from three subject domains during the last year of study of secondary technical-professional education in the city of Valparaíso, Chile: the maritime sector (port operations), the metal mechanics sector (industrial mechanics) and the commerce sector (accounting) (Parodi 2004; Parodi and Venegas 2004; Parodi 2005a). In the first section of this chapter, we will focus on a brief revision of some concepts and criteria which will serve as a framework for the study; in the second, some claims and methodological steps are given; and in the third, we outline the qualitative and quantitative results yielded by the technical-professional text classification. Concluding remarks provide some comments and implications.

1. Theoretical framework

1.1 Specialized discourse
Throughout the chapter, we will use the term 'specialized discourse'. This term illustrates the type of discourse on which the study is focused: texts on science and technology in their didactic dissemination function at secondary technical-professional institutions (Martín et al. 1987; Gläser 1993; [...] and Martin 1993; Rose Christie 1998; Unworth 2000; Goldman and Bisanz 2002). In [...] the term 'specialized' itself reveals a gradient or continuum, an essential axis with which to approach texts of various degrees of specialization, many of which possibly circulate on the border of the transition towards dissemination or popularization (Parodi 2005a).
Many authors have noted that [...] discourse is shaped [...] a group of texts centred on prototypical topics within a specific area of knowledge, such as science and technology. These texts show a series of distinctive features that reveal to the rhetorical and

[...] are not defined as the opposite of common [...] for special purposes are sublanguages belonging to a certain field of subject-oriented communications; they use the linguistic and other communicative means of a certain language and culture system in a specific way and with a specific frequency of occurrence depending on the content, the purpose and the whole communication situation of a text or discourse.

This fragment emphasizes the role of the occurrence of features of a varied nature in defining this type of discourse: the linguistic, pragmatic and extratextual aspects deserve serious consideration. Schröder's preference for the term special or specialized leads him to suggest that the term languages for specific purposes should not be used, but specialized communication, because it involves the aspects mentioned above and is therefore a more encompassing concept. We agree and would further argue that specialized discourses possess grammatical and textual features that, along with other non-linguistic factors, constitute a texture of useful criteria to describe text types. Few of them or individually, either linguistic or extralinguistic, fail to fully account for the text object.
With the aim of defining even further the concept of specialized discourse from a strictly linguistic perspective, it is useful to introduce the notion of syndrome, developed by Halliday. According to Halliday (1993), a certain register can be identified through the co-occurrence of a set of linguistic features. Syndromes are patterns of co-occurrence from features in one of various linguistic levels (of expression or of content) (Halliday 2006). These syndromes characterize a variety of language and help us recognize a given register as such (for example, a dialectal variety or a technical-scientific variety). The notion of dimension, proposed by Biber (1988, 1994, 1996, 2003, 2005) as part of his multi-featured studies, is also a very productive text descriptor, as it confirms a group of co-occurring linguistic characteristics which work together to describe a pattern of text variation.
Turning now to the topic of text classifications, Ciapuscio (1994, 2003b) [...] a comprehensive review of the [...] of text types. It is interesting to note that there are more exhaustive approaches that incorporate multiple analytical levels (Brinker 1988; Bassols and Torrent 1997). We are certain that these complex approaches to text types or classes in which [...] and communicative aspects coexist can [...] and [...] of the phenomena under
WORKING SPANISH CORPORA PROFESSIONAL COURSES
[...]erable number of words. These texts [...] were made to ensure the collection of the reading [...] - teachers, students and librarians - and asking [...] texts the students had to [...] and read. [...] all the texts were digital[...] and [...] onto a website belonging to the Escuela Lingüística de Valparaíso (www.linguistica.cl). They can therefore be searched and analysed online at www.elgrial.cl. They can also now be retrieved using the text classification emerging from this research.

2.2 In search of a technical-professional text typology: criteria and method
According to Biber (1996), association pattern techniques can be used to investigate two major kinds of research question: i) the variability of a linguistic feature, and ii) the variability among texts. In this study we are concerned with the latter. When the purpose of research is to describe a group of texts of a specialized written register, rather than individual linguistic features, textual co-occurrence patterns must be determined in order to identify the salient characteristics of the texts under study. Thus, although the individual criteria that could be selected to describe the texts are important, the systematic co-occurrence of these features will be one of the most relevant contributions to this classification. The configuration of the patterns is not accidental. They represent choices and options: they mean something. It is true that all features interact in a text (linguistic and non-linguistic ones), but the way these interactions occur varies. What may be a strong pattern of association in one kind of text often represents a weak organization pattern in another text type.
Hence, to develop a text typology that accounts for the essential characteristics of the Technical-Scientific Corpus (TSC), as mentioned previously, we decided to follow a multilevel approach (Bassols and Torrent 1997). That is, both the internal characteristics of the texts and the extralinguistic context in which they are produced and circulated were considered.
Such an approach finds its underpinnings in the eminently dialogical nature of language and attempts to account for the linguistic-social relationship between the participants of a given discourse community. Multilevel approaches are presently considered most appropriate for the elaboration of valid [...] from a theoretical perspective ([...] 2003b; Bassols and Torrent [...]; Biber 1996; Bhatia 2004; Bazerman 1994; Swales [...]). At the core is the selection of [...] parameters, on the [...] will [...] differences among the texts and text

We [...] concerned [...]istics of the [...] written corpus. In [...] the [...] that could be [...] to any text. If this were [...] would have had to be much wider and [...], oral or [...] as is the case in many [...].
The construction of the typological proposal started out from types [...] and institutionalized traditionally as observable [...] 1993, 2004; Hyland 2002, 2005; Swales 2004; Martin [...] Beaugrande and Dressler [...]. Subsequently, with the aim of [...] both common and diverging patterns that help distinguish the technical texts initially classified more accurately, a matrix with more specific features was constructed, following guidelines found in specialized literature. Text types cut across disciplines, and therefore some of the texts in part of the corpus share features across disciplinary boundaries. However, we expect to find more prototypical texts of some disciplinary domains and also to identify subtle variations in terms of specific characteristics, some of which may turn out to be mutually exclusive.
This proposed typology has been organized around three general analytical criteria: situational, functional and textual. These criteria have been established to address aspects concerning the participants' interaction, the communicative functions, the contexts in which the texts circulate and the textual structures that characterize the texts in the collected sample.
Today there is a growing, virtually infinite, number of possibilities in terms of specific features to characterize texts. For example, a focus on patterns of organization has shed light on larger text regularities in various text types. Attempts to identify these patterns have followed different approaches, such as in Hoey's (1983) problem-solving structures, Widdowson's (1978) rhetorical structures and van Dijk's (1980, 1983) schematic structures and then superstructures, along with the work done by, among others, Coulthard (1977), Horowitz and Samuels (1987) and Reid (1987). These regularities of organization in discourse were also seen in terms of 'moves', as proposed by Swales (1981, 1990, 2004) and also by Bhatia (1993, 1995, 2004).
As can be understood, there has been a continual quest for more detailed and grounded description and identification of recurring patterns in discourse. The emerging picture thus looks much more complex and dynamic than the one we had in mind for the starting text typology for our technical-professional corpus. Nevertheless, the selected criteria are grounded in an eight-component matrix with sub-specifying features. Figure 7.2 presents the general organization of the features, before each is described in detail.

Situational criteria (components of the communicative situation)
Beginning our analysis from a situational context, it is necessary to identify the features that best describe such a context (van Dijk 2001, 2002, 2006a [...] and Martin 1993, 1998; Eggins and Martin 2003; Hood and [...] Bazerman [...] Bhatia [...]). In the
The context in [...] in or for the [...]

A.2) The original audience: refers to the relationship between writer and reader as far as their expected knowledge of the topic is concerned. This can be from expert to lay person, from expert to semi-lay and/or from expert to expert.

• Lay person: a person who is not familiar with the topic.
• Semi-lay: a person who has only basic knowledge of the topic, but which is sufficient to enable further learning.
• Expert: a person who is knowledgeable about the topic.

As for this ongoing research, we consider the student reader to be 'lay' or 'semi-lay', since there are learning stages at which the student is not knowledgeable about the subject matter in the texts, and others at which the reader has already acquired deeper knowledge of the specialized subject matter.

A.3) The explicit author: refers to whether or not the author is clearly identified. This is important for the reader in order to grasp a sense of group community. Ideally, new members would, through explicit citation, know the authors and the authorities within a discipline.

Functional criteria (communicative functions)
These features also help classify text types through certain resources. This is what Biber et al. (1998) call the purpose of the communicative event. Such features match the communicative functions proposed by Jakobson (1961).

B.1) Referential function: refers to facts, [...] or ideas. Its purpose is informative. It focuses on the context, that is, on the topic to which it refers, and it manifests itself in the third person singular and in a large number of nouns, among other features.
B.2) Expressive function: focuses on the writer/speaker and implies the expression of feelings and emotions. It manifests itself in the first person, in interjections and in a large number of adjectives.
B.3) Appellative function: focuses on the reader/listener and [...] persuasion and exhortation with the purpose of eliciting a response
from [...]. It manifests itself in the second person and a [...] of verbs.
B.4) Phatic function: focuses on the [...] of communication and [...] purpose is [...] whether or not the channel functions correctly. It manifests itself [...] clichés [...] formulaic expressions.
B.5) Poetic function: focuses on the message itself and manifests itself in the [...] and in [...] of [...].
B.6) Metalinguistic function: is oriented towards the text itself, the code. Its main function is to [...] clarifications and [...] and is usually shown through exemplifications and definitions.

It is important to emphasize that none of these functions exist in a pure state; that is, they intertwine in the discourse, although there is frequently some predominance of one over the others.

C) Textual criteria
These features characterize the text types according to the organizational pattern of the text, which is influenced by the subject matter, the graphic elements and the structural characteristics.

C.1) Textual structure: refers to the pattern of organization of the information which prevails in the text. Nowadays, we know there are multiple approaches and possibilities for classifying a text on this principle (Sinclair et al. 1993; Ghadessy 1993; Hoey 1983, 2001; van Dijk 1980, 1983; Bhatia 2004; Swales 1990, 2004). Given the overall objective and due to the technical-professional characteristics of the corpus under analysis, we will follow a more classical perspective, in which five main organizational structures can be distinguished: argumentative, descriptive, expository, narrative and normative.

• Argumentative: the aim of which is to influence a given audience. It assumes an utterer who intends to make an audience (readership) accept a conclusion, offering a reason to accept that conclusion (Plantin 1998).
• Descriptive: its function is to characterize objects, people, situations or processes through language, explaining their parts, qualities or circumstances (Bassols and Torrent 1997). It is conditioned by the communicative context and the purpose to be achieved (Calsamiglia and Tusón 1999).
• Expository: its aim is to 'objectively' inform or expose subject matter to facilitate comprehension (Bassols and Torrent 1997). This implies a need to be reliable, neutral and objective when information is given (Calsamiglia and Tusón 1999).
• Narrative: the narrative function assumes the wish to provide real facts or those which are potentially real in a discourse universe. Its function is to discursively organize actions and events in an integrating sequential order that shows the unit of action and its [...] that if one of its action parts is [...] whole [...]ified. Its characteristic is the presence of [...] the transformation of [...].
• Normative: its aim is to provide accurate directives or instructions about the way certain procedures that tend to govern behaviour and/or [...] ideal state of things and processes should be carried out [...].

C.2) Subject matter: refers to whether or not the text addresses one or several topics, which can account for textual complexity and extension. A multithematic text can be more complex because of the number of topics dealt with and because of its length. A monothematic text would have a more simple structure, and could therefore be comprehended more easily.

• Monothematic: focuses on one topic.
• Multithematic: focuses on several topics.

C.3) Multimodality: refers to the presence or absence of elements of different modalities (linguistic, graphic). Having multimodal elements, a text can facilitate comprehension, because one single concept is presented in different ways (Kress and van Leeuwen 1996; Camero 2001; Baldry and Thibault 2006).

C.4) Writing required: refers to whether or not the text needs closing (writing).

3. Results
In this section, we give four kinds of empirical results: 1) the analysis of the texts themselves, the first step taken to study the TSC corpus, which was done with the purpose of attempting an initial classification (based on socially instituted notions, but also on definitions based on the criteria selected); 2) a comparative characterization of the text types, based on all of the features included in the taxonomy, and, to illustrate some text types, a few examples (although due to the extension of the corpus and the majority of the texts, just representative fragments are selected); 3) a comparison between two emerging opposite text types and the organization of the features in terms of more general criteria; and, finally, 4) results of quantitative analysis of the occurrence of the text types by disciplinary domain.

3.1 Classifying and defining text
Having assembled the set of features to be identified in each text, the next task was to read all the texts, analyse them and group them. An initial
dassification of 12 types of
each, listed ~''"ª'"L
CRITERIA
FUNCTIONAL TEXTUAL
tj
H
C/l
n
e
e;
?;!
C/l
t"1
(/l
them
the way all
and the Technical Article
tures these interactive features. ..,r:n''"'~0c of such features and, 'ºLuJt1ua'
As shown in [...], this kind of [...] make distinctions between the DG and the TA by means of situational, functional and textual features. In terms of situational features, the DG differs from the TA as far as the sphere of original production is concerned: the DG emerges from the educational context and is created by and for such a context. The TA's sphere of original production is, in contrast, the field of work. With respect to the original audiences to whom these textual types are addressed, they differ in that the DG appeals to semi-lay readers and the TA to an expert audience. As to the authorship, the DG may or may not bear the author's name, showing some formality as compared to the TA, in which the author's name is always explicit.

The TA and the DG put different emphases on their communicative purposes: in the TA, the referential function prevails, which makes its nature more 'objective', with an informational focus. In contrast, the DG stresses the appellative function over the referential; that is, there exists a greater appeal to the readership, an attempt to elicit a response from the readers by inviting them to participate actively through writing. The above relates to the text features, specifically to text completion, since the DG needs to be completed, while the TA does not require writing.

Another difference can be observed in the prevailing textual structure: the DG has an expository structure, which means that its major aim is to present topics to be learned and to facilitate comprehension. In the TA, where the structure is expository-argumentative, a persuasive focus prevails and, therefore, it provides information that supports the line of exposition-argumentation. An analysis of the diagram and the distribution of the features make it abundantly clear that, based on the eight distinctive characteristics, the two text types are diametrically opposed, to the extent that they do not share six of the eight features. Strictly speaking, the only features they have in common are the Multimodality and the Monothematic Subject. The only feature the TA shares with the DG is textual structure (being expository and argumentative), though this is not rare, for, as is well known, pure text organizations are difficult to find.

In terms of the situational criterion, the DG differs from the TA as far as the sphere of original production is concerned: the DG emerges from the educational context and is created by and for such a context. The TA's sphere of original production is instead the field of work. With respect to the original audiences to whom these textual types are addressed, they differ in that the DG appeals to semi-lay readers and the TA to an expert [...]

[...] discourse in greater [...]

3.3 A [...] audience

Figure 7.4 shows the text types that circulate in the technical-professional educational community. They have been subdivided into two groups: text types whose sphere of production is the school community itself and text types produced in other communities (more professional than academic). In turn, the audiences for whom they were created (lay or semi-lay or expert) are specified. Figure 7.4 clearly illustrates the profound effect of comparing these two features in all twelve text types. The findings show how these two features could help identify texts whose sphere of original production is not the academic community and the invoked audiences to which they are originally addressed are not the students of the secondary technical-professional schools, but others more related to professional environments.

It is interesting to verify not only that, in the technical-professional discourse community under study, text types produced by and for that community circulate, but also that there are text types not generated in that community that are equally read by its members. These text types are originally conceived for and addressed to other specific audiences. Therefore, their linguistic, textual and graphic resources are part of the meanings conveyed originally to those groups. The aim of all this is to determine whether they coincide with the lay or semi-lay audience to whom the texts used in the technical-professional school should be addressed.

Finally, from the data presented, it is clear that most text types (seven out of 12) circulate in the school community of origin: Didactic Guideline, Legal Gloss, Technical Description, Table, Diagram, Glossary and Manual. However, a significant number of text types that students read in their technical-professional education are neither produced by that school community nor are originally addressed to it. This is the case of the Form, the Directive, the Regulation, the Law and the Technical Article. Of these text types, only one, the Form, coincides with the school audience, that is, it is addressed from experts to semi-lay people. The four remaining text types are addressed to expert audiences, which results in a considerable academic burden to be processed by secondary students. It is true these texts must open the way to the specialized knowledge that is typical of the professional and technical community these students are going to be part of, but it is also
[...] extreme, [...] Technical [...] very concrete [...] a strong reference focus and little or no attention to [...] the reader grasp the contents. The [...] definitions but no rhetorical didactic clues at all.

[...] as a text type, is identified by its normative stance and reveals itself as somewhat distant and different from the other four [...] As shown in the example, the [...] structure [...] characteristic of this text type.

In what follows the five example text types are presented and the translation into English has been included in square brackets.

3.4.1 Manual → CTC-COM-mal

Poner Atención:

I. LA CUENTA
'Es una agrupación sistemática de los cargos y abonos relacionados a una persona o situación de la misma naturaleza, que se registran bajo un encabezamiento o título que los identifica.'

II. CONCEPTOS
a) Las anotaciones registradas al Debe de la cuenta se llaman: Cargo.
b) Las anotaciones registradas al Haber de la cuenta se llaman: Abono.
c) La suma de los Cargos se llama: Débito.
d) La suma de los Abonos se llama: Crédito.
e) La diferencia entre Débitos y Créditos se llama: Saldo.

RECORDAR:
GANANCIAS > PÉRDIDAS = UTILIDAD DEL EJERCICIO
GANANCIAS < PÉRDIDAS = PÉRDIDAS DEL EJERCICIO

[Important.

I. THE ACCOUNT
'It is a systematic grouping of the charges and [...] related to a person or situation of the same nature, registered under a heading or title that identifies them.'

II. CONCEPTS
a) Entries registered as credit in the account are called: Payments.
b) Entries registered as income in the account are called: Credit.
c) The total sum of all the Charges is called: Debit.
d) The total sum [...] is called: Credit.
e) The difference between Debits and Credits is called: Balance.

[...] THE [...]
[...] < LOSSES = LOSSES [...]

INSTRUCCIONES DE LLENADO DE CONTINUACIÓN DEL INFORME DE IMPORTACIÓN
[...] proceda el uso de este [...] deberá ser emitido [...]mente con el informe de [...] y este último no tendrá validez si a él no se [...] 'Continuación del Informe de Importación' que corresponda(n).

1. Presentación (Número y fecha).
Deberá indicarse el mismo número y fecha de presentación del Informe de Importación.
2. Entidad que presenta el Informe de Importación y Código.
Deberá indicarse las mismas del Informe de Importación.
3. Número, fecha y firma autorizada entidad emisora.
Número, con su correspondiente fecha y firma autorizada con el cual el Servicio Nacional de Aduana cursa el Informe de Importación.

[INSTRUCTIONS FOR FILLING IN THE CONTINUATION OF THE IMPORT REPORT
Whenever the use of this form is required, it should be issued with the report from Import. The latter will have no validity unless attached to the corresponding 'Continuation of Import Report'.

1. Presentation (Number and Date)
The same number and date of presentation of the Import Report should be stated.
2. Company presenting the Import Report and the Code Import.
The Same should be stated as for the Import Report.
3. Number, date, and authorized signature of issuing entity.
Number, corresponding date and authorized signature with which the National Customs Service issues the Import Report.]

3.4.3 Glossary → CTC-MAR-gs41

• Mercancía
Es todo bien corporal mueble sin distinción alguna. Todo producto, manufactura, y otros bienes corporales muebles, sin excepción alguna.
• Mercancía extranjera
Mercancía proveniente del exterior y cuya importación no se ha consumado legalmente aunque sea de producción o manufacturación nacional; o que habiéndose importado bajo condición ésta deje de cumplirse.
[...] on a condition, such a condition is no [...]
• National merchandise
Merchandise produced or manufactured in the country, with national or nationalized raw material]

3.4.4 Regulation → CTC-MAR-rg45

III TRASLADO DE MERCANCÍAS DESDE EL PUERTO O AEROPUERTO AL RECINTO DE DEPÓSITO
Las normas contenidas en esta apartado solamente se aplicarán cuando el recinto de depósitos esté ubicado fuera de la zona primaria del puerto o aeropuerto de arribo de la nave.
1. La compañía de transporte debe entregar las mercancías al almacenista dentro de las 2 horas siguiente a la hora de salida de la Zona Primaria del puerto o aeropuerto de arribo de vehículo
2. La responsabilidad ante el Servicio de Aduanas de la entrega de las mercancías al almacenista es de la compañía de transportes que realizó el flete internacional, independiente de la empresa que efectuó el traslado desde la zona primaria al almacén

[III. TRANSFER OF MERCHANDISE FROM THE PORT OR AIRPORT TO THE PLACE OF DEPOSIT
The norms contained in the section will only apply when storage is located outside the primary zone of the port or airport of arrival of the vessel.
1. The shipping company must deliver the merchandise to the head of the warehouse within 2 hours following the departure time from the Primary Zone of the port or airport of the vehicle's arrival.
2. Responsibility before Customs Services for delivery of the merchandise to the head of the warehouse lies with the shipping company that transferred the goods from the primary zone to the warehouse.]

3.4.5 Technical Description → CTC-COM-dt 16

I. EVOLUCIÓN HISTÓRICA DEL CONTENEDOR
El Contenedor como nuevo elemento de transporte ha revolucionado el transporte marítimo, las naves comenzaron a adecuarse para una operación expedita hasta llegar a las naves especialmente construidas para contenedores [...]

II. DEFINICIONES DE CONTENEDORES
1. Contenedores para [...] son totalmente cerrados, teniendo todas su [...] como así también el [...] el [...] y además una de sus [...] extremas está provista de puerta.
2. Contenedores para uso específico son aquellos destinados al [...] de mercancías generales construidos con características especiales, de tal forma de facilitar el embarque o descarga ya sea por la puerta extrema o teniendo funciones específicas, tales como la ventilación de la carga.

[I. HISTORICAL EVOLUTION OF CONTAINERS
The Container, as a new transportation device, has revolutionized maritime transportation; vessels began to be adapted for more efficient operation, to the extent that some are built especially for container handling and ports have been forced to purchase tools that help the handling process.

II. DEFINITIONS OF CONTAINERS
1. Containers for general use are completely enclosed units with rigid walls, floors and ceilings, and with a door provided in one of the end walls.
2. Containers for specific use are those units destined for the transportation of general merchandise, and which have been built with special characteristics to facilitate loading or unloading, whether through the end door or having specific functions, such as ventilation of the load.]

3.5 Occurrence of the text type according to disciplinary domain

Graph 1 shows the occurrence of each of the 12 text types in a comparison between the three disciplinary areas.

The fact that the commerce sector has eight text types makes it the most heterogeneous discipline in terms of knowledge communication: Manual, Didactic Guideline, Directive, Technical Description, Diagram, Glossary, Form and Law. Of these types, the Form and the Law are exclusive to the area; that is, they are only found in this disciplinary area. The technical kinds that do not appear in this sector are Regulation, Legal Gloss, Technical Article and the Table. The maritime sector, in comparison, presents a greater variety of text types (ten in total): Manual, Didactic Guideline, Legal Gloss, Directive, Technical Description, Diagram, Glossary, Regulation, Technical Article and Table. Exclusive to this sector are Regulation, Legal Gloss, Technical Article, and Table. In addition, the disciplinary domain is characterized by having the greatest number of Didactic Guidelines collected (12 in total). The text types which do not appear in this sector are Form and Law. The industrial sector, in turn, has [...]
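
The sector comparison in section 3.5 amounts to set arithmetic over the text types attested in each area. The following is only a minimal sketch in Python, assuming the two inventories exactly as they are listed in the paragraph above; the names COMMERCE, MARITIME and compare_sectors are hypothetical, and the industrial sector is omitted because its inventory is not given in this passage. Note that the sketch only contrasts two sectors, whereas "exclusive" in the text is stated with respect to all three.

```python
# Text types reported for two of the three sectors in section 3.5.
# The industrial sector is left out: its list is not given in this passage.
COMMERCE = {
    "Manual", "Didactic Guideline", "Directive", "Technical Description",
    "Diagram", "Glossary", "Form", "Law",
}
MARITIME = {
    "Manual", "Didactic Guideline", "Legal Gloss", "Directive",
    "Technical Description", "Diagram", "Glossary", "Regulation",
    "Technical Article", "Table",
}


def compare_sectors(first, second):
    """Return the text types shared by two sectors and those found in only one."""
    return {
        "shared": first & second,
        "only_first": first - second,
        "only_second": second - first,
    }


comparison = compare_sectors(COMMERCE, MARITIME)
for label, types in comparison.items():
    print(label, sorted(types))
```

Under these assumptions the two differences reproduce the lists given in the text: Form and Law on the commerce side, and Regulation, Legal Gloss, Technical Article and Table on the maritime side.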
[Figure: Specialized Discourse / Disseminating/Didactic Discourse]
in
these texts, in order to include a number of written materials suitable for students in [...] This balance of reading materials would facilitate students' access to conceptualization of technical-scientific contents to which professionals in the technical area are exposed. A substantial advance in learning quality would be reinforced by using these texts.

Finally, as we already know, the school teaching-learning process can often become more accessible to readers if content and text types are classified and properly organized.

Universitat [...]
Spain

1. Description of Corpus 92: academic [...] by university applicants
The corpus-based study of discourse created by learners forms a highly
productive research line for L2 teaching and learning, for example
research on the International Corpus of Learner English (ICLE) in Europe
(Granger et al. 2002; Granger 2004), on the Michigan Corpus of Academic
Spoken English (MICASE) in the United States (Simpson et al. 2002), and on
the TELEC Secondary Learner Corpus (TSLC) in China (Allan 2002). In the
Spanish language, however, there is still a shortage of studies on learner
corpora which would make it possible both to explain the level of com-
municative competence - both spoken and written - possessed by students
at each stage of their educational development and, on the basis of this
analysis, suggest the appropriate teaching method.
Along these lines, Corpus 92¹ was compiled so that reliable information
would be available on the degree of written competence possessed by
Spanish students upon entering university. This learner corpus has been
analysed in order to characterize student usage with a view to designing
rational, efficient teaching materials that address their difficulties. The
corpus comprises 750 copies of entrance exams for Spanish universities
from June 1992. It includes academic texts from the Spanish secondary
teaching syllabus of both the sciences and the humanities (according to
the original names used in Spain to refer to the various itineraries in
secondary school) and examinations of the modules common to any
syllabus itinerary.² The same number of written texts (125 exam copies) has
been collected from each of the six Spanish universities that took part in
the project: two from northern Spain, two from central and western Spain
and two from southern Spain. Consequently, this corpus is homogeneous
in nature as it is based on the same exam tests from the same sitting at
six somewhat different Spanish universities. Moreover, the exams were
all taken on the same date. Table 8.1 shows the criteria used when building
[...] as well as the abbreviations used to identify the texts, the [...]
l WORKING SPANISH ORPORA WRITING
92
corpora, un.its
The chief [...] in [...] and [...] a language is to [...] the communicative competence possessed by those learning it and to promote reflection on that competence. In order to reach this goal, teachers need to possess considerable knowledge of the difficulties their students encounter during each stage of learning. Indeed, as Granger et al. (2002: 41) state, 'Analysing authentic learner errors in L2 (and also in L1) corpora is an extremely efficient - though all too frequently and in our view unjustifiably disparaged - method to acquire that knowledge'. In this respect, linguistic research on Corpus 92 provides two answers regarding the degree of competence attained by Spanish students upon admission to university:

1) What they do and do not know how to do when they write: which punctuation marks they have greater command of, and which punctuation marks they have difficulties with; what vocabulary they are able to use; which aspects of parataxis they master fairly efficiently and which they do not; which devices they use to handle the unfolding of the information they present; what expressions they employ in order to raise hypotheses, to quote others, to clarify and exemplify concepts, to modulate their discourse; and so on.
2) What they should know how to do: what command of punctuation should be required; which should be the most suitable vocabulary to answer the questions raised in an examination; to what issues of parataxis and hypotaxis should one lend particular attention in order to allow students to better the grammar of their discourse; what would be the most effective method of presenting and progressing in the unfolding of an academic subject; what types of resources would contribute to suitably raising hypotheses, quoting a text mentioned, offering clarifications and [...] of the concepts explained, of giving their own standpoint with respect to the knowledge [...] and so forth.

This paper presents a comparison between non-expert texts, as represented by the exam answers of Corpus 92, and expert texts [...] in a technical-scientific corpus. Carrying out this comparison entails choosing a specialized corpus that may be used as a 'standard' with which the learner corpus can be compared. The corpus taken into consideration when making the comparison is the Corpus textual especializado multilingüe (multilingual specialized text corpus) of the Institut Universitari de Lingüística Aplicada (IULA) at Pompeu Fabra University.⁶ This technical corpus (hereinafter, IULA-CT) covers five different specialized domains (economics, law, environmental studies, medicine and IT). About one million words have been compiled for each domain, stemming from samples taken from varying documents that represent discourse activity in each field.⁷ Unlike Corpus 92, the samples compiled are not full texts.

For our specific purposes, in this study we have taken into consideration only two disciplines of the social sciences (economics and environmental studies) for the samples collected in Spanish from the IULA-CT corpus. This methodological decision was based on similarity in the type of discourse. As pointed out by Battaner et al. (2001), the humanities discourse from Corpus 92 is characterized as being explanatory text in which description and reasoning prevail. Descriptive and reasoning sequences are [...] in economics and environmental studies; [...] our decision to choose these two domains. Accordingly, the information from the corpora [...] in this [...] is [...] in Table 8.2.

[...] specific objective is to compare the use made of [...] conjunctions in the two corpora. These [...] have different discourse status according to the context in which [...] the very same lin-[...]
to whether the text in question is ¿fa \Hitten
These are issues that contribute to an entire teaching and learning pro- or a conversational register (i.e.
gramme of the mother tongue for academic purposes. language uuLuvu~. the use of which is
studies, such as the one described in Corpus 92, highlight determined by the written or
the value ofleamer corpus data in improving language teaching pedagogi- Schleppegrell 2001; Clachar
cal materials grammars, classroom an reflection for those who do not possess
effective command of the mechanisms that are characteristíc of academic
Content of [...] (98 documents)
- economics: 1,091,314 (48 documents)
- environmental studies: 1,062,113 (50 documents)

[...] texts entails observing how the paratactic conjunctions are employed in this type of discourse genre and distinguishing these uses from those made by these students in a discourse that bears many features of orality (Tusón 1991; Salazar 1999). It is therefore interesting to account for the function that characterizes these units in academic discourse insofar as the following aspects are concerned: the levels of liaison that are involved, how paratactic conjunctions are combined in expert discourse, and what pragmatic values characterize their formal academic usage. Thus, the aim is to show students the features that these units possess in 'exemplary' texts; in other words, texts that are examples of how expert writers make use of language in their respective academic fields. This use includes a command of paratactic conjunctions as textual connectives.

In a recent study on the use of paratactic conjunctions in Corpus 92 (López and Atienza 2006), we observed that students made very frequent use of paratactic conjunctions. However, they did not use them as connectives, which would be the most suitable use in written academic texts, but rather as pragmatic markers (Schiffrin 1987) with various functions: to change the subject, to introduce or to resume the discourse topic, for example - uses that are more characteristic of their function as markers in a conversational text. These uses show that students at [...] level are still unaware of variation in the use [...]

[...] y/e, ni; [...] mas; sino; aunque.

The consultation tool [...] to [...] out the contexts in which these units are used is the Bwananet computer tool of [...] available at the website of the Institut Universitari de Lingüística Aplicada of Pompeu Fabra University.⁸ The [...] here is a qualitative [...] based on the standard concordance put forward by Bwananet. Indeed, [...] data⁹ is [...] to differences in use [...] are characteristic of expert discourse [...] of their use as a way to structure information in the overall text. [...] the patterns of use in which [...] tend to appear modulate the discourse in a specific manner which needs to be described if students are to learn how to use them effectively in their texts.

[...] the interest that our [...] is [...]

Rather than contrasting the types of units used in each of the corpora, we aim to look at the use made of each of the units in each context. In other words, we will look at how paratactic conjunctions are used in each case, which specific uses of these [...] students are unacquainted with in academic discourse, and which student uses we consider to be unsuitable.

5. The use of [...] discourse

The enormous wealth of information [...] corpora [...] on account of their recurrence in expert discourse, are useful for teaching and learning in specific communication contexts. Moreover, a learner corpus such as Corpus 92 also shows common errors made by the students when managing the various discourse and grammar units. In this study we are looking at paratactic [...] 1999; Flamenco García 1999).

As we have pointed out, in an earlier [...] (López and Atienza 2006) we highlighted the syntactic, semantic and pragmatic difficulties that handling [...] conjunctions poses for students [...] 1999).

[...] (1) below illustrates the sense of [...] y in order to continue to add [...] fails to consider the semantic [...] and new [...] to the element of information has with the [...]

(1)
Los puntos de roce se hallaban en el Estado, Marx consideraba que había que para realizar una revolución mientras Bakunin defendía una postura que negaba la existencia de Estado para llegar a la de realizar la revolución, creía estar la acción revolucionaria directa; y por último la [...] se [...]

In non-expert texts, such as those in Corpus 92, there are many mistakes [...] of information. This is shown [...] the [...] that, [...] data units that do [...] to the same level are [...]
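
The kind of concordance consultation described above, in which every occurrence of a conjunction is retrieved together with its immediate context, can be approximated with a few lines of code. What follows is only a minimal keyword-in-context sketch in Python under stated assumptions; it is not the Bwananet tool and makes no claim about how that tool works. The function name kwic, the width parameter and the conjunction list are hypothetical (the list combines the units legible in this passage with o and pero, which appear elsewhere in the chapter), and the sample sentence is example (22) from the expert corpus.

```python
import re

# Paratactic conjunctions legible in this chapter; the exact inventory used
# by the authors is partly lost, so this list is an assumption.
CONJUNCTIONS = ["y", "e", "ni", "o", "pero", "mas", "sino", "aunque"]


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    # \w+ is Unicode-aware for str patterns, so accented words stay whole.
    tokens = re.findall(r"\w+", text.lower())
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Example (22), expert corpus (e00001).
sample = (
    "De esto se deduce que la adopción de políticas macroeconómicas "
    "acertadas no sólo es ventajosa para el crecimiento y la inflación, "
    "sino que además facilita la orientación hacia el exterior."
)

for conjunction in CONJUNCTIONS:
    for left, key, right in kwic(sample, conjunction, width=2):
        print(f"{left:>20} | {key} | {right}")
```

Because matching is done on whole lower-cased tokens, the conjunction mas is kept apart from the adverb más, which matters for a qualitative reading of the contexts.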
(8)
Tanto si el vino, las [...] o la leña de Torredembarra o de Tortosa, en Cataluña, o bien Vinaroz y Burriana, en el 'reino' de Valencia, se trata evidentemente del mismo movimiento de atracción, determinado por las necesidades de la [...] barcelonesa en los mercados del litoral, y en especial de sus productos agrícolas. (e00085)

The [...] to the command of more complex discourse lies in the variation and the interplay of combinations between varying conjunctions and between syntactic [...] as seen in the grammar of specialized texts.

As mentioned earlier, the use of paratactic conjunctions poses another challenge for students in that they are also used as textual organizers. The lack of planning in the student texts in Corpus 92 can be illustrated by the fact that their starting point for connections in discourse is formed by the sentence unit rather than the text unit. They do not approach the text as a whole in order to establish relationships of parataxis and hypotaxis, as is shown in the following [...] from [...]

(11)
Pero hay mucho más: los residuos urbanos e industriales, el [...] la escasez de agua, la contaminación de los ríos, el transporte, el tráfico, o la contaminación acústica, son otros de los muchos [...] abordados por el periodismo ambiental propiamente dicho que acapara en estos momentos los principales [...] y titulares.

Since [...] conjunctions carry out the role of discourse organizers in a written text, they have a [...] range that is more restricted than that found in less formal communication contexts. These uses also reflect the first kind of difficulty we pointed out in a previous section (5.1), the syntactic difficulty involved in handling various levels of coordination in which paratactic conjunctions intervene. This can be illustrated by the following example, where the connector y is highlighted in lower case when it is used as a sentence conjunction, and in upper case when it is employed as a textual [...]

(12)
Si se mantienen las relaciones comerciales y se impide que los déficit fiscales de
los ricos desalienten las inversiones en otras partes y si los de ingreso
(9)
alto mantienen una tasa de crecimiento elevada y estable, la mundial
Estas características también son a la mezquita de Córdoba. Esta fue
no disminuirá y se evitará que los ricos cedan ante las
construida en varias o 'fases', es decir, fue Abderramán I, quien en el
a causa la de tasas elevadas de
VIII hace la la cual consta de un donde se encuentra las fuentes
de abluciones que servían para lavarse los
encontrar el Alminar la as far as discourse is concerned, the text as a whole
11H:Ltj1m11a se encuentra que es orientada hacia la meca, y por allows us to consider both coherence in the information in the
último el Mihrab al cual no tenían acceso los fieles. Y el resto de la mezquita estaba
text and the linguistic ch osen. Examples , ( 11) and also
formado por columnas, llamado Haram. Fue en el S. IX cuando Abderramán U
la derrumbando la También por el S. X, intervino show certain schemes that characterize the grammar of paratac-
II que la rnlvió a pero la última intervención fue de tic features that can be
Almanzor, que la lateralmente. Todo este de salas forma la identified when vvith electronic corpora. The concordances
de Córdoba, uno de Jos offered the computing tool used enable us to determine the combin-
que destacar ations in which paratactic are used most in other
Medina A.zahara, S.X, en donde se encuentran los y arcos we can not the levels at which are used, but also
herradura todo este arte. (MU /HA/03) the types ofunits that appear before and after them and their
5.3 Comrnon schemes
taken
are combincd
--------------·-----
habitualmente
normalmente
actualmente
possess the same semantic
the called
siempre
Insofar as of the information are concerned, the
Textual
inicialmente follovving examples show combinations in this respect:
últimamente
(21)
Pedro el Grande convirtió a Rusia en una gran potencia occidental y también
extendió la influencia rusa en Oriente, intentando establecer relaciones comer-
Others: ciales directas con la India, hacer la paz con Persia y asegurar una salida al Mar
solo sea Caspio, aunque esto último se perdió después de la muerte de Pedro el Grande.
(e00118)
We can see that in expert discourse the following patterns of use prevail:

a) patterns showing the strengthening or reinforcement of the semantic content (addition, contrast, alternatives) of the paratactic conjunctions;
b) patterns of organization of the information (textual structuring patterns); and
c) subjective assessment expressions.

Let us take a closer look at each of these grammatical schemes. In written academic texts the uses made of [...] conjunctions lend greater importance to semantics and, as a result, they determine the semantic relationships between the clauses that are linked. On the other hand, in spoken [...] paratactic [...] take on more varied pragmatic values (Schleppegrell 2001; Clachar [...]), some of which may be unsuitable in [...] written register. The greater [...] lent to the semantic aspect [...] tactic [...] in formal written discourse explains the [...] or reinforcement of each individual [...] as can be seen in these [...]

(18)
de todos modos, es interesante tener en cuenta que la velocidad del viento aumenta [...] la altura y que [...] veces es suficiente elevar 10 metros la [...] de molino para que la [...] que nos [...] sea doble.

(19)
Nuestro razonamiento no afecta [...] ificar considerablemente [...] que asumamos lo [...] hemos visto mientras se mantenga la [...]

(22)
De esto se deduce que la adopción de políticas macroeconómicas acertadas no sólo es ventajosa para el crecimiento y la inflación, sino que además facilita la orientación hacia el exterior. (e00001)

(23)
Se pretendía que la asunción de obligaciones dependiera del nivel de desarrollo de los Estados en relación a los distintos sectores, pero resultaba muy difícil plasmar y llegar a la concreción de tales obligaciones por la falta de estadísticas fiables, o incluso en algunos casos por su inexistencia. (e00029)

[...] another kind of syntactic scheme is what has been named subjective assessment expressions (Ciapuscio 2003b). In these patterns, the conjunctions personalize and assess the information when combined with markers of subjectivity or modality:

(24)
Las noticias ambientales, y el resto de la actualidad, [...] básicamente de personas, empresas, instituciones y múltiples [...] a estos temas, pero no exclusivamente, porque la interdisciplinariedad intrínseca a lo ecológico o ambiental afecta abrumadoramente al ejercicio periodístico, aunque es obvio que las diferentes áreas informativas tienen no pocos territorios comunes: el [...] las decisiones del [...] los tribunales de [...] los [...] etc (a00022)

(25)
[...] mundo está cambiando: el [...] se agota irremisiblemente [...] lo mismo sucede con el carbón o el gas [...]
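
The three kinds of pattern distinguished above (semantic reinforcement, textual structuring and subjective assessment) all consist of a paratactic conjunction followed closely by a marker. As a rough illustration only, the Python sketch below tags such combinations in a sentence. The marker lists are assumptions drawn from the chapter's own examples (también, además, incluso; por último; es obvio) and are not the authors' inventory; the names MARKERS, find_patterns and max_gap are hypothetical.

```python
import re

CONJUNCTIONS = {"y", "e", "ni", "o", "pero", "mas", "sino", "aunque"}

# Illustrative markers taken from the chapter's examples; these lists are an
# assumption made for this sketch, not the authors' classification.
MARKERS = {
    "semantic reinforcement": ["también", "además", "incluso"],
    "textual structuring": ["por último"],
    "subjective assessment": ["es obvio"],
}


def find_patterns(text, max_gap=1):
    """Find conjunction + marker combinations.

    A marker counts when it starts at most `max_gap` tokens after the
    conjunction, so that 'sino que además' is matched as well as 'y también'.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token not in CONJUNCTIONS:
            continue
        for family, markers in MARKERS.items():
            for marker in markers:
                words = marker.split()
                for gap in range(max_gap + 1):
                    start = i + 1 + gap
                    if tokens[start:start + len(words)] == words:
                        hits.append((family, token, marker))
                        break
    return hits


# Example (22) and a fragment of example (24) from the expert corpus.
example_22 = (
    "De esto se deduce que la adopción de políticas macroeconómicas "
    "acertadas no sólo es ventajosa para el crecimiento y la inflación, "
    "sino que además facilita la orientación hacia el exterior."
)
example_24 = (
    "aunque es obvio que las diferentes áreas informativas tienen "
    "no pocos territorios comunes"
)

print(find_patterns(example_22))
print(find_patterns(example_24))
```

A count of such hits per family in each corpus would give only a first approximation of the contrast between expert and learner texts; the qualitative reading of each concordance line remains necessary.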
(26)
y
como un
Hmnanos
y, además,
As we can see in Table 8.3, the non-expert discourse of students features
semantic reinforcement of addition and contrast and bears a prevalent use
of the adverb también combined with the copulative conjunction y, and with
the adversative conjunctions pero, sino and aunque. In addition, the use of
combinations that highlight organization of the information in the text is
more limited in the written texts of learners than it is in experts' written
texts. The discourse organizer por último is commonly employed in
combination with copulative conjunctions. In the case of other types of
conjunctions, their use as a means of structuring the information is rather
unclear. Lastly, the least frequent patterns in students' written texts are
subjective assessment expressions, due to the fact that for these types of
patterns it is necessary to personalize the discourse. In other words, it
becomes necessary to assess the information provided based on an
acquaintance with the degree of certainty of that information, something
that students have yet to acquire. This situation explains the scant use of
subjectivity or modality expressions in which paratactic conjunctions
intervene, as opposed to their common and varied recurrence in expert
academic discourse (for [...] see the patterns in which the conjunction pero
is employed in the [...]
In this last section, which presents an [...] of the use of paratactic
conjunctions in academic discourse, we have illustrated their combinations
in expert discourse as a way [...] the needs of the learners of this type
of text. It concerns the most [...] grammar of conjunctions in academic
discourse, a grammar of reference for anyone who wishes to have an effective
command of this [...] of communication. Along these lines, the aim is to
become acquainted with and to [...] command of their patterns of use, rather
than concerned [...] the [...] of use of the grammatical units in certain
communication contexts, precisely because they define the discourse function
and [...] of the linguistic elements we are [...]

Conclusions

In this article we have offered a [...] proves useful for [...] use of [...]
[...] in the written [...] can [...] teachers in [...] texts of non-experts.
[...] should be devoted to a given lexical item or gram[...]
Thus, this information - taken from a 'non-exemplary' sample of academic
discourse - can help in planning the content of a pedagogical grammar.
Likewise, if these grammatical issues are placed in the context of the
communicative aims and rhetorical strategies of the [...] they take on a
vital role in a more overall analysis of the conventions of discourse genre.
The needs of students become more obvious when their production is
compared with that of experts. Taking into consideration samples of texts
taken from manuals, informative articles, encyclopedias, etc. can provide
guidance as to what course needs to be followed in order to achieve the
communicative competence that one needs in an academic context. The
information taken from the IULA-CT corpus that we used to make the
contrastive analysis in these pages is of great use in this respect.
Nevertheless, it should be taken into account that the learner's Corpus 92
constitutes another specialized domain and, therefore, the need for
communicative competence (or the uses of the conjunctions in this highly
specific domain) may not be exactly the same.
With this pedagogical aim in mind, we looked at the use of paratactic
conjunctions in a subsection of Corpus 92, humanities, on the one hand, and
in a sample of Spanish from economics and environment texts from the
IULA-CT specialized corpus, on the other. The study shows the inappropriate
use made by pre-university students of these coordinating conjunctions
with regard to certain aspects: when it comes to handling syntax, that is,
possessing a command of the paratactic relationship in microstructure
(sentence) units and macrostructure (text) units; when it comes to using them
as elements that organize the discourse in formal register that is
characteristic of academic discourse; when it comes to common patterns of
use in which they are combined with other textual units that characterize
this type of discourse. The data offered by computerized corpora illustrates
the types of relationship that are established by paratactic conjunctions,
relationships that involve a syntactic difficulty that poses students a
degree of difficulty similar to that encountered when they use hypotactic
conjunctions.
[...] which the semantic value these units assume in
SPANISH
academic
the lists has [...] the 'concordances' ([...] 1991) found in each
conjunction in their real contexts of use and by distinguishing these
occurrences in various contexts: at the start of a paragraph or a sentence,
for [...]
An analysis of this type enables patterns, 'models of use' and regular
expressions possessing a particular function in the discourse to be
identified. Describing common syntactic schemes, otherwise known as
prefabricated patterns (Granger 1999) or chunks (Bybee 2002), proves to be
beneficial for language teaching and learning in differing communication
contexts.
Granger (2004) has underlined how valuable the line of research known
as Computer Learner Corpus has been for progress in studies on the
acquisition of second and foreign languages. In this article we advocate
expanding the sphere of action for this type of research to encompass the
development of communicative competence, not only in L2 or foreign
languages, but also in L1 or mother tongues.

Notes

1 [...] y estudio del Corpus 92 [...] escrita por aspi[...], directed by
  Dr María Paz Battaner, was ini[...] Pompeu Fabra Barcelona (1993-1994) and
  [...] (1994-1997) the Dirección General de Investigación Científica y
  Técnica (DGICYT PB93-0392) of Spain.
2 [...] 92 comprises written tests in academic [...] entrance exams.
3 The shortest exam answers contain around 300 words and the longest
  ones can be in excess of 900.
4 [...] 92 has been included in the [...] de [...] of the [...]
5 [...] Encarna Atienza and [...] Torner Castells.
  [...] contains written texts in five [...] domains. The [...] directed [...]

Paratactic              Learner [...]        [...] discourse:
conjunctions            92: humanities       the environment (a), economics (e)

[...]                   4121 (2.892%)        a: 33,880 (3.189%)   e: 27,249 (2.496%)
ni                      91 (0.063%)          a: 337 (0.031%)      e: 43 (0.003%)
tanto ... como          62 (0.043%)          a: 540 (0.050%)      e: 396 (0.036%)
o / o bien / u          430 (0.104%)         a: 7829 (0.737%)     e: 5830 (0.534%)
(o) bien ... (o)        7 (0.004%)           a: 85 (0.008%)       e: 87 (0.007%)
bien / bien ... o
pero                    535 (0.375%)         a: 1287 (0.121%)     e: 1497 (0.137%)
sino                    59 (0.041%)          a: 527 (0.049%)      e: 959 (0.087%)
aunque                  138 (0.096%)         a: 582 (0.054%)      e: 832 (0.076%)

The frequency lists for each corpus are provided [...] the section Unidades
de contexto of the [...] tool used. This table shows the paratactic
con[...] whose [...] no [...] for the consultation tool [...] the units that
pose [...] of identification in the word count [...] the conjunction mas,
which in the consultation made is mixed with the adverb [...]. It should be
[...] out that Bwananet makes no distinction between the conjunction [...]
and the adverb ni. In any event, adverbial uses are the least [...] and we
consider [...] for statistical [...] the [...] of error [...] generate is
minimal. We have [...] in the [...] of the forms tanto and
10 [...]
12 [...]nomics.
13 In Table 8.3 we have included only those paratactic conjunctions that
   lend themselves to combination with other grammatical units, i.e. the
   connectives that coordinate two or more members paratactically in
   conjunction with other connectives or discourse markers. Consequently, the
   distributive conjunctions tanto ... como, bien ... bien have not been
   included in the table, as they do not allow these combinations.

René [...]
Pontificia Universidad Católica de [...]
Chile

Introduction
This research is part of the studies conducted on specialized discourse;
more specifically it is related to the written production of scientific articles.
The term 'specialized discourse' is nowadays widely accepted by language
scientists. Generally, the concept of specialized discourse is conceived in a
global and comprehensive way (Parodi 2005a), acknowledging a continuum
within the concept that includes texts that range from high to low special-
ization and belong to a variety of text types.
The text type traditionally studied in scientific specialized discourse has
been the scientific research article, which is considered as a prototype in this
kind of discourse practice (Sager et al. 1980; Bazerman 1988; Swales 1990,
2004; Salager-Meyer 1991, 1992; [...] and Martin 1993; Hyland 1998,
1999, 2000; Martin and Rose 2003). This text type has recently received
greater attention in [...] and has been studied from various perspectives
(Calsamiglia 1998; Bolívar 2000; Ciapuscio 2000, 2003b; Cassany et al. 2000;
Moyano 2000; Ciapuscio and Otañi 2002; López 2002; Mogollón 2003;
Martín 2003; Gotti 2003).
These researchers generally conceive of a scientific research article as the
written text published in a specialized journal, whose aim is to inform the
discourse community on the results of scientific research performed by
applying the scientific method and which requires a clear rhetorical
structure commonly following the IMRD model (Introduction, Method, Results,
Discussion), proposed by Swales (1990). However, as stated by
Swales (2004), the structure may vary according to the characteristics of
each scientific discipline.
Research in this field has been performed from linguistic-textual,
rhetorical and socio-cognitive perspectives, using model text [...] and
[...] criteria. The more [...] for special and didactic [...] sciences and
social [...] first part of this [...] be [...] the research, involving
[...]course of science, collocation semantics and latent semantic analysis.
The methodological framework is presented in the second part. Finally, after
the [...] of the results, the chapter ends with some conclusions.

Theoretical background

1.1 The research article: a [...]

We are aware that in science there is not always consensus about the
denomination of the objects of study, mainly due to the focus and
delimitation of the authors in their effort to conceptualize them. Normally
there are multiple approaches because of diverging theoretical assumptions.
The concept [...] discourse' is no exception. It has received many names:
academic discourse, special discourse, professional discourse, technical
discourse, institutional discourse, and so on. Some of them actually do
[...] while some others do not.
Likewise, it is not easy to reach a certain terminological order and attain
a more or less homogeneous vision (Ciapuscio 2000; López 2002) as the task
of determining whether a text can be classified as a specialized text or a
general text becomes a theoretical and descriptive problem (Schroder [...]
Parodi 2004). [...] the view is in favour of a text continuum (Lakoff [...]
distributed from a highly specialized domain toward another more [...] and
information-oriented extreme [...] 1982; Schroder 1991; [...] and Martín
1994; Peronard 1994, [...] Cabré 2002; Parodi [...]
On the other hand, Gotti [...], also when [...] the multi-dimensional nature
of the specialized discourse, states that there is no [...] among different
specialized [...] He argues that variations [...] not [...] lexical
connotations, but often influence other [...] textual and [...] semantic and
[...] of various types of specialized discourse.
Therefore, the differences between discourses allow us to level [...]
differences [...] discourse since, for the mere presence [...]

[...] Cabré 2002; Ciapuscio 2003b; Gotti [...]
As said before, the scientific research article (SRA) is a written text, with
a rather rigid structure (at least, when produced in some empirical
disciplines): each of the traditional sections (e.g. IMRD) is preceded by a
title, the names of the authors and the institutions where they work as
researchers, and an abstract, whose aim is to briefly inform the reader on
the content of the total SRA, to help them decide if they consider it useful
to read (Moyano 2000).
Swales (1990) argues that some characteristics of scientific articles may be
displayed through a wide range of disciplines. It is said they are repeated to
warrant the existence of a macro-genre. Nevertheless, we nowadays know that
articles may vary in the degree of standardization and style from one
discipline to another: sciences known as 'hard', 'exact' or 'physical' follow
the rigid pattern discussed above, while in social sciences there are some
journals that have adopted the common pattern with different degrees of
success. There are still other journals that resist adopting a fixed
organization (Moyano 2000; Mogollón 2003; Swales 2004). On the other hand,
due to the international process of indexation and accreditation systems,
scientific journals have increasingly paid more attention to the
standardized production of articles, following general and more common
norms - at least in format.
Various previous investigations have focused on different parts of scientific
research articles. The introduction, for example, has been analysed in depth
by Swales (1990), introductions and conclusions by Gnutzmann and
Oldenburg (1991), conclusions by Ciapuscio and Otañi (2002), abstracts by
Salager-Meyer (1991) and Bolívar (2000), abstracts and introductions by
Martín (2003), and the introduction and discussion sections by Dudley-Evans
[...]. All such research has been conducted from linguistic-textual,
rhetorical and socio-cognitive [...] and most of them from a comparative
inter-language [...] in model of texts.
We will now refer in more detail to two aspects of the scientific research
article: the abstract and the [...]

1.1.1 The abstract in research articles

The abstract is a brief text used to allow the reader to [...] and [...]
the article's main content. This text is located between the title
[...] contain. [...] in turn con[...] their [...] information is not known,
such a type of approach is [...] 'bag of words' method (Jurafsky and
Martin 2000).
From a strictly linguistic perspective, this notion is compatible with that
of collocational meaning developed by the English functionalist school
(Palmer 1980). As we know, such a notion is based on the contextual theory
of meaning, where a word acquires its meaning through the words that
accompany it (Palmer 1980; Stubbs 1996, 2001). This concept of meaning
has given rise to computer corpus semantics, which by means of computer
tools enables the performance of empirical studies of meanings using huge
textual corpora (Halliday 1991, 1992; Sinclair 1991; Stubbs 1996, 2001).
Stubbs (2001) states that it is possible to study lexical-semantic relations
starting from the study of word collocation and frequency. Based on concrete
examples, he promotes the observational methods of corpus semantics,
arguing that data obtained from the corpus provide evidence with respect to
denotative and connotative meaning. Nevertheless, the fact that most
frequency studies in corpus linguistics are limited to the isolated count of
the more frequent units, hiding various interesting aspects related to null,
minimal or intermediate frequency units, has been criticized (Rojo 2002).
We believe that to quantitatively study lexical-semantic relations in a
corpus, we cannot focus only on the highest frequencies of occurrences. It
is highly relevant to pay attention to the complete range of occurrences, and
even more to the co-occurrences of some features.
Bearing this in mind, we decided to use a vectorial method based on
latent semantic analysis that recognizes, [...] dimension reduction, the
semantic similarities existing among the linguistic or text units, among
[...] and also between words and documents. The most relevant idea as
to semantic [...] is that results may be explained through the degree of
contextual exchange, or the degree by which a word may be substituted by
another within a context. From an [...] perspective, measur[...] is
conceptualized [...] models of vectorial type, as a [...] of vectors, in
order to determine the [...] of two [...] words as vectors in a
multi-dimensional or multi-vectorial space. A matrix is built to do this,
representing numerically the co-occurrence of words [...] a unit called
'document' [...] sentences or [...]

LSA does not [...] information or [...] their [...] that is, it is [...]
method of mathematical [...] that attains inductive effects, [...] an
adequate number of dimensions to represent objects and contexts (Landauer et
al. 1998). The LSA method extracts its meaning representations of words and
paragraphs exclusively from text mathematical-statistical analysis. Its
knowledge lacks anything that might come from perceptual information on the
physical world, instinct or experience generated by corporal functions,
feelings and/or intentions. Therefore, its representation of meaning is
partial and limited, as it does not use either syntactic, logical or
morphological relations. In spite of the latter, Landauer (2002) explains
that, at least for the English language, 80 per cent of the potential
information in language lies in word selection, without any consideration of
the order in which they are placed.
In addition to this representation without syntax, there is an idea which
states that in the use of language, represented by great quantities of
corpora, there are weak semantic interrelations between words empowered by
the so-called method of dimension reduction Singular Value Decomposition
(SVD). In this sense, the metaphor underlying the term 'latent' means that,
through dimension reduction obtained using SVD, the adequate representation
of existing relations between words in the textual corpus is obtained,
and such relations are very weak due to the great number of words
(Landauer and Dumais 1996, 1997; Landauer et al. 1998).
The procedure performed when using LSA to represent texts in
multi-dimensional semantic spaces is the one proposed for the Latent Semantic
Indexation (LSI) method by Deerwester et al. (1990). This method, originally
applied in the information recovery area, has been used with theoretical
and methodological [...] in psycholinguistics during the past years
(Deerwester et al. 1990; Landauer and Dumais 1996, [...] Landauer et al.
1998; Kintsch 1998, 2000, [...] Landauer 2002; Quesada et al. 2002;
Quesada 2003). Additional information on the theoretical and methodological
discussion of this method may be found in Spanish in Venegas
(2003, 2005, 2006) and Gutiérrez [...]
[...] speaking, this method works as follows. The first step is to build a
matrix of co-occurrences (X) from texts, where each column represents a
document or co-text (d), and each row represents a word from the text [...].
Each cell contains the [...] in [...] the row word appears at the text [...]
shown [...]. Cell access [...]
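The procedure described above (a word-by-document co-occurrence matrix, a transformation of its cells, dimension reduction through SVD, and a comparison of the resulting vectors) can be sketched in a few lines. This is only a minimal illustration, assuming NumPy is available: the toy documents, the log(1 + count) transformation, the choice of k = 2 dimensions and the cosine measure are assumptions made for the example, not the settings or the tool used in the study.

```python
import numpy as np

def build_cooccurrence(docs):
    """Return (vocab, X): rows are words, columns are documents,
    and each cell holds the raw count of the word in the document."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1
    return vocab, X

def lsa_doc_vectors(X, k):
    """Reduce X with SVD and return one k-dimensional vector per document."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest singular values; each row is a document vector.
    return Vt[:k, :].T * s[:k]

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical text units, invented for illustration only.
docs = [
    "corpus semantics word collocation",                             # 'keywords'
    "word collocation in corpus semantics studies meaning",          # 'abstract'
    "studies of meaning use word collocation in large corpus data",  # 'content'
    "veterinary parasites in sheep",                                 # unrelated text
]
vocab, X = build_cooccurrence(docs)
vectors = lsa_doc_vectors(np.log1p(X), k=2)  # assumed transformation: log(1 + count)
sim_keywords_abstract = cosine(vectors[0], vectors[1])
sim_abstract_content = cosine(vectors[1], vectors[2])
print(sim_keywords_abstract, sim_abstract_content)
```

With all dimensions retained, the cosine between two document vectors equals the cosine between the corresponding columns of the matrix; it is the truncation to k dimensions that lets documents with few shared words still appear similar.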
X = (original matrix)                  X = (transformed matrix)
      d1   d2   d3   d4                      d1    d2    d3    d4
P1     1    3    2    4     Application  P1  0.1   0.2   2     0.6
P2     2    4    3  [...]   of [...]     P2  0.2   0.3   0.01  0.3
P3     3    4    2    1                  P3  0.2   0.1   0.02  0.2

[...] 9.1. First step of LSA method

X
      d1    d2    d3    d4
P1    0.1   0.2   2     0.6
P2    0.2   0.3   0.01  0.3
P3    0.2   0.1   0.02  0.2

P                          D                                X'
      P1    P2    P3            d1    d2    d3    d4             d1    d2
P1    1     0.2   2        d1   1     0.2   2     0.6       P1   0.1   0.2  [...]
P2    0.2   1     0.01     d2   0.2   1     0.01  0.3       P2   0.2   0.3  [...]
P3    0.2   0.1   [...]    d3   0.2   0.1   1     0.2       [...]
                           d4   0.1   0.3   0.4   1

[...] of SVD

2. Methodological framework

2.1 Type of study and variables

As we stated in the introduction [...] to this research are: a) to
compare, using a vectorial analysis computer tool based on a corpus called
Latent Semantic [...] the lexical-semantic relationship of three text
variables present in scientific research articles that are: keywords, abstracts
and the contents, and b) to compare, starting from the lexical-semantic
similarity values of the text variables, a sample of scientific research
from two science areas (biological sciences and social sciences).
In order to [...] with such objectives, we performed [...] exploratory-[...]
non-experimental [...] a [...] methodology. It is [...] because there are no
studies concerning scientific writing conducted in Spanish using [...] tools
to calculate lexical-semantic [...] with LSA. It is [...] because it [...]
about incidence and values that show the relations between the text
variables to be [...] two areas with [...] to their lexical content.
Variables considered in this work are of two types: text and discipline. Text
variables are: a) [...] a group of words or nominal phrases, that in [...]
the main [...] of the scientific research article [...] in a
lexical-semantic way the scientific research [...] c) the contents: texts
made up of [...] rhetorical-structural [...] where the [...] followed one or
[...] sciences. Those two areas [...] been chosen because of their greater
presence in [...] indexers used for corpus collection. Exact or pure
sciences are another area of science: however [...] have not been taken
into account in the [...] articles [...] these areas are [...]

2.2 Hypothesis

The research [...] are the following:

H1: When comparing lexical-semantic similarity indexes between text
variables (keywords-abstracts, keywords-contents and abstracts-contents) from
scientific research articles of two areas of knowledge, the abstracts-contents
relation will show higher similarity values than the keywords-contents and
keywords-abstracts relations.

H2: There will be significant differences between the two areas of knowledge
researched when comparing lexical-semantic similarity indexes between text
variables of the scientific research article in a specialized semantic space
(keywords-abstracts, [...] and abstracts-contents).

[...] Sciences, [...] and social sciences [...] Information Sciences, Social
and Cultural [...] Sociology, Cultural and Social [...] 2000 to 2003 in
mainstream reviews in each science area, that is, reviews available to
researchers in SciELO Electronic [...] Latindex electronic indexers that
comply with international indexation criteria or [...]. In addition, [...]
are texts that show the text variables [...] for research. Annex 1 [...] the
ARTICO corpus, considering the reviews, the number [...] word total, in
accordance with the scientific area of [...]
Table 9.1 presents the research corpus used to quantify semantic
similarities. For more details, see Annex 2 [...] and bibliographic
references of the research sample).

Table 9.1 [...]
[...] 4908
[...] 8395
[...] 4982
[...] 6156
[...] RCHA5 2001 [...] 2735 [...] 13,419
6 RCHA5 2002 20(2) 1780 [...] 8876
[...] RCHA2 2002 20(3) [...] INE2 2002 26(2) 8034
[...] 4251
8 RCHN6 2000 [...] NS3 [...]
9 RCHN7 2000 [...] NS7 2003 186 7150
10 RCHN1 2002 [...] 5522 NS10 2003 188 5436
11 AMV2 2001 33(1) 2609
12 AMV15 2002 34(2) 3519
Total 45,959 73,838
Average 3829.9 7383.8
Std.Dev. 1177.8 258.4

Results

3.1 [...] variables

The [...] LSSs [...]
3.3
Table 9.4 presents the results obtained for comparison of LSS between the
variables to be studied in social sciences.
Results of the comparison between keywords and abstract variables show
that article AD14 2002 5 has the lowest LSS_I (.0891), corresponding to a
low Dº_LSS. Conversely, research article INE2 2002 26(2) has the highest
LSS_I (.4762), corresponding to a medium-high Dº_LSS. It is worth noting
that this article has the highest LSS_I between keyword and abstract
variables among all articles investigated in both science areas. The average
result of the LSS_I between these variables reaches a value of [...]
corresponding to a medium-low Dº_LSS.
As to the comparison between abstract and content in social sciences [...]
the one with the lowest LSS_I is AMB1 2001 6 with [...] corresponding to a
low Dº_LSS. [...] article NS10 2003 188 has the highest LSS_I between
abstract and content variables in the area, with a value of .7628. The average
LSS_I between variables is [...] to an average Dº_LSS.
In the [...] between [...] and content variables, article AMB1
2001 6 has the lowest LSS_I between variables [...], corresponding to a
[...] article INE2 2002 [...] has the [...] and content variables in social
sciences. As [...] between these [...] it is [...].

In [...] between LSS results of text variables in the two scientific areas
studied.
As shown in [...] 1, LSS averages between text variables make up [...]
pattern of semantic [...] to both scientific areas.

Graph 9.1 Comparison of average indexes of lexical-semantic similarity between
the text variables in both areas

Thus, the keyword-abstract and keyword-content variables have a lower
average value than the semantic similarity relations between
abstract-content in all articles investigated, independent of the science to
which they belong. More specifically, there is a slightly higher average
index in the keyword-abstract relationship than in the keyword-content
relation, which enables us to suppose a tendency toward relating keywords
with abstracts rather than with text contents.
Statistical contrast of such relations allows a confirmation of this idea.
Thus, when comparing LSS between text variables (keywords-abstracts,
keywords-contents, and abstracts-contents) using the Kruskal-Wallis test, and
considering a 5 per cent error, we can establish that in biological sciences
there is a statistical difference between the three relations, that is LSS
between keywords and contents is lower than the LSS relation between
keywords and abstracts. In turn, both relations are lower than the LSS between
abstracts and contents. In social sciences, statistical tests also showed
differences between the three variable relations. Therefore, the LSS between
keywords and abstracts is lower than the LSS between keywords and contents,
and both are lower than the LSS between abstracts and contents of the
articles under study.
These results confirm our first research hypothesis, considering that
abstracts rather than keywords macrosemantize better in both areas the
global semantic content of the SRA. A LSS pattern is also identifiable in both
areas where there is a macrosemantic hierarchy. Thus, greater
macrosemantization is given in the abstract-content relations, followed by
keyword-abstract, and lesser macrosemantization is between keywords and
contents.
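The statistical contrast described above can be illustrated with a small sketch using only the Python standard library. It assumes one LSS value per article for each of the three relations; the numbers are invented for illustration and are not the study's data. The Kruskal-Wallis p-value relies on the chi-square approximation with 2 degrees of freedom (so the function accepts exactly three groups) and omits the tie correction. A Mann-Whitney helper for two independent samples, the kind of test the chapter applies when contrasting the two science areas, is included with a plain normal approximation. For real analyses a statistics package would be the safer choice.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis(*groups):
    """Return (H, p) for exactly three groups, without tie correction."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    p = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p

def mann_whitney_u(a, b):
    """Return (U, two-sided p) using a normal approximation without corrections."""
    ranks = average_ranks(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical LSS values per article, invented for illustration only.
keywords_abstracts = [0.21, 0.30, 0.18, 0.25, 0.33]
keywords_contents = [0.15, 0.22, 0.12, 0.19, 0.27]
abstracts_contents = [0.55, 0.61, 0.48, 0.70, 0.66]

h_stat, p_value = kruskal_wallis(keywords_abstracts, keywords_contents, abstracts_contents)
print("Kruskal-Wallis:", h_stat, p_value, "significant at 5%:", p_value < 0.05)
```

The Kruskal-Wallis result only says that at least one relation differs from the others; the ordering of the three relations still has to be read from the rank sums or the averages, as the chapter does.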
[...] in accordance with international standards of scientific [...] thus
appear that research articles show [...] rhetorical-structural [...] similar
LSS between the areas. We believe that what has been [...] as to form and
content, [...] to lexical-semantic relations between the variables studied.
This standardization takes [...] thanks to editorial [...] whose aim is
coherent and informative articles.

The above results allow us to establish a similar behavioural pattern for
both sciences, with the LSS of text variables as a starting point. In order to
verify whether this pattern marks a difference between study areas, there is
a statistical comparison of the average values of all the LSS relations (see
[...] 2).
A Mann-Whitney non-parametric test for two independent samples was
applied, obtaining an alpha value of A5L. According to this value, our
second research hypothesis is disproved. This means that when analysing
the LSS_I between text variables (keywords-abstracts, keywords-contents
and abstracts-contents) of the scientific research articles within a
specialized semantic space, no significant differences were evidenced (with a
5 per cent error) between the two areas investigated.
One possible interpretation of this result is that the scientific research
articles used in the study sample are the material product of a complex
process of scientific production, where multiple agents unite their
discourse and disciplinary competence in order to attain a scientific research
[...]

Graph 9.2 Comparison of the lexical-semantic similarity average in both areas
according to ES-ARTICO (vertical axis from 0,0 to 0,5; categories: biological
sciences, social sciences)

[...] an [...] of communication [...] areas of science.

Conclusions

First, it is possible to assert that the abstracts-contents relation in all SRA of
the corpus is stronger than keywords-abstracts and keywords-contents
relations. The results confirm the macrosemantization function of abstracts,
and that they help the reader to build a hypothesis on the central topics of
the text. In this particular case, we have verified that the degree of the LSS
of abstracts is statistically high with reference to contents, therefore, in this
type of text, and in the investigated areas, readers' hypotheses would
probably be fulfilled to a high degree.
Another conclusion is that keywords, at least in the SRA studied, lack a
clear macrosemantization function of the global meaning of contents.
Thus, it is possible to argue that most probably keywords succeed in
placing the research article in a subject-matter or procedural disciplinary
field, with little reference to this within the article. It is worth noting that
the method of quantifying similarities is based on co-text collocation of
words within the text, therefore it is highly probable that this keyword
function is not detected by LSS values delivered by LSA. It could happen
that keywords do establish the macrosemantized global meaning of the
text, but not in terms of intra-textual lexical relations, but rather in more
abstract macrosemantic relations of metatopical and eminently
inter-textual character. In addition, one could argue that keywords would
fulfil another specific function. Their nature would be more persuasive
than informative, that is, writers or even editorial committees would use
or suggest keywords or phrases that, though not strongly related to
texts, arouse the interest of a possible reader, thus covering the first
step of approach to text reading for scientific communities of specific
interest.
We can also state that there are no significant differences between the
scientific domains under study, in accordance with the lexical-semantic
relationships analysed between the text variables in the articles
investigated. Evidently, this result does not respond to what had been
expected, considering that notionally it was assumed that biological
scientists would show greater precision and standardization when writing a
scientific research article, fundamentally between the scientific article
sections and the terminological use of the lexicon, in contrast with social
sciences texts where less rhetorical-structural standardization, lexicon
variability and conceptual [...] were expected. With reference to semantic
[...] indexed scientific reviews in [...] standards of international [...].
In this sense, the article produced [...] an [...] or authors, is submitted
to a complex editorial process. Through this process many scientific
producers and those who understand the discipline join their knowledge and
text-discourse competence to co-build a scientific research article not only
possessing content quality but organizational quality according with the
rhetorical structure demanded by the journal. [...] it is possible nowadays
that in most disciplines research articles tend to greater
rhetorical-structural homogeneity, and therefore similar LSS. In this way,
the trend is toward a progressing similarity where individual discipline
differences are lost, at least regarding differences in macrosemantization
processes between the variables studied in biological and social sciences
text samples.
It is possible to infer that the researcher intending to join a scientific
discourse community must learn, among other things, to communicate his or
her research by following the semantic-textual norms associated with this
type of text and the proper disciplinary norms of specialized reviews,
assisted in this learning process by editorial instances of the journal selected
for publication. This process includes multiple assessments and suggestions
both from scientific peers and editorial committees and/or journal editors.
[...] the effort of getting an article published in a scientific journal
becomes a semantic textual co-construction process oriented toward a
discourse production system or knowledge co-writing. It is very [...]
thus, to discover that a finally published scientific research
article may, in some cases, turn into a product of great interaction and
meaning exchange. So, the writer must pay attention to many voices that
might make him or her change not only format and content, but [...]
communication purpose.
[...] we must emphasize the fact that results obtained through LSA
help to confirm the [...] function of abstracts in SRA and determine
that text variables macrosemantize contents in SRA, independent of the
discipline area. In other words, LSA using mathematical-statistical data of
the selected texts enables us to establish with fair precision and economy, in
[...] terms, the forces of [...] between lexical components, [...]
them semantic values and the [...] of lexical-semantic components.

Note

The numbers included in [...] 9.1 and 9.2 are used for representational
purposes and [...] to the results of the statistical [...].
ANNEX 1 CORPUS ARTICO: NUMBER OF ARTICLES AND WORDS

CODE    CORPUS ARTICO                                          ARTICLES   WORDS

EXACT SCIENCES
[...]   [...] DE LA SOCIEDAD CHILENA [...]                     78         [...]
[...]   [...]                                                  29         91,025
RPQ     [...]                                                  6          20,841
        [...]publicaciones/ing_quimica/Reglamento.htm
AAL     ACTAS DE LA ACADEMIA LUVENTICUS                        6          21,615
        http://www.luventicus.org/Actas/
ITERC   INTERCIENCIA                                           6          36,615
        http://www.interciencia.org/
ACV     ACTA CIENTIFICA VENEZOLANA                             7          32,831
        http://acta.ivic.ve/
                                                               132        478,908

BIOLOGICAL SCIENCES
GC      GAYANA CONCEPCIÓN                                      40         159,070
        http://www.scielo.cl/scielo.php?pid=0717-6538andscript=sci_serial
RCHA    REVISTA CHILENA DE ANATOMÍA                            66         219,685
        http://www.scielo.cl/scielo.php?pid=0716-9868andscript=sci_serial
RCI     REVISTA CHILENA DE INFECTOLOGÍA                        34         136,299
        http://www.scielo.cl/scielo.php?pid=0716-1018andscript=sci_serial
RCHN    REVISTA DE HISTORIA NATURAL                            101        657,749
        http://www.scielo.cl/scielo.php?pid=0716-078Xandscript=sci_serial
AMV     ARCHIVOS DE MEDICINA VETERINARIA                       57         263,781
        http://www.scielo.cl/scielo.php?pid=0301-732Xandscript=sci_serial
                                                               298        1,436,584

SOCIAL SCIENCES
AMB     ÁMBITOS. REVISTA INTERNACIONAL DE COMUNICACIÓN         74         497,473
        http://www.ull.es/publicaciones/latina/ambitos/ambitos.htm
CHU     CHUNGARA                                               57         351,956
        http://www.scielo.cl/scielo.php?pid=0717-7356andscript=sci_serial
AD      ANALES DE DOCUMENTACIÓN                                54         379,925
        http://www.um.es/fccd/anales/
NS      NUEVA SOCIEDAD                                         26         152,501
        http://www.nuevasoc.org.ve/home/
INE     INVESTIGACIONES ECONÓMICAS                             34         333,777
        http://www.funep.es/invecon/sp/sie.asp
                                                               245        1,715,632

TOTAL                                                          675        3,631,124

ANNEX 2 RESEARCH CORPUS REFERENCES

BIOLOGICAL SCIENCE

ID  CODE               REFERENCE

1   AMV13-2002 34(1)   Sievers, M., Cárdenas, C. [...] 'Estudio anual de la
                       eliminación huevos y larvas de [...] en ovinos de una
                       estancia en [...] Chile'. Archivos de Medicina
                       Veterinaria 34, (1), 37-47.
2   GC3-2002 66(2)     Lara, G., Parada, E. and Peredo, S. (2002) 'Alimentación
                       y conducta alimentaria de la almeja de agua dulce
                       Diplodon chilensis (bivalvia: hyriidae)'. Gayana
                       (Concepc.) 66, (2), 107-12.
3   GC3-2003 67(1)     Alarcón, M. (2003) 'Sifonapterofauna de tres especies de
                       roedores de Concepción, VI Región, Chile'. Gayana
                       (Concepc.) 67, (1), 16-24.
4   RCHA4-2001 19(3)   Vásquez, B. (2001) 'Presencia de CBG en el estroma
                       ovárico de mamíferos'. Revista Chilena de Anatomía 19,
                       (3), 279-84.
5   RCHA5-2001 19(3)   Castro, A., Ghezzi, M., Alzota, R., Lupidio, M. and
                       Rodríguez, J. (2001) 'Morfología del hígado de llama
                       (Lama glama)'. Revista Chilena de Anatomía 19, (3),
                       291-96.
6   RCHA5-2002 20(2)   Briones, F., Calderón, M., Muñoz, J., Venegas, F. and
                       Araya, N. (2002) 'El anticuerpo monoclonal Ki-67 como
                       elemento de valor diagnóstico y pronóstico en neoplasias
                       mamarias caninas'. Revista Chilena de Anatomía 20, (2),
                       165-8.
7   RCHA2-2002 20(3)   Babinski, M., Chagas, M., Costa, W. and Pereira, M.
                       (2002) 'Morfología y fracción del área del lumen
                       glandular de la zona de transición en la próstata
                       humana'. Revista Chilena de Anatomía 20, (3), 255-62.
8   RCHN6-2000 73(4)   Véliz, D. and Vásquez, J. (2000) 'La Familia Trochidae
                       (Mollusca: Gastropoda) en el norte de Chile:
                       consideraciones ecológicas taxonómicas'. Revista
                       Chilena de Historia Natural 73, (4), 757-69.
9   RCHN7-2000 73(4)   Martínez, G. and Montecino, V. (2000) 'Competencia en
                       Cladocera: implicancias de la sobreposición en el uso de
                       los recursos tróficos'. Revista Chilena de Historia
                       Natural 73, (4), 787-95.
10  RCHN1-2002 75(2)   Canals, M., Atala, C., Olivares, R., Novoa, F. and
                       Rosenmann, M. (2002) 'La asimetría y el grado de
                       optimización del árbol bronquial en Rattus norvegicus y
                       Oryctolagus cuniculus'. Revista Chilena de Historia
                       Natural 75, (2), 271-82.
ANNEX 2 (continued)

BIOLOGICAL SCIENCE

ID  CODE               REFERENCE

11  AMV2-2001 33(1)    Díaz, D., Picco, Encínas, Rubio, and Litterio, N.
                       (2001) 'Residuos tisulares de nicotinato de norfloxacina
                       administrado vía oral en cerdos'. Archivos de Medicina
                       Veterinaria 33, 37-42.
12  AMV15-2002 34(2)   Perfumo, C., Sanguinetti, H., N., Armocida, A.,
                       Machuca, M., Massone, A., Risso, M., and Idiart, J.
                       (2002) 'Constrictura rectal en cerdos necropsiados en
                       una granja de ciclo completo en confinamiento.
                       Consideraciones sobre su prevalencia, hallazgos
                       anatomopatológicos y etiopatogenia'. Archivos de
                       Medicina Veterinaria 34, (2), 245-52.

SOCIAL SCIENCE

Nicole [...]
Viviana Cortes
Douglas Biber
Northern Arizona University
Iowa State University
United States of America
[...] texts, where [...] studies [...]. One of [...]. His [...] sequences of
corpora of [...] were a combination of [...] the results reflected that some
[...] on collocates are [...]. For [...] the frequent collocates in the Habla
corpus (a corpus [...] Madrid which consists of interviews with an [...])
include concrete nouns related to [...] and processes, which are the [...]
type of nouns and adjectives used in the País corpus, which is made up of
excerpts from the daily national newspaper El País, largely reflect the
financial content of that corpus. Butler conducted a functional analysis of the
recurrent word combinations identified in these corpora and reported that
the majority of the repeated word sequences held two main properties:
structurally, they began with conjunctions, articles, pronouns, prepositions or
discourse markers, and, functionally, these expressions were used for
interpersonal or textual functions rather than ideational functions.

The current research complements these previous studies by examining
the use of lexical bundles in Spanish, focusing on two distinct registers:
sociolinguistic interviews and academic writing. Because the research on
lexical bundles in Spanish is rather scarce, we chose to begin the
investigation at a more global level, comparing two very different registers.
Specifically, the study focused on the following research questions:

1) What are the most frequent bundles in Spanish conversation and
academic prose?
2) What are the functions of these bundles?
3) What similarities and differences are there between registers?

In Section 2, we describe the methodology used to identify lexical bundles,
and continue in Section 3 by summarizing the major results of the
identification and classification of lexical bundles in each register. Finally,
we conclude with a discussion of the findings and a brief comparison to
previous studies of lexical bundles.

2. Methodology

2.1 Corpus used in the current study

The current study is based on the analysis of texts from sociolinguistic
interviews taken from the Habla Culta (Lope Blanch 1977, 1991), in addition to
academic articles that were downloaded from online sources and scanned
from print sources. The Habla Culta includes sociolinguistic interviews from
12 different Spanish-speaking cities: 1) Bogotá, Colombia, 2) Buenos Aires,
Argentina, 3) Caracas, Venezuela, 4) Havana, Cuba, 5) La Paz, Bolivia, 6)
Lima, Peru, 7) Madrid, Spain, 8) Mexico City, Mexico, 9) San Jose, Costa
Rica, 10) San Juan, Puerto Rico, 11) Santiago, Chile and 12) Seville, Spain.
This register was chosen because it was a large sample of spoken language
that was available to the researchers. Most of the sociolinguistic interviews
follow the typical interview format of a question and then a long answer but
still are relatively open-ended. A smaller number are more informal and mimic
casual conversation with shorter turns by both interlocutors.

The academic register was designed for this project to reflect the same
countries that are represented in the Habla Culta. The texts were collected
from online and print sources. Approximately one-third of the articles are
history articles from Argentina, another third are humanities articles from
various countries such as Bolivia, Chile, Colombia, Costa Rica, Cuba,
Mexico, Peru, Puerto Rico, Spain and Venezuela, and the final third are
science articles taken from journals in all of the above-mentioned countries.
Table 10.1 shows the composition of the two registers.

Table 10.1
Register          Texts    Words
Interviews        397      [...]
Academic texts    [...]    1,002,550
Total             560      3,224,575

2.2 Identification of lexical bundles

Lexical bundles are defined as the most frequent recurring lexical
sequences in a register (Biber et al. 1999). It is important to stress the fact
that lexical bundles are identified empirically rather than intuitively and
that these word combinations are defined by frequency. By definition,
lexical bundles 'are the sequences of words that most commonly co-occur
in a register' (Biber et al. 1999: 989). In the current study, we continue this
frequency approach tradition to identify lexical bundles in the two registers
selected. Our approach is rather conservative in that in order to be
considered a lexical bundle, an expression must occur at least 30 times in a
million words and appear in at least 20 different texts. To further limit the
investigation, only four-word bundles were analysed. A computer program was
written by the third author that identifies each bundle and maintains a
count of the number of times it occurs in each register and the number of
different texts that it occurs in.

2.3 Coding for structural types and functions

Two of the researchers independently coded each bundle based on
structural and functional characteristics and all conflicts were resolved [...]
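
The frequency-and-range procedure described in Section 2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program written for the study: the function names (`tokenize`, `find_lexical_bundles`), the simple regular-expression tokenizer and the toy corpus are all hypothetical. The default thresholds mirror those reported in Section 2.2 (four-word sequences, at least 30 occurrences per million words, at least 20 different texts).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens.

    Assumption: a plain \\w+ pattern is good enough for illustration;
    in Python 3 it is Unicode-aware, so accented letters are kept.
    """
    return re.findall(r"\w+", text.lower())


def find_lexical_bundles(texts, n=4, min_per_million=30.0, min_texts=20):
    """Return the n-word sequences that pass both thresholds.

    texts: a list of token lists, one per text in a single register.
    A sequence is kept if its normalized rate (occurrences per million
    words) is at least min_per_million and it occurs in at least
    min_texts different texts.
    """
    counts = Counter()      # total occurrences of each sequence
    text_range = Counter()  # number of different texts containing it
    total_words = 0

    for tokens in texts:
        total_words += len(tokens)
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        counts.update(ngrams)
        text_range.update(set(ngrams))  # count each text only once

    bundles = {}
    for gram, count in counts.items():
        per_million = count * 1_000_000 / total_words
        if per_million >= min_per_million and text_range[gram] >= min_texts:
            bundles[" ".join(gram)] = {
                "count": count,
                "per_million": per_million,
                "texts": text_range[gram],
            }
    return bundles


# Toy corpus (hypothetical); thresholds are lowered so that a
# three-text example can return something.
toy_corpus = [
    tokenize("Pero yo creo que es demasiado"),
    tokenize("Yo creo que es así"),
    tokenize("No sé si es así"),
]
toy_bundles = find_lexical_bundles(toy_corpus, min_per_million=1, min_texts=2)
print(toy_bundles)
```

In this sketch each register would be passed through `find_lexical_bundles` separately, so that counts and text ranges are kept per register as the section describes. The range threshold is what prevents a sequence repeated many times by a single speaker or author from qualifying as a bundle.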
[...] se va a, que a mí [...]

Type 2 bundles are both noun and prepositional phrase [...] 2a-2c [...] noun
phrase fragments (cada uno de los, el hecho de que, la ciencia y la), whereas
types 2d and 2e incorporate prepositional fragments ([...] lo largo [...] a
través de [...] en la que se). Figure 10.3 shows the distribution of lexical
bundles across structural types.

Figure 10.3 Distribution of lexical bundles across structural types
(y-axis: number of bundles; series: NP/PP-based bundles, VP-based bundles;
x-axis: Sociolinguistic Interviews, Academic Prose)

3.2 Functional taxonomy

In this section, we describe the functional taxonomy that emerged from
qualitative analysis of the bundles in context. For this analysis, we began
examining concordance lines to analyse the functions of all the bundles in
their discourse contexts.

From the data, three primary functions were identified and those
functions match the functions that were described in Biber et al. (2004):
1) stance expressions, 2) discourse organizers and 3) referential
expressions. Each of these three categories also includes subcategories of
functions. Table 10.3 lists the bundles in each of their functional
categories. The following sections will describe each of the bundles in
more detail.

Bundle [...] Academic

I. STANCE EXPRESSIONS
A. Epistemic stance
A1) Personal
pero yo creo que **
que yo creo que **
y yo creo que **
yo creo que el *
yo creo que en **
yo creo que es **
yo creo que la **
yo creo que no **
yo creo que sí **
yo no sé si **
A2) Impersonal
el hecho de que ** **
la verdad es que **
B. Attitudinal stance
a mí me gusta **
a mí me parece **
me parece que es *

II. DISCOURSE ORGANIZERS
A. Topic introduction/focus
en cuanto a la * **
se trata de un **
se trata de una **
te voy a decir *
B. Topic elaboration/clarification
lo que pasa es que **
por eso es que *
que es lo que **
qué es lo que **

III. REFERENTIAL EXPRESSIONS
A. Identification/focus
cada una de las *
cada uno de los **
de las cosas que **
de lo que es **
es lo que yo *
es una cosa que **
es una de las ** *
es uno de los ** *
eso es lo que ***
lo que más me *
lo que se llama *
Table 10.3
Bundle
de las cosas
B. Specification
**
América Latina y el
B1) **
de la facultad de **
en la
* dela de
*
la mayor parte de
la mayoría de las
* ** de la universidad de * **
* * de los Estados Unidos * **
la mayoría de los * ** en la universidad de *
B2) Tangible framing attributes en los Estados Unidos
en el grupo de
*
* la facultad de ciencias *
B3) Intangible framing attributes la universidad de Buenos
a pesar de que
*
** * Latina y el caribe *
con el fin de
** universidad de Buenos Aires **
con respecto a la
* C2) Time reference
desde el punto de *** ** a la vez que **
de la sociedad civil
** de la década de **
de la teoría de
de las ciencias sociales
* de la década del *
desde el punto de
** el momento en que * *
*** ** C3) Multi-functional reference
el caso de la
el caso de las
** a lo largo de **
el caso de los
* a lo largo del *
el derecho a la
* a partir de la ***
el desarrollo de la
* a partir de los *
** a través de la * ***
el problema de la
** * a través de las **
el punto de vista
*** ** a través de los * *
el sentido de que
en el area de
*
IV. STRUCTURAL ONLY
en el campo de
**
* ** en el que se **
en el caso de en la que se *
** ***
en el caso del a mí no me **
en el marco de
* que a mí me
** **
en el proceso de
en el sentido de
* Key to symbols:
** = 10-19 per million words
en la medida en
en relación con el
** * = 20-39 per million words
* ** = 40-99 per million words
en relación con la *** = over 100 per million words
en torno a la
*
la idea de que
**
la medida en
**
la práctica de **
la teoría de la
**
la teoría de los
**
por parte de los
**
punto de vista de
**
sobre la base de
* *
**

3.2.1 Stance bundles

The stance bundles that were identified in this study were mainly examples
of personal epistemic stance and expressed certainty. Additionally, they all
included the verb creer with a complement clause. For example:

Y nosotros entendemos que subdesarrollado es tener poco. Pero yo creo que es
demasiado limitativa, ¿verdad?, la idea de que los que tengan mucho son
desarrollados y los que tengan poco son su ... son subdesarrollados. Yo creo
que en este momento lo que interesa es, realmente, las cosas de fondo que
hacen las diferencias,
Entre las limitaciones del estudio nos [...] señalar dos: el reducido número
[...] de encontrar resultados estadísticamente significativos, y el hecho de
que [...] estudio no tenga dos brazos aleatorios con otro grupo de [...] no
tratados con bomba [...] de comparar directamente ambos grupos. (academic
prose)

A small number of bundles that represent attitudinal stance were also
identified. For example:

... a mí me gusta más una ciudad chica. O sea que La Paz está demasiado grande
ahora ¿no?, no es que está demasiado grande, pero, por ejemplo, esto de los
edificios a mí no me convence mucho. Prefiero una ciudad como Cochabamba.
(sociolinguistic interview)

3.2.2 Discourse-organizing bundles

Of the discourse-organizing bundles that were identified, there were two
main types: topic introduction/focus and topic elaboration/clarification.
The introduction/focus bundles are used to either signify that a new topic
is being introduced or to direct the listeners' or readers' attention to a
specific part of the discourse. For example, in the following excerpt from a
sociolinguistic interview, the speaker is using the bundle to begin his/her
answer to the interviewer's question:

Pues, ¿qué te voy a decir de la prensa española, que tú no sepas? En mi casa se
compra el 'ABC' no porque seamos monárquicos ni nada, no, simplemente
porque es el periódico más cómodo de manejo, no es más que por eso.

The topic elaboration/clarification bundles are used to add information
about the speaker's message, as in the following example:

Me dedico al arte en la casa en los ratos en que tengo libre, hago mosaico, que es
lo que más me gusta, y a veces hago también una ... una técnica de estampado
de tela, pero no en el sentido comercial, sino en el sentido netamente artístico.
(sociolinguistic interview)

The next example includes a bundle that was originally identified as two
separate four-word bundles (lo que pasa es and que pasa es que) but through
later analysis was identified as a single five-word bundle lo que pasa es que.
All examples of this bundle were found in the sociolinguistic register:

... ésa es la impresión que yo tengo. Lo que pasa es que siempre tenemos el gran
vicio, nosotros, de ver una islita realmente, ¿no? Nosotros vivimos en Buenos Aires
además vivimos en una islita en Buenos Aires. (sociolinguistic interview)

Pues eso es lo que iba a decir. En Madrid tienes tu medio de vida por todos los
lados. (sociolinguistic interview)

Another category of the referential bundles is used to specify attributes. This
category overwhelmingly includes bundles that were found in academic
prose and are used to specify attributes of the head noun. Some of these
bundles specify quantities as in the following example:

La escuela pública jugó un papel fundamental en la transmisión a la sociedad de
las formas de legar la cultura de generación a generación y de las tecnologías
del enseñar y el aprender. La mayoría de las naciones logró articular las escuelas,
colegios y universidades, valorándolos como espacios privilegiados para la
enseñanza, el aprendizaje y la producción de los saberes públicos. (academic
prose)

Other types of bundles specify tangible and intangible framing attributes.
The intangible attributes were more frequent than the tangible framing
attributes, where only one example was found:

Respecto a los efectos de las altas temperaturas, una investigación centrada en el
grupo de personas ancianas encuentra un mayor impacto del calor entre las
mujeres. (academic prose)

In contrast, numerous examples of intangible framing attributes can be
found in the academic register. These examples tend to identify abstract
characteristics:

Con respecto a la fecha de diagnóstico, en el caso de los pacientes con EC, 69%
de ellos fue diagnosticado después del año 1995, es decir en la segunda mitad del
período estudiado. (academic prose)

Additionally, referential bundles that specify places, institutions
or times were found:

Utilizamos seis conejos (Oryctolagus cuniculus), machos, sexualmente maduros,
híbridos, clínicamente sanos y alimentados con pellet y zanahorias ad libitum,
obtenidos del Bioterio de la Facultad de Medicina de la Universidad de la
Frontera, Temuco, Chile. (academic prose)

El fútbol, como juego reglamentado, nació en Inglaterra hacia mediados de la
década de 1860. (academic prose)

Several lexical bundles identified in the referential category were used to
perform different referential functions in different contexts. In the case of
[...] a lo [...] 1999 hasta inicios de la [...]

A lo [...] distintas sociedades ocuparon los territorios que formarían la [...]
en el extremo noroeste de [...] (academic text)

Another example of these multi-functional referential bundles is a través de
la/los/las. As shown in the following examples, these bundles were used to
indicate time, place and text reference:

La justificación de la existencia se presenta como superación de ésta en la obra, por
medio de la consolidación de la duración a través de la narración. (academic text)

Controlando este efecto, es posible distinguir aquellas variables que son
efectivamente significativas a través de los años. (academic text)

Han abandonado el medio arbóreo, han abandonado la vida, propiamente, en las
galerías - se les llama así - a través de los ríos, en donde la naturaleza los ... les
prestaba, propiamente, protección. (sociolinguistic interview)

3.3 The interaction between structural and functional categories

As shown in Figure 10.4, referential bundles are predominant in both
spoken and written Spanish registers. In comparison, stance bundles are
common in sociolinguistic interviews but not in academic prose. There
are few discourse-organizing bundles in both registers, although more are
found in sociolinguistic interviews.

Figure 10.4 Distribution of lexical bundles across functional categories
(y-axis: number of bundles; x-axis: Sociolinguistic Interviews, Academic Prose)

Figure 10.5 shows the interaction between the structural characteristics
and functional categories of lexical bundles in both registers. Referential
bundles are usually expressed by means of noun phrases or prepositional
phrases, whereas stance and discourse-organizing functions are usually
realized as VP-based bundles. It is interesting to note that lexical bundles in
English show these same associations of VP-based bundles being used to
express stance and discourse-organizing functions, while NP-based bundles
are typically used to express referential functions (see Biber et al. 2004).

Figure 10.5 Interaction of structural and functional categories for both registers
(series: NP/PP-based bundles, VP-based bundles)

4. Discussion and conclusion

The study identified the lexical bundles used in a corpus of academic
prose and sociolinguistic interviews in [...] classifying bundles
both structurally and functionally. In some [...] the findings are very
similar to those found in [...] of lexical bundles in Spanish
and English (Biber et al. 2004). The most obvious [...]
is the strong association between the structural and functional
characteristics of bundles [...]
[...] Gredos.
[...] 29-42.
[...] communication', in U. [...] Berlin: de [...] pp. 289-304.
[...] and Huckin, T. [...] Genre Knowledge in Disciplinary
Communication: Cognition, Culture, Power. Hillsdale, NJ: Erlbaum.
Bhatia, V. (1993) Analysing Genre: Language Use in Professional Settings.
London: Longman.
Bhatia, V. (1995) 'Genre-mixing in the professional communication: The case
of "private intentions" v. "socially recognised purposes"', in P. Bruthiaux,
T. Boswood and B. Bertha (eds), Explorations in English for Professional
Communication. Hong Kong: City University of Hong Kong, pp. 1-19.
Bhatia, V. (2004) Worlds of Written Discourse: A Genre-Based View. London:
Continuum.
Biber, D. (1985) 'Investigating macroscopic textual variation through
multi-feature/multi-dimensional analyses'. Linguistics 23, 337-60.
Biber, D. (1986) 'Spoken and written textual dimensions in English:
Resolving the contradictory findings'. Language 62, 384-414.
Biber, D. (1988) Variation Across Speech and Writing. Cambridge: Cambridge
University Press.
Biber, D. (1992) 'On the complexity of discourse complexity: A
multidimensional analysis'. Discourse Processes 15, 133-63.
Biber, D. (1994) 'Using register-diversified corpora for general language
studies', in S. Armstrong (ed.), Using Large Corpora. Cambridge: The MIT
Press, pp. 180-201.
Biber, D. (1995) Dimensions of Register Variation: A Cross-Linguistic Comparison.
Cambridge: Cambridge University Press.
Biber, D. (1996) 'Investigating language use through corpus-based analyses of
association patterns'. International Journal of Corpus Linguistics 1, (2), 171-97.
Biber, D. (2003) 'Variation among university spoken and written registers:
A new multi-dimensional analysis', in P. [...] and C. [...] (eds),
[...] Language Structure [...] Use. Amsterdam: Rodopi,
pp. 47-70.
Biber, D. (2005) 'Paquetes léxicos en textos de estudio universitario:
Variación entre disciplinas académicas'. Revista Signos [...]
Biber, D. (2006) [...] Language: [...] and
Written [...] Amsterdam: Benjamins.
Biber, D. and Conrad, S. (1999) 'Lexical bundles in conversation and academic
prose', in H. [...] and S. Oksefjell [...], Out [...] Studies
[...] Amsterdam: Rodopi, pp. 181-9.
[...] Somali'. [...] Variation and [...]
Biber, D. and Hared, M. [...] in Somali: [...]
consequences'. Annual Review of Applied Linguistics 12, 260-82.
Biber, D. and Hared, M. (1994) 'Linguistic correlates of the transition to
literacy in Somali: Language adaptation in six press registers', in D. Biber
and E. Finegan (eds), Sociolinguistic Perspectives on Register. Oxford: Oxford
University Press, pp. 182-216.
Biber, D., Conrad, S. and Cortes, V. (2003) 'Lexical bundles in speech and
writing: An initial taxonomy', in A. Wilson, P. Rayson and T. McEnery
(eds), Corpus Linguistics by the Lune. Frankfurt: Lang, pp. 71-93.
Biber, D., Conrad, S. and Cortes, V. (2004) 'If you look at ... lexical bundles
in academic lectures and textbooks'. Applied Linguistics 25, 371-405.
Biber, D., Conrad, S. and Reppen, R. (1998) Corpus Linguistics: Investigating
Language Structure and Use. Cambridge: Cambridge University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S. and Finegan, E. (1999) The
Longman Grammar of Spoken and Written English. London: Longman.
Blanche-Benveniste, C. (1998) Estudios Lingüísticos sobre la Relación entre
Oralidad y Escritura. Barcelona: Gedisa.
Blas, J. L. (2000) 'Aspectos sobre la variación lingüística en la lengua escrita:
la expresión de futuridad en el español literario'. Lingüística Española
Actual 22, 181-200.
Bolívar, A. (2000) 'Homogeneidad versus variedad en la estructura de los
resúmenes de investigación para congresos'. Akademos 2, 121-38.
Bosani, A. (2000) 'Verbos de comunicación y discurso', in J. de Bustos,
P. Charaudeau, J. Girón, S. Iglesias and C. López (eds), Lengua, Discurso,
Texto: I Simposio Internacional de Análisis del Discurso. Madrid: Visor,
pp. 253-62.
Bosque, I. (1990) Las Categorías Gramaticales. Madrid: Síntesis.
Bosque, I. (1999) 'El nombre común', in I. Bosque and V. Demonte (eds),
Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe, pp. 3-76.
Bosque, I. and Demonte, V. (eds) [...] Gramática Descriptiva de la Lengua
Española. Madrid: Espasa Calpe.
Brinker, K. (1988) Linguistische Textanalyse. Berlin: E. Schmidt.
Brucart, J. (2000) 'L'analisi sintactica i la seva terminología en
l'ensenyament secundari', in J. Macia Guila and J. Sola (eds), La [...]
en [...] Secundari. Barcelona: [...]
REFERENCES
redes
y Sociedad
pp. 60-77. Castella Lidon, J. Oralitat
frameworks m 11m"~"·nrrrP Barcelona: Publicacions de de Montserrat.
Castellano, A. variación del
, in M. Mml.oz, G. Femández, V. Benítez , IV
de General. Cádiz: Universidad de Cádiz, pp. 521-31.
J. 'Sequentiality as the basis of constituent structure', in Castellón, I., Fernández, A., Martí, A., Morante, R. and Vázquez, G.
T. Givón and B. Malle (eds), The Evolution ofLanguage out oJPre-language. 'An interlingua representation based on the lexico-semantic informa-
Amsterdam: Benjamins, pp. 109-34. tion'. Online. Retrieved from: http://crl.nmsu.edu/Events/FWOI/
Cabré, M. (1999) 'El discurs especialitzat o la variació funcional determi- SecondWorkshop/paper/ castellon.html.
nada per la tema ti ca: Noves perspectives ', in M. Cabré ( ed.), La Cepeda, G. (2002) 'Entonación, actitud y modalidad'. fütudios Filológfros
Terminología: Representación y Comunicación. Una Teoría de Base Comunicativa 37, 7-28.
y Otros Artículos. Barcelona: IULA, pp. 151-73. Chafe, W. ( 1982) 'Integration and involvement in speaking, writing and oral
M. (2000) La Terminología: Representación y Comunicación. Elementos literature ', in D. Tannen ( ed.), Spoken and Written Language: Exploring
para una Teoría de Base Comunicativa. Barcelona: Instituto Universitario de Orality and Literacy. Norwood, NJ: Ablex, pp. 35-53.
Lingüística Aplicada, Universitat Pompeu Fabra. Chafe, W. ( 1985) 'Linguistic differences produced by differences between
Cabré, M. (2002) 'Textos especializados y unidades de conocimiento: speaking and writing', in D. Olson, N. Torrence and A. Hidyard
Metodología y tipologización', in.J. García and M. Fuentes (eds), Texto, ( eds), Literature, Language and Learning: The Nature and Consequences of
Terminología y Traducción. Barcelona: Almar, pp. 122-87. Reading and Writing. Cambridge: Cambridge University Press,
Cademártori, Y (2003) 'La inscripción de las personas en textos de divul- pp. 105-23.
gación científica'. Revista Latinoamericana de Estudios del Discurso 3, ( 1), 9-28. Chafe, W. (1986) 'Evidentiality in English conversation and academic
Cademártori, Y, Parodi, G. and Venegas, R. (2006) 'El discurso escrito y writing', in W. Chafe and J. Nicho Is (eds), Evidentiality: The Linguistic
especializado: caracterización y funciones de las nominalizaciones en los Codingof Epistemology. Norwood, NJ: Ablex, pp. 261-73.
manuales técnicos'. Literatura y Lingüística 17, 243-65. Chafe, W. (1992) 'The importance of corpus linguistics to understand the
E. (2000) 'Decir la ciencia: Las prácticas divulgativas nature of language', in J. Svartvik ( ed.), Directions in Corpus Linguistics.
de mira'. Revista Iberoamericana del Discurso y Sociedad 2, (2), Berlin: de Gruyter, pp. 79-97.
Chafe, W. (1994) Discourse, Consciousness and Time. Chicago: The University
H. 'Análisis discursivo de la divulgación of Chicago Press.
Online. Retrieved from: Chafe, W. and Danielewics,J. (1987) 'Properties of spoken and written lan-
danielcass/ anali.htm. guage', in R. Horowitz and J. Samuels (eds), Comprehending Oral and
Calsamiglia, H. and Tusón, A. (] Las Cosas de Decir: lvianual de Análisis Written Language. New York: Academic Press, 83-115.
de Discurso. Barcelona: Ariel. Christie, F. (1998) 'Science and apprenticeship: pedagogic discourse',
(1 in J. Martín and R. Veel Science: Critical and Functional
Gramática Perspectives on Discourse London: Routledge, pp. 152-80.
Christie, F. (2005) Classroom Discourse A Functional P~·cJ>,,rh"'"
London: Continuum.
Christie, F. and Martín, J. Genre and lnstitution: Social Processes
the and School. London: Continuum.
norma y habla del futuro de Church, K. 'Introduction to the
Rohrer Semantikos. Studia
Honorem Coseriu 1921-1981. Madrid: 383-94.
, in].
pp. 35-60.
Flamenco García, L. 'La coordinación adversativa', in l.
and argument selection'. V. Demonte , Gramática de la
3855-78.
Fleischman, S. The Future in and
( 'Genre VLO>Ll;<,áU.UU Of the H'1Crr'<l"
tion and discussion sections of MSc. , in M. Coulthard ( ed.), Cambridge University Press.
Talking about Text. Birmingham: English Language Research, Flowerdew,J. (ed.) (2002) Academic Discourse. London: Longman.
Fontanella, M. (1999) 'Sistemas pronominales de tratamiento usados en el
Birmingham University, pp. 128-45.
mundo hispánico', in I. Bosque and V. Demonte (eds), Gramática
Dyer, J. and Keller-Cohen, D. (2000) 'The discursive construction of pro-
fessional self through narratives of personal experience'. Discourse Studies Descriptiva de la Lengua Española. Madrid: Espasa Cal pe, pp. 1399-1426.
Fortanet, I. (2005) 'Honoris Causa speeches: an approach to structure'.
2, (3), 283-304.
R. (1991) 'Futuro analítico y futuro sintético en tres obras con Discourse Studies 7, ( 1), 31-51.
rasgos coloquiales: El Corbacho, La Celestina v La lozana andaluza' in Francis, N. (1979) 'A tagged corpus: Problems and prospects', in
K. H. Korner (ed.), HomenajeaHansFlasche. Stuttgart: Steiner, pp. 499-508. S. Greenbaum, G. Lee ch and J. Svartvik ( eds), Studies in English Lz·nv1us1;zcs
Eggins, S. and Martin,J. R. (2003) 'Context as genre: a functional linguistic for Randolph Quirk. London: Longman, pp. 192-209.
Fuentes, J. (1985) Gramática Moderna de la Lengua Madrid:
perspective'. Revista Signos 36, (54), 185-205.
Ellis, N. (1996) 'Sequencing in SIA: Phonological memory, chunking, and Editorial Bibliográfica Chilena.
points of order'. Studies in Second Language Acquisition 19, 91-126. Galán, C. (1999) 'La subordinación causal y final', in l. Bosque and
V. Demonte ( eds), Gramática Descriptiva de la Española. Madrid:
Escandell, V. (1993) 'Conectivas: el caso de la conjunción y', in V. Escandell,
Introducción a la Pragmática. Barcelona: Anthropos, pp. 185-97. Espasa Calpe, pp. 3597-642.
Ferguson, C. A. (1983) 'Sports announcer talk: Syntactic aspects ofregister Gallardo, S. (1999) 'Evidencialidad: la certeza y la duda en los textos
variation', Language in Society 12, 153-72. periodísticos sobre ciencia'. Revista de Lingüística Teórica y
Fernández, A., Vázquez, G., Martí, A. and Castellón, I. (1999) 'Los 37, 53-66.
predicados de cambio y su representación en una BCL'. Online. Gallardo, S. (2005) Los Médicos Recomiendan. Un Estudio de las Notas
Retrieved from: http:/ /www.sepln.org/revistaSEPLN/revista/24/24- Periodísticas sobre Salud. Buenos Aires: Eudeba.
Camero, S. (2001) La Traducción de Textos Técnicos. Barcelona: Ariel Lenguas
Fernández, (1999) 'El pronombre personal. Formas y distribuciones. Modernas.
Genette, G. !JI. Paris: Le Seuil (Points).
Pronombres átonos y tónicos', in I. Bosque and V. Demonte (eds),
Ghadessy, M. (1993) Register Theory and Practice. London:
Gramática Descriptiva de la Lengua Española. Madrid: Espasa Calpe,
pp. 1209-74. Pin ter.
de Sintaxis Española. Barcelona: Vox.
Ferrari, L. 'Modalidad epistémica y grados de certeza en los artícu- Gili
of classification in LSP
los de . Ponencia Presentada en el lll de del
Symposium on LSP, Copenhagen,
Mercosur: De la Teoría a la Praxis de las Lenguas, Universidad Nacional del
Argentina. Denmark.
genres.
S. ( 1999) 'Los marcadores de evidencialidad Glaser, R.
en una controversia ambiental'. Discurso y Fachsprache.'
Gnutzmann, C. and ~,.~~"~
verbs based LSP-research: Theoretical considerations
from: in H. Schroder
/Y11rhr><'"' and Text
.rn1x11fl.'w 1-1o·~"Mn
Léxico Fundamental
Ediciones Universitarias de and Martin, J. Science: Discursive
Power. Pittsburgh: University of Pittsburgh Press.
Hartley, J. and Kostoff, R. (2003) 'How useful are "key words" in scientific
journals?'. Journal of Information Science 29, (5), 433-8.
Harvey, A. (2002) 'Representación e imagen del quehacer científico en los
medios de comunicación', in G. Parodi (ed.), Lingüística e
Interdisciplinariedad. Desafíos del Nuevo Milenio. [...] en Honor a Marianne
Peronard. Valparaíso: Ediciones Universitarias de Valparaíso, pp. 335-53.
Heinemann, W. (2000) 'Textsorten. Zur Diskussion um Basisklassen des
Kommunizierens. Rückschau und Ausblick', in K. Adamzik (ed.),
Textsorten. Reflexionen und Analysen. Tübingen: Stauffenburg, pp. 9-29.
Heinemann, W. and Heinemann, M. (2002) Grundlagen der Textlinguistik.
y medio de
26th Annual
pp. 234-54.
edge" theories: Can the
genre of the textbook accommodate both?', in J. Flowerdew
Academic Discourse. London:
Lazaraton, A. (2002) 'Quantitative and qualitative approaches to discourse
analysis'. Annual Review of Applied Linguistics 22, 32-51.
Leech, G. (1991) 'The state of the art in corpus linguistics', in K. Aijmer and
B. Altenberg (eds), English Corpus Linguistics. Studies in Honour of Jan
Svartvik. London: Longman, pp. 8-29.
Leech, G. (1992) 'Corpora and theories of linguistic performance', in
J. Svartvik (ed.), Directions in Corpus Linguistics: Proceedings of Nobel
Symposium. Berlin: de Gruyter, pp. 105-22.
Levin, B. (1993) English Verb Classes and Alternations. A Preliminary
Investigation. Chicago: The University of Chicago Press.
Lledó, E. (1995) 'Usos lingüísticos y género'. Textos de Didáctica de la Lengua
y la Literatura 6, 29-34.
Longacre, R. (1983) The Grammar of Discourse. New York: Plenum.
Lope Blanch, J. M. (ed.) (1977) Estudios Sobre el Español Hablado en las
Principales Ciudades de América. Mexico: Universidad Nacional
Autónoma de México.
Lope Blanch, J. (1991) Estudios Sobre el Español de México. Mexico:
Universidad Nacional Autónoma de México.
(1999) 'Relaciones
, in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la Lengua
Española. Madrid: Espasa Calpe, pp. 3507-47.
C. (2001) 'La comunicación del saber en los géneros académicos:
de modalidad de evidencialidad'.
Manning, C. and Schütze, H. (2003) Foundations of Statistical Natural
Language Processing. Cambridge, MA: MIT Press.
Marcos, F. (1975) Aproximación a la Gramática Española. Madrid: Editorial
Cincel.
Markkanen, R. and Schröder, H. (2000) 'Hedging: A challenge for
pragmatics and discourse analysis'. Online. Retrieved from:
http://sw2.euv-frankfurt-o.de/Publikationen/Hedging/markkane.html.
Martín, G. (1986) Curso de Redacción. Madrid: Paraninfo.
Martin, J. (1992) English Text. Amsterdam: Benjamins.
Martin, J. (1993) 'Technicality and abstraction', in M. Halliday and
J. Martin, Writing Science: Literacy and Discursive Power. Pittsburgh:
University of Pittsburgh Press, pp. 23-46.
Martin, J. (1997) 'Register and genre: Modeling social context in functional
linguistics - narrative genres', in E. Pedro (ed.), Proceedings of the First Lisbon
International Meeting on Discourse Analysis. Lisbon: Colibri/APL, pp. 212-56.
Martin, J. (1998) 'A modeling context: The crooked path of progress in
contextual linguistics (Sydney SFL)', in M. Ghadessy (ed.), Text and Context in
Functional Linguistics. Amsterdam: Benjamins, pp. 134-87.
Martin, J. (2001) 'Beyond exchange: APPRAISAL systems in English', in
S. Hunston and G. Thompson (eds), Evaluation in Text: Authorial Stance and
the Construction of Discourse. Oxford: Oxford University Press, pp. 34-55.
Martin, J. and Rose, D. (2003) Working with Discourse: Meaning Beyond the
Clause. London: Continuum.
Martin, J. and Veel, R. (eds) (1998) Reading Science: Critical and Functional
Perspectives on Discourses of Science. London: Routledge.
Martin, J., Christie, F. and Rothery, J. (1987) 'Social processes in education.
A reply to Sawyer and Watson (and others)', in I. Reid (ed.), The Place of
Genre in Learning: Current Debates. Geelong, Australia: Deakin University
Press, pp. 46-57.
'Las el
Bernal and J. DeCesaris
P. 'A genre analysis of English and Spanish research paper
Paz Battaner: Barcelona: Institut Universitari social sciences'. Specific
11JHLdUct, pp. 147-59.
S.
and Portolés, J. 'Los marcadores del discurso', in I. Bosque
and V. Demonte (eds), Gramática Descriptiva de la Lengua
Española. Madrid: Espasa Calpe, pp. 4051-213.
Press.
Universitarias de
Demonte
Española. Madrid: Espasa Calpe, pp. 1575-630.
Mendikoetxea, A. (1999b) 'Construcciones con se: Medias, pasivas e
impersonales', in I. Bosque and V. Demonte (eds), Gramática Descriptiva de la
Lengua Española. Madrid: Espasa Calpe, pp. 1631-722.
S. (1999) 'El discurso del libro de texto: Una propuesta
estratégico-pragmática'. Revista Iberoamericana de Discurso y Sociedad 1, (2),
85-104.
G. (2003) 'Paradigma científico y lenguaje especializado'. Revista
de la Facultad de Ingeniería de la Universidad Central de Venezuela 18, (3),
5-14.
Moliner, M. (1986) Diccionario de Uso del Español. Madrid: Gredos.
Montolío, E. (2001) Conectores de la Lengua Escrita. Barcelona: Ariel
Practicum.
Morante, R., Castellón, I. and Vázquez, G. 'Los verbos de
trayectoria'. Online. Retrieved from: http://grial.uab.es/archivos/2000-13.pdf.
Moreno de Alba, J. G. (1970) 'Vitalidad del futuro de indicativo en la norma
culta del español hablado en México'. Anuario de Letras 8, 81-102.
Moyano, E. (2000) Comunicar Ciencia. El Artículo Científico y las
Comunicaciones a Congresos. Buenos Aires: Universidad Nacional de Lomas
de Zamora.
Parodi, G. (2004) 'Textos de especialidad y comunidades discursivas
técnico-profesionales: Una aproximación basada en corpus
computarizado'. Estudios Filológicos 39, 7-36.
Parodi, G. (ed.) (2005a) Discurso Especializado e Instituciones Formadoras.
Valparaíso: Ediciones Universitarias de Valparaíso.
Parodi, G. (2005b) 'La comprensión del discurso especializado escrito
en ámbitos técnico-profesionales: ¿Aprendiendo a partir del texto?'.
Revista Signos 38, (58), 221-67.
Parodi, G. (2005c) 'Lingüística de corpus y análisis multidimensional:
Exploración de la variación en el corpus PUCV-2003: Una
aproximación multiniveles', in G. Parodi (ed.), Discurso Especializado e
Instituciones Formadoras. Valparaíso: Ediciones Universitarias de Valparaíso,
pp. 83-125.
Parodi, G. (2006) 'Reading-writing connections: Discourse-oriented
research'. Reading and Writing Interdisciplinary Journal July, 1-26.
Parodi, G. (2007) Lingüística de Corpus. Buenos Aires: Eudeba.
Parodi, G. and Gramajo, A. (2003) 'Los tipos textuales del corpus
PUCV-2003: una aproximación multiniveles'. Revista Signos 36, 207-23.
Parodi, G. and Venegas, R. (2004) 'BUCOLICO: Aplicación computacional
para el análisis de textos (hacia un análisis de rasgos de la
informatividad)'. Lingüística y Literatura 15, 223-51.
A. and J. 'From to
'La conferencia académica', in L. Cubo de Severino
, Los Textos de la Ciencia. Córdoba:
pp. 189-217.
M. (l un texto escrito?', in
M. L. Gómez, G. Parodi and P. Núñez, de
Textos Escritos: De la Teoría a la Sala de Clases. de Chile: Andrés
Bello, pp. 55-78.
M., Gómez, L., G. and
Textos Escritos: De la Teoría a la Sala de Clases.
Andrés Bello.
¿Cómo se escribe
'Clasificación verbal:
,)nUafY'm,a 3. Lleida: Edicions de la
K (1998)
Gemsbacher and S.
of Science.
patterns of lexis
International
'98,
201-3
6,
181, 183,
features 19, 41-53 185-8
WORKING WITH SPANISH CORPORA INDEX
notion of 94
12
142-3 stance 94, 95, 96
use in different areas 134-42 126 157-61
multi-dimensional 14, 55, Protoverbs 119 Didactic Guidelines 159,
57-9 invoked audience 161-3
differences between and 56 of
other languages 85 register differences 56 Technical Article 159, 160-1
methodological steps 58-9 register, factor for linguistic variation style stance 95 textbooks 170
of register variation in Spanish 54-5 subjective assessment expressions textual criteria 154-5
59-65, 82 register features 56 189, 190 textual organizer 182-3, 184
salient characteristics 57-8 register, Halliday's definition of
multi-functional analysis, performing 57 tag fields 62 variability in PUCV-2003 corpus 32-8
a 18 register markers 56 tagger, grammatical 59-60, 62, 75 verba dicendi 110
multimodality, and texts 157 register, the concept of 95 Technical-Scientific (TSC) verbs, studies on 110
multiple feature analysis 110 register variation 11-12, 54-6, 57, 15, 16, 150 verbs and corpora 110-13
multi-word prefabricated expressions 59-65,84, 126 temporal distance 136, 137-8,
217 general patterns of 84 142-3 written competence, of students 173,
variation across registers 14 tests oflinguistic competence 145 175
neutral communication verbs 113, registers 54-6
116, 119-20, 126 of Spanish 115
NOTICEN TV 130 relationships between 24
researching Spanish
focalization 14 computational resources 6-10
Oral Interviews Corpus (OIC) 15, 17 databases 6-10
orality 93, 96, 104 references 6-10
orality/writing dyad 93 websites 6-10