
Running head: CORPUS ANALYSIS OF LOVE 1

Corpus Analysis of love in British and American Literature

Lauren Porter

Colorado State University


CORPUS ANALYSIS OF LOVE IN BRITISH & AMERICAN LIT. 2

Introduction

The aim of this research is to use corpus analysis to identify culturally distinct linguistic

representations of the concept of love, as expressed using the word "love," in

British and American literature. Using both quantitative and qualitative analysis, this

research examines the representations of love in F. Scott Fitzgerald's

The Great Gatsby (published in 1925) and Jane Austen's Pride and Prejudice (published

in 1813). While universal ideas and realities (such as the abstract representation and

actual physical expression of love) exist across all cultures, cultures express these

universal ideas in different ways. This research therefore aims to answer the question,

"Do British and American literary classics, as representations of their respective cultures,

linguistically represent love differently? If so, how?" In more general terms, this study

aims to identify the linguistic differences between cultures in how they express shared

abstract ideas.

These two texts were chosen for corpus analysis because both are literary classics

that are representative of the time period in which they were written and are still

widely read today. Additionally, both texts share the themes of love, class, and

courtship, which gives the comparison common ground. To understand the quantitative

and qualitative findings of the corpus analysis, it is helpful to have a brief background

and context for each novel.

F. Scott Fitzgerald's The Great Gatsby was published in 1925 in America. The

book follows Nick Carraway as he moves East to Long Island to work in the bonds

business. There, he befriends Jay Gatsby, who is in love with Nick's cousin, Daisy. The

novel follows Nick as he participates in the New York social scene, and we learn of the

destructive behaviors, affairs, and scandals of the various characters. In terms of

historical context, the book is set in 1920s America (the Roaring 20s), which was a time

of prosperity, wealth, jazz, extravagance, and Prohibition before the Great Depression.

Many people valued wealth and affluence, and because of the economic boom,

Americans experienced wealth and consumerism as never before.

Jane Austen's Pride and Prejudice was published in 1813 in England. The novel

follows the Bennets, who have five daughters, and the courtship of those

daughters by various suitors, focusing mainly on two of the sisters: Jane and Elizabeth.

Elizabeth meets Mr. Darcy, whom she finds arrogant, but after a series of twists and turns

in the plot (and more proposals), she accepts his proposal and marries him.

In terms of historical context, this book is set in 19th-century England and was

written during the Romantic period, when literature was marked by the emphasis on

emotion, individualism, and nature. Some of these themes and feelings were a reaction to

the Industrial Revolution and modernization of England that was occurring during that

time.

Considerations

While this study aims to examine cultural representations of love in both literary

works, the variable of culture was not isolated. The variable of culture exists because

one book is American and one is British; however, there are two other significant

variables that need to be considered and that could have affected the results. The first is the

date of publication, as the books were written in two different centuries. As a result, the corpus

analysis results may represent not only cultural differences, but also time-period

differences. A more controlled comparison would use two novels from the same

time period. The second important consideration is that one novel was written by a man

and the other by a woman. It is possible that gender differences also contributed to

differences in the representation of love; however, it is hard to determine where this

variable may have affected the results. In order to isolate the variable of culture most

effectively, future work should choose novels written in the same period and by

authors of the same gender.

Literature Review

Stubbs (2001) provides a good introduction to corpus analysis: what it is and how

it can be useful in understanding language. Stubbs (2001) explains that we judge

a text's difficulty or ease of understanding by comparing it to other things that we have

read or heard in the past. Stubbs says, "This means that individual texts are interpreted

against an intertextual background of norms of language use. These norms, which are

expressed largely in collocations of words, can be revealed by the computer-assisted

analysis of large corpora" (p. 304). Stubbs (2001) proceeds by describing how

comparisons using corpora can help to understand text cohesion, intertextual relations,

and the extent to which linguistic competence includes knowledge of norms of language

use (collocations). In regard to the current study, Stubbs's information is helpful mostly in

terms of intertextuality. While the two novels were not compared to a more

general corpus, they were compared to each other, and corpus analysis allowed for the

(intertextual) comparison of not only collocations but many other linguistic features, using the

word "love" as the focus.



Corpus analysis has many applications, and was used by Baker et al. (2008), in

conjunction with critical discourse analysis, to identify common categories of the

representation of refugees, asylum seekers, immigrants, and migrants (RASIM) in British

news articles. The study used collocations and concordance analysis to identify

categorical representations of these groups. The study also directed analysts to

representative texts to carry out qualitative analysis on the topic as well. This article is

helpful for this study because, like the present study, it combines quantitative and qualitative

analyses. The studies differ in that the present one does not

incorporate critical discourse analysis, which focuses on theoretical concepts such as

power, ideology, and domination. Instead, the current study focuses on the cultural

representation of the abstract concept of love, but is not concerned with power or

ideological relations.

The Baker et al. (2008) study is also helpful for the current one because one of its

research questions was, "What attitudes towards RASIM emerge from the body of UK

newspapers seen as a whole?" (Baker et al., 2008, p. 276), which indicates that

quantitative data was used to make qualitative inferences, similar to the current study.

The authors also note that while corpus linguistic methods allow for a

reasonably high level of objectivity, subjective researcher

input is typically involved in every stage of the analysis (Baker et al., 2008, p. 277). This

is important to note because, during the transition from quantitative to qualitative analysis

(or the qualitative implication of quantitative data), there is subjectivity on the part of the

researcher in determining what the quantitative data might mean, qualitatively. Like the

present study, Baker et al. (2008) supplemented collocation findings with concordances,

which allow the researcher and analyst to see the words in a larger context. As Baker et

al. say, "Concordance analysis affords the examination of language features in co-text,

while taking into account the context that the analyst is aware of and can infer from the

context" (2008, p. 279). Viewing concordances within larger contexts was important for

this study, and, as Baker et al. wrote, helped with making inferences. Once again, it

should be noted that inferences are subjective by nature, and thus subjectivity is a part of

this study.

A shortcoming of the Baker et al. (2008) study was that while the corpus

linguistic analysis used the whole corpus, time and money constraints meant that the critical

discourse analysis could not. Instead, the researchers had to

choose a sample of texts from the corpus to use for the critical discourse analysis.

However, given that the current study is not concerned with critical discourse analysis,

Baker et al. (2008) remains very helpful.

Corpus analysis is a large area of applied linguistics, and in addition to being used

alongside critical discourse analysis, it has previously been used to analyze literature.

Fischer-Starcke (2010) has used corpus linguistics in literary analysis, specifically

with Jane Austen and her contemporaries. Fischer-Starcke's (2010) book

provides an introduction to corpus analysis and shows its application to

literary texts, specifically Austen's novel Northanger Abbey, corpora of her

other novels, and corpora of texts by Austen's contemporaries. The analysis focuses

on the impact of quantitative keywords, phraseological units, and frequent words as they

affect literary meanings and structural organization. Fischer-Starcke's (2010) work is

helpful because it is another example that demonstrates the wide applicability of corpus

linguistics. Additionally, it demonstrates that there is interest in the work of Jane

Austen outside of this study, which indicates that this is an area where, though research has

begun, more can be added. Similar to Baker et al. (2008), Fischer-Starcke (2010) also

addresses issues regarding subjectivity and objectivity in corpus analysis, which is relevant to

this study because subjectivity can make its results fallible. Unlike

Fischer-Starcke (2010), this research does not focus

on literary meanings or structural organization. Instead, the current study

aims to focus on cultural meanings that exist outside of the text.

Fischer-Starcke (2009) is a corpus analysis of Jane Austen's Pride and Prejudice.

The study used corpus analysis to reveal meaning in fiction, whereas previous studies

targeted non-fiction work. Fischer-Starcke's (2009) study used keywords and frequent

phrases in Pride and Prejudice to reveal literary meanings that had not been identified

by literary criticism or in secondary sources of literary criticism. The study thus

provides evidence for the potential of corpus analysis in literature. This

is important to the current study, which also applies corpus analysis to

fiction, because it lends credibility to the use of corpus analysis with fiction.

However, unlike Fischer-Starcke (2009), the current study is not concerned with literary

meaning, and it compares two works of fiction instead of focusing on one.

This study contributes to the field of corpus analysis of fiction by comparing

two works of fiction. The research is new because the corpus analysis

is not intended to be used for literary analysis. Instead, this work bridges

the fields of literature and sociolinguistics via the use of corpus analysis.

Method

This study uses corpus data collected from Jane Austen's Pride and Prejudice and

F. Scott Fitzgerald's The Great Gatsby in order to identify cultural representations of love

via the use of the word "love" in both works. To achieve this, both quantitative

and qualitative methods are used.

Using the corpus software AntConc, both texts were analyzed for the total

frequency of the lexical item "love." Then, the software was used to

determine n-grams of "love" within each text. Afterward, the software was used to locate

collocates of "love" in both novels, with a minimum frequency of 2 per novel and a

span of three words to the left and right (3L-3R). Finally, each instance of "love" was located in its

concordance line and larger context in both texts. This is where the data was

analyzed qualitatively, because here the lexical item "love" could be

examined within a larger context.
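
The quantitative steps described above can be illustrated with a minimal Python sketch. It is not AntConc's implementation and may not reproduce its counts exactly; the tokenizer, the function names, and the sample sentence are assumptions made for illustration only. Splitting on non-letters mirrors the way "t" and "s" appear as separate collocates in Tables 4 and 5.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter (an assumption;
    # AntConc's own token definition is configurable and may differ).
    return re.findall(r"[a-z]+", text.lower())


def node_frequency(tokens, node="love"):
    # Total number of occurrences of the node word.
    return sum(1 for tok in tokens if tok == node)


def right_bigrams(tokens, node="love"):
    # Two-word clusters that begin with the node word.
    return Counter(
        (tokens[i], tokens[i + 1])
        for i in range(len(tokens) - 1)
        if tokens[i] == node
    )


def collocates(tokens, node="love", span=3, min_freq=2):
    # Count words within `span` tokens to the left and right of each node
    # occurrence, keeping only those that reach `min_freq` overall.
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    total = left + right
    # Each value is (total frequency, frequency left, frequency right).
    return {w: (n, left[w], right[w]) for w, n in total.items() if n >= min_freq}


# Invented sample sentence, not taken from either novel.
sample_tokens = tokenize("She was in love with him, and he was in love with her.")
print(node_frequency(sample_tokens))
print(right_bigrams(sample_tokens))
print(collocates(sample_tokens))
```

The three-value tuples follow the column layout of Tables 4 and 5 (Frequency, Freq (L), Freq (R)).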

This qualitative analysis was coded based on the quantitative data, which derived

from the collocations and concordance lines of "love." The data was coded based on the

collocates' parts of speech. The parts of speech were coded in any position within the

collocation span. In other words, it did not matter where in the span the

frequent collocate occurred; it was analyzed based on its frequency. For ease of coding,

these parts of speech were put into four categories: 1) article or preposition; 2) pronoun,

noun, or possessive; 3) adjective, adverb, or verb; and 4) copular be ("is"). These concordance

lines were analyzed in relation to the larger context of the paragraph in which they

occurred. This allowed for a qualitative interpretation of the lexical item in the context of the

literature, and then this information was used in order to make inferences about the

cultural representation of love as a whole.
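
As an illustration of how coded collocates could be tallied into the four categories, the following sketch uses a small hand-built lookup. The lookup and the function name are hypothetical; in this study the coding was done by human raters reading each concordance line, not by a word list. Words the lookup does not cover (conjunctions, for example) are reported as uncoded rather than guessed. The sample frequencies are a few rows from Table 4.

```python
from collections import Counter

CATEGORIES = {
    1: "article or preposition",
    2: "pronoun, noun, or possessive",
    3: "adjective, adverb, or verb",
    4: "copular be",
}

# Hypothetical lookup for illustration; not the raters' actual coding sheet.
HAND_CODED = {
    "the": 1, "in": 1, "with": 1, "of": 1,
    "i": 2, "you": 2, "my": 2, "wife": 2,
    "ardent": 3, "violently": 3, "very": 3,
    "is": 4,
}


def tally_categories(collocate_freqs, coding=HAND_CODED):
    # Sum collocate frequencies per category; collect words with no code.
    totals = Counter({cat: 0 for cat in CATEGORIES})
    uncoded = []
    for word, freq in collocate_freqs.items():
        cat = coding.get(word)
        if cat is None:
            uncoded.append(word)
        else:
            totals[cat] += freq
    return dict(totals), uncoded


# A few collocates and their frequencies as listed in Table 4.
sample = {"i": 13, "in": 6, "with": 4, "wife": 2, "and": 6}
totals, uncoded = tally_categories(sample)
print(totals, uncoded)
```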

For this project, both texts were located first as PDFs on the web and then

converted to .txt format so that they could be used in AntConc.

Results

Tables 1-5 (below) display the results from the corpus analysis for frequency, n-

grams, and collocations. Tables 6-7 (below) display the quantitative data that was coded

and then used for qualitative analysis. For preservation of space and ease of reading, the

contextual examples of every collocation are not presented in this study.

Table 1 (below) shows the total frequency of "love" in both works, as well as the

normed frequency of "love," which allows for a comparison of the frequency of use in

each text.

Table 1.

Frequency of "love"

Book                   Frequency of "love"   Normed frequency per million words (as a proportion)

The Great Gatsby       24                    478 (0.000478)

Pride and Prejudice    91                    723 (0.000723)
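
The normed frequency in Table 1 can be read as occurrences per million words. A minimal sketch of that calculation follows; the function name and the token total in the example are assumptions for illustration, since the exact word counts of the two text files are not reported here. The last line compares the two normed values from Table 1.

```python
def normed_frequency(count, total_tokens, per=1_000_000):
    # Occurrences of the item per `per` tokens of running text.
    return count * per / total_tokens


# Hypothetical token total, for illustration only (not a figure from this study).
example = normed_frequency(24, 50_000)

# Ratio of the normed frequencies reported in Table 1
# (Pride and Prejudice relative to The Great Gatsby).
ratio = 723 / 478
print(example, round(ratio, 2))
```

Norming by text length is what makes the comparison meaningful, because Pride and Prejudice is a much longer text than The Great Gatsby.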



Table 2 (below) presents the n-grams/clusters of "love" in The Great Gatsby.

Table 2.

N-grams of "love" in The Great Gatsby

N-Gram Frequency

love with 4

love to 3

love you 3

love belongs 1

love daisy 1

love every 1

love her 1

love him 1

love himpossibly 1

love it 1

love nest 1

love new 1

love through 1

love, but 1

love, nick 1

love, nor 1

love, of 1

Table 3 (below) presents the n-grams/clusters of "love" in Pride and Prejudice.

Table 3.

N-grams of "love" in Pride and Prejudice

N-Gram Frequency

love with 17

love to 5

love him 4

love of 4

love, and 4

love in 3

love, i 3

love and 2

love as 2

love before 2

love her 2

love me 2

love; and 2

love; for 2

love a 1

love by 1

love can 1

love each 1

love for 1

love it 1

love merely 1

love mr 1

love must 1

love now 1

love or 1

love which 1

love without 1

love you 1

love!" "i 1

love' is 1

love, ardent 1

love, flirtation 1

love, from 1

love, has 1

love, it 1

love, rather 1

love, ring 1

love, should 1

love, tell 1

love, their 1

love, though 1

love," said 1

love. as 1

love. of 1

love. wherever 1

love." "it 1

love." "was 1

love; but 1

love? is 1

love?" "i 1

love?" "oh 1

Table 4 (below) shows the collocations with "love" in The Great Gatsby, with a

minimum frequency of 2 and a range of 3R-3L.

Table 4.

Collocations of "love" in The Great Gatsby

Collocation Frequency Freq (L) Freq (R)

i 13 10 3

you 9 3 6

in 6 6 0

and 6 3 3

to 5 1 4

with 4 0 4

of 4 2 2

me 4 0 4

t 3 2 1

she 3 1 2

your 2 1 1

wife 2 1 1

too 2 0 2

their 2 2 0

the 2 1 1

more 2 1 1

it 2 1 1

her 2 0 2

had 2 2 0

gatsby 2 1 1

but 2 0 2

all 2 0 2

Table 5 (below) shows the collocations with "love" in Pride and Prejudice, with a

minimum frequency of 2 and a range of 3R-3L.

Table 5.

Collocations of "love" in Pride and Prejudice

Collocation Frequency Freq (L) Freq (R)

in 42 36 6

of 24 14 10

to 19 10 9

i 18 8 10

with 17 0 17

you 15 4 11

and 15 2 13

her 14 5 9

my 11 9 2

much 10 9 1

the 9 4 5

be 8 5 3

as 8 2 6

him 7 1 6

for 7 3 4

but 7 5 2

very 6 5 1

that 6 4 2

not 6 4 2

is 6 2 4

so 5 3 2

it 5 1 4

his 5 5 0

he 5 1 4

from 5 2 3

been 5 3 2

all 5 3 2

a 5 1 4

was 4 2 2

mr 4 1 3

me 4 0 4

love 4 2 2

violently 3 3 0

they 3 2 1

s 3 2 1

really 3 3 0

must 3 1 2

if 3 2 1

friend 3 0 3

fall 3 3 0

darcy 3 0 3

world 2 1 1

well 2 0 2

though 2 0 2

there 2 0 2

their 2 1 1

than 2 0 2

still 2 1 1

should 2 1 1

other 2 0 2

or 2 0 2

one 2 0 2

object 2 1 1

now 2 1 1

nothing 2 2 0

no 2 1 1

may 2 1 1

make 2 2 0

lydia 2 2 0

herself 2 2 0

have 2 1 1

half 2 1 1

had 2 0 2

falling 2 2 0

everything 2 1 1

each 2 0 2

can 2 1 1

by 2 0 2

better 2 0 2

being 2 2 0

before 2 0 2

at 2 1 1

ardent 2 1 1

The qualitative analysis was conducted using the data from each contextual

instance of "love" in both texts. The coding of the concordances, which was used for the

qualitative analysis, is presented in Tables 6 and 7 (below) for each text. Tables 6 and 7

represent the data as negotiated between two raters. Two raters were used for the coding of

this data to improve inter-rater reliability. The first rater is a candidate for a Master's in

English, and the second rater has a Master's in Microbiology but is well-read and

well-educated. Differences in coded data could be attributed to misunderstandings of parts of

speech, or to the fact that certain words can be classified into different parts of speech based on

usage.

Table 6.

Coded collocations with "love" in The Great Gatsby

Collocation part-of-speech category     Frequency

(1) article or preposition              33

(2) pronoun/noun/possessive             39

(3) adverb/adjective/verb               6

(4) copular be                          0

Table 7.

Coded collocations with "love" in Pride and Prejudice

Collocation part-of-speech category     Frequency

(1) article or preposition              174

(2) pronoun/noun/possessive             265

(3) adverb/adjective/verb               84

(4) copular be                          18

Discussion

It should be noted that subjectivity is a part of corpus analysis, as previous

research has indicated. This discussion is therefore a combination of objective data and

subjective analysis of that data.

Pride and Prejudice uses "love" approximately 1.5 times as frequently as

The Great Gatsby (see Table 1). This suggests that Jane Austen's work has love

as more of a central theme than The Great Gatsby. While this could be attributed to the

author's gender or the time period in which the novel was published, culturally speaking, the

British representation of love through the word "love" is more frequent than the American

representation. This finding is not too surprising, as Pride and Prejudice is

centered on the courtship and engagements of multiple characters, while The Great

Gatsby approaches love in a different way. The Great Gatsby examines love more in

regard to people's aspirations: many of the characters aspired to money and wealth and prioritized

these over relational love.



In terms of the collocations, it can be seen that in The Great Gatsby, "love"

collocates most frequently with two categories: articles and prepositions, and pronouns, nouns, and

possessives. In fact, only 8% of the collocates in The Great Gatsby are adverbs,

adjectives, or verbs (see Table 6). This could mean that love is expressed with less vigor and emotion

in The Great Gatsby (and consequently in American culture) than in British culture.

Instead, love is seen as something directly attached to someone (e.g., "my love," "your

love") or as a state of mind (e.g., "in love").

Pride and Prejudice displays a more even range of collocates, and a percentage

that is twice as high (16%) for collocates that were coded as adverbs, verbs, or adjectives (see Table 7).

This can be taken to mean that there is more emotion, passion, and description in the love

that is represented in the novel (and consequently in British culture). Examples include

"ardent," "violently," and "very." These are engaging and inspiring words, which would

more likely be associated with interpersonal love.
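
The percentages cited above follow from the frequencies in Tables 6 and 7. As a quick check, the sketch below recomputes each category's share of the coded collocates; the variable and function names are assumptions for illustration, while the counts are those reported in the two tables.

```python
# Coded collocate frequencies as reported in Tables 6 and 7.
gatsby = {
    "article/preposition": 33,
    "pronoun/noun/possessive": 39,
    "adverb/adjective/verb": 6,
    "copular be": 0,
}
pride = {
    "article/preposition": 174,
    "pronoun/noun/possessive": 265,
    "adverb/adjective/verb": 84,
    "copular be": 18,
}


def shares(counts):
    # Each category's percentage of all coded collocates, to one decimal place.
    total = sum(counts.values())
    return {cat: round(100 * n / total, 1) for cat, n in counts.items()}


print(shares(gatsby))
print(shares(pride))
```

For the adverb/adjective/verb category this gives roughly 7.7% for The Great Gatsby and 15.5% for Pride and Prejudice, which the discussion rounds to 8% and 16%.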

Overall, as previously mentioned, there are many considerations with this study in

regard to the novels' abilities to represent a cultural expression of love. While time period

and author's gender need to be factored in, so does the fact that literature cannot claim to

be representative of a culture as a whole. However, these novels are classics, and because

they have maintained popularity, they are one window through which to view both

American and British cultural representations of love. If nothing else, this work

demonstrates how different works of literature (and different cultures) can represent the

same idea in very different ways, and how corpus analysis can be used as a tool to

compare linguistic representations of an idea across different cultures.



References

Austen, J. (1813). Pride and prejudice. London: Whitehall.

Baker, P., et al. (2008). A useful methodological synergy? Combining critical discourse analysis

and corpus linguistics to examine discourses of refugees and asylum seekers in

the UK press. Discourse & Society, 19(3), 273-306.

doi: 10.1177/0957926508088962

Fischer-Starcke, B. (2009). Keywords and frequent phrases of Jane Austen's Pride and

Prejudice: A corpus-stylistic analysis. International Journal of Corpus Linguistics, 14(4), 492-523.

Fischer-Starcke, B. (2010). Corpus linguistics in literary analysis: Jane Austen and her

contemporaries. Continuum.

Fitzgerald, F. S. (1925). The great Gatsby. New York: Simon & Schuster.

Stubbs, M. (2001). Computer-assisted text and corpus analysis: Lexical cohesion and

communicative competence. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The

handbook of discourse analysis (pp. 304-321). London: Blackwell Publishers.
