Running Head: CORPUS ANALYSIS OF "LOVE" 1

Corpus Analysis of "Love" in British and American Literature
Lauren Porter
Colorado State University

Introduction
The aim of this research is to use corpus analysis to identify different linguistic and
cultural representations of the concept of love, as expressed through the word "love," in
British and American literature. Using both quantitative and qualitative analysis, this
research examines the representations of love in F. Scott Fitzgerald's
The Great Gatsby (published in 1925) and Jane Austen's Pride and Prejudice (published
in 1813). While universal ideas and realities (such as the abstract representation and
physical expression of love) exist across all cultures, cultures express these
universal ideas in different ways. This research therefore aims to answer the question,
"Do British and American literary classics, as representations of their respective cultures,
linguistically represent love differently? If so, how?" In more general terms, this study
aims to elicit the linguistic and cultural differences that surround shared abstract ideas,
as cultures vary in how they represent concepts linguistically.
These two texts were chosen for corpus analysis because both are literary
classics that are representative of the time period in which they were
written and are still widely read today. Additionally, both texts share the themes of
love, class, and courtship, which gives the two corpora common ground for comparison. In
order to understand the quantitative and qualitative findings of the corpus analysis,
it is important to understand the background and context of each
novel.
F. Scott Fitzgerald's The Great Gatsby was published in 1925 in America. The
book follows Nick Carraway as he moves East to Long Island to work in the bond
business. There, he befriends Jay Gatsby, who is in love with Nick's cousin, Daisy. The
novel follows Nick as he participates in the New York social scene, and we learn of the
destructive behaviors, affairs, and scandals of the various characters. In terms of
historical context, the book is set in 1920s America (the "Roaring '20s"), a time
of prosperity, wealth, jazz, extravagance, and Prohibition before the Great Depression.
Many people valued wealth and affluence, and because of the economic boom,
Americans experienced wealth and consumerism as never before.
Jane Austen's Pride and Prejudice was published in 1813 in England. The novel
follows the Bennets, who have five daughters. The book follows the courtship of the
daughters by various suitors, but mainly follows two of the sisters, Jane and Elizabeth.
Elizabeth meets Mr. Darcy, whom she finds arrogant, but after a series of twists and turns
in the plot (and more than one proposal), she accepts his proposal and marries him.
In terms of historical context, the book is set in 19th-century England and was
written during the Romantic period, when literature was marked by an emphasis on
emotion, individualism, and nature. Some of these themes and feelings were a reaction to
the Industrial Revolution and the modernization of England that was occurring during that
time.
Considerations
While this study aims to examine cultural representations of love in both literary
works, the variable of culture was not isolated. Culture is a variable because
one book is American and one is British; however, there are two other significant
variables that need to be considered and that could have affected the results. The first is the
date of publication, as the books were written in two different centuries. The corpus
analysis results may therefore reflect not only cultural differences but also time-period
differences. A more controlled comparison would use two novels from the same
time period. The second important consideration is that one novel was written by a man
and the other by a woman. It is possible that gender differences also contributed to
differences in the representation of love; however, it is hard to determine where this
variable may have affected the results. In order to isolate the variable of culture most
effectively, it would be pertinent to choose novels written in the same period and by
authors of the same gender.
Literature Review
Stubbs (2001) provides a good introduction to corpus analysis: what it is, and how
it can be useful in understanding language. Stubbs (2001) explains that we judge
a text's difficulty or ease of understanding against other things that we have
read or heard in the past. Stubbs says, "This means that individual texts are interpreted
against an intertextual background of norms of language use. These norms, which are
expressed largely in collocations of words, can be revealed by the computer-assisted
analysis of large corpora" (p. 304). Stubbs (2001) proceeds by describing how
comparisons using corpora can help to understand text cohesion, intertextual relations,
and the extent to which linguistic competence includes knowledge of norms of language
use (collocations). In regard to the current study, Stubbs's information is helpful mostly in
terms of intertextuality. While the corpora of these two novels were not compared to a more
general corpus, they were compared to each other, and corpus analysis allowed for the
intertextual comparison of not only collocations but many other linguistic features, using the
word "love" as the focus.

Corpus analysis has many applications, and was used by Baker et al. (2008), in
conjunction with critical discourse analysis, to identify common categories of the
representation of refugees, asylum seekers, immigrants, and migrants (RASIM) in British
news articles. The study used collocation and concordance analysis to identify
categorical representations of these groups. The corpus results also directed analysts to
representative texts on which to carry out qualitative analysis. This article is
helpful for this study because the researchers combined quantitative and qualitative
analyses, as this study does. The studies differ in that the present one does not
incorporate critical discourse analysis, which focuses on theoretical concepts such as
power, ideology, and domination. Instead, the current study focuses on the cultural
representation of the abstract concept of love and is not concerned with power or
ideological relations.
The Baker et al. (2008) study is also helpful for the current one because one of its
research questions was, "what attitudes towards RASIM emerge from the body of UK
newspapers seen as a whole?" (Baker et al., 2008, p. 276), which indicates that
quantitative data was used to make qualitative inferences, as in the current study.
The authors also aptly note that while corpus linguistic methods allow for a
reasonably high level of objectivity, subjective researcher
input is typically involved in every stage of the analysis (Baker et al., 2008, p. 277). This
is important to note because, during the transition from quantitative to qualitative analysis
(or the qualitative interpretation of quantitative data), there is subjectivity on the part of the
researcher in determining what the quantitative data might mean qualitatively. Like the
present study, Baker et al. (2008) supplemented collocation findings with concordances,
which allow the analyst to see the words in a larger context. As Baker et
al. say, "Concordance analysis affords the examination of language features in co-text,
while taking into account the context that the analyst is aware of and can infer from the
context" (2008, p. 279). Viewing concordances within larger contexts was important for
this study and, as Baker et al. wrote, helped with making inferences. Once again, it
should be noted that inferences are subjective by nature, and thus subjectivity is a part of
this study.
A shortcoming of the Baker et al. (2008) study was that while the corpus
linguistic analysis used the whole corpus, time and money constraints meant that the critical
discourse analysis could not. Instead, the researchers had to
choose a sample of texts from the corpus for the critical discourse analysis.
However, given that the current study is not concerned with critical discourse analysis,
this limitation does not reduce the study's usefulness here.
Corpus analysis is a large area of applied linguistics and, in addition to being used
alongside critical discourse analysis, has previously been used to analyze literature.
Fischer-Starcke (2010) has used corpus linguistics in literary analysis, specifically
with Jane Austen and her contemporaries. Fischer-Starcke's (2010) book
provides an introduction to corpus analysis and shows its application to
literary texts, specifically Austen's novel Northanger Abbey, corpora of her
other novels, and corpora of texts by Austen's contemporaries. The analysis focuses
on quantitative keywords, phraseological units, and frequent words as they
affect literary meanings and structural organization. Fischer-Starcke's (2010) work is
helpful because it is another example that demonstrates corpus linguistics' wide range of
applications. Additionally, it demonstrates that there is interest in the work of Jane
Austen outside of this study, which indicates that this is an area where, though research has
begun, more can be added. Like Baker et al. (2008), Fischer-Starcke (2010) also
addresses issues of subjectivity and objectivity in corpus analysis, which supports
this study because one consideration of the current research is the fallibility of results that
subjectivity can introduce. Unlike Fischer-Starcke (2010), this research does not focus
on literary meanings or structural organization. Instead, the current study does quite the
opposite, as it focuses on cultural meanings that exist outside of the text.
Fischer-Starcke (2009) is a corpus analysis of Jane Austen's Pride and Prejudice.
The study used corpus analysis to reveal meaning in fiction, whereas previous studies
targeted non-fiction work. Fischer-Starcke's (2009) study used keywords and frequent
phrases in Pride and Prejudice to reveal literary meanings that had not been identified
in literary criticism or in critical secondary sources. The implication of
this study is that corpus analysis has potential in the study of literature. This
is important to the current study, which applies corpus analysis to two pieces of
fiction, because it lends credibility to the use of corpus analysis with fiction.
However, unlike Fischer-Starcke (2009), the current study is not concerned with literary
meaning, and it compares two works of fiction instead of focusing on one.
This study contributes to the corpus analysis of fiction by
comparing two pieces of fiction. The research is new because the corpus analysis
is not intended to be used for literary analysis. Instead, this work bridges
the fields of literature and sociolinguistics via the use of corpus analysis.
Method
This study uses corpus data collected from Jane Austen's Pride and Prejudice and
F. Scott Fitzgerald's The Great Gatsby in order to identify cultural representations of love
via the use of the word "love" in both works. In order to achieve this, both quantitative
and qualitative methods are used.
Using the corpus software AntConc, both texts were analyzed for the total
frequency of the lexical item "love." Then, the software was used to
determine n-grams of "love" within each text. Afterward, the software was used to locate
collocates of "love" in both novels, with a minimum frequency of 2 per novel and a
span of 3L-3R (three words to the left and three to the right of "love"). Finally, each
instance of "love" was located in its concordance line and larger context in both texts.
This is where the data was analyzed qualitatively, because it is where the lexical item
"love" could be examined within a larger context.
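
The counting steps above can be approximated outside AntConc. The following is a minimal Python sketch, not AntConc's own implementation: the function names and the letters-only tokenizer are assumptions made for illustration, and results may differ from the tables in this paper because AntConc's token definition and n-gram settings are configurable. Splitting on apostrophes is one way fragments such as "t" and "s" can surface as collocates.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a token is a run of ASCII letters, lowercased.
    # Apostrophes split words, so "don't" becomes "don" and "t".
    return re.findall(r"[a-z]+", text.lower())


def node_frequency(tokens, node="love"):
    # Raw frequency of the node word.
    return sum(1 for tok in tokens if tok == node)


def right_bigrams(tokens, node="love"):
    # Two-word clusters that begin with the node word.
    return Counter(
        (tokens[i], tokens[i + 1])
        for i in range(len(tokens) - 1)
        if tokens[i] == node
    )


def collocates(tokens, node="love", span=3, min_freq=2):
    # Count words within `span` tokens to the left and right of each node.
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left.update(tokens[max(0, i - span):i])
        right.update(tokens[i + 1:i + 1 + span])
    total = left + right
    # Map each collocate to (total, left count, right count).
    return {
        word: (count, left[word], right[word])
        for word, count in total.items()
        if count >= min_freq
    }
```

To use the sketch on one of the novels, read the converted .txt file into a string, pass it to `tokenize`, and hand the token list to the other three functions. The tuple returned for each collocate mirrors the Frequency, Freq (L), and Freq (R) columns of Tables 4 and 5.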
The qualitative analysis was based on coding of the quantitative data, which was derived
from the collocations and concordance lines of "love." The data was coded according to the
collocates' parts of speech. The parts of speech were coded in any position within the
collocation span. In other words, it did not matter where in the span the
frequent collocate occurred; it was analyzed based on its frequency. For ease of coding,
these parts of speech were put into four categories: 1) article or preposition; 2) pronoun,
noun, or possessive; 3) adjective, adverb, or verb; 4) copular be ("is"). The concordance
lines were analyzed in relation to the larger context of the paragraph in which they
occurred. This allowed for a qualitative interpretation of the lexical item in the context of the
literature, and this information was then used to make inferences about the
cultural representation of love as a whole.
For this project, both texts were located first as PDFs on the web and then
converted to .txt format so that they could be used in AntConc.
Results
Tables 1-5 (below) display the results from the corpus analysis for frequency,
n-grams, and collocations. Tables 6-7 (below) display the quantitative data that was coded
and then used for qualitative analysis. To save space and for ease of reading, the
contextual examples of every collocation are not presented in this study.
Table 1 (below) shows the total frequency of "love" in both works, as well as the
normed frequency of "love," which allows for a comparison of the frequency of use in
each text.
Table 1.
Frequency of "love"
Book                   Frequency of "love"   Normed frequency (per million words)   Proportion of words
The Great Gatsby       24                    478                                    .000478
Pride and Prejudice    91                    723                                    .000723
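
The normed figures in Table 1 can be sanity-checked with a short calculation. This sketch assumes the normed frequency is a rate per million words, which is what the pairing of 478 with .000478 and of 723 with .000723 implies. The paper does not report the novels' total word counts, so the helper `implied_total_tokens` only back-calculates an approximate total from the reported numbers; both function names are hypothetical.

```python
def normed_frequency(count, total_tokens, per=1_000_000):
    # Occurrences per `per` running words.
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return count / total_tokens * per


def implied_total_tokens(count, normed, per=1_000_000):
    # Back-calculate the corpus size implied by a raw count and a normed rate.
    # This is an estimate derived from Table 1, not a reported word count.
    return count / normed * per


# Figures taken from Table 1.
gatsby_count, gatsby_normed = 24, 478
pride_count, pride_normed = 91, 723

gatsby_tokens = implied_total_tokens(gatsby_count, gatsby_normed)
pride_tokens = implied_total_tokens(pride_count, pride_normed)

# Ratio of the two normed rates, used when comparing the novels.
rate_ratio = pride_normed / gatsby_normed
```

Under this assumption the implied corpus sizes are roughly 50,000 words for The Great Gatsby and roughly 126,000 words for Pride and Prejudice, and the ratio of the normed rates is about 1.5.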

Table 2 (below) presents the n-grams/clusters of "love" in The Great Gatsby.
Table 2.
N-grams of "love" in The Great Gatsby
N-Gram Frequency
love with 4
love to 3
love you 3
love belongs 1
love daisy 1
love every 1
love her 1
love him 1
love himpossibly 1
love it 1
love nest 1
love new 1
love through 1
love, but 1
love, nick 1
love, nor 1
love, of 1
Table 3 (below) presents the n-grams/clusters of "love" in Pride and Prejudice.
Table 3.
N-grams of "love" in Pride and Prejudice
N-Gram Frequency
love with 17
love to 5
love him 4
love of 4
love, and 4
love in 3
love, i 3
love and 2
love as 2
love before 2
love her 2
love me 2
love; and 2
love; for 2
love a 1
love by 1
love can 1
love each 1
love for 1
love it 1
love merely 1
love mr 1
love must 1
love now 1
love or 1
love which 1
love without 1
love you 1
love!" "i 1
love' is 1
love, ardent 1
love, flirtation 1
love, from 1
love, has 1
love, it 1
love, rather 1
love, ring 1
love, should 1
love, tell 1
love, their 1
love, though 1
love," said 1
love. as 1
love. of 1
love. wherever 1
love." "it 1
love." "was 1
love; but 1
love? is 1
love?" "i 1
love?" "oh 1
Table 4 (below) shows the collocations with "love" in The Great Gatsby, with a
minimum frequency of 2 and a span of 3L-3R.
Table 4.
Collocations of "love" in The Great Gatsby
Collocation Frequency Freq (L) Freq (R)
i 13 10 3
you 9 3 6
in 6 6 0
and 6 3 3
to 5 1 4
with 4 0 4
of 4 2 2
me 4 0 4
t 3 2 1
she 3 1 2
your 2 1 1
wife 2 1 1
too 2 0 2
their 2 2 0
the 2 1 1
more 2 1 1
it 2 1 1
her 2 0 2
had 2 2 0
gatsby 2 1 1
but 2 0 2
all 2 0 2
Table 5 (below) shows the collocations with "love" in Pride and Prejudice, with a
minimum frequency of 2 and a span of 3L-3R.
Table 5.
Collocations of "love" in Pride and Prejudice
Collocation Frequency Freq (L) Freq (R)
in 42 36 6
of 24 14 10
to 19 10 9
i 18 8 10
with 17 0 17
you 15 4 11
and 15 2 13
her 14 5 9
my 11 9 2
much 10 9 1
the 9 4 5
be 8 5 3
as 8 2 6
him 7 1 6
for 7 3 4
but 7 5 2
very 6 5 1
that 6 4 2
not 6 4 2
is 6 2 4
so 5 3 2
it 5 1 4
his 5 5 0
he 5 1 4
from 5 2 3
been 5 3 2
all 5 3 2
a 5 1 4
was 4 2 2
mr 4 1 3
me 4 0 4
love 4 2 2
violently 3 3 0
they 3 2 1
s 3 2 1
really 3 3 0
must 3 1 2
if 3 2 1
friend 3 0 3
fall 3 3 0
darcy 3 0 3
world 2 1 1
well 2 0 2
though 2 0 2
there 2 0 2
their 2 1 1
than 2 0 2
still 2 1 1
should 2 1 1
other 2 0 2
or 2 0 2
one 2 0 2
object 2 1 1
now 2 1 1
nothing 2 2 0
no 2 1 1
may 2 1 1
make 2 2 0
lydia 2 2 0
herself 2 2 0
have 2 1 1
half 2 1 1
had 2 0 2
falling 2 2 0
everything 2 1 1
each 2 0 2
can 2 1 1
by 2 0 2
better 2 0 2
being 2 2 0
before 2 0 2
at 2 1 1
ardent 2 1 1
The qualitative analysis was conducted using the data from each contextual
instance of "love" in both texts. The coding of the concordances, which was used for the
qualitative analysis, is presented in Tables 6 and 7 (below) for each text. Tables 6 and 7
present the negotiated codes agreed on by two raters. Two raters were used for the coding of
this data to account for inter-rater reliability. The first rater is a candidate for a master's in
English, and the second rater has a master's in microbiology but is well read and well
educated. Differences in coded data could be attributed to misunderstandings of parts of
speech, or to the fact that certain words can be classified into different parts of speech
depending on usage.
Table 6.
Coded collocations with "love" in The Great Gatsby
Collocation part-of-speech category    Frequency
(1) article or preposition             33
(2) pronoun/noun/possessive            39
(3) adverb/adjective/verb              6
(4) copular be                         0
Table 7.
Coded collocations with "love" in Pride and Prejudice
Collocation part-of-speech category    Frequency
(1) article or preposition             174
(2) pronoun/noun/possessive            265
(3) adverb/adjective/verb              84
(4) copular be                         18
Discussion
It should be noted that subjectivity is a part of corpus analysis, as previous
research has indicated. This discussion therefore combines objective data with
subjective analysis of that data.
Pride and Prejudice uses "love" approximately 1.5 times as frequently as
The Great Gatsby (see Table 1). In general, this suggests that love is more of a central
theme in Jane Austen's work than in The Great Gatsby. While this could be attributed to the
author's gender or the time period in which the novel was published, culturally speaking, the
British representation of love through the word "love" is more frequent than the American
representation of love. This finding is not too surprising, as Pride and Prejudice is
centered on the courtship and engagements of multiple characters, while The Great
Gatsby approaches love in a different way. The Great Gatsby examines love more in
regard to people's aspirations: many of them aspired to money and wealth, and prioritized
that over relational love.

In terms of the collocations, it can be seen that in The Great Gatsby, "love"
collocates most frequently with articles and prepositions (category 1) and with pronouns,
nouns, and possessives (category 2). In fact, only about 8% of the coded collocates in The
Great Gatsby (6 of 78; see Table 6) are adverbs, adjectives, or verbs. This could mean that
love is expressed with less vigor and emotion
in The Great Gatsby (and consequently in American culture) than in British culture.
Instead, love is seen as something directly attached to someone (e.g., "my love," "your
love") or as a state of mind (e.g., "in love").
Pride and Prejudice displays a more even range of collocates, and a percentage
roughly twice as high (about 16%, or 84 of 541; see Table 7) for collocates that were coded
as adverbs, verbs, or adjectives.
This can be taken to mean that there is more emotion, passion, and description in the love
that is represented in the novel (and consequently in British culture). Examples include
"ardent," "violently," and "very." These are engaging and inspiring words, which would
more likely be associated with interpersonal love.
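
The percentages cited in this discussion follow directly from the coded frequencies in Tables 6 and 7. The short Python sketch below recomputes them; the dictionary keys and the function name are illustrative labels, and only the counts come from the tables.

```python
# Coded collocate counts copied from Tables 6 and 7.
GATSBY = {
    "article/preposition": 33,
    "pronoun/noun/possessive": 39,
    "adverb/adjective/verb": 6,
    "copular be": 0,
}
PRIDE = {
    "article/preposition": 174,
    "pronoun/noun/possessive": 265,
    "adverb/adjective/verb": 84,
    "copular be": 18,
}


def category_shares(counts):
    # Convert raw category counts into percentages of all coded collocates.
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


gatsby_shares = category_shares(GATSBY)
pride_shares = category_shares(PRIDE)
```

For the adverb/adjective/verb category this gives about 7.7% for The Great Gatsby (6 of 78) and about 15.5% for Pride and Prejudice (84 of 541), which round to the 8% and 16% reported above.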
Overall, as previously mentioned, there are many considerations with this study in
regard to the novels' ability to represent a cultural expression of love. While time period
and author's gender need to be factored in, so does the fact that literature cannot claim to
be representative of a culture as a whole. However, these novels are classics, and because
they have maintained their popularity, they are one window through which to view both
American and British cultural representations of love. If nothing else, this work
demonstrates how different works of literature (and different cultures) can represent the
same idea in very different ways, and how corpus analysis can be used as a tool to
compare linguistic representations of an idea across different cultures.

References
Austen, J. (1813). Pride and prejudice. London: Whitehall.
Baker, P., et al. (2008). A useful methodological synergy? Combining critical discourse
analysis and corpus linguistics to examine discourses of refugees and asylum seekers in
the UK press. Discourse & Society, 19(3), 273-306.
doi:10.1177/0957926508088962
Fischer-Starcke, B. (2009). Keywords and frequent phrases of Jane Austen's Pride and
Prejudice: A corpus-stylistic analysis. International Journal of Corpus Linguistics,
14(4), 492-523.
Fischer-Starcke, B. (2010). Corpus linguistics in literary analysis: Jane Austen and her
contemporaries. Continuum.
Fitzgerald, F. S. (1925). The great Gatsby. New York: Simon & Schuster.
Stubbs, M. (2001). Computer-assisted text and corpus analysis: Lexical cohesion and
communicative competence. In D. Schiffrin, D. Tannen, & H. E. Hamilton (Eds.), The
handbook of discourse analysis (pp. 304-321). London: Blackwell Publishers.
