A Thematic Categorization of Thomas Hardy's Prose Fiction

An exploratory multivariate analysis approach

From the beginning of the twentieth century to the present, critical debate about Hardy's prose fiction
has been shaped by questions of generic categorization and thematic classification. Almost all of the
work on the thematic classification of Hardy's prose writings, however, is theoretically driven.
That is, classification criteria are selected by the critic on the basis of some critical theory or framework
(e.g. formal, biographical/historical, moral, Victorian, anti-Victorian, feminist, psychoanalytic, postcolonial,
philosophical/religious, sociological/anthropological) and supported by personal knowledge and evaluation
of the texts. Moreover, many of the existing accounts follow the stereotyped classifications of what is called
the Hardy Critical Industry. By this I mean that many of Hardy's critics are willing to agree with conventional,
well-known criticisms of Hardy regardless of any critical presuppositions about what they are supposed to
agree or disagree with. So, in spite of the great number of thematic reviews of Hardy's prose work, there is
neither consensus among his commentators nor an objective study that adopts reliable empirical methods.
Given the limitations of the methods described above, this article asks the following research question:
Can an objective and conceptually useful classification, based on empirical evidence abstracted from
Thomas Hardy's prose fiction texts, be found?
To address the research question, the study proposes vector space classification (VSC) for classifying the
novels and short stories of Thomas Hardy thematically on the basis of the lexical frequency representations
of those texts. In practice, VSC is carried out by applying exploratory multivariate analysis techniques to
rank documents, with cluster information used within a graph-based framework.
The rationale for adopting multivariate analysis is that the proposed classification is concerned with
grouping texts of identical or similar themes together into distinct sets, which makes the analysis a
multivariate data problem in the first place. The core objective of this exploratory research is thus to make
some preliminary progress towards developing a thematic classification that addresses the limitations of
traditional philological methods.
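
The lexical frequency representation described above can be illustrated with a minimal sketch. It is not the
study's actual pipeline: the tokenizer, the use of relative frequencies, the cosine distance, and the three toy
strings (which are not Hardy's texts) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_matrix(texts):
    """Return (vocabulary, rows): one relative-frequency vector per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    rows = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against an empty text
        rows.append([c[w] / total for w in vocabulary])
    return vocabulary, rows


def cosine_distance(u, v):
    """1 - cosine similarity; one possible proximity measure (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical toy documents standing in for real texts.
toy_texts = [
    "heath village harvest heath farm",
    "village harvest heath market",
    "city railway office city market",
]
vocabulary, rows = frequency_matrix(toy_texts)

# Distances between the first text and each of the other two.
d_similar = cosine_distance(rows[0], rows[1])
d_different = cosine_distance(rows[0], rows[2])
print(d_similar < d_different)
```

Each text becomes one row of a document-by-word matrix, so texts with similar vocabulary profiles lie close
together in the vector space. A real study would also need decisions about function words, lemmatization, and
text-length normalization, which this sketch leaves out.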

Our approach to this objective can be outlined as follows. The study takes the form of a case study, with an
in-depth analysis of multivariate statistical techniques, particularly cluster analysis and principal
components analysis, and of their feasibility for generating an empirical thematic classification of Hardy's
prose fiction.
Cluster analysis is used to perform the classification. It is a multivariate statistical technique for
finding relatively homogeneous clusters of cases based on proximity measures. It encompasses a number of
different methods, including hierarchical cluster analysis, whose purpose is to sort objects into distinct
groups such that members of one group are similar to each other and distant from members of the other
groups. Here, hierarchical cluster analysis is first used to measure the semantic relatedness between the
selected texts of Thomas Hardy, with the aim of generating an automated, objective classification of these
works. The result is an illustrated set of analyses that show how the texts are related to each other
thematically.
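
The idea of hierarchical cluster analysis can be sketched as follows. The source does not state which
proximity measure or linkage rule was used, so Euclidean distance and average linkage are assumptions here, and
the six toy vectors are invented rather than derived from Hardy's texts.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering.

    Starts with every item in its own cluster and repeatedly merges the two
    closest clusters until n_clusters remain. Returns (clusters, merges),
    where merges records each merge as (members_a, members_b, distance).
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters], merges


# Hypothetical frequency profiles for six texts over three word types.
toy_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
]
groups, merge_history = agglomerate(toy_vectors, n_clusters=3)
print(sorted(groups))
```

On this toy input the three recovered groups are the pairs of texts with matching profiles. The recorded
merge history is the information a dendrogram (cluster tree) would display.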
To validate the results, Principal Components Analysis (PCA) is used to reformulate the lexical frequency
data into a reduced set of uncorrelated variables, which is then re-clustered to determine whether the cluster
trees based on the two matrices (the original and the reduced) are morphologically equivalent. Taken together,
cluster analysis and PCA provide an integrated framework for the thematic classification of literary texts.
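
The reduction step can be sketched in the same spirit. The code below computes principal components by power
iteration with deflation on the covariance matrix, which is one simple way to obtain them and is not
necessarily the procedure used in the study. The data matrix, the number of components kept, and the
iteration count are all illustrative assumptions.

```python
import math
import random


def center(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centred rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def principal_components(cov, k, iterations=500, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    rng = random.Random(seed)
    work = [row[:] for row in cov]
    pairs = []
    for _component in range(k):
        v = [rng.random() for _col in cov]  # arbitrary non-uniform start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iterations):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found from the working matrix.
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


# Hypothetical data: six texts described by three frequency variables.
toy_rows = [
    [4.0, 1.0, 0.2],
    [3.8, 1.2, 0.1],
    [1.0, 0.5, 0.3],
    [1.2, 0.3, 0.2],
    [2.5, 2.0, 0.1],
    [2.4, 2.2, 0.3],
]
centered = center(toy_rows)
pairs = principal_components(covariance(centered), k=2)

# Project each text onto the two retained components.
scores = [[sum(a * b for a, b in zip(row, vec)) for _val, vec in pairs]
          for row in centered]
print(len(scores), len(scores[0]))
```

The resulting scores are uncorrelated by construction, and the reduced matrix can be passed to a clustering
routine so that its tree can be compared with the tree built from the full frequency matrix.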
The results show that the 62 texts in the study, which represent all of Hardy's novels and short stories,
fall into three clearly defined thematic groups, and that these groups correlate to some extent with
bibliographical, textual, and critical findings associated with the texts. It can be concluded that
computational methods like cluster analysis can usefully supplement philological methods in the thematic
classification of the novels and short stories of Thomas Hardy yet in objective replicable