2014 Third International Conference on Computer Technology in Russia and in the Former Soviet Union
978-1-4799-1799-0/14 $31.00 © 2014 IEEE. DOI 10.1109/SoRuCom.2014.47

Toolbox for Historical and Biographical Research
(Prosopographic Databases on Russian History)

Svetlana Ulyanova, Vladislav Sinepol
St. Petersburg State Polytechnic University
St. Petersburg, Russia
oulianova@mail.spbstu.ru, sinepol@mail.spbstu.ru

Keywords: e-history, prosopography, prosopographic databases, relational databases

Modern historiography takes a systematic approach to researching the past. The main difficulty is the need to process large volumes of information and to use suitable methods to analyze and synthesize it. Informational problems of research therefore come to the foreground. These include the development of better technical means, built on databases and databanks, that would allow storing and using not only information about an object itself but also all the “external” data that can be aggregated and synthesized in place and time, so that new interpretations can be sought.

Intellectual forums where issues related to the creation of databases and databanks are discussed include conferences and other events organized by the “History and Computing” association (the national branch of the international association “History & Computing”, established in 1992) [1]. According to I. Garskova, whereas tabular databases dominated cliometrics in the 1990s, other trends appeared in the 2000s: full-text, prosopographic and historiographic databases; information systems based on large collections of statistical and narrative data, etc. [2].

Initially, historiography interpreted the term “prosopography” as historiographic analysis. Its purpose was to explore a historically significant social phenomenon or structure by telling about people whose lives and actions were closely related to that phenomenon or structure. The emergence of the modern prosopographic research genre in history is associated with the British historian L. Stone. In his 1971 article he presented his vision of the “old” and the “new” prosopography. The “old” prosopography studied relatively small social elites, whereas the “new” (quantitative) one was meant to research much larger samples. Moreover, there is a need to research not only elites but the “grass roots”, too. According to the definition of L. Stone, prosopography is the investigation of the common background characteristics of a group of actors in history by means of a collective study of their lives, which concerns their ways of implementing political actions and their options in social mobility and career ambitions [3].

Since 1990 prosopography has turned into a research genre that studies mass data in order to create, on the basis of statistical analysis, dynamic “collective biographies” of certain social groups, strata, etc., while retaining the ability to store and study the biographies of individuals who belong to these social groups and strata. As early as 2002, according to estimates by Y. Yumasheva and G. Ivanova, more than 100 prosopographic databases had been designed in domestic historiography [3]. The authors attribute the sharp growth in prosopographic research at the beginning of the 1990s to the appearance of relatively simple commercial DBMSs, which made it possible even for users without special training to create databases and process them with statistical methods [4]. In a number of cases the database designers do not even mention which systems they used to develop their databases.

In 1990–1996 the most popular DBMS was dBASE III Plus; in the same period “Karat”, “FoxBase” (different editions) and FILE FORCE application programs were used. In 1992–1998 the KLEIO DBMS was also used, as were various special-purpose programs such as STATISTIKA, PROSIS, ANARKHIST, SOCIOLOG, etc. [5]. Since 1997 the majority of databases have been developed in the MS Access environment (sometimes the research is done in MS Excel spreadsheets). As a rule, authors of such solutions choose a toolbox for its availability, MS Windows compatibility and the controllability of the DB (the ability to correct and refill it) [6].

An example of such a “standard” solution is the DB of G. Dyachkov, “Heroes of the Soviet Union” [7]. The author analyzes a collective “Portrait of a Hero” and highlights generational, national and political (membership in the All-Union Communist Party and the All-Union Leninist Young Communist League) features. At the same time, the research does not include dynamic characteristics, which can most probably be explained by the limited capabilities of the chosen DBMS.

Another example of a prosopographic database built in the Microsoft Access DBMS is the DB “Dispossessed Peasants of the Southern Urals (1930–1934)” (DB “DPSU”) [8]. It focuses mainly on statistical rather than biographical analysis (the DB contains almost 1500 entries).

Standard software tools are, to a large extent, oriented toward synchronic data processing, which limits the ability of historians to work with mass data. New technologies have to be developed, and closer cooperation between historians and programmers is needed.

An example of a toolbox designed for the objectives of a specific prosopographic research project is the database of personal records of workers in the oil producing firm
“Tovarischestvo br. Nobel”, which was created by P. Akhanchi (Institute of History of the Academy of Sciences of Azerbaijan) and I. Garskova (Moscow State University) to study the labor market in the oil sector of Baku and the migration of workers to this region at the end of the 19th and the beginning of the 20th century. It is based on data from 2000 personal records that contain both static (or unique) information and dynamic data on every worker over the whole period of his employment in the firm. The static information includes the name, nationality, literacy, age, place of birth and other data that were recorded once, when the person applied for a job in the firm. The dynamic information includes data that were entered into the personal record whenever they changed (changes in qualification level, marital status, place of work, wages and the reasons for them, penalties and rewards, accidents, etc.) [9].

The database designers set themselves statistical rather than biographical research objectives. The main task was to build time series (“dynamic rows”) of all major indices for the period under study, based on the ability to retrieve the list of workers employed at “Tovarischestvo br. Nobel” at any moment in time and to calculate from these lists the numerical values of quantitative indices and the shares of individual categories of qualitative indices (for example, the average age of workers or the share of literate workers). To this end, a menu-driven package of programs, ATiSeP (Aggregated Time Series on Prosopography), was developed in the dBASE IV database language. It is designed to retrieve information from the multi-file database and to build time series of the fields of this database for each period of time selected by the user. The system works with several database files: the main file, which contains static data on individuals (the number of entries in this file equals the number of individuals in the database); a reference file, which contains at least one entry for every worker, giving the date of each job application and each dismissal; and several additional files with data on the changes in various dynamic indices and their dates for all the individuals for whom such data are available.

The package contains the following modules: 1) a module that builds the structures of the files described above; 2) a module that builds time series of yearly or monthly data; 3) a module that builds auxiliary (selective) files containing data on those individuals who are present in a certain period; 4) a module that exports these selective files to a graphics or statistical package for further analysis [10].

As we can see, this database meets one of the most important requirements of contemporary prosopographic research: the presence of “dynamic characteristics” and the presentation of research results not as a static “image” that characterizes a certain group of people at a certain moment, but as a “collective biography” that shows the changes the group under study experienced over a certain period [11].

Analysis of biographical data (in the broad sense) on a fairly large sample of representatives of a certain social and professional group can yield interesting historiographic results. One of the most successful projects, in our opinion, is “Russian Parliamentarians at the Beginning of the 20th Century”. The information system, created by scientists from Perm, is full-text, source-oriented, operable through a web interface and designed for multi-purpose research. It makes it possible to obtain information about the members of the deputy corps of the State Duma in 1906–1917, to select deputies by certain parameters and thus to do prosopographic research [12]. In technological terms, the system is distinguished by its use of the PL/SQL language for programming and by its implementation on the Oracle Application Server software system, which makes the information system cross-platform [13].

Mass data are, by their origin, multifunctional in significance: they can be of interest for solving various problems. A historian has to use the information in a source to its fullest extent. Processing sources as information objects includes accounting for the structure of the information, classifying it, evaluating its completeness and representativeness, analyzing psychological and other factors that have influenced the character of the information, etc.

E-history is marked by the fact that the features of the objects being researched are not known in advance and the range of variables to be studied is practically impossible to foresee. The difficulties of such research include the need to process large amounts of unstructured data and to define the quantitative indices that characterize particular samples of these data. In historical research it is necessary not only to fix an arbitrary number of variables but also to justify their selection, set up a hierarchy and arrange them systematically in relation to each other. The difficulty here is that the structure of features and indices set in the source does not necessarily coincide with the goals and objectives of the research, so the question arises of which variables to select. Moreover, a historian often does not know beforehand which of them will be of interest, which leads to methodological complications and possible alternatives in further work. Prosopographic studies therefore call for historiographic, methodological and technological solutions.

An example of such an interdisciplinary search is an original software system, developed at St. Petersburg State Polytechnical University, designed to store and statistically interpret data on historical individuals of an organization or an industry.

Structuring data on the creative biographies of representatives of science and higher education in place and time can provide valuable material for a general picture of the development of the national intellectual potential and for a collective portrait of the Russian scientist and university professor. Analysis of a number of sources (lives of scientists, their creative biographies, memoirs of contemporaries, etc.) shows that it is possible in principle to describe an individual and their creative and social biography formally. If such a description is accompanied by a fairly flexible searching and grouping mechanism, various qualitative features of the groups under study can be obtained.
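
The idea of a formal description of an individual combined with a searching and grouping mechanism can be illustrated with a minimal Python sketch. It is not the code of the system discussed in this paper: the record layout (`Person`, `Position`), the function names and the sample records are all hypothetical and chosen only to show static data, dated (dynamic) data, selection and grouping side by side.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Position:
    """One dated career step (dynamic data); years are inclusive."""
    title: str
    start: int
    end: int


@dataclass
class Person:
    """Static data recorded once, plus a list of dated positions."""
    name: str
    birth_year: int
    university: str
    positions: list = field(default_factory=list)


def select(people, born=(None, None), universities=None):
    """Keep people inside a birth-year range and/or a set of universities."""
    lo, hi = born
    result = []
    for p in people:
        if lo is not None and p.birth_year < lo:
            continue
        if hi is not None and p.birth_year > hi:
            continue
        if universities is not None and p.university not in universities:
            continue
        result.append(p)
    return result


def group_counts(people, key):
    """Count the selected people per value of a grouping key."""
    return Counter(key(p) for p in people)


def titles_in_year(people, year):
    """Snapshot of the group: which titles were held in a given year."""
    return Counter(
        pos.title
        for p in people
        for pos in p.positions
        if pos.start <= year <= pos.end
    )


# Invented sample records, for illustration only.
people = [
    Person("A", 1870, "University X",
           [Position("lecturer", 1900, 1909), Position("professor", 1910, 1925)]),
    Person("B", 1882, "University Y", [Position("lecturer", 1908, 1915)]),
    Person("C", 1885, "University X", [Position("professor", 1915, 1930)]),
]

selected = select(people, born=(1865, 1883),
                  universities={"University X", "University Y"})
print([p.name for p in selected])
print(group_counts(selected, key=lambda p: p.university))
print(titles_in_year(people, 1912))
```

Calling `titles_in_year` for a sequence of years would give a simple time series for the group, which is one way to move from a static “image” toward a “collective biography”.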

The mechanism of relational databases is sufficiently adequate and universal to serve as a modern tool for this type of research. It was used as the basis for a software system to store and statistically process data on historical individuals of an organization or sector (industry, science, education, culture, etc.). The system includes a relational database stored on an SQL server and a client web application that allows editing the contents of the database (adding, modifying and deleting entries); setting search requests of any level of complexity; regrouping the results of search requests; and computing the required quantitative parameters of data selections (distribution, average values, variance, correlation coefficients). Such an approach makes it possible to access the database through a common web browser and avoids the problems caused by installing and maintaining a special application on client computers. It is worth mentioning that the developers have tried to reconcile two underlying operational principles of any historical database: the historiographic and the problem-oriented. The demand for any database, its importance and its scientific potential as a whole depend on how successful this reconciliation is. A researcher not only gets access to a large collection of structured data entered into the system but can also create within this structure, in any form, any number of their own structures focused on particular research objectives.

Any text and graphical data that can be converted into electronic format serve as initial data for the application. On the basis of such sources an entry about a person is generated in the database, and all the initial documents (in their electronic representation) are logically linked to the entry about the person and can be used in research when needed.

The software system provides a means of organizing requests to the database so as to obtain data selections for further processing and for computing characteristic parameters. The major feature of the application is a request that selects the database entries satisfying a certain set of conditions, distributes these entries into groups and computes values for the set criteria. A request is formed from the values a user sets for any subset of the search criteria. The set of search criteria includes any combination of attributes that characterize an individual. The search criteria can be both of the “range” type (for example, those who were born in a selected interval of years) and of the “gets-into-the-collection” (set membership) type (for example, those who graduated from one of the stipulated universities; one or more universities can be indicated). As a result of such requests, the number of people whose entries in the database correspond to the selected criteria and the list of these people are displayed.

The results of the requests can be grouped according to the chosen parameters: the entries that satisfy the selected search conditions are divided into groups by the values of the selected grouping criteria. For each group, both the number of entries and the entries themselves are displayed. If two or more grouping parameters are selected, each group is divided into sub-groups, for which the number of entries and the entries themselves are shown, and so on. The results of such requests can be represented graphically as bar charts.

When designing client-server systems for the storage, transfer and display of text information, the following features of such systems are to be taken into account:

1. Quick access to the requested information and the ability for several users to work with the same data at the same time. This can be achieved through a distributed DB located on a multiprocessor server. Searching can be optimized through parallel search in the distributed DB (on different nodes of a multiprocessor server, with information clustering). Moreover, it is logical to model such a system on SourceSafe, i.e. the user can copy some data from the server, work with them and then synchronize the changed data with the database. While the data are checked out, other users can only view the information but not modify it.

2. Data processing service (both for incoming and for stored information): a Data Mining mechanism for acquiring knowledge from the accumulated information and Text Mining (text analysis), statistical analysis of lexicographic groups, search for dependent statements, selection of cause-and-effect chains, etc.

3. Separation of access rights for users (groups of users), controlled by a business logic module. It includes granting or denying access to the requested information for reading, changing or deleting data.

4. Administration and modification of the database through a separate module of the system. This separates reading data from editing data, which makes the system less vulnerable to failures and unauthorized access.

5. Access to the data through an Internet portal with a common web interface, which makes it possible to work with the data both through web browsers and through web services; the latter allow specialized remote clients to be created.

Based on the “Toolbox of Historical and Biographical Research”, a team of authors (V. Sinepol, S. Ulyanova, I. Aladyshkin, N. Kornet) created a new type of database: “Professors of St. Petersburg Polytechnical University, 20th century” (PSPbPU, registered in the State Information Register in 2009, state registration number 0220913105).

This database includes advanced means of information storage and formalization. Work with the client application is divided into three interrelated parts: (1) registration of individuals, i.e. input of general biographical data and their further editing (including a system for searching by individual); (2) processing of mass data by requesting selections of entries; (3) directories (of higher educational institutions of the RF, subdivisions of SPSPU, honorary degrees, majors of higher education, etc.) that help the user work with the materials of the DB.

To facilitate the input of materials from sources and their further editing, 13 key sections for distributing the information were developed:

1) “Registration”, which contains only general biographical information: life period, place of birth and social background. These data are reflected in all further sections of data formalization for any individual, which are included in one “field”;

2) Data on relatives (including their last names, first and middle names and degree of kinship);

3) Education (including university, period of education and major obtained);

4) Work career (data on the work period, type of organization and its geographical location, and positions occupied; a list of honorary degrees is given in addition);

5) Work in SPSPU (apart from the work period, this includes positions occupied at certain faculties and departments, administrative roles and names of courses developed);

6) Academic career (data on the year of completion of graduate and post-graduate courses, organization, dates of defense of the Candidate’s and Doctor’s theses, field of science, academic advisor and consultant, scientific rank, area of expertise and academic titles);

7) Awards (names and years of receipt are included);

8) Documents (references to published editions (headline, name of the book, authors, place of edition, publishing house, year of edition, pages, ISBN) and archive documents (name of the archive, number of the stock, record, file, pages));

9) Publications (the section contains data on the total number of patents and author’s certificates, the total number of publications and sometimes a selective indication of the most important scientific and methodological papers (including co-authors, titles, status of the publication, volume, year of edition and type of publishing house));

10) Work in editorial boards (includes information on the work period and titles of editions);

11) Participation in exhibitions (year of participation, title of exhibition and prizes are included);

12) Scientific and social activities (period, category of board and level of membership are included);

13) Political activities (data on membership in a certain political party, years of membership and period of participation in the work of elective bodies).

Thanks to the flexible structure for entering, storing and further processing information, a researcher gains broad facilities for forming their own requests oriented toward specific research objectives.

The database “Professors of Polytechnic University” is an electronic resource that provides many facilities for prosopographic research and undoubtedly considerably broadens the chances of reconstructing the types of careers of Russian scientists and higher education teachers, of analyzing priority fields of scientific work, and of studying, at a personal level, the integration of national scientists into European and world science. In the long term, the database can become an important stage in researching scientific schools and in forming a synthetic understanding of the scientific community as a whole. The dynamic component of its data makes it possible to reveal development trends in the social and professional group of scientific community representatives under study. In the long run, larger-scale databases embracing entire fields of industry, science, education, culture, etc. can be designed on the basis of the universal toolbox for historical and biographical research.

Thus, the use of digital history methods allows historians to store and process textual, statistical, visual and other historical sources at a qualitatively new level, in keeping with current global trends in computer systems and information technology.

REFERENCES

[1] For a review of the themes of the AHC conferences over the last decade, see: Garskova, I. New Trends in Historical Informatics: based on the conferences held in the 2000s // Herald of Chelyabinsk State University. 2011. No. 9. PP. 144–153.
[2] Ibid. P. 148.
[3] Yumasheva, Y., Ivanova, G. History of Prosopography // Range of Ideas: Algorithms and Technology of Historical Informatics. Moscow, 2005. P. 123.
[4] Ibid. P. 130.
[5] Ibid. P. 134.
[6] Kosenkov, A. Regional Upper Social Strata in 1918–1953: Methods of Developing and Processing an Electronic Prosopographic Database // Herald of Tambov State University. 2013. Issue 11. P. 2.
[7] Dyachkov, G. Heroes of the Soviet Union: Features of the Collective Profile // Herald of Tambov State University. 2007. Issue 1. PP. 139–142.
[8] Rakov. Database “Dispossessed Kulaks of the South Urals (1930–1934)”: New Results and Evolution of the Sample Collection // Economic Theory. 2010. No. 3. PP. 64–75.
[9] Garskova, I. From Prosopography to Statistics: Methods of Analyzing Databases by Sources that Contain Dynamic Information // Source. Method. Computer. Barnaul, 1996. PP. 124–126.
[10] Ibid. PP. 125–126.
[11] Tselorungo, D. Officers and Soldiers of Russian Army – Combatants of the Battle of Borodino (from Prosopographic Database to Historical Research) // URL: http://mozhblag.prihod.ru/stranicy_istorii_razdel/view/id/1130392 (date of visit 11.03.2014).
[12] Korniyenko, S. Studies of History of Public Administration and Self-Administration in Pre-Revolutionary Russia (based on modern information technology) // Vlast’. 2009. No. 11. PP. 44–46.
[13] Povroznik, N. Information Systems for Historians: Major Development Trends // Herald of Perm University. Series: History and Politics. 2009. No. 3. P. 102.
