
Digitization and multispectral analysis of historical books and archival documents: Two exemplary cases


Giuseppe Maino
ENEA and University of Bologna, Ravenna site, via Mariani 5, 48100 Ravenna, Italy
giuseppe.maino@unibo.it
Abstract
A multispectral digital system, recently developed at the ENEA laboratories in Bologna and applied to the investigation of many artistic and archaeological works, is presented. The system covers a range from infrared radiation through visible light to ultraviolet fluorescence, so that suitable analyses can be performed, before any restoration or cleaning, on paintings in illuminated codices and on parchments, books and documents in historical archives. Dedicated software for multispectral image analysis and digital restoration has been developed alongside this hardware system, as its complement, and applied to a few cases of major historical interest, namely the Marsili archive in the Biblioteca Universitaria of Bologna, the incunabula and cinquecentine (sixteenth-century editions) in the library of the Minori Osservanti in Bologna, and the XVI century books in the Library of the Padri Minimi of Paola in Calabria (Italy).

1. Introduction
The application of multispectral nondestructive techniques, initially developed in the field of materials science, to the study of books and documents conserved in archives provides archivists and historians with reliable qualitative, and often quantitative, information on the objects under study. In this way it is possible, for instance, to understand the technical and intellectual know-how of the artists or craftsmen of the period, and, for archive conservators, to obtain the preliminary and often essential information needed to conserve and restore damaged or degraded objects and, in many cases, to assess the authenticity of artefacts and documents.
On the other hand, a portable apparatus is a strong requirement for many analyses, which must be performed in situ, since large or particularly fragile books and documents cannot be easily and/or safely transferred to dedicated laboratories. Infrared and ultraviolet spectrometry is therefore a suitable technology for this kind of analysis.

2. The experimental procedure


In many cases, multispectral analysis makes it possible to improve the readability of a document or to recover earlier writings or images that were severely damaged or erased in the past. Moreover, the availability of multispectral data in digital format, together with suitable high-resolution images of the document itself, can be used to obtain a reliable restoration of the historical object without performing any operation on the original, thanks to suitable algorithms and physical models of the degradation processes.
I describe applications of multispectral techniques to the analysis and digitization of both library and archival heritage, considering two exemplary cases developed within the framework of an international project, GIANO (Grafica Innovativa per il patrimonio Artistico Nazionale e per l'Occupazione giovanile [1]), funded by the European Community and the Italian Ministry of University and Research. In particular, we consider the digitization and dissemination, through multimedia databases, of XVI century books from Calabrian libraries, many of which survive in only a few copies, and of the archival sources on the sea trade between Byzantium and Southern Italy in the Middle Ages conserved in the Marsili archive. The results of digitization and of the multispectral analyses have been arranged in suitable multimedia database systems based on Java and XML scripts, allowing interactive database management on the web. Starting from the previous case, the work includes a further application concerning an important, and not yet well documented or studied, XVIII century historical archive conserved at the Bologna University Library, namely the manuscripts of General Count Luigi Ferdinando Marsili (1658-1730), founder of the Accademia dell'Istituto delle Scienze of Bologna, one of the most prominent scientific institutions of Enlightenment Europe [2] (see Fig. 1).

14th International Conference of Image Analysis and Processing - Workshops (ICIAPW 2007)
0-7695-2921-6/07 $25.00 2007

Fig. 1 Example of a manuscript from the Marsili archive.

Digital images of books and documents have been obtained in different spectral ranges, as shown in Fig. 2, where the experimental setup is shown during the work in the libraries. Infrared radiation operating in reflection mode (reflectography) penetrates the surface layers, making it possible to identify erased writings (palimpsests) and to improve the readability of ancient documents whose inks have nearly vanished. Ultraviolet fluorescence excites the surface varnish and provides information about retouches, past restorations and, in general, modifications of the surface layer [3,4].
The whole multispectral data set, comprising the high-resolution image in the visible, three different infrared reflectography bands and one ultraviolet fluorescence band, can be suitably processed to extract the significant information. To this aim, a statistical method based on the principal component approach (PCA) has been developed and already applied to the investigation of paintings [5-10]. In this way, the essential information about the appearance of the document can be condensed into a single image.

[Fig. 2 panels: "Multispectral analysis" spectral ranges (ultraviolet, 100-400 nm; visible, 400-700 nm; near infrared); schematic of the radiation penetrating the layer structure (varnish, paint layers, ground, canvas/wood), with ultraviolet revealing retouches and infrared revealing underdrawing]
Fig. 2 Experimental setup (lowest image) and spectral ranges in which the digital images have been grabbed (upper image); the behaviour of radiation of different wavelengths with respect to the layer structure of the examined object is schematically presented in the middle figure.
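The PCA step described above can be illustrated with a minimal Python sketch. It assumes that the five bands (visible, three infrared reflectography bands, ultraviolet fluorescence) are already co-registered and flattened into equal-length lists of pixel values. The function names are hypothetical, and power iteration is used only to keep the example dependency-free; this is not the implementation of [5-10], which may differ in normalization and in how components are selected.

```python
import math


def covariance_matrix(bands):
    """Sample covariance between spectral bands.

    bands: list of equal-length lists, one per band, each holding the pixel
    values of a co-registered image flattened into a single list.
    Returns the covariance matrix and the mean-centred bands.
    """
    n_bands, n_pix = len(bands), len(bands[0])
    means = [sum(b) / n_pix for b in bands]
    centred = [[v - m for v in b] for b, m in zip(bands, means)]
    cov = [[sum(x * y for x, y in zip(centred[i], centred[j])) / (n_pix - 1)
            for j in range(n_bands)] for i in range(n_bands)]
    return cov, centred


def leading_eigenvector(matrix, iterations=500):
    """Dominant eigenvector and eigenvalue of a symmetric matrix (power iteration).

    Assumes the uniform start vector is not orthogonal to the dominant eigenvector.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient: eigenvalue associated with the converged vector
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v, sum(a * b for a, b in zip(v, mv))


def first_principal_component(bands):
    """Project every pixel onto PC1; also return the fraction of variance explained."""
    cov, centred = covariance_matrix(bands)
    vec, eigenvalue = leading_eigenvector(cov)
    pc1 = [sum(vec[k] * centred[k][p] for k in range(len(bands)))
           for p in range(len(bands[0]))]
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    if total_variance == 0.0:
        return pc1, 0.0
    return pc1, eigenvalue / total_variance


# Toy data: five perfectly correlated bands of a 10 x 10 image
base = [float(i) for i in range(100)]
bands = [[g * v + 3.0 for v in base] for g in (1.0, 0.8, 0.6, 0.4, 0.2)]
pc1, explained = first_principal_component(bands)
```

With real documents the bands are only partially correlated, so the first two or three components are usually kept, as the text notes; in the toy data a single component carries (numerically) all the variance.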

The resulting information, contained in the first two or three principal components (eigenvectors), is then used to develop a theoretical model of degradation. By means of an inverse regularized procedure, which generally amounts to solving a problem that is ill-posed in Hadamard's sense, the readability of the document can be significantly improved [11-13]. Fig. 3 shows the results obtained for the Marsili documents.
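The regularized inversion mentioned above can be sketched, under strong simplifying assumptions, with zero-order Tikhonov regularization of a one-dimensional linear degradation model b = A x: the estimate minimizes ||A x - b||^2 + lam ||x||^2, that is, it solves (A^T A + lam I) x = A^T b. The three-tap blur used as A, the parameter name `lam` and all function names are hypothetical; the procedure of [11-13] is not reproduced here, and in practice the regularization parameter would be chosen with a criterion such as the discrepancy principle or generalized cross-validation.

```python
def blur_matrix(n, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical degradation model: symmetric 3-tap blur with zero boundary."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for offset, weight in zip((-1, 0, 1), kernel):
            j = i + offset
            if 0 <= j < n:
                a[i][j] = weight
    return a


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def tikhonov(a, b, lam):
    """Minimise ||A x - b||^2 + lam * ||x||^2 via (A^T A + lam I) x = A^T b."""
    rows, cols = len(a), len(a[0])
    normal = [[sum(a[k][i] * a[k][j] for k in range(rows)) + (lam if i == j else 0.0)
               for j in range(cols)] for i in range(cols)]
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve(normal, atb)


n = 10
a = blur_matrix(n)
x_true = [0.0] * 5 + [1.0] * 5                 # a step edge, e.g. ink/background
b = [sum(a[i][j] * x_true[j] for j in range(n)) for i in range(n)]
noise = [0.01 * (-1) ** i for i in range(n)]   # small oscillating perturbation
b_noisy = [u + v for u, v in zip(b, noise)]
x_weak = tikhonov(a, b_noisy, 1e-8)            # almost unregularized
x_strong = tikhonov(a, b_noisy, 1e-2)          # stronger damping
```

Because the blur nearly cancels high-frequency components, the almost unregularized solution amplifies the small oscillating perturbation, whereas a larger `lam` yields a solution of smaller norm: this trade-off between fidelity and stability is what makes the problem ill-posed and the regularization necessary.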

Fig. 3 Some examples of Marsili manuscripts before (left-hand side) and after (right-hand side) image restoration.

3. Image processing and OCR


Suitable image restoration algorithms have thus been applied in order to improve the quality of the grabbed images. Then an optical character recognition (OCR) procedure based on neural networks has been applied to the written documents to obtain formatted text in digital form [14,15].
The following steps have been taken into account:
- Pre-OCR, to select the significant information and to segment the written text;
- OCR, based on a forward-backward multilayer perceptron neural network trained on a suitable set of pages from the Marsili archive;
- Post-OCR, to analyse the context through lists of items and names, thesauri and syntactic rules;
- Indexing, to associate images with data (numerical, textual or multimedia);
- Parsing, to finally carry out a textual analysis in order to introduce historical and biographical information, bibliographic or archival items and so on.
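The post-OCR step listed above, which checks the recognized text against lists of names and thesauri, can be sketched as a nearest-word lookup by edit (Levenshtein) distance. This is an assumed, simplified stand-in for the context analysis described in the text: the lexicon, the threshold `max_distance` and the function names are hypothetical, and a real post-OCR stage would also exploit syntactic rules and the surrounding words.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def correct_token(token, lexicon, max_distance=2):
    """Replace an OCR token by the closest lexicon entry, if it is close enough."""
    if token in lexicon:
        return token
    best = min(lexicon, key=lambda word: (levenshtein(token, word), word))
    return best if levenshtein(token, best) <= max_distance else token


def post_ocr(text, lexicon, max_distance=2):
    """Correct every whitespace-separated token of a recognized line."""
    return " ".join(correct_token(t, lexicon, max_distance) for t in text.split())


# Hypothetical lexicon of names expected in the Marsili manuscripts
LEXICON = {"Marsili", "Bologna", "Istituto", "delle", "Scienze"}
print(post_ocr("Istitnto delle Scienze di Bo1ogna", LEXICON))
# prints: Istituto delle Scienze di Bologna
```

The distance threshold keeps short words that are absent from the lexicon (here "di") from being forced onto the nearest entry.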

The additional multispectral analysis, carried out on damaged pages as previously described, is a suitable non-destructive technique, based on portable equipment for in situ analyses, and is useful to:
- study the inks and papers used;
- improve the readability of the written text and the images;
- identify degradation and alteration processes;
- assess the present state of conservation and plan further restoration interventions and the best conditions for preserving the entire archive.
The above-mentioned techniques have been further tested on the XVI century books (cinquecentine) conserved in the historical library and archive "Charitas" of the Convento dei Padri Minimi of Paola (Cosenza, Italy), including rare books such as:
- Tommaso Campanella, Philosophia sensibus demonstrata, in octo disputationes distincta (1591), 525 pp.;
- Gioacchino da Fiore, Divini vatis abbatis (1519), 278 pp.;
- Bernardino Telesio, De natura iuxta propria principia (1565), 178 pp.;
- Paolo Regio, La vita di S. Francesco di Paola (1581), 270 pp.;
- Gabriele Barrio, Pro lingua Latina libri tres (1571), 606 pp.
Fig. 4 shows examples of treated images of these cinquecentine.
The OCR technique is similar to the one developed by my group in past years in protein science for predicting the secondary structure of proteins from their amino acid sequence. Computational tools can bridge the gap between sequence and protein 3D structure, based on the notion that information is to be retrieved from databases and that knowledge-based methods can help in approaching a solution of the protein folding problem. Two neural networks have been used: the first is a perceptron with one hidden layer, trained in a supervised learning phase. Sequences of outputs from the first network are then fed as inputs to the second network, which acts as a filter.
All our predictors take advantage of evolutionary information derived from the structural alignments of homologous (evolutionarily related) proteins, taken from sequence and structure databases. An analogous approach can be implemented for languages in the case of OCR.
The networks are implemented in ANSI C and are optimized to run on a cluster of workstations under the UNIX operating system, using PVM (Parallel Virtual Machine).
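The two-network cascade described above, in which a first perceptron with one hidden layer scores each position and its sequence of outputs is fed to a second, filtering network, can be sketched in Python as follows. This is not the authors' ANSI C/PVM code: the scalar per-position features, the window half-width of 2, the padding values and every weight are hypothetical, and training (the supervised learning phase) is omitted, so only the forward pass of an already trained cascade is shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def windows(seq, half_width, pad):
    """Sliding windows of length 2*half_width + 1 centred on each position."""
    padded = [pad] * half_width + list(seq) + [pad] * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(seq))]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a perceptron with one hidden layer and one sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def cascade(features, first_net, second_net, half_width=2):
    """First network scores each position from a window of input features;
    the second re-scores each position from a window of first-stage outputs,
    which suppresses isolated, implausible predictions."""
    stage1 = [first_net(win) for win in windows(features, half_width, 0.0)]
    stage2 = [second_net(win) for win in windows(stage1, half_width, 0.5)]
    return stage1, stage2


# Hypothetical weights: the first network reacts to the central feature only
def first_net(win):
    return mlp_forward(win, [[0, 0, 10, 0, 0]], [-5], [10], -5)


# Single sigmoid unit with uniform weights: a thresholded moving average
def second_net(win):
    return sigmoid(10 * (sum(win) / len(win) - 0.5))


s1, s2 = cascade([0, 0, 0, 1, 0, 0, 0], first_net, second_net)
# s1[3] is close to 1, but the isolated spike is filtered out: s2[3] < 0.5
```

A run of consecutive positive positions, by contrast, survives the second stage, which mirrors the filtering role that the text attributes to the second network.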


Fig. 4 Examples of treated images of cinquecentine in the Library of Padri Minimi of Paola.
Block description of the adopted procedure for analyzing and archiving digital images
[Block diagram: Original document; Acquiring high-resolution multispectral images; Acquiring high-resolution digital images in the visible; Image processing and restoration; Image enhancement; pre-OCR; OCR; post-OCR; Indexing; Parsing; Digital archive; Multimedia archive on web]

The whole process is summarized in the previous block diagram.
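The block diagram can be read as a sequence of stages, each consuming the products of earlier ones. The following Python sketch encodes one possible reading of it and checks that no stage runs before its inputs exist. The stage and artefact names, and the exact dependencies between blocks, are assumptions made for illustration rather than a transcription of the diagram.

```python
# Hypothetical encoding of the procedure: (stage, required inputs, outputs)
STAGES = [
    ("acquire_visible", (), ("visible_image",)),
    ("acquire_multispectral", (), ("ir_bands", "uv_band")),
    ("restore", ("visible_image", "ir_bands", "uv_band"), ("restored_image",)),
    ("enhance", ("restored_image",), ("enhanced_image",)),
    ("pre_ocr", ("enhanced_image",), ("text_segments",)),
    ("ocr", ("text_segments",), ("raw_text",)),
    ("post_ocr", ("raw_text",), ("text",)),
    ("index", ("enhanced_image", "text"), ("index_entries",)),
    ("parse", ("text",), ("annotations",)),
    ("archive", ("enhanced_image", "text", "index_entries", "annotations"),
     ("archive_record",)),
    ("publish_web", ("archive_record",), ("web_page",)),
]


def check_order(stages):
    """Return every artefact produced, or raise if a stage precedes its inputs."""
    available = set()
    for name, needs, makes in stages:
        missing = [item for item in needs if item not in available]
        if missing:
            raise ValueError(f"stage {name!r} runs before its inputs exist: {missing}")
        available.update(makes)
    return available


produced = check_order(STAGES)  # includes "web_page" when the order is valid
```

Making the dependencies explicit in this way lets a stage (for example a different OCR engine) be swapped without silently breaking the order of the procedure.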


4. The multimedia database and website


The final step has been the implementation and
validation of a multimedia database for archiving
information about diagnostics, conservation and
restoration of historical and artistic documents and
books. The software architecture is based on a Content
Management System (CMS) and allows the
development of a dynamic website.
Three main applications have been implemented for demonstration purposes during the project. Each of them is representative of a wide range of cultural assets:
a) libraries and historical archives, especially in Calabria and Sicily, important above all for the unpublished documents related to the Byzantine presence in Southern and Insular Italy;
b) diagnostic imaging and restoration reports (including written and photographic reports) of historical and artistic assets;
c) Mediterranean wall mosaics (4th-14th century A.D.).
All these results, useful for scientists and conservators but also of interest to the general public and tourists, must be made available in a simple and effective way. At present, only two approaches are available for organizing large websites:
- a collection of documents (hundreds or thousands) that the public can consult;
- on-demand, database-driven applications that display multimedia documents and data dynamically.
The first solution requires complex, long and expensive maintenance; moreover, it is not suitable for a dynamic situation in which information changes frequently. In fact, data consistency can be very difficult to maintain, and updates must be carried out by dedicated staff, at a considerable cost in time and human resources.
On the other hand, the second scenario better suits this situation. It makes it possible to produce a large number of web pages by re-using graphic components, maintaining distinct interface layouts and developing the code that retrieves the page data. Moreover, it allows an efficient management of all the human resources involved in the project.
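The on-demand, database-driven approach can be sketched with Python's standard `sqlite3` and `string.Template` modules: each page is generated from a database row only when it is requested, re-using a single graphic template. The schema, the template and the function names are hypothetical and far simpler than the GIANO CMS; the sample record is one of the cinquecentine listed in Section 3.

```python
import html
import sqlite3
from string import Template

# One shared template re-used for every generated page (hypothetical layout)
PAGE = Template("<h1>$title</h1>\n<p>$author, $year, $pages pp.</p>")


def make_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT, "
                "title TEXT, year INTEGER, pages INTEGER)")
    return con


def add_book(con, author, title, year, pages):
    cur = con.execute(
        "INSERT INTO book (author, title, year, pages) VALUES (?, ?, ?, ?)",
        (author, title, year, pages))
    return cur.lastrowid


def render_page(con, book_id):
    """Build the page for one record on demand; None if the record does not exist."""
    row = con.execute("SELECT author, title, year, pages FROM book WHERE id = ?",
                      (book_id,)).fetchone()
    if row is None:
        return None
    # Escape database values before inserting them into HTML
    author, title, year, pages = (html.escape(str(v)) for v in row)
    return PAGE.substitute(title=title, author=author, year=year, pages=pages)


con = make_db()
book_id = add_book(con, "Tommaso Campanella",
                   "Philosophia sensibus demonstrata", 1591, 525)
print(render_page(con, book_id))
```

Because pages are built from the database at request time, correcting a record updates every page that shows it, which is the consistency advantage claimed for the second approach.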
The main aim of this project is to create databases that hold and integrate all the GIANO results. For this reason, it is important to implement applications that let users access the data sources, producing documents on demand.
We decided to develop a website with all these characteristics using a Content Management System (CMS), drawing on the skills, knowledge and techniques needed to build and manage this kind of software in a hypertextual way (all documents are available on the Internet), with precise communication standards (priority, visibility, etc.).
A Content Management System builds and updates a website, managing all phases: setting up, editing and publishing texts, images and sounds. Moreover, when used in a portal, it should classify and organize all the information so that it can easily be found, implemented, modified, linked or re-used in a different part of the website: the bigger the website, the more important it is for the CMS to be flexible and efficient.
The main characteristics of the developed CMS are therefore:
- user-friendly interface;
- fast input, modification and retrieval of information;
- ability to adapt to the layout and graphic needs of the website;
- safe and flexible use;
- uploading through a browser interface;
- use of graphic templates for displaying content;
- management of different user roles and workflows;
- database for images, texts and graphs;
- retrieval and integration of information from other sources;
- management of mailing lists and mailboxes;
- management and ordering of links, news, FAQs and events;
- search functionality;
- customizable graphic content.
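Role and workflow management, one of the CMS characteristics listed above, can be sketched as a small state machine in which each role may perform only certain actions on content in certain states. The states, roles and actions below are hypothetical and do not describe the actual GIANO CMS.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    PUBLISHED = "published"


# Hypothetical permissions: (role, current state, action) -> next state
TRANSITIONS = {
    ("editor", State.DRAFT, "submit"): State.IN_REVIEW,
    ("reviewer", State.IN_REVIEW, "approve"): State.PUBLISHED,
    ("reviewer", State.IN_REVIEW, "reject"): State.DRAFT,
    ("administrator", State.PUBLISHED, "withdraw"): State.DRAFT,
}


def apply_action(state, role, action):
    """Return the next state, or raise PermissionError if the role may not act."""
    try:
        return TRANSITIONS[(role, state, action)]
    except KeyError:
        raise PermissionError(
            f"{role!r} may not {action!r} content that is {state.value!r}") from None


state = apply_action(State.DRAFT, "editor", "submit")   # now State.IN_REVIEW
state = apply_action(state, "reviewer", "approve")      # now State.PUBLISHED
```

Keeping the permissions in a single table makes it straightforward to add a role or a review step without touching the code that applies actions.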

5. References
[1] D. Biagi Maino, G. Maino, The GIANO project: Combining hypermedia and network technologies for applications to the cultural heritage, in Proceedings of International Conference on Multimedia Computing and Systems, IEEE Multimedia Systems 99, Firenze, June 7-11, 1999, IEEE Computer Society, Los Alamitos, California (1999) 1102-1105.
[2] G. Maino, Immagini per la storia. La documentazione nell'età digitale, in L'immagine del Settecento da Luigi Ferdinando Marsili a Benedetto XIV, D. Biagi Maino, ed., Archivi di Arte Antica, Umberto Allemandi Editore, Torino, 2005, pp. 129-159.
[3] G. Maino, S. Bruni, S. Ferriani, A. Musumeci, D. Visparelli, Multispectral analysis of paintings and wooden sculptures, in Proceedings of II Congresso Nazionale AIAr Scienza e Beni Culturali, Patron Editore, Bologna (2002) 203-214.
[4] G. Maino, Multispectral investigation, in Choices and strategies for preservation of the collective memory, Ministero per i Beni e le Attività Culturali, Roma, 2005, pp. 228-246.

[5] A. Tartari, G. Maino, E. Lodi, C. Bonifazzi, Compton scattering elemental imaging of a deep layer performed with the principal component analysis, in Proceedings of Roma 2000 15th World Conference on Non-Destructive Testing, Roma, October 15-21, 2000, 261-268.
[6] C. Bonifazzi, G. Di Domenico, E. Lodi, G. Maino, A. Tartari, Principal component analysis of large layer density in Compton scattering measurements, Applied Radiation and Isotopes 53 (2000) 571-579.
[7] C. Bonifazzi, E. Lodi, G. Maino, V. Muzzioli, L. Nanetti, A. Tartari, Multivariate image analysis of ECoSp Compton spectra, Nuclear Instruments and Methods in Physical Research B213 (2004) 712-716.
[8] C. Bonifazzi, S. Ferriani, G. Maino, L. Salmaso, A. Tartari, Three-way data analysis of γ-ray spectra. Material composition study and density imaging, in Proceedings of CLADAG 2003 Meeting of the Classification and Data Analysis Group of the Italian Statistical Society, Bologna, September 22-24, 2003, CLUEB, Bologna (2003) 63-66.
[9] C. Bonifazzi, S. Ferriani, G. Maino, A. Tartari, Multispectral examination of paintings and works of art: A principal component analysis approach, in Proceedings of CLADAG 2003 Meeting of the Classification and Data Analysis Group of the Italian Statistical Society, Bologna, September 22-24, 2003, CLUEB, Bologna (2003) 67-70.
[10] C. Bonifazzi, S. Ferriani, A. Romano, G. Maino, A. Tartari, Multispectral Examination of Paintings: A Principal Component Image Analysis Approach, in Proceedings of ART 2005 8th International Conference on Non-Destructive Investigations and Microanalysis for the Diagnostics and Conservation of the Cultural and Environmental Heritage, Lecce, May 15-19, 2005.
[11] C. Bonifazzi, G. Maino, A. Tartari, Multiple scattering and regularization algorithms for Compton scattering techniques, in Proceedings of International Conference on Nuclear Data for Science and Technology, Trieste, May 19-24, 1997, G. Reffo, A. Ventura, C. Grandi, eds., Atti di Conferenze della Società Italiana di Fisica, vol. 59, part II (1997) 1749-1751.
[12] C. Bonifazzi, G. Maino, A. Tartari, Regularization methods for the image restoration of electromagnetic and optical complex systems, in Proceedings of 12th International Conference on Analysis and Optimization of Systems, Images, Wavelets and PDEs, Paris, June 26-28, 1996, Lecture Notes in Control and Information Sciences 219, M.-O. Berger, R. Deriche, I. Herlin, J. Jaffré and J.-M. Morel, eds., Springer Verlag, London (1996) 269-281.
[13] C. Bonifazzi, G. Maino, A. Tartari, A regularization method for unfolding the measured data of different X-ray spectrometers in Compton scattering tomography, in Proceedings of 9th International Conference on Image Analysis and Processing, Firenze, September 17-19, 1997, Lecture Notes in Computer Science 1311, vol. II, A. Del Bimbo, ed., Springer Verlag, Berlin - Heidelberg (1997) 436-444.
[14] D. Biagi Maino, G. Gandolfi, L. Roversi, S. Ferriani, M. Galli, M. Magnani, G. Maino, A. Musumeci, D. Visparelli, C. Zambon, An imaging computing system for handling information about diagnostics and conservation of paintings, in Proceedings of 6th International Conference on Non-Destructive Testing and Microanalysis for the Diagnostics and Conservation of the Cultural and Environmental Heritage, ART-99, Roma, May 17-19, 1999, vol. II, Roma (1999) 1247-1262.
[15] D. Biagi Maino, S. Bruni, L. Ciancabilla, S. Ferriani, G. Gandolfi, G. Maino, D. Visparelli, Advances in digital image processing and archiving for works of art, in Proceedings of workshop on Artificial Intelligence for the Cultural Heritage, Bologna, September 14, 1999, L. Bordoni (ed.), Bologna (1999) 109-119.
