
ON THE STRUCTURE OF INFORMATION TRANSFER NETWORKS

ALAN PRITCHARD
FLA MIInfSc MBCS
A thesis submitted to the CNAA
in partial fulfilment of the requirements
of the M Phil degree
Polytechnic of North London
School of Librarianship
April 1984
CONTENTS
Contents
List of figures
Acknowledgements
Abstract
[1] Introduction
1.1 Transport and society
1.2 Transport geography and information transfer
[2] Models in library and information science and communication
2.1 Introduction
2.2 Models
2.3 A communication model
[3] A general theory of transport
3.1 Circulation
3.2 Transport systems
3.3 Communication networks
3.4 The transport system model
3.5 Modes of transport
[4] A general theory of information transfer
4.1 Information transfer model
4.2 Operational milieu
4.3 Need for information
4.4 Structural components of the system
4.5 Systems operations and level of service
4.6 Flows through the system
4.7 External policies and decisions
4.8 Summary
[5] Graph theory in library and information science
5.1 Introduction
5.2 Garner
5.3 Korfhage et al.
5.4 Pritchard
5.5 Cummings and Fox
5.6 Fialkowski and Jastrzebski
5.7 Shaw
[6] Measures of networks
6.1 Introduction
6.2 Existing work
6.3 Beta Index
6.4 Cyclomatic Number
6.5 Alpha Index
6.6 Diameter
6.7 Gamma Index
6.8 Eta Index
6.9 Pi Index
6.10 Theta Index
6.11 Iota Index
6.12 S-I Index
6.13 Degree of Connectivity
6.14 Dispersion and Connectivity
6.15 Connectivity Matrices
6.16 Summary
[7] Analysis of citation networks
7.1 Introduction
7.2 Introduction to the results
7.3 Growth of Sigma E
7.4 Beta Index
7.5 Gamma Index
7.6 S-I Index
7.7 Growth of paths
[8] Summary and conclusions
8.1 Introduction
8.2 Information transfer model
8.3 Graph theory
8.4 Implications
Bibliography
Appendices
Appendix 1 Patent network 1: Electrophotography
Appendix 2 Patent network 2: Ziegler-Natta catalysis
Appendix 3 Patent network 3: Lasers
Appendix 4 Patent network 4: EMI scanner
Appendix 5 Bibliometrics network
Appendix 6 Computer programs

LIST OF FIGURES
1. Information transfer system and process
2. Schematic representation of transport systems
3. A horizontal analysis of transportation
4. The transport system model
5. The information transfer model
6. Tree and star graphs and Beta
7. Ring graphs and Beta
8. Star graphs and Mu
9. Tree graphs and Mu
10. Networks differentiated by Theta
11. Non-directed network for S-I analysis
12. S-I Index plane
13. S-I Index for airline networks
14. Network for connectivity matrix analysis
15. Patent network 1 - Sigma E
16. Patent network 2 - Sigma E
17. Patent network 3 - Sigma E
18. Patent network 4 - Sigma E
19. All patent networks - Sigma E
20. Bibliometrics network - Sigma E
21. Patent network 1 - Beta
22. Patent network 2 - Beta
23. Patent network 3 - Beta
24. Patent network 4 - Beta
25. All patent networks - Beta
26. Bibliometrics network - Beta
27. Patent network 1 - Gamma
28. Patent network 2 - Gamma
29. Patent network 3 - Gamma
30. Patent network 4 - Gamma
31. All patent networks - Gamma
32. Bibliometrics network - Gamma
33. Patent network 1 - S-I Index
34. Patent network 2 - S-I Index
35. Patent network 3 - S-I Index
36. Patent network 4 - S-I Index
37. Bibliometrics network - S-I Index
38. All patent networks - Growth in paths
39. Bibliometrics network - Growth in paths
ACKNOWLEDGEMENTS
This study of information transfer networks has been carried out
on a part-time basis at the Polytechnic of North London, under the
supervision of Mr J. Mills of the School of Librarianship. The author
would especially like to thank both Mr Mills and his other supervisor,
Professor A. J. Meadows of Leicester University for their help and
guidance over several years. Thanks are also due to the City of Lon-
don Polytechnic for giving generous amounts of study leave and for
allowing the use of computing equipment, to Stephen Swanson for writ-
ing the program NETWRK as his project in the Modular Degree and
Diploma Scheme at the City of London Polytechnic, and to Drs P.
Layzell Ward and C. Dixon for very useful discussions on information
transfer and transportation geography respectively. Finally, it would
have been difficult to compile the bibliography on bibliometrics
without the help of a number of libraries (especially the City of
London Polytechnic, Aslib and the Library Association). The Library
Association Library, in particular, obtained many early U.S. Masters
and PhD theses, without which the network would have been much
less complete.
ABSTRACT
ON THE STRUCTURE OF INFORMATION TRANSFER NETWORKS
Thesis submitted in partial fulfilment of the M Phil degree
ALAN PRITCHARD
The aim of this thesis has been to determine whether it is possi-
ble to distinguish and to classify information transfer networks on
the basis of their topological structure. A number of index measures
have been used in transportation geography and part of the work has
been to review these and to see to what extent they might be useful in
studies of information transfer networks. There would seem to be a
close analogy between the concerns of transport geography and those of
information transfer which would make the successful use of these
graph-theoretic measures likely.
A survey of existing models of information transfer is made and
a new model presented. Within this new model, various measures of the
topological structure of a graph or network are examined and their
characteristics and relevance to information transfer discussed. A
modification has been made to the S-I Index to cater for the differ-
ences between information transfer and transport geography. Some of
the measures are then tested on one particular information transfer
mode and the results presented for five citation networks (four patent
networks and the network of a comprehensive bibliography on
bibliometrics). It is shown that the measures tested are capable of
discriminating between the various citation networks. The patent net-
works were found to have a similar pattern on the various graphs,
which was different from that of the bibliometric network.
CHAPTER 1
INTRODUCTION
"Transport is civilisation" (Kipling)
1.1 Transport and society
The concept of transport is fundamental to the workings of
society. In the sphere of economics, the movement of goods from the
point of manufacture to the point of consumption is an essential
feature of any society which exists at more than the bare subsistence
level.
Another feature which distinguishes Homo sapiens in general, and,
particularly, societies which operate at rather more than a bare
subsistence level, is the ability to transport (or transmit) ideas,
information and knowledge across both space and time. Spatially,
information is transmitted from community to community and thus can be
considered as a special non-material form of transport. Through time,
information etc is transmitted from generation to generation, thus
enabling a later generation to use the knowledge acquired by an ear-
lier one, without having to rediscover it for itself. This feature,
which Korzybski called "time-binding" (Korzybski 1958; Weinberg 1960),
is that which essentially distinguishes man from the animals, in that
one generation can purposefully receive and build cumulatively upon
the work of previous ones. In Newton's words, "by standing on the
shoulders of giants".
This underlying unity in the concept of transport (about which
more will be said in the main part of the thesis) means that the tools
and methods constructed in one area may well be applicable to another.
Arising out of the discipline of transport geography, which was ini-
tially concerned with the spatial relationships of transport, there
has been an increasing interest in a more general theory which is con-
cerned with all forms of transport, both through space and through
time (although this concern with an integrated theory was first voiced
by Cooley as long ago as 1894). As yet, this general theory is still
relatively unformed and has been the exclusive property of geogra-
phers, although some broadly related work has been going on separately
in some other fields.
1.2 Transport geography and information transfer
It is the purpose of this thesis to attempt to draw together some
of these separate threads of work in disparate fields and to apply
these to that area of information science known as information
transfer. Along the way, an attempt is made to formulate a very
general theory or "schema" of information transfer and finally some
suggestions for further work are made.
In this general theory, specific areas such as transportation,
communication of ideas or mass communication may well form special
cases to which some or all of the general theory applies. There may
also be special theories in particular areas which are unique to those
areas.
However, there is value in the analogies which can be drawn from
one area and applied (with care and discrimination) in another. In
the particular area described in this thesis, there would seem to be a
number of ideas which can be drawn from transport geography and
applied to the communication of information generally (and to the more
specific area of scientific communication) and to areas of
bibliometrics and scientometrics.
These particular ideas can, at this stage, be summarised as:
[1] That there is a very close analogy between the physical transport
of physical objects (goods) across a spatial network and the
intellectual communication of ideas through a temporal network.
[2] That there is an analogy between the relationship of the
sub-networks (such as the rail network or the road system) to
the overall transport network, and the relationship of the
sub-networks (such as the citation network or informal
communications network) to the overall communications network.
[3] That the network analysis and classification methods which have
been applied to transport networks can, therefore, be applied to
communications networks. As these methods are based upon the
generalised mathematical method of graph theory, it is particu-
larly likely that they will be applicable to communications net-
works, albeit in a slightly modified form since (a) analogies are
rarely precise and (b) there are distinctive features about com-
munications networks which will force modifications.
[4] That such topological methods (based upon graph theory) should,
since they are concerned with the flow of information through a
communications or information network, be of use to the informa-
tion world generally in the planning of the tools and services of
information transfer.
In order to develop the graph theoretic measures of the information
transfer network, it is necessary first of all to discuss the overall
concept of information transfer, the processes involved and their
relationship to a general theory of transport.
There are, of course, models of information transfer which have
already been developed and discussed in the literature, most notably
the Bradford-Zipf model and the epidemic model of Goffman. Both of
these are stochastic models, which attempt to explain on the basis of
mathematical statistics the distribution of information among a number
of journal titles (in the Bradford-Zipf model) and the flow of information
from one person to another across space and time (in the Goffman
model).
The remaining chapters of this thesis are as follows:
[2] The second chapter discusses the nature and role of some models
in library and information science and in communication.
[3] The third chapter discusses the other theme of this thesis - the
role of an overall theory of transport and the models associated
with this.
[4] The fourth chapter develops a general theory of information
transfer within which the network structure is set.
[5] The fifth chapter discusses and summarises the relatively few
previous applications of graph theory in library and information
science.
[6] The sixth chapter then describes measures which have been used
(and develops some new ones) to classify a subject or discipline
on the basis of the graph theoretic model of the information
transfer network. These models are developed on an a priori
basis using ideas drawn and adapted from transport networks.
[7] The seventh chapter applies the most likely measures to some
citation networks in order to evaluate their effectiveness in
discriminating among different network structures.
[8] The final chapter then discusses the implications of the models
and network measures and the results obtained in terms of their
practical role in library and information science.
CHAPTER 2
MODELS IN LIBRARY AND INFORMATION SCIENCE AND COMMUNICATION
2.1 Introduction
Since the communication process is so very complex, it is neces-
sary to simplify it and to abstract its essential features so that
they can be examined without the "noise" of extraneous information.
This process of abstraction is usually known as modelling. There are
many kinds of model and it may be helpful at this point to set the
remainder of this thesis in context, firstly, by means of a general
discussion of the role and types of models and, secondly, by a
discussion of the types of models in the field of library and
information science, before going on, in the next chapter, to discuss
and develop a general model of transport which will include
information transfer as a special case.
Models represent a highly simplified and intelligible picture of
the real world - a simplified structuring of reality which presents
supposedly significant factors or relationships in a generalised form.
They must necessarily be highly selective approximations in that they
cannot include all associated measurements or observations. Indeed,
this must be the case, and a fully inclusive model is a contradiction
in terms, for a model which included all observations would map reality
so closely that it would cease to be an abstracted model and would
become reality itself. Furthermore, in many cases, the value of a model
is directly related to its level of abstraction. The concomitant of
selectivity is subjectivity. A model must necessarily be
highly subjective, since there must be a choice of what factors or
relationships are judged to be significant by the modeller. The con-
cept of choice involves that of subjective judgement about which fac-
tors are important for the purpose in hand.
The fundamental feature of the construction of a model is that it
has involved a selective attitude to information, wherein not only
noise but also less important but meaningful signals have been elim-
inated in order to enable us to see something of the heart of the
matter.
The advantages of a model (and particularly of mathematical
models, with which this thesis is concerned) are summarised well by
Edmundson (1967) as follows:
[1] May make it possible to use mathematical theories that otherwise
might appear to have no applicability to the problem.
[2] May be enlarged step-by-step to include aspects that were
neglected.
[3] May uncover new relations between aspects of the physical entity
that are not apparent in a verbal description.
[4] May suggest informational gaps that otherwise might be over-
looked.
[5] May reveal the scope and limitations of the theory being used.
[6] May permit improvement in future modelling efforts.
[7] May suggest more quantitative measures of adequacy.
[8] May be the cheapest and fastest way to predict.
Models may be classified in a variety of ways depending upon the
function of the classification. One way is to divide them into
descriptive models, physical models and mathematical models. This
study is concerned with mathematical models, which are usually
classified by the major mathematical technique used, e.g.
Mathematical models
    Deterministic models, using
        Logic
        Analysis
        Algebra
        Geometry
    Stochastic models, using
        Probability
        Statistics
2.2 Models
An examination of models in a variety of subjects shows that most
models are stochastic in nature, and this is equally true in library
and information science.
One of the two major mathematical models in the area of informa-
tion transfer is the Bradford-Zipf model. The original formulation of
this empirical law of scatter (Bradford 1953) was as follows:
the aggregate number of articles in a given subject, apart
from those produced by the first group of large producers
(periodicals), is proportional to the logarithm of the
number of producers concerned, when these are arranged in
order of decreasing productivity.
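Bradford's verbal statement can be read as saying that, once producers are ranked by decreasing productivity, the cumulative number of articles R(n) in the first n producers grows roughly as a + b log n beyond the initial core. The Python sketch below is only an illustration under that reading: the function names, the use of natural logarithms and the least-squares fit are assumptions of this example, not Bradford's own procedure.

```python
import math


def cumulative_articles(counts):
    """Rank producers (e.g. journals) by decreasing productivity and
    return the cumulative article totals R(1), R(2), ..., R(N)."""
    totals, running = [], 0
    for c in sorted(counts, reverse=True):
        running += c
        totals.append(running)
    return totals


def fit_bradford(totals, core_size):
    """Least-squares fit of R(n) = a + b*ln(n) for ranks n > core_size,
    leaving the first group of large producers outside the fit."""
    xs = [math.log(n) for n in range(core_size + 1, len(totals) + 1)]
    ys = totals[core_size:]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Synthetic check (not real data): counts built so that R(n) = 100 + 20 ln n.
true_totals = [100 + 20 * math.log(n) for n in range(1, 51)]
counts = [true_totals[0]] + [t1 - t0 for t0, t1 in zip(true_totals, true_totals[1:])]
a, b = fit_bradford(cumulative_articles(counts), core_size=1)
print(round(a, 6), round(b, 6))
```

With real counts the size of the core has to be chosen by inspection, since Bradford's formulation explicitly sets the first group of large producers apart from the logarithmic relation.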
It was soon recognised that Bradford's Law was essentially the same as
the relationship between the rank and the frequency of words (Zipf
1949), and the two names are generally coupled together. This is a
little unfair on Lotka, who had formulated a similar law (Lotka 1926)
relating authors to the numbers of papers which they produced.
Over the past fifteen years or so, there has been particularly intense
interest in these distributions, with many recent papers on both their
empirical and their theoretical aspects. Price has shown (Price 1976)
that all three laws noted above are special cases of an underlying
Cumulative Advantage Distribution, which models statistically the
situation in which success breeds success. The distribution also
underlies the Pareto Law of income distribution.
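The idea that "success breeds success" can be illustrated with a small simulation. The Python sketch below uses a Simon-style urn process, which is one common way of generating cumulative-advantage distributions; it is not a reproduction of Price's (1976) formulation, and the parameter p_new (the chance that an event starts a new source, such as a new author or journal) and the other names are assumptions of this example.

```python
import random


def simulate_cumulative_advantage(n_events, p_new=0.1, seed=1):
    """Return the final number of events credited to each source.

    Each event either starts a new source (probability p_new) or is
    credited to an existing source chosen with probability proportional
    to the number of events that source already has.
    """
    rng = random.Random(seed)
    totals = [1]   # the first event starts the first source
    owners = [0]   # one entry per event, naming the source it went to
    for _ in range(n_events - 1):
        if rng.random() < p_new:
            totals.append(1)
            owners.append(len(totals) - 1)
        else:
            # Sampling uniformly from past events picks a source in
            # proportion to its current total: success breeds success.
            src = rng.choice(owners)
            totals[src] += 1
            owners.append(src)
    return totals


sizes = simulate_cumulative_advantage(5000)
mean_size = sum(sizes) / len(sizes)
print(len(sizes), max(sizes), round(mean_size, 1))
```

With a small p_new, a few early sources accumulate a disproportionate share of the events, which is qualitatively the skewed pattern that the Bradford, Zipf and Lotka laws describe.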
The other major model in information transfer has received less
interest from the library and information world, although, as with
Bradford-Zipf, it, too, is very wide-ranging, with links and analogies
in a number of other fields. This is the Goffman epidemic model, which
uses the medical epidemic process as an analogy for the growth of
interest in a subject and its subsequent decline. Workers in the
field who are publishing can be considered as carriers of the
disease. This, too, is a probabilistic model, which has been shown by
Worthen (1973) to be similar to Menzel's diffusion process and hence
to the diffusion models used by geographers (e.g. Hagerstrand 1967)
which model the spatial diffusion of ideas, techniques, etc.
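The epidemic analogy can be made concrete with the classic three-compartment (Kermack-McKendrick, or SIR) equations. The Python sketch below is a simplified deterministic illustration and is not Goffman's own system of equations: here S stands for potential workers not yet active on a topic, I for those publishing on it (the "carriers"), and R for those who have left it. The rate constants beta and gamma, the step size and the simple Euler scheme are assumptions chosen for clarity.

```python
def sir_epidemic(s0, i0, beta, gamma, dt=0.1, steps=1000):
    """Euler integration of
        dS/dt = -beta*S*I/N
        dI/dt =  beta*S*I/N - gamma*I
        dR/dt =  gamma*I
    with N = S + I + R held constant. Returns the (S, I, R) history."""
    n = s0 + i0
    s, i, r = float(s0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_cases = beta * s * i / n * dt   # susceptibles who start publishing
        departures = gamma * i * dt         # carriers who leave the topic
        s -= new_cases
        i += new_cases - departures
        r += departures
        history.append((s, i, r))
    return history


history = sir_epidemic(s0=990, i0=10, beta=0.5, gamma=0.1)
active = [i for _, i, _ in history]
peak_step = active.index(max(active))
print(peak_step, round(max(active), 1), round(active[-1], 3))
```

The rise and subsequent fall of I corresponds to the growth of interest in a subject and its later decline described above.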
Another interesting approach to the modelling of information
transfer is that of Avramescu (1975), who uses the heat diffusion
equation of Fourier (a partial differential equation). He also
derives Bradford's Law from this equation and proposes a physical
electric network model of the diffusion of scientific information.
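For reference, Fourier's heat (diffusion) equation in its simplest one-dimensional form is shown below. The symbols are generic (u is the diffusing quantity, x position, t time and D a diffusion coefficient); how Avramescu maps them onto the spread of scientific information is not described here, so this form only illustrates the type of equation involved.

```latex
\frac{\partial u}{\partial t} = D \, \frac{\partial^{2} u}{\partial x^{2}}
```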
2.3 A communication model
There is, in addition, one other descriptive model of communica-
tion in a more general sense which is applicable to this discussion
and which will be used in more detail at a later stage. It is, like
Garner's work (referred to in Chapter 4), curiously neglected by the
library and information world. The model was developed by Havelock
and his co-workers (Havelock 1971) at the Center for Research on Util-
ization of Scientific Knowledge (CRUSK) at the University of Michigan,
in the course of an extensive review of the literature on the dissemi-
nation and utilisation of knowledge, as a deliberate attempt to syn-
thesise the previously rather scattered results. It is set in the
context of a more personal exchange of information (although it has
obvious extension to the more formal scientific communication system
and to communication between organisations and systems) and it largely
ignores work in librarianship and information science.
The model consists of two elements (Havelock 1971, 1-10 to 1-14):
the knowledge transfer process and the knowledge flow system. The
process can be considered as a linkage between a potential "user" and
a potential "resource" and can be analysed into six categories or
problem foci by the formula:
WHO says WHAT to WHOM by WHAT CHANNEL to WHAT EFFECT for WHAT PURPOSE
However, this process forms only the "microsystem" within an
overall "macrosystem" which is concerned with the overall flow of
knowledge in the chain:
BASIC RESEARCH scientists and systems
to
APPLIED RESEARCH AND DEVELOPMENT
to
PRACTITIONERS, practice groups and practice systems
to
CONSUMERS, consumer groups and society as a whole
The flow is not, of course, the strictly linear one shown above,
but there is both two-way communication and interdependence between
these four groups. The whole system and the relationship between the
macrosystem and the micro process is illustrated in Figure 1.
FIGURE 1: Information transfer system and process
[Diagram not reproduced: the knowledge flow SYSTEM, linking THE
KNOWLEDGE RESOURCE and THE KNOWLEDGE USER through organizations,
organizational sub-units and linkages (upper part), and the knowledge
transfer PROCESS (lower part).]
Two things should be noted about the lower diagram in Figure 1, as
compared with the verbal formulation above. One is
that the concept of PURPOSE is in the WHO section of the diagram as
WHY, rather than explicitly as PURPOSE. WHY also appears as part of
the TO WHOM part of the process, thus indicating that the process is
in fact two-way traffic and that the concept of TO WHAT EFFECT must
also be affected by the personal characteristics of the person to whom
the original message was given.
The other interesting features of the diagram which are not in
the verbal description are the bands of "interpersonal barriers" which
lie between the WHO and the WHOM and the linkage (the channel of
communication). These recognise that, as well as channels of
communication, there are obstacles, and that there can be both
personal and technical problems in transmitting a message (coding it
into whatever form the channel requires) as well as personal and
technical problems in receiving a message (decoding it from the code
used by the channel).
This model is described primarily in terms of interpersonal com-
munication and the channels are relatively ignored in Havelock's book.
However, the model is more generally applicable and can be considered
in terms of the inter- or intra-institutional transfer of information
as well. One immediate improvement to the model would be the addition
of institutional and system barriers which may be interposed between
supplier and user of information; e.g. a catalogue or classification
scheme may interpose a greater or lesser barrier to the use of a
library or information centre by a user. The model then provides a
conceptual framework within which many library and information
activities may be evaluated.
CHAPTER 3
A GENERAL THEORY OF TRANSPORT

3.1 Circulation
The closest approach so far to an overall theory of transport is
that which is designated by the term circulation used by French geo-
graphers. It has no exact English equivalent but it represents very
well a concern with all spatial interactions and connections, whether
they be flows of goods, people, money, credit, ideas, innovations,
knowledge, or information (Eliot Hurst 1974, p.2).
In 1894, however, the American geographer C.H. Cooley wrote as
follows (quoted from Eliot Hurst 1974, p.18):
Transportation, in a social sense, aids the physical organi-
zation of society. Without movement and connectivity the
constituent parts of society could not be unified or inter-
dependent; specialization of activity would be impossible.
To link up those parts, to emphasize those relationships,
communication occurs. Communication is here used in the
widest sense of the communication of ideas and goods between
places spatially and temporally separated. These are the
threads that hold society together; upon them all unity
depends. Using this wide definition of communication we can
categorize movement as:
1. The mechanism of material communication
a Place communication-
b Time communication - storage, etc.
2. The mechanism of psychical communication
a Place communication - speech, physical gestures, news-
papers, mail, telegraph, etc.
b Time communication - the printed/written word as a
document of cultural continuance; the spoken myth, etc.
Cooley's work is of particular importance for a number of
reasons. Firstly, his classification of the various forms of
communication is interesting because of the way in which he
includes "psychical communication" (i.e. the communication of
ideas) and then breaks this down into Place and Time communica-
tion. He takes a remarkably holistic view of geography and one
that would seem to encompass many more facets than the currently
accepted view of the subject that is concerned with the (admit-
tedly very wide) study of all aspects of spatial relationships in
society. Secondly, the examples that Cooley gives under "Psychi-
cal communication" foreshadow many of the current concerns of the
information world. Cooley does not explain his terms in any
detail, but from the context of the original writings, it would
seem that by "The spoken myth" Cooley was intending to mean what
we would now call the Invisible College and informal communica-
tion, rather than the more anthropological and social concepts of
myth as propounded by Levi-Strauss and others. "Cultural con-
tinuance" would seem to mean the study of information communica-
tion and the scientific communication process.
It seems to me that Cooley quite clearly saw the essential
unity between spatial and temporal flows and was suggesting that
geographers should be interested in both, and that the subject
should really consist not only of the traditional study of the
spatial aspects of society but should also include elements from
the discipline of history to form a unified system of study.
3.2 Transport systems
Cooley's work has been condensed and reprinted in Eliot
Hurst (1974). This collection of reprints (and one original
article) is a major landmark in the process of broadening the
scope of transportation geography into an Anglo-Saxon equivalent
of circulation. Not only does Eliot Hurst link the sections and
the individual reprints (which are, in general, very network-
oriented) with a commentary and the progressive development of a
model, but there is also a specially written piece on communica-
tions geography (Abler 1974).
In the introduction to the major section on network
analysis, Eliot Hurst discusses the problems that the variability
of network types causes and the difficulty of comparing networks.
He then introduces graph theory as the means by which the
geometrical (or, rather more specifically, the topological)
properties of the networks can be compared and the common properties
analysed:
"No matter what type of system is studied common
geometrical properties can be analyzed, such as (1)
origins or nodes, (2) routes or links, and (3) destinations
or nodes (or, more simply, nodes and routes).
Networks are structures designed to tie together nodes
via routes, whether they be flows of people, goods,
money, information or anything else that is moved from
one place to another. A transport network is then 'a
set of geographic locations interconnected in a system
by a number of routes' (Kansky 1963, p.1). The
geometric structure of a transport network is the topo-
logical pattern formed by those elements, the nodes and
the routes.
This work on the geometric properties of networks has
two main thrusts: (1) to ascertain what measures will
describe the structure of transport networks and (2) to
ascertain how these measures are related to the characteris-
tics of the area in which the transport networks are
situated. Arising from these two main research areas have
been attempts to simulate networks - that is, given some
information about the nature and development of an area, can
the network(s) be replicated?"
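The description of a network as nodes tied together by routes translates directly into the graph-theoretic quantities used in transport geography. The Python sketch below, offered only as an illustration, treats routes as undirected links and computes the number of nodes (v), routes (e) and separate sub-graphs (p), together with two measures used in transport geography: the Beta Index, e/v, and the cyclomatic number, e - v + p. The function name network_summary and its input format are assumptions of this example.

```python
def network_summary(nodes, routes):
    """Summarise an undirected network given its nodes and routes.

    nodes  -- iterable of node labels (isolated nodes may be included)
    routes -- iterable of (node_a, node_b) pairs; duplicate routes and
              self-loops are ignored
    """
    adjacency = {node: set() for node in nodes}
    links = set()
    for a, b in routes:
        if a == b:
            continue
        links.add(frozenset((a, b)))
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    v, e = len(adjacency), len(links)
    if v == 0:
        raise ValueError("network has no nodes")

    # Count connected sub-graphs (p) with an iterative depth-first search.
    seen, p = set(), 0
    for start in adjacency:
        if start in seen:
            continue
        p += 1
        seen.add(start)
        stack = [start]
        while stack:
            for nxt in adjacency[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)

    return {"nodes": v, "routes": e, "subgraphs": p,
            "beta": e / v, "cyclomatic": e - v + p}


ring = network_summary("ABCD", [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")])
tree = network_summary("ABCD", [("A", "B"), ("B", "C"), ("C", "D")])
print(ring)
print(tree)
```

A tree has a cyclomatic number of zero, while each additional circuit (as in the ring) raises it by one; Beta rises as routes are added relative to nodes.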
In this excerpt we can see the concern with a broad outlook
on the questions and problems of transport and this is reinforced
by the following diagram (Eliot Hurst 1974, p.54):
Schematic Representation of Transport Systems
(Source: from an idea by B. Turnbull)
Intangible (imaginary) networks: the network has no virtual
existence independent of transport
    Messages/information transported: Radio, Television
    Goods/people transported: Airlines, Steamer route
Tangible (object) networks: the network exists independently of
the unit of transport
    Messages/information transported: Telephone, cable systems
    Goods/people transported: Railway, Highway
Figure 2: Schematic representation of transport systems
The concerns which Eliot Hurst mentions in the quotation
above are directly mirrored by the concerns of this thesis in the
field of information transfer. Again, these are to ascertain
what measures will describe the structure of networks and to
ascertain how these measures are related to the subject area in
which the network is situated.
3.3 Communication networks
Abler (in Abler 1974) is very much concerned with the spa-
tial nature of the communications network and his examples are
drawn from the areas of spatial interaction analysis (citing such
studies as Zipf 1946, Zipf 1949 and Nystuen & Dacey 1961) and
diffusion theory (e.g. Hagerstrand 1967 and Brown 1968).
However, this is again an area which overlaps considerably
with information science. The connections between the two areas
were explicitly drawn by Worthen (1973) who showed that the work
of Menzel on innovation diffusion and Goffman's epidemic theories
were directly related.
Yet another holistic view of transport is presented in the
table "A horizontal analysis of transportation" in Eliot Hurst
1974, p.382, and reproduced here as figure 3:
FIGURE 3
A Horizontal Analysis of Transportation
Human communication
    Inventory: Words, sounds, symbols, smells, taste and visual contacts; printed word
    Network: Fields of personal, public distribution channels, radio-TV networks
    Flow: Flows of ideas, facts
    Mode: Face-to-face, telephone, radio, television
    Relationship: Social communication
Urban transport
    Inventory: Cars, trucks, buses, trains; miles of street, freeway, subway
    Network: Streets, sidewalks, tracks, freeways
    Flow: Flows of ideas, money, goods, people
    Mode: Pedestrian, automobile, truck, bus, rapid transit
    Relationship: Balanced urban transport system
Developing areas
    Inventory: Cars, trucks, freight cars, etc.; miles of road, track, etc.
    Network: Road, rail, water, air networks
    Flow: Flows of goods, equipment, machinery, people, ideas
    Mode: Railroad, truck, automobile, water transport, airplane, hovercraft
    Relationship: Transport as a neutral factor: concomitant with growth, sometimes a catalyst
Highway impact
    Inventory: New and projected highway miles, expected traffic
    Network: New freeway, roadway networks
    Flow: Volume of diverted, generated, non-diverted traffic
    Mode: Automobile, truck, bus
    Relationship: Impact on region: growth and change of tertiary activities, units and other economic activities
Recreation
    Inventory: Cars, buses, planes, ski lifts, etc.; highways, air line miles
    Network: Road, rail, air networks
    Flow: Flows of people, money
    Mode: Automobile, bus, plane, skidoo, etc.
    Relationship: Increased leisure time, regional impact, growth of tertiary activities
3.4 The transport system model
The model developed by Eliot Hurst (1974, p.7) relates five
elements and sets these within a social context:
[1] An inventory (a stock of road route miles, number of vehi-
cles, etc.)
[2] A network (the geometric structure of the route system)
[3] Flows (what movement occurs, and how intensively)
[4] Modal systems (what type of transport occurs)
[5] The interrelationships of the four elements above.
The full diagram is given on the following page as Figure 4.
FIGURE 4 - THE TRANSPORT SYSTEM
[Diagram with boxes labelled: Societal framework (milieu); Activity, Demand, Supply; Network, Modes, Stocks; Imbalance; Positive potentials (compete, complement); Systems operations (routing, scheduling, pricing, types of service, equipment); Composition; Choices amongst alternatives; Time variation; Level of service; Non-transport decisions (land use controls, national economic policy, monetary policy, political ideology).]
Fig. 1-2 The transport system. Positive potentials refer to complementarity, intervening opportunity, cultural affinity, etc. Negative potentials refer to distance, costs, transferability, political barriers, etc.
3.5 Modes of transport
The description given by Eliot Hurst (1974, pp.8-11 and
pp.287-291) is not reproduced here for two reasons: (a) it is much
too lengthy and (b) it concentrates on the more purely transport
geography aspects of the models and so is not strictly relevant
to this thesis, except as a starting point for the extension of
the model into the area of information transfer. However, some
sections of the explanation are more directly relevant and relate
closely to information.
To quote a few lines relating to the concept of modes of
transport (Eliot Hurst 1974, p.9):
Each mode of transport, for reasons which will become
evident later, plays a different, though sometimes
overlapping, role in the supply of transportation.
Thus railroads seem best suited to long-haul transpor-
tation of heavy commodities, which are often of small
value per unit of carload traffic. On the other hand
air transport specializes in speed, long-haul, and
high-value goods needing rapid delivery.
It seems likely that we could transfer these concepts to
information transfer with relatively little loss of meaning by
equating the transport modes given in the quotation with familiar
information transfer modes. Thus "railroads" could be considered
as analogous to "journals", in that both are relatively
"slow-moving" and are concerned with "long-haul" transportation
(in the case of journals, in a temporal sense). "Air transport"
might then be considered as analogous to "informal networks";
both are concerned with the rapid movement of perishable
commodities. These ideas are not to be taken too seriously. More
considered study and development of the model takes place in the
next chapter.
CHAPTER 4
A GENERAL THEORY OF INFORMATION TRANSFER
4.1 Information transfer model
The work done by Havelock (1971) and by Eliot Hurst (1974)
can be blended together, modified and used as the basis of a gen-
eral theory of information transfer. Within this general theory,
other theories can be accommodated as specialised approaches to
particular areas. This chapter is of necessity condensed. I
believe that the ideas which are sketched in here could well be
expanded into a study in their own right. In some areas, the
translation from a transport model has not been fully worked out
and more work remains to be done. The model proposed was not the
main objective of the thesis, however.
It was felt that the later study of one specific aspect of
information transfer networks (their graph-theoretic structure)
with which the succeeding chapters are concerned and which con-
stitutes the body of this thesis, should be seen within the con-
text of a relatively coherent model of information transfer. In
addition, the model has the advantage of clearly distinguishing
between the structural analysis and the flow analysis of net-
works. The model is reproduced on the following page as Figure
5.
FIGURE 5 - INFORMATION TRANSFER MODEL
[Diagram within the SOCIETAL FRAMEWORK (operational milieu), with boxes labelled: [1] information need, between information nodes linked by routes; [2] NETWORK (tangible/intangible), MODES (compete/complement) and STOCKS (fixed/mobile), with supply, demand, information imbalance, barriers and affinities; [3] SYSTEMS OPERATIONS (scheduling, pricing, composition, choices among alternatives, time variation); [4] LEVEL OF SERVICE; [5] FLOW; [6] POLICIES (investment, military, new technology); [7] NON-INFORMATION DECISIONS (education, research).]
4.2 Operational milieu
This model of the information transfer process can be
explained in the following way. The first step is to place the
whole approach into its socioeconomic and political setting; this
defines the basic operating conditions within which the information
system must be examined (the operational milieu). This
allows us to take into account such factors as who owns and con-
trols the means of production within a given society, who creates
and develops information technologies and what ends they serve.
Although the above has been couched in the very broadest
terms of society as a whole, the approach is applicable at any
level of study, and this generality is already one of the
strengths of the model. At the most general level, we can
consider the international information transfer (or technology
transfer) system, where the "societal framework" is the world
community of nations and their relationships. At the slightly
lower level of the national information transfer system, the
"societal framework" is the nation itself. If we wish to study a
library operating within an educational institution, then the
"societal framework" becomes the college or university which the
library serves, and the interactions described later include
interactions with other information transfer systems within the
institution. At an even lower level, a department within the
library then operates within the "framework" of the library as a
whole.
4.3 Need for information
Within this operating environment, the first stage
(corresponding to the box marked [1] in the diagram above) in any
information transfer system is the need or desire for interaction
to take place. Interaction arises from three basic causes:
[1] an awareness of the lack of information
[2] a desire to obtain information, and
[3] spatial or temporal separation from the means of satisfying
these needs or desires.
These three factors are present even in the simplest act of
communication; for example, there is spatial separation between
two persons even in a face-to-face situation. Information
transfer is not some inherent structure which exists in society
as an end in itself. It is only part of a broader set of
utilities influenced by individual and group aspirations and
needs, lifestyles, political motivations and socioeconomic
relationships.
4.4 Structural components of the system
The second section of the model (marked [2] in the diagram)
is concerned with the basic structural components of the
information transfer system: the modes of information transfer,
the stocks and the networks.
The term 'modes' is used within the information transfer
context to indicate the various broad means of information
transfer, eg. informal communication, the formal
journal/scientific paper network, etc.
The term 'stocks' used in the transport model is used within
the information transfer model to mean the equipment whereby
information is transferred along a particular channel: the
medium rather than the message. The term is generally used in a
fairly broad sense to encompass, say, television equipment as the
means whereby information is transferred.
Information transfer is constrained by the channels of the
network, the characteristics of the particular modes of
information transfer and the facilities of both fixed and mobile
stocks of information transfer media. With many information
transfer systems, the stocks of basic equipment are fixed (eg.
cables or television equipment) or are unusual in that they are
tangible but transient (eg. a newspaper). The networks themselves
consist of nodes and the links between them (and are the primary
object of study of this thesis). Each node or activity site
represents a point of demand and/or supply of information, and
each link corresponds to a specific transfer channel. The links
may be tangible and well defined (eg. cables) or, more usually,
diffuse and intangible (eg. radio or face-to-face contact).
Figure 2 above provides a breakdown using this form of
classification. A channel of communication may comprise more than
one link between nodes, and the links do not necessarily have to
be of the same type. For example, the complete communication
network between an author and the eventual recipients of the
information that he is supplying would be a very complex one,
using perhaps several different modes of information transfer
(books, scientific journals, television) and hence different
stocks along different channels. The links along even one channel
may move from being fixed (a cable to a broadcasting station) to
intangible (the radio
channels. In this model, there is a definite barrier between the
node and the channel which has a degrading effect upon the
information transfer process. At this level, the category
analysis of the process can also be applied. The knowledge flow
system model (the upper portion of Figure 1) is absorbed as a
special case into the overall information transfer model.
4.5 Systems operations and level of service
Depending on the constraints of the operational milieu, the
system operator or decision maker has open to him a range and
variety of transfer options (stage [3] of the model). The
individual user of the system, whether as a potential reader,
broadcaster or writer, is relatively powerless to influence the
decision making concerning the system and sees the information
transfer systems as essentially fixed. On the whole, he can only
choose the mode and time for information transfer within the
constraints of the system, after the event. The individual has
power mainly in the aggregate, as "The Market", and a sufficient
number of individuals "voting with their feet" can then have a
considerable impact on the decision making of the system
operators. The operators (eg. publishers) can actually establish
the networks and schedules, the pricing and reliability and all
the other factors which go to make up the concept of "level of
service" (stage [4] of the model), although they in turn must
reflect to some extent factors external to them (government
decisions, market forces, other publishers, etc.).
4.5.1 Service
This level of service is also constrained by certain societal
values and goals such as profitability and community welfare. In
addition, policy makers can govern the types, numbers and
availability of stocks in the system (eg. by tax policies); can
add new links or close down old ones (eg. by opening a new
television channel or by closing a local newspaper); can improve
operating characteristics; and, at a higher level, where
important, can regulate competition, assign operating rights (eg.
by placing a new contract for Research in Education) and set
rates.
This full set of systems operations is usually only open to
a single organisation or agency in fully planned economies (and
even this is somewhat debatable, since there are always multiple
power nodes in political systems which appear on first sight to
be monolithic). Even in a planned economy it is unlikely that
the whole range of information transfer modes would be covered,
perhaps only one might be - say, scientific journal publishing.
The other modes may well be under the control of different agen-
cies. In other types of economies, many individuals and agencies
will be involved and, in order to understand fully the current
information transfer process, it becomes essential to study the
variety of legislative and regulatory constraints and agencies
(many of them conflicting) which have influenced information
transfer in its historical, political and socioeconomic aspects.
4.6 Flows through the system
In response to demand, routes, systems operations and level
of service, a flow of information moves through the system (stage
[5] of the model). Flows are the volumetric measures of the
interaction and of the successful interrelationships of the other
components. There is feedback to the level of service, to
operating characteristics and, ultimately, to demand itself. The
analysis of flows is a major topic in its own right, and the
classic general text in this area is Ford and Fulkerson (1962). A
very brief survey of some library and information science
applications of network flow analysis appears in the next
chapter, as part of a survey of the network literature.
4.7 External policies and decisions
Finally, information transfer decisions are influenced by a
wide variety of directly related (stage [6] of the model) and
indirectly related (stage [7]) factors. The direct factors
include general investment decisions which may affect the supply
of and the demand for information. There may also be military
decisions which affect the whole area of information transfer,
eg. the setting up of the National Technical Information Service
and its predecessors in the United States, or of the Department
of Scientific and Industrial Research in the United Kingdom. Other
information-oriented developments include the whole area of new
technology: viewdata, teletext and the integration of office
equipment, microprocessors and data communications technologies.
All of these options in this sixth stage can modify the working
of the other five stages.
Besides direct influences, a number of other external and
basically non-information factors must also be taken into
account, since they can affect the demand for, and use of,
information transfer systems. National economic policies, or the
lack of them, can materially affect the distribution of demand
over both space and time. Policies towards education and training
can affect publishing (eg. the Industrial Training Act). Policies
towards scientific research obviously have an impact upon both
libraries and scientific journals.
4.8 Summary
The common threads presented in this overall model underlie
the great variety of information interactions which exist at all
levels of society and within all modes of information transfer,
from libraries to face-to-face communication, from manuscripts to
television. It is not an exhaustive development of the basic
ideas, but I believe that it does provide a basic framework for
analysing all communication processes. In many cases, the methods
of analysis may be the same as those used in other areas of the
even more general concept of circulation. In other cases, special
means of analysis may become more applicable and may not be
readily transferable, even to other information transfer modes.
An example of the first type is the theory of graphs, which is
widely applicable whenever one can link two or more nodes with a
line (or the concept of a line), be these atoms in organic
chemistry, linguistic concepts, transportation networks or
communications networks. An example of the non-transferable type
is Shannon's theory of information, which has specific and useful
application to data communications. In spite of much effort in
the late 1950s and 1960s, it has been found to be less applicable
to the more general concepts of information retrieval than was
hoped.
CHAPTER 5
GRAPH THEORY IN LIBRARY AND INFORMATION SCIENCE
5.1 Introduction
In spite of the frequent use of the word "network" within
the field of library and information science (eg. Price 1965),
the techniques and methods of analysing networks by means of
graph theory have been used very rarely.
This lack is all the more surprising when we consider that the
term "network" is used and well established in colloquial
parlance, eg. "interlibrary loan network". Two things have been
lacking so far:
[1] A recognition of the role of libraries and the materials
that they handle within a general theory of information flow
and communication; and
[2] consequently, the recognition that the tools that are
applicable in one area can be applied equally well in another.
This lack of recognition is not only a feature of library
and information science. The analysis of social structures and
of the social aspects of communication has used graph theory only
to a very limited extent, and the background reading in this area
for this thesis proved disappointing.
Potentially, there are many areas of librarianship and
information science which could benefit from a more formal
graph-theoretic analysis of the networks involved. The obvious
area is that of citation studies, and this is the one in which
some work has been done. However, virtually all library
applications involve some form of communication, and inevitably a
network is formed. In particular, the library interloan network
is a fairly obvious example, and this is also an area in which
some work has been done (see below). It would be possible to
extend some of this work by evaluating the performance of networks
with different structures and seeing whether the structure could
be correlated with the performance.
Associated with the issue of performance is that of the
structure of the organisation. The graph-theoretic
characteristics of a library's organisational structure could be
correlated with a number of performance measures relating to the
service to the outside world or to internal measures of staff
satisfaction. It would be interesting to compare the structures
of a number of County Libraries or Polytechnic Libraries in this
way.
There are a number of areas of information retrieval which
could be studied with the aid of graph theory. One area which
immediately springs to mind is that of thesaurus and
classification scheme structure. The syndetic structure within a
thesaurus would form a network, and measures of that structure
could be made. Again, these measures could be correlated with
measures of performance, either objective ones or possibly even
user opinions on the complexity and ease of use of the thesaurus.
There are a number of other areas which could be quoted in
support of this idea. In each case, graph theory provides
measures which summarise the characteristics of a highly complex
network and enable this structure to then be evaluated in terms
of some form of estimation of performance. The remainder of this
chapter reviews the major users of graph theory in library and
information science and describes the work of each author.
5.2 Garner
The earliest recorded use of graph theory appears to be by
Ralph Garner in his thesis (1966) which was reprinted as 1967a
and summarised as 1967b. The summary that follows is of Garner
1967a.
Garner begins by briefly outlining the literature of both
citation indexing (which was, by 1967, not very large) and that
of graph theory. He describes some of the major texts on graph
theory, and the bibliography contains a wide range of references
to many aspects of graph theory, not all of which were used
directly in the thesis.
Chapter III discusses the basic idea that a citation
relationship can be treated as a directed or an undirected graph
by mapping a set X onto X by means of a function Gamma, which is
termed the citing function. Following Garner's example, and to
simplify the preparation of this thesis, Gamma will be
represented by the capital letter T. Thus the expression Tx means
"the set of papers which have cited the set of papers x". The
entire graph is denoted by G = (X,T). This states that the graph
is composed of papers, and a function which relates those papers.
The chapter goes on to define the various powers of Gamma,
such as

$T^{2}x$

(ie. those papers which cite the papers which cite x) and

$T^{-1}x$

(ie. the papers cited by x), and the three subdivisions of a
graph. Further on in the chapter, there is a discussion of the
use of matrices to represent directed and undirected graphs.
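The citing function and its powers can be sketched with ordinary sets. This is a minimal illustration, not Garner's own procedure: the five-paper citation data and the helper names `T`, `T_power` and `T_inv` are hypothetical, and Tx is read as "every paper that cites at least one paper in x".

```python
# Hypothetical toy citation data: each paper maps to the papers it cites.
CITES = {
    "p1": set(),
    "p2": {"p1"},
    "p3": {"p1", "p2"},
    "p4": {"p3"},
    "p5": {"p2", "p4"},
}


def T(papers):
    """Citing function: papers that cite at least one paper in `papers`."""
    targets = set(papers)
    return {p for p, refs in CITES.items() if refs & targets}


def T_power(papers, n):
    """Apply T n times, so T_power(x, 2) is T^2 x = T(T(x))."""
    result = set(papers)
    for _ in range(n):
        result = T(result)
    return result


def T_inv(papers):
    """Inverse function T^-1: the papers cited by the papers in `papers`."""
    return set().union(*(CITES[p] for p in papers))


print(sorted(T({"p1"})))            # papers citing p1
print(sorted(T_power({"p1"}, 2)))   # papers citing the papers that cite p1
print(sorted(T_inv({"p5"})))        # papers cited by p5
```

Reading T as a set-to-set function keeps the notation close to Tx; a paper that no one cites simply yields the empty set.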
Chapter IV is entitled "Searching the citation index
network" and deals mainly with bibliographic coupling (drawing
upon Kessler's work) and the citation index search strategy of
"cycling".
Chapter V, "Locating paths in the citation network", is of
more direct interest, as the matrix notation is expanded and it is
shown that the matrix $A^{2} = T^{2}$ and that both represent the
two-step paths in the network. A tentative approach is made to
the problem of determining the critical (ie. the shortest) path
through the network.
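The link between the squared matrix and two-step paths can be checked on a small example. In this sketch the arcs run from a cited paper to the paper citing it (an assumed orientation), so entry A[i][j] is 1 when paper j cites paper i and entry (i, j) of A squared counts the paths i -> k -> j. The five-paper network and the helper names are hypothetical.

```python
# Arcs run from a cited paper to the paper that cites it (hypothetical data).
PAPERS = ["p1", "p2", "p3", "p4", "p5"]
ARCS = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
        ("p3", "p4"), ("p2", "p5"), ("p4", "p5")]


def adjacency_matrix(nodes, arcs):
    """A[i][j] = 1 when there is an arc from nodes[i] to nodes[j]."""
    index = {name: k for k, name in enumerate(nodes)}
    a = [[0] * len(nodes) for _ in nodes]
    for u, v in arcs:
        a[index[u]][index[v]] = 1
    return a


def matmul(a, b):
    """Plain matrix product of two square integer matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = adjacency_matrix(PAPERS, ARCS)
A2 = matmul(A, A)  # A2[i][j] counts two-step paths i -> k -> j
for name, row in zip(PAPERS, A2):
    print(name, row)
```

The non-zero entries in the row for p1 pick out p3, p4 and p5, the papers that cite a paper which cites p1.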
Finally, Chapter VI touches on some aspects of graph theory
as applied to relationships between authors using some of the
ideas drawn from sociology, although these do tend to deal with
graphs in a very restricted way.
Garner's thesis has been curiously neglected. It should
have sparked off a great deal of research in this area because,
although it does not provide many of the answers to the problems,
it does ask many of the right questions and throws out many ideas
and hints. It can be considered a seminal work.
5.3 Korfhage et al.
A series of papers arising from work at the Southern Metho-
dist University in the United States has been grouped together
since they are all on similar themes and are written by the same
group of workers (Korfhage, Bhat and Nance). In each case the
published version of the paper has been referenced, although
there are also some more detailed technical reports issued by the
National Technical Information Service (NTIS) as PB reports
and/or the Educational Resources Information Centre (ERIC) as ED
reports.
The earliest paper in this series is Nance (1972), which
considers library interloan or referral networks as directed
graphs. Libraries are considered in three roles: as initiators,
receivers or relays (both receiving information and passing it
on). This paper is a good illustration of the other major
approach to the analysis of networks - that of considering the
cost/flow/capacity of a network (the key reference here is Ford
and Fulkerson 1962) as contrasted with the topological structure
of the network.
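One simple way to assign the three roles is from the direction of the referral arcs alone: a library that only sends requests is an initiator, one that only receives them is a receiver, and one that does both is a relay. This is an assumed reading for illustration; Nance (1972) may define the roles differently, and the library names below are hypothetical.

```python
def classify_roles(arcs):
    """Label each library from its referral arcs (u -> v: u refers a request to v)."""
    senders = {u for u, _ in arcs}
    receivers = {v for _, v in arcs}
    roles = {}
    for library in senders | receivers:
        if library in senders and library in receivers:
            roles[library] = "relay"        # both receives and passes on
        elif library in senders:
            roles[library] = "initiator"    # only originates requests
        else:
            roles[library] = "receiver"     # only receives requests
    return roles


# Hypothetical referral network between four libraries.
referrals = [("Branch A", "Regional B"), ("Regional B", "National C"),
             ("Branch A", "Branch D")]
print(classify_roles(referrals))
```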
Korfhage, Bhat and Nance (1972) continues the study of
library networks by studying the Public Library Access Network.
The discussion of graph theory is largely in terms of the broad
class of network, eg. cyclic or decentralised, and in terms of
weak or strong connections. A formula is provided for the
"flexibility" F of a network in terms of Q, the number of edges,
and N, the number of nodes.
The concept of flexibility is concerned with the number of
alternative paths through the network, ie. a network which has
only one path through it is a very inflexible one.
A number of limiting conditions are given:
For a cyclic network, Q = N, so F = 0.
For a decentralised (totally connected) network, Q = N(N-1), so F = 1.
For a hierarchical (branching tree) network, Q = 2N-2, so F = 1/N.
For a two-regular network, Q = 2N, so F = 1/(N-2).
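If F is assumed to be linear in Q, the two extreme cases above (F = 0 when Q = N, F = 1 when Q = N(N-1)) fix it as F = (Q - N)/(N(N-1) - N) = (Q - N)/(N(N-2)). This is a reconstruction from the limiting conditions, not necessarily the formula as printed by Korfhage, Bhat and Nance; the sketch checks that it also reproduces the tree and two-regular values. The function name `flexibility` is hypothetical.

```python
from fractions import Fraction


def flexibility(q, n):
    """Candidate flexibility F = (Q - N) / (N(N - 2)), reconstructed from the
    limiting conditions; Q = number of edges, N = number of nodes (N > 2)."""
    return Fraction(q - n, n * (n - 2))


for n in range(3, 12):
    assert flexibility(n, n) == 0                       # cyclic network
    assert flexibility(n * (n - 1), n) == 1             # totally connected
    assert flexibility(2 * n - 2, n) == Fraction(1, n)  # branching tree
    assert flexibility(2 * n, n) == Fraction(1, n - 2)  # two-regular
print("all four limiting conditions hold for N = 3..11")
```

Exact fractions are used so that values such as 1/N compare without rounding error.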
Nance, Korfhage and Bhat (1972) continues this line of
approach and studies some structural measures of physical
networks, referring again to flexibility, to accessibility and to
the multiplication of the connectivity matrix to give the powers
of the matrix (which measure the number of referrals).
Korfhage (1974) is a re-examination of a paper by Crawford
(1971) on the informal communication networks in sleep research.
He studies these data in terms of directed graphs of contacts
between scientists.
5.4 Pritchard
An early paper (Pritchard 1972) foreshadowed some of the
ideas in this thesis by discussing the general theory of
information transfer in terms of a network and channels and the
flow of information through these channels. Bibliometrics was
defined as the "metrology" of information transfer (although,
more strictly, it should have been considered as the metrology of
printed information transfer only). A network and channel model
was developed and examples given of the flow of information in
computing literature. The flow of information was quantified
using data derived from a research project, and the suggestion
was made that the network model could be simulated using an
analog computer.
5.5 Cummings and Fox
Cummings and Fox (1973) describes the bibliography to a
paper as being represented by a directed graph. Two functions
are defined: the citation function and the bibliography
function. Two alternative methods are described and simulated by
randomly generated samples in an attempt to determine the more
effective strategy.
5.6 Fialkowski and Jastrzebski
A flow approach to information transfer in networks,
involving the use of directed graphs, was studied by Fialkowski
and Jastrzebski (1978). In this study, some nodes were considered
to be blocked; the nodes were weighted by the probabilities of
unblocking, and the branches from the nodes were weighted by the
costs of information transmission. The purpose of the model was to
derive a minimum cost path from the source to a destination node.
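A minimum cost path of this kind is commonly found with Dijkstra's algorithm. The sketch below is that standard algorithm, not Fialkowski and Jastrzebski's own method: it uses only the branch transmission costs (assumed non-negative) and ignores the node unblocking probabilities, and the network and node names are hypothetical.

```python
import heapq


def min_cost_path(graph, source, target):
    """Dijkstra's algorithm over non-negative transmission costs.

    graph: {node: {neighbour: cost}}. Returns (cost, path), or
    (inf, []) when the target cannot be reached.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue          # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical network: branch weights are transmission costs.
net = {"S": {"A": 4, "B": 1}, "B": {"A": 2, "D": 7}, "A": {"D": 1}}
print(min_cost_path(net, "S", "D"))
```

Here the indirect route S, B, A, D (cost 4) beats both two-branch routes, which is the kind of result a minimum cost path model is meant to expose.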
5.7 Shaw
Shaw (1981) attempts to use the Maxwell-Boltzmann statistic
from classical statistical thermodynamics to define the amount of
information in a graph. He concludes that it is not possible to
transfer the use of the M-B statistic directly, but derives
another statistic based upon the maximum number of co-author
graphs in a network. This work is not directly concerned with
the structure of a graph.
CHAPTER 6
MEASURES OF NETWORKS
6.1 Introduction
The problem of attempting to measure the topological proper-
ties of information transfer networks therefore resolves itself
into two parts:
[1] That of finding a measure or measures of those properties of
an information transfer network associated with the informa-
tion handling characteristics of the network, and
[2] That of testing the chosen measures, both on networks of
various types and on the same network at various stages of its
development, to ensure that the measures clearly differentiate
and group the various networks.
It is anticipated that the chronological development of sub-
jects could well result in a plot of measures on a diagram which
is analogous to the Hertzsprung-Russell diagram in astronomy. It
is further hypothesised that a similar "main sequence" and
anomalous pattern could emerge. Looking forward slightly to dis-
cussions of the S-I Index in Section 12 of this Chapter, the
basic graphical boundaries for this could well be the S-I Index
plane (see Figure 12).
6.2 Existing work
There has been very little work so far on explicit measures
of network structure and characteristics in library and
information science. Such work as has been done has been
exclusively concerned with a "cross-sectional" approach and has
not been explicitly concerned with comparative measures nor with
overall network measures. The main worker in this field is, of
course, Price, who in Price (1970) suggests two measures that can
distinguish disciplines and, to a certain extent, subjects:
[1] The percentage of references that have appeared in the last
five years
[2] The number of references per paper.
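Both of Price's measures are simple to compute from a list of papers and the years of their references. In this sketch the data are invented, the function name is hypothetical, and a reference counts as "recent" when it is at most five years older than the citing paper; the exact cut-off convention is an assumption.

```python
def price_measures(papers, window=5):
    """Return (percentage of 'recent' references, mean references per paper).

    papers: list of (publication_year, [reference_years]).
    A reference is counted as recent when it is at most `window` years older
    than the citing paper (an assumed cut-off convention).
    """
    total_refs = 0
    recent_refs = 0
    for pub_year, ref_years in papers:
        for ref_year in ref_years:
            total_refs += 1
            if 0 <= pub_year - ref_year <= window:
                recent_refs += 1
    percentage_recent = 100 * recent_refs / total_refs if total_refs else 0.0
    refs_per_paper = total_refs / len(papers) if papers else 0.0
    return percentage_recent, refs_per_paper


# Invented sample: two papers and the years of the items they cite.
sample = [(1980, [1979, 1978, 1970]), (1981, [1980, 1960])]
print(price_measures(sample))
```

The second value, references per citing paper, is the quantity the text goes on to identify with a cross-sectional Beta index.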
Price puts most emphasis on discussing the meaning of the
first measure and its relationship to scholarship. There is a
scattergram in the paper which plots measure 1 on the x-axis and
measure 2 on the y-axis. One of the most interesting points to
note about measure 2 is that it is the Beta index (see below)
derived from a cross-sectional survey.
Price's paper continues a long line of bibliometric studies
which have given more or less emphasis to this cross-sectional
Beta Index approach in the course of other work. Parker, Pais-
ley and Garrett (1967) quote figures for various time periods.
It is a measure which has been studied on many occasions in the
course of work on the growth of "Big Science" and the extent to
which more recent scientific developments have had an impact on
the number of references per paper.
The other measure that is very often quoted, again in the
course of other studies, is what can be called an "inverse
cross-sectional Beta Index", ie. a figure for the number of
citations to papers. This, too, may be worth plotting on a
continuous basis, although it is not studied as such in this
thesis.
The remainder of this chapter discusses various network
measures that might be used to characterise the literature (or
other aspects, such as informal communication) of a subject. They
have been drawn from a variety of sources, in particular from
transport geography, although some have been devised especially.
The sociological literature has also been consulted.
Network measures can be divided, broadly, into two groups:
[1] Those which describe the aggregate graph theoretic pattern
of the network as a whole; and,
[2] Those which describe the relationship of individual elements
of the network to the whole network.
Initially, those properties will be described which relate to the
whole network.
6.3 Beta Index
The Beta Index is the simplest measure that can be derived
from the characteristics of a network, although it has a rela-
tively low power of discrimination in terms of distinguishing
between networks of varying characteristics. It is one measure
of the connectivity of a graph and is simply the ratio of the
number of edges [E] within the graph to the number of vertices
[V]:

$\beta = E/V$
There are a number of properties of Beta which are more or
less useful for our purposes. Previous discussion of the
abstraction of networks into graphs has shown that generally
graphs of the size and complexity of large information transfer
networks will, almost certainly, be non-planar. However, since
the methodology is of general applicability, values for both
planar and non-planar graphs have been given. The properties of
Beta are:
[1] Disconnected graphs and those in the form of "trees", with
branches and no circuits, will have values of Beta < 1.0.
For a tree, the actual value of Beta is determined by the
relationship (James 1970):

$\beta = 1 - 1/V$

[2] Connected graphs with one circuit have Beta = 1.
[3] As the network structure becomes more complex, with an
increasing number of edges and vertices, so the Beta index
increases.
[4] The ranges of Beta are as follows:
For planar graphs (V >= 3)

$0 \le \beta \le 3 - 6/V$

For non-planar graphs

$0 \le \beta \le (V-1)/2$

for simple graphs, an upper bound that grows without limit as V
increases.
[5] If Beta > 1, this indicates that there are alternative paths
through the graph.
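Properties [1], [2] and [5] can be checked directly on small undirected graphs. The sketch assumes a graph given as a vertex count and an edge list (a hypothetical representation) and uses exact fractions so that the tree value 1 - 1/V compares exactly.

```python
from fractions import Fraction


def beta_index(num_vertices, edges):
    """Beta = E / V, kept as an exact fraction."""
    return Fraction(len(edges), num_vertices)


V = 6
tree = [(0, 1), (1, 2), (1, 3), (3, 4), (3, 5)]   # connected, no circuits
forest = [(0, 1), (1, 2), (3, 4)]                  # two trees plus isolated vertex 5
ring = [(i, (i + 1) % V) for i in range(V)]        # exactly one circuit
ring_with_chord = ring + [(0, 3)]                  # a second, alternative route

print(beta_index(V, tree), beta_index(V, forest),
      beta_index(V, ring), beta_index(V, ring_with_chord))
```

A forest of G separate trees has E = V - G, so its Beta is 1 - G/V and stays below 1, as property [1] states for disconnected graphs.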
- 51 -
As shown below, at the small graph level, Beta has rela-
tively poor powers of discrimination. For fixed V, Beta cannot
distinguish between different tree graphs, or between tree and
star graphs. In all these cases, V=6, E=5 and Beta= 0.833.
FIGURE 6 - TREE AND STAR GRAPHS AND BETA (panels (a) to (d))
In some cases, Beta is unable to distinguish between
identical structures of differing sizes, as in the family of ring
graphs: a ring of V vertices has exactly V edges, so Beta = 1
whatever its size.
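The weakness in discrimination can be reproduced with a few generated graphs. In this sketch the generator names are hypothetical: a six-vertex path and a six-vertex star both give Beta = 5/6 (about 0.833), and rings of any size give Beta = 1.

```python
from fractions import Fraction


def beta_index(num_vertices, edges):
    """Beta = E / V as an exact fraction."""
    return Fraction(len(edges), num_vertices)


def path_graph(n):
    """Tree in which the vertices form a single chain."""
    return [(i, i + 1) for i in range(n - 1)]


def star_graph(n):
    """Tree with one central vertex joined to every other vertex."""
    return [(0, i) for i in range(1, n)]


def ring_graph(n):
    """Single circuit through all n vertices."""
    return [(i, (i + 1) % n) for i in range(n)]


print(beta_index(6, path_graph(6)), beta_index(6, star_graph(6)))
print([beta_index(n, ring_graph(n)) for n in range(3, 9)])
```

Because Beta depends only on the counts E and V, any two graphs with the same counts are indistinguishable, however different their shapes.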
These are all, of course, very small graphs and the state-
ment in Haggett 1977 (pp. 76-77) remains very true:
Many problems in network theory are well known because
they are easy to state, can be solved by trivial
methods for small blackboard examples, but which are
virtually intractable in large scale, realistic situa-
tions. Most of the difficulties arise from the com-
binatorial nature of the mathematics.
Larger graphs of information transfer networks and the
results from them will be presented later and they will illus-
trate this point. It is clear however that, from the results of
preliminary studies, Beta increases substantially in the early
part of the development of the network. This is in accord with
the analogy from transportation geography, where Beta correlates
positively with the growth in economic development of a country
and with the various levels of economic development (Kansky
1963).
The shape of a Beta (or, indeed, any other index measure)
graph as it changes with the growth of the subject could well be
one of the determining factors in distinguishing one subject from
another. The mathematical description of the graph should enable
us to predict the future development of the reference structure
of the papers in a subject and thus shed some light on the future
characteristics of the communication patterns within the subject.
6.4 Cyclomatic Number
Another measure of the connectivity of the network as a whole is the Cyclomatic number or First order Betti number, generally designated as Mu. It is a measure of the number of fundamental circuits within a graph. The measure is defined by:
$$\mu = E - V + G$$
where G = the number of sub-graphs.
The basic properties of the Cyclomatic number are:
[1] The number increases with the completeness of the connections and the complexity of the network.
[2] The bounds of the cyclomatic number are:
$$0 \le \mu \le \frac{(V-1)(V-2)}{2}$$
[3] Low levels of development of a subject are reflected in low values of the Cyclomatic number. That is, the early development of a subject is characterised by disconnected graphs and the existence of trees. More developed subjects have more highly connected graphs with correspondingly higher values of Mu.
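A minimal sketch of Mu = E - V + G follows, assuming an undirected graph given as a vertex collection and an edge list; G is counted here with a small union-find, which is one convenient choice rather than anything prescribed by the sources. The function names are hypothetical.

```python
def count_subgraphs(vertices, edges):
    """G: the number of connected sub-graphs, found with a small union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two sub-graphs
    return len({find(v) for v in vertices})


def cyclomatic_number(vertices, edges):
    """Mu = E - V + G, the number of fundamental circuits."""
    return len(edges) - len(vertices) + count_subgraphs(vertices, edges)


vertices = range(1, 7)
tree = [(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)]
two_trees = [(1, 2), (2, 3), (4, 5), (5, 6)]
ring6 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]
print(cyclomatic_number(vertices, tree),
      cyclomatic_number(vertices, two_trees),
      cyclomatic_number(vertices, ring6))
```

The tree and the two-tree forest both give Mu = 0, which is exactly the lack of discrimination discussed next.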
Like Beta, however, Mu is not very discriminatory between graphs of various types: it does not distinguish between different tree graphs, between tree and star graphs, or between trees and disconnected graphs. Thus two differently shaped graphs, each with V=7, E=12 and G=1, both have a Mu value of 12 - 7 + 1 = 6. Similarly with the graphs of Figure 9:
FIGURE 9 - TREE GRAPHS AND MU
All of these have no circuits in them and hence Mu values of 0, although they are different in shape.
6.5 Alpha Index
The Cyclomatic number can be used to form a ratio measure of the actual number of circuits to the maximum possible within the system. As with all ratio measures, this forms a more meaningful basis for comparing different networks.
The formula for the Alpha index for a non-planar graph is
$$\alpha = \frac{\mu}{(V^2 - V)/2 - (V - 1)}$$
The denominator equals (V-1)(V-2)/2, the upper bound of Mu given above.
Alpha is very much concerned with the degree of redundancy
within a network. It compares the actual network with a com-
pletely connected network with the same number of vertices. Some
of the salient features of alpha are:
[1] The bounds are given by:
0 <= alpha <= 1
[2] Completely interconnected networks have an alpha value of 1.
For networks with a decreasing number of edges, alpha
approaches 0. Zero values are assigned to all networks
which have a cyclomatic number = 0 (eg, all disconnected
graphs and trees).
[3] One of the more useful properties of alpha is that it is independent of the number of vertices in the network, as indicated by the upper limit of the bounds. This means that different networks can be compared regardless of their size.
Again, however, Alpha is not very good at discriminating between networks of differing patterns of edges between the same nodes, where the values of V, E and G remain the same. Those graphs which are not distinguished by the use of Mu remain undistinguished with Alpha.
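As a sketch (hypothetical function name; a connected graph, G = 1, is assumed by default), Alpha can be computed from V, E and G alone, which is why graphs that Mu cannot separate remain inseparable under Alpha:

```python
def alpha_index(num_vertices, num_edges, num_subgraphs=1):
    """Alpha = Mu / ((V^2 - V)/2 - (V - 1)): the share of possible circuits present."""
    mu = num_edges - num_vertices + num_subgraphs
    max_circuits = (num_vertices ** 2 - num_vertices) / 2 - (num_vertices - 1)
    if max_circuits == 0:
        # With fewer than three vertices no circuit is possible at all.
        return 0.0
    return mu / max_circuits


print(alpha_index(5, 10))  # complete graph on five vertices
print(alpha_index(5, 5))   # a single ring of five vertices
print(alpha_index(6, 5))   # any tree on six vertices
```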
6.6 Diameter
The diameter of a network is the second whole-network non-ratio measure which is to be considered (albeit briefly). It is defined as the maximum associated number of a graph G, that is, the maximum number of edges in the shortest path between any pair of vertices. More formally:
$$\delta(G) = \max_{i,j} d_{ij}$$
where d_ij is the number of edges in the shortest path between vertices i and j.
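One straightforward way to obtain the diameter, sketched here under the assumption of a connected, undirected graph, is a breadth-first search from every vertex, keeping the largest distance found. The helper names and the six-node example tree are hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Undirected adjacency lists built from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def bfs_distances(adj, source):
    """Number of edges on the shortest path from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def diameter(adj):
    """Largest shortest-path length over all pairs (assumes a connected graph)."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)


tree = adjacency([(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)])
ring6 = adjacency([(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)])
print(diameter(tree), diameter(ring6))
```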
6.7 Gamma Index
The Gamma Index is a measure of connectivity and relates the actual number of edges E to the maximum number possible. It is derived as follows.
If every node had a connection with every other node, the V x V matrix would have V^2 links in it (represented by 1's in the matrix cells which have an interconnection), except for the major diagonal (of size V), since nodes are not considered to be linked to themselves. Thus the total number of links would be V^2 - V. But each link in a complete matrix will appear twice, so the true maximum is (V^2 - V)/2.
Thus the equation for the Gamma Index is
$$\gamma = \frac{2E}{V^2 - V}$$
The properties of Gamma are:
[1] The range of Gamma is
0 <= Gamma <=1
[2] The zero value represents the Gamma Index for a totally
disconnected network and the maximum a totally connected
network.
[3] Gamma is independent of the number of vertices in the network.
[4] The Gamma Index has a direct correlation with the level of development of a subject: in subjects with a high Gamma Index, there are alternative and possibly redundant links which improve the level of accessibility (ie, the transfer of information) between nodes.
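The Gamma formula reduces to a one-line function; the sketch below (hypothetical name) evaluates it for the two limiting cases in property [2] and for a tree in between:

```python
def gamma_index(num_edges, num_vertices):
    """Gamma = 2E / (V^2 - V): observed edges over the maximum possible."""
    return 2 * num_edges / (num_vertices ** 2 - num_vertices)


print(gamma_index(0, 6))   # totally disconnected network
print(gamma_index(5, 6))   # a tree on six vertices
print(gamma_index(15, 6))  # totally connected network on six vertices
```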
6.8 Eta Index
The Eta Index is a measure which expresses a relationship
between the network as a whole and its routes. It is the ratio
of the total mileage (or length) of a network to the total number
of edges:
Eta= M/E
and represents the average length of an edge.
In information transfer terms, the concept of mileage between one node and another could be replaced by the concept of the time period (in years or other time units, depending upon the subject of study) between papers being linked by references.
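A minimal sketch of that substitution, assuming each citation link is weighted by the gap in publication years between the citing and the cited paper. The papers, years and function name are invented for illustration; Eta then becomes the average citation time lag.

```python
def eta_index(citations, year):
    """Eta = M/E, with 'mileage' replaced by the citation time lag in years."""
    total_lag = sum(year[citing] - year[cited] for citing, cited in citations)
    return total_lag / len(citations)


# Hypothetical papers and publication years (illustrative only).
year = {"A": 1948, "B": 1953, "C": 1956, "D": 1960}
# Each pair is (citing paper, cited paper).
citations = [("B", "A"), ("C", "A"), ("C", "B"), ("D", "C")]
print(eta_index(citations, year))  # lags of 5, 8, 3 and 4 years
```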
6.9 Pi Index
The Pi Index is also a measure of the relations between a network as a whole and the specific edges of the network. It is called the Pi index because of its logical similarity to the relationship between the circumference and the diameter of a circle. The total mileage of a transportation network is treated as analogous to the circumference of a circle and is denoted by C. The total mileage of the edges along the diameter of the network is analogous to the diameter of a circle, and is denoted by D. Then Pi is defined as:
Pi = C/D
For disconnected networks:
$$\pi = \frac{1}{N}\sum_{k=1}^{N} \pi_k$$
where Pi_k is the index for each individual subgraph, and N is the number of subgraphs.
The properties of Pi are as follows:
[1] Pi has the values:
Pi >= 1
[2] Pi will vary directly with the degree of development of a network. The concept of mileage within Pi for an information transfer network could also be replaced by time periods.
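The sketch below computes Pi = C/D for a connected network, under stated assumptions: D is taken as the mileage along one shortest path (counted in edges) between a pair of vertices a diameter apart, and ties between several such pairs are broken arbitrarily. The function name and the example mileages are hypothetical.

```python
from collections import deque


def pi_index(lengths):
    """Pi = C/D for a connected network.

    lengths maps each undirected edge (u, v) to its mileage (or time period).
    C is the total mileage; D is the mileage along one shortest path (counted
    in edges) between a pair of vertices that lie a diameter apart.
    """
    adj = {}
    for u, v in lengths:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    best = None  # (hop count, predecessor map, far vertex)
    for source in adj:
        dist, pred = {source: 0}, {source: None}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y], pred[y] = dist[x] + 1, x
                    queue.append(y)
        far = max(dist, key=dist.get)
        if best is None or dist[far] > best[0]:
            best = (dist[far], pred, far)

    _, pred, node = best
    diameter_mileage = 0
    while pred[node] is not None:  # walk back along the diameter path
        step = (pred[node], node)
        diameter_mileage += lengths.get(step, lengths.get(step[::-1]))
        node = pred[node]
    return sum(lengths.values()) / diameter_mileage


# C = 14; the diameter path 1-2-4-5 has mileage 2 + 4 + 2 = 8.
network = {(1, 2): 2, (2, 3): 3, (3, 4): 3, (4, 5): 2, (2, 4): 4}
print(pi_index(network))
```

For a simple chain every edge lies on the diameter, so C = D and Pi takes its minimum value of 1.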
6.10 Theta Index
The Theta Index is the ratio of the network as a whole to its vertices. Kansky (1963) defined two ways of expressing Theta. One uses the mileage of a network, as with Pi and Eta, whilst the other uses the concept of the total traffic flow. Thus there are two formulations of Theta:
[1]
Theta = T/V
where T = the total traffic flow.
[2]
Theta = M/V
where M = the total mileage of the network.
The Theta Index has the important property that it offers
information on the length, structure, and also the degree of con-
nectivity simultaneously. This is evident from the following
figure, where for networks A and B, Eta gives the same value,
whereas Theta differentiates them.
FIGURE 10 - NETWORKS DIFFERENTIATED BY THETA
Network A: M = 140, E = 7, V = 6; Eta = 20, Theta = 23.3
Network B: M = 140, E = 7, V = 8; Eta = 20, Theta = 17.5
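The contrast between Eta and Theta is simple arithmetic, sketched here for two hypothetical networks with the same total mileage (M = 140) and the same number of edges (E = 7) but different numbers of vertices (V = 6 and V = 8). The function names are illustrative only.

```python
def eta_from_totals(total_mileage, num_edges):
    """Eta = M/E: the average length of an edge."""
    return total_mileage / num_edges


def theta_index(total_mileage, num_vertices):
    """Theta = M/V: the mileage (or traffic) per vertex."""
    return total_mileage / num_vertices


for name, v in (("A", 6), ("B", 8)):
    # Eta is identical for both networks; Theta separates them.
    print(name, eta_from_totals(140, 7), round(theta_index(140, v), 1))
```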
6.11 Iota Index
Iota is a particularly interesting index measure which provides a relationship between the information transfer network as a whole and its weighted vertices.
Iota = M/W
where M = total mileage of the network, and
W = observed number of vertices weighted by their function.
Another formulation of Iota is:
Iota = M/T
where T = a number expressing the total traffic flow of the network. This version of Iota is a measure of the "density" of traffic flow of a given network.
One method of weighting the vertices is to use the "order"
of each vertex. The order of a vertex is, by definition, the sum
of the number of edges leading into the vertex (the "indegree" of
the vertex) and the number of edges leading out from the vertex
(the "outdegree" of the vertex). With a general, non-directed
graph this definition suffices. However, this definition is not
sufficient when directed graphs are under discussion.
Kansky (1963) applies a differential weighting to the ver-
tices to emphasise higher order vertices, ie,
vertices of order 1 (endpoints) are weighted by the factor 1
vertices of order 2 and above are weighted by a factor of 2.
This enabled him to make a distinction between networks
which were otherwise confounded by Iota. The translation of the
basis for weighting to information transfer networks would obvi-
ously require considerably more study. One possible method would
be to use the difference between the number of references con-
tained in an article and the number of citations to the article.
This would effectively provide a weighting in favour of more heavily cited papers.
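A minimal sketch of Iota = M/W with the Kansky-style weights described above (weight 1 for vertices of order 1, weight 2 for order 2 and above). The order of each vertex is counted as indegree plus outdegree, as defined in the text. The edge mileages and function names are hypothetical, and the citation-based weighting suggested above is not implemented.

```python
from collections import Counter


def kansky_weight(order):
    """Weight 1 for end points (order 1), 2 for vertices of order 2 or more.
    Isolated vertices (order 0) never occur here, because vertices are
    collected from the edge list."""
    return 1 if order == 1 else 2


def iota_index(lengths):
    """Iota = M/W, where W sums the weights of the vertices by their order."""
    order = Counter()
    for u, v in lengths:          # for a directed edge u -> v this adds one to
        order[u] += 1             # the outdegree of u and one to the indegree
        order[v] += 1             # of v, so order = indegree + outdegree
    total_mileage = sum(lengths.values())
    total_weight = sum(kansky_weight(k) for k in order.values())
    return total_mileage / total_weight


path = {(1, 2): 10, (2, 3): 10, (3, 4): 10}   # hypothetical mileages
print(iota_index(path))  # M = 30, W = 1 + 2 + 2 + 1 = 6
```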
6.12 S-I Index
Very little work was done on the evaluation of the effectiveness of the above measures, following their publication and
derivation in Kansky (1963) where most of them were published for
the first time. A number of people did use the measures (eg
Pitts 1965), and Werner (1968) made some evaluation and criti-
cisms. The first major piece of work that was done on extending
Kansky's work was by James et al (1970), who looked at the effec-
tiveness of the Kansky measures in differentiating between net-
works. They showed that, as Kansky himself had pointed out in
his original paper, many of the measures were not very good at
differentiation and duplicated much of the information given by
other measures (as had also been noted by Werner 1968).
Basing their work upon the theoretical paper by Ord (1967), James et al derived the S-I Index from the frequency distribution of the shortest path lengths D_ij. There are well-known methods for obtaining the shortest path matrix, which have been described in Haggett and Chorley (1969) and in Haggett, Cliff and Frey (1977). For example, in the figure below:
FIGURE 11 - NETWORK FOR S-I ANALYSIS
The shortest path distribution F(D_ij) is given by:
Length (l)    0   1   2   3
Frequency     8  26  26   4
James et al (1970) point out that this distribution already summarises much of the information contained in other measures, thus:
[1] F1/F0 = 2 * Beta
[2] Max (l) = Delta (the Diameter)
[3] D(G), the sum of all the shortest path lengths, is given by
$$D(G) = \sum_{l=1}^{\delta} l\,F_l$$
[4] A highly negatively skewed distribution for F implies a large value for D(G)
[5] A highly positively skewed distribution for F implies a small value for D(G)
[6] A high modal value of l for F indicates that a large proportion of nodes are linked only indirectly and suggests that the graph is tree-like.
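The first three relationships can be checked numerically. The sketch below builds F(l) by breadth-first search over every vertex of a hypothetical six-node tree, counting ordered pairs and including each vertex paired with itself at l = 0 (so that F0 = V, as property [1] requires). Reading D(G) as the sum of all shortest path lengths follows the reconstruction of [3] and should be treated as an assumption.

```python
from collections import Counter, deque


def path_length_distribution(edges):
    """F(l): how many ordered vertex pairs (I, J) lie l edges apart on a
    shortest path. Pairs with I = J are counted at l = 0, so F0 = V."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    freq = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        freq.update(dist.values())
    return dict(sorted(freq.items()))


edges = [(1, 2), (2, 3), (3, 4), (3, 5), (5, 6)]   # hypothetical six-node tree
f = path_length_distribution(edges)
beta = len(edges) / f[0]
print(f)                                        # the distribution F(l)
print(f[1] / f[0], 2 * beta)                    # property [1]
print(max(f))                                   # property [2]: the diameter
print(sum(l * n for l, n in f.items()))         # property [3]: D(G)
```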
The theoretical background to the Index was described by Ord (1967), who showed that the S-I Index provides a means of determining which discrete frequency distribution is most closely approximated by F. The S-I plane can be divided into regions corresponding to these distributions, and these are shown in the next figure.
The Index itself has not been widely used, largely, one
suspects, because of the amount of time it takes to gather
comprehensive data on any real-life network. Forer (1973) stu-
died the spatial structure of the New Zealand internal airline
network using it, whilst Cliff, Haggett and Ord (1979) calculated
it for the graphs defined by the internal airline flights between
the seven largest cities in 18 countries. This graph is shown
below as Figure 13.
FIGURE 13 - S-I VALUES FOR INTERNAL AIRLINE NETWORKS (labelled points include France, India, United Kingdom and South Africa)
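The Index values themselves are determined by obtaining the first three moments of the distribution (the mean, the variance and the third central moment), where f_l is the frequency of shortest paths of length l, the sums run from l = 0 to the diameter, and N = Σ f_l:
$$\mu_1 = \frac{1}{N}\sum_{l=0}^{\delta} l\,f_l, \qquad N = \sum_{l=0}^{\delta} f_l$$
$$\mu_2 = \frac{1}{N}\sum_{l=0}^{\delta} f_l\,(l - \mu_1)^2$$
$$\mu_3 = \frac{1}{N}\sum_{l=0}^{\delta} f_l\,(l - \mu_1)^3$$
Then
$$S = \frac{\mu_3}{\mu_2}, \qquad I = \frac{\mu_2}{\mu_1}$$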
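The moment calculation can be sketched in a few lines, dividing by N = Σ f_l throughout. The function name is hypothetical, and the input is the shortest path distribution quoted for Figure 11 (F0 = 8, F1 = 26, F2 = 26, F3 = 4).

```python
def s_i_index(freq):
    """Return (S, I) from a path length distribution {length l: frequency f_l}."""
    n = sum(freq.values())
    mu1 = sum(l * f for l, f in freq.items()) / n                 # mean
    mu2 = sum(f * (l - mu1) ** 2 for l, f in freq.items()) / n    # variance
    mu3 = sum(f * (l - mu1) ** 3 for l, f in freq.items()) / n    # third central moment
    return mu3 / mu2, mu2 / mu1


s, i = s_i_index({0: 8, 1: 26, 2: 26, 3: 4})
print(round(s, 4), round(i, 4))
```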
The ensuing discussion clearly shows that the Index is capa-
ble of discriminating between graphs which are not separated by
Alpha, Beta, Gamma, Mu or D(G) and that families of graphs of
broadly the same types but with different numbers of nodes fall
along predictable lines of development on the S-I graph. The
final comment is worth quoting in full:
Finally, it is worth noting that the S-I index can be used to examine the distribution of path lengths from any given node, I, in a graph to all other nodes J ≠ I in the graph. If an S-I value was computed for each I, it might be possible to distinguish different kinds of nodes within a graph.
If this comment were to be applied to scientific literature, then we would expect to find very different values of the S-I index for different kinds of papers in a network. Quite what these different kinds of papers might be could well be the subject of another study. One obvious type would be review papers, which because of their nature would be expected to have very many intercommunications with other papers in the network and hence different S-I values. The program NETWRK described in Appendix 6 enables the user to do this.
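The per-node idea can be sketched as follows, assuming an undirected, connected network. This is an illustration only: NETWRK itself is not reproduced here and may work differently, and the function name and the "review paper" star network are hypothetical. S is reported as undefined (NaN) when every path from the node has the same length, because the variance is then zero.

```python
import math
from collections import Counter, deque


def node_s_i(edges, source):
    """S-I values for the distribution of shortest path lengths from one
    node I to every other node J != I (connected, undirected graph assumed)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    freq = Counter(d for node, d in dist.items() if node != source)
    n = sum(freq.values())
    mu1 = sum(l * f for l, f in freq.items()) / n
    mu2 = sum(f * (l - mu1) ** 2 for l, f in freq.items()) / n
    mu3 = sum(f * (l - mu1) ** 3 for l, f in freq.items()) / n
    s = mu3 / mu2 if mu2 else math.nan  # S is undefined when all paths are equal
    return s, mu2 / mu1


# Hypothetical "review paper" (node 0) linked to five otherwise unlinked papers.
star = [(0, k) for k in range(1, 6)]
print(node_s_i(star, 0))   # hub: every other node is one step away
print(node_s_i(star, 1))   # leaf: one node at distance 1, four at distance 2
```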
6.13 Degree of Connectivity (Prihar)
This measure compares the relative position of an observed
network's connectivity on a scale bounded by the maximum and minimum
connectivity ratios and is derived as follows:
If V = the number of vertices and E = the number of edges, then the maximum possible number of edges will be:
E(max) = V(V-1)/2
The maximum connectivity ratio (for a completely connected network) is always:
$$\frac{V(V-1)/2}{V(V-1)/2} = 1$$
The minimum connectivity ratio is the other limit. For a connected network, which needs at least V-1 edges, it varies according to the number of vertices:
$$\frac{V(V-1)/2}{V-1} = \frac{V}{2}$$
The Degree of Connectivity may then be written as:
$$D.C. = \frac{V(V-1)/2}{E}$$
and the range summarised as:
1 <= D.C. <= V/2
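A short sketch of the Degree of Connectivity (hypothetical function name) confirms the two limits: 1 for a completely connected network and V/2 for a minimally connected one.

```python
def degree_of_connectivity(num_edges, num_vertices):
    """D.C. = (V(V-1)/2) / E: 1 for a complete network, V/2 for a minimally
    connected one (a tree with V - 1 edges)."""
    max_edges = num_vertices * (num_vertices - 1) / 2
    return max_edges / num_edges


print(degree_of_connectivity(15, 6))  # complete network on six vertices
print(degree_of_connectivity(5, 6))   # tree on six vertices: V/2
```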
6.14 Dispersion and Connectivity
Two measures were suggested in a paper by Shimbel (1953) and were
derived as follows.
Begin with the shortest path lengths L(ij) and take the sum of all L(ij) over all I and all J for a given network S. If the sum is small, then the network is compact. The sum D(S) is called the dispersion of S. Formally:
$$D(S) = \sum_{i}\sum_{j} L_{ij}$$
Each of the individual sums is also of interest. The first sum is called the accessibility, and it measures the extent to which the other nodes can be reached from I. Formally:
$$A(i,S) = \sum_{j} L_{ij}$$
and hence
$$D(S) = \sum_{i} A(i,S)$$
On the other hand, if we sum L(ij) over I, then we have a measure which tells us how accessible J is to S:
$$A(j,S) = \sum_{i} L_{ij}$$
and
$$D(S) = \sum_{j} A(j,S)$$
The dispersion can therefore be derived from the accessibility matrix as the sum of all the rowsums. The dispersion index can be used as it stands or can be divided by the number of nodes. Dispersion is an overall property of the network, whereas accessibility is a measure indicating the spatial relationship between a given element and the remainder of the network.
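Given a shortest path matrix, Shimbel's two measures are just row, column and grand totals. The sketch below uses a hypothetical function name and the shortest path matrix of a small six-node tree with edges 1-2, 2-3, 3-4, 3-5 and 5-6.

```python
def accessibility_and_dispersion(lengths):
    """lengths[i][j] is the shortest path length L(ij) for a network S.

    Returns the row accessibilities A(I,S), the column accessibilities
    A(J,S) and the dispersion D(S); both sets of sums add up to D(S)."""
    row_access = [sum(row) for row in lengths]
    col_access = [sum(col) for col in zip(*lengths)]
    dispersion = sum(row_access)
    return row_access, col_access, dispersion


L = [
    [0, 1, 2, 3, 3, 4],
    [1, 0, 1, 2, 2, 3],
    [2, 1, 0, 1, 1, 2],
    [3, 2, 1, 0, 2, 3],
    [3, 2, 1, 2, 0, 1],
    [4, 3, 2, 3, 1, 0],
]
rows, cols, d = accessibility_and_dispersion(L)
print(rows, d, d / len(L))  # accessibilities, D(S), and D(S) per node
```

The larger a node's row total, the less accessible it is; here nodes 1 and 6 have the largest totals.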
6.15 Connectivity matrices: manipulation and measures
The connectivity matrix is fundamental to all analyses of net-
works and its manipulation forms the basis of many of the indexes
described above.
The basic connectivity matrix consists of a zero-one matrix in which each connection between nodes is indicated by a '1' and the absence of a connection between them by a '0'.
If this matrix is multiplied by itself (powered), then the kth power gives the number of k-step routes (walks, which may pass through the same node more than once) between each pair of nodes, for k = 2, 3, 4 and so on. The powering ceases at the point where no new figures enter into the matrix; this final power is equal to the diameter of the graph. Derived from the initial connectivity matrix is a second matrix, known variously as the accessibility matrix or the shortest path matrix, and this again forms the basis for a number of the measures described. The process is best described by an example, using the network of Figure 14 (edges 1-2, 2-3, 3-4, 3-5 and 5-6):
FIGURE 14 - NETWORK FOR CONNECTIVITY MATRIX ANALYSIS
CONNECTIVITY MATRIX C AND ITS POWERS          ACCESSIBILITY MATRIX D
(in D, "-" marks a pair whose shortest path has not yet been found)

k = 1: C                                      D1
NODES  1  2  3  4  5  6                       NODES  1  2  3  4  5  6
1      0  1  0  0  0  0                       1      0  1  -  -  -  -
2      1  0  1  0  0  0                       2      1  0  1  -  -  -
3      0  1  0  1  1  0                       3      -  1  0  1  1  -
4      0  0  1  0  0  0                       4      -  -  1  0  -  -
5      0  0  1  0  0  1                       5      -  -  1  -  0  1
6      0  0  0  0  1  0                       6      -  -  -  -  1  0

k = 2: C^2                                    D2
NODES  1  2  3  4  5  6                       NODES  1  2  3  4  5  6
1      1  0  1  0  0  0                       1      0  1  2  -  -  -
2      0  2  0  1  1  0                       2      1  0  1  2  2  -
3      1  0  3  0  0  1                       3      2  1  0  1  1  2
4      0  1  0  1  1  0                       4      -  2  1  0  2  -
5      0  1  0  1  2  0                       5      -  2  1  2  0  1
6      0  0  1  0  0  1                       6      -  -  2  -  1  0

k = 3: C^3                                    D3
NODES  1  2  3  4  5  6                       NODES  1  2  3  4  5  6
1      0  2  0  1  1  0                       1      0  1  2  3  3  -
2      2  0  4  0  0  1                       2      1  0  1  2  2  3
3      0  4  0  3  4  0                       3      2  1  0  1  1  2
4      1  0  3  0  0  1                       4      3  2  1  0  2  3
5      1  0  4  0  0  2                       5      3  2  1  2  0  1
6      0  1  0  1  2  0                       6      -  3  2  3  1  0

k = 4: C^4                                    D4
NODES  1  2  3  4  5  6                       NODES  1  2  3  4  5  6  ROWSUM
1      2  0  4  0  0  1                       1      0  1  2  3  3  4    13
2      0  6  0  4  5  0                       2      1  0  1  2  2  3     9
3      4  0 11  0  0  4                       3      2  1  0  1  1  2     7
4      0  4  0  3  4  0                       4      3  2  1  0  2  3    11
5      0  5  0  4  6  0                       5      3  2  1  2  0  1     9
6      1  0  4  0  0  2                       6      4  3  2  3  1  0    13
                                                                   TOTAL 62
At this point, powering of matrix C can cease, since matrix D is full, with a maximum entry of 4 (which is the diameter of the network). The accessibility matrix D is derived by successively powering the connectivity matrix C and noting, after each iteration, which cells of the current power C^k have become non-zero for the first time. Each such cell is entered in D with the value k. The larger the final rowsum for a node, the less accessible that node is to the rest of the network. In the network above, it is clear that nodes 1 and 6 are less accessible than the others. This easy visual inspection would not be available were a much larger, more realistic network to be analysed.
On the other hand, we are caught in the classic graph theory trap, because an analysis of a realistic network (say, bibliometrics with 5,000+ items of literature) would require a matrix so large that it could be computed only on the most powerful machines: a 5,000 by 5,000 matrix already has 25 million cells. Even quite small matrices exhaust the memory of the average academic machine.
The following relationships obtain:
[1] The rowsums of D4 provide, as explained above, the accessibility figure for each node I.
[2] The total of all rowsums in D4 provides the network dispersion measure D(S).
[3] The distribution of values in D4 provides the basic distribution F(l) necessary for the S-I Index, since each value in the matrix represents the shortest path length between a pair of nodes I and J.
The formal algorithm for deriving D (also known as S, the shortest path matrix) is given by Haggett, Cliff and Frey (1977, pp. 319-320).
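The powering procedure described above can be sketched directly in Python. This is a plain transcription of the successive-powering idea, not the Haggett, Cliff and Frey algorithm, and the function names are hypothetical. It relies on the fact that a cell first becomes non-zero in C^k exactly when the shortest path between the two nodes has k edges. The example reproduces the six-node network of Figure 14.

```python
def mat_mult(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def accessibility_matrix(c):
    """Build D by powering the 0-1 connectivity matrix C.

    A cell (i, j) first becomes non-zero in C^k when the shortest path
    from i to j has k edges, so k is written into D at that point.
    Cells still None at the end are unreachable (disconnected network)."""
    n = len(c)
    d = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power, k = c, 1
    while k < n and any(x is None for row in d for x in row):
        for i in range(n):
            for j in range(n):
                if d[i][j] is None and power[i][j] > 0:
                    d[i][j] = k
        power, k = mat_mult(power, c), k + 1
    return d


# Connectivity matrix for edges 1-2, 2-3, 3-4, 3-5 and 5-6.
C = [
    [0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
D = accessibility_matrix(C)
rowsums = [sum(row) for row in D]
print(rowsums, sum(rowsums), max(max(row) for row in D))
```

Because every power is a full V x V product, the cost grows roughly with the cube of V at each step, which illustrates the memory and time problem noted above for networks of several thousand items.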
6.16 Summary
The network measures described above are the major ones which have been used to summarise the topological properties of a network. The following chapter looks at some of these measures from the viewpoint of a particular type of information transfer network. It studies five citation networks, which are well-known examples of information transfer and in many ways the most obvious candidates for study by means of graph theory.
CHAPTER 7
ANALYSIS OF CITATION NETWORKS
7.1 Introduction
The previous chapter has briefly surveyed a number of measures
which have been used in other disciplines (most notably transportation
geography) to characterise networks. The question of the applicabil-
ity of the various measures to an information transfer network has not
been discussed. It is the purpose of this chapter to examine three
measures in more detail and to propose some modifications to one of
them which will make it more suitable for the purpose. Some results
of applying the measures to some citation networks are presented for
illustrative purposes. More extensive tests of their suitability would need to be carried out in order to show positively that the measures can be applied to information transfer networks.
The choice of indexes is a difficult one to make. The indexes
discussed in the previous chapter range from very simple ones (such as
the Beta Index) to the more complex ones and from the results obtained
elsewhere they do vary considerably in their ability to discriminate
between various networks. On the other hand, there is no evidence
from previous studies in the field of transportation geography that
these results would necessarily apply to the very large, complex
information transfer networks in which we are interested - and especially to their chronological development. It does seem highly probable, however, that the index measures will be applicable, for several reasons.
[1] Firstly, because of the close analogy between the transport of
goods across space and the transfer of information across time
(or, indeed, space as well, if a study were to be made of the
spatial characteristics of information transfer networks).
[2] Secondly, because networks are the subject of study in both cases, even if there were no similarity between the uses to which the networks were being put.
[3] Thirdly, because the transport geography results are firmly based on the generalised mathematical method of graph theory and, to paraphrase Gertrude Stein, "a graph is a graph is a graph".
The choice has been made partly therefore on the basis of testing
out some of the results obtained elsewhere, in transportation geogra-
phy by e.g. Kansky (1963) and James et al (1970), for the simpler
index measures; partly on the ease of programming; and partly on the
claimed effectiveness of the S-I index. The three indexes chosen for
study therefore are the Gamma Index, the Beta Index and the S-I Index.
Each of these indexes has been applied to 5 citation networks: 4 patent citation networks and a network of the literature of bibliometrics. Graphs are presented for the results of the index measures, together with some general graphs on the growth of the subjects and the growth of the number of paths in the networks.
Full details of the networks appear in Appendices 1 to 5. Each
Appendix describes the source of the network, and prints the full data
tables relating to that network. It is the tables which form the
basis of the graphs in this chapter. The full frequency distribution
from which the S-I Index is derived is also included.
7.2 Introduction to the results
The remainder of this chapter is concerned with a description and interpretation of the results obtained from the analysis of the 5 networks. Graphs have been produced for each of the main variables concerned with each network. Each network has a graph of the variable and, usually, there is a comparative graph of all the patent networks. In general, only the patent networks have been included on the same graph. This is for three reasons:
[1] Firstly, the patent networks are of the same general form of material and have rather similar, unique characteristics: citations were not included in U.S. patents until about 1948. This has resulted in a distorted pattern compared to the bibliometrics network (which is of a more usual form).
[2] Secondly, just because the patent networks are similar, it is
important to compare and contrast them directly, in order to see
whether the measures are able to distinguish between similar net-
works.
[3] Finally, the bibliometrics network is very much larger than the patent ones, and to attempt to show the characteristics on the same scale would be difficult. It would unduly compress the patent results so that their own characteristics could not be distinguished.
It is clear from the graphs which will be presented that the patent networks do have a common family resemblance (even though they do also differ amongst themselves), which is distinctly different from the equivalent graphs relating to the bibliometrics network.
Five sets of graphs are presented: those for Sigma E, Beta Index,
Gamma Index, for the modified S-I Index and for the growth of the
total number of paths in the networks.
7.3 Growth of Sigma E
The graphs for the growth of Sigma E (Figures 15 to 20) present a
picture of the general growth of each network. Sigma E, which is the
cumulative number of links in the network, is plotted against Sigma V,
which is the cumulative number of nodes. In the specific instances
with which we are dealing, i.e. a citation network, the links
represent references from one paper (or patent) to another, and the
nodes the papers (or patents). The slopes of the graphs thus
represent the growth in the direct interconnections between the nodes.
A graph with a steep slope has a larger number of references per paper
on average than one with a flatter curve.
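The series behind graphs of this kind can be sketched as a running total. The sketch assumes the items are processed in chronological order and that each item's count covers only references to items already in the network; the counts and the function name are hypothetical. Sigma E / Sigma V at each step is also the cumulative Beta value used in the next section.

```python
def growth_series(references_per_item):
    """Cumulative (Sigma V, Sigma E, Beta) after each item is added.

    references_per_item lists, in chronological order, how many references
    each paper or patent makes to items already in the network."""
    series = []
    sigma_e = 0
    for sigma_v, refs in enumerate(references_per_item, start=1):
        sigma_e += refs
        series.append((sigma_v, sigma_e, sigma_e / sigma_v))
    return series


# Hypothetical reference counts for the first six items of a network.
for sigma_v, sigma_e, beta in growth_series([0, 1, 2, 1, 3, 2]):
    print(sigma_v, sigma_e, round(beta, 2))
```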
The chief characteristic of the patent graphs is just such a
relatively steep slope, indicating that, once the citations start,
there is a rapid growth in the number of references back to the ear-
lier literature. Since the patent areas are emerging technologies on
the whole, this growth is not very surprising, although one use of
graphs of this type would be to indicate an abnormally low use of a
particular type of literature and thus to expose the under-use (com-
paratively) of a particular subject area.
In contrast, Figure 20 presents the general network growth for
the subject area of Bibliometrics. Here we find that there is a very
much slower rate of growth - although the total network size is much
greater. As explained in Appendix 5, it is believed that the coverage of the bibliometrics literature is complete to 1959. This includes both the source literature (which is probably 95% complete) and the references (which are less complete). Some work has been done on the literature after this period and added to the citation data file. Literature has been collected chronologically from 1960 onwards, and some much more recent items have been added which contain references to the work of Bradford. The evidence from the graph suggests that the bibliography was complete up to item 700 (which occurs in 1964), after which the graph rises steeply. It would appear that either there is a considerable genuine increase in the number of references over this period, or that papers which refer to Bradford have a larger number of references than the rest of the literature. Perhaps these papers review the rest of the early literature as well. It would be possible to test this hypothesis either by looking at the literature of another author in the same way, or by including more general literature and looking at the impact of this on the shape of the graph. The rise at around item 400 (which occurs in 1956) is due to the publication in that year of several items which carried a considerable number of references. Among these were Brown (1956), a consolidation volume of results of citation studies which drew upon and reviewed the previous literature (26 references); Egan and Henkle (1956), a review paper on the use of information which covered the bibliometrics literature as well as use studies (42 references); and Hopp (1956), a thesis (whose title, presumably deliberately, harked back to Bradford's phrase) which reviewed the scatter of scientific literature and the problems this caused for complete documentation (62 references).
FIGURE 15 - PATENT NETWORK 1 - ELECTROPHOTOGRAPHY - GROWTH OF SIGMA E (Sigma E plotted against Sigma V)
FIGURE 16 - PATENT NETWORK 2 - SIGMA E
FIGURE 18 - PATENT NETWORK 4 - EMI - GROWTH OF SIGMA E (Sigma E plotted against Sigma V)
FIGURE 19 - PATENT NETWORKS 1-4 - GROWTH OF SIGMA E (Sigma E plotted against Sigma V; series PAT1, PAT2, PAT3, PAT4)
FIGURE 20 - BIBLIOMETRICS NETWORK - SIGMA E
7.4 Beta Index
The Beta Index has been used in an unchanged form, viz.
Beta = E/V
The Beta Index has been produced by means of the program BEGAM
and the values for Beta for each of the networks also appear as part
of Appendices 1 to 5.
The Beta Index (references/papers) has been plotted cumulatively
against the cumulative number of papers, and is presented as Figures
21 to 25 for patents and Figure 26 for Bibliometrics.
As may be expected, the Beta Index graphs reflect the general
growth in the literature. For all the patent networks there is a
relatively steady rise in the number of references per patent. Com-
paring Figures 19 and 25, it is clear that the general patterns are
reflected for each network. The overall slopes of the graphs are
equivalent and the relationships of one patent network to another are
the same. The rate of growth of Beta shows little sign of slowing down for any of the networks, although Beta remains fairly low, at about 1.2 references per patent (slightly higher, at 1.6 references per patent, for patent network 1).
The Bibliometrics network presents a different and more interest-
ing picture. There is steady growth in the cumulative number of
references per paper to a figure of about 2.8 at item 400. At this point we see the sharp rise in the graph due to the factors noted above. From item 400 to item 700 (1956 to 1964) there is a plateau, where the number of references per paper remains steady. From item 700 onwards, there is a sharp growth in the number of references per paper, reflecting the same factors relating to the nature of the bibliometrics data base as have been mentioned under Sigma E above.
FIGURE 21 - PATENT NETWORK 1 - ELECTROPHOTOGRAPHY - GROWTH OF BETA (Beta plotted against Sigma V)
F/6-UI(t:. 2 2. - ?rt-n::N'T N/'"WtrR.K 2 - .BE:J7:}
<::l
r----------------------------------------------r <::l
-o
<::l
'i
C") ~ ' > 1
,__ ,__
a:, o:;, ~ ~ .... ~ . n '-(j
'i
~ ~ q
'>--
, ......
~
,._,_ ,__
'>--
.,.__
a 6 6
,...;
a 6 a 6
,...;
~ ~
'<:
E-1
r ~
Q:j
PATENT NETVVORK 3


Lt\SERS
GROWTH OP BETA
1.2 ,---------------------------------
1.1
1
0.9 -
OJJ
0.7
0. (;
0.5
0.4
//
/
/
l
/
t) '"
.(..}
().2 -
0.1 -
/
/
/
...-"
/..-/'
')+ I
t. -----,----,-----r---r---,---r-r-------,------T
., )" r,u- so ~ ~ u - 0 0
,L -.. 1 ~ 1 (7
C' [,(,' u 1 T.'
1-J . '" ... '11 .f. ,.
/
/
/
...... /
//_,./// ..
.,..,... /
......
/
/
I
/
/
/
I
/
i
,I
/
.---r---r---,----,----T----
110 130 l50
--1
1 " 0
'n
'-
l
I))
N
w
"\)
.:)>
~
~
'i
~
~
1:5
II>
)\;
tv
I
%
rr,
,)1
IX>
(>:)
I
~
l:.'i
kj
hq
PA .. TEJVT NET11\fORJC 4:-


G!WWTH OF BET11
1. 1 -,---------------------------------------
1 -
0.9
0.8-
0
"1
I
().f.) -
0.5-
0.4
or:_;; -
0 . .2 -
0.1 --
E1v1I SCi\NNER
/'
I
I
II
/
I
/
I
/
i
i
,I
/
/
//"
//
I
0 -+-----T----T-----,----T-----,-------r------T----r-----r--1- -.-,----.--
10 30 50 70 90 110 130 150
SIGMA f'
.,,
'--
~
~
fn
N
t
~
'--1
"' <Xl
'-I>
'i
~
fl't
~
<::>
)(,
"
*
I
~
f"tJ
~
I
'
!,
'
.
.
' \.;

,___
>--
..,...
- ')O-
I
"1:'"-i..
'
(:-.. I.Q
>-- 6 6 6
C"'>
Q

Q .....
C'j
. .,
Q,
;:;..

)!
''i
\t..,

V}
,:""'-,.

':3

q
h

[l
'"'-<
+


>--
0

C") C;t >--
a 6

6
'- LJ!-
l=lfriAR.E. 2 6 - BISL!OA?77/?IC..S /V'ETWOR.)( -13E/7)
(_'\.\
I
I ~
I ~
r co
I
I
I
i"' I ~
L\"'o..
I ~
~
'"
' " .;:::,
f-. ,::J v}
! \:J
I <"'.
I '-;
i
\,, ~
I i
', J
---............. ,
I
' .
' f1
1..5...
- 92 -
7.5 Gamma Index
The Gamma Index has been used as it stands, i.e. the equation:
Gamma = 2E/(V² - V)
The Gamma Index has been produced by means of the computer program BEGAM in Appendix 6. The full results for the networks appear as part of Appendices 1 to 5.
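Both index formulas are simple enough to check by hand. The Python sketch below evaluates Beta = E/V and Gamma = 2E/(V² - V), where V(V - 1)/2 is the number of links possible between V nodes. It reproduces the first non-zero row of Table A1 (V = 80, Sigma E = 6), where the Gamma column appears to be printed multiplied by 100. The function names are illustrative and are not taken from BEGAM.

```python
def beta_index(edges: int, vertices: int) -> float:
    """Beta = E / V: average number of links (references) per node."""
    return edges / vertices


def gamma_index(edges: int, vertices: int) -> float:
    """Gamma = 2E / (V^2 - V): observed links relative to the
    V(V - 1)/2 links possible between V nodes."""
    if vertices < 2:
        return 0.0  # no link is possible with fewer than two nodes
    return 2 * edges / (vertices ** 2 - vertices)


# Row for the 80th patent in Table A1 (patent network 1): Sigma E = 6.
print(beta_index(6, 80))                   # 0.075
print(round(gamma_index(6, 80) * 100, 6))  # 0.189873
```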
The Gamma Index is a measure of connectivity within the network and relates the actual number of references between papers to the maximum number possible. We can imagine that in the early days of an emerging discipline, when the number of papers is small and they come from a relatively select group (an Invisible College made visible), most authors will know, and will probably refer to, the other papers in the field. As the field expands this will become less feasible, and only the important and/or well-known papers will be referred to. Thus we could expect that in the beginning Gamma will have a high value and will then slowly decline.
It has not proved possible to collect data on a modern, emerging scientific discipline against which to test this hypothesis, but Figures 27 to 32 present the values for Gamma plotted against the cumulative number of papers (or patents). It should be noted that the Y-axis values for the individual patent networks (Figures 27 to 30) have been multiplied by a factor of 100. The true Gamma Index values appear on the graph for all the patent networks and on the Bibliometrics graph (Figures 31 and 32). Again we see the same pattern for all the patent networks. The overall shape of the graphs and the relationship of the lines for each network remain essentially the same as in the previous figures. It is especially interesting that there is a steep growth in the Gamma Index values for the patents, thus contradicting the tentative hypothesis above. It would seem likely that the patent literature does not obey the general rules of the development of a discipline. General scientific literature is based upon the principle of a free exchange of information; patents are issued to protect commercial interests and indeed to prevent other companies from exploiting a discovery.
The bibliometrics literature, on the other hand, again presents a more interesting picture (Figure 32), and one which would seem to fit the pattern of scientific literature generally more closely. We also see the same rises in the value of Gamma at items 400 and 700 as we saw in the Beta Index, but the general picture is of a slow decline as the number of references per paper remains static.
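The link between the two indexes may help to explain this decline. Substituting E = Beta × V (from the Beta Index definition) into the Gamma formula gives the identity below, written here with Γ for Gamma and β for Beta:

```latex
\Gamma \;=\; \frac{2E}{V^{2}-V} \;=\; \frac{2\beta V}{V(V-1)} \;=\; \frac{2\beta}{V-1}
```

If the number of references per paper (β) stays roughly constant, Γ must therefore fall roughly as 1/(V - 1) as the network grows. It can rise only when β grows proportionally faster than V - 1, which is what the patent curves imply. As a check against Table A1 (patent network 1, 80th patent), 2 × 0.075 / 79 ≈ 0.0019. That is the tabulated Gamma value of 0.189873 divided by 100.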
[Figure 27 - Patent network 1 (Electrophotography): growth of Gamma, plotted against Sigma V]
[Figure 28 - Patent network 2: Gamma]
[Figure 29 - Patent network 3 (Lasers): growth of Gamma, plotted against Sigma V]
[Figure 30 - Patent network 4: Gamma]
[Figure 31 - All patent networks: Gamma]
[Figure 32 - Bibliometrics citation network: growth of Gamma, plotted against Sigma V]
7.6 S-I Index
The S-I Index is a more interesting case. It was developed for the purpose of analysing transportation networks and has certain characteristics which seem to be appropriate to that purpose and not necessarily to the analysis of information transfer networks. There are two characteristics which are important here:
[1] The graphs handled by the S-I Index as described in the geographical literature are non-directed ones and generally do not have separate subgraphs or isolated points. Information transfer networks, on the other hand, are usually (although not necessarily) directed, and large, realistic ones do have many isolated nodes and subgraphs.
[2] In real life, when a transportation carrier comes to a node with more than one link out of it (e.g. a road or rail junction), a choice must be made as to which link to travel along to the next node. A lorry can only move along a single link and it is this factor which makes the concept of shortest path so important to the analysis of transportation networks. Ideas, on the other hand, travel by more than one link. Take a citation network as an example, where a source paper A is cited by B (i.e. we can consider the ideas in A to be travelling forwards in time to B), and B is cited in turn by C and D. It is clear that the ideas in A are travelling along two paths of intellectual heritage: A to B to D, and A to B to C. If both C and D are now cited by E, then E receives A's ideas by means of both the routes A-B-C-E and A-B-D-E.
It is thus necessary to modify the S-I Index to cope with this situation. The modification is not to the formulae for the Index itself, but to the way in which the basic frequency distribution is derived. The original S-I Index uses, as described in the previous chapter, the shortest path distribution. The modified 'Information Transfer' S-I Index uses a frequency distribution derived from all paths through the network.
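For comparison, the original (geographical) form of the index starts from the distribution of shortest-path lengths. On an undirected graph that distribution can be collected with one breadth-first search per node, as in the Python sketch below. The adjacency-dictionary input and the name `shortest_path_distribution` are illustrative assumptions, and each unordered pair of connected nodes is counted once. Applied to the A to E example above, treated as undirected, it records only the single shortest distance from A to E (3), not the two routes by which A's ideas reach E.

```python
from collections import Counter, deque
from typing import Dict, Hashable, List


def shortest_path_distribution(adj: Dict[Hashable, List[Hashable]]) -> Counter:
    """Frequency of shortest-path lengths over unordered node pairs
    of an undirected graph given as an adjacency dictionary."""
    dist_counts: Counter = Counter()
    for start in adj:
        # Breadth-first search gives the shortest hop count to every reachable node.
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + 1
                    queue.append(nxt)
        for target, d in seen.items():
            if target != start:
                dist_counts[d] += 1
    # Every unordered pair was counted once from each end.
    return Counter({d: n // 2 for d, n in dist_counts.items()})


adj = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "E"],
       "D": ["B", "E"], "E": ["C", "D"]}
print(sorted(shortest_path_distribution(adj).items()))  # [(1, 5), (2, 4), (3, 1)]
```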
Using the argument above about the flow of information in the
network, the nodes in an information transfer network can be divided
into three groups: Source nodes (which do not refer to any other
nodes), Destination nodes (which are not cited by any other node), and
Others (which both refer to and are cited by other nodes). The path
length distribution is derived from all paths from all source nodes to
all destination nodes. Paths in the middle of the network, i.e. from Sources to Others or from Others to Destinations, are not counted.
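The source-to-destination distribution described above can be counted without listing every path. The network is worked through in chronological (topological) order, and each node carries a tally of how many paths of each length arrive at it from source nodes. The Python sketch below does this. It is an illustrative alternative, not a description of how NETWRK worked, and it needs Python 3.9 or later for `graphlib`. It assumes that links point in the direction of information flow (from cited to citing paper). It also assumes that a node with no links at all, being both a source and a destination, contributes one path of length 0. That convention is an assumption, and the thesis's program may have treated such nodes differently.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, Set, Tuple


def all_path_lengths(nodes: Iterable[Hashable],
                     links: Iterable[Tuple[Hashable, Hashable]]) -> Counter:
    """Frequency of lengths of all paths from source nodes (no incoming
    links) to destination nodes (no outgoing links) in an acyclic network.
    A link (a, b) means that b cites a, i.e. the ideas in a flow to b."""
    nodes = list(nodes)
    preds: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    succs: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    for a, b in links:
        preds[b].add(a)
        succs[a].add(b)
    # arriving[n][k] = number of paths of length k from any source to n.
    arriving: Dict[Hashable, Counter] = {}
    for n in TopologicalSorter(preds).static_order():
        if not preds[n]:
            arriving[n] = Counter({0: 1})  # a source starts one path of length 0
        else:
            tally: Counter = Counter()
            for p in preds[n]:
                for length, count in arriving[p].items():
                    tally[length + 1] += count
            arriving[n] = tally
    result: Counter = Counter()
    for n in nodes:
        if not succs[n]:  # destinations close the paths
            result.update(arriving[n])
    return result


# The A to E example from the text: two paths of length 3, A-B-C-E and A-B-D-E.
links = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
print(all_path_lengths("ABCDE", links))  # Counter({3: 2})
```

Because each node's tally is built once from its predecessors, the work grows with the number of links rather than with the number of paths.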
This frequency distribution has been derived by means of the com-
puter program NETWRK. This program was written as a Modular Degree
project in computing at the City of London Polytechnic. It enables
the user to set up a network, to add nodes and links to the network,
to delete nodes and links and to produce statistics from the resulting
network. As well as the 'Information Transfer' S-I Index, the program also produced the standard shortest-path Index. The data from this program (and from the BEGAM program) were transferred to an IBM PC computer, and Lotus 1-2-3 was used to produce the tables which form the main sections of Appendices 1 to 5, as well as the graphs which appear in this chapter. Some information on the program is given in Appendix 6, but the complete program is too large to be reproduced here.
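Once the frequency distribution is available, the two coordinates of the index follow from its moments. The Python sketch below assumes Ord's definitions: I = variance/mean, and S = third central moment/variance, both taken over all paths with population (divide-by-N) moments. On that assumption it reproduces the S and I values of Table A1 from the path counts in Table A2. For example, the 90th patent has 87 paths of length 0 and 19 of length 1, and the 140th patent has 130, 74 and 8 paths of lengths 0, 1 and 2. The function name `s_i_index` and the (0, 0) result for a distribution with no spread are assumptions, the latter chosen to match the zero rows of Table A1.

```python
from typing import Mapping, Tuple


def s_i_index(freq: Mapping[int, int]) -> Tuple[float, float]:
    """Return (S, I) for a path-length frequency distribution {length: count}.

    I = variance / mean and S = third central moment / variance,
    using population moments (assumed definitions, after Ord 1967)."""
    n = sum(freq.values())
    mean = sum(k * c for k, c in freq.items()) / n
    m2 = sum(c * (k - mean) ** 2 for k, c in freq.items()) / n
    m3 = sum(c * (k - mean) ** 3 for k, c in freq.items()) / n
    if m2 == 0:
        return 0.0, 0.0  # degenerate case: all paths have the same length
    return m3 / m2, m2 / mean


print(s_i_index({0: 87, 1: 19}))          # S ≈ 0.641509, I ≈ 0.820755
print(s_i_index({0: 130, 1: 74, 2: 8}))   # S ≈ 0.522778, I ≈ 0.753249
```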
Figures 33 to 36 present the graphs of the S-I Index values for the patent networks, and Figure 37 the equivalent results for the Bibliometrics network.
These graphs do show some interesting differences between the patent networks. Patent networks 1 and 2 have similar graphs, with the following characteristics: an increase in both S and I from the 0,0 origin, followed by a fall-back, a further increase and then a near-vertical drop in the S value. Although the actual values in network 1 are generally higher, the same overall shape of the graph is there. Network 2 additionally has considerably more activity around the S=0.6, I=0.9 area. Patent network 3 is different, but the same general pattern can be discerned. Patent network 4 is a very small network and the appearance of its graph is very different from those of the other patent networks. This can probably be attributed to the newness of the subject.
The S-I graph for the Bibliometrics network is distinctly different again from those of the four patent networks. Although the same rise appears from the 0,0 origin up to a maximum at S=1.6, I=1.8, followed by a slight fall, there is not the same peaking as in patent network 1. Instead there is a steady (if somewhat convoluted) fall in both S and I, which tends to parallel the initial rise. Once a minimum has been reached at around S=-0.8, I=0.9, however, there is a distinct upward curve in the graph which would appear to be continuing.
[Figure 33 - Patent network 1 (Electrophotography): S-I index graph]
[Figure 34 - Patent network 2: S-I index]
[Figure 35 - Patent network 3 (Lasers): S-I index]
[Figure 36 - Patent network 4: S-I index]
[Figure 37 - Bibliometrics network: S-I index]
7.7 Growth of paths
The final graphs (Figures 38 and 39) present the growth in the number of paths through the network for the four patent networks and for the bibliometrics network. They illustrate the comment quoted earlier about the explosive combinatorial growth of paths in real-life networks. The bibliometrics network is especially interesting, with an enormous growth in the number of paths towards the end of the analysed network. The processing of the bibliometrics network was ended at about source node 400. The main reason was that by then it had become very clear that the pattern of paths was different from that of the family of patent networks. A minor, subsidiary reason was the increasing length of time that each stage in the analysis was taking. By the time that the 400-node network was being processed, the analysis was taking something over 4 hours on a DECsystem-10 computer. No formal timings were made, but it appeared that the time was roughly proportional to the number of paths. It is clear from the graph that the full bibliometrics network is likely to contain paths numbering in the billions, and to require a very powerful computer to process the data.
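Enumeration time grows with the number of paths. The total number of source-to-destination paths, however, can be counted in time proportional to the number of links, by summing path counts in topological order instead of listing the paths. The Python sketch below illustrates this on a hypothetical layered network in which every paper in one layer cites every paper in the layer before. There the count grows as width^layers, which shows how quickly totals run into the billions. This is a swapped-in counting technique for illustration, not the procedure that was run on the DECsystem-10, and it requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable, List, Tuple


def count_paths(nodes: Iterable[Hashable],
                links: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Total number of paths from sources (no incoming links) to
    destinations (no outgoing links) in an acyclic network.
    A link (a, b) means that b cites a; an isolated node counts as one path."""
    nodes = list(nodes)
    preds: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    has_out = {n: False for n in nodes}
    for a, b in links:
        preds[b].append(a)
        has_out[a] = True
    paths_to: Dict[Hashable, int] = {}
    # static_order() yields every node after all of its predecessors.
    for n in TopologicalSorter(preds).static_order():
        # A source starts one path; any other node inherits its predecessors' paths.
        paths_to[n] = sum(paths_to[p] for p in preds[n]) if preds[n] else 1
    return sum(paths_to[n] for n in nodes if not has_out[n])


def layered_network(width: int, layers: int):
    """Hypothetical network: each node cites every node in the previous layer."""
    nodes = [(layer, i) for layer in range(layers) for i in range(width)]
    links = [((layer - 1, i), (layer, j))
             for layer in range(1, layers) for i in range(width) for j in range(width)]
    return nodes, links


for layers in (5, 10, 20):
    print(layers, count_paths(*layered_network(3, layers)))
# 5 243
# 10 59049
# 20 3486784401
```

With 20 layers of three papers the network has only 171 links, yet it contains about 3.5 billion paths.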
The figure for the Bibliometrics network is also interesting because of the very abrupt growth in the number of paths which occurred with the increase in the number of source papers from 260 to 270. The cause of this increase was the publication of Hart's thesis (Hart 1950), which had 29 references and reviewed the major literature of the field. This single publication had an impact on the information transfer characteristics far in excess of any other indicators of its value. Deleting this paper (and any subsequent references to it) from the network has the effect of bringing the number of paths down to a lower set of values, which continue the general line of the graph as it exists up to that point. There was also a doubling in the number of paths which occurred in 1956, at the same point and for the same reasons as discussed above (7.3).
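The effect of a single review with many references, such as Hart's thesis, can be seen on a toy example. In the hypothetical network below, ten early papers are each cited once by one of five later papers. A review "R" cites all ten early papers and is itself cited by all five later ones, so it alone adds 10 × 5 = 50 source-to-destination paths. Deleting it, together with the references to it, removes those paths. The network and node names are invented purely for illustration. The small topological-order path counter is included so that the block runs on its own, and it requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def count_paths(nodes, links):
    """Number of paths from nodes with no incoming links (sources)
    to nodes with no outgoing links (destinations); network must be acyclic."""
    preds = {n: [] for n in nodes}
    has_out = {n: False for n in nodes}
    for a, b in links:
        preds[b].append(a)
        has_out[a] = True
    paths_to = {}
    for n in TopologicalSorter(preds).static_order():
        paths_to[n] = sum(paths_to[p] for p in preds[n]) if preds[n] else 1
    return sum(paths_to[n] for n in nodes if not has_out[n])


early = [f"e{i}" for i in range(10)]
later = [f"l{j}" for j in range(5)]
# Each later paper cites two early papers directly (links run cited -> citing).
direct = [(f"e{j}", f"l{j}") for j in range(5)] + [(f"e{j + 5}", f"l{j}") for j in range(5)]
# Hypothetical review "R" cites all ten early papers and is cited by every later one.
via_review = [(e, "R") for e in early] + [("R", paper) for paper in later]

with_review = count_paths(early + ["R"] + later, direct + via_review)
without_review = count_paths(early + later, direct)
print(with_review, without_review)  # 60 10
```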
[Figure 38 - All patent networks: number of paths (thousands)]
[Figure 39 - Bibliometrics network: growth in paths]
CHAPTER 8
SUMMARY AND CONCLUSIONS
8.1 Introduction
The aim of this chapter is to summarise the results obtained in
the previous ones, and to suggest some ways in which these results
might be used in the future. Inevitably, a study of this type raises
more questions than it solves and some indications of additional work
that could arise from the analyses presented here are also given.
8.2 Information Transfer Model
Chapter 4 presented a generalised model of the Information Transfer process which can serve as a framework for future studies of the process at any level, from international studies down to local studies of the role and utility of the catalogue.
One of the advantages of setting any form of study within the framework of a general model is that it places the study in a general context and relates the subject being studied to an overall picture. This has two implications. The first is that there is less likelihood of important and relevant factors being forgotten in the course of the study. The second is that, if there is a relatively coherent model of the overall process being studied, it is then easier for the results of the study to be integrated with the general body of knowledge within a discipline. An analogy could be drawn with a jigsaw puzzle. It is possible to fit together small sections of a puzzle and these can form a coherent whole. It is difficult, however, to relate several small sections unless there is a picture on the front of the box. If this picture is available, then the relationship of one area to another is shown. Obviously the picture that is supplied with a jigsaw is the "answer", and this is where the analogy breaks down. The information transfer model (whether it is the one proposed in this thesis or any other one) can only provide the outline of the picture of information transfer. It is the purpose of the individual studies to complete the picture. The general principle remains, however, that it is difficult to see the relationship between various areas of detail unless there is a "broad sweep" picture of the field as a whole. Any model should be able to subsume within its scope individual detailed models of particular areas. It is believed that the model presented will do just that. Detailed studies on library effectiveness, library management, user behaviour etc. can all be explained within the overall concepts of the model.
8.3 Graph Theory
The main aim of the thesis was to see whether some of the generalised index measures by which networks can be characterised and distinguished could be applied to information transfer networks. Chapter 5 reviewed the ways in which graph theory had previously been used in library and information studies and showed that this approach had not been used before. It also contained some information on other areas of library and information science which could benefit from the application of graph-theoretic methods to the networks that are an integral part of these areas. Some of the examples mentioned are interlibrary loan networks, organisational structures, classification schemes and thesauri. In each case it is suggested that there might be some relationship between the network structure and some measure of the performance of a system.
Chapter 6 reviewed the various measures which have been used in transport geography for the purpose of characterising networks. From the work done by Kansky and by others, it appeared that the indexes which had been proposed were very variable in their capacity to distinguish networks, although there appeared to have been no extensive testing on large-scale networks. It is almost certain that there have been no studies on graphs with more than about 20 nodes, so the results which have been developed for the very small graphs illustrated in Chapter 6 are not necessarily valid for realistic information transfer networks.
Three graph-theoretic measures were then used to analyse four patent citation networks and the citation network of the literature on bibliometrics. The results were presented as the figures in Chapter 7. The results showed two things. Firstly, even the simpler measures such as the Beta Index did seem to be able to distinguish between networks of the same general type, such as patent networks, and certainly between patent networks and the bibliometrics network. Secondly, whichever measure is chosen, there do seem to be real differences between a family of networks which could be considered to be related because of their social background (i.e. the patent networks) and the other network studied. These differences appear to be related to the information transfer characteristics of the networks.
8.4 Implications
The work in this thesis can only be indicative, but it does lead to some interesting speculations for future work.
[1] The information transfer networks studied are very restricted in
scope. Ideally, more work needs to be done with other scientific
and non-scientific literatures in the same way, i.e. plotting the
changes in the index measure concerned over time from the start
of the literature. In this way a series of graphs can be drawn
up which relate to various literatures.
[2] If this work were done, it is possible that various groups of
subjects would be seen to have similar ranges of values on the
S-I graph and that a series of classifications and groupings of
subjects could be done. The classification would then be on the
basis of the information transfer characteristics of the litera-
ture, rather than on traditional classification criteria.
[3] If this proved to be the case, then a grouping of like subjects on the basis of their information transfer characteristics would have some important implications for those aspects of the library and information world which are concerned with services and products in the area of information transfer. In particular, abstracting and indexing services and areas of user education could be affected by a study of the information transfer likenesses of different subjects. For example, it seems likely that subjects with a low Gamma Index value would best be served by either a contents pages service or a traditional abstracting service, rather than a citation index. It also seems likely that if subjects which are traditionally very different (say, English literature and Engineering) were shown to have similar information transfer characteristics, then the methods which had proved effective in one area might be applied with more confidence in the other.
[4] The methods can be applied to any information transfer situation and would obviously play a role in the historical study of the development of a literature. Very often gross numbers of abstracts or papers are presented as graphs, with time as the x-axis. It is suggested that it would be important for any historical or comparative study to look at the information transfer characteristics of subjects and to ask questions about the extent to which, say, the chemical literatures of the UK and Germany had different citation network characteristics. The method can be applied to personal communication patterns as well as to the formal literature, and could plot the growth of an "invisible college" or informal information exchange, as well as distinguishing between different types of invisible college. Very little work has been done on the structural characteristics and the typology of informal information exchange networks.
[5] Following on from the historical study of what has happened in the past, the data derived can then be projected into the future and the future development of the information transfer characteristics of the subject predicted. In this way useful planning data can be derived from the graphs. If it is seen that the information transfer characteristics of a particular subject network are changing, then it may be important to promote or develop a particular channel of information transfer and/or discourage other ones where these are becoming deficient. By plotting the citation characteristics of a subject, it would be possible to indicate when a subject was moving to a situation where a citation index to the literature would be more effective and could begin to replace a contents page service (market forces and vested interests permitting!).
[6] The technical problems of this type of analysis should not be
under-estimated. Whilst the computational algorithms for calcu-
lating either the shortest paths or all paths through a network
are well-established, the amount of computing power required for
the analysis of any realistic network is very large. In addi-
tion, raw data for studies of the chronological development of a
literature are not readily available. The problem is a different
one from the usual one of studying citations, where a citation
index can be used. To gather the raw data for this type of study
involves the compilation of a chronologically arranged complete
bibliography and an examination of all the references attached to
each paper. Citation indexes are certainly of some use, but
other methods need to be used as well.
BIBLIOGRAPHY
ABLER, R.F. 1974. ~ geography of communications. In: Eliot Hurst (1974), 327-346.
AVRAMESCU, A. 1975. Modelling scientific information transfer. Int Forum Inf Docum 1(1) 1975, 13-19.
BRADFORD, S.C. 1953. Documentation. 2nd edition. Crosby Lockwood, 1953.
BROWN, C.H. 1956. Scientific serials: characteristics of most cited publications ~ mathematics, physics, chemistry, geology, physiology, botany, zoology, and entomology. Association for College and Research Libraries, 1956.
BROWN, L. 1968. Diffusion processes and location: ~ conceptual framework ~ bibliography. Philadelphia: Regional Science Research Institute, 1968.
CLIFF, A.D., HAGGETT, P. and ORD, J.K. 1979. Graph theory in geography. In: R.J. Wilson and L.W. Beineke, eds. Applications of graph theory (Academic Press, 1979), 293-326.
COOLEY, C.H. 1894. The theory of transportation. ]QQn Assoc 9(3) May 1894. Reprinted in his Sociological theory and social research (Holt, 1938) and partially reprinted in Eliot Hurst (1974), 15-29.
CRAWFORD, S. 1971. Informal communication among scientists in sleep research. JASIS 22(5) Sep-Oct 1971, 301-310.
CUMMINGS, L.J. and FOX, D.A. 1973. Some mathematical properties of cycling strategies using citation indexes. Inform Stor Retr 9 (1973), 713-719.
ELIOT HURST, M.E. 1974. Transportation geography: comments and readings. McGraw-Hill, 1974, 528p.
EDMUNDSON, H.P. 1967. Mathematical models in linguistics and language processing. In: H. Borko, ed. Automated language processing (Wiley, 1967), 33-96.
EGAN, M. and HENKLE, H.H. 1956. Ways and means in which research workers, executives and others use information. In: J.H. Shera et al., eds. Documentation in action (Reinhold, 1956), 137-159.
ELLIS, P. 1977. A study of the information content of patent citation networks. MSc dissertation, Centre for Information Science, City University, 1977.
FIALKOWSKI, K. and JASTRZEBSKI, S. 1978. Identificational control of information flow in the network structures. Int Forum Inf Docum 3(1) Jan 1978, 18-.
FORD, L.R. and FULKERSON, D.R. 1972. Flows in networks. Princeton University Press, 1972.
GARNER, R. 1966. A graph theoretic analysis of citation index structures. Masters thesis, Drexel Institute of Technology, 1966.
GARNER, R. 1967a. A computer oriented, graph theoretic analysis of citation index structures. In: B. Flood, ed. Three Drexel science studies (Drexel Institute of Technology, 1967), 1-46. A reprint of Garner (1966).
GARNER, R. 1967b. Graph theory as an retrieval tool - an example from citation indexing. 4 (1967), 80-83.
GOFFMAN, W. and NEWILL, V.A. 1964. Generalization of epidemic theory: an application to the transmission of ideas. Nature 204(4955) 17 Oct 1964, 225-228.
GOFFMAN, W. and NEWILL, V.A. 1967. Communication and epidemic processes. 298(1454) 2 May 1967, 316-334.
HAGERSTRAND, T. 1967. Innovation diffusion as a spatial process. University of Chicago Press, 1967.
HAGGETT, P. and CHORLEY, R.J. 1969. Network analysis in geography. Arnold, 1969.
HAGGETT, P., CLIFF, A.D. and FREY, A. 1977. Locational analysis in human geography. 2nd edition. Arnold, 1977.
HART, P.W. 1950. Periodicals professional librarianship. MSLS thesis, Catholic University, Aug 1950.
HAVELOCK, R.G. 1971. Planning for innovation through dissemination and utilization of knowledge. Center for Research on Utilization of Scientific Knowledge, Institute for Social Research, University of Michigan, 1971.
HOPP, R.H. 1956. A study complete documentation. PhD thesis, University of Illinois, 1956, 120p.
JAMES, G.A. 1970. Some discrete distributions for graphs with applications to regional transport networks. Geogr Annaler 52B (1970), 14-21.
KANSKY, K.J. 1963. Structure of transportation networks: relationships between network geometry and regional characteristics. University of Chicago, Department of Geography, 1963. (Research paper 84).
KORFHAGE, R.R., BHAT, U.N. and NANCE, R.E. 1972. Graph models for library information networks. 42(1) Jan 1972, 31-42. Reprinted in: Swanson, D.R. and Bookstein, A., eds. Operations research: implications for libraries: thirty-fifth annual conference of the Graduate Library School (University of Chicago Press, 1972), 31-42.
KORFHAGE, R.R. 1972. Informal communication of scientific information. 25(1) Jan-Feb 1972, 25-32.
KORZYBSKI, A. 1958. Science and sanity: an introduction to non-Aristotelian systems and general semantics. 4th edition. Institute for General Semantics, 1958.
LOTKA, A.J. 1926. The frequency distribution of scientific productivity. Academy of Science 16(12) 1926, 317-323.
LOWE, J.C. and MORYADAS, S. 1975. The geography of movement. Houghton Mifflin, 1975.
NANCE, R.E. 1970. An analytical model of a library network. 21(1) Jan-Feb 1970, 58-66.
NANCE, R.E., KORFHAGE, R.R. and BHAT, U.N. 1972. Information networks: definitions and message transfer models. 23(4) Jul-Aug 1972, 237-247.
NYSTUEN, J.D. and DACEY, M.F. 1961. A graph theory interpretation of nodal regions. Papers Procs Regional Science Assoc 7 (1961), 29-42. Reprinted in: Berry, B.J.L. and Marble, D.F., eds. Spatial analysis: a reader in statistical geography (Prentice-Hall, 1968), 407-416.
ORD, J.K. 1967. On a system of discrete distributions. Biometrika 54(3/4) 1967, 649-656.
PARKER, E.B., PAISLEY, W.J. and GARRETT, R. 1967. Bibliographic citations as unobtrusive measures of scientific communication. Stanford University, Institute for Communication Research, Oct 1967.
PITTS, F.R. 1965. A graph theoretic approach to historical geography. Profess Geogr 17(5) 1965, 15-20.
PRICE, D.J. de S. 1970. Citation measures of hard science, soft science, technology and nonscience. In: C.E. Nelson and D.K. Pollack, eds. Communication among scientists and engineers (D.C. Heath, 1970), 3-22.
PRICE, D.J. de S. 1976. A general theory of bibliometrics and other cumulative advantage processes. JASIS 27(5/6) Sep-Oct 1976, 292-306.
PRITCHARD, A. and WITTIG, G.R. 1982. Bibliometrics: ~ bibliography ~ citation index. Volume ~: ~-~. Watford: ALLM Books, 1982.
PRITCHARD, A. 1972. Bibliometrics and information transfer. 4(20) May 1972, 37-46.
PRIHAR, Z. 1956. Topological properties of telecommunication networks. 44 (Jul 1956), 927-933.
SHAW, W.M. 1981. Measuring the information in a communication graph. .AID;ll 18 (1981), 309-311.
SHIMBEL, A. 1953. Structural parameters of communication networks. Med Biophys 15 (1953), 501-507.
WEINBERG, H.L. 1960. Levels of knowing and existence: studies in general semantics. Hodder & Stoughton, 1960.
WERNER, C. 1968. A research seminar in theoretical transportation geography: networks and their service areas. In: F. Horton, ed. Geographical studies of urban transportation network analysis (Northwestern University, 1968), 128-170. (Northwestern University Studies in Geography no. 16).
ZIPF, G.K. 1949. Human behavior and the principle of least effort. Addison-Wesley, 1949.
- 123 -
APPENDIX 1
PATENT NETWORK 1 - ELECTROPHOTOGRAPHY PATENTS
The source for this data is Ellis (1977), pages A1 to A18. It is
a medium-sized patent network consisting of 205 nodes and 1161 paths.
The data for this network is reproduced as Tables A-1 and A-2. Table
A-1 summarises the growth in links (Sigma E), the index measures Beta,
Gamma, S and I, and the area of the S-I plane in which the values of S
and I fall (the best-fitting frequency distribution). Table A-2 gives
detailed information on the total number of paths in the network. The
figures across the top of the table give the total number of paths
and then the path lengths (0, 1, 2, ...). For every 10th patent, the
table gives the total number of paths and the frequency of each path
length.
This table layout is the same in Appendices 1-5, and the description
will not be repeated in the other Appendices.
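The S and I values in the index-measure tables can be checked against the path-length frequencies in the matching path-data tables. The sketch below is written under one working assumption: that S and I are the moment ratios of Ord (1967). On that reading, I is the variance divided by the mean of the path-length distribution, and S is the third central moment divided by the variance. With that assumption, the frequencies 130, 74 and 8 from the Table A-2 row for 140 patents reproduce the Table A-1 values S = 0.522778 and I = 0.753249. The function name `ord_s_i` is hypothetical.

```python
def ord_s_i(freqs):
    """Return (S, I) for a path-length frequency list.

    freqs[k] is the number of paths of length k (k = 0, 1, 2, ...).
    Assumption: I = m2 / mean and S = m3 / m2, where m2 and m3 are the
    second and third central moments (population form).
    """
    n = sum(freqs)
    if n == 0:
        return 0.0, 0.0
    mean = sum(k * f for k, f in enumerate(freqs)) / n
    m2 = sum(f * (k - mean) ** 2 for k, f in enumerate(freqs)) / n
    m3 = sum(f * (k - mean) ** 3 for k, f in enumerate(freqs)) / n
    if mean == 0 or m2 == 0:
        return 0.0, 0.0  # no links yet: the tables print 0 for S and I
    return m3 / m2, m2 / mean


# Table A-2, 140 patents: 130 paths of length 0, 74 of length 1, 8 of length 2.
s, i = ord_s_i([130, 74, 8])
print(f"S = {s:.6f}  I = {i:.6f}")
```

When only paths of length 0 and 1 are present, these definitions give S = 2I - 1. This matches the earliest non-zero rows of Table A-1 (for example 80 and 90 patents).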
TABLE A1 - PATENT NETWORK 1 - INDEX MEASURES
PATENT NETWORK 1 : ELECTROPHOTOGRAPHY
COUNT  SIGMA E  BETA      GAMMA(x100)  S         I         BEST FIT
10     0        0         0            0         0
20     0        0         0            0         0
30     0        0         0            0         0
40     0        0         0            0         0
50     0        0         0            0         0
60     0        0         0            0         0
70     0        0         0            0         0
80     6        0.075     0.189873     0.858824  0.929412  HYPERGEOMETRIC
90     19       0.211111  0.474407     0.641509  0.820755  BINOMIAL
100    19       0.19      0.383838     0.641509  0.820755  BINOMIAL
110    34       0.309091  0.567139     0.510791  0.755396  HYPERGEOMETRIC
120    59       0.491667  0.826331     0.309942  0.654971  BINOMIAL
130    59       0.453846  0.703637     0.309942  0.654971  BINOMIAL
140    84       0.6       0.863309     0.522778  0.753249  HYPERGEOMETRIC
150    91       0.606667  0.814318     0.50033   0.73951   HYPERGEOMETRIC
160    98       0.6125    0.77044      0.812962  1.2543    BETA BINOMIAL
170    114      0.670588  0.793596     0.434915  1.09966   BETA BINOMIAL
180    172      0.955556  1.06766      -0.032?   0.80128   BETA BINOMIAL
190    218      1.14737   1.21415      -0.14981  0.698273  BETA BINOMIAL
200    301      1.505     1.51256      -0.55370  0.785547  BETA BINOMIAL
205    333      1.62439   1.59254      -1.65346  0.768338  BETA BINOMIAL
(? = digits illegible in the source copy)
TABLE A2 - PATENT NETWORK 1 - PATH DATA
PATENT NETWORK 1 : ELECTROPHOTOGRAPHY
COUNT  NO. PATHS  0     1     2     3     4     5     6
10     10         10
20     20         20
30     30         30
40     40         40
50     50         50
60     60         60
70     70         70
80     85         ?     6
90     106        87    19
100    116        97    19
110    ?          105   34
120    ?          112   51
130    181        ?     59
140    212        130   74    8
150    228        139   81    8
160    246        148   32    ?     8
170    297        156   77    100   ?
180    427        161   76    182   8
190    527        170   117   232   8
200    ?          172   81    190   302   ?
205    1161       172   ?     323   399   229   ?     ?
(? = illegible in the source copy; in rows containing ?, the column
positions of the values that follow are uncertain)
APPENDIX 2
NETWORK 2 - ZIEGLER-NATTA CATALYSIS PATENTS
This network is taken from Ellis (1977) pages A19-A68. It is the
largest of the four patent networks, consisting of 456 patents with
1591 paths. The basic data for the network is given as Tables A-3 and
A-4.
TABLE A3 - PATENT NETWORK 2 - INDEX MEASURES
PATENT NETWORK 2 : ZIEGLER-NATTA CATALYSIS
COUNT  SIGMA E  BETA      GAMMA(x100)  S         I         BEST FIT
10     0        0         0            0         0
20     0        0         0            0         0
30     0        0         0            0         0
40     0        0         0            0         0
50     0        0         0            0         0
60     0        0         0            0         0
70     0        0         0            0         0
80     0        0         0            0         0
90     0        0         0            0         0
100    0        0         0            0         0
110    0        0         0            0         0
120    0        0         0            0         0
130    0        0         0            0         0
140    0        0         0            0         0
150    10       0.066666  0.089485     0.874214  0.937107  HYPERGEOMETRIC
160    10       0.0625    0.078616     0.881657  0.940828  HYPERGEOMETRIC
170    10       0.058823  0.069613     0.888268  0.944134  HYPERGEOMETRIC
180    10       0.055555  0.062073     0.89418   0.94709   HYPERGEOMETRIC
190    10       0.052631  0.055694     0.899498  0.949749  HYPERGEOMETRIC
200    10       0.05      0.050251     0.904306  0.952153  BETA BINOMIAL
210    30       0.142857  0.136705     0.744681  0.87234   HYPERGEOMETRIC
220    35       0.159091  0.145289     0.716599  0.8583    HYPERGEOMETRIC
230    48       0.208696  0.182267     0.641791  0.820896  BINOMIAL
240    55       0.229167  0.191771     0.611307  0.805654  HYPERGEOMETRIC
250    81       0.324     0.260241     0.482428  0.741214  BINOMIAL
260    89       0.342308  0.26433      0.507407  0.750301  BINOMIAL
270    113      0.418519  0.311166     0.696906  0.854597  BETA BINOMIAL
280    143      0.510714  0.366103     0.720687  0.900409  BETA BINOMIAL
290    160      0.551724  0.381816     0.67009   0.867576  BETA BINOMIAL
300    168      0.56      0.374582     0.647249  0.849946  BETA BINOMIAL
310    201      0.648387  0.419668     0.664515  0.879848  BETA BINOMIAL
320    204      0.6375    0.399687     0.669264  0.88062   BETA BINOMIAL
330    219      0.663636  0.403426     0.67428   0.941341  BETA BINOMIAL
340    237      0.697059  0.411244     0.633912  0.969992  BETA BINOMIAL
350    246      0.702857  0.402783     0.851834  1.09409   BETA BINOMIAL
360    259      0.719444  0.400805     0.765792  1.11733   BETA BINOMIAL
370    290      0.783784  0.412873     ?         1.07164   BETA BINOMIAL
380    306      0.805263  0.424941     0.964192  1.20125   BETA BINOMIAL
390    355      0.910256  0.467998     0.848977  1.19083   BETA BINOMIAL
400    377      0.9425    0.472431     0.732143  1.18883   BETA BINOMIAL
410    395      0.963415  0.471107     0.705234  1.31016   BETA BINOMIAL
420    432      1.02857   0.490965     0.398126  1.28997   BETA BINOMIAL
430    466      1.08372   0.505231     0.375224  1.31968   BETA BINOMIAL
440    535      1.21591   0.553945     0.068300  1.29513   BETA BINOMIAL
450    578      1.28444   0.572136     -0.48995  1.28575   BETA BINOMIAL
456    599      1.3136    0.577405     -1.33546  1.35434   BETA BINOMIAL
(? = illegible in the source copy)
TABLE A4 - PATENT NETWORK 2 - PATH DATA
PATENT NETWORK 2 : ZIEGLER-NATTA CATALYSIS
COUNT  NO. PATHS  0     1     2     3     4     5     6     7     8
10     10         10
20     20         20
30     30         30
40     40         40
50     50         50
60     60         60
70     70         70
80     80         80
90     90         90
100    100        100
110    110        110
120    120        120
130    130        130
140    140        140
150    159        149   10
160    169        159   10
170    179        169   10
180    189        179   10
190    199        189   10
200    209        199   10
210    ?          205   30
220    247        212   ?
230    268        220   48
240    283        228   55
250    313        ?     81
260    327        239   87    1
270    ?          243   97    12
280    381        246   109   ?
290    402        251   123   28
300    416        257   131   28
310    449        262   143   43
320    460        ?     146   43
330    476        276   130   63
340    495        280   130   84
350    509        285   128   81    15
360    ?          291   114   107   17
370    561        294   120   130   17
380    588        300   115   150   9     14
390    642        304   128   162   34    14
400    677        311   119   182   51    ?
410    749        317   112   178   100   42
420    834        321   111   135   207   49
430    920        325   94    166   248   ?
440    1103       ?     57    176   319   145   66    14
450    1222       327   8     ?     291   301   127   45
456    1591       329   ?     96    408   320   257   116   31
(? = illegible in the source copy; in rows containing ?, the column
positions of the values that follow are uncertain)
APPENDIX 3
NETWORK 3 - LASER PATENTS
This network is taken from Ellis (1977) pages A69-A87. It is a
fairly small network with 164 patents and 377 paths. The basic data
for the network is given as Tables A-5 and A-6.
TABLE A5 - PATENT NETWORK 3 - INDEX MEASURES
PATENT NETWORK 3 : LASERS
COUNT  SIGMA E  BETA      GAMMA(x100)  S         I         BEST FIT
10     0        0         0            0         0
20     0        0         0            0         0
30     0        0         0            0         0
40     0        0         0            0         0
50     0        0         0            0         0
60     0        0         0            0         0
70     0        0         0            0         0
80     0        0         0            0         0
90     30       0.333333  0.749064     0.482759  0.741379  BINOMIAL
100    44       0.44      0.888889     0.348148  0.674074  BINOMIAL
110    59       0.536364  0.984153     0.749378  1.2262    BETA BINOMIAL
120    91       0.758333  1.27451      0.548285  0.960132  BETA BINOMIAL
130    121      0.930769  1.44305      0.327555  1.02968   BETA BINOMIAL
140    140      1         1.43885      0.353048  1.23932   BETA BINOMIAL
150    162      1.08      1.44966      0.085512  1.51342   BETA BINOMIAL
160    181      1.13125   1.42296      -0.75422  1.29685   BETA BINOMIAL
164    194      1.18293   1.45144      -1.24262  1.74176   BETA BINOMIAL
TABLE A6 - PATENT NETWORK 3 - PATH DATA
PATENT NETWORK 3 : LASERS
COUNT  NO. PATHS  0     1     2     3     4     5     6     7
10     10         10
20     20         20
30     30         30
40     40         40
50     50         50
60     60         60
70     70         70
80     80         80
90     ?          86    30
100    135        91    44
110    151        98    16    ?
120    187        102   47    38
130    211        103   ?     75    6
140    234        106   20    59    48
150    267        109   11    ?     70    ?
160    ?          114   7     17    75    140   16
164    377        114   5     6     4     ?     58    12
(? = illegible in the source copy; in rows containing ?, the column
positions of the values that follow are uncertain)
APPENDIX 4
NETWORK 4 - EMI SCANNER PATENTS
This network is taken from Ellis (1977) pages A88-A100. It is
the smallest of the patent networks and consists of 154 patents and
265 paths. The basic data is presented as Tables A-7 and A-8.
TABLE A7 - PATENT NETWORK 4 - INDEX MEASURES
PATENT NETWORK 4 : EMI SCANNER
COUNT  SIGMA E  BETA      GAMMA(x100)  S         I         BEST FIT
10     0        0         0            0         0
20     0        0         0            0         0
30     0        0         0            0         0
40     0        0         0            0         0
50     0        0         0            0         0
60     0        0         0            0         0
70     0        0         0            0         0
80     0        0         0            0         0
90     0        0         0            0         0
100    0        0         0            0         0
110    0        0         0            0         0
120    29       0.241667  0.406162     0.666667  0.833333  BINOMIAL
130    82       0.639076  0.977937     0.197917  0.598958  HYPERGEOMETRIC
140    122      0.871429  1.25385      0.460248  0.939635  BETA BINOMIAL
150    14?      0.996667  1.324?3      1.06117   1.70279   ? BINOMIAL
154    163      1.05844   1.38358      0.244529  1.86247   BETA BINOMIAL
(? = illegible in the source copy)
TABLE A8 - PATENT NETWORK 4 - PATH DATA
PATENT NETWORK 4 : EMI SCANNER
COUNT  NO. PATHS  0     1     2     3     4     5
10     10         10
20     20         20
30     30         30
40     40         40
50     50         50
60     60         60
70     70         70
80     80         80
90     90         90
100    100        100
110    110        110
120    138        115   ?
130    192        115   77
140    224        116   ?
150    246        116   39    29    29    ?
154    265        117   2     11    55    47    ?
(? = illegible in the source copy; in rows containing ?, the column
positions of the values that follow are uncertain)
APPENDIX 5
NETWORK 5 - BIBLIOMETRICS LITERATURE
This network was, in essence, published as Pritchard and Wittig
(1982). Since publication, further work has been done on verifying
and checking the papers and references up to 1959 and on extending
the work forward.
It is believed that the bibliography and datafile of references are
almost complete for the period up to 1964. After that period the
papers were collected partly on a random basis and partly because
they referred to S.C. Bradford.
It is a very large network consisting of 829 papers and 4194
references from the papers. The number of paths had grown to 257762
by the time 400 papers had been added, at which point, for reasons
explained in Chapter 6, the calculation of the path length
distribution and the S-I Index had to be ended. The basic data for
the network is presented in Tables A-9 and A-10.
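Counting paths by length does not require listing every path. The number of paths of length k that start at a node is the sum, over the nodes it cites, of their paths of length k - 1. The sketch below assumes that a path is a chain of citation links in an acyclic network and that each node counts as one path of length 0. NETWRK's own counting rules are not given here and evidently differ in detail. For example, at 20 papers Table A-9 gives Sigma E = 3, but Table A-10 lists only 1 path of length 1. The sketch therefore illustrates how quickly paths multiply rather than reproducing the tables. The function name is hypothetical.

```python
from typing import Dict, List


def path_length_counts(refs: Dict[str, List[str]]) -> List[int]:
    """Return totals[k] = number of directed paths made of k citation links.

    refs maps each node to the nodes it cites; the network must be acyclic.
    """
    nodes = set(refs) | {u for cited in refs.values() for u in cited}
    memo: Dict[str, List[int]] = {}

    def from_node(v: str) -> List[int]:
        # counts[k] = number of paths of length k that start at v
        if v not in memo:
            counts = [1]  # the length-0 path consisting of v alone
            for u in refs.get(v, []):
                for k, n_paths in enumerate(from_node(u), start=1):
                    if k == len(counts):
                        counts.append(0)
                    counts[k] += n_paths
            memo[v] = counts
        return memo[v]

    totals: List[int] = []
    for v in nodes:
        for k, n_paths in enumerate(from_node(v)):
            if k == len(totals):
                totals.append(0)
            totals[k] += n_paths
    return totals


# a cites b and c; b cites c: paths a, b, c | a-b, a-c, b-c | a-b-c
print(path_length_counts({"a": ["b", "c"], "b": ["c"]}))  # prints [3, 3, 1]
```

Each node's counts are computed once. The cost therefore grows with the number of links and the longest path length, not with the total number of paths. Python's default recursion limit bounds the length of the longest chain this recursive version can follow.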
TABLE A9 - BIBLIOMETRICS NETWORK - INDEX MEASURES
BIBLIOMETRICS CITATION NETWORK
COUNT  SIGMA E  BETA      GAMMA     S         I         BEST FIT
10     0        0         0         0         0
20     3        0.15      0.015789  1.49816   1.50877   BETA BINOMIAL
30     8        0.266666  0.018390  1.58721   1.79167   BETA BINOMIAL
40     14       0.35      0.017948  1.17398   1.49915   BETA BINOMIAL
50     16       0.32      0.013061  1.3016    1.50877   BETA BINOMIAL
60     21       0.35      0.011864  1.31684   1.53575   BETA BINOMIAL
70     29       0.414285  0.012008  1.22167   1.4453    BETA BINOMIAL
80     45       0.5625    0.014240  1.11879   1.50891   BETA BINOMIAL
90     57       0.633333  0.014232  1.03202   1.54633   BETA BINOMIAL
100    68       0.68      0.013737  1.26822   1.6075    BETA BINOMIAL
110    85       0.772727  0.014178  1.06295   1.6527    BETA BINOMIAL
120    98       0.816666  0.013725  1.02594   1.68368   BETA BINOMIAL
130    116      0.892307  0.013834  0.770179  1.52298   BETA BINOMIAL
140    138      0.985714  0.014182  0.531475  1.62734   BETA BINOMIAL
150    197      1.313333  0.017628  -0.33051  1.21026   BETA BINOMIAL
160    227      1.41875   0.017845  -0.30953  1.18469   BETA BINOMIAL
170    235      1.382352  0.016359  -0.27215  1.20686   BETA BINOMIAL
180    294      1.633333  0.018249  -0.53595  1.0016    BETA BINOMIAL
190    333      1.752631  0.018546  -0.55365  0.991421  BETA BINOMIAL
200    370      1.85      0.018592  -0.74843  0.882323  BETA BINOMIAL
210    381      1.814285  0.017361  -0.76980  0.974845  BETA BINOMIAL
220    401      1.822727  0.016645  -0.73842  0.998632  BETA BINOMIAL
230    444      1.930434  0.016859  -0.78295  0.885029  BETA BINOMIAL
240    478      1.991666  0.016666  -0.87533  0.914475  BETA BINOMIAL
250    549      2.196     0.017638  -0.68557  0.908582  BETA BINOMIAL
260    566      2.176923  0.016810  -0.71563  0.955569  BETA BINOMIAL
270    603      2.233333  0.016604  -0.81691  0.693479  BETA BINOMIAL
280    649      2.317857  0.016615  -0.80302  0.654122  BETA BINOMIAL
290    708      2.441379  0.016895  -0.69495  0.600585  BETA BINOMIAL
300    719      2.396666  0.016031  -0.62820  0.596994  BETA BINOMIAL
310    774      2.496774  0.016160  -0.52969  0.567371  BETA BINOMIAL
320    802      2.50625   0.015713  -0.40475  0.553159  BETA BINOMIAL
330    855      2.590909  0.015750  -0.43063  0.542636  BETA BINOMIAL
340    880      2.588235  0.015269  -0.40666  0.535528  BETA BINOMIAL
350    910      2.6       0.014899  -0.35772  0.535439  BETA BINOMIAL
360    942      2.616666  0.014577  -0.37048  0.530753  BETA BINOMIAL
370    961      2.597297  0.014077  -0.34180  0.529741  BETA BINOMIAL
380    973      2.560526  0.013512  -0.33023  0.531909  BETA BINOMIAL
390    1001     2.566666  0.013196  -0.27999  0.53286   BETA BINOMIAL
400    1152     2.88      0.014436  -0.27218  0.520845  BETA BINOMIAL
410    1175     2.865853  0.014013
420    1206     2.871428  0.013706
430    1270     2.953488  0.013769
440    1294     2.940909  0.013398
450    1304     2.897777  0.012907
460    1350     2.934782  0.012787
470    1379     2.934042  0.012511
480    1414     2.945833  0.012299
490    1428     2.914285  0.011919
500    1454     2.908     0.011655
510    1476     2.894117  0.011371
520    1518     2.919230  0.011249
530    1535     2.896226  0.010949
540    1572     2.911111  0.010801
550    1609     2.925454  0.010657
560    1637     2.923214  0.010458
570    1655     2.903508  0.010205
580    1705     2.939655  0.010154
590    1712     2.901694  0.009852
600    1743     2.905     0.009699
610    1800     2.950819  0.009690
620    1840     2.967741  0.009588
630    1868     2.965079  0.009427
640    1879     2.935937  0.009189
650    1920     2.953846  0.009102
660    1937     2.934848  0.008906
670    1969     2.938805  0.008785
680    1998     2.938235  0.008654
690    2020     2.927536  0.008497
700    2031     2.901428  0.008301
710    2281     3.212676  0.009062
720    2622     3.641666  0.010129
730    2746     3.761643  0.010320
740    2880     3.891891  0.010532
750    2987     3.982666  0.010634
760    3045     4.006578  0.010557
770    3150     4.090909  0.010639
780    3406     4.366066  0.011110
790    3624     4.587341  0.011628
800    3776     4.72      0.011814
810    3995     4.932098  0.012193
820    4086     4.982926  0.012168
829    4194     5.059107  0.012220
TABLE A10 - BIBLIOMETRICS NETWORK - PATH DATA
COUNT  NO. PATHS  0     1     2     3      4      5      6
10     10         10
20     19         17    1     1
30     32         26    1     4
40     45         31    4     8     2
50     57         41    6     8     ?
60     65         46    7     9     3
70     78         52    11    11    4
80     96         55    12    18    9      2
90     114        61    12    ?     13     5
100    129        66    20    22    13     7      1
110    157        71    20    27    21     14     4
120    187        77    24    31    26     20     8      1
130    215        80    28    37    38     ?      8      1
140    283        84    30    36    48     45     27     11
150    562        85    38    63    97     118    92     50
160    623        88    49    69    110    130    102    55
170    641        94    53    71    113    134    103    55
180    952        95    50    103   165    214    176    104
190    1215       99    55    112   185    259    237    162
200    1689       103   53    123   230    345    356    ?
210    1721       110   56    105   196    310    345    294
220    1757       118   58    111   210    312    345    297
230    2585       125   69    141   281    453    526    468
240    2637       127   70    ?     258    417    511    489
250    3565       135   85    187   327    541    665    648
260    4201       139   76    172   314    514    671    ?
270    11181      145   80    212   507    993    1594   2052
280    15552      153   77    222   553    1121   1950   2715
290    22851      153   68    269   683    1453   2704   3943
300    24646      155   71    291   750    1642   3024   4335
310    36861      159   78    326   945    2176   4172   6248
320    49542      163   74    342   1038   2493   4996   7856
330    69002      166   81    357   1154   2858   5953   9891
340    76105      170   77    347   1179   3001   6390   10725
350    95891      177   68    ?     1120   2988   6640   11753
360    106484     187   70    309   1117   3040   6860   12403
370    125772     192   70    333   1270   3527   7968   14393
380    128094     198   74    346   1310   3674   8266   14848
390    141235     201   75    391   1508   4282   9613   17018
400    257762     208   92    416   1759   5195   12350  23876

TABLE A10 (contd)
COUNT  7      8      9      10     11     12     13     14     15     16
10
20
30
40
50
60
70
80
90
100
110
120
130
140    2
150    17     2
160    18     2
170    18     2
180    39     6
190    80     23     3
200    150    49     ?
210    191    87     24     ?
220    192    87     24     ?
230    318    153    45     6
240    360    196    66     10
250    506    304    129    34     4
260    661    490    282    119    32     4
270    2130   1748   1901   482    131    16
280    3023   2666   1814   902    297    55     4
290    4541   4112   2862   1459   498    98     8
300    4903   4369   2995   1501   504    98     8
310    7334   6787   4866   2593   949    208    20
320    9673   9411   7159   4123   1692   451    67     4
330    12907  13363  10935  6904   3209   1016   192    16
340    14142  14794  12248  7851   3721   1206   234    20
350    16471  18442  16531  11710  6361   2524   680    110    9
360    17793  20408  18762  13063  7660   3156   891    153    ?
370    20773  23953  22245  16419  9369   3943   1139   200    ?
380    21239  24385  22515  16536  9401   3947   1139   200    16
390    23901  26945  24434  17617  9830   4053   1151   200    ?
400    37161  46604  47396  38871  25271  12621  4610   1147   173    ?
(? = illegible in the source copy; in rows containing ?, the column
positions of the values that follow are uncertain)
APPENDIX 6
COMPUTER PROGRAMS
The work presented in this thesis could not have been carried out
without the help of two main computer programs. A third program was
written to produce the reference and citation indexes in Pritchard and
Wittig (1982), but this is not dealt with in this Appendix.
BEGAM
The first program (BEGAM) calculates the values of Sigma E, Beta
and Gamma for all nodes in a network and prints every 10th value. It
accepts data in the form:
V,n,E1,E2,E3,...,En
where V is a code for the node, n is the number of links (E), and
E1,...,En are the codes for the links.
In the existing datafiles of nodes and references, the codes are
numeric: the patent numbers for the patents, and a code based upon
the year, month and day of publication for the bibliometrics network.
The program listing is given below:
10 INPUT "FILE NAME: ";F$
20 FILE 1,F$
30 INPUT "NUMBER OF RECORDS: ";N
35 REM ANY NUMBER GREATER THAN THE NUMBER OF NODES
40 E1=0
50 PRINT
60 PRINT
70 PRINT "COUNT","RECORD","SIGMA E","BETA","GAMMA"
80 FOR V= 1 TO N
90 READ 1,M,N1
100 IF END 1, GOTO 200
105 REM ENSURES PRINTING OF LAST VALUE IN FILE
110 E1= E1 + N1
120 B= E1/V
125 REM GAMMA = LINKS PRESENT / MAXIMUM POSSIBLE LINKS V(V-1)/2
130 G= E1/(.5*(V^2-V))
135 IF V/10=INT(V/10) GOTO 140
136 GOTO 150
140 PRINT V,M,E1,B,100*G
145 REM GAMMA MULTIPLIED BY 100
147 REM SKIP THE N1 LINK CODES; ONLY THEIR NUMBER IS NEEDED
150 FOR B=1 TO N1
160 READ 1,L
170 IF END 1, GOTO 200
180 NEXT B
190 NEXT V
200 PRINT V,M,E1,E1/V,100*G
210 END
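For readers without a BASIC system that supports the FILE and READ statements used above, the BEGAM calculation can be re-expressed as a short Python sketch. It assumes each record is held on one line as `V,n,E1,...,En`; the listing reads a sequential file and does not depend on line breaks. It also skips the Gamma calculation at V = 1, where line 130 would divide by zero because V^2 - V = 0. The names `parse_record` and `begam` are hypothetical.

```python
def parse_record(line):
    """Split 'V,n,E1,...,En' into (V, [E1, ..., En])."""
    fields = [f.strip() for f in line.split(",")]
    node, n = fields[0], int(fields[1])
    links = fields[2:2 + n]
    if len(links) != n:
        raise ValueError(f"record {node!r} declares {n} links but lists {len(links)}")
    return node, links


def begam(records, every=10):
    """Yield (count, record, sigma_e, beta, gamma_x100) rows as BEGAM prints them.

    Beta = Sigma E / V and Gamma = Sigma E / (V(V - 1)/2); Gamma is
    multiplied by 100, as in line 140 of the listing.
    """
    sigma_e = 0
    row = None
    for v, (node, links) in enumerate(records, start=1):
        sigma_e += len(links)
        gamma = sigma_e / (0.5 * (v * v - v)) if v > 1 else 0.0
        row = (v, node, sigma_e, sigma_e / v, 100 * gamma)
        if v % every == 0:
            yield row
    if row is not None and row[0] % every != 0:
        yield row  # the last record is printed once, as REM 105 intends


# 74 records with no links, then 6 records citing node 1 (80 records in all).
lines = [f"{v},0" for v in range(1, 75)] + [f"{v},1,1" for v in range(75, 81)]
for row in begam(parse_record(line) for line in lines):
    print(*row)
```

The last printed row, for 80 records and 6 links, gives Beta = 0.075 and Gamma x 100 of about 0.189873. These are the same values as the 80-patent row of Table A1.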
NETWRK
As previously noted, NETWRK was written by a student (Steven
Swanson) at the City of London Polytechnic as part of his project
within the Modular Degree and Diploma Scheme. A computing lecturer
approached the author of this thesis for suggestions of suitable
project work, and the idea of writing a network analysis package,
with particular reference to the need for a modified S-I index, was
approved. In the event, the program grew into a large one which also
provides a shortest path analysis and extensive "Help" facilities. It
also has extensive error checking facilities, which enable it to check
whether nodes exist and whether there are loops in the network.
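NETWRK's error-checking code is not reproduced in this thesis. One standard way to make the two checks described above is as follows. A node "does not exist" if it is cited but has no entry of its own. A loop is found by a depth-first search that reports a loop whenever it reaches a node still on the current search path. This is a minimal sketch, assuming the network is held as a mapping from each node to the nodes it cites. `missing_nodes` and `find_loop` are hypothetical names.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # not yet visited / on the current path / finished


def missing_nodes(refs: Dict[str, List[str]]) -> List[str]:
    """Return cited nodes that have no entry of their own in the network."""
    return sorted({u for cited in refs.values() for u in cited} - set(refs))


def find_loop(refs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one loop [n1, n2, ..., nk] (nk cites n1), or None if acyclic."""
    colour: Dict[str, int] = {}
    parent: Dict[str, str] = {}
    for start in refs:
        if colour.get(start, WHITE) != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(refs.get(start, [])))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:  # every reference of node has been explored
                colour[node] = BLACK
                stack.pop()
            elif colour.get(child, WHITE) == WHITE:
                colour[child] = GREY
                parent[child] = node
                stack.append((child, iter(refs.get(child, []))))
            elif colour[child] == GREY:  # child is on the current path: a loop
                loop = [node]
                while loop[-1] != child:
                    loop.append(parent[loop[-1]])
                return loop[::-1]
    return None


print(find_loop({"a": ["b"], "b": ["c"], "c": ["a"]}))
print(missing_nodes({"a": ["b", "z"], "b": []}))
```

The search uses an explicit stack rather than recursion, so long citation chains do not run into Python's recursion limit.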
The program has 12 main functions together with a number of
options. The options relate to the use of NOTEs within the datafiles
to provide some level of documentation, to the use of EXPERT/NO EXPERT
to suppress or include messages in the functions AA and DA, and to the
use of GRAPH/NO GRAPH to print automatically, or to suppress automatic
printing of, the graph in the SN function.
The 12 functions are:
[1] HELP - provides Help facilities about the main functions and
can also be issued from within a function to provide Help on the
facilities of that function.
[2] NEW filename - 'QUITs' the current function and gets a NEW
filename.
[3] OLD filename - 'QUITs' the current function and gets an OLD
(saved) filename.
[4] QUIT - leaves the program without saving anything.
[5] SAVE filename - saves the current network in 'filename'.
[6] EXIT - 'SAVEs' the network and leaves the program.
[7] PN (Print Network) - has the subfunctions 'ALL' (to print the
whole network) and 'n R1 ... Rn' (to print information on the n
nodes R1 to Rn). The information printed is the node, its references
and the citations to it, so PN will produce a citation index.
[8] AA (Add Arcs) - adds information on nodes and their references in
the form N n R1 ... Rn. If n=0, a lone node N is added (references
can be added to it later if desired). If any of R1 ... Rn do not yet
exist, a message is issued and the option given to add them to the
network.
[9] AN (Add Nodes) - adds lone nodes in the format n N1 ... Nn.
[10] DA (Delete Arcs) - deletes arcs from a node. The format is
N n R1 ... Rn, which deletes the n arcs N-R1, N-R2, ..., N-Rn.
[11] DN (Delete Nodes) - deletes nodes. The format is n N1 ... Nn.
[12] SN (Summarise Network) - the core of the program, with several
subfunctions. SN issued from the general command level automatically
produces a summary of the Complete Distribution of Routes, i.e. the
path lengths from 0 to the diameter of the network with their
associated frequencies. It also gives the total number of paths, the
S and I index values and the area of the S-I plane into which the
network falls (i.e. the best-fitting frequency distribution). Within
SN the following other commands are available:
SUB node. Displays the distribution of a sub-network using
'node' as the starting point.
SHORTF. Displays the shortest path distribution.
SHORT node1 node2. Gives the nodes which make up the shortest
path between node1 and node2.
TABLE. Tabulates the distribution.
GRAPH. Displays the distribution graphically.
SI. Prints the S-I Index values.
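Functions [7] to [11] amount to keeping two lists for each node: the nodes it cites, and the inverse list of nodes that cite it. PN prints both, which is why it yields a citation index. Below is a minimal sketch of such a store. The class and method names are hypothetical, because NETWRK's own data structures are not described here. Where NETWRK asks whether missing references should be added, this sketch adds them and returns their names.

```python
from typing import Dict, List, Optional, Set


class CitationNetwork:
    """Hypothetical store mirroring NETWRK's AN, AA, DA, DN and PN commands."""

    def __init__(self) -> None:
        self.refs: Dict[str, Set[str]] = {}  # node -> nodes it cites

    def add_nodes(self, *nodes: str) -> None:  # AN n N1 ... Nn
        for n in nodes:
            self.refs.setdefault(n, set())

    def add_arcs(self, node: str, *cited: str) -> List[str]:  # AA N n R1 ... Rn
        """Add arcs node -> each cited node; return cited nodes that were new."""
        missing = [r for r in cited if r not in self.refs]
        self.add_nodes(node, *missing)
        self.refs[node].update(cited)
        return missing

    def delete_arcs(self, node: str, *cited: str) -> None:  # DA N n R1 ... Rn
        self.refs[node].difference_update(cited)

    def delete_nodes(self, *nodes: str) -> None:  # DN n N1 ... Nn
        for n in nodes:
            self.refs.pop(n, None)
        for cited in self.refs.values():
            cited.difference_update(nodes)  # also drop arcs into deleted nodes

    def citations_to(self, node: str) -> List[str]:
        """Nodes whose references include node (the inverse list)."""
        return sorted(n for n, cited in self.refs.items() if node in cited)

    def print_network(self, nodes: Optional[List[str]] = None) -> None:  # PN
        """Print each node with its references and the citations to it."""
        for n in sorted(nodes if nodes is not None else self.refs):
            print(n, "cites", sorted(self.refs[n]), "cited by", self.citations_to(n))


net = CitationNetwork()
net.add_arcs("B", "A")  # A did not exist, so it is added as a lone node
net.add_arcs("C", "A", "B")
net.print_network()
```

The inverse list is recomputed on each call for simplicity. A program handling hundreds of nodes repeatedly would more likely keep it up to date as arcs are added and deleted.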
This is a very powerful program and one which should be of use in
many fields of study which involve networks.
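The SHORT and SHORTF commands can be illustrated with breadth-first search. This finds a shortest path when every link counts as length 1. The sketch assumes paths follow citation links from the citing node to the cited node. The function names are hypothetical, and this is not NETWRK's code.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_path(refs: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Nodes on one shortest path from start to goal along citation links, or None."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:  # walk back to start via recorded predecessors
                path.append(step)
                step = previous[step]
            return path[::-1]
        for cited in refs.get(node, []):
            if cited not in previous:
                previous[cited] = node
                queue.append(cited)
    return None


def shortest_distance_counts(refs: Dict[str, List[str]]) -> Dict[int, int]:
    """Frequency of shortest-path lengths over all ordered pairs joined by a path."""
    nodes = set(refs) | {u for cited in refs.values() for u in cited}
    counts: Dict[int, int] = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for cited in refs.get(node, []):
                if cited not in dist:
                    dist[cited] = dist[node] + 1
                    queue.append(cited)
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    return dict(sorted(counts.items()))


refs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(shortest_path(refs, "a", "d"))
print(shortest_distance_counts(refs))
```

Here a reaches d in two steps through either b or c. Breadth-first search returns whichever route it explores first, so SHORT-style output names one shortest path rather than all of them.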
