Spring 2015
Network Centralization
Although centrality is a node-level measure of a network, we may want to be able to say
something about the centralization of the network as a whole. Are there just a few really
important nodes in the network, or are centrality scores evenly distributed, so that every
node is relatively important? There are two possible measures: one is to just compute the
variance of the centrality scores across all nodes in the network, so that the centralization
$C_G$ of a network $G$ is given by:
$$C_G = \frac{\sum_{i=1}^{N} (c_i - \bar{c})^2}{N}$$
where $\bar{c}$ is the mean centrality score.
Centrality
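The other is Freeman's centralization formula:
$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{\max \sum_{i=1}^{N} (c^* - c_i)}$$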
where $c^*$ is the maximum centrality score of any node in the network and the denominator
is the maximum theoretical value that can be obtained by the sum. An advantage of using
Freeman's formula is that it yields a distinct formulation for each centrality measure
being used. In practice, as $C_G$ goes to 0 we can say that every node is about as important
(or equivalently, every node is roughly unimportant), and as $C_G$ goes to 1 there are just a
small number of nodes that hold all of the importance (power) in the complex system.
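As a minimal sketch of the two measures, assume the centrality scores are already available as a plain Python list. The function names and the `max_sum` parameter are illustrative and not part of the notes; the caller has to supply the theoretical maximum of the sum for whichever centrality measure is in use.

```python
def variance_centralization(scores):
    """Variance of the centrality scores across all nodes."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((c - mean) ** 2 for c in scores) / n


def freeman_centralization(scores, max_sum):
    """Freeman's ratio: sum of (c* - c_i) divided by its theoretical maximum.

    max_sum is an assumption of this sketch: it must be supplied by the
    caller for the centrality measure being used.
    """
    c_star = max(scores)
    return sum(c_star - c for c in scores) / max_sum


# Degree scores of a 5-node star (hub first) and of a 5-node ring.
star = [4, 1, 1, 1, 1]
ring = [2, 2, 2, 2, 2]

# For degree centrality on n = 5 nodes the maximum sum is (n-1)(n-2) = 12.
print(freeman_centralization(star, 12))  # 1.0
print(freeman_centralization(ring, 12))  # 0.0
```

Both measures agree that the ring is perfectly even (variance 0, Freeman ratio 0), while only the Freeman ratio is scaled so that the star scores exactly 1.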
Measures of centrality
We will discuss four often used and sometimes abused measures of centrality. It is important, in whatever analysis you are interested in, that you choose a centrality measure
that makes sense for your analysis. Each centrality encodes a different meaning or
interpretation of what "important" is; you thus need to employ the one that best matches
your study (and not the one that yields node centrality scores that are "nicer" or closer
to what you were expecting).
Degree centrality
Degree centrality claims that the importance of a node is only proportional to the number
of neighbors it has. When the network is undirected, degree centrality is simply given as:
$$c_i = \sum_j A_{ij}$$
If the network is directed, we may compute centrality with respect to incoming or outgoing
connections:
$$c_i^{o} = \sum_j A_{ij}, \qquad c_i^{i} = \sum_j A_{ji}$$
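The row and column sums can be sketched directly on a nested-list adjacency matrix. This assumes `A[i][j] = 1` means an edge from node `i` to node `j`, which is the reading implied by the row-sum formula for out-degree; the function names are illustrative.

```python
def out_degree(A):
    """Row sums: c_i^o = sum_j A[i][j]."""
    return [sum(row) for row in A]


def in_degree(A):
    """Column sums: c_i^i = sum_j A[j][i]."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]


# Directed example with edges 0->1, 0->2, 1->2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
print(out_degree(A))  # [2, 1, 0]
print(in_degree(A))   # [0, 1, 2]
```

For an undirected network the matrix is symmetric, so the two functions return the same list and either one gives the degree centrality.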
The class slides take a look at some example networks, and the resulting degree centrality
of each node.
How do we use Freeman's centralization measure under degree centrality? When is the sum
$\sum_{i=1}^{N} (c^* - c_i)$ maximal? This happens when a single node has the largest degree possible
($n-1$, writing $n = N$ for the number of nodes) and all other nodes have the smallest degree possible (1). This is exactly what
happens when we have a star network. Thus
$$\max \sum_{i=1}^{N} (c^* - c_i) = \sum_{i=1}^{N} ((n-1) - c_i) = ((n-1) - 1) + ((n-1) - 1) + \dots + ((n-1) - 1) + 0$$
So the term $(n-2)$ occurs $(n-1)$ times in the summation, and the 0 corresponds
to the hub of the star network. We thus get:
$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{(n-1)(n-2)}$$
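The degree centralization above can be checked numerically. The sketch below assumes an undirected, unweighted network stored as a symmetric nested-list adjacency matrix with at least three nodes; `star_graph` and `ring_graph` are hypothetical helpers used only to build test cases.

```python
def degree_centralization(A):
    """Freeman centralization for degree centrality (undirected graph, n >= 3)."""
    n = len(A)
    degrees = [sum(row) for row in A]
    c_star = max(degrees)
    numerator = sum(c_star - c for c in degrees)
    return numerator / ((n - 1) * (n - 2))


def star_graph(n):
    """Adjacency matrix of a star with node 0 as the hub."""
    A = [[0] * n for _ in range(n)]
    for i in range(1, n):
        A[0][i] = A[i][0] = 1
    return A


def ring_graph(n):
    """Adjacency matrix of a cycle on n nodes."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        j = (i + 1) % n
        A[i][j] = A[j][i] = 1
    return A


print(degree_centralization(star_graph(6)))  # 1.0
print(degree_centralization(ring_graph(6)))  # 0.0
```

The star reaches the maximum of 1 by construction, and the ring, where every node has degree 2, gives 0.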
Closeness centrality
Closeness centrality asserts that a node is important if it is, by some measure, close to
many other nodes in the network. Speaking in network terms, closeness may be defined through
the average path length from a node to all other nodes in the graph. Let $d_{ij}$ be the length
of the shortest path from node $i$ to node $j$ (in an unweighted network, this is the number of
hops; in a weighted network, this is the sum of the weights of the edges in the path). As distances
increase in length, centrality decreases. We therefore define closeness centrality as inversely
proportional to the sum of all shortest-path distances from $i$:
$$c_i = \Big(\sum_{j \neq i} d_{ij}\Big)^{-1} = \big((n-1)\,\bar{d}_i\big)^{-1}$$
where $\bar{d}_i$ is the mean shortest distance over all paths starting from $i$. Note that the maximum
centrality that can be assigned to a node under the closeness criterion is $c_{\max} = (n-1)^{-1}$, attained when it is
exactly distance 1 away from all other nodes in the network.
For Freeman's formula, the sum $\sum_{i} (c^* - c_i)$ is maximal when one node has
the largest possible centrality ($c^* = \frac{1}{n-1}$) and all the others have the smallest. Assuming
that every edge has length 1, this is still given by the star network. Note
that if the node is not the central one, its mean shortest distance to all other nodes is:
$$\bar{d}_i = \frac{1 + 2 + \dots + 2}{n-1} = \frac{2(n-2) + 1}{n-1} = \frac{2n-3}{n-1}$$
thus the centrality of these nodes is
$$c_i = \big((n-1)\,\bar{d}_i\big)^{-1} = (2n-3)^{-1}$$
So the denominator of the centralization term for the star graph is given by:
$$\sum_i (c^* - c_i) = 0 + (n-1)\Big(\frac{1}{n-1} - \frac{1}{2n-3}\Big) = (n-1)\,\frac{n-2}{(2n-3)(n-1)} = \frac{n-2}{2n-3}$$
So we get
$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{(n-2)/(2n-3)}$$
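A small sketch of closeness centrality and its centralization, assuming a connected, unweighted, undirected network stored as an adjacency-list dictionary. Distances are found with breadth-first search; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj):
    """c_i = 1 / sum_j d_ij, assuming the graph is connected."""
    return {i: 1.0 / sum(bfs_distances(adj, i).values()) for i in adj}


def closeness_centralization(adj):
    """Freeman centralization with the star-graph denominator (n-2)/(2n-3)."""
    n = len(adj)
    c = closeness(adj)
    c_star = max(c.values())
    numerator = sum(c_star - ci for ci in c.values())
    return numerator / ((n - 2) / (2 * n - 3))


# Star on 5 nodes: node 0 is the hub.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
scores = closeness(star)
print(scores[0])  # 0.25, i.e. 1/(n-1)
```

For this star the leaves score 1/(2n-3) = 1/7, and the centralization evaluates to 1 up to floating-point rounding.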
Betweenness centrality
Betweenness centrality represents the intuitive notion of being central. It posits that a
node is more important if it is included in a large number of shortest paths in the network.
The intuition is that if a node participating in many shortest paths were to be removed
or to alter its behavior, it may widely degrade the efficiency of the network and
its ability to transfer energy from one node to another.
Let $g_{jk}$ be the total number of shortest paths between nodes $j$ and $k$ and $g_{jk}(i)$ be the
number of such paths that include node $i$. The betweenness centrality of node $i$ is given by:
$$c_i = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
where the $<$ in the summation ensures that we do not consider the same pair of nodes twice. Another
interpretation of $c_i$ is, if we assume that energy is passed between nodes in the most efficient
way possible, that $c_i$ is proportional to the probability that node $i$ will be involved in the transmission of energy
between any two nodes.
What is the range of values we may assign to nodes under betweenness centrality? For sure
we will have $c_i = 0$ if $i$ participates in no shortest paths between nodes. If $i$ were to lie on
every shortest path between all other nodes, and there are $\binom{n-1}{2}$ such pairs of nodes, we have the
upper bound $c_i \le \binom{n-1}{2} = (n-1)(n-2)/2$.
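In a star network the hub attains this bound and every other node has $c_i = 0$, so the sum in Freeman's formula is maximal:
$$\max \sum_{i=1}^{N} (c^* - c_i) = (n-1)\binom{n-1}{2} = \frac{(n-1)^2 (n-2)}{2}$$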
So we get
$$C_G = \frac{2 \sum_{i=1}^{N} (c^* - c_i)}{(n-1)^2 (n-2)}$$
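The definition can be sketched by brute force for small networks. This assumes an undirected, unweighted graph stored as an adjacency-list dictionary with sortable node labels, and it counts a node only when it is an interior node of a path (not an endpoint), which matches the upper bound above. It uses the fact that node $i$ lies on a shortest path from $j$ to $k$ exactly when $d_{ji} + d_{ik} = d_{jk}$. The function names are illustrative, and this is not the faster algorithm that network libraries typically use.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness(adj):
    """c_i = sum over pairs j < k of g_jk(i) / g_jk, with i != j, k."""
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    c = {i: 0.0 for i in nodes}
    for a, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[a + 1:]:
            if k not in dist_j:
                continue  # j and k are not connected
            for i in nodes:
                if i == j or i == k or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    c[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return c


def betweenness_centralization(adj):
    n = len(adj)
    c = betweenness(adj)
    c_star = max(c.values())
    total = sum(c_star - ci for ci in c.values())
    return 2 * total / ((n - 1) ** 2 * (n - 2))


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(betweenness(star)[0])              # 6.0, i.e. (n-1)(n-2)/2
print(betweenness_centralization(star))  # 1.0
```

On a 4-node ring every pair of opposite nodes has two shortest paths, so each node receives a score of 0.5 and the centralization is 0.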
Centrality relationships
It is worth noting that for most networks, the above three centrality measures will generally
be correlated with each other. This is somewhat intuitive; if a node is close to many others
it is likely to be on many shortest paths, and a node needs to have many neighbors to increase
the likelihood of it having high closeness and betweenness centrality. So when these centrality
measures are not correlated, something interesting may be going on. Consider the following
cases where the centrality scores a node exhibits are:
High degree, low closeness: This situation may arise when the node is embedded
in a cluster that is far away from the rest of the network.
High degree, low betweenness: In this case, the node has a large number of redundant
connections, thus reducing the chance for a transmission to include it.
High closeness, low degree: This is a very significant node in the network. Although
it is connected to few others, it can easily reach and impact many nodes.
High closeness, low betweenness: This may occur when there are multiple paths in
a network to reach most other nodes. So in this case, the node is close to many others,
but so are many other nodes. In this sense, the node is actually not that interesting.
High betweenness, low degree: The node has a small number of ties, which are
critical to maintaining connectivity in the network. These nodes represent dangerous
structural weaknesses.
High betweenness, low closeness: This is kind of like a gateway node: it monopolizes
the ties from a small number of nodes to connect to many others in the network.
Eigenvector centrality
Eigenvector centrality is an extension of degree centrality. The idea is that degree centrality
says that every neighbor counts equally towards a node's importance. But in reality not all
neighbors are equivalent. Eigenvector centrality says, let us instead give more importance to
nodes that are themselves neighbors of important nodes. Even though this sounds recursive,
let's just see what the math looks like.
We start with some initial guess about the centrality of every node $i$ as $x_i$. For example, we
could just assume that $x_i = 1$ for all $i$ to start. This initial assignment is not very good,
so let's refine it by redefining it as the sum of the centralities of a node's neighbors:
$$x_i' = \sum_j A_{ij} x_j$$
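In matrix form this update is $x' = Ax$, so after $t$ updates $x(t) = A^t x(0)$. Writing the initial guess as a linear combination of the eigenvectors $v_i$ of $A$, $x(0) = \sum_i c_i v_i$ (here the $c_i$ are expansion coefficients, not centrality scores), we get
$$x(t) = A^t \sum_i c_i v_i = \sum_i c_i k_i^t v_i = k_1^t \sum_i c_i \Big(\frac{k_i}{k_1}\Big)^t v_i$$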
where $k_1$ is the largest of all the eigenvalues $k_i$ of $A$. We factor out $k_1^t$ from the summation so
that the fraction $k_i/k_1$ is less than 1 for every $i \neq 1$; hence in the limit as $t \to \infty$ every term of the summation
except the first decays to zero exponentially fast. This leaves us with:
$$x(t) \to c_1 k_1^t v_1$$
Thus, the vector of centralities if we iterate (i.e. continue to refine the estimate for an
infinitely long time) is simply some proportion of the leading eigenvector of A. Equivalently,
we can say that the vector of centralities x satisfies
$$Ax = k_1 x$$
which is an eigenvector problem.
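The repeated refinement can be sketched as a power iteration. Two assumptions are added here that the derivation does not state: the vector is rescaled to sum to 1 after every step (only the direction matters, and this keeps the numbers from growing like $k_1^t$), and the graph is connected and not bipartite, so that the iteration settles rather than oscillates. The function name and the iteration count are illustrative.

```python
def eigenvector_centrality(A, iterations=200):
    """Repeat x <- A x from x = (1, ..., 1), rescaling so entries sum to 1."""
    n = len(A)
    x = [1.0] * n
    for _ in range(iterations):
        new_x = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(new_x)
        x = [v / total for v in new_x]
    return x


# Triangle 0-1-2 with an extra node 3 attached to node 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(A)

# At convergence (A x)_i / x_i is the same for every i: it estimates k_1.
Ax = [sum(A[i][j] * scores[j] for j in range(4)) for i in range(4)]
k1_estimates = [Ax[i] / scores[i] for i in range(4)]
```

In this example node 0 ranks highest, nodes 1 and 2 tie, and node 3 ranks lowest even though its only neighbor is the most central node.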
PageRank (EV centrality for directed networks)
In theory, we can use eigenvector centrality for undirected or directed networks, but the
process is much simpler in the undirected case. In the directed situation, $A$ is no longer
symmetric, and hence it has both a leading left and a leading right eigenvector that we may choose from.
Besides this choice, there are some pathological problems that you can run into (see Figure
7.1 in the text). An extension of eigenvector centrality called Katz centrality was thus
developed for directed networks. This, too, had some problems, and Katz centrality was
eventually superseded by PageRank.
PageRank defines centrality as follows:
$$x_i = \alpha \sum_j A_{ij} \frac{x_j}{k_j^{o}} + \beta$$
where $k_j^{o}$ is the out-degree of node $j$ (if the out-degree is 0 we artificially set it to 1, because no
one will count this neighbor in their centrality calculation anyway), $\alpha$ scales the contribution of the
neighbors, and $\beta$ is a constant baseline score given to every node. This expression says
the degree to which you are an important neighbor to me is proportional to your centrality
divided by your out-degree. Thus, the more links you point to (i.e. the more connected in
the network you are) the less your endorsement of any one of them matters. A link to your webpage is not
very important (e.g. it is not important if Google links to your webpage) when the linking page
already links to a large number of others.
In matrix form, we get
$$x = \alpha A D^{-1} x + \beta \mathbf{1}$$
where $D$ is a diagonal matrix with $D_{ii} = \max(k_i^{o}, 1)$ and $\mathbf{1}$ is a column vector of 1s of
appropriate dimension. Setting $\beta = 1$ and rearranging terms, we find
$$x = D (D - \alpha A)^{-1} \mathbf{1}$$
In practice, the Google search
engine uses $\alpha = 0.85$ and calls it the damping factor of PageRank.
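As a rough sketch, the same scores can be approximated without a matrix inverse by iterating the damped update $x_i = \alpha \sum_{j \to i} x_j / k_j^{o} + 1$, where the sum runs over the nodes $j$ that link to $i$ and `alpha` plays the role of the damping factor. This is the fixed-point form that the closed-form expression solves when the constant term is set to 1. The input format (a dictionary mapping each node to the list of nodes it links to), the function name, and the iteration count are assumptions of this sketch, not something taken from the notes.

```python
def pagerank(links, alpha=0.85, iterations=200):
    """Iterate x_i = alpha * sum_{j -> i} x_j / k_j_out + 1.

    links[j] lists the nodes that j points to; every target must also
    appear as a key of links.
    """
    nodes = list(links)
    x = {i: 1.0 for i in nodes}
    for _ in range(iterations):
        new_x = {i: 1.0 for i in nodes}  # the constant term
        for j in nodes:
            out = links[j]
            if not out:
                continue  # no out-links: j contributes to nobody
            share = alpha * x[j] / len(out)
            for i in out:
                new_x[i] += share
        x = new_x
    return x


# Hypothetical four-page web: a links to b and c, b to c, c to a, d to c.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
```

Page `d` has no incoming links, so it keeps only the baseline score of 1, while `c`, which collects links from three pages, ends up with the highest score. Because `a` splits its score between two targets, its link to `b` is worth half as much as `c`'s single link to `a`.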