CEG 7900: Network Science

Spring 2015

Network Mathematics #4: Importance Measures


Date: February 17th 2015
How important is a given node in a network? Importance can mean a lot of things, depending
on your context. An important node in a network may be one that is capable of reaching
other network nodes, so that it can transfer energy or make contact with others in the
system. A node may also be important if it carries a strong degree of flow - that is, if the
values of relationships connected to it are very high (so that it acts as a strong conduit for
the passage of information). Nodes may be important if they are vital to maintaining network
connectivity, so that removing the node suddenly leaves the network disconnected or heavily
fragmented. Importance may even be measured recursively: a node is important if it is
connected to many other nodes that are also important. For example, people who work in
the White House or serve as senior aides to the President are powerful not because of
their job title but because they have a direct and strong relationship with him.
A common goal of network analysis is to identify the most critical, essential, important,
or central nodes within a complex system. There are many methods for measuring node
importance, each of which satisfies a different meaning of the term. These measures of
centrality are node-level measures: each node $i$ is assigned a centrality score $c_i$. We often
want to work with the relative importance of individual nodes in the network, so we are
almost always interested in standardizing each $c_i$ by the maximum value in the network:
$$c_i' = \frac{c_i}{\max_j c_j}$$
so that $c_i' = 1$ is always assigned to the most important node in the network (with importance
defined by the centrality measure we use), values close to 1 are given to nodes that are almost
as important, and values close or equal to 0 go to nodes that are virtually unimportant. If
centrality is used to support other kinds of analysis (where we are not interested in comparing
the importance of different nodes in the network) we will use the raw computed score $c_i$.
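
As a minimal sketch of the standardization above, the snippet below rescales a list of raw scores by their maximum. The function name `normalize`, the sample scores, and the choice to return all zeros when every score is zero are illustrative assumptions, not part of the notes.

```python
def normalize(scores):
    """Rescale centrality scores so that the largest one becomes 1."""
    top = max(scores)
    if top == 0:
        # Assumed convention: if every node scores 0, leave all scores at 0.
        return [0.0 for _ in scores]
    return [c / top for c in scores]


raw = [3, 1, 1, 1]         # hypothetical raw scores c_i
relative = normalize(raw)  # relative importance; the top node gets 1.0
```

Only the ratios between scores survive the rescaling, which is why the raw scores are kept when centrality feeds into other computations.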

Network Centralization
Although centrality is a node-level measure of a network, we may want to be able to say
something about the centralization of the network as a whole. Are there just a few really
important nodes in the network, or are centrality scores evenly distributed, so that every
node is relatively important? There are two possible measures. The first is simply the
variance of the centrality scores across all nodes in the network, so that the centralization
$C_G$ of a network $G$ is given by:
$$C_G = \frac{\sum_{i=1}^{N} (c_i - \bar{c})^2}{N}$$
where $N = |V|$ and $\bar{c}$ is the mean centrality score. An alternative measure is Freeman's
general formula:


$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{\max \sum_{i=1}^{N} (c^* - c_i)}$$
where $c^*$ is the maximum centrality score of any node in the network and the denominator
is the maximum theoretical value that the sum can attain over all networks with $N$ nodes. An
advantage of using Freeman's formula is that it yields a specific formulation for each centrality
measure being used. In practice, as $C_G$ goes to 0 every node is about equally important
(or equivalently, equally unimportant), and as $C_G$ goes to 1 just a small number of nodes
hold all of the importance (power) in the complex system.
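
The two centralization measures can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, and the theoretical maximum of Freeman's numerator is passed in by the caller as `max_sum`, since it depends on the centrality measure. The example uses degree scores for 4 nodes, for which the degree centrality section derives a maximum of $(n-1)(n-2) = 6$.

```python
def variance_centralization(scores):
    """Variance of the centrality scores (the first measure)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((c - mean) ** 2 for c in scores) / n


def freeman_centralization(scores, max_sum):
    """Freeman's ratio; max_sum is the theoretical maximum of the numerator."""
    c_star = max(scores)
    return sum(c_star - c for c in scores) / max_sum


# Hypothetical degree scores for a 4-node star and a 4-node ring.
star = [3, 1, 1, 1]
ring = [2, 2, 2, 2]
# (n - 1)(n - 2) = 6 is the degree-centrality maximum for n = 4 nodes.
star_score = freeman_centralization(star, max_sum=6)
ring_score = freeman_centralization(ring, max_sum=6)
```

The star reaches the maximum centralization and the ring, where every node has the same score, reaches the minimum.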

Measures of centrality
We will discuss four often used and sometimes abused measures of centrality. Whatever analysis
you are interested in, it is important that you choose a centrality measure that makes sense
for it. Each centrality encodes a different meaning or interpretation of what "important" is;
you thus need to employ the one that best matches your study (and not the one that yields
nicer node centrality scores or the scores you were expecting).

Degree centrality
Degree centrality claims that the importance of a node depends only on the number
of neighbors it has. When the network is undirected, degree centrality is simply given as:
$$c_i = \sum_j A_{ij}$$
where $A$ is the adjacency matrix of the network. If the network is directed, we may compute
centrality with respect to outgoing or incoming connections:
$$c_i^{o} = \sum_j A_{ij} \qquad c_i^{i} = \sum_j A_{ji}$$

The class slides take a look at some example networks, and the resulting degree centrality
of each node.
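
The row and column sums above can be sketched as follows. The adjacency matrix is assumed to be a list of lists with `A[i][j] = 1` for an edge from node `i` to node `j` (the convention implied by the out-degree formula in these notes); the function names and the three-node example are illustrative.

```python
def out_degree_centrality(A):
    """c_i^o = sum_j A[i][j]: row sums of the adjacency matrix."""
    return [sum(row) for row in A]


def in_degree_centrality(A):
    """c_i^i = sum_j A[j][i]: column sums of the adjacency matrix."""
    n = len(A)
    return [sum(A[j][i] for j in range(n)) for i in range(n)]


# Hypothetical directed network with edges 0->1, 0->2, 1->2.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
out_scores = out_degree_centrality(A)
in_scores = in_degree_centrality(A)
```

For an undirected network the matrix is symmetric, so both functions return the same scores and either one gives the degree centrality.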
How do we use Freeman's centralization measure under degree centrality? When is the sum
$\sum_{i=1}^{N} (c^* - c_i)$ maximal? This happens when a single node has the largest degree
possible ($n - 1$, writing $n = N$ for the number of nodes) and all other nodes have the
smallest degree possible (1). This is exactly what happens when we have a star network. Thus
$$\max \sum_{i=1}^{N} (c^* - c_i) = \sum_{i} \big((n-1) - c_i\big) = ((n-1) - 1) + ((n-1) - 1) + \dots + ((n-1) - 1) + 0$$

So we have the term $(n - 2)$ occurring $(n - 1)$ times in the summation, and the 0 corresponds
to the hub of the star network. We thus get:
$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{(n-1)(n-2)}$$
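
A quick numerical check of the degree centralization formula, as a sketch: the helper names `star_adjacency`, `ring_adjacency`, and `degree_centralization` are made up for illustration, and the networks are assumed to be undirected with more than two nodes.

```python
def star_adjacency(n):
    """Adjacency matrix of a star: node 0 is the hub, nodes 1..n-1 are leaves."""
    A = [[0] * n for _ in range(n)]
    for leaf in range(1, n):
        A[0][leaf] = A[leaf][0] = 1
    return A


def ring_adjacency(n):
    """Adjacency matrix of a ring: node i is tied to i-1 and i+1 (mod n)."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = A[(i + 1) % n][i] = 1
    return A


def degree_centralization(A):
    """Freeman centralization under degree centrality (requires n > 2)."""
    n = len(A)
    degrees = [sum(row) for row in A]
    c_star = max(degrees)
    return sum(c_star - c for c in degrees) / ((n - 1) * (n - 2))


star_value = degree_centralization(star_adjacency(6))
ring_value = degree_centralization(ring_adjacency(6))
```

For the 6-node star the numerator is $5 \times 4 = 20$, which equals $(n-1)(n-2)$, so the star is fully centralized while the ring is not centralized at all.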

Closeness centrality
Closeness centrality asserts that a node is important if it is, by some measure, close to
many other nodes in the network. In network terms, closeness may be defined through
the average path length from a node to all other nodes in the graph. Let $d_{ij}$ be the length
of the shortest path from node $i$ to node $j$ (in an unweighted network, this is the number of
hops; in a weighted network, it is the sum of the edge weights along the path). As distances
increase in length, centrality decreases. We therefore define closeness centrality as inversely
proportional to the sum of all shortest-path distances from $i$:
$$c_i = \Big(\sum_{j \neq i} d_{ij}\Big)^{-1} = \big((n-1)\,\bar{d}_i\big)^{-1}$$

where $\bar{d}_i$ is the mean shortest-path distance from $i$ to the other $n - 1$ nodes. Note that
the maximum centrality that can be assigned to a node under the closeness criterion is
$c_{\max} = (n-1)^{-1}$, attained when it is exactly distance 1 away from all other nodes in the network.
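
The definition can be sketched with a breadth-first search, assuming an unweighted, undirected, connected network stored as a dictionary of neighbor lists. The names `bfs_distances` and `closeness_centrality` and the three-node path are illustrative; nodes that cannot be reached are simply left out of the sum, which is an assumption the notes do not address.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(adj):
    """c_i = 1 / (sum of shortest-path distances from i)."""
    scores = {}
    for i in adj:
        total = sum(d for j, d in bfs_distances(adj, i).items() if j != i)
        scores[i] = 1 / total if total > 0 else 0.0
    return scores


# Hypothetical path network a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
closeness = closeness_centrality(path)
```

The middle node `b` is one hop from both others and reaches the maximum $(n-1)^{-1} = 1/2$; the end nodes score $1/3$.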
How do we centralize closeness centrality? Now $\sum_{i=1}^{N} (c^* - c_i)$ is maximal when one
node has the largest possible centrality ($c^* = (n-1)^{-1}$) and all the others have the smallest.
Assuming that every edge has length 1, this is still given by the star network. Note
that if the node is not the central one, its mean shortest distance to all other nodes is:
$$\bar{d}_i = \frac{1 + 2 + \dots + 2}{n-1} = \frac{2(n-2) + 1}{n-1} = \frac{2n-3}{n-1}$$
thus the centrality of these nodes is
$$c_i = \big((n-1)\,\bar{d}_i\big)^{-1} = (2n-3)^{-1}$$
So the denominator of the centralization term for the star graph is given by:
$$\sum_i (c^* - c_i) = 0 + (n-1)\left(\frac{1}{n-1} - \frac{1}{2n-3}\right) = (n-1)\,\frac{n-2}{(2n-3)(n-1)} = \frac{n-2}{2n-3}$$

So we get
$$C_G = \frac{\sum_{i=1}^{N} (c^* - c_i)}{(n-2)/(2n-3)}$$
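
The closed form for the star can be checked with exact rational arithmetic. This sketch hard-codes the star's distances (the hub is 1 hop from every leaf; a leaf is 1 hop from the hub and 2 hops from every other leaf) instead of searching a graph, and the function names are illustrative.

```python
from fractions import Fraction


def star_closeness_scores(n):
    """Closeness scores 1 / sum_j d_ij for a star with n nodes, hub first."""
    hub = Fraction(1, n - 1)             # hub: n - 1 leaves at distance 1
    leaf = Fraction(1, 1 + 2 * (n - 2))  # leaf: hub at 1, other leaves at 2
    return [hub] + [leaf] * (n - 1)


def freeman_sum(scores):
    """Numerator of Freeman's formula: sum of (c* - c_i)."""
    c_star = max(scores)
    return sum(c_star - c for c in scores)


n = 5
denominator = freeman_sum(star_closeness_scores(n))  # expected (n-2)/(2n-3)
```

For $n = 5$ the hub scores $1/4$ and each leaf $1/7$, so the sum is $4(1/4 - 1/7) = 3/7$, matching $(n-2)/(2n-3)$.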

Betweenness centrality
Betweenness centrality represents the intuitive notion of being central. It posits that a
node is more important if it is included in a large number of shortest paths in the network.
The intuition is that if a node participating in many shortest paths were removed,
eliminated, or altered its behavior, it could widely degrade the efficiency of the network and
its ability to transfer energy from one node to another.
Let $g_{jk}$ be the total number of shortest paths between nodes $j$ and $k$, and $g_{jk}(i)$ be the
number of such paths that include node $i$. The betweenness centrality of node $i$ is given by:
$$c_i = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}}$$
where the $<$ in the summation ensures that we do not consider the same pair of nodes twice.
Another interpretation: if we assume that energy is passed between nodes in the most efficient
way possible, then $c_i$ is proportional to the probability that node $i$ will be involved in the
transmission of energy between two randomly chosen nodes.
What is the range of values we may assign to nodes under betweenness centrality? For sure
we will have $c_i = 0$ if $i$ participates in no shortest paths between other nodes. If $i$ were
to lie on every shortest path between all other nodes, and there are $\binom{n-1}{2}$ such pairs
of nodes, we have the upper bound $c_i \le \binom{n-1}{2} = (n-1)(n-2)/2$.
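
The definition and its bounds can be sketched by counting shortest paths directly. This is a brute-force illustration, not an optimized method such as Brandes' algorithm: it assumes an unweighted, undirected network stored as a dictionary of neighbor lists, counts node $i$ only as an intermediate node (never as an endpoint), and uses hypothetical function names.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every shortest path to u extends to v
    return dist, sigma


def betweenness_centrality(adj):
    """c_i = sum over pairs j<k of g_jk(i) / g_jk, with i not an endpoint."""
    info = {s: bfs_counts(adj, s) for s in adj}
    scores = {i: 0.0 for i in adj}
    for j, k in combinations(list(adj), 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are disconnected
        for i in adj:
            if i == j or i == k:
                continue
            dist_i, sigma_i = info[i]
            on_path = i in dist_j and k in dist_i
            if on_path and dist_j[i] + dist_i[k] == dist_j[k]:
                scores[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return scores


# Hypothetical examples: a 5-node star and a 4-node cycle.
star = {"hub": [1, 2, 3, 4], 1: ["hub"], 2: ["hub"], 3: ["hub"], 4: ["hub"]}
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
star_scores = betweenness_centrality(star)
square_scores = betweenness_centrality(square)
```

The hub of the star lies on the single shortest path between each of the $\binom{4}{2} = 6$ leaf pairs, so it attains the upper bound, while in the cycle each node carries half of the two shortest paths between its two neighbors.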
Let's centralize betweenness centrality. As before, the structure that maximizes
$\sum_{i=1}^{N} (c^* - c_i)$ is a star network. The hub of the star is a node participating in the
maximum number of shortest paths ($c^* = (n-1)(n-2)/2$) and all other nodes lie on no shortest
paths between other nodes (so $c_i = 0$). We thus get:


$$\sum_{i=1}^{N} (c^* - c_i) = \sum_i \left[\binom{n-1}{2} - c_i\right] = (n-1)\binom{n-1}{2}$$
So we get
$$C_G = \frac{2 \sum_{i=1}^{N} (c^* - c_i)}{(n-1)^2 (n-2)}$$

Centrality relationships
It is worth noting that for most networks, the above three centrality measures will generally
be correlated with each other. This is somewhat intuitive: if a node is close to many others
it is likely to be on many shortest paths, and a node needs to have many neighbors to increase
the likelihood of it having high closeness and betweenness centrality. So when these centrality
measures are not correlated, something interesting may be going on. Consider the following
cases where the centrality scores a node exhibits are:

High degree, low closeness: This situation may arise when the node is embedded
in a cluster that is far away from the rest of the network.
High degree, low betweenness: In this case, the node has a large number of redundant
connections, reducing the chance that a transmission includes it.
High closeness, low degree: This is a very significant node in the network. Although
it is connected to few others, it can easily reach and impact many nodes.
High closeness, low betweenness: This may occur when there are multiple paths in
a network to reach most other nodes. So in this case, the node is close to many others,
but so are many other nodes. In this sense, the node is actually not that interesting.
High betweenness, low degree: The node has a small number of ties, which are
critical to maintaining connectivity in the network. These nodes represent dangerous
structural weaknesses.
High betweenness, low closeness: This is kind of like a gateway node: it monopolizes
the ties from a small number of nodes to connect to many others in the network.

Eigenvector centrality
Eigenvector centrality is an extension of degree centrality. Degree centrality
says that every neighbor counts equally towards a node's importance. But in reality not all
neighbors are equivalent. Eigenvector centrality instead gives more importance to
nodes that are themselves neighbors of important nodes. Even though this sounds recursive,
let's just see what the math looks like.
We start with some initial guess about the centrality of every node $i$ as $x_i$. For example, we
could just assume that $x_i = 1$ for all $i$ to start. This initial assignment is not very good,
so let's refine it by redefining each score as the sum of the centralities of the node's neighbors:
$$x_i' = \sum_j A_{ij} x_j$$

In matrix notation, this may be written as:
$$\mathbf{x}' = A\mathbf{x}$$
where $\mathbf{x}$ is a vector with elements $x_i$. If we keep repeating this process, we iteratively
make progress towards better and better assignments. At the $t$-th iteration, we have
$$\mathbf{x}(t) = A^t \mathbf{x}$$
Here is the trick: let us write $\mathbf{x}$ (the initial assignments) as a linear combination of the
eigenvectors $\mathbf{v}_i$ of $A$ (remember, since $A$ is symmetric, its $n$ eigenvectors are linearly
independent and form a basis for $\mathbb{R}^n$), so that $\mathbf{x} = \sum_i c_i \mathbf{v}_i$ for some
constants $c_i$. We can rewrite the expression for $\mathbf{x}(t)$ as:
$$\mathbf{x}(t) = A^t \sum_i c_i \mathbf{v}_i = \sum_i c_i k_i^t \mathbf{v}_i = k_1^t \sum_i c_i \left(\frac{k_i}{k_1}\right)^t \mathbf{v}_i$$

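where $k_1$ is the largest of the eigenvalues $k_i$ of $A$. We factor out $k_1^t$ from the summation
so that, provided $k_1$ strictly dominates the other eigenvalues in magnitude, the fraction
$k_i/k_1$ is less than 1 in magnitude for every $i \neq 1$. Hence in the limit as $t \to \infty$ every
term of the summation except the first decays to zero exponentially fast. This leaves us with:
$$\mathbf{x}(t) \to c_1 k_1^t \mathbf{v}_1$$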
Thus, if we keep iterating (i.e. continue to refine the estimate for an infinitely long time),
the vector of centralities is simply proportional to the leading eigenvector of $A$. Equivalently,
we can say that the vector of centralities $\mathbf{x}$ satisfies
$$A\mathbf{x} = k_1 \mathbf{x}$$
which is an eigenvector problem.
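
The repeated refinement can be sketched as a power iteration. Several choices here are assumptions rather than part of the notes: the scores are rescaled by their maximum after every step (this only removes the growing $k_1^t$ factor), the iteration count is fixed at 100, and the example network is connected and contains a triangle, because on bipartite networks such as a star this simple iteration can oscillate instead of converging.

```python
def eigenvector_centrality(A, iterations=100):
    """Power iteration x <- A x, rescaled so the largest score is 1."""
    n = len(A)
    x = [1.0] * n  # initial guess: every node equally central
    for _ in range(iterations):
        x_new = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(x_new)
        x = [value / top for value in x_new]
    return x


# Hypothetical network: triangle 0-1-2 with an extra node 3 attached to node 0.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(A)
```

Node 3 and nodes 1 and 2 are all neighbors of node 0, but nodes 1 and 2 also have each other, so they end up with higher scores than node 3 even though degree alone would already rank them that way; the scores also satisfy $A\mathbf{x} = k_1\mathbf{x}$ up to rounding.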

PageRank (eigenvector centrality for directed networks)
In theory, we can use eigenvector centrality for undirected or directed networks, but the
process is much simpler in the undirected case. In the directed situation, A is no longer
symmetric, and hence it has leading left and right eigenvectors that we must choose between.
Besides this choice, there are some pathological problems that you can run into (see Figure
7.1 in the text). An extension of eigenvector centrality called Katz centrality was thus
developed for directed networks. This, too, had some problems, and Katz centrality was
eventually superseded by PageRank.
PageRank defines centrality as follows:
$$x_i = \alpha \sum_j A_{ij} \frac{x_j}{k_j^{o}} + \beta$$
where $k_j^{o}$ is the out-degree of node $j$ (if the out-degree is 0 we artificially set it to 1,
because no one will count this neighbor in their centrality calculation anyway), and $\alpha$ and
$\beta$ are positive constants. Here $A_{ij} = 1$ when node $j$ links to node $i$, so that centrality
flows along the direction of the links. This expression says
the degree to which you are an important neighbor to me is proportional to your centrality
divided by your out-degree. Thus, the more pages you link to, the less each of your links
matters: a link from Google to your webpage is worth little if Google already links to a large
number of other pages.
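In matrix form, we get
$$\mathbf{x} = \alpha A D^{-1} \mathbf{x} + \beta \mathbf{1}$$
where $D$ is a diagonal matrix with $D_{ii} = \max(k_i^{o}, 1)$ and $\mathbf{1}$ is a column vector of 1s
of appropriate dimension. Rearranging terms and setting $\beta = 1$ (it only rescales all scores),
we find
$$\mathbf{x} = D(D - \alpha A)^{-1} \mathbf{1}$$
In practice, the Google search engine uses $\alpha = 0.85$ and calls it the damping factor of PageRank.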
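
A minimal sketch of the PageRank equation, iterating $x_i = \alpha \sum_j A_{ij} x_j / k_j^{o} + \beta$ until the scores settle rather than forming the matrix inverse. The assumptions are: `A[i][j] = 1` means page `j` links to page `i`, $\beta = 1$, a fixed count of 200 iterations, and a made-up three-page web; the function name is hypothetical.

```python
def pagerank(A, alpha=0.85, beta=1.0, iterations=200):
    """Fixed-point iteration of x = alpha * A * D^{-1} * x + beta * 1."""
    n = len(A)
    # Out-degree of page j is the j-th column sum; zero is replaced by 1.
    out_deg = [max(sum(A[i][j] for i in range(n)), 1) for j in range(n)]
    x = [1.0] * n
    for _ in range(iterations):
        x = [
            alpha * sum(A[i][j] * x[j] / out_deg[j] for j in range(n)) + beta
            for i in range(n)
        ]
    return x


# Hypothetical web: page 0 -> page 2, page 1 -> page 2, page 2 -> page 0.
A = [
    [0, 0, 1],  # page 0 is linked to by page 2
    [0, 0, 0],  # page 1 has no incoming links
    [1, 1, 0],  # page 2 is linked to by pages 0 and 1
]
ranks = pagerank(A)
```

Page 1 has no incoming links and keeps only the baseline score $\beta = 1$, while page 2, which collects links from both other pages, ranks highest. Because $\alpha < 1$, each iteration shrinks the remaining error, so the loop approaches the same solution as the closed form.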
