Vous êtes sur la page 1sur 147

Social Network Analysis

American Sociological Association


San Francisco, August 2004

James Moody
Introduction

We live in a connected world:

“To speak of social life is to speak of the association between people –


their associating in work and in play, in love and in war, to trade or to
worship, to help or to hinder. It is in the social relations men establish that
their interests find expression and their desires become realized.”
Peter M. Blau
Exchange and Power in Social Life, 1964

"If we ever get to the point of charting a whole city or a whole nation, we
would have … a picture of a vast solar system of intangible structures,
powerfully influencing conduct, as gravitation does in space. Such an
invisible structure underlies society and has its influence in determining the
conduct of society as a whole."
J.L. Moreno, New York Times, April 13, 1933

These patterns of connection form a social space, that can be seen in multiple
contexts:
Introduction

Source: Linton Freeman “See you in the funny pages” Connections, 23, 2000, 32-42.
Introduction
High Schools as Networks
Introduction

And yet, standard social science analysis methods do not take this space
into account.

“For the last thirty years, empirical social research has been
dominated by the sample survey. But as usually practiced, …, the
survey is a sociological meat grinder, tearing the individual from his
social context and guaranteeing that nobody in the study interacts
with anyone else in it.”
Allen Barton, 1968 (Quoted in Freeman 2004)

Moreover, the complexity of the relational world makes it impossible to


identify social connectivity using only our intuitive understanding.

Social Network Analysis (SNA) provides a set of tools to empirically


extend our theoretical intuition of the patterns that construct social
structure.
Introduction Why do Networks Matter? Local vision
Introduction Why do Networks Matter? Local vision
Introduction

Why networks matter:

• Intuitive: “goods” travel through contacts between actors,


which can reflect a power distribution or influence attitudes
and behaviors. Our understanding of social life improves if
we account for this social space.

• Less intuitive: patterns of inter-actor contact can have effects


on the spread of “goods” or power dynamics that could not be
seen focusing only on individual behavior.
Introduction

Social network analysis is:

•a set of relational methods for systematically understanding


and identifying connections among actors. SNA
•is motivated by a structural intuition based on ties linking
social actors
•is grounded in systematic empirical data
•draws heavily on graphic imagery
•relies on the use of mathematical and/or computational
models.

•Social Network Analysis embodies a range of theories


relating types of observable social spaces and their relation to
individual and group behavior.
1. Introduction
2. Social Network Data
a. Basic data Elements
b. Collecting network data
c. Basic data structures
3. Measuring Networks
a. Flows within of goods in networks
1) Topology
2) Time
b. Structure of Social Space
1) Small Worlds, Scale-Free, Triads
2) Cohesive Groups
3) Role Positions
4. Modeling with Networks
a. Modeling Behaviors with Networks
1) Peer attribute models
2) Network Autocorrelation Models
3) Dyad / QAP Models
b. Modeling Network Network Structure
1) QAP for network structure
2) Exponential Random Graph Models
5. SNA Computer Programs
Social Network Data

The unit of interest in a network are the combined sets of


actors and their relations.

We represent actors with points and relations with lines.


Actors are referred to variously as:
Nodes, vertices or points
Relations are referred to variously as:
Edges, Arcs, Lines, Ties

Example:
b d

a c e
Social Network Data

In general, a relation can be:


Binary or Valued
Directed or Undirected

b d b d

a c e a c e
Undirected, binary Directed, binary

b d b d
1 3 1 2

a c 4
e a c e
Undirected, Valued Directed, Valued
Social Network Data
Social network data are substantively divided by the number of
modes in the data.

1-mode data represents edges based on direct contact between


actors in the network. All the nodes are of the same type (people,
organization, ideas, etc). Examples:
Communication, friendship, giving orders, sending email.

1-mode data are usually singly reported (each person reports on


their friends), but you can use multiple-informant data, which is
more common in child development research (Cairns and
Cairns).
Social Network Data
Social network data are substantively divided by the number of
modes in the data.

2-mode data represents nodes from two separate classes, where


all ties are across classes. Examples:
People as members of groups
People as authors on papers
Words used often by people
Events in the life history of people

The two modes of the data represent a duality: you can project
the data as people connected to people through joint membership
in a group, or groups to each other through common membership

There may be multiple relations of multiple types connecting


your nodes.
Social Network Data

We can examine networks across multiple levels:

1) Ego-network
- Have data on a respondent (ego) and the people they are connected to
(alters). Example: 1985 GSS module

- May include estimates of connections among alters

2) Partial network
- Ego networks plus some amount of tracing to reach contacts of
contacts

- Something less than full account of connections among all pairs of


actors in the relevant population

- Example: CDC Contact tracing data for STDs


Social Network Data

We can examine networks across multiple levels:

3) Complete or “Global” data


- Data on all actors within a particular (relevant) boundary

- Never exactly complete (due to missing data), but boundaries are set

-Example: Coauthorship data among all writers in the social


sciences, friendships among all students in a classroom

For the most part, I will be discussing techniques surrounding global


networks today, though I will briefly mention some standard uses of
ego-network data.
Social Network Data
Collecting Network Data

Data capture any connection between the nodes. Sources include


surveys, published accounts, special informants, etc.

In general, you can only make conclusions about relations among the
set of nodes you have collected, so it is important to observe as
much of the network as possible.

See W&F, chap 2 on different types of data collection


Social Network Data
Collecting Network Data

If you use surveys to collect data, some general rules of thumb:

a) Network data collection can be time consuming. It is better (I think) to


have breadth over depth. Having detailed information on <50% of the
sample will make it very difficult to draw conclusions about the general
network structure.

b) Question format:
• If you ask people to recall names (an open list format), fatigue will
result in under-reporting
• If you ask people to check off names from a full list, you can often get
over-reporting
c) It is common to limit people to ~5 nominations. This will bias network stats
for stars, but is sometimes the best choice to avoid fatigue.
d) Concrete relational indicators are best (who did you talk to?) over attitudes
that are harder to define (who do you like?)
Social Network Data
Collecting Network Data

Existing Sources of Social Network Data

1) Check INSNA: The International Network of Social Network Analysis


2) Many secondary sources (particularly for 2-mode data)
3) National Longitudinal Survey of Adolescent Health (Add Health)
Social Network Data
Basic Data Structures
Working with pictures.
No standard way to draw a sociogram: each of these are equal:
Social Network Data
Basic Data Structures

In general, graphs are cumbersome to work with analytically, though there is a


great deal of good work to be done on using visualization to build network
intuition.

I recommend using layouts that optimize on the feature you are most interested
in, and find that either a hierarchical layout or a force-directed layout are best.
Social Network Data
Basic Data Structures

From pictures to matrices

b d b d

a c e a c e
Undirected, binary Directed, binary
a b c d e a b c d e
a 1 a 1
b 1 1 b 1
c 1 1 1 c 1 1 1
d 1 1 d
e 1 1 e 1 1
Social Network Data
Basic Data Structures

From matrices to lists

a b c d e Adjacency List Arc List


a 1 ab
b 1 1 ab ba
bac bc
c 1 1 1 cbde cb
d 1 1 dce cd
e 1 1 ecd ce
dc
de
ec
ed
Measuring Networks: Flow

“Goods” flow through networks:


Measuring Networks: Flow

In addition to the simple probability that one actor passes information on


to another (pij), two factors affect flow through a network:

Topology
-the shape, or form, of the network
- Example: one actor cannot pass information to another unless they
are either directly or indirectly connected

Time
- the timing of contact matters
- Example: an actor cannot pass information he has not receive yet
Measuring Networks: Flow

Two features of the network’s topology are known to be important: connectivity


and centrality

Connectivity refers to how actors in one part of the network are connected to
actors in another part of the network.

• Reachability: Is it possible for actor i to reach actor j? This can only be


true if there is a chain of contact from one actor to another.

• Distance: Given they can be reached, how many steps are they from
each other?

• Number of paths: How many different paths connect each pair?


Measuring Networks: Flow

Without full network data, you can’t distinguish actors with limited
information potential from those more deeply embedded in a setting.

b
a
Measuring Networks: Flow
Reachability

Indirect connections are what make networks systems. One actor can
reach another if there is a path in the graph connecting them.

b d a

b f
a c e
c
f
d e

Paths can be directed, leading to a distinction between “strong” and “weak”


components
Measuring Networks: Flow
Reachability

Reachability

If you can trace a sequence of relations from one actor to another,


then the two are reachable. If there is at least one path connecting
every pair of actors in the graph, the graph is connected and is called
a component.

Intuitively, a component is the set of people who are all connected by


a chain of relations.
Measuring Networks: Flow
Reachability

This example
contains many
components.
Measuring Networks: Flow
Distance & number of paths

Distance is measured by the (weighted) number of relations separating a pair:

Actor “a” is:


1 step from 4
2 steps from 5
3 steps from 4
4 steps from 3
5 steps from 1

a
Measuring Networks: Flow
Distance & number of paths
Paths are the different routes one can take. Node-independent paths are
particularly important.

There are 2 independent


paths connecting a and
b.
b

There are many non-


independent paths

a
Measuring Networks: Flow
Distance & number of paths

Probability of transfer
by distance and number of paths, assume a constant pij of 0.6
1.2

1
10 paths

0.8
probability

5 paths

0.6

2 paths
0.4
1 path
0.2

0
2 3 4 5 6
Path distance
Reachability in Colorado Springs
(Sexual contact only) •High-risk actors over 4 years
•695 people represented
•Longest path is 17 steps
•Average distance is about 5 steps
•Average person is within 3 steps
of 75 other people
•137 people connected through 2
independent paths, core of 30
people connected through 4
independent paths

(Node size = log of degree)


Measuring Networks: Flow
Centrality

Centrality refers to (one dimension of) location, identifying where an actor


resides in a network.

• For example, we can compare actors at the edge of the network to actors
at the center.

• In general, this is a way to formalize intuitive notions about the


distinction between insiders and outsiders.
Measuring Networks: Flow
Centrality

At the individual level, one dimension of position in the network can be


captured through centrality.

Conceptually, centrality is fairly straight forward: we want to identify


which nodes are in the ‘center’ of the network. In practice, identifying
exactly what we mean by ‘center’ is somewhat complicated, but
substantively we often have reason to believe that people at the center
are very important.

Three standard centrality measures capture a wide range of


“importance” in a network:
•Degree
•Closeness
•Betweenness
Measuring Networks: Flow
Centrality

The most intuitive notion of centrality focuses on degree. Degree is


the number of ties, and the actor with the most ties is the most
important:

C D  d (ni )  X i    X ij
j
Measuring Networks: Flow
Centrality

If we want to measure the degree to which the graph as a whole is centralized,


we look at the dispersion of centrality:

Simple: variance of the individual centrality scores.

 g
2
S D   (CD (ni )  Cd )  / g
2

 i 1 
Or, using Freeman’s general formula for centralization (which ranges from 0 to 1):

CD 
 C
g
i 1 D (n )  CD (ni )
*

[( g  1)( g  2)]
Measuring Networks: Flow Degree Centralization Scores
Centrality

Freeman: 1.0 Freeman: 0.0


Freeman: .02
Variance: 3.9 Variance: 0.0
Variance: .17

Freeman: .07
Variance: .20
Measuring Networks: Flow
Centrality

A second measure of centrality is closeness centrality. An actor is considered


important if he/she is relatively close to all other actors.

Closeness is based on the inverse of the distance of each actor to every other actor
in the network.
Closeness Centrality:
1
 g

Cc (ni )   d (ni , n j )
 j 1 
Normalized Closeness Centrality

CC' (ni )  (CC (ni ))( g  1)


Measuring Networks: Flow
Centrality Closeness Centrality in the examples
C=1.0 C=0.0

C=0.36

C=0.28
Measuring Networks: Flow
Centrality

Betweenness Centrality:
Model based on communication flow: A person who lies on
communication paths can control communication flow, and is thus important.
Betweenness centrality counts the number of shortest paths between i and k
that actor j resides on.

b
a

C d e f g h
Measuring Networks: Flow
Centrality

Betweenness Centrality:

C B (ni )   g jk (ni ) / g jk
j k

Where gjk = the number of geodesics connecting jk, and


gjk(ni) = the number that actor i is on.

Usually normalized by:

C (ni )  CB (ni ) /[( g  1)( g  2) / 2]


'
B
Measuring Networks: Flow
Centrality

Betweenness Centrality:

Centralization: 1.0 Centralization: .59 Centralization: 0

Centralization: .31
Measuring Networks: Flow
Centrality
Actors that appear very
different when seen
individually, are
comparable in the global
network.

(Node size proportional to betweenness centrality )


Measuring Networks: Flow
Time

Two factors that affect network flows:


Topology
- the shape, or form, of the network
- simple example: one actor cannot pass information to
another unless they are either directly or indirectly
connected

Time
- the timing of contacts matters
- simple example: an actor cannot pass information he has
not yet received.
Measuring Networks: Flow
Time

Timing in networks

A focus on contact structure has often slighted the importance of network


dynamics,though a number of recent pieces are addressing this.

Time affects networks in two important ways:


1) The structure itself evolves, in ways that will affect the topology an
thus flow.

2) The timing of contact constrains information flow


Measuring Networks: Flow
Time Drug Relations, Colorado Springs, Year 1

Data on drug users in


Colorado Springs, over
5 years
Measuring Networks: Flow
Time Drug Relations, Colorado Springs, Year 2
Current year in red, past relations in gray
Measuring Networks: Flow
Time Drug Relations, Colorado Springs, Year 3
Current year in red, past relations in gray
Measuring Networks: Flow
Time Drug Relations, Colorado Springs, Year 4
Current year in red, past relations in gray
Measuring Networks: Flow
Time Drug Relations, Colorado Springs, Year 5
Current year in red, past relations in gray
Measuring Networks: Flow
Time

What impact does timing have on flow through the network?

8-9
C E

2-5
A B

3-5
D F

Numbers above lines indicate contact periods


Measuring Networks: Flow
Time

The path graph for the hypothetical contact network

C E

A B

D F

While clearly important, this is not often handled well by current software.
Measuring Networks: Structure & Social Space
The second broad division for measuring networks steps back to
generalized features of the global network.

These factors almost always are of interest because of what they imply
about how goods move through the network, but have resulted in a distinct
line of methods and substantive research.

We focus on 3 such factors today:


1) Basic structure of large-scale networks
2) Cohesive Peer Groups
3) Identifying Role positions (blockmodels)
Measuring Networks: Large-Scale Models
Small World Networks

Based on Milgram’s (1967) famous work,


the substantive point is that networks
are structured such that even when
most of our connections are local,
any pair of people can be connected
by a fairly small number of relational
steps.

Works on 2 parameters:
1) The Clustering Coefficient (c) =
average proportion of closed
triangles
2) The average distance (L)
separating nodes in the network
Measuring Networks: Large-Scale Models
Small World Networks

C=Large, L is Small =
SW Graphs

•High probability that a node’s contacts are connected to each other.


•Small average distance between nodes
Measuring Networks: Large-Scale Models
Small World Networks

In a highly clustered, ordered


network, a single random
connection will create a shortcut
that lowers L dramatically

Watts demonstrates that small


world properties can occur in
graphs with a surprisingly small
number of shortcuts

Diffusion / flow implications are


unclear, but seem similar to a
random graphs where local
clusters are reduced to a single
point.
Measuring Networks: Large-Scale Models
Scale-Free Networks

Across a large number of substantive


settings, Barabási points out that the
distribution of network involvement
(degree) is highly and characteristically
skewed.
Measuring Networks: Large-Scale Models
Scale Free Networks

Many large networks are characterized by a highly skewed distribution of the


number of partners (degree)
Measuring Networks: Large-Scale Models
Scale Free Networks

Many large networks are characterized by a highly skewed distribution of the


number of partners (degree)


p(k ) ~ k
Measuring Networks: Large-Scale Models
Scale Free Networks

The scale-free model focuses on the distance-reducing


capacity of high-degree nodes:
Measuring Networks: Large-Scale Models
Scale Free Networks

The scale-free model focuses on the distance-reducing capacity of high-


degree nodes, as ‘hubs’ create shortcuts that carry network flow.
Measuring Networks: Large-Scale Models
Scale Free Networks

Colorado Springs High-Risk


(Sexual contact only) •Network is approximately
scale-free, with  = -1.3

•But connectivity does not


depend on the hubs.
Measuring Networks: Large-Scale Models
Social Cohesion

White, D. R. and F. Harary. 2001. "The Cohesiveness of Blocks


in Social Networks: Node Connectivity and Conditional
Density." Sociological Methodology 31:305-59.

Moody, James and Douglas R. White. 2003. “Structural


Cohesion and Embeddedness: A hierarchical Conception of
Social Groups” American Sociological Review 68:103-127

White, Douglas R., Jason Owen-Smith, James Moody, &


Walter W. Powell (2004) "Networks, Fields, and
Organizations: Scale, Topology and Cohesive
Embeddings." Computational and Mathematical
Organization Theory. 10:95-117

Moody, James "The Structure of a Social Science


Collaboration Network: Disciplinary Cohesion from
1963 to 1999" American Sociological Review. 69:213-
238
Measuring Networks: Large-Scale Models
Social Cohesion

Formal definition of Structural Cohesion:


(a) A group’s structural cohesion is equal to the minimum number of actors who,
if removed from the group, would disconnect the group.

Equivalently (by Menger’s Theorem):

(b) A group’s structural cohesion is equal to the minimum number of independent


paths linking each pair of actors in the group.
Measuring Networks: Large-Scale Models
Social Cohesion

•Networks are structurally cohesive if they remain connected even when


nodes are removed

0 1 2 3
Node Connectivity
Measuring Networks: Large-Scale Models
Social Cohesion

Structural cohesion gives rise automatically to a clear notion of


embeddedness, since cohesive sets nest inside of each other.

2
1 3

9
4 8 10

11
5 7 12
13
6 14
15
17

18 16
19
20

2
22

23
Measuring Networks: Large-Scale Models
Social Cohesion

Project 90, Sex-only network (n=695)

3-Component (n=58)
Measuring Networks: Large-Scale Models
Social Cohesion

IV Drug Sharing Connected


Largest BC: 247 Bicomponents
k > 4: 318
Max k: 12

Structural Cohesion
simultaneously gives
us a positional and
subgroup analysis.
Measuring Networks:
Cohesive Sub Groups

A primary interest in Social Network Analysis is the identification of


“significant social subgroups” – some smaller collection of nodes in
the graph that can be considered, at least in some senses, as a “unit”
based on the pattern, strength, or frequency of ties.

There are many ways to identify groups. They all insist on a group
being in a connected component, but other than that the variation is
wide.
Measuring Networks:
Cohesive Sub Groups

Graph Theoretical Models.

Start with a clique. A clique is defined as a maximal subgraph in which every


member of the graph is connected to every other member of the graph.
Cliques are collections of nodes where density = 1.0.

Properties of cliques:
• Density: 1.0
• Everyone connected to n-1 alters
• Distance between every pair is 1
• Ratio of within group ties to between
group ties is infinite
• All triads are transitive
Measuring Networks:
Cohesive Sub Groups

Graph Theoretical Models.

In practice, complete cliques are not very useful. They tend to overlap
heavily and are limited in their size.

Graph theorists have thus


relaxed the complete
connectivity requirement
(with varying degrees of
success). See the Moody
& White (2003) for a
discussion of these
attempts.
Measuring Networks:
Cohesive Sub Groups

Identifying Primary groups:

1) Measures of fit
To identify a primary group, we need some measure of how clustered
the network is. Usually, this is a function of the number of ties that
fall within group to the number of ties that fall between group.

2) Algorithmic approaches to maximizing (1)


Once we have such an index, we need a method for searching through
the network to maximize the fit.

3) Generalized cluster analysis


In addition to maximizing a group function such as (1) we can use the
relational distance directly, and look for clusters in the data. We next
go over two different styles of cluster analysis
Measuring Networks:
Cohesive Sub Groups

Segregation Index
(Freeman, L. C. 1972. "Segregation in Social Networks." Sociological Methods and
Research 6411-30.)

Freeman asked how we could identify segregation in a social network.


Theoretically, he argues, if a given attribute (group label) does not matter for
social relations, then relations should be distributed randomly with respect to the
attribute. Thus, the difference between the number of cross-group ties expected
by chance and the number observed measures segregation.

E( X )  X
Seg 
E( X )
Measuring Networks:
Cohesive Sub Groups

Consider the (hypothetical) network below. There are two


attributes in this network: people with Blue eyes and Brown eyes
and people who are square or not (they must be hip).
Measuring Networks:
Cohesive Sub Groups
Segregation Index
Mixing Matrix:

Blue Brown

Blue 6 17

Brown 17 16
Seg = -0.25

Hip Square

Hip 20 3

Square 3 30
Seg = 0.78
Measuring Networks:
Cohesive Sub Groups

The segregation index is one metric used to identify groups. Others include:
a) The ratio of in-group to out-group ties (Negopy, UCINET Factions)
b) Maximizing the probability of in-group contact (CliqueFinder)
c) The Segregation Matrix Index (SMI)
d) The dyadic factor loadings for overlapping groups (akin to a latent
class model)
e) Minimize the within-group distance

Once a metric has been chosen, some algorithm is needed to search through
the graph to identify clusters. These algorithms range from very sophisticated
“graph-intelligent” algorithms, such as NEGOPY, to simple cluster analysis
of distance matrices.

In most cases, you have to pre-set the number of groups to use (the exceptions
are NEGOPY and CliqueFinder. Moody’s CROWDS algorithm also has
automatic stopping criteria, but you have to give it starting values.
Measuring Networks:
Cohesive Sub Groups

In practice, the different


algorithms will give
different results.

Here, I compare the


NEGOPY results to the
RNM results. NEGOPY
returned one large group,
RNM found many smaller,
denser groups.

It’s usually a good idea to


explore multiple solutions
and algorithms.
Measuring Networks: Gangon Prison Network
Cohesive Sub Groups

In practice, the different


algorithms will give
different results.

Here, I compare
NEGOPY, FACTIONS
and RNM. Groups A and
B are identical, C is close.
F, E and D differ.

It’s usually a good idea to


explore multiple solutions
and algorithms.

(all solutions constrained to 6 groups)


Measuring Networks:
Role Positions

Overview
•Social life can be described (at least in part) through social roles.
•To the extent that roles can be characterized by regular interaction
patterns, we can summarize roles through common relational patterns.
•Identifying these sets is the goal of block-model analyses.

Nadel: The Coherence of Role Systems


•Background ideas for White, Boorman and Brieger. Social life as
interconnected system of roles
•Important feature: thinking of roles as connected in a role system =
social structure

White, Harrison C.; Boorman, Scott A., and Breiger, Ronald L. Social
Structure from Multiple Networks I. American Journal of Sociology.
1976; 81730-780.
•The key article describing the theoretical and technical elements of
block-modeling
Measuring Networks:
Role Positions

Elements of a Role:

•Rights and obligations with respect to other people or classes of


people

•Roles require a ‘role compliment’ another person who the role-


occupant acts with respect to

Examples:
Parent - child, Teacher - student, Lover - lover, Friend - Friend,
Husband - Wife, etc.

Nadel (Following functional anthropologists and sociologists) defines


‘logical’ types of roles, and then examines how they can be linked together.
Measuring Networks:
Role Positions

White et al: From logical role systems to empirical social structures

Start with some basic ideas of what a role is: An exchange of something (support,
ideas, commands, etc) between actors. Thus, we might represent a family as:

H W

C
C C
Romantic Love
Provides food for
Bickers with
(and there are, of course, many other relations inside a family!)
Measuring Networks:
Role Positions

The key idea, is that we can express a role through a relation (or set of relations)
and thus a social system by the inventory of roles. If roles equate to positions in
an exchange system, then we need only identify particular aspects of a position.
But what aspect?

Structural Equivalence

Two actors are structurally equivalent if they have the same


types of ties to the same people.
Measuring Networks:
Role Positions

Structural Equivalence
A single relation
Measuring Networks:
Role Positions

Structural Equivalence

Graph reduced to positions


Measuring Networks:
Role Positions

Blockmodeling: basic steps

In any positional analysis, there are 4 basic steps:

1) Identify a definition of equivalence


2) Measure the degree to which pairs of actors are equivalent
3) Develop a representation of the equivalencies
4) Assess the adequacy of the representation
Measuring Networks:
Role Positions

1) Identify a definition of equivalence


Structural Equivalence:
Two actors are equivalent if they have the same type of ties to the same people.
Measuring Networks:
Role Positions

Automorphic Equivalence:
Actors occupy indistinguishable structural locations in the network. That is,
that they are in isomorphic positions in the network.

In general, automorphically equivalent nodes are equivalent with respect to


all graph theoretic properties (I.e. degree, number of people reachable,
centrality, etc.)
Measuring Networks:
Role Positions

Automorphic Equivalence:
Measuring Networks:
Role Positions

Regular Equivalence:
Regular equivalence does not require actors to have identical
ties to identical actors or to be structurally indistinguishable.

Actors who are regularly equivalent have identical ties to and


from equivalent actors.

If actors i and j are regularly equivalent, then for all relations


and for all actors, if i k, then there exists some actor l such
that j l and k is regularly equivalent to l.
Measuring Networks:
Role Positions
Regular Equivalence:

There may be multiple regular equivalence partitions in a network, and thus we tend
to want to find the maximal regular equivalence position, the one with the fewest
positions.
Measuring Networks:
Role Positions
Role or Local Equivalence:

While most equivalence measures focus on position within the full network, some
measures focus only on the patters within the local tie neighborhood. These have
been called ‘local role’ equivalence.

Note that:
Structurally equivalent actors are automorphically equivalent,
Automorphically equivalent actors are regularly equivalent.
Structurally equivalent and automorphically equivalent actors are role equivalent

In practice, we tend to ignore some of these distinctions, as they get blurred quickly
once we have to operationalize them in real-world graphs. It turns out that few
people are ever exactly equivalent, and thus we approximate the links between the
types.

In all cases, the procedure can work over multiple relations simultaneously.

The process of identifying positions is called blockmodeling, and requires identifying


a measure of similarity among nodes.
Measuring Networks:
Role Positions
Once you identify equivalent actors, block them in the matrix and reduce it, based on the number of ties
in the cell of interest. The key values are a zero block (no ties) and a one-block (all ties present):

1 2 3 4 5 6 1 2 3 4 5 6
1 . 1 1 0 1 1 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
2 1 0 0 1 0 0
2 1 . 0 0 1 1 0 0 0 0 0 0 0 0
3 1 0 1 0 1 0
1 0 . 1 0 0 1 1 1 1 0 0 0 0
3 4 0 1 0 1 0 1
1 0 1 . 0 0 1 1 1 1 0 0 0 0
5 0 0 1 0 0 0
0 1 0 0 . 1 0 0 0 0 1 1 1 1
4 6 0 0 0 1 0 0
0 1 0 0 1 . 0 0 0 0 1 1 1 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0
5 0 0 1 1 0 0 0 . 0 0 0 0 0 0
0 0 1 1 0 0 0 0 . 0 0 0 0 0
0 0 1 1 0 0 0 0 0 . 0 0 0 0
0 0 0 0 1 1 0 0 0 0 . 0 0 0
6 0 0 0 0 1 1 0 0 0 0 0 . 0 0
0 0 0 0 1 1 0 0 0 0 0 0 . 0
0 0 0 0 1 1 0 0 0 0 0 0 0 .

Structural equivalence thus generates 6 positions in the network


Measuring Networks:
Role Positions
Once you partition the matrix, reduce it:
. 1 1 1 0 0 0 0 0 0 0 0 0 0 1 2 3
1 . 0 0 1 1 0 0 0 0 0 0 0 0
1 0 . 1 0 0 1 1 1 1 0 0 0 0
1 1 1 0
1 0 1 . 0 0 1 1 1 1 0 0 0 0 2 1 1 1
0 1 0 0 . 1 0 0 0 0 1 1 1 1 3 0 1 0
0 1 0 0 1 . 0 0 0 0 1 1 1 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0
0 0 1 1 0 0 0 . 0 0 0 0 0 0
0 0 1 1 0 0 0 0 . 0 0 0 0 0
0 0 1 1 0 0 0 0 0 . 0 0 0 0
0 0 0 0 1 1 0 0 0 0 . 0 0 0
0 0 0 0 1 1 0 0 0 0 0 . 0 0
0 0 0 0 1 1 0 0 0 0 0 0 . 0 1 2
0 0 0 0 1 1 0 0 0 0 0 0 0 .

3
Regular equivalence

(here I placed a one in the image matrix if there were any ties in the ij block)
Measuring Networks:
Role Positions

Operationally, you have to measure the similarity between actors. If two actors
are structurally equivalent, then they will have identical ties to other people.
Consider the example again:
C and D match on all
1 2 3 4 5 6 12 other people, and
C D Match
1 . 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 are thus structurally
2 1 . 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 equivalent.
1 0 . 1 0 0 1 1 1 1 0 0 0 0 . 1 .
3
1 0 1 . 0 0 1 1 1 1 0 0 0 0 1 . .
0 1 0 0 . 1 0 0 0 0 1 1 1 1 0 0 1
4
0 1 0 0 1 . 0 0 0 0 1 1 1 1 0 0 1
0 0 1 1 0 0 . 0 0 0 0 0 0 0 1 1 1
5 0 0 1 1 0 0 0 . 0 0 0 0 0 0 1 1 1
0 0 1 1 0 0 0 0 . 0 0 0 0 0 1 1 1
0 0 1 1 0 0 0 0 0 . 0 0 0 0 1 1 1
0 0 0 0 1 1 0 0 0 0 . 0 0 0 0 0 1
6 0 0 0 0 1 1 0 0 0 0 0 . 0 0 0 0 1
0 0 0 0 1 1 0 0 0 0 0 0 . 0 0 0 1
0 0 0 0 1 1 0 0 0 0 0 0 0 . 0 0 1
Sum: 12
Measuring Networks:
Role Positions
If the model is going to be based on asymmetric or multiple relations, you simply stack the
various relations, usually including both “directions” of asymmetric relations:

Stacked
Romance
0 1 0 0 0 0 1 0 0 0
H W 1 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
Feeds
C 0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
C C 0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Romantic Love 0 0 0 0 0 0 0 0 0 0
Provides food for 1 1 0 0 0
Bickers with Bicker 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0
0 0 1 0 0 0 0 0 1 1
0 0 1 1 0 0 0 1 0 1
0 0 1 1 0
Measuring Networks:
Role Positions

The metric used to measure structural equivalence by White, Boorman and Brieger is
the correlation between each node’s set of ties. For the example, this would be:

1.00 -0.20 0.08 0.08 -0.19 -0.19 0.77 0.77 0.77 0.77 -0.26 -0.26 -0.26 -0.26
-0.20 1.00 -0.19 -0.19 0.08 0.08 -0.26 -0.26 -0.26 -0.26 0.77 0.77 0.77 0.77
0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45
0.08 -0.19 1.00 1.00 -1.00 -1.00 0.36 0.36 0.36 0.36 -0.45 -0.45 -0.45 -0.45
-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36
-0.19 0.08 -1.00 -1.00 1.00 1.00 -0.45 -0.45 -0.45 -0.45 0.36 0.36 0.36 0.36
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
0.77 -0.26 0.36 0.36 -0.45 -0.45 1.00 1.00 1.00 1.00 -0.20 -0.20 -0.20 -0.20
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00
-0.26 0.77 -0.45 -0.45 0.36 0.36 -0.20 -0.20 -0.20 -0.20 1.00 1.00 1.00 1.00

Another common metric is the Euclidean distance between pairs of actors, which you
then use in a standard cluster analysis.
Measuring Networks:
Role Positions

Automorphic and Regular equivalence are more difficult to find, and require
iteratively searching over possible class assignments for sets that have the same
graph theoretic patterns. Usually start with a set of nodes defined as similar on
a number of network measures, then look within these classes for automorphic
equivalence classes.

A theoretically appealing method for finding structures that are very similar to
regular equivalence, role equivalence, uses the triad census. Each node is
involved in (n-1)(n-2)/2 triads, and occupies a particular position in each of
these triads.
Measuring Networks:
Role Positions

Moving from a similarity/distance matrix to a blockmodel:


number of groups and determining blocks:

“An important decision in an analysis using CONCOR is how fine the


partition should be; in other words, when should one stop splitting
positions? Theory and the interpretability of the solution are the
primary consideration in deciding how many positions to produce.”
(W&F, p.378)

“In defining positions of actors, the ‘trick’ is to choose the point along
the series that gives a useful and interpretable partition of the actors
into equivalence classes.” (W&F p.383)
Measuring Networks:
Role Positions

An example:
Padgett, J. F. and Ansell, C. K.
Robust action and the rise of
the Medici, 1400-1434.
American Journal of
Sociology. 1993; 981259-
1319.

“Political Groups” in the attribute


sense do not seem to exist, so
P&A turn to the pattern of
network relations among
families.

This is the block reduction of the


full 92 family network.
Modeling with Networks: Behaviors

There are two general approaches to modeling behaviors with network data:
1) Using network measures as variables to predict individual outcomes
2) Network autocorrelation / peer influence models
3) Dyad / QAP models of the similarity of actors and their joint network
position
Modeling with Networks: Behaviors

The simplest way to use network data in research is to include the network
measure as a covariate in a standard model:

Y = a0 + b(netvars) + b(other vars) + e

“netvars” most commonly include:


•Functions of each person’s direct contacts attributes
•Such as: mean income of friends, proportion of friends who are
employed, racial heterogeneity of the friends,etc.
•Structural indicators:
•Such as: Centrality, dummies for group / role membership, etc.

These models are the only option for ego-network data,where information on
network alters is collected from a single respondent’s (ego’s) report.

They can be used from extractions of partial or complete data, but the error term
is – by definition – autocorrelated. Cases are not independent, but connected
through the social relations
Modeling with Networks: Behaviors

Network Autocorrelation models (aka Peer Influence models):

Friedkin, N. E. 1984. "Structural Cohesion and Equivalence


Explanations of Social Homogeneity." Sociological Methods and
Research 12:235-61.
———. 1998. A Structural Theory of Social Influence. Cambridge:
Cambridge.
Friedkin, N. E. and E. C. Johnsen. 1990. "Social Influence and
Opinions." Journal of Mathematical Sociology 15(193-205).
———. 1997. "Social Positions in Influence Networks." Social
Networks 19:209-22.

~
Y ()
 αWY ()
 Xb  e
Where W is a direct function of the adjacency matrix, and a is the estimated
value of peer influence.
Modeling with Networks: Behaviors

There are two general ways to test for peer influence in an observed network.
The first estimates the parameters (a and b) of the peer influence model directly,
the second transforms the network into a dyadic model, predicting similarity
among actors.

Peer influence model:


See Doreian, Patrick. “Maximum likelihood methods for linear models Spatial
Effects and Spatial Disturbances Terms.” Sociological Methods and Research.
1982; 10243-269.

Gould, Roger V. Multiple Networks and mobilization in the Paris Commune,


1871. American Sociological Review. 1991; 56716-729. (applied example)

~
Y ()
 αWY ()
 Xb  e
Modeling with Networks: Behaviors

The basic model says that people’s opinions are a function of the opinions of
others and their characteristics.

~
Y ()
 αWY ()
 Xb  e
WY = A simple vector which can be added to your model. That is, multiply
Y by a W matrix, and run the regression with WY as a new variable, and the
regression coefficient is an estimate of a.

This is what Doriean calls the QAD (“Quick and Dirty” estimate of peer
influence, and is equivalent (under certain assumptions) to adding the mean of
ego’s friends to the model.
Modeling with Networks: Behaviors

The problem with the above regression is that cases are, by definition, not
independent. In fact, WY is also known as the ‘network autocorrelation’
coefficient, since a ‘peer influence’ effect is an autocorrelation effect -- your value
is a function of the people you are connected to. In general, OLS is not the best
way to estimate this equation. That is, QAD = Quick and Dirty, and your results
will not be exact.

In practice, the QAD approach (perhaps combined with a GLS estimator) results in
empirical estimates that are “virtually indistinguishable” from MLE (Doreian et al,
1984)

The proper way to estimate the peer equation is to use maximum likelihood
estimates, and Doreian gives the formulas for this in his paper.

The other way is to use non-parametric approaches, such as the Quadratic


Assignment Procedure, to estimate the effects.
Modeling with Networks: Behaviors

An empirical Example: Peer influence in the OSU Graduate Student Network.


Each person was asked to rank their satisfaction with the program, which is the dependent variable
in this analysis.

I constructed two W matrices, one from HELP the other from Best Friend. I treat relations as
symmetric and valued, such that:

 1 if Aijt  1 or A jit  1 
 
Wijt  2 if Aijt  1 and A jit  1
 
 0 otherwise 
Wij  1
j

Wii  0
I also include Race (white/Non-white, Gender and Cohort Year as exogenous variables in the model.
Modeling with Networks: Behaviors

An empirical Example: Peer influence in the OSU Graduate Student Network.


Distribution of Satisfaction with the department.
Modeling with Networks: Behaviors

Parameter Estimates

Parameter Standardized
Variable Estimate Pr > |t| Estimate

Intercept 2.60252 0.0931 0


FEMALE -1.07540 0.0142 -0.25455
NONWHITE -0.22087 0.5975 -0.05491
y00 0.93176 0.0798 0.21627
y99 -0.19375 0.7052 -0.04586
y98 -0.45912 0.4637 -0.08289
y97 0.60670 0.3060 0.11919
PEER_BF 0.23936 0.0002 0.42084
PEER_H 0.50668 0.0277 0.23321

Model R2 = .41, compared to .15 without the peer effects


Modeling with Networks: Behaviors
Dyad QAP models

Another way to get at peer influence is not through the level of Y, but through the
extent to which actors are similar with respect to Y.

The model is now expressed at the dyad level as:

Yij  b0  b1 Aij   bk X k  eij


k

Where Y is a matrix of similarities, A is an adjacency matrix, and Xk is a


matrix of similarities on attributes
Modeling with Networks: Behaviors
Dyad QAP models

NODE ADJMAT SAMERCE SAMESEX


1 0 1 1 1 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0
2 1 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 1
3 1 1 0 0 1 0 1 0 0 0 0 0 1 0 1 1 1 0 1 0 0 1 0 0 1 1 0
4 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0 1 0 0 0 1 1 0
5 0 0 1 1 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1
6 0 0 0 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1
7 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0
8 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 1 0 0 1 0 1 1 0 0 1 0 0
9 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0
Modeling with Networks: Behaviors
Dyad QAP models

Y Distance (Dij=abs(Yi-Yj)
0.32 .000 .277 .228 .181 .278 .298 .095 .307 .481
0.59 .277 .000 .049 .096 .555 .575 .182 .584 .758
0.54 .228 .049 .000 .047 .506 .526 .134 .535 .710
0.50 .181 .096 .047 .000 .459 .479 .087 .488 .663
0.04 .278 .555 .506 .459 .000 .020 .372 .029 .204
0.02 .298 .575 .526 .479 .020 .000 .392 .009 .184
0.41 .095 .182 .134 .087 .372 .392 .000 .401 .576
0.01 .307 .584 .535 .488 .029 .009 .401 .000 .175
-0.17 .481 .758 .710 .663 .204 .184 .576 .175 .000
Modeling with Networks: Behaviors
Dyad QAP models
The REG Procedure
Model: MODEL1
Dependent Variable: SIM

Analysis of Variance

Sum of Mean
Source DF Squares Square F Value Pr > F

Model 4 0.90657 0.22664 9.29 <.0001


Error 31 0.75591 0.02438
Corrected Total 35 1.66248

Root MSE 0.15615 R-Square 0.5453


Dependent Mean 0.33161 Adj R-Sq 0.4866
Coeff Var 47.08929

Parameter Estimates

Parameter Standard
Variable DF Estimate Error t Value Pr > |t|

Intercept 1 0.51931 0.05116 10.15 <.0001


NOM 1 -0.17054 0.05963 -2.86 0.0075
SAMERCE 1 0.05387 0.05916 0.91 0.3696
SAMESEX 1 -0.06535 0.05365 -1.22 0.2324
NCOMFND 1 -0.16134 0.03862 -4.18 0.0002
Modeling with Networks: Behaviors
Dyad QAP models

Like the basic Peer influence model, cases in dyad models are not
independent. However, the non-independence now comes from two sources:
(1) the fact that the same person is represented in (n-1) dyads and (2) that i and
j are linked through relations.

One of the best solutions to this problem is QAP: Quadratic Assignment


Procedure. A non-parametric procedure for significance testing.

QAP runs the model of interest on the real data, then randomly permutes the
rows/cols of the data matrix and estimates the model again. In so doing, it
generates an empirical distribution of the coefficients, generating n levels of
the coefficients at ‘chance’ levels, which you then compare to the observed
data. This is implemented in UCINET for regression, and in DAMN for
logistic regression (J.L. Martin).
Modeling with Networks: Behaviors
Dyad QAP models

Procedure:
1. Calculate the observed association / model
2. for K iterations do:
a) randomly sort one of the matrices
b) recalculate the association / model
c) store the outcome
3. compare the observed outcome to the distribution of
outcomes created by the random permutations.
Modeling with Networks: Behaviors
Dyad QAP models

Comparing multiple networks: QAP


Modeling with Networks: Behaviors
Dyad QAP models
Modeling with Networks: Behaviors
Dyad QAP models

MULTIPLE REGRESSION QAP W/ MISSING VALUES


--------------------------------------------------------------------------------

# of permutations: 2000
Diagonal valid? NO
Random seed: 533
Dependent variable: EX_SIM
Expected values: c:\moody\Classes\soc884\examples\UCINET\mrqap-predicted
Independent variables: EX_NCOM
EX_ADJ
EX_SRCE
EX_SSEX

Number of valid observations among the X variables = 72


N = 72
Number of permutations performed: 1999

MODEL FIT
R-square Adj R-Sqr Probability # of Obs
-------- --------- ----------- -----------
0.545 0.525 0.029 72

REGRESSION COEFFICIENTS
Un-stdized Stdized Proportion Proportion
Independent Coefficient Coefficient Significance As Large As Small
----------- ----------- ----------- ------------ ----------- -----------
Intercept 0.519314 0.000000 0.012 0.012 0.988
EX_NCOM -0.161337 -0.541828 0.011 0.989 0.011
EX_ADJ -0.170539 -0.381186 0.020 0.980 0.020
EX_SRCE 0.053864 0.124551 0.236 0.236 0.764
EX_SSEX -0.065364 -0.151144 0.180 0.820 0.180

Note that the coefficient values will be identical, but the p values differ
Modeling with Networks: Behaviors
Dyad QAP models

A substantive question raised with any kind of network autocorrelation


model is whether observed associations between network structure and
behaviors is due to selection or influence.

Theory is your best friend here, as there is no fool proof method to


distinguish the two.

However, recent work has made great progress using individual-level


fixed effect models (sometimes random effects models), where the
network features vary over time. This removes any stable characteristic
that might account for selection into a particular group.
Modeling with Networks: Structure
Dyad QAP models

While the most common way to use QAP models is to predict the
similarity on some substantive variable, one can just as easily predict the
presence/absence of a relation given attribute similarity.

This makes it possible to model the network itself, and ask questions about
how particular structures form.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

A long research tradition in statistics and random graph theory has lead to
parametric models of networks.

These are models of the entire graph, though as we will see they often work on
the dyads in the graph to be estimated.

Substantively, the approach is to ask whether the graph in question is an element


of the class of all random graphs with the given known elements. For example,
all graphs with 5 nodes and 3 edges, or, put probabilistically, the probability of
observing the current graph given the conditions.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

The earliest approaches are based on simple random graph theory, but there’s
been a flurry of activity in the last 10 years or so.

Key references:
- Holland and Leinhardt (1981) JASA
- Frank and Strauss (1986) JASA
- Wasserman and Faust (1994) – Chap 15 & 16
- Wasserman and Pattison (1996)

Thanks to Mark Handcock for sharing some figures/slides about these models.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

exp{ z ( x)}
p( X  x) 
 ( )
Where:
 is a vector of parameters (like regression coefficients)
z is a vector of network statistics, conditioning the graph
 is a normalizing constant, to ensure the probabilities sum to 1.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

The simplest graph is a Bernoulli random graph,where each Xij is


independent:

exp{ ij xij }
p( X  x) 
i, j

 ( )
Where:
ij = logit[P(Xij = 1)]
() =P[1 + exp(ij )]

Note this is one of the few cases where () can be written.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

Typically, we add a homogeneity condition, so that all isomorphic


graphs are equally likely. The homogeneous bernulli graph model:

exp  { xij }
p( X  x) 
i, j

 ( )

Where:
() =[1 + exp()]g
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

If we want to condition on anything much more complicated than density, the


normalizing constant ends up being a problem. We need a way to express the
probability of the graph that doesn’t depend on that constant. It turns out we
can do this by conditioning on a ‘complement’ graph.

First some terms:

X i, j  Sociomatri x with ij element forced to 1


X i, j  Sociomatri x with ij element forced to 0
X ic, j  Sociomatri x with no tie between i and j
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

After some algebra:

 p( X ij  1 | X ijc ) 
 ij  log  c 

  [ z ( xij )  z ( xij )]
 

 p( X ij  0 | X ij ) 

Note that we can now model the conditional probability of the graph,
as a function of a set of difference statistics, without reference to the
normalizing constant.

The model, then, simply reduces to a logit model on the dyads.


Modeling with Networks: Structure
Exponential Random Graph Models (p*)

Fitting p* models

I highly recommend working through the p* primer examples, which can be


found at:

http://kentucky.psych.uiuc.edu/pstar/index.html

Including:
A Practical Guide To Fitting p* Social Network Models
Via Logistic Regression

The site includes the PREPSTAR program for creating the difference variables
of interest.
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

1 2 3 |4 5 6
1 1 1  1
2
2 1 1 
 
3 1 1 1 3
6
x  
4 1 1 4
5 1  5
 
6 1 1 

We can model this network based on parameters for overall degree of Choice
(), Differential Choice Within Positions (W), Mutuality(), Differential
Mutuality Within Positions (W), and Transitivity (T).

The vector of model parameters to be estimated is:  = {  W  W T }.


Modeling with Networks: Structure
Exponential Random Graph Models (p*) 1
2

proc logistic descending ;


tie = l lw m mw tt / noint; 3
6
run;
4

L = Choice
LW = Within Group
M = Mutuality
MW = Mutual within Group
TT = Transitivity

Substantively, this graph is likely from the random class of graphs with similar mutuality and size
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

One practical problem is that the resulting values are often quite correlated,
making estimation difficult. This is particularly difficult with “star”
parameters.

lw m mw tt

lw 1.00000 0.58333 0.80178 0.15830


0.0007 <.0001 0.4034

m 0.58333 1.00000 0.80178 -0.02435


0.0007 <.0001 0.8984

mw 0.80178 0.80178 1.00000 -0.11716


<.0001 <.0001 0.5375

tt 0.15830 -0.02435 -0.11716 1.00000


0.4034 0.8984 0.5375
Modeling with Networks: Structure
Exponential Random Graph Models (p*)

Parameters that are often fit include:


1) Expansiveness and attractiveness parameters. = dummies for
each sender/receiver in the network
2) Degree distribution
3) Mutuality
4) Group membership (and all other parameters by group)
5) Transitivity / Intransitivity
6) K-in-stars, k-out-stars
7) Cyclicity
Modeling with Networks: Structure
Comparing to Random Graphs

A conceptual merge between random graph models and QAP models is to identify a
sample of graphs from the universe you are trying to model. So, instead of
estimating:

exp{ z ( x)}
p( X  x) 
 ( )

generate X empirically, then compare z(x) to see how likely a measure on x would
be given X. The difficulty, however, is generating X.
Modeling with Networks: Structure
Comparing to Random Graphs

The first option would be to generate all isomorphic graphs within a given
constraint.

This is possible for small graphs, but the number gets large fast. For a
network with 3 nodes, there are 16 possible directed graphs. For a
network with 4 nodes, there are 218, for 5 nodes 9608, for 6
nodes1,540,944, and so on…

So, the best approach is to sample from the universe, but, of course, if you
had the universe you wouldn’t need to sample from it. How do you
sample from a population you haven’t observed?

Use a construction algorithm that generates a random graph with known


constraints.
Modeling with Networks: Structure
Comparing to Random Graphs
Example: Bearman, Peter S., James Moody and Katherine Stovel (2004) “Chains of Affection:
The Structure of Adolescent Romantic and Sexual Networks” American Journal of Sociology
110:44:92
Romantic Relations in Jefferson High
Modeling with Networks: Structure
Comparing to Random Graphs

Simulate random networks with similar degree distribution:


Modeling with Networks: Structure
Comparing to Random Graphs
Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint
Modeling with Networks: Structure
Comparing to Random Graphs
Simulated networks preserve observed degree, isolated dyad
distribution, and four-cycle constraint: 4 examples from the
simulated set
Social Network Software

UCINET
•The Standard network analysis program, runs in Windows
•Good for computing measures of network topography for single nets
•Input-Output of data is a special 2-file format, but is now able to read
PAJEK files directly.
•Not optimal for large networks
•Available from:
Analytic Technologies
Social Network Software

PAJEK
•Program for analyzing and plotting very large networks
•Intuitive windows interface
•Used for most of the real data plots in this presentation
•Started mainly a graphics program, but has expanded to a wide range of
analytic capabilities
•Can link to the R statistical package
•Free
•Available from:
Social Network Software
Cyram Netminer for Windows
•Newest Product, not yet widely used
•Price range depends on application
•Limited to smaller networks O(100)

http://www.netminer.com/NetMiner/home_01.jsp
Social Network Software

NetDraw
•Also very new, but by one of the best known names
in network analysis software.
•Free
•Limited to smaller networks O(100)
Social Network Software

NEGOPY
•Program designed to identify cohesive sub-groups in a network,
based on the relative density of ties.
•DOS based program, need to have data in arc-list format
•Moving the results back into an analysis program is difficult.
•Available from:
William D. Richards
http://www.sfu.ca/~richards/Pages/negopy.htm

SPAN - Sas Programs for Analyzing Networks (Moody, ongoing)


•is a collection of IML and Macro programs that allow one to:
a) create network data structures from nomination data
b) import/export data to/from the other network programs
c) calculate measures of network pattern and composition
d) analyze network models
•Allows one to work with multiple, large networks
•Easy to move from creating measures to analyzing data
•Available by sending an email to:
Moody.77@sociology.osu.edu