
BIO/CS 471 Algorithms for bioinformatics

Graph Theoretic
Concepts and Algorithms
for Bioinformatics

Intro. to Graph Theory

What is a graph
Formally: A finite graph G(V, E) is a pair (V, E),
where V is a finite set and E is a binary relation on V.
Recall: A relation R between two sets X and Y is a subset of X × Y.
For each selection of two distinct vertices from V, that pair is either in set E or not in set E.

The elements of the set V are called vertices (or nodes) and those of set E are called edges.
Undirected graph: The edges are unordered pairs of
V (i.e. the binary relation is symmetric).

Ex: undirected G(V,E); V = {a,b,c}, E = {{a,b}, {b,c}}

Directed graph (digraph): The edges are ordered pairs of V (i.e. the binary relation is not necessarily symmetric).

Ex: digraph G(V,E); V = {a,b,c}, E = {(a,b), (b,c)}

Why graphs?
Many problems can be stated in terms of a graph
The properties of graphs are well-studied
Many algorithms exist to solve problems posed as graphs
Many problems are already known to be intractable

By reducing an instance of a problem to a standard graph problem, we may be able to use well-known graph algorithms to provide an optimal solution
Graphs are excellent structures for storing, searching, and
retrieving large amounts of data
Graph theoretic techniques play an important role in increasing the
storage/search efficiency of computational techniques.

Graphs are covered in section 2.2 of Setubal & Meidanis



Graphs in bioinformatics
Sequences
DNA, proteins, etc.

Chemical compounds


Metabolic pathways


Phylogenetic trees

Basic definitions
[Figure: an undirected graph and a directed graph G=(V,E), with labels marking loops, multiple edges, an isolated vertex, and adjacent vertices.]

incidence: an edge (directed or undirected) is incident to a vertex that is one of its end points.
degree of a vertex: number of edges incident to it
Nodes of a digraph can also be said to have an indegree and an outdegree

adjacency: two vertices connected by an edge are adjacent
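
The degree definitions above can be sketched in a few lines of Python. This is only an illustration: the edge list and the function name `degrees` are made up for the example, and the digraph is assumed to be stored as a list of (from, to) pairs.

```python
from collections import Counter

def degrees(edges):
    """Return (indegree, outdegree) counters for a list of directed edges."""
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1   # the edge leaves u
        indeg[v] += 1    # the edge enters v
    return indeg, outdeg

# Hypothetical digraph; (c, c) is a loop
edges = [("a", "b"), ("b", "c"), ("c", "c")]
indeg, outdeg = degrees(edges)
print(indeg["c"], outdeg["c"])  # indegree and outdegree of c
```

In an undirected graph the degree of a vertex is simply the number of incident edges; a loop is usually counted twice.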



Travel in graphs
path: no vertex can be repeated
example path: a-b-c-d-e
trail: no edge can be repeated
example trail: a-b-c-d-e-b-d
walk: no restriction
example walk: a-b-d-a-b-c

closed: if starting vertex is also ending vertex


length: number of edges in the path, trail, or walk
circuit: a closed trail (ex: a-b-c-d-b-e-d-a)
cycle: closed path (ex: a-b-c-d-a)
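
A small sketch of how the path / trail / walk distinction could be checked in Python. The function name `classify` is hypothetical, and the code assumes the vertex sequence is already a valid walk in some undirected graph (it does not check that the edges exist).

```python
def classify(seq):
    """Classify a vertex sequence as 'path', 'trail', or 'walk'."""
    edges = [frozenset(pair) for pair in zip(seq, seq[1:])]
    # a closed sequence is allowed to repeat its starting vertex at the end
    inner = seq[:-1] if seq[0] == seq[-1] else seq
    no_repeated_edge = len(set(edges)) == len(edges)
    no_repeated_vertex = len(set(inner)) == len(inner)
    if no_repeated_vertex and no_repeated_edge:
        return "path"
    if no_repeated_edge:
        return "trail"
    return "walk"

print(classify(list("abcde")))    # the example path
print(classify(list("abcdebd")))  # the example trail
print(classify(list("abdabc")))   # the example walk
```

A closed sequence classified as "path" is a cycle, and a closed sequence classified as "trail" is a circuit.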

Types of graphs

simple graph: an undirected graph with no loops or multiple edges between the same two vertices
multi-graph: any graph that is not simple
connected graph: all vertex pairs are joined by a path
disconnected graph: at least one vertex pair is not joined by a path
complete graph: all vertex pairs are adjacent
Kn: the complete (fully connected) graph with n vertices

[Figure: examples of a simple graph, the complete graph K5, and a disconnected graph with two components.]


Types of graphs
acyclic graph (forest): a graph with no cycles
tree: a connected, acyclic graph
rooted tree: a tree with a root or distinguished vertex
leaves: the terminal nodes of a rooted tree

directed acyclic graph (DAG): a digraph with no cycles
weighted graph: any graph with weights associated with the edges (edge-weighted) and/or the vertices (vertex-weighted)


Digraph definitions
(for digraphs only)
Every edge has a head (starting point) and a tail (ending point); note that many texts use the opposite naming, with the arrow pointing from tail to head
Walks, trails, and paths can only use edges in the appropriate direction
In a DAG, every path connects a predecessor/ancestor (the vertex at the head of the path) to its successors/descendants (the nodes at the tail of any path).
parent: direct ancestor (one hop)
child: direct descendant (one hop)
A descendant vertex is reachable from any of its ancestor vertices
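
The reachability statement can be made concrete with a short Python sketch. The DAG and the name `descendants` are hypothetical; the graph is assumed to be stored as a dictionary mapping each vertex to a list of its children.

```python
def descendants(children, v):
    """Return every vertex reachable from v by following directed edges."""
    seen = set()
    stack = list(children.get(v, []))
    while stack:
        w = stack.pop()
        if w not in seen:
            seen.add(w)
            stack.extend(children.get(w, []))
    return seen

# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(descendants(children, "a"))
```

The ancestors of a vertex can be found the same way by following the edges backwards (a dictionary of parents instead of children).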

Computer representation
undirected graphs: usually represented as digraphs with two directed edges per actual undirected edge.
adjacency matrix: a |V| x |V| array where each cell i,j contains the weight of the edge from vi to vj (or 0 for no edge)
adjacency list: a |V| array where each cell i contains a list of all vertices adjacent to vi
incidence matrix: a |V| by |E| array where each cell i,j contains a weight (or a defined constant HEAD for unweighted graphs) if vertex i is the head of edge j, or a constant TAIL if vertex i is the tail of edge j

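Example: a weighted digraph on vertices a, b, c, d with edges (a,c) 8, (a,d) 4, (c,b) 6, (d,c) 2, (d,b) 10

adjacency matrix
       a    b    c    d
  a    0    0    8    4
  b    0    0    0    0
  c    0    6    0    0
  d    0   10    2    0

adjacency list
  a: c (8), d (4)
  b:
  c: b (6)
  d: c (2), b (10)

incidence matrix (one column per edge; t = TAIL, . = not incident)
       (a,c)  (a,d)  (c,b)  (d,c)  (d,b)
  a      8      4      .      .      .
  b      .      .      t      .      t
  c      t      .      6      t      .
  d      .      t      .      2     10
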
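
As a rough illustration, the adjacency matrix and adjacency list can both be built from one edge list. The sketch below uses the edges given in the adjacency list of the example; the variable names are assumptions made for this example.

```python
vertices = ["a", "b", "c", "d"]
# (from, to, weight), read off the adjacency list of the example
edges = [("a", "c", 8), ("a", "d", 4), ("c", "b", 6), ("d", "c", 2), ("d", "b", 10)]

index = {v: i for i, v in enumerate(vertices)}

# adjacency matrix: 0 means "no edge"
matrix = [[0] * len(vertices) for _ in vertices]
for u, v, w in edges:
    matrix[index[u]][index[v]] = w

# adjacency list: vertex -> list of (neighbor, weight)
adj = {v: [] for v in vertices}
for u, v, w in edges:
    adj[u].append((v, w))

for v, row in zip(vertices, matrix):
    print(v, row)
```

The matrix takes space proportional to |V| squared but answers "is there an edge from vi to vj?" in constant time; the list takes space proportional to |V| + |E|, which is usually preferable for sparse graphs.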

Computer representation
Linked list of nodes: Node is a defined data object with labels which include a list of pointers to its children and/or parents

Graph = []  # list of nodes

class Node:
    def __init__(self, label=None):
        self.label = label
        self.parents = []           # list of nodes coming into this node
        self.children = []          # list of nodes coming out of this node
        self.childEdgeWeights = []  # edge weights, in the same order as children


Subgraphs
G'(V',E') is a subgraph of G(V,E) if V' ⊆ V and E' ⊆ E.
induced subgraph: a subgraph that contains all edges in E whose end points are both vertices of the selected V'
[Figure: a graph G(V,E); a subgraph G'({a,c,d},{{c,d}}); and the induced subgraph of G with V' = {b,c,d,e}.]

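
A minimal sketch of the induced-subgraph definition in Python. The graph below is hypothetical (it is not the one in the figure), each undirected edge is stored as a frozenset of its two end points, and the function name `induced_subgraph` is made up for the example.

```python
def induced_subgraph(edges, selected):
    """Keep exactly those edges of G whose end points are both selected."""
    selected = set(selected)
    return {e for e in edges if set(e) <= selected}

# Hypothetical undirected graph: the cycle a-b-c-d-e-a
E = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]}

print(induced_subgraph(E, {"b", "c", "d", "e"}))
```

Any subset of the returned edge set, on the same vertices, is still a subgraph, but only the full set is the induced subgraph.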

Complement of a graph
The complement of a graph G(V,E) is a graph with the same vertex set, but with two vertices adjacent only if they were not adjacent in G(V,E)
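
For a simple undirected graph, the complement can be sketched as "all vertex pairs minus the existing edges". The function name `complement` is an assumption; the example reuses the undirected graph V = {a,b,c}, E = {{a,b}, {b,c}} from the first definitions.

```python
from itertools import combinations

def complement(vertices, edges):
    """Edges of the complement: every vertex pair that is NOT an edge of G."""
    all_pairs = {frozenset(p) for p in combinations(vertices, 2)}
    return all_pairs - set(edges)

V = ["a", "b", "c"]
E = {frozenset(("a", "b")), frozenset(("b", "c"))}

print(complement(V, E))  # the single missing pair {a, c}
```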

Famous problems: Shortest path


Consider a weighted, connected, directed graph with a distinguished vertex called the source
source: a distinguished vertex with zero in-degree

What is the path of minimum total weight from the source to any other vertex?
A greedy strategy works when no edge weight is negative (cycles are then harmless, since going around one never shortens a path)
Longest path is a similar problem when the graph has no cycles (negate the weights); on general graphs it is intractable
We will see this again soon for fragment assembly!


Dijkstra's Algorithm

D(x) = distance from s to x (initially all ∞, except D(s) = 0)

1. Select the closest not-yet-selected vertex to s, according to the current estimate (call it c)
2. Recompute the estimate for every other vertex, x, as the MINIMUM of:
   1. The current distance D(x), or
   2. The distance from s to c, plus the distance from c to x: D(c) + W(c, x)

Repeat steps 1 and 2 until every vertex has been selected.
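
The two steps can be sketched in Python as follows. This is one possible implementation, not necessarily the one used in the course: it keeps the estimates in a priority queue (`heapq`) instead of scanning all vertices for the closest one, and it assumes all weights are non-negative. The example graph is the weighted digraph from the computer representation slide; the dictionary layout is an assumption.

```python
import heapq

def dijkstra(graph, s):
    """graph: dict vertex -> {neighbor: weight}, all weights non-negative."""
    D = {v: float("inf") for v in graph}   # D(x), initially infinity
    D[s] = 0
    done = set()
    heap = [(0, s)]
    while heap:
        d, c = heapq.heappop(heap)         # step 1: closest vertex by current estimate
        if c in done:
            continue                       # stale heap entry, c was already selected
        done.add(c)
        for x, w in graph[c].items():      # step 2: D(x) = min(D(x), D(c) + W(c, x))
            if d + w < D[x]:
                D[x] = d + w
                heapq.heappush(heap, (D[x], x))
    return D

graph = {"a": {"c": 8, "d": 4}, "b": {}, "c": {"b": 6}, "d": {"c": 2, "b": 10}}
print(dijkstra(graph, "a"))
```

Here the direct edge from a to c (weight 8) loses to the route through d (4 + 2 = 6), which in turn improves the estimate for b.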


Dijkstra's Algorithm Example

[Figure: step-by-step trace of Dijkstra's algorithm on an example graph, showing the distance estimates initially and after processing vertices A, C, B, D, and E.]


Famous problems: Isomorphism


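Two graphs are isomorphic if a 1-to-1 correspondence between their vertex sets exists that preserves adjacencies
Determining if two graphs are isomorphic is in NP, but it is not known to be NP-complete or to be solvable in polynomial time (the related subgraph isomorphism problem is NP-complete)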

Famous problems: Maximal clique

clique: a complete subgraph
maximal clique: a clique not contained in any other clique
maximum clique: the largest complete subgraph in the graph
vertex cover: a subset of vertices such that each edge in E has at least one end-point in the subset
clique cover: vertex set divided into (not necessarily disjoint) subsets, each of which induces a clique
clique partition: a disjoint clique cover


Maximal cliques: {1,2,3}, {1,3,4}
Vertex cover: {1,3}
Clique cover: { {1,2,3}, {1,3,4} }
Clique partition: { {1,2,3}, {4} }
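
These examples can be checked with a brute-force Python sketch. The edge set below is an assumption: it is the smallest graph on vertices 1 to 4 consistent with the listed maximal cliques, since the original figure is not available. The function names are hypothetical, and the enumeration is exponential, so it is only suitable for tiny graphs.

```python
from itertools import combinations

# Assumed example graph: edges implied by the cliques {1,2,3} and {1,3,4}
edges = {frozenset(e) for e in [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4)]}
vertices = {1, 2, 3, 4}

def is_clique(vs):
    """Every pair of the given vertices must be adjacent."""
    return all(frozenset(p) in edges for p in combinations(vs, 2))

def is_vertex_cover(vs):
    """Every edge must have at least one end point in vs."""
    return all(e & set(vs) for e in edges)

def maximal_cliques():
    """Brute force: cliques that are not a proper subset of another clique."""
    cliques = [set(c) for r in range(1, len(vertices) + 1)
               for c in combinations(sorted(vertices), r) if is_clique(c)]
    return [c for c in cliques if not any(c < other for other in cliques)]

print(maximal_cliques())
print(is_vertex_cover({1, 3}))
```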

Famous problems: Coloring


vertex coloring: labeling the vertices such that no edge in E has two end-points with the same label
chromatic number: the smallest number of labels for a coloring of a graph
What is the chromatic number of this graph?

Would you believe that this problem (in general) is intractable?
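
Because finding the chromatic number is intractable in general, a common shortcut is a greedy coloring, which always produces a valid coloring but only an upper bound on the chromatic number. The sketch below is such a heuristic, not an exact algorithm; the graph (a 5-cycle) and the function name are hypothetical.

```python
def greedy_coloring(adj):
    """Give each vertex the smallest color not used by an already-colored neighbor."""
    color = {}
    for v in adj:
        used = {color[w] for w in adj[v] if w in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# Hypothetical graph: the 5-cycle a-b-c-d-e-a
adj = {"a": ["b", "e"], "b": ["a", "c"], "c": ["b", "d"],
       "d": ["c", "e"], "e": ["d", "a"]}
coloring = greedy_coloring(adj)
print(coloring)
```

The number of colors the greedy method uses depends on the order in which the vertices are visited, so different orders can give different results on the same graph.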


Famous problems: Hamilton & TSP


Hamiltonian path: a path through a graph which contains every vertex exactly once
Finding a Hamiltonian path is another NP-complete problem
Traveling Salesman Problem (TSP): find a Hamiltonian path of minimum cost (in the classic formulation, a closed tour that returns to the starting vertex)

Famous problems: Bipartite graphs


Bipartite: any graph whose vertices can be partitioned into two disjoint sets so that every edge has one endpoint in each set.
How colorable is a bipartite graph?
Can you come up with an algorithm to determine if a graph is
bipartite or not?
Is this problem tractable or intractable?
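
One possible answer to the algorithm question, sketched in Python: try to assign the vertices to two sides with a breadth-first search, and report failure when an edge ends up with both end points on the same side. The function name `is_bipartite` and both test graphs are assumptions for illustration.

```python
from collections import deque

def is_bipartite(adj):
    """Try to 2-color the graph with BFS; this fails exactly when an odd cycle exists."""
    side = {}
    for start in adj:                 # handle every connected component
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in side:
                    side[w] = 1 - side[v]   # put w on the opposite side
                    queue.append(w)
                elif side[w] == side[v]:
                    return False            # edge inside one side
    return True

square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(is_bipartite(square), is_bipartite(triangle))
```

Each vertex and edge is examined a bounded number of times, so the running time is proportional to |V| + |E|.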

[Figure: the complete bipartite graph K4,4.]


Famous problems: Minimal cut set


cut set: a subset of edges whose removal causes the number of graph components to increase
vertex separation set: a subset of vertices whose removal
causes the number of graph components to increase
How would you determine the minimal cut set or vertex
separation set?
cut-sets: {(a,b),(a,c)}, {(b,d),(c,d)}, {(d,f)}, ...


Famous problem: Conflict graphs

Conflict graph: a graph where each vertex represents a concept or resource and an edge between two vertices represents a conflict between these two concepts
When the vertices represent intervals on the real line (such as time) the conflict graph is sometimes called an interval graph
A coloring of an interval graph produces a schedule that shows how to best resolve the conflicts; a minimal coloring is the best schedule
This concept is used to solve problems in the physical mapping of DNA

[Table and figure: items A through F marked (x) against columns 1 through 3, shown beside the corresponding conflict graph.]

Colors?


Famous problems: Spanning tree


spanning tree: A subset of edges, containing no cycles, that is sufficient to keep a graph connected if all other edges are removed
minimum spanning tree: A spanning tree where the sum of the
edge weights is minimum
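
One well-known way to find a minimum spanning tree is Kruskal's algorithm: take the edges from cheapest to most expensive and keep an edge only if it joins two different components. The slides do not name a specific algorithm, so this choice, the function name, and the example graph are all assumptions.

```python
def kruskal(vertices, edges):
    """edges: list of (weight, u, v). Returns the edges of a minimum spanning tree."""
    parent = {v: v for v in vertices}

    def find(v):                      # representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):     # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components, so no cycle
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# Hypothetical weighted graph (not the one in the figure)
edges = [(2, "a", "b"), (4, "a", "c"), (1, "b", "c"), (8, "c", "d"), (3, "b", "d")]
mst = kruskal("abcd", edges)
print(mst, sum(w for w, _, _ in mst))
```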

Famous problems: Euler circuit


G is said to have an Euler circuit if there is a circuit in G that traverses every edge in the graph exactly once
The seven bridges of Königsberg: Find a way to walk about the city so as to cross each bridge exactly once and then return to the starting point.
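
Euler's classical result is that a connected graph has an Euler circuit exactly when every vertex has even degree. The Python sketch below applies that test to the Königsberg bridges; the assignment of the letters a to d to the four land areas is an assumption (a is taken to be the area touched by five bridges), and the function assumes the graph has at least one edge.

```python
from collections import Counter

def has_euler_circuit(edges):
    """A connected multigraph has an Euler circuit iff every vertex has even degree."""
    degree = Counter()
    adj = {}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # connectivity check over the vertices that have edges
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(adj) and all(d % 2 == 0 for d in degree.values())

# Königsberg: 4 land areas, 7 bridges
bridges = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "c"),
           ("a", "d"), ("b", "d"), ("c", "d")]
print(has_euler_circuit(bridges))
```

All four land areas have odd degree (5, 3, 3, 3), so the requested walk does not exist.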

[Figure: map of the Königsberg bridges and the corresponding graph, with land areas a, b, c, and d as vertices and bridges as edges.]

This one is in P!


Famous problems: Dictionary


How can we organize a dictionary for fast lookup?
[Figure: a 26-ary trie; every node has one child slot per letter a through z, with the word CAB shown as an example entry.]

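
A 26-ary trie can be sketched in Python as below. The class and function names are assumptions, and the code only handles the letters a to z (upper case is folded to lower case).

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26   # one slot per letter a..z
        self.is_word = False          # True if a dictionary word ends here

def insert(root, word):
    node = root
    for ch in word.lower():
        i = ord(ch) - ord("a")
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.is_word = True

def contains(root, word):
    node = root
    for ch in word.lower():
        node = node.children[ord(ch) - ord("a")]
        if node is None:
            return False
    return node.is_word

root = TrieNode()
insert(root, "CAB")
print(contains(root, "cab"), contains(root, "ca"))
```

Lookup time depends only on the length of the word, not on how many words the dictionary holds.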

Graph traversal
There are many strategies for solving graph problems; for many problems, the efficiency and accuracy of the solution boil down to how you search the graph.
We will consider a travel problem as an example:
Given the graph below, find a path from vertex a to vertex d.
Shorter paths (in terms of edge weight sums) are desirable.

[Figure: the weighted example graph on vertices a through f.]


A greedy approach
greedy traversal: Starting with the root node, take the edge with smallest weight. Mark the edge so that you never attempt to use it again. If you get to the end, great! If you get to a dead end, back up one decision and try the next best edge.
Advantages: Fast! Drawbacks: Answer is often non-optimal
For some problems, greedy approaches are optimal; for others, the answer is usually close to the best answer; for yet other problems, the greedy strategy is a poor choice.
Start node: a
End node: d
Traversal order: a, c, f, e, b, d


Exhaustive search: Breadth-first

For the current node, do any necessary work
    In this case, calculate the cost to get to the node by the current path; if the cost is better than any previous path, update the best path and lowest cost.
Place all adjacent unused edges in a queue (FIFO)
Take an edge from the queue, mark it as used, and follow it to the new current node
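
A simplified Python sketch of breadth-first traversal. It queues vertices rather than edges, which is a common variant and not exactly the procedure above. The example graph is an assumption: an undirected graph whose edges are inferred from the path costs in the branch-and-bound table of this lecture, so the visiting order printed here depends on the neighbor ordering chosen below and need not match the order listed on the slide.

```python
from collections import deque

def bfs_order(adj, start):
    """Visit vertices level by level using a FIFO queue."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)               # "do any necessary work" goes here
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

# Assumed undirected example graph
adj = {"a": ["b", "c", "e"], "b": ["a", "d", "e"], "c": ["a", "f"],
       "d": ["b"], "e": ["a", "b", "f"], "f": ["c", "e"]}
print(bfs_order(adj, "a"))
```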


Traversal order: a, b, c, d, e, f


Exhaustive search: Depth-first


For each current node
do any necessary work
Pick one unused edge out and follow it to a new current node
If no unused edges exist, unmark all of your edges and go back from whence you came!

DFS (G, v) {
    v.state = visited
    Process vertex v
    Foreach edge (v,w) {
        if w.state = unseen {
            DFS (G, w)
            process edge (v,w)
        }
    }
}

Traversal order: a, b, d, e, f, c
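
The DFS pseudocode translates almost line for line into Python. The graph below is an assumption (an undirected graph whose edges are inferred from the path costs in the branch-and-bound table, with neighbors listed alphabetically); with that ordering the sketch happens to reproduce the traversal order given on the slide.

```python
def dfs(adj, v, state=None, order=None):
    """Recursive depth-first search; returns the vertices in visiting order."""
    state = {} if state is None else state
    order = [] if order is None else order
    state[v] = "visited"
    order.append(v)                    # process vertex v
    for w in adj[v]:                   # for each edge (v, w)
        if state.get(w, "unseen") == "unseen":
            dfs(adj, w, state, order)
    return order

# Assumed undirected example graph
adj = {"a": ["b", "c", "e"], "b": ["a", "d", "e"], "c": ["a", "f"],
       "d": ["b"], "e": ["a", "b", "f"], "f": ["c", "e"]}
print(dfs(adj, "a"))
```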

Branch and Bound

Begin a depth-first search (DFS)
Once you achieve a successful result, note the result as your initial best result
Continue the DFS; if you find a better result, update the best result
At each step of the DFS, compare your current cost to the cost of the current best result; if you already exceed the cost of the best result, stop the downward search! Mark all edges as used, and head back up.
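
A minimal Python sketch of branch and bound for the travel problem (cheapest path from a to d). The edge weights are an assumption: they are inferred from the path costs in the traversal table and treated as undirected. The pruning step is only valid because all weights are non-negative. The function name is hypothetical.

```python
def branch_and_bound(graph, start, goal):
    """Depth-first search that abandons any partial path no cheaper than the best so far."""
    best_cost, best_path = float("inf"), None

    def dfs(node, cost, path):
        nonlocal best_cost, best_path
        if cost >= best_cost:
            return                     # prune: cannot beat the current best
        if node == goal:
            best_cost, best_path = cost, list(path)
            return
        for nxt, w in graph[node].items():
            if nxt not in path:        # do not revisit a vertex on this path
                path.append(nxt)
                dfs(nxt, cost + w, path)
                path.pop()

    dfs(start, 0, [start])
    return best_cost, best_path

# Assumed weights: a-b 3, a-c 1, a-e 2, b-d 5, b-e 4, c-f 6, e-f 7
graph = {"a": {"b": 3, "c": 1, "e": 2}, "b": {"a": 3, "d": 5, "e": 4},
         "c": {"a": 1, "f": 6}, "d": {"b": 5},
         "e": {"a": 2, "b": 4, "f": 7}, "f": {"c": 6, "e": 7}}
print(branch_and_bound(graph, "a", "d"))
```

The order in which neighbors are tried changes how much gets pruned, but not the final answer.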

Traversal order:

Path    Current   Best
A       0
AE      2
AEB     6
AEBD    11        11
AEF     9         11
AEFC    15        11
AC      1         11
ACF     7         11
ACFE    15        11   < prune
AB      3         11
ABD     8         8
ABE     7         8
ABEF    14        8

Binary search trees

Binary trees have at most two children per node (a child may be null)
Binary search trees are organized so that each node has a label.
When searching for or inserting a value, compare the target value to each node's label; one out-going edge corresponds to less than and one out-going edge corresponds to greater than.
On average, you eliminate 50% of the search space per node if the tree is balanced
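
Insertion and search can be sketched in Python as follows. The class and function names are assumptions, duplicates are ignored, and no rebalancing is done. The values are the labels that appear in the example figure; the insertion order is an assumption.

```python
class BSTNode:
    def __init__(self, label):
        self.label = label
        self.left = None    # subtree of smaller labels
        self.right = None   # subtree of larger labels

def insert(node, value):
    """Insert value below node and return the (possibly new) subtree root."""
    if node is None:
        return BSTNode(value)
    if value < node.label:
        node.left = insert(node.left, value)
    elif value > node.label:
        node.right = insert(node.right, value)
    return node

def search(node, value):
    """Follow the less-than / greater-than edges until the value is found or we fall off."""
    while node is not None and node.label != value:
        node = node.left if value < node.label else node.right
    return node is not None

root = None
for value in [5, 3, 8, 2, 6, 10, 1, 7]:
    root = insert(root, value)
print(search(root, 7), search(root, 4))
```

If the values are inserted in sorted order, the tree degenerates into a linked list and each comparison eliminates only one node, which is why balance matters.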

[Figure: an example binary search tree holding the values 1, 2, 3, 5, 6, 7, 8, and 10.]
