
Algorithms: CSE 202 — Homework III

Problem 1: Job scheduling (KT 7.41)


Suppose you’re managing a collection of processors and must schedule a sequence of jobs over
time.
The jobs have the following characteristics. Each job j has an arrival time aj when it is first
available for processing, a length $\ell_j$ which indicates how much processing time it needs, and a
deadline $d_j$ by which it must be finished. (We’ll assume $0 < \ell_j \le d_j - a_j$.) Each job can be run on
any of the processors, but only on one at a time; it can also be preempted and resumed from where
it left off (possibly after a delay) on another processor.
Moreover, the collection of processors is not entirely static either: You have an overall pool of k
possible processors; but for each processor i, there is an interval of time $[t_i, t'_i]$ during which it is
available; it is unavailable at all other times.
Given all this data about job requirements and processor availability, you’d like to decide whether
the jobs can all be completed or not. Give a polynomial-time algorithm that either produces a
schedule completing all jobs by their deadlines or reports (correctly) that no such schedule exists.
You may assume that all the parameters associated with the problem are integers.
Example. Suppose we have two jobs J1 and J2 . J1 arrives at time 0, is due at time 4, and has
length 3. J2 arrives at time 1, is due at time 3, and has length 2. We also have two processors P1
and P2 . P1 is available between times 0 and 4; P2 is available between times 2 and 3. In this case,
there is a schedule that gets both jobs done.

• At time 0, we start job J1 on processor P1 .

• At time 1, we preempt J1 to start J2 on P1 .

• At time 2, we resume J1 on P2 . (J2 continues processing on P1 .)

• At time 3, J2 completes by its deadline. P2 ceases to be available, so we move J1 back to P1
to finish its remaining one unit of processing there.

• At time 4, J1 completes its processing on P1 .

Notice that there is no solution that does not involve preemption and moving of jobs.

Solution. The basic idea is to reformulate the problem as a (multi-)assignment problem. We are
assigning jobs to timesteps, such that the assignment satisfies the following conditions:

• A job can only be assigned to a timestep, if the time lies between the start time and the
deadline of the job.

• The number of timesteps a job has to be assigned to must be equal to the length of the job.

• For any one timestep, the number of jobs assigned is not more than the number of available
machines at that time.

More formally, let the jobs be $J = \{J_1, \dots, J_m\}$ and the processors be $P = \{P_1, \dots, P_n\}$; let $J_i$ have
arrival time $a_i$, deadline $d_i$, and length $l_i$; let $P_j$ be available from time $t_j$ to time $t'_j$.
Let T = ∪i∈[m] [ai , di ), where the interval [ai , di ) is to be interpreted as a subset of the integers.
T is the set of times during which jobs may be scheduled. (We will see that the network flow
problem we construct actually allows jobs to be scheduled at non-integer times, but that, due to the
integrality theorem, there is always an optimal solution that only schedules during integer times.)
Construct the graph G = (V, E) where

V = {s, t} ∪ J ∪ T
E = {s} × J ∪ {(Ji , j) | j ∈ [ai , di )} ∪ T × {t}

and with edge capacities

$c(s, J_i) = l_i \qquad c(J_i, j) = 1 \qquad c(j, t) = |\{P_k \mid j \in [t_k, t'_k)\}|$

The capacity of edge $(s, J_i)$ is the number of quanta of processing $J_i$ needs. The capacity of the
edge $(j, t)$ is the number of processors available at time $j$. There is an edge with capacity 1 from $J_i$
to time $j$ iff $J_i$ is available for scheduling at time $j$, i.e. $a_i \le j < d_i$.
We have $|E| = O(\sum_i (d_i - a_i))$ and a max flow has value $\le \sum_i l_i$, so a max flow can be found
with the Edmonds-Karp algorithm in time $O(\sum_i l_i \cdot \sum_i (d_i - a_i))$.
We claim that there is a valid job schedule if and only if there is a flow with value $\sum_i l_i$. We
first note that if there is such a flow, then it is a maximum flow, as the cut {s} has the same
value. Hence if there is such a flow there is also an integer flow achieving it (since all capacities
are integer). The assignment of job quanta to times is then given by the flow on the (Ji , j) edges,
and this assignment is valid since we assign every job to exactly $l_i$ timesteps, and the number
of jobs assigned to any timestep is bounded by the number of available machines. Likewise, if we
have a valid assignment, the flow that sends a flow of 1 through all edges where we assigned the
job to a timestep is a maxflow that matches the cut {s}. Note that if a time j receives k jobs, the
assignment of jobs to its ≥ k available processors can be arbitrary.
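
The construction above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function names `max_flow` and `schedule`, the tuple-based node labels, and the half-open reading of processor intervals as [t_k, t'_k) are assumptions made for the example.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow value."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:              # find the path bottleneck
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:              # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def schedule(jobs, procs):
    """jobs: list of (arrival, deadline, length); procs: list of (start, end), half-open.
    Returns {time step: [job indices run in that step]} or None if infeasible."""
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0                            # make sure the reverse edge exists
    times = sorted({j for a, d, _ in jobs for j in range(a, d)})
    for i, (a, d, length) in enumerate(jobs):
        add("s", ("job", i), length)              # c(s, J_i) = l_i
        for j in range(a, d):
            add(("job", i), ("time", j), 1)       # c(J_i, j) = 1
    for j in times:
        available = sum(1 for start, end in procs if start <= j < end)
        add(("time", j), "t", available)          # c(j, t) = number of processors at time j
    if max_flow(cap, "s", "t") != sum(length for _, _, length in jobs):
        return None
    plan = {j: [] for j in times}
    for i, (a, d, _) in enumerate(jobs):
        for j in range(a, d):
            if cap[("job", i)][("time", j)] == 0:  # unit edge saturated: job i runs at step j
                plan[j].append(i)
    return plan
```

Calling `schedule([(0, 4, 3), (1, 3, 2)], [(0, 4), (2, 3)])` encodes the example from the problem statement; the jobs listed for each time step can then be placed on the available processors in any order.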

Problem 2: Graph cohesiveness (KT 7.46)


In sociology, one often studies a graph G in which nodes represent people and edges represent
those who are friends with each other. Let’s assume for purposes of this question that friendship is
symmetric, so we can consider an undirected graph.
Now suppose we want to study this graph G, looking for a “close-knit” group of people. One way
to formalize this notion would be as follows. For a subset S of nodes, let e(S) denote the number of
edges in S, that is, the number of edges that have both ends in S. We define the cohesiveness of
S as e(S)/|S|. A natural thing to search for would be a set S of people achieving the maximum
cohesiveness.

(a) Give a polynomial-time algorithm that takes a rational number α and determines whether there
exists a set S with cohesiveness at least α.

(b) Give a polynomial-time algorithm to find a set S of nodes with maximum cohesiveness.

Solution.

• Undirected graph G = (V, E)

• For any subset S ⊆ V , let e(S) be the number of edges in E with both ends in S
• Define the cohesiveness of S as $e(S)/|S|$

0.1 Determine whether graph cohesiveness is strictly larger than a rational number α

Let α be a rational number. We design an efficient algorithm to determine whether there exists a
vertex set S with cohesiveness strictly larger than α (see footnote 1).

1. Construct a flow network G as shown in Figure 1.

2. We have the source node s and sink node t.

3. For each vertex x ∈ V , we include a node vx in the flow network.

4. For each edge (x, y) ∈ E, we include a node ux,y in the flow network.

5. We have the following edges in G:

• Edges $(s \to u_{x,y})$ with capacity 1
• Edges $(u_{x,y} \to v_x)$ and $(u_{x,y} \to v_y)$ with capacity ∞
• Edges $(v_x \to t)$ with capacity α

6. Note that there are |E| edges leaving the source s, so the capacity of min cut of G must be
≤ |E|.

7. Since α is a rational number, we can scale edge capacities to integers. The time complexity of running
the Preflow-Push Maximum-Flow algorithm on G is $O((|V| + |E|)^3)$.

Theorem 1. There exists a vertex set S with cohesiveness strictly larger than α if and only if max
flow cannot saturate all edges leaving the source s.

Proof. • Use max flow formulation, and consider min cut (A, B).

• Define S ∗ as the vertices on the source side of the min cut.

• Observe that $u_{x,y} \in A$ iff both $x \in S^*$ and $y \in S^*$:

i. Infinite capacity edges ensure that if $u_{x,y} \in A$ then $x \in A$ and $y \in A$.
ii. If $x \in A$ and $y \in A$ but $u_{x,y} \notin A$, then adding $u_{x,y}$ to A only decreases cut capacity.

• Capacity of cut (A, B):
$\mathrm{cap}(A, B) = \sum_{u_{x,y} \notin A} 1 + \sum_{v_x \in A} \alpha = \Big(\sum_{(x,y) \in E} 1 - \sum_{u_{x,y} \in A} 1\Big) + \sum_{x \in S^*} \alpha = |E| - e(S^*) + \alpha |S^*|$
Footnote 1: We will show in Section 0.2 that the cohesiveness values of all subsets of G form a finite set of discrete rational numbers D. For
an arbitrary rational number α, let β be the largest rational number in D such that β < α. Then determining whether
graph cohesiveness is at least α is equivalent to determining whether graph cohesiveness is strictly larger than β.

Figure 1: Flow network G to determine if graph cohesiveness is strictly larger than α
• ⇐: if max flow cannot saturate all edges leaving source s, then by the max-flow min-cut
theorem, cap(A, B) < |E|, so e(S ∗ ) > α|S ∗ |, i.e. the set S ∗ has cohesiveness strictly larger
than α.

• ⇒: suppose there exists a set S with cohesiveness strictly larger than α, but max flow reaches
|E| units, i.e. the capacity of the min cut is |E|. However, by forming a cut (A', B') according to
S, we get its capacity cap(A', B') = |E| − e(S) + α|S| < |E| = min cut capacity, which
contradicts the definition of a min cut.
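
As a small sanity check of the cut formula, consider a triangle. This toy instance is an assumption introduced for illustration and is not part of the original solution: V = {a, b, c}, |E| = 3, and the candidate set S = V has e(S) = 3 and cohesiveness 1.

```latex
% Triangle: |E| = 3, S = V, e(S) = 3, |S| = 3.
\begin{align*}
\alpha = \tfrac{1}{2}: &\quad \mathrm{cap}(A', B') = |E| - e(S) + \alpha |S| = 3 - 3 + \tfrac{3}{2} = \tfrac{3}{2} < 3 = |E|, \\
\alpha = 1: &\quad \mathrm{cap}(A', B') = |E| - e(S) + \alpha |S| = 3 - 3 + 3 = 3 = |E|.
\end{align*}
```

For α = 1/2 the max flow is at most 3/2, so the source edges cannot all be saturated, which matches the cohesiveness 1 > 1/2 of the triangle. For α = 1 every other choice of S gives a cut of capacity at least 3 (two vertices: 3 − 1 + 2 = 4; one vertex: 3 − 0 + 1 = 4; the empty set: 3), so the min cut equals |E| and no set has cohesiveness strictly larger than 1.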

0.2 Find a vertex set S of G with maximum cohesiveness

Since S ⊆ V, there are |V| choices for the size of a nonempty subset S. For a given size, e(S) is an integer between 0 and $\binom{|S|}{2} = \frac{|S|(|S|-1)}{2}$, so there are at most $\binom{|S|}{2} + 1$ choices
for e(S). Therefore, we enumerate all possible rational numbers α in $O(|V|^3)$ to form a set D of
cohesiveness values, and apply the algorithm in Section 0.1 for each rational number in D, from
largest to smallest.
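
Sections 0.1 and 0.2 together can be sketched as below. This is a hedged illustration under stated assumptions: the names `denser_than` and `max_cohesive_set` are hypothetical, Edmonds-Karp stands in for the Preflow-Push algorithm named in the text, the graph is assumed simple with a nonempty node list, and capacities are scaled by the denominator of α so that they stay integral.

```python
from collections import defaultdict, deque
from fractions import Fraction

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow value."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def denser_than(nodes, edges, alpha):
    """Return a set S with e(S)/|S| > alpha, or None if no such set exists.
    alpha is a non-negative Fraction p/q; all capacities are multiplied by q."""
    p, q = alpha.numerator, alpha.denominator
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0
    infinite = q * len(edges) + 1                 # larger than any possible min cut
    for x, y in edges:
        add("s", ("e", x, y), q)                  # capacity 1, scaled by q
        add(("e", x, y), ("v", x), infinite)
        add(("e", x, y), ("v", y), infinite)
    for x in nodes:
        add(("v", x), "t", p)                     # capacity alpha, scaled by q
    if max_flow(cap, "s", "t") == q * len(edges):
        return None                               # all source edges saturated
    seen, stack = {"s"}, ["s"]                    # source side of a min cut
    while stack:
        u = stack.pop()
        for v, c in cap[u].items():
            if c > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return {n[1] for n in seen if isinstance(n, tuple) and n[0] == "v"}

def max_cohesive_set(nodes, edges):
    """Try the candidate values in D from largest to smallest (Section 0.2)."""
    n = len(nodes)
    D = sorted({Fraction(e, s) for s in range(1, n + 1)
                for e in range(s * (s - 1) // 2 + 1)}, reverse=True)
    for alpha in D:
        S = denser_than(nodes, edges, alpha)
        if S is not None:
            return S                              # cohesiveness equals the value just above alpha in D
    return {nodes[0]}                             # no edges at all: every set has cohesiveness 0
```

The first α in D for which a denser set exists yields a set whose cohesiveness lies in D and exceeds α, so it must equal the next larger candidate, which is the maximum. A binary search over D would reduce the number of flow computations, at the cost of departing from the largest-to-smallest scan described above.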

Problem 3: Rounding (KT 7.39)


You are consulting for an environmental statistics firm. They collect statistics and publish the
collected data in a book. The statistics are about populations of different regions in the world and
are recorded in multiples of one million. Examples of such statistics would look like Table 1.
We will assume here for simplicity that our data is such that all row and column sums are integers.

Table 1: Examples of census statistics.

Country               A        B       C    Total
grown-up men     11.998    9.083   2.919   24.000
grown-up women   12.983   10.872   3.145   27.000
children          1.019    2.045   0.936    4.000
Total            26.000   22.000   7.000   55.000

The Census Rounding Problem is to round all data to integers without changing any row or column
sum. Each fractional number can be rounded either up or down. For example, a good rounding for
our table data would be as shown in Table 2.

Table 2: Rounding results for census statistics.

Country               A        B       C    Total
grown-up men     11.000   10.000   3.000   24.000
grown-up women   13.000   10.000   4.000   27.000
children          2.000    2.000   0.000    4.000
Total            26.000   22.000   7.000   55.000

(a) Consider first the special case when all data are between 0 and 1. So you have a matrix of
fractional numbers between 0 and 1, and your problem is to round each fraction that is between
0 and 1 to either 0 or 1 without changing the row or column sums. Use a flow computation to
check if the desired rounding is possible.

(b) Consider the Census Rounding Problem as defined above, where row and column sums are
integers, and you want to round each fractional number α to either ⌊α⌋ or ⌈α⌉. Use a flow
computation to check if the desired rounding is possible.

(c) Prove that the rounding we are looking for in (a) and (b) always exists.

Solution.

(a) Let P be the population categories and C be the countries, m = |P |, n = |C|. We model the
problem as follows.

• input: $R_i, C_j \in \mathbb{N}$ for $i \in P$, $j \in C$
• output: $x_{i,j} \in \{0, 1\}$ for $i \in P$, $j \in C$, or “not solvable” if the constraint cannot be met
• constraint: $\forall i \in P: \sum_{j \in C} x_{i,j} = R_i \;\wedge\; \forall j \in C: \sum_{i \in P} x_{i,j} = C_j$

Here Ri is the ith row sum, Cj is the jth column sum, xi,j is 1 iff we round entry (i, j) up. We
can solve this problem by reduction to network flow. Construct the digraph G = (V, E) where

V = {s, t} ∪ P ∪ C
E = {s} × P ∪ P × C ∪ C × {t}

and with edge capacities

c(s, i) = Ri c(i, j) = 1 c(j, t) = Cj .


The problem is solvable iff the max flow is $\sum_{i \in P} R_i = \sum_{j \in C} C_j$. This is true despite the integer
constraints on the variables since the Ford-Fulkerson method yields an integer flow given integer
inputs. Given an integer max flow, optimal $x_{i,j}$ can be read off the flow from i to j. There
is always a solution since the original fractional values in the matrix form an optimal flow.
(This is highly nontrivial. That the existence of a fractional solution implies the existence of
an integer solution would be very hard to prove directly, i.e. without appealing to the flow
integrality theorem.)

Note that |V| = m + n + 2, |E| = O(mn). The Edmonds-Karp algorithm can solve this problem
in time $O(|E|^2) = O(m^2 n^2)$. To see this, note that each augmenting path found increases the
flow by at least 1 and costs O(|E|) time to find. The max flow is ≤ |E| for this problem, and so
≤ |E| such augmenting paths can be found.

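(b) We can reduce this problem to part (a) as follows: For each matrix entry $x_{i,j}$, subtract $\lfloor x_{i,j} \rfloor$
from the entry itself and from $R_i$ and $C_j$. Every remaining entry lies in [0, 1) and the row and column sums are still integers, so part (a) applies. An entry that was already an integer becomes 0 and must stay 0, so its edge (i, j) is omitted from the network. Adding $\lfloor x_{i,j} \rfloor$ back to each rounded entry gives the rounding of the original table.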
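
The reduction from (b) to (a) can be sketched as follows. The function name `round_table` and the use of `fractions.Fraction` for exact arithmetic are assumptions of this example; it also assumes the input really has integer row and column sums, as the problem states.

```python
from collections import defaultdict, deque
from fractions import Fraction
from math import floor

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow value."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def round_table(table):
    """table: list of rows of Fractions with integer row and column sums.
    Returns a table of ints with the same sums, each entry rounded down or up."""
    m, n = len(table), len(table[0])
    low = [[floor(x) for x in row] for row in table]
    frac = [[table[i][j] - low[i][j] for j in range(n)] for i in range(m)]
    R = [int(sum(frac[i])) for i in range(m)]                        # residual row sums
    C = [int(sum(frac[i][j] for i in range(m))) for j in range(n)]   # residual column sums
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0
    for i in range(m):
        add("s", ("row", i), R[i])
        for j in range(n):
            if frac[i][j] > 0:                    # integer entries get no edge and stay put
                add(("row", i), ("col", j), 1)
    for j in range(n):
        add(("col", j), "t", C[j])
    if max_flow(cap, "s", "t") != sum(R):
        return None
    # A saturated unit edge (residual 0) means "round this entry up".
    return [[low[i][j] + (1 if frac[i][j] > 0 and cap[("row", i)][("col", j)] == 0 else 0)
             for j in range(n)] for i in range(m)]
```

Applied to Table 1 (with entries parsed as, e.g., `Fraction("11.998")`), this returns some valid rounding; it need not coincide with Table 2, since several roundings can preserve all sums.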

(c) We proved the existence of a solution in part (a). The input gives a fractional solution which
corresponds to a fractional flow of value $\sum_{i \in P} R_i$. Since there cannot be a larger flow, this flow
must be a maxflow. Since all capacities are integer, the flow integrality theorem implies that
there is also an integer flow with the same value $\sum_{i \in P} R_i = \sum_{j \in C} C_j$.
Besides, we can also construct a graph G = (V, E) in Figure 2, where

V = {s, t} ∪ P ∪ cells ∪ cells' ∪ C

Here cells is the set of all cells of the table, and cells' is a copy of cells.

E = {s} × P ∪ {(i, cell(i, j))} ∪ {(cell(i, j), cell(i, j)')} ∪ {(cell(i, j)', j)} ∪ C × {t}

and with edge capacities

$c(s, i) = R_i \qquad c(i, \mathrm{cell}(i, j)) = \infty \qquad c(\mathrm{cell}(i, j), \mathrm{cell}(i, j)') = 1$

$c(\mathrm{cell}(i, j)', j) = \infty \qquad c(j, t) = C_j.$

Figure 2: Flow network for proof in rounding problem

In that network, we can run the same algorithm as in part (a) to get the rounding. In
Figure 2, every cut has capacity greater than or equal to $\sum_i R_i = \sum_j C_j$. Since the cut
({s}, V − {s}) has capacity exactly $\sum_i R_i$, this value is the min-cut capacity. Because the
min-cut capacity equals the maximum flow, there always
exists a solution.

Problem 4: Database projections (KT 7.38)


You’re working with a large database of employee records. For the purposes of this question, we’ll
picture the database as a two-dimensional table T with a set R of m rows and a set C of n columns;
the rows correspond to individual employees, and the columns correspond to different attributes.
To take a simple example, we may have four columns labeled
name, phone number, start date, manager’s name
and a table with five employees as shown in Table 3.

Table 3: Table with five employees.

name      phone number   start date   manager’s name
Alanis    3-4563         6/13/95      Chelsea
Chelsea   3-2341         1/20/93      Lou
Elrond    3-2345         12/19/01     Chelsea
Hal       3-9000         1/12/97      Chelsea
Raj       3-3453         7/1/96       Chelsea

Given a subset S of the columns, we can obtain a new, smaller table by keeping only the entries that involve columns from S. We will call this new
table the projection of T onto S, and denote it by T [S]. For example, if S = {name, start date},
then the projection T [S] would be the table consisting of just the first and third columns.
There’s a different operation on tables that is also useful, which is to permute the columns. Given a
permutation p of the columns, we can obtain a new table of the same size as T by simply reordering the
columns according to p. We will call this new table the permutation of T by p, and denote it by Tp .
All of this comes into play for your particular application, as follows. You have k different subsets of the
columns S1 , S2 , . . . , Sk that you’re going to be working with a lot, so you’d like to have them available in a
readily accessible format. One choice would be to store the k projections T [S1 ], T [S2 ], . . . , T [Sk ], but this
would take up a lot of space. In considering alternatives to this, you learn that you may not need to explicitly
project onto each subset, because the underlying database system can deal with a subset of the columns
particularly efficiently if (in some order) the members of the subset constitute a prefix of the columns in
left-to-right order. So, in our example, the subsets {name, phone number} and {name, start date, phone
number} constitute prefixes (they’re the first two and first three columns from the left, respectively); and as
such, they can be processed much more efficiently in this table than a subset such as {name, start date},
which does not constitute a prefix. (Again, note that a given subset Si does not come with a specified order,
and so we are interested in whether there is some order under which it forms a prefix of the columns.)
So here’s the question: Given a parameter ℓ < k, can you find ℓ permutations of the columns $p_1, p_2, \dots, p_\ell$
so that for every one of the given subsets $S_i$ (for i = 1, 2, . . . , k), it’s the case that the columns in $S_i$ constitute
a prefix of at least one of the permuted tables $T_{p_1}, T_{p_2}, \dots, T_{p_\ell}$? We’ll say that such a set of permutations
constitutes a valid solution to the problem; if a valid solution exists, it means you only need to store the ℓ
permuted tables rather than all k projections. Give a polynomial-time algorithm to solve this problem; for
instances on which there is a valid solution, your algorithm should return an appropriate set of ℓ permutations.
Example. Suppose the table is as above, the given subsets are

S1 = {name, phone number},
S2 = {name, start date},
S3 = {name, manager’s name, start date},

and ℓ = 2. Then there is a valid solution to the instance, and it could be achieved by the two permutations

p1 = {name, phone number, start date, manager’s name},


p2 = {name, start date, manager’s name, phone number}.

This way, S1 constitutes a prefix of the permuted table Tp1 , and both S2 and S3 constitute prefixes of
the permuted table Tp2 .

Solution.
Algorithm description: Firstly we construct a bipartite graph of two node sets A and B as
follows:

• For each subset Si , we create a node in both sides of the graph.


• For each Si ⊂ Sj where Si ∈ A, Sj ∈ B, we include an edge (Si , Sj ) in the graph.

Then we construct a flow network G based on the bipartite graph:

• We have the source node s and sink node t.


• Edges (s −→ Si ) with capacity 1
• Edges (Si −→ Sj ) with capacity +∞
• Edges (Sj −→ t) with capacity 1

For the example given above, the graph is shown in Figure 3.

Figure 3: Flow network G for database projection

Suppose we have k subsets and the maximum flow of G is m; then we need at least k − m
permutations. Therefore, if k − m ≤ ℓ, there is a valid solution.

Correctness Proof: Let’s prove our algorithm as follows:

1. We can cover all subsets with k − m permutations


Given a flow of size m, we want to construct a set of k − m permutations that covers all
subsets. We can assume without loss of generality that our maxflow is integral. For our
graph this means in particular that for every edge, the flow in that edge is either 0 or 1.

We use the fact that if a family of subsets form a chain, then they can be covered by the
same permutation, e.g. if S1 ⊂ S2 ⊂ S3 , then we can construct a permutation that starts
with all elements in S1 (in any order), followed by all elements in S2 − S1 , followed by all
elements in S3 − S2 , followed by the rest. This permutation has prefixes for S1 , S2 , and
S3 .
We now show that given an integer flow for our graph of size m, we can construct k − m
chains such that each set is in exactly one chain.
In our flow, if a path s → Si → Sj → t in G contributes to the maximum flow, then
Si ⊂ Sj . Consider the set P of pairs (Si , Sj ) identified in that way. We have |P | = m
and furthermore the edges of capacity 1 in our graph ensure that for every Si , there is at
most one set Sj such that (Si , Sj ) ∈ P and at most one set Sk such that (Sk , Si ) ∈ P .
Using the set P we can easily construct exactly k − m chains by picking the smallest
set $S_i$ that is not covered yet, and following the unique pairs $(S_i, S_{j_1}), (S_{j_1}, S_{j_2}), \dots,
(S_{j_{r-1}}, S_{j_r})$, which gives us a chain $S_i \subset S_{j_1} \subset \dots \subset S_{j_r}$. Note that by starting with
the smallest set (or one of the smallest), there is no pair $(S_k, S_i)$ in P. Given m pairs
this gives us exactly k − m disjoint chains.
Therefore, with a maximum network flow of m, we can cover all subsets with k − m
permutations.
2. We cannot cover all subsets with fewer than k − m permutations
We argue by contradiction. Suppose there is a set of k − c permutations
with c > m that covers all subsets. We then construct a flow of value c, contradicting
the fact that the maximum flow has value m.
• Define a subset chain as $S_{a_1} \subset S_{a_2} \subset \dots \subset S_{a_n}$ where 1 ≤ n ≤ k.
• Define a segment as (Si , Sj ) if Si and Sj are adjacent on the chain.
We have the following assertions:
• Each permutation corresponds to one subset chain (it could be a single subset).
• Each subset appears on only one chain (if one subset appears on multiple chains, just
keep one and drop the others).
• The number of chains is the number of subsets minus the number of segments on all
chains.
Suppose we can cover all subsets with k − c permutations where c > m; this means there
are c segments on all chains. Let’s show this is impossible.
Each segment corresponds to a subset relation Si ⊂ Sj, which corresponds to an augmenting
path s → Si → Sj → t in the graph G. So we can find c augmenting paths using the
c segments. Note that these paths do not intersect at any node other than s and t, since each
subset appears on only one chain. In other words, these augmenting paths are compatible,
so all of them can contribute to the flow simultaneously. Therefore the maximum flow is at
least c, which is greater than m by assumption. This contradicts the fact that m is the
maximum flow. Therefore, it’s impossible to cover all subsets with k − c permutations
where c > m. We need at least k − m permutations.

According to the proofs above, we can conclude that k − m is the minimum number of permutations needed to
cover all subsets. Therefore, if ℓ ≥ k − m, there is a valid solution. Otherwise, we cannot use
ℓ permutations to cover all given subsets.
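
The chain construction from the proof can be sketched in Python as below. The function name `prefix_permutations` is hypothetical, the given subsets are assumed to be pairwise distinct, and a capacity of k stands in for +∞ on the middle edges (no more than k units can ever flow).

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow value."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def prefix_permutations(columns, subsets):
    """columns: list of column names; subsets: list of distinct sets of columns.
    Returns k - m column orders such that every subset is a prefix of one of them."""
    k = len(subsets)
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0
    for i in range(k):
        add("s", ("A", i), 1)
        add(("B", i), "t", 1)
        for j in range(k):
            if subsets[i] < subsets[j]:           # proper subset: edge S_i -> S_j
                add(("A", i), ("B", j), k)
    max_flow(cap, "s", "t")
    succ = {}                                     # the pairs (S_i, S_j) that carry flow
    for i in range(k):
        for j in range(k):
            if subsets[i] < subsets[j] and cap[("A", i)][("B", j)] < k:
                succ[i] = j
    has_pred = set(succ.values())
    perms = []
    for start in range(k):
        if start in has_pred:
            continue                              # not the smallest set of its chain
        order, placed, cur = [], set(), start
        while True:                               # walk the chain, appending new columns
            new = [c for c in columns if c in subsets[cur] and c not in placed]
            order += new
            placed |= set(new)
            if cur not in succ:
                break
            cur = succ[cur]
        order += [c for c in columns if c not in placed]
        perms.append(order)
    return perms
```

A valid solution for a given ℓ exists exactly when the returned list has at most ℓ entries; padding it with arbitrary extra permutations then gives exactly ℓ.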

Time Complexity: We have k subsets. Constructing the flow network takes $O(k^2)$ subset comparisons, since we need to
compare any two subsets to determine whether there is an edge. The complexity of running
the Ford-Fulkerson algorithm is $O(k^2 \cdot k)$ (there are $O(k^2)$ edges and the flow value is at most k), so the total time complexity is $O(k^3)$.

Problem 5: Spanning subgraph


Given a bipartite graph G = (V, E) and an integer dv for each node v, does there exist a spanning
subgraph H of G such that each node v has degree dv in H? Give an efficient algorithm to answer
this question, and also necessary and sufficient conditions for the existence of such a subgraph. A
spanning subgraph of G = (V, E) is a subgraph whose vertex set is V and whose edge set is a subset
of E.

Solution. G is a bipartite graph, thus we can divide V into two sets X and Y such that every
edge connects a vertex in X to one in Y. We denote the nodes in X as x and the nodes in Y as y.

Figure 4: Flow network

Construct the graph G' = (V', E') in Figure 4 where

V' = {s, t} ∪ X ∪ Y
E' = {s} × X ∪ E ∪ Y × {t}.

and with edge capacities

c(s, x) = dx c(x, y) = 1 c(y, t) = dy .

Two conditions must be satisfied in order to get a spanning subgraph in which each node v has
degree $d_v$.

1. In the spanning subgraph, the sum of the degrees of the nodes in set X must equal the sum of the
degrees of the nodes in set Y, i.e. $\sum_{x \in X} d_x = \sum_{y \in Y} d_y$. Otherwise, there is no spanning
subgraph H in which each node has degree $d_v$.
This is because in a bipartite graph, every edge leaving set X ends in set Y. Each
edge therefore adds one to the degree sum of X and one to the degree sum of Y, so the sum of the degrees of the nodes in
set X equals the sum of the degrees of the nodes in set Y.

2. The maximum flow of the graph in Figure 4 needs to be equal to $\sum_{x \in X} d_x$.
To prove that these are necessary and sufficient conditions for the existence of such a subgraph, we
need to prove two statements.

• If the maximum flow equals $\sum_{x \in X} d_x$, we can construct a spanning subgraph H such
that each node in H has degree $d_v$.

Proof. Since all capacities are integers, we may take the maximum flow to be integral. The edges leaving the source
s have total capacity $\sum_{x \in X} d_x$ and the maximum flow equals $\sum_{x \in X} d_x$, thus every edge (s, x) is saturated and carries flow $d_x$.
By flow conservation, $d_x$ units of flow leave node
x. Since each edge out of x has capacity 1, exactly $d_x$ edges out of x carry flow, and these are
the edges of the spanning subgraph.
For the nodes in set Y, the same argument applies: by condition 1 the flow value also equals $\sum_{y \in Y} d_y$, so every edge (y, t) is saturated and exactly $d_y$ flow-carrying edges enter y.

• If there is a spanning subgraph H with each node having degree $d_v$, then we can have a
maximum flow of $\sum_{x \in X} d_x$.

Proof. First, we construct the same flow graph as in Figure 4. We can prove that the graph has
maximum flow $\sum_{x \in X} d_x$ using min-cut.
Consider any cut (A, B) with s ∈ A and t ∈ B. For each x not in A,
the cut crosses the edge (s, x), so this part of the cut has capacity $\sum_{x \notin A} d_x$.
For each x in A, consider its $d_x$ edges in H. Such an edge (x, y) either crosses the cut (when y ∉ A),
contributing capacity 1, or ends at some y ∈ A. Each y ∈ A has exactly $d_y$ edges in H, and its edge (y, t)
crosses the cut with capacity $d_y$, so the edges of the second kind are accounted for by the edges (y, t).
Hence these parts of the cut have capacity at least $\sum_{x \in A} d_x$, and the total capacity of
the cut is $\ge \sum_{x \in X} d_x$. The cases where A contains no node of X or all of X are covered by the
same argument. Besides, the cut ({s}, V' − {s}) has capacity exactly $\sum_{x \in X} d_x$, so this is the min-cut capacity.
Because the capacity of the min cut equals the maximum flow, if there exists a spanning
subgraph H in which each node has degree $d_v$, the maximum flow is
$\sum_{x \in X} d_x$.
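
The two conditions translate into a short test, sketched here in Python. The name `degree_subgraph` and the dictionary `d` mapping each node to its required degree are assumptions of this example; the graph is assumed simple, with edges given as (x, y) pairs and X, Y disjoint.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on residual capacities cap[u][v]; mutates cap, returns the flow value."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        total += bottleneck

def degree_subgraph(X, Y, edges, d):
    """Return the edge list of a spanning subgraph H with deg_H(v) = d[v], or None."""
    demand = sum(d[x] for x in X)
    if demand != sum(d[y] for y in Y):            # condition 1: degree sums must match
        return None
    cap = defaultdict(lambda: defaultdict(int))
    def add(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0
    for x in X:
        add("s", ("x", x), d[x])                  # c(s, x) = d_x
    for y in Y:
        add(("y", y), "t", d[y])                  # c(y, t) = d_y
    for x, y in edges:
        add(("x", x), ("y", y), 1)                # c(x, y) = 1
    if max_flow(cap, "s", "t") != demand:         # condition 2: max flow must equal sum of d_x
        return None
    # Saturated unit edges (residual 0) are the edges of H.
    return [(x, y) for x, y in edges if cap[("x", x)][("y", y)] == 0]
```

For example, on the complete bipartite graph with X = {1, 2}, Y = {a, b} and all required degrees equal to 1, the result is a perfect matching.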
