
Mapping the Genome: Some Combinatorial Problems Arising in Molecular Biology

Richard M. Karp*

*Supported by NSF Grant CCR-9005448.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
25th ACM STOC '93-5/93/CA, USA
© 1993 ACM 0-89791-591-7/93/0005/0278...$1.50

1 Introduction

The hereditary information that all living organisms pass on to their offspring is encoded in DNA molecules within their cells. The DNA is packaged into structures called chromosomes. The genes within these chromosomes are transcribed into messenger RNA molecules, which ultimately control the production of proteins within the living cell. Thus chromosomal DNA is the blueprint for the chemical processes of life.

A DNA molecule is a polymer consisting of two intertwined helical strands. Each strand is a sequence of nucleotides drawn from the set {A, C, T, G}. The two strands are complementary, in the sense that each nucleotide on one strand is bonded to its complementary nucleotide on the other strand: A is bonded to T, and C is bonded to G. In a typical human chromosome, each strand consists of about $10^8$ nucleotides.

The ultimate goal of the Human Genome Project and of other efforts in molecular biology is to sequence the DNA of humans and other species and to elucidate the genetic information contained therein. A less ambitious intermediate goal is to construct physical maps of our 23 pairs of chromosomes. A physical map of a DNA molecule specifies the locations, along the molecule, of markers: identifiable DNA fragments. These markers provide a kind of access to the linear DNA molecule. In order to locate a feature of interest, such as a gene, an experimenter can start at a marker that is known to be close to the gene and walk along the molecule until the gene is identified.

To construct a physical map it is necessary to extract from the DNA molecule a large number of random fragments, called clones, obtain a fingerprint of each clone, and then reassemble the clones by determining how they overlap. Mathematically, this reassembly process leads to a number of challenging combinatorial, algorithmic and probabilistic problems. These problems are currently handled in a primitive way, and they should be grist for the mills of theoretical computer scientists.

2 Experimental Techniques

We briefly describe some of the basic experimental techniques that are used in physical mapping. The bibliography lists a number of papers giving a much more detailed account of these techniques.

A restriction enzyme is an enzyme that recognizes a unique sequence of nucleotides and is capable of cleaving a DNA molecule at any site where that sequence occurs. Such sites are called restriction sites for the given restriction enzyme. A restriction enzyme can be used to obtain a complete digest, in which the DNA is cut at all its restriction sites into disjoint fragments, each of which is bounded by two consecutive restriction sites, or a partial digest, in which different copies of the DNA are cleaved at different restriction sites, so that a wide selection of fragments, each bounded by two (not necessarily consecutive) restriction sites, is obtained.

Gel electrophoresis is a technique by which a collection of DNA fragments can be separated out according to size, so that the approximate size of each fragment can be determined.

Cloning is a process in which a fragment of DNA is incorporated into a self-replicating host; the replication process of the host serves as a factory for the production of macroscopic amounts of the incorporated fragment, and thus makes it possible to analyze it experimentally. A fragment so replicated is called a clone. The most commonly used hosts are plasmids, which incorporate fragments about 10,000 nucleotides long; cosmids, which incorporate fragments about 50,000 nucleotides long; and Yeast Artificial Chromosomes, which can handle fragments up to a million nucleotides in length. A clone library is a collection of clones covering a DNA molecule of interest. The creation of a clone library is the starting point for any physical mapping effort.

Hybridization is a procedure in which two single-stranded DNA molecules containing complementary sequences of nucleotides bond together. A probe is a DNA molecule that is radioactively or fluorescently labeled. By testing whether a probe hybridizes to a given clone, it is possible to determine whether the clone contains a piece of DNA that is complementary to the probe. Techniques have been devised that make it possible to test the hybridization of a single probe to hundreds of different clones in a single automated experiment.
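As a concrete illustration of a complete digest, the minimal Python sketch below cuts a DNA string at every occurrence of a recognition sequence and reports the resulting fragment lengths. It is not taken from the paper: the function names and the `cut_offset` parameter are hypothetical, and the recognition sequence GAATTC, cleaved after its first base (the site of the enzyme EcoRI), is used only as an example.

```python
def restriction_sites(dna: str, site: str, cut_offset: int = 0) -> list[int]:
    """Cut positions for every (possibly overlapping) occurrence of `site`.

    `cut_offset` says where inside the recognition sequence the enzyme is
    assumed to cleave; it is an illustrative parameter."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    return cuts

def complete_digest(dna: str, site: str, cut_offset: int = 0) -> list[int]:
    """Fragment lengths, left to right, after cutting at every restriction site."""
    cuts = sorted({c for c in restriction_sites(dna, site, cut_offset) if 0 < c < len(dna)})
    bounds = [0] + cuts + [len(dna)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

dna = "AAGAATTCAAAAGAATTCTT"
print(complete_digest(dna, "GAATTC", cut_offset=1))          # [3, 10, 7]
print(sorted(complete_digest(dna, "GAATTC", cut_offset=1)))  # [3, 7, 10]
```

Gel electrophoresis reveals only the multiset of fragment lengths (and only approximately), not their left-to-right order, which is why the sorted list is the realistic view of the data.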

3 The Lander-Waterman Model

A fundamental question is to determine the number of clones needed to construct a physical map. A minimal requirement is that nearly all the DNA should be covered by clones. Lander and Waterman [LW88] introduced a probabilistic model in which the clones are all of the same length and are thrown down at independent random positions along the DNA. Define the coverage $C$ by $C = NL/n$, where $N$ is the number of clones, $L$ is the length of each clone, and $n$ is the length of the DNA. A gap is defined as a maximal interval not covered by any clone, and a contig as a maximal interval covered by clones; a contig is the result of merging a chain of overlapping clones. Lander and Waterman study the number of gaps and contigs, and the distribution of their lengths, as a function of $C$. Two of their simpler results are that the expected fraction of the DNA not covered by any clone is $e^{-C}$, and that the expected number of gaps is $Ne^{-C} + O(1)$.

Based on these results, DNA researchers typically create a clone library whose coverage is at least 5. Thus, to map a chromosome some $10^8$ nucleotides long using cosmid clones 50,000 nucleotides long, a library of 10,000 clones would suffice, since $C = (10^4 \cdot 5 \times 10^4)/10^8 = 5$. A smaller library would be needed if only a portion of the chromosome were being mapped, or if the chromosome were mapped using Yeast Artificial Chromosome clones, which are about $10^6$ nucleotides long.
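The Lander-Waterman quantities above are easy to evaluate, and a small simulation shows the $e^{-C}$ law emerging. The sketch below is illustrative only: the function names are hypothetical, and drawing left endpoints uniformly from $[0, n - L]$ is a simplifying assumption about what happens at the two ends of the DNA.

```python
import math
import random

def coverage(num_clones: int, clone_len: float, dna_len: float) -> float:
    """Lander-Waterman coverage C = N L / n."""
    return num_clones * clone_len / dna_len

def expected_uncovered_fraction(c: float) -> float:
    """Expected fraction of the DNA covered by no clone: e^(-C)."""
    return math.exp(-c)

def expected_gaps(num_clones: int, c: float) -> float:
    """Leading term N e^(-C) of the expected number of gaps (O(1) term dropped)."""
    return num_clones * math.exp(-c)

def simulate(num_clones: int, clone_len: float, dna_len: float, seed: int = 0):
    """Throw clones at independent uniform left endpoints in [0, n - L] and
    return (uncovered fraction, number of gaps)."""
    rng = random.Random(seed)
    starts = sorted(rng.uniform(0, dna_len - clone_len) for _ in range(num_clones))
    covered_to = 0.0  # right end of the contig built so far
    uncovered, gaps = 0.0, 0
    for s in starts:
        if s > covered_to:  # an uncovered stretch ends where this clone begins
            uncovered += s - covered_to
            gaps += 1
        covered_to = max(covered_to, s + clone_len)
    if covered_to < dna_len:  # trailing gap at the right end of the DNA
        uncovered += dna_len - covered_to
        gaps += 1
    return uncovered / dna_len, gaps

# The cosmid example from the text: 10,000 clones of 50,000 nucleotides on 10^8 nucleotides.
c = coverage(10_000, 50_000, 10**8)
print(c, expected_uncovered_fraction(c))  # 5.0 and e^-5, about 0.0067
print(simulate(30_000, 1_000, 10**7, seed=1), expected_gaps(30_000, 3.0))
```

With $C = 3$ in the last line, the simulated uncovered fraction should land near $e^{-3} \approx 0.05$ and the gap count near $N e^{-3} \approx 1494$.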

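4 Restriction Fingerprints

There are two fundamental approaches to obtaining the fingerprint of a clone. In the first, called restriction fingerprinting, complete or partial digests of the clone are performed using different restriction enzymes or combinations of enzymes, and the sizes of the resulting restriction fragments are measured by gel electrophoresis. In the second approach, called hybridization fingerprinting, the clone is exposed to a number of probes, and it is determined which of these probes hybridize to the clone.

It is necessary to emphasize that error is unavoidable in these experimental processes. In restriction fingerprinting some fragments may be lost, and gel electrophoresis gives only approximate values for the sizes of the remaining fragments. Hybridization fingerprinting is subject to false positives, in which a probe is incorrectly recorded as hybridizing to a clone, and false negatives, in which the opposite error is made.

In restriction fingerprinting, the problem is to determine, from the fragment sizes, the locations of the restriction sites along the clone. We begin by describing the rather unrealistic (noiseless) versions of these problems, in which all measurements are assumed to be exact. We then comment on how measurement error can be incorporated into the problems.

4.1 The Double Digest Problem

A clone of length $N$ is completely digested using restriction enzyme A, then using restriction enzyme B, and then using A and B together. Let the restriction sites for enzyme A be $a_1 < a_2 < \dots < a_p$, and let the restriction sites for enzyme B be $b_1 < b_2 < \dots < b_q$. When A and B are used together, the restriction sites are $c_1 < c_2 < \dots < c_{p+q}$, the union of the $a_i$ and the $b_j$. In the noiseless case, the first experiment yields the multiset of fragment lengths
$\mathcal{A} = \{a_1, a_2 - a_1, \dots, a_p - a_{p-1}, N - a_p\}$,
the second experiment yields the multiset
$\mathcal{B} = \{b_1, b_2 - b_1, \dots, b_q - b_{q-1}, N - b_q\}$,
and the third experiment yields the multiset
$\mathcal{C} = \{c_1, c_2 - c_1, \dots, c_{p+q} - c_{p+q-1}, N - c_{p+q}\}$.
The double digest problem is to reconstruct the sequences $\{a_i\}$ and $\{b_j\}$ from these three multisets.

It is NP-hard to determine whether a given trio $\mathcal{A}, \mathcal{B}, \mathcal{C}$ of multisets has a solution at all (in the noiseless version), i.e., whether there exists a choice of restriction sites that would yield these multisets. There are instances that are wildly ambiguous, in the sense that the number of solutions grows exponentially as a function of $p + q$. On the other hand, if the restriction sites are real numbers drawn randomly from $[0, N]$, then almost certainly the solution is unique, and various enumerative algorithms will find it quickly.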
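The definition of the double digest problem can be made concrete with the brute-force sketch below. It tries every left-to-right ordering of the fragments in $\mathcal{A}$ and $\mathcal{B}$ and keeps those whose combined sites reproduce $\mathcal{C}$. It is exponential, meant only for tiny noiseless instances, and is not one of the enumerative algorithms referred to above; the function names are hypothetical. The second example exhibits an ambiguity that is always present: reflecting a solution ($x \mapsto N - x$) yields another solution.

```python
from collections import Counter
from itertools import permutations

def sites_from_order(lengths):
    """Interior cut positions implied by a left-to-right order of fragment lengths."""
    sites, pos = [], 0
    for length in lengths[:-1]:
        pos += length
        sites.append(pos)
    return sites

def fragment_multiset(sites, n):
    """Multiset of fragment lengths produced by cutting [0, n] at `sites`."""
    bounds = [0] + sorted(sites) + [n]
    return Counter(b - a for a, b in zip(bounds, bounds[1:]))

def double_digest(A, B, C):
    """All (a-sites, b-sites) pairs consistent with the three multisets."""
    n = sum(A)
    assert n == sum(B) == sum(C), "all three digests must cover the same clone"
    target = Counter(C)
    solutions = set()
    for order_a in set(permutations(A)):
        a = sites_from_order(order_a)
        for order_b in set(permutations(B)):
            b = sites_from_order(order_b)
            # Coincident A and B sites would give too few fragments to match C.
            if fragment_multiset(set(a) | set(b), n) == target:
                solutions.add((tuple(a), tuple(b)))
    return solutions

print(double_digest([3, 4, 3], [5, 5], [3, 2, 2, 3]))  # {((3, 7), (5,))}
print(double_digest([2, 8], [3, 7], [2, 1, 7]))        # two mirror-image solutions
```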

4.2 The Partial Digest and Probed Partial Digest Problems

In the partial digest problem the clone is partially digested using a single restriction enzyme. Let the restriction sites be $a_1 < a_2 < \dots < a_p$. It is assumed that all fragments bounded by restriction sites occur; thus the data is the multiset $\{a_j - a_i : i < j\}$ of fragment lengths, and the problem is to reconstruct the set of restriction sites from this multiset. The probed partial digest problem is similar, except that a site $b$ on the clone is radioactively or fluorescently labeled, and only those fragments containing this site are measured. Thus the data is the multiset $\{|a_i - b|\}$, and the goal is to determine the $a_i$. Both problems are easily reducible to polynomial factorization, and thus are unlikely to be NP-complete. The papers [SS90], [NN92] and [SW91] discuss how the degree of ambiguity (i.e., the number of choices for the set of restriction sites compatible with a given multiset of fragment lengths) can grow as a function of the number of restriction sites.

In practice, the measurement process will miss certain fragments and will give only approximate values for the sizes of the remaining fragments. Thus, instead of a perfect solution, it is necessary to look for a choice of restriction sites that best fits the noisy data. Given a probabilistic model of measurement error, this problem can be cast as a maximum likelihood problem, and can be attacked by branch-and-bound methods. A successful example of this approach is [KN93].
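For the noiseless partial digest problem, a standard backtracking strategy works well on small examples. The largest distance not yet explained must be measured from one of the two ends, so each new site is placed either at that distance $y$ from the left end or at $y$ from the right end, and the branch is abandoned as soon as a required distance is missing from the data. The sketch assumes that the two ends of the clone are included among the points (so the smallest point is 0 and the largest is the clone length). It is a swapped-in technique, not the polynomial-factorization reduction mentioned above, and it can take exponential time in the worst case.

```python
from collections import Counter

def pairwise_distances(points):
    """The partial digest data {x_j - x_i : i < j} for a set of points."""
    pts = sorted(points)
    return [b - a for i, a in enumerate(pts) for b in pts[i + 1:]]

def partial_digest(distances):
    """Return one point set (sorted, starting at 0) whose pairwise distances
    equal `distances` as a multiset, or None if no such set exists."""
    dist = Counter(distances)
    width = max(dist)
    dist[width] -= 1
    points = {0, width}

    def remove_distances(y):
        """Remove the distances from y to all placed points; undo and return
        None if some needed distance is not available."""
        removed = []
        for x in points:
            d = abs(x - y)
            if dist[d] <= 0:
                for r in removed:
                    dist[r] += 1
                return None
            dist[d] -= 1
            removed.append(d)
        return removed

    def place():
        if sum(dist.values()) == 0:
            return True
        y = max(d for d, k in dist.items() if k > 0)
        for candidate in (y, width - y):  # y from the left end or from the right end
            removed = remove_distances(candidate)
            if removed is not None:
                points.add(candidate)
                if place():
                    return True
                points.discard(candidate)  # backtrack
                for r in removed:
                    dist[r] += 1
        return False

    return sorted(points) if place() else None

sites = [0, 2, 4, 7, 10]
print(partial_digest(pairwise_distances(sites)))  # [0, 3, 6, 8, 10], the mirror image of `sites`
```

As in the double digest case, a solution and its mirror image always produce the same data, so the reconstruction can only be unique up to reflection.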

5 Physical Mapping Using Hybridization Data

The present writer and his coworkers at Berkeley (Farid Alizadeh, Lee Newberg, Deborah Weisser and Geoffrey Zweig) have been devising algorithms for physical mapping by hybridization with random probes. The data is a matrix $(d_{ij})$, where the probes are indexed by $i$ and the clones by $j$, and $d_{ij}$ is equal to one if probe $i$ hybridizes to clone $j$, and zero otherwise. In practice the matrix will contain erroneous entries, corresponding to false positives and false negatives.

Pooling techniques can be used to reduce the number of hybridization tests needed to construct the matrix $(d_{ij})$ of incidences between probes and clones. The idea is to expose the probe to a mixture of several clones; if the test is negative, then one can conclude that the probe hybridizes to none of the clones in the mixture. The problem of minimizing the expected number of pooled tests is similar to problems studied by statisticians under the rubric of group testing.

5.1 The Case of Unique Probes

We may view the DNA molecule as an interval of the real line and each clone as a subinterval. In an increasingly popular approach, the probes are extracted from the DNA so that each probe is much shorter than a clone, and it is very unlikely that a probe occurs more than once along the chromosome ([BD91], [T91]). Thus each probe can be viewed as a point on the real line. The mapping problem in this case is to determine the left-to-right ordering of the probes along the chromosome. It has the following geometric interpretation: given a set of points along the real line (the probes), a set of intervals (the clones), and the incidences between them, determine the left-to-right order in which the points occur.

In the absence of noise the problem is very easy. Since each clone is an interval, the set of probes that hybridize to it is consecutive in the left-to-right ordering. Thus the matrix $(d_{ij})$ has the consecutive ones property: its rows can be permuted so that, in every column, the ones are consecutive. The permutations that achieve this property are precisely the left-to-right orderings of the probes that are consistent with the data. An algorithm based on the PQ-tree data structure [BL76] yields a succinct description of this set of permutations in linear time.

Now let us complicate the problem by assuming that false positives do not occur, but false negatives occur at random: if probe $i$ does not occur on clone $j$ then $d_{ij} = 0$, and if probe $i$ occurs on clone $j$ then $d_{ij} = 1$ with probability $1 - p$. Also, the random variables $d_{ij}$ are independent. This case has practical significance, as experimenters take pains to avoid false positives. In this model, the problem of finding the most likely ordering takes the following form: convert $(d_{ij})$ into a matrix with the consecutive ones property by changing a minimum number of zeros to ones. This problem is NP-hard, but we have found a fast method that reliably yields near-optimal solutions on random instances, and we are proving theorems to make this claim precise.

To understand the method, let us assume first that the data is noiseless. Let $G$ be the bipartite graph with a vertex $u_i$ for each probe $i$, a vertex $v_j$ for each clone $j$, and an edge $(u_i, v_j)$ if and only if $d_{ij} = 1$. Assume that $G$ is connected; otherwise the mapping problem splits into independent parts whose relative order cannot be ascertained from the data. Call a probe vertex a splitter if, when it and the clone vertices adjacent to it are deleted, the remaining subgraph has only two components other than isolated vertices. Having found a splitter, the algorithm concludes that one of these two components contains the probes and clones lying strictly to the left of the splitter, and the other contains those lying strictly to the right; the isolated vertices correspond to probes that lie only on clones containing the splitter. The algorithm proceeds recursively, breaking the problem down into subproblems and continuing as long as splitters can be found. When a subproblem occurs in which no splitter can be found, it is solved by brute force. We have found that this divide-and-conquer process usually breaks the original problem down into subproblems so small that they can be solved correctly by brute force, and that the approach works successfully even in the presence of a significant number of randomly placed false negatives.

When false positives occur, the problem is more troublesome. We have had good success using simulated annealing and other heuristic approaches. In particular, we have recently constructed a physical map of human chromosome 21 that is in close agreement with the map produced by a French research team. Another complication is the occurrence of chimeric clones; a chimeric clone consists of two or more fragments from disjoint parts of the chromosome, joined together. We are experimenting with various approaches to screening out chimeric clones.
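The two ingredients of the unique-probe case can be sketched in a few lines of Python. Here each clone is represented, by assumption, as the set of probes that hybridize to it (one column of $(d_{ij})$), and all names are hypothetical. `consistent_orders` enumerates probe orders by brute force; it only stands in for the linear-time PQ-tree algorithm of [BL76] on tiny inputs. `is_splitter` applies the definition of a splitter given above to noiseless data.

```python
from collections import defaultdict, deque
from itertools import permutations

def consistent_with(order, clones):
    """True if each clone's probes occupy consecutive positions in `order`,
    i.e. this row order gives (d_ij) the consecutive ones property."""
    pos = {probe: k for k, probe in enumerate(order)}
    for probes in clones:
        idx = sorted(pos[p] for p in probes)
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True

def consistent_orders(probes, clones):
    """All probe orders consistent with noiseless data (exponential brute force)."""
    return [order for order in permutations(probes) if consistent_with(order, clones)]

def is_splitter(probe, clones):
    """Delete `probe` and every clone containing it from the bipartite graph G;
    `probe` is a splitter if exactly two components other than isolated
    vertices remain."""
    adj = defaultdict(set)
    for j, probes in enumerate(clones):
        if probe in probes:
            continue  # clone vertex adjacent to the probe: deleted
        for p in probes:
            adj[("clone", j)].add(("probe", p))
            adj[("probe", p)].add(("clone", j))
    # Only vertices that still have an edge appear in `adj`, so every
    # component counted below has at least two vertices.
    seen, components = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return components == 2

clones = [{1, 2}, {2, 3}, {3, 4}, {4, 5}]  # toy noiseless data, probes 1..5
print(consistent_orders([1, 2, 3, 4, 5], clones))  # [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)]
print([p for p in range(1, 6) if is_splitter(p, clones)])  # [3]
```

In the toy data, deleting probe 3 and the two clones containing it leaves one component on each side, so probe 3 is a splitter and the problem divides into the left part {1, 2} and the right part {4, 5}.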

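5.1.1 On-Line Probe Selection

Up to now we have assumed that the probes are known in advance. The practical situation in constructing a map is perhaps better approximated by assuming that the probes are chosen in an on-line fashion, with the choice of each probe dependent on the result of hybridizing the previously chosen probes to all the clones. We have begun to study idealized strategies for probe selection under the following assumptions:

The set of clones is determined in advance, and satisfies the Lander-Waterman model.

Hybridization experiments are error-free; i.e., there are no false positives or false negatives.

Each probe is extracted from one of the two ends of some clone. At each step, the experimenter chooses a clone $c$, and the next probe is extracted from $c$. The probe is equally likely to come from either end of the clone; if two probes are extracted from the same clone, they come from opposite ends.

Under these assumptions it is an open problem to determine, as a function of the parameters $n$, $L$ and $N$ of the Lander-Waterman model, the minimum expected number of probes needed to determine the left-to-right ordering of the clones.

5.2 Nonunique Probes: The Poisson Model

Several experimenters ([CN90], [EL89], [L90], [LD91] and [ST90]) have approached physical mapping using hybridization probes to fingerprint the clones, but without requiring each probe to occur at a unique point. Instead, each probe may occur many times along the DNA. The data is given by a matrix $(d_{ij})$, where entry $d_{ij}$ indicates whether probe $i$ hybridizes to clone $j$. The resulting mapping problem is more complicated than in the unique probe case, because one can no longer assume that two clones that hybridize to the same probe overlap. The problem has a geometric interpretation: given the incidences between probes and clones, we wish to determine the most likely placement of the clones as intervals on the real line, together with the points at which each probe occurs.

[AK93] attacks this problem under the assumption that the clones are distributed according to the Lander-Waterman model, and that the occurrences of each probe along the DNA are governed by a Poisson process (thus, a probe occurs in a short interval of length $dx$ with probability $\lambda\,dx$, and occurrences in disjoint intervals are independent). An interleaving of the clones is the left-to-right arrangement of the left and right endpoints of the clones; it specifies their placement up to topological equivalence, i.e., up to order-preserving stretching or contraction of the real line. The mapping problem is then formulated as the problem of finding the interleaving of maximum likelihood, given the data $(d_{ij})$.

This maximum likelihood problem seems quite intractable, so [AK93] introduces approximations to the likelihood function, leading to optimization problems that are NP-hard although, in practice, they are reasonably tractable. One of the simplest of these is to minimize the number of probe occurrences needed to explain the data. More precisely, let $P$ be the set of probes and, for each clone $c$, let $S(c)$ be the set of probes incident with $c$. Then we seek the shortest word over the alphabet $P$ such that, for each clone $c$, there is a subword containing exactly the probes in $S(c)$. It is not known whether there exists a polynomial-time algorithm to approximate the optimal value of this objective function within a constant factor.

The optimization problems considered in [AK93] are of the form $\min_I \Phi(I)$, where $I$ ranges over the interleavings of the clones and $\Phi(I)$ is derived from an approximation to the logarithm of the likelihood function.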
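The "shortest explaining word" formulation can be illustrated with the sketch below, which checks whether a word over the probe alphabet explains the data (each $S(c)$ is exactly the set of letters in some subword) and, for tiny instances only, finds a shortest such word by exhaustive search. Clones are represented, by assumption, as probe sets, and the names are hypothetical; the search is exponential and is not the method of [AK93]. The second example forces one probe to occur twice, which is precisely the nonunique-probe situation.

```python
from itertools import product

def window_sets(word):
    """Letter sets of all nonempty subwords (contiguous windows) of `word`."""
    sets = set()
    for i in range(len(word)):
        seen = set()
        for j in range(i, len(word)):
            seen.add(word[j])
            sets.add(frozenset(seen))
    return sets

def explains(word, clones):
    """True if every clone's probe set S(c) is the letter set of some subword."""
    windows = window_sets(word)
    return all(frozenset(s) in windows for s in clones)

def shortest_word(clones, max_len=8):
    """Exhaustive search over words of increasing length (tiny inputs only)."""
    alphabet = sorted(set().union(*clones))
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            if explains(letters, clones):
                return "".join(letters)
    return None

print(shortest_word([{"a", "b"}, {"b", "c"}]))              # abc
print(shortest_word([{"a", "b"}, {"b", "c"}, {"a", "c"}]))  # a length-4 word such as abca
```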

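6 Mapping Using Interval Graphs

Any method of fingerprinting clones can be used to estimate the probability that two clones overlap; for example, if hybridization fingerprinting is used and the Poisson model of probe occurrences is assumed, then two clones are likely to overlap if they have many probes in common and there are few probes that lie on one clone but not the other. Given an overlap probability $p_{jk}$ for each pair $(j, k)$ of clones, it is natural to look for an interleaving $I$ that best agrees with these probabilities. The degree of agreement of interleaving $I$ may be measured by $\prod_{E} p_{jk} \prod_{\bar{E}} (1 - p_{jk})$, where $E$ is the set of pairs of clones that overlap in $I$, and $\bar{E}$ is the complementary set. Letting $w_{jk} = \ln \frac{p_{jk}}{1 - p_{jk}}$, we find that maximizing the degree of agreement is equivalent to maximizing $\sum_{(j,k) \in E} w_{jk}$.

To attack this maximization problem it is natural to introduce the concept of an interval graph. Any set of intervals on the real line determines a graph with a vertex for each interval and an edge between two vertices if and only if the corresponding intervals overlap. A graph arising in this way is called an interval graph; if no interval is properly contained in another, the resulting graph is called a proper interval graph. Interval graphs are well characterized [FG65], and there is a linear-time algorithm to determine whether a given graph is an interval graph ([BL76] and [KM89]). Our maximization problem then becomes: find an interval graph $G$ to maximize $\sum_{(j,k) \in E(G)} w_{jk}$, where $E(G)$ denotes the edge set of $G$; if we wish to require that no clone includes another, then we must restrict attention to proper interval graphs instead of interval graphs. Unfortunately, these problems are NP-hard ([GK92] and [GS92]).

A number of researchers have investigated heuristic or approximate methods of solving these two maximization problems. We outline an approach given in [AK93]. For any interleaving $I$, let $F(I) = \sum_{(j,k) \in E(I)} w_{jk}$, where $E(I)$ is the set of pairs of clones that overlap in $I$. Then we seek an $I$ maximizing $F(I)$. Let $\pi$ be a permutation of the clones. An interleaving $I$ is called compatible with $\pi$ if, in $I$, the left endpoints of the clones occur in the left-to-right order given by $\pi$. For any permutation $\pi$ of the clones, let $C(\pi) = \max_I F(I)$, where the maximization ranges over the interleavings compatible with $\pi$. Then the problem can be restated as $\max_\pi C(\pi)$. It is shown in [AK93] that, for a given $\pi$, $C(\pi)$ can be computed in quadratic time, and in linear time for the case where no clone includes another. Thus the problem can be attacked by local search over the space of permutations. [AK93] proposes local search methods similar to the three-opt and Lin-Kernighan ([LK73]) algorithms used for the traveling-salesman problem; in computational experiments on problems with up to 200 clones, these have proved quite effective. A refined local search program, based on a neighborhood structure in the spirit of the Lin-Kernighan approach, has recently been devised by Geoffrey Zweig and is currently being developed further.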
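The log-odds transformation in this section can be checked numerically. The sketch below, with hypothetical names, represents an interleaving by concrete interval endpoints (any realization of the same interleaving gives the same overlaps), computes $E(I)$ and $F(I)$, and confirms that $\ln\big(\prod_{E} p_{jk} \prod_{\bar{E}} (1 - p_{jk})\big) = F(I) + \sum_{j<k} \ln(1 - p_{jk})$. Since the last sum does not depend on $I$, maximizing the agreement and maximizing $F(I)$ are the same problem. The overlap probabilities are made up for illustration.

```python
import math
from itertools import combinations

def overlap_pairs(intervals):
    """E(I): pairs (j, k), j < k, of clones whose closed intervals intersect."""
    pairs = set()
    for j, k in combinations(sorted(intervals), 2):
        (a1, b1), (a2, b2) = intervals[j], intervals[k]
        if a1 <= b2 and a2 <= b1:
            pairs.add((j, k))
    return pairs

def weight(p):
    """w_jk = ln(p_jk / (1 - p_jk)); requires 0 < p < 1."""
    return math.log(p / (1 - p))

def F(intervals, p):
    """Sum of w_jk over the pairs that overlap in the interleaving."""
    return sum(weight(p[pair]) for pair in overlap_pairs(intervals))

def agreement(intervals, p):
    """Product of p_jk over E(I) times product of (1 - p_jk) over the complement."""
    E = overlap_pairs(intervals)
    return math.prod(q if pair in E else 1 - q for pair, q in p.items())

p = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.8}  # illustrative overlap probabilities
chain = {0: (0, 4), 1: (3, 7), 2: (6, 10)}   # 0 overlaps 1, 1 overlaps 2
nested = {0: (0, 10), 1: (3, 7), 2: (6, 9)}  # every pair overlaps
const = sum(math.log(1 - q) for q in p.values())
for I in (chain, nested):
    print(round(F(I, p), 3), math.isclose(math.log(agreement(I, p)), F(I, p) + const))
```

Here the chain interleaving scores higher because it avoids asserting the unlikely overlap of clones 0 and 2.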

7 Sequencing Problems

7.1 The Sequence Assembly Problem

A miniature version of the physical mapping problem arises when a DNA molecule is covered by overlapping fragments whose sequences of nucleotides are known. The problem is to determine the sequence of the entire molecule. Since current methods cannot reliably sequence DNA fragments longer than a few hundred nucleotides, the fragments will typically be rather short. Similar problems arise in reconstructing the sequences of RNA molecules or proteins. Under reasonable probabilistic assumptions, the most likely sequence is the shortest one containing each fragment as a substring, and thus we are led to the shortest superstring problem: given strings $x_1, x_2, \dots, x_n$, find the shortest string containing each of the given strings as a consecutive substring.

Simple polynomial-time approximation algorithms for the shortest superstring problem are discussed in [GM80] and [T89]. In particular, [BJ91] gives an approximation algorithm that solves the problem within a factor of three; i.e., the superstring it produces is at most three times as long as the shortest superstring. The shortest superstring problem is known to be complete for the complexity class Max SNP. This implies that, unless P = NP, there exists an $\epsilon > 0$ such that it is NP-hard to solve the shortest superstring problem within a factor of $1 + \epsilon$.

Of greater interest to biologists is the case where the measurement of the sequences of the fragments is subject to error and, given these possibly erroneous sequences, one wants to determine the most likely sequence for the DNA molecule from which the fragments have been drawn. This problem is known generically as the sequence assembly problem; it has many versions, depending on the statistical model of measurement error, the a priori information about the DNA molecule, and other factors. [K91] is a good starting point for reviewing the available literature on this messier, but more relevant, problem.
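For the noiseless shortest superstring problem, a common greedy heuristic repeatedly merges the two strings with the largest suffix-prefix overlap. The sketch below is that greedy heuristic, not the factor-three algorithm of [BJ91], and it ignores measurement error entirely; the function names are hypothetical.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings: list[str]) -> str:
    """Greedy merging: always join the pair with the largest overlap."""
    # Drop strings contained in another string (keep one copy of duplicates).
    strs = [
        s for i, s in enumerate(strings)
        if not any(s in t and (s != t or j < i) for j, t in enumerate(strings) if j != i)
    ]
    while len(strs) > 1:
        best = (-1, 0, 1)  # (overlap length, left index, right index)
        for i, a in enumerate(strs):
            for j, b in enumerate(strs):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strs[i] + strs[j][k:]
        strs = [s for idx, s in enumerate(strs) if idx not in (i, j)] + [merged]
    return strs[0] if strs else ""

print(greedy_superstring(["ATTAG", "TAGGC", "GGCAT"]))  # ATTAGGCAT
```

Every merge keeps both merged strings as substrings, so the result is always a valid superstring; only its length is approximate.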

7.2 Sequencing by Hybridization

Sequencing by hybridization ([DL89], [PL91] and [SP91]) is a novel method of sequencing DNA molecules. Define a generalized probe as a sequence with "don't cares" over the alphabet {A, C, T, G}. For example, a generalized probe of length seven with A in the second position, C in the fifth position and G in the seventh position, and don't cares elsewhere, represents any sequence of seven nucleotides which contains A in the second position, C in the fifth and G in the seventh.

In the sequencing by hybridization method, one attempts to determine the sequence of an unknown DNA fragment $x$ by hybridizing generalized probes to it. It is assumed that an experiment can be performed to determine $N(p, x)$, the number of occurrences of generalized probe $p$ in $x$, by attempting to hybridize probe $p$ to fragment $x$. Let $P = \{p_1, p_2, \dots, p_n\}$ be a set of generalized probes. Let us say that $P$ sequences $x$ if there is no string $y \neq x$ of the same length as $x$ such that $N(p_i, y) = N(p_i, x)$ for all $i$. Then the following problems arise:

Given $\{p_1, p_2, \dots, p_n\}$, the length of $x$, and $N(p_1, x), N(p_2, x), \dots, N(p_n, x)$, determine $x$ (this is possible if and only if $P$ sequences $x$).

Given a set $P$ of generalized probes and a positive integer $m$, determine what fraction of the strings of length $m$ are sequenced by $P$.

Given a positive integer $m$ and a fraction $c$, determine the smallest cardinality of a set of generalized probes that sequences a fraction $c$ of the strings of length $m$.

The first of these problems has been completely solved in the case where $P$ consists of all $4^t$ strings of length $t$ ([P89]). The second and third problems appear to be open.
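The definitions of $N(p, x)$ and of "$P$ sequences $x$" can be sketched directly. In the code below the character `*` is used as the don't-care symbol; that notation, like the function names, is an assumption of this sketch. The test of whether $P$ sequences $x$ is brute force over all $4^m$ strings of length $m$, so it is only usable for very short strings.

```python
from itertools import product

ALPHABET = "ACGT"
DONT_CARE = "*"  # don't-care symbol (this sketch's convention)

def occurrences(probe: str, x: str) -> int:
    """N(p, x): number of positions (overlaps allowed) at which probe p matches x."""
    m = len(probe)
    return sum(
        all(c == DONT_CARE or c == x[i + k] for k, c in enumerate(probe))
        for i in range(len(x) - m + 1)
    )

def counts(probes, x):
    """The vector (N(p_1, x), ..., N(p_n, x))."""
    return tuple(occurrences(p, x) for p in probes)

def sequences(probes, x: str) -> bool:
    """True if P sequences x: no other string y of the same length has
    N(p_i, y) = N(p_i, x) for every probe. Brute force over 4^len(x) strings."""
    target = counts(probes, x)
    for letters in product(ALPHABET, repeat=len(x)):
        y = "".join(letters)
        if y != x and counts(probes, y) == target:
            return False
    return True

singles = list(ALPHABET)
pairs = ["".join(t) for t in product(ALPHABET, repeat=2)]  # all 4^2 probes of length 2
print(occurrences("A*G", "AAGTACG"))  # 2
print(sequences(singles, "AC"))       # False: "CA" has the same counts
print(sequences(pairs, "ACG"))        # True
```

The last two lines illustrate the first problem above: single-nucleotide probes only reveal the composition of $x$, while the set of all $4^2$ probes of length 2 already pins down "ACG".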

8 Conclusion

This article has been concerned with problems of determining information about a DNA molecule from information about fragments within it. It is important to stress that, in any biological application, the information about the fragments will contain errors, and information about some fragments will be missing. The noiseless versions of these problems are very appealing and easy to understand; computer scientists who are seduced by them will fail to exploit a serious opportunity.

We have concentrated on physical mapping, and have omitted reviewing many other important areas of biology in which combinatorial problems arise. These include the following:

Search for Homologies: Searching a database of strings for homologies (i.e., pairs or sets of strings that are similar to one another).

Phylogenetic Trees: Inferring the evolutionary tree of a set of species by comparing their DNA or RNA sequences.

Protein Folding: Determining the three-dimensional structure of a protein from its sequence of amino acids.

In conclusion, a word about the role of combinatorial optimization in the field of molecular biology. It is convenient to formulate reconstruction problems in molecular biology as optimization problems. For example, physical mapping can be treated as the problem of finding the most likely interleaving of a set of clones, given the fingerprints of the clones. However, the most likely interleaving is of little use unless it very closely resembles the true interleaving. Thus, optimization methods should be viewed not as vehicles for solving a problem, but as vehicles for proposing a plausible hypothesis to be confirmed or disconfirmed by further experiments. The search for the correct solution of a reconstruction problem must inevitably be an iterative process involving a close interaction between experimentation and computation.

Acknowledgements

The writer wishes to acknowledge enlightening discussions with William Chang, Dan Gusfield, John Kececioglu, Gene Lawler, Dalit Naor, David Nelson, Frank Olken, Pavel Pevzner, Ron Shamir, Terry Speed, David Torney, Michael Waterman and, especially, Farid Alizadeh, Lee Newberg, Deborah Weisser and Geoffrey Zweig.

References

[AB89] F.M. Ausubel, R. Brent, R.E. Kingston et al. Current Protocols in Molecular Biology. 1989.
[AK93] F. Alizadeh, R. Karp, L. Newberg and D. Weisser. Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular Biology. Proc. ACM/SIAM Symposium on Discrete Algorithms, 1993.
[BD91] E. Barillot, J. Dausset and D. Cohen. Theoretical Analysis of a Physical Mapping Strategy Using Random Single Copy Landmarks. Proc. Natl. Acad. Sci. USA Vol. 88, 1991, pp 3917-3921.
[BJ91] A. Blum, T. Jiang, M. Li, J. Tromp and M. Yannakakis. Linear Approximation of Shortest Superstrings. Proceedings of the Twenty Third Annual ACM Symposium on Theory of Computing, 1991, pp 328-336.
[BL76] K.S. Booth and G.S. Lueker. Testing for the Consecutive Ones Property, Interval Graphs, and Graph Planarity Using PQ-Tree Algorithms. Journal of Computer and System Sciences Vol. 13, 1976, pp 335-379.
[BS90] E. Branscomb, T. Slezak, R. Pae, D. Galas, A.V. Carrano and M. Waterman. …
[CL89] A.V. Carrano et al. Constructing Chromosome- and Region-Specific Cosmid Maps of the Human Genome. Genome Vol. 31(2), 1989, pp 1059-1065.
[CL89] A.V. Carrano, J. Lamerdin, L.K. Ashworth, B. Watkins, E. Branscomb, T. Slezak, M. Raff, P.J. de Jong, D. Keith, L. McBride, S. Meister and M. Kronick. A High-Resolution, Fluorescence-Based, Semiautomated Method for DNA Fingerprinting. Genomics Vol. 4, 1989, pp 129-136.
[CN90] A.G. Craig, D. Nizetic, J.D. Hoheisel, G. Zehetner and H. Lehrach. Ordering of Cosmid Clones Covering the Herpes Simplex Virus Type I (HSV-I) Genome: A Test Case for Fingerprinting by Hybridisation. Nucleic Acids Research, 1990.
[FS83] W.M. Fitch, T.F. Smith and W.W. Ralph. Mapping the Order of DNA Restriction Fragments. Gene Vol. 22, 1983, pp 19-29.
[GK92] M.C. Golumbic, H. Kaplan and R. Shamir. Graph Sandwich Problems. Technical Report No. 270/92, The Moise and Frida Eskenasy Institute of Computer Sciences, December 1992.
[GM80] J. Gallant, D. Maier and J. Storer. On Finding Minimal Length Superstrings. Journal of Computer and System Sciences Vol. 20, 1980, pp 50-58.
[GO90] E.D. Green and M.V. Olson. Chromosomal Region of the Cystic Fibrosis Gene in Yeast Artificial Chromosomes: A Model for Human Genome Mapping. Science Vol. 250, 1990, pp 94-98.
[GS92] M.C. Golumbic and R. Shamir. Complexity and Algorithms for Reasoning About Time: A Graph-Theoretic Approach. Proc. Israel Symposium on Theory of Computing, 1992.
[GW91] L. Gonick and M. Wheelis. The Cartoon Guide to Genetics. Harper, 1991.
[K88] M. Krawczak. Algorithms for the Restriction-Site Mapping of DNA Molecules. Proc. Natl. Acad. Sci. USA Vol. 85, 1988, pp 7298-7301.
[K91] J.D. Kececioglu. Exact and Approximation Algorithms for DNA Sequence Reconstruction. PhD Dissertation, University of Arizona, 1991.
[KA87] Y. Kohara, K. Akiyama and K. Isono. The Physical Map of the Whole E. coli Chromosome: Application of a New Strategy for Rapid Analysis and Sorting of a Large Genomic Library. Cell, 1987.
[KM89] N. Korte

Whole

Chromo-

Clones

Covering

Virus for FinAcids

of a New

Strategy

I (HSV-1) by

Genome:

Sorting Vol.

of a Large

Genomic

gerprinting Research [DL89] R. DNA

Hybridisation.

50, pp 495-508.1987 Mohring. for Jou?nal An Incremental InterVol.

Vol. 18(9), pp 2653-2660.1990


I. Labat, I. Bruckner of Theory Magebase and R. Plus

and

R.H. Algorithm SIAM

Linear-Time Drmanac, Graphs. Crkvenajakov. Genomics [EL89] Vol. Sequencing 4, pg 114.1989 K.A. Genomes Lewis. Physical USA MapVol. 86, [L90]

Recognizing of Computing

18, pp 68-81.1989 [KN93] R. Karp, L. Newberg, Partial submitted Genes Press, Clone Digest An Algorithm 1993 PTess and OzfoTd for the

by Hybridization:

of a Method. Probed Reconstruction Prob-

G.A.
ping

Evans

and

lem, B.

to CABIOS IV 1990 Maps Made Cell

of Complex

by Cosmid

Multiplex

Analysis.

PTOC. Nat 1. A cad. Sci.

Lewin.

pp 5030-5034.1989 [L190] [F85] P.C. Fishburn. John Interval Wiley Orders and Interval [LD91] [FG65] D.R. trices Math Fulkerson abd Vol. and O.A. Gross. Incidence Jou?nal MaGraphs. pp 35-561985

Univemity P. Little. Vol. H. ping

Simple.

Nature

346, pp 611-12.1990 Lehrach, R. Drmanac, Fingerprinting Genome J. Hoheisel, in Genome Analysis et Vol. al. 1,

Hybridization

Map-

Interval

Graphs.

Pacific

of

and Sequencing.

15, pp 835-855.1965

pp 39-81.1991

284

[LK73] S. Lin and B.W. Kernighan. An Effective Heuristic Algorithm for the Traveling-Salesman Problem. Operations Research Vol. 21, No. 2, 1973.

[LW88] E.S. Lander and M.S. Waterman. Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis. Genomics Vol. 2, pp 231-239, 1988.

[NN92] L.A. Newberg and D. Naor. A Lower Bound on the Number of Solutions to the Exact Probed Partial Digest Problem. Advances in Applied Mathematics, to appear.

[OD86] M.V. Olson, J.E. Dutchik, M.Y. Graham, et al. Random-Clone Strategy for Genomic Restriction Mapping in Yeast. Proc. Natl. Acad. Sci. USA Vol. 83, pp 7826-7830, 1986.

[P82] W. Pearson. Automatic Construction of Restriction Site Maps. Nucleic Acids Research Vol. 10, pp 217-227, 1982.

[P89] P.A. Pevzner. l-Tuple DNA Sequencing: Computer Analysis. Journal of Biomolecular Structure and Dynamics Vol. 7, pp 63-73, 1989.

[PD84] G. Polner, L. Dorgai and L. Orosz. PMAP and PMAPS: DNA Physical Map Constructing Programs. Nucleic Acids Research Vol. 12, pp 227-236, 1984.

[PL91] P.A. Pevzner, Y.P. Lysov, K.R. Khrapko, A.V. Belyavsky, V.L. Florentiev and A.D. Mirzabekov. Improved Chips for Sequencing by Hybridization. Journal of Biomolecular Structure and Dynamics Vol. 9, Issue 2, pp 399-410, 1991.

[S78] M. Stefik. Inferring DNA Structures from Segmentation Data. Artificial Intelligence Vol. 11, pp 85-114, 1978.

[SP91] Z. Strezoska, T. Paunesku, D. Radosavljevic, I. Labat, R. Drmanac and R. Crkvenjakov. DNA Sequencing by Hybridization: 100 Bases Read by a Non-Gel-Based Method. Proc. Natl. Acad. Sci. USA Vol. 88, pp 10089-10093, 1991.

[SS90] S.S. Skiena, W.D. Smith and P. Lemke. Reconstructing Sets From Interpoint Distances. Proceedings of the 6th Annual Symposium on Computational Geometry, pp 332-339, 1990.

[ST90] R.L. Stallings, D.C. Torney, C.E. Hildebrand, et al. Physical Mapping of Human Chromosomes by Repetitive Sequence Fingerprinting. Proceedings of the National Academy of Sciences Vol. 87, pp 6218-6222, 1990.

[SW91] W. Schmitt and M.S. Waterman. Multiple Solutions of DNA Restriction Mapping Problems. Advances in Applied Mathematics Vol. 12, pp 412-427, 1991.

[T89] J.S. Turner. Approximation Algorithms for the Shortest Common Superstring Problem. Information and Computation Vol. 83, No. 1, pp 1-20, 1989.

[T91] D. Torney. Mapping Using Unique Sequences. Journal of Molecular Biology Vol. 217, pp 259-264, 1991.

[TB91] D.C. Torney and D.J. Balding. Statistical Analysis of DNA Fingerprint Data for Ordered Clone Physical Mapping of Human Chromosomes. Bulletin of Mathematical Biology Vol. 53, No. 6, pp 853-879, 1991.

[TD88] P. Tuffery, P. Dessen, C. Mugnier and S. Hazout. Restriction Map Construction Using a Complete Sentences Compatibility Algorithm. Computer Applications in the Biosciences Vol. 4, No. 1, pp 103-110, 1988.

[TO92] K. Tynan, A. Olsen, B. Trask, et al. Assembly and Analysis of Cosmid Contigs in the CEA-Gene Family Region of Human Chromosome 19. Nucleic Acids Research Vol. 18, No. 9, pp 2653-2660, 1990.

[W91] C. Wills. Exons, Introns and Talking Genes. Basic Books, 1991.

[WG86] M.S. Waterman and J.R. Griggs. Interval Graphs and Maps of DNA. Bull. of Math. Biol. Vol. 48, No. 2, pp 189-195, 1986.

[WH87] J. Watson, N. Hopkins, J. Roberts, J. Steitz and A. Weiner. Molecular Biology of the Gene, Fourth Edition. Benjamin/Cummings, 1987.
