
ABSTRACT

“Fact is often stranger than fiction”

Can you imagine a world without computers? No, it is simply an absurd idea! Along similar lines, can you
imagine a computer that is ten times as fast as current supercomputers and runs on DNA instead of your humdrum
electronic microprocessor?
Say hello to DNA supercomputing, the next avatar of the molecule called DNA.

“Deoxyribonucleic acid”, known by the acronym DNA, is a biological molecule present in all living
organisms that holds the genetic information they require to exist. The DNA molecule forms the basis for this
new approach to computation.
In this approach, the problem to be solved is coded into the structure of the DNA molecule (this may be considered
analogous to programming it), after which the molecule can be used to compute; the resulting structure of the molecule is
interpreted as a solution to the problem that was coded into the original DNA molecules.
Although this field is very new and still largely theoretical in nature, research in DNA computing has shown that the
molecule is a powerful tool for parallel-processing applications and, owing to its size, a natural fit for nanotechnology
applications. It also offers enormous effective processing speed through its capacity for replication.

DNA computing has many promising attributes that give it extremely useful applications in the following areas:

1. Parallel processing
2. Miniaturization of data storage
3. Speed
4. Overcoming the limitations of silicon-based technology
DNA is a remarkable molecule because of its enormous data density. Just as a string of binary data is
encoded with ones and zeros, the strands of DNA are encoded with the four bases, which are spaced every 0.35
nanometers along the molecule, giving DNA a data density of nearly 18 Mbits per inch. In two dimensions, if you assume one base
per square nanometer, the data density is over one million Gbits per square inch. Compare this with the data density of a
typical high-performance hard disk drive, which is about 7 Gbits per square inch -- a factor of over 100,000 smaller.

Talk of computers and the picture of a box full of chips and electronic circuitry instantly comes to mind. This
is because nowadays a computer and silicon are treated as synonymous. But in my opinion, the scene could start to change
within another decade or so. Research in this area has already started to bear fruit. Recently the first commercial DNA
computer, which specialises in gene analysis, was unveiled by the company Olympus Optical; on another front,
Israeli researchers at the Weizmann Institute have developed a DNA computer that can execute 765 programs.
Although these achievements appear minuscule compared with the processing capabilities at our
disposal, DNA has many advantages over traditional silicon, and the day may not be far off when the desktop on your
table runs on DNA.

DNA Computing
In these days of Deep Blue, Deeper Blue, Hyper-Threading, 64-bit computing, Itanium and, one which most of us can
relate to, the Pentium 4, I have decided to bring before you an obscure but promising addition to computational science:
DNA computing.
The topic of this paper may seem rather out of the ordinary, and in the course of it I hope to impress upon
you just how novel it is. The first question that arises is: what is DNA computing, and, if the title is accurate, how is
DNA even remotely connected to computing?
Deoxyribonucleic acid, known by the acronym DNA, is a biological molecule present in all living organisms,
responsible for holding the genetic information they require to exist.

What Is DNA Computing?

DNA computing is a brand-new field involving interdisciplinary research between molecular biology and
computer science, and it comprises both practical and theoretical work. The theoretical research is mainly concerned with
developing formal models for biological phenomena, while the practical research involves realising the
theoretical work in the laboratory.
DNA forms the basis for this new approach to computation, in which the problem to be solved is coded into the
structure of the DNA molecule (this may be considered analogous to programming the DNA). The molecule can then be
used to compute, and its resulting structure is interpreted as a solution to the problem that was coded into
the original DNA molecules.

History Of DNA Computing


Computation using DNA is by no means an overnight phenomenon; its roots lie a little more
than a decade in the past. The earliest known papers relating to theoretical models appeared in 1987; one was titled
"Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors". Starting
from observations of the structure and dynamics of DNA, such theoretical research began to propose formal models
(that is, models with rules for performing the proposed theoretical operations) for DNA computers.
Still, all the research done until then was purely theoretical, and none of it had been demonstrated as
practically possible until November 1994, when Science published a paper titled "Molecular Computation of
Solutions to Combinatorial Problems", the result of research into unconventional computing techniques by
Leonard M. Adleman, a professor of computer science at USC.
In 1994, Leonard M. Adleman solved an ordinary, unremarkable computational problem with a radically
new technique. The problem was one which a person could solve in a few moments, or an average
desktop computer could solve in the blink of an eye. It took Adleman, however, seven whole days to find a solution.
Why then was this work exceptional? Because he solved the problem using DNA. It was a landmark demonstration of
computing at the molecular level.
The type of problem that Adleman solved is formally known as a directed Hamiltonian Path (HP) problem, but
is more popularly recognized as a variant of the so-called "travelling salesman problem". In Adleman's version of the
travelling salesman problem, a hypothetical salesman tries to find a route through a set of cities such that he visits each
city only once. As the number of cities increases, the problem becomes computationally expensive very quickly,
making large instances impractical to solve even on the latest supercomputer. Adleman's demonstration involved
only seven cities, making it in some sense a trivial problem that can easily be solved by inspection. Nevertheless, his work is
significant for a number of reasons:
• It illustrates the possibility of using DNA to solve a class of problems that is difficult or impossible to solve
using traditional computing methods.
• It is an example of computation at a molecular level, a size limit that may never be reached by silicon-based
semiconductor chips.
• It demonstrates unique aspects of DNA as a data structure.
• It demonstrates that computing with DNA can work in a massively parallel fashion.

DNA: A Unique Data Structure


The amount of information gathered on the molecular biology of DNA over the past half century or so is
overwhelming in scope. Rather than going into the biochemical and biological aspects of DNA, we will concentrate only
on the information relevant to DNA computing.

Structure Of DNA

The structure of the DNA molecule, in itself one of nature's marvels, is that of a double helix with cross-linkages
between the two strands. The strands are built from a set of four bases, namely adenine, thymine, cytosine and
guanine (also known as nucleotides, and represented by the letters A, T, C and G).

We can see the structure diagrammatically as shown below:

[Figure: the DNA double helix]

Data Density

As we can observe, the structure of the molecule is quite unique and has traits that immediately lend
themselves to data storage. Just as a string of binary data is encoded with ones and zeros, the strands of DNA are
encoded with the four bases. The bases are spaced every 0.35 nanometers along the DNA molecule, giving DNA a
remarkable data density of nearly 18 Mbits per inch. In two dimensions, if you assume one base per square nanometer,
the data density is over one million Gbits per square inch. Compare this with the data density of a typical high-performance
hard drive, which is about 7 Gbits per square inch -- a factor of over 100,000 smaller. The data density of DNA is
clearly impressive.
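
These figures are easy to sanity-check. The short Python sketch below reproduces the two-dimensional arithmetic under the stated assumptions: one base per square nanometer and two bits of information per base (each position holds one of four bases, i.e. log2(4) = 2 bits), with the 7 Gbits per square inch hard-disk figure quoted above.

    # Back-of-the-envelope check of the DNA data-density figures.
    NM_PER_INCH = 2.54e7        # 1 inch = 2.54 cm = 2.54e7 nanometers
    BITS_PER_BASE = 2           # four possible bases -> log2(4) = 2 bits

    bases_per_sq_inch = NM_PER_INCH ** 2              # one base per nm^2
    dna_bits_per_sq_inch = bases_per_sq_inch * BITS_PER_BASE
    hdd_bits_per_sq_inch = 7e9                        # ~7 Gbits per sq inch

    print(f"DNA: ~{dna_bits_per_sq_inch / 1e9:,.0f} Gbits per square inch")
    print(f"Ratio: ~{dna_bits_per_sq_inch / hdd_bits_per_sq_inch:,.0f}x")
    # DNA: ~1,290,320 Gbits per square inch (over one million, as stated)
    # Ratio: ~184,331x (a factor of over 100,000)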
Double Strands

Another important property of DNA is its double-stranded nature. The bases A and T, and likewise C and G, bind
together, forming base pairs. Hence every DNA sequence has a natural complement. For example, if sequence S is
ATTACGTCG, its complement, S', is TAATGCAGC. S and S' will come together (or hybridize) to form double-stranded
DNA.
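
The complement operation itself is trivial to express in software. Here is a minimal Python sketch using the text's base-by-base convention (biologically, strands pair in antiparallel orientation, so the hybridizing strand is the reverse of this complement, but the pairing rule is the same):

    # Watson-Crick pairing: A <-> T, C <-> G.
    COMPLEMENT = str.maketrans("ATCG", "TAGC")

    def complement(seq: str) -> str:
        """Return the base-by-base complement of a DNA sequence."""
        return seq.translate(COMPLEMENT)

    assert complement("ATTACGTCG") == "TAATGCAGC"   # S and S' from the text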
This complementarity makes DNA a unique data structure for computation and can be exploited in many
ways. Error correction is one example. Errors in DNA arise from many sources: occasionally, DNA enzymes
simply make mistakes, cutting where they shouldn't or inserting a T for a G, and DNA can also be damaged by thermal
energy and ultraviolet radiation from the sun. If an error occurs in one of the strands of double-stranded DNA, repair enzymes
can restore the proper DNA sequence by using the complementary strand as a reference.
In this sense, double-stranded DNA is similar to a RAID 1 array, where data is mirrored on two drives,
allowing data to be recovered from the second drive if errors occur on the first. In biological systems, this facility for
error correction keeps the error rate quite low: in DNA replication, there is roughly one error for every
10^9 copied bases, or in other words an error rate of 10^-9. In comparison, hard drives have read error rates of about
10^-13.

Operations in Parallel

In a cell, DNA is modified biochemically by a variety of enzymes, which can be visualised as tiny
protein machines that read and process DNA according to nature's design. There is a wide variety of these
"operational" proteins, which manipulate DNA at the molecular level. For example, there are enzymes that cut DNA
and enzymes that paste it back together; other enzymes function as copiers, and still others as repair units. Molecular
biology, biochemistry and biotechnology have developed techniques that allow us to perform many of these cellular
functions in test tubes.
Note that in the test tube, enzymes do not function sequentially, working on one DNA molecule at a time. Rather,
many copies of an enzyme can work on many DNA molecules simultaneously. This is the power of DNA computing:
it can work in a massively parallel fashion.
It is this cellular machinery, along with some synthetic chemistry, that makes up the array of operations
available for computation. Just as a CPU has a basic collection of operations, such as addition, bit-shifting and logical
operators (AND, OR, NOT, NOR), that allow it to perform even the most complex calculations, DNA has cutting,
copying, pasting, repairing and many others.

DNA vs. Silicon: A Comparison

DNA, with its unique data structure and ability to perform many parallel operations, allows you to look at a
computational problem from a different point of view. Transistor-based computers typically handle operations in a
sequential manner. Of course there are multi-processor computers, and modern CPUs incorporate some parallel
processing, but in general, in the basic von Neumann architecture (which is what all modern CPUs follow),
instructions are handled sequentially.
A von Neumann machine basically repeats the same "fetch and execute" cycle over and over again: it fetches
an instruction and the appropriate data from main memory, and it executes the instruction. It does this many, many
times in a row, really, really fast. The great physicist Richard Feynman, in his Lectures on Computation, summed up
von Neumann computers by saying, "the inside of a computer is as dumb as hell, but it goes like mad!"
DNA computers, however, are non-von Neumann, stochastic machines that approach computation from a
different perspective than ordinary computers, for the purpose of solving a different class of problems.
Typically, increasing the performance of silicon computing means faster clock cycles and larger data paths,
where the emphasis is on the speed of the CPU rather than the size of the memory. For example, which would give you
better performance: doubling the clock speed or doubling your RAM?
For DNA computing, though, the power comes from memory capacity and parallel processing. If forced
to behave sequentially, DNA loses its appeal. For example, consider the read and write rate of DNA. In bacteria,
DNA can be replicated at a rate of about 500 base pairs a second. Biologically this is quite fast (ten times faster than in
human cells) and, considering the low error rates, an impressive achievement. But this is only 1000 bits/sec, which is a
snail's pace compared with the data-transfer speed of an average hard drive.
But look what happens if you allow many copies of the replication enzymes to work on the DNA in parallel. First
of all, replication enzymes can start on the second replicated strand of DNA even before they have finished copying
the first one, so the data rate already jumps to 2000 bits/sec.
It is interesting to observe what happens as each replication finishes: the number of DNA strands
increases exponentially (2^n after n iterations), and with each additional strand the aggregate data rate increases by
1000 bits/sec. So after 10 iterations, the DNA is being replicated at a rate of about 1 Mbit/sec; after 30 iterations it
increases to 1000 Gbits/sec, beyond the sustained data rates of the fastest hard drives. Plotted over time, the DNA
replication rate follows an exponential curve, while the data-transfer rate of any hard disk drive, even the fastest,
naturally remains constant.
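
The arithmetic behind these rates is simple to reproduce. A small Python sketch, assuming (as above) 1000 bits/sec per strand and a doubling of the strand count with every iteration:

    RATE_PER_STRAND = 1_000   # bits/sec (500 base pairs/sec x 2 bits per base)

    for n in (0, 1, 10, 30):
        strands = 2 ** n                    # strands double every iteration
        rate = strands * RATE_PER_STRAND    # aggregate copying rate
        print(f"after {n:2} iterations: {strands:>13,} strands, {rate:.3g} bits/sec")

    # after  0 iterations:             1 strands, 1e+03 bits/sec
    # after 10 iterations:         1,024 strands, 1.02e+06 bits/sec (~1 Mbit/sec)
    # after 30 iterations: 1,073,741,824 strands, 1.07e+12 bits/sec (~1000 Gbits/sec)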

To compare how the approaches to solving problems differ between DNA computation and conventional silicon
machines, consider how each would solve a nontrivial instance of the travelling salesman problem (number of cities > 10).
With a von Neumann computer, one naive method would be to set up a search tree, measure each complete branch
sequentially, and keep the shortest one. Improvements could be made with better search algorithms, such as pruning the
search tree when a branch being measured is already longer than the best candidate. A method you
certainly would not use is to first generate all possible paths and then search the entire list.
Why? Well, consider that the entire list of routes for a 20-city problem could theoretically take 45 million
GBytes of memory (18! routes stored as 7-byte words)! Also, a 100 MIPS computer would take about two years just to
generate all the paths (at roughly one instruction cycle per route). Using DNA computing, however, this brute-force
method becomes feasible: 10^15 molecules is just a nanomole of material, a relatively small quantity in
biochemistry. Moreover, routes no longer have to be searched sequentially; operations can all be done in parallel.
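
A quick Python check of these figures, assuming 7 bytes of storage per route and roughly one instruction cycle per generated route:

    import math

    routes = math.factorial(18)            # 18 intermediate cities => 18! routes
    gbytes = routes * 7 / 1e9              # 7-byte word per route
    years = routes / 100e6 / 3.15e7        # 100 MIPS, ~3.15e7 seconds per year

    print(f"{routes:.3g} routes, ~{gbytes:,.0f} GBytes, ~{years:.1f} years")
    # 6.4e+15 routes, ~44,816,616 GBytes (about 45 million), ~2.0 years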
To grasp the enormity of the parallelism involved in DNA computing, it is essential to understand that
each route can be encoded on a separate DNA sequence, and all of those sequences can then be processed simultaneously.
It is easy to agree that this is a radically different approach to parallel processing, and arguably one of the most powerful.
How DNA Computation Works: A Classic Example
The Adleman experiment
There is no better way to understand how something works than by an illustrative example, so here a
directed Hamiltonian Path problem is presented and solved using the DNA methods demonstrated by Adleman. The concepts are
the same, but the example is simplified to make it easier to present.
Let us suppose that I live in Boston and need to visit three more cities: Atlanta, Chicago and Detroit, with Detroit being my
final destination.
The airline I am taking has a specific set of connecting flights that restricts which routes I can take (i.e. there is a
flight from Boston to Atlanta but no flight from Chicago to Atlanta). What should my route be if I want to visit each
city only once?

It should take you only a moment to see that there is just one such route: starting from Boston, you fly to
Atlanta, then Chicago, and then Detroit.

Any other choice of cities will force you to miss a destination, visit a city twice, or fail to end in Detroit. For
this example you obviously don't need the help of a computer to find a solution. For six, seven, or even eight cities, the
problem is still manageable. However, as the number of cities increases, the problem quickly gets out of hand.
Assuming a random distribution of connecting routes, the number of possible paths you need to check increases
exponentially. Pretty soon, checking all the possible routes analytically becomes tedious, cumbersome and ultimately
impossible; it becomes a problem for a computer... or perhaps for DNA.

The method Adleman used to solve this problem is essentially the brute-force approach mentioned previously:
he first generated all the possible paths and then selected the correct one. This is the advantage of DNA. It is small, and
there are combinatorial techniques that can quickly generate many different data strings. Since the enzymes work on
many DNA molecules at once, the selection process is massively parallel and hence tremendously fast.
To be specific, a method based on Adleman's experiment would be as follows:

1) Generate all possible routes.

2) Select routes that start with the proper city and end with the final city.

3) Select routes with the correct number of cities.

4) Select itineraries that contain each city only once.

All of the above steps can be accomplished with standard molecular biology techniques; a software sketch of the same filtering pipeline is given below.
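
Before going through the wet-lab version of each step in Parts I-III, it may help to see the four-stage filter as a toy Python simulation. This is only a software sketch, not the molecular protocol, and the flight map used here is an assumption for illustration (the text specifies only a few of the connections):

    import random

    CITIES = ["Boston", "Atlanta", "Chicago", "Detroit"]
    # Hypothetical flight map, chosen so that Boston->Atlanta->Chicago->Detroit
    # is the unique valid answer.
    FLIGHTS = {("Boston", "Atlanta"), ("Boston", "Detroit"),
               ("Atlanta", "Chicago"), ("Chicago", "Detroit")}

    def random_route(rng, max_hops=5):
        """Step 1: grow a random route along legal connections, the way
        linker strands only hybridize city codes joined by a flight."""
        route = [rng.choice(CITIES)]
        for _ in range(rng.randrange(1, max_hops)):
            nxt = [b for (a, b) in FLIGHTS if a == route[-1]]
            if not nxt:
                break
            route.append(rng.choice(nxt))
        return tuple(route)

    rng = random.Random(0)
    pool = {random_route(rng) for _ in range(100_000)}                    # step 1
    pool = {r for r in pool if r[0] == "Boston" and r[-1] == "Detroit"}   # step 2
    pool = {r for r in pool if len(r) == len(CITIES)}                     # step 3
    pool = {r for r in pool if len(set(r)) == len(r)}                     # step 4
    print(pool)   # {('Boston', 'Atlanta', 'Chicago', 'Detroit')}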
Part I: Generate all possible routes
Process: encode the city names in short DNA sequences, and encode the possible routes by connecting the city
sequences for which connecting flights exist.
DNA can simply be treated as a string of data. For example, each city can be represented by a "word" of six bases:
Boston GCTACG
Atlanta CTAGTA
Chicago TCGTAC
Detroit CTACGG

The entire itinerary can be encoded by simply stringing together the DNA sequences that represent the specific
cities. For example, the route

Boston -> Atlanta -> Chicago -> Detroit

would simply be

GCTACG CTAGTA TCGTAC CTACGG

or, equivalently, it could be represented in double-stranded form with its complementary sequence. Synthesizing short
single-stranded DNA is now a routine process, so encoding the city names is straightforward.
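
In software terms, this concatenation is a one-line operation. A minimal Python sketch using the city "words" from the table above:

    CODES = {"Boston": "GCTACG", "Atlanta": "CTAGTA",
             "Chicago": "TCGTAC", "Detroit": "CTACGG"}

    route = ["Boston", "Atlanta", "Chicago", "Detroit"]
    itinerary = "".join(CODES[city] for city in route)
    print(itinerary)   # GCTACGCTAGTATCGTACCTACGG (4 cities x 6 bases = 24 bases)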
The molecules can be made by a machine called a DNA synthesizer. The possible routes can then be
produced from the city encodings by linking them together in the proper order. To accomplish this, you can take advantage
of the fact that DNA hybridizes with its complementary sequence: the route between two cities is encoded by the
complement of the second half (last three letters) of the departure city's code followed by the first half (first
three letters) of the arrival city's code. For example, the route between Boston (GCTACG) and Detroit (CTACGG) can be
made by taking the second half of the coding for Boston (ACG) and the first half of the coding for Detroit (CTA).
This gives ACGCTA. By taking the complement of this you get TGCGAT, which not only uniquely represents the route
from Boston to Detroit, but will connect the DNA representing Boston and Detroit by hybridizing itself to the second
half of the code representing Boston (...ACG) and the first half of the code representing Detroit (CTA...).
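
The linker construction is equally mechanical. A small Python sketch of it, using the same six-base city codes:

    COMPLEMENT = str.maketrans("ATCG", "TAGC")

    def route_linker(depart_code: str, arrive_code: str) -> str:
        """Complement of (last half of departure code + first half of arrival
        code); this strand glues the two city strands together."""
        return (depart_code[3:] + arrive_code[:3]).translate(COMPLEMENT)

    print(route_linker("GCTACG", "CTACGG"))   # Boston -> Detroit linker: TGCGAT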

Random routes can be made by mixing the city encodings with the route encodings; finally, the DNA strands are
connected together by an enzyme called ligase. What we are left with are strands of DNA representing paths with a
random number of cities and a random set of routes.

We can be confident that we have all possible combinations, including the correct one, by using an excess of
DNA encodings, say 10^13 copies of each city and of each route between cities. DNA is a highly compact data format,
so the numbers are on our side.

Part II: Select routes that start and end with the correct cities
Procedure: Selectively copy and amplify only those sections of DNA that start with Boston and end with Detroit, using
the polymerase chain reaction.
After Part I, we have a test tube full of DNA strands of various lengths that encode possible routes between cities. What
we want are routes that start with Boston and end with Detroit. To accomplish this we can use a technique called the
polymerase chain reaction (PCR), which allows you to produce many copies of a specific sequence of DNA. PCR is
an iterative process that cycles through a series of copying events using an enzyme called polymerase. Polymerase will
copy a section of single-stranded DNA starting at the position of a primer, a short piece of DNA complementary to one
end of the section of DNA that you're interested in. By selecting primers that flank the section of DNA you want to
amplify, the polymerase preferentially amplifies the DNA between these primers, doubling the amount of DNA
containing this sequence. After many iterations of PCR, the target DNA is amplified exponentially. So, to
selectively amplify the routes that start and end with our cities of interest, we use primers that are complementary
to Boston and to Detroit. What we end up with after PCR is a test tube full of double-stranded DNA of various lengths,
encoding itineraries that start with Boston and end with Detroit.
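
The power of PCR lies in this doubling per cycle, which makes the amplification exponential. A trivial Python illustration of the numbers, assuming idealized, perfectly efficient cycles:

    initial_copies = 100                     # assumed starting amount
    for cycles in (10, 20, 30):
        print(f"after {cycles} cycles: ~{initial_copies * 2 ** cycles:.2e} copies")
    # after 10 cycles: ~1.02e+05 copies
    # after 20 cycles: ~1.05e+08 copies
    # after 30 cycles: ~1.07e+11 copies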

Part III: Select itineraries that contain the correct number of cities.

Procedure: Sort the DNA by length and select the DNA whose length corresponds to four cities.
Our test tube is now filled with DNA-encoded itineraries that start with Boston and end with Detroit, where the number
of cities in between Boston and Detroit varies. We now want to select those itineraries that are four cities long. To
accomplish this we can use a technique called gel electrophoresis, a common procedure used to resolve the
size of DNA. The basic principle behind gel electrophoresis is to force DNA through a gel matrix by using an electric
field. DNA is a negatively charged molecule under most conditions, so if placed in an electric field it will be attracted
towards the positive potential. However, since the charge density of DNA is constant (charge per unit length), long pieces
of DNA move as fast as short pieces when simply suspended in a fluid. This is why a gel matrix is used. The gel is made of a
polymer that forms a meshwork of linked strands. The DNA is forced to thread its way through the tiny spaces
between these strands, which slows down the DNA at a rate depending on its length. What we typically end up
with after running a gel is a series of DNA bands, with each band corresponding to a certain length. We can then simply
cut out the band of interest to isolate DNA of a specific length. Since we know that each city is encoded with 6 bases
of DNA, knowing the length of an itinerary gives us its number of cities. In this case we would isolate the DNA
that is 24 base pairs long (4 cities times 6 base pairs).
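
In software terms, cutting out the 24-base band is simply a filter on strand length. A minimal sketch (the candidate strands here are made up for illustration):

    BASES_PER_CITY = 6
    WANTED_CITIES = 4

    strands = ["GCTACGCTAGTATCGTACCTACGG",   # 4 cities: kept
               "GCTACGCTACGG",               # 2 cities: too short
               "GCTACGCTAGTATCGTAC"]         # 3 cities: too short

    selected = [s for s in strands if len(s) == BASES_PER_CITY * WANTED_CITIES]
    print(selected)   # ['GCTACGCTAGTATCGTACCTACGG']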

Applications
To say that DNA computing has not taken off at a practical level would not be correct; although the
field is still largely theoretical, there have been some concrete steps towards a fully functional DNA computer.
The company Olympus Optical has succeeded in making what it claims to be the first commercial DNA computer, which
specializes in the analysis of genes. The development is significant because it makes it possible to perform gene analysis,
which normally takes about three days if done manually, in just six hours.
On another front, Israeli scientists at the Weizmann Institute of Science have developed a biological computing device
which they say can execute 765 different programs. Albeit of a modest nature, these are stepping stones to improved
designs of the DNA computer. The Weizmann researchers say that their device is a step towards more advanced
nano-computers able to reside in the body, detect chemical abnormalities, and then synthesize and release appropriate drugs:
a kind of automated doctor.
Limitations
Adleman's experiment solved a seven-city problem, but there are two major shortcomings preventing a large
scaling-up of his computation. First, the complexity of the travelling salesman problem does not simply disappear when
a different method of solution is applied; it still increases exponentially. For Adleman's method, what scales
exponentially is not the computing time, but rather the amount of DNA. Unfortunately this places some hard
restrictions on the number of cities that can be handled; after the Adleman article was published, more than a few people
pointed out that using his method to solve a 200-city HP problem would take an amount of DNA that weighed
more than the earth. The second factor limiting his method is the error rate of each operation. Since these
operations are not deterministic but stochastic, each step introduces statistical errors, limiting the number of iterations
you can perform in succession before the probability of producing an error becomes greater than that of producing the
correct result. For example, an error rate of 1% is fine for 10 iterations, giving less than 10% cumulative error, but after
100 iterations this error grows to 63%.
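
The error arithmetic is easy to verify, assuming independent errors at 1% per operation:

    p = 0.01                                  # error rate per operation
    for n in (10, 100):
        cumulative = 1 - (1 - p) ** n         # P(at least one error in n steps)
        print(f"{n:3} iterations: {cumulative:.1%} chance of an error")
    # 10 iterations: 9.6% (less than 10%)
    # 100 iterations: 63.4%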
Conclusions
Now that we have seen what DNA computing is presently capable of, it is time for us to draw some
conclusions as to the future, scope and effectiveness of this novel field. It is evident that DNA computing can be
used for very fast computation, but it is still in a nascent stage and still developing. The real question is: will it
someday become as prevalent and widespread as its silicon counterpart, if not replace it entirely? I firmly believe so.
These experiments are just the tip, not of an iceberg, but of the mountain that DNA computing promises to become. These
first demonstrations of DNA computing use rather unsophisticated algorithms, but as the formalism of DNA computing
is refined, new algorithms may one day allow DNA to overtake conventional computation and set new
records.
On the side of the "hardware" (or should I say "wetware"), improvements in biotechnology are happening at a rate
similar to the advances made in the semiconductor industry. Take sequencing: what once took a
graduate student five years to do for a Ph.D. thesis took a company like Celera just one day. Just look at the
number of advances in DNA-related technology over the last five years. Today we have not one but several
companies making "DNA chips", in which DNA strands are attached to a silicon substrate in large arrays (for example,
Affymetrix's GeneChip). Furthermore, the Human Genome Project is driving rapid innovation in sequencing
technology. The future of DNA manipulation is speed, automation and miniaturization.

And of course we are talking about DNA here, the genetic code of life itself. It has certainly been the
molecule of this century, and will most likely be the molecule of the next as well. After considering all the data that
research has been able to furnish, it is not too hard to imagine that one day we might have the tools and talent to produce
a small integrated desktop machine that uses DNA, or a DNA-like biopolymer, as a computing substrate along with a set
of designer enzymes.
Perhaps it won't be used to play Quake IV or surf the web -- things that traditional computers are good at --
but it certainly might be used in the study of logic, encryption, genetic programming and algorithms, automata,
language systems, and lots of other interesting things that haven't even been invented yet.
