Encoding Information For DNA Computing: Shinnosuke Seki

Purpose
What's an advantage of encoding?
To make a good or tractable code set for DNA computing.
Development of polynomial-time algorithms which decide whether a given code set is good or bad.
Claude Elwood Shannon
The father of information theory (Shannon's entropy)
Showed that Boolean algebra with binary arithmetic makes it possible to simplify electromechanical relay circuits
In "A Mathematical Theory of Communication" [Sha48], he showed that we can send information almost error-free even over a noisy channel.
Chess program using a minimax evaluation procedure
etc.
Shannon's information channel
[Figure: sender -> encoder -> channel of capacity C (with positive and negative noise) -> decoder -> receiver; information flow R]
R > C: overflow.
R ≤ C: we can make the error rate as small as we like.
To attain R = C in the noisy channel, we need to find a good code.
Biological perspective
Every biological reaction can be modeled as an information channel.
Example: the case of heredity
[Figure: parent DNA -> heredity -> child DNA, with natural selection and mutation in place of the noise sources]
For billions of years, Mother Nature has developed a wonderful code system?
Biology -> Computer Science
Review: in vitro DNA computing
1. Encode a given problem into single- or double-stranded DNAs (ssDNAs, dsDNAs).
2. Compute by a succession of bio-operations.
3. Decode the resulting solution and extract its output.
Review: WK-complementarity
Hydrogen bonds form between A and T, and between C and G.
Two strands which are
1. complementary to each other
2. with opposite directions
can form a (complete) dsDNA.
Example
5' - A T C G G T C A A C T G C C C T A A T G - 3'
3' - T A G C C A G T T G A C G G G A T T A C - 5'
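Adleman's first trial
Find a solution to the Hamiltonian path problem in a test-tube solution, in a number of steps polynomial in the size of the input graph.
The solution is filled with encoding oligonucleotides.
[Figure: a directed graph on the vertices 1, 2, 3, 4 and its encoding]
Vertex 1: ACG CTT   Vertex 2: ATA GAT   Vertex 3: CGG TTA   Vertex 4: ACT TAA
Edge 1 -> 2: GAA TAT   Edge 2 -> 3: CTA GCC   Edge 3 -> 4: AAT TGA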
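The edge codes listed above can be reproduced with a small sketch. It assumes, as the listed codes suggest, that each edge oligo is the base-wise complement of the second half of the source vertex code followed by the first half of the target vertex code. The names `complement` and `edge_oligo` are illustrative and do not come from the slides.

```python
# Watson-Crick pairing: A-T and C-G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-wise complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand)


def edge_oligo(source: str, target: str) -> str:
    """Oligo that bridges the 2nd half of `source` and the 1st half of `target`.

    Assumption: this splitting rule is inferred from the example codes.
    """
    return complement(source[len(source) // 2:] + target[:len(target) // 2])


vertices = {1: "ACGCTT", 2: "ATAGAT", 3: "CGGTTA", 4: "ACTTAA"}
edges = [(1, 2), (2, 3), (3, 4)]
edge_codes = {(u, v): edge_oligo(vertices[u], vertices[v]) for u, v in edges}
```

An edge oligo therefore hybridizes with the tail of one vertex code and the head of the next, which is what splices vertex codes into a strand that spells out a path.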
What's a good code set?
Each code word (oligonucleotide) shouldn't form any undesirable structure.
[Figure: code word 2 (ATA GAT) folded back on itself, with ATA aligned against TAG]
This may make the code word inert.
Code words shouldn't interact with each other in an undesirable way.
Structure formation is due to
WK-complementarity
Gibbs free energy
What's a good code set? (cont.)
Uniform melting temperature
Preventing undesirable hybridizations
Other constraints
Avoiding repeated bases
Forbidden subsequences
When a restriction enzyme is used, its recognition site should appear only at intended sites
Using only 3 types of nucleotides: A, C, T
Melting temperature
The melting temperature Tm of a dsDNA is the temperature at which half of the dsDNAs are denatured.
The higher Tm is, the more stable the dsDNA is.
$$T_m = \frac{\Delta H}{\Delta S + R \ln(C_t / \alpha)}$$
R: gas constant,
Ct: total oligo concentration,
ΔH & ΔS: enthalpy & entropy,
α: 1 for self-complementary and 4 for non-self-complementary strands
Nearest-neighbor method
Refer to [AlSa97], [TKY04] ([8], [9] in this table)
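As a rough illustration of the melting temperature formula, the sketch below evaluates Tm from given values of the enthalpy and entropy. The function name, the unit choices (cal/mol, cal/(mol K), mol/L, Kelvin) and the sample numbers are assumptions made for this example; in practice ΔH and ΔS would come from nearest-neighbor parameter tables.

```python
import math

R_GAS = 1.987  # gas constant in cal / (mol K)


def melting_temperature(delta_h, delta_s, total_conc, self_complementary=False):
    """Tm in Kelvin from dH (cal/mol), dS (cal/(mol K)) and Ct (mol/L)."""
    alpha = 1.0 if self_complementary else 4.0
    return delta_h / (delta_s + R_GAS * math.log(total_conc / alpha))


# Illustrative (not measured) values for a short duplex.
tm = melting_temperature(-60000.0, -165.0, 1e-6)
tm_self = melting_temperature(-60000.0, -165.0, 1e-6, self_complementary=True)
```

With these sample inputs, the only difference between the two calls is the factor α inside the logarithm.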
Melting temperature (cont.)
Making Tm uniform across code words eliminates hybridization bias.
GC content
The ratio of the number of Gs and Cs to the total number of nucleotides in a sequence
A G-C pair is more stable than an A-T pair.
Higher GC content implies higher Tm.
Sequences are designed with 50% GC content.
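The GC content defined above is straightforward to compute. A minimal sketch follows; the function name `gc_content` is chosen here for illustration.

```python
def gc_content(sequence: str) -> float:
    """Fraction of the nucleotides in `sequence` that are G or C."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in sequence) / len(sequence)


# Vertex codes 1 and 2 from the Hamiltonian path example.
gc_vertex_1 = gc_content("ACGCTT")
gc_vertex_2 = gc_content("ATAGAT")
```

A design tool would typically reject candidates whose GC content falls outside a window around the 50% target.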
Gibbs free energy (ΔG)
A well-known indicator of stability for DNA structures
A structure with lower ΔG is more stable.
The ΔG of the entire structure is the sum of the ΔG of each of its substructures [ZuSt81].
[Figure: examples of secondary structures]
Template method [ArKo02]
Prepare 2 bit sequences, each of which has some desirable property (e.g., 50% GC content, error correction).
Using a conversion rule, we construct one DNA sequence from these 2 bit sequences.
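The combination step can be sketched as follows. The conversion rule used here is an assumption for illustration: a template bit 1 marks a G/C position and 0 an A/T position, and the second bit sequence picks one of the two bases. The slides do not spell out the rule, so the table `GC_RULE` should be read as hypothetical.

```python
# Assumed rule: (template bit, map bit) -> nucleotide.
GC_RULE = {(0, 0): "A", (0, 1): "T", (1, 0): "C", (1, 1): "G"}


def combine(template: str, codeword: str, rule=GC_RULE) -> str:
    """Merge two equal-length bit strings into one DNA sequence."""
    if len(template) != len(codeword):
        raise ValueError("template and codeword must have the same length")
    return "".join(rule[(int(t), int(m))] for t, m in zip(template, codeword))


sequence = combine("110100", "010110")
```

Under this rule the number of G/C positions equals the number of 1s in the template, so a template with as many 0s as 1s fixes the GC content at 50% for every codeword.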
Template method (cont.)
Design criteria
Template
A template x should have at least d mismatches with x^R, xx, x^R x^R, x x^R, and x^R x, where x^R is the reverse of x.
A good template is found by exhaustive search.
Map (error-correcting code)
A code whose words have at least k mismatches with each other.
e.g. BCH code
Drawback
It cannot prevent sequences from forming secondary structures.
AG-templates, GC-templates [KKA03]
GC-template
The template contains the same number of 0s and 1s (50% GC content).
The map is an error-correcting code.
AG-template
The map is a constant-weight code (50% GC content).
This results in a bigger set of sequences.
Other approaches
DNASequenceGenerator [FBR00]
Software with a GUI
Creates sequences with a specified melting temperature and GC content, and without palindromes, start codons, or restriction sites.
Other approaches
Suyama's approach [YoSu00]
Generate sequences randomly, and add a sequence to the sequence set iff it satisfies all of the following constraints:
No mis-hybridization
No formation of stable secondary structure
The drawback is that it easily falls into local optima.
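A generate-and-test loop of this kind might look like the sketch below. The two constraints are simplified stand-ins chosen for this example, not the checks used in [YoSu00]: mis-hybridization is approximated by a minimum Hamming distance to every accepted word and to every reverse complement, and stable secondary structure by the presence of a short stem. All names and parameter values are illustrative.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def revcomp(s):
    return "".join(COMPLEMENT[b] for b in reversed(s))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def forms_stem(w, stem):
    """True if a factor of length `stem` is followed later by its reverse complement."""
    return any(revcomp(w[i:i + stem]) in w[i + stem:]
               for i in range(len(w) - stem + 1))


def acceptable(cand, code_set, min_dist, stem):
    if forms_stem(cand, stem) or hamming(cand, revcomp(cand)) < min_dist:
        return False
    return all(hamming(cand, w) >= min_dist
               and hamming(cand, revcomp(w)) >= min_dist
               for w in code_set)


def generate_and_test(n_words, length, min_dist, stem, seed=0, max_tries=10000):
    rng = random.Random(seed)  # seeded so that runs are reproducible
    code_set = []
    for _ in range(max_tries):
        if len(code_set) == n_words:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if acceptable(cand, code_set, min_dist, stem):
            code_set.append(cand)  # accepted words are never revised
    return code_set


codes = generate_and_test(n_words=5, length=12, min_dist=4, stem=4)
```

Because accepted words are never revised, an unlucky early choice can block many later candidates, which is one way to see why the method gets stuck in local optima.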

Other approaches
Hybrid randomized neighborhoods [TuHo03]
Stochastic local search (SLS) algorithm
With some probability p, it searches neighbors by randomly mutating the current best sequences.
With probability 1 - p, it moves in the direction that maximally decreases the number of constraint conflicts.
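The mix of random and greedy moves can be sketched as below. This is a simplified illustration rather than the algorithm of [TuHo03]: the only constraint modeled is a minimum pairwise Hamming distance, a move changes a single base of one randomly chosen word, and the names `conflicts`, `sls` and the parameter `p` are assumptions of this example.

```python
import random
from itertools import combinations

BASES = "ACGT"


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def conflicts(words, min_dist):
    """Number of word pairs that are closer than min_dist."""
    return sum(hamming(a, b) < min_dist for a, b in combinations(words, 2))


def sls(words, min_dist, p=0.2, steps=500, seed=0):
    rng = random.Random(seed)
    current = list(words)
    best, best_score = list(current), conflicts(current, min_dist)
    for _ in range(steps):
        if best_score == 0:
            break
        i = rng.randrange(len(current))
        word = current[i]
        if rng.random() < p:
            # Random step: mutate one position of word i.
            pos = rng.randrange(len(word))
            current[i] = word[:pos] + rng.choice(BASES) + word[pos + 1:]
        else:
            # Greedy step: take the single-base change with the fewest conflicts.
            trials = []
            for pos in range(len(word)):
                for base in BASES:
                    trial = current[:]
                    trial[i] = word[:pos] + base + word[pos + 1:]
                    trials.append((conflicts(trial, min_dist), trial))
            current = min(trials, key=lambda t: t[0])[1]
        score = conflicts(current, min_dist)
        if score < best_score:
            best, best_score = list(current), score
    return best, best_score


start = ["AAAAAA"] * 4
best_words, best_conflicts = sls(start, min_dist=3)
```

The random steps are what allow the search to leave a configuration from which no single greedy change reduces the number of conflicts.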
Other approaches
GA (genetic algorithm)-based approach [ANH00]
Use a GA to search for sequences, evaluating the fitness of candidate solutions
Fitness criteria
Restriction sites
GC-content
Hamming distance
Same base repetition

Other approaches
Gibbs free energy-based approach
Taking thermodynamics into consideration
Gibbs free energy as a stability measure
Advantage
Greater accuracy because it takes into account the stability of loops and the stacking between base pairs
Disadvantage
More computational time to calculate free energy
How to decrease this computational complexity? See [TKY05], [KNO08]
A formal language approach
Design a set of structure-free codes in terms of WK-complementarity.
Advantage
More reliable codes than with the free-energy approach
More efficient algorithms for decision problems
Disadvantage
Need to consider each structure separately.
A formal language approach (cont.)
Abstraction of concepts
{A, C, G, T} -> an alphabet V,
WK-complementarity -> an antimorphic involution θ
Involution
A mapping θ s.t. θ^2 is the identity (symmetry).
Antimorphism
θ(xy) = θ(y)θ(x) (opposite direction).
e.g. θ(TCATCCGATTTCGGG) = CCCGAAATCGGATGA

TCATCCGATTTCGGG
AGTAGGCTAAAGCCC
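On DNA words, the antimorphic involution is the familiar reverse complement. The sketch below implements it and checks the two defining properties on the example; the function name `theta` mirrors the symbol for the involution and is otherwise an arbitrary choice.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Watson-Crick involution: complement every base and reverse the word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


example = theta("TCATCCGATTTCGGG")

# Involution: applying theta twice gives the word back.
is_involution = theta(theta("TCATCCGATTTCGGG")) == "TCATCCGATTTCGGG"

# Antimorphism: theta(xy) = theta(y) theta(x).
x, y = "TCATCC", "GATTTCGGG"
is_antimorphic = theta(x + y) == theta(y) + theta(x)
```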
Bond-free properties [KKS05]
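For a language L over the alphabet Σ, with λ the empty word:
θ-non-overlapping: $L \cap \theta(L) = \emptyset$
θ-compliant: $\forall w \in L,\ x, y \in \Sigma^*:\ w, x\theta(w)y \in L \Rightarrow xy = \lambda$
Strictly (a): a property (a) together with θ-non-overlapping
θ-p-compliant: $\forall w \in L,\ y \in \Sigma^*:\ w, \theta(w)y \in L \Rightarrow y = \lambda$
θ-s-compliant: $\forall w \in L,\ x \in \Sigma^*:\ w, x\theta(w) \in L \Rightarrow x = \lambda$
θ-free: $L^2 \cap \Sigma^+ \theta(L) \Sigma^+ = \emptyset$
θ-sticky-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*:\ wx, y\theta(w) \in L \Rightarrow xy = \lambda$
θ-3'-overhang-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*:\ wx, \theta(w)y \in L \Rightarrow xy = \lambda$
θ-5'-overhang-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*:\ xw, y\theta(w) \in L \Rightarrow xy = \lambda$
θ-overhang-free: both of these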
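For a finite code set, the first two properties can be checked directly from their definitions, reading the involution as the reverse complement. The sketch below is a brute-force illustration for finite sets only; the function names are chosen for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def is_nonoverlapping(language):
    """No word of L is the involution image of a word of L."""
    language = set(language)
    return language.isdisjoint(theta(w) for w in language)


def is_compliant(language):
    """No theta(w), w in L, is a proper factor of a word of L."""
    language = set(language)
    return not any(theta(w) in u and u != theta(w)
                   for w in language for u in language)


def is_strictly_compliant(language):
    return is_compliant(language) and is_nonoverlapping(language)


good = {"AAC", "ACC"}
bad = {"AAC", "GGTTC"}   # theta("AAC") = "GTT" occurs inside "GGTTC"
palindromic = {"ACGT"}   # theta("ACGT") = "ACGT"
```

Here `bad` fails compliance because the image of one code word sits inside another, and `palindromic` fails the non-overlapping condition because its only word is its own reverse complement.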

Decidability [KKS05]
Theorem
The following problem is decidable in quadratic time w.r.t. |A|:
Input: an NFA A,
Output: Yes/No depending on whether L(A) satisfies any of the following properties (or their strict versions):
θ-compliant, θ-p-compliant, θ-s-compliant,
θ-sticky-free,
θ-3'-overhang-free, θ-5'-overhang-free, θ-overhang-free.
Decidability and maximality [KKS05]
Theorem
Let M be a regular language and let L be a regular subset of M with a property P, where P is one of the following:
θ-compliant,
θ-p-compliant,
θ-s-compliant, or
θ-sticky-free.
Then it is decidable whether L is a maximal subset of M satisfying P.
Secondary structure prevention
Secondary structures:
Hairpin-loop (or simply hairpin)
Internal loop
Multiple-branch loop
Pseudoknot
They can be undesirable,
e.g. for Adleman's encoding technique for the Hamiltonian Path Problem (HPP).
Secondary Structures
[Figure: a strand (5' to 3') folded into a hairpin, an internal loop, and a hairpin frame (multiple loop)]
Hairpin-free language
A formal model of a hairpin: x v y θ(v) z.
TAA---ACG---CGTTA---CGT---CGGT
 x     v      y    θ(v)    z
Hairpin freeness
Intuitively, it's almost impossible to prevent hairpins of short stack length (say 2 or 3).
Our desire is to prevent any hairpin of stack length no less than some given parameter k.
Hairpin-free language [KKL06]
A word w is (θ, k)-hairpin-free (abbr. hp(θ, k)-free) iff $w = x v y \theta(v) z \Rightarrow |v| < k$.
hpf(θ, k): the set of all hp(θ, k)-free words in Σ*, where Σ is the alphabet
hp(θ, k) = Σ* - hpf(θ, k)
A language L is called (θ, k)-hairpin-free iff $L \subseteq \mathrm{hpf}(\theta, k)$.
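For a single word, or a finite set of words, hairpin-freeness can be tested directly from the definition, reading the involution as the reverse complement. The sketch relies on the observation that a stem of length at least k always contains a stem of length exactly k, so only factors v of length k are tried. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def has_hairpin(word, k):
    """True if word = x v y theta(v) z for some factor v with |v| = k."""
    return any(theta(word[i:i + k]) in word[i + k:]
               for i in range(len(word) - k + 1))


def is_hairpin_free(language, k):
    """A finite language is hairpin-free iff each of its words is."""
    return not any(has_hairpin(w, k) for w in language)


# The example word TAA ACG CGTTA CGT CGGT, where theta(ACG) = CGT.
example_has_stem = has_hairpin("TAAACGCGTTACGTCGGT", 3)
```

The regular-language results cited on these slides handle infinite languages through automata; this brute-force check only covers the finite case.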
Regularity of hairpin languages
$$\mathrm{hp}(\theta, k) = \bigcup_{|w| \ge k} X^* w X^* \theta(w) X^*$$
where X is the alphabet.
[Figure: a word of the form X* w X* θ(w) X*]
hp(θ, k) and hpf(θ, k) are regular.
In particular, there exists a finite automaton M s.t. L(M) = hpf(θ, k).
Hairpin Freedom Problems
Hairpin-Freedom problem
Input: A nondeterministic automaton M,
Output: Y/N depending on whether L(M) is hp(θ, k)-free.
Maximal Hairpin-Freedom problem
Input: A deterministic automaton M1 and an NFA M2.
Output: Y/N depending on whether there is a word $w \in L(M_2) \setminus L(M_1)$ s.t. $L(M_1) \cup \{w\}$ is hp(θ, k)-free.
Decidability
The hairpin-freedom problem for regular languages is decidable in O(|M|) time.
The maximal hairpin-freedom problem for regular languages is decidable in O(|M1| · |M2|) time.
Hairpin Frames
So-called multiple loop
hp-frame of degree n:
$x_1 v_1 y_1 \theta(v_1) z_1 \cdots x_n v_n y_n \theta(v_n) z_n$
The figure is an example of an hp-frame of degree 3.
A word u is an hp(fr, j)-word if it contains an hp-frame of degree j.
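The largest degree of an hp-frame contained in a given word can be found greedily, by cutting the word as soon as the current piece contains a hairpin. The sketch assumes that every stem in the frame must have length at least k; this parameter, like the function names, is an assumption of the example, since the slide does not state a stem length for frames.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word):
    return "".join(COMPLEMENT[b] for b in reversed(word))


def has_hairpin(word, k):
    return any(theta(word[i:i + k]) in word[i + k:]
               for i in range(len(word) - k + 1))


def max_frame_degree(word, k):
    """Largest n such that `word` contains an hp-frame of degree n."""
    degree, start = 0, 0
    for end in range(1, len(word) + 1):
        # Close the current piece at the earliest position where it
        # contains a hairpin, then start looking for the next one.
        if has_hairpin(word[start:end], k):
            degree += 1
            start = end
    return degree


degree_two = max_frame_degree("ACGTACGT", 2)
degree_zero = max_frame_degree("AAAA", 2)
```

A word is then an hp(fr, j)-word, under the assumed stem length, exactly when `max_frame_degree` returns at least j.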
Regularity & decidability
hp(θ, fr, j): the set of all hp(fr, j)-words in Σ*, where Σ is the alphabet
hpf(θ, fr, j): its complement in Σ*
The languages hp(θ, fr, j) & hpf(θ, fr, j) are regular.
The hp(fr, j)-freedom problem is decidable in linear time.
The maximal hp(fr, j)-freedom problem is decidable in O(|M1| · |M2|) time.
Application : DNA-HRAMs
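[Figure: a 1-bit DNA-HRAM in its two states, labeled 0 and 1: the strand --A-C-T-G-T-C-G-A-C-A-G-T-- folded into a hairpin with stem pairs A-T, C-G, T-A, G-C, T-A, C-G (closing), and the same strand unfolded (opening)]
An n-bit DNA-HRAM consists of n hairpins.
Each hairpin stores 1 bit of information by forming (closing) or undoing (opening) a hairpin as shown above.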
n-bit DNA-HRAM
A concatenation of n 1-bit RAMs, which is equivalent to an hp-frame of degree n:
$x_1 v_1 y_1 \theta(v_1) z_1 \cdots x_n v_n y_n \theta(v_n) z_n$
In order for this word to work as an n-bit RAM, the following subword should be hp(θ, 20)-free:
$x_1 v_1 y_1 z_1 \cdots x_n v_n y_n z_n$
A DNA memory with 4 hairpins was proposed in [KYO08].
Reference
[AlSa97] Allawi, H.T., SantaLucia, J.: Thermodynamics and NMR of internal
G·T mismatches in DNA. Biochemistry 36(34) (1997) 10581-10594
[ArKo02] Arita, M., Kobayashi, S.: DNA sequence design using templates.
New Generation Computing 20 (2002) 263-277
[ANH00] Arita, M., Nishikawa, A., Hagiya, M., Komiya, K., Gouzu, H.,
Sakamoto, K.: Improving sequence design for DNA computing. Proc. Genetic
and Evolutionary Computation Conference (2000) 875-882.
[FBR00] Feldkamp, U., Saghafi, S., Rauhe, H.: A DNA sequence compiler.
Proc. DNA6, (2000)
[KKS05] Kari, L., Konstantinidis, S., Sosik, P.: Preventing undesirable bonds
between DNA codewords. Proc. DNA10, LNCS 3384 (2005) 182-191.
[KKL06] Kari, L., Konstantinidis, S., Losseva, E., Sosik, P., Thierrin, G.: A
formal language analysis of DNA hairpin structures. Fundamenta
Informaticae 71 (2006) 453-475
[KKA03] Kobayashi, S., Kondo, T., Arita, M.: On template method for DNA
sequence design. Proc. DNA8, LNCS 2568 (2003) 205-214
Reference (cont.)
[KNO08] Kawashimo, S., Ng, Y-K., Ono, H., Sadakane, K., Yamashita, M.:
Speeding up local-search type algorithms for designing DNA sequences
under thermodynamical constraints. Proc. DNA14 (2008) 152-161
[KYO08] Kameda, A., Yamamoto, M., Ohuchi, A., Yaegashi, S., Hagiya, M.:
Unravel four hairpins! Natural Computing 7 (2008) 287-298
[RFL01] Ruben, A. J., Freeland, S. J., Landweber, L. F.: PUNCH: An
evolutionary algorithm for optimizing bit set selection. DNA7 (2001) 150-160
[Sha48] Shannon, C.E.: A mathematical theory of communication. Bell
System Technical Journal 27 (1948) 379-423, 623-656
[TKY04] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.:
Thermodynamic parameters based on a nearest-neighbor model for DNA
sequences with a single-bulge loop. Biochemistry 43(22) (2004) 7143-7150
[TKY05] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Design of
nucleic acid sequences for DNA computing based on a thermodynamic
approach. Nucleic Acids Res. 33(3) (2005) 903-911
Reference (cont.)
[TuHo03] Tulpan, D., Hoos, H.: Hybrid randomised neighbourhoods improve
stochastic local search for DNA code design. In Advances in Artificial
Intelligence: 16th Conference of the Canadian Society for Computational
Studies of Intelligence, 2671 (2003) 418-433
[YoSu00] Yoshida, H., Suyama, A.: Solution to 3-sat by breadth first search.
Proc. the 5th DIMACS Workshop on DNA Based Computers, 54 (2000) 9-22
[ZuSt81] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA
sequences using thermodynamics and auxiliary information. Nucleic Acids
Res. 9(1) (1981) 133-148
