DNA computing
Shinnosuke Seki
Purpose
What's an advantage of encoding?
capacity C
sender encoder decoder receiver
Information flow R
Negative Noise
R > C: overflow
R ≤ C: we can make the error rate as small as we like.
Mutation
Hydrogen bonds: A-T, C-G
Two strands which are
1. complementary to each other
Example
5'- A T C G G T C A A C T G C C C T A A T G -3'
3'- T A G C C A G T T G A C G G G A T T A C -5'
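The pairing rule in the example can be checked mechanically. The following is a minimal sketch; the helper names `are_complementary` and `reverse_complement` are illustrative and do not come from the slides.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def are_complementary(top_5to3: str, bottom_3to5: str) -> bool:
    """True if every aligned position forms an A-T or C-G pair.

    The bottom strand is given as drawn on the slide, i.e. read 3' to 5'.
    """
    return len(top_5to3) == len(bottom_3to5) and all(
        PAIR[a] == b for a, b in zip(top_5to3, bottom_3to5)
    )


def reverse_complement(strand_5to3: str) -> str:
    """Return the partner strand, written in the usual 5' to 3' direction."""
    return "".join(PAIR[b] for b in reversed(strand_5to3))


top = "ATCGGTCAACTGCCCTAATG"     # upper strand of the example, 5' to 3'
bottom = "TAGCCAGTTGACGGGATTAC"  # lower strand of the example, 3' to 5'
print(are_complementary(top, bottom))
print(reverse_complement(top))
```

Reading the lower strand from its 5' end gives exactly `reverse_complement(top)`, which is why the two strands are antiparallel as well as complementary.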
Adleman's first trial
Find a solution of the Hamiltonian path problem in a (chemical) solution,
in a number of steps polynomial in the size of the input graph.
The solution is filled with encoding oligonucleotides.
[Figure: input graph on vertices 1, 2, 3, 4]
Vertex   1         2         3         4
Code     ACG CTT   ATA GAT   CGG TTA   ACT TAA
Edge     1 -> 2    2 -> 3    3 -> 4
Code     GAA TAT   CTA GCC   AAT TGA
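The edge oligonucleotides in the example follow a simple rule: the code for an edge u -> v is the position-wise complement of the second half of u's code followed by the first half of v's code, so that it can bridge the two vertex strands. A minimal sketch of that rule follows; the function name `edge_code` is illustrative, and the edge strand is written in the same left-to-right alignment as the vertex strands.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Vertex codes taken from the example above.
VERTEX_CODE = {1: "ACGCTT", 2: "ATAGAT", 3: "CGGTTA", 4: "ACTTAA"}


def complement(strand: str) -> str:
    """Position-wise complement (no reversal), aligned under the input strand."""
    return "".join(PAIR[b] for b in strand)


def edge_code(u: int, v: int, codes: dict[int, str]) -> str:
    """Complement of (second half of u's code + first half of v's code)."""
    half = len(codes[u]) // 2
    return complement(codes[u][half:] + codes[v][: len(codes[v]) // 2])


for u, v in [(1, 2), (2, 3), (3, 4)]:
    print(f"{u} -> {v}: {edge_code(u, v, VERTEX_CODE)}")
```

Applied to the three edges of the example, the rule reproduces the listed edge codes GAA TAT, CTA GCC and AAT TGA.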
What's a good code set?
Each code word (oligonucleotide) shouldn't form any
undesirable structure.
[Figure: code word 2 (ATA GAT) forming an undesirable structure]
Melting temperature
R: gas constant,
Ct: total oligo concentration,
$\Delta H$ & $\Delta S$: enthalpy & entropy changes,
$\alpha$: 1 for self-complementary and 4 for non-self-complementary strands
Nearest-neighbor method
Refer to [AlSa97], [TKY04] ([8], [9] in this table)
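The quantities listed above are the ingredients of the standard two-state melting-temperature relation, $T_m = \Delta H / (\Delta S + R \ln(C_t / \alpha))$, with $T_m$ in kelvin. The sketch below assumes that this is the formula intended here and that $\Delta H$ and $\Delta S$ have already been obtained from nearest-neighbor parameter tables, which are not reproduced. The numerical inputs are hypothetical placeholders, not measured values.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(dH: float, dS: float, Ct: float,
                        self_complementary: bool = False) -> float:
    """Two-state melting temperature in kelvin.

    dH: enthalpy change in cal/mol
    dS: entropy change in cal/(K*mol)
    Ct: total oligonucleotide concentration in mol/L
    """
    # Assumed convention: 1 for self-complementary, 4 otherwise.
    alpha = 1.0 if self_complementary else 4.0
    return dH / (dS + R * math.log(Ct / alpha))


# Hypothetical inputs, for illustration only.
tm_kelvin = melting_temperature(dH=-160_000.0, dS=-430.0, Ct=1e-6)
print(f"Tm = {tm_kelvin - 273.15:.1f} degrees C")
```

Keeping $\Delta H$ and $\Delta S$ as inputs keeps the sketch independent of any particular parameter set such as those in [AlSa97] or [TKY04].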
Melting temperature (cont.)
Uniform melting temperature
Making Tm uniform across the code words can eliminate hybridization bias.
GC content
The ratio of the number of Gs and Cs to the total number of
nucleotides in a sequence
A G-C pair is more stable than an A-T pair.
Higher GC content implies higher Tm.
Sequences are designed with 50% GC content.
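GC content as defined above is straightforward to compute. A minimal sketch, using the upper strand of the earlier complementarity example as input; the function name `gc_content` is illustrative.

```python
def gc_content(seq: str) -> float:
    """Fraction of nucleotides in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


# 4 Gs and 6 Cs out of 20 nucleotides.
print(gc_content("ATCGGTCAACTGCCCTAATG"))
```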
Gibbs free energy ($\Delta G$)
A well-known indicator of stability for DNA structures
A structure with lower $\Delta G$ is more stable.
Design criteria
Template
An element x should have at least d mismatches
with $x^R$, $xx$, $x^R x^R$, $x x^R$, and $x^R x$, where $x^R$ is the reverse of x.
A good template is found by exhaustive search.
Drawback
It cannot prevent sequences from forming secondary
structures.
AG-templates, GC-templates [KKA03]
GC-template
Template contains the same number of 0s and 1s (50% GC-content).
Map is an error-correcting code.
AG-template
Map is a constant-weight code (50% GC-content).
Results in a bigger set of sequences.
Other approaches
DNASequenceGenerator [FBR00]
Software with a GUI
Creates sequences with a specified melting temperature and GC-content,
and without palindromes, start codons, or restriction sites.
Other approaches
Suyama's approach [YoSu00]
Generate sequences randomly, and add each one to the
sequence set iff it satisfies all of the following
constraints:
Uniform melting temperature
No mis-hybridization
GC-content
Hamming distance
AGTAGGCTAAAGCCC
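The generate-and-test loop described above can be sketched as follows. This is a simplified illustration, not the procedure of [YoSu00]: only the GC-content and Hamming-distance constraints are implemented, the mis-hybridization check is approximated by a Hamming distance to the reverse complements of accepted words, and the melting-temperature constraint is omitted. All function names, parameters and thresholds are hypothetical.

```python
import random

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def gc_content(seq: str) -> float:
    return sum(b in "GC" for b in seq) / len(seq)


def acceptable(cand: str, accepted: list[str], min_dist: int,
               gc_lo: float, gc_hi: float) -> bool:
    """Candidate passes iff it meets every (simplified) constraint."""
    if not gc_lo <= gc_content(cand) <= gc_hi:
        return False
    for word in accepted:
        if hamming(cand, word) < min_dist:
            return False
        # Crude stand-in for a mis-hybridization test (an assumption).
        if hamming(cand, reverse_complement(word)) < min_dist:
            return False
    return True


def generate_code_set(n_words: int, length: int, min_dist: int,
                      gc_lo: float = 0.4, gc_hi: float = 0.6,
                      max_tries: int = 100_000, seed: int = 0) -> list[str]:
    """Draw random words and keep those that satisfy all constraints."""
    rng = random.Random(seed)
    accepted: list[str] = []
    for _ in range(max_tries):
        if len(accepted) == n_words:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if acceptable(cand, accepted, min_dist, gc_lo, gc_hi):
            accepted.append(cand)
    return accepted


print(generate_code_set(n_words=5, length=12, min_dist=4))
```

Because each candidate is tested only against words already accepted, the result depends on the order in which candidates are drawn; the fixed seed makes a run reproducible.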
Bond-free properties [KKS05]
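$\theta$-non-overlapping: $L \cap \theta(L) = \emptyset$
$\theta$-compliant: $\forall w \in L,\ x, y \in \Sigma^*$: $w,\ x\theta(w)y \in L \Rightarrow xy = \lambda$
$\theta$-s-compliant: $\forall w \in L,\ x \in \Sigma^*$: $w,\ x\theta(w) \in L \Rightarrow x = \lambda$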
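For a finite code set, the first two properties can be checked by brute force. The sketch below assumes that $\theta$ is the Watson-Crick reverse complement. It reads $\theta$-non-overlapping as "$L$ and $\theta(L)$ share no word" and $\theta$-compliant as "no word of $L$ properly contains $\theta(w)$ for some $w \in L$". The function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Assumed involution: the Watson-Crick reverse complement."""
    return "".join(PAIR[b] for b in reversed(word))


def is_theta_non_overlapping(code: set[str]) -> bool:
    """L and theta(L) have no word in common."""
    return code.isdisjoint({theta(w) for w in code})


def is_theta_compliant(code: set[str]) -> bool:
    """No word x theta(w) y with xy non-empty lies in L, for any w in L."""
    for w in code:
        image = theta(w)
        for u in code:
            # image strictly inside u means u = x theta(w) y with xy non-empty
            if len(u) > len(image) and image in u:
                return False
    return True


print(is_theta_non_overlapping({"AACC", "GGTT"}))  # theta(AACC) is GGTT
print(is_theta_compliant({"AAC", "TGTTA"}))        # theta(AAC) is GTT
```

In the first call the two words are each other's images, so the set overlaps with its image; in the second, TGTTA contains GTT, so part of it can bind to AAC.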
Bond-free properties [KKS05]
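$\theta$-free: $L^2 \cap \Sigma^+ \theta(L) \Sigma^+ = \emptyset$
$\theta$-sticky-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*$: $wx,\ y\theta(w) \in L \Rightarrow xy = \lambda$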
Bond-free properties [KKS05]
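$\theta$-3'-overhang-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*$: $wx,\ \theta(w)y \in L \Rightarrow xy = \lambda$
$\theta$-5'-overhang-free: $\forall w \in \Sigma^+,\ x, y \in \Sigma^*$: $xw,\ y\theta(w) \in L \Rightarrow xy = \lambda$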
$\theta$-compliant,
$\theta$-p-compliant,
$\theta$-s-compliant, or
$\theta$-sticky-free.
Then it is decidable whether L is a maximal subset of
M satisfying that property.
Secondary structure prevention
Secondary structures:
Hairpin-loop (or simply hairpin)
Internal loop
Multiple-branch loop
Pseudoknot
They can be undesirable,
e.g. for Adleman's encoding technique for the Hamiltonian
Path Problem (HPP).
Secondary Structures
[Figure: a single strand (5' to 3') with labelled secondary structures: hairpin, hairpin frame (multiple loop), internal loop]
Hairpin-free language
A formal model of a hairpin: $x\,v\,y\,\theta(v)\,z$.
TAA---ACG---CGTTA---CGT---CGGT
x = TAA, v = ACG, y = CGTTA, $\theta(v)$ = CGT, z = CGGT
Hairpin freeness
Intuitively, it is almost impossible to prevent hairpins of
short stack length (say 2 or 3).
We therefore aim to prevent any hairpin whose stack length is
at least some given parameter k.
Hairpin-free language [KKL06]
A word w is $(\theta, k)$-hairpin-free (abbr. hp$(\theta, k)$-free) iff
$w = x\,v\,y\,\theta(v)\,z \Rightarrow |v| < k$.
[Figure: $w$ and $\theta(w)$]
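Hairpin-freeness of a single word can be tested directly from the definition. The sketch below assumes that $\theta$ is the Watson-Crick reverse complement, as in the TAA---ACG---CGTTA---CGT---CGGT example. It only examines stems of length exactly k, which is sufficient because any longer stem $v$ contains a length-k prefix whose image also occurs further to the right. The function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """Assumed involution: the Watson-Crick reverse complement."""
    return "".join(PAIR[b] for b in reversed(word))


def is_hairpin_free(w: str, k: int) -> bool:
    """True iff w has no factorization x v y theta(v) z with |v| >= k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    for i in range(len(w) - 2 * k + 1):
        v = w[i:i + k]
        # theta(v) must occur somewhere after v; the loop part y may be empty.
        if theta(v) in w[i + k:]:
            return False
    return True


word = "TAAACGCGTTACGTCGGT"  # the example word with the dashes removed
print(is_hairpin_free(word, 3))
print(is_hairpin_free("AAAAAA", 2))
```

Scanning one stem start at a time takes roughly quadratic time in the word length, which is adequate for oligonucleotide-sized words.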
[KNO08] Kawashimo, S., Ng, Y-K., Ono, H., Sadakane, K., Yamashita, M.:
Speeding up local-search type algorithms for designing DNA sequences
under thermodynamical constraints. Proc. DNA14 (2008) 152-161
[KYO08] Kameda, A., Yamamoto, M., Ohuchi, A., Yaegashi, S., Hagiya, M.:
Unravel four hairpins! Natural Computing 7 (2008) 287-298
[RFL01] Ruben, A. J., Freeland, S. J., Landweber, L. F.: PUNCH: An
evolutionary algorithm for optimizing bit set selection. DNA7 (2001) 150-160
[Sha48] Shannon, C.E.: A mathematical theory of communication. Bell
System Technical Journal 27 (1948) 379-423, 623-656
[TKY04] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.:
Thermodynamic parameters based on a nearest-neighbor model for DNA
sequences with a single-bulge loop. Biochemistry 43(22) (2004) 7143-7150
[TKY05] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Design of
nucleic acid sequences for DNA computing based on a thermodynamic
approach. Nucleic Acids Res. 33(3) (2005) 903-911
Reference (cont.)