DNA CRYPTOGRAPHY

Chapter 1
Introduction

NIT KURUKSHETRA

1.1 DNA Cryptography
DNA cryptography is a newborn cryptographic field that emerged with research on DNA computing, in which DNA is used as the information carrier and modern biological technology is used as the implementation tool. The vast parallelism and extraordinary information density inherent in DNA molecules are explored for cryptographic purposes such as encryption, authentication, signature, and so on.

1.2 DNA
DNA is the abbreviation for deoxyribonucleic acid, which is the genetic material (germ plasm) of all forms of life. DNA is a kind of biological macromolecule made of nucleotides. Each nucleotide contains a single base, and there are four kinds of bases, adenine (A), thymine (T), cytosine (C) and guanine (G), corresponding to four kinds of nucleotides. A single-stranded DNA is constructed with an orientation: one end is called 5′, and the other end is called 3′. In nature, DNA usually exists as double-stranded molecules. The two complementary DNA strands are held together in a double-helix structure by hydrogen bonds between the complementary bases A and T (or C and G).

Fig 1.2.1 Double helix structure of DNA

1.3 Amino Acid Codes

Amino Acid Name | Amino Acid Code | Nucleotide Codon
Alanine | A | GCT GCC GCA GCG
Arginine | R | CGT CGC CGA CGG AGA AGG
Asparagine | N | AAT AAC
Aspartic acid (Aspartate) | D | GAT GAC
Cysteine | C | TGT TGC
Glutamine | Q | CAA CAG
Glutamic acid (Glutamate) | E | GAA GAG
Glycine | G | GGT GGC GGA GGG
Histidine | H | CAT CAC
Isoleucine | I | ATT ATC ATA
Leucine | L | TTA TTG CTT CTC CTA CTG
Lysine | K | AAA AAG
Methionine | M | ATG
Phenylalanine | F | TTT TTC
Proline | P | CCT CCC CCA CCG
Serine | S | TCT TCC TCA TCG AGT AGC
Threonine | T | ACT ACC ACA ACG
Tryptophan | W | TGG
Tyrosine | Y | TAT TAC
Valine | V | GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate) | B | Random codon from D and N
Glutamine or Glutamic acid (Glutamate) | Z | Random codon from E and Q
Unknown amino acid (any amino acid) | X | Random codon
Translation stop | * | TAA TAG TGA
Gap of indeterminate length | - | ---
Unknown character (any character or symbol not in table) | ? | ???

Table 1.3.1 Amino acids and codes

1.4 Primer
A primer is a short synthetic oligonucleotide which is used in many molecular techniques from PCR to DNA sequencing. These primers are designed to have a sequence which is the reverse complement of a region of template or target DNA to which we wish the primer to anneal.
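
The reverse-complement relation between a primer and its target region can be sketched in a few lines of Python. This is only an illustration: the function name and the 8-base example region are assumptions, and a real primer would be longer.

```python
# Complement of each DNA base under Watson-Crick pairing (A-T, C-G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical 8-base template region, chosen only for illustration.
template_region = "ATGCGTAC"
primer = reverse_complement(template_region)
print(primer)  # GTACGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any primer designed this way.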

Some thoughts on designing primers
1. Primers should be 17-28 bases in length.
2. Base composition should be 50-60% (G+C).
3. Primers should end (3') in a G or C, or CG or GC: this prevents "breathing" of ends and increases efficiency of priming.
4. Tms between 55 and 80 °C are preferred.
5. The 3'-ends of primers should not be complementary (i.e. base pair), as otherwise primer dimers will be synthesised preferentially to any other product.
6. Primer self-complementarity (ability to form secondary (2°) structures such as hairpins) should be avoided.
7. Runs of three or more Cs or Gs at the 3'-ends of primers may promote mispriming at G- or C-rich sequences (because of stability of annealing), and should be avoided.
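
Several of these rules are simple enough to check mechanically. The sketch below covers rules 1, 2, 3, 4, 5 and 7 under stated assumptions: the melting temperature uses the rough Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is an assumption and not a method given in this report, and the example primers are hypothetical. Hairpin detection (rule 6) is not attempted.

```python
def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[b] for b in reversed(seq))


def gc_fraction(primer: str) -> float:
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough Tm estimate (assumed rule of thumb): 2*(A+T) + 4*(G+C) degrees C."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer: str) -> dict:
    """Evaluate one primer against rules 1, 2, 3, 4 and 7."""
    primer = primer.upper()
    return {
        "length_17_28": 17 <= len(primer) <= 28,
        "gc_50_60_percent": 0.50 <= gc_fraction(primer) <= 0.60,
        "ends_in_g_or_c": primer[-1] in "GC",
        "tm_55_80": 55 <= wallace_tm(primer) <= 80,
        "no_gc_run_at_3prime": not all(b in "GC" for b in primer[-3:]),
    }


def three_prime_ends_pair(p1: str, p2: str, n: int = 3) -> bool:
    """Rule 5: True if the last n bases of the two primers can base pair."""
    return p1[-n:] == reverse_complement(p2[-n:])


# Hypothetical 20-mer used only to exercise the checks.
example = "ATGCTAGCTAGGCTAACGTC"
print(check_primer(example))
```

A primer passes only if every entry in the returned dictionary is True; the pairwise check has to be run separately for each forward/reverse combination.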

1.5 Transcription and Translation
Transcription, or RNA synthesis, is the process of creating an equivalent RNA copy of a sequence of DNA. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth between DNA and RNA in the presence of the correct enzymes. During transcription, a DNA sequence is read by RNA polymerase, which produces a complementary, anti-parallel RNA strand. As opposed to DNA replication, transcription results in an RNA complement that includes uracil (U) in all instances where thymine (T) would have occurred in a DNA complement.
Translation is the first stage of protein biosynthesis (part of the overall process of gene expression): the production of proteins by decoding the mRNA produced in transcription. Translation occurs in the cytoplasm, where the ribosomes are located. Ribosomes are made of a small and a large subunit, which surround the mRNA. In translation, messenger RNA (mRNA) is decoded to produce a specific polypeptide according to the rules specified by the genetic code: the mRNA sequence is used as a template to guide the synthesis of a chain of amino acids that form a protein. Many types of transcribed RNA, such as transfer RNA, ribosomal RNA, and small nuclear RNA, are not necessarily translated into an amino acid sequence.
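
The two processes can be mimicked in software. The sketch below is a simplification under stated assumptions: the template strand is given 5'->3', the standard genetic code is used, translation starts at the first base rather than searching for a start codon, and the 9-base example is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# DNA template base -> complementary RNA base.
_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe(template: str) -> str:
    """Return the mRNA (5'->3') complementary and anti-parallel to the template."""
    return template.upper().translate(_TO_RNA)[::-1]


def translate(mrna: str) -> str:
    """Decode an mRNA codon by codon until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3].replace("U", "T")]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# Hypothetical template strand whose transcript reads AUG GCC UAA.
mrna = transcribe("TTAGGCCAT")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

The one-letter output uses the amino acid codes of Table 1.3.1 (M for methionine, A for alanine).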

The main goal of research on DNA cryptography is to explore the characteristics of DNA molecules and reactions, establish the corresponding theories, discover possible development directions, search for simple methods of realizing DNA cryptography, and lay the basis for future development.

1.6 Cryptography
Data security and cryptography are critical aspects of conventional computing and may also be important to possible DNA database applications. Here we provide basic terminology used in cryptography. The goal is to transmit a message between a sender and receiver such that an eavesdropper is unable to understand it. Plaintext refers to a sequence of characters drawn from a finite alphabet, such as that of a natural language. Encryption is the process of scrambling the plaintext using a known algorithm and a secret key. The output is a sequence of characters known as the ciphertext. Decryption is the reverse process, which transforms the encrypted message back to the original form using a key. The goal of encryption is to prevent decryption by an adversary who does not know the secret key. An unbreakable cryptosystem is one for which successful cryptanalysis is not possible. Such a system is the one-time-pad cipher. It gets its name from the fact that the sender and receiver each possess identical notepads filled with random data. Each piece of data is used once to encrypt a message by the sender and to decrypt it by the receiver, after which it is destroyed.

1.7 Advantages Of DNA Cryptography
The difficult biological problem referred to here is "It is extremely difficult to amplify the message-encoded sequence without knowing the correct PCR primer pair". Polymerase Chain Reaction (PCR) is a fast DNA amplification technology based on Watson-Crick complementarity, and is one of the most important inventions in modern biology. Two complementary oligonucleotide primers are annealed to double-stranded target DNA strands, and the necessary target DNA can be amplified after a series of polymerase reactions. Thus one can effectively amplify a lot of DNA strands within a very short time. PCR is a very sensitive method, and in theory a single target DNA molecule can be amplified to about 10^6 copies after 20 cycles. Considering the high stability of PCR, a length of 20 to 27 nucleotides for each PCR primer is a comparatively good choice. In this study, each PCR primer is 20 nucleotides long.
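
Both numbers that matter in this section are easy to check: the ideal amplification after 20 cycles, and the number of ways an adversary could choose two primers from all possible 20-mers. The sketch below assumes ideal doubling in every cycle and an unordered pair of distinct primers; the function name is illustrative.

```python
from math import comb


def pcr_copies(cycles: int, start: int = 1) -> int:
    """Ideal PCR: every cycle doubles the number of target molecules."""
    return start * 2 ** cycles


PRIMER_LENGTH = 20
candidates = 4 ** PRIMER_LENGTH      # distinct 20-mers over A, C, G, T
primer_pairs = comb(candidates, 2)   # unordered pairs of distinct primers

print(pcr_copies(20))         # 1048576, i.e. about 10^6
print(f"{primer_pairs:.2e}")  # 6.04e+23
```

The pair count is of the order of 10^23, which is the size of the search space that the security argument relies on.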

Successful PCR amplification requires the correct primer pair. It would still be extremely difficult to amplify the message-encoded sequence without knowing the correct primer pair. An adversary who does not know the correct primer pair and wants to pick out the message-encoded sequence by PCR amplification must choose two primer sequences from about 10^23 possibilities (the number of combinations of 2 sequences taken from 4^20 candidates). So we believe that this biological problem is difficult and will remain so for a relatively long time.

1.8 Limitations Of DNA Cryptography
(i) Lack of the related theoretical basis.
(ii) Difficult to realize and expensive to apply.

1.9 Comparisons among DNA cryptography, traditional cryptography and quantum cryptography

1.9.1 Development
Traditional cryptography can be traced back to the Caesar cipher 2000 years ago or even earlier. The related theory is almost sound. All practical ciphers can be seen as traditional ones. Quantum cryptography came into being in the 1970s, and its theoretical basis has been prepared while implementation is difficult. By and large, quantum schemes have not been put into practical use. DNA cryptography has a history of only about ten years; its theoretical basis is still under research and its application is very costly.

1.9.2 Security
Only computational security can be achieved for traditional cryptographic schemes except for the one-time pad; that is to say, an adversary with infinite computational power can break them in theory. It has been shown that quantum computers have great and striking computational potential. Although there is uncertainty about the computational power of quantum computers, it is possible that all traditional schemes except the one-time pad could be broken by future quantum computers.
By contrast, quantum cryptographic schemes are unbreakable under current theories; their security is based on Heisenberg's Uncertainty Principle. Even if an eavesdropper is able to do whatever he wants and has infinite computing resources, and even if P = NP, it is still impossible to break such a scheme. Any eavesdropping changes the cipher, so it can be detected.

It is impossible for an adversary to obtain quanta exactly identical to the intercepted ones, so any attempt to tamper without being detected is in vain. Therefore, quantum key agreement schemes have unconditional security. For DNA cryptography, the main security basis is the restriction of biological techniques, which has nothing to do with computing power and makes DNA cryptographic schemes immune to attacks using quantum computers. Nonetheless, the extent of this kind of security and how long it can be maintained are still under exploration.

1.9.3 Application
Traditional cryptosystems are the most convenient: the computation can be executed by electronic, quantum as well as DNA computers; the data can be transmitted by wire, fiber, wireless channel and even by a messenger; and the storage can be CDs, magnetic media, DNA and other storage media. Using traditional cryptography we can realize purposes such as public and private key encryption, identity authentication and digital signature. Quantum cryptosystems are implemented on quantum channels, whose main advantage lies in real-time communication. Their disadvantage lies in secure data storage, which makes it infeasible to implement public-key encryption and digital signature as easily as traditional cryptography does. Under the current level of techniques, the ciphertext of DNA cryptography can be transmitted only by physical means. Due to the vast parallelism, exceptional energy efficiency and extraordinary information density inherent in DNA molecules, DNA cryptography can have special advantages for some cryptographic purposes, such as secure data storage, authentication, digital signature, steganography, and so on. DNA can even be used to produce unforgeable contracts, cash tickets and identification cards.
Research on all three kinds of cryptography is still in progress, and a great many problems remain to be solved, especially for DNA and quantum cryptography, which makes it hard to predict the future. But from the above discussion we think it likely that they will coexist, develop together and complement each other, rather than one of them falling into disuse entirely.

1.10 Development directions of DNA cryptography
Since DNA cryptography is still in its immature stage, it is too early to predict its future development precisely. However, in view of the development of biological techniques and the requirements of cryptography, we hold the following opinions:

1) DNA cryptography should be implemented by using modern biological techniques as tools and biological hard problems as the main security basis, so as to fully exert its special advantages. Encryption and decryption are procedures of data transformation which, if described by mathematical methods, are easier to implement than physical and chemical ones in the present era of electronic computers and the Internet. If other kinds of cryptosystems are to be researched and developed, they should have properties, such as higher security levels and storage density, that cannot be realized by electronic computers using mathematical methods. Thus, if DNA cryptography is to be developed, the advantages inherent in DNA should be fully explored, such as developing nanoscopic storage based on the tiny volume of DNA, realizing fast encryption and decryption based on the vast parallelism, and using difficult biological problems, which one can utilize but is still far from fully understanding, as the security foundation of DNA cryptography, so as to realize novel cryptosystems that can resist attacks from quantum computers. Since it is not yet certain whether quantum computers threaten the hardness of the various hard mathematical problems, these problems cannot be absolutely excluded as a security basis. Encryption and decryption algorithms that are hard to implement on electronic computers may be feasible on DNA computers, given their vast parallel computational ability. If these schemes withstand attacks by quantum computers, their computational security will be inherited by DNA schemes. Thereby, DNA cryptography does not absolutely exclude traditional cryptography, and it is possible to construct a hybrid cryptosystem of the two.
2) Security requirements: Regardless of the many differences between DNA and traditional cryptography, they both satisfy the same characteristics of cryptography. The communication model for DNA encryption is also made up of two parties, i.e. a sender and a receiver, who obtain the secret key in a secure or authenticated way and then communicate securely with each other over an insecure or unauthenticated channel. The security requirements should also be founded upon the assumption proposed by Kerckhoffs that security should depend only on the secrecy of the decryption key; that is, an attacker should be assumed to be fully aware of all the details of encryption and decryption except the decryption key. It is under this assumption that a cryptosystem can be said to be secure when no attacker can break it. More precisely, it must be assumed that an attacker knows the basic biological method the designer used, and has enough knowledge and excellent laboratory devices to repeat the designer's operations.

The only thing not known by the attacker is the key. In a DNA cryptosystem, a key is usually some biological material or a preparation procedure, and sometimes the experimental conditions.
3) For DNA cryptography, the current research target should lie first in security and feasibility, and second in storage density. A sound cryptosystem should be secure as well as easy to implement. If the only requirement is to improve the density of storage, it is hard to implement DNA cryptography at the present technical level. With current technology, it is still difficult to operate on nanoscopic DNA directly. Scientists can easily manipulate DNA with the aid of various restriction enzymes only after the DNA strands are amplified with an amplification technology such as PCR. It is more practical for cryptographers to make use of the colony property of plentiful DNA: for example, to store data on DNA chips and read data by hybridization, which makes input/output operations faster and more convenient. This method is easier to implement than encoding messages into nucleotides directly, although the storage density is somewhat lower. In fact, it is also impossible to store all the world's data using several grams of DNA.
4) Currently, the main task for DNA cryptographers is to establish the theoretical foundations and to accumulate practical experience. The development of modern biological technology makes it possible to express data by DNA. It can be shown that DNA has inherent vast parallelism, exceptional energy efficiency and extraordinary information density. This motivates research on DNA computing and cryptography, although the related research is just in its initial stage. Sound theories have not been founded for either DNA computing or DNA cryptography. Modern biology lays particular stress on experiments rather than theories. There is no efficient way to measure the hardness of a biological problem and the security level of the cryptosystems based on it. It is certainly urgent to find such a method, similar to computational complexity, on which the design of secure and practical DNA cryptosystems can be based. The current goal, and difficulty, is to find and make use of the utmost potential of DNA. Presently, the most important tasks are to find the sound properties of DNA that can be used for computation and encryption, to establish the theoretical basis and to accumulate experience.

1.11 DNA Digital Coding Technology

In information science, the most fundamental coding method is binary digital coding, in which anything can be encoded by the two states 0 and 1 and combinations of them. There are four kinds of bases in a DNA sequence: adenine (A), thymine (T), cytosine (C) and guanine (G). The simplest coding pattern for the 4 nucleotide bases (A, T, C, G) uses 4 digits: 0(00), 1(01), 2(10), 3(11). Obviously, there are 4! = 24 possible coding patterns with this encoding format. A DNA digital coding should reflect the biological characteristics of the 4 nucleotide bases. As we all know, in a double-helix DNA molecule the two strands are held together in a complementary fashion, that is, A pairs with T and C with G according to the Watson-Crick complementarity rule. In this DNA digital coding, the complementary rule (~0) = 1 and (~1) = 0 is proposed. According to this rule, 0(00) is complementary to 3(11) and 1(01) to 2(10). So among these 24 patterns, only 8 patterns (0123/CTAG, 0123/CATG, 0123/GTAC, 0123/GATC, 0123/TCGA, 0123/TGCA, 0123/ACGT, 0123/AGCT), which are topologically identical, fit the complementary rule of the nucleotide bases. It is suggested that the coding pattern in accordance with the order of molecular weight, 0123/CTAG, is the best coding pattern for the nucleotide bases. This pattern reflects the biological characteristics of the 4 nucleotide bases well and has a certain biological significance.
The binary digital coding of DNA sequences prevails over character DNA coding with the following advantages:
(1) It decreases the redundancy of the information coding and improves the coding efficiency compared with traditional character DNA coding.
(2) The digital coding of DNA sequences is very convenient for mathematical and logical operations and may have a great impact on the DNA bio-computer.
(3) By using DNA digital coding, a traditional encryption method such as DES or RSA can be used to preprocess the plaintext in the cryptography scheme.
(4) A DNA sequence preprocessed by DNA digital coding can be used for digital computing and adapts to the existing computer-processing mode, which facilitates direct conversion between biological information and encryption information in the cryptography scheme.

1.12 System Design Of Encryption Scheme
Now we describe the system design of the encryption scheme, whose security is mainly based on difficult biological problems and difficult mathematical problems. We will show a way of exchanging messages safely between just two specific persons.

We shall call the sender Alice, and the intended receiver Bob. Above all, we extend the definition of this encryption scheme as follows. Suppose there is a sender Alice who owns an encryption key KA, and an intended receiver Bob who owns a decryption key KB (KA = KB or KA ≠ KB). Alice uses KA to translate a plaintext M into ciphertext C by a translation E. Bob uses KB to translate the ciphertext C into the plaintext M by a translation D. We call translation E the encryption process and C the ciphertext.
The encryption process is: C = E_KA(M)
The decryption process is: D_KB(C) = D_KB(E_KA(M)) = M
It is difficult to obtain M from C unless one has KB. Here, E and D are not limited to mathematical calculations, but can be any physical, chemical, biological or mathematical process, such as a traditional encryption method. KA, KB and C are likewise not limited to digital data, but can be any method, material, data, etc., such as a DNA sequence.
Using the traditional cryptosystem RSA to preprocess the plaintext, an encryption scheme with DNA technologies was proposed in this paper. We will describe the general process of the encryption scheme as follows.

A. Key Generation
The message-sender Alice designs a DNA sequence 20 nucleotides long as a forward primer for PCR amplification and transmits it to the intended receiver Bob over a secure channel. The message-receiver Bob also designs a DNA sequence 20 nucleotides long as a reverse primer for PCR amplification and transmits it to Alice over a secure channel. The intended receiver Bob has a pair of keys (e, d). After a pair of PCR primers has been respectively designed and exchanged over a secure communication channel, we get an encryption key KA that is a pair of PCR primers and Bob's public key e, as well as a decryption key KB that is a pair of PCR primers and Bob's secret key d.

B. Encryption
First of all, the sender Alice translates the plaintext M into hexadecimal code by using the built-in computer code. Then the hexadecimal code is translated into the binary plaintext M_ by using third-party software. Finally, Alice translates the binary plaintext M_ into the binary ciphertext C_ by using Bob's public key e. We call this preprocessing operation the pretreatment data process (data pre-treatment). Through this preprocessing operation, we can get completely different ciphertext from the same plaintext, which can effectively prevent an attack that tries a possible word as PCR primers.

The pretreatment data flow chart is described in Fig. 1.12.1. Then, Alice translates the binary ciphertext C_ into a DNA sequence according to the DNA digital coding technology. After coding, Alice synthesizes the secret-message DNA sequence, which is flanked by forward and reverse PCR primers, each 20 nucleotides long. Thus, the secret-message DNA sequence is prepared. The last step of this encryption is that Alice generates a certain number of dummies and puts the secret-message DNA sequence among them. It is necessary that each dummy has the same structure as the secret-message DNA sequence. In this scheme, the dummies are generated by sonicating human DNA to roughly 60 to 160 nucleotide pairs (average size) and denaturing it. After mixing the secret-message DNA sequence with a certain number of dummies, Alice sends the DNA mixture to Bob using an open communication channel.

C. Decryption
After the intended receiver Bob gets the DNA mixture, he can easily find the secret-message DNA sequence. Since Bob has obtained the correct PCR primer pair in a secure way, he can amplify the secret-message DNA sequence by performing PCR on the DNA mixture. After Bob amplifies the secret-message DNA sequence, he can retrieve the plaintext M sent by Alice through the reverse preprocessing operation using his secret key d. This decryption process is not only a mathematical computation, but also a biological process.

Fig. 1.12.1 Data pre(post)treatment flow chart
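
The coding, flanking and picking-out steps can be imitated in software to make the data flow concrete. The sketch below is an in-silico analogy only, under several assumptions: the 0123/CTAG pattern is used for the digital coding, the RSA pre-treatment is omitted, both primer sequences are hypothetical, PCR is modelled as a simple match on the flanking sequences, and the dummies are random strings rather than sonicated human DNA.

```python
import random
from itertools import permutations

PATTERN = "CTAG"  # 0123/CTAG: 00 -> C, 01 -> T, 10 -> A, 11 -> G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def fits_complement_rule(pattern: str) -> bool:
    """Digit d (two bits) must map to the base complementary to digit 3 - d."""
    return all(COMPLEMENT[pattern[d]] == pattern[3 - d] for d in range(4))


def bits_to_dna(bits: str, pattern: str = PATTERN) -> str:
    if len(bits) % 2:
        raise ValueError("bit string must have even length")
    return "".join(pattern[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def dna_to_bits(dna: str, pattern: str = PATTERN) -> str:
    return "".join(format(pattern.index(base), "02b") for base in dna)


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def make_message_strand(bits: str, fwd: str, rev: str) -> str:
    """Payload flanked by the forward primer and the reverse primer's binding site."""
    return fwd + bits_to_dna(bits) + reverse_complement(rev)


def pick_out(mixture, fwd: str, rev: str):
    """Stand-in for PCR: keep payloads of strands flanked by both primer sites."""
    tail = reverse_complement(rev)
    return [s[len(fwd):len(s) - len(tail)] for s in mixture
            if len(s) >= len(fwd) + len(tail) and s.startswith(fwd) and s.endswith(tail)]


FWD = "ATGCTAGCTAGGCTAACGTC"  # hypothetical forward primer
REV = "GCATTAGCCTGAACGTTCGA"  # hypothetical reverse primer
bits = "0100011101000101"     # binary form of the two letters "GE"

rng = random.Random(0)
dummies = ["".join(rng.choice("ACGT") for _ in range(48)) for _ in range(50)]
mixture = dummies + [make_message_strand(bits, FWD, REV)]
rng.shuffle(mixture)

print([dna_to_bits(p) for p in pick_out(mixture, FWD, REV)])
```

Enumerating all 24 permutations of the four bases with `fits_complement_rule` leaves exactly the 8 patterns that satisfy the complementary rule, 0123/CTAG among them.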

In the following part of this section, we discuss the details of this encryption scheme with an example, shown in Fig. 1.12.2.

Step 1: Key Generation. The message-sender Alice and the message-receiver Bob respectively design and exchange a pair of PCR primers over a secure communication channel. The encryption and decryption keys are a pair of PCR primers. In this scheme, the intended PCR primer pair is not designed independently by the sender or the receiver, but jointly, with each designing one primer. This could increase the security of the encryption scheme, because even if an adversary somehow obtained one primer of the pair, the amplification is not efficient when one primer of the pair is incorrect. Only when both primer sequences are correct can the amplification succeed. The result of the PCR amplification is shown in Fig. 1.12.3.

Step 2: Data pretreatment. Here we choose "GENECRYPTOGRAPHY" (gene cryptography) as the plaintext to encrypt. We first convert this text into hexadecimal code by using the built-in computer code, that is:
"47 45 4E 45 43 52 59 50 54 4F 47 52 41 50 48 59".
Then we translate the hexadecimal code into the binary plaintext M_ by using third-party software, that is:
01000111 01000101 01001110 01000101 01000011 01010010 01011001 01010000 01010100 01001111 01000111 01010010 01000001 01010000 01001000 01011001
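
The hexadecimal and binary strings above can be reproduced directly, assuming that the "built-in computer code" is 8-bit ASCII. The function names are illustrative.

```python
def to_hex(text: str) -> str:
    """Two hexadecimal digits per character, assuming ASCII."""
    return " ".join(f"{byte:02X}" for byte in text.encode("ascii"))


def to_bits(text: str) -> str:
    """Eight bits per character, assuming ASCII."""
    return " ".join(f"{byte:08b}" for byte in text.encode("ascii"))


plaintext = "GENECRYPTOGRAPHY"
print(to_hex(plaintext))
print(to_bits(plaintext))
```

The 16 characters give 128 bits, which at two bits per nucleotide correspond to 64 nucleotides.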

Fig. 1.12.2. Flow chart of Encryption scheme system.

Fig. 1.12.3. Result of the PCR amplification

Step 3: Encryption. Alice encrypts the binary plaintext M_ into the binary ciphertext C_ by using Bob's public key e. Then, Alice converts the binary ciphertext C_ into a DNA sequence by using the DNA digital coding technology. Thus, the secret-message DNA is prepared: a secret-message DNA sequence containing an encoded message 64 nucleotides long, flanked by forward and reverse PCR primers. Finally, after mixing the secret-message DNA sequence with a certain number of dummies, Alice sends the DNA mixture to Bob using an open communication channel.

Step 4: Decryption. After the intended receiver Bob gets the DNA mixture, he can easily pick out the secret-message DNA sequence by using the correct primer pair. After that, Bob translates the secret-message DNA sequence into the binary ciphertext C_ by using the DNA digital coding technology. Bob can then decrypt the binary ciphertext C_ into the binary plaintext M_ by using his secret key d.

Step 5: Data post-treatment. After the binary plaintext M_ has been recovered, Bob can retrieve the plaintext M, "GENECRYPTOGRAPHY", from the binary plaintext M_ by using data post-treatment.

1.13 The codes
The three codes described in detail in this paper are referred to as the Huffman code, the comma code and the alternating code. It should be stated at the outset that none of them fulfill all the criteria listed above. The Huffman code is the most economical and would be the best for encrypting text for short-term storage, such as DNA ink or DNA book, providing that this text lacked any sort of punctuation, symbols or numbers.

Both the comma code and the alternating code, while the most uneconomical of the codes, have the advantage that they generate base sequences which are obviously artificial, and so would be best suited to the encryption of information for long-term storage.

1.13.1 The Huffman code
By varying the number of symbols allotted to a character in a code, with the most frequent character being given the least number of symbols and the least frequent the most number of symbols, it is possible to construct very economical codes, i.e. codes in which the text is encrypted by the minimum number of symbols, so that it is as short as it can possibly be. One of the best ways of constructing an economical code is to use Huffman's method (Huffman 1952). The Huffman code constructed with the four DNA bases A, G, C and T for the letters of the English alphabet is shown in Table 1.13.1. Given the frequencies of occurrence of these letters, such a code is straightforward to construct (Materials and methods). In the code, the shortest codon is just one base long (representing e, the most frequently used letter in the English language), and the longest codon is five bases long (representing q and z, the most infrequent letters in the English language). The average codon length is 2.2 bases, shorter than the codons of any of the other codes described in this paper.
As well as being compact, the message generated by a Huffman code is unambiguous. That is, once the start point has been specified, there is only one way in which the stream of symbols comprising the message can be read. The unambiguous nature of the Huffman code shown in Table 1.13.1 can be seen by encoding any group of letters with it and then decoding them from the beginning of the sequence: there is only one way it can be done. For instance, the base sequence CATGTAGTCG can only be read from the beginning as hester; no other interpretation of the message is possible. Given a suitable start signal, the alternating code is also unambiguous, at the expense of economy.
While of the three codes discussed here, the Huffman makes the most economical use of DNA, it does have two disadvantages. The first is that it does not cater for any symbols or numbers, as the frequency of these characters will be heavily text-dependent. Consequently they cannot be included when deriving the Huffman code. The second disadvantage of the Huffman code relates to its possible use in long-term storage of information. Because of the variable length of the codons, no obvious pattern emerges when they are joined together to encode a message. The naive investigator might confuse it with natural DNA and therefore not appreciate its significance. One could counteract this problem by using three instead of four bases (e.g. A, C and T). The Huffman code is the only code discussed in this paper with variable length codons. The others all have fixed length codons.
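
A four-symbol Huffman code of this kind can be built with the standard k-ary Huffman procedure. The sketch below is not the code of Table 1.13.1: the letter frequencies are approximate values assumed for illustration, so the resulting codons (and the reading of any particular base sequence) will differ from the table. It does show the two properties discussed above, variable codon length and unambiguous decoding from a known start point.

```python
import heapq
from itertools import count

# Approximate English letter frequencies in percent (assumed values).
FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7, "s": 6.3,
    "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "c": 2.8, "u": 2.8, "m": 2.4,
    "w": 2.4, "f": 2.2, "g": 2.0, "y": 2.0, "p": 1.9, "b": 1.5, "v": 1.0,
    "k": 0.8, "j": 0.15, "x": 0.15, "q": 0.10, "z": 0.07,
}


def huffman_codons(freq, bases="ACGT"):
    """Build a Huffman code whose code symbols are the given bases."""
    k = len(bases)
    tie = count()  # tie-breaker so heap entries never compare tree nodes
    heap = [(weight, next(tie), symbol) for symbol, weight in freq.items()]
    # Pad with zero-weight dummies so every merge combines exactly k nodes.
    while (len(heap) - 1) % (k - 1) != 0:
        heap.append((0.0, next(tie), None))
    heapq.heapify(heap)
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(k)]
        total = sum(weight for weight, _, _ in group)
        heapq.heappush(heap, (total, next(tie), [node for _, _, node in group]))

    codons = {}

    def walk(node, prefix):
        if isinstance(node, list):
            for base, child in zip(bases, node):
                walk(child, prefix + base)
        elif node is not None:  # skip the padding dummies
            codons[node] = prefix

    walk(heap[0][2], "")
    return codons


def encode(text, codons):
    return "".join(codons[ch] for ch in text)


def decode(sequence, codons):
    """Read codons greedily from the start; the prefix property makes this unique."""
    lookup = {codon: symbol for symbol, codon in codons.items()}
    out, buffer = [], ""
    for base in sequence:
        buffer += base
        if buffer in lookup:
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("sequence ends in the middle of a codon")
    return "".join(out)


codons = huffman_codons(FREQ)
message = encode("hester", codons)
print(message, decode(message, codons))
```

Because no codon is a prefix of another, decoding from the first base never has to backtrack.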

We note that, in a similar manner to the above, the Huffman code has also been used to construct a 'perfect' genetic code comprising variable length codons.

Table 1.13.1 The Huffman code

1.13.2 The comma code
In the comma code, consecutive 5-base codons are separated by a single base, the comma, which is always the same: e.g. G-----G-----G-----G. The repetition of G every six bases must be construed by any careful sequence analyst as a deliberate device. The codons that slot into the gaps in the above framework are made up of the remaining bases C, A and T, but not G. These codons are further restricted to three A:T base pairs and two G:C base pairs, with the C of the latter always being located in the top strand. The codons take the general form CWWWC, where W = A or T, and the C's and W's can adopt any arrangement (e.g. ATCAC, WWCWC or WCCWW). There are 80 codons in this set. This kind of arrangement, suggested by unrelated work, has the advantage that it will generate a set of codons with isothermal melting temperatures, facilitating the construction of message DNA ('Criteria for an optimal code', above). Most (83%) point mutations give nonsense codons, and therefore the comma code is good at detecting errors.
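
The codon count and the error-detection figure can both be checked by enumeration. The sketch below assumes that a mutated unit counts as "sense" only if the comma is still G and the five following bases still form one of the 80 codons; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product


def comma_codons():
    """All 5-base codons with exactly two C's and three W's (W = A or T)."""
    codons = set()
    for c_positions in combinations(range(5), 2):
        for w_bases in product("AT", repeat=3):
            w = iter(w_bases)
            codons.add("".join("C" if i in c_positions else next(w) for i in range(5)))
    return codons


def nonsense_fraction(codons, comma="G"):
    """Fraction of single point mutations of comma+codon that give a nonsense unit."""
    sense = nonsense = 0
    for codon in codons:
        unit = comma + codon
        for pos, original in enumerate(unit):
            for base in "ACGT":
                if base == original:
                    continue
                mutant = unit[:pos] + base + unit[pos + 1:]
                if mutant[0] == comma and mutant[1:] in codons:
                    sense += 1
                else:
                    nonsense += 1
    return Fraction(nonsense, sense + nonsense)


codons = comma_codons()
print(len(codons))                       # 80
print(float(nonsense_fraction(codons)))  # 0.8333...
```

The count follows from choosing 2 of the 5 positions for C (10 ways) and filling the other three with A or T (8 ways).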

With the comma code. Like the comma code it does not use DNA economically. Unlike the comma code. in a given piece of message DNA. It should also be noted that the base composition of the codons will give. three (17%) will produce sense codons (mutation of an A to a T. NIT KURUKSHETRA 17 . It is very unlikely that the alternating structure formed by strings of these codons would go unremarked – even short stretches (8 base pairs) of alternating purines and pyrimidines have been noted in naturally occurring DNA . and it is error-detecting. Furthermore.DNA CRYPTOGRAPHY code is good at detecting errors. it offers some protection against deletion and insertion mutations. there is no automatic reading frame. which could further complicate the interpretation of the other codes. message DNA with the unusual property of a 1:1 ratio of A:T to G:C base pairs.13. Of these 18 single point mutations. the number of G:C pairs will be the same as the number of A:T pairs. For example. the alternating code has two other advantages of the comma code: it is isothermal. The other two codes described do not have this advantage. Three possible point mutations can occur at each position of the codon GCWWWC (which includes the initial comma). But the principal attraction of the comma code is the reading frame established by the regular pattern of repeating G’s. since 67% of single point mutations result in nonsense codons.3 The alternating code The alternating code comprises sixty-four 6-base codons of alternating purines and pyrimidines: RYRYRY. it might be difficult to orientate oneself with respect to the message. or vice versa) and therefore the remaining 83% of single point mutations will given nonsense codons. it is not difficult to spot the codon containing the deletion mutation in the following comma-coded sequence: GATCACGATTCCGCTATGACTCAG. 1. the alternating structure has the unusual property that. and therefore there are 18 single point mutations altogether. 
As in the comma code. the reading frame is clear. As well as creating message DNA of an obviously artificial nature. where R = A or G and Y = C or T (although there is no reason why the purines and pyrimidines should not alternate YRYRYR. or be fixed in other arrangements such as YYYRRR or RRYYRY). when the commas are included. and. unless a start point is specified. but less so than the comma code.
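The codon counts and error-detection percentages quoted for the comma code and the alternating code can be checked by brute force. The following is a minimal sketch, not part of the project code: the class name CodonSetCheck and its helper methods are hypothetical, and the validity rules are taken directly from the descriptions above (two C's and three W's per comma-code codon, preceded by the comma G; the pattern RYRYRY for the alternating code).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper class, written only to check the figures quoted in the text.
final class CodonSetCheck {
    static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** All 5-base comma-code codons: two C's and three W's (A or T), in any arrangement. */
    static Set<String> commaCodons() {
        Set<String> codons = new HashSet<>();
        build("", 5, "ACT", codons); // G is reserved for the comma
        codons.removeIf(c -> c.chars().filter(ch -> ch == 'C').count() != 2);
        return codons;
    }

    /** All 6-base alternating codons RYRYRY, with R = A or G and Y = C or T. */
    static Set<String> alternatingCodons() {
        Set<String> codons = new HashSet<>();
        build("", 6, "ACGT", codons);
        codons.removeIf(c -> !c.matches("([AG][CT]){3}"));
        return codons;
    }

    /** Recursively enumerates every word of the given length over the alphabet. */
    private static void build(String prefix, int length, String alphabet, Set<String> out) {
        if (prefix.length() == length) {
            out.add(prefix);
            return;
        }
        for (char b : alphabet.toCharArray()) {
            build(prefix + b, length, alphabet, out);
        }
    }

    /** Rounded percentage of single point mutations of word that are no longer valid. */
    static long nonsensePercent(String word, Predicate<String> isValid) {
        int total = 0;
        int nonsense = 0;
        for (int i = 0; i < word.length(); i++) {
            for (char b : BASES) {
                if (b == word.charAt(i)) continue; // not a mutation
                String mutant = word.substring(0, i) + b + word.substring(i + 1);
                total++;
                if (!isValid.test(mutant)) nonsense++;
            }
        }
        return Math.round(100.0 * nonsense / total);
    }

    public static void main(String[] args) {
        Set<String> comma = commaCodons();
        Set<String> alternating = alternatingCodons();
        System.out.println("comma codons: " + comma.size());             // 80
        System.out.println("alternating codons: " + alternating.size()); // 64
        // A comma-coded unit is the comma G followed by a valid 5-base codon.
        System.out.println("comma nonsense %: " + nonsensePercent("GATCAC",
                w -> w.charAt(0) == 'G' && comma.contains(w.substring(1)))); // 83
        System.out.println("alternating nonsense %: "
                + nonsensePercent("ACGTAC", alternating::contains));         // 67
    }
}
```

Enumerating the sets rather than hard-coding them keeps the check independent of the counts it is meant to confirm.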

Table 1.13.2 General features of the codes

Table 1.13.3 Advantages of the codes

1.13.4 Other codes
The three codes detailed above are meant to be illustrative rather than exhaustive. They are by no means the only codes, or the only types of code, possible. Three others are outlined briefly in this section.

Before experimental data for the nature of the genetic code became available, there were a number of suggestions as to what form it might take. One of these was the comma-free code. As the name suggests, a comma-free code is just a comma code without the commas. One might think that removing the commas would give a code without a reading frame. But, by restricting oneself to a set of fixed-length codons with particular base combinations, the codons in this set can be chosen such that only one reading frame is ever possible; all the others give nonsense. For instance, the 3-base codons AGG, ACG and GTG are part of a comma-free code. Any combination of these codons will give a sequence which can be read in only one way. For example, in the sequence ACGGTGGTGACGAGG, one could not begin reading one base in, at CGG, because CGG does not belong to the set. In the original paper on the subject, Crick and co-workers showed that twenty 3-base codons could be selected to act in a comma-free manner. Although twenty codons is not sufficient to comfortably encrypt text, there is a set of fifty-seven 4-base codons that would be enough to carry out this task.

There is nothing particularly wrong with the comma-free code as a message-encoding scheme. In fact, it ought to be rather good, since it is quite economical and establishes an automatic reading frame. Like the alternating and comma codes, the comma-free code would be error-detecting to a certain extent. However, the only significant clue to the synthetic nature of message DNA containing text encrypted with a comma-free code would be the absence of runs of four identical bases (e.g. AAAA), as the comma-free code forbids these. There are no such absences in natural DNA.

One other simple code that should also be mentioned, because it produces DNA that is obviously artificial, is one that uses only three of the four different bases. We would probably use a 4-base codon version of this code, to give a larger codon set (3^4 = 81 as opposed to 3^3 = 27). In fact, message DNA has already been constructed with a 3-base codon version of this code.

Finally, perhaps the most obvious code of all is one similar to the genetic code: a triplet code. Codon assignment in this case may be done in a non-random fashion, in a similar manner to the codons of the comma code, such that a degree of error-protection could be achieved, with error-correcting codons representing symbols with opposite meaning (e.g. CTT to encode for ’<’ and AAG for ’>’).
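The reading-frame property of the comma-free code described in this section can be illustrated with a short check. This is a sketch under stated assumptions: the class CommaFreeCheck and its methods are hypothetical names, all codons are assumed to have the same length, and only the three example codons from the text (AGG, ACG, GTG) are used rather than a full twenty-codon set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class illustrating the comma-free property; not project code.
final class CommaFreeCheck {

    /** True if no out-of-frame window across any two adjacent codons is itself a codon. */
    static boolean isCommaFree(Set<String> codons, int codonLength) {
        for (String x : codons) {
            for (String y : codons) {
                String pair = x + y;
                for (int offset = 1; offset < codonLength; offset++) {
                    if (codons.contains(pair.substring(offset, offset + codonLength))) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    /** Counts how many codons read from the given offset belong to the set. */
    static int validCodonsInFrame(String sequence, int offset, Set<String> codons, int codonLength) {
        int hits = 0;
        for (int i = offset; i + codonLength <= sequence.length(); i += codonLength) {
            if (codons.contains(sequence.substring(i, i + codonLength))) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> codons = new HashSet<>(Arrays.asList("AGG", "ACG", "GTG"));
        String message = "ACGGTGGTGACGAGG"; // example sequence from the text
        System.out.println("comma-free: " + isCommaFree(codons, 3));
        for (int frame = 0; frame < 3; frame++) {
            System.out.println("frame " + frame + ": "
                    + validCodonsInFrame(message, frame, codons, 3) + " valid codons");
        }
    }
}
```

Only frame 0 yields codons from the set, which is the sense in which the message can be read in only one way. A set containing a codon such as AAA fails the check, which is consistent with the remark that runs of identical bases are forbidden.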

Chapter 2
Objective

2.1 Objective
The aim of our project is to build a system which fulfills the following objectives:
• To implement the basic concepts of DNA Cryptography.
• To hide the biological complexity involved in the basic processing of DNA cryptography.
• To allow users to apply the encoding on textual information.
• To obtain an encoded text as desired.
In addition, the project aims to obtain a clear understanding of Java cryptography and its native API.

2.2 Product Perspective
The main purpose or goal of the project is to implement the basic fundamentals of DNA Cryptography using the Java platform so as to produce an encoding tool capable of applying the elementary encoding transformations to the text. Although many encoding techniques are available in the market, this project aims at understanding the limitations and configurations needed to perform a new technique (DNA Cryptography) for encoding text.

Chapter 3
System Requirement Analysis

3 System Requirement Analysis

3.1 Characteristics
The important characteristics of the system being developed:
FUNCTIONS
~ Loading the text file from source.
~ Encoding the text using DNA cryptography and PCR amplifications.
INPUT
~ User input text file for encoder
~ Encoded file for the decoder
OUTPUT
~ A transformed encoded text for sending to decoder
~ Original text file at decoder

3.2 System Requirements
The following requirements must be fulfilled to run the software on any computer system.

HARDWARE SPECIFICATIONS
Processor: Intel Pentium III or higher
Monitor: Color Monitor, 800 x 600 or higher resolution
Amplifier: PCR (Polymerase Chain Reaction) Amplifier

NetBeans 6.4 Use Case Diagram 3.1 Usecase diagram(encoder) NIT KURUKSHETRA 24 .DNA CRYPTOGRAPHY  SOFTWARE SUPPORT Operating System Framework 3.1 Encoder Fig 3.3 Technology Used Windows 9x / XP/ NT / 2000 JVM and JRE installed.0 Programming Language JAVA 5 3.4.4.

3.4.2 Decoder
Fig 3.4.2 Use case diagram (decoder)

Chapter 4
Project Overview

4 Project Overview

[Block diagram with the components: Text File, Computer, PCR Amplifier, Network (To Receiver)]
Fig 4.1 Project overview

The above figure shows the basic components comprising a typical general-purpose system used for DNA cryptography. The function of each component is described below.

Text File is a user input that has to be encoded.

The computer is a general computer that can range from a PC to a supercomputer. In dedicated applications sometimes specialized computers are used to achieve the desired level of performance. It consists of specialized modules that perform specific tasks.

PCR Amplifier is the hardware component that will be used for converting the text into a graphical format which reduces the space consumed.

Chapter 5
Software Design

5 Software Design

5.1 Methodology of Encryption Scheme
The encryption process is:
C.T. = E_KA(P.T.)
The decryption process is:
D_KB(C.T.) = D_KB(E_KA(P.T.)) = P.T.
Here P.T. is the plain text, C.T. is the cipher text, E_KA denotes encryption with the key KA, and D_KB denotes decryption with the key KB.
STEPS:
1. Key generation
2. Encryption
3. Decryption

5.2 Flow Diagrams
5.2.1 Encoder
Fig 5.2.1 Flow diagram (encoder)
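The three steps of section 5.1 (key generation, encryption, decryption) and the identity D_KB(E_KA(P.T.)) = P.T. can be illustrated with a toy example. This is only a sketch and not the project's actual encoder, whose details are not given here. It assumes a symmetric key (KA = KB), a repeating-key XOR as the cipher, and a 2-bit-per-base mapping (A = 00, C = 01, G = 10, T = 11); the class DnaCodecSketch and its method names are hypothetical. A repeating-key XOR is not secure and is used here only because it is short.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Toy illustration of C.T. = E_K(P.T.) and D_K(C.T.) = P.T.; all names are hypothetical.
final class DnaCodecSketch {
    private static final String BASES = "ACGT"; // assumed mapping: A=00, C=01, G=10, T=11

    /** Step 1: key generation (a random byte string of the requested length). */
    static byte[] generateKey(int length) {
        byte[] key = new byte[length];
        new SecureRandom().nextBytes(key);
        return key;
    }

    /** Step 2: XOR each plain text byte with the key, then write it as four bases. */
    static String encrypt(String plainText, byte[] key) {
        byte[] data = plainText.getBytes(StandardCharsets.UTF_8);
        StringBuilder dna = new StringBuilder(data.length * 4);
        for (int i = 0; i < data.length; i++) {
            int b = (data[i] ^ key[i % key.length]) & 0xFF;
            for (int shift = 6; shift >= 0; shift -= 2) {
                dna.append(BASES.charAt((b >> shift) & 0x3));
            }
        }
        return dna.toString();
    }

    /** Step 3: read four bases per byte, then undo the XOR with the same key. */
    static String decrypt(String dna, byte[] key) {
        if (dna.length() % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        byte[] data = new byte[dna.length() / 4];
        for (int i = 0; i < data.length; i++) {
            int b = 0;
            for (int j = 0; j < 4; j++) {
                int code = BASES.indexOf(dna.charAt(4 * i + j));
                if (code < 0) {
                    throw new IllegalArgumentException("not a base: " + dna.charAt(4 * i + j));
                }
                b = (b << 2) | code;
            }
            data[i] = (byte) (b ^ key[i % key.length]);
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = generateKey(16);
        String cipherText = encrypt("DNA cryptography", key);
        System.out.println(cipherText);
        System.out.println(decrypt(cipherText, key)); // recovers the plain text
    }
}
```

With an all-zero key the XOR has no effect, so the output is simply the base encoding of the text; for example, the single character "A" (byte 0x41) becomes CAAC under the assumed mapping.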

5.2.2 Decoder
Fig 5.2.2 Flow diagram (decoder)

5.3 Class Diagrams
5.3.1 Encoder
Fig 5.3.1 Class diagram (encoder)

5.3.2 Decoder
Fig 5.3.2 Class diagram (decoder)

5.3.3 KeyGen
Fig 5.3.3 Class diagram (keyGen)

Chapter 6
Software Testing

6 Testing

6.1 Testing Methodology
Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Testing is a dynamic method for verification and validation, where the system to be tested is executed and the behavior of the system is observed. It is used to detect errors.

6.2 Testing Objectives
1. Testing is a process of executing a program with the intent of finding an error.
2. A good test case is one that has a high probability of finding an as-yet-undiscovered error.
3. A successful test is one that uncovers an as-yet-undiscovered error.
The above objectives imply a dramatic change in viewpoint. They move counter to the commonly held view that a successful test is one in which no errors are found. Our objective is to design tests that systematically uncover different classes of errors and do so with a minimum amount of time and effort.

6.3 Testing Technique
The techniques followed throughout the testing of the system are as follows:

6.3.1 Black-Box Testing
Black box testing focuses on the functional requirements of the software. That is, Black Box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black Box Testing is not an alternative to white-box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white-box methods. Black-Box Testing attempts to find errors in the following categories:
• Incorrect or missing functions.

• Interface errors.
• Errors in data structures or external data base access.
• Performance errors.
• Initialization and termination errors.
Unlike White Box Testing, which is performed early in the testing process, Black Box Testing tends to be applied during later stages of testing. Because Black Box Testing purposely disregards control structure, attention is focused on the information domain. Tests are designed to answer the following questions:
• How is functional validity tested?
• What classes of input will make good test cases?
• Is the system particularly sensitive to certain input values?
• How are the boundaries of a data class isolated?
• What data rates and data volume can the system tolerate?
• What effect will specific combinations of data have on system operation?
By applying black box techniques, we derive a set of test cases that satisfy the following criteria:
• Test cases that reduce, by a count that is greater than one, the number of additional test cases that must be designed to achieve reasonable testing, and
• Test cases that tell us something about the presence or absence of classes of errors, rather than errors associated only with the specific test at hand.

6.3.2 White-Box Testing
In White Box Testing, knowing the internal workings of a product, tests can be conducted to ensure that internal operations are performed according to specifications and all internal components have been adequately exercised. Using white box testing methods, the test cases that can be derived are:
• Exercise all independent paths within a module at least once.
• Exercise all logical decisions on their true and false sides.

• Execute all loops at their boundaries and within their operational bounds.
• Exercise internal data structures to ensure their validity.

6.3.3 Control Structure Testing

6.3.3.1 Condition Testing
Condition testing is a test case design method that exercises the logical conditions contained in a program module. If a condition is incorrect then at least one component of the condition is incorrect. Therefore types of errors in a condition include the following:
• Boolean operator error
• Boolean variable error
• Boolean parenthesis error
• Relational operator error
• Arithmetic expression error

6.3.3.2 Loop Testing
Loops are the cornerstone for the vast majority of all algorithms implemented in software. Loop testing is a white-box testing technique that focuses exclusively on the validity of loop constructs. Four different classes of loops:
• Simple Loops
• Nested Loops
• Concatenated Loops
• Unstructured Loops

6.3.3.3 Dataflow Testing
The dataflow testing method selects test paths of a program according to the location of definitions and uses of variables in the program. In this testing approach, assume that each statement in a program is assigned a unique statement number and that each function does not modify its parameters or global variables.

Dataflow testing is useful for selecting test paths of a program containing nested if and loop statements. This approach is effective for error detection. However, the problems of measuring test coverage and selecting test paths for data flow testing are more difficult than the corresponding problems for condition testing.

6.4 Testing Strategies
A strategy for software testing integrates software test case design methods into a well planned series of steps that result in the successful construction of software. A software testing strategy should be flexible enough to promote a customized testing approach.

6.4.1 Unit Testing
Unit testing focuses verification efforts on the smallest unit of software design. It is white box oriented. Unit testing is essentially for verification of the code produced during the coding phase and hence the goal is to test the internal logic of the module. Others consider a module for integration and use only after it has been unit tested satisfactorily. For example:
• The module interface is tested to ensure that information properly flows in and out of the program.
• Local data structure is examined to ensure that data stored temporarily maintains its integrity.
• Boundary conditions are tested to ensure that modules operate properly at boundary limits of processing.
• All independent paths are exercised to ensure all statements in a module have been executed at least once.
• All error-handling paths are tested.

6.4.2 Integration Testing
Integration testing focuses on design and construction of the software architecture. We followed a systematic technique for constructing the program structure, that is, “putting them together” (interfacing), while at the same time conducting tests to uncover errors. We took unit tested components and built a program that has been dictated by design.
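The unit-testing checks listed in section 6.4.1 (module interface, boundary conditions, independent paths, error-handling paths) can be sketched on a small example. The unit under test here is a hypothetical helper, complement, that returns the complementary strand using the A-T and C-G pairing described in section 1.2; it is not one of the project's modules, and the plain-Java check helper stands in for a test framework so that the example stays self-contained.

```java
// Hypothetical unit and tests, written only to illustrate the checks of section 6.4.1.
final class ComplementUnitTestSketch {

    /** Unit under test: complement of a DNA strand (A<->T, C<->G). */
    static String complement(String strand) {
        StringBuilder out = new StringBuilder(strand.length());
        for (char base : strand.toCharArray()) {
            switch (base) {
                case 'A': out.append('T'); break;
                case 'T': out.append('A'); break;
                case 'C': out.append('G'); break;
                case 'G': out.append('C'); break;
                default: throw new IllegalArgumentException("not a base: " + base);
            }
        }
        return out.toString();
    }

    /** Fails loudly so that a broken unit cannot pass unnoticed. */
    static void check(boolean condition, String name) {
        if (!condition) throw new AssertionError("failed: " + name);
    }

    /** Runs every check and returns the number that passed. */
    static int runTests() {
        int passed = 0;

        // Module interface: every branch of the switch is exercised once.
        check(complement("ATCG").equals("TAGC"), "interface");
        passed++;

        // Boundary conditions: the loop runs zero times and exactly once.
        check(complement("").equals(""), "boundary: empty strand");
        passed++;
        check(complement("A").equals("T"), "boundary: single base");
        passed++;

        // Many iterations: complementing twice must return the input.
        check(complement(complement("GATTACA")).equals("GATTACA"), "round trip");
        passed++;

        // Error-handling path: an invalid symbol must be rejected.
        boolean rejected = false;
        try {
            complement("AXG");
        } catch (IllegalArgumentException expected) {
            rejected = true;
        }
        check(rejected, "error-handling path");
        passed++;

        return passed;
    }

    public static void main(String[] args) {
        System.out.println(runTests() + " checks passed");
    }
}
```

The empty, single-base and multi-base inputs also correspond to the zero, one and many iterations that loop testing asks for in a simple loop.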

6.4.3 Validation Testing
Validation testing is achieved through a series of Black Box tests. An important element of the validation process is the configuration review, which is intended to ensure that all the elements are properly configured and cataloged. It is also called an audit.

6.4.4 System Testing
The last high-order testing step falls outside the boundary of software engineering and into the broader context of computer system engineering. Software, once validated, must be combined with other system elements (e.g., hardware, people, and database). System testing verifies that all elements mesh properly and that overall system function/performance is achieved. It is a series of different tests whose primary purpose is to fully exercise the computer-based system. Although each test has a different purpose, all work to verify that system elements have been properly integrated and perform allocated functions.

Chapter 7
Project Snapshots

7.1 Text file
Fig 7.1 Snapshot (original text)

7.2 Encoded file
Fig 7.2 Snapshot (encoded text)

7.3 Decoded file
Fig 7.3 Snapshot (decoded text)

Chapter 8
Conclusion

8 Conclusion
The main purpose or goal of the project was to study and implement the basic fundamentals of DNA cryptography on textual information. This project provides an insight into the various details of DNA and its use for cryptographic purposes. It also provided us with an opportunity to analyse and practice all the phases of the Software Development Life Cycle.

Chapter 9
Future Prospects & Enhancements

9 Future Prospects and Enhancements
• This project can be extended to encrypt other data formats.
• DNA Cryptography can be used to prevent cyber crimes like hacking, and to provide a secure channel for communication.
• The space complexity can be reduced by practical usage of a PCR Amplifier.
• Ongoing research could be used for the future enhancement of this project.

APPENDIX

Abbreviations    Full forms
DNA              Deoxyribonucleic Acid
RNA              Ribonucleic Acid
PCR              Polymerase Chain Reaction
C                Cytosine
T                Thymine
A                Adenine
G                Guanine
U                Uracil
mRNA             Messenger Ribonucleic Acid
tRNA             Transfer Ribonucleic Acid

