
ABSTRACT

Data security is an increasingly challenging issue as information capacity and
transmission rates grow. The most widely used techniques in the data security field
are cryptography and steganography, and combining the two provides more security
than either alone. DNA (deoxyribonucleic acid) is now being explored as
a new carrier for data security because it offers strong protection with
high capacity and a low modification rate. A new data security method can be developed by
combining DNA-based AES (Advanced Encryption Standard) cryptography with DNA
steganography. This technique provides multilayer security for the secret message:
the secret message is first encoded into DNA bases, then a DNA-based AES algorithm is applied to
it, and finally the encrypted DNA is concealed in another DNA sequence. This hybrid
technique provides triple-layer security for the secret message.
CONTENTS
SL.NO Title

1. INTRODUCTION
2. RELATED WORKS
3. PROPOSED METHOD
3.1 DNA Encoding
3.2 DNA Based AES Algorithm
3.3 DNA Encryption Steps
3.4 DNA Steganography Steps
3.5 DNA Decryption Steps
4. SECURITY ANALYSIS
4.1 Theoretical Security Analysis of the Proposed Method
4.2 Comparative Study
5. CONCLUSION AND FUTURE WORK
LIST OF FIGURES
Figure No Title

3.1 AES Shift Rows Operation
3.2 AES Mix Columns Operation
3.3 DNA Encryption
3.4 DNA Steganography
3.5 DNA Decryption

LIST OF TABLES
Table No Title

3.1 Binary DNA Coded Set
3.2 4-Bit Binary Coding Rule
3.3 AES Substitution S-Box
4.1 Comparison of Various DNA Encryption Methods

CHAPTER 1

INTRODUCTION

Security in data communication is required when a message transferred between sender
and receiver must be kept confidential. Cryptography provides a way of securing
messages for confidential transfer. It is the process of transforming the
sender's message into a secret format, called cipher text, so that only the intended receiver can
understand its meaning. It can be thought of as a process of secret
writing that protects data or messages from attacks by an intruder.
Security is concerned with the protection of messages or data while they are transmitted over
networks. Achieving complete data security remains a challenging issue in data and
message transfer. To enhance data security and keep data confidential,
effective encryption algorithms are required.
DNA-based encryption is one of the recent techniques introduced into the
cryptographic field, and many researchers are working on it. Some of them use DNA computing,
while others apply the biological properties of DNA strands and DNA sequences after making
a few modifications. DNA complementary rule substitution and message embedding within a DNA
sequence make the cipher text much larger than the plaintext. DNA cryptography is
a promising new direction in
cryptography research that emerged with progress in the DNA computing field, in which DNA is
used as the information carrier and modern biological technology is used as the implementation
tool. The tremendous storage capacity of DNA, together with the ability to synthesize DNA
sequences of any desired length, makes DNA well suited as a medium for cryptography and
steganography. However, like every data storage medium, DNA requires protection through
secure algorithms.
DNA stands for deoxyribonucleic acid. DNA is an important macromolecule (germ
plasm) that is necessary for life and consists of many nucleotides. Each nucleotide contains a single
base, and there are four primary bases in total: Adenine (A), Thymine (T), Cytosine (C) and Guanine (G).
These bases represent the genetic code. Generally, DNA exists as double-stranded molecules. DNA
bases are bonded to each other by hydrogen bonds, A with T and C with G; these are called the
complementary pairs of DNA strands.
Many security researchers are working on data security and are
trying to incorporate DNA concepts into their proposed algorithms, directly or indirectly. Some
use DNA computing in their algorithms, while others incorporate
DNA properties. They have pointed out properties of DNA
sequences that can be used to encrypt data by embedding the message in a DNA
sequence, and they are also working on adding extra coding within the cipher text.
The Advanced Encryption Standard (AES) is specified in a Federal Information Processing
Standards (FIPS) publication as an approved cryptographic algorithm that can be used to protect electronic
data. The AES algorithm is a symmetric block cipher that can encrypt (encipher) and decrypt
(decipher) information. AES is based on the Rijndael cipher developed by two Belgian
cryptographers, Joan Daemen and Vincent Rijmen, who submitted a proposal to NIST during
the AES selection process. Rijndael is a family of ciphers with different key and block sizes. AES
has been adopted by the U.S. government and is now used worldwide.
The encryption algorithm proposed here combines the concepts
of DNA encoding and DNA-based AES encryption with DNA steganography.
CHAPTER 2

RELATED WORKS
This section briefly reviews some related works. “Design of DNA-based
Advanced Encryption Standard (AES)” [2] explains how DNA can be used as a basic element
for many processes, leading to the implementation of a DNA-based version of the Advanced
Encryption Standard (DAES). It also tries to facilitate the link and interaction between the
fields of DNA computing and digital computing. The DNA-based AES has the same strength
properties and high security as AES. The work also shows how complicated binary
operations can be implemented on a DNA basis using very elementary requirements.
The main contribution was to demonstrate the feasibility of implementing such a complex
system on DNA computers using a DNA basis. The work converts all of the
algorithm's data, operations, functions and specifications from a binary basis to a DNA basis.
Because research in DNA cryptography has recently focused more on theory than on implementation, the
authors converted all of the material to DNA form and implemented the algorithm on
digital computers, with a detailed specification of the minimum requirements needed to implement
it on DNA computers.
In “Hybrid Technique for Steganography-based on DNA with N-Bits Binary Coding Rule” [3],
a blind data-hiding hybrid technique is introduced that uses cryptography and
steganography to achieve a double-layer secured system. A new binary coding rule is
proposed that assigns 2n bits to each combination of n nucleotides instead of assigning two bits
to a single nucleotide, which increases the number of rules from 4! to $(2^n \cdot 2^n)!$. The larger
number of binary coding rules strengthens the algorithm’s security. The method consists of two phases.
Phase one converts the message to DNA format using the proposed n-bits binary coding
rule, which makes the algorithm harder to crack than comparable algorithms,
and then applies the Playfair cipher based on DNA and amino acids to encrypt the secret
message, which generates ambiguity. Phase two hides the ciphered secret message together with
the ambiguity resulting from the first phase.
The data is hidden using the least significant base (LSBase) of each codon of a
selected DNA reference sequence with a 3:1 hiding strategy. The technique
hides the data in DNA while preserving its biological functions as far as possible and without requiring any
extra data to be sent to the receiver. The method also does not expand the real DNA
reference sequence: it preserves the original DNA sequence length because it depends on
substitution only, so the algorithm’s payload is zero, which avoids attracting attention to the
faked sequence. As future work, the authors suggest modifying the method to increase the hiding
capacity of the DNA sequence while also increasing the algorithm’s security.
“Cryptography and Bioinformatics Techniques for Secure Information Transmission over
Insecure Channels” [4] is another method in the DNA cryptographic field. Its authors claim it is better than AES and DES.
The algorithm uses large key values and the uniqueness of DNA characteristics; each DNA
consists of 3 billion keys for a small amount of binary data. It is claimed to provide stronger
protection against intruder attacks such as ciphertext-only and chosen-ciphertext attacks. The
key for encryption is obtained from the server. The DNA complementary rule is applied to the
DNA-based form of the data as well as the key. An XOR operation is then performed between the
binary forms of the key and the data, and the result is converted back to DNA form. Finally, the ASCII values of
the bases are obtained and converted to their corresponding seven-bit binary form.
The method proposed in “A DNA and Amino-Acids Based Implementation Of Four-Square
Cipher” [7] is a significant modification of the earlier approach that combined DNA and amino acids
with the Playfair cipher: it uses the same approach with a different encryption
algorithm, the Four-Square cipher, at the core of the ciphering process. In this study, binary
data such as plaintext messages or images is transformed into sequences of DNA
nucleotides. These nucleotides then pass through a Four-Square encryption process
based on amino acid structure. The fundamental idea behind this type of encryption
process is to reinforce conventional cryptographic algorithms that have been broken,
and to open the door to applying DNA and amino acid concepts to more
conventional cryptographic algorithms to enhance their security features.
CHAPTER 3
PROPOSED METHOD

To provide better security and reliable data transmission, an efficient
method of DNA-based cryptography with steganography is proposed here. The encryption is based
on a combination of DNA encoding and DNA-based AES encryption along with
DNA steganography. The proposed scheme consists of two phases. The first converts
the secret message to DNA by mapping the binary bits to DNA nucleotides using a proposed
n-bits binary coding rule.
The n-bits binary coding rule maps n bits of the message to m bases of
DNA. The DNA and amino acids based Playfair cipher then encrypts the encoded message.
In the second phase, the ciphered message is hidden in a DNA sequence selected from the NCBI
(National Center for Biotechnology Information) database using the LSBase method. The first
contribution is therefore a double-layer security system, built by developing a hybrid technique
that hides the encrypted secret data in DNA. The second contribution is the
n-bits binary coding rule for converting the binary format of a text to DNA, which
makes the algorithm markedly harder to crack. The third contribution is
the 3:1 ratio used in the data hiding technique. This strategy hides the secret
data in DNA using the LSBase method and avoids sending extra data to the
receiver.
3.1 DNA Encoding
During DNA encoding, the data undergoes several conversions to obtain its DNA form.
The DNA and amino acids based Playfair cipher is applied as follows:
1. Input: Secret message M and secret key K.
2. Processing: The secret message is mapped to its ASCII values, and each ASCII
value is written in 8-bit binary form to produce Mbin. Each group of 4 bits is then converted to nitrogen
bases (DNA bases) using the 4-bit binary coding rule. By comparison, the naive binary coding rule maps each
2 bits to one DNA nucleotide, for example (A = 00, C = 01, G = 10, T = 11).
Table.3.1: Binary DNA Coded set

The naive rule allows 24 permutations. The first choice is for nucleotide A, which has
4 possibilities: 00, 01, 10 or 11. The next choice is for C, which has the 3 binary codes
remaining after removing the code assigned to A. Similarly, G has two options, and
T is assigned the last remaining binary code. So the total number of distinct
permutations is 4*3*2*1 = 24.
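As an illustration of the naive rule, the following minimal sketch encodes a short ASCII message with the example mapping (A = 00, C = 01, G = 10, T = 11) and counts the possible rules by enumeration. The function names are hypothetical and chosen only for this example; they are not part of the proposed method.

```python
from itertools import permutations

# Example naive rule from the text: A=00, C=01, G=10, T=11.
NAIVE_RULE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def text_to_bits(message: str) -> str:
    """Write each character's ASCII value as an 8-bit binary string."""
    return "".join(format(ord(ch), "08b") for ch in message)


def bits_to_dna(bits: str, rule: dict[str, str]) -> str:
    """Map every 2-bit group to one DNA base using the given rule."""
    return "".join(rule[bits[i:i + 2]] for i in range(0, len(bits), 2))


def count_naive_rules() -> int:
    """Count the ways to assign the four 2-bit codes to A, C, G, T."""
    codes = ["00", "01", "10", "11"]
    return sum(1 for _ in permutations(codes))


dna = bits_to_dna(text_to_bits("Hi"), NAIVE_RULE)
print(dna, count_naive_rules())
```

Each 8-bit character becomes four bases under this rule, so the DNA form is four times as long as the message in characters.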
In contrast, the proposed binary coding rule in Table 3.2 maps each four bits of the binary
message to two DNA nucleotides in order to increase the algorithm’s security relative to the BCR shown
in Table 3.1. AA can be assigned 0000, 0001, 0010, ..., or 1111, so it has 16
possibilities. The next choice is for AC, which has the 15 binary codes remaining after
removing the code assigned to AA. After that, AG has 14 options once AA and AC
are assigned, and so on. So the total number of distinct permutations is 16*15*14*...*3*2*1 = 16!.
This makes the rule much harder to guess, as discussed in the security analysis section.
The binary coding rule can be generalized by assigning 2n bits to each group of n nucleotide
bases to achieve the degree of security required by the system. For
example, A, AA, AAA, AAAA, ..., (A...A), where A is repeated n times in the last term,
can be assigned respectively to 00, 0000, 000000, 00000000, ..., (00...00), where 00 is repeated
n times in the last term.

Table.3.2: 4-bits binary coding rule that is used to convert binary format of a message to DNA

Binary coding rule: The proposed 4-bit binary coding rule represents each pair of
nucleotides by four bits. Since there are 4 bases (A, C, G and T), the possible combinations
of them are $4^2 = 16$ pairs (AA, AC, AG, AT, CC, CA, ...). The sender is free to assign any
four-bit code to each pair of nucleotides. This means that AA can be represented by ‘0000’,
‘0001’, ‘0010’, ‘0011’, ..., ‘1111’, so it has 16 options. If ‘0000’ is selected to
represent AA, AC can be represented by ‘0001’, ‘0010’, ‘0011’, ‘0100’,
..., ‘1111’, so it has 15 options, and so on until all 16 pairs of nucleotides have been assigned a binary
representation. The total number of binary coding rules (# of BCRs) is therefore:
$\#\text{ of BCRs} = 16 \times 15 \times 14 \times 13 \times 12 \times \dots \times 4 \times 3 \times 2 \times 1 = 16!$
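The 4-bit rule can be sketched as a lookup table from the sixteen 4-bit codes to the sixteen base pairs. In the sketch below, the rule is chosen by shuffling the pairs with a seeded generator; that selection mechanism, the seed value and the function names are assumptions made for illustration, and Python's `random` module is not suitable for real key material.

```python
import math
import random
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT
CODES = [format(i, "04b") for i in range(16)]            # 0000 ... 1111


def make_rule(seed: int) -> dict[str, str]:
    """Pick one of the 16! rules by shuffling the pairs (illustrative only)."""
    pairs = PAIRS[:]
    random.Random(seed).shuffle(pairs)
    return dict(zip(CODES, pairs))


def encode(bits: str, rule: dict[str, str]) -> str:
    """Map every 4-bit group to a pair of DNA bases."""
    if len(bits) % 4:
        raise ValueError("bit string length must be a multiple of 4")
    return "".join(rule[bits[i:i + 4]] for i in range(0, len(bits), 4))


def decode(dna: str, rule: dict[str, str]) -> str:
    """Invert the rule: map every pair of bases back to its 4-bit code."""
    inverse = {pair: code for code, pair in rule.items()}
    return "".join(inverse[dna[i:i + 2]] for i in range(0, len(dna), 2))


rule = make_rule(seed=2024)  # hypothetical seed shared by sender and receiver
print(encode("0100100001101001", rule), math.factorial(16))
```

Sender and receiver must agree on the same rule, so in practice the rule (or whatever selects it) acts as part of the secret.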
The output DNA of the secret message is converted to amino acids according to a new
distribution of the alphabet with corresponding new codons. The new distribution is
derived from the standard universal table of amino acids and their DNA codon representations.
Since each amino acid is associated with multiple codons and the message is converted from
DNA to amino acids, something must record the index of each DNA codon
corresponding to each amino acid so that the receiver can retrieve the correct codon in the
decryption phase, when amino acids are converted back to DNA.
The conversion of DNA to amino acids is done by first obtaining the RNA form, replacing every ‘T’
with ‘U’ (the uracil nitrogen base). During the amino acid conversion, the corresponding ambiguity bits are
recorded. Swapping is then applied to each pair of amino acids, and the result is converted back to bases
with ambiguity ‘00’ using the same amino acid table. DNA decoding is the reverse of
the encoding steps.
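The role of the ambiguity bits can be illustrated with a small sketch. It uses only four amino acids from the standard genetic code, not the modified distribution described above, and the ordering of codons within each amino acid is an assumption made for this example. The ambiguity is simply the 2-bit index of the codon within its amino acid's codon list.

```python
# Subset of the standard genetic code (RNA codons); the order inside each
# list is an assumption used only to define the 2-bit ambiguity index.
CODONS = {
    "Ala": ["GCU", "GCC", "GCA", "GCG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
    "Val": ["GUU", "GUC", "GUA", "GUG"],
    "Pro": ["CCU", "CCC", "CCA", "CCG"],
}
LOOKUP = {
    codon: (amino, index)
    for amino, codons in CODONS.items()
    for index, codon in enumerate(codons)
}


def dna_to_amino(dna: str) -> tuple[list[str], list[str]]:
    """Convert DNA to amino acids, recording a 2-bit ambiguity per codon."""
    rna = dna.replace("T", "U")
    aminos, ambiguity = [], []
    for i in range(0, len(rna), 3):
        amino, index = LOOKUP[rna[i:i + 3]]
        aminos.append(amino)
        ambiguity.append(format(index, "02b"))
    return aminos, ambiguity


def amino_to_dna(aminos: list[str], ambiguity: list[str]) -> str:
    """Recover the exact DNA by selecting each codon via its ambiguity bits."""
    rna = "".join(CODONS[a][int(b, 2)] for a, b in zip(aminos, ambiguity))
    return rna.replace("U", "T")


aminos, ambiguity = dna_to_amino("GCTGGA")
print(aminos, ambiguity, amino_to_dna(aminos, ambiguity))
```

Without the recorded indices, the receiver would know only the amino acid and could not tell which of its codons was originally used.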
3.2 DNA Based AES Algorithm
After DNA encoding, the result is a sequence of nitrogen bases. The DNA-based
AES algorithm takes data in blocks of 64 bases. A 128-bit key (64 DNA bases) is used for encryption
as well as decryption. AES is a variant of Rijndael with a fixed block size of 128 bits and a
key size of 128, 192, or 256 bits. By contrast, the Rijndael specification allows
block and key sizes that may be any multiple of 32 bits, with a minimum of 128 and a
maximum of 256 bits. AES operates on a 4x4 column-major matrix of bytes, termed the
state, although some versions of Rijndael have a larger block size and additional columns
in the state. Most AES calculations are done in a special finite field. The key size used for an AES
cipher specifies the number of transformation rounds that convert the input,
called the plaintext, into the final output, called the cipher text. The number of rounds is as
follows:
• 10 rounds for 128-bit keys.
• 12 rounds for 192-bit keys.
• 14 rounds for 256-bit keys.
In each round (up to Nr), AES applies four basic steps to the state. AES also
requires a key expansion step that expands the cipher key into the round key schedule,
providing a round key for each round with a size equal to the block size. The cipher key is 128,
192, or 256 bits long, corresponding to Nr = 10, 12, and 14 respectively. The binary form of the plaintext
is divided into 128-bit blocks [STATE], and each block goes through Nr rounds of encryption.
The input key is subjected to Nr steps of key expansion, yielding Nr round keys in addition to the
initial key. The steps of key expansion are:
1- RotWord() takes a word [a0, a1, a2, a3] as input, performs a cyclic permutation, and returns
the word [a1, a2, a3, a0].
2- SubWord() is a function that takes a four-byte input word and applies the S-box to each of
the four bytes to produce an output word (substitution).
3- The round constant word array, Rcon[i], contains the values given by [$x^{i-1}$, {00}, {00}, {00}],
with $x^{i-1}$ being powers of x (x is denoted as {02}) in the field GF($2^8$) (note that i starts at 1,
not 0).
4- The last operation XORs the result with Rcon and then XORs it with temp (the output of the previous
round).
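Two of the key expansion helpers above, RotWord and Rcon, are small enough to sketch directly on bytes. This is a minimal byte-level illustration of standard AES, not the DNA-based variant; SubWord is omitted because it needs the full S-box table, and the helper name `xtime` is a common convention rather than a name from the text.

```python
def rot_word(word: list[int]) -> list[int]:
    """Cyclic left rotation: [a0, a1, a2, a3] -> [a1, a2, a3, a0]."""
    return word[1:] + word[:1]


def xtime(value: int) -> int:
    """Multiply by x ({02}) in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    value <<= 1
    if value & 0x100:
        value ^= 0x11B
    return value


def rcon(i: int) -> list[int]:
    """Round constant word [x^(i-1), 00, 00, 00]; i starts at 1."""
    if i < 1:
        raise ValueError("i starts at 1")
    value = 1
    for _ in range(i - 1):
        value = xtime(value)
    return [value, 0, 0, 0]


print(rot_word([0x09, 0xCF, 0x4F, 0x3C]))
print([hex(rcon(i)[0]) for i in range(1, 11)])
```

The reduction step in `xtime` is what makes the ninth and tenth constants wrap around to {1b} and {36} instead of overflowing a byte.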
To obtain the cipher text, the input undergoes 10 rounds of operation, where
each round involves the following functions:
1) DNAAddRoundKey: A simple operation that XORs the elements of the STATE with
the corresponding round key. The round keys are generated through the key expansion process.
2) DNASubBytes: In this step, each group of 4 DNA bases in the STATE stream is used as input to the
substitution function of the S-box.
Table. 3.3. AES SUBSTITUTION S-BOX

Each 2 bytes act as the row and column entries to the S-box, resulting in 2 different bytes.

3) DNARows (shift rows): A transformation that operates row by row on the STATE. It
separates each row into its own stream and then rotates it left by 4 DNA characters (one byte)
multiplied by the row number.

Fig.3.1: AES Shift Rows Operation

Each row r (0 ≤ r ≤ 3) is circularly shifted left by r positions. As a stream: shift(0,4) = do nothing;
shift(1,4) = 1; shift(2,4) = 2; shift(3,4) = 3.

4) DNAMixColumn: In the Mix Column function, a predefined column matrix is XORed with each
column of the input according to the round number. This function is present only in the first eight rounds.
The decryption procedure involves the inverse of all the encryption round functions. During
decryption, the inverse DNA Mix Column is present in eight rounds, all except the first and last.
Fig.3.2: AES Mix Columns Operation
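The XOR and row-shift round functions described above can be sketched directly on DNA strings. The sketch assumes the naive mapping A = 00, C = 01, G = 10, T = 11 for the base-level XOR and represents the 64-base STATE as four rows of 16 bases (4 bytes of 4 bases each); both choices, and the function names, are assumptions for illustration rather than the exact construction used in the DNA-based AES.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_xor(left: str, right: str) -> str:
    """XOR two equal-length DNA strings base by base (AddRoundKey style)."""
    if len(left) != len(right):
        raise ValueError("operands must have the same length")
    return "".join(
        BITS_TO_BASE[BASE_TO_BITS[a] ^ BASE_TO_BITS[b]]
        for a, b in zip(left, right)
    )


def dna_shift_rows(rows: list[str]) -> list[str]:
    """Rotate row r left by r bytes, that is, by 4 * r DNA bases."""
    shifted = []
    for r, row in enumerate(rows):
        offset = (4 * r) % len(row)
        shifted.append(row[offset:] + row[:offset])
    return shifted


state_rows = ["AAAACCCCGGGGTTTT"] * 4
print(dna_xor("ACGT", "TTTT"))
print(dna_shift_rows(state_rows))
```

Because XOR is its own inverse, applying `dna_xor` with the same round key a second time restores the original bases, which is what decryption relies on.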

3.3 DNA Encryption Steps


1. Convert the message to binary form from its ASCII values.
2. Use the 4-bit binary coding rule [3] to convert the binary message to DNA. Then check
whether the length of the DNA form is divisible by three, so that it can be divided into codons (each
codon contains three bases). If not, append the nitrogen base ‘A’ to the end of the DNA form
until codons can be formed.
3. Convert the DNA form obtained in the previous step to amino acids using the amino acid table.
During this process, record the corresponding ambiguity bits [3], which will be used in the DNA
decoding step.
4. Perform swapping on the amino acid form of the message and convert it back to DNA. To
perform DNA-based AES encryption, the resulting DNA form must be divided into blocks,
or states, of 64 DNA bases each. Check whether the number of DNA bases
is divisible by 64. If not, append the base ‘A’ to the end of the DNA form until the
states can be formed. Then randomly generate the key and perform key expansion.
5. Apply the DNA-based AES algorithm.
Fig.3.3: DNA Encryption

6. Convert the DNA cipher to binary.
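The two padding checks in the steps above (codons of 3 bases, states of 64 bases) follow the same pattern and can be sketched with one helper. The function names are hypothetical and used only for this example.

```python
def pad_with_a(dna: str, unit: int) -> str:
    """Append 'A' until the length is a multiple of unit (3 or 64)."""
    remainder = len(dna) % unit
    if remainder:
        dna += "A" * (unit - remainder)
    return dna


def split_units(dna: str, unit: int) -> list[str]:
    """Split an already padded DNA string into codons or states."""
    return [dna[i:i + unit] for i in range(0, len(dna), unit)]


codons = split_units(pad_with_a("CAGACGGC", 3), 3)
states = split_units(pad_with_a("CG" * 35, 64), 64)
print(codons, len(states))
```

Note that the receiver cannot distinguish appended ‘A’ bases from message bases by inspection alone, which is why the decryption steps remove trailing bases only until codon formation becomes possible.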

3.4. DNA Steganography steps


DNA steganography handles both the embedding and the extraction of data using a DNA
reference sequence. The reference DNA is publicly available in the NCBI (National Center for
Biotechnology Information) database. The data hiding procedure adopted here is based on the
LSBase method and the adjacent base method. The hiding operation is the same in both;
the only difference is in the selection of bases for hiding. LSBase is a substitution
method that uses the least significant base in the codons of the DNA reference sequence, whereas in the
adjacent base method the hiding is done on adjacent bases, in a continuous manner.
The LSBase method hides the secret message bits in the least significant base
of each codon of the reference sequence. Any sequence is a combination of purine bases
(A and G) and pyrimidine bases (C and T). To hide the cipher message bits, the following
steps are applied:
1. Input: Ciphered DNA message (cipherMDNA), ambiguity (AMBIG), and a DNA reference
sequence.

Fig.3.4: DNA Steganography

2. Processing: The DNA produced by the encryption phase (cipherMDNA) is converted back into
binary to form the cipher binary message (cipherMbin) using the 4-bit binary
coding rule. AMBIG is likewise converted to binary (AMBIGbin); since the
maximum number of codons corresponding to an amino acid is 4, indexed from 0 to 3, each index can
be represented in at most 2 bits, as shown in Fig. 3.4. Select a DNA reference sequence from
one of the public databases such as EBI or NCBI and convert it to RNA by substituting each T
with U. Hide cipherMbin and AMBIGbin using the LSBase method.
The hiding methodology depends on the message bits and the LSBase of each
codon in the reference sequence S. The LSBase of the codon is checked: if it is a purine base (A or G), it is replaced by
G to encode 1 of the secret message or A to encode 0. If the LSBase is a pyrimidine base (C
or U, the RNA form of T), it is replaced by C to encode 1 or U to encode 0. The LSBase algorithm skips the
codons UGA, UGG, AUA and AUG during hiding because, according to the
standard distribution of DNA codons to amino acids, the Met and Trp amino acids each have a single
codon, AUG and UGG respectively. The stop codon UGA is skipped too, since changing its last base
to G would yield UGG (Trp). Finally, Ile is coded by three codons, AUU, AUC and AUA, so AUA is
skipped and AUU and AUC are used in data hiding. The complete data hiding scheme is
shown in Fig. 3.4.
3. Output: The output of phase one is not only the cipher binary message but also the ambiguity
that results from converting the DNA format of the message to amino acids. The objective of the
proposed method is to hide both the secret message and the ambiguity, which the receiver needs to
retrieve the secret message from the DNA, without additional information.
This is possible because the ratio of the length of the binary cipher message to the length of the binary
ambiguity is 3:1: each three-base codon corresponds to six message bits and two ambiguity bits.
Hiding the message with the ambiguity in the DNA sequence at a 3:1 ratio avoids adding extra data
to mark the starting positions of the message and of the ambiguity in the DNA reference
sequence.
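The LSBase substitution described in the processing step can be sketched as follows. This is a minimal illustration under stated assumptions: the reference sequence is processed in RNA form, one bit is hidden per usable codon, the output stays in RNA form, and the function names are hypothetical. Codons beyond the end of the message are left unchanged.

```python
SKIPPED = {"UGA", "UGG", "AUA", "AUG"}  # codons the method leaves untouched
PURINES = {"A", "G"}


def lsbase_hide(reference_dna: str, bits: str) -> str:
    """Hide one bit in the last base of each usable codon (RNA output)."""
    rna = reference_dna.replace("T", "U")
    usable_length = len(rna) - len(rna) % 3
    out, position = [], 0
    for i in range(0, usable_length, 3):
        codon = rna[i:i + 3]
        if position < len(bits) and codon not in SKIPPED:
            if codon[2] in PURINES:
                last = "G" if bits[position] == "1" else "A"
            else:
                last = "C" if bits[position] == "1" else "U"
            codon = codon[:2] + last
            position += 1
        out.append(codon)
    if position < len(bits):
        raise ValueError("reference sequence is too short for the message")
    return "".join(out) + rna[usable_length:]


def lsbase_extract(stego_rna: str, n_bits: int) -> str:
    """Read the hidden bits back: G or C means 1, A or U means 0."""
    bits = []
    for i in range(0, len(stego_rna) - 2, 3):
        if len(bits) == n_bits:
            break
        codon = stego_rna[i:i + 3]
        if codon in SKIPPED:
            continue
        bits.append("1" if codon[2] in "GC" else "0")
    return "".join(bits)


stego = lsbase_hide("ATGGCTTGGACC", "10")
print(stego, lsbase_extract(stego, 2))
```

Because a purine is only ever replaced by a purine and a pyrimidine by a pyrimidine, a usable codon can never turn into one of the skipped codons, so sender and receiver always agree on which codons carry data.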
To hide the cipher text bits and ambiguity bits, a pattern is formed by taking each
three cipher bits followed by one ambiguity bit. For the pattern to form, the number
of cipher bits must be exactly three times the number of ambiguity bits. If it is not, append the
required ‘0’ bits to the end of the cipher bits or the ambiguity bits. After extraction, the extra bits
are excluded. In the extraction phase, the lengths of the key bits and ambiguity bits can be extracted
easily, since they were embedded using the adjacent base method. After this, the pattern is
extracted, and the cipher bits and ambiguity bits are separated from it.
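The 3:1 pattern can be sketched as a simple interleaving of two bit strings. The function names are hypothetical, and the sketch assumes the receiver learns the original lengths separately so that the padding zeros can be dropped afterwards.

```python
def interleave_3_to_1(cipher_bits: str, ambiguity_bits: str) -> str:
    """Build the pattern: three cipher bits followed by one ambiguity bit."""
    groups = max(-(-len(cipher_bits) // 3), len(ambiguity_bits))
    cipher = cipher_bits.ljust(3 * groups, "0")    # pad with '0' if short
    ambiguity = ambiguity_bits.ljust(groups, "0")  # pad with '0' if short
    return "".join(
        cipher[3 * i:3 * i + 3] + ambiguity[i] for i in range(groups)
    )


def deinterleave(pattern: str) -> tuple[str, str]:
    """Split the pattern back into (padded) cipher bits and ambiguity bits."""
    if len(pattern) % 4:
        raise ValueError("pattern length must be a multiple of 4")
    cipher = "".join(pattern[i:i + 3] for i in range(0, len(pattern), 4))
    ambiguity = "".join(pattern[i + 3] for i in range(0, len(pattern), 4))
    return cipher, ambiguity


pattern = interleave_3_to_1("110001", "01")
print(pattern, deinterleave(pattern))
```

Since every fourth bit of the pattern is an ambiguity bit, no separate marker is needed to show where the message ends and the ambiguity begins.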

3.5 DNA Decryption Steps


The steps for DNA decryption are as follows.
1. Convert the binary form of the cipher text to DNA. Also obtain the DNA form of the extracted key,
then perform key expansion.
2. Decrypt the DNA cipher with the DNA-based AES algorithm.
3. Convert the output of the previous step to amino acids. Before the conversion,
check whether the DNA form can be divided into codons; if not, remove bases from
the end one by one until codons can be formed (this removes the extra bases appended
during encryption).
4. Perform reverse swapping on the resulting amino acid form.
5. Convert the amino acids to DNA with the help of the ambiguity bits, then convert the DNA to binary.
6. Obtain the original message by converting the binary to ASCII characters.
Fig.3.5: DNA Decryption
CHAPTER 4
SECURITY ANALYSIS
The security analysis of the proposed method is included in this section along with the
comparative study of the proposed method and some recent DNA cryptography methods.

4.1 Theoretical security analysis of the proposed method.


The basic information an intruder needs to crack the proposed method is the DNA reference
sequence used for data hiding, the key used for encryption, the DNA encoding rule, and the hiding method
adopted.
1) DNA reference sequence, DNA encoding rule and LSBase data hiding method. The probability
of an attacker making a successful guess (SG) of the DNA reference sequence used for data hiding, the
DNA encoding rule and the hiding method adopted is
$P(SG) = \frac{1}{1.63 \times 10^{8} \times 16 \times 4}$

2) Key used in DNA based AES encryption.


The key used in DAES encryption is 64 DNA bases long. Since it is generated randomly,
the number of possible keys is $4^{64}$ (each of the 64 DNA bases of the key can be any
of A, C, G and T). So the probability of successfully guessing the key is:
$P(SG \text{ of key}) = \frac{1}{4^{64}}$
The total probability of an attacker successfully guessing all parameters of the proposed method,
obtained by combining the two equations above, is therefore:
$P(SG) = \frac{1}{1.63 \times 10^{8} \times 16 \times 4 \times 4^{64}}$
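The probabilities above can be evaluated exactly with rational arithmetic. The sketch below takes the factors 1.63 × 10^8, 16 and 4 as given in the text; the variable names and the `security_bits` helper are hypothetical and only express the result on a base-2 logarithmic scale.

```python
import math
from fractions import Fraction

REFERENCE_FACTOR = 163_000_000  # 1.63 * 10^8, as given in the text
FACTOR_16 = 16                  # factor 16, as given in the text
FACTOR_4 = 4                    # factor 4, as given in the text
KEY_BASES = 64                  # key length in DNA bases

key_space = 4 ** KEY_BASES      # each base is one of A, C, G, T
p_parameters = Fraction(1, REFERENCE_FACTOR * FACTOR_16 * FACTOR_4)
p_key = Fraction(1, key_space)
p_total = p_parameters * p_key  # independent guesses multiply


def security_bits(probability: Fraction) -> float:
    """Express a guessing probability as -log2(p)."""
    return math.log2(probability.denominator) - math.log2(probability.numerator)


print(key_space == 2 ** 128)
print(round(security_bits(p_total), 2))
```

The key term dominates: $4^{64} = 2^{128}$, so a 64-base key contributes the same search space as a 128-bit binary key.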

4.2 Comparative Study.


Some recent DNA-based cryptographic methods are presented in Table 4.1, which
compares the proposed system with four existing systems. The
parameters used for comparison are: input type, type of encryption, encryption algorithm used
other than DNA encoding, and the steganography method.
Table.4.1: Comparison of various DNA encryption methods
CHAPTER 5
CONCLUSION AND FUTURE WORK

Cryptography and steganography together provide a way to transfer data more
confidentially. DNA-based encryption and steganography is one of the recent
techniques introduced into the cryptographic field. Here a new hybrid method is presented that
combines cryptography and steganography. It achieves multilayer
security through DNA-based AES encryption followed by DNA steganography. The steganography methods
adopted here do not expand the reference DNA sequence, and the embedded data can be
extracted without the actual DNA reference sequence.
As future work, the proposed method can be modified by introducing other
encryption techniques such as Blowfish, Twofish and Triple DES alongside DNA-based AES
encryption, so that the encryption technique can be randomly selected and applied to the
selected data. In this way the security of the data can be enhanced. Because the result after steganography
is larger than the original data, compression techniques can also be applied together with DNA encryption
and steganography.
REFERENCES

[1] Atanu Majumder, Abhishek Majumdar, Tanusree Podder, Nirmalya Kar and Meenakshi
Sharmas, “Secure Data Communication and Cryptography based on DNA based Message
Encoding”, IEEE, pp. 360-363, 2014.
[2] Mona Sabry, Mohammed Hashem, Taymoor Nazmy and Mohammed Essam Khalifa, “Design
of DNA-based Advanced Encryption Standard (AES)”, IEEE, pp. 390-397, 2015.
[3] Ghada Hamed, Mohammed Marey, Safaa Amin El-Sayed and Mohammed Fahmy Tolba,
“Hybrid Technique for Steganography-based on DNA with N-Bits Binary Coding Rule”, pp.
95-102, 2015.
[4] Siddaramappa V. and Ramesh K. B., “Cryptography and Bioinformatics techniques for Secure
Information transmission over Insecure Channels”, IEEE, pp. 137-139, 2015.
[7] Sonal Namdev and Vimal Gupta, “A DNA and Amino-Acids Based Implementation Of Four-Square
Cipher”, Journal of Engineering Research and Applications, ISSN: 2248-9622, vol. 6, Issue No. 1,
(Part-2), pp. 90-96, January 2016.
