
Experiment: 4

Title: To Understand and Implement the MD5 Cryptographic Hashing Algorithm

Aim: To understand and implement the Message Digest 5 (MD5) cryptographic hashing
algorithm and to generate a 128-bit message digest for a given input.

Theory:

Cryptographic Hash Function:

A cryptographic hash function is a hash function which takes an input (or 'message') and
returns a fixed-size alphanumeric string. The string is called the 'hash value', 'message digest',
'digital fingerprint', 'digest' or 'checksum'.
The ideal hash function has three main properties:

1. It is extremely easy to calculate a hash for any given data.
2. It is computationally infeasible to find a message that has a given hash.
3. It is extremely unlikely that two slightly different messages will have the same hash.
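
These properties can be observed with Python's standard hashlib module, used here only as a reference implementation and not as the implementation developed in this experiment. The sketch below hashes two arbitrary example messages that differ in one character and counts how many of the 128 digest bits differ. The helper names md5_hex and differing_bits are illustrative.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as 32 hexadecimal characters."""
    return hashlib.md5(data).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

# Two arbitrary example messages that differ in a single character.
d1 = md5_hex(b"transfer 100 to Alice")
d2 = md5_hex(b"transfer 900 to Alice")

print(d1)
print(d2)
print(len(d1) * 4, "bits per digest;", differing_bits(d1, d2), "bits differ")
```

The digest length is the same for any input length, while a one-character change typically alters a large share of the output bits.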

Hash function examples

Uses:

Functions with these properties are used as hash functions for a variety of purposes, not
only in cryptography. Practical applications include message integrity checks, digital
signatures, authentication, and various information security applications.

A hash function takes a string of any length as input and produces a fixed length string which acts
as a kind of "signature" for the data provided. In this way, a person knowing the "hash value" is
unable to know the original message, but only the person who knows the original message can
prove the "hash value" is created from that message.

A cryptographic hash function should behave as much as possible like a random function while
still being deterministic and efficiently computable. A cryptographic hash function is considered
"insecure" from a cryptographic point of view, if either of the following is computationally
feasible:

1. Finding a (previously unseen) message that matches a given hash value.
2. Finding "collisions", in which two different messages have the same hash value.

An attacker who can carry out either of these computations can use it to substitute an
unauthorized message for an authorized one.

Ideally, it should be infeasible to find two different messages whose digests ("hash values") are
identical or even similar. Also, one would not want an attacker to be able to learn anything useful
about a message from its digest. Of course the attacker learns at least one piece of information, the
digest itself, by which the attacker can recognise if the same message occurs again.

In various standards and applications, the two most commonly used hash functions
are MD5 and SHA-1.

MD5:

The MD5 message-digest algorithm is a widely used hash function producing a 128-
bit hash value. Although MD5 was initially designed to be used as a cryptographic hash function,
it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to
verify data integrity, but only against unintentional corruption. It remains suitable for other non-
cryptographic purposes, for example for determining the partition for a particular key in a
partitioned database.
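
As a minimal sketch of the partitioning use mentioned above, the following maps a key to one of a fixed number of partitions by reducing part of its MD5 digest modulo the partition count. The function name partition_for, the example keys, and the choice of eight partitions are assumptions made for illustration; real partitioned databases define their own schemes.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index in range(num_partitions)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 digest bytes as an unsigned integer.
    return int.from_bytes(digest[:8], byteorder="big") % num_partitions

# Hypothetical keys; the same key always lands in the same partition.
for key in ["user:1", "user:2", "order:77"]:
    print(key, "->", partition_for(key, 8))
```

This use relies only on the digest being deterministic and well spread, not on collision resistance.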

One basic requirement of any cryptographic hash function is that it should be computationally
infeasible to find two distinct messages which hash to the same value. MD5 fails this requirement
catastrophically; such collisions can be found in seconds on an ordinary home computer.

The weaknesses of MD5 have been exploited in the field, most infamously by the Flame
malware in 2012. The CMU Software Engineering Institute considers MD5 essentially
"cryptographically broken and unsuitable for further use".[4]

MD5 was designed by Ronald Rivest in 1991 to replace an earlier hash function MD4,
and was specified in 1992 as RFC 1321.

MD5 processes a variable-length message into a fixed-length output of 128 bits. The input
message is broken up into chunks of 512-bit blocks (sixteen 32-bit words); the message
is padded so that its length is divisible by 512. The padding works as follows: first a single bit, 1,
is appended to the end of the message. This is followed by as many zeros as are required to bring
the length of the message up to 64 bits fewer than a multiple of 512. The remaining bits are filled
up with 64 bits representing the length of the original message, modulo 2^64.
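
The padding rule can be sketched for byte-oriented input as follows. The function name md5_pad is illustrative, and the 64-bit length is written low-order byte first, which matches the to_bytes(8, byteorder='little') call in this experiment's code listing.

```python
def md5_pad(message: bytes) -> bytes:
    """Return message with MD5 padding appended; the result is a multiple of 64 bytes."""
    bit_length = (8 * len(message)) % 2**64   # original length in bits, modulo 2^64
    padded = bytearray(message)
    padded.append(0x80)                       # a single 1 bit followed by seven 0 bits
    while len(padded) % 64 != 56:             # 56 bytes = 448 bits = 512 - 64 bits
        padded.append(0x00)
    padded += bit_length.to_bytes(8, byteorder="little")
    return bytes(padded)

for n in (0, 3, 55, 56, 64):
    print(n, "bytes ->", len(md5_pad(b"a" * n)), "bytes after padding")
```

Inputs of 56 bytes or more within a block spill into an extra block, because the 1 bit and the 64-bit length no longer fit in the current one.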

The main MD5 algorithm operates on a 128-bit state, divided into four 32-bit words,
denoted A, B, C, and D. These are initialized to certain fixed constants. The main algorithm then
uses each 512-bit message block in turn to modify the state. The processing of a message block
consists of four similar stages, termed rounds; each round is composed of 16 similar operations
based on a non-linear function F, modular addition, and left rotation.

Working

Preparing the input

The MD5 algorithm first pads the input and divides it into blocks of 512 bits each. Padding is
always applied, even when the input length is already a multiple of 512 bits: a single 1 bit and
then 0 bits are appended until the length is 64 bits short of a multiple of 512, and the final 64 bits
of the last block record the length of the original input.

Next, each block is divided into 16 words of 32 bits each. These are denoted as M0 ... M15.
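
A minimal sketch of this step, assuming the block is available as 64 bytes and that each word is read low-order byte first (as the int.from_bytes(..., byteorder='little') call in this experiment's code does). The name block_to_words and the example block are illustrative.

```python
import struct

def block_to_words(block: bytes) -> list:
    """Split one 64-byte block into sixteen 32-bit words M0 ... M15 (little-endian)."""
    if len(block) != 64:
        raise ValueError("an MD5 block is exactly 64 bytes (512 bits)")
    return list(struct.unpack("<16I", block))

block = bytes(range(64))            # arbitrary example block: bytes 0x00 ... 0x3f
M = block_to_words(block)
print(len(M), hex(M[0]), hex(M[15]))
```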

MD5 helper functions

The buffer

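MD5 uses a buffer that is made up of four words that are each 32 bits long. These words are
called A, B, C and D. They are initialized to the following hexadecimal values, with the bytes of
each word listed low-order byte first (so A is the 32-bit value 0x67452301):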

word A: 01 23 45 67

word B: 89 ab cd ef

word C: fe dc ba 98

word D: 76 54 32 10
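
The byte listings above and the 32-bit constants used in the algorithm describe the same values. The short check below, with the illustrative helper word_from_bytes, reads each listing low-order byte first:

```python
def word_from_bytes(hex_bytes: str) -> int:
    """Interpret space-separated hex bytes, low-order byte first, as one 32-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), byteorder="little")

A = word_from_bytes("01 23 45 67")
B = word_from_bytes("89 ab cd ef")
C = word_from_bytes("fe dc ba 98")
D = word_from_bytes("76 54 32 10")

print([hex(w) for w in (A, B, C, D)])
```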

The table
MD5 further uses a table K/T that has 64 elements. Element number i is indicated as Ki. The
table is computed beforehand to speed up the computations. The elements are computed using
the mathematical sin function:

Ki = floor(abs(sin(i + 1)) * 2^32), for i = 0 ... 63, with i + 1 taken in radians
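
The table can be generated directly from this formula. The sketch below assumes ordinary double-precision floating point, and its first and last entries can be compared against the table published in RFC 1321.

```python
import math

# K[i] = floor(abs(sin(i + 1)) * 2**32), with i + 1 in radians.
K = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

print(len(K), hex(K[0]), hex(K[63]))
```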

Four auxiliary functions

In addition MD5 uses four auxiliary functions that each take as input three 32-bit words and
produce as output one 32-bit word. They apply the logical operators and, or, not and xor to the
input bits.

F(X,Y,Z) = (X and Y) or (not(X) and Z)

G(X,Y,Z) = (X and Z) or (Y and not(Z))

H(X,Y,Z) = X xor Y xor Z

I(X,Y,Z) = Y xor (X or not(Z))
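
The four functions translate directly into bitwise operators. Because Python integers are unbounded, this sketch masks every result to 32 bits; the constant name MASK and the sample operands are illustrative.

```python
MASK = 0xFFFFFFFF  # keep results within 32 bits

def F(x, y, z):
    return ((x & y) | (~x & z)) & MASK

def G(x, y, z):
    return ((x & z) | (y & ~z)) & MASK

def H(x, y, z):
    return (x ^ y ^ z) & MASK

def I(x, y, z):
    return (y ^ (x | ~z)) & MASK

# F acts as a bitwise selector: where x has a 1 bit it takes y, otherwise z.
x, y, z = 0xFFFF0000, 0x12345678, 0x9ABCDEF0
print(hex(F(x, y, z)))
```

With these operands the upper half of the result comes from y and the lower half from z.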

Processing the blocks

The contents of the four buffers (A, B, C and D) are now mixed with the words of the input,
using the four auxiliary functions (F, G, H and I). There are four rounds, each involving 16 basic
operations. One operation is illustrated in the figure below.

Fig: Each iteration in a round

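The figure shows how the auxiliary function F is applied to the four buffers (A, B, C and D),
using message word Mi and constant Ki. The item "<<<s" denotes a left rotation (circular left
shift) by s bits: bits shifted out at the top re-enter at the bottom. All additions are performed
modulo 2^32.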
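
A minimal sketch of one such operation for the first round, using the function F defined earlier. The names left_rotate and round1_step are illustrative, and the final two lines contrast a rotation with a plain shift on an arbitrary sample value.

```python
MASK = 0xFFFFFFFF

def left_rotate(x: int, s: int) -> int:
    """Rotate the 32-bit value x left by s bits (0 < s < 32)."""
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK

def round1_step(a, b, c, d, m, k, s):
    """Apply one round-1 operation and return the new (A, B, C, D)."""
    f = ((b & c) | (~b & d)) & MASK                      # F(B, C, D)
    new_b = (b + left_rotate(a + f + k + m, s)) & MASK   # additions are modulo 2^32
    return d, new_b, b, c                                # the buffers rotate positions

x = 0x80000001
print(hex(left_rotate(x, 1)))   # rotation: the top bit wraps around
print(hex((x << 1) & MASK))     # plain shift: the top bit is lost
```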

Fig: Overall Working

Output:

After all blocks have been processed, the buffers A, B, C and D, written out in that order with
each word low-order byte first, contain the MD5 digest of the original input.
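
The serialization of the buffers into the familiar hexadecimal digest can be sketched as below. The function name state_to_hex is illustrative, and the four sample words are not computed here: they are back-derived from the well-known MD5 digest of the empty message purely to show the byte order.

```python
import struct

def state_to_hex(a0: int, b0: int, c0: int, d0: int) -> str:
    """Serialize the final state, each word low-order byte first, as a hex string."""
    return struct.pack("<4I", a0, b0, c0, d0).hex()

# State words corresponding to the digest of the empty message
# (d41d8cd98f00b204e9800998ecf8427e), used only to illustrate the byte order.
print(state_to_hex(0xd98c1dd4, 0x04b2008f, 0x980980e9, 0x7e42f8ec))
```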

Algorithm:

//Note: all variables are unsigned 32-bit and wrap modulo 2^32 when calculating
var int[64] s, K
var int i

//s specifies the per-round shift amounts
Initialize the S array

//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
    K[i] := floor(2^32 × abs(sin(i + 1)))
end for
//(Or just use the following precomputed table):
Initialize the K/T array

//Initialize variables:
var int a0 := 0x67452301 //A
var int b0 := 0xefcdab89 //B
var int c0 := 0x98badcfe //C
var int d0 := 0x10325476 //D

//Pre-processing: adding a single 1 bit
append "1" bit to message
// Notice: the input bytes are considered as bit strings,
// where the first bit is the most significant bit of the byte.

//Pre-processing: padding with zeros
append "0" bit until message length in bits ≡ 448 (mod 512)
append original length in bits mod 2^64 to message

//Process the message in successive 512-bit chunks:
for each 512-bit chunk of padded message
    break chunk into sixteen 32-bit words M[j], 0 ≤ j ≤ 15

    //Initialize hash value for this chunk:
    var int A := a0
    var int B := b0
    var int C := c0
    var int D := d0

    //Main loop:
    for i from 0 to 63
        var int F, g
        if 0 ≤ i ≤ 15 then
            F := (B and C) or ((not B) and D)
            g := i
        else if 16 ≤ i ≤ 31 then
            F := (D and B) or ((not D) and C)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47 then
            F := B xor C xor D
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63 then
            F := C xor (B or (not D))
            g := (7×i) mod 16
        //Be wary of the below definitions of a,b,c,d
        F := F + A + K[i] + M[g]
        A := D
        D := C
        C := B
        B := B + leftrotate(F, s[i])
    end for

    //Add this chunk's hash to result so far:
    a0 := a0 + A
    b0 := b0 + B
    c0 := c0 + C
    d0 := d0 + D
end for

var char digest[16] := a0 append b0 append c0 append d0 //(Output is in little-endian)

//leftrotate function definition
leftrotate (x, c)
    return (x << c) binary or (x >> (32-c));
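
The index g in the main loop decides which message word each operation consumes. The sketch below reproduces the four index formulas from the pseudocode and prints the resulting order per round; the function name message_index is illustrative.

```python
def message_index(i: int) -> int:
    """Return g, the index of the message word used by operation i (0 <= i <= 63)."""
    if i < 16:
        return i
    if i < 32:
        return (5 * i + 1) % 16
    if i < 48:
        return (3 * i + 5) % 16
    return (7 * i) % 16

for r in range(4):
    print("round", r + 1, [message_index(i) for i in range(16 * r, 16 * r + 16)])
```

Each round visits all sixteen words exactly once, in a different order, because the multipliers 5, 3 and 7 are all coprime to 16.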

Code and Output:


import math

# Per-operation left-rotation amounts, 16 per round.
rotate_amounts = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
                  5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
                  4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
                  6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]

# K[i] = floor(abs(sin(i + 1)) * 2**32), with i + 1 in radians.
constants = [int(abs(math.sin(i+1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

# Initial values of the state words A, B, C, D.
init_values = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476]

# Auxiliary functions F, G, H, I; each is used for 16 consecutive operations.
functions = 16*[lambda b, c, d: (b & c) | (~b & d)] + \
            16*[lambda b, c, d: (d & b) | (~d & c)] + \
            16*[lambda b, c, d: b ^ c ^ d] + \
            16*[lambda b, c, d: c ^ (b | ~d)]

# Index g of the message word used by operation i, per round.
index_functions = 16*[lambda i: i] + \
                  16*[lambda i: (5*i + 1)%16] + \
                  16*[lambda i: (3*i + 5)%16] + \
                  16*[lambda i: (7*i)%16]

def left_rotate(x, amount):
    # Rotate the 32-bit value x left by `amount` bits.
    x &= 0xFFFFFFFF
    return ((x<<amount) | (x>>(32-amount))) & 0xFFFFFFFF

def md5(message):
    message = bytearray(message)  # copy our input into a mutable buffer
    orig_len_in_bits = (8 * len(message)) & 0xffffffffffffffff

    # Padding: a 1 bit, zeros up to 56 bytes mod 64, then the 64-bit length.
    message.append(0x80)
    while len(message)%64 != 56:
        message.append(0)
    message += orig_len_in_bits.to_bytes(8, byteorder='little')

    hash_pieces = init_values[:]

    # Process the padded message in 64-byte (512-bit) chunks.
    for chunk_ofst in range(0, len(message), 64):
        a, b, c, d = hash_pieces
        chunk = message[chunk_ofst:chunk_ofst+64]
        for i in range(64):
            f = functions[i](b, c, d)
            g = index_functions[i](i)
            to_rotate = a + f + constants[i] + \
                        int.from_bytes(chunk[4*g:4*g+4], byteorder='little')
            new_b = (b + left_rotate(to_rotate, rotate_amounts[i])) & 0xFFFFFFFF
            a, b, c, d = d, new_b, b, c
            # Print the buffers at the end of each of the four rounds.
            if i%16 == 15:
                print((i + 1)//16, "A:", a, "B:", b, "C:", c, "D:", d, sep=' ')
        # Add this chunk's result to the running hash, modulo 2**32.
        for i, val in enumerate([a, b, c, d]):
            hash_pieces[i] += val
            hash_pieces[i] &= 0xFFFFFFFF

    return sum(x<<(32*i) for i, x in enumerate(hash_pieces))

def md5_to_hex(digest):
    # The digest bytes are the state words written low-order byte first.
    raw = digest.to_bytes(16, byteorder='little')
    return '{:032x}'.format(int.from_bytes(raw, byteorder='big'))

if __name__=='__main__':
    demo = [
        b"The quick brown fox jumps over the lazy dog.",
    ]
    for message in demo:
        print(md5_to_hex(md5(message)), ' <= "', message.decode('ascii'), '"', sep='')

"""
C:\Users\Asus\Documents\Sem VI\CSS\EXP4>python md5rosetta.py
1 A: 4156205459 B: 1288067225 C: 1515065400 D: 3684338458
2 A: 2960212908 B: 2098643899 C: 4106848806 D: 823743074
3 A: 4014926640 B: 4178434990 C: 833246599 D: 1855571174
4 A: 1522841315 B: 757998855 C: 356813730 D: 3231239785
e4d909c290d0fb1ca068ffaddf22cbd0 <= "The quick brown fox jumps over the lazy dog."
"""
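
The digest printed above can be cross-checked against an independent implementation. This sketch uses Python's standard hashlib module as the reference and compares its result with the value from the program output.

```python
import hashlib

message = b"The quick brown fox jumps over the lazy dog."
expected = "e4d909c290d0fb1ca068ffaddf22cbd0"   # digest printed in the output above

reference = hashlib.md5(message).hexdigest()
print(reference, "matches" if reference == expected else "differs")
```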

Conclusion/Analysis Report: Thus the MD5 hashing algorithm has been implemented and
the 128-bit hash value for the given plaintext has been calculated.
