Hash Functions
Boban Spasic
1 Introduction
One of the fundamentals of modern cryptography is the hash function. The basic idea of a hash function is to have a one-way function that produces a fixed-length output (e.g. 128 or 160 bits) from a variable-length input message. This condensed representation of the message is known as a message digest or fingerprint. A hash function can be written as:

$h = H(M)$

where
- h is the output (the message digest) with fixed length m (e.g. m = 128 bits for MD5)
- H is the hash function
- M is the input message of arbitrary length

Many functions can produce a fixed-length output from a variable-length input, but hash functions have additional characteristics that make them unique:
- given M, it is easy to compute h
- given h, it is hard (computationally infeasible) to compute M
- given M, it is hard to find another message M' such that H(M) = H(M')

Given these characteristics, hash functions are widely used for data integrity checks and for the signature tags used in Message Authentication Codes.
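The fixed-length property can be observed directly. The following is a minimal sketch, assuming Python's standard hashlib module; the helper name digest_hex is introduced here only for illustration.

```python
import hashlib


def digest_hex(message: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of message under the named hashlib algorithm."""
    return hashlib.new(algorithm, message).hexdigest()


if __name__ == "__main__":
    # Inputs of very different lengths give digests of the same length:
    # 32 hex characters (128 bits) for MD5, 40 (160 bits) for SHA-1.
    for msg in (b"", b"abc", b"a" * 10_000):
        print(len(msg), digest_hex(msg, "md5"), digest_hex(msg, "sha1"))
```

Computing h from M is a single call, while nothing in the interface offers a way back from h to M.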
2 Overview of some hash functions
2.1 MD4 and MD5
MD4 and MD5 were developed by Ron Rivest for RSA Data Security. MD4 was introduced in 1990. Because of weaknesses found in MD4, MD5 was introduced just one year later as a strengthened version of MD4.
2.2 Mathematical background
In this section we look at MD4 in some detail. MD5 and SHA-1 are based on MD4, so for them we discuss only the modifications in comparison to MD4.
2.2.1 Splitting the message into 512-bit blocks

Given a bitstring x, we first produce an array:

$M = M[0]\,M[1] \ldots M[N-1]$

where each M[i] is a bitstring of length 32 and $N \equiv 0 \pmod{16}$. We call each M[i] a word. M is constructed from x using the following algorithm:

$d = (447 - |x|) \bmod 512$
$l = $ the binary representation of $|x| \bmod 2^{64}$, with $|l| = 64$
$M = x \,\|\, 1 \,\|\, 0^{d} \,\|\, l$

Description: In the construction of M, we append a single 1 to x, then we concatenate enough zeros so that the length becomes congruent to 448 modulo 512, and finally we concatenate 64 bits that contain the binary representation of the (original) length of x (reduced modulo $2^{64}$, if necessary). The resulting string M has a length divisible by 512. So when we break M up into 32-bit words, the resulting number of words, denoted by N, is divisible by 16. Now we proceed to construct a 128-bit message digest. The message digest is constructed as the concatenation of the four words A, B, C and D, which we refer to as registers.

2.2.2 Compression function of MD4

The four registers are first initialized. We then process the array M, 16 words at a time. In each iteration of the main loop, we first take the next 16 words of M and store them in an array X. The current values of the four registers are then saved (as AA, BB, CC and DD). Then we perform three rounds of hashing. Each round consists of one operation on each of the 16 words in X. The operations done in the three rounds produce new values in the four registers. Finally, the four registers are updated by adding back the values that were saved at the start of the iteration. This addition is defined to be addition of positive integers, reduced modulo $2^{32}$. Rounds 1, 2, and 3 of MD4 respectively use three functions f, g and h. Each of f, g and h is a bitwise boolean function that takes three words as input and produces a word as output.
They are defined as follows:

$f(X, Y, Z) = (X \wedge Y) \vee (\neg X \wedge Z)$
$g(X, Y, Z) = (X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z)$
$h(X, Y, Z) = X \oplus Y \oplus Z$

2.2.3 Pseudo-code of MD4 hash function:

    # Initialize buffers (registers)
    A = 67452301 (hex)
    B = EFCDAB89 (hex)
    C = 98BADCFE (hex)
    D = 10325476 (hex)

    for i = 0 to N/16 - 1 do
        # make array
        for j = 0 to 15 do
            X[j] = M[16i + j]
        # save register values
        AA = A
        BB = B
        CC = C
        DD = D
        # rounds
        round 1
        round 2
        round 3
        # add saved values back to the registers (modulo 2^32)
        A = A + AA
        B = B + BB
        C = C + CC
        D = D + DD

2.2.4 Round 1 of MD4
Here <<< s denotes a circular left shift by s bit positions.

    A = (A + f(B, C, D) + X[0])  <<< 3
    D = (D + f(A, B, C) + X[1])  <<< 7
    C = (C + f(D, A, B) + X[2])  <<< 11
    B = (B + f(C, D, A) + X[3])  <<< 19
    A = (A + f(B, C, D) + X[4])  <<< 3
    D = (D + f(A, B, C) + X[5])  <<< 7
    C = (C + f(D, A, B) + X[6])  <<< 11
    B = (B + f(C, D, A) + X[7])  <<< 19
    A = (A + f(B, C, D) + X[8])  <<< 3
    D = (D + f(A, B, C) + X[9])  <<< 7
    C = (C + f(D, A, B) + X[10]) <<< 11
    B = (B + f(C, D, A) + X[11]) <<< 19
    A = (A + f(B, C, D) + X[12]) <<< 3
    D = (D + f(A, B, C) + X[13]) <<< 7
    C = (C + f(D, A, B) + X[14]) <<< 11
    B = (B + f(C, D, A) + X[15]) <<< 19
2.2.5 Round 2 of MD4
In round 2 the words of X are taken in the order X[0], X[4], X[8], X[12], X[1], ... (as specified in RFC 1320):

    A = (A + g(B, C, D) + X[0]  + 5A827999) <<< 3
    D = (D + g(A, B, C) + X[4]  + 5A827999) <<< 5
    C = (C + g(D, A, B) + X[8]  + 5A827999) <<< 9
    B = (B + g(C, D, A) + X[12] + 5A827999) <<< 13
    A = (A + g(B, C, D) + X[1]  + 5A827999) <<< 3
    D = (D + g(A, B, C) + X[5]  + 5A827999) <<< 5
    C = (C + g(D, A, B) + X[9]  + 5A827999) <<< 9
    B = (B + g(C, D, A) + X[13] + 5A827999) <<< 13
    A = (A + g(B, C, D) + X[2]  + 5A827999) <<< 3
    D = (D + g(A, B, C) + X[6]  + 5A827999) <<< 5
    C = (C + g(D, A, B) + X[10] + 5A827999) <<< 9
    B = (B + g(C, D, A) + X[14] + 5A827999) <<< 13
    A = (A + g(B, C, D) + X[3]  + 5A827999) <<< 3
    D = (D + g(A, B, C) + X[7]  + 5A827999) <<< 5
    C = (C + g(D, A, B) + X[11] + 5A827999) <<< 9
    B = (B + g(C, D, A) + X[15] + 5A827999) <<< 13
2.2.6 Round 3 of MD4
In round 3 the words of X are taken in the order X[0], X[8], X[4], X[12], X[2], ... (as specified in RFC 1320):

    A = (A + h(B, C, D) + X[0]  + 6ED9EBA1) <<< 3
    D = (D + h(A, B, C) + X[8]  + 6ED9EBA1) <<< 9
    C = (C + h(D, A, B) + X[4]  + 6ED9EBA1) <<< 11
    B = (B + h(C, D, A) + X[12] + 6ED9EBA1) <<< 15
    A = (A + h(B, C, D) + X[2]  + 6ED9EBA1) <<< 3
    D = (D + h(A, B, C) + X[10] + 6ED9EBA1) <<< 9
    C = (C + h(D, A, B) + X[6]  + 6ED9EBA1) <<< 11
    B = (B + h(C, D, A) + X[14] + 6ED9EBA1) <<< 15
    A = (A + h(B, C, D) + X[1]  + 6ED9EBA1) <<< 3
    D = (D + h(A, B, C) + X[9]  + 6ED9EBA1) <<< 9
    C = (C + h(D, A, B) + X[5]  + 6ED9EBA1) <<< 11
    B = (B + h(C, D, A) + X[13] + 6ED9EBA1) <<< 15
    A = (A + h(B, C, D) + X[3]  + 6ED9EBA1) <<< 3
    D = (D + h(A, B, C) + X[11] + 6ED9EBA1) <<< 9
    C = (C + h(D, A, B) + X[7]  + 6ED9EBA1) <<< 11
    B = (B + h(C, D, A) + X[15] + 6ED9EBA1) <<< 15
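The padding rule, the pseudo-code and the three rounds can be put together in a short program. The following is a sketch for byte-oriented messages, not a reference implementation: the names md4_pad, md4, rotl and ROUNDS are introduced here for illustration, words are read in little-endian order, and the word orders of rounds 2 and 3 are taken from RFC 1320.

```python
import struct

MASK = 0xFFFFFFFF  # all arithmetic is modulo 2^32


def rotl(x, s):
    """Circular left shift of a 32-bit word by s positions."""
    return ((x << s) | (x >> (32 - s))) & MASK


def f(x, y, z): return (x & y) | (~x & z)
def g(x, y, z): return (x & y) | (x & z) | (y & z)
def h(x, y, z): return x ^ y ^ z


# (boolean function, additive constant, word order, shift amounts) per round
ROUNDS = (
    (f, 0x00000000, (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), (3, 7, 11, 19)),
    (g, 0x5A827999, (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (h, 0x6ED9EBA1, (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def md4_pad(message: bytes) -> bytes:
    """Append a 1 bit, zeros up to 448 mod 512 bits, then the 64-bit length."""
    bit_length = (8 * len(message)) & 0xFFFFFFFFFFFFFFFF
    padded = message + b"\x80"                      # the single 1 bit
    padded += b"\x00" * ((56 - len(padded)) % 64)   # zeros up to 56 mod 64 bytes
    return padded + struct.pack("<Q", bit_length)   # original length, 64 bits


def md4(message: bytes) -> str:
    regs = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]  # A, B, C, D
    data = md4_pad(message)
    for offset in range(0, len(data), 64):
        X = struct.unpack("<16I", data[offset:offset + 64])  # next 16 words
        a, b, c, d = regs
        for func, const, order, shifts in ROUNDS:
            for step, k in enumerate(order):
                a = rotl((a + func(b, c, d) + X[k] + const) & MASK, shifts[step % 4])
                # Rotate register roles so the next step updates D, then C, then B.
                a, b, c, d = d, a, b, c
        # Add back the values saved at the start of the block.
        regs = [(r + v) & MASK for r, v in zip(regs, (a, b, c, d))]
    return struct.pack("<4I", *regs).hex()
```

The register rotation after each step replaces the explicit A, D, C, B pattern of the round tables; after every four steps the names line up with the registers again.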
2.2.7 Differences between MD5 and MD4

- MD5 has four rounds and therefore four boolean functions. The fourth boolean function is: $i(X, Y, Z) = Y \oplus (X \vee \neg Z)$
- MD5 uses an additive constant in all rounds.
- In MD5 a different additive constant is used for each step.
- Each step adds in the result of the previous step.
2.3 SHA-1
The Secure Hash Algorithm (SHA) was developed by NIST and is specified in the Secure Hash Standard (SHS, FIPS 180). SHA-1 is a revision of this algorithm and was published in 1994. It is also described in the ANSI X9.30 (part 2) standard. SHA-1 produces a 160-bit (20-byte) message digest. Although SHA-1 is slower than MD5, its larger digest size makes it stronger against brute-force attacks.

Modifications in the SHA-1 algorithm in comparison to MD4:
- SHA-1 is designed to run on a big-endian architecture, rather than a little-endian architecture.
- SHA-1 produces a 5-register (160-bit) message digest. The initial value of the fifth register is E = C3D2E1F0 (hex).
- SHA-1 processes the message 16 words at a time, as does MD4. However, the 16 words are first expanded into 80 words. Then a sequence of 80 operations is performed, one on each word.

The following expansion function is used. Given the 16 words X[0], ..., X[15], we compute 64 more words by the following relation:

$X[j] = X[j-3] \oplus X[j-8] \oplus X[j-14] \oplus X[j-16], \quad 16 \le j \le 79$

For example: $X[16] = X[0] \oplus X[2] \oplus X[8] \oplus X[13]$

With the SHA-1 revision, the expansion function is replaced by:

$X[j] = (X[j-3] \oplus X[j-8] \oplus X[j-14] \oplus X[j-16]) \lll 1, \quad 16 \le j \le 79$

The operator $\lll 1$ denotes a circular left shift by one position.
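The expansion function is easy to state in code. This is a minimal sketch with hypothetical names (expand, rotl); the rotate flag switches between the original relation and the revised SHA-1 relation given above.

```python
MASK = 0xFFFFFFFF


def rotl(x: int, s: int) -> int:
    """Circular left shift of a 32-bit word by s positions."""
    return ((x << s) | (x >> (32 - s))) & MASK


def expand(block_words, rotate=True):
    """Expand 16 words into 80 words.

    rotate=True applies the one-bit circular left shift of the SHA-1
    revision; rotate=False gives the original expansion without it.
    """
    if len(block_words) != 16:
        raise ValueError("expected exactly 16 words")
    X = list(block_words)
    for j in range(16, 80):
        w = X[j - 3] ^ X[j - 8] ^ X[j - 14] ^ X[j - 16]
        X.append(rotl(w, 1) if rotate else w)
    return X
```

For the words 1, 2, ..., 16 the first expanded word is 14 XOR 9 XOR 3 XOR 1 = 5 without the rotation and 10 with it.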
2.4 Differences between SHA-1 and MD4

- A fourth round has been added. The fourth round uses the same boolean function as the second round.
- SHA-1 keeps the MD4 scheme for constants, with the exception that SHA-1 also uses a constant in the first round.
- As in MD5, each step now adds in the result of the previous step, and a constant is added.
2.5 RIPEMD
RIPEMD was developed for the European Community's RIPE project. The algorithm is a variation of MD4, designed to resist known cryptanalytic attacks, and produces a 128-bit hash value. The rotations and the order of the message words are modified. Additionally, two instances of the algorithm, differing only in the constants, run in parallel. After each block, the outputs of both instances are added to the chaining variables. This seems to make the algorithm highly resistant to cryptanalysis.

As a 128-bit hash result (MD4, MD5, RIPEMD) no longer offers sufficient protection, the development of a 160-bit version was necessary. RIPEMD-160 is a strengthened version of RIPEMD with a 160-bit hash result, and is expected to be secure for the next ten years or more. The design philosophy is to build as much as possible on experience gained by evaluating MD4, MD5, and RIPEMD. Like its predecessors, RIPEMD-160 is tuned for 32-bit processors. RIPEMD-160 was designed by Hans Dobbertin, Antoon Bosselaers, and Bart Preneel. There are also RIPEMD-256 and RIPEMD-320 functions, but they only provide a longer hash result, not better security.
2.6 RIPEMD-160
The basic difference between RIPEMD-160 and the MDx hash functions is that RIPEMD-160 has two function lines, a left line and a right line. This is like running two MD5-style instances in parallel on the same message block. In RIPEMD-160 the two lines run independently within a block, and their results are combined with the chaining variables at the end of each block. Swapping register contents between the lines at the end of each round (A and A', B and B', etc.) is the technique used by the extended variants RIPEMD-256 and RIPEMD-320, not by RIPEMD-160 itself. RIPEMD-160 contains five rounds for each line. In order to make the lines more different, the order of the boolean functions and the order of the constants differ between them.
2.7 Parameters of RIPEMD-160
2.7.1 Operations (generalized):

$A = ((A + f(B, C, D) + X + K) \lll s) + E, \quad C = C \lll 10$

where $\lll s$ denotes a cyclic left shift over s positions.
2.7.2 Ordering of message words:

Step 1: Define the permutation $\rho$ by the following table:

    i          0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
    $\rho(i)$  7   4  13   1  10   6  15   3  12   0   9   5   2  14  11   8
Step 2: Define $\pi$ by setting $\pi(i) = (9i + 5) \bmod 16$. The order of the words in the two lines is then given by the following table:

    Line      left       right
    Round 1   id         $\pi$
    Round 2   $\rho$     $\rho\pi$
    Round 3   $\rho^2$   $\rho^2\pi$
    Round 4   $\rho^3$   $\rho^3\pi$
    Round 5   $\rho^4$   $\rho^4\pi$
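The word orders of all ten rounds follow mechanically from the two permutations. The sketch below assumes the composition convention that the right-hand permutation is applied first (so the right line of round 2 maps i to rho(pi(i))); the names RHO, PI, compose and word_orders are introduced here for illustration.

```python
RHO = (7, 4, 13, 1, 10, 6, 15, 3, 12, 0, 9, 5, 2, 14, 11, 8)
PI = tuple((9 * i + 5) % 16 for i in range(16))


def compose(outer, inner):
    """Return the permutation i -> outer[inner[i]]."""
    return tuple(outer[i] for i in inner)


def word_orders():
    """Return (left, right): for each line, five tuples of word indices."""
    left, right = [], []
    current = tuple(range(16))               # identity, used in round 1
    for _ in range(5):
        left.append(current)                 # rho^k
        right.append(compose(current, PI))   # rho^k applied after pi
        current = compose(RHO, current)      # rho^(k+1)
    return left, right


if __name__ == "__main__":
    left, right = word_orders()
    for rnd in range(5):
        print("round", rnd + 1, "left", left[rnd], "right", right[rnd])
```

With this convention the right line starts round 1 with the words 5, 14, 7, 0, ... and round 2 with 6, 11, 3, 7, ....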
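2.7.3 Boolean functions are defined as follows:

$f_1(X, Y, Z) = X \oplus Y \oplus Z$
$f_2(X, Y, Z) = (X \wedge Y) \vee (\neg X \wedge Z)$
$f_3(X, Y, Z) = (X \vee \neg Y) \oplus Z$
$f_4(X, Y, Z) = (X \wedge Z) \vee (Y \wedge \neg Z)$
$f_5(X, Y, Z) = X \oplus (Y \vee \neg Z)$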
2.7.4 Boolean functions are used in a different order for each line:

    Line      left    right
    Round 1   $f_1$   $f_5$
    Round 2   $f_2$   $f_4$
    Round 3   $f_3$   $f_3$
    Round 4   $f_4$   $f_2$
    Round 5   $f_5$   $f_1$
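As an illustration, the five boolean functions and their order in the two lines can be written as follows. This is a sketch on 32-bit words; the tuple names LEFT_FUNCTIONS and RIGHT_FUNCTIONS are hypothetical.

```python
MASK = 0xFFFFFFFF  # keep the bitwise complement within 32 bits


def f1(x, y, z): return x ^ y ^ z
def f2(x, y, z): return (x & y) | (~x & MASK & z)
def f3(x, y, z): return (x | (~y & MASK)) ^ z
def f4(x, y, z): return (x & z) | (y & ~z & MASK)
def f5(x, y, z): return x ^ (y | (~z & MASK))


# Function used in rounds 1..5 of each line: the right line uses the
# same functions in reverse order.
LEFT_FUNCTIONS = (f1, f2, f3, f4, f5)
RIGHT_FUNCTIONS = LEFT_FUNCTIONS[::-1]
```

Note that f2 selects between Y and Z bit by bit under control of X, and f4 does the same under control of Z.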
2.7.5 The following shifts are applied for both lines:

    Round    1    2    3    4    5
    X0      11   12   13   14   15
    X1      14   13   15   11   12
    X2      15   11   14   12   13
    X3      12   15   11   14   13
    X4       5    6    7    8    9
    X5       8    9    7    6    5
    X6       7    9    6    5    8
    X7       9    7    8    5    6
    X8      11   12   13   15   14
    X9      13   15   14   12   11
    X10     14   11   13   15   12
    X11     15   13   12   14   11
    X12      6    7    5    9    8
    X13      7    8    5    9    6
    X14      9    7    6    8    5
    X15      8    7    9    6    5
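2.7.6 Constants are also different for each line (only the integer parts are used):

    Line      left                      right
    Round 1   0                         $2^{30} \cdot \sqrt[3]{2}$
    Round 2   $2^{30} \cdot \sqrt{2}$   $2^{30} \cdot \sqrt[3]{3}$
    Round 3   $2^{30} \cdot \sqrt{3}$   $2^{30} \cdot \sqrt[3]{5}$
    Round 4   $2^{30} \cdot \sqrt{5}$   $2^{30} \cdot \sqrt[3]{7}$
    Round 5   $2^{30} \cdot \sqrt{7}$   0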
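The integer parts can be computed exactly with integer arithmetic, which avoids floating-point rounding. In this sketch the helper names icbrt, sqrt_constant and cbrt_constant are introduced for illustration, and math.isqrt assumes Python 3.8 or later.

```python
import math


def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n (binary search on big integers)."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while lo < hi - 1:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


def sqrt_constant(p: int) -> int:
    """Integer part of 2^30 * sqrt(p), computed as isqrt(p * 2^60)."""
    return math.isqrt(p << 60)


def cbrt_constant(p: int) -> int:
    """Integer part of 2^30 * cbrt(p), computed as icbrt(p * 2^90)."""
    return icbrt(p << 90)


LEFT_CONSTANTS = (0,) + tuple(sqrt_constant(p) for p in (2, 3, 5, 7))
RIGHT_CONSTANTS = tuple(cbrt_constant(p) for p in (2, 3, 5, 7)) + (0,)

if __name__ == "__main__":
    print([f"{k:08X}" for k in LEFT_CONSTANTS])
    print([f"{k:08X}" for k in RIGHT_CONSTANTS])
```

The first two non-zero constants of the left line, 5A827999 and 6ED9EBA1, are the same constants that MD4 uses in rounds 2 and 3.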
3 HMAC - MACs based on hash functions
The most common mechanism for verifying the origin and integrity of a message is based on a key shared between the parties, and is usually called a MAC, or Message Authentication Code. The sender appends to the message an authentication tag computed as a function of the message and the shared key. HMAC uses hash functions for calculating the authentication tag.
3.1 Construction of HMAC
$\mathrm{HMAC}(Text) = H(K \oplus opad,\; H(K \oplus ipad,\; Text))$

where the comma denotes concatenation and:
- K = shared key (padded to 64 bytes)
- ipad = the byte 36 (hex) repeated 64 times
- opad = the byte 5C (hex) repeated 64 times
- H = hash function (can be any hash function, e.g. MD5, SHA-1, RIPEMD-160, etc.)
3.2 Description of process
1. Append zeros to the end of K to make a 64-byte string.
2. XOR the string computed in step 1 with ipad.
3. Append the data stream Text to the result of step 2.
4. Apply the hash function to the result of step 3.
5. XOR the string computed in step 1 with opad.
6. Append the result of step 4 to the result of step 5.
7. Apply the hash function to the result of step 6.
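The seven steps translate almost line by line into code. The following sketch uses Python's hashlib for H and assumes a key of at most 64 bytes (the handling of longer keys is not covered by the steps above); the name hmac_sketch is introduced here for illustration.

```python
import hashlib
import hmac  # standard-library HMAC, used only for comparison

BLOCK_SIZE = 64  # bytes; block size of MD5, SHA-1 and RIPEMD-160


def hmac_sketch(key: bytes, text: bytes, hash_name: str = "md5") -> bytes:
    """Compute HMAC following steps 1-7, for keys of at most 64 bytes."""
    if len(key) > BLOCK_SIZE:
        raise ValueError("this sketch assumes a key of at most 64 bytes")
    padded_key = key.ljust(BLOCK_SIZE, b"\x00")                 # step 1
    inner_key = bytes(b ^ 0x36 for b in padded_key)             # step 2
    inner = hashlib.new(hash_name, inner_key + text).digest()   # steps 3-4
    outer_key = bytes(b ^ 0x5C for b in padded_key)             # step 5
    return hashlib.new(hash_name, outer_key + inner).digest()   # steps 6-7


if __name__ == "__main__":
    key, text = b"shared key", b"message to authenticate"
    tag = hmac_sketch(key, text, "sha1")
    # The result should agree with the standard-library implementation.
    print(tag.hex(), tag == hmac.new(key, text, hashlib.sha1).digest())
```

The receiver recomputes the tag with the same key and compares it with the received one; hmac.compare_digest is the usual way to do that comparison in constant time.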
4 Security of hash functions
Because of the one-way construction of hash functions, reversing a message digest is very hard or practically impossible. The most successful kind of threat, though not the most efficient one, remains the brute-force attack.
4.1 Brute-force attack
A brute-force attack is a kind of attack in which the attacker tries all possible inputs to the function used for securing the data, and compares each result with the desired result. Given the number of possible results of a hash function, a brute-force attack is a very hard and very time-consuming job. For example: on an Athlon XP 1800+, probing all 8-character combinations of upper-case letters, lower-case letters and digits takes about 6 days.
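The size of the search space in this example is $62^8 \approx 2.18 \cdot 10^{14}$ combinations, so the quoted 6 days correspond to roughly $4.2 \cdot 10^{8}$ hash computations per second. The sketch below shows the principle on a much smaller space; it assumes MD5 via hashlib, and the names keyspace and brute_force are hypothetical.

```python
import hashlib
import itertools
import string

# Upper-case letters, lower-case letters and digits: 62 symbols.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def keyspace(length: int, alphabet: str = ALPHABET) -> int:
    """Number of strings of exactly the given length over the alphabet."""
    return len(alphabet) ** length


def brute_force(target_hex: str, max_length: int, alphabet: str = ALPHABET):
    """Try every string up to max_length; return the preimage or None."""
    for length in range(1, max_length + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            word = "".join(candidate)
            if hashlib.md5(word.encode()).hexdigest() == target_hex:
                return word
    return None


if __name__ == "__main__":
    print(keyspace(8))  # size of the 8-character search space
    target = hashlib.md5(b"aZ").hexdigest()
    print(brute_force(target, 2))
```

Each additional character multiplies the work by 62, which is why the attack is only practical for short inputs.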
4.2 Dictionary-based attack
A somewhat more sophisticated kind of attack is the dictionary-based attack. A dictionary-based attack is also a brute-force attack, but it uses a list of selected words (character combinations) as input.
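A dictionary-based attack differs from the exhaustive search only in where the candidates come from. The following sketch assumes a small in-memory word list and a few simple case variations; the function name dictionary_attack and the sample words are made up for illustration.

```python
import hashlib


def dictionary_attack(target_hex: str, words, hash_name: str = "md5"):
    """Hash each word (and simple case variants) and compare with the target."""
    for word in words:
        for variant in (word, word.lower(), word.upper(), word.capitalize()):
            digest = hashlib.new(hash_name, variant.encode()).hexdigest()
            if digest == target_hex:
                return variant
    return None


if __name__ == "__main__":
    word_list = ["apple", "secret", "dragon"]  # hypothetical dictionary
    target = hashlib.md5(b"Secret").hexdigest()
    print(dictionary_attack(target, word_list))
```

The attack succeeds only if the original input, or one of the generated variants, is in the list.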
4.3 Collision
Another property of hash functions is the existence of collisions. A collision means that two different messages give the same result. A simple example of a collision: let x be a message (1 bit long) and $F(x) = x \vee 1$. The function F gives the same result for x = 0 and for x = 1. That is called a collision.
4.4 Birthday attack
The birthday attack, based on the birthday paradox, is of interest when talking about collisions. The paradox says: in a group of 23 random people, at least two will share a birthday with a probability of at least 1/2. The birthday attack imposes a lower bound on the size of message digests. A 40-bit message digest would be very insecure, since a collision could be found with probability 1/2 with just over $2^{20}$ (about a million) random hashes. It is usually suggested that the minimum acceptable size of a message digest is 128 bits (the birthday attack requires over $2^{64}$ hashes in this case). The choice of a 160-bit message digest is recommended.
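Both the birthday probability and the effect on short digests can be checked numerically. In this sketch the digest is SHA-1 truncated to a small number of bits, which is an assumption made only to keep the search short; the names birthday_probability and find_collision are hypothetical.

```python
import hashlib


def birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_distinct = 1.0
    for k in range(people):
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct


def find_collision(bits: int = 24):
    """Find two distinct inputs whose SHA-1 digests agree on the first `bits` bits.

    Returns (first_message, second_message, number_of_hashes_computed).
    """
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        full = int.from_bytes(hashlib.sha1(message).digest(), "big")
        truncated = full >> (160 - bits)
        if truncated in seen:
            return seen[truncated], message, counter + 1
        seen[truncated] = message
        counter += 1


if __name__ == "__main__":
    print(birthday_probability(23))
    first, second, tries = find_collision(24)
    # The number of hashes needed is typically on the order of 2^12.
    print(first, second, tries)
```

For an n-bit digest the expected effort grows like $2^{n/2}$, which is the reason for the 128-bit and 160-bit recommendations above.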
4.5 Rainbow attack
The state of the art in brute-force attacks is the Rainbow method. Today the Rainbow method is used just for cracking LM passwords, but it can easily be modified for any known one-way function. The basic idea of the Rainbow method is to have a database of pre-calculated pairs of input and output values. If the same output value occurs more than once (a collision), just one pair is stored. On a modern PC, such a database can be calculated in about 50 hours. The size of the database for an English dictionary with all variations (upper-case and lower-case letters, permutations inside the word, etc.) is smaller than 500 MB. With a 1 GB database the success rate is 99.99 percent. An attack with the Rainbow method is very fast, because the attack is just a comparison of the message digest with values from the database.
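The pre-calculation idea described here can be sketched as a plain lookup table from digest to input. Note that this is only the simplified form described in the text; actual rainbow tables additionally compress the table using chains of hash and reduction steps, which this sketch does not attempt. The names build_table and lookup are hypothetical.

```python
import hashlib


def build_table(candidates, hash_name: str = "md5") -> dict:
    """Pre-calculate digest -> input pairs for all candidate inputs."""
    table = {}
    for candidate in candidates:
        digest = hashlib.new(hash_name, candidate.encode()).hexdigest()
        # If a digest occurs more than once, only the first pair is stored.
        table.setdefault(digest, candidate)
    return table


def lookup(table: dict, target_hex: str):
    """The attack itself is a single comparison against the stored digests."""
    return table.get(target_hex)


if __name__ == "__main__":
    words = ["apple", "Apple", "APPLE", "secret", "Secret"]  # made-up list
    table = build_table(words)
    print(lookup(table, hashlib.md5(b"Secret").hexdigest()))
```

The cost of hashing is paid once when the table is built; every later attack against the same hash function reuses it.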
5 Common uses of hash functions
Hash functions are commonly used for data integrity checks (e.g. MD5SUM). By calculating the hash sum of a file and comparing it with the hash sum given by the author of the file, you can see whether the transfer was successful. Providing the hash sum of a file also allows some primitive virus detection, since any modification of the file changes its hash sum. The second use of hash functions is in digital signatures and message authentication, as described in the HMAC section above.
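An integrity check of this kind can be sketched in a few lines. The example assumes MD5 via hashlib and reads the file in chunks so that large files do not have to fit in memory; the names file_digest and verify_file are introduced here for illustration.

```python
import hashlib


def file_digest(path: str, hash_name: str = "md5", chunk_size: int = 65536) -> str:
    """Return the hex digest of the file at path, read chunk by chunk."""
    h = hashlib.new(hash_name)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str, hash_name: str = "md5") -> bool:
    """Compare the computed digest with the one published by the author."""
    return file_digest(path, hash_name) == expected_hex.strip().lower()
```

A call such as verify_file("download.iso", published_sum), where both arguments are placeholders, returns False if even a single bit of the file differs from the original.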
6 Conclusion
Because of the growing performance of home computers, the security of the hash functions presented in this document is no longer so reliable. In October 2003 ClearSpeed Technologies presented the CS301 processor, which offers the power of a supercomputer integrated on a PCI card for less than 30000 dollars. Considering the existence of such processors, specialized kernels (like the kernel and OS from the LC4 group) and methods like Rainbow, the predicted lifetime of hash functions is shrinking very quickly.