
Suppose a hash function is based on the Merkle-Damgård construction. Its compression function is given as $H_i=E_{m_i}(H_{i-1})$, where $E_{m_i}()$ denotes encryption with an ideal block cipher with $n$-bit block size. What is the lowest complexity to find a preimage of this hash function?

Answer: By applying a meet-in-the-middle attack, the complexity of finding a preimage is $2^{\frac{n}{2}}$.

The answer above was provided by the lecturer. I don't understand how to apply a meet-in-the-middle attack in this question.

Idonknow
  • One note: while the lecturer is correct as regards a Merkle-Damgård hash function with an invertible compression function, all real MD hash functions (SHA-1, SHA-2) use a noninvertible compression function. What the lecturer was likely trying to point out is why this is the case: why a compression function that is invertible is a bad idea. – poncho Nov 14 '14 at 19:16
  • So you are saying that if an invertible compression function is used in a hash function, then the complexity of finding a preimage of such a hash function can be reduced? – Idonknow Nov 14 '14 at 19:53
  • Yes, that is what I (and the lecturer) are saying. – poncho Nov 14 '14 at 19:58
  • But from the question, how do we know the compression function used is invertible? – Idonknow Nov 15 '14 at 01:39
  • Because they said $E_{m_i}()$ denotes the encryption of an ideal block cipher; block ciphers are, by definition, invertible. What MD hashes use in practice is $H_i = H_{i-1} \oplus E_{m_i}(H_{i-1})$; xoring in $H_{i-1}$ prevents invertibility. – poncho Nov 15 '14 at 04:58

1 Answer


For an $n$-bit hash whose message block size is at least $n/2$ bits (which is very common and includes MD5, SHA-1, and the SHA-2 family, SHA-512 among them), using a compression function as in the question, here is how to find a 2-block message $m_0\|m_1$ hashing to a given $H$ with effort $O(2^{n/2})$.

I denote the Initialization Vector by $H_{-1}$ in order to match the question's recurrence $H_i=E_{m_i}(H_{i-1})$. Hashing $m_0\|m_1$ requires 3 rounds, with the third processing the padding block $m_2$, which is known, and producing the result $H_2=H$.

We compute $H_1=E^{-1}_{m_2}(H)$. This is possible since $E$ is a block cipher and thus allows decryption. Notice that this would not be possible with $H_i=E_{m_i}(H_{i-1})\oplus H_{i-1}$ as in the Davies-Meyer construction (used in SHA-1 and SHA-2 with a small modification).

For $2^{n/2}$ incremental values of $m_1$, we compute $E^{-1}_{m_1}(H_1)$ and store the results in a list. It is possible that we get some collision(s) there (probability about 39% that we get at least one), but unlikely that we get more than a few, thus we likely have just shy of $2^{n/2}$ distinct values in the list.

For $2^{n/2}$ incremental values of $m_0$, we compute $E_{m_0}(H_{-1})$ and search for it in the above list. There is a good probability (>63%) that at least one match is found.
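
As a rough sanity check on the two percentages, here is the usual approximation, assuming that under the ideal cipher model the values obtained under distinct keys behave like independent uniform $n$-bit strings. With $k=2^{n/2}$ values drawn from $N=2^n$ possibilities:

$$\Pr[\text{collision within the list}]\approx 1-e^{-k(k-1)/(2N)}\approx 1-e^{-1/2}\approx 0.39$$

$$\Pr[\text{at least one match}]\approx 1-\left(1-\frac{k}{N}\right)^{k}\approx 1-e^{-k^2/N}=1-e^{-1}\approx 0.63$$

The second line treats the list as holding about $k$ distinct values, which matches the "just shy of $2^{n/2}$" estimate above.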

A corresponding $m_0\|m_1$ (if any) is our message hashing to $H$.
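
The steps above can be sketched on a toy scale. This is only an illustration under assumptions of mine, not something from the question: the "ideal cipher" is replaced by a hypothetical 4-round Feistel network built from SHA-256 with a 16-bit block, and the names `E`, `D`, `md_hash`, `find_preimage`, as well as the `IV` and `PAD` constants, are made up for the example.

```python
import hashlib

N = 16                      # toy block size in bits (assumption, for speed)
HALF = N // 2
MASK = (1 << HALF) - 1

def _f(key, rnd, half):
    # Keyed round function of the toy Feistel network (hypothetical stand-in).
    d = hashlib.sha256(f"{key}:{rnd}:{half}".encode()).digest()
    return int.from_bytes(d[:4], "big") & MASK

def E(key, block):
    # Encrypt one N-bit block; the message block m_i is used as the key.
    l, r = block >> HALF, block & MASK
    for rnd in range(4):
        l, r = r, l ^ _f(key, rnd, r)
    return (l << HALF) | r

def D(key, block):
    # Decrypt: undo the Feistel rounds in reverse order.
    l, r = block >> HALF, block & MASK
    for rnd in reversed(range(4)):
        l, r = r ^ _f(key, rnd, l), l
    return (l << HALF) | r

def md_hash(blocks, iv, pad):
    # H_i = E_{m_i}(H_{i-1}), with the known padding block processed last.
    h = iv
    for m in list(blocks) + [pad]:
        h = E(m, h)
    return h

def find_preimage(target, iv, pad, tries):
    h1 = D(pad, target)                             # step back over padding
    table = {D(m1, h1): m1 for m1 in range(tries)}  # backward list over m_1
    for m0 in range(tries):                         # forward search over m_0
        mid = E(m0, iv)
        if mid in table:
            return m0, table[mid]
    return None                                     # no match this time

IV, PAD = 0x1234, 0xFFFF    # arbitrary toy constants (assumptions)
```

With `tries = 1 << (N // 2)` the search succeeds only about 63% of the time, as estimated above; calling `find_preimage(0xBEEF, IV, PAD, 1 << 12)` makes a miss extremely unlikely at this toy size, and the returned pair can be checked with `md_hash`.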

Note: we can reduce the message size to one block and about $n/2$ bits.


As is, the attack uses $O(2^{n/2})$ memory. However, that can be reduced to a practical amount using Paul C. van Oorschot and Michael J. Wiener's Parallel Collision Search with Cryptanalytic Applications (in Journal of Cryptology, January 1999, Volume 12, Issue 1; a free, slightly earlier version is available from the first author's website).

fgrieu