
According to the birthday paradox, we need approximately $O(|T|^{1/2})$ samples from the tag space to find a collision for a hash function $h:K\times M \to T$. But how many samples are needed to find a three-way collision, i.e. $h(a) = h(b) = h(c)$ for three messages $a,b,c \in M$ hashed with the same key $k\in K$?

I thought that it would be $O(|T|^{1/2})$ to find $h(a) = h(b)$ and another $O(|T|^{1/2})$ to find a third, but when thinking about it, that feels wrong. How can I calculate this?

Cryptographeur
hsalin
    For narrow pipe (or even local wide-pipe) constructions you'll get a four-way hash collision for merely the cost of two normal collisions. – CodesInChaos Oct 04 '13 at 08:53

1 Answer


The hand-waving argument goes thus: when you accumulate $n$ hash outputs, you are actually producing about $n^3/6$ triplets, each of which has probability $t^{-2}$ of being a three-way collision (where $t = |T|$ is the size of the output space). So you should expect the first three-way collision to appear when $n^3/6 = t^2$, i.e. $n = (6t^2)^{1/3} \approx 1.82\,t^{2/3}$. For a perfect hash function with 128-bit output, this means that you would need about $2^{86}$ hash function invocations.
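As a quick check of the arithmetic, here is a small Python sketch (the helper name is made up for illustration) that solves $n^3/6 = t^2$ for a $b$-bit output, $t = 2^b$, working in the $\log_2$ domain so that large output sizes do not overflow a float:

```python
import math

def log2_three_way_samples(output_bits: int) -> float:
    # Heuristic from the answer: n^3 / 6 = t^2 with t = 2**output_bits.
    # Taking log2 of both sides: log2(n) = (log2(6) + 2 * output_bits) / 3
    return (math.log2(6) + 2 * output_bits) / 3

print(f"{log2_three_way_samples(128):.2f}")  # prints 86.19
```

For comparison, the ordinary birthday bound for the same 128-bit output is around $2^{64}$, so the third colliding value costs far more than a second round of $2^{64}$ work.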

Now that's just an approximation which does not give the exact result because the triplets are not exactly independent of each other; but it yields the proper order of magnitude.
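The order of magnitude can be checked empirically on a toy output space. The sketch below is an assumption-laden model, not a real hash: it treats an ideal hash as uniform draws from `range(t)` using Python's `random` module, with a 16-bit output so that it runs quickly, and compares the average number of draws until the first three-way collision with $(6t^2)^{1/3}$:

```python
import random

def draws_until_three_way(t: int, rng: random.Random) -> int:
    """Model an ideal hash as uniform draws from range(t); return how many
    draws it takes until some output value has been seen three times."""
    counts = {}
    n = 0
    while True:
        n += 1
        v = rng.randrange(t)
        counts[v] = counts.get(v, 0) + 1
        if counts[v] == 3:
            return n

def mean_draws(t: int, trials: int, seed: int = 1) -> float:
    # Average over several independent experiments (fixed seed for repeatability).
    rng = random.Random(seed)
    return sum(draws_until_three_way(t, rng) for _ in range(trials)) / trials

t = 2**16  # toy 16-bit output space (assumption, so the run is fast)
heuristic = (6 * t * t) ** (1 / 3)
print(f"observed mean: {mean_draws(t, 100):.0f}, heuristic (6 t^2)^(1/3): {heuristic:.0f}")
```

The observed mean and the heuristic should agree to within a small constant factor, which is all the argument claims.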

More importantly, this assumes a perfect hash function. For a concrete hash function, even a "secure" one, you could get multicollisions much faster. As shown by Joux in 2004, for an "iterated hash function" (e.g. MD5 or SHA-256) with an internal state of $s$ bits, you only need to produce $k$ "simple" collisions (each of cost about $2^{s/2}$) to obtain a $2^k$-way multicollision, for a total cost of about $k \cdot 2^{s/2}$. When $s$ is equal to the output size (that's called a "narrow-pipe design", as @CodesInChaos says), this is much lower than the cost above: even if MD5 were not broken, its 128-bit state would still allow a four-way collision with cost $2 \cdot 2^{64} = 2^{65}$, an eight-way collision with cost $3 \cdot 2^{64} \approx 2^{65.6}$, a 16-way collision with cost $4 \cdot 2^{64} = 2^{66}$, and so on. The usual security property of "resistance to collisions up to $2^{s/2}$" is not incompatible with "beyond $2^{s/2}$ there may be an orgy of easy multicollisions".
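To make Joux's idea concrete, here is a toy sketch in Python. Everything in it is a hypothetical stand-in chosen for speed: the compression function is SHA-256 of `state || block` truncated to a 16-bit chaining value, blocks are 8-byte counters, and there is no length padding. At each stage a birthday search finds two distinct blocks that map the current state to the same next state; since every choice of one block per stage then reaches the same final state, $k$ collisions yield $2^k$ colliding messages:

```python
import hashlib
import itertools

STATE_BYTES = 2  # toy 16-bit chaining value (assumption, so the search is fast)

def compress(h: bytes, block: bytes) -> bytes:
    # Hypothetical toy compression function: truncated SHA-256 of h || block.
    return hashlib.sha256(h + block).digest()[:STATE_BYTES]

def iterated_hash(iv: bytes, blocks) -> bytes:
    # Narrow-pipe iteration: the output is the final chaining value (no padding).
    h = iv
    for b in blocks:
        h = compress(h, b)
    return h

def find_block_collision(h: bytes):
    """Birthday search for two distinct blocks sending h to the same state.
    Returns (block0, block1, next_state, number_of_compress_calls)."""
    seen = {}
    for i in itertools.count():
        block = i.to_bytes(8, "big")
        out = compress(h, block)
        if out in seen:
            return seen[out], block, out, i + 1
        seen[out] = block

def joux_multicollision(iv: bytes, k: int):
    """Chain k single-block collisions into a 2**k-way multicollision."""
    pairs, h, cost = [], iv, 0
    for _ in range(k):
        b0, b1, h, calls = find_block_collision(h)
        pairs.append((b0, b1))
        cost += calls
    # Any choice of one block from each pair leads to the same final state h.
    messages = [list(choice) for choice in itertools.product(*pairs)]
    return messages, h, cost

iv = b"\x00" * STATE_BYTES
msgs, final, cost = joux_multicollision(iv, 4)
print(len(msgs), "colliding messages for", cost, "compression calls")
assert all(iterated_hash(iv, m) == final for m in msgs)
```

Each stage costs on the order of $2^{8} = 2^{s/2}$ calls here, so four stages give a 16-way collision for roughly four times the price of one ordinary collision, matching the $k \cdot 2^{s/2}$ estimate.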

Thomas Pornin
    I apologize for my naivete, but could you please explain why there are $n^3/6$ triplets? – Moshe Jan 28 '14 at 21:26
    Three choices of size $n$ means $n^3$, but order does not matter (the triplets $(a,b,c)$, $(a,c,b)$, $(b,a,c)$, ... are identical), so you have to divide by $3!$, which happens to be equal to $6$. – Thomas Pornin Jan 28 '14 at 22:11
    You get a $2^l$-way collision with cost $l \cdot 2^{64}$. So you get a 16-way collision with cost $2^{66} = 4 \cdot 2^{64}$, not just an eight-way collision. – CodesInChaos Mar 03 '14 at 13:50
    When $n$ is large, $n^3/6$ and $n(n-1)(n-2)/6$ are almost the same thing. When talking about approximations (as is the case here), this kind of shortening is valid. – Thomas Pornin Nov 23 '14 at 16:52
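To see how quickly the exact count of unordered triplets, $\binom{n}{3} = n(n-1)(n-2)/6$, approaches the approximation $n^3/6$, here is a short Python sketch (the helper name is illustrative) that prints their ratio, which equals $(n-1)(n-2)/n^2$:

```python
import math

def triplet_ratio(n: int) -> float:
    """Ratio of the exact triplet count C(n, 3) to the approximation n^3 / 6."""
    return math.comb(n, 3) / (n**3 / 6)

for n in (10, 1_000, 1_000_000):
    print(n, triplet_ratio(n))  # the ratio tends to 1 as n grows
```

Already at $n = 10^6$ the two differ by only a few parts per million, and the $n$ values relevant here are around $2^{86}$.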