Suppose Bob managed to obtain 220 different digests that were generated by a hash function employed by a target system. The hash function outputs an 8-byte digest of a message. Bob now wants to find a message that hashes to one of the obtained digests. Approximately how many different messages should Bob hash until there is a good probability that a generated digest matches one of the obtained digests?

My answer is $\sqrt{2^{64}}$ ($= 2^{32}$) messages for a probability of 0.5. Is this correct?

AleksanderCH


Let's assume that there are no prior collisions (two different messages that generate an identical hash) among the first $2^{64}$ messages, so that these messages map one-to-one onto the $2^{64}$ possible 8-byte (64-bit) digests.

If you want to find a message for one particular hash, you would have to try all $2^{64}$ of these messages to reach a probability of 1 (100%). Since you asked for a good probability, trying half of them gives a probability of 0.5 (50%). That means trying $2^{64} / 2 = 2^{63}$ messages.
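Whether halving the search gives exactly 0.5 depends on the model. A minimal Python sketch, assuming either (1) the one-to-one mapping described above, or (2) a hash that behaves like a random function on fresh messages (this second model is an assumption added here for comparison, not part of the answer):

```python
import math

N = 2 ** 64  # number of possible 8-byte digests
k = 2 ** 63  # messages tried (half of N)

# Model 1 (the answer's assumption): exactly N messages that map one-to-one
# onto the N digests, tried without repetition. Each new message rules out
# one digest, so the success probability grows linearly.
p_exhaustive = k / N

# Model 2 (assumption): the hash behaves like a random function and every
# message is new, so each attempt hits the target with probability 1/N.
# 1 - (1 - 1/N)^k, computed with log1p/expm1 because 1 - 1/N rounds to 1.0.
p_random = -math.expm1(k * math.log1p(-1 / N))

print(f"one-to-one, k = N/2: {p_exhaustive}")
print(f"random fn,  k = N/2: {p_random:.4f}")
```

Under the one-to-one assumption, $2^{63}$ tries give exactly 0.5; under the random-function model the same number gives about 0.39, and reaching 0.5 takes about $\ln 2 \cdot 2^{64} \approx 1.28 \times 10^{19}$ tries.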

You also already have a list of 220 hashes, whereas the calculation above was for a single hash. Each message you try now has 220 digests it could match instead of one, so the required number of messages shrinks by a factor of 220:

$$\frac{2^{63}}{220} \approx 4.2 \times 10^{16}$$ for a probability of $0.5$ to find a message that hashes to one hash in your list.
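The division by the number of targets can be checked on a scaled-down version of the problem. The sketch below is a toy experiment, not the target system: it assumes a hypothetical 16-bit hash (`toy_hash`, SHA-256 truncated to 2 bytes) so that $N = 2^{16}$, keeps $t = 220$ targets, and counts how many fresh messages each search needs; the seed, run count, and message format are arbitrary choices. If the hash behaves like a random function, the median should land near $\ln 2 \cdot N/t$, somewhat above the $N/(2t)$ form of the estimate above, which covers roughly 39% of searches.

```python
import hashlib
import math
import random
import statistics

# Toy parameters (assumptions): a 16-bit digest instead of 64 bits,
# so every search finishes in a fraction of a second.
DIGEST_BITS = 16
N = 2 ** DIGEST_BITS  # number of possible digests
T = 220               # number of target digests Bob holds
RUNS = 1000           # independent searches to aggregate


def toy_hash(message: bytes) -> int:
    """Hypothetical 16-bit hash: SHA-256 truncated to its first 2 bytes."""
    return int.from_bytes(hashlib.sha256(message).digest()[:2], "big")


def trials_until_hit(run: int, targets: set) -> int:
    """Hash fresh messages b'run:i' until one digest is in `targets`."""
    i = 0
    while True:
        i += 1
        if toy_hash(f"{run}:{i}".encode()) in targets:
            return i


rng = random.Random(2019)
targets = set(rng.sample(range(N), T))  # 220 distinct target digests
counts = [trials_until_hit(run, targets) for run in range(RUNS)]

linear_estimate = N / (2 * T)       # the answer's 2^63/220 formula, scaled to N
median_model = math.log(2) * N / T  # 50% point for a random function

print(f"median trials observed: {statistics.median(counts)}")
print(f"N/(2t)    = {linear_estimate:.1f}")
print(f"ln2 * N/t = {median_model:.1f}")
share = sum(c <= linear_estimate for c in counts) / RUNS
print(f"share of runs done within N/(2t) trials: {share:.2f}")
```

Scaling back up to $N = 2^{64}$ keeps the ratio between the two figures the same, so both stay on the order of $10^{16}$ messages.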

AleksanderCH
  • @kelalaka, would appreciate advice on how to approach this problem. Thank you. – wongsimon Sep 24 '19 at 09:42
  • @alexkanderRas, so it is like solving for expected value? – wongsimon Sep 24 '19 at 11:14
  • @wongsimon What do you mean? – AleksanderCH Sep 24 '19 at 11:34
  • @aleskansdraRas, sorry my mistake. I thought it was similar to finding expected value as in probability. I am wrong, but now I understand your solution. Thank you for your assistance. – wongsimon Sep 24 '19 at 11:41
  • You seem to assume that there are exactly $2^{64}$ possible messages; why? – Squeamish Ossifrage Sep 24 '19 at 16:42
  • @SqueamishOssifrage The output of this hash is in this case 8 bytes, which would be 64 bits. – AleksanderCH Sep 24 '19 at 16:51
  • I was talking about inputs, not outputs. – Squeamish Ossifrage Sep 24 '19 at 16:52
  • There are of course more possible inputs, but we care about a collision, so I just took the minimum where a collision would be certain to appear (meaning that one of these $2^{64}$ inputs would have the same hash as one in the given list). – AleksanderCH Sep 24 '19 at 17:00
  • Suppose the messages are unbounded in length. Then the probability of finding at least one preimage after $2^{64}$ messages is not 100%; rather, it is the CDF at $2^{64}$ of the negative binomial distribution for one success with a success probability of $220/2^{64}$. Specifically, it is about $1 - (1 - 220/2^{64})^{2^{64}} \approx 1 - e^{-220}$. Granted, that is very close to $1$; my point is that the reasoning is wrong unless the search is going through a space of exactly $2^{64}$ possible messages. – Squeamish Ossifrage Sep 24 '19 at 17:07
  • @SqueamishOssifrage, I would appreciate any recommendation for an introductory book or website to practice problems of a similar nature. Thank you. – wongsimon Sep 25 '19 at 00:40
  • @wongsimon Maybe try Willy Feller's introduction to probability theory? This is a traditional balls & urns exercise. – Squeamish Ossifrage Sep 25 '19 at 04:13
  • @SqueamishOssifrage, thanks for the recommendation. – wongsimon Sep 25 '19 at 04:59
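The CDF in Squeamish Ossifrage's comment can be evaluated directly. A short Python sketch, assuming (as the comment does) that each fresh message hits one of the 220 targets independently with probability $220/2^{64}$; `miss_probability` is a helper name introduced here for illustration:

```python
import math

N = 2 ** 64  # number of possible 8-byte digests
t = 220      # target digests
p = t / N    # per-message success probability (random-function assumption)


# Geometric distribution = negative binomial with one success.
# CDF at k: P(first hit within k tries) = 1 - (1 - p)^k.
def miss_probability(k: int) -> float:
    """Probability that none of k fresh messages hits a target."""
    # log1p keeps precision: 1 - p by itself rounds to exactly 1.0.
    return math.exp(k * math.log1p(-p))


miss = miss_probability(N)
print(f"P(no hit after 2^64 messages) = {miss:.3e}")
print(f"e^-220                        = {math.exp(-t):.3e}")
print(f"P(at least one hit) as float  = {1 - miss}")
```

`log1p` matters here because $1 - 220/2^{64}$ rounds to exactly 1.0 in double precision, so a naive `(1 - p) ** N` would report a miss probability of 1 instead of roughly $e^{-220}$.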