How many hashes for high probability of finding a collision (specific case)?

Question

Suppose Bob managed to obtain 220 different digests that were generated by a hash function employed by a target system. The hash function outputs an 8-byte digest of a message. Bob now wants to find a message that hashes to one of the obtained digests. Approximately how many different messages should Bob hash until there is a good probability that a generated digest matches one of the obtained digests?

My answer is $\sqrt{2^{64}}$ ($= 2^{32}$) messages for a probability of 0.5. Is this correct?

@kelalaka: your reasoning is right, but your "Lower" (than $2^{32}$) is incorrect. To the OP: your estimate is incorrect. – fgrieu Sep 24 '19 at 09:11
How did you come to your conclusion that it would be $\sqrt{2^{64}}$? – AleksanderCH Sep 24 '19 at 09:12
@AleksanderRas, since the hash function outputs an 8-byte digest of a message. – wongsimon Sep 24 '19 at 09:24
This is not exactly a birthday problem. You do not need an arbitrary collision; you need a hit in your set, so you are looking for pre-images. I read your question incorrectly! – kelalaka Sep 24 '19 at 09:29

score 2 · Answer 1 · answered Sep 24 '19 at 09:30

Let's assume that there are no collisions (two different messages that produce an identical hash) among the first $2^{64}$ messages, so that these messages cover every possible 64-bit digest exactly once.

If you want to find a message for one specific hash, you would have to try all of these $2^{64}$ messages to reach a probability of 1 (100%). Since you asked for a good probability, say 0.5 (50%), we would have to try half of them: $2^{64}/2 = 2^{63}$ messages.

However, you already have a list of 220 hashes, and a match with any one of them counts. The figure above was calculated for a single hash, so the work is reduced by a factor of about 220, and the solution is:

$$\frac{2^{63}}{220} \approx 4.2 \times 10^{16}$$ for a probability of $0.5$ to find a message that hashes to one hash in your list.
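The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch that reproduces the division in the formula above under the no-collision assumption stated earlier; the names `DIGEST_BITS`, `TARGETS` and `answer_estimate` are illustrative and not part of any library.

```python
from fractions import Fraction

DIGEST_BITS = 64   # 8-byte digest
TARGETS = 220      # digests Bob already holds

def answer_estimate(prob, digest_bits=DIGEST_BITS, targets=TARGETS):
    """Reproduce the estimate prob * 2**digest_bits / targets exactly.

    Assumes, as above, that the first 2**digest_bits messages
    produce every digest exactly once.
    """
    return Fraction(prob) * 2**digest_bits / targets

estimate = answer_estimate(Fraction(1, 2))   # equals 2**63 / 220
print(f"{float(estimate):.2e}")
```

Using `Fraction` keeps the division exact until the final conversion to a float for display.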

answered Sep 24 '19 at 09:30

AleksanderCH

@kelalaka, I would appreciate advice on how to approach this problem. Thank you. – wongsimon Sep 24 '19 at 09:42
@AleksanderRas, so it is like solving for an expected value? – wongsimon Sep 24 '19 at 11:14
@wongsimon What do you mean? – AleksanderCH Sep 24 '19 at 11:34
@AleksanderRas, sorry, my mistake. I thought it was similar to finding an expected value in probability. I was wrong, but now I understand your solution. Thank you for your assistance. – wongsimon Sep 24 '19 at 11:41
You seem to assume that there are exactly $2^{64}$ possible messages; why? – Squeamish Ossifrage Sep 24 '19 at 16:42
@SqueamishOssifrage The output of this hash is in this case 8 bytes, which would be 64 bits. – AleksanderCH Sep 24 '19 at 16:51
I was talking about inputs, not outputs. – Squeamish Ossifrage Sep 24 '19 at 16:52
There are of course more possible inputs, but we care about a collision, so I just took the minimum number of inputs for which a collision would be certain to appear (meaning that one of these $2^{64}$ inputs would have the same hash as one in the given list). – AleksanderCH Sep 24 '19 at 17:00
Suppose the messages are unbounded in length. Then the probability of finding at least one preimage after $2^{64}$ messages is not 100%; rather, it is the CDF at $2^{64}$ of the negative binomial distribution for one success with a success probability of $220/2^{64}$. Specifically, it is about $1 - (1 - 220/2^{64})^{2^{64}} \approx 1 - e^{-220}$. Granted, that is very close to $1$; my point is that the reasoning is wrong unless the search is going through a space of exactly $2^{64}$ possible messages. – Squeamish Ossifrage Sep 24 '19 at 17:07
@SqueamishOssifrage, I would appreciate any recommendation for an introductory book or website with practice problems of a similar nature. Thank you. – wongsimon Sep 25 '19 at 00:40
@wongsimon Maybe try Willy Feller's introduction to probability theory? This is a traditional balls & urns exercise. – Squeamish Ossifrage Sep 25 '19 at 04:13
@SqueamishOssifrage, thanks for the recommendation. – wongsimon Sep 25 '19 at 04:59
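
For anyone who wants to put numbers on the comment above about unbounded message lengths, here is a minimal Python sketch. It assumes each trial independently hits one of the 220 digests with probability $220/2^{64}$, which is the model used in that comment; the function names `hit_probability` and `trials_needed` are made up for illustration.

```python
import math

DIGEST_BITS = 64
TARGETS = 220
p = TARGETS / 2**DIGEST_BITS   # per-trial success probability, about 1.19e-17

def hit_probability(n, p=p):
    """P(at least one hit in n independent trials) = 1 - (1 - p)**n.

    log1p/expm1 avoid the rounding loss of computing (1 - p) directly.
    """
    return -math.expm1(n * math.log1p(-p))

def trials_needed(target, p=p):
    """Approximate number of trials at which hit_probability reaches target."""
    return math.ceil(math.log1p(-target) / math.log1p(-p))

n_half = trials_needed(0.5)
print(f"trials for probability 0.5: {n_half:.3e}")
print(f"probability after 2^64 trials: {hit_probability(2**64)}")
```

Under this model, a probability of 0.5 needs roughly $\ln 2 \cdot 2^{64}/220 \approx 5.8 \times 10^{16}$ trials, and the probability after $2^{64}$ trials is $1 - e^{-220}$, which is indistinguishable from 1 in double precision.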
