
Question: Disregarding brute-force, is it any easier to calculate a partial hash collision, in which only a certain number of bits match?

Reasoning: On many websites you find hashes for file downloads. That's nice for integrity checks from the original website, and very nice when downloading from mirrors to verify that the file wasn't changed.

I just put up a new file download to a website and added the SHA256 hash as well. Checking it, I noticed that I didn't really pay attention to the full hash, and that I never do. Instead I usually look at the first few digits and the last few digits, and disregard most of the values in between, thinking if those match, the others probably will as well.

Now I ask myself whether that is a potential "social" attack vector: offer a manipulated download of a file that just matches the partial checksum.

Calculating a full hash collision of SHA256 has not been demonstrated as far as I know. So this boils down to the question of whether, mathematically, it is any easier to calculate a partial hash collision for SHA256, preferably at certain bit locations at the front and back.

Let's consider brute-force still too expensive, since that will of course get easier with fewer and fewer bits having to come out correct.

Jens
  • sure, if you compare just the first letter, there's a 1/16 chance of getting spoofed. power those odds up to the number of chars you manually check. – dandavis Dec 13 '18 at 17:57
  • Neardupe https://security.stackexchange.com/questions/102862/do-you-need-to-check-the-entire-md5-hash-value – dave_thompson_085 Dec 14 '18 at 03:14

3 Answers


To clarify, the attack you are concerned about (someone substituting a different file that hashes to the same known hash) is actually a second preimage attack, not a collision attack. A second preimage attack is significantly harder to achieve than a collision attack.

That being said, in your described scenario of matching only some undefined (and presumably small) number of characters in a hash, a second preimage attack is certainly possible, and even likely if someone puts in enough effort. So, yes, it is (obviously?) easier to match fewer characters of a hash than more.
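To make the scenario from the question concrete, here is a toy Python sketch of a partial second-preimage search where only the first and last `digits` hex characters are compared. The helper names and the trick of appending a padding comment line are illustrative assumptions (they presume the forged file tolerates trailing bytes). It is plain brute force and exploits nothing specific to SHA-256.

```python
import hashlib
import itertools


def sha256_hex(data: bytes) -> str:
    """Lowercase hex SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()


def partial_match(candidate: str, target: str, digits: int) -> bool:
    """True if the first and last `digits` hex characters agree (the lazy check)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    return (candidate[:digits] == target[:digits]
            and candidate[-digits:] == target[-digits:])


def find_partial_second_preimage(original: bytes, forged_base: bytes, digits: int):
    """Brute-force a variant of `forged_base` whose hash passes the lazy check.

    Returns (forged_file, number_of_attempts).
    """
    target = sha256_hex(original)
    for attempt in itertools.count(1):
        # Hypothetical padding: a trailing comment line the forged file ignores.
        candidate = forged_base + b"\n# padding " + str(attempt).encode()
        if partial_match(sha256_hex(candidate), target, digits):
            return candidate, attempt


forged, tries = find_partial_second_preimage(b"print('hello')\n", b"print('evil')\n", 1)
print(tries, sha256_hex(forged))
```

Checking one digit at each end is only 8 bits, so about 256 tries on average; each further pair of digits multiplies the expected work by 256.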

As a side note, if someone were able to somehow access your server and swap out the file with a different one, it seems reasonable that they could also change the stated hash value, which would arguably be a much easier and more effective attack than generating a file that partially matches the stated hash.

TTT
  • Good point about the wording, and yes, brute-forcing fewer characters is easier, and I was more interested in some algorithmic "weakness" enabling deduction of fewer bits. I was also not thinking about spoofing a file on the original server (then of course, just change the stated hash), but rather on a mirror (think all those download sites). – Jens Dec 13 '18 at 18:43
  • I would hope that a site that allows mirrored file downloads would verify the checksum first at upload, but, if not, I'd like to think that someone who bothered to verify the checksum on their own would also have the verification portion automated too (many utilities provide this), or would at least notice the difference. But as a thought experiment, I agree it's certainly possible someone could try it and maybe trick somebody. Of course, many more people don't even bother to verify checksums so I'm not sure it's worth the extra effort. – TTT Dec 13 '18 at 19:20

Yes. This is a known "problem". To be secure, you should check a number of bits appropriate for your security level. In code, you should just check the whole thing. For humans, it might be worth it to shorten the hash to make it less unpleasant to verify (thereby making it more likely that people actually do verify it).
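A minimal sketch of "just check the whole thing" in Python, using only the standard library; `file_sha256` and `verify_download` are hypothetical helper names. It streams the file in chunks so large downloads need not fit in memory, and normalizes case and whitespace in the published value before comparing all 64 hex digits.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks of `chunk_size` bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, published_hex: str) -> bool:
    """Compare the full digest, not just the first and last few characters."""
    expected = published_hex.strip().lower()
    return file_sha256(path) == expected
```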

To protect against currently plausible attacks, you should check at least ~180 bits, because finding a collision in an n-bit hash takes only about 2^(n/2) hash computations (see Wikipedia). For example, with SHA-256, you could choose to check three quarters of it (192 bits): nobody can currently brute force that, and nobody (as far as we know) has an attack on SHA-2 that is good enough to fake 192 bits of it. Or, if you're feeling lucky, you could pick a few bytes at random and hope the attacker did not take psychology classes to predict which ones you will check.
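To illustrate why collision resistance is only about half the digest length (the birthday bound of roughly 2^(n/2) attempts for an n-bit hash), here is a toy birthday search on SHA-256 truncated to `bits` bits. The function names and the `message-<i>` inputs are illustrative. Note that it finds two *arbitrary* messages with the same truncated hash, which is the collision setting; matching a *given* published hash (a second preimage) gets no such square-root shortcut.

```python
import hashlib
import itertools


def truncated_hash(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(data), as an integer."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def birthday_collision(bits: int):
    """Find two distinct messages whose truncated hashes are equal.

    Returns (first_message, second_message, number_of_messages_hashed).
    """
    seen = {}  # truncated hash -> first message that produced it
    for i in itertools.count():
        msg = b"message-" + str(i).encode()
        t = truncated_hash(msg, bits)
        if t in seen:
            return seen[t], msg, i + 1
        seen[t] = msg


m1, m2, count = birthday_collision(32)
print(f"32-bit collision after {count} messages: {m1!r} vs {m2!r}")
```

With `bits=32` this typically needs on the order of 2^16 messages, nowhere near 2^32.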

This can also be used for benign purposes. See Bitcoin and Tor, where people generate vanity Bitcoin addresses and vanity .onion domain names. These are examples of a partial brute force being used to get fun values.

Luc
  • Interesting! Could you elaborate on the method? From what I read about vanity hashes I get the idea that it's brute force, which is understandably possible the less correct bits you need. I was wondering if there is any mathematical "weakness" in SHA256 that makes partial hashes more easily accessible (given that those tools use brute force, probably not, but that's the question :-)). – Jens Dec 13 '18 at 15:33
  • @Jens There is no mathematical weakness that makes it more possible. If there were, then it would lessen the strength of the hashing algorithm and we might as well switch to something more secure, so then it would never be used for vanity addresses because we're busy moving to another algorithm. – Luc Dec 13 '18 at 15:45
  • Well said, makes sense that if there was a better-than-bruteforce method for some bits, that there would be one for all the bits, rendering the algorithm useless. Maybe you could add this comment to the answer to complete it. – Jens Dec 13 '18 at 18:40
  • @Luc This is not entirely correct. Collision resistance is always going to be half of the output size, so a 128-bit digest will have 64-bit collision resistance, and it is certainly possible to break that. – forest Dec 14 '18 at 04:08
  • @forest Wait so an MD5 can never be secure, even if MD5 had no vulnerabilities? Or similarly, if you truncate SHA-256 to 128 bits, it is no longer secure just because one only needs 2**63 attempts (on average) to find a collision? – Luc Dec 14 '18 at 09:13
  • @Luc Against collisions yes, even MD5 were it not vulnerable would not be secure against an adversary who can compute up to 2^64 hash operations (which many can do). Against preimages, 128 bits is fine since preimage resistance is 2^n for an n-bit hash, whereas collision resistance is 2^(n/2) for an n-bit hash. – forest Dec 14 '18 at 09:14
  • (Forest kindly explained it to me in the DMZ chat.) Updated the answer! – Luc Dec 14 '18 at 10:13

Let's consider brute-force still too expensive, ...

The premise of your question is incorrect.

Let's say you are lazy and just verify ten hexadecimal digits. That is 40 bits, so the attacker needs to try on average 2^40 ≈ 10^12 hashes. If she can add random junk at the end of her modified file, it is cheap to calculate new hashes, since she will not have to hash the whole file multiple times. On a GPU you can easily get 10^10 hashes per second (source). So it would take roughly two minutes to generate a match.
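The point about appended junk works because SHA-256 processes its input sequentially (a Merkle-Damgård construction): the internal state after the modified file can be computed once and reused, so each new attempt only has to process the last partial block plus the suffix. Python's `hashlib` exposes this through the hash object's `copy()` method. A sketch, assuming the file format tolerates 8 trailing bytes; `forge_with_suffix` is a hypothetical name, and it matches only a leading prefix for brevity.

```python
import hashlib
import itertools


def forge_with_suffix(modified_file: bytes, target_hex: str, digits: int) -> bytes:
    """Append an 8-byte counter until SHA-256 starts with target_hex[:digits]."""
    prefix_state = hashlib.sha256()
    prefix_state.update(modified_file)  # the expensive part, done only once
    want = target_hex[:digits].lower()
    for counter in itertools.count():
        suffix = counter.to_bytes(8, "big")  # hypothetical junk suffix
        h = prefix_state.copy()  # resume from the cached internal state
        h.update(suffix)
        if h.hexdigest().startswith(want):
            return modified_file + suffix


release_hash = hashlib.sha256(b"the genuine release").hexdigest()
tampered = b"malicious payload " * 100_000
forged_file = forge_with_suffix(tampered, release_hash, 3)
print(hashlib.sha256(forged_file).hexdigest()[:3], release_hash[:3])
```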

Of course, if you check more digits, the time grows exponentially: each extra hex digit multiplies it by 16. But the attacker can compensate (up to a point) by using more or better hardware.
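A rough worked version of that growth, assuming the GPU rate of 10^10 hashes per second quoted above and that the attacker must hit one fixed value, so the expected number of tries is 16^d for d hex digits. The function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_seconds(hex_digits: int, hashes_per_second: float) -> float:
    """Expected time to match `hex_digits` fixed hex characters by brute force."""
    # Each hex digit is 4 bits; a fixed d-digit value takes 16**d tries on average.
    return 16 ** hex_digits / hashes_per_second


for d in (8, 10, 12, 16, 20, 48):
    s = expected_seconds(d, 1e10)
    print(f"{d:2d} digits ({4 * d:3d} bits): {s:.3g} s (~{s / SECONDS_PER_YEAR:.3g} years)")
```

At 10 digits this gives about 110 seconds; at 48 digits (192 bits) the figure is astronomically large, which is why extra hardware only compensates up to a point.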

Anders