At the moment I'm using version 11.2. A few days ago, after an update, I found errors in my programs. It turned out that the new version computes the hash incorrectly. I'm using the SHA256 method. Here is an example of a hash that differs from the result in the previous version.

(* 11.2 *)
Hash["¥"]

Out[..] := 756035385197879955

And the same in the new version:

(* 11.3 *)
Hash["¥"]

Out[..] := 756035385197879955

But with the SHA256 method:

(* 11.2 *)
IntegerString[Hash["¥", "SHA256"], 16]

Out[..] := ac3ed5d81b09324e72933efee6365d9277132a857d324788842a83df908fe6b2

The new version gives a different result:

(* 11.3 *)
IntegerString[Hash["¥", "SHA256"], 16]

Out[..] := 6922e93e3827642ce4b883c756b31abf80036649d3614bf5fcb3adda43b8ea32

This happens for characters whose code is greater than 127. Is there a way to solve this problem? (I tested the result for the new version in the Wolfram Cloud.)

Kirill Belov
  • Have a look here. I hope this helps. – Henrik Schumacher Mar 12 '18 at 08:26
  • Thank you, I ran into this problem! – Kirill Belov Mar 12 '18 at 08:31
  • As far as I understand, to keep behavior consistent across versions you need to use Hash[ToString["¥", CharacterEncoding -> "UTF8"], ___], so patch the old code rather than the new Hash. The current behavior should be more robust, and the older one had a few undocumented issues and inconsistencies anyway. See this question and the discussion after the answer: https://mathematica.stackexchange.com/a/167639/5478 – Kuba Mar 12 '18 at 08:40
  • @Kuba Funnily enough, this will actually fail for some characters due to the vagaries of ToString. But I added an example to the other question that avoids it. – ilian Mar 13 '18 at 03:28

2 Answers

The current SHA256 hash value is

Hash["¥", "SHA256", "HexString"]

(* "ac3ed5d81b09324e72933efee6365d9277132a857d324788842a83df908fe6b2" *)

which matches what you get from e.g. http://new.md5calc.com/hash-calc/sha256/%C2%A5 or from

$ echo -n ¥ | sha256sum
ac3ed5d81b09324e72933efee6365d9277132a857d324788842a83df908fe6b2  -

What is being hashed are the two bytes of the UTF-8 representation of the character ¥. See also this previous answer for more details.
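
As a small sketch of that byte-level claim, the two UTF-8 bytes of ¥ (0xC2 0xA5) can be written out explicitly with octal escapes, so the result does not depend on the terminal's encoding. The function name `utf8_yen_sha256` is made up for illustration, and the snippet assumes `sha256sum` from GNU coreutils is available.

```shell
# Hypothetical helper: hash the UTF-8 encoding of U+00A5 byte by byte.
# Octal 302 = 0xC2 and octal 245 = 0xA5, the two bytes of "¥" in UTF-8.
utf8_yen_sha256() {
  printf '\302\245' | sha256sum | cut -d ' ' -f 1
}

utf8_yen_sha256
# ac3ed5d81b09324e72933efee6365d9277132a857d324788842a83df908fe6b2
```

This is the same digest as in the `echo -n ¥ | sha256sum` check above, which only works as shown when the shell itself is using UTF-8.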

ilian
  • At the moment everything works correctly in the cloud (consistent with 11.2). Your alternative is fully equivalent to mine. – Kirill Belov Mar 13 '18 at 06:25
I think you have your two versions labeled backwards, because I see

In[21]:= IntegerString[Hash["¥", "SHA256"], 16] (* 11.3 *)
Out[21]= "ac3ed5d81b09324e72933efee6365d9277132a857d324788842a83df908fe6b2"

In[1]:= IntegerString[Hash["¥", "SHA256"], 16] (* 11.2 *)
Out[1]= "6922e93e3827642ce4b883c756b31abf80036649d3614bf5fcb3adda43b8ea32"

At any rate, there was a bug in earlier versions where code points greater than 255 were simply truncated to their lowest 8 bits. In 11.3, strings are hashed correctly according to their UTF-8 bytes. You can use Developer`LegacyHash if you want bug-for-bug compatibility.
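
To illustrate what such truncation implies, here is a rough shell sketch that runs outside Mathematica. It assumes the legacy behavior amounts to hashing only the low 8 bits of each code point, which is a reading of the description above and not a verified reimplementation of the old Hash. The helper name `low_byte_sha256` is hypothetical, and `sha256sum` from GNU coreutils is assumed.

```shell
# Hypothetical helper: keep only the low 8 bits of a code point and hash that one byte.
# Assumption: this mimics the truncation described above; it is not the actual legacy code.
low_byte_sha256() {
  low=$(( $1 & 0xFF ))                        # e.g. 0x165 -> 0x65
  printf "\\$(printf '%03o' "$low")" | sha256sum | cut -d ' ' -f 1
}

# U+0165 truncates to 0x65, which is the ASCII letter "e",
# so under this assumption it would hash exactly like "e".
low_byte_sha256 0x165
printf 'e' | sha256sum | cut -d ' ' -f 1

# U+00A5 (¥) is below 256, so truncation leaves the single byte 0xA5.
low_byte_sha256 0xA5
```

The first two commands print the same digest, which shows why truncation is a bug: distinct strings can collide. The last command can be compared with the 6922e9… value reported for 11.2.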

J. M.'s missing motivation
Itai Seggev