Update

The short answer is that when an API result is returned as an HTTPResponse object, the "Body" property holds the already-decoded form of the result, whereas ImportString with "JSON" expects the still-encoded form (a string of raw bytes). Therefore, you should use the "BodyBytes" property with FromCharacterCode.

With res an HTTPResponse object holding the result,

ImportString[FromCharacterCode@res["BodyBytes"], "JSON"]

brings in the result without issue.
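To see why the encoded form matters, here is a minimal sketch using the example string from the question below rather than a real response. It assumes, as the error message suggests, that the JSON importer reads each character code of its input as one UTF-8 byte. Under that reading, 241 (ñ) looks like the lead byte of a multi-byte sequence, and 237 (í) is not in the continuation range 128 to 191, hence the message. The names str and encoded are only for illustration.

```mathematica
(* The minimal JSON string from the question, held as Unicode characters *)
str = "{\"id\":1,\"text\":\"ñía\"}";

(* Character codes as stored: one code per character *)
ToCharacterCode["ñía"]          (* {241, 237, 97} *)

(* The same text as UTF-8 bytes: two bytes each for ñ and í *)
ToCharacterCode["ñía", "UTF8"]  (* {195, 177, 195, 173, 97} *)

(* Re-encode to UTF-8 bytes, pack them into a string, then import *)
encoded = FromCharacterCode[ToCharacterCode[str, "UTF8"]];
ImportString[encoded, "JSON"]
```

FromCharacterCode applied to "BodyBytes" produces the same kind of byte-per-character string as encoded, which is why that route imports cleanly.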

URLExecute can also return the imported JSON directly, but you don't get an HTTPResponse object with which to check the status of the request.

With req an HTTPRequest object,

URLExecute[req, {}, "JSON"]

brings in the result without issue.
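If the status check matters, one option is to keep URLRead and wrap the import in a small helper. The following is a sketch only: importJSONResponse is a hypothetical name, and treating 200 as the only successful status code is an assumption that may not suit every API.

```mathematica
(* Hypothetical helper: import JSON from an HTTPResponse, or return $Failed *)
importJSONResponse[res_HTTPResponse] :=
  If[res["StatusCode"] === 200,
    (* "BodyBytes" holds the raw bytes, which is what the JSON importer expects *)
    ImportString[FromCharacterCode[res["BodyBytes"]], "JSON"],
    $Failed
  ];

(* Anything that is not an HTTPResponse, such as a failed request *)
importJSONResponse[_] := $Failed;

(* Usage, with req an HTTPRequest object as above *)
(* importJSONResponse[URLRead[req]] *)
```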


OP

I have many large JSON results from a web API that contain Unicode characters. ImportString with "JSON" seems unable to import these. As a minimal example, try

ImportString["{\"id\":1,\"text\":\"ñía\"}", "JSON"]

Import::utf8expcontinuation: Input is not a valid UTF8 byte sequence. 237 is expected to be a continuation byte.

Import::jsonhintposition: An error occurred at line 1:19

I have tried adding the option CharacterEncoding -> "UTF8" and get the same result.

Finally, I tried

# -> ImportString["{\"id\":1,\"text\":\"ñía\"}", "JSON", 
    CharacterEncoding -> #] & /@ $CharacterEncodings

Every encoding failed.

How do I import Unicode JSON? I have many such results to import.

I am using Mathematica 11.3 on Windows 10.

Edmund
  • Round tripping to and from ByteArray works on your small example for some encodings: Cases[Quiet[# -> ImportString[ByteArrayToString[StringToByteArray["{\"id\":1,\"text\":\"ñía\"}"], #], "JSON"]] & /@ $CharacterEncodings, _[_, _List]] – Greg Hurst Mar 21 '19 at 23:26
  • @ChipHurst A few of them do work with the round trip. Any ideas why hoops must be jumped through to get what should be a basic import. Bug, perhaps? – Edmund Mar 22 '19 at 00:04
  • It feels like a bug, but I don’t have enough background to know for sure. – Greg Hurst Mar 22 '19 at 00:05
  • Where does this string come from? If it comes from the Body of a URLRead result, then that is not the way to go; you have to take the bytes, apply FromCharacterCode, and then use ImportString. For an explanation, see the first section in https://mathematica.stackexchange.com/q/154245/5478 – Kuba Mar 22 '19 at 00:21
  • @Kuba ImportString[FromCharacterCode@res["BodyBytes"], "JSON"] for res a HTTPResponse object does the trick. Thanks. – Edmund Mar 22 '19 at 13:39
  • @Edmund maybe we should put that line somewhere on top of the linked question as TL;DR for future visitors. It became quite long and may not be clear anymore, otoh everything is relevant – Kuba Mar 22 '19 at 13:42
  • @Kuba URLExecute[req, {}, "JSON"] with req a HTTPRequest object is even better and it comes in already parsed as a list of rules. However, it is not as easy to check for success since you don't get a HTTPResponse object. – Edmund Mar 22 '19 at 13:43
  • @Edmund yes, if URLExecute can be used then go with it. Sometimes one needs to use URLRead when more fine-grained control is needed. – Kuba Mar 22 '19 at 13:43
  • Another option here is to use Developer`ReadRawJSONString since it has no problem with your string. – Jason B. Mar 22 '19 at 15:08
