
When calling the Stack Exchange API v2.2 with URLRead (new in Mathematica 11), we get a gzip-encoded body in one of several forms, "Body", "BodyByteArray" or "BodyBytes", depending on the elements requested.

How can we deal with each of these formats in order to decode the body in Mathematica?

I am particularly interested in operating on ByteArray.

I know I could get the content using

Import[URLBuild[{"https://api.stackexchange.com", "2.2", 
   "info"}, {"site" -> "mathematica"}], "RawJSON"]

My question is about dealing explicitly with encoded data given as a list of byte values or as a ByteArray, ideally without writing a temporary file and reading it back.

The code I'm using:

reply=URLRead[
 URLBuild[{"https://api.stackexchange.com", "2.2", "info"}, {"site" -> "mathematica"}]
 , {"Headers", "StatusCode", "StatusCodeDescription", "ContentType", 
  "BodyByteArray"}]
<|"Headers" -> {
"cache-control" -> "private"
, "content-type" -> "application/json; charset=utf-8"
, "content-encoding" -> "gzip"
, "access-control-allow-origin" -> "*"
, "access-control-allow-methods" -> "GET, POST"
, "access-control-allow-credentials" -> "false"
, "x-content-type-options" -> "nosniff"
, "date" -> "Thu, 11 May 2017 10:55:51 GMT"
, "content-length" -> "234"
}
, "StatusCode" -> 200
, "StatusCodeDescription" -> "OK"
, "ContentType" -> "application/json; charset=utf-8"
, "BodyBytesArray" -> ByteArray[< 234 >]
|>

Incidentally, and to my surprise, Import[URLRead[url], "RawJSON"] doesn't work.

rhermans

1 Answer


"BodyBytes"

  • a list of bytes from the HTTP response. It does not mean much without encoding information.

"BodyByteArray"

  • as far as I can tell, "BodyBytes" wrapped in ByteArray (1.).

"Body"

  • as far as I can tell, a String: FromCharacterCode[#BodyBytes, encoding], where encoding is read from the charset parameter of the content-type header.

    That is a problem for us. First, it ignores content-encoding. Second, JSON does not require a charset parameter, but without one the body won't be recognized as UTF-8 (even when there is no gzip). I don't know if that is expected; it probably deserves a separate question.
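To illustrate why the encoding matters when turning bytes into a String, here is a minimal offline sketch. The sample word is just an illustration and has nothing to do with the API response:

```mathematica
(* UTF-8 bytes of a short sample string; the last character takes two bytes *)
bytes = ToCharacterCode["caf\[EAcute]", "UTF-8"]
(* {99, 97, 102, 195, 169} *)

(* Without an encoding, every byte is read as its own character *)
FromCharacterCode[bytes]
(* "caf" followed by two garbled characters instead of one accented letter *)

(* With the encoding given explicitly, the original string comes back *)
FromCharacterCode[bytes, "UTF-8"] === "caf\[EAcute]"
(* True *)
```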

So, the safe way (4.) is through bytes, e.g.:

URLRead[
    "https://api.stackexchange.com/2.2/info?site=mathematica"
  , "BodyBytes"
] // FromCharacterCode // ImportString[#, {"gzip", "RawJSON"}] & (*3.*)
<|"items" -> {<|"new_active_users" -> 0, "total_users" -> 31928, 
    "badges_per_minute" -> 0.03, "total_badges" -> 92485, 
    "total_votes" -> 567164, "total_comments" -> 325306, 
    "answers_per_minute" -> 0.02, "questions_per_minute" -> 0.01, 
    "total_answers" -> 63637, "total_accepted" -> 23610, 
    "total_unanswered" -> 4679, "total_questions" -> 43005, 
    "api_revision" -> "2017.5.3.25597"|>}, "has_more" -> False, 
 "quota_max" -> ..., "quota_remaining" -> ...|>

  1. What is the intended purpose of ByteArray

  2. Importing a Base64 encoded string

  3. SO: Content-Encoding vs charset

  4. Who is to blame: parsing UTF8 encoded JSON HTTPResponse fails

Kuba