What's the purpose of the new function BinarySerialize?

Question

Version 11.1 introduced a new function, BinarySerialize, but I don't know what it does better than the traditional methods. Its behavior is very similar to Compress, but I cannot find any advantage to it. It even consumes more space than Compress, for example:

BinarySerialize[Range[100]] // Normal // Length

805

Compress[Range[100]] // ToCharacterCode // Length

290

And ByteCount[BinarySerialize[Range[100]]] is also greater than ByteCount[Compress[Range[100]]]. So what's the purpose of this function? Can anyone provide a good example of its use?

@george2079 So it is for saving space? Why not set PerformanceGoal -> "Size" as the default option? – yode, Apr 05 '17 at 13:28

score 24 · Accepted Answer · edited Apr 13 '17 at 12:56

Disclaimer: This answer is written from a user's point of view. For useful insider information on this topic see this discussion with Mathematica developers on Community Forums.

Introduction

Binary serialization means rewriting an expression as an array of bytes (a list of integers from Range[0, 255]). A binary representation of an expression takes less space than a textual one and can also be exported and imported faster than text.

How do Compress and BinarySerialize functions work?

Compress (with default options) always does three steps:

It performs binary serialization.
It deflates result using zlib.
It transforms deflated result to a Base64-encoded text string.

BinarySerialize performs only binary serialization and sometimes deflates the result using zlib. With default options it decides for itself whether to deflate. With the option PerformanceGoal -> "Speed" it avoids deflation. With the option PerformanceGoal -> "Size" it will likely deflate.

BinarySerialize returns a ByteArray object. A ByteArray is something like a packed array of 8-bit integers. However, the FullForm of a ByteArray is displayed as a Base64-encoded text string. This display can be somewhat misleading, because internally ByteArrays are stored and operated on in binary form, not as text strings.
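
As a minimal sketch of the behavior just described (the variable names are illustrative, and the byte counts depend on the version, so none are shown), one can compare the three settings and confirm that each result is a ByteArray that round-trips:

```mathematica
data = Range[100];

(* Default: BinarySerialize decides for itself whether to deflate *)
auto = BinarySerialize[data];

(* Explicit goals, as described above *)
fast = BinarySerialize[data, PerformanceGoal -> "Speed"];
small = BinarySerialize[data, PerformanceGoal -> "Size"];

(* The result is a ByteArray, not a string *)
Head[auto]
(* ByteArray *)

(* Number of bytes produced by each setting *)
Length /@ {auto, fast, small}

(* Normal gives the underlying list of integers in the range 0 to 255 *)
AllTrue[Normal[fast], 0 <= # <= 255 &]
(* True *)

(* Every variant deserializes back to the original expression *)
BinaryDeserialize /@ {auto, fast, small} === {data, data, data}
(* True *)
```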

Binary serialization algorithms of Compress and BinarySerialize

The original serialization algorithm of Compress is described in this answer. That algorithm is not well optimized for size and produces larger-than-necessary output for many typical expressions. For example, it has no support for packed arrays of integers and rewrites such arrays as nested lists, which take a lot of bytes.

BinarySerialize uses a more size-optimized binary serialization algorithm than Compress (with default options) does. This algorithm supports packed arrays of integers, has optimizations for integers of different sizes (8, 16, 32 bit), stores big integers in binary form (not as text strings), and has other optimizations.
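
One way to probe the integer-size optimization is to serialize packed arrays whose elements need different bit widths and compare the byte counts. This is only a sketch: the test values are my own choice, it assumes a 64-bit system (so that 2^62 is still a machine integer), and the resulting sizes are not shown because they may differ between versions.

```mathematica
(* Packed arrays of 1000 equal integers that need 8, 16, 32 and 64 bits *)
arrays = ConstantArray[#, 1000] & /@ {2^6, 2^14, 2^30, 2^62};

(* Check that all four are packed arrays *)
Developer`PackedArrayQ /@ arrays

(* Serialized size in bytes, with deflation avoided *)
Length[BinarySerialize[#, PerformanceGoal -> "Speed"]] & /@ arrays

(* In-memory size of the same arrays, for comparison *)
ByteCount /@ arrays
```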

Applications of BinarySerialize

Using BinarySerialize, we can write our own Compress-like functions with better compression. For example, we can write a myCompress function that performs the same three steps as the original Compress but uses BinarySerialize for the serialization step:

myCompress[expr_]:=Module[
    {compressedBinaryData},
    compressedBinaryData = BinarySerialize[expr, PerformanceGoal->"Size"];
    Developer`EncodeBase64[compressedBinaryData]
    ];

myUncompress[string_]:=Module[
    {binaryData},
    binaryData = Developer`DecodeBase64ToByteArray[string];
    BinaryDeserialize[binaryData]
    ];

Even for a simple integer list we can see a size reduction:

Compress[Range[100]] // StringLength
(* 290 *)

myCompress[Range[100]] // StringLength
(* 244 *)

myUncompress[myCompress[Range[100]]] === Range[100]
(* True *)

If we take an expression with a large number of small integers, we get a much more noticeable improvement:

bitmap = Rasterize[Plot[x, {x, 0, 1}]];

StringLength[Compress[bitmap]]
(*31246*)

StringLength[myCompress[bitmap]]
(*17820*)

myUncompress[myCompress[bitmap]] === bitmap
(* True *)

Conclusion

The example above shows that the result of a simple user-defined function myCompress based on BinarySerialize can be almost twice as compact as the result of Compress.

Outlook

To decrease the output size even further one can use a compression algorithm with higher compression settings (in the second step) or use Ascii85-encoding instead of Base64 in the third step.

Appendix 1: Undocumented options of Compress

I have noticed that in Version 11.1 Compress has more undocumented options than in previous versions. Those options allow one to:

Disable both compression and Base64 encoding and return a binary serialized result as a string with unprintable characters:

Compress[Range[100], Method -> {"Version" -> 4}]

Change the binary serialization algorithm to a more efficient one, but not exactly to that of BinarySerialize:

Compress[Range[100], Method -> {"Version" -> 6}] // StringLength

(* 254 *)

There is also a "ByteArray" option shown in usage message ??Compress but it does not work in Version 11.1.

Note that this behavior is undocumented and may change in future versions.

Appendix 2: Compression option of BinarySerialize

Just for fun, one can manually compress the result of BinarySerialize[..., PerformanceGoal -> "Speed"] to get the same output that BinarySerialize[..., PerformanceGoal -> "Size"] produces. This can be done with the following code:

myBinarySerializeSize[expr_]:=Module[
    {binaryData, dataBytes, compressedBytes},
    binaryData = Normal[BinarySerialize[expr, PerformanceGoal->"Speed"]];
    dataBytes = Drop[binaryData, 2]; (*remove magic "7:"*)
    compressedBytes = Developer`RawCompress[dataBytes];
    ByteArray[Join[ToCharacterCode["7C:"], compressedBytes]]
    ]

We can check that it gives the same result as the PerformanceGoal -> "Size" option:

data = Range[100];
myBinarySerializeSize[data] === BinarySerialize[data, PerformanceGoal -> "Size"]

Appendix 3: zlib compression functions

Description of undocumented zlib compression/decompression functions Developer`RawCompress and Developer`RawUncompress can be found in this answer.

Appendix 4: Base64 encoding functions

Usage of Base64 encoding/decoding functions from the Developer` context can be explained using the following code:

binaryData = Range[0, 255];

Normal[
    Developer`DecodeBase64ToByteArray[
        Developer`EncodeBase64[binaryData]
        ]
    ] == binaryData

(* True *)

You seem to have dug deep here. Perhaps you have some of the answers for this too. – Szabolcs, Apr 05 '17 at 20:11
It's a pity that Developer`RawCompress can only compress a string or a list. – yode, Apr 09 '17 at 14:34

score 7 · Answer 2 · answered Apr 05 '17 at 12:06

Measure the difference in bytes with ByteCount.

BinarySerialize[Range[100]] // ByteCount

Range[100] // ByteCount

BinarySerialize will occupy fewer bytes. Not a big difference in this trivial example but the difference generally widens as the object to be serialised gets larger.

Also you must Compress both for apples-to-apples comparison.

Compress@BinarySerialize[Range[100]] // ByteCount

Compress@Range[100] // ByteCount

Hope this helps.

Edmund

So, as you understand it, this function is intended to save space? – yode Apr 05 '17 at 17:43
@yode I think that's probably right. Also, and here I'm just speculating, it might be faster to revert using BinaryDeserialize. Of course the fact that it's a binary format also probably means it's not platform independent. – b3m2a1 Apr 05 '17 at 17:46
@MB1965 You mean that if I share my BinarySerialize result with you, you cannot decode it with BinaryDeserialize? – yode Apr 05 '17 at 17:53
@MB1965 I would expect them to state it explicitly if it is not platform independent (as it is stated for the MX format). Compress is platform independent. – Szabolcs Apr 05 '17 at 18:02
@Szabolcs this is true... hmm... I have no Windows machines lying about but hopefully someone can test this and prove me wrong. – b3m2a1 Apr 05 '17 at 18:04
@yode Also if you look at the details you can either choose speed or space and it will automatically try to balance that. – b3m2a1 Apr 05 '17 at 18:12

score 6 · Answer 3 · edited Jun 16 '20 at 09:23

Here's an interesting complement to Edmund's answer. It's not always the case that the ByteArray form will be more compact. DocFind is just a table of reflinks I use for doc searching.

In[155]:= bs = BinarySerialize@DocFind[]; // RepeatedTiming
Out[155]= {1.8, Null}
In[156]:= cp = Compress@DocFind[]; // RepeatedTiming
Out[156]= {1.74, Null}
In[157]:= bs // ByteCount
Out[157]= 3027593
In[158]:= cp // ByteCount
Out[158]= 233104

On the other hand it's minimally faster to BinaryDeserialize:

In[159]:= Uncompress@cp; // RepeatedTiming
Out[159]= {0.34, Null}
In[160]:= BinaryDeserialize@bs; // RepeatedTiming
Out[160]= {0.30, Null}

If we use a much larger input we get a much more interesting result, though:

In[145]:= bs2 = BinarySerialize@Range[10000000]; // AbsoluteTiming
Out[145]= {0.16928, Null}
In[153]:= cp2 = Compress@Range[10000000]; // AbsoluteTiming
Out[153]= {4.92081, Null}

My intuition would suggest that BinarySerialize should almost always be as fast or faster than Compress but I'd be interested to be proven wrong.

Here the serialized version is much more compact:

In[161]:= bs2 // ByteCount
Out[161]= 80000104
In[162]:= bs3 // ByteCount
Out[162]= 31276128

And it's so much faster to use BinaryDeserialize here than Uncompress:

In[163]:= BinaryDeserialize@bs2; // RepeatedTiming
Out[163]= {0.13, Null}
In[164]:= Uncompress@cp2; // AbsoluteTiming
Out[164]= {29.8372, Null}

Again, because it's a byte format, my intuition would suggest Binary* will be faster and often more compact than a compressed string.

Update: yode confirmed that it is platform independent.

My original guess, which turned out to be wrong, was that since it's a byte format it would be platform dependent, i.e., that I couldn't take my serialized ByteArray, mail it to you, and expect it to work if we have different operating systems/architectures, the same way I can't expect that of .mx files.

answered Apr 05 '17 at 18:02

b3m2a1

I'm on Windows 10. Can we test whether it is platform dependent or not? – yode Apr 05 '17 at 18:18
Please run this code: NotebookPut[Uncompress[FromCharacterCode[Flatten[ImageData[Import["http://i.stack.imgur.com/A2bMu.png"],"Byte"]]]]]. Do you get Range[100] from BinaryDeserialize? – yode Apr 05 '17 at 18:21
@yode for some reason that gave me a "file not found" message. Alternatively you can pull the byte array file here (just go there and the file will download) and see if you can get that to BinaryDeserialize. – b3m2a1 Apr 05 '17 at 18:24
I got it. See this. Is it your file? – yode Apr 05 '17 at 18:30
@yode yep. It's platform independent. I'll amend my post. – b3m2a1 Apr 05 '17 at 18:31
BTW MX is portable between OSs since 10.0, but not portable between 32 and 64 bit. MX written with older versions is also supposed to be readable with newer versions (perhaps only since v10.0) – Szabolcs Apr 05 '17 at 18:40
@Szabolcs good to know. I just remembered reading that it wasn't in the docs at some point. – b3m2a1 Apr 05 '17 at 19:05
@MB1965 May I ask how you exported a ByteArray object to get such a file? – yode Apr 21 '17 at 05:18
@yode I exported to a file on my desktop and then used CopyFile to an object I got from CloudDeploy[None, Permissions->"Public"]. You'll need a cloud account to use it, but restricted cloud accounts are free, so that's all I do (if you need more you can do the Google Drive thing where you make a new email to get a new free account). You might be able to just set the "MIMEType" or something to get your object to autodownload... or maybe CloudPut. Dunno. I just use CopyFile. – b3m2a1 Apr 21 '17 at 05:20
I mean how to export it to your desktop. :) Like this? – yode Apr 21 '17 at 06:07
@yode ohhh. I surely used some variant on Export["~/Desktop/byte_array.m",barr]. – b3m2a1 Apr 21 '17 at 06:08
But why does your file here have no file extension? – yode Apr 21 '17 at 06:12
@yode it's just a raw cloud object. Its file type is in its mime type. Since I copied it up from a .m file that's set to application/vnd.wolfram.mathematica.package. And your browser knows that to download links of that mime type. – b3m2a1 Apr 21 '17 at 06:16
I'm having some SSH oddities trying to import that, but the easiest thing is, I think, CloudGet although you can often use ToExpression on a raw Import, too. I can post a Q/A type question on sharing data via the cloud if that would be helpful to you. – b3m2a1 Apr 21 '17 at 06:33
@yode it's up. Comment there to ask for more info / clarification. – b3m2a1 Apr 21 '17 at 06:53

score 3 · Answer 4 · answered Apr 14 '17 at 13:52

In general, serialization is widely used for two purposes:

To send large objects "over the wire" by TCP/IP from one computer to another. For example, a company might have a programming object such as a BillOfMaterials containing a large number of part numbers. The company would serialize the BillOfMaterials, send it to a vendor, and request a quote and an estimate of parts availability. The vendor would de-serialize the BillOfMaterials, assign prices to the part numbers and indicate whether in stock, serialize the completed BillOfMaterials, and return the quote to the originating company.
Memoization. For recursive algorithms using binomial (Binomial[...]) or multinomial (Multinomial[...]) coefficients, the integers involved become very large very fast. Even computers with large amounts of RAM cannot keep intermediate results in core memory and still do useful work. It is common to keep a dictionary or map (in Mathematica, an Association) of the results of computing a binomial or multinomial coefficient as a serialized object on disk. The dictionary key is a concatenation of the binomial or multinomial arguments, and the dictionary value is the file name of the serialized result. So, instead of recomputing the coefficient, it is read from disk and de-serialized. The same basic idea could be used in Dynamic Programming, in which a large problem is split into many smaller problems. Since the smaller problems may have to be solved over and over again, their solutions can be serialized to disk and recalled as needed.
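
The disk-based memoization idea could be sketched roughly as follows. This is an illustration under my own assumptions, not a definitive implementation: cacheDir and cachedBinomial are hypothetical names, the ".bin" extension is arbitrary, and the file name is derived directly from the arguments instead of being looked up in an Association.

```mathematica
(* Hypothetical cache location: a fresh temporary directory *)
cacheDir = CreateDirectory[];

(* Hypothetical helper: compute Binomial[n, k] once, then reuse the copy on disk *)
cachedBinomial[n_Integer, k_Integer] := Module[{file, stream, value},
  file = FileNameJoin[{cacheDir,
     "binomial_" <> ToString[n] <> "_" <> ToString[k] <> ".bin"}];
  If[FileExistsQ[file],
   (* Cache hit: read the raw bytes and deserialize them *)
   BinaryDeserialize[ByteArray[BinaryReadList[file]]],
   (* Cache miss: compute, serialize, and write the bytes to disk *)
   value = Binomial[n, k];
   stream = OpenWrite[file, BinaryFormat -> True];
   BinaryWrite[stream, Normal[BinarySerialize[value]]];
   Close[stream];
   value
   ]
  ]

(* The first call writes the file, the second reads it back *)
cachedBinomial[1000, 500] === Binomial[1000, 500]
cachedBinomial[1000, 500] === Binomial[1000, 500]
```

Both comparisons should give True if the serialized value round-trips through the file unchanged.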
