I want to write data to a file using compression. Is it possible to do that just in time, as with Java's ZipOutputStream? For example, this is what I want:

zipOutputStream = OpenWrite["file.zip", Compression->"ZIP"];
For[i = 0, i < 1000000, ++i, 
    Write[zipOutputStream, doMyStuff[i]];
]
Close[zipOutputStream];

so that the resulting file.zip will be a valid zip archive.

  • If you were asking about gzip, then the answer would probably be yes. See $OutputStreamMethods. But I don't know how to get it to work. BTW you really should consider not using For unless you can come up with a very good reason why it's appropriate. Just use Do. – Szabolcs Jan 02 '17 at 12:08
  • Why should I use Do instead of For? – Stanislav Poslavsky Jan 02 '17 at 13:24
  • Countless reasons that have been repeated many times here and on other forums: Unreadable, easy to mess up (countless examples of ,-; confusion), the iterator is not localized, it gets unnecessarily verbose (multiple iterators? list iterator?), it doesn't parallelize, etc. And it shows a bad example. A lot of beginner questions would just go away if For didn't exist in Mathematica. – Szabolcs Jan 02 '17 at 13:48
  • you might work with StartProcess and an external compression program. – george2079 Jan 02 '17 at 14:15
  • http://mathematica.stackexchange.com/q/134609/12 – Szabolcs Jan 02 '17 at 14:41
  • @StanislavPoslavsky: I have given an answer which indicates how you can build something on your own. If you want to do that and find a solution, it is welcomed that you answer your own question and accept that answer. If you think that is out of your current ambitions, then you might wait and see if someone else gives a more detailed answer... – Albert Retey Jan 03 '17 at 14:00
  • Now having tried this, there are two problems with StartProcess: 1) There is no evident way to direct the process's standard output to a file (this can be worked around). 2) I have been unable to figure out how to signal the end of the data. See http://mathematica.stackexchange.com/q/84430/2079 – george2079 Jan 03 '17 at 15:28

2 Answers

As already mentioned, it's easy to implement this functionality with JLink. However, Java's ZipOutputStream only accepts bytes, so you will need to convert your data to a list of bytes first. Borrowing Java code from here.

<< JLink`
InstallJava[];

openStream[file_] := Module[{fos, bos, zos},
   (* write to file.zip through a buffered stream *)
   fos = JavaNew["java.io.FileOutputStream", file~StringJoin~".zip"];
   bos = JavaNew["java.io.BufferedOutputStream", fos];
   zos = JavaNew["java.util.zip.ZipOutputStream", bos];
   (* start a single archive entry named after the original file *)
   zos@putNextEntry[JavaNew["java.util.zip.ZipEntry", file]];
   zos];

closeStream[stream_] := Module[{},
   (* finish the current entry, then close the whole archive *)
   stream@closeEntry[];
   stream@close[];
   ];

stream = openStream["blah.dat"];

stream@write[{1, 2, 3}]
closeStream[stream];

As a result you will get a blah.dat.zip archive with blah.dat inside. This is not very useful yet, since the entry just contains three raw bytes; we need some sort of serialization. Let's take your example and define

doStuff[x_] := x*x

We will store the squares of the integers from 1 to 1000000 in a zip file:

convertToBytes[list_] := 
  Flatten[ToCharacterCode /@ (ToString[#]~StringJoin~"\n" & /@ list)];
stream = openStream["blah.dat"];

stream@write[convertToBytes[doStuff /@ Range[1000000]]]
closeStream[stream];
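
The call above converts all results to one big byte list before anything is written. To get closer to the "just in time" behaviour asked for in the question, the same Java stream can be fed one result at a time. The following is only a sketch under a few assumptions: toLineBytes is a hypothetical helper that is not part of the code above, the file name squares.dat is arbitrary, and the stream setup is repeated from openStream so that the block runs on its own.

```mathematica
Needs["JLink`"];
InstallJava[];

(* same stream setup as openStream above, repeated so the sketch is self-contained *)
openStream[file_] := Module[{fos, bos, zos},
   fos = JavaNew["java.io.FileOutputStream", file <> ".zip"];
   bos = JavaNew["java.io.BufferedOutputStream", fos];
   zos = JavaNew["java.util.zip.ZipOutputStream", bos];
   zos@putNextEntry[JavaNew["java.util.zip.ZipEntry", file]];
   zos];

doStuff[x_] := x*x;

(* hypothetical helper: one expression per line, in InputForm so it stays on a single line *)
toLineBytes[expr_] := ToCharacterCode[ToString[expr, InputForm] <> "\n"];

stream = openStream["squares.dat"];

(* each result is compressed as soon as it is produced *)
Do[stream@write[toLineBytes[doStuff[i]]], {i, 1000}];

stream@closeEntry[];
stream@close[];
```

Every write is a separate J/Link call with its own overhead, so for a million items it may pay off to collect a few thousand results per call instead of writing them one by one.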

Update: @rcollyer proposed hooking into native streams with DefineOutputStreamMethod, and it actually worked:

DefineOutputStreamMethod["Zipped", {
   "ConstructorFunction" -> 
     Function[{name, isAppend, caller, opts}, {True, openStream[name]}],
   "WriteFunction" -> 
     Function[{state, bytes}, state@write[bytes]; {Length[bytes], state}],
   "CloseFunction" -> Function[{state}, closeStream[state]]
}];

Now we can work with zipped streams using native methods:

starWars = OpenWrite["star-wars.dat", Method -> "Zipped"];

Write[starWars, "yoda forever!"];
Write[starWars, {"Luke", "Leia"}];
Write[starWars, doStuff /@ Range[1000]];

Close[starWars];
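
To check that the result is a valid archive, it can be read back with the built-in archive functions. This is a sketch that assumes the example above produced star-wars.dat.zip (openStream appends ".zip" to the name it is given). Java resolves relative paths against the JVM's working directory, which may differ from Directory[], so the path may need adjusting.

```mathematica
(* assumption: adjust this to the full path if the JVM wrote the file elsewhere *)
archive = "star-wars.dat.zip";

(* list the entries stored in the archive *)
Import[archive, "FileNames"]

(* unpack into a fresh temporary directory and read the entry back as text *)
dir = CreateDirectory[];
files = ExtractArchive[archive, dir];
Import[First[files], "Text"]
```
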
BlacKow

Neither the import format ZIP (or similar formats such as GZIP, TAR or BZIP) nor the function CreateArchive has any documented functionality that would let you do what you want. So I see two possible ways to achieve it:

  • As george2079 mentioned in his comment, you could use StartProcess and an external program. That will of course make the solution somewhat OS dependent.

  • You could use JLink to directly access the Java functionality you are referring to.

If you get that to work, you might have a look at the new stream methods functionality, which should in principle let you define a specific output stream method that behaves as you indicated. Unfortunately, the documentation of stream methods is not very detailed, so I guess it would take some experimenting to get this running.

Some superfluous spelunking in the CreateArchive code indicates that WRI uses JLink` plus java.util.zip.ZipOutputStream there, so I think that is the most straightforward path. If you want to see how they do things, have a look at the definitions of CreateArchiveDump`compress and CreateArchiveDump`addToZIPFile, which become available after the first use of CreateArchive.
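
A possible way to do that spelunking is sketched below. The throwaway file and archive names are arbitrary, and the context and symbol names are the ones I found in my version; they are undocumented internals and may well differ or be missing in other versions.

```mathematica
(* build a small throwaway archive so that the CreateArchive internals get loaded *)
src = CreateDirectory[];
Export[FileNameJoin[{src, "a.txt"}], "hello", "Text"];
CreateArchive[src, FileNameJoin[{CreateDirectory[], "demo.zip"}]];

(* list whatever symbols now live in the internal context *)
Names["CreateArchiveDump`*"]

(* print a definition; this may show nothing useful if the symbol is
   ReadProtected or does not exist in your version *)
Definition[CreateArchiveDump`compress]
```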

Albert Retey