Against my better judgement, and in the interest of speedy I/O, I've decided to use .mx files for a project, limitations and all. I'm dealing with a few gigabyte-sized datasets that I need in core for reasonably efficient computations and, despite the alternatives, the ol' .mx remains my best option (with some hand-rolled HDF5 coming in a close second).

I try to be reasonably clever and name the file after the symbol or context I am exporting. Despite my cleverness, I occasionally dump things into the Global` context and then misname the file.

I've perused the always insightful work of @MrWizard on the subject, especially here, but I'm still adrift in a sea of Definition when it comes to this particular problem.

For a while, I figured that looking at what was defined in Global` before and after loading, then comparing, would be of assistance. But if the symbol is already defined (and subsequently gets clobbered), nothing new shows up and I'm out of luck.
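A minimal sketch of that before/after comparison, for illustration only. The helper name `newGlobalNames`, the symbol `mysteryData`, and the scratch file are all hypothetical; the file is created here just so the sketch is self-contained.

```mathematica
(* Hypothetical helper: report Global` names that exist after loading
   a file but did not exist before. With is used so that the helper
   itself introduces no new symbols between the two snapshots. *)
newGlobalNames[file_String] :=
  With[{before = Names["Global`*"]},
    Get[file];
    Complement[Names["Global`*"], before]]

(* Build a scratch .mx file holding one Global` symbol, then forget it *)
file = FileNameJoin[{$TemporaryDirectory, "mystery.mx"}];
Global`mysteryData = Range[10];
DumpSave[file, Global`mysteryData];
Remove[Global`mysteryData];

(* First load: the symbol did not exist beforehand, so it is reported *)
firstLoad = newGlobalNames[file]

(* Second load: the symbol already exists and is simply overwritten,
   so the comparison has nothing new to report *)
secondLoad = newGlobalNames[file]
```

The second call is the failure case described above: the definition is silently replaced, and a comparison of names alone cannot see it.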

I feel like I'm missing something pretty obvious here - or not.

flip
  • You could load your mx file in another (temporary) context, as described e.g. here, and then analyze the names created in that context (and then perhaps remove them all). BTW, for datasets of the size you mention, you could try the undocumented Streaming functionality, which is also based on mx files but where lots of things have already been automated (I hope that Streaming will soon become officially part of Mathematica). – Leonid Shifrin Feb 10 '17 at 16:01
  • Quick comment here: if I look at a hexdump of the file, I can see the symbol names of known .mx dumps... they just don't seem to be at a 'nice' location in the file, like a header or something. They're always about halfway through the file somewhere. – flip Feb 10 '17 at 16:02
  • @LeonidShifrin Thank you for both the pointer and suggestion. I will look at the Streaming. I seem to remember hearing mention of it a while back and did the usual 'hey neat, I could use that' and then promptly obliterated those neurons with Scotch or something similar. – flip Feb 10 '17 at 16:03
  • @LeonidShifrin Wait, so mx files don't actually contain fully qualified names? I thought they were like a memory dump and that they were the only format that preserved expressions intact. (InputForm, FullForm, MathLink transfer, Compress, WDX, etc. are all incomplete and I know they don't always preserve context information). – Szabolcs Feb 10 '17 at 16:05
  • @Szabolcs IIRC there's some difference between it using Definition vs FullDefinition or some such that might be responsible for this? – flip Feb 10 '17 at 16:07
  • @Szabolcs I think they do, but the trick I described in the linked answer forces them to be placed in another context. The problem, however, is that this won't work if the symbols have already been created in the original context. – Leonid Shifrin Feb 10 '17 at 16:07
  • @flip If you use MX, try not to use DumpSave. DumpSave (and Save) will save definitions, and that can be pretty inconvenient, as you discovered. Use Export instead. Thus, instead of a=Range[100]; DumpSave["foo.mx", a], then Get["foo.mx"] and thinking, "What was the name of that symbol? a? b? Something else?", just use Export["foo.mx", a] and a = Import["foo.mx"]. This won't save the symbol name and definition, it just exports the expression assigned to the symbol. Then you can re-assign to whatever symbol you like upon loading the file. – Szabolcs Feb 10 '17 at 16:08
  • So actually @Szabolcs makes a very good point: my suggestion will not help if you have those symbols already created / used in Global` and you also dumped them in Global` (for example), because it relies on $NewSymbol. – Leonid Shifrin Feb 10 '17 at 16:09
  • @flip Of course this applies to cases where you really want to store data and not definitions. But it sounds like it is data that you are storing. Definitions wouldn't be huge anyway. – Szabolcs Feb 10 '17 at 16:10
  • @Szabolcs Right! That makes perfect sense. I seem to remember using DumpSave because I was saving lots of InterpolatingFunctions, so I needed definitions. But you're certainly right in this case... data. Thanks to both you and @leonid – flip Feb 10 '17 at 16:11
  • @Szabolcs Actually, one thing to keep in mind regarding using Import / Export to store the data is that it is several times slower than using something like getData = Function[file, Block[{data}, Get[file]; data]] and putData = Function[{file, dt}, Block[{data = dt}, DumpSave[file, data]]], where data is some symbol (typically living in a package's private part, so that Block isn't too dangerous to use). – Leonid Shifrin Feb 10 '17 at 16:14
  • Possible duplicates: (2900), (25027) – Mr.Wizard Feb 11 '17 at 00:40
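
The two storage patterns suggested in the comments can be sketched side by side as follows. This is only an illustration under assumptions: the scratch file names and the variables `payload`, `viaImport`, and `viaDump` are hypothetical, and `data` is left in the current context for brevity, whereas the comment suggests keeping it in a package's private context. No timings are measured here; the remark that Import / Export is several times slower is from the comment and is not verified by this sketch.

```mathematica
(* Hypothetical scratch files; any writable path would do *)
exportFile = FileNameJoin[{$TemporaryDirectory, "export-demo.mx"}];
dumpFile = FileNameJoin[{$TemporaryDirectory, "dump-demo.mx"}];

payload = Range[100];

(* Pattern 1 (Szabolcs): store only the expression, not the symbol,
   so there is no symbol name to remember when loading *)
Export[exportFile, payload];
viaImport = Import[exportFile];

(* Pattern 2 (Leonid Shifrin): always dump the same fixed symbol, so
   the name stored in the file is known in advance. Block restores
   whatever value `data` had before each call. *)
putData = Function[{file, dt}, Block[{data = dt}, DumpSave[file, data]]];
getData = Function[file, Block[{data}, Get[file]; data]];

putData[dumpFile, payload];
viaDump = getData[dumpFile];

(* Check that both round trips returned the original expression *)
{viaImport === payload, viaDump === payload}
```

Either way, the caller chooses the symbol the loaded value is assigned to, so a misnamed file no longer hides an unknown symbol name.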
