23

I have a couple of big lists that I want to save/export. Each element of a list is itself a list of two elements, i.e. of the form {x, y}. Generating them takes hours, and I don't want to do this every single day.

I looked at the Mathematica help section on this (http://reference.wolfram.com/mathematica/tutorial/ImportingAndExportingData.html), but found that if I follow the example, I can't just import the data back into a list as it was before. It just ends up being something really messy.

So given such a list, say, list = {{1,2}, {1,3}, ... , {500, 500}}, what do I do, so that the next day I can just write list = Import[...]?

Ryker
  • 3
    I'd use .mx files (Export / Import in "MX" format). This is fast, and does not really involve serialization / parsing in the usual sense (via strings). In other words, mx files bypass the high-level parsing, populating internal structures at a lower level. In addition, mx files preserve packed arrays. – Leonid Shifrin Jun 24 '14 at 21:20
  • @LeonidShifrin, yes! This works perfectly! If you make it an answer, I'd be glad to accept it. How did you figure this out, by the way? – Ryker Jun 24 '14 at 21:55
  • Actually, I recall now that I first learned that Export / Import work on .mx files from @Szabolcs. – Leonid Shifrin Jun 24 '14 at 22:06
  • 1
    I find this Q&A relevant and handy. – Johu Jun 25 '14 at 00:15
  • @Johu, thanks, I'll take a closer look at it when I have the time. – Ryker Jun 25 '14 at 02:53

4 Answers

23

I'd use .mx files (Export / Import in "MX" format):

Export["myFile.mx",list]

and

Import["myFile.mx"]

This is fast, and does not really involve serialization / parsing in the usual sense (via strings). In other words, mx files bypass the high-level parsing, populating internal structures at a lower level. In addition, mx files preserve packed arrays.
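
A quick way to convince yourself that the round trip is lossless is to compare the re-imported expression with the original. This is only a minimal sketch: the random stand-in data and the temporary-directory path are assumptions for illustration, not part of the original problem.

```mathematica
(* Stand-in data: 1000 pairs {x, y}; RandomInteger returns a packed array *)
list = RandomInteger[{1, 500}, {1000, 2}];

(* Hypothetical location; any writable path works *)
file = FileNameJoin[{$TemporaryDirectory, "myFile.mx"}];

Export[file, list];
restored = Import[file];

(* Both checks should give True on the same machine and version *)
restored === list
Developer`PackedArrayQ[restored]
```

SameQ (===) is used rather than Equal (==) so that the comparison is structural and does not tolerate small numerical differences.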

Leonid Shifrin
  • 3
    I didn't mention MX on purpose because I thought it was overkill for relatively small strictly tabular data, and it does have some dangers that are important to mention: it's not cross platform (before v10) and it's not compatible between different versions of Mathematica. I'm commenting just to warn the OP and make sure he doesn't use it for archiving or try to move files between, say, Windows and Linux. – Szabolcs Jun 24 '14 at 22:06
  • @Szabolcs Yep, I do realize. Just when you posted this one, I added a comment under the main question, acknowledging that I learned about Export / Import working on .mx files from you. In this particular case, however, the OP's goal seems to be saving data for later use on the same machine, thus this suggestion. – Leonid Shifrin Jun 24 '14 at 22:08
  • My comment was really aimed at the OP, not you. Yes, I agree that in general for saving important session data MX is best. Some systems like R even have a feature to restore the complete workspace; this is similar to that. – Szabolcs Jun 24 '14 at 22:11
  • @Szabolcs I disagree however that this is an overkill w.r.t. exporting as Table. The latter involves high-level serialization / parsing, and that always increases chances to not get the same thing back. Serializing to a binary format like .mx is different. – Leonid Shifrin Jun 24 '14 at 22:12
  • @Szabolcs Ok, apparently we agree on all points. – Leonid Shifrin Jun 24 '14 at 22:13
  • @LeonidShifrin, indeed, I just want to use this data on the same computer for now. I'm generating lists in a kind of inductive fashion, so I need the previous ones to generate the new one. If I don't have those stored, I can't get to the next "level", hence my desire to store the data and retrieve it with ease later on. – Ryker Jun 24 '14 at 22:30
  • 1
    @Szabolcs, Looks like v10 .mx files written by DumpSave and Import are not cross-platform either. That's kind of a bummer in my opinion. – kale Jun 25 '14 at 01:38
  • @kale Have you had the chance to try that? I thought that v10 MX files were portable between different OSes as long as they're only used with v10 of the same "bitness" (e.g. all 64 bit). – Szabolcs Jun 25 '14 at 01:41
  • @Szabolcs, Only have it installed on one system, but built-in documentation specifies it's not compatible cross-platform or -version. – kale Jun 25 '14 at 01:43
  • @kale Well, the documentation is known to lag behind and be sometimes inaccurate ... I guess we'll find out when v10 final comes out. – Szabolcs Jun 25 '14 at 01:48
10

I provide two ways:

1) Human readable

data1 = RandomInteger[100, {25, 25}];
data2 = RandomReal[100, {25, 25}];
Save["humanReadable.m", {data1, data2}];

Unset[{data1, data2}]
Get["humanReadable.m"];
Dimensions@{data1, data2}

{2, 25, 25}

Note that you can dump many different variables with ease, and the file is in easy-to-read Mathematica syntax, allowing all kinds of symbolic and numeric data without any manual serialisation. By default Save appends to an existing file, which can be convenient but must of course be kept in mind.

Saving the data from other sources in such format might be very handy, as you can use all of the Mathematica syntax including comments. For example I use it for measurement control software data dump.
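
Because Save appends by default, re-running it on the same file keeps adding definitions. If you want a fresh file each time, one option is to delete any existing file first. The following is a minimal sketch; the temporary-directory path and the helper names file and saved1 are assumptions for illustration only.

```mathematica
data1 = RandomInteger[100, {25, 25}];
data2 = RandomReal[100, {25, 25}];

(* Hypothetical location; any writable path works *)
file = FileNameJoin[{$TemporaryDirectory, "humanReadable.m"}];

(* Remove a file left over from an earlier run so Save starts from scratch *)
If[FileExistsQ[file], DeleteFile[file]];
Save[file, {data1, data2}];

(* Keep a copy for comparison, clear the symbols, then read the file back *)
saved1 = data1;
Clear[data1, data2];
Get[file];

data1 === saved1
Dimensions[{data1, data2}]
```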

2) Platform independent binary

Obviously the upside of binary format is smaller file size and loading time in case of big data.

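In addition to the mx data format already discussed, there is the version- and platform-independent format wdx. Again there is a way to export and import it with symbol names attached. Note, though, that DumpSave writes the MX format regardless of the file extension, so only the Export variant further down actually produces a WDX file.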

DumpSave["platformIndependendBinary.wdx", {data1, data2}];
Unset[{data1, data2}]
Get["platformIndependendBinary.wdx"]

And if you don't want to fix / remember the variable names you can

DeleteFile["platformIndependendBinary.wdx"]
Export["platformIndependendBinary.wdx", {data1, data2}]
Unset[{data1, data2}]
{data1, data2} = Import["platformIndependendBinary.wdx"];

The same method works for .m files for human readable text format.

The only downside of wdx compared to mx is speed.

Johu
  • 5
    WDX is awfully slow though. I use Export[..., Compress[expr], "String"], which is also version/platform independent, and faster than WDX. – Szabolcs Jun 25 '14 at 00:20
  • +1 for nice tip. The discussion about the speed I already stumbled upon and linked. – Johu Jun 25 '14 at 00:21
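
The Compress approach mentioned in the comment above can be sketched as follows. The example expression (with I as the imaginary unit) and the file path are assumptions chosen for illustration; the point is only that the compressed string is written as plain text and decoded again with Uncompress.

```mathematica
(* Example expression containing exact, non-numeric entries *)
expr = {{Sqrt[7 I + Sqrt[7]], 4}, {1, 3}};

(* Hypothetical location; any writable path works *)
file = FileNameJoin[{$TemporaryDirectory, "compressed.txt"}];

(* Compress turns the expression into a string, which is stored as plain text *)
Export[file, Compress[expr], "String"];

(* Read the string back and decode it *)
restored = Uncompress[Import[file, "String"]];

restored === expr
```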
7

Probably the best way is to do

Export["mydata.txt", list, "Table"]

then later

Import["mydata.txt", "Table"]

Be sure to explicitly specify the data format: "Table". Otherwise Import/Export will likely still succeed but will automatically choose a different format.

This writes a whitespace-separated plain text file that is readable by many programs other than Mathematica. If your dataset is so large that import/export takes too long, let me know, as there are better formats for that situation.
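
To see why the explicit "Table" matters on import as well, here is a minimal sketch with a tiny integer list. The file path is an assumption for illustration.

```mathematica
list = {{1, 2}, {1, 3}, {500, 500}};

(* Hypothetical location; any writable path works *)
file = FileNameJoin[{$TemporaryDirectory, "mydata.txt"}];

Export[file, list, "Table"];

(* With the format given explicitly, the nested list comes back *)
Import[file, "Table"] === list

(* Without it, a .txt file is read as text, so the result is a single string *)
Head[Import[file]]
```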

Szabolcs
  • I just tried it, but it doesn't work as I want it to: I don't get the original list back if I do Import["mydata.txt"]. – Ryker Jun 24 '14 at 21:53
  • @Ryker For a list of the form you mentioned, it always works for me. Please post a short example list you have, then we can figure out why we see different behaviour. – Szabolcs Jun 24 '14 at 22:03
  • Well, for example, list = {{1,2}, {1,3}}. – Ryker Jun 24 '14 at 22:28
  • 1
    @Ryker https://www.dropbox.com/s/ichx99uq4sdlujv/Screenshot%202014-06-24%2018.29.19.png – Szabolcs Jun 24 '14 at 22:29
  • Damn, you're right, I overlooked the "Table" part in Import... – Ryker Jun 24 '14 at 22:34
  • I will say this, though. The .txt files are significantly smaller than the .mx files. I have, for example, ~170 MB vs. ~26 MB! So I think I'll be using your approach. – Ryker Jun 25 '14 at 06:01
  • @Ryker That sounds very unusual ... in my experience MX tends to be smaller (which makes sense since MX is binary). Are you sure you exported the very same thing? I just tried exporting r = RandomReal[1, {100000, 2}]; and MX is less than half the size of the plain text file. – Szabolcs Jun 25 '14 at 13:08
  • @Ryker Plus there's a very good point mentioned by Leonid which I didn't think of: exporting to plain text will lose a tiny little bit of precision, while MX retains exactly the same data. – Szabolcs Jun 25 '14 at 13:09
  • Yes, I did export the very same lists. I did it for two different ones, and for both the difference in size was about at least a factor of 4. And what do you mean by a loss in precision? That not as many digits of a number are stored in .txt files? – Ryker Jun 25 '14 at 17:07
  • 1
    @Ryker Computers store floating point numbers in a binary representation. When converting these numbers to a decimal representation, you would usually need a very high number of decimal digits to preserve the numbers precisely. An extreme example with base 3 (not base 2) illustrates well what happens: 0.1 in base 3 is 1/3 precisely. However, 1/3 is not even representable in decimal with a finite number of digits. Binary numbers are representable in decimal, but a perfect representation might take many more decimal digits than is reasonable to store. – Szabolcs Jun 25 '14 at 17:19
  • A machine precision floating point number uses 53 binary digits. This corresponds to a precision of Log[10, 2^53] ~ 16 decimal digits, so it makes no sense to store more than 16 decimal digits. However, e.g. 2^-40, which is perfectly representable on 53 binary digits (to be precise: only 1 binary digit is needed, plus the exponent), is precisely equal to 9.094947017729282379150390625 * 10^-13 in decimal. It needs 28 digits to represent perfectly in decimal. If you export to ASCII, i.e. as decimal, normally only 9.094947017729282 * 10^-13 will be stored (i.e. 16 digits). – Szabolcs Jun 25 '14 at 17:24
  • This is a famous article that explains this (non-Mathematica specific) problem, and sheds light on many more not immediately intuitive properties of floating point numbers and computer arithmetic: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html – Szabolcs Jun 25 '14 at 17:26
  • @Ryker If your data is not confidential, can you please send me the data in MX format, so I can try to figure out why it takes so much space when using MX? Your finding is very disturbing for me (and I still cannot reproduce it on my machine). – Szabolcs Jun 25 '14 at 17:28
  • Yeah, I can send you the smallest .mx file, which is ~40 MB. Where should I send it to? – Ryker Jun 25 '14 at 17:48
  • @Ryker Best thing is to upload somewhere (if you don't have Dropbox, you might try http://ge.tt/). Most mailboxes don't support files this large. Then post the link in a comment. If you need to keep the data semi-private, email me the link instead (see my profile for my address). – Szabolcs Jun 25 '14 at 18:02
  • I sent you the link via e-mail, using their share link. By the way, I realized I wasn't wrong about .txt not yielding the desired results in all cases. It seems that, for example, nested square roots break the round trip. Try exporting and importing this list: list = {{Sqrt[7 i + Sqrt[7]], 4}, {1, 3}}. You don't get the same list back, do you? – Ryker Jun 25 '14 at 18:16
  • @Ryker If you need to preserve exact expressions which are not simple numbers, then MX or other formats that can store arbitrary Mathematica expressions are clearly the way to go. I don't expect that the "Table" format would preserve all such expressions. It's suitable for numbers like 1 or 1.2. It's good you let me know about the email because it went in my spam folder ... – Szabolcs Jun 25 '14 at 18:26
  • Yes, I think I'll be going with .mx after all, the size of the files notwithstanding. – Ryker Jun 25 '14 at 18:28
  • @Ryker All I can say is that you're right, MX is bigger than plain text, though I get a 10 MB MX when I re-export the same data I read from yours ... – Szabolcs Jun 26 '14 at 02:23
  • What were the commands you used (all of them, from importing onwards) to get that? Maybe I can follow your procedure and see if I then get a different size. Also, I'm using v9.0, if that makes a difference. I'm not sure why it should, though. Data is data. – Ryker Jun 26 '14 at 04:33
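
The precision point from the comments above can be checked directly with the 2^-40 example. This is only a sketch of the arithmetic; the variable names x and exact are chosen here for illustration.

```mathematica
(* A machine-precision real that is an exact power of two *)
x = 2.^-40;

(* The exact rational value held by the binary representation *)
exact = SetPrecision[x, Infinity];

(* The binary number is exactly 2^-40 *)
exact == 2^-40

(* Full decimal expansion, which needs 28 significant digits *)
N[exact, 28]

(* The 16-digit decimal form discussed in the comment *)
NumberForm[x, 16]
```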
0

It is very easy. Save your data in .csv format, e.g. Export["file.csv", data];, where data looks like data = {{1., 1.}, {2., 2.}, {..}}. Then, when you need it later, initialize a list such as c = {{0., 0.}} and then set c = Import["file.csv"].

Henrik Schumacher
Devil