
I use

f = OpenRead["largefile.m"]
x = Read[f];
Close[f];

to read a Mathematica expression from a large file largefile.m (about 3 GB) and assign it to x. The Read[f] step takes a very long time. How can I monitor its progress? Also, if largefile.m is larger than 4.1 GB (as reported by ls -la), will this approach still work?

rm -rf
user13253

2 Answers


I am fairly sure that with files of this size you will have problems loading them into Mathematica (unless you have a truly huge amount of RAM on your machine, and perhaps even then). If your file contains newline-terminated strings, or if you otherwise know its structure (the types stored in it), consider using ReadList or BinaryReadList. I gave one example in my post here. Because these functions can read the file in pieces, you can attach something like a ProgressIndicator and monitor the progress. I would also recommend using something similar to the large data framework from this answer to convert the contents of your file, or parts of the resulting expression, to a file-backed form, since operating on the entire dataset in memory might not be possible or efficient for such huge amounts of data.
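As a rough sketch of the chunked-read idea (not the code from the linked post), the following assumes the file consists of newline-terminated strings. The function name readLinesWithProgress and the default chunk size of 10000 lines are made up for illustration. Progress is estimated by comparing StreamPosition with FileByteCount.

```mathematica
(* Hypothetical helper: read a text file line by line in chunks,
   showing a temporary progress bar while reading. *)
readLinesWithProgress[file_String, chunk_Integer : 10000] :=
 Module[{str, total, pos = 0, part, chunks = {}},
  (* total size in bytes; Max guards against division by zero for empty files *)
  total = Max[FileByteCount[file], 1];
  str = OpenRead[file];
  Monitor[
   (* ReadList returns {} once the end of the file is reached *)
   While[(part = ReadList[str, String, chunk]) =!= {},
    chunks = {chunks, part}; (* nested list, flattened once at the end *)
    pos = StreamPosition[str]],
   ProgressIndicator[pos/total]];
  Close[str];
  Flatten[chunks]]
```

Calling readLinesWithProgress["largefile.txt"] (a placeholder file name) should return the lines as a list of strings. This only helps if the file really is line-structured; a single huge expression read with Read[f] cannot be split this way.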

Leonid Shifrin

I've been using BinaryReadList to load reasonably large files (100-500 MB), and it's been working well.

However, I've never had a way to watch the progress. Based on @Leonid's answer, I hacked together a ProgressIndicator for the function.

In my case, the large data files consist of sets of four 32-bit (big-endian) floats. Here I read the file in 100 steps. Timing is similar to reading the file all at once.

ImportProgress[filename_] := Module[{str, data, n, chunk},
  (*open stream*)
  str = OpenRead[filename, BinaryFormat -> True];
  (*16-byte records per step, rounded up so that 100 steps cover the file*)
  chunk = Ceiling[FileByteCount[filename]/16/100];
  data = {}; n = 0;
  (*display the dynamic progress indicator*)
  Print[ProgressIndicator[Dynamic[n/100]]];
  (*read data; n counts completed steps, so the loop runs exactly 100 times*)
  While[n < 100,
   AppendTo[data, 
    BinaryReadList[str, {"Real32", "Real32", "Real32", "Real32"}, 
     chunk, ByteOrdering -> +1]];
   n++];
  (*close stream*)
  Close[str];
  (*return the data as one flat list of records*)
  Join @@ data
]

(*read the data*)
data = ImportProgress["R06_14539-v01.pos"];  

Here is a more elegant alternative using Monitor, where the ProgressIndicator disappears once the file has finished loading:

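ImportProgressMonitor[filename_] := Module[{str, data, n, chunk},
  (*open stream*)
  str = OpenRead[filename, BinaryFormat -> True];
  (*16-byte records per step, rounded up so that 100 steps cover the file*)
  chunk = Ceiling[FileByteCount[filename]/16/100];
  data = {}; n = 0;
  Monitor[
   (*read data in exactly 100 steps*)
   While[n < 100,
    AppendTo[data, 
     BinaryReadList[str, {"Real32", "Real32", "Real32", "Real32"}, 
      chunk, ByteOrdering -> +1]];
    n++],
   (*monitor the progress; n is local to the enclosing Module*)
   ProgressIndicator[n/100]];
  (*close stream*)
  Close[str];
  (*return the data as one flat list of records*)
  Join @@ data
]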
s0rce