
I have around 240 files with lots of data, and I have been using Import[] to read them in for further processing. But as my simulations grow bigger, I am having a lot of difficulty importing the files: my computer (which has 6 GB of RAM) freezes and lags. I was wondering if there is a better (less memory-intensive) way to import these files?

I have been using the following loop to import the data as strings and then convert them to numbers:

Rawdata = Table[
     s = ToString[j];
     (* builds file names like "pinocchio.20.0000.example.catalog.out" from j *)
     Import["C:\\Users\\Downloads\\Run 6 new z=45 sigma =0.845\\pinocchio." <>
          StringTake[s, 2] <> "." <> StringDrop[s, 2] <> ".example.catalog.out",
          "Data"],
     {j, 200000, 420000, 1000}];

Is there an analogous way to import the files without using so much memory?

  • Could you please format your code using linebreaks? It is very hard to read at the moment. Did you search the site for already existing solutions yet? – Yves Klett May 08 '15 at 04:59
  • I found the commands Get and Dump, but I am not sure how to apply a loop statement to it. – HuShu May 08 '15 at 05:03
  • Please search the site for "Import large" and take a look at the listed threads. – Yves Klett May 08 '15 at 05:07
  • Unfortunately I haven't found anything that specifically answers this question. – HuShu May 08 '15 at 05:35
  • Look at OpenRead and ReadList – Basheer Algohi May 08 '15 at 05:45
  • Plus, there's a missing quote in the code you provided. – Sektor May 08 '15 at 06:41
  • @Algohi I tried using ReadList["file", String], but my computer completely froze; I checked, and with ReadList the memory usage was much higher than with Import. Perhaps I will import them in parts and concatenate them. But the data is still stored in memory, and my computer starts to act sluggish. Is there a way I can free the memory once I have the data concatenated? – HuShu May 08 '15 at 07:13
  • I would load the data files into a database like MySQL and then connect to it from Mathematica with DatabaseLink. MySQL's "LOAD DATA LOCAL INFILE ..." is incredibly fast. – Gustavo Delfino May 08 '15 at 08:11
  • I guess it will be difficult to help you unless you provide a minimal working example. – Yves Klett May 08 '15 at 09:34
  • What do you mean by importing as strings and converting to numbers? I don't see that in your code at all. – george2079 May 08 '15 at 12:21
  • You can use ReadList to read line-by-line or in blocks (use its third parameter). Erasing memory: just unset (=.) the variable you want to clear, or use ClearAll or Remove. (See the sketch after these comments.) – Sjoerd C. de Vries May 08 '15 at 12:48
  • Do you generate the data with Mathematica and reread it with Mathematica? If yes, then use Dump and Get as you suggested (although I think Export[filename, expression, "MX"] is slightly cleaner). Anyway, if you want the data to be accessible from other programs, you could also look at specific file formats like HDF5, which are at least partially supported by Mathematica and will be a much better choice for storing large numeric arrays, even a whole collection in one file... – Albert Retey May 08 '15 at 14:50
  • @YvesKlett The actual data is around 250 MB and contains over 10^6 lines of numbers. A small working example will defeat the purpose of this question as I can very well import a small data list using either ReadList or Import. – HuShu May 08 '15 at 19:07
  • @AlbertRetey No, it's generated by an N-body-simulation-like code. – HuShu May 08 '15 at 19:08
  • and the code writes the presumably numeric results into text files? Can you change that and make that code save something else? It certainly is possible to read in these text files, but it might in total consume more time than changing the data format (in case that is possible)... – Albert Retey May 08 '15 at 21:12
  • @SjoerdC.deVries I have been trying to break my code into cells; after I evaluate one cell and pass the result to a variable in the next cell, I clear the variable from the previous cell. But this doesn't seem to work: my RAM consumption keeps increasing even though I clear the unneeded variable. Is there a way out? – HuShu May 08 '15 at 22:30
  • @AlbertRetey I have the output files in .out format, but the problem is that each output file has some unnecessary text which I want to remove; that's why all this hassle. Perhaps I could go into the simulation itself and modify it so that it doesn't output that text. Hmmm! – HuShu May 08 '15 at 22:33
  • Concerning memory: do you know about $HistoryLength = 0? If you don't set that, Mathematica will remember all output you ever generated in a session. Concerning import: if changing the simulation, I'd consider another format for your files; text is just not a very good encoding for numeric data. Other than that, you might find this question and its answers helpful. If sticking with text, you can read lines with ReadList and the type String instead of BinaryReadList as in the answer... (See the sketches after these comments.) – Albert Retey May 08 '15 at 23:34
  • A MWE would still be tremendously useful to show and test alternative approaches. Those you could benchmark with your data and give feedback on their performance. But without the exact specs for your files this depends on too much speculation. – Yves Klett May 09 '15 at 08:43
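Putting together the suggestions from the comments (block-wise reading with ReadList, explicit clearing of variables, and $HistoryLength), here is a minimal sketch. It assumes the files contain only whitespace-separated numbers; the helper name blockTotal and the block size of 10^5 are illustrative placeholders, and the running total stands in for whatever per-block processing is actually needed:

$HistoryLength = 0; (* stop the kernel from keeping every Out[...] alive *)

(* Stream a numeric text file in blocks of 10^5 numbers using ReadList's
   third argument, accumulating a result so the whole file never sits in
   memory at once. ReadList returns {} when the stream is exhausted. *)
blockTotal[path_String] := Module[{strm = OpenRead[path], block, tot = 0.},
  While[(block = ReadList[strm, Number, 100000]) =!= {},
   tot += Total[block]];
  Close[strm];
  tot]

Once a result has been extracted from a large variable, release the raw data explicitly with Rawdata =. or ClearAll[Rawdata].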
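If the data are written and read only by Mathematica, the binary "MX" format mentioned in the comments avoids text parsing entirely and preserves packed arrays. A small sketch, with "data.mx" as a placeholder file name (note that MX files are generally not portable across Mathematica versions or platforms):

data = RandomReal[1, {1000, 1000}];
Export["data.mx", data];     (* fast binary export *)
data2 = Import["data.mx"];   (* reloads as the same packed array *)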

1 Answer


If your data is or can be coerced into a rectangular array of machine Integer, Real, or Complex numbers (all the same type), then you can use ToPackedArray to reduce the amount of memory required, as well as make operations on the data faster. One might have hoped that Import would do this automatically, but it doesn't.

You can use ByteCount to see how much memory a variable is consuming.

ToPackedArray, as well as PackedArrayQ and FromPackedArray, lives in the Developer` context, so evaluate Needs["Developer`"] to load them (ByteCount is an ordinary built-in function).

Simple example:

Needs["Developer`"]
dat=ExportString[RandomReal[1,{1000,1000}],"Table"];
ImportString[dat,"Table"]//ByteCount
24208200
ImportString[dat,"Table"]//ToPackedArray//ByteCount
8000152
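As a quick check, not part of the original output above, PackedArrayQ (already available after loading Developer`) confirms that the conversion actually produces a packed array:

PackedArrayQ[ImportString[dat, "Table"]]
(* False *)
PackedArrayQ[ToPackedArray[ImportString[dat, "Table"]]]
(* True *)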
Mark Adler