I would like to read really large text files (about 2 GB) into Mathematica. The first row and the first column of each file are text, i.e. strings; everything else is numbers. Is it possible to take advantage of this structure to make the import faster? Currently I use Import["file.txt","Table"], which takes a long time. I would like the output list to be the same as the one returned by that Import command.

Thanks for your help.

rcollyer
preeti
  • Related: http://mathematica.stackexchange.com/questions/5179/how-to-read-data-file-quickly – cormullion Dec 04 '12 at 19:53
  • Related: http://mathematica.stackexchange.com/questions/36/file-backed-lists-variables-for-handling-large-data – Eli Lansey Dec 04 '12 at 19:58
  • Related: http://stackoverflow.com/questions/2370570/way-to-deal-with-large-data-files-in-wolfram-mathematica – Eli Lansey Dec 04 '12 at 19:58
  • Import is often about the slowest option available, particularly for tabular data. BinaryRead and BinaryReadList can be much faster, and Java can give you still faster reads when you use buffered reads – Leonid Shifrin Dec 04 '12 at 20:06
  • The three links above, especially the third one, are related but not exactly what I need. I don't have a problem with memory; speed is my only concern. What I was looking for is something like this: if I could tell Mathematica that those fields are going to be numbers, would that save time? In R, for example, it helps a lot. @LeonidShifrin can you please explain your answer? Sorry, I am not able to understand it. – preeti Dec 04 '12 at 20:39
  • Do you mean my comment or the answer I linked to? – Leonid Shifrin Dec 04 '12 at 20:58
  • Your comment (that's what I meant, especially using BinaryReadList[]). As for the answer, how do I do it when I call Mathematica using -noprompt -script "test.m"? Thanks – preeti Dec 04 '12 at 21:00
  • Well, I think the answer you refer to should be pretty clear; I gave a working example there. As for this comment: I meant to say that BinaryReadList can give you more speed than Import - you can see one of the links in the comments above for an example. Also, I linked to another answer of mine, where I used Java to get the file read into Mathematica 50x faster than with BinaryReadList. That answer was tailored to the specific question, but it shows how to do that. – Leonid Shifrin Dec 04 '12 at 21:12
  • @LeonidShifrin if you can make your comment an answer, I can see if it works, and if it does, I can choose it as the best answer. Thanks. – preeti Dec 04 '12 at 22:12
  • But that's really not an answer; I don't answer anything specifically. Here is my advice: narrow down the question, provide a sample of the file format, provide a download link to a sample file, and then perhaps someone will come up with a more specific answer. But first, try all those suggestions. If they work and you get an answer that is reasonably general (so that it could be useful for future visitors), post your own answer and accept it, if no better one appears. – Leonid Shifrin Dec 04 '12 at 22:18

1 Answer

ReadList and streams opened with OpenRead are your friends (also OpenAppend if you want to append to the file). Streams opened with OpenRead use very low-level native I/O methods, which is about as fast as you can get. ReadList is also much faster than Import, which has to load a Java package internally on its first invocation before it can do anything.
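
As a minimal sketch of this approach for the layout described in the question: read the header line as a string, then let ReadList parse every remaining row as one Word followed by Numbers. The file name and sample contents are made up for illustration, and the sketch assumes whitespace-separated fields, a header with one entry per column (including the label column), and no missing values.

```mathematica
(* Hypothetical sample file: a header row of strings, then one string label and numbers per row *)
file = FileNameJoin[{$TemporaryDirectory, "readlist_demo.txt"}];
out = OpenWrite[file];
WriteString[out, "id a b c\nr1 1 2.5 3\nr2 4 5 6.5\n"];
Close[out];

stream = OpenRead[file];

(* Read the first line as a single string and split it into column names *)
header = StringSplit[Read[stream, String]];

(* Assumption: every data row has one label plus (Length[header] - 1) numbers *)
rowTypes = Join[{Word}, ConstantArray[Number, Length[header] - 1]];

(* Telling ReadList the type of each field lets it parse the numbers directly *)
rows = ReadList[stream, rowTypes];
Close[stream];

(* Put the header back on top so the result has the same shape as a table import *)
data = Prepend[rows, header];
```

For a file like this, data can be compared against Import[file, "Table"] with === to check that the two results agree before switching over on the 2 GB files.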

Andreas Lauschke