
I'm trying to import a small (15 MB) CSV file, and the Import command is either not responding or responding very slowly. When I watch the memory usage of the Mathematica kernel after starting the command, it climbs to some maximum and then stalls there, unchanged, for several minutes.

I assumed a file this small would import without any real issues. The data I'm trying to import is seven rows, each with 721 columns, where each entry is a list of 148 points.

I've tried this technique

<< JLink`;  
InstallJava[];  
ReinstallJava[JVMArguments -> "-Xmx1024m"]

where I varied the number at the end of the -Xmx argument, but I didn't see any improvement. Maybe I'm misunderstanding what's bogging it down.

I can successfully use Import on image files and on other CSV files. Those files are two orders of magnitude smaller, though.

m_goldberg
  • To make sure the problem comes from the size, you could split the CSV file into two or three parts and try importing them. Could you share the file, or a part of the file? – anderstood Nov 15 '16 at 18:06
  • @anderstood That's a good point. I'll try that after it finishes running this other thing that may take a couple hours. – Jordan Watkins Nov 15 '16 at 18:11
  • Just in case you are not aware of this: if the other computations don't use all the cores of your PC, you could open a new Mathematica window and experiment in this one (both kernels will be independent). – anderstood Nov 15 '16 at 18:14
  • If the file is all numbers, you can read it like this: f = OpenRead["test.csv"];ReadList[f, Number, RecordSeparators -> {","}];Close[f]; this is typically much faster than Import. – george2079 Nov 15 '16 at 19:13
  • What do you mean by "each entry is 148 points"? – george2079 Nov 15 '16 at 19:17
  • @anderstood I'm not very well-versed in assessments of that sort. As I understand it, it will not allow for simultaneous kernels in different notebooks, but I admit that I have not tried. Maybe it does use all the cores, I never set it to do that but I also never told it not to. – Jordan Watkins Nov 15 '16 at 19:24
  • @george2079 what I mean is that each cell in the CSV is a list of 148 data points: {0.1,0.32,0.41,<<142>>,0.29,0.16,0.09}. – Jordan Watkins Nov 15 '16 at 19:24
  • It is not a proper CSV if it's nested 3 deep. I suspect Import is treating each of those 148-point sublists as a string. – george2079 Nov 15 '16 at 19:35
  • Is this correct? 7 lines, each with 106708 numbers separated by commas and grouped in sets of 148 by {}, i.e. {1,2,3...148},{...},{...} – george2079 Nov 15 '16 at 19:46
  • @george2079 It is 3 deep. Let's say 7 BW images each 721 pixels high and 148 pixels wide where each pixel is represented by their intensity data. – Jordan Watkins Nov 15 '16 at 20:27
  • Then you need to "Flatten" it: you should just have lines of comma-separated numbers (not lists). You can always go back to the original 3-deep list using Partition afterwards. For example, if one line is {1,2,3},{4,5,6}, you could flatten it to 1,2,3,4,5,6, export it, import it back and partition it as {1,2,3},{4,5,6}. – anderstood Nov 15 '16 at 20:45

1 Answer


This should work. I don't know how fast it will be on a big file.

test file:

f = OpenWrite["test.csv"];
WriteLine[f, "{1,2,3},{4,5,6}"];
WriteLine[f, "{7,8,9},{10,11,12}"];
Close[f];

Read the lines as strings, wrap each in {} and convert to expressions:

f=OpenRead["test.csv"];
data=ToExpression["{"<>#<>"}"&/@ReadList[f,String]]
Close[f];

{{{1, 2, 3}, {4, 5, 6}}, {{7, 8, 9}, {10, 11, 12}}}

Another approach (here you need to know the row and sublist lengths in advance so you can partition the flat list):

f = OpenRead["test.csv"];
rowlen = 6;
sublen = 3;
data = Partition[#, sublen] & /@ Partition[
      ToExpression@ReadList[f, Record, NullRecords -> False,
      RecordSeparators -> {",", "}", "{", "\n", "\r\n", "\r"}], 
      rowlen];
Close[f];

Really, I'd avoid using that file format if you can.
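
If you control how the file is written, one alternative (along the lines of the Flatten/Partition suggestion in the comments) is to store plain numbers and rebuild the nesting after import. This is only a minimal sketch on a toy data set: the file name flat.csv, the variable names and the sizes are made up for illustration, and it assumes every row has the same sublist length.

```mathematica
(* Hypothetical toy data: 2 rows, each holding 2 sublists of 3 numbers *)
nested = {{{1, 2, 3}, {4, 5, 6}}, {{7, 8, 9}, {10, 11, 12}}};
sublen = 3;

(* Flatten each row so every CSV line is a plain run of numbers *)
file = FileNameJoin[{$TemporaryDirectory, "flat.csv"}];
Export[file, Flatten /@ nested, "CSV"];

(* Import now yields numbers rather than strings; restore the nesting per row *)
restored = Partition[#, sublen] & /@ Import[file, "CSV"];

(* Compare the round trip against the original *)
restored === nested
```

With real-valued data the comparison in the last line may need a tolerance, since the numbers are written out as text and read back.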

Note that, as I guessed, if you Import this file you get strings:

 Import["test.csv"] // InputForm

{{"{1, 2, 3}", "{4, 5, 6}"}, {"{7, 8, 9}", "{10, 11, 12}"}}

So you could use ToExpression@Import["test.csv"], provided Import succeeds.
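
As a rough illustration of that last step, the sketch below starts from the string output shown above instead of calling Import, converts each cell, and then checks the shape of the result. The names strs and data are just placeholders. Keep in mind that ToExpression evaluates whatever the strings contain, so this is only reasonable for files you trust.

```mathematica
(* Strings in the form Import returned for the test file above *)
strs = {{"{1, 2, 3}", "{4, 5, 6}"}, {"{7, 8, 9}", "{10, 11, 12}"}};

(* Convert each cell explicitly; level 2 is where the strings live *)
data = Map[ToExpression, strs, {2}];

(* Sanity checks: a full depth-3 array containing only numbers *)
Dimensions[data]
ArrayQ[data, 3, NumericQ]
```

For the toy input the dimensions are {2, 2, 3}; for the file described in the question you would expect {7, 721, 148} if every cell parsed cleanly.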

george2079