Is there a way to make ReadList ignore certain "comment lines" (say, all lines that start with #, or in general all lines that match a string pattern)? This would be very useful for many types of files.

a06e
  • How are you using ReadList, reading in the whole file at once via ReadList["file"] or reading in a certain number of lines each time via ReadList["file",types,n]? – Jason B. Jan 25 '16 at 13:14
  • @JasonB Up to now I've done ReadList["file"], but maybe ReadList["file",types,n] is more convenient to ignore comments? – a06e Jan 25 '16 at 13:16
  • So if you've read in the whole file using ReadList["file"], then can't you simply remove the elements that match the pattern? – Jason B. Jan 25 '16 at 13:18
  • I don't think ReadList supports this. Two ideas: 1. read the whole file as a string, filter the comments, convert the string to a stream, read from there 2. use sed to strip the comments, pipe the output to Mathematica (i.e. ReadList["!sed ..."] where you'll need to look up the correct arguments to sed). – Szabolcs Jan 25 '16 at 14:04
  • possible dup : http://mathematica.stackexchange.com/q/50718/2079 – george2079 Jan 25 '16 at 14:25
  • @george2079 not a duplicate. The comments may be in the middle of the file. – a06e Jan 25 '16 at 14:31
  • I believe the answers over there address that. – george2079 Jan 25 '16 at 14:36
  • @george2079 I would just point out that that answer uses Import to grab the whole file at once. For large files (hundreds of megabytes), Import isn't practical and you have to resort to functions like Read. I have a solution for this that reads in each line and applies your ImportString@StringReplace..... to it, but it's terribly slow. – Jason B. Jan 25 '16 at 15:09
  • The fastest thing likely is to use sed or grep. You can likely do that inline with something like Import["!sed .... file" , ..]. – george2079 Jan 25 '16 at 15:17

1 Answer

george2079's answer works great for ignoring comment lines, but it uses Import to read in the whole file at once. Some files are too large to Import in one go, so you have to read them line by line. Of course, for files where you can use Import, that will be faster, because the test is applied once to one big string instead of to each line in turn.
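
For comparison, here is a minimal sketch of the whole-file approach for files that do fit in memory. It is not george2079's actual code; the name readAllIgnoreComments is made up for illustration, and it assumes Mathematica 10.1 or later for StringStartsQ and StringRiffle.

```mathematica
(* Hypothetical helper, for illustration only:
   read every line, drop the comment lines, then parse the rest in one go. *)
readAllIgnoreComments[fname_String, comment_String] :=
 Module[{lines, kept},
  (* one string per line of the file *)
  lines = ReadList[fname, String];
  (* keep only the lines that do not start with the comment string *)
  kept = Select[lines, ! StringStartsQ[#, comment] &];
  (* rejoin the surviving lines and parse them as a table *)
  ImportString[StringRiffle[kept, "\n"], "Table"]
  ]
```

Calling readAllIgnoreComments["file.txt", "#"] should return the data rows as a list of lists. Everything is held in memory at once, which is exactly the limitation described above.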

So should it ever be needed, here is this slower method,

readTableIgnoreComments[fname_, comment_] := 
  Module[{stream, input, list},
   stream = OpenRead[fname];
   input = ReadLine[stream];
   list = Reap[
      While[input =!= EndOfFile,
        (* blank out any line that starts with the comment string *)
        input = StringReplace[input, StartOfLine ~~ comment ~~ ___ -> ""];
        (* parse only the lines that still have content *)
        If[input =!= "",
         Sow[ImportString[input, "Table"][[1]]];
         ];
        input = ReadLine[stream];
        ];
      ][[2, 1]];
   Close[stream];
   list];

Say you have a data file, like this one.

readTableIgnoreComments["AFmW2BKQ.txt", "#"]

(*{{13.2177,6.30967,5.20213,5.93021},
{0.350016,13.4168,4.70314,6.16283},{3.61517,2.2622,13.4662,8.27606},
{9.82748,9.46039,0.894798,2.29597},{11.3524,5.98368,1.55609,2.20832},
{9.59792,3.70512,13.2309,12.156},{3.11005,11.4154,3.06589,11.9867},
{8.46262,9.44805,3.81819,12.8846},{13.7071,3.44253,4.13362,1.14301},
{0.353064,11.238,7.46061,7.04745},{5.24699,10.2069,4.79834,7.86099},
{7.90967,1.95548,6.44391,6.85132},{12.2419,7.94127,12.8604,10.4504},
{5.58408,6.49862,10.0892,10.2229},{5.5434,4.1264,13.0629,12.3711},
{5.35179,10.4674,1.39775,10.0056},{5.90251,12.3466,1.8162,5.9312},
{10.0368,3.42365,13.5114,10.938},{9.3393,4.55733,7.98305,1.01929},
{9.68279,9.00243,6.19094,0.482091},{8.86858,5.26325,2.35884,7.05454},
{4.37432,8.51505,3.90883,0.380504},{4.86367,9.19055,3.04116,10.9041},
{13.1294,7.39576,8.72494,1.72672},{12.5197,7.75693,9.2014,6.95952},
{8.95084,5.61415,12.6574,9.6697},{0.707468,3.96087,1.08438,10.7936},
{13.5005,3.14536,4.87679,6.00281},{6.74514,11.6024,2.23439,4.34998},
{7.34057,5.99825,1.02762,3.7478},{4.14484,6.2788,13.2522,4.1299},
{1.467,8.05903,8.09584,7.55979},{1.68094,2.35345,6.66405,4.4644},
{2.51467,8.88769,5.72158,6.80248},{13.6002,3.71197,2.81909,1.05188}}*)

As george2079 and Szabolcs pointed out, if you are lucky enough to be running an operating system with sed, then by far the fastest way to do this is to have sed delete the comment lines before Mathematica reads the file (here via a temporary file),

readListIgnoreComments[fname_, comment_] := Module[{list},
  Run["sed -e '/^" <> comment <> ".*/d' " <> fname <> " > TEMP_" <> 
    fname];
  list = ReadList["TEMP_" <> fname];
  DeleteFile["TEMP_" <> fname];
  list
  ]
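
Szabolcs's comment suggests piping sed's output straight into ReadList instead of going through a temporary file. A rough sketch of that idea follows; the name readListPipeIgnoreComments and the optional types argument are my own additions. It assumes a Unix-like shell with sed on the PATH, a file name without single quotes, and a comment string with no characters that are special in a sed regular expression.

```mathematica
(* Hypothetical variant, for illustration only:
   read sed's output through a pipe, so no temporary file is written.
   Assumes a Unix-like shell, sed on the PATH, no single quotes in fname,
   and no regex metacharacters in comment. *)
readListPipeIgnoreComments[fname_String, comment_String,
  types_: Expression] :=
 ReadList[
  (* a leading "!" tells ReadList to run the rest as an external command *)
  "!sed -e '/^" <> comment <> "/d' '" <> fname <> "'",
  types]
```

With the default types this reads expressions, just like ReadList["file"]. For a purely numeric table, something like readListPipeIgnoreComments["file.txt", "#", Table[Number, {4}]] reads four numbers per record.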
Jason B.