Is there a way to make ReadList ignore certain "comment lines" (say, all lines that start with #, or in general all lines that match a string pattern)? This would be very useful for many types of files.
george2079's answer works great for ignoring comment lines, but it uses Import to read in the whole file at once. Some files are too large to Import in one go, and then you have to read them line by line. Of course, for files where you can use Import, that approach will be faster, because it applies the test once to one big string instead of to each line one after the other.
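For comparison, here is a minimal sketch of the whole-file idea, following Szabolcs's first suggestion in the comments: read the file as one string, drop the comment lines, and let ReadList parse the rest from an in-memory stream. The names stripCommentLines and readListIgnoreCommentsInMemory are hypothetical. The sketch assumes numeric, whitespace-separated data, that the whole file fits in memory, and Mathematica 10.4 or later (for StringStartsQ).

```mathematica
(* Hypothetical helper: drop every line of text that starts with the comment marker *)
stripCommentLines[text_String, comment_String] :=
 StringRiffle[
  Select[StringSplit[text, "\n"], ! StringStartsQ[#, comment] &],
  "\n"]

(* Hypothetical reader: whole file -> filtered string -> in-memory stream -> ReadList *)
readListIgnoreCommentsInMemory[fname_String, comment_String] :=
 Module[{stream, result},
  stream = StringToStream[stripCommentLines[ReadString[fname], comment]];
  (* RecordLists -> True returns one sublist of numbers per line *)
  result = ReadList[stream, Number, RecordLists -> True];
  Close[stream];
  result]
```

Calling readListIgnoreCommentsInMemory["AFmW2BKQ.txt", "#"] should return the same kind of nested list as the line-by-line function, but with a single pass of string filtering instead of one ImportString call per line.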
So, should it ever be needed, here is the slower line-by-line method:
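readTableIgnoreComments[fname_, comment_] :=
 Module[{stream, input, list},
  stream = OpenRead[fname];
  input = ReadLine[stream];
  list = Flatten[Reap[
      While[input =!= EndOfFile,
       (* blank out any line that starts with the comment marker;
          ___ also matches a line that is only the marker *)
       input = StringReplace[input, StartOfLine ~~ comment ~~ ___ -> ""];
       (* parse each remaining non-empty line as one row of the table *)
       If[input =!= "",
        Sow[ImportString[input, "Table"][[1]]];
        ];
       input = ReadLine[stream];
       ];
      ][[2]], 1]; (* Flatten gives {} instead of an error if nothing was sown *)
  Close[stream];
  list];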
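Part of the slowness presumably comes from calling ReadLine and ImportString once per line. A possible middle ground is to read the file in blocks of n lines with ReadList[stream, String, n] (the form Jason B. mentions in the comments), then filter and parse each block with a single ImportString call. This is a sketch only: the name readTableIgnoreCommentsChunked is hypothetical, the default of 10000 lines per block is an arbitrary choice rather than a tuned value, and StringStartsQ needs Mathematica 10.4 or later.

```mathematica
(* Hypothetical variant: read n lines at a time instead of one *)
readTableIgnoreCommentsChunked[fname_String, comment_String, n_Integer: 10000] :=
 Module[{stream, lines, kept, chunks},
  stream = OpenRead[fname];
  chunks = Reap[
     (* ReadList[stream, String, n] returns {} once the end of the file is reached *)
     While[(lines = ReadList[stream, String, n]) =!= {},
      (* keep the non-empty lines that do not start with the comment marker *)
      kept = Select[lines, # =!= "" && ! StringStartsQ[#, comment] &];
      (* parse the whole block as a table in a single ImportString call *)
      If[kept =!= {}, Sow[ImportString[StringRiffle[kept, "\n"], "Table"]]]
      ]
     ][[2]];
  Close[stream];
  (* chunks is {} if nothing was sown, otherwise {{table1, table2, ...}} *)
  Join @@ Flatten[chunks, 1]]
```

Only n lines are held in memory at a time, so n trades memory use against the number of ImportString calls.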
Say you have a data file such as AFmW2BKQ.txt:
readTableIgnoreComments["AFmW2BKQ.txt", "#"]
(*{{13.2177,6.30967,5.20213,5.93021},
{0.350016,13.4168,4.70314,6.16283},{3.61517,2.2622,13.4662,8.27606},
{9.82748,9.46039,0.894798,2.29597},{11.3524,5.98368,1.55609,2.20832},
{9.59792,3.70512,13.2309,12.156},{3.11005,11.4154,3.06589,11.9867},
{8.46262,9.44805,3.81819,12.8846},{13.7071,3.44253,4.13362,1.14301},
{0.353064,11.238,7.46061,7.04745},{5.24699,10.2069,4.79834,7.86099},
{7.90967,1.95548,6.44391,6.85132},{12.2419,7.94127,12.8604,10.4504},
{5.58408,6.49862,10.0892,10.2229},{5.5434,4.1264,13.0629,12.3711},
{5.35179,10.4674,1.39775,10.0056},{5.90251,12.3466,1.8162,5.9312},
{10.0368,3.42365,13.5114,10.938},{9.3393,4.55733,7.98305,1.01929},
{9.68279,9.00243,6.19094,0.482091},{8.86858,5.26325,2.35884,7.05454},
{4.37432,8.51505,3.90883,0.380504},{4.86367,9.19055,3.04116,10.9041},
{13.1294,7.39576,8.72494,1.72672},{12.5197,7.75693,9.2014,6.95952},
{8.95084,5.61415,12.6574,9.6697},{0.707468,3.96087,1.08438,10.7936},
{13.5005,3.14536,4.87679,6.00281},{6.74514,11.6024,2.23439,4.34998},
{7.34057,5.99825,1.02762,3.7478},{4.14484,6.2788,13.2522,4.1299},
{1.467,8.05903,8.09584,7.55979},{1.68094,2.35345,6.66405,4.4644},
{2.51467,8.88769,5.72158,6.80248},{13.6002,3.71197,2.81909,1.05188}}*)
As george2079 and Szabolcs pointed out, if you are lucky enough to be running an operating system with sed, then by far the fastest way to do this is to let sed delete the comment lines first:
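readListIgnoreComments[fname_, comment_] :=
 Module[{tmp, list},
  (* put the filtered copy in the temporary directory, so fname may contain a path *)
  tmp = FileNameJoin[{$TemporaryDirectory, "TEMP_" <> FileNameTake[fname]}];
  (* delete every line starting with comment (interpreted as a sed regex) *)
  Run["sed -e '/^" <> comment <> "/d' '" <> fname <> "' > '" <> tmp <> "'"];
  (* one sublist of numbers per line, as with readTableIgnoreComments;
     plain ReadList[tmp] would read "1 2 3" as the expression 1*2*3 *)
  list = ReadList[tmp, Number, RecordLists -> True];
  DeleteFile[tmp];
  list
  ]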
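Szabolcs's second suggestion avoids the temporary file altogether: when the file name passed to ReadList starts with !, ReadList runs it as an external command and reads that command's output. A minimal sketch, assuming a Unix-like system with sed on the path, a file name without single quotes, and a comment marker without sed regex metacharacters (otherwise they must be escaped). The name readListIgnoreCommentsPipe is hypothetical.

```mathematica
(* Hypothetical pipe-based reader: sed deletes the comment lines and ReadList
   parses sed's standard output directly, so no temporary file is written *)
readListIgnoreCommentsPipe[fname_String, comment_String] :=
 ReadList["!sed -e '/^" <> comment <> "/d' '" <> fname <> "'",
  Number, RecordLists -> True]
```

Because ReadList opens and closes the pipe itself, there is no cleanup step to forget.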
ReadList, reading in the whole file at once via ReadList["file"] or reading in a certain number of lines each time via ReadList["file",types,n]? – Jason B. Jan 25 '16 at 13:14
ReadList["file"], but maybe ReadList["file",types,n] is more convenient to ignore comments? – a06e Jan 25 '16 at 13:16
ReadList["file"], then can't you simply remove the elements that match the pattern? – Jason B. Jan 25 '16 at 13:18
ReadList supports this. Two ideas: 1. read the whole file as a string, filter the comments, convert the string to a stream, and read from there; 2. use sed to strip the comments and pipe the output to Mathematica (i.e. ReadList["!sed ..."], where you'll need to look up the correct arguments to sed). – Szabolcs Jan 25 '16 at 14:04
Import to grab the whole file at once. For large files (hundreds of megabytes), Import isn't practical and you have to resort to functions like Read. I have a solution for this that reads in each line and applies your ImportString@StringReplace..... to it, but it's terribly slow. – Jason B. Jan 25 '16 at 15:09
sed or grep. You can likely do that inline with something like Import["!sed .... file" , ..]. – george2079 Jan 25 '16 at 15:17