Request help for reading text data files with headers and 2 sections

Question

I just started with Mathematica.

I need it for many purposes, but the important one is to read a text data file and extract column values as variables.

I have successfully tested the tab delimited data format. Now I have a problem with a file with this format:

15 lines of comments, with spaces and blank lines.
7 columns and 100 lines; the columns have headers.
The above is repeated after 3 blank lines.

Any comments and suggestions are welcome.

Hi belisarius, thank you. Yes, I tested the Import function but with no success. Here you can find my data: https://www.dropbox.com/s/82n3dubfwbstns5/exampleDataFile.txt Best regards — user6726, Apr 13 '13 at 16:41
Please find the link to my data: https://www.dropbox.com/s/82n3dubfwbstns5/exampleDataFile.txt — user6726, Apr 13 '13 at 16:43
Andre, Please can you comment or give some details about your code? — user6726, Apr 13 '13 at 16:45
I cannot seem to download the file; I get only: Loading... Can anyone else? — Mr.Wizard, Apr 13 '13 at 16:48
https://www.dropbox.com/s/82n3dubfwbstns5/exampleDataFile.txt This link will work if you copy and paste it — user6726, Apr 13 '13 at 17:01

score 3 · Answer 1 · edited Apr 13 '17 at 12:55

Preamble: input file

[image: input file]

Solution

Here is how I would do it:

Import[filename, "Text"] // (* import the whole file as a single string *)
  StringSplit[#, "\n\n\n"] & // (* split wherever there are 3 consecutive newlines -> separate blocks *)
  (StringSplit[#, "\n"] & /@ # &) // (* split each block at every newline -> separate lines in each block *)
  (Drop[#, 15] & /@ # &) // (* drop the first 15 lines (the comments) of each block *)
  (Drop[#, 1] & /@ # &) // (* drop the next line -> remove the column headers *)
  Map[StringSplit[#, "\t"] &, #, {2}] & // (* split each line (level 2) at every tab -> separate numbers in each line *)
  Map[StringReplace[#, " " -> ""] &, #, {2}] & (* remove stray spaces from the fields of each line (level 2) *)

At this point each number is still a String. You can use ImportString["8.17464362e-04", "Table"] to get the number (it comes back wrapped in a nested list), or see this question.
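As a minimal sketch of that conversion step, assuming the result has the same nesting as above (blocks, then lines, then fields), each string can be mapped to a number at level 3. The helper name toNumber and the sample strings are made up for illustration:

```mathematica
(* Hypothetical sample with the same nesting as the result above: blocks -> lines -> fields *)
strings = {{{"1.00000000e+02", "4.12318000e+01"}, {"1.58489000e+02", "4.12318000e+01"}}};

(* ImportString[..., "Table"] returns a nested list, so take part [[1, 1]] to get the bare number *)
toNumber[s_String] := ImportString[s, "Table"][[1, 1]]

(* The strings sit at level 3 of the nested list *)
numbers = Map[toNumber, strings, {3}]
```

Calling ImportString once per field is not fast, but it keeps the conversion in line with the suggestion above.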

To understand my code, try:
- first Import[filename, "Text"]
- then Import[filename, "Text"] // StringSplit[#, "\n\n\n"] &
- then Import[filename, "Text"] // StringSplit[#, "\n\n\n"] & // (StringSplit[#, "\n"] & /@ # &)
- etc.
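The same step-by-step exploration can be tried without the data file. The sketch below uses a made-up miniature string (2 comment lines, 1 header line, 2 columns, blocks separated by 3 blank lines), so the line counts differ from the real file:

```mathematica
(* Hypothetical miniature of the file; 3 blank lines between blocks means 4 consecutive newlines *)
text = "comment 1\ncomment 2\nx\ty\n1.0\t2.0\n3.0\t4.0\n\n\n\ncomment 1\ncomment 2\nx\ty\n5.0\t6.0\n7.0\t8.0";

blocks = StringSplit[text, "\n\n\n"]                  (* one string per block *)
lines = StringSplit[#, "\n"] & /@ blocks              (* one list of lines per block *)
rows = Drop[#, 3] & /@ lines                          (* drop 2 comment lines + 1 header line *)
fields = Map[StringSplit[#, "\t"] &, rows, {2}]       (* one list of string fields per line *)
```

Evaluating the lines one at a time shows how each stage changes the nesting.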

Edit

The same code, in a more classical coding style:

fullText = Import[filename, "Text"];
splitLevel1 = StringSplit[fullText, "\n\n\n"];
splitLevel2 = Map[StringSplit[#, "\n"] &, splitLevel1];
splitLevel2WithoutHeadLines = Map[Drop[#, 16] &, splitLevel2];
splitLevel3 = Map[StringSplit[#, "\t"] &, splitLevel2WithoutHeadLines, {2}];
splitLevel3WithoutWhiteCharacters = Map[StringReplace[#, " " -> ""] &, splitLevel3, {2}]

Verification

[image: verification output]

Hi Andre and Wizard, thank you very much for your very fast and efficient help! I have two working solutions, which can be very helpful! — user6726, Apr 13 '13 at 17:28
Hi Andre, thank you again. It works well and the code is clear! How do I plot column 2 vs column 1? — user6726, Apr 13 '13 at 19:48

score 1 · Answer 2 · edited Apr 13 '17 at 12:55

This is not likely to be particularly fast, but it is concise:

Cases[
 Import["exampleDataFile.txt", "Table"],
 {__?NumberQ}
]

Should there be false matches you can use the known number of columns:

Cases[
 Import["exampleDataFile.txt", "Table"],
 {Repeated[_?NumberQ, {7}]}
]
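To illustrate what a false match looks like, here is a small sketch with made-up rows (3 columns instead of 7). A row that happens to contain only numbers, such as a lone value in the comment section, passes the first pattern but not the fixed-width one:

```mathematica
(* Hypothetical imported rows: a text row, a stray numeric-only row, and two data rows *)
rows = {{"Date", "April"}, {15}, {100., 41.2, -89.9}, {158.5, 41.2, -89.9}};

loose = Cases[rows, {__?NumberQ}]                  (* also picks up the stray row {15} *)
strict = Cases[rows, {Repeated[_?NumberQ, {3}]}]   (* only rows with exactly 3 numbers *)
```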

If you need to separate each block of numeric data you might use SplitBy as follows:

dat = Import["exampleDataFile.txt", "Table"];

SplitBy[dat, MatchQ[#, {Repeated[_, {7}]?NumberQ}] &][[2 ;; -1 ;; 2]]
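A minimal sketch of how the split works and how columns can then be pulled out of one block, using a made-up miniature table with 3 columns. The names numericRowQ, runs, blocks, col1 and col2 are only for illustration:

```mathematica
(* Hypothetical miniature of the imported table: text rows, then numeric rows, twice *)
dat = {
   {"#", "comment"}, {"f", "Z", "phase"},
   {100., 41.2, -89.9}, {158.5, 41.2, -89.9},
   {}, {"#", "comment"}, {"f", "Z", "phase"},
   {251.2, 41.2, -89.9}, {398.1, 41.2, -89.9}
   };

(* Rows are grouped into runs according to whether they consist of exactly 3 numbers *)
numericRowQ = MatchQ[#, {Repeated[_?NumberQ, {3}]}] &;
runs = SplitBy[dat, numericRowQ];

(* The table starts with text, so the numeric runs sit at the even positions *)
blocks = runs[[2 ;; -1 ;; 2]];

(* Columns 1 and 2 of the first block *)
col1 = blocks[[1, All, 1]];
col2 = blocks[[1, All, 2]];

(* Plot column 2 against column 1 *)
ListLinePlot[Transpose[{col1, col2}]]
```

Each element of blocks is one table with a row per data line, so blocks[[2]] gives the second set in the same way.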

If any of these methods do not work on your full set please consider uploading a second example data file.

If the rest of the file is as regular as this appears you could try using Read and related functions for greater performance, as shown here. For example:

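(* skip swallows the strings it is given, so the skipped text leaves nothing in the result *)
skip[__String] = Sequence[];

line = Table[Number, {7}];                (* one data row: 7 numbers *)
drop = skip @@ Table[Record, {12}];       (* 12 text records to discard before each block *)
spec = Join[{drop}, Table[line, {21}]];   (* per block: discard the text, then read 21 rows *)

ReadList["exampleDataFile.txt", spec]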

{{{100., 41.2318, 1.31655*10^-9, 1.96161, -38600., 38600., -89.9971}, {158.489, 41.2318, 
   3.22439*10^-9, 1.91259, -24355., 24355., -89.9955}, {251.189, 41.2318, 8.01648*10^-9, 
   1.89304, -15367., 15367., -89.9929}, {398.107, 41.2318, 2.00525*10^-8, 
   1.88515, -9695.9, 9695.9, -89.9889}, {630.957, 41.2316, 5.02786*10^-8, 
   1.88175, -6117.72, 6117.72, -89.9824}, {1000., 41.2313, 1.26158*10^-7, 
   1.87975, -3860.05, 3860.05, -89.9721}, {1584.89, 41.2305, 3.16473*10^-7, 
   1.87732, -2435.57, 2435.57, -89.9558}, {2511.89, 41.2286, 7.92735*10^-7, 
   1.87228, -1536.81, 1536.81, -89.9302}, {3981.07, 41.2238, 1.97791*10^-6, 
   1.86016, -969.775, 969.777, -89.8901}, {6309.57, 41.2119, 4.88673*10^-6, 
   1.83067, -612.059, 612.062, -89.8286}, {10000., 41.1839, 0.0000117939, 
   1.7613, -386.441, 386.445, -89.7389}, {15848.9, 41.1232, 0.0000270225, 
   1.61129, -244.183, 244.188, -89.6219}, {25118.9, 41.0111, 0.0000560277, 
   1.33723, -154.485, 154.491, -89.5041}, {39810.7, 40.8533, 0.000100044, 
   0.957928, -97.8478, 97.8525, -89.4391}, {63095.7, 40.6927, 0.000153468, 
   0.589637, -61.9818, 61.9846, -89.455}, {100000., 40.563, 0.000214301, 
   0.329894, -39.2338, 39.2351, -89.5182}, {158489., 40.4627, 0.000285002, 
   0.175532, -24.8167, 24.8173, -89.5947}, {251189., 40.3884, 0.000360826, 
   0.0887994, -15.6873, 15.6876, -89.6757}, {398107., 40.3415, 0.000444295, 
   0.0436315, -9.90969, 9.90978, -89.7477}, {630957., 40.315, 0.000569237, 
   0.0222841, -6.25674, 6.25678, -89.7959}, {1.*10^6, 40.2996, 0.000817464, 
   0.0127498, -3.94925, 3.94927, -89.815}}, {{100., 41.2318, 1.31655*10^-9, 
   1.96161, -38600., 38600., -89.9971}, {158.489, 41.2318, 3.22439*10^-9, 
   1.91259, -24355., 24355., -89.9955}, {251.189, 41.2318, 8.01648*10^-9, 
   1.89304, -15367., 15367., -89.9929}, {398.107, 41.2318, 2.00525*10^-8, 
   1.88515, -9695.9, 9695.9, -89.9889}, {630.957, 41.2316, 5.02786*10^-8, 
   1.88175, -6117.72, 6117.72, -89.9824}, {1000., 41.2313, 1.26158*10^-7, 
   1.87975, -3860.05, 3860.05, -89.9721}, {1584.89, 41.2305, 3.16473*10^-7, 
   1.87732, -2435.57, 2435.57, -89.9558}, {2511.89, 41.2286, 7.92735*10^-7, 
   1.87228, -1536.81, 1536.81, -89.9302}, {3981.07, 41.2238, 1.97791*10^-6, 
   1.86016, -969.775, 969.777, -89.8901}, {6309.57, 41.2119, 4.88673*10^-6, 
   1.83067, -612.059, 612.062, -89.8286}, {10000., 41.1839, 0.0000117939, 
   1.7613, -386.441, 386.445, -89.7389}, {15848.9, 41.1232, 0.0000270225, 
   1.61129, -244.183, 244.188, -89.6219}, {25118.9, 41.0111, 0.0000560277, 
   1.33723, -154.485, 154.491, -89.5041}, {39810.7, 40.8533, 0.000100044, 
   0.957928, -97.8478, 97.8525, -89.4391}, {63095.7, 40.6927, 0.000153468, 
   0.589637, -61.9818, 61.9846, -89.455}, {100000., 40.563, 0.000214301, 
   0.329894, -39.2338, 39.2351, -89.5182}, {158489., 40.4627, 0.000285002, 
   0.175532, -24.8167, 24.8173, -89.5947}, {251189., 40.3884, 0.000360826, 
   0.0887994, -15.6873, 15.6876, -89.6757}, {398107., 40.3415, 0.000444295, 
   0.0436315, -9.90969, 9.90978, -89.7477}, {630957., 40.315, 0.000569237, 
   0.0222841, -6.25674, 6.25678, -89.7959}, {1.*10^6, 40.2996, 0.000817464, 
   0.0127498, -3.94925, 3.94927, -89.815}}}

The repetition counts in Table will need to be adjusted to match your data. If these may vary, the Import method is preferable.

@user6726 does Import["exampleDataFile.txt", "Table"] import correctly? (By correctly I just mean all the data there, but not formatted/extracted correctly.) — Mr.Wizard, Apr 13 '13 at 17:13
Sorry for my previous comment! I set my directory correctly now and get data with your second code! Now I am trying to understand the output! — user6726, Apr 13 '13 at 17:21
It seems that I have the whole data in one big list: a nested list? How to extract colon data? — user6726, Apr 13 '13 at 17:24
@user6726 I don't know what you mean by "colon data." The Import method puts all the data in one table of seven columns, whereas the ReadList method is producing a list of tables (tensor). Is one of these correct, or do you need something else? — Mr.Wizard, Apr 13 '13 at 17:28
@Mr.Wizard Assuming these aren't medical data, perhaps it's "Column Data"? — cormullion, Apr 13 '13 at 17:34
@user6726 I added another method to my answer, using SplitBy -- there are many ways to approach a problem like this and it's really a matter of what you want and what the full file looks like. — Mr.Wizard, Apr 13 '13 at 17:58
It is Column. The data is organized in sets of columns and lines. For each set there are 7 columns and 21 lines (the number of lines is not fixed; it can be more than 21). In the present example there are 2 sets. So my question is: how can I separate the sets? Ideally I would like to have one table per set (7 columns, X lines). Thank you — user6726, Apr 13 '13 at 18:50
