
I just started with Mathematica.

I need it for many purposes, but the most important one is reading a text data file and extracting the column values as variables.

I have successfully tested importing tab-delimited data. Now I have a problem with a file in this format:

- 15 lines of comments, containing spaces and blank lines
- 7 columns and 100 lines of data; the columns have headers
- the above is repeated after 3 blank lines

Any comments and suggestions are welcome.

Sjoerd C. de Vries
user6726

2 Answers

Preamble: input file (screenshot not reproduced here)

Solution
Here is how I would do it:

Import[filename, "Text"] // (* import the whole file as a single string *)
      StringSplit[#, "\n\n\n"] & // (* split at every run of 3 newlines -> separate blocks *)
     (StringSplit[#, "\n"] & /@ # &) // (* split each block at every newline -> separate lines *)
    (Drop[#, 15] & /@ # &) // (* drop the first 15 lines (comments) of each block *)
   (Drop[#, 1] & /@ # &) // (* drop the next line -> remove the column headers *)
  Map[StringSplit[#, "\t"] &, #, {2}] & // (* split each line at every tab -> separate numbers *)
 Map[StringReplace[#, " " -> ""] &, #, {2}] & (* remove stray spaces from every field *)
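To try the same pipeline without a data file, here is a minimal sketch on a small in-memory string. The text, the column names f and V, and the values are hypothetical, and the number of comment lines is reduced from 15 to 2, so the first Drop count changes accordingly.

```mathematica
(* Hypothetical miniature input: 2 comment lines (the real file has 15), one header
   line and 2 tab-separated data lines per block; blocks separated by 3 blank lines *)
text = "comment A\ncomment B\nf\tV\n100\t41.23\n158.5\t41.22\n\n\n\n" <>
   "comment A\ncomment B\nf\tV\n200\t40.1\n300\t40.2\n";

result = text //
   StringSplit[#, "\n\n\n"] & //            (* separate the blocks *)
   (StringSplit[#, "\n"] & /@ # &) //       (* separate the lines of each block *)
   (Drop[#, 2] & /@ # &) //                 (* drop the comment lines (15 in the real file) *)
   (Drop[#, 1] & /@ # &) //                 (* drop the header line *)
   Map[StringSplit[#, "\t"] &, #, {2}] & // (* separate the fields of each line *)
   Map[StringReplace[#, " " -> ""] &, #, {2}] &  (* remove stray spaces *)
```

The result is a list of blocks, each a list of lines, each a list of strings: {{{"100", "41.23"}, {"158.5", "41.22"}}, {{"200", "40.1"}, {"300", "40.2"}}}.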

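At this point each number is still a String. ToExpression does not understand the e-notation, so you can use ImportString["8.17464362e-04", "Table"] to get the number (it comes back wrapped in a nested list), or see this question.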
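A minimal sketch of converting the whole nested list of strings at once, assuming the {block, line, field} structure produced by the code above. The helper name toNumber and the sample strings are hypothetical; empty fields would make [[1, 1]] fail, so this assumes every field holds a number.

```mathematica
(* Hypothetical nested list of strings with the structure {block, line, field} *)
strings = {{{"100", "8.17464362e-04"}, {"158.5", "41.22"}}};

(* ImportString[..., "Table"] returns a list of lists such as {{0.000817464}},
   so take the single element it contains *)
toNumber[s_String] := ImportString[s, "Table"][[1, 1]]

(* the strings sit at level 3: blocks -> lines -> fields *)
numbers = Map[toNumber, strings, {3}]
```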

To understand my code, try:
- first Import[filename, "Text"]
- then Import[filename, "Text"] // StringSplit[#, "\n\n\n"] &
- then Import[filename, "Text"] // StringSplit[#, "\n\n\n"] & // (StringSplit[#, "\n"] & /@ # &)
- and so on, adding one step at a time.
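Instead of re-running the chain with one more step each time, one option is to store the steps as a list of pure functions and let FoldList keep every intermediate result. This is a sketch on a hypothetical miniature input with 2 comment lines, so the comment and header lines are dropped together with Drop[#, 3].

```mathematica
(* Hypothetical miniature input (2 comment lines instead of 15, 1 header line) *)
text = "c1\nc2\nh1\th2\n1\t2\n3\t4\n\n\n\nc1\nc2\nh1\th2\n5\t6\n";

(* The successive steps of the pipeline as pure functions *)
steps = {
   StringSplit[#, "\n\n\n"] &,             (* blocks *)
   Map[StringSplit[#, "\n"] &, #] &,       (* lines of each block *)
   Map[Drop[#, 3] &, #] &,                 (* 2 comment lines + 1 header line *)
   Map[StringSplit[#, "\t"] &, #, {2}] &   (* fields of each line *)
   };

(* stages[[k + 1]] is the input after the first k steps *)
stages = FoldList[#2[#1] &, text, steps];
Column[stages]
```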

Edit

The same code, written in a more conventional step-by-step style:

fullText = Import[filename, "Text"];                            (* whole file as one string *)
splitLevel1 = StringSplit[fullText, "\n\n\n"];                  (* blocks *)
splitLevel2 = Map[StringSplit[#, "\n"] &, splitLevel1];         (* lines of each block *)
splitLevel2WithoutHeadLines = Map[Drop[#, 16] &, splitLevel2];  (* drop 15 comment lines + 1 header line *)
splitLevel3 = Map[StringSplit[#, "\t"] &, splitLevel2WithoutHeadLines, {2}]; (* fields of each line *)
splitLevel3WithoutWhiteCharacters = Map[StringReplace[#, " " -> ""] &, splitLevel3, {2}] (* remove stray spaces *)
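Since the original goal was to get column values as variables, here is a sketch of that last step for one block once its fields are numbers. The rows below are copied from the ReadList output in the other answer; the variable names c1 to c7 are hypothetical.

```mathematica
(* One block as a numeric table: lines x 7 columns (rows taken from the ReadList output) *)
block = {
   {100., 41.2318, 1.31655*^-9, 1.96161, -38600., 38600., -89.9971},
   {158.489, 41.2318, 3.22439*^-9, 1.91259, -24355., 24355., -89.9955}
   };

(* Transpose turns the list of rows into a list of columns *)
{c1, c2, c3, c4, c5, c6, c7} = Transpose[block];

c1
```

A single column can also be taken directly with Part, for example block[[All, 2]].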

Verification


andre314
  • Hi Andre and Wizard, thank you very much for your very fast and efficient help! I have two working solutions, which can be very helpful! – user6726 Apr 13 '13 at 17:28
  • Hi Andre, thank you again. It works well and the code is clear! How can I plot column 2 vs column 1? – user6726 Apr 13 '13 at 19:48
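For the plotting question, a sketch assuming one block has already been converted to a numeric table (lines x 7 columns). The three rows are copied from the ReadList output in Mr.Wizard's answer; whether a linear or logarithmic x axis suits the data is a judgment call.

```mathematica
(* Three rows of one block, copied from the ReadList output *)
block = {
   {100., 41.2318, 1.31655*^-9, 1.96161, -38600., 38600., -89.9971},
   {1000., 41.2313, 1.26158*^-7, 1.87975, -3860.05, 3860.05, -89.9721},
   {10000., 41.1839, 0.0000117939, 1.7613, -386.441, 386.445, -89.7389}
   };

(* Columns 1 and 2 of every row give {x, y} pairs *)
pairs = block[[All, {1, 2}]];
ListPlot[pairs, Joined -> True]

(* Column 1 spans several decades, so a logarithmic x axis may be easier to read *)
ListLogLinearPlot[pairs, Joined -> True]
```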

This is not likely to be particularly fast, but it is concise:

Cases[
 Import["exampleDataFile.txt", "Table"],
 {__?NumberQ}
]

If there are false matches (for example, a comment line that happens to contain only numbers), you can require the known number of columns:

Cases[
 Import["exampleDataFile.txt", "Table"],
 {Repeated[_?NumberQ, {7}]}
]
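To see the difference between the two patterns, here is a sketch on a hypothetical miniature file read with ImportString, in which one comment line happens to consist only of numbers.

```mathematica
(* Hypothetical file: the second comment line contains only numbers *)
text = "Run 7\n2013 4 13\nf V a b c d e\n1 2 3 4 5 6 7\n8 9 10 11 12 13 14\n";
data = ImportString[text, "Table"];

(* any all-numeric line: also picks up the comment line {2013, 4, 13} *)
loose = Cases[data, {__?NumberQ}]

(* exactly 7 numbers: only the two data lines *)
strict = Cases[data, {Repeated[_?NumberQ, {7}]}]
```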

If you need to separate each block of numeric data, you might use SplitBy as follows:

dat = Import["exampleDataFile.txt", "Table"];

(* runs alternate between non-data and data lines; the data runs are the even ones *)
SplitBy[dat, MatchQ[#, {Repeated[_?NumberQ, {7}]}] &][[2 ;; -1 ;; 2]]
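A self-contained sketch of the SplitBy approach on a hypothetical miniature file (one comment line, one header line, data lines, then 3 blank lines and the same layout again). Because the file starts with non-data lines, the data blocks are the even-numbered runs, which is what [[2 ;; -1 ;; 2]] relies on.

```mathematica
(* Hypothetical miniature file with 7-column data in two blocks *)
text = StringJoin[
   "comment\nf V a b c d e\n",
   "1 2 3 4 5 6 7\n8 9 10 11 12 13 14\n\n\n\n",
   "comment\nf V a b c d e\n",
   "15 16 17 18 19 20 21\n"];

dat = ImportString[text, "Table"];

(* comment, header and blank lines are False runs; data lines are True runs *)
blocks = SplitBy[dat, MatchQ[#, {Repeated[_?NumberQ, {7}]}] &][[2 ;; -1 ;; 2]]
```

Each element of blocks is then one table per set, as asked in the comments.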

If any of these methods does not work on your full data set, please consider uploading a second example data file.


If the rest of the file is as regular as this example appears to be, you could try Read and related functions for better performance, as shown here. For example:

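skip[__String] = Sequence[];            (* records read into skip[...] vanish from the result *)

line = Table[Number, {7}];              (* one data line: 7 numbers *)
drop = skip @@ Table[Record, {12}];     (* 12 records (comment and header lines) to discard *)
spec = Join[{drop}, Table[line, {21}]]; (* one block: skipped records, then 21 data lines *)

ReadList["exampleDataFile.txt", spec]   (* spec is repeated until end of file: one table per block *)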
{{{100., 41.2318, 1.31655*10^-9, 1.96161, -38600., 38600., -89.9971}, {158.489, 41.2318, 
   3.22439*10^-9, 1.91259, -24355., 24355., -89.9955}, {251.189, 41.2318, 8.01648*10^-9, 
   1.89304, -15367., 15367., -89.9929}, {398.107, 41.2318, 2.00525*10^-8, 
   1.88515, -9695.9, 9695.9, -89.9889}, {630.957, 41.2316, 5.02786*10^-8, 
   1.88175, -6117.72, 6117.72, -89.9824}, {1000., 41.2313, 1.26158*10^-7, 
   1.87975, -3860.05, 3860.05, -89.9721}, {1584.89, 41.2305, 3.16473*10^-7, 
   1.87732, -2435.57, 2435.57, -89.9558}, {2511.89, 41.2286, 7.92735*10^-7, 
   1.87228, -1536.81, 1536.81, -89.9302}, {3981.07, 41.2238, 1.97791*10^-6, 
   1.86016, -969.775, 969.777, -89.8901}, {6309.57, 41.2119, 4.88673*10^-6, 
   1.83067, -612.059, 612.062, -89.8286}, {10000., 41.1839, 0.0000117939, 
   1.7613, -386.441, 386.445, -89.7389}, {15848.9, 41.1232, 0.0000270225, 
   1.61129, -244.183, 244.188, -89.6219}, {25118.9, 41.0111, 0.0000560277, 
   1.33723, -154.485, 154.491, -89.5041}, {39810.7, 40.8533, 0.000100044, 
   0.957928, -97.8478, 97.8525, -89.4391}, {63095.7, 40.6927, 0.000153468, 
   0.589637, -61.9818, 61.9846, -89.455}, {100000., 40.563, 0.000214301, 
   0.329894, -39.2338, 39.2351, -89.5182}, {158489., 40.4627, 0.000285002, 
   0.175532, -24.8167, 24.8173, -89.5947}, {251189., 40.3884, 0.000360826, 
   0.0887994, -15.6873, 15.6876, -89.6757}, {398107., 40.3415, 0.000444295, 
   0.0436315, -9.90969, 9.90978, -89.7477}, {630957., 40.315, 0.000569237, 
   0.0222841, -6.25674, 6.25678, -89.7959}, {1.*10^6, 40.2996, 0.000817464, 
   0.0127498, -3.94925, 3.94927, -89.815}}, {{100., 41.2318, 1.31655*10^-9, 
   1.96161, -38600., 38600., -89.9971}, {158.489, 41.2318, 3.22439*10^-9, 
   1.91259, -24355., 24355., -89.9955}, {251.189, 41.2318, 8.01648*10^-9, 
   1.89304, -15367., 15367., -89.9929}, {398.107, 41.2318, 2.00525*10^-8, 
   1.88515, -9695.9, 9695.9, -89.9889}, {630.957, 41.2316, 5.02786*10^-8, 
   1.88175, -6117.72, 6117.72, -89.9824}, {1000., 41.2313, 1.26158*10^-7, 
   1.87975, -3860.05, 3860.05, -89.9721}, {1584.89, 41.2305, 3.16473*10^-7, 
   1.87732, -2435.57, 2435.57, -89.9558}, {2511.89, 41.2286, 7.92735*10^-7, 
   1.87228, -1536.81, 1536.81, -89.9302}, {3981.07, 41.2238, 1.97791*10^-6, 
   1.86016, -969.775, 969.777, -89.8901}, {6309.57, 41.2119, 4.88673*10^-6, 
   1.83067, -612.059, 612.062, -89.8286}, {10000., 41.1839, 0.0000117939, 
   1.7613, -386.441, 386.445, -89.7389}, {15848.9, 41.1232, 0.0000270225, 
   1.61129, -244.183, 244.188, -89.6219}, {25118.9, 41.0111, 0.0000560277, 
   1.33723, -154.485, 154.491, -89.5041}, {39810.7, 40.8533, 0.000100044, 
   0.957928, -97.8478, 97.8525, -89.4391}, {63095.7, 40.6927, 0.000153468, 
   0.589637, -61.9818, 61.9846, -89.455}, {100000., 40.563, 0.000214301, 
   0.329894, -39.2338, 39.2351, -89.5182}, {158489., 40.4627, 0.000285002, 
   0.175532, -24.8167, 24.8173, -89.5947}, {251189., 40.3884, 0.000360826, 
   0.0887994, -15.6873, 15.6876, -89.6757}, {398107., 40.3415, 0.000444295, 
   0.0436315, -9.90969, 9.90978, -89.7477}, {630957., 40.315, 0.000569237, 
   0.0222841, -6.25674, 6.25678, -89.7959}, {1.*10^6, 40.2996, 0.000817464, 
   0.0127498, -3.94925, 3.94927, -89.815}}}

The repetition counts in Table (12 records to skip, 7 numbers per line, 21 lines per block) will need to be adjusted to match your data. If these counts vary from block to block, the Import method is preferable.
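The ReadList approach can be tried without a file by reading from a string stream. This sketch uses a hypothetical miniature input with 2 records to skip and 2 lines of 3 numbers per block, so all counts differ from the real file. Blank lines between blocks should not disturb the record count, since Read skips null records by default (NullRecords -> False). On this input it should return one 2 × 3 table per block.

```mathematica
(* Hypothetical miniature input: per block, 1 comment + 1 header line, then 2 lines of 3 numbers *)
text = "comment\nf V a\n1 2 3\n4 5 6\ncomment\nf V a\n7 8 9\n10 11 12\n";

skip[__String] = Sequence[];            (* skipped records evaluate to nothing *)
line = Table[Number, {3}];              (* one line of 3 numbers *)
drop = skip @@ Table[Record, {2}];      (* read, then discard, 2 records *)
spec = Join[{drop}, Table[line, {2}]];  (* one full block *)

stream = StringToStream[text];
result = ReadList[stream, spec];        (* spec is applied repeatedly until end of file *)
Close[stream];
result
```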

Mr.Wizard
  • I get an empty list with your code. – user6726 Apr 13 '13 at 17:06
  • @user6726 does Import["exampleDataFile.txt", "Table"] import correctly? (By correctly I just mean all the data there, but not formatted/extracted correctly.) – Mr.Wizard Apr 13 '13 at 17:13
  • Sorry for my previous comment! I have now set my directory correctly and get data with your second code. Now I am trying to understand the output! – user6726 Apr 13 '13 at 17:21
  • It seems that I have all the data in one big list: a nested list? How do I extract colon data? – user6726 Apr 13 '13 at 17:24
  • @user6726 I don't know what you mean by "colon data." The Import method puts all the data in one table of seven columns, whereas the ReadList method is producing a list of tables (tensor). Is one of these correct, or do you need something else? – Mr.Wizard Apr 13 '13 at 17:28
  • @Mr.Wizard Assuming these aren't medical data, perhaps it's 'Column Data'? – cormullion Apr 13 '13 at 17:34
  • @user6726 I added another method to my answer, using SplitBy -- there are many ways to approach a problem like this and it's really a matter of what you want and what the full file looks like. – Mr.Wizard Apr 13 '13 at 17:58
  • It is "column". The data is organized in sets of columns and lines. Each set has 7 columns and 21 lines (the number of lines is not fixed; it can be more than 21). In the present example there are 2 sets. So my question is: how can I separate the sets? Ideally I would like to have one table per set (7 columns, X number of lines). Thank you – user6726 Apr 13 '13 at 18:50