How to find the line number of a string in an imported `.txt` file?

Question

I have a txt file which contains a lot of data (in fact it's the result file of a .nb file which I have run on a cluster). I want to use the data to plot some graphs in a new .nb file.

Since the txt file contains a lot of data, copying and pasting each desired part into the new .nb file is very time-consuming, so I decided to put a marker at the start and at the end of each part by printing a special phrase. For example, in the original .nb file I call Print["start of density data"] before the density data, then print my density function (which shows the density data as a list), and then call Print["end of density data"]. This marks the start and end of the part I want. Now I need code that finds the line number of "start of density data" (say, 600) and the line number of "end of density data" (say, 700).

With those two numbers I can write code that gives me lines 601-699, which contain my desired data. That part is straightforward; my main problem is how to find the line numbers of the two markers. I tried Read and Find after importing my txt file, but they don't give line numbers. Any ideas?

I have attached a dummy file here to facilitate answering.

Seems like the easiest solution would be to write the sections of data to separate files. Since you are able to print start/end, this should be easy. Following the link to the dummy file results in "The file you requested has been deleted". — Rohit Namjoshi, May 06 '21 at 15:55
What do you mean by "write the sections of data to separate files"? Please explain more — Wisdom, May 06 '21 at 16:03
In your original .nb file you print "start" then you print output from densityFunction and then you print "end". Instead of printing the densityFunction, save its output to a file. Put[densityFunction[arguments], "some file name"]. To read it later densityData = Get["some file name"] — Rohit Namjoshi, May 06 '21 at 16:32
Good idea, but I have to save output of 15 functions in each nb file, and I have 63 such nb files, so I need 945 separate files which are soooo much to handle! — Wisdom, May 06 '21 at 16:38
Can you provide a sample file? As I mentioned earlier, the link you provided no longer has the sample. — Rohit Namjoshi, May 06 '21 at 16:50
I have no access to my laptop right now, but I'll try to add the file within the next hour — Wisdom, May 06 '21 at 16:58
I absolutely agree that it would be much better to write your data to a separate file instead of parsing the notebook output for these printed markers. Are you aware that Mathematica can read and write various data formats (e.g. HDF5, NetCDF) that allow you to store more than one dataset per file? You wouldn't even have to convert the data to strings and back. That would certainly be a much more efficient and safer way to save and read your data... — Albert Retey, May 07 '21 at 22:14

A.G. · Accepted Answer · 2021-05-06T18:21:05.727

2

Here is one way to do it. The "Lines" element in Import returns the file as a list of strings, one string per line, so Position can be used to find the index of the line that matches a marker.

file = Import["/somePath/data.txt", "Lines"];
Position[file, "\"start of density data\""][[1, 1]]
Position[file, "\"end of density data\""][[1, 1]]
(*
46
48
*)
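To see why [[1, 1]] is needed and why an exact match is required, here is a minimal sketch on a hypothetical in-memory list standing in for the imported file (the contents of `lines` are made up for illustration; the markers include literal double quotes, as in the code above). Position compares each whole line against the pattern, so a string that is only part of a line gives an empty result.

```mathematica
(* Hypothetical stand-in for Import["/somePath/data.txt", "Lines"] *)
lines = {
   "some header",
   "\"start of density data\"",
   "{0.1, 0.2, 0.3}",
   "{0.4, 0.5, 0.6}",
   "\"end of density data\"",
   "fvar = 3"
   };

(* Position returns a list of positions, each itself a list of indices *)
Position[lines, "\"start of density data\""]          (* {{2}} *)

(* [[1, 1]] picks the first match and unwraps its single index *)
Position[lines, "\"start of density data\""][[1, 1]]  (* 2 *)

(* No line is exactly "fvar", so nothing matches; taking [[1, 1]] of this
   result is what produces the "Part 1 of {} does not exist" message *)
Position[lines, "fvar"]                               (* {} *)
```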

edited May 06 '21 at 18:21

answered May 06 '21 at 18:15

A.G.


Thanks a lot. What does "Lines" do in import command? – Wisdom May 06 '21 at 18:18
@Wisdom It makes Import return the file as a list of strings, one string per line. – A.G. May 06 '21 at 18:21
I wonder how your code works by [[1,1]] while file is not a list. Also when I use this code just for fvar without quotation I get the error "Part 1 of {} does not exist", what is the logic of your code? – Wisdom May 07 '21 at 05:42
I wonder how your code works by [[1,1]] while file is not a list. Also when I use this code just for fvar without double quotation I get the error "Part 1 of {} does not exist", what is the logic of your code? How can I use it for finding line number of any string in the imported file? – Wisdom May 07 '21 at 05:48
In my test the command Position[file, "\"start of density data\""] returns {{46}}, thus the need for [[1,1]]. If what you want is the number of a line that contains some word/string you may have to replace "\"start of density data\"" by some pattern. – A.G. May 07 '21 at 06:24
Thanks, but when I try Position[file2, "\"fvar\""] I get {}. Does this code work only for the output of Print commands and not for comments? If so, what is the general way to find the line number of any string in the imported file? – Wisdom May 07 '21 at 06:37
@Wisdom What is file2? – Rohit Namjoshi May 07 '21 at 15:59
Sorry, I meant file – Wisdom May 07 '21 at 16:11
@Wisdom See the answer I posted. – Rohit Namjoshi May 07 '21 at 16:42

score 1 · Answer 2 · answered May 07 '21 at 16:41

text = Import["~/Downloads/data.txt", "Lines"];

densityPosition = Position[text, "\"start of density data\"" | "\"end of density data\""] + {1, -1}

densityData = Extract[text, densityPosition] // ToExpression
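The expression above yields the line numbers just inside the two markers. When the data spans several lines, as in the 601-699 example from the question, the whole range between the markers is needed. A minimal sketch under that assumption, using a hypothetical in-memory list in place of the imported file (the names `lines`, `startLine`, `endLine`, and `block` are illustrative):

```mathematica
(* Hypothetical stand-in for Import["~/Downloads/data.txt", "Lines"] *)
lines = {
   "some header",
   "\"start of density data\"",
   "{0.1, 0.2, 0.3}",
   "{0.4, 0.5, 0.6}",
   "\"end of density data\"",
   "fvar = 3"
   };

(* Line numbers of the two markers, in order of appearance: {2, 5} *)
{startLine, endLine} = Flatten[
   Position[lines, "\"start of density data\"" | "\"end of density data\""]];

(* Every line strictly between the markers *)
block = lines[[startLine + 1 ;; endLine - 1]];

(* Convert each line from a string to an expression *)
densityData = ToExpression /@ block
(* {{0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}} *)
```

This assumes each marker occurs exactly once; with repeated markers the destructuring assignment to {startLine, endLine} would fail.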

answered May 07 '21 at 16:41

Rohit Namjoshi


Thanks but I want to find the position of fvar – Wisdom May 07 '21 at 16:55
In data.txt, fvar is after "end of density data". So are you asking a different question? – Rohit Namjoshi May 07 '21 at 17:22
fvar was just an example. In general, I want code that finds any string in an imported file – Wisdom May 07 '21 at 17:25

Depending on the structure of the imported data (single string or list of strings or ...) you would have to use Position or StringPosition. – Rohit Namjoshi May 07 '21 at 17:30
This is indeed another question. You can have a look at https://mathematica.stackexchange.com/questions/32855/finding-the-position-of-strings-in-a-list-that-contain-a-particular-word-string . Try Position[StringMatchQ[file, RegularExpression[".*fvar.*"]], True][[1]] . – A.G. May 07 '21 at 20:22
@A.G. For simple patterns RegularExpression is not needed. StringMatchQ accepts * (zero or more characters) and @ (one or more characters excluding uppercase letters). Check the 'Details and Options' section in the documentation. – Rohit Namjoshi May 08 '21 at 01:35
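The suggestions in these comments can be sketched as a small helper that returns the numbers of all lines containing a given substring. The helper name `lineNumbers` and the sample list are hypothetical, and StringContainsQ assumes Mathematica 10.1 or later; the StringMatchQ form with the * wildcard is the alternative mentioned above.

```mathematica
(* Hypothetical stand-in for the list returned by Import[..., "Lines"] *)
lines = {
   "some header",
   "\"start of density data\"",
   "{0.1, 0.2, 0.3}",
   "{0.4, 0.5, 0.6}",
   "\"end of density data\"",
   "fvar = 3"
   };

(* Line numbers of every line that contains patt somewhere in it.
   StringContainsQ maps over the list and gives True/False per line. *)
lineNumbers[list_List, patt_] :=
  Flatten[Position[StringContainsQ[list, patt], True]]

lineNumbers[lines, "fvar"]          (* {6} *)
lineNumbers[lines, "density data"]  (* {2, 5} *)

(* Same idea with StringMatchQ and the * wildcard *)
Flatten[Position[StringMatchQ[lines, "*fvar*"], True]]  (* {6} *)
```

Because the match is on substrings, this works whether or not the marker line includes the surrounding double quotes.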
