
I have a large text file. I read the data from it one item at a time and check whether each item satisfies my condition. If it does not, I want to delete that item from the text file and update the file.

For example, say my data text file is: 100 201 302 455

Suppose I want to delete the second item. After deletion I want the new text file to look like 100 302 455 (without any gap between 100 and 302). Since my text file is large, around 100MB, I want to read the data one item at a time rather than loading it all at once. Thanks

Mr.Wizard
Vajira
  • Just use sed and be done with it :) something like sed --in-place '/201/d' file.txt – Nasser Sep 01 '13 at 22:46
  • @Nasser Surely dedicated text utilities are a better approach for simple replacements. I have to assume that the condition here is nontrivial and requires or at least benefits from Mathematica processing. – Mr.Wizard Sep 01 '13 at 22:50
  • @Mr.Wizard yes, of course, I understand that. I was just giving an answer for the specific example (I also put a smiley there, just in case :) – Nasser Sep 01 '13 at 23:07
  • @Nasser From your response I think I failed to communicate what I intended. Let me try again: Using an external text processing utility is a great idea and should be used whenever possible as it will deliver superior performance in nearly all cases. – Mr.Wizard Sep 01 '13 at 23:10

1 Answer


100MB is small compared to today's RAM sizes. Why not load it all at once? Then it's just a matter of using DeleteCases and exporting the file.
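
As a rough sketch of the load-everything approach, assuming the file holds whitespace-separated numbers: the file name data.txt and the condition "equal to 201" are illustrative only, taken from the question's example rather than from any real data set.

```mathematica
(* Hypothetical file name; write a small sample so the sketch is self-contained *)
Export["data.txt", {100, 201, 302, 455}, "Table"];

(* Read every number in the file into memory at once *)
data = ReadList["data.txt", Number];

(* Drop the entries that fail the condition; the condition here is simply "equals 201" *)
kept = DeleteCases[data, 201];

(* Overwrite the original file with the remaining entries, one per line *)
Export["data.txt", kept, "Table"];
```

For a nontrivial condition, a pattern test such as DeleteCases[data, x_ /; ! PrimeQ[x]] could take the place of the literal 201.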

Otherwise, I don't believe Mathematica is natively equipped to modify a file piece by piece in that fashion, so you'll need to export to a second file. As an example I'll filter a list of natural numbers to keep only the primes. (Dropping the primes instead is a matter of negating the test with Not, taking care that the final EndOfFile value, which is not prime either, does not get written out.)

First generate the starting data:

Export["firstfile.txt", Range@100, "Table"];

Open the input and output streams:

in = OpenRead["firstfile.txt"];
out = OpenWrite["secondfile.txt"];

Read, filter, and export:

Module[{x},
 While[x =!= EndOfFile,
  x = Read[in];
  If[PrimeQ@x, Write[out, x]]
 ]
]

Close the streams:

Scan[Close, {in, out}]

The result:

FilePrint["secondfile.txt"]
2
3
5
7
11
13
...
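
If the filtered file is meant to take the place of the original, one possible follow-up, once both streams are closed, is to delete the old file and rename the new one. This is only a sketch under the assumption that secondfile.txt was written completely; RenameFile does not overwrite an existing target, hence the DeleteFile first.

```mathematica
(* Assumes both streams are closed and secondfile.txt holds the filtered data *)
If[FileExistsQ["secondfile.txt"],
 (* Remove the original, then give the filtered file the original name *)
 DeleteFile["firstfile.txt"];
 RenameFile["secondfile.txt", "firstfile.txt"]
]
```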

Note: in practice I would localize in and out in the Module as well, but doing so here would have made it harder to comment on the code step by step.
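
A minimal sketch of what that might look like, with the streams localized in the Module and the test negated so that the primes are dropped rather than kept. The output name thirdfile.txt is hypothetical. The value returned by Read is checked against EndOfFile before the filter runs, so the end-of-file marker is never written to the output.

```mathematica
(* Regenerate the sample input so the sketch is self-contained *)
Export["firstfile.txt", Range@100, "Table"];

Module[{in, out, x},
 in = OpenRead["firstfile.txt"];
 out = OpenWrite["thirdfile.txt"]; (* hypothetical output file name *)
 (* Read one expression per iteration; stop as soon as Read returns EndOfFile *)
 While[(x = Read[in]) =!= EndOfFile,
  (* Keep only the non-primes *)
  If[! PrimeQ[x], Write[out, x]]
 ];
 (* Close both streams so the output is flushed to disk *)
 Scan[Close, {in, out}]
]
```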

Mr.Wizard