
I have about 1 million .txt files in a directory that I would like to concatenate into a single text file, with the files separated by a newline character ("\n"). Because of the number and size of the files, I would rather not read everything into memory first. What is the fastest way to do this in Mathematica without loading everything into memory?

cheers, Tom

Tom Wenseleers
  • If you have Unix tools available, just cat *.txt > hugefile.txt is sufficient to join the files. – cormullion Nov 24 '13 at 12:00
  • Yes that's true, good point - although in my application, it would be nice if I could do it from within Mathematica – Tom Wenseleers Nov 24 '13 at 12:23
  • There's OpenAppend. Basically I'm thinking you could append one file at a time. Or you might use Unix tools from within Mathematica. – Michael E2 Nov 24 '13 at 13:07
  • If you really want to do it from within Mathematica, you can still use cormullion's suggestion: Run["cat *.txt > hugefile.txt"]. – Mike Honeychurch Nov 24 '13 at 20:47
  • Yes, good idea. But is there by any chance a command-line option in cat that would let me insert a newline character between each file? – Tom Wenseleers Nov 25 '13 at 13:20

2 Answers


Here's a straightforward way:

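Clear[copy, merge];

(* Append the contents of one file, plus a trailing newline, to an open stream.
   Only one input file is held in memory at a time. *)
copy[file1_, out_OutputStream] := Module[{input},
   (* Check evaluates Return[$Failed] if Import issues a message, e.g. file not found *)
   input = Check[Import[file1, "Text"], Return[$Failed]];
   BinaryWrite[out, input];
   BinaryWrite[out, "\n"];
   ];

(* Open the output file once, copy each input file into it in turn, then close it. *)
merge[files_List, outFile_String] := Module[{out},
  Check[out = OpenWrite[outFile, BinaryFormat -> True], Return[$Failed]];
  Do[copy[in, out], {in, files}];
  Close[out];
  ]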

Example use:

merge[
 FileNames[DigitCharacter ~~ ".txt", {"/tmp"}],
 "/tmp/foo.txt"
 ]

I'm not sure how to analyze its speed.

Michael E2

To run terminal commands regularly from within Mathematica, use CellEvaluationFunction, as described by WReach on this site.

Step 1. Set up a cell style:

Cell[StyleData["Terminal"],
 CellFrame->2,
 ShowGroupOpener->False,
 CellMargins->{{66, 4}, {10, 8}},
 Evaluatable->True,
 StripStyleOnPaste->True,
 CellEvaluationFunction->Function[{$CellContext`x, $CellContext`y}, 
   Import[
    StringExpression["!", $CellContext`x], "Text"]],
 CellFrameColor->GrayLevel[0.5],
 Hyphenation->False,
 AutoQuoteCharacters->{},
 PasteAutoQuoteCharacters->{},
 LanguageCategory->"Formula",
 ScriptLevel->1,
 MenuSortingValue->1800,
 FontFamily->"Monaco",
 FontSize->13,
 FontWeight->"Plain",
 FontSlant->"Plain",
 FontColor->RGBColor[0, 1, 0],
 Background->GrayLevel[0]]


Step 2. Run a terminal command, e.g.

open -a "QuickTime Player"


or

cat *.txt > hugefile.txt
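The plain cat command above joins the files with no separator between them, and with around a million files the *.txt glob may exceed the system's argument-length limit. A minimal sketch of one way around both issues follows, assuming a find that supports -maxdepth (GNU and BSD find do). The function name concat_txt and its two arguments are hypothetical, chosen for illustration only.

```shell
# concat_txt SRC_DIR OUT_FILE
# Hypothetical helper: streams every .txt file directly inside SRC_DIR into
# OUT_FILE, writing a newline after each file.
concat_txt() {
  src_dir=$1
  out_file=$2
  # find passes the file names to the inner shell in batches, so no single
  # command line has to hold all the names at once.
  find "$src_dir" -maxdepth 1 -type f -name '*.txt' -exec sh -c '
    for f do
      cat "$f"   # copy the file contents to standard output
      echo       # newline separator after each file
    done
  ' sh {} + > "$out_file"
}
```

It could be called as concat_txt /path/to/files /path/to/hugefile.out. Two caveats: the output file should not itself match *.txt inside the source directory, or find may pick it up as an input, and find visits files in directory order rather than the sorted order that a shell glob produces.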

Mike Honeychurch