There are plenty of .txt files, and I am trying to figure out a way in LaTeX to merge all *.txt files into one *.txt file. I thought about \newread and \newwrite, but I don't know whether there is a better way or how to start.
The *.tex files all have the same structure; they are all used for spreadtab, but before using them in spreadtab, I thought about merging them.
My first idea was to use \input, but it is not as easy as I thought.
This answer does not cover the problem of merging text files with different input encodings.
In the following it is assumed that all source files use the same input encoding and that the target file shall use that same input encoding as well.
As Fran already said, you can do it without TeX, just using shell-commands:
In a Linux terminal
cat MYfileA.tex MYfileB.tex MYfileC.tex > MYfilesABCmerged.tex
should do the trick.
In a Windows PowerShell (where cat is an alias for the Get-Content cmdlet) this might do the trick as well.
In a Windows PowerShell another variant might be:
Get-Content MYfileA.tex,MYfileB.tex,MYfileC.tex | Out-File MYfilesABCmerged.tex
However, be aware that PowerShell's Out-File (re)encodes the file it produces: in PowerShell 7 and later the default is UTF-8 without Byte Order Mark, while in Windows PowerShell 5.1 the default is UTF-16LE ("Unicode"). Use the parameter -Encoding for (re)encoding in a different encoding.
On a Windows command prompt
type MYfileA.tex MYfileB.tex MYfileC.tex > MYfilesABCmerged.tex
or
copy MYfileA.tex + MYfileB.tex + MYfileC.tex MYfilesABCmerged.tex
should do the trick.
In any case, e.g., when using the wildcards * or ?, make sure that the target file is not in the set of source files to be merged!
If you insist on doing it in LaTeX, you probably can do something like the following:
% Create three text files
% =======================
\begin{filecontents*}{MYfileA.tex}
01 file A
02 file A
03 file A
04 file A
05 file A
06 file A
07 file A
08 file A
09 file A
10 file A
\end{filecontents*}
\begin{filecontents*}{MYfileB.tex}
01 file B
02 file B
03 file B
04 file B
05 file B
06 file B
07 file B
08 file B
09 file B
10 file B
\end{filecontents*}
\begin{filecontents*}{MYfileC.tex}
01 file C
02 file C
03 file C
04 file C
05 file C
06 file C
07 file C
08 file C
09 file C
10 file C
\end{filecontents*}
% Code for macro for merging files
% ================================
\makeatletter
\newread\ThisInFile
\newwrite\ThisOutFile
\newcommand\mergefiles[2]{%
%\NewDocumentCommand\mergefiles{vv}{%
\begingroup
\let\do\@makeother
\dospecials % verbatim-category-code-régime.
\do\^^I
\do\^^M
\endlinechar=-1\relax
\newlinechar=-1\relax
\IfFileExists{#2}{%
\@latex@warning@no@line{%
File `\detokenize{#2}' already exists on the system.%
\MessageBreak Not generating it from this source%
}%
}{%
\immediate\openout\ThisOutFile #2 %
%\immediate\openout\ThisOutFile \string"#2\string" %
%\immediate\openout\ThisOutFile "#2"
\@for\ThisInFileName:={#1}\do{%
\expandafter\IfFileExists\expandafter{\ThisInFileName}{%
\immediate\openin\ThisInFile \ThisInFileName\relax
%\immediate\openin\ThisInFile \string"\ThisInFileName\string"\relax
%\immediate\openin\ThisInFile "\ThisInFileName"\relax
\appendthisfileloop
\immediate\closein\ThisInFile
}{%
\@latex@warning@no@line{%
File `\detokenize\expandafter{\ThisInFileName}' not found on the system.%
\MessageBreak Therefore it cannot be included into the merge%
}%
}%
}%
\immediate\closeout\ThisOutFile
}%
\endgroup
}%
\newcommand\appendthisfileloop{%
\ifeof\ThisInFile\else
\immediate\read\ThisInFile to \thisline
\ifx\thisline\empty\ifeof\ThisInFile\expandafter\expandafter\expandafter\@gobble\fi\fi
{%
\immediate\write\ThisOutFile{\detokenize\expandafter{\thisline}}%
%\message{(\detokenize\expandafter{\thisline})}%
}%
\expandafter\appendthisfileloop\fi
}%
\makeatother
% Let's merge the three text files:
% =================================
\mergefiles{MYfileA.tex,MYfileB.tex,MYfileC.tex}{MYfilesABCmerged.tex}%
% Have a document which uses the verbatim-package
% for displaying the file where things are merged:
% ================================================
\documentclass{article}
%\usepackage{xparse}
\usepackage{verbatim}
\begin{document}
\noindent This is the content of file \verb|MYfilesABCmerged.tex|:
\verbatiminput{MYfilesABCmerged.tex}
\end{document}
Pitfalls/Caveats
!!! With the code of the example above, make sure yourself that, when using the macro \mergefiles, the target file does not also occur in the comma-list of source files !!!
(If one of the source files is the target file as well, that file is destroyed in the course of creating the target file, before it is read as a source file.)
No error checking for this is implemented in the macro \mergefiles, i.e., \mergefiles does not check whether the target file also occurs in the comma-list of source files!
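Since \mergefiles performs no such check, here is a minimal sketch of one. The helper \IfTargetInSourceList and its internal names (\ifmf@targetinlist, \mf@targetname, \mf@sourcename) are hypothetical and not part of the answer's code. File names are compared literally as strings, so, e.g., "./MYfileA.tex" and "MYfileA.tex", or list items with surrounding spaces, count as different names.

```latex
\makeatletter
% Hypothetical helper (not part of the answer's code):
% \IfTargetInSourceList{<comma-list of source files>}{<target file>}{<true>}{<false>}
% Names are compared literally: no whitespace trimming, no path normalisation.
\newif\ifmf@targetinlist
\newcommand\IfTargetInSourceList[4]{%
  \mf@targetinlistfalse
  \edef\mf@targetname{\detokenize{#2}}%
  \@for\mf@sourcename:={#1}\do{%
    % turn the current list item into a string and compare it with the target
    \edef\mf@sourcename{\detokenize\expandafter{\mf@sourcename}}%
    \ifx\mf@sourcename\mf@targetname
      \mf@targetinlisttrue
    \fi
  }%
  \ifmf@targetinlist
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
  {#3}{#4}%
}
\makeatother
\IfTargetInSourceList{MYfileA.tex,MYfileB.tex,MYfileC.tex}{MYfilesABCmerged.tex}%
  {\typeout{Target file is also a source file: not merging.}}%
  {\typeout{OK: target file is not among the source files.}}
\documentclass{article}
\begin{document}
Check the terminal or the .log file.
\end{document}
```

In the answer's example one could then call \mergefiles only in the false branch, e.g. \IfTargetInSourceList{<sources>}{<target>}{\typeout{...}}{\mergefiles{<sources>}{<target>}}.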
Things to keep in mind when using TeX/LaTeX for copying text files:
Depending on the characters that might occur within filenames, when using the LaTeX format you may wish to define \mergefiles in terms of \NewDocumentCommand with arguments of type v (= verbatim).
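To see what v-type arguments change for file names, here is a small sketch; \showfilename is a hypothetical demo command, not part of the answer. Its argument is read verbatim, so a Windows path with backslashes, an unmatched brace (when another delimiter than braces is used) or a percent sign reaches the command as plain characters and is written to the terminal/.log file unchanged. With LaTeX releases older than 2020-10-01, xparse has to be loaded for \NewDocumentCommand.

```latex
\documentclass{article}
% LaTeX releases older than 2020-10-01 need: \usepackage{xparse}
% \showfilename is a hypothetical demo command: its v-type argument is
% read verbatim, so \, unmatched braces and % arrive as plain characters
% (category code 12) instead of being interpreted by TeX.
\NewDocumentCommand\showfilename{v}{%
  \typeout{File name as received: #1}%
}
\begin{document}
\showfilename|C:\temp\data.txt|
\showfilename|odd{name.txt|
\showfilename{50%.txt}
Check the terminal or the .log file.
\end{document}
```

Keep in mind that a command with v-type arguments cannot be used inside the argument of another command, just like \verb.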
With older LaTeX distributions you may need to load the package xparse for defining \mergefiles via \NewDocumentCommand.
By "depending on the characters that might occur" I mean, e.g., unmatched characters like { or }, because within usual (non-verbatim) macro arguments these characters can only occur if a matching counterpart is present as well. With file paths in DOS/Windows, where the backslash separates the names of nested directories/folders, I mean \, which in TeX would introduce a control-sequence token. I also mean active characters, which should not be expanded/carried out. ... Although the hash # has category code 6 (parameter), it usually is not a problem in file names for TeX/LaTeX, unless temporary macros that expand to the file name are defined in a way that is not based on \edef and \unexpanded/⟨token register⟩.
When doing things with the program TeX, you will in any case lose, in the target file, sequences of space characters that occur at the right ends of lines of the source files. There is nothing you can do about this, because TeX discards these spaces already while pre-processing the lines of input text files, even before looking at category codes for tokenizing things.
When doing things with the program TeX, the encoding of newlines (LF, CR+LF or CR) in the target file might differ from the encoding of newlines in the source files. This is because files are processed line by line, and when a line of text is written to the target file, the newline marker is produced by TeX's writing routine. As far as I know, with current TeX distributions like TeX Live or MiKTeX you can specify the encoding of newlines when writing to a file. More info can be found in The TeX Live Guide (2021 edition) and in the MiKTeX Manual Revision 4.2.
When doing things with the program TeX, you might get ^^-notation for some characters in the target file. The TeXbook, Appendix C: Character Codes, says:
Very few conventions about character codes are hardwired into TeX:
[...]
(6) There is a special convention for representing characters 0–255 in the hexadecimal forms ^^00–^^ff, explained in Chapter 8. This convention is always acceptable as input, when ^ is any character of catcode 7. Text output is produced with this convention only when representing characters of code ≥ 128 that a TeX installer has chosen not to output directly.
E.g., in TeX Live and in MiKTeX you can provide translation files with filename extension .tcx for specifying which characters are to be represented in ^^-notation when TeX writes them to an external text file.
But I think you don't need to worry much about that with recent TeX distributions:
E.g., The TeX Live Guide (2021 edition) says in section 9.1.2, "2004":
- Almost all formats leave most characters printable as themselves via the "translation file" cp227.tcx, instead of translating them with the ^^ notation. Specifically, characters at positions 32–256, plus tab, vertical tab, and form feed are considered printable and not translated. The exceptions are plain TeX (only 32–126 printable), ConTeXt (0–255 printable), and the Ω-related formats. This default behavior is almost the same as in TeX Live 2003, but it’s implemented more cleanly, with more possibilities for customization. See texmf-dist/doc/web2c/web2c.html#TCX-files. (By the way, with Unicode input, TeX may output partial character sequences when showing error contexts, since it is byte-oriented.)
E.g., back in December 2000, the now obsolete MiKTeX Manual Revision 2.0 said in section 5.7, "TCX files: Character translations":
By default, no characters are translated, and character codes between 32 and 126 inclusive (decimal) are printable. It is not possible to make these (or any) characters unprintable.
(I did not find any statement regarding defaults for "character translation" in the current MiKTeX Manual Revision 4.2. Maybe I just overlooked it. If so, I'd be glad about a hint so that I can edit this answer.)
About file-names/directory-paths:
When specifying filenames within code that is to be processed by the program TeX (e.g., within LaTeX-documents or within .sty-files of LaTeX packages and the like) you need to cope with peculiarities that are specific to the platform/shell/operating-system/file-system in use.
- For example, permission to read the files to be merged is needed, and also permission to create/write the target file with the merged content.
- For example, in MiKTeX and in TeX Live you need to/can nest filenames/file paths in quotes (") if they contain space characters, which makes it difficult to specify filenames/file paths that contain quotes.
- For example, file systems like FAT16 don't allow spaces in file names/directory paths, while file systems like NTFS or Ext3/Ext4 do.
- For example, the Kpathsea library, which is used for path searching with Web2C-based implementations of TeX, takes a filename or a file path containing the string $& for an attempt to use a shell variable. This makes it difficult to specify filenames that contain the string $&.
- For example, / or \ is often taken by the operating system/shell in use for something that separates folder names/directory names from each other within path specifications.
- For example, in Windows, : in some places of a path specification is not taken as part of the name of a folder/directory/file but as something that terminates the specification of a file system/data volume/drive.
- For example, with many operating systems * and ? can be used as wildcards.
So, besides problems caused by using the space character within file names, we have, e.g., the characters ", $, &, /, \, :, *, ? whose usage in file names/file paths might turn out to be problematic due to restrictions imposed by the platform/shell/operating system in use, although the specification of the file system in use probably does not restrict the usage of these characters.
This may make it difficult with some platforms/shells/operating-systems to access some files in a file-system although they can be accessed by means of other platforms/shells/operating-systems.
Thank you very much. I have a little question: suppose, for example, that the content of MYfileA.tex has been changed. When I compile again, does MYfilesABCmerged.tex reflect the changes, or does it just contain the things that were compiled the first time? – IH Pro Oct 31 '21 at 12:18
If not, is there a way to give MYfileA.tex or the code above a "refresh function"? – IH Pro Oct 31 '21 at 12:20
@IHPro I have written the example so that TeX in any case is reluctant to overwrite files that already exist. In case TeX decided not to overwrite an already existing file, you should be informed about this via a message on the console/in the .log file. Feel free to modify the example so that the \IfFileExists{<file>}{<true-code>}{<false-code>} checks are omitted entirely, or so that in the \IfFileExists true branches you are asked whether the file in question shall be overwritten. Having TeX write to the terminal and read from it is a nice learning challenge. Especially if the editor in... – Ulrich Diez Oct 31 '21 at 13:14
@IHPro Especially if the TeX input editor used by you silently catches console output/terminal messages etc. for you. (That's why I tend to invoke TeX from the shell/console/command prompt.) :-> – Ulrich Diez Oct 31 '21 at 13:15
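Following the suggestion to drop the \IfFileExists check on the target file, a "refresh" variant could be sketched like this. The names \remergefiles, \remergeInFile, \remergeOutFile, \remergeInFileName, \remerge@line, \remerge@loop and \remerge@writeline are hypothetical; the logic mirrors the answer's \mergefiles and \appendthisfileloop, but the target file is opened for writing unconditionally, so it is rebuilt from the current source files on every LaTeX run. The caveat about the target file not occurring among the source files applies all the more here.

```latex
% Hypothetical \remergefiles{<comma-list of source files>}{<target file>}:
% like the answer's \mergefiles, but without the \IfFileExists check on
% the target, so the target file is rewritten on every LaTeX run.
% Caveat: the target file must not occur in the list of source files!
\makeatletter
\newread\remergeInFile
\newwrite\remergeOutFile
\newcommand\remergefiles[2]{%
  \begingroup
  \let\do\@makeother
  \dospecials % verbatim category codes for the lines that are read
  \do\^^I
  \do\^^M
  \endlinechar=-1\relax
  \immediate\openout\remergeOutFile #2 %
  \@for\remergeInFileName:={#1}\do{%
    \expandafter\IfFileExists\expandafter{\remergeInFileName}{%
      \openin\remergeInFile \remergeInFileName\relax
      \remerge@loop
      \closein\remergeInFile
    }{%
      \@latex@warning@no@line{File `\remergeInFileName' not found}%
    }%
  }%
  \immediate\closeout\remergeOutFile
  \endgroup
}
% Write the line just read (verbatim) to the target file.
\newcommand\remerge@writeline{%
  \immediate\write\remergeOutFile{\detokenize\expandafter{\remerge@line}}%
}
% Copy lines until end of file; at end of file \read delivers one
% empty line which must not be written.
\newcommand\remerge@loop{%
  \ifeof\remergeInFile\else
    \read\remergeInFile to \remerge@line
    \ifx\remerge@line\@empty
      \ifeof\remergeInFile\else\remerge@writeline\fi
    \else
      \remerge@writeline
    \fi
    \expandafter\remerge@loop
  \fi
}
\makeatother
\remergefiles{MYfileA.tex,MYfileB.tex,MYfileC.tex}{MYfilesABCmerged.tex}
\documentclass{article}
\usepackage{verbatim}
\begin{document}
\verbatiminput{MYfilesABCmerged.tex}
\end{document}
```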
@IHPro In my example I have used the filecontents* environment for creating the files MYfileA.tex, MYfileB.tex and MYfileC.tex, so that three files are available that can be used as source files for exhibiting how to create a target file where things from the source files are merged. That environment is reluctant to overwrite already existing files. There are options to that environment for changing this reluctant behavior, see source2e.pdf or the manual of the filecontents package. In your real-life scenario these files already exist, and thus the filecontents environment is not needed at all. – Ulrich Diez Oct 31 '21 at 13:23
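For the source files created in the example, here is a sketch of the option mentioned above: since LaTeX 2019-10-01 the kernel's filecontents and filecontents* environments take an optional argument, and overwrite (alias force) makes LaTeX rewrite the file on every run instead of keeping an existing copy. This does not affect \mergefiles, which still refuses to overwrite an already existing target file.

```latex
% Sketch: with the kernel's filecontents* (LaTeX 2019-10-01 or newer),
% the option overwrite (alias: force) rewrites MYfileA.tex on every run.
\begin{filecontents*}[overwrite]{MYfileA.tex}
01 file A
02 file A
03 file A
\end{filecontents*}
\documentclass{article}
\usepackage{verbatim}
\begin{document}
\verbatiminput{MYfileA.tex}
\end{document}
```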
Thank you very much, that helps me out. I will take a look at the filecontents package and at source2e too; that sounds interesting. – IH Pro Oct 31 '21 at 13:42
Fran has given a terminal command that merges all .txt files in a folder. I wonder whether LaTeX can merge all .txt files in a folder, so that LaTeX can merge hundreds of .txt files without my having to name them all. – IH Pro Oct 31 '21 at 14:43
@IHPro As far as I know, TeX/LaTeX itself can't do this. But if the \write18 feature is enabled (in this context the package shellesc might be of interest to you), you can have LaTeX pass a command to the shell/command prompt by writing to \write register 18. These days, for enabling the \write18 feature you may need to call latex from the command line with the command-line option --shell-escape, e.g., pdflatex --shell-escape test.tex. More info should be in the manual of your TeX distribution (TeX Live or MiKTeX, I suppose). – Ulrich Diez Oct 31 '21 at 14:54
@IHPro E.g., with test.tex = \RequirePackage{shellesc}\ShellEscape{echo}\ShellEscape{echo Hello IH PRO, have a nice day and happy LaTeX-ing!}\stop the command pdflatex --shell-escape test.tex delivers the echo commands to the shell, thus among other messages on the console you get the line Hello IH PRO, have a nice day and happy LaTeX-ing!. Without --shell-escape the echo commands are not delivered to the shell, and you don't get that line on the console. – Ulrich Diez Oct 31 '21 at 15:17
@IHPro If the order in which files are processed by the shell/command prompt when using wildcards like * or ? does not correspond to the order in which they are to be merged, then you need something for obtaining the (unsorted) list of filenames, sorting the filenames and delivering the sorted list of filenames to the program/command used for merging... – Ulrich Diez Oct 31 '21 at 15:31
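A sketch of the shell-escape route for merging all .txt files of a folder without naming them, assuming a POSIX shell (Linux/macOS) and unrestricted shell escape (pdflatex --shell-escape test.tex); the restricted shell escape that TeX Live enables by default does not allow cat, and the Windows command prompt has no cat. The target name alltxtfiles.merged is hypothetical; it deliberately does not end in .txt, so *.txt never matches it, not even on a rerun. A POSIX shell expands *.txt in sorted, locale-dependent order, which may or may not be the order in which the files are to be merged.

```latex
% Sketch: assumes a POSIX shell and a run with
%   pdflatex --shell-escape test.tex
% The shell, not TeX, expands *.txt (sorted, locale-dependent).
% The hypothetical target alltxtfiles.merged does not end in .txt,
% so *.txt never matches it.
\RequirePackage{shellesc}
\ShellEscape{cat *.txt > alltxtfiles.merged}
\documentclass{article}
\usepackage{verbatim}
\begin{document}
\IfFileExists{alltxtfiles.merged}{%
  \verbatiminput{alltxtfiles.merged}%
}{%
  No merged file found (was shell escape enabled?)%
}
\end{document}
```

Since the shell command runs on every LaTeX run, the merged file is rebuilt each time, so changes in the .txt files are picked up automatically.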

cat *.txt > all.txt – Fran Oct 30 '21 at 22:31
... \input inside the spreadtab environment at all. :-) (Merge the spreadtab preamble, the content of the input files and the spreadtab postamble into a single file which can be loaded via \input...) – Ulrich Diez Oct 31 '21 at 14:28
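A sketch of that last suggestion, assuming the answer's definitions of \ThisInFile, \ThisOutFile, \mergefiles and \appendthisfileloop precede it (it replaces the answer's "Let's merge" part and its document). The file names STpre.tex, STdataA.tex, STdataB.tex, STpost.tex and STtable.tex and the table contents are hypothetical: the spreadtab preamble and postamble are kept in files of their own and merged with the data files, so that the complete spreadtab environment comes from a single \input. As in the answer's example, \mergefiles does not overwrite STtable.tex once it exists.

```latex
% Sketch: place after the answer's definitions of \ThisInFile,
% \ThisOutFile, \mergefiles and \appendthisfileloop.
% All file names and table contents are hypothetical.
\begin{filecontents*}{STpre.tex}
\begin{spreadtab}{{tabular}{rrr}}
\end{filecontents*}
\begin{filecontents*}{STdataA.tex}
1 & 2 & a1+b1 \\
\end{filecontents*}
\begin{filecontents*}{STdataB.tex}
3 & 4 & a2+b2 \\
\end{filecontents*}
\begin{filecontents*}{STpost.tex}
\end{spreadtab}
\end{filecontents*}
% Merge preamble, data and postamble into one file ...
\mergefiles{STpre.tex,STdataA.tex,STdataB.tex,STpost.tex}{STtable.tex}%
\documentclass{article}
\usepackage{spreadtab}
\begin{document}
% ... so that the whole spreadtab environment comes from a single \input:
\input{STtable.tex}
\end{document}
```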