4

Another way of approaching my problem with a non-LaTeX collaborator: is there a way to dump out the text of a LaTeX document? That is, instead of making a nice PDF, just give me the raw text output, sans fancy formatting? So if I had something like:

\documentclass{article}
\begin{document}
Foo \emph{bar} baz
\end{document}

It would give me a .txt with just "Foo bar baz" in it. (I could always try to copy and paste out of the PDF, I guess...)

Canageek
  • 17,935

1 Answer

6

On Linux you can use pdftotext to output the PDF as plain text. Running pdftotext foo.pdf creates a text file called foo.txt, but this solution isn't perfect. For example, it will output the unnecessary dot leaders used in a table of contents, because it can't distinguish characters that are part of the text from ones used purely for styling.
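If you want to script this, here is a minimal Python sketch of the idea: run pdftotext on the PDF, then strip the table-of-contents dot leaders afterwards. The helper names (pdftotext_command, strip_dot_leaders, pdf_to_text) are made up for illustration, and the "four or more dots in a row" rule is only a heuristic I'm assuming is good enough; it may also eat a long ellipsis in ordinary text.

```python
import re
import shutil
import subprocess
from pathlib import Path

# Heuristic (an assumption, not part of pdftotext): a run of four or more
# dots, optionally separated by spaces, is treated as a TOC dot leader.
_LEADERS = re.compile(r"\s*(?:\.\s*){4,}")


def pdftotext_command(pdf, txt=None, layout=False):
    """Build the pdftotext argument list (hypothetical helper)."""
    cmd = ["pdftotext", "-enc", "UTF-8"]  # ask for UTF-8 output explicitly
    if layout:
        cmd.append("-layout")  # try to keep the physical page layout
    cmd.append(str(pdf))
    if txt is not None:
        cmd.append(str(txt))
    return cmd


def strip_dot_leaders(text):
    """Replace each run of dot leaders with a single space, line by line."""
    return "\n".join(
        _LEADERS.sub(" ", line).rstrip() for line in text.splitlines()
    )


def pdf_to_text(pdf):
    """Convert foo.pdf to a cleaned foo.txt next to it and return its path."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    txt = Path(pdf).with_suffix(".txt")
    subprocess.run(pdftotext_command(pdf, txt), check=True)
    cleaned = strip_dot_leaders(txt.read_text(encoding="utf-8"))
    txt.write_text(cleaned, encoding="utf-8")
    return txt
```

Calling pdf_to_text("foo.pdf") after compiling the document should leave a foo.txt beside it. This only tidies the dot leaders; it does nothing for two-column text, tables, or formulas.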

klingt.net
  • 1,176
  • 1
    It'll also fail on two-column text, tables, formulas, ... – Stephan Lehmke Nov 28 '13 at 18:41
  • All methods will fail on many formulas because formulas are 2-dimensional, while plain text is a 1-dimensional format. Also upper ASCII characters in text files depend on an encoding. I've had only bad experience with pdftotext. The programs ps2ascii and dvi2tty come with some TeX distributions. They seem to get at least the standard ASCII characters correct. Tabulars are messed up. Not sure about two-column text. – Dan Nov 29 '13 at 04:34