
I have a LaTeX root file that refers to many other single files. Those files are included/referenced by

\input{somefolder/somefile}

But somehow Pandoc is generating the output just from the main tex file (the entry point) and does not follow the inputs. What am I doing wrong?

pandoc main.tex -t docx -o main.docx

Remark: I'm trying to import the LaTeX into Adobe InDesign by converting it to DOCX first and then importing that into InDesign.

Update: I created a minimal working example. It seems that the input files are correctly referenced, but I still get tons of other errors that seem to be related to the corporate layout, which uses TikZ drawings. The Pandoc version is 1.19.2.1 on macOS 10.12.4 (Sierra). The Pandoc command is:

pandoc booklet2016.tex -t docx -o booklet2016.docx

Matthias
  • Pandoc doesn't support full LaTeX, only a subset. You can try to compile your document to ODT using tex4ht: try mk4ht oolatex filename.tex. The generated ODT file can be converted to DOCX using LibreOffice or Word. – michal.h21 Apr 04 '17 at 13:38
  • Can't reproduce your issue. Please provide a minimal working example (MWE). Also which version of pandoc are you using? – DG' Apr 07 '17 at 10:03
  • I'm sorry that I can't provide a sample since my tex document refers to about 20 others and I'm using several complex newcommand definitions and TikZ for page layout and logos. Pandoc version is 1.19.2.1 on MacOS 10.12.4 (Sierra). – Matthias Apr 07 '17 at 10:10
  • Are you trying to extract only the contents or do you wish to keep the layout as well? – DG' Apr 08 '17 at 10:09
  • @Matthias have a look at my answer and your code. Maybe you can find the code that breaks things. – DG' Apr 08 '17 at 10:43
  • What exactly are you missing in the generated docx? I've downloaded your MWE, and the files are included, at least the parts that pandoc can interpret. As an experiment, include in every file as first line something like This is \verb"filename". (replacing of course filename with the respective filename). In the generated docx I find traces of the subfiles. But apparently commands like \lipsum and TikZ code are beyond the abilities of pandoc. – gernot Apr 10 '17 at 10:16
  • @DG: I did exactly what you describe in your post. Don't know what else to do. – Matthias Apr 10 '17 at 11:26
  • @gernot: In my MWE there was no DOCX output, since Pandoc stopped before that. And what do you mean by including every file? Is there some Pandoc config file or what? The original LaTeX project is about 50 files. – Matthias Apr 10 '17 at 11:26
  • Did you try @DG's test document below? Does file inclusion work in this simple setting? If yes, then pandoc is set up correctly to include files. With your MWE on my computer, pandoc did not crash but included all files, as far as I could check. Of course, there is not much to see in the generated docx, since much of the stuff cannot be digested by pandoc. – gernot Apr 10 '17 at 13:21
  • @Matthias I can generate a docx from your source (see update to my answer below) and I have to say the problem is much more complicated. You have two options: 1.) Break your code down in many pieces, fix them one by one until pandoc can parse everything or 2.) take the pdf and use a tool like pdfmasher to extract the content and do all the work in InDesign – DG' Apr 10 '17 at 13:23
  • I ended up creating the DOCX from PDF with Acrobat Reader. – Matthias Apr 10 '17 at 13:32
  • Are we sure we want to troubleshoot pandoc here? Wouldn't these questions belong on other sites? (Super User comes to mind) – jarnosc May 20 '23 at 12:34

4 Answers


Pandoc does include input files


If you have a structure like this

.
├── main.tex
└── somefolder
    └── somefile.tex

with the following two files

1: main.tex

% !TeX program = XeLaTeX

\documentclass{article}
\begin{document}

\section*{Test}

    Input somefolder/somefile.tex:
    \input{somefolder/somefile}

\end{document}

and

2: somefile.tex

This is somefile

then

pandoc main.tex -t docx -o main.docx

will give you a Word document that contains the contents of somefolder/somefile.tex:

3: main.docx

[Screenshot of the generated main.docx]


Bottom line: It works. If the structure of your project and/or code is more complicated then you should do some preprocessing first.
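
One possible form of that preprocessing is to flatten the project into a single file before handing it to pandoc. The following Python sketch is an illustration only and not part of the original answer: the function name `flatten` and the output name `flat.tex` are made up here, and it assumes that every inclusion is written as `\input{...}` with braces and a path relative to the directory of the root file. It does not skip commented-out lines and does not expand macros, so it will not help with TikZ or `\newcommand` problems.

```python
import re
from pathlib import Path

# Matches \input{name}. Assumption: inclusions always use braces.
INPUT_RE = re.compile(r"\\input\{([^}]+)\}")


def flatten(tex_path, base_dir=None, depth=0):
    """Return the text of tex_path with each \\input{...} replaced by file contents."""
    tex_path = Path(tex_path)
    # Resolve all inputs relative to the root file's directory.
    base_dir = tex_path.parent if base_dir is None else Path(base_dir)
    if depth > 20:
        raise RecursionError(f"\\input nesting too deep at {tex_path}")

    def expand(match):
        target = base_dir / match.group(1).strip()
        if not target.suffix:
            # LaTeX appends .tex when the name has no extension.
            target = target.with_name(target.name + ".tex")
        if not target.is_file():
            # Leave the command untouched if the file cannot be found.
            return match.group(0)
        return flatten(target, base_dir, depth + 1)

    return INPUT_RE.sub(expand, tex_path.read_text(encoding="utf-8"))
```

For example, `Path("flat.tex").write_text(flatten("main.tex"), encoding="utf-8")` followed by `pandoc flat.tex -t docx -o main.docx` would convert the flattened file instead of the root file.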


Update: Your MWE produces a docx file

[Screenshot of the docx file generated from the MWE]

As you can see, it contains the content from the included files. The trouble is that pandoc can't parse the elaborate macros (\newcommand) you are using, so there is a lot of noise and not a lot of signal.

DG'
  • Hi. Thanks for your effort. I would accept it, but my Pandoc command does not work. I uploaded a minimum working example and edited my original post. You can find the link there. – Matthias Apr 10 '17 at 09:41
  • Your code is very messy: a lot of manual formatting and new commands. If you look at the docx that pandoc produces, you can see that it correctly includes, for example, booklet2016/eu/eu_project. So the problem lies elsewhere: Pandoc can't access the content hidden in the many macros. – DG' Apr 10 '17 at 10:38
  • Sorry to say, but using newcommand macros is not messy - it's what LaTeX is about. – Matthias Apr 10 '17 at 13:32
  • Well, you are right, and I didn't mean to offend. – DG' Apr 10 '17 at 13:39
  • Since the problem is hidden somewhere, and your answer is correct according to the original question, I accepted it. Thanks for your support. Appreciate it! – Matthias Apr 10 '17 at 14:38
  • Note that \input is recognised by pandoc but not \import using the import latex package. – jmon12 Jul 19 '22 at 11:42

I answered this question before here:

I know that this is an old question, but I have not seen any answers to this effect. Essentially, if you are using Markdown and pandoc to convert your file to PDF, you can include something like this in the YAML metadata at the top of the file:

---
header-includes:
- \usepackage{pdfpages}
output: pdf_document
---

\includepdf{/path/to/pdf/document.pdf}

# Section

Blah blah

## Section 

Blah blah

Since pandoc uses LaTeX to produce the PDF, the header-includes section loads the pdfpages package. Then, when you write \includepdf{/path/to/pdf/document.pdf}, it will insert whatever is in that document. Furthermore, you can include multiple PDF files this way.

As a fun bonus, since I often use Markdown: you can also include files other than Markdown, for instance LaTeX files. I have modified this answer somewhat. Say that you have a Markdown file markdown1.md:

---
title: Something meaning full
author: Talking head
---

And two additional LaTeX files. The first, document1.tex, looks like this:

\section{Section}

Profundity.

\subsection{Section}

Razor's edge.

And another, document2.tex, that looks like this:

\section{Section}

Glah

\subsection{Section}

Balh Balh

Assuming that you want to include document1.tex and document2.tex in markdown1.md, you would just change markdown1.md to this:

---
title: Something meaning full
author: Talking head
---

\input{/path/to/document1}
\input{/path/to/document2}

Run pandoc over it in a terminal, e.g.

pandoc markdown1.md -o markdown1.pdf


I had the same problem, and the solution might be of interest to others. The working directory of my terminal was not the base directory of the LaTeX project. I thought that would not be a problem because I was working with absolute paths. In fact, processing the LaTeX files works fine as soon as the working directory matches the base directory of the LaTeX project.
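
A way to rule this out is to start pandoc from the directory that contains the root file. The Python sketch below is only an illustration under that assumption; the helper names `pandoc_command` and `convert` are hypothetical, and the pandoc options are the ones from the question.

```python
import subprocess
from pathlib import Path


def pandoc_command(root_tex, output_name):
    """Build the pandoc call and the directory it should be run in."""
    root = Path(root_tex).resolve()
    # Pass only the file name; the project directory becomes the working directory.
    cmd = ["pandoc", root.name, "-t", "docx", "-o", output_name]
    return cmd, root.parent


def convert(root_tex, output_name="main.docx"):
    """Run pandoc with the LaTeX project directory as working directory."""
    cmd, workdir = pandoc_command(root_tex, output_name)
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `convert("/path/to/project/main.tex")` would then run pandoc inside `/path/to/project`, so relative `\input` paths are looked up from the project directory, and `main.docx` is written to that same directory.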


I had the same problem when including a file with tons of LaTeX macros. I had the LaTeX include directive \include{macros} in the preamble, before \begin{document}. The problem could be easily solved by moving it into the body:

\begin{document}
\include{macros}
...
\end{document}
Ute