I have a typical scientific manuscript in a LaTeX .tex file, and I need to convert it to an MS Word .doc file. I have to convert to MS Word because I'm submitting the manuscript to an academic journal that only accepts MS Word (I know...).
The manuscript includes a title page, figures, tables, equations (inline and in their own align environment), footnotes, a bibliography, and an annex. The tables are in a separate tables.tex file, which I include using the \include{tables} command. Most tables take up a whole landscape page and were typeset using the pdflscape package. I am using Windows 7 Professional.
My plan is to use pandoc to go from .tex to .odt, open the latter in LibreOffice, and convert it to .doc. I have read a related question, but it is too general. Similarly, the examples on the Pandoc website are too simple. I have played around, but I am unable to accomplish what I want. This is surprising, since converting a scientific manuscript is probably the most common use case for Pandoc. Here are some sample failures:
Example 1
I open a command line in the project folder, and execute the following:
pandoc -s document.tex -o document.odt
I get this error message:
pandoc: figure1: openFile: does not exist <no such file or directory>
where figure1 is the name of a figure file (e.g. figure1.png) in the project folder, referenced as \includegraphics[width=5.8in]{figure1}. I suspect pandoc expects a .png extension, but I am not sure how to provide it.
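If adding the extension to every `\includegraphics` line by hand is tedious, a small filter can append it before the file reaches pandoc. This is a sketch for a POSIX shell (on Windows, e.g. via Cygwin or MSYS); the function name `add_png_ext` is hypothetical, and it assumes every figure is a PNG whose name contains no dot. Newer pandoc versions also accept `--default-image-extension=.png`, which avoids touching the source at all.

```shell
# add_png_ext (hypothetical helper): reads LaTeX on stdin and appends ".png"
# to every \includegraphics{...} argument that has no extension.
# Assumption: all figures are PNG files and their names contain no ".".
add_png_ext() {
  # Group 1 keeps "\includegraphics", any [options] and the opening brace;
  # group 3 is a file name with no "." and no "}", i.e. no extension yet.
  sed -E 's/(\\includegraphics(\[[^]]*])?\{)([^}.]+)\}/\1\3.png}/g'
}
```

Usage: `add_png_ext < document.tex > document-png.tex`, then run pandoc on `document-png.tex`. Names that already carry an extension, such as `fig2.pdf`, are left unchanged.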
Example 2
Next I try .html and execute the following:
pandoc -s document.tex -o document.html
The program executes fine. I open the HTML file. Footnotes are there, but figures are missing, tables are displayed as LaTeX, the bibliography is missing, inline math displays well but math in the align environment does not, section labels are displayed, and there are some other minor issues.
So, given that mine is probably a typical use case, my question is this: what commands should I use to get the .odt file I want? I could not find a fully worked-out example on the web.
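A minimal sketch of the .tex → .odt → .doc route described above, assuming pandoc 1.11 or later (for `--default-image-extension`) and LibreOffice's `soffice` on the PATH. It is written for a POSIX shell, so on Windows 7 it would need something like Cygwin or MSYS, or a translation into a .bat file. The function names and the `DRY_RUN` switch are hypothetical, added so the commands can be inspected before anything runs; `soffice --headless --convert-to doc` stands in for opening the file in LibreOffice by hand.

```shell
# run (hypothetical helper): executes its arguments, or only prints them
# when DRY_RUN=1, so the pipeline can be checked without pandoc installed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# convert_manuscript (hypothetical): $1 = main .tex file, $2 = .bib file.
convert_manuscript() {
  base=${1%.tex}   # document.tex -> document
  bib=$2           # e.g. biblio.bib
  # Step 1: LaTeX -> ODT; extensionless \includegraphics names get ".png".
  run pandoc -s "$base.tex" -o "$base.odt" \
    --bibliography="$bib" --default-image-extension=.png || return 1
  # Step 2: ODT -> DOC without opening the LibreOffice GUI.
  run soffice --headless --convert-to doc "$base.odt"
}
```

Usage: `DRY_RUN=1 convert_manuscript document.tex biblio.bib` prints the two commands; drop `DRY_RUN=1` to run them. With pandoc 2.11 or later, citations are only processed if `--citeproc` is added to the pandoc line.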
Here is a specific list of errors. I'll update how I corrected them based on community suggestions:
- Figures not rendering. Solved by adding the `.png` extension to the `\includegraphics` commands in the `.tex` file. Now figures are included, but they are huge, with half of each figure outside the page.
- No bibliography. Solved. I keep all my citations in one huge consolidated LaTeX `.bib` file, which I manage with JabRef. This was giving me problems, as I do not keep the cleanest `.bib` file in town. So I reduced the problem with a neat JabRef trick that lets you subset your master `.bib` file using the `.aux` file that LaTeX generates when compiling your manuscript: in JabRef, click Tools > New Subdatabase based on AUX file. This way I generated a much smaller `biblio.bib` file with only the articles referenced in my manuscript. Running `pandoc -s document.tex -o document.odt --bibliography=biblio.bib` did the trick.
- Display math. Math in the `\begin{align}` environment is displayed as verbatim LaTeX. (A partial solution is to use the TexMaths LibreOffice extension: copy the LaTeX math code from the `.odt` file created by pandoc into the equation editor, and so on. Surely this could be built into a macro that post-processes all remaining math.) UPDATE: Display math works very well using the `--mathjax` option.
- Inline math. Inline equations do not always render properly. Bold math is a problem, e.g. `$\Sigma=\sigma^2\bm{I}$` is displayed verbatim as `$\Sigma=\sigma^2\bm{I}$`.
- Labels are displayed (e.g. section labels show up as `[sec:empirical] blah blah`).
- All tables are displayed as raw LaTeX.
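Two of the open items (bold inline math and leaking labels), plus a guess about the tables, could be attacked by rewriting the source before pandoc reads it. The filters below are a sketch for a POSIX shell with `sed -E` (on Windows, e.g. via Cygwin or MSYS); all function names are hypothetical. Each filter trades information for cleaner output: `\mathbf` is upright bold, so bold Greek letters will look different; dropping `\label` leaves any `\ref` without a target; and the landscape filter rests only on the assumption that the pdflscape wrappers are part of what trips pandoc's table parsing, so it may not help at all.

```shell
# Hypothetical pre-processing filters: each reads LaTeX on stdin and writes
# the modified LaTeX to stdout.

# Replace \bm{...} (bm package) with \mathbf{...}; only the command name
# changes, so the braced argument is kept as-is.
bm_to_mathbf() {
  sed -E 's/\\bm\{/\\mathbf{/g'
}

# Delete \label{...} so labels such as [sec:empirical] stop leaking into the
# output. Cross-references to these labels will no longer resolve.
strip_labels() {
  sed -E 's/\\label\{[^}]*\}//g'
}

# Delete lines that consist only of \begin{landscape} or \end{landscape};
# the tables themselves are left untouched.
strip_landscape() {
  sed -E '/^[[:space:]]*\\(begin|end)\{landscape\}[[:space:]]*$/d'
}

# Chain all three filters.
clean_tex() {
  bm_to_mathbf | strip_labels | strip_landscape
}
```

Usage: `clean_tex < document.tex > clean/document.tex`. Because the tables live in tables.tex, that file needs the same treatment (e.g. `clean_tex < tables.tex > clean/tables.tex`), so that `\include{tables}` still resolves when pandoc is run inside the `clean` folder.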
… `\includegraphics[width=5.8in]{figure1.png}`. This is only a workaround, since pandoc should support the extensionless format (which is the recommended one) as well. – Federico Poloni, May 02 '13 at 08:44

… `--default-image-extension=.png` option (implemented in pandoc 1.11). You are probably best off trying to generate .docx output. – Charles Stewart, May 02 '13 at 12:29

… `pandoc -s document.tex -o document.html --default-image-extension=.png` and get `pandoc: unrecognized option`. The Pandoc user guide is pretty but very sparse... – Fred, May 02 '13 at 15:10

… `pandoc --version` gives the version. I think you have overambitious expectations of pandoc: it cannot understand the whole TeX language, it can only do shallow parsing of typical LaTeX idioms. If you want the full language, then you need to run a TeX engine, and Keks' suggestion of tex4ht is the right sort of approach. – Charles Stewart, May 02 '13 at 16:27

… `hlatex document` and get an error that it is not recognized. Part of the problem is that there is no definitive guide. Online I've seen claims that it works out of the box in MiKTeX; others point to long setup instructions. Hard to figure out. Any pointers? – Fred, May 02 '13 at 16:44

… `convert -density 300 in.pdf out.png` is not acceptable? ;) Have you tried going via SVG, e.g. with `pdftk in.pdf burst; for f in pg_*.pdf; do inkscape -l ${f%.pdf}.svg $f; done`? – Raphael, Sep 10 '13 at 23:29

… `tex4ht`) and I think that's the best approach. You do not want to irritate them. (Even an imperfect attempt avoids this, whereas simply sending the `.tex` file does not.) But at submission, it is an awful idea. At least in my world, they have way too many papers and want reasons to reject. – cfr, Aug 26 '15 at 01:11