228

I have a typical scientific manuscript in a LaTeX .tex file, and I need to convert it to an MS Word .doc file. The reason for having to convert to MS Word is that I'm submitting the manuscript to an academic journal and they only accept MS Word (I know...).

The manuscript includes a title page, figures, tables, equations (inline and in their own align environment), footnotes, a bibliography, and an annex. The tables are in their own separate tables.tex file, which I include using the \include{tables} command. Most tables take up a whole landscape page, and were generated using the package pdflscape. I am using Windows 7 Professional.

My plan is to use pandoc to go from .tex to .odt, open the latter in Libre Office, and convert to .doc. I have read a related question but it is too general. Similarly the examples in the Pandoc website are too simple. I have played around but I am unable to accomplish what I want. This is surprising since converting a scientific manuscript is probably the most common use case for Pandoc. Here are some sample failures:

Example 1

I open a command line in the project folder, and execute the following:

pandoc -s document.tex -o document.odt

I get this error message:

pandoc: figure1: openFile: does not exist <no such file or directory>

where figure1 is the name of a figure file (e.g. figure1.png) in the project folder, referenced in a line as \includegraphics[width=5.8in]{figure1}. I suspect pandoc expects a .png extension but I am not sure how to provide it.

Example 2

Next I try .html, and execute the following:

pandoc -s document.tex -o document.html

The program executes fine. I open the HTML file. Footnotes are there, but figures are missing, tables are displayed as raw LaTeX, the bibliography is missing, inline math displays well but math in the align environment does not, section labels are displayed, and there are some other minor issues.

So given that mine is probably a typical use case scenario, my question is this: What commands should I use to get the .odt file I want? I could not find a fully worked out example on the web.


Here is a specific list of errors. I'll update how I corrected them based on community suggestions:

  1. Figures not rendering. Solved by adding the .png extension to the \includegraphics commands in the .tex file. Now figures are included but they are huge, with half of each figure outside the page.
  2. No bibliography. Solved. First, I have one huge consolidated LaTeX .bib file where I keep all my citations. I manage it using JabRef. This was giving me problems as I do not keep the cleanest .bib file in town. So I reduced the problem by using a neat trick in JabRef that allows you to subset your master .bib file using the .aux file generated by LaTeX when compiling your manuscript. In JabRef click on Tools > New Subdatabase based on AUX file. This way I generated a much smaller biblio.bib file with only the articles referenced in my manuscript. Running pandoc -s document.tex -o document.odt --bibliography=biblio.bib did the trick.
  3. Display math. Math in the \begin{align} environment is displayed as verbatim LaTeX. (A partial solution is to use the TexMaths LibreOffice extension: copy and paste the LaTeX math code from the .odt file created by Pandoc into the equation editor, and so on. Surely this could be built into a macro that post-processes all remaining math.) UPDATE: Display math works very well using the --mathjax option.
  4. Inline math. Inline equations do not always render properly. Bold math is a problem. E.g. $\Sigma=\sigma^2\bm{I}$ displays verbatim as $\Sigma=\sigma^2\bm{I}$.
  5. Labels are displayed (e.g. section labels show as [sec:empirical] blah blah).
  6. All tables display as raw LaTeX.
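A minimal sketch of how the fixes for items 1, 2 and 4 might be combined into a single run, offered as an illustration rather than a tested recipe. It assumes a pandoc recent enough to accept --citeproc (older releases handled citations through the separate pandoc-citeproc filter), and it assumes that swapping \bm for \mathbf is an acceptable stand-in for the bold symbols. The helper name preprocess and the scratch file document-pandoc.tex are made up for this example; document.tex and biblio.bib are the names used in the question.

```shell
# Sketch only. preprocess and document-pandoc.tex are hypothetical names.

# Replace \bm{...} (from the bm package) with \mathbf{...}, which pandoc's
# math parser knows. This is an assumption about what is acceptable in the
# manuscript: \mathbf is upright bold, not bold italic.
preprocess() {
  sed 's/\\bm{/\\mathbf{/g' "$1"
}

# Only run the conversion when pandoc and the manuscript are actually present.
if command -v pandoc >/dev/null 2>&1 && [ -f document.tex ]; then
  preprocess document.tex > document-pandoc.tex
  # --default-image-extension lets \includegraphics{figure1} find figure1.png;
  # --citeproc with --bibliography resolves the \cite commands.
  pandoc -s document-pandoc.tex \
    --default-image-extension=.png \
    --citeproc --bibliography=biblio.bib \
    -o document.docx
fi
```

Writing to a scratch copy keeps the original .tex file untouched, so the manuscript still compiles to PDF with the bm package as before.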
Fred
  • Try with \includegraphics[width=5.8in]{figure1.png}. This is only a workaround, since pandoc should support the extensionless format (which is the recommended one) as well. – Federico Poloni May 02 '13 at 08:44
  • The need to convert to Word has often been discussed on this forum. We found surprisingly many solutions to avoid conversion. So if you tell us why you want to convert your manuscript, we could look for an alternative. That said, I suggest using tex4ht to convert your manuscript to *.doc or whatever. – Keks Dose May 02 '13 at 09:16
  • Try invoking Pandoc with the --default-image-extension=.png option (implemented in pandoc 1.11). You are probably best trying to generate .docx output. – Charles Stewart May 02 '13 at 12:29
  • @KeksDose Thanks for your comment. I added a line to the question explaining the rationale. Please take a look at the first paragraph. – Fred May 02 '13 at 15:05
  • @CharlesStewart Thanks. Saw that in the Pandoc manual but I could not find any examples of it actually being used. Could you kindly write down the full command? I've tried pandoc -s document.tex -o document.html --default-image-extension=.png and get pandoc: unrecognized option. The Pandoc user guide is pretty but very sparse... – Fred May 02 '13 at 15:10
  • Fred: What version are you using? The option was only implemented in March, with version 1.11. – Charles Stewart May 02 '13 at 15:40
  • @CharlesStewart I saw that too. To be honest I am not sure. I had an older version installed. Then I downloaded the latest version and installed it. I imagine that deleted the previous one but I am not sure. (As you can see I'm no expert.) I looked for a version command in pandoc but could not find one. – Fred May 02 '13 at 15:50
  • @CharlesStewart I am using version 1.11.1 now. In any case I added the .png extension to the main .tex file as suggested by Federico Poloni above. That worked. However, the math is still a mess, and no citations. – Fred May 02 '13 at 15:57
  • Fred: pandoc --version gives the version. I think you have overambitious expectations of pandoc: it cannot understand the whole TeX language, it can only do shallow parsing of typical LaTeX idioms. If you want the full language, then you need to run a TeX engine, and Keks' suggestion of tex4ht is the right sort of approach. – Charles Stewart May 02 '13 at 16:27
  • @CharlesStewart Yes, I looked at tex4ht but it is not very intuitive and does not appear to be well supported. For example, I install the packages using the MiKTeX 2.9 package manager. I open a CMD line in the project folder. I enter hlatex document and get an error that it is not recognized. Part of the problem is there is no definitive guide. Online I've seen claims it works out of the box in MiKTeX; others point to long setup instructions. Hard to figure out. Any pointers? – Fred May 02 '13 at 16:44
  • I guess convert -density 300 in.pdf out.png is not acceptable? ;) Have you tried going via SVG, e.g. with pdftk in.pdf burst; for f in pg_*.pdf; do inkscape -l ${f%.pdf}.svg $f; done? – Raphael Sep 10 '13 at 23:29
  • Submit a pdf file. Wait until the manuscript is accepted for publication. Send the tex file, and ask the editors to sort it out for you. See what they say. Only if they refuse should you try to convert the thing yourself. Many publishing houses have technical staff who can play with conversion software, and secretarial staff who can correct a poorly converted manuscript. – Benjamin McKay Oct 12 '13 at 09:57
  • @BenjaminMcKay This is terrible advice. If they only accept Word, they only accept Word. Either you do your best to convert it or you don't submit there. It is a bit different if they are willing to accept PDF for submission and only insist on Word later. Even then, I do my best to convert (with tex4ht) and I think that's the best approach. You do not want to irritate them. (Even an imperfect attempt avoids this whereas simply sending the .tex file does not.) But at submission, it is an awful idea. At least in my world, they have way too many papers and want reasons to reject. – cfr Aug 26 '15 at 01:11
  • Niceties like equations won't be converted into anything remotely editable... – vonbrand Jan 13 '16 at 01:52
  • Foxit Phantom PDF to word works great! – Farid Cheraghi Sep 18 '17 at 01:27
  • I'm not allowed to comment yet, but the way I did it: Convert to PDF -> Upload to Google Drive -> Convert to Doc -> Download as docx. – shredding Jan 31 '21 at 19:00

15 Answers

145

I tried nearly all methods mentioned in other answers.

Eventually, and surprisingly, I found the most satisfactory way to convert is to just open the PDF file in MS Word (2013 or newer), which retains most of the layout. You will, however, lose the hyperlinks of cross-references.

Yebo Liu
  • Remarkably, this appears to be the simplest and most satisfactory attempt for me, too! However, I used LibreOffice instead of MS Word to open the PDF. Not extremely smooth, but the output looks OK. However, it is in the form of ODG graphics, for which I now seek a converter to an rtf/doc file... – Weather Report Nov 08 '15 at 08:24
  • this is by far the simplest method and for all the methods I tried, gives best results (comparatively, of course). – mcy Aug 05 '16 at 12:27
  • This did not work for me using Microsoft Word for Mac v15.32. It was asking for the type of text conversion to use and they all just looked like binary file junk. – Nicholas G Reich Jun 07 '18 at 19:50
  • this should be the accepted answer – Stefano Lombardi Nov 02 '18 at 00:33
  • If you're unable to open the PDF in Word directly, another option is to open in Adobe and: File -> Export To -> Word – FlacoT Feb 12 '19 at 15:28
  • wonderful, this is the easiest and best method I believe. thanks – Vinod Kumar Chauhan Jul 18 '20 at 10:57
  • I wanted to extract several complicated tables from a pdf paper. I couldn't believe my eyes. After many trials with different tools, including Acrobat, this is the best method! (word 2013). – Simon Dispa Aug 21 '20 at 15:53
  • Which font works the best for pdf conversion in Word? If I use the default LaTeX font (computer modern), Word removes hyphens from some hyphenated words. Do you know if Word pdf conversion works better with other fonts? Thanks! – Richard Herron Aug 26 '20 at 19:24
  • If it's a draft copy, just stack all the images and tables towards the end of the document. Works like a charm. – Kunal Tiwari Dec 09 '20 at 14:08
  • This is indeed a simple way. But the question is specifically about using pandoc to accomplish this. – Fred Jan 08 '23 at 03:30
  • Word indeed does a very good job. Is there any way to run the conversion from the command line? Something like WINWORD.exe --input myfile.pdf --output converted.docx – Saaru Lindestøkke Mar 25 '24 at 09:49
37

I gave up on pandoc for almost exactly the same reasons you listed.

If you are set on using pandoc, the simplest solution may be to just identify the environments and packages that cause trouble, and then either not use them or type the offending material directly into MS Word.

I've had a fair amount of luck with going to word documents using latex2rtf to create an .rtf that then gets converted, rather than going through pandoc. As I wrote in Hide output, but maintain the cross-references, my solution has been to put a very tight cap on the packages that are used when creating a tex document that you know will be converted. This is because a lot of problems with conversion from .tex to .rtf are caused by optional packages and environments that are not supported.

See https://github.com/AndyClifton/AccessibleMetaClass for a demo of a class that gives you a file that can be converted with latex2rtf to .rtf and thus to .docx. Bonus: this class almost(!) gives you a tagged PDF that passes automated testing for tags (the fabled 508 compliance).

Andy Clifton
  • Thanks, looks interesting! I will try it and see if it resolves all the errors with Pandoc I mentioned. One can think of my question as a specification, or unit test. Only answers that pass the test can be accepted. – Fred Jan 06 '15 at 19:22
  • @Fred Very likely your question is unanswerable then, since it requires use of a tool to achieve X which is really not designed to do X. tex4ht works fairly well for me, although some clean up is generally needed. And the same recommendations apply about limiting packages carefully. – cfr Aug 26 '15 at 01:17
  • pandoc did not work for me for the citations, but latex2rtf -o output.docx input.tex worked like a charm. – Sudipta Basak Sep 28 '15 at 04:19
  • Actually, as of November 2017 I switched completely to Pandoc because it's now doing an excellent job of conversion (again, when a limited set of packages are used) compared to latex2rtf. – Andy Clifton Jan 11 '18 at 12:46
26

LaTeX2rtf is the easiest and fastest way to convert .tex files to .rtf that can be read by Microsoft Word. Using it is as simple as downloading the program, choosing your .tex file, and pressing run. A command window will open up to display the progress and warn of any errors. In most cases the default settings will be sufficient and despite errors it can usually output something useable.

For more information you can find the project on Sourceforge at http://latex2rtf.sourceforge.net/

Best of all, it is open source and actively maintained.

Kevin Morse
TPArrow
15

Pandoc's LaTeX importer may not handle every input very well, but when you go via Pandoc's markdown format, which maps basically one-to-one to Pandoc's internal document representation, you have precise control over the output.

  1. Convert .tex to markdown: pandoc document.tex -o document.md
  2. Manually clean up the generated markdown file. Pandoc's extended version of markdown has a surprising number of features, including math, tables, footnotes and citations using .bib files.
  3. Convert the markdown to Word: pandoc document.md -o document.docx
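The three steps above could be scripted roughly as follows. This is a sketch under assumptions, not part of the original answer: tex_to_md and md_to_docx are hypothetical helper names, and the manual clean-up in step 2 still has to happen in a text editor between the two calls.

```shell
# Sketch only. tex_to_md and md_to_docx are hypothetical helper names.

# Step 1: LaTeX to pandoc's markdown.
tex_to_md() {
  pandoc "$1" --from=latex --to=markdown --output="$2"
}

# Step 3: cleaned-up markdown to Word.
md_to_docx() {
  pandoc "$1" --from=markdown --output="$2"
}

# Only run when pandoc and the manuscript are actually present.
if command -v pandoc >/dev/null 2>&1 && [ -f document.tex ]; then
  tex_to_md document.tex document.md
  # Step 2 is manual: fix tables, math and citations in document.md first.
  echo "Edit document.md, then run: md_to_docx document.md document.docx"
fi
```

Keeping the two conversions as separate functions makes it easy to rerun only the markdown-to-Word step after each round of manual edits.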
mb21
  • This solution works almost fine except for the float objects and references. In my case I have to correct them manually, while thinking about a workflow that makes it easier to get a final version in LaTeX and another one in Word for the boss. – Aradnix Jun 09 '15 at 18:34
  • I can't recall markdown having page breaks, so it just wouldn't work. – Hi-Angel Jul 28 '15 at 18:59
12

I write my APA 6th papers with LaTeX and export them in all their beauty to PDF. Normally this is all I need. Sometimes publishers ask for Word files (why, I don't know...). So I went searching for a decent PDF to Word converter, since SimpleTeX4ht has table issues and I need tables a lot. The only converter I am satisfied with is PDF to Word + by Lighten Software Limited for Mac. The docx generated has NO differences from the PDF output and is perfectly editable. This works much better than these tex to xxx converters. http://www.lightenpdf.com/pdf-to-word-converter-mac.html http://www.lightenpdf.com/pdf-to-word-converter.html

Bastian
  • I tried this in demo mode, and it converted each line of a paragraph into its own paragraph. It might work well for tables, but not really usable for a big document, although I guess you could try search/replace... – beroe Jan 06 '15 at 20:13
  • The software doesn't even seem to be real. I tried it with various settings in both stable and dev Wine. It just always crashes on startup. I thought it was a bug in Wine, so I installed WinXP in VirtualBox, and the app still doesn't even start. – Hi-Angel Jul 27 '15 at 18:18
  • I just tried it on a mac and it seemed to work really well. Pretty steep price to purchase, though. – Adam_G Oct 13 '15 at 17:19
10

I haven't tried pandoc. There doesn't seem to be a definitive solution. I usually try one of these and see which works. Here is how to convert to Word using tex4ht.

mk4ht oolatex document.tex

This should produce document.odt. You can convert this to Word using OpenOffice.org/LibreOffice. A short tutorial is here. Unfortunately, this may fail for long documents. Another option is to first compile with latex (not pdflatex) and then run

latex2rtf document.tex

and open the resulting document.rtf directly in Word.
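The two routes above can be sketched as a small script. Treat it as an illustration under assumptions: make_rtf and odt_to_docx are hypothetical helper names, the soffice call assumes a LibreOffice install with soffice on the PATH, and the latex/bibtex passes are there because latex2rtf reads the .aux and .bbl files for cross-references and the bibliography.

```shell
# Sketch only. make_rtf and odt_to_docx are hypothetical helper names.

# Route 2: compile with latex (not pdflatex) so that the .aux and .bbl
# files exist, then let latex2rtf write <name>.rtf next to the source.
make_rtf() {
  base=${1%.tex}
  latex -interaction=nonstopmode "$base.tex" || return 1
  # bibtex fails when there are no citations; that is not fatal here.
  bibtex "$base" || echo "bibtex reported a problem (no citations?)" >&2
  latex -interaction=nonstopmode "$base.tex" || return 1
  latex2rtf "$base.tex"
}

# Route 1 follow-up: convert the .odt produced by mk4ht oolatex to .docx
# without opening the LibreOffice GUI.
odt_to_docx() {
  soffice --headless --convert-to docx "$1"
}

# Only run when the tools and the manuscript are actually present.
if command -v latex >/dev/null 2>&1 && command -v latex2rtf >/dev/null 2>&1 \
   && [ -f document.tex ]; then
  make_rtf document.tex
fi
```

After mk4ht oolatex document.tex has produced document.odt, calling odt_to_docx document.odt should write document.docx into the current directory.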

For long documents, I have had good results by converting the output pdf to Word. There are quite a few pdf to Word converters (Adobe Acrobat, online converters, or other freeware) that work pretty well.

Sameer
  • I couldn't get latex2rtf to install (linking problems) on a mac, and the fink version seems to be gone. Thanks though for the quick tutorial on using oolatex, I hadn't realised it was an argument to mk4ht – Joanna Bryson Sep 12 '14 at 13:14
9

I'm not sure you want to read this.

If you are forced to use MS Word for your work, then you had better use Word to write it.

LaTeX typesetting is much better than anything Word can do. So any conversion from LaTeX to Word, if it is possible at all, will disappoint you with its quality.

Why do you want to do the same "work" twice: first writing your manuscript in LaTeX, then completely reworking the Word file?

JPi
Mensch
  • professor prefers MS word, but I love Latex structure. – Tawei Oct 12 '13 at 01:35
  • Sometimes you have to share an editable work with other colleagues who don't use LaTeX... – G M Dec 19 '13 at 11:26
  • To the downvoters: Please explain why you downvoted this answer! – Mensch Feb 17 '15 at 11:05
  • I'd down vote this because the question is not "I'm thinking of writing a paper, should I use word or LaTeX." The question is "I have written a paper in LaTeX already (maybe years ago). How can I convert it to Word (for some new purpose)?". – rjmunro Apr 20 '15 at 12:33
  • I agree with rjmunro: the answer doesn't fit the question. Plus, some people hate Word so much that they don't want to touch it, including me. – PLG Apr 29 '15 at 10:49
  • Agree with the haters. I'd rather use LaTeX and deal with conversion than have to continuously deal with Word which I can't even afford and would need a different operating system to run. (I hate LibreOffice almost as much.) – cfr Aug 26 '15 at 01:15
  • When you are forced to use Word for work, your employer can provide the latest Word version to open the .pdf file. I am a Word hater; for a big report, I prefer writing in LaTeX first and then converting it to Word, which saves a lot of time and effort. – biajia Dec 16 '20 at 22:43
  • Instead of accusing us of infidelity to LaTeX, go and protest in front of all those journals that require submissions in Micro$oft Word. Do you think we like converting beautiful documents into ugly, unstable, unstructured garbage? – yannis Feb 12 '22 at 18:42
5

I've found this free online pdf converter to be superior to Word's PDF conversion tool. They both make a mess of equations, but pdf2docx retains document formatting and references much better than Word's conversion tool.

ABM
3

Indeed, there is an almost perfect way to do so, but you have to pay for it: the solution is Tex2Word. To get the best results, first change to a basic document class, e.g. article, and avoid using self-defined styles. If you are using bibtex, then just copy the content of the .bbl file into the .tex file. Finally, open your .tex file with MS Word (Yes! It is so easy!). All the equations, images and cross-references will be translated into MS Word! The equations are native MS Word equations or MathType rather than images. I would say that there is no better solution now.

2

Going via html using the --webtex option may solve your inline maths issue. It won't do anything about cross-referencing (the general case of your issue 5).
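As a rough sketch of what that might look like on the command line, with the --mathjax alternative mentioned in the question alongside it. The helper name tex_to_html is made up for this example; --webtex turns each formula into an image fetched from a web service, while --mathjax leaves the TeX in the page for the browser to render.

```shell
# Sketch only. tex_to_html is a hypothetical helper name.

# $1 is the .tex file, $2 is the pandoc math option to use
# (--webtex or --mathjax). Writes <name>.html next to the source.
tex_to_html() {
  pandoc --standalone "$2" "$1" --output="${1%.tex}.html"
}

# Only run when pandoc and the manuscript are actually present.
if command -v pandoc >/dev/null 2>&1 && [ -f document.tex ]; then
  tex_to_html document.tex --webtex
  # Alternative: tex_to_html document.tex --mathjax
fi
```

The --webtex output is the more likely of the two to survive being opened in a word processor, since the equations are ordinary images, but they will not be editable.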

Chris H
2

You can convert the resulting pdf document to word. That is another option without having to work with the TeX file directly.

Romain Picot
sathutu
  • Can you expand more? I do not use word for example and so I do not know how to convert pdf to word format – Romain Picot Feb 22 '16 at 07:56
  • Acrobat and some other premium PDF software can do. However, results are usually not very pretty. – mcy Aug 05 '16 at 12:28
2

All, I am still using the Mac version of Acrobat XI (version 11.0.23, but I don't think this matters). It does a very good job of duplicating the LaTeX-produced pdf into Word. Does it look as good as the real thing? Noooo. Of course not. But it suffices. BUT, each time I do this, I write a personal note to the journal Editor (both Executive Editor and the Action Editor for my manuscript if these are different people) and explain the beauty and glory of LaTeX while urging them to bring this up with the publisher. Note that Overleaf now provides a long list of journals that let you submit your .tex and all other necessary files directly from Overleaf (you don't have to think about this, Overleaf does the thinking for you). It works!

Wayne
1

Would also like to have a simple pandoc solution, but it seems that with a workaround there is less work and better results.

The best solution I have found is Adobe's PDF to Word converter. Not all equations display as desired, but otherwise the formatting is almost identical.

0

I have tried almost all methods mentioned here, but none of them satisfied me.

The only satisfactory solution for converting LaTeX to MS Word I've encountered is GrindEQ (Windows only). Yes, it is shareware, but you can try it 10 times before purchasing.

0

I write my homeworks in LaTeX using exam.sty (makes creating a key easy). My homeworks include figures and these are not well placed by the above solutions. I used to use GrindEQ (it's a great product if you work in Windoze), but here's a solution that works for me on Linux using the docs.google.com website.

  1. Create pdf using pdflatex
  2. Create new doc file on docs.google.com
  3. Use File->Open->Upload to upload pdf in step 1. This will insert your pdf file in an editable and well laid out form in your open google doc.
  4. Download file created in 3. as a docx

This approach works well for me right now. I'm not doing much math typesetting, however.

I haven't done a file with bunch of math, but