214

I often have to write up reports based on the analysis of some data. I use R to analyse the data and export tables, figures, and text. This is then included into a LaTeX document either using input or Sweave (see here for details).

However, when I collaborate with others, I sometimes need to provide a document in Open Office / MS Word format.

Question:

Thus, assume the simplest scenario

  • I have a LaTeX document with text, tables, and figures
  • I need to export this reliably into Open Office or MS Word format (this includes mathematical formulas, table formatting, and quality figures)
  • I don't need to go back from MS Word to LaTeX

What is a reliable, efficient, and preferably free process?

Initial Thoughts:

I was hoping that there is an expert out there who has worked out a good system already.

Jeromy Anglim
  • 158
    In an ideal world the workflow would be "1) tell your collaborators to use LaTeX 2) live happily ever after" – Seamus Oct 15 '10 at 10:43
  • 3
    There are already 4 answers here: the bounty implies that none are what you are after. Could you elaborate a bit on what is needed beyond what's already been said? – Joseph Wright Oct 24 '10 at 20:24
  • @Jeromy: Would you like to ask this question on SO? I'd be happy to support it with a bounty in a couple of days if you do. @Joseph: From my point of view, none of the answers give any sense of how good a job they do, and only one of them gives evidence of awareness of the fact that the last eight years have seen three major versions of Word, each with different capabilities for supporting translation to/from LaTeX. – Charles Stewart Oct 24 '10 at 21:15
  • @Charles Thanks for the suggestion. The existing answers are helpful. I might see whether any additional answers come over the next few days. – Jeromy Anglim Oct 25 '10 at 00:22
  • @Joseph The existing answers are useful. I was hoping to get a clearer recommendation regarding what is most simple, reliable, and effective. I can and will play around with the options proposed. – Jeromy Anglim Oct 25 '10 at 01:05
  • 4
    @Jeromy a comparative assessment of some of the options is given here – Seamus Jan 26 '11 at 16:04
  • 1
    This is really a comment, not an answer, but since I haven't got the points to comment... I am really getting sick and tired of this issue. I waste weeks of my time writing these manuscripts with MS Word, because my coauthors want to have their Word documents to use that Track Changes feature they so much love. Word is throwing my pictures all over the document, cross-references mess up, Endnote crashes and adds some weird stuff in the document just before I am supposed to send my ms for a review. Man, I am really close to throw my laptop out of the office window, curse all MS products to hell – Mikko Apr 03 '12 at 10:00
  • 5
    Try htlatex with the following command line: htlatex main.tex "html,word" 'symbol/!' "-cvalidate" and see if this gets you started. Look for a file main.html, Word should be able to load it. – krlmlr Apr 03 '12 at 13:56
  • 1
    Thanks! I have to say that I haven't used DOS since 1995 or something (I have a Windows 7 machine at work). I found the folder with my tex file in Explorer, shift + right clicked it and clicked "Open command window here". Then I pasted in the code you provided (changed the file name of course). The result is much better than in my earlier trials. One rather basic document is translated to an MS Word document rather well. However, the command doesn't like title pages, nor does it seem to like my Sweave-created tex files. This might be sufficient for a ms, but I am still looking for the perfect solution... – Mikko Apr 03 '12 at 14:49
  • Yet another option: convert the resulting PDF to MS Word. – gerrit Feb 04 '14 at 14:54
  • As in Aditya's answer a markup language may be a solution. Org-mode exports directly to both latex and odt, and also support execution of code (like Sweave). Math is inputted as LaTeX. I have produced fairly advanced documents (equations, figures, tables) that are of good quality when exporting as both odt and latex. Citations are a hurdle in this setup ATM but work is being done in this area. – Rasmus Apr 08 '15 at 19:02
  • 1
    In the event that you do want to convert back to Latex, the answers at http://tex.stackexchange.com/questions/16367/convert-tex-to-non-tex-and-back give possible solutions. – Charles Stewart Oct 07 '15 at 10:25

27 Answers

70

I implemented this for a large R&D lab. We produced several hundred (if not thousand) documents per year, and the LaTeX Users' community there wanted to be able to produce documents using 'tex as well as WYSIWYG software.

The OP was right in that a well-defined workflow is essential. Part of this is the process, but you may also need to think about training and using a common repository, and how to implement corporate design.

Process

We implemented a process that allowed people to work in LaTeX and then switch to .docx for collaborators.

  1. Define a class file that contains the correct formatting, etc, using article, report or book classes. Include the minimum number of up-to-date packages in the class and add the nag package to make sure that you (and other users) can see that those packages are not deprecated.
  2. Create a template showing how to use the class file
  3. Create an SVN (or git, or whatever) repository for the class and template files, and distribute the URL of the repository to LaTeX users
  4. Create documents using the lab-standard class file
  5. Convert the tex files to .docx using Pandoc, which works on Windows, Mac, and Linux
  6. Get edits and peer reviews done on the .docx
  7. Transfer edits from the .doc or .docx document back into 'tex manually, and complete the PDF production in LaTeX.
  8. Tag the document using Adobe Acrobat for Section 508 compliance (accessibility).

N.B. Using one of the web-based editors like sharelatex.com or overleaf.com can remove the need for steps 5-7, especially now that they have rather good review tools.
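
The Pandoc conversion step in the process above can be scripted. What follows is only a minimal Python sketch, not the lab's actual tooling: it wraps the plain `pandoc -s input.tex -o output.docx` invocation quoted elsewhere on this page. The function names `pandoc_command` and `convert_to_docx` and the file name `report.tex` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(tex_file: str) -> list[str]:
    """Build the argument list for a standalone LaTeX -> .docx conversion."""
    docx_file = str(Path(tex_file).with_suffix(".docx"))
    # -s asks Pandoc for a standalone document; -o names the output file,
    # and Pandoc picks the .docx writer from the file extension.
    return ["pandoc", "-s", tex_file, "-o", docx_file]


def convert_to_docx(tex_file: str) -> bool:
    """Run Pandoc if it is installed; return True when the conversion ran."""
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH; nothing converted")
        return False
    subprocess.run(pandoc_command(tex_file), check=True)
    return True


if __name__ == "__main__":
    # "report.tex" is a placeholder file name.
    print(" ".join(pandoc_command("report.tex")))
```

Keeping the command construction separate from the call that runs it makes it easy to check what would be executed before pointing it at a whole repository of documents.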

Challenges

There were a couple of challenges we had to face to get this adopted.

  1. Getting the editors and reviewers something that fit with their existing process, hence the use of the .docx format
  2. Figuring out how to get the same class file(s) to all users, hence the SVN repository
  3. Making sure people know how to use it, hence the template
  4. Figuring out tools that let people collaborate. But that's a whole other post!

508 Compliance / Structured PDFs

The one thing that is still causing trouble is 508-compliance. I have been working (slowly) on using the pdfcomment package to add tooltips and modifying the accessibility package so that documents are accessible. My test PDF documents sometimes pass automated testing in Adobe Acrobat...

Repository

I've put a set of demo documents in a Github repository which may be helpful.

Note re. Pandoc

3 Dec 2017: Originally I suggested the use of latex2rtf instead of Pandoc. I am now editing this answer to suggest Pandoc as I find Pandoc is kept up to date, works well, and I like the flexibility to choose from many more input and output file types.

Andy Clifton
  • Nice answer - although I'm not sure it effectively goes beyond @frabjous's answer-mention of latex2rtf, which would be the real meat of the answer here. – Sean Allred Apr 18 '13 at 16:40
  • Thanks. I thought the OP's request for "... a reliable, efficient, and preferably free process" hadn't been answered, which is what I tried to contribute. – Andy Clifton Apr 18 '13 at 17:03
  • 1
    No worries! That's exactly the idea on any SE site. Answer what (you think, but maybe ask comments for clarification) was asked, and especially on questions like this (since everyone has their own workflow, may the best one float to the top :)). – Sean Allred Apr 18 '13 at 17:06
  • latex2rtf generates 15 blank pages for my document :-( – gerrit Feb 04 '14 at 16:10
  • 1
    As of November 2017 I'm still using this process but have switched from latex2rtf to Pandoc 2.0 for the conversion to .docx. Results are excellent. – Andy Clifton Dec 03 '17 at 08:47
  • As I see it, there is no DOCX export capability in Overleaf. – Suncatcher Mar 29 '18 at 12:06
  • @Suncatcher - correct. What I meant was that some reviewers are happy to comment in web-based tools, and some Editors are happy to do corrections directly in those tools. That means that you potentially no longer need to export / convert to .doc(x) at all and can stay in latex for the review / edit process. This is very dependent on your institution, though. – Andy Clifton Mar 29 '18 at 14:35
  • 1
    upvoting pandoc: it's converted my LaTeX to word, including internal cross-references, hyperlinks, setting the biblatex bibliography as Vancouver, and using a distinct font for the {quote} environment. – Colin Rowat Oct 12 '19 at 12:43
47

I think that LaTeX is the wrong starting format, especially if you are generating your input file using Sweave. Instead you can consider using a light-weight markup (Markdown, RST, etc) as a starting format. It will be much easier to convert these formats to both LaTeX and OpenOffice (for example, using Pandoc). As an example, see this sweave file which is written in Markdown. I processed it using sweave, did a bit of post-processing, and then used Pandoc to convert it into ConTeXt. Since the file after post-processing is completely in Markdown format, converting it to OpenOffice should not be a problem.

Nikos Alexandris
Aditya
  • 15
    I guess the issue is that LaTeX provides so much desired functionality around equations, bibliographies, cross referencing, and so on, that I don't believe are provided by most other markup languages, and certainly not markdown. – Jeromy Anglim Apr 03 '12 at 11:38
  • 5
    @JeromyAnglim: the Markdown that Pandoc uses is extended and perhaps better called Pandoc’s Markdown. – morbusg Apr 03 '12 at 12:30
  • What would you recommend as a starting format using Pandoc? Markdown? – Dr. Manuel Kuehner Mar 12 '14 at 08:19
  • It would be great to get a differential solution code here too as an example. Now, this solution feels like a replication of others. – Léo Léopold Hertz 준영 Jan 06 '17 at 11:03
  • I'll add my comment here - the by far best result I've had getting from LaTeX to .docx was with Pandoc, using the command from the example list: pandoc -s document.tex -o example5.docx

    I'm open to do the writing in Markdown, but in the physical sciences writing directly in LaTeX is probably my relative sweet spot, as I want formatting that LaTeX offers but Markdown/RST doesn't.

    – chryss Sep 15 '17 at 21:35
30

The new version of Word (2013) lets you open and edit PDFs. The workflow is then:

  1. Use latex and pdflatex to make your PDF
  2. Open the PDF in Word 2013
  3. Save as docx
  • 5
    Wow. I tried this; the result was on a different level than the open source or online tools I've tried, starting from LaTeX or PDF. It doesn't let you edit the PDF; rather, it converts it to an editable normal Word document. Extraordinary. My tables become tables; text is nicely paragraphed. The equations have even been partly rendered. Congratulations to Microsoft (first time I've ever said that). I tried LibreOffice 5, which also tries the same thing, but the result is useless for me; text is not even joined into paragraphs. – CPBL Aug 06 '15 at 16:37
  • I tried on a Mac with my version of Microsoft Word (2016) but I couldn't get anything readable... What did you choose when they asked how to open the file? – Claire Boitet Oct 25 '18 at 13:25
  • 2
    @ClaireBoitet make sure you open the PDF by first starting Word, then using the Open File dialogue within Word. Word should display a conversion message while the PDF is being converted. I haven't used this on Mac but I don't know why it would be different. – Andrew Olney Oct 26 '18 at 14:35
  • Which font works the best for pdf conversion in Word? If I use the default LaTeX font (computer modern), Word removes hyphens from some hyphenated words. Do you know if Word pdf conversion works better with other fonts? Thanks! – Richard Herron Aug 26 '20 at 19:24
  • I do not know if some fonts are better. In general it seems to work rather well. Whitespace seems to more often be a problem for me (so words run together) than any problems with hyphens. – Andrew Olney Aug 27 '20 at 20:43
29

I think these two programs are missing from the list.

  1. TeX2Word from Chikrii Softlab

  2. LaTeX-to-Word from GrindEQ

Both of them work elegantly for a properly written LaTeX file. They also offer packages for Word-to-LaTeX conversion, which are again excellent. Unfortunately, neither of them is free.

20

I found a very easy solution for converting LaTeX-documents into editable Word-files.

  1. Compile your LaTeX-document to PDF
  2. Go to the Internet-page http://pdftoword.com/
  3. Upload your PDF and wait until the Word-file arrives.

I have only tested the site with text files (no graphics or formulas), but it converted a complex contract in Norwegian (æøå) to a pretty exact copy. You lose the structure (no styles, only direct formatting), but it works if you need to send a Word file for proofreading etc.

I suggest setting the text ragged-right in LaTeX. This turns off hyphenation (so do not use ragged2e, which re-enables it), and the Word document will be easier to edit.

Of course, later you have to merge any changes to your LaTeX-source, but still it is better than retyping the document.

For the sake of good order: I have no connections with Nitro Software, I do not even own a copy of their program.

Sveinung
  • 6
    Thanks. There seems to be quite a few of these online pdf to doc convertors. I've had mixed success with them especially once graphics, equations, headers and footers and so on get introduced. – Jeromy Anglim Jan 16 '12 at 22:18
  • Adobe Acrobat now has a pretty good PDF to word converter. Acrobat will cost you a bit, but it's worth a shot if you have access to it. – Kevin Mar 11 '14 at 13:27
  • Works great, thanks. My file does not include formulas, but formatting, graphics, footnotes and bibliography were converted very well. The trial version allows for 5 conversions per email. – Dennis Golomazov Jun 25 '14 at 05:24
  • The advantage of this tool (and also the problem) is that the Word document looks almost exactly the same as the PDF. I mean, all symbols are in the same places. This is also a problem because the hyphenation signs are translated into dashes. This means that if the Word document will be edited (e.g. by a publisher), they'll get a lot of unwanted dashes in the middle of the lines. – Dennis Golomazov Jun 25 '14 at 05:46
  • 1
    @DenisGolomazov It is easy to get rid of the hyphens: Go to Search/Replace (Ctrl+H). Search for "-^p" (^p represents a new paragraph; you will also find it as a choice in the dialog box). Replace with nothing, and voilà: the hyphenated words are concatenated into one word again. Alternatively, if you set the text ragged-right in LaTeX, you will not have hyphenation. – Sveinung Aug 10 '14 at 16:38
  • 1
    pdftoword.com has limits on page size. smallpdf.com/pdf-to-word is an alternative choice. – SparkAndShine Sep 07 '16 at 09:04
  • I'd be worried about the security implications of downloading and then viewing word documents from random internet services. Word documents can contain executable code. – Parthian Shot Nov 11 '17 at 06:07
  • Worked much better than pandoc for my document. – Virgo Dec 10 '17 at 21:19
19

There is no pain-free way to do this. Really.

Convert your beautiful TeX to pdf, run pdftotext and then import the plain text into a word processor. Recreate all of the tables and equations by hand. Waste days of your life in order to be "compatible" with chumps who don't care about typography until, finally, you decide to stop working with them. Only then will you find inner peace.

16

My first instinct would probably be oolatex too, or some other technique using TeX4ht, but another method that can also work well is latex2rtf, though I've had the best luck when I tell it to convert formulas, tables, and other complicated stuff to embedded images in the result: obviously, this isn't a great option if the people you're sending them to need to be able to edit those formulas, etc. (But fine if they only need to read and comment.)

frabjous
15

Several people have mentioned tex4ht but didn't give the command. From my looking around it seems that the command to run is mk4ht oolatex myfile.tex and you should get a .odt file. I tried it on a basic example and it worked great. When I get a chance I will run it on something more complex.

lockstep
Joel Berger
  • maybe htlatex myfile.tex? – yo' Aug 29 '12 at 10:17
  • 4
    Just to confirm that in 2015 mk4ht oolatex myfile.tex is still the right command to produce a .odt file openable by LibreOffice. From there you can save as .docx if need be, and the result can be opened in MS Word or Google Docs. Just htlatex myfile.tex produces .html which is useful for other purposes (and can also be imported to MS Word). All this works best if you adhere strictly to "out-of-the-box" LaTeX syntax and use a minimum of packages. – musarithmia Jan 29 '15 at 20:06
12

The best way I know to convert a TeX to an XML application is tex4ht. The project page says it converts TeX to a number of different output formats, including "(X)HTML, MathML, OpenDocument, and DocBook." I believe tex4ht can even convert tikz code to SVG graphics. Word supports OpenDocument, so in theory you could just open up the converted document in Word. I'd expect tables to survive the transition, not so much equations and figures. But MS Word's native format is also an XML application, so you might be able to write an XSLT stylesheet to handle the math and figures.

The need for this kind of tool is evident, and the fact that there's no polished way to do it yet somewhat indicates the complexity of the task. Keep that in mind before you take it up!

Matthew Leingang
8

Another unlisted solution is the full version of Adobe Acrobat.

I tried the majority of the solutions listed here, which all failed pretty miserably.

Adobe successfully converted nearly everything perfectly, including:

  • most equations
  • almost all formatting
  • images/generated figures
enderland
7

If all else fails (it did for me), Word 2016 can open pdfs directly. It does a decent job of converting them (I was surprised!), although equations are often converted into images.

onewhaleid
6

If you are not forced to stick to a certain format, pick your weapon of choice -- tex4ht (you can just use oolatex), tth, latex2html etc -- and prepare a document style that converts well with that. I do this all the time for simple reports and such that I need to share with people who like using MS-Word etc to edit them. If you spend a bit of time to tailor it for the conversion, you can get consistently good results.

If you are required to stick to a certain format, for example for a grant proposal etc, you can get by with picking a style that has more or less the right format but with minimal extras to generate the text, then use MS-Word or OpenOffice to fix it up.

user1375
6

All the answers above suggest using some converter from a tex/pdf file to the desired file format, which is why I am offering an n-th proposal. I think this approach is quite insane in this situation, as native solutions also exist, as the OP also mentioned.

As you generate the reports from R, it might be less painful to rewrite some of the functions you use in the reporting process and update them so that they run in odfWeave. Well, it will generate an odt file from an odt one, so not a native Word format, but it is compatible with MS Office from the 2007 version (SP2) onwards.

That would require writing the body of your text (if any) and the reporting R code in a word processor (MS Word or e.g. OOWriter), and later running it through odfWeave. The package has really great documentation: just download the sources and look for formatting.odt in the examples directory, which shows most of the great formatting features of the package in 30+ pages. This includes paragraph, font, color, table, cell, and image formatting, among others.

daroczig
5

I don't see Nuance PDF Converter listed in other responses, and I think it is worth a mention. It is not free, but the Mac version that I downloaded has a free trial. I bought it after it successfully converted a PDF that I generated from LaTeX into a clean Word document. Adobe failed miserably on the tables, but Nuance worked well.

petRUShka
4

I found a good solution through Latex2rtf + TexSword. The process consists of first performing the LaTeX -> Word conversion (which in my case is around 85% correct), and then fixing the wrong or unconverted parts with TexSword (the remaining 15%).

petRUShka
Barzi2001
3

I have struggled for quite some time but with little success. I tried Latex2rtf and it did not convert references, formulas and tables correctly. tex4ht helped me a lot. I used the following steps to successfully convert tex file to doc file with references.

  1. install tex4ht
  2. run latex article.tex
  3. run biber article
  4. run latex article.tex
  5. run mk4ht oolatex article.tex

Now you can open the newly created file article.odt in OpenOffice (LibreOffice). You can save this file in doc or docx format as desired.
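
The numbered steps above can be chained so that a failing step stops the run. This is a rough Python sketch rather than part of the original answer; the function names `odt_build_steps` and `build_odt` are hypothetical, and the commands are exactly the four listed in the steps.

```python
import shutil
import subprocess
from pathlib import Path


def odt_build_steps(tex_file: str) -> list[list[str]]:
    """Return the commands from the steps above, in order."""
    stem = Path(tex_file).stem  # biber is given the name without ".tex"
    return [
        ["latex", tex_file],
        ["biber", stem],
        ["latex", tex_file],
        ["mk4ht", "oolatex", tex_file],
    ]


def build_odt(tex_file: str) -> None:
    """Run each step in turn, stopping at the first one that fails."""
    for cmd in odt_build_steps(tex_file):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the plan; call build_odt("article.tex") to actually run it.
    for cmd in odt_build_steps("article.tex"):
        print(" ".join(cmd))
```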

3

For me there is no doubt: the best workflow is to start with Rmarkdown (edit in 2023: or today Quarto) and knitr (instead of Sweave), and then to compile the same simple source document in RStudio or VS Code as LaTeX, as PDF (made with a LaTeX compiler), as HTML, or directly as a Word document.

The bad news is that Rmarkdown syntax can produce the fundamental LaTeX constructs (such as section levels, citations, links, figures ... and tables and even a toc) but not every possible piece of LaTeX code.

The good news is that (a) for a Word conversion this is mostly irrelevant, because other types of LaTeX constructs often cannot be exported to Word in any way, i.e., there is no clear Word equivalent; (b) statistical reports rarely need anything more special than tables and figures generated by R, i.e., the basic markdown structures plus R chunks are more than enough; and (c) for a special high-quality PDF, you can insert LaTeX commands directly in the Rmarkdown/Quarto text (although, unfortunately, they will be used only in the case of PDF output). A small example:

---
title: "Test"
author: "Someone"
date: "26 de abril de 2019"
output:
  word_document: default
  pdf_document: default
---
## A test
a <- mean(1:10)

The mean is r a. This is not the end. \mbox{This is the end with \LaTeX}

Fran
2

It sounds like the process you are describing can be entirely offloaded to Authorea. You can write your manuscript in LaTeX, Markdown (and rich text) right on the web, and you can even embed data of various sorts (from CSV, R files, d3.js, and plot.ly to Jupyter Notebooks). Your manuscript has an underlying Git repository. At any point in time, you can export to Word (docx), as well as PDF or LaTeX, while also customizing the look and citation style of your exported manuscript.

2

The recommended way to convert LaTeX to Office formats using tex4ht is to use make4ht:

make4ht -f odt filename.tex

In contrast to mk4ht oolatex, it does lots of post-processing of the generated files to fix issues with Unicode and other stuff that might otherwise result in an invalid ODT file.

The direct ODT output, in contrast to importing HTML into LO or Word, should better preserve the structural information, especially math, tables, and footnotes.
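
Since an invalid ODT file is the failure mode mentioned here, a quick sanity check on the output can save a round trip through a word processor. The Python sketch below is an illustration, not part of make4ht: it relies only on the facts that an ODT file is a ZIP package containing a `mimetype` entry and a `content.xml`. The function name `check_odt` is hypothetical, and passing this check does not prove that the file is fully valid.

```python
import zipfile
import xml.etree.ElementTree as ET

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def check_odt(path: str) -> list[str]:
    """Return a list of problems found in an ODT package (empty if none)."""
    if not zipfile.is_zipfile(path):
        return ["not a ZIP archive"]
    problems = []
    with zipfile.ZipFile(path) as odt:
        names = odt.namelist()
        if "mimetype" not in names:
            problems.append("missing mimetype entry")
        elif odt.read("mimetype").decode("ascii", "replace").strip() != ODT_MIMETYPE:
            problems.append("unexpected mimetype")
        if "content.xml" not in names:
            problems.append("missing content.xml")
        else:
            try:
                # Only checks that the XML is well formed, not that it is valid ODF.
                ET.fromstring(odt.read("content.xml"))
            except ET.ParseError as err:
                problems.append(f"content.xml is not well-formed XML: {err}")
    return problems
```

For example, `check_odt("filename.odt")` returns an empty list when the package passes these basic checks.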

michal.h21
2

Some answers already mentioned Adobe Acrobat, but I just want to expand on that a little. I just had to convert a 130-page thesis from LaTeX to Word format for hand-in.

Adobe Acrobat Pro DC worked astonishingly well. The export engine has two options: preserve text flow, and preserve page layout.

I used 'preserve text flow'. Some things to note:

  • Layout is incredibly well preserved, not messing with the pages apart from a couple of words here or there
  • Table of contents is recognised as a TOC, preserving references/hyperlinks as well
  • List of figures and tables preserved the references/hyperlinks
  • Chapters and (sub)sections are recognised as such
  • Itemize/enumerate are recognised as such
  • Booktabs are imported as actual tables, keeping layout and style
  • Multicolumn environments are recognised as such
  • Hyphenation is not recognised as Word hyphenation, but as hyphen followed by space
  • Footnotes are not recognised as Word footnotes, but as superscript text
  • Longtables are a mess
  • Listings with borders and numbered lines are somewhat a mess. Indentation was messed up in some lines and some borders misplaced. Line numbers were sometimes recognised as a numbered list, sometimes just as text.
  • PGFplots and ggplots from R are imported as vector graphics, with separate elements. However, some points were hidden.

It is possible, however, to just copy some things over from a 'preserve page layout' version. With this option, text and such is placed within textboxes, thus keeping layout very well. In this mode, PGFplots/ggplots and longtables were displayed accurately. So my workflow was:

  1. Export with 'preserve text flow' and 'preserve page layout'
  2. Copy figures and longtables from 'page layout' to 'text flow' version
  3. As I needed a somewhat accurate word count, I had to replace Latex hyphenation with Word hyphenation to avoid double counts. Search for '- ' and replace with '', manually stepping through to avoid any mistakes. Then turn on hyphenation in Word. Worked not too bad, a couple of words here and there messing up the layout.
  4. Check layout page by page, also look for footnotes ending up on the wrong page.
  5. Manually tweak listings: indentation, borders and line numbers.

So that's not exactly a perfect automated workflow for reproducible analysis, but it gives pretty decent results even for large projects.
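
The hyphenation fix in step 3 matters for the word count because a word split as "con- version" is counted twice. As a rough illustration on plain text (not on the .docx itself), the Python sketch below removes a hyphen-plus-space that sits between two word characters and then counts words. The function names are hypothetical, and the rule is deliberately naive: it would also merge a legitimate "pre- and post-" construction, which is why stepping through the replacements manually, as described above, is the safer route.

```python
import re

# A hyphen followed by a space between two word characters, e.g. "con- version".
_SPLIT_HYPHEN = re.compile(r"(?<=\w)- (?=\w)")


def rejoin_hyphenated(text: str) -> str:
    """Remove '- ' where it splits a word, joining the two halves."""
    return _SPLIT_HYPHEN.sub("", text)


def word_count(text: str) -> int:
    """Count whitespace-separated words after rejoining split words."""
    return len(rejoin_hyphenated(text).split())


if __name__ == "__main__":
    sample = "The con- version keeps most of the lay- out."
    print(rejoin_hyphenated(sample))
    print(word_count(sample))
```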

  • Be careful everybody: _ characters will be translated into underlined spaces (sometimes two, sometimes one, sometimes _ ). Best of luck with some Python code! BTW most of the other mentioned online tools have the same (or worse) problems – massi Apr 18 '23 at 21:31
1

Since I did not come across it here, I am just highlighting that, instead of HTML or PDF as an in-between format, AsciiDoc (converted from LaTeX by Pandoc) appears to be another alternative.

From: http://dag.wiee.rs/home-made/unoconv/#Screenshot

Screenshot

This may not be your typical set of screenshots, but here is a workflow I use for converting an AsciiDoc (text) formatted résumé into professional ODT, PDF, HTML and DOC versions:

[dag@moria cv]$ make odt pdf html doc
rm -f *.{odt,pdf,html,doc}
asciidoc -b docbook -d article -o resume.xml resume.txt
docbook2odf -f --params generate.meta=0 -o resume.tmp.odt resume.xml
# Saved resume.tmp.odt
unoconv -f odt -t template.ott -o resume.odt resume.tmp.odt
unoconv -f pdf -t template.ott -o resume.pdf resume.odt
unoconv -f html -t template.ott -o resume.html resume.odt
unoconv -f doc -t template.ott -o resume.doc resume.odt

The original files are:

A recipe: Makefile  
An AsciiDoc [.txt] file: resume.txt 
An OpenOffice template [.ott] file: template.ott

Converted into the following files:

Open Document [.odt]: resume.odt    - converted to ODF from TXT using asciidoc and stylesheet
PDF [.pdf]: resume.pdf  - converted from ODF using unoconv
HTML [.html]: resume.html   - converted from ODF using unoconv
Word [.doc]: resume.doc - converted from ODF using unoconv 
nrbray
  • 2
    So where does LaTeX fit into this mix? – Werner Jan 29 '18 at 20:23
  • Pandoc may be used to convert LaTex to Asciidoc. It then offers an alternative to html or PDF as an in-between format.

    Asciidoc may have some advantage in sending the changes back to LaTex with this toolchain.

    – nrbray Jan 29 '18 at 20:30
  • If you're using Pandoc already, wouldn't going to .docx be a standard output choice rather than going via ASCIIdoc? – Werner Jan 29 '18 at 20:39
  • I felt that the recipe above using open office's stylesheet may help give a work around to get all mathematical formulas, table formatting, and quality figures into docx. – nrbray Jan 29 '18 at 22:09
  • Reading that there seems to be issues with going direct to docx as https://tex.stackexchange.com/a/11088/153731 says pandoc's converter to Open Office seemed more reliable than its converter direct to Word. – nrbray Jan 29 '18 at 22:17
1

The best result that I achieved was following the steps:

  1. htlatex document.tex;
  2. Open document.html in LibreOffice
  3. Export as *.odt;
  4. Open *.odt and save as *.docx
  5. Remove all annotations

The formulas remain as images, and the references and bibliography are included.

naphaneal
  • Recently started to open the pdf directly in Microsoft word which gives me the best results and is the simplest approach. The conversion is pretty straightforward. – Pedro Sobreiro Mar 06 '21 at 19:29
1

I tried various converters and combinations, but I obtained the best results with the following procedure.

1) Use htlatex to produce HTML code, with the following options:

htlatex document.tex "xhtml,mathml" " -cunihtf -utf8"

2) Convert the html with pandoc:

pandoc -s document.html -o document.docx

This procedure generates a docx with editable equations and reasonable typesetting. In particular, getting editable equations is not easy and is not always guaranteed by the other methods. The main drawback of my method is that the figures are converted to raster png files, so the quality is degraded. But people who use Word are typically not very interested in such technical details, so none of my colleagues complained about this problem (or even noticed).

However, there can be some difficulties. Some packages do not work properly, so, if any difficulty is found, a careful debugging must be done in order to find and remove the guilty package. Here is a list of issues I found.

1) The .html file does not display properly in my browser, although the docx obtained from it through pandoc is fine.

2) The package mhchem does not produce the desired result. The limited support of htlatex by mhchem is declared in the documentation. I had to write a perl script which rewrites the mhchem commands in an easier way.

3) The \mathrm{something} command does not make "something" upright, in my case. The problem is that the "mathrm" is encoded as mathvariant="normal" by htlatex, which should be fine, but then the "normal" is rendered as italic by pandoc. The workaround is to substitute "normal" with something upright, such as "sans-serif". So, between htlatex and pandoc, I run:

sed 's/mathvariant="normal"/mathvariant="sans-serif"/g' document.html > memo.html

mv memo.html document.html

The drawback is that the mathrm text becomes "sans-serif", not necessarily the font we want.

4) Depending on the version and installation of htlatex, I had problems with the figures included with includegraphics. They are converted from the original format to png by calling some programs, which sometimes do not exist. I was not able to fix the bug, but I found a workaround: I add a cfg file that defines the conversions and pass its name to htlatex. So htlatex uses the programs defined by me instead of using its default converters (which do not exist). It takes time but it works.
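
For anyone without sed at hand (for example on Windows), the mathvariant substitution from issue 3 can be done in a few lines of Python. This is a sketch of the same text replacement, not a MathML-aware rewrite; the function names `upright_mathrm` and `patch_file` are hypothetical.

```python
from pathlib import Path


def upright_mathrm(html: str) -> str:
    """Swap mathvariant="normal" for mathvariant="sans-serif"."""
    return html.replace('mathvariant="normal"', 'mathvariant="sans-serif"')


def patch_file(path: str) -> None:
    """Rewrite the HTML file in place, like the sed + mv pair."""
    p = Path(path)
    p.write_text(upright_mathrm(p.read_text(encoding="utf-8")), encoding="utf-8")


if __name__ == "__main__":
    sample = '<mi mathvariant="normal">d</mi><mi mathvariant="italic">x</mi>'
    print(upright_mathrm(sample))
```

Run `patch_file("document.html")` between the htlatex and pandoc steps; other mathvariant values are left untouched.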

  • 1
    Have you tried the make4ht -f odt filename.tex method? It directly generates an ODT file, which should be possible to open directly in Word. If that doesn't work, the ODT file can be converted to DOCX using LibreOffice or Pandoc. This method should preserve footnotes and other stuff. – michal.h21 Sep 17 '19 at 13:50
  • I tried, but it converts equations into images. They are not editable in word nor in openoffice. The method I developed is aimed at having editable equations. However, your comment made me notice that my method does not handle properly the footnotes. – Doriano Brogioli Sep 17 '19 at 15:02
  • the math should be turned to mathml with make4ht, unless pictures are explicitly requested – michal.h21 Sep 17 '19 at 18:23
  • Could you please write the command line for "turning to mathml"? – Doriano Brogioli Sep 18 '19 at 08:37
  • MathML option is used by default in the ODT output mode, this is why I am surprised that math comes as pictures – michal.h21 Sep 18 '19 at 09:45
  • Now I'm using Windows with msys and sometimes it does not work properly. I will try with Linux. In the meantime, can you suggest how to debug the problem? – Doriano Brogioli Sep 18 '19 at 16:37
  • You should be able to find ooffice-mml.4ht and ooffice.4ht in the log file. On Windows, there were some issues with post-processing of the generated files using a Java program. Look for something like java -classpath "/usr/local/texlive/2019/texmf-dist/tex4ht/bin/tex4ht.jar" xtpipes -i "/usr/local/texlive/2019/texmf-dist/tex4ht/xtpipes/" -o "lt4ht.4oo" "lt4ht.tmp" in the terminal output (the paths will be different), there may be some error messages in the following lines. – michal.h21 Sep 19 '19 at 08:43
  • I tried some simple examples, and, yes, they work, and the generated equations are editable! I also see the problem with java. Moreover, I'm also having another problem. Sometimes, make4ht says: "Cannot create ODT file: file exists" although the file does not exist. Any suggestion? – Doriano Brogioli Sep 19 '19 at 12:13
  • The Java issue should hopefully be fixed in the new make4ht version; there are issues when the TeX distribution is installed in a path that contains spaces, such as Program Files. The second error message means that a temporary directory that make4ht uses for the ODT packing already exists. I've found an error in make4ht that happens only on Windows, because temporary file names have an OS-dependent form. It should be fixed in the next version as well. – michal.h21 Sep 20 '19 at 07:13
  • Thank you for promising the fix. I confirm that make4ht works on my Linux machine. In the meantime, the method I suggested can be used by anyone who wants to use Windows right now. – Doriano Brogioli Sep 20 '19 at 07:40
1

I've been revisiting this recently (October 2019).

Originally I focussed on the "how to convert to Word" part of the question, and I advocated using LaTeX and modifying its output for different users. I think that is still valid in some circumstances, so I will leave that answer in place.

I came back to this in the context of reproducible research, i.e. linking analysis to publications and preparing different output formats, which is closer to the original question.

I now think the key is to combine the analysis and documentation in the simplest possible base format, run the analysis, and then convert the resulting documentation to whatever output format is required for whichever group. This technology has become dramatically better in the last few years.

The workflow I have settled on is...

  1. Prepare an R Markdown file that includes analysis steps and documentation. The analysis is included in knitr chunks and can be in R, Python, and many other languages.
  2. Set up the YAML preamble in the R Markdown file to describe the output document format(s) I want. In my case they include PDF, HTML, Word, and GitHub-flavoured Markdown.
  3. Render this in R to produce the different documents. This is done using the rmarkdown and bookdown packages in R, which leverage pandoc to produce very good output files.
  4. Share the different output files with different groups.
  5. Update the R Markdown file and repeat steps 1-4 until complete.
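
Step 3 can also be started from a script instead of an interactive R session. The following is a minimal sketch in Python, assuming that Rscript is on the PATH and that the rmarkdown package is installed; the file name report.Rmd and the helper names are hypothetical. Passing output_format = "all" to rmarkdown::render asks it to produce every format listed in the YAML preamble.

```python
import shutil
import subprocess
from typing import List


def render_command(rmd_file: str) -> List[str]:
    """Build the Rscript call that renders all formats declared in the YAML preamble."""
    # Assumes the file name contains no double quotes or backslashes.
    expression = f'rmarkdown::render("{rmd_file}", output_format = "all")'
    return ["Rscript", "-e", expression]


def render_all(rmd_file: str) -> None:
    """Run the rendering step and fail loudly if R is missing or rendering fails."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript was not found on PATH")
    subprocess.run(render_command(rmd_file), check=True)


if __name__ == "__main__":
    # Only show the command here; call render_all("report.Rmd") to actually run it.
    print(render_command("report.Rmd"))
```

Keeping the command builder separate from the function that runs it makes it easy to check what will be executed before involving R at all.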

I have put an example of this into a repository at https://github.com/AndyClifton/lit-pro-sci-pub-demo.

Andy Clifton
  • 3,699
1

This is my method for converting LaTeX documents to MS Word documents: first convert the LaTeX file to an epub3 file with MathML support using tex4ebook, then convert that output to a Word file using Pandoc.

Conversion from LaTeX to epub3: tex4ebook -f epub3 mwe.tex "mathml"

Conversion from epub3 to MS Word: pandoc mwe.epub -o mwe.docx
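
The two commands above can be chained so that the second only runs if the first succeeds. This is a rough sketch in Python, assuming that tex4ebook and pandoc are installed, that the script is run in the directory containing the .tex file, and that tex4ebook writes the .epub file there under the same base name; the function names are invented for the example.

```python
import subprocess
from pathlib import Path
from typing import List


def conversion_commands(tex_file: str) -> List[List[str]]:
    """Return the tex4ebook and pandoc invocations for one LaTeX file."""
    stem = Path(tex_file).stem
    return [
        # LaTeX -> epub3 with MathML, as in the first command above.
        ["tex4ebook", "-f", "epub3", tex_file, "mathml"],
        # epub3 -> docx, as in the second command above.
        ["pandoc", f"{stem}.epub", "-o", f"{stem}.docx"],
    ]


def convert(tex_file: str) -> None:
    """Run both steps in order, stopping at the first one that fails."""
    for command in conversion_commands(tex_file):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands without running them; use convert("mwe.tex") to execute.
    for command in conversion_commands("mwe.tex"):
        print(" ".join(command))
```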

Note: You will probably need to revise your LaTeX source file before converting it to epub3; see this answer for details.

0

I have had little luck with tools such as htlatex and mk4ht.

However, what works for me are the following four strategies:

  1. pdftotext -layout thesis.pdf
  2. detex thesis.tex > thesis_detex.txt
  3. pdf2htmlEX thesis.pdf
  4. Adobe Acrobat XI Pro

pdftotext produces thesis.txt, which is very close to the formatting of the PDF. To avoid character-mapping problems, load thesis.txt in a Windows text editor, e.g., Notepad++, select all, and paste into a new MS Word/OpenOffice Writer document. The only drawback is that if your tex file produces a PDF with concatenated words, thesis.txt will contain the concatenations too.

Detex produces thesis_detex.txt which works fine for many use cases since words are not concatenated.
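
The first two strategies can be run together from a short script. The sketch below is in Python and assumes that pdftotext and detex are installed and on the PATH; the helper names are hypothetical. It relies on pdftotext writing thesis.txt by itself when no output name is given, while the output of detex has to be redirected into a file, as in the command in the list above.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdf_file: str) -> List[str]:
    """Strategy 1: pdftotext writes <name>.txt itself and keeps the page layout."""
    return ["pdftotext", "-layout", pdf_file]


def detex_command(tex_file: str) -> List[str]:
    """Strategy 2: detex prints the stripped text on standard output."""
    return ["detex", tex_file]


def detex_output_name(tex_file: str) -> str:
    """Name of the file that receives the detex output, e.g. thesis_detex.txt."""
    return f"{Path(tex_file).stem}_detex.txt"


def extract_text(pdf_file: str, tex_file: str) -> None:
    """Run both extractions; each raises CalledProcessError if the tool fails."""
    subprocess.run(pdftotext_command(pdf_file), check=True)
    with open(detex_output_name(tex_file), "wb") as out:
        subprocess.run(detex_command(tex_file), stdout=out, check=True)


if __name__ == "__main__":
    # Show what would be run; call extract_text("thesis.pdf", "thesis.tex") to execute.
    print(pdftotext_command("thesis.pdf"))
    print(detex_command("thesis.tex"), ">", detex_output_name("thesis.tex"))
```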

pdf2htmlEX produces a very accurate HTML file, although some characters are merged into a single new character.

To be honest, the conversion with Adobe Acrobat XI Pro worked flawlessly. I was surprised, since I had not used Adobe software for a while and earlier versions did not perform well.

In conclusion, the best option for me is to use Adobe Acrobat on a LaTeX-produced PDF.

rvaneijk
  • 330
  • what kind of issues did you have with htlatex or mk4ht? I am sure they should produce better results than just plain text. – michal.h21 Dec 02 '17 at 14:50
  • htlatex and mk4ht do not work very well with hyperref and pgf. Disabling hyperref is suggested. Pgf was a different problem for me, e.g., the first error message was "l.190 \pgfusepathqfill}". There were also many errors like the following: "! Undefined control sequence. \pgfsys@svg@newline ->\Hnewline

    l.214 \pgfusepathqstroke}"

    htlatex produces an incomplete file, whereas mk4ht did not even finish.

    – rvaneijk Dec 03 '17 at 16:10
  • I think you should post a question illustrating your issues with hyperref; it should work without problems. The TikZ issue is known: it has been reported in the TikZ issue tracker together with a solution, but it hasn't been fixed upstream yet. See https://tex.stackexchange.com/a/386775/2891 for some solutions. – michal.h21 Dec 03 '17 at 16:27
0
  • Step 1. Create a PDF from your LaTeX file using pdflatex.
  • Step 2. Convert the newly created PDF to a Word document. You can do this in three different ways:
      • On Windows, Microsoft Word 2013 and later versions can open a PDF and save it as a Word document.
      • If you're not using Microsoft Word, you can use an online PDF to Word converter instead; no download is needed.
      • If your document is text-only, open it with Google Docs and then download it as a Word document.