
Possible Duplicate:
Workflow for converting LaTeX into Open Office / MS Word Format

I am a scientist and an (almost) average Windows 7 user. I like to use LaTeX (MiKTeX with Texmaker) to write documents that do not go through peer review. Unfortunately, LaTeX is not commonly used in my field of science. I have been writing my manuscripts in MS Word, because my coauthors cannot use MS Word's Track Changes feature on PDF files.

Yet I haven't given up the dream of also writing my manuscripts in LaTeX. So far I have been trying, without success, to find an efficient way to convert my documents to .doc format. I am aware of many threads (1, 2, etc.) discussing this topic, but none of them offers a solution that works for me. I have tried Pandoc and htlatex. Both managed to convert some features, but not everything I need (natbib seems to be a major problem). I can't believe there isn't a solution for this, since the open-source world is full of brilliant solutions nowadays. Thus I would like to ask for help in building a workflow for converting this LaTeX document, which includes the features of an average manuscript, into .docx format:

\documentclass[a4paper,12pt]{scrartcl}
\usepackage[sort]{natbib}
\bibliographystyle{authordate1}
\usepackage{setspace}
\setstretch{2}

\usepackage[pdftex]{graphicx}
\usepackage{lineno}
\usepackage{authblk}
\usepackage[colorlinks=true,
              linkcolor=blue]{hyperref}

\title{This is a manuscript}
\author[1]{John Rambo}
\author[1,2]{Face Macfaen}
\author[3]{Pure Animal}

\affil[1]{Institute of Handicapped Maniacs}
\affil[2]{Cap headed taxi drivers}
\affil[3]{University of Gottemham, UK}

\date{\today}

% Define some names

\newcommand{\GM}{\textit{Gadus morhua}}

\begin{document}
\maketitle

\section*{Abstract}


We did some math\footnote{Which was borrowed from a Pandoc example, of course}
\[
\phi_n(\kappa) =
 \frac{1}{4\pi^2\kappa^2} \int_0^\infty
 \frac{\sin(\kappa R)}{\kappa R}
 \frac{\partial}{\partial R}
 \left[R^2\frac{\partial D_n(R)}{\partial R}\right]\,dR
\]

We found out nothing, but that's how science is sometimes. We also cited a lot to pretend that we know something. \citet{Medina-Elizalde2012} talks about a collapse. R \citep{Team2011} is really the coolest program. \LaTeX{} does not work for manuscript writing, because some co-authors want to use track changes in MS Word.

\section*{Introduction}
\linenumbers
Plagiarized text: Pandoc is a \href{http://www.haskell.org/}{Haskell} library for
converting from one markup format to another, and a command-line tool
that uses this library. It can read
\href{http://daringfireball.net/projects/markdown/}{markdown} and
(subsets of) \href{http://redcloth.org/textile}{Textile},
\href{http://docutils.sourceforge.net/docs/ref/rst/introduction.html}{reStructuredText},
\href{http://www.w3.org/TR/html40/}{HTML}, and
\href{http://www.latex-project.org/}{LaTeX}; and it can write plain
text, \href{http://daringfireball.net/projects/markdown/}{markdown},
\href{http://docutils.sourceforge.net/docs/ref/rst/introduction.html}{reStructuredText},
\href{http://www.w3.org/TR/xhtml1/}{XHTML},
\href{http://www.w3.org/TR/html5/}{HTML 5},
\href{http://www.latex-project.org/}{LaTeX} (including
\href{http://www.tex.ac.uk/CTAN/macros/latex/contrib/beamer}{beamer}
slide shows), \href{http://www.pragma-ade.nl/}{ConTeXt},
\href{http://en.wikipedia.org/wiki/Rich\_Text\_Format}{RTF},
\href{http://www.docbook.org/}{DocBook XML},
\href{http://opendocument.xml.org/}{OpenDocument XML},
\href{http://en.wikipedia.org/wiki/OpenDocument}{ODT},
\href{http://www.microsoft.com/interop/openup/openxml/default.aspx}{Word
docx}, \href{http://www.gnu.org/software/texinfo/}{GNU Texinfo},
\href{http://www.mediawiki.org/wiki/Help:Formatting}{MediaWiki markup},
\href{http://www.idpf.org/}{EPUB},
\href{http://redcloth.org/textile}{Textile},
\href{http://developer.apple.com/DOCUMENTATION/Darwin/Reference/ManPages/man7/groff\_man.7.html}{groff
man} pages, \href{http://orgmode.org}{Emacs Org-Mode},
\href{http://www.methods.co.nz/asciidoc/}{AsciiDoc}, and
\href{http://www.w3.org/Talks/Tools/Slidy/}{Slidy},
\href{http://paulrouget.com/dzslides/}{DZSlides}, or
\href{http://meyerweb.com/eric/tools/s5/}{S5} HTML slide shows. It can
also produce \href{http://www.adobe.com/pdf/}{PDF} output on systems
where LaTeX is installed. (Taken from Pandoc manual).

\section*{Material and Methods}

\GM{} is a cod.

\section*{Results}

Table \ref{numbers} shows more of this boring stuff, but that's how science often is. Figure \ref{figure} is a pdf challenge for Pandoc. If it works, I'll eat my hat.

\begin{figure}[h!]
 \centering
 \includegraphics[scale=.7]{figure.pdf}
\caption{The figure shows some dull scientific stuff that confuses Pandoc. It was made with R and imported in PDF format. The x-axis is in units of $\mu\mathrm{m}\,\mathrm{s}^{-1}$.}
\label{figure}
\end{figure}

\newpage
\bibliography{example}


\section*{List of tables}

\input{numbers.tex}


\end{document}

Here is numbers.tex, which is needed to run the code.

% latex table generated in R 2.12.1 by xtable 1.5-6 package
% Sun May 29 13:02:18 2011
\begin{table}[ht]
\begin{center}
\caption{Distribution of samples over year, stage, age, sex and month}
\begin{tabular}{rrrrrrrrrrr}
  \hline
 & 1997 & 1998 & 2004 & 2005 & 2006 & 2007 & 2008 & 2009 & 2010 & Total \\ 
  \hline
1 & 0 & 0 & 0 & 0 & 0 & 7 & 15 & 35 & 16 & 73 \\ 
  2 & 28 & 11 & 34 & 138 & 102 & 50 & 37 & 29 & 85 & 514 \\ 
  3 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 \\ 
  4 & 28 & 11 & 34 & 138 & 102 & 57 & 52 & 64 & 101 & 587 \\ 
  5 & 28 & 7 & 26 & 125 & 58 & 55 & 40 & 62 & 73 & 474 \\ 
  6 & 0 & 3 & 5 & 8 & 42 & 2 & 11 & 2 & 28 & 101 \\ 
  7 & 0 & 1 & 3 & 5 & 2 & 2 & 1 & 0 & 0 & 14 \\ 
  8 & 28 & 10 & 31 & 133 & 100 & 57 & 51 & 64 & 101 & 575 \\ 
  9 & 4 & 2 & 14 & 43 & 27 & 28 & 20 & 7 & 40 & 185 \\ 
  10 & 0 & 3 & 5 & 8 & 42 & 2 & 11 & 2 & 28 & 101 \\ 
  11 & 10 & 5 & 11 & 71 & 19 & 24 & 12 & 11 & 29 & 192 \\ 
  12 & 14 & 1 & 4 & 16 & 14 & 5 & 9 & 44 & 4 & 111 \\ 
  13 & 14 & 10 & 30 & 122 & 88 & 54 & 43 & 20 & 97 & 478 \\ 
  14 & 0 & 0 & 0 & 0 & 0 & 2 & 2 & 0 & 9 & 13 \\ 
  15 & 0 & 0 & 0 & 0 & 0 & 5 & 9 & 11 & 5 & 30 \\ 
  16 & 21 & 8 & 24 & 98 & 93 & 25 & 31 & 46 & 57 & 403 \\ 
  17 & 7 & 3 & 5 & 38 & 9 & 25 & 10 & 7 & 28 & 132 \\ 
  18 & 0 & 0 & 5 & 2 & 0 & 2 & 0 & 0 & 2 & 11 \\ 
  19 & 28 & 11 & 29 & 136 & 102 & 57 & 52 & 64 & 99 & 578 \\ 
   \hline
\end{tabular}
\label{numbers}
\end{center}
\end{table}

Here is the bibliography (example.bib):

% This file was created with JabRef 2.7.2.
% Encoding: Cp1252

@ARTICLE{Medina-Elizalde2012,
  author = {Medina-Elizalde, M. and Rohling, E. J.},
  title = {Collapse of Classic Maya Civilization Related to Modest Reduction
    in Precipitation},
  journal = {Science},
  year = {2012},
  volume = {335},
  pages = {956-959},
  number = {6071},
  endnotereftype = {Journal Article},
  issn = {0036-8075 1095-9203},
  shorttitle = {Collapse of Classic Maya Civilization Related to Modest Reduction
    in Precipitation}
}

@MISC{Team2011,
  author = {R Development Core Team},
  title = {R: A Language and Environment for Statistical Computing},
  howpublished = {R Foundation for Statistical Computing},
  year = {2011},
  endnotereftype = {Electronic Source},
  shorttitle = {R: A Language and Environment for Statistical Computing},
  url = {http://www.R-project.org}
}

Here is the figure (figure.pdf)



OK, I have tried several of these options. Here is a list of what each of them does and does not convert (please edit if you find mistakes or have additions):

Pandoc

Converts

  • Math
  • Text
  • Headings

Does not convert

  • Tables
  • PDF figures
  • natbib references
  • Hyperlinks and cross references
  • Author list
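
For the natbib item in particular, a minimal sketch of a Pandoc invocation is given here, wrapped in Python so that it can be scripted. It assumes a Pandoc release of 2.11 or newer, where the `--citeproc` option is available; this was not the case for the versions tried above. The file name `manuscript.tex` and the helper names `build_pandoc_command` and `convert` are hypothetical; `example.bib` is the bibliography from this question. Whether this also handles the author list and the PDF figure has not been checked against the example.

```python
import subprocess
from typing import List, Optional


def build_pandoc_command(
    tex_file: str,
    docx_file: str,
    bib_file: Optional[str] = None,
) -> List[str]:
    """Build a pandoc command line for a LaTeX -> .docx conversion."""
    cmd = [
        "pandoc",
        tex_file,
        "--from=latex",
        "--to=docx",
        f"--output={docx_file}",
    ]
    if bib_file is not None:
        # Assumption: pandoc >= 2.11, where --citeproc resolves citations
        # against the BibTeX file passed via --bibliography.
        cmd += ["--citeproc", f"--bibliography={bib_file}"]
    return cmd


def convert(tex_file: str, docx_file: str, bib_file: Optional[str] = None) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(tex_file, docx_file, bib_file), check=True)
```

With Pandoc on the PATH, calling `convert("manuscript.tex", "manuscript.docx", "example.bib")` would run the conversion; the resulting .docx still needs to be inspected by hand for tables, figures and the title block.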

GrindEQ

Converts

  • Math
  • Tables
  • Cross-references
  • Text
  • Headings

Does not convert

  • \maketitle (with author list)
  • PDF figures
  • URLs
  • natbib reference list

Adds unnecessary space after special commands and URLs

Adobe Acrobat X Pro

Converts

  • Almost everything

Does not convert

  • Hyperlinks (but they are not really needed anyway)
  • Math (converted, but not perfectly)

Best option so far, but it makes the font look weird in Word. I can't find a way to change it back to normal.

PDF annotation

The community seems to think that this is by far the best option. I agree, but as I said, some of the more experienced coauthors insist on having their .doc version. Thus it does not answer this question. Mendeley was suggested as the best PDF annotation / reference manager program. If you haven't heard of it, go and check it out. It seems very promising.

Mikko
  • I'm not sure I understand... How is this different to the questions you linked to? – qubyte Apr 24 '12 at 12:52
  • A practical example. I didn't manage to convert this to .doc satisfactorily. – Mikko Apr 24 '12 at 12:54
  • The best answers to this question are already in the first question you linked to. Basically the answer is that there's no easy way to do this well. – qubyte Apr 24 '12 at 13:00
  • Well...thanks for help anyway. I'll try in 2015 again. Maybe then someone knows. – Mikko Apr 24 '12 at 13:02
  • Of course, if it's just for review, they can annotate a PDF file. – qubyte Apr 24 '12 at 13:02
  • I have just added my answer there in the link you provided. See if they are useful to you. –  Apr 24 '12 at 13:05
  • Whatever you end up doing, I'd strongly recommend that you replace the instruction \linespread{2} with \usepackage{setspace} \setstretch{2}; otherwise, your tabular material will look just awful. – Mico Apr 24 '12 at 13:05
  • @Harish: GrindEQ works worse than Pandoc. Both of these are commercial products. – Mikko Apr 24 '12 at 13:09
  • @Mico: Thanks! I made this just as an example and forgot about that command (used it in one text). – Mikko Apr 24 '12 at 13:11
  • The benefit you get with PDF annotation is that you retain control of the paper. Not optimal for cooperation, but sometimes it's useful to have a single author in charge of the actual writing. – qubyte Apr 24 '12 at 13:21
  • Nja, it just doesn't work like that around here. Some of the old guys simply require their .doc version of the manuscript. They won't spend their time commenting on PDFs or correcting the formatting of poorly converted .doc files. – Mikko Apr 24 '12 at 13:30
  • +1 because this is a real problem, although there do not seem to be any easy solutions... – jonalv Apr 24 '12 at 14:13
  • @Largh: Grindeq works well if the file (word or latex) is written properly. You know these things can be written in a bad way also. In such cases almost all converters give horrible results. –  Apr 24 '12 at 14:43
  • @HarishKumar Ok, which parts of the example above are poorly written? For some reason, I can't get natbib references to work, hyper-references are shown with their full url, the title (with authors etc.) and pdf figures are missing – Mikko Apr 24 '12 at 14:54
  • My best bet so far is to use Adobe Acrobat X Pro and convert the pdf to doc. This reproduces all the features (except the equation, which comes but not perfectly), but makes the font look weird. It doesn't help to change the font in Word. It remains uneven for some reason. – Mikko Apr 24 '12 at 15:02
  • One thing worth noting is that pandoc can be extended relatively easily with Haskell scripts. It may be possible to bolt on the extra bits you need to properly interpret such things as citet. – qubyte Apr 24 '12 at 15:15
  • @Largh Consider Mendeley; it supports PDF markup: http://www.mendeley.com/features/read-and-annotate/ – Ethan Bolker Apr 24 '12 at 15:44
  • Well, I don't agree with you that this question has exactly the same content as the earlier topic. I believe that this example, with the typical features of a manuscript, can be converted to a .doc with the right kind of approach. If someone manages to figure out that approach for the example above, please send it to me by email or post it to the other thread that is open. – Mikko Apr 24 '12 at 16:58
  • The main problem is that for this kind of approach, you'll also need to convert the .doc back to LaTeX. Otherwise it's probably easiest to convert the LaTeX to HTML and import that in Word. – Stephan Lehmke Apr 25 '12 at 06:48
  • I am sorry if I haven't been clear enough. The idea is just to produce a .doc file for coauthors, so that they can use track changes on it. There is no need to convert these .doc files back to LaTeX, since it's easier to edit text in LaTeX. Those colors, lines and comments make me dizzy. – Mikko Apr 25 '12 at 07:08
  • Your earlier comment says you would welcome an email response, but I can't find your email address in the comment or in your profile ... what I would say in email is a response to @Stephan's comment above and my earlier one - if you need the coauthors' comments only to guide the changes you make to the LaTeX source then Mendeley comments on the pdf should work just as well as Word track changes - or better. – Ethan Bolker Apr 25 '12 at 19:50
  • First: I tried Mendeley and it seems to be a really nice program. I am very thankful that you linked it here, since I didn't know about it. Thanks! Then: if some of the coauthors are unwilling to use Adobe Acrobat Pro or other PDF readers capable of annotations, why would they use Mendeley? To me it seems the same as any other program. Some of the coauthors want to delete text in their word processor and add their own stuff. This is what makes them feel that they have control over the document, I guess... – Mikko Apr 26 '12 at 07:46
