The question "Extracting the contents of text in a specified environment into a new file" (and the answers therein) involves using the extract package to produce a LaTeX file that contains all the text within a specified environment.
My question is whether it is possible to do the reverse. I am forced to work with Microsoft Word users. Due to Word's stability issues, we usually maintain one .docx file containing all the text for a paper (or in this case my thesis) and another containing all of the figures and captions. Some journals require this approach as well. The endfloat package gets part of the way there by placing figures at the end of the generated PDF. I've had trouble using TeX4ht or latex2rtf because all of my figures are PDF. Figures in PDF format work great with pdftex, which I use to generate the PDF files.
Being a chemist, I use a lot of superscripts and subscripts, in particular with the mhchem package. I was hoping that if I could get the text into HTML or RTF, I could import it into Word without having to redo every subscript and superscript in the text. I've tried placing all of the \includegraphics commands within a comment environment (like textonly below) using the comment package:
\begin{figure}
\begin{textonly}
\centering
\includegraphics{figurefile}
\end{textonly}
\label{fig:figure}
\end{figure}
This approach broke all of my references (\ref{fig:figure}) to figures - they typeset as ??.
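For comparison, here is a minimal sketch of another way to drop the graphics while leaving the float, caption and label in place. It is an untested assumption rather than a known fix for the ?? problem above. The switch name \iftextonly and the caption text are hypothetical, chosen only for illustration. The sketch also places \label after a \caption, since a figure label only picks up the figure number once a caption has been issued.

```latex
\documentclass{article}
\usepackage{graphicx}

% Hypothetical switch (name chosen for illustration only).
\newif\iftextonly
\textonlytrue   % use \textonlyfalse to restore the figures

\iftextonly
  % Swallow the optional and mandatory arguments of \includegraphics
  % and typeset nothing in their place.
  \renewcommand{\includegraphics}[2][]{}
\fi

\begin{document}
See Figure~\ref{fig:figure}.

\begin{figure}
  \centering
  \includegraphics{figurefile}
  \caption{Caption text.}  % placeholder caption, not from the original
  \label{fig:figure}       % after \caption, so \ref gets the figure number
\end{figure}
\end{document}
```

Because the float, \caption and \label are still processed, \ref{fig:figure} should resolve after the usual second run. The starred form \includegraphics* is not handled by this redefinition.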
Is finding a way to compile the LaTeX code into a more Word-friendly format the way to go, or is converting the PDF generated by pdftex the best approach? When converting the PDF, I find that I have to manually fix the subscripts and superscripts so that Word recognizes them. Additionally, all of the \refs work in the final output PDF, whereas they may be a problem in a split-file approach.