8

Is there a TeX engine that converts a TeX document to MathML or XML directly the way PDFTeX converts text input to PDF?

I am familiar with some converters that transform a subset of TeX into something that can be viewed on a browser, but they all leave out many of the packages I used, particularly hyperref and xypic. I want a converter that passes Knuth's test for genuine TeX.

doncherry
  • 54,637
  • As far as I know, it doesn't exist. tex4ht does a good job in many cases, though, with hand-written support for many packages. – Bruno Le Floch May 24 '11 at 00:11
  • 1
    ConTeXt supports an XML backend (that can be used to export to XHTML or EPUB as well). But I guess that you are interested in LaTeX to XML. – Aditya May 24 '11 at 00:31
  • 1
    @Aditya: true, I keep forgetting that you ConTeXt guys are much further along on this than LaTeX. I think that SixWingedSeraph's question is about TeX itself, though. – Bruno Le Floch May 24 '11 at 02:56
  • @Bruno: Although ConTeXt is able to process plain TeX input, I really doubt that it will generate any useful XML from that. To generate XML, all ConTeXt commands use "tags" that are then translated into XML. (The standard ConTeXt commands for defining new macros, \define and \definestartstop, automatically insert these tags.) Unless the macros used by plain TeX are redefined in terms of these tags, ConTeXt won't be able to give useful XML output. – Aditya May 24 '11 at 04:57
  • 1
    I doubt very much that you will ever get a genuine TeX to XML converter because TeX can do things with its output that XML can't - in particular, it can measure the size of various bits of its output and take action accordingly. Since an XML document doesn't know how it will be rendered, this would be extremely difficult to implement. That's why the converters in existence tend to work on a subset and build outwards. – Andrew Stacey May 24 '11 at 07:17
  • @Andrew: a hypothetical xmlTeX might measure sizes of boxes as if it were TeX (on an infinitely long page), and "take action accordingly", even though the final output would not necessarily conform to these measures. There would be a "best font/page size (etc)" to see the result, and any change may or may not work. – Bruno Le Floch May 24 '11 at 12:53
  • As a result of these comments, I am now pretty sure that there is no TeX to XML of the sort I was asking for. As I understand it, PDFTeX was created by taking Knuth's C source code for TeX and changing the output to PDF commands instead of DVI instructions. To convert to XML you would have to change the output to XML code, and, probably, something like Java code for some things that HTML can't do. – SixWingedSeraph May 25 '11 at 15:23
  • @Bruno: But how is it going to measure these boxes? It can't know what font I'm going to use - or even if I'm going to use a font (what about spoken text?). Somewhere else, the values of crocdoc (or some such name) were being extolled. It looked great, until I changed the font and then it looked an absolute mess. The flexibility of XHTML is an asset, the rigidity of TeX is likewise an asset. But the two are (as I see it) incompatible and to cripple one to fit the other is to miss the value of each. – Andrew Stacey May 25 '11 at 21:12
  • @SixWingedSeraph: if you have more specific needs than just "LaTeX to XML" then I might be able to help. I'm the sysadmin of the nLab project and there we confront this sort of thing quite often. We don't have perfect solutions, but we have things that work for us and some might work for you. Feel free to contact me by email if you wish. – Andrew Stacey May 25 '11 at 21:14
  • @Andrew: would it be possible to do what cannot be done in XHTML via some scripting language, whose actions would depend on e.g., the font chosen? [I think I'm finally understanding why TeX->XHTML is not done.] – Bruno Le Floch May 27 '11 at 13:59
  • @Bruno: Technically it would, and I've heard of some systems that do so. But relying on client-side JavaScript is a little risky; it also doesn't solve the question of what happens if the user does something you didn't anticipate, and if you really need that much control then you should just use PDF. I think that one should play to the strengths of the set-up you are using and not try to make it be something it isn't. – Andrew Stacey May 28 '11 at 17:50
  • @Bruno: PS, if you want to continue this discussion (which I'm more than happy to do) I think we should shift to chat to avoid clogging up the comment thread and flooding SixWingedSeraph's inbox. – Andrew Stacey May 28 '11 at 17:51
  • I would start with LaTeXML, which is probably the most advanced converter currently; see also http://tex.stackexchange.com/questions/43847/why-havent-any-tex-html-converters-been-updated-to-use-current-web-standards-s – David Carlisle Jul 19 '12 at 11:26

8 Answers

7

tex4ht can convert to XHTML and MathML. See my answer to this question.

raphink
  • 31,894
4

There's plasTeX and LaTeXML, written in Python and Perl respectively.

The KWARC group are a major user of LaTeXML.

3

You could give Illumino a try. It can handle LaTeX files and produce several output formats, such as:

  • Custom XML
  • Native PDF
  • PS
  • Custom HTML
  • others
Spike
  • 6,729
2

TeX4ht. It is part of the MiKTeX bundle and, perhaps, TeX Live.

2

I'm not sure if this TeX/LaTeX to MathML Online Translator is part of what you're looking for. It generates clunky but acceptable MathML from (La)TeX input.

Warrick
  • 681
2

As XML is just a syntax, it is trivial to convert any LaTeX into XML just by creating a file of the form <latex>... text of latex document with <, & and > replaced by &lt;, &amp; and &gt; ...</latex>. The result is XML but perhaps not what you meant. This isn't just a silly example though: people do such translations to store blobs of TeX in XML databases.
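A minimal sketch of that wrapping in Python, using only the standard library. The function name `wrap_latex_as_xml` and the sample input are made up for illustration; only the `<latex>` element and the three replacements come from the description above.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def wrap_latex_as_xml(latex_source: str) -> str:
    """Wrap raw LaTeX in a single <latex> element (hypothetical helper).

    escape() replaces &, < and > with &amp;, &lt; and &gt;.
    """
    return "<latex>" + escape(latex_source) + "</latex>"


# Illustrative input containing all three special characters.
sample = r"\begin{equation} a < b \& c > d \end{equation}"
xml_text = wrap_latex_as_xml(sample)

# Parsing the result gives back the original LaTeX unchanged.
recovered = ET.fromstring(xml_text).text
```

Parsing the wrapped string with any XML parser returns the original source, which is what makes this usable for storing TeX in an XML database.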

Another conversion to XML can be got by running latex to get dvi then using dvisvgm to convert the result to SVG. Again this is XML but perhaps not the kind of XML you intended as it is positioning every character by fixed coordinate positions. It does however cope with a very wide range of TeX inputs.
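The two steps can be sketched as a small Python driver. This assumes `latex` and `dvisvgm` are on the PATH; the helper names and the file name `paper.tex` are hypothetical, and the `--no-fonts` option (which draws glyphs as paths instead of embedding fonts) is an optional choice, not something the route requires.

```python
import subprocess
from pathlib import Path
from typing import List


def dvi_to_svg_commands(tex_file: str) -> List[List[str]]:
    """Build the two command lines: LaTeX to DVI, then DVI to SVG."""
    stem = Path(tex_file).stem
    return [
        # latex writes <stem>.dvi into the current directory.
        ["latex", "-interaction=nonstopmode", tex_file],
        # By default dvisvgm converts only the first page;
        # add "--page=1-" to convert every page.
        ["dvisvgm", "--no-fonts", f"{stem}.dvi"],
    ]


def convert(tex_file: str) -> None:
    """Run both steps, stopping at the first failure."""
    for command in dvi_to_svg_commands(tex_file):
        subprocess.run(command, check=True)


# Usage (needs a TeX installation, so it is not run here):
# convert("paper.tex")
```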

If you need a conversion to something not unlike HTML, say XHTML for text, MathML for maths and SVG for any vector images included, then the issues relate to the nature of the HTML format: that it is expected to reflow in the client and use reader- rather than author-specified fonts in many cases, and that the rendering environment (even with JavaScript enabled) isn't as tightly bound to the programming structures as happens in TeX. The fact that that system uses an XML (or HTML) syntax rather than a backslash-and-brace syntax is the least of the issues involved in a translation.

If you do want to translate to XHTML+MathML, then LaTeXML is a good place to start.

David Carlisle
  • 757,742
1

Please have a look at http://www.texfolio.org

TeXFolio is a web-based typesetting framework. It can convert a TeX document that is structured according to the TeXFolio coding scheme to Elsevier DTD XML/MathML or JATS XML/MathML. It has a lot of other features as well.

-1

With TeX4ht, as supplied in MiKTeX, you can run mzlatex filename (without the extension) to create a file with the .xml extension. You can then clean up the result with scripts according to your requirements.

Torbjørn T.
  • 206,688