
I would like to convert the file below to MathML, and I also need the original LaTeX code included in the MathML semantics tag.

MWE:

\documentclass{article}
\usepackage[T1]{fontenc}

\begin{document}

\article{Article Title Here} \author{Author Name Here} \maketitle

\section{Introduction}

This is the sample paragraph.
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}

Please refer the equations \ref{eq1-11} for the further testing.
\end{document}

MadyYuvi
Balaji
  • sorry, I am a bit busy right now, so I cannot test it. I did some tests in this area some time ago: https://github.com/michal-h21/mathdimen . I don't know if it still works, as I said it is quite old. – michal.h21 Oct 09 '20 at 13:38
  • Must it be tex4ht? https://tex.stackexchange.com/q/227195/107497 gives that as an option, plus pandoc, latexml, and a link to a website with dozens of other options. – Teepeemm Oct 09 '20 at 14:47

2 Answers


There are several possible ways to achieve this:

  1. configure TeX4ht to catch all math content and typeset it twice: once as MathML, and a second time as verbatim text.
  2. parse the MathML content and convert it back to LaTeX code
  3. pre-process the input TeX file and modify it so that it is easier to work with

The first method could reuse the code that we use for the MathJax option in TeX4ht, see file mathjax-latex-4ht.4ht for details.

The second method won't reproduce the original LaTeX input exactly, which may be a problem for you. LuaXML can be used for the conversion.

I will present the third method in my answer. It consists of two components: an input filter that parses the input LaTeX file for math content and marks it with some additional macros, and a make4ht DOM filter that modifies the resulting HTML file to produce the correct MathML structure.

Here is the input filter. It reads input from the standard input and prints the modified output.

File altmath.lua:

-- insert environments that should be handled by the script here
local math_environments = {
  equation = true,
  displaymath = true,
  ["equation*"] = true,
}

-- macros that will be inserted to the updated document
local macros = [[
\NewDocumentCommand\inlinemath {mv} {\HCode{<span class="inlinemath">}#1\HCode{<span class="alt">}\NoFonts #2\EndNoFonts\HCode{</span></span>}}
\NewDocumentEnvironment{altdisplaymath}{} {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="altmath">}} {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
]]

-- we will insert macros before the second control sequence
-- (we assume that the first one is \documentclass)
local cs_counter = 0

-- we will handle inline and display math differently
local inline = 1
local display = 2

local function handle_math(input, nexts, stop, buffer, mathtype)
  local content = input:sub(nexts, stop)
  local format = "\\inlinemath{%s}{%s}" -- format used to insert math content back to the doc
  -- set format for display math
  if mathtype == display then
    format = [[
\begin{altdisplaymath}
%s
\begin{verbatim}
%s
\end{verbatim}
\end{altdisplaymath}
]]
  end
  buffer[#buffer + 1] = string.format(format, content, content)
end

local function find_next(input, start, buffer)
  -- find next cs or math start
  local nexts, stop = input:find("[$\\]", start)
  local mathtype
  if nexts then
    -- save current text chunk from the input buffer
    buffer[#buffer+1] = input:sub(start, nexts - 1)
    local kind, nextc = input:match("(.)(.)", nexts)
    if kind == "\\" then -- handle cs
      -- insert our custom TeX macros before second control sequence
      cs_counter = cs_counter + 1
      if cs_counter == 2 then
        buffer[#buffer+1] = macros
      end
      if nextc == "(" then -- inline math
        _, stop = input:find("\\)", nexts, true) -- plain text search
        mathtype = inline
      elseif nextc == "[" then -- display math
        _, stop = input:find("\\]", nexts, true) -- plain text search
        mathtype = display
      else -- maybe environment?
        -- find environment name
        local env_name = input:match("^begin%s*{(.-)}", nexts+1)
        -- it must be enabled as math environment
        if env_name and math_environments[env_name] then
          -- escape pattern magic characters (e.g. the * in equation*)
          local env_pattern = env_name:gsub("%p", "%%%0")
          _, stop = input:find("\\end%s*{" .. env_pattern .. "}", nexts)
          mathtype = display
        else -- not math environment
          buffer[#buffer+1] = "\\" -- save backslash that was eaten by the processor
          return stop + 1 -- return back to the main loop
        end
      end
    else -- handle $
      if nextc == "$" then -- display math
        _, stop = input:find("%$%$", nexts + 1)
        mathtype = display
      else -- inline math
        _, stop = input:find("%$", nexts + 1)
        mathtype = inline
      end
    end
    if not stop then -- something failed, move one char next
      return nexts + 1
    end
    -- save math content to the buffer
    handle_math(input, nexts, stop, buffer, mathtype)
  else
    -- if we cannot find any more cs or math, we need to insert rest of the input
    -- to the output buffer
    buffer[#buffer+1] = input:sub(start, string.len(input))
    return nil
  end
  return stop + 1
end

-- process the input buffer, detect inline and display math and also math environments
local function process(input)
  local buffer = {} -- buffer where text chunks are stored
  local start = 1
  start = find_next(input, start, buffer)
  while start do
    start = find_next(input, start, buffer)
  end
  return table.concat(buffer) -- convert output buffer to string
end

local content = io.read("*all")
print(process(content))

You can test it using the following command:

texlua altmath.lua < sample.tex

This is the modified version of your original TeX file (the sample used here also contains two inline formulas):

\documentclass{article}
\NewDocumentCommand\inlinemath {mv} {\HCode{<span class="inlinemath">}#1\HCode{<span class="alt">}\NoFonts #2\EndNoFonts\HCode{</span></span>}}
\NewDocumentEnvironment{altdisplaymath}{} {\ifvmode\IgnorePar\fi\EndP\HCode{<div class="altmath">}} {\ifvmode\IgnorePar\fi\EndP\HCode{</div>}}
\usepackage[T1]{fontenc}

\begin{document}

\title{Article Title Here} \author{Author Name Here} \maketitle

\section{Introduction}

This is the sample paragraph with \inlinemath{$a=b^2$}{$a=b^2$} inline math.
Different \inlinemath{\(a=c^2\)}{\(a=c^2\)} type of math.
\begin{altdisplaymath}
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
\begin{verbatim}
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
\end{verbatim}
\end{altdisplaymath}

Please refer the equations \ref{eq1-11} for the further testing.
\end{document}

You can see that it inserts macro definitions after the \documentclass command. It defines the \inlinemath command and the altdisplaymath environment. The definitions contain code that inserts HTML tags directly into the converted file, so they are designed to be used only with TeX4ht.
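To make the wrapping step concrete, here is a minimal stand-alone sketch of what the filter does with one piece of math source. The helper name wrap_math and the sample strings are only for illustration and are not part of altmath.lua; the sketch assumes the same \inlinemath and altdisplaymath markup as above.

```lua
-- Stand-alone sketch (hypothetical helper, not part of altmath.lua):
-- wrap a math snippet so that it appears twice, once for typesetting
-- and once as verbatim text.
local inline_format = "\\inlinemath{%s}{%s}"
local display_format = table.concat({
  "\\begin{altdisplaymath}",
  "%s",
  "\\begin{verbatim}",
  "%s",
  "\\end{verbatim}",
  "\\end{altdisplaymath}",
  "",
}, "\n")

local function wrap_math(content, is_display)
  local format = is_display and display_format or inline_format
  -- content is passed as an argument, so "%" characters inside it are
  -- not interpreted as format directives
  return string.format(format, content, content)
end

print(wrap_math("$a=b^2$", false))
-- prints: \inlinemath{$a=b^2$}{$a=b^2$}
print(wrap_math("\\[ a=c^2 \\]", true))
```

The math source is duplicated on purpose: the first copy is typeset by TeX4ht as MathML, the second copy survives as plain text that can later be moved into the MathML.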

You can convert your file to HTML using

texlua altmath.lua < sample.tex | make4ht -j sample - "mathml"

It produces the following code:

<span class='inlinemath'><!-- l. 14 --><math xmlns='http://www.w3.org/1998/Math/MathML' display='inline'><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>b</mi></mrow><mrow><mn>2</mn></mrow></msup></math><span class='alt'>$a=b^2$</span></span> 

or

<div class='altmath'> <!-- tex4ht:inline --><table class='equation'><tr><td>
<!-- l. 16 --><math xmlns='http://www.w3.org/1998/Math/MathML' display='block' class='equation'>
                       <mstyle class='label' id='x1-1001r1'></mstyle><!-- endlabel --><mi>T</mi><msubsup><mrow><mspace width='0.17em' class='thinspace'></mspace></mrow><mrow><mi mathvariant='italic'>μν</mi></mrow><mrow><mi>′</mi></mrow></msubsup> <mo class='MathClass-rel'>=</mo> <mrow><mo form='prefix' fence='true'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>α</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′μ</mi></mrow></msup></mrow></mfrac> </mrow><mo form='postfix' fence='true'>)</mo></mrow> <mrow><mo form='prefix' fence='true'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>β</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′ν</mi></mrow></msup></mrow></mfrac> </mrow><mo form='postfix' fence='true'>)</mo></mrow> <msub><mrow><mi>T</mi></mrow><mrow><mi mathvariant='italic'>αβ</mi></mrow></msub>
</math></td><td class='eq-no'>(1)</td></tr></table>
<!-- l. 18 --><p class='nopar'>

</p> <pre id='verbatim-1' class='verbatim'>
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
</pre>
<!-- l. 23 --><p class='nopar'> </p></div>

We need to use a make4ht DOM filter to create the correct MathML structure. Save the following file as build.lua (make4ht loads it when it is passed with the -e option):

local domfilter = require "make4ht-domfilter"

-- find mathml and insert TeX as an alternative annotation
local function update_mathml(element, class)
  local alt_element_t = element:query_selector(class)
  -- nothing to do if the alternative text is missing
  if not alt_element_t or not alt_element_t[1] then return nil end
  -- save alt element contents and remove it from the document
  local alt_contents = alt_element_t[1]:get_children()
  alt_element_t[1]:remove_node()
  -- create a new structure of the mathml element ->
  -- mathml
  --   semantics
  --     mrow -> math content
  --     annotation -> saved TeX
  local mathml = element:query_selector("math")[1]
  local mathml_contents = mathml:get_children()
  local semantics = mathml:create_element("semantics")
  local mrow = semantics:create_element("mrow")
  mrow._children = mathml_contents -- this trick places saved original mathml content into a new <mrow>
  semantics:add_child_node(mrow)
  local annotation = semantics:create_element("annotation", {encoding="application/x-tex"})
  annotation._children = alt_contents
  semantics:add_child_node(annotation)
  mathml._children = {semantics}
end

local process = domfilter {
  function(dom)
    for _, inline in ipairs(dom:query_selector(".inlinemath")) do
      update_mathml(inline, ".alt")
    end
    for _, display in ipairs(dom:query_selector(".altmath")) do
      update_mathml(display, ".verbatim")
    end
    return dom
  end
}

-- register the filter for the generated HTML file
Make:match("html$", process)

It parses the HTML file for our custom <span> and <div> elements, gets the alt text, and inserts it as an <annotation> element of the MathML code.
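The target structure may be easier to see without the DOM API. The following sketch builds the same <semantics> wrapper with plain string operations. The names wrap_semantics and xml_escape are hypothetical helpers, not make4ht or LuaXML functions, and the sketch assumes that the MathML content and the TeX source are already available as strings. It only illustrates the shape of the output, not how the DOM filter works internally.

```lua
-- Illustration only: build the <semantics> structure from two strings.
-- xml_escape and wrap_semantics are hypothetical helpers, not part of
-- make4ht or LuaXML.
local escapes = { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }

local function xml_escape(text)
  -- the parentheses drop the second return value of gsub (the match count)
  return (text:gsub("[&<>]", escapes))
end

local function wrap_semantics(mathml_content, tex_source)
  return "<semantics><mrow>" .. mathml_content .. "</mrow>"
    .. "<annotation encoding='application/x-tex'>"
    .. xml_escape(tex_source)
    .. "</annotation></semantics>"
end

print(wrap_semantics("<mi>a</mi><mo>=</mo><mi>b</mi>", "$a=b$"))
```

The extra <mrow> is there because <semantics> expects a single presentation child followed by the annotations, so the original children of <math> have to be grouped first.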

This is the result:

   <h3 class='sectionHead'><span class='titlemark'>1   </span> <a id='x1-10001'></a>Introduction</h3>
<!-- l. 14 --><p class='noindent'>This  is  the  sample  paragraph  with
<span class='inlinemath'><!-- l. 14 --><math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><semantics><mrow><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>b</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow><annotation encoding='application/x-tex'>$a=b^2$</annotation></semantics></math></span> inline math.
Different <span class='inlinemath'><!-- l. 14 --><math display='inline' xmlns='http://www.w3.org/1998/Math/MathML'><semantics><mrow><mrow><mi>a</mi> <mo class='MathClass-rel'>=</mo> <msup><mrow><mi>c</mi></mrow><mrow><mn>2</mn></mrow></msup></mrow></mrow><annotation encoding='application/x-tex'>\(a=c^2\)</annotation></semantics></math></span>
type of math. </p><div class='altmath'> <!-- tex4ht:inline --><table class='equation'><tr><td>
<!-- l. 16 --><math class='equation' xmlns='http://www.w3.org/1998/Math/MathML' display='block'><semantics><mrow>
                       <mstyle id='x1-1001r1' class='label'></mstyle><!-- endlabel --><mi>T</mi><msubsup><mrow><mspace width='0.17em' class='thinspace'></mspace></mrow><mrow><mi mathvariant='italic'>μν</mi></mrow><mrow><mi>′</mi></mrow></msubsup> <mo class='MathClass-rel'>=</mo> <mrow><mo fence='true' form='prefix'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>α</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′μ</mi></mrow></msup></mrow></mfrac> </mrow><mo fence='true' form='postfix'>)</mo></mrow> <mrow><mo fence='true' form='prefix'> (</mo><mrow> <mfrac><mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi>β</mi></mrow></msup></mrow>
<mrow><mi>∂</mi><msup><mrow><mi>ξ</mi></mrow><mrow><mi mathvariant='italic'>′ν</mi></mrow></msup></mrow></mfrac> </mrow><mo fence='true' form='postfix'>)</mo></mrow> <msub><mrow><mi>T</mi></mrow><mrow><mi mathvariant='italic'>αβ</mi></mrow></msub>
</mrow><annotation encoding='application/x-tex'>
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^\alpha} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^\beta}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}
</annotation></semantics></math></td><td class='eq-no'>(1)</td></tr></table>
<!-- l. 18 --><p class='nopar'>

</p>

<!-- l. 23 --><p class='nopar'> </p></div>

michal.h21
  • Awesome... Is it possible to get both the MathML and the LaTeX tags in the same file by using htlatex? – MadyYuvi Oct 12 '20 at 04:31
  • @MadyYuvi this could work with htlatex as well, in theory. You would need to make a temporary TeX file, because htlatex doesn't support pipes. You would also need to post-process the HTML file with the DOM filter. But anyway, htlatex is officially obsolete, because make4ht does a lot of fixes on the generated HTML file, including fixes on MathML. You most likely won't get correct MathML elements with htlatex alone. – michal.h21 Oct 12 '20 at 06:52

The provided MWE has many LaTeX coding errors. I've fixed them, and the modified code is:

\documentclass{article} 
\usepackage[T1]{fontenc}

\begin{document}

\title{Article Title Here}

\author{Author Name Here}

\maketitle

\section{Introduction}

This is the sample paragraph.
\begin{equation}\label{eq1-11}
T\,^{\prime}_{\mu \nu} = \left( \frac{\partial \xi^{\alpha}} {\partial\xi^{\prime\mu}}\right) \left( \frac{\partial \xi^{\beta}}{\partial \xi^{\prime\nu}} \right) T_{\alpha \beta}
\end{equation}

Please refer the equations \ref{eq1-11} for the further testing.
\end{document}

After correcting the errors, I ran the command

htlatex test "xhtml,mathml,mathml-" " -cunihtf" "-cvalidate -p"

It converts nicely...

EDIT

If you need to display the LaTeX code in the converted HTML, use the .cfg file below:

conversion.cfg

\RequirePackage{verbatim,etoolbox}

\Preamble{xhtml}
\def\AltMathOne#1${\HCode{\detokenize{\(#1\)}}$}
\Configure{$}{}{}{\expandafter\AltMathOne}
\def\AltlMath#1\){\HCode{\detokenize{\(#1\)}}\)}
\Configure{()}{\AltlMath}{}
\def\AltlDisplay#1\]{\HCode{\detokenize{\[#1\]}}\]}
\Configure{[]}{\AltlDisplay}{}
\def\AltDisplayOne#1#2$${#1\HCode{\detokenize{$$#2$$}}$$}
\Configure{$$}{}{}{\AltDisplayOne}{}{}
\newcommand\VerbMath[1]{%
\ifcsdef{#1}{%
  \renewenvironment{#1}{%
    \NoFonts%
    \Configure{verbatim}{}{} % suppress <br /> tags
    \texttt{\string\begin\{#1\}}\HCode{\Hnewline}% we need to use \texttt to get all characters right
    \verbatim}{\endverbatim\texttt{\string\end\{#1\}}\EndNoFonts}%
}{}%
}
\VerbMath{align}
\VerbMath{equation}
\VerbMath{equation*}

\begin{document}

\EndPreamble


Then run the command:

htlatex sample "conversion" " " "-cvalidate -p"
MadyYuvi
  • What are the "many" coding errors? I see only \article instead of \title... – campa Oct 09 '20 at 08:53
  • @MadyYuvi: Thanks for your reply. But I am expecting the MathML code as well as the input LaTeX code. In your example I found only MathML, not the LaTeX code. – Balaji Oct 09 '20 at 08:58
  • @campa I've modified OP's question and removed those errors... – MadyYuvi Oct 09 '20 at 11:55
  • What I meant is that (1) when you correct errors it would be helpful if you would describe them in the answer, and (2) you keep talking about errors (plural) while I see only one. (A couple of questionable other things, yes, but only one real error.) – campa Oct 09 '20 at 11:57
  • @MadyYuvi: How can I get both the MathML output and the LaTeX output in the HTML file? Please advise. – Balaji Oct 09 '20 at 13:25
  • @Balaji It is possible by getting TeX code and MathML code separately, but not sure about both will be in a same file....Trying... – MadyYuvi Oct 09 '20 at 13:31
  • @MadyYuvi: TeX code and MathML Code separately also it's enough and good for me and waiting for your reply. – Balaji Oct 09 '20 at 13:38
  • @Balaji Check the modified suggestion.... – MadyYuvi Oct 09 '20 at 13:47
  • @MadyYuvi: Now I am getting only the LaTeX code. I changed it to "\Preamble{mathml,-cunihtf,-utf8,mathjax,html5,early_,early^,ext=html}", but the MathML is still not displayed. Why, and how do I get both in the same HTML file? – Balaji Oct 09 '20 at 14:02
  • @Balaji I've already told that I'm not sure about to get both LaTeX and MathML tags in a same output HTML file... – MadyYuvi Oct 09 '20 at 14:04
  • It should be possible to reuse code from the TeX4ht code for MathJax output to output both LaTeX and MathML. But I don't think it will be too easy. Another option is to parse MathML back to LaTeX using make4ht DOM filters. It won't be easy either. – michal.h21 Oct 09 '20 at 14:07
  • @Mady: But with latexml it is possible to get both the MathML and the LaTeX output. Why is this not possible in tex4ht? Any ideas... – Balaji Oct 09 '20 at 14:07
  • @Balaji latexml is purely based on Perl scripts, but TeX4ht is not like that... – MadyYuvi Oct 09 '20 at 14:28
  • @michal: Do you know of a good tutorial on make4ht DOM filters with examples? I will try to study it and do something about my requirements. – Balaji Oct 09 '20 at 15:23
  • Actually tex4ht is very good and gives good results. Only some equations are not converted, like a_\mathit and user-defined macros... We are worried about this... Otherwise tex4ht is very good compared with latexml – Balaji Oct 09 '20 at 15:26
  • @Balaji see https://tex.stackexchange.com/a/548484/2891 for example of mathml parsing using LuaXML. it would need to be adapted for make4ht dom filters, of course. some info about DOM filters is here: https://tex.stackexchange.com/a/564006/2891 – michal.h21 Oct 09 '20 at 22:12