
There is a tool called pdf2htmlEX that does an impressive job of converting PDF to HTML. The output is very faithful: even books full of mathematics and diagrams have been converted accurately.

Unfortunately, it is no longer maintained. Is anyone aware of a maintained tool that does as good a job of converting either PDF or LaTeX into HTML?

Roxy
  • This question is the reverse of https://tex.stackexchange.com/q/3079/34551 – Clément Jul 14 '20 at 14:40
  • "called pdf2htmlex that does an impressive job of converting PDF to HTML": I've been trying it. It does a good job, except that the quality of rendering is horrible. The font is too light and the math is hard to read. I tried zooming in to be able to read it, but the underlying font is too light. I could not figure out how to improve it. Also, the tool is abandoned. Too bad, as it looks promising. But I do not want to waste time on software that no one maintains. – Nasser Apr 29 '22 at 06:29
  • I looked at your book, and you can see the same problem there: it is hard to read because the font is just too light and the math is hard to make out. No one will read this. Compare this to MathJax, which gives much better rendering quality. tex4ht is much better. But if pdf2htmlEX can improve the quality of the fonts and rendering, then it has potential. – Nasser Apr 29 '22 at 06:32

2 Answers


I'd recommend make4ht. From the documentation:

make4ht is a simple build system for tex4ht, TeX to XML converter. It provides a command line tool that drive the conversion process. It also provides a library which can be used to create customized conversion tools.

The author of make4ht is michal-h21, who is a very active contributor to this site.

Let's use the following small example, mwe.tex, in what follows:

\documentclass{article}

\begin{document}

Here is some text. And here is some mathematical content $y=x^2$.
\end{document}

example 1

Running

 make4ht.exe mwe.tex

gives the output:

<!DOCTYPE html> 
<html lang="en-US" xml:lang="en-US" > 
<head><title></title> 
<meta  charset="iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<meta name="viewport" content="width=device-width,initial-scale=1" /> 
<link rel="stylesheet" type="text/css" href="mwe.css" /> 
<meta name="src" content="mwe.tex" /> 
</head><body 
>
<!--l. 5--><p class="noindent" >Here is some text. And here is some mathematical content <span 
class="cmmi-10">y </span>= <span 
class="cmmi-10">x</span><sup><span 
class="cmr-7">2</span></sup>. </p> 
</body> 
</html>

example 2

From here we can customise the output by employing configuration files; if you have the following:

roxy.cfg

\Preamble{mathml,-css,NoFonts}
\Configure{@HEAD}{\HCode{<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML">\Hnewline</script>\Hnewline}}
\begin{document}
\EndPreamble

and run

make4ht.exe -f html5 -c roxy.cfg mwe.tex

then you receive

<!DOCTYPE html> 
<html lang="en-US" xml:lang="en-US" > 
<head> <title></title> 
<meta  charset="iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.tug.org/tex4ht/)" /> 
<meta name="viewport" content="width=device-width,initial-scale=1" /> 
<link rel="stylesheet" type="text/css" href="\aa:CssFile " /> 
<meta name="src" content="mwe.tex"> 
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML"> 
</script> 
</head><body 
>
<!--l. 5--><p class="noindent" >Here is some text. And here is some mathematical content
<!--l. 5--><math 
 xmlns="http://www.w3.org/1998/Math/MathML"  
display="inline" ><mi 
>y</mi> <mo 
class="MathClass-rel">=</mo> <msup><mrow 
><mi 
>x</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></math>.

</body> </html>
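
One way to confirm that a configuration such as roxy.cfg really switched the math from font-styled spans to MathML is to count the math elements in the generated file. The following is a minimal Python sketch and not part of make4ht; the names MathCounter and count_mathml are made up for illustration, and the two strings are shortened fragments of the outputs of example 1 and example 2.

```python
from html.parser import HTMLParser


class MathCounter(HTMLParser):
    """Count <math> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.math_elements = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "math":
            self.math_elements += 1


def count_mathml(html_text):
    """Return the number of <math> elements found in html_text."""
    counter = MathCounter()
    counter.feed(html_text)
    counter.close()
    return counter.math_elements


# Shortened fragments of the example 1 and example 2 outputs (assumed inputs).
example_1 = (
    '<p class="noindent">content <span class="cmmi-10">y </span>= '
    '<span class="cmmi-10">x</span><sup><span class="cmr-7">2</span></sup>. </p>'
)
example_2 = (
    '<p class="noindent">content '
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="inline">'
    '<mi>y</mi><mo class="MathClass-rel">=</mo>'
    '<msup><mrow><mi>x</mi></mrow><mrow><mn>2</mn></mrow></msup></math>.'
)

print(count_mathml(example_1))  # 0: math is typeset with font spans
print(count_mathml(example_2))  # 1: math is emitted as MathML
```

For a real build, the same function could be applied to the contents of the generated HTML file, read with open(..., encoding=...) using whichever charset the file declares.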

example 3

If you have HTML Tidy installed, you can customise your build process to use it with the following roxy.mk4 file:

roxy.mk4

Make:match("html$", "tidy -m -config html-tidy.txt -i ${filename}")

and an html-tidy.txt configuration file

// sample config file for HTML tidy
indent: auto
indent-spaces: 2
quiet: yes
output-xhtml: no
output-html: yes

then you can run

make4ht.exe -f html5 -e roxy.mk4 -c roxy.cfg mwe.tex

to receive

<!DOCTYPE html>
<html lang="en-US">
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Windows version 5.6.0">
  <title></title>
  <meta charset="utf-8">
  <meta name="generator" content=
  "TeX4ht (http://www.tug.org/tex4ht/)">
  <meta name="viewport" content=
  "width=device-width,initial-scale=1">
  <link rel="stylesheet" type="text/css" href="\aa:CssFile">
  <meta name="src" content="mwe.tex">
  <script src=
  "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
</head>
<body>
  <!--l. 5-->
  <p class="noindent">Here is some text. And here is some
  mathematical content <!--l. 5--><math xmlns=
  "http://www.w3.org/1998/Math/MathML" display="inline">
  <mi>
    y
  </mi>
  <mo class="MathClass-rel">
    =
  </mo>
  <msup>
    <mrow>
      <mi>
        x
      </mi>
    </mrow>
    <mrow>
      <mn>
        2
      </mn>
    </mrow>
  </msup></math>.</p>
</body>
</html>
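
In the outputs produced with roxy.cfg, the stylesheet link reads href="\aa:CssFile", which looks like an unexpanded macro rather than a real file name (presumably a side effect of the -css option, though that is an assumption). If the link is unwanted, it can be stripped after the build. A minimal Python sketch; drop_broken_css_link is a hypothetical helper, not something make4ht provides:

```python
import re

# Matches a <link> tag whose href is the literal text \aa:CssFile,
# optionally followed by whitespace, plus any whitespace after the tag.
BROKEN_CSS_LINK = re.compile(
    r'<link\b[^>]*href="\\aa:CssFile\s*"[^>]*>\s*',
    re.IGNORECASE,
)


def drop_broken_css_link(html_text):
    """Remove stylesheet links whose href is the unexpanded macro name."""
    return BROKEN_CSS_LINK.sub("", html_text)


# Fragment of the <head> from the example 3 output (assumed input).
head = (
    '<meta name="viewport" content="width=device-width,initial-scale=1">\n'
    '<link rel="stylesheet" type="text/css" href="\\aa:CssFile">\n'
    '<meta name="src" content="mwe.tex">\n'
)

print(drop_broken_css_link(head))
```

A regular expression is enough here because the tag has a fixed, known shape; for arbitrary HTML a proper parser would be the safer choice.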
cmhughes

Try pandoc. It can write PDF but cannot read it, so convert from the LaTeX source instead.

long version:

pandoc -f latex -t html -o output.html filename.tex

short version:

pandoc filename.tex -o output.html

The pandoc website has an extensive list of examples for many kinds of conversion, and it's super fast!

On macOS, all you have to do to install it is:

brew install pandoc
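
For LaTeX input, a pandoc call can be wrapped in a small script. This is only a sketch: pandoc_command is a hypothetical helper name, mwe.tex is a placeholder input, and the -s (standalone document) and --mathml options are additions that the commands above do not use. The conversion itself runs only if pandoc is on the PATH and the input file exists.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(tex_file, mathml=True):
    """Build a pandoc command that converts a LaTeX file to standalone HTML."""
    source = Path(tex_file)
    command = [
        "pandoc",
        "-f", "latex",
        "-t", "html",
        "-s",  # standalone: emit a complete HTML document
        "-o", str(source.with_suffix(".html")),
    ]
    if mathml:
        command.append("--mathml")  # render math as MathML
    command.append(str(source))
    return command


command = pandoc_command("mwe.tex")
print(" ".join(command))

# Only run the conversion when pandoc is installed and the input exists.
if shutil.which("pandoc") and Path("mwe.tex").exists():
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.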