I need to extract just the text and math expressions (ignoring tables, images and styling) from a set of LaTeX documents and represent them in HTML.
It looks like plasTeX and MathJax should be enough for this task.
As I understand it, once plasTeX has parsed a document, I would need to collect all of its text nodes and all of its math nodes. For the math nodes I would like to preserve their LaTeX source so that MathJax can render it. Is it possible, using plasTeX, to get the LaTeX source of a math expression with all the commands already applied?
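
To make the intended output concrete, here is a minimal sketch of the HTML side only. It does not use plasTeX: a regular expression stands in for the parser, it only recognises `$...$` and `\[...\]`, and it does not apply any commands, which is exactly the part I hope plasTeX can do. The name `latex_to_html` is a hypothetical helper of my own.

```python
import html
import re

# Assumption: a regex stands in for the real parser here. It only finds
# $...$ and \[...\] and leaves every command inside the math untouched.
MATH_RE = re.compile(
    r"\$(?P<inline>[^$]+)\$|\\\[(?P<display>.+?)\\\]",
    re.DOTALL,
)


def latex_to_html(source: str) -> str:
    """Escape plain text and re-emit math source inside MathJax delimiters."""
    parts = []
    pos = 0
    for m in MATH_RE.finditer(source):
        # Plain text before this math span: HTML-escape it.
        parts.append(html.escape(source[pos:m.start()], quote=False))
        if m.group("inline") is not None:
            # Inline math: MathJax's default inline delimiters are \( ... \).
            body = html.escape(m.group("inline"), quote=False)
            parts.append(r"\(" + body + r"\)")
        else:
            # Display math: MathJax's default display delimiters include \[ ... \].
            body = html.escape(m.group("display"), quote=False)
            parts.append(r"\[" + body + r"\]")
        pos = m.end()
    # Trailing plain text after the last math span.
    parts.append(html.escape(source[pos:], quote=False))
    return "".join(parts)
```

For example, `latex_to_html(r"If $a < b$ then \[x^2 \ge 0\]")` gives a string in which the text and the math source are both HTML-escaped and the math is wrapped in `\(...\)` or `\[...\]`, ready for MathJax to pick up. With plasTeX, the regex would be replaced by a walk over the parsed document.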