How would you go about writing a LaTeX compiler?

Question

I'm writing a LaTeX-to-text converter, i.e. it takes LaTeX code as input and outputs readable text, which is then read aloud automatically by a computer. With both parts connected, it becomes a LaTeX-to-speech program. An example would look something like this:

"5m and m is a unit, $m\cdot f(x)\cdot sin(90^\circ)=f'(x)$" is converted into -> "5 meter and meter is a unit, m times f of x times sine of 90 degrees equals f dash of x"

I've been working on this project for some time, and right now it can convert nearly every LaTeX formula, but the code is very ad hoc: when I began, the project seemed so large in scope that I thought the only way to start was not to overthink anything. Now that I have a better understanding of the problems involved, I want to rewrite some of the fundamental functions. Right now the program doesn't scan the input and build an abstract syntax tree like a typical compiler does, and I'm wondering whether copying classical compiler methods would be a good approach. I've read on StackExchange that the LaTeX compiler is based purely on macros, so maybe it would be smarter to look to a LaTeX compiler for inspiration? What do you think the best approach is?

For inspiration (or just to use one of these directly) you can look at https://github.com/alvinwan/TexSoup, https://pylatexenc.readthedocs.io/en/latest/latexwalker/, from pylatexenc also https://pylatexenc.readthedocs.io/en/latest/latex2text/, and https://dlmf.nist.gov/LaTeXML/. — Marijn, Apr 30 '21 at 19:59
Yours is not the first attempt to render "audible" LaTeX. An earlier project by T.V. Raman (a blind mathematician) produced his PhD thesis, "Audio System for Technical Readings" or "ASTeR" (named for his guide dog). Raman presented his work at a TUG meeting to an enthusiastic reception. Using it effectively does require an aware, cooperative author. Your approach may be different, but it's worth looking at earlier work. (Search the web for more references.) — barbara beeton, Apr 30 '21 at 22:04
See https://tex.stackexchange.com/questions/454944/is-there-screen-reader-software-or-a-built-in-method-that-supports-latex-equat — Steven B. Segletes, May 09 '21 at 03:14

score 1 · Answer 1 · answered May 31 '21 at 07:29

I faced the same challenge as you some years ago. I used ply.lex, a Python-based tool, to construct a very basic lexer and parser for some basic LaTeX formulas. The end result, although rough and very basic, was useful. If you want to check it out, it is here: blindtex, more specifically in the blindtex/latex2ast file. I must warn you that if you want to test it, the output is in Spanish, although my horrible comments are in English.
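
To give a rough idea of what the lexer-plus-parser approach looks like, here is a minimal sketch. It is not the blindtex code, and it uses only Python's standard `re` module in place of ply so that it runs without extra packages. The names `tokenize`, `parse`, `speak` and `latex_to_speech`, the tuple-based tree, and the tiny vocabulary are all made up for illustration, and only a handful of constructs (fractions, superscripts, primes, `\sin`/`\cos`, a few operators) are handled.

```python
import re

# One alternative per token kind; whitespace is matched so it can be skipped.
TOKEN_RE = re.compile(
    r"(?P<CMD>\\[A-Za-z]+)|(?P<NUM>\d+(?:\.\d+)?)|(?P<VAR>[A-Za-z])"
    r"|(?P<SYM>[{}()^=+\-'])|(?P<WS>\s+)"
)
# Illustrative vocabulary only; a real converter would need far more entries.
WORDS = {"=": "equals", "+": "plus", "-": "minus",
         "\\cdot": "times", "\\circ": "degrees"}
FUNCTIONS = {"\\sin": "sine", "\\cos": "cosine"}


def tokenize(src):
    """Lexer: turn the source string into (kind, text) pairs."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Recursive-descent parser: build a syntax tree of nested tuples."""
    pos = 0

    def expect(text):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][1] != text:
            raise ValueError(f"expected {text!r}")
        pos += 1

    def parse_seq(stop):
        # Collect items until the closing delimiter (or end of input).
        items = []
        while pos < len(tokens) and tokens[pos][1] != stop:
            items.append(parse_item())
        return ("seq", items)

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        kind, text = tokens[pos]
        pos += 1
        if text == "{":
            inner = parse_seq("}")
            expect("}")
            # A brace group with a single item collapses to that item.
            return inner[1][0] if len(inner[1]) == 1 else inner
        if text == "(":
            inner = parse_seq(")")
            expect(")")
            return ("paren", inner)
        if text == "\\frac":
            return ("frac", parse_atom(), parse_atom())
        if text in FUNCTIONS:
            return ("func", FUNCTIONS[text], parse_item())
        return (kind.lower(), text)

    def parse_item():
        # An atom followed by any number of postfix ^ or ' markers.
        nonlocal pos
        node = parse_atom()
        while pos < len(tokens) and tokens[pos][1] in ("^", "'"):
            pos += 1
            if tokens[pos - 1][1] == "^":
                node = ("sup", node, parse_atom())
            else:
                node = ("prime", node)
        return node

    return parse_seq(None)


def speak(node):
    """Walk the tree and produce the spoken text."""
    kind = node[0]
    if kind == "seq":
        return " ".join(speak(item) for item in node[1])
    if kind == "paren":
        return f"open parenthesis {speak(node[1])} close parenthesis"
    if kind == "frac":
        return f"{speak(node[1])} over {speak(node[2])}"
    if kind == "func":
        # Drop the parentheses around a function argument.
        arg = node[2][1] if node[2][0] == "paren" else node[2]
        return f"{node[1]} of {speak(arg)}"
    if kind == "sup":
        if node[2] == ("cmd", "\\circ"):
            return f"{speak(node[1])} degrees"
        return f"{speak(node[1])} to the power of {speak(node[2])}"
    if kind == "prime":
        return f"{speak(node[1])} prime"
    # Leaves (num, var, sym, cmd): look up a word, else read the text itself.
    return WORDS.get(node[1], node[1].lstrip("\\"))


def latex_to_speech(src):
    return speak(parse(tokenize(src)))
```

For example, `latex_to_speech(r"\frac{a}{b} = x^{2} + \sin(90^\circ)")` gives "a over b equals x to the power of 2 plus sine of 90 degrees". The point of the tree is that the wording decisions live in `speak` and are made per node type, rather than being spread over string replacements. Deciding whether `f(x)` is a function application or a product is a semantic question that this sketch does not attempt.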

As was pointed out in the comments, your best direction is to look at Raman's work. I tried, but as an amateur programmer, learning Lisp and Emacs in order to explore it in more detail was more than I could handle.

I hope this is useful and if you have questions you can contact me for more details.
