
I would like to replace every user-defined macro in papers written in TeX with its definition. That is, if TeXing is considered a function f and my standardizing script is a function g, I want f(g(x)) = f(x) for any TeX code x that can be TeXed without errors.

From similar questions others have asked before it is clear that this will require some really hard work because TeX is extremely customizable. Well, I still want to do it.

I do want to know what I'm about to get into though.

  1. Do I essentially need to create almost half of a TeX engine (i.e., the input processor and the expansion processor)?

  2. Assuming the work is actually finished, will I have to keep updating my code for the rest of my life to keep it compatible with the latest packages, even though Knuth's TeX engine itself remains permanently stable?

  3. If the goals in 1 and 2 are indeed infeasible for one person to reach in their leisure time, is it feasible to achieve something less ambitious, namely making sure that at least 50% of the papers actually on arXiv get successfully standardized? The papers are almost always written in LaTeX, often using one of the AMS packages. However, they tend to include low-level code such as \def and sometimes even \let, which causes de-macro to be ineffective in standardizing them.
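To make the failure mode in point 3 concrete, here is a hypothetical preamble fragment (the macro names are invented for illustration). A tool like de-macro handles the \newcommand line, but the \def and \let lines pass through it untouched:

```latex
% Hypothetical preamble of the kind found in arXiv papers:
\def\R{\mathbb{R}}        % plain-TeX definition; de-macro does not expand it
\let\eps\varepsilon       % alias created with \let; also invisible to de-macro
\newcommand{\cat}[1]{\mathcal{#1}}  % the only definition de-macro handles
```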

I'm reading Knuth's The TeXbook right now.

Ying Zhou
  • To be perfectly honest, I don't really understand your goal. – Johannes_B Apr 18 '19 at 04:19
  • And you've asked basically the same question before (which was also closed as a duplicate of the one I link to above): Expand all \newcommand without doing anything else in LaTeX? – Alan Munn Apr 18 '19 at 04:19
  • @AlanMunn No. That question is about people trying to de-macro their own papers. My question on the other hand is about writing a program that can de-macro most if not all papers with no manual fixing of anything. – Ying Zhou Apr 18 '19 at 04:27
  • @Johannes_B My goal is to remove custom macros from arbitrary papers written in TeX so that a standardized form can be achieved.

    For example, one person may use \cata as a shortcut for \mathcal{A} via \newcommand{\cata}{\mathcal{A}}, while someone else may use \ca as a shortcut for \mathcal{A} via \def\ca{\mathcal{A}}. My goal is for both to be expanded into \mathcal{A}.

    – Ying Zhou Apr 18 '19 at 04:28
  • Why do you want to do that? LaTeX is a macro package, which you want to demacrofy. Nobody would be able to read the result. – Johannes_B Apr 18 '19 at 04:31
  • @Johannes_B Well, I don't want to demacrofy macros defined by TeX, LaTeX or well-known packages such as tikz-cd or amsthm. My problem is with user-defined ones at the beginning of papers, namely that long list of \defs, \newcommands, \lets and \DeclareMathOperators. – Ying Zhou Apr 18 '19 at 04:33
  • I have considered the possibility that some people may deliberately revise catcodes and create completely absurd situations. I assume that this does not happen in actual papers other than \makeatletter and \makeatother. – Ying Zhou Apr 18 '19 at 04:35
  • @YingZhou You would be surprised :-) A lot of crazy things happen even in “actual papers” and of course in the source of LaTeX packages themselves (which are after all written by other humans of varying levels of skill and style). – ShreevatsaR Apr 18 '19 at 05:08
  • @ShreevatsaR Yeah. I have checked out some random papers in math, physics and economics. I just became increasingly mad at the fact that people are using all kinds of crazy things. – Ying Zhou Apr 18 '19 at 05:29
  • @YingZhou Why? It's just a tool for typesetting; people will use it however they can, and there's nothing wrong if it works for them. Even Knuth's TeXbook does not follow a lot of the LaTeX dogma, being of course not written by a LaTeX user. – ShreevatsaR Apr 18 '19 at 05:54
  • @ShreevatsaR Because I want to use them for the purpose of machine learning in order to help automate some parts of scientific research. I can either focus on perfectly standardizing TeX or improving some variant of Im2Latex, that is, using machine learning to de-tex PDFs. The latter will contain errors which will ruin automated science hence I prefer the former. – Ying Zhou Apr 18 '19 at 06:13
  • You could write a simplified command that ignores \usepackage, \RequirePackage, \documentclass, etc. and only demacrofies \let, \def, \newcommand, \renewcommand, \DeclareRobustCommand, etc. within the document itself. That would cover the most common cases. Alternatively, write a full TeX parser but only un-macrofy definitions within the main document, not any packages it includes. I'm not sure why you need to do this, however. – Davislor Apr 18 '19 at 07:12
  • If you just have a limited number of macros to correct in a single document, and that’s overkill, search/replace is your friend. – Davislor Apr 18 '19 at 07:15
  • I’d recommend against this in general. It’s sometimes the case that you want semantic markup. For example, you might want to copy-paste a formula from a paper that uses a journal’s house style into a Beamer presentation that uses sans-serif fonts. – Davislor Apr 18 '19 at 07:18
  • There is no difference to TeX between a user-defined macro \cata and a LaTeX-defined macro \mathcal; they are both simply macros that expand to a list of tokens, so at what point do you want to stop expanding? This is a completely undefined problem, and a duplicate. – David Carlisle Apr 18 '19 at 10:37
  • @Davislor Well, I want to be able to partly demacrofy a very large number of papers written by others automatically, so search/replace is not acceptable.

    Yeah I think I will just modify de-macro.

    – Ying Zhou Apr 18 '19 at 14:28
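The "simplified command" idea suggested in the comments can be sketched in a few lines of Python. This is an illustration only, not a working demacrofier: it handles just parameterless \newcommand and \def with at most one level of brace nesting, and the function name and regexes are my own invention. Real TeX (catcodes, macro arguments, \let, conditionals) needs a real tokenizer, which is exactly the point of the question.

```python
import re

# Naive sketch: collect parameterless \newcommand and \def definitions from
# the document itself (not from any package) and substitute them repeatedly
# until a fixed point, since macros may expand to other macros.

BODY = r'((?:[^{}]|\{[^{}]*\})*)'  # brace group, one level of nesting allowed
DEF_RE = re.compile(
    r'\\newcommand\{\\([A-Za-z]+)\}\{' + BODY + r'\}'
    r'|\\def\\([A-Za-z]+)\{' + BODY + r'\}'
)

def expand_macros(source: str, max_passes: int = 10) -> str:
    """Expand parameterless \\newcommand/\\def macros defined in `source`."""
    defs = {}
    for m in DEF_RE.finditer(source):
        name = m.group(1) or m.group(3)
        defs[name] = m.group(2) if m.group(1) else m.group(4)
    result = DEF_RE.sub('', source)      # drop the definition lines themselves
    for _ in range(max_passes):
        expanded = result
        for name, body in defs.items():
            # A control word ends at the first non-letter (TeX's tokenizer
            # rule), hence the negative lookahead. Backslashes in the
            # replacement text must be escaped for re.sub.
            expanded = re.sub(r'\\' + name + r'(?![A-Za-z])',
                              body.replace('\\', '\\\\'), expanded)
        if expanded == result:           # fixed point reached
            break
        result = expanded
    return result
```

For the example from the comments, both `\newcommand{\cata}{\mathcal{A}} ... \cata` and `\def\ca{\mathcal{A}} ... \ca` come out as `\mathcal{A}`. Even this toy version already shows where the hard cases begin: macros with arguments, \let aliases, and redefinitions all fall outside what a regex can see.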

0 Answers