Automatically detect acronyms without \ac or \gls around them

Question

My Situation:

I am using the glossaries package to manage my acronyms. They are all neatly defined in one file, and the list is generated automatically. But I have to confess: I was very inconsistent in using \gls{} in the main document.

Now, I ended up with a structured acronyms.tex:

\newacronym{rx}{RX}{receive}
[...]

But a chaotic main.tex (example made up):

The \gls{rx} channel is the receiving channel, 
but only if Rx-buffer has RX capacity. 
Also receive channels have \emph{rx}-flags [...]

That is, an acronym may appear correctly escaped with \gls{}, as the bare short form in any capitalization, or written out in full. I want every occurrence to be correctly escaped.
Additionally, the acronyms may occur inside other words. Those should stay the same.

Question:

Is there a quick solution to clean up the messed up main.tex?

Preferably automatically, using the structured data in acronyms.tex and with plain pdflatex / latexmk.

Some Ideas:

I got pretty far using find/replace with regex, but it is very tedious and often misses edge cases.

Fully automating the process seems hard, see: Typesetting acronyms without explicitly marking them. But detection of non-\gls{} entries with not too many false positives would already do the job for me.
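
To make the detection idea concrete, here is a minimal Python sketch of a checker (not a fix-up tool). It rests on several assumptions: acronyms.tex contains only simple \newacronym{label}{short}{long} entries without nested braces, any command whose name starts with \gls or \ac counts as already marked, and the names load_acronyms and find_unmarked are made up for this illustration. It only reports candidates and leaves the actual replacement to manual review.

```python
import re

# Assumption: entries look like \newacronym[opts]{label}{short}{long}
# with no nested braces in any argument.
NEWACRONYM = re.compile(
    r"\\newacronym(?:\[[^\]]*\])?\{([^{}]+)\}\{([^{}]+)\}\{([^{}]+)\}"
)

# Assumption: any command starting with \gls or \ac (any case) marks an acronym.
# This also hides unrelated commands such as \acute{...}.
MARKED = re.compile(
    r"\\(?:gls|ac)[a-z]*\*?(?:\[[^\]]*\])?\{[^{}]*\}", re.IGNORECASE
)


def load_acronyms(acronyms_tex):
    # Map each label to its (short, long) forms.
    return {
        m.group(1): (m.group(2), m.group(3))
        for m in NEWACRONYM.finditer(acronyms_tex)
    }


def find_unmarked(main_tex, acronyms):
    # One case-insensitive pattern per label; the lookarounds reject matches
    # that have a letter or digit directly before or after them.
    patterns = {
        label: re.compile(
            r"(?<![A-Za-z0-9])(?:%s)(?![A-Za-z0-9])"
            % "|".join(re.escape(form) for form in forms),
            re.IGNORECASE,
        )
        for label, forms in acronyms.items()
    }
    hits = []
    for line_no, line in enumerate(main_tex.splitlines(), start=1):
        # Blank out already-marked commands, keeping column positions intact.
        masked = MARKED.sub(lambda m: " " * len(m.group()), line)
        for label, pattern in patterns.items():
            for m in pattern.finditer(masked):
                hits.append((line_no, label, m.group()))
    return hits


if __name__ == "__main__":
    acronyms = load_acronyms(r"\newacronym{rx}{RX}{receive}")
    main_tex = r"""The \gls{rx} channel is the receiving channel,
but only if Rx-buffer has RX capacity.
Also receive channels have \emph{rx}-flags"""
    for line_no, label, text in find_unmarked(main_tex, acronyms):
        print(f"main.tex:{line_no}: '{text}' could be \\gls{{{label}}}")
```

On the made-up example this flags Rx and RX on line 2 and receive and rx (inside \emph{}) on line 3, while skipping \gls{rx} and the word "receiving". Inflected long forms, acronyms in comments, and matches inside labels or file names would still need manual attention.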

Skimming the acro and glossaries manuals, I only found ways to automate the list of abbreviations and the index (at 700 pages, I could very well have missed something).

There is a very similar answer, and there are others that use LuaLaTeX and XeLaTeX. So it is probably possible to hack something together in Lua/regex. Unfortunately, I only know Python...

So before I reinvent the wheel, I would like to know if anyone already faced this problem.

pdftex has essentially no access to the text, so if using pdftex a regex edit to fix your source is the only practical option. If using luatex you could use the input buffer callback to do lua string replace on the fly, but I think fixing the source would be preferable — David Carlisle, Apr 07 '23 at 09:37
Imho, searching and replacing in the source in a semi-automatic way (with regex) is the only sensible way. — Ulrike Fischer, Apr 07 '23 at 09:40
Yes, fixing the source is much preferred, because others may need to build the project in the future. I am not sure about switching engines at the moment, so another way would be better. @UlrikeFischer I do that, but I tend to miss many edge cases, like start of line, hyphens, colons, or \emph{} around the word. Maybe I will try to write a Python script to aid in that. — Paul Smith, Apr 07 '23 at 09:56
well I would use some grep to get a list of "rx/RX/Rx" and go through it to handle the cases. — Ulrike Fischer, Apr 07 '23 at 10:06
