ConTeXt: Escape hash symbols for use with lpeg replacer

Question

Background

Looking to perform a text substitution on arbitrary strings. In the code snippet below, the value #a.m. #p.m. comes from a document. More specifically, the input is an XML document generated from Markdown that resembles:

<p>See <a class="href" data-type="tbl" href="#ref"/> for details.</p>

The #ref in the href attribute is what causes the problem.

System

ConTeXt version: 2023.09.26 18:19

Code

Minimal example to show the problem:

\startluacode
userdata = userdata or {}
userdata.TextReplacements = {}
local function TextReplacement( text )
  text = string.gsub( text, "#", "\#" )
  local replaced = lpeg.replacer( userdata.TextReplacements ):match( text )
  context( replaced )
end
interfaces.implement {
  name      = "TextReplacement",
  arguments = { "string" },
  public    = true,
  actions   = TextReplacement,
}
\stopluacode
\startluacode
userdata = userdata or {}
userdata.TextReplacements = {
  [1] = { "a.m.", "\cap{am}" },
  [2] = { "p.m.", "\cap{pm}" },
}
\stopluacode
\starttext
  \TextReplacement{#a.m. #p.m.}
\stoptext

Details

An additional detail is that the #ref value is being read into ConTeXt from the anchor's link and looked up as follows:

\startxmlsetups xml:xhtml
  \xmlsetsetup{\xmldocument}{a[@class='href']}{xml:anchorhref}
\stopxmlsetups
\startxmlsetups xml:anchorhref
  Xref = \xmlatt{#1}{data-type}-\xmlatt{#1}{href}
\stopxmlsetups

The xml:anchorhref setup runs from inside an xml:p setup, shown here:

\startxmlsetups xml:p
  \xmldoifnotselfempty{#1}{%
    \ignorespaces
    \expandafter\TextReplacement{\xmlflush{#1}}
    \removeunwantedspaces
  }
  \par
\stopxmlsetups

That call to \TextReplacement doesn't work because of the # symbol.

Problem

The wiki suggests calling lpeg.replacer( ... ):match( ... ) directly on the text, but without the preceding string.gsub call that produces a compile error.

Adding the call to string.gsub makes the compile error go away, but the output then contains doubled hash symbols:

##AM ##PM

Question

How do you escape the hash symbol, and any other characters that may break the string replacement, so that no doubled hashes are output?

"compiling the replacer each time is not efficient" -- moving the replacer shaves between 0.1 and 0.5 seconds off the ~30 second build. – Dave Jarvis Dec 01 '23 at 19:28
Indeed, but I guess it all adds up. May I ask how long (how many pages) your document is? – mickep Dec 01 '23 at 20:38
262 pages at present. The bottlenecks are probably the larger images and the code that draws hexagonal grids of random complexity and size, reminiscent of neural networks. My plan is to generate each book per person, so even 0.5 seconds helps. – Dave Jarvis Dec 02 '23 at 01:15
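The point in these comments is that the replacer should be built once and reused, not rebuilt on every call. That idea can be sketched in plain Lua. This is a hypothetical stand-in, because lpeg.replacer is ConTeXt-specific. The helper names literal, make_replacer and get_replacer are illustrative assumptions and not part of any ConTeXt API. The sketch escapes each search string so that "." is matched literally, and it caches one compiled function per replacement table.

```lua
-- Hypothetical plain-Lua stand-in for ConTeXt's lpeg.replacer; it only
-- illustrates "build once, reuse" and is not the ConTeXt API.

-- Escape punctuation so each search string matches literally
-- (as a raw Lua pattern, "a.m." would also match "abmc").
local function literal(s)
  return (s:gsub("%p", "%%%0"))
end

-- "Compile" a list of { from, to } pairs into a single function.
local function make_replacer(pairs_list)
  local compiled = {}
  for i, pair in ipairs(pairs_list) do
    -- Double any "%" in the replacement, since gsub treats "%" specially there.
    compiled[i] = { literal(pair[1]), (pair[2]:gsub("%%", "%%%%")) }
  end
  return function(text)
    for _, c in ipairs(compiled) do
      text = text:gsub(c[1], c[2]) -- keeps only gsub's first result
    end
    return text
  end
end

-- Cache compiled replacers per table; weak keys let unused tables be collected.
local cache = setmetatable({}, { __mode = "k" })

local function get_replacer(pairs_list)
  local replacer = cache[pairs_list]
  if replacer == nil then
    replacer = make_replacer(pairs_list)
    cache[pairs_list] = replacer
  end
  return replacer
end

local replacements = {
  { "a.m.", "\\cap{am}" },
  { "p.m.", "\\cap{pm}" },
}

local replace = get_replacer(replacements)   -- built on first use only
print(replace("10 a.m. to 4 p.m."))          -- 10 \cap{am} to 4 \cap{pm}
print(get_replacer(replacements) == replace) -- true: the cached function is reused
```

An LPeg substitution may scan the text only once. This sketch differs because it applies the pairs one after another, so the output of one pair could in principle be matched by a later pair.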

Answer (answered Dec 01 '23 at 07:55)

"...such that no double-substitutions are performed?"

There's not really a double substitution occurring. The hash symbol simply arrives in Lua doubled, because TeX typically doubles macro-parameter characters (#) when it converts tokens to a string. The following document

\startluacode
    interfaces.implement {
        name      = "test",
        arguments = { "string" },
        public    = true,
        actions   = function(str)
            print("START OUTPUT")
            print(str)
            print("STOP OUTPUT")
        end
    }
\stopluacode
\test{#a}

gives this as its output:

START OUTPUT
##a
STOP OUTPUT

so instead of

text:gsub("#", ...)

you'll need to write

text:gsub("##", ...)
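Here is a plain-Lua sketch of that fix. It assumes that each # typed in the TeX argument arrives in Lua as ##, as the test document above shows. The input string is simulated, and undouble_hashes is a hypothetical helper name. The sketch contrasts the question's single-hash gsub with matching the doubled form.

```lua
-- Simulated input: what Lua receives for \TextReplacement{#a.m. #p.m.},
-- per the START/STOP OUTPUT test above (each # arrives doubled).
local received = "##a.m. ##p.m."

-- Question's approach: escape every single # ...
local naive = received:gsub("#", "\\#")
-- ... which yields \#\#a.m. \#\#p.m., i.e. two typeset hashes per original one.

-- Hypothetical helper: match the doubled form and emit one escaped hash.
local function undouble_hashes(text)
  return (text:gsub("##", "\\#"))
end

local fixed = undouble_hashes(received)
print(naive) -- \#\#a.m. \#\#p.m.
print(fixed) -- \#a.m. \#p.m.
```

The result of undouble_hashes can then be passed to the replacer. That replacer sees single escaped hashes, so the typeset output has one # per input #.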

"How do you escape the hash symbol and any others..."

Lots of options:

\starttext
\startluacode
    local str = "#"

    -- Naïve solution, does not work
    -- context(str)
    -- context.par()

    -- Output in verbatim
    context.verbatim(str)
    context.par()

    -- Backslash escape (extra parentheses keep only gsub's first result)
    context((str:gsub(str, "\\#")))
    context.par()

    -- \char escape
    context((str:gsub(str, "\\char`\\#")))
    context.par()

    -- ConTeXt escape command
    context((str:gsub(str, "\\letterhash")))
    context.par()

    -- Escape formatter string
    context("%s %!tex! %02X", "test", str, 10)
    context.par()

    -- Manually run the same function
    context(lpeg.match(lpeg.patterns.texescape, str))
    context.par()
\stopluacode
\stoptext
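For comparison, the lookup-table idea behind an escaping pattern can be sketched in plain Lua. This is a hypothetical approximation and not ConTeXt's lpeg.patterns.texescape. The chosen character set (# % $ & _) and the backslash escapes it produces are assumptions, and you would extend them for your own input. It should be applied to text that contains single hashes, which for strings coming from a TeX argument means collapsing ## first.

```lua
-- Hypothetical plain-Lua escaper, NOT ConTeXt's lpeg.patterns.texescape:
-- maps a few TeX special characters to backslash escapes. The set of
-- characters handled here is an assumption; extend it as needed.
local escapes = {
  ["#"] = "\\#",
  ["%"] = "\\%",
  ["$"] = "\\$",
  ["&"] = "\\&",
  ["_"] = "\\_",
}

local function tex_escape(s)
  -- Inside the character class, "%%" stands for a literal percent sign;
  -- gsub looks each matched character up in the escapes table.
  return (s:gsub("[#%%$&_]", escapes))
end

print(tex_escape("#ref 50% a_b")) -- \#ref 50\% a\_b
```

Inside ConTeXt the result would be passed on with something like context(tex_escape(str)). Whether every escape in the table typesets as intended depends on the macros defined in your ConTeXt setup.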