Background

I'm looking to perform a text substitution on arbitrary strings. In the code snippet, the value #a.m. #p.m. comes from a document. More specifically, the input is an XML document generated from Markdown. The XML document resembles:

<p>See <a class="href" data-type="tbl" href="#ref"/> for details.</p>

The #ref is proving to be problematic.

System

ConTeXt version: 2023.09.26 18:19

Code

Minimal example to show the problem:

\startluacode
userdata = userdata or {}

userdata.TextReplacements = {}

local function TextReplacement( text )
  text = string.gsub( text, "#", "\\#" )
  local replaced = lpeg.replacer( userdata.TextReplacements ):match( text )
  context( replaced )
end

interfaces.implement {
  name = "TextReplacement",
  arguments = { "string" },
  public = true,
  actions = TextReplacement,
}
\stopluacode

\startluacode
userdata = userdata or {}

userdata.TextReplacements = {
  [1] = { "a.m.", "\\cap{am}" },
  [2] = { "p.m.", "\\cap{pm}" },
}
\stopluacode

\starttext
\TextReplacement{#a.m. #p.m.}
\stoptext

Details

An additional detail is that the #ref value is being read into ConTeXt from the anchor's link and looked up as follows:

\startxmlsetups xml:xhtml
  \xmlsetsetup{\xmldocument}{a[@class='href']}{xml:anchorhref}
\stopxmlsetups

\startxmlsetups xml:anchorhref
  Xref = \xmlatt{#1}{data-type}-\xmlatt{#1}{href}
\stopxmlsetups

The anchorhref is executed from inside an xml:p setup, shown here:

\startxmlsetups xml:p
  \xmldoifnotselfempty{#1}{%
    \ignorespaces
    \expandafter\TextReplacement{\xmlflush{#1}}
    \removeunwantedspaces
  }
  \par
\stopxmlsetups

That call to \TextReplacement doesn't work because of the # symbol.

Problem

The wiki suggests using lpeg.replacer( ... ):match( ... ), but that produces a compile error.

By adding a call to string.gsub, the compile error goes away, but the output produces double hash symbols:

##AM ##PM

Question

How do you escape the hash symbol and any others that may cause a failure with the string replacement such that no double-hashes are output?

Dave Jarvis

2 Answers

I happened to be in contact with Hans while reading this question. He first mentioned

\starttext
   \catcode`#=11
   \TextReplacement{#a.m. #p.m.}
\stoptext

but that might show problems when loading modules and so on. Then he explained that in lmtx we can use

text = string.gsub( text, "#", "#H" )

The #H is a hash escape.

\starttext
   #Ha.m. #Hp.m.
\stoptext

Gives

ampm

You can read about that and many more nice new things in lowlevel-macros.pdf in your distribution.

He also mentioned that compiling the replacer on every call is inefficient (you probably aren't running this on a web service, so it might not matter), so you had better do something like

local replacer = lpeg.replacer( userdata.TextReplacements )

local function TextReplacement( text )
  text = string.gsub( text, "#", "#H" )
  context(replacer:match(text))
end
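One thing to watch when hoisting the replacer: in the question, userdata.TextReplacements is filled in by a second \startluacode block, after the function is defined, so a replacer built at definition time would be compiled from the still-empty table. A minimal sketch of one way around that, building the replacer on first use, follows. It is plain Lua so it can run outside ConTeXt; `make_replacer` is a hypothetical stand-in for lpeg.replacer (implemented here with string.gsub), and the local `userdata` table and the `builds` counter exist only for illustration.

```lua
-- Hypothetical stand-in for ConTeXt's lpeg.replacer, so the sketch runs in
-- plain Lua. It returns a function that applies each { from, to } pair as a
-- plain-text substitution.
local builds = 0  -- counts how often a replacer is compiled

local function make_replacer(list)
  builds = builds + 1
  return function(text)
    for _, pair in ipairs(list) do
      local from = pair[1]:gsub("%p", "%%%0")  -- escape pattern magic characters
      local to   = pair[2]:gsub("%%", "%%%%")  -- escape % in the replacement
      text = text:gsub(from, to)
    end
    return text
  end
end

local userdata = { TextReplacements = {} }

local replacer  -- compiled lazily, on the first call

local function TextReplacement(text)
  replacer = replacer or make_replacer(userdata.TextReplacements)
  return replacer(text)
end

-- The table is filled in later, as in the question's second Lua block.
userdata.TextReplacements = {
  { "a.m.", "\\cap{am}" },
  { "p.m.", "\\cap{pm}" },
}

print(TextReplacement("9 a.m. to 5 p.m."))  -- 9 \cap{am} to 5 \cap{pm}
print(TextReplacement("a.m."))              -- \cap{am}
print(builds)                               -- 1
```

The trade-off is that replacements added after the first call are not picked up; if the table can change mid-run, the cached replacer would have to be reset by hand.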

I don't want to gain any reputation from this, so I have made it a community wiki.

Dave Jarvis
mickep
  • "compiling the replacer each time is not efficient" -- moving the replacer shaves between 0.1 and 0.5 seconds off the ~30 second build. – Dave Jarvis Dec 01 '23 at 19:28
  • Indeed, but I guess it all adds up. May I ask how long (how many pages) your document is? – mickep Dec 01 '23 at 20:38
  • 262 pages at present. The bottlenecks are probably the larger images and the code that draws hexagonal grids of random complexity and size, reminiscent of neural networks. My plan is to generate each book per person, so even 0.5 seconds helps. – Dave Jarvis Dec 02 '23 at 01:15

such that no double-substitutions are performed?

There's not really a double-substitution occurring; the hash symbol just appears twice as far as the Lua processor is concerned. The following document

\startluacode
    interfaces.implement {
        name      = "test",
        arguments = { "string" },
        public    = true,
        actions   = function(str)
            print("START OUTPUT")
            print(str)
            print("STOP OUTPUT")
        end
    }
\stopluacode

\test{#a}

gives this as its output:

START OUTPUT
##a
STOP OUTPUT

so instead of

text:gsub("#", ...)

you'll need to write

text:gsub("##", ...)
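As a minimal sketch of that idea in plain Lua (runnable outside ConTeXt), the helper below matches the doubled hash and emits one escape per pair. The name `escape_hashes` and the choice of `\letterhash{}` as the replacement are assumptions for illustration; any of the escapes listed further down could be substituted.

```lua
-- Hypothetical helper: the string arrives from TeX with every hash doubled,
-- so match "##" and emit a single escaped hash for each pair.
local function escape_hashes(text)
  -- The outer parentheses drop gsub's second return value (the match count).
  return (text:gsub("##", "\\letterhash{}"))
end

print(escape_hashes("##a.m. ##p.m."))  -- \letterhash{}a.m. \letterhash{}p.m.
```

Matching the pair rather than a single `#` is what keeps the output from containing two escapes per hash.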

How do you escape the hash symbol and any others

Lots of options:

\starttext
\startluacode
    str = "#"

    -- Naïve solution, does not work
    -- context(str)
    -- context.par()

    -- Output in verbatim
    context.verbatim(str)
    context.par()

    -- Backslash escape
    context(str:gsub(str, "\\#"))
    context.par()

    -- \char escape
    context(str:gsub(str, "\\char`\\#"))
    context.par()

    -- ConTeXt escape command
    context(str:gsub(str, "\\letterhash"))
    context.par()

    -- Escape formatter string
    context("%s %!tex! %02X", "test", str, 10)
    context.par()

    -- Manually run the same function
    context(lpeg.match(lpeg.patterns.texescape, str))
    context.par()
\stopluacode
\stoptext

sample output

Max Chernoff