There are roughly two ways to patch a command: via \scantokens, and via expansion+redefinition. There's a (not so) brief explanation of both at the end of this answer. When ltcmdhooks can detect the type of command, so that it knows exactly the <parameter text> of the command, it patches by expansion+redefinition, so it has no restriction on the catcode settings in force when the macro was defined. In the case of \appendix, it takes no arguments, so it can be treated as a token list and expanded, then redefined with the added material.
For example, here's a simple sketch of how it works:
\def\appendix{%
  \typeout{This starts the appendix.}}
% \append<cmd>{<material>} expands <cmd> once, then redefines it with
% <material> appended to its old replacement text:
\def\append#1{%
  \expandafter\appendaux\expandafter{#1}#1}
\def\appendaux#1#2#3{%
  % #1 = old replacement text, #2 = the command, #3 = material to add
  \def#2{#1#3}}
\append\appendix{\typeout{I added this.}}
\appendix
However, what I did not anticipate when I wrote that code is the case where the original definition of \appendix contains ## (try this definition in the code above):
\def\appendix{%
\typeout{This starts the appendix. ##BOOM!}}
When \appendix is defined like that, TeX's definition scanner sees the two catcode 6 # tokens (##) and stores them as a single parameter token # in the definition of \appendix; so far so good. However, when you expand the command, TeX also returns that single # token, and then when you try to redefine the command you have:
\def\appendix{%
\typeout{This starts the appendix. #BOOM!}%
\typeout{I added this.}}
which contains an illegal parameter (# followed by B), so the redefinition fails with an Illegal parameter number error.
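Here's a minimal, self-contained sketch of the same failure, independent of ltcmdhooks (the name \x is just for illustration):
\def\x{A##B}% stored body: A, <single parameter token>, B
% Expand \x once and try to reuse the result as a new replacement text:
\expandafter\def\expandafter\x\expandafter{\x \typeout{added}}
% -> ! Illegal parameter number in definition of \x.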
I have changed ltcmdhooks to handle this case (there's a brief explanation below). Meanwhile, you can use \ActivateGenericHook (or \ProvideHook in LaTeX 2021-06-01) to tell ltcmdhooks that you have already patched the command, so it won't try patching, and then do the patching manually using etoolbox:
\documentclass{book}
\usepackage{cleveref}
\usepackage{etoolbox}
\IfFormatAtLeastTF{2021-11-15}%
{\ActivateGenericHook}% LaTeX 2021-11-15 or newer
{\ProvideHook}% LaTeX 2021-06-01
{cmd/appendix/before}
\pretocmd\appendix
{\UseHook{cmd/appendix/before}}
{}{\FAILED}
\AddToHook{cmd/appendix/before}{\label{appendix}}
\begin{document}
Hello world!
\appendix
\end{document}
Why the above works
The interface for ltcmdhooks in \AddToHook is supposed to work as follows:
- If an end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before doesn't exist yet (which implies that the command \name doesn't have that hook "installed"), then the code tries to patch that hook into the command.
- If the end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before already exists, this (probably) means that the command \name already has that hook, so the code is just added to the hook and the command is left alone.
This means that a package author may want to fine-tune the position of the cmd/name/before hook (for example, \def\name{<some initialization>\UseHook{cmd/name/before}<definition>}). In that case we don't want ltcmdhooks patching the command again (it would be wrong to add the same hook twice), so we tell ltcmdhooks that the hook already exists by saying \ActivateGenericHook{cmd/name/before}, and patching is no longer attempted.
This works for your case because you manually add the hook to the command, and then tell ltcmdhooks that patching is no longer needed. See section 3, "Package Author Interface", of the ltcmdhooks documentation.
So, in essence, you, as the package author, are appropriating the \appendix command by adding the hook yourself (exactly where ltcmdhooks would add it), and then telling ltcmdhooks not to patch it by using \ActivateGenericHook.
If instead of \appendix you were adding hooks to \UniqueCommandFromMyPackage, then you could use \NewHook instead of \ActivateGenericHook (the effect would be identical), because there would be no possibility of a name conflict.
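For example, a package providing its own command could install the hook itself when defining the command (the command name and body below are just illustrative; the hook name follows the usual cmd/.../before convention):
% Hypothetical package code: the hook is placed explicitly in the
% definition, exactly where ltcmdhooks would have patched it in.
\NewDocumentCommand\UniqueCommandFromMyPackage{m}{%
  \typeout{some initialization}%
  \UseHook{cmd/UniqueCommandFromMyPackage/before}%
  \typeout{actual definition using #1}%
}
% Declare the hook so ltcmdhooks never tries to patch the command;
% \NewHook is fine here because no one else can own this hook name.
\NewHook{cmd/UniqueCommandFromMyPackage/before}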
How LaTeX2ε handles this case now
The problem: It turns out that in the described case we're in a dead end. When you write a definition like
\def\foo#1{#1##X}
TeX stores its <replacement text> as a token list containing:
out_param 1, par_token #, letter X
(out_param 1 is #1, to be replaced by the actual argument when the macro is expanded; par_token # is a catcode 6 #; and letter X is a catcode 11 X).
Then, when you expand \foo with #1 (par_token #, character 1), TeX replaces out_param 1 and you have:
par_token #, character 1, par_token #, letter X
which is equivalent to typing #1#X. If you plug that back into a new definition of \foo you'll have:
\def\foo#1{#1#X}
which is obviously wrong (and thus the Illegal parameter number error). And at this point you have no way to tell what was an actual parameter when the macro was defined, and what was a single parameter token.
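To see the dead end concretely, here's a small sketch (\fooA and \fooB are illustrative names): two different definitions whose expansions are the very same token list:
\def\fooA#1{#1##X}%  body: out_param 1, par_token, letter X
\def\fooB#1{##1##X}% body: par_token, other 1, par_token, letter X (argument unused)
% Expanding either one with the two tokens "#1" as the argument produces the
% same token list (par_token, character 1, par_token, letter X), so from the
% expansion alone you cannot recover which # was a parameter reference and
% which was a literal parameter token.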
Half solution: There is one very simple case that can be easily detected and solved (which coincidentally is the one in your question): a macro without parameters. In this case the macro takes no argument, so any loose ## in its definition cannot possibly be confused with a parameter, so we can treat such macros as token lists (in the expl3 sense), do something akin to \tl_put_right:Nn, and the problem is solved.
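Here's a sketch of that idea using e-TeX's \unexpanded directly (akin to what \tl_put_right:Nn does; \mycommand is an illustrative name, and this is not the kernel's actual code):
\def\mycommand{\typeout{This starts the appendix. ##BOOM!}}
% Rebuild \mycommand with material appended. The old body is delivered by
% \unexpanded inside the \edef, so its lone parameter token is stored again
% as-is instead of being re-scanned as part of a <replacement text>:
\edef\mycommand{%
  \unexpanded\expandafter{\mycommand}%    old replacement text
  \unexpanded{\typeout{I added this.}}%   appended material
}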
Another relatively simple case is when the macro has no ## in its definition. In this case we don't have to worry about confusing parameters, so we treat the macro normally (this was the case implemented initially). LaTeX uses a rather simple loop to check whether a macro has a parameter token in its definition (\__hook_if_has_hash:nTF): it looks at every token in the definition and compares it with #.
The other half: When the macro falls into the general case of having both parameters and parameter tokens in its definition (like \foo above), we have to manually re-double every parameter token in the definition, so that it can be re-made. To do that, instead of expanding \foo with #1, LaTeX expands it with \c_@@_hash_tl 1, so \foo{\c_@@_hash_tl 1} becomes a definition like:
\foo#1{\c_@@_hash_tl 1#X}
then we loop through the replacement text of the macro (inside the braces), double every parameter token #, and replace every \c_@@_hash_tl by a single #, which then gives:
\foo#1{#1##X}
and then we can do the definition normally (phew!)
Patching with \scantokens
(this is the wordier description promised at the top of the answer)
Suppose a macro is defined with:
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
To append some code to it via \scantokens, you first do \meaning\mycmd to get a string like:
\long macro:[#1]#2->\typeout {#1//#2}
(with usual \detokenize catcodes: all 12 except spaces, which are catcode 10), then you use a delimited macro to separate the <prefixes>, the <parameter text>, and the <replacement text>, roughly like this:
\def\split#1{\expandafter\splitaux\meaning#1\relax}
\expanded{%
\noexpand\def\noexpand\splitaux#1\detokenize{macro:}#2->#3\relax}{%
\def\prefixes{#1}%
\def\parameter{#2}%
\def\replacement{#3}}
(I'm using \def\prefixes{#1}, etc. for the sake of understandability, but in reality you would inject everything expandably instead; see the definition of \__kernel_prefix_arg_replacement:wN in expl3-code.tex, and \etb@patchcmd in etoolbox.sty if you're feeling brave).
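For instance, using the \split helper above (the expected \show outputs are sketched in the comments):
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
\split\mycmd
\show\prefixes    % > \prefixes=macro:->\long .
\show\parameter   % > \parameter=macro:->[#1]#2.
\show\replacement % > \replacement=macro:->\typeout {#1//#2}.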
At this point you have each part of the definition as a separate string. Now you can either append or prepend some code to \replacement (or replace some part of it, as is done in \patchcmd), or, in rarer cases, change \prefixes or \parameter. To reconstruct the definition you need:
<prefixes>\def\mycmd<parameter text>{<replacement text>}
but the three parts you have are still catcode 12 tokens, which are no good. Here comes the \scantokens part: you rescan those strings back to "normal" tokens:
\expanded{%
\noexpand\scantokens{%
% <prefixes>\def \mycmd<parameter text>{<replacement text>}
\prefixes \def\noexpand\mycmd\parameter {\replacement <added material>}%
}%
}
which, after \expanded does its job, becomes:
\scantokens{%
\long\def\mycmd[#1]#2{\typeout {#1//#2}<added material>}%
}%
then \scantokens does its thing and turns everything into tokens using the current catcode settings, and then the definition is carried out normally.
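As a tiny illustration of that last step (\stringified and \rescanned are made-up names), a detokenized string is turned back into live tokens by \scantokens under whatever catcodes are current:
\edef\stringified{\detokenize{\def\rescanned{hello}}}% the detokenized (string) form of the definition
\expandafter\scantokens\expandafter{\stringified}% rescan and carry out the \def
\show\rescanned % > \rescanned=macro:->hello.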
The advantage of this method is that you can do virtually any manipulation in any part of the definition.
The disadvantages are a few:
- You need to know what catcodes were in force when the definition was first made (when patching, you usually need to verify that a simple \meaning–\scantokens round trip doesn't change the meaning of the macro), otherwise you can't patch safely;
- If the macro was created with some combination of \edef and \detokenize to forcibly make some catcode 12 tokens, you will probably not be able to patch that macro (for example, \splitaux as defined above in this answer cannot ever be patched with \patchcmd because it contains letters (for example m) of both catcodes 11 and 12; a short illustration follows this list);
- If the <parameter text> of the macro contains the characters ->, you won't be able to patch the macro.
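Here's the promised illustration of the second point (\weird is an illustrative name):
% The body of \weird contains "abc" twice: first as catcode 12 characters
% (from \detokenize), then as ordinary catcode 11 letters.
\edef\weird{\detokenize{abc}abc}
% \meaning\weird prints "macro:->abcabc": the string form cannot express the
% catcode difference, so a \meaning/\scantokens round trip under any single
% catcode setting necessarily produces a different macro.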
Patching with expansion+redefinition
This method is much simpler, but it requires prior knowledge of how the macro was defined. This can be done only in a few cases, namely when you know exactly what the <parameter text> of the macro is. The cases known by the kernel are when the macro was defined with \DeclareRobustCommand, with ltcmd (\NewDocumentCommand or \NewExpandableDocumentCommand), with \newcommand with an optional argument, or when the macro takes no argument.
Suppose the same macro from before, but defined with:
\newcommand\mycmd[2][default]{\typeout{#1//#2}}
(it will have an internal macro called \\mycmd, but for the sake of simplicity let's call it \mycmd as well). Then we know for sure that its <parameter text> is [#1]#2. Knowing what arguments the macro expects, we can feed it #1, #2, ... as arguments, so for \mycmd we would do:
\mycmd[#1]{#2}
which would then expand to the <replacement text> of the macro, with the first parameter (#1) replaced by a catcode 6 # followed by a catcode 12 character 1 (the parameter token # followed by the character 1). The patching scheme would be something like:
\expanded{%
\def\noexpand\mycmd[#1]#2{%
\unexpanded\expandafter{\mycmd[#1]{#2}<added material>}%
}%
}
then after the \expanded is done you are left with:
\def\mycmd[#1]#2{\typeout{#1//#2}<added material>}
which is exactly what you had with the \scantokens approach, except that you didn't turn tokens into a string, so catcodes don't matter at all here.
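Putting the pieces together, here's a self-contained version of that sketch (using \def directly, so there is no internal \\mycmd layer, and with a concrete \typeout standing in for <added material>):
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
\expanded{%
  \long\def\noexpand\mycmd[#1]#2{%
    % \mycmd is expanded once here, with "#1" and "#2" fed as its arguments;
    % \unexpanded keeps the result and the added material from expanding further:
    \unexpanded\expandafter{\mycmd[#1]{#2}\typeout{patched!}}%
  }%
}
\mycmd[a]{b}% now prints "a//b" and then "patched!"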
The advantages of this method are roughly the disadvantages of the \scantokens method:
- catcodes don't matter at all;
- you can patch complicated macros (including the \splitaux macro from before) using this method, given you know exactly what its <parameter text> is;
- the <parameter text> of the macro may contain any token your heart desires (as long as you know what token it is); and
- this method doesn't need a sanity check to ensure that the macro can be patched correctly.
The disadvantage is the requirement for the method to work: you need to know exactly what the <parameter text> is.