There are roughly two ways to patch a command: via \scantokens, and via expansion+redefinition. There's a (not so) brief explanation of both at the end of this answer. When ltcmdhooks can detect the type of command, so that it knows exactly the <parameter text> of the command, it patches by expansion+redefinition, so it has no restriction on the catcode settings in force when the macro was defined. In the case of \appendix, it takes no arguments, so it can be treated as a token list and expanded, then redefined with the added material.
For example, here's a simple sketch of how it works:
\def\appendix{%
  \typeout{This starts the appendix.}}
% \append<cmd>{<material>} expands <cmd> once, then redefines it with
% <material> appended to its old replacement text:
\def\append#1{%
  \expandafter\appendaux\expandafter{#1}#1}
\def\appendaux#1#2#3{%
  % #1 = old replacement text, #2 = the command, #3 = material to add
  \def#2{#1#3}}
\append\appendix{\typeout{I added this.}}
\appendix
However, what I did not anticipate when I wrote that code is the case where the original definition of \appendix contains ## (try this definition in the code above):
\def\appendix{%
\typeout{This starts the appendix. ##BOOM!}}
When \appendix is defined like that, TeX's definition scanner sees the two catcode 6 # tokens (##) and stores them as a single parameter token # in the definition of \appendix; so far so good. However, when you expand the command, TeX also returns that single # token, and then when you try to redefine the command you have:
\def\appendix{%
\typeout{This starts the appendix. #BOOM!}%
\typeout{I added this.}}
which contains an illegal parameter (# followed by B), so the redefinition fails with an Illegal parameter number error.
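Here's a minimal, self-contained sketch of the same failure, independent of ltcmdhooks (the name \x is just for illustration):
\def\x{A##B}% stored body: A, <single parameter token>, B
% Expand \x once and try to reuse the result as a new replacement text:
\expandafter\def\expandafter\x\expandafter{\x \typeout{added}}
% -> ! Illegal parameter number in definition of \x.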
I have changed ltcmdhooks to handle this case (there's a brief explanation below). Meanwhile, you can use \ActivateGenericHook (or \ProvideHook in LaTeX 2021-06-01) to tell ltcmdhooks that you have already patched the command, so it won't try patching, and then do the patching manually using etoolbox:
\documentclass{book}
\usepackage{cleveref}
\usepackage{etoolbox}
\IfFormatAtLeastTF{2021-11-15}%
{\ActivateGenericHook}% LaTeX 2021-11-15 or newer
{\ProvideHook}% LaTeX 2021-06-01
{cmd/appendix/before}
\pretocmd\appendix
{\UseHook{cmd/appendix/before}}
{}{\FAILED}
\AddToHook{cmd/appendix/before}{\label{appendix}}
\begin{document}
Hello world!
\appendix
\end{document}
Why the above works
The interface for ltcmdhooks in \AddToHook is supposed to work as follows:
- If an end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before doesn't exist yet (which implies that the command \name doesn't have that hook "installed"), then the code tries to patch that hook into the command.
- If the end user writes \AddToHook{cmd/name/before}{code}, and the hook cmd/name/before already exists, this (probably) means that the command \name already has that hook, so the code is just added to the hook and the command is left alone.
This means that a package author may want to fine-tune the position of the cmd/name/before hook (for example, \def\name{<some initialization>\UseHook{cmd/name/before}<definition>}). In that case we don't want ltcmdhooks patching the command again (it would be wrong to add the same hook twice), so we tell ltcmdhooks that the hook already exists by saying \ActivateGenericHook{cmd/name/before}, and patching is no longer attempted.
This works for your case because you manually add the hook to the command, and then tell ltcmdhooks that patching is no longer needed. See section 3, "Package Author Interface", of the ltcmdhooks documentation.
So, in essence, you, as the package author, are appropriating the \appendix command by adding the hook yourself (exactly where ltcmdhooks would add it), and then telling ltcmdhooks not to patch it by using \ActivateGenericHook.
If instead of \appendix you were adding hooks to \UniqueCommandFromMyPackage, then you could use \NewHook instead of \ActivateGenericHook (the effect would be identical), because there would be no possibility of a name conflict.
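For example, a package providing its own command could install the hook itself when defining the command (the command name and body below are just illustrative; the hook name follows the usual cmd/.../before convention):
% Hypothetical package code: the hook is placed explicitly in the
% definition, exactly where ltcmdhooks would have patched it in.
\NewDocumentCommand\UniqueCommandFromMyPackage{m}{%
  \typeout{some initialization}%
  \UseHook{cmd/UniqueCommandFromMyPackage/before}%
  \typeout{actual definition using #1}%
}
% Declare the hook so ltcmdhooks never tries to patch the command;
% \NewHook is fine here because no one else can own this hook name.
\NewHook{cmd/UniqueCommandFromMyPackage/before}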
How LaTeX2ε handles this case now
The problem: It turns out that in the described case we're in a dead end. When you write a definition like
\def\foo#1{#1##X}
TeX stores its <replacement text> as a token list containing:
out_param 1, par_token #, letter X
(out_param 1 is #1, to be replaced by the actual argument when the macro is expanded; par_token # is a catcode 6 #; and letter X is a catcode 11 X).
Then, when you expand \foo with #1 (par_token #, character 1), TeX replaces out_param 1 and you have:
par_token #, character 1, par_token #, letter X
which is equivalent to typing #1#X. If you plug that back into a new definition of \foo you'll have:
\def\foo#1{#1#X}
which is obviously wrong (and thus the Illegal parameter number error). And at this point you have no way to tell what was an actual parameter when the macro was defined, and what was a single parameter token.
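To see the dead end concretely, here's a small sketch (\fooA and \fooB are illustrative names): two different definitions whose expansions are the very same token list:
\def\fooA#1{#1##X}%  body: out_param 1, par_token, letter X
\def\fooB#1{##1##X}% body: par_token, other 1, par_token, letter X (argument unused)
% Expanding either one with the two tokens "#1" as the argument produces the
% same token list (par_token, character 1, par_token, letter X), so from the
% expansion alone you cannot recover which # was a parameter reference and
% which was a literal parameter token.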
Half solution: There is one very simple case that can be easily detected and solved (which coincidentally is the one in your question): a macro without parameters. In this case the macro takes no argument, so any loose ## in its definition cannot possibly be confused with a parameter, so we can treat such macros as token lists (in the expl3 sense), do something akin to \tl_put_right:Nn, and the problem is solved.
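Here's a sketch of that idea using e-TeX's \unexpanded directly (akin to what \tl_put_right:Nn does; \mycommand is an illustrative name, and this is not the kernel's actual code):
\def\mycommand{\typeout{This starts the appendix. ##BOOM!}}
% Rebuild \mycommand with material appended. The old body is delivered by
% \unexpanded inside the \edef, so its lone parameter token is stored again
% as-is instead of being re-scanned as part of a <replacement text>:
\edef\mycommand{%
  \unexpanded\expandafter{\mycommand}%    old replacement text
  \unexpanded{\typeout{I added this.}}%   appended material
}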
Another relatively simple case is when the macro has no ## in its definition. In this case we don't have to worry about confusing parameters, so we treat the macro normally (this was the case implemented initially). LaTeX uses a rather simple loop to check whether a macro has a parameter token in its definition (\__hook_if_has_hash:nTF): it looks at every token in the definition and compares it with #.
The other half: When the macro falls into the general case of having both parameters and parameter tokens in its definition (like \foo above), we have to manually re-double every parameter token in the definition, so that it can be re-made. To do that, instead of expanding \foo with #1, LaTeX expands it with \c_@@_hash_tl 1, so \foo{\c_@@_hash_tl 1} becomes a definition like:
\foo#1{\c_@@_hash_tl 1#X}
then we loop through the replacement text of the macro (inside the braces), double every parameter token #, and replace every \c_@@_hash_tl by a single #, which then gives:
\foo#1{#1##X}
and then we can do the definition normally (phew!)
Patching with \scantokens
(this is the wordier description promised at the top of the answer)
Suppose a macro is defined with:
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
To append some code to it via \scantokens, you first do \meaning\mycmd to get a string like:
\long macro:[#1]#2->\typeout {#1//#2}
(with usual \detokenize catcodes: all 12 except spaces, which are catcode 10), then you use a delimited macro to separate the <prefixes>, the <parameter text>, and the <replacement text>, roughly like this:
\def\split#1{\expandafter\splitaux\meaning#1\relax}
\expanded{%
\noexpand\def\noexpand\splitaux#1\detokenize{macro:}#2->#3\relax}{%
\def\prefixes{#1}%
\def\parameter{#2}%
\def\replacement{#3}}
(I'm using \def\prefixes{#1}, etc. for the sake of understandability, but in reality you would inject everything expandably instead; see the definition of \__kernel_prefix_arg_replacement:wN in expl3-code.tex, and \etb@patchcmd in etoolbox.sty if you're feeling brave).
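For instance, using the \split helper above (the expected \show outputs are sketched in the comments):
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
\split\mycmd
\show\prefixes    % > \prefixes=macro:->\long .
\show\parameter   % > \parameter=macro:->[#1]#2.
\show\replacement % > \replacement=macro:->\typeout {#1//#2}.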
At this point you have each part of the definition as a separate string. Now you can either append or prepend some code to \replacement (or replace some part of it, as is done in \patchcmd), or, in rarer cases, change \prefixes or \parameter. To reconstruct the definition you need:
<prefixes>\def\mycmd<parameter text>{<replacement text>}
but the three parts you have are still catcode 12 tokens, which are no good. Here comes the \scantokens part: you rescan those strings back to "normal" tokens:
\expanded{%
\noexpand\scantokens{%
% <prefixes>\def \mycmd<parameter text>{<replacement text>}
\prefixes \def\noexpand\mycmd\parameter {\replacement <added material>}%
}%
}
which, after \expanded does its job, becomes:
\scantokens{%
\long\def\mycmd[#1]#2{\typeout {#1//#2}<added material>}%
}%
then \scantokens does its thing and turns everything into tokens using the current catcode settings, and then the definition is carried out normally.
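As a tiny illustration of that last step (\stringified and \rescanned are made-up names), a detokenized string is turned back into live tokens by \scantokens under whatever catcodes are current:
\edef\stringified{\detokenize{\def\rescanned{hello}}}% the detokenized (string) form of the definition
\expandafter\scantokens\expandafter{\stringified}% rescan and carry out the \def
\show\rescanned % > \rescanned=macro:->hello.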
The advantage of this method is that you can do virtually any manipulation in any part of the definition.
The disadvantages are a few:
- You need to know what catcodes were in force when the definition was first made (when patching, you usually need to verify that a simple \meaning–\scantokens round trip doesn't change the meaning of the macro), otherwise you can't patch safely;
- If the macro was created with some combination of \edef and \detokenize to forcibly make some catcode 12 tokens, you will probably not be able to patch that macro (for example, \splitaux as defined above in this answer cannot ever be patched with \patchcmd because it contains letters (for example m) of both catcodes 11 and 12; a short illustration follows this list);
- If the <parameter text> of the macro contains the characters ->, you won't be able to patch the macro.
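Here's the promised illustration of the second point (\weird is an illustrative name):
% The body of \weird contains "abc" twice: first as catcode 12 characters
% (from \detokenize), then as ordinary catcode 11 letters.
\edef\weird{\detokenize{abc}abc}
% \meaning\weird prints "macro:->abcabc": the string form cannot express the
% catcode difference, so a \meaning/\scantokens round trip under any single
% catcode setting necessarily produces a different macro.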
Patching with expansion+redefinition
This method is much simpler, but it requires prior knowledge of how the macro was defined. This can be done only in a few cases, namely when you know exactly what the <parameter text> of the macro is. The cases known by the kernel are when the macro was defined with \DeclareRobustCommand, with ltcmd (\NewDocumentCommand or \NewExpandableDocumentCommand), with \newcommand with an optional argument, or when the macro takes no argument.
Suppose the same macro from before, but defined with:
\newcommand\mycmd[2][default]{\typeout{#1//#2}}
(it will have an internal macro called \\mycmd, but for the sake of simplicity let's call it \mycmd as well). Then we know for sure that its <parameter text> is [#1]#2. Knowing what arguments the macro expects, we can feed it #1, #2, ... as arguments, so for \mycmd we would do:
\mycmd[#1]{#2}
which would then expand to the <replacement text> of the macro, with the first parameter (#1) replaced by a catcode 6 # followed by a catcode 12 character 1 (the parameter token # followed by the character 1). The patching scheme would be something like:
\expanded{%
\def\noexpand\mycmd[#1]#2{%
\unexpanded\expandafter{\mycmd[#1]{#2}<added material>}%
}%
}
then after the \expanded is done you are left with:
\def\mycmd[#1]#2{\typeout{#1//#2}<added material>}
which is exactly what you had with the \scantokens approach, except that you didn't turn tokens into a string, so catcodes don't matter at all here.
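Putting the pieces together, here's a self-contained version of that sketch (using \def directly, so there is no internal \\mycmd layer, and with a concrete \typeout standing in for <added material>):
\long\def\mycmd[#1]#2{\typeout{#1//#2}}
\expanded{%
  \long\def\noexpand\mycmd[#1]#2{%
    % \mycmd is expanded once here, with "#1" and "#2" fed as its arguments;
    % \unexpanded keeps the result and the added material from expanding further:
    \unexpanded\expandafter{\mycmd[#1]{#2}\typeout{patched!}}%
  }%
}
\mycmd[a]{b}% now prints "a//b" and then "patched!"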
The advantages of this method are roughly the disadvantages of the \scantokens method:
- catcodes don't matter at all;
- you can patch complicated macros (including the \splitaux macro from before) using this method, given you know exactly what its <parameter text> is;
- the <parameter text> of the macro may contain any token your heart desires (as long as you know what token it is); and
- this method doesn't need a sanity check to ensure that the macro can be patched correctly.
The disadvantage is the requirement for the method to work: you need to know exactly what the <parameter text> is.