
There was a question about automatically adding space after periods and commas. An answer to it involved modifying a category code. It is shown below.

\documentclass{article}
\usepackage{expl3}

\makeatletter
\catcode`.13
\def.{\char`.\@ifnextchar'{}{\@ifnextchar.{}{ }}}
\makeatother
\def\Abbr{\catcode`.12}
\begin{document}
...
\end{document}

I have some questions about it. First, why is it written as \catcode`.13? I thought the syntax for \catcode required an = between the character and its new category code.

Second, how does . become a command simply because its category code is 13? There are lots of characters, and none of them can form the name of a command without \ at the beginning.

youthdoo
  • 861
  • Making . active is a sure source for head scratching when weird error messages are thrown. Can you please mention what answer you're referring to? – egreg Nov 08 '22 at 15:23
  • = is almost always optional: \let\abc\xyz is \let\abc=\xyz, and \catcode`\.13 is \catcode`\.=13 – David Carlisle Nov 08 '22 at 15:25
  • @egreg it is here: https://www.zhihu.com/question/512113449 – youthdoo Nov 08 '22 at 15:49
  • These low-level commands are documented in TeXbook/TeX by topic/TeX in a nutshell. If you have time you can find the answer yourself by reading these. – user202729 Nov 08 '22 at 17:42
  • OK, even if the referenced site is in Chinese, the code is readable. But try to use \vspace{1.5pt} in your document. – egreg Nov 08 '22 at 18:42

2 Answers


The code you show inserts space after each dot with two exceptions. If the dot is followed by the ' character or it is followed by another dot then the space is not inserted.

The \catcode TeX primitive assigns a category code (one of 16 values) to a character. The syntax is \catcode <number>=<number> or \catcode <number><number> (i.e. the = is optional). The first number is the numerical representation of the character (ASCII or Unicode); the second number is in the range 0 to 15 and denotes the assigned category code. If you know the numerical representation of the dot (it is 46), then you can say \catcode 46 13 or \catcode 46=13. Instead of the numeric constant you can use ` followed by a character, or ` followed by \ and a character, with the same result: \catcode`.=13 or \catcode`\.=13 or \catcode`.13 or \catcode`\.13.
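As a quick check of the equivalent spellings listed above, here is a small test file (my own sketch, not part of the original answer). Each assignment is made inside a group and the resulting category code is printed with \the\catcode, so each of the first four lines should typeset 13 and the last one 12.

```latex
\documentclass{article}
\begin{document}
% Each assignment is local to its group; no period is typed while it is active
{\catcode 46 13 \the\catcode 46 }

{\catcode 46=13 \the\catcode 46 }

{\catcode`.=13 \the\catcode`\. }

{\catcode`\.13 \the\catcode`\. }

% outside the groups the period is back to category 12 (other)
\the\catcode`\.
\end{document}
```

The form with the backslash is used for the lookups because it still works while the period is active and undefined.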

Once the character has category code 13, it is "active", which means that it behaves like a control sequence. You can use it in \def<control sequence>, \let<control sequence>, \countdef<control sequence> and in many other situations. When it is defined as a macro, the character expands to the macro body when the TeX scanner reaches it, in the common cases.
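To see this with a less dangerous character than the period, the following sketch makes ! active inside a group and defines it with \def, exactly as one would define a control sequence. The choice of ! and the replacement text are only illustrative assumptions.

```latex
\documentclass{article}
\begin{document}
\begingroup
\catcode`\!=\active          % ! is now an active character
\def!{\textbf{bang}}         % defined like a macro, no backslash needed
Here ! expands to its replacement text
\endgroup

Outside the group ! is an ordinary character again
\end{document}
```

Keeping the change inside \begingroup...\endgroup means both the category code and the definition disappear when the group ends.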

Plain TeX assigns category code 13 to ~ and defines it as a macro which expands to \penalty10000 \space, i.e. to a non-breakable space. No other visible characters have category 13 set by plain TeX. Other TeX formats (like LaTeX) do the same: they set category 13 only for ~. You can make more characters active, and that is what the presented code does.

The \@ifnextchar <character>{true}{false} is a LaTeX macro which looks at the next character (but keeps it unchanged in the input queue) using the \futurelet TeX primitive; it runs true if the next character is <character> and false if not. You may now work out what the code \@ifnextchar'{}{\@ifnextchar.{}{ }} does.
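A minimal sketch of \@ifnextchar in isolation may help before reading the nested version. The macro name \peekdemo is hypothetical and exists only for this illustration.

```latex
\documentclass{article}
\makeatletter
% \peekdemo (hypothetical name) reports whether a * follows it;
% the peeked character stays in the input and is typeset afterwards
\newcommand{\peekdemo}{\@ifnextchar*{[star next]}{[no star]}}
\makeatother
\begin{document}
\peekdemo* and \peekdemo x
\end{document}
```

Because the peeked character is not consumed, the * and the x are still typeset after the bracketed text.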

Edit: There is a new question in the comments: why doesn't the code work in new versions of LaTeX if \usepackage{expl3} isn't used? First, your macro has nothing in common with expl3, but new versions of LaTeX include the expl3 code in the LaTeX kernel, and the kernel needs to read a file l3backend-?.def, chosen depending on various cases. This file is read immediately if you use \usepackage{expl3}; otherwise new LaTeX kernels read it at \begin{document}.

Reading external macro files assumes that only the standard catcode settings are in effect. The dot cannot be active at this stage, and this causes the error. (Note: OpTeX doesn't have this problem if you use \load for loading external macros, because it temporarily sets the standard catcode vector while reading such files.)

Solution: you can put your macro with the active dot after \begin{document}. Or set \catcode`.=12 at the end of your macro code and set \catcode`.=13 again after \begin{document} or by using the \AtBeginDocument macro.
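The second variant could look roughly like the sketch below. It assumes a recent LaTeX kernel in which the backend file is read before the \AtBeginDocument code runs; the body text is only a placeholder.

```latex
\documentclass{article}
\makeatletter
\catcode`\.=13
% the period is active while this definition is read
\def.{\char`\.\@ifnextchar'{}{\@ifnextchar.{}{ }}}
\catcode`\.=12 % back to "other" while the rest of the preamble is read
\makeatother
% reactivate the period only when the document starts
\AtBeginDocument{\catcode`\.=13 }
\begin{document}
One.Two.Three
\end{document}
```

The definition made while the period was active survives the switch back to category 12, so only the category code has to be restored at the start of the document.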

wipet
  • 74,238

The = in assignments is optional. Actually, it's slightly more efficient to use =, but the advantage is very small. Benchmarking \catcode`?13 against \catcode`?=13 yields

No =
1.02e-7 seconds (0.472 ops)
With =
9.25e-8 seconds (0.427 ops)

I used l3benchmark, which repeats the code several times in order to give as accurate an estimate as possible. By the way, it would be even more efficient to use \active instead of 13:

No =
7.94e-8 seconds (0.378 ops)
With =
6.93e-8 seconds (0.327 ops)
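The exact test file is not shown in the answer. A possible reconstruction, assuming the \benchmark:n function of l3benchmark and an engine that provides the timing primitives, is sketched below; the trailing ~ (a space in expl3 syntax) that terminates the number is my own addition and may not match what was actually measured.

```latex
\documentclass{article}
\usepackage{l3benchmark}
\ExplSyntaxOn
\group_begin: % keep the catcode change of ? local
  % without the optional equals sign
  \benchmark:n { \catcode`?13~ }
  % with the equals sign
  \benchmark:n { \catcode`?=13~ }
  % with \active instead of the explicit number
  \benchmark:n { \catcode`?\active }
  \benchmark:n { \catcode`?=\active }
\group_end:
\ExplSyntaxOff
\begin{document}
The timings are reported during the run, not in the typeset output
\end{document}
```

Absolute timings depend on the machine and the engine, so only the relative order of the results is meaningful.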

Once you have set the catcode of a character to 13 (active character), it behaves like a control sequence, so you can use it as the token following \def (or any other control sequence defining command) to assign it a meaning.

The feature is exploited by babel to introduce shorthands. For instance, when the current babel language is German, the combination "| denotes a morpheme boundary, which is important in order to break unwanted ligatures, so one can type Auf"|lage, which makes it clear that the word is composed of “auf” and “lage”. This is achieved by giving " catcode 13 and assigning it a suitable definition based on the following token: "- inserts an additional line break point, allowing for hyphenation also past it.
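A minimal document showing the babel shorthand in action might look as follows; the ngerman option and the T1 encoding are my choices for the sketch.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
% the second word uses the shorthand that breaks the fl ligature
Auflage versus Auf"|lage
\end{document}
```

Comparing the two words in the output shows the fl ligature in the first and separate letters in the second.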

Sounds familiar? Yes, it's a situation very similar to yours, with a big difference: the character " is very seldom used in TeX, but the period is part of TeX's syntax in many places.

Once you do \catcode`.=13, you will no longer be able to specify

\vspace{1.5pt}

Try it. You'd need to say \vspace{1\string.5pt}. Or you might use a comma, instead of a period. Are you keen to do it?
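The workaround with \string can be tried in a small file like the one below. The definition of the active period here is a simplified stand-in (a period followed by a space), not the definition from the question, and the change is confined to a group.

```latex
\documentclass{article}
\begin{document}
First paragraph

\begingroup
\catcode`\.=\active
% simplified stand-in for the definition under discussion: period plus a space
\def.{\string.\ }
\vspace{1\string.5pt}% works: \string. yields a period of category 12
% \vspace{1.5pt} would fail here, because the active period expands inside the length
Second paragraph.With a space
\endgroup
\end{document}
```

Uncommenting the failing line is an easy way to see the kind of error message that an active period produces inside a length.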

OK, let's assume you are. The idea is to define the active . to look for the next token and decide what to do.

The standard way to do this in LaTeX is to use \@ifnextchar, to which you supply the token to look for and the code to be executed in case of “success” or “failure”.

The given definition prints a period (instead of \char`. I'd do \string.) and then executes

\@ifnextchar'{T}{F}

(here T and F stand for the actual code). If the character following the period in the typescript is ', then T is executed, otherwise F. Since you want to do nothing, in this case, T is actually empty. In case ' doesn't follow, F is executed and the actual code is

\@ifnextchar.{}{ }

If a period follows, do nothing, otherwise a space is inserted. Why this check? I'm not sure, because typing two consecutive periods is not really usual and three periods is incorrect, because \ldots should be used.

Anyway, the code does its job. But, still assuming you're willing to accept several problems like the \vspace{1.5pt} described above, this is not the best programming.

\documentclass{article}

\makeatletter
% define the desired action for the active period
\begingroup\lccode`~=`.\lowercase{\endgroup
  \newcommand{\active@period}{.\@ifnextchar'{}{\@ifnextchar~{}{ }}}%
}
\AtBeginDocument{%
  % assign the meaning of \active@period to the active period
  \begingroup\lccode`~=`.
  \lowercase{\endgroup\let~}\active@period
  % change the catcode
  \catcode`.=\active
}
\makeatother
% to locally set the category code of . to 12
\newcommand{\Abbr}{\catcode`.=12 }% don't forget the space!

\begin{document}
...
\end{document}

This way the setting of the category code is delayed until the document begins, when configuration files have already been read in.

The space after 12 is needed. Try

{\Abbr 1.2.3}

with the original code to see there's a problem. Indeed, you'd get

\catcode`.121.2.3

and assigning 121 as a catcode is illegal. With the space it would be

\catcode`.12 1.2.3

and the space after 12 is ignored per TeX rules.

With a more modern approach:

\documentclass{article}

\ExplSyntaxOn

% to locally set the category code of . to 12
\NewDocumentCommand{\Abbr}{}{\char_set_catcode_other:N .}

% define the desired action for the active period
\cs_new_protected:Nn \youthdoc_active_period:
 {
  .
  \peek_charcode:NF .
   {% a period doesn't follow
    \peek_charcode:NF '
     {% a quote doesn't follow, insert a space
      \c_space_tl
     }
   }
 }
% at begin document set the period to be active
\AtBeginDocument
 {
  \char_set_active_eq:NN . \youthdoc_active_period:
  \char_set_catcode_active:N .
 }

\ExplSyntaxOff

\begin{document}

A period.With a space.But ...has a space only at the end

`A quote.'

\end{document}

No contortions to have the period with different catcodes in the definitions.

egreg
  • 1,121,712
  • The \@ifnextchar. will not work when the current catcode of the dot is 12 (during the definition of your \active@period). Moreover, the user can write \vskip 1,5pt instead of \vskip 1.5pt. – wipet Nov 09 '22 at 21:20
  • @wipet Yeah, right. Fixed. – egreg Nov 09 '22 at 21:35