
With the following MWE

\documentclass{article}
\usepackage{verbatim}
\usepackage{tabu}

\makeatletter
\renewcommand\verbatim@processline
{%
  \begin{tabu}to\textwidth{|l|X[-1,l]|}
    foo&\the\verbatim@line
  \end{tabu}
}
\makeatother

\begin{document}

\verbatiminput{test.tex}%

\end{document}

I get some error messages:

! Missing number, treated as zero.
<to be read again> 
                   \NC@list 
l.16 \verbatiminput{test.tex}
                             %^^M
! Incompatible glue units.
<to be read again> 
                   \NC@list 
l.16 \verbatiminput{test.tex}
                             %^^M

etc. and an undesired output:

screenshot

Is this a known problem? Is there an easy fix or should I report to the package maintainer?

Edit

The answers showed that the problem is caused by applying \scantokens when catcodes have been redefined by \verbatiminput. As this information is potentially useful to other package authors, I'd like to draw attention to it.

2 Answers


UPDATE 2019-01-14

An equivalent patch has been applied in tabu 2.9, which has been submitted to CTAN.


EDIT: After discussion in comments, it turns out that I had underestimated the problem, and that David's answer was closer to being the correct answer than mine.

Three catcode régimes are involved here. In chronological order, those are:

  1. catcodes in force when the code of the tabu package is read (tokenized);
  2. catcodes in force when the preamble of the tabu environment is tokenized;
  3. catcodes in force when the tabu is performed (here, in the verbatim environment).

The tabu package (v2.8) assumes that catcode régimes 2 and 3 are identical. (It also misuses \scantokens, which should only ever be used in combination with \everyeof; see below for the appropriate code.) Specifically, it tries to parse the preamble (which has the catcodes of régime 2) using macros delimited by a | with the catcode of régime 3. When used as in the question, the tabu preamble is saved early on (with normal catcodes), and the tabu is performed when verbatim catcodes are in force. In that case, catcode régime 2 actually coincides with catcode régime 1, hence David's suggestion of disabling \scantokens is correct, since tabu then parses the preamble with a macro delimited by a régime 1 |.
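To make the point about régimes concrete, here is a minimal sketch (not taken from tabu; the macro names \otherbar and \activebar are made up for illustration) showing that a | keeps the catcode it had when it was tokenized, so that two bars saved under different régimes do not match each other:

```latex
\documentclass{article}
\begin{document}

% A | tokenized under the normal regime has catcode 12 ("other").
\def\otherbar{|}

\begingroup
  \catcode`\|=\active % hypothetical later regime, e.g. a short-verb |
  \gdef\activebar{|}% this | is tokenized as an active character
\endgroup

% \ifx compares the two replacement texts token by token, catcodes
% included, so the two bars are not considered equal.
The two bars are \ifx\otherbar\activebar equal\else different\fi.

\end{document}
```

This should typeset "different". A macro delimited by one kind of | will likewise never find the other kind in its argument, which is the sort of mismatch described above.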

In general, however, both solutions may fail if the three catcode régimes are distinct, which happens for instance if | is declared as a shorthand character for verbatim. In that case, the simplest approach is to use David's suggestion while making sure that the tabu preamble is tokenized with the category codes in place when the tabu package code is read, hence normal category codes. For example, the code below fails if the \DeleteShortVerb (and subsequent \MakeShortVerb) lines are removed, because tabu then does not recognize the active | in the preamble.

\documentclass{article}
\usepackage{verbatim}
\usepackage{tabu}
\usepackage{shortvrb}
\MakeShortVerb{\|}

\begin{document}

We first input the file \jobname.tex with |\verbatiminput{\jobname.tex}|:

\verbatiminput{\jobname.tex}%

Then redefine |\verbatim@processline|
%
\makeatletter
\DeleteShortVerb{\|}
\renewcommand\verbatim@processline
{{\let\scantokens\@firstofone
  \begin{tabu}to\textwidth{|[5pt]l|X[-1,l]|}%
    foo&\the\verbatim@line%
  \end{tabu}%
}\par}
\MakeShortVerb{\|}
\makeatother
%
and input the file again with the same command:

\verbatiminput{\jobname.tex}%

\end{document}

The fully correct fix would be to completely change the way a tabu preamble is parsed, replacing the current approach (which comes from LaTeX2e's * through array's \newcolumntype) by an approach which reads characters in the preamble from left to right, ignores their catcode, checks if they are a "primitive" column type or should be expanded to something else, checks for arguments for those column types, and when it is done, goes to the next token in the preamble.


The eTeX primitive \scantokens is very tricky to use properly, and tabu misuses it (and in many places). This is clearly a bug of tabu, and is fixable.

Rather than

\scantokens{\def\:{|}} % bad

which is risky because \def\: is also rescanned (and braces too), it is better to do

\everyeof{\noexpand}
\edef\:{\expandafter\noexpand\scantokens{|}}

namely, put only the part that needs to be rescanned in the brace group. The \edef ensures that \scantokens is expanded, and setting \everyeof to \noexpand prevents the end-of-file marker at the end of \scantokens from wreaking havoc. The additional \expandafter\noexpand construction is only needed to support the case where | is currently active. The case where | is a macro parameter character, or a begin-group or end-group token, would break that code, but that is probably unavoidable. Of course, to use \scantokens properly, one also needs to take care of the \endlinechar (which tabu does), and the \newlinechar (in case that is set to |), hence the correct fix for your situation is

\renewcommand{\tabu@textbar}[1]%
  {%
    \begingroup
      \newlinechar \m@ne % I'm just paranoid.
      \endlinechar \m@ne
      \everyeof{\noexpand}%
      \edef\:{\expandafter\noexpand\scantokens{|}}%
      \expandafter
    \endgroup
    \expandafter #1%
    \:%
  }

Now, in my solution I make use of the fact that tabu's author only wants to rescan a single character here. What should he do when rescanning a full token list? Well, this is more tricky, again because TeX inserts a marker at the end of every file (including the \scantokens file), which behaves as an \outer "thing", preventing for instance a macro appearing in one file from having its argument in a different file. The answer can be found in the implementation of \tl_set_rescan:Nnn in LaTeX3, or in one of Heiko Oberdiek's packages (dunno which one, reference welcome). Build a marker that cannot appear when rescanning (e.g., two @ with different catcodes), and set that as the end-of-file marker. Then define a macro with an argument delimited by that marker, to collect the rescanned token list. For instance,

\def\tabu@tmp#1%
  {%
    \long\def\tabu@gdef@rescan@##1#1%
      {\expandafter{##1}}%
    \long\def\tabu@gdef@rescan##1##2%
      {%
        \begingroup
          \newlinechar\m@ne
          \endlinechar\m@ne
          \everyeof{#1\noexpand}%
          \xdef##1%
            {%
              \unexpanded
                \expandafter\tabu@gdef@rescan@
                \expandafter\empty
                \scantokens{##2}%
            }%
        \endgroup
      }%
  }
\expandafter\tabu@tmp\expandafter{\string @@}
  • @Stephane: do you want to report, or shall I do it? If I recall correctly and don't confuse people, the author of tabu is easily reachable through comp.text.tex, and he usually has no qualms bashing other package writers for rather innocuous mistakes. – Bruno Le Floch Dec 16 '12 at 01:00
  • Not sure why I didn't use \newcommand for that last code snippet. This is not my best style :(, I'm now too used to LaTeX3 syntax :). – Bruno Le Floch Dec 16 '12 at 01:01
  • I sent an email with a link to this thread. No answer yet. So if you have a better way to reach the author, all the better. – Stephan Lehmke Dec 16 '12 at 01:01
  • @StephanLehmke All my apologies for the typo I made on your name :(. I think we can give the author more than a few hours before trying to contact him by other means :). – Bruno Le Floch Dec 16 '12 at 01:09
  • @BrunoLeFloch I haven't looked too hard at tabu and yours looks more like a fix to the code for the normal case, but here, inside verbatim, isn't scantokens just wrong? even with your above fix the | will be scanned after verbatim catcodes are setup however the actual tabu preamble has already been tokenised in Stephen's macro with the normal document catcodes, as by default verbatim doesn't change the catcode of | it comes to the same thing but... – David Carlisle Dec 16 '12 at 01:20
  • @BrunoLeFloch: You are fully right with your first comment. – Speravir Dec 16 '12 at 01:24
  • @BrunoLeFloch @davidcarlisle After discussion in chat I feel a bit uneasy about which answer to accept. My current feeling is that any use of \scantokens is questionable in a setting where all the catcodes are redefined, so Davids answer disabling \scantokens is "more correct" after all. What do you think, Bruno? – Stephan Lehmke Dec 16 '12 at 10:16
  • @DavidCarlisle, @StephanLehmke, In any case, using \scantokens as currently done in tabu is wrong (one should only rescan the part that one cares about rescanning). Using \scantokens in my fixed way makes sense when the tabu preamble and contents both appear with the same category code settings in the source file (for instance when | is active). I'm starting to think that the correct approach would use \savetabu and \usetabu, but I haven't looked carefully yet. – Bruno Le Floch Dec 16 '12 at 10:27
  • @DavidCarlisle, @StephanLehmke Add \usepackage{shortvrb}\MakeShortVerb\| to the document preamble, and replace | by |[5pt] in the tabular preamble to see both our solutions fail: in both, tabu searches for an "other" | when the preamble contains an active |. I think that the column-rewriting procedure was rewritten last year for tabu 2.9, but it is not yet publicly available. – Bruno Le Floch Dec 16 '12 at 10:43
  • Well in that case disabling scantokens is still the right thing to do as long as \verbatim@processline is defined before \MakeShortVerb. That's the point really: the catcode regime in the document is irrelevant (so scantokens should be disabled) as long as the tabu preamble is set up in a macro definition with normal catcodes. If you add the line you suggest to my solution just before \begin{document} it works and you get thick lines in the output. – David Carlisle Dec 16 '12 at 14:35
  • @DavidCarlisle Ok, I now agree that your solution is correct in that particular case. Typically, though, the preamble of a tabu is set with the same catcode régime as that in force when the tabu is typeset, so using \scantokens is the correct way to go, albeit not in the way currently done in tabu (as I explain in my answer). Well, in fact, the fully correct approach is to re-code completely the column-rewriting procedure as I discussed last year with the author on comp.text.tex. @StephanLehmke, please accept David's answer. – Bruno Le Floch Dec 16 '12 at 18:02
  • yes agreed, well actually what tabu probably ought to do if it is going to use scantokens at all is re-scan its preamble as well as the various parts so that saving the preamble in one catcode regime and using it in another doesn't do any harm. – David Carlisle Dec 16 '12 at 18:50
  • The better solution is to forget completely the approach of the array package which rewrites all columns of a given type, then all columns of another type etc, using delimited macros (\NC@find etc). Instead, it is better to go from left to right: read the first token in the tabular. If it is declared by \newcolumntype, replace it, otherwise it must be a primitive column type. That approach is very similar to TeX's own expansion of macros, and we can decide to ignore the catcode of characters since they don't serve as delimiters anymore. – Bruno Le Floch Dec 16 '12 at 22:22
  • Bruno, @DavidCarlisle Thank you for the insightful discussion (and sorry for being unstable in accepting answers). Could any of the comment statements be incorporated in your answers? – Stephan Lehmke Dec 17 '12 at 04:02
  • I added some notes to my answer – David Carlisle Dec 17 '12 at 09:42
  • @BrunoLeFloch Just to give one more item of feedback, I never received an answer to my email to the package author. I hope you had more luck getting his attention on ctt. – Stephan Lehmke Jan 19 '13 at 15:49
  • Hi @StephanLehmke. I had not tried to contact him. I just posted on comp.text.tex, and will try to keep you informed. – Bruno Le Floch Jan 20 '13 at 00:59

UPDATE 2019-01-14

A patch equivalent to the code in Bruno's answer has been applied in tabu 2.9, which has been submitted to CTAN, so the workaround suggested in this answer should not be needed.



tabu uses \scantokens while parsing the preamble, which means it picks up the local verbatim setting and goes wrong. Since the argument is just \def\:{|}, it can simply be read with the normal catcodes. You also need a \par, or it all comes out on one line.

\documentclass{article}
\usepackage{verbatim}
\usepackage{tabu}

\makeatletter
\renewcommand\verbatim@processline
{{\let\scantokens\@firstofone
  \begin{tabu}to\textwidth{|l|X[-1,l]|}%
    foo&\the\verbatim@line%
  \end{tabu}%
}\par}
\makeatother

\begin{document}
%\tracingall
\verbatiminput{test.tex}%

\end{document}


As noted in the discussion in comments in Bruno's answer, disabling \scantokens here is only a partial fix for the special case of verbatim usage.

There are several catcode regimes that come into play in code such as this.

  1. The catcodes in force at the time the array preamble is saved in the user's macro.
  2. The catcodes in force during the body of the table (verbatim settings in this case).
  3. The catcodes in force when the tabu internals are read.

Disabling \scantokens only works if the first and last of these are the same, which is the usual case. tabu uses \scantokens to try to normalise the preamble, but this assumes that the preamble has been saved with the catcodes in force when the table is executed, which is not the case if the table preamble is stored in a macro rather than being inline in the document.

Ideally, table preamble parsing code ought to be agnostic about catcodes (that is, accept | as a vertical rule specification whatever catcode is used), or if it uses \scantokens it should probably normalise the entire array preamble with a safe catcode regime.
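As a rough illustration of what "agnostic about catcodes" could mean, here is a small sketch. It is not what tabu or array do, and the macro names are invented for the example. It assumes an e-TeX engine (for \detokenize) and compares two | tokens only after both have been normalised to catcode 12:

```latex
\documentclass{article}
\begin{document}

\def\otherbar{|}% | with catcode 12, saved under the normal regime
\begingroup
  \catcode`\|=\active
  \gdef\activebar{|}% | tokenized as an active character
\endgroup

% \detokenize (e-TeX) turns its argument into catcode-12 characters,
% so both bars end up as the same token.
\edef\barA{\detokenize\expandafter{\otherbar}}
\edef\barB{\detokenize\expandafter{\activebar}}

After normalising, the two bars are
\ifx\barA\barB equal\else different\fi.

\end{document}
```

This should typeset "equal", whereas comparing \otherbar and \activebar directly with \ifx would not match. A real preamble parser would have to apply this kind of normalisation to every token it inspects, which is only hinted at here.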

David Carlisle