UPDATE 2019-01-14
An equivalent patch has been applied in tabu 2.9, which has been submitted to CTAN.
EDIT: After discussion in comments, it turns out that I had underestimated the problem, and that David's answer was closer to being the correct answer than mine.
Three catcode régimes are involved here. In chronological order, those are:
1. catcodes in force when the code of the tabu package is read (tokenized);
2. catcodes in force when the preamble of the tabu environment is tokenized;
3. catcodes in force when the tabu is performed (here, in the verbatim environment).
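The régimes matter because a character's category code is frozen into the token at the moment it is tokenized. A minimal sketch of that effect (the macro names \otherbar and \activebar are hypothetical, chosen only for this illustration) compares a | tokenized under normal catcodes with one tokenized while | is active:

```latex
\documentclass{article}
\begin{document}
% | is tokenized here with its normal catcode 12 ("other").
\def\otherbar{|}

\begingroup
  % From now on, every | read from the file becomes an active character.
  \catcode`\|=\active
  % The active | is stored, not expanded, so it needs no definition.
  \gdef\activebar{|}
\endgroup

% \ifx compares the two replacement texts token by token,
% including category codes.
\ifx\otherbar\activebar same\else different\fi
\end{document}
```

This should typeset "different": both macros hold the same character, but with different catcodes, which is why a macro delimited by one kind of | never finds the other.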
The tabu package (v2.8) assumes that catcode régimes 2 and 3 are identical. (It also misuses \scantokens, which should only ever be used in combination with \everyeof; see below for the appropriate code.) Specifically, it tries to parse the preamble (tokenized in régime 2) using macros delimited by a | tokenized in régime 3. When tabu is used as in the question, the tabu preamble is saved early on (with normal catcodes), and the tabu is performed while the verbatim catcodes are in force. In that case, régime 2 actually coincides with régime 1, so David's suggestion of disabling \scantokens is correct: tabu then parses the preamble with a macro delimited by a régime 1 |.
In general, however, both solutions may fail if the three catcode régimes are distinct, which happens for instance if | is declared as a shorthand character for verbatim. In that case, the simplest approach is to use David's suggestion while making sure that the tabu preamble is tokenized with the category codes that were in place when the tabu package code was read, that is, with normal category codes. For example, removing the \DeleteShortVerb (and subsequent \MakeShortVerb) lines makes the code below fail, because tabu does not recognize the active | in the preamble.
\documentclass{article}
\usepackage{verbatim}
\usepackage{tabu}
\usepackage{shortvrb}
\MakeShortVerb{\|}
\begin{document}
We first input the file \jobname.tex with
|\verbatiminput{\jobname.tex}|:
\verbatiminput{\jobname.tex}%
Then redefine |\verbatim@processline|
%
\makeatletter
\DeleteShortVerb{\|}
\renewcommand\verbatim@processline
{{\let\scantokens\@firstofone % David's suggestion: disable \scantokens
\begin{tabu}to\textwidth{|[5pt]l|X[-1,l]|}%
foo&\the\verbatim@line%
\end{tabu}%
}\par}
\MakeShortVerb{\|}
\makeatother
%
and input the file again with the same command:
\verbatiminput{\jobname.tex}%
\end{document}
The fully correct fix would be to completely change the way a tabu preamble is parsed, replacing the current approach (which comes from LaTeX2e's * through array's \newcolumntype) by one that reads the characters of the preamble from left to right. For each character it would ignore the catcode, check whether it is a "primitive" column type or should be expanded to something else, collect any arguments of that column type, and then move on to the next token in the preamble.
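To illustrate only the "left to right, ignoring catcodes" part of that idea, here is a minimal sketch. It is not tabu code: the names \countbars, \count@bars, \bar@count and \stop@bars are hypothetical, and it only handles preambles made of single tokens (no brace groups, no column rewriting). It walks the preamble one token at a time and counts the vertical bars, whatever their catcode:

```latex
\documentclass{article}
\makeatletter
% Hypothetical names throughout; this is an illustration, not tabu code.
\newcount\bar@count
\def\stop@bars{\stop@bars}% end marker, only ever compared with \ifx
\newcommand*\countbars[1]{%
  \bar@count=\z@
  \count@bars#1\stop@bars
  \the\bar@count
}
\def\count@bars#1{%
  \ifx\stop@bars#1%
    % End marker reached: leave the loop.
  \else
    % \string yields an "other" character whatever the catcode of #1,
    % and \if compares character codes only.
    \if\string#1\string|\advance\bar@count\@ne\fi
    \expandafter\count@bars
  \fi
}
\makeatother
\begin{document}
\countbars{|[5pt]l|X[-1,l]|}% | is "other" here

\begingroup
  \catcode`\|=\active
  \countbars{|[5pt]l|X[-1,l]|}% | is active here
\endgroup
\end{document}
```

Both calls should typeset 3. Because no macro is delimited by |, the catcode of the bars in the preamble no longer matters; a real implementation would dispatch on each token instead of merely counting.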
The eTeX primitive \scantokens is very tricky to use properly, and tabu misuses it in many places. This is clearly a bug in tabu, and it is fixable.
Rather than
\scantokens{\def\:{|}} % bad
which is risky because \def\: is also rescanned (and braces too), it is better to do
\everyeof{\noexpand}
\edef\:{\expandafter\noexpand\scantokens{|}}
namely, to put only the part that needs to be rescanned in the brace group. The \edef ensures that \scantokens is expanded, and setting \everyeof to \noexpand prevents the end-of-file marker at the end of \scantokens from wreaking havoc. The additional \expandafter\noexpand construction is only needed to support the case where | is currently active. The case where | is a macro parameter character, or a begin-group or end-group token, would break that code, but that is probably unavoidable. Of course, to use \scantokens properly, one also needs to take care of the \endlinechar (which tabu does) and the \newlinechar (in case that is set to |). Hence the correct fix for your situation is
\renewcommand{\tabu@textbar}[1]%
{%
\begingroup
\newlinechar \m@ne % I'm just paranoid.
\endlinechar \m@ne
\everyeof{\noexpand}%
\edef\:{\expandafter\noexpand\scantokens{|}}%
\expandafter
\endgroup
\expandafter #1%
\:%
}
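To see the idiom in isolation, outside tabu, here is a minimal sketch. The names \rescanbar, \currentbar and \otherbar are hypothetical and exist only for this example. The | stored inside \rescanbar is tokenized once, with normal catcodes, and is then rescanned under whatever catcodes are in force when \rescanbar is called:

```latex
\documentclass{article}
% \rescanbar stores in \currentbar a | carrying the catcode that is
% in force at the time \rescanbar is called (hypothetical names).
\newcommand*\rescanbar{%
  \begingroup
    \endlinechar=-1 \newlinechar=-1
    \everyeof{\noexpand}% neutralize the end-of-file marker
    \xdef\currentbar{\expandafter\noexpand\scantokens{|}}%
  \endgroup
}
\begin{document}
\def\otherbar{|}% an "other" | for comparison

% Normal catcodes: the rescanned | is again "other".
\rescanbar
\ifx\currentbar\otherbar other\else active\fi

% With | active, the rescanned | becomes an active character.
\begingroup
  \catcode`\|=\active
  \rescanbar
\endgroup
\ifx\currentbar\otherbar other\else active\fi
\end{document}
```

This should typeset "other" and then "active". The \xdef is global so that \currentbar survives the group that restores \endlinechar, \newlinechar and \everyeof.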
Now, my solution relies on the fact that tabu's author only wants to rescan a single character here. What should he do when rescanning a full token list? That is trickier, again because TeX inserts a marker at the end of every file (including the \scantokens pseudo-file). The marker behaves as an \outer "thing", which for instance prevents a macro appearing in one file from taking its argument in a different file. The answer can be found in the implementation of \tl_set_rescan:Nnn in LaTeX3, or in one of Heiko Oberdiek's packages (I don't know which one; a reference is welcome). Build a marker that cannot appear when rescanning (e.g., two @ with different catcodes), and insert it through \everyeof just before the end-of-file marker. Then define a macro with an argument delimited by that marker, to collect the rescanned token list. For instance,
\def\tabu@tmp#1%
{%
\long\def\tabu@gdef@rescan@##1#1%
{\expandafter{##1}}%
\long\def\tabu@gdef@rescan##1##2%
{%
\begingroup
\newlinechar\m@ne
\endlinechar\m@ne
\everyeof{#1\noexpand}%
\xdef##1%
{%
\unexpanded
\expandafter\tabu@gdef@rescan@
\expandafter\empty
\scantokens{##2}%
}%
\endgroup
}%
}
\expandafter\tabu@tmp\expandafter{\string @@}
[...] \newcommand for that last code snippet. This is not my best style :(, I'm now too used to LaTeX3 syntax :). – Bruno Le Floch Dec 16 '12 at 01:01

[...] \scantokens is questionable in a setting where all the catcodes are redefined, so David's answer disabling \scantokens is "more correct" after all. What do you think, Bruno? – Stephan Lehmke Dec 16 '12 at 10:16

[...] \scantokens as currently done in tabu is wrong (one should only rescan the part that one cares about rescanning). Using \scantokens in my fixed way makes sense when the tabu preamble and contents both appear with the same category code settings in the source file (for instance when | is active). I'm starting to think that the correct approach would use \savetabu and \usetabu, but I haven't looked carefully yet. – Bruno Le Floch Dec 16 '12 at 10:27

[...] \usepackage{shortvrb}\MakeShortVerb\| to the document preamble, and replace | by |[5pt] in the tabular preamble to see both our solutions fail: in both, tabu searches for an "other" | when the preamble contains an active |. I think that the column-rewriting procedure was rewritten last year for tabu 2.9, but it is not yet publicly available. – Bruno Le Floch Dec 16 '12 at 10:43

[...] \verbatim@processline is defined before \MakeShortVerb. That's the point really: the catcode regime in the document is irrelevant (so scantokens should be disabled) as long as the tabu preamble is set up in a macro definition with normal catcodes. If you add the line you suggest to my solution just before \begin{document} it works and you get thick lines in the output. – David Carlisle Dec 16 '12 at 14:35

[...] \scantokens is the correct way to go, albeit not in the way currently done in tabu (as I explain in my answer). Well, in fact, the fully correct approach is to re-code completely the column-rewriting procedure as I discussed last year with the author on comp.text.tex. @StephanLehmke, please accept David's answer. – Bruno Le Floch Dec 16 '12 at 18:02

[...] \NC@find etc). Instead, it is better to go from left to right: read the first token in the tabular. If it is declared by \newcolumntype, replace it, otherwise it must be a primitive column type. That approach is very similar to TeX's own expansion of macros, and we can decide to ignore the catcode of characters since they don't serve as delimiters anymore. – Bruno Le Floch Dec 16 '12 at 22:22

[...] ctt. – Stephan Lehmke Jan 19 '13 at 15:49