
I convert documents from Word a lot, and I often forget to escape %, which leads to pieces of text missing. So I would like the following to happen in the main document body:

  • escaped \% working as expected;
  • single % causing an error;
  • double %% behaving like a comment.

Do you think this is possible? I think it could be done by making % an active character and checking whether the next character is also %; if it is, the rest of the line would be treated as a comment.

Starting point / MWE:

\documentclass{article}

\usepackage{lipsum}

%\usepackage{lipsum} % should cause no error

\begin{document}

Hello!

I tend to forget to escape up to 70 % of my percent-signs, which causes missing ends of sentences. So I would like this to cause an error.

%% On the other hand, sometimes I really need to make a comment, so I want to use a double-percent-sign for that.

And of course, the 30 \% of escaped percent-signs should work correctly.

\lipsum

\end{document}
yo'
  • How would you handle the correct spacing? Even if local typography dictates a normal-width space, I would use at least a non-breaking one (e.g. ~) between numbers and the percent sign. (I consider it a unit-like symbol and would use \,/siunitx, anyway.) Bottom line: I'm voting for regex here. – Qrrbrbirlbel Oct 20 '12 at 16:16
  • @Qrrbrbirlbel don't care much about the 3rd line, I know how I want to format the percent-sign. – yo' Oct 20 '12 at 16:23
  • %% as a comment by setting % active wouldn't work: in many situations % is seen when TeX is not expanding macros, even in the document environment: for example when you end a line in a \parbox with what should be a comment. – egreg Oct 20 '12 at 19:24

5 Answers


I don't know whether this can be done with LuaTeX (maybe yes). But under (pdf)TeX this can't be done in full generality.

First of all let's recall how comments work. TeX reads a line of input and, before starting to transform the characters into tokens, it does some jobs.

  1. It throws away the operating system provided end-of-record byte (if any; some old operating systems don't have it) and everything that remains in the line after it.

  2. It throws away all spaces that might remain at the end of the record (and tabs, under implementations based on Web2C, which are the most commonly used). This happens independently of category codes, as none have yet been attached to the characters.

  3. It adds at the end of the record the current \endlinechar (none if the parameter is negative or beyond the engine dependent maximum value, which is 255 for TeX and pdftex, 0x1FFFFF for XeTeX and LuaTeX).

  4. It does what is prescribed by its current state with respect to category code 5 and 10 characters; under normal circumstances it throws away initial spaces and possibly inserts a \par token (see the TeXbook or TeX by Topic for details).

  5. Now it starts tokenizing and it's here where comments are discarded: a category code 14 character causes TeX to ignore it and everything up to the end of the line, the added \endlinechar included.
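Steps 3 and 5 can be observed with a small test file. This is only a sketch relying on the standard \endlinechar parameter: with the default value each line end becomes a space, while with \endlinechar=-1 nothing is appended, so consecutive lines run together.

```latex
\documentclass{article}
\begin{document}
% Default \endlinechar (13): the line end becomes a space, giving "a b".
a
b

% In the next line the final % discards the end-of-line character that was
% already appended to it (step 5); the following lines are read with
% \endlinechar=-1, so nothing is appended to them (step 3) and "ab" results.
{\endlinechar=-1 %
a
b
}
\end{document}
```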

One might think of defining % as an active character that looks for a following % and, in that case, inserts a category code 14 character into the input, perhaps only in the document environment, so that comments work as usual in the preamble. This fails in two ways.

  1. If TeX is not expanding macros, the %% pair would be recognized too late. For example

    \parbox{abc %%
      def}
    

    wouldn't work as expected, because the active % wouldn't be expanded when the argument to \parbox is absorbed.

  2. Even if one ensures that no %% pair is in the argument to a macro, it's impossible to put a category code 14 character in the replacement text of a macro: no character of category code 0, 5, 9, 14 or 15 can reach TeX's "stomach", where macro replacement texts are examined and stored in memory (TeXbook, exercise 7.3).

One might try to overcome this limitation by defining % to look for a following % and, in that case, to issue a macro \gobbletoend defined by something like

\def\gobbletoend#1^^M{}

but, alas, this can't be done, for two reasons: ^^M (category code 5) can't reach TeX's stomach and, moreover, TeX wouldn't even see the pair of braces, because when absorbing that line it would see ^^M, which is the ASCII end-of-line character, and throw it away together with the rest of the line. So the macro can't have its argument delimited by a category code 5 ^^M. One would have to change the category code of ^^M, which is what \obeylines does, but you don't want every end-of-line that isn't preceded by %% to produce a \par, so the definition would have to be much more complicated.
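For comparison, the usual workaround is to change the category code of ^^M both while the delimited macro is defined and before the rest of the line is tokenized. The following is a minimal sketch; \gobblerestofline and \gobble@restofline are hypothetical names, and, like any trick of this kind, it only works when the line has not already been absorbed as part of a macro argument.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper: discard everything up to the end of the current line.
% ^^M is made "other" (category 12) before the rest of the line is
% tokenized, so the end-of-line character arrives as an ordinary token.
\def\gobblerestofline{\begingroup\catcode`\^^M=12 \gobble@restofline}
% The delimiter must also be a category-12 ^^M, so the definition is made
% while ^^M has that category code; the group restores category 5.
{\catcode`\^^M=12 \gdef\gobble@restofline#1^^M{\endgroup}}%
\makeatother
\begin{document}
abc \gobblerestofline this text is discarded
def
\end{document}
```

This should typeset "abc def"; the answer's code uses the same idea, but with category 3 for ^^M and without a group.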

One possible approach is shown in the following example:

\documentclass{article}
\usepackage{amsmath}

\makeatletter
\begingroup\lccode`~=`\%
\lowercase{\endgroup\def~{\new@ifnextchar~\tohecz@comment\%}}
\def\tohecz@comment{\catcode`\^^M=3 \tohecz@commentignore}
\begingroup\lccode`$=`\^^M
\lowercase{\endgroup\def\tohecz@commentignore#1$}{\catcode`\^^M=5 }
\makeatother

\begin{document}
\catcode`\%=\active

abc %% def

abc % def

abc %

def
\end{document}

The final abc % line, followed by an empty line and def, just shows that paragraphs are still correctly terminated; \new@ifnextchar from amsmath is used to avoid gobbling spaces. Recall, however, that %% can't appear in the argument to a command.
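The code above prints a lone % as a percent sign, while the question asks for an error. One possible variant, sketched here, replaces the \% branch with \PackageError. The macro \tohecz@lonepercent and the package name "percent" in the message are made up for illustration, and the same restriction about macro arguments still applies.

```latex
\documentclass{article}
\usepackage{amsmath}% loads amsgen, which defines \new@ifnextchar
\makeatletter
% Active %: "%%" starts a comment, a lone "%" raises an error.
\begingroup\lccode`~=`\%
\lowercase{\endgroup\def~{\new@ifnextchar~\tohecz@comment\tohecz@lonepercent}}
% Comment mechanism from the answer: make ^^M a category-3 token and
% gobble everything up to it.
\def\tohecz@comment{\catcode`\^^M=3 \tohecz@commentignore}
\begingroup\lccode`$=`\^^M
\lowercase{\endgroup\def\tohecz@commentignore#1$}{\catcode`\^^M=5 }
% Hypothetical error branch (\@percentchar is the kernel's category-12 %).
\def\tohecz@lonepercent{%
  \PackageError{percent}{Unescaped \@percentchar\space found}%
    {Write \string\% for a percent sign or
     \@percentchar\@percentchar\space for a comment.}}
\makeatother
\begin{document}
\catcode`\%=\active
abc %% this is a comment

abc \% def

%% abc % def  (delete the leading double percent to trigger the error)
\end{document}
```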


egreg

ConTeXt provides a macro \asciimode (and an environment \startasciimode ... \stopasciimode) which makes all characters except \, { and } behave as normal characters. Inside this environment % behaves like a normal character, and %% behaves like the comment character.

Note: % and %% work correctly inside arguments to a macro.

\setuppapersize[A7]
\starttext
\rightaligned{\asciimode Hello! %World 
%% This is a comment
}

% Normal comment

\asciimode

\rightaligned{Hello! again %World
%% This is a comment
}

I tend to forget to escape up to 70 % of my percent-signs, which causes missing
ends of sentences. asciimode simply typesets them correctly. It also typesets
all special characters # & $ correctly.

%% On the other hand, sometimes I really need to make a comment, so I want to
%% use a double-percent-sign for that.

And of course, the 30 \% of escaped percent-signs should work correctly.

\stoptext


Aditya
  • Interesting and thanks! However, I'm stuck with LaTeX. Still, is it possible to extract the definition of the macro and use it in LaTeX? – yo' Oct 20 '12 at 19:24
  • FWIW, \asciimode does two things: sets an appropriate catcode table, and hooks into the line reader to set %% as comment. The first is doable in pdftex but the second requires luatex. If you want to implement %% acting as comment in pdftex, you need to set % to be an active character, and check the next token to see if it is % or not. If you are using lualatex, then porting the definition should be simple (but I don't know what interface lualatex provides for hooking into the line reader). – Aditya Oct 20 '12 at 19:38
  • @egreg: Yes, \asciimode works correctly inside macros as well. See the edited answer. In fact, \asciimode even works correctly if it is called inside a macro, provided there is a line break before %% is used (\asciimode changes the line-reader, so if the current line has already been read, it will not change it). – Aditya Oct 20 '12 at 21:23

I would be wary of changing settings like that; I'm not sure what other problems doing so might cause.

I'd suggest using regular expressions to find all the instances of a single % that isn't escaped.

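If you're using Unix, then grep % doc.tex | grep -v %% | grep -v '\\%' lists the lines containing a % that is neither doubled nor escaped (although it also drops lines where a lone % appears together with %% or \%). Alternatively, if your text editor can find and replace based on a regex, then you should be able to easily change them all.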

Anthony
  • Of course, I know that I can use this. Nice answer, but I already have a similar and better one myself. – yo' Oct 20 '12 at 16:23

Although this does not answer the question exactly as phrased, it may be helpful to others who get here by search (that's how I found it).

If you paste text into TeX from a word processor, the odds are that several TeX special characters, not just the percent character, are supposed to have their ordinary text meaning. What you can do is redefine them as ordinary characters by default (only within the document body, so as not to upset the preamble), and allow switching in and out of the TeX meanings. The simplest way to do this is to redefine \makeatletter and \makeatother.

Then, whenever you need math, tables, or a macro definition that uses # for its parameters, switch the TeX meanings back on with \makeatletter and off again with \makeatother.

In the following code, I have included the underscore, because I don't use it for subscripts. If you do, then don't include it.

\documentclass{article}
\usepackage[T1]{fontenc} % OT1 has a dot accent in slot 95; T1 has an underscore there
\usepackage{etoolbox} % provides \AfterEndPreamble
% Note that ^ cannot be included, else ^^J and ^^^^nnnn won't work.
% Comments are safe inside these definitions: they are removed while the
% preamble is read, when % is still a comment character.
\gdef\mymakeatletter{% restore the TeX meanings
  \catcode`\@=11\relax
  \catcode`\#=6\relax
  \catcode`\$=3\relax
  \catcode`\&=4\relax
  \catcode`\_=8\relax
  \catcode`\%=14\relax
}
\gdef\mymakeatother{% make the characters printable (category 12)
  \catcode`\@=12\relax
  \catcode`\#=12\relax
  \catcode`\$=12\relax
  \catcode`\&=12\relax
  \catcode`\_=12\relax
  \catcode`\%=12\relax
}
\AfterEndPreamble{%
  \let\makeatother\mymakeatother\relax
  \let\makeatletter\mymakeatletter\relax
  \makeatother
}
\begin{document}
It % was # a dark $ and & stormy _ night.

\makeatletter
It % was # a dark $ and & stormy _ night.
\makeatother

It % was # a dark $ and & stormy _ night.

\end{document}

EDIT: Comments indicate that re-defining the existing \makeatletter and \makeatother commands is a bad idea. Well, it works for me! But it is not necessary to do it exactly that way. For example, leave the existing commands untouched, and define entirely new commands such as \losemytexcodes and \findmytexcodes or whatever. I also note that folks who begin their work in a word processor, before transferring to TeX, are likely to be authors of literature rather than mathematicians.
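Following the suggestion in the edit, the same idea can be written with new command names, so that \makeatletter and \makeatother keep their usual meaning. This is a minimal sketch using the names \losemytexcodes and \findmytexcodes mentioned above; @ is left alone because only the text characters matter here.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}% T1 has an underscore glyph in slot 95
\usepackage{etoolbox}% provides \AfterEndPreamble
% Give # $ & _ % their usual TeX meanings back.
\newcommand\findmytexcodes{%
  \catcode`\#=6 \catcode`\$=3 \catcode`\&=4 \catcode`\_=8 \catcode`\%=14 }
% Make them ordinary printable characters (category 12).
\newcommand\losemytexcodes{%
  \catcode`\#=12 \catcode`\$=12 \catcode`\&=12 \catcode`\_=12 \catcode`\%=12 }
% Switch to "plain text" behaviour right after \begin{document}.
\AfterEndPreamble{\losemytexcodes}
\begin{document}
It % was # a dark $ and & stormy _ night.

{\findmytexcodes Math works inside this group: $x_1 + y_2$.}

It % was # a dark $ and & stormy _ night.
\end{document}
```

Using a group around \findmytexcodes restores the "plain text" codes automatically, as long as the switched part is not inside a macro argument.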

  • I’d say that redefining \makeatletter and \makeatother is a very bad idea, one that is almost guaranteed to cause endless confusion. To begin with, why do you think it is appropriate for \makeatletter to set up the “special” \catcodes for #, %, etc., and for \makeatother to switch in the “other character” ones, and not the other way around? – GuM Jan 06 '17 at 18:16
  • Not sure I understand the reasoning here: why not use your own macro names instead of redefining \makeatletter? What does overriding those names give you? – ShreevatsaR Jan 06 '17 at 19:05
  • Could use own macro names, yes. It is not necessary to redefine \makeatletter and \makeatother. But I did it that way because those commands have a similar purpose and would often be used in the same place. Easier for me to remember. –  Jan 06 '17 at 19:15
  • In that case could you edit the answer to use different names instead? (As you said, "leave the existing commands untouched, and define entirely new commands".) I think this is a good solution that can be helpful to others who visit this question, and would like to upvote it, but in its current form, the risk of confusion or weird interactions from redefining these macros is just too high. – ShreevatsaR Jan 06 '17 at 22:20
  • @ShreevatsaR Thanks for your feedback, but I won't edit to change the names. Here's why: (1) Anyone who understands my reply would grasp the concept, and can change them. (2) \makeatletter is already used in the context of changing catcodes. It is a natural fit. (3) Anyone who doesn't like redefining macros should not use hyperref, either. (4) It was never intended to be an authoritative solution. (5) TeX syntax often makes no sense. For example, why does \( ... $ work for entering and leaving math mode? (6) I never understood math. –  Jan 06 '17 at 22:55

Here is a LaTeX solution using LuaTeX. Call \CheckPercent to enable the feature and \StopCheckPercent to disable it.

\documentclass{article}
\usepackage{fontspec}
\usepackage{luacode}

\begin{luacode*}
  -- Prepare LPeg pattern
  local percent = lpeg.P('%')                    -- a percent character
  local normal_char = lpeg.P(1) - lpeg.S('\\%')  -- any character that isn't \ or %
  local command = lpeg.P('\\') * lpeg.P(1)       -- \ followed by any character
  local prefix = (normal_char+command)^0         -- normal_char or command, 0 or more times
  local bad_percent = prefix * percent           -- prefix followed by %

  -- Callback to check for a bad percent character
  function check_lone_percent(line)
    -- First check if the line contains %%.
    -- If yes, keep only what comes before %%.
    local before_comment = line:match('(.-)%%%%') -- %% is the pattern for %
    if before_comment ~= nil then
      line = before_comment
    end
    -- Now check for a % that isn't \%
    if bad_percent:match(line) then
      tex.error("Found lone percent character",
                {"Make it a comment: %%, or a percent sign: \\%"})
    end
    return nil
  end
\end{luacode*}

\newcommand{\CheckPercent}{%
  \directlua{luatexbase.add_to_callback('process_input_buffer',
             check_lone_percent, 'check for lone percent')}%
}
\newcommand{\StopCheckPercent}{%
  \directlua{luatexbase.remove_from_callback('process_input_buffer',
             'check for lone percent')}%
}

\begin{document}
\CheckPercent
Some text %% Comments with double-percent work

Text can include \% characters.

Newlines before \% also work: \\\%, \\\\\%

But a percent alone will make an error, hopefully
in 100% of cases.

Also after newlines: \\%
\end{document}

It works at the line input level, before any processing is done by TeX. It would be quite simple if not for one tricky case: how to determine if a sequence \\\...\% is valid? If the number of backslashes is even, that's a bunch of newlines before a % so it should raise an error. But with an odd number of backslashes, it means a bunch of newlines followed by \%, which is valid.

The code above implements the check a bit differently: first it discards any comment by searching for %%. Then it starts at the beginning of the line and ignores normal characters (i.e. not \ or %). It also ignores backslashes followed by any character (such as \\, \% or the \e in \emph). If this ignored part is followed by a %, an error is raised.
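To see the backslash-parity rule in action, the LPeg pattern can be run on a few sample strings outside the callback. This is a sketch that only writes to the terminal and log file (compile with LuaLaTeX); the sample strings are made up, and the pattern is rebuilt locally so that the snippet is self-contained.

```latex
\documentclass{article}
\usepackage{luacode}
\begin{luacode*}
  -- Same pattern as in the answer: skip normal characters and
  -- backslash pairs, then look for a remaining %.
  local normal_char = lpeg.P(1) - lpeg.S('\\%')
  local command = lpeg.P('\\') * lpeg.P(1)
  local bad_percent = (normal_char + command)^0 * lpeg.P('%')
  -- Even number of backslashes before %: lone percent; odd number: \%.
  for _, line in ipairs({ [[50\%]], [[\\%]], [[\\\%]], [[\\\\%]] }) do
    local verdict = bad_percent:match(line) and 'lone %' or 'ok'
    texio.write_nl(line .. ' -> ' .. verdict)
  end
\end{luacode*}
\begin{document}
See the log file.
\end{document}
```

With this pattern, 50\% and \\\% should be reported as ok, while \\% and \\\\% should be reported as containing a lone %.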

jeremie