
I already looked at related questions, such as Stable alternative to \thepage, but I am just as baffled as before. MWE:

\documentclass{article}
\begin{document}
\thepage\par
\null\null\null\null\null\null\null\null
\null\null\null\null\null\null\null\null
\null\null\null\null\null\null\null\null
\null\null\null\null\null\null\null\null
\null\null\null\null\null\null\null\null
\null\null\null\null\null\null
\thepage\par
\end{document}

The above code produces a two-page document. It prints \thepage as 1 on both pages. Add another \null or two, and the second \thepage correctly shows as 2.

Apparently, \thepage does not kick in until about the third line of the page. I have tried this with more complicated code, and it seems to be the general case.

Bug? Feature? My mistake? Is there any way I can get \thepage correct on the very first line of a page? OK to use LuaTeX.



Your mistake...

Here's an example having 24 pages (you can increase this quite a bit) with the same result - \thepage keeps printing 1:


\documentclass{article}

\usepackage[nopar]{lipsum}

\def\x{\lipsum[1]~\thepage~}
\def\xx{\x\x\x\x\x\x\x\x\x\x}

\begin{document}
\xx\xx\xx\xx\xx\xx\xx\xx\xx\xx
\end{document}

Why is this? It's because the first (only) sign of a paragraph break is at the end of the document (actually, issued during \end{document} as a result of \clearpage). TeX considers the above 100 \lipsum[1]~\thepage~ constructions as a single paragraph, and there are only certain locations where TeX initiates the page builder (see section 27.2 Activating the page builder of TeX by Topic, or p. 110 of The TeXbook):

The page builder comes into play in the following circumstances.

  • Around paragraphs: after the \everypar tokens have been inserted, and after the paragraph has been added to the vertical list. See the end of this chapter for an example.
  • Around display formulas: after the \everydisplay tokens have been inserted, and after the display has been added to the list.
  • After \par commands, boxes, insertions, and explicit penalties in vertical mode.
  • After an output routine has ended.

How do you get an appropriate page number? Use \labels, and specifically, \pageref:


\documentclass{article}

\usepackage[nopar]{lipsum}

\def\x{\lipsum[1]~\thepage~}
\def\xx{\x\x\x\x\x\x\x\x\x\x}

\begin{document}
\xx\xx\xx\xx\xx\xx\xx\xx\xx\xx%
\label{mylabel}\pageref{mylabel}
\end{document}

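This works because \label does not use the page number that is current when the macro is expanded: it writes a line to the .aux file with a delayed \write, whose contents (including \thepage) are expanded only when the page is shipped out, by which time the page number is known. \pageref reads that value back from the .aux file, which is why a second compilation is needed.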
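Since every \label has to be unique, one way to avoid inventing names by hand is to generate them from a counter. The following is only a sketch; the counter herepage and the macro \herepage are hypothetical names, not part of any package. Like any \pageref, it needs two runs: the first prints ?? and a later run prints the page on which each call actually lands.

```latex
\documentclass{article}
\usepackage[nopar]{lipsum}

% Hypothetical helper: each call places a fresh, unique \label and
% prints the matching \pageref. The label's page number is written
% to the .aux file at shipout, so a second run shows the right value.
\newcounter{herepage}
\newcommand\herepage{%
  \stepcounter{herepage}%
  \label{herepage-\theherepage}%
  \pageref{herepage-\theherepage}%
}

\def\x{\lipsum[1]~\herepage~}
\def\xx{\x\x\x\x\x\x\x\x\x\x}

\begin{document}
\xx\xx\xx\xx\xx\xx\xx\xx\xx\xx
\end{document}
```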

Werner
  • I tried the code shown just above, pasting it as-is. But I still see only 1 as the number on every page. Also, if (in my original code) I replace each \null with ~\par I get the same bad result. –  Nov 08 '16 at 05:28
  • @RobtA: That's because I still used \thepage as in your example. The only difference is visible when you use a \label together with a \pageref (like in my last page number). The \label has to be unique though. – Werner Nov 08 '16 at 05:51
  • I have played with \label and \pageref as you suggested. Alas, it requires double compilation. That may be effective for many users, but it is not what I had in mind. Further investigation reveals that \thepage appears to be quite repeatable on the third line, even when using one-line paragraphs. I had envisioned that the page number would change, and be immediately available, as soon as a page shipped. Guess not. –  Nov 08 '16 at 06:21
  • @RobtA: It only requires an additional compilation if the page reference changes. And no, the page is not fully known inline. You can either use a \label-\ref (or \pageref) or tap into the shipout routine. – Werner Nov 08 '16 at 06:30
  • @RobtA the page number is known when the page is shipped out, so your guess is correct. But TeX looks ahead when considering possible break points: it does not break as soon as there is a feasible one, it (roughly) always goes on to at least the next paragraph to check that breaking there is not feasible. Once it splits off a page, any remaining material not used on that page is re-inserted, but the macros in it were expanded before that. – David Carlisle Nov 08 '16 at 09:54
  • @DavidCarlisle: I see. Actually makes sense. Going through other material, I noticed that there is something called atbegshi that might be helpful. I do not actually need to print, or even know, the page number. All I really need to know is whether, between two points, the page changed. It can wait until the start of the following paragraph. This will be a learning task. –  Nov 08 '16 at 15:18
  • @RobtA any reliable way of determining a page change will be equivalent to setting a \label at the two points and comparing \pageref on the next pass. – David Carlisle Nov 08 '16 at 16:04
  • @RobtA: There's atbegshi and everyshi (and maybe others). They all function at the time when the pages are being shipped out. At that stage, the page content has already been accumulated. – Werner Nov 08 '16 at 16:17
  • @Werner, @DavidCarlisle: atbegshi and everyshi look promising. Currently I am attempting code (learning exercise) like this: yada yada\par\A yada yada yada yada\par\B yada yada. If there is a page break between \A and \B then something is written to the log file. Nothing needs to be written into the document. Document is fiction. For dramatic effect, some short passages should finish on the same page (or page spread) where they start. I am trying to come up with a way to identify those things automatically. Thought that recording and comparing \thepage at A and B would do it. No. –  Nov 08 '16 at 17:06
  • @RobtA: You can put the "short passage" inside an unbreakable block (like a minipage or tabular) and it will never break across the page boundary. Alternatively, consider using needspace. – Werner Nov 08 '16 at 17:36
  • @Werner: Unbreakable block would be a possible solution, but has the potential for other problems in this context. I'm more interested in getting a log message. See my alternative solution posted below. I had to re-phrase my own question to get there. –  Nov 08 '16 at 18:05
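David Carlisle's suggestion of placing a \label at each end of a passage and comparing the page references on the next pass can be sketched as follows. It assumes the refcount package, whose expandable \getpagerefnumber returns the page stored for a label (0 while the label is still undefined); \passagestart and \passageend are hypothetical names. Both markers are meant to sit inside paragraph text rather than between paragraphs, so that each label's delayed \write ends up on the line where it is typeset.

```latex
\documentclass{article}
\usepackage{refcount}% provides the expandable \getpagerefnumber

% Hypothetical markers: \passagestart{name} ... \passageend{name}.
% \passagestart enters horizontal mode so its label sits in a line.
\newcommand\passagestart[1]{\leavevmode\label{#1-start}}
% Use \passageend before the last paragraph of the passage ends.
% It compares the pages recorded on the previous run; undefined
% labels both give 0, so nothing is reported on the first run.
\newcommand\passageend[1]{%
  \label{#1-end}%
  \edef\passagestartpage{\getpagerefnumber{#1-start}}%
  \edef\passageendpage{\getpagerefnumber{#1-end}}%
  \ifx\passagestartpage\passageendpage\else
    \typeout{Passage `#1' crosses a page break:
      page \passagestartpage\space to page \passageendpage}%
  \fi
}

\begin{document}
\passagestart{duel}First paragraph of a passage that should stay together.

More paragraphs of the same passage.

Last paragraph of the passage.\passageend{duel}
\end{document}
```

Because the comparison uses the previous run's .aux data, the message refers to the layout of that run, so a final run after the text stops changing is needed before trusting it.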

The page number is incremented by the output routine when it is adding headers and footers, after it has decided where to make a page break. Note that this happens independently of the execution of the macros that make up the vertical galley that will eventually be split into pages.

The paragraph builder considers a whole paragraph, typesetting it into one long horizontal list before splitting that list into lines. This means there is no way for a macro to test directly which line of a paragraph it is on, as all macros are executed before line breaking is considered. Similarly, by the time the page breaker cuts in and considers splitting off a chunk of the constructed vertical list, any macros in it will have been expanded long before. If you have short paragraphs and not much stretch, the page breaker probably only ever makes one page at a time, so \thepage is only "wrong" for the last few lines of a paragraph held over from the previous page, but in general it can be wrong by any number of pages. If you have enough text to make a 10-page paragraph, then all macros within that paragraph will have been expanded before any line is constructed (and so before any page is shipped out), so \thepage would print 1 even if it is printed on page 10, like:

\documentclass{article}

\def\a{one two three four five six seven }
\def\b{\a red blue green\a\a pink yellow }
\def\c{\b\b\a\a\b\b\b apples oranges pears }
\def\d{\c\c\a\a\b\b\b\b\c\c\c\c\c}

\begin{document}
\d\d\d\d\d\d\fbox{\thepage}
\end{document}
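The two moments described above (expanding \thepage while the paragraph is collected versus expanding it at shipout) can be made visible in the log. The macro \logpage below is a hypothetical helper: \typeout writes immediately, while a non-immediate \write stores its tokens in a whatsit and expands them only when the page containing that whatsit is shipped out (stream -1 sends the text to the log file only). Since everything here is one paragraph, the immediate lines should both report page 1, while the shipout lines should report the pages the markers actually land on.

```latex
\documentclass{article}

\def\a{one two three four five six seven }
\def\b{\a red blue green\a\a pink yellow }
\def\c{\b\b\a\a\b\b\b apples oranges pears }
\def\d{\c\c\a\a\b\b\b\b\c\c\c\c\c}

% Hypothetical helper: log the page number twice for the same spot.
\newcommand\logpage[1]{%
  % expanded now, while the paragraph is still being collected
  \typeout{#1: page \thepage\space (when the macro is expanded)}%
  % expanded later, when the page holding this whatsit is shipped out
  \write-1{#1: page \thepage\space (at shipout)}%
}

\begin{document}
\d\d\d\logpage{middle}\d\d\d\logpage{end}
\end{document}
```

This is the same mechanism \label relies on: LaTeX advances the page counter only after \shipout, so a delayed \write sees the number of the page it is on.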


David Carlisle

After thinking about what Werner and David had to say, I realized that my original question was off the mark. Instead of writing something on the page, or forcing text to be in a block, I would rather get an informational message in the log. The reason is that there are many other things going on, too involved to phrase as a question here (and I'm not a good coder).

But I was able to come up with the following MWE, which exhibits the intended behavior:

\documentclass[letterpaper,12pt]{article}
\usepackage{everyshi}
\usepackage{xifthen}% loads ifthen for \ifthenelse
\newcounter{runningPage} % Not necessarily the page number!
\newcounter{tempInsertStuff}
\newcounter{tempFinishStuff}
\EveryShipout{\stepcounter{runningPage}}% advanced once per shipped-out page
\newcommand\insertStuff{%
  \setcounter{tempInsertStuff}{\value{runningPage}}%
  \setcounter{tempFinishStuff}{\value{runningPage}}%
}
\newcommand\finishStuff{%
  \strut% in vertical mode this starts a paragraph, which lets the page
        % builder run (and ship out a full page) before the counter is read
  \setcounter{tempFinishStuff}{\value{runningPage}}%
  % numeric test; \equal would compare the counter tokens, not their values
  \ifthenelse{\value{tempInsertStuff}=\value{tempFinishStuff}}{}{%
    \typeout{Ahoy: `Stuff` crossed page, at or before page \thepage.^^J}%
  }%
}
\begin{document}
yada0\par yada1\par yada2\par yada3\par yada4\par yada5\par yada6\par yada7\par yada8\par yada9\par
yada10\par yada11\par yada12\par yada13\par yada14\par yada15\par yada16\par yada17\par yada18\par yada19\par
yada20\par yada21\par yada22\par yada23\par yada24\par yada25\par yada26\par yada27\par yada28\par yada29\par\insertStuff
yada30\par yada31\par yada32\par yada33\par yada34\par yada35\par yada36\par yada37\par yada38\par \finishStuff yada39\par
yada40\par yada41\par yada42\par yada43\par yada44\par yada45\par yada46\par yada47\par yada48\par yada49\par
\end{document}

The log message (Ahoy) is written when \finishStuff is placed following yada38, which is the first paragraph on the second page. But the message is not written when \finishStuff is placed following yada37, which is the final paragraph on the first page. So if my block of text is yada30 to yada38, it can break, but I will get the message.

In my context, it does not matter if the paragraphs are longer, but someone else might find the above solution ineffective.

  • Note that your paragraphs here are only one line long. This, together with a zero-length \parskip, makes for an obvious change in the page number, as a \par sparks the page builder. Having paragraphs of (say) three lines each may cause you problems. – Werner Nov 08 '16 at 18:30
  • @Werner: Indeed. But I just tried it with longer paragraphs, and it behaves as expected. In particular, if a single long paragraph is split across the page break, then Ahoy informs me about it. I have also tried with an image using \includegraphics and it seems to work: If the image is placed on the page intended, no Ahoy, but if it is forced to the following page due to lack of space, Ahoy appears. –  Nov 08 '16 at 19:11
  • That's because you have \insertStuff and \finishStuff as part of separate paragraphs; so you have \insertStuff ... \par ... \finishStuff. With that you will invoke the page builder and have correct output. Remove the interior \pars and add some text and you'll see a difference. Here's an example. – Werner Nov 08 '16 at 19:20
  • @Werner: OK, I see that the code has a problem there. But text similar to your example would not arise "in use", so I think that the problem is solved for the expected usage. Note that when I first asked the question, I could not get \thepage to work, with or without \par endings: If the top line of a page was its own paragraph, \thepage was not yet set at its end. But EveryShipout seems to catch the change, so that it is available at the end of the first \par on a page. That's all I need. –  Nov 08 '16 at 20:57