When finalizing documents, one of the last things I do is usually to play with the page geometry to improve the overall layout: get rid of a few orphans or widows, reduce the number of hyphenations, etc.

I usually start by designing my documents with an acceptable or imposed/suggested page geometry, say

\usepackage[scale=0.75]{geometry}

but more often than not (if not always), I can actually replace that 0.75 (as an example) by anything in between 0.73 and 0.77 (because even when the page geometry is supposedly "mandatory", most people won't notice the difference...).
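A minimal sketch of how the scale could be exposed to an external script. The macro name \geomscale and the file name main.tex are made up for illustration, and the sketch assumes geometry expands a macro passed as the value of scale:

```latex
% main.tex (hypothetical file name)
\documentclass{article}
% \geomscale is a made-up macro, not part of geometry.
% It defaults to 0.75 unless it was already defined before \input.
\providecommand{\geomscale}{0.75}
\usepackage[scale=\geomscale]{geometry}
\begin{document}
Some text.
\end{document}
```

A script could then compile each candidate with something like pdflatex "\def\geomscale{0.73}\input{main}" and compare the resulting logs.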

My process so far is however manual, tedious and quite subjective, basically trying out different scale options and picking the one most pleasing to my eye...

A better process would be to have TeX output a few typographical indicators or, better, a global "badness" indicator for the whole document, and then use a small script to optimize the scale factor based on this. So here are my few related questions:

  1. a) Is it possible to have TeX output typographic quality indicators such as:

    • The number of widows,
    • The number of orphans,
    • The number of hyphens,
    • The standard and maximum inter-word spacing,
    • The number of lines with inter-word spacing greater than twice the standard value, and
    • Any other relevant typographic quality indicator you believe is useful? (I am assuming cardinal sins like overfull boxes have been dealt with in all cases).

    b) For widows, orphans, hyphens and similar grave sins, would it be possible to modify TeX's output routine to identify where they happen? Either directly inside the document, as a warning in the log, or both, as the user chooses (as happens for over- and underfull hboxes).

  2. Is it possible to get TeX to output a global quality / badness indicator for the whole document? I know TeX works with a system of penalties internally; is it possible to output the penalty total for the whole document, and would that be an appropriate metric for my optimization desire?
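For the spacing and penalty items above, TeX's own diagnostic parameters already log part of this information. A sketch under the assumption that parsing the .log file with a script is acceptable; the threshold of 200 is an arbitrary illustration, not a recommended value:

```latex
% Preamble sketch: make TeX log more about line and page breaking.
\hbadness=200         % log every hbox whose badness exceeds 200 (usual default: 1000)
\vbadness=200         % same threshold for vboxes
\tracingparagraphs=1  % log feasible line breaks with d= (demerits) and t= (running total)
\tracingpages=1       % log page-break candidates with their costs
\tracingonline=0      % keep the tracing output in the .log file only
```

With \tracingparagraphs on, the smallest t= among the final @@ entries of a paragraph should be the total demerits of the breaks TeX chose, so summing these over the log gives one candidate document-wide figure. Demerits only cover line breaking (including hyphenation penalties), not widows or orphans, so this would be at best one component of a global metric.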

(Note: the answer does not have to work on every engine; in particular, Lua code is perfectly acceptable, as I guess the question relates to one of the stated objectives of LuaTeX, to "open up the internals of TeX".)

Xavier
  • You may be interested in the new CTAN 'package' wheretotrim. It is a 'Perl script that analyses a document, and reports the page and column on which the least amount of text needs to be trimmed to reduce the page count.' Caveat: only tested so far on Linux. – jon May 20 '13 at 18:14
  • @jon Thanks for the pointer. My question though is not about how to minimize the number of pages, but rather how to maximize their beauty. – Xavier May 20 '13 at 19:15
  • Indeed; but the thought behind the suggestion is that changing the size of paragraphs can have beneficial aesthetic side-effects. If you are enlarging the page to reduce widows, e.g., these are exactly the paragraphs that wheretotrim should identify as easy to 'fix' paragraphs in terms of reducing the overall length of the document. That is, these two goals overlap in many cases. – jon May 20 '13 at 19:25
  • @jon Oh yes, I can see the link now to help identify widows and orphans – Xavier May 20 '13 at 19:30
  • You may also wish to consider playing with the looseness of paragraphs. – Andrew Swann May 21 '13 at 06:26
  • @Xavier 1. I think your idea is interesting, and not solely for the purpose you state. It would be very interesting indeed for everyone if LaTeX could tell us where the widows, orphans and hyphens are (perhaps with a \marginpar), for proofreading purposes, and it would be easy to add a counter so you know how many there are. For horizontal spacing, you will get "underfull \hbox" warnings. 2. I think it would be better to output a mean and/or a median level of badness, as well as the extremes. Otherwise, any long document will have a huge badness rate. – ienissei May 21 '13 at 14:42
  • @Xavier (Continued) If you feel that my suggestions are worthwhile, could you please update your question to include them, and delete this second comment. I have no solution for you, but think both points would deserve a package (that I do not have the skills to write). – ienissei May 21 '13 at 14:43
  • @ienissei Great idea to identify where widows, orphans and hyphens take place! I've added it as a subquestion. As for the mean/median badness level, I actually believe it to be a bad idea. Imagine for example a 2-page document, with 2 big sins on 1st page and only a widow on the 2nd page. If you increase the geometry to remove the widow, the total badness will decrease, but the average/median badness per page will increase. I would definitely go with the 1-pager without widow. – Xavier May 22 '13 at 20:35
  • @Xavier What I meant was a mean or a median of the badness caused by whatever sins we are looking for (widows and orphans, multiple hyphenated lines, etc.). Imagine the total badness of a 700-page document that has one minor sin every ten pages or so, compared with your one-pager. Total badness is useful only for very short or very similar documents (say, all of your documents are about 50 pages long). But if we can calculate total badness, we can get the average badness easily – so it could be a plus. – ienissei May 23 '13 at 08:31
  • @ienissei I understand, but I don't really see the point of comparing the badness of a 700-page document to a 1-pager. They just can't be compared in my mind, but maybe I am wrong. Good news is, as you said, if we can get either total or average badness, we can get the other easily :) – Xavier May 23 '13 at 15:08
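The looseness suggestion from the comments can be sketched as follows, together with the standard penalties that discourage widows and orphans. This is an illustration only; the penalty values are one common choice, not a recommendation from the thread:

```latex
% Preamble: forbid page breaks that would leave a widow or an orphan.
\widowpenalty=10000  % last line of a paragraph alone at the top of a page
\clubpenalty=10000   % first line of a paragraph alone at the bottom of a page

% In the body: ask TeX to set one paragraph a line shorter, if it can
% do so within the current \tolerance. \looseness resets to 0 when the
% paragraph ends, so it only affects this paragraph.
\looseness=-1
Text of a paragraph whose last line holds only a word or two ...

```

Using \looseness=1 instead asks for one extra line, which can push a stray line onto the next page together with a companion.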

1 Answer

This is not exactly what you are calling for, but since you wanted any other relevant typographic quality indicator I believe is useful, let me mention two packages:

  • Patrick Gundlach's lua-check-hyphen, which lets you review all hyphenations actually used in a document, and

  • Raphaël Pinson's impnattypo, which implements quite a few rules of French typography; a few of them are not specific to French (like inserting ties after one-letter words, which is also required in Polish and (AFAIK) Czech), and some are even universal (like avoiding rivers); impnattypo can fix some of these issues and highlight other ones.

Both packages require LuaLaTeX (well, some features of impnattypo work without it). See their docs for the exact feature list.
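A rough sketch of loading both packages. The option names below (mark, hyphenation, rivers, draft) are given as I recall them from the package documentation and should be treated as assumptions to check against the current docs:

```latex
% Compile with lualatex.
\documentclass{article}
% Assumed option: mark highlights the hyphenations actually used;
% the package also writes them to a list file for review.
\usepackage[mark]{lua-check-hyphen}
% Assumed options: hyphenation and rivers enable those checks,
% draft highlights the problems found instead of hiding them.
\usepackage[hyphenation,rivers,draft]{impnattypo}
\begin{document}
Some long text to be checked ...
\end{document}
```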

mbork
  • Thanks for the links! I didn't know about lua-check-hyphen; as for impnattypo, it doesn't answer my quantification need, but it's quite impressive; I just read on TeX.SX that it can detect rivers! – Xavier May 23 '13 at 00:06