
TeX won’t hyphenate a “word” if the word already contains a “discretionary”. As a consequence, compound words such as integro-differential and Rajamahendravaram--Visakhapatnam will not be hyphenated, since TeX inserts an empty discretionary after each explicit hyphen and dash.

I know how to trick TeX into hyphenating these compound words (there are plenty of similar questions on this site already, each with great answers). My question is about the kinds of items that should go into the horizontal list.

In TeX by Topic, Victor Eijkhout writes

\def\={\penalty10000 \hskip0pt -\penalty0 \hskip0pt\relax}
... integro\=differential equations...

and I don’t think I fully agree with this approach.

The \penalty0 is the most problematic: I think it should be \penalty\exhyphenpenalty (or can it simply be \penalty10000, because there is already an empty discretionary in front of it?).

Secondly, the pre-hyphen sequence \penalty10000 \hskip0pt can be simplified to a single \kern0pt.

Lastly—since kern was mentioned—this approach doesn’t address implicit kerns around the hyphen and dashes. For example, the font could contain instructions that “o followed by - has positive kerning of 0.03em” and “en dash followed by V has negative kerning of 0.09em”.
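To check whether a given font really has such kerns, one can compare the width of the natural pair with the width of the same pair separated by an explicit zero kern. The following is only a sketch: \measurekern and \kerndimen are hypothetical names, and it assumes the two arguments do not form a ligature with each other.

```tex
% plain TeX sketch; \measurekern and \kerndimen are hypothetical names
\newdimen\kerndimen
% Set \kerndimen to the font kern between #1 and #2 in the current font.
% Assumes #1 and #2 do not form a ligature with each other.
\def\measurekern#1#2{%
  \setbox0=\hbox{#1#2}% font kern (if any) is included here
  \setbox2=\hbox{#1\kern0pt #2}% the explicit kern blocks the font kern
  \kerndimen=\wd0 \advance\kerndimen by -\wd2 }
\measurekern{o}{-}\message{kern o/hyphen: \the\kerndimen}
\measurekern{--}{V}\message{kern en dash/V: \the\kerndimen}
\bye
```

The 0.03em and 0.09em figures above are illustrative values only; whatever \measurekern reports for the font in use is what an explicit replacement would have to reproduce.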

In terms of horizontal list items, these are what I have come up with:

<first word>
<kern>    % kerning between last letter of first word and hyphen/dashes
<hyphen/dashes>
<empty discretionary>
<penalty> % no break penalty, or explicit hyphen penalty?
<glue>    % no stretch nor shrink, kerning between hyphen/dashes and first letter of second word
<second word>
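One way to check such a list is to build it in a box and let TeX display it. This is a minimal sketch, not a finished macro: \hyphk is a hypothetical name, the kern amounts are the illustrative values from above, and \nobreak is just one of the two penalty candidates.

```tex
% plain TeX sketch; \hyphk is a hypothetical name
\scrollmode                 % keep \showbox from pausing the run
\showboxbreadth=100 \showboxdepth=2 \tracingonline=1
% #1 = kern before the hyphen, #2 = hyphen or dashes,
% #3 = fixed glue after the hyphen (no stretch, no shrink)
\def\hyphk#1#2#3{\kern#1\relax #2\nobreak\hskip#3\relax}
\setbox0=\hbox{integro\hyphk{0.03em}{-}{0pt}differential}
\showbox0
\setbox0=\hbox{Rajamahendravaram\hyphk{0pt}{--}{-0.09em}Visakhapatnam}
\showbox0
\bye
```

The \showbox output should list the explicit kern, the hyphen or dash character, the automatically inserted \discretionary, the penalty, and the glue, in that order, between the two words.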

Am I on the right track? Is there something that I have missed?


There seems to be a misconception that “a \nobreak following an explicit hyphen is wrong because it prohibits a line break after this hyphen”. I disagree, and I hope the following plain TeX example will clear up this misconception:

% plain TeX example
\hsize=24pc
\catcode`\@=11

\def\hyph{\kern\z@-\nobreak\hskip\z@skip}
% This allows both pre- and post-hyphen words to be hyphenated,
% and the \nobreak does not prohibit line break.

\noindent Here is a rather complicated and hard-to-understand integro\hyph differential equation. Maybe some experts understand this integro\hyph differential equation. And maybe some of them can solve this integro\hyph differential equation.

\bigskip

\def\hyph{-\penalty\z@\hskip\z@skip}
% This does not allow pre-hyphen word to be hyphenated.
% Also, breaking at explicit hyphen results in zero penalty
% instead of \exhyphenpenalty being charged.

\noindent Here is a rather complicated and hard-to-understand integro\hyph differential equation. Maybe some experts understand this integro\hyph differential equation. And maybe some of them can solve this integro\hyph differential equation.

\bye

[Image: hyphen]

More on \def\hyph{-\penalty\z@\hskip\z@skip}: Apart from the wrong penalty being charged (zero instead of \exhyphenpenalty), this also messes up the demerits calculation. Since the break now happens at the penalty, \doublehyphendemerits and \finalhyphendemerits will have no influence on this explicit hyphen, which is wrong. So, by this logic, even \penalty\exhyphenpenalty is incorrect, because the break should not happen at a penalty! The only correct post-hyphen penalty is \nobreak.
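The difference can be inspected with \tracingparagraphs, which logs each feasible breakpoint together with the kind of node it occurs at. The following is a sketch under the same settings as the example above; \hyphA, \hyphB and \sample are hypothetical names introduced only to run both definitions in one file.

```tex
% plain TeX sketch; \hyphA, \hyphB and \sample are hypothetical names
\hsize=24pc
\tracingparagraphs=1 \tracingonline=1
\catcode`\@=11
\def\hyphA{\kern\z@-\nobreak\hskip\z@skip} % break only at the discretionary
\def\hyphB{-\penalty\z@\hskip\z@skip}      % break also allowed at the penalty
\catcode`\@=12
% Typeset the same paragraph with the macro given as #1.
\def\sample#1{\noindent Here is a rather complicated and
  hard-to-understand integro#1differential equation. Maybe some
  experts understand this integro#1differential equation. And maybe
  some of them can solve this integro#1differential equation.\par}
\sample\hyphA
\bigskip
\sample\hyphB
\bye
```

Wherever a break after the hyphen is feasible, the trace for \hyphA should report it as @\discretionary with the \exhyphenpenalty value as p, while the trace for \hyphB should also offer an @\penalty breakpoint with p=0 at the same place.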

Ruixi Zhang
  • Instead of "discretionary", you should be thinking "explicit hyphen". A discretionary is added only by an author saying "if a hyphen is needed here, feel free to add it." (In that case, if additional hyphens may be needed, the author needs to add more discretionaries.) The coding of \= is intended to avoid the rule that says "don't consider hyphenating a word that already contains a hyphen." tugboat.cls defines \def\hyph{-\penalty\z@\hskip\z@skip } to use for this purpose. – barbara beeton Mar 10 '22 at 17:22
  • @barbarabeeton After each “explicit hyphen”, an “empty discretionary” \discretionary{}{}{} is automatically inserted. Knuth still calls this auto-inserted thing “a discretionary item” nonetheless. The problem with the post-explicit-hyphen code is the \penalty\z@ part: zero penalty will be charged when breaking after the hyphen in \hyph, but an \exhyphenpenalty should be charged. – Ruixi Zhang Mar 10 '22 at 18:38
  • @barbarabeeton So a more “correct” way to define \hyph would be \def\hyph{-\penalty10000 \hskip\z@skip} or simply \def\hyph{-\nobreak\hskip\z@skip}. The \nobreak does NOT prohibit breaking after the hyphen—breaking can still happen at the auto-inserted empty discretionary and such a break will be charged a penalty of \exhyphenpenalty, as desired. – Ruixi Zhang Mar 10 '22 at 18:48
  • @barbarabeeton Secondly, this \hyph will only enable hyphenation in the post-hyphen word. The pre-hyphen word still won’t be hyphenated. So an even more “correct” definition would be \def\hyph{\kern\z@-\nobreak\hskip\z@skip}. And lastly, which was my third point in this post, what about the kerning around the hyphen/dashes? Do authors just write Rajamahendravaram\hyph\hskip-0.09em\relax Visakhapatnam? – Ruixi Zhang Mar 10 '22 at 19:07
  • The assumption is (almost?) always that a break should be permitted, even encouraged, after an explicit hyphen, so \nobreak or \penalty10000 following the hyphen is *always wrong*. (If used, it should always precede the hyphen; I'm not sure how to express this properly in Arabic and other RTL languages.) Since actual problems are almost always found when text is (nearly) final, adding \hyph is (or should be) intentional, and it's probably not necessary to be picky. – barbara beeton Mar 10 '22 at 19:07
  • @barbarabeeton The \nobreak following the explicit hyphen will be bypassed and will not prohibit line break. Line break happens at the empty discretionary between the explicit hyphen and the \nobreak, after which the \nobreak and the zero skip will disappear (being discardable). – Ruixi Zhang Mar 10 '22 at 19:13
  • Maybe Why can words with hyphen char not be hyphenated? would be helpful. (Many years ago, I had this all worked out, but I've forgotten the chain of logic. Now I just use the already-constructed workaround.) – barbara beeton Mar 10 '22 at 19:28
  • @barbarabeeton That post (along with TeX by Topic) is the origin of my question/frustration… I added a concrete plain TeX example to show the use of \nobreak after the explicit hyphen, and I hope this example is finally convincing enough. – Ruixi Zhang Mar 10 '22 at 20:21
  • Your plain TeX example has demonstrated the error in my understanding. (And in the TUGboat practice for 40+ years.) I will try to formulate an answer, but it may take a few days. – barbara beeton Mar 11 '22 at 01:38
  • @barbarabeeton A few days? ;) – cfr Nov 15 '23 at 06:34
  • @cfr Maybe I should get a paper published in TUGboat, you know, in hopes of permanently banishing this misconception (TeX by Topic, I’m looking at you!) The more I think about it, the more apparent it is that the break should not occur at a penalty. – Ruixi Zhang Nov 15 '23 at 14:15
  • Honestly, I don't understand it. I hoped an answer might help. – cfr Nov 15 '23 at 16:21

1 Answer


FWIW, as far as I can discern, you are correct on all counts. Is that the answer you're looking for :)? A TUGboat article would be welcome.

As far as I know, Victor is not actively maintaining TbT and has no plans to ever publish another version.

As for your question about kerns, in my experience, authors simply use \hyph. They don't insert explicit kerns, and are probably unaware of the difference in the output. Something we could easily miss too, for that matter.

Although it is nicely symmetric to allow hyphenation either before or after the explicit hyphen, semantically and typographically, I'm not sure it's desirable (apart from the pre-hyphen text being exceedingly long). I don't recall ever seeing foo-bar-baz-quux with explicit hyphens on the following line. As a reader, I think I would be pretty surprised by that ...

Karl Berry