Assume I have a word like Baden-Württemberg. TeX won't hyphenate either of the two word parts. Why?

[Image: no hyphenation in the compound word]

Why isn't it something like this:

[Image: the compound word with small ticks at the expected hyphenation points]

where the small ticks indicate possible hyphenation points. A technical (TeXnical) explanation is welcome.

BTW: I am not asking how to circumvent this (by using the babel shorthand "= for example).

lockstep
topskip

2 Answers

The TeXbook, page 454, the second-to-last double dangerous bend paragraph:

If a trial word l1 … ln has been found by this process, hyphenation will still be abandoned unless n ≥ λ + ρ, where λ = max(1,|\lefthyphenmin|) and ρ = max(1,|\righthyphenmin|). (Plain TeX takes λ = 2 and ρ = 3.) Furthermore, the items immediately following the trial word must consist of zero or more characters, ligatures, and implicit kerns, followed immediately by either glue or an explicit kern or a penalty item or a whatsit or an item of vertical mode material from \mark, \insert, or \vadjust. Thus, a box or rule or math formula or discretionary following too closely upon the trial word will inhibit hyphenation. (Since TeX inserts empty discretionaries after explicit hyphens, these rules imply that already-hyphenated compound words will not be further hyphenated by the algorithm.)

An explicit hyphen is a character whose character code matches the font's \hyphenchar value, or a ligature that ends with such a character (which is why -- and --- also inhibit hyphenation).

Indeed, if you try the following example, you'll see that TeX hyphenates the compound word:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
\hyphenchar\font=\string"7F

\parbox{1pt}{In Baden-W\"urttemberg}

\end{document}

The result is

In
Ba-
den-Würt-
tem-
berg

T1-encoded fonts have, in position 0x7F, a character that looks identical to the normal hyphen. Once \hyphenchar points to this slot, the normal hyphen (position 0x2D) is no longer treated as an explicit hyphen, so it no longer inhibits hyphenation.
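To see the effect on your own setup, the following sketch may help. It typesets the current font's \hyphenchar and uses \showhyphens to write the hyphenation points TeX finds to the log file, once before and once after the change. It assumes pdfLaTeX with the default T1 fonts; 127 is simply "7F written in decimal.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
% Code of the current font's hyphen character (45 is the ordinary hyphen).
Before: \the\hyphenchar\font

% \showhyphens writes the hyphenation points TeX finds to the log file.
\showhyphens{Baden-W\"urttemberg}

\hyphenchar\font=127 % same slot as \string"7F, written in decimal

After: \the\hyphenchar\font

\showhyphens{Baden-W\"urttemberg}
\end{document}
```

Compare the two "Underfull \hbox" entries in the log: they list the word with whatever break points TeX found in each case.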

egreg
  • 8 consecutive lines with hyphenated words in the TeXbook (the mentioned paragraph). That should be worth a Knuth-error-award-cheque. – topskip Jul 13 '12 at 19:55
  • @PatrickGundlach Read at the bottom of page 451. :) – egreg Jul 13 '12 at 19:59
  • This solution works fine for me if I just load fontenc with the T1 encoding so that it uses cm-super. However, if I load lmodern as well, for example, it doesn't seem to work. Looking at the encoding files, both seem to put hyphen.alt in that spot. Does anybody know why it doesn't work with lmodern? – cfr Nov 26 '13 at 23:07
  • @cfr I get the same result with or without lmodern. – egreg Nov 26 '13 at 23:12
  • Thanks. That means something complicated is going on. Oh well. Time for MWEs. Currently working around the worst cases with \babelhyphen{hard}. However, there is something odd about my hyphenation as it is generally screwed in the bibliography so maybe that's related. Thanks again for the info. – cfr Nov 28 '13 at 01:55
  • I can reproduce the problem with a completely minimal example - the trick works great until I add lmodern. As soon as I load lmodern, it stops working. The only vaguely suspicious thing in the output is that it is loading babel's hyphenation patterns although I didn't load babel. But I think that is built into the latex format somehow and so standard. Unfortunately, I don't have room to give the Minimal Broken Example here or I would in the hope that I'm missing the obvious... – cfr Nov 28 '13 at 23:41
  • EDIT: Never mind. I'm an idiot. If you load lmodern, it matters whether you switch the character before or after \begin{document}. If you use the default fonts, it makes no difference. – cfr Nov 29 '13 at 00:01
  • I don't completely get why \hyphenchar\font=\string"7F enables hyphenation of compound words? Also it seems that then hyphenating at the hyphen of the compound word isn't possible any more? – cgnieder Feb 27 '14 at 10:43
  • @cgnieder The character that makes TeX stop trying hyphenation is the one corresponding to the \hyphenchar for the current font. Of course, changing it makes the normal hyphen not considered an explicit hyphen character any more, so it is treated like any other character; since no hyphenation pattern has it, no hyphenation can take place at it. – egreg Feb 27 '14 at 10:48
  • @egreg Am I right that changing \hyphenchar is not really a practical »solution«? One would need to change it for all fonts and font series in use to get uniform behaviour for a document... (and still a "= may be needed in Baden-Württemberg so there isn't much gained when typing, anyway) – cgnieder Feb 27 '14 at 11:49
  • @cgnieder Yes, you're right. IIRC, there should be a package that can hook the choice of \hyphenchar at every font loading; but also hyphenation patterns should be modified to have possible breaks at - (the usual hyphen). – egreg Feb 27 '14 at 11:55
  • @egreg: Is there any way to use your \hyphenchar\font=\string"7F solution in case of XeLaTeX or LuaLaTeX and fontspec? – LaTechneuse Mar 09 '16 at 12:16
  • @LaTechneuse Unicode has U+2011 NON-BREAKING HYPHEN which should be the corresponding choice. – egreg Mar 09 '16 at 14:47
  • How do I use it with \hyphenchar\font=\string"7F? Replacing "7f by U+2011 seems wrong :) – LaTechneuse Mar 09 '16 at 16:04
  • @LaTechneuse \hyphenchar\font="2011; but look at the documentation of fontspec for HyphenChar – egreg Mar 09 '16 at 17:29
  • @LaTechneuse - See the posting Switch meaning of hyphenation commands for a LuaLaTeX-based solution. It doesn't actually modify the meaning of -. Instead, it replaces all instances of - on the fly with "=, the babel shorthand for a hyphenation character that permits hyphenation of what comes before and after "=. – Mico Aug 07 '16 at 20:11
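
As noted in the comments above, \hyphenchar is a per-font parameter, so every font the document actually uses needs its own assignment. A minimal sketch for pdfLaTeX with T1 fonts, assuming only the upright, italic and bold fonts at the default size occur (any other shape, series or size would need its own line):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
% 127 is "7F in decimal; writing it in decimal avoids the need for \string,
% because babel's ngerman option makes " an active shorthand character.
\hyphenchar\font=127 % current (upright medium) font
% \hyphenchar assignments are always global, so the braces below only
% limit the font switch, not the new hyphen character.
{\itshape \hyphenchar\font=127 }% italic
{\bfseries \hyphenchar\font=127 }% bold

\parbox{1pt}{In Baden-W\"urttemberg, \textit{Baden-W\"urttemberg},
  \textbf{Baden-W\"urttemberg}}
\end{document}
```

This quickly becomes tedious for a real document, which is the practical objection raised in the comments.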

Setting \hyphenchar\font=\string"7F does not seem to be the right workaround for this problem, since it affects only the current font.

A better approach seems to be setting \defaulthyphenchar=127, which does not depend on the current font: TeX assigns this value as the \hyphenchar of every font it loads afterwards. Hyphenation inside acronym definitions then comes out right as well. BUT there are still issues with hyphens in running text, for example:

Baden-W\"urttemberg
\gls{BP}-Test

[Image: typeset output of the two lines above]

If you look at "den-Würt-", the compound is still not broken at the hyphen itself, so both parts stay on one line. The last example, "BP-Test", isn't split either. The only workaround I have found is to write "= instead of - in the text, which works for now. But I would appreciate a better solution...
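
One caveat about \defaulthyphenchar that may be worth checking: TeX copies its value into \hyphenchar only at the moment a font is loaded, so fonts loaded before the assignment keep their old hyphen character. The sketch below sets it right after \documentclass (an assumption about where "early enough" is; which fonts are already loaded at that point depends on the format and the packages in use) and prints the value actually in effect for two fonts, so the result can be verified instead of assumed.

```latex
\documentclass{article}
% Assumption: set early, before the T1 text fonts are loaded, because TeX
% copies \defaulthyphenchar into \hyphenchar only when a font is loaded.
\defaulthyphenchar=127
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}
% Print the hyphen character actually in effect for two fonts.
Upright: \the\hyphenchar\font\par
{\itshape Italic: \the\hyphenchar\font\par}

\parbox{1pt}{In Baden-W\"urttemberg}
\end{document}
```

If a font still reports 45, it was loaded before the assignment took effect and would need an explicit \hyphenchar\font=127 instead.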

Here is the MWE:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}

%\usepackage{lmodern}
\usepackage{mathptmx}

\usepackage[nonumberlist, acronym, nopostdot, section]{glossaries}

\newacronym{BP}{BP}{Borderline-Pers\"onlichkeitsst\"orung}

\defaulthyphenchar=127

\begin{document}
\parbox{1pt}{In Baden"=W\"urttemberg}

\parbox{1pt}{In \gls{BP}}

\parbox{1pt}{\gls{BP}"=Test}

\end{document}

[Image: typeset output of the MWE]

Martin