39

(Remark: Maybe this is a typically German problem, I'm not sure in which other languages that might be relevant)

Sometimes a word starts with a hyphen because it shares a component with a preceding word; e.g., Werkstoffforschung und -entwicklung is a shorter (and, I think, more elegant) way of saying Werkstoffforschung und Werkstoffentwicklung.

However, if the word that starts with the hyphen falls at a line break, the hyphen can end up at the end of one line while the word itself starts the next line.

Minimal example:

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}  
\begin{document}
Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik
    bzw. -schwierigkeit zu demonstrieren 
\end{document}

(I hope the example reproduces the problem for you. When I typeset it with pdflatex, the "-" ends up at the end of the first line and "schwierigkeit" starts the second line, which is not wanted.)

Is there a smart* way to avoid that problem?
* pdflatex should handle this automatically and "know" that a hyphen at the beginning of a word always has to stay connected to that word and must not be separated from it.

doncherry
  • 54,637

4 Answers

34

UPDATE: babel v3.9, released in March 2013, introduces a set of \babelhyphen macros; see section 1.6 of the manual for details. In particular, \babelhyphen{nobreak} (the non-starred version) provides a non-breakable hyphen that still allows hyphenation in the rest of the word. For the present question, this may be used to define a new shorthand that removes the need to look up allowed breakpoints. (Note: Sometimes the first allowed breakpoint will be located three or fewer characters after the non-breakable hyphen; if you consider such a breakpoint inadequate, use my original answer.)

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}  
% The following requires babel v3.9 (released March 2013)
\defineshorthand[ngerman]{"+}{\babelhyphen{nobreak}}
\begin{document}
Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik
    bzw. "+schwierigkeit zu demonstrieren 
\end{document}
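
For a one-off case, the macro can presumably also be typed directly in the text instead of going through a shorthand. The following is a minimal sketch under that assumption, reusing the example from the question; like the shorthand version, it needs babel v3.9 or later.

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}
\begin{document}
% \babelhyphen{nobreak} prints a hyphen with no line break after it;
% the rest of the word may still be hyphenated automatically.
Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik
    bzw. \babelhyphen{nobreak}schwierigkeit zu demonstrieren
\end{document}

The shorthand is the better choice when such words occur often, since it keeps the source readable.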

ORIGINAL ANSWER: Use babel's "~ shorthand to add an explicit hyphen after which a line break is prohibited; supplement this with the "- shorthand to specify the first allowed follow-up breakpoint. (You will have to look up the possible breakpoints with \showhyphens{Schwierigkeit}.) See pp. 5--7 of the documentation of the german package for details and other shorthands. (Note: the babel shorthands originate from the german package, which is considered obsolete nowadays.)

By the way, using "~ but not "- in your example would produce an overfull hbox.

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}  
\begin{document}
Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik
    bzw. "~schwie"-rigkeit zu demonstrieren 
\end{document}
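
To look up the breakpoints TeX knows for a word, a small test file along the following lines should do; this is only a sketch, and the file itself is an illustration that is not part of the original answer. \showhyphens typesets nothing and instead writes the hyphenated word to the terminal and the log file.

\documentclass{article}
\usepackage[ngerman]{babel}
\begin{document}
% Reports the breakpoints found by the active (ngerman) hyphenation
% patterns; check the log file for the result.
\showhyphens{Schwierigkeit}
\end{document}

The first breakpoint reported there is the one to mark with "- after the "~ hyphen.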

lockstep
  • 250,273
  • @lockstep: thanks! I feared that something with \mbox{} would be the recommended solution for that problem, but I dislike this kind of solution very much as it makes the source text more difficult to read and it requires me to think about such things. IMHO it is contrary to the philosophy of TeX (as I understand it, to separate text editing and typesetting to avoid distractions) if I have to think all the time about how to trick LaTeX into doing it the correct way. :-( ... – MostlyHarmless Mar 12 '11 at 21:12
  • so isn't it possible (and wouldn't it make sense) to let LaTeX never separate a hyphen from the following characters or words if it follows a space (or in other words to never leave a lonely hyphen at the end of a line)? In what situation could it make sense what LaTeX does in this case? – MostlyHarmless Mar 12 '11 at 21:16
  • @Martin: I don't know if your suggested rule could be implemented. As for "making sense": I guess it made sense to the (English-speaking) developers of (La)TeX not to worry about words starting with a hyphen. – lockstep Mar 12 '11 at 21:21
  • Another option would be to use the \discretionary primitive (perhaps that's what ngerman does internally?): Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik bzw. \discretionary{-}{-}{-}schwie-rigkeit zu demonstrieren – Gonzalo Medina Mar 12 '11 at 21:45
  • @lockstep: From the TeXbook, page 96: "Some German words traditionally change their spelling when they are split between lines. For example, ‘backen’ becomes ‘bak-ken’ and ‘Bettuch’ becomes ‘Bett-tuch’." So Knuth did know enough about German to worry about German hyphenation! – Hendrik Vogt Mar 12 '11 at 22:12
  • Good workaround, but the really annoying part is that you have to specify the first allowed follow-up breakpoint manually. – Hendrik Vogt Mar 12 '11 at 22:14
  • @Hendrik: True, but words starting with a hyphen, i.e. a hyphen that is never a legal breakpoint, may well have been a problem too arcane for Knuth. – lockstep Mar 12 '11 at 22:17
  • @Hendrik: Here's more about the "really annoying part". – lockstep Mar 12 '11 at 22:19
  • @Hendrik, @lockstep: Thanks for your comments. No matter whether Knuth knew about this particular problem or not, I wonder how it could make any sense to tolerate a lonely hyphen at the end of a line, and maybe it would be possible to avoid it in general. (Maybe things are different with a -- dash (Gedankenstrich), which IMHO should stay at the end of the line, or shouldn't it, typographically?) – MostlyHarmless Mar 13 '11 at 04:45
  • @Hendrik, @lockstep: I had seen [lockstep's other question] but I had not really read all the comments carefully: could \lefthyphenmin=4 help us avoid that? It has no influence in my example when I try it. – MostlyHarmless Mar 13 '11 at 05:01
  • @lockstep: Thanks for your comments. Hope my "really annoying part" didn't offend you. (I do find it annoying, but after all, bugs are annoying, and I'd regard this as a bug even if one can't blame Knuth for it.) – Hendrik Vogt Mar 13 '11 at 07:46
  • @Hendrik: No worries! :-) – lockstep Mar 13 '11 at 09:56
  • @Hendrik: so if you consider this a bug (where I happily would agree), where/whom could I ask to fix it? – MostlyHarmless Mar 13 '11 at 13:38
  • 1
    @Martin: Good question. The best would be to fix the internal hyphenation algorithm of TeX, but that won't happen: It's a lot more important that TeX is stable, i.e., the output doesn't depend on whether you compile now or in 10 years. And the discussion here shows that it'll be hard to write a package that fixes the problem. – Hendrik Vogt Mar 13 '11 at 14:28
11

(Edited answer; my initial idea of redefining \- was bad; thanks to lockstep for pointing that out and for suggesting \declare@shorthand.)

Based on Ulrike's solution I came up with a version that does not have the side effect of disabling hyphenation at the - in "Arbeiter-Unfallversicherung"; you also won't have to specify the first allowed follow-up breakpoint (cf. lockstep's answer). I can't tell if it has other side effects. The drawback is that you'll have to type "_ to get a hyphen that disallows a linebreak after it.

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}
\makeatletter
\declare@shorthand{ngerman}{"_}{\hyphenchar\font=-1 -\hyphenchar\font=`\-}
\makeatother
\begin{document}
Das ist ein Mustertext, der dazu dient, diese unsch"one Textsatzproblematik
bzw. "_schwierigkeit zu demonstrieren.
Im n"achsten Satz gibt es einen Test f"ur die Arbeiter-Unfallversicherung.
\end{document}

Note that this solution does not depend on T1-encoding.

Hendrik Vogt
  • 37,935
  • Well yes the solution doesn't depend on T1. But it doesn't depend on the number "127" either. Any number different to the position of the hyphen will work (including -1). – Ulrike Fischer Mar 14 '11 at 13:20
  • @Ulrike: Thanks a lot for this comment. -1 looks indeed more natural in my solution; I've edited the answer accordingly. – Hendrik Vogt Mar 14 '11 at 19:28
  • @Hendrik: I'm not sure if the original definition of \- is without use in German and therefore would plead for defining a new command, possibly in form of a new babel shorthand. – lockstep Mar 14 '11 at 19:39
  • @lockstep: Why would you use \-? But if you have a better idea, please tell me - I've actually thought about it for a few minutes and then came up with \-. – Hendrik Vogt Mar 14 '11 at 19:55
  • @Hendrik: Example of a new babel shorthand: \usepackage[ngerman]{babel} \makeatletter \declare@shorthand{ngerman}{"+}{\hyphenchar\font=-1 -\hyphenchar\font=`\-} \makeatother – lockstep Mar 14 '11 at 19:56
  • @Hendrik: For some words, TeX may find incorrect breakpoints, which should be corrected with \- (i.e., defining the correct break points and prohibiting the ones TeX has found). I know that \hyphenation may also be used, but IIRC the capacity of this TeX macro is limited to 300 words. – lockstep Mar 14 '11 at 20:03
  • @lockstep: Ah, so "- doesn't work for preventing incorrect breakpoints? – Hendrik Vogt Mar 14 '11 at 20:06
  • @Hendrik: Exactly. It only adds new (hopefully correct) break points. – lockstep Mar 14 '11 at 20:11
  • @lockstep: Thanks for insisting that redefining \- is a bad idea. It's even worse: it breaks "-. No wonder, in ngermanb it says \declare@shorthand{ngerman}{"-}{\nobreak\-\bbl@allowhyphens}. – Hendrik Vogt Mar 14 '11 at 21:48
  • The standard "~ already works fine, with the exception that it suppresses hyphenation. So why not simply add a \bbl@allowhyphens to the definition of "~? \declare@shorthand{ngerman}{"~}{\textormath{\leavevmode\hbox{-}}{-}\bbl@allowhyphens} – Ulrike Fischer Mar 15 '11 at 08:42
  • @Ulrike: For phrases like "Gehaltszu- und -abschläge" I'd like to have both possibilities: Allowing a line break after "ab" for automatically created texts and allowing a line break only after "abschlä" for texts with manual correction. – lockstep Mar 15 '11 at 08:59
7

You can use T1-encoding and set \hyphenchar to 127. But you must do it for all fonts, which in the end means that you must correct the font definitions (here, as an example, the entries from T1cmr.fd):

\documentclass[11pt]{scrreprt}

\makeatletter
\providecommand{\EC@family}[5]{%
  \DeclareFontShape{#1}{#2}{#3}{#4}%
  {<5><6><7><8><9><10><10.95><12><14.4>%
   <17.28><20.74><24.88><29.86><35.83>genb*#5}{\hyphenchar\font=127}}
\DeclareFontFamily{T1}{cmr}{}
\EC@family{T1}{cmr}{m}{n}{ecrm}
\EC@family{T1}{cmr}{m}{sl}{ecsl}
\EC@family{T1}{cmr}{m}{it}{ecti}
\EC@family{T1}{cmr}{m}{sc}{eccc}
\EC@family{T1}{cmr}{bx}{n}{ecbx}
\EC@family{T1}{cmr}{b}{n}{ecrb}
\EC@family{T1}{cmr}{bx}{it}{ecbi}
\EC@family{T1}{cmr}{bx}{sl}{ecbl}
\EC@family{T1}{cmr}{bx}{sc}{ecxc}
\EC@family{T1}{cmr}{m}{ui}{ecui}
\makeatother
\usepackage[T1]{fontenc}
\usepackage[ngerman]{babel}
\begin{document}

Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik bzw. -schwierigkeit zu demonstrieren

\end{document}
lockstep
  • 250,273
Ulrike Fischer
  • 327,261
4

Perhaps it's useful to think about breaking the line at this explicit hyphen in terms of how TeX looks at the issue.

According to the TeXbook at the bottom of p. 96, a line break at an explicit hyphen incurs a penalty given by \exhyphenpenalty, whose default is 50 (at least in plain TeX).

So forbidding breaks at the explicit hyphen with \exhyphenpenalty=10000 in the preamble of your document seems to me like the least effort solution. I tested here and it works with your MWE.
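
Applied to the example from the question, this might look as follows; it is a minimal sketch and only the \exhyphenpenalty line is new. A penalty of 10000 is treated by TeX as "never break here", and the setting applies to every explicit hyphen in the document, not only to those at the start of a word.

\documentclass[11pt]{scrreprt}
\usepackage[ngerman]{babel}
% Penalty for a line break after an explicit hyphen;
% 10000 forbids such breaks entirely.
\exhyphenpenalty=10000
\begin{document}
Das ist ein Mustertext, der dazu da ist, um diese aa Textsatzproblematik
    bzw. -schwierigkeit zu demonstrieren
\end{document}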

Mafra
  • 1,615
  • 2
    This would inhibit breaking also at legitimate hyphens. – egreg Nov 26 '12 at 17:02
  • 2
    This disables line breaks after all explicit hyphens, not only those at the start of a word. – lockstep Nov 26 '12 at 17:02
  • 1
    @lockstep: Surely, but I'd say that breaking at explicit hyphens is not always wanted anyway (as in 2-D). Of course you can cook up examples like the Arbeiter-Unfallversicherung from Hendrik above, but TeX could find another break in the paragraph in that case. My solution avoids having to type extra stuff to circumvent the issue -- one can always type things inside mboxes to avoid hyphenation as a last resort. – Mafra Nov 26 '12 at 17:31