
I am writing a German text with many words containing a hyphen (IR-Spektrum, Raman-Spektrum, UHV-Kammer, EPR-Spektroskopie etc.) using pdfLaTeX. In order to allow hyphenation at all reasonable break points I have to enter the hyphen as "= instead of - (IR"=Spektrum, Raman"=Spektrum, UHV"=Kammer, EPR"=Spektroskopie etc.). Given the number of hyphens I have to write, this is getting rather annoying, and the code becomes less (proof-)readable.

Is there a way to interchange the meaning of - and "= in pure text cases (i.e. outside math), while also keeping -- as – (en-dash) and --- as — (em-dash)?

Mico
  • 506,678
Eldrad
  • 279

1 Answer


(Answer revised thoroughly after receiving comments by the OP and by @UlrikeFischer.)

I take it that you want to permit hyphenation in the "subwords" both to the left and to the right of the hyphen that joins a compound word. I further assume that the left-hand subword may end with either a letter or a digit, while the right-hand subword must start with a letter. Examples of such hyphenated compound words are EPR-Spektroskopie, Baden-Württemberg and T1-Schriften. If using LuaLaTeX instead of pdfLaTeX is an option for you, it is straightforward to set up a Lua function that performs the conversion of - to "= and, conversely, of "= to - on the fly.

As you've requested, "= can -- in fact, it must -- be used for instances of hyphens that must be treated as "ordinary" hyphens, i.e., which must not be converted to "= for typesetting purposes. Suppose, for instance, that the siunitx package is loaded and that the document contains the sequence

\SI[per-mode=symbol, tight-spacing]{4e-3}{\meter\per\second}

Observe that this sequence contains three instances of -: between "per" and "mode", between "tight" and "spacing", and between 4e and 3. To make LaTeX treat the first two instances of - as ordinary hyphen characters, you must input this sequence as

\SI[per"=mode=symbol,tight"=spacing]{4e-3}{\meter\per\second}

It is not necessary to replace the third instance of - (in 4e-3) with "=. Why not? Because the - character is not followed by a letter. Thus, the Lua function doesn't operate on 4e-3 to begin with.

The code provided below also features LaTeX macros -- named \ConvertDashOn and \ConvertDashOff -- to switch the Lua function on and off. (The default state is "on".) Being able to switch off the Lua function is useful -- crucial, in fact -- if your document contains verbatim-like material and/or code listings. For such material, it would not be desirable to have "= show up in the typeset output instead of -, right?
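As a rough usage sketch (assuming the \ConvertDashOn and \ConvertDashOff macros from the code below are defined in the preamble), verbatim material can be bracketed as follows. Because the callback acts on each input line as it is read, the switching macros should sit on lines of their own, before and after the protected material:

```latex
Text with EPR-Spektroskopie and UHV-Kammer.

\ConvertDashOff   % lines read after this one are left untouched
\begin{verbatim}
Raman-Spektrum stays Raman-Spektrum
\end{verbatim}
\ConvertDashOn    % switch the conversion back on for the following lines

More text with Raman-Spektrum.
```

Placing \ConvertDashOff on the same line as the material it is meant to protect would come too late, since that line has already passed through the Lua function by the time the macro is executed.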

In case you're curious how the Lua function works: The function performs several global substitutions using the gsub ("global substitution") function and Lua's capture mechanism. Crucially, the function is assigned to LuaTeX's process_input_buffer callback, meaning that it does its work before TeX starts its normal processing.
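To see the three substitution passes in isolation, here is a minimal sketch that writes the converted strings to the terminal and log file. The function name swap_demo is made up for this illustration, and it uses Lua's plain string.gsub rather than unicode.utf8.gsub, so it only recognizes ASCII letters and digits; the placeholder XYZYX is the one used in the answer's code.

```latex
% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{luacode}
\begin{luacode}
-- Hypothetical helper for illustration only (ASCII input assumed)
local function swap_demo ( s )
  -- Pass 1: park every existing "= behind a placeholder
  s = s:gsub ( '(%w)"=(%a)',    '%1XYZYX%2' )
  -- Pass 2: turn each remaining word-internal - into "=
  s = s:gsub ( '(%w)%-(%a)',    '%1"=%2' )
  -- Pass 3: turn the parked "= into an ordinary -
  s = s:gsub ( '(%w)XYZYX(%a)', '%1-%2' )
  return s
end
texio.write_nl ( swap_demo ( 'EPR-Spektroskopie' ) ) -- writes EPR"=Spektroskopie
texio.write_nl ( swap_demo ( 'per"=mode' ) )         -- writes per-mode
texio.write_nl ( swap_demo ( '4e-3' ) )              -- writes 4e-3 (unchanged)
\end{luacode}
\begin{document}
See the log file for the converted strings.
\end{document}
```

The placeholder is what makes the swap work in both directions: without pass 1, the "= produced by pass 2 could not be told apart from the "= that was typed by hand.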

Aside: The babel package defines "= as \penalty\@M-\hskip\z@skip. An inspection of this definition reveals that it may be used safely in math mode. Hence, it is not necessary to prevent the replacement of - with "= from happening for math mode material.


% !TEX TS-program = lualatex
\documentclass[ngerman]{article}
\usepackage{fontspec,unicode-math,babel,luacode}
\usepackage{siunitx} % for `\SI[...]{...}{...}` example

%%% Lua-side code
\begin{luacode}
function dash_to_breakable_dash ( s )
    s = unicode.utf8.gsub ( s, '(%w)"=(%a)',   '%1XYZYX%2' )
    s = unicode.utf8.gsub ( s, '(%w)%-(%a)',   '%1"=%2' ) 
    s = unicode.utf8.gsub ( s, '(%w)XYZYX(%a)', '%1-%2' )
  return s
end
\end{luacode}

%%% TeX-side code
\newcommand\ConvertDashOn{\directlua{ 
  luatexbase.add_to_callback ( "process_input_buffer" , 
  dash_to_breakable_dash , "dash_to_breakable_dash" )}}
\newcommand\ConvertDashOff{\directlua{ 
  luatexbase.remove_from_callback ( "process_input_buffer" , 
  "dash_to_breakable_dash" )}}
\AtBeginDocument{\ConvertDashOn} 

%% Just for this example
\setlength\parindent{0pt}
\usepackage[textwidth=1pt]{geometry} 
\begin{document}
\obeylines % also just for this example

\mbox{Lua function switched \emph{on}}
EPR-Spektroskopie Baden-Württemberg T1-Schriften
\mbox{$x-y-z$}
% Replace '-' with '"=' in 'per-mode' and 'tight-spacing', 
% but not in '4e-3'
\SI[per"=mode=symbol,tight"=spacing]{4e-3}{\meter\per\second}

\bigskip
\mbox{Lua function switched \emph{off}}
\ConvertDashOff
\verb+Raman-Spektrum+ 
\mbox{$x-y-z$}
EPR-Spektroskopie Baden-Württemberg T1-Schriften
\SI[per-mode=symbol,tight-spacing]{4e-3}{\meter\per\second}
\end{document}
Mico
  • 506,678
  • Imho \exhyphenchar -1 does the same. – Ulrike Fischer Aug 07 '16 at 20:35
  • @UlrikeFischer - I'm afraid I have to disagree. First, setting \exhyphenchar -1 and examining, say, EPR-Spektroskopie, I get a spurious second hyphen character after EPR. Second, applying this method to, say, Baden-Württemberg does not appear to allow a line break after "den-". Neither outcome seems desirable. (Aside: I'm running LuaLaTeX, and I am loading the babel package with the option ngerman.) Am I missing something? – Mico Aug 07 '16 at 21:07
  • @UlrikeFischer - I've also checked out the situation with pdfLaTeX, loading inputenc with the option utf8, fontenc with the option T1, and babel with the option ngerman. I now get an en-dash (not a regular dash) after "EPR" in "EPR-Spektroskopie" and after "chen" in "branchen-üblich". Moreover, pdfLaTeX still won't permit a line break after "den" in "Baden-Württemberg". Worse still, it replaces the "ß" in "Gesäß-Muskulatur" with a hideous "SS" (while still featuring an en-dash after "SS"). I'm running MacTeX2016, with all the latest updates applied; MacOSX 10.11.6 "El Capitan". – Mico Aug 07 '16 at 21:17
  • Oh, I overlooked the double hyphen, but it doesn't feel right to replace the hyphen at the input stage. First, it doesn't catch all cases, e.g. T1-Schriften; second, there is too much danger of replacing a hyphen when you don't want to. – Ulrike Fischer Aug 07 '16 at 21:46
  • @UlrikeFischer - Indeed, the code currently doesn't catch cases such as T1-Schriften. (This could be fixed easily by changing '(%a)%-(%a)' to '(%w)%-(%a)' in the second argument of unicode.utf8.gsub.) The OP didn't ask for this functionality, and that's why I didn't provide it. I did point out that there may well be circumstances -- such as verbatim and lstlisting environments -- in which one should not replace - with "=. Can you give other examples of circumstances when the automated hyphen replacement (of the type pursued above) might be "too dangerous", to use your phrase? – Mico Aug 07 '16 at 22:02
  • Your answer looks promising, but unfortunately switching to LuaTeX is not feasible for me right now (maybe in future…). – Eldrad Aug 07 '16 at 22:18
  • @Eldrad - Thanks for this feedback. Oh well, hopefully my answer will still be of some use to other users, who may be able to switch to Lua(La)TeX. – Mico Aug 07 '16 at 22:21
  • I am thinking of other "dangerous" circumstances: What about options in commands like \SI[per-mode=fraction]{3.5e-6}{\kJ\per\mole} (using the package siunitx) or bibliography shortcuts containing a hyphen: autocite{bib:Smith-1} (using biblatex and biber)? – Eldrad Aug 07 '16 at 22:32
  • @Eldrad - Two excellent examples! :-) In the meantime, I've also come up with the case of filenames that contain hyphen characters, in commands such as \includegraphics{<filename-containing-hyphen-characters>}... Maybe the automatic replacement of - with "= isn't that great after all (especially if it's carried out at the input stage). – Mico Aug 07 '16 at 22:37
  • Is there any possibility to distinguish between text that is displayed "as is" in the final pdf and all other characters that belong to commands? If this were possible, you could maybe apply your function only to the former. But I am a total newbie in LaTeX (especially in TeX), so I don't really know what I am talking about. – Eldrad Aug 08 '16 at 18:06
  • @Eldrad - I've rewritten the answer thoroughly to incorporate both your comments and Ulrike's comments. The Lua function now takes care to replace - with "= and "= with -. One can now enter expressions such as \SI[per"=mode=symbol, tight"=spacing]{4e-3}{\meter\per\second} and \cite{andersen-bollerslev:1995}. In fact, the expressions must be entered as \SI[per"=mode=symbol, tight"=spacing]{4e-3}{\meter\per\second} and \cite{andersen"=bollerslev:1995}, respectively. Not sure if this setup is really easier than remembering to input "= only where needed! – Mico Aug 08 '16 at 21:01