
In Polish typography a dash (Pol. myślnik) should not be placed right after a line break, i.e. at the start of a line. Below are incorrectly and correctly typeset samples using an en dash (Pol. półpauza) and an em dash (Pol. pauza).

\documentclass[12pt]{article}
\usepackage[paperwidth=95mm,paperheight=55mm,margin=5mm,right=24mm,marginparsep=5mm]{geometry}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{microtype}
\usepackage{xcolor}
\pagestyle{empty}
\begin{document}
% line break before a dash is a sin according to Polish typography rules
\leavevmode\marginpar{\textsc{\color{purple}źle\\(bad)}}%
To jest maciupeńki test półpauzy -- na Zachodzie nazywanej \emph{en dash}.  {\color{orange}\hfill~--}
\par \emph{Em dash} za to nazywamy pauzą --- obecnie dość rzadko spotykana. {\color{orange}\hfill~---}
\vfill
% line break after a dash -- this is the way it should be done
\leavevmode\marginpar{\textsc{\color{teal}dobrze\\(good)}}%
To jest maciupeńki test półpauzy~-- na Zachodzie nazywanej \emph{en dash}.  {\color{orange}\hfill~--}
\par \emph{Em dash} za to nazywamy pauzą~--- obecnie dość rzadko spotykana. {\color{orange}\hfill~---}
\end{document}


To obtain the correct result I had to put a non-breaking space (tie) before each dash.

Is it possible to fix the behavior of all en/em dashes surrounded by normal spaces in a LaTeX document?

Side note: I am not asking about workarounds that require preprocessing, such as running s/ -- /~-- / in Vim/sed/perl/etc.

lockstep
przemoc
  • Not in a robust way. The ties are the safest method. – egreg Jul 05 '11 at 20:30
  • @egreg: Could you elaborate more on these unhealthy methods? If not as an answer meant to be accepted, then just as bad examples with commentary explaining their badness. – przemoc Jul 05 '11 at 20:45
  • @przemoc: 1) An en dash is never surrounded by spaces. It's always surrounded by characters. 2) It's not good to break either after an en dash or before it. It is better to keep both words, or numbers, before and after the en dash on the same line. 3) The em dash is treated as the hyphen not only in Polish but, as far as I know, in all European languages. – Karl Karlsson Jul 05 '11 at 22:06
  • @Karl: 1) By normal spaces I meant a plain space in the .tex file, not how it is interpreted, because the resulting spaces should sometimes be shorter than normal ones. 2) Agreed, but breaking after is still acceptable, breaking before is not. 3) The em dash (---) is definitely not a hyphen (Pol. dywiz or łącznik) in Polish! The hyphen is -. Moreover, if it is used explicitly (as in compound words), it has its own special rule for breaking, because it must be repeated after the line break (\def\dywiz{\kern0sp\discretionary{-}{-}{-}\penalty10000\hskip0sp\relax} from the polski package). – przemoc Jul 05 '11 at 22:19
  • @Karl: A more complete answer to 1) depends on the language and usage, i.e. on the function of the en dash in the given context. The en dash can be used e.g. as a substitute for the figure dash (U+2012), and then there is no space around it. But if it is used as Pol. myślnik (in the old days only the em dash was used for this purpose), as in my example, or in ranges that are not purely numeric (e.g. 1 stycznia -- 2 lutego [Jan 1 -- Feb 2]), then normal spaces should surround it. I am talking about Polish typography; other countries have their own rules and habits. – przemoc Jul 05 '11 at 23:06
  • @Karl: In German the en dash is used to separate "thoughts" in a sentence ("Gedankenstrich") and is always surrounded by spaces if it is used between words: "foo -- bar". There is no space if a comma or dot follows: "foo --, bar". Breaks before and after the en dash in the first case are OK; in the second case ("--,") the break before should be suppressed. – Ulrike Fischer Jul 06 '11 at 07:26
  • Does "\XeTeXdashbreakstate=1" help if using XeTeX? – morbusg Jul 06 '11 at 08:33
  • @Karl: I have never seen an en dash separating words (not ranges) without spaces. The em dash is (often) not surrounded by spaces in English typography, but that is not the case for many other languages: for example, you ought to use thin/hair spaces in Russian and Latvian, except for ranges. – Andrey Vihrov Jul 06 '11 at 09:45
  • @Karl: Wrt. 1): Even in English writing, many style guides recommend using an en dash surrounded by spaces in places where the (American) writer would commonly use an em dash: [...] For example, the Canadian The Elements of Typographic Style recommends the spaced en dash – like so – and argues that the length and visual magnitude of an em dash "belongs to the padded and corseted aesthetic of Victorian typography." (http://en.wikipedia.org/wiki/Dash). – Daniel Jul 06 '11 at 10:03
  • @morbusg \XeTeXdashbreakstate is not involved as the problem is to break or not at spaces. – egreg Jul 06 '11 at 10:32
  • @Andrey Vihrov: Russian and Latvian traditionally had entirely different symbols for dashes and the hyphen. @Ulrike Fischer: The German DIN standard has been changed many times, so what is tradition, and what is correct... The new standard DIN 5008 suggests a hyphen for ranges. – Karl Karlsson Jul 06 '11 at 14:51
  • @Karl: Can you elaborate on "entirely different symbols"? They just use a dash (usually 1 em or 3/4 em wide), in the case of Russian for at least two centuries already. Links on dash form and typesetting (in Russian): http://ru.wikipedia.org/wiki/%D0%A2%D0%B8%D1%80%D0%B5 http://www.artlebedev.ru/kovodstvo/sections/97/ http://www.paratype.ru/help/term/terms.asp?code=85 – Andrey Vihrov Jul 07 '11 at 12:21

2 Answers

The only way to accomplish the task is to make - an active character and define it so that it expands to a minus sign in math mode, while in text mode it looks ahead to see whether one or two more hyphens follow and acts accordingly.

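A possible implementation with the active hyphen is as follows:

\makeatletter
% Store the ordinary hyphen and dashes before the hyphen becomes active
\def\ah@hyphen{-}
\def\ah@endash{--}
\def\ah@emdash{---}
\catcode`\-=\active
% Math mode: ordinary minus sign; text mode: inspect what follows
\protected\def-{\ifmmode\ah@hyphen\else\expandafter\ah@check\fi}
% A lone hyphen stays a hyphen; otherwise #1 absorbs the second hyphen
\def\ah@check{\@ifnextchar-{\ah@checki}{\ah@hyphen}}
\def\ah@checki#1{\@ifnextchar-{\ah@three}{\ah@two}}

% Replace the preceding space with a tie, print the dash, add a normal space
\def\ah@two{\unskip~\ah@endash\space\ignorespaces}
\def\ah@three#1{\unskip~\ah@emdash\space\ignorespaces}
\makeatother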

There is, however, a way out using Unicode characters. If your document is written in UTF-8 you can say

\usepackage{newunicodechar}
\newunicodechar{–}{\unskip~--\space\ignorespaces}
\newunicodechar{—}{\unskip~---\space\ignorespaces}

where the character in line 2 is U+2013 EN DASH and the one in line 3 is U+2014 EM DASH; using these characters in your source will do what you want. The main problem is that they are almost indistinguishable from each other in a monospaced font. Just to show them, I'll put them in a code box:

– U+2013 EN DASH  
— U+2014 EM DASH

and here's how they appear in a quotation box:

– U+2013 EN DASH
— U+2014 EM DASH

The rendering on screen depends on the font, of course.
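
To see the Unicode approach in context, here is a minimal sketch of a complete document. It assumes pdfLaTeX with a UTF-8 encoded source and reuses the sample sentences from the question; the class and font packages are only illustrative choices, not part of the method.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{newunicodechar}
% U+2013 EN DASH: drop the preceding space, insert a tie, then the dash
\newunicodechar{–}{\unskip~--\space\ignorespaces}
% U+2014 EM DASH: same treatment
\newunicodechar{—}{\unskip~---\space\ignorespaces}
\begin{document}
% The dashes below are typed with ordinary spaces around them
To jest maciupeńki test półpauzy – na Zachodzie nazywanej \emph{en dash}.

\emph{Em dash} za to nazywamy pauzą — obecnie dość rzadko spotykana.
\end{document}
```

Dashes typed as ASCII -- or --- are not affected by these definitions, so only the Unicode characters get the automatic tie.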

egreg
  • – — <- here they are in monospace font. – przemoc Jul 05 '11 at 21:11
  • @przemoc - just edited your contribution into the answer... – Brent.Longborough Nov 15 '11 at 12:38
  • @Brent.Longborough: I thought the idea of showing the dashes as a quote was to make the difference between them visible. As code, they're just as indistinguishable as in the code sample above. – doncherry Nov 15 '11 at 14:19
  • I moved the active character part in your solution to the front, just to keep things together that belong to one thought. – doncherry Nov 15 '11 at 14:23
  • @doncherry: I thought the idea was to illustrate just how indistinguishable they were; have I misinterpreted? If so, I'll put it back, with grovelling apologies. Enrico, care to comment? – Brent.Longborough Nov 15 '11 at 15:09
  • @Brent.Longborough I think that this version can stand: the "active" character solution seems to work correctly and the "indistinguishable" part is sufficiently informative for making a choice. – egreg Nov 15 '11 at 15:13

Inserting ties manually is still a good option: it doesn't have side effects, it is readable and it is easy to train yourself to always type ~---.

That said, the extdash package provides commands for dashes with non-breaking spaces. With the [shortcuts] option the \--- command becomes available and stands for an em dash preceded by a non-breaking space. The space surrounding the dash is also reduced for a better appearance.
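
A minimal sketch of how this might look in a document, assuming pdfLaTeX and a UTF-8 source. The sample sentence is taken from the question; writing \--- directly after the word and relying on the command to handle the surrounding space is my assumption about typical usage, so check the extdash documentation for the exact spacing behavior.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[shortcuts]{extdash}
\begin{document}
% \--- is the shortcut for an em dash that does not allow a line break before it
\emph{Em dash} za to nazywamy pauzą\--- obecnie dość rzadko spotykana.
\end{document}
```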

Andrey Vihrov
  • The extdash package is great. Note that if you also want to forbid line breaks after the dash, you need to use \=== instead. There are also many options, like nospacearound, which removes the half-space automatically added before/after the dash. – tobiasBora Jul 12 '21 at 07:59