soulutf8 strips characters from compound words

Question

I use soulutf8 for underlining in my LaTeX document. However, it seems to be stripping characters from the end of compound words, e.g.:

\ul{$n$-rozměrný}

results in

n-rozměrn

However, if I break the word at the hyphen, i.e.:

\ul{$n$ rozměrný}

the 'ý' character re-appears:

n rozměrný

My document is defined as follows:

\documentclass[dvips,12pt]{article}

\usepackage[czech]{babel}
\usepackage[utf8x]{inputenc}
\usepackage[IL2]{fontenc}
\usepackage{soulutf8}

A similar problem has already been described here; however, the proposed solution doesn't seem to work in my case.

The document is in Czech; removing IL2 and utf8x would make Czech characters disappear. — John Manak, Jul 24 '14 at 10:02
No, with \usepackage[T1]{fontenc} and \usepackage[utf8]{inputenc}. — egreg, Jul 24 '14 at 10:30
The dash - is active in czech and soul doesn't like this. Hide it in an \mbox or use \shorthandoff{-}. — Ulrike Fischer, Jul 24 '14 at 10:38
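
A minimal sketch of the two workarounds from the last comment, assuming the hyphen really is a babel shorthand in your setup (as it is in the question). The exact placement of \mbox and \shorthandoff is one reading of the comment, not something spelled out there, and the T1/utf8 preamble follows the earlier suggestion rather than the question's IL2/utf8x:

```latex
\documentclass[12pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[czech]{babel}
\usepackage{soulutf8}

\begin{document}

% Workaround 1: hide the active hyphen from soul inside an \mbox
\ul{$n$\mbox{-}rozměrný}

% Workaround 2: switch the shorthand off before \ul reads its argument,
% then switch it back on (assumes - is currently a babel shorthand)
\shorthandoff{-}\ul{$n$-rozměrný}\shorthandon{-}

\end{document}
```

\shorthandoff has to come before \ul rather than inside its argument, because category codes are fixed at the moment the argument is read. While the shorthand is off, the hyphen is an ordinary character, so whatever babel-czech does for an active hyphen is not applied to that word.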

egreg · Accepted Answer · 2014-07-24T11:48:26.543

The problem is that babel-czech makes the hyphen a shorthand (active) character, but soul has no check for an active -. As a result, as many characters are mangled at the end of the text as there are hyphens in it.
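
To check whether this applies to a given setup, a small test document can print the category code of the hyphen inside the document body: 13 means active, 12 means an ordinary "other" character. This is only a diagnostic sketch under the assumption that babel is loaded as in the question; it is not part of the fix:

```latex
\documentclass{article}
\usepackage[czech]{babel}

\begin{document}

% Typesets the current category code of the hyphen
% (`\- denotes the character code of - regardless of its catcode)
\the\catcode`\-

\end{document}
```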

Here is a workaround using regexpatch:

\documentclass[12pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[czech]{babel}
\usepackage{soulutf8}

\usepackage{regexpatch}
\makeatletter
\regexpatchcmd*{\SOUL@eval}
  {\cO-}
  {\cA-}
  {}{}
\makeatother

\begin{document}

\ul{$n$-rozměrný}

\ul{$n$ rozměrný}

\ul{a-b-cd}

\end{document}


Note that the IL2 encoding is obsolete and not recommended any more. I'd prefer utf8 to utf8x, as it's more stable.

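It's also possible without regexpatch, though less directly: first replace the ordinary hyphens in \SOUL@eval with a placeholder, then make - active and replace the placeholder with the active hyphen.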

\documentclass[12pt]{article}

\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[czech]{babel}
\usepackage{soulutf8}

\usepackage{etoolbox}
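\makeatletter
% \patchcmd replaces only the first match, so each step is applied twice.
% Step 1: swap the ordinary (catcode 12) hyphens for a placeholder control sequence.
\patchcmd{\SOUL@eval}{-}{\SOUL@@@hyphen}{}{}
\patchcmd{\SOUL@eval}{-}{\SOUL@@@hyphen}{}{}
% Step 2: with - made active, swap the placeholder for an active hyphen.
\catcode`-=\active
\patchcmd{\SOUL@eval}{\SOUL@@@hyphen}{-}{}{}
\patchcmd{\SOUL@eval}{\SOUL@@@hyphen}{-}{}{}
% Restore the ordinary catcode for the rest of the preamble.
\catcode`-=12
\makeatother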

\begin{document}

\ul{$n$-rozměrný}

\ul{$n$ rozměrný}

\ul{a-b-cd}

\end{document}

Thanks, the regexpatch fixes it (just as the mbox proposed in the comments). The main reason I stick with IL2 is that if I use T1 and copy "n-rozměrný" from the output PDF, I get: "n ✲r♦3♠➙r♥ý", whereas everything works OK when the encoding is set to IL2. — John Manak, Jul 24 '14 at 11:03
@JohnManak No, if you have a decent TeX distribution with Type1 fonts for the T1 encoded ones. Try adding \usepackage{lmodern}. You may want to look also at Slovak (and Czech) babel gives problems with \cmidrule and \cline for another problem with babel-czech. — egreg, Jul 24 '14 at 11:08
I've just been looking into this right now as well and yes, \usepackage{lmodern} fixes that problem too. — John Manak, Jul 24 '14 at 11:09
