\documentclass{article}
\usepackage[utf8]{inputenc}
\DeclareUnicodeCharacter{2026}{\dots}% …
\usepackage{amsmath}
\begin{document}

\[\left\{a \dots \right\}\]
\[\left\{a … \right\}\]

\end{document}

The spacing around the ellipsis is not the same for \dots and … in the document above (the third case is what I get if I remove the \DeclareUnicodeCharacter{2026}{\dots} line):

The Unicode ellipsis has less space to the right with \DeclareUnicodeCharacter. Without \DeclareUnicodeCharacter, it has no space to the left (but, I think, the correct spacing to the right).

How can I get the same spacing with \dots and …? Ideally this would be done without changing anything in the formula itself, only in the \DeclareUnicodeCharacter code; otherwise I'm likely to forget the hack most of the time, and the formulas become less concise.

The problem disappears if I'm not using amsmath.

Suzanne Soy
  • Just a guess: Does replacing \dots by \dots{} in the \DeclareUnicodeCharacter solve the problem? – crixstox Feb 09 '16 at 13:54
  • @crixstox Good idea, but unfortunately it doesn't change anything :-( . Also, it looks like something smaller than an actual space. I'm more interested in the explanation of why this happens, though. As a quick fix I can always add a \, in the \DeclareUnicodeCharacter, and hope that it doesn't cause extraneous spacing in other cases. – Suzanne Soy Feb 09 '16 at 14:37
  • @crixstox No, that would break \dots, which looks at the following token to decide whether it should become \ldots or \cdots. – egreg Feb 09 '16 at 14:39
  • @egreg I didn't know that. Thanks for the explanation! – crixstox Feb 09 '16 at 14:49
  • @GeorgesDupéron It's curious, because using … doesn't break the choice between \cdots and \ldots; I tried a+…+b and a,…,b and got the expected behavior. However, it seems to be specific to \}, because if you try (a\dots) and (a…) you get the same spacing. – egreg Feb 09 '16 at 16:14
  • @egreg try a+… +b :-) – David Carlisle Feb 09 '16 at 16:47

1 Answer


\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}

%\DeclareUnicodeCharacter{2026}{\dots}% …
% \u8:… ->\IeC {\dots }
\expandafter\def\csname u8:\detokenize{…}\endcsname#1{\dots#1}


\begin{document}


$\left\{a \dots \right\}$\vline

$\left\{a … \right\}$\vline

\end{document}

\dots looks ahead at the next token to see whether to use low or centered dots. \DeclareUnicodeCharacter wraps its definition in \IeC{...} where \IeC (here) is just a macro that does nothing but use its argument.

But the main problem is that \dots uses \futurelet (rather than, say, \@ifnextchar), so it does not skip over white space while looking for the next token. This does not usually matter for \dots, as white space is ignored after a command name, but it is not ignored after … (which is the issue that \IeC addresses: it makes sure that inputenc characters don't have definitions ending with a token that would force white space to be ignored if written to an external file such as the table of contents).
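The difference between the two cases can be reproduced in isolation. The following is a minimal sketch, not part of amsmath or inputenc: \peek, \showpeek and \wrap are hypothetical names introduced only for this illustration, with \wrap standing in for what \IeC does in normal typesetting (return its argument).

```latex
\documentclass{article}
\makeatletter
% \peek (hypothetical) reports whether the token after it is a space.
% Like amsmath's \dots, it uses \futurelet, which does not skip spaces.
\def\peek{\futurelet\@let@token\showpeek}
\def\showpeek{\ifx\@let@token\@sptoken[space]\else[no space]\fi}
\makeatother
% \wrap (hypothetical) simply returns its argument.
\def\wrap#1{#1}
\begin{document}

\peek x        % spaces after a control word are dropped: [no space]

\wrap{\peek} x % the space after the closing brace survives: [space]

\end{document}
```

In the second line the space is tokenized after a closing brace rather than after a control word, so it reaches \futurelet as a real token, which is the situation … ends up in.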

So here I define … to take an argument and return it, which is a (more or less safe) way to force the space after the character to be ignored, so the \dots tests see \} and not a space. The only unsafe part is that {…} would give a parse error, as the argument parser would hit the end of the group while looking for #1.

David Carlisle
  • This works well, except in one case: if you write \begin{document} {…} \end{document}, it will complain because … has no argument. Nice to see how utf8 is handled :) . – Suzanne Soy Feb 09 '16 at 16:36
  • @GeorgesDupéron hey I was writing that:-) – David Carlisle Feb 09 '16 at 16:45
  • It seems \def\mydots#1{\dots#1}\expandafter\def\csname u8:\detokenize{…}\endcsname{\@ifnextchar\egroup{\expandafter\dots}{\expandafter\mydots}} does the trick, but I'm not sure if everything is working the way it should in the internals. – Suzanne Soy Feb 09 '16 at 16:56
  • Another one to fix. Apart from using \@ifnextchar to ignore spaces, this should also be fixed (i.e., work even if next token is \long). – Manuel Feb 09 '16 at 16:59
  • @Manuel it's a secret but the version of amsmath.sty I used to make the above is OK with \long tokens:-) – David Carlisle Feb 09 '16 at 17:04
  • @GeorgesDupéron More simply, you could do \expandafter\def\csname u8:\detokenize{…}\endcsname{\@ifnextchar\dots\dots} and avoid the dangerous #1 lookahead. \@ifnextchar skips over white space; you don't actually care what it finds, as you can do \dots in both cases. – David Carlisle Feb 09 '16 at 21:19
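For reference, the \@ifnextchar idea from the last comment can be written out as a complete document. This is a hedged sketch rather than a tested recommendation: it assumes pdfLaTeX with inputenc's utf8 handling, and since \@ifnextchar takes a comparison token followed by two branches, all three are spelled out here. The choice of \relax as the comparison token is arbitrary (an assumption of this sketch), because both branches run \dots.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath}

\makeatletter
% Redefine the internal macro that inputenc runs for U+2026.
% \@ifnextchar skips any spaces before peeking at the next token.
% \relax is only a placeholder comparison token (assumption):
% the result of the test is irrelevant, both branches run \dots.
\expandafter\def\csname u8:\detokenize{…}\endcsname{%
  \@ifnextchar\relax{\dots}{\dots}}
\makeatother

\begin{document}

$\left\{a \dots \right\}$\vline

$\left\{a … \right\}$\vline

${…}$ % no argument is grabbed, so this should not hit the #1 parse error

\end{document}
```

Because the spaces are consumed by \@ifnextchar before \dots starts its own \futurelet lookahead, the two formulas should get the same spacing, and nothing is absorbed as an argument.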