Combining \newunicodechar with delimiter sizes \big, \left, \right, etc

Question

\documentclass[11pt]{standalone}
\usepackage[utf8]{inputenc}
\usepackage{newunicodechar}
\newunicodechar{‖}{\ensuremath{\|}}
\begin{document}
$$ ‖x‖ $$
$$ \big‖x\big‖ $$  % Error: Missing delimiter
\end{document}

Can we somehow hack \big to make it work?

Somewhat similar to Using greek characters in commands' arguments without braces - TeX - LaTeX Stack Exchange. — user202729, Jun 26 '22 at 17:11
Side note for OP: math mode - Why is [ ... ] preferable to $$ ... $$? - TeX - LaTeX Stack Exchange — user202729, Jun 26 '22 at 17:25
It'd be very complicated to make a ‖ character automatically determine whether the spacing should be \mathopen, \mathclose, as a middle bar or a relational operator. However, you might use the paired-math-delimiter commands in mathtools for a good shortcut. — Davislor, Jun 28 '22 at 02:51

egreg · Accepted Answer · 2022-06-26T16:26:09.583

3

Using \ensuremath makes no sense.

You can make it work with pdflatex, but \big and similar commands need braces around the character. \left and \right work without braces.

\documentclass{article}
\usepackage{amsmath}
\usepackage{newunicodechar}
\newunicodechar{‖}{\|}
\begin{document}
\[
‖x‖ + \bigl{‖}x\bigr{‖} + \left‖\frac{x}{2}\right‖
\]
\end{document}

The problem is that ‖ is not a single byte and \bigl only absorbs the first one.
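To make the byte issue concrete, here is a small sketch for pdflatex only. U+2016 (‖) is encoded in UTF-8 as the three bytes E2 80 96, and pdflatex reads each byte as a separate active character. The helper \showbytes is a hypothetical name introduced for this illustration; it is not part of any package.

```latex
% !TEX program = pdflatex
\documentclass{article}
% \showbytes is a hypothetical helper for this sketch: it grabs three
% undelimited arguments and prints the character code of each one.
\newcommand{\showbytes}[3]{\number`#1, \number`#2, \number`#3}
\begin{document}
% Under pdflatex the three arguments are the three bytes of U+2016,
% so this should print 226, 128, 150 (hexadecimal E2, 80, 96).
\showbytes ‖
\end{document}
```

A one-argument command such as \bigl sees only the first of these bytes, which is why the braces are needed.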

Usual caveats. $$ should not be used in LaTeX and \big is not the right command for that use case.

On the other hand, \lVert and \rVert are the right commands for that use case.
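A minimal sketch of these recommendations, assuming amsmath is available and, for the last three uses, mathtools (suggested in a comment on the question). The macro name \norm is a hypothetical choice for this illustration.

```latex
\documentclass{article}
\usepackage{mathtools} % loads amsmath; provides \DeclarePairedDelimiter
% \norm is a hypothetical name chosen for this sketch
\DeclarePairedDelimiter{\norm}{\lVert}{\rVert}
\begin{document}
\[
  \big|-1\big|      % ordinary atoms: the minus is spaced as a binary operator
  \quad
  \bigl|-1\bigr|    % opening and closing atoms: the minus is spaced as a sign
\]
\[
  \lVert -x \rVert
  \quad
  \bigl\lVert -x \bigr\rVert
  \quad
  \norm{-x}             % natural size
  \quad
  \norm[\big]{-x}       % manual size, uses the l and r variants internally
  \quad
  \norm*{\frac{x}{2}}   % automatic size via \left and \right
\]
\end{document}
```

Since these commands take ASCII control sequences rather than a multibyte character, the bracing problem does not arise with them.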

edited Jun 26 '22 at 16:26

answered Jun 26 '22 at 15:38


Thanks! Unfortunately, it doesn't seem to work with \left and \right. Why shouldn't \big be used? What's the difference to using \bigl and \bigr? Seems like \big is the "default" suggestion for delimiter sizes around the web (e.g. overleaf or wikibooks) – Hyperplane Jun 26 '22 at 15:49
@Hyperplane try $\big|-1\big|$ vs $\bigl|-1\bigr|$: the l and r variants create proper opening and closing symbols, so the spacing around the minus becomes that of a sign, not that of a binary operator. – daleif Jun 26 '22 at 15:54
Same problem exists with ‖-x‖ after \newunicodechar{‖}{\|}. Compare with this: \lVert -x \rVert. – hair-splitter Jun 26 '22 at 16:05
The description of overleaf and wikibooks on this topic is incorrect. – hair-splitter Jun 26 '22 at 16:14
@Hyperplane It does work with \left and \right, without braces. See edit. – egreg Jun 26 '22 at 16:26
@egreg Huh? Weird I got a compilation error earlier, must've done something wrong. – Hyperplane Jun 26 '22 at 16:40

user202729 · Answer 2 · 2022-06-28T02:38:11.333

For "academic" purposes, here is a macro that automatically braces the next Unicode character.

It does require patching \big, however; that is unavoidable.

How the TeX input stream works is somewhat complex, so you still need some TeX knowledge to use this macro; it is not 100% automatic.

It only works in pdflatex -- nevertheless if you're using some Unicode engine you would not need this code at all.

Side note--if you use this in a package/extend the code etc., fix the naming convention of \__stored_content etc. yourself, see expl3 manual interface3.pdf.

(use \tl_analysis_map_inline:nn just to check if it's an active character instead of something like e.g. \str_count:n, to handle some unlikely case that there's a length-1 control sequence and \escapechar=-1...)

%! TEX program = pdflatex
\documentclass{article}
\usepackage{newunicodechar}
\errorcontextlines=100
\newunicodechar{‖}{\ensuremath{\|}}
\begin{document}
\ExplSyntaxOn
% command docs:
% if you execute \__brace_next_unicode:nw {blob blob} ■ where ■ is any
% multi-byte UTF8 character, after some execution steps blob blob {■}
% will be executed. (spaces are only for demonstration.)
\cs_new_protected:Npn \__brace_next_unicode:nw #1 {
  \peek_N_type:TF {
    \tl_set:Nn \__stored_content {#1}  % actually this part can be done expandably
        % as well... but \peek_N_type:TF is already unexpandable
    \__brace_next_unicode_get_one_byte:N
  }
  {
    % do nothing, just put blob blob out
    #1
  }
}
\cs_new_protected:Npn \__brace_next_unicode_get_one_byte:N #1 {
  % ##3 is the category code as a hex digit; D (13) means active character
  \tl_analysis_map_inline:nn {#1} {
    \bool_set:Nn \__is_active_character { \token_if_eq_charcode_p:NN ##3 D }
  }
  \bool_if:nTF \__is_active_character {
    \int_compare:nNnTF {`#1} < {"80} {
      % not a part of multibyte UTF8 character, put it back.
      \__stored_content #1
    } {
      % part of multibyte UTF8 character.
      \int_compare:nNnTF {`#1} < {"E0} {
        % 2 bytes
        \__brace_next_unicode_handle_two:nn #1
      } {
        \int_compare:nNnTF {`#1} < {"F0} {
          % 3 bytes
          \__brace_next_unicode_handle_three:nnn #1
        } {
          % 4 bytes
          \__brace_next_unicode_handle_four:nnnn #1
        }
      }
    }
  }
  {
    % else, it could be a control sequence or similar. Do nothing with it.
    \__stored_content #1
  }
}
\cs_new_protected:Npn \__brace_next_unicode_handle_two:nn #1 #2 {
  \__stored_content {#1 #2}
}
\cs_new_protected:Npn \__brace_next_unicode_handle_three:nnn #1 #2 #3 {
  \__stored_content {#1 #2 #3}
}
\cs_new_protected:Npn \__brace_next_unicode_handle_four:nnnn #1 #2 #3 #4 {
  \__stored_content {#1 #2 #3 #4}
}
%\__brace_next_unicode:nw {\pretty:nn {123}} ■
\NewCommandCopy \oldbig \big
\def \big {\__brace_next_unicode:nw {\oldbig}}
\ExplSyntaxOff
\[ ‖x‖ \]
\[ \big‖x\big‖ \]
% check that it still works in normal cases
\[ \big|x\big| \]
\[ \big|x\big| \]
\[ \big{|}x\big{|} \]
\[ \big{|}x\big{|} \]
\[ \big\lbrace x\big\rbrace \]
\end{document}
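For comparison, here is a sketch of the Unicode-engine case mentioned above, assuming LuaLaTeX (XeLaTeX should behave the same way). There the character is read as a single token, so none of the patching should be needed.

```latex
% !TEX program = lualatex
\documentclass{article}
\usepackage{amsmath}
\usepackage{newunicodechar}
\newunicodechar{‖}{\|}
\begin{document}
% With LuaLaTeX or XeLaTeX, ‖ is read as one token, so \bigl and \bigr
% receive the whole character as their argument and no braces are needed.
\[ ‖x‖ + \bigl‖x\bigr‖ + \left‖\frac{x}{2}\right‖ \]
\end{document}
```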

Please, use the recommended naming scheme: \__stored_content does not fit. And also \let\oldbig\big is not good, and should be \NewCommandCopy\oldbig\big because \big is defined with \DeclareRobustCommand. — egreg, Jun 26 '22 at 20:45
@DavidCarlisle That still works, right? \peek_N_type returns a "false negative", i.e. it returns false even though the next token is N-type, but \lbrace is not a partial UTF-8 character and doesn't need bracing anyway. Still, it is worth noting in case someone modifies the code and gets strange results (generally speaking, the \peek family of functions has a few special cases where it fails). — user202729, Jun 27 '22 at 00:07
Actually the \let won't be "very harmful" as the fix is idempotent i.e. applying multiple times doesn't hurt — user202729, Jun 27 '22 at 04:42
the edited version still fails as before: ! Improper alphabetic constant. <to be read again> \lbrace l.77 \[\big\lbrace x\big\rbrace\] ? — David Carlisle, Jun 27 '22 at 22:16
