
In the past I asked a question about a replacement for a Unicode character (U+FFFD) for pdflatex and xelatex (Defining a replacement of a unicode character), and that works nicely. Now I need it for lualatex as well and found: Implementing \DeclareUnicodeCharacter in LuaLaTeX and XeLaTeX

So I tried to implement it (note that the document body is empty, as the run fails before reaching it):

\documentclass{book}
\usepackage{ifluatex}

\newif\ifunicode
\ifluatex\unicodetrue\fi

\ifunicode
  \usepackage{fontspec}
  \usepackage{newunicodechar}
  \newcommand{\DeclareUnicodeCharacter}[2]{%
    \begingroup\lccode`|=\string"#1\relax
    \lowercase{\endgroup\newunicodechar{|}}{#2}%
  }
\else
  \usepackage[utf8]{inputenc}
\fi

\def\ucr{\adjustbox{width=\CodeWidthChar,height=\CodeHeightChar}{\stackinset{c}{}{c}{-.2pt}{%
   \textcolor{white}{\sffamily\bfseries\small ?}}{%
   \rotatebox{45}{$\blacksquare$}}}}

\ifunicode
  \DeclareUnicodeCharacter{FFFD}{\ucr}
\else
  % some other stuff
\fi

\begin{document}
\end{document}

However, this gives the error message:

! String contains an invalid utf-8 sequence.
\newunicodechar #1#2->\if \relax \detokenize {#1}
                                                 \relax \nuc@emptyargerr \el...

l.23   \DeclareUnicodeCharacter{FFFD}{\ucr}

Any solution?

EDIT, based on @DavidCarlisle's comment (added as an edit for readability): I tried \newunicode{�}{\ucr} and got:

! Undefined control sequence.
l.24 \newunicode {�}{\ucr}

and with \newunicode{\ucr}{�} I get:

! Undefined control sequence.
l.25 \newunicode {\ucr}{�}

albert
  • You don't need any of those tests; just use egreg's \newunicodechar command, it works in luatex or pdftex. But your source should not have any U+FFFD characters in it, so you shouldn't need to define that?? – David Carlisle Jul 08 '18 at 12:41
  • The document is a generated document and unfortunately gets those characters in it. I tried to replace \DeclareUnicodeCharacter with \newunicode, using either FFFD or � as the first argument, but both failed. – albert Jul 08 '18 at 12:47
  • \newunicodechar{�}{?} should work in both systems. But it would be better to fix your generation system: U+FFFD means that the input was corrupted and the original information could not be reconstructed, so some system has given up and just replaced the original data by �. – David Carlisle Jul 08 '18 at 12:54
  • I think this suggestion makes sense: for the FFFD values I could write out \ucr directly.
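A minimal sketch of that idea, assuming the generator can be changed to emit the macro instead of the character: here \ucr is a placeholder (X) standing in for the adjustbox/stackengine definition from the question. Because the generated file never contains U+FFFD, it needs neither \newunicodechar nor a Lua callback, so it should compile the same way with pdflatex, xelatex and lualatex.

```latex
\documentclass{book}

% Placeholder only; substitute the real \ucr definition from the
% question, together with the packages it needs.
\newcommand{\ucr}{X}

\begin{document}
% The generator writes \ucr{} wherever it would otherwise emit U+FFFD.
% The empty group stops TeX from swallowing the following space.
Hallo \ucr{} world.
\end{document}
```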

1 Answer


As David Carlisle mentioned in a comment, \newunicodechar works with all engines. You use it like this:

\documentclass{book}
\usepackage{ifluatex}

\newif\ifunicode
\ifluatex\unicodetrue\fi

\ifunicode
\else
  \usepackage[utf8]{inputenc}
\fi
\usepackage{newunicodechar}

\def\ucr{X}

\newunicodechar{�}{\ucr}

\begin{document}
Hallo �.
\end{document}

The character U+FFFD is special, though: LuaTeX uses it internally to mark invalid Unicode, so every time LuaTeX finds U+FFFD in your input, it shows an error message. If you tell LuaTeX to continue, your document should still work.

This is a good example of why your source code should never contain characters like this. If you still want to do it, you can use a Lua callback to replace the U+FFFD character before LuaTeX sees it:

\documentclass{book}
\usepackage{ifluatex}
\usepackage{newunicodechar}

\ifluatex
\usepackage{luacode}
\else
  \usepackage[utf8]{inputenc}
\fi

\def\ucr{X}

\ifluatex
  \begin{luacode*}
    luatexbase.add_to_callback('process_input_buffer', function(buf)
      return buf:gsub(string.utfcharacter(0xFFFD), [[\ucr ]])
    end, 'replace U+FFFD')
  \end{luacode*}
\else
  \newunicodechar{�}{\ucr}
\fi

\begin{document}
Hallo �.
\end{document}
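If luacode.sty is not available (as happened with MiKTeX in the comments), the same callback can probably be registered with the \directlua primitive, so no extra package is needed. This is a sketch that assumes the `luatexbase` interface LuaLaTeX loads by default and LuaTeX's `string.utfcharacter`, both of which the answer above already relies on. \directlua expands its argument, so `\string\ucr\space` is used to hand Lua the literal text `\ucr ` instead of expanding the macro. The `gsub` call is wrapped in parentheses so that only the new string, and not the match count, is returned. Lua `--` comments are left out because \directlua joins its lines into one.

```latex
\documentclass{book}
\usepackage{ifluatex}
\usepackage{newunicodechar}

\ifluatex\else
  \usepackage[utf8]{inputenc}
\fi

% Placeholder for the real \ucr from the question.
\def\ucr{X}

\ifluatex
  % \string\ucr\space expands to the literal text "\ucr " for Lua.
  \directlua{
    luatexbase.add_to_callback('process_input_buffer',
      function(buf)
        return (buf:gsub(string.utfcharacter(0xFFFD), [[\string\ucr\space]]))
      end,
      'replace U+FFFD')
  }
\else
  \newunicodechar{�}{\ucr}
\fi

\begin{document}
Hallo �.
\end{document}
```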
  • I tried your first example (copy and paste), but I still get the same problem. With example 2, it tries to install luacode.sty but cannot find it. – albert Jul 08 '18 at 13:59
  • Ah yes, sorry, I did know that LuaTeX special-cases that, but forgot when making the comment under the question. – David Carlisle Jul 08 '18 at 15:02
  • Regarding my previous comment, I was referring to MiKTeX, but it looks like my MiKTeX installation has some problems at the moment (strange, as I didn't change anything, as one always says ;-) ). Switching to TeX Live 2018 on Cygwin: the second example runs with the simple definition of \ucr as X, but with the requested definition I get:

    ! Undefined control sequence.
    \ucr ->\adjustbox {width=\CodeWidthChar ,height=\CodeHeightChar }{\stackinse...

    l.27 Hallo \ucr .

    – albert Jul 08 '18 at 15:42
  • Regarding my last comment, this was my mistake: I forgot to include some packages and definitions. Sorry. I have now tested with pdflatex, xelatex and lualatex, and all three worked (MiKTeX and TeX Live). (MiKTeX mysteriously started to work again.) – albert Jul 08 '18 at 16:29