biblatex issue: \DeclareUnicodeCharacter

Question

I recently upgraded from Ubuntu 18 (TeX Live 2017) to Ubuntu 20, and things broke: I now need to add

\DeclareUnicodeCharacter{0304}{ ̄}
\DeclareUnicodeCharacter{0301}{ ́}

to make biblatex work. What has caused this change? Is there any way to make these lines conditional on the TeX Live version, so that my files stay backward-compatible with my other machines, which still have the old TeX version?

Note: a typical .bib entry of mine is

@book{Cederlof:stow,
   author=   {Mikael Cederl\"of},   
   title=    {The element \emph{-st\=ow} in the history of English},
   year=     {1998},
   series=   {Acta Universitatis Upsaliensis: Studia Anglistica
 Upsaliensia},
   publisher={Uppsala University},
   number=   {103},
}

The macron seems to require the \DeclareUnicodeCharacter, but the umlaut does not.

You'll probably need to provide a full but minimal example that others can copy and test. — daleif, Mar 08 '22 at 11:50
For one of the issues or both? It's a bit hard when I have several Mb of .bib files. I thought the first issue might be a known one. — KeithB, Mar 08 '22 at 11:52
\DeclareUnicodeCharacter{0304}{ ̄} may avoid an error but I doubt that it will give the right output. Your bib files contains something that doesn't work in a current latex/biblatex but without more info about the entries and the encoding of your files nobody can help you. — Ulrike Fischer, Mar 08 '22 at 11:55
Thanks - I think it does give the right output, but I will double-check it. But what about \nocite{*}? That's a separate issue and should not depend on encoding etc. — KeithB, Mar 08 '22 at 12:00
assuming that you use pdflatex: defining a combining accent can't work, pdflatex can't handle this. So you should either get a lonely accent or nothing in such places. (You can use \DeclareUnicodeCharacter{0304}{XXXXXXXXX} to check where this is used). And regarding the nocite: You are processing all bib entries with it, and at least one of them breaks. This can have various reasons, including a wrong encoding. — Ulrike Fischer, Mar 08 '22 at 12:04
U+0304 and U+0301 are the macron (over bar) and acute accents so if you make these empty you will certainly have corrupted your bibliography text. — David Carlisle, Mar 08 '22 at 12:06
But did I make them empty? All I did was respond to an error message from biblatex. It told me exactly what \DeclareUnicodeCharacter command was needed. I added this and the error message went away. Sorry, I don't have access to the machine which has the problem today. I will report more tomorrow. — KeithB, Mar 08 '22 at 12:09
Check the (next) entry at the point where nocite fails, for: missing or unbalanced braces; too many levels (3 or more) of nested braces. If you are using unicode fonts, then I would expect you to be compiling with xelatex/lualatex. Unrelated: with UTF-8 file encoding, you can type in the character(s) by codepoint or directly, so a^^^^0304 typesets an ā (when using a unicode font), as indeed does the pre-composed glyph ā itself. ĕĥĵņŕŷą , Ѡѥѩѯѿ etc — Cicada, Mar 08 '22 at 13:14
Can you verify that the entry Cederlof:stow shown in the question errors for you if you don't have any \DeclareUnicodeCharacters, please? I get no error from this particular entry even without any \DeclareUnicodeCharacters. — moewe, Mar 08 '22 at 16:37
The TeX Live on your older system predated the LaTeX shift to UTF-8 standard encoding. Unless you specifically told LaTeX to use a different encoding you will have effectively been restricted to ASCII. In 2018 LaTeX switched to UTF-8. Since biblatex tries to detect your document encoding it will now try to use UTF-8. Usually this will work, but if you want to stick to ASCII, you could try passing the option texencoding=ascii to biblatex. A similar approach would be to call Biber with the --output-safechars option as biber --output-safechars <filename>. — moewe, Mar 08 '22 at 16:44
As you say yourself, the \nocite{*} issue is a completely separate one and I would strongly encourage you to ask a separate question for that (see also https://tex.meta.stackexchange.com/q/7425/35864), so that each question can focus on one thing and we don't get confused between two different issues. — moewe, Mar 08 '22 at 16:53
I have found that adding --output-safechars to biber solves the first problem - \DeclareUnicodeCharacter is no longer needed. I knew about this option, but as some of my projects use latexmk, this option was not getting passed to biber in all cases. Thanks to everyone for the help. \nocite{*} is still giving problems and I am working on it. — KeithB, Mar 09 '22 at 12:47
Alright. I've written up a quick answer for the \DeclareUnicodeCharacter issue. I suggest you edit your question to focus on the first question only and ask a new question once you have more details on the \nocite{*} thing. That way answers are going to be more specific and questions and answers are more useful for future visitors as well. — moewe, Mar 09 '22 at 16:33
New question at https://tex.stackexchange.com/questions/636690/biblatex-nocite-causes-tex-capacity-exceeded. — KeithB, Mar 11 '22 at 11:34
I removed the \nocite{*} bit of the question since it has been transferred to a new question. — moewe, Mar 11 '22 at 12:11

score 3 · Answer 1 · answered Mar 09 '22 at 16:31

The TeX Live on your older system predated the LaTeX shift to UTF-8 as the standard encoding (see, e.g., ltnews28). Unless you specifically told LaTeX to use a different encoding, you were effectively restricted to ASCII. In 2018 LaTeX switched its default input encoding to UTF-8.

Since biblatex tries to detect your document encoding, it will now try to use UTF-8. Usually this will work, but if you want to stick to ASCII, you could try passing the option texencoding=ascii to biblatex. A similar approach would be to call Biber with the --output-safechars option, as in biber --output-safechars <filename>.
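As a minimal sketch of the texencoding=ascii route (the file name references.bib is a placeholder for your own .bib file; the citation key is the one from the question):

```latex
\documentclass{article}
% Harmless on current LaTeX, where UTF-8 is already the default;
% kept here so the file also loads on the older TeX Live.
\usepackage[utf8]{inputenc}
% texencoding=ascii tells biblatex/Biber to treat the LaTeX side as ASCII,
% so accented characters should reach LaTeX as macros such as \=o.
\usepackage[backend=biber, texencoding=ascii]{biblatex}
\addbibresource{references.bib} % placeholder file name

\begin{document}
\cite{Cederlof:stow}
\printbibliography
\end{document}
```

Because the option lives in the document itself, it should apply no matter how Biber gets invoked, which may be convenient when a build tool such as latexmk runs Biber for you.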

The example .bib entry shown in the question compiled fine for me with a current TeX distribution, but it is possible that some definitions are not present in a slightly older distribution or that you have other entries that are still problematic. (There is a long-standing issue with dotless/dotted i: Unicode -(U+301) error in biblatex, but not in main text: {\'{\i}} and linked posts.)

The definitions

\DeclareUnicodeCharacter{0304}{ ̄}
\DeclareUnicodeCharacter{0301}{ ́}

will most likely not do anything useful, since they redefine combining characters. pdfLaTeX cannot really deal with combining characters, so documents using this code might compile (I'm not even sure about that), but are most likely not going to look as expected.

It is probably better to try and use something like \DeclareUnicodeCharacter{0304}{XXXXXXXXX} to find which entry causes the issue and to address that more directly by finding out why the combining accent is used.
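A rough sketch of that diagnostic, assuming pdfLaTeX with Biber; references.bib is again a placeholder, and the YYYYYYYYY marker for U+0301 is my own addition so the two accents can be told apart:

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % needed on older LaTeX, harmless on newer
\usepackage[backend=biber]{biblatex}
\addbibresource{references.bib} % placeholder file name

% Temporary, deliberately ugly markers: wherever a combining accent
% reaches LaTeX, the marker text is printed instead of raising an error.
\DeclareUnicodeCharacter{0304}{XXXXXXXXX} % combining macron
\DeclareUnicodeCharacter{0301}{YYYYYYYYY} % combining acute

\begin{document}
\nocite{*} % or cite only the entries you suspect
\printbibliography
\end{document}
```

Searching the resulting PDF for XXXXXXXXX or YYYYYYYYY should point to the affected entries. Remove both declarations again once the entries are fixed.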
