Input encoding error after upgrading from Biber 1.9 to Biber 2.1

Question

With biber 2.1, biblatex produces input encoding errors. The same setup worked smoothly with biblatex 2.9a/biber 1.9. Since upgrading to biblatex 3.0/biber 2.1, I get the following message when pdflatex runs after biber:

Package inputenc Error: Unicode char \u8:╠ü not set up for use with LaTeX.

Here is a minimum working example:

\documentclass{article}
\usepackage{lmodern}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[backend=biber]{biblatex} 
\bibliography{test.bib}
\usepackage{filecontents}

\begin{filecontents}{test.bib}
@article{rodr,
author = {Rodr{\'{\i}}guez, A},
year = {1999} }
\end{filecontents}

\begin{document}
Rodr\'{\i}guez
\textcite{rodr}
\printbibliography
\end{document}

Any help appreciated!

Just out of curiosity: Is there a reason why you don't input the accented character directly as í? — Mico, Jun 20 '15 at 16:48
I have a large library in Mendeley, and this is how the export-to-bib function deals with the character. — cth, Jun 20 '15 at 19:07
Typing \'i for getting í has been in LaTeX for about 20 years. — egreg, Jun 21 '15 at 14:43

score 17 · Answer 1 · 2015-06-20T14:38:49.220

Run biber with

biber --output-safechars <file>

edited Jun 20 '15 at 14:38

answered Jun 20 '15 at 14:26

Great! Works perfect! – cth Jun 20 '15 at 19:10
This is the only solution that works for me. – Eike P. Nov 16 '18 at 16:17
@jhin Me too. I use a master .bib file for all my projects, and I am asked by journals to use bibtex on some of them and I want to use biber on the others. Running biber with this option is the only solution that allows me to have the same .bib file for both bibtex and biber. – Pertinax Jun 10 '20 at 16:29

score 11 · Answer 2 · answered Jun 20 '15 at 15:33

Writing Rodr{\'{\i}}guez seems needlessly complicated. I suggest you write Rodr{\'i}guez instead. Not only is it easier to do so, it also makes the biber/inputenc issue go away automatically.

Incidentally, outside the bib file, i.e., in the body of the tex file, I'd write Rodr\'iguez.
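
As a sketch of this suggestion, here is the question's MWE with only the accented character changed: {\'i} in the bib entry and \'i in the document body. Everything else is copied from the question as-is, and it assumes the same biblatex 3.0/biber 2.1 setup discussed there.

```latex
\documentclass{article}
\usepackage{lmodern}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[backend=biber]{biblatex}
\bibliography{test.bib}
\usepackage{filecontents}

% The author field below uses {\'i} (plain i under the accent)
% instead of {\'{\i}} (dotless i under the accent).
\begin{filecontents}{test.bib}
@article{rodr,
author = {Rodr{\'i}guez, A},
year = {1999} }
\end{filecontents}

\begin{document}
Rodr\'iguez
\textcite{rodr}
\printbibliography
\end{document}
```

The explanatory comment sits outside the filecontents environment on purpose, since anything inside it is written verbatim into test.bib.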

Mico

I'd write the unicode character in all instances ;-) – Johannes_B Jun 20 '15 at 15:38
@Johannes_B - If it were my bib file, I would too. :-) I'm assuming, though, that the OP has a good reason for not entering the character as í. That's why I suggested he/she input it as {\'i}. – Mico Jun 20 '15 at 15:40
I think the real issue here is ref-manager export. See also http://golatex.de/i-mit-diaresis-in-bibtex-t15369.html – Johannes_B Jun 20 '15 at 15:48
@Johannes_B - That may well be the case. For the OP's MWE, though, the problem arises even without a ref-manager (bibdesk? jabref?) being involved. – Mico Jun 20 '15 at 15:57

score 11 · Answer 3 · answered Jun 21 '15 at 14:35

This is a change in biber 2.1 that affects \i in particular. Biber now properly encodes \'{\i} as a dotless i (ı, U+0131) followed by a combining acute accent (U+0301). Biber always converts to precomposed (NFC) form on output and is therefore generally as friendly as possible to inputenc, but there is no precomposed form of this combination. There is a precomposed form for a normal 'i' followed by the same combining character, but using it here would be incorrect: it is a completely different thing, it causes problems in some fonts, and it needs a special case in the decoding. \'i is generally a better choice because most fonts know that the "i" in this shouldn't have both a dot and an accent (or, if you are lucky, you simply can't see the dot because of the accent). inputenc also knows about normal ISO 8859-1 literal í characters, so you can just use these without any macros.

If you get your data from some external source and therefore don't get to choose the form in which you get your accented "i"s, put in a biber sourcemap to fix up the code if you have inputenc or font issues with dotless "i"s:

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite]{
      \step[fieldsource=author,
            match=\regexp{\x{0131}\x{0301}},
            replace=\regexp{\x{00ED}}]
    }
  }
}

Since macro decoding into UTF-8 is done before source mapping, the data is already in UTF-8 NFD form by the time the mappings are applied. The above simply changes the dotless i (U+0131) followed by a combining acute accent (U+0301) into the standard ISO 8859-1 lowercase i with acute (U+00ED), which inputenc supports.
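
If the same sequence also turns up outside the author field, the mapping can presumably be extended along the same lines. The following is only a sketch modelled on the sourcemap above: the editor field is an illustrative assumption (it does not appear in the question's data), and each field gets its own \map to keep the two mappings independent of each other. It is meant to be used in place of the \DeclareSourcemap above, not in addition to it.

```latex
% Sketch: same replacement as above, applied to two fields.
% First \map handles author, second \map handles editor (assumed field).
% Both turn dotless i (U+0131) + combining acute (U+0301) into U+00ED.
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite]{
      \step[fieldsource=author,
            match=\regexp{\x{0131}\x{0301}},
            replace=\regexp{\x{00ED}}]
    }
    \map[overwrite]{
      \step[fieldsource=editor,
            match=\regexp{\x{0131}\x{0301}},
            replace=\regexp{\x{00ED}}]
    }
  }
}
```

Other accents over a dotless i would need their own match/replace pairs with the corresponding combining character and precomposed target.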

Or you can just use the biber --output-safechars option to force encoding of UTF-8 into LaTeX macros when writing the .bbl.

There is \DeclareTextComposite{\'}{T1}{i}{237} in t1enc.def and similar code for the dot above, grave, dieresis and circumflex accents. This is the reason why \'i is good. It's not a font property. However, other accents such as breve or macron could give problems because of this change to the behavior of \i, I guess. — egreg, Jun 21 '15 at 14:42
Yes, it's a general thing with \i, regardless of the accent. There used to be all sorts of special cases just for this which was making recoding hard because really, you are applying an accent to a dotless i. It's just that these inputenc semantics were easier in days gone by. — PLK, Jun 21 '15 at 14:49
Well, just so whomever made that call knows I just wasted about an hour combing through my .bib file for non-unicode characters, converting it into hex and looking for cc81 until someone finally showed me \DeclareUnicodeCharacter{0301}{HERE}. So thanks for making biber generate unsafe output for pdflatex by default. Wouldn't that be better as the non-default option? — Canageek, Apr 28 '16 at 21:29
biber 2.19 DEV should now always output \i as a single UTF-8 grapheme instead of the dotless i plus combining diacritic. I have given in and made a special case for this as it's the most common issue for some. — PLK, Nov 17 '22 at 10:01

score 3 · Answer 4 · answered Jun 20 '15 at 16:04

If you type the í directly, it runs fine:

\documentclass{article}
\usepackage{lmodern}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[backend=biber]{biblatex}
\bibliography{test.bib}
\usepackage{filecontents}
%
\begin{filecontents}{test.bib}
@article{rodr,
author = {Rodríguez, A},
year = {1999} }
\end{filecontents}

\begin{document}
Rodríguez
\textcite{rodr}
\printbibliography
\end{document}
