
It is well known that bibtex needs some care to handle UTF-8 characters gracefully. However, even when following the recommended practice, I get a strange bug with accented characters in names. Consider the following example.

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\bibliographystyle{abbrv}

\begin{document}

\begin{filecontents}{accents.bib}
@inproceedings{accent,
  author = {Öh, Angel and Rumstein, Ángel},
  booktitle = {Proceedings of the 2015 something},
  pages = {41--63},
  title = {Title},
  year = {2015}
}
\end{filecontents}

\section{Introduction} Blah \cite{accent}.

\bibliography{accents}
\end{document}

Running latex and then bibtex produces an accent.bbl file that is not valid UTF-8: where "Á. Rumstein" should appear, it contains an invalid character in place of the Á. Note that the Ö is valid. Further investigation reveals that the bug happens if, and only if, an accented letter is the first letter of an author's given name. Changing the bibliography style from abbrv to plain also solves the problem. (The problem is not related to inlining the bib file; I did this only to produce a self-contained example.)
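The observations above (only the first letter of a given name fails, and only with abbrv) can be traced byte by byte. The annotated sketch below is an illustration, not output captured from BibTeX; it assumes the stock name-format strings of abbrv.bst and plain.bst, so a modified local copy of either style may behave differently.

```latex
% Byte-level view of the failing name (assumes stock abbrv.bst / plain.bst).
% BibTeX works on bytes, so the UTF-8 letter Á (U+00C1) is two "characters":
%   Á  ->  0xC3 0x81
%
% abbrv.bst formats each author with "{f.~}{vv~}{ll}{, jj}".
% The {f.} part abbreviates the given name to its first character plus ".":
%   Ángel  ->  0xC3 "."        (only the lead byte survives)
% In UTF-8, 0xC3 must be followed by a continuation byte (0x80--0xBF);
% "." is 0x2E, so the resulting .bbl contains an invalid byte sequence.
%
% plain.bst uses "{ff~}{vv~}{ll}{, jj}": {ff} copies the whole given name,
% so 0xC3 0x81 stays together and the output is valid UTF-8.
%
% The family name is copied whole by {ll} in both styles, which is why
% Öh (0xC3 0x96 h) comes through intact.
```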

Has this bug been reported anywhere? Can I work around it? Or am I doing something wrong? I can't find any mention of it, but my searches are not very helpful because I keep stumbling upon questions from people who are not using inputenc correctly (so that no accents work at all).

I’d like to stick to LaTeX and bibtex, not switch to XeLaTeX or biber.

  • Input the letter as a command, {\'A}ngel, or use biblatex + biber, which can handle UTF-8 properly. – Ulrike Fischer Dec 26 '23 at 16:21
  • It's not really a bug, just a documented restriction: bibtex doesn't support UTF-8, so when characters are uppercased it uppercases individual bytes of the UTF-8 encoding and produces a malformed UTF-8 string. – David Carlisle Dec 26 '23 at 17:03
  • Note that you don't need \usepackage[utf8]{inputenc}, as UTF-8 is the default LaTeX encoding unless you have a very old LaTeX release. – David Carlisle Dec 26 '23 at 17:04
  • For BibTeX purposes, you should replace Ö and Á with {\"O} and {\'A}, respectively, in the .bib file. See How to write “ä” and other umlauts and accented letters in bibliography? for more information on this topic. – Mico Dec 26 '23 at 18:55
  • @DavidCarlisle I do not understand, then, why switching to style plain works, or why the string is not malformed when the accented letter appears elsewhere in the name (as with Öh). – Olivier Cailloux Dec 27 '23 at 17:37
  • It fails if it needs to uppercase or if it needs to take the first letter, e.g. to make a sort key, as "first letter" will just take the first byte of a multi-byte UTF-8 sequence. So the exact failure depends on the bib style, but basically nothing is expected to work, even if sometimes you get lucky and some styles with some names avoid the issues. – David Carlisle Dec 27 '23 at 23:34
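Following the suggestions in the comments above, here is a sketch of the question's .bib entry rewritten with LaTeX accent commands instead of raw UTF-8 letters, which keeps the file plain ASCII and lets the setup stay with LaTeX and bibtex. Because {\'A} is wrapped in braces, BibTeX treats it as a single special character, so abbreviating the given name should yield {\'A}. rather than a lone byte. The [overwrite] option of filecontents (available in LaTeX kernels from 2019 onward) is an assumption added here so that a stale accents.bib left over from an earlier run gets replaced; with an older kernel, delete that file by hand instead.

```latex
% Same entry as in the question, but with accented letters written as
% braced LaTeX accent commands, so the .bib file contains only ASCII.
\begin{filecontents}[overwrite]{accents.bib}
@inproceedings{accent,
  author    = {{\"O}h, Angel and Rumstein, {\'A}ngel},
  booktitle = {Proceedings of the 2015 something},
  pages     = {41--63},
  title     = {Title},
  year      = {2015}
}
\end{filecontents}
```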

1 Answer

You can use bibtexu instead of bibtex.

\begin{filecontents}{\jobname.bib}
@inproceedings{accent,
  author = {Öh, Angel and Rumstein, Ángel},
  booktitle = {Proceedings of the 2015 something},
  pages = {41--63},
  title = {Title},
  year = {2015}
}
\end{filecontents}

\documentclass{article}
\usepackage[T1]{fontenc}
%\usepackage[utf8]{inputenc}% not needed

\begin{document}

\section{Introduction} Blah \cite{accent}.

\bibliographystyle{abbrv}
\bibliography{\jobname}

\end{document}

After running pdflatex+bibtexu+pdflatex I get

[Image: the typeset bibliography produced by this run]

egreg