
I'm writing a little BibTeX exporter for the publication database of my institute. We have a lot of authors with all kinds of weird characters in their names, which get the "WTF is Unicode?" treatment from BibTeX.

As I have to preprocess author names and titles before exporting anyway, I thought that I could replace as many Unicode characters as possible with their LaTeX equivalents. There's an image with such a mapping on bibtex.org: mapping

But that image is

  1. incomplete (e.g. capital German umlauts are missing) and
  2. not of much use to me in this form.

Does someone know of such a mapping that is as complete as possible and available in a machine-readable format?

Edit: Juan's XML is probably as complete as it gets (I'll post a Python dictionary reduced to Unicode and LaTeX on GitHub). In the meantime, I also found the mapping that Zotero uses. It can be found in their SVN repository.

Edit2: OK, the Python dictionary can be found here, and the XSL Style Sheet to convert Juan's XML into a Python dictionary is here.
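
For readers wondering how such a dictionary might be applied during preprocessing, here is a minimal sketch. The `UNICODE_TO_LATEX` table below is a tiny hypothetical excerpt written for illustration, not the dictionary linked above, and `to_latex` is an assumed helper name. The input is normalized to NFC first so that decomposed sequences (base letter plus combining mark) hit the same keys as precomposed characters.

```python
import unicodedata

# Hypothetical excerpt of a Unicode-to-LaTeX mapping, for illustration only;
# a dictionary generated from a full mapping would be far larger.
UNICODE_TO_LATEX = {
    "\u00e4": r'\"{a}',   # ä
    "\u00f6": r'\"{o}',   # ö
    "\u00fc": r'\"{u}',   # ü
    "\u00c4": r'\"{A}',   # Ä
    "\u00d6": r'\"{O}',   # Ö
    "\u00dc": r'\"{U}',   # Ü
    "\u00df": r"{\ss}",   # ß
    "\u00e9": r"\'{e}",   # é
    "\u00f1": r"\~{n}",   # ñ
}


def to_latex(text, mapping=UNICODE_TO_LATEX):
    """Replace each mapped character with its LaTeX form.

    Characters without an entry are passed through unchanged.
    """
    # Compose "u" + combining diaeresis into the single code point "ü",
    # so both spellings are looked up under the same key.
    text = unicodedata.normalize("NFC", text)
    return "".join(mapping.get(ch, ch) for ch in text)
```

With this excerpt, `to_latex("Jürgen Müller")` yields `J\"{u}rgen M\"{u}ller`; plain ASCII input comes back untouched.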

  • Use biber+biblatex or bibtex8? – Seamus Jan 27 '11 at 11:52
  • We don't use the exported bibtex files ourselves. It's a service for people that download papers from our website. I can't influence how they use the bibtex files we provide. – Benjamin Wohlwend Jan 27 '11 at 12:14
  • Good question! There are other circumstances where this mapping could be valuable. – Charles Stewart Jan 27 '11 at 12:19
  • Many citation exporters don't try to create BibTeX-compatible databases any more, and I think that's pretty ok since BibTeX is completely outdated. I think you don't have to put much effort in supporting a decade-old, broken system. – Philipp Jan 27 '11 at 14:12
  • Philipp: What alternatives are there that are widely in use? I'm a programmer, not a researcher, so I'm not current on this topic :) – Benjamin Wohlwend Jan 27 '11 at 14:18
  • piquadrat: The mapping you linked to is great, but how did you actually apply it? It would be great to use this mapping to update this ancient Python unicode-to-LaTeX recipe. – gotgenes Jul 24 '11 at 04:45
  • These links seem to have broken. – TextGeek Apr 13 '21 at 16:10
  • @TextGeek I updated the links. They probably broke because, by coincidence, somebody signed up on GitHub with a username matching one of the gist IDs, at which point GitHub displays the user profile instead of the gist. – Benjamin Wohlwend Apr 14 '21 at 06:17

2 Answers


You can use biber with the optional arguments that set the encoding of the bib data and of the exported output (UTF-8).

biber --bblencoding=UTF-8 --bibencoding=latin1 --allentries --bibdata <file.bib>
  • this command does not seem to work anymore. allentries and bibdata are unknown options on biber v2.11 – glS Aug 08 '18 at 13:29
  • It worked for me, unlike bibtex, which wouldn't accept accents. – Hakim Nov 07 '18 at 12:26
  • Current version of biber can convert UTF-8 bib file to ASCII using the following command: biber --tool --output-encoding=ascii -O output-file.bib input-file.bib – michal.h21 May 04 '22 at 19:40

From a related question on SO, there is

... an XML file from the W3C. It maps Unicode to HTML, MathML, LaTeX, Mathematica, and others. (The file is 1.4 MB, uncompressed.)

You can read more about it here: http://www.w3.org/TR/unicode-xml/
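
As a rough sketch of how a file like this could be reduced to a Unicode-to-LaTeX dictionary in Python: the code below assumes each `<character>` element carries a `dec` attribute with the decimal code point (or several joined by `-`) and an optional `<latex>` child. The `SAMPLE` document is a hand-written, hypothetical excerpt in that shape, and `build_mapping` is an illustrative name; none of this has been checked against the full 1.4 MB file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written excerpt in the assumed shape of the file.
SAMPLE = r"""<charlist>
  <character id="U00009" dec="9"><description>CHARACTER TABULATION</description></character>
  <character id="U000E4" dec="228"><latex>\"{a}</latex></character>
  <character id="U000DC" dec="220"><latex>\"{U}</latex></character>
  <character id="U000DF" dec="223"><latex>{\ss}</latex></character>
</charlist>"""


def build_mapping(xml_text):
    """Return a dict from Unicode string to LaTeX for entries that have one."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for char in root.iter("character"):
        latex = char.findtext("latex")
        dec = char.get("dec", "")
        if latex is None or not dec:
            continue  # no LaTeX equivalent recorded for this character
        try:
            # "dec" may name several code points joined by "-" (assumption).
            key = "".join(chr(int(part)) for part in dec.split("-"))
        except ValueError:
            continue  # skip entries whose "dec" value is not numeric
        # The LaTeX text is kept verbatim; whitespace may be significant.
        mapping[key] = latex
    return mapping


mapping = build_mapping(SAMPLE)
```

Entries without a `<latex>` child (the tab character in the sample) are simply left out, so the resulting dictionary only contains characters that actually have a LaTeX form.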

Juan A. Navarro
  • Here's a web app based on the data from that XML file. Enter Unicode and look up LaTeX and vice versa. http://www.johndcook.com/unicode_latex.html – John D. Cook Feb 18 '13 at 15:11
  • Also see https://digitalheir.github.io/mathy-unicode-characters/ - based on that same XML file – Maarten Jul 20 '17 at 22:17