This is a follow-up to the question Can't replace non-ascii characters with biblatex. The aim is to replace certain character strings in .bib file entries with other strings.

Compiling the following document now works fine in xelatex:

\documentclass{article}
\usepackage[style = authoryear-comp]{biblatex}
\usepackage{fontspec}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@BOOK{lennon1968,
    AUTHOR = "John Lennon",
    TITLE = "xåy",
    YEAR = "1968"}
\end{filecontents}
\DeclareSourcemap{
  \maps[datatype = bibtex]{
    \map{
       \step[fieldsource = title,
          match = \regexp{xåy}, 
          replace = {abc}]
    }
  }
}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}

But when I use the common macro \aa instead of å, it fails:

\documentclass{article}
\usepackage[style = authoryear-comp]{biblatex}
\usepackage{fontspec}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@BOOK{lennon1968,
    AUTHOR = "John Lennon",
    TITLE = "x{\aa}y",
    YEAR = "1968"}
\end{filecontents}
\DeclareSourcemap{
  \maps[datatype = bibtex]{
    \map{
       \step[fieldsource = title,
          match = \regexp{xåy}, 
          replace = {abc}]
    }
  }
}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}

Apart from replacing macros like \aa in my .bib file with UTF-8 characters like å, is there any way to overcome this problem?

Sverre
  • @JosephWright I've found why it didn't work in my original document. It has something to do with \regexp and white space. My .bib file has something like foo x{\aa}y bar, and my .sty file has \regexp{foo x{å}y bar}. This won't work. Any solution? (I want to use \regexp because I'm replacing many different strings with one string, e.g. \regexp{(foo x{å}y bar)|(bar foo)|(no foo bar)}.) – Sverre Sep 02 '14 at 16:16
  • Seems to be an issue with spaces in the regex at the LaTeX end. If I edit the .bcf file to, for example, map_match="foo x\{å\}y bar" then all is well (I've escaped the {} pair to be on the safe side). However, spaces in the regex at the LaTeX end don't get passed on correctly. I'll have to look at that separately! – Joseph Wright Sep 02 '14 at 16:41
  • Ah! It's deliberate (probably worrying about the spaces that otherwise get added after control sequences). Try \let\regexp\detokenize before the source map. If that works for you, should I write up an answer? – Joseph Wright Sep 02 '14 at 16:45
  • @JosephWright That works! However, I didn't put \let\regexp\detokenize before \DeclareSourcemap, because that messes up other things I do with regexp. Instead I put it after \map{. Please feel free to add an answer (but remember that it should also answer the original question, not just the stuff we've discussed in the comments here). – Sverre Sep 02 '14 at 17:05
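
Putting the comments above together, the whitespace workaround might look roughly like the following. This is only a sketch of what the comments describe: \regexp is redefined locally, right after \map{, so that spaces in the pattern reach Biber unchanged. The pattern is the one quoted in the comments; the replacement string abc is just a placeholder borrowed from the question, and whether the redefinition has side effects with other biblatex or Biber versions has not been checked here.

```latex
% Sketch only: local redefinition of \regexp, as suggested in the comments.
\DeclareSourcemap{
  \maps[datatype = bibtex]{
    \map{
      % Placed inside \map (not before \DeclareSourcemap), as reported
      % to work in the comments.
      \let\regexp\detokenize
      \step[fieldsource = title,
            match = \regexp{(foo x{å}y bar)|(bar foo)|(no foo bar)},
            replace = {abc}] % "abc" is a placeholder replacement
    }
  }
}
```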

1 Answer

The regex is looking for exactly what you tell it to; the only thing to bear in mind is that conversion to UTF-8 occurs before this process. As such, the BibTeX input

title = {x{\aa}y},

gets turned into x{å}y by Biber, and it is that string which you need to search for and replace. It's not the same as when you put

title = {xåy}

as that has no braces.

As the BibTeX database format requires braces around accents, it's probably best to tackle the issue at the regex end. Thus something like

\documentclass{article}
\usepackage[style = authoryear-comp]{biblatex}
\usepackage{fontspec}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@BOOK{lennon1968,
    AUTHOR = "John Lennon",
    TITLE = "x{\aa}y",
    YEAR = "1968"}
\end{filecontents}
\DeclareSourcemap{
  \maps[datatype = bibtex]{
    \map{
       \step[fieldsource = title,
          match = \regexp{x{å}y}, 
          replace = {abc}]
    }
  }
}
\addbibresource{\jobname.bib}
\begin{document}
\nocite{*}
\printbibliography
\end{document}

should work correctly.
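
If the same source map has to cope with both forms (braces kept or braces dropped, which according to the comments below may depend on the Biber version), one possibility is to make the braces optional in the pattern. The following is an untested sketch: it assumes that \{ and \} inside \regexp are passed through to Biber's Perl regex engine as escaped literal braces, which is worth verifying by looking at the map_match value in the generated .bcf file.

```latex
% Untested sketch: accept both "xåy" and "x{å}y" in the title field.
\DeclareSourcemap{
  \maps[datatype = bibtex]{
    \map{
      \step[fieldsource = title,
            % \{? and \}? are meant as "optional literal brace"
            match = \regexp{x\{?å\}?y},
            replace = {abc}]
    }
  }
}
```

Note that this pattern would also match a lone opening or closing brace, which is probably harmless for this use but is not a strict check for a balanced pair.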

Joseph Wright
  • I've limited the answer to the specifics about accents/braces: the space part can be covered by a separate question if required. – Joseph Wright Sep 02 '14 at 17:39
  • In newer versions of biber, the non-ascii characters can't be enclosed within curly braces anymore, so it needs to say match = \regexp{xåy}. – Sverre Oct 03 '15 at 11:52
  • But for some strange reason, some non-ascii characters still need to be enclosed by curly braces, such as the character æ. – Sverre Oct 03 '15 at 11:58