Too often I have to deal with badly formatted BibTeX files exported from reference databases that are infested with active characters such as % or & (usually in the abstract, which is not printed but produces an error anyway). I can live with this inconvenience and replace them with \% or \&.
The same goes for some input encoding errors caused by UTF-8 characters that pdflatex cannot handle. In the references this error is more annoying than in the main text, but in the best case it only means changing some Greek letter such as γ. In other cases it is the insidious zero-width space U+200B, but I can search for and replace even this character.
The worst for me is the obnoxious combining acute accent U+0301, especially when it is not over a consonant but hiding on vowels, imitating the normal acute-accented characters that inputenc handles perfectly. That is, the normal e character (U+0065) plus U+0301 looks exactly like the harmless é (U+00E9), and likewise for á, Á, í, etc., so it is hard to detect the poisonous diacritic mark.
Disappointingly, in some editors a search for U+0065 plus U+0301 also finds the single character U+00E9 and vice versa, so I ended up with the pain of searching and replacing every accented letter (single or double) with the single-character version.
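To illustrate why a search at the code-point level behaves differently from such an editor search, here is a minimal Python 3 sketch that reports every combining mark in a piece of text. The function name find_combining and the sample string are hypothetical, made up for this illustration; only the standard unicodedata module is assumed.

```python
import unicodedata

def find_combining(text):
    """Return (line number, 0-based offset, base character, mark name)
    for every character with a non-zero canonical combining class."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for offset, ch in enumerate(line):
            if unicodedata.combining(ch):
                base = line[offset - 1] if offset > 0 else ""
                hits.append((line_no, offset, base,
                             unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical sample: decomposed accents written as explicit escapes.
sample = "author={Gonza\u0301lez, M.}\ntitle={Man\u0303ana hara\u0301 calor}"
for hit in find_combining(sample):
    print(hit)
```

Because the check works on individual code points, a precomposed é (U+00E9) is not reported, while e followed by U+0301 is.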
So the question is specifically: are there clever ways to sanitize .bib files containing U+0301 accents?
(Of course, a more general answer covering some automagic cleaning of all unrecognized and active characters would be welcome.)
The minimal (not) working example:
\documentclass{article}
\begin{filecontents}{test.bib}
@article{xx,
author={González, M. and Ruíz, P.}
title={Mañana hará calor & bochorno con un 90% de humedad}
}
\end{filecontents}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[spanish]{babel}
\begin{document}
Hello \cite{xx}
\bibliography{test}
\bibliographystyle{plain}
\end{document}
[…] .bib file whenever you want? Seems like an easy job for sed (or Perl, or ...). – jon May 16 '17 at 05:23
[…] iconv? I can deal with a script as jon suggested, but I hate reinventing the wheel. – Fran May 16 '17 at 06:02
[…] uconv -x -any-nfc filename. Seems promising (more here). However, I will wait some time for other solutions. – Fran May 16 '17 at 06:59
[…] test.bib to NFC (or in fact even removing all non-ASCII characters, and removing the & and %), still the reference is undefined. Can you check your MWE to see whether it's truly an example? Or maybe give two versions, one which works and one which doesn't? [Oh: maybe there's a step of running bibtex that I've forgotten? Been a few years since I worked with bibliographies in TeX…] – ShreevatsaR May 16 '17 at 14:58
[…] pdflatex foo.tex, then bibtex foo.aux (which generates foo.bbl), then pdflatex foo.tex twice. And I see the issue: it's not really specific to bibtex or references; it is reproducible with \documentclass{article} \usepackage[utf8]{inputenc} \begin{document} hará calor & un 90% de humedad \end{document}. – ShreevatsaR May 16 '17 at 15:17
[…] .bib or .bbl file, which converts everything that isn't recognized as a letter by TeX to whatever it should be. This includes NFC normalization, changing & to \& and % to \%, and so on. From within TeX one could probably hack by redefining the TeX commands that occur in the .bbl file to do this, change catcodes and so on… but really there's no great reason to implement these transformations in TeX rather than with an external script. – ShreevatsaR May 16 '17 at 18:05
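The external-script idea from the last comment could look roughly like the following Python 3 sketch. It is not taken from the thread: the function name sanitize_bib is hypothetical, and the sketch assumes it is acceptable to escape every unescaped & and % in the whole file, which may be wrong for url fields or for a character preceded by a doubled backslash.

```python
import re
import unicodedata

# Zero-width space, the other invisible character named in the question.
ZERO_WIDTH_SPACE = "\u200b"

def sanitize_bib(text):
    """Return text in NFC form, without U+200B, with & and % escaped."""
    # Compose base letter + combining mark (e.g. e + U+0301) into one character.
    text = unicodedata.normalize("NFC", text)
    # Drop zero-width spaces.
    text = text.replace(ZERO_WIDTH_SPACE, "")
    # Put a backslash before & and % unless one is already there (assumption:
    # a single preceding backslash always means "already escaped").
    text = re.sub(r"(?<!\\)([&%])", r"\\\1", text)
    return text

# Hypothetical sample with decomposed accents and unescaped active characters.
print(sanitize_bib("Man\u0303ana hara\u0301 calor & bochorno con un 90% de humedad"))
```

To process a real file, one would read it with open(path, encoding="utf-8"), pass the contents through sanitize_bib, and write the result to a new file so that the original stays untouched.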