Two or three letter initials in bibliography with Biblatex (again)

Question

In French and some other European languages, given names should be abbreviated keeping digraphs and trigraphs.

John should be abbreviated as J.
Clare should be abbreviated as Cl.
Charles should be abbreviated as Ch.
Christine should be abbreviated as Chr.
Philippe should be abbreviated as Ph.
etc.

The classical solution was to modify the given name in the bibliographical data from

Charles

to

{\relax Ch}arles

A nice macro written by Yves de Saint-Pern uses the regexp search-and-replace option in Biber/Biblatex to iterate over your bib file and insert \relax into the author field on the fly.

However, the latest Biber breaks this possibility. A bib entry with a \relax in the author field will result in a faulty *.bbl file with unmatched braces. See here and here.

The only workaround is to use Biber's extended name format, which is cumbersome:

It makes it impossible to edit your bibliography file in a GUI application
It is a lot of manual work

Is it possible to modify Biber's initials-creation routine so that it creates the correct giveni field when the given name starts with one of a list of digraphs and trigraphs?

I include an MWE below:

\documentclass{article}
\usepackage{filecontents}
 \begin{filecontents}{\jobname.bib}
  % Extended name format
  @book{Book1,
  author = {family=Doe, given=Charles, given-i={Ch}},
  title  = {An Important Book},
  publisher = {Publisher},
  date = {2012},
}
@book{Book2,
  author = {Dolittle, Charlotte},
  title  = {Another Important Book},
  publisher = {Publisher},
  date = {2014},
}
@book{Book3,
  author = {Theodore Smith},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}

@book{Book4,
    address = "Turnhout",
    author = "Arnoux, Mathieu and Gazeau, Véronique and Demetz, Christine",
    publisher = "Brepols",
    shorttitle = "Les chanoines réguliers de la province de Rouen",
    title = "{Des clercs au service de la réforme. Études et documents sur les chanoines réguliers de la province de Rouen}",
    year = "2000"
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true, backend=biber]{biblatex}
% Does not work anymore with the latest Biber
% \DeclareSourcemap{
%  \maps[datatype=bibtex]{
%     \map{
%     \step[fieldsource=author, 
%             match={Charles},
%             replace=\regexp{\{\\relax \x20Ch\}arles}]
%   }
%  }
% }
\addbibresource{\jobname.bib}
\begin{document}

\cite{Book1}

\cite{Book2}

\cite{Book3}

\cite{Book4}
\end{document}

Short answer: We can't change Biber's initial routine. Your only chance is that this is changed on the Biber side. Or that someone essentially does the RegEx to use the extended name format for you. — moewe, Mar 24 '18 at 13:37
It is just impossible to deal with all these hacky edge cases which insert macros into fields. One of the primary goals of biber is sorting and such macros have to be stripped for sorting, naturally. Such macro stripping and UTF8 conversion (without which sorting doesn't work - you can't properly sort \"A without conversion to UTF-8, for example) is basically a hard/impossible problem in general without slow state parsing. So, I have to concentrate on UTF-8 conversion and not stripping arbitrary macro constructions. @moewe's solution is the best idea. — PLK, Mar 24 '18 at 21:53
A workaround which may be acceptable for some use cases is to use the bibtex backend instead of the default biber (i.e. backend=bibtex option to the package). Then \relax will work, but the functionality of biblatex will be limited. Check the biblatex documentation, section 3.15, to see what are the specific limitations (I don't have a very good understanding of this topic). — Szabolcs, Sep 30 '20 at 12:40
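
A minimal sketch of the backend=bibtex workaround mentioned in the last comment, assuming the classical {\relax Ch}arles markup from the question is written directly in the .bib file. The entry is illustrative, the document has to be compiled with bibtex instead of biber, and the limitations of this backend noted in the comment still apply.

```latex
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@book{Book1,
  author    = {Doe, {\relax Ch}arles},
  title     = {An Important Book},
  publisher = {Publisher},
  year      = {2012},
}
\end{filecontents}
% backend=bibtex instead of backend=biber: run latex, bibtex, latex
\usepackage[style=verbose,giveninits=true,backend=bibtex]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{Book1}
\end{document}
```

Going by the comment, the braced group should then be used as the initial of the given name.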

moewe · Answer 1 · 2018-03-27T16:24:56.737

11

If you can live with always entering names as "Family, Given", we can automate the conversion to the extended name format.

\makeatletter
\def\do#1{%
  \ifcsundef{blx@csv@datamodel@names}
    {\csdef{blx@csv@datamodel@names}{#1}}
    {\csappto{blx@csv@datamodel@names}{,#1}}}
\dolistcsloop{blx@datamodel@names}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[foreach=\blx@csv@datamodel@names]{
      \step[fieldsource=\regexp{$MAPLOOP}, 
            match=\regexp{(\A|\s+and\s+)([a-z'\s]+)?\s*(.+?)\,\s+(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w)(.+?)(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$3, prefix=$2, given=$4$5, given-i=\{$4\}}]
    }
  }
}
\makeatother

This splits off the initials with a regexp and rewrites each name in the extended name format. It is now done automatically for all known name fields. The regex has been adapted from Two or three letter initials in bibliography with Biblatex.
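
To make the effect of the map concrete, the sketch below shows how an author field such as {Doe, Charles and Potter, William} would look if it were written by hand in the extended name format. The match groups in the comments are traced by hand from the regex, not taken from Biber's output. The empty prefix= part that the replacement also emits is left out, and exact spacing is not reproduced.

```latex
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
% Written in the .bib file:
%   author = {Doe, Charles and Potter, William}
% Match 1: $3 = Doe,    $4 = Ch, $5 = arles
% Match 2: $3 = Potter, $4 = W,  $5 = illiam
@book{Book1,
  author    = {family=Doe, given=Charles, given-i={Ch} and family=Potter, given=William, given-i={W}},
  title     = {An Important Book},
  publisher = {Publisher},
  date      = {2012},
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true,backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{Book1}
\end{document}
```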

\documentclass{article}
\usepackage{filecontents}
 \begin{filecontents}{\jobname.bib}
  % Extended name format
  @book{Book1,
  author = {Doe, Charles and Potter, William},
  title  = {An Important Book},
  publisher = {Publisher},
  date = {2012},
}
@book{Book2,
  author = {Dolittle, Charlotte},
  title  = {Another Important Book},
  publisher = {Publisher},
  date = {2014},
}
@book{Book3,
  author = {Smith, Theodore},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}

@book{Book4,
    address = "Turnhout",
    author = "Arnoux, Mathieu and Gazeau, Véronique and Demetz, Christine",
    publisher = "Brepols",
    shorttitle = "Les chanoines réguliers de la province de Rouen",
    title = "{Des clercs au service de la réforme. Études et documents sur les chanoines réguliers de la province de Rouen}",
    year = "2000"
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true, backend=biber]{biblatex}

\makeatletter
\def\do#1{%
  \ifcsundef{blx@csv@datamodel@names}
    {\csdef{blx@csv@datamodel@names}{#1}}
    {\csappto{blx@csv@datamodel@names}{,#1}}}
\dolistcsloop{blx@datamodel@names}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[foreach=\blx@csv@datamodel@names]{
      \step[fieldsource=\regexp{$MAPLOOP}, 
            match=\regexp{(\A|\s+and\s+)([a-z'\s]+)?\s*(.+?)\,\s+(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w)(.+?)(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$3, prefix=$2, given=$4$5, given-i=\{$4\}}]
    }
  }
}
\makeatother

\addbibresource{\jobname.bib}
\begin{document}

\cite{Book1}

\cite{Book2}

\cite{Book3}

\cite{Book4}
\end{document}

edited Mar 27 '18 at 16:24

answered Mar 24 '18 at 15:20

moewe


Thank you very much. Do you think it would be possible to scan for a "von" part and label it as a prefix? Otherwise names get sorted in the bibliography and indexed as "von Savigny, Friedrich", whereas they should be "Savigny, Friedrich von". – ienissei Mar 27 '18 at 08:49
@ienissei I'll see what I can do later. But this is going to be really tricky. Essentially that would mean re-building BibTeX's name parsing with RegExp in Biber. – moewe Mar 27 '18 at 10:15
Not necessarily. We only need to check for a von part that starts with a lowercase letter, since this is the only case in which it can (reasonably) interfere with the sorting. I am bad with regex, but the single word could probably be caught with a simple ([a-z]+) – ienissei Mar 27 '18 at 10:33
@ienissei You'll also want to catch "de la" and "van de" etc., so we probably need ([a-z\s]+), I'll have a look later if that really works... – moewe Mar 27 '18 at 10:35
@ienissei Have a look at the edit, please. The "von" part should be supported now roughly similar to BibTeX's name parsing. At the moment, this solution can't parse the junior part, though. ... – moewe Mar 27 '18 at 13:35
Thanks a lot. I promise, we won't use the Jr part! I tested it on a bunch of real life entries, and it seems to work perfectly. For completeness' sake, we would need to include the apostrophe character for French names: ([a-z'\s]+)?. But with your solution, we'll be able to fine tune the regex when exceptions come up. Thanks – ienissei Mar 27 '18 at 16:22
@ienissei Thanks. ' is included in the answer now. I noticed that we get warnings about empty name parts now if the name does not have a "von" part. I'm quite sure these can be safely ignored, but they are present in the .blg file, so that's a bit annoying. I tried to also parse the Jr. part, but failed miserably. – moewe Mar 27 '18 at 16:27
Two remaining issues are multiple first names (Carl Friedrich) and composite first names (Jean-Philippe should come out as J.-Ph.). Accented first letters also lose their accent, but perhaps that works with the latest biber build (besides, I don't think anyone would seriously notice that one) – ienissei Mar 27 '18 at 16:38
@ienissei Mhhh, I really expected accented characters to work if input as UTF-8, but apparently that is not the case, bummer. I expected problems with multiple and composite first names as well, but frankly a solution to that is beyond my abilities. The solution also fares badly with brace protection and I assume also if you use macros. RegExp is not really suitable for the latter two. – moewe Mar 27 '18 at 16:45
Will look into it and post anything good I come up with. Thanks for the help so far. – ienissei Mar 27 '18 at 16:48
I posted an answer which seems to solve most of the problems (at least the ones I found), including composite names. I still have no solution for accented initial letters… not sure why biber doesn't like them. – ienissei Mar 27 '18 at 20:45
@moewe At the moment we get many warnings from biber with your code: WARN - Invalid namepart '' found in extended name format name 'author' in entry 'Book1'. Probably some changes are needed for newer biber versions? – andc Jul 03 '22 at 13:04

ienissei · Answer 2 · 2022-09-06T20:20:54.017

Here is another attempt based on @moewe's solution. It should fix some of the problems discovered in comments.

It does not fix the issue with accented letters. I have no idea why they do not get parsed properly into the given-i field [Fix in edit below].

MWE

\documentclass{article}
\usepackage{filecontents}
 \begin{filecontents}{\jobname.bib}
  @book{Book1,
  author = {Savigny, Friedrich Carl von},
  title  = {An Important Book},
  publisher = {Publisher},
  date = {2012},
}
@book{Book2,
  author = {de La Vaissière, Claude-Henri},
  title  = {Another Important Book},
  publisher = {Publisher},
  date = {2014},
}
@book{Book3,
  author = {Durand, Jean-Philippe},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}
@book{Book4,
  author = {de La Boétie, Étienne},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}
@book{Book5,
  author = {Doe, Charles-Édouard},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}
@book{Book6,
  author = {Dolittle, Étienne-Marie},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true, backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\makeatletter
\def\do#1{%
  \ifcsundef{blx@csv@datamodel@names}
    {\csdef{blx@csv@datamodel@names}{#1}}
    {\csappto{blx@csv@datamodel@names}{,#1}}}
\dolistcsloop{blx@datamodel@names}
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[foreach=\blx@csv@datamodel@names]{
      \step[fieldsource=\regexp{$MAPLOOP}, 
            match=\regexp{(\A|\s+and\s+)([a-z'\s]+)?\s*(.+?),\s+(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w(?=\w+-))([\w\s]+)(?:(-)(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w)([\w\s]+))?(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$3, prefix=$2, given=$4$5$6$7$8, given-i={$4$6$7}}]
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{(prefix=,)},
            replace=\regexp{}]
    }
  }
}
\makeatother
\begin{document}
Friedrich Carl von Savigny:\par
\cite{Book1}\medskip
Claude-Henri de La Vaissière:\par
\cite{Book2}\medskip
Jean-Philippe Durand:\par
\cite{Book3}\medskip
Étienne de La Boétie:\par
\cite{Book4}\medskip
Charles-Édouard Doe:\par
\cite{Book5}\medskip
Étienne-Marie Dolittle:\par
\cite{Book6}
\end{document}

As can be noted from the output above:

Names with von parts work properly (no support for Jr parts).
When a person has several first names, only the first one gets abbreviated (this is common practice when using several initial letters for a single name).
When a person has a composed name, including a hyphen, both parts of the name are properly abbreviated.

And, as a workaround for the [now solved] problem with accented letters, the regexp I use tries to minimise the number of cases in which we will use extended name format, so as to preserve native parsing of names (and the accented letters). At the moment, it means that:

Single or multiple first names starting with an accented letter will keep it
Accented initials in composed names will be converted to the non-accented version, as shown in the sample above.

Here is the detailed code, largely inspired by @moewe's:

\def\do#1{%
  \ifcsundef{blx@csv@datamodel@names}
    {\csdef{blx@csv@datamodel@names}{#1}}
    {\csappto{blx@csv@datamodel@names}{,#1}}}
\dolistcsloop{blx@datamodel@names}
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[foreach=\blx@csv@datamodel@names]{
      \step[fieldsource=\regexp{$MAPLOOP}, 
            match=\regexp{(\A|\s+and\s+)([a-z'\s]+)?\s*(.+?),\s+(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w(?=\w+-))([\w\s]+)(?:(-)(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w)([\w\s]+))?(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$3, prefix=$2, given=$4$5$6$7$8, given-i={$4$6$7}}]
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{(prefix=,)},
            replace=\regexp{}]
    }
  }
}

Stripping it down, in case someone else wants to toy with the regexp:

(\A|\s+and\s+): Look for name boundaries ("and" is the separator). Matches $1.
([a-z'\s]+)?: Look for the von part, which only interests us when it is lowercase (as it will influence sorting). Matches $2.
\s*(.+?)\,\s+: Look for the last name, followed by a comma. Matches $3.
(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w(?=\w+-)): Look for the strings that should be used as inits in the first name, or alternatively, take the first letter (\w), but only if the first name is composite (look ahead for \w+-, i.e. letters followed by a hyphen). This allows us to exclude some accented initials from the test, so as to preserve them. Matches $4.
([\w\s]+): Look for the rest of the first name, including any subsequent first names. Matches $5.
(?:(-)(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|\w)([\w\s]+))?: In case the name is a composite one, repeat the last two steps, but look for a starting hyphen (matches $6), then the initial string or letter (matches $7, no need to look ahead here), then the rest of the first name (matches $8).
(?=\Z|\s+and\s+): Look ahead for name boundaries = end.

Also, biber issues warnings when given an empty von part. Hence, we need a second step that looks for prefix=, and clears it.
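
As a worked trace of the breakdown above, here is how the groups should fall out for Book3 of the MWE (Durand, Jean-Philippe), and the name that the two steps then assemble. This is traced by hand from the regex, not taken from Biber's output, so spacing and the way the style finally prints the initials are not shown.

```latex
\documentclass{article}
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
% Input:  author = {Durand, Jean-Philippe}
%   $1 = (empty: start of field)    $2 = (unset: no von part)
%   $3 = Durand
%   $4 = J        (single letter, accepted because "ean-" follows)
%   $5 = ean
%   $6 = -        $7 = Ph        $8 = ilippe
% Step 1 assembles: family=Durand, prefix=, given=Jean-Philippe, given-i={J-Ph}
% Step 2 deletes the empty "prefix=," part, leaving the equivalent of:
@book{Book3,
  author    = {family=Durand, given=Jean-Philippe, given-i={J-Ph}},
  title     = {A very Important Book},
  publisher = {Publisher},
  date      = {2016},
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true,backend=biber]{biblatex}
\addbibresource{\jobname.bib}
\begin{document}
\cite{Book3}
\end{document}
```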

Edit: In the comments below, @andc discovered a bug in the above code and, while I do not have time to investigate it properly, I do have a newer version of the code that does not have this bug. As a bonus, this edit also solves the previous problem with accented first letters.

Here is the code:

% As in the MWE above, this belongs between \makeatletter and \makeatother.
\def\do#1{%
  \ifcsundef{blx@csv@datamodel@names}
    {\csdef{blx@csv@datamodel@names}{#1}}
    {\csappto{blx@csv@datamodel@names}{,#1}}}
\dolistcsloop{blx@datamodel@names}
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[foreach=\blx@csv@datamodel@names,overwrite=true]{%
      \step[fieldsource=\regexp{$MAPLOOP},%
            match=\regexp{(\A|\s+and\s+)([a-z'\s]+)?\s*(.+?),\s+(Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|(?:À|Â|Ä|Ç|É|È|Ê|Ë|Î|Ï|Ô|Ö|Ù|Û|Ü|Ÿ|Æ|Œ)|\w)(.+?)(?:(-(?:Chr?|Th|Ph|[B-DF-HJ-NP-TV-XZ](?:l|r)|(?:À|Â|Ä|Ç|É|È|Ê|Ë|Î|Ï|Ô|Ö|Ù|Û|Ü|Ÿ|Æ|Œ)|\w))(.+?))?(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$3, prefix=$2\relax, given=$4$5$6$7, given-i={$4$6}}]
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{(prefix=,)},
            replace=\regexp{}]
    }
  }
}

With your code there is the problem if we have two authors and name of second one begins with digraph, eg author = {Doe, John and Smith, Friedrich}, we get warning from biber: WARN - Invalid namepart '' found in extended name format name 'author' in entry 'Book1', ignoring and incorrect output: Doe and Fr. An Important Book. Publisher, 2012. — andc, Jul 03 '22 at 13:00
@andc Sorry for the delay, I don't connect very often. If I test the code I wrote in this answer, the bug does happen. If I run my current code, it no longer does. So what I will do is post the new code in an edit. I don't have time to figure out the differences. — ienissei, Sep 06 '22 at 19:59

score 4 · Answer 3 · answered Mar 26 '18 at 10:12

I have slightly modified the above answer to apply to authors and editors and to the abbreviation of Philippe.

\documentclass{article}
\usepackage{filecontents}
 \begin{filecontents}{\jobname.bib}
  @book{Book1,
  author = {Doe, Charles and Potter, William},
  title  = {An Important Book},
  publisher = {Publisher},
  date = {2012},
}
@book{Book2,
  author = {Dolittle, Charlotte},
  title  = {Another Important Book},
  publisher = {Publisher},
  date = {2014},
}
@book{Book3,
  author = {Smith, Philippe},
  title  = {A very Important Book},
  publisher = {Publisher},
  date = {2016},
}

@book{Book4,
    address = "Turnhout",
    editor = "Arnoux, Theodore and Gazeau, Véronique and Demetz, Christine",
    publisher = "Brepols",
    shorttitle = "Les chanoines réguliers de la province de Rouen",
    title = "{Des clercs au service de la réforme. Études et documents sur les chanoines réguliers de la province de Rouen}",
    year = "2000"
}
\end{filecontents}
\usepackage[style=verbose,giveninits=true, backend=biber]{biblatex}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map{
      \step[fieldsource=author, 
            match=\regexp{(\A|\s+and\s+)(.+?)\,\s+(Chr?|Th|Ph|\w)(.+?)(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$2, given=$3$4, given-i=\{$3\}}]
       \step[fieldsource=editor, 
            match=\regexp{(\A|\s+and\s+)(.+?)\,\s+(Chr?|Th|Ph|\w)(.+?)(?=\Z|\s+and\s+)},
            replace=\regexp{$1 family=$2, given=$3$4, given-i=\{$3\}}]
    }
  }
}

\addbibresource{\jobname.bib}
\begin{document}

\cite{Book1}

\cite{Book2}

\cite{Book3}

\cite{Book4}
\end{document}

@ienissei Indeed, but I have to admit that I had forgotten to add the necessary code to the MWE. So my answer in its first version contained the full RegExp in the explanatory bit, but a less powerful RegExp in the MWE. I have since fixed this, and also updated the answer even more. — moewe, Mar 27 '18 at 10:24
