
Using the authoryear-comp style I often cite different authors having the same last name. The biblatex package options

backend=biber,style=authoryear-comp,firstinits=true,uniquename=init

eliminate ambiguities by adding author initials to these citations. The problem is that I also cite some prolific authors who don't consistently use their middle name or hyphenate their first and middle names.

For example in the document below I get the citation:

J.-P. Doe 2005; J. Doe 2006; J. P. Doe 2007; J. P. Doe and Jones 2008; J.-P. Doe and M. Smith 2009; M. Smith and Jones 2011; A. Smith and J. Smith 2011

but there is really only one Doe here. Name inconsistencies for Doe in the reference list are fine as they accurately reflect the bibliographic data. But I'd rather have the citation be:

Doe 2005, 2006, 2007; Doe and Jones 2008; Doe and M. Smith 2009; M. Smith and Jones 2011; A. Smith and J. Smith 2011

Is there an easy way to define author/editor name aliases? The shortauthor field offers one approach, but ideally I'd like to avoid ongoing edits to the bib file.

\documentclass{article}
\usepackage[american]{babel}
\usepackage{csquotes}
\usepackage[backend=biber,style=authoryear-comp,firstinits=true,uniquename=init,sorting=none]{biblatex}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@Book{hyphen,
  author = {Doe, John-Paul},
  title = {A book by John-Paul Doe},
  publisher = {Publisher},
  year = {2005}}
@Book{nomiddle,
  author = {Doe, John},
  title = {Another book by John-Paul Doe},
  publisher = {Publisher},
  year = {2006}}
@Book{nohyphen,
  author = {Doe, John Paul},
  title = {Yet another book by John-Paul Doe},
  publisher = {Publisher},
  year = {2007}}
@Article{initials,
  author = {Doe, J. P. and Jones, J.},
  title = {An article coauthored by John-Paul Doe},
  journal = {Journal title},
  year = {2008}}
@Article{hypheninitials,
  author = {Doe, J.-P. and Smith, M.},
  title = {Another article coauthored by John-Paul Doe},
  journal = {Journal title},
  year = {2009}}
@Article{smithm,
  author = {Smith, Mary and Jones, Jane},
  title = {Article title},
  journal = {Journal title},
  year = {2011}}
@Article{smitha,
  author = {Smith, Anne and Smith, Joe},
  title = {Article title},
  journal = {Journal title},
  year = {2011}}
\end{filecontents}
\addbibresource{\jobname.bib}

\begin{document}
\cite{hyphen,nomiddle,nohyphen,initials,hypheninitials,smithm,smitha}
\printbibliography
\end{document}
Audrey
  • Biber can't know that "Doe, J. P." and "John Paul Doe" are the same person, so I guess that setting something in the document would be too late. – egreg Dec 11 '11 at 00:28
  • @egreg You're probably right. In any case I'd like to minimize bib file edits; the tex file constraint isn't really necessary. I edited the question. – Audrey Dec 11 '11 at 01:34
  • I pondered something similar recently: If the same author has published books under different variants of his name as you've described, should they really be printed differently in the bibliography? For example, if you have a style with the dashed option where recurring names are replaced by a dash, should the variations be treated as completely different names and not be replaced? I chose to unify the names in the .bib file. Probably not 100% correct but an irregularity which can be excused IMO. – Simifilm Dec 11 '11 at 08:45
  • @Simifilm I like the use of a long dash for recurring authors. But I'm guessing any readers using the bibliography data to look-up references would prefer accuracy over consistency. – Audrey Dec 11 '11 at 15:30
  • @Audrey The question is whether we're really talking about accuracy or only pseudo-accuracy. For example, I've often seen that the cover of a book says – for example – "J.P. Doe" while the first page says "John Paul Doe". Another question is whether you actually run the risk of confusing authors when you go for consistency in the bibliography. After all, the main goal of any bibliography is to ensure that your reader can easily find the source you're citing. If you have different entries for "J.P. Doe" and "John Paul Doe", don't you actually make things more difficult for your reader? – Simifilm Dec 11 '11 at 16:13
  • @Simifilm I don't think so. Under firstinits=true all first and middle names are printed as initials. Forcing consistent use of the middle name or hyphen, however, would mark departures from the actual bibliographic data. It would also mean ongoing edits to the bib file. – Audrey Dec 11 '11 at 18:14
  • You can get part of the way with uniquename=mininit. This makes the citations unique without necessarily disambiguating the individual authors. With your example, this gives J.-P. Doe 2005; J. Doe 2006; J. P. Doe 2007; Doe and Jones 2008; Doe and Smith 2009; Smith and Jones 2011; Smith and Smith 2011. Not exactly what you want but it's true that Biber can't tell that semantic (or really, pragmatic ...) equivalents are equal by pure syntax ... – PLK Dec 11 '11 at 18:43
  • @PLK Thanks. I was hoping that the biber.conf file could be used to map author data into shortauthor. Can you clarify why this approach won't work? (And please feel free to respond in the form of an answer.) – Audrey Dec 11 '11 at 20:24
  • The Chicago Manual of Style (§ 17.40) has some pragmatic advice on this point: "When a writer has published under different forms of his or her name, the works should be listed under the name used on the title page---unless the difference is merely the use of initials versus full names. Cross-references are occasionally used." The idea is to let the stylesheet of the journal (etc.) determine whether to use initials or not, but to make it clear to the reader that all the 'Doe's are the same person. I second @Simifilm: standardize the .bib file. – jon Dec 12 '11 at 01:08
  • @jon Thanks - I hadn't thought of consulting any standards. As I stated earlier, the issue here is more with imposing the use of middle initials or hyphens. A more relevant quote from CMS (16, 14.72) appears to agree with you guys: "To assist alphabetization, middle initials should be given wherever known." – Audrey Dec 12 '11 at 01:45
  • @Simifilm Could you convert your comments into an answer? I'm not a fan of the solution, but at least one other user is. – Audrey Dec 12 '11 at 01:47

2 Answers


Here is one approach which gives exactly the result you want. Put this in your biber.conf:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
      <map maptype="field">
        <map_pair map_source="AUTHOR"/>
        <also_set map_field="SHORTAUTHOR" map_origfieldval="1"/>
        <also_set map_field="SORTNAME" map_origfieldval="1"/>
      </map>
      <map maptype="field">
        <map_pair map_source="SHORTAUTHOR" map_match="Doe,\s*J(?:\.|ohn)(?:[ -]*)(?:P\.|Paul)*" map_replace="Doe, John Paul"/>
        <map_pair map_source="SORTNAME" map_match="Doe,\s*J(?:\.|ohn)(?:[ -]*)(?:P\.|Paul)*" map_replace="Doe, John Paul"/>
      </map>
    </maps>
  </sourcemap>
</config>

This follows the new Biber 0.9.8 semantics for maps: the maps are processed sequentially, each depending on the results of the previous one. Within a map you can also use multiple items, but items that change things should not rely on the results of other changes within the same map element. Use separate map elements to do the sequencing. So this first copies AUTHOR to both SHORTAUTHOR and SORTNAME and then, after that, edits these two fields to regularise some specific names. The Biber DEV PDF doc will be updated soon.

It's a rather hairy regular expression, but regular expressions are incredibly useful for faking semantic equivalence. This essentially renders the Does into the canonical "Doe, John Paul" format so that Biber knows that they are the same person. It's probably possible to design a regular expression that is less name-specific, but that will depend on the idiosyncrasies of the name formats. For example, you could remove all "-" characters in names and also, since you are using firstinits, force all names to initials only.

EDIT: There will be a simpler syntax in Biber 0.9.8 like this:

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
      <map>
        <map_step map_field_source="AUTHOR" map_match="Doe," map_final="1"/>
        <map_step map_field_set="SHORTAUTHOR" map_origfieldval="1"/>
        <map_step map_field_set="SORTNAME" map_origfieldval="1"/>
        <map_step map_field_source="SHORTAUTHOR" map_match="Doe,\s*J(?:\.|ohn)(?:[ -]*)(?:P\.|Paul)*" map_replace="Doe, John Paul"/>
        <map_step map_field_source="SORTNAME" map_match="Doe,\s*J(?:\.|ohn)(?:[ -]*)(?:P\.|Paul)*" map_replace="Doe, John Paul"/>
      </map>
    </maps>
  </sourcemap>
</config>
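
For comparison, later biblatex releases (2.0 onwards) allow source maps to be declared in the document preamble with \DeclareSourcemap instead of in biber.conf. What follows is only a sketch of how the map above might be transcribed, not something tested against the example document. It assumes a biblatex/Biber pair recent enough to support \DeclareSourcemap. It also writes the separator class as [\s-]* rather than [ -]*, to avoid depending on how a literal space inside \regexp is read; that substitution is an assumption of this sketch, not part of the original expression.

```latex
% Preamble sketch: place after \usepackage[...]{biblatex}.
% Assumes biblatex 2.0 or later with a matching Biber.
% Step 1: stop processing this map unless AUTHOR contains "Doe,".
% Steps 2-3: copy the untouched AUTHOR value to SHORTAUTHOR and SORTNAME.
% Steps 4-5: regularise the Doe variants in the copies only,
%            so AUTHOR (and hence the bibliography) is left as-is.
\DeclareSourcemap{
  \maps[datatype=bibtex, overwrite]{
    \map{
      \step[fieldsource=author, match=\regexp{Doe,}, final]
      \step[fieldset=shortauthor, origfieldval]
      \step[fieldset=sortname, origfieldval]
      \step[fieldsource=shortauthor,
            match=\regexp{Doe,\s*J(?:\.|ohn)(?:[\s-]*)(?:P\.|Paul)*},
            replace={Doe, John Paul}]
      \step[fieldsource=sortname,
            match=\regexp{Doe,\s*J(?:\.|ohn)(?:[\s-]*)(?:P\.|Paul)*},
            replace={Doe, John Paul}]
    }
  }
}
```

Keeping the map in the preamble means neither the bib file nor a separate biber.conf has to be maintained, at the cost of repeating the map in each document that needs it.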
PLK
  • This is excellent! It works also for my dashed problem. Much better than my approach. – Simifilm Dec 12 '11 at 09:17
  • Hmm, I encountered a strange problem: How do I add several of this kind of replacing commands in the config file? All variants I tried resulted in that only the first replacement was executed. – Simifilm Dec 12 '11 at 09:32
  • You mean you want to do several match/replaces on the same field? – PLK Dec 12 '11 at 09:47
  • I want something like "Aldiss, Brian W." matched to "Aldiss, Brian Wilson" and "Ashley, Mike" matched to "Ashley, Michael" (and probably many more in the future). – Simifilm Dec 12 '11 at 09:51
  • Ah, I see the problem. Looking into it. – PLK Dec 12 '11 at 09:59
  • I have fixed this I think and I'll update the 0.9.8 dev binaries tonight. – PLK Dec 12 '11 at 11:21
  • Thanks a lot. BTW, any special reason why the dev tree isn't available via the Git interface anymore? – Simifilm Dec 12 '11 at 11:45
  • It moved to github: https://github.com/plk/biber – PLK Dec 12 '11 at 12:07
  • @PLK Thanks! Could the replacement expression be mapped to both the shortauthor and sortname fields, with the author field left as-is? – Audrey Dec 12 '11 at 14:16
  • @PLK: Thanks, this works marvelously with the latest build from Git. – Simifilm Dec 12 '11 at 17:31
  • Good. I'm looking at @Audrey's use case but that's rather tricky currently. – PLK Dec 12 '11 at 17:57
  • I have a much generalised way of processing sourcemaps now - if @Simifilm could confirm that the latest git dev branch commit still works for him, I will push to binaries later. I can get the full case to work now and will update the answer when confirmed. – PLK Dec 13 '11 at 11:15
  • @PLK just pulled the latest changes from Git and everything seems to work fine. – Simifilm Dec 13 '11 at 11:58
  • Ok, good. This new method should make lots more things possible. I will update docs and binaries and then the answer later. – PLK Dec 13 '11 at 12:30
  • I've updated the answer with one which should do exactly what you want. You'll need the latest 0.9.8 beta from SourceForge. I haven't updated the PDF docs yet. – PLK Dec 13 '11 at 21:49
  • PDF documentation is now updated in the DEV tree on SF – PLK Dec 14 '11 at 18:45
  • I have in fact simplified and expanded the syntax of this option again and the new format will be in Biber 0.9.8. It is in the current dev binaries and updated above. Will be relevant to @Simifilm too. Sorry to do this but I want it simple to use and flexible before Biber 1.0 ... – PLK Dec 18 '11 at 20:17
  • @PLK No problem. The updates seem to work fine. Thanks again. – Audrey Dec 18 '11 at 23:44

I pondered something similar recently: If the same author has published books under different variants of his name as you've described, should they really be printed differently in the bibliography? For example, if you have a style with the dashed option where recurring names are replaced by a dash, should the variations be treated as completely different names and not be replaced? I chose to unify the names in the .bib file.

I think an important question is whether we're really talking about accuracy here or only pseudo-accuracy. For example, I've often seen that the cover of a book says "J.P. Doe" while the first page says "John Paul Doe".

Another question is whether you really run the risk of confusing authors when you go for consistency in the bibliography in this way. After all, the main goal of any bibliography is to ensure that your reader can easily find the source you're citing. If you have different entries for "J.P. Doe" and "John Paul Doe", don't you actually make things more difficult for your reader?

So if it's only a question of initials vs. full names and the like I would opt for actually standardizing the .bib file.

Simifilm