Authors with abbreviated and full first names - how to deal with the mess in biblatex?

Question

I am quite sure someone has come across this madness before; I am just not sure how to ask Uncle Google. When downloading citations to Zotero, JabRef, or any other citation manager, some journals give first names only as initials (e.g. B. K.), while others provide full names. The result is a mixed-up database in which a particular author has a full first name in some entries and only an abbreviated one in others. This causes issues in e.g. biblatex, because it considers Doe, John and Doe, J. two different authors. That leads to behaviour such as first-name initials appearing in in-text citations, which one has to suppress with uniquename=false, which in turn seems to cause maxcitenames=3, mincitenames=1 to be ignored (probably still to resolve non-uniqueness). It is a very annoying problem. Obviously, one way is to go through the references manually one by one and fix them, but this is very tedious.

Both JabRef and Zotero have tools that deal with duplicate entries. JabRef, in addition, has a way to show errors in the bib database. But how do you deal with this author mess automatically? Ideally I would envisage a tool that shows you the bib items that share an author's last name and first initial and lets you choose: "OK, I want to copy the full first name to these entries". Any other sensible approach would do too, as long as it avoids spending hours modifying entries by hand and ideally keeps the full first names as much as possible.

(I feel this might be considered off-topic to certain overly-concerned individuals, but I think it is very much to the point of TeX bibliography and bibtex specifically)

I don't think there really is a way to deal with this automatically. I just fix things incrementally whenever they arise. There's generally no need to fix your entire .bib file at once, since any one paper you write is likely using a small subset of it. — Alan Munn, Mar 13 '24 at 16:22
uniquename should usually not influence min/maxcitenames that much. For the number of names the uniquelist feature is usually much more relevant. See for example https://tex.stackexchange.com/q/69028/35864. — moewe, Mar 13 '24 at 20:44
But to the real question: This is tricky. Assuming the data is good (which it is not always: https://tex.stackexchange.com/q/386053/35864), which is to say it matches the publication metadata provided on the publication itself I can see arguments both for preserving names as given in the work itself and normalising to a "default" form. I'm not sure that this is something that can be done programmatically. Technically speaking you'd have to verify that "D. E. Knuth" is actually "Donald E. Knuth" if you want to normalise both to the same name. — moewe, Mar 13 '24 at 20:48

score 2 · Answer 1 · answered Mar 13 '24 at 20:56

You can write this functionality yourself, for example using the Perl library Text::BibTeX, which provides parsing functionality for .bib files.

The code below performs two passes over an input .bib file. In the first pass it finds the longest first name for the combination of last name+initial for each author of each entry. In the second pass it replaces all first names by the long version stored in the first pass.

There is some bookkeeping to reconstruct each name from its last name, first name, prefix ('von' part) and suffix ('jr' part). It was unclear to me whether an existing Text::BibTeX::Name object can be modified in one part only (here the first name) while keeping the other parts the same, so I decided to build a name string out of the parts and join all names of an entry manually at the end.

Code:

use strict;
use warnings;
use Text::BibTeX;

die "usage: $0 inputfile.bib outputfile.bib\n" unless @ARGV == 2;
my ($infile, $outfile) = @ARGV;
my %namemap;

# join the tokens of one name part into a string; absent parts give ''
sub part_string {
    my ($name, $part) = @_;
    return join(' ', grep { defined } $name->part($part));
}

# first pass: store the longest first name for each last name+initial combination
my $bibfile = Text::BibTeX::File->new($infile) or die "error: cannot open $infile\n";
while (my $entry = Text::BibTeX::Entry->new($bibfile)) {
    # loop over all authors
    foreach my $name ($entry->names('author')) {
        my $first = part_string($name, 'first');
        my $key = part_string($name, 'last') . '|' . substr($first, 0, 1);
        # store the first name if the key is new or this first name is longer
        if (!exists $namemap{$key} || length($first) > length($namemap{$key})) {
            $namemap{$key} = $first;
        }
    }
}
$bibfile->close;

# second pass: replace the names
$bibfile = Text::BibTeX::File->new($infile) or die "error: cannot open $infile\n";
my $newfile = Text::BibTeX::File->new(">$outfile") or die "error: cannot write $outfile\n";
while (my $entry = Text::BibTeX::Entry->new($bibfile)) {
    my @newnames;
    # loop over all authors
    foreach my $name ($entry->names('author')) {
        my $last = part_string($name, 'last');
        my $von  = part_string($name, 'von');
        my $jr   = part_string($name, 'jr');
        # find the first name stored in the first pass
        my $first = $namemap{ $last . '|' . substr(part_string($name, 'first'), 0, 1) };
        # reconstruct the name as "von last, jr, first", omitting empty parts
        my $fullname = length($von) ? "$von $last" : $last;
        $fullname .= ", $jr"    if length $jr;
        $fullname .= ", $first" if length $first;
        # add reconstructed name to list of authors
        push @newnames, $fullname;
    }
    # set the author field (all authors separated by "and") if there are authors
    $entry->set('author', join(' and ', @newnames)) if @newnames;
    # write the entry to the new bib file
    $entry->write($newfile);
}
$newfile->close;

Example input .bib file:

@misc{doeabbr,
   author = {J. Smith and J. Doe and C. Doe},
   title = {The Shorter The Better},
   year = {2009}
}
@misc{doefull,
   author = {Doe, John and Smith, Jane Mary and Doe, Charlie},
   title = {Full Names Are Important},
   year = {2010}
}

Command line:

perl script.pl inputfile.bib outputfile.bib

Resulting output file:

@misc{doeabbr,
  author = {Smith, Jane Mary and Doe, John and Doe, Charlie},
  title = {The Shorter The Better},
  year = {2009},
}
@misc{doefull,
  author = {Doe, John and Smith, Jane Mary and Doe, Charlie},
  title = {Full Names Are Important},
  year = {2010},
}
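
Note that the longest-name rule merges blindly: if the database also contained a Jane Doe, the entry J. Doe could be expanded to the wrong person, which is the concern raised in the comments. A review step along the lines the question asks for could list the candidate groups first. The following is only a rough, standalone Python sketch of that idea, not part of the Perl script. The function names (split_name, is_initial, review_candidates) are made up for illustration, the name splitting is deliberately naive (no 'von'/'jr' parts, no braced names, no hyphenated initials), and the extra "Doe, Jane" entry is a hypothetical addition to the example data.

```python
import re
from collections import defaultdict


def split_name(name):
    """Split 'Last, First' or 'First Last' into (last, first).

    Simplification: 'von' and 'jr' parts and braced names are not handled.
    """
    name = " ".join(name.split())
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        first, _, last = name.rpartition(" ")
    return last, first


def is_initial(token):
    """True for single-letter tokens such as 'J.' or 'J'."""
    return len(token.rstrip(".")) == 1


def review_candidates(author_fields):
    """Group first-name variants by (last name, initial) and classify each group."""
    variants = defaultdict(set)
    for field in author_fields:
        for name in re.split(r"\s+and\s+", field.strip()):
            last, first = split_name(name)
            if first:
                variants[(last, first[0])].add(first)

    report = {}
    for key, firsts in variants.items():
        # distinct spelled-out leading first names, e.g. {'John', 'Jane'}
        full = {f.split()[0] for f in firsts if not is_initial(f.split()[0])}
        if len(firsts) == 1:
            status = "consistent"   # only one spelling, nothing to do
        elif len(full) <= 1:
            status = "mergeable"    # initials plus at most one spelled-out name
        else:
            status = "ambiguous"    # several different full names share the initial
        report[key] = (status, sorted(firsts))
    return report


# author fields from the example above, plus a hypothetical "Doe, Jane"
sample = [
    "J. Smith and J. Doe and C. Doe",
    "Doe, John and Smith, Jane Mary and Doe, Charlie",
    "Doe, Jane",
]
for (last, initial), (status, firsts) in sorted(review_candidates(sample).items()):
    print(f"{last}, {initial}.: {status}: {firsts}")
```

Groups reported as ambiguous are the ones to check by hand before letting any script overwrite the first names; the mergeable groups are the cases where the longest-name rule is at least not contradicted by the database itself.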
