
I asked this question on Stack Overflow a few weeks ago, but have not gotten a usable answer, even after a colleague added a 150-point bounty.

Is it possible to query the DOI for each record in a table of citations?

I have a table (CSV version) that includes the last name of the first author, the title, journal, year, and page numbers for each citation. I expect most (>90%) of the rows to have a valid DOI, but using the [simple query uploader at CrossRef], I get a hit rate of only ~7%.

There is also an XML-based query format that allows fuzzy matching, but this seems to have a limit on the number of queries that can be sent at one time.

The table is currently in MySQL, but starting with the .csv file would be a great help.

1 Answer


This is an open problem. There are better and worse ways to attack it, but start by reading Karen Coyle's summary of the problem. The bibliography attached to that article is excellent as well.

In short, the problem of quantifying sameness between two bibliographic records is hard, and a substantial amount of machine-learning research has centered on this topic.
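
To make "quantifying sameness" concrete, here is a minimal sketch of a naive baseline, not one of the research methods alluded to above. It scores two citation records by comparing each field with Python's standard `difflib.SequenceMatcher` and taking a weighted average. The field names (`author`, `title`, `journal`, `year`, `pages`), the helper names, and the weights are all assumptions made up for illustration; they would need tuning against records whose DOI you already know.

```python
from difflib import SequenceMatcher

# Hypothetical weights (they sum to 1.0); the title is assumed to carry
# the most signal. These values are illustrative, not derived from data.
WEIGHTS = {"author": 0.2, "title": 0.5, "journal": 0.1, "year": 0.1, "pages": 0.1}


def normalize(text):
    """Lowercase, replace non-alphanumerics with spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in str(text))
    return " ".join(cleaned.split())


def field_similarity(a, b):
    """Return a similarity in [0, 1] for two field values.

    A missing value on either side scores 0.0, so absent data is never
    counted as agreement.
    """
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-field similarities between two records (dicts)."""
    return sum(
        weight * field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
```

One way to use it: for each row of the table, fetch a handful of candidate records from whatever lookup service you have, score each candidate with `record_similarity`, and accept the best one only if it clears a threshold you have validated by hand. Character-level matching like this is easily fooled by abbreviated journal names and reordered page ranges, which is part of why the general problem remains hard.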

meawoppl