
I need a way to map shortened or alternative forms of a given name back to the original name.

Example: Tom -> Thomas, Ben -> Benjamin

Are there any datasets that you know of that could help with this?

  • Have you checked all the answers on this site with questions about names? –  Mar 30 '18 at 07:36
  • namepedia and incompetech offer this, though I suspect you'll have to do all of the heavy lifting yourself. namepedia example: http://www.namepedia.org/en/firstname/Albert/ and any search on https://incompetech.com/named/ will show variations if they exist. – albert Apr 04 '18 at 21:20
  • duplicate? https://opendata.stackexchange.com/q/9777/1511 (Please flag it if so) – philshem Apr 05 '18 at 19:32
  • I checked that and found the answer to be lacking. It does not give a large enough dataset. – Myles Hollowed Apr 10 '18 at 22:49

2 Answers


Using the keywords "hypocoristic" and "diminutive", one can find the following links:

Finally, there is diminutives.db on GitHub:

The databases of diminutives, male_diminutives.csv and female_diminutives.csv, are manually-edited versions of data that was automatically extracted from Wiktionary by the PHP script bin/generate_diminutives_csv.php.
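
As a rough sketch of how such a CSV could be turned into a lookup: the snippet below assumes each row holds a full name followed by its diminutives, which is an assumption about the file layout and should be checked against the actual files. The sample rows and the function name `load_diminutives` are made up for illustration. Each diminutive maps to a set of names, because one short form can belong to several full names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The assumed layout (full name first, then its
# diminutives) must be verified against the real CSV files.
SAMPLE_CSV = """thomas,tom,tommy
benjamin,ben,benny,benji
benedict,ben
"""


def load_diminutives(csv_file):
    """Map each diminutive to the set of full names it may stand for."""
    lookup = defaultdict(set)
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        # Normalize case and whitespace, then split off the first column.
        full_name, *diminutives = (cell.strip().lower() for cell in row)
        for diminutive in diminutives:
            if diminutive:
                lookup[diminutive].add(full_name)
    return dict(lookup)


lookup = load_diminutives(io.StringIO(SAMPLE_CSV))
print(sorted(lookup["tom"]))
print(sorted(lookup["ben"]))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by something like `open("male_diminutives.csv", newline="", encoding="utf-8")`.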

Stanislav Kralin
  • Links are broken; I believe a fork of the original project is located at https://github.com/jonathanhar/diminutives.db – rinogo Nov 23 '21 at 15:30

It sounds like you may be looking for what librarians call an "authority file". You can use the links posted above as a starting point, but you may find that not every diminutive maps back to a single given name, and that some people prefer to be known by the diminutive rather than the given name.

Usually, your goal is to standardize names (hence, "authority file") rather than insist that they fit a specific pattern.

Ari Davidow
  • How can I come by or create an authority file? – Myles Hollowed Apr 10 '18 at 22:48
  • An authority file can be almost anything convenient--a text file, a database, a key-value store. The longer answer is that it has to be maintainable, and it has to be something that can return the "authoritative" answer when it receives a query. I think I'd use key-value pairs, built from the sources that were cited earlier. That would be simplest given the data. You could also use any triple store tool--that would let you indicate relationships more nuanced than just "variant" and "authoritative version." (For the authoritative version, you would store that authoritative name in both fields.) – Ari Davidow Apr 10 '18 at 23:39
  • Thanks, but what you're saying kind of boils down to "create your own database". I'm looking for actual data that's available for download – Myles Hollowed Apr 11 '18 at 19:40
  • I hope that's not what I'm saying. What I mean is that once you get the data, you will have to do something with it if you wish to use it. That may be entering each of the entries as a key/value pair in a search engine, or other type of lookup (aforementioned triples or whatever). The lone exception is if the data aren't actionable--they are simply desired as a personal lookup. – Ari Davidow Apr 12 '18 at 20:23
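
The key-value approach described in the comments above could look roughly like this. It is only a minimal sketch: the entries, the `AUTHORITY` dictionary, and the `authoritative` function are hypothetical, and a real file would be built from whichever downloaded dataset is used. Following the comment, an authoritative name is stored as both key and value, so a query for "thomas" and a query for "tom" return the same answer.

```python
import json

# Hypothetical entries: variant -> authoritative name.
# An authoritative name maps to itself, so every known name resolves.
AUTHORITY = {
    "thomas": "thomas",
    "tom": "thomas",
    "tommy": "thomas",
    "benjamin": "benjamin",
    "ben": "benjamin",
}


def authoritative(name, authority=AUTHORITY):
    """Return the authoritative form of a name, or None if it is unknown."""
    return authority.get(name.strip().lower())


# A plain JSON text file is one maintainable way to store the pairs.
serialized = json.dumps(AUTHORITY, indent=2, sort_keys=True)
restored = json.loads(serialized)

print(authoritative("Tom"))
print(authoritative("Thomas"))
print(authoritative("Zed"))
```

A single value per key forces one choice for ambiguous short forms such as "ben"; a triple store, as mentioned in the comment, would be one way to record more than one relationship.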