I'm looking for a name/gender dataset. Given a first name, I need to infer whether it is male or female. There are many online services, like this, or this. I need the data as JSON or XML so that I can store it locally.

János

1 Answer

My answers (one & two) on given names and country of origin also include gender guesses.

The file is a little cryptic, but it's the most complete dataset that's openly available.

The best source of international human given (first) names comes from a German computer magazine. The text file has nearly 50k names, each classified by likely gender and by how popular it is in each country. It's carefully curated and has a friendly license (GNU Free Documentation License).

The file can be downloaded here: ftp://ftp.heise.de/pub/ct/listings/0717-182.zip (name_dict.txt contains the data).


Depending on which programming language you are using, there are libraries either built on this dataset or on others.

See, for example, gender-guesser (github), a Python library formerly called "sex machine" that is based on the above file.

>>> import gender_guesser.detector as gender
>>> d = gender.Detector()
>>> print(d.get_gender(u"Bob"))
male
>>> print(d.get_gender(u"Sally"))
female
>>> print(d.get_gender(u"Pauley")) # should be androgynous
andy

You can also specify country:

>>> print(d.get_gender(u"Jamie"))
mostly_female
>>> print(d.get_gender(u"Jamie", u'great_britain'))
mostly_male

That code doesn't give you JSON/XML, but it parses the raw .txt file from above into a Python object. You can probably hack detector.py to give you all names as JSON.
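If you'd rather not patch detector.py, here is a minimal sketch of a JSON export that relies only on the public get_gender call shown above. The helper name names_to_json is hypothetical (it is not part of gender-guesser), and it assumes you supply your own list of names instead of dumping the whole dataset.

```python
import json


def names_to_json(names, guess):
    """Return a JSON string mapping each name to guess(name).

    `guess` is any callable that takes a name and returns a label,
    for example the get_gender method of a gender-guesser Detector.
    """
    result = {name: guess(name) for name in names}
    # ensure_ascii=False keeps non-ASCII names (e.g. "János") readable
    return json.dumps(result, ensure_ascii=False, indent=2, sort_keys=True)


# With gender-guesser installed, usage could look like this:
#   import gender_guesser.detector as gender
#   print(names_to_json(["Bob", "Sally"], gender.Detector().get_gender))
```

Write the returned string to a file and you have a local lookup table. Passing the guesser in as a callable also means you can swap in a different library later without touching the export code.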

This is just one port of the raw data. I'm sure there are others.

philshem