Where can I get a corpus of human names, organized by country, that I can use to train a neural network to detect human names in a string?
Does anyone have an idea?
-
First and/or last names? Please [edit] – Jul 12 '17 at 10:06
-
Both ... if there is any corpus available then it will be ideal for my case – Jaffer Wilson Jul 12 '17 at 10:23
-
Wikidata/DBpedia? How many records do you need? In English only? – Stanislav Kralin Jul 12 '17 at 13:33
-
@StanislavKralin As many as I can get. There is no limit on the data, since training needs more and more of it, say GBs or TBs, so the number of records is not a concern. If you know of a large source, just tell me where to get it. – Jaffer Wilson Jul 13 '17 at 02:25
-
Well, Wikidata contains about 3 million people... This query clarifies the structure of the data. Most likely, you need to download a (partial) dump to extract all these records. – Stanislav Kralin Jul 13 '17 at 07:46
-
I think this will help. Can you show me how to get the complete database instead of the partial one, please? – Jaffer Wilson Jul 13 '17 at 07:55
-
See also other questions with the [tag:names] tag. From this answer: Fake Name Generator. – Stanislav Kralin Jul 13 '17 at 09:23
-
Possible duplicate of Multinational list of popular first names and surnames? – Jul 26 '17 at 14:08
-
@jknappen That question does not seem to be a duplicate. It is for separate lists, while this one is for the name combination. – Jul 27 '17 at 07:26
-
Related: https://opendata.stackexchange.com/questions/13116/recent-dump-of-names-from-facebook-com-directory – Adam Bittlingmayer Aug 10 '18 at 12:20
-
@A.M.Bittlingmayer Did you visit your suggested link first? The page you referenced is not found, so there is no point in sharing it. – Jaffer Wilson Aug 10 '18 at 12:21
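To make the Wikidata suggestion from the comments more concrete, here is a minimal Python sketch that only builds a SPARQL query string for people of one country with a recorded given name and family name. It is not the query linked in the comment above (that link is not preserved here); the function name `build_people_query` and its parameters are hypothetical. It assumes the standard Wikidata identifiers P31 (instance of), Q5 (human), P27 (country of citizenship), P735 (given name) and P734 (family name). For millions of records a dump is still likely the better route, as the comment says.

```python
import re


def build_people_query(country_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for humans with a given and family name.

    `country_qid` is a Wikidata item ID such as "Q55" (Netherlands).
    This helper is illustrative only; it does not contact any server.
    """
    # Accept only well-formed item IDs so nothing else ends up in the query.
    if not re.fullmatch(r"Q[1-9]\d*", country_qid):
        raise ValueError(f"not a Wikidata item ID: {country_qid!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")

    return (
        "SELECT ?person ?givenNameLabel ?familyNameLabel WHERE {\n"
        "  ?person wdt:P31 wd:Q5 ;\n"               # instance of: human
        f"          wdt:P27 wd:{country_qid} ;\n"   # country of citizenship
        "          wdt:P735 ?givenName ;\n"         # given name
        "          wdt:P734 ?familyName .\n"        # family name
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Print a small query for the Netherlands to paste into a SPARQL client.
    print(build_people_query("Q55", limit=10))
```

The resulting string can be pasted into the Wikidata Query Service; keeping `limit` small avoids timeouts on the public endpoint.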
2 Answers
UK first names are available from birth registration data from the ONS here (annual files are available and there is also data going back a lot of years). Trends and visualisation of some of this data are available on my Tableau Public page.
Data from the USA are also available. A good place to start is here. This dataset is also accessible from several other sources including an open dataset in Google's BigQuery (if you have an account then it saves a lot of downloading and structuring).
These generally only give the prevalence of first names for privacy reasons, but that is a good place to start.
Since 2014, the Dutch social insurance bank (SVB) has published lists of the most popular boys' and girls' names:
for boys: http://svbvod.download.kpnstreaming.nl/kindernamen-2016/jongens-populair-2016.pdf
for girls: http://svbvod.download.kpnstreaming.nl/kindernamen-2016/meisjes-populair-2016.pdf
For 2014 or 2015, change the years in the links.
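As a small sketch of the year substitution described above, the following Python helper builds both URLs for a given year. The function name `svb_name_list_urls` is hypothetical, and the pattern is simply inferred from the two 2016 links; only 2014 to 2016 are accepted because those are the only years this answer mentions. The code does not download anything or check that the files still exist.

```python
from typing import Dict

# Host taken from the 2016 links quoted in this answer.
BASE_URL = "http://svbvod.download.kpnstreaming.nl"

# Years the answer says are available; other years are an unknown.
KNOWN_YEARS = (2014, 2015, 2016)


def svb_name_list_urls(year: int) -> Dict[str, str]:
    """Return the boys' and girls' PDF URLs for one year.

    Assumes the year appears twice in each URL, as in the 2016 links.
    """
    if year not in KNOWN_YEARS:
        raise ValueError(f"year {year} is not covered by the answer")
    folder = f"{BASE_URL}/kindernamen-{year}"
    return {
        "boys": f"{folder}/jongens-populair-{year}.pdf",
        "girls": f"{folder}/meisjes-populair-{year}.pdf",
    }


if __name__ == "__main__":
    for y in KNOWN_YEARS:
        for group, url in svb_name_list_urls(y).items():
            print(y, group, url)
```

Note that the lists are PDFs, so a separate text-extraction step would be needed before the names can be used as training data.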
Another source is a research institute in the Netherlands that keeps track of every single name in the country. In 2010 they published every first name that occurs more than 500 times in the Netherlands: http://www.meertens.knaw.nl/nvb/downloads/Top_eerste_voornamen_NL_2010.zip
They also have an interface which allows you to view the occurrence of names, but you can't use it to download bulk data: http://www.meertens.knaw.nl/nvb/english