30

Wikidata is a new Wikimedia project:

It centralizes access to and management of structured data, such as interwiki references and statistical information.

This data would be of enormous interest to the Open Data community. Does anyone know if it is possible to download a complete dump of the Wikidata database?

Update: Thanks to everyone who has pointed me to the Wikidata wiki dumps. However, I am interested in a more usable data format that doesn't require parsing to get to the actual data. An RDF dump, for example, would do nicely.

Patrick Hoefler
  • Not sure if you noticed that you can already download some data dumps: http://dumps.wikimedia.org/wikidatawiki/ Most of these files lack a decent description, but as you can see, they're actively uploading new documents. – r_31415 May 09 '13 at 04:25
  • @RobertSmith, thanks for the link! As far as I understand, these dumps contain the Wikidata wiki, but not the Wikidata data – I know, it sounds a little confusing ;) Am I wrong? If so, which one of the dumps is the one with the relevant data? – Patrick Hoefler May 09 '13 at 08:57
  • btw have you checked this answer? – RSFalcon7 May 09 '13 at 09:05
  • Thanks, @RSFalcon7! However, I'm interested in the data from the Wikidata project, not the database dumps from Wikipedia – though they can also be very helpful in certain cases. – Patrick Hoefler May 09 '13 at 09:39
  • @PatrickHoefler I think you're right but it seemed to me they are just uploading their own data first. – r_31415 May 10 '13 at 15:10
  • It's indirect, but have you looked at the dumps available from dbpedia? – Joe Jun 03 '13 at 21:05
  • @Joe Thanks for the tip, I know (and use) DBpedia. Its "indirectness" is exactly why I'm interested in the Wikidata dump :) – Patrick Hoefler Jun 03 '13 at 21:16
  • Do you really need the entire dump? If you only need a small subset of the data, using the MediaWiki API for Wikibase would probably be easier. In that case, this question should use the [mediawiki] and/or [mediawiki-api] tag. – Nemo Apr 29 '15 at 09:30
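
For the small-subset case Nemo mentions, here is a minimal sketch using the Wikibase API with only the Python standard library; the item ID is just an example:

    import json
    import urllib.parse
    import urllib.request

    # Fetch a single entity via the Wikibase API instead of downloading a full dump.
    params = urllib.parse.urlencode({
        "action": "wbgetentities",
        "ids": "Q64",              # Q64 = Berlin; any item ID works
        "props": "labels|claims",  # restrict the response to what you need
        "languages": "en",
        "format": "json",
    })
    url = "https://www.wikidata.org/w/api.php?" + params
    req = urllib.request.Request(url, headers={"User-Agent": "wikidata-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)

    print(data["entities"]["Q64"]["labels"]["en"]["value"])  # -> "Berlin"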

8 Answers

12

The Wikidata dump is already available. As of now, the latest (mostly) complete dump is from 5 May 2013, and it includes a dump of the pages in the important namespaces (pages-articles.xml).

svick
  • Thanks for the links! However, I'm not interested in the dumps of the wiki pages, but in the dumps of the data collected by the Wikidata project. – Patrick Hoefler May 22 '13 at 17:09
  • @PatrickHoefler Well, that data is in that dump too. Wikidata stores its data as JSON page content. For example, if you look at the XML, the third <page> there is Q15 (Africa). – svick May 22 '13 at 17:15
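
Building on svick's comment, here is a minimal sketch of pulling the entity JSON out of such a dump; the file name and the export-namespace version are assumptions and may differ for your dump:

    import bz2
    import json
    import xml.etree.ElementTree as ET

    # MediaWiki export namespace; the version number varies between dumps.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    with bz2.open("wikidatawiki-pages-articles.xml.bz2", "rb") as dump:
        for _, elem in ET.iterparse(dump):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                text = elem.findtext(NS + "revision/" + NS + "text")
                if title and title.startswith("Q") and text:
                    try:
                        entity = json.loads(text)  # item pages store JSON as their page text
                        print(title, entity.get("labels"))
                    except ValueError:
                        pass  # not every page in the dump is an entity
                elem.clear()  # keep memory usage flat while streaming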
12

Wikidata RDF dumps have been available since 3 August 2013; see the announcement at http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg02553.html

If you're interested in topics related to Wikidata, I'd suggest subscribing to their mailing list.

Of course, both Freebase and DBpedia have had much larger dumps available for years.
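
To show how little machinery a first look at such a dump requires, here is a sketch that streams a gzipped N-Triples file (the file name is a placeholder) without any RDF library:

    import gzip
    from collections import Counter

    predicates = Counter()
    with gzip.open("wikidata-dump.nt.gz", "rt", encoding="utf-8") as dump:
        for line in dump:
            # Naive N-Triples split: subject, predicate, rest of the line.
            parts = line.split(" ", 2)
            if len(parts) == 3:
                predicates[parts[1]] += 1

    # The ten most common predicates give a quick feel for the data.
    for predicate, count in predicates.most_common(10):
        print(count, predicate)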

Tom Morris
7

The Wikidata database is currently offered for download in several different formats:

  • JSON dumps
  • RDF dumps
  • XML dumps

Further information is available at Wikidata:Database download.
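
For example, the JSON dump can be processed line by line; this sketch assumes the current latest-all.json.gz layout (one large JSON array, one entity per line, each line ending in a comma):

    import gzip
    import json

    with gzip.open("latest-all.json.gz", "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue  # skip the array brackets and blank lines
            entity = json.loads(line)
            label = entity.get("labels", {}).get("en", {}).get("value")
            print(entity["id"], label)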

Patrick Hoefler
5

Depending on your use case, the Wikidata Toolkit may be what you need: it is a Java tool that allows you to iterate over all the items in a dump. In fact, it auto-downloads the latest dump and translates the entire model into Java objects. This means everything is abstracted away, and you just have to write the callback to be applied to each item. I wrote a tutorial on how to use it in conjunction with Python pandas here.
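
The Toolkit itself is Java, but the callback pattern it offers is easy to picture; here is a rough Python analogue over a JSON dump (not the Toolkit's actual API, and the file name is a placeholder):

    import gzip
    import json

    def process_dump(path, callback):
        """Stream a Wikidata JSON dump and invoke callback(entity) for each entity."""
        with gzip.open(path, "rt", encoding="utf-8") as dump:
            for line in dump:
                line = line.strip().rstrip(",")
                if line and line not in ("[", "]"):
                    callback(json.loads(line))

    counts = {"items": 0, "properties": 0}

    def count_entity_types(entity):
        # The iteration is abstracted away; only this callback is user code.
        kind = "items" if entity["id"].startswith("Q") else "properties"
        counts[kind] += 1

    process_dump("latest-all.json.gz", count_entity_types)
    print(counts)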

notconfusing
3

Max Klein describes three ways of getting the data at 3 Ways To Access Wikidata Data Until It Can Be Done Properly.

Not mentioned is the WDA tool.

vanthome
3

As you already found out, the dumps of Wikidata are readily available and can be downloaded and unzipped in a few hours. Getting a proper RDF database server running with such a dump is another story.

At http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData you'll find a description of some successful and failed attempts, as well as performance reports.

Importing Wikidata dumps is a moving target: something that works today in a certain environment might not work in a few weeks or months.

As of June 2020, a successful import has been reported by Jonas Sourlier in https://issues.apache.org/jira/browse/JENA-1909

Please note that Jonas switched to the more recent loader, just as I did. My recent attempt, described in https://issues.apache.org/jira/browse/JENA-1908, failed.

I hope the fourth attempt, which I am currently documenting, will be successful; you could then simply follow the procedure described on the wiki page (which I'll copy here once the attempt succeeds). Please stay tuned ...

Update: http://wiki.bitplan.com/index.php/WikiData_Import_2020-08-15 reports a successful attempt.

3

In addition to what is described in the other answers, there is a Wikidata RDF dump, which is used for the Wikidata Query Service. The format is described in the RDF data format article.
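
Since that dump is what backs the Wikidata Query Service, a quick way to explore the same data without downloading anything is its public SPARQL endpoint; a minimal sketch using only the standard library:

    import json
    import urllib.parse
    import urllib.request

    # Five humans (Q5) with the occupation "astronaut" (Q11631), with English labels.
    query = """
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P31 wd:Q5 ;
            wdt:P106 wd:Q11631 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 5
    """

    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    req = urllib.request.Request(url, headers={"User-Agent": "wikidata-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        results = json.load(resp)

    for row in results["results"]["bindings"]:
        print(row["item"]["value"], row["itemLabel"]["value"])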

StasM
2

I have no information about Wikidata, but here are two ways to download structured information from Wikipedia:

Images and uploaded files are stored elsewhere and are also downloadable.

Also answered here

RSFalcon7
  • Thanks for the links! However, I'm especially interested in the data from the Wikidata project, not the database dumps from Wikipedia – though they can also be very helpful in certain cases. – Patrick Hoefler May 09 '13 at 09:33
  • This is not structured information; it's semi-structured. It's mostly prose text, which has no structure but is supplemented with things like infoboxes, which are structured to varying degrees and always require cleanup. – hippietrail Aug 10 '13 at 03:04