I need to access the history of each wiki page. I found that http://dumps.wikimedia.org/fawiki/latest/fawiki-latest-pages-meta-history.xml.7z contains such data for the Persian wiki in XML format. But I want to process the data using MySQL. Is that file available in SQL format too? Where can I find it?
-
Could you clarify what you mean by “visited a page during two consecutive edits”? As far as I know, MediaWiki doesn't store which pages a user visited. – svick May 22 '13 at 11:26
-
@svick I mean the visit log. I want to know how many visitors have visited an article in a specific time period, or to know the visitor name + time of visit for an article. – Real Dreams May 22 '13 at 15:23
-
In that case, the best you can do is to get the page view statistics. – svick May 22 '13 at 17:00
2 Answers
No, there are no SQL versions of the XML dump files.
The page Data dumps/Tools for importing on meta.wikimedia.org describes how to work around that: you can either use importDump.php to import the XML file directly (apparently suitable only for small wikis), or you can use a tool like mwdumper to convert the XML into SQL and then import that.
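For illustration, a conversion-and-import pipeline with mwdumper could look like the sketch below. The file names, database name, and user are placeholders, and it assumes the target MySQL database already contains the MediaWiki 1.5-style page/revision/text tables (e.g., created by a MediaWiki installation), since that is the schema mwdumper's sql:1.5 output targets:

    # Convert the XML history dump into SQL INSERT statements (MediaWiki 1.5 schema)
    java -jar mwdumper.jar --format=sql:1.5 --output=file:fawiki.sql fawiki-latest-pages-meta-history.xml

    # Load the result into the existing MySQL database
    mysql -u wikiuser -p fawiki < fawiki.sql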
-
Thanks a lot. I tried to use mwdumper to convert the XML file to SQL. I executed java -jar mwdumper-1.16.jar --format=sql:1.5 --output=file:xx.sql mzn\mznwiki-20130507-pages-meta-history.xml but it produced an XML file instead of SQL. Is there anything wrong with the command? – Real Dreams May 24 '13 at 02:39
There is already an answer to this question, by @RSFalcon7.
You will find the full explanation on this Wikipedia page about the dumps available in XML or SQL format.
The revision data in SQL format seems to be private, though (no link is provided for direct download), but I couldn't figure out what that implies.
Unfortunately, it does not seem to be available for the Persian wiki either.
Another possibility is to parse the XML yourself and add it to your database; you can then define your own database schema. Python can probably help you do that, as in the sketch below.
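As an illustration, here is a minimal sketch of that approach, streaming the dump with ElementTree so the file never has to fit in memory at once. Everything in it is an assumption: the revision_history table and its columns are a made-up schema, the connection parameters are placeholders, and the namespace URI varies with the dump's export version (check the xmlns attribute at the top of your XML file):

    import xml.etree.ElementTree as ET
    import MySQLdb  # common MySQL driver; mysql.connector would work as well

    NS = "{http://www.mediawiki.org/xml/export-0.8/}"  # match your dump's xmlns

    db = MySQLdb.connect(user="wikiuser", passwd="secret", db="fawiki")
    cur = db.cursor()

    # iterparse streams the file, so the multi-gigabyte dump is never fully in memory
    for event, elem in ET.iterparse("fawiki-latest-pages-meta-history.xml"):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            for rev in elem.iter(NS + "revision"):
                cur.execute(
                    "INSERT INTO revision_history"
                    " (page_title, rev_id, rev_timestamp, contributor)"
                    " VALUES (%s, %s, %s, %s)",
                    (title,
                     rev.findtext(NS + "id"),
                     rev.findtext(NS + "timestamp"),
                     rev.findtext(NS + "contributor/" + NS + "username")))
            elem.clear()  # discard the processed <page> subtree to keep memory flat

    db.commit()

The main advantage over mwdumper is that the schema is entirely yours: you can keep only the columns you need for your analysis.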
-
Mmh, it is an answer. I quoted the answer I used to build mine and added details (in the answer I quote, only the link is given, and I think that is not enough to help you). Did you find what you wanted? – Vince May 22 '13 at 08:32
-
@svick Are you sure? I can read in the description of the XML dumps: "All pages with complete edit history" – Vince May 22 '13 at 11:31
-
@svick I read "Deleted page and revision data. (private)" with indeed no link, as I said in my answer. I also said I don't know what 'private' implies, because I suppose that if you log in (maybe it's private to specific users) you can get access to the link. But if you know more, please give some details. – Vince May 22 '13 at 11:35
-
@Vince I believe “private” here means that it's accessible only to Wikimedia people. – svick May 22 '13 at 11:39
-
Yeah, me too (there is not even a link to log in). That's also why I added in my answer that a Python script can do the job. Thanks for the improvement. – Vince May 22 '13 at 11:43