
I've imported the Wikipedia database in four languages, with the goal of running some machine learning algorithms on it for text classification. The import doesn't populate the "category" table, though. Am I missing something?

I would also like to know: is there a way to map categories across the different language databases? E.g., which category in English corresponds to which category in German?

Thanks!

podzway
    It turns out I had imported only the articles (using mwdumper). The category table (as well as the page table) and the other table dumps can be downloaded here: https://dumps.wikimedia.org/enwiki/latest/. Replace "en" with the desired language code. – podzway Mar 08 '16 at 11:05
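A minimal sketch for collecting the per-table dump URLs for several languages. It assumes the file naming used under https://dumps.wikimedia.org/enwiki/latest/ (`<lang>wiki-latest-<table>.sql.gz`); `dump_urls` is a hypothetical helper, and the table list is just the subset relevant here. These files are MySQL dumps, so they are loaded with the `mysql` client rather than mwdumper.

```python
# Hypothetical helper: builds download URLs for the per-table SQL dumps.
# Assumes the "<lang>wiki-latest-<table>.sql.gz" naming on dumps.wikimedia.org.
TABLES = ("page", "category", "categorylinks", "langlinks")


def dump_urls(lang: str, tables: tuple = TABLES) -> dict:
    """Map each table name to its "latest" dump URL for one language code."""
    base = f"https://dumps.wikimedia.org/{lang}wiki/latest/"
    return {table: f"{base}{lang}wiki-latest-{table}.sql.gz" for table in tables}


if __name__ == "__main__":
    # e.g. English and German; swap in the four language codes you imported.
    for lang in ("en", "de"):
        for table, url in dump_urls(lang).items():
            print(table, url)
```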

1 Answer


For the second question, the answer is in the langlinks table: category pages have a page_id just like articles, so their interlanguage links are stored there too. See more here: https://www.mediawiki.org/wiki/Manual:Langlinks_table
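A minimal sketch of that lookup, assuming the English `page` and `langlinks` tables have been loaded. Category pages live in namespace 14, and `langlinks.ll_from` holds the `page_id` of the page that carries the link. `map_categories`, `demo_connection` and the sample rows are hypothetical, only a few columns are modeled, and `sqlite3` stands in for the MySQL database so the example runs on its own.

```python
import sqlite3

CATEGORY_NS = 14  # MediaWiki namespace number for Category: pages


def map_categories(conn, target_lang):
    """Return (source category title, target-language title) pairs."""
    rows = conn.execute(
        """
        SELECT p.page_title, ll.ll_title
        FROM page AS p
        JOIN langlinks AS ll ON ll.ll_from = p.page_id
        WHERE p.page_namespace = ? AND ll.ll_lang = ?
        ORDER BY p.page_title
        """,
        (CATEGORY_NS, target_lang),
    )
    return rows.fetchall()


def demo_connection():
    # Toy in-memory stand-ins for the English page and langlinks tables.
    # The rows are made-up sample data, not taken from a real dump.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        CREATE TABLE langlinks (ll_from INTEGER, ll_lang TEXT, ll_title TEXT);
        INSERT INTO page VALUES (1, 14, 'Physics'), (2, 0, 'Physics'),
                                (3, 14, 'Chemistry');
        INSERT INTO langlinks VALUES
            (1, 'de', 'Kategorie:Physik'),
            (2, 'de', 'Physik'),
            (3, 'fr', 'Catégorie:Chimie');
        """
    )
    return conn


if __name__ == "__main__":
    # Only the category page (namespace 14) with a German link is returned.
    print(map_categories(demo_connection(), "de"))
```

Against MySQL the query is the same apart from the parameter placeholder style of your connector (often `%s` instead of `?`). Categories without an interlanguage link simply do not appear in the result.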

Ainali
  • Beware, though, that the category structure differs between Wikipedias, which makes mapping categories between them harder than mapping articles. – Pere Mar 04 '17 at 17:23