Pulling Tabular Data from a Website

Question

I am trying to extract data from this website by simply using

data = Import["https://www.whoscored.com/Regions/252/Tournaments/2/Seasons/6829/Stages/15151/TeamStatistics/England-Premier-League-2017-2018", "Data"]

But Mathematica seems unable to pick up the data from the table. How do I pull the actual statistics in order to then analyze/visualize them?

That seems harder because the data is in dynamically generated tables, rather than embedded in the code of the page. — MarcoB, May 09 '18 at 14:38

score 1 · Answer 1 · answered May 09 '18 at 14:59

Here is something that may help. Using Google Chrome's Developer Tools (Network tab), you can follow the requests the page makes as you interact with it. For instance, I clicked on the "Defensive" tab in the first table, which showed that the website sent a request to the following URL:

https://www.whoscored.com/StatisticsFeed/1/GetTeamStatistics?\
category=summaryteam&subcategory=defensive&statsAccumulationType=0&\
field=Overall&tournamentOptions=&timeOfTheGameStart=&timeOfTheGameEnd=\
&teamIds=&stageId=15151&sortBy=tacklePerGame&sortAscending=&page=&\
numberOfTeamsToPick=&isCurrent=true&formation=

I opened that in a browser and noticed that it was essentially just the data needed to generate the new table. I therefore tried to import data from that URL:

tableURL = 
  "https://www.whoscored.com/StatisticsFeed/1/GetTeamStatistics?\
category=summaryteam&subcategory=defensive&statsAccumulationType=0&\
field=Overall&tournamentOptions=&timeOfTheGameStart=&timeOfTheGameEnd=\
&teamIds=&stageId=15151&sortBy=tacklePerGame&sortAscending=&page=&\
numberOfTeamsToPick=&isCurrent=true&formation=";

Import[tableURL, "JSON"] // Short

(* Out: {teamTableStats->{<<1>>},
         paging->{<<1>>},
         statColumns->{apps,<<4>>,offsideGivenPerGame}} *)

This is a nicely organized list of rules that contains the data, from which it should be easier to extract what you need.
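
As a minimal sketch of that extraction step: the snippet below uses a small made-up JSON string in place of the live response, so it runs offline. Only the top-level key teamTableStats is taken from the output above; the per-team keys "name" and "tacklePerGame" and all the values are my assumptions, so check the real keys before relying on them.

```mathematica
(* Made-up stand-in for the feed response; only the key teamTableStats comes from the real output *)
json = "{\"teamTableStats\": [{\"name\": \"Team A\", \"tacklePerGame\": 18.2}, {\"name\": \"Team B\", \"tacklePerGame\": 16.9}], \"statColumns\": [\"apps\", \"tacklePerGame\"]}";

(* "JSON" imports objects as lists of rules, as in the output above *)
data = ImportString[json, "JSON"];

(* One list of rules per team *)
rows = Lookup[data, "teamTableStats"];

(* Assumed column names; replace with keys that actually occur in rows *)
cols = {"name", "tacklePerGame"};
table = Lookup[#, cols] & /@ rows;

TableForm[table, TableHeadings -> {None, cols}]
```

With the real data, replace ImportString[json, "JSON"] by Import[tableURL, "JSON"]; Keys[First[rows]] then lists the fields that are actually available for each team.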

A caveat: the links obtained this way are time- or session-limited, i.e. they expire after a very short while, no doubt to limit data scraping, so a certain amount of processing by hand will still be necessary.
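
To make the by-hand part a little less tedious, the feed URL can be assembled from its query parameters instead of being pasted in each time. This is only a sketch: feedURL is a hypothetical helper name, the parameter names and values are copied from the URL above, and I have not checked whether the server accepts such a request outside a live browser session or which sortBy values go with other subcategories.

```mathematica
(* Hypothetical helper: rebuilds the feed URL shown above for a given subcategory and sort field *)
feedURL[subcategory_String, sortBy_String] :=
  URLBuild[
    "https://www.whoscored.com/StatisticsFeed/1/GetTeamStatistics",
    {"category" -> "summaryteam", "subcategory" -> subcategory,
     "statsAccumulationType" -> "0", "field" -> "Overall",
     "tournamentOptions" -> "", "timeOfTheGameStart" -> "",
     "timeOfTheGameEnd" -> "", "teamIds" -> "", "stageId" -> "15151",
     "sortBy" -> sortBy, "sortAscending" -> "", "page" -> "",
     "numberOfTeamsToPick" -> "", "isCurrent" -> "true",
     "formation" -> ""}];

(* Same parameters as the request captured in the Network tab *)
url = feedURL["defensive", "tacklePerGame"]
```

The result can be passed to Import[url, "JSON"] in the same way as tableURL; if the import fails, the session has probably expired and the page needs to be reloaded in the browser first.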
