I am trying to import data from multiple web pages hosted by a single online source. The source posts one data set per web page, with one page for each week of the year. I would like to import the data for all 52 weeks of a year in one go, rather than modifying my code for each week and importing them one at a time.

Here is my code to import one week's data:

week012012 = 
  Import["http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=01&p=.htm", "Data"]

In case it is relevant, here is the further processing I do with the data after importing it:

week012012B = Cases[week012012, {_, _, _, _, _, _, _, _, _, _, _, _?NumericQ}, ∞];

Grid[week012012B, Frame -> All]

The site uses a consistent naming scheme for each week of the year, and from year to year as well. If I wanted just a few weeks' data, I could manually change the weekend number in the URL from 01 to 02, 03, 04, and so on, but I want all 52 weeks for 2012. The approach I have been playing with is to use string manipulation to modify the URL, then import and save the data for each week. Any suggestions?

m_goldberg
Nguyen Van Falk

1 Answer

(* two-digit week labels "01", "02", ..., "52" *)
wkstrngs = StringJoin /@ Map[ToString, PadLeft[IntegerDigits /@ Range[52]], {-1}];
(* one URL per week; remove [[;; 5]] for all 52 weeks *)
wkurls = "http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=" ~~ 
  # ~~ "&p=.htm" & /@ wkstrngs[[;; 5]]
(* {"http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=01&p=.htm",
    "http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=02&p=.htm",
    "http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=03&p=.htm",
    "http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=04&p=.htm",
    "http://www.boxofficemojo.com/weekend/chart/?yr=2012&wknd=05&p=.htm"} *)
(* import each page, then keep only the rows whose last element is numeric *)
fiveweeks = Import[#, "Data"] & /@ wkurls;
data5wks = Cases[#, {__, _?NumericQ}, Infinity] & /@ fiveweeks; (* thanks: Mike Honeychurch *)
Grid[#, Frame -> All] & /@ data5wks
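
The same steps could also be wrapped in small functions, so that the year and the set of weeks become arguments and the result is keyed by week number. This is only a minimal sketch: the names weekURL, importWeek and importYear are hypothetical, the URL pattern is the one from the question with the last parameter assumed to be a plain p (no backslash), and AssociationMap needs version 10 or later.

```mathematica
(* Hypothetical helper: chart URL for a given year and week number.
   IntegerString[week, 10, 2] pads the week to two digits, e.g. 1 -> "01".
   Assumes the query string ends in "&p=.htm". *)
weekURL[year_Integer, week_Integer] :=
  "http://www.boxofficemojo.com/weekend/chart/?yr=" <> IntegerString[year] <>
   "&wknd=" <> IntegerString[week, 10, 2] <> "&p=.htm"

(* Hypothetical helper: import one page and keep only the rows
   whose last element is numeric, as in the code above. *)
importWeek[year_Integer, week_Integer] :=
  Cases[Import[weekURL[year, week], "Data"], {__, _?NumericQ}, Infinity]

(* Hypothetical helper: association  week -> rows  for the requested weeks.
   The one-argument form defaults to all 52 weeks. *)
importYear[year_Integer, weeks_List] :=
  AssociationMap[importWeek[year, #] &, weeks]
importYear[year_Integer] := importYear[year, Range[52]]
```

With these definitions, importYear[2012, Range[5]] would fetch the first five weeks and importYear[2012] the whole year. Since the question also mentions saving the data, something like Put[importYear[2012], "boxoffice2012.m"] should write the result to a file that Get can read back later; the file name is just an example.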
kglr