I want to get the total number of pages for a given set of search results on this site: http://srh.bankofchina.com/search/whpj/searchen.jsp. For example, I query the USD rates from 2015-1-12 to 2015-1-13 with the following code:

$initialUrl = "http://srh.bankofchina.com/search/whpj/searchen.jsp";
startdate = "2015-01-12";
enddate = DateString[{"Year", "-", "Month", "-", "Day"}];
name = "1336"; (*only USD*)

$parameters = {"erectDate" -> startdate, "nothing" -> enddate, 
   "pjname" -> name};
$results = Import[$initialUrl, "Data"
  , "RequestMethod" -> "POST"
  , "RequestParameters" -> $parameters]
$results // ColumnForm;

But it seems that the imported "Data" does not include the total number of pages (shown in the following screenshot).

[Image: total number]

So is there still a way to get the total number of pages, such that I can get all the results page by page?

van abel

1 Answer

In order to solve the problem exactly as it is stated - by finding the element with class "nav_pagenum" - I suggest this solution. On that page there are also links to posts where it is shown how to retrieve the element using Cases. Below I present a toy solution that I just happened upon.
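As a rough illustration of the Cases approach, here is a minimal sketch that pulls the text out of an element with class "nav_pagenum". The HTML string is a hypothetical stand-in, not the real page source, and the names html, xml, found and pageCount are made up for this example. On the real site the count may be filled in by JavaScript, in which case it will not be present in the imported markup at all.

```mathematica
(* Hypothetical fragment standing in for the real page source *)
html = "<html><body><div><span class=\"nav_pagenum\">5</span></div></body></html>";

(* Parse the HTML into a symbolic XML tree *)
xml = ImportString[html, {"HTML", "XMLObject"}];

(* Collect the text of every element whose class attribute is "nav_pagenum" *)
found = Cases[xml,
   XMLElement[_, {___, "class" -> "nav_pagenum", ___}, {text_String}] :> text,
   Infinity];

(* Convert the first match to an integer, or report that nothing matched *)
pageCount = If[found === {}, Missing["NotFound"], FromDigits[StringTrim[First[found]]]]
```

For the live page, the same Cases pattern would be applied to the result of Import[url, "XMLObject", ...] with the POST options from the question, assuming the element is actually present in the returned markup.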


When you request a page that does not exist, the site simply returns the last page that does exist. The major drawback of this approach is that you cannot fetch the pages asynchronously, because you only learn where to stop by comparing each page with the one before it.

requestData[startdate_, enddate_, currency_][page_] := Module[
  {url = "http://srh.bankofchina.com/search/whpj/searchen.jsp", parameters},
  parameters = {"erectDate" -> startdate, "nothing" -> enddate, "pjname" -> currency, "page" -> ToString@page};
  Rest[Import[url, "Data", "RequestMethod" -> "POST", "RequestParameters" -> parameters]][[1, 2 ;;, 2 ;;]]
  ]

startdate = "2015-01-12";
enddate = DateString[{"Year", "-", "Month", "-", "Day"}];
name = "1336";(*only USD*)

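(* Fetch the first two pages up front and keep them *)
last = requestData[startdate, enddate, name][1];
new = requestData[startdate, enddate, name][2];
data = {last, new};
(* Continue from page 3; the loop stops once a page repeats the previous one *)
p = 3;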
While[new =!= last,
  last = new;
  new = requestData[startdate, enddate, name][p];
  AppendTo[data, new];
  p++;
  ];
results = Most@data;
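The stop-at-the-first-repeat idea can also be packaged as a small function and tried without touching the network. This is only a sketch: collectPages, fakePages and fakeFetch are hypothetical names, and the stub imitates the server behaviour described above by returning the last page for any out-of-range request.

```mathematica
(* collectPages is a hypothetical helper; fetch is any function page -> data *)
collectPages[fetch_] := Module[{pages = {}, p = 1, new, last = Missing[]},
  new = fetch[1];
  (* Keep each page until a request returns the same data as the previous one *)
  While[new =!= last,
   AppendTo[pages, new];
   last = new;
   p++;
   new = fetch[p]];
  pages]

(* Stub standing in for the server: three pages, later requests repeat page 3 *)
fakePages = {{"a"}, {"b"}, {"c"}};
fakeFetch[p_] := fakePages[[Min[p, Length[fakePages]]]]

collected = collectPages[fakeFetch]
```

Because the page is appended before the next one is requested, no duplicate is stored and the single-page case needs no special handling. With the definitions above, collectPages[requestData[startdate, enddate, name]] should work the same way against the site. One caveat: two genuinely different pages with identical content would end the loop early.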
C. E.
  • what's the logic of the while loop? – van abel Jan 13 '15 at 11:46
  • @vanabel It continues until the last retrieved page is the same as the formerly retrieved page. This works because if you request a page that doesn't exist it returns the last page that does. – C. E. Jan 13 '15 at 11:51
  • I mean if there is only one page, then no data will output? – van abel Jan 13 '15 at 12:03
  • @vanabel I can help you fix that special case, if you need it. I only attempted to show a way to do this, I think that now that you know this trick you may be able to write code for that special case yourself. – C. E. Jan 13 '15 at 12:16
  • thanks, please help me again, the logic is not easy to me. – van abel Jan 13 '15 at 13:28
  • @vanabel OK, note that new and last is defined before the loop even starts. If, before the loop even starts, new == last then there is only one page. So If[new == last, new, While...]. – C. E. Jan 13 '15 at 13:30
  • Why the last Most is needed? – van abel Jan 13 '15 at 13:35
  • Thanks a lot, I finally get the following: data = {}; last = requestData[startdate, enddate, name][1]; p = 1; While[data = data~Join~last; p++; new = requestData[startdate, enddate, name][p]; last =!= new, last = new; ] data // Length – van abel Jan 13 '15 at 13:56
  • @vanabel Most is needed because it retrieves one duplicate before it realizes that it is a duplicate. Please, I don't think you should accept this. Not yet at least. It didn't find the number by parsing the HTML as you requested. It was just a solution I happened upon, and thought I should post. – C. E. Jan 13 '15 at 14:00
  • Yes, this is not a direct solution, but it works for me, so I think it is OK. Also, the original question is still interesting. – van abel Jan 13 '15 at 14:06
  • @vanabel Updated with a small note that may be of interest. – C. E. Jan 17 '15 at 22:45