Here is a shameless plug for my HTML parser posted here. The code is a bit long to reproduce in this answer; the only change I would make to it is to replace the function processPosList with this code:
processPosList::unmatched = "Unmatched lists `1` encountered!";
processPosList[{openlist_List, closelist_List}] :=
  Module[{opengroup, closegroup, poslist},
   {opengroup, closegroup} = groupPositions /@ {openlist, closelist};
   poslist = Transpose[Transpose[Sort[#]] & /@ {opengroup, closegroup}];
   If[UnsameQ @@ poslist[[1]],
    Return[(Message[processPosList::unmatched, {openlist, closelist}]; {})],
    poslist = Transpose[{poslist[[1, 1]], Transpose /@ Transpose[poslist[[2]]]}]]];
This version issues a message when some parts cannot be parsed, instead of printing the details as the original code does. I must warn that, for some reason, my parser cannot fully parse the Wolfram Functions pages (either they are ill-formed or my parser contains bugs), but it parses enough for our purposes. Here is a simple web scraper based on it and on a few observations about the typical format of the page:
Clear[getForms];
getForms[url_String] :=
  Quiet@Cases[postProcess@parseText[Import[url, "Text"]],
     pContainer[attribContainer[" class='CitationInfo'"], x__String] :>
      StringJoin@x, Infinity] //.
   x_String :> StringReplace[x, {"&quot;" | "quot;" :> "\"", "&amp;" :> "",
      "&lt;" | "lt;" :> "<", "&gt;" | "gt;" :> ">", "\n" :> " "}];
Clear[formsOk, getInputForm, getStandardForm, getRuleForm];
formsOk[forms_] := Length[forms] == 5;
getInputForm[forms_?formsOk] := ToExpression[forms[[1]], InputForm];
getStandardForm[forms_?formsOk] := ToExpression[First@ToExpression[forms[[2]]], StandardForm];
getRuleForm[forms_?formsOk] := ToExpression[First@ToExpression[forms[[4]]]];
getInputForm[__] = getStandardForm[__] = getRuleForm[__] = $Failed;
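The definitions above rely on a pattern test (`forms_?formsOk`) plus a catch-all definition that returns $Failed. A minimal sketch of that idiom in isolation, using the hypothetical names okQ and firstItem (they are not part of the parser or the scraper):

```mathematica
(* Hypothetical names, for illustration only *)
Clear[okQ, firstItem];

(* The test that guards the "real" definition *)
okQ[list_] := Length[list] == 5;

(* Applies only when the argument passes okQ *)
firstItem[list_?okQ] := First[list];

(* Catch-all: any other argument sequence yields $Failed *)
firstItem[__] = $Failed;

firstItem[{1, 2, 3, 4, 5}]  (* 1 *)
firstItem[{1, 2}]           (* $Failed *)
```

The guarded rule is tried first, so the catch-all only fires when the test fails. This is why a page that does not yield exactly five forms produces $Failed rather than an error.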
I cannot say exactly how fragile this is; probably rather fragile. Here is an example of use:
In[277]:=
forms = getForms["http://functions.wolfram.com/07.23.17.0084.01"];
Through[{getInputForm,getStandardForm,getRuleForm}[forms]]
Out[278]= {Hypergeometric2F1[a,b,-(1/2)+a+b,z]==((Sqrt[1-z]-Sqrt[-z])^(1-2 a)
Hypergeometric2F1[-1+2 a,-1+a+b,-2+2 a+2 b,2 z+2 Sqrt[-z+z^2]])/Sqrt[1-z]/;Re[z]>1/2,
Hypergeometric2F1[a,b,-(1/2)+a+b,z]==((Sqrt[1-z]-Sqrt[-z])^(1-2 a)
Hypergeometric2F1[-1+2 a,-1+a+b,-2+2 a+2 b,2 z+2 Sqrt[-z+z^2]])/Sqrt[1-z]/;Re[z]>1/2,
HoldPattern[Hypergeometric2F1[a_,b_,a_+b_-1/2,z_]]:>((Sqrt[1-z]-Sqrt[-z])^(1-2 a)
Hypergeometric2F1[2 a-1,a+b-1,2 a+2 b-2,2 Sqrt[z^2-z]+2 z])/Sqrt[1-z]/;Re[z]>1/2}
I tested this on about 10 different formulas and it worked fine, but that is not an extensive test, so most likely it will not always work.
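Since any of the three extractors may return $Failed, it can help to filter the results before using them. A small sketch of how Through distributes one argument over a list of functions, with the hypothetical stand-ins f1, f2 and f3 in place of the real extractors:

```mathematica
(* Hypothetical stand-ins for the three extractors *)
Clear[f1, f2, f3];
f1[x_] := x^2;
f2[x_] := $Failed;   (* simulates a form that could not be extracted *)
f3[x_] := x + 1;

(* Through[{f1, f2, f3}[3]] evaluates to {f1[3], f2[3], f3[3]} *)
results = Through[{f1, f2, f3}[3]]      (* {9, $Failed, 4} *)

(* Keep only the entries that were extracted successfully *)
usable = DeleteCases[results, $Failed]  (* {9, 4} *)
```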
Import, which does a pretty good job of importing in default format (HTML here). My solution parses the HTML and looks for certain patterns in the parsed document. For the case at hand, your solution seems much more robust and adequate, so I am tempted to delete mine - perhaps I will give it a day or two... – Leonid Shifrin May 31 '11 at 12:08