I've created a very simple API that imports the "Plaintext" of a URL or a PDF. Here is my code:

urlScraperTest = APIFunction[
   {"url" -> "String"},
   Import[#url, "Plaintext"] &];

and deployed it to the Wolfram Cloud:

CloudDeploy[urlScraperTest, "urlScraper", Permissions -> "Public", CloudObjectNameFormat -> "CloudUserUUID"]

CloudObject["https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper"]

This code works fine if I pass it some URLs, e.g. here is a recipe that I am able to scrape from my browser:

https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper?url=https://www.foodandwine.com/recipes/beef-wellington

But when I pass it the URL of a PDF, it returns an (unhelpful) "$Failed" result. For instance, here is a simple PDF stored in a cloud location:

https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf

If I enter this location in my browser's address bar directly, the PDF renders correctly. But if I try to pass this URL to my function

https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper?url=https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf

it produces a "$Failed" message. Note: If I call the urlScraperTest function directly from a local MMA notebook using the above PDF URL, it does return the results I expect with no error.

Can someone help shed some light on this behavior and suggest a solution?

Thanks!

MSC02476
  • I should have mentioned that I need to call this Wolfram Cloud function from another system, not from Mathematica. Thanks! – MSC02476 Dec 29 '23 at 16:15

1 Answer

The value of the url parameter is URL-decoded before it reaches your function, but Import needs the %20s left intact, so you need to encode the URL once more in your request. At least, I am assuming this is the problem.

url = "https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/\
f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%\
20Cookies%20Recipe%20-%20NYT%20Cooking.pdf"

Import[URLDecode@url, "Plaintext"] (* $Failed *)

So I think your final request needs to look like this:

URLBuild[apiURL, {"url" -> url}]

"...?url=https%3A%2F%2Fbbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io%2Ff1703793538605x556426251682794500%2FPerfect%2520Chocolate%2520Chip%2520Cookies%2520Recipe%2520-%2520NYT%2520Cooking.pdf"
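
To make the double encoding concrete, here is a minimal sketch of how the request string could be assembled by hand. It assumes, as above, that the cloud decodes the url parameter exactly once before Import sees it; I have not confirmed that this is what happens. apiURL is set to the CloudObject address from the question, and the names encoded and request are only illustrative.

```mathematica
(* Address of the deployed API, copied from the CloudObject in the question *)
apiURL = "https://www.wolframcloud.com/obj/user-7d8eeec4-69a3-4149-bc75-10de306a3cbd/urlScraper";

(* The PDF address exactly as Import should receive it, %20s included *)
url = "https://bbf184a8c110ea5f6bb4192bc1d23ad5.cdn.bubble.io/f1703793538605x556426251682794500/Perfect%20Chocolate%20Chip%20Cookies%20Recipe%20-%20NYT%20Cooking.pdf";

(* Percent-encode the whole address once more, so each "%" becomes "%25" *)
encoded = URLEncode[url];

(* A single decoding step (assumed to happen server-side) should restore the original *)
URLDecode[encoded] === url

(* The full request string, which any HTTP client can send as-is *)
request = apiURL <> "?url=" <> encoded
```

If the assumption holds, a non-Mathematica client only needs to do the same thing: apply its own standard percent-encoding function to the complete PDF address before appending it as the url query value.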

Kuba
  • Thanks @kuba. I'm not quite sure I follow your solution. I can get this to work from a Mathematica notebook, but not when calling the API from outside of Mathematica (I would like to use this function as part of a non-MMA web app I am building). Thank you. – MSC02476 Dec 29 '23 at 16:17
  • @MSC02476 my point was that the input is url-decoded before Import is used, so the already valid, encoded url needs to be encoded one more time to account for that. – Kuba Jan 28 '24 at 09:07