
I need to download all files in a domain folder, say https://example.com/folder/subfolder. The file names in the subfolder don't follow a unique incrementing pattern; they are random strings. I want to download all the files in the subfolder using wget or any other method. Please give details.

I tried the answer here. It only downloads the index.html file. I also tried the other option in that answer with --reject, but it doesn't download anything.

J C
  • 111

1 Answer


As far as I am aware, wget only works with links that:

  • Explicitly have a standard href attribute.

  • Are present in a given HTML document (which the server generates, so files that technically exist on the server may not all be listed for wget to find).

Furthermore, you should probably look at the raw page source (e.g. via "View Source" in your browser). If the page builds its links with JavaScript, you may be out of luck, as wget does not process JavaScript.
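
If you would rather do that check from a script than from a browser, a minimal standard-library sketch like the one below fetches exactly what the server sends. The URL is the asker's example placeholder:

```python
import urllib.request

# Example URL from the question; replace with the real listing page.
url = "https://example.com/folder/subfolder/"

# Fetch the raw HTML exactly as wget would see it (no JavaScript runs here).
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")

# If no href="..." attributes show up, wget has nothing to follow.
print(f"href attributes found: {html.count('href=')}")
print(html)
```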

If a link is listed in the raw HTML but without a standard href attribute, you can still parse the page for links, just not with wget. You would likely need to write your own script, e.g. with Windows PowerShell, or with Python (possibly with requests) and BeautifulSoup, along the lines of the sketch below.
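
As a concrete illustration, here is a minimal Python sketch of that approach using requests and BeautifulSoup (third-party packages: pip install requests beautifulsoup4). It scans every tag attribute rather than only <a href="..."> anchors, so it can catch links wget would miss. The URL, the "downloads" folder name, and the "looks like a file name" heuristic are all assumptions to adapt:

```python
import os
from urllib.parse import urljoin

import requests                  # pip install requests
from bs4 import BeautifulSoup    # pip install beautifulsoup4

# Example URL from the question; replace with the real listing page.
BASE_URL = "https://example.com/folder/subfolder/"

page = requests.get(BASE_URL)
page.raise_for_status()
soup = BeautifulSoup(page.text, "html.parser")

# Walk every attribute of every tag, not just <a href="...">, so links that
# sit in non-standard attributes (where wget gives up) are found as well.
candidates = set()
for tag in soup.find_all(True):
    for value in tag.attrs.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if not isinstance(item, str) or " " in item:
                continue
            candidate = urljoin(BASE_URL, item)
            # Keep only URLs under the listing that look like file names
            # (crude heuristic: the last path segment has an extension).
            if candidate.startswith(BASE_URL) and "." in candidate.rsplit("/", 1)[-1]:
                candidates.add(candidate)

os.makedirs("downloads", exist_ok=True)
for file_url in sorted(candidates):
    filename = os.path.join("downloads", file_url.rsplit("/", 1)[-1])
    print(f"Downloading {file_url} -> {filename}")
    # Stream so large files are not held in memory all at once.
    with requests.get(file_url, stream=True) as resp:
        resp.raise_for_status()
        with open(filename, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=8192):
                fh.write(chunk)
```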


Note that in some rare cases, if the links are generated entirely by JavaScript, you might even need Selenium to save a fully rendered page before processing it for file links, roughly as sketched below. Python has a Selenium module, and I have personally had good luck with the stand-alone "Marmaduke" builds (zip files) of Ungoogled Chromium from Woolyss for browser automation.
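
If you do end up needing a rendered page, a sketch along these lines captures the post-JavaScript HTML, which you can then feed to BeautifulSoup exactly as above. It assumes the selenium package plus a Chrome/Chromium binary; the URL, the five-second wait, and the commented-out binary path are placeholders to adjust:

```python
import time

from selenium import webdriver                        # pip install selenium
from selenium.webdriver.chrome.options import Options

url = "https://example.com/folder/subfolder/"  # example URL from the question

options = Options()
options.add_argument("--headless=new")  # run Chrome without opening a window
# options.binary_location = "/path/to/chromium"  # e.g. a stand-alone build

driver = webdriver.Chrome(options=options)
try:
    driver.get(url)
    time.sleep(5)  # crude wait for JavaScript to finish; tune as needed
    # page_source is the HTML *after* JavaScript has run; hand it to
    # BeautifulSoup exactly as in the earlier sketch.
    rendered_html = driver.page_source
finally:
    driver.quit()

print(rendered_html[:500])  # quick sanity check of the rendered markup
```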

Anaksunaman
  • 17,239