
I have been using Wget, and I have run across an issue. I have a site that has several folders and subfolders within it. I need to download all of the contents within each folder and subfolder. I have tried several methods using Wget, but when I check the results, all I can see in the folders is an "index" file. I can click on the index file and it will take me to the files, but I need the actual files.

Does anyone have a command for Wget that I have overlooked, or is there another program I could use to get all of this information?

site example:

www.mysite.com/Pictures/

Within the Pictures dir, there are several folders, e.g.:

www.mysite.com/Pictures/Accounting/

www.mysite.com/Pictures/Managers/North America/California/JoeUser.jpg

I need all files, folders, etc.....


3 Answers


I want to assume you've not tried this:

wget -r --no-parent http://www.mysite.com/Pictures/

or, to retrieve the content without downloading the "index.html" files:

wget -r --no-parent --reject "index.html*" http://www.mysite.com/Pictures/

Reference: Using wget to recursively fetch a directory with arbitrary files in it
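
If you also want to avoid recreating the www.mysite.com/Pictures/ directory nesting locally, the -nH and --cut-dirs options can flatten the layout. A minimal sketch, assuming the layout from the question (the --cut-dirs value is a guess at how many leading path components to drop):

# drop the hostname directory and the leading "Pictures" component
wget -r --no-parent -nH --cut-dirs=1 --reject "index.html*" http://www.mysite.com/Pictures/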

  • Thanks, I have run that command several times, but I did not let the command finish all the way to the end. I got sidetracked, then let the command actually finish, and it copied ALL folders first, then it went back and copied ALL of the files into the folders. – Horrid Henry Oct 07 '13 at 16:46
  • Just goes to show you, if I had patience, I would have had this done 2 weeks ago.... LOL. :) Thanks again. – Horrid Henry Oct 07 '13 at 16:47
  • @Horrid Henry, Congratulations! – Felix Imafidon Oct 07 '13 at 17:02
  • I used a similar command but am only getting an index.html file! – shenkwen Jun 25 '19 at 20:55

I use wget -rkpN -e robots=off http://www.example.com/

-r means recursive, i.e. follow links and download the whole tree.

-k means convert links, so links in the downloaded pages point to your local copies instead of back to example.com/bla.

-p means get all page requisites, such as the images and JavaScript files needed to make the pages display properly.

-N turns on timestamping, so files whose local copies are already up to date with the remote site are skipped.

-e executes a command as if it were part of .wgetrc; it needs to be there for robots=off to work.

robots=off means ignore the robots.txt file.

I also had -c in this command, so if the connection dropped it would continue where it left off when I re-ran the command. I figured -N would go well with -c.
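
Put together, with -c added for resumable downloads, the command would look something like this (example.com is just a placeholder):

wget -rkpNc -e robots=off http://www.example.com/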

Tim Jonas

wget -m -A "*" -pk -e robots=off www.mysite.com/

This will download all types of files locally, point to them from the HTML files, and ignore the robots file (the quotes around the -A pattern keep the shell from expanding the * itself).
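
Note that -m (--mirror) is shorthand for -r -N -l inf --no-remove-listing, so it already implies the recursion and timestamping from the earlier answers. If you only want the Pictures tree from the question rather than the whole site, a sketch scoped with --no-parent might look like:

wget -m -pk -e robots=off --no-parent www.mysite.com/Pictures/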