HTTP doesn't really have a notion of directories. The slashes other than the first three (http://example.com/) do not have any special meaning except with respect to .. in relative URLs. So unless the server follows a particular format, there's no way to “download all files in the specified directory”.
If you want to download the whole site, your best bet is to traverse all the links in the main page recursively. Curl can't do that, but wget can. This will work if the website is not too dynamic (in particular, wget won't see links that are constructed by JavaScript code). Start with wget -r http://example.com/, and look under “Recursive Retrieval Options” and “Recursive Accept/Reject Options” in the wget manual for more relevant options (recursion depth, exclusion lists, etc.). A sketch of such an invocation follows below.
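For example, one way to combine those options might look like the following (the URL, depth limit, and reject pattern are placeholders to adjust for your site):

    wget -r -l 3 --no-parent --reject 'index.html*' http://example.com/some/directory/

Here -l 3 caps the recursion at three levels, --no-parent keeps wget from climbing above the starting directory, and --reject skips the auto-generated directory index pages.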
If the website tries to block automated downloads, you may need to change the user agent string (-U Mozilla), and to ignore robots.txt (create an empty file example.com/robots.txt and use the -nc option so that wget doesn't try to download it from the server).
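Putting those two workarounds together, a possible invocation could be (example.com is a placeholder, and the empty robots.txt must be created in the local directory wget downloads into before you start):

    mkdir -p example.com
    touch example.com/robots.txt
    wget -r -nc -U Mozilla http://example.com/

With -nc (--no-clobber), wget skips files that already exist locally, so the empty robots.txt is kept rather than fetched from the server, and -U sends a browser-like user agent string.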
I tried wget --no-parent -r http://WEBSITE.com/DIRECTORY and also without --no-parent; it did not work. – Sam-T Oct 15 '19 at 15:10

The wget command sets a default maximum depth of 5 for its recursive mode. You need to explicitly specify a higher maximum depth. To specify an infinite depth use wget --no-parent -r -l inf 'http://WEBSITE.com/DIRECTORY'. I would also recommend rate limiting the recursion as a courtesy to the webmaster and their host, and randomizing the wait interval to more effectively avoid automated detection of web crawlers: wget --no-parent -r -l inf --wait 5 --random-wait 'http://WEBSITE.com/DIRECTORY' – Cory Gross Jul 19 '21 at 04:21