CURL to download a directory

I am trying to download a full website directory using CURL. The following command does not work:

curl -LO 

It returns the error "curl: Remote file name has no length!".

But when I do this: curl -LO, it works. Any idea how to download all the files in the specified directory? Thanks.

8 Answers

This always works for me: include --no-parent and recursive mode (-r) to get only the desired directory.

 wget --no-parent -r 

HTTP doesn't really have a notion of directories. The slashes in a URL (other than the first three) do not have any special meaning except with respect to .. in relative URLs. So unless the server follows a particular format, there's no way to "download all files in the specified directory".

If you want to download the whole site, your best bet is to traverse all the links in the main page recursively. Curl can't do it, but wget can. This will work if the website is not too dynamic (in particular, wget won't see links that are constructed by JavaScript code). Start with wget -r, and look under "Recursive Retrieval Options" and "Recursive Accept/Reject Options" in the wget manual for more relevant options (recursion depth, exclusion lists, etc.).

If the website tries to block automated downloads, you may need to change the user agent string (-U Mozilla) and to ignore robots.txt (create an empty robots.txt file locally and use the -nc option so that wget doesn't try to download it from the server).
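Putting the above together, a sketch might look like this (the URL is a placeholder, not from the question):

```shell
# Create an empty robots.txt locally. Combined with -nc (no-clobber),
# wget will keep this empty local copy instead of fetching the server's
# robots.txt, so its rules never apply.
touch robots.txt

# Recursive download with a browser-like user agent string.
# Commented out here because it needs network access and a real URL:
# wget -r --no-parent -nc -U Mozilla http://example.com/dir/
```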


In this case, curl is NOT the best tool. You can use wget with the -r argument, like this:

wget -r 

This is the most basic form, and you can use additional arguments as well. For more information, see the manpage (man wget).

This isn't possible. There is no standard, generally implemented, way for a web server to return the contents of a directory to you. Most servers do generate an HTML index of a directory, if configured to do so, but this output isn't standard, nor guaranteed by any means. You could parse this HTML, but keep in mind that the format will change from server to server, and won't always be enabled.
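To illustrate the parsing approach, here is a hedged sketch that extracts link targets from a saved listing. The HTML below is a stand-in for a typical Apache-style index; real listings vary by server, so this kind of scraping is best-effort:

```shell
# Sample stand-in for a server-generated directory index page.
cat > index.html <<'EOF'
<html><body><h1>Index of /files</h1>
<a href="../">Parent Directory</a>
<a href="notes.txt">notes.txt</a>
<a href="data.tar.gz">data.tar.gz</a>
</body></html>
EOF

# Extract the href values and drop the parent-directory link.
grep -o 'href="[^"]*"' index.html \
  | sed 's/^href="//; s/"$//' \
  | grep -v '^\.\.' > links.txt

cat links.txt
```

Each surviving line in links.txt is a candidate file name you could then fetch individually with curl.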


When you're downloading from a directory listing, add one more wget option: --reject.

wget --no-parent -r --reject "index.html*" ""

You can use the Firefox extension DownThemAll! It will let you download all the files in a directory in one click. It is also customizable and you can specify what file types to download. This is the easiest way I have found.

You might find a use for a website ripper here: it will download everything and rewrite the contents/internal links for local use. One option:

lftp -c "mirror <url>"

Obviously, you need to install lftp first.
