How would I download a list of files from a file server like this one?
I suppose I could use wget, but then it tries to fetch all the links and the HTML files as well. Is there a better tool to accomplish this?
You can specify which file extensions wget will download when crawling pages:
wget -r -A zip,rpm,tar.gz <url>
This will perform a recursive search and only download files with the .zip, .rpm, and .tar.gz extensions.
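For example, a sketch against a hypothetical server directory (the URL is a placeholder; -np/--no-parent keeps wget from climbing above the starting path):
wget -r -np -A zip,rpm,tar.gz http://example.com/files/
Note that wget still fetches the intermediate HTML index pages in order to discover links, but deletes any that don't match the -A list afterwards.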
Supposing you really just want a list of the files on the server without fetching them (yet):
%> wget -r -np --spider <url> 2>&1 | awk -f filter.awk | uniq
where 'filter.awk' looks like this:
/^--.*-- http:\/\/.*[^\/]$/ { u=$3; }
/^Length: [[:digit:]]+/ { print u; }
Then you may still have to filter out some duplicate or spurious entries.
"" Ref:
You can use the following command:
wget --execute="robots = off" --mirror --convert-links --no-parent --wait=5 <website-url>
An explanation of each option:
wget: the command-line tool that downloads remote files to our local machine.
--execute="robots = off": ignore the robots.txt file while crawling through pages. This is helpful if you're not getting all of the files.
--mirror: mirror the directory structure for the given URL. It's a shortcut for -N -r -l inf --no-remove-listing, which means:
  -N: don't re-retrieve files unless they are newer than the local copy
  -r: recursive download
  -l inf: maximum recursion depth (inf or 0 for infinite)
  --no-remove-listing: don't remove '.listing' files
--convert-links: make links in downloaded HTML or CSS point to local files
--no-parent: don't ascend to the parent directory
--wait=5: wait 5 seconds between retrievals, so that we don't thrash the server
<website-url>: the URL of the website to download the files from
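If, as in the question, you only want certain file types rather than a full mirror, a sketch combining this command with the -A filter from the first answer (the URL and extension list are placeholders):
wget --execute="robots = off" --mirror --no-parent --wait=5 -A zip,rpm,tar.gz http://example.com/files/
Here --convert-links is dropped, since rewriting links matters less when you're harvesting individual files rather than browsing the mirror offline.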
Happy downloading! 🙂