WP-Mix

A fresh mix of code snippets and tutorials

Recursive Download Files with Wget

Quick note for future reference. GNU's Wget enables you to download files and resources directly from the terminal/shell. The big tip here is the -r option, which tells Wget to download the target files recursively. For example, to download an entire site:

wget -r https://example.com

If you omit the -r option, that command will download only the homepage, located at example.com. So add the -r to make it a recursive download.
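In practice, a bare recursive download can grab far more than intended. Here is a hedged sketch of a more controlled variant using a few of Wget's standard recursion options (depth defaults to 5 levels if -l is not given):

```shell
# Recursive download with some common guard rails:
#   -l 2   limit recursion to two levels deep (Wget's default is 5)
#   -np    --no-parent: never ascend above the starting directory
#   -k     --convert-links: rewrite links so the copy browses locally
#   -p     --page-requisites: also fetch CSS, images, and scripts
wget -r -l 2 -np -k -p https://example.com
```

These flags are independent, so you can mix and match depending on how complete (or how contained) you want the local copy to be.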

Note about robots.txt

By default, Wget checks and obeys any rules specified in the site’s robots.txt file. For example, if example.com has a robots file with the following rules:

User-agent: *
Disallow: /private.html
Disallow: /secret.html
Disallow: /treasure.pdf

…then by default Wget will obey those rules and not download the “forbidden” files. But robots.txt rules are advisory, not enforced: ultimately it is up to the bot or agent (in this case Wget) to follow the rules or simply ignore them.

So to tell Wget to ignore the site’s robots rules, you can set the -e option, like so:

wget -e robots=off -r https://example.com

By setting the -e option to robots=off, Wget will ignore any rules contained in the site’s robots.txt file, and just proceed to download everything (thanks to the -r option).
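Worth knowing: -e (long form --execute) is not robots-specific. It runs a .wgetrc-style command before the download starts, so robots=off is just one setting it can toggle. A hedged example combining it with another common option:

```shell
# -e executes a .wgetrc-style command; robots=off disables robots.txt
# checks. --limit-rate throttles bandwidth (here to ~200 KB/s) so a
# full recursive crawl is gentler on the target server.
wget -e robots=off --limit-rate=200k -r https://example.com
```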

Bonus: Change the download destination

By default Wget downloads files into the current working directory. To change that, use the cd command to go to the desired destination directory, for example:

cd /home/path/Downloads

That puts you in the folder located at /home/path/Downloads on your machine. So now when you run your Wget command, any files that you download will go to that location.
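Alternatively, Wget's own -P option (long form --directory-prefix) saves files under a given directory without changing your shell's working directory. The path below is just an example:

```shell
# -P sets the download destination directly, no cd required.
wget -r -P /home/path/Downloads https://example.com
```

This is handy in scripts, where changing the working directory can have side effects on later commands.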
