Recursively Download Files with Wget
Quick note for future reference. GNU's Wget enables you to download files and resources directly via Terminal/shell. The big tip here is the -r option, which tells Wget to download the target files recursively. For example, if you want to download an entire site:
wget -r https://example.com
If you omit the -r option, that command will download only the homepage, located at example.com. So add the -r to make it a recursive download.
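Tip: by default a recursive download goes up to five levels deep and will follow links all over the site. If you want a tighter crawl, the -l option caps the recursion depth and -np (no-parent) keeps Wget from wandering up into parent directories. For example, to grab only two levels of a hypothetical /docs/ section:
wget -r -l 2 -np https://example.com/docs/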
Note about robots.txt
By default, Wget checks and obeys any rules specified in the site's robots.txt file. For example, if example.com has a robots file with the following rules:
User-agent: *
Disallow: /private.html
Disallow: /secret.html
Disallow: /treasure.pdf
…then by default Wget will obey those rules and not download the “forbidden” files. The thing is, robots rules are not mandatory; ultimately it is up to the bot or agent (in this case Wget) to follow them or simply ignore them.
So to tell Wget to ignore the site’s robots rules, you can set the -e option, like so:
wget -e robots=off -r https://example.com
By setting the -e option to robots=off, Wget will ignore any rules contained in the site’s robots.txt file and just proceed to download everything (thanks to the -r option).
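And since -e simply runs a .wgetrc-style command, you can make this behavior permanent by putting the same setting in your ~/.wgetrc file instead (a minimal sketch; adjust as needed):
# ~/.wgetrc -- ignore robots.txt rules for all downloads
robots = off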
Bonus: Change the download destination
By default, Wget downloads files into the current working directory. To change that, use the cd command to go to the desired destination directory, for example:
cd /home/path/Downloads
That puts you in the folder located at /home/path/Downloads on your machine. So now when you run your Wget command, any files that you download will go to that location.
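Alternatively, you can skip the cd step entirely and set the destination right in the command with Wget’s -P (aka --directory-prefix) option; the path here is just a placeholder, as before:
wget -r -P /home/path/Downloads https://example.com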