WP-Mix

A fresh mix of code snippets and tutorials

Block Wayback Machine

If your website has been online for any length of time, it probably has been crawled and stored in the Internet Archive, AKA the Wayback Machine. That means archived copies of your web pages will remain available even if your site is taken offline. To remove any pages that have already been archived, you need to contact the support team at the Internet Archive. To prevent the Wayback Machine from archiving any future pages, you can block its crawlers via robots.txt or Apache/.htaccess. Here’s how to do it.

Block via robots.txt

Here are the rules to block Wayback via robots.txt:

User-agent: ia_archiver
User-agent: archive.org_bot
User-agent: ia_archiver-web.archive.org
Disallow: /

As you know, robots.txt rules only work if the crawler chooses to obey them. So it’s a matter of trust with robots.txt. If you want to always enforce blocking of Wayback, you can go the Apache/.htaccess route.
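
To see how a well-behaved crawler would read the robots.txt rules above, here is a minimal sketch using Python's standard `urllib.robotparser` module. The `is_allowed` helper and the `example.com` URL are made up for illustration, and the check only reflects what a compliant parser would decide, not what any real crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this tutorial.
ROBOTS_TXT = """\
User-agent: ia_archiver
User-agent: archive.org_bot
User-agent: ia_archiver-web.archive.org
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only for the check.
    url = "https://example.com/any/page"
    for agent in ("ia_archiver", "archive.org_bot", "Googlebot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

The three Wayback agents share one `Disallow: /` rule, so they are refused everywhere, while agents not named in the file remain unaffected.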

Block via Apache/.htaccess

If you want to always block the Wayback Machine no matter what, add the following rules to your site’s public root .htaccess file:

<IfModule mod_rewrite.c>
	RewriteEngine On
	RewriteCond %{HTTP_USER_AGENT} (ia_archiver|archive\.org_bot|ia_archiver-web\.archive\.org) [NC]
	RewriteRule (.*) - [F,L]
</IfModule>
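
Before uploading the rules, it can help to check which user-agent strings the pattern would catch. The following is a rough sketch in Python: `re.IGNORECASE` stands in for the `[NC]` flag, and `search` mimics the unanchored matching that `RewriteCond` performs. Python's regex engine is not identical to the one Apache uses, but for a plain alternation like this the behavior should be the same. The `would_be_blocked` helper and the sample agent strings are illustrative assumptions, not an official list. Note the pattern is written with no whitespace inside the parentheses, because Apache splits directive arguments on unescaped whitespace.

```python
import re

# Same alternation as the RewriteCond pattern; IGNORECASE stands in for [NC].
WAYBACK_PATTERN = re.compile(
    r"(ia_archiver|archive\.org_bot|ia_archiver-web\.archive\.org)",
    re.IGNORECASE,
)


def would_be_blocked(user_agent: str) -> bool:
    """Return True if the pattern matches anywhere in the user-agent string."""
    return WAYBACK_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative user-agent strings (assumed, not taken from real logs).
    samples = [
        "ia_archiver",
        "Mozilla/5.0 (compatible; Archive.org_bot)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
    ]
    for ua in samples:
        status = "403 Forbidden" if would_be_blocked(ua) else "passes through"
        print(f"{ua!r}: {status}")
```

To confirm the live behavior, send a request to your own site with one of the blocked user-agent strings and check that the server responds with a 403 status.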

Note that these blocking rules are current as of January 2025. Keep in mind that the Wayback Machine may have since changed its user-agent names, or even added new agents. Your best bet is to check online for the latest Wayback user-agent information.

Also note that there may be other archiving websites out there. So just blocking Wayback is no guarantee that your pages will not be stored somewhere online (or offline).
