Apache SetEnvIfNoCase - Banning Site Rippers and Email Robots
published 01.Jan.2002
Using SetEnvIfNoCase to ban site rippers
If you have a popular site, perhaps one with a good links directory or downloads such as graphics and mp3 music, then sooner or later someone will use a site ripper to download all of your site's content.
On a large site this can quickly use up your bandwidth allowance, or consume so many system resources that your server grinds to a halt for other visitors. Your practical option is then to ban the offending user agents.
Besides banning all the known site rippers, it is useful to ban the email siphon robots that visit your server collecting email addresses purely for spam purposes.
Below is some sample code from my Apache httpd.conf file that blocks the Wget site ripper and a few email robots. You will need to keep an eye on your server's log files, watch for site-ripping activity, and add each offending user agent to this list.
# ban the Wget site ripper
SetEnvIfNoCase User-Agent "^Wget" banned
# ban email collection robots
SetEnvIfNoCase User-Agent "^EmailCollector" banned
SetEnvIfNoCase User-Agent "^EmailSiphon" banned
SetEnvIfNoCase User-Agent "^EmailWolf" banned
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" banned
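# refuse any request flagged as banned above
# (order/allow/deny are only valid inside a <Directory>, <Location> or <Files> container, or in .htaccess)
order allow,deny
allow from all
deny from env=banned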
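
On Apache 2.4 and later, the order, allow and deny directives are provided only by the compatibility module mod_access_compat, and the documented replacement is the Require directive from mod_authz_core. Below is a minimal sketch of the same ban in that syntax, assuming the SetEnvIfNoCase lines above are kept unchanged. The /var/www/html path is a placeholder for illustration and is not taken from the original configuration.

```apache
# Apache 2.4+ sketch: same effect as the order/allow/deny rules above
# NOTE: the directory path is an assumed placeholder, adjust to your document root
<Directory "/var/www/html">
    <RequireAll>
        # allow everyone by default ...
        Require all granted
        # ... except requests where SetEnvIfNoCase set the "banned" variable
        Require not env banned
    </RequireAll>
</Directory>
```

As with the original rules, a client whose User-Agent matches one of the patterns should receive a 403 Forbidden response.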
