Apache SetEnvIfNoCase - Banning Site Rippers and Email Robots

published 01.Jan.2002

Using SetEnvIfNoCase to ban site rippers

If you have a popular site, perhaps one with a good links directory or downloads such as graphics and mp3 music, then sooner or later someone will use a site ripper to download all of your site's content.

On a large site this can quickly use up your bandwidth allowance, or load the server so heavily that it grinds to a halt for other visitors. The most direct remedy is to ban the offending user agents.

Besides banning all the known site rippers, it is useful to ban the email siphon robots that visit your server to collect email addresses purely for spam purposes.

Below is some sample code from my Apache httpd.conf file which blocks the Wget site ripper and a few email robots. You will need to keep an eye on your server's log files, watch for site ripping activity, and then add each offending user agent to this list.


# ban the Wget site ripper
SetEnvIfNoCase User-Agent "^Wget" banned

# ban email collection robots
SetEnvIfNoCase User-Agent "^EmailCollector" banned
SetEnvIfNoCase User-Agent "^EmailSiphon" banned
SetEnvIfNoCase User-Agent "^EmailWolf" banned
SetEnvIfNoCase User-Agent "^WebEMailExtrac.*" banned

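# allow everyone except requests flagged as banned above
# (the <Location> container is an assumption: Order/Allow/Deny must sit
# inside a <Directory>, <Location>, <Files> or .htaccess context)
<Location />
    Order Allow,Deny
    Allow from all
    Deny from env=banned
</Location>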
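
The Order, Allow and Deny directives above are Apache 2.2-style access control. On Apache 2.4 and later they are only available through mod_access_compat, and the native equivalent uses Require. The following is a minimal sketch of the same rule in 2.4 syntax, assuming the SetEnvIfNoCase lines stay exactly as they are and that the ban should cover the whole site. The <Location /> container is my assumption; use whichever container matches your own setup.

```apache
# Apache 2.4+ sketch: grant access to everyone except requests
# that one of the SetEnvIfNoCase lines has flagged with "banned".
# <Location /> is an assumed scope, not part of the original sample.
<Location />
    <RequireAll>
        Require all granted
        Require not env banned
    </RequireAll>
</Location>
```

The negated Require has to sit inside a RequireAll block alongside a positive rule, because a negated directive can only cause a request to fail and cannot authorise one on its own.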


About Author:
I'm a father, husband, and a software developer who works for the NHS.



 
