Invasion of the Email Snatchers

They're sneaky. And stealthy. They're quiet and mostly unobtrusive, but once you've been visited by them, you'll know it, because you'll be inundated with a seemingly never-ending stream of spam.

They're email harvesting robots, and chances are you've been visited by one.

What these insidious creatures do is crawl your site, much like the search engine spiders do, and collect any and all email addresses they find there. Many of them crawl your entire site, following every link, gathering email addresses from your guestbook, your message boards, databases, and everywhere else they can get to.

What happens next is so sinister, so unthinkable, I can barely say it. They put your email addresses on CD-ROM and sell them as opt-in lists. You've seen them: "20,000 targeted email addresses for only $29.95!", or my personal favorite, "Send 10 Bazillion emails, WITHOUT SPAMMING!!". What you didn't know was that it was YOUR email address they were selling.

To find out if your site has been visited by an email harvester, you only need to look at your logs. If your web host provides you with your stats, you can look in the Browser report for any of the following:

EmailSiphon
Crescent Internet Tool Pack v1.0
Cherry Picker
Email Collector
Libwww-perl 1.0

If you don't have a stats program, you can examine your raw logs for visits from these agents. The easiest way to do this is to download the log files and open them in a program with a search function (like WordPad). Then you can search for the names listed above.

So, what can you do to protect your site from these evil robots? Unfortunately, there's no single magic solution. There are, however, steps you can take to discourage them.

The first thing you can do is create a Robots Exclusion file. This is simply a text file named robots.txt that you place in your root directory. It tells robots where they can and cannot go (as well as which robots can and cannot visit your site). The drawback of using this file to combat email harvesting robots is that robots.txt is based on a sort of robot honor system. That is to say, you are assuming that any robot that visits will ask for the file and comply with the directives you put there. Unfortunately, harvesting robots are typically ill-mannered robots that ignore this file. For more information on Robot Exclusion, visit the Robots Exclusion Standard.

A really fun solution is to use a CGI script that punishes bad robots. What these scripts do is direct the robot to a page full of fake email addresses, lots and lots of them. So what the spammer gets is a whole lot of bounced email messages, which will discourage them from visiting you again. The downside of this method is that the robots still collect the valid email addresses too. Also, most scripts of this type have a little disclaimer attached stating that the authors won't be held responsible for any legal issues that arise from the use of their script, and that has to make you wonder.

There are other scripts that hide your email address from the robots, but not from your site visitors. This is a great solution for smaller sites that don't have more than one or two addresses listed. You can find both types of scripts at the CGI Resource Index.

Another handy script is one that checks whether a robot is friendly and, if not, puts it to sleep for, say, 10,000 seconds (nearly three hours). This will cause the robot to terminate the request and move on to another victim.

# Look up the host name of the visitor's IP address
$number = $ENV{REMOTE_ADDR};
($a,$b,$c,$d)=split(/\./,$number);
$ipadr=pack("C4",$a,$b,$c,$d);
# 2 is the address family AF_INET
($name,$aliases,$addrtype,$length,@addrs)=(gethostbyaddr($ipadr, 2));
# Stall the request if the host or the user agent looks like a harvester
if ($name =~ /foo\.com/i || $ENV{HTTP_USER_AGENT} =~ /emailsiphon/i) {
    $access_denied++;
    sleep(10000);   # sleep() takes seconds
}

The last option is, in my humble opinion, the best option. If you have the ability to modify your .htaccess file, you can use Apache's mod_rewrite module to specify certain user agents that are not allowed to visit your site. This effectively blocks the offending robots from ever touching your site. You should definitely check with your hosting provider to see whether or not you can make such a modification. Most hosts will be more than happy to make the modification for you.

For those of you willing and able to make the changes yourself, just add the following to your .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.*$ /badspammer.html [L]

While these are all effective measures to fight the Email Snatchers, there are new robots evolving every day. It's important to stay informed about the latest tools that the spammers are using. Some excellent sources of information can be found at:

Search Engine World
Today
"Restricting Access by

Copyright 2001 Sharon Davis. When she is not waging war on spammers, she is the owner of 2Work-At-Home.Com, Work At Home Articles.net and the Editor of the site's monthly ezine, America's Home. In her spare time she reminisces about what it was like to have spare time.
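
For illustration, a robots.txt file of the kind the article describes might look like the sketch below. The two agent names come from the article's own lists; the /guestbook/ directory is a hypothetical example, not something the article specifies. As the article points out, harvesters usually ignore this file, so treat it as a courtesy notice for well-behaved robots rather than a defense.

```
# Sketch only: ask two named harvesters to stay out of the whole site
User-agent: EmailSiphon
Disallow: /

User-agent: EmailCollector
Disallow: /

# All other robots: keep out of a hypothetical guestbook directory
User-agent: *
Disallow: /guestbook/
```

Each record names a robot with User-agent and lists the paths it is asked to avoid with Disallow. The file goes in the site's root directory, as described in the article.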