They're sneaky. And stealthy. They're quiet and mostly unobtrusive, but once you've been visited by them, you'll know it, because you'll be inundated with a seemingly never-ending stream of spam mail. They're email harvesting robots, and chances are you've been visited by one.

What these insidious creatures do is crawl your site, much like the search engine spiders do, and collect any and all email addresses they find there. Many of them crawl your entire site, following every link, gathering email addresses from your guestbook, your message boards, databases, and everywhere else they can get to.

What happens next is so sinister, so unthinkable, I can barely say it. They put your email addresses on CD-ROM and sell them as opt-in lists. You've seen them: "20,000 targeted email addresses for only $29.95!", or my personal favorite, "Send 10 Bazillion emails- WITHOUT SPAMMING!!". What you didn't know was that it was YOUR email address they were selling.

To find out if your site has been visited by an email harvester, you only need to look at your logs. If your web host provides you with your stats, you can look in the Browser report for any of the following:

EmailSiphon
Crescent Internet Tool Pack v1.0
Cherry Picker
Email Collector
Libwww-perl 1.0

If you don't have a stats program, you can examine your logs for visits from these agents. The easiest way to do this is to download them and open them in a program with a search function (like Wordpad). Then you can search for the names listed above.

So, what can you do to protect your site from these evil robots? Unfortunately, there's no single magic solution. There are, however, steps you can take to discourage them.

The first thing you can do is create a Robots Exclusion file. This is simply a text file named robots.txt that you place in your root directory. This file tells robots where they can and cannot go (as well as which robots can and cannot visit your site). The drawback of using this file to combat email harvesting robots is that robots.txt is based on a sort of robot honor system. That is to say, you are assuming that any robot that visits will ask for the file and comply with the directives that you put there. Unfortunately, harvesting robots are typically ill-mannered robots that ignore this file. For more information on Robot Exclusion, visit the Robots Exclusion Standard.

A really fun solution is to use a CGI script that punishes bad robots. What these scripts do is direct the robot to a page full of fake email addresses, lots and lots of them. So what the spammer gets is a whole lot of bounced email messages, which will discourage them from visiting you again. The downside of this method is that the robots still collect your valid email addresses as well. Also, most scripts of this type have a little disclaimer attached to them stating that their authors won't be held responsible for any legal issues that arise from the use of the script, and that has to make you wonder.

There are other scripts that hide your email address from the robots, but not from your site visitors. This is a great solution for smaller sites that don't have more than one or two addresses listed. You can find both types of scripts at the CGI Resource Index.

Another handy script is one that will check to see if a robot is friendly, and if not, put it to sleep for, say, 10,000 seconds (Perl's sleep() counts in seconds, so that is nearly three hours). This will cause the robot to terminate the request and move on to another victim.

    # Look up the host name for the visitor's IP address.
    $number = $ENV{REMOTE_ADDR};
    ($a,$b,$c,$d) = split(/\./, $number);
    $ipadr = pack("C4", $a, $b, $c, $d);
    ($name,$aliases,$addrtype,$length,@addrs) = gethostbyaddr($ipadr, 2);   # 2 = AF_INET

    # Stall the robot if its host name or user agent is on your blacklist.
    if ($name =~ /foo\.com/i ||
        $ENV{HTTP_USER_AGENT} =~ /emailsiphon/i) {
        $access_denied++;
        sleep(10000);
    }

The last option is, in my humble opinion, the best option. If you have the ability to modify your .htaccess file, you can use mod_rewrite rules to specify certain user agents that are not allowed to visit your site. This effectively blocks the offending robots from ever touching your site. You should definitely check with your hosting provider to see whether or not you can make such a modification. Most hosts will be more than happy to make the modification for you.

For those of you willing and able to make the changes yourself, just add the following to your .htaccess file:

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
    RewriteCond %{HTTP_USER_AGENT} "^Mozilla 3.Mozilla/2.01" [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
    RewriteRule ^.*$ /badspammer.html [L]

While these are all effective measures to fight the Email Snatchers, there are new robots evolving every day. It's important to stay informed about the latest tools that the spammers are using. Some excellent sources of information can be found at:

Search Engine World
Today
"Restricting Access by ..."

Copyright 2001 Sharon Davis. When she is not waging war on spammers, she is the owner of 2Work-At-Home.Com, Work At Home Articles.net and the Editor of the site's monthly ezine, America's Home. In her spare time she reminisces about what it was like to have spare time.
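
For those who would rather not scroll through a log in Wordpad, here is a minimal sketch in Python of the log search described above. It is only an illustration, not part of the original article: the function name find_harvester_hits is made up for this example, the agent names are copied from the article's list (without version numbers), and it assumes your log contains the user-agent string somewhere on each line, as the common "combined" log format does.

```python
import re

# Agent names from the article's list; version numbers are left off so that
# other releases of the same tool still match. Matching is case-insensitive.
HARVESTER_AGENTS = [
    "EmailSiphon",
    "Crescent Internet Tool Pack",
    "Cherry Picker",
    "Email Collector",
    "libwww-perl",
]

# One alternation of the escaped names, so each name is matched literally.
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in HARVESTER_AGENTS),
    re.IGNORECASE,
)


def find_harvester_hits(lines):
    """Return (line_number, matched_text, line) for each line naming a listed agent."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = _PATTERN.search(line)
        if match:
            hits.append((number, match.group(0), line.rstrip("\n")))
    return hits
```

To try it, open your downloaded log file and pass the file object to find_harvester_hits(); each returned tuple gives the line number, the name that matched, and the full log line. Keep in mind that a plain name search can only find robots that announce themselves honestly, and that names spelled differently in your logs (for example "CherryPicker" with no space) would need to be added to the list.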