They're sneaky. And stealthy. They're quiet and mostly unobtrusive, but once you've been visited by them, you'll know it, because you'll be inundated with a seemingly never-ending stream of spam. They're email harvesting robots, and chances are you've been visited by one.

What these insidious creatures do is crawl your site, much like the search engine spiders do, and collect any and all email addresses they find there. Many of them crawl your entire site, following every link, gathering email addresses from your guestbook, your message boards, databases, and everywhere else they can get to.

What happens next is so sinister, so unthinkable, I can barely say it. They put your email addresses on CD-ROM and sell them as opt-in lists. You've seen them: "20,000 targeted email addresses for only $29.95!", or my personal favorite, "Send 10 Bazillion emails - WITHOUT SPAMMING!!". What you didn't know was that it was YOUR email address they were selling.

To find out if your site has been visited by an email harvester, you only need to look at your logs. If your web host provides you with your stats, you can look in the Browser report for any of the following:

    EmailSiphon
    Crescent Internet Tool Pack v1.0
    Cherry Picker
    Email Collector
    Libwww-perl 1.0

If you don't have a stats program, you can examine your logs for visits from these agents. The easiest way to do this is to download them and open them in a program with a search function (like Wordpad). Then you can search for the names listed above.

So, what can you do to protect your site from these evil robots? Unfortunately, there's no single magic solution. There are, however, steps you can take to discourage them.

The first thing you can do is create a Robots Exclusion file. This is simply a text file named robots.txt that you place in your root directory. It tells robots where they can and cannot go (as well as which robots can and cannot visit your site). The drawback of using this file to combat email harvesting robots is that robots.txt relies on a sort of robot honor system. That is to say, you are assuming that any robot that visits will ask for the file and comply with the directives you put there. Unfortunately, harvesting robots are typically ill-mannered robots that ignore this file. For more information on Robot Exclusion, visit the Robots Exclusion Standard.

A really fun solution is to use a CGI script that punishes bad robots. These scripts direct the robot to a page full of fake email addresses, lots and lots of them. So what the spammer gets is a whole lot of bounced email messages, which will discourage them from visiting you again. The downside of this method is that the robots still collect your valid email addresses too. Also, most scripts of this type have a little disclaimer attached stating that the authors won't be held responsible for any legal issues that arise from the use of their script, and that has to make you wonder.

There are other scripts that hide your email address from the robots, but not from your site visitors. This is a great solution for smaller sites that don't have more than one or two addresses listed. You can find both types of scripts at the CGI Resource Index.

Another handy script is one that checks whether a robot is friendly and, if not, puts it to sleep for, say, 10,000 seconds (nearly three hours). This will cause the robot to terminate the request and move on to another victim.

    use Socket;   # for AF_INET

    $number = $ENV{REMOTE_ADDR};
    ($a,$b,$c,$d) = split(/\./, $number);
    $ipadr = pack("C4", $a, $b, $c, $d);
    ($name, $aliases, $addrtype, $length, @addrs) = gethostbyaddr($ipadr, AF_INET);

    # Stall visitors from a blocked host (foo.com stands in for the host
    # you want to block) or with a known harvester user agent.
    if ($name =~ /foo\.com/i || $ENV{HTTP_USER_AGENT} =~ /emailsiphon/i) {
        $access_denied++;
        sleep(10000);   # sleep() counts seconds
    }

The last option is, in my humble opinion, the best option. If you have the ability to modify your .htaccess file, you can use the mod_rewrite module to specify certain user agents that are not allowed to visit your site. This effectively blocks the offending robots from ever touching your site. You should definitely check with your hosting provider to see whether or not you can make such a modification. Most hosts will be more than happy to make the modification for you.

For those of you willing and able to make the changes yourself, just add the following to your .htaccess file:

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
    RewriteRule ^.*$ /badspammer.html [L]

While these are all effective measures to fight the Email Snatchers, there are new robots evolving every day. It's important to stay informed about the latest tools that the spammers are using. Some excellent sources of information can be found at:

    Search Engine World
    Today
    "Restricting Access by

Copyright 2001 Sharon Davis. When she is not waging war on spammers, she is the owner of 2Work-At-Home.Com, Work At Home Articles.net and the Editor of the site's monthly ezine, America's Home. In her spare time she reminisces about what it was like to have spare time.
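The log check described in the article (searching your access logs for the harvester names) can also be scripted rather than done by hand in Wordpad. Here is a minimal Python sketch, assuming a plain-text access log in which each request line includes the visitor's user agent string. The function name `find_harvester_hits`, the sample log lines, and the space-tolerant matching (so that "Cherry Picker" also catches "CherryPicker") are illustrative assumptions, not something the article specifies.

```python
import re

# Agent names taken from the list in the article (version numbers dropped).
HARVESTER_AGENTS = [
    "EmailSiphon",
    "Crescent Internet Tool Pack",
    "Cherry Picker",
    "Email Collector",
    "Libwww-perl",
]


def _agent_pattern(name):
    """Build a case-insensitive regex that tolerates missing or extra spaces."""
    parts = [re.escape(word) for word in name.split()]
    return re.compile(r"\s*".join(parts), re.IGNORECASE)


PATTERNS = {name: _agent_pattern(name) for name in HARVESTER_AGENTS}


def find_harvester_hits(lines):
    """Return a list of (line_number, agent_name, line) for matching log lines."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name, line.rstrip("\n")))
                break  # one report per log line is enough
    return hits


if __name__ == "__main__":
    # Hypothetical log lines, made up for illustration only.
    sample_log = [
        '10.0.0.1 - - [01/Jan/2001:10:00:00] "GET / HTTP/1.0" 200 512 "-" "Mozilla/4.0"',
        '10.0.0.2 - - [01/Jan/2001:10:00:05] "GET /guestbook.html HTTP/1.0" 200 2048 "-" "EmailSiphon"',
        '10.0.0.3 - - [01/Jan/2001:10:00:09] "GET /contact.html HTTP/1.0" 200 1024 "-" "CherryPicker/1.0"',
    ]
    for number, name, line in find_harvester_hits(sample_log):
        print(f"line {number}: {name}: {line}")
```

To check a real log, pass an open file instead of the sample list, for example `find_harvester_hits(open("access.log"))`, where `access.log` is a hypothetical file name. Keep in mind that a match only tells you which agent name appeared in a request line; what to do about it is covered by the measures described in the article.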