They're sneaky. And stealthy. They're quiet and mostly unobtrusive, but once you've been visited by them, you'll know it, because you'll be inundated with a seemingly never-ending stream of spam.

They're email harvesting robots, and chances are you've been visited by one.

What these insidious creatures do is crawl your site, much like the search engine spiders do, and collect any and all email addresses they find there. Many of them crawl your entire site, following every link, gathering email addresses from your guestbook, your message boards, databases, and everywhere else they can get to.

What happens next is so sinister, so unthinkable, I can barely say it. They put your email addresses on CD-ROM and sell them as opt-in lists. You've seen them: "20,000 targeted email addresses for only $29.95!", or my personal favorite, "Send 10 Bazillion emails- WITHOUT SPAMMING!!". What you didn't know was that it was YOUR email address they were selling.

To find out if your site has been visited by an email harvester, you only need to look at your logs. If your web host provides you with your stats, you can look in the Browser report for any of the following:

EmailSiphon
Crescent Internet Tool Pack v1.0
Cherry Picker
Email Collector
Libwww-perl 1.0

If you don't have a stats program, you can examine your raw logs for visits from these agents. The easiest way to do this is to download the logs and open them in a program with a search function (like WordPad). Then you can search for the names listed above.

So, what can you do to protect your site from these evil robots? Unfortunately, there's no single magic solution. There are, however, steps you can take to discourage them.

The first thing you can do is create a Robots Exclusion file. This is simply a text file named robots.txt that you place in your root directory. It tells robots where they can and cannot go (as well as which robots can and cannot visit your site). The drawback of using this file to combat email harvesting robots is that robots.txt is based on a sort of robot honor system. That is to say, you are assuming that any robot that visits will ask for the file and comply with the directives that you put there. Unfortunately, harvesting robots are typically ill-mannered robots that ignore this file. For more information on Robot Exclusion, visit the Robots Exclusion Standard.

A really fun solution is to use a CGI script that punishes bad robots. What these scripts do is direct the robot to a page full of fake email addresses, lots and lots of them. So what the spammer gets is a whole lot of bounced email messages, which will discourage them from visiting you again. The downside of this method is that the robots still collect the valid email addresses on your site as well. Also, most scripts of this type have a little disclaimer attached to them stating that the authors won't be held responsible for any legal issues that arise from the use of their script, and that has to make you wonder.

There are other scripts that hide your email address from the robots, but not from your site visitors. This is a great solution for smaller sites that don't have more than one or two addresses listed. You can find both types of scripts at the CGI Resource Index.

Another handy script is one that will check to see if a robot is friendly, and if not it will put it to sleep for, say, 10,000 seconds (nearly three hours). This will cause the robot to terminate the request and move on to another victim.

    # Look up the hostname of the visiting client from its IP address.
    use Socket;    # provides AF_INET
    $number = $ENV{REMOTE_ADDR};
    ($a,$b,$c,$d) = split(/\./, $number);
    $ipadr = pack("C4", $a, $b, $c, $d);
    ($name,$aliases,$addrtype,$length,@addrs) = gethostbyaddr($ipadr, AF_INET);

    # If the host or the user agent matches a known offender, stall the request.
    if (($name && $name =~ /foo\.com/i) || $ENV{HTTP_USER_AGENT} =~ /emailsiphon/i) {
        $access_denied++;
        sleep(10000);    # 10,000 seconds
    }

The last option is, in my humble opinion, the best option. If you have the ability to modify your .htaccess file, you can specify certain user agents that are not allowed to visit your site using the mod_rewrite module. This effectively blocks the offending robots from ever touching your site. You should definitely check with your hosting provider to see whether or not you can make such a modification. Most hosts will be more than happy to make the modification for you.

For those of you willing and able to make the changes yourself, just add the following to your .htaccess file:

    RewriteEngine on
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
    RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
    RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
    RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Telesoft [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.Mozilla/2.01 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
    RewriteRule ^.*$ /badspammer.html [L]

While these are all effective measures to fight the Email Snatchers, there are new robots evolving every day. It's important to stay informed about the latest tools that the spammers are using. Some excellent sources of information can be found at:

Search Engine World
Today
"Restricting Access by

Copyright 2001 Sharon Davis. When she is not waging war on spammers, she is the owner of 2Work-At-Home.Com, Work At Home Articles.net and the Editor of the site's monthly ezine, America's Home. In her spare time she reminisces about what it was like to have spare time.