Blocking referrer spam, mal-bots, and other malicious weasels with htaccess

Anyone who runs a site/server for very long will likely find out about the gruesome underbelly of the online world - spammers. They come in many shapes and sizes (most are bots), each with a different purpose, but they all have this in common - they hurt your site/server and its available resources.

Below are some things to look out for and some methods to take care of one particular type of spam, referrer spam, which can cripple a site/server in no time. With enough referrer spam you'll have what amounts to a denial of service attack (i.e., so many junk requests that the server can't even tend to the real ones).

Example of how serious this can be
Recently one of the sites we host had a big traffic day thanks to being front paged at Fark.com and Foobies.com: 18,000+ unique visitors in 18 hours. Suffice it to say that put quite a load on the shared environment they were hosted in. Well, guess what - the (unrelated) spam attack the site received a few days later actually created more than twice the load on the server that the huge amount of legitimate traffic did!

Identifying the problem

The first step in fixing a problem is, of course, to know you have one! Referrer spam can be tricky because without knowing where to look you may never realize what is happening in the dark corners of your webserver - you'll just see the symptoms (a slow site or one that is down completely).

Where to look
If you've got performance issues with your site that you can't tie to an increase in visits, then referrer spam might be worth a look. The places where you can track it are a) your server logs, and b) your site/cpanel statistics pages.

What you'll want to look at is your most recent hits and your most frequently requested pages. If you see something that surprises you (e.g., an invalid url, or a url that you don't think should be that busy) then note the IP address(es) and/or domain(s) of whoever is requesting it. If you ever see pages continually requested by only one IP address/domain, or by numerous IPs within the same range, then that's not a good sign. Grab the IP address, do a whois lookup on it, and try to find out more. There are certain countries, for instance, where spam often originates.

Block that spammer
Ok, so now you're sure. Your site is being taken apart by a rogue bot. You've identified a fixed IP or a defined range of IPs that it's coming from. Now it's time to block this vermin using a little .htaccess magic:

To block a single ip address:
(substituting the real IP for the placeholder x's, of course):

order allow,deny
deny from xxx.xxx.xx.x
allow from all
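
On Apache 2.4 and later, the order/deny/allow directives are deprecated and only work through the mod_access_compat module. If your host runs 2.4, a rough equivalent using the newer Require syntax might look like the sketch below. It assumes mod_authz_core is available and that your host lets .htaccess files override access rules; 192.0.2.15 is a placeholder from the reserved documentation range, not a real spammer.

```apache
# Sketch for Apache 2.4+ (mod_authz_core); 192.0.2.15 is a placeholder IP
<RequireAll>
    # Let everyone in...
    Require all granted
    # ...except this one address
    Require not ip 192.0.2.15
</RequireAll>
```

Mixing the old and new directives in the same file is discouraged, so it's probably best to pick one style and stick with it.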

To block a range of ip addresses:
(again, use the appropriate IP in place of the x's - there are three examples here, showing that you can control how much of the range you want to block):

order allow,deny
deny from xxx.xxx.xx.
deny from xxx.xxx.
deny from xxx.
allow from all
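
The partial-address forms above can only cut a range at a dot. When whois shows a range that doesn't line up that neatly, the same deny directive also accepts CIDR and network/netmask notation. The following is only a sketch - both ranges are placeholders from the reserved documentation blocks, not real offenders.

```apache
# Placeholder ranges from the reserved documentation blocks, not real offenders
order allow,deny
# CIDR form: blocks 192.0.2.0 through 192.0.2.255
deny from 192.0.2.0/24
# Network/netmask form: blocks 198.51.100.0 through 198.51.100.127
deny from 198.51.100.0/255.255.255.128
allow from all
```

Blocking the narrowest range that covers the offender keeps you from locking out legitimate visitors who happen to share the same provider.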

How to block if their IP is not constant/spoofed?
As long as you can find one thing that is constant and unique to their requests, you can find an htaccess command to stop them. In one case we blocked an IP-spoofing spammer simply by matching on the referring domain their requests claimed to come from:

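# mod_rewrite must be switched on before any rules will run
RewriteEngine On
# Match the Referer header, ignoring case [NC]; [OR] chains the conditions
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?somedomain1\.net [NC,OR]
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?somedomain2\.com [NC,OR]
# No TLD given here, so this one matches somedomain3 under any TLD
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?somedomain3 [NC,OR]
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?somedomain4\.com [NC]
# Answer any matching request with 403 Forbidden
RewriteRule .* - [F,L]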
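
If mod_rewrite isn't available on your host, a similar effect can probably be had with mod_setenvif. This is a different technique from the one we actually used, sketched on the assumption that mod_setenvif is loaded. The variable name spam_referer is made up for this example, and somedomain1/somedomain2 are placeholders.

```apache
# Flag requests whose Referer header mentions a spam domain (case-insensitive).
# spam_referer is a hypothetical variable name chosen for this sketch.
SetEnvIfNoCase Referer "somedomain1\.net" spam_referer
SetEnvIfNoCase Referer "somedomain2\.com" spam_referer

# Deny any request that was flagged above, allow everything else
order allow,deny
allow from all
deny from env=spam_referer
```

Note that these patterns are not anchored, so they match the domain anywhere in the Referer header. That makes them a bit broader than the rewrite conditions.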

10 February, 2007

Comments

We did this deny from IP setup for WSF2008.net (looked it up independently of this page even though this link was right in our notes!) to stop an insane once or twice a second RSS checker from Michigan (if I ever meet this fool...)

Anyhow, we noticed the "this IP has been blocked by server setup" (or similar message) in the Apache error log (every second). Is this normal / preventable?

If I understand what you're describing correctly, it sounds like some IPs were autobanned/blacklisted by your server. I'm not sure what service/app does that, but I know that such things exist because I got autobanned myself one time for too many failed ssh login attempts. :-)

Sorry I couldn't be more help on this one.

I'm noticing an unusually high number of visits to a page that baffles me, because I can't see how so many visitors are getting to that page. Do spam referrers cause these unusual spikes in traffic?

It can indeed mean that - but I recommend treading lightly until you can be reasonably sure. Try to figure out where all the traffic is coming from - if it's coming from just one place then that's generally a good sign that it's spam. But you should still do an IP lookup or even a google search on the IP to see if you can figure out more.