Fighting comment spam

kao

When I started this blog, I was aware that comment spam exists. What I didn't know, is how common it really is. 🙂

Current statistics are:

Comment type Count %
Spam 39 56
Normal 29 41
Trashed 2 3
Total 70 100

What can be done?

WordPress has several anti-spam plugins. Some of the add captchas, some rely on JavaScript and others rely on continually updated blacklists for spammer IPs and/or keywords. I hate captchas, I respect users that use NoScript, and my webhost is running with allow_url_fopen = false which prevents automatic blacklist updates. Crap!

So, I'm left with a very few options, like blocking spammer IP address ranges using .htaccess file and mod_rewrite.

mod_rewrite magic

When you know what you're doing, mod_rewrite does wonders. When you don't, you might lock yourself out of web-admin interface. Trust me, it's not fun! 😉

In the very simplest form, we can block one IP address:

RewriteCond %{REMOTE_ADDR} ^(123\.456\.789\.666)$
RewriteRule (.*) - [F,L]

First line is a condition - if visitor comes from IP address 123.456.789.666, then apply the rule. Keep in mind that mod_rewrite is matching IP address against regexp, so do not forget backslashes! Otherwise you might accidentally block more than you wanted..

Second line is the rule - whatever URL it tries to access, send response "403 Forbidden". (.*) is a regexp matching anything1. [F] forbids access and [L] stops any other rules from applying, making it the last rule.

It will work, but my webhost does not allow custom 403 pages. So, we can adjust the example a bit:

RewriteCond %{REMOTE_ADDR} ^(123\.456\.789\.666)$
RewriteCond %{REQUEST_URI} !/error.html$
RewriteRule (.*) /error.html [R=302,L]

Now there are 2 conditions, first is matching IP address, 2nd is checking if requested page is not error.html. Note that by default all conditions must match (logical "and").

Also, [R=302] is used to redirect users with Error 302 Found to error.html instead of sending Error 403 Forbidden.

It's better, but we need to block several IP blocks. That's easy too!

RewriteCond %{REMOTE_ADDR} ^123\.456\.789 [OR] RewriteCond %{REMOTE_ADDR} ^555\.666
RewriteCond %{REQUEST_URI} !/error.html$
RewriteRule (.*) /error.html [R=302,L]

Flags [OR] say we're checking if IP address begins with 123.456.789 or 555.666. Also, the regexp was changed to check only beginning of IP address, and ignore the rest.

That's it. Easy, right? 🙂

Identifying spammer-friendly IP blocks

I just went through my inbox and looked at the "Please moderate" emails:

Author : Adrienne (IP: 104.168.70.107 , 104-168-70-107-host.colocrossing.com)
E-mail : hekhwrjjrab@mail.com
URL : http://Adrienne
Whois : http://whois.domaintools.com/104.168.70.107
Comment:
Hi, my name is Adrienne and I am the sales manager at {Spammer Company}. I was just looking at your When software is good enough | Life In Hex website and see...

So, the offending IP address is 104.168.70.107.

DomainTools tells us it's owned by ColoCrossing, and how large the IP block is:
IP Location: United States United States Williamsville Proxy R Us.com
ASN: United States AS36352 AS-COLOCROSSING - ColoCrossing (registered Dec 12, 2005)
Resolve Host: 104-168-70-107-host.colocrossing.com
Whois Server: whois.arin.net
IP Address: 104.168.70.107
NetRange: 104.168.0.0 - 104.168.127.255

Going through other notification emails, I identified 2 more spammer-friendly proxy/vps services: AS15003 and Krypt. It covers almost all comment spam, the rest are residental IP addresses in China and Vietnam - most likely part of some botnet and not really worth blacklisting.

Putting it all together

Armed with basic knowledge about mod_rewrite and offending IP addresses, I put it all together:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} !/403.html$
RewriteCond %{REQUEST_URI} !/403.png$
RewriteCond %{REQUEST_URI} !/403.css$
RewriteCond %{REQUEST_URI} !/sad.png$
RewriteCond %{REMOTE_ADDR} ^23\.108\.170 [OR] RewriteCond %{REMOTE_ADDR} ^23\.94 [OR] RewriteCond %{REMOTE_ADDR} ^104\.168 [OR] RewriteCond %{REMOTE_ADDR} ^98\.126
RewriteRule (.*) /403.html [R=302,L] </IfModule>

So, anyone coming from those IP address blocks will get redirected to http://lifeinhex.com/403.html. Problem (hopefully) solved! 🙂

Further reading

These sites were invaluable in adding simple spam block to my blog:
How to redirect requests from particular IP addresses or networks with mod_rewrite - basic usage.
System: mod_rewrite: Examples - great examples, explained well.
mod_rewrite Cheat Sheet - all I ever wanted to know, and little bit more.
How To Ban And Block Proxy Servers? - I didn't have to take this approach yet. And it wouldn't work against "elite" proxies anyway.

Footnotes

1. Actually, the pattern in the RewriteRule does not need to match the _whole_ URL, so you might encounter "$", "(.*)", "." and many more variations in these kinds of rules.