Comment Spamming

Phil Ringnalda recently got hit by a comment spammer-- arguably one of the lowest forms of life. Spammers care nothing about you or your blog, regardless of what they say ("This is a great site!"). Mark considers the problem in his usual "maximum verbosity" style.

I'd like to propose a solution, or at least a remedy that might help. This is also a "club" solution, to use Mark's analogy, but perhaps it has some merit. It builds on Shelley's approach, which is to put a piece of data in the comment form that is validated upon submission. In this case, the embedded value isn't static but dynamic. Here's how I would do it:

The comment form should be generated dynamically. For MT, this would mean to always use mt-comments.cgi to produce the comment form.

Use dynamic field names. Next, the comment form itself would be built so that its field names are generated dynamically. Each field name would be calculated from three factors: the current time, the entry id and the "real" field name. The calculated name would be encrypted and base64 encoded. The encryption should use a key that only the server has access to (especially important for systems that publish their source code!). It should be reversible, but only with the original encryption key.
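Here is a rough sketch of how such field names might be produced and decoded. It is only an illustration, not MT code: SERVER_KEY, the "|" separator and both function names are made up for the example. Python's standard library has no block cipher, so a keystream derived from HMAC-SHA256 stands in for the "real" encryption here, with a second HMAC used as a tamper check; a production system would want a vetted cipher instead.

```python
import base64
import hashlib
import hmac
import os
import time

# Hypothetical secret; it must live only on the server.
SERVER_KEY = b"replace-with-a-secret-known-only-to-the-server"


def _keystream(key, seed, length):
    """Derive `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        block = seed + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encode_field_name(real_name, entry_id, issued_at=None, key=SERVER_KEY):
    """Pack time, entry id and real field name into an opaque form field name."""
    if issued_at is None:
        issued_at = int(time.time())
    plaintext = f"{issued_at}|{entry_id}|{real_name}".encode("utf-8")
    nonce = os.urandom(8)  # fresh per field, so names differ on every page load
    stream = _keystream(key, b"enc" + nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, b"mac" + nonce + ciphertext, hashlib.sha256).digest()[:16]
    token = base64.urlsafe_b64encode(nonce + tag + ciphertext).decode("ascii")
    return token.rstrip("=")


def decode_field_name(field_name, key=SERVER_KEY):
    """Return (issued_at, entry_id, real_name), or None if the name is invalid."""
    try:
        raw = base64.urlsafe_b64decode(field_name + "=" * (-len(field_name) % 4))
    except ValueError:
        return None
    if len(raw) < 24:
        return None
    nonce, tag, ciphertext = raw[:8], raw[8:24], raw[24:]
    expected = hmac.new(key, b"mac" + nonce + ciphertext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        return None  # forged, altered, or made with a different key
    stream = _keystream(key, b"enc" + nonce, len(ciphertext))
    plaintext = bytes(c ^ s for c, s in zip(ciphertext, stream)).decode("utf-8")
    issued_at, entry_id, real_name = plaintext.split("|", 2)
    return int(issued_at), entry_id, real_name
```

The form template would then emit something like encode_field_name("author", entry_id) wherever it used to emit a fixed name, and the receiving script would run decode_field_name over every submitted field name.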

Validate the submitted data. The server-side process that receives the comment data would first check the field names for validity by decoding and decrypting them. It would check that the entry ID matches the entry being commented on. It could also check the time that the form was issued. If the time between issuing the form and receiving the comment falls outside an appropriate window, either too short or too long (this window would be user-configurable), the comment form would reload with the comment text, a new authentication key and an explanation asking the user to resubmit. The user would have to resubmit within the proper time frame for the comment to be accepted.
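The validation step might look roughly like the sketch below. It assumes the field names have already been decoded into their time, entry id and real name; the DecodedField class, the check_submission function, the three verdict strings and the 10-second and one-hour limits are all invented for the example and would be configurable in practice.

```python
import time
from dataclasses import dataclass

# Hypothetical defaults for the user-configurable window.
MIN_SECONDS = 10      # faster than this is probably a script
MAX_SECONDS = 3600    # slower than this is probably a queued replay


@dataclass
class DecodedField:
    issued_at: int   # when the form was generated (Unix time)
    entry_id: str    # entry the form was generated for
    real_name: str   # e.g. "author", "email", "url", "text"
    value: str       # what the visitor typed


def check_submission(fields, expected_entry_id, now=None,
                     min_seconds=MIN_SECONDS, max_seconds=MAX_SECONDS):
    """Return (verdict, comment) where verdict is accept, resubmit or reject."""
    if now is None:
        now = time.time()
    if not fields:
        return "reject", {}
    verdict = "accept"
    comment = {}
    for field in fields:
        # A name that failed to decode (None) or belongs to another entry
        # means the form was not issued by us for this entry.
        if field is None or field.entry_id != expected_entry_id:
            return "reject", {}
        elapsed = now - field.issued_at
        if elapsed < min_seconds or elapsed > max_seconds:
            # Keep the text so the form can be redisplayed with fresh names.
            verdict = "resubmit"
        comment[field.real_name] = field.value
    return verdict, comment
```

On "resubmit" the server would regenerate the form with new field names, fill in the returned comment values and explain why the visitor is seeing the form again.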

So what does this accomplish? First, it requires that the spammer retrieve the actual comment form. Why? Because the form field names are encoded using a technique that can't be reproduced (not without that encryption key), and the comment process requires those names to be valid before it will process the comment. Second, since those field names are dynamic, the spammer has to parse the returned HTML to determine which fields are the right ones for name, e-mail address, URL, and comment. This will be difficult since the comment form itself is user-defined and would vary from site to site (unless the blog is just using the default template). Third, by embedding the time data into the form we can put limits on the response. If it comes too fast, it is probably an automated process (a spammer). If it comes too late, then it's probably an automated, queued response. Fourth, since the entry id is embedded in the form, a retrieved form can only be abused against that one entry.

Here are some other suggestions for preventing spamming:

  • Throttling comment form requests from the same IP. Since the process above requires the form be generated by the server, we can slow down any form of spidering this way.
  • Since some people don't enable comments for every post, the server should automatically blacklist anyone who posts a comment to an entry that has comments disabled or that doesn't exist at all. Using a "honey pot" approach, the server could return a normal comment form for the invalid entry, but upon submitting the comment, the user's IP would get blacklisted.
  • Comments posted to multiple entries within a minute and from the same IP should also be flagged automatically. There should be a way to browse and delete suspect comments.
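The honey pot and the multiple-entries check from the list above could be sketched like this. It is an in-memory toy with made-up names (SpamHeuristics, record_comment); a real installation would keep this state in a database and would need the browse-and-delete screen on top of it. Keep in mind that IP-based rules can catch innocent visitors who share an address.

```python
from collections import defaultdict, deque


class SpamHeuristics:
    """Toy tracker for two of the suggestions: honey pot and rapid posting."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.blacklist = set()
        # ip -> deque of (timestamp, entry_id) for recent comments
        self.recent = defaultdict(deque)

    def record_comment(self, ip, entry_id, now, entry_open=True):
        """Return "blacklisted", "flagged" or "ok" for one submitted comment."""
        if ip in self.blacklist:
            return "blacklisted"
        if not entry_open:
            # Honey pot: nobody legitimate posts to a closed or missing entry.
            self.blacklist.add(ip)
            return "blacklisted"
        posts = self.recent[ip]
        # Forget posts that have aged out of the window.
        while posts and now - posts[0][0] >= self.window_seconds:
            posts.popleft()
        posts.append((now, entry_id))
        distinct_entries = {entry for _, entry in posts}
        # Several different entries within the window is suspicious.
        return "flagged" if len(distinct_entries) > 1 else "ok"
```

A "flagged" result would not reject the comment outright; it would only mark it for review.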

All of this will probably not stop spam comments altogether (to paraphrase Dr. Ian Malcolm from Jurassic Park, "spam finds a way"), but it will certainly help stem the tide.

TrackBack

TrackBack URL for this entry:
http://bradchoate.com/mt/feedback/tb/430

Listed below are links to weblogs that reference Comment Spamming:

» Spam en los weblogs from kusor dhtml weblog
I was reading Mark Pilgrim's weblog, specifically the post about spam in weblog comments [Read More]

» "No Comment" from Joni Electric
I've been following with interest the issue of comment spamming through MovableType (and presumably other content management systems that use [Read More]

» Spam Proof your Blog from Macrofun - DHTML / CFMX / FLASHMX
I was reading an article over at bradchoate.com (via moik78) on anti-Spamming "howtos" and while I agree with his hardcore "that'll screw em" approach, there is a very simple but effective alternative to stuff these bastards up (I agree, spammers... [Read More]

» Blogspam from BLOG@STEFANGEENS.COM
...And the innocent days of blogging are over. Over the past few months I've deleted exactly two nutcase comments off my blog, both... [Read More]

» Spam in blogs, een nieuw probleem? from Punkey
While on the road I had already read a pointer on my little PDA weblog reader about a new, fast-growing phenomenon: comment spam. Movable Type blogs in particular (which is what we have too) seem to suffer a lot from this. A script produces hundreds of comments on your weblog... [Read More]

» Comment and Trackback Spamming from Burningbird
The discussion continues on comment spamming and a couple of people have taken my initial quick fix and expanded on it nicely. Jennifer from Scripty Goddess has taken the solution into the MT tmpl files, adding the hidden field to processing.tmpl. Brad... [Read More]

5 Comments

Mark said:

I want to set up a tarpit/honeypot, using some automated methods like this to spot spammers, then serve them up a fake comments form that doesn't actually post, combined with something like mod_bandwidth to dynamically throttle spamming IP addresses (rather than banning them).

http://www.cohprog.com/v3/bandwidth/doc-en.html

Ali said:

Comment spamming is a major problem. Isn't there a mod available which allows all comments to be validated before being posted on the site?

paul said:

Small problem - your suggested solution assumes that all of a real user's traffic will consistently come from the same IP address throughout their visit.

This isn't true if their access is through multiple web proxies (e.g. AOL or my office, among others), so you risk losing real comments along with the spam.

Brad Choate said:

paul-- you're right. Now that I think about it, there isn't any good reason to encode the user's IP like that. I've revised my post to remove that element. No need to unnecessarily exclude real visitors.

tom said:

Form fields can be accessed via DOM document.forms(x), based on their order in the HTML, so the spammer would not need to parse the HTML. However, a hidden field that was randomly placed with each page load would foil this approach.

About

This article was published on October 29, 2002 7:18 PM.
