Comment Spamming
Phil Ringnalda recently got hit by a comment spammer, arguably one of the lowest forms of life. Spammers care not about you or your blog, regardless of what they say ("This is a great site!"). Mark considers the problem in his usual "maximum verbosity" style.
I'd like to propose a solution, or at least a remedy that might help. This is also a "club" solution, to use Mark's analogy, but perhaps it has some merit. It builds on Shelley's approach: embed a piece of data in the comment form and validate it upon submission. In this case, though, the embedded value isn't static but dynamic. Here's how I would do it:
The comment form should be generated dynamically. For MT, this would mean always using mt-comments.cgi to produce the comment form.
Use dynamic field names. Next, the comment form would be built so that its field names are generated dynamically. Each field name would be calculated from three factors: the current time, the entry ID, and the "real" field name. The result would be encrypted and base64 encoded. The encryption should use a key that only the server has access to (especially important for systems that publish their source code!), and it should be reversible, but only with that original key.
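A minimal sketch of the field-name encoding, assuming a hypothetical server secret `SECRET_KEY` and hypothetical helpers `encode_field_name`/`decode_field_name`. The post doesn't name a cipher, so to stay within the standard library this sketch swaps in an HMAC-SHA256 keystream with encrypt-then-MAC; a real deployment should use a vetted cipher from a crypto library instead. The `issued_at|entry_id|real_name` layout and the use of URL-safe base64 with padding stripped are also assumptions.

```python
import base64
import hashlib
import hmac
import os
import time

# Hypothetical server-side secret. Keep it out of any published source code.
SECRET_KEY = b"replace-with-a-long-random-server-secret"
NONCE_LEN, TAG_LEN = 12, 16


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from the one server secret."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, used here as a stand-in stream cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encode_field_name(real_name: str, entry_id: int, issued_at: int,
                      key: bytes = SECRET_KEY) -> str:
    """Encrypt 'issued_at|entry_id|real_name' into an opaque field name."""
    plaintext = f"{issued_at}|{entry_id}|{real_name}".encode()
    nonce = os.urandom(NONCE_LEN)  # fresh nonce: same inputs yield different names
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext,
                   hashlib.sha256).digest()[:TAG_LEN]
    # URL-safe base64 without padding keeps the name clean inside HTML.
    return base64.urlsafe_b64encode(nonce + ciphertext + tag).decode().rstrip("=")


def decode_field_name(field_name: str, key: bytes = SECRET_KEY):
    """Return (issued_at, entry_id, real_name), or None if forged or malformed."""
    try:
        raw = base64.urlsafe_b64decode(field_name + "=" * (-len(field_name) % 4))
    except ValueError:  # covers binascii.Error and non-ASCII input
        return None
    if len(raw) <= NONCE_LEN + TAG_LEN:
        return None
    nonce, ciphertext, tag = raw[:NONCE_LEN], raw[NONCE_LEN:-TAG_LEN], raw[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext,
                        hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):  # wrong key or tampered name
        return None
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    plaintext = bytes(c ^ s for c, s in zip(ciphertext, stream))
    issued_at, entry_id, real_name = plaintext.decode().split("|", 2)
    return int(issued_at), int(entry_id), real_name


if __name__ == "__main__":
    now = int(time.time())
    name = encode_field_name("email", entry_id=42, issued_at=now)
    print(name, decode_field_name(name))
```

Each form render would call `encode_field_name` once per real field with the same `issued_at`, so the receiving script can confirm that all fields came from a single issued form.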
Validate the submitted data. The server-side process that receives the comment data would first check the field names for validity: it would decode and decrypt them and make sure the entry ID matches. It could also check when the form was issued. If the time taken to post the comment fell outside an appropriate window (this window would be user-configurable), the comment form would reload with the comment, a new authentication key, and an explanation asking the user to resubmit. The user would have to resubmit within the proper time frame for the comment to be accepted.
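The validation step might look like the following sketch. It assumes each submitted field name has already been decoded into `(issued_at, entry_id, real_name)`, or `None` if decryption failed. `validate_submission`, `Verdict`, and the default 10-second to one-hour window are hypothetical choices rather than values from the post, and rejecting outright on an undecodable name or mismatched entry ID is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A decoded field name: (issued_at, entry_id, real_name), or None if it failed.
Decoded = Optional[Tuple[int, int, str]]


@dataclass
class Verdict:
    status: str  # "accept", "resubmit", or "reject"
    fields: Dict[str, str] = field(default_factory=dict)  # real name -> value
    reason: str = ""


def validate_submission(submitted: List[Tuple[Decoded, str]], expected_entry_id: int,
                        now: int, min_seconds: int = 10,
                        max_seconds: int = 3600) -> Verdict:
    """Check decoded field names, then the user-configurable time window."""
    values: Dict[str, str] = {}
    issued: Set[int] = set()
    for decoded, value in submitted:
        if decoded is None:
            return Verdict("reject", reason="field name failed to decrypt")
        issued_at, entry_id, real_name = decoded
        if entry_id != expected_entry_id:
            return Verdict("reject", reason="entry ID does not match")
        issued.add(issued_at)
        values[real_name] = value
    if len(issued) != 1:
        return Verdict("reject", reason="fields do not come from one form")
    elapsed = now - issued.pop()
    if not min_seconds <= elapsed <= max_seconds:
        # Outside the window: keep the comment so the form can be redisplayed
        # with a new authentication key and a request to resubmit.
        return Verdict("resubmit", values,
                       f"posted after {elapsed}s; allowed window is "
                       f"{min_seconds}-{max_seconds}s")
    return Verdict("accept", values)
```

Returning the submitted values with a `resubmit` verdict lets a human who was merely slow keep their comment instead of retyping it.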
So what does this accomplish? First, it requires that the spammer retrieve the actual comment form. Why? Because the form field names are encoded using a technique that can't be reproduced without the encryption key, and the comment process requires those names to be valid before it will process the comment. Second, since the field names are dynamic, the spammer must parse the returned HTML to determine which fields are the right ones for name, e-mail address, URL, and comment. This will be difficult since the comment form itself is user-defined and varies from site to site (unless the blog is just using the default template). Third, embedding the time in the form lets us put limits on the response. If it comes too fast, it is probably an automated process (a spammer). If it comes too late, it's probably an automated, queued response. Fourth, encoding the entry ID into the form limits abuse to that one entry.
Here are some other suggestions for preventing spamming:
- Throttle comment form requests from the same IP. Since the process above requires the form to be generated by the server, this slows down any form of spidering.
- Since some people don't enable comments for every post, the server should automatically blacklist any comment posted to an entry that has comments disabled or that doesn't exist at all. Using a "honey pot" approach, the server could return a normal comment form for the invalid entry, but upon submission the poster's IP would be blacklisted.
- Comments posted to multiple entries from the same IP within a minute should also be flagged automatically. There should be a way to browse and delete suspect comments.
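The three suggestions above can be sketched as one in-memory helper. `CommentGuard`, its thresholds, and the accept/flag/reject outcomes are hypothetical; a real blog would persist this state, and would decide for itself whether a throttled form request is delayed or refused.

```python
from collections import defaultdict, deque


class CommentGuard:
    """Hypothetical in-memory checks for throttling, honey pots, and flagging.

    Thresholds are illustrative assumptions; a real blog would persist
    this state rather than keep it in one process.
    """

    def __init__(self, open_entries, max_form_requests=5, request_window=60.0,
                 multi_entry_window=60.0):
        self.open_entries = set(open_entries)  # entries that exist and allow comments
        self.max_form_requests = max_form_requests
        self.request_window = request_window
        self.multi_entry_window = multi_entry_window
        self.form_requests = defaultdict(deque)  # ip -> form request timestamps
        self.recent_posts = defaultdict(deque)  # ip -> (timestamp, entry_id)
        self.blacklist = set()
        self.suspects = []  # (ip, entry_id, timestamp) to browse and delete later

    def allow_form_request(self, ip, now):
        """Sliding-window throttle on comment form requests per IP."""
        if ip in self.blacklist:
            return False
        times = self.form_requests[ip]
        while times and now - times[0] >= self.request_window:
            times.popleft()
        if len(times) >= self.max_form_requests:
            return False  # caller may delay or refuse the request
        times.append(now)
        return True

    def receive_comment(self, ip, entry_id, now):
        """Return "reject", "flag", or "accept" for a submitted comment."""
        if ip in self.blacklist:
            return "reject"
        if entry_id not in self.open_entries:
            # Honey pot: the form looked normal, but posting here is a giveaway.
            self.blacklist.add(ip)
            return "reject"
        posts = self.recent_posts[ip]
        while posts and now - posts[0][0] >= self.multi_entry_window:
            posts.popleft()
        posts.append((now, entry_id))
        if len({e for _, e in posts}) > 1:  # several entries within the window
            self.suspects.append((ip, entry_id, now))
            return "flag"
        return "accept"
```

Flagged comments land in `suspects` rather than being dropped, which leaves room for the browse-and-delete review the list calls for.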
All of this will probably not stop spam comments altogether (to paraphrase Dr. Ian Malcolm from Jurassic Park, "spam finds a way"), but it will certainly help stem the tide.
I want to set up a tarpit/honeypot, using some automated methods like this to spot spammers, then serve them up a fake comments form that doesn't actually post, combined with something like mod_bandwidth to dynamically throttle spamming IP addresses (rather than banning them).
http://www.cohprog.com/v3/bandwidth/doc-en.html
Comment spamming is a major problem. Isn't there a mod available that allows all comments to be validated before being posted on the site?
Small problem: your suggested solution assumes that all of a real user's traffic will consistently come from the same IP address throughout their visit.
This isn't true if their access goes through multiple web proxies (e.g. AOL or my office, among others), so you risk losing real comments along with the spam.
Paul, you're right. Now that I think about it, there isn't any good reason to encode the user's IP like that. I've revised my post to remove that element. No need to exclude real visitors unnecessarily.
Form fields can be accessed through the DOM via document.forms[x], based on their order in the HTML, so the spammer would not need to parse the HTML. However, a hidden field placed at a random position on each page load would foil this approach.
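A sketch of that countermeasure: the server inserts a hidden field at a random position on each render, so positional indexes into the form's elements shift between page loads. The field names, the `check` field, the helpers `render_fields`/`decoy_intact`, and storing the expected value server-side (for example in a session) are all assumptions for illustration.

```python
import html
import random
import secrets

# Illustrative visible fields; a real template defines its own.
VISIBLE_FIELDS = [("author", "input"), ("email", "input"), ("url", "input"),
                  ("text", "textarea")]


def render_fields(rng=random):
    """Return (markup, decoy_position, decoy_value) for one page load."""
    decoy_value = secrets.token_hex(8)
    parts = []
    for name, kind in VISIBLE_FIELDS:
        attr = html.escape(name, quote=True)
        if kind == "textarea":
            parts.append(f'<textarea name="{attr}"></textarea>')
        else:
            parts.append(f'<input type="text" name="{attr}">')
    # Put the hidden field at a random index so element positions shift per load.
    position = rng.randrange(len(parts) + 1)
    parts.insert(position, f'<input type="hidden" name="check" value="{decoy_value}">')
    return "\n".join(parts), position, decoy_value


def decoy_intact(submitted, expected_value):
    """A bot filling fields by position is likely to overwrite the hidden value."""
    return submitted.get("check") == expected_value
```

The decoy's value has to be remembered by the server between render and submission; without that, a mismatch could not be detected.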