Sanitize Plugin
This plugin has been deprecated. The Sanitize plugin was incorporated (and extended) into Movable Type 2.6. For more information about this and how you would go about uninstalling this plugin, please read this article.
Sanitize is a Movable Type plugin that allows you to clean HTML and other markup that might exist in a comment entry. Read on for more information about how it works and what it's for. If you're using Movable Type and allow HTML comments on your site, you really need to read this...
Availability
The plugin is available for download here: mtsanitize-1_2.zip.
Description
Movable Type has a nasty little security problem. Well, it isn't the fault of Movable Type actually. It's something you get when you allow HTML in your comments, combined with using PHP, ASP, JSP, or simple SSI for processing your pages. For example-- what if someone were to put this in their comment (on a blog that allows HTML comments):
<?php readfile("/etc/passwd") ?>
Well, that would print out the contents of the /etc/passwd file on your server. Worse yet, they could reference other files readable by Movable Type-- such as mt-db-pass.cgi (which holds the password for a Movable Type installation that uses MySQL for storing the blog content).
Well, the quick fix to this problem is to disallow HTML comments. But if you want to keep your HTML comments and strip them of unsafe tags, you can use the Sanitize plugin to clean them up. Here's how you might use it:
<MTCommentBody sanitize_html="a href,b,br,p,strong,em,ul,li">
The tags listed in the 'sanitize_html' attribute are the tags that are allowed. Any tags not listed will be removed. In addition, the JSP, ASP, PHP and SSI markups are automatically stripped out to prevent abuse. Attributes must also be specified (as of the 1.1 update). To specify attributes, add a space after the tag name, then follow that with each allowed attribute name, putting a space to delimit each of them. If you want to allow an attribute for ALL allowed tags, add a '*' as a tag name, followed by the list of attribute names.
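To make the spec format concrete, here is a minimal Python sketch of how a 'sanitize_html' value could be read. It is not the plugin's actual Perl code: the names parse_spec, attrs_allowed_for and strip_server_markup are made up for illustration, and the regular expressions are only a rough guess at what counts as PHP, ASP/JSP or SSI markup.

```python
import re

# Assumption: a rough pattern for server-side markup; an unclosed block
# is treated as running to the end of the text.
SERVER_MARKUP = re.compile(
    r"<\?.*?(?:\?>|\Z)"      # PHP: <?php ... ?>
    r"|<%.*?(?:%>|\Z)"       # ASP / JSP: <% ... %>
    r"|<!--#.*?(?:-->|\Z)",  # SSI: <!--#include ... -->
    re.DOTALL,
)


def strip_server_markup(text):
    """Remove PHP, ASP/JSP and SSI blocks from the text."""
    return SERVER_MARKUP.sub("", text)


def parse_spec(spec):
    """Turn 'a href,b,* title' into {'a': {'href'}, 'b': set(), '*': {'title'}}."""
    allowed = {}
    for entry in spec.split(","):
        parts = entry.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        tag = parts[0].lower()
        attrs = {a.lower() for a in parts[1:]}
        allowed.setdefault(tag, set()).update(attrs)
    return allowed


def attrs_allowed_for(allowed, tag):
    """Return the attributes permitted on a tag, or None if the tag is not allowed.

    Attributes listed under '*' are added to every allowed tag.
    """
    tag = tag.lower()
    if tag == "*" or tag not in allowed:
        return None
    return allowed[tag] | allowed.get("*", set())
```

With the spec from the example above, parse_spec("a href,b,br,p,strong,em,ul,li") allows 'href' on 'a' and no attributes at all on the other tags, and a tag such as 'img' is not allowed because it is not listed.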
One more feature of the Sanitize plugin is that while it scans the HTML for tags, it keeps up with which tags have been opened and closed. By the end of the data, if there are any tags that weren't closed, it will append closure tags for each of them. 'Runaway tags' are commonplace when you allow HTML in your comments-- a bold tag is opened but the commenter doesn't close it, so all the following content becomes bold too. But Sanitize will add that closing bold tag in such a case.
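The tag bookkeeping described above can be sketched roughly like this in Python. Again, this is an illustration and not the plugin's own code: close_open_tags is a hypothetical name, the tag pattern is deliberately simple, and the VOID set (tags that are never closed) is an assumption of this sketch.

```python
import re

# Deliberately simple tag matcher: optional slash, tag name, anything up to '>'.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

# Assumption: tags that never take a closing tag.
VOID = {"br", "hr", "img", "input"}


def close_open_tags(html):
    """Append closing tags for anything still open at the end of the text."""
    open_tags = []  # stack of tag names, innermost last
    for match in TAG.finditer(html):
        closing, name = match.group(1), match.group(2).lower()
        if name in VOID or match.group(0).endswith("/>"):
            continue  # nothing to track for void or self-closed tags
        if not closing:
            open_tags.append(name)
        elif name in open_tags:
            # Find the most recent matching opener; anything opened after it
            # is treated as implicitly closed along with it.
            idx = len(open_tags) - 1 - open_tags[::-1].index(name)
            del open_tags[idx:]
    # Close the remaining tags, innermost first.
    return html + "".join(f"</{tag}>" for tag in reversed(open_tags))
```

For the runaway bold case, close_open_tags("<b>bold text") returns "<b>bold text</b>". Closing innermost tags first keeps the appended tags properly nested.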
If you have questions, complaints or problems with this plugin, please post them here within the comments for this entry.
How about stripping attributes from allowed tags (well, not href from a, but other than that)? I don't mind letting people make things bold, but I'd just as soon they don't make them style="color:lime; font-size:80px"
You could do that using the Macro plugin. For example:
<MTMacroDefine ctag="a" name="comment_a" no_case="1">
<MTMacroAttr name="style" remove="1"><MTMacroTag rebuild="1"><MTMacroContent></a>
</MTMacroDefine>
Then for your CommentBody tag:
<MTCommentBody apply_macros="m/^comment_/">
Hey there... I've got a similar plugin to this (MTCleanHTMLPlugin), based on code I borrowed from LiveJournal. One thing that they do that would be good to consider is extensive filtering out of JavaScript-enabling attributes (i.e. onclick, onmouseover, etc). Nasty things have been known to happen down that road. You could probably do something like that fairly simply by having a list of allowed attributes per tag, as well as allowed tags.
Brad, Great plug-in! Just a couple of comments:
1. It closes tags that don't need to be closed, like the image tag.
2. It closes tags after the closing paragraph tag added by MT, so that the order of tags is not standard, i.e., p b /p /b, rather than p b /b /p.
3. If there's a br tag in the line, it doesn't close the b tag.
I don't know if all this matters to the Web browser, so maybe it's not that important.
Sorry about those double pings - the first time I saved the entry, the URL stayed in the box as though the ping had failed - at least, I thought that's what it meant. Hmmmm.
Hi! I’m currently configuring my blog to use your plugin, but I was wondering:
My website uses XHTML. So I was wondering if your plugin supports elements from other namespaces, like xml:lang.
Thanks for making this plugin!
— minger
Erm, it's not a problem with HTML. It's a problem with Movable Type, for God's sake. Its comments section is insecure out of the box.
And with its thousands of users... it took almost a year to notice MT allowed just anything to pass without first checking that it's secure? I don't mean to badmouth MT, for in my mouth it would sound very biased, but please... LMAO, this is incredible :P
Thank God (or Rasmus and friends) for PHP's strip_tags() function, which does just what you did.
By the way, I just noticed you should also change any style, class, id attribute into a safer title attribute in the comments. This avoids defacements such as this one.
(OK, so b2 was vulnerable to such ridicule 0wnage too until tonight, but at least it doesn't come with a giant security issue in the comments form by default, eheh ;)
michel -- MT is *not* insecure out of the box. By default, all files have the extension .html. There may be some obnoxious Javascript tricks that could be pulled using that configuration, but certainly nothing like the things you can do with PHP, SSI, etc.
Now, it's true that the documentation could have been more clear about the dangers of using .php and .shtml as file extensions *with* static comments, but to say that this insecurity exists out of the box is completely false.
OK, I retract that statement about MT being insecure out of the box.
But just because files have the extension .html by default doesn't mean comment data shouldn't be sanitised by default before being inserted in the database, don't you think? :)
Actually, it is (well, it's sanitized before being displayed on a public page, at least).
Another default in a new MT blog is that HTML in comments is not allowed. So by default, all HTML in comments is completely stripped out--this includes both valid HTML tags and PHP/SSI/JSP/etc.
So in order to make it insecure you actually have to change the file extensions *and* check "allow HTML in comments". I forgot to mention that last night.
Anyway, yes, Brad's plugin is great, and it definitely fills a need.
Do you take plugin requests? ;) ... I'd like to propose a de-sanitizer (defiler??) that basically goes the other way: it takes straight text and produces HTML. Since you've posted a geekcode ;) then the technical word for what I would most like to see is a WikiText plugin.
This would seem most similar to your sanitizer, and I think folding the easy markup of WikiText into blogspace would open up blogging for a lot more people. We just need to look at the size of Wikipedia to see that everyday people have no trouble with WikiText, but there's no way we're going to get more than 10% of the population using inline HTML.
What do you think? Doable? Desirable?
A suggestion that occurred to me recently. I've noticed that when people post HTML comments, and accidentally put in a line break (by pressing enter while typing, not with a <br> tag) in the middle of a tag, stuff stops working. For example, an image might not display, or other annoying things like that.
I'm pretty sure it would be an easy fix that could tie in nicely with the checking for a close tag, but I'm a perl retard, and thus cannot implement it myself.
Just a suggestion :)
A question from a novice. After creating the directories and installing the files, is it correct that I replace with in the comment template
An error occurred:
bradchoate/sanitize.pm did not return a true value at plugins/sanitize.pl line 28.
this is my code:
recognise this error?
thanks, nardo
I think this plugin is stripping the target from my href's. The target="_blank" exists in the comment entry but is not getting through to the HTML. Do I need to include something for target in my allowable tags? I tried including target already.
Quote:
"One more feature of the Sanitize plugin is that while it scans the HTML for tags, it keeps up with which tags have been opened and closed. By the end of the data, if there are any tags that weren’t closed, it will append closure tags for each of them."
Can this be refined a bit? I'm having a little trouble, which is easiest explained by pointing you to the second page of this support forum thread. It may be closing a tag for me prematurely?? Or do you think it's something else causing this?
Thanks! Love the new site, btw!
To allow the target attribute, put this in your allowed tags:
a href target
Additionally, you can include title as well, allowing someone to create a full link.
What's the correct way to declare a singular tag with attributes?
img src width height alt/
Or
img/ src width height alt