Content-Dislocation

As a computer programmer and user, I prefer to see file extensions. If it's a ".html" file, I know what that means and what kind of content that file contains. So I always enable the display of the extensions for the computers I use. But for the web, I'm doing my best to remove the extension for my web pages. At least, from the end user's point of view. It's called cruft. And it's bad.

I've reconfigured my weblog so that each place it refers to a document, it strips off the ".php" extension. I did this using my Movable Type Regex plugin:

<$MTEntryLink regex="s/\.php$//"$>

Not terribly difficult to do, but there are a lot of places I have to do it. I think I have them all covered: RSS/Atom feeds, search results, etc. Now, I still have php as my filename extension for all the files produced from Movable Type; I want the files to have an extension within the filesystem. I'm more comfortable with it that way. But I rely on the MultiViews feature of Apache's mod_negotiation to serve up the right file when the extension itself is absent. This works very well, and allows any extension to be optional.[1]
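To make the round trip concrete, here is a rough Python sketch of both halves: the substitution that strips the extension from a published link, and a much-simplified model of how an extensionless request can be matched back to a file. The function names and the directory listing are made up for illustration, and real MultiViews negotiation weighs content types, languages and encodings, which this sketch ignores.

```python
import re
from typing import List


def strip_php_extension(link: str) -> str:
    # Same idea as the template regex: drop ".php" only when it ends the link.
    return re.sub(r"\.php$", "", link)


def multiviews_candidates(requested: str, directory_listing: List[str]) -> List[str]:
    # Simplified assumption: an exact filename match wins outright; otherwise
    # every file named "<requested>.<something>" is a candidate variant.
    if requested in directory_listing:
        return [requested]
    prefix = requested + "."
    return sorted(name for name in directory_listing if name.startswith(prefix))


if __name__ == "__main__":
    print(strip_php_extension("http://bradchoate.com/weblog/2004/04/14/inventory.php"))
    print(multiviews_candidates("inventory", ["index.php", "inventory.php"]))
```

With a single candidate there is nothing to negotiate, which is the situation described here: one .php file per entry, requested without its extension.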

So all the links for my blog are published with links sans extensions. And yet, I still see people linking to my site with URLs ending in ".php". Why is that?

Well, I found out that Apache is still telling on me. When mod_negotiation serves a file that it picks, it sets a header named Content-Location that reveals the real filename. Here's the response I got from a "HEAD" request for http://bradchoate.com/weblog/2004/04/14/inventory:

HTTP/1.1 200 OK
Date: Thu, 15 Apr 2004 10:25:04 GMT
Server: Apache/1.3.29 (Unix) mod_throttle/3.1.2
        mod_gzip/1.3.26.1a PHP/4.3.4 mod_perl/1.29
Content-Location: inventory.php
Vary: negotiate,Accept-Encoding
TCN: choice
X-Powered-By: PHP/4.3.4
X-Accelerated-By: PHPA/1.3.3r2
Content-Type: text/html

The headers that mod_negotiation adds are: Content-Location, Vary and TCN. Now I don't mind the latter two, but I'd rather avoid the Content-Location header. Because mod_negotiation is emitting that, some clients out there (newsreaders, for example) will adjust the URL for that resource. So when people click through, they use the link with the php extension. Bad.
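If you want to check your own server for the same leak, something along these lines should work. It is a minimal Python sketch using only the standard library; the function names are invented for this example, and the sample text is simply the response shown above.

```python
import http.client
from email.parser import Parser

# The three headers attributed to mod_negotiation in the text above.
NEGOTIATION_HEADERS = ("Content-Location", "Vary", "TCN")

SAMPLE = """\
HTTP/1.1 200 OK
Date: Thu, 15 Apr 2004 10:25:04 GMT
Server: Apache/1.3.29 (Unix) mod_throttle/3.1.2
        mod_gzip/1.3.26.1a PHP/4.3.4 mod_perl/1.29
Content-Location: inventory.php
Vary: negotiate,Accept-Encoding
TCN: choice
X-Powered-By: PHP/4.3.4
X-Accelerated-By: PHPA/1.3.3r2
Content-Type: text/html
"""


def negotiation_headers(raw_response):
    """Pick the negotiation-related headers out of a raw header block."""
    # The status line is not a header, so set it aside before parsing.
    first_line, _, rest = raw_response.partition("\n")
    header_block = rest if first_line.startswith("HTTP/") else raw_response
    message = Parser().parsestr(header_block, headersonly=True)
    return {
        name: message[name]
        for name in NEGOTIATION_HEADERS
        if message[name] is not None
    }


def head_content_location(host, path):
    """Send a HEAD request; return the Content-Location value or None."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().getheader("Content-Location")
    finally:
        conn.close()


if __name__ == "__main__":
    print(negotiation_headers(SAMPLE))
```

Calling head_content_location with your own host and an extensionless path tells you whether the real filename is being revealed: anything other than None means the header is still there.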

So what to do? Well, after doing a little digging, I found mod_headers. It's another Apache module that lets you add or remove HTTP headers at will. It's one of the bundled modules for Apache, but it isn't enabled by default. After a quick rebuild of the server, I tried it out in my .htaccess file:

<IfModule mod_headers.c>
    Header unset Content-Location
</IfModule>

But alas, the header remained. I tried placing it in httpd.conf instead. No good. I re-read the documentation. I searched Google. I searched Google Groups. Still no luck. Lots of references to removing this header using mod_headers, but it wasn't removing it![2]

OK. I'm a programmer. Let's look at the source. And there it was (mod_negotiation.c):

ap_table_setn(r->err_headers_out, "Content-Location",
              ap_pstrdup(r->pool, variant->file_name));

I don't know what the ap_table_setn function does, but I infer from the use of the r->err_headers_out variable that this header is being stored as an error header, a distinction that makes a difference with mod_headers. mod_headers has a separate directive for handling error headers: ErrorHeader. So, a quick edit to my .htaccess file:

<IfModule mod_headers.c>
    ErrorHeader unset Content-Location
</IfModule>

And that did it! This went against the mod_headers documentation for the ErrorHeader directive, which says:

This directive can replace, merge or remove HTTP response headers during 3xx, 4xx and 5xx replies. For normal replies use the Header directive.

Even though negotiated documents are returned with an HTTP 200 result, the Header directive failed to handle the headers set by mod_negotiation. The ErrorHeader directive works perfectly fine, regardless of the actual HTTP result code. The documentation clearly needs to clarify all of this.
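The behavior is easier to see with a toy model. The Python below is not Apache's code; it is a sketch that assumes the server keeps two header tables, that both are merged into the outgoing response whatever the status code, and that each directive touches only one table. The class and function names are invented for the illustration.

```python
class ToyResponse:
    """Toy stand-in for a response with two header tables (not Apache's real structure)."""

    def __init__(self):
        self.headers_out = {}
        self.err_headers_out = {}

    def sent_headers(self):
        # Assumption of this sketch: both tables end up on the wire,
        # even for a plain 200 response.
        merged = dict(self.headers_out)
        merged.update(self.err_headers_out)
        return merged


def header_unset(response, name):
    # Models "Header unset": only the normal table is touched.
    response.headers_out.pop(name, None)


def error_header_unset(response, name):
    # Models "ErrorHeader unset": only the error table is touched.
    response.err_headers_out.pop(name, None)


if __name__ == "__main__":
    response = ToyResponse()
    response.headers_out["Content-Type"] = "text/html"
    # mod_negotiation stores its header in the error table, per the source above.
    response.err_headers_out["Content-Location"] = "inventory.php"

    header_unset(response, "Content-Location")
    print("Content-Location" in response.sent_headers())  # still present

    error_header_unset(response, "Content-Location")
    print("Content-Location" in response.sent_headers())  # now gone
```

Under those assumptions, the table a header lives in decides which directive can remove it, and the response status has nothing to do with it.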

So in closing, yay for Google for pointing me in the right direction anyway. Yay for open source so I could get to the bottom of this. And if you're using MultiViews like this, you should be stripping the Content-Location header. Otherwise, Apache will reveal your crufty little secret.

[1] I prefer using MultiViews over naming files without an extension and forcing a MIME type. The main reason is that if I were to force a MIME type, I couldn't mix multiple file types within the same directory. With MultiViews, I can place .pl, .cgi, .php, .js, etc. all in the same directory and the server knows how to handle each of them, because the extension is present. All I have to do is strip the extension from the published URLs to those files.

[2] At the time of this writing, Googling for errorheader unset content-location doesn't turn up anything, so this is hopefully a solution that others can benefit from.

TrackBack

TrackBack URL for this entry:
http://bradchoate.com/mt/feedback/tb/886

Listed below are links to weblogs that reference Content-Dislocation:

» Hiding File Extensions in Movable Type from Abe Fettig's Web Workshop
The techniques I use to remove file extensions from my Moveable-Type generated URLs. [Read More]

» I took some advice from Verily
...I cleaned up my archive entries so that they're more human-readable and "future-proof". Thanks really goes out to Brad for that one simple link in his content-dislocation post. Hope it doesn't mess up anyone's bookmarks...... [Read More]

» Moveable Type 3.0 from Rizwan Kassim's Public Log
I bit the bullet after SixApart updated their Moveable Type licensing setup… Had to mod some plugins to get MT3 working right, but now seems like a good time as any to start categorizing etc, so thats on the todo... [Read More]

» Don't Let Your URL's Get Old and Crufty from Lockergnome's Web Developers
If you've been searching for a way to remove file extensions from your weblog archives to make them neat and clean, it's not quite impossible! Removing these extensions (or "url cruft") is actually quite simple. Using DiveIntoMark.org's tutorial for Cr... [Read More]

13 Comments

I did something similar to this in MT a while ago (using your reg-ex plugin): Future-proof Your URIs.

I just did that. Unfortunately my trackback link is (now) broken :(

Matt said:

In WordPress it's a click of a button to enable URIs like yours.

SV said:

Please I need some help. If you see my blog ( http://mblog.com/forsv/) I have the Frequent Commenter section in side bar. I would like to modify that to do:

Say Tab [31]

[1] If I click on the name Tab it should go to Tab's Blog URL

[2] If I click on [31] it should go to a page with all the comments made by Tab in a format similar to when u click on category.

Can this be done or am thinking too much. Please help.

Thanks in advance.

Another technique is to redirect all incoming extensioned requests to the extension-free URL:

RedirectMatch permanent /(.*)\.php$ http://example.com/$1
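As a rough illustration of what that pattern does to a request path, here is a small Python sketch. The function name is made up, the base URL is the example.com placeholder from the directive, and the real matching is done by Apache, not by this code.

```python
import re


def redirect_target(request_path, base="http://example.com"):
    """Return the extension-free URL for a .php request, or None if no redirect applies."""
    # Same pattern as the directive: capture everything between the first
    # slash and a trailing ".php".
    match = re.search(r"/(.*)\.php$", request_path)
    if match is None:
        return None
    return f"{base}/{match.group(1)}"


if __name__ == "__main__":
    print(redirect_target("/weblog/2004/04/14/inventory.php"))
    print(redirect_target("/weblog/style.css"))
```

Requests that already lack the extension fall through untouched, so the redirected URL does not trigger a second redirect.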

If you're using PHP and MultiViews, you may want to read this. I don't know whether stripping the Content-Location header will mess up the SCRIPT_FILENAME variable, I'll have to check. You should be able to strip the Content-Location header with PHP, too: I'll check and get back to you...

Hmmm. Apache 2.x no longer has the ErrorHeader directive, and Header still does not work, as above.

PHP is less helpful than I had thought. It is possible to replace a response header (via the header() function), but not to delete one. The official spec for Content-Location does not permit it to be present but blank.

The only solution I can see, which I'm far too lazy to actually test, is to use the headers_list() function to parse the pending response headers, and remove the Content-Location header, if the array can be modified.

Brian said:

I can confirm Mark Tranchant's report: unsetting the Content-Location header does not work with Apache 2.x. I can't seem to get it to work on 1.3, either. I've tried

Header unset Content-Location
ErrorHeader unset Content-Location

Neither works. I tested Header to make sure it was on in both Apache setups. Shame.

Because of an Opera bug related to the Content-Location header, I have been forced to take action. Fortunately, it’s a simple one-line comment block to recompile mod_negotiation to lose this header. Read about it, and the bug that caused me to leap out of slumbering lethargy, here.

Steve Clay said:

I just discovered this phenomenon after posting it as an Opera bug.
Here was my test page showing a potential data loss situation due to Opera's "correct" implementation.

Richard A said:

The solution for Apache 2 is to use "always" instead of the default "onsuccess":

Header always unset Content-Location

Rene A. said:

Thanks Richard A!

I was looking for a solution in Apache 2.2.x and the command in the last comment works very nice.

Rene A. said:

Oh my god!

On Mac OSX with DarwinPorts - apache 2.2.4 it works.
On Debian - apache 2.2.3 the header remains!

Does anybody have any idea about this?

Milivoj said:

Thank you Richard A. I was starting to despair, the "always" did the trick.

About

This article was published on April 15, 2004 9:57 AM.
