As a computer programmer and user, I prefer to see file extensions. If it's a ".html" file, I know what that means and what kind of content that file contains. So I always enable the display of the extensions for the computers I use. But for the web, I'm doing my best to remove the extension for my web pages. At least, from the end user's point of view. It's called cruft. And it's bad.
I've reconfigured my weblog so that each place it refers to a document, it strips off the ".php" extension. I did this using my Movable Type Regex plugin:
Not terribly difficult to do, but there are a lot of places I have to do it. I think I have them all covered: RSS/Atom feeds, search results, etc. Now, I still have php as my filename extension for all the files produced from Movable Type; I want the files to have an extension within the filesystem. I'm more comfortable with it that way. But I rely on the MultiViews feature of Apache's mod_negotiation to serve up the right file when the extension itself is absent. This works very well, and allows any extension to be optional.[1]
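To make the MultiViews behavior concrete, here is a rough sketch in Python of the lookup step only. It is a simplification and not Apache's code: when the requested name doesn't exist, the server looks for files named "name.*" in the same directory and negotiates among them. The function name multiviews_candidates is made up for illustration, and the actual negotiation (language, type and quality values) is left out.

```python
def multiviews_candidates(requested, directory_listing):
    """Return the filenames a MultiViews-style lookup would consider.

    Simplified model: an exact match is served as-is; otherwise every
    file whose name starts with "<requested>." is a candidate variant.
    """
    if requested in directory_listing:
        return [requested]
    prefix = requested + "."
    return sorted(name for name in directory_listing if name.startswith(prefix))


# Hypothetical directory contents, for illustration only.
listing = ["index.php", "inventory.php", "inventory2.php"]
print(multiviews_candidates("inventory", listing))
```

With a single candidate there is nothing to choose between, so a request for "inventory" ends up served from inventory.php.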
So all the links on my blog are published sans extensions. And yet, I still see people linking to my site with URLs ending in ".php". Why is that?
Well, I found out that Apache is still telling on me. When mod_negotiation serves a file that it picks, it sets a header named Content-Location that reveals the real filename. Here's the response I got from a "HEAD" request for http://bradchoate.com/weblog/2004/04/14/inventory:
HTTP/1.1 200 OK
Date: Thu, 15 Apr 2004 10:25:04 GMT
Server: Apache/1.3.29 (Unix) mod_throttle/3.1.2 mod_gzip/188.8.131.52a PHP/4.3.4 mod_perl/1.29
Content-Location: inventory.php
Vary: negotiate,Accept-Encoding
TCN: choice
X-Powered-By: PHP/4.3.4
X-Accelerated-By: PHPA/1.3.3r2
Content-Type: text/html
The headers that mod_negotiation adds are Content-Location, Vary and TCN. Now I don't mind the latter two, but I'd rather avoid the Content-Location header. Because mod_negotiation is emitting that, some clients out there (newsreaders, for example) will adjust the URL for that resource. So when people click through, they use the link with the php extension. Bad.
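One way to check for this leak from the client side is to script the HEAD request. Here is a minimal sketch in Python using only the standard library. The function names head_headers and leaked_filename are made up for illustration, and the comparison is deliberately naive: it only looks at whether the Content-Location value names something other than the last segment of the URL that was requested.

```python
import http.client
from urllib.parse import urlsplit


def head_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict
    keyed by lower-cased header name. This performs network I/O."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return {name.lower(): value for name, value in response.getheaders()}
    finally:
        conn.close()


def leaked_filename(url, headers):
    """Return the Content-Location value if it names a different file
    than the last path segment of the requested URL, else None."""
    location = headers.get("content-location")
    if location is None:
        return None
    requested = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    served = location.rsplit("/", 1)[-1]
    return location if served != requested else None
```

Calling leaked_filename(url, head_headers(url)) on one of your own extension-less URLs returns the revealing filename if the server is still sending it, or None if the header is gone.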
So what to do? Well, after doing a little digging, I found mod_headers. It's another Apache module that lets you add or remove HTTP headers at will. It's one of the bundled modules for Apache, but it isn't enabled by default. After a quick rebuild of the server, I tried it out in my .htaccess file:
<IfModule mod_headers.c>
    Header unset Content-Location
</IfModule>
But alas, the header remained. I tried placing it in httpd.conf instead. No good. I re-read the documentation. I searched Google. I searched Google Groups. Still no luck. Lots of references to removing this header using mod_headers, but it wasn't removing it![2]
OK. I'm a programmer. Let's look at the source. And there it was (mod_negotiation.c):
ap_table_setn(r->err_headers_out, "Content-Location", ap_pstrdup(r->pool, variant->file_name));
I don't know what the ap_table_setn function does, but I infer from the use of the r->err_headers_out variable that this header is being stored as an error header, a distinction that makes a difference with mod_headers. mod_headers has a separate directive for handling error headers: ErrorHeader. So, a quick edit to my .htaccess file:
<IfModule mod_headers.c>
    ErrorHeader unset Content-Location
</IfModule>
And that did it! This went against the mod_headers documentation for the ErrorHeader directive, which says:
This directive can replace, merge or remove HTTP response headers during 3xx, 4xx and 5xx replies. For normal replies use the Header directive.
Even though negotiated documents are returned with an HTTP 200 result, the Header directive failed to handle the headers set by mod_negotiation. The ErrorHeader directive works perfectly fine, regardless of the actual HTTP result code. The documentation clearly needs to clarify all of this.
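Here is a toy model in Python of what seems to be going on. It is not Apache's code, just a sketch under these assumptions: a request carries two outgoing header tables (named headers_out and err_headers_out after the fields in the source), the Header directive only edits the first, ErrorHeader only edits the second, and both tables are combined when the response is sent. The class and function names are made up for illustration, and header names are treated as plain case-sensitive strings to keep it short.

```python
class Request:
    """Minimal stand-in for a request with two outgoing header tables."""

    def __init__(self):
        self.headers_out = {}
        self.err_headers_out = {}


def negotiate(request, filename):
    # Mirrors the quoted line from mod_negotiation.c: the real
    # filename goes into the *error* header table.
    request.err_headers_out["Content-Location"] = filename


def header_unset(request, name):
    # Models "Header unset": only the normal table is touched.
    request.headers_out.pop(name, None)


def error_header_unset(request, name):
    # Models "ErrorHeader unset": only the error table is touched.
    request.err_headers_out.pop(name, None)


def response_headers(request):
    # Assumption: both tables are combined into the final response.
    merged = dict(request.err_headers_out)
    merged.update(request.headers_out)
    return merged


req = Request()
req.headers_out["Content-Type"] = "text/html"
negotiate(req, "inventory.php")

header_unset(req, "Content-Location")
survived_header_unset = "Content-Location" in response_headers(req)

error_header_unset(req, "Content-Location")
survived_error_header_unset = "Content-Location" in response_headers(req)
```

In this model the header survives the first call and disappears after the second, which matches what I saw on the real server: the result code doesn't matter, only which table the header was stored in.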
So in closing, yay for Google for pointing me in the right direction anyway. Yay for open source so I could get to the bottom of this. And if you're using MultiViews like this, you should be stripping the Content-Location header. Otherwise, Apache will reveal your crufty little secret.
[1] I prefer using MultiViews over naming files without an extension and forcing a MIME type. The main reason is that if I were to force a MIME type, I couldn't mix multiple file types within the same directory. With MultiViews, I can place .pl, .cgi, .php, .js, etc. all in the same directory and the server knows how to handle each of them, because the extension is present. All I have to do is strip the extension from the published URLs to those files.
[2] At the time of this writing, Googling for "errorheader unset content-location" doesn't turn up anything, so this is hopefully a solution that others can benefit from.