Getting dasBlog to support Transparent HTTP Content Negociation

January 24, 2005 Edit http

I started this entry as part of the Making dasBlog XHTML 1.0 Strict Compliant, but the subject is different enough that it really deserves a separate entry.

I’m a strong supporter of HTTP transparent content negotiation. What is it? In short, it means that a user agent, given a URL, can request a document in different formats. This involves changing the dasBlog url scheme (which I really dislike in the first place, GUID have a tendency to freak me out). I’ve settled on title based URLs in the form of http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry. The old URLs are remapped dynamically to the new scheme. And I just didn’t want to expose the .aspx extension, as it looks awkward to me (as a URI should point to a resource, and as these resources are dynamic, it’s architecturally just plain wrong to expose a file extension).

Whenever a user-agent sends a GET on this URL using the Accept header, the server will deliver either:

text/html
application/xhtml+xml
application/rss+xml
application/atom+xml

That’s right, on the same URL.

Why would you want to serve aggregation formats per entry, I hear you ask. Well, first because I can! But more to the point, because I think it’s a big limitation not to be able to retrieve any item in a specific blog by its URL and read it in your aggregator.

But how do you find the url in an aggregator to retrieve it? Very easy really, as each component of the URL I gave earlier is actually served using content negotiation as well! Just do a GET on http://blog.thetechnologist.net/year/month/ and you will get a feed with the list of entries for that month. Or an XHtml 1.0 page. Or simple HTML.

Of course, most agents don’t support content negotiation right now (and many actually send an Accept: */*, the worst case). To support linking to a specific resource, you can annotate your URL with attributes that have the same effect as the header (the system actually dynamically include these as HTTP headers before the request is processed any further). Taking the previous example, from a browser not supporting content negotiation, you can get a pdf document of the entry as http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry@accept:application/pdf. I’ve chosen the separator character @ because it doesn’t require escaping in a URL and because in these xml days, it means attribute.

I also wanted a way to translate different entries in different languages. Ideally, I’d like my blog to be available both in French and in plain old English. Basic support for that is already available in dasBlog, but there is no correlation between two posts in different languages, dasBlog treats them differently. This involves a bit more modifications to the engine, but considering what have already been done, I see no reason not to do it! In this context, the URL for the French version will be http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry@accept:application/pdf@accept-language:fr-fr

There is a flaw there: in a translated entry, the title would also be translated. Without a meaningful way to link at the HTTP or URI level two entries with alternate urls, I’ve decided to simply provide access through the translated entry as well. Note that because I link several URLs to the same entry, I also expose an id based entry as http://blog.thetechnologist.net/1176f9ea. The id itself is served in the default language / content type, just like the rest of the titles.

I do describe the alternative urls for each entry as follow:

In the Xhtml version, I publish and leverage both the hreflang and type attributes.
In all versions, I publish X-Alternate headers containing the same link element (although this may change). That way, if the format doesn’t support showing the alterative urls (like pdf documents), the information is still available to the agent.
In SOAP endpoints… Well the SOAP part is not very much advanced yet, but let’s just say that I consider each entry to be a resource addressable individually.

The last rather smart thing done is to leverage to link all the different variations of one entry to it’s id url. This is what you would get:

http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry

http://blog.thetechnologist.net/1176f9ea

Using this system, it’s possible to detect which URIs are primary and which are aliases.

I’m still considering providing some form of RDF for the rdf junkies out there to have fun, but I’m not an expert and would need to study a lot before I could make a fit between rdf and the content I’m exposing.

The scope of what can be achieved with this type of negotiation is huge! It means that you’d possibly be able to request an entry in audio format that would either give you a podcast or a text-to-speech representation of an entry.

This will also be the subject of an article, as the infrastructure I’m building to support content negotiation (the URL mapper, and associated IContentProvider interfaces that support each type) can be reused on a wide variety of web applications.