Over the last week or so, I’ve been modifying the dasBlog source code quite heavily to make it Xhtml 1.0 Strict compliant. I also want to make it content type negotiation enabled, and as such serve application/xhtml+xml mime types instead of just text/html (more on that later).
The goals for this project:
- Render only XML formatted XHTML 1.0 Strict markup;
- Clean up the markup being used, and beautify it;
- Give an option to minimize the size of a page.
The scope of the problem is large, as both asp.net and dasBlog lack most of the support for it. Over the next couple of weeks, I’ll publish several articles on how I fixed a bunch of issues:
- Links don’t have proper character entity escaping (turning & into &amp;amp; in URLs like http://test.com?firstparam&secondparam).
- WebControls push style blocks in the body of the page.
- Different components misuse the Write* family of methods on the HtmlTextWriter object that is used to render markup in the HTTP pipeline of asp.net.
- Blog entries are not stored or processed as proper Xhtml content. There needs to be clean-up of the code, both when an entry is being added and when pulling an entry from the dasblog store. Storing as Xhtml only content and publishing proper namespaced html content will be a next step in the project.
- Several elements, including input and form, have a name attribute instead of / alongside an id attribute.
- Input elements are not wrapped in the proper controls.
- Several elements use inappropriate attributes (language when using scripts in onclick blocks, etc.).
This non-exhaustive list is what must be fixed before we can start thinking of being compliant. There’s also a couple of things I wanted to achieve to clean the markup rendered by dasBlog:
- Do these two things for style blocks as well.
I decided to fix these problems by:
- Inheriting from HtmlTextWriter and building a new XhtmlTextWriter.
- Overriding all the Write* methods on HtmlTextWriter and piping them through the proper AddAttribute / RenderBeginTag methods (which involves parsing the HTML content and cleaning it up, or passing it through, on the fly).
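To make the idea concrete, here is a minimal sketch in Python rather than in the C# an actual XhtmlTextWriter would be written in. The class and method names (XhtmlWriter, add_attribute, render_begin_tag and so on) are made up for illustration and are not the dasBlog or ASP.NET API. The point is only that when every element goes through one attribute / element API, escaping and lowercasing happen in a single place.

```python
from xml.sax.saxutils import escape, quoteattr


class XhtmlWriter:
    """Toy writer (hypothetical): all markup goes through one element API."""

    def __init__(self):
        self._parts = []    # rendered output fragments
        self._pending = []  # attributes queued for the next element
        self._open = []     # stack of currently open element names

    def add_attribute(self, name, value):
        # Values are stored raw and escaped exactly once, at render time.
        self._pending.append((name.lower(), value))

    def render_begin_tag(self, name):
        name = name.lower()  # XHTML element and attribute names are lowercase
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self._pending)
        self._pending.clear()
        self._parts.append(f"<{name}{attrs}>")
        self._open.append(name)

    def write_text(self, text):
        # Character data is escaped too (&, < and >).
        self._parts.append(escape(text))

    def render_end_tag(self):
        # Closes the most recently opened element, so tags always nest.
        self._parts.append(f"</{self._open.pop()}>")

    def getvalue(self):
        return "".join(self._parts)


writer = XhtmlWriter()
writer.add_attribute("HREF", "http://test.com?firstparam&secondparam")
writer.render_begin_tag("A")
writer.write_text("a link")
writer.render_end_tag()
print(writer.getvalue())
# prints: <a href="http://test.com?firstparam&amp;secondparam">a link</a>
```

A component that writes raw strings straight to the output bypasses all of this, which is why the raw Write* calls have to be intercepted and parsed.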
I’ll write at least two articles on the subject over the next couple of weeks.
I started this entry as part of the “Making dasBlog XHTML 1.0 Strict Compliant” entry, but the subject is different enough that it really deserves a separate entry.
I’m a strong supporter of HTTP transparent content negotiation. What is it? In short, it means that a user agent, given a URL, can request a document in different formats. This involves changing the dasBlog URL scheme (which I really dislike in the first place; GUIDs have a tendency to freak me out). I’ve settled on title-based URLs in the form of http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry. The old URLs are remapped dynamically to the new scheme. I also didn’t want to expose the .aspx extension, as it looks awkward to me: a URI should point to a resource, and since these resources are dynamic, it’s architecturally just plain wrong to expose a file extension.
Whenever a user-agent sends a GET on this URL using the Accept header, the server will deliver either:
That’s right, on the same URL.
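As a rough sketch of what the server has to do with the Accept header, the Python below parses the header's q-values and picks one of the representations the server can produce. The function names and the list of available types are assumptions made for the example, not the actual dasBlog code, and the parsing is deliberately simplified (media type parameters other than q are ignored).

```python
def parse_accept(header):
    """Return a list of (media_range, q) pairs from an Accept header."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0  # the default quality when no q parameter is given
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unreadable q as "not acceptable"
        ranges.append((media, q))
    return ranges


def specificity(media_range):
    """Rank ranges: */* is least specific, type/* next, type/subtype most."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, content_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return content_type.startswith(media_range[:-1])
    return media_range == content_type


def negotiate(accept_header, available):
    """Pick the best type from `available` (listed in server preference order)."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for content_type in available:
        # The most specific matching range decides the q-value for this type.
        candidates = [(specificity(m), q) for m, q in ranges if matches(m, content_type)]
        if not candidates:
            continue
        _, q = max(candidates)
        if q > best_q:  # strict comparison: ties keep the earlier server preference
            best, best_q = content_type, q
    return best


# Hypothetical set of representations for one entry, in server preference order.
AVAILABLE = ["application/xhtml+xml", "text/html", "application/pdf"]

print(negotiate("text/html, */*;q=0.1", AVAILABLE))  # prints: text/html
```

With a bare `Accept: */*` every type ties, so the first entry in the server's own preference list wins, which is exactly why that header is so unhelpful.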
Why would you want to serve aggregation formats per entry, I hear you ask. Well, first because I can! But more to the point, because I think it’s a big limitation not to be able to retrieve any item in a specific blog by its URL and read it in your aggregator.
But how do you find the url in an aggregator to retrieve it? Very easy really, as each component of the URL I gave earlier is actually served using content negotiation as well! Just do a GET on http://blog.thetechnologist.net/year/month/ and you will get a feed with the list of entries for that month. Or an XHtml 1.0 page. Or simple HTML.
Of course, most agents don’t support content negotiation right now (and many actually send an Accept: */*, the worst case). To support linking to a specific resource, you can annotate your URL with attributes that have the same effect as the header (the system actually dynamically includes these as HTTP headers before the request is processed any further). Taking the previous example, from a browser not supporting content negotiation, you can get a pdf document of the entry as http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry@accept:application/pdf. I’ve chosen the separator character @ because it doesn’t require escaping in a URL and because in these xml days, it means attribute.
I also wanted a way to translate different entries into different languages. Ideally, I’d like my blog to be available both in French and in plain old English. Basic support for that is already available in dasBlog, but there is no correlation between two posts in different languages: dasBlog treats them as separate entries. This involves a few more modifications to the engine, but considering what has already been done, I see no reason not to do it! In this context, the URL for the French version will be http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry@accept:application/pdf@accept-language:fr-fr
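The annotation scheme can be sketched as a small pre-processing step that strips the @name:value parts from the path and turns them into request headers. This is an illustration in Python under my own assumptions: split_annotations and apply_annotations are hypothetical names, and malformed annotations are simply ignored here, which may not be what the real URL mapper does.

```python
def split_annotations(path):
    """Split a path such as
    '/year/month/day/TitleOfYourEntry@accept:application/pdf@accept-language:fr-fr'
    into the bare resource path and a dict of header overrides."""
    resource, *annotations = path.split("@")
    headers = {}
    for annotation in annotations:
        name, sep, value = annotation.partition(":")
        if not sep or not name or not value:
            continue  # ignore malformed annotations rather than guess
        # 'accept-language' becomes 'Accept-Language'
        headers[name.title()] = value
    return resource, headers


def apply_annotations(path, request_headers):
    """Return the bare path and the request headers with URL annotations
    applied on top, as if the agent had sent them itself."""
    resource, overrides = split_annotations(path)
    merged = dict(request_headers)
    merged.update(overrides)
    return resource, merged


path, headers = apply_annotations(
    "/year/month/day/TitleOfYourEntry@accept:application/pdf@accept-language:fr-fr",
    {"Accept": "*/*"},
)
print(path)     # prints: /year/month/day/TitleOfYourEntry
print(headers)  # prints: {'Accept': 'application/pdf', 'Accept-Language': 'fr-fr'}
```

Because the annotations end up as ordinary headers, the rest of the pipeline only ever has to deal with one negotiation mechanism.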
There is a flaw there: in a translated entry, the title would also be translated. Without a meaningful way to link two entries with alternate URLs at the HTTP or URI level, I’ve decided to simply provide access through the translated title as well. Note that because I link several URLs to the same entry, I also expose an id-based URL for each entry, such as http://blog.thetechnologist.net/1176f9ea. The id URL is served in the default language / content type, just like the rest of the titles.
I describe the alternative URLs for each entry as follows:
- In the Xhtml version, I publish link elements and leverage both the hreflang and type attributes.
- In all versions, I publish X-Alternate headers containing the same link element (although this may change). That way, if the format doesn’t support showing the alternative URLs (like pdf documents), the information is still available to the agent.
- In SOAP endpoints… Well, the SOAP part is not very advanced yet, but let’s just say that I consider each entry to be a resource addressable individually.
The last rather smart thing I’ve done is to use link elements to tie all the different variations of one entry to its id URL. This is what you would get:
<link rev="alternate" href="http://blog.thetechnologist.net/1176f9ea" />
<link rel="alternate" href="http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry" />
Using this system, it’s possible to detect which URIs are primary and which are aliases.
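Here is how an agent might do that detection, sketched in Python with the standard html.parser module. It assumes my reading of the markup above: a rev="alternate" link points at the primary (id) URL that the current page is a variation of, and a rel="alternate" link points at an alias. The AlternateLinks class name is invented for the example.

```python
from html.parser import HTMLParser


class AlternateLinks(HTMLParser):
    """Collect the targets of rev="alternate" and rel="alternate" links."""

    def __init__(self):
        super().__init__()
        self.primary = []  # URLs this page declares itself an alternate of
        self.aliases = []  # URLs declared as alternates of this page

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <link ... /> elements.
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if attrs.get("rev") == "alternate":
            self.primary.append(href)
        if attrs.get("rel") == "alternate":
            self.aliases.append(href)


parser = AlternateLinks()
parser.feed(
    '<link rev="alternate" href="http://blog.thetechnologist.net/1176f9ea" />'
    '<link rel="alternate" '
    'href="http://blog.thetechnologist.net/year/month/day/TitleOfYourEntry" />'
)
print(parser.primary)  # prints: ['http://blog.thetechnologist.net/1176f9ea']
print(parser.aliases)
```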
I’m still considering providing some form of RDF for the RDF junkies out there to have fun with, but I’m not an expert and would need to study a lot before I could make RDF fit the content I’m exposing.
The scope of what can be achieved with this type of negotiation is huge! It means that you’d possibly be able to request an entry in audio format that would either give you a podcast or a text-to-speech representation of an entry.
This will also be the subject of an article, as the infrastructure I’m building to support content negotiation (the URL mapper, and associated IContentProvider interfaces that support each type) can be reused on a wide variety of web applications.