In how many ways can the Microsoft Ajax 4 syntax break XHTML?

June 12, 2009

[Update: Added example pages from aria]

For the answer, see the points below.

I was going to email the comments below to some people at Microsoft who have shown an interest in opening up a conversation. But after I left a comment on Hanselman’s blog, Bertrand Le Roy, a PM at Microsoft, responded by saying “I really wish we could concentrate on the fantastic features in Microsoft Ajax 4 instead of discussing dogma.” So I think it’s in the public interest to highlight some of the incompatibilities that Microsoft’s proposed syntax for Ajax 4 is going to induce in your code. It’s not the first time a suggestion for MS Ajax has taken liberties with standards, either. See my previous post on their original idea of xml scripts, which was also breaking stuff last year.

So what is it that Microsoft Ajax is providing in the Visual Studio beta 1? Here’s a partial example taken from Hanselman’s entry.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>     
  <title>AdventureWorks AJAX</title>    
</head>
<body xmlns:sys="javascript:Sys" xmlns:class="http://schemas.microsoft.com/aspnet/class">
  <h1>Customer Directory</h1>
  <table cellspacing="0">           
    <tbody id="customers-template" class="sys-template">
      <tr sys:command="select" class:odd="{{ $index % 2 != 0 }}">
        ...              
      </tr>
    </tbody>
  </table>
</body>
</html>

The state of affairs with the html family

Before we go any further, let’s clarify what the state of affairs is. The HTML family looks as follows:

HTML 4.01, which is based on SGML, allows empty elements without a closing />, and doesn’t support namespaces.
XHTML 1.0 and XHTML 1.1, which are XML languages and fully support xml namespaces, with a caveat: documents that use other namespaces don’t validate as XHTML. Full stop. See below why this is an issue.
HTML5 family, which has two serializations:
- HTML5, a derivative of SGML that is not SGML anymore (so yes, we do have a third serialization format to contend with), and that only supports 3 fixed namespaces (namely html, svg and mathml). No others are recognized.
- XHTML5, an XML serialization that supports namespaces.

Those are the serialization formats. Now all those formats end up in a DOM, an in-memory representation of the components of an html page:

The html 4 DOM, which is DOM level 2 and doesn’t support namespaces declared in HTML. The scripting itself can however use namespaces.
The html 5 DOM, which supports xml namespaces (see below for caveats).
The XML DOM, which supports xml namespaces.

Defining the language used by the example

So this is where the fun begins. As everything is lowercase, this language could be HTML4.01, HTML5, XHTML5 or XHTML 1.x: HTML4 and HTML5 are case-insensitive, while any xml language is case-sensitive, and XHTML requires lowercase element and attribute names.

The absence of an HTML4 doctype means this document cannot be a valid HTML4 document. It could be an HTML5 document, or one of the XHTML languages. We’ll see why it’s important later, and it all comes down to the DOM.

So, let’s move on to the namespaced attributes used in the example, such as xmlns:sys on the body element and sys:command on the tr element. How would a browser parse them to build a DOM?

In HTML 4.01, HTML5 or XHTML (1.x or 5) served as HTML (the only mode IE supports), the use of a colon in an attribute name is allowed but is not attached to the concept of a namespace; it’s just part of an opaque identifier. In other words, the attribute name would be xmlns:sys.
In XHTML served as XML (aka XHTML 1.0, XHTML 1.1 and XHTML5), colons are allowed and understood to be part of a namespace. In other words, the attribute’s local name would be sys, with a prefix of xmlns.

So now we start seeing where the problem lies. HTML languages *do not* support xml namespaces. At all. Ever. Not even HTML5. The DOM however supports namespaces in XML-based languages and in HTML5, but not in HTML4.
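To make the difference concrete, here is a minimal sketch of the two naming models. It is not a real parser: describeAttribute and the bindings table are hypothetical names introduced for illustration, and the function only mimics how each parser family would name an attribute such as sys:command.

```javascript
// Simplified model of how each parser family names an attribute.
// Not a real parser: it only mimics the naming rules described above.
const XMLNS_NS = "http://www.w3.org/2000/xmlns/";

function describeAttribute(qualifiedName, servedAsXml, bindings) {
  if (!servedAsXml) {
    // text/html: the colon is just another character in a lowercased name.
    return { namespaceURI: null, prefix: null, localName: qualifiedName.toLowerCase() };
  }
  if (qualifiedName === "xmlns") {
    // The default namespace declaration lives in the reserved xmlns namespace.
    return { namespaceURI: XMLNS_NS, prefix: null, localName: "xmlns" };
  }
  const colon = qualifiedName.indexOf(":");
  if (colon === -1) {
    // Other unprefixed attributes are in no namespace.
    return { namespaceURI: null, prefix: null, localName: qualifiedName };
  }
  const prefix = qualifiedName.slice(0, colon);
  const localName = qualifiedName.slice(colon + 1);
  if (prefix === "xmlns") {
    return { namespaceURI: XMLNS_NS, prefix, localName };
  }
  // A real XML parser would reject an unbound prefix; this sketch returns null.
  const bound = Object.prototype.hasOwnProperty.call(bindings, prefix);
  return { namespaceURI: bound ? bindings[prefix] : null, prefix, localName };
}

// Prefix declarations taken from the body element of the example page.
const bindings = {
  sys: "javascript:Sys",
  class: "http://schemas.microsoft.com/aspnet/class",
};

console.log(describeAttribute("sys:command", false, bindings));
console.log(describeAttribute("sys:command", true, bindings));
```

The same source text ends up as one opaque name in the first call and as a namespace, prefix and local name in the second, which is the root of every breakage reviewed below.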

Ok so I mentioned that introducing xml namespaces in such a way broke a bunch of scenarios. Let’s review them.

XHTML 1.x isn’t conforming when used with xml namespaces

Bertrand Le Roy highlighted that all was fine in xml, because surely xml supports xml namespaces. While this is true, the XHTML 1.0 specification does not consider documents that use other xml namespaces to be strictly conforming. Here’s the relevant quote from http://www.w3.org/TR/xhtml1/#well-formed.

The XHTML namespace may be used with other XML namespaces as per [XMLNS], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Work by W3C is addressing ways to specify conformance for documents involving multiple namespaces.

What this means is that a document may well be well-formed xml but not be conforming XHTML 1.0 if it uses xml namespaces. That is why validators will refuse such documents, and why XHTML+RDFa and XHTML+MathML are different specifications with their own DTDs.

Character entities are breaking

As it happens, without a DTD for XHTML served as xml, Firefox and any other xml-enabled browser will not recognize the character entities everybody has been scattering in their pages for too long. Yes, that even includes the sacrosanct &nbsp;.

The only way for those character entities to be recognized, including in scripts, is to include a DTD.

In this scenario, the format proposed by Microsoft would be a compound XHTML language requiring its own DTD in order to validate at all, unless you aggressively prevent people from using DTDs.
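Numeric character references are a different mechanism from named entities and are resolved by any xml parser without a DTD. As a rough sketch of what moving away from named entities would involve, the hypothetical helper below rewrites a few of them into numeric form. The three-entry table is illustrative only, and the naive regular expression would also touch text inside scripts or CDATA sections.

```javascript
// Small illustrative table; a real page would need the full HTML entity list.
const NAMED_TO_CODE_POINT = new Map([
  ["nbsp", 160],
  ["copy", 169],
  ["eacute", 233],
]);

// The five entities that XML itself predefines, which need no DTD.
const XML_PREDEFINED = new Set(["amp", "lt", "gt", "quot", "apos"]);

function toNumericReferences(markup) {
  return markup.replace(/&([A-Za-z][A-Za-z0-9]*);/g, (match, name) => {
    if (XML_PREDEFINED.has(name)) {
      return match; // already safe in any xml document
    }
    const codePoint = NAMED_TO_CODE_POINT.get(name);
    // Unknown names are left untouched rather than guessed at.
    return codePoint === undefined ? match : "&#" + codePoint + ";";
  });
}

console.log(toNumericReferences("A&nbsp;&amp;&nbsp;B &copy; 2009"));
// prints: A&#160;&amp;&#160;B &#169; 2009
```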

HTML5 is broken

As I’ve highlighted before, the HTML serialization of HTML5 specifically ignores namespaces on any element or attribute that is not part of the mathml, svg or html namespaces, and will not generate namespaced DOM nodes for them. More than that, the xmlns attribute is ignored and allowed only for convenience.

What this means is very clear: if you’re doing an HTML5 document (as opposed to XHTML5), you cannot use namespaces.

DOM is broken in most scenarios

For those of us that use scripts, and I assume Ajax 4.0 is for scripters, this gets better. In the case where the document was served as HTML, be it HTML 4.01, HTML5, XHTML5 or XHTML 1.x, the method you’d use is getAttribute("sys:command").

In the case where you actually care about processing XHTML and serve it to browsers that support this language, you’d have to use getAttributeNS("javascript:Sys", "command").

For an example of the kind of trouble this provokes, see this code snippet: https://developer.mozilla.org/en/Code_snippets/getAttributeNS

So DOM is broken when using namespaces, as your script now has to know whether it is dealing with a DOM that supports xml namespaces, and whether the page was served as an xml serialization or an html one.
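The sketch below shows the kind of defensive code this forces on a script author. readSysAttribute is a hypothetical helper, not part of Microsoft Ajax or any library, and the two objects at the end are plain stand-ins for DOM elements so that the example runs outside a browser.

```javascript
const SYS_NS = "javascript:Sys"; // namespace URI bound to the sys prefix in the example

// Hypothetical helper: read sys:<localName> whichever way the page was parsed.
function readSysAttribute(element, localName) {
  if (typeof element.getAttributeNS === "function") {
    const value = element.getAttributeNS(SYS_NS, localName);
    // Missing attributes come back as null, or "" in some older DOMs.
    if (value !== null && value !== "") {
      return value;
    }
  }
  // HTML-parsed DOM: the attribute is stored under the opaque name "sys:command".
  return element.getAttribute("sys:" + localName);
}

// Stand-in for an element parsed as text/html.
const htmlParsed = {
  getAttributeNS: () => null,
  getAttribute: (name) => (name === "sys:command" ? "select" : null),
};

// Stand-in for an element parsed as application/xhtml+xml.
const xmlParsed = {
  getAttributeNS: (ns, local) => (ns === SYS_NS && local === "command" ? "select" : null),
  getAttribute: () => {
    throw new Error("fallback should not be needed here");
  },
};

console.log(readSysAttribute(htmlParsed, "command")); // select
console.log(readSysAttribute(xmlParsed, "command")); // select
```

Every attribute read in a template library would have to go through a wrapper like this, and the wrapper itself assumes the page author kept the sys prefix.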

CSS is broken

For exactly the same reason, CSS selectors don’t work consistently when the same page is served as both xml and html, or across versions of HTML.

Let’s try to write a CSS selector for the sys:command attribute shown above. First, the scenario where the document is parsed without xml namespaces, aka HTML5, HTML4.01 or XHTML 1.x served as text/html:

[sys\:command] { background: red; }

The backslash, as per the CSS specification, escapes a character that would otherwise be given a special meaning. Here, I match an opaque attribute name with the value of “sys:command”.

So what happens when you deal with xhtml?

@namespace sys url("javascript:Sys");
[sys|command] { background: red; }

The syntax is different because xml namespaces are expressed that way in CSS: the prefix has to be bound with @namespace, and a pipe separates it from the local name.

jQuery may be broken

And if you think the CSS syntax is a problem, you’ll have to remind yourself that jQuery is using CSS selectors, and doesn’t support the xml namespace syntax. See the problem above.
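From script, the split shows up as two different selector strings for the same attribute. The helper below is a hypothetical sketch that only builds those strings, so it runs anywhere. Note the doubled backslash needed inside a JavaScript string literal, and that the namespaced form only makes sense where a sys prefix has been declared, which, as far as I know, selector engines driven from script give you no way to do.

```javascript
// Hypothetical helper: build the attribute selector for sys:<localName>.
function sysAttributeSelector(localName, servedAsXml) {
  return servedAsXml
    ? "[sys|" + localName + "]" // namespace-aware form, needs a declared sys prefix
    : "[sys\\:" + localName + "]"; // opaque name, with the colon escaped for CSS
}

console.log(sysAttributeSelector("command", false)); // [sys\:command]
console.log(sysAttributeSelector("command", true)); // [sys|command]
```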

Learning from the Aria experiments

Rather than write code myself, I’ll point you to two tests that were written when the debate on the aria extensions was all the rage. Note that the first one may be flawed because of Firefox parsing the content as html. In either case, neither of those tests gives consistent results across browsers as it stands today.

http://simon.html5.org/test/aria/colon-vs-dash/
http://www.w3.org/XML/2008/04/ARIA-Testing/colon-test.html with description from Henry Thompson http://www.w3.org/XML/2008/04/ARIA-Testing/#three

Conclusion

I made the point, on Scott's blog, that using xml namespaces in non-xml languages was a grave mistake that had major compatibility issues with xml languages. I was brushed off, on that blog and through twitter, with the claim that dogma was the root cause of my argument. I hope that this entry addresses this concern, and that we can start discussing a way forward that doesn’t break anyone attempting to use application/xhtml+xml as it was intended.

To recap why using xml namespaces for this scenario is a bad idea:

Mixing HTML and XHTML is a mess, because of the various DOM levels, the absence of namespaces in HTML, and the different resulting DOM trees.
Arguing that xml is extensible when Internet Explorer doesn’t understand any XHTML format served as xml is just a very strange thing to do.
The Aria guys had the same issues and settled on a different extensibility point for those exact reasons.

My point here is not that the whole concept is wrong. It is right in a few scenarios, but it comes with a lot of baggage. If Microsoft intends to deliver those features to a wide audience, and knowing their track record at warning developers of the tradeoffs of not respecting standards (and to be fair, the track record of Microsoft developers not giving a f*ck about standards and eating whatever comes from the hosepipe), they should take a responsible approach and warn prominently of those incompatibilities.

So, how to move forward? Here are a few ideas thrown against the wall, to see if any of them stick:

Adopt a custom style group using a CSS-like language with its own media type. Compatibility is high and it doesn’t break anything (and would degrade very gracefully across languages).
Rely on the class attribute and on JavaScript for attaching templates.
Enclose any template as a CDATA block that can be processed manually through the DOM by a JavaScript library. This wouldn’t break anything and would be a valid way to enclose xml-valid documents.
Support Sam Ruby and the TAG in pushing industry-wide adoption of xml namespaces in HTML languages, aka HTML 5 or 4.2 or whatever is going to come out of the working group.
Failing all that, put up big red-letter warning signs explaining all the incompatibilities they introduce by implementing xml features when their flagship and only browser doesn’t support xml.
Don’t be defensive when points about your frameworks are made by the community. Don’t ask me to cheerlead you by making me look like a drag on your creativity. It’s really unnecessary if not borderline derogatory.

And now the nitpicker corner, for all the things I’ve heard before and may agree with, none of which change the result of this analysis:

Your css-like proposed language requires a CSS-like parser in JavaScript, it’s slow
Firefox, Safari and Opera also have issues with mixed HTML / XHTML scenarios
The W3C sucks and no one cares anymore, tag soup has won
Internet Explorer is right in not accepting xml languages, but still supports the xml DOM, because that makes sense in an alternative reality.