Helpful Fascism

Quite a storm of activity going on over at Dave Hyatt’s developer weblog.

Dave is clearly annoyed with the difficulty of parsing broken HTML. It’s an extremely hard problem to solve, and not very well-defined. It must be even more frustrating to have to be judged against the quirky behavior of the dominant browser. And Dave is exactly right about Safari’s XML parser: it must be “Draconian”. Otherwise, by definition it’s not an XML parser. (The more interesting question is whether one should actually use an XML parser to parse XHTML.)
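
To make that concrete, here is a minimal sketch of what “Draconian” means in practice, using Python’s expat bindings. (This is my toy, not anything from Safari, and the markup strings are invented for illustration.)

```python
# To an XML parser, a well-formedness error is not a warning; it is fatal.
import xml.parsers.expat

well_formed = "<p>Hello, <em>world</em>!</p>"
malformed = "<p>Hello, <em>world!</p>"  # <em> is never closed

for markup in (well_formed, malformed):
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(markup, True)
        print("parsed:", markup)
    except xml.parsers.expat.ExpatError as err:
        # The spec says stop right here; guessing what the author meant is not allowed.
        print("fatal error (%s) in: %s" % (err, markup))
```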

All of this kerfuffle has generated various proposals for solving the problem of broken XHTML. Presumably the web would be a better place if all XHTML files were well-formed and therefore parseable with a standard XML parser. But the reality is that about 75% of XHTML home pages are invalid, and over 90% of XHTML sites have at least one invalid page.[1] So how do we improve the situation?

One commenter on Dave’s weblog said that he dreamed about a “fascist” browser, one that would refuse to display any page with any errors. Unfortunately, a recent weblogs.mozillazine.org server error swallowed that comment forever. It seems that the Great Spirits of the Internet have spoken. Fascist browser: bad idea. Let us never speak of it again.

A friendlier variation on this proposal is to design a browser that still displays a malformed XHTML page, but provides some sort of obtrusive error message. The basic idea is that the users will then know about the errors, the website will receive a flood of complaints, the designers will fix the page, and XHTML quality will improve. As Dave puts it:

“Many people suggested that there be a built-in validator in the browser that could show the errors to the developer. The validators basically break down into two types: obtrusive validators and unobtrusive validators.

If the validator is unobtrusive, then I would argue that it won’t receive sufficient usage to make a difference. If the browser doesn’t impose a penalty of some kind, then there will be no incentive for the author to correct mistakes.”

Never before has the gulf between developer and end user been more stark.

I’m hoping this is simply the result of sheer frustration on Dave’s part. That at some level, he realizes how quickly this “feature” will destroy Safari. In a sense, it’s kind of reassuring. Proves he’s human.[2]

Unfortunately, I don’t think any browser could survive pulling such a stunt. Maybe Internet Explorer could… after all, IE users have been trained for years to ignore annoying technical messages, and most of them don’t know how to change their browser anyway. It would be a close one, at least. Actually, here’s a clever evil plan. If we could somehow convince the IE development team that the “helpful fascist” browser was a good idea, we’d be in a win-win situation. Because either:

  • XHTML quality would magically improve all over the web, OR
  • Firebird and Opera would capture 50% of the Windows browser market.

Oops, did I say that last part out loud? Damn. Being an evil mastermind is harder than I thought.

1. Note that the XHTML 100 does not distinguish between well-formedness errors and validity errors. However, I can assure you that very few sites were invalid but well-formed. Most of the failed sites generated a torrent of errors of all kinds.

2. Although we really ought to run Dave through the Voight-Kampff Empathy Test, just to be sure.

7 thoughts on “Helpful Fascism”

  1. I’m not sure I get it.

    We already “have” fascist browsers: Mozilla uses the expat parser on XML pages; IE does too (but only on local files, not on pages served over the ’net as application/xhtml+xml).

    They’ll both barf on ill-formed XML.

    I’m not even sure what the “lenient” proposal on the table is:

    * not use expat (instead, use some “liberal” XML parser)?
    * use expat, but fall back to the tag-soup parser if expat reports an error?
    * use expat, but if expat reports an error, try to “correct” it and resubmit to expat?

    Firebird already *does* operate in the fashion proposed for Safari. Does that really make it a less-attractive browser?

  2. > Firebird already *does* operate in the fashion proposed for Safari. Does that really make it a less-attractive browser?

    No. But only because no one uses XHTML. One reason that no one uses XHTML is the strict error handling (the other is the lack of support from IE/Win – but working around this limitation is almost trivial for most sites). Lots of people are claiming this strict error handling is a good thing, but then most of those people are serving their XHTML as text/html, so reality doesn’t back up their claims.
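
    For what it’s worth, the workaround usually looks something like the sketch below: pick the Content-Type per request from the Accept header. (A deliberately naive illustration that ignores q-values; real negotiation is only slightly more work.)

    ```python
    # Serve application/xhtml+xml only to clients that explicitly advertise it.
    # IE/Win never lists it in its Accept header, so it quietly gets text/html.
    def choose_content_type(accept_header):
        if "application/xhtml+xml" in accept_header:
            return "application/xhtml+xml"  # client gets strict XML parsing
        return "text/html"                  # tag-soup parsing for everyone else

    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.5"))  # Mozilla-style
    print(choose_content_type("image/gif, image/jpeg, */*"))                 # IE-style
    ```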

    Given the existence of a browser that parsed ill-formed XHTML and one that didn’t, the browser which parsed more files would generally be more attractive, since it would be more likely to render the content you were attempting to view. Given that this browser was IE/Win and thus had a large market share, the amount of content that could only be rendered by a liberal parser could become large – and we’d be back in exactly the same situation we’re in for HTML, where the IE implementation is the accepted standard, rather than the standard itself.

    This is the situation we wish to avoid.

    People wanting liberal parsers, and UA authors wanting to please their users, is essentially a social problem. I don’t think you can simply define it away. The specs themselves should be robust so there’s no reason for proprietary error handling. One could also mandate that UAs provide a mechanism for users to learn about the errors on a particular page. Since validity problems are problems only the author can fix, there is even some merit in the path that Atom seems to be taking, which is to create a feedback mechanism so that authors learn about the problems on their page. That probably won’t encourage them to fix the problems, but there is only so much that people can do.

    I can’t think of a single, successful content distribution format in which client programs are forbidden from performing error recovery and this restriction is actually abided by. Why will XML be different?

  3. Seems like a lot of heated discussion over a format “nobody uses.”

    My personal proposal for XML is: parse till you hit the first well-formedness error, then bail: close all your open tags and ship the result off to be rendered. This procedure is (I think) unambiguous (unlike more complicated “error-recovery” schemes) and gives the user as much of the page as can be done (without introducing dubious guesses as to what the author meant to write).
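
    Roughly, with an expat-style event parser, the bail-out could look like this sketch (the string-building handlers are just stand-ins for a real DOM builder):

    ```python
    # Build output until the first well-formedness error, then bail:
    # close whatever elements are still open and hand off the partial result.
    import xml.parsers.expat

    def parse_until_first_error(markup):
        out = []         # stand-in for a real document tree
        open_tags = []

        def start(name, attrs):
            open_tags.append(name)
            out.append("<%s>" % name)

        def end(name):
            open_tags.pop()
            out.append("</%s>" % name)

        def text(data):
            out.append(data)

        parser = xml.parsers.expat.ParserCreate()
        parser.StartElementHandler = start
        parser.EndElementHandler = end
        parser.CharacterDataHandler = text
        try:
            parser.Parse(markup, True)
        except xml.parsers.expat.ExpatError:
            # Bail: close every still-open element, innermost first.
            for name in reversed(open_tags):
                out.append("</%s>" % name)
        return "".join(out)

    # The stray </i> is a well-formedness error; everything before it survives.
    print(parse_until_first_error("<p>Hello, <em>world</i></p>"))
    # -> <p>Hello, <em>world</em></p>
    ```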

  4. Oh, there isn’t any new lenient proposal on the table. I *like* Firebird’s current behavior. There are countless thousands of “bozos” (as Tim Bray calls them) who produce malformed XHTML; Firebird protects us from those bozos admirably. The application/xhtml+xml MIME-type serves as a little flag declaring, “I am not a bozo! Parse me with Expat!” The system works. Not to say it hasn’t had side effects. For example, it’s created a generation of designers who slap XHTML DOCTYPES on the tops of their pages and then smugly lecture everyone in earshot about “adherence to web standards.” But I think we can all live with second-order annoyances like that.
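
    To put the same point in code: the dispatch happens on the Content-Type, and the DOCTYPE never enters into it. (A toy sketch of mine; the two parse functions are hypothetical stand-ins, not WebCore or Gecko code.)

    ```python
    # The Content-Type header decides which parser a page gets.
    import xml.parsers.expat
    from html.parser import HTMLParser

    def parse_with_expat(body):
        p = xml.parsers.expat.ParserCreate()
        p.Parse(body, True)        # any well-formedness error is fatal
        return "XML document tree"

    def parse_as_tag_soup(body):
        HTMLParser().feed(body)    # recovers silently from almost anything
        return "HTML document tree"

    def parse_response(content_type, body):
        if content_type == "application/xhtml+xml":
            return parse_with_expat(body)   # "I am not a bozo! Parse me with Expat!"
        return parse_as_tag_soup(body)      # text/html: bozos welcome
    ```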

    Anyway, the situation is that people are proposing schemes to make browsers more strict, not less. There are any number of ways you could do this:

    1. Extreme Fascism: the proposal of the commenter I mentioned. Don’t display any web page that has any invalid markup, CSS, ECMAScript, et cetera. (Obviously this proposal amounts to a strawman, not to be taken seriously.)

    2. Moderate Fascism: A watered-down version of #1. Use the XML parser to render any page that declares an XHTML DOCTYPE.

    3. Helpful Fascism: display an obtrusive error to users whenever they encounter a malformed XHTML page. Thus creating social pressure to improve markup across the board.

    The reason I wrote this post is because I’m deathly worried that Dave has permanently latched on to #3. I’m hoping that he will come to his senses, or failing that, Apple’s UI team or Product Marketing team will do their job and kick this one to the curb. Actually, I had *meant* to write a different post, one that discusses what James is talking about above. I.e. the social problem, and the disconnect between the nature of the technology (XHTML’s strictness) and the goal it is trying to achieve (facilitating human-to-human communication, which *requires* error tolerance). But I wrote this post instead. I love Safari, and Dave is, like, seriously freaking me out.

  5. > Seems a lot of heated discussion of a format “nobody uses.”

    But if IE implements it, and implements it in a lenient way, then it will stop being a format that nobody uses and become just like HTML 4.

    There is actually a good reason that the IE team might decide to do this. Robert Scoble has stated that IE will not do anything that breaks backward compatibility. Ian Hickson has demonstrated that the IE engine doesn’t have the properties required for good CSS compliance. Therefore, if the IE team want good CSS compliance, they might have to switch rendering engines based on MIME type.

    >> The reason I wrote this post is because I’m deathly worried that Dave has permanently latched on to #3

    Evan, I don’t think that Safari will switch to XHTML mode based on anything other than the MIME type anytime soon.

    One reason for this is simply that it would break the web – Dave knows this as well as anyone. Another reason is slightly technical. By the time the doctype has been read, the browser has already selected a parser (to parse the doctype). In order to switch parsers at this stage, the browser has to either a) re-request the entire document from the server, and ignore any data from the original parse, b) cache the data well enough that it can be fed into the other parser, or c) have an architecture designed to deal with this problem. I know that, in the Mozilla architecture at least, switching parsers mid-parse is difficult to do. I suppose Safari is similar, if only because XHTML support is a relatively new feature.

    I suppose there is also the question of which behaviour is “correct”. A doctype is really only useful for validating a document, whereas the content-type should tell the browser how to interpret a document. So I might even go as far as to say the browser behaviour is right in the abstract sense as well as the practical one.
