Helpful Fascism

Quite a storm of activity going on over at Dave Hyatt’s developer weblog.

Dave is clearly annoyed with the difficulty of parsing broken HTML. It’s an extremely hard problem to solve, and not very well-defined. It must be even more frustrating to have to be judged against the quirky behavior of the dominant browser. And Dave is exactly right about Safari’s XML parser: it must be “Draconian”. Otherwise, by definition it’s not an XML parser. (The more interesting question is whether one should actually use an XML parser to parse XHTML.)
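
For those who haven’t had the pleasure, here’s roughly what “Draconian” means in practice. The sketch below uses Java’s standard SAX parser, so it illustrates the rule from the XML 1.0 spec, not Safari’s actual code, and the class name is my own invention. A conforming XML parser must halt on the first well-formedness error; it is not allowed to guess what you meant.

    import java.io.StringReader;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    public class DraconianDemo {
        public static void main(String[] args) throws Exception {
            // An unclosed <p> tag: forgivable in tag-soup HTML, fatal in XML.
            String notQuiteXhtml = "<html><body><p>Hello, world</body></html>";

            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            try {
                parser.parse(new InputSource(new StringReader(notQuiteXhtml)),
                             new DefaultHandler());
                System.out.println("Well-formed. Carry on.");
            } catch (SAXParseException e) {
                // A conforming parser must stop here. No error recovery allowed.
                System.out.println("Fatal error at line " + e.getLineNumber()
                        + ": " + e.getMessage());
            }
        }
    }

Compare that to an HTML parser, which is expected to swallow the same markup without complaint and quietly patch up the damage.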

All of this kerfuffle has generated various proposals for solving the problem of broken XHTML. Presumably the web would be a better place if all XHTML files were well-formed and therefore parseable with a standard XML parser. But the reality is that about 75% of XHTML home pages are invalid, and over 90% of XHTML sites have at least one invalid page.[1] So how do we improve the situation?

One commenter on Dave’s weblog said that he dreamed about a “fascist” browser, one that would refuse to display any page with any errors. Unfortunately, a recent weblogs.mozillazine.org server error swallowed that comment forever. It seems that the Great Spirits of the Internet have spoken. Fascist browser: bad idea. Let us never speak of it again.

A friendlier variation on this proposal is to design a browser that still displays a malformed XHTML page, but provides some sort of obtrusive error message. The basic idea is that the users will then know about the errors, the website will receive a flood of complaints, the designers will fix the page, and XHTML quality will improve. As Dave puts it:

“Many people suggested that there be a built-in validator in the browser that could show the errors to the developer. The validators basically break down into two types: obtrusive validators and unobtrusive validators.

“If the validator is unobtrusive, then I would argue that it won’t receive sufficient usage to make a difference. If the browser doesn’t impose a penalty of some kind, then there will be no incentive for the author to correct mistakes.”

Never before has the gulf between developer and end user been more stark.

I’m hoping this is simply the result of sheer frustration on Dave’s part. That at some level, he realizes how quickly this “feature” will destroy Safari. In a sense, it’s kind of reassuring. Proves he’s human.[2]

Unfortunately, I don’t think any browser could survive pulling such a stunt. Maybe Internet Explorer could… after all, IE users have been trained for years to ignore annoying technical messages, and most of them don’t know how to change their browser anyway. It would be a close one, at least. Actually, here’s a clever evil plan. If we could somehow convince the IE development team that the “helpful fascist” browser was a good idea, we’d be in a win-win situation. Because either:

  • XHTML quality would magically improve all over the web, OR
  • Firebird and Opera would capture 50% of the Windows browser market.

Oops, did I say that last part out loud? Damn. Being an evil mastermind is harder than I thought.

1. Note that the XHTML 100 does not distinguish between well-formedness errors and validity errors. However, I can assure you that very few sites were invalid but well-formed. Most of the failed sites generated a torrent of errors of all kinds.

2. Although we really ought to run Dave through the Voight-Kampff Empathy Test, just to be sure.

Happy, Happy Machines

Mark Pilgrim: “Thought Experiment”. It’s not a thought experiment, actually. It’s Jacques Distler’s reality.

Here’s a short story for you all. My company makes a large suite of J2EE software, mostly for banks and insurance companies. Our software uses XML all over the freaking place. We use XML for our configuration files, XML to communicate from our machines to our customers’ machines, and so on. Machines happily consuming XML from other machines. It works just fine, thanks.

One example of where we are using XML is in our workflow engine. A workflow contains tasks, chained together in a kind of graph. If you change the shape of the graph or the nature of the tasks, you change the work that the end users (the customer service representatives) have to do. Tasks and workflows can be in various states, they can connect to each other in certain ways, they can branch due to logical conditions, they can contain timing information, they can be routed to specific users or groups… and on and on it goes. As you might expect, the XML that represents a workflow is rather baroque.
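
To give you a flavor, here is a completely made-up sketch of what such a document might look like. None of these element or attribute names come from our actual format (which I am obviously not going to publish here); the point is just how many moving parts even a tiny two-or-three-task workflow accumulates.

    <workflow name="claims-intake" state="active">
      <task id="t1" type="manual" state="ready" assignedTo="group:csr-tier1">
        <deadline hours="48"/>
        <transition to="t2" condition="claim.amount &lt; 10000"/>
        <transition to="t3" condition="claim.amount &gt;= 10000"/>
      </task>
      <task id="t2" type="automated" state="pending" handler="AutoApprove"/>
      <task id="t3" type="manual" state="pending" assignedTo="user:senior-adjuster"/>
    </workflow>

Now multiply that by a few dozen tasks, add nested sub-workflows and escalation rules, and “rather baroque” starts to look like an understatement.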

Fortunately, we provide a nice GUI that allows you to quickly create tasks and assemble them into a workflow. Like the workflow engine, the GUI can read our proprietary workflow XML format. But instead of running the workflow, the GUI merely displays it in graphical form. Anything you can do by editing XML files, you can do by using the GUI. Huzzah!

Unfortunately, one of my newly adopted manuals both A) lists every element and attribute of our workflow XML format and B) describes how to use the XML in detail, with many code examples. By placing this information in our public documentation, we have accidentally encouraged our customer engineers to muck with the workflow XML format by hand. This has caused exactly the sort of problems you would expect. And this is why I spent a good chunk of last week stripping out all of this information from our public documentation. The information will partly move into the schema, partly be reserved for internal documentation. If our customers need this information, they can get it. But we won’t be broadcasting the message, “Look! It’s XML! Open it in vi and have at it!” quite so loudly.

The more I have to deal with XML, the more convinced I grow that XML is for machine-to-machine communication only.

Horse of a Different Colour

Intrepid J2EE nerd Charles Miller is annoyed with Apple’s USA-centrism, at least when it comes to spelling.[1] For the record, I’ve worked for three U.S. companies that had writers in the UK… and I have to say that I have always taken great pleasure in pointing out to my colleagues across the pond that in this company we’re standardized on U.S. English, and by the way, that’s “standardized”, not “standardised”…

Well, of course I’m kidding. I’m actually very nice when I’m editing.

No, I’m not.

On a related matter, I’ve always wondered about the Anglo-centrism of computer languages. Consider the case of a non-English-speaking developer who’s starting to learn Java. The reserved words (“if”, “else”, “for”, “this”, …) are in English, which almost certainly results in annoying overhead. To make matters worse for our developer, all of the standard packages (and most third-party packages) are in English too. If you speak English, you can often guess what a Java method call does — for example, HashMap.clear() probably, err, clears a hashmap. But if your sole language were French or Korean, you wouldn’t know what “clear” or “hashmap” meant unless you had run across those words before (perhaps earlier in your career). In any case, your learning curve for Java or any other high-level language[2] would be steeper than a native English speaker’s. And it would be even worse if you didn’t know the character set. Imagine, as an English speaker, having to learn to code using Arabic script or Japanese kanji. What a pain that would be.

Of course, there’s no reason that you couldn’t have a development environment that allows you to code in your native language, and then automagically transforms the source into the associated English source code. That should be pretty straightforward for the basic language keywords and any standard libraries, anyway. I wonder if such a feature exists? Hmmm.
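
Just to convince myself it’s not crazy, here’s a back-of-the-envelope sketch of the keyword half of that transformation. The French-to-Java keyword table is my own invention, and a real tool would also have to leave string literals and comments alone and cope with translated library names, which is where things would get hairy.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class MotsClefs {
        // A hypothetical French-to-Java keyword table. A real tool would need
        // the entire reserved-word list, plus mappings for the standard libraries.
        private static final Map<String, String> KEYWORDS =
                new LinkedHashMap<String, String>();
        static {
            KEYWORDS.put("si", "if");
            KEYWORDS.put("sinon", "else");
            KEYWORDS.put("pour", "for");
            KEYWORDS.put("tantque", "while");
            KEYWORDS.put("retourner", "return");
            KEYWORDS.put("classe", "class");
        }

        // Naive word-for-word substitution. It cheerfully ignores string
        // literals and comments, which a real IDE feature could not.
        public static String toEnglishSource(String nativeSource) {
            String result = nativeSource;
            for (Map.Entry<String, String> e : KEYWORDS.entrySet()) {
                result = result.replaceAll("\\b" + e.getKey() + "\\b", e.getValue());
            }
            return result;
        }

        public static void main(String[] args) {
            String french = "si (x > 0) { retourner x; } sinon { retourner -x; }";
            // Prints: if (x > 0) { return x; } else { return -x; }
            System.out.println(toEnglishSource(french));
        }
    }

The keyword substitution really is the easy part; the standard library is thousands of English identifiers, and I suspect that’s the part nobody has bothered to translate.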

1. Having recently reinstalled my PowerBook’s operating system, I can also state for the record that Apple clearly favors Swedes over Norwegians. The Swedish localization files install before the Norwegian files, in blatant disregard for alphabetical order.

2. Except for UNIX shell scripting, which is gibberish in any language.