Rakesh Pai: “The Economics of XHTML” (via Anne van Kesteren). Anne advises that when you read Rakesh’s article, you sub in “semantic HTML” for “XHTML”. That’s a good substitution, although I actually prefer “clean markup.” Making your markup more semantic is a good thing, up to a point. Once you cross a certain line, your mind begins an inevitable slide into Semantic Extremism, until eventually you’ve convinced yourself that everything should be a list item or some such nonsense.[1] But I digress.
There have been countless articles like Rakesh’s about how XHTML (er, clean markup) will save you big bucks. Honestly, I don’t fundamentally doubt the overall theory, but it disturbs me that none of these fine articles puts out hard numbers on how much money you’ll actually save in practice. The most concrete examples in the genre so far are the “redesign articles”, wherein the author picks a large site with crufty markup, redesigns the home page with clean markup, and performs a highly naive calculation of the bandwidth saved. The best article that I know of is Mike Davidson’s interview with DevEdge, and even that piece only provides a theoretical estimate.
So let’s all put on our Business Analyst hats and ask a few questions that might be pertinent for designing an actual case study. To be sure, thinking in BizDev does not come naturally to most folks, certainly not me.[2] So first, a short cleansing ritual, to prepare the mind for these alien thoughts:
Ph’nglui mglw’nafh Forbes R’lyeh wgah’nagl fhtagn! ROI! ROI! ROI! Aiiiiieeee!
Ah, there we go. Now, consider a largish commercial site:
- What are the actual bandwidth savings over a one-month period, factoring in caching, real-world resource request patterns, etc.? (A back-of-the-envelope sketch follows this list.)
- How much does a TB of bandwidth go for these days? How much will that same TB cost three years from now?
- How much developer time does it take to refactor a complicated CMS to produce clean markup?
- How much developer time does it take to clean up legacy content? Is this archived material accessed often enough to be worth cleaning?
- Are developers who have modern skills more expensive than old-skool <font>-happy developers? (I would think so.)
- What percentage of visitors use NN 4 or IE 4? Does the revenue lost from these visitors outweigh the overall bandwidth savings?
- How much does it cost to employ other techniques to speed up your site, such as enabling conditional gzip compression? Compared with a total redesign, which of these techniques is cheapest?
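To make the first question concrete, here is the sort of naive spreadsheet a real case study ought to replace. A minimal Python sketch: every constant in it is a made-up placeholder (pageviews, page weights, bandwidth and labor rates), because nobody has published the real values.

```python
# Back-of-the-envelope ROI sketch for a markup cleanup.
# Every constant below is a hypothetical placeholder, not real data.

PAGEVIEWS_PER_MONTH = 50_000_000  # hypothetical largish commercial site
AVG_PAGE_KB_BEFORE = 60           # crufty markup, uncompressed
AVG_PAGE_KB_AFTER = 35            # clean markup, uncompressed
CACHE_HIT_RATE = 0.40             # fraction of requests never hitting origin
COST_PER_GB = 0.50                # dollars; plug in your carrier's real rate
REFACTOR_HOURS = 2_000            # CMS templates plus legacy content cleanup
DEVELOPER_RATE = 75               # dollars per hour

saved_kb = ((AVG_PAGE_KB_BEFORE - AVG_PAGE_KB_AFTER)
            * PAGEVIEWS_PER_MONTH * (1 - CACHE_HIT_RATE))
saved_gb = saved_kb / (1024 * 1024)
monthly_savings = saved_gb * COST_PER_GB
refactor_cost = REFACTOR_HOURS * DEVELOPER_RATE

print(f"Bandwidth saved: {saved_gb:,.0f} GB/month")
print(f"Monthly savings: ${monthly_savings:,.2f}")
print(f"Refactor cost:   ${refactor_cost:,.2f}")
print(f"Payback period:  {refactor_cost / monthly_savings:,.1f} months")
```

With these placeholders the payback period comes out in decades; swap in real numbers and you have the case study I’m asking for.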
I don’t have the answers to these questions. But I do suspect that any web design shops that can answer these questions (with non-foofy numbers) basically have a license to print money.
1. If we all lived in Web Designer City, a metropolis bustling with architects and bricklayers, professors and artists, hustlers and street vendors, you would be the guy staggering down the street, muttering to himself.
2. Business persons tend to ask questions that either A) make no sense or B) are so hard that any response you get back is almost certainly a lie. Or if we’re feeling charitable, a “Wild Ass Guess.”
How much does it save in bandwidth?
That question got put to Tantek from Technorati at Hypertext 04 last month.
He said he couldn’t quote a number, but it was significant.
Also, Technorati’s already consuming well-formed content (or content that can be repaired quickly), so they didn’t have a huge base of legacy static HTML to fix.
I think I hurt myself trying to pronounce the words for the cleansing ritual.
Bill – That’s interesting, thanks. I am sure the savings are significant over time. But again, we really need some concrete numbers, because the cost of refactoring a large site is also significant, and there may be other, cheaper ways to achieve the same savings. (Of course, I understand why these sites are reluctant to provide these numbers; I’m not blaming Tantek.)
Also, a site with a large base of legacy content is exactly what I’m thinking of. We don’t have to take this to mean Amazon or eBay… I’m thinking NewEgg.com or someone like that.
Mike – That’s okay if you can’t perform the ritual. It’s pretty dangerous, you could end up permanently in BizDev mode…
The purported bandwidth savings of clean markup is a lot of malarkey. If you are using mod_gzip/mod_deflate, you’ll find the difference to be negligible. Text compresses really well (by a factor of 4 or 5), and “bloated” old-skool markup compresses (slightly) better than “clean” markup.

If you’re not using mod_gzip/mod_deflate, then you are not actually serious about saving bandwidth. (Hint: if you want to know why Tantek doesn’t quote any numbers, examine the HTTP headers from technorati.)
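The compression claim is easy to check for yourself. A toy Python sketch, with two contrived sample documents standing in for real crufty and clean pages (expect less dramatic ratios on real markup, which repeats itself less):

```python
# Gzip a <font>-era sample and a clean-markup sample and compare sizes.
# The two sample documents are contrived stand-ins; try real saved pages.
import gzip

crufty = "".join(
    f'<font face="Arial" size="2" color="#333333"><b>Item {i}</b></font><br>\n'
    for i in range(500))
clean = "".join(f"<li>Item {i}</li>\n" for i in range(500))

for label, html in (("crufty", crufty), ("clean", clean)):
    raw = html.encode("utf-8")
    zipped = gzip.compress(raw, compresslevel=6)
    print(f"{label:6s} raw={len(raw):7d} gzip={len(zipped):6d} "
          f"ratio={len(raw) / len(zipped):.1f}x")
```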
“Text compresses really well (by a factor of 4 or 5)…”

When I read Rakesh’s article, that was exactly the first thing that popped into my mind. You have managed to uncover the real motivation for this piece. 🙂
If you’re creating a new site from scratch, clean markup is a no-brainer, because the cost is pretty close to zero. But if you have a large, well-established site with crufty markup, it’s not a no-brainer. Maybe the most efficient way to improve the user experience is to enable gzip compression. (For certain sites, gzip compression can lead to a serious tradeoff in CPU usage — something my company is testing right now in our product.) Or… maybe the “best” approach for improving user performance is to just cut Akamai a big check.
In short, Your Mileage May Vary. That’s why I’d like to see a few real case studies with real numbers. Clean markup is a good solid hammer in your toolbox, but not all problems are nails.
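The CPU tradeoff, at least, is measurable without a total redesign. A quick Python sketch that times gzip at several compression levels; the page body here is filler text, so substitute a real saved page for numbers that mean anything:

```python
# Time gzip compression of a page at different compression levels to get
# a feel for the CPU cost per request. The page body is filler text.
import gzip
import time

page = ("<div class='row'><span>cell</span></div>\n" * 20_000).encode("utf-8")

for level in (1, 6, 9):
    start = time.perf_counter()
    for _ in range(100):  # simulate 100 requests
        zipped = gzip.compress(page, compresslevel=level)
    elapsed = time.perf_counter() - start
    per_request_ms = elapsed / 100 * 1000
    print(f"level {level}: {len(zipped):7d} bytes, "
          f"{per_request_ms:.2f} ms per request")
```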
ingoBay!
The reason why I haven’t gone HTML-minimal like Anne van Kesteren has on some sites is that gzipping makes the savings from dropping the head, closing tags, and angle brackets negligible.
We’ve monitored client sites and get close to 98% coming down gzipped…
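For anyone who wants to run the same spot check, a small Python sketch that requests each page with gzip enabled and counts how many responses actually come down compressed. The URL list is a placeholder; feed it your own pages:

```python
# Request each URL with "Accept-Encoding: gzip" and count how many
# responses actually arrive gzipped. The URL list is a placeholder.
import urllib.request

urls = ["http://www.example.com/"]  # substitute your own page list

gzipped = 0
for url in urls:
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        if resp.headers.get("Content-Encoding") == "gzip":
            gzipped += 1

print(f"{gzipped}/{len(urls)} responses gzipped "
      f"({100 * gzipped / len(urls):.0f}%)")
```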
Mike, you actually thought I was doing that to save bandwidth? That was just a little joke, actually, and it also showed that all the “XHTML saves bandwidth” people were wrong.
I’m using it on sites because it makes the source look clean, nothing more.
Anne’s web pages are not so much designed around bandwidth savings, they are more like minimalist works of art.
The first time I saw the source of one of his minimalist pages, I had to go running back to the spec. “Ummm, can he do that?” Yup, I think he can…
“Anne’s web pages are not so much designed around bandwidth savings, they are more like minimalist works of art.”
The effect was kind of ruined for me because Markdown (which he uses for the content of his latest minimalist creation) generates closing </p> and </li> tags, even in its HTML 4 generation mode. The spare neo-Bauhaus lines are marred by this bit of code bloat.

The site would be much more beautiful were he to tinker with the Markdown code to fix that.
Oh, what a quandary. Do we prefer Old Anne, whose perfectly strict minimalism lent an almost Zen-like quality to his markup, or do we prefer New Anne, who uses his closing tags to enlighten us to the importance of symmetry in every aspect of the cosmos?
So many different aesthetic points of view to consider!
Heh, I’m going to fix that part of Markdown some day. I entirely agree that it is suboptimal 🙂
I managed to do some (dirty) clean-up on that site. The source code now looks lovely, imho.