Make Money Fast with Clean Markup

Rakesh Pai: “The Economics of XHTML” (via Anne van Kesteren). Anne advises that when you read Rakesh’s article, you sub in “semantic HTML” for “XHTML”. That’s a good substitution, although I actually prefer “clean markup.” Making your markup more semantic is a good thing, up to a point. Once you cross a certain line, your mind begins an inevitable slide into Semantic Extremism, until eventually you’ve convinced yourself that everything should be a list item or some such nonsense.[1] But I digress.

There have been countless articles like Rakesh’s about how clean markup will save you big bucks. Honestly, I don’t fundamentally doubt the overall theory, but it disturbs me that none of these fine articles puts out hard numbers on how much money you’ll actually save in practice. The most concrete examples in the genre so far are the “redesign articles”, wherein the author picks a large site with crufty markup, redesigns the home page with clean markup, and performs a highly naive calculation of the bandwidth saved. The best article that I know of is Mike Davidson’s interview with DevEdge, and even that piece only provides a theoretical estimate.

So let’s all put on our Business Analyst hats and ask a few questions that might be pertinent for designing an actual case study. To be sure, thinking in BizDev mode does not come naturally to most folks, and certainly not to me.[2] So first, a short cleansing ritual, to prepare the mind for these alien thoughts:

Ph’nglui mglw’nafh Forbes R’lyeh wgah’nagl fhtagn! ROI! ROI! ROI! Aiiiiieeee!

Ah, there we go. Now, consider a largish commercial site:

  • What are the actual bandwidth savings over a one-month period, factoring in caching, real-world resource request patterns, etc.?

  • How much does a TB of bandwidth go for these days? How much will that same TB cost three years from now?

  • How much developer time does it take to refactor a complicated CMS to produce clean markup?

  • How much developer time does it take to clean up legacy content? Is this archived material accessed often enough to be worth cleaning?

  • Are developers who have modern skills more expensive than old-skool <font>-happy developers? (I would think so.)

  • What percentage of visitors use NN 4 or IE 4? Does the revenue lost from these visitors outweigh the overall bandwidth savings?

  • How much does it cost to employ other techniques to speed up your site, such as enabling conditional gzip compression? Compared with a total redesign, which of these techniques are the cheapest?

I don’t have the answers to these questions. But I do suspect that any web design shops that can answer these questions (with non-foofy numbers) basically have a license to print money.
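Just to show the flavor of arithmetic I’m fishing for, here’s a toy version of the bandwidth question in TypeScript. Every single number below is invented; this is a sketch of the calculation, not a case study.

```typescript
// Toy back-of-envelope calculation of markup-cleanup savings.
// Every number here is invented purely for illustration; plug in your own
// site's measurements if you want something non-foofy.

const pageViewsPerMonth = 10_000_000; // hypothetical traffic
const oldPageSizeKB = 90;             // crufty font-tag-and-table markup
const newPageSizeKB = 40;             // clean markup plus external CSS
const cacheMissRate = 0.6;            // fraction of views that actually re-fetch the HTML
const dollarsPerGB = 1.0;             // hypothetical bandwidth price

const savedKB = (oldPageSizeKB - newPageSizeKB) * pageViewsPerMonth * cacheMissRate;
const savedGB = savedKB / (1024 * 1024);

console.log(`Saved roughly ${savedGB.toFixed(0)} GB/month, or $${(savedGB * dollarsPerGB).toFixed(0)}/month`);
// Prints "Saved roughly 286 GB/month, or $286/month" with these invented numbers.
```

And notice what never shows up in a naive calculation like this one: developer time, legacy content cleanup, lost NN 4 visitors, or gzip as a cheaper alternative. Which is exactly the problem.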

1. If we all lived in Web Designer City, a metropolis bustling with architects and bricklayers, professors and artists, hustlers and street vendors, you would be the guy staggering down the street, muttering to himself.

2. Business persons tend to ask questions that either A) make no sense or B) are so hard that any response you get back is almost certainly a lie. Or if we’re feeling charitable, a “Wild Ass Guess.”

Quick Hits: The Muggles of Physics

Or, my official “this journal is not yet moribund” post. Err, well, you be the judge.

  • The U.S. Energy Dept. is taking another look at Cold Fusion [via Slashdot].

    Ah, cold fusion. The field of inquiry that is predicated on the belief that chemical reactions of ~5 eV can affect the threshold energies of reactions that require 50,000 eV or more. Of course, this being Slashdot, it wasn’t too hard to find a conspiracy theorist or two modded up. Actually, if you want a better conspiracy theory, physicist Chad Orzel has one for you: if the DOE is thinking about funding cold fusion research, that enables the administration to say that they are “researching alternative energy sources” without, actually, like, researching any alternative energy sources. I wouldn’t listen to Orzel though; as he himself admits, he’s just a nutbar conspiracy theorist.

  • Well, forget about cold fusion. If you’re jonesin’ for some real physics (and who isn’t?), you need look no further than Britney Spears’s Guide to Semiconductor Physics.

    “It is a little known fact that Ms Spears is an expert in semiconductor physics. Not content with just singing and acting, in the following pages, she will guide you in the fundamentals of the vital laser components that have made it possible to hear her super music in a digital format.”

    It actually looks to be a pretty informative introduction to semiconductor physics,[1] although if you’re a high school or college student looking for term paper material, note that it is probably not a good idea to list this reference explicitly in your bibliography. Just lie and say it came from the IEEE. You should also probably avoid cribbing this illustration of the conduction and valence bands.

  • And speaking of the IEEE, they have another article that takes a look at music encoding algorithms [also via Slashdot]. “At its heart, the MP3 format uses an algorithm that takes the data contained in CD music relating loudness to specific points in time and transforms it instead into data relating loudness to specific frequencies.” When I first read this, I thought, “This is the IEEE and they can’t bring themselves to say ‘Fourier Transform’?” Then I started googling, and discovered you don’t necessarily have to use FFT to do the encoding. You can come up with whatever algorithm you like, and you can even charge people for it if you like. How ’bout that, you learn something new on the Intarweb every day.

    Now, if I were going to write an MP3 encoder, I would use a Laplace Transform. If you think that’s perverse, I apologize… I can’t help it, it’s the way I was brought up. (For a toy illustration of the basic time-to-frequency trick, see the sketch after this list.)

  • I saw a bumper sticker yesterday, “Bush is a Muggle.” At first I thought, well, of course he’s a Muggle, we’re all Muggles in the strict sense of the definition. Then I thought, maybe that was the bumper sticker’s point? Maybe it’s a very subtle pro-Bush bumper sticker?

    1. Bush is a Muggle.
    2. I’m a Muggle, you’re a Muggle.
    3. We’re all just happy Muggles together. Revel in our common Muggle-osity!

    Then I thought, I’m thinking a little too hard about this.
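Back to the MP3 item for a second. To make the “loudness over time becomes loudness over frequency” idea concrete, here is a toy sketch of a plain discrete Fourier transform. To be clear, this is my own illustration, not anything from the IEEE article, and (as I understand it) real MP3 encoders use filter banks and an MDCT plus psychoacoustic trickery rather than a bare DFT.

```typescript
// Naive discrete Fourier transform: loudness-vs-time in, loudness-vs-frequency out.
// This is NOT how a real MP3 encoder works; it is only a toy version of the
// basic transform idea, and it runs in O(n^2) time to boot.

function dftMagnitudes(samples: number[]): number[] {
  const n = samples.length;
  const mags: number[] = [];
  for (let k = 0; k < n; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im += samples[t] * Math.sin(angle);
    }
    mags.push(Math.sqrt(re * re + im * im));
  }
  return mags;
}

// A pure sine wave that completes 4 cycles over a 64-sample window:
const tone = Array.from({ length: 64 }, (_, t) => Math.sin((2 * Math.PI * 4 * t) / 64));
console.log(dftMagnitudes(tone).map((m) => m.toFixed(1)).join(" "));
// Nearly all the energy lands in bins 4 and 60 (its mirror image):
// wiggles in time become a couple of spikes in frequency.
```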

Time to go make the mint syrup for the mojitos for today’s barbeque. It’s a quadruple batch. Yum!

1. Not only informative, but entertaining as well:

Note that in this technical region [temperature range] if the counter doping is negligible, Na << Nd or Nd << Na, (35) and (37) simplify to

n = Nd (39)

p = Na (40)

which is what we tell the engineers.

Information Loss

A quick exercise:

  1. Think of an area of knowledge where you have acknowledged, real-world expertise.

  2. Think of the last journalism piece you encountered that touched on that area of knowledge.

  3. How accurate was that piece?

I don’t know about you, but more often than not, I dread reading mainstream articles on fields that I know a little something about. Take physics. Okay, yes, I expect to read piles of dreck in my old field of nanomechanics, what with the active campaign to spread dreck and all. But even discounting nanomechanics, there’s no shortage of dreck in other areas of physics.

Case in point: Brad DeLong recently picked apart an article by TNR columnist Gregg Easterbrook on the Stephen Hawking black hole information loss bet. Easterbrook not only attacks physicists as mumbo-jumbo-spouting medieval priests, but also manages to make an appalling number of scientific errors. I’ve actually liked reading Easterbrook in the past, but now I’m wishing I had taken his writing with a much larger grain of salt. If you think modern physics is worth snarking over because The Physics of the Very Large and The Physics of the Very Small do not match our common-sense intuitions about The Physics of Tables and Chairs — well, as DeLong points out, you’re about 300 years too late to that party.

No doubt one of the main reasons The New Republic published Easterbrook’s article is because it behooves them to take a generally contrarian view. And let’s face it, the mainstream take on the Hawking story was pretty darn boring. Cute, gnomish High Priest of Physics pronounces to his fellow white-haired, gnomish physicists that he has lost an old bet about — something wacky, something to do with black holes. Gnomish men scurry off to check their leader’s calculations, muttering that they don’t quite understand what he’s talking about. Cricket and baseball are somehow involved. Those darned physicists! The End.

The sad part is that there really was a non-boring version of the story; namely Jacques’s take, where we learn that A) mainstream theoretical physics solved this problem quite some time ago, and B) Hawking’s concession argument is rather strange and incomplete to say the least! Unfortunately, Jacques wrote his article for people who have at least a passing acquaintance with Anti de Sitter / Conformal Field Theory, a group that probably excludes you and definitely excludes me. Still, there is in fact a real story there.[1]

If only The New Republic had thought to hire a genuine physicist to write about physics, the way Slate has thought to hire a genuine Wall Street scoundrel to write about Wall Street shenanigans. Oh, well.

1. And the good news is, it can’t possibly be lost! We think.

Linkdump: Games People Play

Well, it’s been a couple of weeks since my last post. What have I missed?

  • Bulwer-Lytton 2004 is out. Get it while it’s hot.

  • Anne van Kesteren is back from vacation, and he is on fire. What’s the deal with XHTML? Does XHTML really save bandwidth over HTML? Day by day, bit by little bit, we all edge closer to markup sanity.

  • Meanwhile, Jacques is back too, and he seems rather underwhelmed by all the hype over Stephen Hawking’s now-famous black hole information loss wager. After all, as Jacques reminds us, “Anyone who hasn’t been asleep for the past 6 years knows that quantum gravity in asymptotically anti-de Sitter space has unitary time evolution.” Actually, what I find even more interesting is the fact that Jacques’s post is titled, “No Information Lost Here!”, and is sitting at the URL http://golem.ph.utexas.edu/~distler/blog/archives/000404.html. Coincidence?? I think no– oh, heck, it’s probably a coincidence.

  • The World of Warcraft Beta developers are hard at work, furiously redesigning the in-game auction houses. Seems like they’re spending a lot of time on this, particularly since someone else has already done most of the design grunt work for them.

  • In other MMORPG news, City of Heroes has implemented capes. What I really like about this is that they tried to fold this into the story. It’s not that the developers didn’t quite get capes working in time for the release — no, no, no, it’s because all the heroes had been in mourning over one of their fallen comrades. Nicely done! Although come to think of it, why didn’t Sony ever try this with EverQuest? For example, rangers sucked for the first three-and-a-half years of the game not because of a development problem, but because they were all in mourning. They were all holding back, see?

  • Well, forget all these fancy-schmantzy MMORPGs. I’m holding out for Peasant’s Quest.

  • Finally, via Russ, I found out that fellow ’97 HMC alum Joe Beda is a development lead on Microsoft’s Avalon team. Right on, Joe! For the record, I’m not even a little bit jealous of Joe’s incredibly important and prestigious job. Although that’s probably because I can take comfort in the fact that I still have all my hair.

I Need a New Hobby

It’s been interesting to see the back-and-forth discussion between CSS guru Eric Meyer and lead Safari developer Dave Hyatt on Apple’s proposed Dashboard extensions to HTML. At first Eric nearly hit the boiling point, but he is now working constructively with Dave to help him extend HTML in as safe a manner as possible. I’m glad they’re working together on this, because this is pretty important to get right. HTML could certainly use enhancement, but we can’t afford to implement these enhancements and in so doing FUBAR validation entirely.

The really exciting thing about Dashboard is that Apple clearly intends Dashboard widgets to be as easy to write as possible. This is one of the main reasons that they’re targeting HTML, as opposed to XHTML exclusively. Dave points out:

“First, it was suggested that the widgets be written in XML rather than HTML and that all of the new tags and attributes be namespaced. However, this would have dramatically increased the complexity of crafting Dashboard widgets. People know how to write HTML, but most of those same people have never written an XML file, and namespaces are a point of confusion.”

This has of course drawn out legions of Markup Experts to snigger that XHTML isn’t so hard, Apple should do things properly in XHTML, Apple is just being “lazy”, how dare they muck with rotten old legacy HTML, any developer worth their salt can write XML, et cetera. For amusement, I bounced around the web this morning checking out these arguments. I validated five pages in a row. One hundred percent were serving up their arguments as non-well-formed XHTML.

Replication of this experiment is left as an exercise for the reader. In the meantime, I’m thinking of taking up watercolor painting.


Atom IDs: What’s Wrong With Domain + Timestamp?

For a long time I had been mostly indifferent to the nascent Atom syndication format. True, Atom was taking the right approach in formalizing its specification and developing comprehensive test suites. But I was still thinking, why bother adding an Atom feed when I already had perfectly fine RSS feeds? In other words, “What does Atom add that RSS can’t already do?”

That was my mindset until a few weeks ago, when Mark Pilgrim delivered a much-needed sharp blow to the kidneys of that whole idea. That’s when I got religion. Of course, I really should have gotten religion well before that, but I’m a slow learner.

So I was all rarin’ to go, ready to put up an Atom feed, when another post by Mark brought me up short. In his instructions on “How to make a good ID in Atom”, Mark advised against using your HTTP permalink URI as an ID, recommending Tag URIs (yuck!) instead. Then Tim Bray chimed in, saying that a permalink URI as an Atom ID is probably fine, but if you don’t want to use that, you can construct a NewsML URN (yuck again!). Not to be outdone, Bill de Hora said, “Never mind the URI.” Permalinks — heck, domains themselves — are transient, so… use Tag URIs. But wait, Tag URIs (at least the way Mark constructs them) contain domains! Pretty soon my brain started to hurt, and I decided to back off and think about this later. Then things got busy at work, time passed, life went on. I probably would have forgotten about the whole thing, except a few days ago I noticed a task in iCal: “Make Atom feed.” Stupid iCal.

So it’s back to thinking about Atom IDs. Each entry in an Atom feed must have a globally unique, unchanging ID that is also a valid URI. The idea is that Atom applications should be able to identify feed entries uniquely, even if your entries get resyndicated or published in several places at once. However we choose to generate our Atom IDs, we are duty-bound to ensure that they don’t change after creation and that we don’t accidentally clobber someone else’s Atom IDs.

Okay, so what can we use that’s unique and unchanging?

  • My domain of goer.org is unique — I own it, and no one else who’s publishing feeds should be using it. (Bill would say that I’m just renting my domain, but as we’ll see, that doesn’t really matter.)

  • The creation timestamp of each entry is unchanging.[1]

Together, the two form a globally unique, unchanging event. For example, on May 10, 2004 at 9:56pm and 30 seconds PST, the site goer.org posted the entry, “Supercharge Your Outlook Performance!” Now, the domain goer.org is not unchanging — I could easily lose it. The creation timestamp is not unique — there’s an excellent chance someone posted something on May 10, 2004 at 9:56pm and 30 seconds PST. But put the two together, and we have the makings of a simple but robust Atom ID:

https://www.goer.org/2004/05/10/21/56/30/PST/

Note that this is not the permalink to that entry. The permalink is https://www.goer.org/2004/May/index.html#10. The Atom ID takes the domain and appends the timestamp components, separated by forward slashes. I used forward slashes because I know that forward slashes can be used to make valid HTTP URIs. (Tag URIs and NewsML URNs seem weird and scary to me, so I decided to stick with familiar HTTP URIs.) Thus, the Atom ID is a valid HTTP URI, even though it doesn’t point directly to an HTTP resource. Based on my cursory reading on HTTP URIs, I think this is okay. (If I’m wrong about that — if it is incorrect to design an HTTP URI that does not point directly to an HTTP resource — let me know.)

For another example, consider my HTML tutorial. The permalink for the entry on classes and IDs is https://www.goer.org/HTML/intermediate/classes_and_ids/. However, the creation timestamp was August 21, 2002 at 12:34pm and 03 seconds PST. Thus, the corresponding Atom ID would be https://www.goer.org/2002/08/21/12/34/03/PST/.
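For the morbidly curious, here’s a quick sketch of the scheme in TypeScript. The helper name and the zero-padding are my own choices, and it naively assumes that the machine’s local clock matches the site’s posting timezone; the only hard requirements are that the result is a valid URI and that it never changes.

```typescript
// Sketch of the domain-plus-creation-timestamp Atom ID scheme described above.
// Helper name and padding are arbitrary; what matters is a valid, permanent URI.

function atomId(domain: string, created: Date, tzLabel: string): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const parts = [
    created.getFullYear(),
    pad(created.getMonth() + 1),
    pad(created.getDate()),
    pad(created.getHours()),
    pad(created.getMinutes()),
    pad(created.getSeconds()),
    tzLabel,
  ];
  return `https://www.${domain}/${parts.join("/")}/`;
}

// May 10, 2004, 9:56:30pm local time:
console.log(atomId("goer.org", new Date(2004, 4, 10, 21, 56, 30), "PST"));
// -> https://www.goer.org/2004/05/10/21/56/30/PST/
```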

I’m trying to think of scenarios where this scheme would fail. It doesn’t matter if my permalinks change. The domain and creation timestamp are still the same. Nor does it matter if I lose my domain name. I couldn’t produce future entries with the same domain, but the entries I already produced would still be valid, and I could certainly produce new entries with a new domain name.

For example, let’s say that on January 1, 2005, I’m getting married. Because I’m such a forward-thinking, progressive guy, I’m not asking my wife to change her last name — in fact, I’ve decided to change my last name to hers. Henceforth I am to be known as “Evan Goer Nahasapeemapetilon.” And to really drive the point home to friends and family, I’m dumping goer.org for a more appropriate domain name. So now to the really important question: what does that do to my Atom entries? Well, all posts before January 1, 2005 would have the form https://www.goer.org/YYYY/MM/DD/HH/MM/SS/PST, while all posts after January 1, 2005 would have the form http://nahasapeemapetilon.net/YYYY/MM/DD/HH/MM/SS/PST. Old entry IDs are unaffected, and all IDs are still unique.

Okay, but what if someone else takes over goer.org? In order to have a collision, the following would have to happen:

  1. The new owner decides to publish an Atom feed, AND
  2. They happen to use the exact same format I do, with the exact same separators, AND
  3. They decide to publish entries that were created in the past, before they owned goer.org, AND
  4. Some of those entries have creation timestamps equal to some of mine, down to the second, AND
  5. The old entries have never been published as Atom entries before, because otherwise they would already have an Atom ID, and Atom IDs are unchanging, remember?

The other scenario I thought of is, what about a gigantic site that produces many thousands of entries a day? In that case, we do start to have a non-trivial chance of having two entries with the same creation timestamp. However, the Reuters and eBays of this world can certainly afford to generate some sort of extra identifier to ensure uniqueness. Fortunately, the average weblogger wouldn’t need to add this extra machinery.
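For what it’s worth, that “extra identifier” could be as dumb as a counter appended to any IDs that collide within the same second. Again, just a sketch, and the trailing-number suffix is entirely my own invention:

```typescript
// Disambiguate entries that share a creation second by appending a counter.
// The "/2/" style suffix is my own invention, not anything from the Atom spec.
const idsSeen = new Map<string, number>();

function uniqueAtomId(baseId: string): string {
  const count = (idsSeen.get(baseId) ?? 0) + 1;
  idsSeen.set(baseId, count);
  return count === 1 ? baseId : `${baseId}${count}/`;
}
// The first entry in a given second keeps the plain ID; the next gets ".../2/", and so on.
```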

Okay, let’s sum up: to make a good Atom ID, construct an HTTP URI using domain + creation timestamp. Anyone see any problems with this scheme? If not, I’ll forge ahead with Atom in a few days. One Atom feed will provide entry summaries, while the other will be the first goer.org feed ever to provide full-content goodness. Yum!

1. In this inertial reference frame, anyway.

Quick Hits: Games You Should Be Playing

  • There sure are a lot of good-looking girls at the beach in LA. Hey, I’m just sayin’.

  • More fun than a rampaging pack of Zebranskys! The best game ever packed into 11 megabytes is now available as a free download for Windows, Linux, and Mac. The cinematics are broken, but pretty much everything else works great. God bless open source.[1]

  • Rummaging through the iTunes Music store is just way too much fun. Anyone remember Shriekback? I do.

    Just don’t browse iTunes while tipsy, or you might end up buying more cheesy music than is safe for human consumption. Ah-WHOA! I just died in your arms tonight! Damnit.

  • Dynegy crook treated unfairly! Actually, the article is really about the unfairness of minimum mandatory sentencing. Still, it is puzzling that the crook in question is described as a “middle-level manager” (he was a VP) who “did not personally profit from his crime” (umm, no).

    You know… I like to think that Evil is nuanced. Real villains aren’t cartoon characters. A book that has the bad guys sitting around and cackling about how evil they are can be immediately discounted as hokey and unrealistic. Right? So why can’t real-life bad guys live up to literary standards?

  • It’s not whether you win or lose, it’s how you play the game. Words to live by, especially when your company softball team is coming off a season-opening 17-1 loss.[2]

1. For more time-wasting pleasure, there’s a remake of the classic Avalon Hill boardgame Titan. God bless open source and Java.

2. Unfortunately we can’t use “Wait ’till next year!” for at least a couple of months.

All Things In Moderation

Dave Shea institutes a form of comment moderation; predictable firestorm ensues. Dave also redesigned his site, switching to black text on a white background, with plenty of whitespace. The default font also looks bigger and clearer to me, although that could be an illusion — I haven’t really looked at the old and new stylesheets side by side. Overall, I think it looks great, but I’m biased towards higher contrast and larger fonts. I’ve been thinking about writing a song about this, actually…

I like big TEXT and I cannot lie
You other designers can’t deny…

Eh, well, it needs some work.

In Other Markup-related News: Peter-Paul Koch has a great article on cleaning up inline JavaScript in favor of using simple DOM hooks instead. Good stuff. Often irascible but always entertaining, Peter-Paul Koch is like the Joe Clark of JavaScript. You have to give PPK credit — if he finds it necessary to add a non-standard replace attribute to his code, he’ll do it, and the hell with validation if it gets in the way of his work. In this day and age, this is quite the bold position. After all, these days the issues seem to revolve around such things as the semantic meaning of the <div> element and how many <h1> elements can dance on the head of a single webpage. Heck, you can’t even kinda sorta make the suggestion that maybe simple structural tables might be useful under some circumstances without getting lambasted by markup purists. So you’ve got to respect someone who uses their markup to actually, you know, solve problems. PPK, my hat is off to you.
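For anyone who hasn’t read the article yet, the general pattern PPK is pushing looks something like the sketch below. This is my own minimal illustration, not PPK’s actual code (his version goes further, which is where that non-standard replace attribute comes in). The markup carries only a hook, say <a href="big.html" class="popup">, and the behavior gets attached in one place once the page has loaded.

```typescript
// A minimal sketch of the "DOM hooks instead of inline JavaScript" idea.
// My own illustration, not PPK's actual code. The class name "popup" is a
// made-up hook; no onclick attributes live in the markup itself.

function attachPopupLinks(): void {
  const anchors = document.getElementsByTagName("a");
  for (let i = 0; i < anchors.length; i++) {
    const link = anchors[i];
    if (!link.className.split(" ").includes("popup")) {
      continue;
    }
    link.onclick = () => {
      window.open(link.href, "popup", "width=600,height=400");
      return false; // suppress normal navigation, old-school style
    };
  }
}

window.addEventListener("load", attachPopupLinks);
```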