Atom IDs: What’s Wrong With Domain + Timestamp?

For a long time I had been mostly indifferent to the nascent Atom syndication format. True, Atom was taking the right approach in formalizing its specification and developing comprehensive test suites. But I was still thinking, why bother adding an Atom feed when I already had perfectly fine RSS feeds? In other words, “What does Atom add that RSS can’t already do?”

That was my mindset until a few weeks ago, when Mark Pilgrim delivered a much-needed sharp blow to the kidneys of that whole idea. This is when I got religion. Of course, I really should have gotten religion well before that, but I’m a slow learner.

So I was all rarin’ to go, ready to put up an Atom feed, when another post by Mark brought me up short. In his instructions on “How to make a good ID in Atom”, Mark advised against using your HTTP permalink URI as an ID, recommending Tag URIs (yuck!) instead. Then Tim Bray chimed in, saying that a permalink URI as an Atom ID is probably fine, but if you don’t want to use that, you can construct a NewsML URN (yuck again!). Not to be outdone, Bill de Hora said, “Never mind the URI.” Permalinks (heck, domains themselves) are transient, so… use Tag URIs. But wait, Tag URIs (at least the way Mark constructs them) contain domains! Pretty soon my brain started to hurt, and I decided to back off and think about this later. Then things got busy at work, time passed, life went on. I probably would have forgotten about the whole thing, except that a few days ago I noticed a task in iCal: “Make Atom feed.” Stupid iCal.

So it’s back to thinking about Atom IDs. Each entry in an Atom feed must have a globally unique, unchanging ID that is also a valid URI. The idea is that Atom applications should be able to identify feed entries uniquely, even if your entries get resyndicated or published in several places at once. However we choose to generate our Atom IDs, we are duty-bound to ensure that they don’t change after creation and that we don’t accidentally clobber someone else’s Atom IDs.

Okay, so what can we use that’s unique and unchanging?

  • My domain of goer.org is unique — I own it, and no one else who’s publishing feeds should be using it. (Bill would say that I’m just renting my domain, but as we’ll see, that doesn’t really matter.)

  • The creation timestamp of each entry is unchanging.[1]

Together, the two form a globally unique, unchanging event. For example, on May 10, 2004 at 9:56pm and 30 seconds PST, the site goer.org posted the entry, “Supercharge Your Outlook Performance!” Now, the domain goer.org is not unchanging — I could easily lose it. The creation timestamp is not unique — there’s an excellent chance someone posted something on May 10, 2004 at 9:56pm and 30 seconds PST. But put the two together, and we have the makings of a simple but robust Atom ID:

https://www.goer.org/2004/05/10/21/56/30/PST/

Note that this is not the permalink to that entry. The permalink is https://www.goer.org/2004/May/index.html#10. The Atom ID takes the domain and appends the timestamp components, separated by forward slashes. I used forward slashes because I know that forward slashes can be used to make valid HTTP URIs. (Tag URIs and NewsML URNs seem weird and scary to me, so I decided to stick with familiar HTTP URIs.) Thus, the Atom ID is a valid HTTP URI, even though it doesn’t point directly to an HTTP resource. Based on my cursory reading on HTTP URIs, I think this is okay. (If I’m wrong about that — if it is incorrect to design an HTTP URI that does not point directly to an HTTP resource — let me know.)

For another example, consider my HTML tutorial. The permalink for the entry on classes and IDs is https://www.goer.org/HTML/intermediate/classes_and_ids/. However, the creation timestamp was August 21, 2002 at 12:34pm and 03 seconds PST. Thus, the corresponding Atom ID would be https://www.goer.org/2002/08/21/12/34/03/PST/.
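
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_atom_id and its parameters are made up for this example, and the time zone is passed as a plain label ("PST") rather than computed, because that is how the IDs above are written.

```python
from datetime import datetime


def build_atom_id(domain: str, created: datetime,
                  tz_label: str = "PST", scheme: str = "https") -> str:
    """Join a domain and a creation timestamp into an HTTP URI.

    Hypothetical helper: the name and signature are assumptions
    made for this sketch, not part of any Atom library.
    """
    # Zero-padded year/month/day/hour/minute/second, slash-separated.
    stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
    return f"{scheme}://{domain}/{stamp}/{tz_label}/"


# The two entries discussed above.
outlook_id = build_atom_id("www.goer.org", datetime(2004, 5, 10, 21, 56, 30))
classes_id = build_atom_id("www.goer.org", datetime(2002, 8, 21, 12, 34, 3))
print(outlook_id)  # https://www.goer.org/2004/05/10/21/56/30/PST/
print(classes_id)  # https://www.goer.org/2002/08/21/12/34/03/PST/
```

Note that the permalink never enters the function. Only the domain and the creation timestamp do, which is the whole point.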

I’m trying to think of scenarios where this scheme would fail. It doesn’t matter if my permalinks change. The domain and creation timestamp are still the same. Nor does it matter if I lose my domain name. I couldn’t produce future entries with the same domain, but the entries I already produced would still be valid, and I could certainly produce new entries with a new domain name.

For example, let’s say that on January 1, 2005, I’m getting married. Because I’m such a forward-thinking, progressive guy, I’m not asking my wife to change her last name; in fact, I’ve decided to change my last name to hers. Henceforth I am to be known as “Evan Goer Nahasapeemapetilon.” And to really drive the point home to friends and family, I’m dumping goer.org for a more appropriate domain name. So now to the really important question: what does that do to my Atom entries? Well, all posts before January 1, 2005 would have the form https://www.goer.org/YYYY/MM/DD/HH/MM/SS/PST/, while all posts after January 1, 2005 would have the form http://nahasapeemapetilon.net/YYYY/MM/DD/HH/MM/SS/PST/. Old entry IDs are unaffected, and all IDs are still unique.
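
One way to picture why the domain switch is harmless: the ID is minted once, when the entry is created, and stored with the entry from then on. The sketch below assumes exactly that. The Entry class, the mint_entry function, and the title and timestamp of the post-wedding entry are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: the stored ID cannot be reassigned later
class Entry:
    title: str
    created: datetime
    atom_id: str


def mint_entry(title: str, created: datetime, current_domain: str,
               scheme: str = "https") -> Entry:
    """Mint the ID from whatever domain is in use at creation time."""
    stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
    return Entry(title, created, f"{scheme}://{current_domain}/{stamp}/PST/")


# Minted while goer.org was still the domain.
old = mint_entry("Supercharge Your Outlook Performance!",
                 datetime(2004, 5, 10, 21, 56, 30), "www.goer.org")

# Minted after the (hypothetical) switch; title and time are made up.
new = mint_entry("Hypothetical first post on the new domain",
                 datetime(2005, 1, 2, 9, 0, 0),
                 "nahasapeemapetilon.net", scheme="http")

print(old.atom_id)  # https://www.goer.org/2004/05/10/21/56/30/PST/
print(new.atom_id)  # http://nahasapeemapetilon.net/2005/01/02/09/00/00/PST/
```

Nothing in the old entry is recomputed when the domain changes, so its ID stays exactly as it was.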

Okay, but what if someone else takes over goer.org? In order to have a collision, the following would have to happen:

  1. The new owner decides to publish an Atom feed, AND
  2. They happen to use the exact same format I do, with the exact same separators, AND
  3. They decide to publish entries that were created in the past, before they owned goer.org, AND
  4. Some of those entries have creation timestamps equal to some of mine, down to the second, AND
  5. The old entries have never been published as Atom entries before, because otherwise they would already have an Atom ID, and Atom IDs are unchanging, remember?

The other scenario I thought of is, what about a gigantic site that produces many thousands of entries a day? In that case, we do start to have a non-trivial chance of having two entries with the same creation timestamp. However, the Reuters and eBays of this world can certainly afford to generate some sort of extra identifier to ensure uniqueness. Fortunately, the average weblogger wouldn’t need to add this extra machinery.
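
For the high-volume case, the "extra identifier" could be as simple as a per-second sequence number appended to the ID. The following is a sketch of that idea only, not a claim about what any real publisher does. The class name, the domain news.example.com, and the in-memory counter are assumptions; a real system would need to persist the counter so that IDs survive restarts.

```python
from collections import Counter
from datetime import datetime


class HighVolumeIdMinter:
    """Hypothetical minter that adds a sequence number per timestamp."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._seen = Counter()  # entries minted so far for each second

    def mint(self, created: datetime) -> str:
        stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
        self._seen[stamp] += 1
        # Always append the sequence number so every ID has the same shape.
        return f"https://{self.domain}/{stamp}/PST/{self._seen[stamp]}/"


minter = HighVolumeIdMinter("news.example.com")
same_second = datetime(2004, 5, 10, 21, 56, 30)
first = minter.mint(same_second)
second = minter.mint(same_second)
print(first)   # https://news.example.com/2004/05/10/21/56/30/PST/1/
print(second)  # https://news.example.com/2004/05/10/21/56/30/PST/2/
```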

Okay, let’s sum up: to make a good Atom ID, construct an HTTP URI using domain + creation timestamp. Anyone see any problems with this scheme? If not, I’ll forge ahead with Atom in a few days. One Atom feed will provide entry summaries, while the other will be the first goer.org feed ever to provide full-content goodness. Yum!

1. In this inertial reference frame, anyway.

Quick Hits: Games You Should Be Playing

  • There sure are a lot of good-looking girls at the beach in LA. Hey, I’m just sayin’.

  • More fun than a rampaging pack of Zebranskys! The best game ever packed into 11 megabytes is now available as a free download for Windows, Linux, and Mac. Cinematics are broken, but pretty much everything else works great. God bless open source.[1]

  • Rummaging through the iTunes Music store is just way too much fun. Anyone remember Shriekback? I do.

    Just don’t browse iTunes while tipsy, or you might end up buying more cheesy music than is safe for human consumption. Ah-WHOA! I just died in your arms tonight! Dammit.

  • Dynegy crook treated unfairly! Actually, the article is really about the unfairness of mandatory minimum sentencing. Still, it is puzzling that the crook in question is described as a “middle-level manager” (he was a VP) who “did not personally profit from his crime” (umm, no).

    You know… I like to think that Evil is nuanced. Real villains aren’t cartoon characters. A book that has the bad guys sitting around and cackling about how evil they are can be immediately discounted as hokey and unrealistic. Right? So why can’t real-life bad guys live up to literary standards?

  • It’s not whether you win or lose, it’s how you play the game. Words to live by, especially when your company softball team is coming off a season-opening 17-1 loss.[2]

1. For more time-wasting pleasure, there’s a remake of the classic Avalon Hill boardgame Titan. God bless open source and Java.

2. Unfortunately we can’t use “Wait ’til next year!” for at least a couple of months.