For a long time I had been mostly indifferent to the nascent Atom syndication format. True, Atom was taking the right approach in formalizing its specification and developing comprehensive test suites. But I was still thinking, why bother adding an Atom feed when I already had perfectly fine RSS feeds? In other words, "What does Atom add that RSS can't already do?"
That was my mindset until a few weeks ago, when Mark Pilgrim dealt that whole idea a much-needed sharp blow to the kidneys. This is when I got religion. Of course, I really should have gotten religion well before that, but I'm a slow learner.
So I was all rarin' to go, ready to put up an Atom feed, when another post by Mark brought me up short. In his instructions on "How to make a good ID in Atom", Mark advised against using your HTTP permalink URI as an ID, recommending Tag URIs (yuck!) instead. Then Tim Bray chimed in, saying that a permalink URI as an Atom ID is probably fine, but if you don't want to use that, you can construct a NewsML URN (yuck again!). Not to be outdone, Bill de Hora said, "Never mind the URI." Permalinks -- heck, domains themselves -- are transient, so... use Tag URIs. But wait, Tag URIs (at least the way Mark constructs them) contain domains! Pretty soon my brain started to hurt, and I decided to back off and think about this later. Then things got busy at work, time passed, life went on. I probably would have forgotten about the whole thing, except that a few days ago I noticed a task in iCal: "Make Atom feed." Stupid iCal.
So it's back to thinking about Atom IDs. Each entry in an Atom feed must have a globally unique, unchanging ID that is also a valid URI. The idea is that Atom applications should be able to identify feed entries uniquely, even if your entries get resyndicated or published in several places at once. However we choose to generate our Atom IDs, we are duty-bound to ensure that they don't change after creation and that we don't accidentally clobber someone else's Atom IDs.
Okay, so what can we use that's unique and unchanging?
My domain of goer.org is unique -- I own it, and no one else who's publishing feeds should be using it. (Bill would say that I'm just renting my domain, but as we'll see, that doesn't really matter.)
The creation timestamp of each entry is unchanging.[1]
Together, the two form a globally unique, unchanging event. For example, on May 10, 2004 at 9:56pm and 30 seconds PST, the site goer.org posted the entry, "Supercharge Your Outlook Performance!" Now, the domain goer.org is not unchanging -- I could easily lose it. The creation timestamp is not unique -- there's an excellent chance someone posted something on May 10, 2004 at 9:56pm and 30 seconds PST. But put the two together, and we have the makings of a simple but robust Atom ID:
http://goer.org/2004/05/10/21/56/30/PST/
Note that this is not the permalink to that entry. The permalink is http://www.goer.org/Journal/2004/May/index.html#10. The Atom ID takes the domain and appends the timestamp components, separated by forward slashes. I used forward slashes because I know that forward slashes can be used to make valid HTTP URIs. (Tag URIs and NewsML URNs seem weird and scary to me, so I decided to stick with familiar HTTP URIs.) Thus, the Atom ID is a valid HTTP URI, even though it doesn't point directly to an HTTP resource. Based on my cursory reading on HTTP URIs, I think this is okay. (If I'm wrong about that -- if it is incorrect to design an HTTP URI that does not point directly to an HTTP resource -- let me know.)
For another example, consider my HTML tutorial. The permalink for the entry on classes and IDs is http://www.goer.org/HTML/intermediate/classes_and_ids/. However, the creation timestamp was August 21, 2002 at 12:34pm and 03 seconds PST. Thus, the corresponding Atom ID would be http://goer.org/2002/08/21/12/34/03/PST/.
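The construction is simple enough to sketch in a few lines of Python. This is only an illustration of the scheme described above, not code that runs on goer.org; the function name make_atom_id and its parameters are made up for the example, and the time zone label is passed in as a plain string rather than computed.

```python
from datetime import datetime


def make_atom_id(domain: str, created: datetime, tz_label: str) -> str:
    """Build an ID of the form http://<domain>/YYYY/MM/DD/HH/MM/SS/<TZ>/.

    `created` is the entry's creation time as a naive datetime, and
    `tz_label` is the zone abbreviation to append (an assumption of this
    sketch: the caller knows which zone the timestamp was recorded in).
    """
    stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
    return f"http://{domain}/{stamp}/{tz_label}/"


# The two entries discussed above.
outlook_id = make_atom_id("goer.org", datetime(2004, 5, 10, 21, 56, 30), "PST")
classes_id = make_atom_id("goer.org", datetime(2002, 8, 21, 12, 34, 3), "PST")
print(outlook_id)  # http://goer.org/2004/05/10/21/56/30/PST/
print(classes_id)  # http://goer.org/2002/08/21/12/34/03/PST/
```

Note that the permalink never enters into it: the only inputs are the domain and the creation timestamp.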
I'm trying to think of scenarios where this scheme would fail. It doesn't matter if my permalinks change. The domain and creation timestamp are still the same. Nor does it matter if I lose my domain name. I couldn't produce future entries with the same domain, but the entries I already produced would still be valid, and I could certainly produce new entries with a new domain name.
For example, let's say that on January 1, 2005, I'm getting married. Because I'm such a forward-thinking, progressive guy, I'm not asking my wife to change her last name -- in fact, I've decided to change my last name to hers. Henceforth I am to be known as "Evan Goer Nahasapeemapetilon." And to really drive the point home to friends and family, I'm dumping goer.org for a more appropriate domain name. So now to the really important question: what does that do to my Atom entries? Well, all posts before January 1, 2005 would have the form http://goer.org/YYYY/MM/DD/HH/MM/SS/PST, while all posts after January 1, 2005 would have the form http://nahasapeemapetilon.net/YYYY/MM/DD/HH/MM/SS/PST. Old entry IDs are unaffected, and all IDs are still unique.
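One way to picture why the old IDs survive a domain change: the ID is minted once, at creation time, using whatever domain is current, and then stored with the entry rather than recomputed. Here is a rough Python sketch of that idea. The Entry class, the publish function, and the entry titles and timestamps are all hypothetical, invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)  # frozen: the stored ID cannot be reassigned later
class Entry:
    title: str
    created: datetime
    atom_id: str  # minted once at publish time, never recomputed


def publish(title: str, created: datetime, current_domain: str) -> Entry:
    """Mint the ID from the domain in use at the moment of creation."""
    stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
    return Entry(title, created, f"http://{current_domain}/{stamp}/PST/")


# Hypothetical entries on either side of the domain switch.
old = publish("Before the wedding", datetime(2004, 12, 31, 23, 0, 0), "goer.org")
new = publish("After the wedding", datetime(2005, 1, 2, 9, 0, 0),
              "nahasapeemapetilon.net")

print(old.atom_id)  # still a goer.org ID, whatever happens to the domain
print(new.atom_id)
```

The important design choice is that nothing ever regenerates old.atom_id from the site's current domain. If the feed template rebuilt IDs on every publish, switching domains would silently change them.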
Okay, but what if someone else takes over goer.org? In order to have a collision, the following would have to happen:
The other scenario I thought of is, what about a gigantic site that produces many thousands of entries a day? In that case, we do start to have a non-trivial chance of having two entries with the same creation timestamp. However, the Reuters and eBays of this world can certainly afford to generate some sort of extra identifier to ensure uniqueness. Fortunately, the average weblogger wouldn't need to add this extra machinery.
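For the curious, here is one guess at what that "extra identifier" might look like: a sequence number appended only when two entries share the same second. This is purely an assumption for illustration, not something any big publisher is known to do, and the IdMinter class and the example.com domain are made up. A real system would also need to persist the counts somewhere instead of keeping them in memory.

```python
from collections import Counter
from datetime import datetime


class IdMinter:
    """Mint domain + timestamp IDs, disambiguating same-second collisions."""

    def __init__(self, domain: str, tz_label: str) -> None:
        self.domain = domain
        self.tz_label = tz_label
        self._seen = Counter()  # how many IDs were minted per base URI

    def mint(self, created: datetime) -> str:
        stamp = created.strftime("%Y/%m/%d/%H/%M/%S")
        base = f"http://{self.domain}/{stamp}/{self.tz_label}/"
        self._seen[base] += 1
        n = self._seen[base]
        # The first entry in a given second keeps the plain form;
        # later ones get a numeric suffix (2, 3, ...).
        return base if n == 1 else f"{base}{n}/"


minter = IdMinter("example.com", "PST")
same_second = datetime(2004, 6, 26, 12, 0, 0)
first = minter.mint(same_second)
second = minter.mint(same_second)
print(first)
print(second)
```

A site that never posts twice in the same second would only ever see the plain form, which is the point: the extra machinery costs nothing unless you need it.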
Okay, let's sum up: to make a good Atom ID, construct an HTTP URI using domain + creation timestamp. Anyone see any problems with this scheme? If not, I'll forge ahead with Atom in a few days. One Atom feed will provide entry summaries, while the other will be the first goer.org feed ever to provide full-content goodness. Yum!
1. In this inertial reference frame, anyway.
Posted by Evan Goer on Jun. 26, 2004 at 9:43 PM | Comments (5)
Text released under Creative Commons.
To use this license, you must attribute this work properly. This license does not extend to comments unless the original poster of that comment states otherwise.
© Copyright 2001-2007, Evan Goer. Some Rights Reserved. Last Updated April 20, 2009.