Tech Pubs Tuesday: Catching Regressions in Writing

Like everyone else, I make plenty of huge embarrassing typos. Over the years, I’ve tried to discover patterns in how and when I make basic errors.

Perhaps the most common scenario is when I start out with a paragraph that’s correct, and then decide to wordsmith a sentence or clause within that paragraph. For me at least, this is a huge DANGER ZONE moment — there’s an excellent chance that fixing that little passage will actually spawn more errors.

Just to pick a random example, I might start with a paragraph like,

Perhaps one of the most common scenarios is when I start out with a paragraph that’s correct, and then decide to wordsmith a sentence or clause within that paragraph. For me at least, this is a huge DANGER ZONE moment — there’s an excellent chance that fixing that little passage will actually spawn more errors.

Then I decide that the lead sentence would be stronger if I said, “the most common” instead of the more hedging, “one of the most common.” So I change the paragraph:

Perhaps the most common scenarios is when I start out with a paragraph that’s correct, and then decide …

Oops! I forgot to change “scenarios” back to singular. There’s something about going back and focusing on a tiny section of text that makes me forget about overall coherence at the sentence and paragraph level.

If this kind of thing trips you up as well, you can do one of two things.

Option A is to do what many professional writers tell you to do: write your entire rough draft without any editing, then go back and do a big editing sweep. Editing entire passages all at once means you’re less likely to make this kind of coherence error.

Unfortunately, I’ve never written that way. I am a writer who backtracks. Always have been, always will be.

So Option B is to cultivate a sense of hyper-awareness anytime you go back and refactor something you’ve just written. This is the approach I take. If you go back and fix up something you’ve just written, the hairs on the back of your neck should rise.

If it helps, you can think of it this way:

  • each sentence in a paragraph is like a function
  • all the sentences in a paragraph work together like a collection of functions in a small class or module
  • if you tweak a sentence that already works, that’s like changing the arguments or the return value of a function that already works. So… are you sure you didn’t screw up something else that depends on that sentence?

You might be wondering, “Are there test suites for sentences?” In fact, yes, there are! These test suites are called “technical copyeditors,” and they’re very expensive to run (even once).

So in lieu of that, your next best option is to go back and eyeball the rest of the paragraph. At a minimum, carefully reread the preceding and following clauses for subject-verb agreement, tense agreement, and the like. That should catch most of these “regression”-style writing errors, but keep in mind that the best defense is always to get another pair of eyes on your work.

Tech Pubs Tuesday: Don’t Write Directly in HTML

Since most of the documentation you produce is going to be hosted as HTML anyway, you might be wondering: why not just cut out the middleman and write all your documentation in plain HTML? Instead of learning some weird intermediate format that transforms to HTML, wouldn’t it make sense to handcraft whatever markup, CSS, and JS you need directly?

Actually, since most documentation tools and formats range from mediocre to awful, plain HTML isn’t necessarily a bad choice. I’ve seen some beautiful documentation authored in handcrafted HTML. That said, using a more specialized format will make things easier on you. Here’s why.

First, typing out HTML open and close tags is annoying and breaks your writing flow, even if you have a good text editor or IDE. If you’ve used a lightweight markup language like Markdown or reStructuredText, you know the difference. Creating a new paragraph with newline, newline is more natural than angle-bracket, p, angle-bracket. Creating a bullet list with leading *s is much more natural than typing <ul>s and <li>s.

Second, maybe you do want multiple output formats! Are you sure you don’t want PDF? What about man pages? If you want man pages, you’re going to have to invent some mechanism for transforming your HTML into troff (a venial sin), or you’re going to have to write the same material twice (a mortal sin).

Third, and perhaps most important, HTML is deliberately primitive and general-purpose. It lacks the semantics that you want for describing technical documentation. For example, when writing a sophisticated book, you might want things like:

  • Real cross-references. You want links to a section, example, or table that update automatically when the target moves or changes its title.
  • Fancy admonitions (warnings, dangers, cautions, notes, tips)
  • Fancy titled tables and figures
  • Fancy titled code examples
  • Automatically generated tables of contents, lists of figures, lists of examples…
  • Automatically generated glossaries and indexes. (Okay, who am I kidding? Nobody cares about indexes anymore. Sniff.)
  • File inclusions (raw, interpreted, syntax highlighted or not, with line numbers or not, …)
  • Replaceable text
  • Conditional text (generating different aspects of the book from the same source)

… and so on.

My bottom line is that you can get away with writing a small amount of documentation in plain HTML, such as a README or a short install guide. But the larger your book, the harder this gets. The pattern you really want to avoid is:

  1. Start authoring a substantial book in HTML.
  2. Part way through, discover that you need some of the features on the list above.
  3. Start hacking those features in with some kind of special ad-hoc syntax. No worries — it’s not too hard, it’s just one or two “special tags” or “macros”…
  4. Eventually end up re-inventing a bad version of reStructuredText or Pandoc-flavored Markdown. Except with way more angle brackets.

Don’t be that guy.

Tech Pubs Tuesday: Use Spellcheck

I’m pretty sleep deprived this week thanks to my ADORABLE BABIES WHO I ASSURE YOU CAN DO NO WRONG, and so my wife recommended keeping this week’s Tech Pubs Tuesday short and sweet. “How about something like, ‘Use spellcheck,’” she joked.

Ha! Little did she know that “Use Spellcheck” is actually on the list of possible topics I wrote out ahead of time. The joke is on her!

So. Use spellcheck!

Spellcheck is one of your best friends when you’re writing technical documentation. More accurately, spellcheck is your only friend when writing technical documentation.

Certainly grammar check isn’t a very good friend. Grammar check is like that “friend” who keeps passing around links to Truther theories or inane inspirational messages, and man, you’re totally going to unfriend grammar check when you finally get around to pruning your friends list. But spellcheck! When you need to move furniture across town, spellcheck is right there with her truck. Good old spellcheck.

“But Evan,” I hear you say. “Spellcheck doesn’t have all the technical terminology that I need!” To which I say, “Right-click, Add to Dictionary.”

“But Evan,” I hear you say. “I use vim to write documentation, not some sissy GUI text editor.” To which I say, “I use vim to write documentation too!” And vim has a spellchecker; turn it on with :set spell. In fact, the vim spellchecker is… kind of interestingly nerdy. So start using it.

Seriously! Use spellcheck. It’s like having a test suite for your documentation. Or it’s about as close as you’re going to get to testing your documentation this side of the year 2050.

Thank you for your attention. And God help me if there are any spelling errors in this one.

Tech Pubs Tuesday: The 25% Rule for Editing (AKA Be Aggressive. Be-ee Aggressive!)

Even for a professional writer, long editing sessions are a challenge. Was I actually editing those last ten pages, or did I nod off around page three?

To help make sure I’ve been editing aggressively enough, I use a simple checkpoint rule:

After editing a first draft, the text should be 25% shorter.

Cutting 25% probably sounds like a lot. How does this work in practice? Let’s take a look at two short real-world examples.

Example 1: jQuery Document Ready

From the “How jQuery Works” tutorial, “Launching Code on Document Ready”:

The first thing that most Javascript programmers end up doing is adding some code to their program, similar to this:

window.onload = function(){ alert("welcome"); }

Inside of which is the code that you want to run right when the page is loaded. Problematically, however, the Javascript code isn’t run until all images are finished downloading (this includes banner ads). The reason for using window.onload in the first place is that the HTML ‘document’ isn’t finished loading yet, when you first try to run your code.

Breaking this down step-by-step:

  1. “The first thing that most Javascript programmers end up doing is adding some code to their program, similar to this” — Already I’m not a fan of this intro, as it doesn’t explain to the poor newbie jQuery developer why “most” JavaScript programmers do this.

    • “The first thing that” — This is throat-clearing. Instead lead off with, “Most JavaScript programmers…” or more accurately, “Many JavaScript programmers…”
    • “end up doing is” — More throat-clearing. Instead, say what the programmers are doing directly.
    • “adding some code to their program” — Too vague. Replace with something more specific, like “wrap their code in an onload function…”
    • “similar to this:” — Also unnecessary. Launching into an example already implies “similar to this.” If you do want to have a transition to the example, you can always go with the classic, “For example:”
  2. “Inside of which is the code that you want to run right when the page is loaded.” — Thanks to “wrap their code in an onload function,” the revised first paragraph already says most of this. The only new information is “right when the page is loaded.” Delete this sentence, but remember that last bit…

  3. “Problematically, however,” — Replace with, “Unfortunately,”

  4. “… the JavaScript code isn’t run until all images are finished downloading (this includes banner ads).” — Minor stylistic changes: “isn’t run” to “doesn’t run” and “(this includes banner ads)” to “, including banner ads.” Parenthetical statements are a bit harder to parse, so I try to use them only when necessary.

  5. “The reason for using window.onload in the first place is that the HTML ‘document’ isn’t finished loading yet, when you first try to run your code.” — Finally, here is the reasoning we were looking for back in the first paragraph. Why do “most JavaScript programmers” use onload? Because they are waiting for the browser to finish loading the document. So let’s say that, and while we’re at it, move the concept up to the top of the section. “To ensure that their code runs after the browser finishes loading the document…”

The revised passage is now:

To ensure that their code runs after the browser finishes loading the document, many JavaScript programmers wrap their code in an onload function:

window.onload = function(){ alert("welcome"); }

Unfortunately, the code doesn’t run until all images are finished downloading, including banner ads.

Example 2: Facebook API

From the Facebook Graph API documentation for Achievement(Instance):

The achievement(Instance) object represents the achievement achieved by a user for a particular app.

You can read more about achievements here.

An app can always access achievement(instance) associated with their app with an app or user access_token associated with their app. To access achievements for a user for other apps they require user_games_activity permission and to access achievements for the user’s friends, the app requires friends_games_activity permission.

Breaking this down step-by-step:

  1. “… represents the achievement achieved by a user…” — The obvious weak spot is the “achievement achieved.” Eliminating this repetition and changing the clause to active yields, “represents a user’s achievement for a particular app.”

  2. “You can read more about achievements here.” — A “click here” sentence! Excellent, a freebie. Just link “achievement” above and then nuke this sentence entirely.

  3. “An app can always access achievement(instance) associated with their app with an app or user access_token associated with their app.” — This sentence is hopelessly mangled. Without talking to an actual Facebook engineer, my best stab at this sentence is:

    • “An app” — Shorten to just, “Apps.” This is just a minor stylistic change.
    • “achievement(instance) associated with their app” — Shorten to, “their own achievements.” Since this whole section is about the achievement(instance) object, it’s unambiguous here to just refer to them as “achievements” most of the time. As a side benefit, it’s not clear what Facebook’s convention is around capitalizing achievement(instance), so using “achievements” sidesteps this style issue.
    • “with an app or user access_token associated with their app.” — Change to, “using an app access_token or user access_token.” The “associated with their app” is implied given the sentence that follows. Also, clarify whether we’re talking about a single type of access token (a user or app access token), or two types of access tokens (a user access token or an app access token). Here, I’m guessing it’s the latter.
  4. “To access achievements for a user for other apps they require user_games_activity permission” — Break this off into a new sentence, “Accessing a user’s achievements in other apps requires user_games_activity permission.”

    • “To access” — Change to “Accessing.” Another minor stylistic change. The main reason I like it better is that it makes it a bit easier to get rid of the “they” later on in the sentence.
    • “achievements for a user for other apps” — Eliminate the double “for” and shorten to “a user’s achievements in other apps.”
  5. “and to access achievements for the user’s friends, the app requires friends_games_activity permission.” — Create a parallel sentence to the previous one, “Accessing a user’s friend’s achievements requires friends_games_activity permission.” Saying “user’s friend’s achievements” is a little dicey, but I can live with it.

The revised passage is now:

The achievement(Instance) object represents a user’s achievement for a particular app.

Apps can always access their own achievements using an app access_token or user access_token. Accessing a user’s achievements in other apps requires user_games_activity permission. Accessing a user’s friend’s achievements requires friends_games_activity permission.
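One quick way to check that the revision worked: the access rules are now clear enough to write down as a lookup table. The sketch below does that in JavaScript. The function `requiredPermission` and the case names (`ownApp`, `userInOtherApps`, `friendsOfUser`) are hypothetical, made up for illustration; this is not a Facebook SDK call, and the table encodes only what the revised passage says.

```javascript
// Hypothetical helper for illustration only; NOT part of any Facebook SDK.
// It encodes the three access rules stated in the revised passage.
const PERMISSION_FOR = new Map([
  ["ownApp", null],                              // app or user access_token is enough
  ["userInOtherApps", "user_games_activity"],    // a user's achievements in other apps
  ["friendsOfUser", "friends_games_activity"],   // a user's friends' achievements
]);

// Returns the extra permission needed, or null if none is required.
function requiredPermission(whose) {
  if (!PERMISSION_FOR.has(whose)) {
    throw new Error(`unknown achievement owner: ${whose}`);
  }
  return PERMISSION_FOR.get(whose);
}

console.log(requiredPermission("userInOtherApps")); // user_games_activity
```

Try the same exercise on the original passage and it is hard to say with confidence what the table should contain.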

Conclusion

$ wc /tmp/*.txt
5 67 475 /tmp/facebook-orig.txt
3 43 354 /tmp/facebook-revised.txt
5 85 522 /tmp/jquery-orig.txt
5 42 301 /tmp/jquery-revised.txt
18 237 1652 total
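The wc output gives raw counts (lines, words, bytes) but not the percentages. Here is a small JavaScript sketch that computes them from the numbers above. The post doesn’t say which unit “25% shorter” is measured in, so this assumes word count for the pass/fail check and shows bytes alongside for comparison.

```javascript
// Word and byte counts copied from the wc output above
// (wc's default columns are lines, words, bytes).
const counts = {
  facebook: { orig: { words: 67, bytes: 475 }, revised: { words: 43, bytes: 354 } },
  jquery:   { orig: { words: 85, bytes: 522 }, revised: { words: 42, bytes: 301 } },
};

// Percentage by which the revised text is shorter than the original.
function reductionPercent(before, after) {
  return ((before - after) / before) * 100;
}

for (const [name, { orig, revised }] of Object.entries(counts)) {
  const words = reductionPercent(orig.words, revised.words);
  const bytes = reductionPercent(orig.bytes, revised.bytes);
  // Assumption: the 25% Rule is checked against word count.
  const verdict = words >= 25 ? "meets" : "misses";
  console.log(`${name}: ${words.toFixed(1)}% fewer words, ${bytes.toFixed(1)}% fewer bytes (${verdict} the 25% Rule)`);
}
// facebook: 35.8% fewer words, 25.5% fewer bytes (meets the 25% Rule)
// jquery: 50.6% fewer words, 42.3% fewer bytes (meets the 25% Rule)
```

Both edits clear the 25% bar by either measure, though the Facebook edit only barely does so in bytes.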

There are a couple of takeaways here.

The first is that you won’t reach the 25% by tinkering at the margins, fixing little issues around grammar and wordiness. You need to dive in and rework entire sentences, paragraphs, and sections. Don’t worry — in a first draft you will always, always find sections that you can delete or radically shorten. The 25% Rule is there to force you to do the harder, more important work of rethinking the text.

The second is that the overall idea is not, “Whoever reduces the text the most WINS AT TECH WRITING!” The idea is only that if you are editing a first draft aggressively enough, then you should probably see a reduction of around 25% or more.

In other words, the 25% Rule is just a rough self-check for your edit. If you’ve crossed the 25% threshold, you’re probably doing okay. There are no “extra points” for going further.

Tech Pubs Tuesday: Code Proximity

So I’m going to try a new thing in 2013 — a weekly series of short posts on the craft of software technical writing. Welcome to Tech Pubs Tuesday! Today’s topic is code proximity: what it is and why you want it.

There are lots of paths to broken documentation, but one of the most popular paths is to make sure your documentation source and software source live as far apart from each other as possible. In practice, this could mean:

  • keeping your documentation source in a separate repo from your software source
  • keeping your documentation in a proprietary binary format that only a few people can actually open and use
  • keeping your documentation authors as far as possible from your software authors

By contrast, in a healthy project:

  • documentation source sits right alongside your software in a /doc directory, embedded inside your source code as doc comments, or both
  • corollary: documentation source should be in an open, plain-text format
  • documentation builds as part of your normal CI or build process
  • documentation authors work right alongside software authors (or are the same people)

In other words, documentation must be part of your project, just like your tests and build scripts. It needs to be in your face every time you view your repo. The last thing you want is for it to be hidden away, in a place you don’t know about, in a format you don’t understand, controlled by people you don’t really know. Code proximity is your first line of defense against documentation rot.