Template for Making Ebooks with Pandoc Markdown

After reading Charles Stross’s article, “Why Microsoft Word Must Die”, it occurred to me that someone, somewhere might be interested in using Pandoc Markdown for generating ebooks.

If you’re writing fiction, Pandoc is probably too nerdy to bother with. You should use Scrivener, which is highly polished and produces fine ebook output with the press of a button. However, if you like to work with a manuscript as a collection of plain text files that you can check into version control, Pandoc fits the bill nicely. It’s a bit easier to understand than Sphinx, and it can handle anything up to moderately complex technical books.

So in case anyone finds it useful, here’s a project you can use to get started: https://github.com/evangoer/pandoc-ebook-template. Pull requests welcome!
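For a sense of what such a build involves, here’s a minimal Python sketch. It is not taken from the template itself: it assumes pandoc is on your PATH and that the manuscript lives in a hypothetical directory of numbered Markdown files, and the function names and default output file name are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_epub_command(chapter_dir, output="book.epub", title="Untitled"):
    """Return the pandoc command line for turning chapter files into an EPUB.

    chapter_dir, output, and title are illustrative assumptions, not names
    used by the template project.
    """
    # Sort so that 01-intro.md comes before 02-setup.md, and so on.
    chapters = sorted(str(p) for p in Path(chapter_dir).glob("*.md"))
    if not chapters:
        raise ValueError(f"no .md files found in {chapter_dir}")
    return [
        "pandoc",
        "--from", "markdown",
        "--toc",                         # generate a table of contents
        "--metadata", f"title={title}",  # EPUB output wants a title
        "--output", output,              # the .epub extension selects EPUB
        *chapters,
    ]


def build_epub(chapter_dir, output="book.epub", title="Untitled"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_epub_command(chapter_dir, output, title), check=True)
```

Calling `build_epub("chapters", title="My Novel")` would then write `book.epub` to the current directory.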

Software Backup for Writers

[Cross-posted to the Viable Paradise blog.]

The oldest known printed paper book was published in China well over a thousand years ago. The oldest fragments of Egyptian papyrus have survived over four thousand years.

The lifespan of a manuscript on a hard drive: about five years. A hard disk drive consists of platters that spin at thousands of RPM. Inevitably, these mechanical parts will fail. A solid state drive lacks mechanical parts, but due to unfortunate quantum mechanical effects, it can only record a limited number of writes. As with the hard disk drive, the more you use your solid state drive, the sooner it will degrade and die.

HOWTO fight Entropy (or lose more gracefully)

At a minimum, you should make sure that your work resides on:

  • At least two drives you can access locally
  • At least one drive in an offsite location

Here are a few strategies you can use.

Buy an external hard drive

There’s a good chance your computer only has one drive, particularly if it’s a laptop. The easiest way to rectify this is to buy an external hard drive. Even if funds are tight, please at least consider investing in a second hard drive, as the data on your computer is probably much more valuable than the hardware.

In a pinch, you can use a small USB drive. USB drives are inexpensive (often free, if you’ve mastered the art of lurking around corporate conventions), and they’re usually big enough to store all your documents.

Once you have an external hard drive, you’ll need some software to perform automated backups. Mac OS X systems have a built-in service called Time Machine, and Windows 8 has a similar service called File History. Windows 7 has a simpler backup system called Windows Backup. There is also a plethora of third-party backup software to choose from. Just make sure you have at least some automated system chugging away. Don’t rely on manually dragging your files over to your external drive. You won’t remember to do it.
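If you’re curious what such an automated system does under the hood, here’s a minimal Python sketch that mirrors new or changed files from a documents folder to a folder on the external drive. The function name and the paths are hypothetical. A real backup tool also keeps old versions and handles deletions, which this does not.

```python
import shutil
from pathlib import Path


def backup_changed_files(source, destination):
    """Copy files from source to destination if they are new or newer.

    Returns the list of relative paths that were copied. Nothing is ever
    deleted from destination.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        # Copy if the backup is missing or older than the original.
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the timestamps
            copied.append(rel)
    return copied
```

You would still need a scheduler (cron, Task Scheduler, or similar) to run something like `backup_changed_files("/Users/me/Documents", "/Volumes/Backup/Documents")` regularly; both paths are placeholders.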

Set up a network attached storage device

This is the geekier version of buying an external hard drive. A storage device is a fine option if you enjoy tinkering with hardware and software, or if you know someone else who does. Get some friends together and have a FreeNAS build party! Serve pizza! It’ll be awesome.

Alternatively, you can invest in a proprietary storage appliance such as a Drobo. They are very shiny.

Use a cloud service

Google, Zoho, Microsoft, and various smaller independent companies and startups offer online writing tools that store your documents in professionally run data centers. There your documents will be preserved, nearly impossible for anyone to destroy (even if you yourself do your best to try). Many other companies such as Dropbox, Box, Crashplan, and Tarsnap don’t provide authoring tools directly, but enable you to sync and preserve files in general.

The advantage of cloud services is that they handle backup in a sort of Platonic ideal way, without requiring conscious intervention on your part. The disadvantage of cloud services is that the companies that provide them tend either to get bored with their current strategy or to go out of business. So if you do choose to use a cloud service, be sure to regularly export the documents you care about to some common, popular format and save them on your local systems. For best results, stick with companies that appear to have sustainable business models, and avoid the ones that are just moonlighting or that are in the process of being prepped for sale by their VCs. Further reading: Maciej Cegłowski’s Don’t Be a Free User, Jason Scott’s F**K THE CLOUD (NSFW language).

Use remote version control

The simplest and most common form of version control is to litter your hard drive with files named things like, “My Story 20-Dec-2014.doc”. (It’s okay if you do this. We’re not judging.) A more elegant solution is to use Windows File History or Mac OS X Time Machine. These tools enable you to “go back in time” and restore a file to a previous state.

If you’re not a huge software nerd, this is a fine place to stop.

However, if you are a huge software nerd (or at least a hobbyist), you might consider using a true version control system such as Git or Subversion. Not only do version control systems provide you with lots of powerful features, but they also enable you to sync your manuscripts to remote repositories such as GitHub or Bitbucket. Bitbucket offers private repos for free, which makes it an appealing choice for authors.

Note that version control systems are not the best place to store binary formats such as Word docs. If you’re going so far as to use Git, consider going even nerdier and writing your books in a plaintext format such as LaTeX, Pandoc Markdown, MultiMarkdown, or reStructuredText.
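As a rough sketch of what syncing to a remote repository looks like in practice, the following Python wraps the three Git commands involved. It assumes Git is installed, the manuscript directory is already a repository, and a remote named `origin` is configured. The function names and the default commit message are invented for the example.

```python
import subprocess


def snapshot_commands(message="Snapshot of manuscript", remote="origin"):
    """Return the git commands that stage, commit, and push everything."""
    return [
        ["git", "add", "--all"],           # stage new, changed, deleted files
        ["git", "commit", "-m", message],  # record a snapshot locally
        ["git", "push", remote, "HEAD"],   # send the current branch offsite
    ]


def snapshot(repo_dir, message="Snapshot of manuscript", remote="origin"):
    """Run the commands in repo_dir, stopping at the first failure.

    Note that git commit exits with an error when there is nothing to commit,
    so an unchanged manuscript raises CalledProcessError here.
    """
    for command in snapshot_commands(message, remote):
        subprocess.run(command, cwd=repo_dir, check=True)
```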

Take a page (ahem) from our Egyptian and Chinese ancestors

Print your final manuscript out on paper. This is a great way to ensure that at least some version of your work will survive a catastrophe — and will likely even be readable decades from now, when all the software we use today is long, long dead.

If all else fails

Perhaps you ignored all the options above, and your only hard drive failed. Or perhaps you backed up your documents all over the place, and the universe just hates you. The good news is, there are companies that specialize in salvaging data from failed and damaged hard drives. The bad news is, using a data recovery service is a lot like casting Raise Dead: it might not work, and it’ll cost you most of your gold.

Remember: any strategy is better than no strategy

This has just been a short overview of a few possible options. There are many more to think about, and it’s up to you to assemble them into a system that makes sense for you.

Admittedly, it’s hard to get religion about backups if you haven’t been bitten yet. In fact, there’s a good chance your lizard brain is telling you, “Well this backup thing doesn’t sound that scary, and hey, by the way, we’re hungry, doesn’t a snack sound nice right around now?” Ignore your lizard brain! Do something!

In the interest of full disclosure, I’ll admit that I’m not perfectly happy with my backup strategy either. For the record, my strategy these days is: Time Machine to a storage device, plus a remote Git server and Dropbox for offsite backup of certain critical projects. That has me covered for some situations. I could be doing a lot more.

What’s your backup strategy? What are you going to do in the next week to make your strategy better, even if just by a little?

What To Do About Language Bullies

I enjoyed this Slate article, “Are You a Language Bully?”, a great deal. The one thing I’d add to it is that most of the things that language bullies like to harp about are actually incorrect when you bother to do a little research. Take the example from the article where an NPR listener complained about using “decimate” to mean “destroying a large portion of” rather than “executing every tenth person.” A quick survey of no less than four dictionaries reveals that every single one of them lists “destroying a large portion of” as a definition, and three out of four list it as the primary definition.

If you’re faced with a language bully, do a quick search through a dictionary or through Language Log, and there’s an excellent chance they’ll have the goods on whatever your bully is complaining about. Your language bully is operating based on whatever feels right to them; the professional linguists who run Language Log enjoy writing Python scripts to crunch through reams of English text in their copious free time.

The other thing about language bullies is that like most bullies, they learned their behavior by being bullied themselves. It might feel good to “punch the bully in the nose” with your answer, but instead, please try to be kind and respectful. With proper guidance, language bullies can be reformed! Believe me, I know.


G: Why did Peter lose his jacket?

Me: Because his buttons were caught.

G: Why were his buttons caught?

Me: Because they were stuck in the gooseberry net.

G: Why was he stuck in the gooseberry net?

Me: Because Mr. MacGregor was chasing him.

G: Why was Mr. MacGregor chasing him?

Me: … because roughly 13 billion years ago, the Big Bang happened, and therefore eventually Mr. MacGregor decided to chase Peter.

G: … what is the Big Bang?

Me: It was a giant explosion that created the universe as we know it.

G: <slaps his own knee as hard as he can> That was a Big Bang!

How the Random House / Hydra Contract Becomes the New Normal (For a While)

John Scalzi has the goods on Random House’s new Hydra vanity press, and the contract really does sound awful.

If you want to publish fiction, there are two main options.

  • Go with a reputable publisher. Advantage: the publisher fronts a bunch of fixed costs around book production and distribution and pays you an advance. Disadvantage: your book might never be published, and the publisher keeps most of the revenue from your book.
  • Self-publish. Advantage: You keep the lion’s share of the revenue. Disadvantage: you get no advance, and you have to front the cost of design, editing, art, etc. (Though if you don’t give a rip about design, editing, art, etc., this cost could be close to zero.)

Random House / Hydra combines the worst aspects of both paths. Unlike going with a publisher, you get no advance, and you end up paying the fixed costs of production. Unlike self-publishing, you only get 50% of the revenue, not 70% or 90% or 97.1%. And critically, unlike either of the above options, you lose your rights over your own work, in perpetuity. Random House assumes no risk and takes 50% of your money — forever. Just unbelievably cynical and awful.

The trick will be making this kind of contract the new normal. Here’s the playbook:

  1. Full court press in industry publications and social media, arguing that the contract is actually fantastic for authors, 50% is like totally way better than 15% you guys duh, and anyone who says otherwise is a fuddy-duddy who doesn’t get The Internet and hates puppies.
  2. Select a handful of authors and spend serious money promoting them, making sure they rack up some respectable sales by hook or by crook. Brag a lot about the top-line numbers.
  3. Wait for shit-rags like Business Insider and Newsweek and Slate to dutifully churn out praise for this brave new business model. Wait for this sentiment to work its way up the media food chain.
  4. Profit???

I dunno. The thing is… if I worked for Random House, and if I were worried about competing with Amazon, Apple, Smashwords etc., I would spend some time thinking about how I could offer a better deal. I’d at least try to match their terms, and maybe add some unique features that the others can’t match. The last thing I would want to do is construct a business that is worse than what my competitors currently offer in every respect, and that promises razor-thin margins for a few more years at best.

It seems like an awfully ugly way to die.

No, Your Son is Probably Not Brain Damaged

But how is it that two-year-old boys just know that they need to run around and shriek, “OOGIE SKOOGIE KAN-NOOGIE!”

Is it nurture? It sure as hell isn’t nurture.

Is it some kind of Jungian collective unconscious… thingy?

Is it genetic? Does it provide some kind of increased chance of survival in the wild? (Because from where I’m standing, quite the opposite.)

Songs for Git Commands

git push

“Salt-N-Pepa — Push It”

Okay, this one was kind of a gimme.

git commit

“Beyonce — Single Ladies (Put A Ring On It)”

If you wanted it, you shoulda put a hash on it!

git log

“Ren & Stimpy — Log”

It’s better than bad, it’s good.

git config

“Eminem — My Name Is”

Got other suggestions? Leave ’em in comments. (Extra points awarded for whatever mad genius comes up with something for git reflog.)

git reflog [new]

“Britney Spears — Baby One More Time”

Thank you, Allen!

git fetch [new]

“Queen — I Want It All”

Thank you, Josh Adell!

git rebase [new]

“Bad-CRC — All Your Base Are Belong To Us”

Thank you, Petey!

Tech Pubs Tuesday: Converting TWiki/Foswiki to Markdown, reStructuredText, or Some Other Sane Format

Once upon a time, there was a young technical writer who dwelled in an elegant cubicle with high, noise-baffling walls. He was handsome and clever, beloved by engineers and product managers throughout the Valley, and could write like the wind itself.

One dark winter’s evening just before close of business, an ancient crone came tottering over to his desk. In her arms she clutched a tattered collection of printouts of the DR DOS 3.31 manual. “Please, young sir,” the crone asked, “could you trouble yourself to help me edit this poor old woman’s manuscript?”

The young technical writer laughed and refused. “I am far too busy to help the likes of you,” he said, by which he meant too busy reading Reddit. “Begone!”

Once more the crone asked the young man for help with her edits, taking care to warn him that things were not always what they seemed on the surface… and once more the young man refused to help her.

BAMF! With a flash of light and an acrid whiff of toner, the crone’s disguise fell away, revealing her true form: the Goddess of Technical Writing herself.

The young man fell onto his knees, begging forgiveness and sobbing, but it was too little, too late. The Goddess faded away, but not before spitting out her most venomous curse: “May all your software manuals forever be written in TWiki!”

So if you’re an unlucky person, you might find yourself stuck with a pile of technical documentation in TWiki or some other baroque “enterprise” wiki. If so, here’s a hacky recipe for getting yourself unstuck.

  1. View a rendered wiki page and select the content div out of the HTML, saving each page as an HTML snippet. In TWiki or Foswiki, the content div typically has an ID of patternMainContents. For a small number of pages, you can use your browser’s element inspector to help you copy and paste the content. For a large number of pages, you can try automating this process with curl and some custom script that strips off everything that isn’t the content div.

  2. Run all resulting HTML snippets through /usr/bin/tidy.

  3. Download and install pandoc.

  4. Run all tidied HTML snippets through pandoc, converting them to Markdown, RST, or the output format of your choice.
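The extraction and conversion steps above can be sketched in Python using only the standard library plus a pandoc executable on your PATH. This is a minimal illustration, not a polished tool: the class and function names are made up, it skips the tidy step, and it assumes the wiki emits well-nested markup (an unclosed `<p>` or `<li>` would throw off the depth counting).

```python
import html
import subprocess
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentDivExtractor(HTMLParser):
    """Collect the markup inside the first <div> with the given id."""

    def __init__(self, div_id="patternMainContents"):
        super().__init__(convert_charrefs=True)
        self.div_id = div_id
        self.depth = 0      # 0 means we are outside the content div
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if tag == "div" and dict(attrs).get("id") == self.div_id:
                self.depth = 1
            return
        self.parts.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> never change the depth.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done or tag in VOID_TAGS:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # reached the content div's own </div>
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(html.escape(data, quote=False))


def extract_content(page_html, div_id="patternMainContents"):
    """Return the inner HTML of the content div, or "" if it is absent."""
    parser = ContentDivExtractor(div_id)
    parser.feed(page_html)
    parser.close()
    return "".join(parser.parts)


def html_to_markdown(snippet):
    """Convert an HTML snippet to Markdown by piping it through pandoc."""
    result = subprocess.run(
        ["pandoc", "--from", "html", "--to", "markdown"],
        input=snippet, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For each saved page you would call something like `html_to_markdown(extract_content(page_html))` and write the result to a file. Swapping `markdown` for `rst` in the pandoc arguments gives reStructuredText instead.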

There’s a good chance the conversion won’t go all the way, particularly if your documentation used any TWiki plugins that rely on JavaScript (like the “Twisty” plugin). Look for sections that didn’t convert properly and fix them up by hand.

The main takeaway is that you don’t want to mess with parsing the horrible native wiki format. Let the wiki do its thing and just get the resulting HTML — that’s something you can work with.

Tech Pubs Tuesday: Catching Regressions in Writing

Like everyone else, I make plenty of huge embarrassing typos. Over the years, I’ve tried to discover patterns in how and when I make basic errors.

Perhaps the most common scenario is when I start out with a paragraph that’s correct, and then decide to wordsmith a sentence or clause within that paragraph. For me at least, this is a huge DANGER ZONE moment — there’s an excellent chance that fixing that little passage will actually spawn more errors.

Just to pick a random example, I might start with a paragraph like,

Perhaps one of the most common scenarios is when I start out with a paragraph that’s correct, and then decide to wordsmith a sentence or clause within that paragraph. For me at least, this is a huge DANGER ZONE moment — there’s an excellent chance that fixing that little passage will actually spawn more errors.

Then I decide that the lead sentence would be stronger if I said, “the most common” instead of the more hedging, “one of the most common.” So I change the paragraph:

Perhaps the most common scenarios is when I start out with a paragraph that’s correct, and then decide …

Oops! I forgot to change “scenarios” back to singular. There’s something about going back and focusing on a tiny section of text that makes me forget about overall coherence at the sentence and paragraph level.

If this kind of thing trips you up as well, you can do one of two things.

Option A is to do what many professional writers tell you to do: write your entire rough draft without any editing, then go back and do a big editing sweep. Editing entire passages all at once means you’re less likely to make this kind of coherence error.

Unfortunately, I’ve never written that way. I am a writer who backtracks. Always have been, always will be.

So Option B is to cultivate a sense of hyper-awareness anytime you go back and refactor something you’ve just written. This is the approach I take. If you go back and fix up something you’ve just written, the hairs on the back of your neck should rise.

If it helps, you can think of it this way:

  • each sentence in a paragraph is like a function
  • all the sentences in a paragraph work together like a collection of functions in a small class or module
  • if you tweak a sentence that already works, that’s like changing the arguments or the return value of a function that already works. So… are you sure you didn’t screw up something else that depends on that sentence?

You might be wondering, “Are there test suites for sentences?” In fact, yes, there are! These test suites are called “technical copyeditors,” and they’re very expensive to run (even once).

So in lieu of that, your next best option is to go back and eyeball the rest of the paragraph. At a minimum, carefully read the preceding and following clauses for tense agreement, etc. That should catch most of these “regression” style writing errors, but keep in mind that the best defense is always to get another pair of eyes on your work.

Tech Pubs Tuesday: Don’t Write Directly in HTML

Since most of the documentation you produce is going to be hosted as HTML anyway, you might be wondering: why not just cut out the middleman and write all your documentation in plain HTML? Instead of learning some weird intermediate format that transforms to HTML, wouldn’t it make sense to handcraft whatever markup, CSS, and JS that you need directly?

Actually, since most documentation tools and formats range from mediocre to awful, plain HTML isn’t necessarily a bad choice. I’ve seen some beautiful documentation authored in handcrafted HTML. That said, using a more specialized format will make things easier on you. Here’s why.

First, typing out HTML open and close tags is annoying and breaks your writing flow, even if you have a good text editor or IDE. If you’ve used a lightweight markup language like Markdown or reStructuredText, you know the difference. Creating a new paragraph with newline, newline is more natural than angle-bracket, p, angle-bracket. Creating a bullet list with leading *s is much more natural than typing <ul>s and <li>s.

Second, maybe you do want multiple output formats! Are you sure you don’t want PDF? What about man pages? If you want man pages, you’re going to have to invent some mechanism for transforming your HTML into TROFF (a venial sin), or you’re going to have to write the same material twice (a mortal sin).

Third, and perhaps most important, HTML is deliberately primitive and general-purpose. It lacks the semantics that you want for describing technical documentation. For example, when writing a sophisticated book, you might want things like:

  • Real cross-references. You want a link to a section, an example, or a table that updates itself automatically when its target moves or changes its title.
  • Fancy admonitions (warnings, dangers, cautions, notes, tips)
  • Fancy titled tables and figures
  • Fancy titled code examples
  • Automatically generated tables of contents, lists of figures, lists of examples…
  • Automatically generated glossaries and indexes. (Okay, who am I kidding? Nobody cares about indexes anymore. Sniff.)
  • File inclusions (raw, interpreted, syntax highlighted or not, with line numbers or not, …)
  • Replaceable text
  • Conditional text (generating different aspects of the book from the same source)

… and so on.

I think my bottom line is, you can get away with writing a small amount of documentation in plain HTML, such as a README or a short install guide. But the larger your book, the harder this gets. The pattern you really want to avoid is:

  1. Start authoring a substantial book in HTML.
  2. Part way through, discover that you need some of the features on the list above.
  3. Start hacking those features in with some kind of special ad-hoc syntax. No worries — it’s not too hard, it’s just one or two “special tags” or “macros”…
  4. Eventually end up re-inventing a bad version of reStructuredText or Pandoc-flavored Markdown. Except with way more angle brackets.

Don’t be that guy.