Why Oh Why Does Documentation Software Suck?

I find myself this Saturday in the possession of a half-full pitcher of mojito. This is something of a problem, given that I need that very pitcher to make mojitos for tomorrow’s Sunday barbecue. So I have been doing my best this afternoon to rectify the problem. I only bring this up so that if this post seems less coherent than usual, you’ll know it’s because of the Demon Rum. In vino veritas, and all that.

So. In the course of my job, I need to produce documentation that falls into these basic types:

  • API documentation: a terse reference for the classes and methods available for a particular C++/Java/PHP/whatever library.
  • Man pages: a terse reference for the commands and options available for a particular command-line tool.
  • User guides: conceptual information and examples, written around the relevant API documentation and man pages.

And I need to produce said documentation in the following formats:

  • HTML: the primary format for modern documentation. At my very first job, we produced our documentation as very nice perfect-bound 7″×9″ manuals using FrameMaker. That era is long gone.
  • PDF: in case someone needs to print the documentation.
  • troff: man page format, suitable for installing into /usr/share/man/ or wherever man pages go. To be honest, I’m somewhat confused about the difference between troff, nroff, and other *off variations. But I suppose I shouldn’t worry my pretty little tech writer head over such things.
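For the curious, a man page source file is just plain text written with the troff `man` macro package (nroff and troff read the same input; nroff formats it for a terminal, troff for a typesetter). The Python sketch here is illustrative only: the tool name `frob` and the helpers `escape` and `render_man` are hypothetical, and it covers just the handful of macros a minimal page needs (`.TH`, `.SH`, `.TP`, `.B`).

```python
def escape(text: str) -> str:
    """Escape text for troff: double backslashes and guard a leading control character."""
    text = text.replace("\\", "\\\\")
    if text.startswith((".", "'")):
        # A line starting with '.' or "'" would be read as a request; \& is a zero-width guard.
        text = "\\&" + text
    return text


def render_man(name: str, section: int, summary: str, synopsis: str,
               options: list[tuple[str, str]]) -> str:
    """Render a minimal man page using the troff `man` macros."""
    lines = [
        f".TH {name.upper()} {section}",           # title header: page name and section
        ".SH NAME",
        f"{escape(name)} \\- {escape(summary)}",   # \- renders as a real minus/dash
        ".SH SYNOPSIS",
        f".B {escape(name)}",
        escape(synopsis),
        ".SH OPTIONS",
    ]
    for flag, description in options:
        # Option hyphens are written as \- so they render as minus signs.
        tag = escape(flag).replace("-", "\\-")
        # .TP starts a tagged paragraph: the next line is the tag, then the body.
        lines += [".TP", f".B {tag}", escape(description)]
    return "\n".join(lines) + "\n"


page = render_man("frob", 1, "frobnicate files", "[-v] file...",
                  [("-v", "Print verbose output.")])
print(page)
```

Saving the result as `frob.1` and running `nroff -man frob.1` should render it for a terminal.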

For engineering documentation, I don’t think these types and formats are all that shocking. There are thousands of writers and engineers who are faced with the same problem every day. And yet there is no documentation technology that can handle all of these documentation types and output formats seamlessly. None.

AuthorIT, FrameMaker + WebWorks, and other mid-range tech writing tools can at least produce HTML and PDF output. All of these tools are Windows-only. All use a proprietary binary format. None handles man pages or API documentation generated from source code. (We won’t even mention Microsoft Word, which still hasn’t figured out how to do ordered lists consistently, or handle documents longer than 100 pages.)

The only toolchain I’m aware of that even comes close is Docbook. It’s text/xml, so it plays nicely with UNIX. It doesn’t require an expensive client to edit. It can produce output in myriad formats, including HTML, PDF, and man pages. It’s open source. It’s modular (with XInclude). It is the only documentation toolchain that even approaches the holy grail of user guides, API guides, and man pages.
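To make the XInclude point concrete: a master file can pull in chapter files with `xi:include`, and the processor splices them in before transformation (with `xmllint --xinclude` or `xsltproc --xinclude`, for instance). This sketch mimics that step with Python’s standard `xml.etree.ElementInclude` module. The file names and chapter contents are made up and held in memory, and the markup is DocBook 4-style (no DocBook namespace).

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter files, held in memory so the sketch is self-contained.
CHAPTERS = {
    "intro.xml": "<chapter><title>Introduction</title><para>Start here.</para></chapter>",
    "api.xml": "<chapter><title>API Reference</title><para>Classes and methods.</para></chapter>",
}

# A DocBook 4-style master file that pulls its chapters in via XInclude.
BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>User Guide</title>
  <xi:include href="intro.xml"/>
  <xi:include href="api.xml"/>
</book>"""


def load_from_memory(href, parse, encoding=None):
    """Loader hook for ElementInclude: return an element (parse="xml") or raw text."""
    text = CHAPTERS[href]
    return ET.fromstring(text) if parse == "xml" else text


def assemble(master_xml):
    root = ET.fromstring(master_xml)
    # Replaces each xi:include element, in place, with the chapter it points to.
    ElementInclude.include(root, loader=load_from_memory)
    return root


book = assemble(BOOK)
print([ch.findtext("title") for ch in book.findall("chapter")])
# ['Introduction', 'API Reference']
```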

Except… there’s no such thing as “out-of-the-box” Docbook: you need to pick your editor, XSLT processor, FO processor, and template customizations, and there is very little guidance on how to do this.
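Here is a sketch of what those choices can look like once made: xsltproc as the XSLT processor, Apache FOP as the FO processor, and the stock DocBook XSL stylesheets. The install path in `DOCBOOK_XSL` is an assumption (it varies by system), and `build_commands` is a hypothetical helper that only assembles the command lines; you would run each one in order with `subprocess.run(argv, check=True)`. The `html.stylesheet` parameter makes the generated HTML link to a CSS file of your choosing, and the man page step expects `refentry` elements in the source.

```python
import shlex

# Assumption: where the DocBook XSL stylesheets live on this system.
DOCBOOK_XSL = "/usr/share/xml/docbook/stylesheet/docbook-xsl"


def build_commands(source, css="docbook.css"):
    """Return command lines (argv lists) for HTML, FO, PDF, and man page output."""
    stem = source.rsplit(".", 1)[0]
    return {
        # XSLT: DocBook -> HTML, resolving XIncludes and linking a CSS file.
        "html": ["xsltproc", "--xinclude",
                 "--stringparam", "html.stylesheet", css,
                 "-o", f"{stem}.html", f"{DOCBOOK_XSL}/html/docbook.xsl", source],
        # XSLT: DocBook -> XSL-FO, the intermediate format for print.
        "fo": ["xsltproc", "--xinclude",
               "-o", f"{stem}.fo", f"{DOCBOOK_XSL}/fo/docbook.xsl", source],
        # FO processor: XSL-FO -> PDF.
        "pdf": ["fop", "-fo", f"{stem}.fo", "-pdf", f"{stem}.pdf"],
        # XSLT: refentry elements -> troff man pages in the current directory.
        "man": ["xsltproc", "--xinclude", f"{DOCBOOK_XSL}/manpages/docbook.xsl", source],
    }


for step, argv in build_commands("guide.xml").items():
    print(f"{step:5} {shlex.join(argv)}")
```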

Except… the default HTML output looks like something out of 1993. Basically, the output is nicely marked-up semantic HTML with no CSS whatsoever. Which is fine, except that this means you’re going to have to sink some time into making the HTML look pretty.

Except… PDF output is really buggy, mostly because the major open source FO processor is still in beta status. Not that I blame them — XSL-FO is hard, and typesetting in general is really hard. But the alternative is to buy a commercial FO processor for $4000/CPU… grrrr…

Except… in general, source code documentation generators do not integrate with Docbook. For Java code, there’s a Javadoc doclet that produces Docbook (yay!). For PHP code, phpdocumentor can generate Docbook natively (yay again!). But for C++, Perl, Python, and other languages, you’re screwed.
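To show roughly what such a generator has to do for Python, the standard `inspect` module exposes function signatures and docstrings, which can then be wrapped in DocBook elements. This is a crude sketch, not a real generator: `module_to_docbook` and the `demo` module are hypothetical, it ignores classes and cross-references, and docstrings go in as plain, unparsed text.

```python
import inspect
import types
import xml.etree.ElementTree as ET


def module_to_docbook(module: types.ModuleType) -> str:
    """Render a module's public functions as a DocBook <section> with a <variablelist>."""
    section = ET.Element("section")
    ET.SubElement(section, "title").text = f"Module {module.__name__}"
    vlist = ET.SubElement(section, "variablelist")
    # getmembers returns (name, value) pairs sorted by name; note that functions
    # imported into the module are included too.
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        entry = ET.SubElement(vlist, "varlistentry")
        term = ET.SubElement(entry, "term")
        ET.SubElement(term, "function").text = f"{name}{inspect.signature(func)}"
        item = ET.SubElement(entry, "listitem")
        ET.SubElement(item, "para").text = inspect.getdoc(func) or "Undocumented."
    return ET.tostring(section, encoding="unicode")


# Demo with a throwaway module built at runtime (hypothetical contents).
def greet(name, punct="!"):
    """Return a greeting for *name*."""
    return "Hello, " + name + punct


demo = types.ModuleType("demo")
demo.greet = greet
demo._helper = lambda: None  # private: skipped by the generator
print(module_to_docbook(demo))
```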

Why oh why does documentation software suck?

6 thoughts on “Why Oh Why Does Documentation Software Suck?”

  1. Hmmm, looks interesting, and relatively simple compared to Docbook and LaTeX. Are there more examples of the HTML output besides the docutils.sourceforge.net page itself? What does the output PDF look like? How customizable is the output? Does it support index entries, and lists of tables and figures?

  2. I’m afraid I don’t have a PDF example handy, but here’s my documentation for the PEAR Log package:

    http://www.indelible.org/pear/Log/

    The HTML is primarily customized via stylesheets; the entities are well-defined.

    All of the Python PEPs (Python Enhancement Proposals) are rendered with docutils, using another custom stylesheet:

    http://www.python.org/dev/peps/

    Essentially, if you know some Python, you can write a new Writer implementation that takes your document tree and outputs it in whatever format you want. Fortunately, most common writers are available, but if you need that added customization capability, investing some time up front in writing your own Writer might be worth it.

  3. For a Docbook editor, there’s Vex, a specialised version of Eclipse. I have only played with it, though, and that was many versions ago.

    But I know what you’re saying. I have largely given up fighting this battle. For inline API docs I use phpdocumentor or an equivalent, but for “standalone” end-user docs, I find the lesser of all evils is Perl’s Pod markup. It’s got some quirks but is mainly easy to work with, plus you can convert it to pretty much anything (e.g. man pages). A bunch of converters come with Perl while more live on CPAN; even the BBC has contributed!

  4. Aha, I had looked at Vex earlier — it looked promising, but there didn’t seem to be a build for OS X. 🙁

    Actually, the editing situation for Docbook isn’t so bad. There are two other cross-platform semi-WYSIWYG Docbook editors out there: the first is Syntext’s Serna, and the second is XMLMind’s XMLEditor. Serna’s user interface is beyond awful, but XMLEditor turned out to be pretty usable, and the Standard Edition is free and is pretty much all you need for plain editing. XMLMind’s main weakness is that if you need to insert some complicated sequence of Docbook elements, you’d much rather be looking at the raw markup. But 95% of the time, the WYSIWYG interface is good.

    As for POD, I could rant about its deficiencies for a while. 🙂 The very very short version of my main problem with POD is that its semantics are just too simple for *general* technical documentation. Docbook’s semantics are too complex, which is why everyone who uses Docbook eventually pares down the schema.

    POD also got on my bad side when I tracked down a documentation bug we were having and discovered that, for the same version of Perl, pod2html on Red Hat generated wildly different markup from pod2html on FreeBSD. The custodian of the Red Hat version had arbitrarily decided to change the output to “XHTML”. Why, I don’t know. Not only is the output not even remotely valid XHTML, but the structure of the markup changed, which totally fux0red the display of our man pages. Bad idea, bad implementation, not backwards-compatible, not even consistent with other versions of Perl. Gah.
