Documentation Generator

Generating documentation from comments embedded in code

Terms defined: accumulator, block comment, doc comment, line comment, slug

Many programmers believe they're more likely to write documentation and keep it up to date if it is close to the code. Tools that extract specially-formatted comments from code and turn them into documentation have been around since the 1980s; many are used for JavaScript, including JSDoc and ESDoc (which we use to document the tools that build this book). This chapter will use what we now know about parsing source code to build a simple documentation generator of our own.

How can we extract documentation comments?

Once again we will use Acorn to extract comments by providing an onComment option to the parser with an array for it to fill in. For the moment we won't bother to assign the AST produced by parsing to a variable because we are just interested in the comments:

import fs from 'fs'
import acorn from 'acorn'

const text = fs.readFileSync(process.argv[2], 'utf-8')
const options = {
  sourceType: 'module',
  locations: true,
  onComment: []
}
acorn.parse(text, options)
console.log(JSON.stringify(options.onComment, null, 2))
// double-slash comment
/* slash-star comment */
[
  {
    "type": "Line",
    "value": " double-slash comment",
    "start": 0,
    "end": 23,
    "loc": {
      "start": {
        "line": 1,
        "column": 0
      },
      "end": {
        "line": 1,
        "column": 23
      }
    }
  },
  {
    "type": "Block",
    "value": " slash-star comment ",
    "start": 24,
    "end": 48,
    "loc": {
      "start": {
        "line": 2,
        "column": 0
      },
      "end": {
        "line": 2,
        "column": 24
      }
    }
  }
]

There is more information here than we need, so let's slim down the JSON that we extract:

import fs from 'fs'
import acorn from 'acorn'

const text = fs.readFileSync(process.argv[2], 'utf-8')
const options = {
  sourceType: 'module',
  locations: true,
  onComment: []
}
acorn.parse(text, options)
const subset = options.onComment.map(entry => {
  return {
    type: entry.type,
    value: entry.value,
    start: entry.loc.start.line,
    end: entry.loc.end.line
  }
})
console.log(JSON.stringify(subset, null, 2))
node extract-comments-subset.js two-kinds-of-comment.js
[
  {
    "type": "Line",
    "value": " double-slash comment",
    "start": 1,
    "end": 1
  },
  {
    "type": "Block",
    "value": " slash-star comment ",
    "start": 2,
    "end": 2
  }
]

Acorn distinguishes two kinds of comments. Line comments cannot span multiple lines; if one line comment occurs immediately after another, Acorn reports two comments:

//
// multi-line double-slash comment
//
node extract-comments-subset.js multi-line-double-slash-comment.js
[
  {
    "type": "Line",
    "value": "",
    "start": 1,
    "end": 1
  },
  {
    "type": "Line",
    "value": " multi-line double-slash comment",
    "start": 2,
    "end": 2
  },
  {
    "type": "Line",
    "value": "",
    "start": 3,
    "end": 3
  }
]
Line and block comments
How line comments and block comments are distinguished and represented.

Block comments, on the other hand, can span any number of lines. We don't need to prefix each line with *, but most people do for readability:

/*
 * multi-line slash-star comment
 */
node extract-comments-subset.js multi-line-slash-star-comment.js
[
  {
    "type": "Block",
    "value": "\n * multi-line slash-star comment\n ",
    "start": 1,
    "end": 3
  }
]

By convention, we use block comments that start with /** for documentation. The first two characters are recognized by the parser as "start of comment", so the first character in the extracted text is *:

/**
 * doc comment
 */
[
  {
    "type": "Block",
    "value": "*\n * doc comment\n ",
    "start": 1,
    "end": 3
  }
]

What input will we try to handle?

We will use [Markdown][markdown] for formatting our documentation. The documentation for function definitions looks like this:

/**
 * # Demonstrate documentation generator.
 */

import util from './util-plain'

/**
 * ## `main`: Main driver.
 */
const main = () => {
  // Parse arguments.
  // Process input stream.
}

/**
 * ## `parseArgs`: Parse command line.
 * - `args` (`string[]`): arguments to parse.
 * - `defaults` (`Object`): default values.
 *
 * Returns: program configuration object.
 */
const parseArgs = (args, defaults) => {
  // body would go here
}

/**
 * ## `process`: Transform data.
 * - `input` (`stream`): where to read.
 * - `output` (`stream`): where to write.
 * - `op` (`class`): what to do.
 *    Use @BaseProcessor unless told otherwise.
 */
const process = (input, output, op = util.BaseProcessor) => {
  // body would go here
}

while the documentation for class definitions looks like this:

/**
 * # Utilities to demonstrate doc generator.
 */

/**
 * ## `BaseProcessor`: General outline.
 */
class BaseProcessor {
  /**
   * ### `constructor`: Build processor.
   */
  constructor () {
    // body would go here
  }

  /**
   * ### `run`: Pass input to output.
   * - `input` (`stream`): where to read.
   * - `output` (`stream`): where to write.
   */
  run (input, output) {
    // body would go here
  }
}

export default BaseProcessor

The embedded comments are rather unpleasant at the moment: the function and method names from the code are repeated in the doc comments, we have to create titles ourselves, and we have to remember the back-quotes for formatting code. We will fix some of these problems once we have a basic tool up and running.

There are many [Markdown][markdown] parsers in JavaScript; after a bit of experimentation, we decided to use markdown-it along with the markdown-it-anchor extension that creates HTML anchors for headings. The main program gets all the doc comments from all of the input files, converts the Markdown to HTML, and displays that:

const main = () => {
  const allComments = getAllComments(process.argv.slice(2))
  const md = new MarkdownIt({ html: true })
    .use(MarkdownAnchor, { level: 1, slugify: slugify })
  const html = md.render(allComments)
  console.log(html)
}

To get all the comments we extract comments from all the files, remove the leading * characters (which aren't part of the documentation), and then join up the results after stripping off extraneous blanks:

const getAllComments = (allFilenames) => {
  return allFilenames
    .map(filename => {
      const comments = extractComments(filename)
      return { filename, comments }
    })
    .map(({ filename, comments }) => {
      comments = comments.map(comment => removePrefix(comment))
      return { filename, comments }
    })
    .map(({ filename, comments }) => {
      const combined = comments
        .map(comment => comment.stripped)
        .join('\n\n')
      return `# ${filename}\n\n${combined}`
    })
    .join('\n\n')
}
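The shape of this pipeline can be seen more easily on plain data than on real files. The filenames and comment text below are made up for illustration; the map-then-join structure is the same as in getAllComments:

```javascript
// Hypothetical data standing in for extracted, prefix-stripped comments.
const perFile = [
  { filename: 'a.js', comments: ['first comment', 'second comment'] },
  { filename: 'b.js', comments: ['third comment'] }
]

// Map each file to a Markdown chunk with a level-1 header,
// then join the chunks with blank lines between them.
const combined = perFile
  .map(({ filename, comments }) => {
    const body = comments.join('\n\n')
    return `# ${filename}\n\n${body}`
  })
  .join('\n\n')

console.log(combined)
```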

Extracting the comments from a single file is done as before:

const extractComments = (filename) => {
  const text = fs.readFileSync(filename, 'utf-8')
  const options = {
    sourceType: 'module',
    locations: true,
    onComment: []
  }
  acorn.parse(text, options)
  const subset = options.onComment
    .filter(entry => entry.type === 'Block')
    .map(entry => {
      return {
        type: entry.type,
        value: entry.value,
        start: entry.start,
        end: entry.end
      }
    })
  return subset
}

and removing the prefix * characters is a matter of splitting the text into lines, removing the leading spaces and asterisks, and putting the lines back together:

const removePrefix = (comment) => {
  comment.stripped = comment.value
    .split('\n')
    .slice(0, -1)
    .map(line => line.replace(/^ *\/?\* */, ''))
    .map(line => line.replace('*/', ''))
    .join('\n')
    .trim()
  return comment
}
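As a quick check of the stripping logic, we can feed a comment object shaped like Acorn's output for a doc comment through this function. The definition is repeated here so the snippet runs on its own:

```javascript
// Repeats removePrefix so the snippet is self-contained.
const removePrefix = (comment) => {
  comment.stripped = comment.value
    .split('\n')
    .slice(0, -1)
    .map(line => line.replace(/^ *\/?\* */, ''))
    .map(line => line.replace('*/', ''))
    .join('\n')
    .trim()
  return comment
}

// The value Acorn extracts for:
// /**
//  * doc comment
//  */
const comment = { value: '*\n * doc comment\n ' }
console.log(removePrefix(comment).stripped) // "doc comment"
```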

One thing that isn't in this file (because we're going to use it in later versions) is the function slugify. A slug is a short string that identifies a header or a web page; the name comes from the era of newspapers, where a slug was a short name used to identify an article while it was in production. Our slugify function strips unnecessary characters out of a title, adds hyphens, and generally makes it something you might see in a URL:

const slugify = (text) => {
  return encodeURIComponent(
    text.split(' ')[0]
      .replace(/\.js$/, '')
      .trim()
      .toLowerCase()
      .replace(/[^ \w]/g, '')
      .replace(/\s+/g, '-')
  )
}

export default slugify
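Running slugify by hand shows how a filename or a title becomes an anchor ID (the definition is repeated here so the snippet is self-contained):

```javascript
// Repeats slugify so the snippet runs on its own.
const slugify = (text) => {
  return encodeURIComponent(
    text.split(' ')[0]
      .replace(/\.js$/, '')
      .trim()
      .toLowerCase()
      .replace(/[^ \w]/g, '')
      .replace(/\s+/g, '-')
  )
}

// The '.js' suffix and the hyphen (a non-word character) are stripped.
console.log(slugify('example-plain.js'))                     // "exampleplain"
// Only the first word of a title is used.
console.log(slugify('Demonstrate documentation generator.')) // "demonstrate"
```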

Let's run the first version of our documentation generator and see what it produces:

node process-plain.js example-plain.js util-plain.js
<h1 id="exampleplain">example-plain.js</h1>
<h1 id="demonstrate">Demonstrate documentation generator.</h1>
<h2 id="main"><code>main</code>: Main driver.</h2>
<h2 id="parseargs"><code>parseArgs</code>: Parse command line.</h2>
<ul>
<li><code>args</code> (<code>string[]</code>): arguments to parse.</li>
<li><code>defaults</code> (<code>Object</code>): default values.</li>
</ul>
<p>Returns: program configuration object.</p>
<h2 id="process"><code>process</code>: Transform data.</h2>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
<li><code>op</code> (<code>class</code>): what to do.
Use @BaseProcessor unless told otherwise.</li>
</ul>
<h1 id="utilplain">util-plain.js</h1>
<h1 id="utilities">Utilities to demonstrate doc generator.</h1>
<h2 id="baseprocessor"><code>BaseProcessor</code>: General outline.</h2>
<h3 id="constructor"><code>constructor</code>: Build processor.</h3>
<h3 id="run"><code>run</code>: Pass input to output.</h3>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
</ul>

example-plain.js

Demonstrate documentation generator.

main: Main driver.

parseArgs: Parse command line.

Returns: program configuration object.

process: Transform data.

util-plain.js

Utilities to demonstrate doc generator.

BaseProcessor: General outline.

constructor: Build processor.

run: Pass input to output.

Mapping comments to documentation
How comments in code map to documentation in HTML.

Again, there is a lot of room for improvement: there is a double h1 header for each file (the filename and the title comment), the anchor IDs are hard to read, there are no cross-references, and so on. Some of the visual issues can be resolved with CSS, and we can change our input format to make processing easier as long as it also makes authoring easier. However, anything that is written twice will eventually be wrong in one place or another, so our first priority is to remove duplication.

How can we avoid duplicating function names?

If a comment is the first thing in a file, we want to use it as title text; this will save us having to write an explicit level-1 title in a comment. For each other comment, we can extract the name of the function or method from the node on the line immediately following the doc comment. This allows us to write much tidier comments:

/**
 * Overall file header.
 */

/**
 * Double the input.
 */
const double = (x) => 2 * x

/**
 * Triple the input.
 */
function triple (x) {
  return 3 * x
}

/**
 * Define a class.
 */
class Example {
  /**
   * Method description.
   */
  someMethod () {
  }
}

To extract and display information from nodes immediately following doc comments we must find all the block comments, record the last line of each, and then recurse through the AST to find the node on line immediately following any of those last comment lines. The main program finds the comments as usual, creates a set containing the line numbers we are looking for, then searches for the nodes we want:

const main = () => {
  const options = {
    sourceType: 'module',
    locations: true,
    onComment: []
  }
  const text = fs.readFileSync(process.argv[2], 'utf-8')
  const ast = acorn.parse(text, options)
  const comments = options.onComment
    .filter(entry => entry.type === 'Block')
    .map(entry => {
      return {
        value: entry.value,
        start: entry.loc.start.line,
        end: entry.loc.end.line
      }
    })
  const targets = new Set(comments.map(comment => comment.end + 1))
  const nodes = []
  findFollowing(ast, targets, nodes)
  console.log(nodes.map(node => condense(node)))
}

The recursive search is straightforward as well---the only new trick is that we delete line numbers from the target set and add nodes to the accumulator as we find matches:

const findFollowing = (node, targets, accum) => {
  if ((!node) || (typeof node !== 'object') || (!('type' in node))) {
    return
  }

  if (targets.has(node.loc.start.line)) {
    accum.push(node)
    targets.delete(node.loc.start.line)
  }

  for (const key in node) {
    if (Array.isArray(node[key])) {
      node[key].forEach(child => findFollowing(child, targets, accum))
    } else if (typeof node[key] === 'object') {
      findFollowing(node[key], targets, accum)
    }
  }
}
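We can exercise this search with a hand-built miniature "AST" instead of real Acorn output. The node shapes below are simplified but carry the fields findFollowing actually reads (the definition is repeated so the snippet runs on its own):

```javascript
// Repeats findFollowing so the snippet is self-contained.
const findFollowing = (node, targets, accum) => {
  if ((!node) || (typeof node !== 'object') || (!('type' in node))) {
    return
  }
  if (targets.has(node.loc.start.line)) {
    accum.push(node)
    targets.delete(node.loc.start.line)
  }
  for (const key in node) {
    if (Array.isArray(node[key])) {
      node[key].forEach(child => findFollowing(child, targets, accum))
    } else if (typeof node[key] === 'object') {
      findFollowing(node[key], targets, accum)
    }
  }
}

// A hand-built stand-in for Acorn's output: only 'type', 'loc',
// and child arrays matter to the search.
const ast = {
  type: 'Program',
  loc: { start: { line: 1 } },
  body: [
    { type: 'VariableDeclaration', loc: { start: { line: 4 } } },
    { type: 'FunctionDeclaration', loc: { start: { line: 9 } } }
  ]
}

const targets = new Set([4]) // the line right after a doc comment
const accum = []
findFollowing(ast, targets, accum)
console.log(accum.map(node => node.type)) // [ 'VariableDeclaration' ]
```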

Finally, we use a function called condense to get the name we want out of the AST we have:

const condense = (node) => {
  const result = {
    type: node.type,
    start: node.loc.start.line
  }
  switch (node.type) {
    case 'VariableDeclaration':
      result.name = node.declarations[0].id.name
      break
    case 'FunctionDeclaration':
      result.name = node.id.name
      break
    case 'ClassDeclaration':
      result.name = node.id.name
      break
    case 'MethodDefinition':
      result.name = node.key.name
      break
    default:
      assert.fail(`Unknown node type ${node.type}`)
      break
  }
  return result
}

We need this because we get a different structure with:

const name = () => {
}

than we get with:

function name() {
}

When we run this on our test case we get:

[
  { type: 'VariableDeclaration', start: 8, name: 'double' },
  { type: 'FunctionDeclaration', start: 13, name: 'triple' },
  { type: 'ClassDeclaration', start: 20, name: 'Example' },
  { type: 'MethodDefinition', start: 24, name: 'someMethod' }
]

We can use this to create better output:

import MarkdownIt from 'markdown-it'
import MarkdownAnchor from 'markdown-it-anchor'

import getComments from './get-comments.js'
import getDefinitions from './get-definitions.js'
import fillIn from './fill-in.js'
import slugify from './slugify.js'

const main = () => {
  const filenames = process.argv.slice(2)
  const allComments = getComments(filenames)
  const allDefinitions = getDefinitions(filenames)
  const combined = []
  for (const [filename, comments] of allComments) {
    const definitions = allDefinitions.get(filename)
    const text = fillIn(filename, comments, definitions)
    combined.push(text)
  }
  const md = new MarkdownIt({ html: true })
    .use(MarkdownAnchor, { level: 1, slugify: slugify })
  const html = md.render(combined.join('\n\n'))
  console.log(html)
}

main()
<h1 id="fillinheadersinput">fill-in-headers-input.js</h1>
<p>Demonstrate documentation generator.</p>
<h2 id="main">main</h2>
<p>Main driver.</p>
<h2 id="parseargs">parseArgs</h2>
<p>Parse command-line arguments.</p>
<ul>
<li><code>args</code> (<code>string[]</code>): arguments to parse.</li>
<li><code>defaults</code> (<code>Object</code>): default values.</li>
</ul>
<blockquote>
<p>Program configuration object.</p>
</blockquote>
<h2 id="baseprocessor">BaseProcessor</h2>
<p>Default processing class.</p>
<h3 id="constructor">constructor</h3>
<p>Build base processor.</p>
<h3 id="run">run</h3>
<p>Pass input to output.</p>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
</ul>

fill-in-headers-input.js

Demonstrate documentation generator.

main

Main driver.

parseArgs

Parse command-line arguments.

Program configuration object.

BaseProcessor

Default processing class.

constructor

Build base processor.

run

Pass input to output.

Exercises

Building an index

Modify the documentation generator to produce an alphabetical index of all classes and methods found. Index entries should be hyperlinks to the documentation for the corresponding item.

Documenting exceptions

Extend the documentation generator to allow people to document the exceptions that a function throws.

Deprecation warning

Add a feature to the documentation generator to allow authors to mark functions and methods as deprecated (i.e., to indicate that while they still exist, they should not be used because they are being phased out).

Usage examples

Enhance the documentation generator so that if a horizontal rule --- appears in a documentation comment, the text following it is typeset as a usage example. (A doc comment may contain several usage examples.)

Unit testing

Write unit tests for the documentation generator using Mocha.

Summarizing functions

Modify the documentation generator so that line comments inside a function that use //* are formatted as a bullet list in the documentation for that function.

Cross referencing

Modify the documentation generator so that the documentation for one class or function can include Markdown links to other classes or functions.

Data types

Modify the documentation generator to allow authors to define new data types in the same way as JSDoc.

Inline parameter documentation

Some documentation generators put the documentation for a parameter on the same line as the parameter:

/**
 * Transform data.
 */
function process(
  input,  /*- {stream} where to read */
  output, /*- {stream} where to write */
  op      /*- {Operation} what to do */
){
  // body would go here
}

Modify the documentation generator to handle this.

Tests as documentation

The doctest library for Python allows programmers to embed unit tests as documentation in their programs. Write a tool that:

  1. Finds functions that start with a block comment.

  2. Extracts the code and output from those block comments and turns them into assertions.

For example, given this input:

const findIncreasing = (values) => {
  /**
   * > findIncreasing([])
   * []
   * > findIncreasing([1])
   * [1]
   * > findIncreasing([1, 2])
   * [1, 2]
   * > findIncreasing([2, 1])
   * [2]
   */
}

the tool would produce:

assert.deepStrictEqual(findIncreasing([]), [])
assert.deepStrictEqual(findIncreasing([1]), [1])
assert.deepStrictEqual(findIncreasing([1, 2]), [1, 2])
assert.deepStrictEqual(findIncreasing([2, 1]), [2])