Documentation Generator
Generating documentation from comments embedded in code
Many programmers believe they're more likely to write documentation and keep it up to date if it is close to the code. Tools that extract specially-formatted comments from code and turn them into documentation have been around since the 1980s; many are used for JavaScript, including JSDoc and ESDoc (which we use to document the tools that build this book). This chapter will use what we now know about parsing source code to build a simple documentation generator of our own.
How can we extract documentation comments?
Once again we will use Acorn to extract comments, this time by providing an onComment option to the parser with an array for it to fill in. For the moment we won't bother to assign the AST produced by parsing to a variable, because we are only interested in the comments:
import fs from 'fs'
import acorn from 'acorn'
const text = fs.readFileSync(process.argv[2], 'utf-8')
const options = {
sourceType: 'module',
locations: true,
onComment: []
}
acorn.parse(text, options)
console.log(JSON.stringify(options.onComment, null, 2))
// double-slash comment
/* slash-star comment */
[
{
"type": "Line",
"value": " double-slash comment",
"start": 0,
"end": 23,
"loc": {
"start": {
"line": 1,
"column": 0
},
"end": {
"line": 1,
"column": 23
}
}
},
{
"type": "Block",
"value": " slash-star comment ",
"start": 24,
"end": 48,
"loc": {
"start": {
"line": 2,
"column": 0
},
"end": {
"line": 2,
"column": 24
}
}
}
]
There is more information here than we need, so let's slim down the JSON that we extract:
import fs from 'fs'
import acorn from 'acorn'
const text = fs.readFileSync(process.argv[2], 'utf-8')
const options = {
sourceType: 'module',
locations: true,
onComment: []
}
acorn.parse(text, options)
const subset = options.onComment.map(entry => {
return {
type: entry.type,
value: entry.value,
start: entry.loc.start.line,
end: entry.loc.end.line
}
})
console.log(JSON.stringify(subset, null, 2))
node extract-comments-subset.js two-kinds-of-comment.js
[
{
"type": "Line",
"value": " double-slash comment",
"start": 1,
"end": 1
},
{
"type": "Block",
"value": " slash-star comment ",
"start": 2,
"end": 2
}
]
Acorn distinguishes two kinds of comments. Unfortunately for us, it treats each line of a multi-line double-slash comment as a separate comment:
//
// multi-line double-slash comment
//
node extract-comments-subset.js multi-line-double-slash-comment.js
[
{
"type": "Line",
"value": "",
"start": 1,
"end": 1
},
{
"type": "Line",
"value": " multi-line double-slash comment",
"start": 2,
"end": 2
},
{
"type": "Line",
"value": "",
"start": 3,
"end": 3
}
]
In contrast, Acorn treats a multi-line slash-star comment as a single entry. JavaScript does not require each line of such a comment to begin with a *, but most people add one for readability:
/*
* multi-line slash-star comment
*/
node extract-comments-subset.js multi-line-slash-star-comment.js
[
{
"type": "Block",
"value": "\n * multi-line slash-star comment\n ",
"start": 1,
"end": 3
}
]
By convention, we use block comments that start with /** for documentation. The first two characters are recognized by the parser as "start of comment", so the first character in the extracted text is *:
/**
* doc comment
*/
[
{
"type": "Block",
"value": "*\n * doc comment\n ",
"start": 1,
"end": 3
}
]
What input will we try to handle?
We will use [Markdown][markdown] for formatting our documentation. The documentation for function definitions looks like this:
/**
* # Demonstrate documentation generator.
*/
import util from './util-plain'
/**
* ## `main`: Main driver.
*/
const main = () => {
// Parse arguments.
// Process input stream.
}
/**
* ## `parseArgs`: Parse command line.
* - `args` (`string[]`): arguments to parse.
* - `defaults` (`Object`): default values.
*
* Returns: program configuration object.
*/
const parseArgs = (args, defaults) => {
// body would go here
}
/**
* ## `process`: Transform data.
* - `input` (`stream`): where to read.
* - `output` (`stream`): where to write.
* - `op` (`class`): what to do.
* Use @BaseProcessor unless told otherwise.
*/
const process = (input, output, op = util.BaseProcessor) => {
// body would go here
}
while the documentation for class definitions looks like this:
/**
* # Utilities to demonstrate doc generator.
*/
/**
* ## `BaseProcessor`: General outline.
*/
class BaseProcessor {
/**
* ### `constructor`: Build processor.
*/
constructor () {
// body would go here
}
/**
* ### `run`: Pass input to output.
* - `input` (`stream`): where to read.
* - `output` (`stream`): where to write.
*/
run (input, output) {
// body would go here
}
}
export default BaseProcessor
The embedded comments are rather unpleasant at the moment: the function and method names from the code are duplicated in the doc comments, which means the two will eventually fall out of step. We will fix that shortly; first, let's turn what we have into HTML. There are many [Markdown][markdown] parsers in JavaScript; after a bit of experimentation, we decided to use markdown-it along with the markdown-it-anchor extension, which creates HTML anchors for headings. The main program gets all the doc comments from all of the input files, converts the Markdown to HTML, and displays that:
const main = () => {
const allComments = getAllComments(process.argv.slice(2))
const md = new MarkdownIt({ html: true })
.use(MarkdownAnchor, { level: 1, slugify: slugify })
const html = md.render(allComments)
console.log(html)
}
To get all the comments we extract comments from all the files, remove the leading * characters (which aren't part of the documentation), and then join up the results after stripping off extraneous blanks:
const getAllComments = (allFilenames) => {
return allFilenames
.map(filename => {
const comments = extractComments(filename)
return { filename, comments }
})
.map(({ filename, comments }) => {
comments = comments.map(comment => removePrefix(comment))
return { filename, comments }
})
.map(({ filename, comments }) => {
const combined = comments
.map(comment => comment.stripped)
.join('\n\n')
return `# ${filename}\n\n${combined}`
})
.join('\n\n')
}
Extracting the comments from a single file is done as before:
const extractComments = (filename) => {
const text = fs.readFileSync(filename, 'utf-8')
const options = {
sourceType: 'module',
locations: true,
onComment: []
}
acorn.parse(text, options)
const subset = options.onComment
.filter(entry => entry.type === 'Block')
.map(entry => {
return {
type: entry.type,
value: entry.value,
start: entry.start,
end: entry.end
}
})
return subset
}
and removing the prefix * characters is a matter of splitting the text into lines, removing the leading spaces and asterisks, and putting the lines back together:
const removePrefix = (comment) => {
comment.stripped = comment.value
.split('\n')
.slice(0, -1)
.map(line => line.replace(/^ *\/?\* */, ''))
.map(line => line.replace('*/', ''))
.join('\n')
.trim()
return comment
}
One thing that isn't in this file (because we're going to use it in later versions) is the function slugify. A slugify function strips unnecessary characters out of a title, adds hyphens, and generally turns it into something you might see in a URL:
const slugify = (text) => {
return encodeURIComponent(
text.split(' ')[0]
.replace(/\.js$/, '')
.trim()
.toLowerCase()
.replace(/[^ \w]/g, '')
.replace(/\s+/g, '-')
)
}
export default slugify
Let's run the first version of our documentation generator and see what it produces:
node process-plain.js example-plain.js util-plain.js
<h1 id="exampleplain">example-plain.js</h1>
<h1 id="demonstrate">Demonstrate documentation generator.</h1>
<h2 id="main"><code>main</code>: Main driver.</h2>
<h2 id="parseargs"><code>parseArgs</code>: Parse command line.</h2>
<ul>
<li><code>args</code> (<code>string[]</code>): arguments to parse.</li>
<li><code>defaults</code> (<code>Object</code>): default values.</li>
</ul>
<p>Returns: program configuration object.</p>
<h2 id="process"><code>process</code>: Transform data.</h2>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
<li><code>op</code> (<code>class</code>): what to do.
Use @BaseProcessor unless told otherwise.</li>
</ul>
<h1 id="utilplain">util-plain.js</h1>
<h1 id="utilities">Utilities to demonstrate doc generator.</h1>
<h2 id="baseprocessor"><code>BaseProcessor</code>: General outline.</h2>
<h3 id="constructor"><code>constructor</code>: Build processor.</h3>
<h3 id="run"><code>run</code>: Pass input to output.</h3>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
</ul>
Again, there is a lot of room for improvement: there is a double h1 header for each file (the filename and the title comment), the anchor IDs are hard to read, there are no cross-references, and so on. Some of the visual issues can be resolved with CSS, and we can change our input format to make processing easier, as long as it also makes authoring easier. However, anything that is written twice will eventually be wrong in one place or another, so our first priority is to remove duplication.
How can we avoid duplicating function names?
If a comment is the first thing in a file, we want to use it as title text; this saves us from writing an explicit level-1 title in a comment. For every other comment, we can extract the name of the function or method from the node on the line immediately following the doc comment. This allows us to write much tidier comments:
/**
* Overall file header.
*/
/**
* Double the input.
*/
const double = (x) => 2 * x
/**
* Triple the input.
*/
function triple (x) {
return 3 * x
}
/**
* Define a class.
*/
class Example {
/**
* Method description.
*/
someMethod () {
}
}
To extract and display information from nodes immediately following doc comments we must find all the block comments, record the last line of each, and then recurse through the AST to find the node on the line immediately following any of those last comment lines. The main program finds the comments as usual, creates a set containing the line numbers we are looking for, and then searches for the nodes we want:
const main = () => {
const options = {
sourceType: 'module',
locations: true,
onComment: []
}
const text = fs.readFileSync(process.argv[2], 'utf-8')
const ast = acorn.parse(text, options)
const comments = options.onComment
.filter(entry => entry.type === 'Block')
.map(entry => {
return {
value: entry.value,
start: entry.loc.start.line,
end: entry.loc.end.line
}
})
const targets = new Set(comments.map(comment => comment.end + 1))
const nodes = []
findFollowing(ast, targets, nodes)
console.log(nodes.map(node => condense(node)))
}
The recursive search is straightforward as well. The only new trick is that we delete line numbers from the target set as we find their nodes, and append those nodes to an accumulator list:
const findFollowing = (node, targets, accum) => {
if ((!node) || (typeof node !== 'object') || (!('type' in node))) {
return
}
if (targets.has(node.loc.start.line)) {
accum.push(node)
targets.delete(node.loc.start.line)
}
for (const key in node) {
if (Array.isArray(node[key])) {
node[key].forEach(child => findFollowing(child, targets, accum))
} else if (typeof node[key] === 'object') {
findFollowing(node[key], targets, accum)
}
}
}
Finally, we use a function called condense to get the name we want out of the AST we have:
const condense = (node) => {
const result = {
type: node.type,
start: node.loc.start.line
}
switch (node.type) {
case 'VariableDeclaration':
result.name = node.declarations[0].id.name
break
case 'FunctionDeclaration':
result.name = node.id.name
break
case 'ClassDeclaration':
result.name = node.id.name
break
case 'MethodDefinition':
result.name = node.key.name
break
default:
assert.fail(`Unknown node type ${node.type}`)
break
}
return result
}
We need this because we get a different structure with:
const name = () => {
}
than we get with:
function name() {
}
When we run this on our test case we get:
[
{ type: 'VariableDeclaration', start: 8, name: 'double' },
{ type: 'FunctionDeclaration', start: 13, name: 'triple' },
{ type: 'ClassDeclaration', start: 20, name: 'Example' },
{ type: 'MethodDefinition', start: 24, name: 'someMethod' }
]
We can use this to create better output:
import MarkdownIt from 'markdown-it'
import MarkdownAnchor from 'markdown-it-anchor'
import getComments from './get-comments.js'
import getDefinitions from './get-definitions.js'
import fillIn from './fill-in.js'
import slugify from './slugify.js'
const main = () => {
const filenames = process.argv.slice(2)
const allComments = getComments(filenames)
const allDefinitions = getDefinitions(filenames)
const combined = []
for (const [filename, comments] of allComments) {
const definitions = allDefinitions.get(filename)
const text = fillIn(filename, comments, definitions)
combined.push(text)
}
const md = new MarkdownIt({ html: true })
.use(MarkdownAnchor, { level: 1, slugify: slugify })
const html = md.render(combined.join('\n\n'))
console.log(html)
}
main()
<h1 id="fillinheadersinput">fill-in-headers-input.js</h1>
<p>Demonstrate documentation generator.</p>
<h2 id="main">main</h2>
<p>Main driver.</p>
<h2 id="parseargs">parseArgs</h2>
<p>Parse command-line arguments.</p>
<ul>
<li><code>args</code> (<code>string[]</code>): arguments to parse.</li>
<li><code>defaults</code> (<code>Object</code>): default values.</li>
</ul>
<blockquote>
<p>Program configuration object.</p>
</blockquote>
<h2 id="baseprocessor">BaseProcessor</h2>
<p>Default processing class.</p>
<h3 id="constructor">constructor</h3>
<p>Build base processor.</p>
<h3 id="run">run</h3>
<p>Pass input to output.</p>
<ul>
<li><code>input</code> (<code>stream</code>): where to read.</li>
<li><code>output</code> (<code>stream</code>): where to write.</li>
</ul>
Exercises
Building an index
Modify the documentation generator to produce an alphabetical index of all classes and methods found. Index entries should be hyperlinks to the documentation for the corresponding item.
Documenting exceptions
Extend the documentation generator to allow people to document the exceptions that a function throws.
Deprecation warning
Add a feature to the documentation generator to allow authors to mark functions and methods as deprecated, and to include that information in the generated documentation.
Usage examples
Enhance the documentation generator so that if a horizontal rule --- appears in a documentation comment, the text following it is typeset as a usage example. (A doc comment may contain several usage examples.)
Unit testing
Write unit tests for the documentation generator using Mocha.
Summarizing functions
Modify the documentation generator so that line comments inside a function that use //* are formatted as a bullet list in the documentation for that function.
Cross referencing
Modify the documentation generator so that the documentation for one class or function can include Markdown links to other classes or functions.
Data types
Modify the documentation generator to allow authors to define new data types in the same way as JSDoc.
Inline parameter documentation
Some documentation generators put the documentation for a parameter on the same line as the parameter:
/**
* Transform data.
*/
function process(
input, /*- {stream} where to read */
output, /*- {stream} where to write */
op /*- {Operation} what to do */
){
// body would go here
}
Modify the documentation generator to handle this.
Tests as documentation
The doctest library for Python allows programmers to embed unit tests as documentation in their programs. Write a tool that:
1. Finds functions that start with a block comment.
2. Extracts the code and output from those block comments and turns them into assertions.
For example, given this input:
const findIncreasing = (values) => {
/**
* > findIncreasing([])
* []
* > findIncreasing([1])
* [1]
* > findIncreasing([1, 2])
* [1, 2]
* > findIncreasing([2, 1])
* [2]
*/
}
the tool would produce:
assert.deepStrictEqual(findIncreasing([]), [])
assert.deepStrictEqual(findIncreasing([1]), [1])
assert.deepStrictEqual(findIncreasing([1, 2]), [1, 2])
assert.deepStrictEqual(findIncreasing([2, 1]), [2])